Congratulations and thank you, Jan! It is so exciting that Solr is now a
TLP!
Mike McCandless
http://blog.mikemccandless.com
On Thu, Feb 18, 2021 at 1:56 PM Anshum Gupta wrote:
> Hi everyone,
>
> I’d like to inform everyone that the newly formed Apache Solr PMC
> nominated and elected Jan
We pass ExecutorService to Lucene's IndexSearcher at Amazon (for customer
facing product search) and it's a big win on long-pole query latencies, but
hurts red-line QPS (cluster capacity) a bit, due to less efficient
collection across segments and thread context switching.
I'm surprised it's not
Hello,
Indeed, your cosmetic fix looks great -- I'll push that change. Thanks for
noticing and raising!
Mike McCandless
http://blog.mikemccandless.com
On Tue, Apr 16, 2019 at 12:04 AM zhenyuan wei wrote:
> Hi,
> With the current newest version, 9.0.0-snapshot, in
>
I'm not sure this is what's affecting you, but you might try upgrading to
Lucene/Solr 7.1; in 7.0 there were big improvements in using multiple
threads to resolve deletions:
http://blog.mikemccandless.com/2017/07/lucene-gets-concurrent-deletes-and.html
Mike McCandless
Actually, it's one lucene segment per *concurrent* indexing thread.
So if you have 10 indexing threads in Lucene at once, then 10 in-memory
segments will be created and will have to be written on refresh/commit.
Elasticsearch uses a bounded thread pool to service all indexing requests,
which I
>> I am investigating the question of whether this change is still needed in 6.5.1
>> or can this be achieved by any other configuration?
>>
>> For now, we are not planning to use NRT and solrCloud.
>>
>>
>> Thanks
>> Nawab
>>
>> On Sun, May 28, 20
Sorry, yes, that commit was one of many on a feature branch I used to work
on LUCENE-5438, which added near-real-time index replication to Lucene.
Before this change, Lucene's replication module required a commit in order
to replicate, which is a heavy operation.
The writeAllDeletes boolean
zingInfixSuggester instead of a regular Solr index (since both are
> using standard Lucene?) is that the AInfixSuggester does sorting at
> index-time using the weightField? So it's only ever advantageous to use
> this Suggester if you need sorting based on a field?
>
> Thanks
>
AnalyzingInfixSuggester uses an index-time sort, sorting all postings by the
suggest weight, so that lookup is extremely fast as long as you sort by the
suggest weight.
But if you need to rank at lookup time by something not "congruent" with
the index-time sort then you lose that benefit.
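A toy sketch of why the index-time sort matters (pure Python, all names illustrative; the real AnalyzingInfixSuggester stores suggestions in a Lucene index pre-sorted by weight):

```python
# Toy model: suggestions are sorted once, at "index" time, by weight.
entries = [("solr", 10), ("solar", 50), ("solaris", 20)]
index = sorted(entries, key=lambda e: -e[1])  # index-time sort by weight

def lookup(prefix, n=2):
    # Lookup just scans in stored order and stops after n hits; no
    # per-query sorting is needed because the index is pre-ranked.
    return [term for term, weight in index if term.startswith(prefix)][:n]

print(lookup("sol"))  # ['solar', 'solaris']
```

Ranking by anything else at lookup time would force re-sorting every candidate, which is exactly the lost benefit described above.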
Mike
I think you can use the term stats that Lucene tracks for each field.
Compare Terms.getSumTotalTermFreq and Terms.getDocCount. If they are
equal it means every document that had this field, had only one token.
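As a sketch of that check (the two arguments stand in for the results of Lucene's Terms.getSumTotalTermFreq() and Terms.getDocCount(); the function name is mine):

```python
def field_is_single_token(sum_total_term_freq: int, doc_count: int) -> bool:
    # sumTotalTermFreq counts every token occurrence of the field across
    # all documents; docCount counts documents that have the field at all.
    # Equality means each such document contributed exactly one token.
    return sum_total_term_freq == doc_count

print(field_is_single_token(100, 100))  # True: one token per doc
print(field_is_single_token(250, 100))  # False: some docs had several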
Mike McCandless
http://blog.mikemccandless.com
On Fri, Nov 11, 2016 at 5:50 AM,
26 August 2016, Apache Solr 6.2.0 available
Solr is the popular, blazing fast, open source NoSQL search platform from
the Apache Lucene project. Its major features include powerful full-text
search, hit highlighting, faceted search and analytics, rich document
parsing, geospatial search,
awn Heisey <apa...@elyograg.org> wrote:
> On 6/16/2016 2:35 AM, Michael McCandless wrote:
> >
> > Hmm, merging can't read at 800 MB/sec and only write at 20 MB/sec for
> > very long ... unless there is a huge percentage of deletes. Also, by
> > default CMS does
Hmm, merging can't read at 800 MB/sec and only write at 20 MB/sec for very
long ... unless there is a huge percentage of deletes.
Also, by default CMS doesn't throttle forced merges (see
CMS.get/setForceMergeMBPerSec).
Maybe capture IndexWriter.setInfoStream output?
Mike McCandless
I added a comment on the INFRA issue.
I don't understand why it periodically "gets stuck".
Mike McCandless
http://blog.mikemccandless.com
On Fri, Oct 23, 2015 at 11:27 AM, Kevin Risden
wrote:
> It looks like both Apache Git mirror
IBM's J9 JVM unfortunately still has a number of nasty bugs affecting
Lucene; most likely you are hitting one of these. We used to test J9
in our continuous Jenkins jobs, but there were just too many
J9-specific failures and we couldn't get IBM's attention to resolve
them, so we stopped. For now
October 2014, Apache Solr™ 4.10.4 available
The Lucene PMC is pleased to announce the release of Apache Solr 4.10.4
Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Lucene project. Its major features include powerful
full-text search, hit highlighting, faceted
Also see this G+ post I wrote up recently showing how the percentage of
deletions changes over time for a stress test in which every add also deletes
a previous document:
stress test: https://plus.google.com/112759599082866346694/posts/MJVueTznYnD
Mike McCandless
http://blog.mikemccandless.com
On Wed, Dec 31, 2014 at 12:21 PM,
They should be reused if the impl. allows for it.
Besides reducing GC cost, it can also be a sizable performance gain
since these enums can have quite a bit of state that otherwise must be
re-initialized.
If you really don't want to reuse them (force a new enum every time), pass null.
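The reuse pattern looks roughly like this (a Python cartoon of the idea; in Lucene's Java API you pass the previous enum back into TermsEnum.postings(reuse, flags), and the class below is purely illustrative):

```python
class PostingsCursor:
    """Stand-in for a postings enum carrying costly reusable state."""
    allocations = 0  # class-wide counter, just to show the effect of reuse

    def __init__(self):
        PostingsCursor.allocations += 1
        self.buffer = [0] * 128  # pretend this is expensive internal state

    def reset(self, term):
        self.term = term  # re-initialize in place instead of reallocating
        return self

def postings(term, reuse=None):
    # Like TermsEnum.postings(reuse, ...): recycle the caller's enum if given.
    cursor = reuse if reuse is not None else PostingsCursor()
    return cursor.reset(term)

cur = None
for term in ["apache", "lucene", "solr"]:
    cur = postings(term, reuse=cur)
print(PostingsCursor.allocations)  # 1: only the first call allocated
```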
Mike
October 2014, Apache Solr™ 4.10.2 available
The Lucene PMC is pleased to announce the release of Apache Solr 4.10.2
Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Lucene project. Its major features include powerful
full-text search, hit highlighting, faceted
September 2014, Apache Solr™ 4.10.1 available
The Lucene PMC is pleased to announce the release of Apache Solr 4.10.1
Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Lucene project. Its major features include powerful
full-text search, hit highlighting,
September 2014, Apache Solr™ 4.9.1 available
The Lucene PMC is pleased to announce the release of Apache Solr 4.9.1
Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Lucene project. Its major features include powerful
full-text search, hit highlighting, faceted
release
that is critical to the RM (Michael McCandless) and/or an organization
where he has influence or liability. Apparently this was a more
expedient path than completely validating a 4.10 upgrade and waiting for
the 4.10.1 bugfix release. Validating the 4.10 upgrade probably would
have taken
Soft commit (i.e. opening a new IndexReader in Lucene and closing the
old one) should make those go away?
The .nfsX files are created when a file is deleted but a local
process (in this case, the current Lucene IndexReader) still has the
file open.
Mike McCandless
The default terms dictionary (BlockTree) also uses a trie index
structure to locate the block on disk that may contain a target term.
Mike McCandless
http://blog.mikemccandless.com
On Thu, Jun 5, 2014 at 12:11 PM, Shawn Heisey s...@elyograg.org wrote:
I just want to know whether the
RC2 is being voted on now ... so it should be soon (a few days, but
more if any new blocker issues are found and we need to do RC3).
Mike McCandless
http://blog.mikemccandless.com
On Sat, Mar 29, 2014 at 2:26 PM, Puneet Pawaia puneet.paw...@gmail.com wrote:
Hi
Any idea on the expected date
You told the fieldType to use SimpleText only for the postings, not
all other parts of the codec (doc values, live docs, stored fields,
etc...), and so it used the default codec for those components.
If instead you used the SimpleTextCodec (not sure how to specify this
in Solr's schema.xml) then
I think it's best to use one of the many autosuggesters Lucene/Solr provide?
E.g. AnalyzingInfixSuggester is running here:
http://jirasearch.mikemccandless.com
But that's just one suggester... there are many more.
Mike McCandless
http://blog.mikemccandless.com
On Mon, Mar 17, 2014 at 10:44
I suspect (not certain) one reason for the performance difference with
Solr vs Lucene joins is that Solr operates on a top-level reader?
This results in fast joins, but it means whenever you open a new
reader (NRT reader) there is a high cost to regenerate the top-level
data structures.
But if
Look in lucene's join module?
Mike McCandless
http://blog.mikemccandless.com
On Thu, Jan 30, 2014 at 4:15 AM, anand chandak anand.chan...@oracle.com wrote:
Hi,
I am trying to find out whether the Lucene joins (not Solr joins) are
using any filter cache. The API that Lucene uses is for
Which version of Java are you using?
That root cause exception is somewhat spooky: it's in the
ByteBufferIndexInput code that handles a BufferUnderflowException, ie when a
small (maybe a few hundred bytes) read happens to span the 1 GB page
boundary, and specifically the exception happens on the final read
I have trouble understanding J9's version strings ... but, is it
really from 2008? You could be hitting a JVM bug; can you test
upgrading?
I don't have much experience with Solr faceting on optimized vs
unoptimized indices; maybe someone else can answer your question.
Lucene's facet module (not
On Mon, Jan 6, 2014 at 3:42 PM, Michael Sokolov
msoko...@safaribooksonline.com wrote:
I think the key optimization when there are no deletions is that you don't
need to renumber documents and can bulk-copy blocks of contiguous documents,
and that is independent of merge policy. I think :)
On Mon, Dec 30, 2013 at 1:22 PM, Greg Preston
gpres...@marinsoftware.com wrote:
That was it. Setting omitNorms=true on all fields fixed my problem.
I left it indexing all weekend, and heap usage still looks great.
Good!
I'm still not clear why bouncing the solr instance freed up memory,
Likely this is for field norms, which use doc values under the hood.
Mike McCandless
http://blog.mikemccandless.com
On Thu, Dec 26, 2013 at 5:03 PM, Greg Preston
gpres...@marinsoftware.com wrote:
Does anybody with knowledge of solr internals know why I'm seeing
instances of
Unfortunately the current SynonymFilter cannot handle posInc != 1 ...
we could perhaps try to fix this ... patches welcome :)
So for now it's best to place SynonymFilter before StopFilter, and
before any other filters that may create graph tokens (posLen > 1,
posInc == 0).
Mike McCandless
Output is quite a bit simpler than input because all we do is write a
single stream of bytes with no seeking (append only), and it's done
with only one thread, so I don't think there'd be much to gain by
using the newer IO APIs for writing...
Mike McCandless
http://blog.mikemccandless.com
On
The default is 2.0, and higher values will more strongly favor merging
segments with deletes.
I think 20.0 is likely way too high ... maybe try 3-5?
Mike McCandless
http://blog.mikemccandless.com
On Tue, Jun 18, 2013 at 6:46 PM, Petersen, Robert
robert.peter...@mail.rakuten.com wrote:
Hi
19, 2013 at 1:36 PM, Petersen, Robert
robert.peter...@mail.rakuten.com wrote:
OK thanks, will do. Just out of curiosity, what would having that set way
too high do? Would the index become fragmented or what?
-Original Message-
From: Michael McCandless [mailto:luc
You could also try the new[ish] PostingsHighlighter:
http://blog.mikemccandless.com/2012/12/a-new-lucene-highlighter-is-born.html
Mike McCandless
http://blog.mikemccandless.com
On Sat, Jun 15, 2013 at 8:50 AM, Michael Sokolov
msoko...@safaribooksonline.com wrote:
If you have very large
Alas I think CheckIndex can't do much here: there is no segments file,
so you'll have to reindex from scratch.
Just to check: did you ever call commit while building the index
before the machine crashed?
Mike McCandless
http://blog.mikemccandless.com
On Tue, Apr 30, 2013 at 8:17 PM, Otis
Be sure to test the bloom postings format on your own use case ... in
my tests (heavy PK lookups) it was slower.
But to answer your question: I would expect a single segment index to
have much faster PK lookups than a multi-segment one, with and without
the bloom postings format, but bloom may
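The idea behind the bloom postings format can be sketched in a few lines (hand-rolled Python bloom filter; the bit count and hash scheme here are arbitrary): a PK lookup first asks a cheap in-memory filter, and only keys the filter cannot rule out pay for the real terms-dictionary seek.

```python
import hashlib

class BloomFilter:
    def __init__(self, bits=1024, hashes=3):
        self.bits, self.hashes, self.v = bits, hashes, 0

    def _positions(self, key):
        # Derive `hashes` bit positions from the key.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, key):
        for pos in self._positions(key):
            self.v |= 1 << pos

    def might_contain(self, key):
        # False means definitely absent: skip the terms-dict seek entirely.
        return all(self.v >> pos & 1 for pos in self._positions(key))

pk_filter = BloomFilter()
pk_filter.add("doc-123")
print(pk_filter.might_contain("doc-123"))  # True (always, once added)
print(pk_filter.might_contain("doc-999"))  # False with high probability
```

The maintenance cost of the filter itself is why it can come out slower in some workloads, as noted above.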
At the Lucene level, you don't have to commit before doing the
deleteByQuery, i.e. 'a' will be correctly deleted without any
intervening commit.
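A toy model of that semantics (Python; the buffering below is a cartoon of IndexWriter, not its real implementation):

```python
class ToyWriter:
    """Cartoon of IndexWriter: deletes also apply to still-buffered docs."""
    def __init__(self):
        self.buffered, self.committed = [], []

    def add_document(self, doc):
        self.buffered.append(doc)

    def delete_by_query(self, matches):
        # No commit needed first: buffered (uncommitted) docs are
        # deleted too, just like at the Lucene level.
        self.buffered = [d for d in self.buffered if not matches(d)]
        self.committed = [d for d in self.committed if not matches(d)]

    def commit(self):
        self.committed += self.buffered
        self.buffered = []

w = ToyWriter()
w.add_document("a")
w.delete_by_query(lambda d: d == "a")  # before any commit
w.commit()
print(w.committed)  # []: 'a' was correctly deleted
```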
Mike McCandless
http://blog.mikemccandless.com
On Mon, Apr 15, 2013 at 3:57 PM, Shawn Heisey s...@elyograg.org wrote:
Simple question first: Is there
On Tue, Mar 12, 2013 at 11:24 PM, Yonik Seeley yo...@lucidworks.com wrote:
On Tue, Mar 12, 2013 at 10:27 PM, Alexandre Rafalovitch
arafa...@gmail.com wrote:
Lucene seems to get a new DrillSideways functionality on top of its own
facet implementation.
I would love to have something like that
It really should be unlimited: this setting has nothing to do with how
much RAM is on the computer.
See http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
Mike McCandless
http://blog.mikemccandless.com
On Tue, Feb 26, 2013 at 12:18 PM, zqzuk ziqizh...@hotmail.co.uk wrote:
I'm not very familiar with how AnalyzingSuggester works inside Solr
... if you try this directly with the Lucene APIs does it still
happen?
Hmm maybe one idea: if you remove whitespace from your suggestion does
it work? I wonder if there's a whitespace / multi-token issue ... if
so then maybe
Lucene's misc module has HighFreqTerms tool.
Mike McCandless
http://blog.mikemccandless.com
On Wed, Nov 7, 2012 at 1:15 PM, Edward Garrett heacu.mcint...@gmail.com wrote:
hi,
is there a simple way to get a list of all terms that occur in a field
sorted by their total term frequency within
With Lucene 4.0, FSDirectory now supports merge bytes/sec throttling
(FSDirectory.setMaxMergeWriteMBPerSec): it rate limits that max
bytes/sec load on the IO system due to merging.
Not sure if it's been exposed in Solr / ElasticSearch yet ...
Mike McCandless
http://blog.mikemccandless.com
On
Python's unicode function takes an optional (keyword) errors
argument, telling it what to do when an invalid UTF8 byte sequence is
seen.
The default (errors='strict') is to throw the exceptions you're
seeing. But you can also pass errors='replace' or errors='ignore'.
See
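For instance (shown in Python 3 spelling, where the same errors keyword lives on bytes.decode; Python 2's unicode() behaves the same way):

```python
data = b"abc\xff"  # 0xff can never appear in valid UTF-8

try:
    data.decode("utf-8")  # errors='strict' is the default
except UnicodeDecodeError as e:
    print("strict:", type(e).__name__)

print("replace:", data.decode("utf-8", errors="replace"))  # abc + U+FFFD
print("ignore:", data.decode("utf-8", errors="ignore"))    # abc
```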
See also SOLR-3390.
Some cases have been addressed. Eg, if you match "domain name system"
-> "dns", then "dns" will have correct offsets spanning the full phrase
"domain name system" in the input. (However: QueryParser won't work
because a query for "domain name system" is pre-split on whitespace so
the
Actually FST (and SynFilter based on it) was backported to 3.x.
Mike McCandless
http://blog.mikemccandless.com
On Fri, Aug 3, 2012 at 11:28 AM, Jack Krupansky j...@basetechnology.com wrote:
The Lucene FST guys made a big improvement in synonym filtering in
Lucene/Solr 4.0 using FSTs. Or are
Hi,
You might want to take a look at Solr's trunk (very soon to be 4.0.0
alpha release), which already has a near-real-time solution (using
Lucene's near-real-time APIs).
Lucene has NRTCachingDirectory (to use RAM for small / recently
flushed segments), but I don't think Solr uses it yet.
Mike
Looks like this is a low-level Linux issue ... see Shay's email to the
ElasticSearch list about it:
https://groups.google.com/forum/?fromgroups#!topic/elasticsearch/_I1_OfaL7QY
Also see the comments here:
http://news.ycombinator.com/item?id=4182642
Mike McCandless
Is it possible the Linux machine has bad RAM / bad disk?
Mike McCandless
http://blog.mikemccandless.com
On Mon, Jun 18, 2012 at 7:06 AM, Erick Erickson erickerick...@gmail.com wrote:
Is it possible that you somehow have some problem with jars and classpath?
I'm wondering because this problem
This behavior has changed.
In 3.x, you silently got no results in such cases.
In trunk, you get an exception notifying you that the query cannot run.
Mike McCandless
http://blog.mikemccandless.com
On Thu, May 24, 2012 at 6:04 AM, Markus Jelsma
markus.jel...@openindex.io wrote:
Hi,
What is
I believe termPositions=false refers to the term vectors and not how
the field is indexed (which is very confusing I think...).
I think you'll need to index a separate field disabling term freqs +
positions than the field the queryparser can query?
But ... if all of this is to just do custom
This is a good question...
I don't know much about how Solr's transaction log works, but, peeking
in the code, I do see it fsync'ing (look in TransactionLog.java, in
the finish method), but only if the SyncLevel is FSYNC.
If the default is really flush, I don't see how the transaction log
helps
By default, the default merge policy (TieredMergePolicy) won't create
the CFS if the segment is very large (> 10% of the total index
size). Likely that's what you are seeing?
If you really must have a CFS (how come?) then you can call
TieredMergePolicy.setNoCFSRatio(1.0) -- not sure how/where
(Final)
KERNEL NAME: 2.6.18-128.el5
UPTIME: up 71 days
LOAD AVERAGE: 1.42, 1.45, 1.53
JBOSS Version: Implementation-Version: 4.2.2.GA (build:
SVNTag=JBoss_4_2_2_GA date=20
JAVA Version: java version 1.6.0_24
On Thu, Apr 12, 2012 at 3:07 AM, Michael McCandless
luc
]
--
From: Michael McCandless luc...@mikemccandless.com
Date: Sat, Mar 31, 2012 at 3:15 AM
To: solr-user@lucene.apache.org
It's the virtual memory limit that matters; yours says unlimited below
(good!), but, are you certain that's really the limit your Solr
process runs with?
On Linux
Do you mean you are pre-sorting the documents (by what criteria?)
yourself, before adding them to the index?
In which case... you should already be seeing some benefits (smaller
index size) than had you randomly added them (ie the vInts should
take fewer bytes), I think. (Probably the savings
, Michael McCandless
luc...@mikemccandless.com wrote:
It's the virtual memory limit that matters; yours says unlimited below
(good!), but, are you certain that's really the limit your Solr
process runs with?
On Linux, there is also a per-process map count:
cat /proc/sys/vm/max_map_count
Are you seeing a real problem here, besides just being alarmed by the
big numbers from top?
Consumption of virtual memory by itself is basically harmless, as long
as you're not running up against any of the OS limits (and, you're
running a 64 bit JVM).
This is just top telling you that you've
Hmm, unless the ulimits are low, or the default mergeFactor was
changed, or you have many indexes open in a single JVM, or you keep
too many IndexReaders open, even in an NRT or frequent commit use
case, you should not run out of file descriptors.
Frequent commit/reopen should be perfectly fine,
It's the virtual memory limit that matters; yours says unlimited below
(good!), but, are you certain that's really the limit your Solr
process runs with?
On Linux, there is also a per-process map count:
cat /proc/sys/vm/max_map_count
I think it typically defaults to 65,536 but you should
On Mon, Feb 6, 2012 at 8:20 AM, prasenjit mukherjee
prasen@gmail.com wrote:
Pardon my ignorance, Why can't the IndexWriter and IndexSearcher share
the same underlying in-memory datastructure so that IndexSearcher need
not be reopened with every commit.
Because the semantics of an
Thank you Ingo!
I think post the 3.x patch directly on the issue?
I'm not sure why this wasn't backported to 3.x the first time around...
Mike McCandless
http://blog.mikemccandless.com
On Thu, Jan 5, 2012 at 8:15 AM, Ingo Renner i...@typo3.org wrote:
Hi all,
I've backported LUCENE-995 to
Awesome, thanks Ingo... I'll have a look!
Mike McCandless
http://blog.mikemccandless.com
On Thu, Jan 5, 2012 at 9:23 AM, Ingo Renner i...@typo3.org wrote:
Am 05.01.2012 um 15:05 schrieb Michael McCandless:
Thank you Ingo!
I think post the 3.x patch directly on the issue?
thanks
Which version of Solr/Lucene were you using when you hit power loss?
There was a known bug that could allow power loss to cause corruption,
but this was fixed in Lucene 3.4.0.
Unfortunately, there is no easy way to recreate the segments_N file...
in principle it should be possible and maybe not
On Mon, Nov 28, 2011 at 10:49 AM, Roberto Iannone
iann...@crmpa.unisa.it wrote:
Hi Michael,
thx for your help :)
You're welcome!
2011/11/28 Michael McCandless luc...@mikemccandless.com
Which version of Solr/Lucene were you using when you hit power loss?
I'm using Lucene 3.4.
Hmm, which
Lucene itself has BlockJoinQuery/Collector (in contrib/join), which is
what ElasticSearch is using under the hood for its nested documents (I
think?).
But I don't think this has been exposed in Solr yet ... patches welcome!
Mike McCandless
http://blog.mikemccandless.com
On Tue, Nov 8, 2011 at
On Fri, Oct 28, 2011 at 3:27 PM, Simon Willnauer
simon.willna...@googlemail.com wrote:
one more thing, after somebody (thanks robert) pointed me at the
stacktrace it seems kind of obvious what the root cause of your
problem is. Its solr :) Solr closes the IndexWriter on commit which is
very
On Sat, Oct 22, 2011 at 4:10 AM, Simon Willnauer
simon.willna...@googlemail.com wrote:
On Fri, Oct 21, 2011 at 4:37 PM, Michael McCandless
luc...@mikemccandless.com wrote:
Well... the limitation of DocValues is that it cannot handle more than
one value per document (which UnInvertedField can
Well... the limitation of DocValues is that it cannot handle more than
one value per document (which UnInvertedField can).
Hopefully we can fix that at some point :)
Mike McCandless
http://blog.mikemccandless.com
On Fri, Oct 21, 2011 at 7:50 AM, Simon Willnauer
simon.willna...@googlemail.com
Can you attach this PDF to an email send to the list? Or is it too
large for that?
Or, you can try running Tika directly on the PDF to see if it's able
to extract the text.
Mike McCandless
http://blog.mikemccandless.com
2011/10/5 Héctor Trujillo hecto...@gmail.com:
Sorry you have the
Hmm, no attachment; maybe it's too large?
Can you send it directly to me?
Mike McCandless
http://blog.mikemccandless.com
2011/10/5 Héctor Trujillo hecto...@gmail.com:
This is the file that give me errors.
2011/10/5 Michael McCandless luc...@mikemccandless.com
Can you attach this PDF
if omitPositions is made false.
Thanks,
Isan Fulia.
On 29 September 2011 17:49, Michael McCandless
luc...@mikemccandless.comwrote:
Once a given field has omitted positions in the past, even for just
one document, it sticks and that field will forever omit positions.
Try creating a new index
Once a given field has omitted positions in the past, even for just
one document, it sticks and that field will forever omit positions.
Try creating a new index, never omitting positions from that field?
Mike McCandless
http://blog.mikemccandless.com
On Thu, Sep 29, 2011 at 1:14 AM, Isan Fulia
On Wed, Sep 21, 2011 at 10:10 PM, Michael Sokolov soko...@ifactory.com wrote:
I wonder if config-file validation would be helpful here :) I posted a patch
in SOLR-1758 once.
Big +1.
We should aim for as stringent config file checking as possible.
Mike McCandless
Are you sure you are using a 64 bit JVM?
Are you sure you really changed your vmem limit to unlimited? That
should have resolved the OOME from mmap.
Or: can you run cat /proc/sys/vm/max_map_count? This is a limit on
the total number of maps in a single process, that Linux imposes. But
the
and now it looks like everything works
as expected.
Need some further testing with the java versions, but I'm quite optimistic.
Best regards
Ralf
Am 22.09.2011 14:46, schrieb Michael McCandless:
Are you sure you are using a 64 bit JVM?
Are you sure you really changed your vmem limit
soft nofile 49151
Thanks,
Shawn
On 9/22/2011 9:56 AM, Michael McCandless wrote:
OK, excellent. Thanks for bringing closure,
Mike McCandless
http://blog.mikemccandless.com
On Thu, Sep 22, 2011 at 9:00 AM, Ralf Matulatralf.matu...@bundestag.de
wrote:
Dear Mike,
thanks for your
Since you hit OOME during mmap, I think this is an OS issue not a JVM
issue. Ie, the JVM isn't running out of memory.
How many segments were in the unoptimized index? It's possible the OS
rejected the mmap because of process limits. Run cat
/proc/sys/vm/max_map_count to see how many mmaps are
September 14 2011, Apache Solr™ 3.4.0 available
The Lucene PMC is pleased to announce the release of Apache Solr 3.4.0.
Apache Solr is the popular, blazing fast open source enterprise search
platform from the Apache Lucene project. Its major features include
powerful full-text search, hit
Even if it applies, this is for Lucene. I don't think we've added
Solr support for this yet... we should!
Mike McCandless
http://blog.mikemccandless.com
On Sun, Sep 11, 2011 at 12:16 PM, Erick Erickson
erickerick...@gmail.com wrote:
Does this JIRA apply?
Closing a searcher while thread(s) is/are still using it is definitely
bad, so, this code looks spooky...
But: it's possible something higher up (in Solr) is ensuring this code
runs exclusively? I don't know enough about this part of Solr...
Mike McCandless
http://blog.mikemccandless.com
On
Hi,
I just committed a new block tree terms dictionary implementation,
which requires fully re-indexing any trunk indices.
See here for details:
https://issues.apache.org/jira/browse/LUCENE-3030
If you are using a released version of Lucene/Solr then you can ignore
this message.
Mike
Unfortunately Solr's join impl hasn't been backported to 3.x, as far as I know.
You might want to look at ElasticSearch; it has a join implementation
already or use Solr 4.0.
Mike McCandless
http://blog.mikemccandless.com
On Wed, Aug 17, 2011 at 7:40 PM, Cameron Hurst wakemaste...@z33k.com
This file is actually optional; it's there for redundancy in case the
filesystem is not reliable when listing a directory. Ie, normally,
we list the directory to find the latest segments_N file; but if this
is wrong (eg the file system might have a stale cache) then we
fallback to reading the
I think we should fix replication to copy it?
Mike McCandless
http://blog.mikemccandless.com
On Thu, Aug 4, 2011 at 8:16 AM, Bernd Fehling
bernd.fehl...@uni-bielefeld.de wrote:
Am 04.08.2011 12:52, schrieb Michael McCandless:
This file is actually optional; its there for redundancy in case
I believe the underlying grouping module is now technically able to do
this, because subclasses of the abstract first/second pass grouping
collectors are free to decide what type/value the group key is.
But, we have to fix Solr to allow for compound keys by creating the
necessary concrete
After your writer.commit you need to reopen your searcher to see the changes.
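A toy model of why (Python; an already-open searcher is a point-in-time snapshot, which is a cartoon of how an IndexSearcher wraps an IndexReader):

```python
class ToyIndex:
    def __init__(self):
        self.docs = []

    def add_and_commit(self, doc):
        self.docs.append(doc)

    def open_searcher(self):
        # A searcher sees a point-in-time snapshot of the index.
        return list(self.docs)

idx = ToyIndex()
old_searcher = idx.open_searcher()   # opened before the commit
idx.add_and_commit("updated-doc")
new_searcher = idx.open_searcher()   # reopened after the commit

print("updated-doc" in old_searcher)  # False: stale snapshot
print("updated-doc" in new_searcher)  # True
```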
Mike McCandless
http://blog.mikemccandless.com
On Tue, Jul 5, 2011 at 1:48 PM, Gabriele Kahlout
gabri...@mysimpatico.com wrote:
@Test
public void testUpdate() throws IOException,
ParserConfigurationException,
On Tue, Jul 5, 2011 at 8:09 PM, Michael McCandless
luc...@mikemccandless.com wrote:
After your writer.commit you need to reopen your searcher to see the
changes.
Mike McCandless
http://blog.mikemccandless.com
On Tue, Jul 5, 2011 at 1:48 PM, Gabriele Kahlout
gabri...@mysimpatico.com wrote
Good question... I think in Lucene 4.0, the edit distance is (will be)
in Unicode code points, but in past releases, it's UTF16 code units.
Mike McCandless
http://blog.mikemccandless.com
2011/6/30 Floyd Wu floyd...@gmail.com:
if this is edit distance implementation, what is the result apply to
Which version of Solr (Lucene) are you using?
Recent versions of Lucene now accept ~N where N is the edit distance. Ie
foobar~2 matches any term that's <= 2 edit distance away from foobar.
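Edit distance here is plain Levenshtein distance; a quick Python sketch of which terms a foobar~2 query would match (the function is a textbook implementation, not Lucene's automaton-based one):

```python
def edit_distance(a: str, b: str) -> int:
    # Classic dynamic-programming Levenshtein distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb))) # substitution
        prev = cur
    return prev[-1]

# foobar~2 matches any term within edit distance 2:
for term in ["foobar", "fobar", "fooba", "fxxbar", "grok"]:
    print(term, edit_distance("foobar", term) <= 2)
```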
Mike McCandless
http://blog.mikemccandless.com
On Tue, Jun 28, 2011 at 11:00 PM, entdeveloper
/2011 3:18 PM, Michael McCandless wrote:
With segmentsPerTier at 35 you will easily cross 70 segs in the index...
If you want optimize to run in a single merge, I would lower
segmentsPerTier and maxMergeAtOnce (maybe back to the 10 default), and set
your maxMergeAtOnceExplicit to 70 or higher
On Tue, Jun 21, 2011 at 9:42 AM, Shawn Heisey s...@elyograg.org wrote:
On 6/20/2011 12:31 PM, Michael McCandless wrote:
For back-compat, mergeFactor maps to both of these, but it's better to
set them directly eg:
mergePolicy class=org.apache.lucene.index.TieredMergePolicy
int name
On Sun, Jun 19, 2011 at 12:35 PM, Shawn Heisey s...@elyograg.org wrote:
On 6/19/2011 7:32 AM, Michael McCandless wrote:
With LogXMergePolicy (the default before 3.2), optimize respects
mergeFactor, so it's doing 2 steps because you have 37 segments but 35
mergeFactor.
With TieredMergePolicy
On Mon, Jun 20, 2011 at 4:00 PM, Shawn Heisey s...@elyograg.org wrote:
On 6/20/2011 12:31 PM, Michael McCandless wrote:
Actually, TieredMP has two different params (different from the
previous default LogMP):
* segmentsPerTier controls how many segments you can tolerate in the
index
With LogXMergePolicy (the default before 3.2), optimize respects
mergeFactor, so it's doing 2 steps because you have 37 segments but 35
mergeFactor.
With TieredMergePolicy (default on 3.2 and after), there is now a
separate merge factor used for optimize (maxMergeAtOnceExplicit)... so
you could
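The step count follows from simple arithmetic: each merge replaces up to the merge-width limit of segments with a single one. A sketch (function name is mine):

```python
def optimize_steps(num_segments: int, max_merge_at_once: int) -> int:
    # Each merge replaces up to max_merge_at_once segments with one,
    # so 37 segments under a 35-way limit needs two passes.
    steps = 0
    while num_segments > 1:
        merged = min(max_merge_at_once, num_segments)
        num_segments = num_segments - merged + 1
        steps += 1
    return steps

print(optimize_steps(37, 35))  # 2
print(optimize_steps(37, 70))  # 1: why raising maxMergeAtOnceExplicit helps
```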
Alas, no, not yet ... grouping/field collapse has had a long history
with Solr.
There were many iterations on SOLR-236, but that impl was never
committed. Instead, SOLR-1682 was committed, but committed only to
trunk (never backported to 3.x despite requests).
Then, a new grouping module was