Re: Details on setting block parameters for Lucene41PostingsFormat

2015-01-10 Thread Michael McCandless
The first int to Lucene41PostingsFormat is the min block size (default 25) and the second is the max (default 48) for the block tree terms dict. The max must be >= 2*(min-1). Since you were using 8X the default before, maybe try min=200 and max=398? However, block tree should have been more RAM
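
The constraint on the two block sizes can be checked with a few lines of plain Java. This is only a sketch: `BlockTreeSizes`, `isValid` and `smallestMaxFor` are hypothetical names, not part of Lucene, and the `min > 0` check is an added sanity assumption rather than something stated in the post.

```java
// Hypothetical helper (not a Lucene class): checks the rule quoted in the post,
// max >= 2*(min-1), for the two ints passed to Lucene41PostingsFormat.
final class BlockTreeSizes {

    // True when the pair satisfies max >= 2*(min-1).
    // The min > 0 check is an assumption added here for sanity only.
    static boolean isValid(int min, int max) {
        return min > 0 && max >= 2 * (min - 1);
    }

    // Smallest max that the rule allows for a given min.
    static int smallestMaxFor(int min) {
        return 2 * (min - 1);
    }

    public static void main(String[] args) {
        System.out.println(isValid(25, 48));     // defaults from the post: true
        System.out.println(isValid(200, 398));   // suggested values: true
        System.out.println(isValid(200, 300));   // max too small: false
        System.out.println(smallestMaxFor(200)); // 398
    }
}
```

A pair that passes the check would then be handed to the postings format as the first (min) and second (max) int, as described in the post.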

Re: index writer closes due to OOM/heap space issue but no recovery after GC

2015-01-10 Thread Michael McCandless
IW closes itself on tragic events like OOME to guard against index corruption. Mike McCandless http://blog.mikemccandless.com On Fri, Jan 9, 2015 at 4:04 PM, Tom Burton-West tburt...@umich.edu wrote: Hello, I'm testing Solr 4.10.2 with 4GB allocated to the heap. During the indexing

Re: Finding a match for an automaton against a FST

2015-01-10 Thread Michael McCandless
On Fri, Jan 9, 2015 at 6:42 AM, Olivier Binda olivier.bi...@wanadoo.fr wrote: Hello. 1) What is the best way to check if an automaton (from a regex or a string with a wildcard) has at least 1 match against a FST (from a WFSTCompletionLookup) ? You need to implement intersect. We already
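
The idea of an intersect that stops at the first match can be sketched without Lucene. Everything below is a toy under stated assumptions: a plain trie stands in for the FST, a small hand-built DFA stands in for the automaton, and `IntersectSketch`, `TrieNode`, `Dfa`, `prefixWildcard` and `hasMatch` are hypothetical names, not Lucene APIs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy sketch: walk a trie (stand-in for the FST) and a DFA (stand-in for the
// automaton) in lockstep and stop at the first string both accept.
final class IntersectSketch {

    static final class TrieNode {
        final Map<Character, TrieNode> next = new TreeMap<>();
        boolean isFinal; // true if a stored word ends here
    }

    static TrieNode buildTrie(String... words) {
        TrieNode root = new TrieNode();
        for (String w : words) {
            TrieNode n = root;
            for (char c : w.toCharArray()) {
                n = n.next.computeIfAbsent(c, k -> new TrieNode());
            }
            n.isFinal = true;
        }
        return root;
    }

    // Minimal DFA: explicit per-character transitions plus an optional
    // "any other character" target (-1 means no such transition).
    static final class Dfa {
        private final List<Map<Character, Integer>> explicit = new ArrayList<>();
        private final List<Integer> otherwise = new ArrayList<>();
        private final List<Boolean> accepting = new ArrayList<>();

        int addState(boolean accept) {
            explicit.add(new HashMap<>());
            otherwise.add(-1);
            accepting.add(accept);
            return explicit.size() - 1;
        }
        void addTransition(int from, char c, int to) { explicit.get(from).put(c, to); }
        void setOtherwise(int from, int to) { otherwise.set(from, to); }
        boolean isAccept(int s) { return accepting.get(s); }
        int step(int s, char c) {
            Integer t = explicit.get(s).get(c);
            return t != null ? t : otherwise.get(s);
        }
    }

    // DFA for "prefix*": the prefix followed by any characters. Start state is 0.
    static Dfa prefixWildcard(String prefix) {
        Dfa dfa = new Dfa();
        int state = dfa.addState(prefix.isEmpty());
        for (int i = 0; i < prefix.length(); i++) {
            int next = dfa.addState(i == prefix.length() - 1);
            dfa.addTransition(state, prefix.charAt(i), next);
            state = next;
        }
        dfa.setOtherwise(state, state); // loop on any character after the prefix
        return dfa;
    }

    // Depth-first search over (trie node, DFA state) pairs; returns on first match.
    static boolean hasMatch(TrieNode node, Dfa dfa, int state) {
        if (node.isFinal && dfa.isAccept(state)) return true;
        for (Map.Entry<Character, TrieNode> e : node.next.entrySet()) {
            int to = dfa.step(state, e.getKey());
            if (to >= 0 && hasMatch(e.getValue(), dfa, to)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        TrieNode root = buildTrie("cat", "car", "dog");
        System.out.println(hasMatch(root, prefixWildcard("ca"), 0));  // true
        System.out.println(hasMatch(root, prefixWildcard("cow"), 0)); // false
    }
}
```

Because the search only follows trie arcs that the DFA can also take, dead branches are pruned early, which is the point of intersecting rather than enumerating every stored string.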

Re: Finding a match for an automaton against a FST

2015-01-10 Thread Olivier Binda
On 01/10/2015 11:00 AM, Michael McCandless wrote: On Fri, Jan 9, 2015 at 6:42 AM, Olivier Binda olivier.bi...@wanadoo.fr wrote: Hello. 1) What is the best way to check if an automaton (from a regex or a string with a wildcard) has at least 1 match against a FST (from a WFSTCompletionLookup) ?

RE: cloning a NumericTermAttributeImpl

2015-01-10 Thread Uwe Schindler
Hi, I checked it out a second time. We *can* implement deep clone. Actually this is a bug from the time when we changed to BytesRefBuilder. I opened https://issues.apache.org/jira/browse/LUCENE-6173 about this. Thanks. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen

Questions regarding Lucene 5

2015-01-10 Thread Elad Margalit
Hi, I would like to ask about Lucene 5. Do you have any estimate of when it will be ready? Another small question: will Lucene 5 use the RoarBitset for faceted search? Thanks, Sincerely, Elad Margalit

Re: Details on setting block parameters for Lucene41PostingsFormat

2015-01-10 Thread Tom Burton-West
Thanks Mike, We run our Solr 3.x indexing with 10GB/shard. I've been testing Solr 4 with 4, 6, and 8GB for heap. As of Friday night, when the indexes were about half done (about 400GB on disk), only the 4GB had issues. I'll find out on Monday if the other runs had issues. If we can go from 10GB

Re: Details on setting block parameters for Lucene41PostingsFormat

2015-01-10 Thread Erick Erickson
Tom: I'll be very interested to see your final numbers. I did a worst-case test at one point and saw a 2/3 reduction, but that was deliberately worst case, I used a bunch of string/text types, did some faceting on them, etc, IOW not real-world at all. So it'll be cool to see what you come up

Re: Questions regarding Lucene 5

2015-01-10 Thread Jack Krupansky
When? Soon! How soon? U... next question! My guess is within a few weeks - or as soon as the Solr guys finish up. Lucene 5 does have RoaringDocIdSet, an implementation of the Roaring Bitmaps paper. It can be used anywhere a DocIdSet can be used, which would include the facet module, but you