Re: Details on setting block parameters for Lucene41PostingsFormat

2015-01-10 Thread Erick Erickson
Tom: I'll be very interested to see your final numbers. I did a worst-case test at one point and saw a 2/3 reduction, but that was deliberately "worst case", I used a bunch of string/text types, did some faceting on them, etc, IOW not real-world at all. So it'll be cool to see what you come up

Re: Details on setting block parameters for Lucene41PostingsFormat

2015-01-10 Thread Tom Burton-West
Thanks Mike, we run our Solr 3.x indexing with 10GB/shard. I've been testing Solr 4 with 4, 6, and 8GB for heap. As of Friday night when the indexes were about half done (about 400GB on disk) only the 4GB had issues. I'll find out on Monday if the other runs had issues. If we can go from 10GB i

Re: Questions regarding Lucene 5

2015-01-10 Thread Jack Krupansky
When? Soon! How soon? U... next question! My guess is within a few weeks - or as soon as the Solr guys finish up. Lucene 5 does have RoaringDocIdSet, an implementation of the Roaring Bitmaps paper. It can be used anywhere a DocIdSet can be used, which would include the facet module, but you

Questions regarding Lucene 5

2015-01-10 Thread Elad Margalit
Hi, I would like to ask about Lucene 5. Do you have an estimate of when it will be ready? Another small question: will Lucene 5 use the RoarBitset for faceted search? Thanks, Sincerely, Elad Margalit

Re: Finding a match for an automaton against a FST

2015-01-10 Thread Olivier Binda
On 01/10/2015 11:00 AM, Michael McCandless wrote: On Fri, Jan 9, 2015 at 6:42 AM, Olivier Binda wrote: Hello. 1) What is the best way to check if an automaton (from a regex or a string with a wildcard) has at least 1 match against a FST (from a WFSTCompletionLookup) ? You need to implement "i

RE: cloning a NumericTermAttributeImpl

2015-01-10 Thread Uwe Schindler
Hi, I checked it out a second time. We *can* implement deep clone. Actually this is a bug from the time when we changed to BytesRefBuilder. I opened https://issues.apache.org/jira/browse/LUCENE-6173 about this. Thanks. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.the

RE: cloning a NumericTermAttributeImpl

2015-01-10 Thread Uwe Schindler
Hi, NumericTokenStream is an internal class to implement NumericField. It is only public for historical reasons (was used by Solr) and because it is in another package. The Attributes it uses do not implement clone, because they have some internal state. Cloning is also not needed, because in "

Re: index writer closes due to OOM/heap space issue but no recovery after GC

2015-01-10 Thread Michael McCandless
IW closes itself on "tragic" events like OOME to guard against index corruption. Mike McCandless http://blog.mikemccandless.com On Fri, Jan 9, 2015 at 4:04 PM, Tom Burton-West wrote: > Hello, > > I'm testing Solr 4.10.2 with 4GB allocated to the heap. During the indexing > process I get an

Re: Finding a match for an automaton against a FST

2015-01-10 Thread Michael McCandless
On Fri, Jan 9, 2015 at 6:42 AM, Olivier Binda wrote: > Hello. > > 1) What is the best way to check if an automaton (from a regex or a string > with a wildcard) > has at least 1 match against a FST (from a WFSTCompletionLookup) ? You need to implement "intersect". We already have this method for

Re: Details on setting block parameters for Lucene41PostingsFormat

2015-01-10 Thread Michael McCandless
The first int to Lucene41PostingsFormat is the min block size (default 25) and the second is the max (default 48) for the block tree terms dict. The max must be >= 2*(min-1). Since you were using 8X the default before, maybe try min=200 and max=398? However, block tree should have been more RAM
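The constraint quoted in this message, max >= 2*(min-1), can be checked with a few lines of arithmetic. The following is only a sketch in Python: the function names are hypothetical helpers for illustration, not part of Lucene, and it encodes nothing beyond the rule and the defaults stated above.

```python
# Sketch only: checks the rule quoted in the message, max >= 2*(min-1).
# These helpers are hypothetical and are not part of the Lucene API.

DEFAULT_MIN_BLOCK_SIZE = 25  # default min block size, as stated in the message
DEFAULT_MAX_BLOCK_SIZE = 48  # default max block size, as stated in the message


def smallest_valid_max(min_block_size: int) -> int:
    """Return the smallest max block size the quoted rule allows for a given min."""
    return 2 * (min_block_size - 1)


def is_valid_block_sizes(min_block_size: int, max_block_size: int) -> bool:
    """Return True when the pair satisfies max >= 2*(min-1)."""
    return max_block_size >= smallest_valid_max(min_block_size)


if __name__ == "__main__":
    # The defaults, the suggested pair, and one pair that breaks the rule.
    for min_size, max_size in [(25, 48), (200, 398), (200, 300)]:
        print(min_size, max_size, is_valid_block_sizes(min_size, max_size))
```

Under this rule the defaults (25, 48) and the suggested pair (200, 398) both sit exactly on the boundary, since 2*(25-1) = 48 and 2*(200-1) = 398.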