Re: Details on setting block parameters for Lucene41PostingsFormat

2015-01-13 Thread Chris Hostetter
: : The first int to Lucene41PostingsFormat is the min block size (default 25) and the second is the max (default 48) for the block tree terms dict. We were discussing over on the solr-user mailing list how Tom would/could go about configuring Solr to use a custom subclass of Lucene41Postin

Re: Details on setting block parameters for Lucene41PostingsFormat

2015-01-12 Thread Tom Burton-West
Thanks Mike, Do you know how I can configure Solr to use the min=200 and max=398 block sizes you suggested? Or should I ask on the Solr list? Tom On Sat, Jan 10, 2015 at 4:46 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > The first int to Lucene41PostingsFormat is the min block s

Re: Details on setting block parameters for Lucene41PostingsFormat

2015-01-12 Thread Tom Burton-West
Thanks Mike, > OK. It would be good to know where all your RAM is being consumed, > and how much of that is really the terms index: it ought to be a very > small part of it. > > I made a bunch of heap dumps. I just watched with jconsole and ran jmap -histo when memory use got high. I've appende

Re: Details on setting block parameters for Lucene41PostingsFormat

2015-01-11 Thread Michael McCandless
On Sat, Jan 10, 2015 at 7:58 PM, Tom Burton-West wrote: > Thanks Mike, > > We run our Solr 3.x indexing with 10GB/shard. I've been testing Solr 4 > with 4, 6, and 8GB for heap. As of Friday night when the indexes were about > half done (about 400GB on disk) only the 4GB had issues. I'll find out

Re: Details on setting block parameters for Lucene41PostingsFormat

2015-01-10 Thread Erick Erickson
Tom: I'll be very interested to see your final numbers. I did a worst-case test at one point and saw a 2/3 reduction, but that was deliberately "worst case": I used a bunch of string/text types, did some faceting on them, etc.; IOW, not real-world at all. So it'll be cool to see what you come up

Re: Details on setting block parameters for Lucene41PostingsFormat

2015-01-10 Thread Tom Burton-West
Thanks Mike, We run our Solr 3.x indexing with 10GB/shard. I've been testing Solr 4 with 4, 6, and 8GB for heap. As of Friday night when the indexes were about half done (about 400GB on disk) only the 4GB had issues. I'll find out on Monday if the other runs had issues. If we can go from 10GB i

Re: Details on setting block parameters for Lucene41PostingsFormat

2015-01-10 Thread Michael McCandless
The first int to Lucene41PostingsFormat is the min block size (default 25) and the second is the max (default 48) for the block tree terms dict. The max must be >= 2*(min-1). Since you were using 8X the default before, maybe try min=200 and max=398? However, block tree should have been more RAM
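A minimal Java sketch of the block-size rule quoted above (max >= 2*(min-1)) and of how min=200/max=398 follows from scaling the default min by 8. BlockSizeCheck and its methods are hypothetical helpers, not Lucene API; only the rule stated in this message is checked, and Lucene may enforce further constraints of its own.

```java
// Hypothetical helper, not part of Lucene: checks only the rule quoted in
// this thread for the block tree terms dict, max >= 2*(min-1).
final class BlockSizeCheck {
    // Defaults quoted in the thread.
    static final int DEFAULT_MIN = 25;
    static final int DEFAULT_MAX = 48;

    // True if the (min, max) pair satisfies max >= 2*(min-1).
    static boolean isValid(int min, int max) {
        return max >= 2 * (min - 1);
    }

    // Smallest max that satisfies the rule for a given min.
    static int smallestMax(int min) {
        return 2 * (min - 1);
    }

    public static void main(String[] args) {
        int min = 8 * DEFAULT_MIN;     // 200, i.e. 8x the default min
        int max = smallestMax(min);    // 398
        System.out.println("min=" + min + " max=" + max + " valid=" + isValid(min, max));
        System.out.println("defaults valid=" + isValid(DEFAULT_MIN, DEFAULT_MAX));
        System.out.println("200/397 valid=" + isValid(200, 397));
    }
}
```

Choosing max as exactly 2*(min-1) reproduces both the default pair (25, 48) and the suggested pair (200, 398); a max of 397 with min=200 would violate the rule.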

Details on setting block parameters for Lucene41PostingsFormat

2015-01-09 Thread Tom Burton-West
Hello all, We have over 3 billion unique terms in our indexes and with Solr 3.x we set the TermIndexInterval to about 8 times its default value in order to index without OOMs. (http://www.hathitrust.org/blogs/large-scale-search/too-many-words-again) We are now working with Solr 4 and running in
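A back-of-the-envelope Java sketch of why raising TermIndexInterval reduced memory in Solr 3.x. It assumes the Lucene 3.x terms index keeps roughly one in-memory entry per TermIndexInterval terms and assumes a default interval of 128, so "about 8 times" is taken as 1024. TermIndexEstimate is a hypothetical helper; real counts are per segment, and per-entry overhead is not modeled.

```java
// Hypothetical estimate, not Lucene code: assumes one in-memory terms-index
// entry per `interval` unique terms (Lucene 3.x style), ignoring segments.
final class TermIndexEstimate {
    // Assumption: Lucene 3.x default termIndexInterval.
    static final int DEFAULT_TERM_INDEX_INTERVAL = 128;

    // Ceiling division: terms 0, interval, 2*interval, ... get an entry.
    static long indexedTerms(long uniqueTerms, int interval) {
        return (uniqueTerms + interval - 1) / interval;
    }

    public static void main(String[] args) {
        long uniqueTerms = 3_000_000_000L;              // "over 3 billion" per the thread
        int raised = 8 * DEFAULT_TERM_INDEX_INTERVAL;   // 1024, "about 8 times" the default
        System.out.println("default interval: " + indexedTerms(uniqueTerms, DEFAULT_TERM_INDEX_INTERVAL));
        System.out.println("raised interval:  " + indexedTerms(uniqueTerms, raised));
    }
}
```

Under these assumptions the in-memory entry count falls from about 23.4 million to about 2.9 million, which is the kind of saving the min=200/max=398 block sizes are meant to approximate for the Lucene 4 block tree terms dict.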