On 1/3/2015 9:02 AM, Erick Erickson wrote:
> bq: For Solr 5 why don't we switch it to 64 bit ??
>
> -1 on this for a couple of reasons:
>> it'd be pretty invasive, and 5.0 may be imminent. Far too big a change
>> to implement at the last second.
>> It's not clear that it's even useful. Once you get to that many
>> documents, performance usually suffers.
>
> Of course I wouldn't be doing the work so I really don't have much of
> a vote, but it's not clear to me at all that enough people would
> actually have a use-case for 2b+ docs in a single shard to make it
> worthwhile. At that scale GC potentially becomes really unpleasant,
> for instance....
I agree, 2 billion documents in a single index is MORE than enough. If
you actually create an index that large, you're going to have
performance problems, and most of those problems will likely be related
to garbage collection. I can extrapolate one such problem from personal
experience on a much smaller index.

A filterCache entry for a 2 billion document index is 256MB in size,
one bit per document (there's a quick sketch of the arithmetic below).
Assuming you're using the G1 collector, the maximum size for a G1 heap
region is 32MB, which means that at that size, every single filter
becomes an object allocated directly in the old generation (a
"humongous allocation"). Allocating that much memory in the old
generation will eventually (and frequently) trigger a full garbage
collection ... and you do not want your application to wait for a full
collection on the heap size a 2 billion document index would require.
It could easily exceed 30 or 60 seconds.

Given the current limitations of G1GC, it would be advisable to keep
each Solr index below 100 million documents. At 134,217,728 documents
(2^27), each filter bitset reaches 16MB, half of the 32MB maximum
region size, which is exactly the point where G1 stops treating an
allocation as normal and makes it humongous.

Even with the older, battle-tested CMS collector (assuming good tuning
options), I think the huge object sizes (and the huge number of smaller
objects) produced by a 2 billion document index will cause major
garbage collection problems.

Thanks,
Shawn
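P.S. For anyone who wants to check the arithmetic, here's a rough
back-of-the-envelope sketch in Java. It assumes exactly what I
described above: a cached filter costs one bit per document over
maxDoc, G1 regions max out at 32MB, and any allocation of at least
half a region is humongous. The class and the doc counts are just my
illustration, not anything taken from Solr's code:

public class FilterCacheMath {
    public static void main(String[] args) {
        // Doc counts to test: the Lucene per-index ceiling (2^31 - 1),
        // the 2^27 threshold mentioned above, and a 100M doc index.
        long[] docCounts = { Integer.MAX_VALUE, 134_217_728L, 100_000_000L };

        // G1 heap regions are capped at 32MB; an allocation of at least
        // half a region skips the young generation ("humongous").
        long maxRegionBytes = 32L * 1024 * 1024;
        long humongousThreshold = maxRegionBytes / 2;

        for (long docs : docCounts) {
            // One bit per document, rounded up to whole bytes.
            long bitsetBytes = (docs + 7) / 8;
            System.out.printf("%,13d docs -> %6.1f MB filter, humongous: %b%n",
                    docs, bitsetBytes / (1024.0 * 1024.0),
                    bitsetBytes >= humongousThreshold);
        }
    }
}

That prints 256MB, 16MB, and about 12MB respectively, with only the
100 million doc filter staying under the humongous threshold, which is
where my "keep it below 100 million" guideline comes from.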