Erick Erickson [erickerick...@gmail.com] wrote:
> Of course I wouldn't be doing the work so I really don't have much of
> a vote, but it's not clear to me at all that enough people would actually
> have a use-case for 2b+ docs in a single shard to make it
> worthwhile. At that scale GC potentially becomes really unpleasant for
> instance....

Over the last few years we have seen a few such use cases here on the mailing
list, and I would be very surprised if their number does not keep rising.
Currently the rewards of a complete overhaul do not justify the work, but that
is slowly changing. At the very least I find it prudent not to limit new
Lucene/Solr interfaces to ints.

As for GC: Right now a lot of structures are single-array oriented (for example
using a long-array to represent the bits in a bitset), which might not work well
with current garbage collectors. A change to higher limits also means
re-thinking such approaches: If the garbage collector prefers objects below a
certain size, then split the arrays into chunks of that size. Likewise,
iterations over structures whose size is linear in the index size could be
threaded. These are issues even with the current 2b limitation.
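
To make the chunking idea concrete, here is a minimal sketch in Java. It is not
existing Lucene code: the class name ChunkedBitSet, the 8 KB chunk size and the
method names are all made up for illustration, and the right chunk size would
depend on the collector in use. Bits are addressed by a long, the backing store
is split into small long-arrays instead of one huge array, and the counting
loop runs over the chunks in parallel.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical sketch: a bitset addressed by long, backed by small chunks.
final class ChunkedBitSet {
    // 2^16 bits per chunk = 1024 longs = 8 KB per array (assumed, tune per GC).
    private static final int CHUNK_SHIFT = 16;
    private static final int WORDS_PER_CHUNK = 1 << (CHUNK_SHIFT - 6);
    private static final long CHUNK_MASK = (1L << CHUNK_SHIFT) - 1;

    private final long[][] chunks; // chunks are allocated lazily on first set()
    private final long numBits;

    ChunkedBitSet(long numBits) {
        if (numBits < 0) {
            throw new IllegalArgumentException("numBits must be >= 0");
        }
        long chunkCount = (numBits >>> CHUNK_SHIFT)
                + ((numBits & CHUNK_MASK) != 0 ? 1 : 0);
        if (chunkCount > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("too many chunks: " + chunkCount);
        }
        this.numBits = numBits;
        this.chunks = new long[(int) chunkCount][];
    }

    void set(long index) {
        checkIndex(index);
        int c = (int) (index >>> CHUNK_SHIFT);
        long[] chunk = chunks[c];
        if (chunk == null) {
            chunk = chunks[c] = new long[WORDS_PER_CHUNK];
        }
        int bit = (int) (index & CHUNK_MASK); // bit offset inside the chunk
        chunk[bit >>> 6] |= 1L << (bit & 63);
    }

    boolean get(long index) {
        checkIndex(index);
        long[] chunk = chunks[(int) (index >>> CHUNK_SHIFT)];
        if (chunk == null) {
            return false; // never touched, so all bits are zero
        }
        int bit = (int) (index & CHUNK_MASK);
        return (chunk[bit >>> 6] & (1L << (bit & 63))) != 0;
    }

    // A scan that is linear in index size, spread over threads chunk by chunk.
    long cardinality() {
        return Arrays.stream(chunks)
                .parallel()
                .filter(Objects::nonNull)
                .mapToLong(chunk -> {
                    long n = 0;
                    for (long word : chunk) {
                        n += Long.bitCount(word);
                    }
                    return n;
                })
                .sum();
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= numBits) {
            throw new IndexOutOfBoundsException("index " + index);
        }
    }
}
```

With this layout, new ChunkedBitSet(3_000_000_000L) followed by
set(2_500_000_000L) works even though the index does not fit in an int, and
only the chunks that are actually touched get allocated. The price is one extra
indirection per access.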

- Toke Eskildsen
