On 4/12/2018 4:57 AM, neotorand wrote:
I read from the link you shared that
"Shard cannot contain more than 2 billion documents since Lucene is using
integer for internal IDs."
In which Java class of the Solr implementation repository can this be found?
The 2 billion limit is a *hard* limit from Lucene. It's not in Solr.
It's pretty much the only hard limit that Lucene actually has - there's
a workaround for everything else. Solr can overcome this limit for a
single logical index by sharding it into multiple physical indexes
across multiple servers, which is more automated in SolrCloud than in
standalone mode.
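Just to make the sharding idea concrete, here's a rough SolrJ sketch
(the collection name, shard count, and ZooKeeper address are invented
for the example, and the exact CloudSolrClient.Builder signature
differs between SolrJ versions):

    import java.util.Collections;
    import java.util.Optional;

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;

    public class CreateShardedCollection {
        public static void main(String[] args) throws Exception {
            // "localhost:2181" and "bigcollection" are placeholders.
            try (CloudSolrClient client = new CloudSolrClient.Builder(
                    Collections.singletonList("localhost:2181"),
                    Optional.empty()).build()) {
                // Four shards, two replicas each. Every shard is its
                // own Lucene index, so the logical collection can hold
                // roughly four times the per-index document limit.
                CollectionAdminRequest
                    .createCollection("bigcollection", "_default", 4, 2)
                    .process(client);
            }
        }
    }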
The 2 billion limit per individual index can't be raised. Lucene uses
an "int" datatype to hold the internal document ID everywhere it's
used. Java numeric types are signed, so the maximum value a 32-bit int
can hold is 2147483647, the value returned by the Java constant
Integer.MAX_VALUE. Lucene subtracts a small safety margin from that
value to get the limit it actually enforces, to be absolutely sure it
can't go over.
https://issues.apache.org/jira/browse/LUCENE-5843
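To point at an actual location in the code, as asked above: as far as
I know, since that change the enforced cap is the MAX_DOCS constant in
org.apache.lucene.index.IndexWriter (not anywhere in Solr itself) -
check your Lucene version's source to confirm. A tiny sketch of the
arithmetic:

    // Sketch of the arithmetic only. In Lucene the enforced constant
    // is (to the best of my knowledge) IndexWriter.MAX_DOCS, a small
    // headroom below Integer.MAX_VALUE -- verify the exact value in
    // your Lucene version's source.
    public class MaxDocsDemo {
        public static void main(String[] args) {
            int signedIntMax = Integer.MAX_VALUE; // 2147483647
            int headroom = 128;                   // assumed safety margin
            System.out.println("Integer.MAX_VALUE: " + signedIntMax);
            System.out.println("Approx. enforced cap: "
                    + (signedIntMax - headroom));
        }
    }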
Raising the limit is theoretically possible, but not without *MAJOR*
surgery to a very large portion of Lucene's code. The risk of
introducing bugs with a change like that is *VERY* high -- it could
easily take months to find and fix them all.
The two most popular search engines built on Lucene are Solr and
Elasticsearch. Both of these packages can overcome the 2 billion limit
with sharding.
Summary: The 2 billion document limit can be frustrating, but an index
that large on a single machine is very unlikely to perform well and
should be split across several machines anyway, so there's almost no
value in raising the limit and risking a large number of software bugs.
Thanks,
Shawn