Re: maximum index size
Chris Fraschetti wrote:
> I've seen throughout the list mentions of millions of documents... 8 million, 20 million, etc., but can Lucene potentially handle billions of documents and still efficiently search through them?

Lucene can currently handle up to 2^31 documents in a single index. To a large degree this is limited by Java ints and arrays (which are indexed by ints). There are also a few places where the file format limits things to 2^32.

On typical PC hardware, two- or three-word searches of an index with 10M documents, each with around 10k of text, take around one second, including index I/O time. Performance is more or less linear, so a 100M-document index might require nearly 10 seconds per search. Thus, as indexes grow, folks tend to distribute searches in parallel across many smaller indexes. That's what Nutch and Google (http://www.computer.org/micro/mi2003/m2022.pdf) do.

Doug

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
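Doug's back-of-the-envelope numbers can be put into a small sketch. This is not Lucene API code; it is a hypothetical model (class and method names are illustrative) that encodes his two assumptions: document numbers are Java ints, so a single index tops out near 2^31, and search time scales roughly linearly at about one second per 10M documents, so sharding in parallel divides latency by the shard count.

```java
// Back-of-the-envelope model of the numbers in Doug's post (illustrative only,
// not a Lucene API): why big collections get split into parallel shards.
public class ShardingEstimate {
    // Lucene document numbers are Java ints, so one index is capped
    // near Integer.MAX_VALUE (2^31 - 1) documents.
    static final long MAX_DOCS_PER_INDEX = Integer.MAX_VALUE;

    // Assumed figure from the post: ~1 second per 10M documents, roughly linear.
    static double searchSeconds(long numDocs) {
        return numDocs / 10_000_000.0;
    }

    // With shards searched in parallel, latency is governed by the largest shard.
    static double parallelSearchSeconds(long totalDocs, int shards) {
        long perShard = (totalDocs + shards - 1) / shards; // ceiling division
        return searchSeconds(perShard);
    }

    public static void main(String[] args) {
        System.out.println("Max docs per index:   " + MAX_DOCS_PER_INDEX);
        System.out.println("100M docs, 1 shard:   ~" + parallelSearchSeconds(100_000_000, 1) + " s");
        System.out.println("100M docs, 10 shards: ~" + parallelSearchSeconds(100_000_000, 10) + " s");
    }
}
```

Under these assumptions, a 100M-document collection searched as one index costs ~10 s per query, but split across 10 indexes searched in parallel it returns to ~1 s, which is the Nutch/Google approach Doug describes.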
Re: maximum index size
Given adequate hardware, it can. Take a look at nutch.org; Nutch uses Lucene at its core.

Otis

--- Chris Fraschetti <[EMAIL PROTECTED]> wrote:
> I know the index size is very dependent on the content being indexed, but running on a Unix-based machine without a file-size limit, best-case scenario, what is the largest number of documents that can be indexed?
>
> I've seen throughout the list mentions of millions of documents... 8 million, 20 million, etc., but can Lucene potentially handle billions of documents and still efficiently search through them?
maximum index size
I know the index size is very dependent on the content being indexed, but running on a Unix-based machine without a file-size limit, best-case scenario, what is the largest number of documents that can be indexed?

I've seen throughout the list mentions of millions of documents... 8 million, 20 million, etc., but can Lucene potentially handle billions of documents and still efficiently search through them?