The Salesforce book is 2800 pages of PDF, last I looked. What can you do with a field that big? Can you get all of the snippets?
On Tue, Jun 7, 2011 at 5:33 PM, Fuad Efendi <f...@efendi.ca> wrote:
> Hi Otis,
>
> I am recalling the "pagination" feature; it is still unresolved (with the
> default scoring implementation): even with small documents, retrieving
> search results 1 to 10 can take 0 milliseconds, while retrieving results
> 100,000 to 100,010 can take a few minutes (I saw this with the trunk
> version 6 months ago, with very small documents, 100 million docs total).
> It is advisable to restrict search results to the top 1,000 in any case
> (as Google does)...
>
> I believe things can go wrong; yes, most plain text extracted from books
> should be about 2 KB per page, 500 pages => 1,000,000 bytes (or double
> that for UTF-8).
>
> Theoretically, it doesn't make any sense to index a BIG document
> containing every term in the dictionary without any term-frequency
> calculations, but even with them... I can't imagine we should index
> thousands of docs where each one is just a (different) version of the
> whole Wikipedia; that has to be wrong design...
>
> OK, use case: index a single HUGE document. What will we do? Create an
> index with _the only_ document? Then every search returns the same result
> (or nothing)? Paginate it; split it into pages. I am pragmatic...
>
> Fuad
>
>
> On 11-06-07 8:04 PM, "Otis Gospodnetic" <otis_gospodne...@yahoo.com> wrote:
>
>> Hi,
>>
>>> I think the question is strange... Maybe you are wondering about
>>> possible OOM exceptions?
>>
>> No, that's an easier one. I was more wondering whether with 400 MB
>> fields (indexed, not stored) it becomes incredibly slow to:
>> * analyze
>> * commit / write to disk
>> * search
>>
>>> I think we can pass to Lucene a single document containing a
>>> comma-separated list of "term, term, ..." (a few billion times)...
>>> Except for "stored" and "TermVectorComponent"...


--
Lance Norskog
goks...@gmail.com
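
The deep-paging slowdown Fuad describes is inherent to offset paging: to
return hits 100,000 to 100,010, the searcher must first collect and rank the
top 100,010 hits in a priority queue, so the cost grows with the offset
rather than the page size. Below is a minimal Lucene sketch of both the
problem and the cursor-style workaround via searchAfter (an API added to
Lucene after this thread; the index path and query are placeholders):

    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.MatchAllDocsQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.FSDirectory;
    import java.nio.file.Paths;

    public class DeepPaging {
        public static void main(String[] args) throws Exception {
            try (DirectoryReader reader = DirectoryReader.open(
                    FSDirectory.open(Paths.get("/path/to/index")))) {
                IndexSearcher searcher = new IndexSearcher(reader);
                Query query = new MatchAllDocsQuery();

                // Offset paging: showing hits 100,000..100,009 forces the
                // searcher to collect the top 100,010 hits and slice the tail.
                TopDocs deepPage = searcher.search(query, 100_010);

                // Cursor paging: resume from the last hit of the previous
                // page, so each request only collects the next 10 hits.
                ScoreDoc last = deepPage.scoreDocs[deepPage.scoreDocs.length - 1];
                TopDocs nextPage = searcher.searchAfter(last, query, 10);
            }
        }
    }

Capping results at the top 1,000, as Fuad suggests, sidesteps the problem
entirely.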
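
Fuad's "split it into pages" advice also answers Lance's snippet question:
if each page is indexed as its own document, hits and highlighted snippets
come back at page granularity instead of one enormous match. A rough sketch,
assuming a fixed 2 KB page size and hypothetical field names (book, page,
body):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.StoredField;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;

    public class PageSplitter {
        // Index one huge text as many small per-page documents.
        static void indexByPage(IndexWriter writer, String bookId, String text)
                throws Exception {
            final int PAGE_CHARS = 2_000;  // roughly 2 KB of plain text per "page"
            for (int i = 0, page = 1; i < text.length(); i += PAGE_CHARS, page++) {
                String chunk = text.substring(i,
                        Math.min(i + PAGE_CHARS, text.length()));
                Document doc = new Document();
                doc.add(new StringField("book", bookId, Field.Store.YES)); // key back to the book
                doc.add(new StoredField("page", page));                    // page number for display
                doc.add(new TextField("body", chunk, Field.Store.YES));    // searchable page text
                writer.addDocument(doc);
            }
        }
    }

A real splitter would break on page or paragraph boundaries rather than a
fixed character count, so no token is cut in half.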
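
On Otis's 400 MB indexed-but-not-stored case: analysis at least need not
hold the whole value in memory, because Lucene can tokenize a field from a
Reader and stream through it. A sketch, with the file path as a placeholder
(Reader-valued fields are always unstored):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;
    import java.io.Reader;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class HugeFieldIndexer {
        // Stream a huge field value into the analyzer instead of
        // materializing it as one 400 MB String.
        static void indexHugeField(IndexWriter writer) throws Exception {
            Reader content = Files.newBufferedReader(Paths.get("/path/to/huge.txt"));
            Document doc = new Document();
            doc.add(new TextField("body", content)); // indexed, tokenized, not stored
            writer.addDocument(doc);
        }
    }

Commit and merge are another story: a single 400 MB field still yields
millions of postings for one document, so flush and merge times grow with it
even if memory stays flat.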