Re: Algorithm of retrieving docs

2014-02-13 Thread Harshvardhan Ojha
Thanks, Michael, for the help; this solved my problem.

Regards,
Harshvardhan Ojha

On Thu, Feb 13, 2014 at 8:51 PM, Michael McCandless wrote:
> The bloom filter is only used by the postings format wrapper, and
> we've had mixed results on whether it helps performanc…

Re: Algorithm of retrieving docs

2014-02-13 Thread Michael McCandless
The bloom filter is only used by the postings format wrapper, and we've had mixed results on whether it helps performance or not (it seems to depend heavily on the exact usage). We have bit set / iterator abstractions (oal.util.Bits, oal.search.DocIdSet/Iterator) to manage "sets" of documents, but mo…
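To make the "sets of documents" idea concrete, here is a minimal sketch in plain Java. It does not use Lucene; BitSetDocIdSet, advance, and toList are hypothetical names chosen for illustration. It shows the two access patterns such abstractions are built around: random-access membership (in the spirit of Bits.get) and forward iteration in increasing docID order (in the spirit of DocIdSetIterator.advance).

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch (not Lucene's API): a set of docIDs backed by java.util.BitSet.
final class BitSetDocIdSet {
    // End-of-iteration sentinel; mirrors the value Lucene's DocIdSetIterator.NO_MORE_DOCS uses.
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final BitSet bits;

    BitSetDocIdSet(int maxDoc) {
        this.bits = new BitSet(maxDoc);
    }

    private BitSetDocIdSet(BitSet bits) {
        this.bits = bits;
    }

    void add(int docId) {
        bits.set(docId);
    }

    // Random access: is this docID in the set?
    boolean contains(int docId) {
        return bits.get(docId);
    }

    // Forward iteration: smallest docID >= target, or NO_MORE_DOCS if none remain.
    int advance(int target) {
        int next = bits.nextSetBit(target);
        return next < 0 ? NO_MORE_DOCS : next;
    }

    // Intersection, e.g. documents that match both of two clauses.
    BitSetDocIdSet and(BitSetDocIdSet other) {
        BitSet copy = (BitSet) bits.clone();
        copy.and(other.bits);
        return new BitSetDocIdSet(copy);
    }

    // Collect all docIDs in increasing order by repeatedly advancing.
    List<Integer> toList() {
        List<Integer> out = new ArrayList<>();
        for (int doc = advance(0); doc != NO_MORE_DOCS; doc = advance(doc + 1)) {
            out.add(doc);
        }
        return out;
    }
}
```

A bit set costs one bit per document in the index regardless of how many documents match, which is why sparse sets are often better served by other representations; this sketch ignores that trade-off.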

Re: Algorithm of retrieving docs

2014-02-13 Thread Harshvardhan Ojha
Hi Mike/Mikhail, don't you think that the contains(BytesRef value) method in org.apache.lucene.codecs.bloom.FuzzySet.java returns the probability of having a field, and that it is a place where we use hashing? Is there any other place in the source which, when given a document id, could determine by calcu…
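For readers unfamiliar with the hashing involved, a minimal Bloom filter sketch follows. It is a generic illustration, not Lucene's FuzzySet: the class name SimpleBloomFilter, the double-hashing scheme, and the sizes are assumptions. Note that a Bloom filter answers "definitely absent" or "maybe present" rather than returning a probability, and it is keyed by values (here, strings standing in for term bytes), not by docID.

```java
import java.util.BitSet;

// Minimal Bloom filter sketch (NOT Lucene's FuzzySet). The sizes and the
// double-hashing scheme below are illustrative assumptions.
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Second hash derived from the first via the MurmurHash3 32-bit finalizer;
    // forced odd so it is never zero (a zero step would make every probe hit the same bit).
    private static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h | 1;
    }

    // i-th probe position using double hashing: (h1 + i * h2) mod numBits.
    private int probe(int h1, int h2, int i) {
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(String key) {
        int h1 = key.hashCode();
        int h2 = mix(h1);
        for (int i = 0; i < numHashes; i++) {
            bits.set(probe(h1, h2, i));
        }
    }

    // false => key was definitely never added; true => key MAY have been added.
    boolean mightContain(String key) {
        int h1 = key.hashCode();
        int h2 = mix(h1);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(probe(h1, h2, i))) {
                return false;
            }
        }
        return true;
    }
}
```

A false answer lets a lookup skip the expensive term-dictionary seek entirely; a true answer still requires checking the real data, which is one reason the benefit depends so much on the workload.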

Re: Algorithm of retrieving docs

2014-02-13 Thread Michael McCandless
Lucene only assigns its int docID during indexing. Retrieving a previously stored document is O(1), but it involves a disk seek, which can be very costly when the page is not in the OS's IO cache. Lucene does not do any caching itself (it relies on the OS instead). Have a look at the current def…
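The O(1) claim can be illustrated with a toy stored-fields layout: documents are written back-to-back in docID order, and a parallel array records where each one starts, so fetching docID n takes two array lookups plus one read (on disk, one seek plus one read). This is a hypothetical in-memory sketch; StoredDocs and its methods are invented names, and Lucene's actual stored-fields format is more involved (for example, it compresses blocks of documents).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical in-memory stand-in for a stored-fields data file plus its offset index.
final class StoredDocs {
    private final byte[] data;   // all documents, concatenated in docID order
    private final int[] offsets; // offsets[docID] = start of that doc; offsets[maxDoc] = end of data

    private StoredDocs(byte[] data, int[] offsets) {
        this.data = data;
        this.offsets = offsets;
    }

    // docIDs are assigned sequentially (0, 1, 2, ...) in the order documents are added.
    static StoredDocs build(List<String> docs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int[] offsets = new int[docs.size() + 1];
        for (int docId = 0; docId < docs.size(); docId++) {
            offsets[docId] = out.size();
            byte[] bytes = docs.get(docId).getBytes(StandardCharsets.UTF_8);
            out.write(bytes, 0, bytes.length);
        }
        offsets[docs.size()] = out.size();
        return new StoredDocs(out.toByteArray(), offsets);
    }

    // O(1): two array lookups give the byte range; on disk this would be a seek plus a read.
    String document(int docId) {
        int start = offsets[docId];
        int end = offsets[docId + 1];
        return new String(data, start, end - start, StandardCharsets.UTF_8);
    }

    int maxDoc() {
        return offsets.length - 1;
    }
}
```

Because docIDs are dense sequential ints assigned at indexing time, an array indexed by docID is enough to locate a document; no hashing is needed for that step.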