This is the inner loop of the code that estimates facet counts by taking
random samples:

final DocIdSetIterator it = docs.bits.iterator();
for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS;
     doc = it.nextDoc()) {
  if (counter == randomIndex) {
    sampleDocs.set(doc);
  }
  counter++;
  if (counter >= limit) {
    counter = 0;
    limit = binSize;
    randomIndex = random.nextInt(binSize);
  }
}
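
To make the effect of this loop concrete, here is a minimal standalone sketch
that runs without Lucene. The class name BinSamplingSketch, the plain int[]
of hits, and the initial values of counter, limit and randomIndex are my
assumptions for illustration only; the real code iterates docs.bits instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class BinSamplingSketch {

    /**
     * Keeps one randomly chosen hit out of every binSize consecutive hits.
     * hits is assumed to be a sorted array of matching doc IDs.
     */
    static List<Integer> sample(int[] hits, int binSize, Random random) {
        List<Integer> sampled = new ArrayList<>();
        int counter = 0;
        int limit = binSize;                        // assumed initial value
        int randomIndex = random.nextInt(binSize);  // assumed initial value
        for (int doc : hits) {
            if (counter == randomIndex) {
                sampled.add(doc);                   // the one pick for this bin
            }
            counter++;
            if (counter >= limit) {
                // Bin is finished: start a new one with a fresh random offset.
                counter = 0;
                limit = binSize;
                randomIndex = random.nextInt(binSize);
            }
        }
        return sampled;
    }

    public static void main(String[] args) {
        int[] hits = {2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377};
        System.out.println(sample(hits, 4, new Random(42)));
    }
}
```

With 12 hits and a binSize of 4, this keeps exactly one document from each
group of four consecutive hits, but it still has to visit all 12.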


So it visits every matching document one by one, keeps one per bin and
skips the rest. But DocIdSetIterator also provides an 'advance' method;
could we use that to skip ahead instead of stepping through every document?

Something like this:

final DocIdSetIterator it = docs.bits.iterator();
int doc = it.nextDoc();
if ((doc + randomIndex) < docs.totalHits) {
  for (doc = it.advance(doc + randomIndex);
       doc != DocIdSetIterator.NO_MORE_DOCS;
       doc = it.advance(doc + randomIndex)) {
    sampleDocs.add(doc);
    // Can't stay at the same document, that does not make sense.
    randomIndex = this.random.nextInt(binSize) + 1;
  }
}
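
One thing I am not sure about: advance(target) moves to the first document
whose ID is greater than or equal to target, so the step is measured in doc
IDs rather than in hits. The standalone sketch below tries to show what that
might mean for the sample size. ArrayDocIterator is a hypothetical stand-in
for DocIdSetIterator over a sorted int[], and the loop is a simplified
version of the one above; neither is actual Lucene code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class AdvanceSkipSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical stand-in for a DocIdSetIterator over sorted doc IDs. */
    static final class ArrayDocIterator {
        private final int[] docs;
        private int pos = -1;

        ArrayDocIterator(int[] docs) { this.docs = docs; }

        int nextDoc() {
            pos++;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        /** Moves forward to the first doc ID >= target, or NO_MORE_DOCS. */
        int advance(int target) {
            do { pos++; } while (pos < docs.length && docs[pos] < target);
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }
    }

    /** Samples by advancing a random 1..binSize doc IDs past the current doc. */
    static List<Integer> sampleWithAdvance(int[] hits, int binSize, Random random) {
        List<Integer> sampled = new ArrayList<>();
        ArrayDocIterator it = new ArrayDocIterator(hits);
        int doc = it.nextDoc();
        while (doc != NO_MORE_DOCS) {
            int step = random.nextInt(binSize) + 1;  // never stay on the same doc
            doc = it.advance(doc + step);
            if (doc != NO_MORE_DOCS) {
                sampled.add(doc);
            }
        }
        return sampled;
    }

    public static void main(String[] args) {
        int[] dense = new int[100];
        int[] sparse = new int[100];
        for (int i = 0; i < 100; i++) {
            dense[i] = i;          // consecutive doc IDs
            sparse[i] = i * 1000;  // same number of hits, widely spaced IDs
        }
        System.out.println("dense:  " + sampleWithAdvance(dense, 4, new Random(42)).size());
        System.out.println("sparse: " + sampleWithAdvance(sparse, 4, new Random(42)).size());
    }
}
```

On the sparse set every advance lands on the very next hit, so the sketch
keeps 99 of the 100 hits instead of roughly one per bin. If that reading is
right, the advance-based loop would only match the original sampling when
the hits are dense in doc ID space.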


What do you think?
