Yes. The trick is to store a hash value on each document. The
SignatureUpdateProcessor (configured in Solr through
SignatureUpdateProcessorFactory) can compute one at index time. Store
the hash value as a hex string in its own field.
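
To make the idea concrete, here is a rough Python sketch of what
"store a hex hash on each document" amounts to. It is not Solr code:
inside Solr the update processor does this for you. The function name
add_hash_field, the field name "hash", the choice of MD5 and the
sample document are all assumptions made up for illustration.

```python
import hashlib


def add_hash_field(doc, fields=("id",)):
    """Return a copy of doc with a hex 'hash' field derived from the given fields.

    Hypothetical helper: it only mimics the effect of a signature
    processor, it is not part of any Solr client library.
    """
    # Join the chosen field values with a separator that is unlikely to
    # appear in the data, so ("ab", "c") and ("a", "bc") hash differently.
    joined = "\x1f".join(str(doc[f]) for f in fields)
    digest = hashlib.md5(joined.encode("utf-8")).hexdigest()  # 32 lowercase hex chars
    out = dict(doc)
    out["hash"] = digest
    return out


doc = add_hash_field({"id": "doc-42", "title": "example"})
print(doc["hash"])
```

Any hash with a roughly uniform output works here. The only property
that matters is that the leading hex digits are spread evenly over
the documents.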

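Now, do wildcard (prefix) queries on the hash string: hash:a* will
match a pseudo-random sample of about 1/16 of the documents. hash:00*
will match about 1/256 of the documents. Facet over that subset and
multiply the counts by 16 or 256 to estimate the full counts.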
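
A small simulation of the sampling, again in plain Python rather than
against a real index. It assumes MD5 hex hashes of made-up ids and a
made-up "color" facet field, and the helper names are hypothetical.
In Solr the prefix match would be the wildcard query itself (for
example as a filter, fq=hash:a*) and the counting would be the facet.

```python
import hashlib
from collections import Counter


def hex_hash(doc_id):
    """Hex MD5 of a document id (stand-in for the stored hash field)."""
    return hashlib.md5(doc_id.encode("utf-8")).hexdigest()


def sampled_facets(docs, prefix, facet_field):
    """Estimate facet counts from the documents whose hash starts with prefix.

    Each extra hex digit in the prefix keeps about 1/16 of the
    documents, so the sampled counts are scaled by 16 ** len(prefix).
    """
    scale = 16 ** len(prefix)
    counts = Counter(
        d[facet_field] for d in docs if d["hash"].startswith(prefix)
    )
    return {value: n * scale for value, n in counts.items()}


# Synthetic corpus: one document in four is "red", the rest are "blue".
docs = [
    {"id": str(i), "hash": hex_hash(str(i)), "color": "red" if i % 4 == 0 else "blue"}
    for i in range(100_000)
]

exact = Counter(d["color"] for d in docs)
estimate = sampled_facets(docs, "a", "color")
sample_fraction = sum(d["hash"].startswith("a") for d in docs) / len(docs)

print("exact:   ", dict(exact))
print("estimate:", estimate)
print("sample fraction:", sample_fraction)
```

The estimate gets noisier as the prefix gets longer, so rare facet
values may be missed entirely in a 1/256 sample. Different prefixes
of the same length (hash:a*, hash:b*, ...) give different,
non-overlapping samples.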

On Wed, May 16, 2012 at 6:43 AM, Yuval Dotan <yuvaldo...@gmail.com> wrote:
> Hi Guys
> We have an environment containing billions of documents.
> Faceting over this large result set could take many seconds, and so we
> thought we might be able to use statistical sampling of a smaller result
> set from the facet, and give an approximate result much quicker.
> Is there any way to facet only a random sample of the results?
> Thanks
> Yuval



-- 
Lance Norskog
goks...@gmail.com
