TermsQuery works by pulling the postings lists for each term and OR-ing them 
together to create a bitset, which is very memory-efficient but means that you 
don't know at doc collection time which term has actually matched.
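
As a rough illustration of why the term identity is lost, here is a sketch in plain Java. It does not use Lucene's classes; the class name, the toy postings data and the helper methods are all made up for the example. Each term's postings list is OR-ed into a single java.util.BitSet, so a document that matches two terms ends up as the same single bit as a document that matches one.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: NOT Lucene's TermsQuery implementation.
class PostingsUnionSketch {

    // Hypothetical postings: term -> sorted IDs of the docs containing it.
    static Map<String, int[]> postings() {
        Map<String, int[]> p = new LinkedHashMap<>();
        p.put("apple", new int[] {0, 2, 5});
        p.put("banana", new int[] {2, 3});
        p.put("cherry", new int[] {5});
        return p;
    }

    // OR every postings list into one bitset: one bit per document,
    // with no record of which term (or how many terms) set the bit.
    static BitSet union(Map<String, int[]> postings) {
        BitSet matches = new BitSet();
        for (int[] docs : postings.values()) {
            for (int doc : docs) {
                matches.set(doc);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Docs 2 and 5 each match two terms, but appear only once.
        System.out.println(union(postings())); // prints {0, 2, 3, 5}
    }
}
```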

For your case you probably want to create a SpanOrQuery, and then iterate 
through the resulting Spans in a specialised Collector.  Depending on how many 
terms you want, though, you may end up requiring a lot of memory for the search.
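
To give a feel for the memory cost, here is a toy stand-in (plain Java again, not the Lucene Spans or Collector API) for the bookkeeping such a collector would need. The class name and data are hypothetical, and it assumes each doc ID appears at most once in a term's list. Instead of one bit per document, it keeps one counter per matching document.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: illustrates per-document term counting, not Lucene code.
class MatchedTermCountSketch {

    // Hypothetical postings: term -> sorted IDs of the docs containing it.
    static Map<String, int[]> postings() {
        Map<String, int[]> p = new LinkedHashMap<>();
        p.put("apple", new int[] {0, 2, 5});
        p.put("banana", new int[] {2, 3});
        p.put("cherry", new int[] {5});
        return p;
    }

    // For each document, count how many distinct query terms it contains.
    // The map holds one entry per matching document, which is where the
    // extra memory goes compared with a single bit per document.
    static Map<Integer, Integer> countMatchedTerms(Map<String, int[]> postings) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int[] docs : postings.values()) {
            for (int doc : docs) {
                counts.merge(doc, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countMatchedTerms(postings())); // prints {0=1, 2=2, 3=1, 5=2}
    }
}
```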

Alan Woodward
www.flax.co.uk


On 2 Nov 2015, at 17:14, Upayavira wrote:

> I have a scenario where I want to search for documents that contain many
> terms (maybe 100s or 1000s), and then know the number of terms that
> matched. I'm happy to implement this as a query object/parser.
> 
> I understand that Lucene isn't well suited to this scenario. Any
> suggestions as to how to make this more efficient? Does the TermsQuery
> work differently from the BooleanQuery regarding large numbers of terms?
> 
> Upayavira
