I am working with an index of ~10 million documents. The index does not change often.

I need to perform an external search that will return some number of results -- this search could take up to 5 minutes and return anywhere from 0 to 10M docs.

I would like to use the output of this long-running query as a filter in Solr.

Any suggestions on how to wire this all together?

My initial idea (I have not implemented anything yet -- I just want to check with you all before starting down the wrong path) is to:
* Assume the index will always be optimized, in which case every document id maps to a Lucene int id.
* Store the results of the expensive query as a bitset.
* Use the stored bitset in the Lucene query.
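
To make the bitset idea concrete, here is a minimal sketch of the storage half only. It uses the plain java.util.BitSet rather than any Lucene or Solr class, and the names StoredFilter, fromDocIds, save and load are made up for illustration. It assumes, as above, that the int ids stay valid for as long as the stored file is used; plugging the loaded bitset into an actual Lucene filter is not shown.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.BitSet;

// Hypothetical helper: keeps the result of the expensive query as one bit per doc id.
class StoredFilter {

    // Set one bit for every document id returned by the external search.
    static BitSet fromDocIds(int[] matchingDocIds, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int docId : matchingDocIds) {
            bits.set(docId);
        }
        return bits;
    }

    // Persist the bitset so the 5-minute query does not have to be rerun.
    static void save(BitSet bits, Path file) throws IOException {
        Files.write(file, bits.toByteArray());
    }

    // Reload a previously stored bitset.
    static BitSet load(Path file) throws IOException {
        return BitSet.valueOf(Files.readAllBytes(file));
    }

    public static void main(String[] args) throws IOException {
        int maxDoc = 10_000_000; // index size from the description above
        BitSet bits = fromDocIds(new int[] {3, 42, 9_999_999}, maxDoc);

        Path file = Files.createTempFile("expensive-query", ".bits");
        save(bits, file);
        BitSet restored = load(file);

        System.out.println(restored.get(42));       // true
        System.out.println(restored.cardinality()); // 3
        Files.delete(file);
    }
}
```

At one bit per document, a stored result for a 10M-document index is at most about 1.25 MB, so keeping several of these around should be cheap. The brittle part is the id mapping, not the storage.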

I'm sure I can get this to work, but it seems kinda ugly (and brittle). Any better thoughts on how to do this? If we had some sort of external tagging interface, each document could just get tagged with what query it matches.

thanks
ryan

