I am working with an index of ~10 million documents. The index
does not change often.
I need to perform a search against some external criteria that will
return some number of results. This search could take up to 5 minutes
and return anywhere from 0 to 10M docs.
I would like to use the output of this long running query as a filter
in solr.
Any suggestions on how to wire this all together?
My initial ideas (I have not implemented anything yet; I just want to
check with you all before starting down the wrong path) are to:
* assume the index will always be optimized; in that case every unique
id maps to a fixed lucene int doc id.
* store the results of the expensive query as a bitset over those doc ids.
* use the stored bitset as a filter in the lucene query.
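To make the idea concrete, here is a minimal, language-neutral sketch of those three steps in plain Python (not the Lucene/Solr API). Everything in it is hypothetical: `CachedFilter`, `build_filter`, `apply_filter`, the `id_to_docid` map (unique id to lucene doc id for one optimized snapshot), and the `index_version` number. Storing the version next to the bits is an assumption added so the brittleness shows up as an explicit error instead of silently wrong results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedFilter:
    """Result of the slow external query, tied to one index snapshot (hypothetical)."""
    index_version: int
    bits: bytearray  # bit i set => lucene doc id i matched the external query


def build_filter(id_to_docid, max_doc, matching_ids, index_version):
    """Turn the external query's unique ids into a bitset over doc ids."""
    bits = bytearray((max_doc + 7) // 8)
    for uid in matching_ids:
        docid = id_to_docid.get(uid)
        if docid is None:
            continue  # the external system knows a doc the index does not have
        bits[docid >> 3] |= 1 << (docid & 7)
    return CachedFilter(index_version, bits)


def matches(f, docid):
    """True if the given doc id is set in the cached bitset."""
    return bool(f.bits[docid >> 3] & (1 << (docid & 7)))


def apply_filter(f, current_version, hits):
    """Keep only the hits (doc ids) that are set in the cached bitset."""
    if f.index_version != current_version:
        # doc ids may have shifted since the bitset was built
        raise ValueError("stale filter: rebuild it for the current index")
    return [d for d in hits if matches(f, d)]
```

For example, with ten docs `"a"`..`"j"` at doc ids 0..9, a filter built from `["b", "d", "j"]` keeps hits `[0, 1, 2, 3, 9]` down to `[1, 3, 9]`, and applying it after the index version changes raises instead of filtering on ids that may now point at different documents.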
I'm sure I can get this to work, but it seems kind of ugly (and
brittle, since the lucene doc ids shift whenever documents are added
or deleted and the index is re-optimized). Any better thoughts on how
to do this? If we had some sort of external tagging interface, each
document could just get tagged with the queries it matches.
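The tagging idea can be sketched the same way. This is a hypothetical illustration in plain Python, not an existing Solr interface: `TagStore`, `tag`, and `filter_hits` are invented names, and hits are assumed to come back as `(docid, unique_id)` pairs. The key assumption is that tags are stored against the stable unique id rather than the lucene doc id, so they survive reindexing.

```python
from collections import defaultdict


class TagStore:
    """Hypothetical external tag store keyed by stable unique ids, not lucene doc ids."""

    def __init__(self):
        self._docs_by_tag = defaultdict(set)

    def tag(self, tag, unique_ids):
        """Record that these documents matched the (slow) query named by `tag`."""
        self._docs_by_tag[tag].update(unique_ids)

    def filter_hits(self, tag, hits):
        """Keep (docid, unique_id) hits whose unique id carries the tag."""
        tagged = self._docs_by_tag.get(tag, set())
        return [hit for hit in hits if hit[1] in tagged]
```

For example, after `store.tag("slow-query-42", ["b", "j"])`, the hits `[(0, "a"), (1, "b"), (7, "j")]` filter to `[(1, "b"), (7, "j")]`; if a reindex moves `"b"` to doc id 5, the tag still applies. The trade-off is a hash lookup per hit instead of a bit test, though a per-snapshot bitset could still be rebuilt from the tag store whenever the index changes.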
thanks
ryan