Re: filter on millions of IDs from external query
Ryan McKinley schrieb:

> I am working with an index of ~10 million documents. The index does
> not change often. I need to perform some external search criteria
> that will return some number of results -- this search could take up
> to 5 minutes and return anywhere from 0 to 10M docs.

If it really takes that long, then something is likely wrong. You might be able to achieve a significant improvement by reframing your requirement.

> I would like to use the output of this long-running query as a filter
> in Solr. Any suggestions on how to wire this all together?

Just use it as a filter query. The result will be cached, and the query won't have to be executed again (if I'm not mistaken) until a new index searcher is opened (after an index update and a commit), or until the filter query result is evicted from the cache -- which you should make sure won't happen if your query really is so terribly expensive.

Michael Ludwig
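For what it's worth, a filter query is just an fq parameter on the request, and its result bitset is cached in Solr's filterCache independently of the main query. A minimal illustrative request (field name and host here are made up):

```
http://localhost:8983/solr/select?q=*:*&fq=external_match:true
```

Subsequent requests with the same fq value reuse the cached bitset instead of re-running the filter.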
filter on millions of IDs from external query
I am working with an index of ~10 million documents. The index does not change often. I need to perform some external search criteria that will return some number of results -- this search could take up to 5 minutes and return anywhere from 0 to 10M docs.

I would like to use the output of this long-running query as a filter in Solr. Any suggestions on how to wire this all together?

My initial idea (I have not implemented anything yet -- just want to check with you all before starting down the wrong path) is to:

* assume the index will always be optimized; in this case every ID maps to a Lucene int doc id
* store the results of the expensive query as a bitset
* use the stored bitset in the Lucene query

I'm sure I can get this to work, but it seems kinda ugly (and brittle). Any better thoughts on how to do this? If we had some sort of external tagging interface, each document could just get tagged with what query it matches.

thanks
ryan
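The bitset idea above can be sketched with java.util.BitSet standing in for Lucene's internal doc-id bitsets (in a real implementation you would use Lucene's own Filter/OpenBitSet machinery against a specific IndexReader; the class and field names here are purely illustrative):

```java
import java.util.BitSet;

// Sketch: store the expensive external query's results as a bitset keyed
// by Lucene doc id (only valid while the index stays optimized and
// unchanged, as assumed above), then AND it with a search's result set.
public class ExternalIdFilter {
    private final BitSet externalMatches;

    public ExternalIdFilter(BitSet externalMatches) {
        this.externalMatches = externalMatches;
    }

    // Keep only documents matched by both the search and the external query.
    public BitSet apply(BitSet searchResults) {
        BitSet filtered = (BitSet) searchResults.clone();
        filtered.and(externalMatches);
        return filtered;
    }

    public static void main(String[] args) {
        // Doc ids 1, 3, 5 matched the slow external query.
        BitSet external = new BitSet();
        external.set(1);
        external.set(3);
        external.set(5);

        // Doc ids 2, 3, 5 matched the Solr search.
        BitSet search = new BitSet();
        search.set(2);
        search.set(3);
        search.set(5);

        BitSet result = new ExternalIdFilter(external).apply(search);
        System.out.println(result); // intersection of the two sets
    }
}
```

The brittleness the poster worries about is visible here: the doc ids are only stable between optimizes, so the stored bitset must be rebuilt whenever the index changes.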