Re: filter on millions of IDs from external query

2009-06-09 Thread Michael Ludwig

Ryan McKinley wrote:

I am working with an index of ~10 million documents.  The index
does not change often.

I need to apply some external search criteria that will return some
number of results -- this search could take up to 5 mins and return
anywhere from 0-10M docs.


If it really takes so long, then something is likely wrong. You might be
able to achieve a significant improvement by reframing your requirement.


I would like to use the output of this long-running query as a filter
in Solr.

Any suggestions on how to wire this all together?


Just use it as a filter query. The result will be cached, so the query
won't have to be executed again (if I'm not mistaken) until a new index
searcher is opened (after an index update and a commit), or until the
filter query result is evicted from the cache. If your query really is
so terribly expensive, you should make sure that eviction won't happen.

Michael Ludwig
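
As an illustration of the filter-query suggestion above, here is a minimal
sketch that builds a Solr select URL with the restriction passed in the
standard fq parameter next to the main q parameter. The host, the field
names and the class and method names are assumptions made for this example
only; the thread does not say how the external criteria would be expressed
as a Solr query.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Solr: it only assembles a request URL.
class FilterQueryUrlSketch {

    // q carries the user's query; fq carries the restriction whose result
    // set Solr can cache and reuse across requests.
    static String buildSelectUrl(String baseUrl, String userQuery, String filterQuery) {
        return baseUrl + "/select"
                + "?q=" + URLEncoder.encode(userQuery, StandardCharsets.UTF_8)
                + "&fq=" + URLEncoder.encode(filterQuery, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumed base URL and field names, for illustration only.
        String url = buildSelectUrl("http://localhost:8983/solr",
                "title:lucene", "category:reports");
        System.out.println(url);
    }
}
```

Whether the cached entry survives long enough depends on how the filter
cache is sized in solrconfig.xml, which is the eviction concern raised
above.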


filter on millions of IDs from external query

2009-06-03 Thread Ryan McKinley
I am working with an index of ~10 million documents.  The index
does not change often.


I need to apply some external search criteria that will return some
number of results -- this search could take up to 5 mins and return
anywhere from 0-10M docs.


I would like to use the output of this long-running query as a filter
in Solr.


Any suggestions on how to wire this all together?

My initial idea (I have not implemented anything yet -- just want to
check with you all before starting down the wrong path) is to:
* Assume the index will always be optimized; in this case every id
maps to a Lucene int id.
* Store the results of the expensive query as a bitset.
* Use the stored bitset in the Lucene query.
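
The steps above can be sketched with plain java.util.BitSet as a stand-in.
This is not Lucene code: the class and method names are hypothetical, and
wiring the bitset into an actual Lucene query would need a filter
implementation that the thread does not describe. It assumes, as in the
first bullet, that each document maps to a stable int id.

```java
import java.util.BitSet;

// Hypothetical sketch: one bit per document id, set when the expensive
// external query matched that document.
class DocIdBitSetSketch {

    // Store the ids returned by the external query in a bitset.
    static BitSet fromMatches(int[] matchingDocIds, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int id : matchingDocIds) {
            bits.set(id);
        }
        return bits;
    }

    // Keep only the query hits that are also present in the stored set.
    static BitSet applyFilter(BitSet queryHits, BitSet stored) {
        BitSet result = (BitSet) queryHits.clone(); // leave the input untouched
        result.and(stored);
        return result;
    }

    public static void main(String[] args) {
        BitSet stored = fromMatches(new int[] {1, 3, 5, 8}, 10);
        BitSet hits = fromMatches(new int[] {3, 4, 5, 9}, 10);
        System.out.println(applyFilter(hits, stored)); // prints {3, 5}
    }
}
```

At one bit per document, a bitset covering 10 million documents takes
about 1.25 MB, so keeping a few of them around is cheap. The ids are only
valid while the mapping from documents to ints stays the same, which is
why the approach feels brittle.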

I'm sure I can get this to work, but it seems kinda ugly (and  
brittle).  Any better thoughts on how to do this?  If we had some sort  
of external tagging interface, each document could just get tagged  
with what query it matches.


thanks
ryan