: q=title:dogs AND
:   (flrid:(123 125 139 .... 34823) OR
:    flrid:(34837 ... 59091) OR
:    ... OR
:    flrid:(101294813 ... 103049934))
: The problem with this approach (besides that it's clunky) is that it
: seems to perform O(N^2) or so.  With 1,000 FLRIDs, the search comes
: back in 50ms or so.  If we have 10,000 FLRIDs, it comes back in
: 400-500ms.  With 100,000 FLRIDs, that jumps up to about 75,000ms.  We
: want it to be on the order of 1000-2000ms at most in all cases up to
: 100,000 FLRIDs.

How are these sets of flrids created/defined?  (Understanding the source
of the filter information may help inspire alternative suggestions, i.e.
the XY Problem.)

: * Have Solr do big ORs as a set operation not as (what we assume is) a
: naive one-at-a-time matching.

It's not as naive as it may seem -- scoring a disjunction like this isn't
a matter of asking each doc if it matches each query clause.  What
happens is that for each segment of the index, each clause of the
disjunction is asked for the "first" doc it matches in that segment --
which for TermQueries like these just means a quick lookup on the
TermEnum -- and the lowest (internal) doc num returned by any of the
clauses represents the "first" match of the whole BooleanQuery.  All of
the other clauses are asked for their "first", and then ultimately they
are all asked to skip ahead to their next match, etc...  (There's a toy
sketch of this iteration pattern at the end of this mail.)

My point being: I don't think your speed observations are driven by the
number of documents, they're driven by the number of query clauses --
which unfortunately happen to be the same in your situation.

: * An efficient way to pass a long set of IDs, or for Solr to be able to
: pull them from the app's Oracle database.

This can definitely be done, there just isn't a general-purpose, turnkey
solution for it.  The approach you'd need to take is to implement a
"PostFilter" containing your custom logic for deciding whether or not a
document should be in the result set, and then generate instances of
your PostFilter implementation in a "QParserPlugin".

Here's a blog post with an example of doing this for an ACL type
situation, where the parser input specifies a "user" and a CSV file is
consulted to get the list of documents the user is allowed to see...

http://searchhub.org/2012/02/22/custom-security-filtering-in-solr/

...you could follow a similar model where, given some input, you
generate a query to your Oracle DB to return a Set<String> of IDs to
consult in the collect method.  A rough sketch of that follows below as
well.
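Back to the disjunction point above: here's a self-contained toy model
of that iteration pattern (plain Java, NOT actual Lucene code -- all of
the names are made up for illustration).  Each clause is a cursor over
the sorted doc nums for one term, and the disjunction repeatedly emits
the lowest "current" doc across all clauses, then skips that clause
ahead to its next match...

import java.util.*;

// Toy model of disjunction iteration (NOT actual Lucene code).
public class DisjunctionSketch {

  // One clause: a cursor over the sorted doc nums matching one term.
  static final class Clause {
    final int[] docs;
    int pos = 0;
    Clause(int... docs) { this.docs = docs; }
    boolean exhausted() { return pos >= docs.length; }
    int current()       { return docs[pos]; }
    void advance()      { pos++; }
  }

  public static void main(String[] args) {
    // Three "flrid:(...)" style term clauses.
    List<Clause> clauses = List.of(
        new Clause(2, 7, 19), new Clause(7, 8), new Clause(1, 19, 42));

    // Min-heap keyed on each clause's current doc: every clause is
    // asked for its "first" match, the lowest wins, and the winner is
    // then skipped ahead to its next match, etc...
    PriorityQueue<Clause> heap =
        new PriorityQueue<>(Comparator.comparingInt(Clause::current));
    heap.addAll(clauses);

    int last = -1;
    while (!heap.isEmpty()) {
      Clause c = heap.poll();
      if (c.current() != last) {     // dedupe docs matched by >1 clause
        System.out.println("match: doc " + c.current());
        last = c.current();
      }
      c.advance();
      if (!c.exhausted()) heap.add(c);
    }
  }
}

Note that the work per matched doc grows with the number of clauses,
not with the number of docs in the index -- which is why 100,000
clauses hurts no matter how big (or small) the corpus is.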
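And here's a very rough, untested sketch of the PostFilter +
QParserPlugin idea (written against roughly Solr 7+ style APIs -- the
details shift between versions, so treat it as a starting point, not
working code).  The parser name, the "key" param, the Oracle table/SQL,
the JDBC URL, and the assumption that flrid is a single-valued string
field with docValues enabled are all placeholders to adapt...

package com.example;

import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.HashSet;
import java.util.Set;
import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.SortedDocValues;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.DelegatingCollector;
import org.apache.solr.search.ExtendedQueryBase;
import org.apache.solr.search.PostFilter;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;

public class FlridSetQParserPlugin extends QParserPlugin {

  @Override
  public QParser createParser(String qstr, SolrParams localParams,
                              SolrParams params, SolrQueryRequest req) {
    return new QParser(qstr, localParams, params, req) {
      @Override
      public Query parse() {
        // "key" identifies which set of FLRIDs to pull from Oracle.
        return new FlridSetFilter(loadIdsFromOracle(getParam("key")));
      }
    };
  }

  // Placeholder JDBC lookup -- URL, table, and column are assumptions.
  static Set<String> loadIdsFromOracle(String key) {
    Set<String> ids = new HashSet<>();
    try (Connection con = DriverManager.getConnection(
             "jdbc:oracle:thin:@dbhost:1521/APP");
         PreparedStatement ps = con.prepareStatement(
             "SELECT flrid FROM filter_sets WHERE set_key = ?")) {
      ps.setString(1, key);
      try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) ids.add(rs.getString(1));
      }
    } catch (Exception e) {
      throw new RuntimeException("flrid lookup failed", e);
    }
    return ids;
  }

  static class FlridSetFilter extends ExtendedQueryBase implements PostFilter {
    final Set<String> allowed;

    FlridSetFilter(Set<String> allowed) { this.allowed = allowed; }

    // cache=false plus cost >= 100 is what makes Solr run this as a
    // post filter, after cheaper queries/filters narrow the candidates.
    @Override public boolean getCache() { return false; }
    @Override public int getCost() { return Math.max(super.getCost(), 100); }

    @Override
    public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
      return new DelegatingCollector() {
        SortedDocValues flrids;  // per-segment docValues for "flrid"

        @Override
        public void doSetNextReader(LeafReaderContext context) throws IOException {
          super.doSetNextReader(context);
          flrids = DocValues.getSorted(context.reader(), "flrid");
        }

        @Override
        public void collect(int doc) throws IOException {
          // Only pass the doc down the chain if its flrid is in the set.
          if (flrids.advanceExact(doc)
              && allowed.contains(flrids.lookupOrd(flrids.ordValue()).utf8ToString())) {
            super.collect(doc);
          }
        }
      };
    }

    // Identity semantics are fine here since caching is disabled.
    @Override public boolean equals(Object o) { return this == o; }
    @Override public int hashCode() { return System.identityHashCode(this); }
  }
}

You'd register it in solrconfig.xml with something like...

  <queryParser name="flrids" class="com.example.FlridSetQParserPlugin"/>

...and then hit it as fq={!flrids key=whatever}&q=title:dogs.  The win
is that the 100,000 IDs never go through query parsing at all;
membership becomes one hash lookup per candidate doc.

-Hoss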