Thanks Jeff. TermsQueryParser worked for me. 

Thanks & Regards,
Bhaumik Joshi

________________________________________
From: Jeff Wartes <jwar...@whitepages.com>
Sent: Thursday, May 5, 2016 8:19 AM
To: solr-user@lucene.apache.org
Subject: Re: Passing Ids in query takes more time

An ID lookup is a very simple and fast query for one ID. ORing together lookups 
for 80k ids, though, is basically 80k searches as far as Solr is concerned, so 
it’s not altogether surprising that it takes a while. Your complaint seems to be 
that the query planner doesn’t know in advance that <other criteria> should be 
run first, and then the id selection applied to the reduced set.

So, I can think of a few things for you to look at, in no particular order:

1. TermsQueryParser is designed for lists of terms, so you might get better 
results from that: 
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermsQueryParser

2. If your <other criteria> is the real discriminating factor in your search, 
you could just search for <other criteria> and then apply your ID list as a 
PostFilter: http://yonik.com/advanced-filter-caching-in-solr/
I guess that’d look something like &fq={!terms f=<somefield> v="<id list>" 
cache=false cost=150}. You’d want cache=false because there’s not much sense 
caching an id list unless that id list is usually the same, and the cost >= 100 
should qualify it as a post filter, which only operates on an already-found 
result set instead of the full index. (Note: I haven’t confirmed that the Terms 
query parser supports post filtering.)

3. I’m not really aware of any storage engine that’ll love doing a filter on 
80k ids at once, but a key-value store like Cassandra might work out better for 
that.

4. There is a thing called a JoinQParserPlugin 
(https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-JoinQueryParser) 
that can join to another collection 
(https://issues.apache.org/jira/browse/SOLR-4905). But I’ve never used it, and 
there are some significant restrictions.
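
For what it's worth, here is a rough sketch in Python of how the parameters for options 1, 2 and 4 might be put together. The field name doc_id and the collection name collection1 come from the original question; the helper names, the join from/to fields, and the example inner query are hypothetical and only there for illustration. As noted above, whether the terms parser really behaves as a post filter at cost >= 100 is unconfirmed, so treat the cost value as an experiment rather than a guarantee.

```python
from urllib.parse import urlencode


def terms_query(field, ids):
    """Option 1: one {!terms} clause with a comma-separated id list,
    instead of ORing together thousands of single-id lookups."""
    return "{!terms f=%s}%s" % (field, ",".join(str(i) for i in ids))


def terms_filter(field, ids, cost=150):
    """Option 2: the same id list as an uncached filter query.
    cache=false avoids caching a one-off id list; cost >= 100 is the
    post-filter threshold, but support in the terms parser is unconfirmed."""
    id_list = ",".join(str(i) for i in ids)
    return "{!terms f=%s cache=false cost=%d}%s" % (field, cost, id_list)


def join_filter(from_field, to_field, from_index, inner_query):
    """Option 4: let Solr do the id hand-off itself via the join parser.
    Cross-collection joins have significant restrictions; see SOLR-4905."""
    return "{!join from=%s to=%s fromIndex=%s}%s" % (
        from_field, to_field, from_index, inner_query)


def build_params(other_criteria, field, ids, rows=250):
    """Main query is <other criteria>; the id list goes in fq."""
    return {"q": other_criteria, "fq": terms_filter(field, ids), "rows": rows}


if __name__ == "__main__":
    params = build_params("<other criteria>", "doc_id", [111, 222, 333, 444])
    # With 80k ids the encoded string is far too long for a URL,
    # so it would likely have to be sent as a POST body.
    print(urlencode(params))
    print(terms_query("doc_id", [111, 222, 333, 444]))
    print(join_filter("doc_id", "doc_id", "collection1", "<query on collection1>"))
```

The id list is kept out of q so that <other criteria> stays the main query and the ids are only a restriction on it, which is the split described in option 2.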




On 5/5/16, 2:46 AM, "Bhaumik Joshi" <bhaumik.jo...@outlook.com> wrote:

>Hi,
>
>
>I am retrieving ids from collection1 based on some query and passing those ids 
>as a query to collection2. The query to collection2 that contains the ids 
>takes much more time compared to a normal query.
>
>
>Que. 1 - Why does passing ids in the query take more time compared to a normal 
>query, even though we are narrowing the criteria by passing ids?
>
>e.g. query-1: doc_id:(111 222 333 444 ...) AND <other criteria> is slower 
>(passing 80k ids takes 7-9 sec) than query-2: only <other criteria> (700-800 
>ms). Both return 250 records with the same set of fields.
>
>
>Que. 2 - Any idea how I can achieve the above (get ids from one collection and 
>pass those ids to the other one) in an efficient manner, or any other way to 
>get data from one collection based on the response of another collection?
>
>
>Thanks & Regards,
>
>Bhaumik Joshi
