Re: Passing Ids in query takes more time
Thanks Jeff. TermsQueryParser worked for me.

Thanks & Regards,
Bhaumik Joshi

From: Jeff Wartes <jwar...@whitepages.com>
Sent: Thursday, May 5, 2016 8:19 AM
To: solr-user@lucene.apache.org
Subject: Re: Passing Ids in query takes more time

An ID lookup is a very simple and fast query, for one ID. OR’ing a lookup for 80k IDs, though, is basically 80k searches as far as Solr is concerned, so it’s not altogether surprising that it takes a while. Your complaint seems to be that the query planner doesn’t know in advance that the other criteria should be run first, and then the ID selection applied to the reduced set.

So, I can think of a few things for you to look at, in no particular order:

1. TermsQueryParser is designed for lists of terms; you might get better results from that: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermsQueryParser

2. If the other criteria are the real discriminating factor in your search, you could just search for those and then apply your ID list as a PostFilter: http://yonik.com/advanced-filter-caching-in-solr/
   I guess that’d look something like a filter query of the form {!terms f=... v="..."}, where a cost >= 100 should qualify it as a post filter, which only operates on an already-found result set instead of the full index. (Note: I haven’t confirmed that the Terms query parser supports post filtering.)

3. I’m not really aware of any storage engine that’ll love doing a filter on 80k IDs at once, but a key-value store like Cassandra might work out better for that.

4. There is a thing called a JoinQParserPlugin (https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-JoinQueryParser) that can join to another collection (https://issues.apache.org/jira/browse/SOLR-4905). But I’ve never used it, and there are some significant restrictions.
On 5/5/16, 2:46 AM, "Bhaumik Joshi" <bhaumik.jo...@outlook.com> wrote:

>Hi,
>
>I am retrieving IDs from collection1 based on some query and passing those IDs
>as a query to collection2. The query to collection2 that contains the IDs
>takes much more time compared to a normal query.
>
>Que. 1 - Why does the query take more time when IDs are passed, compared to a
>normal query, even though passing IDs narrows the criteria?
>
>e.g. query-1, doc_id:(111 222 333 444 ...) AND the other criteria, is slower
>(passing 80k IDs takes 7-9 sec) than query-2, the other criteria only
>(700-800 ms). Both return 250 records with the same set of fields.
>
>Que. 2 - Any idea how I can achieve the above (get IDs from one collection and
>pass those IDs to the other one) in an efficient manner, or any other way to
>get data from one collection based on the response of another collection?
>
>Thanks & Regards,
>
>Bhaumik Joshi
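As a rough illustration of option 4 in Jeff's list, the sketch below assembles a join-parser filter in Python so that collection1 never has to hand its IDs to the client at all. The field names (`id` in collection1, `doc_id` in collection2) and the inner query `status:active` are assumptions for illustration only, since the thread does not show the real schema or criteria, and the restrictions Jeff mentions for cross-collection joins still apply.

```python
def join_query(from_field, to_field, from_index, inner_query):
    """Return a Solr {!join} query string built from the join parser's local params.

    from_field  - field in the 'from' collection holding the IDs (assumed name)
    to_field    - field in the searched collection to match against (assumed name)
    from_index  - name of the collection/core the inner query runs against
    inner_query - query selecting the documents whose IDs should be joined
    """
    return "{!join from=%s to=%s fromIndex=%s}%s" % (
        from_field, to_field, from_index, inner_query)


# Hypothetical request parameters for a search against collection2.
# "*:*" stands in for the other criteria, which the thread does not show.
params = {
    "q": "*:*",
    "fq": join_query("id", "doc_id", "collection1", "status:active"),
}
print(params["fq"])
```

Whether this beats shipping the ID list across depends on the deployment, so it is worth measuring against the terms-based filter before committing to it.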
Re: Passing IDs in query takes more time
Thanks Erick. TermsQueryParser worked for me.

Thanks & Regards,
Bhaumik Joshi

From: Erick Erickson <erickerick...@gmail.com>
Sent: Friday, May 6, 2016 10:00 AM
To: solr-user
Subject: Re: Passing IDs in query takes more time

Well, you're parsing 80K IDs and forming them into a query. Consider what has to happen. Even in the very best case of the other criteria being evaluated first, for every doc that satisfies that clause the inverted index must be examined 80,000 times to see if that doc matches one of the IDs in your huge clause for scoring purposes.

You might be better off moving the 80K list to an fq clause like
fq={!cache=false}docid:(111 222 333).

Additionally, you probably want to use the TermsQueryParser, something like:
fq={!terms f=id cache=false}111,222,333
see: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermsQueryParser

In any case, though, an 80K clause will slow things down considerably.

Best,
Erick

On Thu, May 5, 2016 at 2:42 AM, Bhaumik Joshi <bhaumik.jo...@outlook.com> wrote:

> Hi,
>
> I am retrieving IDs from collection1 based on some query and passing those
> IDs as a query to collection2. The query to collection2 that contains the IDs
> takes much more time compared to a normal query.
>
> Que. 1 - Why does the query take more time when IDs are passed, compared to a
> normal query, even though passing IDs narrows the criteria?
>
> e.g. query-1, doc_id:(111 222 333 444 ...) AND the other criteria, is slower
> (takes 7-9 sec) than the other criteria only (700-800 ms). Please note that
> in this case I am passing 80k IDs and retrieving 250 rows.
>
> Que. 2 - Any idea how I can achieve the above (get IDs from one collection and
> pass those IDs to the other one) in an efficient manner, or any other way to
> get data from one collection based on the response of another collection?
>
> Thanks & Regards,
>
> Bhaumik Joshi
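The filter Erick suggests can be assembled programmatically once the IDs have been fetched from collection1. The following is a minimal Python sketch, not anything posted in the thread: the helper name `terms_filter_params` is hypothetical, and `*:*` stands in for the other criteria, which the messages do not show. It only builds the request parameters and does not contact a Solr server.

```python
from urllib.parse import urlencode


def terms_filter_params(main_query, id_field, ids):
    """Build Solr request parameters that apply an ID list as a {!terms} filter query."""
    # The terms parser accepts a comma-separated list of values by default.
    id_list = ",".join(str(i) for i in ids)
    return {
        "q": main_query,
        # cache=false keeps a one-off, very large filter out of the filter cache.
        "fq": "{!terms f=%s cache=false}%s" % (id_field, id_list),
    }


params = terms_filter_params("*:*", "id", [111, 222, 333])
print(params["fq"])  # {!terms f=id cache=false}111,222,333

# Form-encode the parameters so a long ID list can travel in a POST body
# instead of the URL.
body = urlencode(params)
```

With tens of thousands of IDs the encoded parameters get large, which is why the sketch form-encodes them for a POST request rather than appending them to the query string.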