Re: Passing Ids in query takes more time

2016-05-08 Thread Bhaumik Joshi
Thanks Jeff. TermsQueryParser worked for me. 

Thanks & Regards,
Bhaumik Joshi


From: Jeff Wartes <jwar...@whitepages.com>
Sent: Thursday, May 5, 2016 8:19 AM
To: solr-user@lucene.apache.org
Subject: Re: Passing Ids in query takes more time

An ID lookup is a very simple and fast query, for one ID. Or’ing a lookup for 
80k ids though is basically 80k searches as far as Solr is concerned, so it’s 
not altogether surprising that it takes a while. Your complaint seems to be 
that the query planner doesn’t know in advance that  should be 
run first, and then the id selection applied to the reduced set.

So, I can think of a few things for you to look at, in no particular order:

1. TermsQueryParser is designed for lists of terms, so you might get better 
results from that: 
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermsQueryParser

2. If your  is the real discriminating factor in your search, 
you could just search for  and then apply your ID list as a 
PostFilter: http://yonik.com/advanced-filter-caching-in-solr/
I guess that’d look something like ={!terms f= v="= 100 
should qualify it as a post filter, which only operates on an already-found 
result set instead of the full index. (Note: I haven’t confirmed that the Terms 
query parser supports post filtering.)

3. I’m not really aware of any storage engine that’ll love doing a filter on 
80k ids at once, but a key-value store like Cassandra might work out better for 
that.

4. There is a thing called a JoinQParserPlugin 
(https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-JoinQueryParser)
 that can join to another collection 
(https://issues.apache.org/jira/browse/SOLR-4905). But I’ve never used it, and 
there are some significant restrictions.




On 5/5/16, 2:46 AM, "Bhaumik Joshi" <bhaumik.jo...@outlook.com> wrote:

>Hi,
>
>
>I am retrieving ids from collection1 based on some query and passing those ids 
>as a query to collection2. The query to collection2 that contains the ids 
>takes much more time compared to a normal query.
>
>
>Que. 1 - Why does a query take more time when ids are passed, compared to a 
>normal query, even though we are narrowing the criteria by passing ids?
>
>e.g.  query-1: doc_id:(111 222 333 444 ...) AND  slower 
>(passing 80k ids takes 7-9 sec) than query-2: only  (700-800 
>ms). Both return 250 records with the same set of fields.
>
>
>Que. 2 - Any idea how I can achieve the above (get ids from one collection and 
>pass those ids to the other one) in an efficient manner, or any other way to 
>get data from one collection based on the response of another collection?
>
>
>Thanks & Regards,
>
>Bhaumik Joshi

Re: Passing IDs in query takes more time

2016-05-08 Thread Bhaumik Joshi
Thanks Erick. TermsQueryParser worked for me. 

Thanks & Regards,
Bhaumik Joshi


From: Erick Erickson <erickerick...@gmail.com>
Sent: Friday, May 6, 2016 10:00 AM
To: solr-user
Subject: Re: Passing IDs in query takes more time

Well, you're parsing 80K IDs and forming them into a query. Consider
what has to happen. Even in the very best case of the 
being evaluated first, for every doc that satisfies that clause the inverted
index must be examined 80,000 times to see if that doc matches
one of the IDs in your huge clause for scoring purposes.

You might be better off by moving the 80K list to an fq clause like
fq={!cache=false}docid:(111 222 333).

Additionally, you probably want to use the TermsQueryParser, something like:
fq={!terms f=id cache=false}111,222,333
see:
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermsQueryParser

In any case, though, an 80K clause will slow things down considerably.

Best,
Erick

On Thu, May 5, 2016 at 2:42 AM, Bhaumik Joshi <bhaumik.jo...@outlook.com> wrote:
> Hi,
>
>
> I am retrieving ids from collection1 based on some query and passing those 
> ids as a query to collection2. The query to collection2 that contains the ids 
> takes much more time compared to a normal query.
>
>
> Que. 1 - Why does a query take more time when ids are passed, compared to a 
> normal query, even though we are narrowing the criteria by passing ids?
>
> e.g.  query-1: doc_id:(111 222 333 444 ...) AND  slower 
> (takes 7-9 sec) than
>
> only  (700-800 ms). Please note that in this case i am 
> passing 80k ids in  and retrieving 250 rows.
>
>
> Que. 2 - Any idea how I can achieve the above (get ids from one collection and 
> pass those ids to the other one) in an efficient manner, or any other way to 
> get data from one collection based on the response of another collection?
>
>
> Thanks & Regards,
>
> Bhaumik Joshi

Re: Passing IDs in query takes more time

2016-05-06 Thread Erick Erickson
Well, you're parsing 80K IDs and forming them into a query. Consider
what has to happen. Even in the very best case of the 
being evaluated first, for every doc that satisfies that clause the inverted
index must be examined 80,000 times to see if that doc matches
one of the IDs in your huge clause for scoring purposes.

You might be better off by moving the 80K list to an fq clause like
fq={!cache=false}docid:(111 222 333).

Additionally, you probably want to use the TermsQueryParser, something like:
fq={!terms f=id cache=false}111,222,333
see:
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermsQueryParser

In any case, though, an 80K clause will slow things down considerably.
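
As a rough illustration of the fq suggestion above, here is a minimal Python sketch that builds the terms-parser filter string from a list of IDs. The helper names (build_terms_fq, build_select_body), the "*:*" main query, and the rows default are assumptions made for this example and are not part of Solr. With 80K IDs the parameters will probably not fit in a URL, so the sketch encodes them as a form body that could be sent with a POST instead.

```python
from urllib.parse import urlencode


def build_terms_fq(field, ids, cache=False):
    """Return an fq value that uses the terms query parser for the given IDs.

    `field` and `ids` are supplied by the caller; nothing here is Solr API,
    it only assembles the string shown in the message above.
    """
    cache_flag = "" if cache else " cache=false"
    id_list = ",".join(str(i) for i in ids)
    return "{!terms f=%s%s}%s" % (field, cache_flag, id_list)


def build_select_body(main_query, field, ids, rows=250):
    """URL-encode the request parameters as a form body (hypothetical helper).

    The body is meant to be POSTed to a collection's select handler, which
    avoids URL length limits when the ID list is very long.
    """
    params = {
        "q": main_query,
        "fq": build_terms_fq(field, ids),
        "rows": rows,
    }
    return urlencode(params)


if __name__ == "__main__":
    # Prints: {!terms f=id cache=false}111,222,333
    print(build_terms_fq("id", [111, 222, 333]))
    print(build_select_body("*:*", "id", [111, 222, 333]))
```

The IDs are joined with commas and no spaces, matching the example in the message.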

Best,
Erick

On Thu, May 5, 2016 at 2:42 AM, Bhaumik Joshi  wrote:
> Hi,
>
>
> I am retrieving ids from collection1 based on some query and passing those 
> ids as a query to collection2. The query to collection2 that contains the ids 
> takes much more time compared to a normal query.
>
>
> Que. 1 - Why does a query take more time when ids are passed, compared to a 
> normal query, even though we are narrowing the criteria by passing ids?
>
> e.g.  query-1: doc_id:(111 222 333 444 ...) AND  slower 
> (takes 7-9 sec) than
>
> only  (700-800 ms). Please note that in this case i am 
> passing 80k ids in  and retrieving 250 rows.
>
>
> Que. 2 - Any idea how I can achieve the above (get ids from one collection and 
> pass those ids to the other one) in an efficient manner, or any other way to 
> get data from one collection based on the response of another collection?
>
>
> Thanks & Regards,
>
> Bhaumik Joshi


Re: Passing Ids in query takes more time

2016-05-05 Thread Jeff Wartes

An ID lookup is a very simple and fast query, for one ID. Or’ing a lookup for 
80k ids though is basically 80k searches as far as Solr is concerned, so it’s 
not altogether surprising that it takes a while. Your complaint seems to be 
that the query planner doesn’t know in advance that  should be 
run first, and then the id selection applied to the reduced set. 

So, I can think of a few things for you to look at, in no particular order:

1. TermsQueryParser is designed for lists of terms, so you might get better 
results from that: 
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermsQueryParser

2. If your  is the real discriminating factor in your search, 
you could just search for  and then apply your ID list as a 
PostFilter: http://yonik.com/advanced-filter-caching-in-solr/  
I guess that’d look something like ={!terms f= v="= 100 
should qualify it as a post filter, which only operates on an already-found 
result set instead of the full index. (Note: I haven’t confirmed that the Terms 
query parser supports post filtering.)

3. I’m not really aware of any storage engine that’ll love doing a filter on 
80k ids at once, but a key-value store like Cassandra might work out better for 
that.

4. There is a thing called a JoinQParserPlugin 
(https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-JoinQueryParser)
 that can join to another collection 
(https://issues.apache.org/jira/browse/SOLR-4905). But I’ve never used it, and 
there are some significant restrictions.
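
To make options 2 and 4 above a little more concrete, here is a hedged Python sketch that only assembles filter strings with Solr local-params syntax. The helper names, the field names (doc_id, id), the collection name, and the inner query are all hypothetical. Solr's documentation describes post filters as filters with cache=false and a cost of 100 or more, but only for query types that implement post filtering; whether the terms parser does is unconfirmed, as the message above notes. The join form uses the from, to, and fromIndex parameters of the join query parser, which comes with restrictions not covered here.

```python
def local_params(parser, **params):
    """Render a Solr local-params prefix such as {!terms f=id cache=false}."""
    parts = ["!" + parser] + ["%s=%s" % (key, value) for key, value in params.items()]
    return "{" + " ".join(parts) + "}"


def post_filter_fq(field, ids, cost=100):
    """Build an fq string flagged as a candidate post filter.

    Assumption: cache=false plus cost >= 100 marks the filter as a post
    filter, which only takes effect if the underlying query supports it.
    """
    if cost < 100:
        raise ValueError("a post filter needs cost >= 100")
    prefix = local_params("terms", f=field, cache="false", cost=cost)
    return prefix + ",".join(str(i) for i in ids)


def cross_collection_join_fq(from_field, to_field, from_collection, inner_query):
    """Build an fq string that joins against another collection (option 4)."""
    prefix = local_params(
        "join", **{"from": from_field, "to": to_field, "fromIndex": from_collection}
    )
    return prefix + inner_query


if __name__ == "__main__":
    print(post_filter_fq("doc_id", [111, 222, 333]))
    print(cross_collection_join_fq("id", "doc_id", "collection1", "status:active"))
```

If the join works for a given setup, it would avoid shipping the ID list between collections at all, since the first query runs inside Solr.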




On 5/5/16, 2:46 AM, "Bhaumik Joshi"  wrote:

>Hi,
>
>
>I am retrieving ids from collection1 based on some query and passing those ids 
>as a query to collection2. The query to collection2 that contains the ids 
>takes much more time compared to a normal query.
>
>
>Que. 1 - Why does a query take more time when ids are passed, compared to a 
>normal query, even though we are narrowing the criteria by passing ids?
>
>e.g.  query-1: doc_id:(111 222 333 444 ...) AND  slower 
>(passing 80k ids takes 7-9 sec) than query-2: only  (700-800 
>ms). Both return 250 records with the same set of fields.
>
>
>Que. 2 - Any idea how I can achieve the above (get ids from one collection and 
>pass those ids to the other one) in an efficient manner, or any other way to 
>get data from one collection based on the response of another collection?
>
>
>Thanks & Regards,
>
>Bhaumik Joshi


Passing Ids in query takes more time

2016-05-05 Thread Bhaumik Joshi
Hi,


I am retrieving ids from collection1 based on some query and passing those ids 
as a query to collection2. The query to collection2 that contains the ids 
takes much more time compared to a normal query.


Que. 1 - Why does a query take more time when ids are passed, compared to a 
normal query, even though we are narrowing the criteria by passing ids?

e.g.  query-1: doc_id:(111 222 333 444 ...) AND  slower 
(passing 80k ids takes 7-9 sec) than query-2: only  (700-800 
ms). Both return 250 records with the same set of fields.


Que. 2 - Any idea how I can achieve the above (get ids from one collection and 
pass those ids to the other one) in an efficient manner, or any other way to 
get data from one collection based on the response of another collection?


Thanks & Regards,

Bhaumik Joshi


Passing IDs in query takes more time

2016-05-05 Thread Bhaumik Joshi
Hi,


I am retrieving ids from collection1 based on some query and passing those ids 
as a query to collection2. The query to collection2 that contains the ids 
takes much more time compared to a normal query.


Que. 1 - Why does a query take more time when ids are passed, compared to a 
normal query, even though we are narrowing the criteria by passing ids?

e.g.  query-1: doc_id:(111 222 333 444 ...) AND  slower (takes 
7-9 sec) than

only  (700-800 ms). Please note that in this case i am passing 
80k ids in  and retrieving 250 rows.


Que. 2 - Any idea how I can achieve the above (get ids from one collection and 
pass those ids to the other one) in an efficient manner, or any other way to 
get data from one collection based on the response of another collection?


Thanks & Regards,

Bhaumik Joshi