Re: how to write an efficient query with a subquery to restrict the search space?
Hi,

Sounds like a possible document and query routing use case.

Otis
Solr & ElasticSearch Support
http://sematext.com/

On Jan 31, 2014 7:11 AM, "svante karlsson" wrote:
> It seems to be faster to first restrict the search space and then do the
> scoring, compared to just using the full query and letting Solr handle
> everything.
>
> For example, in my application one of the scoring fields effectively hits
> 1/12 of the database (a month field), and if we have 100'' items in the
> database then this matters.
>
> /svante
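One way to read the routing suggestion, sketched under assumptions: with SolrCloud's compositeId router, a document id of the form `prefix!id` keeps documents that share a prefix on the same shard, and the `_route_` request parameter limits a query to the shards for that prefix. The month-based prefix, the id values, and the helper names below are hypothetical; the thread does not say how the collection is sharded.

```python
from urllib.parse import urlencode


def routed_id(route_key: str, doc_id: str) -> str:
    """Build a compositeId-style document id: '<route_key>!<doc_id>'."""
    return f"{route_key}!{doc_id}"


def routed_query_params(query: str, route_key: str, rows: int = 100) -> str:
    """URL-encode query parameters that restrict the search to one route key."""
    return urlencode({"q": query, "_route_": f"{route_key}!", "rows": rows})


# Hypothetical example: route by the month field mentioned in the thread.
print(routed_id("2014-01", "doc42"))
print(routed_query_params("field2:val2 OR field4:val4", "2014-01"))
```

Whether routing by month helps depends on how evenly the data and the query load spread across months, which the thread does not cover.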
Re: how to write an efficient query with a subquery to restrict the search space?
It seems to be faster to first restrict the search space and then do the scoring, compared to just using the full query and letting Solr handle everything.

For example, in my application one of the scoring fields effectively hits 1/12 of the database (a month field), and if we have 100'' items in the database then this matters.

/svante

2014-01-30 Jack Krupansky:
> Lucene's default scoring should give you much of what you want - ranking
> hits of low-frequency terms higher - without any special query syntax -
> just list out your terms and use "OR" as your default operator.
>
> -- Jack Krupansky
Re: how to write an efficient query with a subquery to restrict the search space?
Lucene's default scoring should give you much of what you want - ranking hits of low-frequency terms higher - without any special query syntax - just list out your terms and use "OR" as your default operator.

-- Jack Krupansky

-----Original Message-----
From: svante karlsson
Sent: Thursday, January 23, 2014 6:42 AM
To: solr-user@lucene.apache.org
Subject: how to write an efficient query with a subquery to restrict the search space?

I have a Solr db containing 1 billion records that I'm trying to use in a NoSQL fashion.

What I want to do is find the best matches using all search terms but restrict the search space to the most unique terms.

In this example I know that val2 and val4 are rare terms and val1 and val3 are more common. In my real scenario I'll have 20 fields that I want to include or exclude in the inner query depending on the uniqueness of the requested value.

My first approach was:
q=field1:val1 OR field2:val2 OR field3:val3 OR field4:val4 AND (field2:val2 OR field4:val4)&rows=100&fl=*

but what I think I get is
field4:val4 AND (field2:val2 OR field4:val4)
and this result is then OR'ed with the rest.

If I write
q=(field1:val1 OR field2:val2 OR field3:val3 OR field4:val4) AND (field2:val2 OR field4:val4)&rows=100&fl=*

then what I think I get is two sub-queries that are evaluated separately and then joined - performance-wise this is bad.

What's the best way to write these types of queries?

Are there any performance issues when running it on several SolrCloud nodes vs a single instance, or should it scale?

/svante
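To illustrate why rare terms tend to rank higher without special syntax: Lucene's classic TF-IDF similarity computes inverse document frequency as 1 + ln(numDocs / (docFreq + 1)). The sketch below applies that formula to made-up document frequencies for a 1-billion-document index. The frequencies are hypothetical, and the real score also depends on term frequency, norms, and other factors.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """Inverse document frequency as in Lucene's classic TF-IDF similarity."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


NUM_DOCS = 1_000_000_000

# Hypothetical document frequencies: val1/val3 common, val2/val4 rare.
doc_freqs = {
    "field1:val1": 80_000_000,
    "field2:val2": 1_000,
    "field3:val3": 50_000_000,
    "field4:val4": 5_000,
}

# Print clauses from highest to lowest idf; rarer terms come first.
for clause, df in sorted(doc_freqs.items(), key=lambda item: item[1]):
    print(f"{clause}: docFreq={df}, idf={classic_idf(df, NUM_DOCS):.2f}")
```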
Re: how to write an efficient query with a subquery to restrict the search space?
Maybe you could move (field2:val2 OR field4:val4) into a filter? E.g.,

q=(field1:val1 OR field2:val2 OR field3:val3 OR field4:val4)&fq=(field2:val2 OR field4:val4)

If I have this correctly, the fq part should be evaluated first, and may even be found in the filter cache.

On Thu, Jan 23, 2014 at 12:42 PM, svante karlsson wrote:
> I have a Solr db containing 1 billion records that I'm trying to use in a
> NoSQL fashion.
>
> What I want to do is find the best matches using all search terms but
> restrict the search space to the most unique terms.
>
> In this example I know that val2 and val4 are rare terms and val1 and val3
> are more common. In my real scenario I'll have 20 fields that I want to
> include or exclude in the inner query depending on the uniqueness of the
> requested value.
>
> My first approach was:
> q=field1:val1 OR field2:val2 OR field3:val3 OR field4:val4 AND (field2:val2 OR field4:val4)&rows=100&fl=*
>
> but what I think I get is
> field4:val4 AND (field2:val2 OR field4:val4)
> and this result is then OR'ed with the rest.
>
> If I write
> q=(field1:val1 OR field2:val2 OR field3:val3 OR field4:val4) AND (field2:val2 OR field4:val4)&rows=100&fl=*
>
> then what I think I get is two sub-queries that are evaluated separately and
> then joined - performance-wise this is bad.
>
> What's the best way to write these types of queries?
>
> Are there any performance issues when running it on several SolrCloud nodes
> vs a single instance, or should it scale?
>
> /svante
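A minimal sketch of the request this reply describes, keeping the scoring clauses in `q` and the restricting clauses in `fq`. The host, port, and collection name are placeholders and are not taken from the thread.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the real host and collection.
SOLR_SELECT_URL = "http://localhost:8983/solr/collection1/select"


def build_select_url(scoring_clauses, filter_clauses, rows=100):
    """Build a select URL with scoring clauses in q and restricting clauses in fq."""
    params = [
        ("q", " OR ".join(scoring_clauses)),   # contributes to the score
        ("fq", " OR ".join(filter_clauses)),   # restricts matches, no scoring
        ("rows", rows),
        ("fl", "*"),
    ]
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


url = build_select_url(
    ["field1:val1", "field2:val2", "field3:val3", "field4:val4"],
    ["field2:val2", "field4:val4"],
)
print(url)
```

Because filter results can be cached, a filter that changes with every request (as a per-query rare-term filter might) may see little benefit from the filter cache.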
how to write an efficient query with a subquery to restrict the search space?
I have a Solr db containing 1 billion records that I'm trying to use in a NoSQL fashion.

What I want to do is find the best matches using all search terms but restrict the search space to the most unique terms.

In this example I know that val2 and val4 are rare terms and val1 and val3 are more common. In my real scenario I'll have 20 fields that I want to include or exclude in the inner query depending on the uniqueness of the requested value.

My first approach was:
q=field1:val1 OR field2:val2 OR field3:val3 OR field4:val4 AND (field2:val2 OR field4:val4)&rows=100&fl=*

but what I think I get is
field4:val4 AND (field2:val2 OR field4:val4)
and this result is then OR'ed with the rest.

If I write
q=(field1:val1 OR field2:val2 OR field3:val3 OR field4:val4) AND (field2:val2 OR field4:val4)&rows=100&fl=*

then what I think I get is two sub-queries that are evaluated separately and then joined - performance-wise this is bad.

What's the best way to write these types of queries?

Are there any performance issues when running it on several SolrCloud nodes vs a single instance, or should it scale?

/svante
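The selection step described in this question, putting a clause into the restricting part only when its value is rare, could be sketched as follows. The document-frequency lookup, the threshold, and the function name are hypothetical; the thread does not say how rarity is determined. Explicit parentheses keep the grouping unambiguous.

```python
def build_restricted_query(clauses, doc_freq, max_rare_df):
    """Return '(all clauses) AND (rare clauses)' with explicit grouping.

    A clause counts as rare only if its document frequency is known and is
    at most max_rare_df. With no rare clauses, only the scoring part is used.
    """
    rare = [c for c in clauses if c in doc_freq and doc_freq[c] <= max_rare_df]
    scoring_part = "(" + " OR ".join(clauses) + ")"
    if not rare:
        return scoring_part
    return scoring_part + " AND (" + " OR ".join(rare) + ")"


# Hypothetical document frequencies for the four example clauses.
example_df = {
    "field1:val1": 80_000_000,
    "field2:val2": 1_000,
    "field3:val3": 50_000_000,
    "field4:val4": 5_000,
}
clauses = ["field1:val1", "field2:val2", "field3:val3", "field4:val4"]
query = build_restricted_query(clauses, example_df, max_rare_df=10_000)
print(query)
```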