Re: How to correctly boost results in Solr Dismax query

2009-03-17 Thread Chris Hostetter

: bq works only with q.alt query and not with q queries. So, in your case you
: would be using qf parameter for field boosting, you will have to give both
: the fields in qf parameter i.e. both title and media.

FWIW: that statement is false.  The boost query (bq) is added to the 
query regardless of whether q or q.alt is ultimately used.

If you turn on debugQuery=true and look at the debug output, 
you can see exactly what the resulting query is (parsedquery).

Using the example setup, compare the output from these examples...

http://localhost:8983/solr/select/?q.alt=baz&q=solr&defType=dismax&qf=name+cat&bq=foo&debugQuery=true
http://localhost:8983/solr/select/?q.alt=solr&q=&defType=dismax&qf=name+cat&bq=foo&debugQuery=true
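
For anyone who would rather script the comparison, here is a minimal sketch that rebuilds the two URLs above from their parameters. The helper name debug_url is made up for illustration, and the base URL simply assumes the Solr example setup on localhost:8983; nothing here contacts Solr.

```python
from urllib.parse import urlencode

# Assumed base URL of the Solr example setup mentioned above.
BASE = "http://localhost:8983/solr/select/"


def debug_url(q, q_alt):
    """Build a dismax request with a boost query and debug output enabled.

    debug_url is a hypothetical helper, not part of any Solr client.
    """
    params = [
        ("q.alt", q_alt),
        ("q", q),
        ("defType", "dismax"),
        ("qf", "name cat"),       # urlencode turns the space into '+'
        ("bq", "foo"),
        ("debugQuery", "true"),
    ]
    return BASE + "?" + urlencode(params)


# First URL: q is set, so q.alt is not needed for the main query.
with_q = debug_url(q="solr", q_alt="baz")
# Second URL: q is empty, so dismax falls back to q.alt.
with_q_alt_only = debug_url(q="", q_alt="solr")

print(with_q)
print(with_q_alt_only)
```

Fetching both URLs and comparing the parsedquery entries in the debug section should show the bq clause present in each case.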


-Hoss



Re: How to correctly boost results in Solr Dismax query

2009-03-17 Thread Chris Hostetter

: Is not particularly helpful. I tried adding a bq argument to my
: search: 
: 
: bq=media:DVD^2
: 
: (yes, this is an index of films!) but I find when I start adding more
: and more:
: 
: bq=media:DVD^2&bq=media:BLU-RAY^1.5
: 
: I find the negative results - e.g. films that are DVD but are not
: BLU-RAY get negatively affected in their score. In the end it all seems

That shouldn't be happening ... the outermost BooleanQuery (that the 
main q and all of the bq queries are added to) has its 
coordFactor disabled, so documents aren't penalized for not matching bq 
clauses.

What you may be seeing is that the raw numeric score values returned by 
Solr are lower for documents that match DVD when you add the BLU-RAY 
bq ... that's totally possible, because *absolute* scores from one query 
can't be compared to scores from another query -- what's important is 
that the *relative* order of scores for doc1 and doc2 stays consistent 
(ie: the score for a doc matching DVD might go down when you add the 
BLU-RAY bq, but the scores for *all* documents not matching BLU-RAY should 
go down as well)
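
A small sketch of the absolute-vs-relative point. The score values below are invented purely for illustration (they are not real Solr output), and the doc names are hypothetical; the only thing being shown is that every raw score can drop between two queries while the ordering stays the same.

```python
# Hypothetical scores for three docs that do NOT match BLU-RAY.
# All numbers are made up for illustration.
scores_dvd_bq_only = {"dvd_a": 1.20, "dvd_b": 1.00, "other": 0.60}
scores_both_bq = {"dvd_a": 0.90, "dvd_b": 0.75, "other": 0.45}


def ranking(scores):
    """Return doc ids sorted by descending score."""
    return sorted(scores, key=scores.get, reverse=True)


# Every raw score went down after adding the second bq ...
all_lower = all(
    scores_both_bq[doc] < scores_dvd_bq_only[doc] for doc in scores_dvd_bq_only
)
# ... but the relative order of these docs is unchanged.
same_order = ranking(scores_dvd_bq_only) == ranking(scores_both_bq)

print(all_lower, same_order)
```

So when comparing two queries, compare the orderings, not the numbers.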

The important things to look for are:
  1) are DVD docs sorting higher than they would without the DVD bq?
  2) are BLU-RAY docs sorting higher than they would without the BLU-RAY bq?
  3) are two docs that are equivalent except for a DVD/BLU-RAY distinction 
     sorting such that the doc with the higher boost comes first?


...the answers to all of those should be yes.  If you're seeing otherwise, 
please post the query toString output for both queries, and the score 
explanations for the docs in question against both queries.




-Hoss



RE: How to correctly boost results in Solr Dismax query

2009-03-16 Thread Dean Missikowski (Consultant), CLSA
If you just discovered the omitTf parameter because of this post, please
be aware that I've not really explained its purpose properly, and note
that using it will prevent phrase queries from working. See this thread
for clarification on its use:
http://mail-archives.apache.org/mod_mbox/lucene-java-user/200903.mbox/%3c897559.95769...@web50301.mail.re2.yahoo.com%3e

-- Dean


Re: How to correctly boost results in Solr Dismax query

2009-03-16 Thread Otis Gospodnetic

Also note that we have an open and related issue on Lucene's bug tracking 
system.  omitTf might get renamed so that it's more clear that positional 
information is not stored, which prevents phrase queries.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch




RE: How to correctly boost results in Solr Dismax query

2009-03-16 Thread Pete Smith
Thank you Dean. I thought I was on the right track with BQ but it was
the skewing of results that was frustrating me. I'll try out your
suggestion.

Cheers,
Pete


RE: How to correctly boost results in Solr Dismax query

2009-03-15 Thread Dean Missikowski (Consultant), CLSA
Hi,

My experience is that the BQ parameter can be used with any query type.
You can define boosts on the query fields (qf) that are used with the
query terms (q) in your query, AND you can define additional boosts for
fields that are not used with the query terms through the bq or bf
parameters. 

I think the relative weight that assigning a particular boost to a field
via BQ has on the overall scoring needs to take into consideration the
other fields in your query. If you're searching on titles, you might
want to consider setting omitNorms=true (means don't generate length
normalization vectors) for title in your schema.xml, and if you're using
Solr 1.4 omitTf=true (means don't generate term frequency vectors), so
that results aren't skewed by short and long titles, or titles that
contain multiple occurrences of the same term (setting these requires
you to reindex). I think this should have the effect of making BQ boosts
like bq=media:DVD^2&bq=media:BLU-RAY^1.5 more effective. 
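
As a rough sketch of what such a request could look like: each boost goes in its own bq parameter, separated by '&' like any other query-string parameter. The q and qf values are taken from Pete's example in this thread; building the string in Python is just one assumed way to do it, and nothing is sent to Solr here.

```python
from urllib.parse import urlencode, parse_qs

# One (name, value) pair per parameter; bq appears twice on purpose.
params = [
    ("defType", "dismax"),
    ("q", "indiana"),
    ("qf", "title"),
    ("bq", "media:DVD^2"),
    ("bq", "media:BLU-RAY^1.5"),
]

# urlencode percent-escapes ':' and '^' and joins the pairs with '&'.
query_string = urlencode(params)
print(query_string)

# Round-trip check: both boost queries survive as separate bq values.
parsed = parse_qs(query_string)
print(parsed["bq"])
```

Letting a library do the encoding also avoids the two bq values running together into a single parameter.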

-- Dean

-----Original Message-----
From: Pete Smith [mailto:pete.sm...@lovefilm.com] 
Sent: 13/03/2009 7:11 PM
To: solr-user@lucene.apache.org
Subject: Re: How to correctly boost results in Solr Dismax query

Hi,

On Fri, 2009-03-13 at 03:57 -0700, dabboo wrote:
> bq works only with q.alt query and not with q queries. So, in your case you
> would be using qf parameter for field boosting, you will have to give both
> the fields in qf parameter i.e. both title and media.
> 
> try this
> 
> <str name="qf">media^1.0 title^100.0</str>

But with that, how will it know to rank media:DVD higher than
media:BLU-RAY?

Cheers,
Pete



Re: How to correctly boost results in Solr Dismax query

2009-03-13 Thread dabboo

Hi Pete,

The bq parameter works with the q.alt query parameter. If you are passing the
search criteria using the q.alt query parameter, then the bq parameter comes
into the picture. Also, q.alt doesn't support field boosting.

If you want to boost the records by their field value then you must use the q
query parameter instead of q.alt. The 'q' parameter actually uses the qf
parameters from solrconfig.xml for field boosting.

Let me know if you have any questions.

Thanks,
Amit Garg





Pete Smith-3 wrote:
 
 Hi,
 
 I have managed to build an index in Solr which I can search on keyword,
 produce facets, query facets etc. This is all working great. I have
 implemented my search using a dismax query so it searches predetermined
 fields.
 
 However, my results are coming back sorted by score which appears to be
 calculated by keyword relevancy only. I would like to adjust the score
 where fields have pre-determined values. I think I can do this with
 boost query and boost functions but the documentation here:
 
 http://wiki.apache.org/solr/DisMaxRequestHandler#head-6862070cf279d9a09bdab971309135c7aea22fb3
 
 Is not particularly helpful. I tried adding a bq argument to my
 search: 
 
 bq=media:DVD^2
 
 (yes, this is an index of films!) but I find when I start adding more
 and more:
 
 bq=media:DVD^2&bq=media:BLU-RAY^1.5
 
 I find the negative results - e.g. films that are DVD but are not
 BLU-RAY get negatively affected in their score. In the end it all seems
 to even out and my score is as it was before I started boosting.
 
 I must be doing this wrong and I wonder whether boost function comes
 in somewhere. Any ideas on how to correctly use boost?
 
 Cheers,
 Pete
 
 -- 
 Pete Smith
 Developer
 
 No.9 | 6 Portal Way | London | W3 6RU |
 T: +44 (0)20 8896 8070 | F: +44 (0)20 8896 8111
 
 LOVEFiLM.com
 
 

-- 
View this message in context: 
http://www.nabble.com/How-to-correctly-boost-results-in-Solr-Dismax-query-tp22476204p22490850.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to correctly boost results in Solr Dismax query

2009-03-13 Thread Pete Smith
Hi Amit,

Thanks very much for your reply. What you said makes things a bit
clearer but I am still a bit confused.

On Thu, 2009-03-12 at 23:14 -0700, dabboo wrote:
 If you want to boost the records with their field value then you must use q
 query parameter instead of q.alt. 'q' parameter actually uses qf parameters
 from solrConfig for field boosting.

From the documentation for Dismax queries, I thought that q is simply
a keyword parameter:

From http://wiki.apache.org/solr/DisMaxRequestHandler:
q
The guts of the search defining the main query. This is designed to be
support raw input strings provided by users with no special escaping.
'+' and '-' characters are treated as mandatory and prohibited
modifiers for the subsequent terms. Text wrapped in balanced quote
characters '' are treated as phrases, any query containing an odd
number of quote characters is evaluated as if there were no quote
characters at all. Wildcards in this q parameter are not supported. 

And I thought 'qf' is a list of fields and boost scores:

From http://wiki.apache.org/solr/DisMaxRequestHandler:
qf (Query Fields)
List of fields and the boosts to associate with each of them when
building DisjunctionMaxQueries from the user's query. The format
supported is fieldOne^2.3 fieldTwo fieldThree^0.4, which indicates that
fieldOne has a boost of 2.3, fieldTwo has the default boost, and
fieldThree has a boost of 0.4 ... this indicates that matches in
fieldOne are much more significant than matches in fieldTwo, which are
more significant than matches in fieldThree. 
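
To make the qf format concrete, here is a minimal sketch that splits a qf string into per-field boosts. parse_qf is a hypothetical helper written for this note, not Solr's own parser, and it assumes the default boost for a field without '^' is 1.0.

```python
def parse_qf(qf, default_boost=1.0):
    """Split a qf string such as 'fieldOne^2.3 fieldTwo' into {field: boost}.

    Hypothetical helper for illustration; fields without an explicit
    '^boost' suffix get default_boost.
    """
    boosts = {}
    for token in qf.split():
        field, sep, boost = token.partition("^")
        boosts[field] = float(boost) if sep else default_boost
    return boosts


example = parse_qf("fieldOne^2.3 fieldTwo fieldThree^0.4")
print(example)
```

Note that these boosts weight whole fields against each other for the user's search terms; they say nothing about which *value* a field holds, which is the gap the question below is getting at.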

But if I want to, say, search for films with 'indiana' in the title,
with media=DVD scoring higher than media=BLU-RAY then do I need to do
something like:

solr/select?q=indiana

And in my config:

<str name="qf">media^2</str>

But I don't see where the actual *contents* of the media field would
determine the boost.

Sorry if I have misunderstood what you mean.

Cheers,
Pete

-- 
Pete Smith
Developer

No.9 | 6 Portal Way | London | W3 6RU |
T: +44 (0)20 8896 8070 | F: +44 (0)20 8896 8111

LOVEFiLM.com


Re: How to correctly boost results in Solr Dismax query

2009-03-13 Thread dabboo

Pete,

Sorry if I wasn't clear. Here is the explanation.

Suppose you have 2 records, and they have films and media as 2 columns.

The first record has films=Indiana and media=blue ray,
and the 2nd record has films=Bond and media=Indiana.

Values for the qf parameter:

<str name="qf">media^2.0 films^1.0</str>

Now search for q=Indiana. It should display both of the records, but
record #2 will be displayed above the 1st.

Let me know if you still have questions.

Cheers,
amit



-- 
View this message in context: 
http://www.nabble.com/How-to-correctly-boost-results-in-Solr-Dismax-query-tp22476204p22493646.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to correctly boost results in Solr Dismax query

2009-03-13 Thread Pete Smith
Hi Amit,

Thanks again for your reply. I am understanding it a bit better but I
think it would help if I posted an example. Say I have three records:

<doc>
<long name="id">1</long>
<str name="media">BLU-RAY</str>
<str name="title">Indiana Jones and the Kingdom of the Crystal Skull</str>
</doc>
<doc>
<long name="id">2</long>
<str name="media">DVD</str>
<str name="title">Indiana Jones and the Kingdom of the Crystal Skull</str>
</doc>
<doc>
<long name="id">3</long>
<str name="media">DVD</str>
<str name="title">Casino Royale</str>
</doc>

Now, if I search for indiana: select?q=indiana

I want the first two records to come back (not the third, as it does not
contain 'indiana'). I would like record 2 to be scored higher than
record 1, as its media type is DVD.

At the moment I have in my config:

<str name="qf">title</str>

And I was trying to boost by media having a specific value by using 'bq',
but from what you told me that is incorrect.
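
For reference, the request under discussion (keyword search on title, with boost queries on media) can be written out explicitly. The Python sketch below only assembles the URL; the host, port and path are assumed from the stock Solr example setup, and build_dismax_url is a hypothetical helper, not part of Solr. debugQuery=true is included so the parsed query can be inspected in the response.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Assumed base URL of the stock Solr example; adjust for a real install.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_dismax_url(keywords, query_fields, boost_queries):
    """Build a dismax request URL (hypothetical helper, not part of Solr)."""
    params = [
        ("defType", "dismax"),
        ("q", keywords),
        ("qf", " ".join(query_fields)),
        ("debugQuery", "true"),
    ]
    # Each boost query becomes its own bq parameter, joined with '&'.
    params.extend(("bq", bq) for bq in boost_queries)
    return SOLR_SELECT + "?" + urlencode(params)


url = build_dismax_url("indiana", ["title"], ["media:DVD^2", "media:BLU-RAY^1.5"])
print(url)
```

urlencode percent-encodes the ':' and '^' characters, so the printed URL is safe to paste into a browser or pass to an HTTP client.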

Cheers,
Pete


On Fri, 2009-03-13 at 03:21 -0700, dabboo wrote:
 Pete,
 
 Sorry if I wasn't clear. Here is the explanation.
 
 Suppose you have 2 records, each with films and media as its 2 fields.
 
 The first record has the values films=Indiana and media=blue ray,
 and the 2nd record has the values films=Bond and media=Indiana.
 
 Values for the qf parameter:
 
 <str name="qf">media^2.0 films^1.0</str>
 
 Now search for q=Indiana: it should return both of the records, but
 record #2 will be ranked above the 1st.
 
 Let me know if you still have questions.
 
 Cheers,
 amit
 
 
 Pete Smith-3 wrote:
  
  Hi Amit,
  
  Thanks very much for your reply. What you said makes things a bit
  clearer but I am still a bit confused.
  
  On Thu, 2009-03-12 at 23:14 -0700, dabboo wrote:
  If you want to boost the records with their field value then you must use
  the q query parameter instead of q.alt. The 'q' parameter actually uses the
  qf parameters from solrConfig for field boosting.
  
 From the documentation for Dismax queries, I thought that q is simply
  a keyword parameter:
  
 From http://wiki.apache.org/solr/DisMaxRequestHandler:
  q
  The guts of the search defining the main query. This is designed to
  support raw input strings provided by users with no special escaping.
  '+' and '-' characters are treated as mandatory and prohibited
  modifiers for the subsequent terms. Text wrapped in balanced quote
  characters '"' is treated as a phrase; any query containing an odd
  number of quote characters is evaluated as if there were no quote
  characters at all. Wildcards in this q parameter are not supported. 
  
  And I thought 'qf' is a list of fields and boost scores:
  
 From http://wiki.apache.org/solr/DisMaxRequestHandler:
  qf (Query Fields)
  List of fields and the boosts to associate with each of them when
  building DisjunctionMaxQueries from the user's query. The format
  supported is fieldOne^2.3 fieldTwo fieldThree^0.4, which indicates that
  fieldOne has a boost of 2.3, fieldTwo has the default boost, and
  fieldThree has a boost of 0.4 ... this indicates that matches in
  fieldOne are much more significant than matches in fieldTwo, which are
  more significant than matches in fieldThree. 
  
  But if I want to, say, search for films with 'indiana' in the title,
  with media=DVD scoring higher than media=BLU-RAY then do I need to do
  something like:
  
  solr/select?q=indiana
  
  And in my config:
  
  <str name="qf">media^2</str>
  
  But I don't see where the actual *contents* of the media field would
  determine the boost.
  
  Sorry if I have misunderstood what you mean.
  
  Cheers,
  Pete
  

Re: How to correctly boost results in Solr Dismax query

2009-03-13 Thread dabboo

Pete,

bq works only with q.alt queries and not with q queries. So, in your case, you
would be using the qf parameter for field boosting, and you will have to give
both fields in the qf parameter, i.e. both title and media.

Try this:

<str name="qf">media^1.0 title^100.0</str>
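
One way to see what a qf setting like this does to the three example records is a toy model of dismax scoring. The Python sketch below is an assumption-laden simplification, not Lucene's real formula: a field "matches" when it contains the keyword as a whole token, a match scores exactly its qf boost, and the document takes the maximum over the qf fields. DOCS and dismax_score are illustrative names only.

```python
# Toy model of dismax scoring: NOT Lucene's real formula (no tf-idf, no norms).
DOCS = {
    1: {"media": "BLU-RAY", "title": "Indiana Jones and the Kingdom of the Crystal Skull"},
    2: {"media": "DVD", "title": "Indiana Jones and the Kingdom of the Crystal Skull"},
    3: {"media": "DVD", "title": "Casino Royale"},
}


def dismax_score(doc, keyword, qf):
    """Max over the qf fields of (boost if the keyword occurs in the field, else 0)."""
    return max(
        boost if keyword.lower() in doc[field].lower().split() else 0.0
        for field, boost in qf.items()
    )


qf = {"media": 1.0, "title": 100.0}
scores = {doc_id: dismax_score(doc, "indiana", qf) for doc_id, doc in DOCS.items()}
print(scores)  # {1: 100.0, 2: 100.0, 3: 0.0}
```

In this model records 1 and 2 tie, because the keyword 'indiana' never occurs in the media field: qf decides where the user's keywords are looked up, and the value stored in media plays no part in the score.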




Re: How to correctly boost results in Solr Dismax query

2009-03-13 Thread Pete Smith
Hi,

On Fri, 2009-03-13 at 03:57 -0700, dabboo wrote:
 bq works only with q.alt queries and not with q queries. So, in your case, you
 would be using the qf parameter for field boosting, and you will have to give
 both fields in the qf parameter, i.e. both title and media.
 
 Try this:
 
 <str name="qf">media^1.0 title^100.0</str>

But with that, how will it know to rank media:DVD higher than
media:BLU-RAY?
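
To make the question concrete, here is a toy Python sketch of ranking by field value, using the three example records. It assumes the keyword match on title is required and that each boost query is an optional clause whose boost is simply added to the score when it matches; real Lucene scores are computed differently, and rank is a hypothetical function, not a Solr API.

```python
DOCS = [
    {"id": 1, "media": "BLU-RAY", "title": "Indiana Jones and the Kingdom of the Crystal Skull"},
    {"id": 2, "media": "DVD", "title": "Indiana Jones and the Kingdom of the Crystal Skull"},
    {"id": 3, "media": "DVD", "title": "Casino Royale"},
]


def rank(docs, keyword, boost_queries):
    """Toy ranking: required title match plus optional additive boost clauses."""
    ranked = []
    for doc in docs:
        if keyword.lower() not in doc["title"].lower().split():
            continue  # the keyword match is required; a boost alone admits nothing
        score = 1.0  # stand-in for the keyword relevancy score
        for (field, value), boost in boost_queries.items():
            if doc[field] == value:
                score += boost  # a non-matching boost clause adds nothing, no penalty
        ranked.append((score, doc["id"]))
    return [doc_id for score, doc_id in sorted(ranked, reverse=True)]


order = rank(DOCS, "indiana", {("media", "DVD"): 2.0, ("media", "BLU-RAY"): 1.5})
print(order)  # [2, 1]
```

Under these assumptions the DVD record ranks above the BLU-RAY one, and Casino Royale is still excluded even though it is a DVD.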

Cheers,
Pete

