[Solr Wiki] Trivial Update of "SolrRelevancyFAQ" by MarcSturlese

Apache Wiki Thu, 04 Feb 2010 16:08:24 -0800

The "SolrRelevancyFAQ" page has been changed by MarcSturlese.
The comment on this change is: The bq clause was wrong.
http://wiki.apache.org/solr/SolrRelevancyFAQ?action=diff&rev1=16&rev2=17

--------------------------------------------------

  = Solr Relevancy FAQ =
- 
  Relevancy is the quality of results returned from a query, encompassing both 
which documents are found and their relative ranking (the order in which they 
are returned to the user).
  
  <<TableOfContents>>
  
  <<Anchor(standard_vs_dismax)>>
+ 
  == Should I use the standard or dismax request handler ==
  The [[StandardRequestHandler|standard]] request handler uses SolrQuerySyntax 
to specify the query via the '''q''' parameter, and it must be well formed or 
an error will be returned.  It's good for specifying exact, arbitrarily complex 
queries.
  
@@ -15, +15 @@

  For servicing user-entered queries, start by using dismax.
  
  <<Anchor(multiFieldQuery)>>
+ 
  == How can I search for "superman" in both the title and subject fields ==
  The standard request handler uses SolrQuerySyntax for '''q''':
  
@@ -24, +25 @@

  
  {{{q=superman&qf=title subject}}}
  
- 
  == How can I make "superman" in the title field score higher than in the 
subject field ==
  For the standard request handler, "boost" the clause on the title field:
  
@@ -34, +34 @@

  
  {{{q=superman&qf=title^2 subject}}}
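
When these parameters are sent from client code, the boost character and the space in '''qf''' need URL encoding. The following is a minimal sketch, assuming a Solr instance at http://localhost:8983/solr/select and that dismax is selected with defType=dismax; the helper name dismax_url is hypothetical and not part of any Solr client library.

```python
from urllib.parse import urlencode


def dismax_url(base_url, user_query, field_boosts):
    """Build a dismax request URL.

    field_boosts is a list of (field, boost) pairs; a boost of None
    means the field is listed in qf without an explicit boost.
    """
    qf = " ".join(
        field if boost is None else f"{field}^{boost}"
        for field, boost in field_boosts
    )
    params = {"defType": "dismax", "q": user_query, "qf": qf}
    # urlencode percent-encodes "^" and turns spaces into "+".
    return base_url + "?" + urlencode(params)


url = dismax_url(
    "http://localhost:8983/solr/select",  # assumed local instance
    "superman",
    [("title", 2), ("subject", None)],
)
print(url)
```

Keeping the boosts in a list of pairs makes it easy to adjust the relative weight of fields without editing query strings by hand.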
  
- 
- 
  == Why are search results returned in the order they are? ==
  If no other sort order is specified, the default is by relevancy score.
  
- 
  == How can I see the relevancy scores for search results ==
- Request that the pseudo-field named "score" be returned by adding it to the 
'''fl''' (field list) parameter. The "score" will then appear along with the 
stored fields in returned documents.
+ Request that the pseudo-field named "score" be returned by adding it to the 
'''fl''' (field list) parameter. The "score" will then appear along with the 
stored fields in returned documents. {{{q=Justice League&fl=*,score}}}
- {{{q=Justice League&fl=*,score}}}
- 
  
  == Why doesn't my query of "flash" match a field containing "Flash" (with a 
capital "F") ==
- 
  The fieldType for the field containing "Flash" must have an analyzer that 
lowercases terms.  This will cause all searches on that field to be case 
insensitive.
  
  See AnalyzersTokenizersTokenFilters for more.
- 
  
  == How can I make exact-case matches score higher ==
  Example: a query of "Penguin" should score documents containing "Penguin" 
higher than docs containing "penguin".
  
+ The general strategy is to index the content twice, using different fields 
with different fieldTypes (and different analyzers associated with those 
fieldTypes). One analyzer will contain a lowercase filter for case-insensitive 
matches, and one will preserve case for exact-case matches.
- The general strategy is to index the content twice, using different fields
- with different fieldTypes (and different analyzers associated with those 
fieldTypes).
- One analyzer will contain a lowercase filter for case-insensitive matches, 
and one will preserve case for exact-case matches.
  
- Use [[SchemaXml#copyField|copyField]] commands in the schema to index a single
+ Use [[SchemaXml#copyField|copyField]] commands in the schema to index a 
single input field multiple times.
- input field multiple times.
  
- Once the content is indexed into multiple fields that are analyzed 
differently, 
+ Once the content is indexed into multiple fields that are analyzed 
differently,  [[#multiFieldQuery|query across both fields]].
- [[#multiFieldQuery|query across both fields]].
  
  == I'm getting query parse exceptions when making queries ==
  For the standard request handler, the '''q''' parameter must be correctly 
formatted SolrQuerySyntax, with any special characters escaped.  If this is a 
user-entered query, [[#standard_vs_dismax|consider using the dismax handler]].
  
- Many other parameters such as '''fq''' and '''facet.query''' must also 
conform to SolrQuerySyntax
+ Many other parameters such as '''fq''' and '''facet.query''' must also 
conform to SolrQuerySyntax regardless of which handler is used.
- regardless of which handler is used.
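
If user input has to be passed to a parameter that is parsed as SolrQuerySyntax, one option is to backslash-escape the special characters before sending it. This is a sketch only: the function name escape_query_term is hypothetical, and the character list is an assumption based on the special characters documented for the Lucene query parser, so it should be checked against the version in use.

```python
import re

# Assumed set of characters with special meaning in the Lucene query syntax.
_SPECIAL_CHARS = re.compile(r'([+\-!(){}\[\]^"~*?:\\|&])')


def escape_query_term(text):
    """Prefix every special character in text with a backslash."""
    return _SPECIAL_CHARS.sub(r"\\\1", text)


print(escape_query_term("Spider-Man (2002)"))
```

Escaping turns operators into literal text, so it is only appropriate when the input is meant to be matched as-is rather than interpreted as a query.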
  
  == How can I make queries of "spiderman" and "spider man" match "Spider-Man" 
==
+ [[AnalyzersTokenizersTokenFilters#WordDelimiterFilter|WordDelimiterFilter]] 
can be used in the analyzer for the field being queried to match words with 
intra-word delimiters such as dashes or case changes.
- [[AnalyzersTokenizersTokenFilters#WordDelimiterFilter|WordDelimiterFilter]] 
can be used
- in the analyzer for the field being queried to match words with intra-word 
delimiters
- such as dashes or case changes.
- 
  
  == How can I search for one term near another term (say, "batman" and 
"movie") ==
+ A proximity search can be done with a sloppy phrase query.  The closer 
together the two terms appear in the document, the higher the score will be.  A 
sloppy phrase query  specifies a maximum "slop", or the number of positions 
tokens need to be moved to get a match.
- A proximity search can be done with a sloppy phrase query.  The closer 
together the two
- terms appear in the document, the higher the score will be.  A sloppy phrase 
query 
- specifies a maximum "slop", or the number of positions tokens need to be 
moved to get a match.
  
  This example for the standard request handler will find all documents where 
"batman" occurs within 100 words of "movie":
  
@@ -90, +73 @@

  
  {{{q=batman movie&pf=text&ps=100}}}
  
- The dismax handler also allows users to explicitly specify a phrase query 
with double quotes, and the '''qs'''(query slop) parameter can be used to add 
slop to any explicit phrase
+ The dismax handler also allows users to explicitly specify a phrase query 
with double quotes, and the '''qs''' (query slop) parameter can be used to add 
slop to any explicit phrase queries:
- queries:
  
  {{{q="batman movie"&qs=100}}}
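
To make the notion of slop concrete, the sketch below computes the smallest slop at which a two-term phrase would match a list of tokens. It is a simplified model for illustration, not the code Solr runs: the function name required_slop is hypothetical, the tokens are assumed to be already analyzed, and the two terms are assumed to be distinct.

```python
def required_slop(tokens, first, second):
    """Smallest slop at which the phrase "first second" matches tokens.

    Returns None if either term is missing.
    """
    first_positions = [i for i, t in enumerate(tokens) if t == first]
    second_positions = [i for i, t in enumerate(tokens) if t == second]
    if not first_positions or not second_positions:
        return None
    best = None
    for i in first_positions:
        for j in second_positions:
            # In order: count the positions between the two terms.
            # Reversed: the terms must also be moved past each other.
            cost = j - i - 1 if j > i else i - j + 1
            if best is None or cost < best:
                best = cost
    return best


print(required_slop("batman the animated movie".split(), "batman", "movie"))
```

In this model an exact phrase needs a slop of 0, and two adjacent terms in reversed order need a slop of 2.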
- 
  
  == How can I increase the score for specific documents ==
  === index-time boosts ===
@@ -113, +94 @@

  Solr can parse function queries in the following 
[[http://lucene.apache.org/solr/api/org/apache/solr/search/QueryParsing.html#parseFunction(java.lang.String,%20org.apache.solr.schema.IndexSchema)|syntax]].
  
  Some examples...
+ 
  {{{
    # simple boosts by popularity
    q=%2Bsupervillians+_val_:"popularity"
@@ -122, +104 @@

    q=%2Bsupervillians+_val_:"scale(popularity,0,100)"
    defType=dismax&qf=text&q=supervillians&bf=sqrt(popularity)
  }}}
- 
  == How are documents scored ==
  Basic scoring factors:
+ 
-   * tf stands for term frequency - the more times a search term appears in a 
document, the higher the score
+  * tf stands for term frequency - the more times a search term appears in a 
document, the higher the score
-   * idf stands for inverse document frequency - matches on rarer terms count 
more than matches on common terms 
+  * idf stands for inverse document frequency - matches on rarer terms count 
more than matches on common terms
-   * coord is the coordination factor - if there are multiple terms in a 
query, the more terms that match, the higher the score
+  * coord is the coordination factor - if there are multiple terms in a query, 
the more terms that match, the higher the score
-   * lengthNorm - matches on a smaller field score higher than matches on a 
larger field
+  * lengthNorm - matches on a smaller field score higher than matches on a 
larger field
-   * index-time boost - if a boost was specified for a document at index time, 
scores for searches that match that document will be boosted.
+  * index-time boost - if a boost was specified for a document at index time, 
scores for searches that match that document will be boosted.
-   * query clause boost - a user may explicitly boost the contribution of one 
part of a query over another.
+  * query clause boost - a user may explicitly boost the contribution of one 
part of a query over another.
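
The first four factors above can be sketched numerically. The formulas below follow what the Lucene scoring documentation describes for the default similarity of that era; treat them as an illustration under that assumption, since the real score also involves query normalization and lossy encoding of norms. All function names here are hypothetical.

```python
import math


def tf(freq):
    """Term frequency factor: grows with the square root of the count."""
    return math.sqrt(freq)


def idf(doc_freq, num_docs):
    """Inverse document frequency: rarer terms get a larger factor."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def length_norm(num_terms):
    """Shorter fields get a larger factor."""
    return 1.0 / math.sqrt(num_terms)


def coord(matched_terms, query_terms):
    """Fraction of the query terms that the document matched."""
    return matched_terms / query_terms


# A term appearing 3 times versus once, all else being equal.
print(tf(3) / tf(1))
```

Because tf uses a square root, a term that occurs three times contributes about 1.7 times as much as a single occurrence, not three times as much.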
  
  See the [[http://lucene.apache.org/java/2_4_0/scoring.html|Lucene scoring 
documentation]] for more info.
  
- 
  == Why does id:archangel come before id:hawkgirl when querying for "wings" ==
- Add '''debugQuery=on''' to your request, and you will get (fairly dense) 
detailed scoring information for each document returned. 
+ Add '''debugQuery=on''' to your request, and you will get (fairly dense) 
detailed scoring information for each document returned.
  
  {{{q=wings&indent=on&debugQuery=on}}}
  
  This extra information will appear in the "explain" section of the "debug" 
section in the response.
+ 
  {{{
  <response>
  <result>[...]</result>
@@ -164, +146 @@

  </str>
  [...]
  }}}
- 
  In this specific example, we see that the main scoring difference between the 
two documents is the '''tf''' (term frequency) factor.  The text field for 
the '''id:archangel''' document contains the term '''wings''' 3 times 
({{{termFreq(text:wings)=3}}}) while the '''id:hawkgirl''' document only 
contains it once.
  
  Debug info is expensive to generate, and should only be used for debugging 
problems with specific queries.
@@ -172, +153 @@

  Debug info can also be selected from the admin query page, 
http://localhost:8983/solr/admin/form.jsp
  
  == Why doesn't document id:juggernaut appear in the top 10 results for my 
query ==
+ Since '''debugQuery=on''' only gives you scoring "explain" info for the 
documents returned, the '''explainOther''' parameter can be used to specify 
other documents you want detailed scoring info for.
- Since '''debugQuery=on''' only gives you scoring "explain" info for the 
documents
- returned, the '''explainOther''' parameter can be used to specify other 
documents
- you want detailed scoring info for.
  
  {{{q=supervillians&debugQuery=on&explainOther=id:juggernaut}}}
  
  Now you should be able to examine the scoring explain info of the top 
matching documents, compare it to the explain info for documents matching 
id:juggernaut, and determine why the rankings are not as you expect.
  
- 
  == How can I boost the score of newer documents ==
-   * Do an explicit sort by date (relevancy scores are ignored)
+  * Do an explicit sort by date (relevancy scores are ignored)
-   * Use an index-time boost that is larger for newer documents
+  * Use an index-time boost that is larger for newer documents
-   * Use a FunctionQuery to influence the score based on a date field.
+  * Use a FunctionQuery to influence the score based on a date field.
-     * In Solr 1.3, use something of the form recip(rord(myfield),1,1000,1000)
+   * In Solr 1.3, use something of the form recip(rord(myfield),1,1000,1000)
-     * In Solr 1.4, use something of the form 
recip(ms(NOW,mydatefield),3.16e-11,1,1)
+   * In Solr 1.4, use something of the form 
recip(ms(NOW,mydatefield),3.16e-11,1,1)
+  
http://lucene.apache.org/solr/api/org/apache/solr/search/function/ReciprocalFloatFunction.html
 
http://lucene.apache.org/solr/api/org/apache/solr/search/BoostQParserPlugin.html
-   
http://lucene.apache.org/solr/api/org/apache/solr/search/function/ReciprocalFloatFunction.html
- 
-   
http://lucene.apache.org/solr/api/org/apache/solr/search/BoostQParserPlugin.html
  
  A full example of a query for "ipod" with the score boosted higher the newer 
the product is:
+ 
  {{{
  http://localhost:8983/solr/select?q={!boost 
b=recip(ms(NOW,manufacturedate_dt),3.16e-11,1,1)}ipod
  }}}
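
To see what the constants in this boost do, the function can be evaluated by hand. The sketch below assumes that recip(x,m,a,b) computes a/(m*x+b) and that ms(NOW,manufacturedate_dt) yields the document age in milliseconds; the Python names are hypothetical and only mirror the Solr function for illustration.

```python
MS_PER_YEAR = 365 * 24 * 60 * 60 * 1000  # about 3.15e10 milliseconds


def recip(x, m, a, b):
    """Reciprocal function, assumed to be a / (m * x + b)."""
    return a / (m * x + b)


def date_boost(age_ms):
    """Boost for a document that is age_ms milliseconds old."""
    return recip(age_ms, 3.16e-11, 1, 1)


for years in (0, 1, 10):
    print(years, round(date_boost(years * MS_PER_YEAR), 3))
```

Since 3.16e-11 is roughly the reciprocal of the number of milliseconds in a year, a brand-new document gets a boost of 1, a one-year-old document about half of that, and the boost keeps falling smoothly with age.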
- 
  One can simplify the implementation by decomposing the query into multiple 
arguments:
+ 
  {{{
  http://localhost:8983/solr/select?q={!boost b=$dateboost 
v=$qq}&dateboost=recip(ms(NOW,manufacturedate_dt),3.16e-11,1,1)&qq=ipod
  }}}
- 
  Now the main "q" argument as well as the "dateboost" argument may be 
specified as defaults in a search handler in solrconfig.xml, and clients would 
only need to pass "qq", the user query.
  
  To boost another query type such as a dismax query, the value of the boost 
query is a full sub-query and hence can use the {!querytype} syntax. 
Alternately, the defType param can be used in the boost local params to set the 
default type to dismax.  The other dismax parameters may be set as top level 
parameters.
+ 
  {{{
  http://localhost:8983/solr/select?q={!boost b=$dateboost v=$qq 
defType=dismax}&dateboost=recip(ms(NOW,manufacturedate_dt),3.16e-11,1,1)&qf=text&pf=text&qq=ipod
  }}}
- 
  == How do I give a very low boost to documents that match my query ==
  In general the problem is that a "low" boost is still a boost; it can only 
improve the score of documents that match. One way to fake a "negative boost" 
is to give a high boost to everything that does *not* match. For example:
  
-        bq=(*:* -field_a:54^10000)
+  . bq=(*:* -field_a:54)^10000
  
  TODO: If "bq" supports pure negative queries then you can simplify that to 
bq=-field_a:54^10000
  
  == TODO ==
  /!\ :TODO: /!\
+ 
   * shorter fields score higher
   * filter vs query clause
   * when should index-time boosts be used
