Boosting query results

Mark T. Trembley Thu, 07 Jul 2016 07:30:50 -0700

I have a question about the best way to rank my results based on a score field that can have different values per document, where each document can have different scores depending on which term is queried.

Essentially, what I want is to provide a list of terms that, when matched by a query, return a corresponding score to help boost the original document. So if I had a document with a multi-valued field named B1_ss containing the terms [Boost1|10], [Boost2|20], [Boost3|100] and my search query is "Boost2", I want that document's result to be boosted by 20. Also note that "Boost2" can boost different documents at different levels. The query that selects the actual documents will select against other fields in the document and could return documents with any combination of B1 terms.
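To make the intended behaviour concrete, here is a minimal Python model of it. boost_for is a hypothetical helper, not a Solr API; it only looks up the weight for the queried term and leaves open how that weight is combined with the document's relevance score:

```python
# Hypothetical model of the behaviour I want: each B1_ss entry is "term|weight"
# and a query for a term should boost the document by that term's weight.
from typing import Iterable, Optional


def boost_for(b1_ss: Iterable[str], query_term: str) -> Optional[float]:
    """Return the weight paired with query_term, or None if the doc has none."""
    for entry in b1_ss:
        term, sep, weight = entry.partition("|")
        if sep and term == query_term:  # bare terms like "NoBoost" carry no weight
            return float(weight)
    return None


# The same term ("Boost2") maps to a different weight in each document.
print(boost_for(["Boost2|20"], "Boost2"))                # 20.0 (Document2)
print(boost_for(["Boost2|15", "Boost3|30"], "Boost2"))   # 15.0 (Document6)
print(boost_for(["Boost1|10", "Boost3|100"], "Boost2"))  # None (Document1)
```

Because the weight depends on both the term and the document, a single per-term boost applied at query time cannot express it.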

I'm still trying to figure out how best to model this in my index: as child documents, in another collection, or, if it would make more sense, via payloads or by boosting the terms at index time.

I'm running Solr 5.5.1 in cloud mode. Each server has a complete replica of all collections.

The document structure I've been toying with the most is to put the boosts into a separate index, join them using the !join syntax, and return the scores, but I've not had any luck getting quality results from those tests. The extra "scores" index is structured like this (I'll add the JSON for my test collections at the end of the email):

id:Document1_Boost1
  B1_s:Boost1
  B1_f:10
id:Document1_Boost3
  B1_s:Boost3
  B1_f:100
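Both the B1_name_ss join keys and the "scores" documents can be derived mechanically from B1_ss. A rough Python sketch of that transformation (split_boosts is a hypothetical helper; it assumes every entry is either "term|weight" or a bare term such as "NoBoost"):

```python
# Hypothetical helper: derive the join keys (B1_name_ss) and the "scores"
# collection documents from a generic document's "term|weight" B1_ss entries.
from typing import Dict, List, Tuple


def split_boosts(doc: Dict) -> Tuple[List[str], List[Dict]]:
    """Return (B1_name_ss values, scores-collection docs) for one document."""
    names: List[str] = []
    score_docs: List[Dict] = []
    for entry in doc.get("B1_ss", []):
        term, sep, weight = entry.partition("|")
        child: Dict[str, object] = {"id": f"{doc['id']}_{term}", "B1_s": term}
        if sep:  # bare terms such as "NoBoost" get no B1_f
            child["B1_f"] = float(weight)
        names.append(child["id"])
        score_docs.append(child)
    return names, score_docs


names, docs = split_boosts({"id": "Document1", "B1_ss": ["Boost1|10", "Boost3|100"]})
print(names)    # ['Document1_Boost1', 'Document1_Boost3']
print(docs[1])  # {'id': 'Document1_Boost3', 'B1_s': 'Boost3', 'B1_f': 100.0}
```

Generating both sides from B1_ss keeps them in sync. In the test data at the end of this email, for example, Document2 lists Document2_Boost1 in B1_name_ss while the matching "scores" document is Document2_Boost2.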

Using this structure I get close, but the scores are not what I'm expecting. If I use the following query, the explain says it's using the score from Document6_Boost2 even though my query specifies B1_s:Boost3:

http://localhost:8983/solr/generic/select?q={!join from=id to=B1_name_ss fromIndex=scores score=max}B1_s:Boost3{!func}B1_f&fl=*,score&debugQuery=true


<lst name="explain">
<str name="Document6">
3.379996 = Score based on join value Document6_Boost2
</str>
<str name="Document1">
2.2533307 = Score based on join value Document1_Boost1
</str>
<str name="Document7">
0.24786638 = Score based on join value Document7_Boost333
</str>
<str name="Document3">
0.0 = Score based on join value Document3_NoBoost
</str>
</lst>
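For comparison, here is roughly what I'd expect a score=max join to return, as a small Python model. join_max, boost3_by_b1_f and the inlined data (abridged from the test collections at the end of this email) are my own simplification, not Solr code, and using B1_f directly as the inner score is an assumption:

```python
# Hypothetical model of {!join from=id to=B1_name_ss ... score=max}: every
# "generic" doc gets the highest inner score among the matching "scores" docs
# whose id appears in its B1_name_ss field. Semantics only, not Solr code.
from typing import Callable, Dict, List, Optional

# Abridged from the "scores" test collection.
SCORES: List[dict] = [
    {"id": "Document1_Boost1", "B1_s": "Boost1", "B1_f": 10},
    {"id": "Document1_Boost3", "B1_s": "Boost3", "B1_f": 100},
    {"id": "Document3_NoBoost", "B1_s": "NoBoost"},
    {"id": "Document6_Boost2", "B1_s": "Boost2", "B1_f": 15},
    {"id": "Document6_Boost3", "B1_s": "Boost3", "B1_f": 30},
    {"id": "Document7_NoBoost", "B1_s": "NoBoost"},
    {"id": "Document7_Boost333", "B1_s": "Boost333", "B1_f": 1.1},
]

# B1_name_ss (the join's "to" field) from the "generic" test collection.
GENERIC: Dict[str, List[str]] = {
    "Document1": ["Document1_Boost1", "Document1_Boost3"],
    "Document3": ["Document3_NoBoost"],
    "Document6": ["Document6_Boost2", "Document6_Boost3"],
    "Document7": ["Document7_NoBoost", "Document7_Boost333"],
}


def join_max(inner: Callable[[dict], Optional[float]]) -> Dict[str, float]:
    """Map each linked generic doc to the max score of its matching scores docs."""
    # Score the "from" side first; None means the inner query did not match.
    matched: Dict[str, float] = {}
    for doc in SCORES:
        score = inner(doc)
        if score is not None:
            matched[doc["id"]] = score
    # Then project onto the "to" side through B1_name_ss.
    result: Dict[str, float] = {}
    for doc_id, links in GENERIC.items():
        linked = [matched[link] for link in links if link in matched]
        if linked:  # a doc with no matching link is not returned at all
            result[doc_id] = max(linked)
    return result


def boost3_by_b1_f(doc: dict) -> Optional[float]:
    """Assumed inner query: B1_s:Boost3, scored by B1_f."""
    return float(doc["B1_f"]) if doc["B1_s"] == "Boost3" else None


print(join_max(boost3_by_b1_f))  # {'Document1': 100.0, 'Document6': 30.0}
```

Under that model only Document1 and Document6 come back, each with its Boost3 weight. Seeing Document3 and Document7 in the explain output suggests the inner query is matching more than the B1_s:Boost3 documents.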

My guess is that it's now doing an all-documents query on the "scores" collection to return the scores, in addition to the B1_s query I've passed in. I can't figure out where it's getting those scores from, because a similar query run directly against the "scores" collection returns the scores I'd expect:

http://192.168.1.194:8983/solr/scores/select?q=B1_s:Boost3 AND _val_:B1_f&fl=score,*&debugQuery=true


<lst name="explain">
<str name="Document1_Boost3">
46.834885 = sum of:
  1.7682717 = weight(B1_s:Boost3 in 1) [ClassicSimilarity], result of:
    1.7682717 = score(doc=1,freq=1.0), product of:
      0.8926926 = queryWeight, product of:
        1.9808292 = idf(docFreq=2, maxDocs=8)
        0.45066613 = queryNorm
      1.9808292 = fieldWeight in 1, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        1.9808292 = idf(docFreq=2, maxDocs=8)
        1.0 = fieldNorm(doc=1)
  45.066612 = FunctionQuery(float(B1_f)), product of:
    100.0 = float(B1_f)=100.0
    1.0 = boost
    0.45066613 = queryNorm
</str>
<str name="Document6_Boost3">
15.288256 = sum of:
  1.7682717 = weight(B1_s:Boost3 in 5) [ClassicSimilarity], result of:
    1.7682717 = score(doc=5,freq=1.0), product of:
      0.8926926 = queryWeight, product of:
        1.9808292 = idf(docFreq=2, maxDocs=8)
        0.45066613 = queryNorm
      1.9808292 = fieldWeight in 5, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        1.9808292 = idf(docFreq=2, maxDocs=8)
        1.0 = fieldNorm(doc=5)
  13.519984 = FunctionQuery(float(B1_f)), product of:
    30.0 = float(B1_f)=30.0
    1.0 = boost
    0.45066613 = queryNorm
</str>
</lst>
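These two totals can be reproduced by hand. This Python sketch assumes Lucene's ClassicSimilarity formulas, idf = 1 + ln(maxDocs / (docFreq + 1)) and queryNorm = 1 / sqrt(sum of squared clause weights), with tf = 1 and fieldNorm = 1 as in the explain output; classic_idf and explain_score are hypothetical helpers:

```python
# Recompute the Document1_Boost3 / Document6_Boost3 explain totals, assuming
# ClassicSimilarity's idf and queryNorm formulas, tf = 1 and fieldNorm = 1.
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # idf = 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def explain_score(b1_f: float, doc_freq: int = 2, max_docs: int = 8) -> float:
    idf = classic_idf(doc_freq, max_docs)
    # queryNorm = 1 / sqrt(sum of squared weights): idf^2 for the term clause
    # plus 1.0^2 for the function clause (its boost is 1.0).
    query_norm = 1.0 / math.sqrt(idf ** 2 + 1.0 ** 2)
    query_weight = idf * query_norm           # 0.8926926 in the explain
    field_weight = math.sqrt(1.0) * idf * 1.0  # tf * idf * fieldNorm
    func_part = b1_f * 1.0 * query_norm        # float(B1_f) * boost * queryNorm
    return query_weight * field_weight + func_part


print(round(explain_score(100.0), 4))  # 46.8349
print(round(explain_score(30.0), 4))   # 15.2883
```

Both clauses are multiplied by the same queryNorm (about 0.4507 here), which is why a B1_f of 100 contributes roughly 45 to the score rather than 100.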

I feel like I'm getting close to what I need, but it's just not clear to me what I'm missing at this point.

The other option I've been toying with is using payloads, but actually utilizing the payloads as part of the scoring process is beyond me at this time.
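For what it's worth, the "term|weight" values in B1_ss already look like the input Lucene's DelimitedPayloadTokenFilter expects, since its default delimiter is '|'. A minimal Python sketch of the data side only, assuming the float encoder (one 4-byte big-endian IEEE 754 float per token); tokenize_with_payloads and max_payload are hypothetical helpers, and this does not show how Solr itself would score with the payloads:

```python
# Sketch of "term|weight" tokens as delimited payloads, assuming '|' as the
# delimiter and a float encoder storing each weight as a big-endian float32.
import struct
from typing import List, Optional, Tuple

Token = Tuple[str, Optional[bytes]]


def tokenize_with_payloads(value: str) -> List[Token]:
    """Whitespace-tokenize and split each token into (term, payload bytes)."""
    tokens: List[Token] = []
    for tok in value.split():
        term, sep, weight = tok.partition("|")
        tokens.append((term, struct.pack(">f", float(weight)) if sep else None))
    return tokens


def max_payload(tokens: List[Token], query_term: str) -> float:
    """Largest decoded payload among tokens equal to query_term (0.0 if none)."""
    values = [struct.unpack(">f", p)[0] for t, p in tokens if t == query_term and p]
    return max(values, default=0.0)


doc6 = tokenize_with_payloads("Boost2|15 Boost3|30")
print(max_payload(doc6, "Boost3"))  # 30.0
print(max_payload(doc6, "Boost1"))  # 0.0
```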

Any thoughts or hints on the best way to boost relevancy using these scores would be appreciated.

Thanks
Mark







GENERIC:
[
  {
    "id" : "Document1",
    "B1_ss" : ["Boost1|10","Boost3|100"],
    "title_s" : "Title1",
    "otherstuff_ss" : ["stuff1","suggestion"],
    "B1_name_ss" : ["Document1_Boost1","Document1_Boost3"]
  },
  {
    "id" : "Document2",
    "B1_ss" : ["Boost2|20"],
    "name_s" : "Product2",
    "title_s" : "Title2",
    "otherstuff_ss" : ["stuff2","recommendation"],
    "B1_name_ss" : ["Document2_Boost1"]
  },
  {
    "id" : "Document3",
    "name_s" : "Product3",
    "B1_ss" : ["NoBoost"],
    "title_s" : "Title3",
    "otherstuff_ss" : ["stuff3","new","suggestion"],
    "B1_name_ss" : ["Document3_NoBoost"]
  },
  {
    "id" : "Document4",
    "name_s" : "Product4",
    "title_s" : "Title4",
    "otherstuff_ss" : ["stuff4","old","suggestion"]
  },
  {
    "id" : "Document5",
    "name_s" : "Product5",
    "title_s" : "Title5",
    "otherstuff_ss" : ["stuff5","recommendation"]
  },
  {
    "id" : "Document6",
    "name_s" : "Product6",
    "B1_ss" : ["Boost2|15","Boost3|30"],
    "title_s" : "Title6",
    "B1_name_ss" : ["Document6_Boost2","Document6_Boost3"]
  },
  {
    "id" : "Document7",
    "name_s" : "Product7",
    "B1_ss" : ["NoBoost","Boost333|1.1"],
    "title_s" : "Title7",
    "B1_name_ss" : ["Document7_NoBoost","Document7_Boost333"]
  }
]

SCORES:
[
  {
    "id" : "Document1_Boost1",
    "B1_s" : "Boost1",
    "B1_f" : 10
  },
  {
    "id" : "Document1_Boost3",
    "B1_s" : "Boost3",
    "B1_f" : 100
  },
  {
    "id" : "Document2_Boost2",
    "B1_s" : "Boost2",
    "B1_f" : 20
  },
  {
    "id" : "Document3_NoBoost",
    "B1_s" : "NoBoost"
  },
  {
    "id" : "Document6_Boost2",
    "B1_s" : "Boost2",
    "B1_f" : 15
  },
  {
    "id" : "Document6_Boost3",
    "B1_s" : "Boost3",
    "B1_f" : 30
  },
  {
    "id" : "Document7_NoBoost",
    "B1_s" : "NoBoost"
  },
  {
    "id" : "Document7_Boost333",
    "B1_s" : "Boost333",
    "B1_f" : 1.1
  }
]
