First, let me say that this is very possibly an "XY problem", so let me
state up front what my ultimate need is. Then I'll ask about the thing I
imagine might help, which, of course, is heavily biased in the direction
of my experience coding Java and writing SQL.

I have a piece of a query that calculates a score based on a "weighting"
number stored in each Solr doc.  I'm including the XML for my custom
endpoint below.

The specific line is this:
<str name="bf">product(field(category_weight),20)</str>

What I just realized is that when I query Solr for a string that has NO
matches in the entire corpus, I still get a slew of results because EVERY
doc has the weighting value in the category_weight field - and therefore
every doc gets some score.
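
To make the behaviour concrete, here is a toy model of what I think is
happening. It is only a sketch and not Solr's actual scoring code: the two
sample docs, the helper names, and the assumption that every top-level
clause is optional (so a doc matches if ANY clause matches, and the score
is the sum of the matching clauses) are mine.

```python
# Toy model only; NOT Solr's real scoring. Sample docs are made up.
# Assumption: all clauses are optional, so any matching clause makes
# the doc a hit, and the score is the sum of the matching clauses.

docs = [
    {"id": "a", "text": "heavy chain disease overview", "category_weight": 1.5},
    {"id": "b", "text": "unrelated chapter", "category_weight": 1.0},
]


def text_clause(doc, term):
    """Return a score if the term occurs in the doc, else None (no match)."""
    return 1.0 if term in doc["text"] else None


def weight_clause(doc):
    """Mimics bf=product(field(category_weight),20): it matches every doc."""
    return doc["category_weight"] * 20


def search(term):
    """Return (id, score) for every doc with at least one matching clause."""
    results = []
    for doc in docs:
        clause_scores = [text_clause(doc, term), weight_clause(doc)]
        matched = [s for s in clause_scores if s is not None]
        if matched:  # always true here, because weight_clause never misses
            results.append((doc["id"], sum(matched)))
    return results


# A term that appears nowhere still "matches" both docs via the weight clause.
no_hit_results = search("zzz-not-in-corpus")
```

In this model, the only way to get zero results is for the text clause to
be required rather than optional, which is the behaviour I'm after.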

What I would like is to return zero results if there is no match for the
querystring.  My collection is small enough that I don't care if the actual
calculation runs on each doc (although that's wasteful) -- I just don't
want to see results come back for zero matches to the querystring.

(The /select endpoint does this of course, but my custom endpoint includes
this "weighting" piece and therefore returns every doc in the corpus
because they all have the weighting.)

====================
Enter my imagined solution...  The potential X-Y problem...
====================

So - given that I come from a programming background, I immediately start
thinking of an if statement ...

     if(some_score_for_the_primary_search_string) {
          run_the_category_weight_calculation;
     } else {
          do_NOT_run_category_weight_calc;
     }


Another way of thinking of it would be something like the "WHERE" clause in
SQL...

 run_category_weight_calculation WHERE "searchstring" is found in the
document, not otherwise.

I'm aware that this could be handled on the client side of my web app,
but if possible, I'd like the interface to Solr to be as clean as possible,
and to massage incoming Solr data as little as possible.

In other words, do NOT return any docs if the querystring (and any
synonyms) match zero docs.

Here is the endpoint XML for the query.  I've highlighted the specific line
that is causing the unintended results...


 <requestHandler name="/foo" class="solr.SearchHandler">
    <!-- default values for query parameters can be specified, these
         will be overridden by parameters in the request
      -->
     <lst name="defaults">
       <str name="echoParams">all</str>
       <int name="rows">20</int>
       <!-- Query settings -->
       <str name="df">text</str>
      <!-- <str name="df">title</str> -->
       <str name="defType">synonym_edismax</str>
       <str name="synonyms">true</str>
    <!-- The line below balances out the weighting of exact matches to the
         synonym phrase entered by the user with the category_weight
         calculation and the titleQuery calc. These numbers exist in a
         balance and if one is raised or lowered, the others (probably)
         need to change as well.  It may be better to go with decimals
         for all of them... .4 instead of 4 and 2 instead of 20 and 2.5
         instead of 25.
         In the end, I'm not sure it really matters, but don't change one
         without changing the others unless you've tested and are sure
         you want the results  -->
       <float name="synonyms.originalBoost">1.5</float>
       <float name="synonyms.synonymBoost">1.1</float>
       <str name="mm">75%</str>
       <str name="q.alt">*:*</str>
       <str name="rows">20</str>
       <str name="fq">meta_doc_type:chapterDoc</str>
       <str name="bq">{!synonym_edismax qf='title' synonyms='true'
synonyms.originalBoost='2.5' synonyms.synonymBoost='1.1' bf='' bq=''
v=$q}</str>
       <str name="fl">id category_weight title category_ss score
contentType</str>
       <str name="titleQuery">{!edismax qf='title' bf='' bq='' v=$q}</str>
=====================================================
       *<str name="bf">product(field(category_weight),20)</str>*
=====================================================
       <str name="bf">product(query($titleQuery),4)</str>
       <str name="qf">text contentType^1000</str>
       <str name="wt">python</str>
       <str name="debug">true</str>
       <str name="debug.explain.structured">true</str>
       <str name="indent">true</str>
       <str name="echoParams">all</str>
     </lst>
  </requestHandler>

And here is the debug output for a query.  (This was a test for synonyms,
which you'll see in the output.) The original query string was, of
course, "μ-heavy chain disease".

You'll note that although there is no score in the first doc explain for
the actual querystring, the highlighted section does get a score for
product(double(category_weight)=1.5,const(20))

... which is the thing that is currently causing all the docs in the
collection to "match" even though the querystring is not in any of them.
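
As a sanity check on how I'm reading the explain output, the arithmetic
for that clause can be reproduced by hand. The numbers below are copied
from the debug output; the variable names are just mine for illustration.

```python
# Values copied from the explain output for the first doc.
category_weight = 1.5   # double(category_weight)=1.5
multiplier = 20         # const(20)
boost = 1.0             # "description":"boost"
query_norm = 1.0        # "description":"queryNorm"

# product(double(category_weight),const(20))
function_value = category_weight * multiplier

# The FunctionQuery clause is the product of the function value,
# the boost, and the queryNorm.
clause_score = function_value * boost * query_norm
```

The clause score equals the reported doc value of 30.0, which is consistent
with the whole score coming from the weighting rather than from the
querystring.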

"debug":{ "rawquerystring":"\"μ-heavy chain disease\"",
"querystring":"\"μ-heavy
chain disease\"", "parsedquery":"(DisjunctionMaxQuery((text:\"μ heavy chain
disease\" | (contentType:\"μ heavy chain disease\")^1000.0))^1.5
((+DisjunctionMaxQuery((text:\"mu heavy chain disease\" | (contentType:\"mu
heavy chain disease\")^1000.0)))/no_coord^1.1)
((+DisjunctionMaxQuery((text:\"μ hcd\" | (contentType:\"μ
hcd\")^1000.0)))/no_coord^1.1) ((+DisjunctionMaxQuery((text:\"μ heavy chain
disease\" | (contentType:\"μ heavy chain disease\")^1000.0)))/no_coord^1.1)
((+DisjunctionMaxQuery((text:\"μ hcd\" | (contentType:\"μ
hcd\")^1000.0)))/no_coord^1.1)) ((DisjunctionMaxQuery((title:\"μ heavy
chain disease\"))^2.5 ((+DisjunctionMaxQuery((title:\"mu heavy chain
disease\")))/no_coord^1.1) ((+DisjunctionMaxQuery((title:\"μ
hcd\")))/no_coord^1.1) ((+DisjunctionMaxQuery((title:\"μ heavy chain
disease\")))/no_coord^1.1) ((+DisjunctionMaxQuery((title:\"μ
hcd\")))/no_coord^1.1)))
FunctionQuery(product(double(category_weight),const(20)))
FunctionQuery(product(query(+(title:\"μ heavy chain
disease\"),def=0.0),const(4)))", "parsedquery_toString":"(((text:\"μ heavy
chain disease\" | (contentType:\"μ heavy chain disease\")^1000.0))^1.5
((+(text:\"mu heavy chain disease\" | (contentType:\"mu heavy chain
disease\")^1000.0))^1.1) ((+(text:\"μ hcd\" | (contentType:\"μ
hcd\")^1000.0))^1.1) ((+(text:\"μ heavy chain disease\" | (contentType:\"μ
heavy chain disease\")^1000.0))^1.1) ((+(text:\"μ hcd\" | (contentType:\"μ
hcd\")^1000.0))^1.1)) ((((title:\"μ heavy chain disease\"))^2.5
((+(title:\"mu heavy chain disease\"))^1.1) ((+(title:\"μ hcd\"))^1.1)
((+(title:\"μ heavy chain disease\"))^1.1) ((+(title:\"μ hcd\"))^1.1)))
product(double(category_weight),const(20)) product(query(+(title:\"μ heavy
chain disease\"),def=0.0),const(4))",
"explain":{
  "33d808fe-6ccf-4305-a643-48e94de34d18":{
    "match":true, "value":30.0, "description":"sum of:",
    "details":[{
      "match":true, "value":30.0,
      "description":"FunctionQuery(product(double(category_weight),const(20))), product of:",
=====================================================
      *"details":[{ "match":true, "value":30.0,
      "description":"product(double(category_weight)=1.5,const(20))"}, {*
=====================================================
      "match":true, "value":1.0, "description":"boost"}, {
      "match":true, "value":1.0, "description":"queryNorm"}]}, {
