The score of a document has no scale: it only has meaning relative to other scores in the same query.

Solr does not rank these documents correctly. Without sharing the TF/DF information across the shards, it cannot.
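The sketch below shows why shard-local statistics matter. It merges per-shard hits by score. It assumes a simplified classic-Lucene-style formula, score = sqrt(tf) × (1 + ln(numDocs / (docFreq + 1))), and ignores length norms, boosts and queryNorm. The shard contents are made up for illustration.

```python
import math

def idf(num_docs, doc_freq):
    # Classic Lucene-style IDF (assumed): 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def score(term_freq, num_docs, doc_freq):
    # Simplified single-term score: sqrt(tf) * idf (no norms, boosts or queryNorm)
    return math.sqrt(term_freq) * idf(num_docs, doc_freq)

def merged_ranking(shards, use_global_stats):
    """Merge hits from several shards, ordered by descending score.

    Each shard maps doc id -> frequency of the query term in that doc
    (0 means the doc does not contain the term).
    """
    total_docs = sum(len(shard) for shard in shards)
    total_df = sum(1 for shard in shards for tf in shard.values() if tf > 0)
    hits = []
    for shard in shards:
        if use_global_stats:
            # Score as if all shards were one combined index
            num_docs, doc_freq = total_docs, total_df
        else:
            # Each shard scores with only its own statistics
            num_docs = len(shard)
            doc_freq = sum(1 for tf in shard.values() if tf > 0)
        for doc_id, tf in shard.items():
            if tf > 0:
                hits.append((score(tf, num_docs, doc_freq), doc_id))
    # Highest score first; ties broken by doc id for a stable order
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [doc_id for _, doc_id in hits]

# Hypothetical data: "hello" is rare in shard A and appears in every doc of shard B
shard_a = {"a1": 1, **{f"a{i}": 0 for i in range(2, 11)}}
shard_b = {"b1": 2, "b2": 2, "b3": 2, "b4": 2}

print(merged_ranking([shard_a, shard_b], use_global_stats=False))
# ['a1', 'b1', 'b2', 'b3', 'b4']
print(merged_ranking([shard_a, shard_b], use_global_stats=True))
# ['b1', 'b2', 'b3', 'b4', 'a1']
```

With local statistics, the single match in the shard where the term is rare (a1) outranks the documents in the other shard. It does so even though those documents have a higher term frequency. With the combined statistics, the order flips.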

If the shards each have "a lot" of the same kind of document, this problem averages out. That is, the "statistical fingerprint" of each shard is similar enough that every index produces scores in roughly the same numerical range. Yes, this is hand-wavy, and we don't have a measuring tool that verifies this assertion.
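One hypothetical way to put a number on this for a single query term is to compare each shard's local IDF with the IDF over all shards combined. This again assumes the classic Lucene-style formula. Under a simplified score of sqrt(tf) × idf, a one-term query is affected as follows. If every shard has the same ratio, all scores are only scaled by a common factor, so the merged order is unchanged. Where the ratios diverge, the ranking goes wrong. This is an illustrative check on one term, not a general verification.

```python
import math

def idf(num_docs, doc_freq):
    # Classic Lucene-style IDF (assumed): 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def idf_ratios(shards):
    """Ratio of each shard's local IDF to the IDF over all shards combined.

    `shards` is a list of (num_docs, doc_freq) pairs for one query term.
    A ratio of 1.0 means the shard weights the term exactly as a single
    combined index would.
    """
    total_docs = sum(num_docs for num_docs, _ in shards)
    total_df = sum(doc_freq for _, doc_freq in shards)
    global_idf = idf(total_docs, total_df)
    return [idf(num_docs, doc_freq) / global_idf for num_docs, doc_freq in shards]

# Two docs per shard, both containing the term (the layout in the question)
print(idf_ratios([(2, 2), (2, 2)]))
# Skewed layout: the term is rare in one shard and in every doc of the other
print(idf_ratios([(10, 1), (4, 4)]))
```

Your four test documents give both shards the same ratio, about 0.765. The skewed split gives roughly 1.41 and 0.42.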

Lance

Valli Indraganti wrote:
I am new to Solr and search technologies. I am playing around with
multiple indexes. I configured Solr for Tomcat and created two Tomcat
fragments so that two Solr webapps listen on port 8080 in Tomcat. I have
successfully created two separate indexes, one with each webapp.

My documents are very primitive; the structure is below. I have four such
documents, each with a different doc id and with the number of occurrences
of the word "Hello" matching the name of the document (this is only to make
my analysis of the results easier). Documents One and Two are in shard 1, and
Three and Four are in shard 2. Obviously, document Two is ranked higher when
queried against that index (for the word Hello), and document Four is ranked
higher when queried against the second index. When using the shards
parameter, the scores remain unaltered.
My question is: if the distributed search does not consider IDF, how is it
able to rank these documents correctly? Or are my indexes not truly
distributed? Is something wrong with my term distribution?

<add>
  <doc>
    <field name="id">Valli1</field>
    <field name="name">One</field>
    <field name="text">Hello!This is a test document testing relevancy scores.</field>
  </doc>
</add>
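For reference, a distributed request of the kind described above lists both indexes in the `shards` parameter as comma-separated `host:port/path` entries, without `http://`. The sketch below only builds such a URL. The context paths `/solr1` and `/solr2` are hypothetical stand-ins for the two webapps, and the helper name is made up. Adding `score` to `fl` returns each document's score so the two indexes can be compared.

```python
from urllib.parse import urlencode

def distributed_query_url(base_url, shards, query):
    """Build a Solr select URL that fans one query out to several shards.

    `shards` entries are host:port/path strings (no http:// prefix).
    """
    params = {
        "q": query,
        # Solr expects the shard list as a single comma-separated value
        "shards": ",".join(shards),
        # Include the score so results from the two indexes can be compared
        "fl": "id,name,score",
    }
    return f"{base_url}/select?{urlencode(params)}"

# Hypothetical context paths for the two webapps on port 8080
url = distributed_query_url(
    "http://localhost:8080/solr1",
    ["localhost:8080/solr1", "localhost:8080/solr2"],
    "text:Hello",
)
print(url)
```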
