dcausse added a comment.

Sorry... I was completely wrong when analyzing the lucene explain output for Q3
(it's a pain to debug scoring issues:
<https://www.wikidata.org/w/index.php?title=Special:Search&limit=10&offset=850&profile=default&search=life&cirrusDumpResult&cirrusExplain>).
I think I was reading the explain output of another entity.

Q3 lucene score is 0.1824194
Boost link score is: 1.763428 ~= log(2+53) so it's OK
Namespace boost: 0.05
Final score will be : 0.1824194 * 1.763428 * 0.05 => 0.016084173
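
As a quick check of the arithmetic above, here is a minimal sketch that multiplies the three components together. The function name `final_score` is made up for illustration and is not part of CirrusSearch; the link boost is taken as given from the explain output rather than recomputed.

```python
def final_score(lucene: float, link_boost: float, ns_boost: float) -> float:
    """Combine the three factors from the explain output by multiplication."""
    return lucene * link_boost * ns_boost


# Values for Q3, copied from the explain output quoted above.
q3 = final_score(lucene=0.1824194, link_boost=1.763428, ns_boost=0.05)
print(f"Q3 final score: {q3:.8f}")
```

The result agrees with the 0.016084173 reported above up to floating-point rounding.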

Here are a few examples:

| entity  | Number of words | Life freq | Lucene    | Links     | ns   | final       | rank | desc |
| Q3      | 830             | 9         | 0.1824194 | 1.763428  | 0.05 | 0.01608417  | ~800 | The lucene score is very bad |
| Q752241 | 280             | 64        | 0.8565265 | 1.5314789 | 0.05 | 0.06558761  | 4    | The lucene score is good and incoming_link is OK |
| Q171972 | 89              | 34        | 1.075165  | 0.7781513 | 0.05 | 0.041832052 | 20   | Incoming link is bad but the lucene score is good even though there are only 34 occurrences; this is because of the size norm (89 vs 280 for Q752241) |
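
To illustrate the size-norm effect mentioned in the last row, here is a rough sketch. It assumes Lucene's classic TF-IDF similarity (tf = sqrt(freq), length norm = 1/sqrt(number of words)), which may not be exactly what the index uses. It ignores idf, the query norm and the lossy encoding of norms, so it only reproduces the relative ordering, not the actual scores. The name `tf_times_norm` is hypothetical.

```python
import math


def tf_times_norm(freq: int, num_words: int) -> float:
    """Assumed classic TF-IDF pieces: sqrt(freq) scaled by 1/sqrt(field length)."""
    return math.sqrt(freq) / math.sqrt(num_words)


# (entity, number of words, occurrences of "life"), taken from the table above.
rows = [("Q3", 830, 9), ("Q752241", 280, 64), ("Q171972", 89, 34)]

# Sort from highest to lowest approximate text score.
for entity, words, freq in sorted(rows, key=lambda r: tf_times_norm(r[2], r[1]), reverse=True):
    print(entity, round(tf_times_norm(freq, words), 3))
```

Under these assumptions the short Q171972 document comes out ahead of Q752241 despite having fewer occurrences, and Q3 comes last, which matches the ordering of the Lucene column.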

So clearly it's because of the bad lucene score.
So I was wrong: incoming links won't take precedence.

But I can't explain why this has changed in August... :(

To sum up:

- Fixing the bad lucene score will require a better cirrus <> wikidata
integration to allow more complex queries with dedicated fields and boosts.
- A workaround could be to write a custom rescore profile with a new numeric or
by overboosting incoming links (maybe completely inhibiting the lucene score for
now). This could be addressed by https://gerrit.wikimedia.org/r/#/c/249460/


TASK DETAIL
  https://phabricator.wikimedia.org/T110648


To: dcausse
Cc: aude, dcausse, Deskana, daniel, Mbch331, Aklapper, Lydia_Pintscher, 
Wikidata-bugs, Gryllida, jeremyb


