aude added a comment.

@daniel, if you would like "encyclopedia of life" to be the first result when
searching for "life", then incoming links alone might be good for scoring:

life (Q3) has 56 incoming links

encyclopedia of life (Q82486) has 1365362 incoming links
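
To make the comparison concrete, here is a minimal Python sketch that ranks
the two items by incoming-link count alone, using the counts quoted above.
The function name rank_by_incoming_links is hypothetical and is not part of
any existing Wikidata or search code.

```python
# Incoming-link counts quoted in the comment above.
incoming_links = {
    "Q3 (life)": 56,
    "Q82486 (encyclopedia of life)": 1365362,
}

def rank_by_incoming_links(candidates: dict[str, int]) -> list[str]:
    """Hypothetical helper: order items by incoming-link count only, highest first."""
    return sorted(candidates, key=candidates.get, reverse=True)

ranking = rank_by_incoming_links(incoming_links)
print(ranking)  # ['Q82486 (encyclopedia of life)', 'Q3 (life)']
```

Text relevance plays no part here, so the item whose label merely contains
"life" outranks the item that is exactly "life".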

I'm not sure that *not* doing tf/idf is the solution, but we can investigate.
The way we munge all the terms from all languages together into one field is
probably not ideal for tf/idf. "Life" is probably translated differently in
most languages, whereas "Half Life" (Q752241) is generally not translated yet
has labels in lots of languages, so the term "life" occurs especially often in
the "Half Life" document. If we considered only English labels when searching
in English, then "Half Life" probably would not be boosted as much compared to
"life".

I think considering other attributes of the document (e.g. number of site
links, number of statements, etc.) to boost scoring could help. This already
works reasonably well in the entity selector. Once we put these in, we can try
different rescoring approaches to see what works well.
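
One way such a rescoring could look, as a hedged sketch only: multiply the
text relevance by a log-dampened boost built from site-link and statement
counts. The Candidate class, the rescore and rerank functions, the weights,
and the example counts are all hypothetical placeholders, not tuned values or
an existing API.

```python
import math
from dataclasses import dataclass

@dataclass
class Candidate:
    item_id: str
    text_score: float  # relevance from the full-text query (e.g. tf/idf)
    sitelinks: int     # number of site links on the item
    statements: int    # number of statements on the item

def rescore(c: Candidate, w_sitelinks: float = 0.5, w_statements: float = 0.2) -> float:
    """Multiply text relevance by a log-dampened popularity boost.

    The weights are hypothetical, untuned placeholders.
    """
    boost = 1.0 + w_sitelinks * math.log1p(c.sitelinks) + w_statements * math.log1p(c.statements)
    return c.text_score * boost

def rerank(candidates: list[Candidate], **weights: float) -> list[Candidate]:
    """Sort candidates by their rescored value, best first."""
    return sorted(candidates, key=lambda c: rescore(c, **weights), reverse=True)

# Hypothetical items with equal text relevance but different popularity.
candidates = [
    Candidate("item-a", text_score=1.0, sitelinks=2, statements=5),
    Candidate("item-b", text_score=1.0, sitelinks=200, statements=80),
]
print([c.item_id for c in rerank(candidates)])  # ['item-b', 'item-a']
```

Using log1p rather than raw counts keeps a very popular item from swamping
text relevance, which is the failure mode of ranking on incoming links alone;
the multiplicative form also means an item with zero text relevance gets no
boost at all.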


TASK DETAIL
  https://phabricator.wikimedia.org/T119066


To: aude
Cc: daniel, aude, Aklapper, Wikidata-bugs, Mbch331



_______________________________________________
Wikidata-bugs mailing list
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
