aude added a comment.

@daniel if you would like "encyclopedia of life" to be the first result when searching for "life", then scoring on incoming links alone might be good enough:

life (Q3) has 56 incoming links
encyclopedia of life (Q82486) has 1365362 incoming links

I'm not sure that *not* doing tf/idf is the solution, but we can investigate. The way we munge all the different terms in all the languages together in one field is probably not ideal for tf/idf. "life" is probably translated differently in most languages, whereas "Half Life" (Q752241) is generally not translated yet has labels in lots of languages, so "life" is especially frequent there. If we could consider just English when searching in English, then "Half Life" would probably not be boosted as much compared to "life".

I think considering other attributes of the document (e.g. number of sitelinks, number of statements, etc.) to boost scoring could help. It already works okay-ish in the entity selector. Once we put these in, we can try different rescorings to see what works well.

TASK DETAIL
  https://phabricator.wikimedia.org/T119066

To: aude
Cc: daniel, aude, Aklapper, Wikidata-bugs, Mbch331
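
To make the rescoring idea from the comment concrete, here is a rough Python sketch of boosting a text-match score by incoming links. It is not the actual search implementation: the `Candidate` type, the `boosted_score` formula, the `link_weight` parameter and the `text_score` values are all made up for illustration. Only the two incoming-link counts come from the comment.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    entity_id: str
    label: str
    text_score: float    # hypothetical text-match score (e.g. from tf/idf)
    incoming_links: int  # incoming link counts as quoted in the comment


def boosted_score(c: Candidate, link_weight: float = 1.0) -> float:
    # Assumed formula: multiply the text score by a log-damped link boost.
    # log1p keeps very large link counts from dominating linearly.
    return c.text_score * (1.0 + link_weight * math.log1p(c.incoming_links))


def rank(candidates, link_weight: float = 1.0):
    # Highest boosted score first.
    return sorted(
        candidates,
        key=lambda c: boosted_score(c, link_weight),
        reverse=True,
    )


# text_score values are invented; only the link counts are from the comment.
candidates = [
    Candidate("Q3", "life", text_score=2.0, incoming_links=56),
    Candidate("Q82486", "encyclopedia of life", text_score=1.0,
              incoming_links=1365362),
]

for weight in (0.0, 0.1, 1.0):
    order = [c.entity_id for c in rank(candidates, weight)]
    print(weight, order)
```

With these invented text scores, a weight of 0.0 or 0.1 keeps Q3 first, while a weight of 1.0 lets the link count push Q82486 to the top. Other attributes such as sitelink or statement counts could be folded in the same way, which is the kind of thing that trying different rescorings would have to settle.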
