dcausse added a comment.

We can inhibit tf/idf by setting the weight of the main query to 0 and using either "max" or "add". Note that tf/idf will still play a role in selecting the top-N results that will be rescored. N is 8196*7 (7 being the number of shards), so if the shards are well balanced we should cover queries that return fewer than 57372 entities; with more results there is a risk that the interesting entity falls outside the window. We can increase this number, but at some performance cost. I'll try to extract from the cirrus logs the number of queries that return more than this number, to see whether we should worry about that.
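
To make the window effect concrete, here is a minimal sketch in plain Python. It is not CirrusSearch or Elasticsearch code: the names `total_window` and `rescore_window`, the toy scores, and the additive weighted combination are assumptions made for illustration. It only shows that, with the main query weight at 0, the final order inside the window depends on the rescore function alone, while the first-pass score still decides which documents enter the window.

```python
from typing import Callable, List, Tuple

Hit = Tuple[str, float]  # (entity id, first-pass tf/idf score)


def total_window(per_shard_window: int, shards: int) -> int:
    """Upper bound on rescored documents if shards are well balanced."""
    return per_shard_window * shards


def rescore_window(
    hits: List[Hit],
    rescore_fn: Callable[[str], float],
    window: int,
    query_weight: float = 0.0,
    rescore_weight: float = 1.0,
) -> List[Hit]:
    """Rescore only the top `window` hits (simplified, additive combination).

    Hits outside the window keep their first-pass score and stay below the
    rescored ones, which is the risk described in the comment above.
    """
    ranked = sorted(hits, key=lambda h: h[1], reverse=True)
    head, tail = ranked[:window], ranked[window:]
    rescored = [
        (doc_id, query_weight * score + rescore_weight * rescore_fn(doc_id))
        for doc_id, score in head
    ]
    rescored.sort(key=lambda h: h[1], reverse=True)
    return rescored + tail


# Toy data (hypothetical): Q3 has the best rescore value but a weak tf/idf score.
hits = [("Q1", 3.0), ("Q2", 2.0), ("Q3", 1.0)]
popularity = {"Q1": 0.1, "Q2": 0.9, "Q3": 5.0}
order = [doc_id for doc_id, _ in rescore_window(hits, lambda d: popularity[d], window=2)]
```

With a window of 2, `order` is Q2, Q1, Q3: the rescore function reorders Q1 and Q2, but Q3 never gets rescored because tf/idf left it outside the window. `total_window(8196, 7)` gives the 57372 figure quoted above.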
Adjusting all these weights and finding the proper formula is not an easy task, so we should find a way to evaluate the performance. We could maybe take Q1 to Q1000, run a query with the English label of each, and count the number of times the entity is in the top 10. But I don't know the wikidata content very well, so there are certainly better tests to run. We are building a set of tools to run such performance evaluations, which could be useful in this case.

Concerning phrases, we have a rescore function with a strong weight, but it is applied only to the top 512*7 results because it is very costly. There are techniques to optimize this process (word n-grams). If it makes sense for wikidata we should probably investigate in this direction as well.

Concerning PageRank, I think you're right: @EBernhardson ran a test on enwiki and the results are promising. We are building the tools needed to inject such data into the indices (hadoop <-> elastic). If the wikidata link graph is easy to extract it should be "easy" to do.

TASK DETAIL
  https://phabricator.wikimedia.org/T110648
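
The evaluation idea from the comment (query each entity's English label and count how often the entity lands in the top 10) could be sketched as follows. This is a rough illustration only: `top_k_hit_rate`, the `search` callable, and the toy index are hypothetical stand-ins, and a real run would call the actual search backend for Q1 to Q1000.

```python
from typing import Callable, Dict, List


def top_k_hit_rate(
    labels: Dict[str, str],
    search: Callable[[str], List[str]],
    k: int = 10,
) -> float:
    """Fraction of entities found in the top k results for their own label.

    `labels` maps an entity id to its English label; `search` returns a
    ranked list of entity ids for a query string.
    """
    if not labels:
        return 0.0
    found = sum(
        1 for entity_id, label in labels.items() if entity_id in search(label)[:k]
    )
    return found / len(labels)


# Toy stand-in for the search backend (hypothetical rankings).
toy_index = {
    "universe": ["Q1"],
    "Earth": ["Q100", "Q2"],
    "life": ["Q500"],
}
toy_labels = {"Q1": "universe", "Q2": "Earth", "Q3": "life"}


def toy_search(query: str) -> List[str]:
    return toy_index.get(query, [])


rate_top10 = top_k_hit_rate(toy_labels, toy_search, k=10)
rate_top1 = top_k_hit_rate(toy_labels, toy_search, k=1)
```

Running the same measurement before and after a weight change gives a single number to compare, although it only rewards exact label lookups and says nothing about other kinds of queries.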
