GoranSMilovanovic added a comment.
@Lydia_Pintscher @RazShuty Something to begin with:

- each node is a language (Wikimedia language codes <https://www.wikidata.org/wiki/Help:Wikimedia_language_codes/lists/all> are used);
- each language points towards the three languages most similar to it,
- in terms of the overlap in the respective language labels across >57M Wikidata items:
- (explanation: for each language we find which WD items have a label in it,
- then: similarity between two languages == Jaccard similarity (index) <https://en.wikipedia.org/wiki/Jaccard_index> between two binary vectors of length approx. 57M each).

F30078182: WD_Languages.png <https://phabricator.wikimedia.org/F30078182>

Mapping WDCM item re-use statistics onto languages now.

TASK DETAIL
  https://phabricator.wikimedia.org/T223119

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: GoranSMilovanovic
Cc: Aklapper, Lydia_Pintscher, RazShuty, GoranSMilovanovic, darthmon_wmde, DannyS712, Nandana, Lahi, Gq86, QZanden, LawExplorer, _jensen, rosalieper, Wikidata-bugs, aude, Mbch331
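
For readers who want to see the computation described in the comment spelled out, here is a minimal Python sketch. It is not the code used for the figure. It assumes each language is represented by the set of item IDs that have a label in it, which is equivalent to a binary vector with a 1 at each labelled item. The names `jaccard_similarity`, `top_similar`, and the toy `labels_by_language` data are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def jaccard_similarity(a: set, b: set) -> float:
    """Jaccard index |A & B| / |A | B|; Jaccard distance is 1 minus this."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


def top_similar(labels_by_language: dict, k: int = 3) -> dict:
    """For each language, return the k languages with the highest Jaccard index."""
    sims = {lang: [] for lang in labels_by_language}
    for l1, l2 in combinations(labels_by_language, 2):
        s = jaccard_similarity(labels_by_language[l1], labels_by_language[l2])
        sims[l1].append((s, l2))
        sims[l2].append((s, l1))
    # Sort by similarity (descending), breaking ties by language code.
    return {
        lang: [other for _, other in sorted(pairs, key=lambda p: (-p[0], p[1]))[:k]]
        for lang, pairs in sims.items()
    }


# Toy data (made up): language code -> set of item IDs labelled in that language.
labels_by_language = {
    "en": {"Q1", "Q2", "Q3", "Q4"},
    "de": {"Q1", "Q2", "Q3"},
    "fr": {"Q1", "Q2", "Q4"},
    "sr": {"Q2", "Q5"},
    "hr": {"Q2", "Q5", "Q6"},
}

neighbours = top_similar(labels_by_language, k=3)
print(neighbours["en"])  # ['de', 'fr', 'sr']
```

Each entry of `neighbours` corresponds to the outgoing edges of one node in the graph. At the scale mentioned in the comment (>57M items), a sparse-matrix formulation would likely be needed instead of Python sets.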
_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs