GoranSMilovanovic added a comment.
@Manuel Here is a concise report that relies on UNESCO Language Status <http://www.unesco.org/languages-atlas/>:

F34547890: Wikidata_LanguageStatusReport.nb.html <https://phabricator.wikimedia.org/F34547890>

The analyses presented here can be fully replicated using the Ethnologue language status <https://www.ethnologue.com/about/language-status> categories as well. Please let me know if you find that necessary or interesting. I opted for the UNESCO language status simply because I thought it would be better to use one criterion, even if we choose it ad hoc, than to deal with the more complicated situation of using two criteria (UNESCO and Ethnologue).

From my perspective, the most important insights are:

- Languages that are not endangered are far better represented than endangered or vulnerable languages in terms of how many sitelinks they have; this is probably more relevant for the Wikipedia community than for us, but I thought we should help by informing them since we already have the numbers at hand;
- Languages that are not endangered have many more labels in Wikidata than languages that are endangered or vulnerable;
- Beyond that, languages that are not endangered generally label items that are more reused across the Wikimedia projects than the items for which we have labels in endangered or vulnerable languages.

I have used visualizations, labeling languages by their respective codes, in order to single out the extremes on the following indicators:

- number of sitelinks
- number of items for which a particular language has labels
- the reuse of items labeled by a particular language.

In conjunction with the tables (all of them are provided in the report), that might help us figure out whether there are specific linguistic communities that we could approach to see if they need any help.

The analysis is exploratory: I did not want to invest any time in statistical hypothesis testing (e.g. comparisons across groups or languages, plus decisions on whether the differences are statistically significant) before we have at least a glimpse of the big picture.

Please let me know if anything needs further clarification; I am available for a 1:1 on this until Wednesday, 14 July, late CET hours.

TASK DETAIL
https://phabricator.wikimedia.org/T286257

To: GoranSMilovanovic
Cc: Tobi_WMDE_SW, Manuel, GoranSMilovanovic, Aklapper, Invadibot, maantietaja, Akuckartz, Nandana, Lahi, Gq86, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
