chelsyx added a comment.

We parsed the wikitext of all files in the Commons XML data dumps of November 20, 2017, and extracted the language templates in them (e.g. {{en}}, {{LangSwitch}}). Of the 43,268,565 files, 14,848,551 (34.32%) have no language template at all, and 23,780,247 (54.96%) use exactly one language.
F11792338: files_by_n_languages.png
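
Below is a minimal sketch of the extraction step. It assumes that language templates appear either as `{{xx|...}}` (where `xx` is a language code) or as named parameters of `{{LangSwitch|en=...|de=...}}`. The regexes, the `LANGUAGE_CODES` whitelist and the function names are hypothetical, and the parser actually used for this analysis may differ (for example, in how it handles nested templates).

```python
import re
from collections import Counter

# Hypothetical whitelist; a real run would need every language code
# that has a corresponding template on Commons.
LANGUAGE_CODES = {"en", "de", "fr", "es", "it", "ja", "zh", "ru"}

# Start of a simple language template such as "{{en|..." or "{{de}}".
SIMPLE_TEMPLATE = re.compile(r"\{\{\s*([a-z]{2,3}(?:-[a-z]+)?)\s*[|}]")
# Body of a {{LangSwitch|...}} call (nested templates are not handled).
LANGSWITCH = re.compile(r"\{\{\s*LangSwitch\s*\|([^{}]*)\}\}", re.IGNORECASE)
# Named parameters inside a LangSwitch body, e.g. "en=..." or "|de=...".
LANGSWITCH_PARAM = re.compile(r"(?:^|\|)\s*([a-z]{2,3}(?:-[a-z]+)?)\s*=")


def extract_languages(wikitext: str) -> set[str]:
    """Return the set of language codes used by templates in one file page."""
    langs = {c for c in SIMPLE_TEMPLATE.findall(wikitext) if c in LANGUAGE_CODES}
    for body in LANGSWITCH.findall(wikitext):
        langs.update(c for c in LANGSWITCH_PARAM.findall(body) if c in LANGUAGE_CODES)
    return langs


def files_by_n_languages(pages: dict[str, str]) -> Counter:
    """Map 'number of distinct languages' -> 'number of files'."""
    return Counter(len(extract_languages(text)) for text in pages.values())
```

Dividing each count by the total number of files gives percentages like the 34.32% (0 languages) and 54.96% (1 language) above.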

40.1% of all files have English templates, 9.38% have German templates, and 6.2% have descriptions in languages outside the top 20.
F11792361: top20_languages_nfiles.png
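
A sketch of how per-language shares and the "outside the top 20" share could be computed from per-file language sets. It assumes that a file tagged in several languages counts once for each of them (so the shares need not sum to 100%), and it reads "descriptions in languages outside the top 20" as "at least one language outside the top N". Both are interpretations, and `language_shares` is a hypothetical name.

```python
from collections import Counter


def language_shares(file_languages: list[set[str]], top_n: int = 20) -> tuple[dict[str, float], float]:
    """Return (share of files per top-N language, share of files with any other language)."""
    total = len(file_languages)
    # Each file contributes once to every language it uses.
    per_language = Counter(lang for langs in file_languages for lang in langs)
    top = [lang for lang, _ in per_language.most_common(top_n)]
    top_set = set(top)
    shares = {lang: per_language[lang] / total for lang in top}
    # Files that use at least one language not in the top N.
    other = sum(1 for langs in file_languages if langs - top_set) / total
    return shares, other
```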

For the files without a language template, we used the langdetect package to detect their languages. We could not detect any language in 556,684 files (1.29% of all 43,268,565 files), and we detected exactly one language in 7,577,789 files (17.51%).
F11795099: files_by_n_detected_languages.png
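
Below is a sketch of how the per-file count of detected languages could be produced with langdetect. `detect_langs`, `DetectorFactory.seed` and `LangDetectException` are part of langdetect. The `min_prob` cutoff (0.2) for deciding that a language counts as "detected" is an assumption, and so is the choice of text passed to the detector; neither is necessarily what this analysis used. The detector is passed in as a function so the counting logic can run without langdetect installed.

```python
from collections import Counter
from typing import Callable, Iterable

# A detector maps text to a list of (language_code, probability) pairs.
Detector = Callable[[str], list[tuple[str, float]]]


def langdetect_detector(text: str) -> list[tuple[str, float]]:
    # Imported lazily so the counting function works without langdetect.
    from langdetect import DetectorFactory, detect_langs
    from langdetect.lang_detect_exception import LangDetectException

    DetectorFactory.seed = 0  # langdetect is non-deterministic unless seeded
    try:
        return [(lang.lang, lang.prob) for lang in detect_langs(text)]
    except LangDetectException:
        # Raised when the text has no usable features (e.g. empty text).
        return []


def files_by_n_detected_languages(
    texts: Iterable[str],
    detect: Detector = langdetect_detector,
    min_prob: float = 0.2,  # hypothetical cutoff
) -> Counter:
    """Map 'number of detected languages' -> 'number of files'."""
    counts: Counter = Counter()
    for text in texts:
        langs = {lang for lang, prob in detect(text) if prob >= min_prob}
        counts[len(langs)] += 1
    return counts
```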

We detected English in 30.25% of all 43,268,565 files and German in 3.93%.
F11795155: top20_detected_languages_nfiles.png
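
Because langdetect was only run on the 14,848,551 files without a language template, a share reported against all 43,268,565 files can be rescaled to a share of that untemplated subset. The short calculation below does this for the English and German figures. It assumes the percentages are exact (they are rounded), and `share_of_untagged` is a hypothetical helper.

```python
TOTAL_FILES = 43_268_565
NO_TEMPLATE_FILES = 14_848_551


def share_of_untagged(share_of_all: float) -> float:
    """Convert a share of all files into a share of the files without language templates."""
    return share_of_all * TOTAL_FILES / NO_TEMPLATE_FILES


english = share_of_untagged(0.3025)
german = share_of_untagged(0.0393)
```

Under this reading, English would be detected in roughly 88% of the files that lack a language template, and German in roughly 11.5%.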


TASK DETAIL
https://phabricator.wikimedia.org/T177358

To: mpopov, chelsyx
Cc: Aklapper, mpopov, chelsyx, Abit, SandraF_WMF, Ramsey-WMF, Capt_Swing, debt, Lahi, PDrouin-WMF, Gq86, E1presidente, GoranSMilovanovic, QZanden, EBjune, Acer, Avner, Gehel, FloNight, Susannaanas, Jane023, Wikidata-bugs, PKM, Base, matthiasmullie, aude, Ricordisamoa, Fabrice_Florin, Raymond, Steinsplitter, Mbch331
_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs