Interesting, but you missed Latin, which is the official language of a
country (even if the English Wikipedia says otherwise).


On Mon, Jun 8, 2015 at 12:23 AM, Milos Rancic wrote:

> When you get data, at some point you start thinking about quite
> fringe comparisons. But those can actually lead to some useful
> conclusions, as they did this time [1].
> We did the following:
> * Used the number of primary speakers from Ethnologue. (Erik Zachte
> uses the approximate number of primary + secondary speakers; that
> could be good for correcting this data.)
> * Categorized languages by the order of magnitude of the number of
> speakers: >=1k, >=10k, >=100k, >=1M, >=10M, >=100M.
> * Took the number of articles of the Wikipedia in each language and
> computed the ratio (number of articles / number of speakers).
> * The list consists only of languages with Ethnologue status 1
> (national), 2 (provincial) or 3 (wider communication). In fact, we
> have a lot of projects (more than 100) with a worse language status;
> a number of them are actually threatened or even on the edge of
> extinction.
> These are preliminary results and I will definitely have to go
> through all the numbers. I manually fixed some serious errors, like
> the English Wikipedia itself being missing from the data :D
> Putting the languages into logarithmic categories proved to be
> useful, as we are now able to compare the Wikipedias according to
> their gross capacity (number of speakers). I suppose somebody well
> versed in statistics could even create a function which could be
> used to check how well a project is doing, regardless of those strict
> categories.
> It's obvious that the more speakers a language has, the harder it is
> for the community to keep up the ratio.
> So, the winners per category are:
> 1) >= 1k: Hawaiian, ratio 0.96900
> 2) >= 10k: Mirandese, ratio 0.18073
> 3) >= 100k: Basque, ratio 0.38061
> 4) >= 1M: Swedish, ratio 0.21381
> 5) >= 10M: Dutch, ratio 0.08305
> 6) >= 100M: English, ratio 0.01447
> However, keep in mind that we removed languages whose Ethnologue
> status is not 1, 2 or 3. That affected the >=10k languages, as, for
> example, Upper Sorbian does much better than Mirandese (0.67). (Will
> fix it while creating the full report. Obviously, in this case the
> logarithmic categories of numbers of speakers are much more important
> than the status of the language.)
> It's obvious that we could draw the line from 1:1 for 1-10k speakers
> to 10:1 for >=100M speakers. But, again, I would like to get input
> from somebody more competent.
> One very important category is missing here: the level of
> development of the speakers. That could be added: GDP (PPP) per
> capita for the country or countries where the language is spoken
> would be a useful measure. And I suppose somebody with statistical
> knowledge would be able to give us a number meaning "ability to
> create a Wikipedia article".
> Completed in such a way, we'd be able to measure the success of
> particular Wikimedia groups and organizations. OK, articles per
> speaker are not the only way to do so; we could use other parameters
> as well: number of new/active/very active editors etc. And we could
> put it on a time scale.
> I'll produce some other results. And as a reminder: I'd like to have
> a formula to calculate "ability to create a Wikipedia article" and
> then to produce "level of a particular community's success in
> creating Wikipedia articles". And, of course, to apply it to editors.
> [1]
> _______________________________________________
> Wikimedia-l mailing list, guidelines at:
> Unsubscribe:,
> <>
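
For anyone who wants to reproduce the procedure described in the quoted
message, here is a minimal sketch in Python. It is not the script that
was actually used. The rows in SAMPLE are invented placeholders (not
Ethnologue or Wikipedia figures), and the names bucket() and winners()
are hypothetical. It assumes the top category (>=100M) absorbs
everything larger and that languages under 1k speakers are dropped.

```python
# Placeholder rows for illustration only; NOT real Ethnologue/Wikipedia data.
# (language, primary speakers, articles, Ethnologue status)
SAMPLE = [
    ("lang_a", 2_000, 1_500, 2),
    ("lang_b", 5_000, 1_000, 1),
    ("lang_c", 40_000, 8_000, 3),
    ("lang_d", 40_000, 30_000, 6),   # status worse than 3: filtered out
    ("lang_e", 3_000_000, 300_000, 1),
    ("lang_f", 500, 400, 1),         # fewer than 1k speakers: dropped
]


def bucket(speakers: int) -> int:
    """Return the lower bound of the power-of-ten category.

    E.g. 40_000 -> 10_000. Everything from 100M upwards shares the
    top category (assumption based on the categories in the message).
    """
    lower = 10 ** (len(str(speakers)) - 1)
    return min(lower, 10 ** 8)


def winners(rows, allowed_status=(1, 2, 3), min_speakers=1_000):
    """Best articles-per-speaker ratio in each category.

    Returns {category lower bound: (language, ratio)}, sorted by category.
    """
    best = {}
    for name, speakers, articles, status in rows:
        if status not in allowed_status or speakers < min_speakers:
            continue
        ratio = articles / speakers
        category = bucket(speakers)
        if category not in best or ratio > best[category][1]:
            best[category] = (name, ratio)
    return dict(sorted(best.items()))


if __name__ == "__main__":
    for category, (name, ratio) in winners(SAMPLE).items():
        print(f">= {category}: {name}, ratio {ratio:.5f}")
```

Passing a wider allowed_status (for example range(1, 11)) would show
how much the status filter changes the winners, which is the effect
mentioned for Upper Sorbian versus Mirandese.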

Ilario Valdelli
Wikimedia CH
Verein zur Förderung Freien Wissens
Association pour l’avancement des connaissances libre
Associazione per il sostegno alla conoscenza libera
Switzerland - 8008 Zürich
Wikipedia: Ilario
Skype: valdelli
Facebook: Ilario Valdelli
Twitter: Ilario Valdelli
Linkedin: Ilario Valdelli
Tel: +41764821371