Here is some bad news and some good news... The bad news is that I've finally realized why I needed a separate wiki for the data: Ethnologue's restrictive Terms of Use [1]. In other words, I could just say to myself: Welcome back to the wonderful world of licenses!
So, I've created a private wiki with some of the data. Anyone willing to join me in the "data analysis" work is welcome; I'll create accounts on that wiki. That said, I urge all relevant persons to contact me privately with their preferred username. (To be more precise, this concerns the languages, the chapters, the WMF and its funds.) I also need one or more persons willing to code in Python.

The good news is that I've realized I did a good job coding, with a number of relevant categorizations; the bad news that follows is that I'd need some time to get familiar with my code again.

The numbers of languages not represented on Wikimedia projects:

* 23 languages with more than 10 million speakers
* 230 languages with more than one million speakers
* 866 languages with more than 100 thousand speakers
* 1831 languages with more than 10 thousand speakers

The largest language whose project is in the Incubator has 38 million speakers.

[1] http://www.ethnologue.com/terms-use

On Sat, Apr 26, 2014 at 2:11 PM, Seb35 <seb35wikipe...@gmail.com> wrote:

> Hi,
>
> As a supporter of language diversity, I'm a bit saddened by this thread because
> some people find we should not engage in language revitalisation because:
> 1/ it's not explicitly in our scope (and I don't fully agree: "sum of
> all knowledge" also includes minority cultures expressed in their
> languages, as shown by Hubert Laska with the "Kneip"),
> 2/ it's too difficult/expensive "to save most languages".
>
> Although there are obviously great difficulties, I find they shouldn't stop
> us from supporting or partnering with local language institutions,
> particularly if there are interested people or volunteers: we are not
> obliged to select the 3000 most spoken languages and set up partnerships to
> "save" these 3000 languages, but we can support institutions or volunteers
> _interested_ in saving some small language on a case-by-case basis (Rapa
> Nui, Chickasaw, Skolt Sami, Kibushi, whatever) if minimum requirements are
> met (a writing system and an ISO 639 code for a website, financial resources
> for a project), i.e. crowdsourcing language preservation between
> Wikimedia, volunteers, speakers, and institutions.
>
> When multilingualism in cyberspace is discussed by linguists, Wikipedia
> is almost always shown as *the* most successful example. As
> discussed in this thread, perhaps some projects (Wikisource, Wiktionary,
> Wikidata) are easier to set up in these languages and this could be a
> first step, but they will only preserve these languages as non-living
> objects of interest, unlike Wikibooks/Wikipedia/Wikinews/Wikiversity,
> where speakers could practice the language, invent neologisms and
> terminology, create corpora for linguists, and show the language to other
> interested people in the world (I'm sure there are some).
>
> As an example in France, Wikimédia France has quite good relationships
> with the DGLFLF (Delegation for the French language and languages of
> France), and this institution lists 75 languages in France, two thirds of
> which are overseas [1]. The DGLFLF contributed resources on some small
> languages and multilingualism to Wikibooks [2] and Commons [3].
>
> [1] (fr)
> http://www.culture.gouv.fr/culture/dglf/lgfrance/lgfrance_presentation.htm
> [2] (fr)
> https://fr.wikibooks.org/wiki/États_généraux_du_multilinguisme_dans_les_outre-mer
> [3] (fr)(mul)
> https://commons.wikimedia.org/wiki/Category:États_généraux_du_multilinguisme_dans_les_outre-mer
>
> ~ Seb35
>
> 20.04.2014 05:46:47 (CEST), Milos Rancic wrote:
>
>> There are ~6000 languages in the world and around 3000 of them have
>> more than 10,000 speakers.
>>
>> That approximation has some issues, but its errors tend to cancel each
>> other out. Ethnologue is not the best place to find precise data about
>> languages: it may count close varieties of one language as separate
>> languages, but it also doesn't count some other languages. Not all
>> communities speaking a language with 10,000 or more speakers have a
>> positive attitude toward their language, but there are languages with
>> smaller numbers of speakers whose communities have a very positive
>> attitude toward their own language.
>>
>> So, that number is what we could count as the realistic "final" number
>> of language editions of Wikimedia projects. At the moment, we have
>> fewer than 300 language editions.
>>
>> * * *
>>
>> There is the question: Why should we do that? The answer is clear to
>> me: Because we can.
>>
>> Yes, there may be more specialized organizations which could do that,
>> but it's not about expertise, it's about ability. Fortunately, we don't
>> need to search for historical examples for comparison; the Internet
>> is good enough.
>>
>> I still remember an infographic from the time when all of us thought
>> that Flickr was the place for images. It turned out that the biggest
>> repository of images was actually Facebook, which had a hundred times
>> more of them than Twitpic in second place, which, in turn, had a
>> hundred times more images than Flickr.
>>
>> In other words, the purpose of something and the general perception of
>> its purpose are not enough for doing a good job.
>> Likewise, comparisons between mismanaged internet projects and
>> mismanaged traditional scientific and educational organizations are
>> numerous.
>>
>> At this point in time Wikimedia has all the necessary capacities -- and
>> even the will to take on that job. So, we should finally start doing it :)
>>
>> * * *
>>
>> There is also the question: How can we do that? In short, thanks to
>> Wikipedia.
>>
>> I announced the Microgrants project of Wikimedia Serbia yesterday. To be
>> honest, we have very low expectations. When I told Filip that I
>> want to have 10 active community members after the project, he said
>> that I am overambitious. Yes, I am.
>>
>> But ten hours later I got the first response and I was very
>> positively surprised by a lot of things. The most relevant for this
>> story is that a person from a city in Serbia proper is very
>> enthusiastic about Wikipedia and contributing to it (and organizing
>> contributors in the area). I hadn't heard that for years! (Maybe I was
>> just too pessimistic because of my obsession with statistics.)
>>
>> Keeping in mind her position (she said that she was always complaining
>> about the lack of material on Serbian Wikipedia, although at this point
>> it's the encyclopedia in Serbian with the most relevant content)
>> and her enthusiasm, I am completely sure that many speakers of many
>> small languages dream from time to time of having Wikipedia in
>> their native language.
>>
>> As in the case of this Serbian from the fifth or sixth largest city in
>> Serbia, I am sure that they just don't know how to do that. So, it's
>> up to us to reach them.
>>
>> English Wikipedia has some influence on the contemporary English
>> language ("citation needed", let's say).
>> It has more influence on languages
>> with smaller numbers of speakers, like Serbian (the Cyrillic/Latin
>> cultural war in Serbia was over the moment Serbian Wikipedia
>> implemented its transliteration engine; it's not an issue now, while it
>> was an issue up to the mid-2000s).
>>
>> But that's about languages that are well developed in the cultural
>> sense. What about less developed ones? While I don't have an example of
>> the effects (anyone, please?), considering the amount of written
>> material in some languages, Wikipedia will become (or already has become)
>> the biggest book, sometimes the biggest library, in that language; in
>> some cases Wikipedia will create the majority of texts written in a
>> particular language!
>>
>> While we think of Wikipedia as a valuable resource for learning about
>> a wide range of topics, the significance of Wikipedia for those peoples
>> would be much higher. If we do the job, there will be many monuments
>> to Wikipedia all over the world, because Wikipedia would preserve many
>> cultures, not just the languages.
>>
>> * * *
>>
>> Finally, there is the question "How?". There are numerous
>> possible ways and there are also some attempts to do that, but we have
>> to create a plan for doing it systematically, well, according to our
>> principles and goals and according to reality.
>>
>> What we know from our previous experiences:
>>
>> * The number of editors has declined and, at the moment, without a
>> miracle (or hard work, but I assume most of our movement is used
>> to miracles, not to hard work), the trend will continue. By contrast,
>> the number of readers has increased. Unfortunately, in this case a
>> miracle is not necessary for that trend to end.
>>
>> * If we count languages with relevant statistics for editors per
>> million speakers, the top ones belong either to highly motivated
>> communities (Hebrew), to rich countries with a harsh climate,
>> which makes writing on Wikipedia good fun (Estonian, Icelandic,
>> Norwegian, Finnish), or to a community which belongs to both
>> categories (Scots Gaelic). And it's around 100 users per million.
>>
>> If a community has 100,000 speakers, that would mean the
>> community would have 10 editors with 5 or more edits per month. In the
>> case of languages with 10,000 speakers, it would mean 1 editor
>> with 5 or more edits per month. That won't work.
>>
>> I'd say that Scots Gaelic could be a good test (Wikimedia UK help
>> needed!). It's a language with ~70k speakers and if it's possible
>> to achieve 100 active editors per month, we could say that it could
>> somehow work in other cases as well.
>>
>> * Besides preserving languages and cultural heritage, we want to have
>> useful information on those Wikipedias. That's a tough job for many
>> communities because of various issues: from the lack of reasonable
>> internet access to inherent cultural biases.
>>
>> But we have some tools -- Wikidata as the most important one -- to
>> create a lot of useful content.
>>
>> But the entry level is very high. Editors have to know how to use
>> computers well, as well as how to think quite formally. That's a serious
>> obstacle in areas without well-developed educational systems.
>>
>> * The good news is that we have chapters in three countries with a lot
>> of languages: India, Indonesia and Australia (although in Australia these
>> are very small languages; on the other hand, Australia is much richer).
>> So, we have organizational potential.
>>
>> * There are, of course, a lot of other issues. Many of them, actually.
>> But if we don't start, we won't do anything.
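A quick check of the editors-per-million arithmetic above, as a minimal Python sketch. It assumes only the figures quoted in the message (about 100 active editors per million speakers as the best observed ratio, ~70k Scots Gaelic speakers, a target of 100 active editors); the function names are hypothetical, chosen for illustration.

```python
# Back-of-the-envelope estimate of active editors (5+ edits per month),
# assuming the ~100 editors per million speakers ratio quoted above.
# Function names are hypothetical, for illustration only.

EDITORS_PER_MILLION = 100  # best-case ratio cited in the thread


def expected_editors(speakers: int, per_million: float = EDITORS_PER_MILLION) -> float:
    """Expected number of active editors for a given number of speakers."""
    return speakers * per_million / 1_000_000


def required_ratio(speakers: int, target_editors: int) -> float:
    """Editors per million speakers needed to reach target_editors."""
    return target_editors * 1_000_000 / speakers


if __name__ == "__main__":
    for speakers in (100_000, 10_000, 70_000):
        print(f"{speakers:,} speakers -> about {expected_editors(speakers):.0f} active editors")
    # Scots Gaelic test case: ~70k speakers, goal of 100 active editors
    print(f"Needed ratio: {required_ratio(70_000, 100):.0f} editors per million")
```

At the quoted ratio, ~70k speakers yield only about 7 active editors, so reaching 100 would take roughly 1,400 editors per million, about 14 times the best ratio cited.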
>>
>> * * *
>>
>> As you can see, I wrote this not as a kind of plan, but as a set
>> of open questions. I'd like your input (first here, then on Meta):
>> What do you think? How can we start working on it? What do you think
>> would be the most efficient way? Ways? Any other ideas?
>>
>> I'd call on you to give wings to your imagination. To be able to solve
>> this, we need bold ideas. On the other hand, I'd appreciate it if people
>> with more organizational skills gave their input as well.
>
>
> _______________________________________________
> Wikimedia-l mailing list
> Wikimedia-l@lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> <mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe>