Indeed, GerardM, I agree with you that a few passionate women or men can
kick-start some Wikimedia projects, and that different Wikimedia projects
face different barriers and paths of development.

I also agree with you that the direction I am pursuing may not be helpful
to languages that are still in the incubation stage. To be honest, I am
not trying to measure the likelihood of success.

What I am trying to measure is probably closer to the external *difficulty*
that has to be overcome for success. Here I have to admit that I approach
this question wearing a researcher's hat more than a Wikipedian's hat.

Having said that, I personally believe this approach can be very productive
in generating outcomes for major world languages such as Mandarin, Spanish,
Hindi, Arabic, Bengali, Russian, Japanese and Punjabi (all these languages
have more native speakers than German, BTW). This way, researchers can make
them more comparable because of the available external baselines.

I can envision that the outcomes can help these communities find their
strengths and weaknesses. Strategies can then be devised to expand their
reach toward the available external content or users.

This should also help sociolinguists identify which languages
(especially non-national languages such as Kurdish or Cantonese) are
more developed than others in the Wikipedia sphere, and seek explanations
for their relative success/failure by contrasting the Wikipedia sphere with
the offline/online sphere. These languages include many of the mid-size
language versions of Wikipedia, such as Catalan, Cantonese, Tamil, etc.

Thus, I would argue that the analytical direction I want to take would be
useful for many language versions which already have some user base and
content. Again, I want them to be aware of both the internal and external
state of each language version, thereby contextualizing the differences
among them. The baseline stats based on external sources should make them
more comparable, instead of leaving us with just a numbers game among
different language groups of Wikipedians.

Also, I agree with GerardM that the issue is both practical and
political. I would add that it is also political in terms of how funds are
distributed within the global Wikimedia/open knowledge movement. I
personally believe that only with external numbers about the potential
users and content available outside Wikipedia can we realize how much of
the external pool has been utilized/recruited into the internal
Wikimedia/Wikipedia projects. This should provide a sensible basis for
comparison on which Wikipedians can reflect.
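
To make the idea of "how much of the external pool is utilized" a bit more
concrete, here is a minimal Python sketch. It is only an illustration of
one possible ratio (active editors per million estimated internet users of
a language), not a settled metric. The class and function names are
hypothetical, and the numbers are made-up placeholders, not real
statistics for any language.

```python
from dataclasses import dataclass


@dataclass
class LanguageStats:
    name: str
    external_internet_users: int  # assumed estimate of the pool outside Wikipedia
    active_editors: int           # assumed internal Wikipedia figure


def editors_per_million(stats: LanguageStats) -> float:
    """Active editors per million estimated internet users of the language."""
    if stats.external_internet_users <= 0:
        raise ValueError("external pool must be positive")
    return stats.active_editors * 1_000_000 / stats.external_internet_users


# Placeholder numbers, invented purely for illustration.
samples = [
    LanguageStats("lang_a", 50_000_000, 2_000),
    LanguageStats("lang_b", 5_000_000, 1_000),
]

# Rank language versions by how much of their external pool they recruit.
ranked = sorted(samples, key=editors_per_million, reverse=True)
for s in ranked:
    print(f"{s.name}: {editors_per_million(s):.1f} editors per million internet users")
```

In this toy case the smaller language version (lang_b) recruits more of
its external pool than the larger one, which is the kind of contrast that
raw article or editor counts alone would hide.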

Finally, may I point out that the external environments for languages are
also changing, and knowing this could be useful for the global
Wikimedia/open knowledge movement. Based on my research on the competition
between Baidu Baike and Chinese Wikipedia in mainland China, I found that
the windfall of fast-growing internet users from late 2005 to 2008 was
crucial for any website to thrive in mainland China, a windfall that
Chinese Wikipedia missed because of the block by Beijing. From this, I
argue that it makes strategic sense to catch the wave of rising internet
users, esp. during the time when the penetration rate quickly rises from
12.8% to 40% for a given population. External time-series data points can
help identify the languages whose users are rising on the Web (probably
Indian languages, as Chinese languages have reached 40-50%).
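
As a rough sketch of how such time-series data could be used, the Python
snippet below flags the years in which a population's internet penetration
rate falls inside the 12.8%-40% band mentioned above. The function name
and the sample series are hypothetical; the figures are invented for
illustration and are not actual ITU data for any country or language.

```python
def growth_window(penetration_by_year, low=12.8, high=40.0):
    """Return the years whose penetration rate (in percent) lies in [low, high]."""
    return [
        year
        for year, rate in sorted(penetration_by_year.items())
        if low <= rate <= high
    ]


# Made-up series for a hypothetical language population (percent of population online).
example = {
    2004: 7.0,
    2005: 9.0,
    2006: 13.0,
    2007: 20.0,
    2008: 28.0,
    2009: 36.0,
    2010: 43.0,
}

window = growth_window(example)
print(window)
```

The thresholds are kept as parameters because the 12.8%-40% band comes
from one case study and may well need adjusting for other populations.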

Best,
han-teng liao



2014-07-08 12:03 GMT+01:00 Gerard Meijssen <gerard.meijs...@gmail.com>:

> Hoi,
> At the WMF language committee, the question if a language is viable for a
> Wikimedia project is a practical one. It is also very much a political one.
> One vitally important difference with your approach is that the distinction
> is between a first project and a subsequent project. In the latest
> iteration of the approach we do not consider Wikidata a first project.
> Relevance is that we do not require localisation of MediaWiki or an
> Incubator stage.
>
> When the question is what it takes for a new project to work? .. the
> simple answer is "a few good men". There are a few projects that are alive
> and well that rely on no more than 3 people.
>
> By not focussing on Wikipedia, it is possible that a Wikisource becomes
> the first project. When this is what those "few good men" want.. It is
> their party.
>
> You may imagine that we thought about what are the likely success factors
> for a new project. We did come up with similar ideas that you have. The
> problem is that it does not help. So you determine the likelihood of
> success, it does not guarantee it.
>
> What we certainly do not consider is the number of data sources. Sourcing
> is very much a luxury in starting projects. Insisting on sourcing at all
> will kill most initiatives immediately. What is important is that people
> start writing, reading in their language.. With a Wikipedia that gets
> active participation / readership, there will be a move to a more
> consistent orthography. Those that write determine in the end.
>
> Wikidata was given its exception because it represents the lowest level of
> participation with the most effect. Add one label to an item that is used a
> lot (human, male, female eg) and it can be used thousands of times. It is
> also very obvious to re-use dictionary information to make an impact.
> Thanks,
>       GerardM
>
>
>
> On 8 July 2014 09:27, Han-Teng Liao (OII) <han-teng.l...@oii.ox.ac.uk>
> wrote:
>
>> Dear all,
>>
>>      Your suggestions are needed on the ways in which one can construct
>> some sensible baselines, most likely based on data sets *external* to
>> Wikipedia projects, of *expected* Wikipedia language versions development.
>>
>>       Such baselines should ideally indicate, given the availability of
>> language users and content (some numbers based on external data sets), a
>> certain language version should have expected number of articles/active
>> users.
>>
>>       As previous research has suggested that Wikipedia activities need
>> mutually-reinforcing cycles of participation, content, and readership, it
>> is expected that the development of a Wikipedia language version is
>> conditioned by the availability of (digitally) literate users and (possibly
>> digitized) content/sources.
>>
>>      So the assumption is:
>>
>> Wikipedia Activities = Some function of (available users and content)
>>
>>       For example, the major non-English writing languages in the world
>> such as Arabic, Chinese, Spanish, etc., may have different numbers of
>> Internet users and digital content. These numbers indicate the basis on
>> which a Wikipedia language version can develop.
>>
>>       One practical use of this baseline measurement is to better
>> categorize/curate activities across Wikipedia language versions. We can
>> then better come up with expected values of Wikipedia development, and thus
>> categorize language versions accordingly based on the *external conditions*
>> of available/potential users and content.
>>
>>       Another use of this baseline measurement is to better compare the
>> development of different language versions. It should help answer questions
>> such as (1) whether Korean language version is *underdeveloped* on
>> Wikipedia platforms when compared with a language version that enjoys
>> similar number of available/potential users and content.
>>
>>      The current similar external baseline data is probably the number of
>> language speakers. My hunch is that it is not good enough in taking into
>> accounts the available/potential users and content, especially the
>> digitally-ready one.
>>
>>       So I welcome you to add to the following list, any external
>> indicators (and possibly data sources) that may help to construct such base
>> line.
>>
>> ==Indicators==
>>  * Internet users for each language (probably approximate measurement
>> based on CLDR Territory-Language information and ITU internet penetration
>> rates).
>>
>> * Number of books published annually in different languages (suggested
>> data sources? Does ISBN have a database or stat report on published
>> languages?)
>>
>> * Number of web pages returned by major search engines on the queries of
>> "Wikipedia" in different languages, excluding results from Wikimedia
>> projects.
>>
>> * Number of scholarly publications across languages (suggested data
>> sources?)
>>
>> * Number of major newspaper publications across languages (suggested data
>> sources?)
>>
>>
>>     Please share your thoughts!
>>
>> --
>> han-teng liao
>>
>> "[O]nce the Imperial Institute of France and the Royal Society of London
>> begin to work together on a new encyclopaedia, it will take less than a
>> year to achieve a lasting peace between France and England." - Henri
>> Saint-Simon (1810)
>>
>> "A common ideology based on this Permanent World Encyclopaedia is a
>> possible means, to some it seems the only means, of dissolving human
>> conflict into unity." - H.G. Wells (1937)
>>
>> _______________________________________________
>> Wiki-research-l mailing list
>> Wiki-research-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>>
>>
>