han-teng liao,
Sorry, but I had to read your answer a couple of times before I understood what
you were getting at. I also missed the previous conversation. For
information about the 10,000 things, I would just go to GerardM because he
knows all about that stuff. As for page stats on all the projects, you
may want to talk to Erik Zachte about his Infodisiac charts.

I was confused because you can't make a comparison study
unless you fix a few variables, and if your approach includes only
external sites that are available only in the specific languages,
then I don't think you have variables that you can compare across
languages. My point about the community-content combination is that there
is no quality content without all of the chaff, and there is no community
without quality content.

Therefore, you need to look at both: the social side of editor
interactions (or lack thereof) is just as important as the content
itself. Don't forget that all edits result in one way or another from
an internet search.

As far as the term "a few good men" goes, I think Gerard was referring to
the success of the Dutch Wikipedia, which is pretty good in terms of
"number of people in the world who speak and read the language", while 94%
of all editors are male. I think Erik Zachte has gathered some numbers on
the language-speakers per Wikipedia aspect of this issue. Gender
information is based only on survey results, though I think our survey was
pretty solid. Although 6% declined to specify their gender, even if you
add this to the other 6%, the "few good men" statement still holds.

Jane
Lezer van de Prullenbak van de Ingezonden Brieven




On Tue, Jul 8, 2014 at 2:12 PM, h <[email protected]> wrote:

> (on Laura Hale's pilot study of measuring quality across several languages
> used in Spain)
>
> Laura, I enjoy reading the report in your blog post, which also takes
> the quantified approach to measuring quality.
>
> If I did not misread your blogpost, you incorporated the measurement of
> general quality of external links (which I assume could be sources or
> suggested reading lists) as components of Wikipedia article quality. I also
> used similar research strategy for my thesis chapter.
>
> I also like the initial pick of “Female MEPs for Spain” (women
> representing Spain who are or have been Members of the European Parliament)
> for study. It would be interesting to see how the same methodology applies
> to male MEPs in another gender-minded study.
>
> Can you tell me *whether* and *how* you measure or consider the
> availability of users/content for these languages in Spain? To me, the
> Spanish language is a world language, whereas other languages (esp. Catalan)
> may have some kind of development in terms of their publishing market. It
> would be even more interesting to know which gender-minded external
> publications exist for each language, and to use this information to
> contextualize the sources used in these Wikipedia articles. It is not
> uncommon for a Wikipedia article to cite a source in a more dominant or more
> widely *published* language. I would imagine some Catalan Wikipedia articles
> also cite Spanish sources, for instance.
>
> This leads to the issue of language/knowledge dependency. Although I only
> addressed this issue superficially by visualizing interlanguage links
> before, it is on my mind, though it is a separate issue.
>
> Best,
> han-teng liao
>
>
> 2014-07-08 11:13 GMT+01:00 Laura Hale <[email protected]>:
>
>> I more or less tried to have a go at this on
>> http://wikinewsreporter.wordpress.com/2014/06/30/determining-the-relative-quality-of-one-wikipedia-project-to-another-one-approach-with-english-spanish-catalan-galician-argonese-and-euskera-wikipedias/
>> using both internal and external criteria for determining quality.
>> (External being defined as what is considered a good type of work on the
>> topic using outside, non-Wikipedia-specific definitions of quality.)
>>
>> Sincerely,
>> Laura Hale
>>
>>
>> On Tue, Jul 8, 2014 at 12:06 PM, Han-Teng Liao (OII) <
>> [email protected]> wrote:
>>
>>> Thanks Jane for the comments and suggestions.
>>>
>>> Correct me if I misread your comments/suggestions, Jane.
>>>
>>> (1) Did you suggest measurements that are observable *inside*
>>> Wikipedia/Wikimedia websites?
>>> (2) If so, does it mean that your suggestion of measuring the current
>>> state of a language version as "a combination of the state of its
>>> content and community" describes only the *internal* state of that version?
>>> (3) When you said "zero-state", did you mean the state where the number
>>> of articles in a given language version is zero?
>>>
>>> Your suggestions appear to me to deal with a measurement of the current
>>> state of a language version. The use of "zero-state" suggests equal
>>> grounds for any language version to develop on the Wikipedia platform.
>>>
>>> However, my call for help focuses on the current state of things
>>> external to the Wikipedia platform. In this context, the term *baseline*
>>> suggests some languages are already *more equal* than others because of
>>> the availability of language users and content out there. Since Wikipedia
>>> depends on reliable published secondary sources, some languages are
>>> *expected* to be more developed than others. What I want to do is to
>>> come up with such *expectation values* so that researchers and community
>>> members can see which language versions perform better/worse than expected,
>>> in comparison to other languages.
>>>
>>> While I can agree that on the Wikipedia platform any language may have
>>> an equal footing when it starts from zero, it is my contention that some
>>> languages are already *more equal* than others.
>>>
>>> In other words, I want to construct sensible baselines *against which*
>>> the development of language versions can be better understood. Such
>>> baselines should thus capture external factors that are likely to condition
>>> the development. Normalizing development metrics with such baselines
>>> can then control for these external factors, to show which language versions
>>> underperform even when the external availability of content and users is not
>>> an issue. It can also help to show which language versions outperform even
>>> when the external conditions are not that great.
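
The normalization idea in the paragraph above can be sketched in a few lines
of Python. This is only an illustration under stated assumptions, not the
method anyone in this thread used: it assumes a single external indicator
(internet users per language), assumes a log-log linear relationship between
that indicator and article counts, and uses made-up placeholder numbers for
three hypothetical languages. The names fit_loglog and performance_ratios
are invented for this sketch.

```python
import math


def fit_loglog(xs, ys):
    """Ordinary least squares of log(y) on log(x); returns (slope, intercept)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def performance_ratios(external, observed):
    """Return observed / expected per language, with expected taken from the fit.

    A ratio above 1 means the language version does better than the baseline
    predicts; a ratio below 1 means it does worse.
    """
    langs = sorted(external)
    slope, intercept = fit_loglog(
        [external[lang] for lang in langs],
        [observed[lang] for lang in langs],
    )
    ratios = {}
    for lang in langs:
        expected = math.exp(intercept + slope * math.log(external[lang]))
        ratios[lang] = observed[lang] / expected
    return ratios


# Placeholder values only (hypothetical languages, NOT real statistics).
external_users = {"lang_a": 1_000_000, "lang_b": 10_000_000, "lang_c": 100_000_000}
article_counts = {"lang_a": 20_000, "lang_b": 50_000, "lang_c": 2_000_000}

ratios = performance_ratios(external_users, article_counts)
for lang, ratio in ratios.items():
    print(f"{lang}: observed/expected = {ratio:.2f}")
```

With real data, the single indicator would be replaced by whichever external
indicators turn out to be available, and the choice of functional form would
itself need to be justified.
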
>>>
>>> Hence, I really appreciate your suggestions as potential indicators of
>>> the (internal) development state of a language version of Wikipedia, but
>>> they do not appear to capture factors that are external to Wikipedia.
>>>
>>> Best,
>>>
>>> 2014-07-08 10:09 GMT+01:00 Jane Darnell <[email protected]>:
>>>
>>>> Well as I see it, the state of any language version is a combination of
>>>> the state of its content and community. Going back to the zero-state, in
>>>> order to have permission to start a language version, there must be a "list
>>>> of 10,000 important topics" that has to be registered somewhere (sorry, no
>>>> idea where). This list for the English Wikipedia includes an entry for the
>>>> singer Michael Jackson, one of the many articles that gets lots and lots of
>>>> page hits daily. Perhaps this is the case for all other languages in the
>>>> world (I have no idea), but I would assume one measurement going forward
>>>> from the zero-state would be the number of changes over time involving this
>>>> list in the specific language, such as
>>>> 1) The list itself (do these topics ever change?)
>>>> 2) The average number of edits and page views of those pages in the
>>>> specific language
>>>> 3) The average number of blue links per page on those pages in the
>>>> specific language
>>>> 4) The average number of editors *ever* contributing per page on those
>>>> pages in the specific language
>>>> 5) The average number of active editors contributing per page on those
>>>> pages in the specific language
>>>> ...
>>>>
>>>> Other important measurements could be the number of active editors
>>>> overall, the number of edits appearing in the recent changes list per
>>>> day/month/year, the number of pages created or deleted per
>>>> day/month/year...
>>>>
>>>>
>>>> On Tue, Jul 8, 2014 at 9:27 AM, Han-Teng Liao (OII) <
>>>> [email protected]> wrote:
>>>>
>>>>> Dear all,
>>>>>
>>>>>      Your suggestions are needed on ways to construct some sensible
>>>>> baselines, most likely based on data sets *external* to Wikipedia
>>>>> projects, for the *expected* development of Wikipedia language versions.
>>>>>
>>>>>       Such baselines should ideally indicate, given the availability
>>>>> of language users and content (some numbers based on external data sets),
>>>>> the number of articles/active users that a certain language version
>>>>> should be expected to have.
>>>>>
>>>>>       As previous research has suggested that Wikipedia activities
>>>>> need mutually reinforcing cycles of participation, content, and
>>>>> readership, it is expected that the development of a Wikipedia language
>>>>> version is conditioned by the availability of (digitally) literate users
>>>>> and (possibly digitized) content/sources.
>>>>>
>>>>>      So the assumption is:
>>>>>
>>>>> Wikipedia Activities = Some function of (available users and content)
>>>>>
>>>>>       For example, the major non-English written languages of the
>>>>> world, such as Arabic, Chinese, and Spanish, may have different numbers
>>>>> of Internet users and different amounts of digital content. These numbers
>>>>> indicate the basis on which a Wikipedia language version can develop.
>>>>>
>>>>>       One practical use of this baseline measurement is to better
>>>>> categorize/curate activities across Wikipedia language versions. We can
>>>>> then come up with better expected values of Wikipedia development, and
>>>>> thus categorize language versions according to the *external conditions*
>>>>> of available/potential users and content.
>>>>>
>>>>>       Another use of this baseline measurement is to better compare
>>>>> the development of different language versions. It should help answer
>>>>> questions such as whether the Korean language version is *underdeveloped*
>>>>> on the Wikipedia platform when compared with a language version that
>>>>> enjoys a similar number of available/potential users and content.
>>>>>
>>>>>      The closest existing external baseline data is probably the number
>>>>> of language speakers. My hunch is that it is not good enough at taking
>>>>> into account the available/potential users and content, especially the
>>>>> digitally ready ones.
>>>>>
>>>>>       So I welcome you to add to the following list any external
>>>>> indicators (and possibly data sources) that may help to construct such a
>>>>> baseline.
>>>>>
>>>>> ==Indicators==
>>>>> * Internet users for each language (probably an approximate measurement
>>>>> based on CLDR Territory-Language information and ITU internet penetration
>>>>> rates).
>>>>>
>>>>> * Number of books published annually in different languages (suggested
>>>>> data sources? Does ISBN have a database or stat report on published
>>>>> languages?)
>>>>>
>>>>> * Number of web pages returned by major search engines on the queries
>>>>> of "Wikipedia" in different languages, excluding results from Wikimedia
>>>>> projects.
>>>>>
>>>>> * Number of scholarly publications across languages (suggested data
>>>>> sources?)
>>>>>
>>>>> * Number of major newspaper publications across languages (suggested
>>>>> data sources?)
>>>>>
>>>>>
>>>>>     Please share your thoughts!
>>>>>
>>>>> --
>>>>> han-teng liao
>>>>>
>>>>> "[O]nce the Imperial Institute of France and the Royal Society of
>>>>> London begin to work together on a new encyclopaedia, it will take less
>>>>> than a year to achieve a lasting peace between France and England." -
>>>>> Henri Saint-Simon (1810)
>>>>>
>>>>> "A common ideology based on this Permanent World Encyclopaedia is a
>>>>> possible means, to some it seems the only means, of dissolving human
>>>>> conflict into unity." - H.G. Wells (1937)
>>>>>
>>>>> _______________________________________________
>>>>> Wiki-research-l mailing list
>>>>> [email protected]
>>>>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>>>>>
>> --
>> twitter: purplepopple