Re: [Wiki-research-l] Constructing sensible baselines for Wikipedia language development analytics

2014-07-11 Thread Gerard Meijssen
Hoi,
One more thing to consider is the possibility of generating articles on the
fly based on information in Wikidata. This is already done in the
Reasonator, and it functions with differing results for 2,225,364 items.
In essence it is a small script that can be translated into other languages.
Obviously it will not handle grammatical constructs that well, so the text
will be awkward. This will, however, change once Wikidata gains lexical
information, as is planned for its future.

To make this work well, the text can be cached and will not be saved as a
Wikipedia article. A different text is needed for other classes.
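
To illustrate the idea, here is a minimal sketch of template-based text generation from structured data. It is not Reasonator's actual code: the template table, the `describe` function, and the item layout are all hypothetical, and the example item is made up. One template is kept per item class and language, which is why a different text is needed for other classes.

```python
# Hypothetical sentence templates, keyed by (item class, language code).
TEMPLATES = {
    ("human", "en"): "{label} ({birth}-{death}) was a {occupation}.",
    ("human", "nl"): "{label} ({birth}-{death}) was een {occupation}.",
}


def describe(item, lang):
    """Render a one-sentence description of a structured item, or None."""
    template = TEMPLATES.get((item["class"], lang))
    if template is None:
        return None  # no text written yet for this class/language pair
    return template.format(
        label=item["labels"][lang],
        birth=item["birth_year"],
        death=item["death_year"],
        occupation=item["occupation"][lang],
    )


# Made-up item for illustration only; not real Wikidata content.
example = {
    "class": "human",
    "labels": {"en": "Example Person", "nl": "Voorbeeldpersoon"},
    "birth_year": 1900,
    "death_year": 1980,
    "occupation": {"en": "writer", "nl": "schrijver"},
}

print(describe(example, "en"))  # Example Person (1900-1980) was a writer.
```

Because the template ignores grammar (articles, gender, case), the output will read awkwardly for many values, which matches the limitation described above.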
Thanks,
 GerardM


On 10 July 2014 01:22, Kerry Raymond kerry.raym...@gmail.com wrote:

One thing that troubles me slightly with this conversation is that I
 think there is a presumption that people will naturally choose to read and
 write Wikipedia in their native language, but that isn’t necessarily so.



 Anecdotally it seems many people read English Wikipedia precisely because
 it is larger and more comprehensive (obviously they must have a reasonable
 ability to read English). And I would imagine that this is true too for,
 say, Catalan, where one might imagine they would also know Spanish and
 might well turn to the larger Spanish Wikipedia most of the time. Generally
 speaking, speakers of “small” languages (meaning small populations of
 native speakers) are likely to speak one or more “larger” languages and
 therefore may preferentially read Wikipedia in those “larger” languages in
 order to have a broader and deeper array of content. As I don’t speak any
 “small” languages myself, I do not know if those Wikipedias tend to cover
 the more general topics or whether there is a greater focus on local
 content unlikely to be covered in “larger” Wikipedias – does anyone know?



 If this is true about reading Wikipedia, then it seems likely to flow over
 into writing Wikipedia as well. Writing for the “larger” language has the
 benefit of bringing information to more people. So, here, motivation for
 editing comes into play. I suspect people who write for the “small”
 language Wikipedias probably have a motivation to keep their language
 alive, whereas this is unlikely to be a consideration for the large
 languages. But OTOH if you write for Wikipedia because you are passionate
 about sharing your knowledge of a topic area (e.g. Pokemon, football,
 cactus), then it seems that you would write in the Wikipedia with the
 largest content base on that topic (within your linguistic abilities) as
 you would have more to build on and a larger community of other editors to
 work with. Of course, working with others on Wikipedia isn’t always easy,
 and perhaps that might be a factor that might drive an editor to write in a
 “smaller” language Wikipedia (which might be more work, but with less
 conflict).



 What I don’t know is whether any of these issues are microscopic or
 macroscopic. If they are macroscopic, then they have to be factored into
 the model of “how a Wikipedia should develop”.



 My personal view is that the “extremely small” language Wikipedias are
 unlikely to achieve a broad coverage of general topics because they are
 unlikely to find a large enough editor community. I think they will
 underperform whatever level of development they might theoretically be
 capable of. My rationale is that we know that Wikipedia is written
 predominantly by people with higher than average levels of education, which
 almost certainly means you have had to learn one or more larger languages
 to do this, thus opening up the ability to work with other Wikipedias, thus
 siphoning off some proportion of the editor base to boost the development
 of larger Wikipedias at the expense of their native-language Wikipedia. I
 think it is more realistic to focus on more local content in small language
 Wikipedias and leave the more general content to the larger Wikipedias.



 Note, this is all written on the assumption of not using machine
 translation. Clearly with machine translation, there is far greater
 potential for content in languages for which there are machine translation
 tools. But again, machine translation is less likely to be available for
 the “very small” languages, so even in that scenario, I think the smaller
 language Wikipedias will miss out on the content.



 Kerry





 ___
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l




Re: [Wiki-research-l] Constructing sensible baselines for Wikipedia language development analytics

2014-07-09 Thread Federico Leva (Nemo)
h, 08/07/2014 13:49:
 This should also help sociolinguists to identify which languages
 [...] that
 are more developed than others in the Wikipedia sphere, and seeks
 explanations for their relative success/failure by contrasting the
 Wikipedia sphere and offline/online sphere.

Agreed on the importance of this (though I wouldn't restrict to
Wikipedia), and not only for researchers but also for editors to
self-assess. For many years our main tool has been sorting by the "Editors
(5+) per million speakers" column in
http://stats.wikimedia.org/EN/Sitemap.htm , which however has two main
issues:
1) an absurdly high number of editors in some editions makes some noise,
though it is not tragic (classic example: Volapük; funny but doesn't really
do any harm);
2) an unrealistic baseline of speakers in millions (which is not so
closely related to what happens on the wiki) means the rank mostly shows
how well those languages are doing on the internet, e.g. the classic
dominance of Scandinavia and Israel and the classic disuse of
Tagalog/Filipino (with some surprises like Northern Sami, which clearly
has some strong supporters out there).
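
For reference, the ratio behind that column can be sketched as below. This is an illustration only: the function names and the input layout are hypothetical, and the figures are placeholders rather than values from stats.wikimedia.org.

```python
def editors_per_million(active_editors, speakers):
    """Active editors (5+ edits) per million speakers of the language."""
    if speakers <= 0:
        raise ValueError("speakers must be positive")
    return active_editors / (speakers / 1_000_000)


def rank_editions(stats):
    """Sort editions by editors per million speakers, highest first.

    `stats` maps an edition code to (active_editors, speakers).
    """
    rates = {code: editors_per_million(e, s) for code, (e, s) in stats.items()}
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)


# Placeholder figures: a tiny speaker base inflates the rate enormously,
# which is the kind of noise described in issue 1.
print(rank_editions({"big": (3000, 100_000_000), "tiny": (5, 20)}))
```

The denominator is the weak point: swapping the speaker count for a more realistic external baseline changes the ranking without touching the numerator.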

Realistic baselines would let me answer simple questions like whether
it.wiki is really doing better than de.wiki (35 vs. 33?!); given the
similarity of conditions, if not I may conclude there is a large
uncultivated land out there just waiting for some seeds (outreach to
people not knowing Wikimedia projects enough), if yes I may conclude
we've probably exhausted our natural resources and need to focus on
using them more efficiently.

Nemo



Re: [Wiki-research-l] Constructing sensible baselines for Wikipedia language development analytics

2014-07-09 Thread Heather Ford
This is such a great discussion. Thanks for starting it, Han-teng :)

Laura, I just loved your analysis. Makes me realize that I spend way too
much time thinking about these things rather than practicing them which is
what you showed in your rapid analysis :)

One thing that I was really interested in was how you are thinking about
diversity of source languages. It's interesting because I tend to think
about this in exactly the opposite way! Basically, it seems that in your
analysis you're rewarding articles if they have a diversity of language
sources whereas I have always considered sources in terms of the
verifiability principle where the source should ideally be in the language
of the Wikipedia version so that users can verify whether the source is
being accurately reflected in the relevant article.

So I went to the 'verifiability' articles in a few different languages to
check whether there is consensus about this on Wikipedia, at least. The
English version [1] states that a) English-language sources are preferred
because it's the English Wikipedia, b) if another language source is used,
then editors may request a translation of relevant sections of the source,
and c) if other languages are used in quotations, then a translation must
be provided.

I looked at a few other language versions of the verifiability article (only
58 language versions have a version of this page) and few mention what to
do with other language sources. Afrikaans [2] seems to follow the
principles of the English version but Spanish and Catalan, for example,
don't mention other language versions of sources.

Anyway, I'd be really interested in what you think about this. Do you think
it's valuable to take Wikipedia's (or at least English Wikipedia's)
normative framework for evaluating citations or do you think there's value
in using another principle?

Thanks!

Best,
Heather.

[1] https://en.wikipedia.org/wiki/Wikipedia:Verifiability
[2] https://af.wikipedia.org/wiki/Wikipedia:Verifieerbaarheid


Heather Ford
Oxford Internet Institute http://www.oii.ox.ac.uk Doctoral Programme
EthnographyMatters http://ethnographymatters.net | Oxford Digital
Ethnography Group http://www.oii.ox.ac.uk/research/projects/?id=115
http://hblog.org | @hfordsa http://www.twitter.com/hfordsa




On 8 July 2014 11:13, Laura Hale la...@fanhistory.com wrote:

 I more or less tried to have a go at this on
 http://wikinewsreporter.wordpress.com/2014/06/30/determining-the-relative-quality-of-one-wikipedia-project-to-another-one-approach-with-english-spanish-catalan-galician-argonese-and-euskera-wikipedias/
 using both internal and external criteria for determining quality.
  (External being defined as what is considered good type of work on the
 topic using outside, non-Wikipedia specific definitions of quality.)

 Sincerely,
 Laura Hale


 On Tue, Jul 8, 2014 at 12:06 PM, Han-Teng Liao (OII) 
 han-teng.l...@oii.ox.ac.uk wrote:

 Thanks Jane for the comments and suggestions.

 Correct me if I misread your comments/suggestions, Jane.

 (1) Did you suggest measurements that are observable *inside*
 Wikipedia/Wikimedia websites?
 (2) If so, does it mean that your suggestion of measuring the current
 state of a language version as a combination of the state of its
 content and community describes only the *internal* state of that version?
 (3) When you said zero-state, did you mean the state where the number
 of articles in a given language version is zero?

 Your suggestions appear to me to deal with a measurement of the current
 state of a language version. The use of zero-state suggests equal
 grounds for any language version to develop on the Wikipedia platform.

 However, my call for help focuses on the current state out there,
 external to the Wikipedia platform. In this context, the term *baseline*
 suggests some languages are already *more equal* than others because of
 the availability of language users and content out there. Since Wikipedia
 depends on reliable published secondary sources, some languages are
 *expected* to be more developed than others. What I want to do is to
 come up with such *expectation values* so that researchers and community members
 can see which language versions perform better/worse than expected, in
 comparison to other languages.

 While I can agree that on the Wikipedia platform any language may have
 equal grounding when starting from zero, it is my contention that some
 languages are already *more equal* than others.

 In other words, I want to construct sensible baselines *against which*
 the development of language versions can be better understood. Such
 baselines thus should capture external factors that are likely to condition
 the development. Normalization of development metrics using such baselines
 can then control for these external factors to see which language versions
 underperform even when the external availability of content and users is not
 an issue. It can also help to see which language 

Re: [Wiki-research-l] Constructing sensible baselines for Wikipedia language development analytics

2014-07-08 Thread Laura Hale
I more or less tried to have a go at this on
http://wikinewsreporter.wordpress.com/2014/06/30/determining-the-relative-quality-of-one-wikipedia-project-to-another-one-approach-with-english-spanish-catalan-galician-argonese-and-euskera-wikipedias/
using both internal and external criteria for determining quality.
 (External being defined as what is considered good type of work on the
topic using outside, non-Wikipedia specific definitions of quality.)

Sincerely,
Laura Hale


On Tue, Jul 8, 2014 at 12:06 PM, Han-Teng Liao (OII) 
han-teng.l...@oii.ox.ac.uk wrote:

 Thanks Jane for the comments and suggestions.

 Correct me if I misread your comments/suggestions, Jane.

 (1) Did you suggest measurements that are observable *inside*
 Wikipedia/Wikimedia websites?
 (2) If so, does it mean that your suggestion of measuring the current
 state of a language version as a combination of the state of its content
 and community describes only the *internal* state of that version?
 (3) When you said zero-state, did you mean the state where the number
 of articles in a given language version is zero?

 Your suggestions appear to me to deal with a measurement of the current state
 of a language version. The use of zero-state suggests equal grounds
 for any language version to develop on the Wikipedia platform.

 However, my call for help focuses on the current state out there,
 external to the Wikipedia platform. In this context, the term *baseline*
 suggests some languages are already *more equal* than others because of
 the availability of language users and content out there. Since Wikipedia
 depends on reliable published secondary sources, some languages are
 *expected* to be more developed than others. What I want to do is to
 come up with such *expectation values* so that researchers and community members
 can see which language versions perform better/worse than expected, in
 comparison to other languages.

 While I can agree that on the Wikipedia platform any language may have
 equal grounding when starting from zero, it is my contention that some
 languages are already *more equal* than others.

 In other words, I want to construct sensible baselines *against which* the
 development of language versions can be better understood. Such baselines
 thus should capture external factors that are likely to condition the
 development. Normalization of development metrics using such baselines can
 then control for these external factors to see which language versions
 underperform even when the external availability of content and users is not
 an issue. It can also help to see which language versions outperform even
 when the external conditions are not that great.
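
The normalization described above could be sketched as follows. This is only an illustration under loudly stated assumptions: a single external indicator (internet users per language) stands in for the availability of users and content, the relationship is assumed to be log-linear, and the function names, input layout, and figures are hypothetical placeholders. A ratio above 1 means an edition has more articles than the fitted baseline expects; a ratio below 1 means fewer.

```python
import math


def fit_log_linear(xs, ys):
    """Ordinary least squares for log(y) = a + b * log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def performance_ratios(editions):
    """Map each edition to observed / expected articles under the baseline."""
    names = list(editions)
    xs = [editions[n]["internet_users"] for n in names]
    ys = [editions[n]["articles"] for n in names]
    a, b = fit_log_linear(xs, ys)
    return {
        n: y / math.exp(a + b * math.log(x))
        for n, x, y in zip(names, xs, ys)
    }


# Placeholder editions with made-up codes and figures, for illustration only.
editions = {
    "aa": {"internet_users": 1_000_000, "articles": 10_000},
    "bb": {"internet_users": 10_000_000, "articles": 400_000},
    "cc": {"internet_users": 100_000_000, "articles": 1_000_000},
}
for name, ratio in performance_ratios(editions).items():
    print(name, round(ratio, 2))
```

With more indicators (books published, scholarly output, and so on) the same idea would call for a multiple regression; the single-predictor fit is kept here only to make the observed-versus-expected comparison concrete.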

 Hence, I really appreciate your suggestions as potential indicators of the
 (internal) development state of a language version of Wikipedia, but they
 do not appear to capture factors that are external to Wikipedia.

 Best,

 2014-07-08 10:09 GMT+01:00 Jane Darnell jane...@gmail.com:

 Well as I see it, the state of any language version is a combination of
 the state of its content and community. Going back to the zero-state, in
 order to have permission to start a language version, there must be a list
 of 10,000 important topics that has to be registered somewhere (sorry, no
 idea where). This list for the English Wikipedia includes an entry for the
 singer Michael Jackson, one of the many articles that gets lots and lots of
 page hits daily. Perhaps this is the case for all other languages in the
 world (I have no idea), but I would assume one measurement going forward
 from the zero-state would be the number of changes over time involving this
 list in the specific language, such as
 1) The list itself (do these topics ever change?)
 2) The average number of edits and page views of those pages in the
 specific language
 3) The average number of blue links per page on those pages in the
 specific language
 4) The average number of editors *ever* contributing per page on those
 pages in the specific language
 5) The average number of active editors contributing per page on those
 pages in the specific language
 ...

 Other important measurements could be the number of active editors over
 all, the number of edits appearing in the recent changes list per
 day/month/year, the number of pages created or deleted per day/month/year...


 On Tue, Jul 8, 2014 at 9:27 AM, Han-Teng Liao (OII) 
 han-teng.l...@oii.ox.ac.uk wrote:

 Dear all,

  Your suggestions are needed on the ways in which one can construct
 some sensible baselines, most likely based on data sets *external* to
 Wikipedia projects, of *expected* Wikipedia language versions development.

   Such baselines should ideally indicate, given the availability of
 language users and content (some numbers based on external data sets), the
 number of articles/active users a certain language version is expected
 to have.

   As previous research has suggested that Wikipedia activities need

Re: [Wiki-research-l] Constructing sensible baselines for Wikipedia language development analytics

2014-07-08 Thread Stuart A. Yeates
Web browser language settings are an obvious place to start this. This
will give you an approximation of user's preferred language (more
likely the preferred language of those who configured their software).
See http://www.w3.org/International/questions/qa-lang-priorities.en.php
for the gory details.

cheers
stuart



Re: [Wiki-research-l] Constructing sensible baselines for Wikipedia language development analytics

2014-07-08 Thread Gerard Meijssen
Hoi,
At the WMF language committee, the question of whether a language is viable for a
Wikimedia project is a practical one. It is also very much a political one.
One vitally important difference with your approach is that the distinction
is between a first project and a subsequent project. In the latest
iteration of the approach we do not consider Wikidata a first project.
Relevance is that we do not require localisation of MediaWiki or an
Incubator stage.

When the question is "what does it take for a new project to work?", the
simple answer is "a few good men". There are a few projects that are alive
and well that rely on no more than 3 people.

By not focussing on Wikipedia, it is possible that a Wikisource becomes the
first project, when this is what those few good men want. It is their
party.

You may imagine that we thought about what the likely success factors are
for a new project. We did come up with ideas similar to yours. The
problem is that it does not help. You can determine the likelihood of
success, but that does not guarantee it.

What we certainly do not consider is the number of data sources. Sourcing
is very much a luxury in starting projects. Insisting on sourcing at all
will kill most initiatives immediately. What is important is that people
start writing and reading in their language. With a Wikipedia that gets
active participation / readership, there will be a move to a more
consistent orthography. Those who write decide in the end.

Wikidata was given its exception because it represents the lowest level of
participation with the most effect. Add one label to an item that is used a
lot (human, male, female, e.g.) and it can be used thousands of times. It is
also very obvious to re-use dictionary information to make an impact.
Thanks,
  GerardM



On 8 July 2014 09:27, Han-Teng Liao (OII) han-teng.l...@oii.ox.ac.uk
wrote:

 Dear all,

  Your suggestions are needed on the ways in which one can construct
 some sensible baselines, most likely based on data sets *external* to
 Wikipedia projects, of *expected* Wikipedia language versions development.

   Such baselines should ideally indicate, given the availability of
 language users and content (some numbers based on external data sets), the
 number of articles/active users a certain language version is expected
 to have.

   As previous research has suggested that Wikipedia activities need
 mutually-reinforcing cycles of participation, content, and readership, it
 is expected that the development of a Wikipedia language version is
 conditioned by the availability of (digitally) literate users and (possibly
 digitized) content/sources.

  So the assumption is:

 Wikipedia Activities = Some function of (available users and content)

   For example, the major non-English writing languages in the world
 such as Arabic, Chinese, Spanish, etc., may have different numbers of
 Internet users and digital content. These numbers indicate the basis on
 which a Wikipedia language version can develop.

   One practical use of this baseline measurement is to better
 categorize/curate activities across Wikipedia language versions. We can
 then better come up with expected values of Wikipedia development, and thus
 categorize language versions accordingly based on the *external conditions*
 of available/potential users and content.

   Another use of this baseline measurement is to better compare the
 development of different language versions. It should help answer questions
 such as whether the Korean language version is *underdeveloped* on
 Wikipedia platforms when compared with a language version that enjoys
 a similar number of available/potential users and content.

  The current comparable external baseline data is probably the number of
 language speakers. My hunch is that it is not good enough at taking into
 account the available/potential users and content, especially those that
 are digitally ready.

   So I welcome you to add to the following list any external
 indicators (and possibly data sources) that may help to construct such a
 baseline.

 ==Indicators==
  * Internet users for each language (probably an approximate measurement
 based on CLDR Territory-Language information and ITU internet penetration
 rates).

 * Number of books published annually in different languages (suggested
 data sources? Does ISBN have a database or stat report on published
 languages?)

 * Number of web pages returned by major search engines on the queries of
 Wikipedia in different languages, excluding results from Wikimedia
 projects.

 * Number of scholarly publications across languages (suggested data
 sources?)

 * Number of major newspaper publications across languages (suggested data
 sources?)
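
As a rough sketch of how the first indicator might be approximated: for each territory, multiply the population by the internet penetration rate and by each language's share of the population, then sum per language. The input layout and function name below are hypothetical (they are not the CLDR or ITU data formats), the figures are placeholders, and the calculation assumes penetration is uniform across language groups within a territory, which will often be untrue.

```python
def internet_users_by_language(territories):
    """Estimate internet users per language from territory-level figures.

    Each territory is a dict with:
      population           -- total inhabitants
      internet_penetration -- fraction of inhabitants online (0..1)
      language_shares      -- language code -> fraction of inhabitants
    Assumes penetration is the same for every language group in a territory.
    """
    totals = {}
    for territory in territories:
        online = territory["population"] * territory["internet_penetration"]
        for lang, share in territory["language_shares"].items():
            totals[lang] = totals.get(lang, 0.0) + online * share
    return totals


# Placeholder territories and language codes, for illustration only.
territories = [
    {
        "population": 1_000_000,
        "internet_penetration": 0.5,
        "language_shares": {"xx": 0.8, "yy": 0.2},
    },
    {
        "population": 2_000_000,
        "internet_penetration": 0.25,
        "language_shares": {"xx": 0.1, "yy": 0.9},
    },
]
print(internet_users_by_language(territories))
```

Shares can sum to more than 1 within a territory when people are multilingual, so the per-language totals should be read as potential users rather than as a partition of the online population.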


 Please share your thoughts!

 --
 han-teng liao

 [O]nce the Imperial Institute of France and the Royal Society of London
 begin to work together on a new encyclopaedia, it will take less than a
 year to achieve a lasting peace between France and 

Re: [Wiki-research-l] Constructing sensible baselines for Wikipedia language development analytics

2014-07-08 Thread h
(user language log: e.g. Accept-Language parameter)

Yes Stuart, locale data could be a nice source to look at, including the
HTTP Accept-Language header, from which one can read locale preferences such as
zh-TW,zh;q=0.8,en;q=0.6
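
A minimal sketch of reading such a header value into ranked language preferences follows. The function name is hypothetical and the parsing is deliberately simplified: it handles only the tag and the optional q-value, and it treats a malformed q-value as 0.

```python
def parse_accept_language(header):
    """Return (language_tag, quality) pairs sorted by descending quality."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # a tag without an explicit q-value defaults to 1.0
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # simplification: malformed q means unwanted
        prefs.append((tag.strip(), quality))
    # sorted() is stable, so tags with equal quality keep their header order
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


print(parse_accept_language("zh-TW,zh;q=0.8,en;q=0.6"))
# [('zh-TW', 1.0), ('zh', 0.8), ('en', 0.6)]
```

Counting the top-ranked tag per request would give the approximation of preferred language that Stuart describes, with his caveat that it reflects whoever configured the software.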

Do you or anyone have suggestions on the external or global datasets that
can be used as a proxy for global web user activities based on
locale/languages?

I guess the above question is a more general one that I may also want to
ask the people on the air-l mailing list.

Best,
han-teng


2014-07-08 11:13 GMT+01:00 Stuart A. Yeates syea...@gmail.com:

 Web browser language settings are an obvious place to start this. This
 will give you an approximation of user's preferred language (more
 likely the preferred language of those who configured their software).
 See http://www.w3.org/International/questions/qa-lang-priorities.en.php
 for the gory details.

 cheers
 stuart



Re: [Wiki-research-l] Constructing sensible baselines for Wikipedia language development analytics

2014-07-08 Thread h
Indeed, GerardM, I agree with you that a few good women or men with
passions can kick start some Wikimedia projects, and different Wikimedia
projects have different barriers or paths of development.

I also agree with you that the direction that I am pursuing may not be
helpful to those languages still in their incubation stage. To be honest, I am not
trying to measure the likelihood of success.

What I am trying to measure is probably akin to the external *difficulty*
to be overcome for success. Here I have to admit that I approach this
question wearing a researcher hat more so than a Wikipedian hat.

Having said that, I personally believe this approach can be very productive
in generating outcomes for major world languages such as Mandarin, Spanish,
Hindi, Arabic, Bengali, Russian, Japanese and Punjabi (all these languages
have more native speakers than German, BTW). This way, researchers can make
them more comparable because of the available external baselines.

I can envision that the outcomes can help these communities to find their
strengths and weaknesses. Strategies can then be devised to
increase/expand their reach to available external content or users.

This should also help sociolinguists to identify which languages
(especially non-national languages such as Kurdish or Cantonese) are
more developed than others in the Wikipedia sphere, and to seek explanations
for their relative success/failure by contrasting the Wikipedia sphere with the
offline/online sphere. These languages include many of the mid-size
language versions of Wikipedia such as Catalan, Cantonese, Tamil, etc.

Thus, I would argue that the analytical direction I want to take would be
useful for many language versions which already have some user base and
content. Again, I want them to be aware of both the internal and external
state of each language version, thereby contextualizing the differences
among them.  The baseline stats based on external sources should make them
more comparable, instead of just number games among different language
groups of Wikipedians.

Also, I have to agree with GerardM that the issue is both practical and
political. I would like to add it is also political in terms of fund
dissemination within the global Wikimedia/open knowledge movement. I
personally believe that only with the external numbers about potentially
available users and content outside Wikipedia can we realize how much is
utilized/recruited from the external pool into the internal
Wikimedia/Wikipedia projects. This should provide a sensible basis for
comparison on which Wikipedians can reflect.

Finally, may I point out that the external environments for languages are
also changing, which could be useful for the global Wikimedia/open knowledge
movement to know. Based on my research on the competition between Baidu Baike
and Chinese Wikipedia in mainland China, I found that the windfall of
fast-growing numbers of internet users from late 2005 to 2008 was crucial for
any website to thrive in mainland China, a windfall that Chinese Wikipedia
missed because of the block by Beijing. From this, I argue that it makes
strategic sense to catch the wave of rising internet users, esp. during the
time when the penetration rate quickly rises from 12.8% to 40% for a given
population.  The external time-series data points can help point out the
languages whose users are rising on the Web (probably Indian languages, now
that Chinese languages have reached 40-50%).
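
A small sketch of how such a time series could be screened for the rapid-growth window: the 12.8% and 40% bounds come from the paragraph above, while the function name and the example series are hypothetical placeholders, not measured data.

```python
def growth_window_years(penetration_by_year, low=0.128, high=0.40):
    """Return the years in which penetration lies in [low, high), in order."""
    return [
        year
        for year, rate in sorted(penetration_by_year.items())
        if low <= rate < high
    ]


# Placeholder series for illustration; not measured data.
series = {2004: 0.07, 2005: 0.10, 2006: 0.15, 2007: 0.24, 2008: 0.33, 2009: 0.42}
print(growth_window_years(series))  # [2006, 2007, 2008]
```

Running this per language population would flag which communities are currently inside the window and therefore where outreach might be most timely.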

Best,
han-teng liao



2014-07-08 12:03 GMT+01:00 Gerard Meijssen gerard.meijs...@gmail.com:

 Hoi,
 At the WMF language committee, the question of whether a language is viable for a
 Wikimedia project is a practical one. It is also very much a political one.
 One vitally important difference with your approach is that the distinction
 is between a first project and a subsequent project. In the latest
 iteration of the approach we do not consider Wikidata a first project.
 Relevance is that we do not require localisation of MediaWiki or an
 Incubator stage.

 When the question is "what does it take for a new project to work?", the
 simple answer is "a few good men". There are a few projects that are alive
 and well that rely on no more than 3 people.

 By not focussing on Wikipedia, it is possible that a Wikisource becomes
 the first project, when this is what those few good men want. It is
 their party.

 You may imagine that we thought about what the likely success factors are
 for a new project. We did come up with ideas similar to yours. The
 problem is that it does not help. You can determine the likelihood of
 success, but that does not guarantee it.

 What we certainly do not consider is the number of data sources. Sourcing
 is very much a luxury in starting projects. Insisting on sourcing at all
 will kill most initiatives immediately. What is important is that people
 start writing and reading in their language. With a Wikipedia that gets
 active participation / readership, there will be a move to a more
 consistent 

Re: [Wiki-research-l] Constructing sensible baselines for Wikipedia language development analytics

2014-07-08 Thread Jane Darnell
han-teng liao,
Sorry but I had to read your answer a couple times before I understood what
you were getting at. I missed the previous conversation also. For
information about the 10,000 things, I would just go to GerardM because he
knows all about that stuff. As far as page stats on all the projects, you
may want to talk to Erik Zachte about his Infodisiac charts.

The reason I was confused is because you can't make a comparison study
unless you fix a few variables, and if you take an approach that is only
including external sites that are only available in the specific languages,
then I don't think you have variables that you can compare across
languages. My point about the combination community-content, is that there
is no quality content without all of the chaff, and there is no community
without quality content.

Therefore, you need to look at both: the social side of editor
interactions (or lack thereof) is just as important as the content
itself. Don't forget that all edits result in one way or another from
an internet search.

As far as the term "a few good men" goes, I think Gerard was referring to
the success of the Dutch Wikipedia, which is pretty good in terms of the
number of people in the world who speak and read the language, while 94%
of all editors are male. I think Erik Zachte has gathered some numbers on
the language-speakers per Wikipedia aspect of this issue. Gender
information is only based on survey results, though I think our survey was
pretty solid, and though 6% declined to specify their gender, even if you
add this to the other 6% then the "few good men" statement still holds.

Jane
Lezer van de Prullenbak van de Ingezonden Brieven




On Tue, Jul 8, 2014 at 2:12 PM, h hant...@gmail.com wrote:

 (on Laura Hale's pilot study of measuring quality across several languages
 used in Spain)

 Laura, I enjoyed reading the report in your blog post, which also takes
 a quantified approach to measuring quality.

 If I did not misread your blog post, you incorporated the measurement of
 the general quality of external links (which I assume could be sources or
 suggested reading lists) as a component of Wikipedia article quality. I also
 used a similar research strategy for my thesis chapter.

 I also like the initial pick of “Female MEPs for Spain” (women
 representing Spain who are or have been Members of the European Parliament)
 for study. It would be interesting to see how the same methodology applies
 to male MEPs in another gender-minded study.

 Can you tell me *whether* and *how* you measure or consider the
 availability of users/content for these languages in Spain? To me, Spanish
 is a world language, whereas the other languages (esp. Catalan) may
 have some development of their own publishing markets. It would
 be even more interesting to know about gender-minded external
 publications in each language, and to use this information to contextualize
 the sources used in these Wikipedia articles. It is not an uncommon practice
 for a Wikipedia article to cite a source in a more dominant or more widely
 *published* language. I would imagine some Catalan Wikipedia articles also cite Spanish
 sources, for instance.

 This leads to the issue of language/knowledge dependency. Although I only
 addressed this issue superficially by visualizing interlanguage links
 before, it is on my mind, though it is a separate issue.

 Best,
 han-teng liao


 2014-07-08 11:13 GMT+01:00 Laura Hale la...@fanhistory.com:

 I more or less tried to have a go at this on
 http://wikinewsreporter.wordpress.com/2014/06/30/determining-the-relative-quality-of-one-wikipedia-project-to-another-one-approach-with-english-spanish-catalan-galician-argonese-and-euskera-wikipedias/
 using both internal and external criteria for determining quality.
  (External being defined as what is considered good type of work on the
 topic using outside, non-Wikipedia specific definitions of quality.)

 Sincerely,
 Laura Hale


 On Tue, Jul 8, 2014 at 12:06 PM, Han-Teng Liao (OII) 
 han-teng.l...@oii.ox.ac.uk wrote:

 Thanks Jane for the comments and suggestions.

 Correct me if I misread your comments/suggestions, Jane.

 (1) Did you suggest measurements that are observable *inside*
 Wikipedia/Wikimedia websites?
 (2) If so, does it mean that your suggestion of measuring the current
 state of a language version as a combination of the state of its
 content and community describes only the *internal* state of that version?
 (3) When you said zero-state, did you mean the state where the number
 of articles in a given language version is zero?

 Your suggestions appear to me to deal with a measurement of the current
 state of a language version. The use of zero-state suggests equal
 grounds for any language version to develop on the Wikipedia platform.

 However, my call for help focuses on the current state out
 there, external to the Wikipedia platform. In this context, the term *baseline*
 suggests some languages are