A couple of research papers that might be helpful:

1: Hecht, B. and Gergle, D. 2009. Measuring Self-Focus Bias in
Community-Maintained Knowledge Repositories. Proceedings of the 2009
International Conference on Communities and Technologies, pp. 11-19.
http://www.brenthecht.com/publications/bhecht_CommAndTech2009.pdf

In their paper, Hecht & Gergle study how content in some Wikipedia editions
is focused on certain countries, which typically correspond to where each
edition's language is spoken.

2: Warncke-Wang, M., Uduwage, A., Dong, Z., and Riedl, J. "In Search of the
Ur-Wikipedia: Universality, Similarity, and Translation in the Wikipedia
Inter-language Link Network", in WikiSym 2012.
http://www-users.cs.umn.edu/~morten/publications/wikisym2012-urwikipedia.pdf

In this paper we wanted to study similarity based on distance, meaning that
we needed to see if we could tie a Wikipedia edition to a specific
country. It turns out that if you look at the statistics[1], a lot of the
language editions get the vast majority of their edits from a single country.
While that's not helpful when it comes to the English edition, it arguably
solves the problem for quite a few other languages.
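
To make the "single country" idea concrete, here is a minimal sketch of the
kind of check I mean. It is not the code from the paper; the function name,
the 0.8 threshold, and the edition/country labels and counts are all
placeholders made up for illustration, not figures from the statistics page.

```python
from typing import Dict, Optional, Tuple


def dominant_country(
    edits_by_country: Dict[str, int], threshold: float = 0.8
) -> Optional[Tuple[str, float]]:
    """Return (country, share) if one country supplies at least `threshold`
    of an edition's edits, otherwise None.

    The threshold is an assumed cut-off, chosen only for this example.
    """
    total = sum(edits_by_country.values())
    if total == 0:
        return None
    # Pick the country with the largest edit count.
    country, count = max(edits_by_country.items(), key=lambda kv: kv[1])
    share = count / total
    return (country, share) if share >= threshold else None


# Placeholder counts, NOT real data: edition -> {country: number of edits}.
sample = {
    "edition_a": {"country_1": 900, "country_2": 100},
    "edition_b": {"country_1": 400, "country_2": 350, "country_3": 250},
}

located = {edition: dominant_country(counts) for edition, counts in sample.items()}
```

With these made-up counts, edition_a would be assigned to country_1 and
edition_b would stay unassigned, which is roughly the situation the English
edition is in.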


Footnotes:
1:
https://stats.wikimedia.org/wikimedia/squids/SquidReportPageEditsPerLanguageBreakdown.htm


Cheers,
Morten


On 24 January 2017 at 07:27, Peter Ekman <[email protected]> wrote:

> Regarding Kerry Raymond's "Patriotic editing hypothesis", I've done
> some very simple informal investigation of the quality of geographic
> articles in en:Wikipedia; these are mostly on cities, towns, counties,
> etc. Geographic articles have much lower average quality scores than
> articles on other subjects (see
> https://en.wikipedia.org/wiki/User:Smallbones/Quality4by4 ).
> With just a small bit of poking around it's obvious that the quality
> difference between geo articles and the rest is due to geo articles
> about countries where English is not the native language. A bit more
> poking and something that should have been really obvious jumps out.
> French geo articles on FR:Wiki are much better (or at least longer) than
> the corresponding EN:Wiki articles; Russian geo articles are much
> better on RU:Wiki than on EN:Wiki, etc.
>
> This is certainly consistent with the "Patriotic editing hypothesis"
> if we define patriotism by language rather than by borders.  It could
> be checked out with other language versions, e.g. German vs. French;
> (Finnish, Estonian, Polish, German, or Hungarian, etc.) vs. Russian;
> Chinese vs. any language.
>
> The hypothesis even has a very practical implication: we should
> translate more geo articles from their native language Wikipedias.
>
> Hope this helps,
> Pete Ekman
> ====
> Date: Tue, 24 Jan 2017 11:12:58 +1000
> From: "Kerry Raymond" <[email protected]>
> To: "'Research into Wikimedia content and communities'"
>         <[email protected]>
> Subject: [Wiki-research-l] regional KPIs
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset="utf-8"
>
> As previously came up in discussion about chapters, it would be very useful
> to have national data about Wikipedia activities, which can be determined
> (generally) from IP addresses. Now I understand the privacy argument in
> relation to logged-in users (not saying I agree with it though in relation
> to aggregate data). However, can we find a proxy that does not have the
> privacy considerations?
>
>
>
> My hypothesis is that national content is predominantly written by users
> resident in that nation, and that activity on national content can
> therefore be used as a proxy for national user editing activity.
>
>
>
> In the case of Australia, we could describe Australian national content in
> either of two ways: articles within the closure of the
> [[Category:Australia]] and/or those tagged as {{WikiProject Australia}}.
> There are arguments for/against either (neither is perfect; in my
> experience the category closure will tend to have false positives and the
> project tag will tend to have false negatives).
>
>
>
> I would like to know what correlation exists between national editor
> activity (as determined from IP addresses mapped to location) and national
> content edits and if/how it changes over time for various nations. This is
> research that only WMF can do because WMF has the IP addresses and the rest
> of us can't have them for privacy reasons.
>
>
>
> If we could establish that a strong-enough correlation existed between
> them, we could use national content activity (for which there is no privacy
> consideration) as a proxy for national editing activity. And we might even
> be able to come up with a multiplier for each nation to provide comparable
> data for national editing activity.
>
>
>
> Now, it may be that we need to restrict the edits themselves in some way to
> maximise the correlations between national content and same-nation editor
> activity.
>
>
>
> My second hypothesis is that "semantic" edits (e.g. edits that add large
> amounts of content or citations) to national content will be more highly correlated
> with same-nation editors than "syntactic" edits (e.g. fix spelling,
> punctuation or Manual of Style issues) will be. I suspect most bots and
> other automated/semi-automated edits are doing syntactic edits.
>
>
>
> Now, some of you will probably be aware of
> [https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2017-01-17/Recent_research
> Female Wikipedians aren't more likely to edit women biographies].
> So it may well be that my patriotic-editing hypothesis is also untrue. But
> it would be nice to know one way or the other.
>
>
>
> Kerry
>
> _______________________________________________
> Wiki-research-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
