A couple of research papers that might be helpful:

1: Hecht, B. and Gergle, D. 2009. Measuring Self-Focus Bias in Community-Maintained Knowledge Repositories. Proceedings of the 2009 International Conference on Communities and Technologies, pp. 11-19.
http://www.brenthecht.com/publications/bhecht_CommAndTech2009.pdf

In their paper, Hecht & Gergle study how content in some of the Wikipedia editions is focused on certain countries, and those typically correspond to where the languages are spoken.

2: Warncke-Wang, M., Uduwage, A., Dong, Z., and Riedl, J. "In Search of the Ur-Wikipedia: Universality, Similarity, and Translation in the Wikipedia Inter-language Link Network", in WikiSym 2012.
http://www-users.cs.umn.edu/~morten/publications/wikisym2012-urwikipedia.pdf

In this paper we wanted to study similarity based on distance, meaning that we needed to see if we could locate a Wikipedia edition to a specific country. Turns out that if you look at the statistics[1], a lot of the language editions get the vast majority of edits from a single country. While that's not helpful when it comes to the English edition, it arguably solves the problem for quite a few other languages.

Footnotes:
1: https://stats.wikimedia.org/wikimedia/squids/SquidReportPageEditsPerLanguageBreakdown.htm

Cheers,
Morten

On 24 January 2017 at 07:27, Peter Ekman <[email protected]> wrote:

> Regarding Kerry Raymond's "Patriotic editing hypothesis", I've done some very simple informal investigation regarding the quality of geographic articles, these are mostly on cities, towns, counties, etc. in en:Wikipedia. Geographic articles have much lower average quality scores than other subjects (see https://en.wikipedia.org/wiki/User:Smallbones/Quality4by4 )
>
> With just a small bit of poking around it's obvious that the quality difference between geo articles and the rest is due to geo articles about countries where English is not the native language. A bit more poking and something that should have been really obvious jumps out. French geo articles on FR:Wiki are much better (at least longer) than the corresponding EN:Wiki article; Russian geo articles are much better on RU:Wiki than on EN:Wiki, etc.
>
> This is certainly consistent with the "Patriotic editing hypothesis" if we define patriotism by language rather than by borders. It could be checked out with other language versions e.g. German vs. French; (Finnish, Estonian, Polish, German, or Hungarian, etc.) vs. Russian; Chinese vs. any language.
>
> The hypothesis even had a very practical implication - we should translate more geo articles from their native language Wikipedias.
>
> Hope this helps,
> Pete Ekman
> ====
> Date: Tue, 24 Jan 2017 11:12:58 +1000
> From: "Kerry Raymond" <[email protected]>
> To: "'Research into Wikimedia content and communities'" <[email protected]>
> Subject: [Wiki-research-l] regional KPIs
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset="utf-8"
>
> As previously came up in discussion about chapters, it would be very useful to have national data about Wikipedia activities, which can be determined (generally) from IP addresses. Now I understand the privacy argument in relation to logged-in users (not saying I agree with it though in relation to aggregate data). However, can we find a proxy that does not have the privacy considerations?
>
> My hypothesis is that national content is predominantly written by users resident in that nation. And that therefore activity on national content can be used as a proxy for national user editing activity.
>
> In the case of Australia, we could describe Australian national content in either of two ways: articles within the closure of the [[Category:Australia]] and/or those tagged as {{WikiProject Australia}}. There are arguments for/against either (neither is perfect, in my experience the category closure will tend to have false positives and the project will tend to have false negatives).
>
> I would like to know what correlation exists between national editor activity (as determined from IP addresses mapped to location) and national content edits and if/how it changes over time for various nations. This is research that only WMF can do because WMF has the IP addresses and the rest of us can't have them for privacy reasons.
>
> If we could establish that a strong-enough correlation existed between them, we could use national content activity (for which there is no privacy consideration) as a proxy for national editing activity. And we might even be able to come up with a multiplier for each nation to provide comparable data for national editing activity.
>
> Now, it may be that we need to restrict the edits themselves in some way to maximise the correlations between national content and same-nation editor activity.
>
> My second hypothesis is that "semantic" edits (e.g. edits that add large amounts of content or citation) to national content will be more highly correlated with same-nation editors than "syntactic" edits (e.g. fix spelling, punctuation or Manual of Style issues) will be. I suspect most bots and other automated/semi-automated edits are doing syntactic edits.
>
> Now, some of you will probably be aware of [https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2017-01-17/Recent_research Female Wikipedians aren't more likely to edit women biographies]. So it may well be that my patriotic-editing hypothesis is also untrue. But it would be nice to know one way or the other.
>
> Kerry
>
> _______________________________________________
> Wiki-research-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
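
For anyone who wants to try the correlation check described in the quoted message, here is a minimal sketch. It assumes you already have two monthly series for one nation: edits to national content, and edits by editors geolocated to that nation. The function names (`pearson`, `multiplier`) and all the numbers are hypothetical illustrations and not real Wikimedia data. The "multiplier" is just one possible reading of the per-nation scaling factor mentioned in the thread.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def multiplier(content_edits, editor_edits):
    """Ratio of same-nation editor edits to national-content edits over the period.

    This is an assumed definition of the per-nation multiplier, not one
    given in the thread.
    """
    total_content = sum(content_edits)
    if total_content == 0:
        raise ValueError("no national-content edits in the period")
    return sum(editor_edits) / total_content


# Hypothetical monthly counts for one nation (made-up numbers).
content_edits = [120, 150, 130, 170, 160]  # edits to national content
editor_edits = [58, 77, 66, 83, 81]        # edits by editors located in that nation

r = pearson(content_edits, editor_edits)
m = multiplier(content_edits, editor_edits)
print(f"correlation: {r:.3f}, multiplier: {m:.2f}")
```

A high correlation on real data would support using national content activity as a proxy. Running the same calculation separately on "semantic" and "syntactic" edits would be one way to test the second hypothesis.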
