Re: [Wiki-research-l] [Release]

Oliver Keyes Mon, 02 Mar 2015 19:37:27 -0800

Indeed! Orienting it that way (pivoting on language rather than
project) is something several people have asked for; I plan to spend a
chunk of my spare time (that is, recreational time) trying to make it
work. Should be fairly trivial.
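
A minimal sketch of that pivot, assuming the column names used in Finn's
snippet quoted below (project, country, pageviews_percentage) and assuming
the language code is the first label of the project hostname. The function
name pivot_by_language is hypothetical, and the toy rows hold only figures
mentioned in this thread, not the full dataset.

```python
import pandas as pd


def pivot_by_language(df: pd.DataFrame) -> pd.DataFrame:
    """Reshape per-project rows into a country x language table."""
    # Keep only Wikipedia projects, e.g. 'da.wikipedia.org'.
    wiki = df[df["project"].str.endswith(".wikipedia.org")].copy()
    # Assumption: the language code is the first label of the hostname.
    wiki["language"] = wiki["project"].str.split(".").str[0]
    # One row per country, one column per language; missing pairs stay NaN.
    return wiki.pivot(index="country", columns="language",
                      values="pageviews_percentage")


# Toy input using only numbers quoted in this thread.
rows = pd.DataFrame({
    "project": ["da.wikipedia.org", "da.wikipedia.org", "sv.wikipedia.org"],
    "country": ["Denmark", "Sweden", "Denmark"],
    "pageviews_percentage": [61, 18, 2],
})
table = pivot_by_language(rows)
print(table)
```

Note that the percentages are normalized per project, so a row of this
table is not a within-country share; answering "given a territory, which
language versions are accessed" would need raw counts rather than these
rounded percentages.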


On 2 March 2015 at 09:55, h <hant...@gmail.com> wrote:
> Hello Finn,
>    I do not have a specific answer to your question. However, it might be
> worthwhile to add Finnish to the comparison. According to the CLDR 26
> territory-language information
> http://www.unicode.org/cldr/charts/26/supplemental/territory_language_information.html
> there is a sizable number of Finnish speakers in Sweden:
>
> Swedish {O} sv 95.0% 99.0%
> Finnish {OR} fi 2.2%
>
>     So if a similar query is run for the Finnish language and the results
> also show an "undue" proportion of visits from Sweden, then the anomaly you
> observed is not that unique. We probably need many iterations of
> comparative outcomes and normalization of the data (Sweden does have a larger
> population). Also, it might be handy to have some statistics on immigration
> or residence, since this is the EU. I would not be surprised if, for example,
> visits from Oxford to Wikipedia included a sizable share of German-language
> requests.
>
>     I am still a bit bothered by the number "1" in the current dataset. It
> does not feel right, since values such as 1.4% and 0.6% differ notably
> in this regard. Perhaps we need a higher-precision "universal
> percentage" number for each territory-language pair. It would also be great
> to do another aggregation: given a territory, which language
> versions of Wikipedia are accessed?
>
> Best,
> han-teng liao
>
> 2015-03-02 13:54 GMT+01:00 Finn Årup Nielsen <f...@imm.dtu.dk>:
>>
>> Hi Oliver,
>>
>>
>> Interesting dataset! I am curious about why the Danish Wikipedia is so
>> highly accessed from Sweden. Could it be an error, e.g., with Telia
>> IP addresses?
>>
>> In Python:
>>
>> >>> import pandas as pd
>> >>> df = pd.read_csv('http://files.figshare.com/1923822/language_pageviews_per_country.tsv', sep='\t')
>> >>> df.loc[df.project == 'da.wikipedia.org', ['country', 'pageviews_percentage']].set_index('country')
>>                 pageviews_percentage
>> country
>> Austria                            1
>> China                              1
>> Denmark                           61
>> Estonia                            1
>> France                             1
>> Germany                            2
>> Netherlands                        2
>> Norway                             1
>> Sweden                            18
>> United Kingdom                     3
>> United States                      3
>> Other                              5
>>
>>
>> MaxMind has some numbers on their own accuracy:
>>
>> https://www.maxmind.com/en/geoip2-city-database-accuracy
>>
>> For Denmark 85% is "Correctly Resolved", for Sweden only 68%. I wonder if
>> this really could bias the result so much.
>>
>> If the numbers are correct, why would Swedes read the Danish Wikipedia
>> so much? Bots? It does not apply the other way around: only 2% of the
>> traffic to the Swedish Wikipedia comes from Denmark.
>>
>>
>>
>> best regards
>> Finn
>>
>>
>>
>> On 02/25/2015 10:06 PM, Oliver Keyes wrote:
>>>
>>> Hey all!
>>>
>>> We've released a highly-aggregated dataset of readership data -
>>> specifically, data about where, geographically, traffic to each of our
>>> projects (and all of our projects) comes from. The data can be found
>>> at http://dx.doi.org/10.6084/m9.figshare.1317408 - additionally, I've
>>> put together an exploration tool for it at
>>> https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/
>>>
>>> Hope it's useful to people!
>>>
>>
>>
>> --
>> Finn Årup Nielsen
>> http://people.compute.dtu.dk/faan/
>>
>>
>> _______________________________________________
>> Wiki-research-l mailing list
>> Wiki-research-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l



-- 
Oliver Keyes
Research Analyst
Wikimedia Foundation

