Indeed! Orienting it that way (pivoting on language rather than project) is something several people have asked for; I plan to spend a chunk of my spare time (that is, recreational time) trying to make it work. Should be fairly trivial.
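For illustration only, here is a minimal sketch of how a language-oriented view might be built with pandas. It assumes the column names used in Finn's snippet quoted below (`project`, `country`, `pageviews_percentage`) and that the language code is the first label of the project hostname; the toy rows are just the figures mentioned in this thread, not the full dataset.

```python
import pandas as pd

# Toy subset: only figures quoted in this thread, not the full dataset.
rows = [
    ("da.wikipedia.org", "Denmark", 61),
    ("da.wikipedia.org", "Sweden", 18),
    ("da.wikipedia.org", "United Kingdom", 3),
    ("da.wikipedia.org", "United States", 3),
    ("sv.wikipedia.org", "Denmark", 2),
]
df = pd.DataFrame(rows, columns=["project", "country", "pageviews_percentage"])

# Keep Wikipedia projects only, so each language code appears once per country.
wiki = df[df["project"].str.endswith(".wikipedia.org")]

# Assumption: the language code is the first label of the hostname,
# e.g. "da.wikipedia.org" -> "da".
wiki = wiki.assign(language=wiki["project"].str.split(".").str[0])

# One row per country, one column per language; missing pairs become NaN.
by_country = wiki.pivot(index="country", columns="language",
                        values="pageviews_percentage")
print(by_country)
```

Note that each cell is still the share of that language edition's traffic coming from the country. Answering "which language editions does a given country read most" would probably need absolute counts, which this sketch does not have.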

On 2 March 2015 at 09:55, h <hant...@gmail.com> wrote:
> Hello Finn,
>
> I do not have a specific answer to your question. However, it might be
> worthwhile to add Finnish to the comparison, because according to the
> CLDR 26 T-L (territory-language) information
> http://www.unicode.org/cldr/charts/26/supplemental/territory_language_information.html
> Sweden has a sizable number of Finnish speakers:
>
> Swedish {O} sv 95.0% 99.0%
> Finnish {OR} fi 2.2%
>
> So if a similar query is run for Finnish, and the results also show an
> "undue" proportion of visits from Sweden, then what you observed as an
> anomaly is not that unique. We probably need several rounds of comparison
> and normalization of the data (Sweden does have a larger population).
> Also, it might be handy to have some statistics on immigration or
> residence, since this is the EU. I would not be surprised if, for example,
> visits from Oxford to Wikipedia included a sizable share of German-language
> requests.
>
> I am still a bit bothered by the number "1" in the current dataset. It
> does not feel right, since 1.4% and 0.6% are notably different yet both
> round to 1. Perhaps we need a high-precision "universal percentage" number
> for each territory-language pair. It would also be great to do another
> set of aggregations: i.e., given a territory, which language versions of
> Wikipedia are accessed.
>
> Best,
> han-teng liao
>
> 2015-03-02 13:54 GMT+01:00 Finn Årup Nielsen <f...@imm.dtu.dk>:
>>
>> Hi Oliver,
>>
>> Interesting dataset! I am curious about why the Danish Wikipedia is so
>> highly accessed from Sweden. Could it be an error, e.g., with Telia
>> IP-numbers?
>>
>> In Python:
>>
>> >>> import pandas as pd
>> >>> df = pd.read_csv(
>> ...     'http://files.figshare.com/1923822/language_pageviews_per_country.tsv',
>> ...     sep='\t')
>> >>> # Share of da.wikipedia.org pageviews by country
>> >>> df.loc[df.project == 'da.wikipedia.org',
>> ...        ['country', 'pageviews_percentage']].set_index('country')
>>                 pageviews_percentage
>> country
>> Austria                            1
>> China                              1
>> Denmark                           61
>> Estonia                            1
>> France                             1
>> Germany                            2
>> Netherlands                        2
>> Norway                             1
>> Sweden                            18
>> United Kingdom                     3
>> United States                      3
>> Other                              5
>>
>> MaxMind has some numbers on their own accuracy:
>>
>> https://www.maxmind.com/en/geoip2-city-database-accuracy
>>
>> For Denmark 85% is "Correctly Resolved", for Sweden only 68%. I wonder if
>> this really could bias the result so much.
>>
>> If the numbers are correct, why would Swedes read the Danish Wikipedia
>> so much? Bots? It does not apply the other way around: only 2% of the
>> traffic to the Swedish Wikipedia comes from Denmark.
>>
>> best regards
>> Finn
>>
>> On 02/25/2015 10:06 PM, Oliver Keyes wrote:
>>>
>>> Hey all!
>>>
>>> We've released a highly-aggregated dataset of readership data -
>>> specifically, data about where, geographically, traffic to each of our
>>> projects (and all of our projects) comes from. The data can be found
>>> at http://dx.doi.org/10.6084/m9.figshare.1317408 - additionally, I've
>>> put together an exploration tool for it at
>>> https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/
>>>
>>> Hope it's useful to people!
>>>
>>
>> --
>> Finn Årup Nielsen
>> http://people.compute.dtu.dk/faan/
>>
>> _______________________________________________
>> Wiki-research-l mailing list
>> Wiki-research-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

--
Oliver Keyes
Research Analyst
Wikimedia Foundation