Re: [Wiki-research-l] [Release]
2015-03-04 8:44 GMT+01:00 Dario Taraborelli dtarabore...@wikimedia.org:
> yay, shiny! The map is a pretty compelling way to show how dominant traffic from the US is, even for very minor languages (say bi.wikipedia.org). I wonder how many requests from US-based bots/automata we’re still failing to detect.

Still, the question could be: are we fulfilling the mission? (hint: probably not)

Cristian

___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Re: [Wiki-research-l] [Release]
That is the question, and I agree with your conclusion. I'm hoping to do more research into this; getting buy-in internally has been tough, but I'm confident of making progress on that front over the next few weeks and months.

On 4 March 2015 at 04:13, Cristian Consonni kikkocrist...@gmail.com wrote:
> Still, the question could be: are we fulfilling the mission? (hint: probably not)

--
Oliver Keyes
Research Analyst
Wikimedia Foundation
Re: [Wiki-research-l] [Release]
On 4 March 2015 at 04:28, Pine W wiki.p...@gmail.com wrote:
> I'm not sure how much influence I have, but I would be happy to make whispers in appropriate places to try to get more support, if that's helpful.

I think I'm probably good, but thank you.

> Perhaps you could show your work at the next Research and Data showcase? I for one would be interested in seeing a presentation.

That's in 3 weeks; I'm not convinced that a piece of substantive, useful research about global reach could be done in that time period even if I could drop everything I currently have (which I can't). This problem is too big and too important to be scheduled around meetings; things should work the other way around.

Scott Hale and I have been working on a paper looking at global reach and how it tracks with internet access growth, in the context of editing, particularly looking at the mobile web. That, we should be done with by then; presenting it could be highly useful (Scott? ;p)

--
Oliver Keyes
Research Analyst
Wikimedia Foundation
Re: [Wiki-research-l] [Release]
'Lots, but that's not currently anyone's job'

On Wednesday, 4 March 2015, Dario Taraborelli dtarabore...@wikimedia.org wrote:
> yay, shiny! The map is a pretty compelling way to show how dominant traffic from the US is, even for very minor languages (say bi.wikipedia.org). I wonder how many requests from US-based bots/automata we’re still failing to detect.
Re: [Wiki-research-l] [Release]
I'm not sure how much influence I have, but I would be happy to make whispers in appropriate places to try to get more support, if that's helpful.

Perhaps you could show your work at the next Research and Data showcase? I for one would be interested in seeing a presentation.

Pine

*This is an Encyclopedia* https://www.wikipedia.org/

*One gateway to the wide garden of knowledge, where lies
The deep rock of our past, in which we must delve
The well of our future,
The clear water we must leave untainted for those who come after us,
The fertile earth, in which truth may grow in bright places, tended by many hands,
And the broad fall of sunshine, warming our first steps toward knowing how much we do not know.*

*—Catherine Munro*

On Wed, Mar 4, 2015 at 1:25 AM, Oliver Keyes oke...@wikimedia.org wrote:
> That is the question, and I agree with your conclusion. I'm hoping to do more research into this; getting buy-in internally has been tough, but I'm confident of making progress on that front over the next few weeks and months.
Re: [Wiki-research-l] [Release]
Oliver:
> Scott Hale and I have been working on a paper looking at global reach and how it tracks with internet access growth, in the context of editing, particularly looking at the mobile web. That, we should be done with by then; presenting it could be highly useful (Scott? ;p)

I see what you did there, Oliver :J I believe the showcase is in two weeks (3rd Wednesday of the month), which is a bit too tight to make sure everything is really checked and accurate. I'm in Asia in April, but *we* could definitely present the work in May. As Oliver said, it correlates Wikipedia editor numbers with mobile and broadband penetration data at the country level.

Dario:
> I wonder how many requests from US-based bots/automata we’re still failing to detect.

This reminds me that I would like to engage with the technical development team on the idea of storing the application (i.e., the OAuth consumer id) for each edit made through the API. Not all bots use the API, I guess, but I would venture that many (maybe most) do, and tracking those would then become trivial. Tracking the applications used to make edits via the API would also allow tracking of alternative editing interfaces (e.g., VisualEditor uses the API, and perhaps AutoWikiBrowser or others do as well). I've never proposed a technical enhancement request for MediaWiki, so I would very much welcome guidance.

Best wishes,
Scott
Re: [Wiki-research-l] [Release]
yay, shiny! The map is a pretty compelling way to show how dominant traffic from the US is, even for very minor languages (say bi.wikipedia.org). I wonder how many requests from US-based bots/automata we’re still failing to detect.

On Mar 3, 2015, at 9:29 PM, Oliver Keyes oke...@wikimedia.org wrote:
> Update: the original Shiny instance went down due to server load soon after release. It's now up again at http://datavis.wmflabs.org/where/ on a dedicated Labs machine, where we hope to put...many more visualisations.
Re: [Wiki-research-l] [Release]
Update: the original Shiny instance went down due to server load soon after release. It's now up again at http://datavis.wmflabs.org/where/ on a dedicated Labs machine, where we hope to put...many more visualisations. It also now has mapping, largely thanks to Sarah Laplante (http://sarahlaplante.com/), and soon it will hopefully be /non-hideous/ mapping (the current mass of blue and grey is because my aesthetic tastes are...I don't actually have any aesthetic tastes).

--
Oliver Keyes
Research Analyst
Wikimedia Foundation
Re: [Wiki-research-l] [Release]
Indeed! Orienting it that way (pivoting on language rather than project) is something several people have asked for; I plan to spend a chunk of my spare time (that is, recreational time) trying to make it work. Should be fairly trivial.

On 2 March 2015 at 09:55, h hant...@gmail.com wrote:
> It would be also great to do another set of aggregation: i.e. given a territory, which language versions of Wikipedia are accessed

--
Oliver Keyes
Research Analyst
Wikimedia Foundation
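The territory-first view requested in this thread (given a country, which projects draw traffic from it) might be sketched as follows. This is only an illustration, not the tool's actual implementation: the rows are the handful of figures quoted in the thread, and `projects_for_country` is a hypothetical helper name. Note that each percentage is a share of one project's traffic, so the values rank how much each project depends on the country, not how much the country reads each project.

```python
import pandas as pd

# A few rows copied from figures quoted in this thread; the published
# dataset has many more projects and countries.
rows = [
    ("da.wikipedia.org", "Denmark", 61),
    ("da.wikipedia.org", "Sweden", 18),
    ("da.wikipedia.org", "United States", 3),
    ("sv.wikipedia.org", "Denmark", 2),
]
df = pd.DataFrame(rows, columns=["project", "country", "pageviews_percentage"])


def projects_for_country(frame, country):
    """Return the projects that list `country`, largest share first (hypothetical helper)."""
    subset = frame.loc[frame["country"] == country, ["project", "pageviews_percentage"]]
    return subset.sort_values("pageviews_percentage", ascending=False).set_index("project")


denmark = projects_for_country(df, "Denmark")
print(denmark)
```

Answering "which language versions does a territory read most" in absolute terms would need per-project pageview counts, which the percentage-only table does not carry.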
Re: [Wiki-research-l] [Release]
Hello Finn,

I do not have a specific answer to your question. However, it might be worthwhile to add Finnish into the comparison, since according to the CLDR 26 territory-language information
http://www.unicode.org/cldr/charts/26/supplemental/territory_language_information.html
there is a sizable population of Finnish speakers in Sweden:

    Swedish {O}  sv  95.0%  99.0%
    Finnish {OR} fi   2.2%

So if a similar query is run for the Finnish Wikipedia and the results also show an undue proportion of visits from Sweden, then the anomaly you observed is not that unique. We probably need many iterations of comparative outcomes and normalization of the data (Sweden does have a higher population). It might also be handy to have some statistics on immigration or residence, since this is within the EU. I would not be surprised if, for example, visits to Wikipedia from Oxford included a sizable number of German-language requests.

I am still a bit bothered by the number 1 in the current dataset. It does not feel right, since 1.4% and 0.6% both appear as 1 even though the difference between them is notable. Perhaps we need a high-precision percentage for each territory-language pair.

It would also be great to do another set of aggregations: i.e. given a territory, which language versions of Wikipedia are accessed.

Best,
han-teng liao

2015-03-02 13:54 GMT+01:00 Finn Årup Nielsen f...@imm.dtu.dk:
> I am curious about why the Danish Wikipedia is so highly accessed from Sweden. Could it be an error, e.g., with Telia IP-numbers?
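The rounding concern in this message can be made concrete with a small sketch. It assumes the published integers are percentages rounded to the nearest whole number (the thread does not say how they were produced), and the counts below are hypothetical values chosen only to give shares of 1.4% and 0.6%.

```python
def rounded_share(views, total):
    """Percentage share rounded to a whole number (assumed publication format)."""
    return round(100 * views / total)


# Hypothetical counts: 140 and 60 views out of 10,000.
total = 10_000
a, b = 140, 60

# Both shares collapse to the same integer...
print(rounded_share(a, total), rounded_share(b, total))

# ...while one decimal place keeps them apart.
print(round(100 * a / total, 1), round(100 * b / total, 1))
```

One country here has more than twice the traffic of the other, yet both would be reported as 1, which is the loss of information the message describes.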
Re: [Wiki-research-l] [Release]
Hi Oliver,

Interesting dataset! I am curious about why the Danish Wikipedia is so highly accessed from Sweden. Could it be an error, e.g., with Telia IP-numbers?

In Python:

    import pandas as pd

    df = pd.read_csv('http://files.figshare.com/1923822/language_pageviews_per_country.tsv', sep='\t')
    # Shares of Danish Wikipedia traffic, indexed by country
    df.loc[df.project == 'da.wikipedia.org', ['country', 'pageviews_percentage']].set_index('country')

                    pageviews_percentage
    country
    Austria                            1
    China                              1
    Denmark                           61
    Estonia                            1
    France                             1
    Germany                            2
    Netherlands                        2
    Norway                             1
    Sweden                            18
    United Kingdom                     3
    United States                      3
    Other                              5

MaxMind has some numbers on their own accuracy:
https://www.maxmind.com/en/geoip2-city-database-accuracy

For Denmark 85% is Correctly Resolved, for Sweden only 68%. I wonder if this really could bias the result so much.

If the numbers are correct, why would Swedes read the Danish Wikipedia so much? Bots? It does not apply the other way around: only 2% of the traffic to the Swedish Wikipedia comes from Denmark.

best regards
Finn

--
Finn Årup Nielsen
http://people.compute.dtu.dk/faan/
[Wiki-research-l] [Release]
Hey all!

We've released a highly-aggregated dataset of readership data - specifically, data about where, geographically, traffic to each of our projects (and all of our projects) comes from. The data can be found at http://dx.doi.org/10.6084/m9.figshare.1317408 - additionally, I've put together an exploration tool for it at https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/

Hope it's useful to people!

--
Oliver Keyes
Research Analyst
Wikimedia Foundation
Re: [Wiki-research-l] [Release]
Great job. Who knew Esperanto was big in Japan and China at #2 and #3?

On Wed, Feb 25, 2015 at 4:06 PM, Oliver Keyes oke...@wikimedia.org wrote:
> We've released a highly-aggregated dataset of readership data - specifically, data about where, geographically, traffic to each of our projects (and all of our projects) comes from.
Re: [Wiki-research-l] [Release]
The one major caveat, I think, is that proportionate data makes small projects very vulnerable to artificial traffic spikes. I'd go out on a limb and say that some of the massive bumps in popularity we see in particular combinations are likely due to either undetected automata or simply a project having so little traffic that a small number of people can sway the results outlandishly.

On 25 February 2015 at 16:32, Andrew Lih andrew@gmail.com wrote:
> Great job. Who knew Esperanto was big in Japan and China at #2 and #3?

--
Oliver Keyes
Research Analyst
Wikimedia Foundation
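To illustrate why proportionate data is fragile for small projects, here is a minimal arithmetic sketch. All figures are hypothetical (they are not taken from the dataset), and `share_with_spike` is an illustrative helper: it adds the same burst of 500 automated requests from one country to a small and a large project that both start with 5% of their traffic from that country.

```python
def share_with_spike(country_views, total_views, spike):
    """Percentage share for one country after `spike` extra requests from it."""
    return 100 * (country_views + spike) / (total_views + spike)


# Hypothetical daily totals for a small and a large project.
for label, total in [("small", 1_000), ("large", 1_000_000)]:
    baseline = total * 0.05  # 5% of traffic from the country before the spike
    before = share_with_spike(baseline, total, 0)
    after = share_with_spike(baseline, total, 500)
    print(f"{label}: {before:.2f}% -> {after:.2f}%")
```

Under these assumptions the small project's share for that country jumps from 5% to roughly 37%, while the large project's share barely moves, which is consistent with the caveat about outlandish swings on low-traffic projects.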
Re: [Wiki-research-l] [Release]
This is really, really cool, great job guys!

G

Giovanni Luca Ciampaglia
✎ 919 E 10th ∙ Bloomington 47408 IN ∙ USA
☞ http://www.glciampaglia.com/
✆ +1 812 855-7261
✉ gciam...@indiana.edu

2015-02-25 16:06 GMT-05:00 Oliver Keyes oke...@wikimedia.org:
> We've released a highly-aggregated dataset of readership data - specifically, data about where, geographically, traffic to each of our projects (and all of our projects) comes from.