Re: [Wiki-research-l] [Release]

2015-03-04 Thread Cristian Consonni
2015-03-04 8:44 GMT+01:00 Dario Taraborelli dtarabore...@wikimedia.org:
 yay, shiny! The map is a pretty compelling way to show how dominant traffic
 from the US is, even for very minor languages (say bi.wikipedia.org). I
 wonder how many requests from US-based bots/automata we’re still failing to
 detect.

Still, the question could be: are we fulfilling the mission?
(hint: probably not)

Cristian

___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] [Release]

2015-03-04 Thread Oliver Keyes
That is the question, and I agree with your conclusion. I'm hoping to
do more research into this; getting buy-in internally has been tough,
but I'm confident of making progress on that front over the next few
weeks and months.

On 4 March 2015 at 04:13, Cristian Consonni kikkocrist...@gmail.com wrote:
 Still, the question could be: are we fulfilling the mission?
 (hint: probably not)

 Cristian


-- 
Oliver Keyes
Research Analyst
Wikimedia Foundation



Re: [Wiki-research-l] [Release]

2015-03-04 Thread Oliver Keyes
On 4 March 2015 at 04:28, Pine W wiki.p...@gmail.com wrote:
 I'm not sure how much influence I have, but I would be happy to make
 whispers in appropriate places to try to get more support, if that's
 helpful.


I think I'm probably good, but thank you.

 Perhaps you could show your work at the next Research and Data showcase? I
 for one would be interested in seeing a presentation.

That's in 3 weeks; I'm not convinced that a piece of substantive,
useful research about global reach could be done in that time period
even if I could drop everything else I'm working on (which I can't). This
problem is too big and too important to be scheduled around meetings;
things should work the other way around.

Scott Hale and I have been working on a paper looking at global reach
and how it tracks with internet access growth, in the context of
editing, particularly looking at the mobile web. That, we should be
done with by then; presenting it could be highly useful (Scott? ;p)


 Pine

 This is an Encyclopedia
 One gateway to the wide garden of knowledge, where lies
 The deep rock of our past, in which we must delve
 The well of our future,
 The clear water we must leave untainted for those who come after us,
 The fertile earth, in which truth may grow in bright places, tended by many
 hands,
 And the broad fall of sunshine, warming our first steps toward knowing how
 much we do not know.
 —Catherine Munro





Re: [Wiki-research-l] [Release]

2015-03-04 Thread Oliver Keyes
'Lots, but that's not currently anyone's job'

On Wednesday, 4 March 2015, Dario Taraborelli dtarabore...@wikimedia.org
wrote:

 yay, shiny! The map is a pretty compelling way to show how dominant
 traffic from the US is, even for very minor languages (say
 bi.wikipedia.org). I wonder how many requests from US-based bots/automata
 we’re still failing to detect.


Re: [Wiki-research-l] [Release]

2015-03-04 Thread Pine W
I'm not sure how much influence I have, but I would be happy to make
whispers in appropriate places to try to get more support, if that's
helpful.

Perhaps you could show your work at the next Research and Data showcase? I
for one would be interested in seeing a presentation.

Pine


On Wed, Mar 4, 2015 at 1:25 AM, Oliver Keyes oke...@wikimedia.org wrote:

 That is the question, and I agree with your conclusion. I'm hoping to
 do more research into this; getting buy-in internally has been tough,
 but I'm confident of making progress on that front over the next few
 weeks and months.



Re: [Wiki-research-l] [Release]

2015-03-04 Thread Scott Hale
Oliver:

 Scott Hale and I have been working on a paper looking at global reach
 and how it tracks with internet access growth, in the context of
 editing, particularly looking at the mobile web. That, we should be
 done with by then; presenting it could be highly useful (Scott? ;p)


I see what you did there, Oliver :J
I believe the showcase is in two weeks (3rd Wednesday of the month), which
is a bit too tight to make sure everything is really checked and accurate.
I'm in Asia in April, but *we* could definitely present on the work in May.
As Oliver said, it correlates Wikipedia editor numbers with mobile and
broadband penetration data at the country level.

Dario:

 I wonder how many requests from US-based bots/automata we’re still failing
 to detect.


This reminds me that I would like to engage with the technical development
team on the idea of storing the application (i.e., the OAuth consumer ID) for
each edit made through the API. Not all bots use the API, I guess, but I
would venture that many (maybe most) do, and tracking them would then become
trivial. Tracking the applications used to make edits via the API would
also allow tracking of alternative editing interfaces (e.g., VisualEditor
uses the API, and perhaps AutoWikiBrowser and others do as well). I've never
proposed a technical enhancement for MediaWiki, so I would very much
welcome guidance.
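A minimal sketch of why per-edit application tracking would make this kind of tally trivial, assuming each edit record carried the OAuth consumer ID proposed above. The `Edit` record, its `consumer_id` field, and the sample consumer numbers are hypothetical illustrations, not an existing MediaWiki schema or API.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class Edit(NamedTuple):
    rev_id: int
    via_api: bool
    # Hypothetical field: OAuth consumer that made the edit, None if not recorded.
    consumer_id: Optional[int]


def edits_per_application(edits: Iterable[Edit]) -> Counter:
    """Tally edits by application, with separate buckets for untagged edits."""
    counts: Counter = Counter()
    for edit in edits:
        if edit.consumer_id is not None:
            counts[f"consumer {edit.consumer_id}"] += 1
        elif edit.via_api:
            counts["API, no consumer ID"] += 1
        else:
            counts["not via API"] += 1
    return counts


# Made-up sample edits.
sample = [
    Edit(1, True, 42),
    Edit(2, True, 42),
    Edit(3, True, 7),
    Edit(4, True, None),
    Edit(5, False, None),
]
print(edits_per_application(sample))
```

API edits without a consumer ID, and bots that do not use the API at all, would remain separate buckets, which matches the caveat that not all bots go through the API.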

Best wishes,
Scott


Re: [Wiki-research-l] [Release]

2015-03-03 Thread Dario Taraborelli
yay, shiny! The map is a pretty compelling way to show how dominant traffic
from the US is, even for very minor languages (say bi.wikipedia.org). I wonder
how many requests from US-based bots/automata we’re still failing to detect.

 On Mar 3, 2015, at 9:29 PM, Oliver Keyes oke...@wikimedia.org wrote:
 
 Update: the original Shiny instance went down due to server load soon
 after release. It's now up again at http://datavis.wmflabs.org/where/
 on a dedicated Labs machine, where we hope to put...many more
 visualisations. It also now has mapping, largely thanks to Sarah
 Laplante (http://sarahlaplante.com/), and soon it will hopefully be
 /non-hideous/ mapping (the current mass of blue and grey is because my
 aesthetic tastes are...I don't actually have any aesthetic tastes)
 
 

Re: [Wiki-research-l] [Release]

2015-03-03 Thread Oliver Keyes
Update: the original Shiny instance went down due to server load soon
after release. It's now up again at http://datavis.wmflabs.org/where/
on a dedicated Labs machine, where we hope to put...many more
visualisations. It also now has mapping, largely thanks to Sarah
Laplante (http://sarahlaplante.com/), and soon it will hopefully be
/non-hideous/ mapping (the current mass of blue and grey is because my
aesthetic tastes are...I don't actually have any aesthetic tastes)

On 2 March 2015 at 22:36, Oliver Keyes oke...@wikimedia.org wrote:
 Indeed! Orienting it that way (pivoting on language rather than
 project) is something several people have asked for; I plan to spend a
 chunk of my spare time (that is, recreational time) trying to make it
 work. Should be fairly trivial.



Re: [Wiki-research-l] [Release]

2015-03-02 Thread Oliver Keyes
Indeed! Orienting it that way (pivoting on language rather than
project) is something several people have asked for; I plan to spend a
chunk of my spare time (that is, recreational time) trying to make it
work. Should be fairly trivial.
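A rough sketch of the language-pivoted view (for a given country, which Wikipedias its traffic goes to), assuming per-project total pageviews are available alongside the published per-country percentages. The totals and the Swedish Wikipedia's share from Sweden are made-up illustrative numbers; only the Danish Wikipedia's 61%/18% and the 2% Denmark share of the Swedish Wikipedia come from figures quoted in this thread.

```python
from collections import defaultdict

# Share (%) of each project's pageviews coming from each country, in the shape
# of the released dataset. Swedish Wikipedia's 90% from Sweden is made up.
shares = {
    "da.wikipedia.org": {"Denmark": 61, "Sweden": 18},
    "sv.wikipedia.org": {"Sweden": 90, "Denmark": 2},
}
# Hypothetical total pageviews per project (not part of the released data).
totals = {"da.wikipedia.org": 1_000_000, "sv.wikipedia.org": 5_000_000}


def pivot_by_country(shares, totals):
    """Return {country: {project: % of that country's views}} over the given projects."""
    views = defaultdict(dict)
    for project, by_country in shares.items():
        for country, pct in by_country.items():
            # Convert the project-relative percentage back into a view count.
            views[country][project] = totals[project] * pct / 100
    return {
        country: {p: 100 * v / sum(per.values()) for p, v in per.items()}
        for country, per in views.items()
    }


for country, per in pivot_by_country(shares, totals).items():
    print(country, {p: round(v, 1) for p, v in per.items()})
```

Because the inputs are whole-number percentages and only the listed projects are included, the result is approximate and relative to those projects only.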

On 2 March 2015 at 09:55, h hant...@gmail.com wrote:
 I am still a bit bothered by the value 1 in the current dataset. Because
 the percentages are whole numbers, a 1 could stand for 1.4% or for 0.6%,
 which is a notable difference in this context. Perhaps we need
 higher-precision percentages for each territory-language pair. It would
 also be great to do another aggregation: given a territory, which language
 versions of Wikipedia are accessed?



Re: [Wiki-research-l] [Release]

2015-03-02 Thread h
Hello Finn,
I do not have a specific answer to your question. However, it might be
worthwhile to add Finnish to the comparison: according to the CLDR 26
territory-language information

http://www.unicode.org/cldr/charts/26/supplemental/territory_language_information.html

there is a sizable population of Finnish speakers in Sweden:

Swedish {O} sv 95.0% 99.0%
Finnish {OR} fi 2.2%

So if a similar query is run for the Finnish Wikipedia and the results
also show an outsized proportion of visits from Sweden, then the anomaly
you observed is not that unique. We probably need many rounds of
comparison and some normalization of the data (Sweden does have a larger
population than Denmark). It might also be handy to have some statistics
on immigration or residence, since this is the EU. I would not be surprised
if, for example, visits to Wikipedia from Oxford included a sizable share
of German-language requests.
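One way to make this comparison concrete is a ratio of observed to expected share: how much of a project's traffic comes from a country, divided by what some baseline (for example, that country's share of the relevant speaker population) would predict. This is a sketch; the baseline values, the Finnish Wikipedia figure, and the threshold of 2 are made-up placeholders, and only the 18% for the Danish Wikipedia from Sweden comes from the released data.

```python
def overrepresented(observed, baseline, threshold=2.0):
    """Return {(project, country): ratio} where observed/baseline >= threshold.

    Both arguments map (project, country) to a percentage share.
    """
    flagged = {}
    for key, obs in observed.items():
        expected = baseline.get(key)
        if not expected:  # skip missing or zero baselines
            continue
        ratio = obs / expected
        if ratio >= threshold:
            flagged[key] = ratio
    return flagged


observed = {
    ("da.wikipedia.org", "Sweden"): 18.0,  # from the released data
    ("fi.wikipedia.org", "Sweden"): 6.0,   # made-up placeholder
}
baseline = {
    ("da.wikipedia.org", "Sweden"): 4.0,   # made-up placeholder
    ("fi.wikipedia.org", "Sweden"): 5.0,   # made-up placeholder
}
print(overrepresented(observed, baseline))
```

If the Finnish Wikipedia were flagged as well under a sensible baseline, that would support the idea that the Danish case is not unique.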

I am still a bit bothered by the value 1 in the current dataset. Because
the percentages are whole numbers, a 1 could stand for 1.4% or for 0.6%,
which is a notable difference in this context. Perhaps we need
higher-precision percentages for each territory-language pair. It would
also be great to do another aggregation: given a territory, which language
versions of Wikipedia are accessed?
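A small illustration of the precision concern, using made-up raw counts for a single project: two countries at 1.4% and 0.6% of its traffic are indistinguishable once percentages are rounded to whole numbers, and become distinct again with one decimal place.

```python
def percentages(counts, decimals):
    """Convert raw pageview counts into percentages rounded to `decimals` places."""
    total = sum(counts.values())
    return {k: round(100 * v / total, decimals) for k, v in counts.items()}


# Made-up counts for one project (10,000 views in total).
counts = {"Country A": 140, "Country B": 60, "Other": 9_800}

print(percentages(counts, 0))  # {'Country A': 1.0, 'Country B': 1.0, 'Other': 98.0}
print(percentages(counts, 1))  # {'Country A': 1.4, 'Country B': 0.6, 'Other': 98.0}
```

Publishing one or two decimal places would keep such differences visible.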

Best,
han-teng liao

2015-03-02 13:54 GMT+01:00 Finn Årup Nielsen f...@imm.dtu.dk:

 Hi Oliver,


 Interesting dataset! I am curious why the Danish Wikipedia is accessed so
 heavily from Sweden. Could it be an error, e.g., with Telia IP addresses?



Re: [Wiki-research-l] [Release]

2015-03-02 Thread Finn Årup Nielsen

Hi Oliver,


Interesting dataset! I am curious why the Danish Wikipedia is accessed so
heavily from Sweden. Could it be an error, e.g., with Telia IP addresses?


In Python:

    import pandas as pd

    df = pd.read_csv(
        'http://files.figshare.com/1923822/language_pageviews_per_country.tsv',
        sep='\t')
    # Danish Wikipedia rows, country and percentage columns, indexed by country
    # (.loc takes a boolean row mask plus a list of column labels).
    df.loc[df.project == 'da.wikipedia.org',
           ['country', 'pageviews_percentage']].set_index('country')

                    pageviews_percentage
    country
    Austria                            1
    China                              1
    Denmark                           61
    Estonia                            1
    France                             1
    Germany                            2
    Netherlands                        2
    Norway                             1
    Sweden                            18
    United Kingdom                     3
    United States                      3
    Other                              5


MaxMind publishes some figures on its own accuracy:

https://www.maxmind.com/en/geoip2-city-database-accuracy

For Denmark, 85% is listed as correctly resolved; for Sweden, only 68%. I
wonder whether this could really bias the result so much.
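A crude back-of-the-envelope check on how large a geolocation error would have to be to explain the Swedish figure. It assumes the only error is a fraction m of genuinely Danish views being placed in Sweden, and it needs a guess at the "true" Swedish share; the 2% used below is a made-up assumption, while 61% and 18% are the observed figures above.

```python
def required_misattribution(obs_home, obs_other, true_other):
    """Fraction m of home-country views that must be geolocated to the other
    country to raise its share from `true_other` to the observed `obs_other`.

    Model (shares in %, the only error is home -> other):
        obs_home  = (1 - m) * true_home
        obs_other = true_other + m * true_home
    which solves to m = (obs_other - true_other) / (obs_home + obs_other - true_other).
    """
    excess = obs_other - true_other
    if excess < 0:
        raise ValueError("true_other is larger than the observed share")
    return excess / (obs_home + excess)


m = required_misattribution(obs_home=61, obs_other=18, true_other=2)
print(f"{m:.1%}")  # 20.8%
```

Under these assumptions, roughly a fifth of Danish views would have to be misplaced in Sweden. The linked page concerns MaxMind's city database, so its percentages are not directly a country-level misattribution rate.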


If the numbers are correct, why would Swedes read the Danish Wikipedia so
much? Bots? The pattern does not hold the other way around: only 2% of the
traffic to the Swedish Wikipedia comes from Denmark.




best regards
Finn



On 02/25/2015 10:06 PM, Oliver Keyes wrote:

Hey all!

We've released a highly-aggregated dataset of readership data -
specifically, data about where, geographically, traffic to each of our
projects (and all of our projects) comes from. The data can be found
at http://dx.doi.org/10.6084/m9.figshare.1317408 - additionally, I've
put together an exploration tool for it at
https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/

Hope it's useful to people!




--
Finn Årup Nielsen
http://people.compute.dtu.dk/faan/



[Wiki-research-l] [Release]

2015-02-25 Thread Oliver Keyes
Hey all!

We've released a highly aggregated dataset of readership data:
specifically, data about where, geographically, the traffic to each of our
projects (and to all of our projects) comes from. The data can be found
at http://dx.doi.org/10.6084/m9.figshare.1317408, and I've also put
together an exploration tool for it at
https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/

Hope it's useful to people!
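For anyone who wants to read the released file without pandas, a minimal sketch using only the Python standard library. It assumes the tab-separated layout implied by Finn Årup Nielsen's pandas query in this thread (columns `project`, `country` and `pageviews_percentage`); the inline sample is an abbreviated stand-in for the real file.

```python
import csv
import io

# Abbreviated stand-in for language_pageviews_per_country.tsv; any extra
# columns in the real file are simply ignored by the lookup below.
SAMPLE_TSV = (
    "project\tcountry\tpageviews_percentage\n"
    "da.wikipedia.org\tDenmark\t61\n"
    "da.wikipedia.org\tSweden\t18\n"
    "sv.wikipedia.org\tDenmark\t2\n"
)


def shares_for_project(tsv_text, project):
    """Return {country: percentage} for one project from tab-separated text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {
        row["country"]: float(row["pageviews_percentage"])
        for row in reader
        if row["project"] == project
    }


print(shares_for_project(SAMPLE_TSV, "da.wikipedia.org"))
```

To use a downloaded copy of the file, read its contents and pass them in place of `SAMPLE_TSV`.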



Re: [Wiki-research-l] [Release]

2015-02-25 Thread Andrew Lih
Great job.

Who knew Esperanto was big in Japan and China at #2 and #3?



On Wed, Feb 25, 2015 at 4:06 PM, Oliver Keyes oke...@wikimedia.org wrote:

 Hey all!

 We've released a highly-aggregated dataset of readership data -
 specifically, data about where, geographically, traffic to each of our
 projects (and all of our projects) comes from. The data can be found
 at http://dx.doi.org/10.6084/m9.figshare.1317408 - additionally, I've
 put together an exploration tool for it at
 https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/

 Hope it's useful to people!



Re: [Wiki-research-l] [Release]

2015-02-25 Thread Oliver Keyes
The one major caveat, I think, is that proportional data makes small
projects very vulnerable to artificial traffic spikes. I'd go out on a limb
and say that some of the massive bumps in popularity we see in particular
combinations are likely due either to undetected automata or simply to a
project having so little traffic that a small number of people can sway
the results outlandishly.
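A quick illustration of this caveat with made-up numbers: the same burst of 5,000 extra views from one country (say, an undetected bot) barely moves that country's share of a large project but swamps it for a small one.

```python
def share_after_spike(total_views, country_views, spike):
    """Country's share (%) of a project's views after adding `spike` views from it."""
    return 100 * (country_views + spike) / (total_views + spike)


# Made-up projects where the country starts at 1% of traffic.
for total in (10_000, 10_000_000):
    before = share_after_spike(total, total // 100, 0)
    after = share_after_spike(total, total // 100, 5_000)
    print(f"{total:>10,} views: {before:.2f}% -> {after:.2f}%")
```

This prints a jump from 1.00% to 34.00% for the 10,000-view project and from 1.00% to 1.05% for the 10-million-view one.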

On 25 February 2015 at 16:32, Andrew Lih andrew@gmail.com wrote:
 Great job.

 Who knew Esperanto was big in Japan and China at #2 and #3?





Re: [Wiki-research-l] [Release]

2015-02-25 Thread Giovanni Luca Ciampaglia
This is really, really cool, great job guys!

G


Giovanni Luca Ciampaglia

✎ 919 E 10th ∙ Bloomington 47408 IN ∙ USA
☞ http://www.glciampaglia.com/
✆ +1 812 855-7261
✉ gciam...@indiana.edu

2015-02-25 16:06 GMT-05:00 Oliver Keyes oke...@wikimedia.org:

 Hey all!

 We've released a highly-aggregated dataset of readership data -
 specifically, data about where, geographically, traffic to each of our
 projects (and all of our projects) comes from. The data can be found
 at http://dx.doi.org/10.6084/m9.figshare.1317408 - additionally, I've
 put together an exploration tool for it at
 https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/

 Hope it's useful to people!
