Re: [Wiki-research-l] [Release]

2015-03-04 Thread Scott Hale
Oliver:

> Scott Hale and I have been working on a paper looking at global reach
> and how it tracks with internet access growth, in the context of
> editing, particularly looking at the mobile web. That, we should be
> done with by then; presenting it could be highly useful (Scott? ;p)


I see what you did there, Oliver :J
I believe the showcase is in two weeks (3rd Wednesday of the month), which
is a bit too tight to make sure everything is really checked and accurate.
I'm in Asia in April, but *we* could definitely present in May on the work,
which as Oliver said is correlating Wikipedia editor numbers with mobile
and broadband penetration data on a country level.

Dario:

> I wonder how many requests from US-based bots/automata we’re still failing
> to detect.


This reminds me that I would like to engage with the technical development
team on the idea of storing the application (i.e., the OAuth consumer id) for
each edit made through the API. Not all bots use the API, I guess, but I
would venture that many (maybe most) do, and tracking them would then become
trivial. Tracking the applications used to make edits via the API would
also allow tracking of alternative editing interfaces (e.g., the visual editor
uses the API, and perhaps AutoWikiBrowser or others do as well). I've never
proposed any technical enhancement requests for MediaWiki and so would very
much welcome guidance.
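
To make the idea concrete, here is a minimal sketch of the kind of tally that per-edit application data would allow. The record layout, the field name `consumer_id`, and the sample values are all hypothetical; nothing here reflects an existing MediaWiki table or API.

```python
from collections import Counter

# Hypothetical edit records: `consumer_id` stands in for a stored OAuth
# consumer id and is None for edits that did not come through an application.
edits = [
    {"rev_id": 1, "user": "ExampleBot", "consumer_id": "bot-framework"},
    {"rev_id": 2, "user": "Alice", "consumer_id": "visual-editor"},
    {"rev_id": 3, "user": "ExampleBot", "consumer_id": "bot-framework"},
    {"rev_id": 4, "user": "Bob", "consumer_id": None},
]


def edits_per_application(records):
    """Count edits per application, grouping untagged edits under 'unknown'."""
    return Counter(r["consumer_id"] or "unknown" for r in records)


counts = edits_per_application(edits)
for app, n in counts.most_common():
    print(app, n)
```

With such a field in place, separating bot frameworks from alternative editing interfaces would reduce to a group-by on the stored identifier.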

Best wishes,
Scott
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] [Release]

2015-03-04 Thread Oliver Keyes
On 4 March 2015 at 04:28, Pine W wrote:
> I'm not sure how much influence I have, but I would be happy to make
> whispers in appropriate places to try to get more support, if that's
> helpful.
>

I think I'm probably good, but thank you.

> Perhaps you could show your work at the next Research and Data showcase? I
> for one would be interested in seeing a presentation.

That's in 3 weeks; I'm not convinced that a piece of substantive,
useful research about global reach could be done in that time period
even if I could drop everything I currently have (which I can't). This
problem is too big and too important to be scheduled around meetings;
things should work the other way around.

Scott Hale and I have been working on a paper looking at global reach
and how it tracks with internet access growth, in the context of
editing, particularly looking at the mobile web. That, we should be
done with by then; presenting it could be highly useful (Scott? ;p)

>
> Pine
>
> This is an Encyclopedia
> One gateway to the wide garden of knowledge, where lies
> The deep rock of our past, in which we must delve
> The well of our future,
> The clear water we must leave untainted for those who come after us,
> The fertile earth, in which truth may grow in bright places, tended by many
> hands,
> And the broad fall of sunshine, warming our first steps toward knowing how
> much we do not know.
> —Catherine Munro



-- 
Oliver Keyes
Research Analyst
Wikimedia Foundation



Re: [Wiki-research-l] [Release]

2015-03-04 Thread Pine W
I'm not sure how much influence I have, but I would be happy to make
whispers in appropriate places to try to get more support, if that's
helpful.

Perhaps you could show your work at the next Research and Data showcase? I
for one would be interested in seeing a presentation.

Pine

*This is an Encyclopedia*

*One gateway to the wide garden of knowledge, where lies
The deep rock of our past, in which we must delve
The well of our future,
The clear water we must leave untainted for those who come after us,
The fertile earth, in which truth may grow in bright places, tended by many
hands,
And the broad fall of sunshine, warming our first steps toward knowing how
much we do not know.*

*—Catherine Munro*

On Wed, Mar 4, 2015 at 1:25 AM, Oliver Keyes wrote:

> That is the question, and I agree with your conclusion. I'm hoping to
> do more research into this; getting buy-in internally has been tough,
> but I'm confident of making progress on that front over the next few
> weeks and months.


Re: [Wiki-research-l] [Release]

2015-03-04 Thread Oliver Keyes
That is the question, and I agree with your conclusion. I'm hoping to
do more research into this; getting buy-in internally has been tough,
but I'm confident of making progress on that front over the next few
weeks and months.

On 4 March 2015 at 04:13, Cristian Consonni wrote:
> 2015-03-04 8:44 GMT+01:00 Dario Taraborelli:
>> yay, shiny! The map is a pretty compelling way to show how dominant traffic 
>> from the US is, even for very minor languages (say bi.wikipedia.org), I 
>> wonder how many requests from US-based bots/automata we’re still failing to 
>> detect.
>
> Still, the question could be: are we fulfilling the mission?
> (hint: probably not)
>
> Cristian



-- 
Oliver Keyes
Research Analyst
Wikimedia Foundation



Re: [Wiki-research-l] [Release]

2015-03-04 Thread Cristian Consonni
2015-03-04 8:44 GMT+01:00 Dario Taraborelli:
> yay, shiny! The map is a pretty compelling way to show how dominant traffic 
> from the US is, even for very minor languages (say bi.wikipedia.org), I 
> wonder how many requests from US-based bots/automata we’re still failing to 
> detect.

Still, the question could be: are we fulfilling the mission?
(hint: probably not)

Cristian



Re: [Wiki-research-l] [Release]

2015-03-04 Thread Oliver Keyes
'Lots, but that's not currently anyone's job'

On Wednesday, 4 March 2015, Dario Taraborelli wrote:

> yay, shiny! The map is a pretty compelling way to show how dominant
> traffic from the US is, even for very minor languages (say
> bi.wikipedia.org), I wonder how many requests from US-based bots/automata
> we’re still failing to detect.

Re: [Wiki-research-l] [Release]

2015-03-03 Thread Dario Taraborelli
yay, shiny! The map is a pretty compelling way to show how dominant traffic 
from the US is, even for very minor languages (say bi.wikipedia.org), I wonder 
how many requests from US-based bots/automata we’re still failing to detect.

> On Mar 3, 2015, at 9:29 PM, Oliver Keyes wrote:
> 
> Update: the original Shiny instance went down due to server load soon
> after release. It's now up again at http://datavis.wmflabs.org/where/
> on a dedicated Labs machine, where we hope to put...many more
> visualisations. It also now has mapping, largely thanks to Sarah
> Laplante (http://sarahlaplante.com/), and soon it will hopefully be
> /non-hideous/ mapping (the current mass of blue and grey is because my
> aesthetic tastes are...I don't actually have any aesthetic tastes)

Re: [Wiki-research-l] [Release]

2015-03-03 Thread Oliver Keyes
Update: the original Shiny instance went down due to server load soon
after release. It's now up again at http://datavis.wmflabs.org/where/
on a dedicated Labs machine, where we hope to put...many more
visualisations. It also now has mapping, largely thanks to Sarah
Laplante (http://sarahlaplante.com/), and soon it will hopefully be
/non-hideous/ mapping (the current mass of blue and grey is because my
aesthetic tastes are...I don't actually have any aesthetic tastes)

On 2 March 2015 at 22:36, Oliver Keyes wrote:
> Indeed! Orienting it that way (pivoting on language rather than
> project) is something several people have asked for; I plan to spend a
> chunk of my spare time (that is, recreational time) trying to make it
> work. Should be fairly trivial.



-- 
Oliver Keyes
Research Analyst
Wikimedia Foundation


Re: [Wiki-research-l] [Release]

2015-03-02 Thread Oliver Keyes
Indeed! Orienting it that way (pivoting on language rather than
project) is something several people have asked for; I plan to spend a
chunk of my spare time (that is, recreational time) trying to make it
work. Should be fairly trivial.
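
As a rough illustration of that re-orientation, under one reading of the request (given a territory, list the language editions accessed from it), the sketch below indexes the rows by country instead of by project. The column names are the ones used with the published TSV elsewhere in this thread; the inline rows are a small hand-copied sample of figures quoted in the thread, and `by_country` is a made-up helper name.

```python
import csv
import io
from collections import defaultdict

# Small sample in the layout of language_pageviews_per_country.tsv.
SAMPLE_TSV = (
    "project\tcountry\tpageviews_percentage\n"
    "da.wikipedia.org\tDenmark\t61\n"
    "da.wikipedia.org\tSweden\t18\n"
    "sv.wikipedia.org\tDenmark\t2\n"
)


def by_country(tsv_text):
    """Map each country to {project: share of that project's pageviews}."""
    index = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        index[row["country"]][row["project"]] = int(row["pageviews_percentage"])
    return dict(index)


pivot = by_country(SAMPLE_TSV)
print(pivot["Denmark"])
```

One limitation of this view: each value is a share of its own project's traffic, so the numbers listed under one country are not directly comparable with each other without absolute pageview counts.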

On 2 March 2015 at 09:55, h wrote:
> Hello Finn,
>    I do not have a specific answer to your question. However, it might be
> worthwhile to add Finnish to the comparison, since according to the CLDR 26
> territory-language (T-L) information
> http://www.unicode.org/cldr/charts/26/supplemental/territory_language_information.html
> there is a sizable number of Finnish speakers in Sweden:
>
> Swedish {O} sv 95.0% 99.0%
> Finnish {OR} fi 2.2%
>
> So if a similar query is run for Finnish and the results also show an
> "undue" proportion of visits from Sweden, then the anomaly you observed is
> not that unique. We probably need many iterations of comparative outcomes
> and normalization of the data (Sweden does have a larger population). Also,
> it might be handy to have some statistics on immigration or residence; this
> is the EU, after all. I would not be surprised if, for example, visits from
> Oxford to Wikipedia included a sizable share of German-language requests.
>
> I am still a bit bothered by the number "1" in the current dataset. It
> does not feel right, since 1.4% and 0.6% differ notably in this regard.
> Perhaps we need a high-precision "universal percentage" number for each
> territory-language pair. It would also be great to do another set of
> aggregations: given a territory, which language versions of Wikipedia are
> accessed.
>
> Best,
> han-teng liao



-- 
Oliver Keyes
Research Analyst
Wikimedia Foundation



Re: [Wiki-research-l] [Release]

2015-03-02 Thread h
Hello Finn,
   I do not have a specific answer to your question. However, it might be
worthwhile to add Finnish to the comparison, since according to the CLDR 26
territory-language (T-L) information
http://www.unicode.org/cldr/charts/26/supplemental/territory_language_information.html
there is a sizable number of Finnish speakers in Sweden:

Swedish {O} sv 95.0% 99.0%
Finnish {OR} fi 2.2%

So if a similar query is run for Finnish and the results also show an
"undue" proportion of visits from Sweden, then the anomaly you observed is
not that unique. We probably need many iterations of comparative outcomes
and normalization of the data (Sweden does have a larger population). Also,
it might be handy to have some statistics on immigration or residence; this
is the EU, after all. I would not be surprised if, for example, visits from
Oxford to Wikipedia included a sizable share of German-language requests.

I am still a bit bothered by the number "1" in the current dataset. It
does not feel right, since 1.4% and 0.6% differ notably in this regard.
Perhaps we need a high-precision "universal percentage" number for each
territory-language pair. It would also be great to do another set of
aggregations: given a territory, which language versions of Wikipedia are
accessed.
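
The rounding concern can be shown in a few lines. This is a sketch that assumes, as the message implies, that shares are published as whole percentages rounded to the nearest integer; the two unrounded values are the 1.4% and 0.6% mentioned above, and the function name is illustrative.

```python
def relative_gap(a, b):
    """Return how many times larger the bigger share is than the smaller."""
    hi, lo = max(a, b), min(a, b)
    return hi / lo


true_shares = [1.4, 0.6]
published = [round(s) for s in true_shares]

# Both values collapse to the same published integer.
print(published)
# The ratio between the unrounded shares, which the published data hides.
print(round(relative_gap(*true_shares), 2))
```

Publishing one or two decimal places per territory-language pair would keep that ratio recoverable.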

Best,
han-teng liao

2015-03-02 13:54 GMT+01:00 Finn Årup Nielsen:

> Hi Oliver,
>
>
> Interesting dataset! I am curious about why the Danish Wikipedia is so
> highly accessed from Sweden. Could it be an error, e.g., with Telia
> IP-numbers?
>
> In Python:
>
> >>> import pandas as pd
> >>> # Load the per-country pageview shares published on figshare
> >>> url = 'http://files.figshare.com/1923822/language_pageviews_per_country.tsv'
> >>> df = pd.read_csv(url, sep='\t')
> >>> # Select the Danish Wikipedia rows, indexed by country
> >>> df.loc[df.project == 'da.wikipedia.org',
> ...        ['country', 'pageviews_percentage']].set_index('country')
>                 pageviews_percentage
> country
> Austria                            1
> China                              1
> Denmark                           61
> Estonia                            1
> France                             1
> Germany                            2
> Netherlands                        2
> Norway                             1
> Sweden                            18
> United Kingdom                     3
> United States                      3
> Other                              5
>
>
> MaxMind has some numbers on their own accuracy:
>
> https://www.maxmind.com/en/geoip2-city-database-accuracy
>
> For Denmark 85% is "Correctly Resolved", for Sweden only 68%. I wonder if
> this really could bias the result so much.
>
> If the numbers are correct why would the Swedish read the Danish Wikipedia
> so much? Bots? It does not apply the other way around: Only 2% of the
> traffic to Swedish Wikipedia comes from Denmark.
>
>
>
> best regards
> Finn
>
>
>
> On 02/25/2015 10:06 PM, Oliver Keyes wrote:
>
>> Hey all!
>>
>> We've released a highly-aggregated dataset of readership data -
>> specifically, data about where, geographically, traffic to each of our
>> projects (and all of our projects) comes from. The data can be found
>> at http://dx.doi.org/10.6084/m9.figshare.1317408 - additionally, I've
>> put together an exploration tool for it at
>> https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/
>>
>> Hope it's useful to people!
>>
>>
>
> --
> Finn Årup Nielsen
> http://people.compute.dtu.dk/faan/


Re: [Wiki-research-l] [Release]

2015-03-02 Thread Finn Årup Nielsen

Hi Oliver,


Interesting dataset! I am curious about why the Danish Wikipedia is so 
highly accessed from Sweden. Could it be an error, e.g., with Telia
IP-numbers?


In Python:

>>> import pandas as pd
>>> # Load the per-country pageview shares published on figshare
>>> url = 'http://files.figshare.com/1923822/language_pageviews_per_country.tsv'
>>> df = pd.read_csv(url, sep='\t')
>>> # Select the Danish Wikipedia rows, indexed by country
>>> df.loc[df.project == 'da.wikipedia.org',
...        ['country', 'pageviews_percentage']].set_index('country')
                pageviews_percentage
country
Austria                            1
China                              1
Denmark                           61
Estonia                            1
France                             1
Germany                            2
Netherlands                        2
Norway                             1
Sweden                            18
United Kingdom                     3
United States                      3
Other                              5


MaxMind has some numbers on their own accuracy:

https://www.maxmind.com/en/geoip2-city-database-accuracy

For Denmark 85% is "Correctly Resolved", for Sweden only 68%. I wonder 
if this really could bias the result so much.


If the numbers are correct why would the Swedish read the Danish 
Wikipedia so much? Bots? It does not apply the other way around: Only 2% 
of the traffic to Swedish Wikipedia comes from Denmark.
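
As a back-of-envelope check on the geolocation question, the sketch below asks how large a one-way error would have to be to produce the observed split. It rests on strong simplifying assumptions that are not established anywhere in this thread: errors move traffic only from Denmark to Sweden, and Sweden's genuine share is either zero or a chosen baseline. The function name is made up for illustration.

```python
def misattribution_needed(home_pct, neighbour_pct, true_neighbour_pct=0.0):
    """Fraction of home-country traffic that would have to be geolocated
    to the neighbour to produce the observed split.

    Assumes errors flow only from home to neighbour and that the
    neighbour's genuine share is `true_neighbour_pct`.
    """
    shifted = neighbour_pct - true_neighbour_pct
    true_home = home_pct + shifted
    return shifted / true_home


# Observed da.wikipedia.org shares: Denmark 61, Sweden 18.
print(round(misattribution_needed(61, 18), 3))
# Same, if Sweden genuinely accounted for 2% (mirroring Denmark on sv.wikipedia).
print(round(misattribution_needed(61, 18, 2), 3))
```

Under these assumptions, roughly a fifth of Danish requests would need to be placed in Sweden. Whether that is plausible depends on how MaxMind's published accuracy figures translate into country-level errors, which the numbers above do not settle.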




best regards
Finn



On 02/25/2015 10:06 PM, Oliver Keyes wrote:

> Hey all!
>
> We've released a highly-aggregated dataset of readership data -
> specifically, data about where, geographically, traffic to each of our
> projects (and all of our projects) comes from. The data can be found
> at http://dx.doi.org/10.6084/m9.figshare.1317408 - additionally, I've
> put together an exploration tool for it at
> https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/
>
> Hope it's useful to people!




--
Finn Årup Nielsen
http://people.compute.dtu.dk/faan/



Re: [Wiki-research-l] [Release]

2015-02-25 Thread Giovanni Luca Ciampaglia
This is really, really cool, great job guys!

G


Giovanni Luca Ciampaglia

✎ 919 E 10th ∙ Bloomington 47408 IN ∙ USA
☞ http://www.glciampaglia.com/
✆ +1 812 855-7261
✉ gciam...@indiana.edu

2015-02-25 16:06 GMT-05:00 Oliver Keyes:

> Hey all!
>
> We've released a highly-aggregated dataset of readership data -
> specifically, data about where, geographically, traffic to each of our
> projects (and all of our projects) comes from. The data can be found
> at http://dx.doi.org/10.6084/m9.figshare.1317408 - additionally, I've
> put together an exploration tool for it at
> https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/
>
> Hope it's useful to people!
>
> --
> Oliver Keyes
> Research Analyst
> Wikimedia Foundation


Re: [Wiki-research-l] [Release]

2015-02-25 Thread Oliver Keyes
The one major caveat, I think, is that proportionate data makes small
projects very vulnerable to artificial traffic spikes. I'd go out on a
limb and say that some of the massive bumps in popularity we see in
particular combinations are likely due to either undetected automata or
simply a project having so little traffic that a small number of people
can sway the results outlandishly.
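
A quick worked example of that vulnerability, using entirely made-up traffic figures: the same burst of automated requests is added to one country's traffic on a small project and on a large one, and the resulting percentage shares are compared.

```python
def country_share(country_views, total_views, extra_views=0):
    """Percentage of a project's traffic from one country after adding
    `extra_views` automated requests attributed to that country."""
    return 100 * (country_views + extra_views) / (total_views + extra_views)


BOT_REQUESTS = 500  # hypothetical burst of undetected automated traffic

# In both cases the country starts at 10% of the project's pageviews.
small = country_share(100, 1_000, BOT_REQUESTS)
large = country_share(100_000, 1_000_000, BOT_REQUESTS)

print(round(small, 1))   # share on the small project after the burst
print(round(large, 2))   # share on the large project after the burst
```

On the small project the share jumps from 10% to 40%, while on the large one it barely moves from 10%.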

On 25 February 2015 at 16:32, Andrew Lih wrote:
> Great job.
>
> Who knew Esperanto was big in Japan and China at #2 and #3?
>
>
>
> On Wed, Feb 25, 2015 at 4:06 PM, Oliver Keyes wrote:
>>
>> Hey all!
>>
>> We've released a highly-aggregated dataset of readership data -
>> specifically, data about where, geographically, traffic to each of our
>> projects (and all of our projects) comes from. The data can be found
>> at http://dx.doi.org/10.6084/m9.figshare.1317408 - additionally, I've
>> put together an exploration tool for it at
>> https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/
>>
>> Hope it's useful to people!
>>
>> --
>> Oliver Keyes
>> Research Analyst
>> Wikimedia Foundation



-- 
Oliver Keyes
Research Analyst
Wikimedia Foundation



Re: [Wiki-research-l] [Release]

2015-02-25 Thread Andrew Lih
Great job.

Who knew Esperanto was big in Japan and China at #2 and #3?



On Wed, Feb 25, 2015 at 4:06 PM, Oliver Keyes wrote:

> Hey all!
>
> We've released a highly-aggregated dataset of readership data -
> specifically, data about where, geographically, traffic to each of our
> projects (and all of our projects) comes from. The data can be found
> at http://dx.doi.org/10.6084/m9.figshare.1317408 - additionally, I've
> put together an exploration tool for it at
> https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/
>
> Hope it's useful to people!
>
> --
> Oliver Keyes
> Research Analyst
> Wikimedia Foundation