Re: [Wiki-research-l] Fwd: Traffic to the portal from Zero providers

2015-05-07 Thread Oliver Keyes
Makes sense! I actually hadn't factored in that sort of action
(although it does happen), more: the order of the main page links on
the root www.wikipedia.org page.

On 7 May 2015 at 03:51, Scott Hale computermacgy...@gmail.com wrote:
 The accept-language header is the obvious place to start, but there is ample
 scope to combine multiple approaches together.

 In addition to accept-language and geolocation data, any logged in user will
 have view/edit history related to multiple editions. If the user is
 requesting a specific article, (e.g., https://www.wikipedia.org/wiki/普天間飛行場
 ) we also can take account of what editions actually have the article ---
 the vast majority of content on Wikipedia only exists in one language or a
 few languages. (I.e., the above link redirects me to create the article on
 en-wiki although it exists on ja-wiki and Japanese is my second preferred
 language by my accept-language header and is an edition I edit captured in
 my edit history)

 This isn't an either-or question of which to use, but rather a question of
 how all these indicators can be used together to create the best experience.
 I would venture that most users don't change their accept-language header
 (not even possible on some mobile browsers!) and hence probably list
 only one language. If so, geography and edit history can be signals for
 possible second languages beyond the one language in the accept-language
 header when hitting the homepage without a specific article.

 Cheers,
 Scott

 P.S. It looks like the Universal Language Selector already uses the
 accept-language header for its preference screen.

 On Thu, May 7, 2015 at 5:58 AM, Oliver Keyes oke...@wikimedia.org wrote:

 As I've now said...4 times, I don't think we'd be using geolocation.
 We'd be using the accept-language header. See
 https://en.wikipedia.org/wiki/List_of_HTTP_header_fields#Accept-Language

 On 7 May 2015 at 00:52, WereSpielChequers werespielchequ...@gmail.com
 wrote:
  When a reader comes to Wikipedia from the web we can detect their IP
  address and that usually geolocates them to a country. More often than not
  that then tells you the dominant language of that country.
 
  If we were to default to official or dominant languages then I predict
  endless arguments as to which language(s) should be the default in which
  countries. The large expat community in some parts of the Arab world might
  prefer English over Arabic. India would want to do things by state, and a
  whole new front would emerge in the Israeli Palestine debate.
 
  Regards
 
  Jonathan Cardy
 
 


 ___
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l




-- 
Oliver Keyes
Research Analyst
Wikimedia Foundation



Re: [Wiki-research-l] Fwd: Traffic to the portal from Zero providers

2015-05-07 Thread Scott Hale
The accept-language header is the obvious place to start, but there is
ample scope to combine multiple approaches together.
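
As a starting point, here is a minimal Python sketch of reading the header. The function name parse_accept_language is made up for illustration and this is not how any Wikimedia code does it; it only assumes the standard "tag;q=weight" syntax of the header.

```python
def parse_accept_language(header):
    """Return the language tags in an Accept-Language value, best first."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        tag = tag.strip().lower()
        quality = 1.0  # a tag without a q parameter has weight 1
        for param in params.split(";"):
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # assumption: drop tags with a malformed weight
        # Skip the wildcard and anything explicitly weighted to zero.
        if tag and tag != "*" and quality > 0:
            weighted.append((-quality, position, tag))
    # Sort by weight (descending), then by position in the header.
    return [tag for _, _, tag in sorted(weighted)]
```

A header that lists a single tag comes back as a one-element list, which is the case where other signals would have to fill in.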

In addition to accept-language and geolocation data, any logged in user
will have view/edit history related to multiple editions. If the user is
requesting a specific article, (e.g., https://www.wikipedia.org/wiki/普天間飛行場)
we also can take account of what editions actually have the article --- the
vast majority of content on Wikipedia only exists in one language or a few
languages. (I.e., the above link redirects me to create the article on
en-wiki although it exists on ja-wiki and Japanese is my second preferred
language by my accept-language header and is an edition I edit captured in
my edit history)
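
A rough Python sketch of that idea, under loud assumptions: pick_edition is a hypothetical helper, the set of editions that have the article is taken as given (no lookup is shown), and edition codes are assumed to match the base language subtag, which is not true for every edition.

```python
def pick_edition(preferred_languages, editions_with_article, default="en"):
    """Pick the first preferred language whose edition has the article."""
    for language in preferred_languages:
        base = language.split("-")[0]  # "en-GB" -> "en" (simplifying assumption)
        if base in editions_with_article:
            return base
    # No preferred edition has the article: fall back to the default edition.
    return default
```

With preferences of en then ja, and an article that exists only on ja-wiki, pick_edition(["en", "ja"], {"ja"}) picks "ja" instead of offering to create the page on en-wiki.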

This isn't an either-or question of which to use, but rather a question of
how all these indicators can be used together to create the best
experience. I would venture that most users don't change their
accept-language header (not even possible on some mobile browsers!) and
hence probably list only one language. If so, geography and edit
history can be signals for possible second languages beyond the one
language in the accept-language header when hitting the homepage without a
specific article.
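
To make the combination concrete, a small Python sketch follows. Everything in it is an assumption for illustration: the function name candidate_languages, the idea of passing edit counts per edition as a dict, and the ordering (header first, then edit history, then geography).

```python
from collections import Counter


def candidate_languages(header_languages, edit_counts=None,
                        region_languages=None, limit=3):
    """Merge header, edit-history and geography signals into one ordered list."""
    # Header languages come first; dict.fromkeys removes repeats but keeps order.
    candidates = list(dict.fromkeys(header_languages))
    # Editions the user edits, most-edited first (hypothetical input).
    edit_ranked = [lang for lang, _ in Counter(edit_counts or {}).most_common()]
    # Geography is only used to fill whatever slots are still free.
    for lang in edit_ranked + list(region_languages or []):
        if lang not in candidates:
            candidates.append(lang)
    return candidates[:limit]
```

For example, candidate_languages(["en"], {"ja": 40, "en": 200}, ["mi"]) gives ["en", "ja", "mi"]: the header supplies the first language and the other two signals supply the possible second languages.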

Cheers,
Scott

P.S. It looks like the Universal Language Selector already uses the
accept-language header for its preference screen.

On Thu, May 7, 2015 at 5:58 AM, Oliver Keyes oke...@wikimedia.org wrote:

 As I've now said...4 times, I don't think we'd be using geolocation.
 We'd be using the accept-language header. See
 https://en.wikipedia.org/wiki/List_of_HTTP_header_fields#Accept-Language

 On 7 May 2015 at 00:52, WereSpielChequers werespielchequ...@gmail.com
 wrote:
  When a reader comes to Wikipedia from the web we can detect their IP
 address and that usually geolocates them to a country. More often than not
 that then tells you the dominant language of that country.
 
  If we were to default to official or dominant languages then I predict
 endless arguments as to which language(s) should be the default in which
 countries. The large expat community in some parts of the Arab world might
 prefer English over Arabic. India would want to do things by state, and a
 whole new front would emerge in the Israeli Palestine debate.
 
  Regards
 
  Jonathan Cardy
 
 



Re: [Wiki-research-l] Fwd: Traffic to the portal from Zero providers

2015-05-07 Thread Stuart A. Yeates
Accept-language is systematically broken for minority languages within
dominant language communities. In New Zealand, a country with three
official languages and a textbook case of language revivalism, I've never
met anyone without a degree in computer science who sets accept-language,
and I've never seen a computer system which ships with all three official
languages selectable. Most computer systems ship with en or en-us as the
default.

If there were silver bullets in this area, the solution would be obvious
and we wouldn't even be thinking about having this conversation.

cheers
stuart

On Thursday, May 7, 2015, Oliver Keyes oke...@wikimedia.org wrote:

 As I've now said...4 times, I don't think we'd be using geolocation.
 We'd be using the accept-language header. See
 https://en.wikipedia.org/wiki/List_of_HTTP_header_fields#Accept-Language

 On 7 May 2015 at 00:52, WereSpielChequers werespielchequ...@gmail.com
 wrote:
  When a reader comes to Wikipedia from the web we can detect their IP
 address and that usually geolocates them to a country. More often than not
 that then tells you the dominant language of that country.
 
  If we were to default to official or dominant languages then I predict
 endless arguments as to which language(s) should be the default in which
 countries. The large expat community in some parts of the Arab world might
 prefer English over Arabic. India would want to do things by state, and a
 whole new front would emerge in the Israeli Palestine debate.
 
  Regards
 
  Jonathan Cardy
 
 
  On 7 May 2015, at 05:06, Sam Katz smk...@gmail.com wrote:
 
  hey guys, you can't guess geolocation, because occasionally you'd be
  wrong. this happens to me all the time. I want to read a site in
  spanish... and then it thinks I'm in Latin America, when I'm not.
 
  --Sam
 
  On Wed, May 6, 2015 at 10:07 PM, Oliver Keyes oke...@wikimedia.org
 wrote:
  Possibly. But that sounds potentially wooly and sometimes inaccurate.
 
  When a browser makes a web request, it sends a header called the
  Accept-Language header
  (https://en.wikipedia.org/wiki/List_of_HTTP_header_fields#Accept-Language)
  which indicates what languages the browser finds ideal - i.e., what
  languages the user and system are using.
 
  If we're going to make modifications here (I hope we will. But again;
  early days) I don't see a good argument for using geolocation, which
  is, as you've noted, flawed without substantial time and energy being
  applied to map those countries to probable languages. The data the
  browser already sends to the server contains the /certain/ languages.
  We can just use that.
 
  On 6 May 2015 at 22:50, Stuart A. Yeates syea...@gmail.com wrote:
  This seems like a great place to use analytics data, for each division
  in the geo-location classification, rank each of the languages by
  usage and present the top N as likely candidates (+ browser settings)
  when we need the user to pick a language.
 
  cheers
  stuart
  --
  ...let us be heard from red core to black sky
 
 
  On Thu, May 7, 2015 at 2:24 PM, Mark J. Nelson m...@anadrome.org
 wrote:
 
  Stuart A. Yeates syea...@gmail.com writes:
 
  Reading that excellent presentation, the thought that struck me was:
 
  If I wanted to subvert the assumption that Wikipedia == en.wiki,
  linking to http://www.wikipedia.org/ is what I'd do.
 
  A smarter http://www.wikipedia.org/ might guess geo-location and
  thus local languages.
 
  I'd also like to see something smarter done at the main page, but the
  "and thus" bit here is notoriously tricky.
 
  For example most geolocation-based things, like Wikidata by default,
  tend to produce funny results in Denmark. A Copenhagener is offered
  something like this choice, in order:
 
  * Danish, Greenlandic, Faroese, Swedish, German, ...
 
  The reasoning here is that Danish, Greenlandic, and Faroese are official
  languages of the Danish Realm, which includes both Denmark proper, and
  two autonomous territories, Greenland and the Faroe Islands. And then
  Sweden and Germany are the two neighboring countries.
 
  But for the average Copenhagener, the following order is far more
  likely:
 
  * Danish, English, Norwegian Bokmål, ...
 
  The reason here is that Norwegian Bokmål is very close to Danish in
  written form (more than Swedish is, and especially more than Faroese
  is) while English is a widely used semi-official language in business,
  government, and education (for example about half of university
  theses are now written in English, and several major companies use it
  as their official workplace language).
 
  I think it's possible to come up with something that better aligns
  with readers' actual preferences, but it's not easy!
 
  -Mark
 
  --
  Mark J. Nelson
  Anadrome Research
  http://www.kmjn.org
 
  

Re: [Wiki-research-l] Fwd: Traffic to the portal from Zero providers

2015-05-07 Thread Federico Leva (Nemo)
Thanks for looking into www.wikipedia.org traffic from India; I've been 
complaining about it for a while. :) See also:

* https://phabricator.wikimedia.org/T26767
* https://phabricator.wikimedia.org/T5665

Mark J. Nelson, 07/05/2015 04:24:

But for the average Copenhagener, the following order is far more
likely:

* Danish, English, Norwegian Bokmål, ...


This is something you can help fix. Please do!
https://www.mediawiki.org/wiki/ULS/FAQ#language-territory

Nemo



Re: [Wiki-research-l] Fwd: Traffic to the portal from Zero providers

2015-05-07 Thread Oliver Keyes
Thanks for the bugs, Nemo!

(search team: should we take those over?)

On 7 May 2015 at 03:08, Federico Leva (Nemo) nemow...@gmail.com wrote:
 Thanks for looking into www.wikipedia.org traffic from India; I've been
 complaining about it for a while. :) See also:
 * https://phabricator.wikimedia.org/T26767
 * https://phabricator.wikimedia.org/T5665

 Mark J. Nelson, 07/05/2015 04:24:

 But for the average Copenhagener, the following order is far more
 likely:

 * Danish, English, Norwegian Bokmål, ...


 This is something you can help fix. Please do!
 https://www.mediawiki.org/wiki/ULS/FAQ#language-territory

 Nemo





-- 
Oliver Keyes
Research Analyst
Wikimedia Foundation



Re: [Wiki-research-l] Fwd: Traffic to the portal from Zero providers

2015-05-07 Thread Oliver Keyes
Interesting! This I didn't know; I'll factor it in :).

On 7 May 2015 at 04:48, Stuart A. Yeates syea...@gmail.com wrote:
 Accept-language is systematically broken for minority languages within
 dominant language communities. In New Zealand, a country with three official
 languages and a textbook case of language revivalism, I've never met anyone
 without a degree in computer science who sets accept-language, and I've
 never seen a computer system which ships with all three official languages
 selectable. Most computer systems ship with en or en-us as the default.

 If there were silver bullets in this area, the solution would be obvious and
 we wouldn't even be thinking about having this conversation.

 cheers
 stuart

 On Thursday, May 7, 2015, Oliver Keyes oke...@wikimedia.org wrote:

 As I've now said...4 times, I don't think we'd be using geolocation.
 We'd be using the accept-language header. See
 https://en.wikipedia.org/wiki/List_of_HTTP_header_fields#Accept-Language

 On 7 May 2015 at 00:52, WereSpielChequers werespielchequ...@gmail.com
 wrote:
  When a reader comes to Wikipedia from the web we can detect their IP
  address and that usually geolocates them to a country. More often than not
  that then tells you the dominant language of that country.
 
  If we were to default to official or dominant languages then I predict
  endless arguments as to which language(s) should be the default in which
  countries. The large expat community in some parts of the Arab world might
  prefer English over Arabic. India would want to do things by state, and a
  whole new front would emerge in the Israeli Palestine debate.
 
  Regards
 
  Jonathan Cardy
 
 
  On 7 May 2015, at 05:06, Sam Katz smk...@gmail.com wrote:
 
  hey guys, you can't guess geolocation, because occasionally you'd be
  wrong. this happens to me all the time. I want to read a site in
  spanish... and then it thinks I'm in Latin America, when I'm not.
 
  --Sam
 
  On Wed, May 6, 2015 at 10:07 PM, Oliver Keyes oke...@wikimedia.org
  wrote:
  Possibly. But that sounds potentially wooly and sometimes inaccurate.
 
  When a browser makes a web request, it sends a header called the
  Accept-Language header
  (https://en.wikipedia.org/wiki/List_of_HTTP_header_fields#Accept-Language)
  which indicates what languages the browser finds ideal - i.e., what
  languages the user and system are using.
 
  If we're going to make modifications here (I hope we will. But again;
  early days) I don't see a good argument for using geolocation, which
  is, as you've noted, flawed without substantial time and energy being
  applied to map those countries to probable languages. The data the
  browser already sends to the server contains the /certain/ languages.
  We can just use that.
 
  On 6 May 2015 at 22:50, Stuart A. Yeates syea...@gmail.com wrote:
  This seems like a great place to use analytics data, for each
  division in the geo-location classification, rank each of the
  languages by usage and present the top N as likely candidates
  (+ browser settings) when we need the user to pick a language.
 
  cheers
  stuart
  --
  ...let us be heard from red core to black sky
 
 
  On Thu, May 7, 2015 at 2:24 PM, Mark J. Nelson m...@anadrome.org
  wrote:
 
  Stuart A. Yeates syea...@gmail.com writes:
 
  Reading that excellent presentation, the thought that struck me was:
 
  If I wanted to subvert the assumption that Wikipedia == en.wiki,
  linking to http://www.wikipedia.org/ is what I'd do.
 
  A smarter http://www.wikipedia.org/ might guess geo-location and
  thus local languages.
 
  I'd also like to see something smarter done at the main page, but
  the "and thus" bit here is notoriously tricky.
 
  For example most geolocation-based things, like Wikidata by default,
  tend to produce funny results in Denmark. A Copenhagener is offered
  something like this choice, in order:
 
  * Danish, Greenlandic, Faroese, Swedish, German, ...
 
  The reasoning here is that Danish, Greenlandic, and Faroese are official
  languages of the Danish Realm, which includes both Denmark proper, and
  two autonomous territories, Greenland and the Faroe Islands. And then
  Sweden and Germany are the two neighboring countries.
 
  But for the average Copenhagener, the following order is far more
  likely:
 
  * Danish, English, Norwegian Bokmål, ...
 
  The reason here is that Norwegian Bokmål is very close to Danish in
  written form (more than Swedish is, and especially more than Faroese
  is) while English is a widely used semi-official language in business,
  government, and education (for example about half of university
  theses are now written in English, and several major companies use it
  as their official workplace language).
 
  I think it's possible to come up with something that better aligns
  with readers' actual preferences, but it's not easy!
 
  -Mark
 
  --
  Mark J. Nelson
  Anadrome Research
  http://www.kmjn.org
 
  

Re: [Wiki-research-l] Fwd: Traffic to the portal from Zero providers

2015-05-07 Thread Federico Leva (Nemo)

Scott Hale, 07/05/2015 09:51:

The accept-language header is the obvious place to start, but there is
ample scope to combine multiple approaches together.


Which is what UniversalLanguageSelector / jquery.uls, used on all 
Wikimedia projects, exists for. :)




In addition to accept-language and geolocation data, any logged in user
will have view/edit history related to multiple editions.


This was proposed at 
https://www.mediawiki.org/wiki/Talk:Universal_Language_Selector/Design/Interlanguage_links#Fetch_default_language_list_from_user_contributions 
. If you can think of a design/algorithm, please file: 
https://phabricator.wikimedia.org/maniphest/task/create/?parent=66793


Nemo



Re: [Wiki-research-l] Fwd: Traffic to the portal from Zero providers

2015-05-06 Thread Oliver Keyes
Traffic through Wikipedia Zero; apologies for not being clear.

On 6 May 2015 at 19:56, Sam Katz smk...@gmail.com wrote:
 hey oliver,

 I don't mean to be a help vampire...

 but what is zero traffic? you think the traffic is being proxied?
 perhaps even reverse proxied?

 --Sam

 On Wed, May 6, 2015 at 1:40 PM, Oliver Keyes oke...@wikimedia.org wrote:
 Cross-posting to research and analytics, too!


 -- Forwarded message --
 From: Oliver Keyes oke...@wikimedia.org
 Date: 6 May 2015 at 13:11
 Subject: Traffic to the portal from Zero providers
 To: wikimedia-sea...@lists.wikimedia.org


 Hey all,

 (Throwing this to the public list, because transparency is Good)

 I recently did a presentation on a traffic analysis to the Wikipedia
 home page - www.wikipedia.org.[1]

 One of the biggest visualisations, in impact terms, showed that a lot
 of portal traffic - far more, proportionately, than traffic to
 Wikipedia overall - is coming from India and Brazil.[2] One of the
 hypotheses was that this could be Zero traffic.

 I've done a basic analysis of the traffic, looking specifically at the
 zero headers,[3] and this hypothesis turns out to be incorrect -
 almost no zero traffic is hitting the portal. The traffic we're seeing
 from Brazil and India is not zero-based.
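
A check of this kind could be sketched in Python as below. This is not the analysis from [3]: the record layout, the field names "country" and "zero_carrier", and the function name zero_share_by_country are all hypothetical, chosen only to show the shape of the calculation.

```python
from collections import Counter


def zero_share_by_country(requests):
    """Fraction of requests per country that carry a Zero carrier tag."""
    totals, zero = Counter(), Counter()
    for request in requests:
        country = request["country"]  # hypothetical field name
        totals[country] += 1
        # Hypothetical field: set only when the request was tagged as Zero.
        if request.get("zero_carrier"):
            zero[country] += 1
    return {country: zero[country] / totals[country] for country in totals}
```

A share close to zero for a country would point away from Zero as the explanation for its portal traffic.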

 This makes a lot of sense (the reason mobile traffic redirects to the
 enwiki home page from the portal is the Zero extension, so presumably
 this happens specifically to Zero traffic) but it does mean that our
 null hypothesis - that this traffic is down to ISP-level or
 device-level design choices and links - is more likely to be correct.

 [1] http://ironholds.org/misc/homepage_presentation.html
 [2] http://ironholds.org/misc/homepage_presentation.html#/11
 [3] https://phabricator.wikimedia.org/T98076

 --
 Oliver Keyes
 Research Analyst
 Wikimedia Foundation


 --
 Oliver Keyes
 Research Analyst
 Wikimedia Foundation




-- 
Oliver Keyes
Research Analyst
Wikimedia Foundation



Re: [Wiki-research-l] Fwd: Traffic to the portal from Zero providers

2015-05-06 Thread Sam Katz
hey oliver,

I don't mean to be a help vampire...

but what is zero traffic? you think the traffic is being proxied?
perhaps even reverse proxied?

--Sam

On Wed, May 6, 2015 at 1:40 PM, Oliver Keyes oke...@wikimedia.org wrote:
 Cross-posting to research and analytics, too!


 -- Forwarded message --
 From: Oliver Keyes oke...@wikimedia.org
 Date: 6 May 2015 at 13:11
 Subject: Traffic to the portal from Zero providers
 To: wikimedia-sea...@lists.wikimedia.org


 Hey all,

 (Throwing this to the public list, because transparency is Good)

 I recently did a presentation on a traffic analysis to the Wikipedia
 home page - www.wikipedia.org.[1]

 One of the biggest visualisations, in impact terms, showed that a lot
 of portal traffic - far more, proportionately, than traffic to
 Wikipedia overall - is coming from India and Brazil.[2] One of the
 hypotheses was that this could be Zero traffic.

 I've done a basic analysis of the traffic, looking specifically at the
 zero headers,[3] and this hypothesis turns out to be incorrect -
 almost no zero traffic is hitting the portal. The traffic we're seeing
 from Brazil and India is not zero-based.

 This makes a lot of sense (the reason mobile traffic redirects to the
 enwiki home page from the portal is the Zero extension, so presumably
 this happens specifically to Zero traffic) but it does mean that our
 null hypothesis - that this traffic is down to ISP-level or
 device-level design choices and links - is more likely to be correct.

 [1] http://ironholds.org/misc/homepage_presentation.html
 [2] http://ironholds.org/misc/homepage_presentation.html#/11
 [3] https://phabricator.wikimedia.org/T98076

 --
 Oliver Keyes
 Research Analyst
 Wikimedia Foundation


 --
 Oliver Keyes
 Research Analyst
 Wikimedia Foundation



Re: [Wiki-research-l] Fwd: Traffic to the portal from Zero providers

2015-05-06 Thread Oliver Keyes
Agreed! That's one of the changes I'd really like to push ahead with,
although we're going to do some more in-depth data collection before
any redesign :).

On 6 May 2015 at 20:27, Stuart A. Yeates syea...@gmail.com wrote:
 Reading that excellent presentation, the thought that struck me was:

 If I wanted to subvert the assumption that Wikipedia == en.wiki,
 linking to http://www.wikipedia.org/ is what I'd do.

 A smarter http://www.wikipedia.org/ might guess geo-location and thus
 local languages.

 cheers
 stuart

 --
 ...let us be heard from red core to black sky


 On Thu, May 7, 2015 at 6:40 AM, Oliver Keyes oke...@wikimedia.org wrote:
 Cross-posting to research and analytics, too!


 -- Forwarded message --
 From: Oliver Keyes oke...@wikimedia.org
 Date: 6 May 2015 at 13:11
 Subject: Traffic to the portal from Zero providers
 To: wikimedia-sea...@lists.wikimedia.org


 Hey all,

 (Throwing this to the public list, because transparency is Good)

 I recently did a presentation on a traffic analysis to the Wikipedia
 home page - www.wikipedia.org.[1]

 One of the biggest visualisations, in impact terms, showed that a lot
 of portal traffic - far more, proportionately, than traffic to
 Wikipedia overall - is coming from India and Brazil.[2] One of the
 hypotheses was that this could be Zero traffic.

 I've done a basic analysis of the traffic, looking specifically at the
 zero headers,[3] and this hypothesis turns out to be incorrect -
 almost no zero traffic is hitting the portal. The traffic we're seeing
 from Brazil and India is not zero-based.

 This makes a lot of sense (the reason mobile traffic redirects to the
 enwiki home page from the portal is the Zero extension, so presumably
 this happens specifically to Zero traffic) but it does mean that our
 null hypothesis - that this traffic is down to ISP-level or
 device-level design choices and links - is more likely to be correct.

 [1] http://ironholds.org/misc/homepage_presentation.html
 [2] http://ironholds.org/misc/homepage_presentation.html#/11
 [3] https://phabricator.wikimedia.org/T98076

 --
 Oliver Keyes
 Research Analyst
 Wikimedia Foundation


 --
 Oliver Keyes
 Research Analyst
 Wikimedia Foundation




-- 
Oliver Keyes
Research Analyst
Wikimedia Foundation



Re: [Wiki-research-l] Fwd: Traffic to the portal from Zero providers

2015-05-06 Thread Kerry Raymond
http://wikimediafoundation.org/wiki/Wikipedia_Zero

Not something that you probably know about if you live in the grey bits of
the map.

Kerry


-Original Message-
From: wiki-research-l-boun...@lists.wikimedia.org
[mailto:wiki-research-l-boun...@lists.wikimedia.org] On Behalf Of Oliver
Keyes
Sent: Thursday, 7 May 2015 10:06 AM
To: Research into Wikimedia content and communities
Subject: Re: [Wiki-research-l] Fwd: Traffic to the portal from Zero
providers

Traffic through Wikipedia Zero; apologies for not being clear.

On 6 May 2015 at 19:56, Sam Katz smk...@gmail.com wrote:
 hey oliver,

 I don't mean to be a help vampire...

 but what is zero traffic? you think the traffic is being proxied?
 perhaps even reverse proxied?

 --Sam

 On Wed, May 6, 2015 at 1:40 PM, Oliver Keyes oke...@wikimedia.org wrote:
 Cross-posting to research and analytics, too!


 -- Forwarded message --
 From: Oliver Keyes oke...@wikimedia.org
 Date: 6 May 2015 at 13:11
 Subject: Traffic to the portal from Zero providers
 To: wikimedia-sea...@lists.wikimedia.org


 Hey all,

 (Throwing this to the public list, because transparency is Good)

 I recently did a presentation on a traffic analysis to the Wikipedia
 home page - www.wikipedia.org.[1]

 One of the biggest visualisations, in impact terms, showed that a lot
 of portal traffic - far more, proportionately, than traffic to
 Wikipedia overall - is coming from India and Brazil.[2] One of the
 hypotheses was that this could be Zero traffic.

 I've done a basic analysis of the traffic, looking specifically at the
 zero headers,[3] and this hypothesis turns out to be incorrect -
 almost no zero traffic is hitting the portal. The traffic we're seeing
 from Brazil and India is not zero-based.

 This makes a lot of sense (the reason mobile traffic redirects to the
 enwiki home page from the portal is the Zero extension, so presumably
 this happens specifically to Zero traffic) but it does mean that our
 null hypothesis - that this traffic is down to ISP-level or
 device-level design choices and links - is more likely to be correct.

 [1] http://ironholds.org/misc/homepage_presentation.html
 [2] http://ironholds.org/misc/homepage_presentation.html#/11
 [3] https://phabricator.wikimedia.org/T98076

 --
 Oliver Keyes
 Research Analyst
 Wikimedia Foundation


 --
 Oliver Keyes
 Research Analyst
 Wikimedia Foundation




-- 
Oliver Keyes
Research Analyst
Wikimedia Foundation



Re: [Wiki-research-l] Fwd: Traffic to the portal from Zero providers

2015-05-06 Thread Stuart A. Yeates
Probably also an excellent time to consider whether we can do anything
for those languages which don't have wikis yet.

For example, I'm in .nz, which has en, mi and nzs as official
languages, but we're a long way from an nzs.wiki, given that ase.wiki
is still in incubator. With the release of Unicode 8 with Sutton
SignWriting in June, these may or may not kick off in a big way.

cheers
stuart
--
...let us be heard from red core to black sky


On Thu, May 7, 2015 at 12:34 PM, Oliver Keyes oke...@wikimedia.org wrote:
 Agreed! That's one of the changes I'd really like to push ahead with,
 although we're going to do some more in-depth data collection before
 any redesign :).

 On 6 May 2015 at 20:27, Stuart A. Yeates syea...@gmail.com wrote:
 Reading that excellent presentation, the thought that struck me was:

 If I wanted to subvert the assumption that Wikipedia == en.wiki,
 linking to http://www.wikipedia.org/ is what I'd do.

 A smarter http://www.wikipedia.org/ might guess geo-location and thus
 local languages.

 cheers
 stuart

 --
 ...let us be heard from red core to black sky


 On Thu, May 7, 2015 at 6:40 AM, Oliver Keyes oke...@wikimedia.org wrote:
 Cross-posting to research and analytics, too!


 -- Forwarded message --
 From: Oliver Keyes oke...@wikimedia.org
 Date: 6 May 2015 at 13:11
 Subject: Traffic to the portal from Zero providers
 To: wikimedia-sea...@lists.wikimedia.org


 Hey all,

 (Throwing this to the public list, because transparency is Good)

 I recently did a presentation on a traffic analysis to the Wikipedia
 home page - www.wikipedia.org.[1]

 One of the biggest visualisations, in impact terms, showed that a lot
 of portal traffic - far more, proportionately, than traffic to
 Wikipedia overall - is coming from India and Brazil.[2] One of the
 hypotheses was that this could be Zero traffic.

 I've done a basic analysis of the traffic, looking specifically at the
 zero headers,[3] and this hypothesis turns out to be incorrect -
 almost no zero traffic is hitting the portal. The traffic we're seeing
 from Brazil and India is not zero-based.

 This makes a lot of sense (the reason mobile traffic redirects to the
 enwiki home page from the portal is the Zero extension, so presumably
 this happens specifically to Zero traffic) but it does mean that our
 null hypothesis - that this traffic is down to ISP-level or
 device-level design choices and links - is more likely to be correct.

 [1] http://ironholds.org/misc/homepage_presentation.html
 [2] http://ironholds.org/misc/homepage_presentation.html#/11
 [3] https://phabricator.wikimedia.org/T98076

 --
 Oliver Keyes
 Research Analyst
 Wikimedia Foundation


 --
 Oliver Keyes
 Research Analyst
 Wikimedia Foundation




 --
 Oliver Keyes
 Research Analyst
 Wikimedia Foundation

 ___
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Fwd: Traffic to the portal from Zero providers

2015-05-06 Thread Mark J . Nelson

Stuart A. Yeates syea...@gmail.com writes:

 Reading that excellent presentation, the thought that struck me was:

 If I wanted to subvert the assumption that Wikipedia == en.wiki,
 linking to http://www.wikipedia.org/ is what I'd do.

 A smarter http://www.wikipedia.org/ might guess geo-location and thus
 local languages.

I'd also like to see something smarter done at the main page, but the
"and thus" bit here is notoriously tricky.

For example, most geolocation-based things, like Wikidata by default,
tend to produce funny results in Denmark. A Copenhagener is offered
something like this choice, in order:

* Danish, Greenlandic, Faroese, Swedish, German, ...

The reasoning here is that Danish, Greenlandic, and Faroese are official
languages of the Danish Realm, which includes both Denmark proper, and
two autonomous territories, Greenland and the Faroe Islands. And then
Sweden and Germany are the two neighboring countries.

But for the average Copenhagener, the following order is far more
likely:

* Danish, English, Norwegian Bokmål, ...

The reason here is that Norwegian Bokmål is very close to Danish in
written form (more than Swedish is, and especially more than Faroese is)
while English is a widely used semi-official language in business,
government, and education (for example about half of university theses
are now written in English, and several major companies use it as their
official workplace language).

I think it's possible to come up with something that better aligns with
readers' actual preferences, but it's not easy!

-Mark

-- 
Mark J. Nelson
Anadrome Research
http://www.kmjn.org



Re: [Wiki-research-l] Fwd: Traffic to the portal from Zero providers

2015-05-06 Thread Stuart A. Yeates
Reading that excellent presentation, the thought that struck me was:

If I wanted to subvert the assumption that Wikipedia == en.wiki,
linking to http://www.wikipedia.org/ is what I'd do.

A smarter http://www.wikipedia.org/ might guess geo-location and thus
local languages.

cheers
stuart

--
...let us be heard from red core to black sky




Re: [Wiki-research-l] Fwd: Traffic to the portal from Zero providers

2015-05-06 Thread Oliver Keyes
Totally! As said, I think accept-language is a better variable to
operate from. But these are early days; we're just beginning to
understand the space. Realistically, software changes will come a lot
later :)




-- 
Oliver Keyes
Research Analyst
Wikimedia Foundation



Re: [Wiki-research-l] Fwd: Traffic to the portal from Zero providers

2015-05-06 Thread Sam Katz
Hey guys, you can't guess geolocation, because occasionally you'd be
wrong. This happens to me all the time. I want to read a site in
Spanish... and then it thinks I'm in Latin America, when I'm not.

--Sam



Re: [Wiki-research-l] Fwd: Traffic to the portal from Zero providers

2015-05-06 Thread Stuart A. Yeates
This seems like a great place to use analytics data: for each division
in the geo-location classification, rank the languages by
usage and present the top N as likely candidates (+ browser settings)
when we need the user to pick a language.
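
A minimal sketch of that ranking idea, under stated assumptions: the
(region, language, views) tuple shape, the function names, and any
numbers fed in are hypothetical and do not describe real Wikimedia
analytics tables or traffic.

```python
from collections import Counter, defaultdict


def top_languages_by_region(pageviews, n=3):
    """Rank languages by total views within each region.

    `pageviews` is an iterable of (region, language, views) tuples;
    this shape is an assumption made for the sketch.
    """
    totals = defaultdict(Counter)
    for region, language, views in pageviews:
        totals[region][language] += views
    # Keep only the n most-viewed languages per region.
    return {
        region: [language for language, _ in counts.most_common(n)]
        for region, counts in totals.items()
    }


def candidate_languages(region, browser_languages, ranking):
    """Browser-declared languages first, then the regional top N."""
    candidates = list(browser_languages)
    for language in ranking.get(region, []):
        if language not in candidates:
            candidates.append(language)
    return candidates
```

Putting the browser settings first means the regional ranking only adds
candidates and never overrides what the reader has already declared.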

cheers
stuart
--
...let us be heard from red core to black sky




Re: [Wiki-research-l] Fwd: Traffic to the portal from Zero providers

2015-05-06 Thread Oliver Keyes
Possibly. But that sounds potentially wooly and sometimes inaccurate.

When a browser makes a web request, it sends a header called
Accept-Language
(https://en.wikipedia.org/wiki/List_of_HTTP_header_fields#Accept-Language)
which indicates what languages the browser finds ideal - i.e., what
languages the user and system are using.

If we're going to make modifications here (I hope we will. But again;
early days) I don't see a good argument for using geolocation, which
is, as you've noted, flawed without substantial time and energy being
applied to map those countries to probable languages. The data the
browser already sends to the server contains the /certain/ languages.
We can just use that.
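
As a rough illustration of reading that header, here is a small parser
sketch. The function name is hypothetical, lower-casing the tags is a
simplification chosen for the example, and this is not code from the
portal or from MediaWiki.

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language value, best first.

    Tags are ordered by descending q-value; ties keep header order.
    Entries with q=0 (explicitly not acceptable) are dropped.
    """
    entries = []
    for position, part in enumerate(header.split(",")):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # a missing q parameter means q=1
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed weight as unusable
        if quality > 0:
            entries.append((-quality, position, tag.strip().lower()))
    return [tag for _, _, tag in sorted(entries)]
```

For a header such as "da, en-GB;q=0.8, en;q=0.7" this yields Danish
first, then the two English tags in that order.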




-- 
Oliver Keyes
Research Analyst
Wikimedia Foundation



Re: [Wiki-research-l] Fwd: Traffic to the portal from Zero providers

2015-05-06 Thread Oliver Keyes
One thing we could also do is check the Accept-Language header and
prioritise around that; that way we'd be prioritising specifically
the languages the user's browser thinks they want.
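
A sketch of what "prioritise around that" could look like for the
portal's link order. The function name and the example codes are
hypothetical, and matching a tag to a wiki by its primary subtag is an
assumption that does not hold for every edition.

```python
def prioritise_portal_links(default_order, preferred_tags):
    """Move wikis matching the reader's preferred languages to the front.

    `default_order` lists wiki language codes in the portal's usual
    order; `preferred_tags` is the reader's language list, best first
    (for example, as parsed from the Accept-Language header).
    """
    available = set(default_order)
    promoted = []
    for tag in preferred_tags:
        # Assumption: "en-GB" maps to the "en" wiki via its primary subtag.
        primary = tag.lower().split("-")[0]
        if primary in available and primary not in promoted:
            promoted.append(primary)
    # Everything not promoted keeps its original relative order.
    return promoted + [code for code in default_order if code not in promoted]
```

Languages the reader asks for that have no matching wiki in the list
are simply skipped, so the default order is the fallback.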

On 6 May 2015 at 21:28, Stuart A. Yeates syea...@gmail.com wrote:
 Probably also an excellent time to consider whether we can do anything
 for those languages which don't have wikis yet.

 For example, I'm in .nz, which has en, mi and nzs as official
 languages, but we're a long way from an nzs.wiki, given that ase.wiki
 is still in incubator. With the release of Unicode 8 with Sutton
 SignWriting in June, these may or may not kick off in a big way.

 cheers
 stuart
 --
 ...let us be heard from red core to black sky


 On Thu, May 7, 2015 at 12:34 PM, Oliver Keyes oke...@wikimedia.org wrote:
 Agreed! That's one of the changes I'd really like to push ahead with,
 although we're going to do some more in-depth data collection before
 any redesign :).




-- 
Oliver Keyes
Research Analyst
Wikimedia Foundation



Re: [Wiki-research-l] Fwd: Traffic to the portal from Zero providers

2015-05-06 Thread WereSpielChequers
When a reader comes to Wikipedia from the web we can detect their IP address 
and that usually geolocates them to a country. More often than not that then 
tells you the dominant language of that country.

If we were to default to official or dominant languages then I predict endless 
arguments as to which language(s) should be the default in which countries. The 
large expat community in some parts of the Arab world might prefer English over 
Arabic. India would want to do things by state, and a whole new front would 
emerge in the Israeli-Palestinian debate.

Regards

Jonathan Cardy




Re: [Wiki-research-l] Fwd: Traffic to the portal from Zero providers

2015-05-06 Thread Oliver Keyes
As I've now said...4 times, I don't think we'd be using geolocation.
We'd be using the accept-language header. See
https://en.wikipedia.org/wiki/List_of_HTTP_header_fields#Accept-Language


Re: [Wiki-research-l] Fwd: Traffic to the portal from Zero providers

2015-05-06 Thread Yuvi Panda
Yes, but what about people not on earth? https://xkcd.com/713/ and
similar have to be taken into consideration as well for such an
important part of the Wikipedia experience, I believe. It's 'Free
knowledge for all', not 'Free knowledge for all that we can accurately
geolocate'.

I wonder if we can set a permanent cookie after asking people with a
large modal dialog box about their language preferences on first load.
Thoughts?
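
A sketch of the cookie half of that idea, using Python's standard
http.cookies module. The cookie name "portal_lang", the one-year
lifetime, and both function names are hypothetical choices for
illustration only.

```python
from http.cookies import SimpleCookie

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60


def language_cookie_header(language, max_age=ONE_YEAR_SECONDS):
    """Build a Set-Cookie header value remembering the chosen language."""
    cookie = SimpleCookie()
    cookie["portal_lang"] = language  # hypothetical cookie name
    cookie["portal_lang"]["max-age"] = max_age
    cookie["portal_lang"]["path"] = "/"
    return cookie["portal_lang"].OutputString()


def remembered_language(cookie_header):
    """Read the stored language back from a request's Cookie header."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("portal_lang")
    return morsel.value if morsel is not None else None
```

When the cookie is absent, remembered_language returns None, which is
the point at which the dialog (or a header-based guess) would apply.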

On Wed, May 6, 2015 at 9:52 PM, WereSpielChequers
werespielchequ...@gmail.com wrote:
 When a reader comes to Wikipedia from the web we can detect their IP address 
 and that usually geolocates them to a country. More often than not that then 
 tells you the dominant language of that country.

 If we were to default to official or dominant languages then I predict 
 endless arguments as to which language(s) should be the default in which 
 countries. The large expat community in some parts of the Arab world might 
 prefer English over Arabic. India would want to do things by state, and a 
 whole new front would emerge in the Israeli-Palestinian debate.

 Regards

 Jonathan Cardy


 On 7 May 2015, at 05:06, Sam Katz smk...@gmail.com wrote:

 Hey guys, you can't guess geolocation, because occasionally you'd be
 wrong. This happens to me all the time. I want to read a site in
 Spanish... and then it thinks I'm in Latin America, when I'm not.

 --Sam

 On Wed, May 6, 2015 at 10:07 PM, Oliver Keyes oke...@wikimedia.org wrote:
 Possibly. But that sounds potentially wooly and sometimes inaccurate.

 When a browser makes a web request, it sends a header called
 Accept-Language
 (https://en.wikipedia.org/wiki/List_of_HTTP_header_fields#Accept-Language)
 which indicates what languages the browser finds ideal - i.e., what
 languages the user and system are using.

 If we're going to make modifications here (I hope we will. But again;
 early days) I don't see a good argument for using geolocation, which
 is, as you've noted, flawed without substantial time and energy being
 applied to map those countries to probable languages. The data the
 browser already sends to the server contains the /certain/ languages.
 We can just use that.
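
To make the header-based approach concrete, here is a rough sketch of reading an Accept-Language value and ordering the languages by the browser's stated preference. The function name `parse_accept_language` is made up for this example, and the parsing is deliberately simplified (it ignores wildcards and does no validation of language tags); it is not what any Wikimedia service actually runs.

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language value, most preferred first."""
    prefs = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        tag = tag.strip().lower()
        quality = 1.0  # a missing q-value means full preference (q=1)
        for param in params.split(";"):
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value.strip())
                except ValueError:
                    quality = 0.0  # treat a malformed weight as "not acceptable"
        # Skip the "*" wildcard and anything explicitly refused with q=0.
        if tag and tag != "*" and quality > 0:
            # Sort by descending quality, then by original position in the header.
            prefs.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(prefs)]
```

For a header such as `en-GB,en;q=0.8,ja;q=0.9` this yields `en-gb`, then `ja`, then `en`, so a second preferred language is available whenever the browser sends one.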

 On 6 May 2015 at 22:50, Stuart A. Yeates syea...@gmail.com wrote:
 This seems like a great place to use analytics data: for each division
 in the geo-location classification, rank each of the languages by
 usage and present the top N as likely candidates (+ browser settings)
 when we need the user to pick a language.
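
A small sketch of that ranking idea, assuming per-region usage counts are available as a plain dictionary. The function name, the data shape, and the choice to list browser languages ahead of the regional ranking are all assumptions of this example, and any counts passed in would have to come from real analytics data.

```python
def candidate_languages(region, usage_by_region, browser_langs, top_n=3):
    """Combine browser languages with the top-N languages used in a region.

    usage_by_region maps a region code to {language: usage count}.
    Browser languages come first (a design choice of this sketch),
    followed by the regional ranking, with duplicates removed.
    """
    counts = usage_by_region.get(region, {})
    # Rank by descending usage; break ties alphabetically for stable output.
    ranked = sorted(counts, key=lambda lang: (-counts[lang], lang))[:top_n]
    candidates = []
    for lang in list(browser_langs) + ranked:
        if lang not in candidates:
            candidates.append(lang)
    return candidates
```

With invented counts such as `{"DK": {"da": 700, "en": 250, "nb": 30, "sv": 20}}` and a browser that only lists `en`, the result is `en`, `da`, `nb`: the ranking comes from observed usage rather than from official languages or neighbouring countries.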

 cheers
 stuart
 --
 ...let us be heard from red core to black sky


 On Thu, May 7, 2015 at 2:24 PM, Mark J. Nelson m...@anadrome.org wrote:

 Stuart A. Yeates syea...@gmail.com writes:

 Reading that excellent presentation, the thought that struck me was:

 If I wanted to subvert the assumption that Wikipedia == en.wiki,
 linking to http://www.wikipedia.org/ is what I'd do.

 A smarter http://www.wikipedia.org/ might guess geo-location and thus
 local languages.

 I'd also like to see something smarter done at the main page, but the
 "and thus" bit here is notoriously tricky.

 For example most geolocation-based things, like Wikidata by default,
 tend to produce funny results in Denmark. A Copenhagener is offered
 something like this choice, in order:

 * Danish, Greenlandic, Faroese, Swedish, German, ...

 The reasoning here is that Danish, Greenlandic, and Faroese are official
 languages of the Danish Realm, which includes both Denmark proper, and
 two autonomous territories, Greenland and the Faroe Islands. And then
 Sweden and Germany are the two neighboring countries.

 But for the average Copenhagener, the following order is far more
 likely:

 * Danish, English, Norwegian Bokmål, ...

 The reason here is that Norwegian Bokmål is very close to Danish in
 written form (more than Swedish is, and especially more than Faroese is)
 while English is a widely used semi-official language in business,
 government, and education (for example about half of university theses
 are now written in English, and several major companies use it as their
 official workplace language).

 I think it's possible to come up with something that better aligns with
 readers' actual preferences, but it's not easy!

 -Mark

 --
 Mark J. Nelson
 Anadrome Research
 http://www.kmjn.org

 ___
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


 --
 Oliver Keyes
 Research Analyst
 Wikimedia Foundation

 ___
 Wiki-research-l mailing list