Re: [Wikimedia-l] Wikipedia Zero in Google search result

2013-08-28 Thread Adam Baso
(cross-posted on mobile-l)

Update:

I have been checking on the indexed link count over the last couple of
months, and it has been roughly constant. Upon another check in the past
week, it looked like it was time to go ahead with the robots.txt update.

Just yesterday, the robots.txt entry for lang.zero.wikipedia.org was also
updated to instruct all robots, such as Googlebot, not to index
lang.zero.wikipedia.org. It looks like even more lang.zero.wikipedia.org
pages may already be starting to fall out of the index.
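A robots.txt file applies only to the host that serves it, so rules published on lang.zero.wikipedia.org leave lang.wikipedia.org untouched. The thread does not quote the rules that were deployed. The sketch below assumes a blanket "Disallow: /" for all user agents and uses Python's standard urllib.robotparser to check what a crawler such as Googlebot may fetch. Strictly speaking, Disallow controls crawling rather than indexing.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules for lang.zero.wikipedia.org/robots.txt: the thread only says
# that all robots are told not to index the host, not what the file contains.
ZERO_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly and records that rules were
    # loaded, so can_fetch() works without any network access.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    article = "http://en.zero.wikipedia.org/wiki/Muse_%28band%29"
    print(may_crawl(ZERO_ROBOTS_TXT, "Googlebot", article))  # False
    # For comparison, an empty Disallow line permits everything.
    print(may_crawl("User-agent: *\nDisallow:\n", "Googlebot", article))  # True
```

The same check can be pointed at any user agent string, since a "User-agent: *" group is the fallback for crawlers without a group of their own.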

Thanks for flagging this! Will keep watching the indexed links count as it
dwindles.

Thanks again.
-Adam


On Wed, Jun 26, 2013 at 10:59 AM, Adam Baso ab...@wikimedia.org wrote:

 (cross-posted on mobile-l)

 Okay, looks like the index of zero.wikipedia.org pages in Google has
 shrunk by some 20 million entries. Nonetheless, a number of really old
 pages (e.g., going back to 6-May-2013) are still in the Google index with
 article text. I'll set a reminder to check on the Google index again in 30
 days, and hopefully then we can finally put the no-index rules in place at
 that time.

 The good news is that many of the pages are now correctly suppressed in
 natural search as non-canonical pages. In other words, a user would need to
 go through omitted results or do a site:domain search to see them.

 -Adam


Re: [Wikimedia-l] Wikipedia Zero in Google search result

2013-06-26 Thread Matthew Flaschen
On 06/18/2013 06:35 PM, Adam Baso wrote:
 Update:
 
 We've added an enhancement to Wikipedia Zero so that if a user who isn't on
 a participating carrier network navigates to a Wikipedia Zero page on
 language.zero.wikipedia.org, such as
 http://en.zero.wikipedia.org/wiki/Muse_%28band%29 , the user will be
 presented an option to visit the canonical URL of the article. If clicked,
 the canonical URL should get the user to the mobile or desktop version of
 the page, based on device type.

That's good to hear. When the page is visited on desktop (the original
report, https://bugzilla.wikimedia.org/show_bug.cgi?id=48856, is about
desktop search), it would be helpful if it did not mention mobile carriers,
data charges, and such. Perhaps it could even redirect silently.

If that's not feasible for now, perhaps the message could be a bit more
general so it reads better on desktop.
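The suggestion amounts to choosing a response based on who is visiting. A minimal sketch of that decision follows. The inputs on_partner_network and is_mobile_device are hypothetical, since the thread does not describe how carrier or device detection works. The notice text is a placeholder, and the 301 status for the silent redirect is an assumption.

```python
from dataclasses import dataclass, field
from html import escape
from typing import Dict


@dataclass
class Response:
    status: int
    headers: Dict[str, str] = field(default_factory=dict)
    body: str = ""


def respond_to_zero_page(on_partner_network: bool, is_mobile_device: bool,
                         canonical_url: str) -> Response:
    """Hypothetical policy for a request to a lang.zero.wikipedia.org page."""
    if on_partner_network:
        # Placeholder: a partner-network visitor gets the zero page as usual.
        return Response(200, {"Content-Type": "text/html; charset=utf-8"},
                        "<!-- zero-rated article -->")
    if not is_mobile_device:
        # Desktop visitor: redirect silently, with no carrier or data-charge text.
        return Response(301, {"Location": canonical_url})
    # Mobile visitor off a partner network: a short, generic notice that
    # links to the canonical article.
    link = escape(canonical_url, quote=True)
    return Response(200, {"Content-Type": "text/html; charset=utf-8"},
                    f'<p>View this article at <a href="{link}">{link}</a>.</p>')


if __name__ == "__main__":
    url = "http://en.wikipedia.org/wiki/Muse_(band)"
    print(respond_to_zero_page(False, False, url).status)  # 301
```

Using 302 instead of 301 would keep the redirect temporary if the behavior might change later.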

Matt Flaschen

___
Wikimedia-l mailing list
Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, 
mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe

Re: [Wikimedia-l] Wikipedia Zero in Google search result

2013-06-26 Thread Adam Baso
(cross-posted on mobile-l)

Okay, it looks like the index of zero.wikipedia.org pages in Google has
shrunk by some 20 million entries. Nonetheless, a number of really old pages
(e.g., going back to 6-May-2013) are still in the Google index with article
text. I'll set a reminder to check on the Google index again in 30 days, and
hopefully we can finally put the no-index rules in place then.

The good news is that many of the pages are now correctly suppressed in
natural search as non-canonical pages. In other words, a user would need to
go through omitted results or do a site:domain search to see them.
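Whether a page is treated as non-canonical is informed by the canonical URL it declares, commonly through a link element with rel="canonical" in the page head. To check what a given zero page declares, a sketch using Python's standard html.parser could look like this. The sample markup is hypothetical, since the thread does not show the tags the zero pages actually emitted.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class CanonicalLinkFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> element seen."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag: str,
                        attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel is a space-separated list of tokens; attribute values may be None.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attr_map.get("href")


def find_canonical(html: str) -> Optional[str]:
    finder = CanonicalLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical


if __name__ == "__main__":
    # Hypothetical markup for a zero page; not taken from the thread.
    page = ('<html><head><link rel="canonical" '
            'href="http://en.wikipedia.org/wiki/Muse_(band)"></head></html>')
    print(find_canonical(page))  # http://en.wikipedia.org/wiki/Muse_(band)
```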

-Adam


On Tue, Jun 18, 2013 at 3:35 PM, Adam Baso ab...@wikimedia.org wrote:

 Update:

 We've added an enhancement to Wikipedia Zero so that if a user who isn't
 on a participating carrier network navigates to a Wikipedia Zero page on
 language.zero.wikipedia.org, such as
 http://en.zero.wikipedia.org/wiki/Muse_%28band%29 , the user will be
 presented an option to visit the canonical URL of the article. If clicked,
 the canonical URL should get the user to the mobile or desktop version of
 the page, based on device type.

 We're hoping that by next week the Google index will be refreshed so as to
 correctly mark the language.zero.wikipedia.org pages as duplicate pages
 in the omitted section. Upon confirmation of as much, the current plan is
 to introduce https://gerrit.wikimedia.org/r/#/c/69420/ to prevent
 indexing of language.zero.wikipedia.org altogether.

Re: [Wikimedia-l] Wikipedia Zero in Google search result

2013-06-18 Thread Adam Baso
Update:

We've added an enhancement to Wikipedia Zero: if a user who isn't on a
participating carrier network navigates to a Wikipedia Zero page on
language.zero.wikipedia.org, such as
http://en.zero.wikipedia.org/wiki/Muse_%28band%29, the user will be
presented with an option to visit the canonical URL of the article. If
clicked, the canonical URL should take the user to the mobile or desktop
version of the page, based on device type.
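The message describes the behavior rather than the code. As a sketch, assuming the canonical URL is obtained by dropping the zero label from the host and that mobile devices are then sent to the matching lang.m.wikipedia.org host, the two steps could look like this (canonical_url and url_for_device are hypothetical names):

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(zero_url: str) -> str:
    """Hypothetical rule: lang.zero.wikipedia.org/... -> lang.wikipedia.org/..."""
    parts = urlsplit(zero_url)
    labels = (parts.hostname or "").split(".")
    if len(labels) != 4 or labels[1:] != ["zero", "wikipedia", "org"]:
        raise ValueError(f"not a lang.zero.wikipedia.org URL: {zero_url}")
    # Keep scheme, path, query and fragment; only the host changes.
    # Any explicit port is dropped in this sketch.
    return urlunsplit((parts.scheme, f"{labels[0]}.wikipedia.org",
                       parts.path, parts.query, parts.fragment))


def url_for_device(canonical: str, is_mobile: bool) -> str:
    """Hypothetical device routing: mobile devices go to lang.m.wikipedia.org."""
    if not is_mobile:
        return canonical
    parts = urlsplit(canonical)
    lang = (parts.hostname or "").split(".")[0]
    return urlunsplit((parts.scheme, f"{lang}.m.wikipedia.org",
                       parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    desktop = canonical_url("http://en.zero.wikipedia.org/wiki/Muse_%28band%29")
    print(desktop)                        # http://en.wikipedia.org/wiki/Muse_%28band%29
    print(url_for_device(desktop, True))  # http://en.m.wikipedia.org/wiki/Muse_%28band%29
```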

We're hoping that by next week the Google index will have been refreshed so
that the language.zero.wikipedia.org pages are correctly marked as duplicate
pages in the omitted section. Once that is confirmed, the current plan is to
introduce https://gerrit.wikimedia.org/r/#/c/69420/ to prevent indexing of
language.zero.wikipedia.org altogether.
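The thread does not say what change 69420 contains. One common way to keep a host out of search indexes is an "X-Robots-Tag: noindex" response header. The hypothetical WSGI middleware below adds it only to responses for hosts ending in .zero.wikipedia.org. A crawler only sees such a header on pages it is allowed to fetch, so combining it with a robots.txt Disallow for the same pages would hide the header from the crawler.

```python
from typing import Callable, Iterable, List, Tuple

Headers = List[Tuple[str, str]]
StartResponse = Callable[..., Callable[[bytes], object]]
WSGIApp = Callable[[dict, StartResponse], Iterable[bytes]]

ZERO_SUFFIX = ".zero.wikipedia.org"  # assumed host pattern


def noindex_zero_hosts(app: WSGIApp) -> WSGIApp:
    """Wrap a WSGI app so zero-host responses carry X-Robots-Tag: noindex."""

    def wrapped(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
        # Drop any port and normalize case before matching the host.
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        is_zero_host = host.endswith(ZERO_SUFFIX)

        def start_with_noindex(status: str, headers: Headers, exc_info=None):
            if is_zero_host:
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_noindex)

    return wrapped
```

The wrapper works with any WSGI callable, for example one served by wsgiref.simple_server.make_server during local testing.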


On Tue, May 28, 2013 at 6:26 PM, Adam Baso ab...@wikimedia.org wrote:

 All,

 My mistake. The pages in Google's index that I used for sampling - the
 ones that have Sorry, ... in their description in Google search results -
 are cached pages. I assumed incorrectly that those pages were based on
 recent indexing (e.g., in the past few days).

 I think we can actually stick to the original plan of Google re-indexing
 and the search results de-emphasizing the language.zero.wikipedia.org links
 within the next 30 days.

 I still find it strange that there are language.zero.wikipedia.org links
 that turned up higher in the search engine rankings than their
 better-established language.wikipedia.org counterparts. But I suppose
 with fewer competing page elements, especially on long-tail articles with
 fewer or no direct links to the desktop page, this is maybe not totally
 unexpected.

 -Adam


Re: [Wikimedia-l] Wikipedia Zero in Google search result

2013-05-28 Thread Tomasz Finc
Looping in Dan Foy, who's managing the Zero backlog.

On Mon, May 27, 2013 at 8:01 AM, MZMcBride z...@mzmcbride.com wrote:
 K. Peachey wrote:
Can you please file this in bugzilla https://bugzilla.wikimedia.org?

 https://bugzilla.wikimedia.org/show_bug.cgi?id=48856


 MZMcBride


Re: [Wikimedia-l] Wikipedia Zero in Google search result

2013-05-28 Thread Kul Wadhwa
Adam Baso (copied on this email) is working on it and a fix is ready. He'll
do some testing to make sure it's resolved.

On Tue, May 28, 2013 at 10:22 AM, Tomasz Finc tf...@wikimedia.org wrote:

 Looping Dan Foy in who's managing the Zero backlog.


-- 
Kul Wadhwa
Head of Mobile
Wikimedia Foundation


Re: [Wikimedia-l] Wikipedia Zero in Google search result

2013-05-28 Thread Adam Baso
Hello All,

We had shelved my patch, patch 64629 https://gerrit.wikimedia.org/r/64629,
in hopes that an earlier patch, patch 61809 https://gerrit.wikimedia.org/r/61809
(bug 35233 https://bugzilla.wikimedia.org/show_bug.cgi?id=35233), would
resolve the issue naturally as Google re-indexed. But it appears Google has
re-indexed and yet the .zero.wikipedia.org URLs are still present in
Google's index, instead of the language.wikipedia.org URLs.

I have thus resubmitted patch 64629 https://gerrit.wikimedia.org/r/64629 for
re-review. We will need to further discuss whether it is appropriate to
have Google completely remove .zero.wikipedia.org links from their cache,
or whether we need to open a support thread with Google about canonical
URLs.




On Tue, May 28, 2013 at 1:13 PM, Kul Wadhwa kwad...@wikimedia.org wrote:

 Adam Baso (copied on this email) is working on it and a fix is ready.
 He'll do some testing to make sure it's resolved.


Re: [Wikimedia-l] Wikipedia Zero in Google search result

2013-05-28 Thread Adam Baso
All,

My mistake. The pages in Google's index that I used for sampling (the ones
that have "Sorry, ..." in their description in Google search results) are
cached pages. I incorrectly assumed that those pages reflected recent
indexing (e.g., within the past few days).

I think we can actually stick to the original plan of Google re-indexing
and the search results de-emphasizing the language.zero.wikipedia.org links
within the next 30 days.

I still find it strange that some language.zero.wikipedia.org links ranked
higher in the search engine results than their better-established
language.wikipedia.org counterparts. But I suppose that with fewer competing
page elements, especially on long-tail articles with few or no direct links
to the desktop page, this is perhaps not entirely unexpected.

-Adam




On Tue, May 28, 2013 at 1:49 PM, Adam Baso ab...@wikimedia.org wrote:

 Hello All,

 We had shelved my patch, patch 64629 https://gerrit.wikimedia.org/r/64629,
 in hopes that an earlier patch, patch 61809 https://gerrit.wikimedia.org/r/61809
 (bug 35233 https://bugzilla.wikimedia.org/show_bug.cgi?id=35233), would
 resolve the issue naturally as Google re-indexed. But it appears Google has
 re-indexed and yet the .zero.wikipedia.org URLs are still present in
 Google's index, instead of the language.wikipedia.org URLs.

 I have thus resubmitted patch 64629 https://gerrit.wikimedia.org/r/64629 for
 re-review. We will need to further discuss whether it is appropriate to
 have Google completely remove .zero.wikipedia.org links from their cache,
 or if perhaps we need to open a support thread with Google about canonical
 URLs.


[Wikimedia-l] Wikipedia Zero in Google search result

2013-05-27 Thread Benjamin Chen
Hi,

I've noticed that when I search on Google, many Wikipedia results appear in the
form lang-code.zero.wikipedia.org, perhaps only since a day or two ago.

I'm not sure which pages are indexed this way, but it is a real problem:
there is no link on the page that takes you to the standard site (even the
notice links to the main page of m.wikipedia.org, not to the corresponding
article on m.wikipedia.org).

Regards,

Benjamin Chen / [[User:Bencmq]]


Re: [Wikimedia-l] Wikipedia Zero in Google search result

2013-05-27 Thread K. Peachey
Can you please file this in Bugzilla (https://bugzilla.wikimedia.org)?

Thanks.

On Mon, May 27, 2013 at 9:41 PM, Benjamin Chen bencmqw...@gmail.com wrote:
 Hi,

 I noticed that when I'm searching on Google, many Wikipedia results are in 
 the form of lang-code.zero.wikipedia.org, perhaps just since a day or two ago.

 I'm not sure what items are indexed this way, but it would really be a 
 trouble - there is no link on the page that jumps you to the standard site 
 (even the notice links to main page of m.wikipedia.org, not the corresponding 
 article on m.wikipedia.org)

 Regards,

 Benjamin Chen / [[User:Bencmq]]


Re: [Wikimedia-l] Wikipedia Zero in Google search result

2013-05-27 Thread MZMcBride
K. Peachey wrote:
Can you please file this in bugzilla https://bugzilla.wikimedia.org?

https://bugzilla.wikimedia.org/show_bug.cgi?id=48856


MZMcBride