[Wikimedia-l] [Input requested] Knowledge as a Service at the Wikimedia Developer Summit 2018

2018-01-18 Thread Adam Baso
Greetings Wikimedians,

(And thank you for your patience with my cross-posting.)

I'm writing to invite your input on the following Phabricator task ahead of
next week's Wikimedia Developer Summit 2018 [1] session.

Knowledge as a Service
https://phabricator.wikimedia.org/T183315

The purpose [2] of the Wikimedia Developer Summit 2018 sessions is to
provide guidance for Phase 2 of the Movement Strategic Direction [3] on
buildout of technology capabilities. We'd really love your thoughts to help
set context for our session next week, as Knowledge as a Service is a
primary consideration in the Movement Strategic Direction.

What is Knowledge as a Service? Its essence is about information
architecture approaches and the necessary software that will ultimately
allow content consumption and creation to radiate to new and different
types of interfaces and devices in addition to browser-based approaches. As
you review position papers from attendees [4], you'll notice that they
(myself included) think the best way to solve this is a heavy emphasis on
technology that makes it easier to structure information and its metadata
for re-use, remixing, and querying.

What might this mean? Does it mean we should build Wikimedia software in an
API- and metadata-first manner following industry standards compatible with
content structuration? Does it mean weaving our existing structured and
semi-structured data technologies together? How do we build technology that
can ensure successful collaboration between communities on increasingly
structured and interdependent information sources? And how can we ensure
the tech will bolster growth of multilingual and multimedia content
creation and consumption?

I've copied some of the essential material from the Movement Strategic
Direction concerning Knowledge as a Service so you have it here. We would
appreciate your input and hope you will subscribe to the Phabricator task
to contribute and follow along as we explore this topic.

https://phabricator.wikimedia.org/T183315

The following content is copied from
https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017/Direction :


Knowledge as a service: To serve our users, we will become a platform that
serves open knowledge to the world across interfaces and communities. We
will build tools for allies and partners to organize and exchange free
knowledge beyond Wikimedia. Our infrastructure will enable us and others to
collect and use different forms of free, trusted knowledge.

...

As technology spreads through every aspect of our lives, Wikimedia's
infrastructure needs to be able to communicate easily with other connected
systems.

...

As a platform, we need to transform our structures to support new formats,
new interfaces, and new types of knowledge. We have a strategic opportunity
to go further and offer this platform as a service to other institutions,
beyond Wikimedia. In a world that is becoming more and more connected,
building the infrastructure for knowledge gives others a vested interest in
our success. It is how we ensure our place in the larger network of
knowledge, and become an essential part of it. As a service to users, we
need to build the platform for knowledge or, in jargon, provide knowledge
as a service.

...

Knowledge as a service: A platform that serves open knowledge to the world
across interfaces and communities
Our openness will ensure that our decisions are fair, that we are
accountable to one another, and that we act in the public interest. Our
systems will follow the evolution of technology. We will transform our
platform to work across digital formats, devices, and interfaces. The
distributed structure of our network will help us adapt to local contexts.

...


We will build tools for allies and partners to organize and exchange free
knowledge beyond Wikimedia.
We will continue to build the infrastructure for free knowledge for our
communities. We will go further by offering it as a service to others in
the network of knowledge. We will continue to build the partnerships that
enable us to develop knowledge we can't create ourselves.

...

Our infrastructure will enable us and others to collect and use different
forms of free, trusted knowledge.
We will build the technical infrastructures that enable us to collect free
knowledge in all forms and languages. We will use our position as a leader
in the ecosystem of knowledge to advance our ideals of freedom and
fairness. We will build the technical structures and the social agreements
that enable us to trust the new knowledge we compile. We will focus on
highly structured information to facilitate its exchange and reuse in
multiple contexts.


Thank you.
-Adam


[1] https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit/2018
[2] https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit/2018/Purpose_and_Results
[3] https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017/Direction
[4] 

Re: [Wikimedia-l] Fwd: [WikimediaMobile] "Among mobile sites, Wikipedia reigns in terms of popularity"

2016-05-12 Thread Adam Baso
Inline.

On Thu, May 12, 2016 at 1:28 AM, MZMcBride wrote:

> Steven Walling wrote:
> >It's really great to see Wikipedia highlighted as a source for news and
> >current events. It's rare that people fully recognize the degree to which
> >the "encyclopedia" is actually very good at trending news information.
> >That said, the report paints a rosy picture that, strategically speaking,
> >may not be cause for celebration.
>
> Does the Knight Foundation disclose somewhere in this report that it's a
> donor to the Wikimedia Foundation?
>

I didn't see it in there. Looks like this was commissioned work.


>
> Comparing Wikipedia to sites like BuzzFeed and CNN seems to be a pretty
> classic case of comparing apples to oranges.
>

It seems like it's contextualized in
https://medium.com/mobile-first-news-how-people-use-smartphones-to/mobile-america-how-different-audiences-tap-mobile-news-1c72525210d7
as follows:

"The information and reference site Wikipedia is linked to news behavior
and is a critical pathway to the news and information ecosystem."

"Information and reference sites are linked to news behavior and often
drive traffic to news content. Wikipedia figures prominently in mobile
content access."



> >Neglecting to show people the value of the apps will help grow mobile web
> >traffic in the short term, but in the long run may leave us entirely
> >dependent on search (i.e. Google) or simply not growing readers, despite
> >millions of people still coming online via mobile.
>
> Can you elaborate on the value of the apps? HTTP is a free and open
> standard with very wide support. iOS is closed and proprietary. Maybe you
> can explain how investing resources into the latter aligns with
> Wikimedia's values?


The web and the apps are both ways to provide access to the openly licensed
content. We should and do invest in both on the grounds of reaching users
through popular channels.

I think Steven was talking about something different in terms of strategic
risk of disintermediation, though. I don't see the future as dystopian.
Rather I see a more utopian future requiring continued improvement in
dialogue with communities and nurturing of partnerships.


>
> Personally, I say hasten the day that we abolish the horrible "m." from
> our URLs and MobileFrontend from our servers.
>

I don't think it's something we're planning to do soon, but I agree it
would be nice to consolidate domain names.

I don't think we're at the place where we can yet deliver on converging the
full tech stack (and this was the feedback we basically received at the dev
summit), although I think we should keep iterating on this discussion over
the coming quarters. There are good things in MediaWiki Core and
desktop-oriented extensions and there are good things in MobileFrontend,
and I'm interested to see how we can port some things over if not
consolidate some pieces.
___
Wikimedia-l mailing list, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
New messages to: Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, 


Re: [Wikimedia-l] Wikipedia.org portal page update!

2016-03-11 Thread Adam Baso
>
> The image thumbnail thing fails to filter out fair use images along
> with the usual lack of author and licensing information.


Hi there - speaking to one thing I'm familiar with, with respect to image
selection, we believe https://phabricator.wikimedia.org/T124225 should
address fair use ("non-free") images, although page reparses will happen
gradually (pages are cached for up to 30 days or so).

-Adam


Re: [Wikimedia-l] Invitation to WMF November 2014 Metrics Activities Meeting: Thursday, December 4, 19:00 UTC

2014-12-08 Thread Adam Baso
Hi there, just wanted to touch on the autoredirection stuff. The
autoredirection item mentioned is an enhancement for requests to the
m.wikipedia.org/ webroot (not articles) from Wikipedia Zero users. As
before, non-Wikipedia Zero users accessing the m.wikipedia.org/ webroot
continue to be redirected to en.m.wikipedia.org.

It seems thus far that the enhancement for Wikipedia Zero users isn't
causing harm, and our thinking is that if that holds, we should examine
applying the approach to non-Wikipedia Zero access to m.wikipedia.org/
as well.

As an extension of this thinking, it may be worthwhile to look into
alternative placement of "Read in another language", or even a language
shortlist above the fold tailored to the given user (e.g., an API endpoint
looks at Accept-Language and the top 3 pertinent languages get shimmed in),
taking into account JavaScript support level.
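
To make the shortlist idea concrete, here is a minimal sketch in Python of how an endpoint might turn an Accept-Language header into a top-3 list. It is only an illustration under stated assumptions: the function name language_shortlist, the supported set, and the choice to reduce tags like "fa-IR" to their primary subtag are all hypothetical, not anything that exists in MobileFrontend or the Wikipedia Zero code.

```python
def language_shortlist(accept_language, supported, limit=3):
    """Return up to `limit` supported language codes, best match first.

    `supported` is a hypothetical set of language codes that have a wiki.
    """
    weighted = []
    for position, item in enumerate(accept_language.split(",")):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0].lower()
        if not tag or tag == "*":
            continue  # skip empty entries and the wildcard
        quality = 1.0  # a missing q parameter means full preference
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed weight as "not wanted"
        if quality <= 0:
            continue
        # Sort by descending quality, then by original header order.
        weighted.append((-quality, position, tag))

    shortlist = []
    for _, _, tag in sorted(weighted):
        if len(shortlist) >= limit:
            break
        primary = tag.split("-")[0]  # "fa-ir" -> "fa"
        if primary in supported and primary not in shortlist:
            shortlist.append(primary)
    return shortlist


header = "fa-IR,fa;q=0.9,en;q=0.8,ar;q=0.7,fr;q=0.1"
print(language_shortlist(header, {"fa", "en", "ar", "fr"}))
# prints ['fa', 'en', 'ar']
```

A server-side computation like this would not depend on the client's JavaScript support level, which is one reason an API endpoint (rather than client-side detection) might fit here.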

-Adam




On Sun, Dec 7, 2014 at 4:48 PM, Asaf Bartov abar...@wikimedia.org wrote:

 Hi.

 On Fri, Dec 5, 2014 at 9:27 AM, C. Scott Ananian canan...@wikimedia.org wrote:

  On Thu, Dec 4, 2014 at 7:18 PM, Asaf Bartov abar...@wikimedia.org wrote:
   On Thu, Dec 4, 2014 at 1:23 PM, C. Scott Ananian canan...@wikimedia.org wrote:
   1) Is the rise in global south page views specifically to *enwiki*, or
   is it to local wikis?
   Not actually an either/or.  The answer seems to me to be yes, i.e. all
   wikis -- that is, all projects, all languages.
 
  It may *seem to you* to be yes, but the data indicates that the
  answer differs, depending where you look.  For example, the data
  clearly indicates that the stunning rise in Iran is almost entirely
  due to enwiki.  enwiki gains over 80 million page views, fawiki gains
  only 10 million.  See
  https://en.wikipedia.org/wiki/User:Cscott/2014_December_metrics for a
  convincing graph.
 
  I think it's important that we determine the actual answers to these
  questions, instead of trusting our instincts.
 

 I definitely agree.  I had misread your question to mean "is the rise
 computed across all wikis", which is indeed not what you were asking.  I
 apologize for the irrelevant answer.


   Some definitely do.  Another major factor, mentioned today, is that in some
   countries, mobile devices just don't come with good local languages
   support, and people are putting up with that and using what the device does
   give them, which are generally the major, colonial languages.
 
  Hm, the word "colonial" bothers me here.  I know you mean
  historically colonial, but in the modern world English is also a
  trade language, not just a formerly-colonial language.  Much access to
  enwiki is due to its trade-language status.
 

 Certainly, there are very strong economic incentives to use English these
 days, and additionally other incentives, such as prestige real and
 imagined, still operating (and those, themselves, are still ripples of
 colonialism), but I did not mean 'colonial' here particularly strongly.  I
 could have written European, I suppose, except there are many languages
 in Europe, and only a handful have been colonial languages.  But the term
 is not important here, I think.


  I feel strongly that we have a moral obligation to offer good local
  language support, but I also feel that we shouldn't label and dismiss
  readers who want to learn/practice/find information in a trade
  language. (This is one of the reasons I'm a fan of simplewiki, but
  that's a whole 'nuther discussion.)
 

 I don't see that I (or anyone) did dismiss that.  In terms of our strategic
 goals of Reach and Participation, we are agnostic about which languages
 people contribute in, or consume in.  In terms of our strategic goal of
 Diversity however, we do want to work towards adequate offerings in all
 languages in which people are actually seeking to consume knowledge.


   On Fri, Dec 5, 2014 at 2:05 AM, Salvador A salvador1...@gmail.com
  wrote:
   I was reading the presentation on metrics and the point about Mexico's
   decreasing of views on Wikipedia called my attention.
 
  I dug into the numbers a little more; see the graphs at
  https://en.wikipedia.org/wiki/User:Cscott/2014_December_metrics
 
  It's a bit confusing.  At this moment I'm inclined to say that the
  computation of decliners was in some way erroneous; neither the page
  views for Mexico nor the overall pageviews for eswiki seem to support
  the large annual declines reported.
 
  On https://en.wikipedia.org/wiki/User:Cscott/2014_December_metrics I
  compute an annual decline for Mexico of 12.4% (compared to 23.2%
  reported at the metrics meeting), which compares to an eswiki annual
  decline of 4.8% (excluding bots and spiders).
 
  So Mexico is indeed concerning -- it's declining at three times the
  eswiki rate.  But eswiki as a whole seems like it ought to also be a
  concern.  And I'd like to understand why I can't reproduce the much
  higher numbers shown in the Metrics meeting.
 

 Thanks for taking 

Re: [Wikimedia-l] Mobile Operator IP Drift Tracking and Remediation

2014-06-24 Thread Adam Baso
Here's the patch update.

https://gerrit.wikimedia.org/r/141740


Re: [Wikimedia-l] Mobile Operator IP Drift Tracking and Remediation

2014-06-23 Thread Adam Baso
One wrinkle we've encountered, and sort of expected, is that the SIM card
MCC-MNC doesn't always match the actual network MCC-MNC. So on Android,
we'll add both to the payload so that we can differentiate them. On iOS it
looks like the API currently only allows one of these values through an
opaque method call. The previous EventLogging server-side code wasn't
logging the User-Agent (defined coarsely in our code on both platforms).
To make it evident when we're dealing with an iOS version of the app, I'm
thinking it would make most sense to re-enable the User-Agent so we can
pick up this coarse-grained value. I wanted to put this User-Agent item out
here for a brief period before adding the code, though.
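
As a rough illustration of carrying both values, here is a minimal Python sketch of a payload that holds whichever operator codes a platform exposes. Everything in it is an assumption for illustration: the function names, the field names (platform, sim_mcc_mnc, network_mcc_mnc), and the sample codes are hypothetical and are not taken from the MobileOperatorCode schema or the app code.

```python
def build_operator_payload(platform, sim_mcc_mnc=None, network_mcc_mnc=None):
    """Build a dict with whichever operator codes the platform exposes.

    Field names are hypothetical; a platform that can only report one
    value simply omits the other.
    """
    payload = {"platform": platform}  # coarse value, akin to a coarse User-Agent
    if sim_mcc_mnc is not None:
        payload["sim_mcc_mnc"] = sim_mcc_mnc
    if network_mcc_mnc is not None:
        payload["network_mcc_mnc"] = network_mcc_mnc
    return payload


def codes_differ(payload):
    """True only when both codes are present and they do not match."""
    sim = payload.get("sim_mcc_mnc")
    network = payload.get("network_mcc_mnc")
    return sim is not None and network is not None and sim != network


# Illustrative values only: a SIM from one operator seen on another network.
android = build_operator_payload("android", "310-260", "310-410")
ios = build_operator_payload("ios", sim_mcc_mnc="310-260")
print(codes_differ(android))  # prints True
print(codes_differ(ios))      # prints False
```

Keeping the two codes as separate optional fields means a single-value platform never looks like a mismatch, and the coarse platform field makes it clear which case a record came from.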

-Adam




Re: [Wikimedia-l] Mobile Operator IP Drift Tracking and Remediation

2014-05-30 Thread Adam Baso
Okay, the code is in place in the alphas of both the Android and iOS apps,
and the server-side 2% sampling (extra header in HTTPS request sent once
per cellular app session) is working.

https://git.wikimedia.org/commitdiff/apps%2Fandroid%2Fwikipedia.git/8b4a0c3b170d6bf1a8f8141d93dfc60416ae4e2b

https://git.wikimedia.org/commitdiff/apps%2Fios%2Fwikipedia.git/59cde497921bc6d2c28e3967c24f0316dfedf3ce

https://git.wikimedia.org/commitdiff/mediawiki%2Fextensions%2FZeroRatedMobileAccess.git/df3da0b3fa564ae27d33cd1b82f81df12a5ed287

Changes to event logging in the iOS alpha app (internal only at the moment,
although repo can be cloned and run in the Xcode simulator) are coming
pretty soon, and once those are in, we'll make one last tweak there to have
the app not add the extra MCC/MNC header on that single request per
cellular connection when logging is turned off in the iOS alpha app. That
part is done in the Android app already.

-Adam




Re: [Wikimedia-l] Mobile Operator IP Drift Tracking and Remediation

2014-05-02 Thread Adam Baso
Federico asked if sampling might make sense here. I think it will work, so
I've updated the patchset.

From a patchset comment I provided:

It's possible we may have situations where operators do not have many
users accessing Wiki(m|p)edia properties, so we do run some risk of
actually missing IPs, even if exit IPs are typically concentrators of large
sets of users. That said, let's try a 2% sample ratio. If we find out it's
insufficient, then we'll sample more; if it's oversampling, then we can
adjust the other way, too. New patchset arriving shortly.

(I've since submitted the updated code for review.)
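
The trade-off in that comment can be sketched numerically. The Python below is a minimal illustration, not the patchset's code: should_log and miss_probability are hypothetical names, and it assumes each eligible request is sampled independently at the stated 2% ratio.

```python
import random

SAMPLE_RATIO = 0.02  # the 2% ratio proposed above; meant to be tunable


def should_log(ratio=SAMPLE_RATIO, rng=random.random):
    """Decide whether one eligible request gets logged."""
    return rng() < ratio


def miss_probability(requests, ratio=SAMPLE_RATIO):
    """Chance that none of `requests` independent requests is sampled.

    This is the risk of missing a low-traffic operator's exit IP entirely.
    """
    return (1.0 - ratio) ** requests


# An exit IP seen on 100 eligible requests is missed roughly 13% of the time;
# the chance shrinks quickly as the request count grows.
for n in (10, 100, 1000):
    print(n, miss_probability(n))
```

If the miss probability for small operators turns out to be too high in practice, raising the ratio is the lever; the same function shows how quickly the risk falls as the ratio or the traffic grows.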

-Adam


On Thu, May 1, 2014 at 7:52 PM, Adam Baso ab...@wikimedia.org wrote:

 After examining this, it looks like EventLogging is more suited to the
 logging task than debug logging and the trappings of needing to alter debug
 logging in the core MediaWiki software.

 EventLogging logs at the resolution of a second (instead of a day), but
 has inbuilt support for record removal after 90 days.

 Please do let us know in case of further questions. Here's the logging
 schema for those with an interest:

 https://meta.wikimedia.org/wiki/Schema:MobileOperatorCode

 Here's the relevant server code:

 https://gerrit.wikimedia.org/r/#/c/130991/

 -Adam




 On Wed, Apr 16, 2014 at 2:20 PM, Adam Baso ab...@wikimedia.org wrote:

 Great idea!

 Anyone on the list know if there's a way to make the debug log facilities
 do the MMDD timestamp instead of the longer one?

 If not, I suppose we could work to update the core MediaWiki code. [1]

 -Adam

 1. For those with PHP skills or equivalent, I'm referring to
 https://git.wikimedia.org/blob/mediawiki%2Fcore.git/a26687e81532def3faba64612ce79b701a13949e/includes%2FGlobalFunctions.php#L1042.
 Scroll to the bottom of the function definition to see the datetimestamp
 approach.


 On Wed, Apr 16, 2014 at 12:47 PM, Andrew Gray andrew.g...@dunelm.org.uk wrote:

 Hi Adam,

 One thought: you don't really need the date/time data at any detailed
 resolution, do you? If what you're wanting it for is to track major
 changes (last month it all switched to this IP) and to purge old
 data (delete anything older than 10 March), you could simply log day
 rather than datetime.

 enwiki / 127.0.0.1 / 123.45 / 2014-04-16:1245.45

 enwiki / 127.0.0.1 / 123.45 / 2014-04-16

 - the latter gives you the data you need while making it a lot harder
 to do any kind of close user-identification.

 Andrew.
 On 16 Apr 2014 19:17, Adam Baso ab...@wikimedia.org wrote:

  Inline.
 
  Thanks for starting this thread.
  
   Sorry if I've overlooked this, but who/what will have access to this
   data? Only members of the mobile team? Local project CheckUsers?
   Wikimedia Foundation-approved researchers? Wikimedia shell users?
   AbuseFilter filters?

  It's a good question. The thought is to put it in the customary
  wfDebugLog location (with, for example, filename mccmnc.log) on fluorine.

  It just occurred to me that the wiki name (e.g., enwiki), but not the
  full URL, gets logged additionally as part of the wfDebugLog call; to
  make the implicit explicit, wfDebugLog adds a datetime stamp as well, and
  that's useful for purging old records. I'll forward this email to
  mobile-l and wikitech-l to underscore this.

   And this may be a silly question, but is there a reasonable means of
   approximating how identifying these two data points alone are? That
   is, using a mobile country code and exit IP address, is it possible to
   identify a particular editor or reader? Or perhaps rephrased, is this
   data considered anonymized?

  Not a silly question. My approximation is these tuples (datetime, now
  that it hit me - XYwiki, exit IP, and MCC-MNC) alone, although not
  perfectly anonymized, are low identifying (that is, indirect inferences
  on the data in isolation are unlikely, but technically possible, through
  examination of short tail outliers in a cluster analysis where such
  readers/editors exist in the short tail outliers sets), in contrast to
  regular web access logs (where direct inferences are easy).
 
  Thanks. I'll forward this along now.
 
  -Adam




___
Wikimedia-l mailing list
Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, 
mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe

Re: [Wikimedia-l] Mobile Operator IP Drift Tracking and Remediation

2014-05-01 Thread Adam Baso
After examining this, it looks like EventLogging is more suited to the
logging task than debug logging and the trappings of needing to alter debug
logging in the core MediaWiki software.

EventLogging logs at the resolution of a second (instead of a day), but has
inbuilt support for record removal after 90 days.

Please do let us know in case of further questions. Here's the logging
schema for those with an interest:

https://meta.wikimedia.org/wiki/Schema:MobileOperatorCode

Here's the relevant server code:

https://gerrit.wikimedia.org/r/#/c/130991/

-Adam




On Wed, Apr 16, 2014 at 2:20 PM, Adam Baso ab...@wikimedia.org wrote:

 Great idea!

 Anyone on the list know if there's a way to make the debug log facilities
 do the MMDD timestamp instead of the longer one?

 If not, I suppose we could work to update the core MediaWiki code. [1]

 -Adam

 1. For those with PHP skills or equivalent, I'm referring to
 https://git.wikimedia.org/blob/mediawiki%2Fcore.git/a26687e81532def3faba64612ce79b701a13949e/includes%2FGlobalFunctions.php#L1042.
 Scroll to the bottom of the function definition to see the datetimestamp
 approach.


 On Wed, Apr 16, 2014 at 12:47 PM, Andrew Gray andrew.g...@dunelm.org.uk wrote:

 Hi Adam,

 One thought: you don't really need the date/time data at any detailed
 resolution, do you? If what you're wanting it for is to track major
 changes (last month it all switched to this IP) and to purge old
 data (delete anything older than 10 March), you could simply log day
 rather than datetime.

 enwiki / 127.0.0.1 / 123.45 / 2014-04-16:1245.45

 enwiki / 127.0.0.1 / 123.45 / 2014-04-16

 - the latter gives you the data you need while making it a lot harder
 to do any kind of close user-identification.

 Andrew.
 On 16 Apr 2014 19:17, Adam Baso ab...@wikimedia.org wrote:

  Inline.
 
  Thanks for starting this thread.
  
   Sorry if I've overlooked this, but who/what will have access to this
   data? Only members of the mobile team? Local project CheckUsers?
   Wikimedia Foundation-approved researchers? Wikimedia shell users?
   AbuseFilter filters?

  It's a good question. The thought is to put it in the customary
  wfDebugLog location (with, for example, filename mccmnc.log) on fluorine.

  It just occurred to me that the wiki name (e.g., enwiki), but not the
  full URL, gets logged additionally as part of the wfDebugLog call; to
  make the implicit explicit, wfDebugLog adds a datetime stamp as well, and
  that's useful for purging old records. I'll forward this email to
  mobile-l and wikitech-l to underscore this.

   And this may be a silly question, but is there a reasonable means of
   approximating how identifying these two data points alone are? That
   is, using a mobile country code and exit IP address, is it possible to
   identify a particular editor or reader? Or perhaps rephrased, is this
   data considered anonymized?

  Not a silly question. My approximation is these tuples (datetime, now
  that it hit me - XYwiki, exit IP, and MCC-MNC) alone, although not
  perfectly anonymized, are low identifying (that is, indirect inferences
  on the data in isolation are unlikely, but technically possible, through
  examination of short tail outliers in a cluster analysis where such
  readers/editors exist in the short tail outliers sets), in contrast to
  regular web access logs (where direct inferences are easy).
 
  Thanks. I'll forward this along now.
 
  -Adam




Re: [Wikimedia-l] Mobile Operator IP Drift Tracking and Remediation

2014-04-16 Thread Adam Baso
Inline.

Thanks for starting this thread.

 Sorry if I've overlooked this, but who/what will have access to this data?
 Only members of the mobile team? Local project CheckUsers? Wikimedia
 Foundation-approved researchers? Wikimedia shell users? AbuseFilter
 filters?


It's a good question. The thought is to put it in the customary wfDebugLog
location (with, for example, filename mccmnc.log) on fluorine.

It just occurred to me that the wiki name (e.g., enwiki), but not the
full URL, gets logged additionally as part of the wfDebugLog call; to make
the implicit explicit, wfDebugLog adds a datetime stamp as well, and that's
useful for purging old records. I'll forward this email to mobile-l and
wikitech-l to underscore this.


 And this may be a silly question, but is there a reasonable means of
 approximating how identifying these two data points alone are? That is,
 using a mobile country code and exit IP address, is it possible to
 identify a particular editor or reader? Or perhaps rephrased, is this data
 considered anonymized?


Not a silly question. My approximation is these tuples (datetime, now that
it hit me - XYwiki, exit IP, and MCC-MNC) alone, although not perfectly
anonymized, are low identifying (that is, indirect inferences on the data
in isolation are unlikely, but technically possible, through examination of
short tail outliers in a cluster analysis where such readers/editors exist
in the short tail outliers sets), in contrast to regular web access logs
(where direct inferences are easy).

Thanks. I'll forward this along now.

-Adam

Re: [Wikimedia-l] Mobile Operator IP Drift Tracking and Remediation

2014-04-16 Thread Adam Baso
Great idea!

Anyone on the list know if there's a way to make the debug log facilities
do the MMDD timestamp instead of the longer one?

If not, I suppose we could work to update the core MediaWiki code. [1]

-Adam

1. For those with PHP skills or equivalent, I'm referring to
https://git.wikimedia.org/blob/mediawiki%2Fcore.git/a26687e81532def3faba64612ce79b701a13949e/includes%2FGlobalFunctions.php#L1042.
Scroll to the bottom of the function definition to see the datetimestamp
approach.
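
For illustration only, here is a minimal sketch of the day-resolution idea
from Andrew's note below. It is written in Python rather than MediaWiki's
PHP, and it is not the wfDebugLog format: the function name
format_log_line and the field order are assumptions modeled on his sample
lines.

```python
from datetime import datetime, timezone


def format_log_line(wiki: str, exit_ip: str, mcc_mnc: str, when: datetime) -> str:
    """Build a log line that keeps only the calendar day (in UTC).

    Hypothetical helper: dropping the time of day is the whole point,
    since the day alone still supports trend tracking and purging.
    """
    day = when.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return f"{wiki} / {exit_ip} / {mcc_mnc} / {day}"


# Sample values copied from the example lines in the quoted message.
sample_time = datetime(2014, 4, 16, 12, 45, 45, tzinfo=timezone.utc)
print(format_log_line("enwiki", "127.0.0.1", "123.45", sample_time))
```

Converting to UTC before truncating is a design choice in this sketch, so
that entries from different servers fall on the same day boundary.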


On Wed, Apr 16, 2014 at 12:47 PM, Andrew Gray andrew.g...@dunelm.org.uk wrote:

 Hi Adam,

 One thought: you don't really need the date/time data at any detailed
 resolution, do you? If what you're wanting it for is to track major
 changes (last month it all switched to this IP) and to purge old
 data (delete anything older than 10 March), you could simply log day
 rather than datetime.

 enwiki / 127.0.0.1 / 123.45 / 2014-04-16:1245.45

 enwiki / 127.0.0.1 / 123.45 / 2014-04-16

 - the latter gives you the data you need while making it a lot harder
 to do any kind of close user-identification.

 Andrew.
 On 16 Apr 2014 19:17, Adam Baso ab...@wikimedia.org wrote:

  Inline.
 
  Thanks for starting this thread.
  
   Sorry if I've overlooked this, but who/what will have access to this
   data? Only members of the mobile team? Local project CheckUsers?
   Wikimedia Foundation-approved researchers? Wikimedia shell users?
   AbuseFilter filters?

  It's a good question. The thought is to put it in the customary
  wfDebugLog location (with, for example, filename mccmnc.log) on fluorine.

  It just occurred to me that the wiki name (e.g., enwiki), but not the
  full URL, gets logged additionally as part of the wfDebugLog call; to
  make the implicit explicit, wfDebugLog adds a datetime stamp as well, and
  that's useful for purging old records. I'll forward this email to
  mobile-l and wikitech-l to underscore this.

   And this may be a silly question, but is there a reasonable means of
   approximating how identifying these two data points alone are? That
   is, using a mobile country code and exit IP address, is it possible to
   identify a particular editor or reader? Or perhaps rephrased, is this
   data considered anonymized?

  Not a silly question. My approximation is these tuples (datetime, now
  that it hit me - XYwiki, exit IP, and MCC-MNC) alone, although not
  perfectly anonymized, are low identifying (that is, indirect inferences
  on the data in isolation are unlikely, but technically possible, through
  examination of short tail outliers in a cluster analysis where such
  readers/editors exist in the short tail outliers sets), in contrast to
  regular web access logs (where direct inferences are easy).
 
  Thanks. I'll forward this along now.
 
  -Adam


[Wikimedia-l] Mobile Operator IP Drift Tracking and Remediation

2014-04-15 Thread Adam Baso
I emailed mobile-l and wikitech-l about this, and now I'm moving this
discussion to wikimedia-l. Here's the longer technical thread:

http://lists.wikimedia.org/pipermail/mobile-l/2014-April/006884.html

In summary, to show Wikipedia Zero banners for the correct mobile networks,
we plan to log two pieces of data, once per cellular-based app session, in
a specialized logfile, deleting log entries older than 90 days.

1. MCC-MNC code (http://en.wikipedia.org/wiki/Mobile_country_code; format
is ###-##), which denotes the mobile operator
2. Exit (gateway/proxy) IP address
* These data points would not be logged alongside the normal web access
logs.
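
As a rough sketch of the plan above, and not the deployed code, the
following Python checks the ###-## code format and drops entries older
than 90 days. The record layout (a dict with "day", "mcc_mnc" and
"exit_ip" keys) and both function names are assumptions made for
illustration.

```python
import re
from datetime import date, timedelta

# Matches the ###-## format given above (three digits, dash, two digits).
MCC_MNC_PATTERN = re.compile(r"\d{3}-\d{2}")
RETENTION_DAYS = 90


def is_valid_mcc_mnc(code: str) -> bool:
    """Return True when the code has exactly the ###-## shape."""
    return MCC_MNC_PATTERN.fullmatch(code) is not None


def purge_old_entries(entries: list[dict], today: date) -> list[dict]:
    """Keep only entries logged within the last RETENTION_DAYS days."""
    cutoff = today - timedelta(days=RETENTION_DAYS)
    return [entry for entry in entries if entry["day"] >= cutoff]


# Hypothetical sample records; the values are placeholders.
sample_entries = [
    {"day": date(2014, 1, 1), "mcc_mnc": "123-45", "exit_ip": "127.0.0.1"},
    {"day": date(2014, 4, 1), "mcc_mnc": "123-45", "exit_ip": "127.0.0.1"},
]
kept = purge_old_entries(sample_entries, today=date(2014, 4, 15))
print(len(kept), is_valid_mcc_mnc("123-45"))
```

An entry that is exactly 90 days old is kept in this sketch; only
strictly older entries are removed.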

This information could be used to estimate rough demand for Wikipedia in
potential Wikipedia Zero geos, although remediating the out-of-sync IP
addresses on file for existing partners is primary.

Internal review suggests this is in alignment with privacy policy, and we
wanted to see if there were other thoughts on this approach here on
wikimedia-l.

-Adam

Re: [Wikimedia-l] Wikipedia Zero in Google search result

2013-08-28 Thread Adam Baso
(cross-posted on mobile-l)

Update:

I have been checking on the indexed link count over the last couple of
months, and it has been roughly constant. Upon another check in the past
week, it looked like it was time to go ahead with the robots.txt update.

Just yesterday, the robots.txt entry for lang.zero.wikipedia.org was also
updated to instruct all robots, including Googlebot, not to index
lang.zero.wikipedia.org. It looks like even more lang.zero.wikipedia.org
pages may already be starting to fall out of the index.
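
For anyone who wants to check a rule like this locally, here is a small
sketch using Python's standard urllib.robotparser. The two-line robots.txt
body is an assumption about what a blanket rule might look like, not a
copy of the deployed file, and the helper name is_crawl_allowed is
hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule set: block every robot from every path on the host.
ASSUMED_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


sample_url = "http://en.zero.wikipedia.org/wiki/Muse_%28band%29"
print(is_crawl_allowed(ASSUMED_ROBOTS_TXT, "Googlebot", sample_url))
```

Strictly speaking, a Disallow rule stops compliant robots from crawling;
pages already in an index drop out over time, which matches the gradual
decline described above.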

Thanks for flagging this! Will keep watching the indexed links count as it
dwindles.

Thanks again.
-Adam


On Wed, Jun 26, 2013 at 10:59 AM, Adam Baso ab...@wikimedia.org wrote:

 (cross-posted on mobile-l)

 Okay, looks like the index of zero.wikipedia.org pages in Google has
 shrunk by some 20 million entries. Nonetheless, a number of really old
 pages (e.g., going back to 6-May-2013) are still in the Google index with
 article text. I'll set a reminder to check on the Google index again in 30
 days, and hopefully then we can finally put the no-index rules in place at
 that time.

 The good news is that many of the pages are now correctly suppressed in
 natural search as non-canonical pages. In other words, a user would need to
 go through omitted results or do a site:domain search to see them.

 -Adam


 On Tue, Jun 18, 2013 at 3:35 PM, Adam Baso ab...@wikimedia.org wrote:

 Update:

 We've added an enhancement to Wikipedia Zero so that if a user who isn't
 on a participating carrier network navigates to a Wikipedia Zero page on
 language.zero.wikipedia.org, such as
 http://en.zero.wikipedia.org/wiki/Muse_%28band%29 , the user will be
 presented an option to visit the canonical URL of the article. If clicked,
 the canonical URL should get the user to the mobile or desktop version of
 the page, based on device type.

 We're hoping that by next week the Google index will be refreshed so as
 to correctly mark the language.zero.wikipedia.org pages as duplicate
 pages in the omitted section. Upon confirmation of as much, the current
 plan is to introduce https://gerrit.wikimedia.org/r/#/c/69420/ to
 prevent indexing of language.zero.wikipedia.org altogether.


 On Tue, May 28, 2013 at 6:26 PM, Adam Baso ab...@wikimedia.org wrote:

 All,

 My mistake. The pages in Google's index that I used for sampling - the
 ones that have Sorry, ... in their description in Google search results -
 are cached pages. I assumed incorrectly that those pages were based on
 recent indexing (e.g., in the past few days).

 I think we can actually stick to the original plan of Google re-indexing
 and the search results de-emphasizing the
 language.zero.wikipedia.org links within the next 30 days.

 I still find it strange that there are language.zero.wikipedia.org links
 that turned up higher in the search engine rankings than their
 better-established language.wikipedia.org counterparts. But I suppose
 with fewer competing page elements, especially on long-tail articles with
 fewer or no direct links to the desktop page, this is maybe not totally
 unexpected.

 -Adam




 On Tue, May 28, 2013 at 1:49 PM, Adam Baso ab...@wikimedia.org wrote:

 Hello All,

 We had shelved my patch, patch 64629 https://gerrit.wikimedia.org/r/64629,
 in hopes that an earlier patch, patch 61809
 https://gerrit.wikimedia.org/r/61809 (bug 35233
 https://bugzilla.wikimedia.org/show_bug.cgi?id=35233), would
 resolve the issue naturally as Google re-indexed. But it appears Google has
 re-indexed and yet the .zero.wikipedia.org URLs are still present in
 Google's index, instead of the language.wikipedia.org URLs.

 I have thus resubmitted patch 64629 https://gerrit.wikimedia.org/r/64629
 for re-review. We will need to further discuss whether it is appropriate to
 have Google completely remove .zero.wikipedia.org links from their
 cache, or if perhaps we need to open a support thread with Google about
 canonical URLs.




 On Tue, May 28, 2013 at 1:13 PM, Kul Wadhwa kwad...@wikimedia.org wrote:

 Adam Baso (copied on this email) is working on it and a fix is ready.
 He'll do some testing to make sure it's resolved.

 On Tue, May 28, 2013 at 10:22 AM, Tomasz Finc tf...@wikimedia.org wrote:

 Looping Dan Foy in who's managing the Zero backlog.

 On Mon, May 27, 2013 at 8:01 AM, MZMcBride z...@mzmcbride.com wrote:
  K. Peachey wrote:
 Can you please file this in bugzilla 
 https://bugzilla.wikimedia.org?
 
  https://bugzilla.wikimedia.org/show_bug.cgi?id=48856
 
 
  MZMcBride
 
 
 




 --
 Kul Wadhwa
 Head of Mobile
 Wikimedia Foundation

Re: [Wikimedia-l] Wikipedia Zero in Google search result

2013-06-26 Thread Adam Baso
(cross-posted on mobile-l)

Okay, looks like the index of zero.wikipedia.org pages in Google has shrunk
by some 20 million entries. Nonetheless, a number of really old pages
(e.g., going back to 6-May-2013) are still in the Google index with article
text. I'll set a reminder to check on the Google index again in 30 days,
and hopefully then we can finally put the no-index rules in place at that
time.

The good news is that many of the pages are now correctly suppressed in
natural search as non-canonical pages. In other words, a user would need to
go through omitted results or do a site:domain search to see them.

-Adam


On Tue, Jun 18, 2013 at 3:35 PM, Adam Baso ab...@wikimedia.org wrote:

 Update:

 We've added an enhancement to Wikipedia Zero so that if a user who isn't
 on a participating carrier network navigates to a Wikipedia Zero page on
 language.zero.wikipedia.org, such as
 http://en.zero.wikipedia.org/wiki/Muse_%28band%29 , the user will be
 presented an option to visit the canonical URL of the article. If clicked,
 the canonical URL should get the user to the mobile or desktop version of
 the page, based on device type.

 We're hoping that by next week the Google index will be refreshed so as to
 correctly mark the language.zero.wikipedia.org pages as duplicate pages
 in the omitted section. Upon confirmation of as much, the current plan is
 to introduce https://gerrit.wikimedia.org/r/#/c/69420/ to prevent
 indexing of language.zero.wikipedia.org altogether.


 On Tue, May 28, 2013 at 6:26 PM, Adam Baso ab...@wikimedia.org wrote:

 All,

 My mistake. The pages in Google's index that I used for sampling - the
 ones that have Sorry, ... in their description in Google search results -
 are cached pages. I assumed incorrectly that those pages were based on
 recent indexing (e.g., in the past few days).

 I think we can actually stick to the original plan of Google re-indexing
 and the search results de-emphasizing the language.zero.wikipedia.org links
 within the next 30 days.

 I still find it strange that there are language.zero.wikipedia.org links
 that turned up higher in the search engine rankings than their
 better-established language.wikipedia.org counterparts. But I suppose
 with fewer competing page elements, especially on long-tail articles with
 fewer or no direct links to the desktop page, this is maybe not totally
 unexpected.

 -Adam




 On Tue, May 28, 2013 at 1:49 PM, Adam Baso ab...@wikimedia.org wrote:

 Hello All,

 We had shelved my patch, patch 64629 https://gerrit.wikimedia.org/r/64629,
 in hopes that an earlier patch, patch 61809
 https://gerrit.wikimedia.org/r/61809 (bug 35233
 https://bugzilla.wikimedia.org/show_bug.cgi?id=35233), would
 resolve the issue naturally as Google re-indexed. But it appears Google has
 re-indexed and yet the .zero.wikipedia.org URLs are still present in
 Google's index, instead of the language.wikipedia.org URLs.

 I have thus resubmitted patch 64629 https://gerrit.wikimedia.org/r/64629
 for re-review. We will need to further discuss whether it is appropriate to
 have Google completely remove .zero.wikipedia.org links from their
 cache, or if perhaps we need to open a support thread with Google about
 canonical URLs.




 On Tue, May 28, 2013 at 1:13 PM, Kul Wadhwa kwad...@wikimedia.org wrote:

 Adam Baso (copied on this email) is working on it and a fix is ready.
 He'll do some testing to make sure it's resolved.

 On Tue, May 28, 2013 at 10:22 AM, Tomasz Finc tf...@wikimedia.org wrote:

 Looping Dan Foy in who's managing the Zero backlog.

 On Mon, May 27, 2013 at 8:01 AM, MZMcBride z...@mzmcbride.com wrote:
  K. Peachey wrote:
 Can you please file this in bugzilla https://bugzilla.wikimedia.org?
 
  https://bugzilla.wikimedia.org/show_bug.cgi?id=48856
 
 
  MZMcBride
 
 
 




 --
 Kul Wadhwa
 Head of Mobile
 Wikimedia Foundation






Re: [Wikimedia-l] Wikipedia Zero in Google search result

2013-06-18 Thread Adam Baso
Update:

We've added an enhancement to Wikipedia Zero so that if a user who isn't on
a participating carrier network navigates to a Wikipedia Zero page on
language.zero.wikipedia.org, such as
http://en.zero.wikipedia.org/wiki/Muse_%28band%29 , the user will be
presented an option to visit the canonical URL of the article. If clicked,
the canonical URL should get the user to the mobile or desktop version of
the page, based on device type.
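
To make the URL relationship concrete, here is a small sketch of how a
language.zero.wikipedia.org address could be mapped to its canonical
counterpart by dropping the "zero" label from the hostname. This is an
assumption about the mapping for illustration; it is not the extension's
code, and the function name canonical_url is hypothetical. Choosing the
mobile or desktop version by device type happens after that and is not
modeled here.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(zero_url: str) -> str:
    """Drop the 'zero' label from <language>.zero.wikipedia.org URLs.

    URLs on any other host are returned unchanged.
    """
    parts = urlsplit(zero_url)
    labels = parts.netloc.split(".")
    # Expect exactly: <language>, "zero", "wikipedia", "org".
    if len(labels) == 4 and labels[1:] == ["zero", "wikipedia", "org"]:
        labels.pop(1)
    return urlunsplit(parts._replace(netloc=".".join(labels)))


print(canonical_url("http://en.zero.wikipedia.org/wiki/Muse_%28band%29"))
```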

We're hoping that by next week the Google index will be refreshed so as to
correctly mark the language.zero.wikipedia.org pages as duplicate pages
in the omitted section. Upon confirmation of as much, the current plan is
to introduce https://gerrit.wikimedia.org/r/#/c/69420/ to prevent indexing
of language.zero.wikipedia.org altogether.


On Tue, May 28, 2013 at 6:26 PM, Adam Baso ab...@wikimedia.org wrote:

 All,

 My mistake. The pages in Google's index that I used for sampling - the
 ones that have Sorry, ... in their description in Google search results -
 are cached pages. I assumed incorrectly that those pages were based on
 recent indexing (e.g., in the past few days).

 I think we can actually stick to the original plan of Google re-indexing
 and the search results de-emphasizing the language.zero.wikipedia.org links
 within the next 30 days.

 I still find it strange that there are language.zero.wikipedia.org links
 that turned up higher in the search engine rankings than their
 better-established language.wikipedia.org counterparts. But I suppose
 with fewer competing page elements, especially on long-tail articles with
 fewer or no direct links to the desktop page, this is maybe not totally
 unexpected.

 -Adam




 On Tue, May 28, 2013 at 1:49 PM, Adam Baso ab...@wikimedia.org wrote:

 Hello All,

 We had shelved my patch, patch 64629 https://gerrit.wikimedia.org/r/64629,
 in hopes that an earlier patch, patch 61809
 https://gerrit.wikimedia.org/r/61809 (bug 35233
 https://bugzilla.wikimedia.org/show_bug.cgi?id=35233), would
 resolve the issue naturally as Google re-indexed. But it appears Google has
 re-indexed and yet the .zero.wikipedia.org URLs are still present in
 Google's index, instead of the language.wikipedia.org URLs.

 I have thus resubmitted patch 64629 https://gerrit.wikimedia.org/r/64629 for
 re-review. We will need to further discuss whether it is appropriate to
 have Google completely remove .zero.wikipedia.org links from their
 cache, or if perhaps we need to open a support thread with Google about
 canonical URLs.




 On Tue, May 28, 2013 at 1:13 PM, Kul Wadhwa kwad...@wikimedia.org wrote:

 Adam Baso (copied on this email) is working on it and a fix is ready.
 He'll do some testing to make sure it's resolved.

 On Tue, May 28, 2013 at 10:22 AM, Tomasz Finc tf...@wikimedia.org wrote:

 Looping Dan Foy in who's managing the Zero backlog.

 On Mon, May 27, 2013 at 8:01 AM, MZMcBride z...@mzmcbride.com wrote:
  K. Peachey wrote:
 Can you please file this in bugzilla https://bugzilla.wikimedia.org?
 
  https://bugzilla.wikimedia.org/show_bug.cgi?id=48856
 
 
  MZMcBride
 
 
 




 --
 Kul Wadhwa
 Head of Mobile
 Wikimedia Foundation






[Wikimedia-l] Structured Data

2013-05-31 Thread Adam Baso
http://googlewebmastercentral.blogspot.com/2013/05/getting-started-with-structured-data.html



Re: [Wikimedia-l] Wikipedia Zero in Google search result

2013-05-28 Thread Adam Baso
Hello All,

We had shelved my patch, patch 64629 https://gerrit.wikimedia.org/r/64629,
in hopes that an earlier patch, patch 61809
https://gerrit.wikimedia.org/r/61809 (bug 35233
https://bugzilla.wikimedia.org/show_bug.cgi?id=35233), would
resolve the issue naturally as Google re-indexed. But it appears Google has
re-indexed and yet the .zero.wikipedia.org URLs are still present in
Google's index, instead of the language.wikipedia.org URLs.

I have thus resubmitted patch 64629 https://gerrit.wikimedia.org/r/64629 for
re-review. We will need to further discuss whether it is appropriate to
have Google completely remove .zero.wikipedia.org links from their cache,
or if perhaps we need to open a support thread with Google about canonical
URLs.




On Tue, May 28, 2013 at 1:13 PM, Kul Wadhwa kwad...@wikimedia.org wrote:

 Adam Baso (copied on this email) is working on it and a fix is ready.
 He'll do some testing to make sure it's resolved.

 On Tue, May 28, 2013 at 10:22 AM, Tomasz Finc tf...@wikimedia.org wrote:

 Looping Dan Foy in who's managing the Zero backlog.

 On Mon, May 27, 2013 at 8:01 AM, MZMcBride z...@mzmcbride.com wrote:
  K. Peachey wrote:
 Can you please file this in bugzilla https://bugzilla.wikimedia.org?
 
  https://bugzilla.wikimedia.org/show_bug.cgi?id=48856
 
 
  MZMcBride
 
 
 




 --
 Kul Wadhwa
 Head of Mobile
 Wikimedia Foundation



Re: [Wikimedia-l] Wikipedia Zero in Google search result

2013-05-28 Thread Adam Baso
All,

My mistake. The pages in Google's index that I used for sampling - the ones
that have Sorry, ... in their description in Google search results - are
cached pages. I assumed incorrectly that those pages were based on recent
indexing (e.g., in the past few days).

I think we can actually stick to the original plan of Google re-indexing
and the search results de-emphasizing the
language.zero.wikipedia.org links within the next 30 days.

I still find it strange that there are language.zero.wikipedia.org links
that turned up higher in the search engine rankings than their
better-established language.wikipedia.org counterparts. But I suppose
with fewer competing page elements, especially on long-tail articles with
fewer or no direct links to the desktop page, this is maybe not totally
unexpected.

-Adam




On Tue, May 28, 2013 at 1:49 PM, Adam Baso ab...@wikimedia.org wrote:

 Hello All,

 We had shelved my patch, patch 64629 https://gerrit.wikimedia.org/r/64629,
 in hopes that an earlier patch, patch 61809
 https://gerrit.wikimedia.org/r/61809 (bug 35233
 https://bugzilla.wikimedia.org/show_bug.cgi?id=35233), would
 resolve the issue naturally as Google re-indexed. But it appears Google has
 re-indexed and yet the .zero.wikipedia.org URLs are still present in
 Google's index, instead of the language.wikipedia.org URLs.

 I have thus resubmitted patch 64629 https://gerrit.wikimedia.org/r/64629 for
 re-review. We will need to further discuss whether it is appropriate to
 have Google completely remove .zero.wikipedia.org links from their cache,
 or if perhaps we need to open a support thread with Google about canonical
 URLs.




 On Tue, May 28, 2013 at 1:13 PM, Kul Wadhwa kwad...@wikimedia.org wrote:

 Adam Baso (copied on this email) is working on it and a fix is ready.
 He'll do some testing to make sure it's resolved.

 On Tue, May 28, 2013 at 10:22 AM, Tomasz Finc tf...@wikimedia.org wrote:

 Looping Dan Foy in who's managing the Zero backlog.

 On Mon, May 27, 2013 at 8:01 AM, MZMcBride z...@mzmcbride.com wrote:
  K. Peachey wrote:
 Can you please file this in bugzilla https://bugzilla.wikimedia.org?
 
  https://bugzilla.wikimedia.org/show_bug.cgi?id=48856
 
 
  MZMcBride
 
 
 




 --
 Kul Wadhwa
 Head of Mobile
 Wikimedia Foundation


