[Wikimedia-l] [Input requested] Knowledge as a Service at the Wikimedia Developer Summit 2018

2018-01-18 Thread Adam Baso
Greetings Wikimedians,

(And thank you for your patience with my cross-posting.)

I'm writing to invite your input on the following Phabricator task ahead of
next week's Wikimedia Developer Summit 2018 [1] session.

Knowledge as a Service
https://phabricator.wikimedia.org/T183315

The purpose [2] of the Wikimedia Developer Summit 2018 sessions is to
provide guidance for Phase 2 of the Movement Strategic Direction [3] on
buildout of technology capabilities. We'd really love your thoughts to help
set context for our session next week, as Knowledge as a Service is a
primary consideration in the Movement Strategic Direction.

What is Knowledge as a Service? Its essence is about information
architecture approaches and the necessary software that will ultimately
allow content consumption and creation to radiate to new and different
types of interfaces and devices in addition to browser-based approaches. As
you review position papers from attendees [4] you'll notice that the way
they (myself included) think about best solving this is through a heavy
emphasis on technology that makes it easier to better structure information
and its metadata for re-use, remixing, and querying.

What might this mean? Does it mean we should build Wikimedia software in an
API- and metadata-first manner following industry standards compatible with
content structuration? Does it mean weaving our existing structured and
semi-structured data technologies together? How do we build technology that
can ensure successful collaboration between communities on increasingly
structured and interdependent information sources? And how can we ensure
the tech will bolster growth of multilingual and multimedia content
creation and consumption?

I've copied some of the essential material from the Movement Strategic
Direction concerning Knowledge as a Service so you have it here. We would
appreciate your input and hope you will subscribe to the Phabricator task
to contribute and follow along as we explore this topic.

https://phabricator.wikimedia.org/T183315

The following content is copied from
https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017/Direction :


Knowledge as a service: To serve our users, we will become a platform that
serves open knowledge to the world across interfaces and communities. We
will build tools for allies and partners to organize and exchange free
knowledge beyond Wikimedia. Our infrastructure will enable us and others to
collect and use different forms of free, trusted knowledge.

...

As technology spreads through every aspect of our lives, Wikimedia's
infrastructure needs to be able to communicate easily with other connected
systems.

...

As a platform, we need to transform our structures to support new formats,
new interfaces, and new types of knowledge. We have a strategic opportunity
to go further and offer this platform as a service to other institutions,
beyond Wikimedia. In a world that is becoming more and more connected,
building the infrastructure for knowledge gives others a vested interest in
our success. It is how we ensure our place in the larger network of
knowledge, and become an essential part of it. As a service to users, we
need to build the platform for knowledge or, in jargon, provide knowledge
as a service.

...

Knowledge as a service: A platform that serves open knowledge to the world
across interfaces and communities
Our openness will ensure that our decisions are fair, that we are
accountable to one another, and that we act in the public interest. Our
systems will follow the evolution of technology. We will transform our
platform to work across digital formats, devices, and interfaces. The
distributed structure of our network will help us adapt to local contexts.

...


We will build tools for allies and partners to organize and exchange free
knowledge beyond Wikimedia.
We will continue to build the infrastructure for free knowledge for our
communities. We will go further by offering it as a service to others in
the network of knowledge. We will continue to build the partnerships that
enable us to develop knowledge we can't create ourselves.

...

Our infrastructure will enable us and others to collect and use different
forms of free, trusted knowledge.
We will build the technical infrastructures that enable us to collect free
knowledge in all forms and languages. We will use our position as a leader
in the ecosystem of knowledge to advance our ideals of freedom and
fairness. We will build the technical structures and the social agreements
that enable us to trust the new knowledge we compile. We will focus on
highly structured information to facilitate its exchange and reuse in
multiple contexts.


Thank you.
-Adam


[1] https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit/2018
[2]
https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit/2018/Purpose_and_Results
[3]
https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017/Direction
[4] https://wikifarm.wmflabs.org/devsummit/

Re: [Wikimedia-l] Wikipedia.org portal page update!

2016-03-11 Thread Adam Baso
>
> The image thumbnail thing fails to filter out fair use images, along
> with the usual lack of author and licensing information.


Hi there - speaking to one thing I'm familiar with, with respect to image
selection, we believe https://phabricator.wikimedia.org/T124225 should
address fair use ("non-free") images, although page reparses will happen
gradually (pages are cached for up to 30 days or so).

-Adam
___
Wikimedia-l mailing list, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
New messages to: Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, 


Re: [Wikimedia-l] Fwd: [WikimediaMobile] "Among mobile sites, Wikipedia reigns in terms of popularity"

2016-05-12 Thread Adam Baso
Inline.

On Thu, May 12, 2016 at 1:28 AM, MZMcBride  wrote:

> Steven Walling wrote:
> >It's really great to see Wikipedia highlighted as a source for news and
> >current events. It's rare that people fully recognize the degree to which
> >the "encyclopedia" is actually very good at trending news information.
> >That said, the report paints a rosy picture that, strategically speaking,
> >may not be cause for celebration.
>
> Does the Knight Foundation disclose somewhere in this report that it's a
> donor to the Wikimedia Foundation?
>

I didn't see it in there. Looks like this was commissioned work.


>
> Comparing Wikipedia to sites like BuzzFeed and CNN seems to be a pretty
> classic case of comparing apples to oranges.
>

It seems like it's contextualized in
https://medium.com/mobile-first-news-how-people-use-smartphones-to/mobile-america-how-different-audiences-tap-mobile-news-1c72525210d7
as follows:

"The information and reference site Wikipedia is linked to news behavior
and is a critical pathway to the news and information ecosystem."

"Information and reference sites are linked to news behavior and often
drive traffic to news content. Wikipedia figures prominently in mobile
content access."



> >Neglecting to show people the value of the apps will help grow mobile web
> >traffic in the short term, but in the long run may leave us entirely
> >dependent on search (i.e. Google) or simply not growing readers, despite
> >millions of people still coming online via mobile.
>
> Can you elaborate on the value of the apps? HTTP is a free and open
> standard with very wide support. iOS is closed and proprietary. Maybe you
> can explain how investing resources into the latter aligns with
> Wikimedia's values?


The web and the apps are both ways to provide access to the openly licensed
content. We should and do invest in both on the grounds of reaching users
through popular channels.

I think Steven was talking about something different in terms of strategic
risk of disintermediation, though. I don't see the future as dystopian.
Rather I see a more utopian future requiring continued improvement in
dialogue with communities and nurturing of partnerships.


>
> Personally, I say hasten the day that we abolish the horrible "m." from
> our URLs and MobileFrontend from our servers.
>

I don't think it's something we're planning to do soon, but I agree it
would be nice to consolidate domain names.

I don't think we're at the place where we can yet deliver on converging the
full tech stack (and this was the feedback we basically received at the dev
summit), although I think we should keep iterating on this discussion over
the coming quarters. There are good things in MediaWiki Core and
desktop-oriented extensions and there are good things in MobileFrontend,
and I'm interested to see how we can port some things over if not
consolidate some pieces.


Re: [Wikimedia-l] Invitation to WMF November 2014 Metrics & Activities Meeting: Thursday, December 4, 19:00 UTC

2014-12-08 Thread Adam Baso
Hi there, just wanted to touch on the autoredirection stuff. The
autoredirection enhancement that was mentioned applies to accesses to the
m.wikipedia.org/ webroot (not articles) by Wikipedia Zero users. As
before, non-Wikipedia Zero users accessing the m.wikipedia.org/ webroot
continue to get redirected to en.m.wikipedia.org.

It seems thus far that the enhancement for Wikipedia Zero users isn't
causing harm, and our thinking is that if that holds, we should examine
applying some version of the approach to non-Wikipedia Zero access to
m.wikipedia.org/ as well.

As an extension of this thinking, it may be worthwhile to look into
alternative placement of "Read in another language", or even a language
shortlist shown above the fold (e.g., an API endpoint looks at
Accept-Language and the top 3 languages pertinent to the given user get
shimmed in), taking into account JavaScript support level.
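
To make the shortlist idea concrete, here is a minimal Python sketch of how
an endpoint might pick the top 3 languages from an Accept-Language header.
The function name top_languages and the choice to collapse regional tags
(e.g., "en-US") to their primary language are assumptions for illustration,
not anything that exists in MediaWiki or MobileFrontend.

```python
def top_languages(accept_language, limit=3):
    """Return up to `limit` primary language codes, highest q-value first.

    Hypothetical helper: parses an Accept-Language header value such as
    "fa-IR,fa;q=0.9,en;q=0.7" and collapses regional variants.
    """
    weighted = []
    for position, item in enumerate(accept_language.split(",")):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0].lower()
        if not tag or tag == "*":
            continue  # skip empty entries and the wildcard
        q = 1.0  # a missing q parameter means full preference
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat malformed weights as "not wanted"
        if q <= 0:
            continue
        # Sort key: higher q first, then original header order.
        weighted.append((-q, position, tag.split("-")[0]))

    shortlist = []
    for _, _, lang in sorted(weighted):
        if lang not in shortlist:
            shortlist.append(lang)
        if len(shortlist) == limit:
            break
    return shortlist
```

A caller would pass the raw header value, for example
top_languages("fa-IR,fa;q=0.9,en-US;q=0.8,ar;q=0.5"), and render links for
the returned codes; clients without JavaScript would need the shortlist
rendered server-side instead.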

-Adam




On Sun, Dec 7, 2014 at 4:48 PM, Asaf Bartov  wrote:

> Hi.
>
> On Fri, Dec 5, 2014 at 9:27 AM, C. Scott Ananian wrote:
>
> > On Thu, Dec 4, 2014 at 7:18 PM, Asaf Bartov wrote:
> > > On Thu, Dec 4, 2014 at 1:23 PM, C. Scott Ananian <canan...@wikimedia.org> wrote:
> > >> 1) Is the rise in global south page views specifically to *enwiki*, or
> > >> is it to local wikis?
> > > Not actually an either/or.  The answer seems to me to be "yes", i.e. all
> > > wikis -- that is, all projects, all languages.
> >
> > It may *seem to you* to be "yes", but the data indicates that the
> > answer differs, depending where you look.  For example, the data
> > clearly indicates that the stunning rise in Iran is almost entirely
> > due to enwiki.  enwiki gains over 80 million page views, fawiki gains
> > only 10 million.  See
> > https://en.wikipedia.org/wiki/User:Cscott/2014_December_metrics for a
> > convincing graph.
> >
> > I think it's important that we determine the actual answers to these
> > questions, instead of trusting our instincts.
> >
>
> I definitely agree.  I had misread your question to mean "is the rise
> computed across all wikis", which is indeed not what you were asking.  I
> apologize for the irrelevant answer.
>
>
> > > Some definitely do.  Another major factor, mentioned today, is that in some
> > > countries, mobile devices just don't come with good local languages
> > > support, and people are putting up with that and using what the device does
> > > give them, which are generally the major, colonial languages.
> >
> > Hm, the word "colonial" bothers me here.  I know you mean
> > "historically colonial", but in the modern world English is also a
> > trade language, not just a formerly-colonial language.  Much access to
> > enwiki is due to its trade-language status.
> >
>
> Certainly, there are very strong economic incentives to use English these
> days, and additionally other incentives, such as prestige real and
> imagined, still operating (and those, themselves, are still ripples of
> colonialism), but I did not mean 'colonial' here particularly strongly.  I
> could have written "European", I suppose, except there are many languages
> in Europe, and only a handful have been colonial languages.  But the term
> is not important here, I think.
>
>
> > I feel strongly that we have a moral obligation to offer good local
> > language support, but I also feel that we shouldn't label and dismiss
> > readers who want to learn/practice/find information in a trade
> > language. (This is one of the reasons I'm a fan of simplewiki, but
> > that's a whole 'nuther discussion.)
> >
>
> I don't see that I (or anyone) did dismiss that.  In terms of our strategic
> goals of Reach and Participation, we are agnostic about which languages
> people contribute in, or consume in.  In terms of our strategic goal of
> Diversity however, we do want to work towards adequate offerings in all
> languages in which people are actually seeking to consume knowledge.
>
>
> > On Fri, Dec 5, 2014 at 2:05 AM, Salvador A wrote:
> > > I was reading the presentation on metrics and the point about Mexico's
> > > decrease in views on Wikipedia caught my attention.
> >
> > I dug into the numbers a little more; see the graphs at
> > https://en.wikipedia.org/wiki/User:Cscott/2014_December_metrics
> >
> > It's a bit confusing.  At this moment I'm inclined to say that the
> > computation of "decliners" was in some way erroneous; neither the page
> > views for Mexico nor the overall pageviews for eswiki seem to support
> > the large annual declines reported.
> >
> > On https://en.wikipedia.org/wiki/User:Cscott/2014_December_metrics I
> > compute an annual decline for Mexico of 12.4% (compared to 23.2%
> > reported at the metrics meeting), which compares to an eswiki annual
> > decline of 4.8% (excluding bots and spiders).
> >
> > So Mexico is indeed concerning -- it's declining at three times the
> > eswiki rate.  But eswiki as a whole seems like it ought to also be a
> > concern.  And I'd like to understand why I can

[Wikimedia-l] Mobile Operator IP Drift Tracking and Remediation

2014-04-15 Thread Adam Baso
I emailed mobile-l and wikitech-l about this, now I'm moving this
discussion to wikimedia-l. Here's the longer technical thread:

http://lists.wikimedia.org/pipermail/mobile-l/2014-April/006884.html

In summary, to show Wikipedia Zero banners for the correct mobile networks,
we are planning to log two pieces of data once per cellular-based app
session in a specialized logfile, deleting log entries older than 90 days.

1. MCC-MNC code (format is ###-##), which denotes the mobile operator
2. Exit (gateway/proxy) IP address
* These data points would not be logged alongside the normal web access
logs.
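
As a rough illustration of the plan above, the following Python sketch
builds one record per session and purges records older than 90 days. The
names make_entry and purge_old, the dictionary layout, and the in-memory
list are all hypothetical; the actual storage and field names are not
specified in this thread. The pattern also accepts three-digit MNCs, which
is an assumption beyond the ###-## format given above.

```python
import re
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)  # retention window stated in the thread
# The thread gives the format as ###-##; three-digit MNCs also exist,
# so this pattern accepts both (an assumption).
MCC_MNC_PATTERN = re.compile(r"^\d{3}-\d{2,3}$")


def make_entry(mcc_mnc, exit_ip, logged_at):
    """Build one record: operator code, exit IP, and when it was seen."""
    if not MCC_MNC_PATTERN.match(mcc_mnc):
        raise ValueError("unexpected MCC-MNC format: %r" % (mcc_mnc,))
    return {"mcc_mnc": mcc_mnc, "exit_ip": exit_ip, "logged_at": logged_at}


def purge_old(entries, now):
    """Keep only entries logged within the retention window."""
    cutoff = now - RETENTION
    return [entry for entry in entries if entry["logged_at"] >= cutoff]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    log = [
        make_entry("123-45", "192.0.2.10", now - timedelta(days=1)),
        make_entry("123-45", "192.0.2.11", now - timedelta(days=120)),
    ]
    print(len(purge_old(log, now)))
```

Keeping these records in their own store, separate from the normal web
access logs, is what the note above is meant to guarantee.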

This information could be used to estimate rough demand for Wikipedia in
potential Wikipedia Zero geos, although remediating the out-of-sync IP
addresses on file for existing partners is primary.

Internal review suggests this is in alignment with privacy policy, and we
wanted to see if there were other thoughts on this approach here on
wikimedia-l.

-Adam


Re: [Wikimedia-l] Mobile Operator IP Drift Tracking and Remediation

2014-04-16 Thread Adam Baso
Inline.

Thanks for starting this thread.
>
> Sorry if I've overlooked this, but who/what will have access to this data?
> Only members of the mobile team? Local project CheckUsers? Wikimedia
> Foundation-approved researchers? Wikimedia shell users? AbuseFilter
> filters?
>

It's a good question. The thought is to put it in the customary wfDebugLog
location (with, for example, filename "mccmnc.log") on fluorine.

It just occurred to me that the wiki name (e.g., "enwiki"), but not the
full URL, gets logged additionally as part of the wfDebugLog call; to make
the implicit explicit, wfDebugLog adds a datetime stamp as well, and that's
useful for purging old records. I'll forward this email to mobile-l and
wikitech-l to underscore this.


> And this may be a silly question, but is there a reasonable means of
> approximating how identifying these two data points alone are? That is,
> using a mobile country code and exit IP address, is it possible to
> identify a particular editor or reader? Or perhaps rephrased, is this data
> considered anonymized?
>

Not a silly question. My approximation is that these tuples (datetime, now
that it hit me, plus XYwiki, exit IP, and MCC-MNC) alone, although not
perfectly anonymized, are low-identifying. That is, indirect inferences on
the data in isolation are unlikely but technically possible, through
examination of short-tail outliers in a cluster analysis where such
readers/editors exist in the outlier sets. This is in contrast to regular
web access logs, where direct inferences are easy.

Thanks. I'll forward this along now.

-Adam


Re: [Wikimedia-l] Mobile Operator IP Drift Tracking and Remediation

2014-04-16 Thread Adam Baso
Great idea!

Anyone on the list know if there's a way to make the debug log facilities
do the MMDD timestamp instead of the longer one?

If not, I suppose we could work to update the core MediaWiki code. [1]

-Adam

1. For those with PHP skills or equivalent, I'm referring to
https://git.wikimedia.org/blob/mediawiki%2Fcore.git/a26687e81532def3faba64612ce79b701a13949e/includes%2FGlobalFunctions.php#L1042.
Scroll to the bottom of the function definition to see the datetimestamp
approach.


On Wed, Apr 16, 2014 at 12:47 PM, Andrew Gray wrote:

> Hi Adam,
>
> One thought: you don't really need the date/time data at any detailed
> resolution, do you? If what you're wanting it for is to track major
> changes ("last month it all switched to this IP") and to purge old
> data ("delete anything older than 10 March"), you could simply log day
> rather than datetime.
>
> enwiki / 127.0.0.1 / 123.45 / 2014-04-16:1245.45
>
> enwiki / 127.0.0.1 / 123.45 / 2014-04-16
>
> - the latter gives you the data you need while making it a lot harder
> to do any kind of close user-identification.
>
> Andrew.
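
A small Python sketch of the day-resolution suggestion above: the record
keeps the same fields but the timestamp is truncated to the UTC date before
it is written. The function names and the " / " separator (borrowed from
the example lines above) are illustrative assumptions, not the format of
any real MediaWiki log.

```python
from datetime import datetime, timezone


def day_stamp(moment):
    """Reduce a timezone-aware datetime to a UTC date string."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%d")


def log_line(wiki, exit_ip, mcc_mnc, moment):
    """Format one record at day resolution, dropping the time of day."""
    return " / ".join([wiki, exit_ip, mcc_mnc, day_stamp(moment)])


if __name__ == "__main__":
    seen = datetime(2014, 4, 16, 12, 45, 45, tzinfo=timezone.utc)
    print(log_line("enwiki", "127.0.0.1", "123-45", seen))
```

A date string is still enough to purge old rows ("delete anything older
than 10 March") and to spot month-scale IP changes, while no longer
distinguishing requests made within the same day.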

Re: [Wikimedia-l] Mobile Operator IP Drift Tracking and Remediation

2014-05-01 Thread Adam Baso
After examining this, it looks like EventLogging is better suited to this
logging task than debug logging, which would have required altering debug
logging in the core MediaWiki software.

EventLogging logs at the resolution of a second (instead of a day), but has
built-in support for record removal after 90 days.

Please do let us know in case of further questions. Here's the logging
schema for those with an interest:

https://meta.wikimedia.org/wiki/Schema:MobileOperatorCode

Here's the relevant server code:

https://gerrit.wikimedia.org/r/#/c/130991/

-Adam





Re: [Wikimedia-l] Mobile Operator IP Drift Tracking and Remediation

2014-05-02 Thread Adam Baso
Federico asked if sampling might make sense here. I think it will work, so
I've updated the patchset.

From a patchset comment I provided:

"It's possible we may have situations where operators have not lots of
users on them accessing Wiki(m|p)edia properties, so we do run some risk of
actually missing IPs, even if exit IPs are concentrators of typically large
sets of users. That said, let's try a 2% sample ratio; and if we find out
it's insufficient, then we'll sample more, if it's oversampling, then we
can adjust the other way, too. New patchset arriving shortly."

(I've since submitted the updated code for review.)
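
For readers unfamiliar with sampling, a minimal Python sketch of a 2%
sample decision follows. The constant and function names are made up for
illustration; the real logic lives in the patchset referenced earlier in
this thread and may differ.

```python
import random

SAMPLE_RATIO = 0.02  # the 2% starting ratio proposed above


def should_log(rng=random, ratio=SAMPLE_RATIO):
    """Return True for roughly `ratio` of calls (uniform random sampling)."""
    return rng.random() < ratio


if __name__ == "__main__":
    rng = random.Random(2014)  # seeded only to make the demo repeatable
    hits = sum(should_log(rng) for _ in range(100000))
    print("logged %d of 100000 sessions" % hits)
```

The trade-off is the one described in the patchset comment: an operator
with few sessions may go unobserved at a low ratio, so the ratio is a knob
to adjust in either direction.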

-Adam



Re: [Wikimedia-l] Mobile Operator IP Drift Tracking and Remediation

2014-05-30 Thread Adam Baso
Okay, the code is in place in the alphas of both the Android and iOS apps,
and the server-side 2% sampling (extra header in HTTPS request sent once
per cellular app session) is working.

https://git.wikimedia.org/commitdiff/apps%2Fandroid%2Fwikipedia.git/8b4a0c3b170d6bf1a8f8141d93dfc60416ae4e2b

https://git.wikimedia.org/commitdiff/apps%2Fios%2Fwikipedia.git/59cde497921bc6d2c28e3967c24f0316dfedf3ce

https://git.wikimedia.org/commitdiff/mediawiki%2Fextensions%2FZeroRatedMobileAccess.git/df3da0b3fa564ae27d33cd1b82f81df12a5ed287

Changes to event logging in the iOS alpha app (internal only at the moment,
although the repo can be cloned and run in the Xcode simulator) are coming
pretty soon. Once those are in, we'll make one last tweak so that the app
does not add the extra MCC/MNC header on that single request per cellular
connection when logging is turned off in the iOS alpha app. That part is
done in the Android app already.
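
The client-side behavior described above (send the extra header on one
request per cellular session, and never when logging is turned off) could
be sketched in Python as below. This is not the Android or iOS code; the
class name and the header name "X-Example-MCC-MNC" are hypothetical
placeholders.

```python
class CellularSession:
    """Tracks whether the operator code was already sent this session."""

    # Placeholder header name, not necessarily what the apps send.
    HEADER_NAME = "X-Example-MCC-MNC"

    def __init__(self, mcc_mnc, logging_enabled=True):
        self.mcc_mnc = mcc_mnc
        self.logging_enabled = logging_enabled
        self._reported = False

    def headers_for_next_request(self):
        """Return extra headers: the operator code at most once per session."""
        headers = {}
        if self.logging_enabled and not self._reported:
            headers[self.HEADER_NAME] = self.mcc_mnc
            self._reported = True
        return headers


if __name__ == "__main__":
    session = CellularSession("123-45")
    print(session.headers_for_next_request())  # header present
    print(session.headers_for_next_request())  # empty on later requests
```

A new CellularSession would be created whenever the device starts a new
cellular connection, so the header is sent again for that session only.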

-Adam




On Fri, May 2, 2014 at 1:16 PM, Adam Baso  wrote:

> Federico asked if sampling might make sense here. I think it will work, so
> I've updated the patchset.
>
> From a patchset comment I provided:
>
> "It's possible we may have situations where operators have not lots of
> users on them accessing Wiki(m|p)edia properties, so we do run some risk of
> actually missing IPs, even if exit IPs are concentrators of typically large
> sets of users. That said, let's try a 2% sample ratio; and if we find out
> it's insufficient, then we'll sample more, if it's oversampling, then we
> can adjust the other way, too. New patchset arriving shortly."
>
> (I've since submitted the updated code for review.)
>
> -Adam
>
>
>
> On Thu, May 1, 2014 at 7:52 PM, Adam Baso  wrote:
>
>> After examining this, it looks like EventLogging is more suited to the
>> logging task than debug logging and the trappings of needing to alter debug
>> logging in the core MediaWiki software.
>>
>> EventLogging logs at the resolution of a second (instead of a day), but
>> has inbuilt support for record removal after 90 days.
>>
>> Please do let us know in case of further questions. Here's the logging
>> schema for those with an interest:
>>
>> https://meta.wikimedia.org/wiki/Schema:MobileOperatorCode
>>
>> Here's the relevant server code:
>>
>> https://gerrit.wikimedia.org/r/#/c/130991/
>>
>> -Adam
>>
>>
>>
>>
>> On Wed, Apr 16, 2014 at 2:20 PM, Adam Baso  wrote:
>>
>>> Great idea!
>>>
>>> Anyone on the list know if there's a way to make the debug log
>>> facilities do the MMDD timestamp instead of the longer one?
>>>
>>> If not, I suppose we could work to update the core MediaWiki code. [1]
>>>
>>> -Adam
>>>
>>> 1. For those with PHP skills or equivalent, I'm referring to
>>> https://git.wikimedia.org/blob/mediawiki%2Fcore.git/a26687e81532def3faba64612ce79b701a13949e/includes%2FGlobalFunctions.php#L1042.
>>> Scroll to the bottom of the function definition to see the datetimestamp
>>> approach.
>>>
>>>
>>> On Wed, Apr 16, 2014 at 12:47 PM, Andrew Gray wrote:
>>>
>>>> Hi Adam,
>>>>
>>>> One thought: you don't really need the date/time data at any detailed
>>>> resolution, do you? If what you're wanting it for is to track major
>>>> changes ("last month it all switched to this IP") and to purge old
>>>> data ("delete anything older than 10 March"), you could simply log day
>>>> rather than datetime.
>>>>
>>>> enwiki / 127.0.0.1 / 123.45 / 2014-04-16:1245.45
>>>>
>>>> enwiki / 127.0.0.1 / 123.45 / 2014-04-16
>>>>
>>>> - the latter gives you the data you need while making it a lot harder
>>>> to do any kind of close user-identification.
>>>>
>>>> Andrew.
>>>> On 16 Apr 2014 19:17, "Adam Baso"  wrote:
>>>>
>>>> > Inline.
>>>> >
>>>> > Thanks for starting this thread.
>>>> > >
>>>> > > Sorry if I've overlooked this, but who/what will have access to this
>>>> > data?
>>>> > > Only members of the mobile team? Local project CheckUsers? Wikimedia
>>>> > > Foundation-approved researchers? Wikimedia shell users? AbuseFilter
>>>> > > filters?
>>>> > >
>>>> >
>>>> > It's a good question. The thought is to put it in the customary
>>>

Re: [Wikimedia-l] Mobile Operator IP Drift Tracking and Remediation

2014-06-23 Thread Adam Baso
One wrinkle we've encountered, and sort of expected, is that the SIM card
MCC-MNC doesn't always match the actual network MCC-MNC. So on Android,
we'll add both to the payload so that we can differentiate them. On iOS it
looks like the API currently allows only one of these values through an
opaque method call. The previous EventLogging server-side code wasn't
logging the User-Agent (defined coarsely in our code on both platforms).
To make it evident when we're dealing with an iOS version of the app, I
think it would make most sense to re-enable the User-Agent so we can pick
up this coarse-grained value. I wanted to put this User-Agent item out
here for a brief period before adding the code, though.
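
To make the idea concrete, here is a minimal sketch in Python of a payload that keeps the two MCC-MNC values separate. It is an assumption-laden illustration, not the app or server code: the function name, the field names (`simMccMnc`, `networkMccMnc`, `userAgent`), the sample codes, and the User-Agent strings are all hypothetical and are not taken from the MobileOperatorCode schema.

```python
from typing import Optional


def build_operator_payload(
    sim_mcc_mnc: Optional[str],
    network_mcc_mnc: Optional[str],
    user_agent: str,
) -> dict:
    """Assemble an illustrative event payload (hypothetical field names)."""
    payload = {"userAgent": user_agent}
    # Keep the two codes in separate fields so a mismatch between the SIM
    # and the serving network stays visible in the logged event.
    if sim_mcc_mnc is not None:
        payload["simMccMnc"] = sim_mcc_mnc
    if network_mcc_mnc is not None:
        payload["networkMccMnc"] = network_mcc_mnc
    return payload


if __name__ == "__main__":
    # Android-style case: both values are available (made-up codes).
    print(build_operator_payload("310-260", "310-410", "ExampleApp-Android"))
    # iOS-style case: only one value is available. Which of the two the
    # iOS API returns is not stated in this thread, so the argument used
    # here is arbitrary.
    print(build_operator_payload(None, "310-410", "ExampleApp-iOS"))
```

With only one code present, the coarse User-Agent is what tells the two platforms apart.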

-Adam




On Fri, May 30, 2014 at 2:04 PM, Adam Baso  wrote:

> Okay, the code is in place in the alphas of both the Android and iOS apps,
> and the server-side 2% sampling (extra header in HTTPS request sent once
> per cellular app session) is working.
>
>
> https://git.wikimedia.org/commitdiff/apps%2Fandroid%2Fwikipedia.git/8b4a0c3b170d6bf1a8f8141d93dfc60416ae4e2b
>
>
> https://git.wikimedia.org/commitdiff/apps%2Fios%2Fwikipedia.git/59cde497921bc6d2c28e3967c24f0316dfedf3ce
>
>
> https://git.wikimedia.org/commitdiff/mediawiki%2Fextensions%2FZeroRatedMobileAccess.git/df3da0b3fa564ae27d33cd1b82f81df12a5ed287
>
> Changes to event logging in the iOS alpha app (internal only at the
> moment, although repo can be cloned and run in the Xcode simulator) are
> coming pretty soon, and once those are in, we'll make one last tweak there
> to have the app not add the extra MCC/MNC header on that single request per
> cellular connection when logging is turned off in the iOS alpha app. That
> part is done in the Android app already.
>
> -Adam
>
>
>
>
> On Fri, May 2, 2014 at 1:16 PM, Adam Baso  wrote:
>
>> Federico asked if sampling might make sense here. I think it will work,
>> so I've updated the patchset.
>>
>> From a patchset comment I provided:
>>
>> "It's possible we may have situations where operators have not lots of
>> users on them accessing Wiki(m|p)edia properties, so we do run some risk of
>> actually missing IPs, even if exit IPs are concentrators of typically large
>> sets of users. That said, let's try a 2% sample ratio; and if we find out
>> it's insufficient, then we'll sample more, if it's oversampling, then we
>> can adjust the other way, too. New patchset arriving shortly."
>>
>> (I've since submitted the updated code for review.)
>>
>> -Adam
>>
>>
>>
>> On Thu, May 1, 2014 at 7:52 PM, Adam Baso  wrote:
>>
>>> After examining this, it looks like EventLogging is more suited to the
>>> logging task than debug logging and the trappings of needing to alter debug
>>> logging in the core MediaWiki software.
>>>
>>> EventLogging logs at the resolution of a second (instead of a day), but
>>> has inbuilt support for record removal after 90 days.
>>>
>>> Please do let us know in case of further questions. Here's the logging
>>> schema for those with an interest:
>>>
>>> https://meta.wikimedia.org/wiki/Schema:MobileOperatorCode
>>>
>>> Here's the relevant server code:
>>>
>>> https://gerrit.wikimedia.org/r/#/c/130991/
>>>
>>> -Adam
>>>
>>>
>>>
>>>
>>> On Wed, Apr 16, 2014 at 2:20 PM, Adam Baso  wrote:
>>>
>>>> Great idea!
>>>>
>>>> Anyone on the list know if there's a way to make the debug log
>>>> facilities do the MMDD timestamp instead of the longer one?
>>>>
>>>> If not, I suppose we could work to update the core MediaWiki code. [1]
>>>>
>>>> -Adam
>>>>
>>>> 1. For those with PHP skills or equivalent, I'm referring to
>>>> https://git.wikimedia.org/blob/mediawiki%2Fcore.git/a26687e81532def3faba64612ce79b701a13949e/includes%2FGlobalFunctions.php#L1042.
>>>> Scroll to the bottom of the function definition to see the datetimestamp
>>>> approach.
>>>>
>>>>
>>>> On Wed, Apr 16, 2014 at 12:47 PM, Andrew Gray <
>>>> andrew.g...@dunelm.org.uk> wrote:
>>>>
>>>>> Hi Adam,
>>>>>
>>>>> One thought: you don't really need the date/time data at any detailed
>>>>> resolution, do you? If what you're wanting it for is to track major
>>>>> changes ("last month it all switched to this IP") and to purge old
>

Re: [Wikimedia-l] Mobile Operator IP Drift Tracking and Remediation

2014-06-24 Thread Adam Baso
Here's the patch update.

https://gerrit.wikimedia.org/r/141740


On Mon, Jun 23, 2014 at 3:30 PM, Adam Baso  wrote:

> One wrinkle we've encountered, and sort of expected, is that the SIM card
> MCC-MNC doesn't always match the actual network MCC-MNC. So on Android,
> we'll add both to the payload so that we can differentiate them. On iOS it
> looks like the API currently allows only one of these values through an
> opaque method call. The previous EventLogging server-side code wasn't
> logging the User-Agent (defined coarsely in our code on both platforms).
> To make it evident when we're dealing with an iOS version of the app, I
> think it would make most sense to re-enable the User-Agent so we can pick
> up this coarse-grained value. I wanted to put this User-Agent item out
> here for a brief period before adding the code, though.
>
> -Adam
>
>
>
>
> On Fri, May 30, 2014 at 2:04 PM, Adam Baso  wrote:
>
>> Okay, the code is in place in the alphas of both the Android and iOS
>> apps, and the server-side 2% sampling (extra header in HTTPS request sent
>> once per cellular app session) is working.
>>
>>
>> https://git.wikimedia.org/commitdiff/apps%2Fandroid%2Fwikipedia.git/8b4a0c3b170d6bf1a8f8141d93dfc60416ae4e2b
>>
>>
>> https://git.wikimedia.org/commitdiff/apps%2Fios%2Fwikipedia.git/59cde497921bc6d2c28e3967c24f0316dfedf3ce
>>
>>
>> https://git.wikimedia.org/commitdiff/mediawiki%2Fextensions%2FZeroRatedMobileAccess.git/df3da0b3fa564ae27d33cd1b82f81df12a5ed287
>>
>> Changes to event logging in the iOS alpha app (internal only at the
>> moment, although repo can be cloned and run in the Xcode simulator) are
>> coming pretty soon, and once those are in, we'll make one last tweak there
>> to have the app not add the extra MCC/MNC header on that single request per
>> cellular connection when logging is turned off in the iOS alpha app. That
>> part is done in the Android app already.
>>
>> -Adam
>>
>>
>>
>>
>> On Fri, May 2, 2014 at 1:16 PM, Adam Baso  wrote:
>>
>>> Federico asked if sampling might make sense here. I think it will work,
>>> so I've updated the patchset.
>>>
>>> From a patchset comment I provided:
>>>
>>> "It's possible we may have situations where operators have not lots of
>>> users on them accessing Wiki(m|p)edia properties, so we do run some risk of
>>> actually missing IPs, even if exit IPs are concentrators of typically large
>>> sets of users. That said, let's try a 2% sample ratio; and if we find out
>>> it's insufficient, then we'll sample more, if it's oversampling, then we
>>> can adjust the other way, too. New patchset arriving shortly."
>>>
>>> (I've since submitted the updated code for review.)
>>>
>>> -Adam
>>>
>>>
>>>
>>> On Thu, May 1, 2014 at 7:52 PM, Adam Baso  wrote:
>>>
>>>> After examining this, it looks like EventLogging is more suited to the
>>>> logging task than debug logging and the trappings of needing to alter debug
>>>> logging in the core MediaWiki software.
>>>>
>>>> EventLogging logs at the resolution of a second (instead of a day), but
>>>> has inbuilt support for record removal after 90 days.
>>>>
>>>> Please do let us know in case of further questions. Here's the logging
>>>> schema for those with an interest:
>>>>
>>>> https://meta.wikimedia.org/wiki/Schema:MobileOperatorCode
>>>>
>>>> Here's the relevant server code:
>>>>
>>>> https://gerrit.wikimedia.org/r/#/c/130991/
>>>>
>>>> -Adam
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Apr 16, 2014 at 2:20 PM, Adam Baso  wrote:
>>>>
>>>>> Great idea!
>>>>>
>>>>> Anyone on the list know if there's a way to make the debug log
>>>>> facilities do the MMDD timestamp instead of the longer one?
>>>>>
>>>>> If not, I suppose we could work to update the core MediaWiki code. [1]
>>>>>
>>>>> -Adam
>>>>>
>>>>> 1. For those with PHP skills or equivalent, I'm referring to
>>>>> https://git.wikimedia.org/blob/mediawiki%2Fcore.git/a26687e81532def3faba64612ce79b701a13949e/includes%2FGlobalFunctions.php#L1042.
>>>>> Scroll to the bottom of the function definition to see the datetimestamp

Re: [Wikimedia-l] Wikipedia Zero in Google search result

2013-05-28 Thread Adam Baso
Hello All,

We had shelved my patch, patch 64629 <https://gerrit.wikimedia.org/r/64629>,
in hopes that an earlier patch, patch 61809
<https://gerrit.wikimedia.org/r/61809> (bug 35233
<https://bugzilla.wikimedia.org/show_bug.cgi?id=35233>), would resolve the
issue naturally as Google re-indexed. But it appears Google has re-indexed
and yet the .zero.wikipedia.org URLs are still present in Google's index,
instead of the .wikipedia.org URLs.

I have thus resubmitted patch 64629 <https://gerrit.wikimedia.org/r/64629> for
re-review. We will need to further discuss whether it is appropriate to
have Google completely remove .zero.wikipedia.org links from their cache,
or if perhaps we need to open a support thread with Google about canonical
URLs.




On Tue, May 28, 2013 at 1:13 PM, Kul Wadhwa  wrote:

> Adam Baso (copied on this email) is working on it and a fix is ready.
> He'll do some testing to make sure it's resolved.
>
> On Tue, May 28, 2013 at 10:22 AM, Tomasz Finc  wrote:
>
>> Looping Dan Foy in who's managing the Zero backlog.
>>
>> On Mon, May 27, 2013 at 8:01 AM, MZMcBride  wrote:
>> > K. Peachey wrote:
>> >>Can you please file this in bugzilla <https://bugzilla.wikimedia.org>?
>> >
>> > https://bugzilla.wikimedia.org/show_bug.cgi?id=48856
>> >
>> >
>> > MZMcBride
>> >
>> >
>> >
>> > ___
>> > Wikimedia-l mailing list
>> > Wikimedia-l@lists.wikimedia.org
>> > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l
>>
>> ___
>> Wikimedia-l mailing list
>> Wikimedia-l@lists.wikimedia.org
>> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l
>>
>
>
>
> --
> Kul Wadhwa
> Head of Mobile
> Wikimedia Foundation
>
___
Wikimedia-l mailing list
Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l


Re: [Wikimedia-l] Wikipedia Zero in Google search result

2013-05-28 Thread Adam Baso
All,

My mistake. The pages in Google's index that I used for sampling - the ones
that have "Sorry, ..." in their description in Google search results - are
cached pages. I assumed incorrectly that those pages were based on recent
indexing (e.g., in the past few days).

I think we can actually stick to the original plan of Google re-indexing
and the search results de-emphasizing the .zero.wikipedia.org links within
the next 30 days.

I still find it strange that there are .zero.wikipedia.org links
that turned up higher in the search engine rankings than their
better-established .wikipedia.org counterparts. But I suppose
with fewer competing page elements, especially on long-tail articles with
fewer or no direct links to the desktop page, this is maybe not totally
unexpected.

-Adam




On Tue, May 28, 2013 at 1:49 PM, Adam Baso  wrote:

> Hello All,
>
> We had shelved my patch, patch 64629 <https://gerrit.wikimedia.org/r/64629>,
> in hopes that an earlier patch, patch 61809
> <https://gerrit.wikimedia.org/r/61809> (bug 35233
> <https://bugzilla.wikimedia.org/show_bug.cgi?id=35233>), would resolve the
> issue naturally as Google re-indexed. But it appears Google has re-indexed
> and yet the .zero.wikipedia.org URLs are still present in Google's index,
> instead of the .wikipedia.org URLs.
>
> I have thus resubmitted patch 64629 <https://gerrit.wikimedia.org/r/64629> for
> re-review. We will need to further discuss whether it is appropriate to
> have Google completely remove .zero.wikipedia.org links from their cache,
> or if perhaps we need to open a support thread with Google about canonical
> URLs.
>
>
>
>
> On Tue, May 28, 2013 at 1:13 PM, Kul Wadhwa  wrote:
>
>> Adam Baso (copied on this email) is working on it and a fix is ready.
>> He'll do some testing to make sure it's resolved.
>>
>> On Tue, May 28, 2013 at 10:22 AM, Tomasz Finc wrote:
>>
>>> Looping Dan Foy in who's managing the Zero backlog.
>>>
>>> On Mon, May 27, 2013 at 8:01 AM, MZMcBride  wrote:
>>> > K. Peachey wrote:
>>> >>Can you please file this in bugzilla <https://bugzilla.wikimedia.org>?
>>> >
>>> > https://bugzilla.wikimedia.org/show_bug.cgi?id=48856
>>> >
>>> >
>>> > MZMcBride
>>> >
>>> >
>>> >
>>> > ___
>>> > Wikimedia-l mailing list
>>> > Wikimedia-l@lists.wikimedia.org
>>> > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l
>>>
>>> ___
>>> Wikimedia-l mailing list
>>> Wikimedia-l@lists.wikimedia.org
>>> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l
>>>
>>
>>
>>
>> --
>> Kul Wadhwa
>> Head of Mobile
>> Wikimedia Foundation
>>
>
>


[Wikimedia-l] Structured Data

2013-05-31 Thread Adam Baso
http://googlewebmastercentral.blogspot.com/2013/05/getting-started-with-structured-data.html



Re: [Wikimedia-l] Wikipedia Zero in Google search result

2013-06-18 Thread Adam Baso
Update:

We've added an enhancement to Wikipedia Zero so that if a user who isn't on
a participating carrier network navigates to a Wikipedia Zero page on
.zero.wikipedia.org, such as
http://en.zero.wikipedia.org/wiki/Muse_%28band%29 , the user will be
presented an option to visit the canonical URL of the article. If clicked,
the canonical URL should get the user to the mobile or desktop version of
the page, based on device type.

We're hoping that by next week the Google index will be refreshed so as to
correctly mark the .zero.wikipedia.org pages as duplicate pages
in the omitted section. Upon confirmation of as much, the current plan is
to introduce https://gerrit.wikimedia.org/r/#/c/69420/ to prevent indexing
of .zero.wikipedia.org altogether.
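
As a rough illustration of the canonical-URL mapping described above, here is a minimal Python sketch. It assumes the canonical host is simply the Zero host with the `zero` label removed (for example, `en.zero.wikipedia.org` becomes `en.wikipedia.org`); the function name `canonical_url` is hypothetical, and the device-based redirect to the mobile or desktop page is left out.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Map a <lang>.zero.wikipedia.org URL to its assumed canonical form."""
    parts = urlsplit(url)
    labels = parts.netloc.split(".")
    # Only rewrite hosts shaped like <lang>.zero.wikipedia.org.
    if len(labels) == 4 and labels[1:] == ["zero", "wikipedia", "org"]:
        host = ".".join([labels[0], "wikipedia", "org"])
        return urlunsplit(parts._replace(netloc=host))
    # Anything else is returned unchanged.
    return url


if __name__ == "__main__":
    print(canonical_url("http://en.zero.wikipedia.org/wiki/Muse_%28band%29"))
```

The path and its percent-encoding are passed through untouched, so only the host changes.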


On Tue, May 28, 2013 at 6:26 PM, Adam Baso  wrote:

> All,
>
> My mistake. The pages in Google's index that I used for sampling - the
> ones that have "Sorry, ..." in their description in Google search results -
> are cached pages. I assumed incorrectly that those pages were based on
> recent indexing (e.g., in the past few days).
>
> I think we can actually stick to the original plan of Google re-indexing
> and the search results de-emphasizing the .zero.wikipedia.org links
> within the next 30 days.
>
> I still find it strange that there are .zero.wikipedia.org links
> that turned up higher in the search engine rankings than their
> better-established .wikipedia.org counterparts. But I suppose
> with fewer competing page elements, especially on long-tail articles with
> fewer or no direct links to the desktop page, this is maybe not totally
> unexpected.
>
> -Adam
>
>
>
>
> On Tue, May 28, 2013 at 1:49 PM, Adam Baso  wrote:
>
>> Hello All,
>>
>> We had shelved my patch, patch 64629 <https://gerrit.wikimedia.org/r/64629>,
>> in hopes that an earlier patch, patch 61809
>> <https://gerrit.wikimedia.org/r/61809> (bug 35233
>> <https://bugzilla.wikimedia.org/show_bug.cgi?id=35233>), would resolve the
>> issue naturally as Google re-indexed. But it appears Google has re-indexed
>> and yet the .zero.wikipedia.org URLs are still present in Google's index,
>> instead of the .wikipedia.org URLs.
>>
>> I have thus resubmitted patch 64629 <https://gerrit.wikimedia.org/r/64629> for
>> re-review. We will need to further discuss whether it is appropriate to
>> have Google completely remove .zero.wikipedia.org links from their
>> cache, or if perhaps we need to open a support thread with Google about
>> canonical URLs.
>>
>>
>>
>>
>> On Tue, May 28, 2013 at 1:13 PM, Kul Wadhwa wrote:
>>
>>> Adam Baso (copied on this email) is working on it and a fix is ready.
>>> He'll do some testing to make sure it's resolved.
>>>
>>> On Tue, May 28, 2013 at 10:22 AM, Tomasz Finc wrote:
>>>
>>>> Looping Dan Foy in who's managing the Zero backlog.
>>>>
>>>> On Mon, May 27, 2013 at 8:01 AM, MZMcBride  wrote:
>>>> > K. Peachey wrote:
>>>> >>Can you please file this in bugzilla <https://bugzilla.wikimedia.org>?
>>>> >
>>>> > https://bugzilla.wikimedia.org/show_bug.cgi?id=48856
>>>> >
>>>> >
>>>> > MZMcBride
>>>> >
>>>> >
>>>> >
>>>> > ___
>>>> > Wikimedia-l mailing list
>>>> > Wikimedia-l@lists.wikimedia.org
>>>> > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l
>>>>
>>>> ___
>>>> Wikimedia-l mailing list
>>>> Wikimedia-l@lists.wikimedia.org
>>>> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l
>>>>
>>>
>>>
>>>
>>> --
>>> Kul Wadhwa
>>> Head of Mobile
>>> Wikimedia Foundation
>>>
>>
>>
>


Re: [Wikimedia-l] Wikipedia Zero in Google search result

2013-06-26 Thread Adam Baso
(cross-posted on mobile-l)

Okay, looks like the index of zero.wikipedia.org pages in Google has shrunk
by some 20 million entries. Nonetheless, a number of really old pages
(e.g., going back to 6-May-2013) are still in the Google index with article
text. I'll set a reminder to check on the Google index again in 30 days,
and hopefully then we can finally put the no-index rules in place.

The good news is that many of the pages are now correctly suppressed in
natural search as non-canonical pages. In other words, a user would need to
go through omitted results or do a site: search to see them.

-Adam


On Tue, Jun 18, 2013 at 3:35 PM, Adam Baso  wrote:

> Update:
>
> We've added an enhancement to Wikipedia Zero so that if a user who isn't
> on a participating carrier network navigates to a Wikipedia Zero page on
> .zero.wikipedia.org, such as
> http://en.zero.wikipedia.org/wiki/Muse_%28band%29 , the user will be
> presented an option to visit the canonical URL of the article. If clicked,
> the canonical URL should get the user to the mobile or desktop version of
> the page, based on device type.
>
> We're hoping that by next week the Google index will be refreshed so as to
> correctly mark the .zero.wikipedia.org pages as duplicate pages
> in the omitted section. Upon confirmation of as much, the current plan is
> to introduce https://gerrit.wikimedia.org/r/#/c/69420/ to prevent
> indexing of .zero.wikipedia.org altogether.
>
>
> On Tue, May 28, 2013 at 6:26 PM, Adam Baso  wrote:
>
>> All,
>>
>> My mistake. The pages in Google's index that I used for sampling - the
>> ones that have "Sorry, ..." in their description in Google search results -
>> are cached pages. I assumed incorrectly that those pages were based on
>> recent indexing (e.g., in the past few days).
>>
>> I think we can actually stick to the original plan of Google re-indexing
>> and the search results de-emphasizing the .zero.wikipedia.org links
>> within the next 30 days.
>>
>> I still find it strange that there are .zero.wikipedia.org links
>> that turned up higher in the search engine rankings than their
>> better-established .wikipedia.org counterparts. But I suppose
>> with fewer competing page elements, especially on long-tail articles with
>> fewer or no direct links to the desktop page, this is maybe not totally
>> unexpected.
>>
>> -Adam
>>
>>
>>
>>
>> On Tue, May 28, 2013 at 1:49 PM, Adam Baso  wrote:
>>
>>> Hello All,
>>>
>>> We had shelved my patch, patch 64629 <https://gerrit.wikimedia.org/r/64629>,
>>> in hopes that an earlier patch, patch 61809
>>> <https://gerrit.wikimedia.org/r/61809> (bug 35233
>>> <https://bugzilla.wikimedia.org/show_bug.cgi?id=35233>), would resolve the
>>> issue naturally as Google re-indexed. But it appears Google has re-indexed
>>> and yet the .zero.wikipedia.org URLs are still present in Google's index,
>>> instead of the .wikipedia.org URLs.
>>>
>>> I have thus resubmitted patch 64629 <https://gerrit.wikimedia.org/r/64629> for
>>> re-review. We will need to further discuss whether it is appropriate to
>>> have Google completely remove .zero.wikipedia.org links from their
>>> cache, or if perhaps we need to open a support thread with Google about
>>> canonical URLs.
>>>
>>>
>>>
>>>
>>> On Tue, May 28, 2013 at 1:13 PM, Kul Wadhwa wrote:
>>>
>>>> Adam Baso (copied on this email) is working on it and a fix is ready.
>>>> He'll do some testing to make sure it's resolved.
>>>>
>>>> On Tue, May 28, 2013 at 10:22 AM, Tomasz Finc wrote:
>>>>
>>>>> Looping Dan Foy in who's managing the Zero backlog.
>>>>>
>>>>> On Mon, May 27, 2013 at 8:01 AM, MZMcBride  wrote:
>>>>> > K. Peachey wrote:
>>>>> >>Can you please file this in bugzilla <https://bugzilla.wikimedia.org>?
>>>>> >
>>>>> > https://bugzilla.wikimedia.org/show_bug.cgi?id=48856
>>>>> >
>>>>> >
>>>>> > MZMcBride
>>>>> >
>>>>> >
>>>>> >
>>>>> > ___
>>>>> > Wikimedia-l mailing list
>>>>> > Wikimedia-l@lists.wikimedia.org
>>>>> > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l
>>>>>
>>>>> ___
>>>>> Wikimedia-l mailing list
>>>>> Wikimedia-l@lists.wikimedia.org
>>>>> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Kul Wadhwa
>>>> Head of Mobile
>>>> Wikimedia Foundation
>>>>
>>>
>>>
>>
>

Re: [Wikimedia-l] Wikipedia Zero in Google search result

2013-08-28 Thread Adam Baso
(cross-posted on mobile-l)

Update:

I have been checking on the indexed link count over the last couple of
months, and it has been roughly constant. Upon another check in the past
week, it looked like it was time to go ahead with the robots.txt update.

Just yesterday, the start of the robots.txt file for .zero.wikipedia.org
was also updated to instruct all robots, such as Googlebot, not to index
.zero.wikipedia.org. It looks like even more .zero.wikipedia.org pages may
already be starting to fall out of the index.
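
For anyone who wants to sanity-check a rule like this, here is a minimal Python sketch using the standard library's robots.txt parser. The rules below are an assumption (a blanket `Disallow: /` for all user agents); the actual robots.txt text isn't quoted in this thread, and the helper name `is_crawl_allowed` is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules: the thread does not quote the actual robots.txt entry.
ASSUMED_ZERO_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return whether `user_agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://en.zero.wikipedia.org/wiki/Muse_%28band%29"
    print(is_crawl_allowed(ASSUMED_ZERO_ROBOTS_TXT, "Googlebot", page))
```

Strictly speaking, a robots.txt rule tells crawlers not to fetch pages, and already-indexed pages drop out over time, which fits the gradual decline in the indexed link count.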

Thanks for flagging this! Will keep watching the indexed links count as it
dwindles.

Thanks again.
-Adam


On Wed, Jun 26, 2013 at 10:59 AM, Adam Baso  wrote:

> (cross-posted on mobile-l)
>
> Okay, looks like the index of zero.wikipedia.org pages in Google has
> shrunk by some 20 million entries. Nonetheless, a number of really old
> pages (e.g., going back to 6-May-2013) are still in the Google index with
> article text. I'll set a reminder to check on the Google index again in 30
> days, and hopefully then we can finally put the no-index rules in place at
> that time.
>
> The good news is that many of the pages are now correctly suppressed in
> natural search as non-canonical pages. In other words, a user would need to
> go through omitted results or do a site: search to see them.
>
> -Adam
>
>
> On Tue, Jun 18, 2013 at 3:35 PM, Adam Baso  wrote:
>
>> Update:
>>
>> We've added an enhancement to Wikipedia Zero so that if a user who isn't
>> on a participating carrier network navigates to a Wikipedia Zero page on
>> .zero.wikipedia.org, such as
>> http://en.zero.wikipedia.org/wiki/Muse_%28band%29 , the user will be
>> presented an option to visit the canonical URL of the article. If clicked,
>> the canonical URL should get the user to the mobile or desktop version of
>> the page, based on device type.
>>
>> We're hoping that by next week the Google index will be refreshed so as
>> to correctly mark the .zero.wikipedia.org pages as duplicate
>> pages in the omitted section. Upon confirmation of as much, the current
>> plan is to introduce https://gerrit.wikimedia.org/r/#/c/69420/ to
>> prevent indexing of .zero.wikipedia.org altogether.
>>
>>
>> On Tue, May 28, 2013 at 6:26 PM, Adam Baso  wrote:
>>
>>> All,
>>>
>>> My mistake. The pages in Google's index that I used for sampling - the
>>> ones that have "Sorry, ..." in their description in Google search results -
>>> are cached pages. I assumed incorrectly that those pages were based on
>>> recent indexing (e.g., in the past few days).
>>>
>>> I think we can actually stick to the original plan of Google re-indexing
>>> and the search results de-emphasizing the .zero.wikipedia.org links
>>> within the next 30 days.
>>>
>>> I still find it strange that there are .zero.wikipedia.org links
>>> that turned up higher in the search engine rankings than their
>>> better-established .wikipedia.org counterparts. But I suppose
>>> with fewer competing page elements, especially on long-tail articles with
>>> fewer or no direct links to the desktop page, this is maybe not totally
>>> unexpected.
>>>
>>> -Adam
>>>
>>>
>>>
>>>
>>> On Tue, May 28, 2013 at 1:49 PM, Adam Baso  wrote:
>>>
>>>> Hello All,
>>>>
>>>> We had shelved my patch, patch 64629 <https://gerrit.wikimedia.org/r/64629>,
>>>> in hopes that an earlier patch, patch 61809
>>>> <https://gerrit.wikimedia.org/r/61809> (bug 35233
>>>> <https://bugzilla.wikimedia.org/show_bug.cgi?id=35233>), would resolve the
>>>> issue naturally as Google re-indexed. But it appears Google has re-indexed
>>>> and yet the .zero.wikipedia.org URLs are still present in Google's index,
>>>> instead of the .wikipedia.org URLs.
>>>>
>>>> I have thus resubmitted patch 64629 <https://gerrit.wikimedia.org/r/64629> for
>>>> re-review. We will need to further discuss whether it is appropriate to
>>>> have Google completely remove .zero.wikipedia.org links from their
>>>> cache, or if perhaps we need to open a support thread with Google about
>>>> canonical URLs.
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, May 28, 2013 at 1:13 PM, Kul Wadhwa wrote:
>>>>
>>>>> Adam Baso (copied on this email) is working on it and a fix is ready.
>>>>> He'll do some testing to make sure it's resol