Re: [Wikitech-l] Followup (Re: Tidy will be replaced by RemexHTML on Wikimedia wikis latest by June 2018)

2017-08-10 Thread יגאל חיטרון
Sorry for the misunderstanding; I was asking about the whitespace issue.
Igal


2017-08-10 22:06 GMT+03:00 Subramanya Sastry:

>
> On 08/10/2017 02:49 PM, יגאל חיטרון wrote:
>
>> Hello and thank you for this. Is there a phab ticket to follow the
>> deployment process?
>> Igal (User:IKhitron)
>>
> We have the original Tidy replacement ticket (
> https://phabricator.wikimedia.org/T89331) but, as we get closer to
> starting phased deployments, we'll create phab tickets to track
> deployments separately.
>
>
> Subbu.
>
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>

Re: [Wikitech-l] Followup (Re: Tidy will be replaced by RemexHTML on Wikimedia wikis latest by June 2018)

2017-08-10 Thread Subramanya Sastry


On 08/10/2017 02:49 PM, יגאל חיטרון wrote:

Hello and thank you for this. Is there a phab ticket to follow the
deployment process?
Igal (User:IKhitron)
We have the original Tidy replacement ticket
(https://phabricator.wikimedia.org/T89331) but, as we get closer to
starting phased deployments, we'll create phab tickets to track
deployments separately.


Subbu.



Re: [Wikitech-l] Followup (Re: Tidy will be replaced by RemexHTML on Wikimedia wikis latest by June 2018)

2017-08-10 Thread יגאל חיטרון
Hello and thank you for this. Is there a phab ticket to follow the
deployment process?
Igal (User:IKhitron)


2017-08-10 21:42 GMT+03:00 Subramanya Sastry:

>
>
> On 07/06/2017 08:02 AM, Subramanya Sastry wrote:
>
>>
>> TL;DR
>> -----
>> The Parsing team wants to replace Tidy with a RemexHTML-based solution on
>> the Wikimedia cluster by June 2018. This will require editors to fix pages
>> and templates to address wikitext patterns that behave differently with
>> RemexHTML.  Please see the 'What editors will need to do' section of the
>> Tidy replacement FAQ [1].
>>
>> ..
>>
>> 9. Monitoring progress
>> ----------------------
>> In order to monitor progress, we plan to do a weekly (or similarly
>> periodic) test run that compares the rendering of pages with Tidy and
>> with RemexHTML on a large sample of pages (in the 50K range) from a large
>> subset of Wikimedia wikis (~50 or so).  This will give us a pulse of how
>> fixups are going, and when we might be able to flip the switch on
>> different wikis.
>>
>
> I wanted to post some followups on this.
>
> 1. We have a revived dashboard that tracks linter error counts on wikis
>    for all linter categories.
>
>    See https://tools.wmflabs.org/wikitext-deprecation/
>
> 2. We track the error counts as they change and publish weekly snapshots
>    comparing counts to a July 24th baseline (which is when I first
>    started collecting stats).
>
>    See https://www.mediawiki.org/wiki/Parsing/Replacing_Tidy/Linter/Stats
>
> 3. We also have a pixel-diffs test run (previously called visual diffs)
>    that compares page rendering with Tidy and with RemexHTML. The test
>    set has 73K pages sampled from 60 wikis. These diffs more accurately
>    reflect what kind of rendering differences we can expect to see if
>    pages are not fixed.
>
>    See http://mw-expt-tests.wmflabs.org/
>
> 4. Based on the runs above, I identified one more high-priority linter
>    category: a Tidy whitespace bug that needs to be fixed (expect
>    mostly templates, especially navboxes, based on what I've seen in
>    the test run above). Once the code is reviewed and deployed to the
>    cluster, we'll start populating this category.
>
>    See https://gerrit.wikimedia.org/r/#/c/371068/ and
>    https://gerrit.wikimedia.org/r/#/c/371071/
>
> Thanks,
> Subbu.
>

[Wikitech-l] Followup (Re: Tidy will be replaced by RemexHTML on Wikimedia wikis latest by June 2018)

2017-08-10 Thread Subramanya Sastry



On 07/06/2017 08:02 AM, Subramanya Sastry wrote:


TL;DR
-----
The Parsing team wants to replace Tidy with a RemexHTML-based solution on the
Wikimedia cluster by June 2018. This will require editors to fix pages and
templates to address wikitext patterns that behave differently with
RemexHTML.  Please see the 'What editors will need to do' section of the Tidy
replacement FAQ [1].


..


9. Monitoring progress
----------------------
In order to monitor progress, we plan to do a weekly (or similarly periodic)
test run that compares the rendering of pages with Tidy and with
RemexHTML on a large sample of pages (in the 50K range) from a large subset
of Wikimedia wikis (~50 or so).  This will give us a pulse of how fixups are
going, and when we might be able to flip the switch on different wikis.


I wanted to post some followups on this.

1. We have a revived dashboard that tracks linter error counts on wikis
   for all linter categories.

   See https://tools.wmflabs.org/wikitext-deprecation/

2. We track the error counts as they change and publish weekly snapshots
   comparing counts to a July 24th baseline (which is when I first
   started collecting stats).

   See https://www.mediawiki.org/wiki/Parsing/Replacing_Tidy/Linter/Stats

3. We also have a pixel-diffs test run (previously called visual diffs)
   that compares page rendering with Tidy and with RemexHTML. The test
   set has 73K pages sampled from 60 wikis. These diffs more accurately
   reflect what kind of rendering differences we can expect to see if
   pages are not fixed.

   See http://mw-expt-tests.wmflabs.org/

4. Based on the runs above, I identified one more high-priority linter
   category: a Tidy whitespace bug that needs to be fixed (expect
   mostly templates, especially navboxes, based on what I've seen in
   the test run above). Once the code is reviewed and deployed to the
   cluster, we'll start populating this category.

   See https://gerrit.wikimedia.org/r/#/c/371068/ and
   https://gerrit.wikimedia.org/r/#/c/371071/
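
To illustrate the kind of baseline comparison described in item 2, here is a
minimal sketch in Python. It is not the code behind the stats page: the
category names and counts are invented for the example, and `percent_change`
is a hypothetical helper.

```python
def percent_change(baseline, current):
    """Return the per-category percentage change relative to the baseline.

    Both arguments map a linter category name to an error count.
    Categories missing from `current` are treated as having zero errors.
    """
    changes = {}
    for category, base_count in baseline.items():
        if base_count == 0:
            # No meaningful percentage exists for an empty baseline.
            changes[category] = None
            continue
        now = current.get(category, 0)
        changes[category] = round(100.0 * (now - base_count) / base_count, 1)
    return changes


# Invented counts, for illustration only.
baseline = {"category-a": 2000, "category-b": 500}
current = {"category-a": 1500, "category-b": 550}
print(percent_change(baseline, current))
# → {'category-a': -25.0, 'category-b': 10.0}
```

A negative value means the category has shrunk since the baseline, which is
the direction we want to see as pages get fixed.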


Thanks,
Subbu.


Re: [Wikitech-l] More Detailed Browser Stats for Desktop Sites

2017-08-10 Thread Marcel Ruiz Forns
Hi Joaquin :]

In all WMF's "browser reports", the slice classified as "Other" can mean one
of two things:

*1) ua-parser classifies those requests as "Other".*
In this case, the requests come with a UA that is not recognized by
ua-parser. Probably they are, as you suggest, crawlers, bots or other kinds
of uncommon traffic. This should represent the smaller part of the whole
"Other" slice.

*2) The anonymization algorithm sanitizes those requests by setting them
to "Other".*
Browser stats data is privacy-sensitive, and we cannot store it raw. There
is an algorithm that sanitizes all request groups that are too uncommon,
like "Opera 43 on Windows Phone", because they are so specific that they
could be used to re-identify a user. Each one of those groups is really
small, but all sanitized groups together add up to ~10% of the traffic.
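
As a rough illustration of that idea (not the actual Analytics algorithm),
the sketch below folds every (browser, OS) group whose share of traffic falls
under a threshold into a single "Other" bucket. The function name `sanitize`,
the `min_share` parameter, and the sample data are all assumptions made for
this example.

```python
from collections import Counter


def sanitize(requests, min_share=0.01):
    """Fold groups rarer than `min_share` of all requests into "Other".

    `requests` is an iterable of hashable group keys, e.g. (browser, os)
    tuples. Returns a dict mapping each kept group (or "Other") to its count.
    """
    counts = Counter(requests)
    total = sum(counts.values())
    sanitized = Counter()
    for group, n in counts.items():
        # Groups that are too uncommon lose their identity.
        key = group if n / total >= min_share else "Other"
        sanitized[key] += n
    return dict(sanitized)


# Invented sample: one common group and two rare ones.
sample = (
    [("Chrome 59", "Windows")] * 97
    + [("Opera 43", "Windows Phone")] * 2
    + [("Some Browser", "Some OS")] * 1
)
print(sanitize(sample, min_share=0.05))
# → {('Chrome 59', 'Windows'): 97, 'Other': 3}
```

Note how each rare group is tiny on its own, yet the combined "Other" bucket
can grow noticeably once many of them are merged.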

We in Analytics want to dedicate some time to this, hopefully next quarter,
to reduce the percentage of that slice without losing privacy:
https://phabricator.wikimedia.org/T131127

*Also*, I think the term "Other" is very confusing here, because it
suggests that the big 8.8% slice is neither Safari nor Chrome nor Android
nor Opera, etc. In fact, this 8.8% is made up of requests from all those
browsers. In my opinion, those requests should be labelled "Unknown" or
"Sanitized" instead.

Cheers!


On Mon, Jul 24, 2017 at 2:23 PM, Joaquin Oltra Hernandez <
jhernan...@wikimedia.org> wrote:

> Thanks for sharing. This is very useful information in so many ways.
>
> For contrast and awareness, here is some info about the mobile site,
> which looks pretty different from desktop (last month's data):
>
>    - Safari iOS is ~40.1%
>       - Mobile Safari (38%)
>       - iOS Chrome (Safari-based) (2.1%)
>    - Chrome is 43.7%
>       - Chrome Mobile is 42%
>       - Chrome is 1.7%
>    - Android Browser is 2.5%, with v4 being 1.9% of it
>    - Opera mini is 1.3%
>    - UC browser is 1.1%
>
> Nuria, do you know what the 8.8% classified as "other" is? Crawler bots?
>
> Some highlights from the last year:
>
>    - Chrome + Safari are ~84%
>    - Chrome mobile surpassed Safari mobile and has kept growing, more
>      slowly in the last months.
>    - Safari mobile usage seems pretty stable and most users are on v10
>    - The Android browser has been steadily decreasing usage, and most of it
>      is now on the v4 version, which means the old Android 2 browsers are
>      less of a worry
>
>
> On Mon, Jul 17, 2017 at 8:33 PM Nuria Ruiz wrote:
>
> > Hello:
> >
> >
> > Please take a look at the new browser report with more detailed desktop
> > site data (all Wikimedia projects aggregated):
> >
> >
> > https://analytics.wikimedia.org/dashboards/browsers/#desktop-site-by-browser
> >
> > Some highlights:
> >
> > * Data is very stable over the last year
> >
> > * Chrome in the lead with 45% of traffic, closely followed by IE (18%)
> > and FF (13%)
> >
> > * The bulk of IE traffic is IE11 and IE7
> >
> > * Edge shows up with 4%, slowly catching up to Safari (5%)
> >
> > * This data is still subject to fluctuations due to bot traffic not
> > identified as such.  We will be working on this next year.
> >
> >
> > Thanks,
> >
> > Nuria



-- 
*Marcel Ruiz Forns*
Analytics Developer
Wikimedia Foundation