On Tue, Jul 23, 2013 at 7:55 PM, John Vandenberg <jay...@gmail.com> wrote:

> On Wed, Jul 24, 2013 at 9:02 AM, Subramanya Sastry
> <ssas...@wikimedia.org> wrote:
> > http://parsoid.wmflabs.org:8001/stats
> >
> > This is the url for our round trip testing on 160K pages (20K each from 8
> > wikipedias).
>
> Very minor point .. there are ~400 missing pages on the list; is that
> intentional ? ;-)
>
> One is 'Mos:time' which is in NS 0, and does actually exist as a
> redirect to the WP: manual of style:
> https://en.wikipedia.org/wiki/Mos:time


I think it's an artifact of the changing article set on the wikis.  We
created the original page set months ago, and we haven't changed it since
so that our results are still comparable over time.  Since then 1) some
pages have been deleted/moved, and 2) we fixed parsoid not to automatically
follow redirects (bug 45808).


> > But, 99.6% means that 0.4% of pages still had corruptions, and that 15%
> > of pages had syntactic dirty diffs.
>
> So 15% is 24000 pages which can bust, but may not if the edit doesn't
> touch the bustable part.
>

Subbu covered this in his email.  "Yes", but only if you consider an extra
unrendered newline (etc.) a "bust".  Syntactic diffs are wikitext
differences which do *not* lead to visible differences.  *Semantic* diffs
are the ones which lead to visible differences.  So only 0.4% of the pages
will "bust", and then only if the bustable part is touched.

> Does /topfails cycle through all 24000, 40 pages at a time?
>

yes.
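
To put rough numbers on the figures above, here is a back-of-the-envelope
sketch.  The helper name and the basis-point convention are purely
illustrative assumptions; none of this is taken from the round-trip test
code itself.

```python
# Back-of-the-envelope counts for the percentages quoted in this thread.
# pages_at() and the basis-point convention are illustrative assumptions.

def pages_at(total_pages: int, basis_points: int) -> int:
    """Pages affected at a given rate (1 basis point = 0.01%)."""
    return total_pages * basis_points // 10_000

TOTAL_PAGES = 160_000      # size of the round-trip test set
TOPFAILS_PAGE_SIZE = 40    # entries shown per /topfails page

syntactic = pages_at(TOTAL_PAGES, 1500)   # 15% with syntactic dirty diffs
semantic = pages_at(TOTAL_PAGES, 40)      # 0.4% with semantic diffs

# Ceiling division: listing pages needed to show every syntactic-diff entry.
topfails_pages = -(-syntactic // TOPFAILS_PAGE_SIZE)

print(syntactic, semantic, topfails_pages)  # 24000 640 600
```

Integer arithmetic is used so the counts come out exact rather than
depending on floating-point rounding.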

> Could you provide a dump of the list of 24000 bustable pages?  Split
> by project?  Each community could then investigate those pages for
> broken tables, and more critically .. templates which emit broken
> wikisyntax that is causing your team grief.
>

We could do that.  Usually there will be a very small number of broken
templates which end up reused in lots of places.  So it's probably best to
just look at the first few pages, fix the issues there, and then retest.

> Do you have stats on each of those eight wikipedias? i.e. are there
> noticeable differences in the percentages on different wikipedias? If
> so, can you report those percentages for each project?  I'm guessing
> Chinese is an example where there are higher percentages..?
>

http://parsoid.wmflabs.org:8001/stats/en gives results just for en, and so
on for the other languages.  There are 10k titles from each of en, de, nl,
fr, it, ru, es, sv, pl, ja, ar, he, hi, ko, zh, and is (16 wikis in all).
(Of course, some titles have been deleted/moved as described above.)
  --scott

-- 
(http://cscott.net)
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
