Re: [Wikitech-l] dirty diffs and VE

2013-07-25 Thread Roan Kattouw
On Wed, Jul 24, 2013 at 2:49 PM, C. Scott Ananian
canan...@wikimedia.org wrote:
 For what it's worth, both the DOM serialization-to-a-string and DOM
 parsing-from-a-string are done with the domino package.  It has a
 substantial test suite of its own (originally from
 http://www.w3.org/html/wg/wiki/Testing I believe).  So although the above
 is probably worth doing as a low-priority task, it's really a test of the
 third-party library, not of Parsoid.  (Although, since I'm a co-maintainer
 of domino, I'd be very interested in fixing any bugs which it did turn up.)

I didn't mean it as a test of Domino, I meant it as a test of Parsoid:
does it generate things that are then foster-parented out, or other
things that a compliant DOM parser won't round-trip? It's also a more
realistic test, because the way that Parsoid is actually used by VE in
practice is that it serializes its DOM, sends it over the wire to VE,
which then does things with it and gives an HTML string back, which is
then parsed through Domino. So even in normal operation, ignoring the
fact that VE runs stuff through the browser's DOM parser, Parsoid
itself already round-trips the HTML through Domino, effectively.

 The foster parenting issues mostly arise in the wikitext -> Parsoid DOM
 phase.  Basically, the wikitext is tokenized into an HTML tag soup and then
 a customized version of the standard HTML parser is used to assemble the
 soup into a DOM, mimicking the process by which a browser would parse the
 tag soup emitted by the current PHP parser.  So the existing test suite
 does expose these foster-parenting issues already.
Does it really? There were a number of foster-parenting issues a few
months ago where Parsoid inserted <meta> tags in places where they
can't be put (e.g. inside <tr>s), and no one on the Parsoid team seemed to
have noticed until I tracked down a few VE bugs to that problem.

Roan


Re: [Wikitech-l] dirty diffs and VE

2013-07-25 Thread Subramanya Sastry

On 07/25/2013 01:03 PM, Roan Kattouw wrote:

On Wed, Jul 24, 2013 at 2:49 PM, C. Scott Ananian
canan...@wikimedia.org wrote:

For what it's worth, both the DOM serialization-to-a-string and DOM
parsing-from-a-string are done with the domino package.  It has a
substantial test suite of its own (originally from
http://www.w3.org/html/wg/wiki/Testing I believe).  So although the above
is probably worth doing as a low-priority task, it's really a test of the
third-party library, not of Parsoid.  (Although, since I'm a co-maintainer
of domino, I'd be very interested in fixing any bugs which it did turn up.)


I didn't mean it as a test of Domino, I meant it as a test of Parsoid:
does it generate things that are then foster-parented out, or other
things that a compliant DOM parser won't round-trip? It's also a more
realistic test, because the way that Parsoid is actually used by VE in
practice is that it serializes its DOM, sends it over the wire to VE,
which then does things with it and gives an HTML string back, which is
then parsed through Domino. So even in normal operation, ignoring the
fact that VE runs stuff through the browser's DOM parser, Parsoid
itself already round-trips the HTML through Domino, effectively.


We use two different libraries for different things:

* the html5 library for building a DOM from a tag soup
* domino for serializing DOM -> HTML and for parsing HTML -> DOM

When doing a WT2WT roundtrip test, there are 2 ways to do this:

1. wikitext -> tag soup -> DOM (in-memory tree) -> wikitext
2. wikitext -> tag soup -> DOM (in-memory tree) -> HTML (string) ->
DOM -> wikitext


We currently use path 1 in our wt2wt testing.  If there are foster-parenting
bugs in the HTML5 library, they will stay hidden if we use path 1.
However, when using VE and serializing its result back to wikitext, we
are effectively using path 2.
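
In code, the two paths look roughly like this (a sketch only: wt2dom()
and dom2wt() are hypothetical stand-ins for our own conversions, while
the domino calls are real API):

    var domino = require('domino');

    // Path 1: serialize the in-memory DOM straight back to wikitext.
    function path1(wikitext) {
      var doc = wt2dom(wikitext);  // tag soup -> DOM via the html5 library
      return dom2wt(doc);
    }

    // Path 2: round-trip through an HTML string first, as VE
    // effectively does in production.
    function path2(wikitext) {
      var doc = wt2dom(wikitext);
      var html = doc.documentElement.outerHTML;    // DOM -> HTML (string)
      var reparsed = domino.createDocument(html);  // HTML (string) -> DOM
      return dom2wt(reparsed);
    }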


And, both Roan and Scott are correct.  Path 2 would be a test of
external libraries (HTML5 and Domino, not just domino).  And we did
have bugs in the HTML5 parsing library we used (which I fixed based on
reports from Roan) and then added test cases for them to the parser tests.


But if we use path 2 for all our RT testing of WP pages, other latent
bugs with fostered content will show up.


Hope this clarifies the issue.

Subbu.




Re: [Wikitech-l] dirty diffs and VE

2013-07-25 Thread C. Scott Ananian
On Thu, Jul 25, 2013 at 2:19 PM, Subramanya Sastry ssas...@wikimedia.org wrote:

 And, both Roan and Scott are correct.  Path 2 would be a test of
 external libraries (HTML5 and Domino, not just domino).  And we did have
 bugs in the HTML5 parsing library we used (which I fixed based on reports
 from Roan) and then added test cases for them to the parser tests.


If you're playing along at home, the domino bug was:
https://github.com/fgnass/domino/pull/36

Hopefully there aren't too many more of those lurking.
 --scott

-- 
(http://cscott.net)

Re: [Wikitech-l] dirty diffs and VE

2013-07-24 Thread Marc Ordinas i Llopis
On Wed, Jul 24, 2013 at 1:55 AM, John Vandenberg jay...@gmail.com wrote:

 Could you provide a dump of the list of 24000 bustable pages?  Split
 by project?  Each community could then investigate those pages for
 broken tables, and more critically .. templates which emit broken
 wikisyntax that is causing your team grief.


As Subbu said, I'm currently working on improving the round-trip test
server, mostly on porting it from sqlite to MySQL but also on expanding the
stats kept (with things like performance, etc.). If you think of some other
data we should track, or any new report we could add, we certainly welcome
suggestions :) Please open a new bug or add to the existing one:
https://bugzilla.wikimedia.org/show_bug.cgi?id=46659

Or just drop by #wikimedia-parsoid, I'm marcoil there.

Cheers,
Marc

Re: [Wikitech-l] dirty diffs and VE

2013-07-24 Thread Roan Kattouw
On Wed, Jul 24, 2013 at 3:10 AM, Marc Ordinas i Llopis
marc...@wikimedia.org wrote:
 As Subbu said, I'm currently working on improving the round-trip test
 server, mostly on porting it from sqlite to MySQL but also on expanding the
 stats kept (with things like performance, etc.). If you think of some other
 data we should track, or any new report we could add, we certainly welcome
 suggestions :) Please open a new bug or add to the existing one:
 https://bugzilla.wikimedia.org/show_bug.cgi?id=46659

Thanks for working on this! The Parsoid testing infrastructure is
pretty awesome.

There are a few things I wish it tested, but they're mostly about how
it tests things rather than what data is collected. For instance, it
would be nice if the round-trip tests could round-trip from wikitext
to HTML *string* and back, rather than to HTML *DOM* and back. This
would help catch cases where the DOM doesn't cleanly round-trip
through the HTML parser (foster-parenting for instance). It may be
that this is already implemented, or that it was considered and
rejected, I don't know.

Additionally, it might be helpful to have some tests looking for null
DSRs or other broken data-parsoid stuff (because this breaks selser),
and/or some sort of selser testing in general (though off the top of
my head I'm not sure what that would look like). Another fun
serialization test that could be done is stripping all data-parsoid
attributes and asserting that this doesn't result in any semantic
diffs (you'll get lots of syntactic diffs of course).
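
Roughly, that last test could look like this (a sketch only;
serializeToWikitext() is a hypothetical stand-in, while
querySelectorAll() and removeAttribute() are standard DOM methods that
domino supports):

    // Strip every data-parsoid attribute, then serialize: any
    // *semantic* difference in the output would indicate a bug.
    function stripDataParsoid(doc) {
      var nodes = doc.querySelectorAll('[data-parsoid]');
      for (var i = 0; i < nodes.length; i++) {
        nodes[i].removeAttribute('data-parsoid');
      }
      return doc;
    }

    var wt = serializeToWikitext(stripDataParsoid(doc));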

 Or just drop by #wikimedia-parsoid, I'm marcoil there.

The channel is #mediawiki-parsoid :)

Roan


Re: [Wikitech-l] dirty diffs and VE

2013-07-24 Thread Subramanya Sastry

On 07/24/2013 09:58 AM, Roan Kattouw wrote:
There are a few things I wish it tested, but they're mostly about how 
it tests things rather than what data is collected. For instance, it 
would be nice if the round-trip tests could round-trip from wikitext 
to HTML *string* and back, rather than to HTML *DOM* and back. This 
would help catch cases where the DOM doesn't cleanly round-trip 
through the HTML parser (foster-parenting for instance). It may be 
that this is already implemented, or that it was considered and 
rejected, I don't know. 


Yes, we've considered this for a while now.  Just not done yet since we 
haven't had a chance to work on the testing infrastructure in over 6 
months till now.


Additionally, it might be helpful to have some tests looking for null 
DSRs or other broken data-parsoid stuff (because this breaks selser), 
and/or some sort of selser testing in general (though off the top of 
my head I'm not sure what that would look like). Another fun 
serialization test that could be done is stripping all data-parsoid 
attributes and asserting that this doesn't result in any semantic 
diffs (you'll get lots of syntactic diffs of course).


We've talked on and off about whether we could mimic editing on real
pages and test the correctness of the resulting wikitext -- it is unclear at
this time.  So, it hasn't happened yet.


Also, a null DSR (* see below for what a DSR is) by itself is not a
serious problem -- it just means that that particular DOM node will go
through regular serialization (and *might* introduce dirty diffs).  We
also don't want to add a lot of noise to testing results without having a
way to filter useful things out of them.


But, we could brainstorm ways of doing this on IRC.

Subbu.

* DSR: DOM Source Range.  Given a DOM node, a DSR tells you what range of
wikitext generated that piece of HTML.  While seemingly simple,
calculating this accurately without introducing errors is quite tricky,
given that wikitext is string-based while the DOM is structural and there is
no such clean mapping, especially in the presence of templates that
generate fragments of an HTML string (ex: generating part of an HTML tag
like a style attribute, generating multiple table cells, or multiple
attributes, etc.).  Selective serialization for avoiding dirty diffs
relies crucially on the accuracy of this mapping.
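
As an illustration only (this is the concept, not the exact data-parsoid
encoding), a DSR can be thought of as a source offset range plus the
widths of the opening and closing syntax:

    // For the wikitext '''bold''', the <b> node conceptually maps to:
    var dsr = {
      start: 0,       // offset of the first source character
      end: 10,        // offset just past the last source character
      openWidth: 3,   // width of the opening ''' markup
      closeWidth: 3   // width of the closing ''' markup
    };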





Re: [Wikitech-l] dirty diffs and VE

2013-07-24 Thread Marc Ordinas i Llopis
On Wed, Jul 24, 2013 at 4:58 PM, Roan Kattouw roan.katt...@gmail.com wrote:

  Or just drop by #wikimedia-parsoid, I'm marcoil there.
 
 The channel is #mediawiki-parsoid :)


Yes, sorry… I hadn't had enough coffee :)

Re: [Wikitech-l] dirty diffs and VE

2013-07-24 Thread C. Scott Ananian
On Wed, Jul 24, 2013 at 11:20 AM, Subramanya Sastry
ssas...@wikimedia.org wrote:

 On 07/24/2013 09:58 AM, Roan Kattouw wrote:

 There are a few things I wish it tested, but they're mostly about how it
 tests things rather than what data is collected. For instance, it would be
 nice if the round-trip tests could round-trip from wikitext to HTML
 *string* and back, rather than to HTML *DOM* and back. This would help
 catch cases where the DOM doesn't cleanly round-trip through the HTML
 parser (foster-parenting for instance). It may be that this is already
 implemented, or that it was considered and rejected, I don't know.


 Yes, we've considered this for a while now.  Just not done yet since we
 haven't had a chance to work on the testing infrastructure in over 6 months
 till now.


For what it's worth, both the DOM serialization-to-a-string and DOM
parsing-from-a-string are done with the domino package.  It has a
substantial test suite of its own (originally from
http://www.w3.org/html/wg/wiki/Testing I believe).  So although the above
is probably worth doing as a low-priority task, it's really a test of the
third-party library, not of Parsoid.  (Although, since I'm a co-maintainer
of domino, I'd be very interested in fixing any bugs which it did turn up.)

The foster parenting issues mostly arise in the wikitext -> Parsoid DOM
phase.  Basically, the wikitext is tokenized into an HTML tag soup and then
a customized version of the standard HTML parser is used to assemble the
soup into a DOM, mimicking the process by which a browser would parse the
tag soup emitted by the current PHP parser.  So the existing test suite
does expose these foster-parenting issues already.
  --scott

-- 
(http://cscott.net)

[Wikitech-l] dirty diffs and VE

2013-07-23 Thread John Vandenberg
On Wed, Jul 24, 2013 at 2:06 AM, Subramanya Sastry
ssas...@wikimedia.org wrote:
 Hi John and Risker,

 First off, I do want to once again clarify that my intention in the previous
 post was not to claim that VE/Parsoid is perfect.  It was more that we've
 fixed sufficient bugs at this point that the most significant bugs (bugs,
 not missing features) that need fixing (and are being fixed) are those that
 have to do with usability tweaks.

How do you know that?  Have you performed automated tests on all
Wikipedia content?  Or are you waiting for users to find these bugs?

 My intention in that post was also not
 one to put some distance between us and the complaints, just to clarify that
 we are fixing things as fast as we can and it can be seen in the recent
 changes stream.

 John: specific answers to the edit diffs you highlighted in your post.  I
 acknowledge your intention to make sure we don't make false claims about
 VE/Parsoid's usability.  Thanks for taking the time to dig them up.
 My answers below are made with an intention of figuring out what the issues
 are so they can be fixed where they need to be.


 On 07/23/2013 02:50 AM, John Vandenberg wrote:

 On Tue, Jul 23, 2013 at 4:32 PM, Subramanya Sastry
 ssas...@wikimedia.org wrote:

 On 07/22/2013 10:44 PM, Tim Starling wrote:

 Round-trip bugs, and bugs which cause a given wikitext input to give
 different HTML in Parsoid compared to MW, should have been detected
 during automated testing, prior to beta deployment. I don't know why
 we need users to report them.


 500+ edits are being done per hour using Visual Editor [1] (less at this
 time given that it is way past midnight -- I have seen about 700/hour at
 times).  I did go and click on over 100 links and examined the diffs.  I
 did that twice in the last hour.  I am happy to report clean diffs on all
 edits I checked both times.

 I did run into a couple of nowiki insertions, which are, strictly
 speaking, not erroneous and based on user input, but are more of a
 usability issue.

 What is a dirty diff?  One that inserts junk unexpectedly, unrelated
 to the user's input?


 That is correct.  Strictly speaking, yes: any changes to the wikitext markup
 in parts of the page that the user didn't change.

 The broken table injection bugs are still happening.


 https://en.wikipedia.org/w/index.php?title=Sai_Baba_of_Shirdi&curid=144175&diff=565442800&oldid=565354286

 If the parser isn't going to be fixed quickly to ignore tables it
 doesn't understand, we need to find the templates and pages with these
 broken tables - preferably using SQL and heuristics - and fix them.  The
 same needs to be done for all the other wikis, otherwise they are
 going to have the same problems happening randomly, causing lots of
 grief.


 This may be related to this:
 https://bugzilla.wikimedia.org/show_bug.cgi?id=51217  and I have a tentative
 fix for it as of yesterday.

Fixes are of course appreciated.  The pace of bugfixes is not the problem ...

 VE and Parsoid devs have put in a lot of effort to recognize broken
 wikitext source, fix it or isolate it,

My point was that you don't appear to be doing analysis of how much of all
Wikipedia content is broken; at least I don't see a public document
listing which templates and pages are causing the parser problems, so
the communities on each Wikipedia can fix them ahead of deployment.

I believe there is a bug about automated testing of the parser against
existing pages, which would identify problems.

I scanned the Spanish 'visualeditor' tag's 50 recentchanges earlier
and found a dirty diff, which I believe hasn't been raised in bugzilla
yet.

https://bugzilla.wikimedia.org/show_bug.cgi?id=51909

50 VE edits on eswp is more than one day of recentchanges.  Most of
the top 10 wikis have roughly the same level of testing going on.
That should be a concern.  The number of VE edits is about to increase
on another nine Wikipedias, with very little real impact analysis
having been done.  That is a shame, because the enwp deployment has
provided us with a list of problems which will impact those wikis if
they are using the same syntax, be it weird or broken or otherwise
troublesome.

 and protect it across edits, and
 roundtrip it back in original form to prevent corruption.  I think we have
 been largely successful but we still have more cases to go that are being
 exposed here which we will fix.  But, occasionally, these kind of errors do
 show up -- and we ask for your patience as we fix these.  Once again, this
 is not a claim to perfection, but a claim that this is not a significant
 source of corrupt edits.  But, yes even a 0.1% error rate does mean a big
 number in the absolute when thousands of pages are being edited -- and we
 will continue to pare this down.

Is 0.1% a real data point, or a stab in the dark?  Because I found two
in 100 on enwp; Robert found at least one in 200 on enwp; and I found
1 in 50 on eswp.

 In addition to nowikis, there are also wikilinks that are not 

Re: [Wikitech-l] dirty diffs and VE

2013-07-23 Thread C. Scott Ananian
 On Tue, Jul 23, 2013 at 6:28 PM, John Vandenberg jay...@gmail.com wrote:

 On Wed, Jul 24, 2013 at 2:06 AM, Subramanya Sastry
 ssas...@wikimedia.org wrote:
  Hi John and Risker,
 
 First off, I do want to once again clarify that my intention in the
 previous post was not to claim that VE/Parsoid is perfect.  It was more
 that we've fixed sufficient bugs at this point that the most significant
 bugs (bugs, not missing features) that need fixing (and are being fixed)
 are those that have to do with usability tweaks.

 How do you know that?  Have you performed automated tests on all
 Wikipedia content?


Yes -- or at least a large random subset of wp content comprising 160,509
articles across a dozen or so different languages.

http://www.mediawiki.org/wiki/Parsoid/Roundtrip

 --scott

-- 
(http://cscott.net)

Re: [Wikitech-l] dirty diffs and VE

2013-07-23 Thread Subramanya Sastry

On 07/23/2013 05:28 PM, John Vandenberg wrote:

On Wed, Jul 24, 2013 at 2:06 AM, Subramanya Sastry
ssas...@wikimedia.org wrote:

Hi John and Risker,

First off, I do want to once again clarify that my intention in the previous
post was not to claim that VE/Parsoid is perfect.  It was more that we've
fixed sufficient bugs at this point that the most significant bugs (bugs,
not missing features) that need fixing (and are being fixed) are those that
have to do with usability tweaks.

How do you know that?  Have you performed automated tests on all
Wikipedia content?  Or are you waiting for users to find these bugs?


http://parsoid.wmflabs.org:8001/stats

This is the url for our round trip testing on 160K pages (20K each from 
8 wikipedias).


Till late March, we used to run round trip testing on 100K enwp pages.  
We then moved to a mix of pages from different WPs to catch language and 
wiki-specific issues and fix them.


So, this is our methodology for catching parse and roundtrip errors on 
real WP pages and regressions.


I won't go into great detail about what the 3 numbers mean and the nuances.

But, 99.6% means that 0.4% of pages still had corruptions, and that 15% 
of pages had syntactic dirty diffs.


However, note that this is because the serialization behaves as if the 
entire document is edited (which lets us stress test our serialization 
system) but is not real behavior in production.  In production, our HTML 
to WT conversion is smarter and attempts to only serialize modified segments and 
uses original wikitext for unmodified segments of the DOM (called 
selective serialization). So, in reality, the corruption percentage 
should be much smaller than even the 0.4%, and the dirty diffs as well 
will be way smaller (but you are still finding 1 in 200 or more) -- and 
this is separate from nowiki issues.
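
Conceptually, selective serialization looks something like this (a
sketch only; isModified(), getDsr(), and serializeNormally() are
hypothetical stand-ins):

    // Unmodified nodes reuse their original wikitext via the DSR;
    // only modified nodes go through the regular serializer.
    function selser(node, origWikitext) {
      if (!isModified(node)) {
        var dsr = getDsr(node);
        return origWikitext.substring(dsr.start, dsr.end);
      }
      return serializeNormally(node);
    }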


We are not solely dependent on users to find bugs for us, no, but in 
production, if there are corruptions that show up, it would be helpful 
if we are alerted.


Does that clarify?


VE and Parsoid devs have put in a lot of effort to recognize broken
wikitext source, fix it or isolate it,

My point was that you don't appear to be doing analysis of how much of all
Wikipedia content is broken; at least I don't see a public document
listing which templates and pages are causing the parser problems, so
the communities on each Wikipedia can fix them ahead of deployment.


Unfortunately, this is much harder to do.  What we can consider is to 
periodically swap out our test pages for a fresh batch of pages 
so new kinds of problems show up in automated testing.  In some cases, 
detecting problems automatically is equivalent to being able to fix them up 
automatically as well.


Gabriel is currently on a (well-deserved) vacation and once he is back, 
we'll discuss this issue and see what can be done.  But whenever we 
find problems, we've been fixing templates (about 3 or 4 fixed so far) 
and broken wikitext as well.


We also have this desirable enhancement/tool that we could build: 
https://bugzilla.wikimedia.org/show_bug.cgi?id=46705



I believe there is a bug about automated testing of the parser against
existing pages, which would identify problems.

I scanned the Spanish 'visualeditor' tag's 50 recentchanges earlier
and found a dirty diff, which I believe hasn't been raised in bugzilla
yet.

https://bugzilla.wikimedia.org/show_bug.cgi?id=51909

50 VE edits on eswp is more than one day of recentchanges.  Most of
the top 10 wikis have roughly the same level of testing going on.
That should be a concern.  The number of VE edits is about to increase
on another nine Wikipedias, with very little real impact analysis
having been done.  That is a shame, because the enwp deployment has
provided us with a list of problems which will impact those wikis if
they are using the same syntax, be it weird or broken or otherwise
troublesome.


As indicated earlier, we have done automated RT testing on 20K pages on 
different WPs and fixed various problems, but yes, this will not catch 
all problematic scenarios.



and protect it across edits, and
roundtrip it back in original form to prevent corruption.  I think we have
been largely successful but we still have more cases to go that are being
exposed here which we will fix.  But, occasionally, these kind of errors do
show up -- and we ask for your patience as we fix these.  Once again, this
is not a claim to perfection, but a claim that this is not a significant
source of corrupt edits.  But, yes even a 0.1% error rate does mean a big
number in the absolute when thousands of pages are being edited -- and we
will continue to pare this down.

Is 0.1% a real data point, or a stab in the dark?  Because I found two
in 100 on enwp; Robert found at least one in 200 on enwp; and I found
1 in 50 on eswp.


Sorry -- I should have phrased that better.  I just picked 0.1% as an 
arbitrary number to make the observation that even when it is as low as 
0.1%, in absolute 

Re: [Wikitech-l] dirty diffs and VE

2013-07-23 Thread John Vandenberg
On Wed, Jul 24, 2013 at 9:02 AM, Subramanya Sastry
ssas...@wikimedia.org wrote:
 On 07/23/2013 05:28 PM, John Vandenberg wrote:

 On Wed, Jul 24, 2013 at 2:06 AM, Subramanya Sastry
 ssas...@wikimedia.org wrote:

 Hi John and Risker,

 First off, I do want to once again clarify that my intention in the
 previous post was not to claim that VE/Parsoid is perfect.  It was more
 that we've fixed sufficient bugs at this point that the most significant
 bugs (bugs, not missing features) that need fixing (and are being fixed)
 are those that have to do with usability tweaks.

 How do you know that?  Have you performed automated tests on all
 Wikipedia content?  Or are you waiting for users to find these bugs?


 http://parsoid.wmflabs.org:8001/stats

 This is the url for our round trip testing on 160K pages (20K each from 8
 wikipedias).

Fantastic!  How frequently are those tests re-run?  Could you add a
last-run-date on that page?

Was a regression testsuite built using the issues encountered during
the last parser rewrite?

--
John Vandenberg


Re: [Wikitech-l] dirty diffs and VE

2013-07-23 Thread C. Scott Ananian
On Tue, Jul 23, 2013 at 7:13 PM, John Vandenberg jay...@gmail.com wrote:

  http://parsoid.wmflabs.org:8001/stats
 
  This is the url for our round trip testing on 160K pages (20K each from 8
  wikipedias).

 Fantastic!  How frequently are those tests re-run?  Could you add a
 last-run-date on that page?


The git sha1 displayed on the page can be turned into a timestamp.  For
example, it's currently showing git d5fe6c9052c23bcc0b63a4d0d1b3e5b68fd2ef37 and
https://git.wikimedia.org/commit/mediawiki%2Fextensions%2FParsoid/d5fe6c9052c23bcc0b63a4d0d1b3e5b68fd2ef37
says that commit was authored on Fri Jul 19 10:20:39 2013 -0700.  So, less
than a week old (it takes a few days to crank through all the pages in its
set).
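
(Equivalently, with a local checkout, running git show -s --format=%ai
<sha1> prints the author date of a given commit.)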

Was a regression testsuite built using the issues encountered during
 the last parser rewrite?


Yes, mediawiki/core/tests/parser/parserTests.txt (which predates parsoid)
has been continuously updated throughout the development process.
 --scott

-- 
(http://cscott.net)

Re: [Wikitech-l] dirty diffs and VE

2013-07-23 Thread Subramanya Sastry

On 07/23/2013 06:13 PM, John Vandenberg wrote:

On Wed, Jul 24, 2013 at 9:02 AM, Subramanya Sastry
ssas...@wikimedia.org wrote:

On 07/23/2013 05:28 PM, John Vandenberg wrote:

On Wed, Jul 24, 2013 at 2:06 AM, Subramanya Sastry
ssas...@wikimedia.org wrote:

Hi John and Risker,

First off, I do want to once again clarify that my intention in the
previous post was not to claim that VE/Parsoid is perfect.  It was more
that we've fixed sufficient bugs at this point that the most significant
bugs (bugs, not missing features) that need fixing (and are being fixed)
are those that have to do with usability tweaks.

How do you know that?  Have you performed automated tests on all
Wikipedia content?  Or are you waiting for users to find these bugs?


http://parsoid.wmflabs.org:8001/stats

This is the url for our round trip testing on 160K pages (20K each from 8
wikipedias).

Fantastic!  How frequently are those tests re-run?  Could you add a
last-run-date on that page?


The tests are re-run after a bunch of commits that we think should be 
regression tested -- usually updated one or more times a day (when a lot 
of patches are being merged) or after a few days (during periods of low 
activity).  The last code update was Thursday.


http://parsoid.wmflabs.org:8001/commits gives you the list of commits 
(and the date when the code was updated).
http://parsoid.wmflabs.org:8001/topfails gives you individual test 
results on every tested page for more detail.


Currently we are updating our rt testing infrastructure to gather 
performance numbers as well (this has been on the cards for a long time, 
but never got the attention it needed).  Marco is working on that part 
of our codebase as we speak.  See 
https://bugzilla.wikimedia.org/show_bug.cgi?id=46659 and other related 
ones.


We do not deploy to production before we have run tests on a subset of 
pages in rt-testing.  Given the nature of how tests are run, it is 
usually sufficient to run on about 1000 pages to know if there are 
serious regressions .. sometimes we run on a larger subset of pages.



Was a regression testsuite built using the issues encountered during
the last parser rewrite?


We also continually update a parser tests file (in the code repository) 
with minimized test cases based on regressions and odd wikitext usage.  
There are about 1100 tests so far, which run in 4 modes (wt2html, wt2wt, 
html2wt, html2html), plus 14000 randomly generated edits to the tests to 
mimic edits and test our selective serializer.  This is our first guard 
against bad code.
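
Each test case in that file is a small sectioned record, roughly of this
shape (from memory, so treat the exact section names as approximate):

    !! test
    Simple paragraph
    !! input
    Hello world
    !! result
    <p>Hello world
    </p>
    !! end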


Subbu.


Re: [Wikitech-l] dirty diffs and VE

2013-07-23 Thread C. Scott Ananian
On Tue, Jul 23, 2013 at 7:24 PM, C. Scott Ananian canan...@wikimedia.org wrote:


 Was a regression testsuite built using the issues encountered during
 the last parser rewrite?


 Yes, mediawiki/core/tests/parser/parserTests.txt (which predates parsoid)
 has been continuously updated throughout the development process.


If you'd like to see the set of tests:

https://git.wikimedia.org/blob/mediawiki%2Fcore/master/tests%2Fparser%2FparserTests.txt
(git.wikimedia.org was temporarily down when I wrote my previous email.)
 --scott

-- 
(http://cscott.net)

Re: [Wikitech-l] dirty diffs and VE

2013-07-23 Thread Subramanya Sastry

On 07/23/2013 06:02 PM, Subramanya Sastry wrote:

On 07/23/2013 05:28 PM, John Vandenberg wrote:

VE and Parsoid devs have put in a lot of effort to recognize broken
wikitext source, fix it or isolate it,

My point was that you don't appear to be doing analysis of how much of all
Wikipedia content is broken; at least I don't see a public document
listing which templates and pages are causing the parser problems, so
the communities on each Wikipedia can fix them ahead of deployment.


Unfortunately, this is much harder to do.  What we can consider is to 
periodically swap out our test pages for a fresh batch of 
pages so new kinds of problems show up in automated testing. In some 
cases, detecting problems automatically is equivalent to being able to 
fix them up automatically as well.


Actually, we do have the beginnings of a page for this that I had 
forgotten about: 
http://www.mediawiki.org/wiki/Parsoid/Broken_wikitext_tar_pit   I don't 
think this is very helpful at this time or exactly what you are asking for, 
but I'm just pointing it out for the record that we've thought about it some.


Some of these cases we are actually beginning to address:
* fostered content in top-level pages (we handle fostering from templates)
* handling of templates that produce part of a table cell, or multiple 
cells, or multiple attributes of an image


Ideally, we would not have to support these kinds of use cases, but given 
what we are seeing in production now, we might try to deal with some of 
them ... Interestingly enough, we do a much better job of 
protecting against unclosed tables, fostered content out of tables, etc. 
when they come from templates than when such wikitext occurs in 
the page content itself.  We have a couple of DOM analysis passes to 
detect those problems and protect them from editing ... but that needs 
to be extended to top-level page content.
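
To see the fostering behavior itself, here is a quick illustration using
domino (the expected output reflects my understanding of the HTML5
tree-building algorithm, not a verified transcript):

    var domino = require('domino');

    // Text inside a <table> but outside any cell is "foster parented":
    // the tree builder moves it in front of the table.
    var doc = domino.createDocument(
        '<table>stray text<tr><td>cell</td></tr></table>');
    console.log(doc.body.innerHTML);
    // Expected: stray text<table><tbody><tr><td>cell</td></tr></tbody></table>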


Subbu.


Re: [Wikitech-l] dirty diffs and VE

2013-07-23 Thread John Vandenberg
On Wed, Jul 24, 2013 at 9:02 AM, Subramanya Sastry
ssas...@wikimedia.org wrote:
 http://parsoid.wmflabs.org:8001/stats

 This is the url for our round trip testing on 160K pages (20K each from 8
 wikipedias).

Very minor point .. there are ~400 missing pages on the list; is that
intentional ? ;-)

One is 'Mos:time' which is in NS 0, and does actually exist as a
redirect to the WP: manual of style:
https://en.wikipedia.org/wiki/Mos:time

...
 But, 99.6% means that 0.4% of pages still had corruptions, and that 15% of
 pages had syntactic dirty diffs.

So 15% is 24000 pages which can bust, but may not if the edit doesn't
touch the bustable part.

Does /topfails cycle through all 24000, 40 pages at a time?

Could you provide a dump of the list of 24000 bustable pages?  Split
by project?  Each community could then investigate those pages for
broken tables, and more critically .. templates which emit broken
wikisyntax that is causing your team grief.

Do you have stats on each of those eight wikipedias? i.e. are there
noticeable differences in the percentages on different wikipedias? If
so, can you report those percentages for each project?  I'm guessing
Chinese is an example where there are higher percentages..?

--
John Vandenberg


Re: [Wikitech-l] dirty diffs and VE

2013-07-23 Thread Subramanya Sastry

On 07/23/2013 06:55 PM, John Vandenberg wrote:

On Wed, Jul 24, 2013 at 9:02 AM, Subramanya Sastry
ssas...@wikimedia.org wrote:

http://parsoid.wmflabs.org:8001/stats

This is the url for our round trip testing on 160K pages (20K each from 8
wikipedias).

Very minor point .. there are ~400 missing pages on the list; is that
intentional ? ;-)

One is 'Mos:time' which is in NS 0, and does actually exist as a
redirect to the WP: manual of style:
https://en.wikipedia.org/wiki/Mos:time


1. Some pages get deleted and then go 404. 
(http://parsoid.wmflabs.org:8001/failedFetches)
2. There are some (known) bugs in our rt testing infrastructure around 
recording results -- these should be fixed once our testing infrastructure 
is updated and moved to mysql (from sqlite).



...
But, 99.6% means that 0.4% of pages still had corruptions, and that 15% of
pages had syntactic dirty diffs.

So 15% is 24000 pages which can bust, but may not if the edit doesn't
touch the bustable part.


No, 15% of pages aren't bust.  15% of pages introduce meaning-preserving 
(hence purely syntactic) dirty diffs depending on what piece of the page 
is edited.
Ex: whitespace diffs and the addition of quotes around attribute values 
are the most common ones.


For an example, see this: 
http://parsoid.wmflabs.org:8001/result/d5fe6c9052c23bcc0b63a4d0d1b3e5b68fd2ef37/en/Ketill_Flatnose


0.4% (~640) of pages are classified as having semantic diffs.  We assign a 
numerical score in base 1000 (digit 3 = # errors, digit 2 = # semantic 
errors, digit 1 = # syntactic errors).
When results are sorted in reverse order of score, it gives us the most 
egregious pages to focus on (crashers first, semantic errors next, 
purely dirty diffs next).
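
In code, that scoring works out to something like this (a sketch; the
field names are assumptions):

    // Each count occupies one base-1000 "digit", so sorting by score
    // ranks crashers first, then semantic errors, then dirty diffs.
    function score(result) {
      return result.errors * 1000 * 1000 +
             result.semanticErrors * 1000 +
             result.syntacticErrors;
    }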


So, going to http://parsoid.wmflabs.org:8001/topfails and paging through 
that will give you what you are looking for.  16 pages with 40 entries 
each.  We hang out on #mediawiki-parsoid and can help editors make sense 
of the diffs if anyone wants to look for broken wikitext and fix them.


Subbu.



Re: [Wikitech-l] dirty diffs and VE

2013-07-23 Thread C. Scott Ananian
On Tue, Jul 23, 2013 at 7:55 PM, John Vandenberg jay...@gmail.com wrote:

 On Wed, Jul 24, 2013 at 9:02 AM, Subramanya Sastry
 ssas...@wikimedia.org wrote:
  http://parsoid.wmflabs.org:8001/stats
 
  This is the url for our round trip testing on 160K pages (20K each from 8
  wikipedias).

 Very minor point .. there are ~400 missing pages on the list; is that
 intentional ? ;-)

 One is 'Mos:time' which is in NS 0, and does actually exist as a
 redirect to the WP: manual of style:
 https://en.wikipedia.org/wiki/Mos:time


I think it's an artifact of the changing article set on the wikis.  We
created the original page set months ago, and we haven't changed it since,
so that our results are still comparable over time.  Since then 1) some
pages have been deleted/moved, and 2) we fixed parsoid not to automatically
follow redirects (bug 45808).


  But, 99.6% means that 0.4% of pages still had corruptions, and that 15%
  of pages had syntactic dirty diffs.

 So 15% is 24000 pages which can bust, but may not if the edit doesn't
 touch the bustable part.


Subbu covered this in his email.  Yes, but only if you consider an extra
unrendered newline (etc.) a bust.  Syntactic diffs are wikitext
differences which do *not* lead to visible differences.  *Semantic* diffs
are the ones which lead to visible differences.  So 0.4% of the pages will
bust iff the bustable part is touched.

Does /topfails cycle through all 24000, 40 pages at a time?


Yes.

Could you provide a dump of the list of 24000 bustable pages?  Split
 by project?  Each community could then investigate those pages for
 broken tables, and more critically .. templates which emit broken
 wikisyntax that is causing your team grief.


We could do that.  Usually there will be a very small number of broken
templates which end up reused in lots of places.  So it's probably best to
just look at the first few pages, fix the issues there, and then retest.

Do you have stats on each of those eight wikipedias? i.e. are there
 noticeable differences in the percentages on different wikipedias? If
 so, can you report those percentages for each project?  I'm guessing
 Chinese is an example where there are higher percentages..?


http://parsoid.wmflabs.org:8001/stats/en gives results just for en, etc.
There are 10k titles from each of en, de, nl, fr, it, ru, es, sv, pl, ja,
ar, he, hi, ko, zh, and is.  (Of course, some titles have been
deleted/moved as described above.)
  --scott

-- 
(http://cscott.net)