Re: [Wikitech-l] dirty diffs and VE

2013-07-25 Thread Roan Kattouw
On Wed, Jul 24, 2013 at 2:49 PM, C. Scott Ananian canan...@wikimedia.org wrote: For what it's worth, both the DOM serialization-to-a-string and DOM parsing-from-a-string are done with the domino package. It has a substantial test suite of its own (originally from

Re: [Wikitech-l] dirty diffs and VE

2013-07-25 Thread Subramanya Sastry
On 07/25/2013 01:03 PM, Roan Kattouw wrote: On Wed, Jul 24, 2013 at 2:49 PM, C. Scott Ananian canan...@wikimedia.org wrote: For what it's worth, both the DOM serialization-to-a-string and DOM parsing-from-a-string are done with the domino package. It has a substantial test suite of its own

Re: [Wikitech-l] dirty diffs and VE

2013-07-25 Thread C. Scott Ananian
On Thu, Jul 25, 2013 at 2:19 PM, Subramanya Sastry ssas...@wikimedia.org wrote: And, both Roan and Scott are correct. Pathway 2. would be a test of external libraries (HTML5 and Domino, not just domino). And, we did have bugs in the HTML5 parsing library we used (which I fixed based on

Re: [Wikitech-l] dirty diffs and VE

2013-07-24 Thread Marc Ordinas i Llopis
On Wed, Jul 24, 2013 at 1:55 AM, John Vandenberg jay...@gmail.com wrote: Could you provide a dump of the list of 24000 bustable pages? Split by project? Each community could then investigate those pages for broken tables, and more critically .. templates which emit broken wikisyntax that is

Re: [Wikitech-l] dirty diffs and VE

2013-07-24 Thread Roan Kattouw
On Wed, Jul 24, 2013 at 3:10 AM, Marc Ordinas i Llopis marc...@wikimedia.org wrote: As Subbu said, I'm currently working on improving the round-trip test server, mostly on porting it from sqlite to MySQL but also on expanding the stats kept (with things like performance, etc.). If you think of

Re: [Wikitech-l] dirty diffs and VE

2013-07-24 Thread Subramanya Sastry
On 07/24/2013 09:58 AM, Roan Kattouw wrote: There are a few things I wish it tested, but they're mostly about how it tests things rather than what data is collected. For instance, it would be nice if the round-trip tests could round-trip from wikitext to HTML *string* and back, rather than to

Re: [Wikitech-l] dirty diffs and VE

2013-07-24 Thread Marc Ordinas i Llopis
On Wed, Jul 24, 2013 at 4:58 PM, Roan Kattouw roan.katt...@gmail.com wrote: Or just drop by #wikimedia-parsoid, I'm marcoil there. The channel is #mediawiki-parsoid :) Yes, sorry… I hadn't had enough coffee :)

Re: [Wikitech-l] dirty diffs and VE

2013-07-24 Thread C. Scott Ananian
On Wed, Jul 24, 2013 at 11:20 AM, Subramanya Sastry ssas...@wikimedia.org wrote: On 07/24/2013 09:58 AM, Roan Kattouw wrote: There are a few things I wish it tested, but they're mostly about how it tests things rather than what data is collected. For instance, it would be nice if the

[Wikitech-l] dirty diffs and VE

2013-07-23 Thread John Vandenberg
On Wed, Jul 24, 2013 at 2:06 AM, Subramanya Sastry ssas...@wikimedia.org wrote: Hi John and Risker, First off, I do want to once again clarify that my intention in the previous post was not to claim that VE/Parsoid is perfect. It was more that we've fixed sufficient bugs at this point that

Re: [Wikitech-l] dirty diffs and VE

2013-07-23 Thread C. Scott Ananian
On Tue, Jul 23, 2013 at 6:28 PM, John Vandenberg jay...@gmail.com wrote: On Wed, Jul 24, 2013 at 2:06 AM, Subramanya Sastry ssas...@wikimedia.org wrote: Hi John and Risker, First off, I do want to once again clarify that my intention in the previous post was not to claim that

Re: [Wikitech-l] dirty diffs and VE

2013-07-23 Thread Subramanya Sastry
On 07/23/2013 05:28 PM, John Vandenberg wrote: On Wed, Jul 24, 2013 at 2:06 AM, Subramanya Sastry ssas...@wikimedia.org wrote: Hi John and Risker, First off, I do want to once again clarify that my intention in the previous post was not to claim that VE/Parsoid is perfect. It was more that

Re: [Wikitech-l] dirty diffs and VE

2013-07-23 Thread John Vandenberg
On Wed, Jul 24, 2013 at 9:02 AM, Subramanya Sastry ssas...@wikimedia.org wrote: On 07/23/2013 05:28 PM, John Vandenberg wrote: On Wed, Jul 24, 2013 at 2:06 AM, Subramanya Sastry ssas...@wikimedia.org wrote: Hi John and Risker, First off, I do want to once again clarify that my intention in

Re: [Wikitech-l] dirty diffs and VE

2013-07-23 Thread C. Scott Ananian
On Tue, Jul 23, 2013 at 7:13 PM, John Vandenberg jay...@gmail.com wrote: http://parsoid.wmflabs.org:8001/stats This is the url for our round trip testing on 160K pages (20K each from 8 wikipedias). Fantastic! How frequently are those tests re-run? Could you add a last-run-date on

Re: [Wikitech-l] dirty diffs and VE

2013-07-23 Thread Subramanya Sastry
On 07/23/2013 06:13 PM, John Vandenberg wrote: On Wed, Jul 24, 2013 at 9:02 AM, Subramanya Sastry ssas...@wikimedia.org wrote: On 07/23/2013 05:28 PM, John Vandenberg wrote: On Wed, Jul 24, 2013 at 2:06 AM, Subramanya Sastry ssas...@wikimedia.org wrote: Hi John and Risker, First off, I do

Re: [Wikitech-l] dirty diffs and VE

2013-07-23 Thread C. Scott Ananian
On Tue, Jul 23, 2013 at 7:24 PM, C. Scott Ananian canan...@wikimedia.org wrote: Was a regression testsuite built using the issues encountered during the last parser rewrite? Yes, mediawiki/core/tests/parser/parserTests.txt (which predates parsoid) has been continuously updated throughout

Re: [Wikitech-l] dirty diffs and VE

2013-07-23 Thread Subramanya Sastry
On 07/23/2013 06:02 PM, Subramanya Sastry wrote: On 07/23/2013 05:28 PM, John Vandenberg wrote: VE and Parsoid devs have put in a lot of effort to recognize broken wikitext source, fix it or isolate it, My point was that you don't appear to be doing analysis of how of all Wikipedia

Re: [Wikitech-l] dirty diffs and VE

2013-07-23 Thread John Vandenberg
On Wed, Jul 24, 2013 at 9:02 AM, Subramanya Sastry ssas...@wikimedia.org wrote: http://parsoid.wmflabs.org:8001/stats This is the url for our round trip testing on 160K pages (20K each from 8 wikipedias). Very minor point .. there are ~400 missing pages on the list; is that intentional ? ;-)

Re: [Wikitech-l] dirty diffs and VE

2013-07-23 Thread Subramanya Sastry
On 07/23/2013 06:55 PM, John Vandenberg wrote: On Wed, Jul 24, 2013 at 9:02 AM, Subramanya Sastry ssas...@wikimedia.org wrote: http://parsoid.wmflabs.org:8001/stats This is the url for our round trip testing on 160K pages (20K each from 8 wikipedias). Very minor point .. there are ~400

Re: [Wikitech-l] dirty diffs and VE

2013-07-23 Thread C. Scott Ananian
On Tue, Jul 23, 2013 at 7:55 PM, John Vandenberg jay...@gmail.com wrote: On Wed, Jul 24, 2013 at 9:02 AM, Subramanya Sastry ssas...@wikimedia.org wrote: http://parsoid.wmflabs.org:8001/stats This is the url for our round trip testing on 160K pages (20K each from 8 wikipedias). Very