Hi Bin,
Thanks for the reports. Please feel free to add yourselves to the
relevant bug reports to track progress. You can also file additional bug
reports against Parsoid here:
https://bugzilla.wikimedia.org/enter_bug.cgi?product=Parsoid
On 12/02/2013 01:39 AM, Bin Li (李斌) wrote:
Hi Parsoid developers,
I compared the Wikipedia HTML and the Parsoid HTML (same title and
oldid) for 500 random samples and found some bug examples and
difference patterns that may help you. We hope these bugs can be
fixed. Thanks! The examples are below:
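For anyone reproducing this comparison, the pair of URLs for a given title and oldid can be built roughly as follows. This is only a sketch that follows the two URL patterns used in the examples in this message; the function name and the set of characters left unescaped are assumptions, not part of any tool mentioned here.

```python
from urllib.parse import quote

# Base URLs copied from the examples in this message.
WIKIPEDIA_BASE = "http://en.wikipedia.org/w/index.php"
PARSOID_BASE = "http://parsoid-lb.eqiad.wikimedia.org/enwiki"


def comparison_urls(title, oldid):
    """Return (wikipedia_url, parsoid_url) for one title/oldid pair.

    Hypothetical helper: the name and the `safe` character set are
    illustrative assumptions.
    """
    # Percent-encode the title as UTF-8, leaving a few characters readable
    # the way the example URLs do (underscore, comma, parentheses, dollar).
    encoded = quote(title, safe="_.,()$")
    wikipedia_url = f"{WIKIPEDIA_BASE}?title={encoded}&oldid={oldid}"
    parsoid_url = f"{PARSOID_BASE}/{encoded}?oldid={oldid}"
    return wikipedia_url, parsoid_url


if __name__ == "__main__":
    for url in comparison_urls("1913_Gettysburg_reunion", 581251478):
        print(url)
```

Both URLs share the same encoded title and oldid, so any difference in the fetched HTML comes from the renderer rather than from the revision being compared.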
Bug examples:
1. In
http://parsoid-lb.eqiad.wikimedia.org/enwiki/1913_Gettysburg_reunion?oldid=581251478,
Reference 18 is “(Pennsylvania Department of Health).
http://books.google.com/books?id=swkTAAAAYAAJ&pg=PA72. Retrieved
2011-02-06.”. But in
http://en.wikipedia.org/w/index.php?title=1913_Gettysburg_reunion&oldid=581251478,
it’s “(Pennsylvania Department of Health). Retrieved 2011-02-06.”
Looks like some differences in Cite template processing. We'll
investigate and file a bug.
2. The first external link in
http://en.wikipedia.org/w/index.php?title=...From_the_Hungry_i&oldid=555958525
is “The Kingston Trio Liner Notes album entry.”, but in
http://parsoid-lb.eqiad.wikimedia.org/enwiki/...From_the_Hungry_i?oldid=555958525
it’s
“[http://www.lazyka.com/linernotes/trio_01(Guard,Rynolds,Shane)/recrdngs/LP_T1107.htm#.%20.%20.%20From%20the%20hungry%20i:
The Kingston Trio Liner Notes album entry.]”. It’s an obvious bug.
We will investigate and file a bug.
3. In
http://en.wikipedia.org/w/index.php?title=1973_CARIFTA_Games&oldid=473380600,
every table has a header row: “Event Gold Silver Bronze”. But in
http://parsoid-lb.eqiad.wikimedia.org/enwiki/1973_CARIFTA_Games?oldid=473380600,
the header row disappears.
Bug 53139 (https://bugzilla.wikimedia.org/show_bug.cgi?id=53139) --
duplicates (53927, 57266). We'll probably have to tackle this sooner
rather than later.
4. In
http://en.wikipedia.org/w/index.php?title=Airdisco_Phi-Phi&oldid=551648808,
there is a table on the right: “Phi-Phi … Number built 1”. But it
disappears in
http://parsoid-lb.eqiad.wikimedia.org/enwiki/Airdisco_Phi-Phi?oldid=551648808.
Related to Bug 53139.
5. The figcaption is not displayed in Wikipedia but is displayed in Parsoid.
Example 1: see “Breg , the old part of Novo Mesto along the Krka
River” in
http://parsoid-lb.eqiad.wikimedia.org/enwiki/%C5%A0entjo%C5%A1t,_Novo_Mesto?oldid=542922305,
it does not exist in
http://en.wikipedia.org/w/index.php?title=%C5%A0entjo%C5%A1t,_Novo_Mesto&oldid=542922305.
Example 2: “T-6 Texan IIs over Columbus Mississippi” appears twice in
http://parsoid-lb.eqiad.wikimedia.org/enwiki/14th_Operations_Group?oldid=572478542
but only once in
http://en.wikipedia.org/w/index.php?title=14th_Operations_Group&oldid=572478542.
There are various image parsing bugs in Bugzilla (Marc pasted the URL
in an earlier email) that we haven't gotten around to fixing yet.
6. Links such as “[1] [2] ...” in text or references disappear in Parsoid
HTML. Example 1: see “[1] [2] [3] [4]” in
http://en.wikipedia.org/w/index.php?title=1982_PBA_Open_Conference&oldid=582521559,
it disappears in
http://parsoid-lb.eqiad.wikimedia.org/enwiki/1982_PBA_Open_Conference?oldid=582521559.
Example 2: “[1]” in
http://en.wikipedia.org/w/index.php?title=2008%E2%80%9309_Barnsley_F.C._season&oldid=561135626,
disappears in
http://parsoid-lb.eqiad.wikimedia.org/enwiki/2008%E2%80%9309_Barnsley_F.C._season?oldid=561135626.
We'll investigate and file a bug.
Other different patterns with examples:
1. http://en.wikipedia.org/w/index.php?title=$pent&oldid=535219749
has a table of contents, but
http://parsoid-lb.eqiad.wikimedia.org/enwiki/$pent?oldid=535219749 does not.
2. http://en.wikipedia.org/w/index.php?title=$pent&oldid=535219749
has an “[edit]” link after each section heading, but
http://parsoid-lb.eqiad.wikimedia.org/enwiki/$pent?oldid=535219749 does not.
3. The sign “^ ” in references of
http://en.wikipedia.org/w/index.php?title=$pent&oldid=535219749 is
replaced with “↑” in
http://parsoid-lb.eqiad.wikimedia.org/enwiki/$pent?oldid=535219749.
4. The superscripts “a b c d” etc. in the references of
http://en.wikipedia.org/w/index.php?title=%C3%87a_plane_pour_moi&oldid=582236844
are replaced with “{num}.0 {num}.1 {num}.2 {num}.3” etc. in
http://parsoid-lb.eqiad.wikimedia.org/enwiki/%C3%87a_plane_pour_moi?oldid=582236844
Parsoid doesn't generate a table of contents or edit links yet. We may not
generate edit links in Parsoid at all and may instead rely on JS to render
them. As for the latter two, we are thinking of handling wiki-specific styles
with CSS/JS rather than generating different HTML for different
rendering styles, so that core Parsoid code is not cluttered with these
stylistic differences, which are not really core parse output issues.
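Until such styling is in place, a comparison script can treat these as expected differences rather than bugs. The following is a minimal sketch, assuming the comparison is done on extracted plain text; the function name is hypothetical, and it only covers the “[edit]” links and the “^” versus “↑” backlink marker described above.

```python
import re

# Patterns for the presentation-only differences listed above.
EDIT_LINK = re.compile(r"\[edit\]")
WHITESPACE = re.compile(r"\s+")


def normalize_for_comparison(text):
    """Strip known stylistic differences before comparing two renderings.

    Hypothetical helper for illustration; it is not part of Parsoid.
    """
    # Wikipedia HTML shows "[edit]" after section headings; Parsoid HTML does not.
    text = EDIT_LINK.sub("", text)
    # Parsoid HTML uses an up arrow where Wikipedia HTML uses a caret.
    text = text.replace("↑", "^")
    # Collapse whitespace so layout differences do not count as mismatches.
    return WHITESPACE.sub(" ", text).strip()


if __name__ == "__main__":
    wikipedia_text = "History[edit]\n^ Smith 2001"
    parsoid_text = "History\n↑ Smith 2001"
    print(normalize_for_comparison(wikipedia_text) == normalize_for_comparison(parsoid_text))
```

Differences that survive this normalization, such as a missing table header row, are more likely to be real parsing bugs.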
5. The audio player component may differ between
http://en.wikipedia.org/w/index.php?title=%C3%89tincelles_(Moszkowski)&oldid=555997335
(See Problems playing this file?) and
http://parsoid-lb.eqiad.wikimedia.org/enwiki/%C3%89tincelles_(Moszkowski)?oldid=555997335.
I haven't looked closely, but this could be Bug 49896
(https://bugzilla.wikimedia.org/show_bug.cgi?id=49896) and is on our
list of things to fix.
Subbu.
_______________________________________________
Wikitext-l mailing list
https://lists.wikimedia.org/mailman/listinfo/wikitext-l