[Bug 27773] Length of dump text and length field in API do not match

2011-03-30 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=27773

--- Comment #5 from Ariel T. Glenn 2011-03-30 14:25:40 UTC ---
(Yes, the XML files have <text xml:space="preserve"> in them.)

I had a look at the output we get from ExternalStore::fetchFromURL().

The text we get back has a newline after the final parenthesis.

That text is 8884 bytes long, which matches the rev_len recorded in the
revision table and in the XML dump file.  When I apply the various conversions
for & < > " and strip the ^Ms, I get the byte count of the text entry in the XML
file: 8930.

When I do the same conversions for the JSON format (for " \r \n and /), I come
up with 9160, one byte longer than the actual JSON output text, 9159.  My
conclusion is that the JSON formatter, or perhaps the API generally, loses that
newline at the end.
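
The byte counting described above can be sketched in Python. This is only an illustration: the two rule sets are assumptions taken from the characters listed in this comment, not the actual dump writer or API formatter code, and the sample string is hypothetical rather than the real revision text. Under these assumed rules a trailing newline accounts for one byte in the raw and XML forms and two bytes once it is escaped as \n in JSON, so the size of a gap depends on which form is measured.

```python
def xml_escaped_length(text: str) -> int:
    # Assumed XML rules: strip carriage returns, then escape & < > " as entities.
    converted = (
        text.replace("\r", "")
        .replace("&", "&amp;")  # escape & first so later entities are not re-escaped
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace('"', "&quot;")
    )
    return len(converted.encode("utf-8"))


def json_escaped_length(text: str) -> int:
    # Assumed JSON rules: backslash-escape only the four characters named above.
    table = {'"': '\\"', "\r": "\\r", "\n": "\\n", "/": "\\/"}
    converted = "".join(table.get(ch, ch) for ch in text)
    return len(converted.encode("utf-8"))


# Hypothetical sample text ending in a newline, as the stored text does.
stored = 'See "Anarchy/Talk" & <more>\r\n(merged into this version)\n'
trimmed = stored.rstrip("\n")  # what a formatter that drops the newline would emit

raw_gap = len(stored.encode("utf-8")) - len(trimmed.encode("utf-8"))
xml_gap = xml_escaped_length(stored) - xml_escaped_length(trimmed)
json_gap = json_escaped_length(stored) - json_escaped_length(trimmed)

print(raw_gap, xml_gap, json_gap)
```

Comparing the lengths with and without the final newline in each form is a quick way to see whether a lost newline alone could explain a given mismatch.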

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are on the CC list for the bug.

___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 27773] Length of dump text and length field in API do not match

2011-03-29 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=27773

--- Comment #4 from Aaron Halfaker 2011-03-29 17:14:28 UTC ---
Anarchism(12)
RevisionId: 233194

From the 2010-01-30 XML dump at the end of the 233194 revision (notice the line
breaks before the closing </text> tag)

[...]
/Talk 
/Todo
[[Anarchy/Talk]] [http://www.wikipedia.com/wiki.cgi?action=history&id=Anarchy Anarchy History] (The content of Anarchy and Anarchism have since been merged into this version)

From the API
(http://en.wikipedia.org/w/api.php?action=query&prop=revisions&revids=233194&rvprop=content&format=jsonfm)
(notice that the string ends right after the last non-whitespace character)

{
    "query": {
        "pages": {
            "12": {
                "pageid": 12,
                "ns": 0,
                "title": "Anarchism",
                "revisions": [
                    {
                        "*": "''Anarchism'' is (The content of Anarchy and Anarchism have since been merged into this version)"
                    }
                ]
            }
        }
    }
}
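
A rough way to check whether two copies of a revision differ only by trailing whitespace, sketched in Python. The XML and JSON snippets are shortened, hypothetical stand-ins for the dump and API output quoted above, and lost_trailing_whitespace is an illustrative helper, not part of MediaWiki.

```python
import json
import xml.etree.ElementTree as ET
from typing import Optional


def lost_trailing_whitespace(dump_text: str, api_text: str) -> Optional[str]:
    # Return the whitespace suffix present only in dump_text, or None when the
    # two strings are equal or differ in some other way.
    if dump_text != api_text and dump_text.startswith(api_text):
        suffix = dump_text[len(api_text):]
        if suffix.strip() == "":
            return suffix
    return None


# Hypothetical, shortened stand-ins for the real dump and API responses.
dump_xml = '<text xml:space="preserve">(merged into this version)\n\n</text>'
api_json = '{"revisions": [{"*": "(merged into this version)"}]}'

dump_text = ET.fromstring(dump_xml).text
api_text = json.loads(api_json)["revisions"][0]["*"]

lost = lost_trailing_whitespace(dump_text, api_text)
print(repr(lost))
```

Returning the suffix itself, rather than a boolean, makes it easy to see exactly which characters went missing.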

[Bug 27773] Length of dump text and length field in API do not match

2011-03-27 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=27773

--- Comment #3 from Ariel T. Glenn 2011-03-27 13:34:59 UTC ---
I would like a specific page ID, revision ID and dump file to look at, if
someone can point me to one.



[Bug 27773] Length of dump text and length field in API do not match

2011-03-05 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=27773

Roan Kattouw changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |roan.katt...@gmail.com

--- Comment #2 from Roan Kattouw 2011-03-05 20:48:06 UTC ---
(In reply to comment #1)
> Are there any specific examples?
> 
> Are whitespace mismatches due to problems parsing the way whitespace is encoded
> in the XML, or due to the XML dumps actually containing incorrect whitespace?
> 
Do the XML dumps use the xml:space="preserve" attribute?
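
One way to check for that attribute, sketched with Python's standard xml.etree.ElementTree. The snippet is a hypothetical, simplified fragment: real dump files declare a default namespace on the root element, so the element lookup there would need a namespace-qualified name.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:space under the reserved XML namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

# Hypothetical, simplified fragment of a dump file.
snippet = (
    "<revision><id>1</id>"
    '<text xml:space="preserve">line one\n\n</text>'
    "</revision>"
)

text_elem = ET.fromstring(snippet).find("text")
preserve = text_elem.get(XML_SPACE)

print(preserve)              # value of the xml:space attribute
print(repr(text_elem.text))  # element text, trailing newlines included
```

Printing the repr of the element text shows whether trailing newlines survive parsing on the reader's side.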



[Bug 27773] Length of dump text and length field in API do not match

2011-03-04 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=27773

--- Comment #1 from Brion Vibber 2011-03-05 00:35:09 UTC ---
Are there any specific examples?

Are whitespace mismatches due to problems parsing the way whitespace is encoded
in the XML, or due to the XML dumps actually containing incorrect whitespace?

(The dumps may well contain incorrect whitespace, most likely due to
inconsistencies in parsing the previous whitespace when doing multiple passes
combining text from previous dumps with new stub dumps, etc.)



[Bug 27773] Length of dump text and length field in API do not match

2011-02-28 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=27773

Reedy changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|enhancement                 |minor



[Bug 27773] Length of dump text and length field in API do not match

2011-02-27 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=27773

Diederik van Liere changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |27772
