[Bug 27773] Length of dump text and length field in API do not match
https://bugzilla.wikimedia.org/show_bug.cgi?id=27773

--- Comment #5 from Ariel T. Glenn 2011-03-30 14:25:40 UTC ---

(Yes, the XML files have in them.)

I had a look at the output we get from ExternalStore::fetchFromURL(). The text we get back has a newline after the final parenthesis. That text is 8884 bytes long, which matches the rev_len recorded in the revision table and in the XML dump file.

When I apply the various conversions for & < > " and strip the ^Ms, I get the byte count of the text entry in the XML file: 8930. When I do the same conversions for the JSON format (for " \r \n and /), I come up one byte longer, 9160, than the actual JSON output text, 9159.

My conclusion is that the JSON formatter, or perhaps the API generally, loses that newline at the end.

--
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
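The byte-count comparison described in comment #5 can be sketched roughly as follows. This is only an illustration, not the script actually used: the function names are made up, the set of escaped characters follows the comment, and the extra backslash escape in the JSON helper is an assumption added so the output stays valid JSON.

```python
from xml.sax.saxutils import escape


def xml_escaped_len(text: str) -> int:
    """Byte length after stripping CRs (^M) and escaping & < > " as XML entities."""
    stripped = text.replace("\r", "")
    # escape() handles & < > itself; the double quote is passed as an extra entity.
    return len(escape(stripped, {'"': "&quot;"}).encode("utf-8"))


def json_escaped_len(text: str) -> int:
    """Byte length after backslash-escaping " / CR and LF, JSON style.

    Escaping the backslash itself is an assumption, not taken from the comment.
    """
    out = text.replace("\\", "\\\\")  # must run first so later escapes are not doubled
    for raw, esc in (('"', '\\"'), ("/", "\\/"), ("\r", "\\r"), ("\n", "\\n")):
        out = out.replace(raw, esc)
    return len(out.encode("utf-8"))


if __name__ == "__main__":
    # Hypothetical sample text, not the real revision content.
    sample = 'a & "b" </c>\n'
    print("raw bytes:         ", len(sample.encode("utf-8")))
    print("XML-escaped bytes: ", xml_escaped_len(sample))
    print("JSON-escaped bytes:", json_escaped_len(sample))
    print("JSON-escaped bytes without trailing newline:",
          json_escaped_len(sample.rstrip("\n")))
```

Running the helpers on the stored revision text and on the text returned by each output format should show which format drops the final newline.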
[Bug 27773] Length of dump text and length field in API do not match
https://bugzilla.wikimedia.org/show_bug.cgi?id=27773

--- Comment #4 from Aaron Halfaker 2011-03-29 17:14:28 UTC ---

Anarchism(12) RevisionId: 233194

From the 2010-01-30 XML dump, at the end of the 233194 revision (notice the line breaks before the closing tag):

[...] /Talk
/Todo
[[Anarchy/Talk]] [http://www.wikipedia.com/wiki.cgi?action=history&id=Anarchy Anarchy History] (The content of Anarchy and Anarchism have since been merged into this version)

From the API (http://en.wikipedia.org/w/api.php?action=query&prop=revisions&revids=233194&rvprop=content&format=jsonfm) (notice that the string ends right after the last non-whitespace character):

{
  "query": {
    "pages": {
      "12": {
        "pageid": 12,
        "ns": 0,
        "title": "Anarchism",
        "revisions": [
          {
            "*": "''Anarchism'' is (The content of Anarchy and Anarchism have since been merged into this version)"
          }
        ]
      }
    }
  }
}
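A check like the one in comment #4 could be scripted along these lines. This is a sketch under assumptions: the helper names are hypothetical, `format=json` is used in place of the `jsonfm` pretty-printed variant quoted above, and the response layout is taken from the JSON shown in the comment. No request is sent here; the functions only build the URL and inspect an already-parsed response.

```python
from urllib.parse import urlencode

API_URL = "http://en.wikipedia.org/w/api.php"  # endpoint quoted in the comment


def revision_query_url(rev_id: int) -> str:
    """Build the query URL for one revision's content (format=json is an assumption)."""
    params = {
        "action": "query",
        "prop": "revisions",
        "revids": rev_id,
        "rvprop": "content",
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def revision_text(response: dict) -> str:
    """Pull the revision text out of a parsed response shaped like the one above."""
    page = next(iter(response["query"]["pages"].values()))
    return page["revisions"][0]["*"]


def trailing_whitespace(text: str) -> str:
    """Return whatever whitespace follows the last non-whitespace character."""
    return text[len(text.rstrip()):]


if __name__ == "__main__":
    # Hypothetical, shortened stand-in for a parsed API response.
    sample = {"query": {"pages": {"12": {"pageid": 12,
              "revisions": [{"*": "(merged into this version)"}]}}}}
    print(revision_query_url(233194))
    print(repr(trailing_whitespace(revision_text(sample))))
```

Comparing `trailing_whitespace()` for the dump text and for the API text makes the missing final newline visible without eyeballing the raw output.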
[Bug 27773] Length of dump text and length field in API do not match
https://bugzilla.wikimedia.org/show_bug.cgi?id=27773

--- Comment #3 from Ariel T. Glenn 2011-03-27 13:34:59 UTC ---

I would like a specific page ID, revision ID and dump file to look at, if someone can point me to one.
[Bug 27773] Length of dump text and length field in API do not match
https://bugzilla.wikimedia.org/show_bug.cgi?id=27773

Roan Kattouw changed:

What | Removed | Added
CC   |         | roan.katt...@gmail.com

--- Comment #2 from Roan Kattouw 2011-03-05 20:48:06 UTC ---

(In reply to comment #1)
> Are there any specific examples?
>
> Are whitespace mismatches due to problems parsing the way whitespace is encoded
> in the XML, or due to the XML dumps actually containing incorrect whitespace?

Do the XML dumps use the xml:space="preserve" attribute?
[Bug 27773] Length of dump text and length field in API do not match
https://bugzilla.wikimedia.org/show_bug.cgi?id=27773

--- Comment #1 from Brion Vibber 2011-03-05 00:35:09 UTC ---

Are there any specific examples?

Are whitespace mismatches due to problems parsing the way whitespace is encoded in the XML, or due to the XML dumps actually containing incorrect whitespace?

(The dumps may well contain incorrect whitespace, most likely due to inconsistencies in parsing the previous whitespace when doing multiple passes combining text from previous dumps with new stub dumps, etc.)
[Bug 27773] Length of dump text and length field in API do not match
https://bugzilla.wikimedia.org/show_bug.cgi?id=27773

Reedy changed:

What     | Removed     | Added
Severity | enhancement | minor
[Bug 27773] Length of dump text and length field in API do not match
https://bugzilla.wikimedia.org/show_bug.cgi?id=27773

Diederik van Liere changed:

What   | Removed | Added
Blocks |         | 27772