https://bugzilla.wikimedia.org/show_bug.cgi?id=22137

--- Comment #8 from Platonides <platoni...@gmail.com> 2010-02-12 23:48:54 UTC ---
(In reply to comment #7)
> >Java internally uses UTF-16
> Yes it does, but I think the file is interpreted as UTF-8, otherwise it
> wouldn't be able to make sense of it at all, as UTF-8 and UTF-16 look fairly
> different for your average English text (I'm under the impression that UTF-16
> is not compatible with ASCII, thus nothing would work at all if it was using
> UTF-16).

Right. But it could be overflowing the 16-bit char type (a character above U+FFFF needs two 16-bit chars, a surrogate pair), or some other failure.
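A minimal Java sketch of the two points above (the class and method names are made up for illustration): ASCII text produces identical bytes in UTF-8 but not in UTF-16, and a character above U+FFFF, such as U+1D11E, takes 4 bytes in UTF-8 and two 16-bit chars (a surrogate pair) in a Java String.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingDemo {
    // True if s encodes to the same bytes in UTF-8 as in plain ASCII.
    static boolean utf8MatchesAscii(String s) {
        return Arrays.equals(s.getBytes(StandardCharsets.UTF_8),
                             s.getBytes(StandardCharsets.US_ASCII));
    }

    // True if s encodes to the same bytes in UTF-16LE as in plain ASCII.
    static boolean utf16MatchesAscii(String s) {
        return Arrays.equals(s.getBytes(StandardCharsets.UTF_16LE),
                             s.getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        String ascii = "Hello";
        System.out.println(utf8MatchesAscii(ascii));   // true
        System.out.println(utf16MatchesAscii(ascii));  // false: each char gets a 0x00 byte

        // U+1D11E (musical G clef) lies outside the 16-bit range.
        String clef = new String(Character.toChars(0x1D11E));
        System.out.println(clef.getBytes(StandardCharsets.UTF_8).length); // 4 bytes in UTF-8
        System.out.println(clef.length());                                 // 2 chars in Java
        System.out.println(clef.codePointCount(0, clef.length()));        // but only 1 code point
        System.out.println(Character.isHighSurrogate(clef.charAt(0)));    // true
    }
}
```

Any code that assumes one char per character (or handles the 4-byte UTF-8 form incorrectly) can go wrong on exactly this kind of input.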


> >I don't see why it is reading a U+26 (100110).
> 
> The entity references that come after the problematic Unicode character are
> where the U+26 (&) comes from.
Interesting. Saving from Firefox produced a literal " in the output.
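For reference, the 100110 quoted above is simply 0x26 in binary, and every entity reference such as &quot; starts with that character. A tiny hypothetical check (class and method names are illustrative only):

```java
public class AmpersandDemo {
    // Formats a char as "U+XXXX (binary)".
    static String describe(char c) {
        return String.format("U+%04X (%s)", (int) c, Integer.toBinaryString(c));
    }

    public static void main(String[] args) {
        // An entity reference such as &quot; begins with '&', i.e. U+0026.
        String entity = "&quot;";
        System.out.println(describe(entity.charAt(0))); // U+0026 (100110)
    }
}
```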

> I'm thinking this is a bug in the underlying Java libraries, as opposed to
> mwdumper.
I also think so.
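One way to test the library theory, sketched under assumptions: feed the JDK's built-in SAX parser a small UTF-8 document containing one character above U+FFFF followed by entity references, then compare the decoded text. The input and class name are hypothetical. mwdumper's real parser setup is not shown in this thread, so a pass here would not by itself rule out the bug.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class SupplementaryParseCheck {
    // Parses a UTF-8 encoded XML string and returns all character data it reports.
    static String parseText(String xml) throws Exception {
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        StringBuilder text = new StringBuilder();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new ByteArrayInputStream(bytes), new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // SAX may deliver text in several chunks, so accumulate them.
                text.append(ch, start, length);
            }
        });
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input: a character above U+FFFF, then entity references.
        String clef = new String(Character.toChars(0x1D11E));
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><text>"
                + clef + " &quot;&amp;&quot;</text>";
        String expected = clef + " \"&\"";
        String actual = parseText(xml);
        System.out.println(actual.equals(expected) ? "OK" : "MISMATCH: " + actual);
    }
}
```

A MISMATCH here would point at the parser rather than mwdumper, since nothing else touches the bytes.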

_______________________________________________
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l