https://bugzilla.wikimedia.org/show_bug.cgi?id=22137
--- Comment #8 from Platonides <[email protected]> 2010-02-12 23:48:54 UTC ---
(In reply to comment #7)
> >Java internally uses UTF-16
> yes it does, but i think the file is interpreted as utf-8, otherwise it
> wouldn't be able to make sense of it at all, as utf-8 and utf-16 look fairly
> different for your average english text (I'm under the impression that utf-16
> is not compatible with ASCII thus nothing would work at all if it was using
> utf-16).

Right. But it could be overflowing the 16-bit range, or failing in some other way.

> >I don't see why it is reading a U+26 (100110).
>
> The entity references that come after the problematic unicode character are
> where the U+26 (&) comes from.

Interesting. Saving from Firefox produced a literal " in the output.

> I'm thinking this is a bug with the underlying java libraries, as opposed to
> mwdumper

I also think so.

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching all bug changes.

_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
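The thread leans on two encoding facts: UTF-8 is byte-identical to ASCII for plain English text while UTF-16 is not, and a character above U+FFFF does not fit in a single 16-bit Java `char`. The sketch below only illustrates those two facts; it does not reproduce bug 22137. The class name `EncodingDemo`, its helpers `hex` and `units`, and the sample character U+1D11E (MUSICAL SYMBOL G CLEF) are illustrative choices, not the character or code path from the report.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: illustrates the encoding facts discussed above,
// not mwdumper's actual code path.
public class EncodingDemo {

    // Render bytes as space-separated unsigned hex pairs, e.g. "41 26 42".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Render the UTF-16 code units (Java chars) of a string as hex, e.g. "D834 DD1E".
    static String units(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 1. Plain ASCII text: the UTF-8 bytes equal the ASCII bytes,
        //    while UTF-16 adds a zero byte to every character.
        String ascii = "A&B";
        System.out.println("UTF-8    : " + hex(ascii.getBytes(StandardCharsets.UTF_8)));    // 41 26 42
        System.out.println("UTF-16BE : " + hex(ascii.getBytes(StandardCharsets.UTF_16BE))); // 00 41 00 26 00 42

        // 2. '&' is U+0026, which is 100110 in binary.
        System.out.println("'&' in binary: " + Integer.toBinaryString('&')); // 100110

        // 3. A code point above U+FFFF overflows one 16-bit char:
        //    Java stores it as a surrogate pair, i.e. two chars.
        //    U+1D11E is an arbitrary example, not the character from the bug.
        String clef = new String(Character.toChars(0x1D11E));
        System.out.println("chars        : " + clef.length());                             // 2
        System.out.println("code points  : " + clef.codePointCount(0, clef.length()));     // 1
        System.out.println("UTF-16 units : " + units(clef));                               // D834 DD1E
        System.out.println("UTF-8 bytes  : " + hex(clef.getBytes(StandardCharsets.UTF_8))); // F0 9D 84 9E
    }
}
```

A parser that assumes one `char` per character can split such a surrogate pair, which is one way "overflowing the 16-bit" could corrupt the text that follows it; whether that is what happens here is not established in the thread.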
