Java internally uses UTF-16:

"The native coded character set of the Java programming language is that of the
first seventeen planes of the Unicode version 3.0 character set; that is, it
consists in the basic multilingual plane (BMP) of Unicode version 1 plus the
next sixteen planes of Unicode version 3. This is because the language's
internal representation of characters uses the UTF-16 encoding, which encodes
the BMP directly and uses surrogate pairs, a simple escape mechanism, to encode
the other planes. Hence a charset in the Java platform defines a mapping
between sequences of sixteen-bit values in UTF-16 and sequences of bytes."
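To make the surrogate-pair mechanism from the quote concrete, here is a minimal sketch that splits a supplementary code point into its two UTF-16 code units. The class and method names (SurrogateDemo, toUtf16) are made up for illustration; only Character.toChars and the String methods are standard library calls. U+1D59F is used as the sample because it is the character this report is about.

```java
public class SurrogateDemo {
    // Returns the UTF-16 code units for one code point:
    // one char for the BMP, two chars (a surrogate pair) above it.
    static char[] toUtf16(int codePoint) {
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        int cp = 0x1D59F; // outside the BMP, so it needs a surrogate pair
        char[] units = toUtf16(cp);
        for (char u : units) {
            System.out.printf("%04X ", (int) u); // prints "D835 DD9F "
        }
        System.out.println();

        String s = new String(units);
        System.out.println(s.length());                      // 2 (UTF-16 code units)
        System.out.println(s.codePointCount(0, s.length())); // 1 (actual character)
    }
}
```

So a single character outside the BMP occupies two Java chars, which is why code that counts or indexes by char can go wrong on it.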

The file contains U+1D59F encoded in UTF-8, i.e. the bytes F0 9D 96 9F. In
binary: 11110000 10011101 10010110 10011111.
I don't see why it is being read as U+0026 (binary 100110).
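As a sanity check on the byte sequence, here is a small sketch that decodes F0 9D 96 9F two ways: through Java's own UTF-8 decoder and by masking the payload bits by hand. The class and method names (Utf8DecodeCheck, decodeWithJdk, decodeFourByteSequence) are hypothetical, and the manual decoder assumes a well-formed four-byte sequence and does no validation.

```java
import java.nio.charset.StandardCharsets;

public class Utf8DecodeCheck {
    // Decode with the JDK's UTF-8 decoder and return the first code point.
    static int decodeWithJdk(byte[] utf8) {
        String s = new String(utf8, StandardCharsets.UTF_8);
        return s.codePointAt(0);
    }

    // Manual decode of a four-byte sequence 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.
    // Assumes the input is well formed; no validation is done.
    static int decodeFourByteSequence(byte[] b) {
        return ((b[0] & 0x07) << 18)
             | ((b[1] & 0x3F) << 12)
             | ((b[2] & 0x3F) << 6)
             |  (b[3] & 0x3F);
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xF0, (byte) 0x9D, (byte) 0x96, (byte) 0x9F};
        System.out.printf("U+%X%n", decodeWithJdk(bytes));          // U+1D59F
        System.out.printf("U+%X%n", decodeFourByteSequence(bytes)); // U+1D59F
    }
}
```

Both paths give U+1D59F, so the bytes in the file are fine and a correct UTF-8 decoder should never produce U+0026 from them.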

PS: Maybe Bugzilla is storing this in MySQL as utf-8 instead of binary? MySQL's
Unicode support currently only covers the BMP.

