On Monday 02 June 2008, Jonathan Kew wrote:
> On 2 Jun 2008, at 9:51 pm, Ross Moore wrote:
> > Q: How do I convert an unpaired UTF-16 surrogate to UTF-8?
> >
> > A different issue arises if an unpaired surrogate is encountered when
> > converting ill-formed UTF-16 data. By representing such an unpaired
> > surrogate on its own as a 3-byte sequence, the resulting UTF-8 data
> > stream would become ill-formed. While it faithfully reflects the
> > nature of the input, Unicode conformance requires that encoding form
> > conversion always results in a valid data stream. Therefore a converter
> > must treat this as an error. [AF]
> >
> > So yes, throwing an error is recommended; but, IMHO dropping the
> > character and continuing as far as possible would be a friendly thing
> > to do, as part of how the error is presented.
>
> Simply dropping it is a bad thing; replacing it with U+FFFD
> REPLACEMENT CHARACTER is better.
>
> > Now here's my concern about that conversion formula:
> >
> >     Unicode uu = (u[i] & 0x3ff) << 10 | (u[i+1] & 0x3ff) | 0x10000;
> >
> > With | being bitwise 'or', doesn't this convert <d840 dc00> to
> > 0x10000 when the correct result is 0x20000?
>
> You're right. Good catch; I should have noticed that too.
>
> > Thus this formula works correctly for Plane 1 characters only, and
> > not for higher planes.
>
> [...]
>
> > A surrogate pair denotes the code point
> >
> >     10000 + (H - D800) × 400 + (L - DC00)
> > where H and L are the hex values of the high and low surrogates
> > respectively.
> >
> > Doesn't this translate into the following?
> >
> >     Unicode uu = ((u[i] & 0x7ff) << 10 ) + (u[i+1] & 0x3ff) +
> >                  0x10000;
>
> The expression (u[i] & 0x3ff) in the original is fine; the only
> problem is that it should ADD the 0x10000, not OR it with the rest of
> the value. (The 0x400 bit can never be set on a high surrogate; if it
> were, it would have been outside the range D800..DBFF.)
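
For reference, the correction discussed above can be sketched as a standalone C function. This is only an illustration, not the actual patch: the `Unicode` typedef, the helper name `decode_utf16`, and the choice to return U+FFFD for an unpaired surrogate (rather than reporting an error) are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Unicode;  /* assumption: a 32-bit code point type */

/* Hypothetical helper: decode one code point starting at u[*i] and
 * advance *i past the UTF-16 code units that were consumed. */
static Unicode decode_utf16(const uint16_t *u, size_t len, size_t *i)
{
    assert(*i < len);  /* caller must not start past the end */
    uint16_t hi = u[(*i)++];

    if (hi >= 0xD800 && hi <= 0xDBFF) {           /* high surrogate */
        if (*i < len && u[*i] >= 0xDC00 && u[*i] <= 0xDFFF) {
            uint16_t lo = u[(*i)++];
            /* ADD 0x10000; OR-ing it in loses the carry for planes > 1 */
            return (((Unicode)(hi & 0x3ff) << 10) | (lo & 0x3ff)) + 0x10000;
        }
        return 0xFFFD;  /* unpaired high surrogate */
    }
    if (hi >= 0xDC00 && hi <= 0xDFFF)
        return 0xFFFD;  /* unpaired low surrogate */

    return hi;          /* ordinary BMP code unit */
}
```

With this version <d840 dc00> decodes to 0x20000, since (0xd840 & 0x3ff) << 10 is already 0x10000 and the offset is added on top of it instead of being OR-ed into the same bit.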

Good that we caught all that. Koji, can you provide an updated patch?

Albert

>
> JK
_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler
