Hi, Brian.

Brian Minchau/Toronto/[EMAIL PROTECTED] wrote on 2005-01-13 12:30:52 AM:
> When this class is not available it looks like it exposes a
> configuration error in Xalan in its Encodings.properties file in the
> org.apache.xml.serializer package.  It has information for the Turkish
> characters in lines like this:
>   ISO8859_9 ISO-8859-9 0x00FF
>   ISO8859-9 ISO-8859-9 0x00FF
> The third word on the line, 0x00FF, indicates the code point of the
> highest value used in the character set.  In base 10 this value is 255.
> But these Turkish characters are 287, 350, 304, which are bigger than
> 255.  When writing the characters to the output file, the serializer
> thinks the Unicode characters are out of range because they are larger
> than the supposed maximum code point value. So the serializer converts
> them to numeric character references, e.g. the characters &#304;
> rather than the single Unicode character with a code point of 304.
> 
> At this point I'm not sure what the correct maximal code point value is
> for this character set, but I think that getting the value right might
> fix your problem.

     I think there is no single maximal code point value that will solve 
this problem.  The characters in ISO-8859-9 map to Unicode code points 
that are discontiguous, which means that the serializer actually needs to 
be able to describe ranges of code points that are representable.  For 
instance, the Unicode character U+00DD (LATIN CAPITAL LETTER Y WITH ACUTE) 
falls into the range currently permitted by the single maximal code point 
value, but it's not actually representable in ISO-8859-9 - and as you've 
noted, other characters that are representable are outside the maximal 
code point value currently specified.
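
     To illustrate the mismatch, here is a minimal sketch that compares 
the single-threshold test with asking the JDK's own encoder.  The class 
name Latin5Check and the constant MAX_CODE_POINT are made up for this 
example and are not part of Xalan; the sketch also assumes the JVM ships 
an ISO-8859-9 charset (Charset.forName throws UnsupportedCharsetException 
otherwise).  It is not how the serializer is implemented, only a way to 
see which characters each test accepts.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

class Latin5Check {
    // Hypothetical stand-in for the third field of the
    // Encodings.properties entry quoted above (0x00FF).
    static final int MAX_CODE_POINT = 0x00FF;

    // The single-threshold test: anything at or below the maximum
    // is assumed to be writable as-is.
    public static boolean inRangeByMax(char c) {
        return c <= MAX_CODE_POINT;
    }

    // Ask the JDK's ISO-8859-9 encoder whether it can encode the character.
    public static boolean representable(char c) {
        CharsetEncoder encoder = Charset.forName("ISO-8859-9").newEncoder();
        return encoder.canEncode(c);
    }

    public static void main(String[] args) {
        // U+00DD is the character mentioned above; U+011F, U+0130 and
        // U+015E are the code points 287, 304 and 350 from the quoted note.
        char[] samples = { '\u00DD', '\u011F', '\u0130', '\u015E' };
        for (char c : samples) {
            System.out.printf("U+%04X byMax=%b encoder=%b%n",
                    (int) c, inRangeByMax(c), representable(c));
        }
    }
}
```

     I would expect the two tests to disagree on all four samples: U+00DD 
passes the threshold but is rejected by the encoder, while the three 
Turkish characters fail the threshold but are accepted by the encoder.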

     See [1] for another example.

Thanks,

Henry
[1] http://issues.apache.org/jira/browse/XALANJ-1866
------------------------------------------------------------------
Henry Zongaro      Xalan development
IBM SWS Toronto Lab   T/L 969-6044;  Phone +1 905 413-6044
mailto:[EMAIL PROTECTED]

