Re: Is this a bug? (numbered entity references in produced HTML)

david_n_bertoni Tue, 01 Feb 2005 09:02:23 -0800

> Problem: HTML formatter writes numbered entity references
> instead of characters in output encoding (specified in
> "xsl:output" tag), despite the fact that output encoding
> supports these characters.


It shouldn't be a problem for display, because any web browser should 
render a numeric character reference as the correct character.

> Formatter writes numbered entity reference for the
> character if the character is greater than the maximum
> character for the encoding (m_maxCharacter), which is
> always 0x7F for non-standard encodings
> (XalanTranscodingServices::getMaximumCharacterValue).
> This makes HTML documents incredibly large when a custom
> encoding, added with XMLTransService::addEncoding(), is
> used. Produced documents contain a numbered entity reference
> for every locale-specific character, because they all have
> codes > 0x007F.

Yes, this is a known problem with the design of the serializers.  I 
started working on this about a year ago, but it has not been a high 
priority, because very few people have complained about it.  If the size 
of the generated files is a problem, you might want to choose UTF-8 as 
the output encoding.  Technically, XSLT processors are only required to 
support UTF-8 and UTF-16, and fixing this has a potentially significant 
performance impact on serialization, because it requires that we look up 
every character to determine whether the target encoding can represent it.
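
To make the trade-off concrete, here is a rough sketch of the two 
strategies.  This is not the Xalan-C++ serializer code: every name in it 
(appendCharRef, serializeWithThreshold, serializeWithLookup, CanRepresent) 
is hypothetical, and it ignores markup escaping and the actual 
transcoding step.  It only contrasts a single "maximum character" 
comparison with a per-character lookup.

```cpp
#include <functional>
#include <string>

// Hypothetical sketch only; none of these names exist in Xalan-C++.

// Predicate answering "can the target encoding represent this character?"
using CanRepresent = std::function<bool(char32_t)>;

// Append a decimal numeric character reference, e.g. "&#1071;".
void appendCharRef(std::u32string& out, char32_t ch)
{
    const std::string digits = std::to_string(static_cast<unsigned long>(ch));
    out += U"&#";
    for (char d : digits)
        out += static_cast<char32_t>(d);
    out += U';';
}

// Strategy described in the report: one comparison per character against
// a maximum value.  With a maximum of 0x7F, every non-ASCII character
// becomes a reference, whether or not the encoding could represent it.
std::u32string serializeWithThreshold(const std::u32string& text,
                                      char32_t maxCharacter)
{
    std::u32string out;
    for (char32_t ch : text) {
        if (ch > maxCharacter)
            appendCharRef(out, ch);
        else
            out += ch;
    }
    return out;
}

// Possible fix: ask the encoding about every character.  The predicate
// call in the inner loop is the per-character lookup cost.
std::u32string serializeWithLookup(const std::u32string& text,
                                   const CanRepresent& canRepresent)
{
    std::u32string out;
    for (char32_t ch : text) {
        if (canRepresent(ch))
            out += ch;
        else
            appendCharRef(out, ch);
    }
    return out;
}
```

With a threshold of 0x7F, a Cyrillic letter such as U+042F is written as 
a seven-character reference, while a lookup that knows the target 
encoding covers U+0410 through U+044F would write it unchanged.  The 
price is one predicate call per character instead of one integer 
comparison.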

> If this is a bug, can somebody register it? I couldn't do
> this through JIRA web interface.

As long as you're registered, you should be able to create a bug report.

Dave
