This is not surprising. Currently, Xalan-C has a pretty brain-dead
algorithm for determining whether or not to write the actual character or a
numeric character reference. The problem is that checking whether each
character can be represented in the output encoding is horribly expensive,
so in many cases we just punt on characters > 256 and emit a reference.
I'd like to do something about it, but it's not the highest priority right
now. Unless you can actually determine that we're not emitting the correct
numeric character reference, there's nothing wrong with doing it the way
we're doing it. You can always post-process the file yourself if you
object to the references.
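
To make the shortcut concrete, here is a rough sketch of that kind of heuristic. It is not the actual Xalan-C source: the function name, the use of std::u32string as the output, and the exact boundary at 256 are all assumptions made for illustration.

```cpp
#include <string>

// Hypothetical helper, NOT Xalan-C's real code. The output is modelled as a
// sequence of code points so that no real transcoding is involved.
// Assumption: anything below kDirectLimit is written as-is, everything else
// becomes a decimal numeric character reference without consulting the
// output encoding at all.
const char32_t kDirectLimit = 256;

void writeCharOrReference(char32_t ch, std::u32string& out)
{
    if (ch < kDirectLimit) {
        out.push_back(ch);              // cheap path: no per-character check
        return;
    }
    out += U"&#";                       // punt: always emit a reference
    for (char digit : std::to_string(static_cast<unsigned long>(ch))) {
        out.push_back(static_cast<char32_t>(digit));
    }
    out.push_back(U';');
}
```

With a check like this, a Cyrillic letter is always written as a reference, even when the requested encoding (windows-1251 here) could hold it directly. The output is still correct, just harder to read.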
I'll bump this up on the list of things to work on for the next release.
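
If the references bother you in the meantime, the post-processing mentioned above could look something like the following sketch. It is a hypothetical standalone helper, not part of Xalan-C or ICU, and it only handles the basic Russian letters U+0410 through U+044F, which windows-1251 places at bytes 0xC0 through 0xFF. Every other reference is copied through untouched.

```cpp
#include <cstddef>
#include <string>

// Hypothetical post-processing helper (name and scope are assumptions).
// Replaces decimal references such as "&#1058;" with the corresponding
// windows-1251 byte when the code point is in U+0410..U+044F.
std::string decodeCyrillicReferences(const std::string& in)
{
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (in.compare(i, 2, "&#") == 0) {
            std::size_t j = i + 2;
            unsigned long cp = 0;
            // Read at most 7 decimal digits so cp cannot overflow.
            while (j < in.size() && in[j] >= '0' && in[j] <= '9' && j - i < 9) {
                cp = cp * 10 + static_cast<unsigned long>(in[j] - '0');
                ++j;
            }
            const bool wellFormed = j > i + 2 && j < in.size() && in[j] == ';';
            if (wellFormed && cp >= 0x410 && cp <= 0x44F) {
                out.push_back(static_cast<char>(cp - 0x410 + 0xC0));
                i = j + 1;              // skip past the ';'
                continue;
            }
        }
        out.push_back(in[i]);           // not a reference we handle: copy as-is
        ++i;
    }
    return out;
}
```

Run over the serialized file, this would turn "&#1058;&#1080;" into the two bytes 0xD2 0xE8, while leaving things like "&amp;" or "&#169;" alone.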
Dave
"Dimitry
Chernyshov" <[EMAIL PROTECTED]>
To: <[email protected]>
cc: (bcc: David N Bertoni/CAM/Lotus)
Subject: Xalan-C + ICU: windows-1251 encoding troubles
01/24/2002 08:32 AM
Hi!
After I rebuilt Xalan-c1_3 + Xerces-c1_6_0 + ICU 2.0, Xalan works well with
different encodings.
However, I've run into one rather strange problem.
If an XSL file has xsl:output encoding set to "windows-1251" (<xsl:output
method="html" encoding="windows-1251"/>) while transforming some XML, the
result contains character codes instead of the characters themselves, e.g.:
<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=windows-1251">
<title>Типа винды блин!</title>
However, if the source XML is in "windows-1251" and the XSL file has its
output encoding set to, say, KOI8-R, everything works just fine: Xalan (ICU, I
guess) converts windows-1251 to KOI8-R correctly...
Any thoughts?
Thanks in advance,
Dimitry Chernyshov,
Technology Group Managing Director,
Polar Design
--------------------------
[EMAIL PROTECTED]
http://www.polardesign.com
phone/fax: +7 (095) 363 0708