Re: Xalan-C + ICU: windows-1251 encoding troubles

David N Bertoni/CAM/Lotus 24 Jan 2002 18:01:17 -0000

This is not surprising.  Currently, Xalan-C has a pretty brain-dead
algorithm for determining whether to write the actual character or a
numeric character reference.  The problem is that checking each character
against the output encoding is horribly expensive, so in many cases we just
punt and write a numeric character reference for anything > 256.
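To make the behaviour concrete, here is a minimal sketch of a threshold-based
decision of this kind.  It is not Xalan-C source: the function name
escapeForOutput and the cutoff parameter are hypothetical, and the cutoff of
256 is only an approximation of the "> 256" rule of thumb above.

```cpp
#include <string>

// Hypothetical illustration, NOT the actual Xalan-C code.
// Code points below `cutoff` are written as a single raw byte; everything
// else is written as a decimal numeric character reference, without ever
// asking whether the output encoding could represent the character.
std::string escapeForOutput(const std::u32string& text, char32_t cutoff = 256)
{
    std::string out;
    for (char32_t ch : text) {
        if (ch < cutoff) {
            out += static_cast<char>(ch);
        } else {
            out += "&#";
            out += std::to_string(static_cast<unsigned long>(ch));
            out += ';';
        }
    }
    return out;
}
```

With a rule like this, a Cyrillic letter such as U+0422 always comes out as
&#1058;, even when the output encoding (windows-1251 here) has a byte for it.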


I'd like to do something about it, but it's not the highest priority right
now.  Unless you can actually determine that we're not emitting the correct
numeric character reference, there's nothing wrong with doing it the way
we're doing it.  You can always post-process the file yourself if you
object to the references.
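
As a rough sketch of that kind of post-processing, the following replaces
decimal references with windows-1251 bytes.  The names toWindows1251 and
replaceReferences are made up for illustration, and it only covers the basic
Russian letters (U+0410 to U+044F, plus U+0401 and U+0451); any other
reference, including hexadecimal ones, is left untouched.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: maps a code point to its windows-1251 byte.
// Returns false for anything this sketch does not cover.
bool toWindows1251(unsigned long cp, char& byte)
{
    if (cp >= 0x0410 && cp <= 0x044F) {          // A..ya map to 0xC0..0xFF
        byte = static_cast<char>(0xC0 + (cp - 0x0410));
        return true;
    }
    if (cp == 0x0401) { byte = static_cast<char>(0xA8); return true; }  // IO
    if (cp == 0x0451) { byte = static_cast<char>(0xB8); return true; }  // io
    return false;
}

// Replaces "&#NNNN;" with a single windows-1251 byte where possible.
std::string replaceReferences(const std::string& in)
{
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (in.compare(i, 2, "&#") == 0) {
            std::size_t j = i + 2;
            unsigned long cp = 0;
            // Parse decimal digits; stop early on absurdly large values.
            while (j < in.size() && in[j] >= '0' && in[j] <= '9' && cp < 0x110000) {
                cp = cp * 10 + static_cast<unsigned long>(in[j] - '0');
                ++j;
            }
            char byte;
            if (j > i + 2 && j < in.size() && in[j] == ';' && toWindows1251(cp, byte)) {
                out += byte;
                i = j + 1;
                continue;
            }
        }
        out += in[i++];   // not a reference we handle: copy through unchanged
    }
    return out;
}
```

Leaving unknown references alone is deliberate: they are still correct, and
references to markup characters such as &#60; must not be expanded anyway.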

I'll bump this up on the list of things to work on for the next release.

Dave



                                                                                
                                                      
"Dimitry Chernyshov" <[EMAIL PROTECTED]n.ru>
To:      <[email protected]>
cc:      (bcc: David N Bertoni/CAM/Lotus)
Subject: Xalan-C + ICU: windows-1251 encoding troubles
01/24/2002 08:32 AM



Hi!

After I rebuilt Xalan-c1_3 + Xerces-c1_6_0 + ICU 2.0, Xalan works well with
different encodings.
However, I've encountered one rather strange problem.

If an XSL file has xsl:output encoding set to "windows-1251" (<xsl:output
method="html" encoding="windows-1251"/>), then when transforming some XML the
result contains numeric character references instead of the characters
themselves. For example:

<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=windows-1251">
<title>&#1058;&#1080;&#1087;&#1072; &#1074;&#1080;&#1085;&#1076;&#1099;
&#1073;&#1083;&#1080;&#1085;!</title>

However, if the source XML has "windows-1251" encoding and the XSL file has
its encoding set to, say, KOI8-R, everything works just fine: Xalan (ICU, I
guess) converts windows-1251 to KOI8-R correctly...

Any thoughts?

Thanks in advance,
Dimitry Chernyshov,
Technology Group Managing Director,
Polar Design
--------------------------
[EMAIL PROTECTED]
http://www.polardesign.com
phone/fax: +7 (095) 363 0708
