http://nagoya.apache.org/bugzilla/show_bug.cgi?id=1950

*** shadow/1950 Sat Jul 21 18:46:54 2001
--- shadow/1950.tmp.20340       Sun Jul 22 00:37:50 2001
***************
*** 138,141 ****
    in = new InputSource(fis);
  
  and all should be fine.
! Gary
--- 138,183 ----
    in = new InputSource(fis);
  
  and all should be fine.
! Gary
! 
! ------- Additional Comments From [EMAIL PROTECTED]  2001-07-22 00:37 -------
! Hi Gary,
! 
! Thanks for the advice. However, I tried FileInputStream before, and the BIG5 
! characters were lost, as shown below (I also added a few lines to print out 
! the byte codes of the characters).
! 
! $ java JaxpTest2 big5.xml testing.xsl
! Transformation ....
! 
!      And the value of bbb is: ? ? ? ?
! Result in bytes:
! 10 32 32 32 32 32 65 110 100 32 116 104 101 32 118 97 108 117 101 32 111 102 32 
! 98 98 98 32 105 115 58 32 63 32 63 32 63 32 63
! 
! Whereas if I use an InputStreamReader and specify the attribute 
! disable-output-escaping="yes" in the XSL, I get the following.
! 
! $ java JaxpTest2 big5.xml testing2.xsl
! Transformation ....
! 
!      And the value of bbb is: ­» ´ä ©~ ¥Á
! Result in bytes:
! 10 32 32 32 32 32 65 110 100 32 116 104 101 32 118 97 108 117 101 32 111 102 32 
! 98 98 98 32 105 115 58 32 -83 -69 32 -76 -28 32 -87 126 32 -91 -63
! 
! You will notice that each BIG5 Chinese character output here consists of two 
! bytes, most of them negative. This is because Java bytes are signed, so byte 
! values above 127 print as negative numbers. I think the HTML serializer (and 
! the XHTML serializer as well) of Xalan treated these characters as special 
! characters and escaped them. 
! 
! Using InputStreamReader to read in the XML file doesn't seem to be a problem, 
! at least in a Linux environment. I can serialize the DOM document or use the 
! XPathAPI and get the result back without any problem. 
! 
! Come to think of it, maybe this is not a problem but a feature of Xalan: 
! escaping the characters above 127. The obvious solution is for us in this part 
! of the world to use Unicode instead of the double-byte character sets (DBCS) 
! that are currently popular for CJK (Chinese, Japanese and Korean) languages. 
! But for various, sometimes non-technical, reasons, this is unlikely to happen 
! in the future.
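
The negative numbers in the second byte dump can be checked with a few lines
of Java. The sketch below is only an illustration and is not part of
JaxpTest2: the class name Big5ByteDump and its helper methods are
hypothetical. It masks each signed byte with 0xFF to recover the unsigned
value and, if the running JVM happens to ship a Big5 charset (an assumption,
so it is checked at runtime), decodes the bytes as Big5.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of the original test program.
class Big5ByteDump {
    // The eleven bytes that follow "is: " in the second run, as Java prints them.
    static final byte[] TAIL = { -83, -69, 32, -76, -28, 32, -87, 126, 32, -91, -63 };

    /** Converts a signed Java byte (-128..127) to its unsigned value (0..255). */
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    /** Formats the bytes as space-separated, two-digit uppercase hex. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", toUnsigned(b)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints: AD BB 20 B4 E4 20 A9 7E 20 A5 C1
        System.out.println(toHex(TAIL));

        // Big5 is an optional charset, so only decode when the JVM provides it.
        if (Charset.isSupported("Big5")) {
            String text = new String(TAIL, Charset.forName("Big5"));
            System.out.println("Decoded as Big5: " + text);
        }
    }
}
```

The hex form shows four two-byte sequences separated by spaces (0x20). Note
that the trail byte of the third sequence is 0x7E (126), so not every byte of
a BIG5 character is above 127.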
