[Bug 1950] - Transformation encoding problem with DOMSource

bugzilla Sun, 22 Jul 2001 01:09:02 -0700
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=1950

*** shadow/1950 Sun Jul 22 00:37:50 2001
--- shadow/1950.tmp.20424       Sun Jul 22 01:34:10 2001
***************
*** 180,183 ****
  the world to use Unicode, instead of double byte character sets (DBCS) as 
  currently popular for CJK (Chinese, Japanese and Korean) languages. But for 
  various, sometimes non-technical reasons, this is unlikely to happen in the 
! future.
--- 180,237 ----
  the world to use Unicode, instead of double byte character sets (DBCS) as 
  currently popular for CJK (Chinese, Japanese and Korean) languages. But for 
  various, sometimes non-technical reasons, this is unlikely to happen in the 
! future.
! 
! ------- Additional Comments From [EMAIL PROTECTED]  2001-07-22 01:34 -------
! Terence -
! First, try running this from the command line; you should see that 
! everything is okay:
!   java org.apache.xalan.xslt.Process -in big5.xml -xsl testing.xsl -out testing.out
! 
! The problem in your example was twofold.  First, you used an input Reader 
! instead of an input stream.  Second, the problem you are still having comes 
! from using an output Writer instead of an output stream.
! 
! Internally, everything is carried in Unicode, since these are Java character 
! strings.  There are three conversions going on:
!   input XML encoding -> Unicode
!   input XSL encoding -> Unicode
!   Unicode -> output encoding
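
The three conversions listed above can be sketched in plain Java, outside of any
XSLT processor. This is only an illustration: the class name ThreeConversions,
the helper names, and the sample data are assumptions, and a real processor
performs these steps inside its parser and serializer.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ThreeConversions {
    // Conversions 1 and 2: bytes in a declared encoding -> Java String (Unicode).
    static String decode(byte[] bytes, Charset declared) {
        return new String(bytes, declared);
    }

    // Conversion 3: Java String (Unicode) -> bytes in the output encoding.
    static byte[] encode(String text, Charset output) {
        return text.getBytes(output);
    }

    public static void main(String[] args) {
        // Big5 ships with standard JDKs but is not a guaranteed charset;
        // Charset.forName throws if the runtime does not provide it.
        Charset big5 = Charset.forName("Big5");

        // Hypothetical inputs: a document stored as Big5, a stylesheet stored as UTF-8.
        byte[] xmlBytes = "<doc>\u4e2d\u6587</doc>".getBytes(big5);
        byte[] xslBytes = "<!-- stylesheet -->".getBytes(StandardCharsets.UTF_8);

        String xml = decode(xmlBytes, big5);                    // input XML encoding -> Unicode
        String xsl = decode(xslBytes, StandardCharsets.UTF_8);  // input XSL encoding -> Unicode
        byte[] out = encode(xml, big5);                         // Unicode -> output encoding

        System.out.println(xml.length() + " chars in, " + out.length
                + " bytes out, stylesheet of " + xsl.length() + " chars");
    }
}
```

The point of the sketch is that the document and the stylesheet may be stored in
different encodings, because both are decoded to Unicode before they meet.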
! 
! By specifying an output Writer in your transform Result, you override the 
! encoding attribute of the xsl:output element and cause the result string to be 
! handled in Unicode.  I don't know how you're converting the result string to a 
! byte array for your debug printing but that is where the output conversion is 
! actually taking place.
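
The difference between a Writer result and a stream result can be sketched with
the standard JAXP API. The class name WriterVersusStream, the inline document,
and the stylesheet are assumptions made for illustration; ISO-8859-1 stands in
for Big5 so the sketch does not depend on optional charsets.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class WriterVersusStream {
    static final String XML = "<doc>caf\u00e9</doc>";
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' encoding='ISO-8859-1' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'><out><xsl:value-of select='doc'/></out></xsl:template>"
        + "</xsl:stylesheet>";

    // Feed the parser bytes, not characters, so that it does the decoding itself.
    static StreamSource utf8Source(String text) {
        return new StreamSource(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
    }

    static Transformer newTransformer() throws TransformerException {
        return TransformerFactory.newInstance().newTransformer(utf8Source(XSL));
    }

    // Stream result: the serializer encodes the bytes using xsl:output's encoding.
    static byte[] toStream() {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            newTransformer().transform(utf8Source(XML), new StreamResult(baos));
            return baos.toByteArray();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // Writer result: the caller receives characters; any byte encoding happens later, elsewhere.
    static String toWriter() {
        try {
            StringWriter writer = new StringWriter();
            newTransformer().transform(utf8Source(XML), new StreamResult(writer));
            return writer.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("stream result: " + toStream().length + " bytes");
        System.out.println("writer result: " + toWriter().length() + " chars");
    }
}
```

With the stream result the bytes are already in the stylesheet's output
encoding. With the Writer result, whichever code later turns the string into
bytes decides the encoding.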
! 
! Your original example appeared to work because both the decoding into Unicode 
! and the encoding back out of Unicode were handled by the Java Reader/Writer 
! mechanism, so the two conversion errors cancelled each other out.  However, 
! unless your platform default encoding is Big5, it is unlikely that the 
! conversion into Unicode was accurate.  This means that you would have problems 
! if your input document and stylesheet were in different encodings.
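
A small sketch of how two wrong conversions can cancel out. It assumes, purely
for illustration, a platform default of ISO-8859-1 and the Big5 byte sequence
A4 A4 A4 E5 (the two characters U+4E2D U+6587); the class and method names are
hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CompensatingErrors {
    // Big5 bytes for U+4E2D U+6587.
    static final byte[] BIG5_BYTES = {(byte) 0xA4, (byte) 0xA4, (byte) 0xA4, (byte) 0xE5};

    // Wrong decoding: treats every Big5 byte as one ISO-8859-1 character.
    static String misdecode(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // The same wrong charset on the way out restores the original bytes.
    static byte[] reencode(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String inMemory = misdecode(BIG5_BYTES);
        System.out.println("chars in memory: " + inMemory.length());
        System.out.println("bytes restored:  " + Arrays.equals(BIG5_BYTES, reencode(inMemory)));
    }
}
```

The bytes survive the round trip, but the in-memory string is four unrelated
Latin-1 characters rather than two Chinese ones, so anything that inspects the
characters (a stylesheet, for instance) sees the wrong text.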
! 
! If you want to return a string from your transform method, best bet is to use a 
! ByteArrayOutputStream as your Result, like this:
! 
!     transformer.transform(source, new StreamResult(baos));
!     baos.close();
!     result = baos.toString();
! 
! Now, XalanJ is writing out the characters using the Big5 encoding.  Assuming 
! your platform default encoding is Big5, the baos.toString() call will convert 
! the Big5-encoded characters in the byte array to Unicode.  Then, your 
! System.out.println() call will convert the Unicode back into Big5 encoding.
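
If the platform default cannot be assumed to be Big5, one option is to name the
charset when decoding the byte array instead of relying on the default. This is
a sketch of that variant, not part of the original suggestion; the class
ExplicitDecode and the helper resultAsString are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

class ExplicitDecode {
    // Decode the serialized result with a named charset rather than the platform default.
    static String resultAsString(ByteArrayOutputStream baos, String encodingName) {
        return new String(baos.toByteArray(), Charset.forName(encodingName));
    }

    public static void main(String[] args) {
        // Stand-in for transformer output: Big5 bytes for U+4E2D U+6587.
        byte[] big5 = {(byte) 0xA4, (byte) 0xA4, (byte) 0xA4, (byte) 0xE5};
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        baos.write(big5, 0, big5.length);

        // Requires a JDK that provides the Big5 charset.
        String text = resultAsString(baos, "Big5");
        System.out.println(text.length() + " chars decoded");
    }
}
```

The encoding name passed here has to match the encoding attribute of the
stylesheet's xsl:output element, since that is what the bytes were written in.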
! 
! There is no reason that you should have to use Unicode as you mentioned.  Big5 
! is perfectly fine, but you need to understand when the conversions from Big5 to 
! Unicode and back are being performed.  This is -very- tricky and confusing, so 
! please come back with more questions if you have them.  If you do respond, 
! please let me know your platform default encoding (the value of the Java 
! file.encoding System property) and how you're converting your string to bytes 
! for printing out your debugging information.  Sending the actual code for this 
! would be clearest.
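
One way to gather the information requested above is a small diagnostic like
the following. It is a sketch only: the class EncodingDiagnostics, the hex
helper, and the sample string are assumptions, not code from the bug report.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDiagnostics {
    // Render bytes as space-separated two-digit hex values for debug output.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        String sample = "\u4e2d\u6587";
        // getBytes() without an argument uses the platform default encoding.
        System.out.println("default bytes   = " + hex(sample.getBytes()));
        System.out.println("UTF-8 bytes     = " + hex(sample.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Comparing the hex dump of the default-encoded bytes with the bytes expected for
Big5 shows at a glance which conversion the debug printing is really performing.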
! 
! Gary