I wrote a small standalone class to test .getString() vs. .getBytes(), and .getString() doesn't handle the UTF-8 characters correctly. You can download the source code at http://patrick.schlaepfer.com/TestUTF8.tar.gz
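To illustrate the kind of difference involved without needing a database, here is a minimal sketch. It is not the test class from the tarball: the class name, the helper methods, and the sample string are made up for illustration, and the byte array only stands in for what rs.getBytes() might return if the column is stored as UTF-8. It uses java.nio.charset.StandardCharsets, so it assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of Cocoon or of TestUTF8.tar.gz.
class CharsetDecodeSketch {

    // Decode with an explicit charset, independent of -Dfile.encoding.
    public static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Decode the same bytes as ISO-8859-1, simulating a default charset
    // that does not match the encoding of the data.
    public static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "Zuerich" with a u-umlaut, written as a Unicode escape so the
        // encoding of this source file does not matter.
        String original = "Z\u00fcrich";

        // Assumption: these bytes stand in for the result of rs.getBytes(i).
        byte[] raw = original.getBytes(StandardCharsets.UTF_8);

        System.out.println(decodeUtf8(raw).equals(original));   // true
        System.out.println(decodeLatin1(raw).equals(original)); // false
        System.out.println(decodeLatin1(raw).length());         // 7, one char per byte
    }
}
```

The umlaut takes two bytes in UTF-8, so decoding with a single-byte charset turns it into two wrong characters. That is the kind of damage I would expect whenever the bytes are decoded with a charset other than the one they were written in.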
With MySQL, getObject() returns an Object and not a byte[], which makes sense. So the UTF-8 encoding gets lost there.

So I changed the following lines in
cocoon-2.1.4/src/blocks/database/java/org/apache/cocoon/transformation/SQLTransformer.java:

    // String retval = SQLTransformer.getStringValue( rs.getObject( i ) );
    String retval = SQLTransformer.getStringValue( rs.getBytes( i ) );

and

    // String retval = SQLTransformer.getStringValue( rs.getObject( name ) );
    String retval = SQLTransformer.getStringValue( rs.getBytes( name ) );

and

    retString = "B "+new String( (byte[]) object, "UTF8" );

(the "B " prefix is only for debugging)

And now the characters are encoded correctly. I have no idea whether this is also the case with other databases, but at least with MySQL 4.1.1 it works.

Any comments are welcome.

Patrick

> -----Original Message-----
> From: Bertrand Delacretaz [mailto:[EMAIL PROTECTED]
> Sent: Thursday, 1 April 2004 07:24
> To: [EMAIL PROTECTED]
> Subject: Re: Unicode Umlauts/SQLTransformer
>
>
> On 31 March 04, at 16:23, Patrick Schlaepfer wrote:
>
> > Made the observation that SQLTransformer doesn't care
> > that much about character encoding:
> >
> > String retval = SQLTransformer.getStringValue(rs.getObject(i));
> > and then returns a new String((byte[]) object)
>
> According to the Java API, this "Constructs a new String by decoding
> the specified array of bytes using the platform's default charset.".
>
> IIUC the platform's default charset is what can be set with the
> -Dfile.encoding parameter, so things should be fine *if* the encoding
> is correctly handled all the way down the pipeline. I don't know if
> this is the case though, you might want to test it by dumping the
> String at various stages or starting with minimal pipelines.
>
> OTOH I'm wondering if the use of rs.getObject(i) as opposed to
> rs.getString() isn't a problem regarding encoding. It would be
> interesting to compare the two, either in a simple test program outside
> of Cocoon, or by modifying the SQLTransformer to use rs.getString() if
> rs.getMetaData().getColumnType(i) says this is a String column.
>
> -Bertrand
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
