Tom Bradford wrote:
On Friday, January 4, 2002, at 10:16 AM, Gianugo Rabellino wrote:
But actually I don't even understand what this small encoding issue has to do with it, when the only things flowing around should be SAX events, as in Cocoon, where only getContentAsSAX() is called. Isn't that weird?
Can you send me one of the offending documents? I'll play around and see what I can figure out.
Tom,
I managed to track down the problem: in SAXEventGenerator.java the "value" String is built from a byte array, but the constructor call specifies no encoding. As per the Java API docs, that means the bytes are decoded with the platform's default charset, so non-ASCII characters can get corrupted whenever that default isn't UTF-8. The attached one-liner patch fixes the problem, but it's a workaround that works only for now, since UTF-8 is the only supported encoding. Once multiple encodings are supported, this will have to be changed.
I hope this patch (or something equivalent) can make it into CVS: it's really important, to me in the first place :) but also to everyone who will use Xindice with Cocoon, given that SAX is used all over the place.
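For anyone who wants to see the effect in isolation, here is a minimal standalone sketch. It is not taken from the Xindice sources: the class name EncodingDemo, the helper names and the sample string are made up for illustration, and ISO-8859-1 merely stands in for a platform default that is not UTF-8. It uses StandardCharsets, so it assumes Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Xindice.
class EncodingDemo {

    // Decodes with an explicit charset, which is what the patched line does.
    static String decodeUtf8(byte[] buf) {
        return new String(buf, StandardCharsets.UTF_8);
    }

    // Simulates a platform whose default charset is ISO-8859-1 (an assumption
    // made for the demo) by naming that charset explicitly.
    static String decodeLatin1(byte[] buf) {
        return new String(buf, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "caffe" with a grave accent on the final e; the accented character
        // takes two bytes in UTF-8, so the array holds six bytes.
        byte[] buf = "caff\u00e8".getBytes(StandardCharsets.UTF_8);

        System.out.println("default charset: " + Charset.defaultCharset());
        // new String(buf) decodes with whatever the default charset happens to be.
        System.out.println("no charset given: " + new String(buf));
        System.out.println("explicit UTF-8:   " + decodeUtf8(buf));
        System.out.println("as ISO-8859-1:    " + decodeLatin1(buf));
    }
}
```

With an explicit UTF-8 decode the string comes back as five characters. Decoded as ISO-8859-1, the two bytes of the accented character turn into two separate characters, which is the kind of corruption described above.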
Thanks for your support,
-- Gianugo
Index: java/src/org/apache/xindice/xml/sax/SAXEventGenerator.java
===================================================================
RCS file: /home/cvspublic/xml-xindice/java/src/org/apache/xindice/xml/sax/SAXEventGenerator.java,v
retrieving revision 1.1.1.1
diff -u -r1.1.1.1 SAXEventGenerator.java
--- java/src/org/apache/xindice/xml/sax/SAXEventGenerator.java 6 Dec 2001 19:34:00 -0000 1.1.1.1
+++ java/src/org/apache/xindice/xml/sax/SAXEventGenerator.java 6 Jan 2002 14:44:13 -0000
@@ -261,7 +261,7 @@
 byte[] buf = new byte[tbis.available()];
 tin.read(buf);
- String value = new String(buf);
+ String value = new String(buf, "UTF-8");
 switch ( type ) {
