Hi,

I've had some problems with special characters in my flow, but I managed to
fix them. One of the problems occurred when loading a document. When I used
the loading code proposed in the Woody binding sample:

        source = resolver.resolveURI(uri); // resolving works fine
        var is = new Packages.org.xml.sax.InputSource(source.getInputStream());
        is.setSystemId(source.getURI());
        return parser.parseDocument(is); // crashes here

I got an error concerning special characters: when a non-ASCII character
appeared in the filename, I got an exception about illegal UTF-8 characters.
I created the following workaround, which uses an encoding function to make
sure that the URI string is properly UTF-8 encoded:

        source = resolver.resolveURI(uri);
        // just another way to access the file
        var file = new java.io.File(new java.net.URI(encodeURI(source.getURI())));
        var is = new Packages.org.xml.sax.InputSource(new java.io.FileReader(file));
        return parser.parseDocument(is);

The encodeURI() function essentially does this: it splits up the URI so that
e.g. '/' is preserved, encodes each piece (that is, each directory and file
name) with java.net.URLEncoder.encode(part, "UTF-8"), and then replaces each
'+' (which URLEncoder uses for a space) with '%20'.
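For reference, a helper along those lines could look like the Java sketch below. It is a hypothetical reconstruction, not the actual flowscript function: the class name EncodePath and the method encodePath are made up, and it assumes it receives only the path part of the URI. It splits on '/', encodes each segment with URLEncoder.encode(part, "UTF-8"), and turns every '+' back into '%20'.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EncodePath {

    /**
     * Hypothetical Java version of the encodeURI() helper described above:
     * percent-encodes every path segment as UTF-8 and keeps the '/' separators.
     * Assumes the argument is a path only, without a "file:" scheme prefix.
     */
    static String encodePath(String path) {
        // limit -1 keeps empty segments, so leading and trailing '/' survive the join
        String[] parts = path.split("/", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                out.append('/');
            }
            out.append(encodeSegment(parts[i]));
        }
        return out.toString();
    }

    private static String encodeSegment(String segment) {
        try {
            // URLEncoder does form encoding, where a space becomes '+';
            // in a URI path a space has to be written as %20 instead
            return URLEncoder.encode(segment, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // every Java platform is required to support UTF-8
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints /data/my%20docs/AYG%C3%9CL.xml
        System.out.println(encodePath("/data/my docs/AYG\u00DCL.xml"));
    }
}
```

Replacing '+' afterwards is safe because URLEncoder encodes a literal '+' in a name as %2B, so every '+' in its output stands for a space. A colon is encoded too (as %3A), which is why a scheme prefix such as file: has to stay outside the encoded part.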

This works, and my file is loaded correctly.
I thought that I had overcome the special character problem, but no, I
hadn't! I tried to read a directory of XML files and aggregate them into one
big XML document so that I could create a single PDF file, but this failed
again because of the special character in one of my filenames. I tried two
combinations:
A) the directory generator with an XSL stylesheet that creates includes, followed by the include transformer
B) the easy way: XPathDirectoryGenerator

The first combination simply crashes on the include; the second one logs a
warning and skips the file:

XPathDirectoryGenerator: Warning: Problem while reading the file AYG�L.xml. Ignoring.
java.io.UTFDataFormatException: Invalid byte 2 of 2-byte UTF-8 sequence.
 at org.apache.xerces.impl.io.UTF8Reader.invalidByte(Unknown Source)

It seems to me that the same method of reading a file is used, since I get
the same UTF error (which would be logical, as parts are reused). So I think
that the InputSource doesn't take the special characters into account, and
when it tries to set up an input stream it simply crashes because no
conversion is done. Isn't this a bug? Isn't it the responsibility of the
InputSource object to provide a valid input stream, even when special
characters are used? (Or does the Source perhaps supply an incorrect input
stream for the InputSource?)
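On the InputSource question: per the SAX documentation, InputSource does not convert anything itself. It only carries a byte stream (or a character stream) plus an optional encoding name, and the parser decodes a byte stream with that encoding or, if none is given, autodetects it as described in the XML specification, which means UTF-8 when there is no byte order mark and no encoding declaration. The Java sketch below uses the JDK's built-in JAXP parser rather than Cocoon's parser component, and assumes, hypothetically, that a file's content (not only its name) is stored as ISO-8859-1. It parses the same bytes once without and once with InputSource.setEncoding(). The class EncodingCheck and the sample document are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class EncodingCheck {

    // Hypothetical sample: no XML declaration, so the parser assumes UTF-8,
    // but the bytes are ISO-8859-1, where U+00DC is the single byte 0xDC
    static final byte[] LATIN1_XML =
            "<name>AYG\u00DCL</name>".getBytes(StandardCharsets.ISO_8859_1);

    /** Parses the bytes and returns the root element's text, or null if parsing fails. */
    static String rootText(byte[] xml, String encoding) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            InputSource is = new InputSource(new ByteArrayInputStream(xml));
            if (encoding != null) {
                // tells the parser how to decode the byte stream instead of autodetecting
                is.setEncoding(encoding);
            }
            return builder.parse(is).getDocumentElement().getTextContent();
        } catch (SAXException | IOException e) {
            System.out.println("parse failed: " + e.getMessage());
            return null;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // expected to fail: 0xDC starts a 2-byte UTF-8 sequence, but 'L' is no continuation byte
        System.out.println("without encoding: " + rootText(LATIN1_XML, null));
        // expected to succeed once the real encoding is named
        System.out.println("with ISO-8859-1: " + rootText(LATIN1_XML, "ISO-8859-1"));
    }
}
```

Handing the parser a java.io.Reader instead of a byte stream has a similar effect, because the characters then arrive already decoded. That may also be why the FileReader-based workaround above succeeds: FileReader decodes the file with the platform's default charset.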

Greetings,
Jan

