Does nobody have any remarks about this? Or is that because it was posted at
the end of the week? ;-)

Or should I ask on the dev list?

Kind Regards,
Jan

----- Original Message ----- 
From: "Jan Hoskens" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, February 13, 2004 11:32 AM
Subject: Bug? Reading File Source


> Hi,
>
> I've had some problems with special characters in my flow, but I was able to
> fix them. One of the problems occurred when loading a document. When I used
> the proposed way of loading (from the woody binding sample):
>
>         source = resolver.resolveURI(uri);  // resolving works
>         var is = new Packages.org.xml.sax.InputSource(source.getInputStream());
>         is.setSystemId(source.getURI());
>         return parser.parseDocument(is);    // crashes here
>
> I got an error concerning special characters. When an '�' appeared in the
> filename, I got an exception about illegal UTF-8 characters. I created this
> workaround, with an encoding function to make sure that the string is in
> UTF-8:
>
>         source = resolver.resolveURI(uri);
>         // just another way to access the file
>         var file = new java.io.File(new java.net.URI(encodeURI(source.getURI())));
>         var is = new Packages.org.xml.sax.InputSource(new java.io.FileReader(file));
>         return parser.parseDocument(is);
>
> The encodeURI() function essentially does this: split up the URI so that
> e.g. '/' is preserved, take the pieces (i.e. the directories and filenames),
> call java.net.URLEncoder.encode(part, "UTF-8") on each of them, and then
> replace each '+' (which stands for a space) with '%20'.
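
For readers who want to try the idea, here is a minimal sketch of the segment-wise encoding described above. It is not the original helper. It is plain JavaScript, so encodeURIComponent stands in for java.net.URLEncoder.encode(part, "UTF-8"); because encodeURIComponent already writes a space as '%20', the '+' replacement step is not needed. The name encodeUriSegments is hypothetical (chosen so it does not shadow the built-in encodeURI), and treating everything up to the first ':' as a scheme is an assumption.

```javascript
// Hypothetical helper: percent-encode every path segment of a URI as UTF-8,
// leaving a leading scheme (e.g. "file:") and the '/' separators untouched.
function encodeUriSegments(uri) {
  // Assumption: an optional scheme is a letter followed by letters, digits,
  // '+', '.' or '-', terminated by ':'. Everything after it is the path.
  var match = /^([A-Za-z][A-Za-z0-9+.\-]*:)?([\s\S]*)$/.exec(uri);
  var scheme = match[1] || "";
  var segments = match[2].split("/");

  for (var i = 0; i < segments.length; i++) {
    // Stand-in for java.net.URLEncoder.encode(part, "UTF-8"): non-ASCII
    // characters become UTF-8 percent-escapes and a space becomes "%20".
    segments[i] = encodeURIComponent(segments[i]);
  }
  // Re-join with '/', so empty leading/trailing segments keep their slashes.
  return scheme + segments.join("/");
}

// Illustrative path, not taken from the original report.
console.log(encodeUriSegments("file:/data/my docs/caf\u00e9.xml"));
// prints "file:/data/my%20docs/caf%C3%A9.xml"
```

Note that a ':' inside a path segment (for example a Windows drive letter) is escaped as '%3A' by this sketch.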
>
> This does work and my file is loaded correctly.
> I thought that I had overcome this special character problem, but no, I
> hadn't! I tried to read a directory of XML files and aggregate them into one
> big XML document so that I can create a single PDF file, but again this
> failed because of the special character '�' appearing in my filename. I
> tried two combinations:
> A) the directory generator with an XSL that creates includes, followed by
>    the include transformer
> B) the easy way: XPathDirectoryGenerator
>
> The first combination just crashes on the include; the second one ignores
> the problem file:
>
> XPathDirectoryGenerator: Warning: Problem while reading the file AYG�L.xml. Ignoring.
> java.io.UTFDataFormatException: Invalid byte 2 of 2-byte UTF-8 sequence.
>  at org.apache.xerces.impl.io.UTF8Reader.invalidByte(Unknown Source)
>
> It seems to me that the same method of reading a file is used in both cases,
> since I get the same UTF-8 error (that would be logical, as parts are
> reused). So I think that the InputSource doesn't take the special characters
> into account, and that parsing simply crashes when it is given the
> InputStream because no conversion is done. Isn't this a bug? Isn't it the
> responsibility of the InputSource object to provide a valid InputStream,
> even when special characters are used? (Or maybe the Source gives back an
> incorrect InputSource?)
>
> Greetings,
> Jan
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>

