Nobody has any remarks about this? Or is it because it was posted at the end of the week;-)
Or should I ask on the dev list?

Kind Regards,
Jan

----- Original Message -----
From: "Jan Hoskens" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, February 13, 2004 11:32 AM
Subject: Bug? Reading File Source

> Hi,
>
> I've had some problems with special characters in my flow, but I was able
> to fix them. One of the problems occurred when loading a document. When I
> used the proposed way of loading (from the woody binding sample):
>
>     source = resolver.resolveURI(uri);  // resolve is ok
>     var is = new Packages.org.xml.sax.InputSource(source.getInputStream());
>     is.setSystemId(source.getURI());
>     return parser.parseDocument(is);    // crashes here
>
> I got an error concerning special characters. When a '�' appeared in the
> filename, I got an exception about illegal UTF-8 characters. I created
> this workaround with an encoding function to make sure that the string is
> in UTF-8:
>
>     source = resolver.resolveURI(uri);
>     // just another way to access the file
>     var file = new java.io.File(new java.net.URI(encodeURI(source.getURI())));
>     var is = new Packages.org.xml.sax.InputSource(new java.io.FileReader(file));
>     return parser.parseDocument(is);
>
> The encodeURI() function essentially does this: it splits up the URI so
> that e.g. '/' is preserved, takes the pieces (the directories and
> filenames), calls java.net.URLEncoder.encode(part, "UTF-8") on each one,
> and then replaces each '+' (which stands for a space) with '%20'.
>
> This works, and my file is loaded correctly.
>
> I thought that I had overcome this special character problem, but no, I
> hadn't! I tried to read a directory of XML files and aggregate them into
> one big XML document so I can create one PDF file, but again this failed
> because of the special character '�' appearing in a filename.
> I tried two combinations:
>
> A) directory generator with an XSL that creates includes, followed by the
>    include transformer
> B) the easy way: XPathDirectoryGenerator
>
> The first combination just crashes on the include; the second one ignores
> the problem:
>
>     XPathDirectoryGenerator: Warning: Problem while reading the file AYG�L.xml. Ignoring.
>     java.io.UTFDataFormatException: Invalid byte 2 of 2-byte UTF-8 sequence.
>         at org.apache.xerces.impl.io.UTF8Reader.invalidByte(Unknown Source)
>
> It seems to me that the same method of reading a file is used in both
> cases, since I get the same UTF error (that would be logical, reusing
> parts). So I think that the InputSource doesn't take the special
> characters into account, and when an input stream is set on it, it simply
> crashes because no conversion is done.
>
> Isn't this a bug? Isn't it the responsibility of the InputSource object to
> give a valid input stream, even when special characters are used? (Or
> maybe the Source gives an incorrect InputSource?)
>
> Greetings,
> Jan
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
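
For anyone who wants to try the workaround: the encodeURI() helper described in the quoted message could look roughly like the sketch below. This is not the original function. The name encodePathSegments is hypothetical, the handling of a leading "scheme:" prefix is an assumption, and plain JavaScript's encodeURIComponent() is used in place of java.net.URLEncoder.encode(part, "UTF-8") so that the snippet runs outside Cocoon. Because encodeURIComponent() already writes a space as '%20', the '+' replacement step is not needed here.

```javascript
// Hypothetical stand-in for the encodeURI() helper described above.
// Assumption: encodeURIComponent() replaces java.net.URLEncoder.encode(),
// so spaces become '%20' directly and no '+' replacement is required.
function encodePathSegments(uri) {
  // Keep a leading "scheme:" such as "file:" untouched (assumed behaviour).
  var match = /^([A-Za-z][A-Za-z0-9+.-]*:)?([\s\S]*)$/.exec(uri);
  var scheme = match[1] || '';
  var rest = match[2];

  // Split on '/' so the separators are preserved, then percent-encode
  // every directory and file name as UTF-8.
  var encoded = rest.split('/').map(function (part) {
    return encodeURIComponent(part);
  }).join('/');

  return scheme + encoded;
}

// Usage with a made-up path containing a space and a non-ASCII character:
var result = encodePathSegments('file:/data/my docs/caf\u00e9.xml');
// result is 'file:/data/my%20docs/caf%C3%A9.xml'
```

Note that a URI that is already percent-encoded would be encoded a second time ('%' becomes '%25'), so a helper like this should only be applied to raw, unencoded paths.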
