On Thursday, 12/26/2002 at 07:23 ZE2, "Dima Gutzeit" <[EMAIL PROTECTED]> wrote:
> Sometimes when parsing XML files I get an error message (exception) about
> "invalid Unicode characters", is there any way to filter those before parsing?

There's no way to do that within the parser. "If it contains illegal characters, it isn't XML" and the error messages are entirely correct.

You could, of course, write your own stream filter and pass the data through that, then use its output as the input to the parser. That's fairly straightforward Java coding. The problem would be deciding what you're going to do with those characters when you see them: if you just discard them you may be changing the meaning of the document, and if you turn them into some sort of private escape sequence, only applications which understand that convention will be able to do anything with them. Fixing the source documents really is the cleanest answer.

For what it's worth: It has been proposed that future versions of XML *may* relax the forbidden-character restrictions, but there's still no firm consensus on whether that change would be desirable or what version of XML it might find its way into.

______________________________________
Joe Kesselman / IBM Research

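For illustration, here is a minimal sketch of the kind of stream filter described above. It is not part of any parser; the class name XmlCharFilterReader, the filter() helper, and the choice of substituting a caller-supplied replacement character are all assumptions made for this example. It treats a character as legal if it matches the Char production of XML 1.0 (tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD, and properly paired surrogates), and replaces anything else.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical filter: substitutes a replacement for characters XML 1.0 forbids. */
class XmlCharFilterReader extends FilterReader {
    private final char replacement;   // assumption: caller picks what to substitute
    private int heldBack = -1;        // char read ahead but not yet delivered
    private boolean heldIsValidLow;   // true if heldBack completes a valid surrogate pair

    XmlCharFilterReader(Reader in, char replacement) {
        super(in);
        this.replacement = replacement;
    }

    /** Single UTF-16 units allowed by the XML 1.0 Char production. */
    static boolean isLegalUnit(int c) {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD);
    }

    @Override
    public int read() throws IOException {
        int c;
        if (heldBack >= 0) {
            c = heldBack;
            heldBack = -1;
            if (heldIsValidLow) {          // second half of an already-checked pair
                heldIsValidLow = false;
                return c;
            }
        } else {
            c = in.read();
        }
        if (c < 0) return -1;
        if (Character.isHighSurrogate((char) c)) {
            int next = in.read();
            if (next >= 0 && Character.isLowSurrogate((char) next)) {
                heldBack = next;           // valid pair: U+10000..U+10FFFF is legal
                heldIsValidLow = true;
                return c;
            }
            heldBack = next;               // re-examine it on the next call
            return replacement;            // unpaired high surrogate
        }
        return isLegalUnit(c) ? c : replacement;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = 0;
        while (n < len) {
            int c = read();
            if (c < 0) break;
            buf[off + n++] = (char) c;
        }
        return (n == 0 && len > 0) ? -1 : n;
    }

    @Override
    public boolean markSupported() {
        return false;                      // mark/reset would bypass the held-back state
    }

    /** Convenience helper for trying the filter on an in-memory string. */
    static String filter(String s, char replacement) {
        StringBuilder out = new StringBuilder();
        try (Reader r = new XmlCharFilterReader(new StringReader(s), replacement)) {
            for (int c; (c = r.read()) >= 0; ) out.append((char) c);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

To use something like this with a SAX parser, you would wrap the document's Reader in the filter and hand the result to the parser through org.xml.sax.InputSource, which has a constructor taking a Reader. Note that the filter works on decoded characters, so the byte-to-character decoding has to happen before it; and the caveat above still applies: whatever you substitute, the parser is no longer seeing the document as it was written.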