Thanks, I'll work with the file on the file system, then parse it with
SAX.

Is there a Pythonic way to read the file and identify any illegal XML
characters so I can strip them out? this would keep my program more
flexible - if the vendor is going to allow one illegal character in
their document, there's no way of knowing if another one will pop up
later.

Thanks!


Martin v. Löwis wrote:
> [EMAIL PROTECTED] schrieb:
> > My original posting has a funky line break character (it appears as an
> > ascii square) that blows up my program, but it may or may not show up
> > when you view my message.
>
> Looking at your document, it seems that this "funky line break
> character" is character \x1E, which, in latin-1, means "record
> separator". It's indeed ill-formed to use it in XML.
>
> > Is there a way to account for the invalid token in the error handler?
>
> Not with a standard XML parser, no. The error you describe is a "fatal
> error", and that's not something parsing can recover from. I recommend
> that you filter this character out before passing it to the XML parser.
> You can use the IncrementalParser interface to do so.
> 
> Regards,
> Martin

-- 
http://mail.python.org/mailman/listinfo/python-list

Reply via email to