Thanks, I'll work with the file on the file system, then parse it with SAX.
Is there a Pythonic way to read the file and identify any illegal XML characters so I can strip them out? this would keep my program more flexible - if the vendor is going to allow one illegal character in their document, there's no way of knowing if another one will pop up later. Thanks! Martin v. Löwis wrote: > [EMAIL PROTECTED] schrieb: > > My original posting has a funky line break character (it appears as an > > ascii square) that blows up my program, but it may or may not show up > > when you view my message. > > Looking at your document, it seems that this "funky line break > character" is character \x1E, which, in latin-1, means "record > separator". It's indeed ill-formed to use it in XML. > > > Is there a way to account for the invalid token in the error handler? > > Not with a standard XML parser, no. The error you describe is a "fatal > error", and that's not something parsing can recover from. I recommend > that you filter this character out before passing it to the XML parser. > You can use the IncrementalParser interface to do so. > > Regards, > Martin
-- http://mail.python.org/mailman/listinfo/python-list