I checked the encoding of the file containing the n-tilde (ñ), and it is indeed UTF-8! I'm baffled! Any ideas?
Thanks,
Jason

On Mar 27, 11:16 am, "Diez B. Roggisch" <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
> > I've been using the xml.sax.handler module to do event-driven parsing
> > of XML files in this Python application I'm working on. However, I
> > keep having really pesky invalid token exceptions. Initially, I was
> > only getting them on control characters, and a little
> > "sed -e 's/[^[:print:]]/ /g' $1;" took care of that just fine. But
> > recently, I've been getting these invalid token exceptions with
> > n-tildes (like the n in España), smart/fancy/curly quotes and other
> > seemingly harmless characters. Specifying encoding="utf-8" in the xml
> > header hasn't helped matters.
> >
> > Any ideas? As a last resort, I'd be willing to scrub invalid
> > characters.... it just seems strange that curly quotes and n-tildes
> > wouldn't be valid XML! Is that really the case?
>
> It's not the case, unless you have a wrong encoding. Then the whole
> XML-Document isn't an XML-document at all.
>
> Just putting an encoding header that doesn't match the actually used
> encoding won't fix that.
>
> Read up on what encodings are, and ensure your XML-generation respects that.
> Then reading these files will cause no problems.
>
> Diez

--
http://mail.python.org/mailman/listinfo/python-list
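
For anyone hitting the same thing, here is a minimal sketch of the mismatch Diez describes, assuming Python 3 and only the standard library. The helper names (first_invalid_utf8_offset, parses_cleanly) and the sample document are made up for illustration and are not part of xml.sax. It encodes the same document once as UTF-8 and once as Latin-1 while the header claims utf-8 both times, then checks which bytes the parser accepts:

```python
import xml.sax
from xml.sax.handler import ContentHandler


def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


def parses_cleanly(data: bytes) -> bool:
    """Return True if xml.sax accepts the document, False on a parse error."""
    try:
        xml.sax.parseString(data, ContentHandler())
    except xml.sax.SAXParseException:
        return False
    return True


# Hypothetical sample document; \u00f1 is the n-tilde in "España".
doc = '<?xml version="1.0" encoding="utf-8"?><pais>Espa\u00f1a</pais>'

good = doc.encode("utf-8")    # bytes match the declared encoding
bad = doc.encode("latin-1")   # header says utf-8, but the n-tilde is the lone byte 0xF1

for label, data in (("utf-8 bytes", good), ("latin-1 bytes", bad)):
    print(label,
          "| first invalid offset:", first_invalid_utf8_offset(data),
          "| parses:", parses_cleanly(data))
```

Running first_invalid_utf8_offset on the raw bytes of the real file (opened with mode "rb") should show whether it is actually UTF-8 all the way through, regardless of what an editor or the XML header reports.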