That makes sense. Any idea how to deal with it? Can the buffer be flushed somehow? This is what the code looks like:

from xml.sax import make_parser
from xml.sax.handler import feature_namespaces, feature_validation
from xml.sax.handler import ContentHandler, ErrorHandler, DTDHandler
.
.
.
evalHandler = EvaluateKeyHandler()
parser = make_parser(['_xmlplus.sax.drivers2.drv_xmlproc'])
parser.setFeature(feature_validation, 1)
parser.setFeature(feature_namespaces, 0)
parser.setContentHandler(evalHandler)
parser.setErrorHandler(evalHandler)
f = open(file)
parser.parse(f)
.
.
.
def characters(self, content):
    self.keywordData += content

>>> xmlproc.version
'0.70'

PyXML-0.8.3
Python 2.3.3
Red Hat Linux 3.3.3-7

Mike Brown wrote:

>Anders wrote:
>
>>I'm having a hard time debugging this error:
>>
>><somefile>:<row>:<char>: character set conversion problem: 'utf8' codec can't
>>decode byte 0xc3 in position 65535: unexpected end of data
>>
>>The file I'm trying to parse with xmlproc contains no illegal UTF-8 byte
>>sequences, and this error does not occur when I switch to pyexpat. This
>>is a hexdump of the row it's complaining about:
>>00020030  64 65 73 20 6c c3 a8 76 72 65 73 20 42 6f 72 64  |des l..vres Bord|
>>There is nothing wrong with this byte sequence as far as I can see.
>>
>>Has anyone else experienced this problem and found a solution? All help
>>appreciated.
>
>Apparently it's a buffering issue; the stream it's decoding only consists of
>2^16 bytes, and the last one is that c3. What does your python code look like?
>What platform/OS is this on, and what versions of Python and PyXML?
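
To illustrate the buffering problem described in the quoted reply, here is a minimal sketch of what happens when a read buffer ends between the two bytes of a UTF-8 character (c3 a8 in the hexdump). It is not xmlproc's code and not a fix for it. It assumes a current Python, where codecs.getincrementaldecoder is available (it is not in Python 2.3), and the split point is a hypothetical stand-in for the 2^16-byte buffer boundary.

```python
import codecs

# The bytes from the hexdump: "des l", then c3 a8 (e with grave accent), then "vres Bord".
data = "des l\u00e8vres Bord".encode("utf-8")

# Hypothetical chunk boundary that falls between c3 and a8,
# standing in for a fixed-size read buffer that ends mid-character.
split = data.index(b"\xc3") + 1
first, second = data[:split], data[split:]

# Decoding the first chunk on its own fails, because c3 starts a
# two-byte sequence whose second byte has not arrived yet.
try:
    first.decode("utf-8")
    naive_error = None
except UnicodeDecodeError as exc:
    naive_error = exc.reason

# An incremental decoder holds on to the dangling c3 and completes
# the character once the next chunk is fed in.
decoder = codecs.getincrementaldecoder("utf-8")()
text = decoder.decode(first) + decoder.decode(second)
text += decoder.decode(b"", final=True)

print(naive_error)
print(text)
```

If this is what is going on inside the parser, the input file itself is fine, which would match the observation that pyexpat parses it without complaint.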