Johannes Bauer wrote:
> Dear all,
>
> I've some applications which fetch HTML documents off the web, parse
> their content and do stuff with it. Every once in a while it happens
> that the web site administrators put up files which are encoded in a
> wrong manner.
>
> Thus my Python script dies a horrible death:
>
>   File "./update_db", line 67, in <module>
>     for line in open(tempfile, "r"):
>   File "/usr/local/lib/python3.1/codecs.py", line 300, in decode
>     (result, consumed) = self._buffer_decode(data, self.errors, final)
> UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position
> 3286: unexpected code byte
>
> This is well and ok usually, but I'd like to be able to tell Python:
> "Don't worry, some idiot encoded that file, just skip over such
> parts/replace them by some character sequence".
>
> Is that possible? If so, how?

This might get you started:

"""
>>> help(str.decode)
decode(...)
    S.decode([encoding[,errors]]) -> object

    Decodes S using the codec registered for encoding. encoding defaults
    to the default encoding. errors may be given to set a different error
    handling scheme. Default is 'strict' meaning that encoding errors raise
    a UnicodeDecodeError. Other possible values are 'ignore' and 'replace'
    as well as any other name registered with codecs.register_error that is
    able to handle UnicodeDecodeErrors.
"""

HTH
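
For what it's worth, the docstring above is the Python 2 one. On Python 3 (the traceback shows 3.1) the decode method lives on bytes rather than str, and open() accepts the same errors argument directly. A rough sketch of both, assuming the pages are meant to be UTF-8; the sample bytes and the temporary file are made up for illustration and are not from the original script:

```python
import os
import tempfile

# Hypothetical sample: mostly ASCII text with one stray byte (0x92) that is
# not valid UTF-8 on its own, like the one in the traceback.
raw = b"It\x92s a test\nsecond line\n"

# Decoding bytes directly: errors= selects the error handling scheme.
replaced = raw.decode("utf-8", errors="replace")  # bad byte becomes U+FFFD
ignored = raw.decode("utf-8", errors="ignore")    # bad byte is silently dropped

# The same argument works when reading a file in text mode.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(raw)

with open(path, "r", encoding="utf-8", errors="replace") as f:
    lines = [line.rstrip("\n") for line in f]

os.remove(path)
```

A byte 0x92 is often a right single quotation mark from a Windows-1252 document, so if that is what the site really serves, decoding with encoding="cp1252" might preserve the character instead of replacing it. That is only a guess about the data, though.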