very well, i'll use it. thanks.
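
for the archives, a rough sketch of how read_char() from the quoted message below might look on top of an incremental decoder. the assumptions are mine, not walter's: it uses python 3 syntax (bytes in, str out) rather than the 2.5 sessions quoted below, it fetches the decoder with codecs.getincrementaldecoder(), and read_char(stream, decoder) is a hypothetical standalone function, not part of any iostack API. it only handles codecs that yield at most one character per completed byte sequence.

```python
import codecs
import io


def read_char(stream, decoder):
    """Return the next decoded character from a binary stream, or '' at EOF.

    Hypothetical helper: `stream` is any object whose read(1) returns bytes,
    `decoder` is an incremental decoder instance.
    """
    while True:
        byte = stream.read(1)
        if not byte:
            # EOF: final=True makes a dangling partial sequence raise
            # UnicodeDecodeError instead of being silently buffered.
            return decoder.decode(b"", final=True)
        # The decoder buffers incomplete sequences and returns '' until
        # a full character is available; invalid bytes raise right away.
        char = decoder.decode(byte)
        if char:
            return char


stream = io.BytesIO("a\u1234".encode("utf-8"))
decoder = codecs.getincrementaldecoder("utf-8")()
chars = [read_char(stream, decoder) for _ in range(3)]
```

with this shape, corrupted input fails at the offending byte instead of at EOF, and a truncated sequence is only reported once the stream really ends.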

On 9/1/06, Walter Dörwald <[EMAIL PROTECTED]> wrote:
> tomer filiba wrote:
>
> > [...]
> > besides, encoding suffers from many issues. suppose you have a
> > damaged UTF8 file, which you read char-by-char. when we reach the
> > damaged part, you'll never be able to "skip" it, as we'll just keep
> > read()ing bytes, hoping to make a character out of it, until we
> > reach EOF, i.e.:
> >
> > def read_char(self):
> >     buf = ""
> >     while not self._stream.eof:
> >         buf += self._stream.read(1)
> >         try:
> >             return buf.decode("utf8")
> >         except ValueError:
> >             pass
> >
> > which leads me to the following thought: maybe we should have
> > an "enhanced" encoding library for py3k, which would report
> > *incomplete* data differently from *invalid* data. today it's just a
> > ValueError: suppose decode() would raise IncompleteDataError
> > when the given data is not sufficient to be decoded successfully,
> > and ValueError when the data is just corrupted.
> >
> > that could aid iostack greatly.
>
> We *do* have that functionality in Python 2.5: incremental decoders can
> retain incomplete byte sequences on the call to the decode() method
> until the next call. Only when final=True is passed in the decode() call
> will it treat incomplete and invalid data in the same way: by raising an
> exception.
>
> Incomplete input:
> >>> import codecs
> >>> d = codecs.lookup("utf-8").incrementaldecoder()
> >>> d.decode("\xe1")
> u''
> >>> d.decode("\x88")
> u''
> >>> d.decode("\xb4")
> u'\u1234'
>
> Invalid input:
> >>> import codecs
> >>> d = codecs.lookup("utf-8").incrementaldecoder()
> >>> d.decode("\x80")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/var/home/walter/checkouts/Python/test/Lib/codecs.py", line 256, in decode
>     (result, consumed) = self._buffer_decode(data, self.errors, final)
> UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0: unexpected code byte
>
> Incomplete input with final=True:
> >>> import codecs
> >>> d = codecs.lookup("utf-8").incrementaldecoder()
> >>> d.decode("\xe1", final=True)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/var/home/walter/checkouts/Python/test/Lib/codecs.py", line 256, in decode
>     (result, consumed) = self._buffer_decode(data, self.errors, final)
> UnicodeDecodeError: 'utf8' codec can't decode byte 0xe1 in position 0: unexpected end of data
>
> Servus,
>    Walter

_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000