On Wednesday 25 May 2011 at 23:41 +0200, Laura Creighton wrote:
> One reason I didn't implement the classes yet is that I couldn't
> understand two points in how they are supposed to work. But it seems
> that there are really two bugs, as I've been pointed to:
> http://bugs.python.org/issue12100 and
> http://bugs.python.org/issue12171 . So the question is if we should
> be bug-compatible with Python 2.7 or if we should instead implement
> some fixed version.
I fixed #12100 in Python 2.7, 3.1, 3.2 and 3.3 yesterday. I also plan to fix #12171 in these four versions; that should be done in the next few days.

> I suppose I'm rather for the fixed version, but I'd like to hear some
> feedback from people that actually use multibytecodecs.

Both bugs are related to encoders. I don't think anyone is using the Python CJK codecs to encode text (nobody noticed these bugs before); they are more likely used to decode text. Anyway, you should implement a codec without these *bugs*.

For your information, I added more tests to the CJK codecs (e.g. see #12057), and I plan to add more in the coming weeks. I also plan to fix issue #12016, yet another CJK codec bug. You may want to wait until all of these bugs are fixed before working on your own implementation, or directly implement a version without these bugs and then upgrade the test suite.

> Also, I wouldn't mind if someone would pick up the work and just do it,
> either the classes or ``errors != "strict"'' :-)

Support for error handlers other than strict is far from perfect. Issue #12016 is the main problem, but there are other minor issues. In some cases, invalid byte sequences are ignored even with the replace error handler (whereas I expected U+FFFD characters).

CJK codecs are special because they use escape sequences (especially the ISO 2022 family): what should be done if a byte sequence looks like an escape sequence but is not valid? Replace each byte with U+FFFD, or ignore these bytes?

I'm trying to write tests "describing" the current behaviour, and then I will maybe try to improve how invalid byte sequences are handled.

Victor
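
P.S. To make the error handler question concrete, here is a minimal sketch (Python 3) that decodes the same bytes with the strict, replace and ignore handlers and prints whatever comes back. The helper name decode_with_handlers and the two sample byte strings are assumptions made up for this illustration; they are not taken from the CPython test suite. No expected output is given, because the exact results depend on the Python version and on which of the bugs above have been fixed.

```python
# Hypothetical helper (not part of CPython or its test suite): decode the
# same bytes with several error handlers and collect the outcomes.
def decode_with_handlers(data, encoding,
                         handlers=("strict", "replace", "ignore")):
    results = {}
    for errors in handlers:
        try:
            # bytes.decode(encoding, errors) is the standard decoding call.
            results[errors] = data.decode(encoding, errors)
        except UnicodeDecodeError as exc:
            # Keep the exception so the handlers can be compared side by side.
            results[errors] = exc
    return results


# Illustrative inputs chosen for this sketch (assumptions, not real test data).
samples = [
    # 0xff is not a valid lead byte in EUC-JP.
    ("euc_jp", b"a\xffb"),
    # Starts like an ISO 2022 escape sequence (ESC $ ...), but "ESC $ Z" is
    # not one of the designations normally associated with ISO-2022-JP.
    ("iso2022_jp", b"a\x1b$Zb"),
]

for encoding, data in samples:
    for errors, outcome in decode_with_handlers(data, encoding).items():
        # ascii() keeps the output readable on any terminal.
        print(encoding, errors, ascii(outcome))
```

Comparing the "replace" and "ignore" rows for the ISO 2022 sample is one way to see whether the bogus escape sequence is replaced byte by byte, replaced as a whole, or silently dropped.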