On Thursday 21 October 2010 21:14:55, Toshio Kuratomi wrote:
> > That's exactly what I was looking for! Thanks. I think you've learned a
> > huge amount of good information that's difficult to find, so writing it
> > up in a more permanent and easy to find location will really help future
> > Python developers!
>
> One further thing I'd be interested in is if you could document any best
> practices from this experience. Things like, "surrogateescape is a
> good/bad default in these cases",

I advise using PEP 383 (surrogateescape) when the *native* data type is
bytes. Some examples:

 - filenames on UNIX/BSD
 - environment variables on UNIX/BSD
 - well, most data sent to/received from the system on UNIX/BSD :-)

For network protocols, I don't know. It looks like the new email modules
will offer two API levels: a low level (native type) using bytes, and a
high level using str (unicode). I don't know whether the high level API
uses PEP 383 or not.

PEP 383 can be used to avoid UnicodeDecodeError. But sometimes it's better
to raise an error to warn the user that the encoding is incorrect or the
input data is invalid (well, at least not valid according to the encoding).

I don't use strict rules. Each problem is different. E.g. it looks like not
everybody agrees on using PEP 383 for the host/domain name (issue #9377; I
didn't read the whole issue, just a few lines).

> When is parallel functions for bytes and str better than a single
> polymorphic function?

If you cannot decide the output type from the inputs, it's better to have
two functions. Examples:

 - 2 functions: os.getcwd() / os.getcwdb()
 - polymorphic: os.path.*()

But you should never accept mixed types, e.g.
os.path.join(b'bytes', 'unicode') has to raise a TypeError.

-- 
Victor Stinner
http://www.haypocalc.com/
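
A minimal Python 3 sketch of the behaviours discussed in this message: strict decoding versus the PEP 383 surrogateescape error handler, the os.getcwd() / os.getcwdb() pair, and the TypeError on mixed types. The sample filename bytes and the variable names are made up for illustration.

```python
import os

# Hypothetical filename bytes that are not valid UTF-8
# (0xE9 is "é" in Latin-1, but an incomplete sequence in UTF-8).
raw_name = b"caf\xe9.txt"

# Strict decoding reports the problem to the caller.
try:
    raw_name.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# surrogateescape (PEP 383) keeps the undecodable byte as a lone
# surrogate, so the original bytes can be recovered when encoding back.
text_name = raw_name.decode("utf-8", "surrogateescape")
round_trip = text_name.encode("utf-8", "surrogateescape")

# Two parallel functions: the output type cannot be inferred from inputs.
cwd_text = os.getcwd()    # str
cwd_bytes = os.getcwdb()  # bytes

# Polymorphic function: output type follows the input type,
# but mixing bytes and str is rejected.
joined_bytes = os.path.join(b"dir", b"file")
joined_text = os.path.join("dir", "file")
try:
    os.path.join(b"bytes", "unicode")
    mixed_rejected = False
except TypeError:
    mixed_rejected = True
```

Whether to decode strictly or with surrogateescape stays a per-case decision, as the message says: the first surfaces bad input early, the second lets undecodable data pass through unchanged.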