On 21/05/14 11:32 -0400, Doug Hellmann wrote:
On Thu, May 15, 2014 at 11:41 AM, Victor Stinner <victor.stin...@enovance.com> wrote:Hi,The functions safe_decode() and safe_encode() have been ported to Python 3, and changed more than once. IMO we can still improve these functions to make them more reliable and easier to use. (1) My first concern is that these functions try to guess user expectation about encodings. They use "sys.stdin.encoding or sys.getdefaultencoding()" as the default encoding to decode, but this encoding depends on the locale encoding (stdin encoding), on stdin (is stdin a TTY? is stdin mocked?), and on the Python major version. IMO the default encoding should be UTF-8 because most OpenStack components expect this encoding. Or maybe users want to display data to the terminal, and so the locale encoding should be used? In this case, locale.getpreferredencoding() would be more reliable than sys.stdin.encoding.From what I can see, most uses of the module are in the client programs. If using locale to find a default encoding is the best approach, perhaps we should go ahead and make the change you propose. One place I see safe_decode() used in a questionable way is in heat in heat/engine/parser.py where validation errors are being re-raised as StackValidationFailed (line 376 in my version). It's not clear why the message is processed the way it is, so I would want to understand the history before proposing a change there.
The original intent for these 2 functions was to provide a reliable way to encode/decode the input. As already mentioned in this thread, it's not good to assume what the best encoding for every case is and I would also prefer to keep these functions generci - as in, not thought just for client libraries. We use this module in Glance as well, unfortunately, not as much as I'd like. I would prefer the improved-encoding guess to happen outside these functions, if it's meant for client library. For example, glanceclient could use `getpreferredencoding` and pass that to safe_(encode|decode). Flavio
(2) My second concern is that safe_encode(bytes, incoming, encoding) transcodes the bytes string from incoming to encoding if these two encodings are different. When I port code to Python 3, I'm looking for a function to replace this common pattern: if isinstance(data, six.text_type): data = data.encode(encoding) I don't want to modify data encoding if it is already a bytes string. So I would prefer to have: def safe_encode(data, encoding='utf-8'): if isinstance(data, six.text_type): data = data.encode(encoding) return data Changing safe_encode() like this will break applications relying on the "transcode" feature (incoming => encoding). If such usage exists, I suggest to add a new function (ex: "transcode" ?) with an API fitting this use case. For example, the incoming encoding would be mandatory. Is there really applications using the incoming parameter?The only place I see that parameter used in integrated projects is in the tests for the module. I didn't check the non-integrated projects. Given its symmetry with safe_decode(), I don't really see a problem with the current name. Something like the shortcut you propose is present in safe_encode(), so I'm not sure what benefit a new function brings?
+1 Flavio P.S: I'm working on graduating strutils from the incubator. I'm glad you brought this up. I'm almost done with the graduation thing. -- @flaper87 Flavio Percoco
pgpHNaKdhG35L.pgp
Description: PGP signature
_______________________________________________ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev