On 21/05/14 11:32 -0400, Doug Hellmann wrote:
On Thu, May 15, 2014 at 11:41 AM, Victor Stinner
<victor.stin...@enovance.com> wrote:
Hi,

The functions safe_decode() and safe_encode() have been ported to Python 3,
and changed more than once. IMO we can still improve these functions to make
them more reliable and easier to use.


(1) My first concern is that these functions try to guess user expectation
about encodings. They use "sys.stdin.encoding or sys.getdefaultencoding()" as
the default encoding to decode, but this encoding depends on the locale
encoding (stdin encoding), on stdin (is stdin a TTY? is stdin mocked?), and on
the Python major version.

IMO the default encoding should be UTF-8 because most OpenStack components
expect this encoding.

Or maybe users want to display data to the terminal, and so the locale
encoding should be used? In this case, locale.getpreferredencoding() would be
more reliable than sys.stdin.encoding.

From what I can see, most uses of the module are in the client
programs. If using locale to find a default encoding is the best
approach, perhaps we should go ahead and make the change you propose.

One place I see safe_decode() used in a questionable way is in heat in
heat/engine/parser.py where validation errors are being re-raised as
StackValidationFailed (line 376 in my version). It's not clear why the
message is processed the way it is, so I would want to understand the
history before proposing a change there.

The original intent for these 2 functions was to provide a reliable
way to encode/decode the input. As already mentioned in this thread,
it's not good to assume what the best encoding for every case is and I
would also prefer to keep these functions generci - as in, not thought
just for client libraries. We use this module in Glance as well,
unfortunately, not as much as I'd like.

I would prefer the improved-encoding guess to happen outside these
functions, if it's meant for client library. For example, glanceclient
could use `getpreferredencoding` and pass that to
safe_(encode|decode).

Flavio




(2) My second concern is that safe_encode(bytes, incoming, encoding)
transcodes the bytes string from incoming to encoding if these two encodings
are different.

When I port code to Python 3, I'm looking for a function to replace this
common pattern:

    if isinstance(data, six.text_type):
        data = data.encode(encoding)

I don't want to modify data encoding if it is already a bytes string. So I
would prefer to have:

    def safe_encode(data, encoding='utf-8'):
        if isinstance(data, six.text_type):
            data = data.encode(encoding)
        return data

Changing safe_encode() like this will break applications relying on the
"transcode" feature (incoming => encoding). If such usage exists, I suggest to
add a new function (ex: "transcode" ?) with an API fitting this use case. For
example, the incoming encoding would be mandatory.

Is there really applications using the incoming parameter?

The only place I see that parameter used in integrated projects is in
the tests for the module. I didn't check the non-integrated projects.
Given its symmetry with safe_decode(), I don't really see a problem
with the current name. Something like the shortcut you propose is
present in safe_encode(), so I'm not sure what benefit a new function
brings?

+1

Flavio

P.S: I'm working on graduating strutils from the incubator. I'm glad
you brought this up. I'm almost done with the graduation thing.

--
@flaper87
Flavio Percoco

Attachment: pgpHNaKdhG35L.pgp
Description: PGP signature

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Reply via email to