Re: [openstack-dev] [oslo] strutils: enhance safe_decode() and safe_encode()

2014-05-22 Thread Flavio Percoco

On 21/05/14 11:32 -0400, Doug Hellmann wrote:

On Thu, May 15, 2014 at 11:41 AM, Victor Stinner
victor.stin...@enovance.com wrote:

Hi,

The functions safe_decode() and safe_encode() have been ported to Python 3,
and changed more than once. IMO we can still improve these functions to make
them more reliable and easier to use.


(1) My first concern is that these functions try to guess user expectation
about encodings. They use sys.stdin.encoding or sys.getdefaultencoding() as
the default encoding to decode, but this encoding depends on the locale
encoding (stdin encoding), on stdin (is stdin a TTY? is stdin mocked?), and on
the Python major version.

IMO the default encoding should be UTF-8 because most OpenStack components
expect this encoding.

Or maybe users want to display data to the terminal, and so the locale
encoding should be used? In this case, locale.getpreferredencoding() would be
more reliable than sys.stdin.encoding.


From what I can see, most uses of the module are in the client
programs. If using locale to find a default encoding is the best
approach, perhaps we should go ahead and make the change you propose.

One place I see safe_decode() used in a questionable way is in heat in
heat/engine/parser.py where validation errors are being re-raised as
StackValidationFailed (line 376 in my version). It's not clear why the
message is processed the way it is, so I would want to understand the
history before proposing a change there.


The original intent for these 2 functions was to provide a reliable
way to encode/decode the input. As already mentioned in this thread,
it's not good to assume what the best encoding for every case is and I
would also prefer to keep these functions generci - as in, not thought
just for client libraries. We use this module in Glance as well,
unfortunately, not as much as I'd like.

I would prefer the improved-encoding guess to happen outside these
functions, if it's meant for client library. For example, glanceclient
could use `getpreferredencoding` and pass that to
safe_(encode|decode).

Flavio






(2) My second concern is that safe_encode(bytes, incoming, encoding)
transcodes the bytes string from incoming to encoding if these two encodings
are different.

When I port code to Python 3, I'm looking for a function to replace this
common pattern:

if isinstance(data, six.text_type):
data = data.encode(encoding)

I don't want to modify data encoding if it is already a bytes string. So I
would prefer to have:

def safe_encode(data, encoding='utf-8'):
if isinstance(data, six.text_type):
data = data.encode(encoding)
return data

Changing safe_encode() like this will break applications relying on the
transcode feature (incoming = encoding). If such usage exists, I suggest to
add a new function (ex: transcode ?) with an API fitting this use case. For
example, the incoming encoding would be mandatory.

Is there really applications using the incoming parameter?


The only place I see that parameter used in integrated projects is in
the tests for the module. I didn't check the non-integrated projects.
Given its symmetry with safe_decode(), I don't really see a problem
with the current name. Something like the shortcut you propose is
present in safe_encode(), so I'm not sure what benefit a new function
brings?


+1

Flavio

P.S: I'm working on graduating strutils from the incubator. I'm glad
you brought this up. I'm almost done with the graduation thing.

--
@flaper87
Flavio Percoco


pgpHNaKdhG35L.pgp
Description: PGP signature
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo] strutils: enhance safe_decode() and safe_encode()

2014-05-21 Thread Doug Hellmann
On Thu, May 15, 2014 at 11:41 AM, Victor Stinner
victor.stin...@enovance.com wrote:
 Hi,

 The functions safe_decode() and safe_encode() have been ported to Python 3,
 and changed more than once. IMO we can still improve these functions to make
 them more reliable and easier to use.


 (1) My first concern is that these functions try to guess user expectation
 about encodings. They use sys.stdin.encoding or sys.getdefaultencoding() as
 the default encoding to decode, but this encoding depends on the locale
 encoding (stdin encoding), on stdin (is stdin a TTY? is stdin mocked?), and on
 the Python major version.

 IMO the default encoding should be UTF-8 because most OpenStack components
 expect this encoding.

 Or maybe users want to display data to the terminal, and so the locale
 encoding should be used? In this case, locale.getpreferredencoding() would be
 more reliable than sys.stdin.encoding.

From what I can see, most uses of the module are in the client
programs. If using locale to find a default encoding is the best
approach, perhaps we should go ahead and make the change you propose.

One place I see safe_decode() used in a questionable way is in heat in
heat/engine/parser.py where validation errors are being re-raised as
StackValidationFailed (line 376 in my version). It's not clear why the
message is processed the way it is, so I would want to understand the
history before proposing a change there.



 (2) My second concern is that safe_encode(bytes, incoming, encoding)
 transcodes the bytes string from incoming to encoding if these two encodings
 are different.

 When I port code to Python 3, I'm looking for a function to replace this
 common pattern:

 if isinstance(data, six.text_type):
 data = data.encode(encoding)

 I don't want to modify data encoding if it is already a bytes string. So I
 would prefer to have:

 def safe_encode(data, encoding='utf-8'):
 if isinstance(data, six.text_type):
 data = data.encode(encoding)
 return data

 Changing safe_encode() like this will break applications relying on the
 transcode feature (incoming = encoding). If such usage exists, I suggest to
 add a new function (ex: transcode ?) with an API fitting this use case. For
 example, the incoming encoding would be mandatory.

 Is there really applications using the incoming parameter?

The only place I see that parameter used in integrated projects is in
the tests for the module. I didn't check the non-integrated projects.
Given its symmetry with safe_decode(), I don't really see a problem
with the current name. Something like the shortcut you propose is
present in safe_encode(), so I'm not sure what benefit a new function
brings?

Doug



 So, what do you think about that?

 Victor

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo] strutils: enhance safe_decode() and safe_encode()

2014-05-21 Thread John Dennis
On 05/15/2014 11:41 AM, Victor Stinner wrote:
 Hi,
 
 The functions safe_decode() and safe_encode() have been ported to Python 3, 
 and changed more than once. IMO we can still improve these functions to make 
 them more reliable and easier to use.
 
 
 (1) My first concern is that these functions try to guess user expectation 
 about encodings. They use sys.stdin.encoding or sys.getdefaultencoding() as 
 the default encoding to decode, but this encoding depends on the locale 
 encoding (stdin encoding), on stdin (is stdin a TTY? is stdin mocked?), and 
 on 
 the Python major version.
 
 IMO the default encoding should be UTF-8 because most OpenStack components 
 expect this encoding.
 
 Or maybe users want to display data to the terminal, and so the locale 
 encoding should be used? In this case, locale.getpreferredencoding() would be 
 more reliable than sys.stdin.encoding.

The problem is you can't know the correct encoding to use until you know
the encoding of the IO stream, therefore I don't think you can correctly
write a generic encode/decode functions. What if you're trying to send
the output to multiple IO streams potentially with different encodings?
Think that's far fetched? Nope, it's one of the nastiest and common
problems in Python2. The default encoding differs depending on whether
the IO target is a tty or not. Therefore code that works fine when
written to the terminal blows up with encoding errors when redirected to
a file (because the TTY probably has UTF-8 and all other encodings
default to ASCII due to sys.defaultencoding).

Another problem is that Python2 default encoding is ASCII but in Python3
it's UTF-8 (IMHO the default encoding in Python2 should have been UTF-8,
that fact it was set to ASCII is the cause of 99% of the encoding
exceptions in Python2).

Given that you don't know what the encoding of the IO stream is I don't
think you should base it on the locale nor sys.stdin. Rather I think we
should just agree everything is UTF-8. If that messes up someones
terminal output I think it's fair to say if you're running OpenStack
you'll need to switch to UTF-8. Anything else requires way more
knowledge than we have available in a generic function. Solving this so
the encodings match for each and every IO stream is very complicated,
note Python3 still punts on this.


-- 
John

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo] strutils: enhance safe_decode() and safe_encode()

2014-05-21 Thread Doug Hellmann
On Wed, May 21, 2014 at 12:30 PM, John Dennis jden...@redhat.com wrote:
 On 05/15/2014 11:41 AM, Victor Stinner wrote:
 Hi,

 The functions safe_decode() and safe_encode() have been ported to Python 3,
 and changed more than once. IMO we can still improve these functions to make
 them more reliable and easier to use.


 (1) My first concern is that these functions try to guess user expectation
 about encodings. They use sys.stdin.encoding or sys.getdefaultencoding() as
 the default encoding to decode, but this encoding depends on the locale
 encoding (stdin encoding), on stdin (is stdin a TTY? is stdin mocked?), and 
 on
 the Python major version.

 IMO the default encoding should be UTF-8 because most OpenStack components
 expect this encoding.

 Or maybe users want to display data to the terminal, and so the locale
 encoding should be used? In this case, locale.getpreferredencoding() would be
 more reliable than sys.stdin.encoding.

 The problem is you can't know the correct encoding to use until you know
 the encoding of the IO stream, therefore I don't think you can correctly
 write a generic encode/decode functions. What if you're trying to send
 the output to multiple IO streams potentially with different encodings?
 Think that's far fetched? Nope, it's one of the nastiest and common
 problems in Python2. The default encoding differs depending on whether
 the IO target is a tty or not. Therefore code that works fine when
 written to the terminal blows up with encoding errors when redirected to
 a file (because the TTY probably has UTF-8 and all other encodings
 default to ASCII due to sys.defaultencoding).

 Another problem is that Python2 default encoding is ASCII but in Python3
 it's UTF-8 (IMHO the default encoding in Python2 should have been UTF-8,
 that fact it was set to ASCII is the cause of 99% of the encoding
 exceptions in Python2).

 Given that you don't know what the encoding of the IO stream is I don't
 think you should base it on the locale nor sys.stdin. Rather I think we
 should just agree everything is UTF-8. If that messes up someones
 terminal output I think it's fair to say if you're running OpenStack
 you'll need to switch to UTF-8. Anything else requires way more
 knowledge than we have available in a generic function. Solving this so
 the encodings match for each and every IO stream is very complicated,
 note Python3 still punts on this.

Unfortunately we can't just agree to a single encoding in all cases.
Lots of people use encodings other than UTF-8 for terminals, and
that's where these functions are most frequently used.

Doug



 --
 John

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [oslo] strutils: enhance safe_decode() and safe_encode()

2014-05-15 Thread Victor Stinner
Hi,

The functions safe_decode() and safe_encode() have been ported to Python 3, 
and changed more than once. IMO we can still improve these functions to make 
them more reliable and easier to use.


(1) My first concern is that these functions try to guess user expectation 
about encodings. They use sys.stdin.encoding or sys.getdefaultencoding() as 
the default encoding to decode, but this encoding depends on the locale 
encoding (stdin encoding), on stdin (is stdin a TTY? is stdin mocked?), and on 
the Python major version.

IMO the default encoding should be UTF-8 because most OpenStack components 
expect this encoding.

Or maybe users want to display data to the terminal, and so the locale 
encoding should be used? In this case, locale.getpreferredencoding() would be 
more reliable than sys.stdin.encoding.


(2) My second concern is that safe_encode(bytes, incoming, encoding) 
transcodes the bytes string from incoming to encoding if these two encodings 
are different.

When I port code to Python 3, I'm looking for a function to replace this 
common pattern:

if isinstance(data, six.text_type):
data = data.encode(encoding)

I don't want to modify data encoding if it is already a bytes string. So I 
would prefer to have:

def safe_encode(data, encoding='utf-8'):
if isinstance(data, six.text_type):
data = data.encode(encoding)
return data

Changing safe_encode() like this will break applications relying on the 
transcode feature (incoming = encoding). If such usage exists, I suggest to 
add a new function (ex: transcode ?) with an API fitting this use case. For 
example, the incoming encoding would be mandatory.

Is there really applications using the incoming parameter?


So, what do you think about that?

Victor

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev