[issue26369] unicode.decode and str.encode are unnecessarily confusing for non-ascii

2020-05-30 Thread Serhiy Storchaka
Change by Serhiy Storchaka: resolution: -> out of date; stage: -> resolved; status: open -> closed

[issue26369] unicode.decode and str.encode are unnecessarily confusing for non-ascii

2016-05-20 Thread Ben Spiller
Ben Spiller added the comment: Thanks for considering this, anyway. I'll admit I'm disappointed we couldn't fix this on the 2.7 train, as to me fixing a method that takes an errors='ignore' argument and then throws an exception anyway seems a little more like a bug than a feature (and

[issue26369] unicode.decode and str.encode are unnecessarily confusing for non-ascii

2016-05-20 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Ben, the methods on strings and Unicode objects in Python 2.x are direct interfaces to the underlying codecs. The codecs can handle any number of input and output types, so there are some which only work on 8-bit strings (bytes) and others which take

[issue26369] unicode.decode and str.encode are unnecessarily confusing for non-ascii

2016-05-19 Thread Josh Rosenberg
Josh Rosenberg added the comment: Agree with Steven; the whole reason Python 3 changed from unicode and str to str and bytes was because having Py2 str be text sometimes, and binary data at other times is confusing. The existing behavior can't change in Py2 in any meaningful way without
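
For readers comparing the two models, here is a minimal Python 3 sketch of the str/bytes split Josh refers to. It is illustrative only and assumes a Python 3 interpreter; the variable names are made up for the example.

```python
# Python 3: text (str) and binary data (bytes) are separate types.
text = "caf\u00e9"            # str holding a non-ASCII character
data = text.encode("utf-8")   # str -> bytes, the only direction encode() goes

# bytes has no encode() and str has no decode(), so the Python 2
# mix-up discussed in this thread cannot be written by accident.
bytes_has_encode = hasattr(data, "encode")
str_has_decode = hasattr(text, "decode")

round_trip = data.decode("utf-8")   # bytes -> str
```

Because each type exposes only one direction, there is no place for an implicit conversion through a default encoding.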

[issue26369] unicode.decode and str.encode are unnecessarily confusing for non-ascii

2016-05-19 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: > btw If anyone can find the place in the code (sorry I tried and failed!) where str.encode('utf-8', error=X) is resulting in an implicit call to the equivalent of decode(defaultencoding, errors=strict) (as suggested by the exception message) I think

[issue26369] unicode.decode and str.encode are unnecessarily confusing for non-ascii

2016-05-19 Thread Steven D'Aprano
Steven D'Aprano added the comment: Ben, I'm sorry to see you have spent such a long time writing up reasons for changing this behaviour. I fear this is a total waste of your time, and ours to read it. Python 2.7 is under feature freeze, and changing the behaviour of str.encode and

[issue26369] unicode.decode and str.encode are unnecessarily confusing for non-ascii

2016-05-19 Thread Ben Spiller
Ben Spiller added the comment: btw If anyone can find the place in the code (sorry I tried and failed!) where str.encode('utf-8', error=X) is resulting in an implicit call to the equivalent of decode(defaultencoding, errors=strict) (as suggested by the exception message) I think it'll be
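
The two-step behaviour Ben describes can be imitated in Python 3 with a rough sketch like the one below. This is not the CPython 2 implementation: py2_str_encode is a hypothetical helper, and the default encoding is assumed to be 'ascii'. It only illustrates why the errors argument never reaches the implicit decode step.

```python
def py2_str_encode(data: bytes, encoding: str, errors: str = "strict",
                   default_encoding: str = "ascii") -> bytes:
    """Hypothetical imitation of Python 2 str.encode() on an 8-bit string."""
    # Step 1: implicit decode with the default encoding, always strict.
    text = data.decode(default_encoding, "strict")
    # Step 2: the explicit encode; only this step sees the caller's `errors`.
    return text.encode(encoding, errors)


# ASCII-only input passes through both steps unchanged.
ascii_result = py2_str_encode(b"abc", "utf-8")

# Non-ASCII input fails in step 1, even though errors="ignore" was requested.
try:
    py2_str_encode(b"caf\xc3\xa9", "utf-8", errors="ignore")
    raised = None
except UnicodeDecodeError as exc:
    raised = exc
```

In this sketch the exception comes from a decode step, which matches the "decode" wording of the error message that the thread finds confusing.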

[issue26369] unicode.decode and str.encode are unnecessarily confusing for non-ascii

2016-05-12 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: If str.encode() raises a decoding exception, this is a programming bug. It would be bad to hide it. FYI, the default encoding is not hardcoded 'ascii'. Google "Changing default encoding in Python". Maybe this will help in your program.

[issue26369] unicode.decode and str.encode are unnecessarily confusing for non-ascii

2016-05-12 Thread Ben Spiller
Ben Spiller added the comment: I'm proposing that str.encode() should _not_ throw a 'decode' exception for non-ascii characters and be effectively a no-op, to match what it already does for ascii characters - which therefore shouldn't break behavior anyone will be depending on. This could be

[issue26369] unicode.decode and str.encode are unnecessarily confusing for non-ascii

2016-05-12 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: What do you propose? Note that str.encode() doesn't raise an exception. ASCII unicode and 8-bit strings are interchangeable. ASCII unicode strings can be packed in str for less memory consumption (see xmlrpclib or ElementTree), a lot of str constants are

[issue26369] unicode.decode and str.encode are unnecessarily confusing for non-ascii

2016-05-12 Thread Ben Spiller
Ben Spiller added the comment: Yes, the situation is loads better in Python 3; this issue is specific to 2.x, but like many people, sadly, we're not able to move to 3 for the time being. Since making this mistake is quite common and there's some sensible behaviour that would make it disappear

[issue26369] unicode.decode and str.encode are unnecessarily confusing for non-ascii

2016-05-12 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Note that with the -3 option Python 2.7 already warns about incompatibilities.
>>> 'abc'.encode('base64')
__main__:1: DeprecationWarning: 'base64' is not a text encoding; use codecs.encode() to handle arbitrary codecs
'YWJj\n'
>>> 'YWJj\n'.decode('base64')
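
As a point of comparison, the replacement that the warning recommends looks roughly like this in Python 3. This is a small sketch assuming Python 3.4 or later, where the 'base64' codec alias is available through the codecs module functions but is rejected by str.encode().

```python
import codecs

# Bytes-to-bytes codecs go through the codecs module functions.
encoded = codecs.encode(b"abc", "base64")
decoded = codecs.decode(encoded, "base64")

# str.encode() accepts text encodings only, so 'base64' is refused.
try:
    "abc".encode("base64")
    lookup_failed = False
except LookupError:
    lookup_failed = True
```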

[issue26369] unicode.decode and str.encode are unnecessarily confusing for non-ascii

2016-05-12 Thread Ben Spiller
Ben Spiller added the comment: Thanks, that's really helpful. Having thought about it some more, I think if possible it'd be really so much better to actually 'fix' the behaviour for the unicode<->str standard codecs (i.e. not base64) rather than just documenting around it. The current