[issue38003] Change 2to3 to replace 'basestring' with '(str,bytes)'

2019-09-07 Thread Terry J. Reedy


Terry J. Reedy  added the comment:

Replace 2. above with "2. Replace 'basestring' with '(unicode, bytes)'."

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38003] Change 2to3 to replace 'basestring' with '(str,bytes)'

2019-09-07 Thread Terry J. Reedy


Terry J. Reedy  added the comment:

Replacing 'basestring' with 'str' is not a bug in the behavioral sense because 
it is intended and documented.
https://docs.python.org/3/library/2to3.html#2to3fixer-basestring

How the current behavior is correct: 2to3 converts syntactically valid 2.x code 
to syntactically valid 3.x code.  It cannot, however, guarantee semantic 
correctness.  A particular problem is that str is semantically ambiguous in 
2.x, as it is used both for (encoded) text and binary data.  To resolve the 
ambiguity, 2.6 introduced 'bytes' as a synonym for 'str'.  2to3 assumes that 
'bytes' means binary data, including text that will still be encoded in 3.x, 
while 'str' means text that is encoded bytes in 2.x but *will be unicode* in 
3.x.  Hence it changes 'unicode' to unambiguous 'str' and 'basestring' == 
Union(unicode, str) to Union(str, str) == 'str'.
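
As a rough illustration of that mapping (a sketch in Python 3, not output from
2to3 itself), this is how a converted check behaves once 'basestring' has
become 'str'. The function name is hypothetical and only stands in for
whatever code held the original 2.x isinstance test.

```python
# Hypothetical helper: the Python 2 source read
#     isinstance(value, basestring)
# and the documented 2to3 fixer rewrites 'basestring' to 'str'.
def is_string_after_2to3(value):
    return isinstance(value, str)


# Text still passes the converted check.
print(is_string_after_2to3("abc"))   # True

# A value that is still bytes in 3.x no longer passes, although the
# equivalent 2.x 'str' value did pass the basestring check.
print(is_string_after_2to3(b"abc"))  # False
```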

If you fool 2to3 by applying isinstance(value, basestring) to a value that will 
still be bytes at that point in 3.x, you get a semantic change.  Possible fixes:

1. Since you decode value after the check, do it before the check.

if isinstance(value, bytes):
    value = value.decode(encoding)
if not isinstance(value, unicode):
    some other code

2. Replace 'basestring' with '(unicode, basestring)'

In both cases, the 'unicode' to 'str' replacement should result in correct 3.x 
code.

3. Edit Lib/lib2to3/fixes/fix_basestring.py to replace with '(str, bytes)'.  
This should be straightforward, but ask on python-list if you need help.
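
For reference, here is a minimal sketch of what fixes 1 and 2 might look like
after conversion, written directly as Python 3. It assumes the tuple in fix 2
is '(unicode, bytes)', as corrected in the follow-up comment, so that the
'unicode' to 'str' replacement yields '(str, bytes)'. The function names, the
"utf-8" default, and the TypeError are assumptions made for illustration; the
original only says "some other code".

```python
def to_text(value, encoding="utf-8"):
    """Fix 1: decode bytes first, then test for text (hypothetical helper)."""
    if isinstance(value, bytes):
        value = value.decode(encoding)
    if not isinstance(value, str):
        # Stand-in for "some other code" in the original example.
        raise TypeError("expected text or bytes, got %s" % type(value).__name__)
    return value


def is_string_like(value):
    """Fix 2: 2.x '(unicode, bytes)' becomes '(str, bytes)' after conversion."""
    return isinstance(value, (str, bytes))


print(to_text(b"abc"))          # bytes are decoded before the check
print(is_string_like(b"abc"))   # bytes are accepted alongside str
```

The first form keeps the check strict about text and moves the decoding
earlier; the second keeps the old "either kind of string" meaning and leaves
decoding to later code.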

As for your second example, 2to3 is not meant for 2&3 code using exception 
tricks and six/future imports.  Turning 2&3 code into idiomatic 3-only code is 
a separate subject.

Since others have run into the same issues and more will, I intend to post a 
revised version of the explanation above, with fixes for a revised example, to 
python-list as "2to3, str, and basestring".  Any further discussion should go 
there.

--
resolution: rejected -> not a bug




[issue38003] Change 2to3 to replace 'basestring' with '(str,bytes)'

2019-09-07 Thread Benjamin Peterson


Benjamin Peterson  added the comment:

meant to say "really couldn't"

--




[issue38003] Change 2to3 to replace 'basestring' with '(str,bytes)'

2019-09-07 Thread Benjamin Peterson


Benjamin Peterson  added the comment:

Even at this late stage, we could really change 2to3's behavior here. 
Presumably many others are relying on the current behavior.

--
resolution:  -> rejected
stage:  -> resolved
status: open -> closed




[issue38003] Change 2to3 to replace 'basestring' with '(str,bytes)'

2019-09-07 Thread Bob Kline


Bob Kline  added the comment:

OK, I give up. In parting I will point out that the official Python 2 
documentation says "basestring() This abstract type is the superclass for str 
and unicode. It cannot be called or instantiated, but it can be used to test 
whether an object is an instance of str or unicode. isinstance(obj, basestring) 
is equivalent to isinstance(obj, (str, unicode))." That's exactly what the code 
we are converting (much of which was written years before Python 3 even 
existed) was doing. As for the idea that we weren't really "planning to use it 
as logical text" (ignoring the fact that _everyone_ used Python 2 str objects 
to represent logical text back in 2003, and ignoring the fact that the repro 
case given at the top of this report converts the 8-bit string value to Unicode 
-- why else would it do that except to use the value as "logical text"?) ... 
well, I don't know where to start. I'm done here. :->}

--




[issue38003] Change 2to3 to replace 'basestring' with '(str,bytes)'

2019-09-06 Thread Terry J. Reedy


Change by Terry J. Reedy :


--
nosy: +benjamin.peterson
title: Incorrect "fixing" of isinstance tests for basestring -> Change 2to3 to 
replace 'basestring' with '(str,bytes)'
versions: +Python 3.9 -Python 3.7
