New submission from Nick Coghlan:

Some of the hardest compatibility issues to track down in Python 3 migrations 
are those where existing code is depending on an implicit str->unicode 
promotion somewhere in the depths of a support library (or sometimes even the 
standard library - the context where this came up relates to some apparent 
misbehaviour in the standard library). In other cases, just being able to rule 
implicit conversions out as a possible contributing factor can be helpful in 
finding the real problem.

It's technically already possible to hook implicit conversions by adjusting (or 
shadowing) the site.py module and replacing the default "ascii" encoding with 
one that emits a warning whenever you rely on it: 
http://washort.twistedmatrix.com/2010/11/unicode-in-python-and-how-to-prevent-it.html
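
For readers who have not seen such a codec, here is a minimal sketch of what 
one might look like. It is not the code from the linked post: the helper names 
(_warn, _encode, _decode, _search) and the warning text are assumptions made 
for illustration, and only the codec name "ascii_with_warnings" comes from the 
discussion here. It wraps the stock ASCII codec and emits a UnicodeWarning on 
every encode or decode call.

```python
import codecs
import warnings

# Codec name taken from the discussion; everything else here is illustrative.
CODEC_NAME = "ascii_with_warnings"

# Reuse the stock ASCII codec for the actual conversion work.
_ascii = codecs.lookup("ascii")


def _warn(direction):
    # stacklevel=3 is an attempt to point at the caller of the conversion
    # rather than at this helper; the best value may vary by call path.
    warnings.warn(
        "conversion via %s (%s)" % (CODEC_NAME, direction),
        UnicodeWarning,
        stacklevel=3,
    )


def _encode(text, errors="strict"):
    _warn("encode")
    return _ascii.encode(text, errors)


def _decode(data, errors="strict"):
    _warn("decode")
    return _ascii.decode(data, errors)


def _search(name):
    # Codec search functions return a CodecInfo for names they handle
    # and None for everything else.
    if name == CODEC_NAME:
        return codecs.CodecInfo(name=CODEC_NAME, encode=_encode, decode=_decode)
    return None


codecs.register(_search)
```

Once registered, the codec behaves like "ascii" but warns on each use, so 
making it the default encoding would surface every implicit conversion.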

However, actually setting that up is a bit tricky, since we deliberately drop 
"sys.setdefaultencoding" from the sys module in the default site module. That 
means requesting warnings for implicit conversions requires doing the following:

1. Finding the "ascii_with_warnings" codec above (or writing your own)
2. Learning one of the following 3 tricks for overriding the default encoding:

2a. Run with "-S" and call sys.setdefaultencoding post-startup
2b. Edit the actual system site.py in a container or other test environment
2c. Shadow site.py with your own modified copy

3. Run your tests or application with the modified default encoding
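
As a rough sketch of option 2a, something like the following could be run at 
the top of a script started with "python -S". The function name 
enable_warning_codec is hypothetical, and the sketch assumes a codec named 
"ascii_with_warnings" has already been registered. It checks for 
sys.setdefaultencoding rather than assuming it, since the default site module 
removes it.

```python
import sys


def enable_warning_codec(codec_name="ascii_with_warnings"):
    """Switch the default encoding if the interpreter still allows it.

    Returns True when the default encoding was changed, False when
    sys.setdefaultencoding is unavailable (for example because the
    site module has already removed it).
    """
    setter = getattr(sys, "setdefaultencoding", None)
    if setter is None:
        return False
    # Assumes codec_name was registered with codecs.register() beforehand.
    setter(codec_name)
    return True
```

Note that "-S" also skips the rest of the site module's startup work, so this 
approach suits an isolated test run better than a normal deployment.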

If we wanted to make that easier for folks migrating, the first step would be 
to provide the "ascii_with_warnings" codec by default in Python 2.7 (perhaps as 
"_ascii_with_warnings", since it isn't intended for general use and is just a 
migration helper).

The second would be to provide a way to turn it on that doesn't require 
fiddling with the site module. The simplest option there would be to always 
enable it under `-3`.

The argument against the simple option is that I'm not sure how noisy it would 
be by default - there are some standard library modules (e.g. URL processing) 
where we still rely on implicit encoding and decoding in Python 2, but have 
separate code paths in Python 3.

Since we don't have -X options in Python 2, the second simplest alternative 
would be to leave `sys.setdefaultencoding` available when running under `-3`: 
that way folks could more easily opt in to enabling the "ascii_with_warnings" 
codec selectively (e.g. via a context manager), rather than always having it 
enabled.
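
To make the context manager idea concrete, here is one possible shape for it. 
This is only a sketch under the assumption that sys.setdefaultencoding is 
still available (as proposed for `-3`) and that a codec named 
"ascii_with_warnings" is registered; the name implicit_conversion_warnings is 
made up for illustration.

```python
import contextlib
import sys


@contextlib.contextmanager
def implicit_conversion_warnings(codec_name="ascii_with_warnings"):
    """Temporarily route implicit conversions through a warning codec.

    Yields True if the default encoding was switched, False if
    sys.setdefaultencoding is not available in this interpreter.
    """
    setter = getattr(sys, "setdefaultencoding", None)
    if setter is None:
        # Nothing to switch; run the block unchanged.
        yield False
        return
    previous = sys.getdefaultencoding()
    setter(codec_name)
    try:
        yield True
    finally:
        # Restore the original default even if the block raised.
        setter(previous)
```

A test could then wrap only the suspect call in 
"with implicit_conversion_warnings():" and leave noisy standard library code 
paths outside the block.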

----------
messages: 278402
nosy: ncoghlan
priority: normal
severity: normal
status: open
title: Migration RFE: optional warning for implicit unicode conversions
type: enhancement
versions: Python 2.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue28403>
_______________________________________