On 18/06/2012 00:55, Nick Coghlan wrote:
On Mon, Jun 18, 2012 at 6:41 AM, Guido van Rossum <gu...@python.org> wrote:
Would it make sense to detect and reject these in 3.3 if the 2.7 syntax is
used?
Possibly - I'm trying not to actually *change* any of the internals of
the string literal processing, though. (If I recall the way we
implemented the change correctly, by the time we get to processing the
string contents, we've forgotten which specific prefix was used.)
However, this question did remind me of another detail I wanted to
check after realising this discrepancy existed: it turns out this
semantic inconsistency already arises if you use "from __future__
import unicode_literals" to get supposedly "Python 3 style" string
literals in 2.x:
Python 2.7.3 (default, May 29 2012, 14:54:22)
>>> from __future__ import unicode_literals
>>> print(r"\u03b3")
γ
>>> print("\u03b3")
γ

Python 3.2.1 (default, Jul 11 2011, 18:54:42)
>>> print(r"\u03b3")
\u03b3
>>> print("\u03b3")
γ
So, perhaps the answer is to leave this as is, and try to make 2to3
smart enough to detect such escapes and replace them with their
properly encoded (according to the source code encoding) Unicode
equivalent?
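The rewriting step of that 2to3 idea might look roughly like the following. It is only a sketch, not an existing lib2to3 fixer; expand_raw_unicode_escapes and its source_encoding parameter are hypothetical. It follows the 2.x rule that an escape only counts when preceded by an odd number of backslashes, and it refuses characters that the source encoding cannot represent or that would change the meaning of an r"..." literal if written verbatim (backslash, quotes, line breaks, NUL).

```python
import re

# A maximal run of backslashes followed by a \u or \U escape body.
_ESCAPE = re.compile(r"(\\+)(u[0-9a-fA-F]{4}|U[0-9a-fA-F]{8})")

# Characters that cannot be written verbatim inside r"..." without changing
# or breaking the literal.
_UNSAFE_IN_RAW = {"\\", "'", '"', "\r", "\n", "\0"}


def expand_raw_unicode_escapes(body, source_encoding="utf-8"):
    # Hypothetical 2to3-style rewrite of the body of a 2.x ur"..." literal:
    # escapes that 2.x decoded are replaced by the characters themselves,
    # so the result keeps its value inside a Python 3 r"..." literal.

    def replace(match):
        backslashes, escape = match.group(1), match.group(2)
        if len(backslashes) % 2 == 0:
            # Even run: every backslash is literal, so 2.x decoded nothing.
            return match.group(0)
        # chr() raises ValueError for code points above 0x10FFFF.
        char = chr(int(escape[1:], 16))
        if char in _UNSAFE_IN_RAW:
            raise ValueError(f"cannot embed \\{escape} in a raw literal")
        # Raises UnicodeEncodeError if the source encoding cannot hold it.
        char.encode(source_encoding)
        return backslashes[:-1] + char

    return _ESCAPE.sub(replace, body)


print(expand_raw_unicode_escapes(r"\u03b3\n") == "\u03b3\\n")  # True
```

A fixer built this way would write the result back as r"..." and fall back to some other spelling whenever ValueError or UnicodeEncodeError is raised. Truncated escapes such as \u12, which 2.x reports as an error, are simply left untouched by this sketch.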
What if it's not possible to encode that character? I suppose that it
could be expanded into a string expression so that a non-raw string
literal could be used, possibly using implicit concatenation,
parenthesised, if necessary (or always?).
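That fallback can also be sketched. The hypothetical to_concatenated_literal below keeps everything except the decoded escapes inside r"..." pieces and moves each escape into its own non-raw "..." piece, where Python 3 gives \uXXXX and \UXXXXXXXX the same meaning. This sketch always parenthesises, and it assumes the body came from a literal delimited by the same quote character, so any quotes inside it are already backslash-protected.

```python
import re

# Same rule as a 2.x raw Unicode literal: \uXXXX / \UXXXXXXXX is decoded
# only after an odd run of backslashes.
_ESCAPE = re.compile(r"(\\+)(u[0-9a-fA-F]{4}|U[0-9a-fA-F]{8})")


def to_concatenated_literal(body, quote='"'):
    # Hypothetical helper: rewrite a 2.x ur"..." body as a parenthesised,
    # implicitly concatenated Python 3 string expression.
    pieces = []
    pos = 0
    for match in _ESCAPE.finditer(body):
        backslashes, escape = match.group(1), match.group(2)
        if len(backslashes) % 2 == 0:
            continue  # even run: no escape, leave it in the raw text
        # The raw piece ends with an even run of backslashes, which is
        # still a valid ending for a Python 3 r"..." literal.
        raw = body[pos:match.start()] + backslashes[:-1]
        if raw:
            pieces.append("r" + quote + raw + quote)
        # Non-raw piece carrying the escape unchanged.
        pieces.append(quote + "\\" + escape + quote)
        pos = match.end()
    tail = body[pos:]
    if tail or not pieces:
        pieces.append("r" + quote + tail + quote)
    return "(" + " ".join(pieces) + ")"


print(to_concatenated_literal(r"\u03b3\n"))  # ("\u03b3" r"\n")
```

Splitting only at odd backslash runs is what keeps every raw piece well-formed; the cost is a noisier literal than simply embedding the character.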
After all, that's already the way to include such characters in a
forward compatible way when using the future import:
Python 2.7.3 (default, May 29 2012, 14:54:22)
>>> from __future__ import unicode_literals
>>> print("γ")
γ
>>> print(r"γ\n")
γ\n

Python 3.2.1 (default, Jul 11 2011, 18:54:42)
>>> print("γ")
γ
>>> print(r"γ\n")
γ\n
So, rather than going ahead with reverting "ur" support as I first
suggested (since it turns out that's not a *new* problem, but just a
different way of spelling an *existing* problem), how about I do the
following:
1. Add a note to PEP 414 and the Py3k porting guide regarding the
discrepancy in escaping semantics for raw Unicode strings between 2.x
and 3.x
2. Reject the tracker issue for reverting the ur support (the semantic
problem already exists, and any solution we come up with for
__future__.unicode_literals should handle the ur prefix as well)
3. Create a new feature request for 2to3 to see if it can
automatically handle the problem of translating "\u" and "\U" escapes
into properly encoded Unicode characters
The scope of the problem is really quite small: you have to be using a
raw Unicode string in 2.x (either via the string prefix, or the future
import) *and* using a "\u" or "\U" escape within that string.
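That scope condition is small enough to state as code. In the sketch below, is_affected is a hypothetical name, and it takes an already-split prefix and body rather than tokenising 2.x source. It flags a literal only when it is raw, is Unicode either through a u prefix or through unicode_literals without a b prefix, and contains a \u or \U escape preceded by an odd number of backslashes.

```python
import re

# An odd run of backslashes (not preceded by another backslash) followed by
# a \u or \U escape: the only escapes a 2.x raw Unicode literal decodes.
_ESCAPE = re.compile(r"(?<!\\)(?:\\\\)*\\(?:u[0-9a-fA-F]{4}|U[0-9a-fA-F]{8})")


def is_affected(prefix, body, unicode_literals=False):
    # Hypothetical check: does this 2.x literal change value under Python 3?
    p = prefix.lower()
    is_raw = "r" in p
    # With unicode_literals, unprefixed and r"..." literals become Unicode,
    # but b"..." and br"..." literals stay byte strings.
    is_unicode = "u" in p or (unicode_literals and "b" not in p)
    return is_raw and is_unicode and _ESCAPE.search(body) is not None


print(is_affected("ur", r"\u03b3"))                        # True
print(is_affected("r", r"\u03b3", unicode_literals=True))  # True
print(is_affected("r", r"\u03b3"))                         # False
```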
[snip]
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev