I just saw a document which reminded me that strings can contain a
backslash followed by 3 octal digits. When a backslash is followed by
3 octal digits, that means a character with the corresponding
codepoint, and all is well.

The "valid scenario":

In [42]: "\777"
Out[42]: 'ǿ'

The problem is when you have just two valid octal digits:

In [40]: "\778"
Out[40]: '?8'

Which is ambiguous at least -- why is this not "\x07" followed by
"78", for example?  (Octal 77 actually corresponds to the "?"
character, codepoint 63.)

Or... when the first digit is not valid as octal - that is:

In [41]: "\877"
Out[41]: '\\877'

And then when the second digit is not valid octal:

In [43]: "\797"
Out[43]: '\x0797'
WAT?
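
A minimal sketch for checking what each of the literals above actually
contains, codepoint by codepoint. The helper name `decode` is made up
for this illustration, and it assumes the literal body contains no
double quote. Warnings are silenced because recent Python versions warn
about some of these escapes, and a future version might reject them
outright.

```python
import ast
import warnings


def decode(literal_body):
    """Evaluate the body of a double-quoted str literal as the compiler would.

    Hypothetical helper: assumes literal_body contains no double quote.
    """
    with warnings.catch_warnings():
        # Some of these escapes trigger warnings on newer Pythons.
        warnings.simplefilter("ignore")
        return ast.literal_eval('"' + literal_body + '"')


def codepoints(literal_body):
    """Return the codepoints of the decoded literal as hex strings."""
    return [hex(ord(ch)) for ch in decode(literal_body)]


if __name__ == "__main__":
    for body in (r"\777", r"\778", r"\877", r"\797"):
        print(body, len(decode(body)), codepoints(body))
```

Run this way, the last case comes out as three characters: "\x07"
followed by "9" and "7". The repr only looks like a 4-digit hex escape.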

So, between the possibly ambiguous scenario with two octal digits
followed by a non-octal digit, and the surprising repr in the last
case (which reads like a 4-hexadecimal-digit codepoint, although it is
"\x07" followed by "97"), what do you say to deprecating any
r"\[0-9]{1,3}" sequence that doesn't match a full 3 octal digits, and
raising a syntax error for it from Python 3.9 (or 3.10) on?
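
To make the proposed rule concrete, here is a rough sketch of a check
that flags the sequences it would reject. The function name
`suspicious_escapes` is invented for this example. It works on the raw
text of a literal body and does not account for escaped backslashes
("\\77") or raw strings, so treat it as an illustration rather than a
real linter.

```python
import re

# Backslash followed by one to three decimal digits (the regex above).
DIGIT_ESCAPE = re.compile(r"\\[0-9]{1,3}")
# Backslash followed by exactly three octal digits.
FULL_OCTAL = re.compile(r"\\[0-7]{3}")


def suspicious_escapes(literal_body):
    """Return digit escapes in literal_body that are not 3 octal digits.

    Hypothetical helper: literal_body is the source text between the
    quotes, e.g. r"\778" for the literal written as "\778".
    """
    return [
        match.group(0)
        for match in DIGIT_ESCAPE.finditer(literal_body)
        if not FULL_OCTAL.fullmatch(match.group(0))
    ]


if __name__ == "__main__":
    for body in (r"\777", r"\778", r"\877", r"\797", r"\0"):
        print(body, suspicious_escapes(body))
```

Note that a literal reading of the rule also flags the common "\0", so
a real deprecation would probably need to decide how to treat it.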

Best regards,

    js
  -><-
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
