I just saw a document that reminded me of string escapes made of a backslash followed by 3 octal digits. When a backslash is followed by 3 octal digits, it denotes the character with the corresponding codepoint, and all is well.
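
To make the rule concrete, here is a minimal sketch of how such an escape maps to a codepoint. The helper `expand_octal_escapes` and its regex are hypothetical names made up for illustration, not anything in Python itself. The sketch assumes the rule from the language reference that one to three octal digits are consumed after the backslash, and it ignores every other kind of escape (including an escaped backslash).

```python
import re

# Hypothetical helper, not part of Python: mimics how an octal escape is
# read, greedily taking one to three octal digits after the backslash.
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{1,3})")


def expand_octal_escapes(raw: str) -> str:
    """Replace each backslash-octal escape in `raw` with its character."""
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), raw)


# Three octal digits: the codepoint is the octal value of the digits.
assert "\101" == chr(0o101) == "A"
assert expand_octal_escapes(r"\101") == "\101"

# Fewer than three digits are accepted as well (assumption stated above).
assert expand_octal_escapes(r"\12") == "\12" == "\n"
assert expand_octal_escapes(r"\7") == "\7" == "\x07"
```

The input is a raw string so that the backslash reaches the helper untouched; each assertion compares the helper's result with what the interpreter produces for the same literal.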

The "valid scenario":

    In [42]: "\777"
    Out[42]: 'ǿ'

The problem is when you have just two valid octal digits:

    In [40]: "\778"
    Out[40]: '?8'

Which is ambiguous at least -- why is this not "\x07" "78", for example? (Octal 77 actually corresponds to the "?" (63) character.)

Or, when the first digit is not valid as octal:

    In [41]: "\877"
    Out[41]: '\\877'

And then when the second digit is not valid octal:

    In [43]: "\797"
    Out[43]: '\x0797'

WAT?

So, between the possibly ambiguous scenario with two octal digits followed by a non-octal digit, and the completely unexpected output in the last case, which reads like an expansion to a 4-hexadecimal-digit codepoint, what do you say to deprecating any r"\[0-9]{1,3}" sequence that doesn't match a full 3 octal digits, and raising a syntax error for it from Python 3.9 (or 3.10) on?

Best regards,

   js
  -><-