You should bring this up on https://discuss.python.org/c/ideas/6, which is
where ideas are discussed these days.

This mailing list should be retired. I’ll mention that elsewhere. 

--
Eric

> On Feb 16, 2023, at 9:57 AM, Arusekk <arek_...@o2.pl> wrote:
> 
> Hi all!
> 
> I was writing a tutorial on the distinction between bytes and strings
> and why it is important, when I saw the root cause.  People coming from
> C, Perl, Python 2 and similar languages often mistake "\x90" for
> b"\x90".  My idea is that Python could deprecate string literals
> containing any non-ASCII code points specified in any way other than
> as literal Unicode characters or Unicode escapes (\u, \U, \N).
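
To make the confusion described above concrete, here is a minimal sketch
of how "\x90" and b"\x90" differ in Python 3. The variable names are
only illustrative.

```python
# "\x90" is a str holding one code point (U+0090), not one byte.
text = "\x90"
data = b"\x90"

# The same code point can be spelled with a Unicode escape.
same_code_point = (text == "\u0090")

# Encoding shows that the str is not the byte 0x90 in general:
utf8 = text.encode("utf-8")       # two bytes in UTF-8
latin1 = text.encode("latin-1")   # one byte only under Latin-1

print(same_code_point)  # True
print(utf8)             # b'\xc2\x90'
print(latin1 == data)   # True
```

Only the Latin-1 encoding happens to map the code point to the byte
with the same number, which is probably why the two spellings are easy
to conflate.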
> 
> (Actually, I found that I already had this idea back in 2021 on
> Stack Overflow[1].  The question is an excellent example of what I mean.)
> 
> I would not go so far as to follow JSON (disallowing \x11 and \222
> escapes completely), but while writing "\x00" or "\0" is useful and
> widely used, "\x99" (and especially "\777"!) is probably marginal and
> definitely less explicit than "\u0099" (per the Zen of Python,
> "explicit is better than implicit").
> Byte strings do not treat b"\u00ff" as b"\xff".
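
The escape behaviour mentioned above can be checked with a short
sketch. It assumes current CPython behaviour, where an unrecognised
escape in a bytes literal is kept verbatim (with a warning on recent
versions) and an octal escape above \377 in a str is still accepted
(also with a warning on recent versions). The helper eval_literal is
hypothetical and exists only to silence those warnings.

```python
import warnings

def eval_literal(source: str):
    """Evaluate literal source text while silencing escape warnings."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return eval(source)

str_hex = eval_literal(r'"\x99"')      # str, code point U+0099
str_uni = eval_literal(r'"\u0099"')    # the same str, spelled explicitly
str_oct = eval_literal(r'"\777"')      # str, code point 0o777 == U+01FF
bytes_u = eval_literal(r'b"\u00ff"')   # \u is not an escape in bytes

print(str_hex == str_uni)        # True
print(hex(ord(str_oct)))         # 0x1ff
print(len(bytes_u))              # 6
print(bytes_u == b"\xff")        # False
```

In a bytes literal the six characters backslash, u, 0, 0, f, f are kept
as they are, so no byte 0xff is produced.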
> 
> As the first step in implementing it, Python could raise a SyntaxWarning
> (or should it be DeprecationWarning? BytesWarning?), suggesting that
> "\x99" become either "\u0099" or b"\x99", eventually promoting it to an
> equally helpful SyntaxError.  All of it could be hidden behind a feature
> like from __future__ import backslashes (one nice name I can think of).
> 
> The new regular expression for octals would be \\[01]?[0-7]{1,2} and
> \\x[0-7][0-9A-Fa-f] for hexadecimals, hopefully not confusing anyone,
> and not much more complex than the old ones.
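
As a rough illustration of the two patterns quoted above, the following
sketch classifies individual escapes. It assumes each escape has
already been split out of the literal (a real tokenizer reads up to
three octal digits greedily), and the function name stays_legal is
hypothetical.

```python
import re

# Patterns copied from the proposal: escapes that would remain legal in str.
ALLOWED_OCTAL = re.compile(r"\\[01]?[0-7]{1,2}")
ALLOWED_HEX = re.compile(r"\\x[0-7][0-9A-Fa-f]")

def stays_legal(escape: str) -> bool:
    """Return True if one already-tokenized escape matches either pattern."""
    return bool(
        ALLOWED_OCTAL.fullmatch(escape) or ALLOWED_HEX.fullmatch(escape)
    )

# ASCII-range escapes pass; anything from 0x80 (octal 200) upwards does not.
for esc in [r"\0", r"\4", r"\177", r"\x00", r"\x7f",
            r"\200", r"\777", r"\x80", r"\x99"]:
    print(esc, stays_legal(esc))
```

fullmatch is used on purpose: with search, "\200" would be accepted as
"\20" followed by a literal "0", which is not how the literal is read.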
> 
> In the meantime, probably between introducing a warning and changing it
> to become an error (the most reasonable timeline I can think of now),
> the default ascii() representation should eventually use the \u0099 form
> for all such codepoints, to keep the invariant of eval(ascii(x)) == x
> without syntax warnings.  repr() is also affected, but it is fortunately
> limited to the [\x80-\xa0\xad] range.  I mean [\u0080-\u00a0\u00ad] :-)
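
The ascii() change described above could look roughly like the sketch
below. It is not how CPython would implement it; ascii_with_u is a
hypothetical helper that post-processes the existing ascii() output,
and it relies on ascii() emitting lowercase hex digits.

```python
import ast
import re

# Match an escaped backslash first, so its second half is never mistaken
# for the start of a \xNN escape; then match \x80 through \xff.
_ESCAPE = re.compile(r"\\\\|\\x([89a-f][0-9a-f])")

def ascii_with_u(value: str) -> str:
    """Like ascii(), but spell code points U+0080..U+00FF as \\u00NN."""
    def fix(match: re.Match) -> str:
        hex_digits = match.group(1)
        if hex_digits is None:          # an escaped backslash: keep as is
            return match.group(0)
        return "\\u00" + hex_digits
    return _ESCAPE.sub(fix, ascii(value))

samples = ["\x99", "\xad caf\xe9", "\\x99", "it's \x80 \"quoted\""]
for sample in samples:
    rendered = ascii_with_u(sample)
    # The round-trip invariant from the proposal still holds.
    assert ast.literal_eval(rendered) == sample
    print(rendered)
```

The third sample is a literal backslash followed by "x99", which must
come out unchanged.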
> 
> Another timeline would be to change the repr first, initially hidden
> under an interpreter flag or environment variable, then officially
> deprecate it in the documentation, then introduce the error guarded by
> from __future__ import backslashes or another flag, then make the repr
> use \u by default, then add the warning and finally make it always raise
> an error.
> As a precedent, breaking repr() was not a dealbreaker when hash
> randomization was introduced (even repr({"a", "b"}) is now
> unpredictable).
> 
> This would of course be a breaking change for a lot of unit tests, and
> stuff like pickle should probably support the old syntax, delaying any
> such change until a new protocol comes (if it applies to the newest one
> at all; I am quite sure it does not).  Such a breaking change must be
> used wisely.
> Other changes to octal escapes could be sneaked in, based on conclusions
> from the 2018 'Python octal escape character encoding "wats"' thread[2]
> (I like writing "\0" and "\4" though, just to make my opinion clear).
> If going the whole hog, the 2015 'Make non-meaningful backslashes
> illegal in string literals' thread[3] could be revived as well, maybe
> even with "\f\v" deprecated, "\e" = "\33" introduced and such.
> 
> Please let me know what you think, what else could break, whether it is
> useful anywhere else apart from my use case, and what similar problems
> you have.
> 
> Cheers,
> Arusekk
> 
> [1]: https://stackoverflow.com/q/64832281/3869724
> [2]: https://mail.python.org/archives/list/python-ideas@python.org/thread/ARBCIPEQB32XBS7T3JMKUDIZ7BZGFTL6/
> [3]: https://mail.python.org/archives/list/python-ideas@python.org/message/PJXKDJQT4XW6ZSMIIK7KAZ4OCDAO6DUT/
> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at 
> https://mail.python.org/archives/list/python-ideas@python.org/message/ITBFU4GPJJVXTHT57WNLASXKL4R4MPF5/
> Code of Conduct: http://python.org/psf/codeofconduct/
