You should bring this up on https://discuss.python.org/c/ideas/6 , which is where ideas are discussed these days.
This mailing list should be retired. I’ll mention that elsewhere.

-- Eric

> On Feb 16, 2023, at 9:57 AM, Arusekk <arek_...@o2.pl> wrote:
>
> Hi all!
>
> I was writing a tutorial on the distinction between bytes and strings
> and why it is important, when I saw the root cause. People coming from
> C, Perl, Python 2 and similar languages often mistake "\x90" for
> b"\x90". My idea is that Python could deprecate string literals
> containing any non-ASCII codepoints specified in any way other than as
> literal Unicode characters or Unicode escapes (\u, \U, \N).
>
> (Actually I found that I started having the idea already back in 2021 on
> StackOverflow[1]. The question is an excellent example of what I mean.)
>
> I would not go so far as to follow JSON (disallowing \x11 and \222
> escapes completely), but while writing "\x00" or "\0" is useful and
> widely used, "\x99" (and especially "\777"!) is probably marginal and
> definitely less explicit than "\u0099" (in the spirit of the Zen's
> "explicit is better than implicit").
> Byte strings do not treat b"\u00ff" as b"\xff".
>
> In the first part of implementing it, Python could raise a SyntaxWarning
> (or should it be DeprecationWarning? BytesWarning?), suggesting that
> "\x99" become either "\u0099" or b"\x99", eventually promoting it to
> some equally helpful SyntaxError. All of it could be hidden behind a
> feature like from __future__ import backslashes (one nice name I can
> think of).
>
> The new regular expression for octals would be \\[01]?[0-7]{1,2} and
> \\x[0-7][0-9A-Fa-f] for hexadecimals, hopefully not confusing anyone,
> and not much more complex than the old ones.
>
> In the meantime, probably between introducing a warning and changing it
> to become an error (the most reasonable timeline I can think of now),
> the default ascii() representation should eventually use the \u0099 form
> for all such codepoints, to keep the invariant of eval(ascii(x)) == x
> without syntax warnings.
> repr() is also affected, but it is fortunately
> limited to the [\x80-\xa0\xad] range. I mean [\u0080-\u00a0\u00ad] :-)
>
> Another timeline would be to change the repr first, initially hidden
> under an interpreter flag or environment variable, then officially
> deprecate it in the documentation, then introduce the error guarded by
> from __future__ import backslashes or another flag, then make the repr
> use \u by default, then add the warning and finally make it always raise
> an error.
> As a precedent, breaking repr() was not a dealbreaker when introducing
> randomized hash seeds (even repr({"a", "b"}) is now unpredictable).
>
> This would of course be a breaking change for a lot of unit tests, and
> stuff like pickle should probably support the old syntax, delaying any
> such change until a new protocol comes (if it applies to the newest one
> at all; I am quite sure it does not). Such a breaking change must be
> used wisely.
> Other changes to octal escapes could be sneaked in, based on conclusions
> from the 2018 'Python octal escape character encoding "wats"' thread[2]
> (I like writing "\0" and "\4" though, just to make my opinion clear).
> If going the whole hog, the 2015 'Make non-meaningful backslashes
> illegal in string literals' thread[3] could be revived as well, maybe
> even with "\f\v" deprecated, "\e" = "\33" introduced and such.
>
> Please let me know what you think, what else could break, whether it is
> useful anywhere else apart from my use case, and what similar problems
> you have.
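
For readers following the thread, the behaviour described above can be checked directly. This is only a minimal sketch of current Python 3 semantics, not part of the proposal; the variable names are made up for illustration, and a raw bytes literal stands in for b"\u00ff" so that the example does not trigger an escape warning on recent versions.

```python
# A str escape names a code point; a bytes escape names a byte value.
s = "\x90"
b = b"\x90"

same_as_unicode_escape = (s == "\u0090")   # both spell U+0090
utf8 = s.encode("utf-8")                   # two bytes: 0xC2 0x90
latin1 = s.encode("latin-1")               # one byte: 0x90

# bytes literals have no \u escape, so the backslash is kept as-is.
# The raw literal below has the same content as b"\u00ff".
literal_u = rb"\u00ff"                     # six bytes, not b"\xff"

# ascii() and repr() currently emit the \x form for these code points.
ascii_form = ascii("\u00ff")
repr_form = repr("\u0099")
round_trips = eval(ascii("\u0099")) == "\u0099"

print(same_as_unicode_escape, utf8, latin1, len(literal_u))
print(ascii_form, repr_form, round_trips)
```

Only under Latin-1 does the str "\x90" happen to encode to the single byte b"\x90", which is probably why the two are so easy to confuse.
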
>
> Cheers,
> Arusekk
>
> [1]: https://stackoverflow.com/q/64832281/3869724
> [2]:
> https://mail.python.org/archives/list/python-ideas@python.org/thread/ARBCIPEQB32XBS7T3JMKUDIZ7BZGFTL6/
> [3]:
> https://mail.python.org/archives/list/python-ideas@python.org/message/PJXKDJQT4XW6ZSMIIK7KAZ4OCDAO6DUT/
> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/ITBFU4GPJJVXTHT57WNLASXKL4R4MPF5/
> Code of Conduct: http://python.org/psf/codeofconduct/
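
To get a feel for how much existing code the quoted proposal would touch, something like the following could be run over a source file. It is a rough sketch under stated assumptions, not the proposed implementation: the function name find_non_ascii_escapes and the regular expression are hypothetical, it only looks at plain STRING tokens (so f-strings are skipped on Python versions that tokenize them separately), and it treats any \x or octal escape with a value of 0x80 or more as "would warn".

```python
import io
import re
import tokenize

# Match one backslash escape at a time, so that an escaped backslash
# followed by "x90" is not mistaken for the escape \x90.
_ESCAPE = re.compile(r"\\(?:x([0-9A-Fa-f]{2})|([0-7]{1,3})|.)", re.DOTALL)


def find_non_ascii_escapes(source):
    """Yield (line, escape) for hex/octal escapes >= 0x80 in str literals."""
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.STRING:
            continue
        # Skip bytes literals and raw strings, judged by the literal prefix.
        prefix = re.match(r"[A-Za-z]*", tok.string).group().lower()
        if "b" in prefix or "r" in prefix:
            continue
        for m in _ESCAPE.finditer(tok.string):
            hex_digits, oct_digits = m.group(1), m.group(2)
            if hex_digits is not None:
                value = int(hex_digits, 16)
            elif oct_digits is not None:
                value = int(oct_digits, 8)
            else:
                continue  # some other escape, e.g. \n, \u or \\
            if value >= 0x80:
                yield tok.start[0], m.group(0)


sample = r'''
a = "\x90"
b = b"\x90"
c = "\u0090"
d = "\0\x7f"
e = "\777"
f = "\\x90"
'''
hits = list(find_non_ascii_escapes(sample))
print(hits)
```

On this sample only the assignments to a and e are reported; the bytes literal, the \u escape, the ASCII-range escapes and the escaped backslash are left alone.
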
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/U47V6UYDCA43N5QD4OBANZQDCO3YG2X6/
Code of Conduct: http://python.org/psf/codeofconduct/