[Python-Dev] Re: What to do about invalid escape sequences

Glenn Linderman Wed, 14 Aug 2019 13:58:42 -0700

On 8/14/2019 8:02 AM, Random832 wrote:

On Mon, Aug 12, 2019, at 15:15, Terry Reedy wrote:

Please no more combinations. The presence of both legal and illegal
combinations is already a mild nightmare for processing and testing.
idlelib.colorizer has the following re to detect legal combinations


      stringprefix = r"(?i:r|u|f|fr|rf|b|br|rb)?"
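
A small sketch of how that pattern separates legal from illegal prefixes. The helper name is_legal_prefix and the sample lists are made up for illustration and are not part of idlelib.

```python
import re

# Pattern quoted above from idlelib.colorizer: an optional, case-insensitive
# group listing the legal string prefixes.
stringprefix = r"(?i:r|u|f|fr|rf|b|br|rb)?"

def is_legal_prefix(prefix):
    """Hypothetical helper: True if the pattern matches `prefix` in full."""
    return re.fullmatch(stringprefix, prefix) is not None

legal = ["", "r", "Rb", "bR", "fr", "RF", "u"]
illegal = ["ur", "bf", "fb", "rr"]

print(all(is_legal_prefix(p) for p in legal))    # True
print(any(is_legal_prefix(p) for p in illegal))  # False
```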

More advanced syntax highlighting editors have to handle each string type separately 
anyway, because they highlight (valid) backslash-escapes and f-string formatters. 
The proposed 'v-string' type would need separate handling even in a simplistic 
editor like IDLE, because it's different at the basic level of \" not ending 
the string (whereas, for better or worse, all current string types have exactly the 
same rules for how to find the end delimiter)

I had to read this several times, and then only after reading Eric's
reply, it finally hit me that what you are saying is that \" doesn't end
the string in any other form of string, but that sequence would end a
v-string.
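
To make that concrete, here is a minimal sketch of how current literals treat \" (the proposed v-string does not exist, so only today's behavior is shown; the variable names are illustrative):

```python
# In a normal literal, \" is an escape for one double-quote character.
normal = "a\"b"

# In a raw literal the backslash is kept, yet \" still does not end the
# string: the tokenizer finds the closing delimiter the same way.
raw = r"a\"b"

print(len(normal), normal)  # 3 a"b
print(len(raw), raw)        # 4 a\"b
```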

It seems that also explains why Serhiy, in describing his experiment
with really raw string literals, mentioned having to change the
tokenizer as well as the parser (proving that it isn't impossible to
deal with truly raw strings).

\" not ending a raw string was certainly a gotcha for me when I started
using Python (with a background in C and Perl among other languages),
and it convinced me not to use raw strings: that gotcha was not worth
the other benefits of raw strings. Serhiy said:

Currently a raw literal cannot end in a single backslash (e.g. in
r"C:\User\"). Although there are reasons for this. It is an old
gotcha, and there are many closed issues about it. This question is
even included in FAQ.

which indicates that I am not the only one that has been tripped up by
that over the years.
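
The gotcha can be reproduced in a few lines. This is a sketch: the literal from the quote is held as source text and handed to compile(), so that the example file itself still parses, and the two workarounds are only the usual ones, not an exhaustive list.

```python
# The characters of the literal from the quote above: r"C:\User\"
source = 'r"C:\\User\\"'

try:
    compile(source, "<example>", "eval")
    compiled = True
except SyntaxError:
    # The final \" is read as part of the string, so it never terminates.
    compiled = False
print(compiled)  # False

# Two common workarounds:
path_a = r"C:\User" + "\\"   # append the trailing backslash separately
path_b = "C:\\User\\"        # drop the r prefix and double every backslash
print(path_a == path_b)      # True
```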

Trying to look at it from the eyes of a beginning programmer, the whole
idea of backslash being an escape character is an unnatural artifice.
I'm unaware (but willing to be educated) of any natural language, when
using quotations, that has such a concept. Nested quotations exist, in
various forms: use of a different quotation mark for the inner and
outer quotations, and block quotations (which, in English, have
increased margins on both sides, and have a blank line before and
after).

Python actually supports constructs very similar to the natural language
formats, allowing both " and ' for quotations and nested quotations,
and the triple-quoted string with either " or ' is very similar in
concept to a block quotation. But _all_ the string forms are burdened
with surprises for the beginning programmer: escape sequences of one
sort or another must be learned and understood to avoid surprises when
using the \ character.
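
A short sketch of both halves of that point: the natural-language-like forms, and the kind of surprise waiting behind the \ character. The sample strings are made up for illustration.

```python
# Alternating the two quote characters nests a quotation with no escapes.
outer_double = "She said 'hello' and left."
outer_single = 'She said "hello" and left.'

# A triple-quoted string works like a block quotation: it can span lines
# and hold both quote characters unescaped.
block = """He wrote:
    "It's fine."
"""

# The surprise: backslash is still an escape character in all of these.
surprise = "C:\new"     # \n is read as a newline, not backslash + n
print(len(surprise))    # 5, not 6
```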

Programming languages certainly need an escape character mechanism to
deal with characters that cannot easily be typed on a keyboard (such as
¤ ¶ etc.), or which are visually indistinguishable from other characters
or character sequences (various widths of white space), or which would
be disruptive to the flow of code or syntax if represented by the usual
character (newline, carriage return, formfeed, maybe others). But these
are programming concepts, not natural language concepts. The basic
concept of a quoted string should best be borrowed directly from natural
language, and then enhancements to that made to deal with programming
concepts.

In Python, as in C, the escape characters are built into the basic
string syntax; one must learn the quirks of the escaping mechanism in
order to write

In Perl, " strings include escapes, and ' strings do not. So there is a
basic string syntax that is similar to natural language, and one that is
extended to include programming concepts. [N.B. There are lots of
reasons I switched from Perl to Python, and don't have any desire to go
back, but I have to admit, that the lack of a truly raw string in Python
was a disappointment.]

So that, together with the desire for new escape sequences, and the
creation of a new escape mechanism in the f-string {} (which adds both {
and } as escape characters by requiring them to be doubled to be treated
as literal inside an f-string, instead of using \{ and \} as the escapes
[which would have been possible, due to the addition of the f prefix]),
and the issue that every current \-escape is already defined to do
something, is why I suggested elsewhere in this thread
<https://mail.python.org/archives/list/[email protected]/message/XJNS45JG7EUO7EPJG4254HA2T2ASWQ3F/>
that perhaps the whole irregular string syntax should be rebooted with a
future import, and it seems it could be simpler, more regular, and
more powerful as a result. And by using a future import, there are no
backward incompatibility issues, and migration can be module by module.
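
For reference, a minimal sketch of the f-string brace doubling mentioned above, next to the backslash escapes that still apply in the same literal. The variable names are illustrative.

```python
name = "world"

# Inside an f-string, doubled braces stand for literal braces, while
# single braces mark a replacement field.
greeting = f"{{literal}} and {name}"
print(greeting)     # {literal} and world

# Backslash escapes remain active in the literal part of an f-string, so
# two escape mechanisms coexist in one string form.
tabbed = f"{name}\t!"
print(len(tabbed))  # 7
```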

The more I think about this, the more tempting it is to attempt to fork
Python just to have a better string syntax! But alas! So many other time
commitments, and a lack of in-depth internals knowledge make that an
impossibility. I daresay, though, that if I get a free week, I might
well write a preprocessor that converts my suggested future syntax to
C-Python, so that I can use it in my own projects!

_______________________________________________
Python-Dev mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/[email protected]/message/Z4CIK6ZFPAIEL2H3BNV7PV5NI5A46B4N/
