[Python-Dev] Re: What to do about invalid escape sequences

Glenn Linderman Sun, 11 Aug 2019 23:04:29 -0700

On 8/11/2019 8:40 PM, Eric V. Smith wrote:

On 8/11/2019 4:18 PM, Glenn Linderman wrote:
On 8/11/2019 2:50 AM, Steven D'Aprano wrote:
On Sat, Aug 10, 2019 at 12:10:55PM -0700, Glenn Linderman wrote:
Or invent "really raw" in some spelling, such as rr"c:\directory\"
or e for exact, or x for exact, or <your favorite character
here>"c:\directory\"

And that brings me to the thought that if   \e  wants to become an
escape for escape, that maybe there should be an "extended escape"
prefix... if you want to use more escapes, define ee"string where \\
can only be used as an escape or escaped character, \e means the ASCII
escape character, and \ followed by a character with no escape
definition would be an error."
Please no.

We already have b-strings, r-strings, u-strings, f-strings, br-strings,
rb-strings, fr-strings, rf-strings, each of which comes in four
varieties (single quote, double quote, triple single quote and triple
double quote). Now you're talking about adding rr-strings, v-strings
(Greg suggested that) and ee-strings, presumably some or all of which
will need b*- and *b- or f*- and *f- varieties too.
Don't forget the upper & lower case varieties :)
And all orders!

>>> _all_string_prefixes()
{'', 'b', 'BR', 'bR', 'B', 'rb', 'F', 'RF', 'rB', 'FR', 'Rf', 'Fr', 'RB', 'f', 'r', 'rf', 'rF', 'R', 'u', 'fR', 'U', 'Br', 'Rb', 'fr', 'br'}
>>> len(_all_string_prefixes())
25

And if you add just 'bv' and 'fv', it's 41:
{'', 'fr', 'Bv', 'BR', 'F', 'rb', 'Fv', 'VB', 'vb', 'vF', 'br', 'FV', 'vf', 'FR', 'fV', 'bV', 'Br', 'Vb', 'Rb', 'RF', 'bR', 'r', 'R', 'Vf', 'fv', 'U', 'RB', 'B', 'rB', 'vB', 'Fr', 'rF', 'fR', 'Rf', 'BV', 'VF', 'bv', 'b', 'u', 'f', 'rf'}
There would be no need for 'uv' (not needed for backward compatibility) or 'rv' (can't be both raw and verbatim).
I'm not in any way serious about this. I just want people to realize how many wacky combinations there would be. And heaven forbid we ever add some combination of 3 characters. If 'rfv' were actually also valid, you get to 89:
{'', 'br', 'vb', 'fR', 'F', 'rFV', 'fRv', 'fV', 'rVF', 'Rfv', 'u', 'vRf', 'fVR', 'rfV', 'Fvr', 'vrf', 'fVr', 'vB', 'Vb', 'Rvf', 'Fv', 'Fr', 'FVr', 'B', 'rVf', 'FVR', 'vfr', 'VB', 'VrF', 'BR', 'VRf', 'vfR', 'FR', 'Br', 'RFV', 'Rf', 'fvR', 'f', 'rb', 'VfR', 'VFR', 'fr', 'vFR', 'VRF', 'frV', 'bR', 'b', 'FrV', 'r', 'R', 'RVF', 'FV', 'rvF', 'FRV', 'Vrf', 'rvf', 'FRv', 'Frv', 'vF', 'bV', 'VF', 'fv', 'RF', 'RB', 'rB', 'vRF', 'RFv', 'RVf', 'Rb', 'Vfr', 'vrF', 'rf', 'Bv', 'vf', 'rF', 'U', 'bv', 'FvR', 'RfV', 'Vf', 'VFr', 'vFr', 'fvr', 'BV', 'rFv', 'rfv', 'fRV', 'frv', 'RvF'}
If only we could deprecate upper case prefixes!

Eric

Yes. Happily, while there is a combinatorial explosion in spellings and casings, there is no cognitive overload: each character has an independent effect on the interpretation and use of the string, so once you understand the 5 existing types (b, r, u, f, and plain) you understand them all.
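The counts quoted above (25, 41, 89) follow directly from that independence: each base prefix contributes every ordering of its letters times every upper/lower casing of each letter. A small sketch that reproduces them, modelled on the private helper tokenize._all_string_prefixes() in CPython; the function name here and the extra 'bv', 'fv', 'rfv' prefixes are illustrative only:

```python
import itertools


def all_string_prefixes(valid_prefixes):
    """Expand each base prefix into every ordering and every casing.

    Same idea as CPython's private tokenize._all_string_prefixes(); this
    standalone version takes the base list as a parameter so that
    hypothetical prefixes such as 'bv' can be tried out.
    """
    result = {''}
    for prefix in valid_prefixes:
        for ordering in itertools.permutations(prefix):
            # Each character may independently be lower or upper case.
            for chars in itertools.product(*[(c, c.upper()) for c in ordering]):
                result.add(''.join(chars))
    return result


CURRENT = ['b', 'r', 'u', 'f', 'br', 'fr']
print(len(all_string_prefixes(CURRENT)))                        # 25
print(len(all_string_prefixes(CURRENT + ['bv', 'fv'])))         # 41
print(len(all_string_prefixes(CURRENT + ['bv', 'fv', 'rfv'])))  # 89
```

A one-letter prefix yields 2 spellings, a two-letter one 2! x 2^2 = 8, and a three-letter one 3! x 2^3 = 48, which is why a single three-character prefix adds more spellings than everything before it.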

Should we add one or two more, it would be with the realization (hopefully also stated in the documentation) that v and e would effectively be replacements for r and plain, rather than being combined with them.

Were I to design a new language with similar string syntax, I think I would use plain quotes for verbatim strings only, and have the following prefixes, in only a single case:

(no prefix) - verbatim UTF-8 (at this point, I see no reason not to require UTF-8 for the encoding of source files)

b - for verbatim bytes
e - allow (only explicitly documented) escapes
f - format strings

Actually, the above could be done as a preprocessor for Python, or a future import. In other words, what you see is what you get, until you add a prefix to add additional processing. The only combinations that seem useful are eb and ef. I don't know whether constraining the order of the prefixes would be helpful; if it is, I have no problem with a canonical ordering being prescribed.
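To make the proposed e prefix concrete, here is a minimal sketch of how the body of an e"..." literal might be decoded. Everything in it is an assumption for illustration: the function name decode_e_string, the exact escape table (a subset of today's escapes plus the proposed \e), and raising ValueError for anything undefined.

```python
import re

# Hypothetical table of "explicitly documented" escapes for an e-prefixed
# string. Octal, \N{...}, \u and \U are left out to keep the sketch short;
# \e (ASCII ESC) is the addition proposed in this thread.
_SIMPLE_ESCAPES = {
    '\\': '\\', "'": "'", '"': '"',
    'n': '\n', 't': '\t', 'r': '\r',
    'a': '\a', 'b': '\b', 'f': '\f', 'v': '\v',
    'e': '\x1b',
}


def decode_e_string(body: str) -> str:
    """Expand escapes in the body of a hypothetical e"..." literal.

    Unlike today's plain strings, an undefined escape such as \\d is an
    error instead of being passed through unchanged.
    """
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != '\\':
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("trailing backslash")
        nxt = body[i + 1]
        if nxt in _SIMPLE_ESCAPES:
            out.append(_SIMPLE_ESCAPES[nxt])
            i += 2
        elif nxt == 'x' and re.fullmatch(r'[0-9A-Fa-f]{2}', body[i + 2:i + 4]):
            # \xHH: exactly two hex digits name one character.
            out.append(chr(int(body[i + 2:i + 4], 16)))
            i += 4
        else:
            raise ValueError(f"undefined escape sequence \\{nxt} at index {i}")
    return ''.join(out)
```

With this sketch, decode_e_string(r'c:\directory') raises ValueError, whereas today's plain-string processing keeps the backslash in "c:\directory" (with only a warning).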

As a future import, one could code modules either to the current combinatorial explosion, with all its gotchas, special cases, and passing of undefined escapes, or to the clean limited cases above.

Another thing that seems awkward about the current strings is that {{ and }} become "special escapes". If it were not for the permissive usage of \{ and \} in the current plain string processing, \{ and \} could have been used to escape the non-format-expression uses of { and }, which would be far more consistent with other escapes. Perhaps the future import could regularize that, also.
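The current asymmetry is easy to see: an f-string escapes braces by doubling them, while a plain string passes the undefined escape \{ through with its backslash intact. A short sketch; evaluating the literal through ast.literal_eval keeps the example source itself free of invalid escapes:

```python
import ast
import warnings

x = 42
# In an f-string, doubled braces are the escape for literal braces.
braces = f"{{x}} = {x}"

# In a plain string, \{ is not a defined escape, so Python keeps the
# backslash. Newer versions warn about it (DeprecationWarning since 3.6,
# SyntaxWarning since 3.12); the warning is silenced here on purpose.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    permissive = ast.literal_eval(r'"\{"')

print(braces)           # {x} = 42
print(permissive)       # \{
print(len(permissive))  # 2
```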

A future import would have no backward compatibility issues to disrupt a simplified, more regular syntax.

Does anyone know of an existing feature that couldn't be expressed in a straightforward manner with only the above capabilities?

The only other thing that I have heard about regarding strings is that multi-line strings have their first line indented, and other lines not. Some have recommended making the first line blank, and just chopping off the first \n; others have recommended indenting all lines, and replacing "\n" followed by the number of indented spaces by "\n", so the text can be aligned in the code like it will be aligned for use. Both techniques seem to have their place in aiding code readability. Both techniques could be used together, in practice, using one more prefix character for triple quotes only:


    longstring = l"""
The traditional first blank line form
could be used as it has."""

If the first character of a long-string is a newline character, then it will be removed. If the string wants to have an initial newline character, a second one can be provided, which would not be removed.
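For comparison, the two workarounds mentioned earlier are what people write by hand today. A minimal sketch; the helper names are hypothetical. Note that the first rule already has a standard spelling without any helper: a backslash right after the opening quotes is a line continuation, so no newline is stored.

```python
def chop_first_newline(s: str) -> str:
    """Workaround 1: start the text on the line after the opening quotes,
    then drop that single leading newline."""
    return s[1:] if s.startswith("\n") else s


def replace_indent(s: str, indent: int) -> str:
    """Workaround 2: indent continuation lines in the source, then turn
    every newline followed by `indent` spaces back into a bare newline."""
    return s.replace("\n" + " " * indent, "\n")


first_blank = chop_first_newline("""
The traditional first blank line form
could be used as it has.""")

# Backslash continuation after the opening quotes: same result, no helper.
same_as_first_blank = """\
The traditional first blank line form
could be used as it has."""

indented = replace_indent("""The text starts here
    and continues, indented four spaces in the source.""", 4)
```

The second workaround depends on the caller passing the right indent by hand, which is the bookkeeping the proposed l prefix would move into the lexer.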


     longstring = l"""The traditional indented form
                      could be used as it has, also."""

This would be contracted as follows: after each newline within the string, remove up to as many space characters as it takes to reach the column of the first character of the string's first line (if the lexer can provide that). If fewer space characters are available after a newline, only the number available would be removed. If there are more, they would be retained.


A new form would also be permitted:

    longstring = l"""
        An indented form that isn't pushed as far right as the
        traditional indented form could also be used."""

If the first character of an l-string is a newline and the second character is a space character, this form would count the number of space characters in the second line, and remove up to that many space characters from all lines, as well as removing the initial newline character.

If l-strings were implemented (l for layout), they could be combined with f and/or e.

Are there any other string feature workarounds in common use that could be codified in a future import scenario?


Glenn

_______________________________________________
Python-Dev mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/[email protected]/message/XJNS45JG7EUO7EPJG4254HA2T2ASWQ3F/
