On 8/11/2019 8:40 PM, Eric V. Smith wrote:
On 8/11/2019 4:18 PM, Glenn Linderman wrote:
On 8/11/2019 2:50 AM, Steven D'Aprano wrote:
On Sat, Aug 10, 2019 at 12:10:55PM -0700, Glenn Linderman wrote:
Or invent "really raw" in some spelling, such as rr"c:\directory\"
or e for exact, or x for exact, or <your favorite character
here>"c:\directory\"
And that brings me to the thought that if \e wants to become an
escape for escape, that maybe there should be an "extended escape"
prefix... if you want to use more escapes, define ee"string where \\
can only be used as an escape or escaped character, \e means the ASCII
escape character, and \ followed by a character with no escape
definition would be an error."
Please no.
We already have b-strings, r-strings, u-strings, f-strings, br-strings,
rb-strings, fr-strings, rf-strings, each of which comes in four
varieties (single quote, double quote, triple single quote and triple
double quote). Now you're talking about adding rr-strings, v-strings
(Greg suggested that) and ee-strings, presumably some or all of which
will need b*- and *b- or f*- and *f- varieties too.
Don't forget the upper & lower case varieties :)
And all orders!
>>> _all_string_prefixes()
{'', 'b', 'BR', 'bR', 'B', 'rb', 'F', 'RF', 'rB', 'FR', 'Rf', 'Fr',
'RB', 'f', 'r', 'rf', 'rF', 'R', 'u', 'fR', 'U', 'Br', 'Rb', 'fr', 'br'}
>>> len(_all_string_prefixes())
25
And if you add just 'bv' and 'fv', it's 41:
{'', 'fr', 'Bv', 'BR', 'F', 'rb', 'Fv', 'VB', 'vb', 'vF', 'br', 'FV',
'vf', 'FR', 'fV', 'bV', 'Br', 'Vb', 'Rb', 'RF', 'bR', 'r', 'R', 'Vf',
'fv', 'U', 'RB', 'B', 'rB', 'vB', 'Fr', 'rF', 'fR', 'Rf', 'BV', 'VF',
'bv', 'b', 'u', 'f', 'rf'}
There would be no need for 'uv' (not needed for backward
compatibility) or 'rv' (can't be both raw and verbatim).
I'm not in any way serious about this. I just want people to realize
how many wacky combinations there would be. And heaven forbid we ever
add some combination of 3 characters. If 'rfv' were actually also
valid, you get to 89:
{'', 'br', 'vb', 'fR', 'F', 'rFV', 'fRv', 'fV', 'rVF', 'Rfv', 'u',
'vRf', 'fVR', 'rfV', 'Fvr', 'vrf', 'fVr', 'vB', 'Vb', 'Rvf', 'Fv',
'Fr', 'FVr', 'B', 'rVf', 'FVR', 'vfr', 'VB', 'VrF', 'BR', 'VRf',
'vfR', 'FR', 'Br', 'RFV', 'Rf', 'fvR', 'f', 'rb', 'VfR', 'VFR', 'fr',
'vFR', 'VRF', 'frV', 'bR', 'b', 'FrV', 'r', 'R', 'RVF', 'FV', 'rvF',
'FRV', 'Vrf', 'rvf', 'FRv', 'Frv', 'vF', 'bV', 'VF', 'fv', 'RF', 'RB',
'rB', 'vRF', 'RFv', 'RVf', 'Rb', 'Vfr', 'vrF', 'rf', 'Bv', 'vf', 'rF',
'U', 'bv', 'FvR', 'RfV', 'Vf', 'VFr', 'vFr', 'fvr', 'BV', 'rFv',
'rfv', 'fRV', 'frv', 'RvF'}
If only we could deprecate upper case prefixes!
Eric
Yes. Happily, while there is a combinatorial explosion in spellings and
casings, there is no cognitive overload: each character has an
independent effect on the interpretation and use of the string, so once
you understand the 5 existing types (b r u f and plain) you understand
them all.
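
For anyone who wants to check the counts quoted above (25, 41 and 89), here is a minimal sketch that rebuilds the sets by taking every ordering and every casing of a list of base prefixes. The helper name expand_prefixes and the base lists are my own illustration, not the standard library's implementation.

```python
from itertools import permutations, product

def expand_prefixes(base):
    """Return every ordering and upper/lower casing of each base prefix.

    Hypothetical helper for illustration; the empty prefix is always included.
    """
    result = {''}
    for prefix in base:
        for ordering in permutations(prefix):
            # Each character independently appears in lower or upper case.
            for cased in product(*[(c, c.upper()) for c in ordering]):
                result.add(''.join(cased))
    return result

current = ['b', 'r', 'u', 'f', 'br', 'fr']
print(len(expand_prefixes(current)))                         # 25
print(len(expand_prefixes(current + ['bv', 'fv'])))          # 41
print(len(expand_prefixes(current + ['bv', 'fv', 'rfv'])))   # 89
```

A three-character prefix alone contributes 3! orderings times 2**3 casings, which is where the jump of 48 comes from.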
Should we add one or two more, it would be with the understanding
(hopefully reflected in the documentation as well) that v and e would
effectively be replacements for r and plain, rather than being combined
with them.
Were I to design a new language with similar string syntax, I think I
would use plain quotes for verbatim strings only, and have the following
prefixes, in only a single case:
(no prefix) - verbatim UTF-8 (at this point, I see no reason not to
require UTF-8 for the encoding of source files)
b - for verbatim bytes
e - allow (only explicitly documented) escapes
f - format strings
Actually, the above could be done as a preprocessor for python, or a
future import. In other words, what you see is what you get, until you
add a prefix to add additional processing. The only combinations that
seem useful are eb and ef. I don't know whether constraining the order
of the prefixes would be helpful; if it is, I have no
problem with a canonical ordering being prescribed.
As a future import, one could code modules to either the current
combinatorial explosion with all its gotchas, special cases, and passing
of undefined escapes; or one could code to the clean limited cases above.
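
To make the "clean limited cases" concrete, here is a rough sketch of what an e-string might do to its verbatim text: only escapes in an explicit table are accepted, and anything else is an error instead of being passed through. The function name strict_unescape and the contents of the table are assumptions for illustration, not a proposed final list.

```python
# Hypothetical table of the "explicitly documented" escapes.
ESCAPES = {
    '\\': '\\',
    'n': '\n',
    't': '\t',
    'e': '\x1b',   # ASCII escape, as suggested earlier in the thread
    '"': '"',
    "'": "'",
}

def strict_unescape(verbatim):
    """Process escapes in verbatim text, rejecting undefined ones."""
    out = []
    chars = iter(verbatim)
    for ch in chars:
        if ch != '\\':
            out.append(ch)
            continue
        nxt = next(chars, None)
        if nxt is None:
            raise ValueError("dangling backslash at end of string")
        if nxt not in ESCAPES:
            raise ValueError("undefined escape: \\" + nxt)
        out.append(ESCAPES[nxt])
    return ''.join(out)

print(repr(strict_unescape(r'a\tb')))      # 'a\tb'
print(repr(strict_unescape(r'\e[0m')))     # '\x1b[0m'
```

With this rule, the text c:\directory would be rejected because \d has no definition, while the unprefixed (verbatim) form would simply keep the backslash.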
Another thing that seems awkward about the current strings is that {{
and }} become "special escapes". If it were not for the permissive
usage of \{ and \} in the current plain string processing, \{ and \}
could have been used to escape the non-format-expression uses of { and
}, which would be far more consistent with other escapes. Perhaps the
future import could regularize that, also.
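
A small illustration of the current behavior being described, as I understand it: braces are escaped by doubling in format strings, while \{ in a plain string is passed through as two characters. The eval call is only there so the permissive escape can be shown with its warning silenced.

```python
import warnings

value = 42

# Doubling is the only way to get a literal brace in an f-string.
print(f"{{literal}} and {value}")            # {literal} and 42

# str.format follows the same doubling rule.
print("{{}} holds {}".format(value))          # {} holds 42

# In a plain string, \{ is not a defined escape, so the backslash is kept.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")           # undefined escapes may warn
    plain = eval(r"'\{'")
print(len(plain))                             # 2
```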
A future import would have no backward compatibility issues to disrupt a
simplified, more regular syntax.
Does anyone know of an existing feature that couldn't be expressed in a
straightforward manner with only the above capabilities?
The only other thing that I have heard about regarding strings is that
multi-line strings have their first line indented, and other lines not.
Some have recommended making the first line blank and just chopping off
the first \n; others have recommended indenting all lines and replacing
"\n" followed by the indentation spaces with "\n", so the text can
be aligned in the code the way it will be aligned in use. Both techniques
seem to have their place in aiding code readability. Both techniques
could be used together, in practice, using one more prefix character for
triple quotes only:
longstring = l"""
The traditional first blank line form
could be used as it has."""
If the first character of a long-string is a newline character, then it
will be removed. If the string wants to have an initial newline
character, a second one can be provided, which would not be removed.
longstring = l"""The traditional indented form
                 could be used as it has, also."""
This would be contracted by removing, after each newline within the
string, up to as many space characters as precede the first character of
the string on its first line (if the lexer can provide that column). If
fewer space characters are available after a newline, only the number
available would be removed. If there are more, they would be retained.
A new form would also be permitted:
longstring = l"""
    An indented form that isn't pushed as far right as the
    traditional indented form could also be used."""
If the first character of an l-string is a newline and the second
character is a space character, this form would count the number of
space characters in the second line, and remove up to that many space
characters from all lines, as well as removing the initial newline
character.
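
The three forms can be approximated today with an ordinary function. This is only a sketch of the rules as described above: the name layout is made up, and the column argument stands in for information that only the lexer would really have.

```python
def strip_up_to(line, width):
    """Remove at most `width` leading space characters from a line."""
    n = 0
    while n < width and n < len(line) and line[n] == ' ':
        n += 1
    return line[n:]

def layout(text, column=0):
    """Hypothetical l-string processing.

    column: source column of the first character of the string, used
    only when the string does not start with a newline.
    """
    if text.startswith('\n'):
        # Drop the initial newline, then use the indentation of the
        # next line (possibly zero) as the amount to remove everywhere.
        lines = text[1:].split('\n')
        width = len(lines[0]) - len(lines[0].lstrip(' '))
        return '\n'.join(strip_up_to(line, width) for line in lines)
    # Traditional indented form: the first line is kept as is.
    first, *rest = text.split('\n')
    return '\n'.join([first] + [strip_up_to(line, column) for line in rest])

print(layout("\n    An indented form\n      with a deeper line\n  and a shallow one"))
```

This differs from textwrap.dedent, which removes the whitespace common to all lines; here the first line after the newline sets the amount, and lines with less indentation lose only what they have.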
If l-strings were implemented (l for layout), they could be combined
with f and/or e.
Are there any other string feature workarounds in common use that could
be codified in a future import scenario?
Glenn
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/XJNS45JG7EUO7EPJG4254HA2T2ASWQ3F/