On 8/14/2019 8:02 AM, Random832 wrote:
> On Mon, Aug 12, 2019, at 15:15, Terry Reedy wrote:
>> Please no more combinations. The presence of both legal and illegal
>> combinations is already a mild nightmare for processing and testing.
>> idlelib.colorizer has the following re to detect legal combinations
>> stringprefix = r"(?i:r|u|f|fr|rf|b|br|rb)?"
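As a quick illustration (not part of IDLE itself), the prefix pattern Terry quotes can be exercised with the re module. The helper name is_legal_prefix and the list of candidates are my own, chosen only for this sketch:

```python
import re

# Prefix pattern as quoted above from idlelib.colorizer.
stringprefix = r"(?i:r|u|f|fr|rf|b|br|rb)?"


def is_legal_prefix(candidate):
    """Hypothetical helper: True if the whole candidate is an accepted prefix."""
    return re.fullmatch(stringprefix, candidate) is not None


# A mix of legal and illegal combinations (the selection is illustrative).
for candidate in ["r", "Rb", "fR", "u", "ur", "bf", "bu"]:
    print(candidate, is_legal_prefix(candidate))
```

The first four candidates are accepted and the last three are rejected, which is the legal/illegal split that makes testing tedious.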
> More advanced syntax highlighting editors have to handle each string type separately
> anyway, because they highlight (valid) backslash-escapes and f-string formatters.
> The proposed 'v-string' type would need separate handling even in a simplistic
> editor like IDLE, because it's different at the basic level of \" not ending
> the string (whereas, for better or worse, all current string types have exactly the
> same rules for how to find the end delimiter)
I had to read this several times, and only after reading Eric's
reply did it finally hit me that what you are saying is that \" doesn't end
the string in any other form of string, but that sequence would end a
v-string.
It seems that also explains why Serhiy, in describing his experiment
with really raw string literals, mentioned having to change the tokenizer as
well as the parser (proving that it isn't impossible to deal with truly
raw strings).
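A small sketch of the current behaviour, for anyone following along: in today's raw strings the backslash before a quote keeps the quote from ending the literal, and the backslash itself stays in the value. The variable names are just for illustration:

```python
# In a raw string, \" does not end the literal, and the backslash is kept.
s = r"a\"b"
print(len(s))  # 4 characters: a, backslash, quote, b
print(s)

# The shortest such case: a raw string holding a backslash and a quote.
t = r"\""
print(len(t))  # 2 characters: backslash, quote
```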
\" not ending a raw string was certainly a gotcha for me when I started
using Python (with a background in C and Perl among other languages),
and it convinced me not to use raw strings; that gotcha was not worth
the other benefits of raw strings. Serhiy said:
> Currently a raw literal cannot end in a single backslash (e.g. in
> r"C:\User\"). Although there are reasons for this, it is an old
> gotcha, and there are many closed issues about it. This question is
> even included in the FAQ.
which indicates that I am not the only one that has been tripped up by
that over the years.
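To make the gotcha concrete, here is a minimal sketch that feeds source text to compile() and reports whether it parses. The helper name compiles is hypothetical, and the workaround shown (an adjacent ordinary literal holding the final backslash) is just one common option, not the only one:

```python
def compiles(source):
    """Hypothetical helper: True if `source` parses as a Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


# The source text  r"C:\User\"  : the final \" does not close the literal.
bad = r'r"C:\User\"'

# The source text  r"C:\User" "\\"  : adjacent literals are concatenated.
good = r'r"C:\User" "\\"'

print(compiles(bad))   # False
print(compiles(good))  # True
print(eval(good))      # C:\User\
```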
Trying to look at it from the eyes of a beginning programmer, the whole
idea of backslash being an escape character is an unnatural artifice.
I'm unaware (but willing to be educated) of any natural language, when
using quotations, that has such a concept. Nested quotations exist, in
various forms: use of a different quotation mark for the inner and
outer quotations, and block quotations (which in English, have increased
margin on both sides, and have a blank line before and after).
Python actually supports constructs very similar to the natural language
formats, allowing both " and ' for quotations and nested quotations,
and the triple-quoted string with either " or ' is very similar in
concept to a block quotation. But _all_ the string forms are burdened
with surprises for the beginning programmer: escape sequences of one
sort or another must be learned and understood to avoid surprises when
using the \ character.
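A typical beginner surprise, sketched with a made-up Windows-style path: the ordinary literal silently turns \n and \t into a newline and a tab, while the raw literal keeps the backslashes.

```python
# Ordinary literal: \n becomes a newline and \t becomes a tab.
path = "C:\new\table"
print(len(path))  # 10

# Raw literal: both backslashes survive as ordinary characters.
raw_path = r"C:\new\table"
print(len(raw_path))  # 12

print(path == raw_path)  # False
```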
Programming languages certainly need an escape character mechanism to
deal with characters that cannot easily be typed on a keyboard (such as
¤ ¶ etc.), or which are visually indistinguishable from other characters
or character sequences (various widths of white space), or which would
be disruptive to the flow of code or syntax if represented by the usual
character (newline, carriage return, formfeed, maybe others). But these
are programming concepts, not natural language concepts. The basic
concept of a quoted string should best be borrowed directly from natural
language, and then enhancements to that made to deal with programming
concepts.
In Python, as in C, the escape characters are built into the basic string
syntax, so one must learn the quirks of the escaping mechanism in order to
write strings.
In Perl, " strings include escapes, and ' strings do not. So there is a
basic string syntax that is similar to natural language, and one that is
extended to include programming concepts. [N.B. There are lots of
reasons I switched from Perl to Python, and don't have any desire to go
back, but I have to admit, that the lack of a truly raw string in Python
was a disappointment.]
So that, together with the desire for new escape sequences, and the
creation of a new escape mechanism in the f-string {} (which adds both {
and } as escape characters by requiring them to be doubled to be treated
as literal inside an f-string, instead of using \{ and \} as the escapes
[which would have been possible, due to the addition of the f prefix]),
and the fact that every current \-escape is already defined to do
something, is why I suggested elsewhere in this thread
<https://mail.python.org/archives/list/python-dev@python.org/message/XJNS45JG7EUO7EPJG4254HA2T2ASWQ3F/>
that perhaps the whole irregular string syntax should be rebooted with a
future import, and it seems it could be simpler, more regular, and
more powerful as a result. And by using a future import, there are no
backward incompatibility issues, and migration can be module by module.
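For reference, this is how the brace doubling mentioned above behaves in today's f-strings. It is only a sketch of current behaviour, not of the proposed syntax, and the variable name is arbitrary:

```python
name = "spam"

# Doubled braces are literal braces; nothing is interpolated here.
print(f"{{name}}")    # {name}

# Literal braces wrapped around an interpolated value.
print(f"{{{name}}}")  # {spam}
```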
The more I think about this, the more tempting it is to attempt to fork
Python just to have a better string syntax! But alas! So many other time
commitments, and a lack of in-depth internals knowledge make that an
impossibility. I daresay, though, that if I get a free week, I might
well write a preprocessor that converts my suggested future syntax to
C-Python, so that I can use it in my own projects!
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/Z4CIK6ZFPAIEL2H3BNV7PV5NI5A46B4N/