On 8/12/2019 12:08 AM, Serhiy Storchaka wrote:
Currently a raw literal cannot end in a single backslash (e.g. in
r"C:\User\"). Although there are reasons for this, it is an old
gotcha, and there are many closed issues about it. This question is
even included in the FAQ.
Hmm. I didn't find it in the documentation, and after searching several
ways for it in the FAQ, I wasn't able to find it there either.
The most common workarounds are:
r"C:\User" "\\"
and
r"C:\User\ "[:-1]
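As a quick check, both workarounds produce the same string, while the
literal with a trailing backslash is rejected at compile time. This is
only a minimal sketch; the helper name `compiles` is made up for
illustration.

```python
def compiles(source):
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True

# The two workarounds quoted above.
concatenated = r"C:\User" "\\"
sliced = r"C:\User\ "[:-1]

# Both evaluate to C:\User\ with a real trailing backslash.
assert concatenated == sliced == "C:\\User\\"

# The source text r"C:\User\" itself: the \" does not end the literal,
# so the string is unterminated and compilation fails.
bad_source = 'r"C:\\User\\"'
print(compiles(bad_source))             # False
print(compiles('r"C:\\User" "\\\\"'))   # True
```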
I tried to experiment. It was easy to make the parser allow a
trailing backslash character. It was more difficult to change the
Python implementation in the tokenize module. But this change breaks
existing code in more places than I expected. 14 Python files in the
stdlib (not counting tokenize.py) will need to be fixed. In all cases
the affected literal is a regular expression.
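One rough way to look for literals that such a change would affect is
to scan source code with the stdlib `tokenize` module. The sketch
below is not the method used for the experiment described here; the
function name `raw_literals_with_escaped_quote` is hypothetical, and it
only looks at raw literals delimited by a single quote character
(triple-quoted literals, and raw f-strings on newer Python versions,
are not covered).

```python
import io
import re
import tokenize

# Optional prefix letters followed by the opening delimiter.
_LITERAL = re.compile("^([A-Za-z]*)('''|\"\"\"|'|\")")

def raw_literals_with_escaped_quote(source):
    """Yield (line, text) for single-quoted raw literals whose body
    contains the delimiter character (which must then be escaped)."""
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type != tokenize.STRING:
            continue
        m = _LITERAL.match(tok.string)
        if m is None or "r" not in m.group(1).lower():
            continue  # not a raw literal
        quote = m.group(2)
        if len(quote) != 1:
            continue  # triple-quoted literals are skipped in this sketch
        body = tok.string[m.end():-1]
        # Inside a single-quoted raw literal, the delimiter character can
        # only appear in the body if a backslash precedes it.
        if quote in body:
            yield tok.start[0], tok.string

sample = r'''
a = r"([\"\\])"
b = r'(["\\])'
'''
hits = list(raw_literals_with_escaped_quote(sample))
print(hits)
```

Only the first assignment in `sample` is reported, since the second
one no longer escapes its own delimiter.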
A few examples:
1.
r"([\"\\])"
If only one type of quote is used in a string, we can just use a
different kind of quote for creating the string literal and remove
the escaping.
r'(["\\])'
2.
r'(\'[^\']*\'|"[^"]*"|...'
If different types of quotes are used in different parts of a string,
we can use implicit concatenation of string literals created with
different quotes (in any case the regular expression is long and should
be split across several lines on semantic boundaries).
r"('[^']*'|"
r'"[^"]*"|'
r'...'
3.
r"([^.'\"\\#]\b|^)"
You can also use triple quotes if the string contains both types of
quotes together.
r"""([^.'"\\#]\b|^)"""
4. In rare cases a multiline raw string literal can contain both
`'''` and `"""`. In this case you can use implicit concatenation of
string literals created with different triple quotes.
See https://github.com/python/cpython/pull/15217 .
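The rewrites in examples 1 to 3 change the text of the literal (the
backslash before the quote disappears) but should not change what the
regular expression matches, because `\"` and `"` mean the same thing
to the `re` module. A small sketch to check this; the helper
`same_matches` and the sample inputs are assumptions made for
illustration.

```python
import re

def same_matches(old, new, samples):
    """Return True if both patterns find the same matches in every sample."""
    old_re, new_re = re.compile(old), re.compile(new)
    return all(old_re.findall(s) == new_re.findall(s) for s in samples)

# Example 1
old1 = r"([\"\\])"
new1 = r'(["\\])'

# Example 2 (the pattern is elided in the message, so only the text of
# the literal is compared, not its behavior as a regular expression)
old2 = r'(\'[^\']*\'|"[^"]*"|...'
new2 = (r"('[^']*'|"
        r'"[^"]*"|'
        r'...')

# Example 3
old3 = r"([^.'\"\\#]\b|^)"
new3 = r"""([^.'"\\#]\b|^)"""

samples = ['say "hi"', "it's", "a\\b", "x.y # z", ""]

assert old1 != new1                        # the strings differ...
assert same_matches(old1, new1, samples)   # ...but match the same things
assert old2.replace("\\'", "'") == new2
assert same_matches(old3, new3, samples)
```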
I do not think we are ready for such a breaking change. It will break
more code than forbidding unrecognized escape sequences, and the
required fixes are less trivial.
Thanks for your investigation, Serhiy. Point 3 seems like the easiest
way to convert most regular expressions containing \" or \' from
r"..." form to v"""...""", without disturbing the internal gibberish in
the regular expression, and without needing significant analysis.
Regarding point 4, if it is a string literal used as a regexp, internal
triple quotes can be recoded as "{3} and '{3}. But whether or not
it is used as a regexp, I fail to find a syntax that permits the
creation of a multiline raw string containing both "'''" and '"""',
without using implicit concatenation. Since implicit concatenation must
already be in use for that case, converting from raw string to verbatim
string is straightforward.
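Both observations can be sketched with today's raw strings (the
verbatim-string form is a proposal, so it is not shown). The variable
names are made up for illustration.

```python
import re

# A pattern for either kind of triple quote, written with {3} so that the
# literal itself never contains three quotes in a row.
triple_quote = re.compile(r"""("{3}|'{3})""")

text = "a = '''x''' + \"\"\"y\"\"\""
found = triple_quote.findall(text)
assert found == ["'''", "'''", '"""', '"""']

# A multiline raw string containing both triple quotes, built by implicit
# concatenation of two literals that use different delimiters.
both = (
    r'''first line: """
'''
    r"""second line: '''
"""
)
assert both == 'first line: """\nsecond line: \'\'\'\n'
print(both)
```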