[Python-Dev] Re: An f-string issue [Was: Re: Re: What to do about invalid escape sequences]
On 8/14/2019 8:09 AM, Random832 wrote:
> On Sat, Aug 10, 2019, at 19:54, Glenn Linderman wrote:
> > Because of the "invalid escape sequence" and "raw string" discussion, when looking at the documentation, I also noticed the following description for f-strings:
> >
> > > Escape sequences are decoded like in ordinary string literals (except when a literal is also marked as a raw string). After decoding, the grammar for the contents of the string is:
> >
> > followed by lots of stuff, followed by
> >
> > > Backslashes are not allowed in format expressions and will raise an error:
> > > f"newline: {ord('\n')}" # raises SyntaxError
> >
> > What I don't understand is how, if f-strings are processed AS DESCRIBED, how the \n is ever seen by the format expression.
> >
> > The description is that they are first decoded like ordinary strings, and then parsed for the internal grammar containing {} expressions to be expanded. If that were true, the \n in the above example would already be a newline character, and the parsing of the format expression would not see the backslash. And if it were true, that would actually be far more useful for this situation.
> >
> > So given that it is not true, why not? And why go to the extra work of prohibiting \ in the format expressions?
>
> AIUI there were strong objections to the "AS DESCRIBED" process (which would require almost all valid uses of backslashes inside to be doubled, and would incidentally leave your example *still* a syntax error), and disallowing backslashes is a way to pretend that it doesn't work that way and leave open the possibility of changing how it works in the future without breaking compatibility.
>
> The only dubious benefit to the described process with backslashes allowed would be that f-strings (or other strings, in the innermost level) could be infinitely nested as f'{f\'{f\\\'{...}\\\'}\'}', rather than being hard-limited to four levels as f'''{f"""{f'{"..."}'}"""}'''

Sure.
I am just pointing out (and did so in the issue I created for documentation as well) that the documentation does not currently describe the implementation correctly, which is misleading to the user. While I have opinions on how things could work better, my even stronger opinion is that documentation should *accurately* describe how things work, even if how it works is more complex than it should be.

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/B2HI27XRCA43GVV2D2MF5IZOUX5NG2PW/
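For anyone wanting to check the gap between the documented order and the observed behaviour, here is a small sketch. It was added for illustration and is not code from the thread. It compiles the documentation's example from source text and reports whether the interpreter rejects it. The helper name `compiles` is hypothetical. One fact from outside this thread is assumed: Python 3.12 (PEP 701) lifted the backslash restriction, so the result depends on the interpreter version.

```python
import sys


def compiles(source):
    """Return True if `source` compiles as an expression, False on SyntaxError."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# The documentation's example, held as source text so that this script
# itself stays valid on every Python version.
doc_example = r'''f"newline: {ord('\n')}"'''

accepted = compiles(doc_example)
print(sys.version_info[:2], "accepts the example:", accepted)

# A spelling with no backslash inside the braces: the escape is decoded
# in an ordinary string literal first, then used in the expression.
nl = "\n"
workaround = f"newline: {ord(nl)}"
print(workaround)  # newline: 10
```

On interpreters older than 3.12 the first check should report False, which matches the SyntaxError discussed above; the workaround behaves the same on all versions that have f-strings.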
[Python-Dev] Re: An f-string issue [Was: Re: Re: What to do about invalid escape sequences]
On Sat, Aug 10, 2019, at 19:54, Glenn Linderman wrote:
> Because of the "invalid escape sequence" and "raw string" discussion, when looking at the documentation, I also noticed the following description for f-strings:
>
> > Escape sequences are decoded like in ordinary string literals (except when a literal is also marked as a raw string). After decoding, the grammar for the contents of the string is:
>
> followed by lots of stuff, followed by
>
> > Backslashes are not allowed in format expressions and will raise an error:
> > f"newline: {ord('\n')}" # raises SyntaxError
>
> What I don't understand is how, if f-strings are processed AS DESCRIBED, how the \n is ever seen by the format expression.
>
> The description is that they are first decoded like ordinary strings, and then parsed for the internal grammar containing {} expressions to be expanded. If that were true, the \n in the above example would already be a newline character, and the parsing of the format expression would not see the backslash. And if it were true, that would actually be far more useful for this situation.
>
> So given that it is not true, why not? And why go to the extra work of prohibiting \ in the format expressions?

AIUI there were strong objections to the "AS DESCRIBED" process (which would require almost all valid uses of backslashes inside to be doubled, and would incidentally leave your example *still* a syntax error), and disallowing backslashes is a way to pretend that it doesn't work that way and leave open the possibility of changing how it works in the future without breaking compatibility.
The only dubious benefit to the described process with backslashes allowed would be that f-strings (or other strings, in the innermost level) could be infinitely nested as f'{f\'{f\\\'{...}\\\'}\'}', rather than being hard-limited to four levels as f'''{f"""{f'{"..."}'}"""}'''

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/7NR4XYRTUAWFKGZEDTLODEWXV7YML6TZ/
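The four-level example above can be run as written. As far as I understand it, the limit arises because each nesting level has to use a string delimiter that no enclosing level is using, and Python has four of them (', ", ''' and """). A minimal check, added for illustration and assuming only that f-strings are available (Python 3.6 or later):

```python
# Four levels of nesting, each level using a different string delimiter.
nested = f'''{f"""{f'{"..."}'}"""}'''

# Every level just formats the level inside it, so the innermost
# literal comes back unchanged.
print(nested)
```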
[Python-Dev] Re: An f-string issue [Was: Re: Re: What to do about invalid escape sequences]
On 8/10/2019 5:32 PM, Greg Ewing wrote:
> Glenn Linderman wrote:
> > If that were true, the \n in the above example would already be a newline character, and the parsing of the format expression would not see the backslash. And if it were true, that would actually be far more useful for this situation.
>
> But then it would fail for a different reason -- the same reason that this is a syntax error:
>
>    'hello
>    world'

Would it really? Or would it, because it has already been lexed and parsed as string content by then, simply be treated as a new line that is part of the string, just as "hello\nworld" is treated after it is lexed and parsed? Of course, if it is passed back through the parser again, you would be correct. I don't know the internals that apply here.

Anyway, Eric supplied the real reasons for the limitation, but it does seem that if the expression were passed back through the "real" parser, the real parser would have no problem handling the ord('\n') part of

f"newline: {ord('\n')}"

if it weren't prohibited by prechecking for \ and making it illegal. But there is also presently a custom parser involved, so whether the \ check is in there or in a preprocessing step before the parser, I don't know.

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/THSM262IIVNYRF2DDSXNPYPSLSK5W3GY/
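Whether a decoded newline would be accepted depends on whether the text is parsed again after decoding, which is the question raised in this message. Here is a rough sketch of both cases. It uses plain `eval` and `compile` on expression source and does not go through the actual f-string machinery. The names `expr_source` and `decoded_source` are invented for this example.

```python
# Source text of the expression as written inside the braces:
# a backslash followed by the letter n.
expr_source = r"ord('\n')"
print(eval(expr_source))  # 10

# The same text after escape decoding: a real newline now sits inside
# the single-quoted literal, so parsing this text again fails.
decoded_source = "ord('\n')"
try:
    compile(decoded_source, "<decoded>", "eval")
    reparse_ok = True
except SyntaxError as exc:
    reparse_ok = False
    print("SyntaxError:", exc.msg)
```

So the undecoded expression text is fine for the ordinary parser, while text that has been decoded once and is then parsed again hits the same error as the two-line 'hello world' literal quoted above.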
[Python-Dev] Re: An f-string issue [Was: Re: Re: What to do about invalid escape sequences]
Glenn Linderman wrote:
> If that were true, the \n in the above example would already be a newline character, and the parsing of the format expression would not see the backslash. And if it were true, that would actually be far more useful for this situation.

But then it would fail for a different reason -- the same reason that this is a syntax error:

   'hello
   world'

> Why go to the extra work of prohibiting \ in the format expressions?

Maybe to avoid problems like the above? Or maybe because it would be confusing -- there are two levels of string literal processing going on, one on the outer f-string and one on the embedded string literal in the expression. What level is the backslash expansion done in? Is it done in both? To get a backslash in the embedded string, do I need two backslashes or four? Banning backslashes altogether sidesteps all these issues.

> not mentioning the actual escape processing that is done for raw strings, regarding \" and \'.

Technically that's not part of "escape processing", since it takes place during lexical analysis -- it has to, because it affects how the input stream is divided into tokens.

However, the backslash prohibition seems to apply even to this use in f-strings:

   >>> f"quote: {ord('\"')}"
     File "<stdin>", line 1
   SyntaxError: f-string expression part cannot include a backslash

So it seems that f-strings are even more special than r-strings when it comes to the treatment of backslashes.

--
Greg

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/XSMGC4VAPHQPXRNTGZP4TQG3ZNU7TZKK/
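The raw-string behaviour referred to above can be seen directly. In a raw string, a backslash before a quote stops that quote from ending the literal, yet the backslash stays in the value. A small sketch added for illustration; the variable names are made up here.

```python
# The backslash keeps the quote from terminating the literal, but it is
# not removed: the value is two characters, a backslash and a quote.
raw = r"\""
print(len(raw), raw == '\\"')  # 2 True

# For the same reason a raw string cannot end in a single backslash:
# the tokenizer treats \' as part of the string and never finds the end.
try:
    compile("r'\\'", "<raw>", "eval")  # the source text is: r'\'
    ends_in_backslash_ok = True
except SyntaxError:
    ends_in_backslash_ok = False
print(ends_in_backslash_ok)  # False
```

This is consistent with the point that the \" and \' handling belongs to tokenization rather than to escape decoding: the backslash influences where the token ends without being consumed.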
[Python-Dev] Re: An f-string issue [Was: Re: Re: What to do about invalid escape sequences]
On 8/10/2019 7:46 PM, Glenn Linderman wrote:
> Because of the "invalid escape sequence" and "raw string" discussion, when looking at the documentation, I also noticed the following description for f-strings:
>
> > Escape sequences are decoded like in ordinary string literals (except when a literal is also marked as a raw string). After decoding, the grammar for the contents of the string is:
>
> followed by lots of stuff, followed by
>
> > Backslashes are not allowed in format expressions and will raise an error:
> > f"newline: {ord('\n')}" # raises SyntaxError
>
> What I don't understand is how, if f-strings are processed AS DESCRIBED, how the \n is ever seen by the format expression.

If I recall correctly, the mentioned decoding is happening on the string literal parts of the f-strings (above, the "newline: " part), not the expression parts (inside the {}). But it's been a while and I don't recall all of the details.

> The description is that they are first decoded like ordinary strings, and then parsed for the internal grammar containing {} expressions to be expanded. If that were true, the \n in the above example would already be a newline character, and the parsing of the format expression would not see the backslash. And if it were true, that would actually be far more useful for this situation.
>
> So given that it is not true, why not? And why go to the extra work of prohibiting \ in the format expressions?

It's a future-proofing thing. See the discussion at https://mail.python.org/archives/list/python-dev@python.org/thread/EVXD72IYUN2APF2443OMADKA5WJTOKHD/ It has pointers to other parts of the discussion.

At some point, I'm planning on switching the parsing of f-strings from the custom parser (see Python/ast.c, FstringParser_ConcatFstring()) to having the python parser itself parse the f-strings. This will be similar to PEP 536, which doesn't have much detail, but does describe some of the motivations.

> The PEP 498, of course, has an apparently more accurate description, that the {} parsing actually happens before the escape processing. Perhaps this avoids making multiple passes over the string to do the work, as the literal pieces and format expression pieces have to be separate in the generated code, but that is just my speculation: I'd like to know the real reason.
>
> Should the documentation be fixed to make the description more accurate? If so, I'd be glad to open an issue.

Sure. I'm always in favor of accuracy. The f-string documentation was a last-minute rush job that could have used a lot more editing, and more eyes are always welcome. But it will take a fair amount of research to understand it well enough to document it in more detail.

> The PEP further contains the inaccurate statement:
>
> > Like all raw strings in Python, no escape processing is done for raw f-strings:
>
> not mentioning the actual escape processing that is done for raw strings, regarding \" and \'.

It should probably just say it uses the same rules as raw strings.

Eric

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/FKNEBB5HTMRX4RWLPTZN5K2WRZ5W7MI6/
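Eric's recollection that escape decoding applies to the literal parts and not to the expression parts can be observed through the `ast` module. This is a sketch added for illustration. It assumes Python 3.8 or later, where string constants appear as `ast.Constant` nodes, and the variable names are made up for this example. It looks only at the resulting syntax tree and says nothing about how the custom parser in Python/ast.c gets there.

```python
import ast

# Source text with an escape in the literal part and a plain name in
# the expression part. The raw string keeps the backslash in the source.
source = r'f"tab:\t{x}"'
joined = ast.parse(source, mode="eval").body

# An f-string parses to a JoinedStr whose values alternate between
# literal pieces and formatted expressions.
literal_part, expr_part = joined.values

# The literal piece arrives already decoded: it holds a real tab.
print(repr(literal_part.value))

# The expression piece is a separate node wrapping a parsed expression.
print(type(expr_part).__name__, ast.dump(expr_part.value))
```

The literal and expression pieces end up as separate nodes, which fits the speculation quoted above that they have to be kept apart in the generated code.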