[Python-Dev] Re: An f-string issue [Was: Re: Re: What to do about invalid escape sequences]

2019-08-14 Thread Glenn Linderman

On 8/14/2019 8:09 AM, Random832 wrote:
> On Sat, Aug 10, 2019, at 19:54, Glenn Linderman wrote:
> > Because of the "invalid escape sequence" and "raw string" discussion,
> > when looking at the documentation, I also noticed the following
> > description for f-strings:
> >
> > > Escape sequences are decoded like in ordinary string literals (except when
> > > a literal is also marked as a raw string). After decoding, the grammar for
> > > the contents of the string is:
> >
> > followed by lots of stuff, followed by
> >
> > > Backslashes are not allowed in format expressions and will raise an
> > > error:
> > >
> > > f"newline: {ord('\n')}"  # raises SyntaxError
> >
> > What I don't understand is how, if f-strings are processed AS
> > DESCRIBED, how the \n is ever seen by the format expression.
> >
> > The description is that they are first decoded like ordinary strings,
> > and then parsed for the internal grammar containing {} expressions to
> > be expanded. If that were true, the \n in the above example would
> > already be a newline character, and the parsing of the format
> > expression would not see the backslash. And if it were true, that would
> > actually be far more useful for this situation.
> >
> > So given that it is not true, why not? And why go to the extra work of
> > prohibiting \ in the format expressions?
>
> AIUI there were strong objections to the "AS DESCRIBED" process (which would
> require almost all valid uses of backslashes inside to be doubled, and would
> incidentally leave your example *still* a syntax error), and disallowing
> backslashes is a way to pretend that it doesn't work that way and leave open
> the possibility of changing how it works in the future without breaking
> compatibility.
>
> The only dubious benefit to the described process with backslashes allowed
> would be that f-strings (or other strings, in the innermost level) could be
> infinitely nested as f'{f\'{f\\\'{...}\\\'}\'}', rather than being hard-limited
> to four levels as f'''{f"""{f'{"..."}'}"""}'''


Sure. I am just pointing out (and did so in the issue I created for 
documentation as well), that the documentation does not currently 
correctly describe the implementation, which is misleading to the user.


While I have opinions on how things could work better, my even stronger 
opinion is that documentation should *accurately* describe how things 
work, even if how it works is more complex than it should be.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/B2HI27XRCA43GVV2D2MF5IZOUX5NG2PW/


[Python-Dev] Re: An f-string issue [Was: Re: Re: What to do about invalid escape sequences]

2019-08-14 Thread Random832
On Sat, Aug 10, 2019, at 19:54, Glenn Linderman wrote:
> Because of the "invalid escape sequence" and "raw string" discussion, 
> when looking at the documentation, I also noticed the following 
> description for f-strings:
> 
> > Escape sequences are decoded like in ordinary string literals (except when 
> > a literal is also marked as a raw string). After decoding, the grammar for 
> > the contents of the string is:
> 
> followed by lots of stuff, followed by
> 
> > Backslashes are not allowed in format expressions and will raise an 
> > error:
> > 
> > f"newline: {ord('\n')}"  # raises SyntaxError
> 
> What I don't understand is how, if f-strings are processed AS 
> DESCRIBED, how the \n is ever seen by the format expression.
> 
>  The description is that they are first decoded like ordinary strings, 
> and then parsed for the internal grammar containing {} expressions to 
> be expanded. If that were true, the \n in the above example would 
> already be a newline character, and the parsing of the format 
> expression would not see the backslash. And if it were true, that would 
> actually be far more useful for this situation.
> 
>  So given that it is not true, why not? And why go to the extra work of 
> prohibiting \ in the format expressions?

AIUI there were strong objections to the "AS DESCRIBED" process (which would 
require almost all valid uses of backslashes inside to be doubled, and would 
incidentally leave your example *still* a syntax error), and disallowing 
backslashes is a way to pretend that it doesn't work that way and leave open 
the possibility of changing how it works in the future without breaking 
compatibility.
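
The two consequences mentioned above (doubling, and the example still failing) can be illustrated with a small simulation. This is only a sketch of a hypothetical "decode escapes first, then parse the expression" model; the helper name decode_first is made up for illustration and nothing here reflects how CPython actually processes f-strings.

```python
def decode_first(expr_source):
    """Hypothetical model: decode backslash escapes, then parse the expression."""
    decoded = expr_source.encode("ascii").decode("unicode_escape")
    return eval(compile(decoded, "<expr>", "eval"))

single = r"ord('\n')"    # the expression text as written in the example
doubled = r"ord('\\n')"  # the same expression with the backslash doubled

try:
    decode_first(single)
    single_ok = True
except SyntaxError:
    # After decoding, a real line break sits inside the quotes.
    single_ok = False

doubled_value = decode_first(doubled)

print(single_ok)
print(doubled_value)
```

Under this model the single-backslash form fails because the decoded text has a line break inside a single-quoted literal, while the doubled form decodes to ord('\n') and evaluates normally.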

The only dubious benefit to the described process with backslashes allowed 
would be that f-strings (or other strings, in the innermost level) could be 
infinitely nested as f'{f\'{f\\\'{...}\\\'}\'}', rather than being hard-limited 
to four levels as f'''{f"""{f'{"..."}'}"""}'''
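
The four-level form can be tried directly, since each level uses a quote style that no enclosing level uses. A minimal sketch; the name nested is only for illustration.

```python
# Quote styles from the outside in: ''' then """ then ' then "
nested = f'''{f"""{f'{"..."}'}"""}'''

print(repr(nested))
```

Each inner f-string is just an expression of the level above it, so the whole thing evaluates to the innermost literal.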
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/7NR4XYRTUAWFKGZEDTLODEWXV7YML6TZ/


[Python-Dev] Re: An f-string issue [Was: Re: Re: What to do about invalid escape sequences]

2019-08-10 Thread Glenn Linderman

On 8/10/2019 5:32 PM, Greg Ewing wrote:

> Glenn Linderman wrote:
> > If that were true, the \n in the above example would already be a
> > newline character, and the parsing of the format expression would not
> > see the backslash. And if it were true, that would actually be far
> > more useful for this situation.
>
> But then it would fail for a different reason -- the same reason that
> this is a syntax error:
>
>    'hello
>    world'


Would it really? Or would it, because it has already been lexed and 
parsed as string content by then, simply be treated as a newline that 
is part of the string, just like "hello\nworld" is treated after it is 
lexed and parsed?


Of course, if it is passed back through the parser again, you would be 
correct. I don't know the internals that apply here.


Anyway, Eric supplied the real reasons for the limitation, but it does 
seem that if the expression were passed back through the "real" parser, 
that parser would have no problem handling the ord('\n') part of


f"newline: {ord('\n')}"

if it weren't prohibited by prechecking for \ and making it illegal. But 
there is also presently a custom parser involved, so whether the \ check 
is in there or in a preprocessing step before the parser, I don't know.
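
Whatever the internals, the restriction discussed here only concerns backslashes written inside the braces, so the usual way around it is to keep the backslash outside the expression part. A minimal sketch; the variable names are illustrative only.

```python
# Bind the value that needs a backslash to a name outside the f-string,
# so the expression part itself contains no backslash.
newline = "\n"
via_name = f"newline: {ord(newline)}"

# chr(10) produces the same character without writing an escape at all.
via_chr = f"newline: {ord(chr(10))}"

print(via_name)
print(via_chr)
```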


___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/THSM262IIVNYRF2DDSXNPYPSLSK5W3GY/


[Python-Dev] Re: An f-string issue [Was: Re: Re: What to do about invalid escape sequences]

2019-08-10 Thread Greg Ewing

Glenn Linderman wrote:
> If that were true, the \n in the above example would already 
> be a newline character, and the parsing of the format expression would 
> not see the backslash. And if it were true, that would actually be far
> more useful for this situation.


But then it would fail for a different reason -- the same reason that
this is a syntax error:

   'hello
   world'
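
This can be checked with compile(), which raises SyntaxError for source text it cannot parse. A small sketch; the helper name compiles is made up for illustration.

```python
def compiles(source):
    """Return True if `source` compiles as an expression, False on SyntaxError."""
    try:
        compile(source, "<demo>", "eval")
    except SyntaxError:
        return False
    return True

with_escape = r"'hello\nworld'"        # backslash + n between the quotes
with_real_newline = "'hello\nworld'"   # an actual line break between the quotes
triple_quoted = "'''hello\nworld'''"   # line break inside a triple-quoted literal

results = (compiles(with_escape), compiles(with_real_newline), compiles(triple_quoted))
print(results)
```

Only the middle case is rejected: a single-quoted literal may contain the escape sequence but not a raw line break.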

> Why go to the extra work of 
> prohibiting \ in the format expressions?


Maybe to avoid problems like the above?

Or maybe because it would be confusing -- there are two levels of
string literal processing going on, one on the outer f-string and
one on the embedded string literal in the expression. What level
is the backslash expansion done in? Is it done in both? To get
a backslash in the embedded string, do I need two backslashes or
four? Banning backslashes altogether sidesteps all these issues.

> not mentioning the actual escape processing that is done for raw 
> strings, regarding \" and \'.


Technically that's not part of "escape processing", since it takes
place during lexical analysis -- it has to, because it affects how
the input stream is divided into tokens.
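
The effect on tokenization can be observed from Python itself. A minimal sketch, again using a made-up compiles helper:

```python
def compiles(source):
    """Return True if `source` compiles as an expression, False on SyntaxError."""
    try:
        compile(source, "<demo>", "eval")
    except SyntaxError:
        return False
    return True

# In a raw string the backslash stays in the value, yet it still keeps the
# quote that follows it from ending the literal.
raw = r'\''
chars = list(raw)  # a backslash followed by a quote

# The source text r'\' therefore never reaches a closing quote.
lone_backslash_ok = compiles("r'\\'")

print(chars)
print(lone_backslash_ok)
```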

However, the backslash prohibition seems to apply even to this
use in f-strings:

>>> f"quote: {ord('\"')}"
  File "<stdin>", line 1
SyntaxError: f-string expression part cannot include a backslash

So it seems that f-strings are even more special than r-strings
when it comes to the treatment of backslashes.

--
Greg
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/XSMGC4VAPHQPXRNTGZP4TQG3ZNU7TZKK/


[Python-Dev] Re: An f-string issue [Was: Re: Re: What to do about invalid escape sequences]

2019-08-10 Thread Eric V. Smith

On 8/10/2019 7:46 PM, Glenn Linderman wrote:
> Because of the "invalid escape sequence" and "raw string" discussion, 
> when looking at the documentation, I also noticed the following 
> description for f-strings:
>
> > Escape sequences are decoded like in ordinary string literals (except 
> > when a literal is also marked as a raw string). After decoding, the 
> > grammar for the contents of the string is:
>
> followed by lots of stuff, followed by
>
> > Backslashes are not allowed in format expressions and will raise an 
> > error:
> >
> > f"newline: {ord('\n')}"   # raises SyntaxError
>
> What I don't understand is how, if f-strings are processed AS 
> DESCRIBED, how the \n is ever seen by the format expression.

If I recall correctly, the mentioned decoding is happening on the string 
literal parts of the f-strings (above, the "newline: " part), not the 
expression parts (inside the {}). But it's been a while and I don't 
recall all of the details.
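
The observable behavior of the literal parts is easy to check, whatever the internal order of operations. A small sketch with illustrative variable names:

```python
# Escape in the literal part of a normal f-string: \t becomes a tab character.
text = f"a\tb {len('xy')}"

# Raw f-string: the backslash and the t are kept as two characters,
# while the expression inside the braces is still evaluated.
raw_text = rf"a\tb {len('xy')}"

print(repr(text))
print(repr(raw_text))
```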


> The description is that they are first decoded like ordinary strings, 
> and then parsed for the internal grammar containing {} expressions to 
> be expanded.  If that were true, the \n in the above example would 
> already be a newline character, and the parsing of the format 
> expression would not see the backslash. And if it were true, that 
> would actually be far more useful for this situation.
>
> So given that it is not true, why not? And why go to the extra work of 
> prohibiting \ in the format expressions?


It's a future-proofing thing. See the discussion at 
https://mail.python.org/archives/list/python-dev@python.org/thread/EVXD72IYUN2APF2443OMADKA5WJTOKHD/ 
It has pointers to other parts of the discussion.


At some point, I'm planning on switching the parsing of f-strings from 
the custom parser (see Python/ast.c, FstringParser_ConcatFstring()) to 
having the python parser itself parse the f-strings. This will be 
similar to PEP 536, which doesn't have much detail, but does describe 
some of the motivations.




> The PEP 498, of course, has an apparently more accurate description, 
> that the {} parsing actually happens before the escape processing. 
> Perhaps this avoids making multiple passes over the string to do the 
> work, as the literal pieces and format expression pieces have to be 
> separate in the generated code, but that is just my speculation: I'd 
> like to know the real reason.
>
> Should the documentation be fixed to make the description more 
> accurate? If so, I'd be glad to open an issue.


Sure. I'm always in favor of accuracy. The f-string documentation was a 
last-minute rush job that could have used a lot more editing, and more 
eyes are always welcome.


But it will take a fair amount of research to understand it well enough 
to document it in more detail.




> The PEP further contains the inaccurate statement:
>
> > Like all raw strings in Python, no escape processing is done for raw 
> > f-strings:
>
> not mentioning the actual escape processing that is done for raw 
> strings, regarding \" and \'.


It should probably just say it uses the same rules as raw strings.

Eric

___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/FKNEBB5HTMRX4RWLPTZN5K2WRZ5W7MI6/