[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread Rob Cliffe via Python-Dev



On 10/08/2019 23:30:18, Greg Ewing wrote:

Rob Cliffe via Python-Dev wrote:


Also, the former is simply more *informative* - it tells the reader 
that baz is expected to be a directory, not a file.


On Windows you can usually tell that from the fact that filenames
almost always have an extension, and directory names almost never
do.

Usually, but not always.  I have not infrequently used files with a 
blank extension.
I can't recall using a directory name with an extension (but I can't 
swear that I never have).

Rob Cliffe
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/2LDAY5FU64X5HH3GUVGAQNHRSWEB/


[Python-Dev] Re: An f-string issue [Was: Re: Re: What to do about invalid escape sequences]

2019-08-10 Thread Glenn Linderman

On 8/10/2019 5:32 PM, Greg Ewing wrote:

Glenn Linderman wrote:
If that were true, the \n in the above example would already be a 
newline character, and the parsing of the format expression would not 
see the backslash. And if it were true, that would actually be far
more useful for this situation.


But then it would fail for a different reason -- the same reason that
this is a syntax error:

   'hello
   world'


Would it really? Or, because it has already been lexed and parsed as 
string content by then, would it simply be treated as a newline that is 
part of the string, just as "hello\nworld" is treated after it is lexed 
and parsed?


Of course, if it is passed back through the parser again, you would be 
correct. I don't know the internals that apply here.


Anyway, Eric supplied the real reasons for the limitation, but it does 
seem that if the expression were passed back through the "real" parser, 
that parser would have no problem handling the ord('\n') part of


f"newline: {ord('\n')}"

if it weren't prohibited by prechecking for \ and making it illegal. But 
there is also presently a custom parser involved, so whether the \ check 
is in there or in a preprocessing step before the parser, I don't know.
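
For readers following along, here is a minimal sketch of the restriction under discussion and the usual workaround. The variable names are only illustrative. The problematic f-string is compiled from a source string so that the script itself stays valid on every version; as far as I know, the restriction was lifted in Python 3.12 by PEP 701, so the result of the compile step depends on the interpreter.

```python
import sys

# Workaround that works on every version with f-strings (3.6+):
# keep the backslash out of the replacement field by naming the value first.
newline = "\n"
message = f"newline: {ord(newline)}"
print(message)

# The source text below is:  f"newline: {ord('\n')}"
# It is compiled from a string so this file parses on older versions too.
source = 'f"newline: {ord(\'\\n\')}"'
try:
    compile(source, "<example>", "eval")
    accepted = True
except SyntaxError:
    accepted = False

print(sys.version_info[:2], "accepts a backslash in the expression:", accepted)
```

Naming the value outside the f-string sidesteps the question of which parser sees the backslash.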


Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/THSM262IIVNYRF2DDSXNPYPSLSK5W3GY/


[Python-Dev] Re: An f-string issue [Was: Re: Re: What to do about invalid escape sequences]

2019-08-10 Thread Greg Ewing

Glenn Linderman wrote:
If that were true, the \n in the above example would already 
be a newline character, and the parsing of the format expression would 
not see the backslash. And if it were true, that would actually be far
more useful for this situation.


But then it would fail for a different reason -- the same reason that
this is a syntax error:

   'hello
   world'

Why go to the extra work of 
prohibiting \ in the format expressions?


Maybe to avoid problems like the above?

Or maybe because it would be confusing -- there are two levels of
string literal processing going on, one on the outer f-string and
one on the embedded string literal in the expression. What level
is the backslash expansion done in? Is it done in both? To get
a backslash in the embedded string, do I need two backslashes or
four? Banning backslashes altogether sidesteps all these issues.
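
The "two backslashes or four" question can be illustrated with an analogy: a string literal whose contents are themselves source code containing a string literal. This is only a sketch of nested literal processing using eval(); it is not how f-strings are implemented.

```python
# Outer literal (not raw): four typed backslashes become two characters
# in the source text that eval() receives.
inner_source = "'\\\\'"
assert inner_source == r"'\\'"

# Inner literal: those two backslashes become a single character.
value = eval(inner_source)
assert len(value) == 1

# With a raw outer literal only one level of processing applies,
# so two typed backslashes are enough.
assert eval(r"'\\'") == value
print(repr(value))
```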

not mentioning the actual escape processing that is done for raw 
strings, regarding \" and \'.


Technically that's not part of "escape processing", since it takes
place during lexical analysis -- it has to, because it affects how
the input stream is divided into tokens.

However, the backslash prohibition seems to apply even to this
use in f-strings:

>>> f"quote: {ord('\"')}"
  File "<stdin>", line 1
SyntaxError: f-string expression part cannot include a backslash

So it seems that f-strings are even more special than r-strings
when it comes to the treatment of backslashes.
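
A small sketch of the raw-string behaviour referred to above: the backslash before a quote stops the quote from ending the literal, but the backslash itself stays in the string. The failing literal is compiled from a source string so the example file remains valid.

```python
# In a raw string, \" does not end the literal, and the backslash is kept.
s = r"\""
assert len(s) == 2
assert s == "\\\""

# Consequently a raw string cannot end in a single backslash.
# The source text below is:  r"C:\dir\"
bad_source = 'r"C:\\dir\\"'
try:
    compile(bad_source, "<example>", "eval")
    rejected = False
except SyntaxError:
    rejected = True

print("raw string ending in a backslash rejected:", rejected)
```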

--
Greg
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/XSMGC4VAPHQPXRNTGZP4TQG3ZNU7TZKK/


[Python-Dev] Re: An f-string issue [Was: Re: Re: What to do about invalid escape sequences]

2019-08-10 Thread Eric V. Smith

On 8/10/2019 7:46 PM, Glenn Linderman wrote:
Because of the "invalid escape sequence" and "raw string" discussion, 
when looking at the documentation, I also noticed the following 
description for f-strings:


Escape sequences are decoded like in ordinary string literals (except 
when a literal is also marked as a raw string). After decoding, the 
grammar for the contents of the string is:

followed by lots of stuff, followed by
Backslashes are not allowed in format expressions and will raise an 
error:

f"newline: {ord('\n')}"   # raises SyntaxError


What I don't understand is how, if f-strings are processed AS 
DESCRIBED, the \n is ever seen by the format expression.
If I recall correctly, the mentioned decoding is happening on the string 
literal parts of the f-strings (above, the "newline: " part), not the 
expression parts (inside the {}). But it's been a while and I don't 
recall all of the details.


The description is that they are first decoded like ordinary strings, 
and then parsed for the internal grammar containing {} expressions to 
be expanded.  If that were true, the \n in the above example would 
already be a newline character, and the parsing of the format 
expression would not see the backslash. And if it were true, that 
would actually be far more useful for this situation.


So given that it is not true, why not? And why go to the extra work of 
prohibiting \ in the format expressions?


It's a future-proofing thing. See the discussion at 
https://mail.python.org/archives/list/python-dev@python.org/thread/EVXD72IYUN2APF2443OMADKA5WJTOKHD/ 
It has pointers to other parts of the discussion.


At some point, I'm planning on switching the parsing of f-strings from 
the custom parser (see Python/ast.c, FstringParser_ConcatFstring()) to 
having the python parser itself parse the f-strings. This will be 
similar to PEP 536, which doesn't have much detail, but does describe 
some of the motivations.




The PEP 498, of course, has an apparently more accurate description, 
that the {} parsing actually happens before the escape processing. 
Perhaps this avoids making multiple passes over the string to do the 
work, as the literal pieces and format expression pieces have to be 
separate in the generated code, but that is just my speculation: I'd 
like to know the real reason.


Should the documentation be fixed to make the description more 
accurate? If so, I'd be glad to open an issue.


Sure. I'm always in favor of accuracy. The f-string documentation was a 
last-minute rush job that could have used a lot more editing, and more 
eyes are always welcome.


But it will take a fair amount of research to understand it well enough 
to document it in more detail.




The PEP further contains the inaccurate statement:

Like all raw strings in Python, no escape processing is done for raw 
f-strings:


not mentioning the actual escape processing that is done for raw 
strings, regarding \" and \'.


It should probably just say it uses the same rules as raw strings.

Eric

Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/FKNEBB5HTMRX4RWLPTZN5K2WRZ5W7MI6/


[Python-Dev] An f-string issue [Was: Re: Re: What to do about invalid escape sequences]

2019-08-10 Thread Glenn Linderman
Because of the "invalid escape sequence" and "raw string" discussion, 
when looking at the documentation, I also noticed the following 
description for f-strings:


Escape sequences are decoded like in ordinary string literals (except 
when a literal is also marked as a raw string). After decoding, the 
grammar for the contents of the string is:

followed by lots of stuff, followed by

Backslashes are not allowed in format expressions and will raise an error:
f"newline: {ord('\n')}"   # raises SyntaxError


What I don't understand is how, if f-strings are processed AS DESCRIBED, 
the \n is ever seen by the format expression.


The description is that they are first decoded like ordinary strings, 
and then parsed for the internal grammar containing {} expressions to be 
expanded.  If that were true, the \n in the above example would already 
be a newline character, and the parsing of the format expression would 
not see the backslash. And if it were true, that would actually be far 
more useful for this situation.


So given that it is not true, why not? And why go to the extra work of 
prohibiting \ in the format expressions?


The PEP 498, of course, has an apparently more accurate description, 
that the {} parsing actually happens before the escape processing. 
Perhaps this avoids making multiple passes over the string to do the 
work, as the literal pieces and format expression pieces have to be 
separate in the generated code, but that is just my speculation: I'd 
like to know the real reason.


Should the documentation be fixed to make the description more accurate? 
If so, I'd be glad to open an issue.


The PEP further contains the inaccurate statement:

Like all raw strings in Python, no escape processing is done for raw 
f-strings:


not mentioning the actual escape processing that is done for raw 
strings, regarding \" and \'.






Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/OH3APTWUWPDC376FBRKNEXBKQYPP6LXY/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread Glenn Linderman

On 8/10/2019 3:36 PM, Greg Ewing wrote:

Glenn Linderman wrote:


I wonder how many raw strings actually use the \"  escape 
productively? Maybe that should be deprecated too! ?  I can't think 
of a good and necessary use for it, can anyone?


Quite rare, I expect, but it's bound to break someone's code.
It might be better to introduce a new string prefix, e.g.
'v' for 'verbatim':

   v"C:\Users\Fred\"

Which is why I suggested rr"C:\directory\", but allowed as how there 
might be better spellings. I like your v for verbatim!
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/GTOVRKM7Q4VU67KYDQF6ICU7HAJDSBRX/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread Greg Ewing

Glenn Linderman wrote:


I wonder how many raw strings actually use the \"  escape productively? 
Maybe that should be deprecated too! ?  I can't think of a good and 
necessary use for it, can anyone?


Quite rare, I expect, but it's bound to break someone's code.
It might be better to introduce a new string prefix, e.g.
'v' for 'verbatim':

   v"C:\Users\Fred\"

--
Greg
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/TQM37LMDVIKQ7UXLNLVMUUSF3ZYT7TYI/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread Greg Ewing

Rob Cliffe via Python-Dev wrote:


Also, the former is simply more *informative* - it tells the reader that 
baz is expected to be a directory, not a file.


On Windows you can usually tell that from the fact that filenames
almost always have an extension, and directory names almost never
do.

--
Greg
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/F4Y4HNU72QOVWHCGLD74N7ZTAEJP2XBF/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread eryk sun
On 8/10/19, Rob Cliffe via Python-Dev wrote:
> On 10/08/2019 11:50:35, eryk sun wrote:
>> On 8/9/19, Steven D'Aprano wrote:
>>> I'm also curious why the string needs to *end* with a backslash. Both of
>>> these are the same path:
>>>
>>>  C:\foo\bar\baz\
>>>  C:\foo\bar\baz
>
> Also, the former is simply more *informative* - it tells the reader that
> baz is expected to be a directory, not a file.

This is an important point that I overlooked. The trailing backslash
is more than just a redundant character to inform human readers. Refer
to [MS-FSA] 2.1.5.1 "Server Requests an Open of a File" [1]. A
create/open fails with STATUS_OBJECT_NAME_INVALID if either of the
following is true:

* PathName contains a trailing backslash and
  CreateOptions.FILE_NON_DIRECTORY_FILE is
  TRUE.

* PathName contains a trailing backslash and
  StreamTypeToOpen is DataStream

For NtCreateFile or NtOpenFile (in the NT API), the
FILE_NON_DIRECTORY_FILE option restricts the call to a regular file,
and FILE_DIRECTORY_FILE restricts it to a directory. With neither
option, the call can target either a file or directory. A trailing
backslash is another information channel. It tells the filesystem that
the target has to be a directory. If we specify
FILE_NON_DIRECTORY_FILE with a trailing backslash on the name, this is
an immediate failure as an invalid name without even checking the
entry. If we specify neither option and use a trailing backslash, it's
an invalid name if the filesystem finds a regular file or data stream.
Had the call specified the FILE_DIRECTORY_FILE option, it would
instead fail with STATUS_NOT_A_DIRECTORY.
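
The rules in the paragraph above can be summarised as a small decision function. This is only a model of the description, not filesystem code: the function name, parameters, and enum are hypothetical, and the STATUS_FILE_IS_A_DIRECTORY branch is my own understanding of NT behaviour rather than something stated in this message.

```python
from enum import Enum


class Status(Enum):
    SUCCESS = "STATUS_SUCCESS"
    OBJECT_NAME_INVALID = "STATUS_OBJECT_NAME_INVALID"
    NOT_A_DIRECTORY = "STATUS_NOT_A_DIRECTORY"
    FILE_IS_A_DIRECTORY = "STATUS_FILE_IS_A_DIRECTORY"  # assumption, see lead-in


def open_status(trailing_backslash, target_is_directory,
                directory_file=False, non_directory_file=False):
    """Model the outcome of a create/open as described in the message."""
    # Rejected as an invalid name before the entry is even looked up.
    if trailing_backslash and non_directory_file:
        return Status.OBJECT_NAME_INVALID
    if target_is_directory:
        if non_directory_file:
            return Status.FILE_IS_A_DIRECTORY
        return Status.SUCCESS
    # The target is a regular file or data stream.
    if directory_file:
        return Status.NOT_A_DIRECTORY
    if trailing_backslash:
        return Status.OBJECT_NAME_INVALID
    return Status.SUCCESS


# "baz" is a regular file, opened with a trailing backslash and neither option.
print(open_status(trailing_backslash=True, target_is_directory=False))
```

The directory_file check comes before the trailing-backslash check for a regular file, which matches the statement that FILE_DIRECTORY_FILE yields STATUS_NOT_A_DIRECTORY instead of an invalid name.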

We can see this in practice in the published source for the fastfat
filesystem driver. FatCommonCreate [2] (for a create or open) has the
following code to handle the second case (in this code, an FCB is a
file control block for a regular file, and a DCB is a directory
control block):

    if (NodeType(Fcb) == FAT_NTC_FCB) {
        //
        //  Check if we were only to open a directory
        //
        if (OpenDirectory) {
            DebugTrace(0, Dbg, "Cannot open file as directory\n", 0);
            try_return( Iosb.Status = STATUS_NOT_A_DIRECTORY );
        }
        DebugTrace(0, Dbg, "Open existing fcb, Fcb = %p\n", Fcb);
        if ( TrailingBackslash ) {
            try_return( Iosb.Status = STATUS_OBJECT_NAME_INVALID );
        }

We observe the first case with a typical CreateFileW call, which uses
the option FILE_NON_DIRECTORY_FILE. In the following example "baz" is
a regular file:

>>> f = open(r'foo\bar\baz') # success
>>> try: open('foo\\bar\\baz\\')
... except OSError as e: print(e)
...
[Errno 22] Invalid argument: 'foo\\bar\\baz\\'

C EINVAL (22) is mapped from Windows ERROR_INVALID_NAME (123), which
is mapped from NT STATUS_OBJECT_NAME_INVALID (0xC0000033).
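
A minimal sketch of how those codes surface on a Python exception. The helper name describe is hypothetical, and the exception is built by hand so the example runs on any platform; on Windows a real failure would also carry a winerror attribute.

```python
import errno

# Values quoted in the message above.
STATUS_OBJECT_NAME_INVALID = 0xC0000033  # NT status
ERROR_INVALID_NAME = 123                 # Windows error code


def describe(exc):
    """Summarise the error codes carried by an OSError."""
    # The winerror attribute only exists on Windows builds of Python.
    return {"errno": exc.errno, "winerror": getattr(exc, "winerror", None)}


# Hand-built stand-in for the failure shown in the open() example.
example = OSError(errno.EINVAL, "Invalid argument", "foo\\bar\\baz\\")
print(example)
print(describe(example))
```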

We can observe the second case with os.stat(), which calls CreateFileW
with backup semantics, which omits the FILE_NON_DIRECTORY_FILE option
in order to allow the call to open either a file or directory. In this
case the filesystem has to actually check that "baz" is a data file
before it can fail the call, as was shown in the fastfat code snippet
above:

>>> try: os.stat('foo\\bar\\baz\\')
... except OSError as e: print(e)
...
[WinError 123] The filename, directory name, or
volume label syntax is incorrect: 'foo\\bar\\baz\\'

[1] 
https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-fsa/8ada5fbe-db4e-49fd-aef6-20d54b748e40
[2] 
https://github.com/microsoft/Windows-driver-samples/blob/74200/filesys/fastfat/create.c#L1398
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/QPDXUY4OXR2XOCNUHSKC7QRQGAXWV5WQ/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread Glenn Linderman

On 8/10/2019 12:19 PM, Guido van Rossum wrote:

Regular expressions.


I assume that is in response to the "good use for \" escape" question?

But can't you just surround them with ' instead of " ?  Or  ''' ?
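
To make the question concrete, here is a sketch of a regular expression that needs both quote characters, written once with the \" escape in a raw string and once with triple quotes. The pattern and sample text are made up for illustration.

```python
import re

# Raw string using \" : the backslash stays in the pattern, where the
# regex engine reads it as an escaped double quote.
with_escape = r"[\"']"

# Same character class with triple quotes and no backslash at all.
with_triple = r'''["']'''

text = 'say "hi" and \'bye\''
found_escape = re.findall(with_escape, text)
found_triple = re.findall(with_triple, text)
print(found_escape == found_triple)
```

The two pattern strings are not identical (the first contains a backslash), but they match the same characters in this text.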



On Sat, Aug 10, 2019 at 12:12 Glenn Linderman wrote:


On 8/10/2019 11:16 AM, Terry Reedy wrote:

On 8/10/2019 4:33 AM, Paul Moore wrote:


(Side issue)


This deserves its own thread.


As a Windows developer, who has seen far too many cases where
use of
slashes in filenames implies a Unix-based developer not thinking
sufficiently about Windows compatibility, or where it leads to
people
hard coding '/' rather than using os.sep (or better, pathlib), I
strongly object to this characterisation. Rather, I would simply
say
"to make Windows users more aware of the clash in usage between
backslashes in filenames and backslashes as string escapes".

There are *many* valid ways to write Windows pathnames in your
code:

1. Raw strings


As pointed out elsewhere, Raw strings have limitations, paths
ending in \ cannot be represented, and such do exist in various
situations, not all of which can be easily avoided... except by
the "extra character contortion" of "C:\directory\ "[:-1]  (does
someone know a better way?)

It would be useful to make a "really raw" string that doesn't
treat \ special in any way. With 4 different quoting possibilities
( ' " ''' """ ) there isn't really a reason to treat \ special at
the end of a raw string, except for backward compatibility.

I wonder how many raw strings actually use the \"  escape
productively? Maybe that should be deprecated too! ?  I can't
think of a good and necessary use for it, can anyone?

Or invent "really raw" in some spelling, such as rr"c:\directory\"
or e for exact, or x for exact, or "c:\directory\"

And that brings me to the thought that if   \e  wants to become an
escape for escape, that maybe there should be an "extended escape"
prefix... if you want to use more escapes, define   ee"string
where \\ can only be used as an escape or escaped character, \e
means the ASCII escape character, and \ followed by a character
with no escape definition would be an error."

Of course "extended escape" could be spelled lots of different
ways too, but not the same way as "really raw" :)


2. Doubling the backslashes
3. Using pathlib (possibly with slash as a directory separator,
where
it's explicitly noted as a portable option)
4. Using slashes

IMO, using slashes is the *worst* of these. But this latter is a
matter of opinion - I've no objection to others believing
differently,
but I *do* object to slashes being presented as the only option, or
the recommended option without qualification.


Perhaps Python Setup and Usage, 3. Using Python on Windows,
should have a section of file paths, at most x.y.z, so visible in
the TOC listed by https://docs.python.org/3/using/index.html




--
--Guido (mobile)


Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/BZDAXLX2IQTIUT2W47SFI2CJTZSPXY2V/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread Guido van Rossum
Regular expressions.

On Sat, Aug 10, 2019 at 12:12 Glenn Linderman wrote:

> On 8/10/2019 11:16 AM, Terry Reedy wrote:
>
> On 8/10/2019 4:33 AM, Paul Moore wrote:
>
> (Side issue)
>
>
> This deserves its own thread.
>
> As a Windows developer, who has seen far too many cases where use of
> slashes in filenames implies a Unix-based developer not thinking
> sufficiently about Windows compatibility, or where it leads to people
> hard coding '/' rather than using os.sep (or better, pathlib), I
> strongly object to this characterisation. Rather, I would simply say
> "to make Windows users more aware of the clash in usage between
> backslashes in filenames and backslashes as string escapes".
>
> There are *many* valid ways to write Windows pathnames in your code:
>
> 1. Raw strings
>
>
> As pointed out elsewhere, Raw strings have limitations, paths ending in \
> cannot be represented, and such do exist in various situations, not all of
> which can be easily avoided... except by the "extra character contortion"
> of   "C:\directory\ "[:-1]  (does someone know a better way?)
>
> It would be useful to make a "really raw" string that doesn't treat \
> special in any way. With 4 different quoting possibilities ( ' " ''' """ )
> there isn't really a reason to treat \ special at the end of a raw string,
> except for backward compatibility.
>
> I wonder how many raw strings actually use the \"  escape productively?
> Maybe that should be deprecated too! ?  I can't think of a good and
> necessary use for it, can anyone?
>
> Or invent "really raw" in some spelling, such as rr"c:\directory\"
> or e for exact, or x for exact, or  here>"c:\directory\"
>
> And that brings me to the thought that if   \e  wants to become an escape
> for escape, that maybe there should be an "extended escape" prefix... if
> you want to use more escapes, define   ee"string where \\ can only be used
> as an escape or escaped character, \e means the ASCII escape character, and
> \ followed by a character with no escape definition would be an error."
>
> Of course "extended escape" could be spelled lots of different ways too,
> but not the same way as "really raw" :)
>
> 2. Doubling the backslashes
> 3. Using pathlib (possibly with slash as a directory separator, where
> it's explicitly noted as a portable option)
> 4. Using slashes
>
> IMO, using slashes is the *worst* of these. But this latter is a
> matter of opinion - I've no objection to others believing differently,
> but I *do* object to slashes being presented as the only option, or
> the recommended option without qualification.
>
>
> Perhaps Python Setup and Usage, 3. Using Python on Windows, should have a
> section of file paths, at most x.y.z, so visible in the TOC listed by
> https://docs.python.org/3/using/index.html
>
>
>
-- 
--Guido (mobile)
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/LSFNRZTMK6HLUCE7IAWKD3GCBLZ7KINQ/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread Glenn Linderman

On 8/10/2019 11:16 AM, Terry Reedy wrote:

On 8/10/2019 4:33 AM, Paul Moore wrote:


(Side issue)


This deserves its own thread.


As a Windows developer, who has seen far too many cases where use of
slashes in filenames implies a Unix-based developer not thinking
sufficiently about Windows compatibility, or where it leads to people
hard coding '/' rather than using os.sep (or better, pathlib), I
strongly object to this characterisation. Rather, I would simply say
"to make Windows users more aware of the clash in usage between
backslashes in filenames and backslashes as string escapes".

There are *many* valid ways to write Windows pathnames in your code:

1. Raw strings


As pointed out elsewhere, Raw strings have limitations, paths ending in 
\ cannot be represented, and such do exist in various situations, not 
all of which can be easily avoided... except by the "extra character 
contortion" of   "C:\directory\ "[:-1]  (does someone know a better way?)


It would be useful to make a "really raw" string that doesn't treat \ 
special in any way. With 4 different quoting possibilities ( ' " ''' """ 
) there isn't really a reason to treat \ special at the end of a raw 
string, except for backward compatibility.


I wonder how many raw strings actually use the \"  escape productively? 
Maybe that should be deprecated too! ?  I can't think of a good and 
necessary use for it, can anyone?


Or invent "really raw" in some spelling, such as rr"c:\directory\"
or e for exact, or x for exact, or here>"c:\directory\"


And that brings me to the thought that if   \e  wants to become an 
escape for escape, that maybe there should be an "extended escape" 
prefix... if you want to use more escapes, define   ee"string where \\ 
can only be used as an escape or escaped character, \e means the ASCII 
escape character, and \ followed by a character with no escape 
definition would be an error."


Of course "extended escape" could be spelled lots of different ways too, 
but not the same way as "really raw" :)



2. Doubling the backslashes
3. Using pathlib (possibly with slash as a directory separator, where
it's explicitly noted as a portable option)
4. Using slashes

IMO, using slashes is the *worst* of these. But this latter is a
matter of opinion - I've no objection to others believing differently,
but I *do* object to slashes being presented as the only option, or
the recommended option without qualification.


Perhaps Python Setup and Usage, 3. Using Python on Windows, should 
have a section of file paths, at most x.y.z, so visible in the TOC 
listed by https://docs.python.org/3/using/index.html
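
On the "does someone know a better way?" question earlier in this message, a few spellings of a path that ends in a backslash can be compared directly. This is a sketch; the ntpath.join variant is my own suggestion rather than something proposed in the thread, and ntpath is used so the comparison runs on any platform.

```python
import ntpath

# The "extra character contortion", written as a raw string.
sliced = r"C:\directory\ "[:-1]

# Implicit concatenation of a raw part and an escaped backslash.
concatenated = r"C:\directory" "\\"

# Doubling every backslash in an ordinary literal.
doubled = "C:\\directory\\"

# Joining with an empty final component appends the separator.
joined = ntpath.join(r"C:\directory", "")

print(sliced == concatenated == doubled == joined)
```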




Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/5MZAXJJYKNMQAS63QW4HS2TUPMQH7LSL/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread Rob Cliffe via Python-Dev



On 10/08/2019 11:50:35, eryk sun wrote:

On 8/9/19, Steven D'Aprano wrote:

I'm also curious why the string needs to *end* with a backslash. Both of
these are the same path:

 C:\foo\bar\baz\
 C:\foo\bar\baz
Also, the former is simply more *informative* - it tells the reader that 
baz is expected to be a directory, not a file.

Rob Cliffe

The above two cases are equivalent. But that's not the case for the
root directory. Unlike Unix, filesystem namespaces are implemented
directly on devices. For example, "//./C:" might resolve to a volume
device such as "\\Device\\HarddiskVolume2". With a trailing slash
added, "//./C:/" resolves to "\\Device\\HarddiskVolume2\\", which is
the root directory of the mounted filesystem on the volume.

Also, as a classic DOS path, "C:" without a trailing slash expands to
the working directory on drive "C:". The system runtime library looks
for this path in a hidden environment variable named "=C:". The
Windows API never sets these hidden "=X:" drive variables. The C
runtime sets them, as does Python's os.chdir.

Some volume-management functions require a trailing slash or
backslash, such as GetVolumeInformationW [1].
GetVolumeNameForVolumeMountPointW [2] actually requires it to be a
trailing backslash. It will not accept a trailing forward slash such
as "C:\\Mount\\Volume/" (a bug since Windows 2000). The volume name
(e.g. "?\\Volume{----}\\")
returned by the latter includes a trailing backslash, which must be
present in the target path in order for a mountpoint to function
properly as a directory, else it would resolve to the volume device
instead of the root directory.

[1] 
https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getvolumeinformationw
[2] 
https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getvolumenameforvolumemountpointw


If they're Windows developers, they ought to be aware that the Windows
file system API allows / anywhere you can use \ and it is the
common convention in Python to use forward slashes.

The Windows file API actually does not allow slash to be used anywhere
that we can use backslash. It's usually allowed, but not always. For
the most part, the conditions where forward slash is not supported are
intentional.

Windows replaces forward slash with backslash in normal DOS paths and
normal device paths. But sometimes we have to use a special form of
device path that bypasses normalization. A path that isn't normalized
can only use backslash as the path separator. For example, the most
common case is that the process doesn't have long paths enabled. In
this case we're limited to MAX_PATH, which limits file paths to a
paltry 259 characters (sans the terminating null); the current
directory to 258 characters (sans a trailing backslash and null); and
the path of a new directory to 247 characters (subtract 12 from 259 to
leave space for an 8.3 filename). By skipping DOS normalization, we
can access a path with up to about 32,750 characters (i.e. 32,767 sans
the length of the device name in the final NT path under
"\\Device\\").

(Long normalized paths are available starting in Windows 10, but the
system policy that allows this is disabled by default, and even if
enabled, each application has to declare itself to be long-path aware
in its manifest. This is declared for python[w].exe in Python 3.6+.)

A device path is an explicit reference to a user's local device
directory (in the object namespace), which shadows the global device
directory. In NT, this directory is aliased to a special "\\??\\"
prefix (backslash only). A local device directory is created for each
logon session (not terminal session) by the security system that runs
in terminal session 0 (i.e. the system services session). The
per-logon directory is located at "\\Sessions\\0\\DosDevices\\". In the Windows API, it's accessible as "//?/" or "//./",
or with any mix of forward slashes or backslashes, but only the
all-backslash form is special-cased to bypass the normalization step.
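
Since a path that bypasses normalization may use only backslashes, code that builds such a path has to do the normalization itself. Below is a minimal sketch under stated assumptions: the helper name to_extended_path is hypothetical, it handles only absolute drive-letter paths (not UNC paths), and it uses ntpath so it can be tried on any platform.

```python
import ntpath


def to_extended_path(path):
    """Return an all-backslash device path for an absolute drive path."""
    # Normalize ourselves, because the API will not do it for this form:
    # forward slashes become backslashes and ".." components are resolved.
    normalized = ntpath.normpath(path)
    drive, rest = ntpath.splitdrive(normalized)
    if len(drive) != 2 or drive[1] != ":" or not rest.startswith("\\"):
        raise ValueError("expected an absolute drive path such as C:/foo")
    return "\\\\?\\" + normalized


print(to_extended_path("C:/foo/spam/../bar"))
```

A relative path such as "C:foo" is rejected rather than guessed at, because resolving it would depend on the per-drive working directory described earlier in the message.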
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/3SDFM2EKFO3UNTATS7KVBY2WOUTFMAF5/





Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/IVNUAUUHURCS4P77ZVFFK3H665ZKXGBC/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread Terry Reedy

On 8/10/2019 4:33 AM, Paul Moore wrote:


(Side issue)


This deserves its own thread.


As a Windows developer, who has seen far too many cases where use of
slashes in filenames implies a Unix-based developer not thinking
sufficiently about Windows compatibility, or where it leads to people
hard coding '/' rather than using os.sep (or better, pathlib), I
strongly object to this characterisation. Rather, I would simply say
"to make Windows users more aware of the clash in usage between
backslashes in filenames and backslashes as string escapes".

There are *many* valid ways to write Windows pathnames in your code:

1. Raw strings
2. Doubling the backslashes
3. Using pathlib (possibly with slash as a directory separator, where
it's explicitly noted as a portable option)
4. Using slashes

IMO, using slashes is the *worst* of these. But this latter is a
matter of opinion - I've no objection to others believing differently,
but I *do* object to slashes being presented as the only option, or
the recommended option without qualification.


Perhaps Python Setup and Usage, 3. Using Python on Windows, should have
a section on file paths, nested no deeper than x.y.z, so that it is
visible in the TOC listed at
https://docs.python.org/3/using/index.html


--
Terry Jan Reedy
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/SH3M5GGHJPIMKVTEYI6FFBYWHVZT7O64/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread Glenn Linderman

On 8/10/2019 7:03 AM, Paul Moore wrote:

On Sat, 10 Aug 2019 at 12:06, Chris Angelico wrote:

On Sat, Aug 10, 2019 at 6:39 PM Paul Moore wrote:

There are *many* valid ways to write Windows pathnames in your code:

1. Raw strings
2. Doubling the backslashes
3. Using pathlib (possibly with slash as a directory separator, where
it's explicitly noted as a portable option)
4. Using slashes

IMO, using slashes is the *worst* of these. But this latter is a
matter of opinion - I've no objection to others believing differently,
but I *do* object to slashes being presented as the only option, or
the recommended option without qualification.

Please expand on why this is the worst?

I did say it was a matter of opinion, so I'm not going to respond if
people say that any of the following is "wrong", but since you asked:

1. Backslash is the native separator, whereas slash is not (see eryk
sun's post for *way* more detail).
2. People who routinely use slash have a tendency to forget to use
os.sep rather than a literal slash in places where it *does* matter.
3. Using slash, in my experience, ends up with paths with "mixed"
separators (os.path.join("C:/work/apps", "foo") ->
'C:/work/apps\\foo') which are messy to deal with, and ugly for the
user.
4. If a path with slashes is displayed directly to the user without
normalisation, it looks incorrect and can confuse users who are only
used to "native" Windows programs.

Etc.
Not to mention the problem of passing paths with / to other Windows
programs via os.system or subprocess.
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/A7MBGUBTRNLZ5UWCMS4NHYAFGQC6MNQJ/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread Paul Moore
On Sat, 10 Aug 2019 at 12:06, Chris Angelico wrote:
>
> On Sat, Aug 10, 2019 at 6:39 PM Paul Moore wrote:
> > There are *many* valid ways to write Windows pathnames in your code:
> >
> > 1. Raw strings
> > 2. Doubling the backslashes
> > 3. Using pathlib (possibly with slash as a directory separator, where
> > it's explicitly noted as a portable option)
> > 4. Using slashes
> >
> > IMO, using slashes is the *worst* of these. But this latter is a
> > matter of opinion - I've no objection to others believing differently,
> > but I *do* object to slashes being presented as the only option, or
> > the recommended option without qualification.
>
> Please expand on why this is the worst?

I did say it was a matter of opinion, so I'm not going to respond if
people say that any of the following is "wrong", but since you asked:

1. Backslash is the native separator, whereas slash is not (see eryk
sun's post for *way* more detail).
2. People who routinely use slash have a tendency to forget to use
os.sep rather than a literal slash in places where it *does* matter.
3. Using slash, in my experience, ends up with paths with "mixed"
separators (os.path.join("C:/work/apps", "foo") ->
'C:/work/apps\\foo') which are messy to deal with, and ugly for the
user.
4. If a path with slashes is displayed directly to the user without
normalisation, it looks incorrect and can confuse users who are only
used to "native" Windows programs.

Etc.
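
A small sketch of point 3 above, assuming nothing beyond the standard library. It uses ntpath (the module that os.path resolves to on Windows) so that it behaves the same on any platform; the variable names are only illustrative.

```python
import ntpath  # Windows path rules, importable on any platform
from pathlib import PureWindowsPath

base = "C:/work/apps"

# join() inserts a backslash but leaves the slashes that are already
# present in the first argument alone, giving "mixed" separators.
mixed = ntpath.join(base, "foo")

# normpath() rewrites every separator to the native backslash.
native = ntpath.normpath(mixed)

# PureWindowsPath renders with backslashes however the path was spelled.
via_pathlib = str(PureWindowsPath(base) / "foo")

print(mixed)
print(native)
print(via_pathlib)
```

Normalising once, at the point where a path is shown to the user or handed to another program, is one way to keep the mixed form from leaking out.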

Paul
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/QNAZ4G7VCCBZSFJLUCGH6NTTGW726R6G/


[Python-Dev] Re: [SPAM?] Re: What to do about invalid escape sequences

2019-08-10 Thread Richard Damon
On 8/10/19 7:05 AM, Chris Angelico wrote:
> On Sat, Aug 10, 2019 at 6:39 PM Paul Moore wrote:
>> There are *many* valid ways to write Windows pathnames in your code:
>>
>> 1. Raw strings
>> 2. Doubling the backslashes
>> 3. Using pathlib (possibly with slash as a directory separator, where
>> it's explicitly noted as a portable option)
>> 4. Using slashes
>>
>> IMO, using slashes is the *worst* of these. But this latter is a
>> matter of opinion - I've no objection to others believing differently,
>> but I *do* object to slashes being presented as the only option, or
>> the recommended option without qualification.
> Please expand on why this is the worst?
>
> ChrisA

One big issue with trying to get used to using / on Windows for the
directory separator is that it doesn't work for many Windows programs
because on Windows the / character is defined to be the option character
(instead of - as on *nix).

Yes, you can write your program to use the foreign convention of using -
for options, and because the system calls accept either \ or / as the
directory separator, paths which use the 'wrong' separator will work,
but your program will be violating the conventions of the host environment.

-- 
Richard Damon
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/RMEUSPLAST7INRQQDLILW3IYETWDITMV/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread eryk sun
On 8/10/19, eryk sun wrote:
>
> The per-logon directory is located at "\\Sessions\\0\\DosDevices\\<Logon
> Session ID>". In the Windows API, it's accessible as "//?/" or "//./",
> or with any mix of forward slashes or backslashes, but only the
> all-backslash form is special-cased to bypass the normalization step.

Correction: I slipped up in that last sentence. Only the all-backslash
form that's in the "?" namespace bypasses normalization, as most
Windows users should at least have seen in passing. These special
device paths pop up here and there. For example, r'\\?\C:\Temp\spam. .
.' allows creating or opening a file named "spam. . .", which the
Windows API would normalize as "spam". But I don't recommend
sidestepping the normal rules -- except for the path length limit
because there are ways to make long paths conveniently accessible
(e.g. symbolic links, bind-like mountpoints, and subst drives).

Sometimes people also come across "\\??\\" paths and come to the
mistaken conclusion that these can be used in Windows API programs.
No, they're for NT. The runtime library mangles them, e.g.
nt._getfullpathname(r'\??\C:') == 'C:\\??\\C:'.
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/VANNT2SIH7EBPEOUC6M7HI7PYASJPYC7/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread Rob Cliffe via Python-Dev



On 06/08/2019 23:41:25, Greg Ewing wrote:

Rob Cliffe via Python-Dev wrote:


Sorry, that won't work.  Strings are parsed at compile time, open() 
is executed at run-time.


It could check for control characters, which are probably the result
of a backslash accident. Maybe even auto-correct them...


By "It", do you mean open() ?  If so:
It already checks for control characters, at least with Python 2.7 on 
Windows:


>>> open('mydir\test')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 22] invalid mode ('r') or filename: 'mydir\test'

As for auto-correct (presumably "\a" to "\\a", "\b" to "\\b" etc.), I 
hope you're not serious.
"In the face of gibberish, refuse the temptation to show how smart your 
guessing is."
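
For readers who have not hit this before, a minimal sketch of what happened to the string in the session above, and of the kind of control-character check being discussed. The helper has_control_chars is hypothetical and is not something open() or the standard library provides under that name.

```python
# In a normal string literal, \t is the escape for a tab character,
# so this path silently contains a tab instead of a backslash and a "t".
accidental = 'mydir\test'

# Either of these spells the intended path.
doubled = 'mydir\\test'
raw = r'mydir\test'


def has_control_chars(path):
    """Hypothetical check: True if any character is an ASCII control character."""
    return any(ord(ch) < 32 or ord(ch) == 127 for ch in path)


print(repr(accidental), has_control_chars(accidental))
print(repr(raw), has_control_chars(raw))
```

A check like this can only report that something looks wrong; it cannot know which backslash sequence the author meant, which is the argument against auto-correction.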

___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/UK46EASIZVFTIQPORH7AG3EFB522NFI3/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread Chris Angelico
On Sat, Aug 10, 2019 at 6:39 PM Paul Moore wrote:
> There are *many* valid ways to write Windows pathnames in your code:
>
> 1. Raw strings
> 2. Doubling the backslashes
> 3. Using pathlib (possibly with slash as a directory separator, where
> it's explicitly noted as a portable option)
> 4. Using slashes
>
> IMO, using slashes is the *worst* of these. But this latter is a
> matter of opinion - I've no objection to others believing differently,
> but I *do* object to slashes being presented as the only option, or
> the recommended option without qualification.

Please expand on why this is the worst?

ChrisA
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/PXVO7OT4EK2GRDC5DM6JXMP3WBOVC7DC/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread eryk sun
On 8/9/19, Steven D'Aprano wrote:
>
> I'm also curious why the string needs to *end* with a backslash. Both of
> these are the same path:
>
> C:\foo\bar\baz\
> C:\foo\bar\baz

The above two cases are equivalent. But that's not the case for the
root directory. Unlike Unix, filesystem namespaces are implemented
directly on devices. For example, "//./C:" might resolve to a volume
device such as "\\Device\\HarddiskVolume2". With a trailing slash
added, "//./C:/" resolves to "\\Device\\HarddiskVolume2\\", which is
the root directory of the mounted filesystem on the volume.

Also, as a classic DOS path, "C:" without a trailing slash expands to
the working directory on drive "C:". The system runtime library looks
for this path in a hidden environment variable named "=C:". The
Windows API never sets these hidden "=X:" drive variables. The C
runtime sets them, as does Python's os.chdir.
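
The difference between "C:" and "C:\" can be seen without touching the filesystem. The following is only a sketch using pathlib.PureWindowsPath, which applies Windows path rules on any platform; it shows how the paths are classified, not how the hidden "=C:" variable is consulted.

```python
from pathlib import PureWindowsPath

drive_relative = PureWindowsPath("C:")               # working directory on C:
drive_relative_file = PureWindowsPath("C:spam.txt")  # file in that directory
root = PureWindowsPath("C:/")                        # root directory of C:

for p in (drive_relative, drive_relative_file, root):
    # A path is absolute on Windows only if it has both a drive and a root.
    print(repr(str(p)), "drive:", repr(p.drive), "root:", repr(p.root),
          "absolute:", p.is_absolute())
```

Only the last path has a root component, so only it is absolute; the first two depend on the per-drive working directory at the time they are resolved.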

Some volume-management functions require a trailing slash or
backslash, such as GetVolumeInformationW [1].
GetVolumeNameForVolumeMountPointW [2] actually requires it to be a
trailing backslash. It will not accept a trailing forward slash such
as "C:\\Mount\\Volume/" (a bug since Windows 2000). The volume name
(e.g. "?\\Volume{----}\\")
returned by the latter includes a trailing backslash, which must be
present in the target path in order for a mountpoint to function
properly as a directory, else it would resolve to the volume device
instead of the root directory.

[1] 
https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getvolumeinformationw
[2] 
https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getvolumenameforvolumemountpointw

> If they're Windows developers, they ought to be aware that the Windows
> file system API allows / anywhere you can use \ and it is the
> common convention in Python to use forward slashes.

The Windows file API actually does not allow slash to be used anywhere
that we can use backslash. It's usually allowed, but not always. For
the most part, the conditions where forward slash is not supported are
intentional.

Windows replaces forward slash with backslash in normal DOS paths and
normal device paths. But sometimes we have to use a special form of
device path that bypasses normalization. A path that isn't normalized
can only use backslash as the path separator. For example, the most
common case is that the process doesn't have long paths enabled. In
this case we're limited to MAX_PATH, which limits file paths to a
paltry 259 characters (sans the terminating null); the current
directory to 258 characters (sans a trailing backslash and null); and
the path of a new directory to 247 characters (subtract 12 from 259 to
leave space for an 8.3 filename). By skipping DOS normalization, we
can access a path with up to about 32,750 characters (i.e. 32,767 sans
the length of the device name in the final NT path under
"\\Device\\").
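
As a rough illustration of what "skipping DOS normalization" asks of the caller, here is a sketch of a helper that builds the backslash-only "\\?\" form from an already absolute path. The name to_extended_path is hypothetical, and the function is pure string manipulation: it does not resolve "." or ".." components, which the caller would have to do beforehand because Windows will not.

```python
import ntpath  # Windows path rules, importable on any platform

EXTENDED = "\\\\?\\"  # the four characters \\?\
DEVICE = "\\\\.\\"    # the four characters \\.\


def to_extended_path(path: str) -> str:
    """Hypothetical helper: return the backslash-only \\\\?\\ form of an absolute path."""
    # Non-normalized paths may only use backslash as the separator.
    path = path.replace("/", "\\")
    if path.startswith(EXTENDED):
        return path
    if path.startswith(DEVICE):
        return EXTENDED + path[len(DEVICE):]
    if path.startswith("\\\\"):
        # UNC paths are reached through the "UNC" device in this namespace.
        return EXTENDED + "UNC\\" + path[2:]
    drive, rest = ntpath.splitdrive(path)
    if not drive or not rest.startswith("\\"):
        raise ValueError(f"not an absolute drive path: {path!r}")
    return EXTENDED + path


print(to_extended_path("C:/Temp/spam.txt"))
print(to_extended_path("//server/share/dir"))
```

Drive-relative input such as "C:spam" is rejected rather than guessed at, since its meaning depends on the current directory of that drive.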

(Long normalized paths are available starting in Windows 10, but the
system policy that allows this is disabled by default, and even if
enabled, each application has to declare itself to be long-path aware
in its manifest. This is declared for python[w].exe in Python 3.6+.)

A device path is an explicit reference to a user's local device
directory (in the object namespace), which shadows the global device
directory. In NT, this directory is aliased to a special "\\??\\"
prefix (backslash only). A local device directory is created for each
logon session (not terminal session) by the security system that runs
in terminal session 0 (i.e. the system services session). The
per-logon directory is located at "\\Sessions\\0\\DosDevices\\<Logon
Session ID>". In the Windows API, it's accessible as "//?/" or "//./",
or with any mix of forward slashes or backslashes, but only the
all-backslash form is special-cased to bypass the normalization step.
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/3SDFM2EKFO3UNTATS7KVBY2WOUTFMAF5/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread Paul Moore
On Sat, 10 Aug 2019 at 00:36, Steven D'Aprano wrote:
> 2. To strongly discourage newbie Windows developers from hard-coding
> paths using backslashes, but to use forward slashes instead.

(Side issue)

As a Windows developer, who has seen far too many cases where use of
slashes in filenames implies a Unix-based developer not thinking
sufficiently about Windows compatibility, or where it leads to people
hard coding '/' rather than using os.sep (or better, pathlib), I
strongly object to this characterisation. Rather, I would simply say
"to make Windows users more aware of the clash in usage between
backslashes in filenames and backslashes as string escapes".

There are *many* valid ways to write Windows pathnames in your code:

1. Raw strings
2. Doubling the backslashes
3. Using pathlib (possibly with slash as a directory separator, where
it's explicitly noted as a portable option)
4. Using slashes

IMO, using slashes is the *worst* of these. But this latter is a
matter of opinion - I've no objection to others believing differently,
but I *do* object to slashes being presented as the only option, or
the recommended option without qualification.

Paul
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/FZABAKCBZY72FKFRPK3OXPLKSQ62JZ6N/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread Steve Holden
While not a total solution, it seems like it might be worthwhile forcing
flake8 or similar checks when uploading PyPI modules.

That would catch the illegal escape sequences where it really matters -
before they enter the ecosystem.

(general) fathead:pyxll-www sholden$ cat t.py
"Docstring with illegal \escape sequence"
(general) fathead:pyxll-www sholden$ flake8 t.py
t.py:1:25: W605 invalid escape sequence '\e'

While this won't mitigate the case for existing packages, it should reduce
the number of packages containing potentially erroneous string constants,
preparing the ground for the eventual introduction of the syntax error.
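
Where flake8 is not available, roughly the same check can be approximated with the compiler's own warning. This is only a sketch: the function name find_invalid_escapes is made up for illustration, and it matches on the warning text rather than its category because the category has differed between Python versions.

```python
import warnings


def find_invalid_escapes(source, filename="<snippet>"):
    """Hypothetical helper: compile source and return invalid-escape warning messages."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, including ones that are hidden by default.
        warnings.simplefilter("always")
        compile(source, filename, "exec")
    return [
        str(w.message)
        for w in caught
        if "invalid escape sequence" in str(w.message)
    ]


bad = r'"Docstring with illegal \escape sequence"' + "\n"
good = r'r"Docstring with a raw \escape sequence"' + "\n"

print(find_invalid_escapes(bad))
print(find_invalid_escapes(good))
```

Because the warning is raised at compile time, nothing in the checked source is executed.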

Steve Holden


On Sat, Aug 10, 2019 at 8:07 AM Serhiy Storchaka wrote:

> 10.08.19 02:04, Gregory P. Smith wrote:
> > I've merged the PR reverting the behavior in 3.8 and am doing the same
> > in the master branch.
>
> I was going to rebase it onto master and go through the normal
> backporting process if we decide that the DeprecationWarning should be
> in master. I was waiting for the end of the discussion.
>
> > Recall the nightmare caused by md5.py and sha.py DeprecationWarning's in
> > 2.5...  this would be similar.
>
> It is very different because DeprecationWarning for md5.py and sha.py is
> emitted at runtime.
> ___
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/H5VXWS6UT2OZBTXG7HUERKAQQIQ4BYEA/
>
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/24ID6EF6ESG64B6VFXVRL4XNWP5I7ITW/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread Serhiy Storchaka

10.08.19 02:04, Gregory P. Smith wrote:
I've merged the PR reverting the behavior in 3.8 and am doing the same 
in the master branch.


I was going to rebase it onto master and go through the normal
backporting process if we decide that the DeprecationWarning should be
in master. I was waiting for the end of the discussion.


Recall the nightmare caused by md5.py and sha.py DeprecationWarning's in 
2.5...  this would be similar.


It is very different because DeprecationWarning for md5.py and sha.py is 
emitted at runtime.

___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/H5VXWS6UT2OZBTXG7HUERKAQQIQ4BYEA/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread Serhiy Storchaka

09.08.19 19:39, Steve Dower wrote:
I also posted another possible option that helps solve the real problem 
faced by users, and not just the "we want to have a warning" problem 
that is purely ours.


Warnings solve two problems:

* Teaching users that a backslash has special meaning and should be 
escaped unless it is used for special meaning.


* Avoiding breakage or the introduction of bugs if we add new escape
sequences (like \e).


* change the SyntaxWarning into a default-silenced one that fires 
every time a .pyc is loaded (this is the hard part, but it's doable)


It was considered an advantage that these warnings are shown only once
at compile time. So they will be shown to the author of the code, but
the user of the code will not see them (except at installation time).


Actually we need to distinguish between the author and the user of the
code and show warnings only to the author. Using .pyc files was just a
heuristic: the author compiles the Python code, and the user uses
compiled .pyc files. It would be nice to have a more reliable way to
determine who owns the code. This applies not only to SyntaxWarnings,
but also to runtime DeprecationWarnings. Maybe silence warnings only for
read-only files and make files installed by pip read-only?


* change pathlib.PureWindowsPath, os.fsencode and os.fsdecode to 
explicitly warn when the path contains control characters


This can cause additional harm. Currently you get the expected
FileNotFoundError when a user-specified path is bad, and it can be caught
and handled. But with warnings you will get either noise in the output or
an unexpected unhandled error.


* change the PyErr_SetExcFromWindowsErrWithFilenameObjects function to 
append (or chain) an extra message when either of the filenames 
contains control characters (or change OSError to do it, or the 
default sys.excepthook)


I do not understand what goal will be achieved by this.
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/BCAOEGQYK5KYAMPDQ5O6KWGCOOQUJ6UV/