Re: Why exception from os.path.exists()?

2018-06-13 Thread Peter J. Holzer
On 2018-06-13 23:56:09 +0300, Marko Rauhamaa wrote:
> "Peter J. Holzer" :
> > POSIX specifies a number of error codes which can be returned by stat():
[...]
> > So none of these is a good choice for the errno parameter of an OSError
> > to be thrown.
> 
> The natural errno value would be EINVAL, which is returned whenever a
> system call is invoked with an illegal argument.

And you present it like it's a new idea, after I have already discussed
the pros and cons of that twice (the second time in the very message you
just replied to, except that you decided not to quote that part but some
other part ...)

hp

-- 
   _  | Peter J. Holzer| we build much bigger, better disasters now
|_|_) || because we have much more sophisticated
| |   | h...@hjp.at | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson 


signature.asc
Description: PGP signature
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why exception from os.path.exists()?

2018-06-13 Thread Marko Rauhamaa
"Peter J. Holzer" :

> POSIX specifies a number of error codes which can be returned by stat():
>
> [EACCES]
> Search permission is denied for a component of the path prefix.
> [EIO]
> An error occurred while reading from the file system.
> [ELOOP]
> A loop exists in symbolic links encountered during resolution of the
> path argument.
> [ENAMETOOLONG]
> The length of a component of a pathname is longer than {NAME_MAX}.
> [ENOENT]
> A component of path does not name an existing file or path is an
> empty string.
> [ENOTDIR]
> A component of the path prefix names an existing file that is
> neither a directory nor a symbolic link to a directory, or the path
> argument contains at least one non-<slash> character and ends with
> one or more trailing <slash> characters and the last pathname
> component names an existing file that is neither a directory nor a
> symbolic link to a directory.
> [EOVERFLOW]
> The file size in bytes or the number of blocks allocated to the file
> or the file serial number cannot be represented correctly in the
> structure pointed to by buf. 
>
> [...]
>
> Note that none of these covers "file name contains an illegal character"
> for the simple reason that on POSIX systems there are no illegal
> characters.
>
> So none of these is a good choice for the errno parameter of an OSError
> to be thrown.

The natural errno value would be EINVAL, which is returned whenever a
system call is invoked with an illegal argument.


Marko


Re: Why exception from os.path.exists()?

2018-06-13 Thread Peter J. Holzer
On 2018-06-11 14:23:42 +0300, Marko Rauhamaa wrote:
> "Peter J. Holzer" :
> > On 2018-06-11 01:06:37 +, Steven D'Aprano wrote:
> >> Baking a limitation of some file systems into the high-level
> >> interface is simply a *bad idea*.
> >
> > We aren't talking about a high-level interface here.
> 
> Call it high-level or not, we *are* talking about an interface
> ("os.path") whose whole raison d'être is abstracting OS specifics from
> basic pathname processing.

I can understand that Steven missed the fact that we were talking about
os.stat, not os.path. But how could you miss that? It was you who
brought up os.stat and I was just replying to your message. Please try
to keep track of what you are talking about, otherwise discussions will
not be very productive.


> > You are barking up the wrong tree here.
> 
> I believe the "wrong tree" would want a ValueError to be raised in this
> situation.

Nope. The wrong tree is os.path, when we are talking about os.stat.

hp

-- 
   _  | Peter J. Holzer| we build much bigger, better disasters now
|_|_) || because we have much more sophisticated
| |   | h...@hjp.at | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson 




Re: Why exception from os.path.exists()?

2018-06-13 Thread Peter J. Holzer
On 2018-06-13 10:10:03 +0300, Marko Rauhamaa wrote:
> "Peter J. Holzer" :
> > On 2018-06-11 12:24:54 +, Steven D'Aprano wrote:
> >> It also clearly states:
> >> 
> >> All functions in this module raise OSError in the case of
> >> invalid or inaccessible file names and paths, or other
> >> arguments that have the correct type, but are not accepted
> >> by the operating system. 
> >> 
> >> You know... like strings with NUL in them.
> 
> Nice catch!
> 
> > Ok. I missed that. So either the documentation or the implementation
> > should be fixed. 
> >
> > In any case, if the implementation is changed, I still think that
> > OSError(ENOENT) is wrong. It would have to be OSError(None, "embedded
> > null byte"), or, if that is not possible (I haven't checked)
> > OSError(EINVAL, "embedded null byte"), although that is slightly
> > misleading (it implies that the OS returned EINVAL, which it didn't).
> 
> You say "misleading", I say "abstracting".

If I get an error message which leads me on a wild goose chase, I call
that misleading when I'm in a good mood. If I'm feeling cranky, I call
it "lying".


> > The same check for NUL is also in other functions (e.g. open()), so
> > those would have to be changed as well.
> 
> Maybe.

Consistency is a virtue.


> > I wasn't entirely clear here. What I meant is that POSIX systems, as a
> > group, provide no such way.
> 
> I still don't see how POSIX is directly relevant here.

POSIX systems (or more specifically, systems where the Python
implementation uses a POSIX-conforming API to access the file system)
are relevant here because on such systems the Python implementation
needs to treat filenames with an embedded NUL specially.

The reasons have been mentioned several times in this thread, but to
recap:

1) The API uses nul-terminated byte strings for file names. 
2) Python may also use byte strings for file names, but they are not
   nul-terminated (they may contain nuls)
3) Simply passing a pointer to the start of a python byte string to the
   OS seems to work, and is therefore tempting.
4) But this would mean the OS gets a different file name than the
   application passed to it if the name contains NUL, which can lead to
   security holes (this isn't theoretical, it has happened)
5) Therefore an implementation must not succumb to the temptation in point
   3 and must explicitly check for NULs.
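
Points 1 to 5 above can be illustrated with a minimal Python sketch. It is
not CPython's actual code (the real check is done in C);
as_c_api_sees_it and checked_fsname are hypothetical names chosen for this
example, and the sketch only assumes that the OS API stops reading at the
first NUL byte.

```python
def as_c_api_sees_it(name: bytes) -> bytes:
    """Return the part of `name` that a NUL-terminated C API would read."""
    # A C function stops at the first NUL byte, so the rest is silently lost.
    return name.split(b"\0", 1)[0]


def checked_fsname(name: bytes) -> bytes:
    """Refuse names that cannot be passed to the OS faithfully (point 5)."""
    if b"\0" in name:
        raise ValueError("embedded null byte")
    return name


requested = b"report.txt\0.exe"
# The application asked for one name, but the OS would see a different one.
print(as_c_api_sees_it(requested))  # prints b'report.txt'
```

The explicit check turns the silent truncation of point 4 into an error
that the caller can see.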

A theoretical Python implementation on MacOS using the Carbon API
wouldn't have to do this (and in fact it shouldn't). This is
system-dependent code ensuring that the OS API is called correctly.

For os.stat() POSIX is further relevant because stat() is a POSIX
function. On POSIX systems, os.stat() is just a very thin wrapper around
the syscall. On other systems, POSIX stat is basically emulated by
invoking other system calls.

A user on a POSIX system should therefore expect the result of os.stat()
to be the same as that of the stat() system call (i.e. if successful the
fields should have the same values and if not, the exception should
reflect the errno returned by the OS). On other systems a user can only
expect a rough correspondence between what the actual system call
returned and what os.stat() returns, because there may not be a simple
1:1 mapping.

POSIX specifies a number of error codes which can be returned by stat():

[EACCES]
Search permission is denied for a component of the path prefix.
[EIO]
An error occurred while reading from the file system.
[ELOOP]
A loop exists in symbolic links encountered during resolution of the
path argument.
[ENAMETOOLONG]
The length of a component of a pathname is longer than {NAME_MAX}.
[ENOENT]
A component of path does not name an existing file or path is an
empty string.
[ENOTDIR]
A component of the path prefix names an existing file that is
neither a directory nor a symbolic link to a directory, or the path
argument contains at least one non-<slash> character and ends with
one or more trailing <slash> characters and the last pathname
component names an existing file that is neither a directory nor a
symbolic link to a directory.
[EOVERFLOW]
The file size in bytes or the number of blocks allocated to the file
or the file serial number cannot be represented correctly in the
structure pointed to by buf. 

A Python application may want to treat these errors differently. Even if
the application doesn't, the user reading the stack trace will want to
see the correct errno and not some generic "something went wrong"
message.
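
As a rough sketch of what "treating these errors differently" could look
like in an application: describe_stat_failure and its labels are made up
for this example, and the grouping of errno values is only one possible
choice.

```python
import errno
import os
import tempfile


def describe_stat_failure(path):
    """Return a short label for why os.stat(path) failed, or None on success."""
    try:
        os.stat(path)
    except OSError as exc:
        if exc.errno == errno.ENOENT:
            return "missing"
        if exc.errno == errno.EACCES:
            return "permission denied"
        if exc.errno in (errno.ELOOP, errno.ENAMETOOLONG, errno.ENOTDIR):
            return "bad path"
        # Anything else (EIO, EOVERFLOW, ...) is reported with the OS message.
        return "other: " + os.strerror(exc.errno)
    return None


with tempfile.TemporaryDirectory() as tmp:
    print(describe_stat_failure(tmp))
    print(describe_stat_failure(os.path.join(tmp, "no-such-file")))
```

This only works if the errno in the exception is the one the OS really
returned, which is the point being made here.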

Note that none of these covers "file name contains an illegal character"
for the simple reason that on POSIX systems there are no illegal
characters. 

So none of these is a good choice for the errno parameter of an OSError
to be thrown. One might try to find out what Linux returns on a
filesystem which doesn't allow some characters, but that would be Linux
specific and probably even file system specific, so not a good choice
for a situation 

Re: Why exception from os.path.exists()?

2018-06-13 Thread Steven D'Aprano
On Wed, 13 Jun 2018 10:10:03 +0300, Marko Rauhamaa wrote:

> "Peter J. Holzer" :
[...]
>> I wasn't entirely clear here. What I meant is that POSIX systems, as a
>> group, provide no such way.
> 
> I still don't see how POSIX is directly relevant here.

Linux users like to sneer at Windows users for believing that Windows is 
a synonym for "computer", that what Windows does is what all computers 
do, but Linux users (especially if they're also C programmers) sometimes 
have a hard time remembering that "what POSIX does" is no more a 
universal limitation on computing than "what Windows does".

(And ironically, Linux doesn't even have POSIX certification.)

That is, when they're not blindly writing shell scripts using bashisms 
and expecting them to work under any shell :-)


I still would like to see one real-world use-case where the distinction 
between "file name is invalid because it has NUL" and "file name is 
invalid for any of a dozen other reasons" is necessary and important.


-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Why exception from os.path.exists()?

2018-06-13 Thread Marko Rauhamaa
"Peter J. Holzer" :

> On 2018-06-11 12:24:54 +, Steven D'Aprano wrote:
>> It also clearly states:
>> 
>> All functions in this module raise OSError in the case of
>> invalid or inaccessible file names and paths, or other
>> arguments that have the correct type, but are not accepted
>> by the operating system. 
>> 
>> You know... like strings with NUL in them.

Nice catch!

> Ok. I missed that. So either the documentation or the implementation
> should be fixed. 
>
> In any case, if the implementation is changed, I still think that
> OSError(ENOENT) is wrong. It would have to be OSError(None, "embedded
> null byte"), or, if that is not possible (I haven't checked)
> OSError(EINVAL, "embedded null byte"), although that is slightly
> misleading (it implies that the OS returned EINVAL, which it didn't).

You say "misleading", I say "abstracting".

> The same check for NUL is also in other functions (e.g. open()), so
> those would have to be changed as well.

Maybe.

> I wasn't entirely clear here. What I meant is that POSIX systems, as a
> group, provide no such way.

I still don't see how POSIX is directly relevant here.


Marko


Re: Why exception from os.path.exists()?

2018-06-13 Thread Peter J. Holzer
On 2018-06-11 12:24:54 +, Steven D'Aprano wrote:
> On Mon, 11 Jun 2018 12:31:09 +0200, Peter J. Holzer wrote:
> > On 2018-06-11 01:06:37 +, Steven D'Aprano wrote:
> >> On Sun, 10 Jun 2018 23:57:35 +0200, Peter J. Holzer wrote:
> > 
> > [Note: I was talking about os.stat here, not os.path.exists. I agree
> > that os.path.exists (and the other boolean functions) should simply
> > return false]
> 
> o_O
> 
> Well... I don't know what to say. In a thread about os.path.exists, 
> you're talking about os.stat. I see. I don't understand why,

Easy to explain. Marko wrote:

| It may even be that the fix needs to go to os.stat(). That's for the
| Python gods to decide.

And to this I replied:

| I'm not a Python god, but I don't think os.stat() should be changed.
[and then explained why]

I do take some care to quote only what is necessary for context and
phrase my replies in a way which makes it clear what I'm replying to.
Apparently I wasn't successful in this case (even though the message is
relatively short). If you have suggestions on how I might have been
clearer, I'm all ears.


> but if you want to derail the thread to discuss something else,

I wasn't derailing the thread, I was replying to a specific suggestion.

> okay, I'll play along.
> 
> 
> [...]
> > We are talking about platform-specific code here.
> 
> Are we? I thought we were talking about Python and the os module. The 
> very first paragraph of the documentation for os says:
> 
> This module provides a portable way of using operating
> system dependent functionality.

Note the magic words "system dependent functionality". The way you call
os.stat() is the same on all systems, but what it actually does and what
it returns depends on the OS. For example, the description of class
os.stat_result says:

|  st_ino
|
|Platform dependent, but if non-zero, uniquely identifies the file
|for a given value of st_dev.
[...]
| On some Unix systems (such as Linux), the following attributes may also
| be available:
[...]
| On other Unix systems (such as FreeBSD), the following attributes may be
| available (but may be only filled out if root tries to use them):
[...]
| On Mac OS systems, the following attributes may also be available:
[...]
| On Windows systems, the following attribute is also available:

So, quite some variation dependent on the system type.
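
A portable program therefore has to probe for these attributes rather than
assume them. A small sketch: optional_stat_fields is a hypothetical helper,
and the attribute names are a sample taken from the os.stat_result
documentation, not a complete list.

```python
import os

# A sample of attributes documented as available only on some platforms.
OPTIONAL_FIELDS = ("st_blocks", "st_blksize", "st_flags",
                   "st_birthtime", "st_file_attributes")


def optional_stat_fields(path):
    """Map each optional attribute name to its value, or None if absent here."""
    st = os.stat(path)
    return {name: getattr(st, name, None) for name in OPTIONAL_FIELDS}


# Which entries are None depends on the system this runs on.
print(optional_stat_fields(os.curdir))
```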

> It also clearly states:
> 
> All functions in this module raise OSError in the case of
> invalid or inaccessible file names and paths, or other
> arguments that have the correct type, but are not accepted
> by the operating system. 
> 
> You know... like strings with NUL in them.

Ok. I missed that. So either the documentation or the implementation
should be fixed. 

In any case, if the implementation is changed, I still think that
OSError(ENOENT) is wrong. It would have to be OSError(None, "embedded
null byte"), or, if that is not possible (I haven't checked)
OSError(EINVAL, "embedded null byte"), although that is slightly
misleading (it implies that the OS returned EINVAL, which it didn't).
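
For reference, a quick sketch of how the three variants behave when
constructed by hand in Python 3. It only builds the exception objects and
says nothing about what os.stat() itself should raise.

```python
import errno

no_errno = OSError(None, "embedded null byte")
with_einval = OSError(errno.EINVAL, "embedded null byte")
with_enoent = OSError(errno.ENOENT, "embedded null byte")

# No errno at all: nothing suggests that the OS reported this error.
print(no_errno.errno, no_errno.strerror)
# EINVAL is carried exactly as if the OS had returned it.
print(with_einval.errno == errno.EINVAL)
# ENOENT is mapped to the FileNotFoundError subclass by the constructor.
print(type(with_enoent).__name__)
```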

The same check for NUL is also in other functions (e.g. open()), so
those would have to be changed as well.


> > On POSIX systems, there IS NO WAY to pass a filename with an
> > embedded NUL byte to the OS.
> 
> Mac OS X is certified as POSIX-compliant. As I pointed out in a previous 
> email, OS X also provides APIs that are perfectly capable of dealing with 
> NULs in file names.

I wasn't entirely clear here. What I meant is that POSIX systems, as a
group, provide no such way. Some specific systems may have such an API,
but it isn't in POSIX and it is not even likely to be the same on different
systems, so it cannot be used portably across POSIX systems or even the
subset of systems which allow NUL in filenames (unless that subset
happens to contain only one OS).


> POSIX specifies a *minimum* set of functionality, not a maximum.

But when writing an application for POSIX systems you are interested in
that minimum set of functionality. Plus maybe some common extensions.

Oh, and from the test results Gregory Ewing posted, it looks like Python
does use the POSIX API (and not the Carbon API) on MacOS. So you won't
ever see a filename containing NUL in a Python program on MacOS and you
won't be able to create one.

hp

-- 
   _  | Peter J. Holzer| we build much bigger, better disasters now
|_|_) || because we have much more sophisticated
| |   | h...@hjp.at | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson 




Re: Why exception from os.path.exists()?

2018-06-11 Thread Antoon Pardon
On 11-06-18 13:59, Steven D'Aprano wrote:
> On Mon, 11 Jun 2018 09:55:06 +0200, Antoon Pardon wrote:
>
>> On 11-06-18 02:28, Steven D'Aprano wrote:
> [...]
>>> open(foo) raises an exception if foo doesn't exist;
>>>
>>> os.path.exists(foo) returns False if foo doesn't exist.
>> That is not correct. The path can exist and os.path.exists still return
>> False.
> It is correct. I made no claim about what happens if foo exists. I said 
> only that if it doesn't exist, the function returns False.
>
> If you're going to be pedantic, be *right*. Being pedantically wrong is 
> just sad.

I can live with that.

-- 
Antoon.


Re: Why exception from os.path.exists()?

2018-06-11 Thread Steven D'Aprano
On Mon, 11 Jun 2018 12:31:09 +0200, Peter J. Holzer wrote:

> On 2018-06-11 01:06:37 +, Steven D'Aprano wrote:
>> On Sun, 10 Jun 2018 23:57:35 +0200, Peter J. Holzer wrote:
> 
> [Note: I was talking about os.stat here, not os.path.exists. I agree
> that os.path.exists (and the other boolean functions) should simply
> return false]

o_O

Well... I don't know what to say. In a thread about os.path.exists, 
you're talking about os.stat. I see. I don't understand why, but if you 
want to derail the thread to discuss something else, okay, I'll play 
along.


[...]
> We are talking about platform-specific code here.

Are we? I thought we were talking about Python and the os module. The 
very first paragraph of the documentation for os says:

This module provides a portable way of using operating
system dependent functionality.

https://docs.python.org/3/library/os.html


It also clearly states:

All functions in this module raise OSError in the case of
invalid or inaccessible file names and paths, or other
arguments that have the correct type, but are not accepted
by the operating system. 

You know... like strings with NUL in them.



> On POSIX systems, there IS NO WAY to pass a
> filename with an embedded NUL byte to the OS.

Mac OS X is certified as POSIX-compliant. As I pointed out in a previous 
email, OS X also provides APIs that are perfectly capable of dealing with 
NULs in file names.

POSIX specifies a *minimum* set of functionality, not a maximum.



> On such systems Python
> MUST NOT simply pass a pointer to the start of the (utf-8 encoded)
> string to the OS, it must take special action. It could fake an ENOENT
> error, but that would be confusing in many situations. Therefore it
> should raise an exception which cannot be confused with an error
> returned from the OS.

I agree that for os.stat, which already raises on error, raising for 
invalid file names is the right thing to do. As stated in the 
documentation, the right exception is an OSError, which matches the 
current behaviour for other impossible and invalid file names:


py> os.stat('')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: ''



[...]
>> Baking a limitation of some file systems into the high-level interface
>> is simply a *bad idea*.
> 
> We aren't talking about a high-level interface here.

Yes we are.




-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Why exception from os.path.exists()?

2018-06-11 Thread Steven D'Aprano
On Mon, 11 Jun 2018 09:55:06 +0200, Antoon Pardon wrote:

> On 11-06-18 02:28, Steven D'Aprano wrote:
[...]
>> open(foo) raises an exception if foo doesn't exist;
>>
>> os.path.exists(foo) returns False if foo doesn't exist.
> 
> That is not correct. The path can exist and os.path.exists still return
> False.

It is correct. I made no claim about what happens if foo exists. I said 
only that if it doesn't exist, the function returns False.

If you're going to be pedantic, be *right*. Being pedantically wrong is 
just sad.




-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Why exception from os.path.exists()?

2018-06-11 Thread Marko Rauhamaa
"Peter J. Holzer" :

> On 2018-06-11 01:06:37 +, Steven D'Aprano wrote:
>> Baking a limitation of some file systems into the high-level
>> interface is simply a *bad idea*.
>
> We aren't talking about a high-level interface here.

Call it high-level or not, we *are* talking about an interface
("os.path") whose whole raison d'être is abstracting OS specifics from
basic pathname processing.

> We are talking about low-level code which is just above the OS. THAT
> code MUST make sure that it calls the OS API with meaningful
> parameters or not at all. And it should raise an Exception in the
> latter case. And that exception should not be misleading.

I respectfully disagree. You are breaking the illusion os.path seeks to
provide. What you are saying, essentially, is that os.path should not
exist.

Since it does exist and can't be wished away *and* since the trap we are
talking about ensnares well-meaning developers, there should be a
*practical* defense against accidents, which could be serious.

>> How would you feel if there were a whole lot of ignorant Pascal
>> programmers arguing that it was fundamentally impossible for file
>> names to exceed 255 characters, and therefore os.path.exists() ought to
>> raise ValueError when passed a file name of 256 characters?
>
> You are barking up the wrong tree here.

I believe the "wrong tree" would want a ValueError to be raised in this
situation.

Funny thing is, nothing in the os.path.exists() API suggests what kinds
of illegal value result in a ValueError and what result in a False. The
application developer has no way of guaranteeing a ValueError won't take
place. It should normally be possible for an application to avoid a
ValueError (or TypeError) if it so chose.
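
A practical defense of the kind meant here might look like the following
sketch. exists_no_raise is a hypothetical wrapper, not part of os.path.
As far as I know, Python 3.8 and newer already return False from
os.path.exists() for names with an embedded NUL, so the ValueError branch
mainly matters on older versions.

```python
import os


def exists_no_raise(path):
    """Like os.path.exists(), but never lets ValueError or TypeError escape."""
    try:
        return os.path.exists(path)
    except (ValueError, TypeError):
        # The name could not even be handed to the OS, so it cannot exist.
        return False


print(exists_no_raise(os.curdir))
print(exists_no_raise("spam\0ham"))
```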


Marko


Re: Why exception from os.path.exists()?

2018-06-11 Thread Peter J. Holzer
On 2018-06-11 01:06:37 +, Steven D'Aprano wrote:
> On Sun, 10 Jun 2018 23:57:35 +0200, Peter J. Holzer wrote:

[Note: I was talking about os.stat here, not os.path.exists. I agree
that os.path.exists (and the other boolean functions) should simply
return false]

> > I think this is worth keeping, and "I couldn't pass that file name to
> > the OS" is a different error than "the OS told me the file doesn't
> > exist", so I think it should be a different exception.
> 
> What makes you think that NUL bytes are a fundamental limitation that no 
> OS could ever cope with?

What makes you think that I think that? We are talking about
platform-specific code here. On POSIX systems, there IS NO WAY to pass a
filename with an embedded NUL byte to the OS. On such systems Python
MUST NOT simply pass a pointer to the start of the (utf-8 encoded)
string to the OS, it must take special action. It could fake an ENOENT
error, but that would be confusing in many situations. Therefore it
should raise an exception which cannot be confused with an error
returned from the OS.


> Classic Mac OS takes file names as Pascal strings, with a length byte and 
> an array of arbitrary bytes, no NUL terminator required.

On such a system os.stat would have to check that filename is less than
256 bytes and raise an Exception otherwise. In this case it is even more
obvious, because there is no Python structure which it can simply pass
to the OS.
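
A minimal sketch of that check, assuming a one-byte length prefix as in a
Pascal string. to_pascal_string is a hypothetical helper; it does not model
any real Mac OS API.

```python
def to_pascal_string(name: bytes) -> bytes:
    """Encode `name` as a length byte followed by the raw bytes."""
    if len(name) > 255:
        # One length byte cannot describe more than 255 bytes of data.
        raise ValueError("file name longer than 255 bytes")
    return bytes([len(name)]) + name


# NUL is just another byte here, because no terminator is needed.
print(to_pascal_string(b"a\0b"))  # prints b'\x03a\x00b'
```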

> Baking a limitation of some file systems into the high-level interface is 
> simply a *bad idea*.

We aren't talking about a high-level interface here. We are talking about
low-level code which is just above the OS. THAT code MUST make sure that
it calls the OS API with meaningful parameters or not at all. And it
should raise an Exception in the latter case. And that exception should
not be misleading. 

> How would you feel if there were a whole lot of ignorant Pascal 
> programmers arguing that it was fundamentally impossible for file names 
> to exceed 255 characters, and therefore os.path.exists() ought to raise 
> ValueError when passed a file name of 256 characters?

You are barking up the wrong tree here.

hp

-- 
   _  | Peter J. Holzer| we build much bigger, better disasters now
|_|_) || because we have much more sophisticated
| |   | h...@hjp.at | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson 




Re: Why exception from os.path.exists()?

2018-06-11 Thread Antoon Pardon
On 11-06-18 10:35, Marko Rauhamaa wrote:
> Antoon Pardon :
>> On 11-06-18 02:28, Steven D'Aprano wrote:
>>> The *whole point* of o.p.exists is to return False, not raise an
>>> exception.
>> And the price is that it will not always give the correct answer.
> Yes, but that's still the point of the function's existence.

I find it very strange that the point of a function is to sometimes give
incorrect answers. I find it an annoying misnomer.

-- 
Antoon.



Re: Why exception from os.path.exists()?

2018-06-11 Thread Marko Rauhamaa
Antoon Pardon :
> On 11-06-18 02:28, Steven D'Aprano wrote:
>> The *whole point* of o.p.exists is to return False, not raise an
>> exception.
>
> And the price is that it will not always give the correct answer.

Yes, but that's still the point of the function's existence.


Marko


Re: Why exception from os.path.exists()?

2018-06-11 Thread Marko Rauhamaa
Barry Scott :
> I think the rule is, if python can pass the string faithfully to the
> OS, then do so, otherwise raise an exception that tells the programmer
> that they are doing something that the OS does not allow for.

Sure, but few application programmers would think of dealing with the
surprising ValueError. If they did, they wouldn't think os.path.exists()
had any usefulness. Why write:

    try:
        ex = os.path.exists(path)
    except ValueError:
        ex = False
    if not ex:
        ...

instead of:

    try:
        os.stat(path)
    except (OSError, ValueError):
        ...


Marko


Re: Why exception from os.path.exists()?

2018-06-11 Thread Barry Scott



> On 11 Jun 2018, at 01:03, Chris Angelico  wrote:
> 
> On Mon, Jun 11, 2018 at 9:52 AM, Steven D'Aprano
>  wrote:
>> On Mon, 11 Jun 2018 06:10:26 +1000, Chris Angelico wrote:
>> 
>>> Can you try creating "spam:ham" and "spam/ham"? If they're both legal,
>>> I'd like to see what their file names are represented as.
>> 
>> The Finder could very easily be substituting another character, like
>> Konqueror (the KDE 3 file manager) does. In Konqueror, you can create a
>> file named "spam/ham" and it quietly substitutes "spam%2fham" instead.
>> But Konqueror's GUI treats it completely transparently: it is displayed
>> as a slash, and if you copy the file name from the GUI you get a slash.
> 
> Speculation is all very well, but I was wanting to see what actually
> happened. :)

As interesting as it is to see the way applications transform user input into
filenames, it does not affect the API that python presents.

Barry



Re: Why exception from os.path.exists()?

2018-06-11 Thread Barry Scott



> On 11 Jun 2018, at 01:28, Steven D'Aprano 
>  wrote:
> 
> On Sun, 10 Jun 2018 22:09:39 +0100, Barry Scott wrote:
> 
>> Singling out os.path.exists as a special case I do think is reasonable.
>> All functions that take paths need to have a consistent response to data
> 
> The *mere existence* of os.path.exists means that there is not a 
> consistent response to file names:
> 
>open(foo) raises an exception if foo doesn't exist;
> 
>os.path.exists(foo) returns False if foo doesn't exist.
> 
> There is no requirement that different functions do the same thing with 
> the same bad input. The *whole point* of o.p.exists is to return False, 
> not raise an exception.

I meant that if you cannot call the OS function, then always deal with
that error the same way, which python does.

How the result of the OS call is reported to the user is another matter.
returning a bool is fine for a predicate and raising an exception is fine for 
open etc.

> 
> 
>> that is impossible to pass to the OS.
> 
> Even if it were true that file names cannot contain certain characters, 
> and it is not, why is that a distinction that anyone gives a shit about?
> 
> I do not expect that there are more than a handful of use-cases for 
> distinguishing "file names which cannot be passed to the OS" versus "any 
> other illegal file name". And even that is generous.
> 
> Besides, it is certainly not true that there are no OSes that can deal 
> with NULs in file names. Classic Mac OS can, as filenames there are 
> represented as Pascal strings (a length byte followed by an array of 
> arbitrary bytes), not NUL-terminated C strings.

Any OS that can take NUL would have the string passed with the NUL by python,
and it would then be left up to the OS to figure out if the NUL is valid.

I think the rule is, if python can pass the string faithfully to the OS, then
do so, otherwise raise an exception that tells the programmer that they are
doing something that the OS does not allow for.

Barry



Re: Why exception from os.path.exists()?

2018-06-11 Thread Antoon Pardon
On 11-06-18 02:28, Steven D'Aprano wrote:
> On Sun, 10 Jun 2018 22:09:39 +0100, Barry Scott wrote:
>
>> Singling out os.path.exists as a special case I do think is reasonable.
>> All functions that take paths need to have a consistent response to data
> The *mere existence* of os.path.exists means that there is not a 
> consistent response to file names:
>
> open(foo) raises an exception if foo doesn't exist;
>
> os.path.exists(foo) returns False if foo doesn't exist.

That is not correct. The path can exist and os.path.exists still return False.

> There is no requirement that different functions do the same thing with 
> the same bad input. The *whole point* of o.p.exists is to return False, 
> not raise an exception.

And the price is that it will not always give the correct answer.

-- 
Antoon Pardon.




Re: Why exception from os.path.exists()?

2018-06-11 Thread Gregory Ewing

Steven D'Aprano wrote:
> The evidence suggests that using the Carbon APIs, NUL is just another 
> Unicode character. Whatever API replaces Carbon, it will have to deal 
> with file names created under Carbon, and classic Mac, and so likely will 
> support the same.


This raises the interesting question of what happens if you
use Carbon to create a file name containing a null, and then
try to access it through the BSD API.

Does the file become inaccessible? Does the NUL get turned
into some kind of escape sequence?

--
Greg


Re: Why exception from os.path.exists()?

2018-06-11 Thread Gregory Ewing

Steven D'Aprano wrote:
> Besides, it is certainly not true that there are no OSes that can deal 
> with NULs in file names. Classic Mac OS can, as filenames there are 
> represented as Pascal strings (a length byte followed by an array of 
> arbitrary bytes), not NUL-terminated C strings.


There's even a way you could potentially tell the classic
Mac file manager API to create a filename with a colon in
it, by using the (volume reference number, directory id,
filename) way of specifying a file.

I don't know whether it would have succeeded, though --
I suspect it would have rejected such a file name as
invalid.

--
Greg


Re: Why exception from os.path.exists()?

2018-06-11 Thread Gregory Ewing

Steven D'Aprano wrote:
> Hmmm... you know I might just be able to do that. Write a file to a 
> floppy, then mount it under Linux.


That still might not tell you much. The Linux system will need
a filesystem driver that understands the Mac HFS file system,
which is what your classic Mac system will be writing. And that
driver will have its own way of handling file names with
slashes in them, probably by substituting something else.
So you still won't know what's actually stored on the disk.

The only way to be really sure would be to make a dump of
the raw disk contents and go looking for the file name in it.


> (The march of technology is sometimes a nuisance.)


BTW, another problem there would be if your Mac was old
enough, it would be using a variable speed floppy drive,
which I doubt any PC-based system would be able to cope
with...

--
Greg


Re: Why exception from os.path.exists()?

2018-06-11 Thread Gregory Ewing

Chris Angelico wrote:

> I'd like to find out about that. If it doesn't work, it'll be easily
> provable that it can't be done.


Using the shell:

% touch colonic:name
% ls
colonic:name
% touch slashy/name
touch: slashy/name: No such file or directory

(It's trying to create a file in a directory called "slashy",
which doesn't exist.)

Using Python:

>>> open("colonic:name", "w").close()
>>> os.listdir(".")
['colonic:name']
>>> open("slashy/name", "w").close()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: 'slashy/name'

(Same reason as before.)

Using the GUI: I tried to use TextEdit to save a file with
a colon in the name. When I typed ":" into the filename box,
it substituted "-".

I was able to type "slashy/textfile" into the filename box
and save. In the shell it shows as:

% ls
slashy:textfile.rtf

Is that proof enough for you?

--
Greg


NUL in file names verified [was Re: Why exception from os.path.exists()?]

2018-06-10 Thread Steven D'Aprano
Straight from the horse's mouth, Apple's HFS Plus volumes do indeed
support NULs in file names. Quote:


Indirect node files exist in a special directory called the
metadata directory. This directory exists in the volume's root
directory. The name of the metadata directory is four null
characters followed by the string "HFS+ Private Data".


and:

The case-insensitive Unicode string comparison used by
HFS Plus and case-insensitive HFSX sorts null characters
after all other characters, so the metadata directory
will typically be the last item in the root directory.
On case-sensitive HFSX volumes, null characters sort
before other characters, so the metadata directory will
typically be the first item in the root directory.


https://developer.apple.com/library/archive/technotes/tn/tn1150.html#HFSPlusNames
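
For what it's worth, here is a rough sketch of what CPython does when handed that very name. It assumes nothing about HFS Plus itself; it only builds the string described in the technote and passes it to os.lstat(). The helper try_lstat is invented for this illustration:

```python
import os

# The name as described in TN1150: four NULs followed by "HFS+ Private Data".
METADATA_DIR_NAME = "\0" * 4 + "HFS+ Private Data"


def try_lstat(name):
    """Return the name of the exception os.lstat() raises, or None on success."""
    try:
        os.lstat(name)
    except (ValueError, OSError) as exc:
        return type(exc).__name__
    return None


print(len(METADATA_DIR_NAME), try_lstat(METADATA_DIR_NAME))
```

So even on a volume where that directory exists, the name never reaches the OS through this route: Python rejects it before making the system call.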




-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Why exception from os.path.exists()?

2018-06-10 Thread Chris Angelico
On Mon, Jun 11, 2018 at 11:06 AM, Steven D'Aprano wrote:
> On Sun, 10 Jun 2018 23:57:35 +0200, Peter J. Holzer wrote:
>
>> I think this is worth keeping, and "I couldn't pass that file name to
>> the OS" is a different error than "the OS told me the file doesn't
>> exist", so I think it should be a different exception.
>
> What makes you think that NUL bytes are a fundamental limitation that no
> OS could ever cope with?

I didn't say that. If you have an OS that can't handle more than 255
bytes of file name, it's allowed to raise ValueError just the same.

ChrisA


Re: Why exception from os.path.exists()?

2018-06-10 Thread Steven D'Aprano
On Sun, 10 Jun 2018 23:57:35 +0200, Peter J. Holzer wrote:

> I think this is worth keeping, and "I couldn't pass that file name to
> the OS" is a different error than "the OS told me the file doesn't
> exist", so I think it should be a different exception.

What makes you think that NUL bytes are a fundamental limitation that no 
OS could ever cope with?

Classic Mac OS takes file names as Pascal strings, with a length byte and 
an array of arbitrary bytes, no NUL terminator required. Despite what far 
too many C programmers appear to believe, NUL-terminated strings are not 
a fundamental requirement.
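
To make the difference concrete, here is a small sketch of the two representations. It is an illustration only, not Mac OS code; the function names are invented, and ctypes is used merely to show what a C callee reading up to the first NUL would see:

```python
import ctypes


def pascal_string(name: bytes) -> bytes:
    """Encode as a length-prefixed string: one length byte, then the data."""
    if len(name) > 255:
        raise ValueError("a one-byte length prefix cannot describe more than 255 bytes")
    return bytes([len(name)]) + name


def as_seen_through_c_string(name: bytes) -> bytes:
    """Return what code that stops at the first NUL would read."""
    return ctypes.create_string_buffer(name).value


name = b"spam\0ham"
print(pascal_string(name))             # the length byte says 8, NUL is just data
print(as_seen_through_c_string(name))  # everything after the NUL is lost
```

The length-prefixed form carries the NUL without trouble but has its own limit (255 bytes here), which is the trade-off discussed below.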

Navigating Apple's documentation is a nightmare, but I've found the 
deprecated Carbon file manager APIs. For example, creating a file with 
PBCreateFileUnicodeSync:

https://developer.apple.com/documentation/coreservices/1566896-pbcreatefileunicodesync?language=objc

takes a FSRefParam argument:

https://developer.apple.com/documentation/coreservices/fsrefparam?language=objc

which includes a name field which is a pointer to an array of Unicode 
characters, and a separate name length.

The evidence suggests that using the Carbon APIs, NUL is just another 
Unicode character. Whatever API replaces Carbon, it will have to deal 
with file names created under Carbon, and classic Mac, and so likely will 
support the same.

Baking a limitation of some file systems into the high-level interface is 
simply a *bad idea*. There is no good reason to treat file names 
containing NUL as special in the API (even if, for implementation 
reasons, it has to be treated specially in the implementation).

How would you feel if there were a whole lot of ignorant Pascal 
programmers arguing that it was fundamentally impossible for file names 
to exceed 255 characters, and therefore os.path.exists() ought to raise 
ValueError when passed a file name of 256 characters?

"But it is impossible to pass a string of 256 characters to the OS" is no 
more foolish than "it is impossible to pass a string with an embedded NUL 
to the OS". Both are implementation details. Neither should be baked into 
the high-level language as a fundamental requirement.
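
The asymmetry can be observed directly. This is a sketch under the assumption of a POSIX system where a 5000-character component is longer than the file system accepts; stat_error is a made-up helper:

```python
import os


def stat_error(name):
    """Return the exception raised by os.stat(name), or None if it succeeds."""
    try:
        os.stat(name)
    except (OSError, ValueError) as exc:
        return exc
    return None


too_long = "x" * 5000
with_nul = "spam\0ham"

print(type(stat_error(too_long)).__name__, os.path.exists(too_long))
print(type(stat_error(with_nul)).__name__)
```

The over-long name is passed to the OS, which refuses it, so Python reports an OSError and os.path.exists() answers False. The name with the NUL is refused by Python itself with a ValueError.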



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Why exception from os.path.exists()?

2018-06-10 Thread Charles Hixson

On 06/07/2018 12:45 AM, Chris Angelico wrote:
> On Thu, Jun 7, 2018 at 1:55 PM, Steven D'Aprano wrote:
>> On Tue, 05 Jun 2018 23:27:16 +1000, Chris Angelico wrote:
>>
>>> And an ASCIIZ string cannot contain a byte value of zero. The parallel
>>> is exact.
>>
>> Why should we, as Python programmers, care one whit about ASCIIZ strings?
>> They're not relevant. You might as well say that file names cannot
>> contain the character "π" because ASCIIZ strings don't support it.
>>
>> No they don't, and yet nevertheless file names can and do contain
>> characters outside of the ASCIIZ range.
>
> Under Linux, a file name contains bytes, most commonly representing
> UTF-8 sequences. So... an ASCIIZ string *can* contain that character,
> or at least a representation of it. Yet it cannot contain "\0".
>
> ChrisA

This seems like an argument for allowing byte strings to be used as file 
names, not for altering text strings.  If file names are allowed to 
contain values that are illegal for text strings, then they shouldn't be 
necessarily considered as text strings.


The unicode group sets one set of rules, and their rules should apply in 
their area.  The Linux group sets another set of rules, and their rules 
should apply in their area.  Just because there is a large area of 
overlap doesn't mean that the two areas are congruent.  Byte strings are 
designed to handle any byte pattern, but text strings are designed to 
handle a subset of those patterns. Most byte strings are readable as 
text, but not all of them.
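
Python already allows this: the os functions accept bytes as well as str. The following sketch assumes a POSIX system; list_raw_names is an invented helper, and the sample name is arbitrary:

```python
import os


def list_raw_names(directory="."):
    """List directory entries as raw bytes, without decoding them to text."""
    return os.listdir(os.fsencode(directory))


# A name that is not valid UTF-8 still survives a round trip through str,
# because os.fsdecode()/os.fsencode() are designed to be inverses.
raw = b"caf\xe9.txt"
as_text = os.fsdecode(raw)
print(os.fsencode(as_text) == raw)
print(all(isinstance(name, bytes) for name in list_raw_names(".")))
```

None of this helps with NUL, though: a bytes path containing b"\0" is rejected in the same way as a str path containing "\0".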



Re: Why exception from os.path.exists()?

2018-06-10 Thread Steven D'Aprano
On Sun, 10 Jun 2018 22:09:39 +0100, Barry Scott wrote:

> Singling out os.path.exists as a special case I do think is reasonable.
> All functions that take paths need to have a consistent response to data

The *mere existence* of os.path.exists means that there is not a 
consistent response to file names:

open(foo) raises an exception if foo doesn't exist;

os.path.exists(foo) returns False if foo doesn't exist.

There is no requirement that different functions do the same thing with 
the same bad input. The *whole point* of o.p.exists is to return False, 
not raise an exception.
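
A wrapper with that behaviour is only a few lines. This is a sketch of the idea, not a proposed patch; the name quiet_exists is made up:

```python
import os


def quiet_exists(path):
    """Return True if os.stat(path) succeeds, False for any path-related failure.

    ValueError covers names that cannot be handed to the OS at all
    (for example, an embedded NUL); OSError covers everything the OS rejects.
    """
    try:
        os.stat(path)
    except (OSError, ValueError):
        return False
    return True


print(quiet_exists(os.getcwd()), quiet_exists("spam\0ham"))
```

A TypeError for a non-path argument such as None is deliberately left to propagate.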


> that is impossible to pass to the OS.

Even if it were true that file names cannot contain certain characters, 
and it is not, why is that a distinction that anyone gives a shit about?

I do not expect that there are more than a handful of use-cases for 
distinguishing "file names which cannot be passed to the OS" versus "any 
other illegal file name". And even that is generous.

Besides, it is certainly not true that there are no OSes that can deal 
with NULs in file names. Classic Mac OS can, as filenames there are 
represented as Pascal strings (a length byte followed by an array of 
arbitrary bytes), not NUL-terminated C strings.




-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Why exception from os.path.exists()?

2018-06-10 Thread Bev in TX
I accidentally did not send this to the list...

> On Jun 10, 2018, at 7:10 PM, Bev in TX  wrote:
> 
> 
>> On Jun 10, 2018, at 3:10 PM, Chris Angelico wrote:
>>> ...
>> 
>> Can you try creating "spam:ham" and "spam/ham"? If they're both legal,
>> I'd like to see what their file names are represented as.
>> 
> I dug around and found this very old article, in which it says:
> 
> "Another obvious problem is the different path separators between HFS+ 
> (colon, ':') and UFS (slash, '/'). This also means that HFS+ file names may 
> contain the slash character and not colons, while the opposite is true for 
> UFS file names. This was easy to address, though it involves transforming 
> strings back and forth. The HFS+ implementation in the kernel's VFS layer 
> converts colon to slash and vice versa when reading from and writing to the 
> on-disk format. So on disk the separator is a colon, but at the VFS layer 
> (and therefore anything above it and the kernel, such as libc) it's a slash. 
> However, the traditional Mac OS toolkits expect colons, so above the BSD 
> layer, the core Carbon toolkit does yet another translation. The result is 
> that Carbon applications see colons, and everyone else sees slashes. This can 
> create a user-visible schizophrenia in the rare cases of file names 
> containing colon characters, which appear to Carbon applications as slash 
> characters, but to BSD programs and Cocoa applications as colons.”
> 
> That was from, "USENIX 2000 Invited Talks Presentation” at:
> http://www.wsanchez.net/papers/USENIX_2000/ 
> 
> 
> 

Bev in TX






Re: Why exception from os.path.exists()?

2018-06-10 Thread Chris Angelico
On Mon, Jun 11, 2018 at 9:52 AM, Steven D'Aprano wrote:
> On Mon, 11 Jun 2018 06:10:26 +1000, Chris Angelico wrote:
>
>> Can you try creating "spam:ham" and "spam/ham"? If they're both legal,
>> I'd like to see what their file names are represented as.
>
> The Finder could very easily be substituting another character, like
> Konqueror (the KDE 3 file manager) does. In Konqueror, you can create a
> file named "spam/ham" and it quietly substitutes "spam%2fham" instead.
> But Konqueror's GUI treats it completely transparently: it is displayed
> as a slash, and if you copy the file name from the GUI you get a slash.

Speculation is all very well, but I was wanting to see what actually
happened. :)

ChrisA


Re: Why exception from os.path.exists()?

2018-06-10 Thread Steven D'Aprano
On Mon, 11 Jun 2018 06:10:26 +1000, Chris Angelico wrote:

> Can you try creating "spam:ham" and "spam/ham"? If they're both legal,
> I'd like to see what their file names are represented as.

The Finder could very easily be substituting another character, like 
Konqueror (the KDE 3 file manager) does. In Konqueror, you can create a 
file named "spam/ham" and it quietly substitutes "spam%2fham" instead. 
But Konqueror's GUI treats it completely transparently: it is displayed 
as a slash, and if you copy the file name from the GUI you get a slash.

I seem to recall Gnome doing something similar, except it quietly 
substitutes U+2044 FRACTION SLASH or U+2215 DIVISION SLASH instead.
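
The percent-escaping trick is easy to imitate. This sketch is not Konqueror's code, just the same idea expressed with urllib.parse; the function names are invented, and note that quote() emits upper-case hex digits ("%2F"), unlike the "%2f" mentioned above:

```python
from urllib.parse import quote, unquote


def to_stored_name(display_name):
    """Escape a display name so it contains no '/' (and no bare '%')."""
    return quote(display_name, safe="")


def to_display_name(stored_name):
    """Undo to_stored_name()."""
    return unquote(stored_name)


stored = to_stored_name("spam/ham")
print(stored, to_display_name(stored))
```

Escaping "%" as well is what keeps the mapping reversible: a display name that already contains "%2f" does not collide with one that contains a slash.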

To really be sure what is going on, you would have to bypass the Finder 
and any shell and write the file name using the OS X low-level API.

Or create the file using a classic Mac (system 8 or older), where slashes 
definitely are not treated as special. Not the Mac OS classic emulation 
layer.


Hmmm... you know I might just be able to do that. Write a file to a 
floppy, then mount it under Linux. If I had a Linux computer with a 
floppy disk drive.

(The march of technology is sometimes a nuisance.)

By the way, for some reason I don't seem to have received Bev's post.


-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Why exception from os.path.exists()?

2018-06-10 Thread Peter J. Holzer
On 2018-06-11 00:28:11 +0300, Marko Rauhamaa wrote:
> Barry Scott :
> > Singling out os.path.exists as a special case I do think is
> > reasonable.
> 
> I don't think anyone has proposed that. While I brought up
> os.path.exists() in my bug report, os.path.isfile(), os.path.isdir() etc
> should obviously be addressed simultaneously.

Yes.

> It may even be that the fix needs to go to os.stat(). That's for the
> Python gods to decide.

I'm not a Python god, but I don't think os.stat() should be changed.
That already throws different exceptions for different errors:

>>> os.stat("nix")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: 'nix'
>>> os.stat("/lost+found/foo")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
PermissionError: [Errno 13] Permission denied: '/lost+found/foo'
>>> os.stat("\0")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: embedded null byte

I think this is worth keeping, and "I couldn't pass that file name to
the OS" is a different error than "the OS told me the file doesn't
exist", so I think it should be a different exception.
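
A caller who cares about the distinction can keep it with a few except clauses. This is a sketch only; the function name classify and the returned labels are made up for illustration:

```python
import os


def classify(path):
    """Say why os.stat(path) failed, keeping the different causes apart."""
    try:
        os.stat(path)
    except FileNotFoundError:
        return "missing"
    except PermissionError:
        return "permission denied"
    except OSError:
        return "other OS error"
    except ValueError:
        # Raised by Python itself; the OS was never asked.
        return "could not be passed to the OS"
    return "exists"


for candidate in (os.getcwd(), "nix", "\0"):
    print(repr(candidate), "->", classify(candidate))
```

The order matters: FileNotFoundError and PermissionError are subclasses of OSError, so they have to be listed before it.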

hp

-- 
   _  | Peter J. Holzer| we build much bigger, better disasters now
|_|_) || because we have much more sophisticated
| |   | h...@hjp.at | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson 




Re: Why exception from os.path.exists()?

2018-06-10 Thread Marko Rauhamaa
Barry Scott :
> Singling out os.path.exists as a special case I do think is
> reasonable.

I don't think anyone has proposed that. While I brought up
os.path.exists() in my bug report, os.path.isfile(), os.path.isdir() etc
should obviously be addressed simultaneously. It may even be that the
fix needs to go to os.stat(). That's for the Python gods to decide.

> All functions that take paths need to have a consistent response to
> data that is impossible to pass to the OS.

Possibly.


Marko


Re: Why exception from os.path.exists()?

2018-06-10 Thread Barry Scott

> On 10 Jun 2018, at 21:10, Chris Angelico  wrote:
> 
> On Mon, Jun 11, 2018 at 12:45 AM, Bev in TX  wrote:
>>> * One with an embedded / in the file name
>> 
>> This is easily done in Finder, where I created a folder named "my/slash”.
>> When I list it at the command line in Terminal, this shows up as "my:slash”, 
>> with the slash shown as a colon.
>> If I create a file with a colon in its name at the command line, that file 
>> name acts the same way:
>> 
>> $ touch "my:colon"
>> $ ls
>> my:colon
>> my:slash
>> 
>> In Finder they both display as:
>> my/colon
>> my/slash
>> 
>> However, if you use Finder’s “Copy item as Pathname” option, then you will 
>> again see the colon.
>> 
>> /Users/bev/Training/myPython/pygroup/files/my:colon
>> /Users/bev/Training/myPython/pygroup/files/my:slash
>> 
>> But if you paste that folder’s name in Finder’s “Go to Folder” option, it 
>> converts it to the following, and goes to that folder:
>> 
>> /Users/bev/Training/myPython/pygroup/files/my/slash/slash
> 
> Can you try creating "spam:ham" and "spam/ham"? If they're both legal,
> I'd like to see what their file names are represented as.

On Classic Mac OS the folder separator was : not /. /usr/bin/ls would be 
usr:bin:ls for example.

It looks like a hangover from Classic that the macOS Finder maps between : and 
/ for presentation.
In bash you see the ":"; in Finder you see a "/".

In the Finder, attempting to use a : in a filename gets an error message saying 
that the name "a:b" cannot be used.
In macOS bash you cannot use a / in a filename.

I think it all boils down to this:

Windows, macOS and Linux etc all use a 0 as the end of string marker. (Windows 
uses 16bit 0 not 8bit 0).
The os file functions call the underlying OS and map its results into successes 
or exceptions.

The \0 means that the OS functions cannot be passed the data from the user.
Therefore you cannot get an error code from the OS.

All the other file systems rules are checked by the OS itself and any errors 
are reported to the user as OSError etc.

For example, on Windows the OS will prevent the use of '<' in a filename, and 
it is the OS that returns the error.

Python 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 17:00:18) [MSC v.1900 64 bit 
(AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> open('a<b', 'w')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 22] Invalid argument: 'a<b'
>>> open('a\0b', 'w')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: embedded null character
>>>

Singling out os.path.exists as a special case I do think is reasonable.
All functions that take paths need to have a consistent response to data that 
is impossible to pass to the OS.

When it is impossible to get the OS to see all of the user's data I'm not sure 
what else is reasonable for Python
to do than what it already does for NUL.

With the exception that I do not think this is documented and the docs should 
be fixed.

Barry



Re: Why exception from os.path.exists()?

2018-06-10 Thread Peter J. Holzer
On 2018-06-10 09:45:06 -0500, Bev in TX wrote:
> On Jun 10, 2018, at 5:49 AM, Peter J. Holzer  wrote:
> > On 2018-06-07 12:47:15 +, Steven D'Aprano wrote:
> >> But it doesn't do that. "Pathnames cannot contain NUL" is a falsehood 
> >> that programmers wrongly believe about paths. HFS Plus and Apple File 
> >> System support NULs in paths.
> > [...]
> >> But in the spirit of compromise, okay, let's ignore the existence of file 
> >> systems like HFS which allow NUL. Apart from Mac users, who uses them 
> >> anyway? Let's pretend that every file system in existence, now and into 
> >> the future, will prohibit NULs in paths.
[...]
> >  * One with an embedded / in the file name
> 
> This is easily done in Finder, where I created a folder named "my/slash”.  
> When I list it at the command line in Terminal, this shows up as "my:slash”, 
> with the slash shown as a colon.  
> If I create a file with a colon in its name at the command line, that file 
> name acts the same way:
> 
> $ touch "my:colon"
> $ ls
> my:colon
> my:slash
> 
> In Finder they both display as:
> my/colon
> my/slash

Thanks. So they just map '/' to ':'. IIRC, MacOS <= 9 used ':' as the
directory separator, so that makes sense. They kept the old behaviour
for applications using the Mac API (and for the GUI), but for the POSIX
API they use '/' (as they have to). Since ':' wasn't previously allowed,
there is no conflict, just some confusion for the users who see
different filenames depending on which tool they use.

It does, however, mean that on MacOS filenames can't contain all Unicode
characters, either.
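
The mapping is simple enough to write down. This is a sketch of the swap as described in this thread, not Apple's implementation; swap_separators is an invented name:

```python
# Swap '/' and ':' in a single pass, so applying it twice gives back the original.
_SWAP = str.maketrans("/:", ":/")


def swap_separators(name):
    """Translate a name between the GUI view and the POSIX view."""
    return name.translate(_SWAP)


for gui_name in ("my/slash", "my/colon"):
    print(gui_name, "<->", swap_separators(gui_name))
```

Because the translation is its own inverse, every name has exactly one counterpart on the other side, which is why there is no conflict, only two spellings of the same file.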

[...]
> I added printing the file name.  As suspected, the “slash” is a colon:
> 
> . - 56096374 - 2e
> .. - 56095464 - 2e 2e
> .DS_Store - 56109197 - 2e 44 53 5f 53 74 6f 72 65
> my:colon - 56095933 - 6d 79 3a 63 6f 6c 6f 6e
> my:slash - 56095521 - 6d 79 3a 73 6c 61 73 68

Yup.


> > Bonuspoints for doing this on an USB stick and then mounting the USB
> > stick on a Linux system and posting the output there as well.
> > 
> Sorry, I don’t have Linux, but I suspect it’s the same as the macOS command 
> line.

Very likely, yes.

hp

-- 
   _  | Peter J. Holzer| we build much bigger, better disasters now
|_|_) || because we have much more sophisticated
| |   | h...@hjp.at | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson 




Re: Why exception from os.path.exists()?

2018-06-10 Thread Chris Angelico
On Mon, Jun 11, 2018 at 6:22 AM, Marko Rauhamaa  wrote:
> Chris Angelico :
>> Can you try creating "spam:ham" and "spam/ham"? If they're both legal,
>> I'd like to see what their file names are represented as.
>
> I think Bev already explained it. At Unix level, you can't have slashes
> in filenames. At GUI level, you can't have colons in filenames. Unix
> slashes are bijectively mapped to colons in the GUI.
>
> So what you are asking can't really be tried out.
>

I'd like to find out about that. If it doesn't work, it'll be easily
provable that it can't be done.

ChrisA


Re: Why exception from os.path.exists()?

2018-06-10 Thread Marko Rauhamaa
Chris Angelico :
> Can you try creating "spam:ham" and "spam/ham"? If they're both legal,
> I'd like to see what their file names are represented as.

I think Bev already explained it. At Unix level, you can't have slashes
in filenames. At GUI level, you can't have colons in filenames. Unix
slashes are bijectively mapped to colons in the GUI.

So what you are asking can't really be tried out.


Marko


Re: Why exception from os.path.exists()?

2018-06-10 Thread Chris Angelico
On Mon, Jun 11, 2018 at 12:45 AM, Bev in TX  wrote:
>>  * One with an embedded / in the file name
>
> This is easily done in Finder, where I created a folder named "my/slash”.
> When I list it at the command line in Terminal, this shows up as "my:slash”, 
> with the slash shown as a colon.
> If I create a file with a colon in its name at the command line, that file 
> name acts the same way:
>
> $ touch "my:colon"
> $ ls
> my:colon
> my:slash
>
> In Finder they both display as:
> my/colon
> my/slash
>
> However, if you use Finder’s “Copy item as Pathname” option, then you will 
> again see the colon.
>
> /Users/bev/Training/myPython/pygroup/files/my:colon
> /Users/bev/Training/myPython/pygroup/files/my:slash
>
> But if you paste that folder’s name in Finder’s “Go to Folder” option, it 
> converts it to the following, and goes to that folder:
>
> /Users/bev/Training/myPython/pygroup/files/my/slash/slash

Can you try creating "spam:ham" and "spam/ham"? If they're both legal,
I'd like to see what their file names are represented as.

ChrisA


Re: Why exception from os.path.exists()?

2018-06-10 Thread Rick Johnson
Marko Rauhamaa wrote:
> Chris Angelico :
> 
> > Marko Rauhamaa  wrote:
> >>
> >> This surprising exception can even be a security issue:
> >>
> >>>>> os.path.exists("\0")
> >>Traceback (most recent call last):
> >>  File "", line 1, in 
> >>  File "/usr/lib64/python3.6/genericpath.py", line 19, in exists
> >>os.stat(path)
> >>ValueError: embedded null byte
> >
> > [...]
> >
> > A Unix path name cannot contain a null byte, so what you
> > have is a fundamentally invalid name. ValueError is
> > perfectly acceptable.
> 
> At the very least, that should be emphasized in the
> documentation. The pathname may come from an external
> source. It is routine to check for "/", "." and ".." but
> most developers (!?) would not think of checking for "\0".
> That means few test suites would catch this issue and few
> developers would think of catching ValueError here. The end
> result is unpredictable.

I'd have to agree with this assessment. Either a filepath
exists, or it doesn't. Therefore, in the "worldview" of
os.path.exists, there should be no consideration of
conformity, as it is blatantly obvious that any malformed
path would _not_ and could _not_ possibly, exist. In fact,
the only case in which the method in question should raise
an Exception is when a non-stringy argument is passed. Which
-- at least in 2.x version -- i can confirm (with limited
testing) it does this correctly!

## PYTHON 2.X SESSION ##
py> os.path.exists('')
False
py> os.path.exists(None)
Traceback (most recent call last):
  File "", line 1, in 
os.path.exists(None)
  File "C:\Python27\lib\genericpath.py", line 26, in exists
os.stat(path)
TypeError: coercing to Unicode: need string or buffer, NoneType found
>>> os.path.exists(1)
Traceback (most recent call last):
  File "", line 1, in 
os.path.exists(1)
  File "C:\Python27\lib\genericpath.py", line 26, in exists
os.stat(path)
TypeError: coercing to Unicode: need string or buffer, int found

But if the argument is a string, either it exists as a
filepath or it doesn't. Case closed. Anything less would
constitute type discrimination. Which is not only
inconsistent, it's unethical.


Re: Why exception from os.path.exists()?

2018-06-10 Thread Bev in TX


> On Jun 10, 2018, at 5:49 AM, Peter J. Holzer  wrote:
> 
> On 2018-06-07 12:47:15 +, Steven D'Aprano wrote:
>> But it doesn't do that. "Pathnames cannot contain NUL" is a falsehood 
>> that programmers wrongly believe about paths. HFS Plus and Apple File 
>> System support NULs in paths.
> [...]
>> But in the spirit of compromise, okay, let's ignore the existence of file 
>> systems like HFS which allow NUL. Apart from Mac users, who uses them 
>> anyway? Let's pretend that every file system in existence, now and into 
>> the future, will prohibit NULs in paths.
> 
> Could you (or anybody else who owns a Mac) please do the following:
> 
> * Create an empty directory
> * In this directory, create two files:
>  * One with an embedded \0 in the file name

I don’t know how to do this.  I can’t enter a Nul in Finder.  Bash silently 
converts it to a zero when using it as a file name.  C considers the previous 
character the end of the file name.  Python considers it an error.

>  * One with an embedded / in the file name

This is easily done in Finder, where I created a folder named "my/slash”.  
When I list it at the command line in Terminal, this shows up as "my:slash”, 
with the slash shown as a colon.  
If I create a file with a colon in its name at the command line, that file name 
acts the same way:

$ touch "my:colon"
$ ls
my:colon
my:slash

In Finder they both display as:
my/colon
my/slash

However, if you use Finder’s “Copy item as Pathname” option, then you will 
again see the colon.  

/Users/bev/Training/myPython/pygroup/files/my:colon
/Users/bev/Training/myPython/pygroup/files/my:slash

But if you paste that folder’s name in Finder’s “Go to Folder” option, it 
converts it to the following, and goes to that folder:

/Users/bev/Training/myPython/pygroup/files/my/slash/slash

So we can see three (3) separate behaviors for the same folder in Finder.  This 
is because at some higher level, Apple’s file systems reserve the colon for the 
path separator, instead of the slash.  I know that AppleScript does use the 
colon as a path separator.  However at a lower level, macOS still uses the 
slash, rather than colon, as the path name separator.  IMO, this is confusing.  
Similarly in Finder file names are case insensitive, but case preserving.  At 
the command line they are still case sensitive.  Note that it’s possible to 
format a disk with a fully case sensitive file system, but that’s not the 
default.

Please note that I am not an Apple HFS+ or APFS file system expert, by any 
stretch of the imagination.  So please excuse me if I didn’t state things 
perfectly.

> * Compile and run this C program in the directory and post the output:
> 
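>    #include <dirent.h>
>    #include <stdio.h>
>    #include <sys/types.h>
> 
>    int main(void) {
>        DIR *dp;
>        struct dirent *de;
>        char *p;
> 
>        dp = opendir(".");
>        if (dp == NULL) {
>            perror("opendir");
>            return 1;
>        }
>        /* Print each entry's inode number, then its name as hex bytes. */
>        while ((de = readdir(dp)) != NULL) {
>            printf("%ld -", (long)de->d_ino);
>            for (p = de->d_name; *p; p++) {
>                printf(" %02x", (unsigned char)*p);
>            }
>            printf("\n");
>        }
>        closedir(dp);
>        return 0;
>    }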

I added printing the file name.  As suspected, the “slash” is a colon:

. - 56096374 - 2e
.. - 56095464 - 2e 2e
.DS_Store - 56109197 - 2e 44 53 5f 53 74 6f 72 65
my:colon - 56095933 - 6d 79 3a 63 6f 6c 6f 6e
my:slash - 56095521 - 6d 79 3a 73 6c 61 73 68
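
The same listing can be produced from Python by asking for the names as bytes. This is a rough equivalent of the C program, not a replacement for it; the helper names are invented, and unlike readdir(), os.scandir() does not report "." and "..":

```python
import os


def hex_bytes(name: bytes) -> str:
    """Format a raw file name as space-separated hex bytes."""
    return " ".join(f"{byte:02x}" for byte in name)


def dump_directory(directory="."):
    """Print name, inode number and raw bytes for every entry in a directory."""
    with os.scandir(os.fsencode(directory)) as entries:
        for entry in entries:
            print(repr(entry.name), "-", entry.inode(), "-", hex_bytes(entry.name))


dump_directory()
```

Passing a bytes path makes os.scandir() return bytes names, so nothing is decoded or normalised on the way.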

> Bonuspoints for doing this on an USB stick and then mounting the USB
> stick on a Linux system and posting the output there as well.
> 
Sorry, I don’t have Linux, but I suspect it’s the same as the macOS command 
line.

> I'm really curious how MacOS maps those characters in the POSIX API.
> 
>hp

Bev in TX





Re: Why exception from os.path.exists()?

2018-06-10 Thread Peter J. Holzer
On 2018-06-07 12:47:15 +, Steven D'Aprano wrote:
> But it doesn't do that. "Pathnames cannot contain NUL" is a falsehood 
> that programmers wrongly believe about paths. HFS Plus and Apple File 
> System support NULs in paths.
[...]
> But in the spirit of compromise, okay, let's ignore the existence of file 
> systems like HFS which allow NUL. Apart from Mac users, who uses them 
> anyway? Let's pretend that every file system in existence, now and into 
> the future, will prohibit NULs in paths.

Could you (or anybody else who owns a Mac) please do the following:

* Create an empty directory
* In this directory, create two files:
  * One with an embedded \0 in the file name
  * One with an embedded / in the file name
* Compile and run this C program in the directory and post the output:

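#include <dirent.h>
#include <stdio.h>
#include <sys/types.h>

int main(void) {
    DIR *dp;
    struct dirent *de;
    char *p;

    dp = opendir(".");
    if (dp == NULL) {
        perror("opendir");
        return 1;
    }
    /* Print each entry's inode number, then its name as hex bytes. */
    while ((de = readdir(dp)) != NULL) {
        printf("%ld -", (long)de->d_ino);
        for (p = de->d_name; *p; p++) {
            printf(" %02x", (unsigned char)*p);
        }
        printf("\n");
    }
    closedir(dp);
    return 0;
}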

Bonuspoints for doing this on an USB stick and then mounting the USB
stick on a Linux system and posting the output there as well.

I'm really curious how MacOS maps those characters in the POSIX API.

hp

-- 
   _  | Peter J. Holzer| we build much bigger, better disasters now
|_|_) || because we have much more sophisticated
| |   | h...@hjp.at | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson 




Re: Why exception from os.path.exists()?

2018-06-10 Thread Chris Angelico
On Sun, Jun 10, 2018 at 7:06 PM, Marko Rauhamaa  wrote:
> Chris Angelico :
>> It's important to pin down the true cause of the problem, and not
>> blame something for doing the proper Pythonic thing.
>
> So could you tell me what the proper Pythonic fix for the example server
> in Python's documentation would be?
>
> Here's the code in question:
>
> 
> import http.server
> import socketserver
>
> PORT = 8000
>
> Handler = http.server.SimpleHTTPRequestHandler
>
> with socketserver.TCPServer(("", PORT), Handler) as httpd:
>     print("serving at port", PORT)
>     httpd.serve_forever()
> 
>
> BTW, the proper response would be a 404. 500 means: "There's a bug in my
> code".
>

The fix is a pull request against serve_forever to catch exceptions
and return 500s.
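
The general idea of catching the exception and turning it into an HTTP
error can also be sketched at the handler level, without touching
serve_forever. The class name GuardedHandler is hypothetical, and
mapping ValueError to 404 is only one possible choice (it follows
Marko's reading; sending 500 instead would be a one-line change).

```python
import http.server


class GuardedHandler(http.server.SimpleHTTPRequestHandler):
    """SimpleHTTPRequestHandler that answers bad paths instead of raising."""

    def send_head(self):
        try:
            return super().send_head()
        except ValueError:
            # e.g. "embedded null byte" from a %00 in the request path
            self.send_error(404, "File not found")
            return None
```

In the documentation example, GuardedHandler would be passed to
socketserver.TCPServer in place of Handler.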

ChrisA


Re: Why exception from os.path.exists()?

2018-06-10 Thread Marko Rauhamaa
Chris Angelico :
> It's important to pin down the true cause of the problem, and not
> blame something for doing the proper Pythonic thing.

So could you tell me what the proper Pythonic fix for the example server
in Python's documentation would be?

Here's the code in question:


import http.server
import socketserver

PORT = 8000

Handler = http.server.SimpleHTTPRequestHandler

with socketserver.TCPServer(("", PORT), Handler) as httpd:
    print("serving at port", PORT)
    httpd.serve_forever()


BTW, the proper response would be a 404. 500 means: "There's a bug in my
code".


Marko


Re: Why exception from os.path.exists()?

2018-06-09 Thread Chris Angelico
On Sun, Jun 10, 2018 at 5:53 AM, Ed Kellett  wrote:
> On 2018-06-08 03:42, Chris Angelico wrote:
>> Apart from the one odd bug with SimpleHTTPServer not properly sending
>> back 500s, I very much doubt that the original concern - namely that
> >> os.path.exists() and os.stat() raise ValueError if there's a %00 in
>> the URL - can be abused effectively.
> Dismissing HTTP 500s as "not a vulnerability" sounds reasonable enough
> to me. But you're assuming that all other expressions of this bug in
> applications will be at least as benign. I'm not sure that that's warranted.
>

It is an exception. There are a small number of possible results:

1) It happens in code where ValueError could otherwise happen, and the
code gets confused. That's a bug, but bugs do happen. No way to
predict the actual results; it's probably going to make something else
go into a default mode or something. Highly unlikely for it to
trigger a vulnerability, but if it does, the problem is that you have
code that's catching an exception that it shouldn't be.

2) It happens in code where ValueError is not expected, and is handled
as an unexpected exception. ALL end-user-facing code should have a
means of coping with exceptions (web servers should toss back a 500,
etc). If it doesn't, then *that* is the vulnerability, not the
ValueError itself; there are many MANY ways for Python code to
unexpectedly raise exceptions.

Either way, this exception isn't itself a problem; but it might reveal
a different problem. For instance, an end-user-facing app that has no
protective exception handler might be induced to terminate in this
way, which is a DOS; but the problem isn't os.path.exists raising
ValueError, the problem is an unexpected exception causing
termination.

It's important to pin down the true cause of the problem, and not
blame something for doing the proper Pythonic thing. Python is not Go;
exceptions exist to be used. The advantage of Go is that you never get
unexpected exceptions... instead, you just get unexpected incorrect
behaviour if you fail to check the return value of a function and just
assume that it did its job. Exceptions don't remove all responsibility
from you, but they DO make it a lot easier to handle them coherently.

ChrisA


Re: Why exception from os.path.exists()?

2018-06-09 Thread Ed Kellett
On 2018-06-08 03:42, Chris Angelico wrote:
> Apart from the one odd bug with SimpleHTTPServer not properly sending
> back 500s, I very much doubt that the original concern - namely that
> os.path.exists() and os.stat() raise ValueError if there's a %00 in
> the URL - can be abused effectively.
Dismissing HTTP 500s as "not a vulnerability" sounds reasonable enough
to me. But you're assuming that all other expressions of this bug in
applications will be at least as benign. I'm not sure that that's warranted.





Re: Why exception from os.path.exists()?

2018-06-08 Thread eryk sun
On Fri, Jun 8, 2018 at 11:35 AM, Steven D'Aprano
 wrote:
>
> (referring to both the NUL bytes in UTF-16 encoded NTFS file names, and
> the lack of NUL bytes in common Linux file names).

NTFS filenames are stored as wchar_t strings, for which NUL is
L"\x00\x00". Individual null bytes are irrelevant to this problem,
unless we're using an encoding such as an ASCII superset that stores
the character NUL as "\x00".
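
The distinction between zero bytes and the NUL character can be seen
from Python by encoding a short name as UTF-16; this is only an
illustration of the encoding, not of what NTFS does internally.

```python
name = "abc"
utf16 = name.encode("utf-16-le")       # two bytes per character here

has_zero_byte = b"\x00" in utf16       # zero bytes appear in the encoding
has_nul_char = "\x00" in name          # but the name has no NUL character

# A real NUL character becomes one aligned 16-bit unit of zeros
with_nul = "a\x00b".encode("utf-16-le")

print(utf16, has_zero_byte, has_nul_char)
print(with_nul)
```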


Re: Why exception from os.path.exists()?

2018-06-08 Thread Antoon Pardon
On 08-06-18 13:35, Steven D'Aprano wrote:
> On Fri, 08 Jun 2018 09:27:17 +0200, Antoon Pardon wrote:
>
>> On 08-06-18 04:19, Steven D'Aprano wrote:
>>> On Thu, 07 Jun 2018 17:45:06 +1000, Chris Angelico wrote:
>>>
>>>> So... an ASCIIZ string *can* contain that character, or at least a
>>>> representation of it. Yet it cannot contain "\0".
>>> You keep saying that as if it made one whit of difference to what
>>> os.path.exists should do. I completely agree that ASCIIZ strings cannot
>>> contain NUL bytes. What does that have to do with os.path.exists()?
>>>
>>> NTFS file systems use UTF-16 encoded strings. For typical mostly-ASCII
>>> pathnames, the bytes on disk are *full* of NUL bytes.
>> This is irrelevant.
> Of course it is irrelevant, JUST LIKE I SAID IN THE PARAGRAPH YOU DELETED:
>
> They're actually both equally implementation details and 
> utterly irrelevant to the behaviour of os.path.exists.
>
> (referring to both the NUL bytes in UTF-16 encoded NTFS file names, and 
> the lack of NUL bytes in common Linux file names).
>
> I think that's dirty debating tactics, a variant of "Strawman argument". 
> I make a statement. You delete it, and respond saying the same thing I 
> said, but making it out as if it were a devastating response to my 
> argument.
>
> Pretty pathetic really.

Get off your high horse. Sure, I was too quick in my reaction, without
reading your contribution through. That is regrettable, but it happens.

But you are in no situation to cast the first stone. These things
and worse have been happening to you too.

> The existence or use of ASCIIZ strings by the Linux kernel is not the 
> least bit relevant to the question of why, alone of a near-infinite 
> number of possible invalid pathnames, those containing NUL are singled 
> out for an exception when all others simply return False.

Only if you phrase the question as being about eliminating an exception.
If you phrase the question as being about how invalid pathnames should
be treated, and consider that perhaps they should all raise an exception,
then ASCIIZ is relevant as an illustration of what is, in some situations,
an invalid pathname.

-- 
Antoon.




Re: Why exception from os.path.exists()?

2018-06-08 Thread Steven D'Aprano
On Fri, 08 Jun 2018 09:27:17 +0200, Antoon Pardon wrote:

> On 08-06-18 04:19, Steven D'Aprano wrote:
>> On Thu, 07 Jun 2018 17:45:06 +1000, Chris Angelico wrote:
>>
>>> So... an ASCIIZ string *can* contain that character, or at least a
>>> representation of it. Yet it cannot contain "\0".
>> You keep saying that as if it made one whit of difference to what
>> os.path.exists should do. I completely agree that ASCIIZ strings cannot
>> contain NUL bytes. What does that have to do with os.path.exists()?
>>
>> NTFS file systems use UTF-16 encoded strings. For typical mostly-ASCII
>> pathnames, the bytes on disk are *full* of NUL bytes.
> 
> This is irrelevant.

Of course it is irrelevant, JUST LIKE I SAID IN THE PARAGRAPH YOU DELETED:

They're actually both equally implementation details and 
utterly irrelevant to the behaviour of os.path.exists.

(referring to both the NUL bytes in UTF-16 encoded NTFS file names, and 
the lack of NUL bytes in common Linux file names).

I think that's dirty debating tactics, a variant of "Strawman argument". 
I make a statement. You delete it, and respond saying the same thing I 
said, but making it out as if it were a devastating response to my 
argument.

Pretty pathetic really.

The existence or use of ASCIIZ strings by the Linux kernel is not the 
least bit relevant to the question of why, alone of a near-infinite 
number of possible invalid pathnames, those containing NUL are singled 
out for an exception when all others simply return False.



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Why exception from os.path.exists()?

2018-06-08 Thread Marko Rauhamaa
Antoon Pardon :

> On 08-06-18 09:45, Marko Rauhamaa wrote:
>> Whatever your philosophical tastes, this unexpected feature of
>> os.path.exists() (& co) leads to unexpected application behavior IRL,
>> and, in the GhostBusters sense, that is bad.
>
> Sure, I agree that it is unexpected behaviour. But does that mean the
> behaviour should be fixed or that the behaviour should be better
> documented?

I'd say it should be fixed. I would go so far as to say that os.stat()
should be fixed so that "" and "\0" result in the same exception (EINVAL
~ ValueError).
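
The asymmetry in question can be observed with a small probe. The helper
name stat_outcome is made up for this illustration, and the outcomes
noted in the comments are what CPython 3 on Linux is expected to give;
other platforms may differ.

```python
import os


def stat_outcome(path):
    """Return the name of the exception os.stat() raises, or "ok"."""
    try:
        os.stat(path)
    except Exception as exc:
        return type(exc).__name__
    return "ok"


print(stat_outcome(""))      # expected: FileNotFoundError (ENOENT)
print(stat_outcome("\0"))    # expected: ValueError (never reaches the OS)
```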

If the documentation option is taken, the documentation should emphasize
the issue. Furthermore, the http.server module should be fixed because
it is broken at the moment.


Marko


Re: Why exception from os.path.exists()?

2018-06-08 Thread Antoon Pardon
On 08-06-18 09:45, Marko Rauhamaa wrote:
> Whatever your philosophical tastes, this unexpected feature of
> os.path.exists() (& co) leads to unexpected application behavior IRL,
> and, in the GhostBusters sense, that is bad.

Sure, I agree that it is unexpected behaviour. But does that mean the
behaviour should be fixed or that the behaviour should be better
documented?

Either could be an answer, and in trying to find the answer I don't think
the actual FS is relevant, because the FS isn't accessed directly but
through the OS API.

-- 
Antoon Pardon.


Re: Why exception from os.path.exists()?

2018-06-08 Thread Marko Rauhamaa
Antoon Pardon :

> On 08-06-18 04:19, Steven D'Aprano wrote:
>> On Thu, 07 Jun 2018 17:45:06 +1000, Chris Angelico wrote:
>>
>>> So... an ASCIIZ string *can* contain that character, or
>>> at least a representation of it. Yet it cannot contain "\0".
>> [...]
>> NTFS file systems use UTF-16 encoded strings. For typical mostly-ASCII 
>> pathnames, the bytes on disk are *full* of NUL bytes. 
>
> This is irrelevant.

What is relevant to the original question is that the ValueError
exception is a practical trap that I have fallen into, and, as I
demonstrated yesterday, the http.server module has fallen into (through
os.path.isdir()). In fact, I couldn't spot a single instance of
os.path.exists() in the Python standard library that would guard against
a ValueError (to be sure, in almost all of the cases you could argue it
would be redundant).

Whatever your philosophical tastes, this unexpected feature of
os.path.exists() (& co) leads to unexpected application behavior IRL,
and, in the GhostBusters sense, that is bad.
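
Until the behaviour is settled, an application can guard itself with a
thin wrapper. This is a minimal sketch; the name exists_quiet is
hypothetical. (Python 3.8 and later document that os.path.exists()
returns False for such paths, so the wrapper only matters on older
versions.)

```python
import os


def exists_quiet(path):
    """Like os.path.exists(), but treat unrepresentable paths as absent."""
    try:
        return os.path.exists(path)
    except ValueError:
        # raised on older Pythons for e.g. an embedded NUL
        return False


print(exists_quiet("te\0st.html"))
```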


Marko


Re: Why exception from os.path.exists()?

2018-06-08 Thread Antoon Pardon
On 08-06-18 04:19, Steven D'Aprano wrote:
> On Thu, 07 Jun 2018 17:45:06 +1000, Chris Angelico wrote:
>
>> So... an ASCIIZ string *can* contain that character, or
>> at least a representation of it. Yet it cannot contain "\0".
> You keep saying that as if it made one whit of difference to what 
> os.path.exists should do. I completely agree that ASCIIZ strings cannot 
> contain NUL bytes. What does that have to do with os.path.exists()?
>
> NTFS file systems use UTF-16 encoded strings. For typical mostly-ASCII 
> pathnames, the bytes on disk are *full* of NUL bytes. 

This is irrelevant. If you are on a linux box, you will still need to
pass an ASCIIZ string to the OS and won't be able to pass an embedded
NUL byte as part of a file name.

What is on the disk, and what kind of remapping the OS performs to get
at the actual data, is something that happens in a layer the user is
mostly not aware of.



Re: Why exception from os.path.exists()?

2018-06-08 Thread Antoon Pardon
On 08-06-18 05:12, Ben Finney wrote:
> That is immediately followed by more specific advice that says when to
> use bytes:
>
> Unfortunately, some file names may not be representable as strings
> on Unix, so applications that need to support arbitrary file names
> on Unix should use bytes objects to represent path names. Vice
> versa, using bytes objects cannot represent all file names on
> Windows (in the standard mbcs encoding), hence Windows applications
> should use string objects to access all files.
>
> (That needs IMO a correction, because as already explored in this
> thread, it's not Unix or Windows that makes the distinction there. It's
> the specific *filesystem type* which records either bytes or text, and
> that is true no matter what operating system happens to be reading the
> filesystem.)

But it is the Unix or Windows api that is used. If the unix-api wants
bytes, then you don't have to care about what ends up on the file-system.
Just as you don't care whether the file-system uses encryption or not
to store its data. That is all hidden in layers that are normally
inaccessible to the user.

Maybe some FS that is hooked up to my linux box does allow embedded NUL
bytes. I won't be able to notice that because even if there are such
files, they will need some remapping in order to make them accessible.
To the user the actual filenames are the remapped filenames, not what
is actually on the FS.

-- 
Antoon Pardon.



Re: Why exception from os.path.exists()?

2018-06-08 Thread Steven D'Aprano
On Thu, 07 Jun 2018 22:56:49 -0400, Richard Damon wrote:

> or we need an alternate API that lets us pass raw bytes as file names

Guido's Time Machine strikes again.

All the path-related functions, including open(), take arguments as either 
bytes or strings.
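
A short illustration, using a throwaway temporary directory: the type of
the argument decides the type of the names that come back.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # create one empty file to list
    open(os.path.join(tmp, "demo.txt"), "w").close()

    as_text = os.listdir(tmp)                  # str in, str out
    as_bytes = os.listdir(os.fsencode(tmp))    # bytes in, bytes out

print(as_text, as_bytes)
```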



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Why exception from os.path.exists()?

2018-06-07 Thread Ben Finney
Richard Damon  writes:

> This does bring up an interesting point. Since the Unix file system
> really has file names that are collections of bytes instead of really
> being strings, and the Python API to it wants to treat them as strings,
> then we have an issue that we are going to be stuck with problems with
> filenames.

I agree with the general statement “we are going to be stuck with
problems with filenames”; the world of filesystems is messy, which will
always cause problems.

With that said, I don't agree that “the Python API wants to treat
[file paths] as strings”. The ‘os’ module explicitly promises to treat
bytes as bytes, and text as text, in filesystem paths:

Note: All of these functions accept either only bytes or only string
objects as their parameters. The result is an object of the same
type, if a path or file name is returned.

<https://docs.python.org/3/library/os.path.html>

There is a *preference* for text, it's true. The opening paragraph
includes this:

Applications are encouraged to represent file names as (Unicode)
character strings.

That is immediately followed by more specific advice that says when to
use bytes:

Unfortunately, some file names may not be representable as strings
on Unix, so applications that need to support arbitrary file names
on Unix should use bytes objects to represent path names. Vice
versa, using bytes objects cannot represent all file names on
Windows (in the standard mbcs encoding), hence Windows applications
should use string objects to access all files.

(That needs IMO a correction, because as already explored in this
thread, it's not Unix or Windows that makes the distinction there. It's
the specific *filesystem type* which records either bytes or text, and
that is true no matter what operating system happens to be reading the
filesystem.)

> Ultimately we have a fundamental limitation with trying to abstract out
> the format of filenames in the API, and we need a back door to allow us
> to define what encoding to use for filenames (and be able to detect that
> it doesn't work for a given file, and change it on the fly to try
> again), or we need an alternate API that lets us pass raw bytes as file
> names and the program needs to know how to handle the raw filename for
> that particular file system.

Yes, I agree that there is an unresolved problem to explicitly declare
the encoding for filesystem paths on ext4 and other filesystems where
byte strings are used for filesystem paths.

-- 
 \   “Give a man a fish, and you'll feed him for a day; give him a |
  `\religion, and he'll starve to death while praying for a fish.” |
_o__)   —Anonymous |
Ben Finney



Re: Why exception from os.path.exists()?

2018-06-07 Thread Richard Damon
On 6/7/18 9:17 PM, Steven D'Aprano wrote:
> On Thu, 07 Jun 2018 15:38:39 -0400, Dennis Lee Bieber wrote:
>
>> On Fri, 1 Jun 2018 23:16:32 +0000 (UTC), Steven D'Aprano
>>  declaimed the following:
>>
>>> It should either return False, or raise TypeError. Of the two, since
>>> 3.14159 cannot represent a file on any known OS, TypeError would be more
>>> appropriate.
>>>
>>  I wouldn't be so sure of that...
> I would.
>
> There is no existing file system which uses floats instead of byte- or 
> character-strings for file names. If you believe different, please name 
> the file
>
>
>> Xerox CP/V allowed for embedding
>> non-printable characters into file names
> Just like most modern file systems.
>
> Even FAT-16 supports a range of non-ASCII bytes with the high-bit set 
> (although not the control codes with the high-bit cleared). Unix file 
> systems typically support any byte except \0 and /. Most modern file 
> systems outside of Unix support any Unicode character (or almost any) 
> including ASCII control characters.
>
> https://en.wikipedia.org/wiki/Comparison_of_file_systems#Limits
>
>
>
This does bring up an interesting point. Since the Unix file system
really has file names that are collections of bytes instead of really
being strings, and the Python API to it wants to treat them as strings,
we are going to be stuck with problems with filenames. If we assume
they are utf-8 encoded, then there exist filenames that will trap with
invalid encodings (if, for example, the name was generated on a system
that was using Latin-1 as an 8-bit character set for file names). On
the other hand, if we treat the file names as 8-bit characters by
themselves, and the system was using utf-8, then we are mangling any
characters outside the basic ASCII set. Basically, we hit the old
problem of confusing bytes and strings.

Ultimately we have a fundamental limitation with trying to abstract out
the format of filenames in the API, and we need a back door to allow us
to define what encoding to use for filenames (and be able to detect that
it doesn't work for a given file, and change it on the fly to try
again), or we need an alternate API that lets us pass raw bytes as file
names and the program needs to know how to handle the raw filename for
that particular file system.
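
Both failure modes can be reproduced with plain byte strings, without
touching a file system. The name "café" is just an example chosen for
this sketch.

```python
raw_latin1 = "café".encode("latin-1")   # name written by a Latin-1 system
raw_utf8 = "café".encode("utf-8")       # same name written by a UTF-8 system

# Case 1: assume UTF-8, but the bytes are Latin-1 -> decoding traps
try:
    raw_latin1.decode("utf-8")
    strict_utf8_ok = True
except UnicodeDecodeError:
    strict_utf8_ok = False

# Case 2: assume 8-bit characters, but the bytes are UTF-8 -> mangled text
mangled = raw_utf8.decode("latin-1")

print(strict_utf8_ok, mangled)
```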

-- 
Richard Damon



Re: Why exception from os.path.exists()?

2018-06-07 Thread Chris Angelico
On Fri, Jun 8, 2018 at 12:16 PM, Steven D'Aprano
 wrote:
> On Thu, 07 Jun 2018 23:25:54 +1000, Chris Angelico wrote:
>> Yes, it's a bug. If someone tries a page size of zero, it'll divide by
>> zero and bomb. Great. But how is it a vulnerability? It is a
>> properly-handled exception.
>
> Causing a denial of service is a vulnerability.

Yes, but remember, anyone can build a botnet and send large numbers of
entirely legitimate requests to your server. Since no server has
infinite capacity, a DOS is inherently unavoidable. So to call
something a "DOS vulnerability", you have to show that it makes you
*more vulnerable* than simply getting overloaded with requests. For
example:

1) If the kernel allocates resources for half-open socket connections,
a malicious client can SYN-flood the server, causing massive resource
usage from relatively few packets.

2) If the language can be induced to build a hashtable using values
that all have the same hash, the CPU load required for the O(n²)
operations can easily exceed the cost of making the requests.

3) If the app inefficiently performs many database transactions for a
simple request, a plausible number of such requests could slow the
database to a crawl.

4) If a small request results in an inordinately large response, the
server's outgoing bandwidth can be saturated by a small number of
requests.

Where in this is a simple HTTP 500 from the os.stat() call worse than
a legitimate request for an actual page?

The response is small (far smaller than many legit files - consider a
web app with a large JavaScript bundle, easily multiple megabytes). It
required zero disk operations, so it's as fast as returning a file
from cache. The only way it's more expensive is the actual exception
handling code itself, and if you reckon someone can DOS a server via
the cost of throwing and catching exceptions, I'm going to have to ask
for some serious measurements.

Apart from the one odd bug with SimpleHTTPServer not properly sending
back 500s, I very much doubt that the original concern - namely that
os.path.exists() and os.stat() raise ValueError if there's a %00 in
the URL - can be abused effectively.

ChrisA


Re: Why exception from os.path.exists()?

2018-06-07 Thread Steven D'Aprano
On Thu, 07 Jun 2018 17:45:06 +1000, Chris Angelico wrote:

> On Thu, Jun 7, 2018 at 1:55 PM, Steven D'Aprano
>  wrote:
>> On Tue, 05 Jun 2018 23:27:16 +1000, Chris Angelico wrote:
>>
>>> And an ASCIIZ string cannot contain a byte value of zero. The parallel
>>> is exact.
>>
>> Why should we, as Python programmers, care one whit about ASCIIZ
>> strings? They're not relevant. You might as well say that file names
>> cannot contain the character "π" because ASCIIZ strings don't support
>> it.
>>
>> No they don't, and yet nevertheless file names can and do contain
>> characters outside of the ASCIIZ range.
> 
> Under Linux, a file name contains bytes, most commonly representing
> UTF-8 sequences.

The fact that user-space applications like the shell and GUI file 
managers sometimes treat file names as UTF-8 Unicode is not really 
relevant to what the file system allows. The most common Linux file 
systems are fundamentally bytes, not Unicode characters, and while I'm 
willing to agree to call the byte 0x41 "A", there simply is no such byte 
that means "π" or U+10902 PHOENICIAN LETTER GAML.

File names under typical Linux file systems are not necessarily valid 
UTF-8 Unicode. That's why Python still provides a bytes-interface as well 
as a text interface.


> So... an ASCIIZ string *can* contain that character, or
> at least a representation of it. Yet it cannot contain "\0".

You keep saying that as if it made one whit of difference to what 
os.path.exists should do. I completely agree that ASCIIZ strings cannot 
contain NUL bytes. What does that have to do with os.path.exists()?

NTFS file systems use UTF-16 encoded strings. For typical mostly-ASCII 
pathnames, the bytes on disk are *full* of NUL bytes. If the 
implementation detail that ASCIIZ strings cannot contain NUL is important 
to you, it should be equally important that UTF-16 strings typically have 
many NULs.

They're actually both equally implementation details and utterly 
irrelevant to the behaviour of os.path.exists.



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Why exception from os.path.exists()?

2018-06-07 Thread Steven D'Aprano
On Thu, 07 Jun 2018 23:25:54 +1000, Chris Angelico wrote:

[...]
>> Does the Python web server suffer from that vulnerability? I would be
>> surprised if it were. But it can be induced to crash (an exception, not
>> a seg fault) which is certainly a vulnerability.
> 
> "Certainly"? I'm dubious on that. This isn't C, where a segfault usually
> comes after executing duff memory, and therefore it's plausible to
> transform a segfault into a remote code execution exploit.

I just said that I would be surprised if you could get remote code 
execution from the Python web server, for exactly the reason you state: 
it's an exception, not a segfault.

Stop agreeing with me when we're trying to have an argument! *wink*


[...]
> Yes, it's a bug. If someone tries a page size of zero, it'll divide by
> zero and bomb. Great. But how is it a vulnerability? It is a
> properly-handled exception.

Causing a denial of service is a vulnerability.

Security vulnerabilities are not just about remote code execution. Can 
remote attackers bring your service down? If so, you are vulnerable to 
having remote attackers bring your service down.

Can remote attackers overwhelm your server with so many errors that they 
fill your disks with error logs and either stop logging, or crash? Then 
you are vulnerable to having remote attackers crash your server, or hide 
their tracks by preventing logging.

Can remote attackers induce your server to serve files it shouldn't? Then 
you are vulnerable to attacks that leak sensitive or private information.

There's far more to security vulnerabilities than just "oh well, they 
can't get a shell or execute code on my server, so it's all cool" *wink*


In this specific case:

> It's slightly different with SimpleHTTPServer, as it fails to properly
> send back the 500. That would be a bug IMO. 

There seems to be some weird interaction occurring on my system between 
the SimpleHTTPServer, Firefox, and my web proxy, so I may have 
misinterpreted the precise nature of the crash. What I initially saw was 
that although the SimpleHTTPServer remained running, it stopped responding 
to requests and Firefox would repeatedly respond:

Firefox can't find the server at www.localhost.com

even though the process was still running. But when I tried with a 
different browser (links), I don't get that same behaviour. links is 
using the web proxy, Firefox isn't, but I'm not quite sure why that makes 
a difference.

> Even then, though, all you
> can do is clog the server with unfinished requests - and you can do that
> much more easily by just connecting and being really slow to send data.
> (And I doubt that people are using SimpleHTTPServer in
> security-sensitive contexts anyway.)

Again, you're just repeating what I said in different words. I already 
said that *this specific* issue is probably low severity, because people 
are unlikely to use SimpleHTTPServer for mission critical services 
exposed to the internet. 



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Why exception from os.path.exists()?

2018-06-07 Thread Steven D'Aprano
On Thu, 07 Jun 2018 15:38:39 -0400, Dennis Lee Bieber wrote:

> On Fri, 1 Jun 2018 23:16:32 +0000 (UTC), Steven D'Aprano
>  declaimed the following:
> 
>>It should either return False, or raise TypeError. Of the two, since
>>3.14159 cannot represent a file on any known OS, TypeError would be more
>>appropriate.
>>
>   I wouldn't be so sure of that...

I would.

There is no existing file system which uses floats instead of byte- or 
character-strings for file names. If you believe different, please name 
the file


> Xerox CP/V allowed for embedding
> non-printable characters into file names

Just like most modern file systems.

Even FAT-16 supports a range of non-ASCII bytes with the high-bit set 
(although not the control codes with the high-bit cleared). Unix file 
systems typically support any byte except \0 and /. Most modern file 
systems outside of Unix support any Unicode character (or almost any) 
including ASCII control characters.

https://en.wikipedia.org/wiki/Comparison_of_file_systems#Limits



[...]
>   With some work, one could probably generate a file name containing the
> bytes used for storing a floating point value.

Any collection of bytes can be interpreted as any thing we like. 
(Possibly requiring padding or splitting to fit fixed-width data 
structures.) Sounds. Bitmaps. Coordinates in three-dimensional space. 
Floating point numbers are no challenge. A Python float is represented by 
an eight-byte C double. Provided we agree on a convention for splitting 
byte strings into eight-byte chunks, adding padding, and agree on big- or 
little-endianness, it is trivial to convert file names to one or more 
floats:

/etc is equivalent to 2.2617901550715974e-80

(big endian, padding added to the right)

But just because I can do that conversion, doesn't mean that the file 
system uses floats for file names.
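
The conversion can be sketched with the struct module. The padding byte
is not stated above; this sketch assumes NUL padding, which appears to
land on the quoted figure. The function name name_to_float is made up
for the illustration.

```python
import struct


def name_to_float(name: bytes) -> float:
    """Read the first eight bytes of a name as a big-endian C double."""
    padded = name.ljust(8, b"\x00")[:8]   # assumption: pad on the right with NULs
    return struct.unpack(">d", padded)[0]


value = name_to_float(b"/etc")
print(value)
```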



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Why exception from os.path.exists()?

2018-06-07 Thread Chris Angelico
On Fri, Jun 8, 2018 at 3:10 AM, MRAB  wrote:
> On 2018-06-07 08:45, Chris Angelico wrote:
>> Under Linux, a file name contains bytes, most commonly representing
>> UTF-8 sequences. So... an ASCIIZ string *can* contain that character,
>> or at least a representation of it. Yet it cannot contain "\0".
>>
> I've seen a variation of UTF-8 that encodes U+0000 as 2 bytes so that a zero
> byte can be used as a terminator.
>
> It's therefore not impossible to have a version of Linux that allowed a
> (Unicode) "\0" in a filename.

Considering that Linux treats filenames as raw bytes, that's not
surprising. The mangled encoding you refer to is a horrendous cheat,
though, and violates several of the design principles of UTF-8, so I
do not recommend it EVER. The correct way for Python to handle and
represent such a file name would be to use the U+DCxx range to carry
the bytes through unchanged - not using "\0".
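
What "carrying the bytes through unchanged" looks like in practice can
be shown with the surrogateescape error handler, which is the handler
os.fsdecode() normally uses on POSIX systems. The byte string below is
an arbitrary example of a name that is not valid UTF-8.

```python
raw = b"caf\xe9"    # Latin-1 bytes, not valid UTF-8

# The undecodable byte 0xE9 is mapped to the lone surrogate U+DCE9
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly
back = text.encode("utf-8", errors="surrogateescape")

print(ascii(text), back)
```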

ChrisA


Re: Why exception from os.path.exists()?

2018-06-07 Thread MRAB

On 2018-06-07 08:45, Chris Angelico wrote:
> On Thu, Jun 7, 2018 at 1:55 PM, Steven D'Aprano
>  wrote:
>> On Tue, 05 Jun 2018 23:27:16 +1000, Chris Angelico wrote:
>>
>>> And an ASCIIZ string cannot contain a byte value of zero. The parallel
>>> is exact.
>>
>> Why should we, as Python programmers, care one whit about ASCIIZ strings?
>> They're not relevant. You might as well say that file names cannot
>> contain the character "π" because ASCIIZ strings don't support it.
>>
>> No they don't, and yet nevertheless file names can and do contain
>> characters outside of the ASCIIZ range.
>
> Under Linux, a file name contains bytes, most commonly representing
> UTF-8 sequences. So... an ASCIIZ string *can* contain that character,
> or at least a representation of it. Yet it cannot contain "\0".
>
I've seen a variation of UTF-8 that encodes U+0000 as 2 bytes so that a
zero byte can be used as a terminator.

It's therefore not impossible to have a version of Linux that allowed a
(Unicode) "\0" in a filename.
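
The variation described is presumably the one usually called "Modified
UTF-8" (used by Java, among others), where U+0000 is written as the two
bytes C0 80. A minimal sketch, with the hypothetical helper name
encode_nul_free:

```python
def encode_nul_free(text: str) -> bytes:
    """UTF-8 encode, but write U+0000 as the overlong pair C0 80."""
    # In standard UTF-8 a zero byte can only come from U+0000,
    # so a plain byte-level replace is safe here.
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")


encoded = encode_nul_free("te\x00st")

# Python's strict UTF-8 decoder rejects the overlong form
try:
    b"\xc0\x80".decode("utf-8")
    strict_accepts = True
except UnicodeDecodeError:
    strict_accepts = False

print(encoded, strict_accepts)
```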



Re: Why exception from os.path.exists()?

2018-06-07 Thread Tim Chase
On 2018-06-07 22:46, Chris Angelico wrote:
> On Thu, Jun 7, 2018 at 10:18 PM, Steven D'Aprano
>    3. http://localhost:8000/te%00st.html
> >>> Actually, I couldn't even get Chrome to make that request, so it
> >>> obviously was considered by the browser to be invalid.  

It doesn't matter whether Chrome or Firefox can make the request if
it can be made by opening the socket yourself with something as
simple as

  $ telnet example.com 80
  GET /te%00st.html HTTP/1.1
  Host: example.com

If that crashes the server, it's a problem, even if browsers try to
prevent it from happening by accident.

>> It works in Firefox, but Apache truncates the URL:
>>
>> Not Found
>> The requested URL /te was not found on this server.
>>
>> instead of te%00st.html

This is a sensible result, left up to each server to decide what to
do.

>> I wonder how many publicly facing web servers can be induced to
>> either crash, or serve the wrong content, this way?

I'm sure there are plenty. I mean, I discovered this a while back

https://mail.python.org/pipermail/python-list/2016-August/713373.html

and that's Microsoft running their own stack.  They seem to have
fixed that issue at that particular set of URLs, but a little probing
has turned it up elsewhere at microsoft.com since (for the record,
the first set of non-existent URLs return 404-not-found errors while
the second set of reserved filename URLs return
500-Server-Internal-Error pages).  Filename processing is full of
sharp edge-cases.

> Define "serve the wrong content". You could get the exact same
> content by asking for "te" instead of "te%00st.html"; what you've
> done is not significantly different from this:
> 
> http://localhost:8000/te?st.html
> 
> Is that a security problem too?

Depending on the server, it might allow injection for something like

 http://example.com/page%00cat+/etc/passwd

Or it might allow the request to be processed in an attack, but leave
the log files without the details:

 GET /innocent%00malicious_payload
 (where only the "/innocent" gets logged)

Or false data could get injected in log files

 
http://example.com/innocent%00%0a23.200.89.180+-+-+%5b07/Jun/2018%3a13%3a55%3a36+-0700%5d+%22GET+/nasty_porn.mov+HTTP/1.0%22+200+2326

(`host whitehouse.gov` = 23.200.89.180)

It all depends on the server and how the request is handled.
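
As a sketch of one defensive check a handler could make before it touches
the file system or the log: percent-decode the request path and refuse it
if any control character (NUL, newline, ...) appears. The function name is
hypothetical, and the choice of which characters to reject is an
assumption.

```python
from urllib.parse import unquote


def has_control_chars(raw_path: str) -> bool:
    """Return True if the percent-decoded path contains a control character."""
    decoded = unquote(raw_path)
    return any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in decoded)


suspicious = has_control_chars("/innocent%00%0afake+log+line")
```

A server could answer such requests with a 400 or 404 instead of passing
the decoded path on.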

-tkc






Re: Why exception from os.path.exists()?

2018-06-07 Thread Antoon Pardon
On 07-06-18 14:47, Steven D'Aprano wrote:
> On Thu, 07 Jun 2018 10:04:53 +0200, Antoon Pardon wrote:
>
>> On 07-06-18 05:55, Steven D'Aprano wrote:
>>> Python strings are rich objects which support the Unicode code point \0
>>> in them. The limitation of the Linux kernel that it relies on NULL-
>>> terminated byte strings is irrelevant to the question of what
>>> os.path.exists ought to do when given a path containing NUL. Other
>>> invalid path names return False.
>> It is not irrelevant. It makes the distinction clear between possible
>> values and impossible values. 
> That is simply wrong. It is wrong in principle, and it is wrong in 
> practice, for reasons already covered to death in this thread.
>
> It is *wrong in practice* because other impossible values don't raise 
> ValueError, they simply return False:
>
> - illegal pathnames under Windows, those containing special 
>   characters like ? > < * etc, simply return False;
>
> - even on Linux, illegal pathnames like "" (the empty string)
>   return False;
>
> - invalid pathnames with too many path components, or too many
>   characters in a single component, simply return False;
>
> - the os.path.exists() function is not documented as making 
>   a three-way split between "exists, doesn't exist and invalid";

So? Maybe we should reconsider the above behaviour?

>
> - and it isn't even true to say that NULL is illegal in pathnames:
>   there are at least five file systems that allow either NUL bytes:
>   FAT-8, MFS, HFS, or Unicode \0 code points: HFS Plus and Apple
>   File System.

That doesn't matter much. sqrt(-1) gives a ValueError, while there
are number domains for which it has a value.
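
To make the sqrt comparison concrete, a small sketch: math.sqrt refuses a
negative argument with ValueError, while cmath.sqrt works in a domain where
the result exists. The helper name is invented for the example.

```python
import cmath
import math


def real_sqrt_or_none(x):
    """Return math.sqrt(x), or None if x is outside the real domain."""
    try:
        return math.sqrt(x)
    except ValueError:
        return None


real_result = real_sqrt_or_none(-1)   # the real-valued function refuses
complex_result = cmath.sqrt(-1)       # the complex-valued one answers
```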


> And it is *wrong in principle* because in the most general case, there is 
> no way to tell which pathnames are valid or invalid without querying an 
> actual file system. In the case of Linux, any directory could be used as 
> a mount point.

I don't see how your first statement follows from that explanation. I don't
have a problem with needing to query the actual file system in order to find
out which pathnames are valid or invalid.

> Have you ever actually used this feature? When was the last time you?

This is irrelevant. You are now trying to argue the uselessness. The fact that
after consideration something turns out not very useful, is not a reason
to conclude that the factors that were taken into consideration were irrelevant.

Personally I don't use os.path.exists because it tries to shoehorn too many
possibilities into a boolean result. Do you think os.stat("\0") should
raise FileNotFoundError?

-- 
Antoon.




Re: Why exception from os.path.exists()?

2018-06-07 Thread Chris Angelico
On Thu, Jun 7, 2018 at 11:09 PM, Steven D'Aprano
 wrote:
> On Thu, 07 Jun 2018 22:46:09 +1000, Chris Angelico wrote:
>
>>> I wonder how many publicly facing web servers can be induced to either
>>> crash, or serve the wrong content, this way?
>>>
>>>
>> Define "serve the wrong content". You could get the exact same content
>> by asking for "te" instead of "te%00st.html";
>
> Perhaps so, but maybe you can bypass access controls to te and get access
> to it even though it is supposed to be private.
>
> This is a real vulnerability, called null-byte injection.
>
> One component of the system sees a piece of input, truncates it at the
> NULL, and validates the truncated input; then another component acts on
> the untruncated (and unvalidated) input.
>
> https://resources.infosecinstitute.com/null-byte-injection-php/
>
> https://capec.mitre.org/data/definitions/52.html
>
> Null-byte injection attacks have led to remote attackers executing
> arbitrary code. That's unlikely in this scenario, but given that most web
> servers are written in C, not Python, it is conceivable that they could
> do anything under a null-byte injection attack.

Fair point. So you should just truncate early and have done with it. Easy.

> Does the Python web server suffer from that vulnerability? I would be
> surprised if it were. But it can be induced to crash (an exception, not a
> seg fault) which is certainly a vulnerability.

"Certainly"? I'm dubious on that. This isn't C, where a segfault
usually comes after executing duff memory, and therefore it's
plausible to transform a segfault into a remote code execution
exploit. This is Python, where we have EXCEPTION handling. Tell me, is
this a vulnerability?

@app.route("/foo")
def foo():
    return "Kaboom", 500

What about this?

@app.route("/bar")
def bar():
    1/0
    return "Won't get here"

Put those into a Flask app and see what they do. One of them will
explicitly return a 500. The other will crash... and will return a
500. Is either of those a security problem? Now let's suppose a more
realistic version of the latter:

@app.route("/paginate/<int:size>")
def paginate(size):
    total_pages = total_data/size
    ...

Yes, it's a bug. If someone tries a page size of zero, it'll divide by
zero and bomb. Great. But how is it a vulnerability? It is a
properly-handled exception.

It's slightly different with SimpleHTTPServer, as it fails to properly
send back the 500. That would be a bug IMO. Even then, though, all you
can do is clog the server with unfinished requests - and you can do
that much more easily by just connecting and being really slow to send
data. (And I doubt that people are using SimpleHTTPServer in
security-sensitive contexts anyway.)

ChrisA


Re: Why exception from os.path.exists()?

2018-06-07 Thread Steven D'Aprano
On Thu, 07 Jun 2018 22:46:09 +1000, Chris Angelico wrote:

>> I wonder how many publicly facing web servers can be induced to either
>> crash, or serve the wrong content, this way?
>>
>>
> Define "serve the wrong content". You could get the exact same content
> by asking for "te" instead of "te%00st.html"; 

Perhaps so, but maybe you can bypass access controls to te and get access 
to it even though it is supposed to be private.

This is a real vulnerability, called null-byte injection.

One component of the system sees a piece of input, truncates it at the 
NULL, and validates the truncated input; then another component acts on 
the untruncated (and unvalidated) input.

https://resources.infosecinstitute.com/null-byte-injection-php/

https://capec.mitre.org/data/definitions/52.html

Null-byte injection attacks have led to remote attackers executing 
arbitrary code. That's unlikely in this scenario, but given that most web 
servers are written in C, not Python, it is conceivable that they could 
do anything under a null-byte injection attack.
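
A toy simulation of the mismatch described above, in pure Python. Both
functions and the allow-list are hypothetical stand-ins: one mimics a
component that only sees the text up to the first NUL, the other a
component that receives the whole input.

```python
ALLOWED = {"/public/index.html"}


def validate(path: str) -> bool:
    """Stand-in for a validator that stops reading at the first NUL."""
    return path.split("\0", 1)[0] in ALLOWED


def act_on(path: str) -> str:
    """Stand-in for a later component that uses the untruncated input."""
    return path


request = "/public/index.html\0/../../secret"
accepted = validate(request)     # the truncated view looks harmless
used = act_on(request)           # but the full string is what gets used
```

The check passes on the truncated prefix, while the part after the NUL was
never validated at all.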

Does the Python web server suffer from that vulnerability? I would be 
surprised if it were. But it can be induced to crash (an exception, not a 
seg fault) which is certainly a vulnerability.

Since people are unlikely to use this web server to serve mission 
critical public services over the internet, the severity is likely low. 
Nevertheless, it is still a real vulnerability.



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Why exception from os.path.exists()?

2018-06-07 Thread Chris Angelico
On Thu, Jun 7, 2018 at 10:13 PM, Steven D'Aprano
 wrote:
> On Thu, 07 Jun 2018 19:47:03 +1000, Chris Angelico wrote:
>
>> To be fair, it's somewhat unideal behaviour - I would prefer to see an
>> HTTP 500 come back if the server crashes - but I can't see that that's a
>> security problem.
>
> You think that being able to remotely crash a webserver isn't a security
> issue?
>
>
> If Denial Of Service isn't a security issue in your eyes, what would it
> take? "Armed men burst into your house and shoot you"?
>
> *only half a wink*
>

By "crash" I mean that the request handler popped out an exception.
The correct behaviour is to send back a 500 and go back to handling
requests; with the extremely simple server given in that example, it
fails to send back the 500, but it DOES go back to handling requests.
So it's not a DOS. In any real server environment, this wouldn't have
any significant impact; even in this trivially simple server, the only
way you could hurt the server is by spamming enough of these that it
runs out of file handles for sockets or something.

ChrisA


Re: Why exception from os.path.exists()?

2018-06-07 Thread Steven D'Aprano
On Thu, 07 Jun 2018 10:04:53 +0200, Antoon Pardon wrote:

> On 07-06-18 05:55, Steven D'Aprano wrote:
>> Python strings are rich objects which support the Unicode code point \0
>> in them. The limitation of the Linux kernel that it relies on NULL-
>> terminated byte strings is irrelevant to the question of what
>> os.path.exists ought to do when given a path containing NUL. Other
>> invalid path names return False.
> 
> It is not irrelevant. It makes the distinction clear between possible
> values and impossible values. 

That is simply wrong. It is wrong in principle, and it is wrong in 
practice, for reasons already covered to death in this thread.

It is *wrong in practice* because other impossible values don't raise 
ValueError, they simply return False:

- illegal pathnames under Windows, those containing special 
  characters like ? > < * etc, simply return False;

- even on Linux, illegal pathnames like "" (the empty string)
  return False;

- invalid pathnames with too many path components, or too many
  characters in a single component, simply return False;

- the os.path.exists() function is not documented as making 
  a three-way split between "exists, doesn't exist and invalid";

- and it isn't even true to say that NULL is illegal in pathnames:
  there are at least five file systems that allow either NUL bytes:
  FAT-8, MFS, HFS, or Unicode \0 code points: HFS Plus and Apple
  File System.

And it is *wrong in principle* because in the most general case, there is 
no way to tell which pathnames are valid or invalid without querying an 
actual file system. In the case of Linux, any directory could be used as 
a mount point.

Is "/mnt/some?file" valid or invalid? If an NTFS file system is mounted 
on /mnt, it is invalid; if an ext4 file system is mounted there, it is 
valid; if there's nothing mounted there, the question is impossible to 
answer.


>> As a Python programmer, how does treating NUL specially make our life
>> better?
> 
> By treating possible path values differently from impossible path
> values.

But it doesn't do that. "Pathnames cannot contain NUL" is a falsehood 
that programmers wrongly believe about paths. HFS Plus and Apple File 
System support NULs in paths.

So what it does is wrongly single out one *POSSIBLE* path value to raise 
an exception, while other so-called "impossible" path values simply 
return False.

But in the spirit of compromise, okay, let's ignore the existence of file 
systems like HFS which allow NUL. Apart from Mac users, who uses them 
anyway? Let's pretend that every file system in existence, now and into 
the future, will prohibit NULs in paths.

Have you ever actually used this feature? When was the last time you 
wrote code like this?

try:
    flag = os.path.exists(pathname)
except ValueError:
    handle_null_in_path()
else:
    if flag:
        handle_file()
    else:
        handle_invalid_path_or_no_such_file()

I want to see actual, real code used in production, not made up code 
snippets, that demonstrate that this is a useful distinction to make.

Until such time that somebody shows me an actual real-world use-case for 
wanting to make this distinction for NULs and NULs alone, I call bullshit.



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Why exception from os.path.exists()?

2018-06-07 Thread Chris Angelico
On Thu, Jun 7, 2018 at 10:18 PM, Steven D'Aprano
 wrote:
> On Thu, 07 Jun 2018 13:47:07 +0300, Marko Rauhamaa wrote:
>
>> Chris Angelico :
>>
>>> On Thu, Jun 7, 2018 at 7:29 PM, Marko Rauhamaa 
>>> wrote:
>>>>   3. http://localhost:8000/te%00st.html
>>>>
>>>>  => The server crashes with a ValueError and the TCP connection is
>>>> reset


>>> Actually, I couldn't even get Chrome to make that request, so it
>>> obviously was considered by the browser to be invalid.
>>
>> Wow! Why on earth?
>
> It works in Firefox, but Apache truncates the URL:
>
>
> Not Found
> The requested URL /te was not found on this server.
>
>
> instead of te%00st.html
>
> I wonder how many publicly facing web servers can be induced to either
> crash, or serve the wrong content, this way?
>

Define "serve the wrong content". You could get the exact same content
by asking for "te" instead of "te%00st.html"; what you've done is not
significantly different from this:

http://localhost:8000/te?st.html

Is that a security problem too?

ChrisA


Re: Why exception from os.path.exists()?

2018-06-07 Thread Steven D'Aprano
On Thu, 07 Jun 2018 13:47:07 +0300, Marko Rauhamaa wrote:

> Chris Angelico :
> 
>> On Thu, Jun 7, 2018 at 7:29 PM, Marko Rauhamaa 
>> wrote:
>>>   3. http://localhost:8000/te%00st.html
>>>
>>>  => The server crashes with a ValueError and the TCP connection is
>>> reset
>>>
>>>
>> Actually, I couldn't even get Chrome to make that request, so it
>> obviously was considered by the browser to be invalid.
> 
> Wow! Why on earth?

It works in Firefox, but Apache truncates the URL:


Not Found
The requested URL /te was not found on this server.


instead of te%00st.html

I wonder how many publicly facing web servers can be induced to either 
crash, or serve the wrong content, this way?



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Why exception from os.path.exists()?

2018-06-07 Thread Steven D'Aprano
On Thu, 07 Jun 2018 19:47:03 +1000, Chris Angelico wrote:

> To be fair, it's somewhat unideal behaviour - I would prefer to see an
> HTTP 500 come back if the server crashes - but I can't see that that's a
> security problem.

You think that being able to remotely crash a webserver isn't a security 
issue?


If Denial Of Service isn't a security issue in your eyes, what would it 
take? "Armed men burst into your house and shoot you"?

*only half a wink*



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Why exception from os.path.exists()?

2018-06-07 Thread Chris Angelico
On Thu, Jun 7, 2018 at 8:47 PM, Marko Rauhamaa  wrote:
> Chris Angelico :
>
>> On Thu, Jun 7, 2018 at 7:29 PM, Marko Rauhamaa  wrote:
>>>   3. http://localhost:8000/te%00st.html
>>>
>>>  => The server crashes with a ValueError and the TCP connection is
>>> reset
>>>
>> it's somewhat unideal behaviour - I would prefer to see an HTTP 500
>> come back if the server crashes - but I can't see that that's a
>> security problem. Just a QOS issue, wherein you might get a 500 rather
>> than a 404 for certain requests.
>
> It's a demonstration of how this innocent-looking problem can lead to
> surprising and even serious consequences.
>
> The given URI is well-formed and should not give any particular trouble
> to any HTTP server.

You haven't demonstrated a security problem. Don't claim security
risks unless you can show there's at least a possibility of that;
otherwise, it's just FUD.

ChrisA


Re: Why exception from os.path.exists()?

2018-06-07 Thread Marko Rauhamaa
Chris Angelico :

> On Thu, Jun 7, 2018 at 7:29 PM, Marko Rauhamaa  wrote:
>>   3. http://localhost:8000/te%00st.html
>>
>>  => The server crashes with a ValueError and the TCP connection is
>> reset
>>
>
> Actually, I couldn't even get Chrome to make that request, so it
> obviously was considered by the browser to be invalid.

Wow! Why on earth?

> it's somewhat unideal behaviour - I would prefer to see an HTTP 500
> come back if the server crashes - but I can't see that that's a
> security problem. Just a QOS issue, wherein you might get a 500 rather
> than a 404 for certain requests.

It's a demonstration of how this innocent-looking problem can lead to
surprising and even serious consequences.

The given URI is well-formed and should not give any particular trouble
to any HTTP server.


Marko


Re: Why exception from os.path.exists()?

2018-06-07 Thread Antoon Pardon
On 07-06-18 11:29, Marko Rauhamaa wrote:
> Antoon Pardon :
>
>> On 07-06-18 05:55, Steven D'Aprano wrote:
>>> As a Python programmer, how does treating NUL specially make our life
>>> better?
>> By treating possible path values differently from impossible path
>> values.
> There are all kinds of impossibility. The os.stat() reports those
> impossibilities via an OSError exception. It's just that
> os.path.exists() converts the OSError exception into a False return
> value. A ValueError is raised by the Python os.stat() wrapper to
> indicate that it can't even deliver the request to the kernel.
>
> The application programmer doesn't give an iota who determined the
> impossibility of a pathname.

So? The fact that the application programmer doesn't give an iota who
determined the impossibility of a pathname, doesn't imply he is
equally unconcerned about the specific impossibility he ran into.

> Unfortunately, os.path.exists() forces the
> distinction on the application.

No it doesn't. It forces the distinction between two different kinds
of impossibilities, but you don't have to care where they originate
from.

>  If I have to be prepared to catch a
> ValueError from os.path.exists(), what added value does os.path.exists()
> give on top of os.stat()? The whole point of os.path.exists() is
>
>   1. To provide an operating-system-independent abstraction.
>
>   2. To provide a boolean interface instead of an exception interface.

Maybe trying to provide such an interface is inherently flawed. Telling
me that a path doesn't exist because of a permission problem is IMO not
a good idea.
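
To show what a non-boolean answer could look like, here is a sketch built
directly on os.stat(). The function name and the category labels are
invented for the example; they are not part of the os module.

```python
import os


def path_status(path):
    """Classify a path as 'exists', 'missing', 'denied', 'invalid' or 'error'."""
    try:
        os.stat(path)
    except (FileNotFoundError, NotADirectoryError):
        return "missing"
    except PermissionError:
        return "denied"
    except ValueError:
        # Raised by the Python wrapper itself, e.g. for an embedded NUL.
        return "invalid"
    except OSError:
        # Anything else the OS reported (ELOOP, ENAMETOOLONG, EIO, ...).
        return "error"
    return "exists"
```

With this, a permission problem and an impossible name are reported as
such instead of being folded into "doesn't exist".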

--
Antoon.




Re: Why exception from os.path.exists()?

2018-06-07 Thread Chris Angelico
On Thu, Jun 7, 2018 at 7:29 PM, Marko Rauhamaa  wrote:
> This is a security risk. Here is a brief demonstration. Copy the example
> HTTP server from:
>
>    <https://docs.python.org/3/library/http.server.html?highlight=http#http.server.SimpleHTTPRequestHandler>
>
> Run the server. Try these URLs in your browser:
>
>   1. http://localhost:8000/
>
>  => The directory listing is provided
>
>   2. http://localhost:8000/test.html
>
>  => A file is served or an HTTP error response (404) is generated
>
>   3. http://localhost:8000/te%00st.html
>
>  => The server crashes with a ValueError and the TCP connection is
> reset
>

Actually, I couldn't even get Chrome to make that request, so it
obviously was considered by the browser to be invalid. Doing the
request with curl produced a traceback on the server and an empty
response in the client. (And then the server returns to handling
requests normally.) How is this a security risk, exactly? To be fair,
it's somewhat unideal behaviour - I would prefer to see an HTTP 500
come back if the server crashes - but I can't see that that's a
security problem. Just a QOS issue, wherein you might get a 500 rather
than a 404 for certain requests.

ChrisA


Re: Why exception from os.path.exists()?

2018-06-07 Thread Marko Rauhamaa
Marko Rauhamaa :

> This is a security risk. Here is a brief demonstration. Copy the example
> HTTP server from:
>
>    <https://docs.python.org/3/library/http.server.html?highlight=http#http.server.SimpleHTTPRequestHandler>
>
> [...]
>
>   3. http://localhost:8000/te%00st.html
>
>  => The server crashes with a ValueError and the TCP connection is
> reset

An exercise for the reader: provide a fix for the example server so the
request returns a 404 response just like any other nonexistent resource.
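
One possible answer to the exercise, offered only as a sketch: subclass
the stock handler and answer 404 before the path reaches the file system.
The class and helper names are invented, and checking in send_head() is
just one of several places where the test could go.

```python
import urllib.parse
from http import HTTPStatus
from http.server import SimpleHTTPRequestHandler


def contains_nul(raw_path: str) -> bool:
    """True if the percent-decoded request path contains a NUL."""
    return "\x00" in urllib.parse.unquote(raw_path)


class NulSafeHandler(SimpleHTTPRequestHandler):
    """SimpleHTTPRequestHandler that treats NUL paths as nonexistent."""

    def send_head(self):
        if contains_nul(self.path):
            self.send_error(HTTPStatus.NOT_FOUND, "File not found")
            return None  # do_GET/do_HEAD stop when no file object is returned
        return super().send_head()
```

It would be used in place of SimpleHTTPRequestHandler when constructing
the server object.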


Marko


Re: Why exception from os.path.exists()?

2018-06-07 Thread Marko Rauhamaa
Antoon Pardon :

> On 07-06-18 05:55, Steven D'Aprano wrote:
>> As a Python programmer, how does treating NUL specially make our life
>> better?
>
> By treating possible path values differently from impossible path
> values.

There are all kinds of impossibility. The os.stat() reports those
impossibilities via an OSError exception. It's just that
os.path.exists() converts the OSError exception into a False return
value. A ValueError is raised by the Python os.stat() wrapper to
indicate that it can't even deliver the request to the kernel.

The application programmer doesn't give an iota who determined the
impossibility of a pathname. Unfortunately, os.path.exists() forces the
distinction on the application. If I have to be prepared to catch a
ValueError from os.path.exists(), what added value does os.path.exists()
give on top of os.stat()? The whole point of os.path.exists() is

  1. To provide an operating-system-independent abstraction.

  2. To provide a boolean interface instead of an exception interface.
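
A minimal sketch of the purely boolean interface described in the two
points above: the ValueError from the wrapper is treated like any OSError
from the kernel. The name exists_quiet is hypothetical.

```python
import os


def exists_quiet(path):
    """Return False for any path that cannot be stat()ed, NUL included."""
    try:
        os.stat(path)
    except (OSError, ValueError):
        return False
    return True


nul_result = exists_quiet("sp\0am")
```

The caller then needs no try/except at all, whoever decided that the path
was impossible.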



This is a security risk. Here is a brief demonstration. Copy the example
HTTP server from:

   <https://docs.python.org/3/library/http.server.html?highlight=http#http.server.SimpleHTTPRequestHandler>

Run the server. Try these URLs in your browser:

  1. http://localhost:8000/

 => The directory listing is provided

  2. http://localhost:8000/test.html

 => A file is served or an HTTP error response (404) is generated

  3. http://localhost:8000/te%00st.html

 => The server crashes with a ValueError and the TCP connection is
reset


Marko


Re: Why exception from os.path.exists()?

2018-06-07 Thread Antoon Pardon
On 07-06-18 05:55, Steven D'Aprano wrote:
> Python strings are rich objects which support the Unicode code point \0 
> in them. The limitation of the Linux kernel that it relies on NULL-
> terminated byte strings is irrelevant to the question of what 
> os.path.exists ought to do when given a path containing NUL. Other 
> invalid path names return False.

It is not irrelevant. It makes the distinction clear between possible
values and impossible values. Now you personally may find that distinction
of minor importance but it is a relevant distinction in discussing how
to treat it.

> As a Python programmer, how does treating NUL specially make our life 
> better?

By treating possible path values differently from impossible path values.

-- 
Antoon.



Re: Why exception from os.path.exists()?

2018-06-07 Thread Chris Angelico
On Thu, Jun 7, 2018 at 1:55 PM, Steven D'Aprano
 wrote:
> On Tue, 05 Jun 2018 23:27:16 +1000, Chris Angelico wrote:
>
>> And an ASCIIZ string cannot contain a byte value of zero. The parallel
>> is exact.
>
> Why should we, as Python programmers, care one whit about ASCIIZ strings?
> They're not relevant. You might as well say that file names cannot
> contain the character "π" because ASCIIZ strings don't support it.
>
> No they don't, and yet nevertheless file names can and do contain
> characters outside of the ASCIIZ range.

Under Linux, a file name contains bytes, most commonly representing
UTF-8 sequences. So... an ASCIIZ string *can* contain that character,
or at least a representation of it. Yet it cannot contain "\0".

ChrisA


Re: Why exception from os.path.exists()?

2018-06-06 Thread Steven D'Aprano
On Tue, 05 Jun 2018 23:27:16 +1000, Chris Angelico wrote:

> And an ASCIIZ string cannot contain a byte value of zero. The parallel
> is exact.

Why should we, as Python programmers, care one whit about ASCIIZ strings? 
They're not relevant. You might as well say that file names cannot 
contain the character "π" because ASCIIZ strings don't support it.

No they don't, and yet nevertheless file names can and do contain 
characters outside of the ASCIIZ range.

Python strings are rich objects which support the Unicode code point \0 
in them. The limitation of the Linux kernel that it relies on NULL-
terminated byte strings is irrelevant to the question of what 
os.path.exists ought to do when given a path containing NUL. Other 
invalid path names return False.

As a Python programmer, how does treating NUL specially make our life 
better?

I don't know what the implementation of os.path.exists is precisely, but 
in pseudocode I expect it is something like this:


if "\0" in pathname:
    panic("OH NOES A NUL WHATEVER SHALL WE DO?!?!?!")
else:
    ask the OS to do a stat on pathname
    if an error occurs:
        return False
    else:
        return True


Why not just return False instead of panicking?





-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Why exception from os.path.exists()?

2018-06-06 Thread Steven D'Aprano
On Tue, 05 Jun 2018 17:28:24 +0200, Peter J. Holzer wrote:
[...]
> If a disk with a file system which allows embedded NUL characters is
> mounted on Linux (let's for the sake of the argument assume it is HFS+,
> although I have to admit that I don't know anything about the internals
> of that filesystem), then the low level filesystem code has to map that
> character to something else. Even the generic filesystem code of the
> kernel will never see that NUL character,

Even if this were true, why is it even the tiniest bit relevant to what 
os.path.exists() does when given a path containing a NUL byte?


> let alone the user space. As
> far as the OS is concerned, that file doesn't contain a NUL character.

I don't care about "as far as the OS". I care about users, people like 
me. If I say "Here's a file called "sp\0am" then I don't care what the OS 
does, or the FS driver, or the disk hardware. I couldn't care less what 
the actual byte pattern on the disk is.

If you told me that the pattern of bytes representing that filename was 
0x0102030405 then I'd be momentarily impressed by the curious pattern and 
then do my best to immediately forget all about it.

As a Python programmer, *why do you care* about NULs? How does this 
special treatment make your life as a Python programmer better?


> The whole system (except for some low-level FS-dependent code) will
> always only see the mapped name.

Yes. So what? That's *already the case*. Even the Python string you pass to 
os.path.exists is already mapped, and errors from the kernel are mapped 
to False. Why should NUL be treated differently?

Typical Linux file systems (ext3, ext4, btrfs, ReiserFS etc) don't 
support Unicode, only bytes 0...255, but we can query "invalid" file 
names containing characters like δ ж or ∆, without any problem. We don't 
get ValueError just because of some irrelevant technical detail that the 
file system doesn't support characters outside of the range of bytes 
1...255 (excluding 47). We can do this because Python seamlessly maps 
Unicode to bytes and back again.
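
A small sketch of why that mapping works, assuming the file system
encoding is UTF-8 (it may be something else on a given system): every byte
of a non-ASCII character's UTF-8 form is 0x80 or above, so it can never
collide with the two forbidden byte values 0 and 47. The function name is
made up.

```python
def utf8_bytes_are_allowed(name: str) -> bool:
    """True if the UTF-8 form of name has no NUL byte (0) and no '/' (47)."""
    return not any(byte in (0, 47) for byte in name.encode("utf-8"))


greek_ok = utf8_bytes_are_allowed("δж∆")
```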

You may have heard of a little-known operating system called "Windows", 
which defaults to NTFS as its file system. I'm told that there are a few 
people who use this file system. Even under Linux, you might have 
(knowingly or unknowingly) used a network file system or storage device 
that used NTFS under the hood.

If so, then every time you query a filename, even an ordinary looking one 
like "foo", you could be dealing with multiple NUL bytes, as the NTFS 
file system (even under Linux!) uses Unicode file names encoded with 
UTF-16. There's a good chance that EVERY filename you've used on a NAS 
device or network drive has included embedded NUL bytes.
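
For reference, this is what the UTF-16 (little-endian) form of an ordinary
ASCII name looks like. It only illustrates the encoding; it says nothing
about what any particular driver or kernel interface exposes.

```python
encoded = "foo".encode("utf-16-le")

# Each ASCII character becomes two bytes, the second of which is zero.
nul_count = encoded.count(0)
```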

You've painted a pretty picture of the supposed confusion and difficulty 
such NUL bytes would cause, but it's all nonsense. We already can 
seamlessly and transparently interact with file systems where file names 
include NUL bytes under Linux.

BUT even if what you said was true, that Linux cannot deal with NUL bytes 
in file names even with driver support, even if passing a NUL byte to the 
Linux kernel would cause the fall of human civilization, that STILL 
wouldn't require us to raise ValueError from os.path.exists!




-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Why exception from os.path.exists()?

2018-06-05 Thread eryk sun
On Tue, Jun 5, 2018 at 3:28 PM, Peter J. Holzer  wrote:
>
> Now, if MacOS uses something like that, this is a different matter.
> Presumably (since HFS+ is a native file system) the kernel deals with
> NUL characters in a straightforward manner. It might even have a
> (non-POSIX) API to expose such filenames. Even if it hasn't, presumably
> the mapping back and forth is done in a very low level library used by
> all (or most) of the applications, so that they all show consistently
> the same filename.

The Linux subsystem in Windows 10 has to use character escaping. The
root file system is stored in the NTFS directory
"%LocalAppData%\Packages\\LocalState\rootfs". It
escapes invalid NTFS characters (as implemented by the ntfs.sys
driver) using the hex code prefixed by "#". Thus "#" itself has to be
escaped as "#0023". For example:

$ touch '\*?<>|#'
$ ls '\*?<>|#'
\*?<>|#

With CMD in the above directory, we can see the real filename:

> dir /b #*
#005C#002A#003F#003C#003E#007C#0023
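
The escaping seen in that listing can be imitated with a few lines of
Python. This is only a sketch that reproduces the output above: the
function name is invented, and the set of escaped characters is an
assumption limited to the ones used in the example.

```python
def wsl_escape(name, invalid="\\*?<>|#"):
    """Replace each character in `invalid` by '#' plus its 4-digit hex code."""
    return "".join(
        f"#{ord(ch):04X}" if ch in invalid else ch
        for ch in name
    )


escaped = wsl_escape("\\*?<>|#")
```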
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why exception from os.path.exists()?

2018-06-05 Thread Peter J. Holzer
On 2018-06-05 07:37:34 +, Steven D'Aprano wrote:
> On Mon, 04 Jun 2018 22:13:47 +0200, Peter J. Holzer wrote:
> > On 2018-06-04 13:23:59 +, Steven D'Aprano wrote:
> >> I don't know whether or not the Linux OS is capable of accessing
> >> files with embedded NULs in the file name. But Mac OS is capable of
> >> doing so, so it should be possible. Wikipedia says:
> >> 
> >> "HFS Plus mandates support for an escape sequence to allow
> >> arbitrary Unicode. Users of older software might see the escape
> >> sequences instead of the desired characters."
> > 
> > I don't know about MacOS. In Linux there is no way to pass a
> > filename with an embedded '\0' (or a '/' which is not path
> > separator) between the kernel and user space. So if a filesystem
> > contained such a filename, the kernel would have to map it (via an
> > escape sequence or some other mechanism) to a different file name.
> > Which of course means that - from the perspective of any user space
> > process - the filename doesn't contain a '\0' or '/'.
> 
> That's an invalid analogy. According to that analogy, Python strings
> don't contain ASCII NULs, because you have to use an escape mechanism
> to insert them:
> 
> string = "Is this \0 not a NULL?"
> 
> 
> But we know that Python strings are not NUL-terminated and can contain
> NUL. It's just another character.

I think that's a bad analogy.

The escape mechanism for string literals is mostly for convenience of
the programmer. It's there to make the program's source code more
readable (and yes, also easier to write). But at run time the  \0
character is just that: A character with the value 0.
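
A small check of that point, using the string literal from the quoted
message (the variable names are only for illustration):

```python
# The escape sequence exists only in the source text; the resulting
# string object holds an ordinary character whose code point is 0.
s = "Is this \0 not a NULL?"

nul_index = s.index("\0")
print(nul_index)            # 8
print(ord(s[nul_index]))    # 0
print(len(s))               # 21, the escape counts as one character
```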

If a disk with a file system which allows embedded NUL characters is
mounted on Linux (let's for the sake of the argument assume it is HFS+,
although I have to admit that I don't know anything about the internals
of that filesystem), then the low level filesystem code has to map that
character to something else. Even the generic filesystem code of the
kernel will never see that NUL character, let alone the user space. As
far as the OS is concerned, that file doesn't contain a NUL character.
The whole system (except for some low-level FS-dependent code) will
always only see the mapped name.

If some application (which might be an interpreter, or it might be a
graphics program, for example) decides that it knows better what the
"real" filename is and reverses that mapping, it can do so - but it
would be very confusing because it would use a different file name than
the rest of the system. The user would see one file name with ls, but
would have to use a different filename in the application. The
application would show one filename in its "save" dialog, but the OS's
file manager would show another. Not a good idea, especially as the
benefits of such a scheme would be extremely narrow (you could share an
HFS+ formatted USB disk between MacOS and Linux with filenames with
embedded NULs and that application would let you use the same filenames
as you would use on MacOS).

Now, if MacOS uses something like that, this is a different matter.
Presumably (since HFS+ is a native file system) the kernel deals with
NUL characters in a straightforward manner. It might even have a
(non-POSIX) API to expose such filenames. Even if it hasn't, presumably
the mapping back and forth is done in a very low level library used by
all (or most) of the applications, so that they all show consistently
the same filename.

But Linux isn't MacOS. On Linux there are no filenames with embedded
NULs, even if you mount an HFS+ disk and even if some application
decides to internally remap filenames in a way that they can contain NUL
characters.

hp

-- 
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | h...@hjp.at        | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why exception from os.path.exists()?

2018-06-05 Thread Chris Angelico
On Tue, Jun 5, 2018 at 11:11 PM, Steven D'Aprano
 wrote:
> On Tue, 05 Jun 2018 20:15:01 +1000, Chris Angelico wrote:
>
>> On Tue, Jun 5, 2018 at 5:37 PM, Steven D'Aprano
>>  wrote:
>>> On Mon, 04 Jun 2018 22:13:47 +0200, Peter J. Holzer wrote:
>>>
>>>> On 2018-06-04 13:23:59 +, Steven D'Aprano wrote:
>>> [...]
>>>
>>>>> I don't know whether or not the Linux OS is capable of accessing
>>>>> files with embedded NULs in the file name. But Mac OS is capable of
>>>>> doing so, so it should be possible. Wikipedia says:
>>>>>
>>>>> "HFS Plus mandates support for an escape sequence to allow arbitrary
>>>>> Unicode. Users of older software might see the escape sequences
>>>>> instead of the desired characters."
>>>>
>>>> I don't know about MacOS. In Linux there is no way to pass a filename
>>>> with an embedded '\0' (or a '/' which is not path separator) between
>>>> the kernel and user space. So if a filesystem contained such a
>>>> filename, the kernel would have to map it (via an escape sequence or
>>>> some other mechanism) to a different file name. Which of course means
>>>> that - from the perspective of any user space process - the filename
>>>> doesn't contain a '\0' or '/'.
>>>
>>> That's an invalid analogy. According to that analogy, Python strings
>>> don't contain ASCII NULs, because you have to use an escape mechanism
>>> to insert them:
>>>
>>> string = "Is this \0 not a NULL?"
>>>
>>>
>>> But we know that Python strings are not NUL-terminated and can contain
>>> NUL. It's just another character.
>>>
>>>
>> No; by that analogy, a Python string cannot contain a non-Unicode
>> character. Here's a challenge: create a Python string that contains a
>> character that isn't part of the Universal Character Set.
>
> Huh? In what way is that the analogy being made? Your challenge is
> impossible from pure Python, equivalent to "create a Python bytes object
> that contains a byte greater than 255". The challenge is rigged to be
> doomed to fail.

And an ASCIIZ string cannot contain a byte value of zero. The parallel is exact.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why exception from os.path.exists()?

2018-06-05 Thread Steven D'Aprano
On Tue, 05 Jun 2018 20:15:01 +1000, Chris Angelico wrote:

> On Tue, Jun 5, 2018 at 5:37 PM, Steven D'Aprano
>  wrote:
>> On Mon, 04 Jun 2018 22:13:47 +0200, Peter J. Holzer wrote:
>>
>>> On 2018-06-04 13:23:59 +, Steven D'Aprano wrote:
>> [...]
>>
>>>> I don't know whether or not the Linux OS is capable of accessing
>>>> files with embedded NULs in the file name. But Mac OS is capable of
>>>> doing so, so it should be possible. Wikipedia says:
>>>>
>>>> "HFS Plus mandates support for an escape sequence to allow arbitrary
>>>> Unicode. Users of older software might see the escape sequences
>>>> instead of the desired characters."
>>>
>>> I don't know about MacOS. In Linux there is no way to pass a filename
>>> with an embedded '\0' (or a '/' which is not path separator) between
>>> the kernel and user space. So if a filesystem contained such a
>>> filename, the kernel would have to map it (via an escape sequence or
>>> some other mechanism) to a different file name. Which of course means
>>> that - from the perspective of any user space process - the filename
>>> doesn't contain a '\0' or '/'.
>>
>> That's an invalid analogy. According to that analogy, Python strings
>> don't contain ASCII NULs, because you have to use an escape mechanism
>> to insert them:
>>
>> string = "Is this \0 not a NULL?"
>>
>>
>> But we know that Python strings are not NUL-terminated and can contain
>> NUL. It's just another character.
>>
>>
> No; by that analogy, a Python string cannot contain a non-Unicode
> character. Here's a challenge: create a Python string that contains a
> character that isn't part of the Universal Character Set.

Huh? In what way is that the analogy being made? Your challenge is 
impossible from pure Python, equivalent to "create a Python bytes object 
that contains a byte greater than 255". The challenge is rigged to be 
doomed to fail.

That's not the case when it comes to \0 in file names: we know that Mac 
OS can do it, we know HFS and Apple FS support NUL in file names. We have 
an existence proof that it is possible.

(Although in your case, it is conceivable that using C you might be able 
to solve the challenge: create a string using the UCS-4 implementation 
(32-bit code units), then modify some code unit to be a value outside of 
the 21-bit range supported by Unicode. But that would require low-level 
hacking, it isn't supported by the language or the interpreter except 
maybe via ctypes.)
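
For reference, the pure-Python limits mentioned above can be checked
directly. This is only a sketch of the two "rigged" challenges; the variable
names are illustrative.

```python
import sys

# The largest code point a str can hold is the top of the Unicode range.
print(hex(sys.maxunicode))      # 0x10ffff

# NUL is an ordinary code point and is accepted without complaint.
nul = chr(0)

# One past the top of the range is rejected by chr() ...
try:
    chr(sys.maxunicode + 1)
    rejected_chr = False
except ValueError:
    rejected_chr = True

# ... just as a byte value above 255 is rejected by bytes().
try:
    bytes([256])
    rejected_bytes = False
except ValueError:
    rejected_bytes = True

print(rejected_chr, rejected_bytes)   # True True
```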

Apple FS, HFS and HFS Plus support \0 as a valid Unicode character. The 
Mac OS kernel has an escape mechanism to allow user code to include \0 
characters in pathnames, just as Python has an escape mechanism to allow 
user code to include \0 in strings.

There's no such escape mechanism for characters outside of Unicode.



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why exception from os.path.exists()?

2018-06-05 Thread Chris Angelico
On Tue, Jun 5, 2018 at 5:37 PM, Steven D'Aprano
 wrote:
> On Mon, 04 Jun 2018 22:13:47 +0200, Peter J. Holzer wrote:
>
>> On 2018-06-04 13:23:59 +, Steven D'Aprano wrote:
> [...]
>
>>> I don't know whether or not the Linux OS is capable of accessing files
>>> with embedded NULs in the file name. But Mac OS is capable of doing so,
>>> so it should be possible. Wikipedia says:
>>>
>>> "HFS Plus mandates support for an escape sequence to allow arbitrary
>>> Unicode. Users of older software might see the escape sequences instead
>>> of the desired characters."
>>
>> I don't know about MacOS. In Linux there is no way to pass a filename
>> with an embedded '\0' (or a '/' which is not path separator) between the
>> kernel and user space. So if a filesystem contained such a filename, the
>> kernel would have to map it (via an escape sequence or some other
>> mechanism) to a different file name. Which of course means that - from
>> the perspective of any user space process - the filename doesn't contain
>> a '\0' or '/'.
>
> That's an invalid analogy. According to that analogy, Python strings
> don't contain ASCII NULs, because you have to use an escape mechanism to
> insert them:
>
> string = "Is this \0 not a NULL?"
>
>
> But we know that Python strings are not NUL-terminated and can contain
> NUL. It's just another character.
>

No; by that analogy, a Python string cannot contain a non-Unicode
character. Here's a challenge: create a Python string that contains a
character that isn't part of the Universal Character Set.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why exception from os.path.exists()?

2018-06-05 Thread Steven D'Aprano
On Mon, 04 Jun 2018 22:13:47 +0200, Peter J. Holzer wrote:

> On 2018-06-04 13:23:59 +, Steven D'Aprano wrote:
[...]

>> I don't know whether or not the Linux OS is capable of accessing files
>> with embedded NULs in the file name. But Mac OS is capable of doing so,
>> so it should be possible. Wikipedia says:
>> 
>> "HFS Plus mandates support for an escape sequence to allow arbitrary
>> Unicode. Users of older software might see the escape sequences instead
>> of the desired characters."
> 
> I don't know about MacOS. In Linux there is no way to pass a filename
> with an embedded '\0' (or a '/' which is not path separator) between the
> kernel and user space. So if a filesystem contained such a filename, the
> kernel would have to map it (via an escape sequence or some other
> mechanism) to a different file name. Which of course means that - from
> the perspective of any user space process - the filename doesn't contain
> a '\0' or '/'.

That's an invalid analogy. According to that analogy, Python strings 
don't contain ASCII NULs, because you have to use an escape mechanism to 
insert them:

string = "Is this \0 not a NULL?"


But we know that Python strings are not NUL-terminated and can contain 
NUL. It's just another character.


-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why exception from os.path.exists()?

2018-06-04 Thread eryk sun
On Sat, Jun 2, 2018 at 11:28 AM, Chris Angelico  wrote:
>
> I also can't find anything about path names there. What does POSIX say
> about the concept of relative paths? Does Windows comply with that?

Certainly Windows file-system paths are not POSIX compatible. Seven
path types are supported:

* Extended Local Device (\\?\)
* Local Device (\\.\)
* UNC
* Drive Absolute
* Drive Relative
* Rooted
* Relative

Extended local-device paths only allow backslash as a path separator.
The others allow either backslash or slash. I doubt POSIX would allow
magically reserved DOS device names in every directory or stripping of
trailing dots and spaces from filenames.
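
A rough sketch of how the seven types could be told apart, using only the
prefix rules implied by the list above. The classify function is
hypothetical, assumes backslash separators only, and ignores the finer
points (reserved device names, trailing dots and spaces, forward slashes).

```python
import ntpath  # the Windows flavour of os.path, importable on any platform

def classify(path):
    """Rough classification of a backslash-separated Windows path."""
    if path.startswith("\\\\?\\"):
        return "extended local device"
    if path.startswith("\\\\.\\"):
        return "local device"
    if path.startswith("\\\\"):
        return "UNC"
    drive, rest = ntpath.splitdrive(path)
    if drive:
        # "C:\Temp" is anchored at the drive's root; "C:Temp" is relative
        # to the current directory on drive C:.
        return "drive absolute" if rest.startswith("\\") else "drive relative"
    return "rooted" if path.startswith("\\") else "relative"

for example in [r"\\?\C:\Temp", r"\\.\COM1", r"\\server\share\f",
                r"C:\Temp", r"C:Temp", r"\Temp", "Temp"]:
    print(example, "->", classify(example))
```

The drive-relative and rooted forms are the ones with no POSIX counterpart:
both depend on per-process state beyond a single current directory.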

But this isn't relevant to NT's POSIX compatibility. A POSIX process
links with psxdll.dll, which connects to the POSIX environment
subsystem (psxss.exe). It gets run from Windows via posix.exe
(console) or psxrun.exe. In the 00s, Microsoft acquired Interix, which
extended the original POSIX subsystem, and integrated it as the
Subsystem for UNIX Applications (SUA). Notably SUA adds a kernel
driver, psxdrv.sys, which facilitates implementing system calls and
signals. There used to be a community website with overviews [1], a
FAQ [2], a forum [3], tool downloads [4], and various documentation
[5]. However, NT's environment subsystems never really had mass
appeal, probably because existing programs had to be ported and
recompiled. SUA is no longer supported as of Windows 8.1 and Server
2012 R2. The community website was closed, and the domain is now held
by a squatter.

Regarding file-system paths, SUA has a single root directory and uses
"/dev/fs/C" for drive "C:" and "/net/server/share" for
"\\server\share".

[1]: http://www.suacommunity.com/SUA_Tools_Env_Start.htm
 https://archive.li/45JG
[2]: http://www.suacommunity.com/FAQs.htm
 https://archive.li/5LFw
[3]: http://www.suacommunity.com/forum2
 https://archive.li/LzZxS
[4]: http://www.suacommunity.com/tool_warehouse.aspx
 https://archive.li/0luI9
[5]: http://www.suacommunity.com/dictionary/fork-entry.php
 https://archive.li/5k8vW

Windows 10 has a Linux subsystem (WSL), but this is not an NT
environment subsystem. WSL processes do not load ntdll.dll. They're
lightweight pico processes with an associated pico provider in the
kernel (lxss.sys, lxcore.sys), and they directly execute native Linux
binaries (no porting and recompiling from source). WSL only supports
the console, but at least the console was upgraded to support
virtual-terminal mode.

> We know that Ctrl-C maps to the internal Windows interrupt
> handler, and "kill process" maps to the internal Windows "terminate",
> but can you send a different process all the different signals and
> handle them differently?

IIRC, the original POSIX subsystem supported only single-threaded
processes, and SIGKILL called NtTerminateThread. Of course the
subsystem has its own client bookkeeping to handle here as well. (For
the Windows subsystem, csrss.exe also maintains shadow process and
thread structures for clients. This is how an environment subsystem
supplements base NT behavior.)

Regarding Ctrl+C, a console session is started by posix.exe, which is
a Windows console application. It translates console control events to
signals, e.g. CTRL_C_EVENT to SIGINT, CTRL_BREAK_EVENT to SIGQUIT, and
otherwise SIGKILL (e.g. closing the console, logoff, shutdown). It
sends the signal number and session ID to the subsystem, which signals
the processes in the given session.
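
The translation described above amounts to a small lookup table. A sketch
using plain names only (it models the mapping as described here, not
posix.exe itself; signal_for_event is a hypothetical helper):

```python
# Console control event -> POSIX signal, as described in the text.
EVENT_TO_SIGNAL = {
    "CTRL_C_EVENT": "SIGINT",
    "CTRL_BREAK_EVENT": "SIGQUIT",
}

def signal_for_event(event):
    """Return the signal name for a console control event.

    Anything not listed (close, logoff, shutdown) maps to SIGKILL.
    """
    return EVENT_TO_SIGNAL.get(event, "SIGKILL")

print(signal_for_event("CTRL_C_EVENT"))      # SIGINT
print(signal_for_event("CTRL_CLOSE_EVENT"))  # SIGKILL
```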

One way for the subsystem to implement signal delivery is via NT's
runtime library function RtlRemoteCall (i.e. suspend the target
thread, get its CPU context and copy it to the stack, modify the
context and stack, and resume). Make a remote call to a known client
function (i.e. in psxdll.dll), which delivers the signal and then
continues the thread's original context via NtContinue. This approach
isn't really efficient, but it's basically how the original POSIX
subsystem worked. SUA probably uses NT asynchronous procedure calls
(APCs).

---

Appendix: NT APCs

NT doesn't have anything exactly like POSIX signals. It has
exceptions, which are handled using either Vectored Exception Handling
or Structured Exception Handling (i.e. MSVC __try, __except,
__finally), and asynchronous procedure calls (APCs). Some POSIX
signals correspond to NT exceptions (e.g. SIGSEGV corresponds to a
STATUS_ACCESS_VIOLATION). But APCs are what a POSIX subsystem would
likely use to implement signals.

There are two types of APC: kernel and user. A thread has an APC queue
for each type. User APCs can be queued from user mode via
NtQueueApcThread, or via WinAPI QueueUserAPC. Some APIs such as
ReadFileEx take an optional completion or notification APC routine,
for which a kernel component queues the user APC.

All APCs have a "kernel routine" and most also have a "normal"
routine. The kernel routine is called first, with the CPU in kernel
mode and its interrupt request level (IRQL) at APC_LEVEL (1). The

Re: Why exception from os.path.exists()?

2018-06-04 Thread Peter J. Holzer
On 2018-06-04 13:23:59 +, Steven D'Aprano wrote:
> On Mon, 04 Jun 2018 13:33:28 +0100, Paul Moore wrote:
> > But there's also the question of what capability the kernel API has to
> > express the queries. The fact that the Unix API (and the Windows one, in
> > most cases - although as Eryk Sun pointed out there are exceptions in
> > the Windows kernel API) uses NUL-terminated strings means that querying
> > the filesystem about filenames with embedded \0 characters isn't
> > possible *at the OS level*.
> 
> I don't know whether or not the Linux OS is capable of accessing files 
> with embedded NULs in the file name. But Mac OS is capable of doing so, 
> so it should be possible. Wikipedia says:
> 
> "HFS Plus mandates support for an escape sequence to allow arbitrary 
> Unicode. Users of older software might see the escape sequences instead 
> of the desired characters."

I don't know about MacOS. In Linux there is no way to pass a filename
with an embedded '\0' (or a '/' which is not path separator) between the
kernel and user space. So if a filesystem contained such a filename, the
kernel would have to map it (via an escape sequence or some other
mechanism) to a different file name. Which of course means that - from
the perspective of any user space process - the filename doesn't contain
a '\0' or '/'.

Theoretically that mapping could be reversed in the standard library of
a language which allows '\0' in strings (like Python), but since that
would mean that programs written in that language see different
filenames than programs written in other languages (especially C, which
covers the majority of the GNU command line tools), this would be a very
bad idea. Much better to have strange but consistent filenames if you
mount a "foreign" file system. (This is btw also what Samba does,
although it does a spectacularly bad job).

hp

-- 
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | h...@hjp.at        | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why exception from os.path.exists()?

2018-06-04 Thread Grant Edwards
On 2018-06-03, Gregory Ewing  wrote:
> Steven D'Aprano wrote:
>> Do you really mean to say that a computer that won't boot is POSIX 
>> compliant?
>
> No, I was pointing out the absurdity of saying that the Windows
> kernel layer is POSIX compliant, which is what the post I was
> replying to seemed to be saying.

The normal Win32 API that all Windows apps use is not Posix compliant.

However, there is an API layer Microsoft provides (or provided) that
is/was Posix compliant.  At one point, I think it was an add-on that
had to be purchased seperately.  I've never heard of anybody actually
_using_ it, but it allowed some US government purchasing droid to
check the "Posix Compliant" box on an acquisition checklist back in
the 90's.

-- 
Grant Edwards               grant.b.edwards        Yow! But they went to MARS
                                  at               around 1953!!
                              gmail.com

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why exception from os.path.exists()?

2018-06-04 Thread Steven D'Aprano
On Mon, 04 Jun 2018 13:33:28 +0100, Paul Moore wrote:

> But there's also the question of what capability the kernel API has to
> express the queries. The fact that the Unix API (and the Windows one, in
> most cases - although as Eryk Sun pointed out there are exceptions in
> the Windows kernel API) uses NUL-terminated strings means that querying
> the filesystem about filenames with embedded \0 characters isn't
> possible *at the OS level*.

I don't know whether or not the Linux OS is capable of accessing files 
with embedded NULs in the file name. But Mac OS is capable of doing so, 
so it should be possible. Wikipedia says:

"HFS Plus mandates support for an escape sequence to allow arbitrary 
Unicode. Users of older software might see the escape sequences instead 
of the desired characters."

Apple File System is an even more modern FS (it replaced HFS Plus in 2017 
as Apple's preferred file system) which supports all Unicode code points, 
including NUL.



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why exception from os.path.exists()?

2018-06-04 Thread Paul Moore
On 4 June 2018 at 13:01, Steven D'Aprano
 wrote:

>> Turns out that this is a limitation on Windows as well. The \0 is not
>> allowed for Windows, macOS and Posix.
>
> We -- all of us, including myself -- have been terribly careless all
> through this discussion. The fact is, this should not be an OS limitation
> at all. It is a *file system* limitation.
>
> If I can mount a HFS or HFS-Plus disk on Linux, it can include file names
> with embedded NULs or slashes. (Only the : character is illegal in HFS
> file names.) It shouldn't matter what the OS is, if I have drivers for
> HFS and can mount a HFS disk, I ought to be able to sensibly ask for file
> names including NUL.

Agreed, being completely precise in this situation is both pretty
complicated, and essential.

The question of what are legal characters in a filename is, as you
say, a filesystem related issue. People traditionally forget this
point, but in these days of cross-platform filesystem mounting,
networked filesystems[1], etc, it's more and more relevant, and
thankfully people are getting more aware of the point.

But there's also the question of what capability the kernel API has to
express the queries. The fact that the Unix API (and the Windows one,
in most cases - although as Eryk Sun pointed out there are exceptions
in the Windows kernel API) uses NUL-terminated strings means that
querying the filesystem about filenames with embedded \0 characters
isn't possible *at the OS level*. (As another example, the fact that
the Unix kernel treats filenames as byte strings means that there are
translation issues querying an NTFS filesystem that uses Unicode
(UTF-16) natively - and vice versa when Windows queries a Unix-native
filesystem).
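
As a sketch of the bytes-versus-Unicode translation problem: Python's
os.fsdecode() and os.fsencode() use the "surrogateescape" error handler on
POSIX systems so that undecodable file name bytes survive a round trip. The
example below applies that handler explicitly, with UTF-8 assumed as the
encoding and a made-up Latin-1 file name.

```python
# A Latin-1 encoded name as a Unix kernel might hand it back:
# these bytes are not valid UTF-8.
raw = b"caf\xe9"

# Strict decoding fails ...
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# ... while surrogateescape carries the stray byte through as a lone
# surrogate, so the original bytes can be recovered exactly.
text = raw.decode("utf-8", "surrogateescape")
roundtrip = text.encode("utf-8", "surrogateescape")

print(strict_ok)        # False
print(ascii(text))      # 'caf\udce9'
print(roundtrip == raw) # True
```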

So "it's complicated" is about the best we can say :-)

Paul

[1] And of course if you mount (say) an NTFS filesystem over NFS, you
have *two* filesystems involved, each adding its own layer of
restrictions and capabilities.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why exception from os.path.exists()?

2018-06-04 Thread Marko Rauhamaa
Barry Scott :
> os.path.exists() is not special and I don't think should be be changed.

You are right that os.path.exists() might be logically tied to other
os.* facilities. The question is, should the application be cognizant of
the seam between the standard library and the operating system kernel?

When a Linux system call contains an illegal value, it responds with
errno=EINVAL. In Python, that's represented by the OSError exception
with e.errno=EINVAL. However, when Python encounters an illegal value
itself, it usually raises a ValueError. Is it useful for the application
to have to be prepared for OSError/EINVAL and ValueError separately? Or
should the difference be paved over by Python?
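
To make the question concrete, here is a minimal sketch of what "paving over
the difference" could look like in application code. The helper path_exists
is hypothetical, and the choice of which errno values count as "does not
exist" is an assumption, not anything the standard library prescribes.

```python
import errno
import os

# errno values this sketch treats as "the path cannot exist" (assumed list).
MISSING = (errno.ENOENT, errno.ENOTDIR, errno.EINVAL, errno.ENAMETOOLONG)

def path_exists(path):
    """Return True if os.stat() succeeds, False if the path cannot exist."""
    try:
        os.stat(path)
    except ValueError:
        # Python refused to hand the path to the OS (e.g. embedded NUL).
        return False
    except OSError as e:
        if e.errno in MISSING:
            return False
        raise  # permission problems, I/O errors etc. stay visible
    return True

print(path_exists(os.curdir))   # True
print(path_exists("a\0b"))      # False
```

Both exception types end up on the same code path, which is exactly the
seam between library and kernel that the caller would otherwise have to know
about.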

As it stands, os.path.exists() really means: the operating system
doesn't have a reason to fail os.stat() on the pathname. Python
intercedes with an exception if it can't even ask the operating system
for its opinion. That dichotomy is not suggested by the os.path.exists()
documentation. In fact, the whole point of os.path.* is to provide for
an abstraction to isolate the application from the intricacies of the
operating system specifics.

BTW, I challenge you to find a test case that tests the proper behavior
of an application if it encounters a pathname with a NUL in it. Or code
that gracefully catches a ValueError from os.path.exists().


Marko
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why exception from os.path.exists()?

2018-06-04 Thread Steven D'Aprano
On Mon, 04 Jun 2018 11:16:21 +0100, Barry Scott wrote:

[...]
> Turns out that this is a limitation on Windows as well. The \0 is not
> allowed for Windows, macOS and Posix.

We -- all of us, including myself -- have been terribly careless all 
through this discussion. The fact is, this should not be an OS limitation 
at all. It is a *file system* limitation.

If I can mount a HFS or HFS-Plus disk on Linux, it can include file names 
with embedded NULs or slashes. (Only the : character is illegal in HFS 
file names.) It shouldn't matter what the OS is, if I have drivers for 
HFS and can mount a HFS disk, I ought to be able to sensibly ask for file 
names including NUL.



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why exception from os.path.exists()?

2018-06-04 Thread Barry Scott



> On 1 Jun 2018, at 14:23, Paul Moore  wrote:
> 
> On 1 June 2018 at 13:15, Barry Scott  wrote:
>> I think the reason for the \0 check is that if the string is passed to the
>> operating system with the \0 you can get surprising results.
>> 
>> If \0 was not checked for you would be able to get True from:
>> 
>>os.file.exists('/home\0ignore me')
>> 
>> This is because a posix system only sees '/home'.

Turns out that this is a limitation on Windows as well.
The \0 is not allowed for Windows, macOS and Posix.
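
The truncation described in the quoted text can be sketched with ctypes,
which reads a buffer back the way C code would, as a NUL-terminated string.
This only illustrates the C string convention; it does not call any
file system API.

```python
import ctypes

# A Python bytes object keeps everything, including the embedded NUL.
data = b"/home\0ignore me"

# Copy it into a C char buffer, then read it back as a C string.
buf = ctypes.create_string_buffer(data)

print(len(buf.raw))   # 16: all 15 bytes plus the terminating NUL
print(buf.value)      # b'/home', everything after the first NUL is lost
```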

> 
> So because the OS API can't handle filenames with \0 in (because that
> API uses null-terminated strings) Python has to special case its
> handling of the check. That's fine.
> 
>> Surely ValueError is reasonable?
> 
> Well, if the OS API can't handle filenames with embedded \0, we can be
> sure that such a file doesn't exist - so returning False is
> reasonable.

I think most of the file APIs check for \0 and raise ValueError on python3
and  TypeError on python2.
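
A quick way to check that claim on Python 3 (a sketch; the helper name
raises_value_error is made up, and only three calls are sampled):

```python
import os

def raises_value_error(func, *args):
    """Return True if calling func(*args) raises ValueError."""
    try:
        func(*args)
    except ValueError:
        return True
    except OSError:
        return False
    return False

bad = "a\0b"
results = {
    "open": raises_value_error(open, bad),
    "os.stat": raises_value_error(os.stat, bad),
    "os.listdir": raises_value_error(os.listdir, bad),
}
print(results)
```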

os.path.exists() is not special and I don't think it should be changed.

> 
>> Once you know that all of the string you provided is given to the operating
>> system it can then do whatever checks it sees fit to and return a suitable
>> result.
> 
> As the programmer, I don't care. The Python interpreter should take
> care of that for me, and if I say "does file 'a\0b' exist?" I want an
> answer. And I don't see how anything other than "no it doesn't" is
> correct. Python allows strings with embedded \0 characters, so it's
> possible to express that question in Python - os.path.exists('a\0b').
> What can be expressed in terms of the low-level (C-based) operating
> system API shouldn't be relevant.
> 
> Disclaimer - the Python "os" module *does* expose low-level
> OS-dependent functionality, so it's not necessarily reasonable to
> extend this argument to other functions in os. But it seems like a
> pretty solid argument in this particular case.
> 
>> As an aside Windows has lots of special filenames that you have to know about
>> if you are writting robust file handling. AUX, COM1, \this\is\also\COM1 etc.
> 
> I don't think that's relevant in this context.

I think it is. This started because the OP was surprised that they needed to
check for \0. There are related surprises waiting. I'm pointing out that it's
more than \0 that a robust piece of code will need to consider.

Barry

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why exception from os.path.exists()?

2018-06-02 Thread Steven D'Aprano
On Sun, 03 Jun 2018 10:38:34 +1000, Chris Angelico wrote:

> Let's just rewind this subthread a little bit. YOU said that the
> behaviour of os.path.exists on Unix systems should be "return False for
> invalid things" on the basis that the Windows invalid paths return
> False. Remember? 

No, invalid paths on Linux return False too:

py> os.path.exists("")
False


I can make a VFAT partition under Linux:

[steve@ando ~]$ dd if=/dev/zero of=fat.fs bs=1024 count=48
48+0 records in
48+0 records out
49152 bytes (49 kB) copied, 0.0149677 seconds, 3.3 MB/s
[steve@ando ~]$ /sbin/mkfs.vfat fat.fs
mkfs.vfat 2.11 (12 Mar 2005)
[steve@ando ~]$ mkdir dos
[steve@ando ~]$ sudo mount -o loop fat.fs ./dos
[sudo] password for steve:


I can write to it (as root), but not all file names are valid:

[steve@ando ~]$ sudo touch ./dos/foo
[steve@ando ~]$ sudo touch ./dos/"foo?"
touch: setting times of `./dos/foo?': No such file or directory
[steve@ando ~]$ ls ./dos
foo

And even though I'm using Linux, I get the right answer, legal file name 
or not legal file name.


[steve@ando ~]$ python3.5 -c "import os; \
> print(os.path.exists('./dos/foo'))"
True
[steve@ando ~]$ python3.5 -c "import os; \
> print(os.path.exists('./dos/foo?'))"
False



> I said that Windows isn't POSIX,

And I said, as a by-the-by, that technically Windows is POSIX compliant, 
for a very pedantically true but dubious in practice value of compliant.


> and pointed out just a couple of ways in which, to a programmer, Windows
> behaves very differently to POSIX-compliant systems.

Is that Windows out of the box, or Windows with the POSIX subsystem 
installed and active?

You keep talking about "POSIX-compliant", but POSIX is a family of 
standards. A system can be compliant with one POSIX standard without 
being compliant to the others.

And ironically, neither Linux, OpenBSD, FreeBSD nor Darwin are fully 
POSIX compliant, merely "mostly" compliant. (Or at least, they haven't 
been certified as such.)

Not that it matters much in practice.


In any case, the minutiae of POSIX versus Windows, the availability of 
drive letters and signals etc. are utterly irrelevant to the question of 
what os.path.exists should do.

Just as it ought to be utterly irrelevant that on Linux native C strings 
are null terminated.



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why exception from os.path.exists()?

2018-06-02 Thread Gregory Ewing

Steven D'Aprano wrote:
> Do you really mean to say that a computer that won't boot is POSIX
> compliant?


No, I was pointing out the absurdity of saying that the Windows
kernel layer is POSIX compliant, which is what the post I was
replying to seemed to be saying.

--
Greg
--
https://mail.python.org/mailman/listinfo/python-list


Re: Why exception from os.path.exists()?

2018-06-02 Thread Chris Angelico
On Sun, Jun 3, 2018 at 10:08 AM, Steven D'Aprano
 wrote:
> Chris, you seem to be labouring under the misapprehension that the claim
> is that a stock standard Windows  installation complies
> with the latest version of the POSIX standard.
>
> That's not the case.
>
> The claim (and fact) is that Windows NT with the POSIX subsystem
> installed (and possibly other changes made) is technically compliant with
> version 1 of the POSIX standard, which even in 1997 was only a fraction
> of what most Unix systems provided.
>
> https://en.wikipedia.org/wiki/POSIX#Versions
>
> Just enough to allow bean counters to tick the box that says "POSIX
> compliant" in a government requirements form, provided the technical
> people involved either don't get a say, or do get a say and actually want
> Windows but have to satisfy some bureaucratic requirement for POSIX.
>
> Nobody thinks that standard Windows counts as a Unix.

Let's just rewind this subthread a little bit. YOU said that the
behaviour of os.path.exists on Unix systems should be "return False
for invalid things" on the basis that the Windows invalid paths return
False. Remember? Or are you just so het up about arguing this point
that you don't care why you're arguing? I said that Windows isn't
POSIX, and pointed out just a couple of ways in which, to a
programmer, Windows behaves very differently to POSIX-compliant
systems. The two examples I gave were signals and relative paths. Now,
if you want to tell me that we can completely ignore drive letters on
Windows, then sure. Go ahead. Tell me that relative paths behave
sanely on Windows just as long as you have only a single drive. Or
tell me that the POSIX standard permits three different types of
relative path. And with signals, can you show me that a process can
send another process a variety of different signals, and that the
receiving process can handle them differently?

Claiming that Windows technically ticks some box is utterly meaningless to that.
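
To make the "three different types of relative path" remark concrete, here is a rough sketch that classifies Windows-style path strings. It uses only ntpath.splitdrive from the standard library, which works on any platform because it just parses the string. The function name classify_windows_path and the four labels are made up for illustration; they are not official terminology.

```python
import ntpath


def classify_windows_path(path):
    """Label a Windows-style path string (hypothetical helper).

    "absolute"       has a drive and a root:     C:\\foo\\bar
    "drive-relative" has a drive but no root:    C:foo\\bar
    "rooted"         has a root but no drive:    \\foo\\bar
    "relative"       has neither:                foo\\bar
    """
    drive, rest = ntpath.splitdrive(path)
    rooted = rest.startswith(("\\", "/"))
    if drive and rooted:
        return "absolute"
    if drive:
        return "drive-relative"   # relative to that drive's current directory
    if rooted:
        return "rooted"           # relative to the current drive
    return "relative"             # relative to the current directory


if __name__ == "__main__":
    for p in (r"C:\foo\bar", r"C:foo\bar", r"\foo\bar", r"foo\bar"):
        print(p, "->", classify_windows_path(p))
```

The last three cases all depend on some piece of per-process state (current drive, per-drive current directory, or both), which is the difference from the single notion of a relative path on a Unix-like system.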

ChrisA


Re: Why exception from os.path.exists()?

2018-06-02 Thread Steven D'Aprano
On Sun, 03 Jun 2018 11:47:40 +1200, Gregory Ewing wrote:

> Paul Moore wrote:
>> Windows (the kernel) has the
>> capability to implement fork(), but this isn't exposed via the Win32
>> API. To implement fork() you need to go to the raw kernel layer. Which
>> is basically what the Windows Linux subsystem (bash on Windows 10) does
> 
> What people usually mean by "POSIX compliant" is not "it's possible to
> implement the POSIX API on top of it".

What people usually mean by "POSIX compliant" is "Unix or Linux".

But that's not what the POSIX standard requires. It requires a set of 
APIs. I doubt it cares where or how they are implemented.
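
From the Python side, the practical version of "a set of APIs" is simply whether the interpreter exposes them on the platform at hand. The following is a small sketch under that assumption; the function name posix_api_report and the particular names probed are my own choice, and the result says nothing about formal POSIX certification.

```python
import os
import signal


def posix_api_report():
    """Report which POSIX-flavoured APIs this Python build exposes.

    Hypothetical helper: it checks what the os and signal modules
    offer, not what the underlying operating system is certified as.
    """
    return {
        "os.name": os.name,                      # "posix" or "nt"
        "fork": hasattr(os, "fork"),             # process duplication
        "kill": hasattr(os, "kill"),             # sending signals
        "SIGUSR1": hasattr(signal, "SIGUSR1"),   # user-defined signal
    }


if __name__ == "__main__":
    for name, value in posix_api_report().items():
        print(name, "=", value)
```

On a stock Windows build of CPython I would expect fork and SIGUSR1 to come back False, but that is an expectation, not something verified here.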



> By that definition, a raw PC without any software is POSIX compliant.

Do you really mean to say that a computer that won't boot is POSIX 
compliant? Yeah, good luck getting that one past the user acceptance 
testing.



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Why exception from os.path.exists()?

2018-06-02 Thread Steven D'Aprano
On Sat, 02 Jun 2018 21:28:41 +1000, Chris Angelico wrote:

> On Sat, Jun 2, 2018 at 9:13 PM, Steven D'Aprano
>  wrote:
>> On Sat, 02 Jun 2018 20:58:43 +1000, Chris Angelico wrote:
>>
>>>>> Windows isn't POSIX compliant.
>>>>
>>>> Technically, Windows is POSIX compliant. You have to turn off a bunch
>>>> of features, turn on another bunch of features, and what you get is
>>>> the bare minimum POSIX compliance possible, but it's enough to tick
>>>> the check box for POSIX compliance.
>>>
>>> Really? I didn't know that Windows path names were POSIX compliant. Or
>>> do you have to use the Cygwin fudge to count Windows as POSIX? And
>>> what about POSIX signal handling?
>>>
>>> Citation needed, big-time.
>>
>> https://en.wikipedia.org/wiki/Microsoft_POSIX_subsystem
>>
>> https://technet.microsoft.com/en-us/library/bb463220.aspx
>>
>> https://brianreiter.org/2010/08/24/the-sad-history-of-the-microsoft-posix-subsystem/
> 
> Can someone confirm whether or not all the listed signals are actually
> supported? 

Unless people do their testing under Windows with the POSIX subsystem 
installed, such testing is likely to fail.


> We know that Ctrl-C maps to the internal Windows interrupt
> handler, and "kill process" maps to the internal Windows "terminate",
> but can you send a different process all the different signals and
> handle them differently?
> 
> I also can't find anything about path names there. What does POSIX say
> about the concept of relative paths? Does Windows comply with that?

That's a curious question to ask. If you don't know what POSIX says about 
a feature, why would you question whether Windows complies with it?


> "Windows has some features which are compatible with the equivalent
> POSIX features" is not the same as "Technically, Windows is POSIX
> compliant".

Chris, you seem to be labouring under the misapprehension that the claim 
is that a stock standard Windows  installation complies 
with the latest version of the POSIX standard.

That's not the case.

The claim (and fact) is that Windows NT with the POSIX subsystem 
installed (and possibly other changes made) is technically compliant with 
version 1 of the POSIX standard, which even in 1997 was only a fraction 
of what most Unix systems provided.

https://en.wikipedia.org/wiki/POSIX#Versions

Just enough to allow bean counters to tick the box that says "POSIX 
compliant" in a government requirements form, provided the technical 
people involved either don't get a say, or do get a say and actually want 
Windows but have to satisfy some bureaucratic requirement for POSIX.

Nobody thinks that standard Windows counts as a Unix.



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Why exception from os.path.exists()?

2018-06-02 Thread Richard Damon
On 6/2/18 7:47 PM, Gregory Ewing wrote:
> Paul Moore wrote:
>> Windows (the kernel) has the
>> capability to implement fork(), but this isn't exposed via the Win32
>> API. To implement fork() you need to go to the raw kernel layer. Which
>> is basically what the Windows Linux subsystem (bash on Windows 10)
>> does
>
> What people usually mean by "POSIX compliant" is not "it's
> possible to implement the POSIX API on top of it". By that
> definition, a raw PC without any software is POSIX compliant.
>
But it isn't just that it is possible: Microsoft provides that layer. It
just isn't the normal API they suggest using, and it needs to be explicitly
enabled.

-- 
Richard Damon


