Re: [Python-Dev] pathlib - current status of discussions

2016-04-16 Thread Nick Coghlan
On 17 April 2016 at 04:47, Chris Barker - NOAA Federal wrote:
>> On Apr 13, 2016, at 8:31 PM, Nick Coghlan wrote:
>>
>>>>   class Special(bytes):
>>>>       def __fspath__(self):
>>>>           return 'str-val'
>>>>   obj = Special('bytes-val', 'utf8')
>>>>   path_obj = fspath(obj, allow_bytes=True)
>>>>
>>>> With #2, path_obj == 'bytes-val'. With #3, path_obj == 'str-val'.
>>
>> In this kind of case, inheritance tends to trump protocol.
>
> Sure, but...
>
>> example, int subclasses can't override operator.index:
> ...
>> The reasons for that behaviour are more pragmatic than philosophical:
>> builtins and their subclasses are extensively special-cased for speed
>> reasons,
>
> OK, but in this case, purity can beat practicality. If the author
> writes an __fspath__ method, presumably it's because it should be
> used.
>
> And I can certainly imagine one might want to store a path
> representation as bytes, but NOT want the raw bytes passed off to file
> handling libs.
>
> (of course you could use composition rather than subclassing if you had to)

Exactly - inheritance is a really strong relationship that directly
affects the in-memory layout of instances (at least in CPython), and
also the kinds of assumption other code will make about that type (for
example, subclasses are special cased to allow them to override the
behaviour of numeric binary operators when they appear as the right
operand with an instance of the parent type as the left operand, while
with unrelated types, the left operand always gets the first chance to
handle the operation).
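
The operand-ordering special case described above can be seen in a
small sketch. The class names are made up purely for illustration;
only the dispatch order matters.

```python
# Hypothetical classes, used only to illustrate reflected-operator priority.
class Base:
    def __add__(self, other):
        return "Base.__add__"

    def __radd__(self, other):
        return "Base.__radd__"


class Child(Base):
    # Overriding the reflected method is what gives the subclass priority.
    def __radd__(self, other):
        return "Child.__radd__"


class Unrelated:
    def __radd__(self, other):
        return "Unrelated.__radd__"


# Right operand is a subclass of the left operand's type: it is tried first.
subclass_result = Base() + Child()

# Right operand is an unrelated type: the left operand is tried first.
unrelated_result = Base() + Unrelated()
```

Here subclass_result comes from Child.__radd__, while unrelated_result
comes from Base.__add__.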

When folks don't want to trigger those "this is an " behaviours,
the appropriate design pattern is composition, not inheritance (and
many of the ABCs were introduced to make it easier to implement
particular interfaces without inheriting from the corresponding
builtin types).
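
As a rough illustration of the difference, the sketch below uses the
os.fspath() that later shipped in Python 3.6, which returns str and
bytes instances unchanged. The class names InheritedPath and
ComposedPath are hypothetical.

```python
import os


class InheritedPath(bytes):
    """Is-a bytes: os.fspath() hands back the instance itself."""

    def __fspath__(self):
        return "str-val"  # not consulted for a bytes subclass


class ComposedPath:
    """Has-a bytes: only the protocol method decides what callers see."""

    def __init__(self, raw):
        self._raw = raw  # stored representation, never exposed directly

    def __fspath__(self):
        return os.fsdecode(self._raw)


inherited = os.fspath(InheritedPath(b"bytes-val"))
composed = os.fspath(ComposedPath(b"bytes-val"))
```

With inheritance the raw bytes leak through; with composition the
author of the class stays in control of what is handed to file APIs.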

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] pathlib - current status of discussions

2016-04-16 Thread Chris Barker - NOAA Federal
> On Apr 13, 2016, at 8:31 PM, Nick Coghlan  wrote:
>
>>>   class Special(bytes):
>>>       def __fspath__(self):
>>>           return 'str-val'
>>>   obj = Special('bytes-val', 'utf8')
>>>   path_obj = fspath(obj, allow_bytes=True)
>>>
>>> With #2, path_obj == 'bytes-val'. With #3, path_obj == 'str-val'.
>
> In this kind of case, inheritance tends to trump protocol.

Sure, but...

> example, int subclasses can't override operator.index:
...
> The reasons for that behaviour are more pragmatic than philosophical:
> builtins and their subclasses are extensively special-cased for speed
> reasons,

OK, but in this case, purity can beat practicality. If the author
writes an __fspath__ method, presumably it's because it should be
used.

And I can certainly imagine one might want to store a path
representation as bytes, but NOT want the raw bytes passed off to file
handling libs.

(of course you could use composition rather than subclassing if you had to)

-CHB


Re: [Python-Dev] pathlib - current status of discussions

2016-04-15 Thread Nick Coghlan
On 15 April 2016 at 00:01, Random832  wrote:
> On Thu, Apr 14, 2016, at 09:50, Chris Angelico wrote:
>> Adding integers and floats is considered "safe" because most people's
>> use of floats completely encompasses their use of ints. (You'll get
>> OverflowError if it can't be represented.) But float and Decimal are
>> considered "unsafe":
>>
>> >>> 1.5 + decimal.Decimal("1.5")
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> TypeError: unsupported operand type(s) for +: 'float' and
>> 'decimal.Decimal'
>>
>> This is more what's happening here. Floats and Decimals can represent
>> similar sorts of things, but with enough incompatibilities that you
>> can't simply merge them.
>
> And what such incompatibilities exist between bytes and str for the
> purpose of representing file paths? At the end of the day, there's
> exactly one answer to "what file on disk this represents (or would
> represent if it existed)".

Bytes paths on Windows are encoded as mbcs for use with the ASCII-only
Windows APIs, and hence don't support the full range of characters
that str does. The colloquial shorthand for that is "bytes paths don't
work properly on Windows" (the more strictly accurate description is
"bytes paths only work correctly on Windows if every code point in the
path can be encoded using the 'mbcs' codec").
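
A small sketch of that limitation follows. The 'mbcs' codec exists
only on Windows, so 'cp1252' is used here as an assumed stand-in for
the machine's code page; the helper name encodable is hypothetical.

```python
# 'cp1252' is only a stand-in for whatever code page 'mbcs' maps to on a
# given Windows machine; 'mbcs' itself is not available elsewhere.
ASSUMED_CODE_PAGE = "cp1252"


def encodable(name, encoding=ASSUMED_CODE_PAGE):
    """Return True if every code point in name survives encoding."""
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


western_ok = encodable("r\u00e9sum\u00e9.txt")     # accented Latin letters
japanese_ok = encodable("\u5c65\u6b74\u66f8.txt")  # kanji, not in cp1252
```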

Even on *nix, os.fsencode may fail outright if the system is
configured to use a non-universal encoding, while os.fsdecode may
pollute the resulting string with surrogate escaped characters.
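
Both failure modes can be reproduced without touching the real
filesystem encoding. The sketch below spells out the codec and error
handler explicitly; UTF-8 and ASCII are assumed encodings chosen for
the demonstration, not what any particular system uses.

```python
raw_name = b"caf\xe9.txt"  # Latin-1 bytes, not valid UTF-8

# Decoding side: an undecodable byte becomes a lone surrogate code point.
decoded = raw_name.decode("utf-8", "surrogateescape")

# Encoding side: a non-universal encoding (ASCII assumed here) cannot
# represent an ordinary non-ASCII character and fails outright.
try:
    "caf\u00e9.txt".encode("ascii", "surrogateescape")
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True
```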

Regardless of platform, if somebody hands you *mixed* bytes and str
data, the appropriate default reaction is to complain about it rather
than assume they meant one or the other. That complaint may take one
of two forms:

- for a high level, platform independent API, bytes should just be
rejected outright
- for a low level API with input type dependent behaviour, the input
should be rejected as ambiguous - the API doesn't know whether the str
behaviour or the bytes behaviour is the intended one

pathlib falls into the first category - it just rejects bytes as input.
os.path.join falls into the second category - all str is fine, and all
bytes is fine, but mixing them fails.

However, once somebody reaches for the coercion APIs (fsdecode and
fsencode), they're now *explicitly* telling the interpreter what they
want, since there's no ambiguity about the possible return types from
those functions.
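
The behaviours described above can be checked with a short sketch;
raises_type_error is a hypothetical helper written just for this
example.

```python
import os
import pathlib


def raises_type_error(func, *args):
    """Return True if calling func(*args) raises TypeError."""
    try:
        func(*args)
    except TypeError:
        return True
    return False


all_str = os.path.join("dir", "file")      # fine, result is str
all_bytes = os.path.join(b"dir", b"file")  # fine, result is bytes

# Low-level API: mixed input is rejected as ambiguous.
mixed_rejected = raises_type_error(os.path.join, "dir", b"file")

# High-level API: bytes are rejected outright.
pathlib_rejects_bytes = raises_type_error(pathlib.PurePath, b"dir")

# Explicit coercion removes the ambiguity.
coerced = os.path.join("dir", os.fsdecode(b"file"))
```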

In relation to Victor's comment about this being complex code to show
to a novice:

  os.path.join(*map(os.fsdecode, ("str", b"bytes")))

I agree, but also think that's a good reason for people to switch to
teaching novices pathlib rather than os.path, and letting them
discover the underlying libraries as required by the code and examples
they encounter.
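
For readers who find the one-liner dense, here is the same thing
unpacked step by step, next to a pathlib spelling. The variable names
are illustrative only.

```python
import os
import pathlib

parts = ("str", b"bytes")

# The one-liner from above.
joined = os.path.join(*map(os.fsdecode, parts))

# The same steps written out: coerce every part to str, then join.
decoded_parts = [os.fsdecode(part) for part in parts]
longhand = os.path.join(*decoded_parts)

# pathlib accepts only str parts, so the coercion still happens first.
via_pathlib = pathlib.PurePath(*decoded_parts)
```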

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Koos Zevenhoven
On Thu, Apr 14, 2016 at 9:35 PM, Random832  wrote:
> On Thu, Apr 14, 2016, at 13:56, Koos Zevenhoven wrote:
>> (1) Code that has access to pathname/filename data and has some level
>> of control over what data type comes in. This code may for instance
>> choose to deal with either bytes or str
>>
>> (2) Code that takes the path or file name that it happens to get and
>> does something with it. This type of code can be divided into
>> subgroups as follows:
>>
>>   (2a) Code that accepts only one type of paths (e.g. str, bytes or
>> pathlib) and fails if it gets something else.
>
> Ideally, these should go away.
>

I don't think so. (1) might even be the most common type of all code.
This is code that gets a path from user input, from a config file,
from a database etc. and then does things with it, typically including
passing it to type (2) code and potentially getting a path back from
there too.

>>   (2b) Code that wants to support different types of paths such as
>> str, bytes or pathlib objects. This includes os.path.*, os.scandir,
>> and various other standard library code. Presumably there is also
>> third-party code that does the same. These functions may want to
>> preserve the str-ness or bytes-ness of the paths in case they return
>> paths, as the stdlib now does. But new code may even want to return
>> pathlib objects when they get such objects as inputs.
>
> Hold on. None of the discussion I've seen has included any way to
> specify how to construct a new object representing a different path
> other than the ones passed in. Surely you're not suggesting type(a)(b).
>

That's right. This protocol is not solving the issue of returning
'rich' path objects. It's solving the issue of passing those objects
to lower-level functions or to interact with other 'rich' path types.
What I meant by this is that there may be code that *does* want to do
type(a)(b), which is out of our control. Maybe I should not have
mentioned that.

> Also, how does DirEntry fit in with any of this?
>

os.scandir + DirEntry are one of the many things in the stdlib that
give you pathnames of the same type as those that were put in.
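
A quick way to see that type preservation, using a temporary directory
so nothing outside the sketch is touched:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one empty file to list.
    open(os.path.join(tmp, "a.txt"), "w").close()

    with os.scandir(tmp) as entries:               # str in, str out
        str_names = [entry.name for entry in entries]

    with os.scandir(os.fsencode(tmp)) as entries:  # bytes in, bytes out
        bytes_names = [entry.name for entry in entries]
```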

>> This is the
>> duck-typing or polymorphic code we have been talking about. Code of
>> this type (2b) may want to avoid implicit conversions because it makes
>> the life of code of the other types more difficult.
>
> As long as the type it returns is still a path/bytes/str (and therefore
> can be accepted when the caller passes it somewhere else) what's the
> problem?

No, because not all paths are passed to the function that does the
implicit conversion, and then, for instance, os.path.join on two
paths of different types raises an error.

In other words: Most non-library code (even library code?) deals with
one specific type and does not want implicit conversions to other
types. Some code (2b) deals with several types and, at least in the
stdlib, such code returns paths of the same type as they are given,
which makes said "most non-library code" happy, because it does not
force the programmer to think about type conversions.

(Then there is also code that explicitly deals with type conversions,
such as os.fsencode and os.fsdecode.)

-Koos


Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Random832
On Thu, Apr 14, 2016, at 13:56, Koos Zevenhoven wrote:
> (1) Code that has access to pathname/filename data and has some level
> of control over what data type comes in. This code may for instance
> choose to deal with either bytes or str
> 
> (2) Code that takes the path or file name that it happens to get and
> does something with it. This type of code can be divided into
> subgroups as follows:
> 
>   (2a) Code that accepts only one type of paths (e.g. str, bytes or
> pathlib) and fails if it gets something else.

Ideally, these should go away.

>   (2b) Code that wants to support different types of paths such as
> str, bytes or pathlib objects. This includes os.path.*, os.scandir,
> and various other standard library code. Presumably there is also
> third-party code that does the same. These functions may want to
> preserve the str-ness or bytes-ness of the paths in case they return
> paths, as the stdlib now does. But new code may even want to return
> pathlib objects when they get such objects as inputs.

Hold on. None of the discussion I've seen has included any way to
specify how to construct a new object representing a different path
other than the ones passed in. Surely you're not suggesting type(a)(b).

Also, how does DirEntry fit in with any of this?

> This is the
> duck-typing or polymorphic code we have been talking about. Code of
> this type (2b) may want to avoid implicit conversions because it makes
> the life of code of the other types more difficult.

As long as the type it returns is still a path/bytes/str (and therefore
can be accepted when the caller passes it somewhere else) what's the
problem?


Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Ethan Furman

On 04/14/2016 10:22 AM, Paul Moore wrote:
> On 14 April 2016 at 17:46, Ethan Furman wrote:
>> If you are not working at the bytes layer, you shouldn't be getting bytes
>> objects because:
>>
>> - you specified str when asking for data from the OS, or
>> - you transformed the incoming bytes from whatever external source
>>   to str when you received them.
>
> My experience is that (particularly with code that was originally
> written for Python 2) "you have control of your data" is often an
> illusion - bytes can appear in code from unexpected sources, and when
> they do I'd rather see an error if I'm using code where I expect a
> string. Certainly that's a bug in the code - all I'm saying is that it
> should fail early rather than late.


If we have one function that uses a flag and you leave the flag alone 
(it defaults to rejecting bytes) -- voila!  An error is raised when 
bytes show up.
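
A minimal sketch of such a flag-based function follows. It is not the
standard library's os.fspath(); the name and the allow_bytes parameter
are taken from the examples in this thread, and everything else is an
assumption about how it might look.

```python
def fspath(path, allow_bytes=False):
    """Sketch of the flag-based variant discussed in this thread.

    NOT the stdlib os.fspath(); the signature is an assumption.
    """
    if isinstance(path, str):
        return path
    if isinstance(path, bytes):
        if allow_bytes:
            return path
        raise TypeError("bytes paths are rejected unless allow_bytes=True")
    try:
        method = type(path).__fspath__
    except AttributeError:
        raise TypeError("expected str, bytes or a path object") from None
    result = method(path)
    # The flag applies to protocol results as well as to direct input.
    if isinstance(result, str) or (allow_bytes and isinstance(result, bytes)):
        return result
    raise TypeError("__fspath__ returned %s" % type(result).__name__)
```

Leaving the flag at its default means stray bytes raise TypeError at
the call site instead of propagating further.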



> I'd appreciate it if anyone can clarify why "gracefully extending" the
> protocol to include bytes support at a later date isn't practical.


It's going to be a bunch of work.  I don't want to do the work twice.

On the other hand, if while doing the work it becomes apparent that 
supporting bytes and str in the protocol is either infeasible, 
confusing, or a plain ol' bad idea I have no problem ripping out the 
bytes support and going to str only.


--
~Ethan~



Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Koos Zevenhoven
On Thu, Apr 14, 2016 at 7:46 PM, Ethan Furman  wrote:
>
> What many folks seem to be missing is that *you* (generic you) have control
> of your data.
>
> If you are not working at the bytes layer, you shouldn't be getting bytes
> objects because:
>
> - you specified str when asking for data from the OS, or
> - you transformed the incoming bytes from whatever external source
>   to str when you received them.

There is an apparent contradiction of the above with some previous
posts, including your own. Let me try to fix it:

Code that deals with paths can be divided in groups as follows:

(1) Code that has access to pathname/filename data and has some level
of control over what data type comes in. This code may for instance
choose to deal with either bytes or str

(2) Code that takes the path or file name that it happens to get and
does something with it. This type of code can be divided into
subgroups as follows:

  (2a) Code that accepts only one type of paths (e.g. str, bytes or
pathlib) and fails if it gets something else.

  (2b) Code that wants to support different types of paths such as
str, bytes or pathlib objects. This includes os.path.*, os.scandir,
and various other standard library code. Presumably there is also
third-party code that does the same. These functions may want to
preserve the str-ness or bytes-ness of the paths in case they return
paths, as the stdlib now does. But new code may even want to return
pathlib objects when they get such objects as inputs. This is the
duck-typing or polymorphic code we have been talking about. Code of
this type (2b) may want to avoid implicit conversions because it makes
the life of code of the other types more difficult.

(feel free to fill in more categories of code)

So the code of type (2b) is trying to make all categories happy by
returning objects of the same type that it gets as input, while the
other categories are probably in the situation where they don't
necessarily need to make other categories of code happy.
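
To make categories (2a) and (2b) concrete, here is a small sketch. The
function names are hypothetical; os.path.basename stands in for "does
something with the path".

```python
import os


def strict_str_only(path):
    """(2a)-style: accepts exactly one type and fails on anything else."""
    if not isinstance(path, str):
        raise TypeError("expected str, got %s" % type(path).__name__)
    return os.path.basename(path)


def polymorphic(path):
    """(2b)-style: returns the same type it was given, no conversion."""
    if not isinstance(path, (str, bytes)):
        raise TypeError("expected str or bytes, got %s" % type(path).__name__)
    return os.path.basename(path)


from_str = polymorphic("dir/file.txt")
from_bytes = polymorphic(b"dir/file.txt")
```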

And the question is this: Do we need to make code using both bytes
*and* scandir happy? This is largely the same question as whether we
have to support bytes in addition to str in the protocol.

(We may of course talk about third-party path libraries that have the
same problem as scandir's DirEntry. Ethan's library is not exactly in
the same category as DirEntry since its path objects *are* instances
of bytes or str and therefore do not need this protocol to begin with,
except perhaps for conversions from other high-level path types so
that different path libraries work together nicely).

-Koos


Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Paul Moore
On 14 April 2016 at 17:46, Ethan Furman  wrote:
> On 04/14/2016 08:59 AM, Michael Mysinger via Python-Dev wrote:
>
>> I am saying that if os.path.join now accepts RichPath objects, and those
>> objects can return either str or bytes, then it's much harder to reason
>> about when I have all bytes or all strings. In essence, you will force
>> me to pre-wrap all RichPath objects in either
>> os.fsencode(os.fspath(path)) or os.fsdecode(os.fspath(path)), just so I
>> can reason about the type. And if I have to always do that wrapping then
>> os.path.join doesn't need to accept RichPath objects and call fspath at
>> all.
>
>
> What many folks seem to be missing is that *you* (generic you) have control
> of your data.
>
> If you are not working at the bytes layer, you shouldn't be getting bytes
> objects because:
>
> - you specified str when asking for data from the OS, or
> - you transformed the incoming bytes from whatever external source
>   to str when you received them.

My experience is that (particularly with code that was originally
written for Python 2) "you have control of your data" is often an
illusion - bytes can appear in code from unexpected sources, and when
they do I'd rather see an error if I'm using code where I expect a
string. Certainly that's a bug in the code - all I'm saying is that it
should fail early rather than late.

Having said this, I don't have an actual use case - but equally it
seems to me that our problem is that *nobody* does (yet) because
uptake of pathlib has been slow, thanks to limited stdlib support. My
view remains that we should get the (relatively simple and
uncontroversial) str support in place, and defer bytes support for
when we have experience with that.

I'd appreciate it if anyone can clarify why "gracefully extending" the
protocol to include bytes support at a later date isn't practical.
Paul


Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Ethan Furman

On 04/14/2016 08:59 AM, Michael Mysinger via Python-Dev wrote:

> I am saying that if os.path.join now accepts RichPath objects, and those
> objects can return either str or bytes, then it's much harder to reason about
> when I have all bytes or all strings. In essence, you will force me to
> pre-wrap all RichPath objects in either os.fsencode(os.fspath(path)) or
> os.fsdecode(os.fspath(path)), just so I can reason about the type. And if I
> have to always do that wrapping then os.path.join doesn't need to accept
> RichPath objects and call fspath at all.


What many folks seem to be missing is that *you* (generic you) have 
control of your data.

If you are not working at the bytes layer, you shouldn't be getting 
bytes objects because:

- you specified str when asking for data from the OS, or
- you transformed the incoming bytes from whatever external source
  to str when you received them.

--
~Ethan~


Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Ethan Furman

On 04/14/2016 09:09 AM, Victor Stinner wrote:
> 2016-04-14 16:54 GMT+02:00 Ethan Furman:
>>> I consider that the final goal of the whole discussion is to support
>>> something like:
>>>
>>>   path = os.path.join(pathlib_path, "str_path", direntry)
>>>
>>> (...)
>>> I expect that DirEntry.__fspath__ uses os.fsdecode() to return str,
>>> just to make my life easier.
>>
>> This would be where we strongly disagree.
>
> FYI it's ok that we disagree on this point, at least I expressed my opinion ;-)

Absolutely.  I appreciate you explaining your point of view.

> At least, we now identified better a point of disagreement.

Agreed.  :)

~Ethan~


Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Michael Mysinger via Python-Dev
Donald Stufft writes:

> > On Apr 14, 2016, at 11:59 AM, Michael Mysinger via Python-Dev wrote:
> > 
> > In essence, you will force me to pre-wrap all RichPath objects in either
> > os.fsencode(os.fspath(path)) or os.fsdecode(os.fspath(path)), just so I
> > can reason about the type.
> 
> This is only the case if you have a singular RichPath object that can
> represent both bytes and str (which is what DirEntry does, which I agree
> makes it harder… but that’s already the case with DirEntry.path).
> However that’s not the case if you have a bRichPath and uRichPath.

And you might even be able to retain your sanity if you enforce any 
particular class to be either bRichPath or uRichPath. But if you do that, 
then that still leaves DirEntry out in the cold, likely converting to str in 
its __fspath__. Which leaves me in the camp that bRichPath falls under YAGNI, 
and RichPath should be str only.



Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Random832
On Thu, Apr 14, 2016, at 12:05, Stephen J. Turnbull wrote:
> Random832 writes:
> 
>  > And what such incompatibilities exist between bytes and str for the
>  > purpose of representing file paths?
> 
> A plethora of encodings.

Only one encoding, fsencode/fsdecode. All other encodings are not for
filenames.

>  > At the end of the day, there's exactly one answer to "what file on
>  > disk this represents (or would represent if it existed)".
> 
> Nope.  Suppose those bytes were read from a file or a socket?  It's
> dangerous to assume that encoding matches the file system's.

Why can I pass them to os.open, then, or to os.path.join so long as
everything else is also bytes?

On UNIX, the filesystem is in bytes, so saying that bytes can't match
the filesystem is absurd. Converting it to str with fsdecode will
*always, absolutely, 100% of the time* give a str that will address the
same file that the bytes does (even if it's "dangerous" to assume that
was the name the user wanted, that's beyond the scope of what the module
is capable of dealing with).
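
The round-trip property claimed above can be sketched with the codec
spelled out. UTF-8 with the surrogateescape error handler is assumed
here as the filesystem encoding; the helper name roundtrip is
hypothetical.

```python
def roundtrip(raw, encoding="utf-8"):
    """Decode bytes to str and encode back, as fsdecode/fsencode would
    on a system whose filesystem encoding is the assumed one."""
    text = raw.decode(encoding, "surrogateescape")
    return text, text.encode(encoding, "surrogateescape")


# A name containing bytes that are not valid UTF-8.
raw_name = b"caf\xe9/\xff\xfe.txt"
text, back = roundtrip(raw_name)
```

The intermediate str contains lone surrogates, but encoding it again
yields exactly the original bytes, so the same file is addressed.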


Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Donald Stufft

> On Apr 14, 2016, at 11:59 AM, Michael Mysinger via Python-Dev wrote:
> 
> In essence, you will force me to pre-wrap all RichPath objects in either
> os.fsencode(os.fspath(path)) or os.fsdecode(os.fspath(path)), just so I
> can reason about the type.


This is only the case if you have a singular RichPath object that can represent 
both bytes and str (which is what DirEntry does, which I agree makes it harder… 
but that’s already the case with DirEntry.path). However that’s not the case if 
you have a bRichPath and uRichPath.

-
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA





Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Victor Stinner
2016-04-14 17:29 GMT+02:00 Ethan Furman :
> Interoperability with other systems and/or libraries.  If we use
> surrogateescape to transform str to bytes, and the other side does not, we
> no longer have a workable path.

I guess that you mean a Python library? When you exchange data with
external programs or call C libraries, Python is responsible for
encoding Unicode to bytes with os.fsencode(). The external part is not
aware that Python uses surrogateescape; it gets "regular" bytes.

I suggest treating such a Python library like an external program or
library: convert Unicode to bytes with os.fsencode(), but also
process paths as Unicode "inside" your application.

It's the basic rule for handling Unicode correctly in an application:
decode inputs as soon as possible, and encode back as late as
possible. Encode/decode at borders.
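
A minimal sketch of that rule applied to paths; the function name
handle_request and the "backup" directory are invented for the example.

```python
import os


def handle_request(raw_path):
    """Hypothetical entry point that receives a bytes path from outside."""
    text = os.fsdecode(raw_path)  # decode at the input border
    # All internal processing happens on str.
    backup = os.path.join(
        os.path.dirname(text), "backup", os.path.basename(text)
    )
    return os.fsencode(backup)    # encode at the output border


result = handle_request(b"data/report.txt")
```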

Victor


Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Victor Stinner
2016-04-14 16:54 GMT+02:00 Ethan Furman :
>> I consider that the final goal of the whole discussion is to support
>> something like:
>>
>>  path = os.path.join(pathlib_path, "str_path", direntry)
>>
>> (...)
>> I expect that DirEntry.__fspath__ uses os.fsdecode() to return str,
>> just to make my life easier.
>
> This would be where we strongly disagree.

FYI it's ok that we disagree on this point, at least I expressed my opinion ;-)

At least, we now identified better a point of disagreement.

Victor


Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Stephen J. Turnbull
Random832 writes:

 > And what such incompatibilities exist between bytes and str for the
 > purpose of representing file paths?

A plethora of encodings.

 > At the end of the day, there's exactly one answer to "what file on
 > disk this represents (or would represent if it existed)".

Nope.  Suppose those bytes were read from a file or a socket?  It's
dangerous to assume that encoding matches the file system's.



Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Michael Mysinger via Python-Dev
Ethan Furman writes:

> On 04/14/2016 12:03 AM, Michael Mysinger via Python-Dev wrote:
> > In particular, one RichPath class might return bytes and another str,
> > or even worse the same class might sometimes return bytes and sometimes
> > str. When will os.path.join blow up due to mixing bytes and str and
> > when will it work in those situations?
> 
> What are you asking here?  ...  Meaning allowing os.fspath() 
> and __fspath__ to return either bytes or str will never cause the 
> combination of bytes and str to work.  Said another way: if you are 
> using os.path.join then all the pieces have to be str or all the pieces 
> have to be bytes.

I am saying that if os.path.join now accepts RichPath objects, and those 
objects can return either str or bytes, then it's much harder to reason about 
when I have all bytes or all strings. In essence, you will force me to 
pre-wrap all RichPath objects in either os.fsencode(os.fspath(path)) or 
os.fsdecode(os.fspath(path)), just so I can reason about the type. And if I 
have to always do that wrapping then os.path.join doesn't need to accept 
RichPath objects and call fspath at all.
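
The wrapping described above would look roughly like this. The sketch
assumes an os.fspath() that passes str and bytes through and otherwise
calls __fspath__ (as the function that shipped in Python 3.6 does); the
RichPath class and the as_str/as_bytes helpers are hypothetical.

```python
import os


class RichPath:
    """Hypothetical rich path whose __fspath__ may return either type."""

    def __init__(self, raw):
        self._raw = raw

    def __fspath__(self):
        return self._raw


def as_str(path):
    """Collapse str, bytes or a rich path object to str."""
    return os.fsdecode(os.fspath(path))


def as_bytes(path):
    """Collapse str, bytes or a rich path object to bytes."""
    return os.fsencode(os.fspath(path))


# Without the wrapping, this join would mix bytes and str and fail.
joined = os.path.join(as_str(RichPath(b"dir")), as_str(RichPath("file")))
```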





Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Ethan Furman

On 04/14/2016 07:01 AM, Random832 wrote:
> On Thu, Apr 14, 2016, at 09:50, Chris Angelico wrote:
>> Adding integers and floats is considered "safe" because most people's
>> use of floats completely encompasses their use of ints. (You'll get
>> OverflowError if it can't be represented.) But float and Decimal are
>> considered "unsafe":
>>
>> --> 1.5 + decimal.Decimal("1.5")
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> TypeError: unsupported operand type(s) for +: 'float' and
>> 'decimal.Decimal'
>>
>> This is more what's happening here. Floats and Decimals can represent
>> similar sorts of things, but with enough incompatibilities that you
>> can't simply merge them.
>
> And what such incompatibilities exist between bytes and str for the
> purpose of representing file paths? At the end of the day, there's
> exactly one answer to "what file on disk this represents (or would
> represent if it existed)".


Interoperability with other systems and/or libraries.  If we use 
surrogateescape to transform str to bytes, and the other side does not, 
we no longer have a workable path.


--
~Ethan~



Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Ethan Furman

On 04/14/2016 06:56 AM, Victor Stinner wrote:
> 2016-04-14 15:40 GMT+02:00 Nick Coghlan:
>> Even earlier, Victor Stinner wrote:
>>> I consider that the final goal of the whole discussion is to support
>>> something like:
>>>
>>>  path = os.path.join(pathlib_path, "str_path", direntry)
>>
>> That's not a *new* problem though, it already exists if you pass in a
>> mix of bytes and str:
>> (...)
>> There's also already a solution (regardless of whether you want bytes
>> or str as the result), which is to explicitly coerce all the arguments
>> to the same type:
>>
>> --> os.path.join(*map(os.fsdecode, ("str", b"bytes")))
>> (...)
>
> I don't understand. What is the point of adding a new __fspath__
> protocol to *implicitly* convert path objects to strings, if you still
> have to use an explicit conversion?


That's the crux of the issue -- some of us think the job of __fspath__ 
is to simply retrieve the inherent data from the pathy object, *not* to 
do any implicit conversions.



> I would really expect that a high-level API like pathlib would solve
> encodings issues for me. IMHO DirEntry entries created by
> os.scandir(bytes) must use os.fsdecode() in their __fspath__ method.


Then let pathlib do it. As a high-level interface I have no issue with 
pathlib converting DirEntry bytes objects to str using fsdecode (or 
whatever makes sense); os.path.join (and by extension os.fspath and 
__fspath__) should do no such thing.



>> os.path.join(*map(os.fsdecode, ("str", b"bytes")))
>
> This code is quite complex for a newbie, don't you think so?


A newbie should be using pathlib.  If pathlib is not low-level enough, 
then the newbie needs to learn about low-level stuff.


--
~Ethan~


Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Ethan Furman

On 04/14/2016 05:16 AM, Victor Stinner wrote:

> I consider that the final goal of the whole discussion is to support
> something like:
>
>  path = os.path.join(pathlib_path, "str_path", direntry)
>
> Even if direntry uses a bytes filename. I expect genericpath.join() to
> be patched to use os.fspath(). If os.fspath() returns bytes,
> path.join() will fail with an annoying TypeError.
>
> I expect that DirEntry.__fspath__ uses os.fsdecode() to return str,
> just to make my life easier.


This would be where we strongly disagree.  If pathlib, as a high-level 
construct, wants to take that approach I have no issues, but the 
functions in os are low-level and as such should not be changing data 
types unless I ask for it.  I see __fspath__ as a retrieval mechanism, 
not a data-transformation mechanism.



> You can apply the same rationale for the flavors 2 and 3
> (os.fspath(path, allow_bytes=True)). Indirectly, you will get similar
> TypeError on os.path.join().


And that's fine.  Low-level interfaces should not change data types 
unless explicitly requested -- and we have fsencode() and fsdecode() for 
that.
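
A minimal sketch of the retrieval-versus-conversion distinction drawn
above, assuming Python 3.6 or later, where os.DirEntry implements
__fspath__ and os.fsdecode() accepts path objects. The temporary
directory and the file name example.txt are made up for illustration:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one file so that scandir() has something to yield.
    open(os.path.join(tmp, "example.txt"), "w").close()

    # Scanning a bytes path yields DirEntry objects that carry bytes names.
    with os.scandir(os.fsencode(tmp)) as entries:
        entry = next(entries)

    raw = os.fspath(entry)     # retrieval only: the type is left as bytes
    text = os.fsdecode(entry)  # explicit, requested conversion to str
```

Here os.fspath() hands back whatever the entry holds, and the caller
opts in to a str by calling os.fsdecode().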


--
~Ethan~



Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Ethan Furman

On 04/14/2016 12:03 AM, Michael Mysinger via Python-Dev wrote:

Brett Cannon writes:



> After playing with and considering the 4 possibilities, anything where
> __fspath__ can return bytes seems like insanity that flies in the face of
> everything Python 3 is trying to accomplish. In particular, one RichPath
> class might return bytes and another str, or even worse the same class might
> sometimes return bytes and sometimes str. When will os.path.join blow up due
> to mixing bytes and str and when will it work in those situations?


What are you asking here?  Exactly where in os.path.join the exception
will occur when mixing bytes & str, or whether mixing bytes & str will
ever work?


The answer to the first is irrelevant (except for performance).

The answer to the second is always/never (it will always blow up and
never work).  Meaning allowing os.fspath() and __fspath__ to return
either bytes or str will never cause the combination of bytes and str to
work.  Said another way: if you are using os.path.join then all the
pieces have to be str or all the pieces have to be bytes.
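
The all-str-or-all-bytes rule can be sketched as follows, assuming
Python 3.6 or later, where the join functions call os.fspath() on their
arguments. StrPath and BytesPath are hypothetical classes invented for
this example, and posixpath.join is used instead of os.path.join only so
that the separator does not depend on the platform:

```python
import posixpath


class StrPath:
    """Hypothetical rich path object whose __fspath__ returns str."""

    def __fspath__(self):
        return "dir"


class BytesPath:
    """Hypothetical rich path object whose __fspath__ returns bytes."""

    def __fspath__(self):
        return b"file"


all_str = posixpath.join(StrPath(), "name")       # every piece is str
all_bytes = posixpath.join(BytesPath(), b"name")  # every piece is bytes

try:
    posixpath.join(StrPath(), BytesPath())        # str mixed with bytes
except TypeError:
    mixed_failed = True
else:
    mixed_failed = False
```

Whether a piece arrives directly or through __fspath__ makes no
difference: the mix fails the same way.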


--
~Ethan~



Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Victor Stinner
2016-04-14 15:40 GMT+02:00 Nick Coghlan :
>> I consider that the final goal of the whole discussion is to support
>> something like:
>>
>> path = os.path.join(pathlib_path, "str_path", direntry)
>
> That's not a *new* problem though, it already exists if you pass in a
> mix of bytes and str:
> (...)
> There's also already a solution (regardless of whether you want bytes
> or str as the result), which is to explicitly coerce all the arguments
> to the same type:
>
> >>> os.path.join(*map(os.fsdecode, ("str", b"bytes")))
> (...)

I don't understand. What is the point of adding a new __fspath__
protocol to *implicitly* convert path objects to strings, if you still
have to use an explicit conversion?

I would really expect that a high-level API like pathlib would solve
encodings issues for me. IMHO DirEntry entries created by
os.scandir(bytes) must use os.fsdecode() in their __fspath__ method.

os.path.join() is just one example of an operation on multiple paths.
Look at os.path for other example ;-)

> os.path.join(*map(os.fsdecode, ("str", b"bytes")))

This code is quite complex for a newbie, don't you think so?

My example was os.path.join(pathlib_path, "str_path", direntry) where
we can do something to make the API easier to use.

I don't propose to do anything for os.path.join("str", b"bytes") which
would continue to fail with TypeError, *as expected*.

Victor


Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Random832
On Thu, Apr 14, 2016, at 09:50, Chris Angelico wrote:
> Adding integers and floats is considered "safe" because most people's
> use of floats completely encompasses their use of ints. (You'll get
> OverflowError if it can't be represented.) But float and Decimal are
> considered "unsafe":
> 
> >>> 1.5 + decimal.Decimal("1.5")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: unsupported operand type(s) for +: 'float' and
> 'decimal.Decimal'
> 
> This is more what's happening here. Floats and Decimals can represent
> similar sorts of things, but with enough incompatibilities that you
> can't simply merge them.

And what such incompatibilities exist between bytes and str for the
purpose of representing file paths? At the end of the day, there's
exactly one answer to "what file on disk this represents (or would
represent if it existed)".


Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Random832
On Thu, Apr 14, 2016, at 09:40, Nick Coghlan wrote:
> That's not a *new* problem though, it already exists if you pass in a
> mix of bytes and str:
> 
> There's also already a solution (regardless of whether you want bytes
> or str as the result), which is to explicitly coerce all the arguments
> to the same type:

It'd be nice if that went away. Having to do that makes about as much
sense to me as if you had to explicitly coerce an int to a float to add
them together. Sure, explicit is better than implicit, but there are
limits. You're explicitly calling os.path.join; isn't that explicit
enough?


Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Chris Angelico
On Thu, Apr 14, 2016 at 11:45 PM, Random832  wrote:
> On Thu, Apr 14, 2016, at 09:40, Nick Coghlan wrote:
>> That's not a *new* problem though, it already exists if you pass in a
>> mix of bytes and str:
>>
>> There's also already a solution (regardless of whether you want bytes
>> or str as the result), which is to explicitly coerce all the arguments
>> to the same type:
>
> It'd be nice if that went away. Having to do that makes about as much
> sense to me as if you had to explicitly coerce an int to a float to add
> them together. Sure, explicit is better than implicit, but there are
> limits. You're explicitly calling os.path.join; isn't that explicit
> enough?

Adding integers and floats is considered "safe" because most people's
use of floats completely encompasses their use of ints. (You'll get
OverflowError if it can't be represented.) But float and Decimal are
considered "unsafe":

>>> 1.5 + decimal.Decimal("1.5")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'float' and 'decimal.Decimal'

This is more what's happening here. Floats and Decimals can represent
similar sorts of things, but with enough incompatibilities that you
can't simply merge them.

ChrisA


Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Nick Coghlan
On 14 April 2016 at 22:16, Victor Stinner  wrote:
> 2016-04-13 19:10 GMT+02:00 Brett Cannon :
>> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has the
>> four potential approaches implemented (although it doesn't follow the
>> "separate functions" approach some are proposing and instead goes with the
>> allow_bytes approach I originally proposed).
>
> IMHO the best argument against the flavor 4 (fspath: str or bytes
> allowed) is the os.path.join() function.
>
> I consider that the final goal of the whole discussion is to support
> something like:
>
> path = os.path.join(pathlib_path, "str_path", direntry)

That's not a *new* problem though, it already exists if you pass in a
mix of bytes and str:

>>> import os.path
>>> os.path.join("str", b"bytes")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.4/posixpath.py", line 89, in join
    "components") from None
TypeError: Can't mix strings and bytes in path components

There's also already a solution (regardless of whether you want bytes
or str as the result), which is to explicitly coerce all the arguments
to the same type:

>>> os.path.join(*map(os.fsdecode, ("str", b"bytes")))
'str/bytes'
>>> os.path.join(*map(os.fsencode, ("str", b"bytes")))
b'str/bytes'

Assuming os.fsdecode and os.fsencode are updated to call os.fspath on
their argument before continuing with the current logic, the latter
two forms would both start automatically handling both DirEntry and
pathlib objects, while the first form would continue to throw
TypeError if handed an unexpected bytes value (whether directly or via
an __fspath__ call).
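
A minimal sketch of the coercion idiom described above, assuming a
Python version (3.6 or later) in which os.fsdecode() and os.fsencode()
do call os.fspath() on their argument. The tuple named parts is made up
for illustration, and posixpath.join stands in for os.path.join so the
separator is the same on every platform:

```python
import os
import pathlib
import posixpath

# Illustrative inputs: a path object, a str, and a bytes component.
parts = (pathlib.PurePosixPath("base"), "str", b"bytes")

# Coerce everything to str, or everything to bytes, before joining.
as_str = posixpath.join(*map(os.fsdecode, parts))
as_bytes = posixpath.join(*map(os.fsencode, parts))
```

The caller picks the result type once, and every component follows it.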

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Victor Stinner
2016-04-13 19:10 GMT+02:00 Brett Cannon :
> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has the
> four potential approaches implemented (although it doesn't follow the
> "separate functions" approach some are proposing and instead goes with the
> allow_bytes approach I originally proposed).

IMHO the best argument against the flavor 4 (fspath: str or bytes
allowed) is the os.path.join() function.

I consider that the final goal of the whole discussion is to support
something like:

path = os.path.join(pathlib_path, "str_path", direntry)

Even if direntry uses a bytes filename. I expect genericpath.join() to
be patched to use os.fspath(). If os.fspath() returns bytes,
path.join() will fail with an annoying TypeError.

I expect that DirEntry.__fspath__ uses os.fsdecode() to return str,
just to make my life easier.

I recall that I used to say that Python 2 doesn't support Unicode
filenames because os.path.join() raises a UnicodeDecodeError when you
try to join a Unicode filename with a byte filename which contains
non-ASCII bytes. The problem occurs indirectly in code using hardcoded
paths, Unicode or bytes paths. Saying that "Python 2 doesn't support
Unicode filenames" is wrong, but since Unicode is a hard problem, I
tried to simplify my explanation :-)

You can apply the same rationale for the flavors 2 and 3
(os.fspath(path, allow_bytes=True)). Indirectly, you will get similar
TypeError on os.path.join().

Victor


Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Michael Mysinger via Python-Dev
Brett Cannon writes:

> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has
> the four potential approaches implemented (although it doesn't follow the
> "separate functions" approach some are proposing and instead goes with the
> allow_bytes approach I originally proposed).
>

Thanks Brett, it is definitely a start! Maybe I am just more unimaginative 
than most, but since interoperability is the goal, I would ideally be able 
to play with a full implementation where all the stdlib functions Nick 
originally mentioned accepted these "rich path" objects. 

However, for concrete example purposes, maybe it is sufficient to start with 
your fspath function, a toy RichPath class implementing __fspath__, and 
something like os.path.join, which is a meaty enough example to test some of 
the functionality. I posted a gist of a string only example at 
https://gist.github.com/mmysinger/0b5ae2cfb866f7013c387a2683c7fc39

After playing with and considering the 4 possibilities, anything where 
__fspath__ can return bytes seems like insanity that flies in the face of 
everything Python 3 is trying to accomplish. In particular, one RichPath 
class might return bytes and another str, or even worse the same class might 
sometimes return bytes and sometimes str. When will os.path.join blow up due 
to mixing bytes and str and when will it work in those situations? So for me 
that eliminates #3 and #4.

Also the version #2 accepting bytes in os.fspath felt like it could be a 
very minor convenience, but even the str only version #1 just requires 
one isinstance check in the rare case you need to also deal with bytes (see 
the os.path.join example in the gist above). So I lean toward the str only 
#1 version. 

In any case I would start with the strict str only full implementation and 
loosen it either in 3.6 or 3.7 depending on what people think after actually 
using it.



Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Nick Coghlan
On 14 April 2016 at 14:05, Random832  wrote:
> On Wed, Apr 13, 2016, at 23:27, Nick Coghlan wrote:
>> In this kind of case, inheritance tends to trump protocol. For
>> example, int subclasses can't override operator.index:
> ...
>> The reasons for that behaviour are more pragmatic than philosophical:
>> builtins and their subclasses are extensively special-cased for speed
>> reasons, and those shortcuts are encountered before the interpreter
>> even considers using the general protocol.
>>
>> In cases where the magic method return types are polymorphic (so
>> subclasses may want to override them) we'll use more restrictive exact
>> type checks for the shortcuts, but that argument doesn't apply for
>> typechecked protocols where the result is required to be an instance
>> of a particular builtin type (but subclasses are considered
>> acceptable).
>
> Then why aren't we doing it for str? Because "try: path =
> path.__fspath__()" is more idiomatic than the alternative?

The sketches Brett posted will bear little resemblance to the actual
implementation - that will be in C and use similar idioms to those we
use for other abstract protocols (such as shortcuts for instances of
builtin types, and doing the method lookup via the passed in object's
type, rather than on the instance).
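
Looking the method up on the type rather than on the instance can be
observed from Python code. The sketch below assumes the os.fspath()
available in Python 3.6 or later; Plain and Proper are hypothetical
classes invented for this example:

```python
import os


class Plain:
    """Hypothetical class with no __fspath__ defined on the type."""


class Proper:
    """Hypothetical class that defines __fspath__ on the type."""

    def __fspath__(self):
        return "type-level"


obj = Plain()
obj.__fspath__ = lambda: "instance-level"  # attribute on the instance only

try:
    os.fspath(obj)
except TypeError:
    instance_attribute_used = False
else:
    instance_attribute_used = True

from_type = os.fspath(Proper())
```

An attribute set only on the instance is not consulted, which matches
how other special methods are resolved.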

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] pathlib - current status of discussions

2016-04-13 Thread Random832
On Wed, Apr 13, 2016, at 23:27, Nick Coghlan wrote:
> In this kind of case, inheritance tends to trump protocol. For
> example, int subclasses can't override operator.index:
...
> The reasons for that behaviour are more pragmatic than philosophical:
> builtins and their subclasses are extensively special-cased for speed
> reasons, and those shortcuts are encountered before the interpreter
> even considers using the general protocol.
> 
> In cases where the magic method return types are polymorphic (so
> subclasses may want to override them) we'll use more restrictive exact
> type checks for the shortcuts, but that argument doesn't apply for
> typechecked protocols where the result is required to be an instance
> of a particular builtin type (but subclasses are considered
> acceptable).

Then why aren't we doing it for str? Because "try: path =
path.__fspath__()" is more idiomatic than the alternative?

If some sort of reasoned decision has been made to require the protocol
to trump the special case for str subclasses, it's unreasonable not to
apply the same decision to bytes subclasses. The decision should be
"always use the protocol first" or "always use the type match first".

In other words, why not this:

def fspath(path, *, allow_bytes=False):
    if isinstance(path, (bytes, str) if allow_bytes else str):
        return path
    try:
        m = path.__fspath__
    except AttributeError:
        raise TypeError
    path = m()
    if isinstance(path, (bytes, str) if allow_bytes else str):
        return path
    raise TypeError
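
A sketch of how a type-match-first function of this shape treats the
Special(bytes) case from earlier in the thread. The function is restated
here so the example runs on its own; the error messages and the helper
name allowed are additions for illustration, not part of the proposal:

```python
def fspath(path, *, allow_bytes=False):
    # Type match first: str (and bytes, if allowed) pass straight through.
    allowed = (bytes, str) if allow_bytes else str
    if isinstance(path, allowed):
        return path
    try:
        method = path.__fspath__
    except AttributeError:
        raise TypeError("not a path-like object") from None
    result = method()
    if isinstance(result, allowed):
        return result
    raise TypeError("__fspath__ returned a disallowed type")


class Special(bytes):
    def __fspath__(self):
        return "str-val"


obj = Special("bytes-val", "utf8")

with_bytes = fspath(obj, allow_bytes=True)  # type match wins: obj itself
without_bytes = fspath(obj)                 # falls through to __fspath__()
```

With allow_bytes=True the bytes value passes through untouched, while
the str-only call still reaches the protocol, so str subclasses and
bytes subclasses are handled by the same rule.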


Re: [Python-Dev] pathlib - current status of discussions

2016-04-13 Thread Nick Coghlan
On 14 April 2016 at 13:14, Ethan Furman  wrote:
> On 04/13/2016 07:57 PM, Nikolaus Rath wrote:
>> Either I haven't understood your answer, or you haven't understood my
>> question. I'm concerned about this case:
>>
>>    class Special(bytes):
>>        def __fspath__(self):
>>            return 'str-val'
>>    obj = Special('bytes-val', 'utf8')
>>    path_obj = fspath(obj, allow_bytes=True)
>>
>> With #2, path_obj == 'bytes-val'. With #3, path_obj == 'str-val'.
>
> I misunderstood your question.  That is... an interesting case.  ;)

In this kind of case, inheritance tends to trump protocol. For
example, int subclasses can't override operator.index:

>>> from operator import index
>>> class NotAnInt():
...     def __index__(self):
...         return 42
...
>>> index(NotAnInt())
42
>>> class MyInt(int):
...     def __index__(self):
...         return 42
...
>>> index(MyInt(53))
53

The reasons for that behaviour are more pragmatic than philosophical:
builtins and their subclasses are extensively special-cased for speed
reasons, and those shortcuts are encountered before the interpreter
even considers using the general protocol.

In cases where the magic method return types are polymorphic (so
subclasses may want to override them) we'll use more restrictive exact
type checks for the shortcuts, but that argument doesn't apply for
typechecked protocols where the result is required to be an instance
of a particular builtin type (but subclasses are considered
acceptable).
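
The same "inheritance trumps protocol" behaviour can be seen with the
os.fspath() that is available in Python 3.6 or later. This is a small
sketch under that assumption, reusing the Special class from the
question:

```python
import os


class Special(bytes):
    def __fspath__(self):
        return "str-val"


obj = Special("bytes-val", "utf8")

# str and bytes instances (including subclasses) are returned as-is,
# so the __fspath__ override on the subclass is never called.
result = os.fspath(obj)
```

The result is the original bytes object, not 'str-val'.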

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] pathlib - current status of discussions

2016-04-13 Thread Ethan Furman

On 04/13/2016 07:57 PM, Nikolaus Rath wrote:

> On Apr 13 2016, Ethan Furman wrote:
>> On 04/13/2016 03:45 PM, Nikolaus Rath wrote:
>>
>>> When passing an object that is of type str and has a __fspath__
>>> attribute, all approaches return the value of __fspath__().
>>>
>>> However, when passing something of type bytes, the second approach
>>> returns the object, while the third returns the value of __fspath__().
>>>
>>> Is this intentional? I think a __fspath__ attribute should always be
>>> preferred.
>>
>> Yes, it is intentional.  The second approach assumes __fspath__ can
>> only contain str, so there is no point in checking it for bytes.
>
> Either I haven't understood your answer, or you haven't understood my
> question. I'm concerned about this case:
>
>    class Special(bytes):
>        def __fspath__(self):
>            return 'str-val'
>    obj = Special('bytes-val', 'utf8')
>    path_obj = fspath(obj, allow_bytes=True)
>
> With #2, path_obj == 'bytes-val'. With #3, path_obj == 'str-val'.


I misunderstood your question.  That is... an interesting case.  ;)

--
~Ethan~



Re: [Python-Dev] pathlib - current status of discussions

2016-04-13 Thread Nikolaus Rath
On Apr 13 2016, Ethan Furman  wrote:
> On 04/13/2016 03:45 PM, Nikolaus Rath wrote:
>
>> When passing an object that is of type str and has a __fspath__
>> attribute, all approaches return the value of __fspath__().
>>
>> However, when passing something of type bytes, the second approach
>> returns the object, while the third returns the value of __fspath__().
>>
>> Is this intentional? I think a __fspath__ attribute should always be
>> preferred.
>
> Yes, it is intentional.  The second approach assumes __fspath__ can
> only contain str, so there is no point in checking it for bytes.

Either I haven't understood your answer, or you haven't understood my
question. I'm concerned about this case:

  class Special(bytes):
      def __fspath__(self):
          return 'str-val'
  obj = Special('bytes-val', 'utf8')
  path_obj = fspath(obj, allow_bytes=True)

With #2, path_obj == 'bytes-val'. With #3, path_obj == 'str-val'.

I would expect that fspath(obj, allow_bytes=True) == 'str-val' (after
all, it's allow_bytes, not require_bytes). Bu


Best,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

 »Time flies like an arrow, fruit flies like a Banana.«


Re: [Python-Dev] pathlib - current status of discussions

2016-04-13 Thread Random832

On Apr 13, 2016 19:06, Brett Cannon  wrote:
> On Wed, 13 Apr 2016 at 15:46 Nikolaus Rath  wrote:
>> When passing an object that is of type str and has a __fspath__
>> attribute, all approaches return the value of __fspath__().
>>
>> However, when passing something of type bytes, the second approach
>> returns the object, while the third returns the value of __fspath__().
>>
>> Is this intentional? I think a __fspath__ attribute should always be
>> preferred.
>
>
> It's very much intentional. If we define __fspath__() to only return strings 
> but still want to minimize boilerplate of allowing bytes to simply pass 
> through without checking a path argument to see if it is bytes then approach 
> #2 is warranted. But if __fspath__() can return bytes then approach #3 allows 
> for it. 

Er, the difference comes in when the object passed to os.fspath is a subclass 
of bytes that, itself, has a __fspath__ method (which may return a str). It's 
unlikely to occur in the wild, but is a semantic difference between this case 
and all other objects with __fspath__ methods.


Re: [Python-Dev] pathlib - current status of discussions

2016-04-13 Thread Random832

On Apr 13, 2016 20:06, Chris Barker  wrote:
>
> In this case, I don't know that we need to be tolerant of buggy
> __fspath__() implementations -- they should be tested outside these
> checks, and not be buggy. So a buggy implementation may raise and may be
> ignored, depending on what Exception the bug triggers -- big deal. The
> only time it would matter is when the implementer is debugging the
> implementation.
>
> -CHB

Yes but you can often, and can in this case, restrict the contents of
the try block to a single operation - a name access, an attribute, a
subscript - and that sharply limits the risk of such a thing happening.
Sure, the object's __getattr(ibute)__ could still fail from something
deep inside it missing a different attribute, but that's it.


Re: [Python-Dev] pathlib - current status of discussions

2016-04-13 Thread Ethan Furman

On 04/13/2016 05:06 PM, Chris Barker wrote:


> In this case, I don't know that we need to be tolerant of buggy
> __fspath__() implementations -- they should be tested outside these
> checks, and not be buggy. So a buggy implementation may raise and may be
> ignored, depending on what Exception the bug triggers -- big deal. The
> only time it would matter is when the implementer is debugging the
> implementation.


Yet the idea behind robust exception handling is to test as little as 
possible and only catch what you know how to correct.


This code catches only one thing, only at one place, and we know how to 
deal with it:


  try:
 fsp = obj.__fspath__
  except AttributeError:
 pass
  else:
 fsp = fsp()

Contrarily, this next code catches the same error, but it could happen 
at the one place we know how to deal with it *or* anywhere further down 
the call stack where we have no clue what the proper course is to handle 
the problem... yet we suppress it anyway:


  try:
fsp = obj.__fspath__()
  except AttributeError:
pass

Certainly not code I want to see in the stdlib.
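
The difference between the two forms can be sketched with a deliberately
broken implementation. Buggy, narrow_lookup and broad_lookup are
hypothetical names invented for this example; the two functions mirror
the two snippets above:

```python
class Buggy:
    """Hypothetical class whose __fspath__ has an unrelated bug."""

    def __fspath__(self):
        # Bug: this attribute does not exist, so AttributeError is
        # raised from *inside* the method, not from the lookup.
        return self.missing_attribute


def narrow_lookup(obj):
    # Only the attribute lookup is guarded.
    try:
        fsp = obj.__fspath__
    except AttributeError:
        return None
    else:
        return fsp()


def broad_lookup(obj):
    # Lookup and call are guarded together.
    try:
        return obj.__fspath__()
    except AttributeError:
        return None


no_protocol = narrow_lookup(object())  # no __fspath__ at all: handled
swallowed = broad_lookup(Buggy())      # the bug is silently hidden

try:
    narrow_lookup(Buggy())
except AttributeError:
    bug_surfaced = True                # the bug propagates to the caller
else:
    bug_surfaced = False
```

The narrow form treats only "this object has no __fspath__" as the
expected case and lets every other failure reach the caller.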

--
~Ethan~


Re: [Python-Dev] pathlib - current status of discussions

2016-04-13 Thread Chris Barker
On Wed, Apr 13, 2016 at 1:47 PM, Random832  wrote:

> On Wed, Apr 13, 2016, at 16:39, Chris Barker wrote:
> > so are we worried that __fspath__ will exist and be callable, but  might
> > raise an AttributeError somewhere inside itself? if so isn't it broken
> > anyway, so should it be ignored?
>
> Well, if you're going to say "ignore the protocol because it's broken",
> where do you stop? What if it raises some other exception? What if it
> raises SystemExit?


this is pretty much always the case with EAFP coding:

try:
    something()
except SomeError:
    do_something_else()

unless SomeError is a custom defined error that you know is never going to
get raised anywhere else, then something() could raise SomeError for the
reason you expect, or some code deep in the call stack could raise
SomeError also, and you wouldn't know that.

I had a student run into this and it took him a good while to debug it. But
that was because the code in something() was pretty darn buggy. If he had
tested something() by itself, there would have been no issue finding the
problem.

In this case, I don't know that we need to be tolerant of buggy
__fspath__() implementations -- they should be tested outside these
checks, and not be buggy. So a buggy implementation may raise and may be
ignored, depending on what Exception the bug triggers -- big deal. The only
time it would matter is when the implementer is debugging the
implementation.

-CHB





-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Python-Dev] pathlib - current status of discussions

2016-04-13 Thread Brett Cannon
On Wed, 13 Apr 2016 at 15:20 Victor Stinner wrote:

> Oh, since others voted, I will also vote and explain my vote.
>
> I like choice 1, str only, because it's very well defined. In Python
> 3, Unicode is simply the native type for text. It's accepted by almost
> all functions. In other emails, I also explained that Unicode is fine
> to store undecodable filenames on UNIX, it works as expected since
> many years (since Python 3.3).
>
> --
>
> If you cannot survive without bytes, I suggest to add two functions:
> one for str only, another which can return str or bytes.
>
> Maybe you want in fact two protocols: __fspath__(str only) and
> __fspathb__ (bytes only)? os.fspathb() would first try __fspathb__, or
> fallback to os.fsencode(__fspath__). os.fspath() would first try
> __fspath__, or fallback to os.fsdecode(__fspathb__). IMHO it's not
> worth to have such complexity while Unicode handles all use cases.
>

Implementing two magic methods for this seems like overkill. Best I would
be willing to do with automatic encode/decode is use
os.fsencode()/os.fsdecode() on the argument or what __fspath__() returned.


>
> Or do you know functions implemented in Python accepting str *and* bytes?
>

On purpose, nothing off the top of my head.


>
> --
>
> The C implementation of the os module has an important
> path_converter() function:
>
>  * path_converter accepts (Unicode) strings and their
>  * subclasses, and bytes and their subclasses.  What
>  * it does with the argument depends on the platform:
>  *
>  *   * On Windows, if we get a (Unicode) string we
>  * extract the wchar_t * and return it; if we get
>  * bytes we extract the char * and return that.
>  *
>  *   * On all other platforms, strings are encoded
>  * to bytes using PyUnicode_FSConverter, then we
>  * extract the char * from the bytes object and
>  * return that.
>
> This function will implement something like os.fspath().
>
> With os.fspath() only accepting str, we will return directly the
> Unicode string on Windows. On UNIX, Unicode will be encoded, as it's
> already done for Unicode strings.
>
> This specific function would benefit from the flavor 4 (os.fspath() can
> return str and bytes), but it's more an exception than the rule. It
> would be more a micro-optimization than a good reason to drive the API
> design.
>

Yep, it's interesting to know but Chris and I won't let it drive the
decision (I assume).

-Brett


>
> Victor
>
> On Wednesday, 13 April 2016, Brett Cannon wrote:
> >
> > https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1
> > has the four potential approaches implemented (although it doesn't follow
> > the "separate functions" approach some are proposing and instead goes with
> > the allow_bytes approach I originally proposed).
>


Re: [Python-Dev] pathlib - current status of discussions

2016-04-13 Thread Brett Cannon
On Wed, 13 Apr 2016 at 15:46 Nikolaus Rath  wrote:

> On Apr 13 2016, Brett Cannon  wrote:
> > On Tue, 12 Apr 2016 at 22:38 Michael Mysinger via Python-Dev <
> > python-dev@python.org> wrote:
> >
> >> Ethan Furman writes:
> >>
> >> > Do we allow bytes to be returned from os.fspath()?  If yes, then do we
> >> > allow bytes from __fspath__()?
> >>
> >> De-lurking. Especially since the ultimate goal is better
> >> interoperability, I feel like an implementation that people can play
> >> with would help guide the few remaining decisions. To help test the
> >> various options you could temporarily add a
> >> _allow_bytes=GLOBAL_CONFIG_OPTION default argument to both
> >> pathlib.__fspath__() and os.fspath(), with distinct configurable
> >> defaults for each.
> >>
> >> In the spirit of Python 3 I feel like bytes might not be needed in
> >> practice, but something like this with defaults of False will allow
> >> people to easily test all the various options.
> >>
> >
> > https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has
> > the four potential approaches implemented (although it doesn't follow the
> > "separate functions" approach some are proposing and instead goes with
> > the allow_bytes approach I originally proposed).
>
>
> When passing an object that is of type str and has a __fspath__
> attribute, all approaches return the value of __fspath__().
>
> However, when passing something of type bytes, the second approach
> returns the object, while the third returns the value of __fspath__().
>
> Is this intentional? I think a __fspath__ attribute should always be
> preferred.
>

It's very much intentional. If we define __fspath__() to only return
strings but still want to minimize boilerplate of allowing bytes to simply
pass through without checking a path argument to see if it is bytes then
approach #2 is warranted. But if __fspath__() can return bytes then
approach #3 allows for it.
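
A minimal sketch of how approaches #2 and #3 differ for a bytes subclass
that also defines __fspath__(). This is not the code from the gist; the
function names fspath_approach2 and fspath_approach3 are made up for
illustration, and only the allow_bytes flag comes from the thread.

```python
def _call_fspath(path):
    """Call path.__fspath__() if present, without masking errors inside it."""
    try:
        method = path.__fspath__
    except AttributeError:
        return path
    return method()


def fspath_approach2(path, *, allow_bytes=False):
    """Hypothetical approach #2: __fspath__() is str-only, bytes pass through."""
    if allow_bytes and isinstance(path, bytes):
        return path  # returned untouched; __fspath__ is never consulted
    path = _call_fspath(path)
    if isinstance(path, str):
        return path
    raise TypeError("expected str, got {}".format(type(path).__name__))


def fspath_approach3(path, *, allow_bytes=False):
    """Hypothetical approach #3: __fspath__() wins and may return bytes."""
    path = _call_fspath(path)
    if isinstance(path, str) or (allow_bytes and isinstance(path, bytes)):
        return path
    raise TypeError("expected str, got {}".format(type(path).__name__))


class Special(bytes):
    def __fspath__(self):
        return "str-val"


obj = Special(b"bytes-val")
print(fspath_approach2(obj, allow_bytes=True))
print(fspath_approach3(obj, allow_bytes=True))
```

Under these assumptions the first call hands back the bytes object itself,
while the second returns whatever __fspath__() produced.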


Re: [Python-Dev] pathlib - current status of discussions

2016-04-13 Thread Ethan Furman

On 04/13/2016 03:45 PM, Nikolaus Rath wrote:


> When passing an object that is of type str and has a __fspath__
> attribute, all approaches return the value of __fspath__().
>
> However, when passing something of type bytes, the second approach
> returns the object, while the third returns the value of __fspath__().
>
> Is this intentional? I think a __fspath__ attribute should always be
> preferred.


Yes, it is intentional.  The second approach assumes __fspath__ can only
return str, so there is no point in checking it for bytes.


--
~Ethan~



Re: [Python-Dev] pathlib - current status of discussions

2016-04-13 Thread Nikolaus Rath
On Apr 13 2016, Brett Cannon  wrote:
> On Tue, 12 Apr 2016 at 22:38 Michael Mysinger via Python-Dev <
> python-dev@python.org> wrote:
>
>> Ethan Furman  stoneleaf.us> writes:
>>
>> > Do we allow bytes to be returned from os.fspath()?  If yes, then do we
>> > allow bytes from __fspath__()?
>>
>> De-lurking. Especially since the ultimate goal is better interoperability,
>> I
>> feel like an implementation that people can play with would help guide the
>> few remaining decisions. To help test the various options you could
>> temporarily add a _allow_bytes=GLOBAL_CONFIG_OPTION default argument to
>> both
>> pathlib.__fspath__() and os.fspath(), with distinct configurable defaults
>> for
>> each.
>>
>> In the spirit of Python 3 I feel like bytes might not be needed in
>> practice,
>> but something like this with defaults of False will allow people to easily
>> test all the various options.
>>
>
> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has
> the four potential approaches implemented (although it doesn't follow the
> "separate functions" approach some are proposing and instead goes with the
> allow_bytes approach I originally proposed).


When passing an object that is of type str and has a __fspath__
attribute, all approaches return the value of __fspath__().

However, when passing something of type bytes, the second approach
returns the object, while the third returns the value of __fspath__().

Is this intentional? I think a __fspath__ attribute should always be
preferred.


Best,
-Nikolaus


-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

 »Time flies like an arrow, fruit flies like a Banana.«


Re: [Python-Dev] pathlib - current status of discussions

2016-04-13 Thread Victor Stinner
Oh, since others voted, I will also vote and explain my vote.

I like choice 1, str only, because it's very well defined. In Python
3, Unicode is simply the native type for text. It's accepted by almost
all functions. In other emails, I also explained that Unicode is fine
for storing undecodable filenames on UNIX; this has worked as expected
for many years (since Python 3.3).
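
A small illustration of why str can carry undecodable filenames. It spells
out the surrogateescape error handler explicitly so that the result does not
depend on the platform or locale; the sample filename is made up.

```python
# Latin-1 encoded bytes that are not valid UTF-8.
raw = b"caf\xe9.txt"

# The undecodable byte 0xE9 becomes the lone surrogate U+DCE9.
name = raw.decode("utf-8", "surrogateescape")
print(ascii(name))

# Encoding with the same handler restores the original bytes exactly.
assert name.encode("utf-8", "surrogateescape") == raw
```

On POSIX systems os.fsdecode() and os.fsencode() apply the same handler using
the filesystem encoding.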

--

If you cannot survive without bytes, I suggest adding two functions:
one for str only, another which can return str or bytes.

Maybe what you want is in fact two protocols: __fspath__ (str only) and
__fspathb__ (bytes only)? os.fspathb() would first try __fspathb__, or
fall back to os.fsencode(__fspath__). os.fspath() would first try
__fspath__, or fall back to os.fsdecode(__fspathb__). IMHO such
complexity is not worth having when Unicode handles all use cases.
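
A rough sketch of the two-protocol idea, only to make the fallback order
concrete. __fspathb__ and fspathb() are the hypothetical names from the
paragraph above and exist in no Python release; fspath() here is a local
function, not os.fspath().

```python
import os


def fspath(path):
    """str flavour: try __fspath__ first, else decode __fspathb__."""
    if isinstance(path, str):
        return path
    cls = type(path)
    if hasattr(cls, "__fspath__"):
        return cls.__fspath__(path)
    if hasattr(cls, "__fspathb__"):
        return os.fsdecode(cls.__fspathb__(path))
    raise TypeError("not a path-like object: {!r}".format(path))


def fspathb(path):
    """bytes flavour: try __fspathb__ first, else encode __fspath__."""
    if isinstance(path, bytes):
        return path
    cls = type(path)
    if hasattr(cls, "__fspathb__"):
        return cls.__fspathb__(path)
    if hasattr(cls, "__fspath__"):
        return os.fsencode(cls.__fspath__(path))
    raise TypeError("not a path-like object: {!r}".format(path))


class StrOnly:
    def __fspath__(self):
        return "spam"


class BytesOnly:
    def __fspathb__(self):
        return b"eggs"


print(fspathb(StrOnly()), fspath(BytesOnly()))
```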

Or do you know functions implemented in Python accepting str *and* bytes?

--

The C implementation of the os module has an important
path_converter() function:

 * path_converter accepts (Unicode) strings and their
 * subclasses, and bytes and their subclasses.  What
 * it does with the argument depends on the platform:
 *
 *   * On Windows, if we get a (Unicode) string we
 * extract the wchar_t * and return it; if we get
 * bytes we extract the char * and return that.
 *
 *   * On all other platforms, strings are encoded
 * to bytes using PyUnicode_FSConverter, then we
 * extract the char * from the bytes object and
 * return that.

This function will implement something like os.fspath().

With os.fspath() only accepting str, we will return the Unicode string
directly on Windows. On UNIX, Unicode will be encoded, as is already
done for Unicode strings.

This specific function would benefit from flavor 4 (os.fspath() can
return str and bytes), but it's more the exception than the rule. It
would be more a micro-optimization than a good reason to drive the API
design.
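
A Python-level approximation of the behaviour the C comment describes, as a
sketch only. convert_path is a made-up name, and os.fsencode() stands in for
PyUnicode_FSConverter; the real converter works on C pointers.

```python
import os
import sys


def convert_path(path):
    """Return what would be handed to the OS layer for a str or bytes path."""
    if not isinstance(path, (str, bytes)):
        raise TypeError("expected str or bytes, got {}".format(type(path).__name__))
    if sys.platform == "win32":
        # Windows: str is used as wide characters, bytes as narrow characters.
        return path
    if isinstance(path, str):
        # Other platforms: str is encoded with the filesystem encoding.
        return os.fsencode(path)
    return path


print(type(convert_path("spam.txt")).__name__)
```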

Victor

On Wednesday, 13 April 2016, Brett Cannon wrote:
>
> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has the 
> four potential approaches implemented (although it doesn't follow the 
> "separate functions" approach some are proposing and instead goes with the 
> allow_bytes approach I originally proposed).


Re: [Python-Dev] pathlib - current status of discussions

2016-04-13 Thread Random832
On Wed, Apr 13, 2016, at 16:39, Chris Barker wrote:
> so are we worried that __fspath__ will exist and be callable, but  might
> raise an AttributeError somewhere inside itself? if so isn't it broken
> anyway, so should it be ignored?

Well, if you're going to say "ignore the protocol because it's broken",
where do you stop? What if it raises some other exception? What if it
raises SystemExit? 


Re: [Python-Dev] pathlib - current status of discussions

2016-04-13 Thread Brett Cannon
On Wed, 13 Apr 2016 at 13:40 Chris Barker  wrote:

> so are we worried that __fspath__ will exist and be callable, but  might
> raise an AttributeError somewhere inside itself? if so isn't it broken
> anyway, so should it be ignored?
>

It should propagate instead of swallowing up the exception, otherwise it's
hard to debug why __fspath__ seems to be ignored.
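
To make the debugging problem concrete, here is a small sketch comparing the
broad try/except from the gist with the narrower variant suggested in the
thread. Broken, fspath_broad and fspath_narrow are illustrative names only.

```python
class Broken:
    def __fspath__(self):
        # Bug inside the implementation: this attribute does not exist.
        return self.missing_attribute


def fspath_broad(path):
    """Wraps the call itself, so errors raised inside __fspath__ are hidden."""
    try:
        path = path.__fspath__()
    except AttributeError:
        pass
    return path


def fspath_narrow(path):
    """Guards only the attribute lookup; errors inside __fspath__ propagate."""
    try:
        method = path.__fspath__
    except AttributeError:
        return path
    return method()


obj = Broken()
print(fspath_broad(obj) is obj)  # the bug is silently swallowed
try:
    fspath_narrow(obj)
except AttributeError as exc:
    print("propagated:", exc)
```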


>
> and I know it's asking permission rather than forgiveness, but what's
> wrong with:
>
> if hasattr(path, "__fspath__"):
>     path = path.__fspath__()
>
> if you really want to check for the existence of the attribute first?
>
>
Nothing.


> or even:
>
> path = path.__fspath__() if hasattr(path, "__fspath__") else path
>
>
That also works.


>
> (OK, really a Pythonic style question now)
>

Yes, this is getting a bit side-tracked over some example code to just get
a concept across.

-Brett


>
> -CHB
>
>
>
> On Wed, Apr 13, 2016 at 12:54 PM, Brett Cannon  wrote:
>
>>
>>
>> On Wed, 13 Apr 2016 at 12:39 Fred Drake  wrote:
>>
>>> On Wed, Apr 13, 2016 at 3:24 PM, Chris Angelico 
>>> wrote:
>>> > Is that the intention, or should the exception catching be narrower? I
>>> > know it's clunky to write it in Python, but AIUI it's less so in C:
>>> >
>>> > try:
>>> >     callme = path.__fspath__
>>> > except AttributeError:
>>> >     pass
>>> > else:
>>> >     path = callme()
>>>
>>> +1 for this variant; I really don't like masking errors inside the
>>> __fspath__ implementation.
>>>
>>
>> Don't read too much into the code in that gist. I just did them quickly
>> to get the point across of the proposals in terms of str/bytes, not what
>> will be proposed in any final patch.
>>
>>
>
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR(206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115   (206) 526-6317   main reception
>
> chris.bar...@noaa.gov
>


Re: [Python-Dev] pathlib - current status of discussions

2016-04-13 Thread Chris Barker
so are we worried that __fspath__ will exist and be callable, but  might
raise an AttributeError somewhere inside itself? if so isn't it broken
anyway, so should it be ignored?

and I know it's asking permission rather than forgiveness, but what's
wrong with:

if hasattr(path, "__fspath__"):
    path = path.__fspath__()

if you really want to check for the existence of the attribute first?

or even:

path = path.__fspath__() if hasattr(path, "__fspath__") else path


(OK, really a Pythonic style question now)

-CHB



On Wed, Apr 13, 2016 at 12:54 PM, Brett Cannon  wrote:

>
>
> On Wed, 13 Apr 2016 at 12:39 Fred Drake  wrote:
>
>> On Wed, Apr 13, 2016 at 3:24 PM, Chris Angelico  wrote:
>> > Is that the intention, or should the exception catching be narrower? I
>> > know it's clunky to write it in Python, but AIUI it's less so in C:
>> >
>> > try:
>> >     callme = path.__fspath__
>> > except AttributeError:
>> >     pass
>> > else:
>> >     path = callme()
>>
>> +1 for this variant; I really don't like masking errors inside the
>> __fspath__ implementation.
>>
>
> Don't read too much into the code in that gist. I just did them quickly to
> get the point across of the proposals in terms of str/bytes, not what will
> be proposed in any final patch.
>
>


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Python-Dev] pathlib - current status of discussions

2016-04-13 Thread Brett Cannon
On Wed, 13 Apr 2016 at 12:39 Fred Drake  wrote:

> On Wed, Apr 13, 2016 at 3:24 PM, Chris Angelico  wrote:
> > Is that the intention, or should the exception catching be narrower? I
> > know it's clunky to write it in Python, but AIUI it's less so in C:
> >
> > try:
> >     callme = path.__fspath__
> > except AttributeError:
> >     pass
> > else:
> >     path = callme()
>
> +1 for this variant; I really don't like masking errors inside the
> __fspath__ implementation.
>

Don't read too much into the code in that gist. I just did them quickly to
get the point across of the proposals in terms of str/bytes, not what will
be proposed in any final patch.


Re: [Python-Dev] pathlib - current status of discussions

2016-04-13 Thread Chris Angelico
On Thu, Apr 14, 2016 at 5:46 AM, Random832  wrote:
> On Wed, Apr 13, 2016, at 15:24, Chris Angelico wrote:
>> Is that the intention, or should the exception catching be narrower? I
>> know it's clunky to write it in Python, but AIUI it's less so in C:
>
> How is it less so in C? You lose the ability to PyObject_CallMethod.

I might be wrong, then. Wasn't sure how it was all implemented.
Anyway, it's a correctness thing, not a simplicity one, so even if it
is clunkier, it ought to be the case.

And that is the intention, so we're fine.

ChrisA


Re: [Python-Dev] pathlib - current status of discussions

2016-04-13 Thread Random832
On Wed, Apr 13, 2016, at 15:24, Chris Angelico wrote:
> Is that the intention, or should the exception catching be narrower? I
> know it's clunky to write it in Python, but AIUI it's less so in C:

How is it less so in C? You lose the ability to PyObject_CallMethod.


Re: [Python-Dev] pathlib - current status of discussions

2016-04-13 Thread Alexander Walters

On 4/13/2016 13:49, Ethan Furman wrote:
> Number 3: it allows bytes, but only when told it's okay to do so.
> Having code get a bytes object when one is not expected is not a
> headache we need to inflict on anyone.


This is an artifact of the other needless restrictions I said I wouldn't 
rant about.  I think it is in the best interest not to perpetuate those 
needless restrictions.



Re: [Python-Dev] pathlib - current status of discussions

2016-04-13 Thread Chris Angelico
On Thu, Apr 14, 2016 at 5:30 AM, Brett Cannon  wrote:
>
>
> On Wed, 13 Apr 2016 at 12:25 Chris Angelico  wrote:
>>
>> On Thu, Apr 14, 2016 at 3:10 AM, Brett Cannon  wrote:
>> > https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has
>> > the
>> > four potential approaches implemented (although it doesn't follow the
>> > "separate functions" approach some are proposing and instead goes with
>> > the
>> > allow_bytes approach I originally proposed).
>>
>> All of them have this construct:
>>
>> try:
>>     path = path.__fspath__()
>> except AttributeError:
>>     pass
>>
>> Is that the intention, or should the exception catching be narrower? I
>> know it's clunky to write it in Python, but AIUI it's less so in C:
>>
>> try:
>>     callme = path.__fspath__
>> except AttributeError:
>>     pass
>> else:
>>     path = callme()
>
>
> I'm assuming the C code will do what you're suggesting. My way is just
> faster to write in 2 minutes of coding. :)

Cool cool. Just checking!

You're already aware that my preference is for the first one,
str-only. I don't think the second one has much value (a path-like
object can only ever return a str, but a bytes can be passed through
unchanged?), and the fourth strikes me as a bad idea (just allowing
bytes any time). So my votes are +1, -0.5, +0, -1.

ChrisA


Re: [Python-Dev] pathlib - current status of discussions

2016-04-13 Thread Fred Drake
On Wed, Apr 13, 2016 at 3:24 PM, Chris Angelico  wrote:
> Is that the intention, or should the exception catching be narrower? I
> know it's clunky to write it in Python, but AIUI it's less so in C:
>
> try:
>     callme = path.__fspath__
> except AttributeError:
>     pass
> else:
>     path = callme()

+1 for this variant; I really don't like masking errors inside the
__fspath__ implementation.


  -Fred

-- 
Fred L. Drake, Jr.
"A storm broke loose in my mind."  --Albert Einstein


Re: [Python-Dev] pathlib - current status of discussions

2016-04-13 Thread Brett Cannon
On Wed, 13 Apr 2016 at 12:25 Chris Angelico  wrote:

> On Thu, Apr 14, 2016 at 3:10 AM, Brett Cannon  wrote:
> > https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1
> has the
> > four potential approaches implemented (although it doesn't follow the
> > "separate functions" approach some are proposing and instead goes with
> the
> > allow_bytes approach I originally proposed).
>
> All of them have this construct:
>
> try:
>     path = path.__fspath__()
> except AttributeError:
>     pass
>
> Is that the intention, or should the exception catching be narrower? I
> know it's clunky to write it in Python, but AIUI it's less so in C:
>
> try:
>     callme = path.__fspath__
> except AttributeError:
>     pass
> else:
>     path = callme()
>

I'm assuming the C code will do what you're suggesting. My way is just
faster to write in 2 minutes of coding. :)


Re: [Python-Dev] pathlib - current status of discussions

2016-04-13 Thread Chris Angelico
On Thu, Apr 14, 2016 at 3:10 AM, Brett Cannon  wrote:
> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has the
> four potential approaches implemented (although it doesn't follow the
> "separate functions" approach some are proposing and instead goes with the
> allow_bytes approach I originally proposed).

All of them have this construct:

try:
    path = path.__fspath__()
except AttributeError:
    pass

Is that the intention, or should the exception catching be narrower? I
know it's clunky to write it in Python, but AIUI it's less so in C:

try:
    callme = path.__fspath__
except AttributeError:
    pass
else:
    path = callme()

ChrisA


Re: [Python-Dev] pathlib - current status of discussions

2016-04-13 Thread Antoine Pitrou
Brett Cannon  python.org> writes:
> In the spirit of Python 3 I feel like bytes might not be needed in practice,
> but something like this with defaults of False will allow people to easily
> test all the various options.
> 
> 
> 
> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has
> the four potential approaches implemented (although it doesn't follow the
> "separate functions" approach some are proposing and instead goes with the
> allow_bytes approach I originally proposed).

Either number 1 or number 3 for me (I don't think bytes path-like
objects are useful in Python).

Regards

Antoine.


Re: [Python-Dev] pathlib - current status of discussions

2016-04-13 Thread Ethan Furman

On 04/13/2016 10:22 AM, Alexander Walters wrote:

> On 4/13/2016 13:10, Brett Cannon wrote:
>
>> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1
>> has the four potential approaches implemented (although it doesn't
>> follow the "separate functions" approach some are proposing and
>> instead goes with the allow_bytes approach I originally proposed).
>
> Number 4 is my personal favorite - it has a simple control flow path and
> is the least needlessly restrictive.


Number 3: it allows bytes, but only when told it's okay to do so. 
Having code get a bytes object when one is not expected is not a 
headache we need to inflict on anyone.


--
~Ethan~



Re: [Python-Dev] pathlib - current status of discussions

2016-04-13 Thread Alexander Walters

On 4/13/2016 13:10, Brett Cannon wrote:
> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has
> the four potential approaches implemented (although it doesn't follow
> the "separate functions" approach some are proposing and instead goes
> with the allow_bytes approach I originally proposed).


Number 4 is my personal favorite - it has a simple control flow path and 
is the least needlessly restrictive.


(I could rant about needless restrictions, but I am about a decade late
for that, so I won't bother.)



Re: [Python-Dev] pathlib - current status of discussions

2016-04-13 Thread Brett Cannon
On Tue, 12 Apr 2016 at 22:38 Michael Mysinger via Python-Dev <
python-dev@python.org> wrote:

> Ethan Furman  stoneleaf.us> writes:
>
> > Do we allow bytes to be returned from os.fspath()?  If yes, then do we
> > allow bytes from __fspath__()?
>
> De-lurking. Especially since the ultimate goal is better interoperability,
> I
> feel like an implementation that people can play with would help guide the
> few remaining decisions. To help test the various options you could
> temporarily add a _allow_bytes=GLOBAL_CONFIG_OPTION default argument to
> both
> pathlib.__fspath__() and os.fspath(), with distinct configurable defaults
> for
> each.
>
> In the spirit of Python 3 I feel like bytes might not be needed in
> practice,
> but something like this with defaults of False will allow people to easily
> test all the various options.
>

https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has
the four potential approaches implemented (although it doesn't follow the
"separate functions" approach some are proposing and instead goes with the
allow_bytes approach I originally proposed).


Re: [Python-Dev] pathlib - current status of discussions

2016-04-13 Thread Nick Coghlan
On 13 April 2016 at 02:19, Chris Barker  wrote:
> So: why use strings as the lingua franca of paths? i.e. the basis of the
> path protocol. maybe we should support only two path representations:
>
> 1) A "proper" path object -- i.e. pathlib.Path or anything else that
> supports the path protocol.
>
> 2) the bytes that the OS actually needs.
>
> this would mean that the protocol would be to have a __pathbytes__() method
> that would return the bytes that should be passed off to the OS.

The reason to favour strings over raw bytes for path manipulation is
the same reason to favour them anywhere else: to avoid having to worry
about encodings *while* you're manipulating things, and instead only
worry about the encoding when actually talking to the OS (which may be
UTF-16-LE to talk to a Windows API, or UTF-8 to talk to a *nix API, or
something else entirely if your OS is set up that way, or you're
writing the path to a file or network packet, rather than using it
locally).

Regardless of what we decide about os.fspath's return type, that
general principle won't change - if you're manipulating bytes paths
directly, you're doing something relatively specialised (like working
on CPython's own os module).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] pathlib - current status of discussions

2016-04-12 Thread Michael Mysinger via Python-Dev
Ethan Furman  stoneleaf.us> writes:
 
> Do we allow bytes to be returned from os.fspath()?  If yes, then do we 
> allow bytes from __fspath__()?

De-lurking. Especially since the ultimate goal is better interoperability, I 
feel like an implementation that people can play with would help guide the 
few remaining decisions. To help test the various options you could 
temporarily add a _allow_bytes=GLOBAL_CONFIG_OPTION default argument to both 
pathlib.__fspath__() and os.fspath(), with distinct configurable defaults for 
each. 

In the spirit of Python 3 I feel like bytes might not be needed in practice, 
but something like this with defaults of False will allow people to easily 
test all the various options.





Re: [Python-Dev] pathlib - current status of discussions

2016-04-12 Thread Koos Zevenhoven
On Tue, Apr 12, 2016 at 6:52 PM, Stephen J. Turnbull  wrote:
>
> (A) Why does anybody need bytes out of a pathlib.Path (or other
> __fspath__-toting, higher-level API) *inside* the boundary?  Note
> that the APIs in os (etc) *don't need* bytes because they are
> already polymorphic.
>

Indeed not from pathlib.*Path , but from DirEntry, which may have a
path as bytes. So the options for DirEntry (or things like Ethan's
'antipathy') are:

(1) Provide bytes or str via the protocol, depending on which type
this DirEntry has

Downside: The protocol needs to support str and bytes.

(2) Decode bytes using os.fsdecode and provide a str via the protocol

Downside: The user passed in bytes and maybe had a reason to do so.
This might lead to a weird mixture of str and bytes in the same code.

(3) Do not implement the protocol when dealing with bytes

Downside: If a function calling os.scandir accepts both bytes and str
in a duck-typing fashion, then, if it adopted something that uses
the new protocol, it would lose its bytes compatibility. This risk might
not be huge, so perhaps (3) is an option?


> (B) If they do, why can't they just apply bytes() to the object?  I
> understand that that would offend Ethan's aesthetic sense, so it's
> worth looking for a nice way around it.  But allowing __fspath__
> to return bytes or str is hideous, because Paths are clearly on
> the application side of the boundary.
>
> Note that bytes() may not have the serious problem that str() does of
> being too catholic about its argument: nothing in __builtins__ has a
> __bytes__!  Of course there are a few things that do work: ints, and
> sequences of ints.

Good point. But this only applies when the user _explicitly_ deals
with bytes. When the user just deals with whichever type (str or bytes)
is passed in, as os.path.* and DirEntry now do, this does
not work.
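
For reference, a short example of the "deal with whichever type is passed in"
behaviour of os.path. It uses posixpath directly so the separator is the same
on every platform.

```python
import posixpath

# The result type follows the argument type.
as_str = posixpath.join("usr", "lib")
as_bytes = posixpath.join(b"usr", b"lib")
print(as_str, as_bytes)

# Mixing the two types in one call is rejected.
try:
    posixpath.join("usr", b"lib")
except TypeError as exc:
    print("rejected:", exc)
```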

-Koos


Re: [Python-Dev] pathlib - current status of discussions

2016-04-12 Thread Chris Barker
On Tue, Apr 12, 2016 at 9:32 AM, Koos Zevenhoven  wrote:

> > 1) A "proper" path object -- i.e. pathlib.Path or anything else that
> > supports the path protocol.
> >
> > 2) the bytes that the OS actually needs.
> >
>
> You do have a point there. But since bytes pathnames are deprecated on
> windows,


Ah -- there's the fatal flaw -- even Windows needs bytes at the lowest
level, but the decision was already made there to use str as the
lingua franca -- i.e. the user NEVER sees a path as a bytestring on
Windows? I guess that's decided then. str is the exchange format.

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Python-Dev] pathlib - current status of discussions

2016-04-12 Thread Ethan Furman

On 04/12/2016 09:26 AM, Koos Zevenhoven wrote:


> So I'm, once again, posing this question (that I don't think got any
> reactions previously): Is there a significant audience for this new
> function, or is it enough to keep it a private function for the stdlib
> to use?


Quite frankly, I expect the stdlib itself to be the primary consumer. 
But I see no reason to not publish the function so that users who need 
the advanced functionality have easy access to it.


--
~Ethan~



Re: [Python-Dev] pathlib - current status of discussions

2016-04-12 Thread Koos Zevenhoven
On Tue, Apr 12, 2016 at 7:19 PM, Chris Barker  wrote:
>
> One more thought came up just now: there are different levels of abstraction
> and representation for paths. We don't want to make Path a subclass of
> string, because Path is supposed to be a higher level abstraction -- good.
>
> Then at the bottom of the stack, we NEED the bytes level path, because that's
> what ultimately gets passed to the OS.
>
> The legacy from the single-byte encoding days is that bytes and strings were
> the same, so we could let people work with nice human readable strings,
> while also working with byte paths in the same way -- but those days are
> gone -- py3 makes a clear (and important) distinction between nice human
> readable strings and the bytes that represent them.
>
> So: why use strings as the lingua franca of paths? i.e. the basis of the
> path protocol. maybe we should support only two path representations:
>
> 1) A "proper" path object -- i.e. pathlib.Path or anything else that
> supports the path protocol.
>
> 2) the bytes that the OS actually needs.
>

You do have a point there. But since bytes pathnames are deprecated on
Windows, this seems to lead to supporting both str and bytes in the
protocol, or having two protocols __fspathbytes__ and __fspathstr__
(and one being preferred over the other, potentially even depending on
the platform).

-Koos


Re: [Python-Dev] pathlib - current status of discussions

2016-04-12 Thread Koos Zevenhoven
On Tue, Apr 12, 2016 at 11:56 AM, Nick Coghlan  wrote:
> One possible way to address this concern would be to have the
> underlying protocol be bytes/str (since boundary code frequently needs
> to handle the paths-are-bytes assumption in POSIX), but offer an
> "os.fspathname" API that rejected bytes output from os.fspath. That
> is, it would be equivalent to:
>
> def fspathname(path):
>     name = os.fspath(path)
>     if not isinstance(name, str):
>         raise TypeError("Expected str for pathname, not "
>                         "{}".format(type(name)))
>     return name
>
> That way folks that wanted the clean "must be str" signature could use
> os.fspathname, while those that wanted to accept either could use the
> lower level os.fspath.
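
A runnable version of the quoted helper with a short usage example, assuming
a Python in which os.fspath() exists (3.6 or later) and passes bytes through
unchanged. os.fspathname itself was only a proposal in this thread, so it is
defined locally here.

```python
import os
import pathlib


def fspathname(path):
    """Like os.fspath(), but reject anything that does not resolve to str."""
    name = os.fspath(path)
    if not isinstance(name, str):
        raise TypeError(
            "Expected str for pathname, not {}".format(type(name).__name__)
        )
    return name


print(fspathname(pathlib.PurePosixPath("a/b")))
try:
    fspathname(b"a/b")
except TypeError as exc:
    print("rejected:", exc)
```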

I'm not necessarily opposed to this. I kept bringing up bytes in the
discussion because os.path.* etc. and DirEntry support bytes and will
need to keep doing so for backwards compatibility.  I have no
intention to use bytes pathnames myself. But it may break existing
code if functions, for instance, began to decode bytes paths to str if
they did not previously do so (or to reject them). It is indeed a lot
safer to make new code not support bytes paths than to change the
behavior of old code.

But then again, do we really recommend new code to use os.fspath (or
os.fspathname)? Should they not be using either pathlib or os.path.*
etc. so they don't have to care? I'm sure Ethan and his library (or
some other path library) will manage without the function in the
stdlib, as long as the dunder attribute is there.

So I'm, once again, posing this question (that I don't think got any
reactions previously): Is there a significant audience for this new
function, or is it enough to keep it a private function for the stdlib
to use? That handful of third-party path libraries can decide for
themselves if they want to (a) reject bytes or (b) implicitly fsdecode
them or (c) pass them through just like str, depending on whatever
their case requires in terms of backwards compatibility or other goals.

If we forget about the os.fswhatever function, we only have to decide
whether the magic dunder attribute can be str or bytes or just str.
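
To make options (a), (b) and (c) concrete, here is a minimal sketch of
how a third-party library might normalise whatever __fspath__ hands
back. The helper name normalize and its policy argument are made up
for illustration, and os.fspath is assumed to behave as it does in
Python 3.6 and later (returning either str or bytes).

```python
import os

def normalize(path, policy="passthrough"):
    """Hypothetical helper: apply one of the three bytes policies."""
    raw = os.fspath(path)          # str or bytes (Python 3.6+)
    if isinstance(raw, str):
        return raw
    if policy == "reject":         # (a) refuse bytes outright
        raise TypeError("bytes paths are not supported")
    if policy == "decode":         # (b) implicitly fsdecode
        return os.fsdecode(raw)
    return raw                     # (c) pass bytes through unchanged
```

Each library would pick one branch and hard-code it; the policy switch
is only there to show the three behaviours side by side.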

-Koos


Re: [Python-Dev] pathlib - current status of discussions

2016-04-12 Thread Chris Barker
On Mon, Apr 11, 2016 at 10:40 PM, Greg Ewing 
wrote:
>
> So the ONLY thing
>> you should do with it is pass it along to another low level system
>> call.
>>
>
> Not quite -- you can separate it into components and
> work with them. Essentially the same set of operations
> that os.path provides.
>

ahh yes, so while posix claims that paths are "just a char*", they are
really bytes where we can assume that the byte with value 2F is the pathsep
(and that 2E separates an extension?), so I suppose os.path is useful. But
I still think that most of us should never deal with bytes paths, and the
few that need to should just work with the low level functions and be done
with it.
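
As a rough illustration of what "work with the components" means for a
bytes path, the sketch below uses posixpath directly (so it behaves the
same on any platform). The sample path is invented and deliberately not
valid UTF-8.

```python
import posixpath

raw = b"/tmp/caf\xe9/report.txt"   # not valid UTF-8, still a usable path

head, tail = posixpath.split(raw)        # splits on byte 0x2F ("/")
stem, ext = posixpath.splitext(tail)     # splits on the last byte 0x2E (".")

print(head, tail, stem, ext)
```

Nothing here needs to know the encoding: the functions only look for
the separator and dot bytes and leave everything else alone.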

One more thought came up just now: there are different levels of
abstractions and representations for paths. We don't want to make Path a
subclass of string, because Path is supposed to be a higher level
abstraction -- good.

Then at the bottom of the stack, we NEED the bytes level path, because that's
what ultimately gets passed to the OS.

The legacy from the single-byte encoding days is that bytes and strings
were the same, so we could let people work with nice human readable
strings, while also working with byte paths in the same way -- but those
days are gone -- py3 makes a clear (and important) distinction between nice
human readable strings and the bytes that represent them.

So: why use strings as the lingua franca of paths? i.e. the basis of the
path protocol. maybe we should support only two path representations:

1) A "proper" path object -- i.e. pathlib.Path or anything else that
supports the path protocol.

2) the bytes that the OS actually needs.

this would mean that the protocol would be to have a __pathbytes__() method
that would return the bytes that should be passed off to the OS.

A posix Path implementation could store that internal bytes representation,
so it could pass it off unchanged if that's all you need to do.

Any current API that takes bytes could be made to easily work.
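
A sketch of what that might look like, purely as an illustration:
__pathbytes__ is the hypothetical method named above (it was never
adopted), and BytesBackedPath and open_via_protocol are invented names.

```python
import os

class BytesBackedPath:
    """Toy path type for the hypothetical __pathbytes__ protocol."""

    def __init__(self, raw):
        # Accept str for convenience, but store the OS-level bytes.
        self._raw = os.fsencode(raw)

    def __pathbytes__(self):
        return self._raw

def open_via_protocol(path, flags=os.O_RDONLY):
    # Hypothetical consumer: plain bytes pass through, anything else
    # must implement __pathbytes__.
    raw = path if isinstance(path, bytes) else path.__pathbytes__()
    return os.open(raw, flags)
```

The stored bytes are handed to os.open unchanged, which is the "pass it
off unchanged" property described above.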

I'm SURE I'm missing something really big here, but it seems like maybe
it's better to get farther from "strings as paths" rather than closer to
it

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Python-Dev] pathlib - current status of discussions

2016-04-12 Thread Ethan Furman

On 04/11/2016 02:58 PM, Ethan Furman wrote:

> Sticking points:
> ---
>
> Do we allow bytes to be returned from os.fspath()?  If yes, then do we
> allow bytes from __fspath__()?


On 04/11/2016 10:28 PM, Stephen J. Turnbull wrote:
> In text applications, "bytes as carcinogen" is an apt metaphor.

On 04/12/2016 08:25 AM, Chris Angelico wrote:
> I would say No and No, on the basis that it's *far* easier to widen
> their scope in 3.7 than to narrow it.

On 04/11/2016 08:45 PM, Nick Coghlan wrote:
> I've come around to the point of view that allowing both str and
> bytes-like objects to pass through unchanged makes sense, with the
> rationale being the one someone mentioned regarding ease-of-use in
> os.path.
[...]

> One possible way to address this concern would be to have the
> underlying protocol be bytes/str (since boundary code frequently needs
> to handle the paths-are-bytes assumption in POSIX), but offer an
> "os.fspathname" API that rejected bytes output from os.fspath.


I think this is the way forward:  offer a standard way to get 
paths-as-strings, with an easily supported way of working with 
paths-as-bytes.


This could be done with an os.fspathname() & os.fspath() pair of functions,
or with a single function that has a parameter specifying what to do 
with bytes objects: reject (default), accept, or (maybe) an encoding to 
use to coerce to bytes.
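
A minimal sketch of the single-function variant, under some loud
assumptions: the parameter names bytes_policy and encoding are
invented, the real os.fspath (Python 3.6+) is used underneath, and the
"encoding" option is read here as decoding bytes to str.

```python
import os

def fspath(path, bytes_policy="reject", encoding=None):
    """Hypothetical single-function variant of the proposal."""
    name = os.fspath(path)                 # real os.fspath, Python 3.6+
    if isinstance(name, str):
        return name
    if encoding is not None:               # assumption: coerce bytes to str
        return name.decode(encoding, "surrogateescape")
    if bytes_policy == "accept":
        return name
    raise TypeError("expected str for pathname, not bytes")
```

With the defaults, callers get the strict "must be str" behaviour and
have to opt in to anything involving bytes.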


--
~Ethan~


Re: [Python-Dev] pathlib - current status of discussions

2016-04-12 Thread Stephen J. Turnbull
Nick Coghlan writes:

 > One possible way to address this concern would be to have the
 > underlying protocol be bytes/str (since boundary code frequently
 > needs to handle the paths-are-bytes assumption in POSIX),

What "needs"?  As has been pointed out several times, with PEP 383 you
can deal with bytes losslessly by using an arbitrary codec and
errors=surrogateescape.  I know why *I* use bytes nevertheless:
because when I must guess the encoding, it just makes more sense to
read bytes and then iterate over codecs until the result looks like
words I know in some language.
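
For readers who have not used it, the lossless round trip that PEP 383
provides looks roughly like this (the file name is an invented
example):

```python
raw = b"caf\xe9.txt"                       # Latin-1 bytes, invalid as UTF-8

text = raw.decode("utf-8", errors="surrogateescape")
assert text == "caf\udce9.txt"             # 0xE9 became lone surrogate U+DCE9

# Encoding with the same codec and handler restores the original bytes.
assert text.encode("utf-8", errors="surrogateescape") == raw
```

On POSIX, os.fsdecode and os.fsencode apply the same error handler with
the filesystem encoding, which is why they can usually stand in for an
explicit decode.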

I don't understand why people who mostly believe "bytes are text, too"
because almost all they ever see are bytes in the range 0x00-0x7f need
bytes.  For them, fsdecode and fsencode DTRT.

If you want to claim "efficiency", I can't gainsay since I don't know
the applications, but if you're trying to manipulate file names
millions of times per second, I have to wonder what you're doing with
them that benefits so much from Path.

 > but offer an "os.fspathname" API that rejected bytes output from
 > os.fspath.

Either it's a YAGNI because I'm not going to get any bytes in the
first place, or it raises where I probably could have done something
useful with bytes if I were expecting them (see "pathological" below).

 > That way folks that wanted the clean "must be str" signature

Er, I don't need no steenkin' "clean signature".  I need str, and if
I can't get it from __fspath__, there's always os.fsdecode.  But this
is serious horse-before cart-putting, punishing those who do things
Python-3-ishly right.

 > The ambiguity in question here is inherent in the differences between
 > the way POSIX and Windows work,

Not with PEP 383, it's not.  And I don't do Windows, so my preference
for str has nothing to do with it mapping to native OS APIs well.

The ambiguity in question here is inherent in the differences between
the ways Python 2 and Python 3 programmers work on POSIX AFAICS.
Certainly, there will be times when fsdecode doesn't DTRT.  So those
times you have to use an explicit bytes.decode.  Note that when you
*do* care enough to do that, it's because the Path is *text* -- you're
going to display it to a human, or pass it out of the module.  If all
you're going to do is access the filesystem object denoted, fsdecode
does a sufficiently accurate job.

So if for some reason you're getting bytes at the boundary, I see no
reason why you can't have a convenience constructor

    import os
    import pathlib

    def pathological(str_or_bytes_or_path_seq):
        args = []
        for s_o_b in str_or_bytes_or_path_seq:
            # decode bytes components; leave str and Path objects alone
            args.append(os.fsdecode(s_o_b) if isinstance(s_o_b, bytes) else s_o_b)
        return pathlib.Path(*args)

for when that's good enough (maybe Antoine would even allow it into
pathlib?)

 > so there are limits to how far we can go in hiding it without
 > making things worse rather than better.

What "hide"?  Nobody is suggesting that the polymorphic os APIs should
go away.  Indeed, they are perfect TOOWTDI, giving the programmer
exactly the flexibility needed *and no more*, *at* the boundary.

The questions on my mind are:

(A) Why does anybody need bytes out of a pathlib.Path (or other
__fspath__-toting, higher-level API) *inside* the boundary?  Note
that the APIs in os (etc) *don't need* bytes because they are
already polymorphic.

(B) If they do, why can't they just apply bytes() to the object?  I
understand that that would offend Ethan's aesthetic sense, so it's
worth looking for a nice way around it.  But allowing __fspath__
to return bytes or str is hideous, because Paths are clearly on
the application side of the boundary.

Note that bytes() may not have the serious problem that str() does of
being too catholic about its argument: nothing in __builtins__ has a
__bytes__!  Of course there are a few things that do work: ints, and
sequences of ints.
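
A short sketch of how bytes() treats its argument; the Blob class is
invented for illustration:

```python
class Blob:
    def __bytes__(self):               # opt in to bytes() explicitly
        return b"/tmp/blob"

print(bytes(Blob()))                   # b'/tmp/blob'
print(bytes(3))                        # b'\x00\x00\x00' (three zero bytes)
print(bytes([47, 116, 109, 112]))      # b'/tmp'

try:
    bytes(object())                    # no __bytes__, not iterable
except TypeError as exc:
    print("rejected:", exc)
```

Unlike str(), which accepts almost anything, bytes() rejects arbitrary
objects, although the int case shows it is not completely free of
surprises.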



Re: [Python-Dev] pathlib - current status of discussions

2016-04-12 Thread Sven R. Kunze

Sorry for disturbing this thread's harmony.


On 12.04.2016 08:00, Ethan Furman wrote:
> On 04/11/2016 10:14 PM, Chris Barker - NOAA Federal wrote:
>
>>> Consider os.path.join:
>>
>> Why in the world do the  os.path functions need to work with Path
>> objects? ( and other conforming objects)
>
> Because library XYZ that takes a path and wants to open it shouldn't
> have to care whether that path is a string or pathlib.Path -- but if
> os.open can't use pathlib.Path then the library has to care (or the
> user has to care).
>
>> This all started with the goal of using Path objects in the stdlib,
>> but that's for opening files, etc.
>
> Etc. as in os.join?  os.stat? os.path.split?
>
>> Path is an alternative to os.path -- you don't need to use both.

I agree with that quote of Chris.

> As a user you don't, no.  As a library that has no control over what
> kind of "path" is passed to you -- well, if os and os.path can accept
> Path objects then you can just use os and os.path; otherwise you have
> to use os and os.path if passed a str or bytes, and pathlib.Path if
> passed a pathlib.Path -- so you do have to use both.

I don't agree here. There's no need to increase the convenience for a
library maintainer when it comes to implicit conversions.

When people want to use your library and it requires a string, they can
simply use "my_path.path" and everything still works for them when they
switch to pathlib.



Best,
Sven


Re: [Python-Dev] pathlib - current status of discussions

2016-04-12 Thread Chris Angelico
On Tue, Apr 12, 2016 at 7:58 AM, Ethan Furman  wrote:
> Sticking points:
> ---
>
> Do we allow bytes to be returned from os.fspath()?  If yes, then do we allow
> bytes from __fspath__()?
>

I would say No and No, on the basis that it's *far* easier to widen
their scope in 3.7 than to narrow it. Once you declare that one or
both of these may return bytes, it becomes an annoying incompatibility
to change that (even if it *is* marked provisional), which almost
certainly means it won't happen. By restricting them both, we force
the issue: if you want bytes, you'll know about it.

I'd also prefer to stick to Unicode path names, for reasons I've
stated in other threads. Undecodable path byte streams can be handled
already, so what are we really gaining by allowing a Path-like object
to emit bytes? If it becomes a major issue for a lot of types, it
wouldn't be hard to add a helper function somewhere (or a mixin class
that provides a ready-to-go __fspath__, which might well be
sufficient).
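
For what it's worth, such a mixin could be as small as the sketch
below. StrPathMixin and ConfigFile are invented names, and the demo
relies on os.fspath as it exists in Python 3.6+.

```python
import os

class StrPathMixin:
    """Hypothetical mixin: expose str(self) as the filesystem path."""

    def __fspath__(self):
        return str(self)

class ConfigFile(StrPathMixin):
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return os.path.join("etc", self.name)

print(os.fspath(ConfigFile("app.ini")))
```

Because the mixin always goes through str(), it can only ever emit
text, which matches the "No and No" position.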

ChrisA


Re: [Python-Dev] pathlib - current status of discussions

2016-04-12 Thread Nick Coghlan
On 12 April 2016 at 15:28, Stephen J. Turnbull  wrote:
> Donald Stufft writes:
>
>  > I think yes and yes [__fspath__ and fspath should be allowed to
>  > handle bytes, otherwise] it seems like making it needlessly harder
>  > to deal with a bytes path
>
> It's not needless.  This kind of polymorphism makes it hard to review
> code locally.  Once bytes get a foothold inside a text application,
> they metastasize altogether too easily, and you end up with TypeErrors
> or UnicodeErrors quite far from the origin.  Debugging often requires
> tracing data flows over hill and over dale while choking from the
> dusty trail, or band-aids like a top-level "except UnicodeError:
> log_and_quarantine(bytes)".  I can't prove that returning bytes from
> these APIs is a big risk in this sense, but I can't see a way to prove
> that it's not, either, given that their point is duck-typing, and
> therefore they may be generalized in the future, and by third parties.
>
> I understand that there are applications where it's bytes all the way
> down, but by the very nature of computing systems, there are systems
> where bytes are decoded to text.  For historical reasons (the encoding
> Tower of Babel), it's very error-prone to do that on demand.  Best
> practice is to do the conversion as close to the boundary as possible,
> and process only text internally.

One possible way to address this concern would be to have the
underlying protocol be bytes/str (since boundary code frequently needs
to handle the paths-are-bytes assumption in POSIX), but offer an
"os.fspathname" API that rejected bytes output from os.fspath. That
is, it would be equivalent to:

    def fspathname(path):
        name = os.fspath(path)
        if not isinstance(name, str):
            raise TypeError("Expected str for pathname, not "
                            "{}".format(type(name)))
        return name

That way folks that wanted the clean "must be str" signature could use
os.fspathname, while those that wanted to accept either could use the
lower level os.fspath.

The ambiguity in question here is inherent in the differences between
the way POSIX and Windows work, so there are limits to how far we can
go in hiding it without making things worse rather than better.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] pathlib - current status of discussions

2016-04-12 Thread Paul Moore
On 12 April 2016 at 06:28, Stephen J. Turnbull  wrote:
> Donald Stufft writes:
>
>  > I think yes and yes [__fspath__ and fspath should be allowed to
>  > handle bytes, otherwise] it seems like making it needlessly harder
>  > to deal with a bytes path
>
> It's not needless.  This kind of polymorphism makes it hard to review
> code locally.  Once bytes get a foothold inside a text application,
> they metastasize altogether too easily, and you end up with TypeErrors
> or UnicodeErrors quite far from the origin.  Debugging often requires
> tracing data flows over hill and over dale while choking from the
> dusty trail, or band-aids like a top-level "except UnicodeError:
> log_and_quarantine(bytes)".  I can't prove that returning bytes from
> these APIs is a big risk in this sense, but I can't see a way to prove
> that it's not, either, given that their point is duck-typing, and
> therefore they may be generalized in the future, and by third parties.
>
> I understand that there are applications where it's bytes all the way
> down, but by the very nature of computing systems, there are systems
> where bytes are decoded to text.  For historical reasons (the encoding
> Tower of Babel), it's very error-prone to do that on demand.  Best
> practice is to do the conversion as close to the boundary as possible,
> and process only text internally.
>
> In text applications, "bytes as carcinogen" is an apt metaphor.
>
> Now, I'm not Dutch, so I can't tell you it's obvious that the risk to
> text-processing applications is more important than the inconvenience
> to byte-shoveling applications.  But there is a need to be
> parsimonious with polymorphism.

As someone who has done a lot of work helping projects to port from
the 2.x bytes/text model to the 3.x model, I have similar concerns
that rooting out the source of bytes objects appearing in a program
could be an issue with the proposed "return either" approach. The most
effective tool I have found in fixing programs with text/bytes issues
is carefully and thoroughly annotating precisely which functions
accept and return bytes, and which accept and return text. The sort of
mixed-mode processing we're talking about here makes that
substantially harder. And note that the signature of os.fspath can
return bytes or text *independent* of the type of the argument - it's
not a "bytes in, bytes out" function like the usual pattern of
"polymorphic support for bytes".

But just like Stephen, I have no feel for how significant the risk
will be in real life. I've never worked on code that actually has a
need for bytestring paths (particularly now that surrogateescape
ensures that most cases "just work").
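
The point about the return type being independent of the argument type
can be seen in a small sketch, assuming os.fspath as it shipped in
Python 3.6+ (TextPath and RawPath are invented classes):

```python
import os

class TextPath:
    def __fspath__(self):
        return "/tmp/data"

class RawPath:
    def __fspath__(self):
        return b"/tmp/data"

# Same kind of argument (an arbitrary object), different result types.
results = [type(os.fspath(p)).__name__ for p in (TextPath(), RawPath())]
print(results)   # ['str', 'bytes']
```

A reader looking only at the call site cannot tell which of the two
types will come back, which is what makes annotation harder.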

Paul


Re: [Python-Dev] pathlib - current status of discussions

2016-04-12 Thread Ethan Furman

On 04/11/2016 10:14 PM, Chris Barker - NOAA Federal wrote:

>> Consider os.path.join:
>
> Why in the world do the  os.path functions need to work with Path
> objects? ( and other conforming objects)

Because library XYZ that takes a path and wants to open it shouldn't
have to care whether that path is a string or pathlib.Path -- but if
os.open can't use pathlib.Path then the library has to care (or the user
has to care).

> This all started with the goal of using Path objects in the stdlib,
> but that's for opening files, etc.

Etc. as in os.join?  os.stat? os.path.split?

> Path is an alternative to os.path -- you don't need to use both.

As a user you don't, no.  As a library that has no control over what
kind of "path" is passed to you -- well, if os and os.path can accept
Path objects then you can just use os and os.path; otherwise you have to
use os and os.path if passed a str or bytes, and pathlib.Path if passed
a pathlib.Path -- so you do have to use both.

>> - the names would be fspath and __fspath__, since the result may be
>> either a path name as text, or an encoded path name as bytes
>
> You just used the phrase "path name as bytes" -- so why is
> __pathname__ inappropriate if it might return bytes?

No, he used the phrase "*encoded* path name as bytes".  Names are
typically represented as text, and since bytes might be returned we
don't want a signal that says text.

> I like __pathname__ better because this entire effort is because we've
> decided it's important to make the distinction between a "path" and
> the text representation of said path.


No, this entire effort is to make pathlib work with the rest of the stdlib.
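
A sketch of the difference this makes for library code. Both function
names are invented; the second one assumes os.fspath as it exists in
Python 3.6+.

```python
import os
import pathlib

def size_without_protocol(path):
    # The library has to know about every path type it might be handed.
    if isinstance(path, pathlib.PurePath):
        path = str(path)
    return os.stat(path).st_size

def size_with_protocol(path):
    # os.fspath (Python 3.6+) accepts str, bytes, and __fspath__ objects.
    return os.stat(os.fspath(path)).st_size
```

The first version only works for the path types it explicitly lists;
the second works for any object that implements the protocol.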

--
~Ethan~


Re: [Python-Dev] pathlib - current status of discussions

2016-04-11 Thread Greg Ewing

Chris Barker - NOAA Federal wrote:
> Why in the world do the  os.path functions need to work with Path
> objects?

So that applications using path objects can pass them
to library code that uses os.path to manipulate them.

> I'm confused about what a bytes path IS -- is it encoded?

It's a sequence of bytes identifying a file. Often it
will be an encoding of some piece of text in the file
system encoding, but there's no guarantee of that.

> Can you assume it can be decoded ?

Only if you use an encoding in which all byte sequences
are valid, such as latin1 or utf8+surrogateescape.

> So the ONLY thing
> you should do with it is pass it along to another low level system
> call.

Not quite -- you can separate it into components and
work with them. Essentially the same set of operations
that os.path provides.

>> - the names would be fspath and __fspath__, since the result may be
>> either a path name as text, or an encoded path name as bytes
>
> I like __pathname__ better because this entire effort is because we've
> decided it's important to make the distinction between a "path" and
> the text representation of said path.

I agree -- the term "pathname" can cover both text and
bytes. When posix talks about pathnames it's really
talking about bytes.

--
Greg


Re: [Python-Dev] pathlib - current status of discussions

2016-04-11 Thread Stephen J. Turnbull
Donald Stufft writes:

 > I think yes and yes [__fspath__ and fspath should be allowed to
 > handle bytes, otherwise] it seems like making it needlessly harder
 > to deal with a bytes path

It's not needless.  This kind of polymorphism makes it hard to review
code locally.  Once bytes get a foothold inside a text application,
they metastasize altogether too easily, and you end up with TypeErrors
or UnicodeErrors quite far from the origin.  Debugging often requires
tracing data flows over hill and over dale while choking from the
dusty trail, or band-aids like a top-level "except UnicodeError:
log_and_quarantine(bytes)".  I can't prove that returning bytes from
these APIs is a big risk in this sense, but I can't see a way to prove
that it's not, either, given that their point is duck-typing, and
therefore they may be generalized in the future, and by third parties.

I understand that there are applications where it's bytes all the way
down, but by the very nature of computing systems, there are systems
where bytes are decoded to text.  For historical reasons (the encoding
Tower of Babel), it's very error-prone to do that on demand.  Best
practice is to do the conversion as close to the boundary as possible,
and process only text internally.

In text applications, "bytes as carcinogen" is an apt metaphor.

Now, I'm not Dutch, so I can't tell you it's obvious that the risk to
text-processing applications is more important than the inconvenience
to byte-shoveling applications.  But there is a need to be
parsimonious with polymorphism.



Re: [Python-Dev] pathlib - current status of discussions

2016-04-11 Thread Chris Barker - NOAA Federal
>  with the
> rationale being the one someone mentioned regarding ease-of-use in
> os.path.
>
> Consider os.path.join:

Why in the world do the  os.path functions need to work with Path
objects? ( and other conforming objects)

This all started with the goal of using Path objects in the stdlib,
but that's for opening files, etc. Path is an alternative to os.path
-- you don't need to use both.

And if you do have a byte path, you can stick with os.path

BTW,

I'm confused about what a bytes path IS -- is it encoded? Can you
assume it can be decoded ? It seems to me that the ONLY time you
should get a byte path is from a low level system call on a posix
system, and you may have no idea how it's encoded. So the ONLY thing
you should do with it is pass it along to another low level system
call.

I can't see why we should support anything else with bytes objects.

> - the names would be fspath and __fspath__, since the result may be
> either a path name as text, or an encoded path name as bytes

You just used the phrase "path name as bytes" -- so why is
__pathname__ inappropriate if it might return bytes?

I like __pathname__ better because this entire effort is because we've
decided it's important to make the distinction between a "path" and
the text representation of said path.

Just sayin'

-CHB


Re: [Python-Dev] pathlib - current status of discussions

2016-04-11 Thread Nick Coghlan
On 12 April 2016 at 13:45, Nick Coghlan  wrote:
> Consider os.path.join: with a permissive os.fspath, the necessary
> update should just be to introduce "map(os.fspath, args)" (or its C
> equivalent), and then continue with the existing bytes vs str handling
> logic.

That does remind me: once a patch is available, we should check the
benchmark numbers with the patch applied. I'd expect the new protocol
overhead to be swamped by the actual IO costs, but this kind of low
level change can have surprising consequences.

Regarding the type checks, PyObject_AsFilesystemPath (or whatever we
call it) will be implemented in C, with os.fspath just calling that,
so doing "PyUnicode_Check(path) || PyBytes_Check(path)" on the result
will be both cheap and convenient for API consumers (since it means
they know they only have to cope with bytes or str instances
internally, and will get a clear error message if handed something
else).
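
In pure Python, the check described above might look roughly like the
following. This is a sketch of the idea only, not the C implementation,
and the error messages are invented.

```python
def fspath(path):
    """Rough pure-Python equivalent of the check described above."""
    if isinstance(path, (str, bytes)):
        return path
    # Look the method up on the type, as special methods normally are.
    method = getattr(type(path), "__fspath__", None)
    if method is None:
        raise TypeError("expected str, bytes or path object, not "
                        + type(path).__name__)
    result = method(path)
    if not isinstance(result, (str, bytes)):
        raise TypeError("__fspath__ returned " + type(result).__name__
                        + ", expected str or bytes")
    return result
```

Callers can then rely on getting exactly str or bytes back, or a
TypeError raised at the point where the bad object was passed in.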

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] pathlib - current status of discussions

2016-04-11 Thread Nick Coghlan
On 12 April 2016 at 07:58, Ethan Furman  wrote:
> Sticking points:
> ---
>
> Do we allow bytes to be returned from os.fspath()?  If yes, then do we allow
> bytes from __fspath__()?

I've come around to the point of view that allowing both str and
bytes-like objects to pass through unchanged makes sense, with the
rationale being the one someone mentioned regarding ease-of-use in
os.path.

Consider os.path.join: with a permissive os.fspath, the necessary
update should just be to introduce "map(os.fspath, args)" (or its C
equivalent), and then continue with the existing bytes vs str handling
logic.
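
A minimal sketch of that update, written as a wrapper rather than a
patch. It assumes os.fspath as it exists in Python 3.6+ and uses
posixpath.join only so the result is the same on every platform.

```python
import os
import posixpath

def join(*args):
    # Unwrap path objects first; str and bytes pass through untouched.
    # The existing join logic then handles (or rejects) str/bytes mixes.
    return posixpath.join(*map(os.fspath, args))
```

For example, join(pathlib.PurePosixPath("/usr"), "lib") gives
"/usr/lib", join(b"/usr", b"lib") gives b"/usr/lib", and mixing str with
bytes still raises TypeError exactly as it does today.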

Functions consuming os.fspath can then decide on a case-by-case basis
how they want to handle binary paths: either use them as is (which
will usually work on mostly-ASCII systems), convert them to text with
os.fsdecode (which will usually work on *nix systems), or disallow
them entirely (which would probably only be appropriate for libraries
that wanted to ensure support for non-ASCII paths on Windows systems).

That then cascades into the other open questions mentioned:

- permitted return types for both fspath and __fspath__ would be (str, bytes)
- the names would be fspath and __fspath__, since the result may be
either a path name as text, or an encoded path name as bytes

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] pathlib - current status of discussions

2016-04-11 Thread Donald Stufft

> On Apr 11, 2016, at 5:58 PM, Ethan Furman  wrote:
> 
> name:
> 
> 
> We are down to two choices:
> 
> - __fspath__, or
> - __fspathname__
> 
> The final choice I suspect will be affected by the choice to allow (or not) 
> bytes.


+1 on __fspath__, -0 on __fspathname__

> 
> 
> 
> add a Path ABC:
> --
> 
> undecided


I think it makes sense to add it, but maybe only in 3.6? Path accepting code 
could be updated to do something like `isinstance(obj, (bytes, str, PathMeta))` 
which seems like a net win to me.
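
As a sketch of that kind of check: the snippet below uses os.PathLike,
the ABC that Python 3.6 ended up providing, in place of the PathMeta
name used above. The is_path_argument helper and the Custom class are
invented for illustration.

```python
import os
import pathlib

def is_path_argument(obj):
    # os.PathLike recognises any class that defines __fspath__.
    return isinstance(obj, (str, bytes, os.PathLike))

class Custom:
    def __fspath__(self):
        return "custom.txt"

print(is_path_argument(pathlib.Path("a")), is_path_argument(Custom()),
      is_path_argument(3))
```

No explicit registration is needed for Custom: defining __fspath__ is
enough for the isinstance check to succeed.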

> 
> 
> Sticking points:
> ---
> 
> Do we allow bytes to be returned from os.fspath()?  If yes, then do we allow 
> bytes from __fspath__()?

I think yes and yes, it seems like making it needlessly harder to deal with a 
bytes path in the scenarios that you’re actually dealing with them is the kind 
of change that 3.0 made that ended up getting rolled back where it could.

-
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA





[Python-Dev] pathlib - current status of discussions

2016-04-11 Thread Ethan Furman

name:


We are down to two choices:

- __fspath__, or
- __fspathname__

The final choice I suspect will be affected by the choice to allow (or 
not) bytes.



method or attribute:
---

method


built-in:


Almost - we'll put it in the os module


add to str:
--

No, not all strings are paths.


add to C API:


Yes.  Possible names include PyUnicode_FromFSPath and PyObject_Path -- 
again, the choice of bytes inclusion will affect the final choice of name.



add a Path ABC:
--

undecided


Sticking points:
---

Do we allow bytes to be returned from os.fspath()?  If yes, then do we 
allow bytes from __fspath__()?


--
~Ethan~