Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-21 Thread Nick Coghlan
On 20 April 2016 at 13:16, Stephen J. Turnbull  wrote:

> It's people who live in monolingual mono-encoding environments who
> will be using bytes successfully, and be resistant to costly changes
> that don't make their lives better.  But the bytes vs. text cost is
> inherent in using pathlib, so polymorphism doesn't help promote
> pathlib.  It might help promote use of os.scandir in bytes-oriented
> code, though I don't see that as a huge effect nor more than mildly
> desirable.  Is it?
>

Some of us are also interested in optimised network service development use
cases where UTF-8 already rules the world [1]. It's a vastly different
domain from desktop computing, and different even from traditional stateful
servers where the same instance may be kept running for years.

When "absolutely everything is UTF-8, and your system boundaries are
policed accordingly" is a valid assumption, then writing bytes level
network code is a far more viable option than when you're writing software
to give to other people to run in arbitrary environments (that's how Go is
able to get away with its "all system boundaries use UTF-8" approach - if
you're not prepared to meet that precondition, you don't choose to use Go
in the first place).

I think this is also why we're talking past each other - as a default, I
completely agree it makes sense to present a "str-only" API (that's where
my proposed fspath/_raw_fspath split came from). However, there really are
contexts where "our text is always stored as bytes, those bytes are always
UTF-8 encoded, and our software only needs to work on *nix systems" is a
reasonable approach, and those are the domains where being *able* to stay
entirely in the binary domain is actually a desirable characteristic,
rather than merely a tool for migrating from Python 2.

Cheers,
Nick.

[1] http://utf8everywhere.org/

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-21 Thread Nick Coghlan
On 20 April 2016 at 13:16, Stephen J. Turnbull  wrote:

> What's left is DirEntry (and perhaps other producers of byte-oriented
> objects in os and os.path).  If they're currently using DirEntry,
> they're currently accessing .path.  Surely bytes users can continue
> doing that, even if we offer str users the advantage of new protocols?
>

The consuming functions aren't currently allowing DirEntry objects either
(since scandir is even newer than pathlib), so we want to allow both
pathlib and DirEntry objects with a single change to consuming functions.

I'd like to see that change in consuming functions be as simple as
possible: an unconditional "path = os._raw_fspath(path)" at the start of
their existing input processing.

Those consuming functions fall into one of three categories:

1. They're bytes/str polymorphic
2. They're bytes only
3. They're str only

Whichever category they're in, their existing argument processing will be
readily able to cope with a polymorphic result from os._raw_fspath, since
that's no different from today, where the argument passed in may be bytes
or str and they need to handle that appropriately.

Having os.fspath(path) as a specifically str-only layer then gives
consuming functions in category 3 an alternative option, and encourages
category 3 functions and APIs (like pathlib) as the future default, without
getting in the way of the folks that need to mess about down in the low
level weeds of operating system interfaces.
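
As a minimal sketch of the two layers described above (not an existing API: the names `raw_fspath`, `fspath` and `DemoPath` are hypothetical stand-ins for the proposed os._raw_fspath and os.fspath, and the eventual implementation may well differ):

```python
def raw_fspath(path):
    """Polymorphic layer: return str or bytes, whichever the object holds."""
    if isinstance(path, (str, bytes)):
        return path
    try:
        hook = type(path).__fspath__
    except AttributeError:
        raise TypeError(
            "expected str, bytes or an object with __fspath__, not "
            + type(path).__name__
        ) from None
    result = hook(path)
    if not isinstance(result, (str, bytes)):
        raise TypeError("__fspath__ must return str or bytes")
    return result


def fspath(path):
    """Str-only layer: reject bytes instead of guessing an encoding."""
    result = raw_fspath(path)
    if not isinstance(result, str):
        raise TypeError("expected a str path, got " + type(result).__name__)
    return result


class DemoPath:
    """Hypothetical path object used only for this illustration."""

    def __init__(self, value):
        self._value = value

    def __fspath__(self):
        return self._value
```

A category 1 or 2 consumer would start with `path = raw_fspath(path)` and carry on with its existing str/bytes handling; a category 3 consumer would call `fspath(path)` and get a TypeError as soon as bytes show up.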

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-20 Thread Koos Zevenhoven
On Wed, Apr 20, 2016 at 6:16 AM, Stephen J. Turnbull  wrote:
>
> (1) some really attractive producer of pathlib.Paths will be
> published, and
>

Yes, pathlib is str-only, so this sounds just right.

> (2) people will want to plug that producer into their bytes paths
> consumers using os.fspath(path) "and be done with it".
>

No, fspath can't know that this is the right thing to do.  There should
be *someone* that is aware of the encoding that happens, either the
provider or the consumer. That byte path consumer, assuming it wants
to support the behavior you describe, should use os.fsencode instead
of os.fspath, which will do exactly what you want, and is just as easy
for the bytes path consumer to implement!

(Unless you want to explicitly reject plain str objects, which you
would then indeed do *explicitly*, but I'm not sure there is a point
in accepting plain bytes and str-based pathlib objects but not str).
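
A minimal sketch of such a bytes path consumer, assuming Python 3.6 or later (where os.fsencode also accepts path-like objects such as pathlib paths). The function name `bytes_consumer` is made up for the illustration:

```python
import os
import pathlib


def bytes_consumer(path):
    """Hypothetical bytes-only consumer: normalise the input to bytes first."""
    # str is encoded with the filesystem encoding; bytes pass through as-is.
    raw = os.fsencode(path)
    # From here on the code works purely in the bytes domain.
    return os.path.basename(raw)


from_str = bytes_consumer("/usr/lib/python")
from_bytes = bytes_consumer(b"/usr/lib/python")
from_pathlib = bytes_consumer(pathlib.PurePosixPath("/usr/lib/python"))
```

The encoding step is explicit and lives in the consumer, which is the party that knows it wants bytes.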

To avoid further unnecessary discussion, please read [1] carefully,
where I already explained this, among other things.

-Koos

[1] https://mail.python.org/pipermail/python-dev/2016-April/144239.html


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-20 Thread Koos Zevenhoven
On Wed, Apr 20, 2016 at 6:11 AM, Stephen J. Turnbull  wrote:
> Koos Zevenhoven writes:
>  > On Tue, Apr 19, 2016 at 2:55 PM, Stephen J. Turnbull wrote:
>  > >
>  > > AFAICS bytes return from __fspath__ is just YAGNI.  Show me something
>  > > that actually wants it.
>  >
>  > It might be,
>
> May I take that as meaning you just jumped to the conclusion that
> extending polymorphism is useful on no actual evidence of usefulness?

No you may not! YAGNI almost never means "you are *never* going to
need it". And if you implement a feature, better implement it well. If
a variation of the feature is rarely used, that is perfectly fine. I
think leaving bytes out would complicate things. If os.fspath does its
job well, everyone should be happy.

I kept bringing up bytes paths, because that is already a feature in
Python 3. Then (already some time ago in these discussions) I briefly
visited the thought of 'can we deprecate bytes paths', and it then
quickly became clear to me that is not going to happen any time soon.

In other words: As long as bytes paths are supported, they should be
supported consistently. I don't want DirEntry to behave differently
when the underlying type is bytes, which is one of the things I've
been talking about all the time. That would just be broken. And as you
also understand, one point is to allow passing DirEntry to open. Or
any of the os.path functions.

And some more: I don't want open(direntry_obj) to ever raise because it
is the bytes flavor of direntry, because, when they are created,
DirEntry objects always point to existing objects on the file system.
I also don't want implicit conversions between str and bytes paths,
because there are cases where they will produce strange results and
exceptions. [Yes, way back in the p-string thread, I did first suggest
a similar thing that implied implicit conversion, but I soon
abandoned that part.]
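
A minimal sketch of the behaviour argued for here, assuming a Python in which DirEntry implements `__fspath__` and open() accepts such objects (this is the case from Python 3.6 on). The helper name `read_first_entry` is made up for the illustration:

```python
import os
import tempfile


def read_first_entry(directory):
    """Open the first DirEntry in `directory` without touching entry.path."""
    with os.scandir(directory) as entries:
        entry = next(entries)
        with open(entry, "rb") as f:  # the DirEntry itself is passed to open()
            return type(os.fspath(entry)), f.read()


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "demo.txt"), "wb") as f:
        f.write(b"hello")
    str_flavor = read_first_entry(tmp)                 # str in, str out
    bytes_flavor = read_first_entry(os.fsencode(tmp))  # bytes in, bytes out
```

Neither flavor raises, and no conversion between str and bytes happens along the way: the type handed to os.scandir is the type that comes back out.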

Not that I will ever use these features---just to do this right.

>  > but as long as bytes paths are supported polymorphically all over the
>  > stdlib, we won't get rid of supporting bytes paths. So are you
>  > proposing to deprecate bytes paths?
>
> You claim "almost always want str", Ethan claims "bias against bytes."
> Sorry, guys, you can't have it both ways.  Either bytes paths are
> discouraged (not "deprecated", not yet), or they aren't.
>
> I say, let's not encourage them.

It's all essentially the same thing:

"almost always want str":
Yes, I still claim this. This is the reason for str (and rejecting
bytes) being the default for third-party code. If we wanted to, we
could even leave bytes support out of the documentation, so no-one
will know about it unless they already deal with bytes paths. However,
I don't think we should do that---we should just strongly discourage
using the bytes version unless there is a reason to, and you know what
you are doing.

"bias against bytes":
I agree with this too. This is in line with making str (and rejecting
bytes) the default for third-party code.

"let's not encourage them":
And I even agree with this, as you may have noticed.

I just don't believe in deliberately making implementations awkward
for the bytes-based paths. Bytes paths already exist, not because of
Python 2 (as you know), but because not all operating systems
guarantee that paths make sense in any encoding, and people may need
to work at that level.

There is no need to make working with bytes-based paths awkward, and
we can support them with little additional work compared to supporting
str-based rich path objects. The additional work is mostly this
discussion.

> Ie, keep the status quo for bytes,
> and make things better for the preferred str.  Yes, that means
> discouraging bytes relative to str in this context.  That's a Python 3
> principle, one strong enough to justify the huge compatibility break
> involved in making str be Unicode.  That compatibility break has been
> extremely successful in my personal experience as a sometime Python
> teacher and Mailman developer, though the Mercurial developers have a
> different POV.

Yes. Luckily, people are already using str-based paths. We don't need
any more discrete transitions. If Linux starts to enforce an
encoding, as Guido and Random832 may be suggesting on python-ideas,
these already obscure bytes paths will slowly fade away.

-Koos


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-20 Thread Stephen J. Turnbull
Eric Snow writes:
 > On Tue, Apr 19, 2016 at 10:50 AM, Brett Cannon  wrote:
 > > Ah, but you see that doesn't make porting easy.

 > Perhaps I missed previous discussion on the point, but why not support
 > both __fspath__() -> str and __fssyspath__() -> bytes?

That's fine by me, I can live with that although I don't really like
it.  But the proponents of polymorphic __fspath__ think it's
unnecessary.

Why I don't like it: what's going to end up happening is that a
__fspath__- or __fssyspath__-bearing object of unknown provenance is
going to get passed to polymorphic os functions that won't complain,
and a few million cycles later something is going to access
fileobj.path expecting bytes and getting str, and blooey!

Also I just don't see a need for bytes when the original purpose of
this was to support passing pathlib.Path objects to open.  It's also
nice to pass DirEntry objects to open, but it's not obvious to me that
we need to support bytes since only new code can use this feature, and
there's a way to not-support them that doesn't cause any new problems.

It's not that I want bytes to go away[1], it's just that the playing
field will tilt a little more against them in new code.

Footnotes: 
[1]  I wouldn't weep, but I wouldn't laugh, either.



Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-19 Thread Chris Angelico
On Wed, Apr 20, 2016 at 1:16 PM, Stephen J. Turnbull  wrote:
> Brett Cannon writes:
>
>  > Now if you can convince me that the use of bytes paths is very
>  > minimal
>
> I doubt that I can do that, because all that Python 2 code is
> effectively bytes.  To the extent that people are just passing it into
> their bytes-domain code and it works for them, they probably "port" to
> Python 3 by using bytes for paths.  I just don't think bytes usage per
> se matters to the issue of polymorphism of __fspath__.
>

I would prefer to see this kind of code ported to Python 3 by using
native strings.

Python 2 code:

import json
with open(".config/obs-studio/basic/scenes/Standard.json") as f:
    data = json.load(f)
for scene in data["scene_order"]:
    print scene["name"]

Python 3 code:

import json
with open(".config/obs-studio/basic/scenes/Standard.json") as f:
    data = json.load(f)
for scene in data["scene_order"]:
    print(scene["name"])

The bulk of path string literals in Python programs will be all-ASCII.
Porting to Py3 won't fundamentally change this code, yet suddenly now
it's using Unicode strings. In reality, both versions of this example
are using *text* strings. The Py3 version has text in the source code,
a stream of Unicode codepoints in the runtime, and then (since I ran
this on Linux) encodes that to bytes for the file system. The Py2
version just does that conversion a little earlier: text in the source
code, a stream of eight-bit "texty bytes" in the runtime, and those
same bytes get given to the fs.

There's no reason to slap a b"..." prefix on every path for Py3. There
might be specific situations where you want that, but for the most
part, those paths came from human-readable text anyway, so they should
stay that way.
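
A small sketch of the conversion described above, assuming an ASCII-compatible filesystem encoding (the variable names are illustrative only):

```python
import os

# Text in the source code, held as Unicode code points at runtime.
path_text = ".config/obs-studio/basic/scenes/Standard.json"

# The bytes the file system is given; for an all-ASCII path these are
# the same bytes a Python 2 str literal would have held.
path_bytes = os.fsencode(path_text)

# Decoding with the filesystem encoding gets the original text back.
round_trip = os.fsdecode(path_bytes)
```

For all-ASCII literals the two representations carry the same information, so the b"..." prefix adds nothing.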

ChrisA


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-19 Thread Stephen J. Turnbull
Nick Coghlan writes:

 > The gist of the motivation for bytes/str polymorphism here is similar to
 > that for restoring __mod__ polymorphism in
 > https://www.python.org/dev/peps/pep-0461/:

I don't think it is, actually.  Filenames off the wire cannot be
relied on to be in the local file system encoding, and that matters.
The semantics of a filename or path requires getting the encodings
matched.  You cannot be encoding-agnostic.

On the other hand, streams of characters are merely a special case of
streams of tokens, and the principles that apply to editing streams of
characters apply to more general tokens, including bytes and XML.  You
*can* be content-agnostic as long as you define semantics in terms of
moving tokens around, and not in terms of their content.

BTW, my opposition to PEP 461 was based on the same mistake with
opposite polarity: I think of bytes as encoded text *first*, and
therefore feared PEP 461 for quite insufficient reason.  Most
applications of PEP 461 won't be for text.

 > This is also why I ended up proposing pushing the complexity down into a
 > documented-but-underscore-prefixed API: folks writing pure Python 3
 > application code *really* shouldn't need to worry about the bytes
 > support

You can't have that with your proposal.  They are going to (at least
in theory) get a new TypeError which they will not be expecting (vs
bytes, which are implicit in the object they have, where previously
they would have got one vs. Path or DirEntry which they were
expecting).  So they will have to learn that much about bytes support.

 > in the protocol, but for operating system level use cases, not having it
 > readily available to 2/3 compatible Python code would be a pain.

Erm, how do you propose to make this protocol available to Python-2-
compatible code?  Pervasively monkey-patch the Python 2 os module?
Even if so, is it our responsibility to worry about that?

BTW, I came to this conclusion thinking about the poster boy for PEP
461, Mercurial.



Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-19 Thread Stephen J. Turnbull
Brett Cannon writes:

 > Now if you can convince me that the use of bytes paths is very
 > minimal

I doubt that I can do that, because all that Python 2 code is
effectively bytes.  To the extent that people are just passing it into
their bytes-domain code and it works for them, they probably "port" to
Python 3 by using bytes for paths.  I just don't think bytes usage per
se matters to the issue of polymorphism of __fspath__.

 > Ah, but you see that doesn't make porting easy. If I have a bunch
 > of path-manipulating code using os.path already and I want to add
 > support for pathlib I can either (a) rewrite all of that
 > path-manipulating code to work using pathlib, or (b) simply call
 > `path = os.fspath(path)` and be done with it.

OK, so what matters here is not "how many people are using bytes".
They can keep using os.path, which is what they probably have already
been using.  What we are worrying about is that

(1) some really attractive producer of pathlib.Paths will be
published, and

(2) people will want to plug that producer into their bytes paths
consumers using os.fspath(path) "and be done with it".

Excuse me, but that doesn't make sense as written.  Path.__fspath__
will return str, in any case.  So these developers have to consume
text to use pathlib, even merely as a consumer of Paths.  No need for
polymorphism here, simply because it won't be used in this instance.

What's left is DirEntry (and perhaps other producers of byte-oriented
objects in os and os.path).  If they're currently using DirEntry,
they're currently accessing .path.  Surely bytes users can continue
doing that, even if we offer str users the advantage of new protocols?

I conclude that there is no real use in having a polymorphic
__fspath__ unless callers of os.fspath can communicate desired return
type to it, and it implicitly coerces to that type.  But then open and
friends *implicitly* consume __fspath__.  So there probably needs to
be a way to communicate the desired type to them in the case where
they receive an __fspath__-bearing object so they can tell os.fspath
what their callers want, no?

Supporting both "pipeline polymorphism" of this kind and implicit
conversion protocols at the same time is quite complicated, I think.
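
To make the shape of that concrete, here is a rough sketch of a helper through which the caller communicates the desired return type and the result is implicitly coerced. Nothing like this is proposed as-is: `fspath_as`, its `want` parameter and `DemoEntry` are hypothetical names, and coercing via os.fsencode/os.fsdecode is just one possible choice.

```python
import os


def fspath_as(path, want=str):
    """Hypothetical: resolve __fspath__, then coerce to the requested type."""
    if want not in (str, bytes):
        raise ValueError("want must be str or bytes")
    if isinstance(path, (str, bytes)):
        raw = path
    else:
        raw = type(path).__fspath__(path)
    if isinstance(raw, want):
        return raw
    # Implicit coercion using the filesystem encoding.
    return os.fsdecode(raw) if want is str else os.fsencode(raw)


class DemoEntry:
    """Hypothetical bytes-backed object, in the spirit of a bytes DirEntry."""

    def __fspath__(self):
        return b"spam.txt"
```

Every function that consumes `__fspath__` implicitly (open and friends) would need some way to pass `want` through from its own caller, which is where the complexity comes from.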

 > [Folks] have convinced me that people do some really crazy stuff
 > with their file systems and that it isn't isolated to just one or
 > two people.  And so it becomes this situation where we need to ask
 > ourselves if we are going to tell them to just deal with it or help
 > them transition.

People who have to deal with really crazy stuff in filesystems are
already manipulating paths as text.  It's not we who need help with
the transition that matters (bytes to text).  We can use os.path or
pathlib, but bytes just don't matter because we're not using them in
path manipulations.

It's people who live in monolingual mono-encoding environments who
will be using bytes successfully, and be resistant to costly changes
that don't make their lives better.  But the bytes vs. text cost is
inherent in using pathlib, so polymorphism doesn't help promote
pathlib.  It might help promote use of os.scandir in bytes-oriented
code, though I don't see that as a huge effect nor more than mildly
desirable.  Is it?

Steve


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-19 Thread Stephen J. Turnbull
Koos Zevenhoven writes:
 > On Tue, Apr 19, 2016 at 2:55 PM, Stephen J. Turnbull wrote:
 > >
 > > AFAICS bytes return from __fspath__ is just YAGNI.  Show me something
 > > that actually wants it.
 > 
 > It might be,

May I take that as meaning you just jumped to the conclusion that
extending polymorphism is useful on no actual evidence of usefulness?

 > but as long as bytes paths are supported polymorphically all over the
 > stdlib, we won't get rid of supporting bytes paths. So are you
 > proposing to deprecate bytes paths?

You claim "almost always want str", Ethan claims "bias against bytes."
Sorry, guys, you can't have it both ways.  Either bytes paths are
discouraged (not "deprecated", not yet), or they aren't.

I say, let's not encourage them.  Ie, keep the status quo for bytes,
and make things better for the preferred str.  Yes, that means
discouraging bytes relative to str in this context.  That's a Python 3
principle, one strong enough to justify the huge compatibility break
involved in making str be Unicode.  That compatibility break has been
extremely successful in my personal experience as a sometime Python
teacher and Mailman developer, though the Mercurial developers have a
different POV.



Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-19 Thread Brett Cannon
On Tue, 19 Apr 2016 at 15:22 Eric Snow  wrote:

> On Tue, Apr 19, 2016 at 10:50 AM, Brett Cannon  wrote:
> > Ah, but you see that doesn't make porting easy. If I have a bunch of
> > path-manipulating code using os.path already and I want to add support for
> > pathlib I can either (a) rewrite all of that path-manipulating code to work
> > using pathlib, or (b) simply call `path = os.fspath(path)` and be done with
> > it. Basically if you have written any code that uses os.path then you will
> > have to care about (a) or (b) as a way to add support for pathlib short of
> > the `str(path)` hack we're all working to get away from. And if people truly
> > liked option (a) then this conversation wouldn't be such a big deal as we
> > would have seen more people using pathlib already (yes, the provisional tag
> > may have scared some off, but my guess is it's more from not wanting to
> > rewrite os.path-using code).
> >
> > Now if you can convince me that the use of bytes paths is very minimal and
> > thus people doing path manipulations with them will be a very small minority
> > then I'm happy to try and use this to keep pushing people towards avoiding
> > bytes for file paths. But over the years people such as yourself, Stephen,
> > have convinced me that people do some really crazy stuff with their file
> > systems and that it isn't isolated to just one or two people. And so it
> > becomes this situation where we need to ask ourselves if we are going to
> > tell them to just deal with it or help them transition.
> >
> > The other way to convince me is that people needing to support older
> > versions of Python will use `path = path.__fspath__() if hasattr(path,
> > '__fspath__') else path` and that allowing bytes with that idiom is going to
> > cost them dearly. My current assumption is that it won't because people
> > using that idiom are using os.path and those functions will complain when
> > mixing str and bytes together, but I'm open to being convinced otherwise.
> >
> > I guess what I'm trying to get at is that I understand the desire to get
> > people to get the bytes path habit, but to me the best way will be to get
> > people quickly and easily transitioned over to pathlib as a carrot rather
> > than using the lack of bytes path support in this transition as a stick.
>
> Perhaps I missed previous discussion on the point, but why not support
> both __fspath__() -> str and __fssyspath__() -> bytes?  Returning
> NotImplemented would indicate "try the other one".  For example,
> DirEntry.__fspath__() would return NotImplemented when the underlying
> value is bytes and vice-versa.
>

It was deemed more complexity than necessary for the protocol to have two
functions. Either __fspath__ will be polymorphic or it will only return str.


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-19 Thread Eric Snow
On Tue, Apr 19, 2016 at 10:50 AM, Brett Cannon  wrote:
> Ah, but you see that doesn't make porting easy. If I have a bunch of
> path-manipulating code using os.path already and I want to add support for
> pathlib I can either (a) rewrite all of that path-manipulating code to work
> using pathlib, or (b) simply call `path = os.fspath(path)` and be done with
> it. Basically if you have written any code that uses os.path then you will
> have to care about (a) or (b) as a way to add support for pathlib short of
> the `str(path)` hack we're all working to get away from. And if people truly
> liked option (a) then this conversation wouldn't be such a big deal as we
> would have seen more people using pathlib already (yes, the provisional tag
> may have scared some off, but my guess is it's more from not wanting to
> rewrite os.path-using code).
>
> Now if you can convince me that the use of bytes paths is very minimal and
> thus people doing path manipulations with them will be a very small minority
> then I'm happy to try and use this to keep pushing people towards avoiding
> bytes for file paths. But over the years people such as yourself, Stephen,
> have convinced me that people do some really crazy stuff with their file
> systems and that it isn't isolated to just one or two people. And so it
> becomes this situation where we need to ask ourselves if we are going to
> tell them to just deal with it or help them transition.
>
> The other way to convince me is that people needing to support older
> versions of Python will use `path = path.__fspath__() if hasattr(path,
> '__fspath__') else path` and that allowing bytes with that idiom is going to
> cost them dearly. My current assumption is that it won't because people
> using that idiom are using os.path and those functions will complain when
> mixing str and bytes together, but I'm open to being convinced otherwise.
>
> I guess what I'm trying to get at is that I understand the desire to get
> people to get the bytes path habit, but to me the best way will be to get
> people quickly and easily transitioned over to pathlib as a carrot rather
> than using the lack of bytes path support in this transition as a stick.

Perhaps I missed previous discussion on the point, but why not support
both __fspath__() -> str and __fssyspath__() -> bytes?  Returning
NotImplemented would indicate "try the other one".  For example,
DirEntry.__fspath__() would return NotImplemented when the underlying
value is bytes and vice-versa.

A str-specific os.fspath would look something like this:

def fspath(path):
    try:
        fspath = type(path).__fspath__
    except AttributeError:
        pass
    else:
        rendered = fspath(path)
        if rendered is not NotImplemented:
            return rendered
    raise TypeError

...and a more lenient, polymorphic version (for use by os.path.*,
etc.) would look like this:

def _fspath(path):
    try:
        fspath = type(path).__fspath__
    except AttributeError:
        pass
    else:
        rendered = fspath(path)
        if rendered is not NotImplemented:
            return rendered

    try:
        fspath = type(path).__fssyspath__
    except AttributeError:
        pass
    else:
        rendered = fspath(path)
        if rendered is not NotImplemented:
            return rendered

    # nothing to do
    return path

The hard distinction between the two dunder methods preserves the
conceptual str/bytes division we're aiming for.  It will be much
easier to identify which path implementations are dealing with (or
supporting) bytes paths.  Likewise with the two helpers and their
usage.

-eric


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-19 Thread Brett Cannon
On Tue, 19 Apr 2016 at 04:46 Stephen J. Turnbull  wrote:

> Brett Cannon writes:
>  > On Mon, 18 Apr 2016 at 12:26 Stephen J. Turnbull wrote:
>
>  > Well, it makes *your* head hurt;
>
> It doesn't, because I have a different (and IMHO better) model.  I can
> interpret yours without pain by comparing to that.
>
>  > By providing os.fspath() I can say that I do not, under any
>  > circumstances, want someone to guess at the encoding some bytes
>  > path is under to get me a string and instead I want to start and
>  > end entirely in a world of strings. IOW os.fspath() lets me work in
>  > such a way that the instant bytes are introduced into my code for
>  > file paths it triggers a TypeError.
>
> Does it really help you work that way?  open is polymorphic, and will
> use os._raw_fspath(obj, (bytes,str)).  Ditto os.scandir etc.  If they
> don't, there's no point in supporting bytes returns from __fspath__,
> is there?


You're leaving out all of the os.path functions, but you're right that if
they didn't support it like Windows then this entire discussion of bytes
paths would be moot.


>   Application code will normally not be calling os.fspath.
> In the future, pathlib will, I suppose, but even without os.fspath
> pathlib already protects you, as does antipathy.[1]
>

I disagree that application code won't be calling os.fspath.


>
> More effective, then, is just to use pathlib for your Path-hacking
> work as soon as the path-representing object appears, and Path will
> complain about bytes for you.  This is an analogue of the "decode
> bytes at the boundary" principle.
>

Ah, but you see that doesn't make porting easy. If I have a bunch of
path-manipulating code using os.path already and I want to add support for
pathlib I can either (a) rewrite all of that path-manipulating code to work
using pathlib, or (b) simply call `path = os.fspath(path)` and be done with
it. Basically if you have written any code that uses os.path then you will
have to care about (a) or (b) as a way to add support for pathlib short of
the `str(path)` hack we're all working to get away from. And if people
truly liked option (a) then this conversation wouldn't be such a big deal
as we would have seen more people using pathlib already (yes, the
provisional tag may have scared some off, but my guess is it's more from
not wanting to rewrite os.path-using code).

Now if you can convince me that the use of bytes paths is very minimal and
thus people doing path manipulations with them will be a very small
minority then I'm happy to try and use this to keep pushing people towards
avoiding bytes for file paths. But over the years people such as yourself,
Stephen, have convinced me that people do some really crazy stuff with
their file systems and that it isn't isolated to just one or two people.
And so it becomes this situation where we need to ask ourselves if we are
going to tell them to just deal with it or help them transition.

The other way to convince me is that people needing to support older
versions of Python will use `path = path.__fspath__() if hasattr(path,
'__fspath__') else path` and that allowing bytes with that idiom is going
to cost them dearly. My current assumption is that it won't because people
using that idiom are using os.path and those functions will complain when
mixing str and bytes together, but I'm open to being convinced otherwise.
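
As a rough sketch of that assumption (the names `compat_fspath`, `BytesBacked` and `join_under` are made up for the illustration):

```python
import os.path


def compat_fspath(path):
    """The hasattr idiom quoted above, wrapped in a function."""
    return path.__fspath__() if hasattr(path, "__fspath__") else path


class BytesBacked:
    """Hypothetical object whose __fspath__ returns bytes."""

    def __fspath__(self):
        return b"data.bin"


def join_under(base, path):
    """Join `path` under a str `base`; os.path.join rejects a str/bytes mix."""
    return os.path.join(base, compat_fspath(path))
```

Calling `join_under("base", BytesBacked())` raises TypeError from os.path.join, so a bytes result from the idiom is caught at the first path manipulation rather than propagating silently.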

I guess what I'm trying to get at is that I understand the desire to get
people to drop the bytes path habit, but to me the best way will be to get
people quickly and easily transitioned over to pathlib as a carrot rather
than using the lack of bytes path support in this transition as a stick.

-Brett



>
>  > Yep, we are stuck with the names unless you want to propose a new
>  > name and deprecate the old one.
>
> I already proposed fs_ensure_bytes and fs_ensure_str.  I think they're
> sufficiently ugly to prove my point.
>
>
> Footnotes:
> [1]  Strictly speaking, antipathy protects you from inadvertent mixing
> of bytes and str.
>
>
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-19 Thread Nick Coghlan
On 19 April 2016 at 21:55, Stephen J. Turnbull  wrote:

> I really want to know.  I'm not 100% sure that's the right way to go,
> mostly because Nick and Brett are signed up for polymorphism.  But I
> sure haven't seen any explicit arguments for polymorphism, though I've
> asked for them.  AFAICS, everybody just assumed that because some
> related APIs are polymorphic, this one should be, too, and dove into
> the problem of how to make a polymorphic API safe for Python 3.
>

In my case, it's ~5 years of peripheral involvement in porting the Fedora
ecosystem to Python 3. I haven't personally done that much of the actual
porting work, but I've spent plenty of time talking to the folks that are,
and tweaking various things to make their lives easier where I could make
the case that there was either a benefit to Python 3, or at least no harm
to it.

The gist of the motivation for bytes/str polymorphism here is similar to
that for restoring __mod__ polymorphism in
https://www.python.org/dev/peps/pep-0461/: the bytes/str duality is as much
a fact of life when dealing with OS interfaces as it is when dealing with
wire protocols, so if __fspath__ is polymorphic, then it's easier for
compatibility modules like six and future to define their own "fspath"
helper functions that work on both Python 2 and Python 3 across all
supported platforms.

This is also why I ended up proposing pushing the complexity down into a
documented-but-underscore-prefixed API: folks writing pure Python 3
application code *really* shouldn't need to worry about the bytes support
in the protocol, but for operating system level use cases, not having it
readily available to 2/3 compatible Python code would be a pain.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-19 Thread Koos Zevenhoven
On Tue, Apr 19, 2016 at 2:55 PM, Stephen J. Turnbull  wrote:
>
> AFAICS bytes return from __fspath__ is just YAGNI.  Show me something
> that actually wants it.

It might be, but as long as bytes paths are supported polymorphically
all over the stdlib, we won't get rid of supporting bytes paths. So
are you proposing to deprecate bytes paths?

-Koos


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-19 Thread Stephen J. Turnbull
Ethan Furman writes:
 > On 04/18/2016 12:25 PM, Stephen J. Turnbull wrote:
 > > Koos Zevenhoven writes:
 > 
 > >> After all, we want something that's *almost* exclusively str.
 > >
 > > But we don't want that, AFAICT.  Some clearly want this API to be
 > > unbiased against bytes in the same way the os APIs are unbiased,
 > > because that's what we've got in the current proposal.
 > 
 > Are we reading the same thread?  For my last several replies I am
 > very biased against bytes (and I know I'm not the only one).

I'm not "reinterpreting" what people *write*, I'm looking at *the APIs
they propose and advocate*.  As I wrote, and you quoted.

Except for the original proposal that only supported pathlib.Path, the
facilities advocated are actually unbiased.  It's just as easy to use
bytes as str, but it's proposed not to advertise that fact.  So what?
A 'my.fspath' is trivial to write, and hard to get wrong AFAICS.

Consider a truly biased alternative: __fspath__ of types like DirEntry
would return self when bytes-oriented.  (This addresses the issue of
__fspath__ that coerces to str becoming a timebomb in bytes apps.)
bytes-oriented applications would have to use DirEntry.path.  No
visible difference from now (you get the same API for bytes and the
same TypeError from open), and no loss, except for str-envy.  So use
str!  Why isn't that acceptable to you?  Maybe even TOOWTDI?

I really want to know.  I'm not 100% sure that's the right way to go,
mostly because Nick and Brett are signed up for polymorphism.  But I
sure haven't seen any explicit arguments for polymorphism, though I've
asked for them.  AFAICS, everybody just assumed that because some
related APIs are polymorphic, this one should be, too, and dove into
the problem of how to make a polymorphic API safe for Python 3.

 > If the client says "I'm okay with either" then I fully expect the
 > client to have code to properly handle str vs bytes after the
 > fspath (or whatever it's called) call.

I would too, but, uh, examples of such clients?  And no, antipathy
isn't an example -- it doesn't consume bytes, it passes them through
to the kind of client I want to hear about.

AFAICS bytes return from __fspath__ is just YAGNI.  Show me something
that actually wants it.

Steve


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-19 Thread Stephen J. Turnbull
Brett Cannon writes:
 > On Mon, 18 Apr 2016 at 12:26 Stephen J. Turnbull  wrote:

 > Well, it makes *your* head hurt;

It doesn't, because I have a different (and IMHO better) model.  I can
interpret yours without pain by comparing to that.

 > By providing os.fspath() I can say that I do not, under any
 > circumstances, want someone to guess at the encoding some bytes
 > path is under to get me a string and instead I want to start and
 > end entirely in a world of strings. IOW os.fspath() lets me work in
 > such a way that the instant bytes are introduced into my code for
 > file paths it triggers a TypeError.

Does it really help you work that way?  open is polymorphic, and will
use os._raw_fspath(obj, (bytes,str)).  Ditto os.scandir etc.  If they
don't, there's no point in supporting bytes returns from __fspath__,
is there?  Application code will normally not be calling os.fspath.
In the future, pathlib will, I suppose, but even without os.fspath
pathlib already protects you, as does antipathy.[1]

More effective, then, is just to use pathlib for your Path-hacking
work as soon as the path-representing object appears, and Path will
complain about bytes for you.  This is an analogue of the "decode
bytes at the boundary" principle.

 > Yep, we are stuck with the names unless you want to propose a new
 > name and deprecate the old one.

I already proposed fs_ensure_bytes and fs_ensure_str.  I think they're
sufficiently ugly to prove my point.


Footnotes: 
[1]  Strictly speaking, antipathy protects you from inadvertent mixing
of bytes and str.



Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-18 Thread Wes Turner
On Apr 18, 2016 3:19 PM, "Ethan Furman"  wrote:
>
> On 04/18/2016 12:54 PM, Wes Turner wrote:
>
>> Don't we *have* to always support bytes because other programs can
>> create filenames containing bytes?
>
>
> Yes, but not every function has to support bytes.

Because there's no function overloading in Python, we then must have
explicit typing conditionals.

I haven't the time to dig through and compare this with the other fine
solutions presented; is there a reason that a proxy/facade PrimitiveType
wouldn't solve for this?

class TextThing:
    def __init__(self, data):
        self.data = data
        self.type_ = type(data)

    def __getattr__(self, key):
        # Delegate everything else to the wrapped str or bytes object.
        return getattr(self.data, key)


>
>
> --
> ~Ethan~
>


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-18 Thread Ethan Furman

On 04/18/2016 12:26 PM, Stephen J. Turnbull wrote:

> I haven't looked at Antipathy, but I would guess from Ethan's
> promotion of bytes paths and concern with efficiency that "bytes
> antipaths" do *not* "go through" str to get to bytes, they already are
> bytes (in the sense of class inheritance).


Couple points:

- Correct: if you create an antipathy.Path with bytes, you get a
  bytes path (bPath); if you create an antipathy.Path with str
  you get a str path (uPath)

- if you mix a bPath with a uPath, or bytes with a uPath, or str with
  a bPath -- an exception is raised (conversions are *not* implicit on
  3.0, at least -- on 2.x you can activate that behavior if you want it)

- my concern with supporting bytes is primarily for the sake of the
  stdlib, and secondarily for anyone who needs to work with bytes; it
  really has no effect on my library (since antipathy uses subclasses
  of bytes/str)

--
~Ethan~


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-18 Thread Ethan Furman

On 04/18/2016 02:58 PM, Koos Zevenhoven wrote:

> It's a matter of documentation whether it "supports" bytes
> or not. In fact, that function (assuming the name os.fspath) could now
> even be documented to support this:
>
>  patharg = os.fspath(patharg, output_types = (str, pathlib.PurePath))  # :-)


While the os.fspath() function could be abused in such a way, we 
certainly wouldn't advertise it.  (Leave that to StackOverflow. ;)


--
~Ethan~


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-18 Thread Koos Zevenhoven
On Mon, Apr 18, 2016 at 5:03 PM, Ethan Furman  wrote:
> On 04/18/2016 12:41 AM, Nick Coghlan wrote:
>
>> Given the variant you [Koos] suggested, what if we defined the API
>> semantics like this:
>>
>>  # Offer the simplest possible API as the public version
>>  def fspath(pathlike) -> str:
>>      return os._raw_fspath(pathlike)
>>
>>  # Expose the complexity in the "private" variant
>>  def _raw_fspath(pathlike, *, output_types = (str,)) -> (str, bytes):
>>      # Short-circuit for instances of the output type
>>      if isinstance(pathlike, output_types):
>>          return pathlike
>>      # We'd have a tidier error message here for non-path objects
>>      result = pathlike.__fspath__()
>>      if not isinstance(result, output_types):
>>          raise TypeError("argument is not and does not provide an "
>>                          "acceptable pathname")
>>      return result
>
> My initial reaction was that this was overly complex, but after thinking
> about it a couple days I /really/ like it.  It has a reasonable default for
> the 99% real-world use-case, while still allowing for custom and exact
> tailoring (for the 99% stdlib use-case ;) .
>

While it does seem we finally might be nearly there :), this still
seems to need some further discussion.

As described in that long post of mine, I suppose some third-party
code may need the variations (A-C), while it seems that in the stdlib,
most places need (str, bytes), i.e. (A), except in pathlib, which
needs (str,), i.e. (B). I'm not sure what I think about making the
variations private, even if "hiding" the bytes version is, as I said,
an important role of the public function.
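
To make the variations concrete, here is a runnable sketch of the quoted proposal. raw_fspath is a stand-in for the proposed os._raw_fspath (which is not a real API), and BytesEntry is a hypothetical bytes-oriented path object; only the output_types idea comes from the thread.

```python
import pathlib


def raw_fspath(pathlike, *, output_types=(str,)):
    """Stand-in for the proposed os._raw_fspath (hypothetical)."""
    # Short-circuit for instances of an accepted output type.
    if isinstance(pathlike, output_types):
        return pathlike
    try:
        result = pathlike.__fspath__()
    except AttributeError:
        raise TypeError("argument is not a path-like object") from None
    if not isinstance(result, output_types):
        raise TypeError("valid output type not provided via __fspath__")
    return result


class BytesEntry:
    """Hypothetical path object whose __fspath__ returns bytes."""

    def __fspath__(self):
        return b"/tmp/spam"


# (A) polymorphic, (B) str only (the default), (C) bytes only
variant_a = raw_fspath(BytesEntry(), output_types=(str, bytes))
variant_b = raw_fspath(pathlib.PurePosixPath("/tmp/eggs"))
variant_c = raw_fspath(b"/tmp/ham", output_types=(bytes,))

# With the default, a bytes-returning object is rejected.
try:
    raw_fspath(BytesEntry())
    rejected = False
except TypeError:
    rejected = True
```

All three variations fall out of one function; the open question in the thread is only how much of that to expose publicly.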

Except for that type hint, there is *nothing* in the function that
might mislead the user to think bytes paths are something important in
Python 3. It's a matter of documentation whether it "supports" bytes
or not. In fact, that function (assuming the name os.fspath) could now
even be documented to support this:

patharg = os.fspath(patharg, output_types = (str, pathlib.PurePath))  # :-)

So are we still going to end up with two functions or can we deal with one?
What should the typehint be? Something new in typing.py? How about
FSPath[...] as follows:

FSPath[bytes]  # bytes-based pathlike, including bytes
FSPath[str]   # str-based pathlike, including str

# could be extended with PurePath or some path ABC
pathstring = typing.TypeVar('pathstring', str, bytes)

So the above variation might become:

def fspathname(pathlike: FSPath[pathstring],
               *, output_types: tuple = (str,)) -> pathstring:
    # Short-circuit for instances of the output type
    if isinstance(pathlike, output_types):
        return pathlike
    # We'd have a tidier error message here for non-path objects
    result = pathlike.__fspath__()
    if not isinstance(result, output_types):
        raise TypeError("valid output type not provided via __fspath__")
    return result

And similar type hints would apply to os.path functions. For instance,
os.path.dirname:

def dirname(p: FSPath[pathstring]) -> pathstring:
    ...

This would say pathstring all over and not give anyone any ideas about
bytes, unless they know what they're doing.
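
One way such hints could be spelled with today's typing module is sketched below. This is an assumption-laden illustration, not the proposed API: FSPath here is a hypothetical typing.Protocol (Python 3.8+), plain str and bytes are covered by an explicit Union rather than by FSPath itself, and os.fspath() is used as it later shipped in Python 3.6.

```python
import os
import posixpath
from pathlib import PurePosixPath
from typing import Protocol, TypeVar, Union

PathString = TypeVar("PathString", str, bytes)
PathString_co = TypeVar("PathString_co", str, bytes, covariant=True)


class FSPath(Protocol[PathString_co]):
    """Hypothetical protocol: any object whose __fspath__ returns str or bytes."""

    def __fspath__(self) -> PathString_co: ...


def dirname(p: Union[PathString, FSPath[PathString]]) -> PathString:
    # str in (directly or via __fspath__) gives str out; likewise for bytes.
    return posixpath.dirname(os.fspath(p))


from_str = dirname("/usr/lib/python")
from_bytes = dirname(b"/usr/lib/python")
from_obj = dirname(PurePosixPath("/usr/lib/python"))
```

The signature mentions only the type variable, so a reader sees "same kind of string out as in" without bytes being advertised anywhere.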

Complicated? Yes, typing is. But I think we will need this kind of
hints for os.path functions anyway.

-Koos


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-18 Thread Brett Cannon
On Mon, 18 Apr 2016 at 12:26 Stephen J. Turnbull  wrote:

> Brett Cannon writes:
>
>  > If we continue with the "str is an encoding of file paths",
>
> It's not.  It's a representation, but not an encoding.  In Python 3,
> encoding means a representation of a character string using bytes.
> It's using "encoding" generically for "representation" that makes your
> head hurt.
>

Well, it makes *your* head hurt; for me it helped clarify some things. :)


>
>  > you can then build from "bytes is an encoding of str" to get a
>  > pyramid of file path encodings: Path -> str -> bytes. I don't think
>  > this is in any way a controversial view.
>
> Perhaps not.  But it's not particularly useful. ;-)  Here's the
> pyramid I think about:
>
>        Path
>        /  \
>       /    \
>      V      V
>    str <-> bytes
>
> That is, str and bytes are interchangeable *without* any knowledge of
> paths, which are on a higher level of complexity and abstraction.
> Although in pathlib, there's an assumption that paths are serialized
> to str which is (implicitly) serialized to bytes when talking to the
> OS, this is not necessarily true for other structured path classes, in
> particular it is not true for DirEntry (which is an "enhanced
> degenerate" path containing only one path segment but also other
> useful information about the filesystem object addressed)
>
> I haven't looked at Antipathy, but I would guess from Ethan's
> promotion of bytes paths and concern with efficiency that "bytes
> antipaths" do *not* "go through" str to get to bytes, they already are
> bytes (in the sense of class inheritance).
>
>  > But that's when I realized that adding __fspath__ support to
>  > os.fsdecode() and os.fsencode(), they become more coercion functions
>  > rather than encoding/decoding functions. It also means that os.fspath()
>  > has a place when you want to say "I only want to encode a file path to
>  > str" and avoid the decode bit that os.fsdecode() would do
>
> I don't understand what you're trying to say here.  fsdecode currently
> does not promise to decode anything, because it's polymorphic,
> accepting str and bytes.  fsdecode and fsencode already *are* coercion
> functions.
>

And they will continue to be coercion functions. My point is that since
they coerce there is no way to use them in a way to dictate that you don't
want any str/bytes encoding/decoding to occur without checking the
arguments going into the function (i.e. "no guessing about encodings,
please"). By providing os.fspath() I can say that I do not, under any
circumstances, want someone to guess at the encoding some bytes path is
under to get me a string and instead I want to start and end entirely in a
world of strings. IOW os.fspath() lets me work in such a way that the
instant bytes are introduced into my code for file paths it triggers a
TypeError.
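
The contrast can be sketched as follows. str_only_fspath is a hypothetical helper written to match the str-only behaviour described above; note that os.fspath() as eventually released passes bytes through unchanged, so this is an illustration of the idea under discussion, not of the final function.

```python
import os


def str_only_fspath(path):
    """Hypothetical strict variant: never decode, never accept bytes."""
    result = path.__fspath__() if hasattr(path, "__fspath__") else path
    if not isinstance(result, str):
        raise TypeError("expected a str path, got " + type(result).__name__)
    return result


# os.fsdecode() coerces: bytes are decoded with the filesystem encoding.
coerced = os.fsdecode(b"notes.txt")

# The strict helper refuses to guess and fails immediately instead.
try:
    str_only_fspath(b"notes.txt")
    raised = False
except TypeError:
    raised = True
```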


>
> It's this kind of semantic confusion and broken nomenclature that is
> *why* I dislike these polymorphic functions and objects so much.  It
> is impossible to reason correctly about them.  We're stuck with
> invoking "practicality" and muddling through.  And the names mislead
> even experienced Pythonistas.
>

Yep, we are stuck with the names unless you want to propose a new name and
deprecate the old one.


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-18 Thread Ethan Furman

On 04/18/2016 12:54 PM, Wes Turner wrote:

> Don't we *have* to always support bytes because other programs can
> create filenames containing bytes?


Yes, but not every function has to support bytes.

--
~Ethan~



Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-18 Thread Wes Turner
On Apr 18, 2016 2:50 PM, "Ethan Furman"  wrote:
>
> On 04/18/2016 12:25 PM, Stephen J. Turnbull wrote:
>
>> Koos Zevenhoven writes:
>
>
>>> After all, we want something that's *almost* exclusively str.
>>
>>
>> But we don't want that, AFAICT.  Some clearly want this API to be
>> unbiased against bytes in the same way the os APIs are unbiased[2],
>> because that's what we've got in the current proposal.
>
>
> Are we reading the same thread?  For my last several replies I am very
> biased against bytes (and I know I'm not the only one).
>
> Just not so biased that I'm unwilling to let clients say, "No, I'm really
> okay with getting bytes back".
>
> I really like Koos' ideas because they allow the client to say:
>
> - I only want str
> - I only want bytes
> - I'm okay with either
>
> If the client says "I'm okay with either" then I fully expect the client
> to have code to properly handle str vs bytes after the fspath (or whatever
> it's called) call.

Don't we *have* to always support bytes because other programs can create
filenames containing bytes?

>
> --
> ~Ethan~
>


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-18 Thread Ethan Furman

On 04/18/2016 12:25 PM, Stephen J. Turnbull wrote:

> Koos Zevenhoven writes:
>
>> After all, we want something that's *almost* exclusively str.
>
> But we don't want that, AFAICT.  Some clearly want this API to be
> unbiased against bytes in the same way the os APIs are unbiased[2],
> because that's what we've got in the current proposal.


Are we reading the same thread?  For my last several replies I am very 
biased against bytes (and I know I'm not the only one).


Just not so biased that I'm unwilling to let clients say, "No, I'm 
really okay with getting bytes back".


I really like Koos' ideas because they allow the client to say:

- I only want str
- I only want bytes
- I'm okay with either

If the client says "I'm okay with either" then I fully expect the client 
to have code to properly handle str vs bytes after the fspath (or 
whatever it's called) call.


--
~Ethan~


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-18 Thread Random832


On Mon, Apr 18, 2016, at 15:26, Stephen J. Turnbull wrote:
> in
> particular it is not true for DirEntry (which is an "enhanced
> degenerate" path containing only one path segment but also other
> useful information about the filesystem object addressed)

DirEntry contains multiple path segments - it has the name, and the
directory path that was passed into scandir.


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-18 Thread Stephen J. Turnbull
Brett Cannon writes:

 > If we continue with the "str is an encoding of file paths",

It's not.  It's a representation, but not an encoding.  In Python 3,
encoding means a representation of a character string using bytes.
It's using "encoding" generically for "representation" that makes your
head hurt.

 > you can then build from "bytes is an encoding of str" to get a
 > pyramid of file path encodings: Path -> str -> bytes. I don't think
 > this is in any way a controversial view.

Perhaps not.  But it's not particularly useful. ;-)  Here's the
pyramid I think about:

       Path
       /  \
      /    \
     V      V
   str <-> bytes

That is, str and bytes are interchangeable *without* any knowledge of
paths, which are on a higher level of complexity and abstraction.
Although in pathlib, there's an assumption that paths are serialized
to str which is (implicitly) serialized to bytes when talking to the
OS, this is not necessarily true for other structured path classes, in
particular it is not true for DirEntry (which is an "enhanced
degenerate" path containing only one path segment but also other
useful information about the filesystem object addressed)

I haven't looked at Antipathy, but I would guess from Ethan's
promotion of bytes paths and concern with efficiency that "bytes
antipaths" do *not* "go through" str to get to bytes, they already are
bytes (in the sense of class inheritance).

 > But that's when I realized that adding __fspath__ support to os.fsdecode()
 > and os.fsencode(), they become more coercion functions rather than
 > encoding/decoding functions. It also means that os.fspath() has a place
 > when you want to say "I only want to encode a file path to str" and avoid
 > the decode bit that os.fsdecode() would do

I don't understand what you're trying to say here.  fsdecode currently
does not promise to decode anything, because it's polymorphic,
accepting str and bytes.  fsdecode and fsencode already *are* coercion
functions.

It's this kind of semantic confusion and broken nomenclature that is
*why* I dislike these polymorphic functions and objects so much.  It
is impossible to reason correctly about them.  We're stuck with
invoking "practicality" and muddling through.  And the names mislead
even experienced Pythonistas.

Steve



Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-18 Thread Stephen J. Turnbull
I don't disagree with the basic analysis, but there are a number of
issues with motivational statements.

Koos Zevenhoven writes:

 > (B) "str-based only"
 > *Accept*: str, provided via __fspath__ as well as plain str.
 > *Return*: str.
 > *Audience*: relatively low-level code that works exclusively with str
 > paths but accepts specialized path objects as input.

Why "low-level"?  All code that stores paths persistently is likely to
store them in text files or database strings or the like, rather than
as Path (read: specialized path objects, not necessarily
pathlib.Path).  But if there is any low-level manipulation of the
paths to be done before storing, it would be done as Path.  Thus
high-level code might also want to accept Path transparently.

 > (C) "bytes-based only"
 > *Accept*: bytes, provided via __fspath__ as well as plain bytes.
 > *Return*: bytes.
 > *Audience*: low-level code that explicitly deals with paths as bytes
 > (probably to deal with undefined/ill-defined encodings).

No, if it's to deal with encoding issues, we wouldn't accept this.
PEP 383 eliminates that concern.  We accept bytes to support people
who are representing paths with bytes because they think that it's a
good idea and that encoding doesn't matter in their application.

 > (D) "coerce to str"
 > *Accept*: str and bytes, provided via __fspath__ as well as plain str
 > and bytes instances.
 > *Return*: str (coerced / decoded if needed).
 > *Audience*: code that deals explicitly with str but wants to 'try'
 > supporting bytes-based path inputs too via implicit decoding (even if
 > it may result in surrogate escapes, which one cannot for instance
 > print(...).)

No.  As Nick points out with respect to fsencode/fsdecode, it's not
a question of supporting known bytes via implicit decoding (that's
what __fspath__ does for the types that support it), but rather
of supporting ambiguity.  Best practice is to convert explicitly at
the boundary, because it's too likely that data with unexpected type
is just the wrong data.  

Printing surrogates can be done with errors=backslashreplace, and if
you're using fsdecode, you probably should use that, namereplace, or
xmlcharrefreplace.
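
A small sketch of the surrogate-escape round trip and the backslashreplace workaround. To stay platform-independent it spells out the UTF-8 decode explicitly instead of calling os.fsdecode(); on a UTF-8 POSIX system the two should agree per PEP 383. The filename is made up.

```python
# Latin-1 encoded name; the 0xE9 byte is not valid UTF-8.
raw = b"caf\xe9.txt"

# PEP 383: undecodable bytes become lone surrogates instead of raising.
name = raw.decode("utf-8", "surrogateescape")

# Strict encoding (what printing to a UTF-8 stream needs) fails on them.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# backslashreplace yields something safe to display ...
printable = name.encode("utf-8", "backslashreplace").decode("utf-8")

# ... while surrogateescape recovers the original bytes exactly.
round_trip = name.encode("utf-8", "surrogateescape")
```

The displayable form is lossy in the sense that it is no longer a usable path, so it belongs in logs and messages only; the surrogate-carrying str is what should be passed back to the OS.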

 > (E) "coerce to bytes"
 > *Accept*: str and bytes, provided via __fspath__ as well as plain str
 > and bytes instances.
 > *Return*: bytes (coerced / encoded if needed).
 > *Audience*: low-level code that explicitly deals with bytes paths but
 > wants to accept str-based path inputs too via implicit encoding.

Again, it's a question of ambiguity, or perhaps sloppy programming
(eg, using str literals for paths in a bytes-oriented program).

Use cases D and E are basically "guessing when faced with ambiguity",
and fsencode and fsdecode are code smells because (as Nick claims)
they almost always conceal a situation where you don't know whether
you've got bytes or str (and it's way too much work to find out by
tracing them back to where they came from).

 > It seems to me we now "all" agree that __fspath__ should allow
 > str+bytes polymorphism.

I don't agree that we *should* allow polymorphism, because (purity)
paths are in the text domain[1] and (practicality) I don't believe that
use of os.fspath will be restricted to "low-level boundary code".  I
would be perfectly happy telling bytes users that the idiom is not
"os.fspath(maybe_direntry, allow_types=(bytes,))", but rather
"os.fsencode(os.fspath(maybe_direntry))", so that code in the text
domain can safely use os.fspath(maybe_direntry) without worrying that
it will raise because maybe_direntry.__fspath__() returns bytes.
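
Sketched in code, that idiom would look like the following. It assumes a design in which __fspath__ always returns str, and it uses os.fspath() and os.fsencode() as found in Python 3.6 and later; TextEntry is a hypothetical stand-in for something like DirEntry.

```python
import os


class TextEntry:
    """Hypothetical path object whose __fspath__ always returns str."""

    def __init__(self, path):
        self._path = path

    def __fspath__(self):
        return self._path


maybe_direntry = TextEntry("spam/eggs.txt")

# Text-domain code: no need to guard against bytes coming back.
text_path = os.fspath(maybe_direntry)

# Bytes-oriented code opts in to encoding explicitly at the boundary.
bytes_path = os.fsencode(os.fspath(maybe_direntry))
```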

This would allow pathlib.Path to handle arguments providing __fspath__
transparently.  With the current proposal, it would need to rule out
bytes before invoking os.fspath, or handle the exception, or leave the
exception to its caller.  None of these options are pleasant.

Unfortunately, as Nick points out, defining __fspath__ to return str
is very unpleasant because bytes applications will now have to guard
*everything* that might provide __fspath__ with that incantation
before passing to open and other APIs that store the path on the
object returned.  So we don't really have a choice about polymorphism
if we want to support both __fspath__ and bytes paths.

 > After all, we want something that's *almost* exclusively str.

But we don't want that, AFAICT.  Some clearly want this API to be
unbiased against bytes in the same way the os APIs are unbiased[2],
because that's what we've got in the current proposal.  Further, due
to the existing ambiguity in fsencode and fsdecode, we're extending
the field of ambiguity where bytes and str can mix indiscriminately.

If we are serious about "*almost* exclusively str" we should accept
that "exclusively str" is a very good approximation and much easier to
use correctly, and regretfully postpone inclusion of DirEntry in this
protocol to the future.  But that's not on the table, is it?


Footnotes: 
[1]  Representation on disk as (basically 

Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-18 Thread Brett Cannon
On Sun, 17 Apr 2016 at 06:59 Koos Zevenhoven  wrote:

> On Sun, Apr 17, 2016 at 11:03 AM, Stephen J. Turnbull
>  wrote:
> > Nick Coghlan writes:
> >
> >  > str and bytes aren't going to implement __fspath__ (since they're
> >  > only *sometimes* path objects), so asking people to call the
> >  > protocol method directly for any purpose would be a pain.
> >
> > It *should* be a pain.  People who need bytes should call fsencode,
> > people who need str should call fsdecode, and Ethan's antipathy checks
> > for bytes and str, then calls __fspath__ if needed.  Who's left?  Just
> > the bartender and the janitor, last call was hours ago.  OK, maybe
> > there are enough clients to make it worthwhile to provide the utility,
> > but it should be clearly marked as "double opt-in, for experts only
> > (consenting adults must show proof of insurance)".
>
> My doubts, expressed several times in these threads, about the need
> for a *public* os.fspath function to complement the __fspath__
> protocol, are now perhaps gone. I'll explain why (and how). The
> reasons for my doubts were that
>
> (1) The audience outside the stdlib for such a function should be
> small, because it is preferred to either use existing tools in
> os.path.* or pathlib (or similar) for manipulating paths.
>
> (2) There are just too many different possible versions of this
> function: rejecting str, rejecting bytes, coercion to str, coercion to
> bytes, and accepting both str and bytes. That's a total of 5 different
> cases. People also used to talk about versions that would not allow
> passing through objects that are already bytes or str. That would make
> it a total of 10 different versions!
> (in principle, there could be even more, but let's not go there :-).
> In other words, this argument was that it is probably best to
> implement whatever flavor is needed for the context, perhaps based on
> documented recipes.
>
>
> Regarding (2), we can first rule out half of the 10 cases---the ones
> that reject plain instances of bytes and/or str---because they would
> not be very useful as all the isinstance/hasattr checking etc. would
> be left to the caller. And here are the remaining five, explained
> based on what they accept as argument, what they return, and where
> they would be used:
>
> (A) "polymorphic"
> *Accept*: str and bytes, provided via __fspath__ as well as plain str
> and bytes instances.
> *Return*: str/bytes depending on input.
> *Audience*: the stdlib, including os.path.things, os.things,
> shutil.things, open, ... (some functions would need a C version).
> There may even be a small audience outside the stdlib.
>
> (B) "str-based only"
> *Accept*: str, provided via __fspath__ as well as plain str.
> *Return*: str.
> *Audience*: relatively low-level code that works exclusively with str
> paths but accepts specialized path objects as input.
>
> (C) "bytes-based only"
> *Accept*: bytes, provided via __fspath__ as well as plain bytes.
> *Return*: bytes.
> *Audience*: low-level code that explicitly deals with paths as bytes
> (probably to deal with undefined/ill-defined encodings).
>
> (D) "coerce to str"
> *Accept*: str and bytes, provided via __fspath__ as well as plain str
> and bytes instances.
> *Return*: str (coerced / decoded if needed).
> *Audience*: code that deals explicitly with str but wants to 'try'
> supporting bytes-based path inputs too via implicit decoding (even if
> it may result in surrogate escapes, which one cannot for instance
> print(...).)
>
> (E) "coerce to bytes"
> *Accept*: str and bytes, provided via __fspath__ as well as plain str
> and bytes instances.
> *Return*: bytes (coerced / encoded if needed).
> *Audience*: low-level code that explicitly deals with bytes paths but
> wants to accept str-based path inputs too via implicit encoding.
>
>
> Even if all options (A-E) probably have small audiences (compared to
> e.g. os.path.*), some of them have larger audiences than others. But
> all of them have at least *some* reasonable audience (as described
> above).
>
> Recently (well, a few days ago, but 'recently', considering the scale
> of these discussions anyway ;-), Nick pointed out something I hadn't
> realized---os.fsencode and os.fsdecode actually already implement
> coercion to bytes and str, respectively. With those two functions made
> compatible with the __fspath__ protocol [using (A) above], they would
> in fact *be* (D) and (E), respectively.
>
> Now, we only have options (A-C) left. They could all be implemented
> roughly as follows:
>
> def fspath(pathlike, *, output_types = (str,)):
>     if hasattr(pathlike, '__fspath__'):
>         # or pathlike.__fspath__ if it's not a method
>         ret = pathlike.__fspath__()
>     else:
>         ret = pathlike
>     if not isinstance(ret, output_types):
>         raise TypeError("argument is not and does not provide an "
>                         "acceptable pathname")
>     return ret
>
> With an implementation like the above, (A) would correspond to
> output_types = 

Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-18 Thread Ethan Furman

On 04/18/2016 12:41 AM, Nick Coghlan wrote:


> Given the variant you [Koos] suggested, what if we defined the API semantics
> like this:
>
>     # Offer the simplest possible API as the public version
>     def fspath(pathlike) -> str:
>         return os._raw_fspath(pathlike)
>
>     # Expose the complexity in the "private" variant
>     def _raw_fspath(pathlike, *, output_types = (str,)) -> (str, bytes):
>         # Short-circuit for instances of the output type
>         if isinstance(pathlike, output_types):
>             return pathlike
>         # We'd have a tidier error message here for non-path objects
>         result = pathlike.__fspath__()
>         if not isinstance(result, output_types):
>             raise TypeError("argument is not and does not provide an "
>                             "acceptable pathname")
>         return result


My initial reaction was that this was overly complex, but after thinking 
about it a couple days I /really/ like it.  It has a reasonable default 
for the 99% real-world use-case, while still allowing for custom and 
exact tailoring (for the 99% stdlib use-case ;) .


--
~Ethan~

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-18 Thread Nick Coghlan
On 18 April 2016 at 07:05, Koos Zevenhoven  wrote:

> On Sun, Apr 17, 2016 at 9:14 PM, Ethan Furman  wrote:
> > On 04/17/2016 06:58 AM, Koos Zevenhoven wrote:
> >
> >> So, as a summary: With a str+bytes-polymorphic __fspath__, with the
> >> above argumentation and the rough implementation of os.fspath(...),
> >> the conclusion is that the os.fspath function should indeed be public,
> >> and that no further variations are needed.
> >
> >
> > Nice summation, thank you.  :)
> >
>
> Come on, Ethan, that summary was not for you ;)


As Chris noted though, the "Yes, that summary is accurate" from active
participants in the discussion helps assure readers that it's a good
overview :)

Given the variant you suggested, what if we defined the API semantics like
this:

    # Offer the simplest possible API as the public version
    def fspath(pathlike) -> str:
        return os._raw_fspath(pathlike)

    # Expose the complexity in the "private" variant
    def _raw_fspath(pathlike, *, output_types = (str,)) -> (str, bytes):
        # Short-circuit for instances of the output type
        if isinstance(pathlike, output_types):
            return pathlike
        # We'd have a tidier error message here for non-path objects
        result = pathlike.__fspath__()
        if not isinstance(result, output_types):
            raise TypeError("argument is not and does not provide an "
                            "acceptable pathname")
        return result

That way, the default API would be saying unambiguously that the preferred
way of manipulating filesystem paths is as text, but the lower level
"mainly for the standard library" API would explicitly handle the 3
different scenarios (binary-input-is-a-bug, text-input-is-a-bug, and
either-binary-or-text-input-is-fine).

That way the structure of the additional parameters on _raw_fspath can be
tailored specifically to the needs of the standard library, without
worrying as much about 3rd party use cases.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-17 Thread Ethan Furman

On 04/17/2016 02:05 PM, Koos Zevenhoven wrote:
> On Sun, Apr 17, 2016 at 9:14 PM, Ethan Furman  wrote:
>> On 04/17/2016 06:58 AM, Koos Zevenhoven wrote:
>>> So, as a summary: With a str+bytes-polymorphic __fspath__, with the
>>> above argumentation and the rough implementation of os.fspath(...),
>>> the conclusion is that the os.fspath function should indeed be public,
>>> and that no further variations are needed.
>>
>> Nice summation, thank you.  :)
>
> Come on, Ethan, that summary was not for you ;)

Heh.

> You can do better than that: read the whole thing! ;-).


Ah, but I did read the whole thing!  I just didn't want to quote it all 
and then add one line, so I snipped the rest.


Let me try again:

Good, well thought-out post.  Thank you.  :)

if-at-first-you-don't-succeed'ly yrs,

--
~Ethan~


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-17 Thread Chris Angelico
On Mon, Apr 18, 2016 at 7:05 AM, Koos Zevenhoven  wrote:
> On Sun, Apr 17, 2016 at 9:14 PM, Ethan Furman  wrote:
>> On 04/17/2016 06:58 AM, Koos Zevenhoven wrote:
>>
>>> So, as a summary: With a str+bytes-polymorphic __fspath__, with the
>>> above argumentation and the rough implementation of os.fspath(...),
>>> the conclusion is that the os.fspath function should indeed be public,
>>> and that no further variations are needed.
>>
>>
>> Nice summation, thank you.  :)
>>
>
> Come on, Ethan, that summary was not for you ;) It was for lazy
> people, people with bad memory, or people not so involved in the
> topic. I wrote a big post, provided new arguments, with other points
> collected into the same logical framework, wrote a new version of
> os.fspath and argued why it is the right one --- and all you do is
> read the stupid summary. You can do better than that: read the whole
> thing! ;-).

Yes, but people like me who haven't read every single post appreciate
the vote of support from someone who has. Ethan's post says that this
one-paragraph summary has twice as much weight as it had when only one
person attests it.

So, thank you Koos for summarizing, and thank you Ethan for affirming
the summary.

ChrisA


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-17 Thread Koos Zevenhoven
On Sun, Apr 17, 2016 at 9:14 PM, Ethan Furman  wrote:
> On 04/17/2016 06:58 AM, Koos Zevenhoven wrote:
>
>> So, as a summary: With a str+bytes-polymorphic __fspath__, with the
>> above argumentation and the rough implementation of os.fspath(...),
>> the conclusion is that the os.fspath function should indeed be public,
>> and that no further variations are needed.
>
>
> Nice summation, thank you.  :)
>

Come on, Ethan, that summary was not for you ;) It was for lazy
people, people with bad memory, or people not so involved in the
topic. I wrote a big post, provided new arguments, with other points
collected into the same logical framework, wrote a new version of
os.fspath and argued why it is the right one --- and all you do is
read the stupid summary. You can do better than that: read the whole
thing! ;-).

-Koos


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-17 Thread Ethan Furman

On 04/17/2016 06:58 AM, Koos Zevenhoven wrote:
> So, as a summary: With a str+bytes-polymorphic __fspath__, with the
> above argumentation and the rough implementation of os.fspath(...),
> the conclusion is that the os.fspath function should indeed be public,
> and that no further variations are needed.

Nice summation, thank you.  :)

--
~Ethan~



Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-17 Thread Koos Zevenhoven
On Sun, Apr 17, 2016 at 11:03 AM, Stephen J. Turnbull
 wrote:
> Nick Coghlan writes:
>
>  > str and bytes aren't going to implement __fspath__ (since they're
>  > only *sometimes* path objects), so asking people to call the
>  > protocol method directly for any purpose would be a pain.
>
> It *should* be a pain.  People who need bytes should call fsencode,
> people who need str should call fsdecode, and Ethan's antipathy checks
> for bytes and str, then calls __fspath__ if needed.  Who's left?  Just
> the bartender and the janitor, last call was hours ago.  OK, maybe
> there are enough clients to make it worthwhile to provide the utility,
> but it should be clearly marked as "double opt-in, for experts only
> (consenting adults must show proof of insurance)".

My doubts, expressed several times in these threads, about the need
for a *public* os.fspath function to complement the __fspath__
protocol, are now perhaps gone. I'll explain why (and how). The
reasons for my doubts were that

(1) The audience outside the stdlib for such a function should be
small, because it is preferred to either use existing tools in
os.path.* or pathlib (or similar) for manipulating paths.

(2) There are just too many different possible versions of this
function: rejecting str, rejecting bytes, coercion to str, coercion to
bytes, and accepting both str and bytes. That's a total of 5 different
cases. People also used to talk about versions that would not allow
passing through objects that are already bytes or str. That would make
it a total of 10 different versions!
(in principle, there could be even more, but let's not go there :-).
In other words, this argument was that it is probably best to
implement whatever flavor is needed for the context, perhaps based on
documented recipes.


Regarding (2), we can first rule out half of the 10 cases---the ones
that reject plain instances of bytes and/or str---because they would
not be very useful as all the isinstance/hasattr checking etc. would
be left to the caller. And here are the remaining five, explained
based on what they accept as argument, what they return, and where
they would be used:

(A) "polymorphic"
*Accept*: str and bytes, provided via __fspath__ as well as plain str
and bytes instances.
*Return*: str/bytes depending on input.
*Audience*: the stdlib, including os.path.things, os.things,
shutil.things, open, ... (some functions would need a C version).
There may even be a small audience outside the stdlib.

(B) "str-based only"
*Accept*: str, provided via __fspath__ as well as plain str.
*Return*: str.
*Audience*: relatively low-level code that works exclusively with str
paths but accepts specialized path objects as input.

(C) "bytes-based only"
*Accept*: bytes, provided via __fspath__ as well as plain bytes.
*Return*: bytes.
*Audience*: low-level code that explicitly deals with paths as bytes
(probably to deal with undefined/ill-defined encodings).

(D) "coerce to str"
*Accept*: str and bytes, provided via __fspath__ as well as plain str
and bytes instances.
*Return*: str (coerced / decoded if needed).
*Audience*: code that deals explicitly with str but wants to 'try'
supporting bytes-based path inputs too via implicit decoding (even if
it may result in surrogate escapes, which one cannot for instance
print(...).)

(E) "coerce to bytes"
*Accept*: str and bytes, provided via __fspath__ as well as plain str
and bytes instances.
*Return*: bytes (coerced / encoded if needed).
*Audience*: low-level code that explicitly deals with bytes paths but
wants to accept str-based path inputs too via implicit encoding.


Even if all options (A-E) probably have small audiences (compared to
e.g. os.path.*), some of them have larger audiences than others. But
all of them have at least *some* reasonable audience (as described
above).

Recently (well, a few days ago, but 'recently', considering the scale
of these discussions anyway ;-), Nick pointed out something I hadn't
realized---os.fsencode and os.fsdecode actually already implement
coercion to bytes and str, respectively. With those two functions made
compatible with the __fspath__ protocol [using (A) above], they would
in fact *be* (D) and (E), respectively.

Now, we only have options (A-C) left. They could all be implemented
roughly as follows:

def fspath(pathlike, *, output_types = (str,)):
    if hasattr(pathlike, '__fspath__'):
        # or pathlike.__fspath__ if it's not a method
        ret = pathlike.__fspath__()
    else:
        ret = pathlike
    if not isinstance(ret, output_types):
        raise TypeError("argument is not and does not provide an "
                        "acceptable pathname")
    return ret

With an implementation like the above, (A) would correspond to
output_types = (str, bytes), (B) to the default, and (C) to
output_types = (bytes,).
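
As a quick runnable check of that mapping, here is a minimal sketch that repeats the rough implementation and exercises options (A) to (C). FakePath is a hypothetical stand-in for any object implementing __fspath__; it is not part of pathlib or any other library.

```python
def fspath(pathlike, *, output_types=(str,)):
    """Rough sketch of the proposed helper, not a final os.fspath."""
    if hasattr(pathlike, "__fspath__"):
        ret = pathlike.__fspath__()
    else:
        ret = pathlike
    if not isinstance(ret, output_types):
        raise TypeError("argument is not and does not provide an "
                        "acceptable pathname")
    return ret


class FakePath:
    """Hypothetical path-like object used only for this illustration."""

    def __init__(self, raw):
        self._raw = raw

    def __fspath__(self):
        return self._raw


# (B) str-only is the default
print(fspath(FakePath("spam.txt")))
# (A) polymorphic: the type provided by the input is preserved
print(fspath(FakePath(b"spam.txt"), output_types=(str, bytes)))
# (C) bytes-only
print(fspath(b"spam.txt", output_types=(bytes,)))
# With the default, bytes input is rejected
try:
    fspath(b"spam.txt")
except TypeError as exc:
    print("rejected:", exc)
```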


So, with the above considerations as a counterargument, I consider
argument (2) gone.

What about argument (1), that the audience for the os.fspath(...)
function (especially for one selected version of 

Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-17 Thread Nick Coghlan
On 17 April 2016 at 18:03, Stephen J. Turnbull  wrote:

> Nick Coghlan writes:
>  > and instead throw exceptions in those cases.
>
> Then I don't understand the current design of fsdecode and fsencode.
> Shouldn't they raise on str and bytes respectively, rather than
> passing them through?  In general, I would expect that something
> that's explicitly intended to be polymorphic would be documented as
> such, and the *caller* would be responsible for type-checking and
> raising if it got the wrong thing.
>

I was initially surprised myself, but then realised it made sense for their
intended use cases - if almost every usage looks like "obj if
isinstance(obj, str) else os.fsdecode(obj)", then there ends up being a
strong pragmatic case for pushing the pass-through down into the underlying
function to reduce code duplication and rejecting str input in the cases
where it isn't supported. By contrast, there are lots of places where
"obj.decode()" gets called without a pass-through for objects that are
already decoded.
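
A small sketch of that pass-through behaviour, using only os.fsdecode and os.fsencode as they exist today. The helper to_text_by_hand is a hypothetical name for the caller-side pattern quoted above, and the ASCII file name is chosen so the result does not depend on the filesystem encoding.

```python
import os


def to_text_by_hand(obj):
    """The caller-side pattern that the built-in pass-through replaces."""
    return obj if isinstance(obj, str) else os.fsdecode(obj)


for name in ("README.md", b"README.md"):
    # Both spellings give the same str result
    print(repr(to_text_by_hand(name)), repr(os.fsdecode(name)))
    # fsencode mirrors this: bytes pass through, str is encoded
    print(repr(os.fsencode(name)))

# By contrast, str has no decode() method, so there is no pass-through there
print(hasattr("README.md", "decode"))
```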

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-17 Thread Stephen J. Turnbull
Nick Coghlan writes:

 > str and bytes aren't going to implement __fspath__ (since they're
 > only *sometimes* path objects), so asking people to call the
 > protocol method directly for any purpose would be a pain.

It *should* be a pain.  People who need bytes should call fsencode,
people who need str should call fsdecode, and Ethan's antipathy checks
for bytes and str, then calls __fspath__ if needed.  Who's left?  Just
the bartender and the janitor, last call was hours ago.  OK, maybe
there are enough clients to make it worthwhile to provide the utility,
but it should be clearly marked as "double opt-in, for experts only
(consenting adults must show proof of insurance)".

The functionality of raising on wrong types can be incorporated in
fsencode and fsdecode, but I think there's still some discussion
needed about the conditions for raising, and what flags are needed.

Of course with this reinterpretation, names like "fs_ensure_str" and
"fs_ensure_bytes" might be more appropriate (much as y'all hate
putting types in function names, in this case I think that's best).
But backward compatibility, and the existing names aren't *that* bad I
guess.

 > You may have missed my email where I agreed os.fspath() itself
 > needs to ensure the output is a str object and throw an exception
 > otherwise.

Presumably it should do the same for bytes when those are desired,
though.  I don't find the "cast to bytes using memoryview" approach
plausible, especially not where I live: if str, very likely some of
the characters will be outside of the latin1 repertoire, and thus the
internal representation will likely be full of NULs, and certainly not
be what the user wants.

 > The remaining API design debate relates to whether the polymorphic
 > version should be "os.fspath(obj, allow_bytes=True)" or
 > "os._raw_fspath(obj)" (with Ethan favouring the former, and me the
 > latter).

 > > Et tu, Nick?  "Guarantee"?!  You can't guarantee any such thing
 > > with an implicitly invoked polymorphic API like this one --
 > > unless you consider a crashed program to be in the binary
 > > domain. ;-)
 > 
 > I do, as one of the core changes in design philosophy between
 > Python 2 and 3 is attempting to remove the implicit level shifting
 > between the binary and text domains,

Hey, Reverend, I've been singing those hymns since the early '90s.

 > and instead throw exceptions in those cases.

Then I don't understand the current design of fsdecode and fsencode.
Shouldn't they raise on str and bytes respectively, rather than
passing them through?  In general, I would expect that something
that's explicitly intended to be polymorphic would be documented as
such, and the *caller* would be responsible for type-checking and
raising if it got the wrong thing.

Steve


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-16 Thread Nick Coghlan
On 16 April 2016 at 21:21, Stephen J. Turnbull  wrote:
> Nick Coghlan writes:
>
>  > On 15 April 2016 at 00:52, Stephen J. Turnbull  wrote:
>  > > Nick Coghlan writes:
>  > >
>  > >  > The use case for returning bytes from __fspath__ is DirEntry, so you
>  > >  > can write things like this in low level code:
>  > >  >
>  > >  > def myscandir(dirpath):
>  > >  >     for entry in os.scandir(dirpath):
>  > >  >         if entry.is_file():
>  > >  >             with open(entry) as f:
>  > >  >                 # do something
>  > >
>  > > Excuse me, but that is *not* a use case for returning bytes from
>  > > DirEntry.__fspath__.  open() is perfectly happy taking str (including
>  > > surrogate-encoded rawbytes).
>  >
>  > That results in a different type for the file object's name:
>  >
>  > >>> open("README.md").name
>  > 'README.md'
>  > >>> open(b"README.md").name
>  > b'README.md'
>
> OK, you win, __fspath__ needs to be polymorphic.
>
> But you've just shifted me to -1 on "os.fspath": it's an attractive
> nuisance.
>
> EIBTI, applications and high-level library functions should
> use os.fsdecode or os.fsencode.  Functions that take a polymorphic
> argument and want to preserve type should invoke __fspath__ on the
> argument. That will visually signal that the caller is not merely
> low-level, but is explicitly a boundary function.

str and bytes aren't going to implement __fspath__ (since they're only
*sometimes* path objects), so asking people to call the protocol
method directly for any purpose would be a pain.

>  (You could rename
> the generic function as "os._fspath", I guess, but I *really* want to
> deprecate calling the polymorphic version in user code.  _fspath can
> be added if experience shows that polymorphic usage is very desirable
> outside the stdlib.  This remark is in my not-so-Dutch opinion, of
> course.)

You may have missed my email where I agreed os.fspath() itself needs
to ensure the output is a str object and throw an exception otherwise.
The remaining API design debate relates to whether the polymorphic
version should be "os.fspath(obj, allow_bytes=True)" or
"os._raw_fspath(obj)" (with Ethan favouring the former, and me the
latter).

>
>  > The guarantee we want to provide those folks is that if they're
>  > operating in the binary domain they'll stay there.
>
> Et tu, Nick?  "Guarantee"?!  You can't guarantee any such thing with
> an implicitly invoked polymorphic API like this one -- unless you
> consider a crashed program to be in the binary domain. ;-)

I do, as one of the core changes in design philosophy between Python 2
and 3 is attempting to remove the implicit level shifting between the
binary and text domains, and instead throw exceptions in those cases.
Pragmatism requires us to keep some of them (e.g. the codecs module is
officially object<->object in both Python 2 and Python 3, and string
formatting codes can still do unexpected things), but a great many of
them are already gone, and we don't want to add any new ones if
alternative designs are available.
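
For anyone who wants to see the "throw exceptions instead" behaviour directly, here is a minimal sketch using plain concatenation and os.path.join. The helper raises_type_error is a hypothetical name introduced just for this demonstration.

```python
import os.path


def raises_type_error(func, *args):
    """Hypothetical helper: True if calling func(*args) raises TypeError."""
    try:
        func(*args)
    except TypeError:
        return True
    return False


# Mixing the text and binary domains fails loudly rather than guessing
print(raises_type_error(lambda: "spam" + b"eggs"))
print(raises_type_error(os.path.join, "dir", b"file"))

# Staying in one domain works, and the result stays in that domain
print(repr(os.path.join("dir", "file")))
print(repr(os.path.join(b"dir", b"file")))
```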

> Note that
> the current proposals don't even do that for the binary domain, only
> for the text domain!

Folks that want to ensure they're working in the binary domain can
already do "memoryview(obj)" to ensure they have a bytes-like object
without constraining it to a specific type.
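
As a rough sketch of that check: memoryview accepts any object supporting the buffer protocol and raises TypeError for str. The helper name ensure_binary is hypothetical and exists only for this example.

```python
def ensure_binary(obj):
    """Hypothetical helper: return a memoryview of obj or raise TypeError."""
    return memoryview(obj)


for candidate in (b"README.md", bytearray(b"README.md"), "README.md"):
    try:
        view = ensure_binary(candidate)
    except TypeError:
        # str does not support the buffer protocol
        print(type(candidate).__name__, "rejected")
    else:
        print(type(candidate).__name__, "accepted:", bytes(view))
```

Note that this only validates the input; it does not convert text to bytes, so it is not a substitute for os.fsencode.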

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-16 Thread Paul Moore
On 16 April 2016 at 14:46, Stephen J. Turnbull  wrote:
> Paul Moore writes:
[...]
>  > 1. I just want to pass the argument on to other functions - just do
>  > so, stdlib functions will work fine.
>
> I think this is a bad idea unless you *need* polymorphism, but OK,
> it's "consenting adults".

All I'm really saying here is that if you don't need to care about
type checking (and 99% of Python programs rely on duck typing, so this
is pretty much the norm) then everything will be OK. I'm not
suggesting encouraging polymorphism, just pointing out that most code
should simply work and this whole debate is a non-issue for code like
that. (That's the whole point of getting the stdlib functions to
accept Path objects, after all :-))

>  > 2. I need a string - use os.fsdecode(p)
>  > 3. I need bytes - use os.fsencode(p)
>  > 4. I need a guaranteed pathlib.Path object so that I can use Path
>  > methods - convert via pathlib.Path(os.fsdecode(p))
>
> LGTM.  Applications or user toolkits could provide a derived
> IFeelLuckyPath(Path) for symmetry with the os functions.
>
>  > I guess there's the possibility that you want to deliberately reject
>  > bytes-like paths,
>
> I wouldn't put it that way.  I think more likely is the possibility
> that you want to restrict yourself to a particular type, as all your
> code is written in terms of that type and expects that type.  Note
> that Nick's example shows that in both the bytes domain and the text
> domain you can easily end up with a filelike.name of the wrong type.

But within your own code, you do that by convention and good coding
practices, not by explicit type checks (except in boundary code). If
you're writing a library to be used by others, you should be as
permissive as possible - you may not expect your code to be called
with bytes-like paths, but why go out of your way to reject it? That's
not Pythonic, IMO. (On the other hand, documenting that only text-like
path objects are supported by your library is fine).

In my experience, bytes/text safety is about being aware of where the
two different types appear in your program, not about forcing only one
type. So my cases are about keeping the types clear - the output of
(1) is "same as input", of (2) is "string", of (3) is "bytes" and of
(4) is "Path". Call me with whatever you like, I can work with it in
terms I need.

But we're mostly just debating coding style here, I think we agree on
the basic principle.

>  > and it's not immediately obvious how you'd do that without
>  > os.fspath or using the __fspath__ protocol directly, but I'm not
>  > sure what anyone gains by doing so (maybe the chance to fail early?
>  > but doesn't using fsdecode mean I never need to fail at all?)
>
> Well, wouldn't you like to raise there if your dataflow spec says only
> one type should ever be observed?

Meh. Maybe asserts, maybe unit tests. But typechecks throughout my
code sounds more like strong typing than Python. But as I say, coding
style - I write scripts, glue code, and general-use libraries. None of
these lend themselves to that sort of rigorous dataflow analysis (this
is the same reason I have little personal use for the new typechecking
stuff).

> The reasons that I wouldn't bother are that (1) I suspect it's going
> to be very rare to see bytes in a text application, and (2) in bytes-
> oriented code I would be fairly likely to either specify literals as
> str (a bug, but nobody would ever notice) or importing them from an
> .ini or other text source (which might very well be in a non-
> filesystem encoding in my environment!)  In either case it's probably
> the filename I want but specified in the wrong form.

Also, that feels very much like the sort of boundary code that needs
to do the fiddly rigorous stuff so the rest of us don't have to :-)

Paul


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-16 Thread Stephen J. Turnbull
Paul Moore writes:
 > On 16 April 2016 at 12:21, Stephen J. Turnbull  wrote:
 > > OK, you win, __fspath__ needs to be polymorphic.
 > >
 > > But you've just shifted me to -1 on "os.fspath": it's an attractive
 > > nuisance.  EIBTI, applications and high-level library functions should
 > > use os.fsdecode or os.fsencode.
 > 
 > I presume your expectation is that os.fsencode/os.fsdecode will work
 > with objects supporting the __fspath__ protocol?

Yes, I've suggested that before, and I think it's TOOWTDI, rather than
insisting on a os.fspath intervening, even if os.fspath is included
after all.

 > So the question for me is, if I'm writing a function that takes a path
 > argument p:

 > 1. I just want to pass the argument on to other functions - just do
 > so, stdlib functions will work fine.

I think this is a bad idea unless you *need* polymorphism, but OK,
it's "consenting adults".

 > 2. I need a string - use os.fsdecode(p)
 > 3. I need bytes - use os.fsencode(p)
 > 4. I need a guaranteed pathlib.Path object so that I can use Path
 > methods - convert via pathlib.Path(os.fsdecode(p))

LGTM.  Applications or user toolkits could provide a derived
IFeelLuckyPath(Path) for symmetry with the os functions.

 > I guess there's the possibility that you want to deliberately reject
 > bytes-like paths,

I wouldn't put it that way.  I think more likely is the possibility
that you want to restrict yourself to a particular type, as all your
code is written in terms of that type and expects that type.  Note
that Nick's example shows that in both the bytes domain and the text
domain you can easily end up with a filelike.name of the wrong type.

 > and it's not immediately obvious how you'd do that without
 > os.fspath or using the __fspath__ protocol directly, but I'm not
 > sure what anyone gains by doing so (maybe the chance to fail early? 
 > but doesn't using fsdecode mean I never need to fail at all?)

Well, wouldn't you like to raise there if your dataflow spec says only
one type should ever be observed?

The reasons that I wouldn't bother are that (1) I suspect it's going
to be very rare to see bytes in a text application, and (2) in bytes-
oriented code I would be fairly likely to either specify literals as
str (a bug, but nobody would ever notice) or importing them from an
.ini or other text source (which might very well be in a non-
filesystem encoding in my environment!)  In either case it's probably
the filename I want but specified in the wrong form.



Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-16 Thread Paul Moore
On 16 April 2016 at 12:21, Stephen J. Turnbull  wrote:
> OK, you win, __fspath__ needs to be polymorphic.
>
> But you've just shifted me to -1 on "os.fspath": it's an attractive
> nuisance.  EIBTI, applications and high-level library functions should
> use os.fsdecode or os.fsencode.

I presume your expectation is that os.fsencode/os.fsdecode will work
with objects supporting the __fspath__ protocol?

So the question for me is, if I'm writing a function that takes a path
argument p (in the most general sense - I want my function to be able
to handle anything the stdlib functions can) then how do I write the
code? There are 4 cases I can think of:

1. I just want to pass the argument on to other functions - just do
so, stdlib functions will work fine.
2. I need a string - use os.fsdecode(p)
3. I need bytes - use os.fsencode(p)
4. I need a guaranteed pathlib.Path object so that I can use Path
methods - convert via pathlib.Path(os.fsdecode(p))
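
The four cases above can be sketched as small helpers. This is only an illustration: the names as_text, as_bytes and as_path are made up here, and the sketch assumes Python 3.6 or later, where os.fsdecode and os.fsencode accept objects implementing __fspath__ as well as plain str and bytes.

```python
import os
import pathlib


def as_text(p):
    """Case 2: return the path as str, decoding bytes if necessary."""
    return os.fsdecode(p)


def as_bytes(p):
    """Case 3: return the path as bytes, encoding str if necessary."""
    return os.fsencode(p)


def as_path(p):
    """Case 4: return a pathlib.Path whatever type the caller passed."""
    return pathlib.Path(os.fsdecode(p))
```

Case 1 needs no helper at all: the argument is handed to the stdlib function unchanged.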

I guess there's the possibility that you want to deliberately reject
bytes-like paths, and it's not immediately obvious how you'd do that
without os.fspath or using the __fspath__ protocol directly, but I'm
not sure what anyone gains by doing so (maybe the chance to fail
early? but doesn't using fsdecode mean I never need to fail at all?)

While I don't have any specific reason to object to os.fspath, I'd
appreciate someone describing a concrete use case that needs it (and
isn't covered by any of the options above).

Paul


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-16 Thread Stephen J. Turnbull
Nick Coghlan writes:

 > On 15 April 2016 at 00:52, Stephen J. Turnbull  wrote:
 > > Nick Coghlan writes:
 > >
 > >  > The use case for returning bytes from __fspath__ is DirEntry, so you
 > >  > can write things like this in low level code:
 > >  >
 > >  >     def myscandir(dirpath):
 > >  >         for entry in os.scandir(dirpath):
 > >  >             if entry.is_file():
 > >  >                 with open(entry) as f:
 > >  >                     # do something
 > >
 > > Excuse me, but that is *not* a use case for returning bytes from
 > > DirEntry.__fspath__.  open() is perfectly happy taking str (including
 > > surrogate-encoded rawbytes).
 > 
 > That results in a different type for the file object's name:
 > 
 > >>> open("README.md").name
 > 'README.md'
 > >>> open(b"README.md").name
 > b'README.md'

OK, you win, __fspath__ needs to be polymorphic.

But you've just shifted me to -1 on "os.fspath": it's an attractive
nuisance.  EIBTI, applications and high-level library functions should
use os.fsdecode or os.fsencode.  Functions that take a polymorphic
argument and want to preserve type should invoke __fspath__ on the
argument.  That will visually signal that the caller is not merely
low-level, but is explicitly a boundary function.  (You could rename
the generic function as "os._fspath", I guess, but I *really* want to
deprecate calling the polymorphic version in user code.  _fspath can
be added if experience shows that polymorphic usage is very desirable
outside the stdlib.  This remark is in my not-so-Dutch opinion, of
course.)

 > The guarantee we want to provide those folks is that if they're
 > operating in the binary domain they'll stay there.

Et tu, Nick?  "Guarantee"?!  You can't guarantee any such thing with
an implicitly invoked polymorphic API like this one -- unless you
consider a crashed program to be in the binary domain. ;-)  Note that
the current proposals don't even do that for the binary domain, only
for the text domain!



Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-15 Thread Nick Coghlan
On 15 April 2016 at 00:52, Stephen J. Turnbull  wrote:
> Nick Coghlan writes:
>
>  > The use case for returning bytes from __fspath__ is DirEntry, so you
>  > can write things like this in low level code:
>  >
>  >     def myscandir(dirpath):
>  >         for entry in os.scandir(dirpath):
>  >             if entry.is_file():
>  >                 with open(entry) as f:
>  >                     # do something
>
> Excuse me, but that is *not* a use case for returning bytes from
> DirEntry.__fspath__.  open() is perfectly happy taking str (including
> surrogate-encoded rawbytes).

That results in a different type for the file object's name:

>>> open("README.md").name
'README.md'
>>> open(b"README.md").name
b'README.md'

Implicitly level shifting in a low level API isn't a good thing,
especially when there are idempotent level shifting commands available
(so you can always ensure a given value is on the level you expect,
even if you don't know which level it was on originally).
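
A small sketch of what "idempotent level shifting" looks like in practice, taking os.fsdecode and os.fsencode as the shifting commands. The helper names are illustrative only and are not part of any proposal in this thread.

```python
import os


def to_text_level(name):
    """Shift a filesystem name to the text (str) level."""
    return os.fsdecode(name)


def to_binary_level(name):
    """Shift a filesystem name to the binary (bytes) level."""
    return os.fsencode(name)


for original in ("README.md", b"README.md"):
    text = to_text_level(original)
    binary = to_binary_level(original)
    # Applying the same shift a second time leaves the value unchanged,
    # so the caller never needs to know which level it started on.
    print(type(original).__name__,
          to_text_level(text) == text,
          to_binary_level(binary) == binary)
```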

I completely agree with you that folks working with text in the binary
domain are asking for trouble, but at the same time, that's the
reality of the way a lot of *nix system interfaces operate. The
guarantee we want to provide those folks is that if they're operating
in the binary domain they'll stay there unless they explicitly shift
out of it using a decoding API of some kind - doing it behind their
back would be akin to implicitly shifting from the time domain to the
frequency domain in an engineering library.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-14 Thread Ethan Furman

On 04/14/2016 06:01 PM, Ethan Furman wrote:
> On 04/14/2016 05:20 PM, Stephen J. Turnbull wrote:
>
>> you'll have to impose it on me.
>
> Hmm.  Well, the good news is you have convinced me that letting bytes
> through willy-nilly is akin to loosing the hounds of hell on our code.
> The bad news is I was never in that camp.  ;)


Actually, in retrospect, I was in that camp at the beginning.  But 
Brett's code (and your arguments, amongst others) convinced me of that 
 or  would be better/safer.


--
~Ethan~


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-14 Thread Ethan Furman

On 04/14/2016 05:20 PM, Stephen J. Turnbull wrote:

> However, the proposed polymorphism does create ambiguity and risk for
> my uses.  I rarely have the luxury of *not* ensuring paths are text,
> regardless of the bytes-ness of the underlying application, because I
> can be pretty darn sure that somebody's going to feed me non-
> filesystem encodings, and soon.  Even when I am working with bytes
> representing paths in the filesystem encoding, I need to convert to
> text to read the darn things when debugging!  So I don't consent;
> you'll have to impose it on me.


Hmm.  Well, the good news is you have convinced me that letting bytes 
through willy-nilly is akin to loosing the hounds of hell on our code. 
The bad news is I was never in that camp.  ;)


The camp I'm in is a function* that, by default, will raise if bytes
enter the picture -- but will allow them through if the user
specifically says they are okay with getting bytes.


Would that work for you?

--
~Ethan~

*Or pair of functions, one that is str-only, one that allows both -- but 
I'd rather just have one function with a flag.
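
A minimal sketch of the single-function-with-a-flag idea described above. The function name fspath and the allow_bytes parameter are hypothetical; the sketch is layered on os.fspath as it eventually shipped in Python 3.6, which returns either str or bytes.

```python
import os


def fspath(path, allow_bytes=False):
    """Hypothetical str-by-default wrapper around os.fspath()."""
    result = os.fspath(path)
    if isinstance(result, bytes) and not allow_bytes:
        # Fail early unless the caller explicitly opted in to bytes.
        raise TypeError(
            "bytes path not allowed; pass allow_bytes=True to accept it"
        )
    return result
```

With this shape, text-oriented callers get the early failure for free, while bytes-oriented callers have to say so once at the call site.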



Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-14 Thread Stephen J. Turnbull
Ethan Furman writes:

 > Substitute open() with sending those bytes somewhere else:

Eg, pathlib.Path, which will raise?  Surely it should be safe to pass
a DirEntry to a pathlib constructor?  Note that having Path call
fsdecode implicitly is a bad idea, because we don't know the
provenance of generic bytes.  But by design of __fspath__, its value
(if str) is suitable for passing to Path, for further processing.

 > why should I have to reencode this str back to bytes, when bytes
 > are what I asked for in the first place?

Erm, you didn't *ask* for bytes.  You asked for whatever __fspath__ is
going to give you.  And in many cases, like pathlib, it will be str.
I imagine that doesn't bother you; you plan to use antipathy anyway.
But if there's uptake on the protocol, I'll bet that str-only
implementations are the majority.

And your question also cuts the other way.  Why should *I* have to
decode bytes to str, or suffer unexpected TypeErrors, or deal with the
possibility of TypeErrors, just because __fspath__ is polymorphic?

We're here to improve pathlib.  There's been a huge amount of mission
creep, with no use cases to provide intuition.  You pit your abstract
inconvenience against my 20 years of whack-a-mole with UnicodeErrors
and TypeErrors in Mailman.  I *know* that if you let bytes that
represent text loose inside an application, eventually they'll end up
in a str context and "blooey!"

 > How did this application get a bytes path object to begin with?
 > Either it explicitly used bytes when calling scandir and friends
 > (in which case it shouldn't be surprised to be working with bytes);
 > or it got that bytes object from a database, over-the-wire,
 > an-other-language-lib, etc.

No, it got it from an __fspath__-toting object (such as a DirEntry) it
received from some library, which constructed it polymorphically from
bytes it got from some other place -- and so lost the original
encoding.  That's the scenario I think is impossible to rule out, and
reducing that kind of scenario to the bare minimum is why bytes got
demoted from being the default representation of text in Python 3 in
the first place.

 > If I'm working with bytes, why would I want to work with str?

First, are you actually *working* on those bytes, or are you just
passing them to os functions?  If the latter, you shouldn't care.

Second, because paths are conceptually text (you may not agree, but
Nick inter alia has indicated he does).  Working with bytes paths
(except literals) is a good way to get in trouble, because there are
all kinds of ways they can end up inappropriately encoded.  For
example, the odds are very high that a bytes path read from a file
(including from a zipfile directory) in Japan will be encoded in Shift
JIS.  On Mac OS X, that will either produce mojibake in the directory
(if the access creates the file) or fail to access the intended file,
because the filesystem encoding is UTF-8.
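
The Shift JIS scenario can be reproduced without touching a filesystem. The sketch below assumes a UTF-8 filesystem encoding with the surrogateescape error handler (hard-coded here rather than taken from the running system), and the file name is an invented example.

```python
# An invented Japanese file name, written with escapes to keep the source ASCII.
name = "\u65e5\u672c\u8a9e.txt"

# The bytes as they might appear in a zipfile directory written in Japan.
raw = name.encode("shift_jis")

# What a UTF-8 filesystem layer makes of those bytes when decoding them.
decoded = raw.decode("utf-8", "surrogateescape")

# The decoded text is not the intended name, although the original bytes
# can still be recovered by reversing the same codec and error handler.
matches_intended_name = decoded == name
round_trips = decoded.encode("utf-8", "surrogateescape") == raw
print(matches_intended_name, round_trips)
```

The bytes survive the round trip, but they never name the file the user meant, which is the failure mode described above.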

Third, because you want to be portable to Windows, where you have no
choice about whether paths are str or bytes.

These reasons probably don't apply to you with much strength, but the
question is how typical you are, vs. the nearly universal experience
of mojibake and the dominant market share of Windows.

 > Python is a glue language, and Python practitioners don't always
 > have the luxury of working only with text.

For paths?  Of course you can work with them as text.  ISTM what you
really want is the luxury of working only with bytes, because you're
in the habit of pretending they are text.  I don't object to you
having your luxury as long as it doesn't increase risk for my use
cases.  I think you're asking for trouble, and the practice is
definitely nonportable, but consenting adults applies.

However, the proposed polymorphism does create ambiguity and risk for
my uses.  I rarely have the luxury of *not* ensuring paths are text,
regardless of the bytes-ness of the underlying application, because I
can be pretty darn sure that somebody's going to feed me non-
filesystem encodings, and soon.  Even when I am working with bytes
representing paths in the filesystem encoding, I need to convert to
text to read the darn things when debugging!  So I don't consent;
you'll have to impose it on me.



Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-14 Thread Ethan Furman

On 04/13/2016 02:37 PM, Victor Stinner wrote:

> I'm not a big fan of a flag parameter to change the return type of a
> function. Usually, two functions are preferred. In the os module we have
> getcwd/getcwdb for example. I don't know if it's a good example


I think of os.fspath() as more of a filter/reduce operation:

- str -> str
- str DirEntry -> str

- bytes -> bytes
- bytes DirEntry -> bytes

The purpose of os.fspath() (at least the one I'm arguing for ;) is to 
distil its inputs to the lowest common denominator, and no lower -- 
which is either str for string-based path objects, or bytes for 
bytes-based path objects.
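
The four mappings listed above can be checked against os.fspath as it eventually shipped in Python 3.6, which behaves this way. The helper first_entry and the file name are illustrative only.

```python
import os
import tempfile


def first_entry(directory):
    """Return the first DirEntry found in directory (str or bytes)."""
    with os.scandir(directory) as entries:
        return next(entries)


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "example.txt"), "w"):
        pass

    inputs = {
        "str": "example.txt",
        "str DirEntry": first_entry(tmp),
        "bytes": b"example.txt",
        "bytes DirEntry": first_entry(os.fsencode(tmp)),
    }
    # Map each kind of input to the type name os.fspath() distils it to.
    distilled = {label: type(os.fspath(value)).__name__
                 for label, value in inputs.items()}

print(distilled)
```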


--
~Ethan~



Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-14 Thread Ethan Furman

On 04/14/2016 07:52 AM, Stephen J. Turnbull wrote:
> Nick Coghlan writes:
>
>> The use case for returning bytes from __fspath__ is DirEntry, so you
>> can write things like this in low level code:
>>
>>     def myscandir(dirpath):
>>         for entry in os.scandir(dirpath):
>>             if entry.is_file():
>>                 with open(entry) as f:
>>                     # do something
>
> Excuse me, but that is *not* a use case for returning bytes from
> DirEntry.__fspath__.  open() is perfectly happy taking str (including
> surrogate-encoded rawbytes).


Substitute open() with sending those bytes somewhere else: why should I 
have to reencode this str back to bytes, when bytes are what I asked for 
in the first place?



> If the trivial thing is for __fspath__
> to return bytes, then implicitly applying os.fsencode to the value
> being returned is almost as trivial, and just as safe.  A low price to
> pay for ensuring that text applications don't crash just because a
> bytes-oriented object decides to implement __fspath__.


How did this application get a bytes path object to begin with?  Either 
it explicitly used bytes when calling scandir and friends (in which case 
it shouldn't be surprised to be working with bytes); or it got that 
bytes object from a database, over-the-wire, an-other-language-lib, etc. 
 Those are the boundaries where bytes should be transformed to str if 
the app doesn't want to deal with bytes (whether for path manipulation 
or other text manipulation).  os.fspath() is not a boundary function and 
shouldn't be used as if it were.



> If there's any cost to defining __fspath__ as str-only, it's some
> other use case.  What consumer of __fspath__ that expects bytes but
> not str do you envision?  Is it generalizable, so that applying
> fsencode to the value of __fspath__ would lead to "unacceptably"
> widespread sprinkling of fsencode all over bytes-oriented code?


If I'm working with bytes, why would I want to work with str?  Python is 
a glue language, and Python practitioners don't always have the luxury 
of working only with text.


--
~Ethan~


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-14 Thread Stephen J. Turnbull
Random832 writes:
 > On Thu, Apr 14, 2016, at 03:02, Stephen J. Turnbull wrote:

 > > I have a strong preference for str only, because I still don't see a
 > > use case for polymorphic __fspath__.
 > 
 > Ultimately we're talking about redundancy and performance here.

Ultimately, yes.  Right now I have some epithets for you:  Premature!
Optimization!!  Get thee behind me, Satan!

More seriously, concrete use cases where this overhead matters?

Church-of-Don-Knuth-member-ly y'rs,


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-14 Thread Stephen J. Turnbull
Nick Coghlan writes:

 > The use case for returning bytes from __fspath__ is DirEntry, so you
 > can write things like this in low level code:
 > 
 >     def myscandir(dirpath):
 >         for entry in os.scandir(dirpath):
 >             if entry.is_file():
 >                 with open(entry) as f:
 >                     # do something

Excuse me, but that is *not* a use case for returning bytes from
DirEntry.__fspath__.  open() is perfectly happy taking str (including
surrogate-encoded rawbytes).  If the trivial thing is for __fspath__
to return bytes, then implicitly applying os.fsencode to the value
being returned is almost as trivial, and just as safe.  A low price to
pay for ensuring that text applications don't crash just because a
bytes-oriented object decides to implement __fspath__.

If there's any cost to defining __fspath__ as str-only, it's some
other use case.  What consumer of __fspath__ that expects bytes but
not str do you envision?  Is it generalizable, so that applying
fsencode to the value of __fspath__ would lead to "unacceptably"
widespread sprinkling of fsencode all over bytes-oriented code?

The more I think about this, the more I like my proposal to junk
fspath, and have fsdecode and fsencode consume __fspath__.  That way
application code can request its native type.

 > By contrast, as soon as you type "import pathlib" at the top of your
 > file, you've stepped outside the world of potentially pure boundary
 > code,

"Potentially pure" is an odd term to apply to the boundary code IMO.
We are agreed that conceptually paths are text, for human consumption
(at least at last report we were).  Therefore, paths represented as
bytes are inherently an impure construct.  Viz, surrogateescape.

 > and are instead dealing with structured application level
 > objects (which means traversing the bytes->str boundary before the
 > str->Path one).

That assumes that pathlib.Path's str-only design is appropriate.  I'm
questioning that, primarily as a thought experiment.



Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-14 Thread Random832
On Thu, Apr 14, 2016, at 03:02, Stephen J. Turnbull wrote:
> I have a strong preference for str only, because I still don't see a
> use case for polymorphic __fspath__.

Ultimately we're talking about redundancy and performance here. The "use
case", such as it is, arises when a class (be it DirEntry or whatever
else) natively stores bytes. If __fspath__ has to return str, the class
calls fsdecode, and then open immediately turns around and calls
fsencode on the result, accomplishing nothing compared with just passing
everything straight through.
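
To make the redundancy concrete, here is a sketch with two hypothetical classes that both store bytes natively. RawEntry and DecodingEntry are invented for illustration; the sketch assumes Python 3.6 or later, where open() accepts any object with an __fspath__ method.

```python
import os
import tempfile


class RawEntry:
    """Hypothetical path object that natively stores bytes."""

    def __init__(self, raw):
        self._raw = raw

    def __fspath__(self):
        # Polymorphic protocol: hand the stored bytes straight through.
        return self._raw


class DecodingEntry(RawEntry):
    def __fspath__(self):
        # str-only protocol: decode here, even though the consumer
        # will have to encode again before calling into the OS.
        return os.fsdecode(self._raw)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "data.txt")
    with open(target, "w") as f:
        f.write("hello")

    raw = os.fsencode(target)
    contents = []
    for entry in (RawEntry(raw), DecodingEntry(raw)):
        with open(entry) as f:
            contents.append(f.read())

print(contents)
```

Both variants open the same file; the only difference is the extra decode and encode round trip in the str-only one.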


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-14 Thread Random832
On Thu, Apr 14, 2016, at 02:00, Nick Coghlan wrote:
> > If the protocol can return bytes, then that means that types (DirEntry?
> > someone had an alternate path library with a bPath?) which return bytes
> > via the protocol will proliferate, and cannot be safely passed to
> > anything that uses os.fspath. Numerous copies of "def myfspath(x):
> > return os.fsdecode(os._raw_fspath(x))" will proliferate (or they'll just
> > monkey-patch os.fspath), and no-one actually uses os.fspath except toy
> > examples.
> 
> If folks want coercion, they can just use os.fsdecode(x), as that
> already has a str -> str passthrough from the input to the output
> (unlike codecs.decode) and will presumably be updated to include an
> implicit call to os._raw_fspath() on the passed in object.

This is the first I've heard of any suggestion to have fsdecode accept
non-strings.

> > Why is it so objectionable for os.fspath to do coercion?
> 
> The first problem is that binary paths on Windows basically don't
> work, so it's preferable for them to fail fast regardless of platform,
> rather than to have them implicitly work on *nix, only to fail for
> Windows users using non-ASCII paths later.

Ideally, this warning would be raised from a central place, and even
fspath (and even fsdecode) would go through it.


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-14 Thread Nick Coghlan
On 14 April 2016 at 17:02, Stephen J. Turnbull  wrote:
> But WDOT?  I'd especially like to hear if Nick is tempted to flip-flop
> (so far he's been in the "pathlib is a text utility" camp).

pathlib is too high level (i.e. has too many dependencies) to be used
in low level boundary code.

The use case for returning bytes from __fspath__ is DirEntry, so you
can write things like this in low level code:

    def myscandir(dirpath):
        for entry in os.scandir(dirpath):
            if entry.is_file():
                with open(entry) as f:
                    # do something

and still have them automatically inherit the str/bytes handling of
the core standard library APIs.

By contrast, as soon as you type "import pathlib" at the top of your
file, you've stepped outside the world of potentially pure boundary
code, and are instead dealing with structured application level
objects (which means traversing the bytes->str boundary before the
str->Path one).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-14 Thread Paul Moore
On 14 April 2016 at 08:02, Stephen J. Turnbull  wrote:
> So let me propose what I think is the elephant in the room.  If you're
> going to have a polymorphic __fspath__, then pathlib is *the* example
> of a module that *desperately* needs to be polymorphic.  Consider:
>
> A non-text Application has some bytes and passes them to
> pathlib.Path as 
> manipulates them and passes the result to
> os.scandir as 
> expecting a return of
> DirEntries of 
>
>  ==  == bytes, and  == Path is TOOWTDI, no?

I'm not sure I follow this logic at all. But from my reading your
argument contradicts your conclusion, so maybe I'm misunderstanding.

To me, the "obvious" conclusion is that pathlib is not appropriate in
non-text applications, because  *cannot* be bytes (the
constructor rejects bytes). I see no reason to change that - non-text
applications are inherently low level, and shouldn't expect to use
high-level abstractions like pathlib.

> But under the current proposal which doesn't touch the internal
> mechanisms of pathlib and allows, but has no way to request, bytes
> returns,  == str,  == Path, and  == str,
> requiring two explicit conversions that bytes-shoveling developers
> will tell you should be unnecessary.  QED, pathlib should be
> polymorphic as a central part of this proposal.

Nope, QED pathlib is not a low level abstraction.

So your argument to me doesn't help much, because it's a given that
pathlib is str-only. The debate is about how things like scandir
(specifically DirEntry objects) and Ethan's pathlib replacement, which
*do* allow bytes in and out, should participate in the new protocol,
when they are bytes (they obviously should work just like pathlib when
they are strings).

In my opinion, they *shouldn't*: the new protocol should be string-only
(at least initially).

If I understand (from a couple of brief mentions) Ethan has a
string-like path object and a bytes-like path object, so he could
support fspath on the string-like one but not the bytes-like one. He
may not like having slightly different APIs for the two types, I don't
know, but it's possible. But DirEntry is polymorphic, so it *will*
have a __fspath__ method, and needs to know what to do when it's
bytes-like (I guess with a bit of getattr hacking DirEntry *could*
expose a __fspath__ method only if it's string-like, but that seems
like a pretty gross hack).

So:

1. pathlib remains string-like, and is the canonical example of
__fspath__, returns strings only
2. DirEntry is the only other example of the protocol in the stdlib,
but is polymorphic
3. I'm not aware of any 3rd party library that has polymorphic classes
(Ethan can correct me if I'm wrong here)

So the only purpose I know of for discussing __fspath__ returning
bytes is for scandir, and hypothetical polymorphic 3rd party path
abstractions (and possibly Ethan's preference to have a common API for
his 2 classes).

I propose we should have a string-only __fspath__ protocol in 3.6.
Bytes-format DirEntry objects can raise an error in __fspath__. If it
becomes obvious with usage that we need bytes support in __fspath__ we
can add it (compatibly - string-only code wouldn't need to change) in
3.7. That seems far better to me than trying to design bytes support
without actual use cases.
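
A rough sketch of what a string-only protocol might look like for a polymorphic object. The Entry class is a hypothetical stand-in, not the real os.DirEntry, and the sketch uses os.fspath from Python 3.6 or later only to exercise the method.

```python
import os


class Entry:
    """Hypothetical stand-in for a polymorphic DirEntry-like object."""

    def __init__(self, path):
        self.path = path  # str or bytes, whichever the caller supplied

    def __fspath__(self):
        # String-only protocol: bytes-format entries refuse to take part.
        if isinstance(self.path, bytes):
            raise TypeError("bytes entries do not support __fspath__")
        return self.path


print(os.fspath(Entry("spam.txt")))
```

Relaxing this later to return bytes would not break callers that only ever pass str entries, which is the compatibility argument made above.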

Paul


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-14 Thread Stephen J. Turnbull
I was going to read the new posts that came in since I started this
one (at one point it was 5X as long as it is now), but this thread is
way out of control.  My apologies to anybody who has presented[1] use
cases in support of the wildly speculative proposals under discussion,
but my bet is that there have been none.

Victor Stinner writes:

 > Oops sorry, I forgot to add that I have no strong opinion on the type (I
 > only have a minor preference for str only).

I have a strong preference for str only, because I still don't see a
use case for polymorphic __fspath__.

os functions and os.path functions need to *accept* both str and bytes
because they are interfaces to OS functionality used by both text and
non-text applications, and so must check and convert to OS native type.
Many of these functions produce what they receive because both text and
non-text applications use names of filesystem objects internally, as
well as passing them to OS wrappers.  The question is how far to take
that logic.

So let me propose what I think is the elephant in the room.  If you're
going to have a polymorphic __fspath__, then pathlib is *the* example
of a module that *desperately* needs to be polymorphic.  Consider:

A non-text Application has some bytes and passes them to
pathlib.Path as 
manipulates them and passes the result to
os.scandir as 
expecting a return of
DirEntries of 

 ==  == bytes, and  == Path is TOOWTDI, no?
But under the current proposal which doesn't touch the internal
mechanisms of pathlib and allows, but has no way to request, bytes
returns,  == str,  == Path, and  == str,
requiring two explicit conversions that bytes-shoveling developers
will tell you should be unnecessary.  QED, pathlib should be
polymorphic as a central part of this proposal.

IMO that's not the right way to go (slippery slope, very quickly you
hit manipulations that are "really" text operations).  See also my
proposal "Pathlib enhancements - improve fsdecode and fsencode" which
suggests a (primitive) way for code to request the type it likes
better.

But WDOT?  I'd especially like to hear if Nick is tempted to flip-flop
(so far he's been in the "pathlib is a text utility" camp).


Footnotes: 
[1]  Just because I don't know of any I consider persuasive doesn't
mean there aren't any, but what you don't tell me I don't know.
(Maybe you'd have to kill me?  If so, thanks for not telling!)



Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-14 Thread Nick Coghlan
On 14 April 2016 at 13:54, Random832  wrote:
> On Wed, Apr 13, 2016, at 23:17, Nick Coghlan wrote:
>
>> - os.fspath -> str (no coercion)
>> - os.fsdecode -> str (with coercion from bytes)
>> - os.fsencode -> bytes (with coercion from str)
>> - os._raw_fspath -> str-or-bytes (no coercion)
>>
>> (with "coercion" referring to how the result of __fspath__ and any
>> directly passed in str or bytes objects are handled)
>>
>> The leading underscore on _raw_fspath would be of the "this is a
>> documented and stable API, but you probably don't want to use it
>> unless you really know what you're doing" variety, rather than the
>> "this is an undocumented and potentially unstable private API"
>> variety.
>
> In this scenario could the protocol return bytes?

Yes, that's desirable to handle DirEntry transparently regardless of type.

> If the protocol can return bytes, then that means that types (DirEntry?
> someone had an alternate path library with a bPath?) which return bytes
> via the protocol will proliferate, and cannot be safely passed to
> anything that uses os.fspath. Numerous copies of "def myfspath(x):
> return os.fsdecode(os._raw_fspath(x))" will proliferate (or they'll just
> monkey-patch os.fspath), and no-one actually uses os.fspath except toy
> examples.

If folks want coercion, they can just use os.fsdecode(x), as that
already has a str -> str passthrough from the input to the output
(unlike codecs.decode) and will presumably be updated to include an
implicit call to os._raw_fspath() on the passed in object.

> Why is it so objectionable for os.fspath to do coercion?

The first problem is that binary paths on Windows basically don't
work, so it's preferable for them to fail fast regardless of platform,
rather than to have them implicitly work on *nix, only to fail for
Windows users using non-ASCII paths later.

The second is that it would make os.fspath and os.fsdecode
functionally equivalent, so we'd have two different spellings for the
same operation.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-13 Thread Random832
On Wed, Apr 13, 2016, at 23:17, Nick Coghlan wrote:

> - os.fspath -> str (no coercion)
> - os.fsdecode -> str (with coercion from bytes)
> - os.fsencode -> bytes (with coercion from str)
> - os._raw_fspath -> str-or-bytes (no coercion)
> 
> (with "coercion" referring to how the result of __fspath__ and any
> directly passed in str or bytes objects are handled)
> 
> The leading underscore on _raw_fspath would be of the "this is a
> documented and stable API, but you probably don't want to use it
> unless you really know what you're doing" variety, rather than the
> "this is an undocumented and potentially unstable private API"
> variety.

In this scenario could the protocol return bytes?

If the protocol cannot return bytes, then _raw_fspath will only return
bytes if directly passed bytes. This limits its utility for the
functions that consume it (presumably path_convert (os.open and friends)
and builtin open), since they already have to act specially based on the
types of their arguments (builtin open can accept an integer;
path_convert has to behave radically differently on str or bytes input)
and there's no reason they couldn't simply accept bytes directly while
they're doing that.

If the protocol can return bytes, then that means that types (DirEntry?
someone had an alternate path library with a bPath?) which return bytes
via the protocol will proliferate, and cannot be safely passed to
anything that uses os.fspath. Numerous copies of "def myfspath(x):
return os.fsdecode(os._raw_fspath(x))" will proliferate (or they'll just
monkey-patch os.fspath), and no-one actually uses os.fspath except toy
examples.

Why is it so objectionable for os.fspath to do coercion?


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-13 Thread Nick Coghlan
On 14 April 2016 at 12:49, Nick Coghlan  wrote:
> The API could be something like:
>
> - os.fspath -> str-or-bytes
> - os.fsencode -> bytes (with coercion from str)
> - os.fsdecode -> str (with coercion from bytes)
> - os.strpath -> str (no coercion)

There seems to be fairly broad opposition to the idea of defining the
public API in terms of what os and os.path are likely to need, which
reminded me of Koos's suggestion of using a private API for the
str-or-bytes variant. That approach would give us something like:

- os.fspath -> str (no coercion)
- os.fsdecode -> str (with coercion from bytes)
- os.fsencode -> bytes (with coercion from str)
- os._raw_fspath -> str-or-bytes (no coercion)

(with "coercion" referring to how the result of __fspath__ and any
directly passed in str or bytes objects are handled)

The leading underscore on _raw_fspath would be of the "this is a
documented and stable API, but you probably don't want to use it
unless you really know what you're doing" variety, rather than the
"this is an undocumented and potentially unstable private API"
variety.
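
As a rough sketch of how those four entry points could relate to each
other (this is only an illustration of the proposal in this message, not
necessarily what any Python release ships; raw_fspath, strict_fspath and
BytesPath are names made up for the example):

```python
import os

def raw_fspath(path):
    """Stand-in for the proposed os._raw_fspath: str-or-bytes, no coercion."""
    if isinstance(path, (str, bytes)):
        return path
    try:
        method = type(path).__fspath__
    except AttributeError:
        raise TypeError("expected str, bytes or an object with __fspath__, "
                        "not " + type(path).__name__) from None
    result = method(path)
    if not isinstance(result, (str, bytes)):
        raise TypeError("__fspath__ must return str or bytes")
    return result

def strict_fspath(path):
    """Stand-in for the proposed str-only os.fspath: bytes are an error."""
    result = raw_fspath(path)
    if isinstance(result, bytes):
        raise TypeError("bytes path not accepted here; use fsdecode instead")
    return result

def fsdecode(path):
    """Proposed os.fsdecode: always str, coercing bytes."""
    return os.fsdecode(raw_fspath(path))

def fsencode(path):
    """Proposed os.fsencode: always bytes, coercing str."""
    return os.fsencode(raw_fspath(path))

class BytesPath:
    """Hypothetical rich path object whose protocol method returns bytes."""
    def __init__(self, raw):
        self._raw = raw
    def __fspath__(self):
        return self._raw
```

With these definitions, strict_fspath(BytesPath(b"spam")) raises TypeError
on every platform, while fsdecode(BytesPath(b"spam")) returns "spam".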

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-13 Thread Nick Coghlan
On 14 April 2016 at 07:37, Victor Stinner  wrote:
> Le mercredi 13 avril 2016, Brett Cannon  a écrit :
>>
>> All of this is demonstrated in
>> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 by the
>> various possibilities. In the end it's not a corner case because the
>> definition of __fspath__ will be such that there's no ambiguity in what
>> os.fspath() will accept and what __fspath__ can return and the code will be
>> written to conform to what the PEP dictates (IOW I'm aware that this needs
>> to be considered in the implementation :) .
>
> I'm not a big fan of a flag parameter to change the return type of a
> function. Usually, two functions are preferred. In the os module we have
> getcwd/getcwdb for example. I don't know if it's a good example

It is, as one of the benefits of the "two separate functions" model is
to improve type inference during static analysis - you don't
necessarily know the values of parameters at analysis time, but you do
know which function is being called.
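
A small illustration of that point. os.getcwd and os.getcwdb are the real
functions; getcwd_flag is a hypothetical flag-based variant written only
for comparison:

```python
import os
from typing import Union

def getcwd_flag(as_bytes: bool = False) -> Union[str, bytes]:
    """Hypothetical flag-based API: the return type depends on a runtime value."""
    return os.getcwdb() if as_bytes else os.getcwd()

def describe(flag: bool) -> Union[str, bytes]:
    # Without knowing the value of `flag`, a static checker can only say
    # "str or bytes" here.
    return getcwd_flag(as_bytes=flag)

# With two separate functions the result type follows from the name alone.
text_cwd = os.getcwd()    # always str
bytes_cwd = os.getcwdb()  # always bytes
```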

> Do you know other examples of Python functions taking a (flag) parameter to
> change the result type?

subprocess.Popen has a couple of flags that can do that (more
precisely, they change the return type of some methods on the
resulting object), but that's not an especially pretty API in general.
String based type variations are more common (e.g. file mode flags,
using the codec module registry), but they're still used only
sparingly (since they make the code harder to reason about for both
humans and static analysers).
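
Two quick illustrations of those existing cases, using only the standard
library (the child command and the temporary file name are arbitrary
choices for the example):

```python
import os
import subprocess
import sys
import tempfile

cmd = [sys.executable, "-c", "print('hi')"]

# Default: the captured output is bytes.
raw = subprocess.run(cmd, stdout=subprocess.PIPE).stdout

# With universal_newlines=True the very same attribute is str instead.
text = subprocess.run(cmd, stdout=subprocess.PIPE,
                      universal_newlines=True).stdout

# String based variation: the mode argument decides what read() returns.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.txt")
    with open(demo, "w") as f:
        f.write("hi")
    with open(demo, "r") as f:
        as_text = f.read()    # str
    with open(demo, "rb") as f:
        as_bytes = f.read()   # bytes
```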

In terms of types for filesystem path APIs:

1. I assume we'll want a fast path for bytes & str to avoid
performance regressions (especially in os.path, where we may be doing
pure data manipulation without any IO operations)
2. I favour defining __fspath__ and os.fspath() in terms of what the
os and os.path modules need to handle both DirEntry and pathlib (which
I currently expect to be str-or-bytes)
3. For the benefit of higher level cross-platform code like pathlib,
it likely makes sense to also have a str-only API that throws an
exception rather than returning bytes

However, I also suggest deferring a decision on 3 until 2 has been
definitively answered by way of implementing the changes. If I'm right
about 2, then the API could be something like:

- os.fspath -> str-or-bytes
- os.fsencode -> bytes (with coercion from str)
- os.fsdecode -> str (with coercion from bytes)
- os.strpath -> str (no coercion)

It's also worth noting that os.fsencode and os.fsdecode are already
idempotent - their current signatures are "str-or-bytes -> bytes" and
"str-or-bytes -> str". With a str-or-bytes return type on os.fspath,
adapting them to handle rich path objects should just be a matter of
adding an os.fspath call as the first step.
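
A minimal sketch of that adaptation, assuming the protocol method is
spelled __fspath__ and may return str or bytes (fsdecode_rich and
BytesEntry are hypothetical names used only here):

```python
import os

def fsdecode_rich(path):
    """Hypothetical helper: unwrap a rich path object, then coerce to str."""
    method = getattr(type(path), "__fspath__", None)
    if method is not None:
        path = method(path)      # first step: ask the object for its path
    return os.fsdecode(path)     # existing str-or-bytes -> str coercion

class BytesEntry:
    """Hypothetical rich path object that reports a bytes path."""
    def __init__(self, raw):
        self._raw = raw
    def __fspath__(self):
        return self._raw

once = fsdecode_rich(BytesEntry(b"a/b"))
twice = fsdecode_rich(once)   # applying it again changes nothing
```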

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-13 Thread Victor Stinner
Oops sorry, I forgot to add that I have no strong opinion on the type (I
only have a minor preference for str only).

Victor


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-13 Thread Victor Stinner
Le mercredi 13 avril 2016, Brett Cannon  a écrit :
>
> All of this is demonstrated in
> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 by
> the various possibilities. In the end it's not a corner case because the
> definition of __fspath__ will be such that there's no ambiguity in what
> os.fspath() will accept and what __fspath__ can return and the code will be
> written to conform to what the PEP dictates (IOW I'm aware that this needs
> to be considered in the implementation :) .
>

I'm not a big fan of a flag parameter to change the return type of a
function. Usually, two functions are preferred. In the os module we have
getcwd/getcwdb, for example. I don't know if it's a good example.

Do you know other examples of Python functions taking a (flag) parameter to
change the result type?

Victor


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-13 Thread Brett Cannon
On Wed, 13 Apr 2016 at 09:52 Random832  wrote:

> On Wed, Apr 13, 2016, at 11:28, Ethan Furman wrote:
> > On 04/13/2016 08:17 AM, Random832 wrote:
> > > On Wed, Apr 13, 2016, at 10:21, Nick Coghlan wrote:
> >
> > >> I'd expect the main consumers to be os and os.path, and would honestly
> > >> be surprised if we needed many explicit invocations above that layer,
> > >> other than in pathlib itself.
> > >
> > > I made a toy implementation to try this out, and making os.open support
> > > it does not get you builtin open "for free" as I had suspected; builtin
> > > open has its own type checks in _iomodule.c.
> >
> > Yup, it will take some effort to make this work.
>
> A corner case just occurred to me...
>
> For functions that will continue to accept str/bytes (and functions that
> accept some other type such as Number or file-like objects), what should
> be done with an object that is one of these, *and* has an __fspath__
> method, *and* this method returns a value other than the object's own
> value? Basically, should the protocol check be done unconditionally
> (before attempting to use the argument as a string) or only if the
> argument is not a string (there's an efficiency argument for this). Or
> should it be left "unspecified", with the understanding that such
> objects are badly behaved and may not be handled consistently across
> different functions / python implementations / cpython versions?
>
> Also, should the os.fspath (or whatever we call it) function itself
> accept str/bytes, even if these are not going to implement the protocol?
>

All of this is demonstrated in
https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 by the
various possibilities. In the end it's not a corner case because the
definition of __fspath__ will be such that there's no ambiguity in what
os.fspath() will accept and what __fspath__ can return and the code will be
written to conform to what the PEP dictates (IOW I'm aware that this needs
to be considered in the implementation :) .


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-13 Thread Ethan Furman
On 04/13/2016 09:58 AM, Brett Cannon wrote:
> On Wed, 13 Apr 2016 at 09:19 Fred Drake wrote:


>> I do the same, but... this is one of those cases where a caller will
>> usually be passing a constant directly. If passed as a positional
>> argument, it'll just be confusing ("what's True?" is my usual
>> reaction to a Boolean positional argument).
>
> It would be keyword-only so this isn't even a possibility.
>
>> If passed as a keyword argument
>> with a descriptive name, it'll be longer than I'd like to see:
>>
>>  path_str = os.fspath(path, allow_bytes=True)
>
> I think the expectation that the number of people actually directly
> calling this function with that argument specified is going to be
> rather small, so the common-case will simply be:
>
>  path_str = os.fspath(path)

That is certainly my expectation.  :)

>> Names like os.fspath() and os.fssyspath() seem good to me.

A single function is definitely my preference, but if that's not 
possible then I'm fine with that pair of names.


--
~Ethan~


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-13 Thread Nikolaus Rath
On Apr 13 2016, Ethan Furman  wrote:
> (I'm not very good at keeping similar sounding functions separate --
> what's the difference between shutil.copy and shutil.copy2?  I have to
> look it up every time).

Well, "2" is more than "" (or 1), so copy2() copies *more* than copy() -
it includes the metadata. That always helps me.
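
For anyone who wants to see the difference rather than memorise it, a
short check along these lines should do (file names and the timestamp are
arbitrary; the comparison uses the modification time as the example piece
of metadata):

```python
import os
import shutil
import tempfile

OLD = 1000000000  # an arbitrary timestamp from 2001

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src.txt")
    with open(src, "w") as f:
        f.write("data")
    os.utime(src, (OLD, OLD))  # backdate the source file

    plain = shutil.copy(src, os.path.join(tmp, "copy.txt"))
    full = shutil.copy2(src, os.path.join(tmp, "copy2.txt"))

    plain_mtime = os.stat(plain).st_mtime  # time of the copy itself
    full_mtime = os.stat(full).st_mtime    # carried over from the source
```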


Best,
-Nikolaus
-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

 »Time flies like an arrow, fruit flies like a Banana.«


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-13 Thread Brett Cannon
On Wed, 13 Apr 2016 at 09:19 Fred Drake  wrote:

> On Wed, Apr 13, 2016 at 11:09 AM, Ethan Furman  wrote:
> > - a single os.fspath() with an allow_bytes parameter
> >   (mostly True in os and os.path, mostly False everywhere
> >   else)
>
> -0
>
> > - a str-only os.fspathname() and a str/bytes os.fspath()
>
> +1 on using separate functions.
>
> > I'm partial to the first choice as it is simplicity itself to know when
> > looking at it if bytes might be coming back by the presence or absence of a
> > second argument to the call; otherwise one has to keep straight in one's
> > head which is str-only and which might allow bytes (I'm not very good at
> > keeping similar sounding functions separate -- what's the difference between
> > shutil.copy and shutil.copy2?  I have to look it up every time).
>
> I do the same, but... this is one of those cases where a caller will
> usually be passing a constant directly. If passed as a positional
> argument, it'll just be confusing ("what's True?" is my usual reaction
> to a Boolean positional argument).


It would be keyword-only so this isn't even a possibility.


> If passed as a keyword argument
> with a descriptive name, it'll be longer than I'd like to see:
>
> path_str = os.fspath(path, allow_bytes=True)
>

I expect that very few people will actually call this function directly
with that argument specified, so the common case will simply be:

path_str = os.fspath(path)
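
To make the keyword-only point concrete, here is a rough sketch of the
flag-based option. It is a stand-alone illustration, not the os module's
code; _resolve and the exact error messages are assumptions of the sketch:

```python
def _resolve(path):
    """Helper for this sketch: unwrap __fspath__ without any coercion."""
    if isinstance(path, (str, bytes)):
        return path
    method = getattr(type(path), "__fspath__", None)
    if method is None:
        raise TypeError("not a path: " + type(path).__name__)
    return method(path)

def fspath(path, *, allow_bytes=False):
    """Single function; the bare `*` makes allow_bytes keyword-only."""
    result = _resolve(path)
    if isinstance(result, bytes) and not allow_bytes:
        raise TypeError("bytes path rejected; pass allow_bytes=True")
    return result

path_str = fspath("spam.txt")                        # the common case
path_any = fspath(b"spam.txt", allow_bytes=True)     # explicit opt-in
```

Because of the bare `*`, a call such as fspath(path, True) fails with
TypeError instead of silently passing an unexplained positional True.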


>
> Names like os.fspath() and os.fssyspath() seem good to me.
>

-Brett


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-13 Thread Random832
On Wed, Apr 13, 2016, at 11:28, Ethan Furman wrote:
> On 04/13/2016 08:17 AM, Random832 wrote:
> > On Wed, Apr 13, 2016, at 10:21, Nick Coghlan wrote:
> 
> >> I'd expect the main consumers to be os and os.path, and would honestly
> >> be surprised if we needed many explicit invocations above that layer,
> >> other than in pathlib itself.
> >
> > I made a toy implementation to try this out, and making os.open support
> > it does not get you builtin open "for free" as I had suspected; builtin
> > open has its own type checks in _iomodule.c.
> 
> Yup, it will take some effort to make this work.

A corner case just occurred to me...

For functions that will continue to accept str/bytes (and functions that
accept some other type such as Number or file-like objects), what should
be done with an object that is one of these, *and* has an __fspath__
method, *and* this method returns a value other than the object's own
value? Basically, should the protocol check be done unconditionally
(before attempting to use the argument as a string) or only if the
argument is not a string (there's an efficiency argument for this). Or
should it be left "unspecified", with the understanding that such
objects are badly behaved and may not be handled consistently across
different functions / python implementations / cpython versions?

Also, should the os.fspath (or whatever we call it) function itself
accept str/bytes, even if these are not going to implement the protocol?
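
To illustrate the corner case, a sketch of the two possible orderings.
Everything here is hypothetical (OddStr, fspath_protocol_first and
fspath_type_first exist only for this example):

```python
class OddStr(str):
    """Badly behaved type: a str whose __fspath__ disagrees with its value."""
    def __fspath__(self):
        return "/somewhere/else"

def fspath_protocol_first(path):
    """Ordering 1: always consult __fspath__ before looking at the type."""
    method = getattr(type(path), "__fspath__", None)
    if method is not None:
        return method(path)
    if isinstance(path, (str, bytes)):
        return path
    raise TypeError("not a path: " + type(path).__name__)

def fspath_type_first(path):
    """Ordering 2: str/bytes take a fast path; the protocol is a fallback."""
    if isinstance(path, (str, bytes)):
        return path
    method = getattr(type(path), "__fspath__", None)
    if method is not None:
        return method(path)
    raise TypeError("not a path: " + type(path).__name__)

odd = OddStr("/here")
via_protocol = fspath_protocol_first(odd)  # "/somewhere/else"
via_type = fspath_type_first(odd)          # "/here"
```

The same object yields two different paths depending on which check runs
first, which is why the ordering needs to be pinned down or explicitly
left unspecified.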
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-13 Thread Paul Moore
On 13 April 2016 at 17:31, Ethan Furman  wrote:
> On 04/13/2016 09:27 AM, Paul Moore wrote:
>>
>> On 13 April 2016 at 17:18, Fred Drake wrote:
>
>
>>> Names like os.fspath() and os.fssyspath() seem good to me.
>>
>>
>> -1 on fssyspath - the "system" representation is bytes on POSIX, but
>> not on Windows. Let's be explicit and go with fsbytespath().
>
>
> It will be confusing that fsbytespath() can return a string.

Oh, wait, yes fssyspath is for allow_bytes=True which *may* be bytes,
but could still be a string. My mistake. On that basis, I could go
with fssyspath (thinking "sys" = "low level").

Paul


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-13 Thread Ethan Furman

On 04/13/2016 09:27 AM, Paul Moore wrote:
> On 13 April 2016 at 17:18, Fred Drake wrote:
>> Names like os.fspath() and os.fssyspath() seem good to me.
>
> -1 on fssyspath - the "system" representation is bytes on POSIX, but
> not on Windows. Let's be explicit and go with fsbytespath().

It will be confusing that fsbytespath() can return a string.

--
~Ethan~



Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-13 Thread Ethan Furman

On 04/13/2016 09:18 AM, Fred Drake wrote:
> On Wed, Apr 13, 2016 at 11:09 AM, Ethan Furman wrote:
>> - a single os.fspath() with an allow_bytes parameter
>>   (mostly True in os and os.path, mostly False everywhere
>>   else)
>
> -0
>
>> - a str-only os.fspathname() and a str/bytes os.fspath()
>
> +1 on using separate functions.
>
> Names like os.fspath() and os.fssyspath() seem good to me.

Ooh, I like that!  I could probably keep those names separate in my
head.  :)


--
~Ethan~



Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-13 Thread Fred Drake
On Wed, Apr 13, 2016 at 12:27 PM, Paul Moore  wrote:
> -1 on fssyspath - the "system" representation is bytes on POSIX, but
> not on Windows. Let's be explicit and go with fsbytespath().

Depends on the semantics; if we're expecting it to return
str-or-bytes, os.fssyspath() seems fine.  If only returning bytes (not
sure that ever makes sense on Windows, since I don't use Windows),
then I'd be happy with os.fsbytespath().


  -Fred

-- 
Fred L. Drake, Jr.
"A storm broke loose in my mind."  --Albert Einstein
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-13 Thread Paul Moore
On 13 April 2016 at 17:18, Fred Drake  wrote:
> Names like os.fspath() and os.fssyspath() seem good to me.

-1 on fssyspath - the "system" representation is bytes on POSIX, but
not on Windows. Let's be explicit and go with fsbytespath().

But agreed that always-constant boolean parameters are a bad idea. The
hard bit is good naming of the separate functions (100% agree that
shutil is a good example of how not to do it :-))

Paul


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-13 Thread Fred Drake
On Wed, Apr 13, 2016 at 11:09 AM, Ethan Furman  wrote:
> - a single os.fspath() with an allow_bytes parameter
>   (mostly True in os and os.path, mostly False everywhere
>   else)

-0

> - a str-only os.fspathname() and a str/bytes os.fspath()

+1 on using separate functions.

> I'm partial to the first choice as it is simplicity itself to know when
> looking at it if bytes might be coming back by the presence or absence of a
> second argument to the call; otherwise one has to keep straight in one's
> head which is str-only and which might allow bytes (I'm not very good at
> keeping similar sounding functions separate -- what's the difference between
> shutil.copy and shutil.copy2?  I have to look it up every time).

I do the same, but... this is one of those cases where a caller will
usually be passing a constant directly. If passed as a positional
argument, it'll just be confusing ("what's True?" is my usual reaction
to a Boolean positional argument). If passed as a keyword argument
with a descriptive name, it'll be longer than I'd like to see:

path_str = os.fspath(path, allow_bytes=True)

Names like os.fspath() and os.fssyspath() seem good to me.


  -Fred

-- 
Fred L. Drake, Jr.
"A storm broke loose in my mind."  --Albert Einstein


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-13 Thread Ethan Furman

On 04/13/2016 08:17 AM, Random832 wrote:
> On Wed, Apr 13, 2016, at 10:21, Nick Coghlan wrote:
>> I'd expect the main consumers to be os and os.path, and would honestly
>> be surprised if we needed many explicit invocations above that layer,
>> other than in pathlib itself.
>
> I made a toy implementation to try this out, and making os.open support
> it does not get you builtin open "for free" as I had suspected; builtin
> open has its own type checks in _iomodule.c.

Yup, it will take some effort to make this work.

> Probably anything not implemented in pure python that deals with
> filenames is going to have to have its type checking revised.

Agreed.

You can see why there was no point in pursuing the conversation unless
someone was willing to do the work.


--
~Ethan~



Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-13 Thread Random832
On Wed, Apr 13, 2016, at 10:21, Nick Coghlan wrote:
> I'd expect the main consumers to be os and os.path, and would honestly
> be surprised if we needed many explicit invocations above that layer,
> other than in pathlib itself.

I made a toy implementation to try this out, and making os.open support
it does not get you builtin open "for free" as I had suspected; builtin
open has its own type checks in _iomodule.c.

Probably anything not implemented in pure python that deals with
filenames is going to have to have its type checking revised.


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-13 Thread Ethan Furman

On 04/13/2016 07:21 AM, Nick Coghlan wrote:
> On 14 April 2016 at 00:11, Paul Moore wrote:
>> On 13 April 2016 at 14:51, Nick Coghlan wrote:
>>> The potential SE-strings only come back when you pass str, and the
>>> operating system data isn't properly encoded according to the nominal
>>> filesystem encoding. They round trip nicely to other operating system
>>> APIs, but can indeed be a problem if they escape to other parts of
>>> your program
>>
>> If the operating system APIs handle SE-strings correctly, is it not
>> acceptable to require the fspath protocol to return strings, and then
>> places like DirEntry or Ethan's module, when they want to return
>> bytes, can just SE-encode the bytes and return those?
>>
>> Or will the fspath protocol be used at a low enough level that it's
>> *below* the point where SE-encoded strings are handled properly?
>
> I'd expect the main consumers to be os and os.path, and would honestly
> be surprised if we needed many explicit invocations above that layer,
> other than in pathlib itself.
>
> That's actually the main factor in my suggesting the two level API
> design - from a protocol consumer perspective, bytes-or-str is a
> natural fit for os and os.path, while str-only is a natural fit for
> pathlib.
>
> I also now believe it makes sense to postpone a final decision on this
> aspect of the design until after a draft implementation has been put
> together, as my and Ethan's assumption that os and os.path will be the
> main consumers is exactly that: an assumption. Putting the draft
> implementation together will let us know whether or not it's an
> accurate one.

Sounds reasonable.

However, there is still one choice that needs to be made:

- a single os.fspath() with an allow_bytes parameter
  (mostly True in os and os.path, mostly False everywhere
  else)

- a str-only os.fspathname() and a str/bytes os.fspath()

I'm partial to the first choice as it is simplicity itself to know when 
looking at it if bytes might be coming back by the presence or absence 
of a second argument to the call; otherwise one has to keep straight in 
one's head which is str-only and which might allow bytes (I'm not very 
good at keeping similar sounding functions separate -- what's the 
difference between shutil.copy and shutil.copy2?  I have to look it up 
every time).


--
~Ethan~


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-13 Thread Nick Coghlan
On 14 April 2016 at 00:11, Paul Moore  wrote:
> On 13 April 2016 at 14:51, Nick Coghlan  wrote:
>> The potentially SE-strings only come back when you pass str, and the
>> operating system data isn't properly encoded according to the nominal
>> filesystem encoding. They round trip nicely to other operating system
>> APIs, but can indeed be a problem if they escape to other parts of
>> your program
>
> If the operating system APIs handle SE-strings correctly, is it not
> acceptable to require the fspath protocol to return strings, and then
> places like DirEntry or Ethan's module, when they want to return
> bytes, can just SE-encode the bytes and return those?
>
> Or will the fspath protocol be used at a low enough level that it's
> *below* the point where SE-encoded strings are handled properly?

I'd expect the main consumers to be os and os.path, and would honestly
be surprised if we needed many explicit invocations above that layer,
other than in pathlib itself.

That's actually the main factor in my suggesting the two level API
design - from a protocol consumer perspective, bytes-or-str is a
natural fit for os and os.path, while str-only is a natural fit for
pathlib.

I also now believe it makes sense to postpone a final decision on this
aspect of the design until after a draft implementation has been put
together, as my and Ethan's assumption that os and os.path will be the
main consumers is exactly that: an assumption. Putting the draft
implementation together will let us know whether or not it's an
accurate one.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-13 Thread Paul Moore
On 13 April 2016 at 14:51, Nick Coghlan  wrote:
> The potentially SE-strings only come back when you pass str, and the
> operating system data isn't properly encoded according to the nominal
> filesystem encoding. They round trip nicely to other operating system
> APIs, but can indeed be a problem if they escape to other parts of
> your program

If the operating system APIs handle SE-strings correctly, is it not
acceptable to require the fspath protocol to return strings, and then
places like DirEntry or Ethan's module, when they want to return
bytes, can just SE-encode the bytes and return those?

Or will the fspath protocol be used at a low enough level that it's
*below* the point where SE-encoded strings are handled properly?

Paul


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-13 Thread Nick Coghlan
On 13 April 2016 at 02:15, Ethan Furman  wrote:
> On 04/11/2016 04:43 PM, Victor Stinner wrote:
>>
>> Le 11 avr. 2016 11:11 PM, "Ethan Furman" a écrit :
>
>
>>> So my concern in such a case is what happens if we pass this SE
>>> string somewhere else: a UTF-8 file, or over a socket, or into a
>>> database? Does this have issues that we wouldn't face if we just used
>>> bytes?
>>
>>
>> "SE string" are returned by os.listdir(str), os.walk(str),
>> os.getenv(str), sys.argv[int], ... since Python 3.3. Nothing new under
>> the sun.
>
>
> So when we pass a bytes object in, Python (on posix) converts that to a
> string using surrogateescape, gets back strings from the os, and encodes
> them back to bytes, again using surrogateescape?

On POSIX, if you pass bytes to the os module, it will pass bytes to
the underlying system API, and then pass bytes back to your
application.

The strings that may contain surrogate escapes (SE-strings) only come
back when you pass str and the operating system data isn't properly
encoded according to the nominal
filesystem encoding. They round trip nicely to other operating system
APIs, but can indeed be a problem if they escape to other parts of
your program (hence ideas like
http://bugs.python.org/issue18814#msg251694 and the preceding
discussion in that issue)
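
A minimal illustration of both halves of that statement. It uses explicit
str/bytes codec calls with a fixed UTF-8 encoding rather than the os
module, so the result does not depend on the platform; the sample bytes
are arbitrary:

```python
raw = b"caf\xe9.txt"  # Latin-1 encoded name, not valid UTF-8

# Decoding with surrogateescape smuggles the undecodable byte through
# as a lone surrogate code point.
name = raw.decode("utf-8", "surrogateescape")   # 'caf\udce9.txt'

# Round trip back towards the operating system: the original bytes return.
round_trip = name.encode("utf-8", "surrogateescape")

# If the string escapes to code that encodes strictly (a UTF-8 file, a
# socket, a database driver), the lone surrogate is rejected.
try:
    name.encode("utf-8")
except UnicodeEncodeError:
    strict_encode_failed = True
else:
    strict_encode_failed = False
```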

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-12 Thread Ethan Furman

On 04/12/2016 09:20 AM, Chris Angelico wrote:
> On Wed, Apr 13, 2016 at 2:15 AM, Ethan Furman wrote:
>> latin1?  I thought latin1 had a code point for 0-255, so how could using it
>> raise an encoding error?
>
> Latin-1 / ISO-8859-1 defines a character for every byte, so any byte
> string will *decode*. It only defines 256 characters as having
> equivalent bytes, though, so *encoding* can fail.

Ah, right -- so if you start with bytes it cannot fail, if you start
with a string it can.


--
~Ethan~



Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-12 Thread Chris Barker
On Tue, Apr 12, 2016 at 9:20 AM, Chris Angelico  wrote:

> > latin1?  I thought latin1 had a code point for 0-255, so how could using it
> > raise an encoding error?
>
> Latin-1 / ISO-8859-1 defines a character for every byte, so any byte
> string will *decode*. It only defines 256 characters as having
> equivalent bytes, though, so *encoding* can fail.
>

unless it was decoded as latin-1 in the first place. doesn't the surrogate
escape thing only work properly if you decode/encode with the same encoding?
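
A small sketch of that question, using a made-up byte string: decoding and encoding with the same codec restores the original bytes, while encoding with a different codec keeps the escaped byte but changes everything else.

```python
raw = b"\xc3\xa9\xff"  # UTF-8 for "é" followed by a stray 0xff byte

text = raw.decode("utf-8", "surrogateescape")

# Same codec both ways: the original bytes come back exactly.
same_codec = text.encode("utf-8", "surrogateescape")

# Different codec on the way out: the escaped byte survives, but "é" is
# re-encoded as a single Latin-1 byte, so the result no longer matches.
other_codec = text.encode("latin-1", "surrogateescape")

print(same_codec == raw, other_codec == raw)
```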

-CHB




Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-12 Thread Chris Angelico
On Wed, Apr 13, 2016 at 2:15 AM, Ethan Furman  wrote:
> On 04/11/2016 04:43 PM, Victor Stinner wrote:
>>
>> On 11 Apr 2016 11:11 PM, "Ethan Furman" wrote:
>
>
>>> So my concern in such a case is what happens if we pass this SE
>>> string somewhere else: a UTF-8 file, or over a socket, or into a
>>> database? Does this have issues that we wouldn't face if we just used
>>> bytes?
>>
>>
>> "SE string" are returned by os.listdir(str), os.walk(str),
>> os.getenv(str), sys.argv[int], ... since Python 3.3. Nothing new under
>> the sun.
>
>
> So when we pass a bytes object in, Python (on posix) converts that to a
> string using surrogateescape, gets back strings from the os, and encodes
> them back to bytes, again using surrogateescape?
>
>
>> Trying to encode a surrogate to ascii, latin1 or utf8 raise an encoding
>> error.
>
>
> latin1?  I thought latin1 had a code point for 0-255, so how could using it
> raise an encoding error?

Latin-1 / ISO-8859-1 defines a character for every byte, so any byte
string will *decode*. It only defines 256 characters as having
equivalent bytes, though, so *encoding* can fail.
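
The asymmetry can be checked directly. This is only an illustration; the helper name `can_encode_latin1` is made up for the example.

```python
every_byte = bytes(range(256))

# Decoding cannot fail: each of the 256 byte values maps to one character.
decoded = every_byte.decode("latin-1")
round_trips = decoded.encode("latin-1") == every_byte


def can_encode_latin1(text):
    """Return True if text can be encoded to Latin-1 without error."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


euro_ok = can_encode_latin1("\u20ac")       # EURO SIGN is outside Latin-1
surrogate_ok = can_encode_latin1("\udcff")  # lone surrogate, as made by surrogateescape

print(round_trips, euro_ok, surrogate_ok)
```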

ChrisA


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-12 Thread Ethan Furman

On 04/11/2016 04:43 PM, Victor Stinner wrote:
> On 11 Apr 2016 11:11 PM, "Ethan Furman" wrote:
>> So my concern in such a case is what happens if we pass this SE
>> string somewhere else: a UTF-8 file, or over a socket, or into a
>> database? Does this have issues that we wouldn't face if we just used bytes?
>
> "SE strings" are returned by os.listdir(str), os.walk(str),
> os.getenv(str), sys.argv[int], ... since Python 3.3. Nothing new under
> the sun.

So when we pass a bytes object in, Python (on posix) converts that to a
string using surrogateescape, gets back strings from the os, and encodes
them back to bytes, again using surrogateescape?

> Trying to encode a surrogate to ascii, latin1 or utf8 raises an encoding
> error.

latin1?  I thought latin1 had a code point for 0-255, so how could using
it raise an encoding error?


--
~Ethan~


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-12 Thread Stephen J. Turnbull
INADA Naoki writes:

 > > Why not print(obj)?

print(obj) will give mojibake by default if
sys.getfilesystemencoding() != sys.getdefaultencoding().

 > > str() is normal high-level API, and __fspath__ and os.fspath() should be
 > > low level API.
 > > Normal users shouldn't use __fspath__ and os.fspath().  Only library
 > > developers should use it.

This is the price we pay for the stubbornness of the
bytes-are-text-too meme.





Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-11 Thread Greg Ewing

Ethan Furman wrote:

  # after new protocol with bytes/str support
  def zingar(a_path):
      a_path = fspath(a_path)
      if not isinstance(a_path, (bytes,str)):
          raise TypeError('bytes or str required')
      ...


I think that one would be just

   def zingar(a_path):
       a_path = fspath(a_path)

because fspath() would presumably check the result for
str/bytesness itself. At least I can't think of a reason
for it not to, since returning either str or bytes is
part of its contract.
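
For reference, the os.fspath() that later shipped in Python 3.6 does perform this check itself. A short sketch, with two made-up classes standing in for path-like objects:

```python
import os


class GoodPath:
    """Hypothetical path-like object that returns str as expected."""

    def __fspath__(self):
        return "/tmp/example"


class BadPath:
    """Hypothetical path-like object whose __fspath__ breaks the contract."""

    def __fspath__(self):
        return 42


good = os.fspath(GoodPath())

try:
    os.fspath(BadPath())
except TypeError as exc:
    error = exc  # os.fspath() rejects the non-str, non-bytes result
else:
    error = None

print(good, type(error).__name__)
```

A caller such as zingar() therefore needs no isinstance() check of its own unless it wants to narrow the result further, for example to str only.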

--
Greg


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-11 Thread INADA Naoki
Sorry, I forgot to use "Reply All".

On Tue, Apr 12, 2016 at 9:49 AM, INADA Naoki  wrote:

> IMHO it's safer to get an encoding error rather than no error when you
>> concatenate two byte strings encoded to two different encodings (mojibake).
>>
>> print(os.fspath(obj)) will more likely do what you expect if os.fspath()
>> always return str. I mean that it will encode your filename to the encoding
>> of the terminal which can be different than the filesystem encoding.
>>
>> If fspath() can return bytes, you should write
>> print(os.fsdecode(os.fspath(obj))).
>>
>>
> Why not print(obj)?
> str() is normal high-level API, and __fspath__ and os.fspath() should be
> low level API.
> Normal users shouldn't use __fspath__ and os.fspath().  Only library
> developers should use it.
>
> --
> INADA Naoki  
>

-- 
INADA Naoki  


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-11 Thread Ethan Furman

On 04/11/2016 01:42 PM, Victor Stinner wrote:
> With the PEP 383, a bytes filename can be stored as str using the
> surrogateescape error handler. So DirEntry can convert a bytes path to
> str using os.fsdecode().

Does this mean that os.fsdecode() is simply a wrapper that sets the 
errors to the surrogateescape handler?
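
Roughly, yes. A minimal sketch of the equivalence, assuming Python 3.6 or later for sys.getfilesystemencodeerrors() (on POSIX that function normally reports "surrogateescape"); the file name is made up:

```python
import os
import sys

encoding = sys.getfilesystemencoding()
# Python 3.6+; the handler differs between POSIX and Windows.
errors = sys.getfilesystemencodeerrors()

raw = b"example.txt"  # made-up file name

manual = raw.decode(encoding, errors)  # spelled out by hand
via_os = os.fsdecode(raw)              # the convenience wrapper

print(manual == via_os)
```

os.fsdecode() additionally passes str arguments through unchanged, so it can be applied to a path of either type.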


--

~Ethan~


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-11 Thread Victor Stinner
On 11 Apr 2016 11:11 PM, "Ethan Furman" wrote:
> So my concern in such a case is what happens if we pass this SE string
somewhere else: a UTF-8 file, or over a socket, or into a database? Does
this have issues that we wouldn't face if we just used bytes?

"SE strings" are returned by os.listdir(str), os.walk(str), os.getenv(str),
sys.argv[int], ... since Python 3.3. Nothing new under the sun.

Trying to encode a surrogate to ascii, latin1 or utf8 raises an encoding
error. A surrogate is created to store an undecodable byte in a filename.
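
A short sketch of what that looks like in practice, using a made-up file name. A strict encode fails loudly, and the two explicit alternatives shown are only illustrations of how a caller might proceed on purpose:

```python
# An undecodable byte (0xff) in a made-up file name becomes the lone
# surrogate U+DCFF when decoded with surrogateescape.
name = b"caf\xff.txt".decode("utf-8", "surrogateescape")

try:
    name.encode("utf-8")  # what writing to a UTF-8 file or socket needs
except UnicodeEncodeError as exc:
    strict_error = exc
else:
    strict_error = None

# Two ways to proceed deliberately rather than by accident:
original_bytes = name.encode("utf-8", "surrogateescape")  # recover the bytes
printable = name.encode("utf-8", "backslashreplace")      # lossy, for display

print(type(strict_error).__name__, original_bytes, printable)
```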

IMHO it's safer to get an encoding error rather than no error when you
concatenate two byte strings encoded to two different encodings (mojibake).

print(os.fspath(obj)) will more likely do what you expect if os.fspath()
always returns str. I mean that it will encode your filename to the encoding
of the terminal which can be different than the filesystem encoding.

If fspath() can return bytes, you should write
print(os.fsdecode(os.fspath(obj))).

--

On Linux, open(DirEntry) for a bytes entry (os.scandir(bytes)) would have
to first decode a bytes filename with os.fsdecode() to then encode it back
with os.fsencode().
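
A minimal sketch of that round trip, assuming a POSIX system where the filesystem error handler is surrogateescape (the file name is made up and no file is actually opened):

```python
import os

raw = b"data-\xff.bin"  # made-up name with a byte that is not valid UTF-8

as_text = os.fsdecode(raw)          # bytes -> str, as a str-only protocol would return
back_to_raw = os.fsencode(as_text)  # str -> bytes, as the syscall would receive

print(back_to_raw == raw)
```

The conversion costs two codec passes, but no information is lost along the way.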

Yeah, that's inefficient. But we now have super fast codecs (e.g. encoding
and decoding are almost a memcpy for pure ASCII). And filenames are usually
very short (less than 300 bytes). IMHO the interface matters more than
performance.

As I showed with my print example, filenames are not only used to access
the filesystem, you also want to display them. Using Unicode avoids bad
surprises (mojibake).

--

Well, the question is more why you want to get bytes in the first place.
Why not use only Unicode?

I understand that some people expect mojibake when using Unicode, whereas
using bytes cannot lead to mojibake. Well, in practice it's simply the
opposite :-)

Maybe devs read that Linux syscalls and C functions take bytes, so using
bytes gives access to any filename, including "invalid filenames". That's
true. But it's also true for Unicode if you use os.fsdecode().

Maybe devs don't understand, don't know and fear Unicode :-)

My goal is more to educate users and help them to avoid mojibake.

Did I mention that you must not use bytes filenames on Windows? So using
Unicode everywhere helps to write really portable code. On Windows, using
Unicode is required to be able to open any file.

Victor


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-11 Thread Brett Cannon
On Mon, 11 Apr 2016 at 14:11 Ethan Furman  wrote:

> On 04/11/2016 01:42 PM, Victor Stinner wrote:
> > 2016-04-11 21:00 GMT+02:00 Brett Cannon:
>
> >> I'm -0 on allowing __fspath__ to return bytes, but we can see what
> others
> >> think.
> >
> > With the PEP 383, a bytes filename can be stored as str using the
> > surrogateescape error handler. So DirEntry can convert a bytes path to
> > str using os.fsdecode().
>
> I am far from a unicode expert, but if I understand this correctly you
> are proposing that DirEntry.__whatever__ can always return a str using
> the surrogateescape (SE) method.
>
> However, before this SE string can be used, it would need to be
> converted back to bytes, and with the same SE method, yes?  And this has
> already been implemented in the stdlib?
>
> So my concern in such a case is what happens if we pass this SE string
> somewhere else: a UTF-8 file, or over a socket, or into a database?
> Does this have issues that we wouldn't face if we just used bytes?
>

This is my worry as well and why I have not proposed this kind of universal
normalizing of bytes paths using os.fsdecode() w/ surrogateescape. Doing
this sort of thing from the system boundary and documenting as such as PEP
383 proposed makes a bit more sense as the expectation is more controlled
and is a clear input boundary.


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-11 Thread Ethan Furman

On 04/11/2016 01:42 PM, Victor Stinner wrote:
> 2016-04-11 21:00 GMT+02:00 Brett Cannon:
>> I'm -0 on allowing __fspath__ to return bytes, but we can see what others
>> think.
>
> With the PEP 383, a bytes filename can be stored as str using the
> surrogateescape error handler. So DirEntry can convert a bytes path to
> str using os.fsdecode().

I am far from a unicode expert, but if I understand this correctly you
are proposing that DirEntry.__whatever__ can always return a str using
the surrogateescape (SE) method.

However, before this SE string can be used, it would need to be
converted back to bytes, and with the same SE method, yes?  And this has
already been implemented in the stdlib?

So my concern in such a case is what happens if we pass this SE string
somewhere else: a UTF-8 file, or over a socket, or into a database?
Does this have issues that we wouldn't face if we just used bytes?


--
~Ethan~


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-11 Thread Victor Stinner
2016-04-11 21:00 GMT+02:00 Brett Cannon :
> I'm -0 on allowing __fspath__ to return bytes, but we can see what others
> think.

With the PEP 383, a bytes filename can be stored as str using the
surrogateescape error handler. So DirEntry can convert a bytes path to
str using os.fsdecode().

A "byte string" is unclear in Python. There is the immutable "bytes"
type. But there is also the mutable "bytearray" type. And the buffer
protocol which can have different shapes.

I like the idea of a simple protocol: only allow a single type, str.

Victor


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-11 Thread Ethan Furman

On 04/11/2016 12:00 PM, Brett Cannon wrote:
> On Mon, 11 Apr 2016 at 11:28 Ethan Furman wrote:
>> I would write the above as:
>>
>> def fspath(path, *, allow_bytes=False):
>
> You get type consistency from os.fspath(), not the protocol, though.

Well, since the protocol is also a function, we could put the
allow_bytes on that as well -- not sure if that is a good idea or not.


--
~Ethan~



Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-11 Thread Brett Cannon
On Mon, 11 Apr 2016 at 11:28 Ethan Furman  wrote:

> On 04/11/2016 10:36 AM, Brett Cannon wrote:
> > On Mon, 11 Apr 2016 at 10:13 Ethan Furman wrote:
>
> >> I'm not saying that bytes paths are common -- and if this was a
> >> brand-new feature I wouldn't be pushing for it so hard;  however, bytes
> >> paths are already supported and it seems to me to be much less of a
> >> headache to continue the support in this new protocol instead of drawing
> >> an artificial line in the sand.
> >
> > Headache for you? The stdlib? Library authors? Users of libraries? There
> > are a lot of users of this who have varying levels of pain for this.
>
> Yes, yes, maybe, maybe.  :)
>
> >> Asked another way, what are we gaining by disallowing bytes in this new
> >> way of getting paths versus the pain caused when bytes are needed and/or
> >> accepted?
> >
> > Type consistency. E.g. if I pass in a DirEntry object into os.fspath()
> > and I don't know what the heck I'm getting back then that can lead to
> > subtle bugs [...]
>
> > How about we take something from the "explicit is better than implicit"
> > playbook and add a keyword argument to os.fspath() to allow bytes to
> > pass through?
> >
> >   def fspath(path, *, allow_bytes=False):
> >       if isinstance(path, str):
> >           return path
> >       # Allow bytearray?
> >       elif allow_bytes and isinstance(path, bytes):
> >           return path
> >       try:
> >           protocol_path = path.__fspath__()
> >       except AttributeError:
> >           pass
> >       else:
> >           # Explicit type check worth it, or better to rely on duck typing?
> >           if isinstance(protocol_path, str):
> >               return protocol_path
> >       raise TypeError("expected a path-like object, str, or bytes (if "
> >                       f"allowed), not {type(path)}")
>
> I think that might work.  We currently have four path related things:
> bytes, str, Path, DirEntry -- two are str-only, one is bytes-only, and
> one can be either.
>
> I would write the above as:
>
>def fspath(path, *, allow_bytes=False):
>   try:
>  path = path.__fspath__()
>   except AttributeError:
>  pass
>   if isinstance(path, str):
>  return path
>   elif allow_bytes and isinstance(path, bytes):
>  return path
>   else:
>  raise SomeError()
>
> > For DirEntry users who use bytes, they will simply have to pass around
> > DirEntry.path which is not as nice as simply passing around DirEntry,
>
> If we go with the above we allow DirEntry.__fspath__ to return bytes and
> still get type-consistency of str unless the user explicitly declares
> they're okay with getting either (and even then the field is narrowed
> from four possible source types (or more as time goes on) to two.
>

You get type consistency from os.fspath(), not the protocol, though.


>
> To recap, this would allow both str & bytes in __fspath__, but the
> fspath() function defaults to only allowing str through.
>
> I can live with that.
>

I'm -0 on allowing __fspath__ to return bytes, but we can see what others
think.


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-11 Thread Ethan Furman

On 04/11/2016 10:36 AM, Brett Cannon wrote:

On Mon, 11 Apr 2016 at 10:13 Ethan Furman wrote:



I'm not saying that bytes paths are common -- and if this was a
brand-new feature I wouldn't be pushing for it so hard;  however, bytes
paths are already supported and it seems to me to be much less of a
headache to continue the support in this new protocol instead of drawing
an artificial line in the sand.


Headache for you? The stdlib? Library authors? Users of libraries? There
are a lot of users of this who have varying levels of pain for this.


Yes, yes, maybe, maybe.  :)


Asked another way, what are we gaining by disallowing bytes in this new
way of getting paths versus the pain caused when bytes are needed and/or
accepted?


Type consistency. E.g. if I pass in a DirEntry object into os.fspath()
and I don't know what the heck I'm getting back then that can lead to
subtle bugs [...]



How about we take something from the "explicit is better than implicit"
playbook and add a keyword argument to os.fspath() to allow bytes to
pass through?

   def fspath(path, *, allow_bytes=False):
       if isinstance(path, str):
           return path
       # Allow bytearray?
       elif allow_bytes and isinstance(path, bytes):
           return path
       try:
           protocol_path = path.__fspath__()
       except AttributeError:
           pass
       else:
           # Explicit type check worth it, or better to rely on duck typing?
           if isinstance(protocol_path, str):
               return protocol_path
       raise TypeError("expected a path-like object, str, or bytes (if "
                       f"allowed), not {type(path)}")


I think that might work.  We currently have four path related things: 
bytes, str, Path, DirEntry -- two are str-only, one is bytes-only, and 
one can be either.


I would write the above as:

  def fspath(path, *, allow_bytes=False):
     try:
        path = path.__fspath__()
     except AttributeError:
        pass
     if isinstance(path, str):
        return path
     elif allow_bytes and isinstance(path, bytes):
        return path
     else:
        raise SomeError()


For DirEntry users who use bytes, they will simply have to pass around
DirEntry.path which is not as nice as simply passing around DirEntry,


If we go with the above we allow DirEntry.__fspath__ to return bytes and 
still get type-consistency of str unless the user explicitly declares 
they're okay with getting either (and even then the field is narrowed 
from four possible source types (or more as time goes on) to two.


To recap, this would allow both str & bytes in __fspath__, but the 
fspath() function defaults to only allowing str through.
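
A runnable sketch of that recap, under stated assumptions: the fspath() below is only the proposal as discussed in this thread (not a final os.fspath() signature), TypeError stands in for the unspecified SomeError, and BytesEntry is a made-up stand-in for a DirEntry produced by os.scandir(bytes).

```python
def fspath(path, *, allow_bytes=False):
    """Sketch of the proposal: str by default, bytes only on request."""
    try:
        path = path.__fspath__()
    except AttributeError:
        pass  # plain str/bytes (or an unsupported object) falls through
    if isinstance(path, str):
        return path
    if allow_bytes and isinstance(path, bytes):
        return path
    raise TypeError("expected str{}, got {}".format(
        " or bytes" if allow_bytes else "", type(path).__name__))


class BytesEntry:
    """Hypothetical stand-in for a bytes-based DirEntry."""

    def __init__(self, path):
        self.path = path

    def __fspath__(self):
        return self.path


entry = BytesEntry(b"/tmp/example")

try:
    fspath(entry)  # default: str only, so the bytes result is rejected
except TypeError:
    rejected_by_default = True
else:
    rejected_by_default = False

opted_in = fspath(entry, allow_bytes=True)
print(rejected_by_default, opted_in)
```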


I can live with that.

--
~Ethan~


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-11 Thread Random832
On Mon, Apr 11, 2016, at 13:36, Brett Cannon wrote:
> How about we take something from the "explicit is better than implicit"
> playbook and add a keyword argument to os.fspath() to allow bytes to pass
> through?

Except, we already know how to convert a bytes-path into a str (and vice
versa) with sys.getfilesystemencoding and surrogateescape. So why not
just have the argument specify what return type is desired?

def fspath(path, *, want_bytes=False):
    if isinstance(path, (bytes, str)):
        ppath = path
    else:
        try:
            ppath = path.__fspath__()
        except AttributeError:
            raise TypeError
    if isinstance(ppath, str):
        return ppath.encode(...) if want_bytes else ppath
    elif isinstance(ppath, bytes):
        return ppath if want_bytes else ppath.decode(...)
    else:
        raise TypeError

This way the posix os module can call the function and have the bytes
value already prepared for it to pass to the real open() syscall.
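
One way this could be made concrete, as a sketch only: it assumes the elided codec arguments are the filesystem encoding with surrogateescape, which is what os.fsencode() and os.fsdecode() apply on POSIX. The class name EntryLike is made up.

```python
import os


def fspath(path, *, want_bytes=False):
    """Sketch: return the path in the requested type, converting if needed."""
    if isinstance(path, (bytes, str)):
        ppath = path
    else:
        try:
            ppath = path.__fspath__()
        except AttributeError:
            raise TypeError("expected str, bytes or a path-like object") from None
    if not isinstance(ppath, (bytes, str)):
        raise TypeError("__fspath__ must return str or bytes")
    # Assumption: filesystem encoding + surrogateescape, via the os helpers.
    return os.fsencode(ppath) if want_bytes else os.fsdecode(ppath)


class EntryLike:
    """Hypothetical path-like object that stores a bytes path."""

    def __fspath__(self):
        return b"/tmp/example"


as_str = fspath(EntryLike())
as_bytes = fspath(EntryLike(), want_bytes=True)
print(as_str, as_bytes)
```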

You could even add the same thing in other places, e.g. os.path.join
(defaulting to whether the first argument is bytes).


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-11 Thread Antoine Pitrou
Ethan Furman writes:
> 
> On 04/11/2016 07:56 AM, Antoine Pitrou wrote:
> 
> >> 2) pathlib.Path accepts bytes --
> >
> > Does it? Or are you proposing such a change?
> 
> It used to (I posted a couple examples from 3.5.0).  I finally rebuilt 
> with the latest and it no longer does.

This is surprising, since in its entire lifetime, pathlib was never
supposed to support bytes inputs. See the argument check in the
initial checkin of pathlib.py:
https://hg.python.org/cpython/rev/43377dcfb801/#l6.571

Perhaps that slipped through at some point (and obviously no test was
there to prevent it :-)).

Regards

Antoine.




Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-11 Thread Brett Cannon
On Mon, 11 Apr 2016 at 10:13 Ethan Furman  wrote:

> On 04/11/2016 09:32 AM, Zachary Ware wrote:
> > On Mon, Apr 11, 2016 at 11:18 AM, Ethan Furman wrote:
>
> >> If those examples are anywhere close to accurate, an fspath protocol
> that
> >> supported both bytes and str seems a lot easier to work with.
> >
> > But why are you working with bytes paths in the first place? Where did
> > you get them from, and why couldn't you decode them at that boundary?
> > In 7ish years of working with Python (almost exclusively Python 3) on
> > Windows and UNIX, I have never used bytes paths on any platform.
>
> I'm not saying that bytes paths are common -- and if this was a
> brand-new feature I wouldn't be pushing for it so hard;  however, bytes
> paths are already supported and it seems to me to be much less of a
> headache to continue the support in this new protocol instead of drawing
> an artificial line in the sand.
>

Headache for you? The stdlib? Library authors? Users of libraries? There
are a lot of users of this who have varying levels of pain for this.


>
> Also, let me be clear that the new protocol will not adversely affect my
> own library as it directly subclasses bytes and strings (bPath and
> uPath), so they will pass through either way (or be appropriately
> rejected if the function only supports str -- are there any?).
>

Well, technically it depends on whether we prefer the protocol or explicit
type checking and how we define the protocol. If we say __ospath__ has to
return str and we check for that first then that would be bad for you. If
we do isinstance() checks before calling the protocol or allow both str and
bytes then we open it up.
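
For reference, the os.fspath() that later shipped in Python 3.6 does the isinstance() check before consulting the protocol, so str and bytes subclasses pass through untouched. A small sketch with a made-up str subclass:

```python
import os


class uPath(str):
    """Hypothetical str subclass that also defines __fspath__."""

    def __fspath__(self):
        return "from-protocol"


p = uPath("from-instance")
result = os.fspath(p)  # the str check wins; __fspath__ is never called

print(result)
```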


>
> This kind of feels like PEP 461 again -- the vast majority of Python
> programmers do not need %-interpolation for bytes, but what a pain in
> the rear for those that did!  (Yes, I was one of those.)  Admittedly,
> the pain from this will not be nearly as severe as that was, but why
> should we have any unnecessary pain at all?
>
> Asked another way, what are we gaining by disallowing bytes in this new
> way of getting paths versus the pain caused when bytes are needed and/or
> accepted?
>

Type consistency. E.g. if I pass in a DirEntry object into os.fspath() and
I don't know what the heck I'm getting back then that can lead to subtle
bugs, especially when you didn't check ahead of time what DirEntry.path
was. To me, that bumps up against "In the face of ambiguity, refuse the
temptation to guess". Having the return type vary even when the input type
doesn't can get messy if you don't expect it to vary (i.e. this isn't getattr()).


>
>  From my point of view the pain of simply implementing this without
> bytes support in the existing os and os.path modules is not worth
> excluding bytes.
>

How about we take something from the "explicit is better than implicit"
playbook and add a keyword argument to os.fspath() to allow bytes to pass
through?

  def fspath(path, *, allow_bytes=False):
      if isinstance(path, str):
          return path
      # Allow bytearray?
      elif allow_bytes and isinstance(path, bytes):
          return path
      try:
          protocol_path = path.__fspath__()
      except AttributeError:
          pass
      else:
          # Explicit type check worth it, or better to rely on duck typing?
          if isinstance(protocol_path, str):
              return protocol_path
      raise TypeError("expected a path-like object, str, or bytes (if "
                      f"allowed), not {type(path)}")

For DirEntry users who use bytes, they will simply have to pass around
DirEntry.path which is not as nice as simply passing around DirEntry, but
it does allow them to continue to operate without having to decode the
bytes if allow_bytes is True. We get type consistency in the protocol as
we can continue to expect people to return strings for __fspath__. And for
those APIs where supporting bytes won't be an issue, they can explicitly
choose to support bytes or not and then not have to juggle support for both
str and bytes if they choose not to. IOW, adults who consent to bytes paths
don't get cut out or forced to jump through a ton of hoops as long as they
opt in, while adults who don't consent to bytes paths have their lives
simplified.


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-11 Thread Donald Stufft

> On Apr 11, 2016, at 1:12 PM, Ethan Furman  wrote:
> 
> Asked another way, what are we gaining by disallowing bytes in this new way 
> of getting paths versus the pain caused when bytes are needed and/or accepted?


It seems fine to me to allow __fspath__ to return bytes as well as str. The 
only argument I can think against it is that something like pathlib.Path() 
would not work with a bytes returning __fspath__, but that’s not any different 
than what happens if you pass a bytes object directly into pathlib.Path as well.
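
A quick check of that comparison, assuming Python 3.6 or later (the 3.5.0 behaviour mentioned elsewhere in the thread differed). BytesReturning is a made-up class used only for the demonstration.

```python
import pathlib


class BytesReturning:
    """Hypothetical path-like object whose __fspath__ returns bytes."""

    def __fspath__(self):
        return b"example.txt"


def rejected(arg):
    """Return True if pathlib refuses to build a path from arg."""
    try:
        pathlib.PurePath(arg)
    except TypeError:
        return True
    return False


bytes_rejected = rejected(b"example.txt")
protocol_bytes_rejected = rejected(BytesReturning())
str_rejected = rejected("example.txt")

print(bytes_rejected, protocol_bytes_rejected, str_rejected)
```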

-
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA





Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-11 Thread Ethan Furman

On 04/11/2016 09:32 AM, Zachary Ware wrote:

On Mon, Apr 11, 2016 at 11:18 AM, Ethan Furman wrote:



If those examples are anywhere close to accurate, an fspath protocol that
supported both bytes and str seems a lot easier to work with.


But why are you working with bytes paths in the first place? Where did
you get them from, and why couldn't you decode them at that boundary?
In 7ish years of working with Python (almost exclusively Python 3) on
Windows and UNIX, I have never used bytes paths on any platform.


I'm not saying that bytes paths are common -- and if this was a 
brand-new feature I wouldn't be pushing for it so hard;  however, bytes 
paths are already supported and it seems to me to be much less of a 
headache to continue the support in this new protocol instead of drawing 
an artificial line in the sand.


Also, let me be clear that the new protocol will not adversely affect my
own library as it directly subclasses bytes and strings (bPath and
uPath), so they will pass through either way (or be appropriately
rejected if the function only supports str -- are there any?).


This kind of feels like PEP 461 again -- the vast majority of Python
programmers do not need %-interpolation for bytes, but what a pain in 
the rear for those that did!  (Yes, I was one of those.)  Admittedly, 
the pain from this will not be nearly as severe as that was, but why 
should we have any unnecessary pain at all?


Asked another way, what are we gaining by disallowing bytes in this new 
way of getting paths versus the pain caused when bytes are needed and/or 
accepted?


From my point of view the pain of simply implementing this without 
bytes support in the existing os and os.path modules is not worth 
excluding bytes.


--
~Ethan~


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-11 Thread Zachary Ware
On Mon, Apr 11, 2016 at 11:18 AM, Ethan Furman  wrote:
> If those examples are anywhere close to accurate, an fspath protocol that
> supported both bytes and str seems a lot easier to work with.

But why are you working with bytes paths in the first place? Where did
you get them from, and why couldn't you decode them at that boundary?
In 7ish years of working with Python (almost exclusively Python 3) on
Windows and UNIX, I have never used bytes paths on any platform.

-- 
Zach


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-11 Thread Ethan Furman

On 04/10/2016 11:27 PM, Nick Coghlan wrote:

On 11 April 2016 at 02:16, Ethan Furman  wrote:



DirEntry can still get the check, it can just throw TypeError when it
represents a binary path (that's one of the advantages of using a
method-based protocol - exceptions on method calls are more acceptable
than exceptions on property access).



I guess I don't see the point of this.  Either DirEntry's [1] only get
partial support (which is only marginally better than the no support pathlib
currently has), or stdlib code will need to catch those errors and then do
an isinstance check to see if it knows what the type is and how to deal with it
[1].


What's wrong with only gaining partial support? Standard library code
that doesn't currently support DirEntry at all will gain the ability
to support str-based DirEntry objects, while bytes-based DirEntry
objects will continue to be a low level object [...]


Let's consider two functions, one that accepts bytes/str for the path,
and one that only accepts str:



  str-only support
  ----------------
  # before new protocol
  def do_fritz(a_path):
      if not isinstance(a_path, str):
          raise TypeError('str required')
      ...

  # after new protocol with str-only support
  def do_fritz(a_path):
      a_path = fspath(a_path)
      ...

  # after new protocol with bytes/str support
  def do_fritz(a_path):
      a_path = fspath(a_path)
      if not isinstance(a_path, str):
          raise TypeError('str required')
      ...


  bytes/str support
  -----------------
  # before new protocol
  def zingar(a_path):
      if not isinstance(a_path, (bytes,str)):
          raise TypeError('bytes or str required')
      ...

  # after new protocol with str-only support
  def zingar(a_path):
      if not isinstance(a_path, bytes):
          try:
              a_path = fspath(a_path)
          except FSPathError:
              raise TypeError('bytes or str required')
      ...

  # after new protocol with bytes/str support
  def zingar(a_path):
      a_path = fspath(a_path)
      if not isinstance(a_path, (bytes,str)):
          raise TypeError('bytes or str required')
      ...


If those examples are anywhere close to accurate, an fspath protocol 
that supported both bytes and str seems a lot easier to work with.


--
~Ethan~

