Re: [Python-Dev] RFC: PEP 509: Add a private version to dict

2016-04-14 Thread Stefan Behnel
Victor Stinner wrote on 15.04.2016 at 00:33:
> 2016-04-15 0:22 GMT+02:00 Brett Cannon:
>> And even if it was GIL-free you do run the risk of two dicts ending up at
>> the same version # by simply mutating the same number of times if the
>> counters were per-dict instead of process-wide.
> 
> For some optimizations, it is not needed to check if the dictionary
> was replaced, or you check it directly. So it doesn't matter to have
> the same version with the same number of operations.
> 
> For the use case of Yury's optimization, having a globally unique
> version tag makes the guard much cheaper, and the guard must check
> that the dictionary was not replaced.

How can that be achieved? If the tag is just a sequentially growing number,
creating two dicts and applying one operation to the first one should give
both the same version tag, right?

Stefan


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Most 3.x buildbots are green again, please don't break them and watch them!

2016-04-14 Thread Senthil Kumaran
On Wed, Apr 13, 2016 at 4:40 AM, Victor Stinner 
wrote:

> In recent months, most 3.x buildbots failed randomly. Some of them were
> always failing. I spent some time fixing almost all Windows and Linux
> buildbots. There were a lot of different issues.
>
> So please try not to break the buildbots again, and remember to watch them
> from time to time:
>

Piling in my thanks again, Victor. This is a great gesture from you to fix
all the build bots.
Keeping them stable is a proper thing to do and should be expected from all
committers.

--
Senthil


[Python-Dev] Should secrets include a fallback for hmac.compare_digest?

2016-04-14 Thread Steven D'Aprano
Now that PEP 506 has been approved, I've checked in the secrets module, 
but an implementation question has come up regarding compare_digest.

Currently, the module tries to import hmac.compare_digest, and if that 
fails, then it falls back to a Python version. But since compare_digest 
has been available since 3.3, I'm now questioning whether the fallback 
is useful at all. Perhaps for alternate Python implementations?

So, two questions:

- should secrets include a fallback?

- if so, what is the preferred way of doing this?

# option 1: fallback if compare_digest is missing

try:
    from hmac import compare_digest
except ImportError:
    def compare_digest(a, b):
        ...


# option 2: "C accelerator idiom"

def compare_digest(a, b):
    ...

try:
    from hmac import compare_digest
except ImportError:
    pass


Option 1 is closer to how I would write hybrid 2/3 code, but option 2 is 
how PEP 399 suggests it should be written.

https://www.python.org/dev/peps/pep-0399/


Currently, hmac imports compare_digest from _operator. There's no Python 
version in operator either. Should there be?



-- 
Steve


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-14 Thread Ethan Furman

On 04/14/2016 06:01 PM, Ethan Furman wrote:
> On 04/14/2016 05:20 PM, Stephen J. Turnbull wrote:
>> you'll have to impose it on me.
>
> Hmm.  Well, the good news is you have convinced me that letting bytes
> through willy-nilly is akin to loosing the hounds of hell on our code.
> The bad news is I was never in that camp.  ;)


Actually, in retrospect, I was in that camp at the beginning.  But 
Brett's code (and your arguments, amongst others) convinced me of that 
 or  would be better/safer.


--
~Ethan~


Re: [Python-Dev] RFC: PEP 509: Add a private version to dict

2016-04-14 Thread Brett Cannon
On Thu, Apr 14, 2016, 17:14 MRAB  wrote:

> On 2016-04-14 21:42, Armin Rigo wrote:
> > Hi Victor,
> >
> > On 14 April 2016 at 17:19, Victor Stinner 
> wrote:
> >> Each time a dictionary is created, the global
> >> version is incremented and the dictionary version is initialized to the
> >> global version.
> >
> > A detail, but why not set the version tag of new empty dictionaries to
> > zero, always?   Same after a clear().  This would satisfy the
> > condition: equality of the version tag is supposed to mean "the
> > dictionary content is precisely the same".
> >
> If you did that, wouldn't it then be possible to replace an empty dict
> with another empty dict with you noticing?


If you meant to say "without" then yes.


Would that matter?
>

Nope, because this is about versioning content, so having identical/empty
content compare equal is fine.

-brett




Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-14 Thread Ethan Furman

On 04/14/2016 05:20 PM, Stephen J. Turnbull wrote:

> However, the proposed polymorphism does create ambiguity and risk for
> my uses.  I rarely have the luxury of *not* ensuring paths are text,
> regardless of the bytes-ness of the underlying application, because I
> can be pretty darn sure that somebody's going to feed me non-
> filesystem encodings, and soon.  Even when I am working with bytes
> representing paths in the filesystem encoding, I need to convert to
> text to read the darn things when debugging!  So I don't consent;
> you'll have to impose it on me.


Hmm.  Well, the good news is you have convinced me that letting bytes 
through willy-nilly is akin to loosing the hounds of hell on our code. 
The bad news is I was never in that camp.  ;)


The camp I'm in is a function* that, by default, will raise if bytes 
enter the picture -- but will allow them through if the user 
specifically says they are okay with getting bytes.


Would that work for you?

--
~Ethan~

*Or pair of functions, one that is str-only, one that allows both -- but 
I'd rather just have one function with a flag.
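
A rough sketch of the single-function-with-a-flag idea follows. The function name `fspath` and the keyword `allow_bytes` are hypothetical names chosen for illustration; nothing in this thread fixes the final spelling or semantics.

```python
def fspath(path, *, allow_bytes=False):
    """Return the str form of a path; bytes only if explicitly allowed.

    Hypothetical sketch: the name and the allow_bytes flag are assumptions.
    """
    if isinstance(path, (str, bytes)):
        result = path
    else:
        try:
            method = type(path).__fspath__
        except AttributeError:
            raise TypeError(
                "expected str, bytes or an object with __fspath__, "
                "not " + type(path).__name__
            ) from None
        result = method(path)

    if isinstance(result, str):
        return result
    if isinstance(result, bytes) and allow_bytes:
        return result
    # bytes without the flag, or any other type, is rejected.
    raise TypeError("got " + type(result).__name__ + ", expected str")
```

Callers that never opt in can rely on getting str back, while bytes-oriented code pays only the cost of passing `allow_bytes=True`.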



Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-14 Thread Stephen J. Turnbull
Ethan Furman writes:

 > Substitute open() with sending those bytes somewhere else:

Eg, pathlib.Path, which will raise?  Surely it should be safe to pass
a DirEntry to a pathlib constructor?  Note that having Path call
fsdecode implicitly is a bad idea, because we don't know the
provenance of generic bytes.  But by design of __fspath__, its value
(if str) is suitable for passing to Path, for further processing.

 > why should I have to reencode this str back to bytes, when bytes
 > are what I asked for in the first place?

Erm, you didn't *ask* for bytes.  You asked for whatever __fspath__ is
going to give you.  And in many cases, like pathlib, it will be str.
I imagine that doesn't bother you; you plan to use antipathy anyway.
But if there's uptake on the protocol, I'll bet that str-only
implementations are the majority.

And your question also cuts the other way.  Why should *I* have to
decode bytes to str, or suffer unexpected TypeErrors, or deal with the
possibility of TypeErrors, just because __fspath__ is polymorphic?

We're here to improve pathlib.  There's been a huge amount of mission
creep, with no use cases to provide intuition.  You pit your abstract
inconvenience against my 20 years of whack-a-mole with UnicodeErrors
and TypeErrors in Mailman.  I *know* that if you let bytes that
represent text loose inside an application, eventually they'll end up
in a str context and "blooey!"

 > How did this application get a bytes path object to begin with?
 > Either it explicitly used bytes when calling scandir and friends
 > (in which case it shouldn't be surprised to be working with bytes);
 > or it got that bytes object from a database, over-the-wire,
 > an-other-language-lib, etc.

No, it got it from an __fspath__-toting object (such as a DirEntry) it
received from some library, which constructed it polymorphically from
bytes it got from some other place -- and so lost the original
encoding.  That's the scenario I think is impossible to rule out, and
reducing that kind of scenario to the bare minimum is why bytes got
demoted from being the default representation of text in Python 3 in
the first place.

 > If I'm working with bytes, why would I want to work with str?

First, are you actually *working* on those bytes, or are you just
passing them to os functions?  If the latter, you shouldn't care.

Second, because paths are conceptually text (you may not agree, but
Nick inter alia has indicated he does).  Working with bytes paths
(except literals) is a good way to get in trouble, because there are
all kinds of ways they can end up inappropriately encoded.  For
example, the odds are very high that a bytes path read from a file
(including from a zipfile directory) in Japan will be encoded in Shift
JIS.  On Mac OS X, that will either produce mojibake in the directory
(if the access creates the file) or fail to access the intended file,
because the filesystem encoding is UTF-8.
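
The Shift JIS case can be reproduced without touching a filesystem. The sketch below assumes a made-up file name and uses explicit codecs rather than the platform's filesystem encoding, so it only illustrates the mismatch, not any particular OS behaviour.

```python
# Hypothetical file name, encoded the way a Shift JIS source would store it.
name = "日本語.txt"
sjis_bytes = name.encode("shift_jis")

# A UTF-8 consumer cannot decode these bytes strictly.
try:
    sjis_bytes.decode("utf-8")
except UnicodeDecodeError:
    strict_decode_failed = True
else:
    strict_decode_failed = False

# With surrogateescape the bytes survive a round trip, but the text is
# not the name a user would recognise.
garbled = sjis_bytes.decode("utf-8", "surrogateescape")
round_trip = garbled.encode("utf-8", "surrogateescape")
```

The round trip preserves the bytes, which is why passing them straight to os functions can appear to work, while anything that treats the value as text sees garbage.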

Third, because you want to be portable to Windows, where you have no
choice about whether paths are str or bytes.

These reasons probably don't apply to you with much strength, but the
question is how typical you are, vs. the nearly universal experience
of mojibake and the dominant market share of Windows.

 > Python is a glue language, and Python practitioners don't always
 > have the luxury of working only with text.

For paths?  Of course you can work with them as text.  ISTM what you
really want is the luxury of working only with bytes, because you're
in the habit of pretending they are text.  I don't object to you
having your luxury as long as it doesn't increase risk for my use
cases.  I think you're asking for trouble, and the practice is
definitely nonportable, but consenting adults applies.

However, the proposed polymorphism does create ambiguity and risk for
my uses.  I rarely have the luxury of *not* ensuring paths are text,
regardless of the bytes-ness of the underlying application, because I
can be pretty darn sure that somebody's going to feed me non-
filesystem encodings, and soon.  Even when I am working with bytes
representing paths in the filesystem encoding, I need to convert to
text to read the darn things when debugging!  So I don't consent;
you'll have to impose it on me.



Re: [Python-Dev] RFC: PEP 509: Add a private version to dict

2016-04-14 Thread MRAB

On 2016-04-14 21:42, Armin Rigo wrote:
> Hi Victor,
>
> On 14 April 2016 at 17:19, Victor Stinner wrote:
>> Each time a dictionary is created, the global
>> version is incremented and the dictionary version is initialized to the
>> global version.
>
> A detail, but why not set the version tag of new empty dictionaries to
> zero, always?   Same after a clear().  This would satisfy the
> condition: equality of the version tag is supposed to mean "the
> dictionary content is precisely the same".

If you did that, wouldn't it then be possible to replace an empty dict 
with another empty dict with you noticing? Would that matter?




Re: [Python-Dev] RFC: PEP 509: Add a private version to dict

2016-04-14 Thread Yury Selivanov



On 2016-04-14 4:42 PM, Armin Rigo wrote:
> Hi Victor,
>
> On 14 April 2016 at 17:19, Victor Stinner wrote:
>> Each time a dictionary is created, the global
>> version is incremented and the dictionary version is initialized to the
>> global version.
>
> A detail, but why not set the version tag of new empty dictionaries to
> zero, always?   Same after a clear().  This would satisfy the
> condition: equality of the version tag is supposed to mean "the
> dictionary content is precisely the same".


So

{}.version_tag == {}.version_tag == 0
{'a':1}.version_tag != {'a':1}.version_tag

right?

For my patches I need globally unique version tags
(making an exception for empty dicts is OK).
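
In pure Python, the variant being discussed (tag 0 for empty dicts, a globally unique tag otherwise) could be modelled roughly as below. `TaggedDict` and `_next_tag` are hypothetical names, the real field lives in the C struct, and only `__setitem__` and `clear()` are tracked here to keep the sketch short.

```python
import itertools

# Process-wide counter standing in for the global version (assumption).
_next_tag = itertools.count(1)


class TaggedDict(dict):
    """Toy model: version_tag is 0 when empty, globally unique otherwise."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version_tag = next(_next_tag) if self else 0

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version_tag = next(_next_tag)

    def clear(self):
        super().clear()
        self.version_tag = 0
```

Under this model two empty dicts are indistinguishable by tag, which is the exception mentioned above, while any two non-empty dicts always carry different tags.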

Yury



Re: [Python-Dev] RFC: PEP 509: Add a private version to dict

2016-04-14 Thread Glenn Linderman

On 4/14/2016 3:33 PM, Victor Stinner wrote:
> When we are able to get rid of the GIL for the dict type, we will
> probably be able to get an atomic "global_version++" for 64-bit
> integer. Right now, I don't think that an atomic int64++ is available
> on 32-bit archs.

By the time we get an atomic increment for 64-bit integers, we'll be 
wanting it for 128-bit...


Re: [Python-Dev] RFC: PEP 509: Add a private version to dict

2016-04-14 Thread Victor Stinner
2016-04-15 0:22 GMT+02:00 Brett Cannon :
> And even if it was GIL-free you do run the risk of two dicts ending up at
> the same version # by simply mutating the same number of times if the
> counters were per-dict instead of process-wide.

For some optimizations, it is not needed to check if the dictionary
was replaced, or you check it directly. So it doesn't matter to have
the same version with the same number of operations.

For the use case of Yury's optimization, having a globally unique
version tag makes the guard much cheaper, and the guard must check
that the dictionary was not replaced.

IMHO it's cheap enough to make the version globally unique. I don't
see any technical drawback of having a globally unique version. It
doesn't make the integer overflow much more likely. We are still
talking about many years before an overflow occurs.
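
As a back-of-the-envelope check on the overflow claim, the arithmetic below assumes a 64-bit unsigned counter and one increment every nanosecond, sustained without pause. Both figures are assumptions made for the estimate, not measurements.

```python
# Assumed counter width and increment rate (one bump per nanosecond).
COUNTER_BITS = 64
INCREMENTS_PER_SECOND = 10**9
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

total_increments = 2**COUNTER_BITS
seconds_to_overflow = total_increments / INCREMENTS_PER_SECOND
years_to_overflow = seconds_to_overflow / SECONDS_PER_YEAR

print(f"about {years_to_overflow:.0f} years until the counter wraps")
```

Sharing one counter across all dicts only changes how the increments are distributed, so the estimate stays in the same range for any realistic workload.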

--

When we are able to get rid of the GIL for the dict type, we will
probably be able to get an atomic "global_version++" for 64-bit
integer. Right now, I don't think that an atomic int64++ is available
on 32-bit archs.

Victor


Re: [Python-Dev] RFC: PEP 509: Add a private version to dict

2016-04-14 Thread Brett Cannon
On Thu, 14 Apr 2016 at 15:14 Victor Stinner 
wrote:

> 2016-04-14 23:29 GMT+02:00 Barry Warsaw :
> > I can see why you might want a global version number, but not doing so would
> > eliminate an implicit reliance on the GIL, or in a GIL-less implementation
> >  a lock around incrementing the global version number.
>
> It's not like the builtin dict type is going to become GIL-free... So
> I think that it's ok to use a global version.
>
> Very few people know it, but the GIL sometimes has its advantages...
>

And even if it was GIL-free you do run the risk of two dicts ending up at
the same version # by simply mutating the same number of times if the
counters were per-dict instead of process-wide.
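
The collision is easy to show with a toy model. Both classes below are hypothetical stand-ins written for this illustration (the real version lives in the C dict struct), and only `__setitem__` is tracked.

```python
import itertools


class PerDictVersioned(dict):
    """Each dict counts its own mutations, starting from 0."""

    def __init__(self):
        super().__init__()
        self.version = 0

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version += 1


# One counter shared by every dict in the process.
_global_version = itertools.count(1)


class GloballyVersioned(dict):
    """Creation and every mutation take the next process-wide value."""

    def __init__(self):
        super().__init__()
        self.version = next(_global_version)

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version = next(_global_version)


# Two different dicts, one mutation each.
p1, p2 = PerDictVersioned(), PerDictVersioned()
p1["x"] = 1
p2["y"] = 2

g1, g2 = GloballyVersioned(), GloballyVersioned()
g1["x"] = 1
g2["y"] = 2
```

After this, `p1.version == p2.version` even though the contents differ, so a guard that remembered only the version could not tell that `p1` had been swapped for `p2`; `g1.version` and `g2.version` never coincide.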


Re: [Python-Dev] RFC: PEP 509: Add a private version to dict

2016-04-14 Thread Victor Stinner
2016-04-14 23:29 GMT+02:00 Barry Warsaw :
> I can see why you might want a global version number, but not doing so would
> eliminate an implicit reliance on the GIL, or in a GIL-less implementation
>  a lock around incrementing the global version number.

It's not like the builtin dict type is going to become GIL-free... So
I think that it's ok to use a global version.

Very few people know it, but the GIL sometimes has its advantages...

Victor


Re: [Python-Dev] RFC: PEP 509: Add a private version to dict

2016-04-14 Thread Barry Warsaw
On Apr 14, 2016, at 11:17 PM, Victor Stinner wrote:

>You're right that incrementing the global version is useless for these
>specific cases, and using the version 0 should work. It only matters
>that the version (version? version tag?) is different.
>
>I will play with that. If I don't see any issue, I will update the PEP.
>
>It's more an implementation detail, but it may help to mention it in the PEP.

I can see why you might want a global version number, but not doing so would
eliminate an implicit reliance on the GIL, or in a GIL-less implementation
 a lock around incrementing the global version number.

-Barry


Re: [Python-Dev] RFC: PEP 509: Add a private version to dict

2016-04-14 Thread Victor Stinner
2016-04-14 22:50 GMT+02:00 Barry Warsaw :
> Although I'm not totally convinced, I won't continue to object.  You've
> provided some performance numbers in the PEP even without FAT, and you aren't
> exposing the API to Python, so it's not a burden being imposed on other
> implementations.

Cool!

Ah right, the PEP evolved since its first version sent to
python-ideas. I didn't recall the full context of the discussion. The
PEP is now more complete and it has more known (future) use cases ;-)
(now maybe also Cython?)

Victor


Re: [Python-Dev] RFC: PEP 509: Add a private version to dict

2016-04-14 Thread Victor Stinner
Hi,

2016-04-14 22:42 GMT+02:00 Armin Rigo :
> Hi Victor,
>
> On 14 April 2016 at 17:19, Victor Stinner  wrote:
>> Each time a dictionary is created, the global
>> version is incremented and the dictionary version is initialized to the
>> global version.
>
> A detail, but why not set the version tag of new empty dictionaries to
> zero, always?   Same after a clear().  This would satisfy the
> condition: equality of the version tag is supposed to mean "the
> dictionary content is precisely the same".

You're right that incrementing the global version is useless for these
specific cases, and using the version 0 should work. It only matters
that the version (version? version tag?) is different.

I will play with that. If I don't see any issue, I will update the PEP.

It's more an implementation detail, but it may help to mention it in the PEP.

Victor


Re: [Python-Dev] RFC: PEP 509: Add a private version to dict

2016-04-14 Thread Émanuel Barry
> From Armin Rigo
> Sent: Thursday, April 14, 2016 4:42 PM
> To: Victor Stinner
> Cc: Python Dev
> Subject: Re: [Python-Dev] RFC: PEP 509: Add a private version to dict
> 
> Hi Victor,
> 
> On 14 April 2016 at 17:19, Victor Stinner  wrote:
> > Each time a dictionary is created, the global
> > version is incremented and the dictionary version is initialized to the
> > global version.
> 
> A detail, but why not set the version tag of new empty dictionaries to
> zero, always?   Same after a clear().  This would satisfy the
> condition: equality of the version tag is supposed to mean "the
> dictionary content is precisely the same".

From Victor's original post:

"Globally unique identifier is a requirement for Yury's patch
optimizing method calls ( https://bugs.python.org/issue26110 ). It
allows to check for free if the dictionary was replaced."

I think it's a good design idea, and there's no chance that this counter will 
ever overflow (I think Victor is using a 64-bit unsigned integer). I don't think 
there's really any drawback to using a global vs per-dict counter (but Victor 
is better placed to answer that :))

-Emanuel

~Ducks lay where no programmer has ever been~


Re: [Python-Dev] RFC: PEP 509: Add a private version to dict

2016-04-14 Thread Barry Warsaw
On Apr 14, 2016, at 09:49 PM, Victor Stinner wrote:

>It would be nice to hear from Barry Warsaw, who was opposed to the PEP in
>January. He wanted to wait until FAT Python was proven to really be faster,
>which is still not the case right now. (I mean that I didn't run serious
>benchmarks, but early macro benchmarks are not really promising, only micro
>benchmarks. I expect better results when the implementation is more
>complete.)

Although I'm not totally convinced, I won't continue to object.  You've
provided some performance numbers in the PEP even without FAT, and you aren't
exposing the API to Python, so it's not a burden being imposed on other
implementations.

Cheers,
-Barry




Re: [Python-Dev] RFC: PEP 509: Add a private version to dict

2016-04-14 Thread Armin Rigo
Hi Victor,

On 14 April 2016 at 17:19, Victor Stinner  wrote:
> Each time a dictionary is created, the global
> version is incremented and the dictionary version is initialized to the
> global version.

A detail, but why not set the version tag of new empty dictionaries to
zero, always?   Same after a clear().  This would satisfy the
condition: equality of the version tag is supposed to mean "the
dictionary content is precisely the same".


A bientôt,

Armin.


Re: [Python-Dev] RFC: PEP 509: Add a private version to dict

2016-04-14 Thread Stefan Behnel
Victor Stinner wrote on 14.04.2016 at 21:56:
> Which kind of usage do you see in Cython?

Mainly caching, I guess. We could avoid global/module name lookups in some
cases, especially inside of loops.


> Off-topic (PEP 510):
> 
> I really want to experiment with automatic generation of Cython code from
> Python code, using profiling to discover function parameter types. Then use
> PEP 510 to attach the fast Cython code to a Python function, but fall back
> to bytecode if the types are different. See the example of builtin
> functions in the PEP:
> https://www.python.org/dev/peps/pep-0510/#using-builtin-function
> 
> Before having something fully automated, we can use some manual steps, like
> manually annotating function types, manually compiling the code, etc.

Sounds like Cython's "Fused Types" could help here:

http://docs.cython.org/src/userguide/fusedtypes.html

It's essentially a generic functions implementation and you get a dispatch
either at compile time or runtime, depending on where (Python/Cython) and
how you call a function.

Stefan




Re: [Python-Dev] RFC: PEP 509: Add a private version to dict

2016-04-14 Thread Victor Stinner
Which kind of usage do you see in Cython?

Off-topic (PEP 510):

I really want to experiment with automatic generation of Cython code from
Python code, using profiling to discover function parameter types. Then use
PEP 510 to attach the fast Cython code to a Python function, but fall back
to bytecode if the types are different. See the example of builtin
functions in the PEP:
https://www.python.org/dev/peps/pep-0510/#using-builtin-function

Before having something fully automated, we can use some manual steps, like
manually annotating function types, manually compiling the code, etc.

Victor

On Thursday, 14 April 2016, Stefan Behnel wrote:

> +1 from me, too. I'm sure we can make some use of this in Cython.
>
> Stefan
>
>
> Victor Stinner wrote on 14.04.2016 at 17:19:
> > PEP: 509
> > Title: Add a private version to dict
>
>


Re: [Python-Dev] RFC: PEP 509: Add a private version to dict

2016-04-14 Thread Victor Stinner
It would be nice to hear from Barry Warsaw, who was opposed to the PEP in
January. He wanted to wait until FAT Python was proven to really be faster,
which is still not the case right now. (I mean that I didn't run serious
benchmarks, but early macro benchmarks are not really promising, only micro
benchmarks. I expect better results when the implementation is more
complete.)

The main change since January is that Yury wrote a patch optimizing method
calls using the PEP.
https://mail.python.org/pipermail/python-dev/2016-January/142772.html

Victor

On Thursday, 14 April 2016, Guido van Rossum wrote:

> I'll wait a day before formally pronouncing to see if any objections
> are made, but it looks good to me.
>
> On Thu, Apr 14, 2016 at 8:19 AM, Victor Stinner wrote:
> > Hi,
> >
> > I updated my PEP 509 to make the dictionary version globally unique.
> > With *two* use cases of this PEP (Yury's method call patch and my FAT
> > Python project), I think that the PEP is now ready to be accepted.
> >
> > Globally unique identifier is a requirement for Yury's patch
> > optimizing method calls ( https://bugs.python.org/issue26110 ). It
> > allows to check for free if the dictionary was replaced.
> >
> > I also renamed the ma_version field to ma_version_tag.
> >
> > HTML version:
> > https://www.python.org/dev/peps/pep-0509/
> >
> > Victor
> >
> >
> > PEP: 509
> > Title: Add a private version to dict
> > Version: $Revision$
> > Last-Modified: $Date$
> > Author: Victor Stinner
> > Status: Draft
> > Type: Standards Track
> > Content-Type: text/x-rst
> > Created: 4-January-2016
> > Python-Version: 3.6
> >
> >
> > Abstract
> > ========
> >
> > Add a new private version to the builtin ``dict`` type, incremented at
> > each dictionary creation and at each dictionary change, to implement
> > fast guards on namespaces.
> >
> >
> > Rationale
> > =========
> >
> > In Python, the builtin ``dict`` type is used by many instructions. For
> > example, the ``LOAD_GLOBAL`` instruction searches for a variable in the
> > global namespace, or in the builtins namespace (two dict lookups).
> > Python uses ``dict`` for the builtins namespace, globals namespace, type
> > namespaces, instance namespaces, etc. The local namespace (namespace of
> > a function) is usually optimized to an array, but it can be a dict too.
> >
> > Python is hard to optimize because almost everything is mutable: builtin
> > functions, function code, global variables, local variables, ... can be
> > modified at runtime. Implementing optimizations that respect Python
> > semantics requires detecting when "something changes": we will call
> > these checks "guards".
> >
> > The speedup of optimizations depends on the speed of guard checks. This
> > PEP proposes to add a version to dictionaries to implement fast guards
> > on namespaces.
> >
> > Dictionary lookups can be skipped if the version does not change which
> > is the common case for most namespaces. Since the version is globally
> > unique, the version is also enough to check if the namespace dictionary
> > was not replaced with a new dictionary. The performance of a guard does
> > not depend on the number of watched dictionary entries, complexity of
> > O(1), if the dictionary version does not change.
> >
> > Example of optimization: copy the value of a global variable to function
> > constants.  This optimization requires a guard on the global variable to
> > check if it was modified. If the variable is modified, the variable must
> > be loaded at runtime when the function is called, instead of using the
> > constant.
> >
> > See the `PEP 510 -- Specialized functions with guards
> > `_ for the concrete usage of
> > guards to specialize functions and for the rationale on Python static
> > optimizers.
> >
> >
> > Guard example
> > =============
> >
> > Pseudo-code of a fast guard to check if a dictionary entry was modified
> > (created, updated or deleted) using a hypothetical
> > ``dict_get_version(dict)`` function::
> >
> >     UNSET = object()
> >
> >     class GuardDictKey:
> >         def __init__(self, dict, key):
> >             self.dict = dict
> >             self.key = key
> >             self.value = dict.get(key, UNSET)
> >             self.version = dict_get_version(dict)
> >
> >         def check(self):
> >             """Return True if the dictionary entry did not change
> >             and the dictionary was not replaced."""
> >
> >             # read the version of the dict structure
> >             version = dict_get_version(self.dict)
> >             if version == self.version:
> >                 # Fast-path: dictionary lookup avoided
> >                 return True
> >
> >             # lookup in the dictionary
> >             value = self.dict.get(self.key, UNSET)
> >             if value is self.value:
> >                 # another key was 

Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Koos Zevenhoven
On Thu, Apr 14, 2016 at 9:35 PM, Random832  wrote:
> On Thu, Apr 14, 2016, at 13:56, Koos Zevenhoven wrote:
>> (1) Code that has access to pathname/filename data and has some level
>> of control over what data type comes in. This code may for instance
>> choose to deal with either bytes or str
>>
>> (2) Code that takes the path or file name that it happens to get and
>> does something with it. This type of code can be divided into
>> subgroups as follows:
>>
>>   (2a) Code that accepts only one type of paths (e.g. str, bytes or
>> pathlib) and fails if it gets something else.
>
> Ideally, these should go away.
>

I don't think so. (1) might even be the most common type of all code.
This is code that gets a path from user input, from a config file,
from a database etc. and then does things with it, typically including
passing it to type (2) code and potentially getting a path back from
there too.

>>   (2b) Code that wants to support different types of paths such as
>> str, bytes or pathlib objects. This includes os.path.*, os.scandir,
>> and various other standard library code. Presumably there is also
>> third-party code that does the same. These functions may want to
>> preserve the str-ness or bytes-ness of the paths in case they return
>> paths, as the stdlib now does. But new code may even want to return
>> pathlib objects when they get such objects as inputs.
>
> Hold on. None of the discussion I've seen has included any way to
> specify how to construct a new object representing a different path
> other than the ones passed in. Surely you're not suggesting type(a)(b).
>

That's right. This protocol is not solving the issue of returning
'rich' path objects. It's solving the issue of passing those objects
to lower-level functions or to interact with other 'rich' path types.
What I meant by this is that there may be code that *does* want to do
type(a)(b), which is out of our control. Maybe I should not have
mentioned that.

> Also, how does DirEntry fit in with any of this?
>

os.scandir + DirEntry are one of the many things in the stdlib that
give you pathnames of the same type as those that were put in.

>> This is the
>> duck-typing or polymorphic code we have been talking about. Code of
>> this type (2b) may want to avoid implicit conversions because it makes
>> the life of code of the other types more difficult.
>
> As long as the type it returns is still a path/bytes/str (and therefore
> can be accepted when the caller passes it somewhere else) what's the
> problem?

The problem is that not every path goes through the function that does
the implicit conversion. When you then, for instance, os.path.join two
paths of different types, it raises an error.

In other words: Most non-library code (even library code?) deals with
one specific type and does not want implicit conversions to other
types. Some code (2b) deals with several types and, at least in the
stdlib, such code returns paths of the same type as they are given,
which makes said "most non-library code" happy, because it does not
force the programmer to think about type conversions.

(Then there is also code that explicitly deals with type conversions,
such as os.fsencode and os.fsdecode.)
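
A small sketch of the behaviour described above, using only os.path.join,
os.fsencode and os.fsdecode from the standard library. The variable names
are made up for illustration.

```python
import os
import os.path

# str in, str out
joined_str = os.path.join("home", "user")
# bytes in, bytes out
joined_bytes = os.path.join(b"home", b"user")
print(type(joined_str).__name__, type(joined_bytes).__name__)

# Mixing the two types is rejected instead of being converted implicitly.
mixed_error = None
try:
    os.path.join("home", b"user")
except TypeError as exc:
    mixed_error = exc
    print("TypeError:", exc)

# Explicit conversion is left to os.fsencode / os.fsdecode.
joined_explicit = os.path.join("home", os.fsdecode(b"user"))
print(joined_explicit)
```

Code that sticks to one type never has to think about conversions here; the
error only shows up where the two types meet.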

-Koos
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-14 Thread Ethan Furman

On 04/13/2016 02:37 PM, Victor Stinner wrote:


> I'm not a big fan of a flag parameter to change the return type of a
> function. Usually, two functions are preferred. In the os module we have
> getcwd/getcwdb for example. I don't know if it's a good example


I think of os.fspath() as more of a filter/reduce operation:

- str -> str
- str DirEntry -> str

- bytes -> bytes
- bytes DirEntry -> bytes

The purpose of os.fspath() (at least the one I'm arguing for ;) is to 
distil its inputs to the lowest common denominator, and no lower -- 
which is either str for string-based path objects, or bytes for 
bytes-based path objects.
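
As a rough sketch of that filter, assuming a hypothetical helper named
fspath_sketch() and a hypothetical FakeEntry class standing in for DirEntry
(neither name comes from the proposal):

```python
def fspath_sketch(path):
    """Reduce a path-like object to str or bytes, and no further.

    Hypothetical stand-in for the os.fspath() being discussed.
    """
    if isinstance(path, (str, bytes)):
        return path
    # Look the method up on the type, as is usual for protocol methods.
    method = getattr(type(path), "__fspath__", None)
    if method is None:
        raise TypeError("expected str, bytes or an object with __fspath__, "
                        "not " + type(path).__name__)
    result = method(path)
    if not isinstance(result, (str, bytes)):
        raise TypeError("__fspath__ must return str or bytes")
    return result


class FakeEntry:
    """Hypothetical DirEntry-like object that keeps the type it was given."""

    def __init__(self, path):
        self._path = path

    def __fspath__(self):
        return self._path


print(type(fspath_sketch(FakeEntry("a/b"))).__name__)
print(type(fspath_sketch(FakeEntry(b"a/b"))).__name__)
```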


--
~Ethan~



Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Random832
On Thu, Apr 14, 2016, at 13:56, Koos Zevenhoven wrote:
> (1) Code that has access to pathname/filename data and has some level
> of control over what data type comes in. This code may for instance
> choose to deal with either bytes or str
> 
> (2) Code that takes the path or file name that it happens to get and
> does something with it. This type of code can be divided into
> subgroups as follows:
> 
>   (2a) Code that accepts only one type of paths (e.g. str, bytes or
> pathlib) and fails if it gets something else.

Ideally, these should go away.

>   (2b) Code that wants to support different types of paths such as
> str, bytes or pathlib objects. This includes os.path.*, os.scandir,
> and various other standard library code. Presumably there is also
> third-party code that does the same. These functions may want to
> preserve the str-ness or bytes-ness of the paths in case they return
> paths, as the stdlib now does. But new code may even want to return
> pathlib objects when they get such objects as inputs.

Hold on. None of the discussion I've seen has included any way to
specify how to construct a new object representing a different path
other than the ones passed in. Surely you're not suggesting type(a)(b).

Also, how does DirEntry fit in with any of this?

> This is the
> duck-typing or polymorphic code we have been talking about. Code of
> this type (2b) may want to avoid implicit conversions because it makes
> the life of code of the other types more difficult.

As long as the type it returns is still a path/bytes/str (and therefore
can be accepted when the caller passes it somewhere else) what's the
problem?


Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Ethan Furman

On 04/14/2016 10:22 AM, Paul Moore wrote:

> On 14 April 2016 at 17:46, Ethan Furman wrote:
>> If you are not working at the bytes layer, you shouldn't be getting bytes
>> objects because:
>>
>> - you specified str when asking for data from the OS, or
>> - you transformed the incoming bytes from whatever external source
>>    to str when you received them.
>
> My experience is that (particularly with code that was originally
> written for Python 2) "you have control of your data" is often an
> illusion - bytes can appear in code from unexpected sources, and when
> they do I'd rather see an error if I'm using code where I expect a
> string. Certainly that's a bug in the code - all I'm saying is that it
> should fail early rather than late.


If we have one function that uses a flag and you leave the flag alone 
(it defaults to rejecting bytes) -- voila!  An error is raised when 
bytes show up.
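
A minimal sketch of the single-function-with-a-flag idea. The name
fspath_checked() and its allow_bytes parameter are invented here for
illustration; they are not an agreed API.

```python
def fspath_checked(path, allow_bytes=False):
    """Hypothetical one-function design: bytes are rejected unless asked for."""
    if isinstance(path, (str, bytes)):
        result = path
    else:
        method = getattr(type(path), "__fspath__", None)
        if method is None:
            raise TypeError("not a path-like object: " + type(path).__name__)
        result = method(path)
    if isinstance(result, str):
        return result
    if isinstance(result, bytes) and allow_bytes:
        return result
    # Fail early: unexpected bytes (or anything else) stop right here.
    raise TypeError("expected a str path, got " + type(result).__name__)


print(fspath_checked("data.txt"))
print(fspath_checked(b"data.txt", allow_bytes=True))
```

With the flag left at its default, stray bytes raise TypeError at the call
site rather than somewhere further down the line.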



> I'd appreciate it if anyone can clarify why "gracefully extending" the
> protocol to include bytes support at a later date isn't practical.


It's going to be a bunch of work.  I don't want to do the work twice.

On the other hand, if while doing the work it becomes apparent that 
supporting bytes and str in the protocol is either infeasible, 
confusing, or a plain ol' bad idea I have no problem ripping out the 
bytes support and going to str only.


--
~Ethan~



Re: [Python-Dev] MAKE_FUNCTION simplification

2016-04-14 Thread Ethan Furman

On 04/14/2016 10:02 AM, Random832 wrote:

> "between versions" is ambiguous. It could mean that there's no guarantee
> that there will be no changes from one version to the next, or it could
> mean, even more strongly, that there's no guarantee that there will be
> no changes in a maintenance release (which are, after all, released
> *between* minor releases)


I don't see us making a breaking change in a maintenance release except 
to fix something that was already broken.


--
~Ethan~


Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Koos Zevenhoven
On Thu, Apr 14, 2016 at 7:46 PM, Ethan Furman  wrote:
>
> What many folks seem to be missing is that *you* (generic you) have control
> of your data.
>
> If you are not working at the bytes layer, you shouldn't be getting bytes
> objects because:
>
> - you specified str when asking for data from the OS, or
> - you transformed the incoming bytes from whatever external source
>   to str when you received them.

There is an apparent contradiction of the above with some previous
posts, including your own. Let me try to fix it:

Code that deals with paths can be divided in groups as follows:

(1) Code that has access to pathname/filename data and has some level
of control over what data type comes in. This code may for instance
choose to deal with either bytes or str

(2) Code that takes the path or file name that it happens to get and
does something with it. This type of code can be divided into
subgroups as follows:

  (2a) Code that accepts only one type of paths (e.g. str, bytes or
pathlib) and fails if it gets something else.

  (2b) Code that wants to support different types of paths such as
str, bytes or pathlib objects. This includes os.path.*, os.scandir,
and various other standard library code. Presumably there is also
third-party code that does the same. These functions may want to
preserve the str-ness or bytes-ness of the paths in case they return
paths, as the stdlib now does. But new code may even want to return
pathlib objects when they get such objects as inputs. This is the
duck-typing or polymorphic code we have been talking about. Code of
this type (2b) may want to avoid implicit conversions because it makes
the life of code of the other types more difficult.

(feel free to fill in more categories of code)

So the code of type (2b) is trying to make all categories happy by
returning objects of the same type that it gets as input, while the
other categories are probably in the situation where they don't
necessarily need to make other categories of code happy.

And the question is this: Do we need to make code using both bytes
*and* scandir happy? This is largely the same question as whether we
have to support bytes in addition to str in the protocol.

(We may of course talk about third-party path libraries that have the
same problem as scandir's DirEntry. Ethan's library is not exactly in
the same category as DirEntry since its path objects *are* instances
of bytes or str and therefore do not need this protocol to begin with,
except perhaps for conversions from other high-level path types so
that different path libraries work together nicely).

-Koos


Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Paul Moore
On 14 April 2016 at 17:46, Ethan Furman  wrote:
> On 04/14/2016 08:59 AM, Michael Mysinger via Python-Dev wrote:
>
> >> I am saying that if os.path.join now accepts RichPath objects, and those
> >> objects can return either str or bytes, then it's much harder to reason
> >> about when I have all bytes or all strings. In essence, you will force me
> >> to pre-wrap all RichPath objects in either os.fsencode(os.fspath(path)) or
> >> os.fsdecode(os.fspath(path)), just so I can reason about the type. And if
> >> I have to always do that wrapping then os.path.join doesn't need to accept
> >> RichPath objects and call fspath at all.
>
>
> What many folks seem to be missing is that *you* (generic you) have control
> of your data.
>
> If you are not working at the bytes layer, you shouldn't be getting bytes
> objects because:
>
> - you specified str when asking for data from the OS, or
> - you transformed the incoming bytes from whatever external source
>   to str when you received them.

My experience is that (particularly with code that was originally
written for Python 2) "you have control of your data" is often an
illusion - bytes can appear in code from unexpected sources, and when
they do I'd rather see an error if I'm using code where I expect a
string. Certainly that's a bug in the code - all I'm saying is that it
should fail early rather than late.

Having said this, I don't have an actual use case - but equally it
seems to me that our problem is that *nobody* does (yet) because
uptake of pathlib has been slow, thanks to limited stdlib support. My
view remains that we should get the (relatively simple and
uncontroversial) str support in place, and defer bytes support for
when we have experience with that.

I'd appreciate it if anyone can clarify why "gracefully extending" the
protocol to include bytes support at a later date isn't practical.
Paul


Re: [Python-Dev] MAKE_FUNCTION simplification

2016-04-14 Thread Random832
On Thu, Apr 14, 2016, at 12:56, Terry Reedy wrote:
> https://docs.python.org/3/library/dis.html#module-dis
> CPython implementation detail: Bytecode is an implementation detail of 
> the CPython interpreter. No guarantees are made that bytecode will not 
> be added, removed, or changed between versions of Python.
> 
> Version = minor release, as opposed to maintenance release.

"between versions" is ambiguous. It could mean that there's no guarantee
that there will be no changes from one version to the next, or it could
mean, even more strongly, that there's no guarantee that there will be
no changes in a maintenance release (which are, after all, released
*between* minor releases)


Re: [Python-Dev] RFC: PEP 509: Add a private version to dict

2016-04-14 Thread Guido van Rossum
I'll wait a day before formally pronouncing to see if any objections
are made, but it looks good to me.

On Thu, Apr 14, 2016 at 8:19 AM, Victor Stinner
 wrote:
> Hi,
>
> I updated my PEP 509 to make the dictionary version globally unique.
> With *two* use cases of this PEP (Yury's method call patch and my FAT
> Python project), I think that the PEP is now ready to be accepted.
>
> Globally unique identifier is a requirement for Yury's patch
> optimizing method calls ( https://bugs.python.org/issue26110 ). It
> allows to check for free if the dictionary was replaced.
>
> I also renamed the ma_version field to ma_version_tag.
>
> HTML version:
> https://www.python.org/dev/peps/pep-0509/
>
> Victor
>
>
> PEP: 509
> Title: Add a private version to dict
> Version: $Revision$
> Last-Modified: $Date$
> Author: Victor Stinner 
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 4-January-2016
> Python-Version: 3.6
>
>
> Abstract
> ========
>
> Add a new private version to the builtin ``dict`` type, incremented at
> each dictionary creation and at each dictionary change, to implement
> fast guards on namespaces.
>
>
> Rationale
> =========
>
> In Python, the builtin ``dict`` type is used by many instructions. For
> example, the ``LOAD_GLOBAL`` instruction searches for a variable in the
> global namespace, or in the builtins namespace (two dict lookups).
> Python uses ``dict`` for the builtins namespace, globals namespace, type
> namespaces, instance namespaces, etc. The local namespace (namespace of
> a function) is usually optimized to an array, but it can be a dict too.
>
> Python is hard to optimize because almost everything is mutable: builtin
> functions, function code, global variables, local variables, ... can be
> modified at runtime. Implementing optimizations respecting the Python
> semantics requires to detect when "something changes": we will call
> these checks "guards".
>
> The speedup of optimizations depends on the speed of guard checks. This
> PEP proposes to add a version to dictionaries to implement fast guards
> on namespaces.
>
> Dictionary lookups can be skipped if the version does not change which
> is the common case for most namespaces. Since the version is globally
> unique, the version is also enough to check if the namespace dictionary
> was not replaced with a new dictionary. The performance of a guard does
> not depend on the number of watched dictionary entries, complexity of
> O(1), if the dictionary version does not change.
>
> Example of optimization: copy the value of a global variable to function
> constants.  This optimization requires a guard on the global variable to
> check if it was modified. If the variable is modified, the variable must
> be loaded at runtime when the function is called, instead of using the
> constant.
>
> See the `PEP 510 -- Specialized functions with guards
> `_ for the concrete usage of
> guards to specialize functions and for the rationale on Python static
> optimizers.
>
>
> Guard example
> =============
>
> Pseudo-code of a fast guard to check if a dictionary entry was modified
> (created, updated or deleted) using a hypothetical
> ``dict_get_version(dict)`` function::
>
>     UNSET = object()
>
>     class GuardDictKey:
>         def __init__(self, dict, key):
>             self.dict = dict
>             self.key = key
>             self.value = dict.get(key, UNSET)
>             self.version = dict_get_version(dict)
>
>         def check(self):
>             """Return True if the dictionary entry did not change
>             and the dictionary was not replaced."""
>
>             # read the version of the dict structure
>             version = dict_get_version(self.dict)
>             if version == self.version:
>                 # Fast-path: dictionary lookup avoided
>                 return True
>
>             # lookup in the dictionary
>             value = self.dict.get(self.key, UNSET)
>             if value is self.value:
>                 # another key was modified:
>                 # cache the new dictionary version
>                 self.version = version
>                 return True
>
>             # the key was modified
>             return False
>
>
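
The guard above can be tried out in pure Python with a stand-in for the
versioned dict. This is only a sketch: VersionedDict and its version
attribute are hypothetical, only __setitem__ and __delitem__ are tracked,
and the counter is a plain process-wide itertools.count rather than a C
field. It does show why a global counter keeps two dicts that were mutated
the same number of times from ending up with the same tag.

```python
import itertools

# Process-wide counter: every creation and every change of any dict draws a
# fresh value, so two distinct dicts never carry the same version.
_global_version = itertools.count(1)


class VersionedDict(dict):
    """Hypothetical stand-in for a dict carrying a version tag."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = next(_global_version)

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version = next(_global_version)

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version = next(_global_version)


def dict_get_version(d):
    return d.version


UNSET = object()


class GuardDictKey:
    def __init__(self, dict, key):
        self.dict = dict
        self.key = key
        self.value = dict.get(key, UNSET)
        self.version = dict_get_version(dict)

    def check(self):
        version = dict_get_version(self.dict)
        if version == self.version:
            return True   # fast path: no lookup needed
        value = self.dict.get(self.key, UNSET)
        if value is self.value:
            self.version = version   # another key changed; re-cache
            return True
        return False      # the watched key changed


first = VersionedDict(x=1)
second = VersionedDict(x=1)
first["y"] = 2
second["y"] = 2
# Same number of operations on each, yet the tags differ.
print(dict_get_version(first) != dict_get_version(second))
```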
> Usage of the dict version
> =========================
>
> Speedup method calls 1.2x
> -------------------------
>
> Yury Selivanov wrote a `patch to optimize method calls
> `_. The patch depends on the
> `implement per-opcode cache in ceval
> `_ patch which requires dictionary
> versions to invalidate the cache if the globals dictionary or the
> builtins dictionary has been modified.
>
> The cache also requires that the dictionary version is globally unique.
> It is possible to define a function in a namespace and call it
> in a different namespace: using ``exec()`` with the *globals* 

Re: [Python-Dev] MAKE_FUNCTION simplification

2016-04-14 Thread Terry Reedy

On 4/14/2016 12:03 PM, Nikita Nemkin wrote:


> I think that Python should make bytecode explicitly unstable and subject
> to change with any major release.


https://docs.python.org/3/library/dis.html#module-dis
CPython implementation detail: Bytecode is an implementation detail of 
the CPython interpreter. No guarantees are made that bytecode will not 
be added, removed, or changed between versions of Python.


Version = minor release, as opposed to maintenance release.
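
One way to see this for yourself is to list the opcodes of a function that
creates another function, using dis.get_instructions() from the standard
library. The function names below are made up, and no expected output is
given because the listing depends on the Python version you run it on.

```python
import dis


def outer():
    # Creating inner() at runtime is what the MAKE_FUNCTION opcode is for.
    def inner():
        return 42
    return inner


opnames = [ins.opname for ins in dis.get_instructions(outer)]
print(opnames)
```

Running the same script under two different minor releases and comparing
the printed lists is a quick check of how much has moved.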

--
Terry Jan Reedy



Re: [Python-Dev] RFC: PEP 509: Add a private version to dict

2016-04-14 Thread Stefan Behnel
+1 from me, too. I'm sure we can make some use of this in Cython.

Stefan


Victor Stinner schrieb am 14.04.2016 um 17:19:
> PEP: 509
> Title: Add a private version to dict




Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Ethan Furman

On 04/14/2016 08:59 AM, Michael Mysinger via Python-Dev wrote:


> I am saying that if os.path.join now accepts RichPath objects, and those
> objects can return either str or bytes, then it's much harder to reason about
> when I have all bytes or all strings. In essence, you will force me to
> pre-wrap all RichPath objects in either os.fsencode(os.fspath(path)) or
> os.fsdecode(os.fspath(path)), just so I can reason about the type. And if I
> have to always do that wrapping then os.path.join doesn't need to accept
> RichPath objects and call fspath at all.


What many folks seem to be missing is that *you* (generic you) have 
control of your data.


If you are not working at the bytes layer, you shouldn't be getting 
bytes objects because:


- you specified str when asking for data from the OS, or
- you transformed the incoming bytes from whatever external source
  to str when you received them.

--
~Ethan~


Re: [Python-Dev] RFC: PEP 509: Add a private version to dict

2016-04-14 Thread Victor Stinner
2016-04-14 18:28 GMT+02:00 Brett Cannon :
> +1 from me!

Thanks.

> A couple of grammar/typo suggestions below.

Fixed. (Yes, I want to use unsigned type, so PY_UINT64_T.)

Victor


Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Ethan Furman

On 04/14/2016 09:09 AM, Victor Stinner wrote:
> 2016-04-14 16:54 GMT+02:00 Ethan Furman:

>>> I consider that the final goal of the whole discussion is to support
>>> something like:
>>>
>>>   path = os.path.join(pathlib_path, "str_path", direntry)
>>>
>>> (...)
>>> I expect that DirEntry.__fspath__ uses os.fsdecode() to return str,
>>> just to make my life easier.
>>
>> This would be where we strongly disagree.
>
> FYI it's ok that we disagree on this point, at least I expressed my opinion ;-)

Absolutely.  I appreciate you explaining your point of view.

> At least, we now identified better a point of disagreement.

Agreed.  :)

~Ethan~


Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Michael Mysinger via Python-Dev
Donald Stufft writes:

> > On Apr 14, 2016, at 11:59 AM, Michael Mysinger via Python-Dev wrote:
> >
> > In essence, you will force me to pre-wrap all RichPath objects in
> > either os.fsencode(os.fspath(path)) or os.fsdecode(os.fspath(path)),
> > just so I can reason about the type.
>
> This is only the case if you have a singular RichPath object that can
> represent both bytes and str (which is what DirEntry does, which I agree
> makes it harder… but that’s already the case with DirEntry.path).
> However that’s not the case if you have a bRichPath and uRichPath.

And you might even be able to retain your sanity if you enforce any 
particular class to be either bRichPath or uRichPath. But if you do that, 
then that still leaves DirEntry out in the cold, likely converting to str in 
its __fspath__. Which leaves me in the camp that bRichPath falls under YAGNI, 
and RichPath should be str only.



Re: [Python-Dev] MAKE_FUNCTION simplification

2016-04-14 Thread Brett Cannon
On Thu, 14 Apr 2016 at 09:16 Nikita Nemkin  wrote:

> On Thu, Apr 14, 2016 at 8:32 PM, Victor Stinner
>  wrote:
> >
> > Would you like to work on a patch to implement that change?
>
> I'll work on a patch. Should I post it to bugs.python.org?
>

Yep.


>
> > Since Python 3.6 may get a new bytecode format (wordcode, see the
> > other thread on this mailing list), I think that it's ok to change
> > MAKE_FUNCTION in the same release.
>
> Wordcode looks like pure win from (projected) 25% bytecode size
> reduction alone.
>

CPU performance is more the worry here (which looks mostly unaffected,
maybe even faster), but reduced .pyc files is a nice perk. :)


Re: [Python-Dev] RFC: PEP 509: Add a private version to dict

2016-04-14 Thread Brett Cannon
+1 from me!

A couple of grammar/typo suggestions below.

On Thu, 14 Apr 2016 at 08:20 Victor Stinner 
wrote:

> Hi,
>
> I updated my PEP 509 to make the dictionary version globally unique.
> With *two* use cases of this PEP (Yury's method call patch and my FAT
> Python project), I think that the PEP is now ready to be accepted.
>
> Globally unique identifier is a requirement for Yury's patch
> optimizing method calls ( https://bugs.python.org/issue26110 ). It
> allows to check for free if the dictionary was replaced.
>
> I also renamed the ma_version field to ma_version_tag.
>
> HTML version:
> https://www.python.org/dev/peps/pep-0509/
>
> Victor
>
>
> PEP: 509
> Title: Add a private version to dict
> Version: $Revision$
> Last-Modified: $Date$
> Author: Victor Stinner 
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 4-January-2016
> Python-Version: 3.6
>
>
> Abstract
> ========
>
> Add a new private version to the builtin ``dict`` type, incremented at
> each dictionary creation and at each dictionary change, to implement
> fast guards on namespaces.
>
>
> Rationale
> =========
>
> In Python, the builtin ``dict`` type is used by many instructions. For
> example, the ``LOAD_GLOBAL`` instruction searchs for a variable in the
> global namespace, or in the builtins namespace (two dict lookups).
> Python uses ``dict`` for the builtins namespace, globals namespace, type
> namespaces, instance namespaces, etc. The local namespace (namespace of
> a function) is usually optimized to an array, but it can be a dict too.
>
> Python is hard to optimize because almost everything is mutable: builtin
> functions, function code, global variables, local variables, ... can be
> modified at runtime. Implementing optimizations respecting the Python
> semantics requires to detect when "something changes": we will call
> these checks "guards".
>
> The speedup of optimizations depends on the speed of guard checks. This
> PEP proposes to add a version to dictionaries to implement fast guards
> on namespaces.
>
> Dictionary lookups can be skipped if the version does not change which
> is the common case for most namespaces. Since the version is globally
> unique, the version is also enough to check if the namespace dictionary
> was not replaced with a new dictionary. The performance of a guard does
> not depend on the number of watched dictionary entries, complexity of
> O(1), if the dictionary version does not change.
>
> Example of optimization: copy the value of a global variable to function
> constants.  This optimization requires a guard on the global variable to
> check if it was modified. If the variable is modified, the variable must
> be loaded at runtime when the function is called, instead of using the
> constant.
>
> See the `PEP 510 -- Specialized functions with guards
> `_ for the concrete usage of
> guards to specialize functions and for the rationale on Python static
> optimizers.
>
>
> Guard example
> =============
>
> Pseudo-code of an fast guard to check if a dictionary entry was modified
> (created, updated or deleted) using an hypothetical
> ``dict_get_version(dict)`` function::
>
>     UNSET = object()
>
>     class GuardDictKey:
>         def __init__(self, dict, key):
>             self.dict = dict
>             self.key = key
>             self.value = dict.get(key, UNSET)
>             self.version = dict_get_version(dict)
>
>         def check(self):
>             """Return True if the dictionary entry did not changed
>             and the dictionary was not replaced."""
>

"did not change"


>
>             # read the version of the dict structure
>             version = dict_get_version(self.dict)
>             if version == self.version:
>                 # Fast-path: dictionary lookup avoided
>                 return True
>
>             # lookup in the dictionary
>             value = self.dict.get(self.key, UNSET)
>             if value is self.value:
>                 # another key was modified:
>                 # cache the new dictionary version
>                 self.version = version
>                 return True
>
>             # the key was modified
>             return False
>
>
> Usage of the dict version
> =========================
>
> Speedup method calls 1.2x
> -------------------------
>
> Yury Selivanov wrote a `patch to optimize method calls
> `_. The patch depends on the
> `implement per-opcode cache in ceval
> `_ patch which requires dictionary
> versions to invalidate the cache if the globals dictionary or the
> builtins dictionary has been modified.
>
> The cache also requires that the dictionary version is globally unique.
> It is possible to define a function in a namespace and call it
> in a different namespace: using ``exec()`` with the *globals* parameter
> for example. In this 

Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Random832
On Thu, Apr 14, 2016, at 12:05, Stephen J. Turnbull wrote:
> Random832 writes:
> 
>  > And what such incompatibilities exist between bytes and str for the
>  > purpose of representing file paths?
> 
> A plethora of encodings.

Only one encoding, fsencode/fsdecode. All other encodings are not for
filenames.

>  > At the end of the day, there's exactly one answer to "what file on
>  > disk this represents (or would represent if it existed)".
> 
> Nope.  Suppose those bytes were read from a file or a socket?  It's
> dangerous to assume that encoding matches the file system's.

Why can I pass them to os.open, then, or to os.path.join so long as
everything else is also bytes?

On UNIX, the filesystem is in bytes, so saying that bytes can't match
the filesystem is absurd. Converting it to str with fsdecode will
*always, absolutely, 100% of the time* give a str that will address the
same file that the bytes does (even if it's "dangerous" to assume that
was the name the user wanted, that's beyond the scope of what the module
is capable of dealing with).
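
A quick sketch of that round trip, assuming a POSIX system where
os.fsdecode() and os.fsencode() use the surrogateescape error handler. The
helper name roundtrips() is made up for this example.

```python
import os


def roundtrips(raw):
    """Return True if bytes -> str -> bytes gives back the same bytes."""
    return os.fsencode(os.fsdecode(raw)) == raw


if os.name == "posix":
    # b"\xe9" on its own is not valid UTF-8, yet the name still survives
    # the trip through str and back.
    print(roundtrips(b"caf\xe9.txt"))
```

Whether the resulting str is the name the user had in mind is a separate
question; the sketch only checks that it addresses the same bytes.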


Re: [Python-Dev] MAKE_FUNCTION simplification

2016-04-14 Thread Nikita Nemkin
On Thu, Apr 14, 2016 at 8:27 PM, Guido van Rossum  wrote:
> Great analysis! What might stand in the way of adoption is concern for
> bytecode manipulation libraries that would have to be changed. What
> might encourage adoption would be a benchmark showing this saves a lot
> of time.
>
> Personally I'm expecting it won't make much of a difference for real
> programs since almost always the cost of creating the function is
> dwarfed by the (total) cost of running it. But Python does create a
> lot of functions, and there's also lambdas.

This change alone is very unlikely to have a measurable performance impact.
The intention is to clean up ceval.c/compile.c a bit, nothing more.
If many other opcodes were somehow slimmed down in the similar fashion,
then we might (or might not) see perf gains.

For example, most slot dispatch opcodes can be compressed into a single
opcode+slot index with inlined dispatch logic, instead of each one individually
calling C API functions...

> There's also talk of switching to wordcode, in a different thread.
> Maybe the idea would be easier to introduce there? (Bytecode libraries
> would have to change anyways, so the additional concern for this
> change would be minimal.)

Wordcode can benefit from this change, because it guarantees
single-byte MAKE_FUNCTION oparg.

I think that Python should make bytecode explicitly unstable and subject
to change with any major release. The potential for a faster Python
interpreter (or simple JIT) is huge; requiring bytecode compatibility
will slow down any progress in this area.


Re: [Python-Dev] MAKE_FUNCTION simplification

2016-04-14 Thread Nikita Nemkin
On Thu, Apr 14, 2016 at 8:32 PM, Victor Stinner
 wrote:
>
> Would you like to work on a patch to implement that change?

I'll work on a patch. Should I post it to bugs.python.org?

> Since Python 3.6 may get a new bytecode format (wordcode, see the
> other thread on this mailing list), I think that it's ok to change
> MAKE_FUNCTION in the same release.

Wordcode looks like a pure win from the (projected) 25% bytecode size
reduction alone.


Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Donald Stufft

> On Apr 14, 2016, at 11:59 AM, Michael Mysinger via Python-Dev wrote:
> 
> In essence, you will force me to pre-wrap
> all RichPath objects in either os.fsencode(os.fspath(path)) or
> os.fsdecode(os.fspath(path)), just so I can reason about the type.


This is only the case if you have a singular RichPath object that can represent 
both bytes and str (which is what DirEntry does, which I agree makes it harder… 
but that’s already the case with DirEntry.path). However that’s not the case if 
you have a bRichPath and uRichPath.

-
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA





Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Victor Stinner
2016-04-14 17:29 GMT+02:00 Ethan Furman :
> Interoperability with other systems and/or libraries.  If we use
> surrogateescape to transform str to bytes, and the other side does not, we
> no longer have a workable path.

I guess that you mean a Python library? When you exchange data with
external programs or call C libraries, Python is responsible for
encoding Unicode to bytes with os.fsencode(). The external side is not
aware that Python uses surrogateescape; it gets "regular" bytes.

I suggest treating such a Python library like external programs and
libraries: convert Unicode to bytes with os.fsencode(), but still
process paths as Unicode "inside" your application.

This is the basic rule for handling Unicode correctly in an application:
decode inputs as early as possible, and encode back as late as
possible. Encode/decode at the borders.

Victor
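
Illustrative sketch (not part of the original message) of the round trip
described above. It names the surrogateescape error handler explicitly, since
that is what os.fsdecode() and os.fsencode() use on typical POSIX systems. The
UTF-8 encoding and the sample file name are assumptions chosen for the example.

```python
# Bytes as they might arrive from the OS: a Latin-1 encoded name that is
# not valid UTF-8 (sample data, assumed for illustration).
raw = b"caf\xe9.txt"

# Decode at the border: the undecodable byte 0xE9 becomes the lone
# surrogate U+DCE9 instead of raising UnicodeDecodeError.
text = raw.decode("utf-8", "surrogateescape")

# Inside the application the path is ordinary str and can be manipulated.
backup = text + ".bak"

# Encode at the border: surrogateescape restores the original byte.
round_tripped = backup.encode("utf-8", "surrogateescape")

# A consumer that encodes strictly cannot handle the same string; this is
# the interoperability concern raised in the quoted text.
try:
    backup.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

The last step shows why both sides of a boundary have to agree on the error
handler: the escaped string only turns back into bytes when surrogateescape is
used for encoding as well.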


Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Victor Stinner
2016-04-14 16:54 GMT+02:00 Ethan Furman :
>> I consider that the final goal of the whole discussion is to support
>> something like:
>>
>>  path = os.path.join(pathlib_path, "str_path", direntry)
>>
>> (...)
>> I expect that DirEntry.__fspath__ uses os.fsdecode() to return str,
>> just to make my life easier.
>
> This would be where we strongly disagree.

FYI it's ok that we disagree on this point, at least I expressed my opinion ;-)

At least we have now better identified a point of disagreement.

Victor


Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Stephen J. Turnbull
Random832 writes:

 > And what such incompatibilities exist between bytes and str for the
 > purpose of representing file paths?

A plethora of encodings.

 > At the end of the day, there's exactly one answer to "what file on
 > disk this represents (or would represent if it existed)".

Nope.  Suppose those bytes were read from a file or a socket?  It's
dangerous to assume that encoding matches the file system's.



Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Michael Mysinger via Python-Dev
Ethan Furman writes:

> On 04/14/2016 12:03 AM, Michael Mysinger via Python-Dev wrote:
> > In particular, one RichPath
> > class might return bytes and another str, or even worse the same class
> > might sometimes return bytes and sometimes str. When will os.path.join
> > blow up due to mixing bytes and str and when will it work in those
> > situations?
> 
> What are you asking here?  ...  Meaning allowing os.fspath() 
> and __fspath__ to return either bytes or str will never cause the 
> combination of bytes and str to work.  Said another way: if you are 
> using os.path.join then all the pieces have to be str or all the pieces 
> have to be bytes.

I am saying that if os.path.join now accepts RichPath objects, and those 
objects can return either str or bytes, then it's much harder to reason about 
when I have all bytes or all strings. In essence, you will force me to 
pre-wrap all RichPath objects in either os.fsencode(os.fspath(path)) or 
os.fsdecode(os.fspath(path)), just so I can reason about the type. And if I 
have to always do that wrapping then os.path.join doesn't need to accept 
RichPath objects and call fspath at all.
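
Illustrative sketch (not part of the original message) of the pre-wrapping
described above. StrPath, BytesPath and as_str are hypothetical names. The
sketch relies on os.fspath() and the __fspath__ protocol as they later shipped
in Python 3.6, which postdates this thread.

```python
import os


class StrPath:
    """Hypothetical rich path object whose __fspath__ returns str."""

    def __init__(self, path):
        self._path = path

    def __fspath__(self):
        return self._path


class BytesPath:
    """Hypothetical rich path object whose __fspath__ returns bytes."""

    def __init__(self, path):
        self._path = path

    def __fspath__(self):
        return self._path


def as_str(path):
    """Normalize any str, bytes or path-like object to str."""
    return os.fsdecode(os.fspath(path))


mixed = [StrPath("data"), BytesPath(b"raw"), "file.txt"]

# Passing the objects straight through mixes str and bytes.
try:
    os.path.join(*mixed)
    mixing_failed = False
except TypeError:
    mixing_failed = True

# Wrapping every argument makes the resulting type predictable.
joined = os.path.join(*(as_str(p) for p in mixed))
```

The caller only knows the result type in the second call, which is the
reasoning burden the paragraph above objects to.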





Re: [Python-Dev] Most 3.x buildbots are green again, please don't break them and watch them!

2016-04-14 Thread Brett Cannon
On Thu, 14 Apr 2016 at 03:26 Victor Stinner wrote:

>
> On 14 Apr 2016 11:16 AM, "Serhiy Storchaka" wrote:
> > A desirable but nonexistent feature is to write emails to authors of
> > commits that broke buildbots. How hard would this be to implement?
>
> Yeah, I have also had this idea for many years, but the buildbots were quite
> unstable. Maybe we should be stricter about when we consider a buildbot stable?
>

Depending on how fancy we get with our infrastructure after we move to
GitHub, we could theoretically end up with a PR-merging bot that can detect
which commit broke things and report on the PR that did it (as well as
report anywhere else we wanted to).


> I propose experimenting with sending failure notifications to the authors of
> changes *and* to a new mailing list. I would subscribe to such a list. An
> even safer starting point would be to only start with the mailing list.
>
> FYI I'm connected to the #python-dev IRC channel, which already contains
> these notifications. But I agree that mails are better.
>

Yeah, I'm one of those that doesn't sit on #python-dev due to the lack of a
persistently connected machine, so an email would work better (unless we
want to be trendy and write a bot for Slack/Skype/FB Messenger :).


> > What do you think about backporting the recent regrtest to 2.7? The
> > features I need most are the -m and -G options.
>
> Regrtest changed a lot in Python 3.6 (new test.libregrtest library).
> I suggest starting from Python 3.5.
>
> For -m: if it doesn't need to modify the unittest module, I agree.
>
> I don't know the -G option.
>
> > It would be nice to add a feature for running every test in a separate
> > subprocess. This would isolate the effect of failed tests.
>
> See my email :-) I proposed to modify -j1 to run tests in subprocesses. I
> even mentioned my issue.
>
> I suggest using -jN on all buildbots, at least -j1.
>
> Maybe -j2 is even better since many tests are waiting on IO or a simple
> sleep.
>

Both ideas seem reasonable.


Re: [Python-Dev] MAKE_FUNCTION simplification

2016-04-14 Thread Victor Stinner
2016-04-14 17:27 GMT+02:00 Guido van Rossum :
> Great analysis! What might stand in the way of adoption is concern for
> bytecode manipulation libraries that would have to be changed.
> (...)
> There's also talk of switching to wordcode, in a different thread.

I agree that breaking backward compatibility just for MAKE_FUNCTION is
not worth it. But if we accept the wordcode change, IMHO it's ok to take
this as an opportunity to also modify MAKE_FUNCTION.

> Maybe the idea would be easier to introduce there? (Bytecode libraries
> would have to change anyways, so the additional concern for this
> change would be minimal.)

Exactly ;-)

Victor


Re: [Python-Dev] MAKE_FUNCTION simplification

2016-04-14 Thread Victor Stinner
2016-04-14 11:04 GMT+02:00 Nikita Nemkin :
> MAKE_FUNCTION opcode is complex due to the way it receives
> input arguments: (...)

Yeah, I was always disturbed by how this opcode gets its parameters.

> My suggestion is to pre-package 1-4 before calling MAKE_FUNCTION,
> i.e. explicitly emit BUILD_TUPLE for defaults args and BUILD_MAPs
> for keyword defaults and annotations.

I read the code. In fact, I don't understand why it wasn't done like
that from the beginning :-p

> Then, MAKE_FUNCTION will become a dramatically simpler
> 5 argument opcode, taking

Would you like to work on a patch to implement that change?

Since Python 3.6 may get a new bytecode format (wordcode, see the
other thread on this mailing list), I think that it's ok to change
MAKE_FUNCTION in the same release.

Victor


Re: [Python-Dev] MAKE_FUNCTION simplification

2016-04-14 Thread Guido van Rossum
Great analysis! What might stand in the way of adoption is concern for
bytecode manipulation libraries that would have to be changed. What
might encourage adoption would be a benchmark showing this saves a lot
of time.

Personally I'm expecting it won't make much of a difference for real
programs since almost always the cost of creating the function is
dwarfed by the (total) cost of running it. But Python does create a
lot of functions, and there's also lambdas.

There's also talk of switching to wordcode, in a different thread.
Maybe the idea would be easier to introduce there? (Bytecode libraries
would have to change anyways, so the additional concern for this
change would be minimal.)

On Thu, Apr 14, 2016 at 2:04 AM, Nikita Nemkin  wrote:
> MAKE_FUNCTION opcode is complex due to the way it receives
> input arguments:
>
>  1) default args, individually;
>  2) default kwonly args, individual name-value pairs;
>  3) a tuple of parameter names (single constant);
>  4) annotation values, individually;
>  5) code object;
>  6) qualname.
>
> The counts for 1,2,4 are packed into oparg bitfields, making oparg large.
>
> My suggestion is to pre-package 1-4 before calling MAKE_FUNCTION,
> i.e. explicitly emit BUILD_TUPLE for defaults args and BUILD_MAPs
> for keyword defaults and annotations.
>
> Then, MAKE_FUNCTION will become a dramatically simpler
> 5 argument opcode, taking
>
>  1) default args tuple (optional);
>  2) default keyword only args dict (optional);
>  3) annotations dict (optional);
>  4) code object;
>  5) qualname.
>
> These arguments correspond exactly to __annotations__, __kwdefaults__,
> __defaults__, __code__ and __qualname__ attributes.
>
> For optional args, oparg bits should indicate individual arg presence.
> (This also saves None checks in opcode implementation.)
>
> If we add another optional argument (and oparg bit) for __closure__
> attribute, then separate MAKE_CLOSURE opcode becomes unnecessary.
>
> Default args tuple is likely to be a constant and can be packaged whole,
> compensating for the extra size of explicit BUILD_* instructions.
>
> Compare the current implementation:
>
> https://github.com/python/cpython/blob/master/Python/ceval.c#L3262
>
> with this provisional implementation (untested):
>
> TARGET(MAKE_FUNCTION) {
>     PyObject *qualname = POP();
>     PyObject *codeobj = POP();
>     PyFunctionObject *func;
>     func = (PyFunctionObject *)PyFunction_NewWithQualName(
>                codeobj, f->f_globals, qualname);
>     Py_DECREF(codeobj);
>     Py_DECREF(qualname);
>     if (func == NULL)
>         goto error;
>
>     /* NB: Py_None is not an acceptable value for these. */
>     if (oparg & 0x08)
>         func->func_closure = POP();
>     if (oparg & 0x04)
>         func->func_annotations = POP();
>     if (oparg & 0x02)
>         func->func_kwdefaults = POP();
>     if (oparg & 0x01)
>         func->func_defaults = POP();
>
>     PUSH((PyObject *)func);
>     DISPATCH();
> }
>
> compile.c also gets a bit simpler, but not much.
>
> What do you think?



-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Ethan Furman

On 04/14/2016 07:01 AM, Random832 wrote:
> On Thu, Apr 14, 2016, at 09:50, Chris Angelico wrote:
>> Adding integers and floats is considered "safe" because most people's
>> use of floats completely compasses their use of ints. (You'll get
>> OverflowError if it can't be represented.) But float and Decimal are
>> considered "unsafe":
>>
>> --> 1.5 + decimal.Decimal("1.5")
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> TypeError: unsupported operand type(s) for +: 'float' and
>> 'decimal.Decimal'
>>
>> This is more what's happening here. Floats and Decimals can represent
>> similar sorts of things, but with enough incompatibilities that you
>> can't simply merge them.
>
> And what such incompatibilities exist between bytes and str for the
> purpose of representing file paths? At the end of the day, there's
> exactly one answer to "what file on disk this represents (or would
> represent if it existed)".


Interoperability with other systems and/or libraries.  If we use 
surrogateescape to transform str to bytes, and the other side does not, 
we no longer have a workable path.


--
~Ethan~



Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-14 Thread Ethan Furman

On 04/14/2016 07:52 AM, Stephen J. Turnbull wrote:
> Nick Coghlan writes:
>
>> The use case for returning bytes from __fspath__ is DirEntry, so you
>> can write things like this in low level code:
>>
>> def myscandir(dirpath):
>>     for entry in os.scandir(dirpath):
>>         if entry.is_file():
>>             with open(entry) as f:
>>                 # do something
>
> Excuse me, but that is *not* a use case for returning bytes from
> DirEntry.__fspath__.  open() is perfectly happy taking str (including
> surrogate-encoded rawbytes).

Substitute open() with sending those bytes somewhere else: why should I 
have to reencode this str back to bytes, when bytes are what I asked for 
in the first place?

> If the trivial thing is for __fspath__
> to return bytes, then implicitly applying os.fsencode to the value
> being returned is almost as trivial, and just as safe.  A low price to
> pay for ensuring that text applications don't crash just because a
> bytes-oriented object decides to implement __fspath__.

How did this application get a bytes path object to begin with?  Either 
it explicitly used bytes when calling scandir and friends (in which case 
it shouldn't be surprised to be working with bytes); or it got that 
bytes object from a database, over-the-wire, an-other-language-lib, etc. 
Those are the boundaries where bytes should be transformed to str if 
the app doesn't want to deal with bytes (whether for path manipulation 
or other text manipulation).  os.fspath() is not a boundary function and 
shouldn't be used as if it were.

> If there's any cost to defining __fspath__ as str-only, it's some
> other use case.  What consumer of __fspath__ that expects bytes but
> not str do you envision?  Is it generalizable, so that applying
> fsencode to the value of __fspath__ would lead to "unacceptably"
> widespread sprinkling of fsencode all over bytes-oriented code?

If I'm working with bytes, why would I want to work with str?  Python is 
a glue language, and Python practitioners don't always have the luxury 
of working only with text.


--
~Ethan~


[Python-Dev] RFC: PEP 509: Add a private version to dict

2016-04-14 Thread Victor Stinner
Hi,

I updated my PEP 509 to make the dictionary version globally unique.
With *two* use cases of this PEP (Yury's method call patch and my FAT
Python project), I think that the PEP is now ready to be accepted.

A globally unique identifier is a requirement for Yury's patch
optimizing method calls ( https://bugs.python.org/issue26110 ). It
makes it possible to check, at no extra cost, whether the dictionary
was replaced.

I also renamed the ma_version field to ma_version_tag.

HTML version:
https://www.python.org/dev/peps/pep-0509/

Victor


PEP: 509
Title: Add a private version to dict
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner 
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 4-January-2016
Python-Version: 3.6


Abstract
========

Add a new private version to the builtin ``dict`` type, incremented at
each dictionary creation and at each dictionary change, to implement
fast guards on namespaces.


Rationale
=========

In Python, the builtin ``dict`` type is used by many instructions. For
example, the ``LOAD_GLOBAL`` instruction searches for a variable in the
global namespace, or in the builtins namespace (two dict lookups).
Python uses ``dict`` for the builtins namespace, globals namespace, type
namespaces, instance namespaces, etc. The local namespace (namespace of
a function) is usually optimized to an array, but it can be a dict too.

Python is hard to optimize because almost everything is mutable: builtin
functions, function code, global variables, local variables, ... can be
modified at runtime. Implementing optimizations that respect Python
semantics requires detecting when "something changes": we will call
these checks "guards".
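
A small sketch, added for illustration, of the mutability described above. The
``namespace`` dict and the ``size`` function are hypothetical names. A function
that calls ``len`` silently changes behaviour when its globals namespace gains
an entry that shadows the builtin::

```python
# Define a function whose global namespace is an explicit dict.
namespace = {}
exec("def size(x):\n    return len(x)\n", namespace)
size = namespace["size"]

# First call: 'len' is not in the globals, so the builtin is used.
before = size("abc")

# Shadow the builtin at runtime by adding a global named 'len'.
namespace["len"] = lambda x: 42
shadowed = size("abc")

# Removing the global makes the builtin visible again.
del namespace["len"]
after = size("abc")
```

Any optimization that replaced ``len(x)`` with a cached builtin would have to
notice both modifications, which is the job of a guard.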

The speedup of optimizations depends on the speed of guard checks. This
PEP proposes to add a version to dictionaries to implement fast guards
on namespaces.

Dictionary lookups can be skipped if the version does not change, which
is the common case for most namespaces. Since the version is globally
unique, the version is also enough to check that the namespace dictionary
was not replaced with a new dictionary. The cost of a guard check does
not depend on the number of watched dictionary entries: it is O(1) as
long as the dictionary version does not change.

Example of optimization: copy the value of a global variable to function
constants.  This optimization requires a guard on the global variable to
check if it was modified. If the variable is modified, the variable must
be loaded at runtime when the function is called, instead of using the
constant.

See the `PEP 510 -- Specialized functions with guards
<https://www.python.org/dev/peps/pep-0510/>`_ for the concrete usage of
guards to specialize functions and for the rationale on Python static
optimizers.


Guard example
=============

Pseudo-code of a fast guard to check if a dictionary entry was modified
(created, updated or deleted) using a hypothetical
``dict_get_version(dict)`` function::

    UNSET = object()

    class GuardDictKey:
        def __init__(self, dict, key):
            self.dict = dict
            self.key = key
            self.value = dict.get(key, UNSET)
            self.version = dict_get_version(dict)

        def check(self):
            """Return True if the dictionary entry did not change
            and the dictionary was not replaced."""

            # read the version of the dict structure
            version = dict_get_version(self.dict)
            if version == self.version:
                # Fast-path: dictionary lookup avoided
                return True

            # lookup in the dictionary
            value = self.dict.get(self.key, UNSET)
            if value is self.value:
                # another key was modified:
                # cache the new dictionary version
                self.version = version
                return True

            # the key was modified
            return False
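
The pseudo-code above can be exercised with a runnable sketch, given here under
stated assumptions: ``VersionedDict`` is a hypothetical ``dict`` subclass that
emulates the proposed version tag in pure Python, and it tracks only item
assignment and deletion, not ``update()``, ``pop()`` and similar methods. One
process-wide counter is shared by all instances, so a tag value is never
reused across dictionaries::

```python
import itertools

UNSET = object()
_next_version = itertools.count(1)  # single counter shared by every dict


class VersionedDict(dict):
    """Hypothetical emulation of a dict carrying a globally unique version."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = next(_next_version)  # bumped at creation

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version = next(_next_version)  # bumped at each change

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version = next(_next_version)


def dict_get_version(d):
    return d.version


class GuardDictKey:
    def __init__(self, d, key):
        self.dict = d
        self.key = key
        self.value = d.get(key, UNSET)
        self.version = dict_get_version(d)

    def check(self):
        version = dict_get_version(self.dict)
        if version == self.version:
            return True  # fast path: no dictionary lookup
        if self.dict.get(self.key, UNSET) is self.value:
            self.version = version  # another key changed: cache new version
            return True
        return False  # the watched key changed


namespace = VersionedDict(x=1, y=2)
guard = GuardDictKey(namespace, "x")
unchanged = guard.check()
namespace["y"] = 3
other_key_changed = guard.check()
namespace["x"] = 10
watched_key_changed = guard.check()

# Two dicts, one operation applied to the first: the tags still differ,
# because they come from the shared counter rather than per-dict counters.
a = VersionedDict()
b = VersionedDict()
a["k"] = 1
```

With per-dict counters starting at the same value, ``a`` and ``b`` could end up
with equal tags after the same number of mutations, and the tag alone would no
longer tell the two dictionaries apart.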


Usage of the dict version
=========================

Speedup method calls 1.2x
-------------------------

Yury Selivanov wrote a `patch to optimize method calls
<https://bugs.python.org/issue26110>`_. The patch depends on the
"implement per-opcode cache in ceval" patch, which requires dictionary
versions to invalidate the cache if the globals dictionary or the
builtins dictionary has been modified.

The cache also requires that the dictionary version is globally unique.
It is possible to define a function in a namespace and call it
in a different namespace: using ``exec()`` with the *globals* parameter
for example. In this case, the globals dictionary was changed and the
cache must be invalidated.
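
A minimal sketch of the situation described above, added for illustration (the
names ``helper`` and ``result`` are arbitrary). The same code object is run
against two different globals dictionaries, so a cache attached to the code
must be able to tell that the dictionary it last saw was replaced::

```python
# One compiled code object, reused with two unrelated namespaces.
code = compile("result = helper()", "<demo>", "exec")

first_globals = {"helper": lambda: "first"}
second_globals = {"helper": lambda: "second"}

exec(code, first_globals)
exec(code, second_globals)
```

A value of ``helper`` cached during the first run would be wrong in the second
run, even though neither dictionary was mutated between the two calls.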


Specialized functions using guards
----------------------------------

The `PEP 510 -- Specialized functions with guards
<https://www.python.org/dev/peps/pep-0510/>`_ proposes an API to support
specialized functions with guards. It makes it possible to implement static
optimizers for Python without breaking the Python 

[Python-Dev] MAKE_FUNCTION simplification

2016-04-14 Thread Nikita Nemkin
MAKE_FUNCTION opcode is complex due to the way it receives
input arguments:

 1) default args, individually;
 2) default kwonly args, individual name-value pairs;
 3) a tuple of parameter names (single constant);
 4) annotation values, individually;
 5) code object;
 6) qualname.

The counts for 1,2,4 are packed into oparg bitfields, making oparg large.

My suggestion is to pre-package 1-4 before calling MAKE_FUNCTION,
i.e. explicitly emit BUILD_TUPLE for defaults args and BUILD_MAPs
for keyword defaults and annotations.

Then, MAKE_FUNCTION will become a dramatically simpler
5 argument opcode, taking

 1) default args tuple (optional);
 2) default keyword only args dict (optional);
 3) annotations dict (optional);
 4) code object;
 5) qualname.

These arguments correspond exactly to __annotations__, __kwdefaults__,
__defaults__, __code__ and __qualname__ attributes.

For optional args, oparg bits should indicate individual arg presence.
(This also saves None checks in opcode implementation.)

If we add another optional argument (and oparg bit) for __closure__
attribute, then separate MAKE_CLOSURE opcode becomes unnecessary.

Default args tuple is likely to be a constant and can be packaged whole,
compensating for the extra size of explicit BUILD_* instructions.

Compare the current implementation:

https://github.com/python/cpython/blob/master/Python/ceval.c#L3262

with this provisional implementation (untested):

TARGET(MAKE_FUNCTION) {
    PyObject *qualname = POP();
    PyObject *codeobj = POP();
    PyFunctionObject *func;
    func = (PyFunctionObject *)PyFunction_NewWithQualName(
               codeobj, f->f_globals, qualname);
    Py_DECREF(codeobj);
    Py_DECREF(qualname);
    if (func == NULL)
        goto error;

    /* NB: Py_None is not an acceptable value for these. */
    if (oparg & 0x08)
        func->func_closure = POP();
    if (oparg & 0x04)
        func->func_annotations = POP();
    if (oparg & 0x02)
        func->func_kwdefaults = POP();
    if (oparg & 0x01)
        func->func_defaults = POP();

    PUSH((PyObject *)func);
    DISPATCH();
}

compile.c also gets a bit simpler, but not much.

What do you think?
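
Illustrative sketch (not part of the original message): the proposed stack
layout can be modelled in pure Python with types.FunctionType. The
make_function helper, the flag names and the template function are
hypothetical. The closure bit (0x08) is left out because __closure__ cannot be
assigned from Python code.

```python
import types

# Flag bits as in the provisional C implementation above.
HAS_DEFAULTS = 0x01
HAS_KWDEFAULTS = 0x02
HAS_ANNOTATIONS = 0x04


def make_function(stack, oparg, globals_dict):
    """Model of the simplified opcode: pop pre-packaged arguments, push a function."""
    qualname = stack.pop()
    codeobj = stack.pop()
    func = types.FunctionType(codeobj, globals_dict, qualname)
    func.__qualname__ = qualname
    # Optional arguments are present only when their oparg bit is set.
    if oparg & HAS_ANNOTATIONS:
        func.__annotations__ = stack.pop()
    if oparg & HAS_KWDEFAULTS:
        func.__kwdefaults__ = stack.pop()
    if oparg & HAS_DEFAULTS:
        func.__defaults__ = stack.pop()
    stack.append(func)


def template(a, b=1, *, c=2) -> int:
    return a + b + c


# Push order: defaults tuple, kwdefaults dict, annotations dict, code, qualname.
stack = [(10,), {"c": 20}, {"return": int}, template.__code__, "rebuilt"]
make_function(stack, HAS_DEFAULTS | HAS_KWDEFAULTS | HAS_ANNOTATIONS, globals())
rebuilt = stack.pop()

# With oparg == 0 only the code object and qualname are consumed.
bare_stack = [template.__code__, "bare"]
make_function(bare_stack, 0, globals())
bare = bare_stack.pop()
```

Each optional item is a single ready-made object, so the model needs no
per-argument counts, which is the simplification the proposal is after.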


Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Ethan Furman

On 04/14/2016 06:56 AM, Victor Stinner wrote:
> 2016-04-14 15:40 GMT+02:00 Nick Coghlan:
>> Even earlier, Victor Stinner wrote:
>>> I consider that the final goal of the whole discussion is to support
>>> something like:
>>>
>>>  path = os.path.join(pathlib_path, "str_path", direntry)
>>
>> That's not a *new* problem though, it already exists if you pass in a
>> mix of bytes and str:
>> (...)
>> There's also already a solution (regardless of whether you want bytes
>> or str as the result), which is to explicitly coerce all the arguments
>> to the same type:
>>
>> --> os.path.join(*map(os.fsdecode, ("str", b"bytes")))
>> (...)
>
> I don't understand. What is the point of adding a new __fspath__
> protocol to *implicitly* convert path objects to strings, if you still
> have to use an explicit conversion?

That's the crux of the issue -- some of us think the job of __fspath__ 
is to simply retrieve the inherent data from the pathy object, *not* to 
do any implicit conversions.

> I would really expect that a high-level API like pathlib would solve
> encodings issues for me. IMHO DirEntry entries created by
> os.scandir(bytes) must use os.fsdecode() in their __fspath__ method.

Then let pathlib do it. As a high-level interface I have no issue with 
pathlib converting DirEntry bytes objects to str using fsdecode (or 
whatever makes sense); os.path.join (and by extension os.fspath and 
__fspath__) should do no such thing.

>> os.path.join(*map(os.fsdecode, ("str", b"bytes")))
>
> This code is quite complex for a newbie, don't you think so?

A newbie should be using pathlib.  If pathlib is not low-level enough, 
then the newbie needs to learn about low-level stuff.


--
~Ethan~


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-14 Thread Stephen J. Turnbull
Random832 writes:
 > On Thu, Apr 14, 2016, at 03:02, Stephen J. Turnbull wrote:

 > > I have a strong preference for str only, because I still don't see a
 > > use case for polymorphic __fspath__.
 > 
 > Ultimately we're talking about redundancy and performance here.

Ultimately, yes.  Right now I have some epithets for you:  Premature!
Optimization!!  Get thee behind me, Satan!

More seriously, concrete use cases where this overhead matters?

Church-of-Don-Knuth-member-ly y'rs,


Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Ethan Furman

On 04/14/2016 05:16 AM, Victor Stinner wrote:

> I consider that the final goal of the whole discussion is to support
> something like:
>
>  path = os.path.join(pathlib_path, "str_path", direntry)
>
> Even if direntry uses a bytes filename. I expect genericpath.join() to
> be patched to use os.fspath(). If os.fspath() returns bytes,
> path.join() will fail with an annoying TypeError.
>
> I expect that DirEntry.__fspath__ uses os.fsdecode() to return str,
> just to make my life easier.

This would be where we strongly disagree.  If pathlib, as a high-level 
construct, wants to take that approach I have no issues, but the 
functions in os are low-level and as such should not be changing data 
types unless I ask for it.  I see __fspath__ as a retrieval mechanism, 
not a data-transformation mechanism.

> You can apply the same rationale for the flavors 2 and 3
> (os.fspath(path, allow_bytes=True)). Indirectly, you will get similar
> TypeError on os.path.join().

And that's fine.  Low-level interfaces should not change data types 
unless explicitly requested -- and we have fsencode() and fsdecode() for 
that.


--
~Ethan~



Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-14 Thread Stephen J. Turnbull
Nick Coghlan writes:

 > The use case for returning bytes from __fspath__ is DirEntry, so you
 > can write things like this in low level code:
 > 
 > def myscandir(dirpath):
 >     for entry in os.scandir(dirpath):
 >         if entry.is_file():
 >             with open(entry) as f:
 >                 # do something

Excuse me, but that is *not* a use case for returning bytes from
DirEntry.__fspath__.  open() is perfectly happy taking str (including
surrogate-encoded rawbytes).  If the trivial thing is for __fspath__
to return bytes, then implicitly applying os.fsencode to the value
being returned is almost as trivial, and just as safe.  A low price to
pay for ensuring that text applications don't crash just because a
bytes-oriented object decides to implement __fspath__.

If there's any cost to defining __fspath__ as str-only, it's some
other use case.  What consumer of __fspath__ that expects bytes but
not str do you envision?  Is it generalizable, so that applying
fsencode to the value of __fspath__ would lead to "unacceptably"
widespread sprinkling of fsencode all over bytes-oriented code?

The more I think about this, the more I like my proposal to junk
fspath, and have fsdecode and fsencode consume __fspath__.  That way
application code can request its native type.
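
Illustrative sketch (not part of the original message) of the DirEntry case
under discussion. It uses os.scandir(), os.fspath() and os.fsdecode() as they
behave in released Python 3.6 and later, which postdates this thread. The
temporary directory and the file name are arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "example.txt"), "w") as f:
        f.write("hello")

    # Scanning with a str path yields entries whose path is str.
    with os.scandir(tmp) as entries:
        str_entry = list(entries)[0]

    # Scanning with a bytes path yields entries whose path is bytes.
    with os.scandir(os.fsencode(tmp)) as entries:
        bytes_entry = list(entries)[0]

    str_path = os.fspath(str_entry)
    bytes_path = os.fspath(bytes_entry)

    # The application requests its native type explicitly.
    decoded = os.fsdecode(bytes_entry)

    # open() accepts the bytes-flavoured entry directly.
    with open(bytes_entry) as f:
        content = f.read()
```

Here os.fspath() only retrieves what the entry holds, while os.fsdecode() is
the call that converts to the type the application asked for.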

 > By contrast, as soon as you type "import pathlib" at the top of your
 > file, you've stepped outside the world of potentially pure boundary
 > code,

"Potentially pure" is an odd term to apply to the boundary code IMO.
We are agreed that conceptually paths are text, for human consumption
(at least at last report we were).  Therefore, paths represented as
bytes are inherently an impure construct.  Viz, surrogateescape.

 > and are instead dealing with structured application level
 > objects (which means traversing the bytes->str boundary before the
 > str->Path one).

That assumes that pathlib.Path's str-only design is appropriate.  I'm
questioning that, primarily as a thought experiment.

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Ethan Furman

On 04/14/2016 12:03 AM, Michael Mysinger via Python-Dev wrote:
> Brett Cannon writes:
>
> After playing with and considering the 4 possibilities, anything where
> __fspath__ can return bytes seems like insanity that flies in the face of
> everything Python 3 is trying to accomplish. In particular, one RichPath
> class might return bytes and another str, or even worse the same class might
> sometimes return bytes and sometimes str. When will os.path.join blow up due
> to mixing bytes and str and when will it work in those situations?

What are you asking here?  Exactly where in os.path.join mixing bytes & str 
the exception will occur, or will mixing bytes & str ever work?

The answer to the first is irrelevant (except for performance).

The answer to the second is always/never.  Meaning allowing os.fspath() 
and __fspath__ to return either bytes or str will never cause the 
combination of bytes and str to work.  Said another way: if you are 
using os.path.join then all the pieces have to be str or all the pieces 
have to be bytes.


--
~Ethan~
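
Illustrative sketch (not part of the original message) of the all-str or
all-bytes rule, using only plain str and bytes arguments. try_join is a
hypothetical helper that returns the exception instead of raising it.

```python
import os


def try_join(*parts):
    """Return the joined path, or the TypeError raised by os.path.join."""
    try:
        return os.path.join(*parts)
    except TypeError as exc:
        return exc


all_str = try_join("usr", "lib")
all_bytes = try_join(b"usr", b"lib")
mixed = try_join("usr", b"lib")

# Explicit coercion, as suggested elsewhere in the thread, gives one type.
coerced = os.path.join(*map(os.fsdecode, ("usr", b"lib")))
```

Only the mixed call fails. The two homogeneous calls return a result of the
same type as their inputs.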



Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Victor Stinner
2016-04-14 15:40 GMT+02:00 Nick Coghlan :
>> I consider that the final goal of the whole discussion is to support
>> something like:
>>
>> path = os.path.join(pathlib_path, "str_path", direntry)
>
> That's not a *new* problem though, it already exists if you pass in a
> mix of bytes and str:
> (...)
> There's also already a solution (regardless of whether you want bytes
> or str as the result), which is to explicitly coerce all the arguments
> to the same type:
>
> >>> os.path.join(*map(os.fsdecode, ("str", b"bytes")))
> (...)

I don't understand. What is the point of adding a new __fspath__
protocol to *implicitly* convert path objects to strings, if you still
have to use an explicit conversion?

I would really expect that a high-level API like pathlib would solve
encoding issues for me. IMHO DirEntry entries created by
os.scandir(bytes) must use os.fsdecode() in their __fspath__ method.

os.path.join() is just one example of an operation on multiple paths.
Look at os.path for other examples ;-)

> os.path.join(*map(os.fsdecode, ("str", b"bytes")))

This code is quite complex for a newbie, don't you think so?

My example was os.path.join(pathlib_path, "str_path", direntry) where
we can do something to make the API easier to use.

I don't propose to do anything for os.path.join("str", b"bytes") which
would continue to fail with TypeError, *as expected*.

Victor


Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Random832
On Thu, Apr 14, 2016, at 09:50, Chris Angelico wrote:
> Adding integers and floats is considered "safe" because most people's
> use of floats completely compasses their use of ints. (You'll get
> OverflowError if it can't be represented.) But float and Decimal are
> considered "unsafe":
> 
> >>> 1.5 + decimal.Decimal("1.5")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: unsupported operand type(s) for +: 'float' and
> 'decimal.Decimal'
> 
> This is more what's happening here. Floats and Decimals can represent
> similar sorts of things, but with enough incompatibilities that you
> can't simply merge them.

And what such incompatibilities exist between bytes and str for the
purpose of representing file paths? At the end of the day, there's
exactly one answer to "what file on disk this represents (or would
represent if it existed)".


Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Random832
On Thu, Apr 14, 2016, at 09:40, Nick Coghlan wrote:
> That's not a *new* problem though, it already exists if you pass in a
> mix of bytes and str:
> 
> There's also already a solution (regardless of whether you want bytes
> or str as the result), which is to explicitly coerce all the arguments
> to the same type:

It'd be nice if that went away. Having to do that makes about as much
sense to me as if you had to explicitly coerce an int to a float to add
them together. Sure, explicit is better than implicit, but there are
limits. You're explicitly calling os.path.join; isn't that explicit
enough?


Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Chris Angelico
On Thu, Apr 14, 2016 at 11:45 PM, Random832  wrote:
> On Thu, Apr 14, 2016, at 09:40, Nick Coghlan wrote:
>> That's not a *new* problem though, it already exists if you pass in a
>> mix of bytes and str:
>>
>> There's also already a solution (regardless of whether you want bytes
>> or str as the result), which is to explicitly coerce all the arguments
>> to the same type:
>
> It'd be nice if that went away. Having to do that makes about as much
> sense to me as if you had to explicitly coerce an int to a float to add
> them together. Sure, explicit is better than implicit, but there are
> limits. You're explicitly calling os.path.join; isn't that explicit
> enough?

Adding integers and floats is considered "safe" because most people's
use of floats completely compasses their use of ints. (You'll get
OverflowError if it can't be represented.) But float and Decimal are
considered "unsafe":

>>> 1.5 + decimal.Decimal("1.5")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'float' and 'decimal.Decimal'

This is more what's happening here. Floats and Decimals can represent
similar sorts of things, but with enough incompatibilities that you
can't simply merge them.
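
The three cases above can be reproduced with a short sketch. The helper describe_add is hypothetical; it only reports the type of the sum, or the name of the exception raised.

```python
from decimal import Decimal


def describe_add(a, b):
    """Return the type name of a + b, or the name of the exception raised."""
    try:
        return type(a + b).__name__
    except (TypeError, OverflowError) as exc:
        return type(exc).__name__


int_plus_float = describe_add(1, 1.5)                   # int is widened to float
huge_int_plus_float = describe_add(10 ** 400, 1.5)      # too large for a float
float_plus_decimal = describe_add(1.5, Decimal("1.5"))  # no implicit merge
```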

ChrisA


Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Nick Coghlan
On 14 April 2016 at 22:16, Victor Stinner  wrote:
> 2016-04-13 19:10 GMT+02:00 Brett Cannon :
>> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has the
>> four potential approaches implemented (although it doesn't follow the
>> "separate functions" approach some are proposing and instead goes with the
>> allow_bytes approach I originally proposed).
>
> IMHO the best argument against the flavor 4 (fspath: str or bytes
> allowed) is the os.path.join() function.
>
> I consider that the final goal of the whole discussion is to support
> something like:
>
> path = os.path.join(pathlib_path, "str_path", direntry)

That's not a *new* problem though, it already exists if you pass in a
mix of bytes and str:

>>> import os.path
>>> os.path.join("str", b"bytes")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.4/posixpath.py", line 89, in join
    "components") from None
TypeError: Can't mix strings and bytes in path components

There's also already a solution (regardless of whether you want bytes
or str as the result), which is to explicitly coerce all the arguments
to the same type:

>>> os.path.join(*map(os.fsdecode, ("str", b"bytes")))
'str/bytes'
>>> os.path.join(*map(os.fsencode, ("str", b"bytes")))
b'str/bytes'

Assuming os.fsdecode and os.fsencode are updated to call os.fspath on
their argument before continuing with the current logic, the latter
two forms would both start automatically handling both DirEntry and
pathlib objects, while the first form would continue to throw
TypeError if handed an unexpected bytes value (whether directly or via
an __fspath__ call).
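
A sketch of the two coercing forms, assuming Python 3.6 or later, where os.fsdecode and os.fsencode accept objects implementing __fspath__. The helper names join_as_str and join_as_bytes are hypothetical.

```python
import os
import pathlib


def join_as_str(*parts):
    """Coerce every part to str, then join."""
    return os.path.join(*map(os.fsdecode, parts))


def join_as_bytes(*parts):
    """Coerce every part to bytes, then join."""
    return os.path.join(*map(os.fsencode, parts))


# A str, a bytes object and a path object mixed in one call.
parts = ("str", b"bytes", pathlib.PurePosixPath("path"))
as_str = join_as_str(*parts)
as_bytes = join_as_bytes(*parts)
```

The caller picks the result type once, by choosing the helper, rather than per argument.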

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-14 Thread Random832
On Thu, Apr 14, 2016, at 03:02, Stephen J. Turnbull wrote:
> I have a strong preference for str only, because I still don't see a
> use case for polymorphic __fspath__.

Ultimately we're talking about redundancy and performance here. The "use
case" such as there is one, is if there's a class (be it DirEntry or
whatever else) that natively stores bytes, and __fspath__ has to return
str, then it calls fsdecode and then open immediately turns around and
calls fsencode on the result, accomplishing nothing vs just passing
everything straight through.
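
The round trip being described can be sketched as follows. The sample file name is made up for illustration; the point is only that decoding and then re-encoding gives back the original bytes.

```python
import os

# Hypothetical bytes file name, as a bytes-native class might store it.
raw = b"caf\xc3\xa9.txt"

as_text = os.fsdecode(raw)        # what a str-only __fspath__ would return
back_again = os.fsencode(as_text) # what the consumer then does before the OS call
```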


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-14 Thread Random832
On Thu, Apr 14, 2016, at 02:00, Nick Coghlan wrote:
> > If the protocol can return bytes, then that means that types (DirEntry?
> > someone had an alternate path library with a bPath?) which return bytes
> > via the protocol will proliferate, and cannot be safely passed to
> > anything that uses os.fspath. Numerous copies of "def myfspath(x):
> > return os.fsdecode(os._raw_fspath(x))" will proliferate (or they'll just
> > monkey-patch os.fspath), and no-one actually uses os.fspath except toy
> > examples.
> 
> If folks want coercion, they can just use os.fsdecode(x), as that
> already has a str -> str passthrough from the input to the output
> (unlike codecs.decode) and will presumably be updated to include an
> implicit call to os._raw_fspath() on the passed in object.

This is the first I've heard of any suggestion to have fsdecode accept
non-strings.

> > Why is it so objectionable for os.fspath to do coercion?
> 
> The first problem is that binary paths on Windows basically don't
> work, so it's preferable for them to fail fast regardless of platform,
> rather than to have them implicitly work on *nix, only to fail for
> Windows users using non-ASCII paths later.

Ideally, this warning would be raised from a central place, and even
fspath (and even fsdecode) would go through it.


Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Victor Stinner
2016-04-13 19:10 GMT+02:00 Brett Cannon :
> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has the
> four potential approaches implemented (although it doesn't follow the
> "separate functions" approach some are proposing and instead goes with the
> allow_bytes approach I originally proposed).

IMHO the best argument against the flavor 4 (fspath: str or bytes
allowed) is the os.path.join() function.

I consider that the final goal of the whole discussion is to support
something like:

path = os.path.join(pathlib_path, "str_path", direntry)

Even if direntry uses a bytes filename. I expect genericpath.join() to
be patched to use os.fspath(). If os.fspath() returns bytes,
path.join() will fail with an annoying TypeError.

I expect that DirEntry.__fspath__ uses os.fsdecode() to return str,
just to make my life easier.
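
A minimal sketch of that expectation, assuming Python 3.6 or later, where os.path.join calls os.fspath on its arguments. DecodingEntry is a hypothetical stand-in for a bytes-backed entry, not the real os.DirEntry.

```python
import os


class DecodingEntry:
    """Toy entry that stores a bytes name but always exposes str."""

    def __init__(self, raw_name):
        self._raw_name = raw_name

    def __fspath__(self):
        # Decode with the filesystem encoding so callers only ever see str.
        return os.fsdecode(self._raw_name)


joined = os.path.join("str_path", DecodingEntry(b"entry"))
```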

I recall that I used to say that Python 2 doesn't support Unicode
filenames because os.path.join() raises a UnicodeDecodeError when you
try to join a Unicode filename with a byte filename which contains
non-ASCII bytes. The problem occurs indirectly in code using hardcoded
paths, Unicode or bytes paths. Saying that "Python 2 doesn't support
Unicode filenames" is wrong, but since Unicode is a hard problem, I
tried to simplify my explanation :-)

You can apply the same rationale for the flavors 2 and 3
(os.fspath(path, allow_bytes=True)). Indirectly, you will get similar
TypeError on os.path.join().

Victor


Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units

2016-04-14 Thread Victor Stinner
On Thursday, 14 April 2016, Nick Coghlan wrote:
>
> > IMHO it's not a big deal to update these projects for the future
> > Python 3.6. I can even help them to support the new bytecode format.
>
> We've also had previous discussions on adding a "minimum viable
> bytecode editing" API to the standard library, and updating these
> third party modules to support wordcode instead of bytecode could
> provide a good use-case-driven opportunity for defining that (i.e. it
> wouldn't be about providing an end user facing API directly, but
> rather about letting CPython take care of the bookkeeping details for
> things like lnotab and sorting out jump targets).

Yeah, I know this discussion well since it started with my PEP 511. I
wrote the bytecode project as a tool for the discussion, to try to
understand the use case better. The main task was to design the API.

I first looked at the byteplay and codetransformer projects, but I found
some issues in their API design. IMHO their API is not the best for
modifying bytecode.

My goal is to support Bytecode.from_code(code).to_code()==code: store
enough information to be able to emit again exactly the same bytecode
(line numbers, exact argument value, etc.).

I started with a long email, but I decided to document differences in
bytecode documentation:
https://bytecode.readthedocs.org/en/latest/byteplay_codetransformer.html

Victor


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-14 Thread Nick Coghlan
On 14 April 2016 at 17:02, Stephen J. Turnbull  wrote:
> But WDOT?  I'd especially like to hear if Nick is tempted to flip-flop
> (so far he's been in the "pathlib is a text utility" camp).

pathlib is too high level (i.e. has too many dependencies) to be used
in low level boundary code.

The use case for returning bytes from __fspath__ is DirEntry, so you
can write things like this in low level code:

    def myscandir(dirpath):
        for entry in os.scandir(dirpath):
            if entry.is_file():
                with open(entry) as f:
                    # do something

and still have them automatically inherit the str/bytes handling of
the core standard library APIs.
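
A runnable variant of that idea, assuming Python 3.6 or later, where open() accepts DirEntry objects and os.scandir() works as a context manager. The function name read_files and the temporary file are hypothetical and only serve the demonstration.

```python
import os
import tempfile


def read_files(dirpath):
    """Map each regular file's name to its contents, for a str or bytes dirpath."""
    contents = {}
    with os.scandir(dirpath) as entries:
        for entry in entries:
            if entry.is_file():
                # open() receives the entry itself; no explicit conversion.
                with open(entry, "rb") as f:
                    contents[entry.name] = f.read()
    return contents


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "a.txt"), "wb") as f:
        f.write(b"hello")
    from_str = read_files(tmp)                # names come back as str
    from_bytes = read_files(os.fsencode(tmp)) # names come back as bytes
```

The same function body serves both callers; the type of the names follows the type of the directory argument.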

By contrast, as soon as you type "import pathlib" at the top of your
file, you've stepped outside the world of potentially pure boundary
code, and are instead dealing with structured application level
objects (which means traversing the bytes->str boundary before the
str->Path one).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Not receiving bug tracker emails

2016-04-14 Thread Victor Stinner
2016-04-14 13:01 GMT+02:00 Serhiy Storchaka :
> But this filter is not quite robust, for example it will cause this mail to
> be moved to the folder for Rietveld reviews.

Right, it's just a workaround since I'm unable to fix the root cause
(emails marked as spam which looks like a configuration issue in the
SMTP server.)


Re: [Python-Dev] Not receiving bug tracker emails

2016-04-14 Thread Serhiy Storchaka

On 14.04.16 13:33, Martin Panter wrote:
> On 14 April 2016 at 08:51, Serhiy Storchaka wrote:
>> Most bug tracker emails still went in the Spam folder. I have a filter for
>> Roundup emails, but there is no mark that I can use for filtering
>> Rietveld emails.
>
> FWIW I set up the following filter in Gmail for Rietveld reviews:
>
> Matches: http://bugs.python.org/review
> Do this: Never send it to Spam
>
> I suspect it helps, but occasionally I think stuff still goes to spam.
> (Just don’t tell this secret rule to actual spammers :)

Thank you and Victor for this advice.

But this filter is not quite robust, for example it will cause this mail 
to be moved to the folder for Rietveld reviews.

I was going to try a different approach: append "+py" to my address for 
the tracker, as in your address.




Re: [Python-Dev] Not receiving bug tracker emails

2016-04-14 Thread Martin Panter
On 14 April 2016 at 08:51, Serhiy Storchaka  wrote:
> On 13.04.16 07:39, Terry Reedy wrote:
>>
>> On 4/4/2016 5:05 PM, Terry Reedy wrote:
>>
>> Since a few days, I am getting bug tracker emails again, in my Inbox.  I
>> just got a Rietveld review in the Inbox and I believe it went there
>> directly instead of first to Junk.  Thank you to whoever made the
>> improvements.
>
>
> AFAIK David just disabled IPv6 support.
>
> Most bug tracker emails still went in the Spam folder. I have a filter for
> Roundup emails, but there is no mark that I can use for filtering
> Rietveld emails.

FWIW I set up the following filter in Gmail for Rietveld reviews:

Matches: http://bugs.python.org/review
Do this: Never send it to Spam

I suspect it helps, but occasionally I think stuff still goes to spam.
(Just don’t tell this secret rule to actual spammers :)


Re: [Python-Dev] Not receiving bug tracker emails

2016-04-14 Thread Victor Stinner
On 14 Apr 2016 10:53 AM, "Serhiy Storchaka" wrote:
> Most bug tracker emails still went in the Spam folder. I have a filter
> for Roundup emails, but there is no mark that I can use for filtering
> Rietveld emails.

I'm using the base URL of Rietveld and match it in the mail body. Gmail
filters have an option to never mark emails as spam.

Victor


Re: [Python-Dev] Bytes path

2016-04-14 Thread Victor Stinner
IMHO it's more a side effect of the implementation than a deliberate
choice. For new code which really wants to support bytes paths, I suggest
accepting only bytes and bytes subclasses.

Victor


Re: [Python-Dev] Most 3.x buildbots are green again, please don't break them and watch them!

2016-04-14 Thread Victor Stinner
On 14 Apr 2016 11:16 AM, "Serhiy Storchaka" wrote:
> A desirable but nonexistent feature is to write emails to authors of
> commits that broke buildbots. How hard is it to implement this?

Yeah I also had this idea since many years but buildbots were quite
unstable. Maybe we should be more strict to consider a buildbot as stable?

I propose to experiment sending notifications of failure to the authors of
changes *and* to a new mailing list. I would subscribe to such list. An
even safer starting point would be to only start with the mailing list.

FYI I'm connected to the #python-dev IRC channel which already contain
these notifications. But I agree that mails are better.

> What do you think about backporting recent regrtest to 2.7? Most needed
> features to me are the -m and -G options.

Regrtest changed a lot in python 3.6 (new test.libregrtest library).
I suggest to start from python 3.5.

For -m: if it doesn't need to modify the unittest module, I agree.

I don't know -G option.

> Would be nice to add a feature for running every test in separate
> subprocess. This will isolate the effect of failed tests.

See my email :-) I proposed to modify -j1 to run tests in subprocesses. I
even mentioned my issue.

I suggest to use -jN on all buildbots, at least -j1.

Maybe -j2 is even better since many tests are waiting on IO or simple sleep.

Victor


Re: [Python-Dev] Most 3.x buildbots are green again, please don't break them and watch them!

2016-04-14 Thread Martin Panter
On 14 April 2016 at 09:15, Serhiy Storchaka  wrote:
> On 13.04.16 14:40, Victor Stinner wrote:
>> By the way, I don't understand why "AMD64 OpenIndiana 3.x" is
>> considered as stable since it's failing with multiple issues since
>> many months and nobody is working on these failures. I suggest to move
>> this buildbot back to the unstable category.
>
> I think the main cause is the lack of memory in this buildbot. I tried to
> minimize memory consumption and leaks, but some leaks are left, and they
> provoke other tests failures, and additional resource leaks. Would be nice
> to add a feature for running every test in separate subprocess. This will
> isolate the effect of failed tests.

Last time I looked into the Open Indiana buildbot, I concluded that
the biggest problem was Python using fork() to spawn subprocesses. I
understand that OS does not do “memory overcommitment” like Linux
does, so every time you fork, the OS has to double the amount of
memory that is reserved. It is ironic, but running each test using the
current subprocess module (which uses fork) would probably make the
problem worse.

I suspect using posix_spawn() if possible would help a lot. But this
was rejected in  for not being
flexible enough and making maintenance too complicated.


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-14 Thread Paul Moore
On 14 April 2016 at 08:02, Stephen J. Turnbull  wrote:
> So let me propose what I think is the elephant in the room.  If you're
> going to have a polymorphic __fspath__, then pathlib is *the* example
> of a module that *desperately* needs to be polymorphic.  Consider:
>
> A non-text Application has some bytes and passes them to
> pathlib.Path as 
> manipulates them and passes the result to
> os.scandir as 
> expecting a return of
> DirEntries of 
>
>  ==  == bytes, and  == Path is TOOWTDI, no?

I'm not sure I follow this logic at all. But from my reading your
argument contradicts your conclusion, so maybe I'm misunderstanding.

To me, the "obvious" conclusion is that pathlib is not appropriate in
non-text applications, because  *cannot* be bytes (the
constructor rejects bytes). I see no reason to change that - non-text
applications are inherently low level, and shouldn't expect to use
high-level abstractions like pathlib.

> But under the current proposal which doesn't touch the internal
> mechanisms of pathlib and allows, but has no way to request, bytes
> returns,  == str,  == Path, and  == str,
> requiring two explicit conversions that bytes-shoveling developers
> will tell you should be unnecessary.  QED, pathlib should be
> polymorphic as a central part of this proposal.

Nope, QED pathlib is not a low level abstraction.

So your argument to me doesn't help much, because it's a given that
pathlib is str-only. The debate is about how things like scandir
(specifically DirEntry objects) and Ethan's pathlib replacement, which
*do* allow bytes in and out, should participate in the new protocol,
when they are bytes (they obviously should work just like pathlib when
they are strings).

In my opinion, they *shouldn't*: the new protocol should be string-only
(at least initially).

If I understand (from a couple of brief mentions) Ethan has a
string-like path object and a bytes-like path object, so he could
support fspath on the string-like one but not the bytes-like one. He
may not like having slightly different APIs for the two types, I don't
know, but it's possible. But DirEntry is polymorphic, so it *will*
have a __fspath__ method, and needs to know what to do when it's
bytes-like (I guess with a bit of getattr hacking DirEntry *could*
expose a __fspath__ method only if it's string-like, but that seems
like a pretty gross hack).

So:

1. pathlib remains string-like, and is the canonical example of
__fspath__, returns strings only
2. DirEntry is the only other example of the protocol in the stdlib,
but is polymorphic
3. I'm not aware of any 3rd party library that has polymorphic classes
(Ethan can correct me if I'm wrong here)

So the only purpose I know of for discussing __fspath__ returning
bytes is for scandir, and hypothetical polymorphic 3rd party path
abstractions (and possibly Ethan's preference to have a common API for
his 2 classes).

I propose we should have a string-only __fspath__ protocol in 3.6.
Bytes-format DirEntry objects can raise an error in __fspath__. If it
becomes obvious with usage that we need bytes support in __fspath__ we
can add it (compatibly - string-only code wouldn't need to change) in
3.7. That seems far better to me than trying to design bytes support
without actual use cases.
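
A minimal sketch of what a string-only protocol function could look like. The name fspath_str_only and the two toy classes are hypothetical; this illustrates the str-only variant proposed here, not the behaviour of any released os.fspath.

```python
def fspath_str_only(path):
    """Return path as str, rejecting bytes inputs and bytes __fspath__ results."""
    if isinstance(path, str):
        return path
    try:
        method = type(path).__fspath__
    except AttributeError:
        raise TypeError("expected str or an object with __fspath__, not "
                        + type(path).__name__) from None
    result = method(path)
    if not isinstance(result, str):
        raise TypeError("__fspath__ returned " + type(result).__name__
                        + ", expected str")
    return result


class StrPath:
    """Toy path object that is string-like."""
    def __fspath__(self):
        return "spam.txt"


class BytesPath:
    """Toy path object that is bytes-like; rejected under this proposal."""
    def __fspath__(self):
        return b"spam.txt"


def accepts(obj):
    """Report whether fspath_str_only accepts obj."""
    try:
        fspath_str_only(obj)
    except TypeError:
        return False
    return True
```

Loosening this later to admit bytes would not break callers that only ever pass str-returning objects, which is the compatibility argument made above.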

Paul


Re: [Python-Dev] Most 3.x buildbots are green again, please don't break them and watch them!

2016-04-14 Thread Serhiy Storchaka

On 13.04.16 14:40, Victor Stinner wrote:
> Last months, most 3.x buildbots failed randomly. Some of them were
> always failing. I spent some time to fix almost all Windows and Linux
> buildbots. There were a lot of different issues.

Excellent! Many thanks for doing this. And new features of regrtest look 
nice.

> So please try to not break buildbots again and remind to watch them sometimes:
>
> http://buildbot.python.org/all/waterfall?category=3.x.stable=3.x.unstable

A desirable but nonexistent feature is to write emails to authors of 
commits that broke buildbots. How hard is it to implement this?

> Next weeks, I will try to backport some fixes to Python 3.5 (if
> needed) to make these buildbots more stable too.
>
> Python 2.7 buildbots are also in a sad state (ex: test_marshal
> segfaults on Windows, see issue #25264). But it's not easy to get a
> Windows with the right compiler to develop on Python 2.7 on Windows.

What do you think about backporting recent regrtest to 2.7? Most needed 
features to me are the -m and -G options.

> Maybe it's time to move more 3.x buildbots to the "stable" category?
> http://buildbot.python.org/all/waterfall?category=3.x.stable

+1

> By the way, I don't understand why "AMD64 OpenIndiana 3.x" is
> considered as stable since it's failing with multiple issues since
> many months and nobody is working on these failures. I suggest to move
> this buildbot back to the unstable category.

I think the main cause is the lack of memory in this buildbot. I tried 
to minimize memory consumption and leaks, but some leaks are left, and 
they provoke other test failures and additional resource leaks. It would 
be nice to add a feature for running every test in a separate subprocess. 
This will isolate the effect of failed tests.





Re: [Python-Dev] Not receiving bug tracker emails

2016-04-14 Thread Serhiy Storchaka

On 13.04.16 07:39, Terry Reedy wrote:
> On 4/4/2016 5:05 PM, Terry Reedy wrote:
>
> Since a few days, I am getting bug tracker emails again, in my Inbox.  I
> just got a Rietveld review in the Inbox and I believe it went there
> directly instead of first to Junk.  Thank you to whoever made the
> improvements.

AFAIK David just disabled IPv6 support.

Most bug tracker emails still went in the Spam folder. I have a filter 
for Roundup emails, but there is no mark that I can use for 
filtering Rietveld emails.




[Python-Dev] Bytes path

2016-04-14 Thread Serhiy Storchaka

What types should be accepted as bytes path?

For now os.path is strict and accepts only bytes and bytes subclasses 
(even bytearray is not accepted) as bytes path. This is enough for 
working with low-level Posix paths and supporting backward compatibility.


On the other hand, most os functions have been too permissive since 3.3
and accept any type that supports the buffer protocol as a bytes path.
Even such meaningless objects as array('h') are accepted.


Some functions (zipimport.zipimporter() in 3.x, _imp.load_dynamic() in 
3.3+, builtin compile() etc in 3.4) accept even arbitrary iterables, 
e.g. [116, 101, 115, 116] (see http://bugs.python.org/issue26754).


I think we should accept only bytes (and subclasses). Even bytearray is 
less acceptable since it is mutable and can't be used as a key in caches.
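
A minimal sketch of the strict check proposed here, together with the hashability point about bytearray. The helper names is_bytes_path and is_hashable are hypothetical.

```python
from array import array


def is_bytes_path(obj):
    """Accept only bytes and bytes subclasses as a bytes path."""
    return isinstance(obj, bytes)


def is_hashable(obj):
    """Report whether obj could be used as a key in a cache dict."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


candidates = [b"test", bytearray(b"test"), array("h"), [116, 101, 115, 116]]
accepted = [is_bytes_path(c) for c in candidates]
```

Only the first candidate passes; the bytearray is additionally unusable as a cache key because it is mutable and therefore unhashable.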




Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Michael Mysinger via Python-Dev
Brett Cannon writes:

> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has
> the four potential approaches implemented (although it doesn't follow the
> "separate functions" approach some are proposing and instead goes with the
> allow_bytes approach I originally proposed).
>

Thanks Brett, it is definitely a start! Maybe I am just more unimaginative 
than most, but since interoperability is the goal, I would ideally be able 
to play with a full implementation where all the stdlib functions Nick 
originally mentioned accepted these "rich path" objects. 

However, for concrete example purposes, maybe it is sufficient to start with 
your fspath function, a toy RichPath class implementing __fspath__, and 
something like os.path.join, which is a meaty enough example to test some of 
the functionality. I posted a gist of a string only example at 
https://gist.github.com/mmysinger/0b5ae2cfb866f7013c387a2683c7fc39

After playing with and considering the 4 possibilities, anything where 
__fspath__ can return bytes seems like insanity that flies in the face of 
everything Python 3 is trying to accomplish. In particular, one RichPath 
class might return bytes and another str, or even worse the same class might 
sometimes return bytes and sometimes str. When will os.path.join blow up due 
to mixing bytes and str and when will it work in those situations? So for me 
that eliminates #3 and #4.

Also the version #2 accepting bytes in os.fspath felt like it could be a 
very minor convenience, but even the str only version #1 just requires 
one isinstance check in the rare case you need to also deal with bytes (see 
the os.path.join example in the gist above). So I lean toward the str only 
#1 version. 

In any case I would start with the strict str only full implementation and 
loosen it either in 3.6 or 3.7 depending on what people think after actually 
using it.



Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-14 Thread Stephen J. Turnbull
I was going to read the new posts that came in since I started this
one (at one point it was 5X as long as it is now), but this thread is
way out of control.  My apologies to anybody who has presented[1] use
cases in support of the wildly speculative proposals under discussion,
but my bet is that there have been none.

Victor Stinner writes:

 > Oops sorry, I forgot to add that I have no strong opinion on the type (I
 > only have a minor preference for str only).

I have a strong preference for str only, because I still don't see a
use case for polymorphic __fspath__.

os functions and os.path functions need to *accept* both str and bytes
because they are interfaces to OS functionality used by both text and
non-text applications, and so must check and convert to OS native type.
Many of these functions produce what they receive because both text and
non-text applications use names of filesystem objects internally, as
well as passing them to OS wrappers.  The question is how far to take
that logic.

So let me propose what I think is the elephant in the room.  If you're
going to have a polymorphic __fspath__, then pathlib is *the* example
of a module that *desperately* needs to be polymorphic.  Consider:

A non-text Application has some bytes and passes them to
pathlib.Path as <A>
manipulates them and passes the result to
os.scandir as <B>
expecting a return of
DirEntries of <C>

<A> == <C> == bytes, and <B> == Path is TOOWTDI, no?
But under the current proposal which doesn't touch the internal
mechanisms of pathlib and allows, but has no way to request, bytes
returns, <A> == str, <B> == Path, and <C> == str,
requiring two explicit conversions that bytes-shoveling developers
will tell you should be unnecessary.  QED, pathlib should be
polymorphic as a central part of this proposal.
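
The two explicit conversions can be sketched as follows. This is only an illustration under assumptions: the byte string is made up, and PurePosixPath stands in for Path so the example behaves the same on every platform.

```python
import os
import pathlib

# Bytes held by a hypothetical non-text application.
raw = b"/srv/data/incoming"

# Conversion 1: bytes -> str, because pathlib only accepts text.
p = pathlib.PurePosixPath(os.fsdecode(raw))

# Manipulate the path as a pathlib object.
child = p / "report.txt"

# Conversion 2: str -> bytes, to get back to what the application holds.
back = os.fsencode(str(child))
```

A polymorphic pathlib would make both conversions disappear, which is exactly the slippery slope discussed in the next paragraph.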

IMO that's not the right way to go (slippery slope, very quickly you
hit manipulations that are "really" text operations).  See also my
proposal "Pathlib enhancements - improve fsdecode and fsencode" which
suggests a (primitive) way for code to request the type it likes
better.

But WDOT?  I'd especially like to hear if Nick is tempted to flip-flop
(so far he's been in the "pathlib is a text utility" camp).


Footnotes: 
[1]  Just because I don't know of any I consider persuasive doesn't
mean there aren't any, but what you don't tell me I don't know.
(Maybe you'd have to kill me?  If so, thanks for not telling!)



[Python-Dev] Pathlib enhancements - improve fsdecode and fsencode

2016-04-14 Thread Stephen J. Turnbull
Please please please, junk both "filter out bytes" proposals.

Since they involve an exception, they impose an unnecessary "try" on
all text applications that fear death on bytes returns.  May as well
just wrap all objects with __fspath__ in fsdecode, and all is
happy.

Counterproposal: make fsdecode and fsencode grok __fspath__.  Then:
(1) Bytes-lovers and str-addicts are both safe.
(2) They can omit fspath, too!

No, that doesn't work if the bytes objects aren't in the file system
encoding, but these are *bytes*, mon ami: you have no way to find out
what that encoding is, so you either know already and you substitute
that + fspath for fsdecode, or you're hosed.  And in the only concrete
use case so far, fsdecode Just Works.

I suppose a similar argument holds for applications that want bytes
and fsencode, but I leave that as an exercise for the reader.
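
A rough sketch of the counterproposal, assuming a wrapper named fsdecode_path (a hypothetical name; the proposal is to put this behaviour into os.fsdecode itself):

```python
import os


def fsdecode_path(obj):
    """Hypothetical fsdecode that also understands __fspath__."""
    if not isinstance(obj, (str, bytes)):
        # Ask the object for its path; it may hand back str or bytes.
        obj = type(obj).__fspath__(obj)
    # str passes through unchanged, bytes are decoded with the
    # file system encoding.
    return os.fsdecode(obj)
```

A text application calls this on whatever it receives and always gets str back, with no try block and no separate fspath call.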



Re: [Python-Dev] pathlib - current status of discussions

2016-04-14 Thread Nick Coghlan
On 14 April 2016 at 14:05, Random832  wrote:
> On Wed, Apr 13, 2016, at 23:27, Nick Coghlan wrote:
>> In this kind of case, inheritance tends to trump protocol. For
>> example, int subclasses can't override operator.index:
> ...
>> The reasons for that behaviour are more pragmatic than philosophical:
>> builtins and their subclasses are extensively special-cased for speed
>> reasons, and those shortcuts are encountered before the interpreter
>> even considers using the general protocol.
>>
>> In cases where the magic method return types are polymorphic (so
>> subclasses may want to override them) we'll use more restrictive exact
>> type checks for the shortcuts, but that argument doesn't apply for
>> typechecked protocols where the result is required to be an instance
>> of a particular builtin type (but subclasses are considered
>> acceptable).
>
> Then why aren't we doing it for str? Because "try: path =
> path.__fspath__()" is more idiomatic than the alternative?

The sketches Brett posted will bear little resemblance to the actual
implementation - that will be in C and use similar idioms to those we
use for other abstract protocols (such as shortcuts for instances of
builtin types, and doing the method lookup via the passed in object's
type, rather than on the instance).
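
In pure Python terms, those two idioms might look roughly like the sketch below. It is an illustration only: fspath_sketch, Demo and StrSubclass are hypothetical names, and the real implementation is expected to be in C.

```python
def fspath_sketch(obj):
    # Shortcut: instances of the builtin (subclasses included) are
    # returned as-is, before the general protocol is even considered.
    if isinstance(obj, str):
        return obj
    # General protocol: look the method up on the type, not the instance.
    return type(obj).__fspath__(obj)


class Demo:
    def __fspath__(self):
        return "from-class"


class StrSubclass(str):
    def __fspath__(self):
        return "never consulted"


d = Demo()
d.__fspath__ = lambda: "from-instance"  # instance attribute is ignored

print(fspath_sketch(d))                     # from-class
print(fspath_sketch(StrSubclass("plain")))  # plain
```

The str subclass cannot override the result here, which mirrors the int and operator.index behaviour quoted above.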

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-14 Thread Nick Coghlan
On 14 April 2016 at 13:54, Random832  wrote:
> On Wed, Apr 13, 2016, at 23:17, Nick Coghlan wrote:
>
>> - os.fspath -> str (no coercion)
>> - os.fsdecode -> str (with coercion from bytes)
>> - os.fsencode -> bytes (with coercion from str)
>> - os._raw_fspath -> str-or-bytes (no coercion)
>>
>> (with "coercion" referring to how the result of __fspath__ and any
>> directly passed in str or bytes objects are handled)
>>
>> The leading underscore on _raw_fspath would be of the "this is a
>> documented and stable API, but you probably don't want to use it
>> unless you really know what you're doing" variety, rather than the
>> "this is an undocumented and potentially unstable private API"
>> variety.
>
> In this scenario could the protocol return bytes?

Yes, that's desirable to handle DirEntry transparently regardless of type.

> If the protocol can return bytes, then that means that types (DirEntry?
> someone had an alternate path library with a bPath?) which return bytes
> via the protocol will proliferate, and cannot be safely passed to
> anything that uses os.fspath. Numerous copies of "def myfspath(x):
> return os.fsdecode(os._raw_fspath(x))" will proliferate (or they'll just
> monkey-patch os.fspath), and no-one actually uses os.fspath except toy
> examples.

If folks want coercion, they can just use os.fsdecode(x), as that
already has a str -> str passthrough from the input to the output
(unlike codecs.decode) and will presumably be updated to include an
implicit call to os._raw_fspath() on the passed in object.
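
The split between the four spellings could be sketched like this. All four function names below are hypothetical stand-ins for the proposed os._raw_fspath, os.fspath, os.fsdecode and os.fsencode behaviour, not the actual implementations.

```python
import os


def raw_fspath(obj):
    """str-or-bytes, no coercion (stand-in for the proposed os._raw_fspath)."""
    if isinstance(obj, (str, bytes)):
        return obj
    try:
        method = type(obj).__fspath__
    except AttributeError:
        raise TypeError("expected str, bytes or an object with __fspath__, "
                        "not " + type(obj).__name__) from None
    result = method(obj)
    if not isinstance(result, (str, bytes)):
        raise TypeError("__fspath__ must return str or bytes, not "
                        + type(result).__name__)
    return result


def strict_fspath(obj):
    """str only, no coercion: bytes fail fast on every platform."""
    result = raw_fspath(obj)
    if not isinstance(result, str):
        raise TypeError("expected a str path, got bytes")
    return result


def coercing_fsdecode(obj):
    """Always str: bytes are decoded with the file system encoding."""
    return os.fsdecode(raw_fspath(obj))


def coercing_fsencode(obj):
    """Always bytes: str is encoded with the file system encoding."""
    return os.fsencode(raw_fspath(obj))
```

Code that wants coercion picks one of the last two; code that wants bytes to fail fast picks strict_fspath.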

> Why is it so objectionable for os.fspath to do coercion?

The first problem is that binary paths on Windows basically don't
work, so it's preferable for them to fail fast regardless of platform,
rather than to have them implicitly work on *nix, only to fail for
Windows users using non-ASCII paths later.

The second is that it would make os.fspath and os.fsdecode
functionally equivalent, so we'd have two different spellings for the
same operation.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia