Re: [Python-Dev] [ssl] The weird case of IDNA

2017-12-31 Thread Nathaniel Smith
On Sun, Dec 31, 2017 at 5:39 PM, Steven D'Aprano <st...@pearwood.info> wrote:
> On Sun, Dec 31, 2017 at 09:07:01AM -0800, Nathaniel Smith wrote:
>
>> This is another reason why we ought to let users do their own IDNA handling
>> if they want...
>
> I expect that letting users do their own IDNA handling will correspond
> to not doing any IDNA handling at all.

You did see the words "if they want", right? I'm not talking about
removing the stdlib's default IDNA handling, I'm talking about fixing
the cases where the stdlib goes out of its way to prevent users from
overriding its IDNA handling.

And "users" here is a very broad category; it includes libraries like
requests, twisted, trio, ... that are already doing better IDNA
handling than the stdlib, except in cases where the stdlib actively
prevents it.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] [ssl] The weird case of IDNA

2017-12-31 Thread Nathaniel Smith
On Dec 31, 2017 7:37 AM, "Stephen J. Turnbull" <
turnbull.stephen...@u.tsukuba.ac.jp> wrote:

Nathaniel Smith writes:

 > Issue 1: Python's built-in IDNA implementation is wrong (implements
 > IDNA 2003, not IDNA 2008).

Is "wrong" the right word here?  I'll grant you that 2008 is *better*,
but typically in practice versions coexist for years.  Ie, is there no
backward compatibility issue with registries that specified IDNA 2003?


Well, yeah, I was simplifying, but at the least we can say that always and
only using IDNA 2003 certainly isn't right :-). I think in most cases the
preferred way to deal with these kinds of issues is not to carry around an
IDNA 2003 implementation, but instead to use an IDNA 2008 implementation
with the "transitional compatibility" flag enabled in the UTS46
preprocessor? But this is rapidly exceeding my knowledge.
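
Concretely, with the PyPI 'idna' package that'd be something like this
(the keyword names are that package's API, nothing here is in the stdlib):

import idna

idna.encode("straße.de", uts46=True)
# -> b'xn--strae-oqa.de'  (IDNA 2008 proper: ß is a valid label character)

idna.encode("straße.de", uts46=True, transitional=True)
# -> b'strasse.de'  (UTS46 "transitional" processing maps ß -> ss, i.e.
#    the old IDNA 2003 behavior)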

This is another reason why we ought to let users do their own IDNA handling
if they want...


This is not entirely an idle question: I'd like to tool up on the
RFCs, research existing practice (especially in the East/Southeast Asian
registries), and contribute to the implementation if there may be an
issue remaining.  (Interpreting RFCs is something I'm reasonably good
at.)


Maybe this is a good place to start:

https://github.com/kjd/idna/blob/master/README.rst

-n

[Sorry if my quoting is messed up; posting from my phone and Gmail for
Android apparently generates broken text/plain.]


Re: [Python-Dev] [ssl] The weird case of IDNA

2017-12-30 Thread Nathaniel Smith
On Sat, Dec 30, 2017 at 2:28 AM, Antoine Pitrou  wrote:
> On Fri, 29 Dec 2017 21:54:46 +0100
> Christian Heimes  wrote:
>>
>> On the other hand ssl module is currently completely broken. It converts
>> hostnames from bytes to text with 'idna' codec in some places, but not
>> in all. The SSLSocket.server_hostname attribute and callback function
>> SSLContext.set_servername_callback() are decoded as U-label.
>> Certificate's common name and subject alternative name fields are not
>> decoded and therefore A-labels. They *must* stay A-labels because
>> hostname verification is only defined in terms of A-labels. We even had
>> a security issue once, because partial wildcard like 'xn*.example.org'
>> must not match IDN hosts like 'xn--bcher-kva.example.org'.
>>
>> In issue [2] and PR [3], we all agreed that the only sensible fix is to
>> make 'SSLContext.server_hostname' an ASCII text A-label.
>
> What are the changes in API terms?  If I'm calling wrap_socket(), can I
> pass `server_hostname='straße'` and it will IDNA-encode it?  Or do I
> have to encode it myself?  If the latter, it seems like we are putting
> the burden of protocol compliance on users.

Part of what makes this confusing is that there are actually three
intertwined issues here. (Also, anything that deals with Unicode *or*
SSL/TLS is automatically confusing, and this is about both!)

Issue 1: Python's built-in IDNA implementation is wrong (implements
IDNA 2003, not IDNA 2008).
Issue 2: The ssl module insists on using Python's built-in IDNA
implementation whether you want it to or not.
Issue 3: Also, the ssl module has a separate bug that means
client-side cert validation has never worked for any IDNA domain.

Issue 1 is potentially a security issue, because it means that in a
small number of cases, Python will misinterpret a domain name. IDNA
2003 and IDNA 2008 are very similar, but there are 4 characters that
are interpreted differently, with ß being one of them. Fixing this
though is a big job, and doesn't exactly have anything to do with the
ssl module -- for example, socket.getaddrinfo("straße.de", 80) and
sock.connect(("straße.de", 80)) also do the wrong thing. Christian's not
proposing to fix this here. It's issues 2 and 3 that he's proposing to
fix.
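
To make issue 1 concrete, here's roughly how the two implementations
diverge on that example (the second half assumes the third-party 'idna'
package is installed):

"straße.de".encode("idna")
# -> b'strasse.de'  (stdlib codec, IDNA 2003: ß is folded to "ss")

import idna
idna.encode("straße.de")
# -> b'xn--strae-oqa.de'  (IDNA 2008: ß is encoded as-is)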

Issue 2 is a problem because it makes it impossible to work around
issue 1, even for users who know what they're doing. In the socket
module, you can avoid Python's automagical IDNA handling by doing it
manually, and then calling socket.getaddrinfo("strasse.de", 80) or
socket.getaddrinfo("xn--strae-oqa.de", 80), whichever you prefer. In
the ssl module, this doesn't work. There are two places where ssl uses
hostnames. In client mode, the user specifies the server_hostname that
they want to see a certificate for, and then the module runs this
through Python's IDNA machinery *even if* it's already properly
encoded in ascii. And in server mode, when the user has specified an
SNI callback so they can find out which certificate an incoming client
connection is looking for, the module runs the incoming name through
Python's IDNA machinery before handing it to user code. In both cases,
the right thing to do would be to just pass through the ascii A-label
versions, so savvy users can do whatever they want with them. (This
also matches the general design principle around IDNA, which assumes
that the pretty unicode U-labels are used only for UI purposes, and
everything internal uses A-labels.)
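
As a rough sketch of what that would let a savvy user write (IDNA handled
up front with the PyPI 'idna' package; today the wrap_socket() step is
where the ssl module insists on re-running its own IDNA machinery):

import socket, ssl, idna

alabel = idna.encode("straße.de").decode("ascii")  # 'xn--strae-oqa.de'
raw = socket.create_connection((alabel, 443))      # A-labels work fine here
ctx = ssl.create_default_context()
tls = ctx.wrap_socket(raw, server_hostname=alabel) # ...and should here too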

Issue 3 is just a silly bug that needs to be fixed, but it's tangled
up here because the fix is the same as for Issue 2: the reason
client-side cert validation has never worked is that we've been taking
the A-label from the server's certificate and checking if it matches
the U-label we expect, and of course it never does because we're
comparing strings in different encodings. If we consistently converted
everything to A-labels as soon as possible and kept it that way, then
this bug would never have happened.

What makes it tricky is that on both the client and the server, fixing
this is actually user-visible.

On the client, checking sslsock.server_hostname used to always show a
U-label, but if we stop using U-labels internally then this doesn't
make sense. Fortunately, since this case has never worked at all,
fixing it shouldn't cause any problems.

On the server, the obvious fix would be to start passing
A-label-encoded names to the servername_callback, instead of
U-label-encoded names. Unfortunately, this is a bit trickier, because
this *has* historically worked (AFAIK) for IDNA names, so long as they
didn't use one of the four magic characters who changed meaning
between IDNA 2003 and IDNA 2008. But we do still need to do something.
For example, right now, it's impossible to use the ssl module to
implement a web server at https://straße.de, because incoming
connections will use SNI to say that they expect a cert for
"xn--strae-oqa.de", and then the ssl module will 

Re: [Python-Dev] [ssl] The weird case of IDNA

2017-12-30 Thread Nathaniel Smith
On Sat, Dec 30, 2017 at 7:26 AM, Stephen J. Turnbull
 wrote:
> Christian Heimes writes:
>  > Questions:
>  > - Is everybody OK with breaking backwards compatibility? The risk is
>  > small. ASCII-only domains are not affected
>
> That's not quite true, as your German example shows.  In some Oriental
> renderings it is impossible to distinguish halfwidth digits from
> full-width ones as the same glyphs are used.  (This occasionally
> happens with other ASCII characters, but users are more fussy about
> digits lining up.)  That is, while technically ASCII-only domain names
> are not affected, users of ASCII-only domain names are potentially
> vulnerable to confusable names when IDNA is introduced.  (Hopefully
> the Asian registrars are as woke as the German ones!  But you could
> still register a .com containing full-width digits or letters.)

This particular example isn't an issue: in IDNA encoding, full-width
and half-width digits are normalized together, so number１.com and
number1.com actually refer to the same domain name. This is true in
both the 2003 and 2008 versions:

# IDNA 2003
In [7]: "number\uff11.com".encode("idna")
Out[7]: b'number1.com'

# IDNA 2008 (using the 'idna' package from pypi)
In [8]: idna.encode("number\uff11.com", uts46=True)
Out[8]: b'number1.com'

That said, IDNA does still allow for a bunch of spoofing opportunities
that aren't possible with pure ASCII, and this requires some care:
https://unicode.org/faq/idn.html#16

This is mostly a UI issue, though; there's not much that the socket or
ssl modules can do to help here.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Concerns about method overriding and subclassing with dataclasses

2017-12-29 Thread Nathaniel Smith
On Fri, Dec 29, 2017 at 12:30 PM, Ethan Furman  wrote:
> Good point.  So auto-generate a new __repr__ if:
>
> - one is not provided, and
> - existing __repr__ is either:
>   - object.__repr__, or
>   - a previous dataclass __repr__
>
> And if the auto default doesn't work for one's use-case, use the keyword
> parameter to specify what you want.

What does attrs do here?

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] PEP 567 v2

2017-12-28 Thread Nathaniel Smith
On Thu, Dec 28, 2017 at 1:51 AM, Victor Stinner
 wrote:
> var = ContextVar('var', default=42)
>
> and:
>
> var = ContextVar('var')
> var.set (42)
>
> behaves the same, no?

No, they're different. The second sets the value in the current
context. The first sets the value in all contexts that currently
exist, and all empty contexts created in the future.
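
A quick sketch of the difference, using the API as written in the PEP:

import contextvars

v_default = contextvars.ContextVar('v_default', default=42)
v_set = contextvars.ContextVar('v_set')

snapshot = contextvars.copy_context()  # taken before the set() below
v_set.set(42)

v_default.get()              # 42 -- the default applies everywhere
v_set.get()                  # 42 -- but only because we set it in *this* context
snapshot.run(v_default.get)  # 42 -- the default also applies in the old snapshot
snapshot.run(v_set.get)      # LookupError -- the snapshot never saw the set()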

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Guarantee ordered dict literals in v3.7?

2017-12-19 Thread Nathaniel Smith
On Tue, Dec 19, 2017 at 4:56 PM, Steve Dower <steve.do...@python.org> wrote:
> On 19Dec2017 1004, Chris Barker wrote:
>>
>> Nathaniel Smith has pointed out that eval(pprint(a_dict)) is supposed to
>> return the same dict -- so documented behavior may already be broken.
>
>
> Two relevant quotes from the pprint module docs:
>
>>>> The pprint module provides a capability to “pretty-print” arbitrary
>>>> Python data structures in a form which can be used as input to the
>>>> interpreter
>
>>>> Dictionaries are sorted by key before the display is computed.
>
> It says nothing about the resulting dict being the same as the original one,
> just that it can be used as input. So these are both still true (until
> someone deliberately breaks the latter).

This is a pretty fine hair to be splitting... I'm sure you wouldn't
argue that it would be valid to display the dict {"a": 1} as
'["hello"]', just because '["hello"]' is a valid input to the
interpreter (that happens to produce a different object than the
original one) :-). I think we can assume that pprint's output is
supposed to let you reconstruct the original data structures, at least
in simple cases, even if that isn't explicitly stated.

> In any case, there are so many ways
> to spoil the first point for yourself that it's hardly worth treating as an
> important constraint.

I guess the underlying issue here is partly the question of what the
pprint module is for. In my understanding, it's primarily a tool for
debugging/introspecting Python programs, and the reason it talks about
"valid input to the interpreter" isn't because we want anyone to
actually feed the data back into the interpreter, but to emphasize
that it provides an accurate what-you-see-is-what's-really-there view
into how the interpreter understands a given object. It also
emphasizes that this is not intended for display to end users; making
the output format be "Python code" suggests that the main intended
audience is people who know how to read, well, Python code, and
therefore can be expected to care about Python's semantics.

>> (though I assume order is still ignored when comparing dicts, so:
>> eval(pprint(a_dict)) == a_dict will still hold.
>
>
> Order had better be ignored when comparing dicts, or plenty of code will
> break. For example:
>
>>>> {'a': 1, 'b': 2} == {'b': 2, 'a': 1}
> True

Yes, this is never going to change -- I expect that in the long run,
the only semantic difference between dict and OrderedDict will be in
their __eq__ methods.

> Saying that "iter(dict)" will produce keys in the same order as they were
> inserted is not the same as saying that "dict" is an ordered mapping. As far
> as I understand, we've only said the first part.
>
> (And the "nerve" here is that I disagreed with even the first part, but
> didn't fight it too strongly because I never relied on the iteration order
> of dict. However, I *do* rely on nobody else relying on the iteration order
> of dict either, and so proposals to change existing semantics that were
> previously independent of insertion order to make them rely on insertion
> order will affect me. So now I'm pushing back.)

I mean, I don't want to be a jerk about this, and we still need to
examine things on a case-by-case basis but... Guido has pronounced
that Python dict preserves order. If your code "rel[ies] on nobody
else relying on the iteration order", then starting in 3.7 your code
is no longer Python.

Obviously I like that change more than you, but to some extent it's
just something we have to live with, and even if I disagreed with the
new semantics I'd still rather the standard library handle them
consistently rather than being half-one-thing-and-half-another.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Guarantee ordered dict literals in v3.7?

2017-12-19 Thread Nathaniel Smith
On Mon, Dec 18, 2017 at 11:38 PM, Steve Dower  wrote:
> On 18Dec2017 2309, Steven D'Aprano wrote:
>> [A LOT OF THINGS I AGREE WITH]
> I agree completely with Steven's reasoning here, and it bothers me that
> what is an irrelevant change to many users (dict becoming ordered) seems
> to imply that all users of dict have to be updated.

Can we all take a deep breath and lay off the hyperbole? The only
point under discussion in this subthread is whether pprint -- our
module for producing nicely-formatted-reprs -- should continue to sort
keys, or should continue to provide an accurate repr. There are
reasonable arguments for both positions, but no-one's suggesting
anything in the same solar system as "all users of dict have to be
updated".

Am I missing some underlying nerve that this is hitting for some reason?

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Guarantee ordered dict literals in v3.7?

2017-12-18 Thread Nathaniel Smith
On Mon, Dec 18, 2017 at 7:58 PM, Steven D'Aprano <st...@pearwood.info> wrote:
> On Mon, Dec 18, 2017 at 07:37:03PM -0800, Nathaniel Smith wrote:
>> On Mon, Dec 18, 2017 at 7:02 PM, Barry Warsaw <ba...@python.org> wrote:
>> > On Dec 18, 2017, at 21:11, Chris Barker <chris.bar...@noaa.gov> wrote:
>> >
>> >> Will changing pprint be considered a breaking change?
>> >
>> > Yes, definitely.
>>
>> Wait, what? Why would changing pprint (so that it accurately reflects
>> dict's new underlying semantics!) be a breaking change?
>
> I have a script which today prints data like so:
>
> {'Aaron': 62,
>  'Anne': 51,
>  'Bob': 23,
>  'George': 30,
>  'Karen': 45,
>  'Sue': 17,
>  'Sylvester': 34}
>
> Tomorrow, it will suddenly start printing:
>
> {'Bob': 23,
>  'Karen': 45,
>  'Sue': 17,
>  'George': 30,
>  'Aaron': 62,
>  'Anne': 51,
>  'Sylvester': 34}
>
>
> and my users will yell at me that my script is broken because the data
> is now in random order.

To make sure I understand, do you actually have a script like this, or
is this hypothetical?

> Now, maybe that's my own damn fault for using
> pprint instead of writing my own pretty printer... but surely the point
> of pprint is so I don't have to write my own?
>
> Besides, the docs say very prominently:
>
> "Dictionaries are sorted by key before the display is computed."
>
> https://docs.python.org/3/library/pprint.html
>
> so I think I can be excused having relied on that feature.

No need to get aggro -- I asked a question, it wasn't a personal attack.

At a high level, pprint's job is to "pretty-print arbitrary Python data
structures in a form which can be used as input to the interpreter"
(quoting the first sentence of its documentation), i.e., like repr()
it's fundamentally intended as a debugging tool that's supposed to
match how Python works, not any particular externally imposed output
format. Now, how Python works has changed. Previously dict order was
arbitrary, so picking the arbitrary order that happened to be sorted
was a nice convenience. Now, dict order isn't arbitrary, and sorting
dicts both obscures the actual structure of the Python objects, and
also breaks round-tripping through pprint. Given that pprint's
overarching documented contract of "represent Python objects" now
conflicts with the more-specific documented contract of "sort dict
keys", something has to give.

My feeling is that we should preserve the overarching contract, not
the details of how dicts were handled. Here's another example of a
teacher struggling with this:
https://mastodon.social/@aparrish/13011522

But I would be in favor of adding a kwarg to let people opt-in to the
old behavior like:

from pprint import PrettyPrinter
pprint = PrettyPrinter(sortdict=True).pprint

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Guarantee ordered dict literals in v3.7?

2017-12-18 Thread Nathaniel Smith
On Mon, Dec 18, 2017 at 7:02 PM, Barry Warsaw  wrote:
> On Dec 18, 2017, at 21:11, Chris Barker  wrote:
>
>> Will changing pprint be considered a breaking change?
>
> Yes, definitely.

Wait, what? Why would changing pprint (so that it accurately reflects
dict's new underlying semantics!) be a breaking change? Are you
suggesting it shouldn't be changed in 3.7?

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Usefulness of binary compatibility accross Python versions?

2017-12-17 Thread Nathaniel Smith
On Dec 16, 2017 11:44 AM, "Guido van Rossum"  wrote:

On Sat, Dec 16, 2017 at 11:14 AM, Antoine Pitrou 
wrote:

> On Sat, 16 Dec 2017 19:37:54 +0100
> Antoine Pitrou  wrote:
> >
> > Currently, you can pass a `module_api_version` to PyModule_Create2(),
> > but that function is for specialists only :-)
> >
> > ("""Most uses of this function should be using PyModule_Create()
> > instead; only use this if you are sure you need it.""")
>
> Ah, it turns out I misunderstood that piece of documentation and also
> what PEP 3121 really did w.r.t the module API check.
>
> PyModule_Create() is actually a *macro* calling PyModule_Create2() with
> the version number it was compiled against!
>
> #ifdef Py_LIMITED_API
> #define PyModule_Create(module) \
>     PyModule_Create2(module, PYTHON_ABI_VERSION)
> #else
> #define PyModule_Create(module) \
>     PyModule_Create2(module, PYTHON_API_VERSION)
> #endif
>
> And there's already a check for that version number in moduleobject.c:
> https://github.com/python/cpython/blob/master/Objects/moduleobject.c#L114
>
> That check is always invoked when calling PyModule_Create() and
> PyModule_Create2().  Currently it merely invokes a warning, but we can
> easily turn that into an error.
>
> (with apologies to Martin von Löwis for not fully understanding what he
> did at the time :-))
>

If it's only a warning, I worry that if we stop checking the flag bits it
can cause wild pointer following. This sounds like it would be a potential
security issue (load a module, ignore the warning, try to use a certain API
on a class it defines, boom). Also, could there still be 3rd party modules
out there that haven't been recompiled in a really long time and use some
older backwards compatible module initialization API? (I guess we could
stop supporting that and let them fail hard.)


I think there's a pretty simple way to avoid this kind of problem.

Since PEP 3149 (Python 3.2), the import system has (IIUC) checked for:

foo.cpython-XYm.so
foo.abi3.so
foo.so

If we drop foo.so from this list, then we're pretty much guaranteed not to
load anything into a python that it wasn't intended for.
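
For reference, that's the suffix list importlib consults; e.g. on a
typical Linux CPython 3.6 build (exact tags vary by platform and version):

>>> import importlib.machinery
>>> importlib.machinery.EXTENSION_SUFFIXES
['.cpython-36m-x86_64-linux-gnu.so', '.abi3.so', '.so']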

How disruptive would this be? AFAICT there hasn't been any standard way to
build python extensions named like 'foo.so' since 3.2 was released, so
we're talking about modules from 3.1 and earlier (or else people who are
manually hacking around the compatibility checking system, who can
presumably take care of themselves). We've at a minimum been issuing
warnings about these modules for 5 versions now (based on Antoine's
analysis above), and I'd be really surprised if a module built for 3.1
works on 3.7 anyway. So this change seems pretty reasonable to me.

-n


Re: [Python-Dev] Guarantee ordered dict literals in v3.7?

2017-12-15 Thread Nathaniel Smith
On Dec 15, 2017 10:50, "Tim Peters"  wrote:

[Eric Snow ]
> Does that include preserving order after deletion?

Given that we're blessing current behavior:

- At any moment, iteration order is from oldest to newest.  So, "yes"
to your question.

- While iteration starts with the oldest, .popitem() returns the
youngest.  This is analogous to how lists work, viewing a dict
similarly ordered "left to right" (iteration starts at the left,
.pop() at the right, for lists and dicts).


Fortunately, this also matches OrderedDict.popitem().

It'd be nice if we could also support dict.popitem(last=False) to get the
other behavior, again matching OrderedDict.
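
For example, with 3.7 semantics:

d = dict(a=1, b=2, c=3)
next(iter(d))            # 'a' -- iteration starts at the oldest key
d.popitem()              # ('c', 3) -- .popitem() removes the youngest

from collections import OrderedDict
od = OrderedDict(a=1, b=2, c=3)
od.popitem(last=False)   # ('a', 1) -- what dict.popitem(last=False) would do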

-n


Re: [Python-Dev] Guarantee ordered dict literals in v3.7?

2017-12-14 Thread Nathaniel Smith
On Dec 14, 2017 21:30, "Raymond Hettinger" 
wrote:



> On Dec 14, 2017, at 6:03 PM, INADA Naoki  wrote:
>
> If "dict keeps insertion order" is not language spec and we
> continue to recommend people to use OrderedDict to keep
> order, I want to optimize OrderedDict for creation/iteration
> and memory usage.  (See https://bugs.python.org/issue31265#msg301942 )

I support having regular dicts maintain insertion order but am opposed to
Inada changing the implementation of collections.OrderedDict.  We can have
the first without having the second.


It seems like the two quoted paragraphs are in vociferous agreement.

-n


Re: [Python-Dev] PEP 567 -- Context Variables

2017-12-13 Thread Nathaniel Smith
On Tue, Dec 12, 2017 at 10:39 PM, Dima Tisnek  wrote:
> My 2c:
> TL;DR PEP specifies implementation in some detail, but doesn't show
> how proposed change can or should be used.
>
>
>
> get()/set(value)/delete() methods: Python provides syntax sugar for
> these, let's use it.
> (dict: d["k"]/d["k] = value/del d["k"]; attrs: obj.k/obj.k = value/del
> obj.k; inheriting threading.Local)

This was already discussed to death in the PEP 550 threads... what
most users want is a single value, and routing get/set through a
ContextVar object allows for important optimizations and a simpler
implementation. Also, remember that 99% of users will never use these
objects directly; it's a low-level API mostly useful to framework
implementers.

> This PEP and 550 describe why TLS is inadequate, but don't seem to
> specify how proposed context behaves in async world. I'd be most
> interested in how it appears to work to the user of the new library.
>
> Consider a case of asynchronous cache:
>
> async def actual_lookup(name):
>     ...
>
> def cached_lookup(name, cache={}):
>     if name not in cache:
>         cache[name] = shield(ensure_future(actual_lookup(name)))
>     return cache[name]
>
> Unrelated (or related) asynchronous processes end up waiting on the same 
> future:
>
> async def called_with_user_context():
>     ...
>     await cached_lookup(...)
>     ...
>
> Which context is propagated to actual_lookup()?
> The PEP doesn't seem to state that clearly.
> It appears to be first caller's context.

Yes.

> Is it a copy or a reference?

It's a copy, as returned by get_context().

> If first caller is cancelled, the context remains alive.
>
>
>
> token is fragile, I believe PEP should propose a working context
> manager instead.
> Btw., isn't a token really a reference to
> state-of-context-before-it's-cloned-and-modified?

No, a Token only represents the value of one ContextVar, not the whole
Context. This could maybe be clearer in the PEP, but it has to be this
way or you'd get weird behavior from code like:

with decimal.localcontext(...):  # sets and then restores
    numpy.seterr(...)  # sets without any plan to restore
# after the 'with' block, the decimal ContextVar gets restored
# but this shouldn't affect the numpy.seterr ContextVar

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Issues with PEP 526 Variable Notation at the class level

2017-12-08 Thread Nathaniel Smith
On Dec 7, 2017 12:49, "Eric V. Smith"  wrote:

The reason I didn't include it (as @dataclass(slots=True)) is because it
has to return a new class, and the rest of the dataclass features just
modifies the given class in place. I wanted to maintain that conceptual
simplicity. But this might be a reason to abandon that. For what it's
worth, attrs does have an @attr.s(slots=True) that returns a new class with
__slots__ set.


They actually switched to always returning a new class, regardless of
whether slots is set:

https://github.com/python-attrs/attrs/pull/260

You'd have to ask Hynek to get the full rationale, but I believe it was
both for consistency with slot classes, and for consistency with regular
class definition. For example, type.__new__ actually does different things
depending on whether it sees an __eq__ method, so adding a method after the
fact led to weird bugs with hashing. That class of bug goes away if you
always set up the autogenerated methods and then call type.__new__.

 -n


Re: [Python-Dev] PEP 565: Show DeprecationWarning in __main__

2017-11-29 Thread Nathaniel Smith
On Nov 28, 2017 3:55 PM, "Guido van Rossum" <gu...@python.org> wrote:

On Sun, Nov 19, 2017 at 5:40 AM, Nathaniel Smith <n...@pobox.com> wrote:

> Eh, numpy does use FutureWarning for changes where the same code will
> transition from doing one thing to doing something else without
> passing through a state where it raises an error. But that decision
> was based on FutureWarning being shown to users by default, not
> because it matches the nominal purpose :-). IIRC I proposed this
> policy for NumPy in the first place, and I still don't even know if it
> matches the original intent because the docs are so vague. "Will
> change behavior in the future" describes every case where you might
> consider using FutureWarning *or* DeprecationWarning, right?
>
> We have been using DeprecationWarning for changes where code will
> transition from working -> raising an error, and that *is* based on
> the Official Recommendation to hide those by default whenever
> possible. We've been doing this for a few years now, and I'd say our
> experience so far has been... poor. I'm trying to figure out how to
> say this politely. Basically it doesn't work at all. What happens in
> practice is that we issue a DeprecationWarning for a year, mostly
> no-one notices, then we make the change in a 1.x.0 release, everyone's
> code breaks, we roll it back in 1.x.1, and then possibly repeat
> several times in 1.(x+1).0 and 1.(x+2).0 until enough people have
> updated their code that the screams die down. I'm pretty sure we'll be
> changing our policy at some point, possibly to always use
> FutureWarning for everything.


Can one of you check that the latest version of PEP 565 gets this right?


If you're asking about the proposed new language about FutureWarnings,
it seems fine to me. If you're asking about the PEP as a whole, it seems
fine but I don't think it will make much difference in our case. IPython
has been showing deprecation warnings in __main__ for a few years now, and
it's nice enough. Getting warnings for scripts seems nice too. But we
aren't rolling back changes because they broke someone's one-off script –
I'm sure it happens but we don't tend to hear about it. We're responding to
things like major downstream dependencies that nonetheless totally missed
all the warnings.

The part that might help there is evangelising popular test runners like
pytest to change their defaults. To me that's the most interesting change
to come out of this. But it's hard to predict in advance how effective it
will be.

tl;dr: I don't think PEP 565 solves all my problems, but I don't have any
objections to what it does do.

-n


Re: [Python-Dev] Using async/await in place of yield expression

2017-11-27 Thread Nathaniel Smith
On Sun, Nov 26, 2017 at 9:33 PM, Caleb Hattingh
 wrote:
> The PEP only says that __await__ must return an iterator, but it turns out
> that it's also required that that iterator
> should not return any intermediate values.

I think you're confused :-). When the iterator yields an intermediate
value, it does two things:

(1) it suspends the current call stack and returns control to the
coroutine runner (i.e. the event loop)
(2) it sends some arbitrary value back to the coroutine runner

The whole point of `await` is that it can do (1) -- this is what lets
you switch between executing different tasks, so they can pretend to
execute in parallel. However, you do need to make sure that your
__await__ and your coroutine runner are on the same page with respect
to (2) -- if you send a value that the coroutine runner isn't
expecting, it'll get confused. Generally async libraries control both
the coroutine runner and the __await__ method, so they get to invent
whatever arbitrary convention they want.

In asyncio, the convention is that the values you send back must be
Future objects, and the coroutine runner interprets this as a request
to wait for the Future to be resolved, and then resume the current
call stack. In curio, the convention is that you send back a special
tuple describing some operation you want the event loop to perform
[1], and then it resumes your call stack once that operation has
finished. And Trio barely uses this channel at all. (It does transfer
a bit of information that way for convenience/speed, but the main work
of setting up the task to be resumed at the appropriate time happens
through other mechanisms.)

What you observed is that the asyncio coroutine runner gets cranky if
you send it an integer when it was expecting a Future.

Since most libraries assume that they control both __await__ and the
coroutine runner, they don't tend to give great error messages here
(though trio does [2] ;-)). I think this is also why the asyncio docs
don't talk about this. I guess in asyncio's case it is technically a
semi-public API because you need to know how it works if you're the
author of a library like tornado or twisted that wants to integrate
with asyncio. But most people aren't the authors of tornado or
twisted, and the ones who are already know how this works, so the lack
of docs isn't a huge deal in practice...
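
If it helps, here's a toy version of that contract -- a custom awaitable
plus a minimal coroutine runner that agree on a made-up convention (no
asyncio involved):

class Request:
    def __await__(self):
        # (1) suspend the call stack, (2) send a value to the coroutine runner
        reply = yield "need-a-number"
        return reply

async def main():
    return (await Request()) * 2

def run(coro):
    try:
        request = coro.send(None)           # run until the first yield
        assert request == "need-a-number"   # our made-up convention
        coro.send(42)                       # resume with the "answer"
    except StopIteration as exc:
        return exc.value

run(main())   # -> 84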

-n

[1] 
https://github.com/dabeaz/curio/blob/bd0e2cb7741278d1d9288780127dc0807b1aa5b1/curio/traps.py#L48-L156
[2] 
https://github.com/python-trio/trio/blob/2b8e297e544088b98ff758d37c7ad84f74c3f2f5/trio/_core/_run.py#L1521-L1530

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Tricky way of of creating a generator via a comprehension expression

2017-11-26 Thread Nathaniel Smith
On Sat, Nov 25, 2017 at 3:37 PM, Guido van Rossum  wrote:
> On Sat, Nov 25, 2017 at 1:05 PM, David Mertz  wrote:
>>
>> FWIW, on a side point. I use 'yield' and 'yield from' ALL THE TIME in real
>> code. Probably 80% of those would be fine with yield statements, but a
>> significant fraction use `gen.send()`.
>>
>> On the other hand, I have yet once to use 'await', or 'async' outside of
>> pedagogical contexts. There are a whole lot of generators, including ones
>> utilizing state injection, that are useful without the scaffolding of an
>> event loop, in synchronous code.
>
>
> Maybe you didn't realize async/await don't need an event loop? Driving an
> async/await-based coroutine is just as simple as driving a yield-from-based
> one (`await` does exactly the same thing as `yield from`).

Technically anything you can write with yield/yield from could also be
written using async/await and vice-versa, but I think it's actually
nice to have both in the language.

The distinction I'd make is that yield/yield from is what you should
use for ad hoc coroutines where the person writing the code that has
'yield from's in it is expected to understand the details of the
coroutine runner, while async/await is what you should use when the
coroutine running is handled by a library like asyncio, and the person
writing code with 'await's in it is expected to treat coroutine stuff
as an opaque implementation detail. (NB I'm using "coroutine" in the
CS sense here, where generators and async functions are both
"coroutines".)

I think of this as being sort of half-way between a style guideline
and a technical guideline. It's like the guideline that lists should
be homogenously-typed and variable length, while tuples are
heterogenously-typed and fixed length: there's nothing in the language
that outright *enforces* this, but it's a helpful convention *and*
things tend to work better if you go along with it.

Here are some technical issues you'll run into if you try to use
async/await for ad hoc coroutines:

- If you don't iterate an async function, you get a "coroutine never
awaited" warning. This may or may not be what you want.

- async/await has associated thread-global state like
sys.set_coroutine_wrapper and sys.set_asyncgen_hooks. Generally async
libraries assume that they own these, and arbitrarily weird things may
happen if you have multiple async/await coroutine runners in same
thread with no coordination between them.

- In async/await, it's not obvious how to write leaf functions:
'await' is equivalent to 'yield from', but there's no equivalent to
'yield'. You have to jump through some hoops by writing a class with a
custom __await__ method or using @types.coroutine. Of course it's
doable, and it's no big deal if you're writing a proper async library,
but it's awkward for quick ad hoc usage.
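
For reference, the @types.coroutine version of a leaf function looks
something like this; the bare 'yield' is the part that plain 'await'
syntax can't express directly:

import types

@types.coroutine
def leaf():
    # Sends a value straight to the coroutine runner and waits to be resumed.
    reply = yield "some-value-for-the-runner"
    return reply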

For a concrete example of 'ad hoc coroutines' where I think 'yield
from' is appropriate, here's wsproto's old 'yield from'-based
incremental websocket protocol parser:


https://github.com/python-hyper/wsproto/blob/4b7db502cc0568ab2354798552148dadd563a4e3/wsproto/frame_protocol.py#L142

The flow here is: received_frames is the public API: it gives you an
iterator over all completed frames. When it stops you're expected to
add more data to the buffer and then call it again. Internally,
received_frames acts as a coroutine runner for parse_more_gen, which
is the main parser that calls various helper methods to parse
different parts of the websocket frame. These calls eventually bottom
out in _consume_exactly or _consume_at_most, which use 'yield' to
"block" until enough data is available in the internal buffer.
Basically this is the classic trick of using coroutines to write an
incremental state machine parser as ordinary-looking code where the
state is encoded in local variables on the stack.
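
A stripped-down version of that trick, for anyone who hasn't seen it
(toy length-prefixed frames, nothing like wsproto's actual API):

def frame_parser(on_frame):
    # 'yield' here means "I need more bytes before I can continue".
    buf = b""
    while True:
        while len(buf) < 1:
            buf += yield                 # wait for the 1-byte length header
        length, buf = buf[0], buf[1:]
        while len(buf) < length:
            buf += yield                 # wait for the rest of the payload
        on_frame(buf[:length])
        buf = buf[length:]

frames = []
parser = frame_parser(frames.append)
parser.send(None)                        # prime the generator
for chunk in (b"\x03ab", b"c\x02", b"de"):
    parser.send(chunk)
frames   # -> [b'abc', b'de']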

Using coroutines here isn't just a cute trick; I'm pretty confident
that there is absolutely no other way to write a readable incremental
websocket parser in Python. This is the 3rd rewrite of wsproto's
parser, and I think I've read the code for all the other Python
libraries that do this too. The websocket framing format is branchy
enough that trying to write out the state machine explicitly will
absolutely tie you in knots. (Of course we then rewrote wsproto's
parser a 4th time for py2 compatibility; the current version's not
*terrible* but the 'yield from' version was simpler and more
maintainable.)

For wsproto's use case, I think using 'await' would be noticeably
worse than 'yield from'. It'd make the code more opaque to readers
(people know generators but no-one shows up already knowing what
@types.coroutine does), the "coroutine never awaited" warnings would
be obnoxious (it's totally fine to instantiate a parser and then throw
it away without using it!), and the global state issues would make us
very nervous (wsproto is absolutely designed to be used alongside a
library like asyncio or trio). 

Re: [Python-Dev] Tricky way of of creating a generator via a comprehension expression

2017-11-24 Thread Nathaniel Smith
On Fri, Nov 24, 2017 at 9:39 PM, Nick Coghlan <ncogh...@gmail.com> wrote:
> On 25 November 2017 at 15:27, Nathaniel Smith <n...@pobox.com> wrote:
>> On Fri, Nov 24, 2017 at 9:04 PM, Nick Coghlan <ncogh...@gmail.com> wrote:
>>> def example():
>>>     comp1 = yield from [(yield x) for x in ('1st', '2nd')]
>>>     comp2 = yield from [(yield x) for x in ('3rd', '4th')]
>>>     return comp1, comp2
>>
>> Isn't this a really confusing way of writing
>>
>> def example():
>>     return [(yield '1st'), (yield '2nd')], [(yield '3rd'), (yield '4th')]
>
> A real use case

Do you have a real use case? This seems incredibly niche...

> wouldn't be iterating over hardcoded tuples in the
> comprehensions, it would be something more like:
>
> def example(iterable1, iterable2):
>     comp1 = yield from [(yield x) for x in iterable1]
>     comp2 = yield from [(yield x) for x in iterable2]
>     return comp1, comp2

I submit that this would still be easier to understand if written out like:

def map_iterable_to_yield_values(iterable):
    "Yield the values in iterable, then return a list of the values sent back."
    result = []
    for obj in iterable:
        result.append((yield obj))
    return result

def example(iterable1, iterable2):
    values1 = yield from map_iterable_to_yield_values(iterable1)
    values2 = yield from map_iterable_to_yield_values(iterable2)
    return values1, values2

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Tricky way of of creating a generator via a comprehension expression

2017-11-24 Thread Nathaniel Smith
On Fri, Nov 24, 2017 at 9:04 PM, Nick Coghlan  wrote:
> def example():
>     comp1 = yield from [(yield x) for x in ('1st', '2nd')]
>     comp2 = yield from [(yield x) for x in ('3rd', '4th')]
>     return comp1, comp2

Isn't this a really confusing way of writing

def example():
    return [(yield '1st'), (yield '2nd')], [(yield '3rd'), (yield '4th')]

?

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Tricky way of of creating a generator via a comprehension expression

2017-11-24 Thread Nathaniel Smith
On Fri, Nov 24, 2017 at 4:22 PM, Guido van Rossum  wrote:
> The more I hear about this topic, the more I think that `await`, `yield` and
> `yield from` should all be banned from occurring in all comprehensions and
> generator expressions. That's not much different from disallowing `return`
> or `break`.

I would say that banning `yield` and `yield from` is like banning
`return` and `break`, but banning `await` is like banning function
calls. There's no reason for most users to even know that `await` is
related to generators, so a rule disallowing it inside comprehensions
is just confusing. AFAICT 99% of the confusion around async/await is
because people think of them as being related to generators, when from
the user point of view it's not true at all and `await` is just a
funny function-call syntax.

Also, at the language level, there's a key difference between these
cases. A comprehension has implicit `yield`s in it, and then mixing in
explicit `yield`s as well obviously leads to confusion. But when you
use an `await` in a comprehension, that turns it into an async
generator expression (thanks to PEP 530), and in an async generator,
`yield` and `await` use two separate, unrelated channels. So there's
no confusion or problem with having `await` inside a comprehension.
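
E.g. this is just an ordinary async comprehension (PEP 530), with no hidden
generator semantics; fetch() here is a made-up stand-in for any async function:

async def fetch(url):        # stand-in for a real async operation
    return len(url)

async def gather_all(urls):
    # The 'await' inside the comprehension is just a funny function call.
    return [await fetch(u) for u in urls]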

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] PEP 565: Show DeprecationWarning in __main__

2017-11-19 Thread Nathaniel Smith
On Sun, Nov 19, 2017 at 2:26 AM, Serhiy Storchaka  wrote:
> It seems to me that most of issues with FutureWarning on GitHub [1] are
> related to NumPy and pandas which use FutureWarning for its original nominal
> purpose, for warning about using programming interfaces that will change the
> behavior in future. This doesn't have any relation to end users unless the
> end user is an author of the written code.
>
> [1] https://github.com/search?q=FutureWarning&type=Issues

Eh, numpy does use FutureWarning for changes where the same code will
transition from doing one thing to doing something else without
passing through a state where it raises an error. But that decision
was based on FutureWarning being shown to users by default, not
because it matches the nominal purpose :-). IIRC I proposed this
policy for NumPy in the first place, and I still don't even know if it
matches the original intent because the docs are so vague. "Will
change behavior in the future" describes every case where you might
consider using FutureWarning *or* DeprecationWarning, right?

We have been using DeprecationWarning for changes where code will
transition from working -> raising an error, and that *is* based on
the Official Recommendation to hide those by default whenever
possible. We've been doing this for a few years now, and I'd say our
experience so far has been... poor. I'm trying to figure out how to
say this politely. Basically it doesn't work at all. What happens in
practice is that we issue a DeprecationWarning for a year, mostly
no-one notices, then we make the change in a 1.x.0 release, everyone's
code breaks, we roll it back in 1.x.1, and then possibly repeat
several times in 1.(x+1).0 and 1.(x+2).0 until enough people have
updated their code that the screams die down. I'm pretty sure we'll be
changing our policy at some point, possibly to always use
FutureWarning for everything.
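
For reference, the two conventions described above look roughly like this
in code (messages and version numbers are made up):

import warnings

# Behaviour will silently change in place (numpy-style FutureWarning):
warnings.warn("somefunc() will return a view instead of a copy in 1.15",
              FutureWarning, stacklevel=2)

# Code will eventually start raising an error (DeprecationWarning,
# hidden by default):
warnings.warn("somefunc() is deprecated, use otherfunc() instead",
              DeprecationWarning, stacklevel=2)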

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] module customization

2017-11-15 Thread Nathaniel Smith
On Wed, Nov 15, 2017 at 10:14 PM, Nathaniel Smith <n...@pobox.com> wrote:
> On Wed, Nov 15, 2017 at 4:27 PM, Ethan Furman <et...@stoneleaf.us> wrote:
>> The second way is fairly similar, but instead of replacing the entire
>> sys.modules entry, its class is updated to be the class just created --
>> something like sys.modules['mymod'].__class__ = MyNewClass .
>>
>> My request:  Can someone write a better example of the second method?  And
>> include __getattr__ ?

Doh, I forgot to permalinkify those. Better links for anyone reading
this in the future:

> Here's a fairly straightforward example:
>
> https://github.com/python-trio/trio/blob/master/trio/_deprecate.py#L114-L140

https://github.com/python-trio/trio/blob/3edfafeedef4071646a9015e28be01f83dc02f94/trio/_deprecate.py#L114-L140

> (Intentionally doesn't include __dir__ because I didn't want
> deprecated attributes to show up in tab completion. For other use
> cases like lazy imports, you would implement __dir__ too.)
>
> Example usage:
>
> https://github.com/python-trio/trio/blob/master/trio/__init__.py#L66-L98

https://github.com/python-trio/trio/blob/3edfafeedef4071646a9015e28be01f83dc02f94/trio/__init__.py#L66-L98

> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] module customization

2017-11-15 Thread Nathaniel Smith
On Wed, Nov 15, 2017 at 5:49 PM, Guido van Rossum  wrote:
>> If not, why not, and if so, shouldn't PEP 562's __getattr__ also take a
>> 'self'?
>
> Not really, since there's only one module (the one containing the
> __getattr__ function). Plus we already have a 1-argument module-level
> __getattr__ in mypy. See PEP 484.

I guess the benefit of taking 'self' would be that it would make it
possible (though still a bit odd-looking) to have reusable __getattr__
implementations, like:

# mymodule.py
from auto_importer import __getattr__, __dir__

auto_import_modules = {"foo", "bar"}

# auto_importer.py
def __getattr__(self, name):
    if name in self.auto_import_modules:
        ...
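
If the module object were passed in as 'self', the shared implementation
could be filled in something like this (purely hypothetical sketch):

# auto_importer.py (hypothetical)
import importlib

def __getattr__(self, name):
    if name in self.auto_import_modules:
        module = importlib.import_module(name)
        setattr(self, name, module)   # cache it so we aren't called again
        return module
    raise AttributeError(name)

def __dir__(self):
    return sorted(set(self.__dict__) | self.auto_import_modules)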

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] module customization

2017-11-15 Thread Nathaniel Smith
On Wed, Nov 15, 2017 at 4:27 PM, Ethan Furman  wrote:
> The second way is fairly similar, but instead of replacing the entire
> sys.modules entry, its class is updated to be the class just created --
> something like sys.modules['mymod'].__class__ = MyNewClass .
>
> My request:  Can someone write a better example of the second method?  And
> include __getattr__ ?

Here's a fairly straightforward example:

https://github.com/python-trio/trio/blob/master/trio/_deprecate.py#L114-L140

(Intentionally doesn't include __dir__ because I didn't want
deprecated attributes to show up in tab completion. For other use
cases like lazy imports, you would implement __dir__ too.)

Example usage:

https://github.com/python-trio/trio/blob/master/trio/__init__.py#L66-L98
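
And here's a self-contained (if contrived) sketch of the same pattern,
with made-up attribute names:

# somemodule.py -- "method two": swap in a ModuleType subclass at import time
import sys
import types

class _ThisModule(types.ModuleType):
    def __getattr__(self, name):          # only called for *missing* attributes
        if name == "answer":
            return 42
        raise AttributeError("module %r has no attribute %r"
                             % (self.__name__, name))

    def __dir__(self):
        return [*super().__dir__(), "answer"]

sys.modules[__name__].__class__ = _ThisModule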

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] PEP 565: Show DeprecationWarning in __main__

2017-11-13 Thread Nathaniel Smith
On Mon, Nov 13, 2017 at 6:09 AM, Serhiy Storchaka <storch...@gmail.com> wrote:
> 13.11.17 14:29, Antoine Pitrou wrote:
>>
>> On Mon, 13 Nov 2017 22:37:46 +1100
>> Chris Angelico <ros...@gmail.com> wrote:
>>>
>>> On Mon, Nov 13, 2017 at 9:46 PM, Antoine Pitrou <solip...@pitrou.net>
>>> wrote:
>>>>
>>>> On Sun, 12 Nov 2017 19:48:28 -0800
>>>> Nathaniel Smith <n...@pobox.com> wrote:
>>>>>
>>>>> On Sun, Nov 12, 2017 at 1:24 AM, Nick Coghlan <ncogh...@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> This change will lead to DeprecationWarning being displayed by default
>>>>>> for:
>>>>>>
>>>>>> * code executed directly at the interactive prompt
>>>>>> * code executed directly as part of a single-file script
>>>>>
>>>>>
>>>>> Technically it's orthogonal, but if you're trying to get better
>>>>> warnings in the REPL, then you might also want to look at:
>>>>>
>>>>> https://bugs.python.org/issue1539925
>>>>> https://github.com/ipython/ipython/issues/6611
>>>>
>>>>
>>>> Depends what you call "better".  Personally, I don't want to see
>>>> warnings each and every time I use a deprecated or questionable
>>>> construct or API from the REPL.
>>>
>>>
>>> Isn't that the entire *point* of warnings? When you're working at the
>>> REPL, you're the one in control of which APIs you use, so you should
>>> be the one to know about deprecations.
>>
>>
>> If I see a warning once every REPL session, I know about the deprecation
>> already, thank you.  I don't need to be taken by the hand like a little
>> child.  Besides, the code I write in the REPL is not meant for durable
>> use.
>
>
> Hmm, now I see that the simple Nathaniel's solution is not completely
> correct. If the warning action is 'module', it should be emitted only once
> if used directly in the REPL, because '__main__' is the same module.

True. The fundamental problem is that generally, Python uses
(filename, lineno) pairs to identify lines of code. But (a) the
warning module assumes that for each namespace dict, there is a unique
mapping between line numbers and lines of code, so it ignores filename
and just keys off lineno, and (b) the REPL re-uses the same (file,
lineno) for different lines of code anyway.

So I guess the fully correct solution would be to use a unique
"filename" when compiling each block of code -- e.g. the REPL could do
the equivalent of compile(<code>, "REPL[1]", ...) for the first line,
compile(<code>, "REPL[2]", ...) for the second line, etc. -- and then
also teach the warnings module's duplicate detection logic to key off
of (file, lineno) pairs instead of just lineno.
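
I.e., something along these lines in the REPL's input loop (hypothetical
sketch, names made up):

_block_counter = 0

def compile_repl_block(source):
    global _block_counter
    _block_counter += 1
    # Each interactive block gets its own pseudo-filename, so warnings keyed
    # on (file, lineno) from different blocks can't collide.
    return compile(source, "REPL[%d]" % _block_counter, "single")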

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Standardise the AST (Re: PEP 563: Postponed Evaluation of Annotations)

2017-11-13 Thread Nathaniel Smith
Can you give any examples of problems caused by the ast not being
standardized? The original motivation of being able to distinguish between
  foo(x: int)
  foo(x: "int")
isn't very compelling – it's not clear it's a problem in the first place,
and even if it is then all we need is some kind of boolean flag, not an ast
standard.

On Nov 13, 2017 13:38, "Greg Ewing"  wrote:

> Guido van Rossum wrote:
>
>> But Python's syntax changes in nearly every release.
>>
>
> The changes are almost always additions, so there's no
> reason why the AST can't remain backwards compatible.
>
> the AST level ... elides many details (such as whitespace and parentheses).
>>
>
> That's okay, because the AST is only expected to
> represent the semantics of Python code, not its
> exact lexical representation in the source. It's
> the same with Lisp -- comments and whitespace have
> been stripped out by the time you get to Lisp
> data.
>
> Lisp had almost no syntax so I presume the mapping to data structures was
>> nearly trivial compared to Python.
>>
>
> Yes, the Python AST is more complicated, but we
> already have that much complexity in the AST being
> used by the compiler.
>
> If I understand correctly, we also have a process
> for converting that internal structure to and from
> an equally complicated set of Python objects, that
> isn't needed by the compiler and exists purely for
> the convenience of Python code.
>
> I can't see much complexity being added if we were
> to decide to standardise the Python representation.
>
> --
> Greg


Re: [Python-Dev] PEP 561 rework

2017-11-12 Thread Nathaniel Smith
On Sun, Nov 12, 2017 at 11:21 AM, Ethan Smith  wrote:
>
>
> On Sun, Nov 12, 2017 at 9:53 AM, Jelle Zijlstra 
> wrote:
>>
>> 2017-11-12 3:40 GMT-08:00 Ethan Smith :
>>> The name of the stub
>>> package
>>> MUST follow the scheme ``pkg_stubs`` for type stubs for the package named
>>> ``pkg``. The normal resolution order of checking ``*.pyi`` before
>>> ``*.py``
>>> will be maintained.
>>
>> This is very minor, but what do you think of using "pkg-stubs" instead of
>> "pkg_stubs" (using a hyphen rather than an underscore)? This would make the
>> name illegal to import as a normal Python package, which makes it clear that
>> it's not a normal package. Also, there could be real packages named
>> "_stubs".
>
> I suppose this makes sense. I checked PyPI and as of a few weeks ago there
> were no packages with the name pattern, but I like the idea of making it
> explicitly non-runtime importable. I cannot think of any reason not to do
> it, and the avoidance of confusion about the package being importable is a
> benefit. I will make the change with my next round of edits.

PyPI doesn't distinguish between the names 'foo-stubs' and 'foo_stubs'
-- they get normalized together. So even if you use 'foo-stubs' as the
directory name on sys.path to avoid collisions at import time, it
still won't allow someone to distribute a separate 'foo_stubs' package
on PyPI.
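
(That's just PEP 503 name normalization:)

import re

def normalize(name):     # PEP 503, as used by PyPI and pip
    return re.sub(r"[-_.]+", "-", name).lower()

normalize("foo_stubs")   # -> 'foo-stubs'
normalize("foo-stubs")   # -> 'foo-stubs', i.e. the same project as far as
                         #    PyPI is concerned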

If you do go with a fixed naming convention like this, the PEP should
probably also instruct the PyPI maintainers that whoever owns 'foo'
automatically has the right to control the name 'foo-stubs' as well.
Or maybe some tweak to PEP 541 is needed.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] PEP 565: Show DeprecationWarning in __main__

2017-11-12 Thread Nathaniel Smith
On Sun, Nov 12, 2017 at 1:24 AM, Nick Coghlan  wrote:
> This change will lead to DeprecationWarning being displayed by default for:
>
> * code executed directly at the interactive prompt
> * code executed directly as part of a single-file script

Technically it's orthogonal, but if you're trying to get better
warnings in the REPL, then you might also want to look at:

https://bugs.python.org/issue1539925
https://github.com/ipython/ipython/issues/6611

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] [python-committers] Enabling depreciation warnings feature code cutoff

2017-11-11 Thread Nathaniel Smith
On Fri, Nov 10, 2017 at 11:34 PM, Brett Cannon <br...@python.org> wrote:
> On Thu, Nov 9, 2017, 17:33 Nathaniel Smith, <n...@pobox.com> wrote:
>> - if an envvar CI=true is set, then by default make deprecation warnings
>> into errors. (This is an informal standard that lots of CI systems use.
>> Error instead of "once" because most people don't look at CI output at all
>> unless there's an error.)
>
> One problem with that is I don't want e.g. mypy to start spewing out
> warnings while checking my code. That's why I like Victor's idea of a -X
> option that also flips on other test/debug features. Yes, this would also
> trigger for test runners, but that's at least a smaller amount of affected
> code.

Ah, yeah, you're right -- often CI systems use Python programs for
infrastructure, beyond the actual code under test. pip is maybe a more
obvious example than mypy -- we probably don't want pip to stop
working in CI runs just because it happens to use a deprecated API
somewhere :-). So this idea doesn't work.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Proposal: go back to enabling DeprecationWarning by default

2017-11-10 Thread Nathaniel Smith
On Tue, Nov 7, 2017 at 8:45 AM, Nathaniel Smith <n...@pobox.com> wrote:
> Also, IIRC it's actually impossible to set the stacklevel= correctly when
> you're deprecating a whole module and issue the warning at import time,
> because you need to know how many stack frames the import system uses.

Doh, I didn't remember correctly. Actually Brett fixed this in 3.5:
https://bugs.python.org/issue24305

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] [python-committers] Enabling depreciation warnings feature code cutoff

2017-11-09 Thread Nathaniel Smith
On Nov 8, 2017 16:12, "Nick Coghlan"  wrote:

On 9 November 2017 at 07:46, Antoine Pitrou  wrote:
>
> On 08/11/2017 at 22:43, Nick Coghlan wrote:
>>
>> However, between them, the following two guidelines should provide
>> pretty good deprecation warning coverage for the world's Python code:
>>
>> 1. If it's in __main__, it will emit deprecation warnings at runtime
>> 2. If it's not in __main__, it should have a test suite
>
> Nick, have you actually read the discussion and the complaints people
> had with the current situation?  Most of them *don't* specifically talk
> about __main__ scripts.

I have, and I've also re-read the discussions regarding why the
default got changed in the first place.

Behaviour up until 2.6 & 3.1:

once::DeprecationWarning

Behaviour since 2.7 & 3.2:

ignore::DeprecationWarning

With test runners overriding the default filters to set it back to
"once::DeprecationWarning".


Is this intended to be a description of the current state of affairs?
Because I've never encountered a test runner that does this... Which
runners are you thinking of?


The rationale for that change was so that end users of applications
that merely happened to be written in Python wouldn't see deprecation
warnings when Linux distros (or the end user) updated to a new Python
version. It had the downside that you had to remember to opt-in to
deprecation warnings in order to see them, which is a problem if you
mostly use Python for ad hoc personal scripting.

Proposed behaviour for Python 3.7+:

once::DeprecationWarning:__main__
ignore::DeprecationWarning

With test runners still overriding the default filters to set them
back to "once::DeprecationWarning".

This is a partial reversion back to the pre-2.7 behaviour, focused
specifically on interactive use and ad hoc personal scripting. For ad
hoc *distributed* scripting, the changed default encourages upgrading
from single-file scripts to the zipapp model, and then minimising the
amount of code that runs directly in __main__.py.

I expect this will be a sufficient change to solve the specific
problem I'm personally concerned by, so I'm no longer inclined to
argue for anything more complicated. Other folks may have other
concerns that this tweak to the default filters doesn't address - they
can continue to build their case for more complex options using this
as the new baseline behaviour.


I think most people's concern is that we've gotten into a state where
DeprecationWarning's are largely useless in practice, because no one sees
them. Effectively the norm now is that developers (both the Python core
team and downstream libraries) think they're following some sensible
deprecation cycle, but often they're actually making changes without any
warning, just they wait a year to do it. It's not clear why we're bothering
through multiple releases -- which adds major overhead -- if in practice we
aren't going to actually warn most people. Enabling them for another 1% of
code doesn't really address this.

As I mentioned above, it's also having the paradoxical effect of making it
so that end-users are *more* likely to see deprecation warnings, since
major libraries are giving up on using DeprecationWarning. Most recently it
looks like pyca/cryptography is going to switch, partly as a result of this
thread:
  https://github.com/pyca/cryptography/pull/4014

Some more ideas to throw out there:

- if an envvar CI=true is set, then by default make deprecation warnings
into errors. (This is an informal standard that lots of CI systems use.
Error instead of "once" because most people don't look at CI output at all
unless there's an error.)

- provide some mechanism that makes it easy to have a deprecation warning
that starts out as invisible, but then becomes visible as you get closer to
the switchover point. (E.g. CPython might make the deprecation warnings
that it issues be invisible in 3.x.0 and 3.x.1 but become visible in
3.x.2+.) Maybe:

# in warnings.py
def deprecation_warning(library_version, visible_in_version,
                        change_in_version, msg, stacklevel):
    ...

Then a call like:

  deprecation_warning(my_library.__version__, "1.3", "1.4",
                      "This function is deprecated", 2)

issues an InvisibleDeprecationWarning if my_library.__version__ < 1.3, and
a VisibleDeprecationWarning otherwise.

(The stacklevel argument is mandatory because the usual default of 1 is
always wrong for deprecation warnings.)
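
A rough sketch of how such a helper could hang together (hypothetical
warning classes and naive version parsing, purely to make the idea
concrete):

    import warnings

    class InvisibleDeprecationWarning(DeprecationWarning):
        """Hidden by the default filters (hypothetical)."""

    class VisibleDeprecationWarning(UserWarning):
        """Shown by default, since it derives from UserWarning (hypothetical)."""

    def _parse(version):
        # naive "1.10.2" -> (1, 10, 2); real code would want a proper parser
        return tuple(int(part) for part in version.split("."))

    def deprecation_warning(library_version, visible_in_version,
                            change_in_version, msg, stacklevel):
        if _parse(library_version) < _parse(visible_in_version):
            category = InvisibleDeprecationWarning
        else:
            category = VisibleDeprecationWarning
        msg = "%s (will change in version %s)" % (msg, change_in_version)
        # +1 accounts for this helper's own frame, so the caller's
        # stacklevel keeps pointing where they intended
        warnings.warn(msg, category, stacklevel=stacklevel + 1)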

-n


Re: [Python-Dev] The current dict is not an "OrderedDict"

2017-11-09 Thread Nathaniel Smith
On Thu, Nov 9, 2017 at 1:46 PM, Cameron Simpson  wrote:
> On 08Nov2017 10:28, Antoine Pitrou  wrote:
>>
>> On Wed, 8 Nov 2017 13:07:12 +1000
>> Nick Coghlan  wrote:
>>>
>>> On 8 November 2017 at 07:19, Evpok Padding 
>>> wrote:
>>> > On 7 November 2017 at 21:47, Chris Barker 
>>> > wrote:
>>> >> if dict order is preserved in cPython , people WILL count on it!
>>> >
>>> > I won't, and if people do and their code break, they'll have only
>>> > themselves
>>> > to blame.
>>> > Also, what proof do you have of that besides anecdotal evidence ?
>>>
>>> ~27 calendar years of anecdotal evidence across a multitude of CPython
>>> API behaviours (as well as API usage in other projects).
>>>
>>> Other implementation developers don't say "CPython's runtime behaviour
>>> is the real Python specification" for the fun of it - they say it
>>> because "my code works on CPython, but it does the wrong thing on your
>>> interpreter, so I'm going to stick with CPython" is a real barrier to
>>> end user adoption, no matter what the language specification says.
>>
>>
>> Yet, PyPy has no reference counting, and it doesn't seem to be a cause
>> of concern.  Broken code is fixed along the way, when people notice.
>
>
> I'd expect that this may be because that would merely to cause temporary
> memory leakage or differently timed running of __del__ actions.  Neither of
> which normally affects semantics critical to the end result of most
> programs.

It's actually a major problem when porting apps to PyPy. The common
case is servers that crash because they rely on the GC to close file
descriptors, and then run out of file descriptors. IIRC this is the
major obstacle to supporting OpenStack-on-PyPy. NumPy is currently
going through the process to deprecate and replace a core bit of API
[1] because it turns out to assume a refcounting GC.
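
A minimal illustration of the pattern that bites people (nothing
framework-specific, just the idiom):

    def read_config_leaky(path):
        # On CPython the anonymous file object's refcount hits zero as soon
        # as .read() returns, so the descriptor closes "by magic".  On PyPy
        # it lingers until the next GC cycle, and a busy server can hit its
        # fd limit in the meantime.
        return open(path).read()

    def read_config_portable(path):
        # Explicit lifetime management behaves the same everywhere.
        with open(path) as f:
            return f.read()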

-n

[1] See:
  https://github.com/numpy/numpy/pull/9639
  https://mail.python.org/pipermail/numpy-discussion/2017-November/077367.html

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Guarantee ordered dict literals in v3.7?

2017-11-07 Thread Nathaniel Smith
On Nov 7, 2017 12:02 PM, "Barry Warsaw"  wrote:

On Nov 7, 2017, at 09:39, Paul Sokolovsky  wrote:

> So, the problem is that there's no "Python language spec”.

There is a language specification: https://docs.python.org/3/reference/index.html

But there are still corners that are undocumented, or topics that are
deliberately left as implementation details.


Also, specs don't mean that much unless there are multiple implementations
in widespread use. In JS the spec matters because it describes the common
subset of the language you can expect to see across browsers, and lets the
browser vendors coordinate on future changes. Since users actually target
and test against multiple implementations, this is useful. In python,
CPython's dominance means that most libraries are written against CPython's
behavior instead of the spec, and alternative implementations generally
don't care about the spec, they care about whether they can run the code
their users want to run. So PyPy has found that for their purposes, the
python spec includes all kinds of obscure internal implementation details
like CPython's static type/heap type distinction, the exact tricks CPython
uses to optimize local variable access, the CPython C API, etc. The Pyston
devs found that for their purposes, refcounting actually was a mandatory
part of the python language. Jython, MicroPython, etc make a different set
of compatibility tradeoffs again.

I'm not saying the spec is useless, but it's not magic either. It only
matters to the extent that it solves some problem for people.

-n


Re: [Python-Dev] Proposal: go back to enabling DeprecationWarning by default

2017-11-07 Thread Nathaniel Smith
On Nov 7, 2017 06:24, "Nick Coghlan"  wrote:

On 7 November 2017 at 19:30, Paul Moore  wrote:
> On 7 November 2017 at 04:09, Nick Coghlan  wrote:
>> Given the status quo, how do educators learn that the examples they're
>> teaching to their students are using deprecated APIs?
>
> By reading the documentation on what they are teaching, and by testing
> their examples with new versions with deprecation warnings turned on?
> Better than having warnings appear the first time they run a course
> with a new version of Python, surely?
>
> I understand the "but no-one actually does this" argument. And I
> understand that breakage as a result is worse than a few warnings. But
> enabling deprecation warnings by default feels to me like favouring
> the developer over the end user. I remember before the current
> behaviour was enabled and it was *immensely* frustrating to try to use
> 3rd party code and get a load of warnings. The only options were:
>
> 1. Report the bug - usually not much help, as I want to run the
> program *now*, not when a new release is made.
> 2. Fix the code (and ideally submit a PR upstream) - I want to *use*
> the program, not debug it.
> 3. Find the right setting/environment variable, and tweak how I call
> the program to apply it - which doesn't fix the root cause, it's just
> a workaround.

Yes, this is why I've come around to the view that we need to come up
with a viable definition of "third party code" and leave deprecation
warnings triggered by that code disabled by default.

My suggestion for that definition is to have the *default* meaning of
"third party code" be "everything that isn't __main__".


That way, if you get a deprecation warning at the REPL, it's
necessarily because of something *you* did, not because of something a
library you called did. Ditto for single file scripts.


IPython actually made this change a few years ago; since 2015 I think it
has shown DeprecationWarnings by default if they're triggered by __main__.

It's helpful but I haven't noticed it eliminating this problem. One
limitation in particular is that it requires that the warnings are
correctly attributed to the code that triggered them, which means that
whoever is issuing the warning has to set the stacklevel= correctly, and
most people don't. (The default of stacklevel=1 is always wrong for
DeprecationWarning.) Also, IIRC it's actually impossible to set the
stacklevel= correctly when you're deprecating a whole module and issue the
warning at import time, because you need to know how many stack frames the
import system uses.
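
For concreteness, the pattern that makes attribution work is just the
standard warnings API with an explicit stacklevel (hypothetical
old_api/new_api names):

    import warnings

    def old_api():
        # stacklevel=2 attributes the warning to the code that *called*
        # old_api(), which is what lets a filter like
        # "once::DeprecationWarning:__main__" fire for user code.  The
        # default stacklevel=1 would blame this module instead.
        warnings.warn("old_api() is deprecated, use new_api() instead",
                      DeprecationWarning, stacklevel=2)
        return 42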

-n


Re: [Python-Dev] Proposal: go back to enabling DeprecationWarning by default

2017-11-06 Thread Nathaniel Smith
On Sun, Nov 5, 2017 at 9:38 PM, Nick Coghlan  wrote:
> We've been running the current experiment for 7 years, and the main
> observable outcome has been folks getting surprised by breaking
> changes in CPython releases, especially folks that primarily use
> Python interactively (e.g. for data analysis), or as a scripting
> engine (e.g. for systems administration).

It's also caused lots of projects to switch to using their own ad hoc
warning types for deprecations, e.g. off the top of my head:

https://github.com/matplotlib/matplotlib/blob/6c51037864f9a4ca816b68ede78207f7ecec656c/lib/matplotlib/cbook/deprecation.py#L5
https://github.com/numpy/numpy/blob/d75b86c0c49f7eb3ec60564c2e23b3ff237082a2/numpy/_globals.py#L45
https://github.com/python-trio/trio/blob/f50aa8e00c29c7f2953b7bad38afc620772dca74/trio/_deprecate.py#L16

So in some ways the change has actually made it *harder* for end-user
applications/scripts to hide all deprecation warnings, because for
each package you use you have to somehow figure out which
idiosyncratic type it uses, and filter them each separately.
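
To make the downside concrete, an application that wants to silence
everything ends up with one filter call per project (class names quoted
from memory -- treat them as illustrative):

    import warnings
    warnings.simplefilter("ignore", DeprecationWarning)
    warnings.simplefilter("ignore", PendingDeprecationWarning)

    # ...plus one line per third-party project that rolled its own class:
    import matplotlib
    import numpy
    import trio
    warnings.simplefilter("ignore", matplotlib.MatplotlibDeprecationWarning)
    warnings.simplefilter("ignore", numpy.VisibleDeprecationWarning)
    warnings.simplefilter("ignore", trio.TrioDeprecationWarning)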

(In any changes though please do keep in mind that Python itself is
not the only one issuing deprecation warnings. I'm thinking in
particular of the filter-based-on-Python-version idea. Maybe you could
have subclasses like Py35DeprecationWarning and filter on those?)

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Guarantee ordered dict literals in v3.7?

2017-11-05 Thread Nathaniel Smith
On Nov 5, 2017 2:41 PM, "Paul Ganssle"  wrote:

I think the question of whether any specific implementation of dict could
be made faster for a given architecture or even that the trade-offs made by
CPython are generally the right ones is kinda beside the point. It's
certainly feasible that an implementation that does not preserve ordering
could be better for some implementation of Python, and the question is
really how much is gained by changing the language semantics in such a way
as to cut off that possibility.


The language definition is not nothing, but I think it's easy to
overestimate its importance. CPython does in practice provide ordering
guarantees for dicts, and this solves a whole bunch of pain points: it
makes json roundtripping work better, it gives ordered kwargs, it makes it
possible for metaclasses to see the order class items were defined, etc.
And we got all these goodies for better-than-free: the new dict is faster
and uses less memory. So it seems very unlikely that CPython is going to
revert this change in the foreseeable future, and that means people will
write code that depends on this, and that means in practice reverting it
will become impossible due to backcompat and it will be important for other
interpreters to implement, regardless of what the language definition says.

That said, there are real benefits to putting this in the spec. Given that
we're not going to get rid of it, we might as well reward the minority of
programmers who are conscientious about following the spec by letting them
use it too. And there were multiple PEPs that went away when this was
merged; no one wants to resurrect them just for hypothetical future
implementations that may never exist. And putting it in the spec will mean
that we can stop having this argument over and over with the same points
rehashed for those who missed the last one. (This isn't aimed at you or
anything; it's not your fault you don't know all these arguments off the
top of your head, because how would you? But it is a reality of mailing
list dynamics that rehashing this kind of thing sucks up energy without
producing much.)

MicroPython deviates from the language spec in lots of ways. Hopefully this
won't need to be another one, but it won't be the end of the world if it is.

-n


Re: [Python-Dev] PEP 561: Distributing and Packaging Type Information

2017-10-27 Thread Nathaniel Smith
On Thu, Oct 26, 2017 at 3:42 PM, Ethan Smith  wrote:
> However, the stubs may be put in a sub-folder
> of the Python sources, with the same name the ``*.py`` files are in. For
> example, the ``flyingcircus`` package would have its stubs in the folder
> ``flyingcircus/flyingcircus/``. This path is chosen so that if stubs are
> not found in ``flyingcircus/`` the type checker may treat the subdirectory as
> a normal package.

I admit that I find this aesthetically unpleasant. Wouldn't something
like __typestubs__/ be a more Pythonic name? (And also avoid potential
name clashes, e.g. my async_generator package has a top-level export
called async_generator; normally you do 'from async_generator import
async_generator'. I think that might cause problems if I created an
async_generator/async_generator/ directory, especially post-PEP 420.)

I also don't understand the given rationale -- it sounds like you want
to be able say well, if ${SOME_DIR_ON_PYTHONPATH}/flyingcircus/
doesn't contain stubs, then just stick the
${SOME_DIR_ON_PYTHONPATH}/flyingcircus/ directory *itself* onto
PYTHONPATH, and then try again. But that's clearly the wrong thing,
because then you'll also be adding a bunch of other random junk into
that directory into the top-level namespace. For example, suddenly the
flyingcircus.summarise_proust module has become a top-level
summarise_proust package. I must be misunderstanding something?

> Type Checker Module Resolution Order
> 
>
> The following is the order that type checkers supporting this PEP should
> resolve modules containing type information:
>
> 1. User code - the files the type checker is running on.
>
> 2. Stubs or Python source manually put in the beginning of the path. Type
>checkers should provide this to allow the user complete control of which
>stubs to use, and patch broken stubs/inline types from packages.
>
> 3. Third party stub packages - these packages can supersede the installed
>untyped packages. They can be found at ``pkg-stubs`` for package ``pkg``,
>however it is encouraged to check the package's metadata using packaging
>query APIs such as ``pkg_resources`` to assure that the package is meant
>for type checking, and is compatible with the installed version.

Am I right that this means you need to be able to map from import
names to distribution names? I.e., if you see 'import foo', you need
to figure out which *.dist-info directory contains metadata for the
'foo' package? How do you plan to do this?

The problem is that technically, import names and distribution names
are totally unrelated namespaces -- for example, the '_pytest' package
comes from the 'pytest' distribution, the 'pylab' package comes from
'matplotlib', and 'pip install scikit-learn' gives you a package
imported as 'sklearn'. Namespace packages are also challenging,
because a single top-level package might actually be spread across
multiple distributions.
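
(For reference, one best-effort way to build such a mapping is to scan the
installed distributions' top_level.txt metadata with pkg_resources -- a
sketch only; namespace packages and distributions missing that file still
slip through:)

    import collections
    import pkg_resources

    def import_to_dist_map():
        mapping = collections.defaultdict(set)
        for dist in pkg_resources.working_set:
            if not dist.has_metadata("top_level.txt"):
                continue
            for name in dist.get_metadata("top_level.txt").split():
                mapping[name].add(dist.project_name)
        return mapping

    # e.g. mapping["pylab"] should end up containing "matplotlib"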

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Timeout for PEP 550 / Execution Context discussion

2017-10-17 Thread Nathaniel Smith
On Oct 17, 2017 11:25 AM, "Guido van Rossum"  wrote:


In short, I really don't think there's a need for context variables to be
faster than instance variables.


There really is: currently the cost of looking up a thread local through
the C API is a dict lookup, which is faster than instance variable lookup,
and decimal and numpy have both found that that's already too expensive.

Or maybe you're just talking about the speed when the cache misses, in
which case never mind :-).

-n


Re: [Python-Dev] Timeout for PEP 550 / Execution Context discussion

2017-10-16 Thread Nathaniel Smith
On Mon, Oct 16, 2017 at 8:49 AM, Guido van Rossum <gu...@python.org> wrote:
> On Sun, Oct 15, 2017 at 10:26 PM, Nathaniel Smith <n...@pobox.com> wrote:
>>
>> On Sun, Oct 15, 2017 at 10:10 PM, Guido van Rossum <gu...@python.org>
>> wrote:
>> > Yes, that's what I meant by "ignoring generators". And I'd like there to
>> > be
>> > a "current context" that's a per-thread MutableMapping with ContextVar
>> > keys.
>> > Maybe there's not much more to it apart from naming the APIs for getting
>> > and
>> > setting it? To be clear, I am fine with this being a specific subtype of
>> > MutableMapping. But I don't see much benefit in making it more abstract
>> > than
>> > that.
>>
>> We don't need it to be abstract (it's fine to have a single concrete
>> mapping type that we always use internally), but I think we do want it
>> to be opaque (instead of exposing the MutableMapping interface, the
>> only way to get/set specific values should be through the ContextVar
>> interface). The advantages are:
>>
>> - This allows C level caching of values in ContextVar objects (in
>> particular, funneling mutations through a limited API makes cache
>> invalidation *much* easier)
>
>
> Well the MutableMapping could still be a proxy or something that invalidates
> the cache when mutated. That's why I said it should be a single concrete
> mapping type. (It also doesn't have to derive from MutableMapping -- it's
> sufficient for it to be a duck type for one, or perhaps some Python-level
> code could `register()` it.

MutableMapping is just a really complicated interface -- you have to
deal with iterator invalidation and popitem and implementing view
classes and all that. It seems like a lot of code for a feature that
no-one seems to worry about missing right now. (In fact, I suspect the
extra code required to implement the full MutableMapping interface on
top of a basic HAMT type is larger than the extra code to implement
the current PEP 550 draft's chaining semantics on top of this proposal
for a minimal PEP 550.)

What do you think of something like:

class Context:
    def __init__(self, /, init: MutableMapping[ContextVar, object] = {}):
        ...

    def as_dict(self) -> Dict[ContextVar, object]:
        "Returns a snapshot of the internal state."

    def copy(self) -> Context:
        "Equivalent to (but maybe faster than) Context(self.as_dict())."

I like the idea of making it possible to set up arbitrary Contexts and
introspect them, because sometimes you do need to debug weird issues
or do some wacky stuff deep in the guts of a coroutine scheduler, but
this would give us that without implementing MutableMapping's 17
methods and 7 helper classes.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Timeout for PEP 550 / Execution Context discussion

2017-10-16 Thread Nathaniel Smith
On Mon, Oct 16, 2017 at 11:12 AM, Ethan Furman  wrote:
> What would be really nice is to have attribute access like thread locals.
> Instead of working with individual ContextVars you grab the LocalContext and
> access the vars as attributes.  I don't recall reading in the PEP why this
> is a bad idea.

You're mixing up levels -- the way threading.local objects work is
that there's one big dict that's hidden inside the interpreter (in the
ThreadState), and it holds a separate little dict for each
threading.local. The dict holding ContextVars is similar to the big
dict; a threading.local itself is like a ContextVar that holds a dict.
(And the reason it's this way is that it's easy to build either
version on top of the other, and we did some survey of threading.local
usage and the ContextVar style usage was simpler in the majority of
cases.)

For threading.local there's no way to get at the big dict at all from
Python; it's hidden inside the C APIs and threading internals. I'm
guessing you've never missed this :-). For ContextVars we can't hide
it that much, because async frameworks need to be able to swap the
current dict when switching tasks and clone it when starting a new
task, but those are the only absolutely necessary operations.
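
To make the two levels concrete (illustration only, using the contextvars
API as eventually proposed):

    import threading
    from contextvars import ContextVar

    # threading.local: the hidden per-thread dict maps each local object
    # to its own little namespace dict.
    request = threading.local()
    request.user = "alice"     # roughly: thread_dict[request]["user"] = "alice"

    # ContextVar: the per-context dict maps each variable straight to a
    # value, so a threading.local is like a ContextVar holding a dict.
    user_var = ContextVar("user")
    user_var.set("alice")      # roughly: context_dict[user_var] = "alice"

    print(request.user, user_var.get())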

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Timeout for PEP 550 / Execution Context discussion

2017-10-15 Thread Nathaniel Smith
On Sun, Oct 15, 2017 at 10:10 PM, Guido van Rossum <gu...@python.org> wrote:
> On Sun, Oct 15, 2017 at 8:17 PM, Nathaniel Smith <n...@pobox.com> wrote:
>>
>> On Sun, Oct 15, 2017 at 6:33 PM, Yury Selivanov <yselivanov...@gmail.com>
>> wrote:
>> > Stage 1. A new execution context PEP to solve the problem *just for
>> > async code*.  The PEP will target Python 3.7 and completely ignore
>> > synchronous generators and asynchronous generators.  It will be based
>> > on PEP 550 v1 (no chained lookups, immutable mapping or CoW as an
>> > optimization) and borrow some good API decisions from PEP 550 v3+
>> > (contextvars module, ContextVar class).  The API (and C-API) will be
>> > designed to be future proof and ultimately allow transition to the
>> > stage 2.
>>
>> If you want to ignore generators/async generators, then I think you
>> don't even want PEP 550 v1, you just want something like a
>> {set,get}_context_state API that lets you access the ThreadState's
>> context dict (or rather, an opaque ContextState object that holds the
>> context dict), and then task schedulers can call them at appropriate
>> moments.
>
>
> Yes, that's what I meant by "ignoring generators". And I'd like there to be
> a "current context" that's a per-thread MutableMapping with ContextVar keys.
> Maybe there's not much more to it apart from naming the APIs for getting and
> setting it? To be clear, I am fine with this being a specific subtype of
> MutableMapping. But I don't see much benefit in making it more abstract than
> that.

We don't need it to be abstract (it's fine to have a single concrete
mapping type that we always use internally), but I think we do want it
to be opaque (instead of exposing the MutableMapping interface, the
only way to get/set specific values should be through the ContextVar
interface). The advantages are:

- This allows C level caching of values in ContextVar objects (in
particular, funneling mutations through a limited API makes cache
invalidation *much* easier)

- It gives us flexibility to change the underlying data structure
without breaking API, or for different implementations to make
different choices -- in particular, it's not clear whether a dict or
HAMT is better, and it's not clear whether a regular dict or
WeakKeyDict is better.

The first point (caching) I think is the really compelling one: in
practice decimal and numpy are already using tricky caching code to
reduce the overhead of accessing the ThreadState dict, and this gets
even trickier with context-local state which has more cache
invalidation points, so if we don't do this in the interpreter then it
could actually become a blocker for adoption. OTOH it's easy for the
interpreter itself to do this caching, and it makes everyone faster.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Timeout for PEP 550 / Execution Context discussion

2017-10-15 Thread Nathaniel Smith
On Sun, Oct 15, 2017 at 6:33 PM, Yury Selivanov  wrote:
> Hi,
>
> It looks like the discussion about the execution context became
> extremely hard to follow.  There are many opinions on how the spec for
> generators should look like.  What seems to be "natural"
> behaviour/example to one, seems to be completely unreasonable to other
> people.  Recent emails from Guido indicate that he doesn't want to
> implement execution contexts for generators (at least in 3.7).
>
> In another thread Guido said this: "... Because coroutines and
> generators are similar under the covers, Yury demonstrated the issue
> with generators instead of coroutines (which are unfamiliar to many
> people). And then somehow we got hung up about fixing the problem in
> the example."
>
> And Guido is right.  My initial motivation to write PEP 550 was to
> solve my own pain point, have a solution for async code.
> 'threading.local' is completely unusable there, but complex code bases
> demand a working solution.  I thought that because coroutines and
> generators are so similar under the hood, I can design a simple
> solution that will cover all edge cases.  Turns out it is not possible
> to do it in one pass.
>
> Therefore, in order to make some progress, I propose to split the
> problem in half:
>
> Stage 1. A new execution context PEP to solve the problem *just for
> async code*.  The PEP will target Python 3.7 and completely ignore
> synchronous generators and asynchronous generators.  It will be based
> on PEP 550 v1 (no chained lookups, immutable mapping or CoW as an
> optimization) and borrow some good API decisions from PEP 550 v3+
> (contextvars module, ContextVar class).  The API (and C-API) will be
> designed to be future proof and ultimately allow transition to the
> stage 2.

If you want to ignore generators/async generators, then I think you
don't even want PEP 550 v1, you just want something like a
{set,get}_context_state API that lets you access the ThreadState's
context dict (or rather, an opaque ContextState object that holds the
context dict), and then task schedulers can call them at appropriate
moments.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Investigating time for `import requests`

2017-10-01 Thread Nathaniel Smith
On Sun, Oct 1, 2017 at 7:04 PM, INADA Naoki  wrote:
> 4. http.client
>
> import time:  1376 |   2448 |   email.header
> ...
> import time:  1469 |   7791 |   email.utils
> import time:   408 |  10646 | email._policybase
> import time:   939 |  12210 |   email.feedparser
> import time:   322 |  12720 | email.parser
> ...
> import time:   599 |   1361 | email.message
> import time:  1162 |  16694 |   http.client
>
> email.parser has very large import tree.
> But I don't know how to break the tree.

There is some work to get urllib3/requests to stop using http.client,
though it's not clear if/when it will actually happen:
https://github.com/shazow/urllib3/pull/1068

> Another major slowness comes from compiling regular expression.
> I think we can increase cache size of `re.compile` and use ondemand cached
> compiling (e.g. `re.match()`),
> instead of "compile at import time" in many modules.

In principle re.compile() itself could be made lazy -- return a
regular expression object that just holds the string, and then compiles
and caches it the first time it's used. Might be tricky to do in a
backwards-compatible way if it moves detection of invalid regexes
from compile time to use time, but it could be an opt-in flag.
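
A sketch of what the opt-in could look like purely from user code, without
touching re itself (hypothetical wrapper, not a stdlib API):

    import re

    class LazyPattern:
        """Defers re.compile() until the pattern is first used."""

        def __init__(self, pattern, flags=0):
            self._pattern = pattern
            self._flags = flags
            self._compiled = None

        def __getattr__(self, name):
            # First use of match()/search()/sub()/... triggers the compile;
            # note this also moves invalid-pattern errors from import time
            # to use time.
            if self._compiled is None:
                self._compiled = re.compile(self._pattern, self._flags)
            return getattr(self._compiled, name)

    # Module level, so import stays cheap:
    HEADER_RE = LazyPattern(r"^[A-Za-z-]+:\s*(.*)$")
    print(HEADER_RE.match("Content-Type: text/plain").group(1))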

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-09-25 Thread Nathaniel Smith
On Sat, Sep 23, 2017 at 2:45 AM, Antoine Pitrou  wrote:
>> As to "running_interpreters()" and "idle_interpreters()", I'm not sure
>> what the benefit would be.  You can compose either list manually with
>> a simple comprehension:
>>
>> [interp for interp in interpreters.list_all() if interp.is_running()]
>> [interp for interp in interpreters.list_all() if not interp.is_running()]
>
> There is a inherit race condition in doing that, at least if
> interpreters are running in multiple threads (which I assume is going
> to be the overly dominant usage model).  That is why I'm proposing all
> three variants.

There's a race condition no matter what the API looks like -- having a
dedicated running_interpreters() lets you guarantee that the returned
list describes the set of interpreters that were running at some
moment in time, but you don't know when that moment was and by the
time you get the list, it's already out-of-date. So this doesn't seem
very useful. OTOH if we think that invariants like this are useful, we
might also want to guarantee that calling running_interpreters() and
idle_interpreters() gives two lists such that each interpreter appears
in exactly one of them, but that's impossible with this API; it'd
require a single function that returns both lists.

What problem are you trying to solve?

>> Likewise,
>> queue.Queue.send() supports blocking, in addition to providing a
>> put_nowait() method.
>
> queue.Queue.put() never blocks in the usual case (*), which is of an
> unbounded queue.  Only bounded queues (created with an explicit
> non-zero max_size parameter) can block in Queue.put().
>
> (*) and therefore also never deadlocks :-)

Unbounded queues also introduce unbounded latency and memory usage in
realistic situations. (E.g. a producer/consumer setup where the
producer runs faster than the consumer.) There's a reason why sockets
always have bounded buffers -- it's sometimes painful, but the pain is
intrinsic to building distributed systems, and unbounded buffers just
paper over it.

>> > send() blocking until someone else calls recv() is not only bad for
>> > performance,
>>
>> What is the performance problem?
>
> Intuitively, there must be some kind of context switch (interpreter
> switch?) at each send() call to let the other end receive the data,
> since you don't have any internal buffering.

Technically you just need the other end to wake up at some time in
between any two calls to send(), and if there's no GIL then this
doesn't necessarily require a context switch.

> Also, suddenly an interpreter's ability to exploit CPU time is
> dependent on another interpreter's ability to consume data in a timely
> manner (what if the other interpreter is e.g. stuck on some disk I/O?).
> IMHO it would be better not to have such coupling.

A small buffer probably is useful in some cases, yeah -- basically
enough to smooth out scheduler jitter.

>> > it also increases the likelihood of deadlocks.
>>
>> How much of a problem will deadlocks be in practice?
>
> I expect more often than expected, in complex systems :-)  For example,
> you could have a recv() loop that also from time to time send()s some
> data on another queue, depending on what is received.  But if that
> send()'s recipient also has the same structure (a recv() loop which
> send()s from time to time), then it's easy to imagine to two getting in
> a deadlock.

You kind of want to be able to create deadlocks, since the alternative
is processes that can't coordinate and end up stuck in livelocks or
with unbounded memory use etc.

>> I'm not sure I understand your concern here.  Perhaps I used the word
>> "sharing" too ambiguously?  By "sharing" I mean that the two actors
>> have read access to something that at least one of them can modify.
>> If they both only have read-only access then it's effectively the same
>> as if they are not sharing.
>
> Right.  What I mean is that you *can* share very simple "data" under
> the form of synchronization primitives.  You may want to synchronize
> your interpreters even they don't share user-visible memory areas.  The
> point of synchronization is not only to avoid memory corruption but
> also to regulate and orchestrate processing amongst multiple workers
> (for example processes or interpreters).  For example, a semaphore is
> an easy way to implement "I want no more than N workers to do this
> thing at the same time" ("this thing" can be something such as disk
> I/O).

It's fairly reasonable to implement a mutex using a CSP-style
unbuffered channel (send = acquire, receive = release). And the same
trick turns a channel with a fixed-size buffer into a bounded
semaphore. It won't be as efficient as a modern specialized mutex
implementation, of course, but it's workable.
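
To illustrate with a stand-in (a bounded queue.Queue playing the role of
the buffered channel -- not the proposed PEP 554 API):

    import queue

    class ChannelSemaphore:
        """Bounded semaphore from a size-n buffered channel:
        send = acquire (blocks once n tokens are outstanding),
        receive = release."""

        def __init__(self, n):
            self._chan = queue.Queue(maxsize=n)

        def acquire(self):
            self._chan.put(None)

        def release(self):
            self._chan.get_nowait()

    sem = ChannelSemaphore(1)   # n=1 behaves as a mutex
    sem.acquire()
    # ... critical section ...
    sem.release()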

Unfortunately while technically you can construct a buffered channel
out of an unbuffered channel, the construction's pretty unreasonable
(it needs two dedicated threads per channel).

-n

-- 

Re: [Python-Dev] Evil reference cycles caused Exception.__traceback__

2017-09-18 Thread Nathaniel Smith
On Mon, Sep 18, 2017 at 10:59 AM, Antoine Pitrou <anto...@python.org> wrote:
> On 18/09/2017 at 19:53, Nathaniel Smith wrote:
>>>
>>>> Why are reference cycles a problem that needs solving?
>>>
>>> Because sometimes they are holding up costly resources in memory when
>>> people don't expect them to.  Such as large Numpy arrays :-)
>>
>> Do we have any reason to believe that this is actually happening on a
>> regular basis though?
>
> Define "regular" :-)  We did get some reports on dask/distributed about it.

Caused by uncollected cycles involving tracebacks? I looked here:

  
https://github.com/dask/distributed/issues?utf8=%E2%9C%93=is%3Aissue%20memory%20leak

and saw some issues with cycles causing delayed collection (e.g. #956)
or the classic memory leak problem of explicitly holding onto data you
don't need any more (e.g. #1209, bpo-29861), but nothing involving
traceback cycles. It was just a quick skim though.

>> If it is then it might make sense to look at the cycle collection
>> heuristics; IIRC they're based on a fairly naive count of how many
>> allocations have been made, without regard to their size.
>
> Yes... But just because a lot of memory has been allocated isn't a good
> enough heuristic to launch a GC collection.

I'm not an expert on GC at all, but intuitively it sure seems like
allocation size might be a useful piece of information to feed into a
heuristic. Our current heuristic is just, run a small collection after
every 700 allocations, run a larger collection after 10 smaller
collections.
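
(Those knobs are exposed through the gc module, e.g.:)

    import gc

    print(gc.get_threshold())   # default: (700, 10, 10)

    # Make young-generation collections roughly 10x less frequent:
    gc.set_threshold(7000, 10, 10)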

> What if that memory is
> gonna stay allocated for a long time?  Then you're frequently launching
> GC runs for no tangible result except more CPU consumption and frequent
> pauses.

Every heuristic has problematic cases, that's why we call it a
heuristic :-). But somehow every other GC language manages to do
well-enough without refcounting... I think they mostly have more
sophisticated heuristics than CPython, though. Off the top of my head,
I know PyPy's heuristic involves the ratio of the size of nursery
objects versus the size of the heap, and JVMs do much cleverer things
like auto-tuning nursery size to make empirical pause times match some
target.

> Perhaps we could special-case tracebacks somehow, flag when a traceback
> remains alive after the implicit "del" clause at the end of an "except"
> block, then maintain some kind of linked list of the flagged tracebacks
> and launch specialized GC runs to find cycles accross that collection.
> That sounds quite involved, though.

We already keep a list of recently allocated objects and have a
specialized GC that runs across just that collection. That's what
generational GC is :-).

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Evil reference cycles caused Exception.__traceback__

2017-09-18 Thread Nathaniel Smith
On Mon, Sep 18, 2017 at 9:50 AM, Antoine Pitrou <solip...@pitrou.net> wrote:
> On Mon, 18 Sep 2017 09:42:45 -0700
> Nathaniel Smith <n...@pobox.com> wrote:
>>
>> Obviously it's nice when the refcount system is able to implicitly clean
>> things up in a prompt and deterministic way, but there are already tools to
>> handle the cases where it doesn't (ResourceWarning, context managers, ...),
>> and the more we encourage people to implicitly rely on refcounting, [...]
>
> The thing is, we don't need to encourage them.  Having objects disposed
> of when the last visible reference vanishes is a pretty common
> expectation people have when using CPython.
>
>> Why are reference cycles a problem that needs solving?
>
> Because sometimes they are holding up costly resources in memory when
> people don't expect them to.  Such as large Numpy arrays :-)

Do we have any reason to believe that this is actually happening on a
regular basis though?

If it is then it might make sense to look at the cycle collection
heuristics; IIRC they're based on a fairly naive count of how many
allocations have been made, without regard to their size.

> And, no, there are no obvious ways to fix for users.  gc.collect()
> is much too costly to be invoked on a regular basis.
>
>> Because if so then that seems
>> like a bug in the warnings mechanism; there's no harm in a dead Thread
>> hanging around until collected, and Victor may have wasted a day debugging
>> an issue that wasn't a problem in the first place...
>
> Yes, I think Victor is getting a bit overboard with the so-called
> "dangling thread" issue.  But the underlying issue (that heavyweight
> resources can be inadvertently held up in memory up just because some
> unrelated exception was caught and silenced along the way) is a real
> one.

Simply catching and silencing exceptions doesn't create any loops -- if you do

try:
    raise ValueError
except ValueError as exc:
    raise exc

then there's no loop, because the 'exc' local gets cleared as soon as
you exit the except: block. The issue that Victor ran into with
socket.create_connection is a special case where that function saves
off the caught exception to use later.

If someone wanted to replace socket.create_connection's 'raise err' with

try:
    raise err
finally:
    del err

then I guess that would be pretty harmless...

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Evil reference cycles caused Exception.__traceback__

2017-09-18 Thread Nathaniel Smith
On Sep 18, 2017 07:58, "Antoine Pitrou"  wrote:


On 18/09/2017 at 16:52, Guido van Rossum wrote:
>
> In Python 2 the traceback was not part of the exception object because
> there was (originally) no cycle GC. In Python GC we changed the awkward
> interface to something more useful, because we could depend on GC. Why
> are we now trying to roll back this feature? We should just improve GC.
> (Or perhaps you shouldn't be raising so many exceptions. :-)

Improving the GC is obviously a good thing, but what heuristic would you
have in mind that may solve the issue at hand?


I read the whole thread and I'm not sure what the issue at hand is :-).
Obviously it's nice when the refcount system is able to implicitly clean
things up in a prompt and deterministic way, but there are already tools to
handle the cases where it doesn't (ResourceWarning, context managers, ...),
and the more we encourage people to implicitly rely on refcounting, the
harder it is to optimize the interpreter or use alternative language
implementations. Why are reference cycles a problem that needs solving?

Actually the bit that I find most confusing about Victor's story is, how
can a traceback frame keep a thread alive? Is the problem that "dangling
thread" warnings are being triggered by threads that are finished and dead
but their Thread objects are still allocated? Because if so then that seems
like a bug in the warnings mechanism; there's no harm in a dead Thread
hanging around until collected, and Victor may have wasted a day debugging
an issue that wasn't a problem in the first place...

-n


Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-09-14 Thread Nathaniel Smith
On Thu, Sep 14, 2017 at 5:44 PM, Nick Coghlan <ncogh...@gmail.com> wrote:
> On 14 September 2017 at 15:27, Nathaniel Smith <n...@pobox.com> wrote:
>> I don't get it. With bytes, you can either share objects or copy them and
>> the user can't tell the difference, so you can change your mind later if you
>> want.
>> But memoryviews require some kind of cross-interpreter strong
>> reference to keep the underlying buffer object alive. So if you want to
>> minimize object sharing, surely bytes are more future-proof.
>
> Not really, because the only way to ensure object separation (i.e no
> refcounted objects accessible from multiple interpreters at once) with
> a bytes-based API would be to either:
>
> 1. Always copy (eliminating most of the low overhead communications
> benefits that subinterpreters may offer over multiple processes)
> 2. Make the bytes implementation more complicated by allowing multiple
> bytes objects to share the same underlying storage while presenting as
> distinct objects in different interpreters
> 3. Make the output on the receiving side not actually a bytes object,
> but instead a view onto memory owned by another object in a different
> interpreter (a "memory view", one might say)
>
> And yes, using memory views for this does mean defining either a
> subclass or a mediating object that not only keeps the originating
> object alive until the receiving memoryview is closed, but also
> retains a reference to the originating interpreter so that it can
> switch to it when it needs to manipulate the source object's refcount
> or call one of the buffer methods.
>
> Yury and I are fine with that, since it means that either the sender
> *or* the receiver can decide to copy the data (e.g. by calling
> bytes(obj) before sending, or bytes(view) after receiving), and in the
> meantime, the object holding the cross-interpreter view knows that it
> needs to switch interpreters (and hence acquire the sending
> interpreter's GIL) before doing anything with the source object.
>
> The reason we're OK with this is that it means that only reading a new
> message from a channel (i.e creating a cross-interpreter view) or
> discarding a previously read message (i.e. closing a cross-interpreter
> view) will be synchronisation points where the receiving interpreter
> necessarily needs to acquire the sending interpreter's GIL.
>
> By contrast, if we allow an actual bytes object to be shared, then
> either every INCREF or DECREF on that bytes object becomes a
> synchronisation point, or else we end up needing some kind of
> secondary per-interpreter refcount where the interpreter doesn't drop
> its shared reference to the original object in its source interpreter
> until the internal refcount in the borrowing interpreter drops to
> zero.

Ah, that makes more sense.

I am nervous that allowing arbitrary memoryviews gives a *little* more
power than we need or want. I like that the current API can reasonably
be emulated using subprocesses -- it opens up the door for backports,
compatibility support on language implementations that don't support
subinterpreters, direct benchmark comparisons between the two
implementation strategies, etc. But if we allow arbitrary memoryviews,
then this requires that you can take (a) an arbitrary object, not
specified ahead of time, and (b) provide two read-write views on it in
separate interpreters such that modifications made in one are
immediately visible in the other. Subprocesses can do one or the other
-- they can copy arbitrary data, and if you warn them ahead of time
when you allocate the buffer, they can do real zero-copy shared
memory. But the combination is really difficult.

It'd be one thing if this were like a key feature that gave
subinterpreters an advantage over subprocesses, but it seems really
unlikely to me that a library won't know ahead of time when it's
filling in a buffer to be transferred, and if anything it seems like
we'd rather not expose read-write shared mappings in any case. It's
extremely non-trivial to do right [1].

tl;dr: let's not rule out a useful implementation strategy based on a
feature we don't actually need.

One alternative would be your option (3) -- you can put bytes in and
get memoryviews out, and since bytes objects are immutable it's OK.

[1] https://en.wikipedia.org/wiki/Memory_model_(programming)

>>> Handling an exception
>>> -
>> It would also be reasonable to simply not return any value/exception from
>> run() at all, or maybe just a bool for whether there was an unhandled
>> exception. Any high level API is going to be injecting code on both sides of
>> the interpreter boundary anyway, so it can do whatever exception and
>> traceback translation it wants to.
>
> So a

Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-09-13 Thread Nathaniel Smith
On Sep 13, 2017 9:01 PM, "Nick Coghlan"  wrote:

On 14 September 2017 at 11:44, Eric Snow 
wrote:
>send(obj):
>
>Send the object to the receiving end of the channel.  Wait until
>the object is received.  If the channel does not support the
>object then TypeError is raised.  Currently only bytes are
>supported.  If the channel has been closed then EOFError is
>raised.

I still expect any form of object sharing to hinder your
per-interpreter GIL efforts, so restricting the initial implementation
to memoryview-only seems more future-proof to me.


I don't get it. With bytes, you can either share objects or copy them and
the user can't tell the difference, so you can change your mind later if
you want. But memoryviews require some kind of cross-interpreter strong
reference to keep the underlying buffer object alive. So if you want to
minimize object sharing, surely bytes are more future-proof.


> Handling an exception
> -
>
> ::
>
>interp = interpreters.create()
>try:
>interp.run("""if True:
>raise KeyError
>""")
>except KeyError:
>print("got the error from the subinterpreter")

As with the message passing through channels, I think you'll really
want to minimise any kind of implicit object sharing that may
interfere with future efforts to make the GIL truly an *interpreter*
lock, rather than the global process lock that it is currently.

One possible way to approach that would be to make the low level run()
API a more Go-style API rather than a Python-style one, and have it
return a (result, err) 2-tuple. "err.raise()" would then translate the
foreign interpreter's exception into a local interpreter exception,
but the *traceback* for that exception would be entirely within the
current interpreter.


It would also be reasonable to simply not return any value/exception from
run() at all, or maybe just a bool for whether there was an unhandled
exception. Any high level API is going to be injecting code on both sides
of the interpreter boundary anyway, so it can do whatever exception and
traceback translation it wants to.


> Reseting __main__
> -
>
> As proposed, every call to ``Interpreter.run()`` will execute in the
> namespace of the interpreter's existing ``__main__`` module.  This means
> that data persists there between ``run()`` calls.  Sometimes this isn't
> desireable and you want to execute in a fresh ``__main__``.  Also,
> you don't necessarily want to leak objects there that you aren't using
> any more.
>
> Solutions include:
>
> * a ``create()`` arg to indicate resetting ``__main__`` after each
>   ``run`` call
> * an ``Interpreter.reset_main`` flag to support opting in or out
>   after the fact
> * an ``Interpreter.reset_main()`` method to opt in when desired
>
> This isn't a critical feature initially.  It can wait until later
> if desirable.

I was going to note that you can already do this:

interp.run("globals().clear()")

However, that turns out to clear *too* much, since it also clobbers
all the __dunder__ attributes that the interpreter needs in a code
execution environment.

Either way, if you added this, I think it would make more sense as an
"importlib.util.reset_globals()" operation, rather than have it be
something specific to subinterpreters.


This is another point where the API could reasonably say that if you want
clean namespaces then you should do that yourself (e.g. by setting up your
own globals dict and using it to execute any post-bootstrap code).
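
E.g., a user-level "reset" that avoids clobbering the __dunder__ machinery
could be as small as running something like this in the subinterpreter (a
sketch against the draft PEP 554 API; it leaves the loop variable behind,
which a real helper would tidy up):

    import textwrap

    RESET_MAIN = textwrap.dedent("""\
        for name in [n for n in globals()
                     if not (n.startswith('__') and n.endswith('__'))]:
            del globals()[name]
        """)
    interp.run(RESET_MAIN)   # 'interp' as in the draft PEP's examples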

-n


Re: [Python-Dev] PEP 549: Instance Properties (aka: module properties)

2017-09-13 Thread Nathaniel Smith
On Wed, Sep 13, 2017 at 11:49 AM, Guido van Rossum  wrote:
>  > Why not adding both? Properties do have their uses as does __getattr__.
>
> In that case I would just add __getattr__ to module.c, and add a recipe or
> perhaps a utility module that implements a __getattr__ you can put into your
> module if you want @property support. That way you can have both but you
> only need a little bit of code in module.c to check for __getattr__ and call
> it when you'd otherwise raise AttributeError.

Unfortunately I don't think this works. If there's a @property object
present in the module's instance dict, then __getattribute__ will
return it directly instead of calling __getattr__.

(I guess for full property emulation you'd also need to override
__setattr__ and __dir__, but I don't know how important that is.)

We could consider letting modules overload __getattribute__ instead of
__getattr__, but I don't think this is viable either -- a key feature
of __getattr__ is that it doesn't add overhead to normal lookups. If
you implement deprecation warnings by overloading __getattribute__,
then it makes all your users slower, even the ones who never touch the
deprecated attributes. __getattr__ is much better than
__getattribute__ for this purpose.

Alternatively we can have a recipe that implements @property support
using __class__ assignment and overriding
__getattribute__/__setattr__/__dir__, so instead of 'from
module_helper.property_emulation import __getattr__' it'd be 'from
module_helper import enable_property_emulation;
enable_property_emulation(__name__)'. Still has the slowdown problem
but it would work.
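
Sketch of what I mean by that helper (untested, and it skips the
__setattr__/__dir__ parts, but it shows the idea):

    import sys, types

    def enable_property_emulation(modname):
        mod = sys.modules[modname]
        cls = type("_PropertyModule", (types.ModuleType,), {})
        # Move any property objects out of the module dict onto the new
        # class, so the descriptor protocol actually fires on access.
        for name, value in list(vars(mod).items()):
            if isinstance(value, property):
                setattr(cls, name, value)
                delattr(mod, name)
        mod.__class__ = cls

A module would define its properties at top level as usual and then call
enable_property_emulation(__name__) as the last thing it does.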

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] breakpoint() and $PYTHONBREAKPOINT

2017-09-11 Thread Nathaniel Smith
On Mon, Sep 11, 2017 at 6:45 PM, Barry Warsaw <ba...@python.org> wrote:
> On Sep 11, 2017, at 18:15, Nathaniel Smith <n...@pobox.com> wrote:
>
>> Compared to checking it on each call to sys.breakpointhook(), I guess
>> the two user-visible differences in behavior would be:
>>
>> - whether mutating os.environ["PYTHONBREAKPOINT"] inside the process
>> affects future calls. I would find it quite surprising if it did;
>
> Maybe, but the effect would be essentially the same as setting 
> sys.breakpointhook during the execution of your program.
>
>> - whether the import happens immediately at startup, or is delayed
>> until the first call to breakpoint().
>
> I definitely think it should be delayed until first use.  You might never hit 
> the breakpoint() in which case you wouldn’t want to pay any penalty for even 
> checking the environment variable.  And once you *do* hit the breakpoint(), 
> you aren’t caring about performance then anyway.
>
> Both points could be addressed by caching the import after the first lookup, 
> but even there I don’t think it’s worth it.  I’d invoke KISS and just look it 
> up anew each time.  That way there’s no global/static state to worry about, 
> and the feature is as flexible as it can be.
>
>> If it's imported once at
>> startup, then this adds overhead to programs that set PYTHONBREAKPOINT
>> but don't use it, and if the envvar is set to some nonsense then you
>> get an error immediately instead of at the first call to breakpoint().
>
> That’s one case I can see being useful; report an error immediately upon 
> startup rather that when you hit breakpoint().  But even there, I’m not sure. 
>  What if you’ve got some kind of dynamic debugger discovery thing going on, 
> such that the callable isn’t available until its first use?

I don't think that's a big deal? Just do the discovery logic from
inside the callable itself, instead of using some kind of magical
attribute lookup hook.

>> These both seem like fairly minor differences to me, but maybe saving
>> that 30 ms or whatever of startup time is an important enough
>> optimization to justify the special case?
>
> I’m probably too steeped in the implementation, but it feels to me like not 
> just loading the envar callable on demand makes reasoning about and using 
> this more complicated.  I think for most uses though, it won’t really matter.

I don't mind breaking from convention if you have a good reason, I
just like it for PEPs to actually write down that reasoning :-)

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] breakpoint() and $PYTHONBREAKPOINT

2017-09-11 Thread Nathaniel Smith
On Mon, Sep 11, 2017 at 5:27 PM, Barry Warsaw <ba...@python.org> wrote:
> On Sep 10, 2017, at 13:46, Nathaniel Smith <n...@pobox.com> wrote:
>>
>> On Sun, Sep 10, 2017 at 12:06 PM, Barry Warsaw <ba...@python.org> wrote:
>>> For PEP 553, I think it’s a good idea to support the environment variable 
>>> $PYTHONBREAKPOINT[*] but I’m stuck on a design question, so I’d like to get 
>>> some feedback.
>>>
>>> Should $PYTHONBREAKPOINT be consulted in breakpoint() or in 
>>> sys.breakpointhook()?
>>
>> Wouldn't the usual pattern be to check $PYTHONBREAKPOINT once at
>> startup, and if it's set use it to initialize sys.breakpointhook()?
>> Compare to, say, $PYTHONPATH.
>
> Perhaps, but what would be the visible effects of that?  I.e. what would that 
> buy you?

Why is this case special enough to break the rules?

Compared to checking it on each call to sys.breakpointhook(), I guess
the two user-visible differences in behavior would be:

- whether mutating os.environ["PYTHONBREAKPOINT"] inside the process
affects future calls. I would find it quite surprising if it did;
generally when we mutate envvars like os.environ["PYTHONPATH"] it's a
way to set things up for future child processes and doesn't affect our
process.

- whether the import happens immediately at startup, or is delayed
until the first call to breakpoint(). If it's imported once at
startup, then this adds overhead to programs that set PYTHONBREAKPOINT
but don't use it, and if the envvar is set to some nonsense then you
get an error immediately instead of at the first call to breakpoint().
These both seem like fairly minor differences to me, but maybe saving
that 30 ms or whatever of startup time is an important enough
optimization to justify the special case?
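
In case it helps the discussion, the "look it up anew each time" version
is only a few lines anyway -- something like (a sketch only, not what the
PEP specifies):

    import importlib, os, pdb

    def breakpointhook(*args, **kwargs):
        target = os.environ.get("PYTHONBREAKPOINT")
        if target in (None, ""):
            return pdb.set_trace()        # the default behaviour
        if target == "0":
            return None                   # explicitly disabled
        modname, _, funcname = target.rpartition(".")
        module = importlib.import_module(modname or "builtins")
        return getattr(module, funcname)(*args, **kwargs)

versus hoisting the os.environ lookup (and maybe the import) out to
interpreter startup and just stashing the resulting callable.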

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 557: Data Classes

2017-09-11 Thread Nathaniel Smith
On Mon, Sep 11, 2017 at 5:32 AM, Eric V. Smith <e...@trueblade.com> wrote:
> On 9/10/17 11:08 PM, Nathaniel Smith wrote:
>>
>> Hi Eric,
>>
>> A few quick comments:
>>
>> Why do you even have a hash= argument on individual fields? For the
>> whole class, I can imagine you might want to explicitly mark a whole
>> class as unhashable, but it seems like the only thing you can do with
>> the field-level hash= argument is to create a class where the __hash__
>> and __eq__ take different fields into account, and why would you ever
>> want that?
>
>
> The use case is that you have a cache, or something similar, that doesn't
> affect the object identity.

But wouldn't this just be field(cmp=False), no need to fiddle with hash=?
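
I.e., with the PEP's current draft spelling (which is exactly what's being
debated here, so take the names with a grain of salt):

    @dataclass
    class Vector:
        x: float
        y: float
        _norm_cache: object = field(default=None, cmp=False)  # cache; ignored by __eq__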

>> Though honestly I can see a reasonable argument for removing the
>> class-level hash= option too. And even if you keep it you might want to
>> error on some truly nonsensical options like defining __hash__ without
>> __eq__. (Also watch out that Python's usual rule about defining __eq__
>> blocking the inheritance of __hash__ does not kick in if __eq__ is added
>> after the class is created.)
>>
>> I've sometimes wished that attrs let me control whether it generated
>> equality methods (eq/ne/hash) separately from ordering methods
>> (lt/gt/...). Maybe the cmp= argument should take an enum with options
>> none/equality-only/full?
>
>
> Yeah, I've thought about this, too. But I don't have any use case in mind,
> and if it hasn't come up with attrs, then I'm reluctant to break new ground
> here.

https://github.com/python-attrs/attrs/issues/170

From that thread, it feels like part of the problem is that it's
awkward to encode this using only boolean arguments, but they're sort
of stuck with that for backcompat with how they originally defined
cmp=, hence my suggestion to consider making it an enum from the start
:-).

>> The "why not attrs" section kind of reads like "because it's too popular
>> and useful"?
>
>
> I'll add some words to that section, probably focused on typing
> compatibility. My general feeling is that attrs has some great design
> decisions, but goes a little too far (e.g., conversions, validations). As
> with most things we add, I'm trying to be as minimalist as possible, while
> still being widely useful and allowing 3rd party extensions and future
> features.

If the question is "given that we're going to add something to the
stdlib, why shouldn't that thing be attrs?" then I guess it's
sufficient to say "because the attrs developers didn't want it". But I
think the PEP should also address the question "why are we adding
something to the stdlib, instead of just recommending people install
attrs".

-n
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 557: Data Classes

2017-09-10 Thread Nathaniel Smith
Hi Eric,

A few quick comments:

Why do you even have a hash= argument on individual fields? For the whole
class, I can imagine you might want to explicitly mark a whole class as
unhashable, but it seems like the only thing you can do with the
field-level hash= argument is to create a class where the __hash__ and
__eq__ take different fields into account, and why would you ever want that?

Though honestly I can see a reasonable argument for removing the
class-level hash= option too. And even if you keep it you might want to
error on some truly nonsensical options like defining __hash__ without
__eq__. (Also watch out that Python's usual rule about defining __eq__
blocking the inheritance of __hash__ does not kick in if __eq__ is added
after the class is created.)
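
E.g., this still works, because the implicit "__hash__ = None" rule only
applies to an __eq__ defined in the class body:

    class A:
        pass

    A.__eq__ = lambda self, other: True   # added *after* class creation

    hash(A())   # still hashable, uses object.__hash__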

I've sometimes wished that attrs let me control whether it generated
equality methods (eq/ne/hash) separately from ordering methods (lt/gt/...).
Maybe the cmp= argument should take an enum with options
none/equality-only/full?

The "why not attrs" section kind of reads like "because it's too popular
and useful"?

-n

On Sep 8, 2017 08:44, "Eric V. Smith"  wrote:

Oops, I forgot the link. It should show up shortly at
https://www.python.org/dev/peps/pep-0557/.

Eric.


On 9/8/17 7:57 AM, Eric V. Smith wrote:

> I've written a PEP for what might be thought of as "mutable namedtuples
> with defaults, but not inheriting tuple's behavior" (a mouthful, but it
> sounded simpler when I first thought of it). It's heavily influenced by
> the attrs project. It uses PEP 526 type annotations to define fields.
> From the overview section:
>
> @dataclass
> class InventoryItem:
>     name: str
>     unit_price: float
>     quantity_on_hand: int = 0
>
>     def total_cost(self) -> float:
>         return self.unit_price * self.quantity_on_hand
>
> Will automatically add these methods:
>
>   def __init__(self, name: str, unit_price: float, quantity_on_hand: int = 0) -> None:
>       self.name = name
>       self.unit_price = unit_price
>       self.quantity_on_hand = quantity_on_hand
>   def __repr__(self):
>       return f'InventoryItem(name={self.name!r},unit_price={self.unit_price!r},quantity_on_hand={self.quantity_on_hand!r})'
>
>   def __eq__(self, other):
>       if other.__class__ is self.__class__:
>           return (self.name, self.unit_price, self.quantity_on_hand) == (other.name, other.unit_price, other.quantity_on_hand)
>       return NotImplemented
>   def __ne__(self, other):
>       if other.__class__ is self.__class__:
>           return (self.name, self.unit_price, self.quantity_on_hand) != (other.name, other.unit_price, other.quantity_on_hand)
>       return NotImplemented
>   def __lt__(self, other):
>       if other.__class__ is self.__class__:
>           return (self.name, self.unit_price, self.quantity_on_hand) < (other.name, other.unit_price, other.quantity_on_hand)
>       return NotImplemented
>   def __le__(self, other):
>       if other.__class__ is self.__class__:
>           return (self.name, self.unit_price, self.quantity_on_hand) <= (other.name, other.unit_price, other.quantity_on_hand)
>       return NotImplemented
>   def __gt__(self, other):
>       if other.__class__ is self.__class__:
>           return (self.name, self.unit_price, self.quantity_on_hand) > (other.name, other.unit_price, other.quantity_on_hand)
>       return NotImplemented
>   def __ge__(self, other):
>       if other.__class__ is self.__class__:
>           return (self.name, self.unit_price, self.quantity_on_hand) >= (other.name, other.unit_price, other.quantity_on_hand)
>       return NotImplemented
>
> Data Classes saves you from writing and maintaining these functions.
>
> The PEP is largely complete, but could use some filling out in places.
> Comments welcome!
>
> Eric.
>
> P.S. I wrote this PEP when I was in my happy place.
>
> ___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/njs%
40pobox.com
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] breakpoint() and $PYTHONBREAKPOINT

2017-09-10 Thread Nathaniel Smith
On Sun, Sep 10, 2017 at 12:06 PM, Barry Warsaw  wrote:
> For PEP 553, I think it’s a good idea to support the environment variable 
> $PYTHONBREAKPOINT[*] but I’m stuck on a design question, so I’d like to get 
> some feedback.
>
> Should $PYTHONBREAKPOINT be consulted in breakpoint() or in 
> sys.breakpointhook()?

Wouldn't the usual pattern be to check $PYTHONBREAKPOINT once at
startup, and if it's set use it to initialize sys.breakpointhook()?
Compare to, say, $PYTHONPATH.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 554 v2 (new "interpreters" module)

2017-09-09 Thread Nathaniel Smith
On Sep 8, 2017 4:06 PM, "Eric Snow"  wrote:


   run(code):

  Run the provided Python code in the interpreter, in the current
  OS thread.  If the interpreter is already running then raise
  RuntimeError in the interpreter that called ``run()``.

  The current interpreter (which called ``run()``) will block until
  the subinterpreter finishes running the requested code.  Any
  uncaught exception in that code will bubble up to the current
  interpreter.


This phrase "bubble up" here is doing a lot of work :-). Can you elaborate
on what you mean? The text now makes it seem like the exception will just
pass from one interpreter into another, but that seems impossible – it'd
mean sharing not just arbitrary user defined exception classes but full
frame objects...

-n
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 554 v2 (new "interpreters" module)

2017-09-09 Thread Nathaniel Smith
On Sep 9, 2017 9:07 AM, "Nick Coghlan"  wrote:


To immediately realise some level of efficiency benefits from the
shared memory space between the main interpreter and subinterpreters,
I also think these low level FIFOs should be defined as accepting any
object that supports the PEP 3118 buffer protocol, and emitting
memoryview() objects on the receiving end, rather than being bytes-in,
bytes-out.


Is your idea that this memoryview would refer directly to the sending
interpreter's memory (as opposed to a copy into some receiver-owned
buffer)? If so, then how do the two subinterpreters coordinate the buffer
release when the memoryview is closed?

-n
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 554 v2 (new "interpreters" module)

2017-09-08 Thread Nathaniel Smith
On Fri, Sep 8, 2017 at 8:54 PM, Michel Desmoulin
 wrote:
>
> On 09/09/2017 at 01:28, Stefan Krah wrote:
>> Still, the argument "who uses subinterpreters?" of course still remains.
>
> For now, nobody. But if we expose it and web frameworks manage to create
> workers as fast as multiprocessing and as cheap as threading, you will
> find a lot of people starting to want to use it.

To temper expectations a bit here, it sounds like the first version
might be more like: as slow as threading (no multicore), as expensive
as multiprocessing (no shared memory), and -- on Unix -- slower to
start than either of them (no fork).

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 549 v2: now titled Instance Descriptors

2017-09-08 Thread Nathaniel Smith
On Fri, Sep 8, 2017 at 1:45 PM, Ethan Furman  wrote:
> On 09/08/2017 12:44 PM, Larry Hastings wrote:
>
>> I've updated PEP 549 with a new title--"Instance Descriptors" is a better
>> name than "Instance Properties"--and to
>> clarify my rationale for the PEP.  I've also updated the prototype with
>> code cleanups and a new type:
>> "collections.abc.InstanceDescriptor", a base class that allows user
>> classes to be instance descriptors.
>
>
> I like the new title, I'm +0 on the PEP itself, and I have one correction
> for the PEP:  we've had the ability to simulate module properties for ages:
>
> Python 2.7.6 (default, Oct 26 2016, 20:32:47)
> [GCC 4.8.4] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> --> import module_class
> --> module_class.hello
> 'hello'
> --> module_class.hello = 'hola'
> --> module_class.hello
> 'hola'
>
> And the code:
>
> class ModuleClass(object):
>     @property
>     def hello(self):
>         try:
>             return self._greeting
>         except AttributeError:
>             return 'hello'
>     @hello.setter
>     def hello(self, value):
>         self._greeting = value
>
> import sys
> sys.modules[__name__] = ModuleClass()
>
> I will admit I don't see what reassigning the __class__ attribute on a
> module did for us.

If you have an existing package that doesn't replace itself in
sys.modules, then it's difficult and risky to switch to that form --
don't think of toy examples, think of django/__init__.py or
numpy/__init__.py. You have to rewrite the whole export logic, and you
have to figure out what to do with things like submodules that import
from the parent module before the swaparoo happens, you can get skew
issues between the original module namespace and the replacement class
namespace, etc. The advantage of the __class__ assignment trick (as
compared to what we had before) is that it lets you easily and safely
retrofit this kind of magic onto existing packages.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 556: Threaded garbage collection

2017-09-08 Thread Nathaniel Smith
On Fri, Sep 8, 2017 at 12:13 PM, Antoine Pitrou  wrote:
> On Fri, 08 Sep 2017 12:04:10 -0700
> Benjamin Peterson  wrote:
>> I like it overall.
>>
>> - I was wondering what happens during interpreter shutdown. I see you
>> have that listed as a open issue. How about simply shutting down the
>> finalization thread and not guaranteeing that finalizers are actually
>> ever run à la Java?
>
> I don't know.  People generally have expectations towards stuff being
> finalized properly (especially when talking about files etc.).
> Once the first implementation is devised, we will know more about
> what's workable (perhaps we'll have to move _PyGC_Fini earlier in the
> shutdown sequence?  perhaps we'll want to switch back to serial mode
> when shutting down?).

PyPy just abandons everything when shutting down, instead of running
finalizers. See the last paragraph of:
http://doc.pypy.org/en/latest/cpython_differences.html#differences-related-to-garbage-collection-strategies

So that might be a useful source of experience.

On another note, I'm going to be that annoying person who suggests
massively extending the scope of your proposal. Feel free to throw
things at me or whatever.

Would it make sense to also move signal handlers to run in this
thread? Those are the other major source of nasty re-entrancy
problems.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 553: Built-in debug()

2017-09-06 Thread Nathaniel Smith
On Wed, Sep 6, 2017 at 7:39 AM, Barry Warsaw <ba...@python.org> wrote:
>> On Tue, Sep 5, 2017 at 7:58 PM, Nathaniel Smith <n...@pobox.com> wrote:
>> This would also avoid confusion with IPython's very
>> useful debug magic:
>> 
>> https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-debug
>> and which might also be worth stealing for the builtin REPL.
>> (Personally I use it way more often than set_trace().)
>
> Interesting.  I’m not an IPython user.  Do you think its %debug magic would 
> benefit from PEP 553?

Not in particular. But if you're working on making debugger entry more
discoverable/human-friendly, then providing a friendlier alias for the
pdb.pm() semantics might be useful too?

Actually, if you look at the pdb docs, the 3 ways of entering the
debugger that merit demonstrations at the top of the manual page are:

  pdb.run("...code...")   # "I want to debug this code"
  pdb.set_trace()         # "break here"
  pdb.pm()                # "wtf just happened?"

The set_trace() name is particularly opaque, but if we're talking
about adding a friendly debugger abstraction layer then I'd at least
think about whether to make it cover all three of these.
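
For the pm() case I mean something like this (hypothetical spelling for
the friendly wrapper, of course):

    import pdb, sys

    def broken():
        x = 1
        raise RuntimeError

    try:
        broken()
    except RuntimeError:
        # "wtf just happened?" -- inspect the frame that raised
        pdb.post_mortem(sys.exc_info()[2])

which is what pdb.pm() does for the last *uncaught* exception at the
interactive prompt.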

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 550 v4

2017-09-06 Thread Nathaniel Smith
On Wed, Sep 6, 2017 at 1:49 AM, Ivan Levkivskyi  wrote:
> Another comment from bystander point of view: it looks like the discussions
> of API design and implementation are a bit entangled here.
> This is much better in the current version of the PEP, but still there is a
> _feelling_ that some design decisions are influenced by the implementation
> strategy.
>
> As I currently see the "philosophy" at large is like this:
> there are different level of coupling between concurrently executing code:
> * processes: practically not coupled, designed to be long running
> * threads: more tightly coupled, designed to be less long-lived, context is
> managed by threading.local, which is not inherited on "forking"
> * tasks: tightly coupled, designed to be short-lived, context will be
> managed by PEP 550, context is inherited on "forking"
>
> This seems right to me.
>
> Normal generators fall out from this "scheme", and it looks like their
> behavior is determined by the fact that coroutines are implemented as
> generators. What I think might help is to add a few more motivational examples
> to the design section of the PEP.

Literally the first motivating example at the beginning of the PEP
('def fractions ...') involves only generators, not coroutines, and
only works correctly if generators get special handling. (In fact, I'd
be curious to see how Greg's {push,pop}_local_storage could handle
this case.) The implementation strategy changed radically between v1
and v2 because of considerations around generator (not coroutine)
semantics. I'm not sure what more it can do to dispel these feelings
:-).

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 553: Built-in debug()

2017-09-05 Thread Nathaniel Smith
On Tue, Sep 5, 2017 at 6:14 PM, Barry Warsaw  wrote:
> I’ve written a PEP proposing the addition of a new built-in function called 
> debug().  Adding this to your code would invoke a debugger through the hook 
> function sys.debughook().

The 'import pdb; pdb.set_trace()' dance is *extremely* obscure, so
replacing it with something more friendly seems like a great idea.

Maybe breakpoint() would be a better description of what set_trace()
actually does? This would also avoid confusion with IPython's very
useful debug magic:
https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-debug
and which might also be worth stealing for the builtin REPL.
(Personally I use it way more often than set_trace().)

Example:

In [1]: def f():
   ...:     x = 1
   ...:     raise RuntimeError
   ...:

In [2]: f()
---
RuntimeError  Traceback (most recent call last)
 in ()
> 1 f()

 in f()
  1 def f():
  2 x = 1
> 3 raise RuntimeError

RuntimeError:

In [3]: debug
> (3)f()
  1 def f():
  2 x = 1
> 3 raise RuntimeError

ipdb> p x
1

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 549: Instance Properties (aka: module properties)

2017-09-05 Thread Nathaniel Smith
On Tue, Sep 5, 2017 at 3:03 PM, Larry Hastings  wrote:
>
> I've written a PEP proposing a language change:
>
> https://www.python.org/dev/peps/pep-0549/
>
> The TL;DR summary: add support for property objects to modules.  I've
> already posted a prototype.

Interesting idea! It's definitely less arcane than the __class__
assignment support that was added in 3.5. I guess the question is
whether to add another language feature here, or to provide better
documentation/helpers for the existing feature.

If anyone's curious what the __class__ trick looks like in practice,
here's some simple deprecation machinery:
  
https://github.com/njsmith/trio/blob/ee8d909e34a2b28d55b5c6137707e8861eee3234/trio/_deprecate.py#L102-L138

And here's what it looks like in use:
  
https://github.com/njsmith/trio/blob/ee8d909e34a2b28d55b5c6137707e8861eee3234/trio/__init__.py#L91-L115

Advantages of PEP 549:
- easier to explain and use

Advantages of the __class__ trick:
- faster (no need to add an extra step to the normal attribute lookup
fast path); only those who need the feature pay for it

- exposes the full power of Python's class model. Notice that the
above code overrides __getattr__ but not __dir__, so the attributes
are accessible via direct lookup but not listed in dir(mod). This is
on purpose, for two reasons: (a) tab completion shouldn't be
suggesting deprecated attributes, (b) when I did expose them in
__dir__, I had trouble with test runners that iterated through
dir(mod) looking for tests, and ended up spewing tons of spurious
deprecation warnings. (This is especially bad when you have a policy
of running your tests with DeprecationWarnings converted to errors.) I
don't think there's any way to do this with PEP 549.

- already supported in CPython 3.5+ and PyPy3, and with a bit of care
can be faked on all older CPython releases (including, crucially,
CPython 2). PEP 549 OTOH AFAICT will only be usable for packages that
have 3.7 as their minimum supported version.

I don't imagine that I would actually use PEP 549 any time in the
foreseeable future, due to the inability to override __dir__ and the
minimum version requirement. If you only need to support CPython 3.5+
and PyPy3 5.9+, then you can effectively get PEP 549's functionality
at the cost of 3 lines of code and a block indent:

import sys, types
class _MyModuleType(types.ModuleType):
    @property
    def ...

    @property
    def ...
sys.modules[__name__].__class__ = _MyModuleType

It's definitely true though that they're not the most obvious lines of code :-)

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] To reduce Python "application" startup time

2017-09-05 Thread Nathaniel Smith
On Tue, Sep 5, 2017 at 11:13 AM, Jelle Zijlstra
 wrote:
>
>
> 2017-09-05 6:02 GMT-07:00 INADA Naoki :
>> With this profile, I tried optimize `python -c 'import asyncio'`, logging
>> and http.client.
>>
>>
>> https://gist.github.com/methane/1ab97181e74a33592314c7619bf34233#file-0-optimize-import-patch
>>
> This patch moves a few imports inside functions. I wonder whether that kind
> of change actually helps with real applications—doesn't any real application
> end up importing the socket module anyway at some point?

I don't know if this particular change is worthwhile, but one place
where startup slowness is particularly noticed is with commands like
'foo.py --help' or 'foo.py --generate-completions' (the latter called
implicitly by hitting <TAB> in some shell), which typically do lots of
imports that end up not being used.
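
I.e., the pattern where the heavy imports only happen on the code paths
that actually need them (module and option names invented for
illustration):

    import argparse

    def main(argv=None):
        parser = argparse.ArgumentParser()
        parser.add_argument("--fetch", action="store_true")
        args = parser.parse_args(argv)
        if args.fetch:
            # Only pay for the network stack when it's actually used;
            # 'foo.py --help' exits inside parse_args() and never gets here.
            import http.client
            print(http.client.HTTPConnection)

    if __name__ == "__main__":
        main()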

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 550 v4: coroutine policy

2017-08-29 Thread Nathaniel Smith
On Tue, Aug 29, 2017 at 12:59 PM, Yury Selivanov
 wrote:
> b2 = wait_for(bar())
> # bar() was wrapped into a Task and is being running right now
> await b2

Ah not quite. wait_for is itself implemented as a coroutine, so it
doesn't spawn off bar() into its own task until you await b2.

Though according to the docs you should pretend that you don't know
whether wait_for returns a coroutine or a Future, so what you said
would also be a conforming implementation.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 550 v4: coroutine policy

2017-08-29 Thread Nathaniel Smith
On Tue, Aug 29, 2017 at 12:32 PM, Antoine Pitrou  wrote:
>
>
> On 29/08/2017 at 21:18, Yury Selivanov wrote:
>> On Tue, Aug 29, 2017 at 2:40 PM, Antoine Pitrou  wrote:
>>> On Mon, 28 Aug 2017 17:24:29 -0400
>>> Yury Selivanov  wrote:
 Long story short, I think we need to rollback our last decision to
 prohibit context propagation up the call stack in coroutines.  In PEP
 550 v3 and earlier, the following snippet would work just fine:

var = new_context_var()

async def bar():
    var.set(42)

async def foo():
    await bar()
    assert var.get() == 42   # with previous PEP 550 semantics

run_until_complete(foo())

 But it would break if a user wrapped "await bar()" with "wait_for()":

var = new_context_var()

async def bar():
    var.set(42)

async def foo():
    await wait_for(bar(), 1)
    assert var.get() == 42  # AssertionError !!!

run_until_complete(foo())

>>> [...]
>>
>>> Why wouldn't the bar() coroutine inherit
>>> the LC at the point it's instantiated (i.e. where the synchronous bar()
>>> call is done)?
>>
>> We want tasks to have their own isolated contexts.  When a task
>> is started, it runs its code in parallel with its "parent" task.
>
> I'm sorry, but I don't understand what it all means.
>
> To pose the question differently: why is example #1 supposed to be
> different, philosophically, than example #2?  Both spawn a coroutine,
> both wait for its execution to end.  There is no reason that adding a
> wait_for() intermediary (presumably because the user wants to add a
> timeout) would significantly change the execution semantics of bar().
>
>> wait_for() in the above example creates an asyncio.Task implicitly,
>> and that's why we don't see 'var' changed to '42' in foo().
>
> I don't understand why a non-obvious behaviour detail (the fact that
> wait_for() creates an asyncio.Task implicitly) should translate into a
> fundamental difference in observable behaviour.  I find it
> counter-intuitive and error-prone.

For better or worse, asyncio users generally need to be aware of the
distinction between coroutines/Tasks/Futures and which functions
create or return which -- it's essentially the same as the distinction
between running some code in the current thread versus spawning a new
thread to run it (and then possibly waiting for the result).

Mostly the docs tell you when a function converts a coroutine into a
Task, e.g. if you look at the docs for 'ensure_future' or 'wait_for'
or 'wait' they all say this explicitly. Or in some cases like 'gather'
and 'shield', it's implicit because they take arbitrary futures, and
creating a task is how you convert a coroutine into a future.

As a rule of thumb, I think it's accurate to say that any function
that takes a coroutine object as an argument always converts it into a
Task.
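
You can see this directly (Python 3.6 asyncio spelling):

    import asyncio

    async def child():
        # wait_for() passed us through ensure_future(), so this is a
        # different Task object than the one running main().
        print("child runs in:", asyncio.Task.current_task())

    async def main():
        print("main runs in: ", asyncio.Task.current_task())
        await asyncio.wait_for(child(), timeout=1)

    asyncio.get_event_loop().run_until_complete(main())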

>> This is a slightly complicated case, but it's addressable with a good
>> documentation and recommended best practices.
>
> It would be better addressed with consistent behaviour that doesn't rely
> on specialist knowledge, though :-/

This is the core of the Curio/Trio critique of asyncio: in asyncio,
operations that implicitly initiate concurrent execution are all over
the API. This is the root cause of asyncio's problems with buffering
and backpressure, it makes it hard to shut down properly (it's hard to
know when everything has finished running), it's related to the
"spooky cancellation at a distance" issue where cancelling one task
can cause another Task to get a cancelled exception, etc. If you use
the recommended "high level" API for streams, then AFAIK it's still
impossible to close your streams properly at shutdown (you can request
that a close happen "sometime soon", but you can't tell when it's
finished).

Obviously asyncio isn't going anywhere, so we should try to
solve/mitigate these issues where we can, but asyncio's API
fundamentally assumes that users will be very aware and careful about
which operations create which kinds of concurrency. So I sort of feel
like, if you can use asyncio at all, then you can handle wait_for
creating a new LC.

-n

[1] 
https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/#bug-3-closing-time

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 550 v4

2017-08-28 Thread Nathaniel Smith
On Mon, Aug 28, 2017 at 3:14 PM, Eric Snow <ericsnowcurren...@gmail.com> wrote:
> On Sat, Aug 26, 2017 at 3:09 PM, Nathaniel Smith <n...@pobox.com> wrote:
>> You might be interested in these notes I wrote to motivate why we need
>> a chain of namespaces, and why simple "async task locals" aren't
>> sufficient:
>>
>> https://github.com/njsmith/pep-550-notes/blob/master/dynamic-scope.ipynb
>
> Thanks, Nathaniel!  That helped me understand the rationale, though
> I'm still unconvinced chained lookup is necessary for the stated goal
> of the PEP.
>
> (The rest of my reply is not specific to Nathaniel.)
>
> tl;dr Please:
>   * make the chained lookup aspect of the proposal more explicit (and
> distinct) in the beginning sections of the PEP (or drop chained
> lookup).
>   * explain why normal frames do not get to take advantage of chained
> lookup (or allow them to).
>
> 
>
> If I understood right, the problem is that we always want context vars
> resolved relative to the current frame and then to the caller's frame
> (and on up the call stack).  For generators, "caller" means the frame
> that resumed the generator.  Since we don't know what frame will
> resume the generator beforehand, we can't simply copy the current LC
> when a generator is created and bind it to the generator's frame.
>
> However, I'm still not convinced that's the semantics we need.  The
> key statement is "and then to the caller's frame (and on up the call
> stack)", i.e. chained lookup.  On the linked page Nathaniel explained
> the position (quite clearly, thank you) using sys.exc_info() as an
> example of async-local state.  I posit that that example isn't
> particularly representative of what we actually need.  Isn't the point
> of the PEP to provide an async-safe alternative to threading.local()?
>
> Any existing code using threading.local() would not expect any kind of
> chained lookup since threads don't have any.  So introducing chained
> lookup in the PEP is unnecessary and consequently not ideal since it
> introduces significant complexity.

There's a lot of Python code out there, and it's hard to know what it
all wants :-). But I don't think we should get hung up on matching
threading.local() -- no-one sits down and says "okay, what my users
want is for me to write some code that uses a thread-local", i.e.,
threading.local() is a mechanism, not an end-goal.

My hypothesis is in most cases, when people reach for
threading.local(), it's because they have some "contextual" variable,
and they want to be able to do things like set it to a value that
affects all and only the code that runs inside a 'with' block. So far
the only way to approximate this in Python has been to use
threading.local(), but chained lookup would work even better.

As evidence for this hypothesis: something like chained lookup is
important for exc_info() [1] and for Trio's cancellation semantics,
and I'm pretty confident that it's what users naturally expect for use
cases like 'with decimal.localcontext(): ...' or 'with
numpy.errstate(...): ...'. And it works fine for cases like Flask's
request-locals that get set once near the top of a callstack and then
treated as read-only by most of the code.
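
To make that concrete, here's the kind of thing I expect people to write,
using the draft new_context_var()/set()/lookup() spelling from the current
PEP text (which may well change, and which sidesteps the open question
about a dedicated save/restore API):

    precision = new_context_var()

    class local_precision:
        def __init__(self, n):
            self.n = n
        def __enter__(self):
            self.saved = precision.lookup()
            precision.set(self.n)
        def __exit__(self, *exc):
            precision.set(self.saved)

Chained lookup is what lets 'with local_precision(10): ...' affect
generators and tasks started inside the block, while the restore in
__exit__ keeps it from leaking out.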

I'm not aware of any alternative to chained lookup that fulfills all
of these use cases -- are you? And I'm not aware of any use cases that
require something more than threading.local() but less than chained
lookup -- are you?

[1] I guess I should say something about including sys.exc_info() as
evidence that chained lookup is useful, given that CPython probably
won't share code between its PEP 550 implementation and its
sys.exc_info() implementation. I'm mostly citing it as evidence that
this is a real kind of need that can arise when writing programs -- if
it happens once, it'll probably happen again. But I can also imagine
that other implementations might want to share code here, and it's
certainly nice if the Python-the-language spec can just say
"exc_info() has semantics 'as if' it were implemented using PEP 550
storage" and leave it at that. Plus it's kind of rude for the
interpreter to claim semantics for itself that it won't let anyone
else implement :-).

> As the PEP is currently written, chained lookup is a key part of the
> proposal, though it does not explicitly express this.  I suppose this
> is where my confusion has been.
>
> At this point I think I understand one rationale for the chained
> lookup functionality; it takes advantage of the cooperative scheduling
> characteristics of generators, et al.  Unlike with threads, a
> programmer can know the context under which a generator will be
> resumed.  Thus it may be useful to the programmer to allow (or expect)
> the resumed generator to

Re: [Python-Dev] Pep 550 and None/masking

2017-08-27 Thread Nathaniel Smith
I believe that the current status is:

- assigning None isn't treated specially – it does mask any underlying
values (which I think is what we want)

- there is currently no way to "unmask"

- but it's generally agreed that there should be a way to do that, at least
in some cases, to handle the save/restore issue I raised. It's just that
Yury & Elvis wanted to deal with restructuring the PEP first before doing
more work on the api details.

-n

On Aug 27, 2017 9:08 AM, "Jim J. Jewett"  wrote:

> Does setting an ImplicitScopeVar to None set the value to None, or just
> remove it?
>
> If it removes it, does that effectively unmask a previously masked value?
>
> If it really sets to None, then is there a way to explicitly unmask
> previously masked values?
>
> Perhaps the initial constructor should require an initial value
>  (defaulting to None) and the docs should give examples both for using a
> sensible default value and for using a special "unset" marker.
>
> -jJ
>
>
>
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/
> njs%40pobox.com
>
>
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 550 v4

2017-08-26 Thread Nathaniel Smith
On Sat, Aug 26, 2017 at 7:58 AM, Elvis Pranskevichus <elpr...@gmail.com> wrote:
> On Saturday, August 26, 2017 2:34:29 AM EDT Nathaniel Smith wrote:
>> On Fri, Aug 25, 2017 at 3:32 PM, Yury Selivanov
> <yselivanov...@gmail.com> wrote:
>> > Coroutines and Asynchronous Tasks
>> > ---------------------------------
>> >
>> > In coroutines, like in generators, context variable changes are local
>> > and are not visible to the caller::
>> > import asyncio
>> >
>> > var = new_context_var()
>> >
>> > async def sub():
>> >     assert var.lookup() == 'main'
>> >     var.set('sub')
>> >     assert var.lookup() == 'sub'
>> >
>> > async def main():
>> >     var.set('main')
>> >     await sub()
>> >     assert var.lookup() == 'main'
>> >
>> > loop = asyncio.get_event_loop()
>> > loop.run_until_complete(main())
>>
>> I think this change is a bad idea. I think that generally, an async
>> call like 'await async_sub()' should have the equivalent semantics
>> to a synchronous call like 'sync_sub()', except for the part where
>> the former is able to contain yields. Giving every coroutine an LC
>> breaks that equivalence. It also makes it so in async code, you
>> can't necessarily refactor by moving code in and out of
>> subroutines. Like, if we inline 'sub' into 'main', that shouldn't
>> change the semantics, but...
>
> If we could easily, we'd given each _normal function_ its own logical
> context as well.

I mean... you could do that. It'd be easy to do technically, right?
But it would make the PEP useless, because then projects like decimal
and numpy couldn't adopt it without breaking backcompat, meaning they
couldn't adopt it at all.

The backcompat argument isn't there in the same way for async code,
because it's new and these functions have generally been broken there
anyway. But it's still kinda there in spirit: there's a huge amount of
collective knowledge about how (synchronous) Python code works, and
IMO async code should match that whenever possible.

> What we are talking about here is variable scope leaking up the call
> stack.  I think this is a bad pattern.  For decimal context-like uses
> of the EC you should always use a context manager.  For uses like Web
> request locals, you always have a top function that sets the context
> vars.

It's perfectly reasonable to have a script where you call
decimal.setcontext or np.seterr somewhere at the top to set the
defaults for the rest of the script. Yeah, maybe it'd be a bit cleaner
to use a 'with' block wrapped around main(), and certainly in a
complex app you want to stick to that, but Python isn't just used for
complex apps :-). I foresee confused users trying to figure out why
np.seterr suddenly stopped working when they ported their app to use
async.

This also seems like it makes some cases much trickier. Like, say you
have an async context manager that wants to manipulate a context
local. If you write 'async def __aenter__', you just lost -- it'll be
isolated. I think you have to write some awkward thing like:

def __aenter__(self):
    coro = self._real_aenter()
    coro.__logical_context__ = None
    return coro

It would be really nice if libraries like urllib3/requests supported
async as an option, but it's difficult because they can't drop support
for synchronous operation and python 2, and we want to keep a single
codebase. One option I've been exploring is to write them in
"synchronous style" but with async/await keywords added, and then
generating a py2-compatible version with a script that strips out
async/await etc. (Like a really simple 3to2 that just works at the
token level.) One transformation you'd want to apply is replacing
__aenter__ -> __enter__, but this gets much more difficult if we have
to worry about elaborate transformations like the above...

If I have an async generator, and I set its __logical_context__ to
None, then do I also have to set this attribute on every coroutine
returned from calling __anext__/asend/athrow/aclose?

>>
>> I think I see the motivation: you want to make
>>
>>await sub()
>>
>> and
>>
>>await ensure_future(sub())
>>
>> have the same semantics, right? And the latter has to create a Task
>
> What we want is for `await sub()` to be equivalent to
> `await asyncio.wait_for(sub())` and to `await asyncio.gather(sub())`.

I don't feel like there's any need to make gather() have exactly the
same semantics as a regular call -- it's pretty clearly a
task-spawning primitive that runs all of the given coroutines
concurrently, so it makes sense that it would have 

Re: [Python-Dev] PEP 550 v4

2017-08-26 Thread Nathaniel Smith
On Sat, Aug 26, 2017 at 10:25 AM, Eric Snow  wrote:
> With threads we have a directed graph of execution, rooted at the root
> thread, branching with each new thread and merging with each .join().
> Each thread gets its own copy of each threading.local, regardless of
> the relationship between branches (threads) in the execution graph.
>
> With async (and generators) we also have a directed graph of
> execution, rooted in the calling thread, branching with each new async
> call.  Currently there is no equivalent to threading.local for the
> async execution graph.  This proposal involves adding such an
> equivalent.
>
> However, the proposed solution isn''t quite equivalent, right?  It
> adds a concept of lookup on the chain of namespaces, traversing up the
> execution graph back to the root.  threading.local does not do this.
> Furthermore, you can have more than one threading.local per thread.
> From what I read in the PEP, each node in the execution graph has (at
> most) one Execution Context.
>
> The PEP doesn't really say much about these differences from
> threadlocals, including a rationale.  FWIW, I think such a COW
> mechanism could be useful.  However, it does add complexity to the
> feature.  So a clear explanation in the PEP of why it's worth it would
> be valuable.

You might be interested in these notes I wrote to motivate why we need
a chain of namespaces, and why simple "async task locals" aren't
sufficient:

https://github.com/njsmith/pep-550-notes/blob/master/dynamic-scope.ipynb

They might be a bit verbose to include directly in the PEP, but
Yury/Elvis, feel free to steal whatever if you think it'd be useful.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 550 v4

2017-08-26 Thread Nathaniel Smith
On Fri, Aug 25, 2017 at 3:32 PM, Yury Selivanov  wrote:
> Coroutines and Asynchronous Tasks
> ---------------------------------
>
> In coroutines, like in generators, context variable changes are local
> and are not visible to the caller::
>
> import asyncio
>
> var = new_context_var()
>
> async def sub():
>     assert var.lookup() == 'main'
>     var.set('sub')
>     assert var.lookup() == 'sub'
>
> async def main():
>     var.set('main')
>     await sub()
>     assert var.lookup() == 'main'
>
> loop = asyncio.get_event_loop()
> loop.run_until_complete(main())

I think this change is a bad idea. I think that generally, an async
call like 'await async_sub()' should have the equivalent semantics to
a synchronous call like 'sync_sub()', except for the part where the
former is able to contain yields. Giving every coroutine an LC breaks
that equivalence. It also makes it so in async code, you can't
necessarily refactor by moving code in and out of subroutines. Like,
if we inline 'sub' into 'main', that shouldn't change the semantics,
but...

async def main():
    var.set('main')
    # inlined copy of sub()
    assert var.lookup() == 'main'
    var.set('sub')
    assert var.lookup() == 'sub'
    # end of inlined copy
    assert var.lookup() == 'main'   # fails

It also adds non-trivial overhead, because now lookup() is O(depth of
async callstack), instead of O(depth of (async) generator nesting),
which is generally much smaller.

I think I see the motivation: you want to make

   await sub()

and

   await ensure_future(sub())

have the same semantics, right? And the latter has to create a Task
and split it off into a new execution context, so you want the former
to do so as well? But to me this is like saying that we want

   sync_sub()

and

   thread_pool_executor.submit(sync_sub).result()

to have the same semantics: they mostly do, but if sync_sub() accesses
thread-locals then they won't. Oh well. That's perhaps a bit
unfortunate, but it doesn't mean we should give every synchronous
frame its own thread-locals.

(And fwiw I'm still not convinced we should give up on 'yield from' as
a mechanism for refactoring generators.)

> To establish the full semantics of execution context in couroutines,
> we must also consider *tasks*.  A task is the abstraction used by
> *asyncio*, and other similar libraries, to manage the concurrent
> execution of coroutines.  In the example above, a task is created
> implicitly by the ``run_until_complete()`` function.
> ``asyncio.wait_for()`` is another example of implicit task creation::
>
> async def sub():
>     await asyncio.sleep(1)
>     assert var.lookup() == 'main'
>
> async def main():
>     var.set('main')
>
>     # waiting for sub() directly
>     await sub()
>
>     # waiting for sub() with a timeout
>     await asyncio.wait_for(sub(), timeout=2)
>
>     var.set('main changed')
>
> Intuitively, we expect the assertion in ``sub()`` to hold true in both
> invocations, even though the ``wait_for()`` implementation actually
> spawns a task, which runs ``sub()`` concurrently with ``main()``.

I found this example confusing -- you talk about sub() and main()
running concurrently, but ``wait_for`` blocks main() until sub() has
finished running, right? Is this just supposed to show that there
should be some sort of inheritance across tasks, and then the next
example is to show that it has to be a copy rather than sharing the
actual object? (This is just a issue of phrasing/readability.)

> The ``sys.run_with_logical_context()`` function performs the following
> steps:
>
> 1. Push *lc* onto the current execution context stack.
> 2. Run ``func(*args, **kwargs)``.
> 3. Pop *lc* from the execution context stack.
> 4. Return or raise the ``func()`` result.

It occurs to me that both this and the way generators/coroutines expose
their logical context mean that logical context objects are
semantically mutable. This could create weird effects if someone
attaches the same LC to two different generators, or tries to use it
simultaneously in two different threads, etc. We should have a little
interlock like generator's ag_running, where an LC keeps track of
whether it's currently in use and if you try to push the same LC onto
two ECs simultaneously then it errors out.

> For efficient access in performance-sensitive code paths, such as in
> ``numpy`` and ``decimal``, we add a cache to ``ContextVar.get()``,
> making it an O(1) operation when the cache is hit.  The cache key is
> composed from the following:
>
> * The new ``uint64_t PyThreadState->unique_id``, which is a globally
>   unique thread state identifier.  It is computed from the new
>   ``uint64_t PyInterpreterState->ts_counter``, which is incremented
>   whenever a new thread state is created.
>
> * The ``uint64_t ContextVar->version`` counter, which is incremented
>   whenever the context variable value is changed in 

Re: [Python-Dev] PEP 550 v3 naming

2017-08-24 Thread Nathaniel Smith
On Thu, Aug 24, 2017 at 1:22 AM, Nick Coghlan  wrote:
> On 24 August 2017 at 02:19, Yury Selivanov  wrote:
>> I think that "implicit context" is not an accurate description of what
>> LogicalContext is.
>>
>> "implicit context" only makes sense when we talk about decimal
>> context.  For instance, in:
>>
>> Decimal(1) + Decimal(2)
>>
>> decimal context is implicit.  But this is "implicit" from the
>> standpoint of that code.  Decimal will manage its context
>> *explicitly*.
>>
>> Fixing decimal context is only one part of the PEP though.  EC will
>> also allow to implement asynchronous task locals:
>>
>> current_request = new_context_key()
>>
>> async def handle_http_request(request):
>>     current_request.set(request)
>>
>> Here we explicitly set and will explicitly get values from the EC.  We
>> will explicitly manage the EC in asyncio.Task and when we schedule
>> callbacks.
>
> The implicit behaviour that "implicit context" refers to is the fact
> that if you look at an isolated snippet of code, you have no idea what
> context you're actually going to be querying or modifying - that's
> implicit in the execution of the whole program.

How about: RuntimeContext and RuntimeContextStack

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 550 leak-in vs leak-out, why not just a ChainMap

2017-08-24 Thread Nathaniel Smith
On Wed, Aug 23, 2017 at 9:32 PM, Jim J. Jewett  wrote:
>> While the context is defined conceptually as a nested chain of
>> key:value mappings, we avoid using the mapping syntax because of the
>> way the values can shift dynamically out from under you based on who
>> called you
> ...
>> instead of having the problem of changes inside the
>> generator leaking out, we instead had the problem of
>> changes outside the generator *not* making their way in
>
> I still don't see how this is different from a ChainMap.
>
> If you are using a stack(chain) of [d_global, d_thread, d_A, d_B, d_C,
> d_mine]  maps as your implicit context, then a change to d_thread map
> (that some other code could make) will be visible unless it is masked.
>
> Similarly, if the fallback for d_C changes from d_B to d_B1 (which
> points directly to d_thread), that will be visible for any keys that
> were previously resolved in d_A or d_B, or are now resolved in dB1.
>
> Those seem like exactly the cases that would (and should) cause
> "shifting values".
>
> This does mean that you can't cache everything in the localmost map,
> but that is a problem with the optimization regardless of how the
> implementation is done.

It's crucial that the only thing that can effect the result of calling
ContextKey.get() is other method calls on that same ContextKey within
the same thread. That's what enables ContextKey to efficiently cache
repeated lookups, which is an important optimization for code like
decimal or numpy that needs to access their local context *extremely*
quickly (like, a dict lookup is too slow). In fact this caching trick
is just preserving what decimal does right now in their thread-local
handling (because accessing the threadstate dict is too slow for
them).
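
Very roughly, the trick looks like this (a pure-Python sketch of the
caching idea only; the real storage is the per-thread execution context
chain, not a global dict):

    import threading

    _storage = {}   # stand-in for "walk the execution context chain"

    class ContextKey:
        def __init__(self):
            self._version = 0
            self._cache = (None, None, None)   # (thread id, version, value)

        def set(self, value):
            _storage[self] = value
            self._version += 1                 # invalidates every cached get()

        def get(self):
            tid = threading.get_ident()
            cached_tid, cached_version, cached_value = self._cache
            if tid == cached_tid and cached_version == self._version:
                return cached_value            # fast path: no dict lookup at all
            value = _storage.get(self)         # slow path: real EC walk goes here
            self._cache = (tid, self._version, value)
            return value

The invariant above -- nothing but this key's own set() can change what
get() should return -- is exactly what makes the fast path safe.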

So we can't expose any kind of mutable API to individual maps, because
then someone might call __setitem__ on some map that's lower down in
the stack, and break caching.

> And, of course, using a ChainMap means that the keys do NOT have to be
> predefined ... so the Key class really can be skipped.

The motivations for the Key class are to eliminate the need to worry
about accidental key collisions between unrelated libraries, to
provide some important optimizations (as per above), and to make it
easy and natural to provide convenient APIs like for saving and
restoring the state of a value inside a context manager. Those are all
orthogonal to whether the underlying structure is implemented as a
ChainMap or as something more specialized.

But I tend to agree with your general argument that the current PEP is
trying a bit too hard to hide away all this structure where no-one can
see it. The above constraints mean that simply exposing a ChainMap as
the public API is probably a bad idea. Even if there are compelling
performance advantages to fancy immutable-HAMT implementation (I'm in
wait-and-see mode on this myself), then there are still a lot more
introspection-y operations that could be provided, like:

- make a LocalContext out of a {ContextKey: value} dict
- get a {ContextKey: value} dict out of a LocalContext
- get the underlying list of LocalContexts out of an ExecutionContext
- create an ExecutionContext out of a list of LocalContexts
- given a ContextKey and a LocalContext, get the current value of the
key in the context
- given a ContextKey and an ExecutionContext, get out a list of values
at each level

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] PEP 550 v3 naming

2017-08-22 Thread Nathaniel Smith
On Tue, Aug 22, 2017 at 8:51 PM, Guido van Rossum <gu...@python.org> wrote:
> On Tue, Aug 22, 2017 at 7:12 PM, Nathaniel Smith <n...@pobox.com> wrote:
>>
>> We could do worse than just plain Context and ContextStack, for that
>> matter.
>
>
> I worry that that's going to lead more people astray thinking this has
> something to do with contextlib, which it really doesn't (it's much more
> closely related to threading.local()).

I guess that could be an argument for LocalContext and
LocalContextStack, to go back full circle...

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] PEP 550 v3 naming

2017-08-22 Thread Nathaniel Smith
On Tue, Aug 22, 2017 at 8:22 AM, Guido van Rossum  wrote:
> As I understand the key APIs and constraints of the proposal better, I'm
> leaning towards FooContext (LC) and FooContextStack (EC), for some value of
> Foo that I haven't determined yet. Perhaps the latter can be shortened to
> just ContextStack (since the Foo part can probably be guessed from context.
> :-)

I guess I'll put in another vote for DynamicContext and
DynamicContextStack. Though my earlier suggestion of these [1] sank
like a stone, either because no-one saw it or because everyone hated
it :-).

We could do worse than just plain Context and ContextStack, for that matter.

-n

[1] https://mail.python.org/pipermail/python-ideas/2017-August/046840.html

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Interrupt thread.join() with Ctrl-C / KeyboardInterrupt on Windows

2017-08-08 Thread Nathaniel Smith
On Tue, Aug 8, 2017 at 2:29 PM, Steve Dower <steve.do...@python.org> wrote:
> On 08Aug2017 1151, Nathaniel Smith wrote:
>>
>> It looks like Thread.join ultimately ends up blocking in
>> Python/thread_nt.h:EnterNonRecursiveMutex, which has a maze of #ifdefs
>> behind it -- I think there are 3 different implementations you might
>> end up with, depending on how CPython was built? Two of them seem to
>> ultimately block in WaitForSingleObject, which would be easy to adapt
>> to handle control-C. Unfortunately I think the implementation that
>> actually gets used on modern systems is the one that blocks in
>> SleepConditionVariableSRW, and I don't see any easy way for a
>> control-C to interrupt that. But maybe I'm missing something -- I'm
>> not a Windows expert.
>
>
> I'd have to dig back through the recent attempts at changing this, but I
> believe the SleepConditionVariableSRW path is unused for all versions of
> Windows.
>
> A couple of people (including myself) attempted to enable that code path,
> but it has some subtle issues that were causing test failures, so we
> abandoned all the attempts. Though ISTR that someone put in more effort than
> most of us, but I don't think we've merged it (and if we have, it'd only be
> in 3.7 at this stage).

Ah, you're right -- the comments say it's used on Windows 7 and later,
but the code disagrees. Silly me for trusting the comments :-). So it
looks like it would actually be fairly straightforward to add
control-C interruption support.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Interrupt thread.join() with Ctrl-C / KeyboardInterrupt on Windows

2017-08-08 Thread Nathaniel Smith
On Tue, Aug 8, 2017 at 2:54 AM, Jonathan Slenders  wrote:
> Hi all,
>
> Is it possible that thread.join() cannot be interrupted on Windows, while it
> can be on Linux?
> Would this be a bug, or is it by design?
>
>
> import threading, time
> def wait():
>     time.sleep(1000)
> t = threading.Thread(target=wait)
> t.start()
> t.join()  # Press Control-C now. It stops on Linux, while it hangs on Windows.

This comes down to a difference in how the Linux and Windows low-level
APIs handle control-C and blocking functions: on Linux, the default is
that any low-level blocking function can be interrupted by a control-C
or any other random signal, and it's the calling code's job to check
for this and restart it if necessary. This is annoying because it
means that every low-level function call inside the Python interpreter
has to have a bunch of boilerplate to detect this and retry, but OTOH
it means that control-C automatically works in (almost) all cases. On
Windows, they made the opposite decision: low-level blocking functions
are never automatically interrupted by control-C. It's a reasonable
design choice. The advantage is that sloppily written programs tend to
work better -- on Linux you kind of *have* to put a retry loop around
*every* low level call or your program will suffer weird random bugs,
and on Windows you don't.

But for carefully written programs like CPython this is actually
pretty annoying, because if you *do* want to wake up on a control-C,
then on Windows that has to be laboriously implemented on a
case-by-case basis for each blocking function, and often this requires
some kind of cleverness or is actually impossible, depending on what
function you want to interrupt. At least on Linux the retry loop is
always the same.

The end result is that on Windows, control-C almost never works to
wake up a blocked Python process, with a few special exceptions where
someone did the work to implement this. On Python 2 the only functions
that have this implemented are time.sleep() and
multiprocessing.Semaphore.acquire; on Python 3 there are a few more
(you can grep the source for _PyOS_SigintEvent to find them), but
Thread.join isn't one of them.
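(A practical workaround in the meantime is to poll instead of blocking
forever, so the interpreter regains control often enough to deliver the
KeyboardInterrupt:

    while t.is_alive():
        t.join(timeout=0.2)   # wakes up every 0.2 s; a pending Ctrl-C is
                              # raised as KeyboardInterrupt between calls

That doesn't fix the underlying issue, of course, it just sidesteps it.)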

It looks like Thread.join ultimately ends up blocking in
Python/thread_nt.h:EnterNonRecursiveMutex, which has a maze of #ifdefs
behind it -- I think there are 3 different implementations you might
end up with, depending on how CPython was built? Two of them seem to
ultimately block in WaitForSingleObject, which would be easy to adapt
to handle control-C. Unfortunately I think the implementation that
actually gets used on modern systems is the one that blocks in
SleepConditionVariableSRW, and I don't see any easy way for a
control-C to interrupt that. But maybe I'm missing something -- I'm
not a Windows expert.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] startup time repeated? why not daemon

2017-07-20 Thread Nathaniel Smith
On Jul 20, 2017 14:18, "Eric Snow"  wrote:

On Thu, Jul 20, 2017 at 11:53 AM, Jim J. Jewett 
wrote:
> I agree that startup time is a problem, but I wonder if some of the pain
> could be mitigated by using a persistent process.
>
> [snip]
>
> Is it too hard to create a daemon server?
> Is the communication and context switch slower than a new startup?
> Is the pattern just not well-enough advertised?

A couple years ago I suggested the same idea (i.e. "pythond") during a
conversation with MAL at PyCon UK.  IIRC, security and complexity were
the two major obstacles.  Assuming you use fork, you must ensure that
the daemon gets into just the right state.  Otherwise you're leaking
(potentially sensitive) info into the forked processes or you're
wasting cycles/memory.  Relatedly, at PyCon this year Barry and I were
talking about the idea of bootstrapping the interpreter from a memory
snapshot on disk, rather than from scratch (thus drastically reducing
the number of IO events).  From what I gather, emacs does (or did)
something like this.


There's a fair amount of prior art for both of these. The prestart/daemon
approach is apparently somewhat popular in the Java world, because the jvm
is super slow to start up. E.g.:
https://github.com/ninjudd/drip
The interesting thing there is probably their README's comparison of their
strategy to previous attempts at the same thing. (They've explicitly moved
away from a persistent daemon approach.)

The emacs memory dump approach is *really* challenging. They've been
struggling to move away from it for years. Have you ever wondered why
jemalloc is so much better than the default glibc malloc on Linux?
Apparently it's because for many years it was impossible to improve glibc
malloc's internal memory layout because it would break emacs.

I'm not sure either of these make much sense when python startup is already
in the single digit milliseconds. While it's certainly great if we can
lower that further, my impression is that for any real application, startup
time is overwhelmingly spent importing user packages, not in the
interpreter start up itself. And this is difficult to optimize with a
daemon or memory dump, because you need a full list of modules to preload
and it'll differ between programs.

This suggests that optimizations to finding/loading/executing modules are
likely to give the biggest startup time wins.
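(A quick way to see this for your own application is to time the
imports directly, something along these lines:

    import time
    t0 = time.perf_counter()
    import json, email, http.client   # substitute your app's real imports
    print("imports took", time.perf_counter() - t0, "seconds")

and compare that against the cost of a bare `python -c pass`.)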

-n


Re: [Python-Dev] Program runs in 12s on Python 2.7, but 5s on Python 3.5 -- why so much difference?

2017-07-18 Thread Nathaniel Smith
I'd probably start with a regular C-level profiler, like perf or
callgrind. They're not very useful for comparing two versions of code
written in Python, but here the Python code is the same (modulo
changes in the stdlib), and it's changes in the interpreter's C code
that probably make the difference.

On Tue, Jul 18, 2017 at 9:03 AM, Ben Hoyt  wrote:
> Hi folks,
>
> (Not entirely sure this is the right place for this question, but hopefully
> it's of interest to several folks.)
>
> A few days ago I posted a note in response to Victor Stinner's articles on
> his CPython contributions, noting that I wrote a program that ran in 11.7
> seconds on Python 2.7, but only takes 5.1 seconds on Python 3.5 (on my 2.5
> GHz macOS i7), more than 2x as fast. Obviously this is a Good Thing, but I'm
> curious as to why there's so much difference.
>
> The program is a pentomino puzzle solver, and it works via code generation,
> generating a ton of nested "if" statements, so I believe it's exercising the
> Python bytecode interpreter heavily. Obviously there have been some big
> optimizations to make this happen, but I'm curious what the main
> improvements are that are causing this much difference.
>
> There's a writeup about my program here, with benchmarks at the bottom:
> http://benhoyt.com/writings/python-pentomino/
>
> This is the generated Python code that's being exercised:
> https://github.com/benhoyt/python-pentomino/blob/master/generated_solve.py
>
> For reference, on Python 3.6 it runs in 4.6 seconds (same on Python 3.7
> alpha). This smallish increase from Python 3.5 to Python 3.6 was more
> expected to me due to the bytecode changing to wordcode in 3.6.
>
> I tried using cProfile on both Python versions, but that didn't say much,
> because the functions being called aren't taking the majority of the time.
> How does one benchmark at a lower level, or otherwise explain what's going
> on here?
>
> Thanks,
> Ben
>



-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Impact of Namedtuple on startup time

2017-07-17 Thread Nathaniel Smith
On Jul 17, 2017 5:28 PM, "Steven D'Aprano"  wrote:

On Mon, Jul 17, 2017 at 09:31:20PM +, Brett Cannon wrote:

> As for removing exec() as a goal, I'll back up Christian's point and the
> one Steve made at the language summit that removing the use of exec() from
> the critical path in Python is a laudable goal from a security
perspective.

I'm sorry, I don't understand this point. What do you mean by "critical
path"?

Is the intention to remove exec from builtins? From the entire language?
If not, how does its use in namedtuple introduce a security problem?


I think the intention is to allow users with a certain kind of security
requirement to opt in to a restricted version of the language that doesn't
support exec. This is difficult if the stdlib is calling exec all over the
place. But nobody is suggesting to change the language in regular usage,
just provide another option.

-n


Re: [Python-Dev] Helping contributors with chores (do we have to?)

2017-06-25 Thread Nathaniel Smith
On Jun 25, 2017 10:27, "Larry Hastings" <la...@hastings.org> wrote:



On 06/25/2017 10:02 AM, Nathaniel Smith wrote:

My dudes, in a previous life I helped invent distributed VCS, but I still
get confused by fiddly git BS just like everyone else.


Really?  I thought Bitkeeper was out before the monotone project even
started--and TeamWare had monotone beat by most of a decade.  Not to
downplay your achievements, but I don't think you "helped invent DVCS".  Or
did you work on Bitkeeper / TeamWare?


If your response to the previous message is to send an email to all of
python-dev nitpicking credentials, then I think you missed the point...

(There are many projects that attempted to somehow combine the terms
"distributed" and "VCS", but nowadays the overwhelmingly dominant use of
that term is to refer to system designs in the monotone/git/hg lineage.
Perhaps I could have phrased it better; feel free to pretend I wrote a
several paragraph history tracing which systems influenced which instead. I
don't think any of this affects the actual point, which is that you can be
arbitrarily familiar with how a software system works and still be lost and
confused on a regular basis, so people should pay attention and try to
avoid making comments that imply this indicates some personal deficit in
the asker.)

((Maybe I should clarify that the reason I'm being cranky about this isn't
really on my or Antoine's behalf, but on behalf of anyone lurking and
thinking "oh, python-dev says I suck, I guess I'm not cut out to be a
python contributor".))

(((Though also I *would* genuinely appreciate it if anyone could explain
how to make GitHub do the thing I couldn't figure out. Probably it's
something silly and obvious...)))

-n


Re: [Python-Dev] Helping contributors with chores (do we have to?)

2017-06-25 Thread Nathaniel Smith
On Jun 25, 2017 08:12, "Jakub Wilk"  wrote:

* Paul Sokolovsky , 2017-06-25, 11:47:

A GitHub PR is just a git branch (in somebody else's repository, but also
> in the repository it's submitted to). So, like any git branch, you can
> fetch it, re-branch to your local branch, apply any changes to it, rebase,
> push anywhere.
>

Right, this is documented here:
https://help.github.com/articles/checking-out-pull-requests-locally/


There're also various tools for dealing specifically with git branch layout
> as used by Github, and every real man writes their own
>

I have this in my gitconfig:

[alias]
    hub-pr = ! "_g() { set -e -u; git fetch origin \"pull/$1/head:gh-$1\" && git checkout \"gh-$1\"; }; _g"

If I want to checkout PR#42, I do:

$ git hub-pr 42


I believe you and Paul are missing the specific problem that Antoine was
talking about, which is: how can we easily make changes *to someone else's
PR*, i.e. these changes should show up in the diff view if you go to the
PR's web page. This requires not just getting a copy of the PR branch
locally, but also pushing it back to the original submitter's branch on
GitHub.

Allegedly this is possible in most cases (there's a permissions toggle the
submitter can set, but it's set by default):


https://help.github.com/articles/committing-changes-to-a-pull-request-branch-created-from-a-fork/

However, like Antoine, when I've tried to do this then all I've managed is
to get obscure errors from GitHub. Did I have the wrong incantations? Was
the permissions toggle set wrong? (I thought the web ui said it wasn't, but
maybe I misunderstood.) It's a mystery. Has anyone figured out how to make
*this* work reliably or ergonomically?

Also, as a general comment, not directed at Jakub: the posturing about how
easy git is reminds me of the posturing about how much better language X is
than others described here: http://blog.aurynn.com/contempt-culture-2. My
dudes, in a previous life I helped invent distributed VCS, but I still get
confused by fiddly git BS just like everyone else. I know you probably
aren't meaning to go around telling people that they're not Real
Programmers because they get confused like me, but you are and it's not
kind; please stop.

-n


Re: [Python-Dev] Handle errors in cleanup code

2017-06-13 Thread Nathaniel Smith
On Tue, Jun 13, 2017 at 12:10 AM, Nick Coghlan  wrote:

>
> reporting failures from concurrent.futures.wait:
> https://pythonhosted.org/futures/#concurrent.futures.wait


Yeah, and asyncio.gather is another example in the stdlib. Or there's
twisted's DeferredList. Trio is unusual in effectively forcing all tasks to
be run under a gather(), but the basic concept isn't unique at all.


> Figuring out how to *display* an exception tree coherently is going to
> be a pain (it's already problematic with just the linked list), but if
> we can at least model exception trees consistently, then we'd be able
> to share that display logic, even if the scenarios resulting in
> MultiErrors varied.
>

It's true, it was a pain :-). And I'm sure it can be refined. But I think
this is a reasonable first cut:
https://github.com/python-trio/trio/blob/9e0df6159e55fe5e389ae5e24f9bbe51e9b77943/trio/_core/_multierror.py#L341-L388

Basically the algorithm there is:

if there's a __cause__ or __context__:
  print it (recursively using this algorithm)
  print the description line ("this exception was the direct cause" or
    whatever is appropriate)
print the traceback and str() attached to this exception itself
for each embedded exception:
  print "Details of embedded exception {i}:"
  with extra indentation:
    print the embedded exception (recursively using this algorithm)
(+ some memoization to avoid infinite recursion on loopy structures)
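
If it helps to see that in runnable form, here's a heavily simplified
toy version -- not trio's actual code, and it only prints the exception
lines rather than full tracebacks:

    import traceback

    def format_tree(exc, indent="", seen=None):
        seen = set() if seen is None else seen
        if id(exc) in seen:                    # guard against loops
            return indent + "<exception already printed>\n"
        seen.add(id(exc))
        out = ""
        chained = exc.__cause__ or exc.__context__
        if chained is not None:
            out += format_tree(chained, indent, seen)
            out += indent + "-- the above led to the following --\n"
        for line in traceback.format_exception_only(type(exc), exc):
            out += indent + line
        for i, child in enumerate(getattr(exc, "exceptions", ()), 1):
            out += indent + "Details of embedded exception {}:\n".format(i)
            out += format_tree(child, indent + "  ", seen)
        return out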

Of course really complicated trainwrecks that send exception shrapnel
flying everywhere can still be complicated to read, but ... that's why we
make the big bucks, I guess. (My fingers initially typoed that as "that's
why we make the big bugs". Just throwing that out there.)

Currently that code has a hard-coded assumption that the only kind of
exception-container is MultiError, but I guess it wouldn't be too hard to
extend that into some kind of protocol that your asymmetric CleanupError
could also participate in? Like: an abstract exception container has 0 or 1
(predecessor exception, description) pairs [i.e., __cause__ or
__context__], plus 0 or more (embedded exception, description) pairs, and
then the printing code just walks the resulting tree using the above
algorithm?
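
To make the shape concrete, the protocol could be as small as something
like this (names entirely made up):

    class ExceptionContainer:
        # Hypothetical duck-typed protocol, just to illustrate the shape.
        def predecessor(self):
            "Return (exc, description) or None, i.e. __cause__/__context__."
        def embedded(self):
            "Return a list of (exc, description) pairs."

i.e. just enough structure for a generic printer to walk the tree.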

If this all works out at the library level, then one can start to imagine
ways that the interpreter could potentially get benefits from participating:

- right now, if an exception that already has a __cause__ or __context__ is
re-raised, then implicit exception chaining will overwrite the old
__cause__ or __context__. Instead, it could check to see if there's already
a non-None value there, and if so do exc.__context__ =
MultiError([exc.__context__, new_exc]).

- standardizing the protocol for walking over an exception container would
let us avoid fighting over who owns sys.excepthook, and make it easier for
third-party exception printers (like IPython, pytest, etc.) to play along.

- MultiError.catch is a bit awkward. If we were going all-in on this then
one can imagine some construct like

multitry:
    ... do stuff ...
    raise MultiError([ValueError(1),
                      MultiError([TypeError(), ValueError(2)])])
multiexcept ValueError as exc:
    print("caught a ValueError", exc)
multiexcept TypeError:
    print("caught a TypeError and raising RuntimeError")
    raise RuntimeError

which prints

  caught a ValueError 1
  caught a TypeError and raising RuntimeError
  caught a ValueError 2

and then raises a RuntimeError with its __context__ pointing to the
TypeError.

...but of course that's pretty speculative at this point; definitely
more python-ideas territory.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org 


Re: [Python-Dev] Handle errors in cleanup code

2017-06-12 Thread Nathaniel Smith
On Mon, Jun 12, 2017 at 1:07 AM, Nick Coghlan  wrote:
> Aye, agreed. The key challenge we have is that we're trying to
> represent the exception state as a linked list, when what we really
> have once we start taking cleanup errors into account is an exception
> *tree*.
[...]
> P.S. trio's MultiError is also likely worth looking into in this context

Huh, yeah, this is some interesting convergent evolution.
trio.MultiError is exactly a way of representing a tree of exceptions,
though it's designed to do that for "live" exceptions rather than just
context chaining.

Briefly... the motivation here is that if you have multiple concurrent
call-stacks (threads/tasks/whatever-your-abstraction-is-called)
running at the same time, then it means you can literally have
multiple uncaught exceptions propagating out at the same time, so you
need some strategy for dealing with that. One popular option is to
avoid the problem by throwing away exceptions that propagate "too far"
(e.g., in the stdlib threading module, if an exception hits the top of
the call stack of a non-main thread, then it gets printed to stderr
and then the program continues normally). Trio takes a different
approach: its tasks are arranged in a tree, and if a child task exits
with an exception then that exception gets propagated into the parent
task. This allows us avoid throwing away exceptions, but it means that
we need some way to represent the situation when a parent task has
*multiple* live exceptions propagate into it at the same time. That's
what trio.MultiError is for.

Structurally, MultiError is just an Exception that holds a list of
child exceptions, like

   MultiError([TypeError(), OSError(), KeyboardInterrupt()])

so that they can propagate together. One design decision though is
that if you put a MultiError inside a MultiError, it *isn't*
collapsed, so it's also legal to have something like

    MultiError([
        TypeError(),
        MultiError([OSError(), KeyboardInterrupt()]),
    ])

Semantically, these two MultiErrors are mostly the same; they both
represent a situation where there are 3 unhandled exceptions
propagating together. The reason for keeping the tree structure is
that if the inner MultiError propagated for a while on its own before
meeting the TypeError, then it accumulated some traceback and we need
somewhere to store that information. (This generally happens when the
task tree has multiple levels of nesting.) The other option would be
to make two copies of this part of the traceback and attach one copy
onto each of the two exceptions inside it, but (a) that's potentially
expensive, and (b) if we eventually print the traceback then it's much
more readable if we can say "here's where OSError was raised, and
where KeyboardInterrupt was raised, and here's where they traveled
together" and only print the common frames once.

There's some examples of how this works on pages 38-49 of my language
summit slides here:
https://vorpus.org/~njs/misc/trio-language-summit-2017.pdf
And here's the source for the toy example programs that I used in the
slides, in case anyone wants to play with them:
https://gist.github.com/njsmith/634b596e5765d5ed8b819a4f8e56d306

Then the other piece of the MultiError design is catching them. This
is done with a context manager called MultiError.catch, which "maps"
an exception handler (represented as a callable) over all the
exceptions that propagate through it, and then simplifies the
MultiError tree to collapse empty and singleton nodes. Since the
exceptions inside a MultiError represent independent, potentially
unrelated errors, you definitely don't want to accidentally throw away
that KeyboardInterrupt just because your code knows how to handle the
OSError. Or if you have something like MultiError([OSError(),
TypeError()]) then trio has no idea which of those exceptions might be
the error you know how to catch and handle which might be the error
that indicates some terrible bug that should abort the program, so
neither is allowed to mask the other - you have to decide how to
handle them independently.
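
In code that looks roughly like this (going from memory, so treat the
exact semantics as approximate):

    import trio

    def handler(exc):
        if isinstance(exc, ValueError):
            print("handled", exc)
            return None            # handled: drop this one
        return exc                 # not ours: let it keep propagating

    with trio.MultiError.catch(handler):
        raise trio.MultiError([ValueError("a"), KeyError("b")])

Here the ValueError gets swallowed, and after simplification what
actually propagates out of the with block is just the bare KeyError.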

If anyone wants to dig into it the code is here:
https://github.com/python-trio/trio/blob/master/trio/_core/_multierror.py

Anyway, compared to the __cleanup_errors__ idea:

- Both involve a collection object that holds exceptions, but in the
MultiError version the collection subclasses BaseException. One
consequence is that you can put the exception collection object
directly into __context__ or __cause__ instead of using a new
attribute.

- MultiError allows for a tree structure *within* a single collection
of parallel exceptions. (And then of course on top of that each
individual exception within the collection can have its own chain
attached.) Since the motivation for this is wanting to preserve
traceback structure accumulated while the collection was propagating,
and __cleanup_errors__ is only intended for "dead" exceptions that
don't propagate, this is solving an issue 

Re: [Python-Dev] Handle errors in cleanup code

2017-06-12 Thread Nathaniel Smith
On Mon, Jun 12, 2017 at 6:29 AM, Stefan Ring  wrote:
>
> > Yury in the comment for PR 2108 [1] suggested more complicated code:
> >
> > do_something()
> > try:
> >     do_something_other()
> > except BaseException as ex:
> >     try:
> >         undo_something()
> >     finally:
> >         raise ex
>
> And this is still bad, because it loses the back trace. The way we do it is:
>
> do_something()
> try:
>     do_something_other()
> except BaseException as ex:
>     tb = sys.exc_info()[2]
>     try:
>         undo_something()
>     finally:
>         raise ex, None, tb

Are you testing on python 2? On Python 3 just plain 'raise ex' seems
to give a sensible traceback for me...
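
A minimal py3 demo (toy functions, obviously):

    def do_something_other():
        raise ValueError("original failure")

    def undo_something():
        print("cleanup ran")

    try:
        do_something_other()
    except BaseException as ex:
        try:
            undo_something()
        finally:
            raise ex   # on Python 3 the traceback still includes
                       # the do_something_other() frame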

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] PEP 538 warning at startup: please remove it

2017-06-12 Thread Nathaniel Smith
On Jun 12, 2017 10:50, "Gregory P. Smith"  wrote:


The problem, as with all warnings, is that it isn't the user who has
control over the problem who sees the warning. It is the end user of an
application on a system that sees it.


I don't think I understand the distinction you're making here. This isn't
like DeprecationWarnings where python is trying to tell the application
developer that they need to fix something, and the application user gets
caught in the crossfire. In this case there's nothing the application
developer can do – the problem is that the end user has their system
misconfigured, and probably should fix their .bashrc or crontab or something
before running the application. Or maybe they shouldn't and python should
just do its thing and it's fine. But if anyone's going to do something it
would have to be the end user, right?

(I don't have an opinion on the warning itself; I'd be mildly interested to
discover which of my systems are broken in this fashion but it doesn't
affect me much either way. It does occur to me though that it will probably
make some sysadmins *extremely* cranky: the ones that have crontabs with
broken locales and which are set to email if their script generates any
output – I think these are both still cron defaults – and who after
upgrading to py37 will suddenly find copies of this warning pouring into
their inboxes at a rate of hundreds per hour.)

-n


Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7

2017-06-07 Thread Nathaniel Smith
On Jun 7, 2017 6:29 AM, "Victor Stinner" <victor.stin...@gmail.com> wrote:

2017-06-07 10:56 GMT+02:00 Nathaniel Smith <n...@pobox.com>:
> Another testing challenge is that the stdlib ssl module has no way to
> trigger a renegotiation, and therefore there's no way to write tests
> to check that it properly handles a renegotiation, even though
> renegotiation is by far the trickiest part of the protocol to get
> right. (In particular, renegotiation is the only case where attempting
> to read can give WantWrite and vice-versa.)

Renegotiation was the source of a vulnerability in SSL/TLS protocols,
so maybe it's a good thing that it's not implemented :-)
https://www.rapid7.com/db/vulnerabilities/tls-sess-renegotiation

Renegotiation was removed from the new TLS 1.3 protocol:
https://tlswg.github.io/tls13-spec/
"TLS 1.3 forbids renegotiation"


Oh, sure, renegotiation is awful, no question. The HTTP/2 spec also forbids
it. But it does still get used and python totally implements it :-). If
python is talking to some peer and the peer says "hey, let's renegotiate
now" then it will. There just aren't any tests for what happens next.

Not that this has much to do with MemoryBIOs. Sorry for the tangent.

-n


Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7

2017-06-07 Thread Nathaniel Smith
On Tue, Jun 6, 2017 at 10:49 AM, Jim Baker  wrote:
> With Nick's suggestion of _tls_bootstrap, this has certainly become more
> complicated. I'm still thinking of the ramifications for future Jython 2.7
> support, if Python dev goes down this path. It still seems easier to me to
> avoid exposing the SSLObject/MemoryBIO model to 2.7 and instead concentrate
> on just having native TLS support. Yes, this might require more work than a
> simple backport, but everyone is resource constrained, not just the Requests
> or Jython teams.

Yeah, presumably the preferred way to do PEP 543 on Jython would be to
have a native backend based on javax.net.ssl.SSLEngine. Not sure how
that affects the MemoryBIO discussion – maybe it means that Jython
just doesn't need a MemoryBIO backport in the stdlib ssl module,
because everyone who might use it will find themselves using SSLEngine
instead.

> My concrete suggestion is that any work on PEP 543 will benefit from
> improving the testing currently found in test_ssl as part of its
> implementation. For a variety of reasons, test functions like ssl_io_loop
> (https://github.com/python/cpython/blob/master/Lib/test/test_ssl.py#L1691)
> avoid asserting more than it can properly manage the framing of
> wrapped/unwrapped data, so for example one doesn't want to state explicitly
> when SSL_ERROR_WANT_READ would be called (too much white box at that point).
> On the other hand, the lack of unit testing of, for example, SSLObject.read,
> but instead only doing at the functional test level, means that there's
> nothing explicit testing "will raise an SSLWantReadError if it needs more
> data than the incoming BIO has available" (per the docs), or similar use of
> signaling (SSLEOFError comes to mind).

Another testing challenge is that the stdlib ssl module has no way to
trigger a renegotiation, and therefore there's no way to write tests
to check that it properly handles a renegotiation, even though
renegotiation is by far the trickiest part of the protocol to get
right. (In particular, renegotiation is the only case where attempting
to read can give WantWrite and vice-versa.)

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7

2017-06-06 Thread Nathaniel Smith
On Mon, Jun 5, 2017 at 8:49 PM, Nick Coghlan  wrote:
> The reason this kind of approach is really attractive to
> redistributors from a customer risk management perspective is that
> like gevent's monkeypatching of synchronous networking APIs, it's
> *opt-in at runtime*, so the risk of our accidentally inflicting it on
> a customer that doesn't want it and doesn't need it is almost exactly
> zero - if none of their own code includes the "import _tls_bootstrap;
> _tls_bootstrap.monkeypatch_ssl()" invocation and none of their
> dependencies start enabling it as an implicit side effect of some
> other operation, they'll never even know the enhancement is there.
> Instead, the compatibility risks get concentrated in the applications
> relying on the bootstrapping API, since the monkeypatching process is
> a potentially new source of bugs that don't exist in the more
> conventional execution models.

OK. To unpack that, I think it would mean:

2.7's ssl.py and _ssl.c remain exactly as they are.

We create a new _tls_bootstrap.py and __tls_bootstrap.c, which start
out as ~copies of the current ssl.py and _ssl.c code in master.
(Unfortunately the SSLObject code can't just be separated out from
_ssl.c into a new file, because the implementations of SSLObject and
SSLSocket are heavily intertwined.) Then the copied files are hacked
up to work in this monkeypatching context (so that e.g. instead of
defining the SSLError hierarchy in C code, it gets imported from the
main ssl.py).

So: we end up with two copies of the ssl code in the py27 tree, both
diverged from the copy that in the py3 tree, in different ways.

I know that the ssl maintainers won't be *happy* about this given that
keeping the py2 and py3 code similar is an ongoing issue and was one
of the stated advantages of backporting SSLObject, but I don't know
whether it's a showstopper or not; it'd be good to hear their
perspective.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7

2017-06-05 Thread Nathaniel Smith
On Jun 5, 2017 7:01 AM, "Nick Coghlan"  wrote:

On 5 June 2017 at 21:44, Donald Stufft  wrote:
> Is pip allowed to use the hypothetical _ensurepip_ssl outside of
ensurepip?

Yes, so something like _tls_bootstrap would probably be a better name
for the helper module (assuming we go down the "private API to
bootstrap 3rd party SSL modules" path).


It seems like there's a risk here that we end up with two divergent copies
of ssl.py and _ssl.c inside the python 2 tree, and that this will make it
harder to do future bug fixing and backports. Is this going to cause
problems? Is there any way to mitigate them?

-n


Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7

2017-06-02 Thread Nathaniel Smith
On Fri, Jun 2, 2017 at 1:29 PM, Terry Reedy  wrote:
> On 6/2/2017 12:21 PM, Barry Warsaw wrote:
>>
>> On Jun 03, 2017, at 02:10 AM, Nick Coghlan wrote:
>
>
>>> The benefit of making any backport a private API is that it would mean
>>> we weren't committing to support that API for general use: it would be
>>> supported *solely* for the use case discussed in the PEP (i.e. helping
>>> to advance the development of PEP 543 without breaking pip
>>> bootstrapping in the process).
>>
>>
>> That sounds like a good compromise.  My own major objection was in
>> exposing a
>> new public API in Python 2.7, which would clearly be a new feature.
>
>
> Which would likely be seen by someone as justifying other requests to add to
> 2.7 'just this one more essential new feature' ;-).

Whatever the eventual outcome, I don't think there's any danger
someone will read this thread and think "wow, it's so easy to get new
features into 2.7".

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7

2017-06-02 Thread Nathaniel Smith
On Jun 2, 2017 7:24 AM, "Ben Darnell"  wrote:

The PEP's rationale is now "This PEP will help facilitate the future
adoption
of :pep:`543` across all supported Python versions, which will improve
security
for both Python 2 and Python 3 users."

What exactly are these security improvements? My understanding (which may
well be incorrect) is that the security improvements come in the future
when the PEP 543 APIs are implemented on top of the various platform-native
security libraries. These changes will not be present until some future 3.x
release, and will not be backported to 2.7 (without another PEP, which I
expect would be even more contentious than this one). What then is the
security improvement for Python 2 users?


My understanding is that PEP 543 would be released as a library on pypi, as
well as targeting stdlib inclusion in some future release. The goal of the
MemoryBIO PEP is to allow the PEP 543 package to be pure-python, support
all the major platforms, and straddle py2/py3.


In Tornado, I have not felt any urgency to replace wrap_socket with
MemoryBIO. Is there a security-related reason I should do so sooner rather
than later? (I'd also urge Cory and any other wrap_socket skeptics on the
requests team to reconsider - Tornado's SSLIOStream works well. The
asynchronous use of wrap_socket isn't so subtle and risky with buffering)


I'll leave the discussion of wrap_socket's reliability to others, because I
don't have any experience there, but I do want to point out that that which
primitive you pick has major system design consequences. MemoryBIO is an
abstraction that can implement wrap_socket, but not vice-versa; if you use
wrap_socket as your fundamental primitive then that leaks all over your
abstraction hierarchy.
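
For anyone who hasn't played with it, the MemoryBIO style in the
current 3.x ssl module looks roughly like this -- note that no socket
appears anywhere, which is exactly what makes it composable with
arbitrary transports:

    import ssl

    ctx = ssl.create_default_context()
    incoming = ssl.MemoryBIO()   # bytes received from the network go in here
    outgoing = ssl.MemoryBIO()   # bytes to send to the network come out here
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname="example.com")

    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # TLS wants to talk to the peer: ship outgoing.read() over
        # whatever transport you like, feed the reply into
        # incoming.write(), and repeat until the handshake completes.
        client_hello = outgoing.read()

The same few calls work whether the bytes travel over a socket, a pipe,
an async event loop, or a test harness.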

Twisted has to have a MemoryBIO implementation of their TLS code, because
they want to support TLS over arbitrary transports. Carrying a wrap_socket
implementation as well would mean twice the tricky and security-sensitive
code to maintain, plus breaking their current abstractions to expose
whether any particular transport object is actually just a raw socket.

The problem gets worse when you add PEP 543's pluggable TLS backends. If
you can require a MemoryBIO-like API, then adding a backend becomes a
matter of defining like 4 methods with relatively simple, testable
interfaces, and it automatically works everywhere. Implementing a
wrap_socket interface on top of this is complicated because the socket API
is complicated and full of subtle corners, but it only has to be done once
and libraries like tornado can adapt to the quirks of the one
implementation.

OTOH if you have to also support backends that only have wrap_socket, then
this multiplies the complexity of everything. Now we need a way to
negotiate which APIs each backend supports, and we need to somehow document
all the weird corners of how wrapped sockets are supposed to behave in edge
cases, and when different wrapped sockets inevitably behave differently
then libraries like tornado need to discover those differences and support
all the options. We need a way to explain to users why some backends work
with some libraries but not others. And so on. And we still need to support
MemoryBIOs in as many cases as possible.

-n


Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7

2017-06-01 Thread Nathaniel Smith
On Jun 1, 2017 9:20 AM, "Chris Angelico"  wrote:

On Fri, Jun 2, 2017 at 1:01 AM, Cory Benfield  wrote:
> The answer to that is honestly not clear to me. I chatted with the pip
developers, and they have 90%+ of their users currently on Python 2, but
more than half of those are on 2.7.9 or later. This shows some interest in
upgrading to newer Python 2s. The question, I think, is: do we end up in a
position where a good number of developers are on 2.7.14 or later and only
a very small fraction on 2.7.13 or earlier before the absolute number of
Python 2 devs drops low enough to just drop Python 2?
>
> I don’t have an answer to that question. I have a gut instinct that says
yes, probably, but a lack of certainty. My suspicion is that most of the
core dev community believe the answer to that is “no”.
>

Let's see.

Python 2 users include people on Windows who install it themselves,
and then have no mechanism for automatic updates. They'll probably
stay on whatever 2.7.x they first got, until something forces them to
update. But it also includes people on stable Linux distros, where
they have automatic updates provided by Red Hat or Debian or whomever,
so a change like this WILL propagate - particularly (a) as the window
is three entire years, and (b) if the change is considered important
by the distro managers, which is a smaller group of people to convince
than the users themselves.


I believe that for answering this question about the ssl module, it's
really only Linux users that matter, since pip/requests/everyone else
pushing for this only want to use ssl.MemoryBIO on Linux. Their plan on
Windows/MacOS (IIUC) is to stop using the ssl module entirely in favor of
new ctypes bindings for their respective native TLS libraries.

(And yes, in principle it might be possible to write new ctypes-based
bindings for openssl, but (a) this whole project is already teetering on
the verge of being impossible given the resources available, so adding any
major extra deliverable is likely to sink the whole thing, and (b) compared
to the proprietary libraries, openssl is *much* harder and riskier to wrap
at the ctypes level because it has different/incompatible ABIs depending on
its micro version and the vendor who distributed it. This is why manylinux
packages that need openssl have to ship their own, but pip can't and
shouldn't ship its own openssl for many hopefully obvious reasons.)

-n


Re: [Python-Dev] Backport ssl.MemoryBIO on Python 2.7?

2017-05-23 Thread Nathaniel Smith
On Tue, May 23, 2017 at 8:34 PM, David Wilson  wrote:
> On Tue, May 23, 2017 at 06:13:17PM -0700, Cory Benfield wrote:
>
>> There are discussions around Requests unvendoring its dependencies
>> thanks to the improved nature of pip. This might be a year of pretty
>> big changes for Requests.
>
> In which case, what is to prevent Requests from just depending on
> pyOpenSSL as usual?

I thought that requests couldn't have any hard dependencies on C
extensions, because pip vendors requests, and pip needs to be
pure-python for bootstrapping purposes? Cory would know better than me
though, so perhaps I'm wrong...

> I'm still writing 2.7 code every day and would love to see it live a
> little longer, but accepting every feature request seems the wrong way
> to go - and MemoryBIO is a hard sell as a security enhancement, it's new
> functionality.

IIUC, the security enhancement is indirect but real: on Windows/MacOS,
Python's dependence on openssl is a security liability, and to get
away from this we need Cory's new library that abstracts over
different TLS implementations. But for applications to take advantage
of this, they need to switch to using the new library. And before they
can switch to using the new library, it needs to work everywhere. And
for the new library to work on python 2 on unix, it needs MemoryBIO's
in the stdlib – ideally using an interface that's as-close-as-possible
to what they look like on python 3, so he doesn't have to implement
totally different backends for py2 and py3, because Cory is already a
hero for trying to make this happen and we don't want to waste any
more of his time than necessary. So the end result is that if you have
Python 2 code doing SSL/TLS on Windows/MacOS, and you want proper
trust handling and prompt security updates, then MemoryBIO support is
actually on the critical path for making that happen.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Snap Python for simple distribution across multiple Linux distros

2017-05-23 Thread Nathaniel Smith
Well, it sounds like you have an answer to the question of whether the
cpython developers are interested in making official snap releases, but of
course this doesn't stop you making unofficial snap releases, and simple
standalone python distributions can be pretty handy. I just wanted to point
out that virtualenv support is a very important feature of python builds,
and it may be a challenge to support this within the app-as-container
paradigm, because it requires that the python interpreter be able to locate
itself, copy itself to arbitrary directories, and then run the copy. (Or
maybe it's fine, dunno. And python 3 has some changes that might make it
friendlier than python 2 in this regard.)

-n

On May 16, 2017 8:09 AM, "Martin Wimpress" 
wrote:

> Hi all,
>
> I work at Canonical as part of the engineering team developing Ubuntu
> and Snapcraft [1] and I'm a long time Python fan :-)
>
> We've created snaps, a platform that enables projects to directly
> control delivery of software updates to users. This video of a
> lightning talk by dlang developers at DConf2017 [2] shows how they've
> made good use of snaps to distribute their compiler. They found the
> release channels particularly useful so their users can track a
> specific release.
>
> Is there someone here who'd be interested in doing the same for Python?
>
> [1] https://snapcraft.io/
> [2] https://www.youtube.com/watch?v=M-bDzr4gYUU
> [3] https://snapcraft.io/docs/core/install
> [4] https://build.snapcraft.io/
>
> --
> Regards, Martin.


Re: [Python-Dev] windows installer and python list mention

2017-04-10 Thread Nathaniel Smith
On Mon, Apr 10, 2017 at 10:32 AM, Ethan Furman  wrote:
> Some people find it easier to follow this and other lists via gmane
> (http://news.gmane.org/gmane.comp.python.general), a service which
> offers a newsgroup interface to many online mailing lists.

Also, gmane has been dead for a few months (that link just says "Page
Not Found") and its future is uncertain, so this bit isn't terribly
helpful either...

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] why _PyGen_Finalize(gen) propagates close() to _PyGen_yf() ?

2017-03-30 Thread Nathaniel Smith
On Thu, Mar 30, 2017 at 11:05 AM, Oleg Nesterov  wrote:
> On 03/28, Eric Snow wrote:
>>
>> On Mon, Mar 20, 2017 at 11:30 AM, Oleg Nesterov  wrote:
>> > Hello,
>> >
>> > Let me first clarify, I do not claim this is a bug, I am trying to learn
>> > python and now I'm trying to understand yield-from.
>>
>> Given that you haven't gotten a response here,
>
> and this looks a bit strange. My question is simple, the implementation looks
> clear and straightforward, I am a bit surprised none of cpython devs bothered
> to reply.
>
>> you may want to take
>> this over to the core-mentors...@python.org list.
>
> Well, I'm afraid to contact this closed and not-for-mortals list, not sure
> this very basic question should go there ;) perhaps you are already a member,
> feel free to forward.

core-mentorship is intended as a friendly place for folks who are
starting to study CPython internals. I'm not sure where you got the
impression that it's not-for-mortals but I suspect the people running
it would like to know so they can fix it :-).

In any case the short answer to your original question is that PEP 342
says that generator finalization calls the generator's close() method,
which throws a GeneratorExit into the generator, and PEP 380 says that
as a special case, when a GeneratorExit is thrown into a yield from,
then this is propagated by calling .close() on the yielded-from
iterator (if such a method exists) and then re-raised in the original
generator.
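
A quick way to see that in action:

    def inner():
        try:
            yield 1
        except GeneratorExit:
            print("inner got GeneratorExit")
            raise

    def outer():
        yield from inner()

    g = outer()
    next(g)     # run up to the yield inside inner()
    g.close()   # prints "inner got GeneratorExit"

The same thing happens implicitly when the outer generator is
garbage-collected rather than closed explicitly.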

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] What version is an extension module binary compatible with

2017-03-29 Thread Nathaniel Smith
On Wed, Mar 29, 2017 at 8:22 AM, Paul Moore <p.f.mo...@gmail.com> wrote:
> On 28 March 2017 at 17:31, Nathaniel Smith <n...@pobox.com> wrote:
>> IMO this is a bug, and depending on how many packages are affected it might
>> even call for an emergency 3.6.2. The worst case is that we start getting
>> large numbers of packages uploaded to pypi that claim to be 3.6.0 compatible
>> but that crash with an obscure error when people download them.
>
> Has anyone logged this on bugs.python.org? There's nothing in the
> Fedora bug referenced by the OP that indicates they've done so.

I didn't see one, so: https://bugs.python.org/issue29943

-n

-- 
Nathaniel J. Smith -- https://vorpus.org

