Re: [Async-sig] New blog post: Notes on structured concurrency, or: Go statement considered harmful

2018-04-26 Thread Nathaniel Smith
On Wed, Apr 25, 2018 at 3:17 AM, Antoine Pitrou  wrote:
> On Wed, 25 Apr 2018 02:24:15 -0700
> Nathaniel Smith  wrote:
>> Hi all,
>> I just posted another essay on concurrent API design:
>> This is the one that finally gets at the core reasons why Trio exists;
>> I've been trying to figure out how to write it for at least a year
>> now. I hope you like it.
> My experience is indeed that something like the nursery construct would
> make concurrent programming much more robust in complex cases.
> This is a great explanation why.


> API note: I would expect to be able to use it this way:
> class MyEndpoint:
>     def __init__(self):
>         self._nursery = open_nursery()
>     # Lots of behaviour methods that can put new tasks in the nursery
>     def close(self):
>         self._nursery.close()

You might expect to be able to use it that way, but you can't! The
'async with' part of 'async with open_nursery()' is mandatory. This is
what I mean about it forcing you to rethink things, and why I think
there is room for genuine controversy :-). (Just like there was about
goto -- it's weird to think that it could have turned out differently
in hindsight, but people really did have valid concerns...)

I think the pattern we're settling on for this particular case is:

class MyEndpoint:
    def __init__(self, nursery, ...):
        self._nursery = nursery
    # methods here that use nursery

@asynccontextmanager  # from contextlib (Python 3.7+)
async def open_my_endpoint(...):
    async with trio.open_nursery() as nursery:
        yield MyEndpoint(nursery, ...)

Then most end-users do 'async with open_my_endpoint() as endpoint:'
and then use the 'endpoint' object inside the block; or if you have
some special reason why you need to have multiple endpoints in the
same nursery (e.g. you have an unbounded number of endpoints and don't
want to have to somehow write an unbounded number of 'async with'
blocks in your source code), then you can call MyEndpoint() directly
and pass an explicit nursery. A little bit of extra fuss, but not too
much.

So that's how you handle it. Why do we make you jump through these hoops?

The problem is, we want to enforce that each nursery object's lifetime
is bound to the lifetime of a calling frame. The point of the 'async
with' in 'async with open_nursery()' is to perform this binding. To
reduce errors, open_nursery() doesn't even return a nursery object –
only open_nursery().__aenter__() does that. Without this binding, if a
task in the nursery has an unhandled error, we have nowhere to report
it (among other issues).

Of course this is Python, so you can always do gross hacks like
calling __aenter__ yourself, but then you're responsible for making
sure the context manager semantics are respected. In most systems
you'd expect this kind of thing to be syntactically enforced as part of
the language; it's actually pretty amazing that Trio is able to make
things work as well as it does as a "mere library". It's really a
testament to how much thought has been put into Python -- other
languages don't really have any equivalent to 'with' or Python's
generator-based async/await.

> Also perhaps more finegrained shutdown routines such as:
> * Nursery.join(cancel_after=None):
>   wait for all tasks to join, cancel the remaining ones
>   after the given timeout

Hmm, I've never needed that particular pattern, but it's actually
pretty easy to express. I didn't go into it in this writeup, but:
because nurseries need to be able to cancel their contents in order to
unwind the stack during exception propagation, they need to enclose
their contents in a cancel scope. And since they have this cancel
scope anyway, we expose it on the nursery object. And cancel scopes
allow you to adjust their deadline. So if you write:

async with trio.open_nursery() as nursery:
   ... blah blah ...
   # Last line before exiting the block and triggering the implicit join():
   nursery.cancel_scope.deadline = trio.current_time() + TIMEOUT

then it'll give you the semantics you're asking about. There could be
more sugar for this if it turns out to be useful. Maybe a .timeout
attribute on cancel scopes that's a magic property always equal to
(self.deadline - trio.current_time()), so you could do
'nursery.cancel_scope.timeout = TIMEOUT'?
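
The arithmetic behind such a property can be sketched in a few lines. DeadlineScope below is a hypothetical stand-in that models only the deadline bookkeeping, not a real cancel scope, and the injectable clock is an assumption added so the behaviour is easy to check.

```python
import time


class DeadlineScope:
    """Hypothetical stand-in for a cancel scope; tracks a deadline and nothing else."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self.deadline = float("inf")  # no deadline by default

    @property
    def timeout(self):
        # Seconds remaining until the deadline, measured from "now".
        return self.deadline - self._clock()

    @timeout.setter
    def timeout(self, seconds):
        # Setting a relative timeout just rewrites the absolute deadline.
        self.deadline = self._clock() + seconds
```

With this shape, 'scope.timeout = TIMEOUT' and 'scope.deadline = now + TIMEOUT' are two spellings of the same assignment, and reading 'timeout' later gives a smaller number as the clock advances.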


Nathaniel J. Smith --
Async-sig mailing list
Code of Conduct:

Re: [Async-sig] New blog post: Notes on structured concurrency, or: Go statement considered harmful

2018-04-26 Thread Nathaniel Smith
On Wed, Apr 25, 2018 at 9:43 PM, Guido van Rossum  wrote:
> Now there's a PEP I'd like to see.

Which part?


Nathaniel J. Smith --

Re: [Async-sig] New blog post: Notes on structured concurrency, or: Go statement considered harmful

2018-04-26 Thread Nathaniel Smith
On Thu, Apr 26, 2018 at 7:55 PM, Dima Tisnek  wrote:
> My 2c after careful reading:
> restarting tasks automatically (custom nursery example) is quite questionable:
> * it's unexpected
> * it's not generally safe (argument reuse, side effects)
> * user's coroutine can be decorated to achieve same effect

It's an example of something that a user could implement. I guess if
you go to the trouble of implementing this behavior, then it is no
longer unexpected and you can also cope with handling the edge cases
:-). There may be some reason why it turns out to be a bad idea
specifically in the context of Python, but it's one of the features
that's famously helpful for making Erlang work so well, so it seemed
worth mentioning.
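
For concreteness, here is a minimal sketch of the restart idea written as a plain wrapper, which is also roughly what "decorating the user's coroutine" would amount to. It is not something Trio ships; run_with_restarts and max_restarts are hypothetical names, and the policy (retry on any Exception, no backoff) is just one assumption among many possible ones.

```python
import asyncio


async def run_with_restarts(async_fn, *args, max_restarts=3):
    """Run async_fn(*args); after a crash, start it again up to max_restarts times."""
    restarts = 0
    while True:
        try:
            return await async_fn(*args)
        except Exception:
            restarts += 1
            if restarts > max_restarts:
                raise  # give up and let the error propagate to the caller
            # Note: the same args are reused on every restart, which is
            # exactly the "argument reuse" hazard raised above.


async def demo():
    calls = []

    async def flaky(label):
        calls.append(label)
        if len(calls) < 3:
            raise ConnectionError("transient failure")
        return "ok"

    result = await run_with_restarts(flaky, "job")
    return result, calls
```

The wrapper makes the edge cases visible: whoever writes it has to decide which exceptions count as restartable and whether the arguments are safe to pass in again.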

> It's very nice to have the escape hatch of posting tasks to "someone
> else's" nursery.
> I feel there are more caveats to posting a task to parent's or global
> nursery though.
> Consider that local tasks typically await on other local tasks.
> What happens when N1-task1 waits on N2-task2 and N2-task9 encounters an error?
> My guess is N2-task2 is cancelled, which by default cancels N1-task1 too, right?
> That kinda breaks the abstraction, doesn't it?

"Await on a task" is not a verb that Trio has. (We don't even have
task objects, except in some low-level plumbing/introspection APIs.)
You can do 'await queue.get()' to wait for another task to send you
something, but if the other task gets cancelled then the data will
just... never arrive.

There is some discussion here of moving from a queue.Queue-like model
to a model with separate send- and receive-channels:

If we do this (which I suspect we will), then probably the task that
gets cancelled was holding the only reference to the send-channel (or
even better, did 'with send_channel: ...'), so the channel will get
closed, and then the call to get() will raise an error, which the
receiving task can handle or not...
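
A rough sketch of that behaviour, under heavy assumptions: the channel design was still being discussed, so none of the names below (open_channel, SendChannel, ReceiveChannel, ChannelClosed) are Trio API. They are hypothetical, built on asyncio.Queue only so the example runs on the standard library.

```python
import asyncio

_CLOSED = object()  # private marker meaning "the sender went away"


class ChannelClosed(Exception):
    """Raised by receive() once the sending side has been closed."""


class SendChannel:
    def __init__(self, queue):
        self._queue = queue

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        # Runs on normal exit, on errors, and on cancellation alike.
        self._queue.put_nowait(_CLOSED)
        return False

    def send_nowait(self, value):
        self._queue.put_nowait(value)


class ReceiveChannel:
    def __init__(self, queue):
        self._queue = queue

    async def receive(self):
        item = await self._queue.get()
        if item is _CLOSED:
            self._queue.put_nowait(_CLOSED)  # keep later calls failing too
            raise ChannelClosed
        return item


def open_channel():
    queue = asyncio.Queue()
    return SendChannel(queue), ReceiveChannel(queue)


async def sender(send_channel):
    with send_channel:
        send_channel.send_nowait("first")
        await asyncio.sleep(3600)  # cancelled while waiting here


async def main():
    send_channel, receive_channel = open_channel()
    task = asyncio.create_task(sender(send_channel))
    first = await receive_channel.receive()
    task.cancel()
    try:
        await receive_channel.receive()
    except ChannelClosed:
        return first, "closed"
    return first, "still waiting"
```

The point of the 'with send_channel:' block is that the receiver finds out about the cancellation instead of waiting forever for data that will never arrive.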

But yes, you do need to spend some time thinking about what kind of
task tree topology makes sense for your problem. Trio can give you
tools but it's not a replacement for thoughtful design :-).

> If the escape hatch is available, how about allowing tasks to be moved
> between nurseries?

That would be possible (and in fact there's one special case
internally where we do it!), but I haven't seen a good reason yet to
implement it as a standard feature. If someone shows up with use cases
then we could talk about it :-).

> Is dependency inversion allowed?
> (as in given parent N1 and child N1.N2, can N1.N2.t2 await on N1.t1 ?)
> If that's the case, I guess it's not a "tree of tasks", as in the
> graph is arbitrary, not DAG.

See above re: not having "wait on a task" as a verb.

> I've seen [proprietary] strict DAG task frameworks.
> while they are useful to e.g. perform sub-requests in parallel,
> they are not general enough to be useful at large.
> Thus I'm assuming trio does not enforce DAG...

The task tree itself is in fact a tree, not a DAG. But that tree
doesn't control which tasks can talk to each other. It's just used for
exception propagation, and for enforcing that all children have to
finish before the parent can continue. (Just like how in a regular
function call, the caller stops while the callee is running.) Does
that help?

> Finally, slob programmers like me occasionally want fire-and-forget
> tasks, aka daemonic threads.
> Some are long-lived, e.g. "battery status poller", others short-lived,
> e.g. "tail part of low-latency logging".
> Obv., a careful programmer would keep track of those, but we want
> things simple :)
> Perhaps in line with batteries included principle, trio could include
> a standard way to accomplish that?

Well, what semantics do you want? If the battery status poller
crashes, what should happen? If the "tail part of low-latency logging"
command is still running when you go to shut down, do you want to wait
a bit for it to finish, or cancel it, or ...?

You can certainly implement some helper like:

async with open_throwaway_nursery() as throwaway_nursery:
    # If this crashes, we ignore the problem, maybe log it or something
    # When we exit the with block, it gets cancelled
    ...

if that's what you want. Before adding anything like this to trio
itself though I'd like to see some evidence of how it's being used in
real-ish projects.

> Thanks again for the great post!
> I think you could publish an article on this, it would be good to have
> wider discussion, academic, ES6, etc.

Thanks for the vote of confidence :-). And, we'll see...


Nathaniel J. Smith --

Re: [Async-sig] New blog post: Notes on structured concurrency, or: Go statement considered harmful

2018-04-26 Thread Dima Tisnek
My 2c after careful reading:

restarting tasks automatically (custom nursery example) is quite questionable:
* it's unexpected
* it's not generally safe (argument reuse, side effects)
* user's coroutine can be decorated to achieve same effect

I'd say just remove this, it's not relevant to your thesis.

It's very nice to have the escape hatch of posting tasks to "someone
else's" nursery.
I feel there are more caveats to posting a task to parent's or global
nursery though.
Consider that local tasks typically await on other local tasks.
What happens when N1-task1 waits on N2-task2 and N2-task9 encounters an error?
My guess is N2-task2 is cancelled, which by default cancels N1-task1 too, right?
That kinda breaks the abstraction, doesn't it?

If the escape hatch is available, how about allowing tasks to be moved
between nurseries?
Is dependency inversion allowed?
(as in given parent N1 and child N1.N2, can N1.N2.t2 await on N1.t1 ?)
If that's the case, I guess it's not a "tree of tasks", as in the
graph is arbitrary, not DAG.

I've seen [proprietary] strict DAG task frameworks.
while they are useful to e.g. perform sub-requests in parallel,
they are not general enough to be useful at large.
Thus I'm assuming trio does not enforce DAG...

Finally, slob programmers like me occasionally want fire-and-forget
tasks, aka daemonic threads.
Some are long-lived, e.g. "battery status poller", others short-lived,
e.g. "tail part of low-latency logging".
Obv., a careful programmer would keep track of those, but we want
things simple :)
Perhaps in line with batteries included principle, trio could include
a standard way to accomplish that?

Thanks again for the great post!
I think you could publish an article on this, it would be good to have
wider discussion, academic, ES6, etc.

On 25 April 2018 at 17:24, Nathaniel Smith  wrote:
> Hi all,
> I just posted another essay on concurrent API design:
> This is the one that finally gets at the core reasons why Trio exists;
> I've been trying to figure out how to write it for at least a year
> now. I hope you like it.
> (Guido: this is the one you should read :-). Or if it's too much, you
> can jump to the conclusion [1], and I'm happy to come find you
> somewhere with a whiteboard, if that'd be helpful!)
> -n
> [1] 
> --
> Nathaniel J. Smith --