Re: [Async-sig] Blog post: Timeouts and cancellation for humans
On Sun, Jan 14, 2018 at 6:33 PM, Dima Tisnek wrote:
> Perhaps the latter is what `shield` should do? That is detach computation
> as opposed to blocking the caller past caller's deadline?

Well, it can't do that in trio :-). One of trio's core design principles is: no detached processes.

And even if you don't think detached processes are inherently a bad idea, I don't think they're what you'd want in this case anyway. If your socket shutdown code has frozen, you want to kill it and close the socket, not move it into the background where it can hang around indefinitely wasting resources.

-n

--
Nathaniel J. Smith -- https://vorpus.org
___
Async-sig mailing list
Async-sig@python.org
https://mail.python.org/mailman/listinfo/async-sig
Code of Conduct: https://www.python.org/psf/codeofconduct/
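[Editor's note: for contrast, stdlib asyncio's `shield` implements exactly the detach-style semantics under discussion: on timeout only the caller's *wait* is cancelled, while the shielded work keeps running in the background. A minimal runnable sketch; the `cleanup` coroutine and durations are made up for illustration.]

```python
import asyncio

async def cleanup(record):
    # Simulated slow cleanup work (duration is arbitrary).
    await asyncio.sleep(0.05)
    record.append("cleaned up")

async def main():
    record = []
    task = asyncio.create_task(cleanup(record))
    try:
        # The outer timeout fires first, but shield() means only the
        # *wait* is cancelled -- the cleanup task itself keeps running
        # in the background ("detached"), which trio deliberately forbids.
        await asyncio.wait_for(asyncio.shield(task), timeout=0.01)
    except asyncio.TimeoutError:
        pass
    await task  # the detached cleanup still completes
    return record

print(asyncio.run(main()))  # ['cleaned up']
```

Whether this background completion is a feature or a resource leak waiting to happen is precisely the design disagreement in this thread.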
Re: [Async-sig] Blog post: Timeouts and cancellation for humans
Hi,

On Thu, 11 Jan 2018 02:09:29 -0800 Nathaniel Smith wrote:
> Hi all,
>
> Folks here might be interested in this new blog post:
>
> https://vorpus.org/blog/timeouts-and-cancellation-for-humans/
>
> It's a detailed discussion of pitfalls and design-tradeoffs in APIs
> for timeout and cancellation, and has a proposal for handling them in
> a more Pythonic way. Any feedback welcome!

I have little constructive feedback to share, other than that it is a very insightful write-up and the API proposal there is quite interesting.

cheers,

Antoine.
Re: [Async-sig] Blog post: Timeouts and cancellation for humans
Quick preface: there are definitely times when code "smell" really isn't -- nothing's perfect! -- and sometimes some system component is unavoidably inelegant. I think this is oftentimes (but not always) the result of scoping: clearly I couldn't decide, as a library author, that "it's all just broken" and rip out everything from the OS to TCP to language syntax and semantics just to make my API prettier. So I pragmatically downscope the problem space, and that forces me to make design decisions to accommodate the rest of the universe. And that's okay!

With that being said, I'm still not convinced that the double-timeout shutdown isn't an indication of upstream code smell. From a practical standpoint, for the purposes of this discussion it really doesn't matter; Trio et al. can't go mucking about in the TCP stack internals, so we do the best we can. But I'm willing to entertain the possibility (actually I think it's highly likely) that there are better solutions to the aforementioned problems than the ones used by (for example) TCP and TLS. That rabbit hole goes very, very deep, though, so to circle back, what I'm trying to say is this:

- I share the inclination that shielding against cancellation (or any equivalent workaround) is likely code smell
- However, I personally suspect the source of that smell is upstream, in the network protocols themselves
- Given that, I think some amount of smell in downstream libraries like Trio is unavoidable

To that end, I really like Trio's existing approach. Shielding should definitely be used sparingly, but I think it's a justifiable, pragmatic compromise when it comes to dealing with not-quite-perfect protocols on even-less-perfect networks. And I think the connection close semantics Trio provides for these situations -- attempt to close gracefully, but if cancelled, still close unilaterally to free local resources -- is an excellent approach.
But it also "lucks out" a bit, because freeing local resources is many orders of magnitude faster than the enclosing timeout is likely to be, so it's effectively a "free" operation. The relative timescales are a critical observation; if freeing local resources took one second out of a ten-second timeout, I think you'd be stuck asking the same question there, too.

Nick Badger
https://www.nickbadger.com

2018-01-14 20:52 GMT-08:00 Nathaniel Smith:
> On Sun, Jan 14, 2018 at 2:45 PM, Nick Badger wrote:
> >> However, I think this is probably a code smell. Like all code smells,
> >> there are probably cases where it's the right thing to do, but when
> >> you see it you should stop and think carefully.
> >
> > Huh. That's a really good point. But I'm not sure the source of the smell is
> > the code that needs the shield logic -- I think this might instead be
> > indicative of upstream code smell. Put a bit more concretely: if you're
> > writing a protocol for an unreliable network (and of course, every network
> > is unreliable), requiring a closure operation to transmit something over
> > that network is inherently problematic, because it inevitably leads to
> > multiple-stage timeouts or ungraceful shutdowns.
>
> I wouldn't go that far -- there are actually good reasons to design
> protocols like this.
>
> SSL/TLS is a protocol that has a "goodbye" message (they call it
> "close-notify"). According to the spec [1], sending this is mandatory
> if you want to cleanly shut down an SSL/TLS connection. Why? Well, say
> I send you a message, "Should I buy more bitcoin?" and your reply is
> "Yes, but only if the price drops below $XX". Unbeknownst to us, we're
> being MITMed. Fortunately, we used SSL/TLS, so the MITM can't alter
> what we're saying. But they can manipulate the network; for example,
> they could cause our connection to drop after the first 3 bytes of
> your message, so your answer gets truncated and I think you just said
> "Yes" -- which is very different!
>
> But, close-notify saves us -- or at
> least contains the damage. Since I know that you're supposed to send a
> close-notify at the end of your connection, and I didn't get one, I
> can tell that this is a truncated message. I can't tell what the rest
> was going to be, but at least I know the message I got isn't the
> message you intended to send. And an attacker can't forge a
> close-notify message, because they're cryptographically authenticated
> like all the data we send.
>
> In websockets, the goodbye handshake is used to work around a nasty
> case that can happen with common TCP stacks (like, all of them):
>
> 1. A sends a message to B.
> 2. A is done after that, so it closes the connection.
> 3. Just then, B sends a message to A, like maybe a regular ping on some timer.
> 4. A's TCP stack receives data on a closed connection, goes "huh
> wut?", and sends a RST packet.
> 5. B goes to read the last message A sent before they closed the
> connection... but whoops it's gone! the RST packet caused both TCP
> stacks to wipe out all their buffered data associated with this connection.
Re: [Async-sig] Blog post: Timeouts and cancellation for humans
On Sun, Jan 14, 2018 at 2:45 PM, Nick Badger wrote:
>> However, I think this is probably a code smell. Like all code smells,
>> there are probably cases where it's the right thing to do, but when
>> you see it you should stop and think carefully.
>
> Huh. That's a really good point. But I'm not sure the source of the smell is
> the code that needs the shield logic -- I think this might instead be
> indicative of upstream code smell. Put a bit more concretely: if you're
> writing a protocol for an unreliable network (and of course, every network
> is unreliable), requiring a closure operation to transmit something over
> that network is inherently problematic, because it inevitably leads to
> multiple-stage timeouts or ungraceful shutdowns.

I wouldn't go that far -- there are actually good reasons to design protocols like this.

SSL/TLS is a protocol that has a "goodbye" message (they call it "close-notify"). According to the spec [1], sending this is mandatory if you want to cleanly shut down an SSL/TLS connection. Why? Well, say I send you a message, "Should I buy more bitcoin?" and your reply is "Yes, but only if the price drops below $XX". Unbeknownst to us, we're being MITMed. Fortunately, we used SSL/TLS, so the MITM can't alter what we're saying. But they can manipulate the network; for example, they could cause our connection to drop after the first 3 bytes of your message, so your answer gets truncated and I think you just said "Yes" -- which is very different!

But, close-notify saves us -- or at least contains the damage. Since I know that you're supposed to send a close-notify at the end of your connection, and I didn't get one, I can tell that this is a truncated message. I can't tell what the rest was going to be, but at least I know the message I got isn't the message you intended to send. And an attacker can't forge a close-notify message, because they're cryptographically authenticated like all the data we send.
In websockets, the goodbye handshake is used to work around a nasty case that can happen with common TCP stacks (like, all of them):

1. A sends a message to B.
2. A is done after that, so it closes the connection.
3. Just then, B sends a message to A, like maybe a regular ping on some timer.
4. A's TCP stack receives data on a closed connection, goes "huh wut?", and sends a RST packet.
5. B goes to read the last message A sent before they closed the connection... but whoops, it's gone! The RST packet caused both TCP stacks to wipe out all their buffered data associated with this connection.

So if you have a protocol that's used for streaming indefinite amounts of data in both directions and supports stuff like pings, you kind of have to have a goodbye handshake to avoid TCP stacks accidentally corrupting your data. (The goodbye handshake can also help make sure that clients end up carrying CLOSE-WAIT states instead of servers, but that's a finicky and less important issue.)

Of course, it is absolutely true that networks are unreliable, so when your protocol specifies a goodbye handshake like this then implementations still need to have some way to cope if their peer closes the connection unexpectedly, and they may need to unilaterally close the connection in some circumstances no matter what the spec says. Correctly handling every possible case here quickly becomes, like, infinitely complicated. But nonetheless, as a library author one has to try to provide some reasonable behavior by default (while knowing that some users will end up needing to tweak things to handle special circumstances).
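[Editor's note: the five-step RST scenario above can be modeled with a toy class -- emphatically not a real TCP stack, just an illustration of how an RST wipes buffered-but-unread data, which is what the goodbye handshake prevents.]

```python
class ToyTCPStack:
    """Toy model of one endpoint's receive side (illustration only)."""
    def __init__(self):
        self.recv_buffer = []
        self.closed = False

    def deliver(self, data):
        if self.closed:
            # Data arriving on a closed connection: the stack answers
            # with an RST instead of buffering it.
            return "RST"
        self.recv_buffer.append(data)
        return "ACK"

    def on_rst(self):
        # Receiving an RST wipes all buffered, not-yet-read data.
        self.recv_buffer.clear()

a, b = ToyTCPStack(), ToyTCPStack()
b.deliver("A's last message")   # step 1: sitting unread in B's buffer
a.closed = True                 # step 2: A closes
reply = a.deliver("ping")       # steps 3-4: B's ping hits A's closed socket
if reply == "RST":
    b.on_rst()                  # step 5: B's buffered data is wiped
print(b.recv_buffer)            # []: the last message is lost
```

With a goodbye handshake, A would not close until B had acknowledged the shutdown, so step 3 never races with step 2.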
My tentative approach so far in Trio is (a) make cancellation stateful like discussed in the blog post, because accidentally hanging forever just can't be a good default, and (b) in the "trio.abc.AsyncResource" interface that complex objects like trio.SSLStream implement (and we recommend libraries implement too), the semantics for the aclose and __aexit__ methods are that they're allowed to block forever trying to do a graceful shutdown, but if cancelled then they have to return promptly *but still freeing any underlying resources*, possibly in a non-graceful way. So if you write straightforward code like:

    with trio.move_on_after(10):
        async with open_websocket_connection(...):
            ...

then it tries to do a proper websocket goodbye handshake by default, but if the timeout expires then it gives up and immediately closes the socket. It's not perfect, but it seems like a better default than anything else I can think of.

-n

[1] There's also this whole mess where many SSL/TLS implementations ignore the spec and don't bother sending close-notify. This is *kinda* justifiable because the original and most popular use for SSL/TLS is for wrapping HTTP connections, and HTTP has its own ways of signaling the end of the connection that are already transmitted through the encrypted tunnel, so the SSL/TLS end-of-connection handshake is redundant. Therefore lots of
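[Editor's note: the aclose contract described above -- block as long as needed for a graceful shutdown, but if cancelled, return promptly while still freeing the underlying resource -- can be sketched with stdlib asyncio's `wait_for` standing in for a trio cancel scope. `MockConnection` and its `goodbye_delay` are hypothetical, purely for illustration.]

```python
import asyncio

class MockConnection:
    """Hypothetical resource following an aclose-style contract."""
    def __init__(self, goodbye_delay):
        self.goodbye_delay = goodbye_delay
        self.closed = False

    async def aclose(self):
        try:
            # Graceful path: may block arbitrarily long on the
            # goodbye handshake.
            await asyncio.sleep(self.goodbye_delay)
        finally:
            # Cancelled or not, the underlying resource is always freed.
            self.closed = True

async def main():
    conn = MockConnection(goodbye_delay=60)
    try:
        # The enclosing timeout cancels the graceful close early...
        await asyncio.wait_for(conn.aclose(), timeout=0.01)
    except asyncio.TimeoutError:
        pass
    # ...but the resource was still freed, promptly and non-gracefully.
    return conn.closed

print(asyncio.run(main()))  # True
```

The key point is the `finally` block: cancellation interrupts the graceful handshake but can never skip the resource release.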
Re: [Async-sig] Blog post: Timeouts and cancellation for humans
On Sun, Jan 14, 2018 at 5:11 AM, Chris Jerdonek wrote:
> On Sun, Jan 14, 2018 at 3:33 AM, Nathaniel Smith wrote:
>> On Fri, Jan 12, 2018 at 4:17 AM, Chris Jerdonek wrote:
>>> Say you have a complex operation that you want to be able to timeout
>>> or cancel, but the process of cleanup / cancelling might also require
>>> a certain amount of time that you'd want to allow time for (likely a
>>> smaller time in normal circumstances). Then it seems like you'd want
>>> to be able to allocate a separate timeout for the clean-up portion
>>> (independent of the timeout allotted for the original operation).
>>> ...
>>
>> You can get these semantics using the "shielding" feature, which the
>> post discusses a bit later:
>> ...
>> However, I think this is probably a code smell.
>
> I agree with this assessment. My sense was that shielding could
> probably do it, but it seems like it could be brittle or more of a
> kludge. It would be nice if the same primitive could be used to
> accommodate this and other variations in addition to the normal case.
> For example, a related variation might be if you wanted to let
> yourself extend the timeout in response to certain actions or results.
>
> The main idea that occurs to me is letting the cancel scope be
> dynamic: the timeout could be allowed to change in response to certain
> things. Something like that seems like it has the potential to be both
> simple as well as general enough to accommodate lots of different
> scenarios, including adjusting the timeout in response to entering a
> clean-up phase. One good test would be whether shielding could be
> implemented using such a primitive.

Ah, if you want to change the timeout on a specific cancel scope, that's easy:

    async def do_something():
        with move_on_after(10) as cscope:
            ...
            # Actually, let's give ourselves a bit more time
            cscope.deadline += 10
            ...

If you have a reference to a Trio cancel scope, you can change its timeout at any time. However, this is different from shielding.
The code above only changes the deadline for that particular cancel scope. If the caller sets their own timeout:

    with move_on_after(15):
        await do_something()

then the code will still get cancelled after 15 seconds when the outer cancel scope's deadline expires, even though the inner scope ended up with a 20 second timeout.

Shielding is about disabling outer cancel scopes -- the ones you don't know about! -- in a particular bit of code. (If you compare to C#'s cancellation sources or Golang's context-based cancellation, it's like writing a function that intentionally chooses not to pass the cancel token it was given on to some function it calls.)

-n

--
Nathaniel J. Smith -- https://vorpus.org
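[Editor's note: trio's actual implementation is more involved, but the deadline/shield interaction described above can be modeled in a few lines. This is a toy model, not trio's API: the effective deadline is the minimum over all enclosing scopes, except that a shielded scope hides every scope outside it.]

```python
class CancelScope:
    """Toy model of a trio-style cancel scope (illustration only)."""
    def __init__(self, deadline):
        self.deadline = deadline
        self.shield = False

def effective_deadline(scope_stack):
    # Walk scopes from outermost to innermost; take the minimum deadline,
    # but a shielded scope discards everything outside it.
    deadline = float("inf")
    for scope in scope_stack:
        if scope.shield:
            deadline = float("inf")
        deadline = min(deadline, scope.deadline)
    return deadline

outer = CancelScope(15)      # the caller's move_on_after(15)
inner = CancelScope(10)      # our own move_on_after(10)
inner.deadline += 10         # extend our own deadline to 20...
print(effective_deadline([outer, inner]))  # 15: the outer scope still wins
inner.shield = True
print(effective_deadline([outer, inner]))  # 20: shielding hides the outer scope
```

This makes concrete why extending your own deadline cannot defeat a caller's timeout, while shielding can.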
Re: [Async-sig] Blog post: Timeouts and cancellation for humans
I suppose the websocket case ought to follow conventions similar to the kernel TCP API, where `close` returns immediately but continues to send packets behind the scenes. It could look something like this:

    with move_on_after(10):
        await get_ws_message(url)

    async def get_ws_message(url):
        async def close():
            if sock and sock.is_connected and ...:
                await sock.send(build_close_packet())
                await sock.recv()  # or something
            if sock:
                sock.close()

        sock = socket.socket()
        try:
            await sock.connect(url)
            data = await sock.recv(...)
            return decode(data)
        finally:
            with move_on_after(30):
                someio.spawn_task(close())

I believe the concern is more general than supporting "broken" protocols like websocket. When someone writes `with move_on_after(N): a = await foo()` it can be understood in two ways:

* perform foo for N seconds or else, or
* I want the result in N seconds or else

The latter doesn't imply that foo should be interrupted, only that the caller wishes to proceed without the result. It makes sense if the action involves an unrelated, long-running process, where `foo()` is something like `anext(some_async_generator)`.

Both solve the original concern, that the caller should not block for more than N. I suppose one can be implemented in terms of the other. Perhaps the latter is what `shield` should do? That is, detach computation as opposed to blocking the caller past the caller's deadline?

What do you all think?

On Mon, 15 Jan 2018 at 6:45 AM, Nick Badger wrote:
>> However, I think this is probably a code smell. Like all code smells,
>> there are probably cases where it's the right thing to do, but when
>> you see it you should stop and think carefully.
>
> Huh. That's a really good point. But I'm not sure the source of the smell
> is the code that needs the shield logic -- I think this might instead be
> indicative of upstream code smell.
> Put a bit more concretely: if you're
> writing a protocol for an unreliable network (and of course, every network
> is unreliable), requiring a closure operation to transmit something over
> that network is inherently problematic, because it inevitably leads to
> multiple-stage timeouts or ungraceful shutdowns.
>
> Clearly, changing anything upstream is out of scope here. So if the smell
> is, in fact, "upwind", there's not really much you could do about that in
> asyncio, Curio, Trio, etc, other than minimize the additional smell you
> need to accommodate smelly protocols. Unfortunately, I'm not sure there's
> any one approach to that problem that isn't application-specific.
>
> Nick Badger
> https://www.nickbadger.com
>
> 2018-01-14 3:33 GMT-08:00 Nathaniel Smith:
>> On Fri, Jan 12, 2018 at 4:17 AM, Chris Jerdonek wrote:
>> > Thanks, Nathaniel. Very instructive, thought-provoking write-up!
>> >
>> > One thing occurred to me around the time of reading this passage:
>> >
>> >> "Once the cancel token is triggered, then all future operations on
>> >> that token are cancelled, so the call to ws.close doesn't get stuck. It's a
>> >> less error-prone paradigm. ... If you follow the path we did in this blog
>> >> post, and start by thinking about applying a timeout to a complex operation
>> >> composed out of multiple blocking calls, then it's obvious that if the
>> >> first call uses up the whole timeout budget, then any future calls should
>> >> fail immediately."
>> >
>> > One case that's not clear how should be addressed is the following.
>> > It's something I've wrestled with in the context of asyncio, and it
>> > doesn't seem to be raised as a possibility in your write-up.
>> >
>> > Say you have a complex operation that you want to be able to timeout
>> > or cancel, but the process of cleanup / cancelling might also require
>> > a certain amount of time that you'd want to allow time for (likely a
>> > smaller time in normal circumstances). Then it seems like you'd want
>> > to be able to allocate a separate timeout for the clean-up portion
>> > (independent of the timeout allotted for the original operation).
>> >
>> > It's not clear to me how this case would best be handled with the
>> > primitives you described. In your text above ("then any future calls
>> > should fail immediately"), without any changes, it seems there
>> > wouldn't be "time" for any clean-up to complete.
>> >
>> > With asyncio, one way to handle this is to await on a task with a
>> > smaller timeout after calling task.cancel(). That lets you assign a
>> > different timeout to waiting for cancellation to complete.
>>
>> You can get these semantics using the "shielding" feature, which the
>> post discusses a bit later:
>>
>>     try:
>>         await do_some_stuff()
>>     finally:
>>         # Always give this 30 seconds to clean up, even if we've
>>         # been cancelled
>>         with trio.move_on_after(30) as cscope:
>>             cscope.shield = True
>>             await do_cleanup()
Re: [Async-sig] Blog post: Timeouts and cancellation for humans
> However, I think this is probably a code smell. Like all code smells,
> there are probably cases where it's the right thing to do, but when
> you see it you should stop and think carefully.

Huh. That's a really good point. But I'm not sure the source of the smell is the code that needs the shield logic -- I think this might instead be indicative of upstream code smell. Put a bit more concretely: if you're writing a protocol for an unreliable network (and of course, every network is unreliable), requiring a closure operation to transmit something over that network is inherently problematic, because it inevitably leads to multiple-stage timeouts or ungraceful shutdowns.

Clearly, changing anything upstream is out of scope here. So if the smell is, in fact, "upwind", there's not really much you could do about that in asyncio, Curio, Trio, etc, other than minimize the additional smell you need to accommodate smelly protocols. Unfortunately, I'm not sure there's any one approach to that problem that isn't application-specific.

Nick Badger
https://www.nickbadger.com

2018-01-14 3:33 GMT-08:00 Nathaniel Smith:
> On Fri, Jan 12, 2018 at 4:17 AM, Chris Jerdonek wrote:
> > Thanks, Nathaniel. Very instructive, thought-provoking write-up!
> >
> > One thing occurred to me around the time of reading this passage:
> >
> >> "Once the cancel token is triggered, then all future operations on that
> >> token are cancelled, so the call to ws.close doesn't get stuck. It's a less
> >> error-prone paradigm. ... If you follow the path we did in this blog post,
> >> and start by thinking about applying a timeout to a complex operation
> >> composed out of multiple blocking calls, then it's obvious that if the
> >> first call uses up the whole timeout budget, then any future calls should
> >> fail immediately."
> >
> > One case that's not clear how should be addressed is the following.
> > It's something I've wrestled with in the context of asyncio, and it
> > doesn't seem to be raised as a possibility in your write-up.
> >
> > Say you have a complex operation that you want to be able to timeout
> > or cancel, but the process of cleanup / cancelling might also require
> > a certain amount of time that you'd want to allow time for (likely a
> > smaller time in normal circumstances). Then it seems like you'd want
> > to be able to allocate a separate timeout for the clean-up portion
> > (independent of the timeout allotted for the original operation).
> >
> > It's not clear to me how this case would best be handled with the
> > primitives you described. In your text above ("then any future calls
> > should fail immediately"), without any changes, it seems there
> > wouldn't be "time" for any clean-up to complete.
> >
> > With asyncio, one way to handle this is to await on a task with a
> > smaller timeout after calling task.cancel(). That lets you assign a
> > different timeout to waiting for cancellation to complete.
>
> You can get these semantics using the "shielding" feature, which the
> post discusses a bit later:
>
>     try:
>         await do_some_stuff()
>     finally:
>         # Always give this 30 seconds to clean up, even if we've
>         # been cancelled
>         with trio.move_on_after(30) as cscope:
>             cscope.shield = True
>             await do_cleanup()
>
> Here the inner scope "hides" the code inside it from any external
> cancel scopes, so it can continue executing even if the overall
> context has been cancelled.
>
> However, I think this is probably a code smell. Like all code smells,
> there are probably cases where it's the right thing to do, but when
> you see it you should stop and think carefully. If you're writing code
> like this, then it means that there are multiple different layers in
> your code that are implementing timeout policies, that might end up
> fighting with each other. What if the caller really needs this to
> finish in 15 seconds? So if you have some way to move the timeout
> handling into the same layer, then I suspect that will make your
> program easier to understand and maintain. OTOH, if you decide you
> want it, the code above works :-). I'm not 100% sure here; I'd
> definitely be interested to hear about more use cases.
>
> One thing I've thought about that might help is adding a kind of "soft
> cancelled" state to the cancel scopes, inspired by the "graceful
> shutdown" mode that you'll often see in servers where you stop
> accepting new connections, then try to finish up old ones (with some
> time limit). So in this case you might mark 'do_some_stuff()' as being
> cancelled immediately when we entered the 'soft cancel' phase, but let
> the 'do_cleanup' code keep running until the grace period expired and
> the region was hard-cancelled. This idea isn't fully baked yet though.
> (There's some more mumbling about this at
> https://github.com/python-trio/trio/issues/147.)
>
> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org
Re: [Async-sig] Blog post: Timeouts and cancellation for humans
On Sun, Jan 14, 2018 at 3:33 AM, Nathaniel Smith wrote:
> On Fri, Jan 12, 2018 at 4:17 AM, Chris Jerdonek wrote:
>> Say you have a complex operation that you want to be able to timeout
>> or cancel, but the process of cleanup / cancelling might also require
>> a certain amount of time that you'd want to allow time for (likely a
>> smaller time in normal circumstances). Then it seems like you'd want
>> to be able to allocate a separate timeout for the clean-up portion
>> (independent of the timeout allotted for the original operation).
>> ...
>
> You can get these semantics using the "shielding" feature, which the
> post discusses a bit later:
> ...
> However, I think this is probably a code smell.

I agree with this assessment. My sense was that shielding could probably do it, but it seems like it could be brittle or more of a kludge. It would be nice if the same primitive could be used to accommodate this and other variations in addition to the normal case. For example, a related variation might be if you wanted to let yourself extend the timeout in response to certain actions or results.

The main idea that occurs to me is letting the cancel scope be dynamic: the timeout could be allowed to change in response to certain things. Something like that seems like it has the potential to be both simple as well as general enough to accommodate lots of different scenarios, including adjusting the timeout in response to entering a clean-up phase. One good test would be whether shielding could be implemented using such a primitive.

--Chris

> Like all code smells,
> there are probably cases where it's the right thing to do, but when
> you see it you should stop and think carefully. If you're writing code
> like this, then it means that there are multiple different layers in
> your code that are implementing timeout policies, that might end up
> fighting with each other. What if the caller really needs this to
> finish in 15 seconds?
> So if you have some way to move the timeout
> handling into the same layer, then I suspect that will make your
> program easier to understand and maintain. OTOH, if you decide you
> want it, the code above works :-). I'm not 100% sure here; I'd
> definitely be interested to hear about more use cases.
>
> One thing I've thought about that might help is adding a kind of "soft
> cancelled" state to the cancel scopes, inspired by the "graceful
> shutdown" mode that you'll often see in servers where you stop
> accepting new connections, then try to finish up old ones (with some
> time limit). So in this case you might mark 'do_some_stuff()' as being
> cancelled immediately when we entered the 'soft cancel' phase, but let
> the 'do_cleanup' code keep running until the grace period expired and
> the region was hard-cancelled. This idea isn't fully baked yet though.
> (There's some more mumbling about this at
> https://github.com/python-trio/trio/issues/147.)
>
> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org
Re: [Async-sig] Blog post: Timeouts and cancellation for humans
On Fri, Jan 12, 2018 at 4:17 AM, Chris Jerdonek wrote:
> Thanks, Nathaniel. Very instructive, thought-provoking write-up!
>
> One thing occurred to me around the time of reading this passage:
>
>> "Once the cancel token is triggered, then all future operations on that
>> token are cancelled, so the call to ws.close doesn't get stuck. It's a less
>> error-prone paradigm. ... If you follow the path we did in this blog post,
>> and start by thinking about applying a timeout to a complex operation
>> composed out of multiple blocking calls, then it's obvious that if the first
>> call uses up the whole timeout budget, then any future calls should fail
>> immediately."
>
> One case that's not clear how should be addressed is the following.
> It's something I've wrestled with in the context of asyncio, and it
> doesn't seem to be raised as a possibility in your write-up.
>
> Say you have a complex operation that you want to be able to timeout
> or cancel, but the process of cleanup / cancelling might also require
> a certain amount of time that you'd want to allow time for (likely a
> smaller time in normal circumstances). Then it seems like you'd want
> to be able to allocate a separate timeout for the clean-up portion
> (independent of the timeout allotted for the original operation).
>
> It's not clear to me how this case would best be handled with the
> primitives you described. In your text above ("then any future calls
> should fail immediately"), without any changes, it seems there
> wouldn't be "time" for any clean-up to complete.
>
> With asyncio, one way to handle this is to await on a task with a
> smaller timeout after calling task.cancel(). That lets you assign a
> different timeout to waiting for cancellation to complete.
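[Editor's note: the asyncio pattern Chris describes -- cancel the task, then await it under a separate, smaller timeout so the clean-up gets its own budget -- can be sketched like this. The sleep durations are arbitrary stand-ins for real work and real clean-up.]

```python
import asyncio

async def operation():
    try:
        await asyncio.sleep(60)      # the main work
    except asyncio.CancelledError:
        # The clean-up phase needs real time of its own.
        await asyncio.sleep(0.02)
        raise

async def main():
    task = asyncio.create_task(operation())
    await asyncio.sleep(0.01)        # let the work get started
    task.cancel()
    # Wait for cancellation (i.e. the clean-up) with its own,
    # smaller timeout, independent of the original operation's budget.
    await asyncio.wait({task}, timeout=1.0)
    return task.cancelled()

print(asyncio.run(main()))  # True: clean-up ran, then cancellation completed
```

If the clean-up overran the second timeout, `asyncio.wait` would return with the task still pending, and the caller would have to decide whether to abandon it.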
You can get these semantics using the "shielding" feature, which the post discusses a bit later:

    try:
        await do_some_stuff()
    finally:
        # Always give this 30 seconds to clean up, even if we've
        # been cancelled
        with trio.move_on_after(30) as cscope:
            cscope.shield = True
            await do_cleanup()

Here the inner scope "hides" the code inside it from any external cancel scopes, so it can continue executing even if the overall context has been cancelled.

However, I think this is probably a code smell. Like all code smells, there are probably cases where it's the right thing to do, but when you see it you should stop and think carefully. If you're writing code like this, then it means that there are multiple different layers in your code that are implementing timeout policies, that might end up fighting with each other. What if the caller really needs this to finish in 15 seconds? So if you have some way to move the timeout handling into the same layer, then I suspect that will make your program easier to understand and maintain. OTOH, if you decide you want it, the code above works :-). I'm not 100% sure here; I'd definitely be interested to hear about more use cases.

One thing I've thought about that might help is adding a kind of "soft cancelled" state to the cancel scopes, inspired by the "graceful shutdown" mode that you'll often see in servers where you stop accepting new connections, then try to finish up old ones (with some time limit). So in this case you might mark 'do_some_stuff()' as being cancelled immediately when we entered the 'soft cancel' phase, but let the 'do_cleanup' code keep running until the grace period expired and the region was hard-cancelled. This idea isn't fully baked yet though. (There's some more mumbling about this at https://github.com/python-trio/trio/issues/147.)

-n

--
Nathaniel J.
Smith -- https://vorpus.org
Re: [Async-sig] Blog post: Timeouts and cancellation for humans
On Thu, Jan 11, 2018 at 7:49 PM, Dima Tisnek wrote:
> Very nice read, Nathaniel.
>
> The post left me wondering how cancel tokens interact or should
> logically interact with async composition, for example:
>
>     with move_on_after(10):
>         await someio.gather(a(), b(), c())
>
> or
>
>     with move_on_after(10):
>         await someio.first/race(a(), b(), c())
>
> or
>
>     dataset = someio.Future(large_download(), move_on_after=)
>
>     task a:
>         with move_on_after(10):
>             use((await dataset)["a"])
>
>     task b:
>         with move_on_after(10):
>             use((await dataset)["b"])

It's funny you say "async composition"... Trio's concurrency primitive (nurseries) is closely related to the core concurrency primitive in Communicating Sequential Processes, which they call "parallel composition". (Basically, if P and Q are processes, then "P || Q" is the process that runs both P and Q in parallel and then finishes when they've both finished.) If you were using that as your primitive, then tasks would form an orderly tree and this wouldn't be a problem :-).

Given asyncio's actual primitives, though, then yeah, this is clearly the big question, and I doubt there are any simple answers; so far my ambition has just been to articulate the problem well enough to start that conversation (see also the "asyncio" section in the blog post). One possibility might be a hybrid cancel token / cancel scope API: create a first-class cancel token API like C# has, make the low-level asyncio APIs use them, and then on top of that add mechanisms to attach a stack of implicitly-applied cancel tokens to each task? That's just a vague handwave of an idea so far though.

Note that the last case is the one where asyncio cancellation semantics are already... well, surprising, anyway. If you cancel task a then task b will receive a CancelledError, even though task b was not cancelled. (I talked about this a bit in my "Some thoughts ..." blog post; search for "spooky-cancellation-at-a-distance.py".)

-n

--
Nathaniel J.
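[Editor's note: the first ingredient of that hybrid idea -- a first-class, stateful cancel token like C# has -- can be sketched in a few lines of plain Python. This is only an illustration of the "once triggered, all future operations fail promptly" behavior, not a proposed or real asyncio/trio API.]

```python
class CancelToken:
    """Minimal sketch of a C#-style stateful cancel token (illustration only)."""
    def __init__(self):
        self._cancelled = False

    def cancel(self):
        self._cancelled = True

    def raise_if_cancelled(self):
        # Stateful: once the token is triggered, every future operation
        # checked against it fails promptly instead of hanging, so e.g.
        # a ws.close() after a timeout can't get stuck.
        if self._cancelled:
            raise RuntimeError("operation cancelled")

token = CancelToken()
token.raise_if_cancelled()       # fine: not yet triggered
token.cancel()
try:
    token.raise_if_cancelled()   # now fails immediately, and always will
except RuntimeError:
    print("cancelled")           # prints "cancelled"
```

The "cancel scope" half of the hybrid would then be a per-task stack of such tokens applied implicitly, which is the part that remains a handwave above.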
Smith -- https://vorpus.org ___ Async-sig mailing list Async-sig@python.org https://mail.python.org/mailman/listinfo/async-sig Code of Conduct: https://www.python.org/psf/codeofconduct/
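[Editor's note: the "spooky cancellation at a distance" behavior described above can be demonstrated with plain asyncio. The sketch below is illustrative only -- the `demo`/`waiter` names and the shared `dataset` future are made up, standing in for the `large_download()` example in the quoted message. Cancelling only task a also cancels the shared future both tasks are blocked on, so task b sees a CancelledError too.]

```python
import asyncio

async def demo():
    loop = asyncio.get_running_loop()
    dataset = loop.create_future()  # stands in for the shared large_download() result
    hit = []

    async def waiter(name):
        try:
            await dataset
        except asyncio.CancelledError:
            hit.append(name)  # record which tasks saw a CancelledError

    task_a = loop.create_task(waiter("a"))
    task_b = loop.create_task(waiter("b"))
    await asyncio.sleep(0)  # let both tasks start blocking on the shared future
    task_a.cancel()         # cancel *only* task a...
    await asyncio.gather(task_a, task_b, return_exceptions=True)
    return hit

hit = asyncio.run(demo())
print(sorted(hit))  # ['a', 'b'] -- task b got a CancelledError too
```

This happens because Task.cancel() on a task that is blocked on a future cancels that future, and here the future is shared between the two tasks.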
Re: [Async-sig] Blog post: Timeouts and cancellation for humans
Thanks, Nathaniel. Very instructive, thought-provoking write-up! One thing occurred to me around the time of reading this passage:

> "Once the cancel token is triggered, then all future operations on that token
> are cancelled, so the call to ws.close doesn't get stuck. It's a less
> error-prone paradigm. ... If you follow the path we did in this blog post,
> and start by thinking about applying a timeout to a complex operation
> composed out of multiple blocking calls, then it's obvious that if the first
> call uses up the whole timeout budget, then any future calls should fail
> immediately."

One case where it's not clear how it should be addressed is the following. It's something I've wrestled with in the context of asyncio, and it doesn't seem to be raised as a possibility in your write-up.

Say you have a complex operation that you want to be able to time out or cancel, but the process of cleanup / cancelling might itself require a certain amount of time that you'd want to allow for (likely a smaller amount of time in normal circumstances). Then it seems like you'd want to be able to allocate a separate timeout for the clean-up portion, independent of the timeout allotted for the original operation.

It's not clear to me how this case would best be handled with the primitives you described. In your text above ("then any future calls should fail immediately"), without any changes, it seems there wouldn't be "time" for any clean-up to complete. With asyncio, one way to handle this is to await the task with a smaller timeout after calling task.cancel(). That lets you assign a different timeout to waiting for cancellation to complete.
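[Editor's note: the cancel-then-wait pattern described in the last paragraph can be sketched with plain asyncio. The operation and its cleanup below are stand-ins, and the durations are shortened so the sketch runs quickly: cancel the task, then wait on it with a second, smaller timeout so the cleanup gets its own bounded budget.]

```python
import asyncio

async def complex_operation():
    try:
        await asyncio.sleep(30)    # the main work (cancelled below)
    except asyncio.CancelledError:
        await asyncio.sleep(0.01)  # stand-in for graceful cleanup work
        raise                      # re-raise so the task finishes as cancelled

async def main():
    task = asyncio.ensure_future(complex_operation())
    await asyncio.sleep(0.01)  # let the operation get started
    task.cancel()
    # Give the cleanup its own, smaller timeout, independent of
    # whatever deadline applied to the original operation.
    done, pending = await asyncio.wait({task}, timeout=1.0)
    return task in done and task.cancelled()

result = asyncio.run(main())
print(result)  # True: the cleanup ran, then cancellation completed in time
```

If the cleanup overran its one-second budget, the task would land in `pending` instead, and the caller could then decide to abandon it or close resources unilaterally.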
--Chris

On Thu, Jan 11, 2018 at 2:09 AM, Nathaniel Smith wrote:
> Hi all,
>
> Folks here might be interested in this new blog post:
>
> https://vorpus.org/blog/timeouts-and-cancellation-for-humans/
>
> It's a detailed discussion of pitfalls and design-tradeoffs in APIs
> for timeout and cancellation, and has a proposal for handling them in
> a more Pythonic way. Any feedback welcome!
>
> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org
[Async-sig] Blog post: Timeouts and cancellation for humans
Hi all,

Folks here might be interested in this new blog post:

https://vorpus.org/blog/timeouts-and-cancellation-for-humans/

It's a detailed discussion of pitfalls and design-tradeoffs in APIs for timeout and cancellation, and has a proposal for handling them in a more Pythonic way. Any feedback welcome!

-n

--
Nathaniel J. Smith -- https://vorpus.org