Re: [Async-sig] Blog post: Timeouts and cancellation for humans
On Sun, Jan 14, 2018 at 6:33 PM, Dima Tisnek wrote:
> Perhaps the latter is what `shield` should do? That is detach computation
> as opposed to blocking the caller past caller's deadline?

Well, it can't do that in trio :-). One of trio's core design principles is: no detached processes.

And even if you don't think detached processes are inherently a bad idea, I don't think they're what you'd want in this case anyway. If your socket shutdown code has frozen, you want to kill it and close the socket, not move it into the background where it can hang around indefinitely wasting resources.

-n

--
Nathaniel J. Smith -- https://vorpus.org
___
Async-sig mailing list
Async-sig@python.org
https://mail.python.org/mailman/listinfo/async-sig
Code of Conduct: https://www.python.org/psf/codeofconduct/
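[Editor's note: for contrast, stdlib asyncio's `shield` implements exactly the detach-style semantics under discussion: on timeout only the caller's *wait* is cancelled, while the shielded work keeps running in the background. A minimal runnable sketch; the `cleanup` coroutine and durations are made up for illustration.]

```python
import asyncio

async def cleanup(record):
    # Simulated slow cleanup work (duration is arbitrary).
    await asyncio.sleep(0.05)
    record.append("cleaned up")

async def main():
    record = []
    task = asyncio.create_task(cleanup(record))
    try:
        # The outer timeout fires first, but shield() means only the
        # *wait* is cancelled -- the cleanup task itself keeps running
        # in the background ("detached"), which trio deliberately forbids.
        await asyncio.wait_for(asyncio.shield(task), timeout=0.01)
    except asyncio.TimeoutError:
        pass
    await task  # the detached cleanup still completes
    return record

print(asyncio.run(main()))  # ['cleaned up']
```

Whether this background completion is a feature or a resource leak waiting to happen is precisely the design disagreement in this thread.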
Re: [Async-sig] Blog post: Timeouts and cancellation for humans
Hi,

On Thu, 11 Jan 2018 02:09:29 -0800 Nathaniel Smith wrote:
> Hi all,
>
> Folks here might be interested in this new blog post:
>
> https://vorpus.org/blog/timeouts-and-cancellation-for-humans/
>
> It's a detailed discussion of pitfalls and design-tradeoffs in APIs
> for timeout and cancellation, and has a proposal for handling them in
> a more Pythonic way. Any feedback welcome!

I have little constructive feedback to share, other than that it is a very insightful write-up and the API proposal there is quite interesting.

cheers,

Antoine.
Re: [Async-sig] Blog post: Timeouts and cancellation for humans
Quick preface: there are definitely times when code "smell" really isn't -- nothing's perfect! -- and sometimes some system component is unavoidably inelegant. I think this is oftentimes (but not always) the result of scoping: clearly I couldn't decide, as a library author, that "it's all just broken" and rip out everything from the OS to TCP to language syntax and semantics just to make my API prettier. So I pragmatically downscope the problem space, and that forces me to make design decisions to accommodate the rest of the universe. And that's okay!

With that being said, I'm still not convinced that the double-timeout shutdown isn't an indication of upstream code smell. From a practical standpoint, for the purposes of this discussion it really doesn't matter; Trio et al. can't go mucking about in the TCP stack internals, so we do the best we can. But I'm willing to entertain the possibility (actually I think it's highly likely) that there are better solutions to the aforementioned problems than the ones used by (for example) TCP and TLS. That rabbit hole goes very, very deep, though, so to circle back, what I'm trying to say is this:

- I share the inclination that shielding against cancellation (or any equivalent workaround) is likely code smell
- However, I personally suspect the source of that smell is upstream, in the network protocols themselves
- Given that, I think some amount of smell in downstream libraries like Trio is unavoidable

To that end, I really like Trio's existing approach. Shielding should definitely be used sparingly, but I think it's a justifiable, pragmatic compromise when it comes to dealing with not-quite-perfect protocols on even-less-perfect networks. And I think the connection close semantics Trio provides for these situations -- attempt to close gracefully, but if cancelled, still close unilaterally to free local resources -- is an excellent approach.
But it also "lucks out" a bit, because freeing local resources is many orders of magnitude faster than the enclosing timeout is likely to be, so it's effectively a "free" operation. The relative timescales are a critical observation; if freeing local resources took one second out of a ten-second timeout, I think you'd be stuck asking the same question there, too.

Nick Badger
https://www.nickbadger.com

2018-01-14 20:52 GMT-08:00 Nathaniel Smith:
> On Sun, Jan 14, 2018 at 2:45 PM, Nick Badger wrote:
> >> However, I think this is probably a code smell. Like all code smells,
> >> there are probably cases where it's the right thing to do, but when
> >> you see it you should stop and think carefully.
> >
> > Huh. That's a really good point. But I'm not sure the source of the smell is
> > the code that needs the shield logic -- I think this might instead be
> > indicative of upstream code smell. Put a bit more concretely: if you're
> > writing a protocol for an unreliable network (and of course, every network
> > is unreliable), requiring a closure operation to transmit something over
> > that network is inherently problematic, because it inevitably leads to
> > multiple-stage timeouts or ungraceful shutdowns.
>
> I wouldn't go that far -- there are actually good reasons to design
> protocols like this.
>
> SSL/TLS is a protocol that has a "goodbye" message (they call it
> "close-notify"). According to the spec [1], sending this is mandatory
> if you want to cleanly shut down an SSL/TLS connection. Why? Well, say
> I send you a message, "Should I buy more bitcoin?" and your reply is
> "Yes, but only if the price drops below $XX". Unbeknownst to us, we're
> being MITMed. Fortunately, we used SSL/TLS, so the MITM can't alter
> what we're saying. But they can manipulate the network; for example,
> they could cause our connection to drop after the first 3 bytes of
> your message, so your answer gets truncated and I think you just said
> "Yes" -- which is very different!
>
> But, close-notify saves us -- or at
> least contains the damage. Since I know that you're supposed to send a
> close-notify at the end of your connection, and I didn't get one, I
> can tell that this is a truncated message. I can't tell what the rest
> was going to be, but at least I know the message I got isn't the
> message you intended to send. And an attacker can't forge a
> close-notify message, because they're cryptographically authenticated
> like all the data we send.
>
> In websockets, the goodbye handshake is used to work around a nasty
> case that can happen with common TCP stacks (like, all of them):
>
> 1. A sends a message to B.
> 2. A is done after that, so it closes the connection.
> 3. Just then, B sends a message to A, like maybe a regular ping on some timer.
> 4. A's TCP stack receives data on a closed connection, goes "huh
> wut?", and sends a RST packet.
> 5. B goes to read the last message A sent before they closed the
> connection... but whoops it's gone! the RST packet caused both TCP
> stacks to wipe out all their buffered data associated with this connection.
Re: [Async-sig] Blog post: Timeouts and cancellation for humans
On Sun, Jan 14, 2018 at 2:45 PM, Nick Badger wrote:
>> However, I think this is probably a code smell. Like all code smells,
>> there are probably cases where it's the right thing to do, but when
>> you see it you should stop and think carefully.
>
> Huh. That's a really good point. But I'm not sure the source of the smell is
> the code that needs the shield logic -- I think this might instead be
> indicative of upstream code smell. Put a bit more concretely: if you're
> writing a protocol for an unreliable network (and of course, every network
> is unreliable), requiring a closure operation to transmit something over
> that network is inherently problematic, because it inevitably leads to
> multiple-stage timeouts or ungraceful shutdowns.

I wouldn't go that far -- there are actually good reasons to design protocols like this.

SSL/TLS is a protocol that has a "goodbye" message (they call it "close-notify"). According to the spec [1], sending this is mandatory if you want to cleanly shut down an SSL/TLS connection. Why? Well, say I send you a message, "Should I buy more bitcoin?" and your reply is "Yes, but only if the price drops below $XX". Unbeknownst to us, we're being MITMed. Fortunately, we used SSL/TLS, so the MITM can't alter what we're saying. But they can manipulate the network; for example, they could cause our connection to drop after the first 3 bytes of your message, so your answer gets truncated and I think you just said "Yes" -- which is very different!

But, close-notify saves us -- or at least contains the damage. Since I know that you're supposed to send a close-notify at the end of your connection, and I didn't get one, I can tell that this is a truncated message. I can't tell what the rest was going to be, but at least I know the message I got isn't the message you intended to send. And an attacker can't forge a close-notify message, because they're cryptographically authenticated like all the data we send.
In websockets, the goodbye handshake is used to work around a nasty case that can happen with common TCP stacks (like, all of them):

1. A sends a message to B.
2. A is done after that, so it closes the connection.
3. Just then, B sends a message to A, like maybe a regular ping on some timer.
4. A's TCP stack receives data on a closed connection, goes "huh wut?", and sends a RST packet.
5. B goes to read the last message A sent before they closed the connection... but whoops, it's gone! The RST packet caused both TCP stacks to wipe out all their buffered data associated with this connection.

So if you have a protocol that's used for streaming indefinite amounts of data in both directions and supports stuff like pings, you kind of have to have a goodbye handshake to avoid TCP stacks accidentally corrupting your data. (The goodbye handshake can also help make sure that clients end up carrying CLOSE-WAIT states instead of servers, but that's a finicky and less important issue.)

Of course, it is absolutely true that networks are unreliable, so when your protocol specifies a goodbye handshake like this then implementations still need to have some way to cope if their peer closes the connection unexpectedly, and they may need to unilaterally close the connection in some circumstances no matter what the spec says. Correctly handling every possible case here quickly becomes, like, infinitely complicated. But nonetheless, as a library author one has to try to provide some reasonable behavior by default (while knowing that some users will end up needing to tweak things to handle special circumstances).
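[Editor's note: the five-step RST scenario above can be modeled with a toy class -- emphatically not a real TCP stack, just an illustration of how an RST wipes buffered-but-unread data, which is what the goodbye handshake prevents.]

```python
class ToyTCPStack:
    """Toy model of one endpoint's receive side (illustration only)."""
    def __init__(self):
        self.recv_buffer = []
        self.closed = False

    def deliver(self, data):
        if self.closed:
            # Data arriving on a closed connection: the stack answers
            # with an RST instead of buffering it.
            return "RST"
        self.recv_buffer.append(data)
        return "ACK"

    def on_rst(self):
        # Receiving an RST wipes all buffered, not-yet-read data.
        self.recv_buffer.clear()

a, b = ToyTCPStack(), ToyTCPStack()
b.deliver("A's last message")   # step 1: sitting unread in B's buffer
a.closed = True                 # step 2: A closes
reply = a.deliver("ping")       # steps 3-4: B's ping hits A's closed socket
if reply == "RST":
    b.on_rst()                  # step 5: B's buffered data is wiped
print(b.recv_buffer)            # []: the last message is lost
```

With a goodbye handshake, A would not close until B had acknowledged the shutdown, so step 3 never races with step 2.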
My tentative approach so far in Trio is (a) make cancellation stateful like discussed in the blog post, because accidentally hanging forever just can't be a good default, and (b) in the "trio.abc.AsyncResource" interface that complex objects like trio.SSLStream implement (and we recommend libraries implement too), the semantics for the aclose and __aexit__ methods are that they're allowed to block forever trying to do a graceful shutdown, but if cancelled then they have to return promptly *but still freeing any underlying resources*, possibly in a non-graceful way. So if you write straightforward code like:

    with trio.move_on_after(10):
        async with open_websocket_connection(...):
            ...

then it tries to do a proper websocket goodbye handshake by default, but if the timeout expires then it gives up and immediately closes the socket. It's not perfect, but it seems like a better default than anything else I can think of.

-n

[1] There's also this whole mess where many SSL/TLS implementations ignore the spec and don't bother sending close-notify. This is *kinda* justifiable because the original and most popular use for SSL/TLS is for wrapping HTTP connections, and HTTP has its own ways of signaling the end of the connection that are already transmitted through the encrypted tunnel, so the SSL/TLS end-of-connection handshake is redundant. Therefore lots of
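[Editor's note: the aclose contract described above -- block as long as needed for a graceful shutdown, but if cancelled, return promptly while still freeing the underlying resource -- can be sketched with stdlib asyncio's `wait_for` standing in for a trio cancel scope. `MockConnection` and its `goodbye_delay` are hypothetical, purely for illustration.]

```python
import asyncio

class MockConnection:
    """Hypothetical resource following an aclose-style contract."""
    def __init__(self, goodbye_delay):
        self.goodbye_delay = goodbye_delay
        self.closed = False

    async def aclose(self):
        try:
            # Graceful path: may block arbitrarily long on the
            # goodbye handshake.
            await asyncio.sleep(self.goodbye_delay)
        finally:
            # Cancelled or not, the underlying resource is always freed.
            self.closed = True

async def main():
    conn = MockConnection(goodbye_delay=60)
    try:
        # The enclosing timeout cancels the graceful close early...
        await asyncio.wait_for(conn.aclose(), timeout=0.01)
    except asyncio.TimeoutError:
        pass
    # ...but the resource was still freed, promptly and non-gracefully.
    return conn.closed

print(asyncio.run(main()))  # True
```

The key point is the `finally` block: cancellation interrupts the graceful handshake but can never skip the resource release.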
Re: [Async-sig] Blog post: Timeouts and cancellation for humans
On Sun, Jan 14, 2018 at 5:11 AM, Chris Jerdonek wrote:
> On Sun, Jan 14, 2018 at 3:33 AM, Nathaniel Smith wrote:
>> On Fri, Jan 12, 2018 at 4:17 AM, Chris Jerdonek wrote:
>>> Say you have a complex operation that you want to be able to timeout
>>> or cancel, but the process of cleanup / cancelling might also require
>>> a certain amount of time that you'd want to allow time for (likely a
>>> smaller time in normal circumstances). Then it seems like you'd want
>>> to be able to allocate a separate timeout for the clean-up portion
>>> (independent of the timeout allotted for the original operation).
>>> ...
>>
>> You can get these semantics using the "shielding" feature, which the
>> post discusses a bit later:
>> ...
>> However, I think this is probably a code smell.
>
> I agree with this assessment. My sense was that shielding could
> probably do it, but it seems like it could be brittle or more of a
> kludge. It would be nice if the same primitive could be used to
> accommodate this and other variations in addition to the normal case.
> For example, a related variation might be if you wanted to let
> yourself extend the timeout in response to certain actions or results.
>
> The main idea that occurs to me is letting the cancel scope be
> dynamic: the timeout could be allowed to change in response to certain
> things. Something like that seems like it has the potential to be both
> simple as well as general enough to accommodate lots of different
> scenarios, including adjusting the timeout in response to entering a
> clean-up phase. One good test would be whether shielding could be
> implemented using such a primitive.

Ah, if you want to change the timeout on a specific cancel scope, that's easy:

    async def do_something():
        with move_on_after(10) as cscope:
            ...
            # Actually, let's give ourselves a bit more time
            cscope.deadline += 10
            ...

If you have a reference to a Trio cancel scope, you can change its timeout at any time. However, this is different from shielding.
The code above only changes the deadline for that particular cancel scope. If the caller sets their own timeout:

    with move_on_after(15):
        await do_something()

then the code will still get cancelled after 15 seconds when the outer cancel scope's deadline expires, even though the inner scope ended up with a 20 second timeout.

Shielding is about disabling outer cancel scopes -- the ones you don't know about! -- in a particular bit of code. (If you compare to C#'s cancellation sources or Golang's context-based cancellation, it's like writing a function that intentionally chooses not to pass the cancel token it was given on to some function it calls.)

-n

--
Nathaniel J. Smith -- https://vorpus.org
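[Editor's note: trio's actual implementation is more involved, but the deadline/shield interaction described above can be modeled in a few lines. This is a toy model, not trio's API: the effective deadline is the minimum over all enclosing scopes, except that a shielded scope hides every scope outside it.]

```python
class CancelScope:
    """Toy model of a trio-style cancel scope (illustration only)."""
    def __init__(self, deadline):
        self.deadline = deadline
        self.shield = False

def effective_deadline(scope_stack):
    # Walk scopes from outermost to innermost; take the minimum deadline,
    # but a shielded scope discards everything outside it.
    deadline = float("inf")
    for scope in scope_stack:
        if scope.shield:
            deadline = float("inf")
        deadline = min(deadline, scope.deadline)
    return deadline

outer = CancelScope(15)      # the caller's move_on_after(15)
inner = CancelScope(10)      # our own move_on_after(10)
inner.deadline += 10         # extend our own deadline to 20...
print(effective_deadline([outer, inner]))  # 15: the outer scope still wins
inner.shield = True
print(effective_deadline([outer, inner]))  # 20: shielding hides the outer scope
```

This makes concrete why extending your own deadline cannot defeat a caller's timeout, while shielding can.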
Re: [Async-sig] Blog post: Timeouts and cancellation for humans
I suppose the websocket case ought to follow conventions similar to the kernel TCP API, where `close` returns immediately but continues to send packets behind the scenes. It could look something like this:

    with move_on_after(10):
        await get_ws_message(url)

    async def get_ws_message(url):
        async def close():
            if sock and sock.is_connected and ...:
                await sock.send(build_close_packet())
                await sock.recv()  # or something
            if sock:
                sock.close()

        sock = socket.socket()
        try:
            await sock.connect(url)
            data = await sock.recv(...)
            return decode(data)
        finally:
            with move_on_after(30):
                someio.spawn_task(close())

I believe the concern is more general than supporting "broken" protocols like websocket. When someone writes `with move_on_after(N): a = await foo()` it can be understood in two ways:

* perform foo for N seconds or else, or
* I want the result in N seconds or else

The latter doesn't imply that foo should be interrupted, only that the caller wishes to proceed without the result. It makes sense if the action involves an unrelated, long-running process, where `foo()` is something like `anext(some_async_generator)`.

Both solve the original concern, that the caller should not block for more than N. I suppose one can be implemented in terms of the other. Perhaps the latter is what `shield` should do? That is, detach computation as opposed to blocking the caller past the caller's deadline?

What do you all think?

On Mon, 15 Jan 2018 at 6:45 AM, Nick Badger wrote:
>> However, I think this is probably a code smell. Like all code smells,
>> there are probably cases where it's the right thing to do, but when
>> you see it you should stop and think carefully.
>
> Huh. That's a really good point. But I'm not sure the source of the smell
> is the code that needs the shield logic -- I think this might instead be
> indicative of upstream code smell.
> Put a bit more concretely: if you're
> writing a protocol for an unreliable network (and of course, every network
> is unreliable), requiring a closure operation to transmit something over
> that network is inherently problematic, because it inevitably leads to
> multiple-stage timeouts or ungraceful shutdowns.
>
> Clearly, changing anything upstream is out of scope here. So if the smell
> is, in fact, "upwind", there's not really much you could do about that in
> asyncio, Curio, Trio, etc, other than minimize the additional smell you
> need to accommodate smelly protocols. Unfortunately, I'm not sure there's
> any one approach to that problem that isn't application-specific.
>
> Nick Badger
> https://www.nickbadger.com
>
> 2018-01-14 3:33 GMT-08:00 Nathaniel Smith:
>> On Fri, Jan 12, 2018 at 4:17 AM, Chris Jerdonek wrote:
>> > Thanks, Nathaniel. Very instructive, thought-provoking write-up!
>> >
>> > One thing occurred to me around the time of reading this passage:
>> >
>> >> "Once the cancel token is triggered, then all future operations on
>> >> that token are cancelled, so the call to ws.close doesn't get stuck. It's a
>> >> less error-prone paradigm. ... If you follow the path we did in this blog
>> >> post, and start by thinking about applying a timeout to a complex operation
>> >> composed out of multiple blocking calls, then it's obvious that if the
>> >> first call uses up the whole timeout budget, then any future calls should
>> >> fail immediately."
>> >
>> > One case that's not clear how should be addressed is the following.
>> > It's something I've wrestled with in the context of asyncio, and it
>> > doesn't seem to be raised as a possibility in your write-up.
>> >
>> > Say you have a complex operation that you want to be able to timeout
>> > or cancel, but the process of cleanup / cancelling might also require
>> > a certain amount of time that you'd want to allow time for (likely a
>> > smaller time in normal circumstances). Then it seems like you'd want
>> > to be able to allocate a separate timeout for the clean-up portion
>> > (independent of the timeout allotted for the original operation).
>> >
>> > It's not clear to me how this case would best be handled with the
>> > primitives you described. In your text above ("then any future calls
>> > should fail immediately"), without any changes, it seems there
>> > wouldn't be "time" for any clean-up to complete.
>> >
>> > With asyncio, one way to handle this is to await on a task with a
>> > smaller timeout after calling task.cancel(). That lets you assign a
>> > different timeout to waiting for cancellation to complete.
>>
>> You can get these semantics using the "shielding" feature, which the
>> post discusses a bit later:
>>
>>     try:
>>         await do_some_stuff()
>>     finally:
>>         # Always give this 30 seconds to clean up, even if we've
>>         # been cancelled
>>         with trio.move_on_after(30) as cscope:
>>             cscope.shield = True
>>             await do_cleanup()
Re: [Async-sig] Blog post: Timeouts and cancellation for humans
> However, I think this is probably a code smell. Like all code smells,
> there are probably cases where it's the right thing to do, but when
> you see it you should stop and think carefully.

Huh. That's a really good point. But I'm not sure the source of the smell is the code that needs the shield logic -- I think this might instead be indicative of upstream code smell. Put a bit more concretely: if you're writing a protocol for an unreliable network (and of course, every network is unreliable), requiring a closure operation to transmit something over that network is inherently problematic, because it inevitably leads to multiple-stage timeouts or ungraceful shutdowns.

Clearly, changing anything upstream is out of scope here. So if the smell is, in fact, "upwind", there's not really much you could do about that in asyncio, Curio, Trio, etc, other than minimize the additional smell you need to accommodate smelly protocols. Unfortunately, I'm not sure there's any one approach to that problem that isn't application-specific.

Nick Badger
https://www.nickbadger.com

2018-01-14 3:33 GMT-08:00 Nathaniel Smith:
> On Fri, Jan 12, 2018 at 4:17 AM, Chris Jerdonek wrote:
> > Thanks, Nathaniel. Very instructive, thought-provoking write-up!
> >
> > One thing occurred to me around the time of reading this passage:
> >
> >> "Once the cancel token is triggered, then all future operations on that
> >> token are cancelled, so the call to ws.close doesn't get stuck. It's a less
> >> error-prone paradigm. ... If you follow the path we did in this blog post,
> >> and start by thinking about applying a timeout to a complex operation
> >> composed out of multiple blocking calls, then it's obvious that if the
> >> first call uses up the whole timeout budget, then any future calls should
> >> fail immediately."
> >
> > One case that's not clear how should be addressed is the following.
> > It's something I've wrestled with in the context of asyncio, and it
> > doesn't seem to be raised as a possibility in your write-up.
> >
> > Say you have a complex operation that you want to be able to timeout
> > or cancel, but the process of cleanup / cancelling might also require
> > a certain amount of time that you'd want to allow time for (likely a
> > smaller time in normal circumstances). Then it seems like you'd want
> > to be able to allocate a separate timeout for the clean-up portion
> > (independent of the timeout allotted for the original operation).
> >
> > It's not clear to me how this case would best be handled with the
> > primitives you described. In your text above ("then any future calls
> > should fail immediately"), without any changes, it seems there
> > wouldn't be "time" for any clean-up to complete.
> >
> > With asyncio, one way to handle this is to await on a task with a
> > smaller timeout after calling task.cancel(). That lets you assign a
> > different timeout to waiting for cancellation to complete.
>
> You can get these semantics using the "shielding" feature, which the
> post discusses a bit later:
>
>     try:
>         await do_some_stuff()
>     finally:
>         # Always give this 30 seconds to clean up, even if we've
>         # been cancelled
>         with trio.move_on_after(30) as cscope:
>             cscope.shield = True
>             await do_cleanup()
>
> Here the inner scope "hides" the code inside it from any external
> cancel scopes, so it can continue executing even if the overall
> context has been cancelled.
>
> However, I think this is probably a code smell. Like all code smells,
> there are probably cases where it's the right thing to do, but when
> you see it you should stop and think carefully. If you're writing code
> like this, then it means that there are multiple different layers in
> your code that are implementing timeout policies, that might end up
> fighting with each other. What if the caller really needs this to
> finish in 15 seconds? So if you have some way to move the timeout
> handling into the same layer, then I suspect that will make your
> program easier to understand and maintain. OTOH, if you decide you
> want it, the code above works :-). I'm not 100% sure here; I'd
> definitely be interested to hear about more use cases.
>
> One thing I've thought about that might help is adding a kind of "soft
> cancelled" state to the cancel scopes, inspired by the "graceful
> shutdown" mode that you'll often see in servers where you stop
> accepting new connections, then try to finish up old ones (with some
> time limit). So in this case you might mark 'do_some_stuff()' as being
> cancelled immediately when we entered the 'soft cancel' phase, but let
> the 'do_cleanup' code keep running until the grace period expired and
> the region was hard-cancelled. This idea isn't fully baked yet though.
> (There's some more mumbling about this at
> https://github.com/python-trio/trio/issues/147.)
>
> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org
Re: [Async-sig] Blog post: Timeouts and cancellation for humans
On Sun, Jan 14, 2018 at 3:33 AM, Nathaniel Smith wrote:
> On Fri, Jan 12, 2018 at 4:17 AM, Chris Jerdonek wrote:
>> Say you have a complex operation that you want to be able to timeout
>> or cancel, but the process of cleanup / cancelling might also require
>> a certain amount of time that you'd want to allow time for (likely a
>> smaller time in normal circumstances). Then it seems like you'd want
>> to be able to allocate a separate timeout for the clean-up portion
>> (independent of the timeout allotted for the original operation).
>> ...
>
> You can get these semantics using the "shielding" feature, which the
> post discusses a bit later:
> ...
> However, I think this is probably a code smell.

I agree with this assessment. My sense was that shielding could probably do it, but it seems like it could be brittle or more of a kludge. It would be nice if the same primitive could be used to accommodate this and other variations in addition to the normal case. For example, a related variation might be if you wanted to let yourself extend the timeout in response to certain actions or results.

The main idea that occurs to me is letting the cancel scope be dynamic: the timeout could be allowed to change in response to certain things. Something like that seems like it has the potential to be both simple as well as general enough to accommodate lots of different scenarios, including adjusting the timeout in response to entering a clean-up phase. One good test would be whether shielding could be implemented using such a primitive.

--Chris

> Like all code smells,
> there are probably cases where it's the right thing to do, but when
> you see it you should stop and think carefully. If you're writing code
> like this, then it means that there are multiple different layers in
> your code that are implementing timeout policies, that might end up
> fighting with each other. What if the caller really needs this to
> finish in 15 seconds?
> So if you have some way to move the timeout
> handling into the same layer, then I suspect that will make your
> program easier to understand and maintain. OTOH, if you decide you
> want it, the code above works :-). I'm not 100% sure here; I'd
> definitely be interested to hear about more use cases.
>
> One thing I've thought about that might help is adding a kind of "soft
> cancelled" state to the cancel scopes, inspired by the "graceful
> shutdown" mode that you'll often see in servers where you stop
> accepting new connections, then try to finish up old ones (with some
> time limit). So in this case you might mark 'do_some_stuff()' as being
> cancelled immediately when we entered the 'soft cancel' phase, but let
> the 'do_cleanup' code keep running until the grace period expired and
> the region was hard-cancelled. This idea isn't fully baked yet though.
> (There's some more mumbling about this at
> https://github.com/python-trio/trio/issues/147.)
>
> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org
Re: [Async-sig] Blog post: Timeouts and cancellation for humans
On Fri, Jan 12, 2018 at 4:17 AM, Chris Jerdonek wrote:
> Thanks, Nathaniel. Very instructive, thought-provoking write-up!
>
> One thing occurred to me around the time of reading this passage:
>
>> "Once the cancel token is triggered, then all future operations on that
>> token are cancelled, so the call to ws.close doesn't get stuck. It's a less
>> error-prone paradigm. ... If you follow the path we did in this blog post,
>> and start by thinking about applying a timeout to a complex operation
>> composed out of multiple blocking calls, then it's obvious that if the first
>> call uses up the whole timeout budget, then any future calls should fail
>> immediately."
>
> One case that's not clear how should be addressed is the following.
> It's something I've wrestled with in the context of asyncio, and it
> doesn't seem to be raised as a possibility in your write-up.
>
> Say you have a complex operation that you want to be able to timeout
> or cancel, but the process of cleanup / cancelling might also require
> a certain amount of time that you'd want to allow time for (likely a
> smaller time in normal circumstances). Then it seems like you'd want
> to be able to allocate a separate timeout for the clean-up portion
> (independent of the timeout allotted for the original operation).
>
> It's not clear to me how this case would best be handled with the
> primitives you described. In your text above ("then any future calls
> should fail immediately"), without any changes, it seems there
> wouldn't be "time" for any clean-up to complete.
>
> With asyncio, one way to handle this is to await on a task with a
> smaller timeout after calling task.cancel(). That lets you assign a
> different timeout to waiting for cancellation to complete.
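[Editor's note: the asyncio pattern Chris describes -- cancel the task, then await it under a separate, smaller timeout so the clean-up gets its own budget -- can be sketched like this. The sleep durations are arbitrary stand-ins for real work and real clean-up.]

```python
import asyncio

async def operation():
    try:
        await asyncio.sleep(60)      # the main work
    except asyncio.CancelledError:
        # The clean-up phase needs real time of its own.
        await asyncio.sleep(0.02)
        raise

async def main():
    task = asyncio.create_task(operation())
    await asyncio.sleep(0.01)        # let the work get started
    task.cancel()
    # Wait for cancellation (i.e. the clean-up) with its own,
    # smaller timeout, independent of the original operation's budget.
    await asyncio.wait({task}, timeout=1.0)
    return task.cancelled()

print(asyncio.run(main()))  # True: clean-up ran, then cancellation completed
```

If the clean-up overran the second timeout, `asyncio.wait` would return with the task still pending, and the caller would have to decide whether to abandon it.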
You can get these semantics using the "shielding" feature, which the post discusses a bit later:

    try:
        await do_some_stuff()
    finally:
        # Always give this 30 seconds to clean up, even if we've
        # been cancelled
        with trio.move_on_after(30) as cscope:
            cscope.shield = True
            await do_cleanup()

Here the inner scope "hides" the code inside it from any external cancel scopes, so it can continue executing even if the overall context has been cancelled.

However, I think this is probably a code smell. Like all code smells, there are probably cases where it's the right thing to do, but when you see it you should stop and think carefully. If you're writing code like this, then it means that there are multiple different layers in your code that are implementing timeout policies, that might end up fighting with each other. What if the caller really needs this to finish in 15 seconds? So if you have some way to move the timeout handling into the same layer, then I suspect that will make your program easier to understand and maintain. OTOH, if you decide you want it, the code above works :-). I'm not 100% sure here; I'd definitely be interested to hear about more use cases.

One thing I've thought about that might help is adding a kind of "soft cancelled" state to the cancel scopes, inspired by the "graceful shutdown" mode that you'll often see in servers where you stop accepting new connections, then try to finish up old ones (with some time limit). So in this case you might mark 'do_some_stuff()' as being cancelled immediately when we entered the 'soft cancel' phase, but let the 'do_cleanup' code keep running until the grace period expired and the region was hard-cancelled. This idea isn't fully baked yet though. (There's some more mumbling about this at https://github.com/python-trio/trio/issues/147.)

-n

--
Nathaniel J.
Smith -- https://vorpus.org
Re: [Async-sig] Blog post: Timeouts and cancellation for humans
On Thu, Jan 11, 2018 at 7:49 PM, Dima Tisnek wrote:
> Very nice read, Nathaniel.
>
> The post left me wondering how cancel tokens interact or should
> logically interact with async composition, for example:
>
>     with move_on_after(10):
>         await someio.gather(a(), b(), c())
>
> or
>
>     with move_on_after(10):
>         await someio.first/race(a(), b(), c())
>
> or
>
>     dataset = someio.Future(large_download(), move_on_after=)
>
>     task a:
>         with move_on_after(10):
>             use((await dataset)["a"])
>
>     task b:
>         with move_on_after(10):
>             use((await dataset)["b"])

It's funny you say "async composition"... Trio's concurrency primitive (nurseries) is closely related to the core concurrency primitive in Communicating Sequential Processes, which they call "parallel composition". (Basically, if P and Q are processes, then "P || Q" is the process that runs both P and Q in parallel and then finishes when they've both finished.) If you were using that as your primitive, then tasks would form an orderly tree and this wouldn't be a problem :-).

Given asyncio's actual primitives, though, then yeah, this is clearly the big question, and I doubt there are any simple answers; so far my ambition has just been to articulate the problem well enough to start that conversation (see also the "asyncio" section in the blog post). One possibility might be a hybrid cancel token / cancel scope API: create a first-class cancel token API like C# has, make the low-level asyncio APIs use them, and then on top of that add mechanisms to attach a stack of implicitly-applied cancel tokens to each task? That's just a vague handwave of an idea so far though.

Note that the last case is the one where asyncio cancellation semantics are already... well, surprising, anyway. If you cancel task a then task b will receive a CancelledError, even though task b was not cancelled. (I talked about this a bit in my "Some thoughts ..." blog post; search for "spooky-cancellation-at-a-distance.py".)

-n

--
Nathaniel J.
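[Editor's note: the first ingredient of that hybrid idea -- a first-class, stateful cancel token like C# has -- can be sketched in a few lines of plain Python. This is only an illustration of the "once triggered, all future operations fail promptly" behavior, not a proposed or real asyncio/trio API.]

```python
class CancelToken:
    """Minimal sketch of a C#-style stateful cancel token (illustration only)."""
    def __init__(self):
        self._cancelled = False

    def cancel(self):
        self._cancelled = True

    def raise_if_cancelled(self):
        # Stateful: once the token is triggered, every future operation
        # checked against it fails promptly instead of hanging, so e.g.
        # a ws.close() after a timeout can't get stuck.
        if self._cancelled:
            raise RuntimeError("operation cancelled")

token = CancelToken()
token.raise_if_cancelled()       # fine: not yet triggered
token.cancel()
try:
    token.raise_if_cancelled()   # now fails immediately, and always will
except RuntimeError:
    print("cancelled")           # prints "cancelled"
```

The "cancel scope" half of the hybrid would then be a per-task stack of such tokens applied implicitly, which is the part that remains a handwave above.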
Smith -- https://vorpus.org ___ Async-sig mailing list Async-sig@python.org https://mail.python.org/mailman/listinfo/async-sig Code of Conduct: https://www.python.org/psf/codeofconduct/
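[Editor's note: the "spooky cancellation at a distance" behavior described above can be demonstrated with plain asyncio. The sketch below is illustrative only -- the `demo`/`waiter` names and the shared `dataset` future are made up, standing in for the `large_download()` example in the quoted message. Cancelling only task a also cancels the shared future both tasks are blocked on, so task b sees a CancelledError too.]

```python
import asyncio

async def demo():
    loop = asyncio.get_running_loop()
    dataset = loop.create_future()  # stands in for the shared large_download() result
    hit = []

    async def waiter(name):
        try:
            await dataset
        except asyncio.CancelledError:
            hit.append(name)  # record which tasks saw a CancelledError

    task_a = loop.create_task(waiter("a"))
    task_b = loop.create_task(waiter("b"))
    await asyncio.sleep(0)  # let both tasks start blocking on the shared future
    task_a.cancel()         # cancel *only* task a...
    await asyncio.gather(task_a, task_b, return_exceptions=True)
    return hit

hit = asyncio.run(demo())
print(sorted(hit))  # ['a', 'b'] -- task b got a CancelledError too
```

This happens because Task.cancel() on a task that is blocked on a future cancels that future, and here the future is shared between the two tasks.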
Re: [Async-sig] Blog post: Timeouts and cancellation for humans
Thanks, Nathaniel. Very instructive, thought-provoking write-up! One thing occurred to me around the time of reading this passage:

> "Once the cancel token is triggered, then all future operations on that token
> are cancelled, so the call to ws.close doesn't get stuck. It's a less
> error-prone paradigm. ... If you follow the path we did in this blog post,
> and start by thinking about applying a timeout to a complex operation
> composed out of multiple blocking calls, then it's obvious that if the first
> call uses up the whole timeout budget, then any future calls should fail
> immediately."

One case where it's not clear how it should be addressed is the following. It's something I've wrestled with in the context of asyncio, and it doesn't seem to be raised as a possibility in your write-up.

Say you have a complex operation that you want to be able to time out or cancel, but the process of cleanup / cancelling might itself require a certain amount of time that you'd want to allow for (likely a smaller amount of time in normal circumstances). Then it seems like you'd want to be able to allocate a separate timeout for the clean-up portion, independent of the timeout allotted for the original operation.

It's not clear to me how this case would best be handled with the primitives you described. In your text above ("then any future calls should fail immediately"), without any changes, it seems there wouldn't be "time" for any clean-up to complete. With asyncio, one way to handle this is to await the task with a smaller timeout after calling task.cancel(). That lets you assign a different timeout to waiting for cancellation to complete.
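[Editor's note: the cancel-then-wait pattern described in the last paragraph can be sketched with plain asyncio. The operation and its cleanup below are stand-ins, and the durations are shortened so the sketch runs quickly: cancel the task, then wait on it with a second, smaller timeout so the cleanup gets its own bounded budget.]

```python
import asyncio

async def complex_operation():
    try:
        await asyncio.sleep(30)    # the main work (cancelled below)
    except asyncio.CancelledError:
        await asyncio.sleep(0.01)  # stand-in for graceful cleanup work
        raise                      # re-raise so the task finishes as cancelled

async def main():
    task = asyncio.ensure_future(complex_operation())
    await asyncio.sleep(0.01)  # let the operation get started
    task.cancel()
    # Give the cleanup its own, smaller timeout, independent of
    # whatever deadline applied to the original operation.
    done, pending = await asyncio.wait({task}, timeout=1.0)
    return task in done and task.cancelled()

result = asyncio.run(main())
print(result)  # True: the cleanup ran, then cancellation completed in time
```

If the cleanup overran its one-second budget, the task would land in `pending` instead, and the caller could then decide to abandon it or close resources unilaterally.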
--Chris

On Thu, Jan 11, 2018 at 2:09 AM, Nathaniel Smith wrote:
> Hi all,
>
> Folks here might be interested in this new blog post:
>
> https://vorpus.org/blog/timeouts-and-cancellation-for-humans/
>
> It's a detailed discussion of pitfalls and design-tradeoffs in APIs
> for timeout and cancellation, and has a proposal for handling them in
> a more Pythonic way. Any feedback welcome!
>
> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org
[Async-sig] Blog post: Timeouts and cancellation for humans
Hi all,

Folks here might be interested in this new blog post:

https://vorpus.org/blog/timeouts-and-cancellation-for-humans/

It's a detailed discussion of pitfalls and design-tradeoffs in APIs for timeout and cancellation, and has a proposal for handling them in a more Pythonic way. Any feedback welcome!

-n

--
Nathaniel J. Smith -- https://vorpus.org