Re: [swift-evolution] Pitch: [stdlib] Error recovery hook, PR #12025

Joe Groff via swift-evolution Thu, 21 Sep 2017 10:23:10 -0700


> On Sep 21, 2017, at 12:14 AM, John Holdsworth via swift-evolution 
> <[email protected]> wrote:
> 
> Hi S/E,
> 
> I’ve raised a rather speculative PR suggesting a hook be added to stdlib
> that would allow users to experiment with error recovery in their apps.
> I’ve been asked to put it forward on Swift Evolution to gather opinions
> from the wider community about such a design.
> 
> https://github.com/apple/swift/pull/12025
> 
> Ultimately, it comes down to being able to do something like this:
> 
>             do {
>                 try Fortify.exec {
>                     var a: String!
>                     a = a!
>                 }
>             }
>             catch {
>                 NSLog("Caught exception: \(error)")
>             }
> 
> This was primarily intended for use in "Swift on the server" but could also
> help avoid crashes in the mobile domain. The rationale and mechanics
> are written up at the following url:
> 
> http://johnholdsworth.com/fortify.html
> 
> I'll accept this won’t be everybody’s cup of tea, but at this stage this is
> only an opt-in facilitating patch. Developers need not subject their apps
> to this approach, which requires a separate experimental implementation.
> 
> The recovery is reasonably well behaved, except that it will not recover
> objects and system resources used in intermediate frames. It’s not
> as random as something like, for example, trying to cancel a thread.
> 
> The debate about whether apps should try to soldier on when something
> is clearly amiss is a stylistic one about which there will be a spectrum of
> opinions. The arguments weigh more in favour in the server domain.


Thanks for raising this topic! Graceful partial recovery is important and 
useful, and although we've tended to invoke "actors" as the vague savior that 
will answer all the questions in this space, I think we can provide useful 
functionality with a smaller scope that won't interfere with future directions. 
At a language level, questions to answer include:

- What can we guarantee about the process state after a trap?
- What does the interface for setting up a trap handler look like?

Instead of thinking of a trap as completely invalidating and ending the program 
like we do today, we can think of it as deadlocking the current execution 
context (setting aside for a moment the question of what "execution context" 
means), as if it got stuck in an infinite loop. As you noted, this means we 
can't reclaim any memory, locks, or other resources currently being held by the 
trapped context, but other contexts can continue executing. In fantasy actor 
land, the definition of "execution context" would ideally be "current actor"; 
in the world today, we have a few choices. We could say that a trap takes down 
the current thread, though that might be a bit too much for single-threaded or 
workqueue-based architectures. Another alternative is to delimit the scope 
affected by a trap with a setjmp/longjmp-like mechanism, sort of like what you 
have, though that then requires care to ensure that state "above" the trap line 
isn't entangled with the invalidated state "below" the trap line.
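To make the "deadlocked execution context" model concrete, here is a toy sketch in which the execution context is a thread. `simulatedTrap()`, `runsToCompletion(timeout:_:)` and `trapDemo()` are hypothetical helpers, not Swift runtime or stdlib API; a trap is modeled by parking the thread forever on a `DispatchSemaphore` that nobody signals. It shows the two properties described above: other threads keep running, and a lock held by the trapped thread is never reclaimed.

```swift
import Dispatch

// Toy model, not real Swift runtime behavior: `simulatedTrap()` stands in for a
// runtime trap and "deadlocks" the current thread instead of ending the process.
func simulatedTrap() -> Never {
    // Wait forever on a semaphore nobody signals, as if stuck in an infinite loop.
    let never = DispatchSemaphore(value: 0)
    while true { never.wait() }
}

// Runs `body` on a background thread; returns true if it finished within `timeout` seconds.
func runsToCompletion(timeout: Double, _ body: @escaping @Sendable () -> Void) -> Bool {
    let done = DispatchSemaphore(value: 0)
    DispatchQueue.global().async {
        body()
        done.signal()  // never reached if `body` traps
    }
    return done.wait(timeout: .now() + timeout) == .success
}

func trapDemo() -> (trappedFinished: Bool, healthyFinished: Bool, lockStillHeld: Bool) {
    let lock = DispatchSemaphore(value: 1)      // a resource owned by the trapping context
    let acquired = DispatchSemaphore(value: 0)
    let trappedFinished = runsToCompletion(timeout: 0.2) {
        lock.wait()        // take the resource...
        acquired.signal()
        simulatedTrap()    // ...and trap while still holding it
    }
    acquired.wait()
    // Other contexts keep running normally.
    let healthyFinished = runsToCompletion(timeout: 2) {}
    // But the trapped context's resource is never released.
    let lockStillHeld = lock.wait(timeout: .now() + 0.2) == .timedOut
    return (trappedFinished, healthyFinished, lockStillHeld)
}
```

In this model the trapped thread is simply abandoned; nothing above it unwinds, which is exactly why its lock stays held.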

That leads into the question of what the interface for handling a trap should 
look like. Personally, I don't think trying to turn fatal errors into 
exceptions is the right answer, since that makes it way too easy to do the 
kinds of harmful things people do with Java runtime errors, SEH, etc. to 
swallow and ignore serious problems. I think it'd be better to have an 
interface that's clearly tuned toward supervisory reaction to unexpected 
failure, rather than one for routine handling of expected errors. Turning traps 
into errors also potentially creates safety problems for the ownership model. It's tempting to 
think of the block passed to your `Fortify.exec` as nonescaping, but that's 
problematic with inouts:

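        var x = 0
        do {
                // Hypothetical API that turns a trap inside the closure into a thrown error.
                try execWhileTurningTrapsIntoErrors {
                        foo(&x)   // traps in the middle of the inout access to x
                }
        } catch {
                print(x)  // runs while that access is still formally active
        }

        func foo(_ x: inout Int) { fatalError() }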

The compiler will reason that x is statically exclusively held only during the 
call to `foo`, but that's not really the case—foo trapped and deadlocked in the 
middle of the access, and we essentially left it hanging and went and ran our 
catch handler with the inout access still active.

If we were to say that a trap takes down the current thread, then we could have 
a signal-like interface for installing a handler that's the last thing to run 
on the thread before taking it down, like this:

func ifTrapOccursOnCurrentThread(_ handler: @escaping () -> ())

ifTrapOccursOnCurrentThread {
  supervisor.notifyAboutTrap(on: pthread_self())
}
doStuff()
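A rough model of this thread-scoped interface, assuming a trap is represented by a hypothetical `simulatedTrap()` that runs the current thread's handler and then parks the thread forever. `ifTrapOccursOnCurrentThread` follows the shape above, with the handler kept in a pthread thread-specific key; `TrapHandlerBox`, `TrapHandlerSlot` and `workerTrapNotifiesSupervisor()` are illustrative names, and a semaphore signal stands in for `supervisor.notifyAboutTrap(on:)`.

```swift
#if canImport(Darwin)
import Darwin
#else
import Glibc
#endif
import Dispatch

// Toy model only; none of these names are real Swift runtime API.
final class TrapHandlerBox {
    let handler: () -> Void
    init(_ handler: @escaping () -> Void) { self.handler = handler }
}

enum TrapHandlerSlot {
    // One TLS key shared by all threads; each thread sees its own stored value.
    static let key: pthread_key_t = {
        var newKey = pthread_key_t()
        precondition(pthread_key_create(&newKey, nil) == 0)
        return newKey
    }()
}

func ifTrapOccursOnCurrentThread(_ handler: @escaping () -> Void) {
    // The box is retained for the rest of the thread's life; this sketch never frees it.
    let box = Unmanaged.passRetained(TrapHandlerBox(handler)).toOpaque()
    _ = pthread_setspecific(TrapHandlerSlot.key, box)
}

func simulatedTrap() -> Never {
    if let raw = pthread_getspecific(TrapHandlerSlot.key) {
        // The last thing to run on this thread is the registered handler.
        Unmanaged<TrapHandlerBox>.fromOpaque(raw).takeUnretainedValue().handler()
    }
    // Then "take down" the thread by parking it forever; nothing it holds is released.
    let never = DispatchSemaphore(value: 0)
    while true { never.wait() }
}

func workerTrapNotifiesSupervisor() -> Bool {
    let notified = DispatchSemaphore(value: 0)
    DispatchQueue.global().async {
        ifTrapOccursOnCurrentThread {
            notified.signal()  // stand-in for notifying a supervisor
        }
        simulatedTrap()        // stand-in for a failing check inside doStuff()
    }
    return notified.wait(timeout: .now() + 2) == .success
}
```

The handler receives no description of the failure, only the fact that one occurred.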

A scoped handler could still be made to work, with an interface something like 
this:

func run(_ body: @escaping () -> (), withTrapHandler: () -> ())

run({
  doStuff()
}, withTrapHandler: {
  supervisor.notifyAboutTrap(on: pthread_self())
})

The `@escaping` annotation on the body would prevent the compiler from making 
invalid static assumptions about lifetimes in the body that would be violated 
if it traps. IMO, the handler block also shouldn't receive any information 
about the trap other than that one happened—the runtime ought to handle logging 
the reason for a trap, and anything you do to plan your shutdown or continue 
running unrelated subtasks should have no other business knowing why the trap 
occurred.
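One way to model the scoped variant, again assuming a hypothetical `simulatedTrap()` rather than real runtime support, is to run `body` on its own thread and have `run` wait until the body either returns or traps. This illustrates why the body has to be `@escaping`: after a trap, the body is still formally in progress on its parked thread, so it outlives the call to `run`. The handler learns only that a trap happened and runs on the caller's thread, so it can stay non-escaping. `TrapScope`, `TrapScopeSlot` and `scopedDemo()` are made up for the sketch, and `body` is additionally marked `@Sendable` because it is handed to Dispatch.

```swift
#if canImport(Darwin)
import Darwin
#else
import Glibc
#endif
import Dispatch

// Toy model only; `TrapScope` and `simulatedTrap` are hypothetical.
final class TrapScope: @unchecked Sendable {
    let settled = DispatchSemaphore(value: 0)
    // Written by the body's thread before `settled` is signaled, read after `wait()`.
    var trapped = false
}

enum TrapScopeSlot {
    static let key: pthread_key_t = {
        var newKey = pthread_key_t()
        precondition(pthread_key_create(&newKey, nil) == 0)
        return newKey
    }()
}

func simulatedTrap() -> Never {
    if let raw = pthread_getspecific(TrapScopeSlot.key) {
        let scope = Unmanaged<TrapScope>.fromOpaque(raw).takeUnretainedValue()
        scope.trapped = true
        scope.settled.signal()
    }
    // The trapping context is deadlocked: its frames, locks and accesses stay live.
    let never = DispatchSemaphore(value: 0)
    while true { never.wait() }
}

func run(_ body: @escaping @Sendable () -> Void, withTrapHandler handler: () -> Void) {
    let scope = TrapScope()
    DispatchQueue.global().async {
        // This closure keeps `scope` alive even if `body` never returns.
        _ = pthread_setspecific(TrapScopeSlot.key, Unmanaged.passUnretained(scope).toOpaque())
        body()
        _ = pthread_setspecific(TrapScopeSlot.key, nil)
        scope.settled.signal()
    }
    scope.settled.wait()
    if scope.trapped { handler() }
}

func scopedDemo() -> (trapNoticed: Bool, normalNoticed: Bool) {
    var trapNoticed = false
    run({ simulatedTrap() }, withTrapHandler: { trapNoticed = true })
    var normalNoticed = false
    run({}, withTrapHandler: { normalNoticed = true })
    return (trapNoticed, normalNoticed)
}
```

Running the body on a separate thread sidesteps the setjmp/longjmp entanglement problem, at the cost of a thread per scope in this model.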

At an implementation level, enabling trap handling would also require us to 
standardize on an ABI for handleable runtime errors. We currently don't have a 
standard mechanism here. Failures don't necessarily funnel through any fixed 
set of runtime entry points; the compiler also directly generates @llvm.trap() 
calls, and LLVM doesn't make any guarantee about how llvm.trap is implemented. 
It's also an open question whether things like C's abort(), null pointer 
dereferences, segfaults, etc. should be treated as runtime failures that can be 
handled by this mechanism.

-Joe

_______________________________________________
swift-evolution mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-evolution
