Re: Feasibility of native RTS support for continuations?

2020-03-03 Thread Alexis King
As a small update on this for anyone following along, I submitted a GHC 
proposal about a week ago to add the discussed primops (albeit with some 
tweaked names). For those who haven’t seen it already, the pull request is here:

https://github.com/ghc-proposals/ghc-proposals/pull/313

So far, the reception has been quite positive, so I’m optimistic about getting 
these added. Of course, if anyone has any concerns, please voice them in the PR 
thread!

Thanks,
Alexis
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: Feasibility of native RTS support for continuations?

2020-02-12 Thread Alexis King
> On Feb 10, 2020, at 02:18, Simon Marlow  wrote:
> 
>> On Mon, 10 Feb 2020 at 08:17, Simon Marlow  wrote:
>> 
>> Let me just say "unsafePerformIO" :)  You probably want to at least ensure 
>> that things don't crash in that case, even if you can't give a sensible 
>> semantics to what actually happens. We have a similar situation with 
>> unsafeIOToST - we can't tell you exactly what it does in general, except 
>> that it doesn't crash (I hope!).
> 
> Typo - I meant unsafeIOToSTM here.

I’ve been thinking about this. At first I figured you were probably right, and 
I decided I’d switch to a more raiseAsync-like approach. But once I started 
trying to implement it, I became less convinced it makes sense.

As its name implies, an AP_STACK is sort of like a saturated application, where 
the stack itself is the “function.” Extending the analogy, a continuation would 
be a PAP_STACK, since the stack is awaiting a value. This difference is 
significant. Suppose you write:

let x = 1 + unsafePerformIO (shift f >>= g) in ...

If you force x, the stack will unwind to the nearest reset. When you unwind 
past x’s UPDATE_FRAME, you can’t replace the blackhole with an AP_STACK, since 
the captured slice of the stack represents the expression

\m -> 1 + unsafePerformIO (m >>= g)

which is a function, not a thunk. The only logical interpretation of this 
situation is that x is a thunk quite like

let y = 1 + error "bang"

except that it doesn’t just abort to the enclosing reset frame when forced, it 
actually composes the captured continuation with the upper frames of the 
current stack and passes the composed continuation to f to resume computation. 
That’s an awful lot of trouble given the resulting semantics is going to be 
unpredictable, anyway!

I should be clear that I do not intend shift/reset to have any safe interface 
in IO directly. Even if you could make it type safe, it would break all kinds 
of code that manages resources using bracket. Rather, I have built a library 
that defines a totally separate Eff monad, and that monad uses the primops 
directly, wrapped in a safe interface. It isn’t possible for a user to screw 
things up using unsafePerformIO because there is no way to call shift from IO 
(unless you wrap shift# in IO yourself, but then I think you can be expected to 
know what you’re getting yourself into). So without mucking about with 
GHC.Exts, you still can’t get segfaults.

Alexis
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: Feasibility of native RTS support for continuations?

2020-02-10 Thread Simon Marlow
On Mon, 10 Feb 2020 at 08:17, Simon Marlow  wrote:

> On Mon, 10 Feb 2020 at 08:10, Alexis King  wrote:
>
>> On Feb 6, 2020, at 02:28, Simon Marlow  wrote:
>>
>> The issue here is that raiseAsync is destructive - it *moves* the stack
>> to the heap, rather than copying it. So if you want to continue execution
>> where you left off, for shift#, you would have to copy it back onto the
>> stack again. That's the point I was trying to highlight here.
>>
>>
>> Ah, yes, I see what you mean! It happens that for my use case I actually
>> do want to unwind the stack when I capture a continuation, so that isn’t a
>> problem for me.
>>
>> Yes, these are all the things that make raiseAsync tricky! You can either
>> copy what raiseAsync does (but be warned, it has taken a lot of iteration
>> to get right) or try to use raiseAsync and/or modify it to do what you want.
>>
>>
>> My point was more that I’m unsure that shift# *should* handle most of
>> those cases. For raiseAsync, it makes sense, since asynchronous interrupts
>> can, by their nature, occur at any time, even during pure code. But my
>> shift# operation lives in IO, and the intent is to only capture up to a
>> reset# in the same state thread.
>>
>> My justification for this is that if you could use shift# in pure code,
>> it would be ill-defined what you’d even be capturing. Suppose you return a
>> thunk containing a call to shift#. When the thunk is evaluated, you capture
>> up to the nearest reset#… but who knows what that is now? This opens you up
>> to all sorts of general badness.
>>
>> Therefore, I don’t think there should ever be an UPDATE_FRAME in the
>> captured continuation—if there is, it’s probably a bug. So unless someone
>> can think of any valid use cases, I’ll make that more explicit by modifying
>> the continuation-capturing code to add some assertions that those frames
>> never appear in the captured stack.
>>
>
> Let me just say "unsafePerformIO" :)  You probably want to at least ensure
> that things don't crash in that case, even if you can't give a sensible
> semantics to what actually happens. We have a similar situation with
> unsafeIOToST - we can't tell you exactly what it does in general, except
> that it doesn't crash (I hope!).
>

Typo - I meant unsafeIOToSTM here.


>
> Cheers
> Simon
>
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: Feasibility of native RTS support for continuations?

2020-02-10 Thread Simon Marlow
On Mon, 10 Feb 2020 at 08:10, Alexis King  wrote:

> On Feb 6, 2020, at 02:28, Simon Marlow  wrote:
>
> The issue here is that raiseAsync is destructive - it *moves* the stack to
> the heap, rather than copying it. So if you want to continue execution
> where you left off, for shift#, you would have to copy it back onto the
> stack again. That's the point I was trying to highlight here.
>
>
> Ah, yes, I see what you mean! It happens that for my use case I actually
> do want to unwind the stack when I capture a continuation, so that isn’t a
> problem for me.
>
> Yes, these are all the things that make raiseAsync tricky! You can either
> copy what raiseAsync does (but be warned, it has taken a lot of iteration
> to get right) or try to use raiseAsync and/or modify it to do what you want.
>
>
> My point was more that I’m unsure that shift# *should* handle most of
> those cases. For raiseAsync, it makes sense, since asynchronous interrupts
> can, by their nature, occur at any time, even during pure code. But my
> shift# operation lives in IO, and the intent is to only capture up to a
> reset# in the same state thread.
>
> My justification for this is that if you could use shift# in pure code, it
> would be ill-defined what you’d even be capturing. Suppose you return a
> thunk containing a call to shift#. When the thunk is evaluated, you capture
> up to the nearest reset#… but who knows what that is now? This opens you up
> to all sorts of general badness.
>
> Therefore, I don’t think there should ever be an UPDATE_FRAME in the
> captured continuation—if there is, it’s probably a bug. So unless someone
> can think of any valid use cases, I’ll make that more explicit by modifying
> the continuation-capturing code to add some assertions that those frames
> never appear in the captured stack.
>

Let me just say "unsafePerformIO" :)  You probably want to at least ensure
that things don't crash in that case, even if you can't give a sensible
semantics to what actually happens. We have a similar situation with
unsafeIOToST - we can't tell you exactly what it does in general, except
that it doesn't crash (I hope!).

Cheers
Simon
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: Feasibility of native RTS support for continuations?

2020-02-10 Thread Alexis King
> On Feb 6, 2020, at 02:28, Simon Marlow  wrote:
> 
> The issue here is that raiseAsync is destructive - it *moves* the stack to 
> the heap, rather than copying it. So if you want to continue execution where 
> you left off, for shift#, you would have to copy it back onto the stack 
> again. That's the point I was trying to highlight here.

Ah, yes, I see what you mean! It happens that for my use case I actually do 
want to unwind the stack when I capture a continuation, so that isn’t a problem 
for me.

> Yes, these are all the things that make raiseAsync tricky! You can either 
> copy what raiseAsync does (but be warned, it has taken a lot of iteration to 
> get right) or try to use raiseAsync and/or modify it to do what you want.

My point was more that I’m unsure that shift# should handle most of those 
cases. For raiseAsync, it makes sense, since asynchronous interrupts can, by 
their nature, occur at any time, even during pure code. But my shift# operation 
lives in IO, and the intent is to only capture up to a reset# in the same state 
thread.

My justification for this is that if you could use shift# in pure code, it 
would be ill-defined what you’d even be capturing. Suppose you return a thunk 
containing a call to shift#. When the thunk is evaluated, you capture up to the 
nearest reset#… but who knows what that is now? This opens you up to all sorts 
of general badness.

Therefore, I don’t think there should ever be an UPDATE_FRAME in the captured 
continuation—if there is, it’s probably a bug. So unless someone can think of 
any valid use cases, I’ll make that more explicit by modifying the 
continuation-capturing code to add some assertions that those frames never 
appear in the captured stack.___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: Feasibility of native RTS support for continuations?

2020-02-06 Thread Simon Marlow
On Sun, 2 Feb 2020 at 04:26, Alexis King  wrote:

> I took a stab at implementing this today, using the “continuation is a
> stack” implementation strategy I described in my previous email. I
> haven’t tried very hard to break it yet, but this tiny test program
> works:
>
> {-# LANGUAGE BangPatterns, BlockArguments, MagicHash,
>  ScopedTypeVariables, UnboxedTuples #-}
>
> import GHC.Prim
> import GHC.Types
>
> data Continuation a b = Continuation# (Continuation# RealWorld a b)
>
> reset :: IO a -> IO a
> reset (IO m) = IO (reset# m)
>
> shift :: (Continuation a b -> IO b) -> IO a
> shift f = IO (shift# \k -> let !(IO m) = f (Continuation# k) in m)
>
> applyContinuation :: Continuation a b -> a -> IO b
> applyContinuation (Continuation# k) a = IO (applyContinuation# k a)
>
> main :: IO ()
> main = do
>   ns <- reset do
> n <- shift \(k :: Continuation Integer [Integer]) -> do
>   a <- applyContinuation k 2
>   b <- applyContinuation k 3
>   pure (a ++ b)
> pure [n]
>   print ns
>
> The output of this program is [2, 3], as expected.
>

That's impressive!


>
> My implementation doesn’t share any code with raiseAsync. Currently, it
> isn’t very clever:
>
> * reset# pushes a RET_SMALL frame with a well-known info pointer,
>   _reset_frame_info.
>
> * shift# walks the stack and copies it up to the nearest reset
>   frame. If the stack consists of several chunks, it only copies the
>   chunk that contains the reset frame, and it just repurposes the
>   other chunks as the continuation (since the stack is unwinding
>   anyway).
>
> * applyContinuation# copies the captured stack and updates the
>   UNDERFLOW frames as needed to point to the current stack.
>
> * I haven’t implemented it yet, but it would be straightforward to
>   implement an applyContinuationOneShot# operation that works like
>   applyContinuation#, but doesn’t actually copy anything and just
>   updates the UNDERFLOW frames in the captured stack itself.
>
> This seems to work in my very simple examples, but there are also things
> I know it doesn’t handle properly:
>
> * It doesn’t make any attempt to handle modifications to the
>   interrupt masking state properly. The right thing to do here is
>   probably to look for mask/unmask frames on the stack during
>   unwinding and to stash that information somewhere.
>
> * It doesn’t do anything special for UPDATE_FRAMEs, so if there’s an
>   UPDATE_FRAME that owns a blackhole on the stack, things will go
>   quite wrong.
>
>   I haven’t been worrying about this because I don’t think there
>   should ever be any update frames between shift# and reset#. In the
>   case of raiseAsync, the location of the “prompt” is well-defined:
>   it’s the update frame. But shift# captures up to an explicit
>   prompt, so using shift# when there’s an update frame on the stack
>   can surely only lead to nonsense... right?
>
> * It doesn’t do anything special for STM frames, so trying to
>   capture a continuation through those will be similarly broken.
>

Yes, these are all the things that make raiseAsync tricky! You can either
copy what raiseAsync does (but be warned, it has taken a lot of iteration
to get right) or try to use raiseAsync and/or modify it to do what you want.

Cheers
Simon



> There are also probably bugs I don’t know about — I haven’t exercised
> the implementation very hard yet — but I’ll keep playing with it. If
> anyone is at all interested, I’ve pushed the code to a branch here:
>
>
> https://gitlab.haskell.org/lexi.lambda/ghc/compare/master...first-class-continuations
>
> My thanks again to everyone’s help!
>
> Alexis
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: Feasibility of native RTS support for continuations?

2020-02-06 Thread Simon Marlow
On Sat, 1 Feb 2020 at 00:23, Alexis King  wrote:

> > On Jan 30, 2020, at 02:35, Simon Marlow  wrote:
>
> > Also you might want to optimise the implementation so that it doesn't
> actually tear down the stack as it copies it into the heap, so that you
> could avoid the need to copy it back from the heap again in shift#.
>
> I don’t fully understand this point — do you mean “avoid the need to copy
> it back on continuation application”? shift# just copies the stack slice to
> the heap, so it doesn’t need to copy it back.
>

The issue here is that raiseAsync is destructive - it *moves* the stack to
the heap, rather than copying it. So if you want to continue execution
where you left off, for shift#, you would have to copy it back onto the
stack again. That's the point I was trying to highlight here.

 Cheers
Simon


> If I was right, and you were referring to continuation application rather
> than shift#, I agree with you there. It seems as though it ought to be
> possible to represent the stack slice as a stack itself, so if the stack
> looks like
>
> ┌───┐
> │ RET_SMALL │
> ├───┤
> │ CATCH │
> ├───┤
> │ RESET │
> ├───┤
>
> then the captured continuation could itself essentially be a stack like
>
> ┌───┐
> │ RET_SMALL │
> ├───┤
> │ CATCH │
> ├───┤
> │ UNDERFLOW │
> └───┘
>
> where restoring a continuation just copies the captured stack and updates
> its underflow frame to point at the top of the current stack. If the caller
> promises not to use the continuation again after applying it, the copying
> could be skipped entirely, and the captured stack could just become the new
> stack.
>
> However, I don’t understand enough about the way the RTS currently works
> to know if this is viable. For example, what if I write this:
>
> reset (mask_ (shift f))
>
> Now presumably I want to ensure the masking state is restored when I
> invoke the continuation. But it can’t just be unconditionally restored,
> since if I write
>
> mask_ (reset (shift f >>= g))
>
> then the mask frame isn’t included on the stack, so applying the
> continuation shouldn’t affect the masking state. Presumably this means a
> continuation restore can’t be as simple as copying the captured stack
> frames onto the current stack, since restoring certain frames affects other
> parts of the RTS state.
>
> (Tangentially, it seems like this particular example is not handled
> properly in the existing capture/restore code, as this comment in
> Exception.cmm notes:
>
>  NB. there's a bug in here.  If a thread is inside an
>  unsafePerformIO, and inside maskAsyncExceptions# (there is an
>  unmaskAsyncExceptions_ret on the stack), and it is blocked in an
>  interruptible operation, and it receives an exception, then the
>  unsafePerformIO thunk will be updated with a stack object
>  containing the unmaskAsyncExceptions_ret frame.  Later, when
>  someone else evaluates this thunk, the original masking state is
>  not restored.
>
> I think getting this right probably matters more if continuations are
> added, so that’s something else to worry about.)
>
> > So that's shift#. What about reset#? I expect it's something like
> `unsafeInterleaveIO`, that is it creates a thunk to name the continuation.
> You probably also want a `catch` in there, so that we don't tear down more
> of the stack than we need to.
>
> It would be nice to be able to capture slices of the stack that include
> catch frames, though theoretically it isn’t necessary — it would be
> possible to arrange for a continuation that wants to capture through a
> catch to do something like
>
> reset (... (catch (reset ...) ...))
>
> instead, then call shift twice and compose the two continuations by hand
> (inserting another call to catch in between). I don’t know enough yet to
> understand all the tradeoffs involved, but I’ll see if it’s possible to get
> away with the userland emulation, since I figure the less code in the RTS
> the better!
>
> > Hope this is helpful.
>
> Very much so, thank you!
>
> > On Jan 30, 2020, at 10:31, Ben Gamari  wrote:
> >
> > For the record, runtime system captures the stack state in an AP_STACK
> > closure. This is done in rts/RaiseAsync.c:raiseAsync and some of this is
> > described in the comment attached to that function.
> >
> > As Simon PJ points out, this is all very tricky stuff, especially in a
> > concurrent context. If you make any changes in this area do be sure to
> > keep in mind the considerations described in Note [AP_STACKs must be
> > eagerly blackholed], which arose out of the very-nast #13615.
>
> Thank you for the pointers — I did manage to find raiseAsync, but I hadn’t
> seen that Note. I will try my best to be suitably wary, though I imagine
> I’m in far enough over my head that I don’t yet know half the things I need
> to be wary of. :)
>
> Alexis

Re: Feasibility of native RTS support for continuations?

2020-02-01 Thread Alexis King
I took a stab at implementing this today, using the “continuation is a
stack” implementation strategy I described in my previous email. I
haven’t tried very hard to break it yet, but this tiny test program
works:

{-# LANGUAGE BangPatterns, BlockArguments, MagicHash,
 ScopedTypeVariables, UnboxedTuples #-}

import GHC.Prim
import GHC.Types

data Continuation a b = Continuation# (Continuation# RealWorld a b)

reset :: IO a -> IO a
reset (IO m) = IO (reset# m)

shift :: (Continuation a b -> IO b) -> IO a
shift f = IO (shift# \k -> let !(IO m) = f (Continuation# k) in m)

applyContinuation :: Continuation a b -> a -> IO b
applyContinuation (Continuation# k) a = IO (applyContinuation# k a)

main :: IO ()
main = do
  ns <- reset do
n <- shift \(k :: Continuation Integer [Integer]) -> do
  a <- applyContinuation k 2
  b <- applyContinuation k 3
  pure (a ++ b)
pure [n]
  print ns

The output of this program is [2, 3], as expected.

My implementation doesn’t share any code with raiseAsync. Currently, it
isn’t very clever:

* reset# pushes a RET_SMALL frame with a well-known info pointer,
  _reset_frame_info.

* shift# walks the stack and copies it up to the nearest reset
  frame. If the stack consists of several chunks, it only copies the
  chunk that contains the reset frame, and it just repurposes the
  other chunks as the continuation (since the stack is unwinding
  anyway).

* applyContinuation# copies the captured stack and updates the
  UNDERFLOW frames as needed to point to the current stack.

* I haven’t implemented it yet, but it would be straightforward to
  implement an applyContinuationOneShot# operation that works like
  applyContinuation#, but doesn’t actually copy anything and just
  updates the UNDERFLOW frames in the captured stack itself.

This seems to work in my very simple examples, but there are also things
I know it doesn’t handle properly:

* It doesn’t make any attempt to handle modifications to the
  interrupt masking state properly. The right thing to do here is
  probably to look for mask/unmask frames on the stack during
  unwinding and to stash that information somewhere.

* It doesn’t do anything special for UPDATE_FRAMEs, so if there’s an
  UPDATE_FRAME that owns a blackhole on the stack, things will go
  quite wrong.

  I haven’t been worrying about this because I don’t think there
  should ever be any update frames between shift# and reset#. In the
  case of raiseAsync, the location of the “prompt” is well-defined:
  it’s the update frame. But shift# captures up to an explicit
  prompt, so using shift# when there’s an update frame on the stack
  can surely only lead to nonsense... right?

* It doesn’t do anything special for STM frames, so trying to
  capture a continuation through those will be similarly broken.

There are also probably bugs I don’t know about — I haven’t exercised
the implementation very hard yet — but I’ll keep playing with it. If
anyone is at all interested, I’ve pushed the code to a branch here:


https://gitlab.haskell.org/lexi.lambda/ghc/compare/master...first-class-continuations

My thanks again to everyone’s help!

Alexis
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: Feasibility of native RTS support for continuations?

2020-01-31 Thread Alexis King
> On Jan 30, 2020, at 02:35, Simon Marlow  wrote:
> 
> My guess is you can almost do what you want with asynchronous exceptions but 
> some changes to the RTS would be needed.
> 
> There's a bit of code in the IO library that literally looks like this 
> (https://gitlab.haskell.org/ghc/ghc/blob/master/libraries%2Fbase%2FGHC%2FIO%2FHandle%2FInternals.hs#L175)

Thanks for the pointer, I definitely had not discovered that! That is an 
interesting trick. I think your explanation paired with the Note is enough for 
it to make sense to me, though I don’t yet understand all the implementation 
subtleties of raiseAsync itself.

> Also you might want to optimise the implementation so that it doesn't 
> actually tear down the stack as it copies it into the heap, so that you could 
> avoid the need to copy it back from the heap again in shift#.

I don’t fully understand this point — do you mean “avoid the need to copy it 
back on continuation application”? shift# just copies the stack slice to the 
heap, so it doesn’t need to copy it back.

If I was right, and you were referring to continuation application rather than 
shift#, I agree with you there. It seems as though it ought to be possible to 
represent the stack slice as a stack itself, so if the stack looks like

┌───┐
│ RET_SMALL │
├───┤
│ CATCH │
├───┤
│ RESET │
├───┤

then the captured continuation could itself essentially be a stack like

┌───┐
│ RET_SMALL │
├───┤
│ CATCH │
├───┤
│ UNDERFLOW │
└───┘

where restoring a continuation just copies the captured stack and updates its 
underflow frame to point at the top of the current stack. If the caller 
promises not to use the continuation again after applying it, the copying could 
be skipped entirely, and the captured stack could just become the new stack.

However, I don’t understand enough about the way the RTS currently works to 
know if this is viable. For example, what if I write this:

reset (mask_ (shift f))

Now presumably I want to ensure the masking state is restored when I invoke the 
continuation. But it can’t just be unconditionally restored, since if I write

mask_ (reset (shift f >>= g))

then the mask frame isn’t included on the stack, so applying the continuation 
shouldn’t affect the masking state. Presumably this means a continuation 
restore can’t be as simple as copying the captured stack frames onto the 
current stack, since restoring certain frames affects other parts of the RTS 
state.

(Tangentially, it seems like this particular example is not handled properly in 
the existing capture/restore code, as this comment in Exception.cmm notes:

 NB. there's a bug in here.  If a thread is inside an
 unsafePerformIO, and inside maskAsyncExceptions# (there is an
 unmaskAsyncExceptions_ret on the stack), and it is blocked in an
 interruptible operation, and it receives an exception, then the
 unsafePerformIO thunk will be updated with a stack object
 containing the unmaskAsyncExceptions_ret frame.  Later, when
 someone else evaluates this thunk, the original masking state is
 not restored.

I think getting this right probably matters more if continuations are added, so 
that’s something else to worry about.)

> So that's shift#. What about reset#? I expect it's something like 
> `unsafeInterleaveIO`, that is it creates a thunk to name the continuation. 
> You probably also want a `catch` in there, so that we don't tear down more of 
> the stack than we need to.

It would be nice to be able to capture slices of the stack that include catch 
frames, though theoretically it isn’t necessary — it would be possible to 
arrange for a continuation that wants to capture through a catch to do 
something like

reset (... (catch (reset ...) ...))

instead, then call shift twice and compose the two continuations by hand 
(inserting another call to catch in between). I don’t know enough yet to 
understand all the tradeoffs involved, but I’ll see if it’s possible to get 
away with the userland emulation, since I figure the less code in the RTS the 
better!

> Hope this is helpful.

Very much so, thank you!

> On Jan 30, 2020, at 10:31, Ben Gamari  wrote:
> 
> For the record, runtime system captures the stack state in an AP_STACK
> closure. This is done in rts/RaiseAsync.c:raiseAsync and some of this is
> described in the comment attached to that function.
> 
> As Simon PJ points out, this is all very tricky stuff, especially in a
> concurrent context. If you make any changes in this area do be sure to
> keep in mind the considerations described in Note [AP_STACKs must be
> eagerly blackholed], which arose out of the very-nast #13615.

Thank you for the pointers — I did manage to find raiseAsync, but I hadn’t seen 
that Note. I will try my best to be suitably wary, though I imagine I’m in far 
enough over my head that I don’t yet know 

RE: Feasibility of native RTS support for continuations?

2020-01-30 Thread Ben Gamari
Simon Peyton Jones via ghc-devs  writes:

> |  Now that is very interesting, and certainly not something I would have
> |  expected! Why would asynchronous exceptions need to capture any portion of
> |  the stack? Exceptions obviously trigger stack unwinding, so I assumed the
> |  “abort to the current prompt” part of my implementation would already
> |  exist, but not the “capture a slice of the stack” part. Could you say a
> |  little more about this, or point me to some relevant code?
>
> Suppose a thread happens to be evaluating a pure thunk for (factorial 200). 
> Then it gets an asynchronous exception from another thread.  That asynch exn 
> is nothing to do with (factorial 200). So we could either 
>
> A. revert the thunk to (factorial 200), abandoning all
>the work done so far, or
> B. capture the stack and attach it to the thunk, so that ie any other
>thread enters that thunk, it'll just run that stack.
>
> Now (A) means that every thunk has to be revertible, which means keeping its 
> original free variables, which leads to space leaks.  And extra work to avoid 
> losing any info you need for reversion.  Extra work is painful; we want to 
> put all of the extra work on the asynch exn.
>
> So we do (B).
>
> See Section 8 of "Asynchronous exceptions in Haskell".
> https://www.microsoft.com/en-us/research/publication/asynchronous-exceptions-haskell-3/
>
> And "An implementation of resumable black holes" (Reid).
> https://alastairreid.github.io/papers/IFL_98/
>
> This stack-freezing stuff is definitely implemented. I'm not quite
> sure where, but I'm cc'ing Simon Marlow who can point you at it.
>
For the record, runtime system captures the stack state in an AP_STACK
closure. This is done in rts/RaiseAsync.c:raiseAsync and some of this is
described in the comment attached to that function.

As Simon PJ points out, this is all very tricky stuff, especially in a
concurrent context. If you make any changes in this area do be sure to
keep in mind the considerations described in Note [AP_STACKs must be
eagerly blackholed], which arose out of the very-nast #13615.

Cheers and good luck!

- Ben


signature.asc
Description: PGP signature
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: Feasibility of native RTS support for continuations?

2020-01-30 Thread Simon Marlow
My guess is you can almost do what you want with asynchronous exceptions
but some changes to the RTS would be needed.

There's a bit of code in the IO library that literally looks like this (
https://gitlab.haskell.org/ghc/ghc/blob/master/libraries%2Fbase%2FGHC%2FIO%2FHandle%2FInternals.hs#L175
):

t <- myThreadId
throwTo t e
... carry on ...

that is, it throws an exception to the current thread using throwTo, and
then there is code to handle what happens if the enclosing thunk is
evaluated after the exception has been thrown.

That is, throwing an exception to the current thread is an IO operation
that returns later! This only works with throwTo, not with throwIO, because
throwIO is a *synchronous* exception that destructively tears down the
stack.

I suppose if you want to pass a value to the thread after resumption you
could do it via an IORef.

But the issue with this is that you can only apply the continuation once:
GHC treats the captured continuation like a thunk, which means that after
evaluating it, it will be updated with its value. But for your purposes you
need to be able to apply it at least twice - once because we want to
continue after shift#, and again when we apply the continuation later.
Somehow the thunks we build this way would need to be marked non-updatable.
Perhaps this could be done with a new primitive `throwToNonUpdatable`
(hopefully with a better name) that creates non-updatable thunks. Also you
might want to optimise the implementation so that it doesn't actually tear
down the stack as it copies it into the heap, so that you could avoid the
need to copy it back from the heap again in shift#.

So that's shift#. What about reset#? I expect it's something like
`unsafeInterleaveIO`, that is it creates a thunk to name the continuation.
You probably also want a `catch` in there, so that we don't tear down more
of the stack than we need to.

Hope this is helpful.

Cheers
Simon


On Thu, 30 Jan 2020 at 00:55, Alexis King  wrote:

> > On Jan 29, 2020, at 03:32, Simon Peyton Jones 
> wrote:
> >
> > Suppose a thread happens to be evaluating a pure thunk for (factorial
> 200). […] This stack-freezing stuff is definitely implemented.
>
> That’s fascinating! I had no idea, but your explanation makes sense (as do
> the papers you linked). That is definitely promising, as it seems like many
> of the tricky cases may already be accounted for? I’ll see if I can follow
> the Cmm code well enough to hunt down how it’s implemented.
>
> One other thing I have been thinking about: this is completely
> incompatible with the state hack, isn’t it? That is not a showstopper, of
> course—I do not intend to suggest that continuations be capturable in
> ordinary IO—but it does mean I probably want a way to selectively opt out.
> (But I’ll worry about that if I ever get that far.)
>
> Alexis
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: Feasibility of native RTS support for continuations?

2020-01-29 Thread Alexis King
> On Jan 29, 2020, at 03:32, Simon Peyton Jones  wrote:
> 
> Suppose a thread happens to be evaluating a pure thunk for (factorial 200). 
> […] This stack-freezing stuff is definitely implemented.

That’s fascinating! I had no idea, but your explanation makes sense (as do the 
papers you linked). That is definitely promising, as it seems like many of the 
tricky cases may already be accounted for? I’ll see if I can follow the Cmm 
code well enough to hunt down how it’s implemented.

One other thing I have been thinking about: this is completely incompatible 
with the state hack, isn’t it? That is not a showstopper, of course—I do not 
intend to suggest that continuations be capturable in ordinary IO—but it does 
mean I probably want a way to selectively opt out. (But I’ll worry about that 
if I ever get that far.)

Alexis
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


RE: Feasibility of native RTS support for continuations?

2020-01-29 Thread Simon Peyton Jones via ghc-devs
|  Now that is very interesting, and certainly not something I would have
|  expected! Why would asynchronous exceptions need to capture any portion of
|  the stack? Exceptions obviously trigger stack unwinding, so I assumed the
|  “abort to the current prompt” part of my implementation would already
|  exist, but not the “capture a slice of the stack” part. Could you say a
|  little more about this, or point me to some relevant code?

Suppose a thread happens to be evaluating a pure thunk for (factorial 200). 
Then it gets an asynchronous exception from another thread.  That asynch exn is 
nothing to do with (factorial 200). So we could either 

A. revert the thunk to (factorial 200), abandoning all
   the work done so far, or
B. capture the stack and attach it to the thunk, so that ie any other
   thread enters that thunk, it'll just run that stack.

Now (A) means that every thunk has to be revertible, which means keeping its 
original free variables, which leads to space leaks.  And extra work to avoid 
losing any info you need for reversion.  Extra work is painful; we want to put 
all of the extra work on the asynch exn.

So we do (B).

See Section 8 of "Asynchronous exceptions in Haskell".
https://www.microsoft.com/en-us/research/publication/asynchronous-exceptions-haskell-3/

And "An implementation of resumable black holes" (Reid).
https://alastairreid.github.io/papers/IFL_98/

This stack-freezing stuff is definitely implemented.   I'm not quite sure 
where, but I'm cc'ing Simon Marlow who can point you at it.



You need to be careful.  Suppose a thread pushes a prompt, then later evaluates 
a thunk T1, which in turn evaluates a thunk T2.  If you capture the stack down 
to the prompt, you MUST overwrite T1 and T2 with a resumable continuation 
capturing their portion of the stack, in case some other, unrelated thread 
needs their value.

But as I say, all this is implemented.

---

Keep us posted.  It'd be good to have a design that accommodated some of the 
applications in the 'composable scheduler activations' paper too.

Simon


|  -Original Message-
|  From: Alexis King 
|  Sent: 28 January 2020 22:19
|  To: Simon Peyton Jones 
|  Cc: ghc-devs 
|  Subject: Re: Feasibility of native RTS support for continuations?
|  
|  > On Jan 28, 2020, at 04:09, Simon Peyton Jones 
|  wrote:
|  >
|  > I've thought about this quite a bit in the past, but got stalled for
|  lack of cycles to think about it more.  But there's a paper or two
|  
|  Many thanks! I’d stumbled upon the 2007 paper, but I hadn’t seen the 2016
|  one. In the case of the former, I had thought it probably wasn’t
|  enormously relevant, since the “continuations” appear to be fundamentally
|  one-shot. At first glance, that doesn’t seem to have changed in the JFP
|  article, but I haven’t really read it yet, so maybe I’m mistaken. I’ll
|  take a closer look.
|  
|  > On the effects front I think Daan Leijen is doing interesting stuff,
|  although I'm not very up to date:
|  >
|  https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.micr
|  osoft.com%2Fen-
|  us%2Fresearch%2Fpeople%2Fdaan%2Fpublications%2Fdata=02%7C01%7Csimonpj
|  %40microsoft.com%7C1f6ac242e0334d662c8c08d7a4401d95%7C72f988bf86f141af91ab
|  2d7cd011db47%7C1%7C0%7C637158467589648051sdata=%2BPgJblk6y%2BjXRc5bA0
|  gdzQEjrqgAQB6UYytdw7UtLQQ%3Dreserved=0
|  
|  Indeed, I’ve read a handful of his papers while working on this! I didn’t
|  mention it in the original email, but I’ve also talked a little with
|  Matthew Flatt about efficient implementation of delimited control, and he
|  pointed me to a few papers a couple of months ago. One of those was “Final
|  Shift for call/cc: a Direct Implementation of Shift and Reset” by
|  Gasbichler and Sperber, which describes an approach closest to what I
|  currently have in mind to try to implement in the RTS.
|  
|  > One interesting dimension is whether or not the continuations you
|  capture are one-shot.  If so, particularly efficient implementations are
|  possible.
|  
|  Quite so. One thing I’ve considered is that it’s possible to obtain much
|  of that efficiency even without requiring strict one-shot continuations if
|  you have a separate operation for restoring a continuation that guarantees
|  you won’t ever restore it again, sort of like the existing
|  unsafeThaw/unsafeFreeze operations. That is, you can essentially convert a
|  multi-shot continuation into a one-shot continuation and reap performance
|  benefits, even if you’ve already applied the continuation.
|  
|  This is a micro-optimization, though, so I’m not worrying too much about
|  it right now.
|  
|  > Also: much of the "capture stack chunk" stuff is *already* implemented,
|  because it is (I think) what happens when a thread receives an
|  asynchronous exception, and just abandon its evaluation of thunks that it
|  has started work on.
|  
|  Now that is ve

Re: Feasibility of native RTS support for continuations?

2020-01-28 Thread Alexis King
> On Jan 28, 2020, at 04:09, Simon Peyton Jones  wrote:
> 
> I've thought about this quite a bit in the past, but got stalled for lack of 
> cycles to think about it more.  But there's a paper or two

Many thanks! I’d stumbled upon the 2007 paper, but I hadn’t seen the 2016 one. 
In the case of the former, I had thought it probably wasn’t enormously 
relevant, since the “continuations” appear to be fundamentally one-shot. At 
first glance, that doesn’t seem to have changed in the JFP article, but I 
haven’t really read it yet, so maybe I’m mistaken. I’ll take a closer look.

> On the effects front I think Daan Leijen is doing interesting stuff, although 
> I'm not very up to date:
> https://www.microsoft.com/en-us/research/people/daan/publications/

Indeed, I’ve read a handful of his papers while working on this! I didn’t 
mention it in the original email, but I’ve also talked a little with Matthew 
Flatt about efficient implementation of delimited control, and he pointed me to 
a few papers a couple of months ago. One of those was “Final Shift for call/cc: 
a Direct Implementation of Shift and Reset” by Gasbichler and Sperber, which 
describes an approach closest to what I currently have in mind to try to 
implement in the RTS.

> One interesting dimension is whether or not the continuations you capture are 
> one-shot.  If so, particularly efficient implementations are possible.

Quite so. One thing I’ve considered is that it’s possible to obtain much of 
that efficiency even without requiring strict one-shot continuations if you 
have a separate operation for restoring a continuation that guarantees you 
won’t ever restore it again, sort of like the existing unsafeThaw/unsafeFreeze 
operations. That is, you can essentially convert a multi-shot continuation into 
a one-shot continuation and reap performance benefits, even if you’ve already 
applied the continuation.

This is a micro-optimization, though, so I’m not worrying too much about it 
right now.

> Also: much of the "capture stack chunk" stuff is *already* implemented, 
> because it is (I think) what happens when a thread receives an asynchronous 
> exception, and just abandon its evaluation of thunks that it has started work 
> on.

Now that is very interesting, and certainly not something I would have 
expected! Why would asynchronous exceptions need to capture any portion of the 
stack? Exceptions obviously trigger stack unwinding, so I assumed the “abort to 
the current prompt” part of my implementation would already exist, but not the 
“capture a slice of the stack” part. Could you say a little more about this, or 
point me to some relevant code?

Thanks again,
Alexis
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


RE: Feasibility of native RTS support for continuations?

2020-01-28 Thread Simon Peyton Jones via ghc-devs
Alexis

I've thought about this quite a bit in the past, but got stalled for lack of 
cycles to think about it more.  But there's a paper or two:

https://www.microsoft.com/en-us/research/publication/composable-scheduler-activations-haskell/

On that link you can also see a link to an earlier, shorter, conference version 
(rejected ).

Also this earlier (2007) work

https://www.microsoft.com/en-us/research/publication/lightweight-concurrency-primitives-for-ghc/


On the effects front I think Daan Leijen is doing interesting stuff, although 
I'm not very up to date:
https://www.microsoft.com/en-us/research/people/daan/publications/

One interesting dimension is whether or not the continuations you capture are 
one-shot.  If so, particularly efficient implementations are possible.

Also: much of the "capture stack chunk" stuff is *already* implemented, because 
it is (I think) what happens when a thread receives an asynchronous exception, 
and just abandon its evaluation of thunks that it has started work on.

Simon


|  -Original Message-
|  From: ghc-devs  On Behalf Of Alexis King
|  Sent: 28 January 2020 08:20
|  To: ghc-devs 
|  Subject: Feasibility of native RTS support for continuations?
|  
|  Hi all,
|  
|  tl;dr: I want to try to implement native support for capturing slices of
|  the RTS stack as a personal experiment; please tell me the obstacles I
|  am likely to run into. Much more context follows.
|  
|  ---
|  
|  I have been working on an implementation of an algebraic effect system
|  that uses unsafe primops to be as performant as possible. However, the
|  unavoidable need to CPS the entire program balloons heap allocation.
|  Consider the following definition:
|  
|  f a b = g a >>= \c -> h (b + c)
|  
|  Assume `g` and `h` are not inlined. If the monad used is IO, this will
|  be compiled efficiently: the result of `g` is returned on the stack, and
|  no closure needs to be allocated for the lambda. However, if the monad
|  supports capturing the continuation, the above definition must be CPS’d.
|  After inlining, we end up with
|  
|  f a b = \k -> let lvl = \c -> h (b + c) k in g a lvl
|  
|  which must allocate a closure on the heap. This is frustrating, as it
|  must happen for every call to a non-inlined monadic operation, even if
|  that operation never captures the continuation. In an algebraic effect
|  system, there are many shortcuts that avoid the need to capture the
|  continuation, and under my implementation, large swaths of code never do
|  so. I’ve managed to exploit that to get some savings, but I can’t escape
|  the need to allocate all these closures.
|  
|  This motivates my question: how difficult would it be to allow capturing
|  a portion of the RTS call stack directly? My requirements are fairly
|  minimal, as continuations go:
|  
|1. Capturing a continuation is only legal from within a strict state
|   thread (i.e. IO or strict ST).
|  
|2. The continuation is captured up to a prompt, which would be a new
|   kind of RTS stack frame. Prompts are not tagged, so there is only
|   ever exactly one prompt active at any time (which may be the root
|   prompt).
|  
|3. Capturing a continuation is unsafe. The behavior of capturing a
|   continuation is undefined if the current prompt was not created by
|   the current state thread (and it is never legal to capture up to
|   the root prompt).
|  
|4. Applying a continuation is unsafe. Captured continuations return
|   `Any`, and type safety is the caller’s obligation.
|  
|5. Continuations are “functional,” which is to say applying them does
|   not trigger any additional stack unwinding.
|  
|  This minimal support for primitive continuation capturing would be
|  enough to support my efficient, safe delimited control implementation.
|  In my ignorant mind, implementing this ought to be as simple as defining
|  two new primops,
|  
|  reset# :: (State# s -> (# State# s, a #))
| -> State# s -> (# State# s, a #)
|  
|  shift# :: ((a -> State# s -> (# State# s, Any #))
| -> State# s -> (# State# s, Any #))
| -> State# s -> (# State# s, a #)
|  
|  where reset# pushes a new prompt frame and shift# captures a slice of
|  the RTS stack up to that frame and copies it into the heap. Restoring a
|  continuation would copy all the captured frames onto the end of the
|  current stack. Sounds simple enough!
|  
|  I would like to experiment with implementing something like this myself,
|  just to see if it would really work, but somehow I doubt it is actually
|  as simple as it sounds. Minor complications are fine, but what worries
|  me are major obstacles I haven’t found in my limited attempts to learn
|  about the RTS.
|  
|  So far, I’ve read the old “The New GHC/Hugs Runtime System” paper, which
|  still seems mostly accurate from a high level, though I imagine many
|  details have changed since then. I’ve