Re: Re[2]: Transparently hooking into the STG stack to validate an escape analysis

2021-12-16 Thread Csaba Hruska
Thanks for the feedback!
Let's have a video meeting. My schedule is flexible. What time is best for
you?

Cheers,
Csaba

On Thu, Dec 16, 2021 at 5:29 PM Sebastian Graf  wrote:

> Hey Csaba,
>
> After catching up on your talk and reflecting about it a bit, it seems
> quite obvious that your tool is the right way to collect the data and
> validate our analysis!
> Even if meanwhile we decided that a "transparent stack frame" (which I
> believe is something similar to what you are doing here
> ,
> with an explicit `argCount` which we do not know) is not the ideal solution
> we've been looking for (for different reasons).
>
> Essentially, we have two things
>
>1. An escape analysis, implemented as an STG-to-STG pass that attaches
>a boolean flag to each Id whether it "escapes its scope" (for a suitable
>definition of that).
>We'd like to implement it in a way that it would be reusable within
>GHC with moderate effort (e.g., renaming Binder to Id or accounting for
>different fields), operating on a module at a time rather than whole
>program.
>2. The instrumentation that tries to measure how many heap objects
>could be allocated on the stack. E.g., set a closure-specific flag whenever
>the closure is entered, unset that bit (once) when we "leave the scope"
>that defines the closure.
>If my understanding is right, we could just implement this
>"instrumentation" as a simple extra field to Closure
>
> ,
>right? Neat!
>
> A bit tangential: I see that your interpreter currently allocates a fresh
> closure for let-no-escapes
> 
> when it could just re-use the closure of its defining scope. That would
> skew our numbers somewhat compared to instrumenting GHC-compiled programs,
> but I think we'd be able to work around that. I also wonder if the
> semantics of your let-no-escapes are actually as typically specified
> (operationally), in that a jump to a let-no-escape should also reset the
> stack pointer. It should hardly matter for the programs that GHC generates,
> though.
>
> I would also be interested in knowing whether the +RTS -s "bytes allocated
> in the heap" metric (very nearly) coincides with a similar metric you could
> produce. It would be fantastic if that was the case! Theoretically, that
> should be possible, right?
>
> I think your interpreter work is very valuable to collect data we
> otherwise would only be able to measure with a TickyTicky-based approach.
> Nice!
> Another, similar use case would be to identify the fraction of closures
> that are only entered once. I remember that there was a ticky-based patch
> with which Joachim used to measure this fraction
> 
> (and similarly validate the analysis results), but unfortunately it
> couldn't end up in master. Ah, yes, we have a whole open ticket about it:
> #10613 . In fact, that
> instrumentation is also somewhat similar (adding a field to every closure)
> as what we want to do.
>
> Anyway, it seems like your work will be very valuable in replacing some of
> the annoying ticky-based instrumentation ideas!
>
> Maybe we can have a call some time this or next week to discuss details,
> once Sebastian and I are more familiar with the code base?
>
> Thanks for sticking with the project and doing all the hard work that can
> build upon!
> Sebastian
>
> -- Originalnachricht --
> Von: "Csaba Hruska" 
> An: "Sebastian Graf" 
> Cc: "ghc-devs" ; "Sebastian Scheper" <
> sebastian.sche...@student.kit.edu>
> Gesendet: 15.12.2021 16:16:27
> Betreff: Re: Transparently hooking into the STG stack to validate an
> escape analysis
>
> Hi,
>
> IMO the Cmm STG machine implementation is just too complex for student
> projects. It's not fun to work with at all.
> Why did you choose this approach?
> IMO the escape analysis development and validation would be much smoother
> and fun when you'd use the external STG interpreter.
> When you have a solid and working design of your analysis and
> transformations then you could implement it in GHC's native backend if it
> needs any changes at all.
>
> What do you think?
> Do you disagree?
>
> Have you seen my presentation about the stg interpreter?
> https://www.youtube.com/watch?v=Ey5OFPkxF_w
>
> Cheers,
> Csaba
>
> On Wed, Dec 8, 2021 at 11:20 AM Sebastian Graf 
> wrote:
>
>> Hi Devs,
>>
>> my master's student Sebastian and I 

Re[2]: Transparently hooking into the STG stack to validate an escape analysis

2021-12-16 Thread Sebastian Graf

Hey Csaba,

After catching up on your talk and reflecting about it a bit, it seems 
quite obvious that your tool is the right way to collect the data and 
validate our analysis!
Even if meanwhile we decided that a "transparent stack frame" (which I 
believe is something similar to what you are doing here 
, 
with an explicit `argCount` which we do not know) is not the ideal 
solution we've been looking for (for different reasons).


Essentially, we have two things
An escape analysis, implemented as an STG-to-STG pass that attaches a 
boolean flag to each Id whether it "escapes its scope" (for a suitable 
definition of that).
We'd like to implement it in a way that it would be reusable within GHC 
with moderate effort (e.g., renaming Binder to Id or accounting for 
different fields), operating on a module at a time rather than whole 
program.The instrumentation that tries to measure how many heap objects 
could be allocated on the stack. E.g., set a closure-specific flag 
whenever the closure is entered, unset that bit (once) when we "leave 
the scope" that defines the closure.
If my understanding is right, we could just implement this 
"instrumentation" as a simple extra field to Closure 
, 
right? Neat!
A bit tangential: I see that your interpreter currently allocates a 
fresh closure for let-no-escapes 
 
when it could just re-use the closure of its defining scope. That would 
skew our numbers somewhat compared to instrumenting GHC-compiled 
programs, but I think we'd be able to work around that. I also wonder if 
the semantics of your let-no-escapes are actually as typically specified 
(operationally), in that a jump to a let-no-escape should also reset the 
stack pointer. It should hardly matter for the programs that GHC 
generates, though.


I would also be interested in knowing whether the +RTS -s "bytes 
allocated in the heap" metric (very nearly) coincides with a similar 
metric you could produce. It would be fantastic if that was the case! 
Theoretically, that should be possible, right?


I think your interpreter work is very valuable to collect data we 
otherwise would only be able to measure with a TickyTicky-based 
approach. Nice!
Another, similar use case would be to identify the fraction of closures 
that are only entered once. I remember that there was a ticky-based 
patch with which Joachim used to measure this fraction 
 
(and similarly validate the analysis results), but unfortunately it 
couldn't end up in master. Ah, yes, we have a whole open ticket about 
it: #10613 . In fact, 
that instrumentation is also somewhat similar (adding a field to every 
closure) as what we want to do.


Anyway, it seems like your work will be very valuable in replacing some 
of the annoying ticky-based instrumentation ideas!


Maybe we can have a call some time this or next week to discuss details, 
once Sebastian and I are more familiar with the code base?


Thanks for sticking with the project and doing all the hard work that 
can build upon!

Sebastian

-- Originalnachricht --
Von: "Csaba Hruska" 
An: "Sebastian Graf" 
Cc: "ghc-devs" ; "Sebastian Scheper" 


Gesendet: 15.12.2021 16:16:27
Betreff: Re: Transparently hooking into the STG stack to validate an 
escape analysis



Hi,

IMO the Cmm STG machine implementation is just too complex for student 
projects. It's not fun to work with at all.

Why did you choose this approach?
IMO the escape analysis development and validation would be much 
smoother and fun when you'd use the external STG interpreter.
When you have a solid and working design of your analysis and 
transformations then you could implement it in GHC's native backend if 
it needs any changes at all.


What do you think?
Do you disagree?

Have you seen my presentation about the stg interpreter?
https://www.youtube.com/watch?v=Ey5OFPkxF_w

Cheers,
Csaba

On Wed, Dec 8, 2021 at 11:20 AM Sebastian Graf  
wrote:

Hi Devs,

my master's student Sebastian and I (also Sebastian :)) are working on 
an escape analysis in STG, see 
https://gitlab.haskell.org/ghc/ghc/-/issues/16891#note_347903.


We have a prototype for the escape analysis that we want to 
validate/exploit now.
The original plan was to write the transformation that allocates 
non-escaping objects on the STG stack. But that is quite tricky for 
many reasons, one of them being treatment of the 

Re: Thoughts on async RTS API?

2021-12-16 Thread Cheng Shao
Hi Alex,

Thanks for reminding. hs_try_put_mvar() wouldn't work for our use
case. If the C function finishes work and calls hs_try_put_mvar()
synchronously, in Haskell takeMVar wouldn't block at all, which is all
fine. However, if the C function is expected to call hs_try_put_mvar()
asynchronously, the non-threaded RTS will hang!

Here's a minimal repro. It works with non-threaded RTS at first, but
if you change scheduleCallback() in C so hs_try_put_mvar() is only
invoked in a detached pthread, then the program hangs.

The proposed async RTS API and related scheduler refactorings can't be
avoided, if the MVar is intended to be fulfilled in an async manner,
using the non-threaded RTS, on a platform with extremely limited
syscall capabilities.

```haskell
import Control.Concurrent
import Control.Exception
import Foreign
import Foreign.C
import GHC.Conc

main :: IO ()
main = makeExternalCall >>= print

makeExternalCall :: IO CInt
makeExternalCall = mask_ $ do
  mvar <- newEmptyMVar
  sp <- newStablePtrPrimMVar mvar
  fp <- mallocForeignPtr
  withForeignPtr fp $ \presult -> do
(cap,_) <- threadCapability =<< myThreadId
scheduleCallback sp cap presult
takeMVar mvar
peek presult

foreign import ccall "scheduleCallback"
  scheduleCallback :: StablePtr PrimMVar
   -> Int
   -> Ptr CInt
   -> IO ()
```

```c
#include "HsFFI.h"
#include "Rts.h"
#include "RtsAPI.h"
#include 
#include 

struct callback {
HsStablePtr mvar;
int cap;
int *presult;
};

void* callback(struct callback *p)
{
usleep(1000);
*p->presult = 42;
hs_try_putmvar(p->cap,p->mvar);
free(p);
return NULL;
}

void scheduleCallback(HsStablePtr mvar, HsInt cap, int *presult)
{
pthread_t t;
struct callback *p = malloc(sizeof(struct callback));
p->mvar = mvar;
p->cap = cap;
p->presult = presult;
// pthread_create(, NULL, callback, p);
// pthread_detach(t);
callback(p);
}
```

On Thu, Dec 16, 2021 at 12:10 PM Alexander V Vershilov
 wrote:
>
> Hello, replying off-the thread as it would be basically an offtopic.
>
> But you can achieve the solution using MVars only.
> The idea is that you can call mkStablePtr on the MVar that way it will
> not be marked as dead, so RTS will not exit.
> Then you can use hs_try_put_mvar in C thread to call the thread back.
>
> On Wed, 15 Dec 2021 at 05:07, Cheng Shao  wrote:
> >
> > Hi devs,
> >
> > To invoke Haskell computation in C, we need to call one of rts_eval*
> > functions, which enters the scheduler loop, and returns only when the
> > specified Haskell thread is finished or killed. We'd like to enhance
> > the scheduler and add async variants of the rts_eval* functions, which
> > take C callbacks to consume the Haskell thread result, kick off the
> > scheduler loop, and the loop is allowed to exit when the Haskell
> > thread is blocked. Sync variants of RTS API will continue to work with
> > unchanged behavior.
> >
> > The main intended use case is async foreign calls for the WebAssembly
> > target. When an async foreign call is made, the Haskell thread will
> > block on an MVar to be fulfilled with the call result. But the
> > scheduler will eventually fail to find work due to empty run queue and
> > exit with error! We need a way to gracefully exit the scheduler, so
> > the RTS API caller can process the async foreign call, fulfill that
> > MVar and resume Haskell computation later.
> >
> > Question I: does the idea of adding async RTS API sound acceptable by
> > GHC HQ? To be honest, it's not impossible to workaround lack of async
> > RTS API: reuse the awaitEvent() logic in non-threaded RTS, pretend
> > each async foreign call reads from a file descriptor and can be
> > handled by the POSIX select() function in awaitEvent(). But it'd
> > surely be nice to avoid such hacks and do things the principled way.
> >
> > Question II: how to modify the scheduler loop to implement this
> > feature? Straightforward answer seems to be: check some RTS API
> > non-blocking flag, if present, allow early exit due to empty run
> > queue.
> >
> > Thanks a lot for reading this, I appreciate any suggestions or
> > questions :)
> >
> > Best regards,
> > Cheng
> > ___
> > ghc-devs mailing list
> > ghc-devs@haskell.org
> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>
>
>
> --
> --
> Alexander
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs