Better calling conventions for strict functions (bang patterns)?

2015-10-23 Thread Ryan Newton
Hi all,

With module-level Strict and StrictData pragmas coming soon, one obvious
question is what kind of the code quality GHC can achieve for strict
programs.

When it came up in discussion in our research group we realized we didn't
actually know whether the bang patterns, `f !x`, on function arguments were
enforced by caller or callee.

Here's a Gist that shows the compilation of a trivial function:

foo :: Maybe Int -> Intfoo !x = case x of Just y -> y

   https://gist.github.com/rrnewton/1ac722189c65f26fe9ac

If that function is compiled to *assume* its input is in WHNF, it should be
just as efficient as the isomorphic MLton/OCaml code, right?  It only needs
to branch on the tag, do a field dereference, and return.

But as you can see from the STG and CMM generated, foo *does indeed* enter
the thunk, adding an extra indirect jump.  Here's the body:

c3aY: if ((Sp + -8) < SpLim) goto c3aZ; else goto c3b0; c3aZ: // nop R1 =
PicBaseReg + foo_closure; call (I64[BaseReg - 8])(R2, R1) args: 8, res: 0,
upd: 8; c3b0: I64[Sp - 8] = PicBaseReg + block_c3aO_info; R1 = R2; Sp = Sp
- 8; if (R1 & 7 != 0) goto c3aO; else goto c3aP; c3aP: call (I64[R1])(R1)
returns to c3aO, args: 8, res: 8, upd: 8; c3aO: if (R1 & 7 >= 2) goto c3aW;
else goto c3aX; c3aW: R1 = P64[R1 + 6] & (-8); Sp = Sp + 8; call
(I64[R1])(R1) args: 8, res: 0, upd: 8; c3aX: R1 = PicBaseReg +
lvl_r39S_closure; Sp = Sp + 8; call (I64[R1])(R1) args: 8, res: 0, upd: 8;


The call inside c3aP is entering "x" as a thunk, which also incurs all of
the stack limit check code.  I believe that IF the input could be assumed
to be in WHNF, everything above the label "c3aO" could be omitted.

So... if GHC is going to be a fabulous pure *and* imperative language, and
a fabulous lazy *and* strict compiler/runtime.. is there some work we can
do here to improve this situation? Would the following make sense:

   - Put together a benchmark suite of all-strict programs with
   Strict/StrictData (compare a few benchmark's generated code to MLton, if
   time allows)
   - Modify GHC to change calling conventions for bang patterns -- caller
   enforces WHNF rather than callee.  Existing strictness/demand/cardinality
   analysis would stay the same.

Unless there's something I'm really missing here, the result should be that
you can have a whole chain of strict function calls, each of which knows
its arguments and the arguments it passes to its callees are all in WHNF,
without ever generating thunk-entry sequences.

Thanks for your time,
  -Ryan
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


RE: Better calling conventions for strict functions (bang patterns)?

2015-10-23 Thread Simon Peyton Jones
It’s absolutely the case that bang patterns etc tell the caller what to do, but 
the function CANNOT ASSUME that its argument is evaluated.  Reason: higher 
order functions.

I think that the way to allow functions that can assume their arg is evaluated 
is through types: see Type are calling 
conventions.
  But it’d be a fairly big deal to implement.

Simon


From: ghc-devs [mailto:ghc-devs-boun...@haskell.org] On Behalf Of Ryan Newton
Sent: 23 October 2015 14:54
To: ghc-devs@haskell.org; Ömer Sinan Ağacan; Ryan Scott; Chao-Hong Chen; Johan 
Tibell
Subject: Better calling conventions for strict functions (bang patterns)?

Hi all,

With module-level Strict and StrictData pragmas coming soon, one obvious 
question is what kind of the code quality GHC can achieve for strict programs.

When it came up in discussion in our research group we realized we didn't 
actually know whether the bang patterns, `f !x`, on function arguments were 
enforced by caller or callee.

Here's a Gist that shows the compilation of a trivial function:


foo :: Maybe Int -> Int


foo !x =


  case x of


   Just y -> y


   
https://gist.github.com/rrnewton/1ac722189c65f26fe9ac

If that function is compiled to *assume* its input is in WHNF, it should be 
just as efficient as the isomorphic MLton/OCaml code, right?  It only needs to 
branch on the tag, do a field dereference, and return.

But as you can see from the STG and CMM generated, foo does indeed enter the 
thunk, adding an extra indirect jump.  Here's the body:


  c3aY:


  if ((Sp + -8) < SpLim) goto c3aZ; else goto c3b0;


  c3aZ:


  // nop


  R1 = PicBaseReg + foo_closure;


  call (I64[BaseReg - 8])(R2, R1) args: 8, res: 0, upd: 8;


  c3b0:


  I64[Sp - 8] = PicBaseReg + block_c3aO_info;


  R1 = R2;


  Sp = Sp - 8;


  if (R1 & 7 != 0) goto c3aO; else goto c3aP;


  c3aP:


  call (I64[R1])(R1) returns to c3aO, args: 8, res: 8, upd: 8;


  c3aO:


  if (R1 & 7 >= 2) goto c3aW; else goto c3aX;


  c3aW:


  R1 = P64[R1 + 6] & (-8);


  Sp = Sp + 8;


  call (I64[R1])(R1) args: 8, res: 0, upd: 8;


  c3aX:


  R1 = PicBaseReg + lvl_r39S_closure;


  Sp = Sp + 8;


  call (I64[R1])(R1) args: 8, res: 0, upd: 8;



The call inside c3aP is entering "x" as a thunk, which also incurs all of the 
stack limit check code.  I believe that IF the input could be assumed to be in 
WHNF, everything above the label "c3aO" could be omitted.

So... if GHC is going to be a fabulous pure and imperative language, and a 
fabulous lazy and strict compiler/runtime.. is there some work we can do here 
to improve this situation? Would the following make sense:

  *   Put together a benchmark suite of all-strict programs with 
Strict/StrictData (compare a few benchmark's generated code to MLton, if time 
allows)
  *   Modify GHC to change calling conventions for bang patterns -- caller 
enforces WHNF rather than callee.  Existing strictness/demand/cardinality 
analysis would stay the same.
Unless there's something I'm really missing here, the result should be that you 
can have a whole chain of strict function calls, each of which knows its 
arguments and the arguments it passes to its callees are all in WHNF, without 
ever generating thunk-entry sequences.

Thanks for your time,
  -Ryan

___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: Better calling conventions for strict functions (bang patterns)?

2015-10-23 Thread Ryan Newton
>
>
>1. Small tweaks: The CMM code above seems to be *betting* than the
>thunk is unevaluated, because it does the stack check and stack write
>*before* the predicate test that checks if the thunk is evaluated (if
>(R1 & 7 != 0) goto c3aO; else goto c3aP;).  With a bang-pattern
>function, couldn't it make the opposite bet?  That is, branch on whether
>the thunk is evaluated first, and then the wasted computation is only a
>single correctly predicted branch (and a read of a tag that we need to read
>anyway).
>
> Oh, a small further addition would be needed for this tweak.  In the
generated code above "Sp = Sp + 8;" happens *late*, but I think it could
happen right after the call to the thunk.  In general, does it seem
feasible to separate the slowpath from fastpath as in the following tweak
of the example CMM?


*  // Skip to the chase if it's already evaluated:*
*  start:*
*  if (R2 & 7 != 0) goto fastpath; else goto slowpath;*

*  slowpath:   // Formerly c3aY*
*  if ((Sp + -8) < SpLim) goto c3aZ; else goto c3b0;*
*  c3aZ:*
*  // nop*
*  R1 = PicBaseReg + foo_closure;*
*  call (I64[BaseReg - 8])(R2, R1) args: 8, res: 0, upd: 8;*
*  c3b0:*
*  I64[Sp - 8] = PicBaseReg + block_c3aO_info;*
*  R1 = R2;*
*  Sp = Sp - 8;*

*  call (I64[R1])(R1) returns to fastpath, args: 8, res: 8, upd: 8;*
*  // Sp bump moved to here so it's separate from "fastpath"*
*  Sp = Sp + 8;*

*  fastpath: // Formerly c3aO*
*  if (R1 & 7 >= 2) goto c3aW; else goto c3aX;*
*  c3aW:*
*  R1 = P64[R1 + 6] & (-8);*
*  call (I64[R1])(R1) args: 8, res: 0, upd: 8;*
*  c3aX:*
*  R1 = PicBaseReg + lvl_r39S_closure;*
*  call (I64[R1])(R1) args: 8, res: 0, upd: 8;*
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: Context for typed holes

2015-10-23 Thread David Feuer
I just closed mine as a duplicate of yours.
On Oct 23, 2015 1:55 AM, "Andres Löh"  wrote:

> Actually, #9091 was the one I was really looking for ... reported by
> me. See also the discussion about "given" vs. "wanted" constraints.
>
> Cheers,
>   Andres
>
> On Fri, Oct 23, 2015 at 7:48 AM, David Feuer 
> wrote:
> > I opened https://ghc.haskell.org/trac/ghc/ticket/10954 for this. #9479,
> by
> > Dominique Devriese, is complementary--she wants instance information for
> a
> > *hole* with an ambiguous type.
> >
> > On Oct 23, 2015 1:28 AM, "Andres Löh"  wrote:
> >>
> >> Hi.
> >>
> >> On Oct 23, 2015 01:15, "Manuel M T Chakravarty" 
> >> wrote:
> >> >
> >> > I think, this is a good point. Maybe you should make a ticket for it.
> >>
> >> #9479, I think.
> >>
> >> Cheers,
> >> Andres
> >>
> >> >> David Feuer :
> >> >>
> >> >> Unless something has changed really recently that I've missed, the
> >> >> typed holes messages are missing some really important information:
> instance
> >> >> information for types in scope. When I am trying to fill in a hole,
> I look
> >> >> to the "relevant bindings" to show me what pieces I have available
> to use.
> >> >> Those pieces don't include contexts! Is there something
> fundamentally hard
> >> >> about adding this information? I'd only want instance information
> for type
> >> >> variables--providing it for concrete types would make too much
> noise. I'd
> >> >> also want information on equality constraints, of course.
> >> >>
> >> >> ___
> >> >> ghc-devs mailing list
> >> >> ghc-devs@haskell.org
> >> >> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
> >> >
> >> >
> >> >
> >> > ___
> >> > ghc-devs mailing list
> >> > ghc-devs@haskell.org
> >> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
> >> >
>
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


RE: Context for typed holes

2015-10-23 Thread Simon Peyton Jones
Indeed.  Please also do give concrete examples, so that we all talk about the 
same proposal.

Simon

From: ghc-devs [mailto:ghc-devs-boun...@haskell.org] On Behalf Of Manuel M T 
Chakravarty
Sent: 23 October 2015 00:16
To: David Feuer
Cc: ghc-devs
Subject: Re: Context for typed holes

I think, this is a good point. Maybe you should make a ticket for it.

Manuel

David Feuer >:

Unless something has changed really recently that I've missed, the typed holes 
messages are missing some really important information: instance information 
for types in scope. When I am trying to fill in a hole, I look to the "relevant 
bindings" to show me what pieces I have available to use. Those pieces don't 
include contexts! Is there something fundamentally hard about adding this 
information? I'd only want instance information for type variables--providing 
it for concrete types would make too much noise. I'd also want information on 
equality constraints, of course.
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: Better calling conventions for strict functions (bang patterns)?

2015-10-23 Thread Ryan Newton
Ah, yes, so just to give a concrete example in this thread, if we take the
`foo` function above and say `map foo ls`, we may well get unevaluated
arguments to foo.  (And this is almost precisely the same as the first
example that Strict-Core paper!)

Thanks for the paper reference.  I read it and it's great -- just what I
was looking for.  An approach that eliminates any jealousy of ML/Scheme
compiler techniques vis a vis calling conventions ;-).  I'm also wondering
if there are some incremental steps that can be taken, short of what is
proposed in the paper.

   1. Small tweaks: The CMM code above seems to be *betting* than the thunk
   is unevaluated, because it does the stack check and stack write *before* the
   predicate test that checks if the thunk is evaluated (if (R1 & 7 != 0)
   goto c3aO; else goto c3aP;).  With a bang-pattern function, couldn't it
   make the opposite bet?  That is, branch on whether the thunk is evaluated
   first, and then the wasted computation is only a single correctly predicted
   branch (and a read of a tag that we need to read anyway).

   2. The option of multiple entrypoints which is considered and discarded
   as fragile in the beginning of the paper (for direct call vs indirect / 1st
   order vs higher order).  That fragile option is along the lines of what I
   wanted to discuss on this thread.  It does seem like a tricky phase
   ordering concern, but how bad is it exactly?  The conflict with the a
   case-expr rewrite is illustrated clearly in the paper, but that just means
   that such optimizations must happen *before *the final choice of which
   function entrypoint to call, doesn't it?  I'm not 100% sure where it could
   go in the current compiler pipeline, but couldn't the adjustment of call
   target from "foo" to "foo_with_known_whnf_args" happen quite late?

Cheers,
  -Ryan

P.S. One of the students CC'd, Ryan Scott, is currently on internship at
Intel labs and is working to (hopefully) liberate the Intell Haskell
Research Compiler as open source.  Like the 2009 paper, it also uses a
strict IR, and I think it will be interesting to see exactly how it handles
the conversion from Core to its IR.  (Probably the same as Fig 10 in the
paper.)


On Fri, Oct 23, 2015 at 10:11 AM, Simon Peyton Jones 
wrote:

> It’s absolutely the case that bang patterns etc tell the caller what to
> do, but the function CANNOT ASSUME that its argument is evaluated.  Reason:
> higher order functions.
>
>
>
> I think that the way to allow functions that can assume their arg is
> evaluated is through types: see Type are calling conventions
> .
> But it’d be a fairly big deal to implement.
>
>
>
> Simon
>
>
>
>
>
> *From:* ghc-devs [mailto:ghc-devs-boun...@haskell.org] *On Behalf Of *Ryan
> Newton
> *Sent:* 23 October 2015 14:54
> *To:* ghc-devs@haskell.org; Ömer Sinan Ağacan; Ryan Scott; Chao-Hong
> Chen; Johan Tibell
> *Subject:* Better calling conventions for strict functions (bang
> patterns)?
>
>
>
> Hi all,
>
>
>
> With module-level Strict and StrictData pragmas coming soon, one obvious
> question is what kind of the code quality GHC can achieve for strict
> programs.
>
>
>
> When it came up in discussion in our research group we realized we didn't
> actually know whether the bang patterns, `f !x`, on function arguments were
> enforced by caller or callee.
>
>
>
> Here's a Gist that shows the compilation of a trivial function:
>
> foo :: Maybe Int -> Int
>
> foo !x =
>
>   case x of
>
>Just y -> y
>
>
>
>https://gist.github.com/rrnewton/1ac722189c65f26fe9ac
> 
>
>
>
> If that function is compiled to *assume* its input is in WHNF, it should
> be just as efficient as the isomorphic MLton/OCaml code, right?  It only
> needs to branch on the tag, do a field dereference, and return.
>
>
>
> But as you can see from the STG and CMM generated, foo *does indeed*
> enter the thunk, adding an extra indirect jump.  Here's the body:
>
>   c3aY:
>
>   if ((Sp + -8) < SpLim) goto c3aZ; else goto c3b0;
>
>   c3aZ:
>
>   // nop
>
>   R1 = PicBaseReg + foo_closure;
>
>   call (I64[BaseReg - 8])(R2, R1) args: 8, res: 0, upd: 8;
>
>   c3b0:
>
>   I64[Sp - 8] = PicBaseReg + block_c3aO_info;
>
>   R1 = R2;
>
>   Sp = Sp - 8;
>
>   if (R1 & 7 != 0) goto c3aO; else goto c3aP;
>
>   c3aP:
>
>   call (I64[R1])(R1) returns to c3aO, args: 8, res: 8, upd: 8;
>
>   c3aO:
>
>   if (R1 & 7 >= 2) goto c3aW; else goto c3aX;
>
>   c3aW:
>
>   R1 = P64[R1 + 6] & (-8);
>
>   Sp = Sp + 8;
>
>   call