Status and GHC's treatment of Haddock

2020-12-15 Thread Ben Gamari
Hi all,

Sadly `master` has been broken for much of the day. The causal-chain
is roughly the following:

 1. Last week a haddock merge-request was prematurely merged by a
haddock developer, despite not being compilable, to the `ghc-head`
branch.
 2. marge-bot failed during a merge on Sunday
 3. On Monday I finished merging the half-merged batch manually, not
aware of the inconsistent submodule state. However,
while doing so some unintended changes crept in, relying on 
an unmerged haddock commit.
 4. in an attempt to fix the inconsistent submodule state another
developer merged the unmerged commit
 5. I notice the unintentional commits on `master` and start to back
them out
 6. I note that even after backing out the unintentional commits the
tree *still* doesn't build due to haddock
 7. I perform a fair amount of head-scratching trying to figure out what
is going on; eventually I find the non-building commit.
 8. I eventually sort out what needs to be backed-out and in what order

I take a few lessons from this:

 * manual merges, no matter how "straightforward", are always a vector
   for mistakes. I should have triple-checked my work before pushing
   manually pushing the half-merged branch.

 * haddock submodule changes going in prematurely come with a very high
   cost, especially if they don't build. It turns out I had run into
   the non-compilable haddock commit a few times last week but had
   chalked it up to in-flight changes.

On the whole, I am really starting to wonder whether we can't improve
our model for dealing with haddock; it is currently quite easy for haddock
and GHC to enter an inconsistent state. Moreover, the fact that
`haddock`'s `ghc-head` branch runs ahead of `master` means that we won't
realize the problem until it's too late.

One small improvement would be to move Haddock to GitLab. This would at
least ensure that there is a clear connection between haddock MRs and
their corresponding GHC changes. I have suggested this to Alex.

Beyond this, it would be great if someone could step up to finish the
hi-haddock change [1]. I suspect this refactoring has the potential to
greatly improve this situation: the interface file AST tends to be more
stable that the Core AST and has significantly less surface area. After
this is in place Haddock could then use either its own AST or a frozen
HsSyn AST from GHC (faciliated by TTG) internally, greatly reducing the
coupling between Haddock and GHC.

Any takers?

Cheers,

- Ben


signature.asc
Description: PGP signature
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: Nested constructed product results?

2020-12-15 Thread Sebastian Graf
Hi Alexis,

that's a very interesting example you have there!

So far, what we referred to as Nested CPR concerned unboxing for returned
nested *records*, e.g., the `annotation` field in your example. That's what
I try to exploit in !1866
, which after a
rebase that I'll hopefully be doing this week, some more sleuthing and then
documenting what I did will finally make it into master.

CPR'ing the Lambda, i.e., what is returned for `parser`, on the other hand,
is a surprising new opportunity for what Nested CPR could do beyond
unboxing records! And it's pretty simple, too: Because it's a function, we
don't care about subtleties such as whether all callers actually evaluate
the pair that deep (actually, that's wrong, as I realise below). I think
it's entirely within the reach of !1866 today. So we could transform
(provided that `(,) <$> a <*> b` inlines `<$>` and `<*>` and then will
actually have the CPR property)

AnnotatedParser ann1 f <+> AnnotatedParser ann2 g = AnnotatedParser
  { annotation = Seq ann1 ann2
  , parser = \s1 ->
  let !(a, s2) = f s1
  !(b, s3) = g s2
  in ((,) <$> a <*> b, s3)
  }

to

$w<+> :: Annotation
  -> (String -> (Maybe a, String))
  -> Annotation
  -> (String -> (Maybe b, String))
  -> (# Annotation, String -> (# Maybe (a, b), String #) #)
$w<+> ann1 f ann2 g =
  (# Seq ann1 ann2
   , \s1 -> case (\s1 -> let !(a, s2) = f s1
!(b, s3) = g s2
in ((,) <$> a <*> b) s1 of (p, q) -> (#p, q#), s3) #)

<+> :: AnnotatedParser a -> AnnotatedParser b -> AnnotatedParser (a, b)
<+> (AnnotatedParser ann1 f) (AnnotatedParser ann2 g) =
  case $w<+> ann1 f ann2 g of
(# a, b #) -> AnnotatedParser (\s1 -> case a s1 of (# p, q#) -> (p,
q)) b
{-# INLINE <+> #-}

Actually writing out the transformation tells me that this isn't always a
win: We now have to allocate a lambda in the wrapper. That is only a win if
that lambda cancels away at call sites! So we have to make sure that all
call sites of the wrapper actually call the `parser`, so that the lambda
simplifies away. If it doesn't, we have a situation akin to reboxing. So I
was wrong above when I said "we don't care about subtleties such as whether
all callers actually evaluate the pair that deep": We very much need to
know whether all call sites call the lambda. Luckily, I implemented just
that  for
exploitation by Nested CPR! That's the reason why I need to rebase !1866
now. I'll ḱeep you posted.

---

You might wonder why CPR today doesn't care for lambdas. Well, they only
make sense in nested scenarios (otherwise the function wasn't eta-expanded
that far, for good reasons) and CPR currently doesn't bother unboxing
records nestedly, which is what #18174
 discusses and what
!1866 tries to fix.

Cheers,
Sebastian

Am Di., 15. Dez. 2020 um 06:52 Uhr schrieb Alexis King <
lexi.lam...@gmail.com>:

> Hi all,
>
> I spent some time today looking into the performance of a program
> involving a parser type that looks something like this:
>
> data AnnotatedParser a = AnnotatedParser
>   { annotation :: Annotation
>   , parser :: String -> (Maybe a, String)
>   }
>
> The `Annotation` records metadata about the structure of an
> `AnnotatedParser` that can be accessed statically (that is, without
> having to run the parser on some input). `AnnotatedParser`s are built
> from various primitive constructors and composed using various
> combinators. These combinators end up looking something like this:
>
> (<+>) :: AnnotatedParser a -> AnnotatedParser b -> AnnotatedParser (a,
> b)
> AnnotatedParser ann1 f <+> AnnotatedParser ann2 g = AnnotatedParser
>   { annotation = Seq ann1 ann2
>   , parser = \s1 ->
>   let !(a, s2) = f s1
>   !(b, s3) = g s2
>   in ((,) <$> a <*> b, s3)
>   }
>
> Use of these combinators leads to the construction and subsequent case
> analysis of numerous `AnnotatedParser` closures. Happily, constructed
> product result[1] analysis kicks in and rewrites such combinators to cut
> down on the needless boxing, leading to worker/wrapper splits like this:
>
> $w<+> :: Annotation
>   -> (String -> (Maybe a, String))
>   -> Annotation
>   -> (String -> (Maybe b, String))
>   -> (# Annotation, String -> (Maybe (a, b), String) #)
> $w<+> ann1 f ann2 g =
>   (# Seq ann1 ann2
>, \s1 -> let !(a, s2) = f s1
> !(b, s3) = g s2
> in ((,) <$> a <*> b, s3) #)
>
> <+> :: AnnotatedParser a -> AnnotatedParser b -> AnnotatedParser (a, b)
> <+> (AnnotatedParser ann1 f) (AnnotatedParser ann2 g) =
>   case $w<+> ann1 f ann2 g of
> (# a, b #) -> AnnotatedParser a b
> {-# INLINE