Re: [EXTERNAL] Unexpected duplicate join points in "Core" output?

2021-11-20 Thread Viktor Dukhovni
On Sat, Nov 20, 2021 at 09:15:15PM +, Simon Peyton Jones via ghc-devs wrote:

> GHC.Core.Opt.CSE is conservative at the moment, and never CSE's *any*
> join point.  It would not be hard to make it clever enough to CSE join
> points, but no one has yet done it.
> 
> Do open a ticket!

Thanks, I opened https://gitlab.haskell.org/ghc/ghc/-/issues/20717

-- 
Viktor.
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


RE: [EXTERNAL] Unexpected duplicate join points in "Core" output?

2021-11-20 Thread Simon Peyton Jones via ghc-devs
There is absolutely no reason not to common-up those to join points.  But we 
can't common up some join points when we could if they were let's.  Consider

join j1 x = x+1
in case v of
  A -> f (join j2 x = x+1 in ...j2...)
  B -> j1...
  C -> j1...

Even though j2 is identical to j1's, we can't eliminate j2 in favour of j1 
because then j1 wouldn't be a join point any more.

GHC.Core.Opt.CSE is conservative at the moment, and never CSE's *any* join 
point.  It would not be hard to make it clever enough to CSE join points, but 
no one has yet done it.

Do open a ticket!

Simon

PS: I am leaving Microsoft at the end of November 2021, at which point 
simo...@microsoft.com will cease to work.  Use simon.peytonjo...@gmail.com 
instead.  (For now, it just forwards to simo...@microsoft.com.)

| -Original Message-
| From: ghc-devs  On Behalf Of Viktor
| Dukhovni
| Sent: 20 November 2021 00:57
| To: ghc-devs@haskell.org
| Subject: [EXTERNAL] Unexpected duplicate join points in "Core" output?
| 
| The below "Core" output from "ghc -O2" (9.2/8.10) for the attached
| program shows seemingly rendundant join points:
| 
|   join {
| exit :: State# RealWorld -> (# State# RealWorld, () #)
| exit (ipv :: State# RealWorld) = jump $s$j ipv } in
| 
|   join {
| exit1 :: State# RealWorld -> (# State# RealWorld, () #)
| exit1 (ipv :: State# RealWorld) = jump $s$j ipv } in
| 
| that are identical in all but name.  These correspond to fallthrough to
| the "otherwise" case in:
| 
|...
|| acc < q || (acc == q && d <= 5)
|  -> loop (ptr `plusPtr` 1) (acc * 10 + d)
|| otherwise -> return Nothing
| 
| but it seems that the generated X86_64 code (also below) ultimately
| consolidates these into a single target... Is that why it is harmless
| to leave these duplicated in the generated "Core"?
| 
| [ Separately, in the generated machine code, it'd also be nice to avoid
|   comparing the same "q" with the accumulator twice.  A single load and
|   compare should I think be enough, as I'd expect the status flags to
|   persist across the jump the second test.
| 
|   This happens to not be performance critical in my case, because most
|   calls should satisfy the first test, but generally I think that 3-way
|   "a < b", "a == b", "a > b" branches ideally avoid comparing twice...
| ]
| 
|  Associated Core output
| 
| -- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
| main2 :: Addr#
| main2 = "12345678901234567890 junk"#
| 
| -- RHS size: {terms: 129, types: 114, coercions: 0, joins: 6/8}
| main1 :: State# RealWorld -> (# State# RealWorld, () #)
| main1
|   = \ (eta :: State# RealWorld) ->
|   let {
| end :: Addr#
| end = plusAddr# main2 25# } in
|   join {
| $s$j :: State# RealWorld -> (# State# RealWorld, () #)
| $s$j _ = hPutStr2 stdout $fShowMaybe4 True eta } in
|   join {
| exit :: State# RealWorld -> (# State# RealWorld, () #)
| exit (ipv :: State# RealWorld) = jump $s$j ipv } in
|   join {
| exit1 :: State# RealWorld -> (# State# RealWorld, () #)
| exit1 (ipv :: State# RealWorld) = jump $s$j ipv } in
|   join {
| exit2
|   :: Addr# -> Word# -> State# RealWorld -> (# State#
| RealWorld, () #)
| exit2 (ww :: Addr#) (ww1 :: Word#) (ipv :: State#
| RealWorld)
|   = case eqAddr# ww main2 of {
|   __DEFAULT ->
| hPutStr2
|   stdout
|   (++
|  $fShowMaybe1
|  (case $w$cshowsPrec3 11# (integerFromWord#
| ww1) [] of
|   { (# ww3, ww4 #) ->
|   : ww3 ww4
|   }))
|   True
|   eta;
|   1# -> jump $s$j ipv
| } } in
|   joinrec {
| $wloop
|   :: Addr# -> Word# -> State# RealWorld -> (# State#
| RealWorld, () #)
| $wloop (ww :: Addr#) (ww1 :: Word#) (w :: State# RealWorld)
|   = join {
|   getDigit :: State# RealWorld -> (# State# RealWorld,
| () #)
|   getDigit (eta1 :: State# RealWorld)
| = case eqAddr# ww end of {
| __DEFAULT ->
|   case readWord8OffAddr# ww 0# eta1 of { (#
| ipv, ipv1 #) ->
|   let {
| ipv2 :: Word#
| ipv2 = minusWord# (word8ToWord# ipv1) 48##
| } in
|   case gtWord# ipv2 9## of {
| __DEFAULT ->
|   case ltWord# ww1 1844674407370955161## of
| {
| __DEFAULT ->
|  

Re: [Take 2] Unexpected duplicate join points in "Core" output?

2021-11-20 Thread Andreas Klebinger

At this point I think it would be good if you could put your problem
into a ghc-ticket.

I can't look in detail into this in greater detail atm because of time
constraints.
And without a ticket it's likely to fall by the wayside eventually.
But it does seem like something where we maybe could do better.

And having good examples for the problematic behaviour is always
immensely helpful
to solve these kinds of problems.

Cheers
Andreas

Am 20/11/2021 um 19:54 schrieb Viktor Dukhovni:

On Sat, Nov 20, 2021 at 12:49:08PM +0100, Andreas Klebinger wrote:


For the assembly I opened a ticket:
https://gitlab.haskell.org/ghc/ghc/-/issues/20714

Thanks, much appreciated.  Understood re redundant join points, though
in the non-toy context the redundnat point code is noticeably larger.

 join {
   exit4
 :: Addr# -> Word# -> State# RealWorld -> Maybe (Int64, 
ByteString)
   exit4 (ww4 :: Addr#) (ww5 :: Word#) (ipv :: State# RealWorld)
 = case touch# dt1 ipv of { __DEFAULT ->
   let {
 dt3 :: Int#
 dt3 = minusAddr# ww4 dt } in
   case ==# dt3 dt2 of {
 __DEFAULT -> jump exit1 ww2 wild dt dt1 dt2 cs dt3 ww5;
 1# -> jump $wconsume cs (orI# ww2 dt3) ww5
   }
   } } in
 join {
   exit5
 :: Addr# -> Word# -> State# RealWorld -> Maybe (Int64, 
ByteString)
   exit5 (ww4 :: Addr#) (ww5 :: Word#) (w1 :: State# RealWorld)
 = case touch# dt1 w1 of { __DEFAULT ->
   let {
 dt3 :: Int#
 dt3 = minusAddr# ww4 dt } in
   case ==# dt3 dt2 of {
 __DEFAULT -> jump exit1 ww2 wild dt dt1 dt2 cs dt3 ww5;
 1# -> jump $wconsume cs (orI# ww2 dt3) ww5
   }
   } } in

FWIW, these don't appear to be deduplicated, both result from the same
conditional: `acc < q || acc == q && d < 5`.  I need some way to make
this compute a single boolean value without forking the continuation.

There's a another source of code bloat that I'd like to run by you...
In the WIP code for Lazy ByteString 'readInt', I started with:

   readInt !q !r =
 \ !s -> consume s False 0
   where
 -- All done
 consume s@Empty !valid !acc
 = if valid then convert acc s else Nothing
 -- skip empty chunk
 consume (Chunk (BI.BS _ 0) cs) !valid !acc
-- Recurse
 = consume cs valid acc
 -- process non-empty chunk
 consume s@(Chunk c@(BI.BS _ !len) cs) !valid !acc
 = case _digits q r c acc of
 Result used acc'
 | used <= 0 -- No more digits present
   -> if valid then convert acc' s else Nothing
 | used < len -- valid input not entirely digits
   -> let !c' = BU.unsafeDrop used c
   in convert acc' $ Chunk c' cs
 | otherwise -- try to read more digits
-- Recurse
   -> consume cs True acc'
 Overflow -> Nothing

Now _digits is the I/O loop I shared before, and the calling code gets
inlined into that recursive loop with various join points.  But the loop
gets forked into multiple copies which are compiled separately, because
there are two different recursive calls into "consume" that got compiled
into separate "joinrec { ... }".

So I tried instead:

   readInt !q !r =
 \ !s -> consume s False 0
   where
 -- All done
 consume s@Empty !valid !acc
 = if valid then convert acc s else Nothing
 consume s@(Chunk c@(BI.BS _ !len) cs) !valid !acc
 = case _digits q r c acc of
 Result used acc'
 | used == len -- try to read more digits
-- Recurse
   -> consume cs (valid || used > 0) acc'
 | used > 0 -- valid input not entirely digits
   -> let !c' = BU.unsafeDrop used c
   in convert acc' $ Chunk c' cs
 | otherwise -- No more digits present
   -> if valid then convert acc' s else Nothing
 Overflow -> Nothing

But was slightly surprised to find even more duplication (3 copies
instead of tw) of the I/O loop, because in the call:

 consume cs (valid || used > 0) acc'

the boolean argument got floated out, giving:

 case valid of
 True -> consume cs True acc'
 _ -> case used > 0 of
 True -> consume cs True acc'
 _-> consume cs False acc'

and each of these then generates essentially the same code.  To get the
code to be emitted 

Re: [Take 2] Unexpected duplicate join points in "Core" output?

2021-11-20 Thread Viktor Dukhovni
On Sat, Nov 20, 2021 at 01:54:36PM -0500, Viktor Dukhovni wrote:

> Is there some way for GHC to figure out to not float out such cheap
> computations?  The 'Result' constructor is strict, so there's no cost to
> evaluating `used > 0`, and cloning the entire computation is I think
> the more unfortunate choice...

I managed to get the loop to not emit duplicate code bloat by
inserting another NOINLINE term:

!keepGoing = acc < q || acc == q && d <= r
{-# NOINLINE keepGoing #-}

Thus the below produces Core with no significant bloat, matching roughly
what one might (reasonably?/naively?) expect.  But I am reluctant to
actually include such work-arounds in the PR, the code that produces
more "bloated" Core is easier to understand and maintain...

_digits :: Accum -> Accum -> BI.ByteString -> Accum -> Result   
  {-# INLINE _digits #-}

_digits !q !r !(BI.BS !fp !len) = \ !acc -> 
  
BI.accursedUnutterablePerformIO $   
  BI.unsafeWithForeignPtr 
fp $ \ptr -> do 
  let end = ptr `plusPtr` len   

go ptr end ptr acc  
where   

go start end = loop 
where   

loop !ptr !acc | ptr == end 

  = return $ Result (ptr `minusPtr` start) acc  
loop !ptr !acc = 
getDigit >>= \ !d ->
if | d <= 9-> update d
   | otherwise -> return $ Result (ptr `minusPtr` start) acc
  where
fromDigit = \w -> fromIntegral w - 0x30 -- i.e. w - '0'
--
{-# NOINLINE getDigit #-}
getDigit | ptr /= end = fromDigit <$> peek ptr
 | otherwise  = pure 10  -- End of input
--
update d
| keepGoing = loop (ptr `plusPtr` 1) (acc * 10 + d)
| otherwise = return Overflow
  where
{-# NOINLINE keepGoing #-}
!keepGoing = acc < q || acc == q && d <= r

The Core code is now, with the duplicate comparison as the only visible
inefficiency.

-- The exit/exit3 joins could be combined but are small,
-- ditto with exit1/exit2.

Rec {
-- RHS size: {terms: 190, types: 146, coercions: 0, joins: 8/10}
$wconsume
  :: ByteString -> Int# -> Word# -> Maybe (Word64, ByteString)
$wconsume
  = \ (w :: ByteString) (ww :: Int#) (ww1 :: Word#) ->
  case w of wild {
Empty ->
  case ww of {
__DEFAULT -> Just (W64# ww1, Empty);
0# -> Nothing
  };
Chunk dt dt1 dt2 cs ->
  let {
end :: Addr#
end = plusAddr# dt dt2 } in
  join {
$s$j
  :: Int# -> Word# -> State# RealWorld -> Maybe (Word64, 
ByteString)
$s$j (sc :: Int#) (sc1 :: Word#) (sc2 :: State# RealWorld)
  = case touch# dt1 sc2 of { __DEFAULT ->
case ==# sc dt2 of {
  __DEFAULT ->
case ># sc 0# of {
  __DEFAULT ->
case ww of {
  __DEFAULT -> Just (W64# sc1, wild);
  0# -> Nothing
};
  1# -> Just (W64# sc1, Chunk (plusAddr# dt sc) dt1 (-# 
dt2 sc) cs)
};
  1# -> $wconsume cs (orI# ww sc) sc1
}
} } in
  join {
exit
  :: Addr# -> Word# -> State# RealWorld -> Maybe (Word64, 
ByteString)
exit (ww2 :: Addr#) (ww3 :: Word#) (ipv :: State# RealWorld)
  = jump $s$j (minusAddr# ww2 dt) ww3 ipv } in
  join {
exit1 :: State# RealWorld -> Maybe (Word64, 

Re: [Take 2] Unexpected duplicate join points in "Core" output?

2021-11-20 Thread Viktor Dukhovni
On Sat, Nov 20, 2021 at 12:49:08PM +0100, Andreas Klebinger wrote:

> For the assembly I opened a ticket:
> https://gitlab.haskell.org/ghc/ghc/-/issues/20714

Thanks, much appreciated.  Understood re redundant join points, though
in the non-toy context the redundnat point code is noticeably larger.

join {
  exit4
:: Addr# -> Word# -> State# RealWorld -> Maybe (Int64, 
ByteString)
  exit4 (ww4 :: Addr#) (ww5 :: Word#) (ipv :: State# RealWorld)
= case touch# dt1 ipv of { __DEFAULT ->
  let {
dt3 :: Int#
dt3 = minusAddr# ww4 dt } in
  case ==# dt3 dt2 of {
__DEFAULT -> jump exit1 ww2 wild dt dt1 dt2 cs dt3 ww5;
1# -> jump $wconsume cs (orI# ww2 dt3) ww5
  }
  } } in
join {
  exit5
:: Addr# -> Word# -> State# RealWorld -> Maybe (Int64, 
ByteString)
  exit5 (ww4 :: Addr#) (ww5 :: Word#) (w1 :: State# RealWorld)
= case touch# dt1 w1 of { __DEFAULT ->
  let {
dt3 :: Int#
dt3 = minusAddr# ww4 dt } in
  case ==# dt3 dt2 of {
__DEFAULT -> jump exit1 ww2 wild dt dt1 dt2 cs dt3 ww5;
1# -> jump $wconsume cs (orI# ww2 dt3) ww5
  }
  } } in

FWIW, these don't appear to be deduplicated, both result from the same
conditional: `acc < q || acc == q && d < 5`.  I need some way to make
this compute a single boolean value without forking the continuation.

There's a another source of code bloat that I'd like to run by you...
In the WIP code for Lazy ByteString 'readInt', I started with:

  readInt !q !r =
\ !s -> consume s False 0
  where
-- All done
consume s@Empty !valid !acc
= if valid then convert acc s else Nothing
-- skip empty chunk
consume (Chunk (BI.BS _ 0) cs) !valid !acc
-- Recurse
= consume cs valid acc
-- process non-empty chunk
consume s@(Chunk c@(BI.BS _ !len) cs) !valid !acc
= case _digits q r c acc of
Result used acc'
| used <= 0 -- No more digits present
  -> if valid then convert acc' s else Nothing
| used < len -- valid input not entirely digits
  -> let !c' = BU.unsafeDrop used c
  in convert acc' $ Chunk c' cs
| otherwise -- try to read more digits
-- Recurse
  -> consume cs True acc'
Overflow -> Nothing

Now _digits is the I/O loop I shared before, and the calling code gets
inlined into that recursive loop with various join points.  But the loop
gets forked into multiple copies which are compiled separately, because
there are two different recursive calls into "consume" that got compiled
into separate "joinrec { ... }".

So I tried instead:

  readInt !q !r =
\ !s -> consume s False 0
  where
-- All done
consume s@Empty !valid !acc
= if valid then convert acc s else Nothing
consume s@(Chunk c@(BI.BS _ !len) cs) !valid !acc
= case _digits q r c acc of
Result used acc'
| used == len -- try to read more digits
-- Recurse
  -> consume cs (valid || used > 0) acc'
| used > 0 -- valid input not entirely digits
  -> let !c' = BU.unsafeDrop used c
  in convert acc' $ Chunk c' cs
| otherwise -- No more digits present
  -> if valid then convert acc' s else Nothing
Overflow -> Nothing

But was slightly surprised to find even more duplication (3 copies
instead of tw) of the I/O loop, because in the call:

consume cs (valid || used > 0) acc'

the boolean argument got floated out, giving:

case valid of
True -> consume cs True acc'
_ -> case used > 0 of
True -> consume cs True acc'
_-> consume cs False acc'

and each of these then generates essentially the same code.  To get the
code to be emitted just once, I had to switch from a Bool "valid" to a
bitwise "valid":

  readInt !q !r =
\ !s -> consume s 0 0
  where
-- All done
consume s@Empty !valid !acc
= if valid /= 0 then convert acc s else Nothing
consume s@(Chunk c@(BI.BS _ !len) cs) !valid !acc
= case _digits q r c acc of
Result used acc'
| used == len -- try to read more digits
-- Recurse
  -> consume cs (valid .|. used) acc'
| used > 0 -- valid input 

Re: How to build Haddock documentation quickly?

2021-11-20 Thread Andrey Mokhov
Hi Norman,

> I'm more than willing to dive into Hadrian and figure out how it works.
> I could even add a new target to build just what I'm interested in.
> But I would need help.  I've spent some time poking around the `doc` 
> directory, and I've read
> the Shake papers (and some of Andrei's work) but I've never used these tools 
> myself.

I would be delighted to help you (or anyone else!) navigate Hadrian source 
code. Please feel free to get in touch directly.

(Alas, in the last couple of years I couldn't contribute to improving Hadrian 
but that's not for the lack of desire - just due to some life changes. 
Hopefully I can at least help by helping others!)

Cheers,
Andrey

___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: [Take 2] Unexpected duplicate join points in "Core" output?

2021-11-20 Thread Andreas Klebinger

Hello Victor,

generally GHC does try to common up join points and duplicate
expressions like that.
But since that's relatively expensive most of the duplication happens
during the core-cse pass which only happens once.

We don't create them because they are harmless. They are simple a side
product of optimizations happening after
the main cse pass has run. There is no feasible way to fix this I think.
As you say with some luck they get caught at the Cmm stage and
deduplicated there. Sadly it doesn't always happen. In most cases the
impact of this is thankfully rather
small.

For the assembly I opened a ticket:
https://gitlab.haskell.org/ghc/ghc/-/issues/20714

Am 20/11/2021 um 02:02 schrieb Viktor Dukhovni:

[ Sorry wrong version of attachment in previous message. ]

The below "Core" output from "ghc -O2" (9.2/8.10) for the attached
program shows seemingly rendundant join points:

   join {
 exit :: State# RealWorld -> (# State# RealWorld, () #)
 exit (ipv :: State# RealWorld) = jump $s$j ipv } in

   join {
 exit1 :: State# RealWorld -> (# State# RealWorld, () #)
 exit1 (ipv :: State# RealWorld) = jump $s$j ipv } in

that are identical in all but name.  These correspond to fallthrough
to the "otherwise" case in:

...
| acc < q || (acc == q && d <= 5)
  -> loop (ptr `plusPtr` 1) (acc * 10 + d)
| otherwise -> return Nothing

but it seems that the generated X86_64 code (also below) ultimately
consolidates these into a single target... Is that why it is harmless to
leave these duplicated in the generated "Core"?

[ Separately, in the generated machine code, it'd also be nice to avoid
   comparing the same "q" with the accumulator twice.  A single load and
   compare should I think be enough, as I'd expect the status flags to
   persist across the jump the second test.

   This happens to not be performance critical in my case, because most
   calls should satisfy the first test, but generally I think that 3-way
   "a < b", "a == b", "a > b" branches ideally avoid comparing twice... ]

 Associated Core output

 -- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
 main2 :: Addr#
 main2 = "12345678901234567890 junk"#

 -- RHS size: {terms: 129, types: 114, coercions: 0, joins: 6/8}
 main1 :: State# RealWorld -> (# State# RealWorld, () #)
 main1
   = \ (eta :: State# RealWorld) ->
   let {
 end :: Addr#
 end = plusAddr# main2 25# } in
   join {
 $s$j :: State# RealWorld -> (# State# RealWorld, () #)
 $s$j _ = hPutStr2 stdout $fShowMaybe4 True eta } in
   join {
 exit :: State# RealWorld -> (# State# RealWorld, () #)
 exit (ipv :: State# RealWorld) = jump $s$j ipv } in
   join {
 exit1 :: State# RealWorld -> (# State# RealWorld, () #)
 exit1 (ipv :: State# RealWorld) = jump $s$j ipv } in
   join {
 exit2
   :: Addr# -> Word# -> State# RealWorld -> (# State# RealWorld, () 
#)
 exit2 (ww :: Addr#) (ww1 :: Word#) (ipv :: State# RealWorld)
   = case eqAddr# ww main2 of {
   __DEFAULT ->
 hPutStr2
   stdout
   (++
  $fShowMaybe1
  (case $w$cshowsPrec3 11# (integerFromWord# ww1) [] of
   { (# ww3, ww4 #) ->
   : ww3 ww4
   }))
   True
   eta;
   1# -> jump $s$j ipv
 } } in
   joinrec {
 $wloop
   :: Addr# -> Word# -> State# RealWorld -> (# State# RealWorld, () 
#)
 $wloop (ww :: Addr#) (ww1 :: Word#) (w :: State# RealWorld)
   = join {
   getDigit :: State# RealWorld -> (# State# RealWorld, () #)
   getDigit (eta1 :: State# RealWorld)
 = case eqAddr# ww end of {
 __DEFAULT ->
   case readWord8OffAddr# ww 0# eta1 of { (# ipv, ipv1 #) 
->
   let {
 ipv2 :: Word#
 ipv2 = minusWord# (word8ToWord# ipv1) 48## } in
   case gtWord# ipv2 9## of {
 __DEFAULT ->
   case ltWord# ww1 1844674407370955161## of {
 __DEFAULT ->
   case ww1 of {
 __DEFAULT -> jump exit ipv;
 1844674407370955161## ->
   case leWord# ipv2 5## of {
 __DEFAULT -> jump exit1 ipv;
 1# ->