Re: [EXTERNAL] Unexpected duplicate join points in "Core" output?
On Sat, Nov 20, 2021 at 09:15:15PM +, Simon Peyton Jones via ghc-devs wrote: > GHC.Core.Opt.CSE is conservative at the moment, and never CSE's *any* > join point. It would not be hard to make it clever enough to CSE join > points, but no one has yet done it. > > Do open a ticket! Thanks, I opened https://gitlab.haskell.org/ghc/ghc/-/issues/20717 -- Viktor. ___ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
RE: [EXTERNAL] Unexpected duplicate join points in "Core" output?
There is absolutely no reason not to common-up those to join points. But we can't common up some join points when we could if they were let's. Consider join j1 x = x+1 in case v of A -> f (join j2 x = x+1 in ...j2...) B -> j1... C -> j1... Even though j2 is identical to j1's, we can't eliminate j2 in favour of j1 because then j1 wouldn't be a join point any more. GHC.Core.Opt.CSE is conservative at the moment, and never CSE's *any* join point. It would not be hard to make it clever enough to CSE join points, but no one has yet done it. Do open a ticket! Simon PS: I am leaving Microsoft at the end of November 2021, at which point simo...@microsoft.com will cease to work. Use simon.peytonjo...@gmail.com instead. (For now, it just forwards to simo...@microsoft.com.) | -Original Message- | From: ghc-devs On Behalf Of Viktor | Dukhovni | Sent: 20 November 2021 00:57 | To: ghc-devs@haskell.org | Subject: [EXTERNAL] Unexpected duplicate join points in "Core" output? | | The below "Core" output from "ghc -O2" (9.2/8.10) for the attached | program shows seemingly rendundant join points: | | join { | exit :: State# RealWorld -> (# State# RealWorld, () #) | exit (ipv :: State# RealWorld) = jump $s$j ipv } in | | join { | exit1 :: State# RealWorld -> (# State# RealWorld, () #) | exit1 (ipv :: State# RealWorld) = jump $s$j ipv } in | | that are identical in all but name. These correspond to fallthrough to | the "otherwise" case in: | |... || acc < q || (acc == q && d <= 5) | -> loop (ptr `plusPtr` 1) (acc * 10 + d) || otherwise -> return Nothing | | but it seems that the generated X86_64 code (also below) ultimately | consolidates these into a single target... Is that why it is harmless | to leave these duplicated in the generated "Core"? | | [ Separately, in the generated machine code, it'd also be nice to avoid | comparing the same "q" with the accumulator twice. A single load and | compare should I think be enough, as I'd expect the status flags to | persist across the jump the second test. | | This happens to not be performance critical in my case, because most | calls should satisfy the first test, but generally I think that 3-way | "a < b", "a == b", "a > b" branches ideally avoid comparing twice... | ] | | Associated Core output | | -- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0} | main2 :: Addr# | main2 = "12345678901234567890 junk"# | | -- RHS size: {terms: 129, types: 114, coercions: 0, joins: 6/8} | main1 :: State# RealWorld -> (# State# RealWorld, () #) | main1 | = \ (eta :: State# RealWorld) -> | let { | end :: Addr# | end = plusAddr# main2 25# } in | join { | $s$j :: State# RealWorld -> (# State# RealWorld, () #) | $s$j _ = hPutStr2 stdout $fShowMaybe4 True eta } in | join { | exit :: State# RealWorld -> (# State# RealWorld, () #) | exit (ipv :: State# RealWorld) = jump $s$j ipv } in | join { | exit1 :: State# RealWorld -> (# State# RealWorld, () #) | exit1 (ipv :: State# RealWorld) = jump $s$j ipv } in | join { | exit2 | :: Addr# -> Word# -> State# RealWorld -> (# State# | RealWorld, () #) | exit2 (ww :: Addr#) (ww1 :: Word#) (ipv :: State# | RealWorld) | = case eqAddr# ww main2 of { | __DEFAULT -> | hPutStr2 | stdout | (++ | $fShowMaybe1 | (case $w$cshowsPrec3 11# (integerFromWord# | ww1) [] of | { (# ww3, ww4 #) -> | : ww3 ww4 | })) | True | eta; | 1# -> jump $s$j ipv | } } in | joinrec { | $wloop | :: Addr# -> Word# -> State# RealWorld -> (# State# | RealWorld, () #) | $wloop (ww :: Addr#) (ww1 :: Word#) (w :: State# RealWorld) | = join { | getDigit :: State# RealWorld -> (# State# RealWorld, | () #) | getDigit (eta1 :: State# RealWorld) | = case eqAddr# ww end of { | __DEFAULT -> | case readWord8OffAddr# ww 0# eta1 of { (# | ipv, ipv1 #) -> | let { | ipv2 :: Word# | ipv2 = minusWord# (word8ToWord# ipv1) 48## | } in | case gtWord# ipv2 9## of { | __DEFAULT -> | case ltWord# ww1 1844674407370955161## of | { | __DEFAULT -> |
Re: [Take 2] Unexpected duplicate join points in "Core" output?
At this point I think it would be good if you could put your problem into a ghc-ticket. I can't look in detail into this in greater detail atm because of time constraints. And without a ticket it's likely to fall by the wayside eventually. But it does seem like something where we maybe could do better. And having good examples for the problematic behaviour is always immensely helpful to solve these kinds of problems. Cheers Andreas Am 20/11/2021 um 19:54 schrieb Viktor Dukhovni: On Sat, Nov 20, 2021 at 12:49:08PM +0100, Andreas Klebinger wrote: For the assembly I opened a ticket: https://gitlab.haskell.org/ghc/ghc/-/issues/20714 Thanks, much appreciated. Understood re redundant join points, though in the non-toy context the redundnat point code is noticeably larger. join { exit4 :: Addr# -> Word# -> State# RealWorld -> Maybe (Int64, ByteString) exit4 (ww4 :: Addr#) (ww5 :: Word#) (ipv :: State# RealWorld) = case touch# dt1 ipv of { __DEFAULT -> let { dt3 :: Int# dt3 = minusAddr# ww4 dt } in case ==# dt3 dt2 of { __DEFAULT -> jump exit1 ww2 wild dt dt1 dt2 cs dt3 ww5; 1# -> jump $wconsume cs (orI# ww2 dt3) ww5 } } } in join { exit5 :: Addr# -> Word# -> State# RealWorld -> Maybe (Int64, ByteString) exit5 (ww4 :: Addr#) (ww5 :: Word#) (w1 :: State# RealWorld) = case touch# dt1 w1 of { __DEFAULT -> let { dt3 :: Int# dt3 = minusAddr# ww4 dt } in case ==# dt3 dt2 of { __DEFAULT -> jump exit1 ww2 wild dt dt1 dt2 cs dt3 ww5; 1# -> jump $wconsume cs (orI# ww2 dt3) ww5 } } } in FWIW, these don't appear to be deduplicated, both result from the same conditional: `acc < q || acc == q && d < 5`. I need some way to make this compute a single boolean value without forking the continuation. There's a another source of code bloat that I'd like to run by you... In the WIP code for Lazy ByteString 'readInt', I started with: readInt !q !r = \ !s -> consume s False 0 where -- All done consume s@Empty !valid !acc = if valid then convert acc s else Nothing -- skip empty chunk consume (Chunk (BI.BS _ 0) cs) !valid !acc -- Recurse = consume cs valid acc -- process non-empty chunk consume s@(Chunk c@(BI.BS _ !len) cs) !valid !acc = case _digits q r c acc of Result used acc' | used <= 0 -- No more digits present -> if valid then convert acc' s else Nothing | used < len -- valid input not entirely digits -> let !c' = BU.unsafeDrop used c in convert acc' $ Chunk c' cs | otherwise -- try to read more digits -- Recurse -> consume cs True acc' Overflow -> Nothing Now _digits is the I/O loop I shared before, and the calling code gets inlined into that recursive loop with various join points. But the loop gets forked into multiple copies which are compiled separately, because there are two different recursive calls into "consume" that got compiled into separate "joinrec { ... }". So I tried instead: readInt !q !r = \ !s -> consume s False 0 where -- All done consume s@Empty !valid !acc = if valid then convert acc s else Nothing consume s@(Chunk c@(BI.BS _ !len) cs) !valid !acc = case _digits q r c acc of Result used acc' | used == len -- try to read more digits -- Recurse -> consume cs (valid || used > 0) acc' | used > 0 -- valid input not entirely digits -> let !c' = BU.unsafeDrop used c in convert acc' $ Chunk c' cs | otherwise -- No more digits present -> if valid then convert acc' s else Nothing Overflow -> Nothing But was slightly surprised to find even more duplication (3 copies instead of tw) of the I/O loop, because in the call: consume cs (valid || used > 0) acc' the boolean argument got floated out, giving: case valid of True -> consume cs True acc' _ -> case used > 0 of True -> consume cs True acc' _-> consume cs False acc' and each of these then generates essentially the same code. To get the code to be emitted
Re: [Take 2] Unexpected duplicate join points in "Core" output?
On Sat, Nov 20, 2021 at 01:54:36PM -0500, Viktor Dukhovni wrote: > Is there some way for GHC to figure out to not float out such cheap > computations? The 'Result' constructor is strict, so there's no cost to > evaluating `used > 0`, and cloning the entire computation is I think > the more unfortunate choice... I managed to get the loop to not emit duplicate code bloat by inserting another NOINLINE term: !keepGoing = acc < q || acc == q && d <= r {-# NOINLINE keepGoing #-} Thus the below produces Core with no significant bloat, matching roughly what one might (reasonably?/naively?) expect. But I am reluctant to actually include such work-arounds in the PR, the code that produces more "bloated" Core is easier to understand and maintain... _digits :: Accum -> Accum -> BI.ByteString -> Accum -> Result {-# INLINE _digits #-} _digits !q !r !(BI.BS !fp !len) = \ !acc -> BI.accursedUnutterablePerformIO $ BI.unsafeWithForeignPtr fp $ \ptr -> do let end = ptr `plusPtr` len go ptr end ptr acc where go start end = loop where loop !ptr !acc | ptr == end = return $ Result (ptr `minusPtr` start) acc loop !ptr !acc = getDigit >>= \ !d -> if | d <= 9-> update d | otherwise -> return $ Result (ptr `minusPtr` start) acc where fromDigit = \w -> fromIntegral w - 0x30 -- i.e. w - '0' -- {-# NOINLINE getDigit #-} getDigit | ptr /= end = fromDigit <$> peek ptr | otherwise = pure 10 -- End of input -- update d | keepGoing = loop (ptr `plusPtr` 1) (acc * 10 + d) | otherwise = return Overflow where {-# NOINLINE keepGoing #-} !keepGoing = acc < q || acc == q && d <= r The Core code is now, with the duplicate comparison as the only visible inefficiency. -- The exit/exit3 joins could be combined but are small, -- ditto with exit1/exit2. Rec { -- RHS size: {terms: 190, types: 146, coercions: 0, joins: 8/10} $wconsume :: ByteString -> Int# -> Word# -> Maybe (Word64, ByteString) $wconsume = \ (w :: ByteString) (ww :: Int#) (ww1 :: Word#) -> case w of wild { Empty -> case ww of { __DEFAULT -> Just (W64# ww1, Empty); 0# -> Nothing }; Chunk dt dt1 dt2 cs -> let { end :: Addr# end = plusAddr# dt dt2 } in join { $s$j :: Int# -> Word# -> State# RealWorld -> Maybe (Word64, ByteString) $s$j (sc :: Int#) (sc1 :: Word#) (sc2 :: State# RealWorld) = case touch# dt1 sc2 of { __DEFAULT -> case ==# sc dt2 of { __DEFAULT -> case ># sc 0# of { __DEFAULT -> case ww of { __DEFAULT -> Just (W64# sc1, wild); 0# -> Nothing }; 1# -> Just (W64# sc1, Chunk (plusAddr# dt sc) dt1 (-# dt2 sc) cs) }; 1# -> $wconsume cs (orI# ww sc) sc1 } } } in join { exit :: Addr# -> Word# -> State# RealWorld -> Maybe (Word64, ByteString) exit (ww2 :: Addr#) (ww3 :: Word#) (ipv :: State# RealWorld) = jump $s$j (minusAddr# ww2 dt) ww3 ipv } in join { exit1 :: State# RealWorld -> Maybe (Word64,
Re: [Take 2] Unexpected duplicate join points in "Core" output?
On Sat, Nov 20, 2021 at 12:49:08PM +0100, Andreas Klebinger wrote: > For the assembly I opened a ticket: > https://gitlab.haskell.org/ghc/ghc/-/issues/20714 Thanks, much appreciated. Understood re redundant join points, though in the non-toy context the redundnat point code is noticeably larger. join { exit4 :: Addr# -> Word# -> State# RealWorld -> Maybe (Int64, ByteString) exit4 (ww4 :: Addr#) (ww5 :: Word#) (ipv :: State# RealWorld) = case touch# dt1 ipv of { __DEFAULT -> let { dt3 :: Int# dt3 = minusAddr# ww4 dt } in case ==# dt3 dt2 of { __DEFAULT -> jump exit1 ww2 wild dt dt1 dt2 cs dt3 ww5; 1# -> jump $wconsume cs (orI# ww2 dt3) ww5 } } } in join { exit5 :: Addr# -> Word# -> State# RealWorld -> Maybe (Int64, ByteString) exit5 (ww4 :: Addr#) (ww5 :: Word#) (w1 :: State# RealWorld) = case touch# dt1 w1 of { __DEFAULT -> let { dt3 :: Int# dt3 = minusAddr# ww4 dt } in case ==# dt3 dt2 of { __DEFAULT -> jump exit1 ww2 wild dt dt1 dt2 cs dt3 ww5; 1# -> jump $wconsume cs (orI# ww2 dt3) ww5 } } } in FWIW, these don't appear to be deduplicated, both result from the same conditional: `acc < q || acc == q && d < 5`. I need some way to make this compute a single boolean value without forking the continuation. There's a another source of code bloat that I'd like to run by you... In the WIP code for Lazy ByteString 'readInt', I started with: readInt !q !r = \ !s -> consume s False 0 where -- All done consume s@Empty !valid !acc = if valid then convert acc s else Nothing -- skip empty chunk consume (Chunk (BI.BS _ 0) cs) !valid !acc -- Recurse = consume cs valid acc -- process non-empty chunk consume s@(Chunk c@(BI.BS _ !len) cs) !valid !acc = case _digits q r c acc of Result used acc' | used <= 0 -- No more digits present -> if valid then convert acc' s else Nothing | used < len -- valid input not entirely digits -> let !c' = BU.unsafeDrop used c in convert acc' $ Chunk c' cs | otherwise -- try to read more digits -- Recurse -> consume cs True acc' Overflow -> Nothing Now _digits is the I/O loop I shared before, and the calling code gets inlined into that recursive loop with various join points. But the loop gets forked into multiple copies which are compiled separately, because there are two different recursive calls into "consume" that got compiled into separate "joinrec { ... }". So I tried instead: readInt !q !r = \ !s -> consume s False 0 where -- All done consume s@Empty !valid !acc = if valid then convert acc s else Nothing consume s@(Chunk c@(BI.BS _ !len) cs) !valid !acc = case _digits q r c acc of Result used acc' | used == len -- try to read more digits -- Recurse -> consume cs (valid || used > 0) acc' | used > 0 -- valid input not entirely digits -> let !c' = BU.unsafeDrop used c in convert acc' $ Chunk c' cs | otherwise -- No more digits present -> if valid then convert acc' s else Nothing Overflow -> Nothing But was slightly surprised to find even more duplication (3 copies instead of tw) of the I/O loop, because in the call: consume cs (valid || used > 0) acc' the boolean argument got floated out, giving: case valid of True -> consume cs True acc' _ -> case used > 0 of True -> consume cs True acc' _-> consume cs False acc' and each of these then generates essentially the same code. To get the code to be emitted just once, I had to switch from a Bool "valid" to a bitwise "valid": readInt !q !r = \ !s -> consume s 0 0 where -- All done consume s@Empty !valid !acc = if valid /= 0 then convert acc s else Nothing consume s@(Chunk c@(BI.BS _ !len) cs) !valid !acc = case _digits q r c acc of Result used acc' | used == len -- try to read more digits -- Recurse -> consume cs (valid .|. used) acc' | used > 0 -- valid input
Re: How to build Haddock documentation quickly?
Hi Norman, > I'm more than willing to dive into Hadrian and figure out how it works. > I could even add a new target to build just what I'm interested in. > But I would need help. I've spent some time poking around the `doc` > directory, and I've read > the Shake papers (and some of Andrei's work) but I've never used these tools > myself. I would be delighted to help you (or anyone else!) navigate Hadrian source code. Please feel free to get in touch directly. (Alas, in the last couple of years I couldn't contribute to improving Hadrian but that's not for the lack of desire - just due to some life changes. Hopefully I can at least help by helping others!) Cheers, Andrey ___ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Re: [Take 2] Unexpected duplicate join points in "Core" output?
Hello Victor, generally GHC does try to common up join points and duplicate expressions like that. But since that's relatively expensive most of the duplication happens during the core-cse pass which only happens once. We don't create them because they are harmless. They are simple a side product of optimizations happening after the main cse pass has run. There is no feasible way to fix this I think. As you say with some luck they get caught at the Cmm stage and deduplicated there. Sadly it doesn't always happen. In most cases the impact of this is thankfully rather small. For the assembly I opened a ticket: https://gitlab.haskell.org/ghc/ghc/-/issues/20714 Am 20/11/2021 um 02:02 schrieb Viktor Dukhovni: [ Sorry wrong version of attachment in previous message. ] The below "Core" output from "ghc -O2" (9.2/8.10) for the attached program shows seemingly rendundant join points: join { exit :: State# RealWorld -> (# State# RealWorld, () #) exit (ipv :: State# RealWorld) = jump $s$j ipv } in join { exit1 :: State# RealWorld -> (# State# RealWorld, () #) exit1 (ipv :: State# RealWorld) = jump $s$j ipv } in that are identical in all but name. These correspond to fallthrough to the "otherwise" case in: ... | acc < q || (acc == q && d <= 5) -> loop (ptr `plusPtr` 1) (acc * 10 + d) | otherwise -> return Nothing but it seems that the generated X86_64 code (also below) ultimately consolidates these into a single target... Is that why it is harmless to leave these duplicated in the generated "Core"? [ Separately, in the generated machine code, it'd also be nice to avoid comparing the same "q" with the accumulator twice. A single load and compare should I think be enough, as I'd expect the status flags to persist across the jump the second test. This happens to not be performance critical in my case, because most calls should satisfy the first test, but generally I think that 3-way "a < b", "a == b", "a > b" branches ideally avoid comparing twice... ] Associated Core output -- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0} main2 :: Addr# main2 = "12345678901234567890 junk"# -- RHS size: {terms: 129, types: 114, coercions: 0, joins: 6/8} main1 :: State# RealWorld -> (# State# RealWorld, () #) main1 = \ (eta :: State# RealWorld) -> let { end :: Addr# end = plusAddr# main2 25# } in join { $s$j :: State# RealWorld -> (# State# RealWorld, () #) $s$j _ = hPutStr2 stdout $fShowMaybe4 True eta } in join { exit :: State# RealWorld -> (# State# RealWorld, () #) exit (ipv :: State# RealWorld) = jump $s$j ipv } in join { exit1 :: State# RealWorld -> (# State# RealWorld, () #) exit1 (ipv :: State# RealWorld) = jump $s$j ipv } in join { exit2 :: Addr# -> Word# -> State# RealWorld -> (# State# RealWorld, () #) exit2 (ww :: Addr#) (ww1 :: Word#) (ipv :: State# RealWorld) = case eqAddr# ww main2 of { __DEFAULT -> hPutStr2 stdout (++ $fShowMaybe1 (case $w$cshowsPrec3 11# (integerFromWord# ww1) [] of { (# ww3, ww4 #) -> : ww3 ww4 })) True eta; 1# -> jump $s$j ipv } } in joinrec { $wloop :: Addr# -> Word# -> State# RealWorld -> (# State# RealWorld, () #) $wloop (ww :: Addr#) (ww1 :: Word#) (w :: State# RealWorld) = join { getDigit :: State# RealWorld -> (# State# RealWorld, () #) getDigit (eta1 :: State# RealWorld) = case eqAddr# ww end of { __DEFAULT -> case readWord8OffAddr# ww 0# eta1 of { (# ipv, ipv1 #) -> let { ipv2 :: Word# ipv2 = minusWord# (word8ToWord# ipv1) 48## } in case gtWord# ipv2 9## of { __DEFAULT -> case ltWord# ww1 1844674407370955161## of { __DEFAULT -> case ww1 of { __DEFAULT -> jump exit ipv; 1844674407370955161## -> case leWord# ipv2 5## of { __DEFAULT -> jump exit1 ipv; 1# ->