Re: performance regressions
I've gained a little more insight here, but only a little. INLINE doesn't work because zipWithAndUnzipM is recursive. Oddly, using worker/wrapper made allocations go *up*, so that's somehow not the answer. When I use `inline`, GHC will inline the function once; recursive calls remain. This seems to be the local minimum in allocations. Using `inline` gives roughly a 7% allocation level change on this pathological case. To reproduce, just compile HEAD's TcFlatten with and without the call to `inline`. You can see, using -ddump-simpl, what GHC is doing. Then, you can check the allocation numbers by compiling perf/compiler/T9872a.hs. My tests were all using the default build settings, with an unmodified build.mk, as that seemed to be the most performant setting. In other news, a slight change to the algorithm (see #9872, comment 23) makes a 50% improvement. Will hopefully commit tonight, after a full local validation run. Richard On Dec 17, 2014, at 10:59 AM, Simon Peyton Jones simo...@microsoft.com wrote: I still would like to understand why INLINE does not make it inline. That's weird. E.g. a way to reproduce would help. Simon | -Original Message- | From: Richard Eisenberg [mailto:e...@cis.upenn.edu] | Sent: 17 December 2014 15:56 | To: Simon Peyton Jones | Cc: Joachim Breitner; ghc-devs@haskell.org | Subject: Re: performance regressions | | My unsubstantiated guess is that INLINEABLE would have the same effect | as INLINE here, as GHC doesn't see fit to actually inline the | function, even with INLINE -- the big improvement seen between (1) and | (2) is actually specialization, not inlining. The jump from (2) to (3) | is actual inlining. Thus, it seems that GHC's heuristics for inlining | aren't working out for the best here. | | I've pushed my changes, though I agree with Simon that more research | may uncover even more improvements here. I didn't focus on the number | of calls because that number didn't regress. Will look into this soon. 
| | Richard | | On Dec 17, 2014, at 4:15 AM, Simon Peyton Jones | simo...@microsoft.com wrote: | | If you use INLINEABLE, that should make the function specialisable | to a particular monad, even if it's in a different module. You | shouldn't need INLINE for that. | | I don't understand the difference between cases (2) and (3). | | I am still suspicious of why there are so many calls to this one | function that it, alone, is allocating a significant proportion of | compilation of the entire run of GHC. Are you sure there isn't an | algorithmic improvement to be had, to simply reduce the number of | calls? | | Simon | | | -Original Message- | | From: ghc-devs [mailto:ghc-devs-boun...@haskell.org] On Behalf Of | | Richard Eisenberg | | Sent: 16 December 2014 21:46 | | To: Joachim Breitner | | Cc: ghc-devs@haskell.org | | Subject: Re: performance regressions | | | | I've learned several very interesting things in this analysis. | | | | - Inlining polymorphic methods is very important. Here are some | | data points to back up that claim: | | * Original implementation using zipWithAndUnzipM: | 8,472,613,440 | | bytes allocated in the heap | | * Adding {-# INLINE #-} to the definition thereof: | 6,639,253,488 | | bytes allocated in the heap | | * Using `inline` at call site to force inlining: | 6,281,539,792 | | bytes allocated in the heap | | | | The middle step above allowed GHC to specialize zipWithAndUnzipM | to | | my particular monad, but GHC didn't see fit to actually inline | the | | function. Using `inline` forced it, to good effect. (I did not | | collect data on code sizes, but it wouldn't be hard to.) | | | | By comparison: | | * Hand-written recursion:6,587,809,112 bytes allocated in | the | | heap | | Interestingly, this is *not* the best result! | | | | Conclusion: We should probably add INLINE pragmas to Util and | | MonadUtils. | | | | | | - I then looked at rejiggering the algorithm to keep the common | | case fast. 
This had a side effect of changing the | zipWithAndUnzipM | | to mapAndUnzipM, from Control.Monad. To my surprise, this brought | | disaster! | | * Using `inline` and mapAndUnzipM:7,463,047,432 bytes | | allocated in the heap | | * Hand-written recursion: 5,848,602,848 bytes | | allocated in the heap | | | | That last number is better than the numbers above because of the | | algorithm streamlining. But, the inadequacy of mapAndUnzipM | | surprised me -- it already has an INLINE pragma in Control.Monad | of course. | | Looking at -ddump-simpl, it seems that mapAndUnzipM was indeed | | getting inlined, but a call to `map` remained, perhaps causing | | extra allocation. | | | | Conclusion: We
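Richard's observation above — that `inline` unfolds a recursive function exactly once, leaving the recursive calls intact — can be sketched as follows. This is an illustrative reconstruction, not the actual TcFlatten code; the combinator definition matches the one quoted later in the thread, and `sumsAndProducts` is a hypothetical call site standing in for the flattener's use.

```haskell
import GHC.Exts (inline)

-- The recursive combinator from the thread.
zipWithAndUnzipM :: Monad m => (a -> b -> m (c, d)) -> [a] -> [b] -> m ([c], [d])
zipWithAndUnzipM f (x:xs) (y:ys)
  = do { (c, d)   <- f x y
       ; (cs, ds) <- zipWithAndUnzipM f xs ys
       ; return (c:cs, d:ds) }
zipWithAndUnzipM _ _ _ = return ([], [])

-- `inline` forces GHC to unfold the outer call here, specialising the
-- unfolded body to this monad and worker function; the recursive calls
-- inside that unfolding remain ordinary, unspecialised calls.
sumsAndProducts :: Monad m => [Int] -> [Int] -> m ([Int], [Int])
sumsAndProducts = inline zipWithAndUnzipM (\a b -> return (a + b, a * b))
```

Compiling such a call site with -ddump-simpl, as suggested above, shows one unfolded copy wrapping a plain recursive call.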
RE: performance regressions
If you use INLINEABLE, that should make the function specialisable to a particular monad, even if it's in a different module. You shouldn't need INLINE for that. I don't understand the difference between cases (2) and (3). I am still suspicious of why there are so many calls to this one function that it, alone, is allocating a significant proportion of compilation of the entire run of GHC. Are you sure there isn't an algorithmic improvement to be had, to simply reduce the number of calls? Simon | -Original Message- | From: ghc-devs [mailto:ghc-devs-boun...@haskell.org] On Behalf Of | Richard Eisenberg | Sent: 16 December 2014 21:46 | To: Joachim Breitner | Cc: ghc-devs@haskell.org | Subject: Re: performance regressions | | I've learned several very interesting things in this analysis. | | - Inlining polymorphic methods is very important. Here are some data | points to back up that claim: | * Original implementation using zipWithAndUnzipM:8,472,613,440 | bytes allocated in the heap | * Adding {-# INLINE #-} to the definition thereof: 6,639,253,488 | bytes allocated in the heap | * Using `inline` at call site to force inlining: 6,281,539,792 | bytes allocated in the heap | | The middle step above allowed GHC to specialize zipWithAndUnzipM to my | particular monad, but GHC didn't see fit to actually inline the | function. Using `inline` forced it, to good effect. (I did not collect | data on code sizes, but it wouldn't be hard to.) | | By comparison: | * Hand-written recursion:6,587,809,112 bytes allocated in the | heap | Interestingly, this is *not* the best result! | | Conclusion: We should probably add INLINE pragmas to Util and | MonadUtils. | | | - I then looked at rejiggering the algorithm to keep the common case | fast. This had a side effect of changing the zipWithAndUnzipM to | mapAndUnzipM, from Control.Monad. To my surprise, this brought | disaster! 
| * Using `inline` and mapAndUnzipM:7,463,047,432 bytes | allocated in the heap | * Hand-written recursion: 5,848,602,848 bytes | allocated in the heap | | That last number is better than the numbers above because of the | algorithm streamlining. But, the inadequacy of mapAndUnzipM surprised | me -- it already has an INLINE pragma in Control.Monad of course. | Looking at -ddump-simpl, it seems that mapAndUnzipM was indeed getting | inlined, but a call to `map` remained, perhaps causing extra | allocation. | | Conclusion: We should examine the implementation of mapAndUnzipM (and | similar functions) in Control.Monad. Is it as fast as possible? | | | | In the end, I was unable to bring the allocation numbers down to where | they were before my work. This is because the flattener now deals in | roles. Most of its behavior is the same between nominal and | representational roles, so it seems silly (though very possible) to | specialize the code to nominal to keep that path fast. Instead, I | identified one key spot and made that go fast. | | Thus, there is a 7% bump to memory usage on very-type-family-heavy | code, compared to before my commit on Friday. (On more ordinary code, | there is no noticeable change.) | | Validating my patch locally now; will push when that's done. | | Thanks, | Richard | | On Dec 16, 2014, at 10:41 AM, Joachim Breitner mail@joachim- | breitner.de wrote: | | Hi, | | | Am Dienstag, den 16.12.2014, 09:59 -0500 schrieb Richard Eisenberg: | On Dec 16, 2014, at 4:01 AM, Joachim Breitner mail@joachim- | breitner.de wrote: | | another guess (without looking at the code, sorry): Are they in | the | same module? I.e., can GHC specialize the code to your particular | Monad? | | No, they're not in the same module. I could also try moving the | zipWithAndUnzipM function to the same module, and even specializing | it by hand to the right monad. | | I did mean zipWithAndUnzipM, so maybe yes: Try that. 
| | (I find it hard to believe that any polymorphic monadic code should | perform well, with those many calls to an unknown (>>=) with a | function parameter, but maybe I'm too pessimistic here.) | | | Could that be preventing the fusing? | | There is not going to be any fusing here, at least not list fusion; | that would require your code to be written in terms of functions | with | fusion rules. | | Greetings, | Joachim | | -- | Joachim nomeata Breitner | m...@joachim-breitner.de * http://www.joachim-breitner.de/ | Jabber: nome...@joachim-breitner.de * GPG-Key: 0xF0FBF51F | Debian Developer: nome...@debian.org | | ___ | ghc-devs mailing list | ghc-devs@haskell.org | http://www.haskell.org/mailman/listinfo/ghc-devs
Re: performance regressions
My unsubstantiated guess is that INLINEABLE would have the same effect as INLINE here, as GHC doesn't see fit to actually inline the function, even with INLINE -- the big improvement seen between (1) and (2) is actually specialization, not inlining. The jump from (2) to (3) is actual inlining. Thus, it seems that GHC's heuristics for inlining aren't working out for the best here. I've pushed my changes, though I agree with Simon that more research may uncover even more improvements here. I didn't focus on the number of calls because that number didn't regress. Will look into this soon. Richard On Dec 17, 2014, at 4:15 AM, Simon Peyton Jones simo...@microsoft.com wrote: If you use INLINEABLE, that should make the function specialisable to a particular monad, even if it's in a different module. You shouldn't need INLINE for that. I don't understand the difference between cases (2) and (3). I am still suspicious of why there are so many calls to this one function that it, alone, is allocating a significant proportion of compilation of the entire run of GHC. Are you sure there isn't an algorithmic improvement to be had, to simply reduce the number of calls? Simon | -Original Message- | From: ghc-devs [mailto:ghc-devs-boun...@haskell.org] On Behalf Of | Richard Eisenberg | Sent: 16 December 2014 21:46 | To: Joachim Breitner | Cc: ghc-devs@haskell.org | Subject: Re: performance regressions | | I've learned several very interesting things in this analysis. | | - Inlining polymorphic methods is very important. 
Here are some data | points to back up that claim: | * Original implementation using zipWithAndUnzipM:8,472,613,440 | bytes allocated in the heap | * Adding {-# INLINE #-} to the definition thereof: 6,639,253,488 | bytes allocated in the heap | * Using `inline` at call site to force inlining: 6,281,539,792 | bytes allocated in the heap | | The middle step above allowed GHC to specialize zipWithAndUnzipM to my | particular monad, but GHC didn't see fit to actually inline the | function. Using `inline` forced it, to good effect. (I did not collect | data on code sizes, but it wouldn't be hard to.) | | By comparison: | * Hand-written recursion:6,587,809,112 bytes allocated in the | heap | Interestingly, this is *not* the best result! | | Conclusion: We should probably add INLINE pragmas to Util and | MonadUtils. | | | - I then looked at rejiggering the algorithm to keep the common case | fast. This had a side effect of changing the zipWithAndUnzipM to | mapAndUnzipM, from Control.Monad. To my surprise, this brought | disaster! | * Using `inline` and mapAndUnzipM:7,463,047,432 bytes | allocated in the heap | * Hand-written recursion: 5,848,602,848 bytes | allocated in the heap | | That last number is better than the numbers above because of the | algorithm streamlining. But, the inadequacy of mapAndUnzipM surprised | me -- it already has an INLINE pragma in Control.Monad of course. | Looking at -ddump-simpl, it seems that mapAndUnzipM was indeed getting | inlined, but a call to `map` remained, perhaps causing extra | allocation. | | Conclusion: We should examine the implementation of mapAndUnzipM (and | similar functions) in Control.Monad. Is it as fast as possible? | | | | In the end, I was unable to bring the allocation numbers down to where | they were before my work. This is because the flattener now deals in | roles. 
Most of its behavior is the same between nominal and | representational roles, so it seems silly (though very possible) to | specialize the code to nominal to keep that path fast. Instead, I | identified one key spot and made that go fast. | | Thus, there is a 7% bump to memory usage on very-type-family-heavy | code, compared to before my commit on Friday. (On more ordinary code, | there is no noticeable change.) | | Validating my patch locally now; will push when that's done. | | Thanks, | Richard | | On Dec 16, 2014, at 10:41 AM, Joachim Breitner mail@joachim- | breitner.de wrote: | | Hi, | | | Am Dienstag, den 16.12.2014, 09:59 -0500 schrieb Richard Eisenberg: | On Dec 16, 2014, at 4:01 AM, Joachim Breitner mail@joachim- | breitner.de wrote: | | another guess (without looking at the code, sorry): Are they in | the | same module? I.e., can GHC specialize the code to your particular | Monad? | | No, they're not in the same module. I could also try moving the | zipWithAndUnzipM function to the same module, and even specializing | it by hand to the right monad. | | I did mean zipWithAndUnzipM, so maybe yes: Try that. | | (I find it hard to believe that any polymorphic monadic code should | perform well, with those many calls to an unknown (>>=) with a | function parameter
RE: performance regressions
I still would like to understand why INLINE does not make it inline. That's weird. E.g. a way to reproduce would help. Simon | -Original Message- | From: Richard Eisenberg [mailto:e...@cis.upenn.edu] | Sent: 17 December 2014 15:56 | To: Simon Peyton Jones | Cc: Joachim Breitner; ghc-devs@haskell.org | Subject: Re: performance regressions | | My unsubstantiated guess is that INLINEABLE would have the same effect | as INLINE here, as GHC doesn't see fit to actually inline the | function, even with INLINE -- the big improvement seen between (1) and | (2) is actually specialization, not inlining. The jump from (2) to (3) | is actual inlining. Thus, it seems that GHC's heuristics for inlining | aren't working out for the best here. | | I've pushed my changes, though I agree with Simon that more research | may uncover even more improvements here. I didn't focus on the number | of calls because that number didn't regress. Will look into this soon. | | Richard | | On Dec 17, 2014, at 4:15 AM, Simon Peyton Jones | simo...@microsoft.com wrote: | | If you use INLINEABLE, that should make the function specialisable | to a particular monad, even if it's in a different module. You | shouldn't need INLINE for that. | | I don't understand the difference between cases (2) and (3). | | I am still suspicious of why there are so many calls to this one | function that it, alone, is allocating a significant proportion of | compilation of the entire run of GHC. Are you sure there isn't an | algorithmic improvement to be had, to simply reduce the number of | calls? | | Simon | | | -Original Message- | | From: ghc-devs [mailto:ghc-devs-boun...@haskell.org] On Behalf Of | | Richard Eisenberg | | Sent: 16 December 2014 21:46 | | To: Joachim Breitner | | Cc: ghc-devs@haskell.org | | Subject: Re: performance regressions | | | | I've learned several very interesting things in this analysis. | | | | - Inlining polymorphic methods is very important. 
Here are some | | data points to back up that claim: | | * Original implementation using zipWithAndUnzipM: | 8,472,613,440 | | bytes allocated in the heap | | * Adding {-# INLINE #-} to the definition thereof: | 6,639,253,488 | | bytes allocated in the heap | | * Using `inline` at call site to force inlining: | 6,281,539,792 | | bytes allocated in the heap | | | | The middle step above allowed GHC to specialize zipWithAndUnzipM | to | | my particular monad, but GHC didn't see fit to actually inline | the | | function. Using `inline` forced it, to good effect. (I did not | | collect data on code sizes, but it wouldn't be hard to.) | | | | By comparison: | | * Hand-written recursion:6,587,809,112 bytes allocated in | the | | heap | | Interestingly, this is *not* the best result! | | | | Conclusion: We should probably add INLINE pragmas to Util and | | MonadUtils. | | | | | | - I then looked at rejiggering the algorithm to keep the common | | case fast. This had a side effect of changing the | zipWithAndUnzipM | | to mapAndUnzipM, from Control.Monad. To my surprise, this brought | | disaster! | | * Using `inline` and mapAndUnzipM:7,463,047,432 bytes | | allocated in the heap | | * Hand-written recursion: 5,848,602,848 bytes | | allocated in the heap | | | | That last number is better than the numbers above because of the | | algorithm streamlining. But, the inadequacy of mapAndUnzipM | | surprised me -- it already has an INLINE pragma in Control.Monad | of course. | | Looking at -ddump-simpl, it seems that mapAndUnzipM was indeed | | getting inlined, but a call to `map` remained, perhaps causing | | extra allocation. | | | | Conclusion: We should examine the implementation of mapAndUnzipM | | (and similar functions) in Control.Monad. Is it as fast as | possible? | | | | | | | | In the end, I was unable to bring the allocation numbers down to | | where they were before my work. This is because the flattener now | | deals in roles. 
Most of its behavior is the same between nominal | | and representational roles, so it seems silly (though very | | possible) to specialize the code to nominal to keep that path | fast. | | Instead, I identified one key spot and made that go fast. | | | | Thus, there is a 7% bump to memory usage on very-type-family- | heavy | | code, compared to before my commit on Friday. (On more ordinary | | code, there is no noticeable change.) | | | | Validating my patch locally now; will push when that's done. | | | | Thanks, | | Richard | | | | On Dec 16, 2014, at 10:41 AM, Joachim Breitner mail@joachim- | | breitner.de wrote: | | | | Hi, | | | | | | Am Dienstag, den 16.12.2014, 09:59 -0500 schrieb Richard | Eisenberg: | | On Dec 16, 2014, at 4:01 AM
Re: performance regressions
Is it possible INLINE didn't inline the function because it's recursive? If it were my function, I'd probably try a manual worker/wrapper. On 07:59, Wed, Dec 17, 2014 Simon Peyton Jones simo...@microsoft.com wrote: I still would like to understand why INLINE does not make it inline. That's weird. E.g. a way to reproduce would help. Simon | -Original Message- | From: Richard Eisenberg [mailto:e...@cis.upenn.edu] | Sent: 17 December 2014 15:56 | To: Simon Peyton Jones | Cc: Joachim Breitner; ghc-devs@haskell.org | Subject: Re: performance regressions | | My unsubstantiated guess is that INLINEABLE would have the same effect | as INLINE here, as GHC doesn't see fit to actually inline the | function, even with INLINE -- the big improvement seen between (1) and | (2) is actually specialization, not inlining. The jump from (2) to (3) | is actual inlining. Thus, it seems that GHC's heuristics for inlining | aren't working out for the best here. | | I've pushed my changes, though I agree with Simon that more research | may uncover even more improvements here. I didn't focus on the number | of calls because that number didn't regress. Will look into this soon. | | Richard | | On Dec 17, 2014, at 4:15 AM, Simon Peyton Jones | simo...@microsoft.com wrote: | | If you use INLINEABLE, that should make the function specialisable | to a particular monad, even if it's in a different module. You | shouldn't need INLINE for that. | | I don't understand the difference between cases (2) and (3). | | I am still suspicious of why there are so many calls to this one | function that it, alone, is allocating a significant proportion of | compilation of the entire run of GHC. Are you sure there isn't an | algorithmic improvement to be had, to simply reduce the number of | calls? 
| | Simon | | | -Original Message- | | From: ghc-devs [mailto:ghc-devs-boun...@haskell.org] On Behalf Of | | Richard Eisenberg | | Sent: 16 December 2014 21:46 | | To: Joachim Breitner | | Cc: ghc-devs@haskell.org | | Subject: Re: performance regressions | | | | I've learned several very interesting things in this analysis. | | | | - Inlining polymorphic methods is very important. Here are some | | data points to back up that claim: | | * Original implementation using zipWithAndUnzipM: | 8,472,613,440 | | bytes allocated in the heap | | * Adding {-# INLINE #-} to the definition thereof: | 6,639,253,488 | | bytes allocated in the heap | | * Using `inline` at call site to force inlining: | 6,281,539,792 | | bytes allocated in the heap | | | | The middle step above allowed GHC to specialize zipWithAndUnzipM | to | | my particular monad, but GHC didn't see fit to actually inline | the | | function. Using `inline` forced it, to good effect. (I did not | | collect data on code sizes, but it wouldn't be hard to.) | | | | By comparison: | | * Hand-written recursion:6,587,809,112 bytes allocated in | the | | heap | | Interestingly, this is *not* the best result! | | | | Conclusion: We should probably add INLINE pragmas to Util and | | MonadUtils. | | | | | | - I then looked at rejiggering the algorithm to keep the common | | case fast. This had a side effect of changing the | zipWithAndUnzipM | | to mapAndUnzipM, from Control.Monad. To my surprise, this brought | | disaster! | | * Using `inline` and mapAndUnzipM:7,463,047,432 bytes | | allocated in the heap | | * Hand-written recursion: 5,848,602,848 bytes | | allocated in the heap | | | | That last number is better than the numbers above because of the | | algorithm streamlining. But, the inadequacy of mapAndUnzipM | | surprised me -- it already has an INLINE pragma in Control.Monad | of course. 
| | Looking at -ddump-simpl, it seems that mapAndUnzipM was indeed | | getting inlined, but a call to `map` remained, perhaps causing | | extra allocation. | | | | Conclusion: We should examine the implementation of mapAndUnzipM | | (and similar functions) in Control.Monad. Is it as fast as | possible? | | | | | | | | In the end, I was unable to bring the allocation numbers down to | | where they were before my work. This is because the flattener now | | deals in roles. Most of its behavior is the same between nominal | | and representational roles, so it seems silly (though very | | possible) to specialize the code to nominal to keep that path | fast. | | Instead, I identified one key spot and made that go fast. | | | | Thus, there is a 7% bump to memory usage on very-type-family- | heavy | | code, compared to before my commit on Friday. (On more ordinary | | code, there is no noticeable change.) | | | | Validating my patch
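The manual worker/wrapper split suggested above would look roughly like the sketch below (illustrative only — and note that, per Richard's later message in this thread, this rewrite actually made allocations go *up* in GHC itself):

```haskell
-- Wrapper: small and non-recursive, so the INLINE pragma can fire at
-- every call site, exposing the worker to the caller's monad and f.
{-# INLINE zipWithAndUnzipM #-}
zipWithAndUnzipM :: Monad m => (a -> b -> m (c, d)) -> [a] -> [b] -> m ([c], [d])
zipWithAndUnzipM f = go
  where
    -- Worker: carries the recursion; f and the Monad dictionary are free
    -- variables of go, so they are not passed again on every recursive call.
    go (x:xs) (y:ys) = do
      (c, d)   <- f x y
      (cs, ds) <- go xs ys
      return (c:cs, d:ds)
    go _ _ = return ([], [])
```

The hoped-for benefit is that each inlined copy of the wrapper yields a worker specialised to one monad; whether that pays off has to be measured, as the allocation numbers in this thread show.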
Re: performance regressions
On 2014-12-16 at 22:45:36 +0100, Richard Eisenberg wrote: I've learned several very interesting things in this analysis. - Inlining polymorphic methods is very important. otoh, there are cases where marking methods INLINE has catastrophic effects; the following commit https://github.com/kolmodin/binary/commit/48c02966512a67b018fcdf093fab8d34bddf9a69 was necessary a few months ago, as otherwise GHC HEAD's compile-time memory usage would explode: https://ghc.haskell.org/trac/ghc/ticket/9630
Re: performance regressions
Using this combinator instead of writing the algorithm directly cost me 30% allocation overhead! What does your algorithm look like when you write it directly? Something like this:

flatten_many fmode roles tys
  = unzip `liftM` mapM go (zip roles tys)
  where
    go (Nominal, ty)          = flatten_one (fmode { fe_eq_rel = NomEq })  ty
    go (Representational, ty) = flatten_one (fmode { fe_eq_rel = ReprEq }) ty
    go (Phantom, ty)          = -- See Note [Phantoms in the flattener]
                                return (ty, mkTcPhantomCo ty ty)

? Maybe this has something to do with `zipWithAndUnzipM` not being tail-recursive vs. the direct version being able to fuse intermediate lists? Janek
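To make the tail-recursion question above concrete: an accumulator-based, tail-recursive variant of the combinator would look roughly like this sketch (hypothetical code, not from GHC). Note that it must reverse its accumulators at the end, and in a lazy language the monadic bind still allocates a continuation per element, so it is not obviously an improvement.

```haskell
zipWithAndUnzipAccM :: Monad m => (a -> b -> m (c, d)) -> [a] -> [b] -> m ([c], [d])
zipWithAndUnzipAccM f = go [] []
  where
    -- Tail-recursive: the recursive call is the last action of each step,
    -- at the price of building both result lists in reverse.
    go cs ds (x:xs) (y:ys) = do
      (c, d) <- f x y
      go (c:cs) (d:ds) xs ys
    go cs ds _ _ = return (reverse cs, reverse ds)
```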
Re: performance regressions
Hi, Am Montag, den 15.12.2014, 23:48 -0500 schrieb Richard Eisenberg: Can anyone tell me: why? Have I made some horrible mistake in the implementation? And, relatedly: how can I fix this? I want to learn from this experience how to avoid this problem next time... another guess (without looking at the code, sorry): Are they in the same module? I.e., can GHC specialize the code to your particular Monad? Greetings, Joachim -- Joachim Breitner e-Mail: m...@joachim-breitner.de Homepage: http://www.joachim-breitner.de Jabber-ID: nome...@joachim-breitner.de
RE: performance regressions
| Using this combinator instead of writing the algorithm directly cost | me 30% allocation overhead! That seems surprising. I'd build a profiled compiler (GhcProfiled=YES) and see what it says. If it increases allocation by 30% overall, there must be a LOT of calls to this function. Should there be so many? S | -Original Message- | From: ghc-devs [mailto:ghc-devs-boun...@haskell.org] On Behalf Of | Richard Eisenberg | Sent: 16 December 2014 04:49 | To: Ben Gamari | Cc: Joachim Breitner; ghc-devs@haskell.org | Subject: Re: performance regressions | | I've made progress, but still need some help. | | It turns out that a monadic combinator (that I wrote) is mostly | responsible: | | zipWithAndUnzipM :: Monad m | => (a -> b -> m (c, d)) -> [a] -> [b] -> m ([c], [d]) | zipWithAndUnzipM f (x:xs) (y:ys) | = do { (c, d) <- f x y | ; (cs, ds) <- zipWithAndUnzipM f xs ys | ; return (c:cs, d:ds) } | zipWithAndUnzipM _ _ _ = return ([], []) | | | Using this combinator instead of writing the algorithm directly cost | me 30% allocation overhead! | | Can anyone tell me: why? Have I made some horrible mistake in the | implementation? | And, relatedly: how can I fix this? I want to learn from this | experience how to avoid this problem next time... | | Unfortunately, my commit causes 50% overhead, not 30%, so I'm not out | of the woods yet. Hopefully, another 20% of good news tomorrow. | | Thanks! | Richard | | On Dec 15, 2014, at 11:33 AM, Ben Gamari b...@smart-cactus.org wrote: | | Joachim Breitner nome...@debian.org writes: | | Hi, | | Am Montag, den 15.12.2014, 10:58 -0500 schrieb Ben Gamari: | - Travis has not picked up on these errors. | | unfortunately, travis is slightly less useful for a few weeks now | due | to | T5681 failing (possibly due to the use of LLVM-3.4), but I'm | still | waiting for a reply on that issue. | | You aren't looking for a response from me on this, are you? 
I just | checked and I don't seem to have any outstanding messages from you | but it's entirely possible I overlooked something. | | this is independent of our arm issues, and I think a tad older; I | did | not direct it to anyone specific. | | But I guess you are likely a person that can tell what's wrong | here: | | Am Sonntag, den 30.11.2014, 20:01 +0100 schrieb Joachim Breitner: | Compile failed (status 256) errors were: | /tmp/ghc16123_0/ghc16123_5.s: Assembler messages: | | /tmp/ghc16123_0/ghc16123_5.s:26:0: | Error: can't resolve `.rodata' {.rodata section} - | `Main_zdwwork_info$def' {.text section} | | /tmp/ghc16123_0/ghc16123_5.s:46:0: | Error: can't resolve `.rodata' {.rodata section} - | `Main_work_info$def' {.text section} | | /tmp/ghc16123_0/ghc16123_5.s:66:0: | Error: can't resolve `.rodata' {.rodata section} - | `Main_main1_info$def' {.text section} | | /tmp/ghc16123_0/ghc16123_5.s:86:0: | Error: can't resolve `.rodata' {.rodata section} - | `Main_main_info$def' {.text section} | | /tmp/ghc16123_0/ghc16123_5.s:106:0: | Error: can't resolve `.rodata' {.rodata section} - | `Main_main2_info$def' {.text section} | | /tmp/ghc16123_0/ghc16123_5.s:126:0: | Error: can't resolve `.rodata' {.rodata section} - | `ZCMain_main_info$def' {.text section} | | *** unexpected failure for T5681(optllvm) | | https://s3.amazonaws.com/archive.travis- | ci.org/jobs/42557559/log.txt | | Any ideas? | | Is it possible that this is due to the llvm version used? Do we | support | 3.4 in GHC HEAD? | | Using LLVM tools |llc : /usr/local/clang-3.4/bin/llc |opt : /usr/local/clang-3.4/bin/opt | | (http://smart-cactus.org/~ben/posts/2014-11-28-state-of-llvm- | backend. | html does not talk about GHC HEAD explicitly. Should I look at the | 7.10 row? Does that mean that 3.4 is not supported? Shouldn't the | build system, or at least the compiler, fail harder and more | helpfully in this case?) 
| | LLVM 3.4 appears to have an unfortunate behavior whereby it will | lose | track of which section symbols with Internal linkage belong. I | haven't had a chance to delve into this too deeply, however given | that | both 3.3 and | 3.5 behave as expected I'm pretty sure this a bug. There are a few | options here, | |a. Mark the `$def` symbols as ExternallyVisible, working around the | issue at the expense of exposing internal symbols to the outside | world. | |b. Mark LLVM 3.4 as unsupported | | At the moment I'm leaning towards (b) since I haven't had a chance | to | think through the implications of (a); if nothing else I suspect | this | wouldn't help the DLL symbol table size issues on Windows. Giving up | on LLVM 3.4 might be unfortunate for a good number of users, | however. | | Ultimately this underlines the need to package LLVM
Re: performance regressions
I've learned several very interesting things in this analysis. - Inlining polymorphic methods is very important. Here are some data points to back up that claim: * Original implementation using zipWithAndUnzipM: 8,472,613,440 bytes allocated in the heap * Adding {-# INLINE #-} to the definition thereof: 6,639,253,488 bytes allocated in the heap * Using `inline` at call site to force inlining: 6,281,539,792 bytes allocated in the heap The middle step above allowed GHC to specialize zipWithAndUnzipM to my particular monad, but GHC didn't see fit to actually inline the function. Using `inline` forced it, to good effect. (I did not collect data on code sizes, but it wouldn't be hard to.) By comparison: * Hand-written recursion: 6,587,809,112 bytes allocated in the heap Interestingly, this is *not* the best result! Conclusion: We should probably add INLINE pragmas to Util and MonadUtils. - I then looked at rejiggering the algorithm to keep the common case fast. This had a side effect of changing the zipWithAndUnzipM to mapAndUnzipM, from Control.Monad. To my surprise, this brought disaster! * Using `inline` and mapAndUnzipM: 7,463,047,432 bytes allocated in the heap * Hand-written recursion: 5,848,602,848 bytes allocated in the heap That last number is better than the numbers above because of the algorithm streamlining. But, the inadequacy of mapAndUnzipM surprised me -- it already has an INLINE pragma in Control.Monad of course. Looking at -ddump-simpl, it seems that mapAndUnzipM was indeed getting inlined, but a call to `map` remained, perhaps causing extra allocation. Conclusion: We should examine the implementation of mapAndUnzipM (and similar functions) in Control.Monad. Is it as fast as possible? In the end, I was unable to bring the allocation numbers down to where they were before my work. This is because the flattener now deals in roles. 
Most of its behavior is the same between nominal and representational roles, so it seems silly (though very possible) to specialize the code to nominal to keep that path fast. Instead, I identified one key spot and made that go fast. Thus, there is a 7% bump to memory usage on very-type-family-heavy code, compared to before my commit on Friday. (On more ordinary code, there is no noticeable change.) Validating my patch locally now; will push when that's done. Thanks, Richard On Dec 16, 2014, at 10:41 AM, Joachim Breitner m...@joachim-breitner.de wrote: Hi, Am Dienstag, den 16.12.2014, 09:59 -0500 schrieb Richard Eisenberg: On Dec 16, 2014, at 4:01 AM, Joachim Breitner m...@joachim-breitner.de wrote: another guess (without looking at the code, sorry): Are they in the same module? I.e., can GHC specialize the code to your particular Monad? No, they're not in the same module. I could also try moving the zipWithAndUnzipM function to the same module, and even specializing it by hand to the right monad. I did mean zipWithAndUnzipM, so maybe yes: Try that. (I find it hard to believe that any polymorphic monadic code should perform well, with those many calls to an unknown (=) with a function parameter, but maybe I’m too pessimistic here.) Could that be preventing the fusing? There is not going to be any fusing here, at least not list fusion; that would require your code to be written in terms of functions with fusion rules. Greetings, Joachim -- Joachim “nomeata” Breitner m...@joachim-breitner.de • http://www.joachim-breitner.de/ Jabber: nome...@joachim-breitner.de • GPG-Key: 0xF0FBF51F Debian Developer: nome...@debian.org ___ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs ___ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs
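Richard's first two data points -- adding an INLINE pragma to the combinator, then forcing inlining at the call site with `inline` from GHC.Exts -- can be sketched as follows. This is a stand-in reimplementation of the combinator under discussion, not GHC's actual Util/MonadUtils code, and the arithmetic worker is purely illustrative:

```haskell
import GHC.Exts (inline)

-- The INLINE pragma exposes the unfolding across module boundaries,
-- which is what lets GHC specialize the combinator to the concrete
-- monad at each use site.
{-# INLINE zipWithAndUnzipM #-}
zipWithAndUnzipM :: Monad m
                 => (a -> b -> m (c, d)) -> [a] -> [b] -> m ([c], [d])
zipWithAndUnzipM f (x:xs) (y:ys) = do
  (c, d)   <- f x y
  (cs, ds) <- zipWithAndUnzipM f xs ys
  return (c:cs, d:ds)
zipWithAndUnzipM _ _ _ = return ([], [])

main :: IO ()
main = do
  -- `inline` hints GHC to inline one layer of the definition at this
  -- call site; because the function is recursive, the recursive calls
  -- remain as ordinary calls (as noted later in the thread).
  (sums, prods) <- inline zipWithAndUnzipM
                     (\a b -> return (a + b, a * b)) [1, 2, 3] [4, 5, 6 :: Int]
  print (sums, prods)  -- prints ([5,7,9],[4,10,18])
```

The allocation effects described in the thread are only visible under optimization in a real module split; this sketch just shows where the pragma and the `inline` call go.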
Re: performance regressions
Would SPECIALIZE pragmas also have these performance improvements, or is `inline` needed?

On Tue, Dec 16, 2014 at 4:45 PM, Richard Eisenberg e...@cis.upenn.edu wrote:

> I've learned several very interesting things in this analysis.
> snip
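On the question above: a SPECIALISE pragma asks GHC to compile a monomorphic copy of a polymorphic function at a chosen type, turning the overloaded (>>=) calls into direct calls without duplicating the body at every call site the way inlining does. A hypothetical sketch, using a stand-in version of the combinator from this thread (IO stands in for the real monad, and the worker function is invented for illustration):

```haskell
zipWithAndUnzipM :: Monad m
                 => (a -> b -> m (c, d)) -> [a] -> [b] -> m ([c], [d])
zipWithAndUnzipM f (x:xs) (y:ys) = do
  (c, d)   <- f x y
  (cs, ds) <- zipWithAndUnzipM f xs ys
  return (c:cs, d:ds)
zipWithAndUnzipM _ _ _ = return ([], [])

-- Request a monomorphic copy at this type: the Monad dictionary is
-- resolved at compile time, much like the specialization the INLINE
-- pragma enabled, but with a single shared copy of the code.
{-# SPECIALISE zipWithAndUnzipM
      :: (Int -> Int -> IO (Int, Int)) -> [Int] -> [Int] -> IO ([Int], [Int]) #-}

main :: IO ()
main = do
  (xs, ys) <- zipWithAndUnzipM (\a b -> return (a + b, a - b)) [10, 20] [1, 2]
  print (xs, ys)  -- prints ([11,22],[9,18])
```

Whether this recovers the allocation numbers Richard measured is exactly the open question; the sketch only shows the mechanism being asked about.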
Re: performance regressions
Hi,

Am Sonntag, den 14.12.2014, 22:37 -0500 schrieb Richard Eisenberg:
> Of course, I now see that it wasn't a full fix. This is all most
> assuredly my fault.

I wouldn’t call it a fault. Where wood is chopped, splinters must fall. (Hopefully a correct translation of a German idiom.) We don’t _have_ to catch everything before it hits master; it is already pretty good if we manage to catch and fix regressions later. I guess we could use the known_broken feature of the test suite more quickly, to signal that an issue is being worked on and to keep others from stumbling over test suite failures.

> - Travis has not picked up on these errors.

Unfortunately, Travis has been slightly less useful for a few weeks now due to T5681 failing (possibly due to the use of LLVM-3.4), but I’m still waiting for a reply on that issue.

But it wouldn’t have helped you: Travis generally skips all performance tests. These used to be far less reliable when I set up Travis (this got better due to the removal of some of the max_bytes_allocated tests, and also due to Harbormaster and ghcspeed monitoring), and also because we still barely make the 50-minute deadline on Travis. So do not rely on Travis for performance measurements. (This is documented on https://ghc.haskell.org/trac/ghc/wiki/Travis, but it needs to become common knowledge.)

Greetings,
Joachim

--
Joachim “nomeata” Breitner
m...@joachim-breitner.de • http://www.joachim-breitner.de/
Jabber: nome...@joachim-breitner.de • GPG-Key: 0xF0FBF51F
Debian Developer: nome...@debian.org
Re: performance regressions
Joachim Breitner m...@joachim-breitner.de writes:

> Hi,
>
> Am Sonntag, den 14.12.2014, 22:37 -0500 schrieb Richard Eisenberg:
> snip
>> - Travis has not picked up on these errors.
>
> Unfortunately, Travis has been slightly less useful for a few weeks now
> due to T5681 failing (possibly due to the use of LLVM-3.4), but I’m
> still waiting for a reply on that issue.

You aren't looking for a response from me on this, are you? I just checked, and I don't seem to have any outstanding messages from you, but it's entirely possible I overlooked something.

Cheers,
- Ben
Re: performance regressions
Joachim Breitner nome...@debian.org writes:

> Hi,
>
> Am Montag, den 15.12.2014, 10:58 -0500 schrieb Ben Gamari:
>>>> - Travis has not picked up on these errors.
>>>
>>> Unfortunately, Travis has been slightly less useful for a few weeks
>>> now due to T5681 failing (possibly due to the use of LLVM-3.4), but
>>> I’m still waiting for a reply on that issue.
>>
>> You aren't looking for a response from me on this, are you? I just
>> checked, and I don't seem to have any outstanding messages from you,
>> but it's entirely possible I overlooked something.
>
> This is independent of our ARM issues, and I think a tad older; I did
> not direct it to anyone specific. But I guess you are likely a person
> who can tell what’s wrong here:
>
> Am Sonntag, den 30.11.2014, 20:01 +0100 schrieb Joachim Breitner:
>> Compile failed (status 256) errors were:
>> /tmp/ghc16123_0/ghc16123_5.s: Assembler messages:
>> /tmp/ghc16123_0/ghc16123_5.s:26:0: Error: can't resolve `.rodata' {.rodata section} - `Main_zdwwork_info$def' {.text section}
>> /tmp/ghc16123_0/ghc16123_5.s:46:0: Error: can't resolve `.rodata' {.rodata section} - `Main_work_info$def' {.text section}
>> /tmp/ghc16123_0/ghc16123_5.s:66:0: Error: can't resolve `.rodata' {.rodata section} - `Main_main1_info$def' {.text section}
>> /tmp/ghc16123_0/ghc16123_5.s:86:0: Error: can't resolve `.rodata' {.rodata section} - `Main_main_info$def' {.text section}
>> /tmp/ghc16123_0/ghc16123_5.s:106:0: Error: can't resolve `.rodata' {.rodata section} - `Main_main2_info$def' {.text section}
>> /tmp/ghc16123_0/ghc16123_5.s:126:0: Error: can't resolve `.rodata' {.rodata section} - `ZCMain_main_info$def' {.text section}
>> *** unexpected failure for T5681(optllvm)
>
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/42557559/log.txt
>
> Any ideas? Is it possible that this is due to the LLVM version used? Do
> we support 3.4 in GHC HEAD?
>
>   Using LLVM tools
>     llc : /usr/local/clang-3.4/bin/llc
>     opt : /usr/local/clang-3.4/bin/opt
>
> (http://smart-cactus.org/~ben/posts/2014-11-28-state-of-llvm-backend.html
> does not talk about GHC HEAD explicitly. Should I look at the 7.10 row?
> Does that mean that 3.4 is not supported? Shouldn’t the build system, or
> at least the compiler, fail harder and more helpfully in this case?)

LLVM 3.4 appears to have an unfortunate behavior whereby it will lose track of which section symbols with Internal linkage belong to. I haven't had a chance to delve into this too deeply; however, given that both 3.3 and 3.5 behave as expected, I'm pretty sure this is a bug. There are a few options here,

  a. Mark the `$def` symbols as ExternallyVisible, working around the
     issue at the expense of exposing internal symbols to the outside
     world.

  b. Mark LLVM 3.4 as unsupported.

At the moment I'm leaning towards (b), since I haven't had a chance to think through the implications of (a); if nothing else, I suspect it wouldn't help the DLL symbol table size issues on Windows. Giving up on LLVM 3.4 might be unfortunate for a good number of users, however.

Ultimately this underlines the need to package LLVM with GHC.

Cheers,
- Ben
Re: performance regressions
I've made progress, but still need some help. It turns out that a monadic combinator (that I wrote) is mostly responsible:

zipWithAndUnzipM :: Monad m
                 => (a -> b -> m (c, d)) -> [a] -> [b] -> m ([c], [d])
zipWithAndUnzipM f (x:xs) (y:ys)
  = do { (c, d)   <- f x y
       ; (cs, ds) <- zipWithAndUnzipM f xs ys
       ; return (c:cs, d:ds) }
zipWithAndUnzipM _ _ _ = return ([], [])

Using this combinator instead of writing the algorithm directly cost me 30% allocation overhead! Can anyone tell me: why? Have I made some horrible mistake in the implementation? And, relatedly: how can I fix this? I want to learn from this experience how to avoid this problem next time...

Unfortunately, my commit causes 50% overhead, not 30%, so I'm not out of the woods yet. Hopefully, another 20% of good news tomorrow.

Thanks!
Richard

On Dec 15, 2014, at 11:33 AM, Ben Gamari b...@smart-cactus.org wrote:

> snip
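One remedy raised later in the thread is specializing the combinator by hand to the concrete monad, so that (>>=) becomes a known, direct call instead of an unknown dictionary method. A minimal sketch of that idea, with IO standing in for the real constraint-solver monad and an invented worker function:

```haskell
-- Hand-specialized copy of Richard's combinator. With the monad fixed
-- to IO, GHC compiles the do-block's binds as direct calls rather than
-- calls through a Monad dictionary, which is the allocation-relevant
-- difference discussed in this thread.
zipWithAndUnzipIO :: (a -> b -> IO (c, d)) -> [a] -> [b] -> IO ([c], [d])
zipWithAndUnzipIO f (x:xs) (y:ys) = do
  (c, d)   <- f x y
  (cs, ds) <- zipWithAndUnzipIO f xs ys
  return (c:cs, d:ds)
zipWithAndUnzipIO _ _ _ = return ([], [])

main :: IO ()
main = do
  (ls, ns) <- zipWithAndUnzipIO
                (\a b -> return (a ++ b, length a)) ["a", "bc"] ["x", "y"]
  print (ls, ns)  -- prints (["ax","bcy"],[1,2])
```

This is only an illustration of the technique; the measured 30% overhead came from the polymorphic version used across module boundaries, which this sketch does not reproduce.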
Re: performance regressions
Hello Richard,

Can you please push the fix asap to master? These performance failures are causing distracting false alarms (in terms of validation failures) on each pushed commit as well as on submitted code revisions.

Thanks,
HVR

On 2014-12-13 at 16:55:40 +0100, Richard Eisenberg wrote:
> Fixed, hopefully!
>
> On Dec 13, 2014, at 10:03 AM, Richard Eisenberg e...@cis.upenn.edu wrote:
>> I think I've fixed this. I've pushed the fix to wip/rae, and am waiting
>> for validation results before pushing to master. My hunch below was
>> right -- it was the change to matchFam, which essentially evaluated
>> type-level functions more strictly. I've now made it lazier again.
>>
>> I'd like to better understand the tradeoff here, and to see if there's
>> a principled sweet spot. But that will happen in a few days. Expect a
>> push to master soon.
>>
>> Again, sorry for the bother.
Re: performance regressions
Hi,

Am Samstag, den 13.12.2014, 10:55 -0500 schrieb Richard Eisenberg:
> Fixed, hopefully!

Mitigated, but still a regression:
http://ghcspeed-nomeata.rhcloud.com/timeline/?exe=2&base=2%2B68&ben=tests%2Falloc%2FT9872a&env=1&revs=50&equid=on#

Is that now a level that we’ll have to live with, or is it still unexpectedly high?

Greetings,
Joachim

--
Joachim “nomeata” Breitner
m...@joachim-breitner.de • http://www.joachim-breitner.de/
Jabber: nome...@joachim-breitner.de • GPG-Key: 0xF0FBF51F
Debian Developer: nome...@debian.org
Re: performance regressions
I pushed my supposed fix yesterday morning, as I emailed out the "Fixed, hopefully" note. Of course, I now see that it wasn't a full fix. This is all most assuredly my fault. However, I also feel that elements of the infrastructure failed me somewhat, making this error easier for me to commit:

- Travis has not picked up on these errors.

- Harbormaster has seemed unreliable, finding spurious compile-time errors sometimes, and sometimes just failing to test code that it sees. For example, when I pushed Diff 1901 to D546, Harbormaster never even attempted it. Also, when I pushed my fix, commit 3ec9391711, Harbormaster also skipped it, as viewable here: https://phabricator.haskell.org/diffusion/GHC/

So, after pushing yesterday morning, I didn't get any email from Harbormaster saying that it failed, so I thought my fix was indeed a fix. Because my local build was a devel2 build (and I had only about 20 minutes to work), I was unable to test locally -- all I could tell is that my fix lowered the numbers (as verified by ghcspeed). Having a weekend full of plans, there wasn't really any opportunity for me to do the work necessary to figure out what's going on. It will be first on my docket tomorrow.

I suppose one lesson here is that I shouldn't push anything at all non-trivial on a Friday afternoon. But I also hope we can either improve the infrastructure (of course, it's *much* better than it was months ago!) or have realistic expectations of what we can expect from it (e.g., trust Harbormaster/Travis when seeking feedback, but always validate locally before actually pushing to master).

More tomorrow,
Richard

On Dec 14, 2014, at 3:47 PM, Joachim Breitner m...@joachim-breitner.de wrote:

> Hi,
>
> Am Samstag, den 13.12.2014, 10:55 -0500 schrieb Richard Eisenberg:
>> Fixed, hopefully!
>
> Mitigated, but still a regression:
> http://ghcspeed-nomeata.rhcloud.com/timeline/?exe=2&base=2%2B68&ben=tests%2Falloc%2FT9872a&env=1&revs=50&equid=on#
>
> Is that now a level that we’ll have to live with, or is it still
> unexpectedly high?
>
> Greetings,
> Joachim
Re: performance regressions
Hi,

Am Freitag, den 12.12.2014, 21:51 -0500 schrieb Richard Eisenberg:
> Phab has shown up some performance regressions in my recent commits. See
> https://phabricator.haskell.org/harbormaster/build/2607/. The failures
> except for haddock.base are new, and evidently my fault. They didn't
> show up on Travis. Will look into it shortly, but I doubt over the
> weekend.

ghcspeed also observes this:
http://ghcspeed-nomeata.rhcloud.com/changes/?rev=7256213843b80d75a86f033be77516a62d56044a&exe=2&env=johan%27s%20buildbot

Especially the T9872 benchmarks have a huge increase in allocations. But you seem to be aware of this, so that’s fine.

Greetings,
Joachim

--
Joachim “nomeata” Breitner
m...@joachim-breitner.de • http://www.joachim-breitner.de/
Jabber: nome...@joachim-breitner.de • GPG-Key: 0xF0FBF51F
Debian Developer: nome...@debian.org
Re: performance regressions
I think I've fixed this. I've pushed the fix to wip/rae, and am waiting for validation results before pushing to master.

My hunch below was right -- it was the change to matchFam, which essentially evaluated type-level functions more strictly. I've now made it lazier again. I'd like to better understand the tradeoff here, and to see if there's a principled sweet spot. But that will happen in a few days. Expect a push to master soon.

Again, sorry for the bother.

Richard

On Dec 13, 2014, at 8:32 AM, Joachim Breitner m...@joachim-breitner.de wrote:

> snip
Re: performance regressions
Fixed, hopefully!

On Dec 13, 2014, at 10:03 AM, Richard Eisenberg e...@cis.upenn.edu wrote:

> snip