Re: performance regressions

2014-12-19 Thread Richard Eisenberg
I've gained a little more insight here, but only a little.

INLINE doesn't work because zipWithAndUnzipM is recursive. Oddly, using 
worker/wrapper made allocations go *up*, so that's somehow not the answer.

When I use `inline`, GHC will inline the function once; recursive calls remain. 
This seems to the local minimum in allocations.
Using `inline` gives roughly a 7% allocation level change on this pathological 
case.

To reproduce, just compile HEAD's TcFlatten with and without the call to 
`inline`. You can see, using -ddump-simpl, what GHC is doing. Then, you can 
check the allocation numbers by compiling perf/compiler/T9872a.hs. My tests 
were all using the default build settings, with an unmodified build.mk, as that 
seemed to be the most performant setting.

In other news, a slight change to the algorithm (see #9872, comment 23) makes a 
50% improvement. Will hopefully commit tonight, after a full local validation 
run.

Richard

On Dec 17, 2014, at 10:59 AM, Simon Peyton Jones simo...@microsoft.com wrote:

 I still would like to understand why INLINE does not make it inline. That's 
 weird.
 
 Eg way to reproduce.
 
 Simion
 
 |  -Original Message-
 |  From: Richard Eisenberg [mailto:e...@cis.upenn.edu]
 |  Sent: 17 December 2014 15:56
 |  To: Simon Peyton Jones
 |  Cc: Joachim Breitner; ghc-devs@haskell.org
 |  Subject: Re: performance regressions
 |  
 |  By unsubstantiated guess is that INLINEABLE would have the same effect
 |  as INLINE here, as GHC doesn't see fit to actually inline the
 |  function, even with INLINE -- the big improvement seen between (1) and
 |  (2) is actually specialization, not inlining. The jump from (2) to (3)
 |  is actual inlining. Thus, it seems that GHC's heuristics for inlining
 |  aren't working out for the best here.
 |  
 |  I've pushed my changes, though I agree with Simon that more research
 |  may uncover even more improvements here. I didn't focus on the number
 |  of calls because that number didn't regress. Will look into this soon.
 |  
 |  Richard
 |  
 |  On Dec 17, 2014, at 4:15 AM, Simon Peyton Jones
 |  simo...@microsoft.com wrote:
 |  
 |   If you use INLINEABLE, that should make the function specialisable
 |  to a particular monad, even if it's in a different module. You
 |  shouldn't need INLINE for that.
 |  
 |   I don't understand the difference between cases (2) and (3).
 |  
 |   I am still suspicious of why there are so many calls to this one
 |  function that it, alone, is allocating a significant proportion of
 |  compilation of the entire run of GHC.  Are you sure there isn't an
 |  algorithmic improvement to be had, to simply reduce the number of
 |  calls?
 |  
 |   Simon
 |  
 |   |  -Original Message-
 |   |  From: ghc-devs [mailto:ghc-devs-boun...@haskell.org] On Behalf Of
 |   | Richard Eisenberg
 |   |  Sent: 16 December 2014 21:46
 |   |  To: Joachim Breitner
 |   |  Cc: ghc-devs@haskell.org
 |   |  Subject: Re: performance regressions
 |   |
 |   |  I've learned several very interesting things in this analysis.
 |   |
 |   |  - Inlining polymorphic methods is very important. Here are some
 |   | data  points to back up that claim:
 |   | * Original implementation using zipWithAndUnzipM:
 |  8,472,613,440
 |   |  bytes allocated in the heap
 |   | * Adding {-# INLINE #-} to the definition thereof:
 |  6,639,253,488
 |   |  bytes allocated in the heap
 |   | * Using `inline` at call site to force inlining:
 |  6,281,539,792
 |   |  bytes allocated in the heap
 |   |
 |   |  The middle step above allowed GHC to specialize zipWithAndUnzipM
 |  to
 |   | my  particular monad, but GHC didn't see fit to actually inline
 |  the
 |   | function. Using `inline` forced it, to good effect. (I did not
 |   | collect  data on code sizes, but it wouldn't be hard to.)
 |   |
 |   |  By comparison:
 |   | * Hand-written recursion:6,587,809,112 bytes allocated in
 |  the
 |   |  heap
 |   |  Interestingly, this is *not* the best result!
 |   |
 |   |  Conclusion: We should probably add INLINE pragmas to Util and
 |   | MonadUtils.
 |   |
 |   |
 |   |  - I then looked at rejiggering the algorithm to keep the common
 |   | case  fast. This had a side effect of changing the
 |  zipWithAndUnzipM
 |   | to  mapAndUnzipM, from Control.Monad. To my surprise, this brought
 |   | disaster!
 |   | * Using `inline` and mapAndUnzipM:7,463,047,432 bytes
 |   |  allocated in the heap
 |   | * Hand-written recursion: 5,848,602,848 bytes
 |   |  allocated in the heap
 |   |
 |   |  That last number is better than the numbers above because of the
 |   | algorithm streamlining. But, the inadequacy of mapAndUnzipM
 |   | surprised  me -- it already has an INLINE pragma in Control.Monad
 |  of course.
 |   |  Looking at -ddump-simpl, it seems that mapAndUnzipM was indeed
 |   | getting  inlined, but a call to `map` remained, perhaps causing
 |   | extra  allocation.
 |   |
 |   |  Conclusion: We

RE: performance regressions

2014-12-17 Thread Simon Peyton Jones
If you use INLINEABLE, that should make the function specialisable to a 
particular monad, even if it's in a different module. You shouldn't need INLINE 
for that.

I don't understand the difference between cases (2) and (3).

I am still suspicious of why there are so many calls to this one function that 
it, alone, is allocating a significant proportion of compilation of the entire 
run of GHC.  Are you sure there isn't an algorithmic improvement to be had, to 
simply reduce the number of calls?

Simon

|  -Original Message-
|  From: ghc-devs [mailto:ghc-devs-boun...@haskell.org] On Behalf Of
|  Richard Eisenberg
|  Sent: 16 December 2014 21:46
|  To: Joachim Breitner
|  Cc: ghc-devs@haskell.org
|  Subject: Re: performance regressions
|  
|  I've learned several very interesting things in this analysis.
|  
|  - Inlining polymorphic methods is very important. Here are some data
|  points to back up that claim:
| * Original implementation using zipWithAndUnzipM:8,472,613,440
|  bytes allocated in the heap
| * Adding {-# INLINE #-} to the definition thereof:   6,639,253,488
|  bytes allocated in the heap
| * Using `inline` at call site to force inlining: 6,281,539,792
|  bytes allocated in the heap
|  
|  The middle step above allowed GHC to specialize zipWithAndUnzipM to my
|  particular monad, but GHC didn't see fit to actually inline the
|  function. Using `inline` forced it, to good effect. (I did not collect
|  data on code sizes, but it wouldn't be hard to.)
|  
|  By comparison:
| * Hand-written recursion:6,587,809,112 bytes allocated in the
|  heap
|  Interestingly, this is *not* the best result!
|  
|  Conclusion: We should probably add INLINE pragmas to Util and
|  MonadUtils.
|  
|  
|  - I then looked at rejiggering the algorithm to keep the common case
|  fast. This had a side effect of changing the zipWithAndUnzipM to
|  mapAndUnzipM, from Control.Monad. To my surprise, this brought
|  disaster!
| * Using `inline` and mapAndUnzipM:7,463,047,432 bytes
|  allocated in the heap
| * Hand-written recursion: 5,848,602,848 bytes
|  allocated in the heap
|  
|  That last number is better than the numbers above because of the
|  algorithm streamlining. But, the inadequacy of mapAndUnzipM surprised
|  me -- it already has an INLINE pragma in Control.Monad of course.
|  Looking at -ddump-simpl, it seems that mapAndUnzipM was indeed getting
|  inlined, but a call to `map` remained, perhaps causing extra
|  allocation.
|  
|  Conclusion: We should examine the implementation of mapAndUnzipM (and
|  similar functions) in Control.Monad. Is it as fast as possible?
|  
|  
|  
|  In the end, I was unable to bring the allocation numbers down to where
|  they were before my work. This is because the flattener now deals in
|  roles. Most of its behavior is the same between nominal and
|  representational roles, so it seems silly (though very possible) to
|  specialize the code to nominal to keep that path fast. Instead, I
|  identified one key spot and made that go fast.
|  
|  Thus, there is a 7% bump to memory usage on very-type-family-heavy
|  code, compared to before my commit on Friday. (On more ordinary code,
|  there is no noticeable change.)
|  
|  Validating my patch locally now; will push when that's done.
|  
|  Thanks,
|  Richard
|  
|  On Dec 16, 2014, at 10:41 AM, Joachim Breitner mail@joachim-
|  breitner.de wrote:
|  
|   Hi,
|  
|  
|   Am Dienstag, den 16.12.2014, 09:59 -0500 schrieb Richard Eisenberg:
|   On Dec 16, 2014, at 4:01 AM, Joachim Breitner mail@joachim-
|  breitner.de wrote:
|  
|   another guess (without looking at the code, sorry): Are they in
|  the
|   same module? I.e., can GHC specialize the code to your particular
|  Monad?
|  
|   No, they're not in the same module. I could also try moving the
|   zipWithAndUnzipM function to the same module, and even specializing
|   it by hand to the right monad.
|  
|   I did mean zipWithAndUnzipM, so maybe yes: Try that.
|  
|   (I find it hard to believe that any polymorphic monadic code should
|   perform well, with those many calls to an unknown (=) with a
|   function parameter, but maybe I'm too pessimistic here.)
|  
|  
|   Could that be preventing the fusing?
|  
|   There is not going to be any fusing here, at least not list fusion;
|   that would require your code to be written in terms of functions
|  with
|   fusion rules.
|  
|   Greetings,
|   Joachim
|  
|   --
|   Joachim nomeata Breitner
|m...@joachim-breitner.de * http://www.joachim-breitner.de/
|Jabber: nome...@joachim-breitner.de  * GPG-Key: 0xF0FBF51F  Debian
|   Developer: nome...@debian.org
|  
|   ___
|   ghc-devs mailing list
|   ghc-devs@haskell.org
|   http://www.haskell.org/mailman/listinfo/ghc-devs
|  
|  ___
|  ghc-devs mailing list
|  ghc-devs@haskell.org
|  http://www.haskell.org/mailman/listinfo

Re: performance regressions

2014-12-17 Thread Richard Eisenberg
By unsubstantiated guess is that INLINEABLE would have the same effect as 
INLINE here, as GHC doesn't see fit to actually inline the function, even with 
INLINE -- the big improvement seen between (1) and (2) is actually 
specialization, not inlining. The jump from (2) to (3) is actual inlining. 
Thus, it seems that GHC's heuristics for inlining aren't working out for the 
best here.

I've pushed my changes, though I agree with Simon that more research may 
uncover even more improvements here. I didn't focus on the number of calls 
because that number didn't regress. Will look into this soon.

Richard

On Dec 17, 2014, at 4:15 AM, Simon Peyton Jones simo...@microsoft.com wrote:

 If you use INLINEABLE, that should make the function specialisable to a 
 particular monad, even if it's in a different module. You shouldn't need 
 INLINE for that.
 
 I don't understand the difference between cases (2) and (3).
 
 I am still suspicious of why there are so many calls to this one function 
 that it, alone, is allocating a significant proportion of compilation of the 
 entire run of GHC.  Are you sure there isn't an algorithmic improvement to be 
 had, to simply reduce the number of calls?
 
 Simon
 
 |  -Original Message-
 |  From: ghc-devs [mailto:ghc-devs-boun...@haskell.org] On Behalf Of
 |  Richard Eisenberg
 |  Sent: 16 December 2014 21:46
 |  To: Joachim Breitner
 |  Cc: ghc-devs@haskell.org
 |  Subject: Re: performance regressions
 |  
 |  I've learned several very interesting things in this analysis.
 |  
 |  - Inlining polymorphic methods is very important. Here are some data
 |  points to back up that claim:
 | * Original implementation using zipWithAndUnzipM:8,472,613,440
 |  bytes allocated in the heap
 | * Adding {-# INLINE #-} to the definition thereof:   6,639,253,488
 |  bytes allocated in the heap
 | * Using `inline` at call site to force inlining: 6,281,539,792
 |  bytes allocated in the heap
 |  
 |  The middle step above allowed GHC to specialize zipWithAndUnzipM to my
 |  particular monad, but GHC didn't see fit to actually inline the
 |  function. Using `inline` forced it, to good effect. (I did not collect
 |  data on code sizes, but it wouldn't be hard to.)
 |  
 |  By comparison:
 | * Hand-written recursion:6,587,809,112 bytes allocated in the
 |  heap
 |  Interestingly, this is *not* the best result!
 |  
 |  Conclusion: We should probably add INLINE pragmas to Util and
 |  MonadUtils.
 |  
 |  
 |  - I then looked at rejiggering the algorithm to keep the common case
 |  fast. This had a side effect of changing the zipWithAndUnzipM to
 |  mapAndUnzipM, from Control.Monad. To my surprise, this brought
 |  disaster!
 | * Using `inline` and mapAndUnzipM:7,463,047,432 bytes
 |  allocated in the heap
 | * Hand-written recursion: 5,848,602,848 bytes
 |  allocated in the heap
 |  
 |  That last number is better than the numbers above because of the
 |  algorithm streamlining. But, the inadequacy of mapAndUnzipM surprised
 |  me -- it already has an INLINE pragma in Control.Monad of course.
 |  Looking at -ddump-simpl, it seems that mapAndUnzipM was indeed getting
 |  inlined, but a call to `map` remained, perhaps causing extra
 |  allocation.
 |  
 |  Conclusion: We should examine the implementation of mapAndUnzipM (and
 |  similar functions) in Control.Monad. Is it as fast as possible?
 |  
 |  
 |  
 |  In the end, I was unable to bring the allocation numbers down to where
 |  they were before my work. This is because the flattener now deals in
 |  roles. Most of its behavior is the same between nominal and
 |  representational roles, so it seems silly (though very possible) to
 |  specialize the code to nominal to keep that path fast. Instead, I
 |  identified one key spot and made that go fast.
 |  
 |  Thus, there is a 7% bump to memory usage on very-type-family-heavy
 |  code, compared to before my commit on Friday. (On more ordinary code,
 |  there is no noticeable change.)
 |  
 |  Validating my patch locally now; will push when that's done.
 |  
 |  Thanks,
 |  Richard
 |  
 |  On Dec 16, 2014, at 10:41 AM, Joachim Breitner mail@joachim-
 |  breitner.de wrote:
 |  
 |   Hi,
 |  
 |  
 |   Am Dienstag, den 16.12.2014, 09:59 -0500 schrieb Richard Eisenberg:
 |   On Dec 16, 2014, at 4:01 AM, Joachim Breitner mail@joachim-
 |  breitner.de wrote:
 |  
 |   another guess (without looking at the code, sorry): Are they in
 |  the
 |   same module? I.e., can GHC specialize the code to your particular
 |  Monad?
 |  
 |   No, they're not in the same module. I could also try moving the
 |   zipWithAndUnzipM function to the same module, and even specializing
 |   it by hand to the right monad.
 |  
 |   I did mean zipWithAndUnzipM, so maybe yes: Try that.
 |  
 |   (I find it hard to believe that any polymorphic monadic code should
 |   perform well, with those many calls to an unknown (=) with a
 |   function parameter

RE: performance regressions

2014-12-17 Thread Simon Peyton Jones
I still would like to understand why INLINE does not make it inline. That's 
weird.

Eg way to reproduce.

Simion

|  -Original Message-
|  From: Richard Eisenberg [mailto:e...@cis.upenn.edu]
|  Sent: 17 December 2014 15:56
|  To: Simon Peyton Jones
|  Cc: Joachim Breitner; ghc-devs@haskell.org
|  Subject: Re: performance regressions
|  
|  By unsubstantiated guess is that INLINEABLE would have the same effect
|  as INLINE here, as GHC doesn't see fit to actually inline the
|  function, even with INLINE -- the big improvement seen between (1) and
|  (2) is actually specialization, not inlining. The jump from (2) to (3)
|  is actual inlining. Thus, it seems that GHC's heuristics for inlining
|  aren't working out for the best here.
|  
|  I've pushed my changes, though I agree with Simon that more research
|  may uncover even more improvements here. I didn't focus on the number
|  of calls because that number didn't regress. Will look into this soon.
|  
|  Richard
|  
|  On Dec 17, 2014, at 4:15 AM, Simon Peyton Jones
|  simo...@microsoft.com wrote:
|  
|   If you use INLINEABLE, that should make the function specialisable
|  to a particular monad, even if it's in a different module. You
|  shouldn't need INLINE for that.
|  
|   I don't understand the difference between cases (2) and (3).
|  
|   I am still suspicious of why there are so many calls to this one
|  function that it, alone, is allocating a significant proportion of
|  compilation of the entire run of GHC.  Are you sure there isn't an
|  algorithmic improvement to be had, to simply reduce the number of
|  calls?
|  
|   Simon
|  
|   |  -Original Message-
|   |  From: ghc-devs [mailto:ghc-devs-boun...@haskell.org] On Behalf Of
|   | Richard Eisenberg
|   |  Sent: 16 December 2014 21:46
|   |  To: Joachim Breitner
|   |  Cc: ghc-devs@haskell.org
|   |  Subject: Re: performance regressions
|   |
|   |  I've learned several very interesting things in this analysis.
|   |
|   |  - Inlining polymorphic methods is very important. Here are some
|   | data  points to back up that claim:
|   | * Original implementation using zipWithAndUnzipM:
|  8,472,613,440
|   |  bytes allocated in the heap
|   | * Adding {-# INLINE #-} to the definition thereof:
|  6,639,253,488
|   |  bytes allocated in the heap
|   | * Using `inline` at call site to force inlining:
|  6,281,539,792
|   |  bytes allocated in the heap
|   |
|   |  The middle step above allowed GHC to specialize zipWithAndUnzipM
|  to
|   | my  particular monad, but GHC didn't see fit to actually inline
|  the
|   | function. Using `inline` forced it, to good effect. (I did not
|   | collect  data on code sizes, but it wouldn't be hard to.)
|   |
|   |  By comparison:
|   | * Hand-written recursion:6,587,809,112 bytes allocated in
|  the
|   |  heap
|   |  Interestingly, this is *not* the best result!
|   |
|   |  Conclusion: We should probably add INLINE pragmas to Util and
|   | MonadUtils.
|   |
|   |
|   |  - I then looked at rejiggering the algorithm to keep the common
|   | case  fast. This had a side effect of changing the
|  zipWithAndUnzipM
|   | to  mapAndUnzipM, from Control.Monad. To my surprise, this brought
|   | disaster!
|   | * Using `inline` and mapAndUnzipM:7,463,047,432 bytes
|   |  allocated in the heap
|   | * Hand-written recursion: 5,848,602,848 bytes
|   |  allocated in the heap
|   |
|   |  That last number is better than the numbers above because of the
|   | algorithm streamlining. But, the inadequacy of mapAndUnzipM
|   | surprised  me -- it already has an INLINE pragma in Control.Monad
|  of course.
|   |  Looking at -ddump-simpl, it seems that mapAndUnzipM was indeed
|   | getting  inlined, but a call to `map` remained, perhaps causing
|   | extra  allocation.
|   |
|   |  Conclusion: We should examine the implementation of mapAndUnzipM
|   | (and  similar functions) in Control.Monad. Is it as fast as
|  possible?
|   |
|   |
|   |
|   |  In the end, I was unable to bring the allocation numbers down to
|   | where  they were before my work. This is because the flattener now
|   | deals in  roles. Most of its behavior is the same between nominal
|   | and  representational roles, so it seems silly (though very
|   | possible) to  specialize the code to nominal to keep that path
|  fast.
|   | Instead, I  identified one key spot and made that go fast.
|   |
|   |  Thus, there is a 7% bump to memory usage on very-type-family-
|  heavy
|   | code, compared to before my commit on Friday. (On more ordinary
|   | code,  there is no noticeable change.)
|   |
|   |  Validating my patch locally now; will push when that's done.
|   |
|   |  Thanks,
|   |  Richard
|   |
|   |  On Dec 16, 2014, at 10:41 AM, Joachim Breitner mail@joachim-
|   | breitner.de wrote:
|   |
|   |   Hi,
|   |  
|   |  
|   |   Am Dienstag, den 16.12.2014, 09:59 -0500 schrieb Richard
|  Eisenberg:
|   |   On Dec 16, 2014, at 4:01 AM

Re: performance regressions

2014-12-17 Thread John Lato
Is it possible INLINE didn't inline the function because it's recursive? If
it were my function, I'd probably try a manual worker /wrapper.

On 07:59, Wed, Dec 17, 2014 Simon Peyton Jones simo...@microsoft.com
wrote:

 I still would like to understand why INLINE does not make it inline.
 That's weird.

 Eg way to reproduce.

 Simion

 |  -Original Message-
 |  From: Richard Eisenberg [mailto:e...@cis.upenn.edu]
 |  Sent: 17 December 2014 15:56
 |  To: Simon Peyton Jones
 |  Cc: Joachim Breitner; ghc-devs@haskell.org
 |  Subject: Re: performance regressions
 |
 |  By unsubstantiated guess is that INLINEABLE would have the same effect
 |  as INLINE here, as GHC doesn't see fit to actually inline the
 |  function, even with INLINE -- the big improvement seen between (1) and
 |  (2) is actually specialization, not inlining. The jump from (2) to (3)
 |  is actual inlining. Thus, it seems that GHC's heuristics for inlining
 |  aren't working out for the best here.
 |
 |  I've pushed my changes, though I agree with Simon that more research
 |  may uncover even more improvements here. I didn't focus on the number
 |  of calls because that number didn't regress. Will look into this soon.
 |
 |  Richard
 |
 |  On Dec 17, 2014, at 4:15 AM, Simon Peyton Jones
 |  simo...@microsoft.com wrote:
 |
 |   If you use INLINEABLE, that should make the function specialisable
 |  to a particular monad, even if it's in a different module. You
 |  shouldn't need INLINE for that.
 |  
 |   I don't understand the difference between cases (2) and (3).
 |  
 |   I am still suspicious of why there are so many calls to this one
 |  function that it, alone, is allocating a significant proportion of
 |  compilation of the entire run of GHC.  Are you sure there isn't an
 |  algorithmic improvement to be had, to simply reduce the number of
 |  calls?
 |  
 |   Simon
 |  
 |   |  -Original Message-
 |   |  From: ghc-devs [mailto:ghc-devs-boun...@haskell.org] On Behalf Of
 |   | Richard Eisenberg
 |   |  Sent: 16 December 2014 21:46
 |   |  To: Joachim Breitner
 |   |  Cc: ghc-devs@haskell.org
 |   |  Subject: Re: performance regressions
 |   |
 |   |  I've learned several very interesting things in this analysis.
 |   |
 |   |  - Inlining polymorphic methods is very important. Here are some
 |   | data  points to back up that claim:
 |   | * Original implementation using zipWithAndUnzipM:
 |  8,472,613,440
 |   |  bytes allocated in the heap
 |   | * Adding {-# INLINE #-} to the definition thereof:
 |  6,639,253,488
 |   |  bytes allocated in the heap
 |   | * Using `inline` at call site to force inlining:
 |  6,281,539,792
 |   |  bytes allocated in the heap
 |   |
 |   |  The middle step above allowed GHC to specialize zipWithAndUnzipM
 |  to
 |   | my  particular monad, but GHC didn't see fit to actually inline
 |  the
 |   | function. Using `inline` forced it, to good effect. (I did not
 |   | collect  data on code sizes, but it wouldn't be hard to.)
 |   |
 |   |  By comparison:
 |   | * Hand-written recursion:6,587,809,112 bytes allocated in
 |  the
 |   |  heap
 |   |  Interestingly, this is *not* the best result!
 |   |
 |   |  Conclusion: We should probably add INLINE pragmas to Util and
 |   | MonadUtils.
 |   |
 |   |
 |   |  - I then looked at rejiggering the algorithm to keep the common
 |   | case  fast. This had a side effect of changing the
 |  zipWithAndUnzipM
 |   | to  mapAndUnzipM, from Control.Monad. To my surprise, this brought
 |   | disaster!
 |   | * Using `inline` and mapAndUnzipM:7,463,047,432 bytes
 |   |  allocated in the heap
 |   | * Hand-written recursion: 5,848,602,848 bytes
 |   |  allocated in the heap
 |   |
 |   |  That last number is better than the numbers above because of the
 |   | algorithm streamlining. But, the inadequacy of mapAndUnzipM
 |   | surprised  me -- it already has an INLINE pragma in Control.Monad
 |  of course.
 |   |  Looking at -ddump-simpl, it seems that mapAndUnzipM was indeed
 |   | getting  inlined, but a call to `map` remained, perhaps causing
 |   | extra  allocation.
 |   |
 |   |  Conclusion: We should examine the implementation of mapAndUnzipM
 |   | (and  similar functions) in Control.Monad. Is it as fast as
 |  possible?
 |   |
 |   |
 |   |
 |   |  In the end, I was unable to bring the allocation numbers down to
 |   | where  they were before my work. This is because the flattener now
 |   | deals in  roles. Most of its behavior is the same between nominal
 |   | and  representational roles, so it seems silly (though very
 |   | possible) to  specialize the code to nominal to keep that path
 |  fast.
 |   | Instead, I  identified one key spot and made that go fast.
 |   |
 |   |  Thus, there is a 7% bump to memory usage on very-type-family-
 |  heavy
 |   | code, compared to before my commit on Friday. (On more ordinary
 |   | code,  there is no noticeable change.)
 |   |
 |   |  Validating my patch

Re: performance regressions

2014-12-17 Thread Herbert Valerio Riedel
On 2014-12-16 at 22:45:36 +0100, Richard Eisenberg wrote:
 I've learned several very interesting things in this analysis.

 - Inlining polymorphic methods is very important. 

otoh, there are cases where marking methods INLINE have catastrophic
effects; the following

  
https://github.com/kolmodin/binary/commit/48c02966512a67b018fcdf093fab8d34bddf9a69

was necessary a few months ago, as otherwise GHC HEAD's compile-time
memory usage would explode:

  https://ghc.haskell.org/trac/ghc/ticket/9630

___
ghc-devs mailing list
ghc-devs@haskell.org
http://www.haskell.org/mailman/listinfo/ghc-devs


Re: performance regressions

2014-12-16 Thread Jan Stolarek
 Using this combinator instead of writing the algorithm directly cost me 30%
 allocation overhead!
What does your algorithm look like when you write it directly? Something like 
this:

flatten_many fmode roles tys
 = unzip `liftM` mapM go (zip roles tys)
 where
   go (Nominal,ty)  = flatten_one (fmode { fe_eq_rel = NomEq }) ty
   go (Representational,ty) = flatten_one (fmode { fe_eq_rel = ReprEq }) ty
   go (Phantom, ty) = -- See Note [Phantoms in the flattener]
  return (ty, mkTcPhantomCo ty ty)

?

Maybe this has something to do with `zipWithAndUnzipM` not being tail-recursive 
vs. direct version 
being able to fuse intermediate lists?

Janek
___
ghc-devs mailing list
ghc-devs@haskell.org
http://www.haskell.org/mailman/listinfo/ghc-devs


Re: performance regressions

2014-12-16 Thread Joachim Breitner
Hi,

Am Montag, den 15.12.2014, 23:48 -0500 schrieb Richard Eisenberg:
 Can anyone tell me: why? Have I made some horrible mistake in the 
 implementation? 
 And, relatedly: how can I fix this? I want to learn from this experience how 
 to avoid this problem next time...

another guess (without looking at the code, sorry): Are they in the same
module? I.e., can GHC specialize the code to your particular Monad?

Greetings,
Joachim

-- 
Joachim Breitner
  e-Mail: m...@joachim-breitner.de
  Homepage: http://www.joachim-breitner.de
  Jabber-ID: nome...@joachim-breitner.de



signature.asc
Description: This is a digitally signed message part
___
ghc-devs mailing list
ghc-devs@haskell.org
http://www.haskell.org/mailman/listinfo/ghc-devs


RE: performance regressions

2014-12-16 Thread Simon Peyton Jones
|  Using this combinator instead of writing the algorithm directly cost
|  me 30% allocation overhead!

That seems surprising.  I'd build a profiled compiler (GhcProfiled=YES) and see 
what it says.  

If it increases allocation by 30% overall, there must be a LOT of calls to this 
function.  Should there be so many?

S

|  -Original Message-
|  From: ghc-devs [mailto:ghc-devs-boun...@haskell.org] On Behalf Of
|  Richard Eisenberg
|  Sent: 16 December 2014 04:49
|  To: Ben Gamari
|  Cc: Joachim Breitner; ghc-devs@haskell.org
|  Subject: Re: performance regressions
|  
|  I've made progress, but still need some help.
|  
|  It turns out that a monadic combinator (that I wrote) is mostly
|  responsible:
|  
|   zipWithAndUnzipM :: Monad m
|= (a - b - m (c, d)) - [a] - [b] - m ([c],
|  [d])
|   zipWithAndUnzipM f (x:xs) (y:ys)
| = do { (c, d) - f x y
|  ; (cs, ds) - zipWithAndUnzipM f xs ys
|  ; return (c:cs, d:ds) }
|   zipWithAndUnzipM _ _ _ = return ([], [])
|  
|  
|  Using this combinator instead of writing the algorithm directly cost
|  me 30% allocation overhead!
|  
|  Can anyone tell me: why? Have I made some horrible mistake in the
|  implementation?
|  And, relatedly: how can I fix this? I want to learn from this
|  experience how to avoid this problem next time...
|  
|  Unfortunately, my commit causes 50% overhead, not 30%, so I'm not out
|  of the woods yet. Hopefully, another 20% of good news tomorrow.
|  
|  Thanks!
|  Richard
|  
|  On Dec 15, 2014, at 11:33 AM, Ben Gamari b...@smart-cactus.org wrote:
|  
|   Joachim Breitner nome...@debian.org writes:
|  
|   Hi,
|  
|   Am Montag, den 15.12.2014, 10:58 -0500 schrieb Ben Gamari:
|   - Travis has not picked up on these errors.
|  
|   unfortunately, travis is slighly less useful since a few weeks
|  due
|   to
|   T5681 failing (possibly due to the use of LLVM-3.4), but I'm
|  still
|   waiting for an reply on that issue.
|  
|   You aren't looking for a response from me on this, are you? I just
|   checked and I don't seem to have any outstanding messages from you
|   but it's entirely possible I overlooked something.
|  
|   this is independent of our arm issues, and I think a tad older; I
|  did
|   not direct it to anyone specific.
|  
|   But I guess you are likely a person that can tell what's wrong
|  here:
|  
|   Am Sonntag, den 30.11.2014, 20:01 +0100 schrieb Joachim Breitner:
|   Compile failed (status 256) errors were:
|   /tmp/ghc16123_0/ghc16123_5.s: Assembler messages:
|  
|   /tmp/ghc16123_0/ghc16123_5.s:26:0:
|   Error: can't resolve `.rodata' {.rodata section} -
|   `Main_zdwwork_info$def' {.text section}
|  
|   /tmp/ghc16123_0/ghc16123_5.s:46:0:
|   Error: can't resolve `.rodata' {.rodata section} -
|   `Main_work_info$def' {.text section}
|  
|   /tmp/ghc16123_0/ghc16123_5.s:66:0:
|   Error: can't resolve `.rodata' {.rodata section} -
|   `Main_main1_info$def' {.text section}
|  
|   /tmp/ghc16123_0/ghc16123_5.s:86:0:
|   Error: can't resolve `.rodata' {.rodata section} -
|   `Main_main_info$def' {.text section}
|  
|   /tmp/ghc16123_0/ghc16123_5.s:106:0:
|   Error: can't resolve `.rodata' {.rodata section} -
|   `Main_main2_info$def' {.text section}
|  
|   /tmp/ghc16123_0/ghc16123_5.s:126:0:
|   Error: can't resolve `.rodata' {.rodata section} -
|   `ZCMain_main_info$def' {.text section}
|  
|   *** unexpected failure for T5681(optllvm)
|  
|   https://s3.amazonaws.com/archive.travis-
|  ci.org/jobs/42557559/log.txt
|  
|   Any ideas?
|  
|   Is it possible that this is due the llvm version used? Do we
|  support
|   3.4 in GHC HEAD?
|  
| Using LLVM tools
|llc   : /usr/local/clang-3.4/bin/llc
|opt   : /usr/local/clang-3.4/bin/opt
|  
|   (http://smart-cactus.org/~ben/posts/2014-11-28-state-of-llvm-
|  backend.
|   html does not talk about GHC HEAD explicitly. Should I look at the
|   7.10 row? Does that mean that 3.4 is not supported? Shouldn't the
|   build system, or at least the compiler, fail harder and more
|   helpfully in this case?)
|  
|   LLVM 3.4 appears to have an unfortunate behavior whereby it will
|  lose
|   track of which section symbols with Internal linkage belong.  I
|   haven't had a chance to delve into this too deeply, however given
|  that
|   both 3.3 and
|   3.5 behave as expected I'm pretty sure this a bug. There are a few
|   options here,
|  
|a. Mark the `$def` symbols as ExternallyVisible, working around the
|   issue at the expense of exposing internal symbols to the outside
|   world.
|  
|b. Mark LLVM 3.4 as unsupported
|  
|   At the moment I'm leaning towards (b) since I haven't had a chance
|  to
|   think through the implications of (a); if nothing else I suspect
|  this
|   wouldn't help the DLL symbol table size issues on Windows. Giving up
|   on LLVM 3.4 might be unfortunate for a good number of users,
|  however.
|  
|   Ultimately this underlines the need to package LLVM

Re: performance regressions

2014-12-16 Thread Richard Eisenberg
I've learned several very interesting things in this analysis.

- Inlining polymorphic methods is very important. Here are some data points to 
back up that claim:
   * Original implementation using zipWithAndUnzipM:8,472,613,440 bytes 
allocated in the heap
   * Adding {-# INLINE #-} to the definition thereof:   6,639,253,488 bytes 
allocated in the heap
   * Using `inline` at call site to force inlining: 6,281,539,792 bytes 
allocated in the heap

The middle step above allowed GHC to specialize zipWithAndUnzipM to my 
particular monad, but GHC didn't see fit to actually inline the function. Using 
`inline` forced it, to good effect. (I did not collect data on code sizes, but 
it wouldn't be hard to.)

By comparison:
   * Hand-written recursion:6,587,809,112 bytes allocated in the heap
Interestingly, this is *not* the best result!

Conclusion: We should probably add INLINE pragmas to Util and MonadUtils.


- I then looked at rejiggering the algorithm to keep the common case fast. This 
had a side effect of changing the zipWithAndUnzipM to mapAndUnzipM, from 
Control.Monad. To my surprise, this brought disaster!
   * Using `inline` and mapAndUnzipM:7,463,047,432 bytes allocated in 
the heap
   * Hand-written recursion: 5,848,602,848 bytes allocated in 
the heap

That last number is better than the numbers above because of the algorithm 
streamlining. But, the inadequacy of mapAndUnzipM surprised me -- it already 
has an INLINE pragma in Control.Monad of course. Looking at -ddump-simpl, it 
seems that mapAndUnzipM was indeed getting inlined, but a call to `map` 
remained, perhaps causing extra allocation.

Conclusion: We should examine the implementation of mapAndUnzipM (and similar 
functions) in Control.Monad. Is it as fast as possible?



In the end, I was unable to bring the allocation numbers down to where they 
were before my work. This is because the flattener now deals in roles. Most of 
its behavior is the same between nominal and representational roles, so it 
seems silly (though very possible) to specialize the code to nominal to keep 
that path fast. Instead, I identified one key spot and made that go fast. 

Thus, there is a 7% bump to memory usage on very-type-family-heavy code, 
compared to before my commit on Friday. (On more ordinary code, there is no 
noticeable change.)

Validating my patch locally now; will push when that's done.

Thanks,
Richard

On Dec 16, 2014, at 10:41 AM, Joachim Breitner m...@joachim-breitner.de wrote:

 Hi,
 
 
 Am Dienstag, den 16.12.2014, 09:59 -0500 schrieb Richard Eisenberg:
 On Dec 16, 2014, at 4:01 AM, Joachim Breitner m...@joachim-breitner.de 
 wrote:
 
 another guess (without looking at the code, sorry): Are they in the same
 module? I.e., can GHC specialize the code to your particular Monad?
 
 No, they're not in the same module. I could also try moving the
 zipWithAndUnzipM function to the same module, and even specializing it
 by hand to the right monad.
 
 I did mean zipWithAndUnzipM, so maybe yes: Try that.
 
 (I find it hard to believe that any polymorphic monadic code should
 perform well, with those many calls to an unknown (=) with a function
 parameter, but maybe I’m too pessimistic here.)
 
 
 Could that be preventing the fusing?
 
 There is not going to be any fusing here, at least not list fusion; that
 would require your code to be written in terms of functions with fusion
 rules.
 
 Greetings,
 Joachim
 
 -- 
 Joachim “nomeata” Breitner
  m...@joachim-breitner.de • http://www.joachim-breitner.de/
  Jabber: nome...@joachim-breitner.de  • GPG-Key: 0xF0FBF51F
  Debian Developer: nome...@debian.org
 
 ___
 ghc-devs mailing list
 ghc-devs@haskell.org
 http://www.haskell.org/mailman/listinfo/ghc-devs

___
ghc-devs mailing list
ghc-devs@haskell.org
http://www.haskell.org/mailman/listinfo/ghc-devs


Re: performance regressions

2014-12-16 Thread Carter Schonwald
Would specialize pragmas also have these performance improvements, or is
inline needed?


On Tue, Dec 16, 2014 at 4:45 PM, Richard Eisenberg e...@cis.upenn.edu
wrote:

 I've learned several very interesting things in this analysis.

 - Inlining polymorphic methods is very important. Here are some data
 points to back up that claim:
* Original implementation using zipWithAndUnzipM:8,472,613,440
 bytes allocated in the heap
* Adding {-# INLINE #-} to the definition thereof:   6,639,253,488
 bytes allocated in the heap
* Using `inline` at call site to force inlining: 6,281,539,792
 bytes allocated in the heap

 The middle step above allowed GHC to specialize zipWithAndUnzipM to my
 particular monad, but GHC didn't see fit to actually inline the function.
 Using `inline` forced it, to good effect. (I did not collect data on code
 sizes, but it wouldn't be hard to.)

 By comparison:
* Hand-written recursion:6,587,809,112 bytes allocated in the heap
 Interestingly, this is *not* the best result!

 Conclusion: We should probably add INLINE pragmas to Util and MonadUtils.


 - I then looked at rejiggering the algorithm to keep the common case fast.
 This had a side effect of changing the zipWithAndUnzipM to mapAndUnzipM,
 from Control.Monad. To my surprise, this brought disaster!
* Using `inline` and mapAndUnzipM:7,463,047,432 bytes allocated
 in the heap
* Hand-written recursion: 5,848,602,848 bytes allocated
 in the heap

 That last number is better than the numbers above because of the algorithm
 streamlining. But, the inadequacy of mapAndUnzipM surprised me -- it
 already has an INLINE pragma in Control.Monad of course. Looking at
 -ddump-simpl, it seems that mapAndUnzipM was indeed getting inlined, but a
 call to `map` remained, perhaps causing extra allocation.

 Conclusion: We should examine the implementation of mapAndUnzipM (and
 similar functions) in Control.Monad. Is it as fast as possible?



 In the end, I was unable to bring the allocation numbers down to where
 they were before my work. This is because the flattener now deals in roles.
 Most of its behavior is the same between nominal and representational
 roles, so it seems silly (though very possible) to specialize the code to
 nominal to keep that path fast. Instead, I identified one key spot and made
 that go fast.

 Thus, there is a 7% bump to memory usage on very-type-family-heavy code,
 compared to before my commit on Friday. (On more ordinary code, there is no
 noticeable change.)

 Validating my patch locally now; will push when that's done.

 Thanks,
 Richard

 On Dec 16, 2014, at 10:41 AM, Joachim Breitner m...@joachim-breitner.de
 wrote:

  Hi,
 
 
  Am Dienstag, den 16.12.2014, 09:59 -0500 schrieb Richard Eisenberg:
  On Dec 16, 2014, at 4:01 AM, Joachim Breitner m...@joachim-breitner.de
 wrote:
 
  another guess (without looking at the code, sorry): Are they in the
 same
  module? I.e., can GHC specialize the code to your particular Monad?
 
  No, they're not in the same module. I could also try moving the
  zipWithAndUnzipM function to the same module, and even specializing it
  by hand to the right monad.
 
  I did mean zipWithAndUnzipM, so maybe yes: Try that.
 
  (I find it hard to believe that any polymorphic monadic code should
  perform well, with those many calls to an unknown (=) with a function
  parameter, but maybe I’m too pessimistic here.)
 
 
  Could that be preventing the fusing?
 
  There is not going to be any fusing here, at least not list fusion; that
  would require your code to be written in terms of functions with fusion
  rules.
 
  Greetings,
  Joachim
 
  --
  Joachim “nomeata” Breitner
   m...@joachim-breitner.de • http://www.joachim-breitner.de/
   Jabber: nome...@joachim-breitner.de  • GPG-Key: 0xF0FBF51F
   Debian Developer: nome...@debian.org
 
  ___
  ghc-devs mailing list
  ghc-devs@haskell.org
  http://www.haskell.org/mailman/listinfo/ghc-devs

 ___
 ghc-devs mailing list
 ghc-devs@haskell.org
 http://www.haskell.org/mailman/listinfo/ghc-devs

___
ghc-devs mailing list
ghc-devs@haskell.org
http://www.haskell.org/mailman/listinfo/ghc-devs


Re: performance regressions

2014-12-15 Thread Joachim Breitner
Hi,

Am Sonntag, den 14.12.2014, 22:37 -0500 schrieb Richard Eisenberg:
 Of course, I now see that it wasn't a full fix.
 
 This is all most assuredly my fault.

I wouldn’t call it a fault. Where wood is chopped, splinters must fall.
(Hopefully correct translation of a German idiom.)

We don’t _have_ to catch everything before it hits master, it is already
pretty good if we manage to catch and fix regressions later.

I guess we could use the known_broken feature of the test suite more
quickly, to signal that an issue is being worked on and to avoid others
from stumbling over test suite failures.

 - Travis has not picked up on these errors.

unfortunately, travis is slighly less useful since a few weeks due to
T5681 failing (possibly due to the use of LLVM-3.4), but I’m still
waiting for an reply on that issue.

But it wouldn’t have helped you: Travis generally skips all performance
tests. These used to be far less reliable when I set up travis (this got
better due to the removal of some of the max_bytes_allocated tests, and
also due to harbormaster and ghcspeed monitoring), and also because we
still hardly make the 50 minute deadline on Travis.

So do not rely on Travis for performance measurements.
(This is documented on https://ghc.haskell.org/trac/ghc/wiki/Travis, but
it needs to become common knowledge.)


Greetings,
Joachim

-- 
Joachim “nomeata” Breitner
  m...@joachim-breitner.de • http://www.joachim-breitner.de/
  Jabber: nome...@joachim-breitner.de  • GPG-Key: 0xF0FBF51F
  Debian Developer: nome...@debian.org



signature.asc
Description: This is a digitally signed message part
___
ghc-devs mailing list
ghc-devs@haskell.org
http://www.haskell.org/mailman/listinfo/ghc-devs


Re: performance regressions

2014-12-15 Thread Ben Gamari
Joachim Breitner m...@joachim-breitner.de writes:

 Hi,

 Am Sonntag, den 14.12.2014, 22:37 -0500 schrieb Richard Eisenberg:

snip

 - Travis has not picked up on these errors.

 unfortunately, travis is slighly less useful since a few weeks due to
 T5681 failing (possibly due to the use of LLVM-3.4), but I’m still
 waiting for an reply on that issue.

You aren't looking for a response from me on this, are you? I just
checked and I don't seem to have any outstanding messages from you but
it's entirely possible I overlooked something.

Cheers,

- Ben



pgpy6aPIougxw.pgp
Description: PGP signature
___
ghc-devs mailing list
ghc-devs@haskell.org
http://www.haskell.org/mailman/listinfo/ghc-devs


Re: performance regressions

2014-12-15 Thread Ben Gamari
Joachim Breitner nome...@debian.org writes:

 Hi,

 Am Montag, den 15.12.2014, 10:58 -0500 schrieb Ben Gamari:
  - Travis has not picked up on these errors.
 
  unfortunately, travis is slighly less useful since a few weeks due to
  T5681 failing (possibly due to the use of LLVM-3.4), but I’m still
  waiting for an reply on that issue.
 
 You aren't looking for a response from me on this, are you? I just
 checked and I don't seem to have any outstanding messages from you but
 it's entirely possible I overlooked something.

 this is independent of our arm issues, and I think a tad older; I did
 not direct it to anyone specific.

 But I guess you are likely a person that can tell what’s wrong here:

 Am Sonntag, den 30.11.2014, 20:01 +0100 schrieb Joachim Breitner:
 Compile failed (status 256) errors were:
 /tmp/ghc16123_0/ghc16123_5.s: Assembler messages:
 
 /tmp/ghc16123_0/ghc16123_5.s:26:0:
  Error: can't resolve `.rodata' {.rodata section} - 
 `Main_zdwwork_info$def' {.text section}
 
 /tmp/ghc16123_0/ghc16123_5.s:46:0:
  Error: can't resolve `.rodata' {.rodata section} - `Main_work_info$def' 
 {.text section}
 
 /tmp/ghc16123_0/ghc16123_5.s:66:0:
  Error: can't resolve `.rodata' {.rodata section} - 
 `Main_main1_info$def' {.text section}
 
 /tmp/ghc16123_0/ghc16123_5.s:86:0:
  Error: can't resolve `.rodata' {.rodata section} - `Main_main_info$def' 
 {.text section}
 
 /tmp/ghc16123_0/ghc16123_5.s:106:0:
  Error: can't resolve `.rodata' {.rodata section} - 
 `Main_main2_info$def' {.text section}
 
 /tmp/ghc16123_0/ghc16123_5.s:126:0:
  Error: can't resolve `.rodata' {.rodata section} - 
 `ZCMain_main_info$def' {.text section}
 
 *** unexpected failure for T5681(optllvm)
 
 https://s3.amazonaws.com/archive.travis-ci.org/jobs/42557559/log.txt
 
 Any ideas?

 Is it possible that this is due the llvm version used? Do we support 3.4
 in GHC HEAD?

Using LLVM tools
   llc   : /usr/local/clang-3.4/bin/llc
   opt   : /usr/local/clang-3.4/bin/opt

 (http://smart-cactus.org/~ben/posts/2014-11-28-state-of-llvm-backend.html
 does not talk about GHC HEAD explicitly. Should I look at the 7.10
 row? Does that mean that 3.4 is not supported? Shouldn’t the build
 system, or at least the compiler, fail harder and more helpfully in
 this case?)

LLVM 3.4 appears to have an unfortunate behavior whereby it will lose track
of which section symbols with Internal linkage belong.  I haven't had a
chance to delve into this too deeply, however given that both 3.3 and
3.5 behave as expected I'm pretty sure this a bug. There are a few
options here,

  a. Mark the `$def` symbols as ExternallyVisible, working around the
  issue at the expense of exposing internal symbols to the outside
  world.

  b. Mark LLVM 3.4 as unsupported

At the moment I'm leaning towards (b) since I haven't had a chance to
think through the implications of (a); if nothing else I suspect this
wouldn't help the DLL symbol table size issues on Windows. Giving up on
LLVM 3.4 might be unfortunate for a good number of users, however.

Ultimately this underlines the need to package LLVM with GHC.

Cheers,

- Ben




pgpkUijPcDKUI.pgp
Description: PGP signature
___
ghc-devs mailing list
ghc-devs@haskell.org
http://www.haskell.org/mailman/listinfo/ghc-devs


Re: performance regressions

2014-12-15 Thread Richard Eisenberg
I've made progress, but still need some help.

It turns out that a monadic combinator (that I wrote) is mostly responsible:

 zipWithAndUnzipM :: Monad m
  = (a - b - m (c, d)) - [a] - [b] - m ([c], [d])
 zipWithAndUnzipM f (x:xs) (y:ys)
   = do { (c, d) - f x y
; (cs, ds) - zipWithAndUnzipM f xs ys
; return (c:cs, d:ds) }
 zipWithAndUnzipM _ _ _ = return ([], [])
 

Using this combinator instead of writing the algorithm directly cost me 30% 
allocation overhead!

Can anyone tell me: why? Have I made some horrible mistake in the 
implementation? 
And, relatedly: how can I fix this? I want to learn from this experience how to 
avoid this problem next time...

Unfortunately, my commit causes 50% overhead, not 30%, so I'm not out of the 
woods yet. Hopefully, another 20% of good news tomorrow.

Thanks!
Richard

On Dec 15, 2014, at 11:33 AM, Ben Gamari b...@smart-cactus.org wrote:

 Joachim Breitner nome...@debian.org writes:
 
 Hi,
 
 Am Montag, den 15.12.2014, 10:58 -0500 schrieb Ben Gamari:
 - Travis has not picked up on these errors.
 
 unfortunately, travis is slighly less useful since a few weeks due to
 T5681 failing (possibly due to the use of LLVM-3.4), but I’m still
 waiting for an reply on that issue.
 
 You aren't looking for a response from me on this, are you? I just
 checked and I don't seem to have any outstanding messages from you but
 it's entirely possible I overlooked something.
 
 this is independent of our arm issues, and I think a tad older; I did
 not direct it to anyone specific.
 
 But I guess you are likely a person that can tell what’s wrong here:
 
 Am Sonntag, den 30.11.2014, 20:01 +0100 schrieb Joachim Breitner:
 Compile failed (status 256) errors were:
 /tmp/ghc16123_0/ghc16123_5.s: Assembler messages:
 
 /tmp/ghc16123_0/ghc16123_5.s:26:0:
 Error: can't resolve `.rodata' {.rodata section} - 
 `Main_zdwwork_info$def' {.text section}
 
 /tmp/ghc16123_0/ghc16123_5.s:46:0:
 Error: can't resolve `.rodata' {.rodata section} - `Main_work_info$def' 
 {.text section}
 
 /tmp/ghc16123_0/ghc16123_5.s:66:0:
 Error: can't resolve `.rodata' {.rodata section} - 
 `Main_main1_info$def' {.text section}
 
 /tmp/ghc16123_0/ghc16123_5.s:86:0:
 Error: can't resolve `.rodata' {.rodata section} - `Main_main_info$def' 
 {.text section}
 
 /tmp/ghc16123_0/ghc16123_5.s:106:0:
 Error: can't resolve `.rodata' {.rodata section} - 
 `Main_main2_info$def' {.text section}
 
 /tmp/ghc16123_0/ghc16123_5.s:126:0:
 Error: can't resolve `.rodata' {.rodata section} - 
 `ZCMain_main_info$def' {.text section}
 
 *** unexpected failure for T5681(optllvm)
 
 https://s3.amazonaws.com/archive.travis-ci.org/jobs/42557559/log.txt
 
 Any ideas?
 
 Is it possible that this is due the llvm version used? Do we support 3.4
 in GHC HEAD?
 
   Using LLVM tools
  llc   : /usr/local/clang-3.4/bin/llc
  opt   : /usr/local/clang-3.4/bin/opt
 
 (http://smart-cactus.org/~ben/posts/2014-11-28-state-of-llvm-backend.html
 does not talk about GHC HEAD explicitly. Should I look at the 7.10
 row? Does that mean that 3.4 is not supported? Shouldn’t the build
 system, or at least the compiler, fail harder and more helpfully in
 this case?)
 
 LLVM 3.4 appears to have an unfortunate behavior whereby it will lose track
 of which section symbols with Internal linkage belong.  I haven't had a
 chance to delve into this too deeply, however given that both 3.3 and
 3.5 behave as expected I'm pretty sure this a bug. There are a few
 options here,
 
  a. Mark the `$def` symbols as ExternallyVisible, working around the
  issue at the expense of exposing internal symbols to the outside
  world.
 
  b. Mark LLVM 3.4 as unsupported
 
 At the moment I'm leaning towards (b) since I haven't had a chance to
 think through the implications of (a); if nothing else I suspect this
 wouldn't help the DLL symbol table size issues on Windows. Giving up on
 LLVM 3.4 might be unfortunate for a good number of users, however.
 
 Ultimately this underlines the need to package LLVM with GHC.
 
 Cheers,
 
 - Ben
 
 
 ___
 ghc-devs mailing list
 ghc-devs@haskell.org
 http://www.haskell.org/mailman/listinfo/ghc-devs

___
ghc-devs mailing list
ghc-devs@haskell.org
http://www.haskell.org/mailman/listinfo/ghc-devs


Re: performance regressions

2014-12-14 Thread Herbert Valerio Riedel
Hello Richard,

Can you please push the fix asap to master? This performance failures
are causing distracting false alarms (in terms of validation failures)
on each pushed commit as well as submitted code-revisions.

Thanks,
  HVR
  
On 2014-12-13 at 16:55:40 +0100, Richard Eisenberg wrote:
 Fixed, hopefully!

 On Dec 13, 2014, at 10:03 AM, Richard Eisenberg e...@cis.upenn.edu wrote:

 I think I've fixed this. I've pushed the fix to wip/rae, and waiting for 
 validation results before pushing to master.
 
 My hunch below was right -- it was the change to matchFam, which
 essentially evaluated type-level functions more strictly. I've now
 made it lazier again. I'd like to better understand the tradeoff
 here, and to see if there's a principled sweet spot. But that will
 happen in a few days.
 
 Expect a push to master soon.
 
 Again, sorry for the bother.

___
ghc-devs mailing list
ghc-devs@haskell.org
http://www.haskell.org/mailman/listinfo/ghc-devs


Re: performance regressions

2014-12-14 Thread Joachim Breitner
Hi,

Am Samstag, den 13.12.2014, 10:55 -0500 schrieb Richard Eisenberg:
 Fixed, hopefully!

Mitigated, but still a regression:

http://ghcspeed-nomeata.rhcloud.com/timeline/?exe=2base=2%2B68ben=tests%2Falloc%2FT9872aenv=1revs=50equid=on#

Is that now a level that we’ll have to live with, or is it still
unexpectedly high?

Greetings,
Joachim

-- 
Joachim “nomeata” Breitner
  m...@joachim-breitner.de • http://www.joachim-breitner.de/
  Jabber: nome...@joachim-breitner.de  • GPG-Key: 0xF0FBF51F
  Debian Developer: nome...@debian.org



signature.asc
Description: This is a digitally signed message part
___
ghc-devs mailing list
ghc-devs@haskell.org
http://www.haskell.org/mailman/listinfo/ghc-devs


Re: performance regressions

2014-12-14 Thread Richard Eisenberg
I pushed my supposed fix yesterday morning, as I emailed out the Fixed, 
hopefully note.

Of course, I now see that it wasn't a full fix.

This is all most assuredly my fault. However, I also feel that elements of the 
infrastructure failed me somewhat, making this error easier for me to commit:
- Travis has not picked up on these errors.
- Harbormaster has seemed unreliable, finding spurious compile-time errors 
sometimes, and sometimes just failing to test code that it sees. For example, 
when I pushed Diff 1901 to D546, Harbormaster never even attempted. Also, when 
I pushed my fix, commit 3ec9391711, Harbormaster also skipped, as viewable 
here: https://phabricator.haskell.org/diffusion/GHC/

So, after pushing yesterday morning, I didn't get any email from Harbormaster 
saying that it failed, so I thought my fix was indeed a fix. Because my local 
build was a devel2 build (and that I had only about 20 minutes to work), I was 
unable to test locally -- all I could tell is that my fix lowered the numbers 
(as verified by ghcspeed).

Having a weekend full of plans, there wasn't really any opportunity for me to 
do the work necessary to figure out what's going on. It will be first on my 
docket tomorrow.

I suppose one lesson here is that I shouldn't push anything at all non-trivial 
on a Friday afternoon. But I also hope we can either improve the infrastructure 
(of course, it's *much* better than it was months ago!) or have realistic 
expectations of what we can expect from the infrastructure (e.g., trust 
Harbormaster/Travis when seeking feedback, but always validate locally before 
actually pushing to master).

More tomorrow,
Richard

On Dec 14, 2014, at 3:47 PM, Joachim Breitner m...@joachim-breitner.de wrote:

 Hi,
 
 Am Samstag, den 13.12.2014, 10:55 -0500 schrieb Richard Eisenberg:
 Fixed, hopefully!
 
 Mitigated, but still a regression:
 
 http://ghcspeed-nomeata.rhcloud.com/timeline/?exe=2base=2%2B68ben=tests%2Falloc%2FT9872aenv=1revs=50equid=on#
 
 Is that now a level that we’ll have to live with, or is it still
 unexpectedly high?
 
 Greetings,
 Joachim
 
 -- 
 Joachim “nomeata” Breitner
  m...@joachim-breitner.de • http://www.joachim-breitner.de/
  Jabber: nome...@joachim-breitner.de  • GPG-Key: 0xF0FBF51F
  Debian Developer: nome...@debian.org
 

___
ghc-devs mailing list
ghc-devs@haskell.org
http://www.haskell.org/mailman/listinfo/ghc-devs


Re: performance regressions

2014-12-13 Thread Joachim Breitner
Hi,


Am Freitag, den 12.12.2014, 21:51 -0500 schrieb Richard Eisenberg:
 
 Phab has shown up some performance regressions in my recent commits.
 See https://phabricator.haskell.org/harbormaster/build/2607/. The
 failures except for haddock.base are new, and evidently my fault. They
 didn't show up on Travis. Will look into it shortly, but I doubt over
 the weekend.


ghcspeed also observes this:
http://ghcspeed-nomeata.rhcloud.com/changes/?rev=7256213843b80d75a86f033be77516a62d56044aexe=2env=johan%27s%20buildbot

Especially the T9872 benchmarks have a huge increase in allocations. But
you seem to be aware of this, so that’s fine.

Greetings,
Joachim

-- 
Joachim “nomeata” Breitner
  m...@joachim-breitner.de • http://www.joachim-breitner.de/
  Jabber: nome...@joachim-breitner.de  • GPG-Key: 0xF0FBF51F
  Debian Developer: nome...@debian.org



signature.asc
Description: This is a digitally signed message part
___
ghc-devs mailing list
ghc-devs@haskell.org
http://www.haskell.org/mailman/listinfo/ghc-devs


Re: performance regressions

2014-12-13 Thread Richard Eisenberg
I think I've fixed this. I've pushed the fix to wip/rae, and waiting for 
validation results before pushing to master.

My hunch below was right -- it was the change to matchFam, which essentially 
evaluated type-level functions more strictly. I've now made it lazier again. 
I'd like to better understand the tradeoff here, and to see if there's a 
principled sweet spot. But that will happen in a few days.

Expect a push to master soon.

Again, sorry for the bother.

Richard

On Dec 13, 2014, at 8:32 AM, Joachim Breitner m...@joachim-breitner.de wrote:

 Hi,
 
 
 Am Freitag, den 12.12.2014, 21:51 -0500 schrieb Richard Eisenberg:
 
 Phab has shown up some performance regressions in my recent commits.
 See https://phabricator.haskell.org/harbormaster/build/2607/. The
 failures except for haddock.base are new, and evidently my fault. They
 didn't show up on Travis. Will look into it shortly, but I doubt over
 the weekend.
 
 
 ghcspeed also observes this:
 http://ghcspeed-nomeata.rhcloud.com/changes/?rev=7256213843b80d75a86f033be77516a62d56044aexe=2env=johan%27s%20buildbot
 
 Especially the T9872 benchmarks have a huge increase in allocations. But
 you seem to be aware of this, so that’s fine.
 
 Greetings,
 Joachim
 
 -- 
 Joachim “nomeata” Breitner
  m...@joachim-breitner.de • http://www.joachim-breitner.de/
  Jabber: nome...@joachim-breitner.de  • GPG-Key: 0xF0FBF51F
  Debian Developer: nome...@debian.org
 
 ___
 ghc-devs mailing list
 ghc-devs@haskell.org
 http://www.haskell.org/mailman/listinfo/ghc-devs

___
ghc-devs mailing list
ghc-devs@haskell.org
http://www.haskell.org/mailman/listinfo/ghc-devs


Re: performance regressions

2014-12-13 Thread Richard Eisenberg
Fixed, hopefully!

On Dec 13, 2014, at 10:03 AM, Richard Eisenberg e...@cis.upenn.edu wrote:

 I think I've fixed this. I've pushed the fix to wip/rae, and waiting for 
 validation results before pushing to master.
 
 My hunch below was right -- it was the change to matchFam, which essentially 
 evaluated type-level functions more strictly. I've now made it lazier again. 
 I'd like to better understand the tradeoff here, and to see if there's a 
 principled sweet spot. But that will happen in a few days.
 
 Expect a push to master soon.
 
 Again, sorry for the bother.
 
 Richard
 
 On Dec 13, 2014, at 8:32 AM, Joachim Breitner m...@joachim-breitner.de 
 wrote:
 
 Hi,
 
 
 Am Freitag, den 12.12.2014, 21:51 -0500 schrieb Richard Eisenberg:
 
 Phab has shown up some performance regressions in my recent commits.
 See https://phabricator.haskell.org/harbormaster/build/2607/. The
 failures except for haddock.base are new, and evidently my fault. They
 didn't show up on Travis. Will look into it shortly, but I doubt over
 the weekend.
 
 
 ghcspeed also observes this:
 http://ghcspeed-nomeata.rhcloud.com/changes/?rev=7256213843b80d75a86f033be77516a62d56044aexe=2env=johan%27s%20buildbot
 
 Especially the T9872 benchmarks have a huge increase in allocations. But
 you seem to be aware of this, so that’s fine.
 
 Greetings,
 Joachim
 
 -- 
 Joachim “nomeata” Breitner
 m...@joachim-breitner.de • http://www.joachim-breitner.de/
 Jabber: nome...@joachim-breitner.de  • GPG-Key: 0xF0FBF51F
 Debian Developer: nome...@debian.org
 
 ___
 ghc-devs mailing list
 ghc-devs@haskell.org
 http://www.haskell.org/mailman/listinfo/ghc-devs
 
 ___
 ghc-devs mailing list
 ghc-devs@haskell.org
 http://www.haskell.org/mailman/listinfo/ghc-devs
 

___
ghc-devs mailing list
ghc-devs@haskell.org
http://www.haskell.org/mailman/listinfo/ghc-devs