Re: Re[2]: Compiler optimizations questions for ghc 6.10...

2009-02-21 Thread Bertram Felgenhauer
Krasimir Angelov wrote:
 Well I actually did, almost. I added this function:
 
 quotX :: Int - Int - Int
 a `quotX` b
| b == 0 = error divZeroError
| b == (-1)  a == minBound = error overflowError
| otherwise  =  a `quotInt` b
 
 It does the right thing. However to be sure that this doesn't
 interfere with some other GHC magic the real quot function have to be
 changed and tested. I haven't build GHC from source for 2-3 years now
 and I don't have the time to do it just to test whether this works.

It works, and it has the desired effect. It's not always a win though;
the nofib prime sieve benchmark suffers.

For a patch, see
  http://int-e.home.tlink.de/haskell/ghc-libraries-base-tune-division.patch

Nofib results extract:


Program   SizeAllocs   Runtime   Elapsed

   fish  -0.7% -5.3%  0.05  0.04
 primes  -0.0%+28.5%+25.6%+25.5%
   wheel-sieve2  -0.0% -0.3%-17.9%-18.6%

Min  -0.9% -5.3%-17.9%-18.6%
Max  +0.1%+28.5%+25.6%+25.5%
 Geometric Mean  -0.2% +0.2% -0.0% +0.2%

'primes' is an outlier - the other slowdowns are below 3%

What happens in 'primes' is that 'mod' no longer gets inlined;
apparently it now looks bigger to the compiler than before.

Using -funfolding-use-threshold=10 brings the benchmark back to its
original speed, despite the extra comparison before doing the
division.

In 'wheel-sieve2', the first different optimization choice I see is
again a 'mod' that is no longer inlined; this leads to a change in other
inlining choices that result in a speedup for reasons that I have not
investigated.

Bertram
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Re[2]: Compiler optimizations questions for ghc 6.10...

2009-02-21 Thread Krasimir Angelov
How mod is affected by the change in quot? Currently mod is defined as:

a `mod` b
 | b == 0 = divZeroError
 | a == minBound  b == (-1) = overflowError
 | otherwise  =  a `modInt` b

and modInt is defined via remInt# which is primitive. Did you change
the definition of mod as well?

I agree that this change in the definition of quot is just a
workaround. The best solution would be to do something clever in the
compiler itself. For now if someone wants fast division then he/she
should use quotInt.



On Sat, Feb 21, 2009 at 9:52 AM, Bertram Felgenhauer
bertram.felgenha...@googlemail.com wrote:
 Krasimir Angelov wrote:
 Well I actually did, almost. I added this function:

 quotX :: Int - Int - Int
 a `quotX` b
| b == 0 = error divZeroError
| b == (-1)  a == minBound = error overflowError
| otherwise  =  a `quotInt` b

 It does the right thing. However to be sure that this doesn't
 interfere with some other GHC magic the real quot function have to be
 changed and tested. I haven't build GHC from source for 2-3 years now
 and I don't have the time to do it just to test whether this works.

 It works, and it has the desired effect. It's not always a win though;
 the nofib prime sieve benchmark suffers.

 For a patch, see
  http://int-e.home.tlink.de/haskell/ghc-libraries-base-tune-division.patch

 Nofib results extract:

 
Program   SizeAllocs   Runtime   Elapsed
 
   fish  -0.7% -5.3%  0.05  0.04
 primes  -0.0%+28.5%+25.6%+25.5%
   wheel-sieve2  -0.0% -0.3%-17.9%-18.6%
 
Min  -0.9% -5.3%-17.9%-18.6%
Max  +0.1%+28.5%+25.6%+25.5%
  Geometric Mean  -0.2% +0.2% -0.0% +0.2%

 'primes' is an outlier - the other slowdowns are below 3%

 What happens in 'primes' is that 'mod' no longer gets inlined;
 apparently it now looks bigger to the compiler than before.

 Using -funfolding-use-threshold=10 brings the benchmark back to its
 original speed, despite the extra comparison before doing the
 division.

 In 'wheel-sieve2', the first different optimization choice I see is
 again a 'mod' that is no longer inlined; this leads to a change in other
 inlining choices that result in a speedup for reasons that I have not
 investigated.

 Bertram
 ___
 Glasgow-haskell-users mailing list
 Glasgow-haskell-users@haskell.org
 http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Compiler optimizations questions for ghc 6.10...

2009-02-20 Thread Krasimir Angelov
I think that what happens is that this function:

a `quot` b
| b == 0 = divZeroError
| a == minBound  b == (-1) = overflowError
| otherwise  =  a `quotInt` b

is expanded to:

a `quot` b =
   if b == 0
 then divZeroError
 else if a == minBound
  then if b == (-1)
   then overflowError
   else a `quotInt` b
  else a `quotInt` b

Then the compiler sees that b is a constant and computes that b == 0
is False and b == (-1) is also False so it could eliminate two If
statements. The result is:

a `quot` b =
   if a == minBound
 then a `quotInt` b
 else a `quotInt` b

and this is exactly what we get. I bet that if the original function was:

a `quot` b
| b == 0 = divZeroError
| b == (-1)  a == minBound = overflowError   -- Note the changed
order here
| otherwise  =  a `quotInt` b

then we would get what we want. I think that it is much more often to
have division where the divisor is known so we will get the best code
in this case.

Regards,
   Krasimir


On Fri, Feb 20, 2009 at 4:00 AM, Tyson Whitehead twhiteh...@gmail.com wrote:
 On February 19, 2009 18:20:33 Krasimir Angelov wrote:
 Oh. I looked at the primops.txt.pp for something suspicious but I
 didn't checked the actual implementation of quot. I thought that quot
 calls quotInt# directly. When I use quotInt in the code I can get the
 real idiv assembly instruction. Still the code generated by GHC is
 strange it doesn't throw any exception actually. It just evaluates the
 same expression but with the constant maxBound.

 On Thu, Feb 19, 2009 at 11:19 PM, Max Bolingbroke
 a `quot` b
 
  | b == 0 = divZeroError
  | a == minBound  b == (-1) = overflowError
  | otherwise  =  a `quotInt` b

 I checked the quot one like you said.  Kind of strange all right.

 It looks like it is performing the a == minBound check (where a = maxBound-x)
 despite the b == (-1) check (where b = 10) being optimized away.

 Cheers!  -Tyson

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re[2]: Compiler optimizations questions for ghc 6.10...

2009-02-20 Thread Bulat Ziganshin
Hello Krasimir,

Friday, February 20, 2009, 11:00:30 AM, you wrote:

 and this is exactly what we get. I bet that if the original function was:

well, you check this yourself! :)


-- 
Best regards,
 Bulatmailto:bulat.zigans...@gmail.com

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Re[2]: Compiler optimizations questions for ghc 6.10...

2009-02-20 Thread Krasimir Angelov
Well I actually did, almost. I added this function:

quotX :: Int - Int - Int
a `quotX` b
   | b == 0 = error divZeroError
   | b == (-1)  a == minBound = error overflowError
   | otherwise  =  a `quotInt` b

It does the right thing. However to be sure that this doesn't
interfere with some other GHC magic the real quot function have to be
changed and tested. I haven't build GHC from source for 2-3 years now
and I don't have the time to do it just to test whether this works.

Regards,
  Krasimir


On Fri, Feb 20, 2009 at 9:22 AM, Bulat Ziganshin
bulat.zigans...@gmail.com wrote:
 Hello Krasimir,

 Friday, February 20, 2009, 11:00:30 AM, you wrote:

 and this is exactly what we get. I bet that if the original function was:

 well, you check this yourself! :)


 --
 Best regards,
  Bulatmailto:bulat.zigans...@gmail.com


___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Compiler optimizations questions for ghc 6.10...

2009-02-19 Thread Krasimir Angelov
I was surprised to see this case expression:

case GHC.Prim.-# 9223372036854775807 ipv_s1bD
of wild2_a1xi [ALWAYS Just L] {

What is the purpose to compare the value with maxBound before the
division? The case expression doesn't disappear even if I use quot
instead of div.

Krasimir



2009/2/18 Tyson Whitehead twhiteh...@gmail.com:
 digit_s1lk =
  \ (x_aqR [ALWAYS Just U(L)] :: GHC.Types.Int) -
case x_aqR
of x_XsQ [ALWAYS Just A] { GHC.Types.I# ipv_s1bD [ALWAYS Just L] -
let {
  lvl_s1my [ALWAYS Just D(T)] :: GHC.Types.Int
  [Str: DmdType]
  lvl_s1my =
case GHC.Prim.-# 9223372036854775807 ipv_s1bD
of wild2_a1xi [ALWAYS Just L] {
  __DEFAULT -
case GHC.Base.divInt# wild2_a1xi 10
of wild21_a1xj [ALWAYS Just L] { __DEFAULT -
GHC.Types.I# wild21_a1xj
};
  (-9223372036854775808) -
case GHC.Base.divInt# (-9223372036854775808) 10
of wild21_a1xl [ALWAYS Just L] { __DEFAULT -
GHC.Types.I# wild21_a1xl
}
} } in
(\ (eta_X1sU [ALWAYS Just L] :: GHC.Types.Int)
   (eta_s1G7 [ALWAYS Just U(L)] :: GHC.Types.Int) -
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Compiler optimizations questions for ghc 6.10...

2009-02-19 Thread Max Bolingbroke
2009/2/19 Krasimir Angelov kr.ange...@gmail.com:
 I was surprised to see this case expression:

case GHC.Prim.-# 9223372036854775807 ipv_s1bD
of wild2_a1xi [ALWAYS Just L] {

 What is the purpose to compare the value with maxBound before the
 division? The case expression doesn't disappear even if I use quot
 instead of div.

Have a look at this snippet of the base library, file GHC/Real.lhs:

a `quot` b
 | b == 0 = divZeroError
 | a == minBound  b == (-1) = overflowError
 | otherwise  =  a `quotInt` b

quotInt is defined in GHC/Base.lhs as:

(I# x) `quotInt`  (I# y) = I# (x `quotInt#` y)

And quotInt# is a primitive, which I guess is implemented via machine
division (though I don't work on the codegen stuff at all) - so your
offending case must come from those tests in GHC/Real.lhs.

In general, if you want to answer questions like this you can usually
find the answer by looking at the base code: it's all in Haskell, so
very readable! You can get it online at
http://darcs.haskell.org/libraries/base/GHC/

Cheers,
Max
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Compiler optimizations questions for ghc 6.10...

2009-02-19 Thread Krasimir Angelov
Oh. I looked at the primops.txt.pp for something suspicious but I
didn't checked the actual implementation of quot. I thought that quot
calls quotInt# directly. When I use quotInt in the code I can get the
real idiv assembly instruction. Still the code generated by GHC is
strange it doesn't throw any exception actually. It just evaluates the
same expression but with the constant maxBound.


On Thu, Feb 19, 2009 at 11:19 PM, Max Bolingbroke
batterseapo...@hotmail.com wrote:
 2009/2/19 Krasimir Angelov kr.ange...@gmail.com:
 I was surprised to see this case expression:

case GHC.Prim.-# 9223372036854775807 ipv_s1bD
of wild2_a1xi [ALWAYS Just L] {

 What is the purpose to compare the value with maxBound before the
 division? The case expression doesn't disappear even if I use quot
 instead of div.

 Have a look at this snippet of the base library, file GHC/Real.lhs:

a `quot` b
 | b == 0 = divZeroError
 | a == minBound  b == (-1) = overflowError
 | otherwise  =  a `quotInt` b

 quotInt is defined in GHC/Base.lhs as:

 (I# x) `quotInt`  (I# y) = I# (x `quotInt#` y)

 And quotInt# is a primitive, which I guess is implemented via machine
 division (though I don't work on the codegen stuff at all) - so your
 offending case must come from those tests in GHC/Real.lhs.

 In general, if you want to answer questions like this you can usually
 find the answer by looking at the base code: it's all in Haskell, so
 very readable! You can get it online at
 http://darcs.haskell.org/libraries/base/GHC/

 Cheers,
 Max

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Compiler optimizations questions for ghc 6.10...

2009-02-19 Thread Tyson Whitehead
On February 19, 2009 18:20:33 Krasimir Angelov wrote:
 Oh. I looked at the primops.txt.pp for something suspicious but I
 didn't checked the actual implementation of quot. I thought that quot
 calls quotInt# directly. When I use quotInt in the code I can get the
 real idiv assembly instruction. Still the code generated by GHC is
 strange it doesn't throw any exception actually. It just evaluates the
 same expression but with the constant maxBound.

 On Thu, Feb 19, 2009 at 11:19 PM, Max Bolingbroke
 a `quot` b
 
  | b == 0 = divZeroError
  | a == minBound  b == (-1) = overflowError
  | otherwise  =  a `quotInt` b

I checked the quot one like you said.  Kind of strange all right.

It looks like it is performing the a == minBound check (where a = maxBound-x) 
despite the b == (-1) check (where b = 10) being optimized away.

Cheers!  -Tyson


signature.asc
Description: This is a digitally signed message part.
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Compiler optimizations questions for ghc 6.10...

2009-02-18 Thread Max Bolingbroke
2009/2/18 Tyson Whitehead twhiteh...@gmail.com:
 On February 17, 2009 19:24:44 Max Bolingbroke wrote:
 2009/2/17 Tyson Whitehead twhiteh...@gmail.com:
  (compiled with ghc 6.10 with options -O2 -ddump-simpl)

 That should have been -ddump-stranal instead of -ddump-simpl.

Right. Mystery solved. In case you were wondering, the reason the
other two lambdas in digit are not worker/wrappered is because GHC
only performs strictness analysis on lambdas that occur /immediately/
in the binding. So increasing the arity of digit is CRUCIAL to getting
your program to compile well, as that will cause:
* Strictness analysis on the inner lambdas
* Inlining of lvl_s1mF
* Fewer updateable thunks to be created by callers (as we won't try to
cache the results of a partial application)

  I was wondering why lvl_s1mF is not being inlined into a_s1Gv in the core
  at the bottom of this email as that is the only place it is ever
  referenced.

 The relevant GHC code is SimplUtils.preInlineUnconditionally. It looks
 like it dosen't get inlined for two reasons:
 1) It's not a manifest lambda (it's an application) so inlining inside
 another lambda would change the number of times the FVs of lvl_s1mF
 might occur

 I have to confess my ignorance here as my google fu failed and so I still
 don't know what a manifest lambda is (other than not a application).  : )

Sorry :-). Manifest lambdas are just lambdas that occur manifestly! So e.g.:

f = \x - g x

HAS a manifest lambda, whereas:

f' = g

Does NOT have one. No lambda immediately starts the binding.

Lambdas, manifest or otherwise, are important in this context because
any value with a leading lambda is certainly very cheap and so can be
inlined inside other lambdas - which is what you are trying to achieve
with lvl_s1mF.

(snip code and explanation)

 I just finished adding the Parse q Int type to help with email line wrapping.
 As I alluded to in my original email, if I don't have the Int overflow check
 in digit, it is not chosen as the loop breaker, all the StateT stuff is
 compiled away, and you get a really nice efficient assembler loop (which is
 important because the final FSM has to actually chew through GBs of data).

 The part of the code under the first lambda in digit is as follows (I didn't
 keep the original dump, so the uniques have changed here).  It's the second
 part of the Int overflow bounds check (i.e., y = (maxBound-x)`div`10), and,
 indeed, something you don't want to compute unless the easy check fails.

Yes - GHC wants to share the work of (maxBound-x)`div`10 between
several partial applications of digit. This is usually a good idea,
but in this case it sucks because it's resulted in a massively
increased arity. IMHO GHC should fix this by:
* Marking divInt# INLINE in the base library. This would result in
your code would just containing uses of quotInt#
* Making some operations cheap even if they may fail
(PrimOp.primpOpIsCheap should change). Though this might mean that we
turn non-terminating programs into terminating ones (such operations
get pushed inside lambdas) but this is consistent with our treatment
of lambdas generally.

Actually, your divInt# call wouldn't even usually be floated out to
between two lambdas, but at the time FloatOut runs there is something
in between the \x lambda and the lambdas from the state monad - the
monadic bind operator! So FloatOut feels free to move the computation
for x up even though that = will go away as soon as we run the
simplifier. What a disaster!

For me, making digit INLINE fixes all this, but that's probably
because the context of my code is not the same as yours (I had to
invent parts of a program to bind the bs, q and n variables). For your
immediate problem, I would suggest this bit of GHC witchdoctory:

 where digit :: Int - ParseInt Int Int
   digit !x = do !y - lift get
 ( if y = (maxBound-9)`quot`10 ||
y = ghc_hacks
   then let !y' = y*10+x in (lift
$ put y')  s2
   else throwError integer overflow )
  where {-# INLINE ghc_hacks #-}
ghc_hacks = (maxBound-x)`div`10

With luck, this should make the loop-invariant cheap in GHC's eyes,
preventing it from trying to share it. It works for me - but like I
say things may differ in your program.

 In general, the -ddump-inlinings flag is useful for working out why
 something wasn't inlined - but it wouldn't have helped you in this
 case, because it only dumps information about inlining at call sites,
 and you actually want an unconditional inlining to occur.

 I also tried that, and didn't have much luck with it.  I didn't understand the
 output, which there was 48k lines worth of, and the uniques kept changing
 which made it hard to grep for names from previous -ddump-simpl runs.

Right. I tend to use -ddump-inilnings and -dverbose-core2core (or

Re: Compiler optimizations questions for ghc 6.10...

2009-02-18 Thread Max Bolingbroke
2009/2/18 Max Bolingbroke batterseapo...@hotmail.com:
 Yes - GHC wants to share the work of (maxBound-x)`div`10 between
 several partial applications of digit. This is usually a good idea,
 but in this case it sucks because it's resulted in a massively
 increased arity. IMHO GHC should fix this by:
 * Marking divInt# INLINE in the base library. This would result in
 your code would just containing uses of quotInt#
 * Making some operations cheap even if they may fail
 (PrimOp.primpOpIsCheap should change). Though this might mean that we
 turn non-terminating programs into terminating ones (such operations
 get pushed inside lambdas) but this is consistent with our treatment
 of lambdas generally.

Just to let you know - I've filed a bug (#3034
http://hackage.haskell.org/trac/ghc/ticket/3034) about this issue.
I've taken the liberty of adding you to the CC list so you can keep
track of the problem.

Cheers,
Max
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Compiler optimizations questions for ghc 6.10...

2009-02-18 Thread Tyson Whitehead
On February 18, 2009 04:29:42 Max Bolingbroke wrote:
  The part of the code under the first lambda in digit is as follows (I
  didn't keep the original dump, so the uniques have changed here).  It's
  the second part of the Int overflow bounds check (i.e., y =
  (maxBound-x)`div`10), and, indeed, something you don't want to compute
  unless the easy check fails.

 Yes - GHC wants to share the work of (maxBound-x)`div`10 between
 several partial applications of digit. This is usually a good idea,
 but in this case it sucks because it's resulted in a massively
 increased arity. IMHO GHC should fix this by:
 * Marking divInt# INLINE in the base library. This would result in
 your code would just containing uses of quotInt#
 * Making some operations cheap even if they may fail
 (PrimOp.primpOpIsCheap should change). Though this might mean that we
 turn non-terminating programs into terminating ones (such operations
 get pushed inside lambdas) but this is consistent with our treatment
 of lambdas generally.

Reading your response got me thinking, quot and div, I thought I had changed 
everything to quot because earlier I found that GHC was leaving evaluation of 
constant expressions like maxBound-9`div`10 until runtime if I used div.

Turns out I had missed that one in the second bounds check, and changing it 
from y = (maxBound-x)`div`10 to y = (maxBound-x)`qout`10 resulted in GHC 
just doing the right thing.  I presume this is because quot must be either 
marked INLINE and/or primOpIsCheap like you said div should be.

I am guess this is because quot maps directly onto the x86 idiv instruction 
due to both of them truncating towards zero, while div, with its truncation 
towards negative infinity, does not.  Running with -ddump-asm seems to back 
this up as quotInt# compiles down to an idiv instruction and divInt# to a call 
through base_GHCziBase_divIntzh_info.  Unfortunately for me, I always seem to 
instinctively go with div and mod ahead of quot and rem.

 Actually, your divInt# call wouldn't even usually be floated out to
 between two lambdas, but at the time FloatOut runs there is something
 in between the \x lambda and the lambdas from the state monad - the
 monadic bind operator! So FloatOut feels free to move the computation
 for x up even though that = will go away as soon as we run the
 simplifier. What a disaster!

I would like to try combining the GHC optimizer with a genetic algorithm so 
you could set it to pound away on your core loops for an hour or so to find 
the right sequence of ghc optimization steps to generate the tightest code.

Maybe it could then write out an optimization hint file that regular GHC could 
optionally take in for use alongside the standard rules of thumb to produce 
great code.  All I need is more time and more knowledge.  Maybe someday.  : )

 For me, making digit INLINE fixes all this, but that's probably
 because the context of my code is not the same as yours (I had to
 invent parts of a program to bind the bs, q and n variables).

Sorry about that.  I've put the entire routine at

http://www.sharcnet.ca/~tyson/haskell/Example1.hs

I should have done this in the first place as it allows me to provide the full 
code while also stripping it down in the emails to make them more readable.

 For your
 immediate problem, I would suggest this bit of GHC witchdoctory:

  where digit :: Int - ParseInt Int Int
digit !x = do !y - lift get
  ( if y = (maxBound-9)`quot`10 ||
 y = ghc_hacks
then let !y' = y*10+x in (lift
 $ put y')  s2
else throwError integer overflow )
   where {-# INLINE ghc_hacks #-}
 ghc_hacks = (maxBound-x)`div`10

 With luck, this should make the loop-invariant cheap in GHC's eyes,
 preventing it from trying to share it. It works for me - but like I
 say things may differ in your program.

I tried that as well, and am pleased to report that it works great.  I think I 
actually understand what is going on here and will hopefully be able to wield 
it in the future to my advantage.

Generally I've had very little luck with the INLINE hammer when the function 
is at all complex.  GHC always seems to finds a way to thwart my pathetic 
attempts and punish me with even worse code (like then not inlining the monad 
bind operation in the code resulting from the inlining).  : ) 

 If you are interested in trying my script, it's available via Git at
 http://github.com/batterseapower/scripts/blob/da1f24ba16c27e3994aa66f9db352
ec1102c39d2/ghc-dump-split and is called as ghc-dump-split ghc-dump-file,
 where ghc-dump-file results from redirection of GHC stdout+stderr to a
 file.

I'll grab a copy of that.  : )

 Hope all that helps,

It sure did.  Thanks very much for all your time!

I'm really impressed with GHC and the people hacking on it.  I've actually 

Re: Compiler optimizations questions for ghc 6.10...

2009-02-18 Thread Tyson Whitehead
On February 18, 2009 12:42:02 Tyson Whitehead wrote:
 On February 18, 2009 04:29:42 Max Bolingbroke wrote:
  Yes - GHC wants to share the work of (maxBound-x)`div`10 between
  several partial applications of digit. This is usually a good idea,
  but in this case it sucks because it's resulted in a massively
  increased arity. IMHO GHC should fix this by:
  * Marking divInt# INLINE in the base library. This would result in
  your code would just containing uses of quotInt#
  * Making some operations cheap even if they may fail
  (PrimOp.primpOpIsCheap should change). Though this might mean that we
  turn non-terminating programs into terminating ones (such operations
  get pushed inside lambdas) but this is consistent with our treatment
  of lambdas generally.

 I am guess this is because quot maps directly onto the x86 idiv instruction
 due to both of them truncating towards zero, while div, with its truncation
 towards negative infinity, does not.  Running with -ddump-asm seems to back
 this up as quotInt# compiles down to an idiv instruction and divInt# to a
 call through base_GHCziBase_divIntzh_info.  Unfortunately for me, I always
 seem to instinctively go with div and mod ahead of quot and rem.

I see what you mean about div as it is defined through divMod which is in turn 
defined through quotRem in Prelude.

n `div` d =  q  where (q,_) = divMod n d
divMod n d =  if signum r == negate (signum d) then (q-1, r+d) else qr
where qr@(q,r) = quotRem n d

However, GHC almost seems to be doing its own thing as, as I mentioned above, 
it turns it into a divInt# (which is not in GHC.Prim) in the tidy core, which 
then turns into a call through base_GHCziBase_divIntzh_info in the assembler.

(note that I'm just looking at the source from the haskell.org libraries link)

Cheers!  -Tyson


signature.asc
Description: This is a digitally signed message part.
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Compiler optimizations questions for ghc 6.10...

2009-02-17 Thread Tyson Whitehead
(compiled with ghc 6.10 with options -O2 -ddump-simpl)

I was wondering why lvl_s1mF is not being inlined into a_s1Gv in the core at 
the bottom of this email as that is the only place it is ever referenced.

It also seems the extra levels of indirection are defeating the strictness 
analyzer on eta_s1CN in a_s1Gv as all code branches either directly force it 
or ultimately pass it to digit_s1l3 as in the included branch.

Also, why isn't digit_s1l3 optimized to take its first parameter unboxed?  It 
is strict in its first argument, and grepping the core shows that it is only 
ever used like in lvl_s1mF (i.e., passed things like lvl_s1mG).

Thanks!  -Tyson

PS:  Is there any way to get better control over the loop breaker choice?  For 
a slightly simpler digit function, it is not chosen, and great code is 
produced.  I've tried using INLINE on digit, but that seems to result in the 
monad bind operator not being inlined, which produces even worse code.

.
.
.
letrec {
  lvl_s1mF :: Control.Monad.State.Strict.StateT
GHC.Types.Int
(Control.Monad.State.Strict.StateT
   GHC.Types.Int
   (Control.Monad.Error.ErrorT
  GHC.Base.String Control.Monad.Identity.Identity))
GHC.Types.Int
  [Str: DmdType {s1l3-C(S)}]
  lvl_s1mF = digit_s1l3 lvl_s1mG;
  .
  .
  .
  a_s1Gv :: GHC.Types.Int
- GHC.Types.Int
- Control.Monad.Error.ErrorT
 [GHC.Types.Char]
 Control.Monad.Identity.Identity
 ((GHC.Types.Int, GHC.Types.Int), GHC.Types.Int)
  [Arity 2
   Str: DmdType U(L)L {a1hP-U(TTTL) s1ka-U(L)}]
  a_s1Gv =

  a_s1Gv =
\ (eta_a1px [ALWAYS Just U(L)] :: GHC.Types.Int)
  (eta_s1CN [ALWAYS Just L] :: GHC.Types.Int) -
.
.

  GHC.Bool.True -
(((lvl_s1mF
   `cast` (... ~
  GHC.Types.Int
  - Control.Monad.State.Strict.StateT
   GHC.Types.Int
   (Control.Monad.Error.ErrorT
  GHC.Base.String Control.Monad.Identity.Identity)
   (GHC.Types.Int, GHC.Types.Int)))
(GHC.Types.I# (GHC.Prim.+# x_a1tl 1)))
 `cast` (... ~
GHC.Types.Int
- Control.Monad.Error.ErrorT
 GHC.Base.String
 Control.Monad.Identity.Identity
 ((GHC.Types.Int, GHC.Types.Int), GHC.Types.Int)))
  eta_s1CN

.
.
  .
  .
  .
  digit_s1l3 [ALWAYS LoopBreaker Nothing]
 :: GHC.Types.Int
- Control.Monad.State.Strict.StateT
 GHC.Types.Int
 (Control.Monad.State.Strict.StateT
GHC.Types.Int
(Control.Monad.Error.ErrorT
   GHC.Base.String
   Control.Monad.Identity.Identity))
 GHC.Types.Int
  [Arity 1
   Str: DmdType U(L)]
  digit_s1l3 =
\ (x_aqR [ALWAYS Just U(L)] ::GHC.Types.Int) -
  case x_aqR
  of x_XsO [ALWAYS Just A] { GHC.Types.I# ipv_s1bt [ALWAYS Just L] -
  let {
.
.
.
  } in
  (\ (eta_X1sC [ALWAYS Just L] :: GHC.Types.Int)
 (eta_s1FE [ALWAYS Just U(L)] :: GHC.Types.Int) -
.
.
. )
  `cast` (... ~
 Control.Monad.State.Strict.StateT
   GHC.Types.Int
   (Control.Monad.State.Strict.StateT
  GHC.Types.Int
  (Control.Monad.Error.ErrorT
 GHC.Base.String Control.Monad.Identity.Identity))
   GHC.Types.Int)
  .
  .
  .
}
.
.
.
lvl_s1mG :: GHC.Types.Int
[Str: DmdType m]
lvl_s1mG = GHC.Types.I# 0

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Compiler optimizations questions for ghc 6.10...

2009-02-17 Thread Max Bolingbroke
2009/2/17 Tyson Whitehead twhiteh...@gmail.com:
 (compiled with ghc 6.10 with options -O2 -ddump-simpl)

 I was wondering why lvl_s1mF is not being inlined into a_s1Gv in the core at
 the bottom of this email as that is the only place it is ever referenced.

The relevant GHC code is SimplUtils.preInlineUnconditionally. It looks
like it dosen't get inlined for two reasons:
1) It's not a manifest lambda (it's an application) so inlining inside
another lambda would change the number of times the FVs of lvl_s1mF
might occur
2) I'm not sure if the use-context is considered interesting by GHC
because the application of the function might be hidden by the cast.
Not sure about this one.

So it looks like the problem stems from digit_s1l3 having arity 1
rather than arity 3. You could try and force it into a higher arity
somehow, but I can't say exactly how you might do that without seeing
the rest of the Core (and in particular the part under the first
lambda in the definition of digit).

In general, the -ddump-inlinings flag is useful for working out why
something wasn't inlined - but it wouldn't have helped you in this
case, because it only dumps information about inlining at call sites,
and you actually want an unconditional inlining to occur.

 It also seems the extra levels of indirection are defeating the strictness
 analyzer on eta_s1CN in a_s1Gv as all code branches either directly force it
 or ultimately pass it to digit_s1l3 as in the included branch.

 Also, why isn't digit_s1l3 optimized to take its first parameter unboxed?  It
 is strict in its first argument, and grepping the core shows that it is only
 ever used like in lvl_s1mF (i.e., passed things like lvl_s1mG).

Yeah, that's weird. I don't know the answer to this. Have you actually
got to the worker-wrapper stage at the point you copied this core?

 Thanks!  -Tyson

 PS:  Is there any way to get better control over the loop breaker choice?  For
 a slightly simpler digit function, it is not chosen, and great code is
 produced.  I've tried using INLINE on digit, but that seems to result in the
 monad bind operator not being inlined, which produces even worse code.

GHC won't inline inside functions marked with INLINE pragmas, for
various very good reasons. I don't know how you could change the loop
breaker given the current state of technology, but SPJ has a patch for
GHC in the works that revamps the treatment of INLINE pragmas and
which should (if my understanding of his patch is correct) solve the
issue with = not being inlined within an inlineable digit.

Sorry I can't be more helpful. It's possible I could make some more
concrete suggestions if you posted the complete code somewhere.
Max
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Compiler optimizations questions for ghc 6.10...

2009-02-17 Thread Tyson Whitehead
On February 17, 2009 19:24:44 Max Bolingbroke wrote:
 2009/2/17 Tyson Whitehead twhiteh...@gmail.com:
  It also seems the extra levels of indirection are defeating the
  strictness analyzer on eta_s1CN in a_s1Gv as all code branches either
  directly force it or ultimately pass it to digit_s1l3 as in the included
  branch.
 
  Also, why isn't digit_s1l3 optimized to take its first parameter unboxed?
   It is strict in its first argument, and grepping the core shows that it
  is only ever used like in lvl_s1mF (i.e., passed things like lvl_s1mG).

 Yeah, that's weird. I don't know the answer to this. Have you actually
 got to the worker-wrapper stage at the point you copied this core?

Yes.  You are right.  Contrary to what the top of the email said, I created 
that output with -ddump-stranal, and -dshow-passes indicates that the worker-
wrapper stage comes next.  If I dump it (or just the final core with -ddump-
simpl), digit* is entirely replaced with a first-argument-unboxed $wdigit*.

The inner lambdas (i.e., the second and third arguments) remain boxed.

It seems I should have just used the -ddump-simpl output instead of the -
ddump-stranal output.  I had just got thinking the the -ddump-simpl output did 
not include strictness analysis because I didn't see it on a bunch of the code 
(in retrospect, that was because that code was created by later stages)

Cheers!  -Tyson



signature.asc
Description: This is a digitally signed message part.
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Compiler optimizations questions for ghc 6.10...

2009-02-17 Thread Tyson Whitehead
On February 17, 2009 19:24:44 Max Bolingbroke wrote:
 2009/2/17 Tyson Whitehead twhiteh...@gmail.com:
  (compiled with ghc 6.10 with options -O2 -ddump-simpl)

That should have been -ddump-stranal instead of -ddump-simpl.

  I was wondering why lvl_s1mF is not being inlined into a_s1Gv in the core
  at the bottom of this email as that is the only place it is ever
  referenced.

 The relevant GHC code is SimplUtils.preInlineUnconditionally. It looks
 like it dosen't get inlined for two reasons:
 1) It's not a manifest lambda (it's an application) so inlining inside
 another lambda would change the number of times the FVs of lvl_s1mF
 might occur

I have to confess my ignorance here as my google fu failed and so I still 
don't know what a manifest lambda is (other than not a application).  : )

 2) I'm not sure if the use-context is considered interesting by GHC
 because the application of the function might be hidden by the cast.
 Not sure about this one.

I was wondering about that, which is why I didn't remove all the cast noise.

 So it looks like the problem stems from digit_s1l3 having arity 1
 rather than arity 3. You could try and force it into a higher arity
 somehow, but I can't say exactly how you might do that without seeing
 the rest of the Core (and in particular the part under the first
 lambda in the definition of digit).

The thing is that the inner lambdas come from inlining that StateT monad 
transformers in a StateT q (StateT Int (ErrorT String Identity)) monad (i.e., 
the first one is the q state -- which works out to an Int -- and the second is 
the Int state).  I guess I could explicitly pass them around, but that would 
seem to defeat the purpose of having StateT.

The actual routines under this implement part of a FSM for (hopefully) 
efficiently extracting an Int from a ByteString (or a uvector UArr -- source 
of the Step data type).  The relevant part of the actual code, which is a bit 
hacked up with ! patterns from my attempts to get better code, is as follows.

type ParseInt q a = StateT q (StateT Int (ErrorT String Identity)) a

next :: q - Step q Word8
next i | i==n  = Done
   | otherwise = Yield (bs `BS.unsafeIndex` i) (i+1)

wrap :: Monad m = (Word8 - StateT q m a) - StateT q m a - StateT q m a
wrap yield (done::StateT q m a) = loop
where loop :: StateT q m a
  loop = do q - get
case next q of
  Yield x q' - put q'  yield x
  Skipq' - put q'  loop
  Done   - done
s2 :: ParseInt q Int
s2 = wrap yield done
where yield :: Word8 - StateT q (StateT Int (ErrorT String Identity)) Int
  yield x | x==48 = digit 0
  | x==49 = digit 1
  | x==50 = digit 2
  | x==51 = digit 3
  | x==52 = digit 4
  | x==53 = digit 5
  | x==54 = digit 6
  | x==55 = digit 7
  | x==56 = digit 8
  | x==57 = digit 9
  | otherwise = do !y - lift get
   return y
  where digit :: Int - ParseInt q Int
digit !x = do !y - lift get
  ( if y = (maxBound-9)`quot`10 || y = 
(maxBound-x)`div`10
then let !y' = y*10+x in (lift $ put y') 
 s2
else throwError integer overflow )
  done :: ParseInt q Int
  done= do !y - lift get
   return y

I just finished adding the Parse q Int type to help with email line wrapping.  
As I alluded to in my original email, if I don't have the Int overflow check 
in digit, it is not chosen as the loop breaker, all the StateT stuff is 
compiled away, and you get a really nice efficient assembler loop (which is 
important because the final FSM has to actually chew through GBs of data).

The part of the code under the first lambda in digit is as follows (I didn't 
keep the original dump, so the uniques have changed here).  It's the second 
part of the Int overflow bounds check (i.e., y = (maxBound-x)`div`10), and, 
indeed, something you don't want to compute unless the easy check fails.

digit_s1lk =
  \ (x_aqR [ALWAYS Just U(L)] :: GHC.Types.Int) -
case x_aqR
of x_XsQ [ALWAYS Just A] { GHC.Types.I# ipv_s1bD [ALWAYS Just L] -
let {
  lvl_s1my [ALWAYS Just D(T)] :: GHC.Types.Int
  [Str: DmdType]
  lvl_s1my =
case GHC.Prim.-# 9223372036854775807 ipv_s1bD
of wild2_a1xi [ALWAYS Just L] {
  __DEFAULT -
case GHC.Base.divInt# wild2_a1xi 10
of wild21_a1xj [ALWAYS Just L] { __DEFAULT -
GHC.Types.I# wild21_a1xj
};
  (-9223372036854775808) -
case GHC.Base.divInt# (-9223372036854775808) 10
of wild21_a1xl [ALWAYS Just L] { __DEFAULT -