Re: [Haskell-cafe] GHC optimisations

2007-08-23 Thread Isaac Dupree

Hugh Perkins wrote:

On 8/22/07, Twan van Laarhoven [EMAIL PROTECTED] wrote:

But Double is already quite badly behaved:
  let x = 1e20
  Prelude 1 + (x - x)
  1.0
  Prelude (1 + x) - x
  0.0


E.  Whilst that's understandable and unavoidable, that kindof
rings alarm bells for folds of Doubles in an automatic threading
situation.  Ie, if we split our fold between 4 cores, we might get a
different answer than if we split it between 8 cores?


Yep, Double's (+) isn't associative.

I was trying to come up with ANY common mathematical property that 
Double fulfills and this was what I came up with:


-- for all x::Float.  not (x  x + 1)

I'm not even sure it's true in the presence of very large numbers, 
optimizations and extra precision being added in the CPU.  Note that 
it's not (x = x + 1) because of NaN.


Is Double's (+) commutative? I don't know for sure. Its (==) isn't 
reflexive (is it transitive? probably, at least if there aren't too many 
optimizations, but floating-point transitive equality isn't very useful).


-- Okay... so (+), (*), (==) are probably symmetric too.

Any others?

Isaac
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] GHC optimisations

2007-08-23 Thread Neil Mitchell
Hi

 Its (==) isn't
 reflexive (is it transitive? probably, at least if there aren't too many
 optimizations, but floating-point transitive equality isn't very useful).

It's not even referentially transparent in all cases. a == b may fail
while the double's are in the high precision registers, and then
succeed later on in the program once they are truncated. I think you
have to specify -fexcess-precision with GHC to get this behaviour.

Thanks

Neil
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] GHC optimisations

2007-08-23 Thread Adam Langley
On 8/23/07, Neil Mitchell [EMAIL PROTECTED] wrote:
 It's not even referentially transparent in all cases. a == b may fail
 while the double's are in the high precision registers, and then
 succeed later on in the program once they are truncated. I think you
 have to specify -fexcess-precision with GHC to get this behaviour.

Also, if you tell GCC to use SSE (there are a number of flags, like
-march and -msse, -msse2 etc which can do this). It can stop using the
80-bit x87 registers for floating point and switch to 64-bit SSE
registers. Then you *don't* get this affect, but I'm not sure you can
be sure that you don't ever get it.

Floating point numbers make me sad :(


AGL

-- 
Adam Langley  [EMAIL PROTECTED]
http://www.imperialviolet.org   650-283-9641
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


RE: [Haskell-cafe] GHC optimisations

2007-08-22 Thread Simon Peyton-Jones
| First of all, optimizing mod and div can not be done with PrelRules,
| because they are not primitives, quot and rem are.

Yes, you can do them with PrelRules!  Check out PrelRules.builtinRules.

| Multiplication and division can become shifts:
|
|   {-# RULES
|  
|   -- x * 2^n  --  x `shiftL` n
|   x# *# 2#  forall x#.  x# *# 2# = x# `iShiftL#` 1#
|   2# *# x#  forall x#.  2# *# x# = x# `iShiftL#` 1#
| -- etc.


| A problem with these rules is that you need a whole lot of them. 32 per
| operation (on a 32 bit platform), * 4 operations, * 2 separate versions
| for words and ints = 256.

I think you should be able to a lot better.  For example, to do constant 
folding for +# you might think you needed a lot of rules

1# +# 2# = 3#
1# +# 3# = 4#
etc

But not so!  See PrelRules for how to write one rule that does all of these at 
once.  I think you can do multiply-to-shift in the same way.


The downside of PrelRules is that it's part of the compiler, not in Haskell 
pragmas; that's what makes it more expressive than rules written in source code.

Does that help?

If one of the folk listening to this thread wanted to add a page to the GHC 
Commentary distilling this thread into Wiki material, I'd be happy to check its 
accuracy.
http://hackage.haskell.org/trac/ghc/wiki/Commentary

Simon

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] GHC optimisations

2007-08-22 Thread Neil Mitchell
Hi

 Other rules that could be interesting are:
   forall a b. fromInteger a + fromInteger b = fromInteger (a + b)
   forall a b. fromInteger a * fromInteger b = fromInteger (a * b)

This is wrong, since the class function can do what it wants. Imagine:

instance Num String where
   (+) = (++)
   fromInteger x = show x

1 + 2 :: String

this expression now goes from 12 to 3 by applying this rule.

You need to be incredibly careful if there are any classes floating around.

Thanks

Neil
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] GHC optimisations

2007-08-22 Thread Stefan O'Rear
On Wed, Aug 22, 2007 at 09:04:11AM +0100, Simon Peyton-Jones wrote:
 | First of all, optimizing mod and div can not be done with PrelRules,
 | because they are not primitives, quot and rem are.
 
 Yes, you can do them with PrelRules!  Check out PrelRules.builtinRules.
 
 | Multiplication and division can become shifts:
 |
 |   {-# RULES
 |  
 |   -- x * 2^n  --  x `shiftL` n
 |   x# *# 2#  forall x#.  x# *# 2# = x# `iShiftL#` 1#
 |   2# *# x#  forall x#.  2# *# x# = x# `iShiftL#` 1#
 | -- etc.
 
 
 | A problem with these rules is that you need a whole lot of them. 32 per
 | operation (on a 32 bit platform), * 4 operations, * 2 separate versions
 | for words and ints = 256.
 
 I think you should be able to a lot better.  For example, to do
 constant folding for +# you might think you needed a lot of rules
 
 1# +# 2# = 3#
 1# +# 3# = 4#
 etc
 
 But not so!  See PrelRules for how to write one rule that does all of
 these at once.  I think you can do multiply-to-shift in the same way.
 
 The downside of PrelRules is that it's part of the compiler, not in
 Haskell pragmas; that's what makes it more expressive than rules
 written in source code.

Something I've pondered is adding a more-expressive form of RULES which
works using general pattern matching:

{-# XRULES
 *#-to-shift (*#)
  (CoreLit (CoreInt num)) obj
  | num .. (num - 1) == 0  -
  CoreApp (CoreVar GHC.Prim.iShiftL#)
  [obj, CoreLit (CoreInt (lg2 num))]
  obj (CoreLit (CoreInt num))
  | num .. (num - 1) == 0  -
  CoreApp (CoreVar GHC.Prim.iShiftL#)
  [obj, CoreLit (CoreInt (lg2 num))]
#-}

This would require reusing the TH infrastructure, and (depending on how
much we can abstract) might leak too many details of Core to be
useful; on the other hand it would allow some very interesting domain
optimizations to be done.

Views might be nice here.

Opinions on whether something like this is a good idea?

Stefan


signature.asc
Description: Digital signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


RE: [Haskell-cafe] GHC optimisations

2007-08-22 Thread Simon Peyton-Jones
| Something I've pondered is adding a more-expressive form of RULES which
| works using general pattern matching:

Yes, but it would need the rule-matcher in the Simplifier to be more 
sophisticated.  Have a look in specialise/Rules.lhs.

No need to be so ambitious; just moving towards what you can do in PrelRules 
would be an improvement

Simon

|
| {-# XRULES
|  *#-to-shift (*#)
|   (CoreLit (CoreInt num)) obj
|   | num .. (num - 1) == 0  -
|   CoreApp (CoreVar GHC.Prim.iShiftL#)
|   [obj, CoreLit (CoreInt (lg2 num))]
|   obj (CoreLit (CoreInt num))
|   | num .. (num - 1) == 0  -
|   CoreApp (CoreVar GHC.Prim.iShiftL#)
|   [obj, CoreLit (CoreInt (lg2 num))]
| #-}
|
| This would require reusing the TH infrastructure, and (depending on how
| much we can abstract) might leak too many details of Core to be
| useful; on the other hand it would allow some very interesting domain
| optimizations to be done.
|
| Views might be nice here.
|
| Opinions on whether something like this is a good idea?
|
| Stefan
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] GHC optimisations

2007-08-22 Thread Isaac Dupree

Simon Peyton-Jones wrote:

| Something I've pondered is adding a more-expressive form of RULES which
| works using general pattern matching:

Yes, but it would need the rule-matcher in the Simplifier to be more 
sophisticated.  Have a look in specialise/Rules.lhs.

No need to be so ambitious; just moving towards what you can do in PrelRules 
would be an improvement


Careful about concealing
http://hackage.haskell.org/trac/ghc/ticket/1603 even more than it 
already is! (Not actually, but it would probably be even harder to track 
down and still be a bug in some really obscure cases.  Maybe there's a 
flag to disable PrelRules-optimizations for a run of ghc.)


Isaac
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] GHC optimisations

2007-08-22 Thread Albert Y. C. Lai

Neil Mitchell wrote:

Other rules that could be interesting are:
  forall a b. fromInteger a + fromInteger b = fromInteger (a + b)
  forall a b. fromInteger a * fromInteger b = fromInteger (a * b)


This is wrong, since the class function can do what it wants. Imagine:

instance Num String where
   (+) = (++)
   fromInteger x = show x

1 + 2 :: String

this expression now goes from 12 to 3 by applying this rule.

You need to be incredibly careful if there are any classes floating around.


Do we assume Num instances to obey Num axioms, just like Arrow instances 
to obey Arrow axioms? My impression is that the GHC de-sugaring of the 
proc notation contains an optimizing stage that uses arrow axioms. This 
is a good precedence.


If Num obeys ring axioms, fromInteger is a perfectly fine 
ring-homomorphism. (It's also the first or second homomorphism taught.)


If a and b are large integers and the target ring is small, fromInteger 
(a + b) is likely slower. That is my concern.


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] GHC optimisations

2007-08-22 Thread Neil Mitchell
Hi

 If Num obeys ring axioms, fromInteger is a perfectly fine
 ring-homomorphism. (It's also the first or second homomorphism taught.)

Does Int obey these axioms? I'm thinking that assuming properties
about things such as numbers is very likely to go wrong very quickly.
Monads you might be able to get away with, Numbers you probably can't.

Thanks

Neil
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] GHC optimisations

2007-08-22 Thread Andrew Coppin

Neil Mitchell wrote:
  

If Num obeys ring axioms, fromInteger is a perfectly fine
ring-homomorphism. (It's also the first or second homomorphism taught.)



Does Int obey these axioms? I'm thinking that assuming properties
about things such as numbers is very likely to go wrong very quickly.
Monads you might be able to get away with, Numbers you probably can't.
  


Now, see, you don't find this kind of discussion in other languages. If 
you go talk about Java, people just flame each other about whether we 
should have true MI or not... ;-)


Ah, I *like* it here. :-D

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] GHC optimisations

2007-08-22 Thread Stefan O'Rear
On Wed, Aug 22, 2007 at 06:36:15PM +0100, Neil Mitchell wrote:
 Hi
 
  If Num obeys ring axioms, fromInteger is a perfectly fine
  ring-homomorphism. (It's also the first or second homomorphism taught.)
 
 Does Int obey these axioms? I'm thinking that assuming properties
 about things such as numbers is very likely to go wrong very quickly.
 Monads you might be able to get away with, Numbers you probably can't.

Int does obey the axioms, it's the classical ring ℤ[4294967296].
Double, however, does not:

[EMAIL PROTECTED]:~$ ghci
GHCi, version 6.7.20070712: http://www.haskell.org/ghc/  :? for help
Loading package base ... linking ... done.
Prelude let x = 100 ; y = 1 - x
Prelude fromInteger (x + y) :: Double
1.0
Prelude fromInteger x + fromInteger y :: Double
0.0


Stefan


signature.asc
Description: Digital signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] GHC optimisations

2007-08-22 Thread Twan van Laarhoven

Stefan O'Rear wrote:

On Wed, Aug 22, 2007 at 06:36:15PM +0100, Neil Mitchell wrote:


Hi



If Num obeys ring axioms, fromInteger is a perfectly fine
ring-homomorphism. (It's also the first or second homomorphism taught.)


Does Int obey these axioms? I'm thinking that assuming properties
about things such as numbers is very likely to go wrong very quickly.
Monads you might be able to get away with, Numbers you probably can't.



Int does obey the axioms, it's the classical ring ℤ[4294967296].
Double, however, does not:


But Double is already quite badly behaved:
 let x = 1e20
 Prelude 1 + (x - x)
 1.0
 Prelude (1 + x) - x
 0.0

Using the fromInteger (and fromRational) axioms should only *increase* 
precission, I don't see how that is such a bad thing.


Also, as far as I can see GHC already does this optimizations if the 
type is specialized to Double. Except for the fact that the PrelRules 
rules don't seem to fire, because the constants get floated out.


Twan
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] GHC optimisations

2007-08-22 Thread Rodrigo Queiro
 Using the fromInteger (and fromRational) axioms should only *increase*
 precission, I don't see how that is such a bad thing.

I think it's bad if the behaviour of your program depends on the
optimisation level.

On 22/08/07, Twan van Laarhoven [EMAIL PROTECTED] wrote:
 Stefan O'Rear wrote:
  On Wed, Aug 22, 2007 at 06:36:15PM +0100, Neil Mitchell wrote:
 
 Hi
 
 
 If Num obeys ring axioms, fromInteger is a perfectly fine
 ring-homomorphism. (It's also the first or second homomorphism taught.)
 
 Does Int obey these axioms? I'm thinking that assuming properties
 about things such as numbers is very likely to go wrong very quickly.
 Monads you might be able to get away with, Numbers you probably can't.
 
 
  Int does obey the axioms, it's the classical ring ℤ[4294967296].
  Double, however, does not:

 But Double is already quite badly behaved:
   let x = 1e20
   Prelude 1 + (x - x)
   1.0
   Prelude (1 + x) - x
   0.0

 Using the fromInteger (and fromRational) axioms should only *increase*
 precission, I don't see how that is such a bad thing.

 Also, as far as I can see GHC already does this optimizations if the
 type is specialized to Double. Except for the fact that the PrelRules
 rules don't seem to fire, because the constants get floated out.

 Twan
 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] GHC optimisations

2007-08-22 Thread Derek Elkins
On Thu, 2007-08-23 at 10:17 +1000, Donald Bruce Stewart wrote:
 overdrigzed:
   Using the fromInteger (and fromRational) axioms should only *increase*
   precission, I don't see how that is such a bad thing.
  
  I think it's bad if the behaviour of your program depends on the
  optimisation level.
  
 
 Isn't this how the thread started?
 
 $ gcc t.c -o a
 $ ./a 1
 zsh: segmentation fault (core dumped)  ./a 1
 
 $ gcc -O3 t.c -o b
 $ ./b 1 
 987459712

Don't make me pull out sum.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] GHC optimisations

2007-08-22 Thread Hugh Perkins
On 8/22/07, Twan van Laarhoven [EMAIL PROTECTED] wrote:
 But Double is already quite badly behaved:
   let x = 1e20
   Prelude 1 + (x - x)
   1.0
   Prelude (1 + x) - x
   0.0

E.  Whilst that's understandable and unavoidable, that kindof
rings alarm bells for folds of Doubles in an automatic threading
situation.  Ie, if we split our fold between 4 cores, we might get a
different answer than if we split it between 8 cores?
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] GHC optimisations

2007-08-21 Thread Philip Armstrong

On Mon, Aug 20, 2007 at 09:57:38PM +0100, Simon Peyton-Jones wrote:

GHC does some constant folding, but little by way of strength
reduction, or using shifts instead of multiplication.  It's pretty
easy to add more: it's all done in a single module.  Look at
primOpRules in the module PrelRules.

Patches welcome!  But please also supply test-suite tests that check
the correctness of the rules.


Sucking another example out of comp.lang.functional:

This:

  import System

  f :: Int - Int - Int
  f s n = if n  0 then f (s+n) (n-1) else s

  main = do
[n] - getArgs
putStrLn $ show $ f 0 (read n) 


is 3-4x slower than this:

  #include stdio.h
  #include stdlib.h
  #include assert.h

  int f(int s, int n) { 
return n  0 ? f(s+n, n-1) : s;

  }

  int main(int argc, char *argv[]) { 
assert(argc == 2);

printf(%d\n, f(0, strtol(argv[1],0,0)));
  }

The generated assembler suggests (if I've read it correctly) that gcc
is spotting that it can replace the tail call with a jump in the C
version, but for some reason it can't spot it for the Haskell version
when compiling with -fvia-C (and neither does ghc itself using
-fasm). So the haskell version ends up pushing and popping values on
and off the stack for every call to f, which is a bit sad.

Phil

--
http://www.kantaka.co.uk/ .oOo. public key: http://www.kantaka.co.uk/gpg.txt
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] GHC optimisations

2007-08-21 Thread Donald Bruce Stewart
phil:
 On Mon, Aug 20, 2007 at 09:57:38PM +0100, Simon Peyton-Jones wrote:
 GHC does some constant folding, but little by way of strength
 reduction, or using shifts instead of multiplication.  It's pretty
 easy to add more: it's all done in a single module.  Look at
 primOpRules in the module PrelRules.
 
 Patches welcome!  But please also supply test-suite tests that check
 the correctness of the rules.
 
 Sucking another example out of comp.lang.functional:
 
 This:
 
   import System
 
   f :: Int - Int - Int
   f s n = if n  0 then f (s+n) (n-1) else s
 
   main = do
 [n] - getArgs
 putStrLn $ show $ f 0 (read n) 
 
 is 3-4x slower than this:
 
   #include stdio.h
   #include stdlib.h
   #include assert.h
 
   int f(int s, int n) { 
 return n  0 ? f(s+n, n-1) : s;
   }
 
   int main(int argc, char *argv[]) { 
 assert(argc == 2);
 printf(%d\n, f(0, strtol(argv[1],0,0)));
   }
 
 The generated assembler suggests (if I've read it correctly) that gcc
 is spotting that it can replace the tail call with a jump in the C
 version, but for some reason it can't spot it for the Haskell version
 when compiling with -fvia-C (and neither does ghc itself using
 -fasm). So the haskell version ends up pushing and popping values on
 and off the stack for every call to f, which is a bit sad.
 

That doesn't sound quite right. The C version should get a tail call ,
with gcc -O2, the Haskell version should be a tail call anyway.

Let's see:

C
$ gcc -O t.c -o t 
$ time ./t 10
zsh: segmentation fault (core dumped)  ./t 10
./t 10  0.02s user 0.22s system 5% cpu 4.640 total

Turning on -O2

$ time ./t 10
-243309312
./t 10  1.89s user 0.00s system 97% cpu 1.940 total


And GHC:

$ ghc -O2 A.hs -o A
$ time ./A 10   
   
-243309312
./A 10  3.21s user 0.01s system 97% cpu 3.289 total

So, what, 1.6x slower than gcc -O2
Seems ok without any tuning.

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] GHC optimisations

2007-08-21 Thread Rodrigo Queiro
On my system, the C version runs about 9x faster than the haskell
version (with -O3 and -O2 -fvia-c -optc-O3 respectively). However, GCC
seems to produce about 70 lines of assembly for the main loop,
compared to about 10 from GHC. I suspect the speed difference is the
result of some heavy optimisation by GCC, which would need to be
hand-tuned for GHC. (I would be interested to see what this would be.
Unfortunately I don't know x86 assembly well enough to understand the
GCC output.)

On 21/08/07, Donald Bruce Stewart [EMAIL PROTECTED] wrote:
 phil:
  On Mon, Aug 20, 2007 at 09:57:38PM +0100, Simon Peyton-Jones wrote:
  GHC does some constant folding, but little by way of strength
  reduction, or using shifts instead of multiplication.  It's pretty
  easy to add more: it's all done in a single module.  Look at
  primOpRules in the module PrelRules.
  
  Patches welcome!  But please also supply test-suite tests that check
  the correctness of the rules.
 
  Sucking another example out of comp.lang.functional:
 
  This:
 
import System
 
f :: Int - Int - Int
f s n = if n  0 then f (s+n) (n-1) else s
 
main = do
  [n] - getArgs
  putStrLn $ show $ f 0 (read n)
 
  is 3-4x slower than this:
 
#include stdio.h
#include stdlib.h
#include assert.h
 
int f(int s, int n) {
  return n  0 ? f(s+n, n-1) : s;
}
 
int main(int argc, char *argv[]) {
  assert(argc == 2);
  printf(%d\n, f(0, strtol(argv[1],0,0)));
}
 
  The generated assembler suggests (if I've read it correctly) that gcc
  is spotting that it can replace the tail call with a jump in the C
  version, but for some reason it can't spot it for the Haskell version
  when compiling with -fvia-C (and neither does ghc itself using
  -fasm). So the haskell version ends up pushing and popping values on
  and off the stack for every call to f, which is a bit sad.
 

 That doesn't sound quite right. The C version should get a tail call ,
 with gcc -O2, the Haskell version should be a tail call anyway.

 Let's see:

 C
 $ gcc -O t.c -o t
 $ time ./t 10
 zsh: segmentation fault (core dumped)  ./t 10
 ./t 10  0.02s user 0.22s system 5% cpu 4.640 total

 Turning on -O2

 $ time ./t 10
 -243309312
 ./t 10  1.89s user 0.00s system 97% cpu 1.940 total


 And GHC:

 $ ghc -O2 A.hs -o A
 $ time ./A 10
 -243309312
 ./A 10  3.21s user 0.01s system 97% cpu 3.289 total

 So, what, 1.6x slower than gcc -O2
 Seems ok without any tuning.

 -- Don
 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] GHC optimisations

2007-08-21 Thread Stefan O'Rear
On Tue, Aug 21, 2007 at 01:14:20PM +0100, Rodrigo Queiro wrote:
 On my system, the C version runs about 9x faster than the haskell
 version (with -O3 and -O2 -fvia-c -optc-O3 respectively). However, GCC
 seems to produce about 70 lines of assembly for the main loop,
 compared to about 10 from GHC. I suspect the speed difference is the
 result of some heavy optimisation by GCC, which would need to be
 hand-tuned for GHC. (I would be interested to see what this would be.
 Unfortunately I don't know x86 assembly well enough to understand the
 GCC output.)

The fundamental problem is that GHC doesn't have enough registers to to
a good job with Haskell.  Normal Haskell code  makes extensive use of
the GHC stack for function calls, the C stack for signal handlers, the
capability base pointer for thread state, and the heap for everything
else.  Which doesn't really leave us in a good state for optimizing.  In
particular, x86 ghc ALWAYS passes parameters on the stack, even for tail
calls.  I didn't actually bother to check, but I'm pretty sure that's
what the OP was noticing - if you look carefully it's not actually
pushing or popping anything, just using stack memory.

Situations are far better on x86_64 (16 registers) and ppc (32
registers).  There is some work being done on the backend to improve
this (in particular, a new and much better register allocator and a
parameter-aware Cmm system).

Stefan


signature.asc
Description: Digital signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] GHC optimisations

2007-08-21 Thread Philip Armstrong

On Tue, Aug 21, 2007 at 05:25:49AM -0700, Stefan O'Rear wrote:

On Tue, Aug 21, 2007 at 01:14:20PM +0100, Rodrigo Queiro wrote:

On my system, the C version runs about 9x faster than the haskell
version (with -O3 and -O2 -fvia-c -optc-O3 respectively). However, GCC
seems to produce about 70 lines of assembly for the main loop,
compared to about 10 from GHC. I suspect the speed difference is the
result of some heavy optimisation by GCC, which would need to be
hand-tuned for GHC. (I would be interested to see what this would be.
Unfortunately I don't know x86 assembly well enough to understand the
GCC output.)


GCC is carrying out two major optimisations that ghc is missing here:
replacing the tail call with a jump directly into the function body
(having stuffed the correct arguments into the appropriate registers)
and unrolling the loop. That's pretty much it. Neither are what I'd
call 'heavy' optimisations.


The fundamental problem is that GHC doesn't have enough registers to to
a good job with Haskell.  Normal Haskell code  makes extensive use of
the GHC stack for function calls, the C stack for signal handlers, the
capability base pointer for thread state, and the heap for everything
else.  Which doesn't really leave us in a good state for optimizing.  In
particular, x86 ghc ALWAYS passes parameters on the stack, even for tail
calls.  I didn't actually bother to check, but I'm pretty sure that's
what the OP was noticing - if you look carefully it's not actually
pushing or popping anything, just using stack memory.


Yes, absolutely.


Situations are far better on x86_64 (16 registers) and ppc (32
registers).  There is some work being done on the backend to improve
this (in particular, a new and much better register allocator and a
parameter-aware Cmm system).


fires up ppc box

Ouch. That's even worse:

$ ./sum 1

C version: 0.16s
Haskell  : 1.40s

Looking at the generated assembler, the ppc version has exactly the
same problem that the x86 version does. It carries out the
calculation, the stores the result in some memory locations and calls f
again so that the preamble to f can pull those same results out of the
memory locations in order to put them back into the same registers
again!

(I'm using ghc 6.6.1 on Debian unstable btw for anyone following along.)

cheers, Phil

--
http://www.kantaka.co.uk/ .oOo. public key: http://www.kantaka.co.uk/gpg.txt
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] GHC optimisations

2007-08-21 Thread Philip Armstrong

Don's reply didn't reach me for some reason, but pulling it out of the
previous response:


On 21/08/07, Donald Bruce Stewart [EMAIL PROTECTED] wrote:

phil:
 The generated assembler suggests (if I've read it correctly) that gcc
 is spotting that it can replace the tail call with a jump in the C
 version, but for some reason it can't spot it for the Haskell version
 when compiling with -fvia-C (and neither does ghc itself using
 -fasm). So the haskell version ends up pushing and popping values on
 and off the stack for every call to f, which is a bit sad.


That doesn't sound quite right. The C version should get a tail call ,
with gcc -O2, the Haskell version should be a tail call anyway.


Just to be clear; the Haskell version is a tail call, but it's pushing
the values to and from memory (well, cache really of course) for every
call to f, which is killing the performance.


Let's see:

C
$ gcc -O t.c -o t
$ time ./t 10
zsh: segmentation fault (core dumped)  ./t 10
./t 10  0.02s user 0.22s system 5% cpu 4.640 total

Turning on -O2

$ time ./t 10
-243309312
./t 10  1.89s user 0.00s system 97% cpu 1.940 total


-O3 does better thanks to the loop unrolling, see timings bellow.


And GHC:

$ ghc -O2 A.hs -o A
$ time ./A 10
-243309312
./A 10  3.21s user 0.01s system 97% cpu 3.289 total

So, what, 1.6x slower than gcc -O2
Seems ok without any tuning.


You're getting much better timings than I am!

$ time -p ./sum-hs 10
-243309312
real 3.75
user 3.70
$ time -p ./sum-c-O2 10
-243309312
real 1.40
user 1.35
$ time -p ./sum-c-O3 10
-243309312
real 1.21
user 1.18

(My box has a AMD Athlon64 3000+ CPU fwiw, but the powerpc version is
even worse when compared to it's respective C binary!)

Phil

--
http://www.kantaka.co.uk/ .oOo. public key: http://www.kantaka.co.uk/gpg.txt
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] GHC optimisations

2007-08-21 Thread Hugh Perkins
On 8/21/07, Stefan O'Rear [EMAIL PROTECTED] wrote:
 Currently, it's never worse.  GHC's backend is about as good as GCC;
 most of the optimiations it doesn't do are not possible for GCC because
 of various lack-of-information problems (the stack pointer never aliases
 the heap pointer, stuff like that).  It's conceivable that at some point
 -fasm will be faster, because you have the possibility of much more
 accurate aliasing information inside the compiler, than can be coded in
 C.  In the meantime, note that the runtime difference is less than 3%
 and the compile time difference is over 100%, so it's only worthwhile if
 you expect *this version* of your program to be used more than 30 times,
 ie releases only.

Wait, you're saying that ghc can produce pure c-code, that doesnt
contain any assembly code, and that runs as fast as ghc code that does
contain assembly?

Sooo if I was feeling evil, could I take this c-code and pipe it
into something that turns it into C#???   If it contains lots of
macros (or any macros at all perhaps...), this becomes non-trivial,
but otherwise I think most things in C can be mapped fairly trivially
to C#?  (It's a one-way mapping of course, eg delete in C is simply
dropped when mapped to c#).

(Not that I have any good reason to do this, simply... fun).
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] GHC optimisations

2007-08-21 Thread Neil Mitchell
Hi

 Wait, you're saying that ghc can produce pure c-code, that doesnt
 contain any assembly code, and that runs as fast as ghc code that does
 contain assembly?

No. It can produce pure C code (unregistered), but to get high
performance it processes the output assembly afterwards (registered).

 Sooo if I was feeling evil, could I take this c-code and pipe it
 into something that turns it into C#???

You might be able to. Much easier would be to use Yhc and pass the
--dotnet flag which generates .NET binaries natively.

 macros (or any macros at all perhaps...), this becomes non-trivial,
 but otherwise I think most things in C can be mapped fairly trivially
 to C#?  (It's a one-way mapping of course, eg delete in C is simply
 dropped when mapped to c#).

There isn't going to be much free/delete, its all a garbage collected heap.

Thanks

Neil
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] GHC optimisations

2007-08-21 Thread Stefan O'Rear
On Tue, Aug 21, 2007 at 09:39:32PM +0800, Hugh Perkins wrote:
 On 8/21/07, Stefan O'Rear [EMAIL PROTECTED] wrote:
  Currently, it's never worse.  GHC's backend is about as good as GCC;
  most of the optimiations it doesn't do are not possible for GCC because
  of various lack-of-information problems (the stack pointer never aliases
  the heap pointer, stuff like that).  It's conceivable that at some point
  -fasm will be faster, because you have the possibility of much more
  accurate aliasing information inside the compiler, than can be coded in
  C.  In the meantime, note that the runtime difference is less than 3%
  and the compile time difference is over 100%, so it's only worthwhile if
  you expect *this version* of your program to be used more than 30 times,
  ie releases only.
 
 Wait, you're saying that ghc can produce pure c-code, that doesnt
 contain any assembly code, and that runs as fast as ghc code that does
 contain assembly?

No.

Name:Registerized C
Performance: 1.00
Flags:   -fvia-C

C, but with gcc and machine specific hacks to implement general tail
calls and (most notably) register global variables.

Name:Native code generator
Performance: 0.97
Flags:   -fasm

GHC's own mini C compiler converts the internal C-- data into assembly
code, which is then piped to gas.

Name:Unregisterized C
Performance: 0.40
Flags:   -unreg

Generates near-ANSI C, using memory variables for the VM's registers and
the returning function pointer hack seen in oh so many Scheme compilers.
Good for early stages of porting, and not much else.

Name:Byte-code
Performance: 0.05
Flags:   -fbyte-code (GHCi HEAD only)

Generates a compact form of STG code, and then interprets it.  A
generally quite bad idea, whose main redeeming feature is that it
doesn't require starting the GNU toolchain.

 Sooo if I was feeling evil, could I take this c-code and pipe it
 into something that turns it into C#???

Yes.  You could do the same with the original haskell.  It's called a
compiler.

 If it contains lots of macros (or any macros at all perhaps...), this
 becomes non-trivial,

I fail to see how macros have anything to do with this.  Especially
since cpp removes them all.

 but otherwise I think most things in C can be mapped fairly trivially
 to C#?

Unsafe C#, sure.  Haskell's type system is strictly more expressive than
C#, and you need to sacrifice either machine efficiency or checked
safety.

 (It's a one-way mapping of course, eg delete in C is simply dropped
 when mapped to c#).

There is no delete in C, and even if there was, GHC wouldn't use it.
Allocation is *the* major bottleneck of functional programs, and having
a custom allocator inlined into every call site is vital to have usable
performance.

 (Not that I have any good reason to do this, simply... fun).

Stefan


signature.asc
Description: Digital signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] GHC optimisations

2007-08-21 Thread Andrew Coppin

Simon Peyton-Jones wrote:

GHC does some constant folding, but little by way of strength reduction, or 
using shifts instead of multiplication.  It's pretty easy to add more: it's all 
done in a single module.  Look at primOpRules in the module PrelRules.

Patches welcome!  But please also supply test-suite tests that check the 
correctness of the rules.
  


So... you mean it's source-level transformation rules? (Rather than 
wired into the compiler itself somewhere.)


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


RE: [Haskell-cafe] GHC optimisations

2007-08-21 Thread Simon Peyton-Jones
|  GHC does some constant folding, but little by way of strength
| reduction, or using shifts instead of multiplication.  It's pretty easy
| to add more: it's all done in a single module.  Look at primOpRules in
| the module PrelRules.
| 
|  Patches welcome!  But please also supply test-suite tests that check
| the correctness of the rules.
| 
|
| So... you mean it's source-level transformation rules? (Rather than
| wired into the compiler itself somewhere.)

No, constant folding is part of the compiler, I'm afraid, in the module 
PrelRules.

Simon
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] GHC optimisations

2007-08-21 Thread Isaac Dupree

Simon Peyton-Jones wrote:

|  GHC does some constant folding, but little by way of strength
| reduction, or using shifts instead of multiplication.  It's pretty easy
| to add more: it's all done in a single module.  Look at primOpRules in
| the module PrelRules.
| 
|  Patches welcome!  But please also supply test-suite tests that check
| the correctness of the rules.
| 
|
| So... you mean it's source-level transformation rules? (Rather than
| wired into the compiler itself somewhere.)

No, constant folding is part of the compiler, I'm afraid, in the module 
PrelRules.

Simon


_Constant_ folding is, but in GHC.Base there are rules like (unboxed) 
multiplying by zero or one, or adding or subtracting zero, from an 
unknown other (non-constant) value.  I think shifts might be doable via 
RULES... if you were willing to make one rule for each denominator 2, 4, 
8 and so on, which rather depends on max. Int... (and that's not 
Integers either, I guess)


Isaac
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] GHC optimisations

2007-08-21 Thread Twan van Laarhoven

Isaac Dupree wrote:

Simon Peyton-Jones wrote:


...
No, constant folding is part of the compiler, I'm afraid, in the 
module PrelRules.


Simon



_Constant_ folding is, but in GHC.Base there are rules like (unboxed) 
multiplying by zero or one, or adding or subtracting zero, from an 
unknown other (non-constant) value.  I think shifts might be doable via 
RULES... if you were willing to make one rule for each denominator 2, 4, 
8 and so on, which rather depends on max. Int... (and that's not 
Integers either, I guess)



Just to see what this would look like.


First of all, optimizing mod and div can not be done with PrelRules, 
because they are not primitives, quot and rem are. And most of the nice 
optimizations with shifts no longer work there. But using rules should 
work, assuming the inliner is not too fast.


Multiplication and division can become shifts:

 {-# RULES

 -- x * 2^n  --  x `shiftL` n
 x# *# 2#  forall x#.  x# *# 2# = x# `iShiftL#` 1#
 2# *# x#  forall x#.  2# *# x# = x# `iShiftL#` 1#
   -- etc.

 -- x `div` 2^n  --  x `shiftR` n
 x# `divInt#` 2#  forall x#.  divInt# x# 2# = x# `iShiftRA#` 1#
 x# `divInt#` 4#  forall x#.  divInt# x# 4# = x# `iShiftRA#` 2#
   -- etc.

Mod can become and:

 -- x `mod` 2^n  --  x .. (2^n - 1)
 x# `modInt#` 2#  forall x#.  modInt# x# 2# = andInt# x# 1#
 x# `modInt#` 4#  forall x#.  modInt# x# 4# = andInt# x# 3#
   -- etc.

   #-}

Here I use a new function (see instance Bits Int),

 andInt# :: Int# - Int# - Int#
 andInt# x# y# = word2Int# (int2Word# x# `and#` int2Word# y#)

but you could write that inline as well.

A problem with these rules is that you need a whole lot of them. 32 per 
operation (on a 32 bit platform), * 4 operations, * 2 separate versions 
for words and ints = 256.




Other rules that could be interesting are:
 forall a b. fromInteger a + fromInteger b = fromInteger (a + b)
 forall a b. fromInteger a * fromInteger b = fromInteger (a * b)
 -- etc.
To allow optimizations on generic Num code, although I am not sure what 
the Haskell spec has to say about this.




Now, if you want to get really creative you can use other semi-evil 
optimization tricks for quot and rem.

The following is based on code generated by Visual C++:
 -- remPowInt x y == x `rem` (2^y)
 remPowInt x y
 | r = 0 =  r
 | otherwise  =  ((r - 1) .|. (complement yWithSign)) + 1
   where  r = x .. yWithSign
  yWithSign = (1 `shiftL` (bitSize - 1)) .|.
  ((1 `shiftL` y) - 1)
Or in assembly (for y == 2, so x `rem` 4)
  and ecx,8007h
  jns main+60h (401060h)
  dec ecx
  or  ecx,0FFF8h
  inc ecx

The C++ compiler also performs other optimizations when multiplying with 
other constants, for example *3 becomes something like

  lea eax, [eax+eax*2]
Divisions become horrendous constructs with magic numbers,
   -- eax := ecx / 5
  mov eax,6667h
  imulecx
  sar edx,1
  mov eax,edx
  shr eax,1Fh
  add eax,edx
But such things are probably best left to the code generator / a 
peephole optimizer, if they are done at all. I think the LEA trick 
should be feasible.



Twan
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] GHC optimisations

2007-08-21 Thread Brandon S. Allbery KF8NH


On Aug 21, 2007, at 22:13 , Twan van Laarhoven wrote:


Other rules that could be interesting are:
 forall a b. fromInteger a + fromInteger b = fromInteger (a + b)


I don't think this will work, a and b have to be the same type.

--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] [EMAIL PROTECTED]
system administrator [openafs,heimdal,too many hats] [EMAIL PROTECTED]
electrical and computer engineering, carnegie mellon universityKF8NH


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] GHC optimisations

2007-08-21 Thread Twan van Laarhoven

Brandon S. Allbery KF8NH wrote:



On Aug 21, 2007, at 22:13 , Twan van Laarhoven wrote:


Other rules that could be interesting are:
 forall a b. fromInteger a + fromInteger b = fromInteger (a + b)



I don't think this will work, a and b have to be the same type.


They are of the same type, both are Integers,

 forall a b :: Integer.
 ((fromInteger (a::Integer)) + (fromInteger b)) :: Num n = n
   =
 (fromInteger (a + b :: Integer)) :: Num n = n

Twan
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] GHC optimisations

2007-08-21 Thread Hugh Perkins
Thank-you for the information.  It was very useful.  Couple of reactions FWIW:

On 8/21/07, Stefan O'Rear [EMAIL PROTECTED] wrote:
  Sooo if I was feeling evil, could I take this c-code and pipe it
  into something that turns it into C#???

 Yes.  You could do the same with the original haskell.  It's called a
 compiler.

Yes, that is true.  However, this is also true, for an appropriate
compiler, for programs such as:

Give me the first 10 numbers of the Fibonnacci (spelling?) series.

The compiler can search on the internet for what is the Fibonnacci
series, and/or ask its friends.

In a subsequent version, a compiler could in fact compile programs such as:

Go!

... where the compiler uses context to deduce what I want it to do ;-)

Nevertheless certain compilers are easier to write than others, and
writing code to automatically port ghc-generated C code is likely to
be significantly easier than to compile Haskell to C#, or to .Net
bytecode, from scratch.


 Name:Native code generator
 Performance: 0.97
 Flags:   -fasm

 GHC's own mini C compiler converts the internal C-- data into assembly
 code, which is then piped to gas.

Ah, hence SPJ's C-- project?


 Name:Unregisterized C
 Performance: 0.40
 Flags:   -unreg

 Generates near-ANSI C, using memory variables for the VM's registers and
 the returning function pointer hack seen in oh so many Scheme compilers.
 Good for early stages of porting, and not much else.

Could be good enough.  C# compiler and VM provides some optimizations
which could handle this.  What is the function pointer hack?
Specifically, is that why you say near-ANSI C, rather than ANSI C?


  If it contains lots of macros (or any macros at all perhaps...), this
  becomes non-trivial,

 I fail to see how macros have anything to do with this.  Especially
 since cpp removes them all.

  but otherwise I think most things in C can be mapped fairly trivially
  to C#?

 Unsafe C#, sure.  Haskell's type system is strictly more expressive than
 C#, and you need to sacrifice either machine efficiency or checked
 safety.

  (It's a one-way mapping of course, eg delete in C is simply dropped
  when mapped to c#).

 There is no delete in C, and even if there was, GHC wouldn't use it.
 Allocation is *the* major bottleneck of functional programs, and having
 a custom allocator inlined into every call site is vital to have usable
 performance.

  (Not that I have any good reason to do this, simply... fun).

 Stefan

 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.6 (GNU/Linux)

 iD8DBQFGyur/FBz7OZ2P+dIRAleTAJ9WiK8tCp0QZE4syG4BZk5EFm1FuQCgzYGK
 NUv22zY5IgeqkEJ5kL3yriQ=
 =0Xkq
 -END PGP SIGNATURE-


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] GHC optimisations

2007-08-20 Thread Andrew Coppin

Stefan O'Rear wrote:

On Sun, Aug 19, 2007 at 12:53:07PM +0100, Andrew Coppin wrote:
  
Does GHC do stuff like converting (2*) into (shift 1) or converting x + x 
into 2*x?



For a good time, compile some code which uses even or odd :: Int - Bool
using -O2 -fasm -ddump-asm...  The compiler *really* shouldn't be using 'idivl'.

(If you use -fvia-C -optc-O2, the C compiler will notice the operations
and optimize it itself.  This is one of the very few areas where -fvia-C
is still better.)
  


The way I heard it is that compilation via C is always better, though 
they plan to change that some day. I don't know how true that is...



If I do x * sin 12, is GHC likely to compute sin 12 at compile-time?



Also try -ddump-simpl-stats and -ddump-simpl-iterations if you want to
know *why*.  (The extremely obscure 'full laziness' transformation
performed by GHC has a fundamental effect on the compilation of x * sin
12...)
  


Hmm, OK.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] GHC optimisations

2007-08-20 Thread Stefan O'Rear
On Mon, Aug 20, 2007 at 06:30:27PM +0100, Andrew Coppin wrote:
 Stefan O'Rear wrote:
 On Sun, Aug 19, 2007 at 12:53:07PM +0100, Andrew Coppin wrote:
   
 Does GHC do stuff like converting (2*) into (shift 1) or converting x + x 
 into 2*x?

 For a good time, compile some code which uses even or odd :: Int - Bool
 using -O2 -fasm -ddump-asm...  The compiler *really* shouldn't be using 
 'idivl'.

 (If you use -fvia-C -optc-O2, the C compiler will notice the operations
 and optimize it itself.  This is one of the very few areas where -fvia-C
 is still better.)

 The way I heard it is that compilation via C is always better, though 
 they plan to change that some day. I don't know how true that is...

Currently, it's never worse.  GHC's backend is about as good as GCC;
most of the optimiations it doesn't do are not possible for GCC because
of various lack-of-information problems (the stack pointer never aliases
the heap pointer, stuff like that).  It's conceivable that at some point
-fasm will be faster, because you have the possibility of much more
accurate aliasing information inside the compiler, than can be coded in
C.  In the meantime, note that the runtime difference is less than 3%
and the compile time difference is over 100%, so it's only worthwhile if
you expect *this version* of your program to be used more than 30 times,
ie releases only.

Stefan


signature.asc
Description: Digital signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


RE: [Haskell-cafe] GHC optimisations

2007-08-20 Thread Simon Peyton-Jones
GHC does some constant folding, but little by way of strength reduction, or 
using shifts instead of multiplication.  It's pretty easy to add more: it's all 
done in a single module.  Look at primOpRules in the module PrelRules.

Patches welcome!  But please also supply test-suite tests that check the 
correctness of the rules.

Simon

| -Original Message-
| From: [EMAIL PROTECTED] [mailto:haskell-cafe-
| [EMAIL PROTECTED] On Behalf Of Stefan O'Rear
| Sent: 19 August 2007 20:14
| To: Andrew Coppin
| Cc: haskell-cafe@haskell.org
| Subject: Re: [Haskell-cafe] GHC optimisations
|
| On Sun, Aug 19, 2007 at 12:53:07PM +0100, Andrew Coppin wrote:
|  Does GHC do stuff like converting (2*) into (shift 1) or converting x
| + x
|  into 2*x?
|
| For a good time, compile some code which uses even or odd :: Int -
| Bool
| using -O2 -fasm -ddump-asm...  The compiler *really* shouldn't be using
| 'idivl'.
|
| (If you use -fvia-C -optc-O2, the C compiler will notice the operations
| and optimize it itself.  This is one of the very few areas where -fvia-
| C
| is still better.)
|
|  If I do x * sin 12, is GHC likely to compute sin 12 at compile-time?
|
| Also try -ddump-simpl-stats and -ddump-simpl-iterations if you want to
| know *why*.  (The extremely obscure 'full laziness' transformation
| performed by GHC has a fundamental effect on the compilation of x * sin
| 12...)
|
| Stefan
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] GHC optimisations

2007-08-19 Thread Andrew Coppin
Does GHC do stuff like converting (2*) into (shift 1) or converting x + 
x into 2*x?


If I do x * sin 12, is GHC likely to compute sin 12 at compile-time?

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] GHC optimisations

2007-08-19 Thread Tim Chevalier
On 8/19/07, Andrew Coppin [EMAIL PROTECTED] wrote:
 Does GHC do stuff like converting (2*) into (shift 1) or converting x +
 x into 2*x?


Hmm, that's an interesting architecture where multiplication is
cheaper than addition :-)

 If I do x * sin 12, is GHC likely to compute sin 12 at compile-time?


I seriously doubt it, but there's an easy way to find out the answer
to all these questions: pass the -ddump-simpl flag to GHC so that it
prints out the intermediate code it generates. There's some
information in the users' guide about how to read the results.

Cheers,
Tim

-- 
Tim Chevalier * catamorphism.org * Often in error, never in doubt
I always wanted to be commander-in-chief of my own one-woman army --
Ani DiFranco
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] GHC optimisations

2007-08-19 Thread Stefan O'Rear
On Sun, Aug 19, 2007 at 12:53:07PM +0100, Andrew Coppin wrote:
 Does GHC do stuff like converting (2*) into (shift 1) or converting x + x 
 into 2*x?

For a good time, compile some code which uses even or odd :: Int - Bool
using -O2 -fasm -ddump-asm...  The compiler *really* shouldn't be using 'idivl'.

(If you use -fvia-C -optc-O2, the C compiler will notice the operations
and optimize it itself.  This is one of the very few areas where -fvia-C
is still better.)

 If I do x * sin 12, is GHC likely to compute sin 12 at compile-time?

Also try -ddump-simpl-stats and -ddump-simpl-iterations if you want to
know *why*.  (The extremely obscure 'full laziness' transformation
performed by GHC has a fundamental effect on the compilation of x * sin
12...)

Stefan


signature.asc
Description: Digital signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] GHC optimisations

2007-08-13 Thread Andrew Coppin

Does GHC do dead code elimination?

I observe that if you take a module and edit it so that some function is 
now no longer exported or called by any exported function, the size of 
the *.o file seems to go down. This suggests that dead code within a 
single module is eliminated.


However... how about this?

 module Foo where

 f = ...
 g = ...
 h = ...

 module Main where

 import Foo

 main = ...f...

Will the generated executable contain the code for g and h, even though 
they aren't called? Or does the unused code get eliminated? How about 
the standard libraries?




I read somewhere that if one funtion returns a tuple, and the caller 
immediately extracts the values from the tuple, GHC tries to optimise 
away the tuple - so it's as if the function can just return multiple 
values at once. Is this true? Does it apply only to tuples, or to all types?


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] GHC optimisations

2007-08-13 Thread Stefan O'Rear
On Mon, Aug 13, 2007 at 07:05:03PM +0100, Andrew Coppin wrote:
 Does GHC do dead code elimination?

Yes - all unused let-binders are removed.

 I observe that if you take a module and edit it so that some function is 
 now no longer exported or called by any exported function, the size of the 
 *.o file seems to go down. This suggests that dead code within a single 
 module is eliminated.

 However... how about this?

  module Foo where

  f = ...
  g = ...
  h = ...

  module Main where

  import Foo

  main = ...f...

 Will the generated executable contain the code for g and h, even though 
 they aren't called? Or does the unused code get eliminated? How about the 
 standard libraries?

By default, GHC generates one object file per source module, so if any
values from that module are used, the entire module is linked in.  With
the -split-objs flag, GHC generates one object file per top level
function, sometimes significantly reducing the size of executables.
This is not the default because it puts tremendous strain on the linker.
It is common to spend several *minutes* linking base when compiling ghc,
and it's not unheard of for ar to simply run out of memory and die.

 I read somewhere that if one funtion returns a tuple, and the caller 
 immediately extracts the values from the tuple, GHC tries to optimise away 
 the tuple - so it's as if the function can just return multiple values at 
 once. Is this true? Does it apply only to tuples, or to all types?

This is called the Constructed Product Return (CPR) analysis, and it
applies to all types with one constructor (in archaic jargon, product
types).

For instance,

(+) (I# x#) (I# y#) = I# (x# +# y#)

foo = case 2 + 2 of I# x# - ...

is transformed into:

zdzp (I# x#) (I# y#) = (x# +# y#)

foo = case 2 + 2 of x# - ...

This is one of the two main pillars of arithmetic unboxing, alongside
and equally important to strictness analysis.

Stefan


signature.asc
Description: Digital signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] GHC optimisations

2007-08-13 Thread Andrew Coppin

Stefan O'Rear wrote:

On Mon, Aug 13, 2007 at 07:05:03PM +0100, Andrew Coppin wrote:
  

Does GHC do dead code elimination?



Yes - all unused let-binders are removed.
  


Not related to optimisation, but... is there some switch to warn you if 
something gets removed? (Presumably this means you forgot to export 
something, or you haven't coded the bit that calls it yet or something.)


Will the generated executable contain the code for g and h, even though 
they aren't called? Or does the unused code get eliminated? How about the 
standard libraries?



By default, GHC generates one object file per source module, so if any
values from that module are used, the entire module is linked in.


Right, OK. I had a feeling that was the case.

(I once compiled a program that used the GHC API. The final binary was 
several times larger than ghc.exe...)



With
the -split-objs flag, GHC generates one object file per top level
function, sometimes significantly reducing the size of executables.
This is not the default because it puts tremendous strain on the linker.
It is common to spend several *minutes* linking base when compiling ghc,
and it's not unheard of for ar to simply run out of memory and die.
  


Ouch! o_O

Probably simpler for me, as a human, to split the program into a 
sensible module structure... ;-)


I read somewhere that if one funtion returns a tuple, and the caller 
immediately extracts the values from the tuple, GHC tries to optimise away 
the tuple - so it's as if the function can just return multiple values at 
once. Is this true? Does it apply only to tuples, or to all types?



This is called the Constructed Product Return (CPR) analysis, and it
applies to all types with one constructor (in archaic jargon, product
types).
  


Right. So it doesn't have to have strict fields or anything? Just has to 
have exactly one constructor?


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] GHC optimisations

2007-08-13 Thread Stefan O'Rear
On Mon, Aug 13, 2007 at 07:35:31PM +0100, Andrew Coppin wrote:
 Not related to optimisation, but... is there some switch to warn you if 
 something gets removed? (Presumably this means you forgot to export 
 something, or you haven't coded the bit that calls it yet or something.)

-fwarn-unused-binds (included in -Wall)

 (I once compiled a program that used the GHC API. The final binary was 
 several times larger than ghc.exe...)

GHC is a particularly bad case because what it does is determined by the
settings of a bunch of switches in the configuration data.  Of course,
GHC isn't smart enough to perform inter-module control flow analysis, so
even with -split-objs you'd probably still link most of GHC.

 I read somewhere that if one funtion returns a tuple, and the caller 
 immediately extracts the values from the tuple, GHC tries to optimise 
 away the tuple - so it's as if the function can just return multiple 
 values at once. Is this true? Does it apply only to tuples, or to all 
 types?

 This is called the Constructed Product Return (CPR) analysis, and it
 applies to all types with one constructor (in archaic jargon, product
 types).

 Right. So it doesn't have to have strict fields or anything? Just has to 
 have exactly one constructor?

Yep, CPR is completely independent of strictness, however the returned
product must be *new*, since returning an old object by value risks
losing sharing (and thus creating large memory leaks).

Stefan


signature.asc
Description: Digital signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] GHC optimisations

2007-08-13 Thread Andrew Coppin

Stefan O'Rear wrote:

On Mon, Aug 13, 2007 at 07:35:31PM +0100, Andrew Coppin wrote:
  
(I once compiled a program that used the GHC API. The final binary was 
several times larger than ghc.exe...)



GHC is a particularly bad case because what it does is determined by the
settings of a bunch of switches in the configuration data.  Of course,
GHC isn't smart enough to perform inter-module control flow analysis, so
even with -split-objs you'd probably still link most of GHC.
  


Is it likely that GHC will ever become smart enough to do that kind of 
analysis? (I imagine this is going to be especially fun with precompiled 
libraries...)



This is called the Constructed Product Return (CPR) analysis, and it
applies to all types with one constructor (in archaic jargon, product
types).
  
Right. So it doesn't have to have strict fields or anything? Just has to 
have exactly one constructor?



Yep, CPR is completely independent of strictness, however the returned
product must be *new*, since returning an old object by value risks
losing sharing (and thus creating large memory leaks).
  


Right, OK.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe