Re: [Haskell-cafe] Haskell performance question

2007-11-09 Thread Derek Elkins
On Thu, 2007-11-08 at 22:54 -0800, Don Stewart wrote:
 bulat.ziganshin:
  definitely, it's a whole new era in low-level ghc programming
 
 victory!

Now I want a way of getting (well-used) SIMD instructions and such, and
with some luck some high-level approach as well.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Haskell performance question

2007-11-08 Thread Bulat Ziganshin
Hello Dan,

Thursday, November 8, 2007, 9:33:12 PM, you wrote:

 main = do
a - newArray (0,n-1) 1.0 :: IO (IOUArray Int Double)
forM_ [0..n-2] $ \i - do { x - readArray a i; y - readArray a
 (i+1); writeArray a (i+1) (x+y) }
x - readArray a (n-1)
print x

1. ghc doesn't implement loop unrolling, even in -O2 mode
2. forM_ may have its own overheads, i'm not sure
3. add strictness annotations:

forM_ [0..n-2] $ \i - do { return $! i;
x - readArray a i; return $! x;
y - readArray a (i+1); return $! y;
writeArray a (i+1) (x+y) }

such cycle should be approx. 3-10 times slower than C one

-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Haskell performance question

2007-11-08 Thread Mikhail Gusarov
Dan Piponi [EMAIL PROTECTED] writes:

 Even though 'n' is 10 times bigger in the C program it runs much
 faster than the Haskell program on my MacBook Pro with Haskell 6.6.1.
 I've tried lots of different combinations of flags that I've found in
 various postings to haskell-cafe but to no avail.

import Data.List

main = do
  print $ foldl' (+) 0 $ take 1 [1.0,1.0..]

works 10 times faster than your C version. You just need to adapt to the
radically different style of programming.

Real Fortran programmer may write Fortran programs on any programming
language :)

-- 
JID: [EMAIL PROTECTED]
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Haskell performance question

2007-11-08 Thread Thomas Schilling
On Fri, 2007-11-09 at 00:51 +0600, Mikhail Gusarov wrote:
 Dan Piponi [EMAIL PROTECTED] writes:
 
  Even though 'n' is 10 times bigger in the C program it runs much
  faster than the Haskell program on my MacBook Pro with Haskell 6.6.1.
  I've tried lots of different combinations of flags that I've found in
  various postings to haskell-cafe but to no avail.
 
 import Data.List
 
 main = do
   print $ foldl' (+) 0 $ take 1 [1.0,1.0..]
 
 works 10 times faster than your C version. You just need to adapt to the
 radically different style of programming.
 
 Real Fortran programmer may write Fortran programs on any programming
 language :)

Good point, but this is not true for the OP.  Take a look at Dan's blog.
It's linked on planet.haskell.org.

BTW. I prefer:

  print 1000.0

;)

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Haskell performance question

2007-11-08 Thread Dan Piponi
On Nov 8, 2007 2:48 PM, Duncan Coutts [EMAIL PROTECTED] wrote:
 You really do not need happy to build ghc. Just ignore the extralibs
 tarball.

Well that was the crucial fact I needed. 6.8.1 is now built. ghci
doesn't work, it complains about an unknown symbol '_environ' in
HSbase-3.0.0.0.o but I was able to rerun the timings for my code. WIth
-O2 the run time went from about 1.5s to 0.2s. With unsafeRead and
unsafeWrite that becomes 0.16s. I can start thinking about writing
lots of Fortran in Haskell now.

(Anyone have any idea what that missing symbol is about?)
--
Dan
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Haskell performance question

2007-11-08 Thread Stefan O'Rear
On Fri, Nov 09, 2007 at 01:39:55AM +0100, Thomas Schilling wrote:
 On Thu, 2007-11-08 at 16:24 -0800, Stefan O'Rear wrote:
  On Thu, Nov 08, 2007 at 07:57:23PM +0100, Thomas Schilling wrote:
   
   $ ghc --make -O2 ghc-bench.hs

 Even for GCC (/not/ G_H_C)?

No, GCC implements -Ox properly.

 I know about the GHC bug, that's why I used
 -O2 for GHC/Haskell and -O3 for GCC/C.

You just contradicted the line I quoted; was it in error?

Stefan


signature.asc
Description: Digital signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Haskell performance question

2007-11-08 Thread Stefan O'Rear
On Thu, Nov 08, 2007 at 05:03:54PM -0800, Stefan O'Rear wrote:
 On Fri, Nov 09, 2007 at 01:39:55AM +0100, Thomas Schilling wrote:
  On Thu, 2007-11-08 at 16:24 -0800, Stefan O'Rear wrote:
   On Thu, Nov 08, 2007 at 07:57:23PM +0100, Thomas Schilling wrote:

$ ghc --make -O2 ghc-bench.hs
 
  Even for GCC (/not/ G_H_C)?
 
 No, GCC implements -Ox properly.
 
  I know about the GHC bug, that's why I used
  -O2 for GHC/Haskell and -O3 for GCC/C.
 
 You just contradicted the line I quoted; was it in error?

Nevermind, I'm just blind today.  (thanks Shachaf for bringing this to
my attention...)

Stefan


signature.asc
Description: Digital signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Haskell performance question

2007-11-08 Thread Duncan Coutts
On Thu, 2007-11-08 at 13:00 -0800, Dan Piponi wrote:
 
 It looks like my whole question might become moot with ghc 6.8.1, but
 so far I've been unable to build it due to the cyclic happy
 dependency.

You really do not need happy to build ghc. Just ignore the extralibs
tarball. You can install any of those libs that you need later from
hackage.

Duncan
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Haskell performance question

2007-11-08 Thread Don Stewart
nominolo:
 On Thu, 2007-11-08 at 10:33 -0800, Dan Piponi wrote:
  I see lots of shootout examples where Haskell programs seem to perform
  comparably with C programs, but I find it hard to reproduce anything
  like those figures when testing with my own code. So here's a simple
  case:
  
  I have this C program:
  
  #include stdio.h
  
  #define n 1
  
  double a[n];
  
  int main()
  {
 int i;
 for (i = 0; in; ++i)
 {
 a[i] = 1.0f;
 }
 for (i = 0; in-1; ++i)
 {
 a[i+1] = a[i]+a[i+1];
 }
 printf(%f\n,a[n-1]);
  }
  
  And this Haskell program:
  
  import Data.Array.IO
  import Control.Monad
  
  n = 1000
  
  main = do
 a - newArray (0,n-1) 1.0 :: IO (IOUArray Int Double)
 forM_ [0..n-2] $ \i - do { x - readArray a i; y - readArray a
  (i+1); writeArray a (i+1) (x+y) }
 x - readArray a (n-1)
 print x
  
  Even though 'n' is 10 times bigger in the C program it runs much
  faster than the Haskell program on my MacBook Pro with Haskell 6.6.1.
  I've tried lots of different combinations of flags that I've found in
  various postings to haskell-cafe but to no avail.
  
  What flags do I need to get at least within a factor of 2 or 3 of C?
  Am I using the wrong kind of array? Or is Haskell always going to be
  15-20 times slower for this kind of numerical work?
 
 Wow.  You should *really* try using GHC 6.8.1:
 
 GHC 6.6.1  (-O2)
 
 real0m7.062s
 user0m5.464s
 sys 0m0.288s
 
 GHC 6.8.0.20071019 
 
 real0m0.723s
 user0m0.616s
 sys 0m0.100s
 
 result is 2.0e7 for both

So that looks like a bug in the simplifier in 6.6.1 that has been fixed.
This should go in the testsuite, perhaps, Simon?

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Haskell performance question

2007-11-08 Thread Dan Piponi
On Nov 8, 2007 11:34 AM, Jason Dusek [EMAIL PROTECTED] wrote:
 Can you show us your compilation options and timings?

I was simply using -O3. I tried a bunch of other flags (copied from
the shootout examples) but they made no appreciable difference.

I was getting about 1.5s for the Haskell program and about 0.08s for
the C one with the same n=10,000,000.

I now see that my ghc is in fact an i386 build whose binary was
downloaded from the ghc binary web page. So I'm going to try to make a
better comparison with a 64 bit 6.8.1. The batteries on my laptop will
probably drop to zero before then however, so it'll be tomorrow at the
earliest.
--
Dan
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Haskell performance question

2007-11-08 Thread Xiao-Yong Jin
Bulat Ziganshin [EMAIL PROTECTED] writes:

 Hello Dan,

 Thursday, November 8, 2007, 9:33:12 PM, you wrote:

 main = do
a - newArray (0,n-1) 1.0 :: IO (IOUArray Int Double)
forM_ [0..n-2] $ \i - do { x - readArray a i; y - readArray a
 (i+1); writeArray a (i+1) (x+y) }
x - readArray a (n-1)
print x

 1. ghc doesn't implement loop unrolling, even in -O2 mode
 2. forM_ may have its own overheads, i'm not sure
 3. add strictness annotations:

 forM_ [0..n-2] $ \i - do { return $! i;
 x - readArray a i; return $! x;
 y - readArray a (i+1); return $! y;
 writeArray a (i+1) (x+y) }

 such cycle should be approx. 3-10 times slower than C one

I didn't know you can do that.  Anyway, I tried OP's C and
Haskell versions and the one with your strictness
annotations.  It seems that the `$!' thing doesn't do much,
or it's not doing anything.


OP's C code:

$ time ./test
1.00

real0m1.128s
user0m0.572s
sys 0m0.555s


OP's Haskell code with n=1e8:

$ time ./test-ghc 
1.0e8

real0m2.065s
user0m1.505s
sys 0m0.558s


OP's Haskell code with n=1e8 and your strictness annotations:

$ time ./test-ghc 
1.0e8

real0m2.064s
user0m1.472s
sys 0m0.591s


However, bad performance observed by OP is not observed by me.

$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 6.8.1
$ gcc --version
gcc (GCC) 4.1.2 (Gentoo 4.1.2 p1.0.2)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

-- 
c/*__o/*
\ * (__
*/\  
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Haskell performance question

2007-11-08 Thread Stefan O'Rear
On Thu, Nov 08, 2007 at 07:57:23PM +0100, Thomas Schilling wrote:
 
 $ ghc --make -O2 ghc-bench.hs
 
 and got:
 
 $ time ./ghc-bench 
 2.0e7
 
 real0m0.714s
 user0m0.576s
 sys 0m0.132s
 
 $ time ./ghcbC 
 2000.00
 
 real0m0.305s
 user0m0.164s
 sys 0m0.132s
 
 This is on a first-gen Macbook running Ubuntu.  1GB RAM.  1.83Ghz Core
 Duo
 
 $ ghc --version
 The Glorious Glasgow Haskell Compilation System, version 6.8.0.20071019
 
 $ gcc --version
 gcc (GCC) 4.1.2 (Ubuntu 4.1.2-0ubuntu4)
 Copyright (C) 2006 Free Software Foundation, Inc.
 This is free software; see the source for copying conditions.  There is
 NO
 warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
 PURPOSE.
 
 
 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe
 $ gcc -O3 ghc-bench.c -o ghcbC
 ghc-bench.c: In function ‘main’:
 ghc-bench.c:16: warning: incompatible implicit declaration of built-in
 function ‘printf’

-O3 is worse than -O0, DO NOT USE IT.

I reported the bug several months ago.  In the meantime, use -O2.

(there are no high optimizations, it's just that the option parser
misparses -O3)

Stefan


signature.asc
Description: Digital signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Haskell performance question

2007-11-08 Thread Xiao-Yong Jin
Don Stewart [EMAIL PROTECTED] writes:

 xj2106:
 I used `unsafePerformIO' with `INLINE', because I don't know
 where `inlinePerformIO' is now.  And also the `-optc-march'
 is changed to `nocona'.

 Using unsafePerformIO here would break some crucial inlining.
 (the same trick is used in Data.ByteString, by the way).

 You can find inlinePerformIO is in Data.ByteString.Internal.

 Comparing the two, n=5500, ghc 6.8:

 $ ghc -O -fglasgow-exts -fbang-patterns -optc-O3
 -optc-march=pentium4 -optc-mfpmath=sse -optc-msse2  -optc-ffast-math
 spec.hs -o spec_hs --make

 With inlinePerformIO:

 $ time ./spec_hs 5500
 1.274224153
 ./spec_hs 5500  26.32s user 0.00s system 99% cpu 26.406 total

 As expected, and comparable to the shooutout result for the same N.
 With unsafePerformIO, the whole thing falls apart:

 $ time ./spec_hs 5500
 ^Cspec_hs: interrupted
 ./spec_hs 5500  124.86s user 0.11s system 99% cpu 2:05.04 total

 I gave up after 2 minutes. This FFI peek/poke code, acting as an ST
 monad, under a pure interface relies on inlinePerformIO.

Thanks for pointing this out.

Xiao-Yong
-- 
c/*__o/*
\ * (__
*/\  
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Haskell performance question

2007-11-08 Thread Don Stewart
dpiponi:
 On Nov 8, 2007 11:34 AM, Jason Dusek [EMAIL PROTECTED] wrote:
  Can you show us your compilation options and timings?
 
 I was simply using -O3. I tried a bunch of other flags (copied from
 the shootout examples) but they made no appreciable difference.

Argh, -O2 please. -O3 does nothing more, and sometime does less,
depending on the compiler :)

Double math always needs the gcc backend, and -optc flags (see the
spectral-norm post).

 
 I was getting about 1.5s for the Haskell program and about 0.08s for
 the C one with the same n=10,000,000.

I'm sure we can do better than that!
  
 I now see that my ghc is in fact an i386 build whose binary was
 downloaded from the ghc binary web page. So I'm going to try to make a
 better comparison with a 64 bit 6.8.1. The batteries on my laptop will
 probably drop to zero before then however, so it'll be tomorrow at the
 earliest.

If you can post the code somewhere, that would be great, with examples
of how to reproduce your timings.

-- don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Haskell performance question

2007-11-08 Thread Xiao-Yong Jin
Dan Piponi [EMAIL PROTECTED] writes:

 My wasn't intended to represent the problem that I'm trying to solve,
 but the approach I want to take. The problems that I do want to solve
 don't lend themselves to this kind of approach.

 My real situation is that I want to write code that has both a
 high-level component and a low-level number-crunching component that
 works on large dense and sparse arrays. Idiomatic Haskell is great for
 the high-level component. But the question is whether or not its worth
 trying to write the low-level code in Haskell or whether I should just
 write that code in C and use the FFI. 

I've written a bit number-crunching code in haskell.  My
experience is that usually the haskell code runs about 2 to
5 times slower -- It depends on whether I use the best
optimisation scheme or not -- I do not know if I am doing
the correct thing most of the time, since there is not much
literature on-line about numerical computing in haskell, or
in FP in general.  There is one in ocaml, though.

 There are still advantages to using Haskell even if the
 code is highly unidiomatic: you have the freedom of
 throwing in higher order functions from time to time in
 the low-level code, and writing mixed-language code is a
 pain.  -- Dan

I fully agree.

Xiao-Yong
-- 
c/*__o/*
\ * (__
*/\  
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Haskell performance question

2007-11-08 Thread Thomas Schilling
On Thu, 2007-11-08 at 10:33 -0800, Dan Piponi wrote:
 I see lots of shootout examples where Haskell programs seem to perform
 comparably with C programs, but I find it hard to reproduce anything
 like those figures when testing with my own code. So here's a simple
 case:
 
 I have this C program:
 
 #include stdio.h
 
 #define n 1
 
 double a[n];
 
 int main()
 {
int i;
for (i = 0; in; ++i)
{
a[i] = 1.0f;
}
for (i = 0; in-1; ++i)
{
a[i+1] = a[i]+a[i+1];
}
printf(%f\n,a[n-1]);
 }
 
 And this Haskell program:
 
 import Data.Array.IO
 import Control.Monad
 
 n = 1000
 
 main = do
a - newArray (0,n-1) 1.0 :: IO (IOUArray Int Double)
forM_ [0..n-2] $ \i - do { x - readArray a i; y - readArray a
 (i+1); writeArray a (i+1) (x+y) }
x - readArray a (n-1)
print x
 
 Even though 'n' is 10 times bigger in the C program it runs much
 faster than the Haskell program on my MacBook Pro with Haskell 6.6.1.
 I've tried lots of different combinations of flags that I've found in
 various postings to haskell-cafe but to no avail.
 
 What flags do I need to get at least within a factor of 2 or 3 of C?
 Am I using the wrong kind of array? Or is Haskell always going to be
 15-20 times slower for this kind of numerical work?

I tried both, but it would be really helpful to see your command line
options.  I used:

$ gcc -O3 ghc-bench.c -o ghcbC
ghc-bench.c: In function ‘main’:
ghc-bench.c:16: warning: incompatible implicit declaration of built-in
function ‘printf’

$ ghc --make -O2 ghc-bench.hs

and got:

$ time ./ghc-bench 
2.0e7

real0m0.714s
user0m0.576s
sys 0m0.132s

$ time ./ghcbC 
2000.00

real0m0.305s
user0m0.164s
sys 0m0.132s

This is on a first-gen Macbook running Ubuntu.  1GB RAM.  1.83Ghz Core
Duo

$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 6.8.0.20071019

$ gcc --version
gcc (GCC) 4.1.2 (Ubuntu 4.1.2-0ubuntu4)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is
NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE.


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Haskell performance question

2007-11-08 Thread Thomas Schilling
On Thu, 2007-11-08 at 10:33 -0800, Dan Piponi wrote:
 I see lots of shootout examples where Haskell programs seem to perform
 comparably with C programs, but I find it hard to reproduce anything
 like those figures when testing with my own code. So here's a simple
 case:
 
 I have this C program:
 
 #include stdio.h
 
 #define n 1
 
 double a[n];
 
 int main()
 {
int i;
for (i = 0; in; ++i)
{
a[i] = 1.0f;
}
for (i = 0; in-1; ++i)
{
a[i+1] = a[i]+a[i+1];
}
printf(%f\n,a[n-1]);
 }
 
 And this Haskell program:
 
 import Data.Array.IO
 import Control.Monad
 
 n = 1000
 
 main = do
a - newArray (0,n-1) 1.0 :: IO (IOUArray Int Double)
forM_ [0..n-2] $ \i - do { x - readArray a i; y - readArray a
 (i+1); writeArray a (i+1) (x+y) }
x - readArray a (n-1)
print x
 
 Even though 'n' is 10 times bigger in the C program it runs much
 faster than the Haskell program on my MacBook Pro with Haskell 6.6.1.
 I've tried lots of different combinations of flags that I've found in
 various postings to haskell-cafe but to no avail.
 
 What flags do I need to get at least within a factor of 2 or 3 of C?
 Am I using the wrong kind of array? Or is Haskell always going to be
 15-20 times slower for this kind of numerical work?

Wow.  You should *really* try using GHC 6.8.1:

GHC 6.6.1  (-O2)

real0m7.062s
user0m5.464s
sys 0m0.288s

GHC 6.8.0.20071019 

real0m0.723s
user0m0.616s
sys 0m0.100s

result is 2.0e7 for both


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Haskell performance question

2007-11-08 Thread Dan Piponi
Bulat,

The strictness gave me something like a 10% performance increase
making the Haskell code more than 10 times slower than the C. Is this
the right type of array to use for performance?
--
Dan

On Nov 8, 2007 10:36 AM, Bulat Ziganshin [EMAIL PROTECTED] wrote:
 Hello Dan,

 Thursday, November 8, 2007, 9:33:12 PM, you wrote:

  main = do
 a - newArray (0,n-1) 1.0 :: IO (IOUArray Int Double)
 forM_ [0..n-2] $ \i - do { x - readArray a i; y - readArray a
  (i+1); writeArray a (i+1) (x+y) }
 x - readArray a (n-1)
 print x

 1. ghc doesn't implement loop unrolling, even in -O2 mode
 2. forM_ may have its own overheads, i'm not sure
 3. add strictness annotations:

 forM_ [0..n-2] $ \i - do { return $! i;
 x - readArray a i; return $! x;
 y - readArray a (i+1); return $! y;
 writeArray a (i+1) (x+y) }

 such cycle should be approx. 3-10 times slower than C one

 --
 Best regards,
  Bulatmailto:[EMAIL PROTECTED]


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Haskell performance question

2007-11-08 Thread Dan Piponi
Mikhail,

 main = do
   print $ foldl' (+) 0 $ take 1 [1.0,1.0..]

 works 10 times faster than your C version. You just need to adapt to the
 radically different style of programming.

My wasn't intended to represent the problem that I'm trying to solve,
but the approach I want to take. The problems that I do want to solve
don't lend themselves to this kind of approach.

My real situation is that I want to write code that has both a
high-level component and a low-level number-crunching component that
works on large dense and sparse arrays. Idiomatic Haskell is great for
the high-level component. But the question is whether or not its worth
trying to write the low-level code in Haskell or whether I should just
write that code in C and use the FFI. There are still advantages to
using Haskell even if the code is highly unidiomatic: you have the
freedom of throwing in higher order functions from time to time in the
low-level code, and writing mixed-language code is a pain.
--
Dan
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Haskell performance question

2007-11-08 Thread Dan Piponi
On Nov 8, 2007 11:24 AM, Thomas Schilling [EMAIL PROTECTED] wrote:
 Wow.  You should *really* try using GHC 6.8.1:

I was hoping you weren't going to say that :-) As soon as I find a
suitable 64-bit Intel binary for MacOSX, or can bootstrap my way out
of happy needing happy in my attempted source build, I'll report back
on how my code performs.
--
Dan
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Haskell performance question

2007-11-08 Thread Jason Dusek
Can you show us your compilation options and timings?

-- 
_jsn
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Haskell performance question

2007-11-08 Thread Don Stewart
dpiponi:
 I see lots of shootout examples where Haskell programs seem to perform
 comparably with C programs, but I find it hard to reproduce anything
 like those figures when testing with my own code. So here's a simple
 case:
 
 I have this C program:
 
 #include stdio.h
 
 #define n 1
 
 double a[n];
 
 int main()
 {
int i;
for (i = 0; in; ++i)
{
a[i] = 1.0f;
}
for (i = 0; in-1; ++i)
{
a[i+1] = a[i]+a[i+1];
}
printf(%f\n,a[n-1]);
 }
 
 And this Haskell program:
 
 import Data.Array.IO
 import Control.Monad
 
 n = 1000
 
 main = do
a - newArray (0,n-1) 1.0 :: IO (IOUArray Int Double)
forM_ [0..n-2] $ \i - do { x - readArray a i; y - readArray a
 (i+1); writeArray a (i+1) (x+y) }
x - readArray a (n-1)
print x

Be really careful with Doubles (i.e. check what Int arrays also do),
since you need special flags to get good performance from unboxed Double
arrays. The best effort I know of is:


http://shootout.alioth.debian.org/gp4/benchmark.php?test=spectralnormlang=all

which took some time to work out the right path through the compiler. In
essence:

* some gcc flags:
-optc-O3 -optc-march=pentium4 -optc-mfpmath=sse -optc-msse2 
-optc-ffast-math
* Foreign.Marshal.Array
* type Reals = Ptr Double
* inlinePerformIO

in the end I'm stunned at how well we did there, since Double math has
been a long term issue.

Perhaps there are other lessons we can extract from that code?

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Haskell performance question

2007-11-08 Thread Dan Piponi
On Nov 8, 2007 11:36 AM, Paul Brown [EMAIL PROTECTED] wrote:

 All that said, I'm not sure where I got the GHC that I used to build
 the 6.6.1 via MacPorts; I think it shipped with MacOS once upon a time.

 sudo port install ghc
. . .
configure: error: GHC is required unless bootstrapping from .hc files.

But I do have a plan...
--
Dan
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Haskell performance question

2007-11-08 Thread Dan Piponi
On Nov 8, 2007 12:16 PM, Don Stewart [EMAIL PROTECTED] wrote:

 If you can post the code somewhere, that would be great, with examples
 of how to reproduce your timings.

The code is exactly what I posted originally (but nore that n is 10
times larger in the C code). I compiled using ghc -O3 -o test test.hs.
I timed the resulting executable with unix 'time'. I think I used the
tiger Intel binary here:
http://www.haskell.org/ghc/download_ghc_661.html (ghc --version
reports only The Glorious...verison 6.6.1, file
~/lib/ghc-6.6.1/ghc-6.6.1 reports Mach-O executable i386.) This was
all run on a 2.4GHz MacBook Pro with Leopard and 2Gb RAM. Anything
else you need?

I was investigating the plausibility of implementing something like a
fluid dynamics solver completely in Haskell. I have some working C
code so I thought I'd port it. But only if there's a chance of it
being within a factor of 2 or 3 of the original speed.
--
Dan
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Haskell performance question

2007-11-08 Thread Don Stewart
dpiponi:
 On Nov 8, 2007 12:16 PM, Don Stewart [EMAIL PROTECTED] wrote:
 
  If you can post the code somewhere, that would be great, with examples
  of how to reproduce your timings.
 
 The code is exactly what I posted originally (but nore that n is 10
 times larger in the C code). I compiled using ghc -O3 -o test test.hs.
 I timed the resulting executable with unix 'time'. I think I used the
 tiger Intel binary here:
 http://www.haskell.org/ghc/download_ghc_661.html (ghc --version
 reports only The Glorious...verison 6.6.1, file
 ~/lib/ghc-6.6.1/ghc-6.6.1 reports Mach-O executable i386.) This was
 all run on a 2.4GHz MacBook Pro with Leopard and 2Gb RAM. Anything
 else you need?
 
 I was investigating the plausibility of implementing something like a
 fluid dynamics solver completely in Haskell. I have some working C
 code so I thought I'd port it. But only if there's a chance of it
 being within a factor of 2 or 3 of the original speed.

Can you start by retrying with flags from the spectral-norm benchmark:


http://shootout.alioth.debian.org/gp4/benchmark.php?test=spectralnormlang=ghcid=0

The interaction with gcc here is quite important, so forcing -fvia-C
will matter.

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Haskell performance question

2007-11-08 Thread Dan Piponi
On Nov 8, 2007 12:36 PM, Don Stewart [EMAIL PROTECTED] wrote:
 dpiponi:
 Can you start by retrying with flags from the spectral-norm benchmark:
 
 http://shootout.alioth.debian.org/gp4/benchmark.php?test=spectralnormlang=ghcid=0

Actually, that was my starting point for investigating how to optimise
Haskell numerical code and I initially used all of those flags.
-fvia-C isn't listed among those flags but I also tried adding it
anyway. 1.45s every time, they make no difference. (And I made sure I
touched the source each time to force recompilation.) It looks like my
whole question might become moot with ghc 6.8.1, but so far I've been
unable to build it due to the cyclic happy dependency.
--
Dan
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Haskell performance question

2007-11-08 Thread Xiao-Yong Jin
Don Stewart [EMAIL PROTECTED] writes:

 Can you start by retrying with flags from the spectral-norm benchmark:

 
 http://shootout.alioth.debian.org/gp4/benchmark.php?test=spectralnormlang=ghcid=0

 The interaction with gcc here is quite important, so forcing -fvia-C
 will matter.

Clearly things has been changed, since the release of ghc-6.8.1.  I tried them
with my laptop, and here are the results of N=3000.


C++ g++
===

real0m4.553s
user0m4.551s
sys 0m0.002s

changed one option: -march=nocona


Haskell GHC
===

real 0m34.392s
user 0m34.316s
sys 0m0.074s

I used `unsafePerformIO' with `INLINE', because I don't know
where `inlinePerformIO' is now.  And also the `-optc-march'
is changed to `nocona'.


Haskell GHC #2
==

real0m20.069s
user0m20.052s
sys 0m0.017s

The same label `Haskell GHC #2' on that web page.  This is
the one without any optimisation.  And it out performs the
previous heavily optimised one.  Is it because I used
`unsafePerformIO' instead of the original `inlinePerformIO'?
`-optc-march' is also changed to `nocona'.


Is it only my laptop, or ghc-6.8.1 is indeed much faster
than 6.6.1 and the internal has changed so much that the old
optimisation no longer works?

$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 6.8.1
$ gcc --version
gcc (GCC) 4.1.2 (Gentoo 4.1.2 p1.0.2)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Intel(R) Core(TM)2 Duo CPU T7700  @ 2.40GHz


Xiao-Yong
-- 
c/*__o/*
\ * (__
*/\  
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Haskell performance question

2007-11-08 Thread Don Stewart
bulat.ziganshin:
 Hello Don,
 
 Thursday, November 8, 2007, 10:53:28 PM, you wrote:
 
 a - newArray (0,n-1) 1.0 :: IO (IOUArray Int Double)
 forM_ [0..n-2] $ \i - do { x - readArray a i; y - readArray a
  (i+1); writeArray a (i+1) (x+y) }
 
 oh, i was stupid. obviously, first thing you need to do is to use
 unsafeRead/unsafeWrite operations. i'm wonder how this code is only
 20x slower than C version, may be you need to use gcc -O3 -funroll-loops
 and gcc3 (because gcc4 becomes slower)

I think you use a different computer to me. Out of the box, amd64, ghc
6.8, with the same n=1, Dan's original code runs:

$ ghc -O2 A.hs  
$ time ./A_slow
1.0e8
./A_slow  1.60s user 0.50s system 95% cpu 2.193 total

and the C program

$ gcc -O2 A.c -o A_c   
$ time ./A_c
1.00
./A_c  1.03s user 0.52s system 96% cpu 1.612 total

without using any special flags.  A 1.3x slowdown such as this is more
typical of low level array programs.

If I awa a  5x slow down it would be considered a bug, and its been a
/long/ time since I've seen a 20x slowdown.

Note that with ghc 6.6, we get:

$ time ./A_66
A_66: out of memory (requested 1048576 bytes)
./A_66  4.94s user 0.78s system 99% cpu 5.735 total

So I know which compiler I prefer. 

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Haskell performance question

2007-11-08 Thread Thomas Schilling
On Thu, 2007-11-08 at 16:24 -0800, Stefan O'Rear wrote:
 On Thu, Nov 08, 2007 at 07:57:23PM +0100, Thomas Schilling wrote:
  
  $ ghc --make -O2 ghc-bench.hs
  
  and got:
  
  $ time ./ghc-bench 
  2.0e7
  
  real0m0.714s
  user0m0.576s
  sys 0m0.132s
  
  $ time ./ghcbC 
  2000.00
  
  real0m0.305s
  user0m0.164s
  sys 0m0.132s
  
  This is on a first-gen Macbook running Ubuntu.  1GB RAM.  1.83Ghz Core
  Duo
  
  $ ghc --version
  The Glorious Glasgow Haskell Compilation System, version 6.8.0.20071019
  
  $ gcc --version
  gcc (GCC) 4.1.2 (Ubuntu 4.1.2-0ubuntu4)
  Copyright (C) 2006 Free Software Foundation, Inc.
  This is free software; see the source for copying conditions.  There is
  NO
  warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
  PURPOSE.
  
  
  ___
  Haskell-Cafe mailing list
  Haskell-Cafe@haskell.org
  http://www.haskell.org/mailman/listinfo/haskell-cafe
  $ gcc -O3 ghc-bench.c -o ghcbC
  ghc-bench.c: In function ‘main’:
  ghc-bench.c:16: warning: incompatible implicit declaration of built-in
  function ‘printf’
 
 -O3 is worse than -O0, DO NOT USE IT.
 
 I reported the bug several months ago.  In the meantime, use -O2.
 
 (there are no high optimizations, it's just that the option parser
 misparses -O3)
 
 Stefan

Even for GCC (/not/ G_H_C)?  I know about the GHC bug, that's why I used
-O2 for GHC/Haskell and -O3 for GCC/C.

/Thomas

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Haskell performance question

2007-11-08 Thread Don Stewart
xj2106:
 Don Stewart [EMAIL PROTECTED] writes:
 
  Can you start by retrying with flags from the spectral-norm benchmark:
 
  
  http://shootout.alioth.debian.org/gp4/benchmark.php?test=spectralnormlang=ghcid=0
 
  The interaction with gcc here is quite important, so forcing -fvia-C
  will matter.
 
 Clearly things has been changed, since the release of ghc-6.8.1.  I tried them
 with my laptop, and here are the results of N=3000.
 
 
 C++ g++
 ===
 
 real0m4.553s
 user0m4.551s
 sys 0m0.002s
 
 changed one option: -march=nocona
 
 
 Haskell GHC
 ===
 
 real 0m34.392s
 user 0m34.316s
 sys 0m0.074s
 
 I used `unsafePerformIO' with `INLINE', because I don't know
 where `inlinePerformIO' is now.  And also the `-optc-march'
 is changed to `nocona'.

Using unsafePerformIO here would break some crucial inlining.
(the same trick is used in Data.ByteString, by the way).

You can find inlinePerformIO is in Data.ByteString.Internal.

Comparing the two, n=5500, ghc 6.8:

$ ghc -O -fglasgow-exts -fbang-patterns -optc-O3
-optc-march=pentium4 -optc-mfpmath=sse -optc-msse2  -optc-ffast-math
spec.hs -o spec_hs --make

With inlinePerformIO:

$ time ./spec_hs 5500
1.274224153
./spec_hs 5500  26.32s user 0.00s system 99% cpu 26.406 total

As expected, and comparable to the shooutout result for the same N.
With unsafePerformIO, the whole thing falls apart:

$ time ./spec_hs 5500
^Cspec_hs: interrupted
./spec_hs 5500  124.86s user 0.11s system 99% cpu 2:05.04 total

I gave up after 2 minutes. This FFI peek/poke code, acting as an ST
monad, under a pure interface relies on inlinePerformIO.

And the C++ program, just for comparison:

$ g++ -c -pipe -O3 -fomit-frame-pointer -march=pentium4  -mfpmath=sse
-msse2 spec.c 
$ g++ spec.o -o spec-cpp

$ time ./spec-cpp 5500
1.274224153  
./spec-cpp 5500  18.81s user 0.00s system 99% cpu 18.816 total

So we remain competitive after changing to 6.8.

Again, low level array code optimised is within 2x optimised C/C++.

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Haskell performance question

2007-11-08 Thread Don Stewart
bulat.ziganshin:
 definitely, it's a whole new era in low-level ghc programming

victory!

-- Don :D
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe