Re: [Haskell-cafe] Haskell performance question
On Thu, 2007-11-08 at 22:54 -0800, Don Stewart wrote: bulat.ziganshin: definitely, it's a whole new era in low-level ghc programming victory! Now I want a way of getting (well-used) SIMD instructions and such, and with some luck some high-level approach as well. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Haskell performance question
Hello Dan, Thursday, November 8, 2007, 9:33:12 PM, you wrote: main = do a - newArray (0,n-1) 1.0 :: IO (IOUArray Int Double) forM_ [0..n-2] $ \i - do { x - readArray a i; y - readArray a (i+1); writeArray a (i+1) (x+y) } x - readArray a (n-1) print x 1. ghc doesn't implement loop unrolling, even in -O2 mode 2. forM_ may have its own overheads, i'm not sure 3. add strictness annotations: forM_ [0..n-2] $ \i - do { return $! i; x - readArray a i; return $! x; y - readArray a (i+1); return $! y; writeArray a (i+1) (x+y) } such cycle should be approx. 3-10 times slower than C one -- Best regards, Bulatmailto:[EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Haskell performance question
Dan Piponi [EMAIL PROTECTED] writes: Even though 'n' is 10 times bigger in the C program it runs much faster than the Haskell program on my MacBook Pro with Haskell 6.6.1. I've tried lots of different combinations of flags that I've found in various postings to haskell-cafe but to no avail. import Data.List main = do print $ foldl' (+) 0 $ take 1 [1.0,1.0..] works 10 times faster than your C version. You just need to adapt to the radically different style of programming. Real Fortran programmer may write Fortran programs on any programming language :) -- JID: [EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Haskell performance question
On Fri, 2007-11-09 at 00:51 +0600, Mikhail Gusarov wrote: Dan Piponi [EMAIL PROTECTED] writes: Even though 'n' is 10 times bigger in the C program it runs much faster than the Haskell program on my MacBook Pro with Haskell 6.6.1. I've tried lots of different combinations of flags that I've found in various postings to haskell-cafe but to no avail. import Data.List main = do print $ foldl' (+) 0 $ take 1 [1.0,1.0..] works 10 times faster than your C version. You just need to adapt to the radically different style of programming. Real Fortran programmer may write Fortran programs on any programming language :) Good point, but this is not true for the OP. Take a look at Dan's blog. It's linked on planet.haskell.org. BTW. I prefer: print 1000.0 ;) ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Haskell performance question
On Nov 8, 2007 2:48 PM, Duncan Coutts [EMAIL PROTECTED] wrote: You really do not need happy to build ghc. Just ignore the extralibs tarball. Well that was the crucial fact I needed. 6.8.1 is now built. ghci doesn't work, it complains about an unknown symbol '_environ' in HSbase-3.0.0.0.o but I was able to rerun the timings for my code. WIth -O2 the run time went from about 1.5s to 0.2s. With unsafeRead and unsafeWrite that becomes 0.16s. I can start thinking about writing lots of Fortran in Haskell now. (Anyone have any idea what that missing symbol is about?) -- Dan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Haskell performance question
On Fri, Nov 09, 2007 at 01:39:55AM +0100, Thomas Schilling wrote: On Thu, 2007-11-08 at 16:24 -0800, Stefan O'Rear wrote: On Thu, Nov 08, 2007 at 07:57:23PM +0100, Thomas Schilling wrote: $ ghc --make -O2 ghc-bench.hs Even for GCC (/not/ G_H_C)? No, GCC implements -Ox properly. I know about the GHC bug, that's why I used -O2 for GHC/Haskell and -O3 for GCC/C. You just contradicted the line I quoted; was it in error? Stefan signature.asc Description: Digital signature ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Haskell performance question
On Thu, Nov 08, 2007 at 05:03:54PM -0800, Stefan O'Rear wrote: On Fri, Nov 09, 2007 at 01:39:55AM +0100, Thomas Schilling wrote: On Thu, 2007-11-08 at 16:24 -0800, Stefan O'Rear wrote: On Thu, Nov 08, 2007 at 07:57:23PM +0100, Thomas Schilling wrote: $ ghc --make -O2 ghc-bench.hs Even for GCC (/not/ G_H_C)? No, GCC implements -Ox properly. I know about the GHC bug, that's why I used -O2 for GHC/Haskell and -O3 for GCC/C. You just contradicted the line I quoted; was it in error? Nevermind, I'm just blind today. (thanks Shachaf for bringing this to my attention...) Stefan signature.asc Description: Digital signature ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Haskell performance question
On Thu, 2007-11-08 at 13:00 -0800, Dan Piponi wrote: It looks like my whole question might become moot with ghc 6.8.1, but so far I've been unable to build it due to the cyclic happy dependency. You really do not need happy to build ghc. Just ignore the extralibs tarball. You can install any of those libs that you need later from hackage. Duncan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Haskell performance question
nominolo: On Thu, 2007-11-08 at 10:33 -0800, Dan Piponi wrote: I see lots of shootout examples where Haskell programs seem to perform comparably with C programs, but I find it hard to reproduce anything like those figures when testing with my own code. So here's a simple case: I have this C program: #include stdio.h #define n 1 double a[n]; int main() { int i; for (i = 0; in; ++i) { a[i] = 1.0f; } for (i = 0; in-1; ++i) { a[i+1] = a[i]+a[i+1]; } printf(%f\n,a[n-1]); } And this Haskell program: import Data.Array.IO import Control.Monad n = 1000 main = do a - newArray (0,n-1) 1.0 :: IO (IOUArray Int Double) forM_ [0..n-2] $ \i - do { x - readArray a i; y - readArray a (i+1); writeArray a (i+1) (x+y) } x - readArray a (n-1) print x Even though 'n' is 10 times bigger in the C program it runs much faster than the Haskell program on my MacBook Pro with Haskell 6.6.1. I've tried lots of different combinations of flags that I've found in various postings to haskell-cafe but to no avail. What flags do I need to get at least within a factor of 2 or 3 of C? Am I using the wrong kind of array? Or is Haskell always going to be 15-20 times slower for this kind of numerical work? Wow. You should *really* try using GHC 6.8.1: GHC 6.6.1 (-O2) real0m7.062s user0m5.464s sys 0m0.288s GHC 6.8.0.20071019 real0m0.723s user0m0.616s sys 0m0.100s result is 2.0e7 for both So that looks like a bug in the simplifier in 6.6.1 that has been fixed. This should go in the testsuite, perhaps, Simon? -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Haskell performance question
On Nov 8, 2007 11:34 AM, Jason Dusek [EMAIL PROTECTED] wrote: Can you show us your compilation options and timings? I was simply using -O3. I tried a bunch of other flags (copied from the shootout examples) but they made no appreciable difference. I was getting about 1.5s for the Haskell program and about 0.08s for the C one with the same n=10,000,000. I now see that my ghc is in fact an i386 build whose binary was downloaded from the ghc binary web page. So I'm going to try to make a better comparison with a 64 bit 6.8.1. The batteries on my laptop will probably drop to zero before then however, so it'll be tomorrow at the earliest. -- Dan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Haskell performance question
Bulat Ziganshin [EMAIL PROTECTED] writes: Hello Dan, Thursday, November 8, 2007, 9:33:12 PM, you wrote: main = do a - newArray (0,n-1) 1.0 :: IO (IOUArray Int Double) forM_ [0..n-2] $ \i - do { x - readArray a i; y - readArray a (i+1); writeArray a (i+1) (x+y) } x - readArray a (n-1) print x 1. ghc doesn't implement loop unrolling, even in -O2 mode 2. forM_ may have its own overheads, i'm not sure 3. add strictness annotations: forM_ [0..n-2] $ \i - do { return $! i; x - readArray a i; return $! x; y - readArray a (i+1); return $! y; writeArray a (i+1) (x+y) } such cycle should be approx. 3-10 times slower than C one I didn't know you can do that. Anyway, I tried OP's C and Haskell versions and the one with your strictness annotations. It seems that the `$!' thing doesn't do much, or it's not doing anything. OP's C code: $ time ./test 1.00 real0m1.128s user0m0.572s sys 0m0.555s OP's Haskell code with n=1e8: $ time ./test-ghc 1.0e8 real0m2.065s user0m1.505s sys 0m0.558s OP's Haskell code with n=1e8 and your strictness annotations: $ time ./test-ghc 1.0e8 real0m2.064s user0m1.472s sys 0m0.591s However, bad performance observed by OP is not observed by me. $ ghc --version The Glorious Glasgow Haskell Compilation System, version 6.8.1 $ gcc --version gcc (GCC) 4.1.2 (Gentoo 4.1.2 p1.0.2) Copyright (C) 2006 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. -- c/*__o/* \ * (__ */\ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Haskell performance question
On Thu, Nov 08, 2007 at 07:57:23PM +0100, Thomas Schilling wrote: $ ghc --make -O2 ghc-bench.hs and got: $ time ./ghc-bench 2.0e7 real0m0.714s user0m0.576s sys 0m0.132s $ time ./ghcbC 2000.00 real0m0.305s user0m0.164s sys 0m0.132s This is on a first-gen Macbook running Ubuntu. 1GB RAM. 1.83Ghz Core Duo $ ghc --version The Glorious Glasgow Haskell Compilation System, version 6.8.0.20071019 $ gcc --version gcc (GCC) 4.1.2 (Ubuntu 4.1.2-0ubuntu4) Copyright (C) 2006 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe $ gcc -O3 ghc-bench.c -o ghcbC ghc-bench.c: In function ‘main’: ghc-bench.c:16: warning: incompatible implicit declaration of built-in function ‘printf’ -O3 is worse than -O0, DO NOT USE IT. I reported the bug several months ago. In the meantime, use -O2. (there are no high optimizations, it's just that the option parser misparses -O3) Stefan signature.asc Description: Digital signature ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Haskell performance question
Don Stewart [EMAIL PROTECTED] writes: xj2106: I used `unsafePerformIO' with `INLINE', because I don't know where `inlinePerformIO' is now. And also the `-optc-march' is changed to `nocona'. Using unsafePerformIO here would break some crucial inlining. (the same trick is used in Data.ByteString, by the way). You can find inlinePerformIO is in Data.ByteString.Internal. Comparing the two, n=5500, ghc 6.8: $ ghc -O -fglasgow-exts -fbang-patterns -optc-O3 -optc-march=pentium4 -optc-mfpmath=sse -optc-msse2 -optc-ffast-math spec.hs -o spec_hs --make With inlinePerformIO: $ time ./spec_hs 5500 1.274224153 ./spec_hs 5500 26.32s user 0.00s system 99% cpu 26.406 total As expected, and comparable to the shooutout result for the same N. With unsafePerformIO, the whole thing falls apart: $ time ./spec_hs 5500 ^Cspec_hs: interrupted ./spec_hs 5500 124.86s user 0.11s system 99% cpu 2:05.04 total I gave up after 2 minutes. This FFI peek/poke code, acting as an ST monad, under a pure interface relies on inlinePerformIO. Thanks for pointing this out. Xiao-Yong -- c/*__o/* \ * (__ */\ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Haskell performance question
dpiponi: On Nov 8, 2007 11:34 AM, Jason Dusek [EMAIL PROTECTED] wrote: Can you show us your compilation options and timings? I was simply using -O3. I tried a bunch of other flags (copied from the shootout examples) but they made no appreciable difference. Argh, -O2 please. -O3 does nothing more, and sometime does less, depending on the compiler :) Double math always needs the gcc backend, and -optc flags (see the spectral-norm post). I was getting about 1.5s for the Haskell program and about 0.08s for the C one with the same n=10,000,000. I'm sure we can do better than that! I now see that my ghc is in fact an i386 build whose binary was downloaded from the ghc binary web page. So I'm going to try to make a better comparison with a 64 bit 6.8.1. The batteries on my laptop will probably drop to zero before then however, so it'll be tomorrow at the earliest. If you can post the code somewhere, that would be great, with examples of how to reproduce your timings. -- don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Haskell performance question
Dan Piponi [EMAIL PROTECTED] writes: My wasn't intended to represent the problem that I'm trying to solve, but the approach I want to take. The problems that I do want to solve don't lend themselves to this kind of approach. My real situation is that I want to write code that has both a high-level component and a low-level number-crunching component that works on large dense and sparse arrays. Idiomatic Haskell is great for the high-level component. But the question is whether or not its worth trying to write the low-level code in Haskell or whether I should just write that code in C and use the FFI. I've written a bit number-crunching code in haskell. My experience is that usually the haskell code runs about 2 to 5 times slower -- It depends on whether I use the best optimisation scheme or not -- I do not know if I am doing the correct thing most of the time, since there is not much literature on-line about numerical computing in haskell, or in FP in general. There is one in ocaml, though. There are still advantages to using Haskell even if the code is highly unidiomatic: you have the freedom of throwing in higher order functions from time to time in the low-level code, and writing mixed-language code is a pain. -- Dan I fully agree. Xiao-Yong -- c/*__o/* \ * (__ */\ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Haskell performance question
On Thu, 2007-11-08 at 10:33 -0800, Dan Piponi wrote: I see lots of shootout examples where Haskell programs seem to perform comparably with C programs, but I find it hard to reproduce anything like those figures when testing with my own code. So here's a simple case: I have this C program: #include stdio.h #define n 1 double a[n]; int main() { int i; for (i = 0; in; ++i) { a[i] = 1.0f; } for (i = 0; in-1; ++i) { a[i+1] = a[i]+a[i+1]; } printf(%f\n,a[n-1]); } And this Haskell program: import Data.Array.IO import Control.Monad n = 1000 main = do a - newArray (0,n-1) 1.0 :: IO (IOUArray Int Double) forM_ [0..n-2] $ \i - do { x - readArray a i; y - readArray a (i+1); writeArray a (i+1) (x+y) } x - readArray a (n-1) print x Even though 'n' is 10 times bigger in the C program it runs much faster than the Haskell program on my MacBook Pro with Haskell 6.6.1. I've tried lots of different combinations of flags that I've found in various postings to haskell-cafe but to no avail. What flags do I need to get at least within a factor of 2 or 3 of C? Am I using the wrong kind of array? Or is Haskell always going to be 15-20 times slower for this kind of numerical work? I tried both, but it would be really helpful to see your command line options. I used: $ gcc -O3 ghc-bench.c -o ghcbC ghc-bench.c: In function ‘main’: ghc-bench.c:16: warning: incompatible implicit declaration of built-in function ‘printf’ $ ghc --make -O2 ghc-bench.hs and got: $ time ./ghc-bench 2.0e7 real0m0.714s user0m0.576s sys 0m0.132s $ time ./ghcbC 2000.00 real0m0.305s user0m0.164s sys 0m0.132s This is on a first-gen Macbook running Ubuntu. 1GB RAM. 1.83Ghz Core Duo $ ghc --version The Glorious Glasgow Haskell Compilation System, version 6.8.0.20071019 $ gcc --version gcc (GCC) 4.1.2 (Ubuntu 4.1.2-0ubuntu4) Copyright (C) 2006 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Haskell performance question
On Thu, 2007-11-08 at 10:33 -0800, Dan Piponi wrote: I see lots of shootout examples where Haskell programs seem to perform comparably with C programs, but I find it hard to reproduce anything like those figures when testing with my own code. So here's a simple case: I have this C program: #include stdio.h #define n 1 double a[n]; int main() { int i; for (i = 0; in; ++i) { a[i] = 1.0f; } for (i = 0; in-1; ++i) { a[i+1] = a[i]+a[i+1]; } printf(%f\n,a[n-1]); } And this Haskell program: import Data.Array.IO import Control.Monad n = 1000 main = do a - newArray (0,n-1) 1.0 :: IO (IOUArray Int Double) forM_ [0..n-2] $ \i - do { x - readArray a i; y - readArray a (i+1); writeArray a (i+1) (x+y) } x - readArray a (n-1) print x Even though 'n' is 10 times bigger in the C program it runs much faster than the Haskell program on my MacBook Pro with Haskell 6.6.1. I've tried lots of different combinations of flags that I've found in various postings to haskell-cafe but to no avail. What flags do I need to get at least within a factor of 2 or 3 of C? Am I using the wrong kind of array? Or is Haskell always going to be 15-20 times slower for this kind of numerical work? Wow. You should *really* try using GHC 6.8.1: GHC 6.6.1 (-O2) real0m7.062s user0m5.464s sys 0m0.288s GHC 6.8.0.20071019 real0m0.723s user0m0.616s sys 0m0.100s result is 2.0e7 for both ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Haskell performance question
Bulat, The strictness gave me something like a 10% performance increase making the Haskell code more than 10 times slower than the C. Is this the right type of array to use for performance? -- Dan On Nov 8, 2007 10:36 AM, Bulat Ziganshin [EMAIL PROTECTED] wrote: Hello Dan, Thursday, November 8, 2007, 9:33:12 PM, you wrote: main = do a - newArray (0,n-1) 1.0 :: IO (IOUArray Int Double) forM_ [0..n-2] $ \i - do { x - readArray a i; y - readArray a (i+1); writeArray a (i+1) (x+y) } x - readArray a (n-1) print x 1. ghc doesn't implement loop unrolling, even in -O2 mode 2. forM_ may have its own overheads, i'm not sure 3. add strictness annotations: forM_ [0..n-2] $ \i - do { return $! i; x - readArray a i; return $! x; y - readArray a (i+1); return $! y; writeArray a (i+1) (x+y) } such cycle should be approx. 3-10 times slower than C one -- Best regards, Bulatmailto:[EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Haskell performance question
Mikhail, main = do print $ foldl' (+) 0 $ take 1 [1.0,1.0..] works 10 times faster than your C version. You just need to adapt to the radically different style of programming. My wasn't intended to represent the problem that I'm trying to solve, but the approach I want to take. The problems that I do want to solve don't lend themselves to this kind of approach. My real situation is that I want to write code that has both a high-level component and a low-level number-crunching component that works on large dense and sparse arrays. Idiomatic Haskell is great for the high-level component. But the question is whether or not its worth trying to write the low-level code in Haskell or whether I should just write that code in C and use the FFI. There are still advantages to using Haskell even if the code is highly unidiomatic: you have the freedom of throwing in higher order functions from time to time in the low-level code, and writing mixed-language code is a pain. -- Dan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Haskell performance question
On Nov 8, 2007 11:24 AM, Thomas Schilling [EMAIL PROTECTED] wrote: Wow. You should *really* try using GHC 6.8.1: I was hoping you weren't going to say that :-) As soon as I find a suitable 64-bit Intel binary for MacOSX, or can bootstrap my way out of happy needing happy in my attempted source build, I'll report back on how my code performs. -- Dan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Haskell performance question
Can you show us your compilation options and timings? -- _jsn ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Haskell performance question
dpiponi: I see lots of shootout examples where Haskell programs seem to perform comparably with C programs, but I find it hard to reproduce anything like those figures when testing with my own code. So here's a simple case: I have this C program: #include stdio.h #define n 1 double a[n]; int main() { int i; for (i = 0; in; ++i) { a[i] = 1.0f; } for (i = 0; in-1; ++i) { a[i+1] = a[i]+a[i+1]; } printf(%f\n,a[n-1]); } And this Haskell program: import Data.Array.IO import Control.Monad n = 1000 main = do a - newArray (0,n-1) 1.0 :: IO (IOUArray Int Double) forM_ [0..n-2] $ \i - do { x - readArray a i; y - readArray a (i+1); writeArray a (i+1) (x+y) } x - readArray a (n-1) print x Be really careful with Doubles (i.e. check what Int arrays also do), since you need special flags to get good performance from unboxed Double arrays. The best effort I know of is: http://shootout.alioth.debian.org/gp4/benchmark.php?test=spectralnormlang=all which took some time to work out the right path through the compiler. In essence: * some gcc flags: -optc-O3 -optc-march=pentium4 -optc-mfpmath=sse -optc-msse2 -optc-ffast-math * Foreign.Marshal.Array * type Reals = Ptr Double * inlinePerformIO in the end I'm stunned at how well we did there, since Double math has been a long term issue. Perhaps there are other lessons we can extract from that code? -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Haskell performance question
On Nov 8, 2007 11:36 AM, Paul Brown [EMAIL PROTECTED] wrote: All that said, I'm not sure where I got the GHC that I used to build the 6.6.1 via MacPorts; I think it shipped with MacOS once upon a time. sudo port install ghc . . . configure: error: GHC is required unless bootstrapping from .hc files. But I do have a plan... -- Dan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Haskell performance question
On Nov 8, 2007 12:16 PM, Don Stewart [EMAIL PROTECTED] wrote: If you can post the code somewhere, that would be great, with examples of how to reproduce your timings. The code is exactly what I posted originally (but nore that n is 10 times larger in the C code). I compiled using ghc -O3 -o test test.hs. I timed the resulting executable with unix 'time'. I think I used the tiger Intel binary here: http://www.haskell.org/ghc/download_ghc_661.html (ghc --version reports only The Glorious...verison 6.6.1, file ~/lib/ghc-6.6.1/ghc-6.6.1 reports Mach-O executable i386.) This was all run on a 2.4GHz MacBook Pro with Leopard and 2Gb RAM. Anything else you need? I was investigating the plausibility of implementing something like a fluid dynamics solver completely in Haskell. I have some working C code so I thought I'd port it. But only if there's a chance of it being within a factor of 2 or 3 of the original speed. -- Dan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Haskell performance question
dpiponi: On Nov 8, 2007 12:16 PM, Don Stewart [EMAIL PROTECTED] wrote: If you can post the code somewhere, that would be great, with examples of how to reproduce your timings. The code is exactly what I posted originally (but nore that n is 10 times larger in the C code). I compiled using ghc -O3 -o test test.hs. I timed the resulting executable with unix 'time'. I think I used the tiger Intel binary here: http://www.haskell.org/ghc/download_ghc_661.html (ghc --version reports only The Glorious...verison 6.6.1, file ~/lib/ghc-6.6.1/ghc-6.6.1 reports Mach-O executable i386.) This was all run on a 2.4GHz MacBook Pro with Leopard and 2Gb RAM. Anything else you need? I was investigating the plausibility of implementing something like a fluid dynamics solver completely in Haskell. I have some working C code so I thought I'd port it. But only if there's a chance of it being within a factor of 2 or 3 of the original speed. Can you start by retrying with flags from the spectral-norm benchmark: http://shootout.alioth.debian.org/gp4/benchmark.php?test=spectralnormlang=ghcid=0 The interaction with gcc here is quite important, so forcing -fvia-C will matter. -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Haskell performance question
On Nov 8, 2007 12:36 PM, Don Stewart [EMAIL PROTECTED] wrote: dpiponi: Can you start by retrying with flags from the spectral-norm benchmark: http://shootout.alioth.debian.org/gp4/benchmark.php?test=spectralnormlang=ghcid=0 Actually, that was my starting point for investigating how to optimise Haskell numerical code and I initially used all of those flags. -fvia-C isn't listed among those flags but I also tried adding it anyway. 1.45s every time, they make no difference. (And I made sure I touched the source each time to force recompilation.) It looks like my whole question might become moot with ghc 6.8.1, but so far I've been unable to build it due to the cyclic happy dependency. -- Dan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Haskell performance question
Don Stewart [EMAIL PROTECTED] writes: Can you start by retrying with flags from the spectral-norm benchmark: http://shootout.alioth.debian.org/gp4/benchmark.php?test=spectralnormlang=ghcid=0 The interaction with gcc here is quite important, so forcing -fvia-C will matter. Clearly things has been changed, since the release of ghc-6.8.1. I tried them with my laptop, and here are the results of N=3000. C++ g++ === real0m4.553s user0m4.551s sys 0m0.002s changed one option: -march=nocona Haskell GHC === real 0m34.392s user 0m34.316s sys 0m0.074s I used `unsafePerformIO' with `INLINE', because I don't know where `inlinePerformIO' is now. And also the `-optc-march' is changed to `nocona'. Haskell GHC #2 == real0m20.069s user0m20.052s sys 0m0.017s The same label `Haskell GHC #2' on that web page. This is the one without any optimisation. And it out performs the previous heavily optimised one. Is it because I used `unsafePerformIO' instead of the original `inlinePerformIO'? `-optc-march' is also changed to `nocona'. Is it only my laptop, or ghc-6.8.1 is indeed much faster than 6.6.1 and the internal has changed so much that the old optimisation no longer works? $ ghc --version The Glorious Glasgow Haskell Compilation System, version 6.8.1 $ gcc --version gcc (GCC) 4.1.2 (Gentoo 4.1.2 p1.0.2) Copyright (C) 2006 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Intel(R) Core(TM)2 Duo CPU T7700 @ 2.40GHz Xiao-Yong -- c/*__o/* \ * (__ */\ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Haskell performance question
bulat.ziganshin: Hello Don, Thursday, November 8, 2007, 10:53:28 PM, you wrote: a - newArray (0,n-1) 1.0 :: IO (IOUArray Int Double) forM_ [0..n-2] $ \i - do { x - readArray a i; y - readArray a (i+1); writeArray a (i+1) (x+y) } oh, i was stupid. obviously, first thing you need to do is to use unsafeRead/unsafeWrite operations. i'm wonder how this code is only 20x slower than C version, may be you need to use gcc -O3 -funroll-loops and gcc3 (because gcc4 becomes slower) I think you use a different computer to me. Out of the box, amd64, ghc 6.8, with the same n=1, Dan's original code runs: $ ghc -O2 A.hs $ time ./A_slow 1.0e8 ./A_slow 1.60s user 0.50s system 95% cpu 2.193 total and the C program $ gcc -O2 A.c -o A_c $ time ./A_c 1.00 ./A_c 1.03s user 0.52s system 96% cpu 1.612 total without using any special flags. A 1.3x slowdown such as this is more typical of low level array programs. If I awa a 5x slow down it would be considered a bug, and its been a /long/ time since I've seen a 20x slowdown. Note that with ghc 6.6, we get: $ time ./A_66 A_66: out of memory (requested 1048576 bytes) ./A_66 4.94s user 0.78s system 99% cpu 5.735 total So I know which compiler I prefer. -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Haskell performance question
On Thu, 2007-11-08 at 16:24 -0800, Stefan O'Rear wrote: On Thu, Nov 08, 2007 at 07:57:23PM +0100, Thomas Schilling wrote: $ ghc --make -O2 ghc-bench.hs and got: $ time ./ghc-bench 2.0e7 real0m0.714s user0m0.576s sys 0m0.132s $ time ./ghcbC 2000.00 real0m0.305s user0m0.164s sys 0m0.132s This is on a first-gen Macbook running Ubuntu. 1GB RAM. 1.83Ghz Core Duo $ ghc --version The Glorious Glasgow Haskell Compilation System, version 6.8.0.20071019 $ gcc --version gcc (GCC) 4.1.2 (Ubuntu 4.1.2-0ubuntu4) Copyright (C) 2006 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe $ gcc -O3 ghc-bench.c -o ghcbC ghc-bench.c: In function ‘main’: ghc-bench.c:16: warning: incompatible implicit declaration of built-in function ‘printf’ -O3 is worse than -O0, DO NOT USE IT. I reported the bug several months ago. In the meantime, use -O2. (there are no high optimizations, it's just that the option parser misparses -O3) Stefan Even for GCC (/not/ G_H_C)? I know about the GHC bug, that's why I used -O2 for GHC/Haskell and -O3 for GCC/C. /Thomas ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Haskell performance question
xj2106: Don Stewart [EMAIL PROTECTED] writes: Can you start by retrying with flags from the spectral-norm benchmark: http://shootout.alioth.debian.org/gp4/benchmark.php?test=spectralnormlang=ghcid=0 The interaction with gcc here is quite important, so forcing -fvia-C will matter. Clearly things has been changed, since the release of ghc-6.8.1. I tried them with my laptop, and here are the results of N=3000. C++ g++ === real0m4.553s user0m4.551s sys 0m0.002s changed one option: -march=nocona Haskell GHC === real 0m34.392s user 0m34.316s sys 0m0.074s I used `unsafePerformIO' with `INLINE', because I don't know where `inlinePerformIO' is now. And also the `-optc-march' is changed to `nocona'. Using unsafePerformIO here would break some crucial inlining. (the same trick is used in Data.ByteString, by the way). You can find inlinePerformIO is in Data.ByteString.Internal. Comparing the two, n=5500, ghc 6.8: $ ghc -O -fglasgow-exts -fbang-patterns -optc-O3 -optc-march=pentium4 -optc-mfpmath=sse -optc-msse2 -optc-ffast-math spec.hs -o spec_hs --make With inlinePerformIO: $ time ./spec_hs 5500 1.274224153 ./spec_hs 5500 26.32s user 0.00s system 99% cpu 26.406 total As expected, and comparable to the shooutout result for the same N. With unsafePerformIO, the whole thing falls apart: $ time ./spec_hs 5500 ^Cspec_hs: interrupted ./spec_hs 5500 124.86s user 0.11s system 99% cpu 2:05.04 total I gave up after 2 minutes. This FFI peek/poke code, acting as an ST monad, under a pure interface relies on inlinePerformIO. And the C++ program, just for comparison: $ g++ -c -pipe -O3 -fomit-frame-pointer -march=pentium4 -mfpmath=sse -msse2 spec.c $ g++ spec.o -o spec-cpp $ time ./spec-cpp 5500 1.274224153 ./spec-cpp 5500 18.81s user 0.00s system 99% cpu 18.816 total So we remain competitive after changing to 6.8. Again, low level array code optimised is within 2x optimised C/C++. -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Haskell performance question
bulat.ziganshin: definitely, it's a whole new era in low-level ghc programming victory! -- Don :D ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe