[Haskell-cafe] Re: newbie optimization question
OK, if somebody wants to speculate (and if it even makes sense for such a microbenchmark) here are some more data. Except different OS and C++ compiler also processor is different. On my side it was AMD Athlon 64 X2 4800+ (2.4GHz, 2x1MiB L2 cache non-shared; C&Q was not switched off during the test). The system has 2GiB RAM. The C++ version had working set about 1.7 MiB, ghc version had it about 2 times bigger. Peter. Dusan Kolar wrote: Hello all, just to compare the stuff, I get quite other results being on other OS. Thus, the result of C++ compiler may not be that interesting as I do not have the one presented below. My machine: Linux 2.6.23-ARCH #1 SMP PREEMPT Mon Oct 22 12:50:26 CEST 2007 x86_64 Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz GenuineIntel GNU/Linux Compilers: g++ --version g++ (GCC) 4.2.2 Copyright (C) 2007 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. ghc --version The Glorious Glasgow Haskell Compilation System, version 6.6.1 Measurement: compiled with ghc -O2 time ./mainInteger real0m4.866s user0m4.843s sys 0m0.020s compiled with ghc -O2 time ./mainInt64 real0m2.213s user0m2.210s sys 0m0.003s compiled with ghc -O2 time ./mainInt real0m1.149s user0m1.143s sys 0m0.003s compiled with g++ -O3 time ./mainC real0m0.271s user0m0.270s sys 0m0.000s I don't know what is the reason, but the difference between Int, Int64 and Integer is not that dramatic as in example below, nevertheless, the difference between GHC and GNU C++ is very bad :-\ Dusan Peter Hercek wrote: Derek Elkins wrote: Try with rem instead of mod. (What the heck is with bottom?) The bottom was there by error and I was lazy to redo the tests so I rather posted exactly what I was doing. I do not know the compiler that good to be absolutely sure it cannot have impact on result ... so I rather did not doctor what I did :-) Ok, rem did help quite a bit. Any comments why it is so? Here are summary of results for those interested. I run all the tests once again. Haskell was about 13% slower than C++. MS cl.exe options: /Ox /G7 /MD ghc options: -O2 C++ version: 1.000; 0.984; 0.984 Haskell version specialized to Int: 1.125; 1.125; 1.109 Haskell version specialized to Integer: 8.781; 8.813; 8.813 Haskell version specialized to Int64: 9.781; 9.766; 9.831 The code: % cat b.hs module Main (divisors, perfect, main) where import Data.Int divisors :: Int -> [Int] divisors i = [j | j<-[1..i-1], i `rem` j == 0] perfect :: [Int] perfect = [i | i<-[1..1], i == sum (divisors i)] main = print perfect % cat b.cpp #include using namespace std; int main() { for (int i = 1; i <= 1; i++) { int sum = 0; for (int j = 1; j < i; j++) if (i % j == 0) sum += j; if (sum == i) cout << i << " "; } return 0; } % ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: newbie optimization question
Am Montag, 29. Oktober 2007 13:49 schrieb Dusan Kolar: > Hello all, > > just to compare the stuff, I get quite other results being on other > OS. Thus, the result of C++ compiler may not be that interesting as I do > not have the one presented below. Just to chime in, my results with the code below: [EMAIL PROTECTED]:~> uname -a Linux linux 2.4.20-4GB-athlon #1 Mon Mar 17 17:56:47 UTC 2003 i686 unknown unknown GNU/Linux on a 1200 MHz Duron g++ is version 3.3, C++ code compiled with -O3, Haskell with -O2 (GHC 6.6.1) [EMAIL PROTECTED]:~> time ./mainC 6 28 496 8128 real0m1.945s user0m1.910s sys 0m0.010s [EMAIL PROTECTED]:~> time ./mainInt [6,28,496,8128] real0m2.407s user0m2.300s sys 0m0.010s [EMAIL PROTECTED]:~> time ./mainInt64 [6,28,496,8128] real0m24.009s user0m23.900s sys 0m0.050s [EMAIL PROTECTED]:~> time ./mainInteger [6,28,496,8128] real0m21.555s user0m20.870s sys 0m0.010s So Int is not so much slower than C, Int64 and Integer dramatically slower with Integer beating Int64 here, too. Cheers, Daniel > > My machine: > Linux 2.6.23-ARCH #1 SMP PREEMPT Mon Oct 22 12:50:26 CEST 2007 x86_64 > Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz GenuineIntel GNU/Linux > > Compilers: > g++ --version > g++ (GCC) 4.2.2 > Copyright (C) 2007 Free Software Foundation, Inc. > This is free software; see the source for copying conditions. There is NO > warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. > > ghc --version > The Glorious Glasgow Haskell Compilation System, version 6.6.1 > > Measurement: > compiled with ghc -O2 > time ./mainInteger > real0m4.866s > user0m4.843s > sys 0m0.020s > > compiled with ghc -O2 > time ./mainInt64 > real0m2.213s > user0m2.210s > sys 0m0.003s > > compiled with ghc -O2 > time ./mainInt > real0m1.149s > user0m1.143s > sys 0m0.003s > > compiled with g++ -O3 > time ./mainC > real0m0.271s > user0m0.270s > sys 0m0.000s > > I don't know what is the reason, but the difference between Int, Int64 > and Integer is not that dramatic as in example below, nevertheless, the > difference between GHC and GNU C++ is very bad :-\ > > Dusan > > The code: > > > > % cat b.hs > > module Main (divisors, perfect, main) where > > import Data.Int > > > > divisors :: Int -> [Int] > > divisors i = [j | j<-[1..i-1], i `rem` j == 0] > > > > perfect :: [Int] > > perfect = [i | i<-[1..1], i == sum (divisors i)] > > > > main = print perfect > > > > % cat b.cpp > > #include > > using namespace std; > > > > int main() { > > for (int i = 1; i <= 1; i++) { > > int sum = 0; > > for (int j = 1; j < i; j++) > > if (i % j == 0) > > sum += j; > > if (sum == i) > > cout << i << " "; > > } > > return 0; > > } > > > > % ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: newbie optimization question
Peter Hercek wrote: Daniel Fischer wrote: What perpetually puzzles me is that in C long long int has very good performance, *much* faster than gmp, in Haskell, on my computer, Int64 is hardly faster than Integer. I tried the example with Int64 and Integer. The integer version was actually quicker ... which is the reason I decided to post the results. C++ version times: 1.125; 1.109; 1.125 Int32 cpu times: 3.203; 3.172; 3.172 Int64 cpu times: 11.734; 11.797; 11.844 Integer cpu times: 9.609; 9.609; 9.500 Interesting that Int64 is *slower* than Integer. I can believe that. Integer is actually optimised for small values: there's a specialised representation for values that fit in a single word that avoids calling out to the GMP library. As Stefan pointed out, there's a lot of room to improve the performance of Int64, it's just never been a high priority. Cheers, Simon ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: newbie optimization question
Hello all, just to compare the stuff, I get quite other results being on other OS. Thus, the result of C++ compiler may not be that interesting as I do not have the one presented below. My machine: Linux 2.6.23-ARCH #1 SMP PREEMPT Mon Oct 22 12:50:26 CEST 2007 x86_64 Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz GenuineIntel GNU/Linux Compilers: g++ --version g++ (GCC) 4.2.2 Copyright (C) 2007 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. ghc --version The Glorious Glasgow Haskell Compilation System, version 6.6.1 Measurement: compiled with ghc -O2 time ./mainInteger real0m4.866s user0m4.843s sys 0m0.020s compiled with ghc -O2 time ./mainInt64 real0m2.213s user0m2.210s sys 0m0.003s compiled with ghc -O2 time ./mainInt real0m1.149s user0m1.143s sys 0m0.003s compiled with g++ -O3 time ./mainC real0m0.271s user0m0.270s sys 0m0.000s I don't know what is the reason, but the difference between Int, Int64 and Integer is not that dramatic as in example below, nevertheless, the difference between GHC and GNU C++ is very bad :-\ Dusan Peter Hercek wrote: Derek Elkins wrote: Try with rem instead of mod. (What the heck is with bottom?) The bottom was there by error and I was lazy to redo the tests so I rather posted exactly what I was doing. I do not know the compiler that good to be absolutely sure it cannot have impact on result ... so I rather did not doctor what I did :-) Ok, rem did help quite a bit. Any comments why it is so? Here are summary of results for those interested. I run all the tests once again. Haskell was about 13% slower than C++. MS cl.exe options: /Ox /G7 /MD ghc options: -O2 C++ version: 1.000; 0.984; 0.984 Haskell version specialized to Int: 1.125; 1.125; 1.109 Haskell version specialized to Integer: 8.781; 8.813; 8.813 Haskell version specialized to Int64: 9.781; 9.766; 9.831 The code: % cat b.hs module Main (divisors, perfect, main) where import Data.Int divisors :: Int -> [Int] divisors i = [j | j<-[1..i-1], i `rem` j == 0] perfect :: [Int] perfect = [i | i<-[1..1], i == sum (divisors i)] main = print perfect % cat b.cpp #include using namespace std; int main() { for (int i = 1; i <= 1; i++) { int sum = 0; for (int j = 1; j < i; j++) if (i % j == 0) sum += j; if (sum == i) cout << i << " "; } return 0; } % ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe -- Dusan Kolartel: +420 54 114 1238 UIFS FIT VUT Brno fax: +420 54 114 1270 Bozetechova 2 e-mail: [EMAIL PROTECTED] Brno 612 66 Czech Republic -- ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: newbie optimization question
Rodrigo Queiro wrote: Why do you expose perfect and divisors? Maybe if you just expose main, perfect and divisors will be inlined (although this will only save 10,000 function entries, so will probably have negligible effect). I exposed them so that I can check types in ghci. Hiding them does not seem to have noticeable effect. Thanks, Peter. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: newbie optimization question
rem is faster because it has slightly different behaviour to mod, and there happens to be an intel instruction that maps more directly to rem than to mod, thus making it much faster on intel processors. Why do you expose perfect and divisors? Maybe if you just expose main, perfect and divisors will be inlined (although this will only save 10,000 function entries, so will probably have negligible effect). Rodrigo On 29/10/2007, Peter Hercek <[EMAIL PROTECTED]> wrote: > Derek Elkins wrote: > > > > Try with rem instead of mod. > > > > (What the heck is with bottom?) > > The bottom was there by error and I was lazy to redo > the tests so I rather posted exactly what I was > doing. I do not know the compiler that good to be > absolutely sure it cannot have impact on result > ... so I rather did not doctor what I did :-) > > Ok, rem did help quite a bit. Any comments why it is so? > > Here are summary of results for those interested. I run > all the tests once again. Haskell was about 13% slower > than C++. > > MS cl.exe options: /Ox /G7 /MD > ghc options: -O2 > > C++ version: 1.000; 0.984; 0.984 > Haskell version specialized to Int: 1.125; 1.125; 1.109 > Haskell version specialized to Integer: 8.781; 8.813; 8.813 > Haskell version specialized to Int64: 9.781; 9.766; 9.831 > > The code: > > % cat b.hs > module Main (divisors, perfect, main) where > import Data.Int > > divisors :: Int -> [Int] > divisors i = [j | j<-[1..i-1], i `rem` j == 0] > > perfect :: [Int] > perfect = [i | i<-[1..1], i == sum (divisors i)] > > main = print perfect > > % cat b.cpp > #include > using namespace std; > > int main() { >for (int i = 1; i <= 1; i++) { > int sum = 0; > for (int j = 1; j < i; j++) >if (i % j == 0) > sum += j; > if (sum == i) >cout << i << " "; >} >return 0; > } > > % > > ___ > Haskell-Cafe mailing list > Haskell-Cafe@haskell.org > http://www.haskell.org/mailman/listinfo/haskell-cafe > ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: newbie optimization question
Derek Elkins wrote: Try with rem instead of mod. (What the heck is with bottom?) The bottom was there by error and I was lazy to redo the tests so I rather posted exactly what I was doing. I do not know the compiler that good to be absolutely sure it cannot have impact on result ... so I rather did not doctor what I did :-) Ok, rem did help quite a bit. Any comments why it is so? Here are summary of results for those interested. I run all the tests once again. Haskell was about 13% slower than C++. MS cl.exe options: /Ox /G7 /MD ghc options: -O2 C++ version: 1.000; 0.984; 0.984 Haskell version specialized to Int: 1.125; 1.125; 1.109 Haskell version specialized to Integer: 8.781; 8.813; 8.813 Haskell version specialized to Int64: 9.781; 9.766; 9.831 The code: % cat b.hs module Main (divisors, perfect, main) where import Data.Int divisors :: Int -> [Int] divisors i = [j | j<-[1..i-1], i `rem` j == 0] perfect :: [Int] perfect = [i | i<-[1..1], i == sum (divisors i)] main = print perfect % cat b.cpp #include using namespace std; int main() { for (int i = 1; i <= 1; i++) { int sum = 0; for (int j = 1; j < i; j++) if (i % j == 0) sum += j; if (sum == i) cout << i << " "; } return 0; } % ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: newbie optimization question
Don Stewart wrote: perfect :: [Int] perfect = [i | i<-[1..1], i == sum (divisors i)] This should be a little faster , as sum will fuse, perfect :: [Int] perfect = [i | i<-[1..1], i == sum' (divisors i)] where sum' = foldr (+) 0 sum' did not help. Times are about the same with Int type. Peter. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: newbie optimization question
peter: > Don Stewart wrote: > >>C++ version times: 1.109; 1.125; 1.125 > >>Int32 cpu times: 1.359; 1.359; 1.375 > >>Int64 cpu times: 11.688; 11.719; 11.766 > >>Integer cpu times: 9.719; 9.703; 9.703 > >> > >>Great result from ghc. > > > >What Haskell program were you using for this test? The original > >naive/high level implementation? > > > >-- Don > > Here it is (I played only with the types for divisors and perfect; > bottom is there from my previous tests, but it should not matter): > <---cut---> > > module Main (bottom, divisors, perfect, main) where > import Data.Int > > bottom = error "_|_" > > divisors :: Int -> [Int] > divisors i = [j | j<-[1..i-1], i `mod` j == 0] > > perfect :: [Int] > perfect = [i | i<-[1..1], i == sum (divisors i)] > This should be a little faster , as sum will fuse, perfect :: [Int] perfect = [i | i<-[1..1], i == sum' (divisors i)] where sum' = foldr (+) 0 ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: newbie optimization question
On Sun, 2007-10-28 at 23:34 +0100, Peter Hercek wrote: > Don Stewart wrote: > >> C++ version times: 1.109; 1.125; 1.125 > >> Int32 cpu times: 1.359; 1.359; 1.375 > >> Int64 cpu times: 11.688; 11.719; 11.766 > >> Integer cpu times: 9.719; 9.703; 9.703 > >> > >> Great result from ghc. > > > > What Haskell program were you using for this test? The original > > naive/high level implementation? > > > > -- Don > > Here it is (I played only with the types for divisors and perfect; > bottom is there from my previous tests, but it should not matter): > <---cut---> > > module Main (bottom, divisors, perfect, main) where > import Data.Int > > bottom = error "_|_" > > divisors :: Int -> [Int] > divisors i = [j | j<-[1..i-1], i `mod` j == 0] > > perfect :: [Int] > perfect = [i | i<-[1..1], i == sum (divisors i)] > > main = print perfect Try with rem instead of mod. (What the heck is with bottom?) ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: newbie optimization question
Peter Hercek wrote: MS cl.exe version 13.10.3077 (options /G7 /MD) And I had cl options wrong too - I need to start to optimize not only to set the optimization target. /G7 /MD -> 1.109; 1.125; 1.125 /Ox /G7 /MD -> 0.922; 0.984; 0.984 Still it does not change the results too much. Peter. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: newbie optimization question
Don Stewart wrote: C++ version times: 1.109; 1.125; 1.125 Int32 cpu times: 1.359; 1.359; 1.375 Int64 cpu times: 11.688; 11.719; 11.766 Integer cpu times: 9.719; 9.703; 9.703 Great result from ghc. What Haskell program were you using for this test? The original naive/high level implementation? -- Don Here it is (I played only with the types for divisors and perfect; bottom is there from my previous tests, but it should not matter): <---cut---> module Main (bottom, divisors, perfect, main) where import Data.Int bottom = error "_|_" divisors :: Int -> [Int] divisors i = [j | j<-[1..i-1], i `mod` j == 0] perfect :: [Int] perfect = [i | i<-[1..1], i == sum (divisors i)] main = print perfect <---cut---> and here is the C++ version: <---cut---> #include using namespace std; int main() { for (int i = 1; i <= 1; i++) { int sum = 0; for (int j = 1; j < i; j++) if (i % j == 0) sum += j; if (sum == i) cout << i << " "; } return 0; } <---cut---> OS winxp 64 bit ghc v. 6.6.1 (optins -O2) MS cl.exe version 13.10.3077 (options /G7 /MD) Peter. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: newbie optimization question
peter: > Peter Hercek wrote: > >C++ version times: 1.125; 1.109; 1.125 > >Int32 cpu times: 3.203; 3.172; 3.172 > >Int64 cpu times: 11.734; 11.797; 11.844 > >Integer cpu times: 9.609; 9.609; 9.500 > > Ooops, my results ware wrong (nonoptimizing ms cl > compiler used and I used -O instead of -O2 in ghc). > > C++ version times: 1.109; 1.125; 1.125 > Int32 cpu times: 1.359; 1.359; 1.375 > Int64 cpu times: 11.688; 11.719; 11.766 > Integer cpu times: 9.719; 9.703; 9.703 > > Great result from ghc. What Haskell program were you using for this test? The original naive/high level implementation? -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: newbie optimization question
Peter Hercek wrote: C++ version times: 1.125; 1.109; 1.125 Int32 cpu times: 3.203; 3.172; 3.172 Int64 cpu times: 11.734; 11.797; 11.844 Integer cpu times: 9.609; 9.609; 9.500 Ooops, my results ware wrong (nonoptimizing ms cl compiler used and I used -O instead of -O2 in ghc). C++ version times: 1.109; 1.125; 1.125 Int32 cpu times: 1.359; 1.359; 1.375 Int64 cpu times: 11.688; 11.719; 11.766 Integer cpu times: 9.719; 9.703; 9.703 Great result from ghc. Peter. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: newbie optimization question
peter: > Daniel Fischer wrote: > >What perpetually puzzles me is that in C long long int has very good > >performance, *much* faster than gmp, in Haskell, on my computer, Int64 is > >hardly faster than Integer. > > I tried the example with Int64 and Integer. The integer version > was actually quicker ... which is the reason I decided to post > the results. > > C++ version times: 1.125; 1.109; 1.125 > Int32 cpu times: 3.203; 3.172; 3.172 > Int64 cpu times: 11.734; 11.797; 11.844 > Integer cpu times: 9.609; 9.609; 9.500 > > Interesting that Int64 is *slower* than Integer. > > On the other side the C version is not that much quicker. I guess > the Haskell version is using generic versions of mod and sum > (since they are from a library) which would mean indirect calls. > The Haskell version probably also creates the list nodes ... > even when they get almost immediately garbage collected. > > Thanks for pointing out Int64 sucks so much :) With -O2 ghc will only have a function call to 'sum', with -O2 and the list stream fusion library, there are no indirect calls and the entire program is specialised to a nested loop. Do you have your C++ program handy? I'd expect the fully fused version to be somewhere between 1 and 2x slower. -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: newbie optimization question
Daniel Fischer wrote: What perpetually puzzles me is that in C long long int has very good performance, *much* faster than gmp, in Haskell, on my computer, Int64 is hardly faster than Integer. I tried the example with Int64 and Integer. The integer version was actually quicker ... which is the reason I decided to post the results. C++ version times: 1.125; 1.109; 1.125 Int32 cpu times: 3.203; 3.172; 3.172 Int64 cpu times: 11.734; 11.797; 11.844 Integer cpu times: 9.609; 9.609; 9.500 Interesting that Int64 is *slower* than Integer. On the other side the C version is not that much quicker. I guess the Haskell version is using generic versions of mod and sum (since they are from a library) which would mean indirect calls. The Haskell version probably also creates the list nodes ... even when they get almost immediately garbage collected. Thanks for pointing out Int64 sucks so much :) Peter. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe