Why is GHC so much worse than JHC when computing the Ackermann function?

2013-04-20 Thread Mikhail Glushenkov
Hi all,

This came up on StackOverflow [1]. When compiled with GHC (7.4.2 
7.6.2), this simple program:

main = print $ ack 4 1
  where ack :: Int - Int - Int
ack 0 n = n+1
ack m 0 = ack (m-1) 1
ack m n = ack (m-1) (ack m (n-1))

consumes all available memory on my machine and slows down to a crawl.
However, when compiled with JHC it runs in constant space and is about
as fast as the straightforward Ocaml version (see the SO question for
benchmark numbers).

I was able to fix the space leak by using CPS-conversion, but the
CPS-converted version is still about 10 times slower than the naive
version compiled with JHC.

I looked both at the Core and Cmm, but couldn't find anything
obviously wrong with the generated code - 'ack' is compiled to a
simple loop of type 'Int# - Int# - Int#'. What's more frustrating is
that running the program with +RTS -hc makes the space leak
mysteriously vanish.

Can someone please explain where the space leak comes from and if it's
possible to further improve the runtime of this program with GHC?
Apparently it's somehow connected to the stack management strategy,
since running the program with a larger stack chunk size (+RTS -kc1M)
makes the space leak go away. Interestingly, choosing smaller stack
chunk sizes (256K, 512K) causes it to die with an OOM exception:

$ time ./Test +RTS -kc256K
Test: out of memory (requested 2097152 bytes)


[1] 
http://stackoverflow.com/questions/16115815/ackermann-very-inefficient-with-haskell-ghc/16116074#16116074

-- 
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Why is GHC so much worse than JHC when computing the Ackermann function?

2013-04-20 Thread Christopher Done
JHC compiles to C and last time I tried this it looked very much like the
original C version:
http://www.reddit.com/r/haskell/comments/1bcru7/damn_lies_and_haskell_performance/c9689a3

This is the same thing. Compile with --tdir=/tmp/ajhc and then cat
/tmp/ajhc/main_code.c. Output should be like this:

static uint32_t A_STD
fW$__fMain_1__ack(gc_t gc,uint32_t v105553376,uint32_t v61835120)
{
if (0 == v105553376) {
return 1 + v61835120;
} else {
uint32_t v16;
uint32_t v22;
struct tup1 x2;
if (0 == v61835120) {
uint32_t v228308038 = (v105553376 - 1);
x2.t0 = v228308038;
x2.t1 = 1;
} else {
uint32_t v110947990;
uint32_t v215884490 = (v61835120 - 1);
uint32_t v62470112 = (v105553376 - 1);
v110947990 = fW$__fMain_1__ack(gc,v105553376,v215884490);
x2.t0 = v62470112;
x2.t1 = v110947990;
}
v22 = x2.t0;
v16 = x2.t1;
return fW$__fMain_1__ack(gc,v22,v16);
}
}

And it's as fast as C on my machine:

chris@midnight:~/Projects/me/ajhc$ time ./ack
65533

real 0m2.134s
user 0m2.124s
sys 0m0.000s
chris@midnight:~/Projects/me/ajhc$ gcc -O3 ack.c -o ack-c
ack.c: In function ‘main’:
ack.c:8:3: warning: incompatible implicit declaration of built-in function
‘printf’ [enabled by default]
chris@midnight:~/Projects/me/ajhc$ time ./ack-c
65533

real 0m2.255s
user 0m2.248s
sys 0m0.000s
chris@midnight:~/Projects/me/ajhc$




On 20 April 2013 10:55, Mikhail Glushenkov the.dead.shall.r...@gmail.comwrote:

 Hi all,

 This came up on StackOverflow [1]. When compiled with GHC (7.4.2 
 7.6.2), this simple program:

 main = print $ ack 4 1
   where ack :: Int - Int - Int
 ack 0 n = n+1
 ack m 0 = ack (m-1) 1
 ack m n = ack (m-1) (ack m (n-1))

 consumes all available memory on my machine and slows down to a crawl.
 However, when compiled with JHC it runs in constant space and is about
 as fast as the straightforward Ocaml version (see the SO question for
 benchmark numbers).

 I was able to fix the space leak by using CPS-conversion, but the
 CPS-converted version is still about 10 times slower than the naive
 version compiled with JHC.

 I looked both at the Core and Cmm, but couldn't find anything
 obviously wrong with the generated code - 'ack' is compiled to a
 simple loop of type 'Int# - Int# - Int#'. What's more frustrating is
 that running the program with +RTS -hc makes the space leak
 mysteriously vanish.

 Can someone please explain where the space leak comes from and if it's
 possible to further improve the runtime of this program with GHC?
 Apparently it's somehow connected to the stack management strategy,
 since running the program with a larger stack chunk size (+RTS -kc1M)
 makes the space leak go away. Interestingly, choosing smaller stack
 chunk sizes (256K, 512K) causes it to die with an OOM exception:

 $ time ./Test +RTS -kc256K
 Test: out of memory (requested 2097152 bytes)


 [1]
 http://stackoverflow.com/questions/16115815/ackermann-very-inefficient-with-haskell-ghc/16116074#16116074

 --
 ()  ascii ribbon campaign - against html e-mail
 /\  www.asciiribbon.org   - against proprietary attachments

 ___
 Glasgow-haskell-users mailing list
 Glasgow-haskell-users@haskell.org
 http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Why is GHC so much worse than JHC when computing the Ackermann function?

2013-04-20 Thread Mikhail Glushenkov
Hi,

On Sat, Apr 20, 2013 at 12:08 PM, Christopher Done chrisd...@gmail.com wrote:
 JHC compiles to C and last time I tried this it looked very much like the
 original C version:
 http://www.reddit.com/r/haskell/comments/1bcru7/damn_lies_and_haskell_performance/c9689a3

 This is the same thing.

Yes, but with the Fibonacci example GHC wasn't consuming all available
memory and running 10 times slower. Surely it must be possible to make
the GHC version at least 2x as slow as Ocaml/GHC?

-- 
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Why is GHC so much worse than JHC when computing the Ackermann function?

2013-04-20 Thread Mikhail Glushenkov
On Sat, Apr 20, 2013 at 12:23 PM, Mikhail Glushenkov
the.dead.shall.r...@gmail.com wrote:
 Surely it must be possible to make
 the GHC version at least 2x as slow as Ocaml/GHC?

s|Ocaml/GHC|Ocaml/JHC|


-- 
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Why is GHC so much worse than JHC when computing the Ackermann function?

2013-04-20 Thread Dan Doel
There's something strange going on in this example. For instance, setting
(-M) heap limits as low as 40K seems to have no effect, even though the
program easily uses more than 8G. Except, interrupting the program in such
a case does seem to give a message about heap limits being exceeded (it
won't stop on its own, though). Also, the program compiled without
optimizations uses very little memory (though it's slow), which is odd,
because the optimized version works exclusively on Int#, which shouldn't
cause heap allocation.

I filed a ticket earlier: http://hackage.haskell.org/trac/ghc/ticket/7850
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Why is GHC so much worse than JHC when computing the Ackermann function?

2013-04-20 Thread Mikhail Glushenkov
Hi,

On 20 April 2013 14:20, Dan Doel dan.d...@gmail.com wrote:
 There's something strange going on in this example. For instance, setting
 (-M) heap limits as low as 40K seems to have no effect, even though the
 program easily uses more than 8G.

Apparently the only things it allocates is stack chunks.

I managed to produce a version of this program that runs approximately
as fast as the Ocaml one by manually unrolling the main loop [1], but
it still has to be run with +RTS -kc1M to avoid the memory leak.

[1] https://gist.github.com/23Skidoo/5425891

-- 
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Why is GHC so much worse than JHC when computing the Ackermann function?

2013-04-20 Thread Edward Z. Yang
I don't seem to get the leak on latest GHC head.  Running the program
in GC debug mode in 7.6.2 is quite telling; the program is allocating
*a lot* of megablocks.  We probably fixed it though?

Edward

Excerpts from Mikhail Glushenkov's message of Sat Apr 20 01:55:10 -0700 2013:
 Hi all,
 
 This came up on StackOverflow [1]. When compiled with GHC (7.4.2 
 7.6.2), this simple program:
 
 main = print $ ack 4 1
   where ack :: Int - Int - Int
 ack 0 n = n+1
 ack m 0 = ack (m-1) 1
 ack m n = ack (m-1) (ack m (n-1))
 
 consumes all available memory on my machine and slows down to a crawl.
 However, when compiled with JHC it runs in constant space and is about
 as fast as the straightforward Ocaml version (see the SO question for
 benchmark numbers).
 
 I was able to fix the space leak by using CPS-conversion, but the
 CPS-converted version is still about 10 times slower than the naive
 version compiled with JHC.
 
 I looked both at the Core and Cmm, but couldn't find anything
 obviously wrong with the generated code - 'ack' is compiled to a
 simple loop of type 'Int# - Int# - Int#'. What's more frustrating is
 that running the program with +RTS -hc makes the space leak
 mysteriously vanish.
 
 Can someone please explain where the space leak comes from and if it's
 possible to further improve the runtime of this program with GHC?
 Apparently it's somehow connected to the stack management strategy,
 since running the program with a larger stack chunk size (+RTS -kc1M)
 makes the space leak go away. Interestingly, choosing smaller stack
 chunk sizes (256K, 512K) causes it to die with an OOM exception:
 
 $ time ./Test +RTS -kc256K
 Test: out of memory (requested 2097152 bytes)
 
 
 [1] 
 http://stackoverflow.com/questions/16115815/ackermann-very-inefficient-with-haskell-ghc/16116074#16116074
 

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Why is GHC so much worse than JHC when computing the Ackermann function?

2013-04-20 Thread Joachim Breitner
Hi,

Am Samstag, den 20.04.2013, 13:03 -0700 schrieb Edward Z. Yang:
 I don't seem to get the leak on latest GHC head.  Running the program
 in GC debug mode in 7.6.2 is quite telling; the program is allocating
 *a lot* of megablocks.  We probably fixed it though?

it’s confirmed that it is fixed in HEAD. But it’d be really interesting
to know what the bug was and how it was fixed. Could someone who has a
setup for running bisect at his fingertips find out?

Greetings,
Joachim

-- 
Joachim nomeata Breitner
Debian Developer
  nome...@debian.org | ICQ# 74513189 | GPG-Keyid: 4743206C
  JID: nome...@joachim-breitner.de | http://people.debian.org/~nomeata



signature.asc
Description: This is a digitally signed message part
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Why is GHC so much worse than JHC when computing the Ackermann function?

2013-04-20 Thread Thomas Schilling
Sounds similar to the bug we had in 6.12.1.  It was a simple
accounting bug, i.e., the GC was not invoked, because it didn't know
that the memory was allocated.

On 20 April 2013 22:03, Edward Z. Yang ezy...@mit.edu wrote:
 I don't seem to get the leak on latest GHC head.  Running the program
 in GC debug mode in 7.6.2 is quite telling; the program is allocating
 *a lot* of megablocks.  We probably fixed it though?

 Edward

 Excerpts from Mikhail Glushenkov's message of Sat Apr 20 01:55:10 -0700 2013:
 Hi all,

 This came up on StackOverflow [1]. When compiled with GHC (7.4.2 
 7.6.2), this simple program:

 main = print $ ack 4 1
   where ack :: Int - Int - Int
 ack 0 n = n+1
 ack m 0 = ack (m-1) 1
 ack m n = ack (m-1) (ack m (n-1))

 consumes all available memory on my machine and slows down to a crawl.
 However, when compiled with JHC it runs in constant space and is about
 as fast as the straightforward Ocaml version (see the SO question for
 benchmark numbers).

 I was able to fix the space leak by using CPS-conversion, but the
 CPS-converted version is still about 10 times slower than the naive
 version compiled with JHC.

 I looked both at the Core and Cmm, but couldn't find anything
 obviously wrong with the generated code - 'ack' is compiled to a
 simple loop of type 'Int# - Int# - Int#'. What's more frustrating is
 that running the program with +RTS -hc makes the space leak
 mysteriously vanish.

 Can someone please explain where the space leak comes from and if it's
 possible to further improve the runtime of this program with GHC?
 Apparently it's somehow connected to the stack management strategy,
 since running the program with a larger stack chunk size (+RTS -kc1M)
 makes the space leak go away. Interestingly, choosing smaller stack
 chunk sizes (256K, 512K) causes it to die with an OOM exception:

 $ time ./Test +RTS -kc256K
 Test: out of memory (requested 2097152 bytes)


 [1] 
 http://stackoverflow.com/questions/16115815/ackermann-very-inefficient-with-haskell-ghc/16116074#16116074


 ___
 Glasgow-haskell-users mailing list
 Glasgow-haskell-users@haskell.org
 http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users