[Haskell-cafe] gcc as a portable assembler: tail calls vs trampoline
Hello, everyone, I'm not sure this is the good list for this posting: I plan to write yet another lazy graph reduction machine, using gcc, mainly for fun. My concern right now is about one particular aspect of gcc: the tail calls vs trampoline deathmatch. First some clarifications: - By tail call, I mean a properly optimized one, which doesn't grow the stack. Basically a jump. - By trampoline, I mean the infinite loop referred to as a tiny interpreter in the STG paper [1]. Instead of (tail) calling a function, we return its pointer, which will then be used for the actual call. I've read that trampoline calls are about 3 times as expensive as tail calls. Instead of a (direct) jump, you have a return, followed by an indirect call. However, testing this with gcc, I didn't manage to show any measurable difference. What am I missing? I used -O2 on both benchmarks, and turned off inlining (using the __noinline__ attribute). I'm running the program on a core2Duo in 32 bits mode, using GNU/Linux Ubuntu. Should I also make separate modules to prevent whole program analysis from gcc? Anyway, I send you the code below, so you can test and review it. Enjoy! [1]: http://research.microsoft.com/Users/simonpj/Papers/spineless-tagless-gmachine.ps.gz /* This program tests the performance difference between trampoline * calls and tail calls. (In the STG and related papers, trampoline * is referred to as tiny interpreter) * * depending on the mode (tail call or trampoline), some code changes: * the return type of the code blocks, and the calls to other blocks * (direct tail calls or return to trampoline). Hence the macros ENTER * and RETURN. * * To compile it,type one of the two following commands: * gcc -O2 sibcall.c -D SIBCALL * gcc -O2 sibcall.c * * The first one will make the program use sibling calls (enabled at * O2), and the second one will fall back to trampoline calls */ #ifdef SIBCALL // tail call mode #define ENTER(f) (f()) #define RETURN void #define MAIN(start) start() #else // trampoline call mode #define ENTER(f) return f #define RETURN void* #define MAIN(start) do{ void * (*f)(void) = start; \ while (1) \ f = (*f)(); \ } while (0) #endif // Now, we can begin the generic code #include stdlib.h #define LIMIT 10 // one billion int counter = 0; RETURN closure0(void); RETURN closure1(void); int main(int argc, char ** argv) { MAIN(closure0); return 0; } __attribute__ ((__noinline__)) RETURN closure0(void) { if (counter = LIMIT) exit(0); counter++; ENTER(closure1); } __attribute__ ((__noinline__)) RETURN closure1(void) { if (counter = LIMIT) exit(0); counter++; ENTER(closure0); } ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] gcc as a portable assembler: tail calls vs trampoline
2008/7/1 Robin Green [EMAIL PROTECTED]: I think you should use the -march flag to gcc. Otherwise you are tying the compiler to an ancient version of the x86 instruction set (386). OK, let's try... Done. I used -march=nocona. I couldn't do core2 (not available on my gcc 4.1.2 yet). Did not notice any difference... On Tue, 1 Jul 2008 16:57:38 +0200 Loup Vaillant [EMAIL PROTECTED] wrote: Hello, everyone, I'm not sure this is the good list for this posting: I plan to write yet another lazy graph reduction machine, using gcc, mainly for fun. My concern right now is about one particular aspect of gcc: the tail calls vs trampoline deathmatch. First some clarifications: - By tail call, I mean a properly optimized one, which doesn't grow the stack. Basically a jump. - By trampoline, I mean the infinite loop referred to as a tiny interpreter in the STG paper [1]. Instead of (tail) calling a function, we return its pointer, which will then be used for the actual call. I've read that trampoline calls are about 3 times as expensive as tail calls. Instead of a (direct) jump, you have a return, followed by an indirect call. However, testing this with gcc, I didn't manage to show any measurable difference. What am I missing? I used -O2 on both benchmarks, and turned off inlining (using the __noinline__ attribute). I'm running the program on a core2Duo in 32 bits mode, using GNU/Linux Ubuntu. Should I also make separate modules to prevent whole program analysis from gcc? Anyway, I send you the code below, so you can test and review it. Enjoy! [1]: http://research.microsoft.com/Users/simonpj/Papers/spineless-tagless-gmachine.ps.gz /* This program tests the performance difference between trampoline * calls and tail calls. (In the STG and related papers, trampoline * is referred to as tiny interpreter) * * depending on the mode (tail call or trampoline), some code changes: * the return type of the code blocks, and the calls to other blocks * (direct tail calls or return to trampoline). Hence the macros ENTER * and RETURN. * * To compile it,type one of the two following commands: * gcc -O2 sibcall.c -D SIBCALL * gcc -O2 sibcall.c * * The first one will make the program use sibling calls (enabled at * O2), and the second one will fall back to trampoline calls */ #ifdef SIBCALL // tail call mode #define ENTER(f) (f()) #define RETURN void #define MAIN(start) start() #else // trampoline call mode #define ENTER(f) return f #define RETURN void* #define MAIN(start) do{ void * (*f)(void) = start; \ while (1) \ f = (*f)(); \ } while (0) #endif // Now, we can begin the generic code #include stdlib.h #define LIMIT 10 // one billion int counter = 0; RETURN closure0(void); RETURN closure1(void); int main(int argc, char ** argv) { MAIN(closure0); return 0; } __attribute__ ((__noinline__)) RETURN closure0(void) { if (counter = LIMIT) exit(0); counter++; ENTER(closure1); } __attribute__ ((__noinline__)) RETURN closure1(void) { if (counter = LIMIT) exit(0); counter++; ENTER(closure0); } ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] gcc as a portable assembler: tail calls vs trampoline
Did you look at the generated assembly code? Although I don't know anything about GCC, I believe it's easy to let it generate an assembly listing file. Loup Vaillant wrote: 2008/7/1 Robin Green [EMAIL PROTECTED]: I think you should use the -march flag to gcc. Otherwise you are tying the compiler to an ancient version of the x86 instruction set (386). OK, let's try... Done. I used -march=nocona. I couldn't do core2 (not available on my gcc 4.1.2 yet). Did not notice any difference... On Tue, 1 Jul 2008 16:57:38 +0200 Loup Vaillant [EMAIL PROTECTED] wrote: Hello, everyone, I'm not sure this is the good list for this posting: I plan to write yet another lazy graph reduction machine, using gcc, mainly for fun. My concern right now is about one particular aspect of gcc: the tail calls vs trampoline deathmatch. First some clarifications: - By tail call, I mean a properly optimized one, which doesn't grow the stack. Basically a jump. - By trampoline, I mean the infinite loop referred to as a tiny interpreter in the STG paper [1]. Instead of (tail) calling a function, we return its pointer, which will then be used for the actual call. I've read that trampoline calls are about 3 times as expensive as tail calls. Instead of a (direct) jump, you have a return, followed by an indirect call. However, testing this with gcc, I didn't manage to show any measurable difference. What am I missing? I used -O2 on both benchmarks, and turned off inlining (using the __noinline__ attribute). I'm running the program on a core2Duo in 32 bits mode, using GNU/Linux Ubuntu. Should I also make separate modules to prevent whole program analysis from gcc? Anyway, I send you the code below, so you can test and review it. Enjoy! [1]: http://research.microsoft.com/Users/simonpj/Papers/spineless-tagless-gmachine.ps.gz /* This program tests the performance difference between trampoline * calls and tail calls. (In the STG and related papers, trampoline * is referred to as tiny interpreter) * * depending on the mode (tail call or trampoline), some code changes: * the return type of the code blocks, and the calls to other blocks * (direct tail calls or return to trampoline). Hence the macros ENTER * and RETURN. * * To compile it,type one of the two following commands: * gcc -O2 sibcall.c -D SIBCALL * gcc -O2 sibcall.c * * The first one will make the program use sibling calls (enabled at * O2), and the second one will fall back to trampoline calls */ #ifdef SIBCALL // tail call mode #define ENTER(f) (f()) #define RETURN void #define MAIN(start) start() #else // trampoline call mode #define ENTER(f) return f #define RETURN void* #define MAIN(start) do{ void * (*f)(void) = start; \ while (1) \ f = (*f)(); \ } while (0) #endif // Now, we can begin the generic code #include stdlib.h #define LIMIT 10 // one billion int counter = 0; RETURN closure0(void); RETURN closure1(void); int main(int argc, char ** argv) { MAIN(closure0); return 0; } __attribute__ ((__noinline__)) RETURN closure0(void) { if (counter = LIMIT) exit(0); counter++; ENTER(closure1); } __attribute__ ((__noinline__)) RETURN closure1(void) { if (counter = LIMIT) exit(0); counter++; ENTER(closure0); } ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe