[Haskell-cafe] gcc as a portable assembler: tail calls vs trampoline

2008-07-01 Thread Loup Vaillant
Hello, everyone,
I'm not sure this is the good list for this posting: I plan to write
yet another lazy graph reduction machine, using gcc, mainly for fun.
My concern right now is about one particular aspect of gcc: the tail
calls vs trampoline deathmatch.

First some clarifications:
- By tail call, I mean a properly optimized one, which doesn't grow
the stack. Basically a jump.
- By trampoline, I mean the infinite loop referred to as a tiny
interpreter in the STG paper [1]. Instead of (tail) calling a
function, we return its pointer, which will then be used for the
actual call.

I've read that trampoline calls are about 3 times as expensive as tail
calls. Instead of a (direct) jump, you have a return, followed by an
indirect call. However, testing this with gcc, I didn't manage to show
any measurable difference. What am I missing? I used -O2 on both
benchmarks, and turned off inlining (using the __noinline__
attribute). I'm running the program on a core2Duo in 32 bits mode,
using GNU/Linux Ubuntu. Should I also make separate modules to prevent
whole program analysis from gcc?

Anyway, I send you the code below, so you can test and review it. Enjoy!

[1]: 
http://research.microsoft.com/Users/simonpj/Papers/spineless-tagless-gmachine.ps.gz



/* This program tests the performance difference between trampoline
 * calls and tail calls.  (In the STG and related papers, trampoline
 * is referred to as tiny interpreter)
 *
 * depending on the mode (tail call or trampoline), some code changes:
 * the return type of the code blocks, and the calls to other blocks
 * (direct tail calls or return to trampoline). Hence the macros ENTER
 * and RETURN.
 *
 * To compile it,type one of the two following commands:
 * gcc -O2 sibcall.c -D SIBCALL
 * gcc -O2 sibcall.c
 *
 * The first one will make the program use sibling calls (enabled at
 * O2), and the second one will fall back to trampoline calls
 */

#ifdef SIBCALL // tail call mode

#define ENTER(f) (f())
#define RETURN void
#define MAIN(start) start()

#else // trampoline call mode

#define ENTER(f) return f
#define RETURN void*
#define MAIN(start)  do{ void * (*f)(void) = start; \
 while (1) \
   f = (*f)(); \
   } while (0)
#endif

// Now, we can begin the generic code
#include stdlib.h
#define LIMIT 10 // one billion

int counter = 0;

RETURN closure0(void);
RETURN closure1(void);


int main(int argc, char ** argv)
{
  MAIN(closure0);
  return 0;
}

__attribute__ ((__noinline__))
RETURN closure0(void)
{
  if (counter = LIMIT) exit(0);
  counter++;
  ENTER(closure1);
}
__attribute__ ((__noinline__))
RETURN closure1(void)
{
  if (counter = LIMIT) exit(0);
  counter++;
  ENTER(closure0);
}
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] gcc as a portable assembler: tail calls vs trampoline

2008-07-01 Thread Loup Vaillant
2008/7/1 Robin Green [EMAIL PROTECTED]:
 I think you should use the -march flag to gcc. Otherwise you are tying
 the compiler to an ancient version of the x86 instruction set (386).

OK, let's try... Done. I used -march=nocona. I couldn't do core2 (not
available on my gcc 4.1.2 yet). Did not notice any difference...


 On Tue, 1 Jul 2008 16:57:38 +0200
 Loup Vaillant [EMAIL PROTECTED] wrote:

 Hello, everyone,
 I'm not sure this is the good list for this posting: I plan to write
 yet another lazy graph reduction machine, using gcc, mainly for fun.
 My concern right now is about one particular aspect of gcc: the tail
 calls vs trampoline deathmatch.

 First some clarifications:
 - By tail call, I mean a properly optimized one, which doesn't grow
 the stack. Basically a jump.
 - By trampoline, I mean the infinite loop referred to as a tiny
 interpreter in the STG paper [1]. Instead of (tail) calling a
 function, we return its pointer, which will then be used for the
 actual call.

 I've read that trampoline calls are about 3 times as expensive as tail
 calls. Instead of a (direct) jump, you have a return, followed by an
 indirect call. However, testing this with gcc, I didn't manage to show
 any measurable difference. What am I missing? I used -O2 on both
 benchmarks, and turned off inlining (using the __noinline__
 attribute). I'm running the program on a core2Duo in 32 bits mode,
 using GNU/Linux Ubuntu. Should I also make separate modules to prevent
 whole program analysis from gcc?

 Anyway, I send you the code below, so you can test and review it.
 Enjoy!

 [1]:
 http://research.microsoft.com/Users/simonpj/Papers/spineless-tagless-gmachine.ps.gz



 /* This program tests the performance difference between trampoline
  * calls and tail calls.  (In the STG and related papers, trampoline
  * is referred to as tiny interpreter)
  *
  * depending on the mode (tail call or trampoline), some code changes:
  * the return type of the code blocks, and the calls to other blocks
  * (direct tail calls or return to trampoline). Hence the macros ENTER
  * and RETURN.
  *
  * To compile it,type one of the two following commands:
  * gcc -O2 sibcall.c -D SIBCALL
  * gcc -O2 sibcall.c
  *
  * The first one will make the program use sibling calls (enabled at
  * O2), and the second one will fall back to trampoline calls
  */

 #ifdef SIBCALL // tail call mode

 #define ENTER(f) (f())
 #define RETURN void
 #define MAIN(start) start()

 #else // trampoline call mode

 #define ENTER(f) return f
 #define RETURN void*
 #define MAIN(start)  do{ void * (*f)(void) = start; \
  while (1) \
f = (*f)(); \
} while (0)
 #endif

 // Now, we can begin the generic code
 #include stdlib.h
 #define LIMIT 10 // one billion

 int counter = 0;

 RETURN closure0(void);
 RETURN closure1(void);


 int main(int argc, char ** argv)
 {
   MAIN(closure0);
   return 0;
 }

 __attribute__ ((__noinline__))
 RETURN closure0(void)
 {
   if (counter = LIMIT) exit(0);
   counter++;
   ENTER(closure1);
 }
 __attribute__ ((__noinline__))
 RETURN closure1(void)
 {
   if (counter = LIMIT) exit(0);
   counter++;
   ENTER(closure0);
 }
 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] gcc as a portable assembler: tail calls vs trampoline

2008-07-01 Thread Peter Verswyvelen
Did you look at the generated assembly code? Although I don't know 
anything about GCC, I believe it's easy to let it generate an assembly 
listing file.


Loup Vaillant wrote:

2008/7/1 Robin Green [EMAIL PROTECTED]:
  

I think you should use the -march flag to gcc. Otherwise you are tying
the compiler to an ancient version of the x86 instruction set (386).



OK, let's try... Done. I used -march=nocona. I couldn't do core2 (not
available on my gcc 4.1.2 yet). Did not notice any difference...


  

On Tue, 1 Jul 2008 16:57:38 +0200
Loup Vaillant [EMAIL PROTECTED] wrote:



Hello, everyone,
I'm not sure this is the good list for this posting: I plan to write
yet another lazy graph reduction machine, using gcc, mainly for fun.
My concern right now is about one particular aspect of gcc: the tail
calls vs trampoline deathmatch.

First some clarifications:
- By tail call, I mean a properly optimized one, which doesn't grow
the stack. Basically a jump.
- By trampoline, I mean the infinite loop referred to as a tiny
interpreter in the STG paper [1]. Instead of (tail) calling a
function, we return its pointer, which will then be used for the
actual call.

I've read that trampoline calls are about 3 times as expensive as tail
calls. Instead of a (direct) jump, you have a return, followed by an
indirect call. However, testing this with gcc, I didn't manage to show
any measurable difference. What am I missing? I used -O2 on both
benchmarks, and turned off inlining (using the __noinline__
attribute). I'm running the program on a core2Duo in 32 bits mode,
using GNU/Linux Ubuntu. Should I also make separate modules to prevent
whole program analysis from gcc?

Anyway, I send you the code below, so you can test and review it.
Enjoy!

[1]:
http://research.microsoft.com/Users/simonpj/Papers/spineless-tagless-gmachine.ps.gz



/* This program tests the performance difference between trampoline
 * calls and tail calls.  (In the STG and related papers, trampoline
 * is referred to as tiny interpreter)
 *
 * depending on the mode (tail call or trampoline), some code changes:
 * the return type of the code blocks, and the calls to other blocks
 * (direct tail calls or return to trampoline). Hence the macros ENTER
 * and RETURN.
 *
 * To compile it,type one of the two following commands:
 * gcc -O2 sibcall.c -D SIBCALL
 * gcc -O2 sibcall.c
 *
 * The first one will make the program use sibling calls (enabled at
 * O2), and the second one will fall back to trampoline calls
 */

#ifdef SIBCALL // tail call mode

#define ENTER(f) (f())
#define RETURN void
#define MAIN(start) start()

#else // trampoline call mode

#define ENTER(f) return f
#define RETURN void*
#define MAIN(start)  do{ void * (*f)(void) = start; \
 while (1) \
   f = (*f)(); \
   } while (0)
#endif

// Now, we can begin the generic code
#include stdlib.h
#define LIMIT 10 // one billion

int counter = 0;

RETURN closure0(void);
RETURN closure1(void);


int main(int argc, char ** argv)
{
  MAIN(closure0);
  return 0;
}

__attribute__ ((__noinline__))
RETURN closure0(void)
{
  if (counter = LIMIT) exit(0);
  counter++;
  ENTER(closure1);
}
__attribute__ ((__noinline__))
RETURN closure1(void)
{
  if (counter = LIMIT) exit(0);
  counter++;
  ENTER(closure0);
}
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
  

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


  


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe