Author: fengzhou
Date: 2011-05-10 12:27:46 -0400 (Tue, 10 May 2011)
New Revision: 3592

Modified:
   trunk/osprey/be/cg/calls.cxx
   trunk/osprey/common/com/x8664/targ_sim.cxx
Log:
Background
----------
The problem is related to gcc's extension, namely the stdcall function
attributes. It is exposed while using gnu m4 as the test case.
According to gcc,

        stdcall
   On the Intel 386, the stdcall attribute causes the compiler to assume that
   the called function will pop off the stack space used to pass
arguments, unless
   it takes a variable number of arguments.

For example, the following code

int  __attribute__ ((stdcall)) foo(int a, int b) {
   int j; j = a + b; return j;
}

is compiled into (with gcc -m32 -O0):

foo:
   pushl   %ebp
   movl    %esp, %ebp
   subl    $16, %esp
   movl    12(%ebp), %eax
   addl    8(%ebp), %eax
   movl    %eax, -4(%ebp)
   movl    -4(%ebp), %eax
   leave
   ret $8                         <== note $8 here, which is the space needed
for passing actual parameters, callee adjusts sp

and the callsite is compiled into:

   call    foo
   subl    $8, %esp               <== generated only when __attribute__
((stdcall)) is declared on `foo'


Bug 1: m4 runtime failure when compiled with `-O2' or `-Os' (-m32)
------------------------------------------------------------------
This bug is caused by tail call optimization in CG. Basically, in some cases,
CG decides if it can just transform a call of B --> C into a jump instruction.
In m4, we have the following kind of code:

static reg_errcode_t __attribute__ ((regparm (3), stdcall)) get_subexp_sub
(re_match_context_t *mctx, const re_sub_match_top_t *sub_top,
re_sub_match_last_t *sub_last, Idx bkref_node, Idx bkref_str) {
 ...
 return clean_state_log_if_needed (mctx, to_idx); }


static reg_errcode_t __attribute__ ((regparm (3), stdcall))
clean_state_log_if_needed (re_match_context_t *mctx, Idx next_state_log_idx)


For function `get_subexp_sub', `regparm(3)' causes the first three arguments to
be passed via registers and the last two via stack. So, the stack size needed
for the actual parameters is 8. Therefore, you will see a `ret $8' instruction
at the end of this function. On the other hand, all the parameters for
`clean_state_log_if_needed' are passed via registers because of the
`regparm(3)' declaration. So, you will see a `ret' instruction at the end
instead (no sp adjustment needed). Because the `sp' adjustment is different for
these two functions, the call to `clean_state_log_if_needed' cannot be
converted into a `jump' instruction in this case. The original code fails to
check to see if the stack adjustments are same.

Bug 2: m4 runtime failure when compiled with `-Ofast' (-m32)
------------------------------------------------------------
This bug is also caused by `stdcall'. In function `Initialize_Stack_Frame',
`Calc_Formal_Area' is called to compute the stack area from types of the formal
parameters. On the other hand, when processing a callsite, `Calc_Actual_Area'
is called to compute the stack area from types of the *actual* parameters.
Ideally, the results should be same for the same function. The compiler also
assumes so, since it uses `arg_area_size_array' to store the computation result
(regardless whether it is from `Calc_Formal_Area' or `Calc_Actual_Area'), so
that only one such computation is needed for each function type.

In m4 code, we have the following two functions:

static reg_errcode_t __attribute__ ((regparm (3), stdcall))
re_search_internal (const regex_t *preg, const char *string, Idx length,
     Idx start, Idx last_start, Idx stop, size_t nmatch, regmatch_t pmatch[],
int eflags) //calls set_regs

static reg_errcode_t __attribute__ ((regparm (3), stdcall))
set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t nmatch,
regmatch_t *pmatch, _Bool fl_backtrack)

For `set_regs' function, because of `regparm(3)', the first 3 arguments are
passed via registers. So, for the call to `set_regs', only the last two
arguments need to be passed via stack. The last parameter of `set_regs' is of
_Bool type, which is of size 1 byte. However, the actual parameter needs to be
at least 4 byte (I4) even for _Bool. So, the stack size computed from
`Calc_Actual_Area' is 8 while it is 5 from `Calc_Formal_Area'. So, now what you
have in the generated code looks like:

re_search_internal:
          call set_regs
          sub $8, sp

set_regs:
        ...
        ret $5

Again, this is caused inconsistency in sp adjustment. The fix is to take
account of padding in `Calc_Formal_Area' as well because of stack alignment.

Additional Notes
----------------
1. Bug 2 does not happen in `-O3' because the compiler sees calls to `set_regs'
before compiling `set_regs'. Therefore, the stack size computed is 8. Then,
when `set_regs' function is being compiled, because the `arg_area_size_array'
has already been filled for this function type, `Calc_Formal_Area' does not
re-compute. So you get `ret $8' at the end of `set_regs' function.


Code Reviewed by Jian-Xin



Modified: trunk/osprey/be/cg/calls.cxx
===================================================================
--- trunk/osprey/be/cg/calls.cxx        2011-05-08 16:05:12 UTC (rev 3591)
+++ trunk/osprey/be/cg/calls.cxx        2011-05-10 16:27:46 UTC (rev 3592)
@@ -1182,6 +1182,50 @@
     }
   }
 
+  /* Don't perform tail call optimization if the caller and callee have 
different stack
+     adjustment size
+     see: Generate_Exit
+     */
+  if ( Is_Target_32bit() && call_st ){
+      // call_st might be NULL in indirect calls;
+      const TY_IDX caller_ty = ST_pu_type(pu_st);
+      const BOOL   caller_ff2c_abi = PU_ff2c_abi(Pu_Table[ST_pu(pu_st)]);
+      const RETURN_INFO caller_return_info = 
+          Get_Return_Info( TY_ret_type(caller_ty), No_Simulated, 
caller_ff2c_abi );
+
+      const TY_IDX callee_ty = func_type;
+      const BOOL   callee_ff2c_abi = PU_ff2c_abi(Pu_Table[ST_pu(call_st)]);
+      const RETURN_INFO callee_return_info =
+          Get_Return_Info( TY_ret_type(callee_ty), No_Simulated, 
callee_ff2c_abi );
+      
+      int   caller_sp_adjust = 0;
+      int   callee_sp_adjust = 0;
+
+      if( RETURN_INFO_return_via_first_arg(caller_return_info) ||
+              TY_return_to_param( caller_ty ) ){
+          caller_sp_adjust = Pointer_Size;
+      }
+
+      // callee adjust SP for stdcall/fastcall at return time
+      if (TY_has_stdcall(caller_ty) || TY_has_fastcall(caller_ty)) {
+          caller_sp_adjust += Get_PU_arg_area_size(caller_ty);
+      }
+
+      if( RETURN_INFO_return_via_first_arg(callee_return_info) ||
+              TY_return_to_param( callee_ty ) ){
+          callee_sp_adjust = Pointer_Size;
+      }
+
+      // callee adjust SP for stdcall/fastcall at return time
+      if (TY_has_stdcall(callee_ty) || TY_has_fastcall(callee_ty)) {
+          callee_sp_adjust += Get_PU_arg_area_size(callee_ty);
+      }
+
+      if ( caller_sp_adjust != callee_sp_adjust ) {
+          return NULL;
+      }
+  }
+
   /* Under -m32 -fpic, don't do tail call optimization, because the caller
      needs to restore GOT before return.
   */

Modified: trunk/osprey/common/com/x8664/targ_sim.cxx
===================================================================
--- trunk/osprey/common/com/x8664/targ_sim.cxx  2011-05-08 16:05:12 UTC (rev 
3591)
+++ trunk/osprey/common/com/x8664/targ_sim.cxx  2011-05-10 16:27:46 UTC (rev 
3592)
@@ -822,6 +822,13 @@
           if (Is_Target_32bit() && ploc.size == 8) {
             ++Current_Int_Param_Num;
           }
+          // for IA-32, stack needs to be aligned. So, if the last formal 
argument  
+          // needs pading, the total stack size computed from (just) function 
decl
+          // is different from the stack size computed from actual callsite    
+          if (Is_Target_32bit() && PLOC_on_stack(ploc)) {
+              ploc.size += rpad;
+              rpad = 0;
+          }
        }
        break;
        


------------------------------------------------------------------------------
Achieve unprecedented app performance and reliability
What every C/C++ and Fortran developer should know.
Learn how Intel has extended the reach of its next-generation tools
to help boost performance applications - inlcuding clusters.
http://p.sf.net/sfu/intel-dev2devmay
_______________________________________________
Open64-devel mailing list
Open64-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/open64-devel

Reply via email to