be: cg lno

Sun Chan Thu, 05 May 2011 23:19:10 -0700

Folks,
Any suggestion or opinion? Mike? Dehnert? ...

I understand this is highly personal, but it also make sense that
people try to follow similar practices. Let me throw in my
interpretation of what open64 practice so far. Please feel free to say
otherwise.
For each source file (.cxx), we have a header file associated with it.
Mainly, the header file consists of related class declarations and
their interfaces. It also specifies the visibility rule for their
interfaces.
A source file (.cxx and .h file included) is a source file, visibility
is enforced through use of "private", "friend" or related language
features. IOWs, using file inclusion "physically" is not the practice
in open64 source code to achieve "privacy".


Sun

On Fri, May 6, 2011 at 7:56 AM, Sun Chan <sun.c...@gmail.com> wrote:
> STL files are pretty big too. I think this topic deserves more input
> from fellow developers.
> I had problem downloading the original attachment. I will look at the
> .h file in question more carefully.
>
> Sun
>
> On Fri, May 6, 2011 at 7:47 AM, Ye, Mei <mei...@amd.com> wrote:
>> Hi Sun,
>>
>> In private Email exchanges, the author give the following reasons.  I 
>> granted a OK.
>> Please check whether these reasons make sense for you.  Thanks.
>>
>> -Mei
>>
>> Reason 1: to make external header file concise.
>>
>>   Do you feel comfortable if the header file has more than 1000 lines of 
>> code? I prefer it has only, say 50 lines of code. The concise header file 
>> will quickly help you to figure out which function is appropriate to be 
>> called.
>>
>> Need an example? you can examine the region.h (ORC region, I don't have code 
>> at hand). I remember there are more than 2k line of code there. It will took 
>> me quite a while to to figure out how to add a edge.
>>
>> reason 2: prevent internal data structure from being exposed....
>> reason 3: improve compile time...
>>
>>
>>
>> -----Original Message-----
>> From: Sun Chan [mailto:sun.c...@gmail.com]
>> Sent: Thursday, May 05, 2011 4:35 PM
>> To: open64-devel@lists.sourceforge.net
>> Subject: Re: [Open64-devel] r3586 - in trunk/osprey/be: cg lno
>>
>> I thought Mei had requested the #define xxx_INCLUDED
>> be moved to the header file (i.e. not in the .cxx file). This request
>> is consistent with the rest of the source code and common practice.
>>
>> Please fix that or back out your checkin
>>
>> Sun
>>
>> On Fri, May 6, 2011 at 7:15 AM,  <s...@open64.net> wrote:
>>> Author: pallavimathew
>>> Date: 2011-05-05 19:15:18 -0400 (Thu, 05 May 2011)
>>> New Revision: 3586
>>>
>>> Added:
>>>   trunk/osprey/be/lno/simd_util.cxx
>>>   trunk/osprey/be/lno/simd_util.h
>>> Modified:
>>>   trunk/osprey/be/cg/whirl2ops.cxx
>>>   trunk/osprey/be/lno/Makefile.gbase
>>>   trunk/osprey/be/lno/simd.cxx
>>> Log:
>>> This patch:
>>> 1. Introduces an initial object-oriented framework of classes (SIMD_*) to 
>>> represent and manage simd expressions.
>>>
>>> 2. Enhances the representation of constant integer vector and the 
>>> load-from-constant-integer-vector.
>>>
>>>   Before this change, a constant integer (say 4) is vectorized into 
>>> V16I4CONST. i.e a symbolic constant.
>>>   After this change, the vector is represented by:
>>>      U4INTCONST 4
>>>   V16I4I4REPLICATE
>>>
>>>   It is up to CG to determine how generate the code. Currently, the code 
>>> would be very
>>> efficient if the element's value is 0. In this case, we need only one 
>>> arithmetic instruction
>>> "pxor $xmm0, $xmm0". If element's value is non-zero, there are two options:
>>>  a)
>>>    - a.1) load integer to a scalar GPR, and
>>>    - a.2) move the GPR a SIMD register, and
>>>    - a.3) perform resuffle to replicate the element's value to the entire 
>>> vector.
>>>
>>>  b) save the vector as a symbolic constant, substitute the REPLICATE with
>>>     load from the symbolic constant.
>>>
>>>  b) is appealing for vector with short vector-length, say V16I8, and a) is
>>> desirable for vector like V16I1. However, we are blocked at step a.2) --
>>> A SIMD register is categorized as fp register, we have hard time moving a 
>>> int
>>> register to fp register.
>>>
>>> 3. Vectorizes loop with small trip count
>>>
>>>   The original SIMD implementation set a hard trip-count limit for 
>>> vectorization.
>>> This change is to try to vectorize any loop so long as the trip-count >= 
>>> vector-len.
>>> e.g. Following loop can be vectorized now:
>>>
>>>  for (int i=0; i < 2; i++) double_array[i] = (double)float_array[i];
>>>
>>>  TODO in the future:
>>>   For the loop like following, SIMD still try to perform peeling in order
>>> to archieve better alignment. This is a deoptimization if the trip-count is
>>> too small.
>>>
>>>  for (int i=0; i<4; i++) { a[i+1] = 0; }
>>>
>>> Code review by Mei Ye.
>>>
>>>
>>> Modified: trunk/osprey/be/cg/whirl2ops.cxx
>>> ===================================================================
>>> --- trunk/osprey/be/cg/whirl2ops.cxx    2011-05-05 02:52:50 UTC (rev 3585)
>>> +++ trunk/osprey/be/cg/whirl2ops.cxx    2011-05-05 23:15:18 UTC (rev 3586)
>>> @@ -4596,6 +4596,65 @@
>>>
>>>  void dump_op(const OP* op);
>>>
>>> +static TN*
>>> +Handle_Replicate (WN* expr, WN* parent, TN* result) {
>>> +
>>> +  if (result == NULL) { result = Allocate_Result_TN(expr, NULL); }
>>> +
>>> +  WN* elmt_val = WN_kid0(expr);
>>> +  if (WN_operator (elmt_val) == OPR_INTCONST) {
>>> +
>>> +    INT64 value = WN_const_val (elmt_val);
>>> +    if (value == 0) {
>>> +      Build_OP (TOP_pxor, result, result, result, &New_OPs);
>>> +      return result;
>>> +    }
>>> +
>>> +    TYPE_ID elmt_mty = MTYPE_UNKNOWN;
>>> +    TYPE_ID vect_mty = MTYPE_UNKNOWN;
>>> +    switch (WN_opcode (expr)) {
>>> +    case OPC_V16I8I8REPLICA:
>>> +        elmt_mty = MTYPE_I8; vect_mty = MTYPE_V16I8;
>>> +        break;
>>> +
>>> +    case OPC_V16I4I4REPLICA:
>>> +        elmt_mty = MTYPE_I4; vect_mty = MTYPE_V16I4;
>>> +        break;
>>> +
>>> +    case OPC_V16I2I2REPLICA:
>>> +        elmt_mty = MTYPE_I2; vect_mty = MTYPE_V16I2;
>>> +        break;
>>> +
>>> +    case OPC_V16I1I1REPLICA:
>>> +        elmt_mty = MTYPE_I1; vect_mty = MTYPE_V16I1;
>>> +        break;
>>> +    }
>>> +
>>> +    if (elmt_mty != MTYPE_UNKNOWN) {
>>> +        TCON elmt_tcon;
>>> +        if (MTYPE_is_size_double (elmt_mty)) {
>>> +            elmt_tcon = Host_To_Targ(MTYPE_I8, value);
>>> +        } else {
>>> +            elmt_tcon = Host_To_Targ(MTYPE_I4, value);
>>> +        }
>>> +
>>> +        TCON vect_tcon = Create_Simd_Const (vect_mty, elmt_tcon);
>>> +        ST *vect_sym =
>>> +            New_Const_Sym (Enter_tcon (vect_tcon), Be_Type_Tbl(vect_mty));
>>> +        Allocate_Object (vect_sym);
>>> +        TN* vect_tn = Gen_Symbol_TN (vect_sym, 0, 0);
>>> +        Exp_OP1 (OPCODE_make_op (OPR_CONST, vect_mty, MTYPE_V),
>>> +                 result, vect_tn, &New_OPs);
>>> +
>>> +        return result;
>>> +    }
>>> +  }
>>> +  TN* kid_tn  = Expand_Expr (elmt_val, expr, NULL);
>>> +  Expand_Replicate (WN_opcode(expr), result, kid_tn, &New_OPs);
>>> +
>>> +  return result;
>>> +}
>>> +
>>>  static TN*
>>>  Handle_Fma_Operation(WN* expr, TN* result, WN *mul_wn, BOOL mul_kid0)
>>>  {
>>> @@ -5278,6 +5337,9 @@
>>>       return Handle_Shift_Operation(expr, result);
>>>     }
>>>  #elif defined(TARG_X8664)
>>> +  case OPR_REPLICATE:
>>> +    return Handle_Replicate (expr, parent, result);
>>> +
>>>   case OPR_SUB:
>>>   case OPR_ADD:
>>>     if ((CG_opt_level > 1) && Is_Target_Orochi() &&
>>>
>>> Modified: trunk/osprey/be/lno/Makefile.gbase
>>> ===================================================================
>>> --- trunk/osprey/be/lno/Makefile.gbase  2011-05-05 02:52:50 UTC (rev 3585)
>>> +++ trunk/osprey/be/lno/Makefile.gbase  2011-05-05 23:15:18 UTC (rev 3586)
>>> @@ -294,6 +294,7 @@
>>>
>>>  ifeq ($(BUILD_TARGET), X8664)
>>>  BE_LNOPT_NLX_CXX_SRCS += simd.cxx
>>> +BE_LNOPT_NLX_CXX_SRCS += simd_util.cxx
>>>  endif
>>>
>>>  BE_LNOPT_LX_CXX_SRCS = \
>>>
>>> Modified: trunk/osprey/be/lno/simd.cxx
>>> ===================================================================
>>> --- trunk/osprey/be/lno/simd.cxx        2011-05-05 02:52:50 UTC (rev 3585)
>>> +++ trunk/osprey/be/lno/simd.cxx        2011-05-05 23:15:18 UTC (rev 3586)
>>> @@ -85,11 +85,15 @@
>>>  #include "data_layout.h"          // for Stack_Alignment
>>>  #include "cond.h"                  // for Guard_A_Do
>>>  #include "config_opt.h"            // for Align_Unsafe
>>> +#include "be_util.h"               // for Current_PU_Count()
>>>  #include "region_main.h"          // for creating new region id.
>>>  #include "lego_util.h"             // for AWN_StidIntoSym, AWN_Add
>>>  #include "minvariant.h"            // for Minvariant_Removal
>>>  #include "prompf.h"
>>>
>>> +#define simd_util_INCLUDED
>>> +#include "simd_util.h"
>>> +
>>>  #define ABS(a) ((a<0)?-(a):(a))
>>>
>>>  BOOL debug;
>>> @@ -119,38 +123,28 @@
>>>  static void Simd_Mark_Code (WN* wn);
>>>
>>>  static INT Last_Vectorizable_Loop_Id = 0;
>>> +SIMD_VECTOR_CONF Simd_vect_conf;
>>>
>>> -static BOOL Too_Few_Iterations(INT64 iters, WN *body)
>>> +// Return TRUE iff there are too few iterations to generate a single
>>> +// vectorized iteration.
>>> +//
>>> +// One interesting snippet to challenge this function is following:
>>> +//
>>> +//     float f[]; double d[];
>>> +//     for (i = 0; i < 2; i++) { d[i] = (double)f[i]; }
>>> +//
>>> +// This func should not be folled by "f[i]". Currently, it is ok because
>>> +// "(double)f[i]" instead of "f[i]" is considered as vectorizable expr.
>>> +//
>>> +static BOOL Too_Few_Iterations (WN* loop, SCALAR_REF_STACK* vect_exprs)
>>>  {
>>> -   UINT32 iter_threshold = Iteration_Count_Threshold;
>>> -   if(LNO_Iter_threshold)
>>> -     iter_threshold = LNO_Iter_threshold;
>>> +  DO_LOOP_INFO *dli = Get_Do_Loop_Info (loop);
>>> +  if (dli->Est_Num_Iterations >= Simd_vect_conf.Get_Vect_Byte_Size ())
>>> +    return FALSE;
>>>
>>> -   if(iters < iter_threshold) //watch performance
>>> -    return TRUE;
>>> -   if(iters >= 16) //should always be fine, not too few
>>> -    return FALSE;
>>> -   //bug 12056: no matter what Iteration_Count_Threshold is, we should
>>> -   //           make sure at least one iter of the vectorized version
>>> -   for(WN *stmt = WN_first(body); stmt; stmt = WN_next(stmt)){
>>> -    switch(WN_desc(stmt)){
>>> -      case MTYPE_I1: case MTYPE_U1:
>>> -       return TRUE;
>>> -      case MTYPE_I2: case MTYPE_U2:
>>> -       if(iters < 8)
>>> -         return TRUE;
>>> -       break;
>>> -      case MTYPE_I4: case MTYPE_U4: case MTYPE_F4:
>>> -       if(iters < 4)
>>> -         return TRUE;
>>> -       break;
>>> -      case MTYPE_I8: case MTYPE_U8: case MTYPE_F8: case MTYPE_C4:
>>> -       if(iters < 2)
>>> -         return TRUE;
>>> -       break;
>>> -    }//end switch;
>>> -   }//end for
>>> -  return FALSE;
>>> +  SIMD_EXPR_MGR expr_mgr (loop, &SIMD_default_pool);
>>> +  expr_mgr.Convert_From_Lagacy_Expr_List (vect_exprs);
>>> +  return expr_mgr.Get_Max_Vect_Len () > dli->Est_Num_Iterations;
>>>  }
>>>
>>>  // Bug 10136: use a stack to count the number of different
>>> @@ -265,26 +259,27 @@
>>>   case OPR_SUB:
>>>     return TRUE;
>>>   case OPR_MPY:
>>> -    if (rtype == MTYPE_F8 || rtype == MTYPE_F4 ||
>>> -#ifdef TARG_X8664
>>> -       ((rtype == MTYPE_C4 || rtype == MTYPE_C8) && Is_Target_SSE3()) ||
>>> -#endif
>>> -       // I2MPY followed by I2STID is actually I4MPY followed by I2STID
>>> -       // We will distinguish between I4MPY and I2MPY in 
>>> Is_Well_Formed_Simd
>>> -       rtype == MTYPE_I4)
>>> +    if (rtype == MTYPE_F8 || rtype == MTYPE_F4)
>>>       return TRUE;
>>> -    else
>>> -      return FALSE;
>>> +    else if (rtype == MTYPE_I4) {
>>> +           // I2MPY followed by I2STID is actually I4MPY followed by I2STID
>>> +           // We will distinguish between I4MPY and I2MPY in 
>>> Is_Well_Formed_Simd
>>> +      return TRUE;
>>> +    } else if (Simd_vect_conf.Is_SSE3() &&
>>> +               (rtype == MTYPE_C4 || rtype == MTYPE_C8)) {
>>> +      // TODO: explain why requires SSE3
>>> +      return TRUE;
>>> +    }
>>> +
>>> +    return FALSE;
>>> +
>>>   case OPR_DIV:
>>> -    // Look at icc
>>> -    if (rtype == MTYPE_F8 || rtype == MTYPE_F4
>>> -#ifdef TARG_X8664
>>> -        || (rtype == MTYPE_C4 && Is_Target_SSE3())
>>> -#endif
>>> -       )
>>> +    if (rtype == MTYPE_F8 || rtype == MTYPE_F4 ||
>>> +        (rtype == MTYPE_C4 && Simd_vect_conf.Is_SSE3()))
>>>       return TRUE;
>>>     else
>>>       return FALSE;
>>> +
>>>   case OPR_MAX:
>>>   case OPR_MIN:
>>>     if (rtype == MTYPE_F4 || rtype == MTYPE_F8 || rtype == MTYPE_I4)
>>> @@ -304,7 +299,6 @@
>>>     else
>>>       return FALSE;
>>>   case OPR_RSQRT:
>>> -//case OPR_RECIP:
>>>  #ifdef TARG_X8664
>>>   case OPR_ATOMIC_RSQRT:
>>>  #endif
>>> @@ -2769,12 +2763,6 @@
>>>     return FALSE;
>>>    }
>>>
>>> -   //if there are too few iterations, we will not vectorize
>>> -   if(Too_Few_Iterations(dli->Est_Num_Iterations, WN_do_body(innerloop))){
>>> -     sprintf(verbose_msg, "Loop has too few iterations.");
>>> -     return FALSE;
>>> -   }
>>> -
>>>   // Bug 3784
>>>   // Check for useless loops (STID's use_list is empty) of the form
>>>   // do i
>>> @@ -2832,12 +2820,10 @@
>>>           WN_operator(enclosing_parallel_region) != OPR_REGION)
>>>       enclosing_parallel_region =
>>>         LWN_Get_Parent(enclosing_parallel_region);
>>> -#ifdef KEY
>>>     if (PU_cxx_lang(Get_Current_PU()) &&
>>>         Is_Eh_Or_Try_Region(enclosing_parallel_region))
>>>       enclosing_parallel_region =
>>>         LWN_Get_Parent(LWN_Get_Parent(enclosing_parallel_region));
>>> -#endif
>>>     FmtAssert(enclosing_parallel_region, ("NYI"));
>>>     region_pragma = WN_first(WN_region_pragmas(enclosing_parallel_region));
>>>     while(region_pragma && (!reduction || !pdo)) {
>>> @@ -3119,12 +3105,10 @@
>>>               WN_operator(enclosing_parallel_region) != OPR_REGION)
>>>           enclosing_parallel_region =
>>>             LWN_Get_Parent(enclosing_parallel_region);
>>> -#ifdef KEY
>>>         if (PU_cxx_lang(Get_Current_PU()) &&
>>>             Is_Eh_Or_Try_Region(enclosing_parallel_region))
>>>           enclosing_parallel_region =
>>>             LWN_Get_Parent(LWN_Get_Parent(enclosing_parallel_region));
>>> -#endif
>>>         WN *stmt_before_region = WN_prev(enclosing_parallel_region);
>>>         FmtAssert(stmt_before_region, ("NYI"));
>>>         WN *parent_block = LWN_Get_Parent(enclosing_parallel_region);
>>> @@ -3177,6 +3161,11 @@
>>>       return FALSE;
>>>   }
>>>
>>> +  if (Too_Few_Iterations (innerloop, simd_ops)) {
>>> +      sprintf(verbose_msg, "Too few iterations.");
>>> +      return FALSE;
>>> +  }
>>> +
>>>   //WHETHER scalar expansion is required
>>>   for(stmt=WN_first(body); stmt && curr_simd_red_manager; 
>>> stmt=WN_next(stmt)){
>>>     if (WN_operator(stmt) == OPR_STID &&
>>> @@ -4087,6 +4076,19 @@
>>>  // second argument is a constant it can be placed in a 1 byte immediate if 
>>> it fits.
>>>  // But the first option has been chosen because it fits easier with the 
>>> existing framework.
>>>
>>> +static WN* Simd_Vectorize_Shift_Left_Amt (WN* const_wn,
>>> +                                          WN *istore,  //parent of simd_op
>>> +                                          WN *simd_op) //const_wn's parent
>>> +{
>>> +  Is_True (WN_operator(simd_op) == OPR_SHL && WN_kid1(simd_op) == const_wn,
>>> +           ("input WN isn't SHL"));
>>> +
>>> +    WN* shift_amt = WN_Intconst (MTYPE_I8, WN_const_val (const_wn));
>>> +    WN* res = LWN_CreateExp1 (OPCODE_make_op(OPR_REPLICATE, MTYPE_V16I8, 
>>> MTYPE_I8),
>>> +                              shift_amt);
>>> +    return res;
>>> +}
>>> +
>>>  static WN *Simd_Vectorize_Constants(WN *const_wn,//to be vectorized
>>>                                     WN *istore,  //parent of simd_op
>>>                                     WN *simd_op) //const_wn's parent
>>> @@ -4094,6 +4096,10 @@
>>>    FmtAssert(const_wn && (WN_operator(const_wn)==OPR_INTCONST ||
>>>              WN_operator(const_wn)==OPR_CONST),("not a constant operand"));
>>>
>>> +   if (WN_operator(simd_op) == OPR_SHL && WN_kid1(simd_op) == const_wn) {
>>> +        return Simd_Vectorize_Shift_Left_Amt (const_wn, istore, simd_op);
>>> +   }
>>> +
>>>    TYPE_ID type;
>>>    TCON tcon;
>>>    ST *sym;
>>> @@ -4110,17 +4116,9 @@
>>>           WN_intrinsic(istore) == INTRN_SUBSU2) {
>>>         type = WN_desc(LWN_Get_Parent(istore));
>>>     }
>>> -    if (!MTYPE_is_float(type)){
>>> -          if (MTYPE_is_size_double(type)){
>>> -            INT64 value = (INT64)WN_const_val(const_wn);
>>> -            tcon = Host_To_Targ(MTYPE_I8, value);
>>> -          } else {
>>> -            INT value = (INT)WN_const_val(const_wn);
>>> -            tcon = Host_To_Targ(MTYPE_I4, value);
>>> -            }
>>> -          sym = New_Const_Sym (Enter_tcon (tcon),
>>> -                               Be_Type_Tbl(type));
>>> -    }
>>> +
>>> +    WN* orig_const_wn = const_wn;
>>> +
>>>     switch (type) {
>>>      case MTYPE_F4: case MTYPE_V16F4:
>>>           WN_set_rtype(const_wn, MTYPE_V16F4);
>>> @@ -4131,27 +4129,34 @@
>>>      case MTYPE_C4: case MTYPE_V16C4:
>>>           WN_set_rtype(const_wn, MTYPE_V16C4);
>>>           break;
>>> +
>>>      case MTYPE_U1: case MTYPE_I1: case MTYPE_V16I1:
>>> -          const_wn = WN_CreateConst (OPR_CONST, MTYPE_V16I1, MTYPE_V, sym);
>>> +          const_wn =
>>> +            LWN_CreateExp1 (OPCODE_make_op(OPR_REPLICATE, MTYPE_V16I1, 
>>> MTYPE_I1),
>>> +                            orig_const_wn);
>>>           break;
>>> +
>>>      case MTYPE_U2: case MTYPE_I2: case MTYPE_V16I2:
>>> -          if (WN_operator(simd_op) == OPR_SHL && WN_kid1(simd_op) == 
>>> const_wn)
>>> -           const_wn = WN_CreateConst (OPR_CONST, MTYPE_V16I8, MTYPE_V, 
>>> sym);
>>> -         else
>>> -           const_wn = WN_CreateConst (OPR_CONST, MTYPE_V16I2, MTYPE_V, 
>>> sym);
>>> +          const_wn =
>>> +            LWN_CreateExp1 (OPCODE_make_op(OPR_REPLICATE, MTYPE_V16I2, 
>>> MTYPE_I2),
>>> +                            orig_const_wn);
>>>           break;
>>> +
>>>      case MTYPE_U4: case MTYPE_I4: case MTYPE_V16I4:
>>> -          if (WN_operator(simd_op) == OPR_SHL && WN_kid1(simd_op) == 
>>> const_wn)
>>> -           const_wn = WN_CreateConst (OPR_CONST, MTYPE_V16I8, MTYPE_V, 
>>> sym);
>>> -         else
>>> -           const_wn = WN_CreateConst (OPR_CONST, MTYPE_V16I4, MTYPE_V, 
>>> sym);
>>> +          const_wn =
>>> +            LWN_CreateExp1 (OPCODE_make_op(OPR_REPLICATE, MTYPE_V16I4, 
>>> MTYPE_I4),
>>> +                            orig_const_wn);
>>>           break;
>>> +
>>>      case MTYPE_U8: case MTYPE_I8: case MTYPE_V16I8:
>>> -          const_wn = WN_CreateConst (OPR_CONST, MTYPE_V16I8, MTYPE_V, sym);
>>> +          const_wn =
>>> +            LWN_CreateExp1 (OPCODE_make_op(OPR_REPLICATE, MTYPE_V16I8, 
>>> MTYPE_I8),
>>> +                            orig_const_wn);
>>>           break;
>>> -     }//end switch
>>> +
>>> +     } // end switch
>>>
>>> -   return const_wn;
>>> +    return const_wn;
>>>  }
>>>
>>>  static WN *Simd_Vectorize_Invariants(WN *inv_wn,
>>> @@ -5342,8 +5347,9 @@
>>>  // Vectorize an innerloop
>>>  static INT Simd(WN* innerloop)
>>>  {
>>> -// Don't do anything for now for non-x8664
>>> -#ifdef TARG_X8664
>>> +  if (!Simd_vect_conf.Arch_Has_Vect ())
>>> +    return 0;
>>> +
>>>   INT good_vector = 0;
>>>
>>>   //pre_analysis to filter out loops that can not be vectorized
>>> @@ -5360,8 +5366,12 @@
>>>     Last_Vectorizable_Loop_Id ++;
>>>     if (Last_Vectorizable_Loop_Id < LNO_Simd_Loop_Skip_Before ||
>>>        Last_Vectorizable_Loop_Id > LNO_Simd_Loop_Skip_After ||
>>> -       Last_Vectorizable_Loop_Id == LNO_Simd_Loop_Skip_Equal)
>>> +       Last_Vectorizable_Loop_Id == LNO_Simd_Loop_Skip_Equal) {
>>> +      fprintf (stderr, "SIMD: loop (%s:%d) of PU:%d is skipped\n",
>>> +               Src_File_Name, Srcpos_To_Line(WN_Get_Linenum(innerloop)),
>>> +               Current_PU_Count ());
>>>       return 0;
>>> +    }
>>>   }
>>>
>>>   MEM_POOL_Push(&SIMD_default_pool);
>>> @@ -5587,10 +5597,6 @@
>>>   }
>>>
>>>   return 1;
>>> -#else
>>> -  return 0;
>>> -#endif // TARG_X8664
>>> -
>>>  }
>>>
>>>  static void Simd_Walk(WN* wn) {
>>>
>>> Added: trunk/osprey/be/lno/simd_util.cxx
>>> ===================================================================
>>> --- trunk/osprey/be/lno/simd_util.cxx                           (rev 0)
>>> +++ trunk/osprey/be/lno/simd_util.cxx   2011-05-05 23:15:18 UTC (rev 3586)
>>> @@ -0,0 +1,75 @@
>>> +/*
>>> +  Copyright (C) 2010 Advanced Micro Devices, Inc.  All Rights Reserved.
>>> +
>>> +  Open64 is free software; you can redistribute it and/or modify it
>>> +  under the terms of the GNU General Public License as published by
>>> +  the Free Software Foundation; either version 2 of the License,
>>> +  or (at your option) any later version.
>>> +
>>> +  Open64 is distributed in the hope that it will be useful, but
>>> +  WITHOUT ANY WARRANTY; without even the implied warranty of
>>> +  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>> +  GNU General Public License for more details.
>>> +
>>> +  You should have received a copy of the GNU General Public License
>>> +  along with this program; if not, write to the Free Software
>>> +  Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
>>> +  02110-1301, USA.
>>> +*/
>>> +
>>> +#include "defs.h"
>>> +#include "glob.h"
>>> +#include "wn.h"
>>> +#include "cxx_memory.h"
>>> +#include "lwn_util.h"
>>> +#include "ff_utils.h"
>>> +
>>> +#define simd_util_INCLUDED
>>> +#include "simd_util.h"
>>> +
>>> +/////////////////////////////////////////////////////////////////////////////
>>> +//
>>> +//          Implementation of SIMD_EXPR
>>> +//
>>> +/////////////////////////////////////////////////////////////////////////////
>>> +//
>>> +SIMD_EXPR::SIMD_EXPR (WN* expr) {
>>> +    _expr= expr;
>>> +
>>> +    _elem_sz = MTYPE_byte_size (WN_rtype (expr));
>>> +    _vect_len = Simd_vect_conf.Get_Vect_Len_Given_Elem_Ty (WN_rtype(expr));
>>> +
>>> +    _mis_align = -1;
>>> +    _is_invar = FALSE;
>>> +}
>>> +
>>> +/////////////////////////////////////////////////////////////////////////////
>>> +//
>>> +//          Implementation of SIMD_EXPR_MGR
>>> +//
>>> +/////////////////////////////////////////////////////////////////////////////
>>> +//
>>> +SIMD_EXPR_MGR::SIMD_EXPR_MGR (WN* loop, MEM_POOL* mp):
>>> +    _loop(loop), _mp(mp), _exprs(mp) {
>>> +
>>> +    _min_vect_len = _max_vect_len = 0;
>>> +}
>>> +
>>> +void
>>> +SIMD_EXPR_MGR::Convert_From_Lagacy_Expr_List (SCALAR_REF_STACK* simd_ops) {
>>> +
>>> +    Is_True (_exprs.empty (), ("expr is not empty"));
>>> +
>>> +    _min_vect_len = Simd_vect_conf.Get_Vect_Byte_Size ();
>>> +    _max_vect_len = 0;
>>> +
>>> +    for (INT i=0, elem_cnt = simd_ops->Elements(); i<elem_cnt; i++) {
>>> +        WN* wn_expr = simd_ops->Top_nth(i).Wn;
>>> +        SIMD_EXPR* expr = CXX_NEW (SIMD_EXPR (wn_expr), _mp);
>>> +
>>> +        _exprs.push_back (expr);
>>> +        INT vec_len = expr->Get_Vect_Len ();
>>> +        _min_vect_len = MIN(vec_len, _min_vect_len);
>>> +        _max_vect_len = MAX(vec_len, _max_vect_len);
>>> +    }
>>> +}
>>>
>>> Added: trunk/osprey/be/lno/simd_util.h
>>> ===================================================================
>>> --- trunk/osprey/be/lno/simd_util.h                             (rev 0)
>>> +++ trunk/osprey/be/lno/simd_util.h     2011-05-05 23:15:18 UTC (rev 3586)
>>> @@ -0,0 +1,196 @@
>>> +/*
>>> +  Copyright (C) 2010 Advanced Micro Devices, Inc.  All Rights Reserved.
>>> +
>>> +  Open64 is free software; you can redistribute it and/or modify it
>>> +  under the terms of the GNU General Public License as published by
>>> +  the Free Software Foundation; either version 2 of the License,
>>> +  or (at your option) any later version.
>>> +
>>> +  Open64 is distributed in the hope that it will be useful, but
>>> +  WITHOUT ANY WARRANTY; without even the implied warranty of
>>> +  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>> +  GNU General Public License for more details.
>>> +
>>> +  You should have received a copy of the GNU General Public License
>>> +  along with this program; if not, write to the Free Software
>>> +  Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
>>> +  02110-1301, USA.
>>> +*/
>>> +
>>> +#ifndef simd_util_INCLUDED
>>> +    #error simd_util.h is for internal use only
>>> +#endif
>>> +
>>> +#include <list>
>>> +
>>> +// Forward declaration
>>> +//
>>> +class SIMD_EXPR;
>>> +class SIMD_EXPR_MGR;
>>> +class SIMD_VECTOR_CONF_BASE;
>>> +class SIMD_VECTOR_CONF;
>>> +
>>> +/////////////////////////////////////////////////////////////////////////////////
>>> +//
>>> +//   Arch specific stuff are encapsulated by SIMD_VECTOR_CONF_BASE and
>>> +//  SIMD_VECTOR_CONF.
>>> +//
>>> +//   TODO: it would be better to place these stuff in a separate header 
>>> file
>>> +//
>>> +/////////////////////////////////////////////////////////////////////////////////
>>> +//
>>> +class SIMD_VECTOR_CONF_BASE {
>>> +public:
>>> +    // Does H.W support vectorization
>>> +    BOOL Arch_Has_Vect (void) const { return FALSE; }
>>> +
>>> +    // About SSE
>>> +    //
>>> +    BOOL Is_SSE_Family (void)   const { return FALSE; }
>>> +    BOOL Is_MMX (void)   const { return FALSE; }
>>> +    BOOL Is_SSE (void)   const { return FALSE; }
>>> +    BOOL Is_SSE2 (void)  const { return FALSE; }
>>> +    BOOL Is_SSE3 (void)  const { return FALSE; }
>>> +    BOOL Is_SSE4a (void) const { return FALSE; }
>>> +    BOOL Is_SSSE3 (void) const { return FALSE; }
>>> +    BOOL Is_SSE41 (void) const { return FALSE; }
>>> +    BOOL Is_SSE42 (void) const { return FALSE; }
>>> +
>>> +    INT Get_Vect_Byte_Size (void) const { return -1; }
>>> +    INT Get_Vect_Len_Given_Elem_Ty (TYPE_ID) const { -1; }
>>> +};
>>> +
>>> +#ifdef TARG_X8664
>>> +
>>> +class SIMD_VECTOR_CONF : public SIMD_VECTOR_CONF_BASE {
>>> +public:
>>> +    BOOL Arch_Has_Vect (void) const { return TRUE; }
>>> +
>>> +    BOOL Is_MMX (void)   const { return Is_Target_MMX (); }
>>> +    BOOL Is_SSE (void)   const { return Is_Target_SSE (); }
>>> +    BOOL Is_SSE2 (void)  const { return Is_Target_SSE2 (); }
>>> +    BOOL Is_SSE3 (void)  const { return Is_Target_SSE3 (); }
>>> +    BOOL Is_SSE4a (void) const { return Is_Target_SSE4a (); }
>>> +    BOOL Is_SSSE3 (void) const { return Is_Target_SSSE3 (); }
>>> +    BOOL Is_SSE41 (void) const { return Is_Target_SSE41 (); }
>>> +    BOOL Is_SSE42 (void) const { return Is_Target_SSE42 (); }
>>> +    BOOL Is_SSE_Family (void) const {
>>> +        return Is_SSE () || Is_SSE2 () || Is_SSE3 () ||
>>> +               Is_SSE4a () || Is_SSSE3 () || Is_SSE41 () ||
>>> +               Is_SSE42 ();
>>> +    }
>>> +
>>> +    INT Get_Vect_Byte_Size (void) const { return 16; }
>>> +    INT Get_Vect_Len_Given_Elem_Ty (TYPE_ID t) const
>>> +        { return 16/MTYPE_byte_size(t);}
>>> +};
>>> +
>>> +#else
>>> +
>>> +class SIMD_VECTOR_CONF : public SIMD_VECTOR_CONF_BASE;
>>> +
>>> +#endif
>>> +
>>> +extern SIMD_VECTOR_CONF Simd_vect_conf;
>>> +
>>> +/////////////////////////////////////////////////////////////////////////////////
>>> +//
>>> +//   First of all, SIMD_EXPR is a container hosting vectorization related
>>> +// informations. Among all these information, some can be derived directly 
>>> from
>>> +// the given WN expression itself; some need context. For instance, in
>>> +// the following snippet, the vectorizable expression "(x * 
>>> (INT32)sa2[i])" doesn't
>>> +// need to have 32 significant bits. However, the expression per se cannot 
>>> reveal
>>> +// this info, but the "contex" will help.
>>> +//
>>> +//    INT16 sa1[], sa2[]; INT32 x;
>>> +//    for (i = 0; i < N; i++) { sa1[i] = (INT16)(x * (INT32)sa2[i])
>>> +//
>>> +//   Since a SIMD_EXPR is not aware of the "context" it is in, it has to 
>>> "derive"
>>> +// information blindly, and imprecisely. The objects who have better 
>>> knowledge
>>> +// of the context should correct them properly.
>>> +//
>>> +//   Second, SIMD_EXPR is responsible for physically converting its 
>>> corresponding
>>> +// scalar expression into vectorized form.
>>> +//
>>> +//////////////////////////////////////////////////////////////////////////////////
>>> +//
>>> +class SIMD_EXPR {
>>> +public:
>>> +    friend class SIMD_EXPR_MGR;
>>> +
>>> +    INT32 Get_Misalignment (void) { Is_True (FALSE, ("TBD")); return -1; }
>>> +
>>> +    INT32 Get_Vect_Len (void) const { return _vect_len; }
>>> +    INT32 Get_Vect_Elem_Byte_Sz (void) const { return _elem_sz; }
>>> +
>>> +    BOOL Is_Invar (void) const { return _is_invar; }
>>> +    WN* Get_Wn (void) const { return _expr; }
>>> +
>>> +private:
>>> +    SIMD_EXPR (WN* expr);
>>> +
>>> +    void Set_Elem_Sz (INT sz);
>>> +
>>> +    WN* _expr;
>>> +
>>> +    INT16 _vect_len;
>>> +    INT16 _elem_sz;
>>> +    INT16 _mis_align;
>>> +
>>> +    BOOL _is_invar;
>>> +};
>>> +
>>> +typedef mempool_allocator<SIMD_EXPR*> SIMD_EXPR_ALLOC;
>>> +typedef std::list<SIMD_EXPR*, SIMD_EXPR_ALLOC> SIMD_EXPR_LIST;
>>> +
>>> +
>>> +//////////////////////////////////////////////////////////////////////////////
>>> +//
>>> +//   SIMD_EXPR_MGR is to manage all SIMD_EXPRs of the loop being 
>>> vectorized.
>>> +// Its duty includes:
>>> +//
>>> +//   - identify vectorizable expressions.
>>> +//   - allocate/free a SIMD_EXPR.
>>> +//   - collect statistical information of the SIMD_EXPRs under management
>>> +//
>>> +/////////////////////////////////////////////////////////////////////////////
>>> +//
>>> +class SIMD_EXPR_MGR {
>>> +public:
>>> +    SIMD_EXPR_MGR (WN* loop, MEM_POOL*);
>>> +    const SIMD_EXPR_LIST& Get_Expr_List (void) const { return _exprs; }
>>> +
>>> +    // This func is provided for the time being.
>>> +    //
>>> +    void Convert_From_Lagacy_Expr_List (SCALAR_REF_STACK*);
>>> +
>>> +    inline UINT Get_Max_Vect_Len (void) const;
>>> +    inline UINT Get_Min_Vect_Len (void) const;
>>> +
>>> +private:
>>> +    MEM_POOL* _mp;
>>> +    WN* _loop;
>>> +    SIMD_EXPR_LIST _exprs;
>>> +
>>> +    UINT16 _min_vect_len;
>>> +    UINT16 _max_vect_len;
>>> +};
>>> +
>>> +
>>> +//////////////////////////////////////////////////////////////////////////////
>>> +//
>>> +//          Inline functions are defined here
>>> +//
>>> +//////////////////////////////////////////////////////////////////////////////
>>> +//
>>> +inline UINT
>>> +SIMD_EXPR_MGR::Get_Max_Vect_Len (void) const {
>>> +    Is_True (_max_vect_len != 0, ("_max_vect_len isn't set properly"));
>>> +    return _max_vect_len;
>>> +}
>>> +
>>> +inline UINT
>>> +SIMD_EXPR_MGR::Get_Min_Vect_Len (void) const {
>>> +    Is_True (_min_vect_len != 0, ("_min_vect_len isn't set properly"));
>>> +    return _min_vect_len;
>>> +}
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> WhatsUp Gold - Download Free Network Management Software
>>> The most intuitive, comprehensive, and cost-effective network
>>> management toolset available today.  Delivers lowest initial
>>> acquisition cost and overall TCO of any competing solution.
>>> http://p.sf.net/sfu/whatsupgold-sd
>>> _______________________________________________
>>> Open64-devel mailing list
>>> Open64-devel@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/open64-devel
>>>
>>
>> ------------------------------------------------------------------------------
>> WhatsUp Gold - Download Free Network Management Software
>> The most intuitive, comprehensive, and cost-effective network
>> management toolset available today.  Delivers lowest initial
>> acquisition cost and overall TCO of any competing solution.
>> http://p.sf.net/sfu/whatsupgold-sd
>> _______________________________________________
>> Open64-devel mailing list
>> Open64-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/open64-devel
>>
>>
>>
>

------------------------------------------------------------------------------
WhatsUp Gold - Download Free Network Management Software
The most intuitive, comprehensive, and cost-effective network 
management toolset available today.  Delivers lowest initial 
acquisition cost and overall TCO of any competing solution.
http://p.sf.net/sfu/whatsupgold-sd
_______________________________________________
Open64-devel mailing list
Open64-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/open64-devel

Re: [Open64-devel] r3586 - in trunk/osprey/be: cg lno

Reply via email to