STL headers are pretty big too. I think this topic deserves more input
from fellow developers.
I had a problem downloading the original attachment. I will look at the
.h file in question more carefully.

Sun

On Fri, May 6, 2011 at 7:47 AM, Ye, Mei <mei...@amd.com> wrote:
> Hi Sun,
>
> In private email exchanges, the author gave the following reasons, and I granted
> an OK.
> Please check whether these reasons make sense to you.  Thanks.
>
> -Mei
>
> Reason 1: to make the external header file concise.
>
>   Do you feel comfortable if the header file has more than 1000 lines of
> code? I prefer it to have only, say, 50 lines. A concise header file quickly
> helps you figure out which function is appropriate to call.
>
> Need an example? You can examine region.h (the ORC region header; I don't have
> the code at hand). I remember there are more than 2k lines of code there. It
> took me quite a while to figure out how to add an edge.
>
> Reason 2: prevent internal data structures from being exposed....
> Reason 3: improve compile time...
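>
> To illustrate reasons 1 and 2, here is a minimal sketch of the usual "thin
> header" approach; the REGION names are hypothetical, only loosely inspired by
> the region.h example above:
>
>   // region.h -- the small external header that clients include
>   class REGION;                               // opaque: layout not exposed
>   REGION* Region_Create();
>   void    Region_Add_Edge(REGION* r, int from, int to);
>
>   // region_impl.h / region.cxx -- the other ~1000 lines live here, internal only
>   class REGION { /* full data members, helper classes, ... */ };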
>
>
>
> -----Original Message-----
> From: Sun Chan [mailto:sun.c...@gmail.com]
> Sent: Thursday, May 05, 2011 4:35 PM
> To: open64-devel@lists.sourceforge.net
> Subject: Re: [Open64-devel] r3586 - in trunk/osprey/be: cg lno
>
> I thought Mei had requested that the #define xxx_INCLUDED
> be moved to the header file (i.e., not in the .cxx file). This request
> is consistent with the rest of the source code and common practice.
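>
> For reference, the pattern in question, as it appears in the patch below
> (simd_util.h and simd.cxx), is roughly:
>
>   // simd_util.h -- refuses to be included by outsiders
>   #ifndef simd_util_INCLUDED
>       #error simd_util.h is for internal use only
>   #endif
>
>   // simd.cxx / simd_util.cxx -- each internal client defines the macro first
>   #define simd_util_INCLUDED
>   #include "simd_util.h"
>
> whereas the conventional include guard puts the #define inside the header
> itself:
>
>   // simd_util.h
>   #ifndef simd_util_INCLUDED
>   #define simd_util_INCLUDED
>   // ... declarations ...
>   #endif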
>
> Please fix that or back out your checkin
>
> Sun
>
> On Fri, May 6, 2011 at 7:15 AM,  <s...@open64.net> wrote:
>> Author: pallavimathew
>> Date: 2011-05-05 19:15:18 -0400 (Thu, 05 May 2011)
>> New Revision: 3586
>>
>> Added:
>>   trunk/osprey/be/lno/simd_util.cxx
>>   trunk/osprey/be/lno/simd_util.h
>> Modified:
>>   trunk/osprey/be/cg/whirl2ops.cxx
>>   trunk/osprey/be/lno/Makefile.gbase
>>   trunk/osprey/be/lno/simd.cxx
>> Log:
>> This patch:
>> 1. Introduces an initial object-oriented framework of classes (SIMD_*) to 
>> represent and manage simd expressions.
>>
>> 2. Enhances the representation of constant integer vector and the 
>> load-from-constant-integer-vector.
>>
>>   Before this change, a constant integer (say 4) is vectorized into
>> V16I4CONST, i.e., a symbolic constant.
>>   After this change, the vector is represented by:
>>      U4INTCONST 4
>>   V16I4I4REPLICATE
>>
>>   It is up to CG to determine how to generate the code. Currently, the code
>> would be very efficient if the element's value is 0; in this case, we need
>> only one arithmetic instruction, "pxor $xmm0, $xmm0". If the element's value
>> is non-zero, there are two options:
>>  a)
>>    - a.1) load integer to a scalar GPR, and
>>    - a.2) move the GPR to a SIMD register, and
>>    - a.3) perform a reshuffle to replicate the element's value to the entire
>> vector.
>>
>>  b) save the vector as a symbolic constant, and substitute the REPLICATE with
>>     a load from the symbolic constant.
>>
>>  b) is appealing for vectors with a short vector length, say V16I8, and a) is
>> desirable for vectors like V16I1. However, we are blocked at step a.2) --
>> a SIMD register is categorized as an fp register, and we have a hard time
>> moving an int register to an fp register.
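>>
>>   For illustration only, options a) and b) correspond roughly to the following
>> SSE2 intrinsics (standard <emmintrin.h> calls, not what CG emits; shown just to
>> make the trade-off concrete):
>>
>>   #include <emmintrin.h>                     // SSE2 intrinsics
>>
>>   // Option a): broadcast a scalar into all lanes. Compilers typically lower
>>   // _mm_set1_epi32 to a GPR->XMM move (movd, step a.2) followed by a shuffle
>>   // (pshufd, step a.3).
>>   __m128i replicate_a(int x) { return _mm_set1_epi32(x); }
>>
>>   // The zero element is the cheap special case: a single pxor.
>>   __m128i replicate_zero()   { return _mm_setzero_si128(); }
>>
>>   // Option b): materialize the whole vector as a constant in memory and load it.
>>   alignas(16) static const int four_vec[4] = { 4, 4, 4, 4 };
>>   __m128i replicate_b()      { return _mm_load_si128((const __m128i*)four_vec); }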
>>
>> 3. Vectorizes loops with small trip counts
>>
>>   The original SIMD implementation set a hard trip-count limit for
>> vectorization. This change tries to vectorize any loop as long as the
>> trip-count >= vector-len.
>> e.g., the following loop can be vectorized now:
>>
>>  for (int i=0; i < 2; i++) double_array[i] = (double)float_array[i];
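>>
>>  (With a 16-byte vector and 8-byte F8 elements, the vector length is 16/8 = 2,
>>  so a trip count of 2 is just enough for one vectorized iteration; see
>>  Get_Vect_Len_Given_Elem_Ty in simd_util.h below.)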
>>
>>  TODO in the future:
>>   For a loop like the following, SIMD still tries to perform peeling in order
>> to achieve better alignment. This is a deoptimization if the trip-count is
>> too small.
>>
>>  for (int i=0; i<4; i++) { a[i+1] = 0; }
>>
>> Code review by Mei Ye.
>>
>>
>> Modified: trunk/osprey/be/cg/whirl2ops.cxx
>> ===================================================================
>> --- trunk/osprey/be/cg/whirl2ops.cxx    2011-05-05 02:52:50 UTC (rev 3585)
>> +++ trunk/osprey/be/cg/whirl2ops.cxx    2011-05-05 23:15:18 UTC (rev 3586)
>> @@ -4596,6 +4596,65 @@
>>
>>  void dump_op(const OP* op);
>>
>> +static TN*
>> +Handle_Replicate (WN* expr, WN* parent, TN* result) {
>> +
>> +  if (result == NULL) { result = Allocate_Result_TN(expr, NULL); }
>> +
>> +  WN* elmt_val = WN_kid0(expr);
>> +  if (WN_operator (elmt_val) == OPR_INTCONST) {
>> +
>> +    INT64 value = WN_const_val (elmt_val);
>> +    if (value == 0) {
>> +      Build_OP (TOP_pxor, result, result, result, &New_OPs);
>> +      return result;
>> +    }
>> +
>> +    TYPE_ID elmt_mty = MTYPE_UNKNOWN;
>> +    TYPE_ID vect_mty = MTYPE_UNKNOWN;
>> +    switch (WN_opcode (expr)) {
>> +    case OPC_V16I8I8REPLICA:
>> +        elmt_mty = MTYPE_I8; vect_mty = MTYPE_V16I8;
>> +        break;
>> +
>> +    case OPC_V16I4I4REPLICA:
>> +        elmt_mty = MTYPE_I4; vect_mty = MTYPE_V16I4;
>> +        break;
>> +
>> +    case OPC_V16I2I2REPLICA:
>> +        elmt_mty = MTYPE_I2; vect_mty = MTYPE_V16I2;
>> +        break;
>> +
>> +    case OPC_V16I1I1REPLICA:
>> +        elmt_mty = MTYPE_I1; vect_mty = MTYPE_V16I1;
>> +        break;
>> +    }
>> +
>> +    if (elmt_mty != MTYPE_UNKNOWN) {
>> +        TCON elmt_tcon;
>> +        if (MTYPE_is_size_double (elmt_mty)) {
>> +            elmt_tcon = Host_To_Targ(MTYPE_I8, value);
>> +        } else {
>> +            elmt_tcon = Host_To_Targ(MTYPE_I4, value);
>> +        }
>> +
>> +        TCON vect_tcon = Create_Simd_Const (vect_mty, elmt_tcon);
>> +        ST *vect_sym =
>> +            New_Const_Sym (Enter_tcon (vect_tcon), Be_Type_Tbl(vect_mty));
>> +        Allocate_Object (vect_sym);
>> +        TN* vect_tn = Gen_Symbol_TN (vect_sym, 0, 0);
>> +        Exp_OP1 (OPCODE_make_op (OPR_CONST, vect_mty, MTYPE_V),
>> +                 result, vect_tn, &New_OPs);
>> +
>> +        return result;
>> +    }
>> +  }
>> +  TN* kid_tn  = Expand_Expr (elmt_val, expr, NULL);
>> +  Expand_Replicate (WN_opcode(expr), result, kid_tn, &New_OPs);
>> +
>> +  return result;
>> +}
>> +
>>  static TN*
>>  Handle_Fma_Operation(WN* expr, TN* result, WN *mul_wn, BOOL mul_kid0)
>>  {
>> @@ -5278,6 +5337,9 @@
>>       return Handle_Shift_Operation(expr, result);
>>     }
>>  #elif defined(TARG_X8664)
>> +  case OPR_REPLICATE:
>> +    return Handle_Replicate (expr, parent, result);
>> +
>>   case OPR_SUB:
>>   case OPR_ADD:
>>     if ((CG_opt_level > 1) && Is_Target_Orochi() &&
>>
>> Modified: trunk/osprey/be/lno/Makefile.gbase
>> ===================================================================
>> --- trunk/osprey/be/lno/Makefile.gbase  2011-05-05 02:52:50 UTC (rev 3585)
>> +++ trunk/osprey/be/lno/Makefile.gbase  2011-05-05 23:15:18 UTC (rev 3586)
>> @@ -294,6 +294,7 @@
>>
>>  ifeq ($(BUILD_TARGET), X8664)
>>  BE_LNOPT_NLX_CXX_SRCS += simd.cxx
>> +BE_LNOPT_NLX_CXX_SRCS += simd_util.cxx
>>  endif
>>
>>  BE_LNOPT_LX_CXX_SRCS = \
>>
>> Modified: trunk/osprey/be/lno/simd.cxx
>> ===================================================================
>> --- trunk/osprey/be/lno/simd.cxx        2011-05-05 02:52:50 UTC (rev 3585)
>> +++ trunk/osprey/be/lno/simd.cxx        2011-05-05 23:15:18 UTC (rev 3586)
>> @@ -85,11 +85,15 @@
>>  #include "data_layout.h"          // for Stack_Alignment
>>  #include "cond.h"                  // for Guard_A_Do
>>  #include "config_opt.h"            // for Align_Unsafe
>> +#include "be_util.h"               // for Current_PU_Count()
>>  #include "region_main.h"          // for creating new region id.
>>  #include "lego_util.h"             // for AWN_StidIntoSym, AWN_Add
>>  #include "minvariant.h"            // for Minvariant_Removal
>>  #include "prompf.h"
>>
>> +#define simd_util_INCLUDED
>> +#include "simd_util.h"
>> +
>>  #define ABS(a) ((a<0)?-(a):(a))
>>
>>  BOOL debug;
>> @@ -119,38 +123,28 @@
>>  static void Simd_Mark_Code (WN* wn);
>>
>>  static INT Last_Vectorizable_Loop_Id = 0;
>> +SIMD_VECTOR_CONF Simd_vect_conf;
>>
>> -static BOOL Too_Few_Iterations(INT64 iters, WN *body)
>> +// Return TRUE iff there are too few iterations to generate a single
>> +// vectorized iteration.
>> +//
>> +// One interesting snippet to challenge this function is the following:
>> +//
>> +//     float f[]; double d[];
>> +//     for (i = 0; i < 2; i++) { d[i] = (double)f[i]; }
>> +//
>> +// This func should not be fooled by "f[i]". Currently, it is OK because
>> +// "(double)f[i]", rather than "f[i]", is considered the vectorizable expr.
>> +//
>> +static BOOL Too_Few_Iterations (WN* loop, SCALAR_REF_STACK* vect_exprs)
>>  {
>> -   UINT32 iter_threshold = Iteration_Count_Threshold;
>> -   if(LNO_Iter_threshold)
>> -     iter_threshold = LNO_Iter_threshold;
>> +  DO_LOOP_INFO *dli = Get_Do_Loop_Info (loop);
>> +  if (dli->Est_Num_Iterations >= Simd_vect_conf.Get_Vect_Byte_Size ())
>> +    return FALSE;
>>
>> -   if(iters < iter_threshold) //watch performance
>> -    return TRUE;
>> -   if(iters >= 16) //should always be fine, not too few
>> -    return FALSE;
>> -   //bug 12056: no matter what Iteration_Count_Threshold is, we should
>> -   //           make sure at least one iter of the vectorized version
>> -   for(WN *stmt = WN_first(body); stmt; stmt = WN_next(stmt)){
>> -    switch(WN_desc(stmt)){
>> -      case MTYPE_I1: case MTYPE_U1:
>> -       return TRUE;
>> -      case MTYPE_I2: case MTYPE_U2:
>> -       if(iters < 8)
>> -         return TRUE;
>> -       break;
>> -      case MTYPE_I4: case MTYPE_U4: case MTYPE_F4:
>> -       if(iters < 4)
>> -         return TRUE;
>> -       break;
>> -      case MTYPE_I8: case MTYPE_U8: case MTYPE_F8: case MTYPE_C4:
>> -       if(iters < 2)
>> -         return TRUE;
>> -       break;
>> -    }//end switch;
>> -   }//end for
>> -  return FALSE;
>> +  SIMD_EXPR_MGR expr_mgr (loop, &SIMD_default_pool);
>> +  expr_mgr.Convert_From_Lagacy_Expr_List (vect_exprs);
>> +  return expr_mgr.Get_Max_Vect_Len () > dli->Est_Num_Iterations;
>>  }
>>
>>  // Bug 10136: use a stack to count the number of different
>> @@ -265,26 +259,27 @@
>>   case OPR_SUB:
>>     return TRUE;
>>   case OPR_MPY:
>> -    if (rtype == MTYPE_F8 || rtype == MTYPE_F4 ||
>> -#ifdef TARG_X8664
>> -       ((rtype == MTYPE_C4 || rtype == MTYPE_C8) && Is_Target_SSE3()) ||
>> -#endif
>> -       // I2MPY followed by I2STID is actually I4MPY followed by I2STID
>> -       // We will distinguish between I4MPY and I2MPY in Is_Well_Formed_Simd
>> -       rtype == MTYPE_I4)
>> +    if (rtype == MTYPE_F8 || rtype == MTYPE_F4)
>>       return TRUE;
>> -    else
>> -      return FALSE;
>> +    else if (rtype == MTYPE_I4) {
>> +           // I2MPY followed by I2STID is actually I4MPY followed by I2STID
>> +           // We will distinguish between I4MPY and I2MPY in Is_Well_Formed_Simd
>> +      return TRUE;
>> +    } else if (Simd_vect_conf.Is_SSE3() &&
>> +               (rtype == MTYPE_C4 || rtype == MTYPE_C8)) {
>> +      // TODO: explain why SSE3 is required
>> +      return TRUE;
>> +    }
>> +
>> +    return FALSE;
>> +
>>   case OPR_DIV:
>> -    // Look at icc
>> -    if (rtype == MTYPE_F8 || rtype == MTYPE_F4
>> -#ifdef TARG_X8664
>> -        || (rtype == MTYPE_C4 && Is_Target_SSE3())
>> -#endif
>> -       )
>> +    if (rtype == MTYPE_F8 || rtype == MTYPE_F4 ||
>> +        (rtype == MTYPE_C4 && Simd_vect_conf.Is_SSE3()))
>>       return TRUE;
>>     else
>>       return FALSE;
>> +
>>   case OPR_MAX:
>>   case OPR_MIN:
>>     if (rtype == MTYPE_F4 || rtype == MTYPE_F8 || rtype == MTYPE_I4)
>> @@ -304,7 +299,6 @@
>>     else
>>       return FALSE;
>>   case OPR_RSQRT:
>> -//case OPR_RECIP:
>>  #ifdef TARG_X8664
>>   case OPR_ATOMIC_RSQRT:
>>  #endif
>> @@ -2769,12 +2763,6 @@
>>     return FALSE;
>>    }
>>
>> -   //if there are too few iterations, we will not vectorize
>> -   if(Too_Few_Iterations(dli->Est_Num_Iterations, WN_do_body(innerloop))){
>> -     sprintf(verbose_msg, "Loop has too few iterations.");
>> -     return FALSE;
>> -   }
>> -
>>   // Bug 3784
>>   // Check for useless loops (STID's use_list is empty) of the form
>>   // do i
>> @@ -2832,12 +2820,10 @@
>>           WN_operator(enclosing_parallel_region) != OPR_REGION)
>>       enclosing_parallel_region =
>>         LWN_Get_Parent(enclosing_parallel_region);
>> -#ifdef KEY
>>     if (PU_cxx_lang(Get_Current_PU()) &&
>>         Is_Eh_Or_Try_Region(enclosing_parallel_region))
>>       enclosing_parallel_region =
>>         LWN_Get_Parent(LWN_Get_Parent(enclosing_parallel_region));
>> -#endif
>>     FmtAssert(enclosing_parallel_region, ("NYI"));
>>     region_pragma = WN_first(WN_region_pragmas(enclosing_parallel_region));
>>     while(region_pragma && (!reduction || !pdo)) {
>> @@ -3119,12 +3105,10 @@
>>               WN_operator(enclosing_parallel_region) != OPR_REGION)
>>           enclosing_parallel_region =
>>             LWN_Get_Parent(enclosing_parallel_region);
>> -#ifdef KEY
>>         if (PU_cxx_lang(Get_Current_PU()) &&
>>             Is_Eh_Or_Try_Region(enclosing_parallel_region))
>>           enclosing_parallel_region =
>>             LWN_Get_Parent(LWN_Get_Parent(enclosing_parallel_region));
>> -#endif
>>         WN *stmt_before_region = WN_prev(enclosing_parallel_region);
>>         FmtAssert(stmt_before_region, ("NYI"));
>>         WN *parent_block = LWN_Get_Parent(enclosing_parallel_region);
>> @@ -3177,6 +3161,11 @@
>>       return FALSE;
>>   }
>>
>> +  if (Too_Few_Iterations (innerloop, simd_ops)) {
>> +      sprintf(verbose_msg, "Too few iterations.");
>> +      return FALSE;
>> +  }
>> +
>>   //WHETHER scalar expansion is required
>>   for(stmt=WN_first(body); stmt && curr_simd_red_manager; stmt=WN_next(stmt)){
>>     if (WN_operator(stmt) == OPR_STID &&
>> @@ -4087,6 +4076,19 @@
>>  // second argument is a constant it can be placed in a 1 byte immediate if it fits.
>>  // But the first option has been chosen because it fits easier with the existing framework.
>>
>> +static WN* Simd_Vectorize_Shift_Left_Amt (WN* const_wn,
>> +                                          WN *istore,  //parent of simd_op
>> +                                          WN *simd_op) //const_wn's parent
>> +{
>> +  Is_True (WN_operator(simd_op) == OPR_SHL && WN_kid1(simd_op) == const_wn,
>> +           ("input WN isn't SHL"));
>> +
>> +    WN* shift_amt = WN_Intconst (MTYPE_I8, WN_const_val (const_wn));
>> +    WN* res = LWN_CreateExp1 (OPCODE_make_op(OPR_REPLICATE, MTYPE_V16I8, MTYPE_I8),
>> +                              shift_amt);
>> +    return res;
>> +}
>> +
>>  static WN *Simd_Vectorize_Constants(WN *const_wn,//to be vectorized
>>                                     WN *istore,  //parent of simd_op
>>                                     WN *simd_op) //const_wn's parent
>> @@ -4094,6 +4096,10 @@
>>    FmtAssert(const_wn && (WN_operator(const_wn)==OPR_INTCONST ||
>>              WN_operator(const_wn)==OPR_CONST),("not a constant operand"));
>>
>> +   if (WN_operator(simd_op) == OPR_SHL && WN_kid1(simd_op) == const_wn) {
>> +        return Simd_Vectorize_Shift_Left_Amt (const_wn, istore, simd_op);
>> +   }
>> +
>>    TYPE_ID type;
>>    TCON tcon;
>>    ST *sym;
>> @@ -4110,17 +4116,9 @@
>>           WN_intrinsic(istore) == INTRN_SUBSU2) {
>>         type = WN_desc(LWN_Get_Parent(istore));
>>     }
>> -    if (!MTYPE_is_float(type)){
>> -          if (MTYPE_is_size_double(type)){
>> -            INT64 value = (INT64)WN_const_val(const_wn);
>> -            tcon = Host_To_Targ(MTYPE_I8, value);
>> -          } else {
>> -            INT value = (INT)WN_const_val(const_wn);
>> -            tcon = Host_To_Targ(MTYPE_I4, value);
>> -            }
>> -          sym = New_Const_Sym (Enter_tcon (tcon),
>> -                               Be_Type_Tbl(type));
>> -    }
>> +
>> +    WN* orig_const_wn = const_wn;
>> +
>>     switch (type) {
>>      case MTYPE_F4: case MTYPE_V16F4:
>>           WN_set_rtype(const_wn, MTYPE_V16F4);
>> @@ -4131,27 +4129,34 @@
>>      case MTYPE_C4: case MTYPE_V16C4:
>>           WN_set_rtype(const_wn, MTYPE_V16C4);
>>           break;
>> +
>>      case MTYPE_U1: case MTYPE_I1: case MTYPE_V16I1:
>> -          const_wn = WN_CreateConst (OPR_CONST, MTYPE_V16I1, MTYPE_V, sym);
>> +          const_wn =
>> +            LWN_CreateExp1 (OPCODE_make_op(OPR_REPLICATE, MTYPE_V16I1, MTYPE_I1),
>> +                            orig_const_wn);
>>           break;
>> +
>>      case MTYPE_U2: case MTYPE_I2: case MTYPE_V16I2:
>> -          if (WN_operator(simd_op) == OPR_SHL && WN_kid1(simd_op) == const_wn)
>> -           const_wn = WN_CreateConst (OPR_CONST, MTYPE_V16I8, MTYPE_V, sym);
>> -         else
>> -           const_wn = WN_CreateConst (OPR_CONST, MTYPE_V16I2, MTYPE_V, sym);
>> +          const_wn =
>> +            LWN_CreateExp1 (OPCODE_make_op(OPR_REPLICATE, MTYPE_V16I2, MTYPE_I2),
>> +                            orig_const_wn);
>>           break;
>> +
>>      case MTYPE_U4: case MTYPE_I4: case MTYPE_V16I4:
>> -          if (WN_operator(simd_op) == OPR_SHL && WN_kid1(simd_op) == const_wn)
>> -           const_wn = WN_CreateConst (OPR_CONST, MTYPE_V16I8, MTYPE_V, sym);
>> -         else
>> -           const_wn = WN_CreateConst (OPR_CONST, MTYPE_V16I4, MTYPE_V, sym);
>> +          const_wn =
>> +            LWN_CreateExp1 (OPCODE_make_op(OPR_REPLICATE, MTYPE_V16I4, MTYPE_I4),
>> +                            orig_const_wn);
>>           break;
>> +
>>      case MTYPE_U8: case MTYPE_I8: case MTYPE_V16I8:
>> -          const_wn = WN_CreateConst (OPR_CONST, MTYPE_V16I8, MTYPE_V, sym);
>> +          const_wn =
>> +            LWN_CreateExp1 (OPCODE_make_op(OPR_REPLICATE, MTYPE_V16I8, MTYPE_I8),
>> +                            orig_const_wn);
>>           break;
>> -     }//end switch
>> +
>> +     } // end switch
>>
>> -   return const_wn;
>> +    return const_wn;
>>  }
>>
>>  static WN *Simd_Vectorize_Invariants(WN *inv_wn,
>> @@ -5342,8 +5347,9 @@
>>  // Vectorize an innerloop
>>  static INT Simd(WN* innerloop)
>>  {
>> -// Don't do anything for now for non-x8664
>> -#ifdef TARG_X8664
>> +  if (!Simd_vect_conf.Arch_Has_Vect ())
>> +    return 0;
>> +
>>   INT good_vector = 0;
>>
>>   //pre_analysis to filter out loops that can not be vectorized
>> @@ -5360,8 +5366,12 @@
>>     Last_Vectorizable_Loop_Id ++;
>>     if (Last_Vectorizable_Loop_Id < LNO_Simd_Loop_Skip_Before ||
>>        Last_Vectorizable_Loop_Id > LNO_Simd_Loop_Skip_After ||
>> -       Last_Vectorizable_Loop_Id == LNO_Simd_Loop_Skip_Equal)
>> +       Last_Vectorizable_Loop_Id == LNO_Simd_Loop_Skip_Equal) {
>> +      fprintf (stderr, "SIMD: loop (%s:%d) of PU:%d is skipped\n",
>> +               Src_File_Name, Srcpos_To_Line(WN_Get_Linenum(innerloop)),
>> +               Current_PU_Count ());
>>       return 0;
>> +    }
>>   }
>>
>>   MEM_POOL_Push(&SIMD_default_pool);
>> @@ -5587,10 +5597,6 @@
>>   }
>>
>>   return 1;
>> -#else
>> -  return 0;
>> -#endif // TARG_X8664
>> -
>>  }
>>
>>  static void Simd_Walk(WN* wn) {
>>
>> Added: trunk/osprey/be/lno/simd_util.cxx
>> ===================================================================
>> --- trunk/osprey/be/lno/simd_util.cxx                           (rev 0)
>> +++ trunk/osprey/be/lno/simd_util.cxx   2011-05-05 23:15:18 UTC (rev 3586)
>> @@ -0,0 +1,75 @@
>> +/*
>> +  Copyright (C) 2010 Advanced Micro Devices, Inc.  All Rights Reserved.
>> +
>> +  Open64 is free software; you can redistribute it and/or modify it
>> +  under the terms of the GNU General Public License as published by
>> +  the Free Software Foundation; either version 2 of the License,
>> +  or (at your option) any later version.
>> +
>> +  Open64 is distributed in the hope that it will be useful, but
>> +  WITHOUT ANY WARRANTY; without even the implied warranty of
>> +  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +  GNU General Public License for more details.
>> +
>> +  You should have received a copy of the GNU General Public License
>> +  along with this program; if not, write to the Free Software
>> +  Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
>> +  02110-1301, USA.
>> +*/
>> +
>> +#include "defs.h"
>> +#include "glob.h"
>> +#include "wn.h"
>> +#include "cxx_memory.h"
>> +#include "lwn_util.h"
>> +#include "ff_utils.h"
>> +
>> +#define simd_util_INCLUDED
>> +#include "simd_util.h"
>> +
>> +/////////////////////////////////////////////////////////////////////////////
>> +//
>> +//          Implementation of SIMD_EXPR
>> +//
>> +/////////////////////////////////////////////////////////////////////////////
>> +//
>> +SIMD_EXPR::SIMD_EXPR (WN* expr) {
>> +    _expr= expr;
>> +
>> +    _elem_sz = MTYPE_byte_size (WN_rtype (expr));
>> +    _vect_len = Simd_vect_conf.Get_Vect_Len_Given_Elem_Ty (WN_rtype(expr));
>> +
>> +    _mis_align = -1;
>> +    _is_invar = FALSE;
>> +}
>> +
>> +/////////////////////////////////////////////////////////////////////////////
>> +//
>> +//          Implementation of SIMD_EXPR_MGR
>> +//
>> +/////////////////////////////////////////////////////////////////////////////
>> +//
>> +SIMD_EXPR_MGR::SIMD_EXPR_MGR (WN* loop, MEM_POOL* mp):
>> +    _loop(loop), _mp(mp), _exprs(mp) {
>> +
>> +    _min_vect_len = _max_vect_len = 0;
>> +}
>> +
>> +void
>> +SIMD_EXPR_MGR::Convert_From_Lagacy_Expr_List (SCALAR_REF_STACK* simd_ops) {
>> +
>> +    Is_True (_exprs.empty (), ("expr is not empty"));
>> +
>> +    _min_vect_len = Simd_vect_conf.Get_Vect_Byte_Size ();
>> +    _max_vect_len = 0;
>> +
>> +    for (INT i=0, elem_cnt = simd_ops->Elements(); i<elem_cnt; i++) {
>> +        WN* wn_expr = simd_ops->Top_nth(i).Wn;
>> +        SIMD_EXPR* expr = CXX_NEW (SIMD_EXPR (wn_expr), _mp);
>> +
>> +        _exprs.push_back (expr);
>> +        INT vec_len = expr->Get_Vect_Len ();
>> +        _min_vect_len = MIN(vec_len, _min_vect_len);
>> +        _max_vect_len = MAX(vec_len, _max_vect_len);
>> +    }
>> +}
>>
>> Added: trunk/osprey/be/lno/simd_util.h
>> ===================================================================
>> --- trunk/osprey/be/lno/simd_util.h                             (rev 0)
>> +++ trunk/osprey/be/lno/simd_util.h     2011-05-05 23:15:18 UTC (rev 3586)
>> @@ -0,0 +1,196 @@
>> +/*
>> +  Copyright (C) 2010 Advanced Micro Devices, Inc.  All Rights Reserved.
>> +
>> +  Open64 is free software; you can redistribute it and/or modify it
>> +  under the terms of the GNU General Public License as published by
>> +  the Free Software Foundation; either version 2 of the License,
>> +  or (at your option) any later version.
>> +
>> +  Open64 is distributed in the hope that it will be useful, but
>> +  WITHOUT ANY WARRANTY; without even the implied warranty of
>> +  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +  GNU General Public License for more details.
>> +
>> +  You should have received a copy of the GNU General Public License
>> +  along with this program; if not, write to the Free Software
>> +  Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
>> +  02110-1301, USA.
>> +*/
>> +
>> +#ifndef simd_util_INCLUDED
>> +    #error simd_util.h is for internal use only
>> +#endif
>> +
>> +#include <list>
>> +
>> +// Forward declaration
>> +//
>> +class SIMD_EXPR;
>> +class SIMD_EXPR_MGR;
>> +class SIMD_VECTOR_CONF_BASE;
>> +class SIMD_VECTOR_CONF;
>> +
>> +/////////////////////////////////////////////////////////////////////////////////
>> +//
>> +//   Arch-specific stuff is encapsulated by SIMD_VECTOR_CONF_BASE and
>> +//  SIMD_VECTOR_CONF.
>> +//
>> +//   TODO: it would be better to place this stuff in a separate header file
>> +//
>> +/////////////////////////////////////////////////////////////////////////////////
>> +//
>> +class SIMD_VECTOR_CONF_BASE {
>> +public:
>> +    // Does H.W support vectorization
>> +    BOOL Arch_Has_Vect (void) const { return FALSE; }
>> +
>> +    // About SSE
>> +    //
>> +    BOOL Is_SSE_Family (void)   const { return FALSE; }
>> +    BOOL Is_MMX (void)   const { return FALSE; }
>> +    BOOL Is_SSE (void)   const { return FALSE; }
>> +    BOOL Is_SSE2 (void)  const { return FALSE; }
>> +    BOOL Is_SSE3 (void)  const { return FALSE; }
>> +    BOOL Is_SSE4a (void) const { return FALSE; }
>> +    BOOL Is_SSSE3 (void) const { return FALSE; }
>> +    BOOL Is_SSE41 (void) const { return FALSE; }
>> +    BOOL Is_SSE42 (void) const { return FALSE; }
>> +
>> +    INT Get_Vect_Byte_Size (void) const { return -1; }
>> +    INT Get_Vect_Len_Given_Elem_Ty (TYPE_ID) const { return -1; }
>> +};
>> +
>> +#ifdef TARG_X8664
>> +
>> +class SIMD_VECTOR_CONF : public SIMD_VECTOR_CONF_BASE {
>> +public:
>> +    BOOL Arch_Has_Vect (void) const { return TRUE; }
>> +
>> +    BOOL Is_MMX (void)   const { return Is_Target_MMX (); }
>> +    BOOL Is_SSE (void)   const { return Is_Target_SSE (); }
>> +    BOOL Is_SSE2 (void)  const { return Is_Target_SSE2 (); }
>> +    BOOL Is_SSE3 (void)  const { return Is_Target_SSE3 (); }
>> +    BOOL Is_SSE4a (void) const { return Is_Target_SSE4a (); }
>> +    BOOL Is_SSSE3 (void) const { return Is_Target_SSSE3 (); }
>> +    BOOL Is_SSE41 (void) const { return Is_Target_SSE41 (); }
>> +    BOOL Is_SSE42 (void) const { return Is_Target_SSE42 (); }
>> +    BOOL Is_SSE_Family (void) const {
>> +        return Is_SSE () || Is_SSE2 () || Is_SSE3 () ||
>> +               Is_SSE4a () || Is_SSSE3 () || Is_SSE41 () ||
>> +               Is_SSE42 ();
>> +    }
>> +
>> +    INT Get_Vect_Byte_Size (void) const { return 16; }
>> +    INT Get_Vect_Len_Given_Elem_Ty (TYPE_ID t) const
>> +        { return 16/MTYPE_byte_size(t);}
>> +};
>> +
>> +#else
>> +
>> +class SIMD_VECTOR_CONF : public SIMD_VECTOR_CONF_BASE {};
>> +
>> +#endif
>> +
>> +extern SIMD_VECTOR_CONF Simd_vect_conf;
>> +
>> +/////////////////////////////////////////////////////////////////////////////////
>> +//
>> +//   First, SIMD_EXPR is a container hosting vectorization-related
>> +// information. Some of this information can be derived directly from
>> +// the given WN expression itself; some needs context. For instance, in
>> +// the following snippet, the vectorizable expression "(x * (INT32)sa2[i])"
>> +// doesn't need to have 32 significant bits. However, the expression per se
>> +// cannot reveal this info, but the "context" will help.
>> +//
>> +//    INT16 sa1[], sa2[]; INT32 x;
>> +//    for (i = 0; i < N; i++) { sa1[i] = (INT16)(x * (INT32)sa2[i]); }
>> +//
>> +//   Since a SIMD_EXPR is not aware of the "context" it is in, it has to
>> +// "derive" information blindly, and imprecisely. The objects that have
>> +// better knowledge of the context should correct it properly.
>> +//
>> +//   Second, SIMD_EXPR is responsible for physically converting its
>> +// corresponding scalar expression into vectorized form.
>> +//
>> +//////////////////////////////////////////////////////////////////////////////////
>> +//
>> +class SIMD_EXPR {
>> +public:
>> +    friend class SIMD_EXPR_MGR;
>> +
>> +    INT32 Get_Misalignment (void) { Is_True (FALSE, ("TBD")); return -1; }
>> +
>> +    INT32 Get_Vect_Len (void) const { return _vect_len; }
>> +    INT32 Get_Vect_Elem_Byte_Sz (void) const { return _elem_sz; }
>> +
>> +    BOOL Is_Invar (void) const { return _is_invar; }
>> +    WN* Get_Wn (void) const { return _expr; }
>> +
>> +private:
>> +    SIMD_EXPR (WN* expr);
>> +
>> +    void Set_Elem_Sz (INT sz);
>> +
>> +    WN* _expr;
>> +
>> +    INT16 _vect_len;
>> +    INT16 _elem_sz;
>> +    INT16 _mis_align;
>> +
>> +    BOOL _is_invar;
>> +};
>> +
>> +typedef mempool_allocator<SIMD_EXPR*> SIMD_EXPR_ALLOC;
>> +typedef std::list<SIMD_EXPR*, SIMD_EXPR_ALLOC> SIMD_EXPR_LIST;
>> +
>> +
>> +//////////////////////////////////////////////////////////////////////////////
>> +//
>> +//   SIMD_EXPR_MGR manages all SIMD_EXPRs of the loop being vectorized.
>> +// Its duties include:
>> +//
>> +//   - identifying vectorizable expressions,
>> +//   - allocating/freeing SIMD_EXPRs, and
>> +//   - collecting statistical information about the SIMD_EXPRs under management.
>> +//
>> +/////////////////////////////////////////////////////////////////////////////
>> +//
>> +class SIMD_EXPR_MGR {
>> +public:
>> +    SIMD_EXPR_MGR (WN* loop, MEM_POOL*);
>> +    const SIMD_EXPR_LIST& Get_Expr_List (void) const { return _exprs; }
>> +
>> +    // This func is provided for the time being.
>> +    //
>> +    void Convert_From_Lagacy_Expr_List (SCALAR_REF_STACK*);
>> +
>> +    inline UINT Get_Max_Vect_Len (void) const;
>> +    inline UINT Get_Min_Vect_Len (void) const;
>> +
>> +private:
>> +    MEM_POOL* _mp;
>> +    WN* _loop;
>> +    SIMD_EXPR_LIST _exprs;
>> +
>> +    UINT16 _min_vect_len;
>> +    UINT16 _max_vect_len;
>> +};
>> +
>> +
>> +//////////////////////////////////////////////////////////////////////////////
>> +//
>> +//          Inline functions are defined here
>> +//
>> +//////////////////////////////////////////////////////////////////////////////
>> +//
>> +inline UINT
>> +SIMD_EXPR_MGR::Get_Max_Vect_Len (void) const {
>> +    Is_True (_max_vect_len != 0, ("_max_vect_len isn't set properly"));
>> +    return _max_vect_len;
>> +}
>> +
>> +inline UINT
>> +SIMD_EXPR_MGR::Get_Min_Vect_Len (void) const {
>> +    Is_True (_min_vect_len != 0, ("_min_vect_len isn't set properly"));
>> +    return _min_vect_len;
>> +}
>>
>>
>
