STL header files are pretty big too. I think this topic deserves more input from fellow developers. I had a problem downloading the original attachment; I will look at the .h file in question more carefully.
Sun

On Fri, May 6, 2011 at 7:47 AM, Ye, Mei <mei...@amd.com> wrote:
> Hi Sun,
>
> In private email exchanges, the author gave the following reasons, and I granted
> an OK. Please check whether these reasons make sense to you. Thanks.
>
> -Mei
>
> Reason 1: to make the external header file concise.
>
> Do you feel comfortable if the header file has more than 1000 lines of
> code? I prefer it to have only, say, 50 lines. A concise header file
> quickly helps you figure out which function is appropriate to call.
>
> Need an example? You can examine region.h (the ORC region code; I don't have
> the code at hand). I remember there are more than 2k lines of code there. It
> took me quite a while to figure out how to add an edge.
>
> Reason 2: prevent internal data structures from being exposed.
> Reason 3: improve compile time.
>
>
> -----Original Message-----
> From: Sun Chan [mailto:sun.c...@gmail.com]
> Sent: Thursday, May 05, 2011 4:35 PM
> To: open64-devel@lists.sourceforge.net
> Subject: Re: [Open64-devel] r3586 - in trunk/osprey/be: cg lno
>
> I thought Mei had requested that the #define xxx_INCLUDED
> be moved to the header file (i.e. not in the .cxx file). This request
> is consistent with the rest of the source code and common practice.
>
> Please fix that or back out your checkin.
>
> Sun
>
> On Fri, May 6, 2011 at 7:15 AM, <s...@open64.net> wrote:
>> Author: pallavimathew
>> Date: 2011-05-05 19:15:18 -0400 (Thu, 05 May 2011)
>> New Revision: 3586
>>
>> Added:
>>   trunk/osprey/be/lno/simd_util.cxx
>>   trunk/osprey/be/lno/simd_util.h
>> Modified:
>>   trunk/osprey/be/cg/whirl2ops.cxx
>>   trunk/osprey/be/lno/Makefile.gbase
>>   trunk/osprey/be/lno/simd.cxx
>> Log:
>> This patch:
>>
>> 1. Introduces an initial object-oriented framework of classes (SIMD_*) to
>> represent and manage SIMD expressions.
>>
>> 2. Enhances the representation of constant integer vectors and of loads
>> from constant integer vectors.
>>
>> Before this change, a constant integer (say 4) was vectorized into
>> V16I4CONST, i.e. a symbolic constant.
>> After this change, the vector is represented by:
>>   U4INTCONST 4
>>  V16I4I4REPLICATE
>>
>> It is up to CG to determine how to generate the code. Currently, the code
>> is very efficient if the element's value is 0: in that case we need only one
>> arithmetic instruction, "pxor $xmm0, $xmm0". If the element's value is
>> non-zero, there are two options:
>> a)
>>   - a.1) load the integer into a scalar GPR,
>>   - a.2) move the GPR to a SIMD register, and
>>   - a.3) perform a shuffle to replicate the element's value across the
>>     entire vector.
>>
>> b) save the vector as a symbolic constant and substitute the REPLICATE with
>>    a load from the symbolic constant.
>>
>> b) is appealing for vectors with a short vector length, say V16I8, and a) is
>> desirable for vectors like V16I1. However, we are blocked at step a.2):
>> a SIMD register is categorized as an fp register, and we have a hard time
>> moving an int register to an fp register.
>>
>> 3. Vectorizes loops with small trip counts
>>
>> The original SIMD implementation set a hard trip-count limit for
>> vectorization. This change tries to vectorize any loop so long as the
>> trip count >= vector length.
>> e.g. the following loop can be vectorized now:
>>
>>   for (int i=0; i < 2; i++) double_array[i] = (double)float_array[i];
>>
>> TODO in the future:
>> For a loop like the following, SIMD still tries to perform peeling in order
>> to achieve better alignment. This is a deoptimization if the trip count is
>> too small.
>> >> for (int i=0; i<4; i++) { a[i+1] = 0; } >> >> Code review by Mei Ye. >> >> >> Modified: trunk/osprey/be/cg/whirl2ops.cxx >> =================================================================== >> --- trunk/osprey/be/cg/whirl2ops.cxx 2011-05-05 02:52:50 UTC (rev 3585) >> +++ trunk/osprey/be/cg/whirl2ops.cxx 2011-05-05 23:15:18 UTC (rev 3586) >> @@ -4596,6 +4596,65 @@ >> >> void dump_op(const OP* op); >> >> +static TN* >> +Handle_Replicate (WN* expr, WN* parent, TN* result) { >> + >> + if (result == NULL) { result = Allocate_Result_TN(expr, NULL); } >> + >> + WN* elmt_val = WN_kid0(expr); >> + if (WN_operator (elmt_val) == OPR_INTCONST) { >> + >> + INT64 value = WN_const_val (elmt_val); >> + if (value == 0) { >> + Build_OP (TOP_pxor, result, result, result, &New_OPs); >> + return result; >> + } >> + >> + TYPE_ID elmt_mty = MTYPE_UNKNOWN; >> + TYPE_ID vect_mty = MTYPE_UNKNOWN; >> + switch (WN_opcode (expr)) { >> + case OPC_V16I8I8REPLICA: >> + elmt_mty = MTYPE_I8; vect_mty = MTYPE_V16I8; >> + break; >> + >> + case OPC_V16I4I4REPLICA: >> + elmt_mty = MTYPE_I4; vect_mty = MTYPE_V16I4; >> + break; >> + >> + case OPC_V16I2I2REPLICA: >> + elmt_mty = MTYPE_I2; vect_mty = MTYPE_V16I2; >> + break; >> + >> + case OPC_V16I1I1REPLICA: >> + elmt_mty = MTYPE_I1; vect_mty = MTYPE_V16I1; >> + break; >> + } >> + >> + if (elmt_mty != MTYPE_UNKNOWN) { >> + TCON elmt_tcon; >> + if (MTYPE_is_size_double (elmt_mty)) { >> + elmt_tcon = Host_To_Targ(MTYPE_I8, value); >> + } else { >> + elmt_tcon = Host_To_Targ(MTYPE_I4, value); >> + } >> + >> + TCON vect_tcon = Create_Simd_Const (vect_mty, elmt_tcon); >> + ST *vect_sym = >> + New_Const_Sym (Enter_tcon (vect_tcon), Be_Type_Tbl(vect_mty)); >> + Allocate_Object (vect_sym); >> + TN* vect_tn = Gen_Symbol_TN (vect_sym, 0, 0); >> + Exp_OP1 (OPCODE_make_op (OPR_CONST, vect_mty, MTYPE_V), >> + result, vect_tn, &New_OPs); >> + >> + return result; >> + } >> + } >> + TN* kid_tn = Expand_Expr (elmt_val, expr, NULL); >> + Expand_Replicate (WN_opcode(expr), result, kid_tn, &New_OPs); >> + >> + return result; >> +} >> + >> static TN* >> Handle_Fma_Operation(WN* expr, TN* result, WN *mul_wn, BOOL mul_kid0) >> { >> @@ -5278,6 +5337,9 @@ >> return Handle_Shift_Operation(expr, result); >> } >> #elif defined(TARG_X8664) >> + case OPR_REPLICATE: >> + return Handle_Replicate (expr, parent, result); >> + >> case OPR_SUB: >> case OPR_ADD: >> if ((CG_opt_level > 1) && Is_Target_Orochi() && >> >> Modified: trunk/osprey/be/lno/Makefile.gbase >> =================================================================== >> --- trunk/osprey/be/lno/Makefile.gbase 2011-05-05 02:52:50 UTC (rev 3585) >> +++ trunk/osprey/be/lno/Makefile.gbase 2011-05-05 23:15:18 UTC (rev 3586) >> @@ -294,6 +294,7 @@ >> >> ifeq ($(BUILD_TARGET), X8664) >> BE_LNOPT_NLX_CXX_SRCS += simd.cxx >> +BE_LNOPT_NLX_CXX_SRCS += simd_util.cxx >> endif >> >> BE_LNOPT_LX_CXX_SRCS = \ >> >> Modified: trunk/osprey/be/lno/simd.cxx >> =================================================================== >> --- trunk/osprey/be/lno/simd.cxx 2011-05-05 02:52:50 UTC (rev 3585) >> +++ trunk/osprey/be/lno/simd.cxx 2011-05-05 23:15:18 UTC (rev 3586) >> @@ -85,11 +85,15 @@ >> #include "data_layout.h" // for Stack_Alignment >> #include "cond.h" // for Guard_A_Do >> #include "config_opt.h" // for Align_Unsafe >> +#include "be_util.h" // for Current_PU_Count() >> #include "region_main.h" // for creating new region id. 
>> #include "lego_util.h" // for AWN_StidIntoSym, AWN_Add >> #include "minvariant.h" // for Minvariant_Removal >> #include "prompf.h" >> >> +#define simd_util_INCLUDED >> +#include "simd_util.h" >> + >> #define ABS(a) ((a<0)?-(a):(a)) >> >> BOOL debug; >> @@ -119,38 +123,28 @@ >> static void Simd_Mark_Code (WN* wn); >> >> static INT Last_Vectorizable_Loop_Id = 0; >> +SIMD_VECTOR_CONF Simd_vect_conf; >> >> -static BOOL Too_Few_Iterations(INT64 iters, WN *body) >> +// Return TRUE iff there are too few iterations to generate a single >> +// vectorized iteration. >> +// >> +// One interesting snippet to challenge this function is following: >> +// >> +// float f[]; double d[]; >> +// for (i = 0; i < 2; i++) { d[i] = (double)f[i]; } >> +// >> +// This func should not be folled by "f[i]". Currently, it is ok because >> +// "(double)f[i]" instead of "f[i]" is considered as vectorizable expr. >> +// >> +static BOOL Too_Few_Iterations (WN* loop, SCALAR_REF_STACK* vect_exprs) >> { >> - UINT32 iter_threshold = Iteration_Count_Threshold; >> - if(LNO_Iter_threshold) >> - iter_threshold = LNO_Iter_threshold; >> + DO_LOOP_INFO *dli = Get_Do_Loop_Info (loop); >> + if (dli->Est_Num_Iterations >= Simd_vect_conf.Get_Vect_Byte_Size ()) >> + return FALSE; >> >> - if(iters < iter_threshold) //watch performance >> - return TRUE; >> - if(iters >= 16) //should always be fine, not too few >> - return FALSE; >> - //bug 12056: no matter what Iteration_Count_Threshold is, we should >> - // make sure at least one iter of the vectorized version >> - for(WN *stmt = WN_first(body); stmt; stmt = WN_next(stmt)){ >> - switch(WN_desc(stmt)){ >> - case MTYPE_I1: case MTYPE_U1: >> - return TRUE; >> - case MTYPE_I2: case MTYPE_U2: >> - if(iters < 8) >> - return TRUE; >> - break; >> - case MTYPE_I4: case MTYPE_U4: case MTYPE_F4: >> - if(iters < 4) >> - return TRUE; >> - break; >> - case MTYPE_I8: case MTYPE_U8: case MTYPE_F8: case MTYPE_C4: >> - if(iters < 2) >> - return TRUE; >> - break; >> - }//end switch; >> - }//end for >> - return FALSE; >> + SIMD_EXPR_MGR expr_mgr (loop, &SIMD_default_pool); >> + expr_mgr.Convert_From_Lagacy_Expr_List (vect_exprs); >> + return expr_mgr.Get_Max_Vect_Len () > dli->Est_Num_Iterations; >> } >> >> // Bug 10136: use a stack to count the number of different >> @@ -265,26 +259,27 @@ >> case OPR_SUB: >> return TRUE; >> case OPR_MPY: >> - if (rtype == MTYPE_F8 || rtype == MTYPE_F4 || >> -#ifdef TARG_X8664 >> - ((rtype == MTYPE_C4 || rtype == MTYPE_C8) && Is_Target_SSE3()) || >> -#endif >> - // I2MPY followed by I2STID is actually I4MPY followed by I2STID >> - // We will distinguish between I4MPY and I2MPY in Is_Well_Formed_Simd >> - rtype == MTYPE_I4) >> + if (rtype == MTYPE_F8 || rtype == MTYPE_F4) >> return TRUE; >> - else >> - return FALSE; >> + else if (rtype == MTYPE_I4) { >> + // I2MPY followed by I2STID is actually I4MPY followed by I2STID >> + // We will distinguish between I4MPY and I2MPY in >> Is_Well_Formed_Simd >> + return TRUE; >> + } else if (Simd_vect_conf.Is_SSE3() && >> + (rtype == MTYPE_C4 || rtype == MTYPE_C8)) { >> + // TODO: explain why requires SSE3 >> + return TRUE; >> + } >> + >> + return FALSE; >> + >> case OPR_DIV: >> - // Look at icc >> - if (rtype == MTYPE_F8 || rtype == MTYPE_F4 >> -#ifdef TARG_X8664 >> - || (rtype == MTYPE_C4 && Is_Target_SSE3()) >> -#endif >> - ) >> + if (rtype == MTYPE_F8 || rtype == MTYPE_F4 || >> + (rtype == MTYPE_C4 && Simd_vect_conf.Is_SSE3())) >> return TRUE; >> else >> return FALSE; >> + >> case OPR_MAX: >> case OPR_MIN: >> if (rtype == MTYPE_F4 
|| rtype == MTYPE_F8 || rtype == MTYPE_I4) >> @@ -304,7 +299,6 @@ >> else >> return FALSE; >> case OPR_RSQRT: >> -//case OPR_RECIP: >> #ifdef TARG_X8664 >> case OPR_ATOMIC_RSQRT: >> #endif >> @@ -2769,12 +2763,6 @@ >> return FALSE; >> } >> >> - //if there are too few iterations, we will not vectorize >> - if(Too_Few_Iterations(dli->Est_Num_Iterations, WN_do_body(innerloop))){ >> - sprintf(verbose_msg, "Loop has too few iterations."); >> - return FALSE; >> - } >> - >> // Bug 3784 >> // Check for useless loops (STID's use_list is empty) of the form >> // do i >> @@ -2832,12 +2820,10 @@ >> WN_operator(enclosing_parallel_region) != OPR_REGION) >> enclosing_parallel_region = >> LWN_Get_Parent(enclosing_parallel_region); >> -#ifdef KEY >> if (PU_cxx_lang(Get_Current_PU()) && >> Is_Eh_Or_Try_Region(enclosing_parallel_region)) >> enclosing_parallel_region = >> LWN_Get_Parent(LWN_Get_Parent(enclosing_parallel_region)); >> -#endif >> FmtAssert(enclosing_parallel_region, ("NYI")); >> region_pragma = WN_first(WN_region_pragmas(enclosing_parallel_region)); >> while(region_pragma && (!reduction || !pdo)) { >> @@ -3119,12 +3105,10 @@ >> WN_operator(enclosing_parallel_region) != OPR_REGION) >> enclosing_parallel_region = >> LWN_Get_Parent(enclosing_parallel_region); >> -#ifdef KEY >> if (PU_cxx_lang(Get_Current_PU()) && >> Is_Eh_Or_Try_Region(enclosing_parallel_region)) >> enclosing_parallel_region = >> LWN_Get_Parent(LWN_Get_Parent(enclosing_parallel_region)); >> -#endif >> WN *stmt_before_region = WN_prev(enclosing_parallel_region); >> FmtAssert(stmt_before_region, ("NYI")); >> WN *parent_block = LWN_Get_Parent(enclosing_parallel_region); >> @@ -3177,6 +3161,11 @@ >> return FALSE; >> } >> >> + if (Too_Few_Iterations (innerloop, simd_ops)) { >> + sprintf(verbose_msg, "Too few iterations."); >> + return FALSE; >> + } >> + >> //WHETHER scalar expansion is required >> for(stmt=WN_first(body); stmt && curr_simd_red_manager; >> stmt=WN_next(stmt)){ >> if (WN_operator(stmt) == OPR_STID && >> @@ -4087,6 +4076,19 @@ >> // second argument is a constant it can be placed in a 1 byte immediate if >> it fits. >> // But the first option has been chosen because it fits easier with the >> existing framework. 
>> >> +static WN* Simd_Vectorize_Shift_Left_Amt (WN* const_wn, >> + WN *istore, //parent of simd_op >> + WN *simd_op) //const_wn's parent >> +{ >> + Is_True (WN_operator(simd_op) == OPR_SHL && WN_kid1(simd_op) == const_wn, >> + ("input WN isn't SHL")); >> + >> + WN* shift_amt = WN_Intconst (MTYPE_I8, WN_const_val (const_wn)); >> + WN* res = LWN_CreateExp1 (OPCODE_make_op(OPR_REPLICATE, MTYPE_V16I8, >> MTYPE_I8), >> + shift_amt); >> + return res; >> +} >> + >> static WN *Simd_Vectorize_Constants(WN *const_wn,//to be vectorized >> WN *istore, //parent of simd_op >> WN *simd_op) //const_wn's parent >> @@ -4094,6 +4096,10 @@ >> FmtAssert(const_wn && (WN_operator(const_wn)==OPR_INTCONST || >> WN_operator(const_wn)==OPR_CONST),("not a constant operand")); >> >> + if (WN_operator(simd_op) == OPR_SHL && WN_kid1(simd_op) == const_wn) { >> + return Simd_Vectorize_Shift_Left_Amt (const_wn, istore, simd_op); >> + } >> + >> TYPE_ID type; >> TCON tcon; >> ST *sym; >> @@ -4110,17 +4116,9 @@ >> WN_intrinsic(istore) == INTRN_SUBSU2) { >> type = WN_desc(LWN_Get_Parent(istore)); >> } >> - if (!MTYPE_is_float(type)){ >> - if (MTYPE_is_size_double(type)){ >> - INT64 value = (INT64)WN_const_val(const_wn); >> - tcon = Host_To_Targ(MTYPE_I8, value); >> - } else { >> - INT value = (INT)WN_const_val(const_wn); >> - tcon = Host_To_Targ(MTYPE_I4, value); >> - } >> - sym = New_Const_Sym (Enter_tcon (tcon), >> - Be_Type_Tbl(type)); >> - } >> + >> + WN* orig_const_wn = const_wn; >> + >> switch (type) { >> case MTYPE_F4: case MTYPE_V16F4: >> WN_set_rtype(const_wn, MTYPE_V16F4); >> @@ -4131,27 +4129,34 @@ >> case MTYPE_C4: case MTYPE_V16C4: >> WN_set_rtype(const_wn, MTYPE_V16C4); >> break; >> + >> case MTYPE_U1: case MTYPE_I1: case MTYPE_V16I1: >> - const_wn = WN_CreateConst (OPR_CONST, MTYPE_V16I1, MTYPE_V, sym); >> + const_wn = >> + LWN_CreateExp1 (OPCODE_make_op(OPR_REPLICATE, MTYPE_V16I1, >> MTYPE_I1), >> + orig_const_wn); >> break; >> + >> case MTYPE_U2: case MTYPE_I2: case MTYPE_V16I2: >> - if (WN_operator(simd_op) == OPR_SHL && WN_kid1(simd_op) == >> const_wn) >> - const_wn = WN_CreateConst (OPR_CONST, MTYPE_V16I8, MTYPE_V, sym); >> - else >> - const_wn = WN_CreateConst (OPR_CONST, MTYPE_V16I2, MTYPE_V, sym); >> + const_wn = >> + LWN_CreateExp1 (OPCODE_make_op(OPR_REPLICATE, MTYPE_V16I2, >> MTYPE_I2), >> + orig_const_wn); >> break; >> + >> case MTYPE_U4: case MTYPE_I4: case MTYPE_V16I4: >> - if (WN_operator(simd_op) == OPR_SHL && WN_kid1(simd_op) == >> const_wn) >> - const_wn = WN_CreateConst (OPR_CONST, MTYPE_V16I8, MTYPE_V, sym); >> - else >> - const_wn = WN_CreateConst (OPR_CONST, MTYPE_V16I4, MTYPE_V, sym); >> + const_wn = >> + LWN_CreateExp1 (OPCODE_make_op(OPR_REPLICATE, MTYPE_V16I4, >> MTYPE_I4), >> + orig_const_wn); >> break; >> + >> case MTYPE_U8: case MTYPE_I8: case MTYPE_V16I8: >> - const_wn = WN_CreateConst (OPR_CONST, MTYPE_V16I8, MTYPE_V, sym); >> + const_wn = >> + LWN_CreateExp1 (OPCODE_make_op(OPR_REPLICATE, MTYPE_V16I8, >> MTYPE_I8), >> + orig_const_wn); >> break; >> - }//end switch >> + >> + } // end switch >> >> - return const_wn; >> + return const_wn; >> } >> >> static WN *Simd_Vectorize_Invariants(WN *inv_wn, >> @@ -5342,8 +5347,9 @@ >> // Vectorize an innerloop >> static INT Simd(WN* innerloop) >> { >> -// Don't do anything for now for non-x8664 >> -#ifdef TARG_X8664 >> + if (!Simd_vect_conf.Arch_Has_Vect ()) >> + return 0; >> + >> INT good_vector = 0; >> >> //pre_analysis to filter out loops that can not be vectorized >> @@ -5360,8 +5366,12 @@ >> Last_Vectorizable_Loop_Id ++; >> if 
(Last_Vectorizable_Loop_Id < LNO_Simd_Loop_Skip_Before || >> Last_Vectorizable_Loop_Id > LNO_Simd_Loop_Skip_After || >> - Last_Vectorizable_Loop_Id == LNO_Simd_Loop_Skip_Equal) >> + Last_Vectorizable_Loop_Id == LNO_Simd_Loop_Skip_Equal) { >> + fprintf (stderr, "SIMD: loop (%s:%d) of PU:%d is skipped\n", >> + Src_File_Name, Srcpos_To_Line(WN_Get_Linenum(innerloop)), >> + Current_PU_Count ()); >> return 0; >> + } >> } >> >> MEM_POOL_Push(&SIMD_default_pool); >> @@ -5587,10 +5597,6 @@ >> } >> >> return 1; >> -#else >> - return 0; >> -#endif // TARG_X8664 >> - >> } >> >> static void Simd_Walk(WN* wn) { >> >> Added: trunk/osprey/be/lno/simd_util.cxx >> =================================================================== >> --- trunk/osprey/be/lno/simd_util.cxx (rev 0) >> +++ trunk/osprey/be/lno/simd_util.cxx 2011-05-05 23:15:18 UTC (rev 3586) >> @@ -0,0 +1,75 @@ >> +/* >> + Copyright (C) 2010 Advanced Micro Devices, Inc. All Rights Reserved. >> + >> + Open64 is free software; you can redistribute it and/or modify it >> + under the terms of the GNU General Public License as published by >> + the Free Software Foundation; either version 2 of the License, >> + or (at your option) any later version. >> + >> + Open64 is distributed in the hope that it will be useful, but >> + WITHOUT ANY WARRANTY; without even the implied warranty of >> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >> + GNU General Public License for more details. >> + >> + You should have received a copy of the GNU General Public License >> + along with this program; if not, write to the Free Software >> + Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA >> + 02110-1301, USA. >> +*/ >> + >> +#include "defs.h" >> +#include "glob.h" >> +#include "wn.h" >> +#include "cxx_memory.h" >> +#include "lwn_util.h" >> +#include "ff_utils.h" >> + >> +#define simd_util_INCLUDED >> +#include "simd_util.h" >> + >> +///////////////////////////////////////////////////////////////////////////// >> +// >> +// Implementation of SIMD_EXPR >> +// >> +///////////////////////////////////////////////////////////////////////////// >> +// >> +SIMD_EXPR::SIMD_EXPR (WN* expr) { >> + _expr= expr; >> + >> + _elem_sz = MTYPE_byte_size (WN_rtype (expr)); >> + _vect_len = Simd_vect_conf.Get_Vect_Len_Given_Elem_Ty (WN_rtype(expr)); >> + >> + _mis_align = -1; >> + _is_invar = FALSE; >> +} >> + >> +///////////////////////////////////////////////////////////////////////////// >> +// >> +// Implementation of SIMD_EXPR_MGR >> +// >> +///////////////////////////////////////////////////////////////////////////// >> +// >> +SIMD_EXPR_MGR::SIMD_EXPR_MGR (WN* loop, MEM_POOL* mp): >> + _loop(loop), _mp(mp), _exprs(mp) { >> + >> + _min_vect_len = _max_vect_len = 0; >> +} >> + >> +void >> +SIMD_EXPR_MGR::Convert_From_Lagacy_Expr_List (SCALAR_REF_STACK* simd_ops) { >> + >> + Is_True (_exprs.empty (), ("expr is not empty")); >> + >> + _min_vect_len = Simd_vect_conf.Get_Vect_Byte_Size (); >> + _max_vect_len = 0; >> + >> + for (INT i=0, elem_cnt = simd_ops->Elements(); i<elem_cnt; i++) { >> + WN* wn_expr = simd_ops->Top_nth(i).Wn; >> + SIMD_EXPR* expr = CXX_NEW (SIMD_EXPR (wn_expr), _mp); >> + >> + _exprs.push_back (expr); >> + INT vec_len = expr->Get_Vect_Len (); >> + _min_vect_len = MIN(vec_len, _min_vect_len); >> + _max_vect_len = MAX(vec_len, _max_vect_len); >> + } >> +} >> >> Added: trunk/osprey/be/lno/simd_util.h >> =================================================================== >> --- trunk/osprey/be/lno/simd_util.h (rev 0) >> +++ 
trunk/osprey/be/lno/simd_util.h 2011-05-05 23:15:18 UTC (rev 3586) >> @@ -0,0 +1,196 @@ >> +/* >> + Copyright (C) 2010 Advanced Micro Devices, Inc. All Rights Reserved. >> + >> + Open64 is free software; you can redistribute it and/or modify it >> + under the terms of the GNU General Public License as published by >> + the Free Software Foundation; either version 2 of the License, >> + or (at your option) any later version. >> + >> + Open64 is distributed in the hope that it will be useful, but >> + WITHOUT ANY WARRANTY; without even the implied warranty of >> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >> + GNU General Public License for more details. >> + >> + You should have received a copy of the GNU General Public License >> + along with this program; if not, write to the Free Software >> + Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA >> + 02110-1301, USA. >> +*/ >> + >> +#ifndef simd_util_INCLUDED >> + #error simd_util.h is for internal use only >> +#endif >> + >> +#include <list> >> + >> +// Forward declaration >> +// >> +class SIMD_EXPR; >> +class SIMD_EXPR_MGR; >> +class SIMD_VECTOR_CONF_BASE; >> +class SIMD_VECTOR_CONF; >> + >> +///////////////////////////////////////////////////////////////////////////////// >> +// >> +// Arch specific stuff are encapsulated by SIMD_VECTOR_CONF_BASE and >> +// SIMD_VECTOR_CONF. >> +// >> +// TODO: it would be better to place these stuff in a separate header file >> +// >> +///////////////////////////////////////////////////////////////////////////////// >> +// >> +class SIMD_VECTOR_CONF_BASE { >> +public: >> + // Does H.W support vectorization >> + BOOL Arch_Has_Vect (void) const { return FALSE; } >> + >> + // About SSE >> + // >> + BOOL Is_SSE_Family (void) const { return FALSE; } >> + BOOL Is_MMX (void) const { return FALSE; } >> + BOOL Is_SSE (void) const { return FALSE; } >> + BOOL Is_SSE2 (void) const { return FALSE; } >> + BOOL Is_SSE3 (void) const { return FALSE; } >> + BOOL Is_SSE4a (void) const { return FALSE; } >> + BOOL Is_SSSE3 (void) const { return FALSE; } >> + BOOL Is_SSE41 (void) const { return FALSE; } >> + BOOL Is_SSE42 (void) const { return FALSE; } >> + >> + INT Get_Vect_Byte_Size (void) const { return -1; } >> + INT Get_Vect_Len_Given_Elem_Ty (TYPE_ID) const { -1; } >> +}; >> + >> +#ifdef TARG_X8664 >> + >> +class SIMD_VECTOR_CONF : public SIMD_VECTOR_CONF_BASE { >> +public: >> + BOOL Arch_Has_Vect (void) const { return TRUE; } >> + >> + BOOL Is_MMX (void) const { return Is_Target_MMX (); } >> + BOOL Is_SSE (void) const { return Is_Target_SSE (); } >> + BOOL Is_SSE2 (void) const { return Is_Target_SSE2 (); } >> + BOOL Is_SSE3 (void) const { return Is_Target_SSE3 (); } >> + BOOL Is_SSE4a (void) const { return Is_Target_SSE4a (); } >> + BOOL Is_SSSE3 (void) const { return Is_Target_SSSE3 (); } >> + BOOL Is_SSE41 (void) const { return Is_Target_SSE41 (); } >> + BOOL Is_SSE42 (void) const { return Is_Target_SSE42 (); } >> + BOOL Is_SSE_Family (void) const { >> + return Is_SSE () || Is_SSE2 () || Is_SSE3 () || >> + Is_SSE4a () || Is_SSSE3 () || Is_SSE41 () || >> + Is_SSE42 (); >> + } >> + >> + INT Get_Vect_Byte_Size (void) const { return 16; } >> + INT Get_Vect_Len_Given_Elem_Ty (TYPE_ID t) const >> + { return 16/MTYPE_byte_size(t);} >> +}; >> + >> +#else >> + >> +class SIMD_VECTOR_CONF : public SIMD_VECTOR_CONF_BASE; >> + >> +#endif >> + >> +extern SIMD_VECTOR_CONF Simd_vect_conf; >> + >> +///////////////////////////////////////////////////////////////////////////////// >> +// >> +// First of 
all, SIMD_EXPR is a container hosting vectorization related >> +// informations. Among all these information, some can be derived directly >> from >> +// the given WN expression itself; some need context. For instance, in >> +// the following snippet, the vectorizable expression "(x * (INT32)sa2[i])" >> doesn't >> +// need to have 32 significant bits. However, the expression per se cannot >> reveal >> +// this info, but the "contex" will help. >> +// >> +// INT16 sa1[], sa2[]; INT32 x; >> +// for (i = 0; i < N; i++) { sa1[i] = (INT16)(x * (INT32)sa2[i]) >> +// >> +// Since a SIMD_EXPR is not aware of the "context" it is in, it has to >> "derive" >> +// information blindly, and imprecisely. The objects who have better >> knowledge >> +// of the context should correct them properly. >> +// >> +// Second, SIMD_EXPR is responsible for physically converting its >> corresponding >> +// scalar expression into vectorized form. >> +// >> +////////////////////////////////////////////////////////////////////////////////// >> +// >> +class SIMD_EXPR { >> +public: >> + friend class SIMD_EXPR_MGR; >> + >> + INT32 Get_Misalignment (void) { Is_True (FALSE, ("TBD")); return -1; } >> + >> + INT32 Get_Vect_Len (void) const { return _vect_len; } >> + INT32 Get_Vect_Elem_Byte_Sz (void) const { return _elem_sz; } >> + >> + BOOL Is_Invar (void) const { return _is_invar; } >> + WN* Get_Wn (void) const { return _expr; } >> + >> +private: >> + SIMD_EXPR (WN* expr); >> + >> + void Set_Elem_Sz (INT sz); >> + >> + WN* _expr; >> + >> + INT16 _vect_len; >> + INT16 _elem_sz; >> + INT16 _mis_align; >> + >> + BOOL _is_invar; >> +}; >> + >> +typedef mempool_allocator<SIMD_EXPR*> SIMD_EXPR_ALLOC; >> +typedef std::list<SIMD_EXPR*, SIMD_EXPR_ALLOC> SIMD_EXPR_LIST; >> + >> + >> +////////////////////////////////////////////////////////////////////////////// >> +// >> +// SIMD_EXPR_MGR is to manage all SIMD_EXPRs of the loop being vectorized. >> +// Its duty includes: >> +// >> +// - identify vectorizable expressions. >> +// - allocate/free a SIMD_EXPR. >> +// - collect statistical information of the SIMD_EXPRs under management >> +// >> +///////////////////////////////////////////////////////////////////////////// >> +// >> +class SIMD_EXPR_MGR { >> +public: >> + SIMD_EXPR_MGR (WN* loop, MEM_POOL*); >> + const SIMD_EXPR_LIST& Get_Expr_List (void) const { return _exprs; } >> + >> + // This func is provided for the time being. 
>> + // >> + void Convert_From_Lagacy_Expr_List (SCALAR_REF_STACK*); >> + >> + inline UINT Get_Max_Vect_Len (void) const; >> + inline UINT Get_Min_Vect_Len (void) const; >> + >> +private: >> + MEM_POOL* _mp; >> + WN* _loop; >> + SIMD_EXPR_LIST _exprs; >> + >> + UINT16 _min_vect_len; >> + UINT16 _max_vect_len; >> +}; >> + >> + >> +////////////////////////////////////////////////////////////////////////////// >> +// >> +// Inline functions are defined here >> +// >> +////////////////////////////////////////////////////////////////////////////// >> +// >> +inline UINT >> +SIMD_EXPR_MGR::Get_Max_Vect_Len (void) const { >> + Is_True (_max_vect_len != 0, ("_max_vect_len isn't set properly")); >> + return _max_vect_len; >> +} >> + >> +inline UINT >> +SIMD_EXPR_MGR::Get_Min_Vect_Len (void) const { >> + Is_True (_min_vect_len != 0, ("_min_vect_len isn't set properly")); >> + return _min_vect_len; >> +}
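
For readers following the include-guard point at the top of the thread: the convention Sun describes as common practice is the standard guard in which the header defines xxx_INCLUDED itself, whereas r3586 requires each including .cxx file to define simd_util_INCLUDED before the #include and makes the header error out otherwise (an opt-in access gate rather than a multiple-inclusion guard). A minimal sketch contrasting the two, with the declarations elided:

    // Conventional include guard (the common practice Sun refers to):
    // the header defines the macro itself, so clients simply
    // #include "simd_util.h" and repeated inclusion is harmless.
    #ifndef simd_util_INCLUDED
    #define simd_util_INCLUDED
    /* ... class declarations ... */
    #endif // simd_util_INCLUDED

    // Pattern used by the patch: the macro is defined by the client,
    // not by the header, so only files that opt in can include it.
    // In simd.cxx and simd_util.cxx:
    #define simd_util_INCLUDED
    #include "simd_util.h"
    // In simd_util.h:
    #ifndef simd_util_INCLUDED
      #error simd_util.h is for internal use only
    #endif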
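
To make the code-generation trade-off in point 2 of the commit log concrete, the sketch below shows, with SSE2 intrinsics rather than Open64's internal APIs, the three instruction sequences the log describes: pxor for a zero element, the GPR-move-plus-shuffle path a), and the load-from-symbolic-constant path b). The function and constant names are illustrative only, and the GCC-style alignment attribute stands in for the aligned constant symbol the patch creates with New_Const_Sym/Allocate_Object.

    #include <emmintrin.h>  // SSE2 intrinsics

    // Element value 0: one arithmetic instruction, "pxor %xmm0, %xmm0".
    static __m128i replicate_zero () {
      return _mm_setzero_si128 ();
    }

    // Option a): move the integer from a GPR into a SIMD register (movd),
    // then shuffle (pshufd) to broadcast it to every lane.
    static __m128i replicate_i4_via_shuffle (int v) {
      __m128i t = _mm_cvtsi32_si128 (v);   // steps a.1/a.2: int -> xmm
      return _mm_shuffle_epi32 (t, 0x00);  // step a.3: broadcast lane 0
    }

    // Option b): keep the whole vector as an aligned constant in memory
    // and load it with a single instruction.
    static const int four_x4[4] __attribute__((aligned(16))) = {4, 4, 4, 4};
    static __m128i replicate_i4_via_const_load () {
      return _mm_load_si128 (reinterpret_cast<const __m128i*>(four_x4));
    }

As the log notes, which of a) or b) wins depends on the element width, and step a.2 is where the patch reports being blocked, since SIMD registers are classified as fp registers in the register allocator.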