date:20170622

Re: [Beignet] [PATCH] GBE: clean llvm module's clone and release.

2017-06-22 Yang, Rong R

Pushed, thanks.

> -Original Message-
> From: Pan, Xiuli
> Sent: Thursday, June 22, 2017 14:54
> To: Yang, Rong R ; beignet@lists.freedesktop.org
> Cc: Yang, Rong R 
> Subject: RE: [Beignet] [PATCH] GBE: clean llvm module's clone and release.
> 
> LGTM.
> 
> 
> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Yang Rong
> Sent: Thursday, June 22, 2017 14:04
> To: beignet@lists.freedesktop.org
> Cc: Yang, Rong R 
> Subject: [Beignet] [PATCH] GBE: clean llvm module's clone and release.
> 
> This patch makes three changes:
> 1. Clone the module before calling LLVMLinkModules2, and remove the other
> clones made for it.
> 2. Don't delete the module in function llvmToGen.
> 3. Add a function programNewFromLLVMFile so that genProgramNewFromLLVM
> and buildFromLLVMModule only handle LLVM modules. programNewFromLLVMFile
> is only used by clCreateProgramWithLLVMIntel; I think it is useless, and
> maybe we could delete it entirely.
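A minimal sketch of the ownership rule behind changes 1 and 2, written without LLVM. It assumes, as the comment removed by this patch says ("Src now will be removed automatically"), that the link step consumes its source module. `Module`, `cloneModule`, `linkInto` and `buildFromModule` are hypothetical stand-ins, not Beignet or LLVM APIs.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::Module: just a list of function names.
struct Module {
  std::vector<std::string> functions;
};

// Hypothetical stand-in for a module clone (cf. llvm::CloneModule).
std::unique_ptr<Module> cloneModule(const Module& m) {
  return std::unique_ptr<Module>(new Module(m));
}

// Models a link step that takes ownership of `src` and frees it when done,
// so the caller must hand over a copy it no longer needs.
void linkInto(Module& dst, std::unique_ptr<Module> src) {
  for (const std::string& f : src->functions)
    dst.functions.push_back(f);
}  // `src` is destroyed here

// Clone exactly once, right before the consuming call. The caller's module
// is only read, never deleted, so it stays valid after the build.
std::unique_ptr<Module> buildFromModule(const Module& userModule,
                                        const Module& library) {
  std::unique_ptr<Module> linked = cloneModule(library);
  linkInto(*linked, cloneModule(userModule));
  return linked;
}
```

In this sketch the single clone sits next to the call that consumes it, instead of every caller cloning defensively, which makes it clear who owns each copy.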
> 
> Signed-off-by: Yang Rong 
> ---
>  backend/src/backend/gen_program.cpp|  5 +-
>  backend/src/backend/program.cpp| 84 +-----
>  backend/src/backend/program.h  | 10 +++-
>  backend/src/backend/program.hpp|  4 +-
>  backend/src/llvm/llvm_bitcode_link.cpp |  3 +-
>  backend/src/llvm/llvm_to_gen.cpp   | 19 +---
>  backend/src/llvm/llvm_to_gen.hpp   |  2 +-
>  src/cl_gbe_loader.cpp  |  5 ++
>  src/cl_gbe_loader.h|  1 +
>  src/cl_program.c   |  2 +-
>  10 files changed, 77 insertions(+), 58 deletions(-)
> 
> diff --git a/backend/src/backend/gen_program.cpp b/backend/src/backend/gen_program.cpp
> index cfb23fe..bb1d22f 100644
> --- a/backend/src/backend/gen_program.cpp
> +++ b/backend/src/backend/gen_program.cpp
> @@ -455,7 +455,6 @@ namespace gbe {
>}
> 
>static gbe_program genProgramNewFromLLVM(uint32_t deviceID,
> -   const char *fileName,
> const void* module,
> const void* llvm_ctx,
> const char* asm_file_name,
> @@ -475,7 +474,7 @@ namespace gbe {
>  #ifdef GBE_COMPILER_AVAILABLE
>  std::string error;
>  // Try to compile the program
> -if (program->buildFromLLVMFile(fileName, module, error, optLevel) == false) {
> +if (program->buildFromLLVMModule(module, error, optLevel) == false) {
>if (err != NULL && errSize != NULL && stringSize > 0u) {
>  const size_t msgSize = std::min(error.size(), stringSize-1u);
>  std::memcpy(err, error.c_str(), msgSize);
> @@ -598,7 +597,7 @@ namespace gbe {
>  acquireLLVMContextLock();
>  llvm::Module* module = (llvm::Module*)p->module;
> 
> -if (p->buildFromLLVMFile(NULL, module, error, optLevel) == false) {
> +if (p->buildFromLLVMModule(module, error, optLevel) == false) {
>if (err != NULL && errSize != NULL && stringSize > 0u) {
>  const size_t msgSize = std::min(error.size(), stringSize-1u);
>  std::memcpy(err, error.c_str(), msgSize);
> diff --git a/backend/src/backend/program.cpp b/backend/src/backend/program.cpp
> index 724058c..740c5c2 100644
> --- a/backend/src/backend/program.cpp
> +++ b/backend/src/backend/program.cpp
> @@ -40,6 +40,7 @@
>  #include "llvm/Support/ManagedStatic.h"
>  #include "llvm/Transforms/Utils/Cloning.h"
>  #include "llvm/IR/LLVMContext.h"
> +#include "llvm/IRReader/IRReader.h"
>  #endif
> 
>  #include 
> @@ -113,32 +114,17 @@ namespace gbe {
>IVAR(OCL_PROFILING_LOG, 0, 0, 1); // Int for different profiling types.
>BVAR(OCL_OUTPUT_BUILD_LOG, false);
> 
> -  bool Program::buildFromLLVMFile(const char *fileName,
> - const void* module,
> - std::string &error,
> - int optLevel) {
> +  bool Program::buildFromLLVMModule(const void* module,
> +  std::string &error,
> +  int optLevel) {
>  ir::Unit *unit = new ir::Unit();
> -llvm::Module * cloned_module = NULL;
>  bool ret = false;
> -if(module){
> -#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 38
> -  cloned_module = llvm::CloneModule((llvm::Module*)module).release();
> -#else
> -  cloned_module = llvm::CloneModule((llvm::Module*)module);
> -#endif
> -}
> +
>  bool strictMath = true;
>  if (fast_relaxed_math || !OCL_STRICT_CONFORMANCE)
>strictMath = false;
> -#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 39
> -llvm::Module * linked_module = module ? llvm::CloneModule((llvm::Module*)module).release() : NULL;
> -// Src now will be removed automatically. So clone it.
> -if (llvmToGen(*unit,

Re: [Beignet] [PATCH V4] backend: add global immediate optimization

2017-06-22 Yang, Rong R

Pushed, thanks.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Song, Ruiling
> Sent: Thursday, June 22, 2017 14:30
> To: Wang, Rander ; beig...@freedesktop.org
> Cc: Wang, Rander 
> Subject: Re: [Beignet] [PATCH V4] backend: add global immediate
> optimization
> 
> LGTM
> 
> Ruiling
> 
> > -Original Message-
> > From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf
> > Of rander.wang
> > Sent: Wednesday, June 14, 2017 1:56 PM
> > To: beig...@freedesktop.org
> > Cc: Wang, Rander 
> > Subject: [Beignet] [PATCH V4] backend: add global immediate
> > optimization
> >
> > There are some global immediates in LLVM's global variable list.
> > These immediates can be folded into the instructions that use them. For
> > the compiler_global_immediate_optimized test in utest, there are two
> > global immediates:
> > L0:
> > MOV(1)  %42<0>:UD   :   0x0:UD
> > MOV(1)  %43<0>:UD   :   0x30:UD
> >
> > used by:
> > ADD(16) %49<1>:D:   %42<0,1,0>:D   %48<8,8,1>:D
> > ADD(16) %54<1>:D:   %43<0,1,0>:D   %53<8,8,1>:D
> >
> > which can become:
> > ADD(16) %49<1>:D:   %48<8,8,1>:D   0x0:UD
> > ADD(16) %54<1>:D:   %53<8,8,1>:D   0x30:UD
> >
> > Then the MOVs can be removed. After this optimization, an ADD of 0 can
> > be changed to a MOV, and then local copy propagation can be applied.
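The folding described above can be modelled on a toy instruction list. This is a simplified sketch, not the SelGlobalImmMovOpt pass: `Op`, `Operand`, `Insn` and `foldImmMovs` are hypothetical, the list is treated as the whole program with single-assignment registers, and the register regions, types and redefinition checks the real pass must handle are ignored. Following the example, the immediate is moved to the last source of the ADD (ADD is commutative).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <utility>
#include <vector>

enum class Op { MOV, ADD };

// A source operand: either a virtual register or an immediate value.
struct Operand {
  bool isImm;
  int reg;       // meaningful when !isImm
  uint32_t imm;  // meaningful when isImm
};

struct Insn {
  Op op;
  int dst;
  std::vector<Operand> src;
};

Operand R(int r) { return Operand{false, r, 0}; }
Operand I(uint32_t v) { return Operand{true, -1, v}; }

std::vector<Insn> foldImmMovs(const std::vector<Insn>& in) {
  // 1. Record registers that are just a MOV of an immediate.
  std::map<int, uint32_t> immOf;
  for (const Insn& i : in)
    if (i.op == Op::MOV && i.src.size() == 1 && i.src[0].isImm)
      immOf[i.dst] = i.src[0].imm;

  // 2. Fold those immediates into two-source ADDs (at most one per ADD).
  std::vector<Insn> rewritten;
  std::set<int> stillUsed;  // registers some instruction still reads
  for (Insn i : in) {
    if (i.op == Op::ADD && i.src.size() == 2) {
      for (int k = 0; k < 2; ++k) {
        Operand& s = i.src[k];
        if (!s.isImm && immOf.count(s.reg) && !i.src[1 - k].isImm)
          s = I(immOf[s.reg]);
      }
      if (i.src[0].isImm) std::swap(i.src[0], i.src[1]);  // imm goes last
      if (i.src[1].isImm && i.src[1].imm == 0) {          // ADD x, 0 -> MOV x
        i.op = Op::MOV;
        i.src.pop_back();
      }
    }
    for (const Operand& s : i.src)
      if (!s.isImm) stillUsed.insert(s.reg);
    rewritten.push_back(i);
  }

  // 3. Drop immediate MOVs whose destination nobody reads any more.
  std::vector<Insn> out;
  for (const Insn& i : rewritten) {
    bool deadImmMov = i.op == Op::MOV && i.src.size() == 1 &&
                      i.src[0].isImm && !stillUsed.count(i.dst);
    if (!deadImmMov) out.push_back(i);
  }
  return out;
}
```

On the four instructions from the example, the sketch leaves `MOV %49, %48` (the former ADD of 0, now a copy-propagation candidate) and `ADD %54, %53, 0x30`.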
> >
> > V2: (1) add an environment variable to enable/disable the optimization
> > (2) refine the architecture of the imm optimization: inherit from the
> > global optimizer instead of the local block optimizer
> >
> > V3: merge with the latest master driver
> >
> > V4: (1) fix some type errors
> > (2) remove the UD/D check, which is not needed
> > (3) refine the immediate calculation for UD/D
> >
> > Signed-off-by: rander.wang 
> > ---
> >  .../src/backend/gen_insn_selection_optimize.cpp| 367 +++--
> >  1 file changed, 342 insertions(+), 25 deletions(-)
> >
> > diff --git a/backend/src/backend/gen_insn_selection_optimize.cpp b/backend/src/backend/gen_insn_selection_optimize.cpp
> > index 07547ec..eb93a20 100644
> > --- a/backend/src/backend/gen_insn_selection_optimize.cpp
> > +++ b/backend/src/backend/gen_insn_selection_optimize.cpp
> > @@ -40,6 +40,33 @@ namespace gbe
> >  return elements;
> >}
> >
> > +  class ReplaceInfo
> > +  {
> > +  public:
> > +ReplaceInfo(SelectionInstruction &insn,
> > +const GenRegister &intermedia,
> > +const GenRegister &replacement) : insn(insn), intermedia(intermedia), replacement(replacement)
> > +{
> > +  assert(insn.opcode == SEL_OP_MOV || insn.opcode == SEL_OP_ADD);
> > +  assert(&(insn.dst(0)) == &intermedia);
> > +  this->elements = CalculateElements(intermedia, insn.state.execWidth);
> > +  replacementOverwritten = false;
> > +}
> > +~ReplaceInfo()
> > +{
> > +  this->toBeReplaceds.clear();
> > +}
> > +
> > +SelectionInstruction &insn;
> > +const GenRegister &intermedia;
> > +uint32_t elements;
> > +const GenRegister &replacement;
> > +set toBeReplaceds;
> > +set toBeReplacedInsns;
> > +bool replacementOverwritten;
> > +GBE_CLASS(ReplaceInfo);
> > +  };
> > +
> >class SelOptimizer
> >{
> >public:
> > @@ -66,32 +93,7 @@ namespace gbe
> >
> >private:
> >  // local copy propagation
> > -class ReplaceInfo
> > -{
> > -public:
> > -  ReplaceInfo(SelectionInstruction& insn,
> > -  const GenRegister& intermedia,
> > -  const GenRegister& replacement) :
> > -  insn(insn), intermedia(intermedia), replacement(replacement)
> > -  {
> > -assert(insn.opcode == SEL_OP_MOV || insn.opcode == SEL_OP_ADD);
> > -assert(&(insn.dst(0)) == &intermedia);
> > -this->elements = CalculateElements(intermedia, insn.state.execWidth);
> > -replacementOverwritten = false;
> > -  }
> > -  ~ReplaceInfo()
> > -  {
> > -this->toBeReplaceds.clear();
> > -  }
> >
> > -  SelectionInstruction& insn;
> > -  const GenRegister& intermedia;
> > -  uint32_t elements;
> > -  const GenRegister& replacement;
> > -  set toBeReplaceds;
> > -  bool replacementOverwritten;
> > -  GBE_CLASS(ReplaceInfo);
> > -};
> >  typedef map ReplaceInfoMap;
> >  ReplaceInfoMap replaceInfoMap;
> >  void doLocalCopyPropagation();
> > @@ -298,13 +300,328 @@ namespace gbe
> >  virtual void run();
> >};
> >
> > +  class SelGlobalImmMovOpt : public SelGlobalOptimizer  {
> > +  public:
> > +SelGlobalImmMovOpt(const GenContext& ctx, uint32_t

Re: [Beignet] [PATCH V2] backend: refine load/store merging algorithm

2017-06-22 Yang, Rong R

Pushed, thanks.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Song, Ruiling
> Sent: Thursday, June 22, 2017 14:29
> To: Wang, Rander ; beig...@freedesktop.org
> Cc: Wang, Rander 
> Subject: Re: [Beignet] [PATCH V2] backend: refine load/store merging
> algorithm
> 
> LGTM
> 
> Ruiling
> 
> > -Original Message-
> > From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf
> > Of rander.wang
> > Sent: Friday, June 16, 2017 9:50 AM
> > To: beig...@freedesktop.org
> > Cc: Wang, Rander 
> > Subject: [Beignet] [PATCH V2] backend: refine load/store merging
> > algorithm
> >
> > The current algorithm works for the sequence load(0), load(1), load(2),
> > but not for load(2), load(0), load(1), because it only compares the
> > last merged load with the new one, not with all the loads.
> >
> > For the sequence load(0), load(1), load(2): load(0) is the start, and
> > load(1) is found to be its successor with no gap, so it is put into a
> > merge FIFO. The start then moves to the top of the FIFO, load(1), which
> > is compared with load(2), so load(2) can also be merged.
> >
> > For load(2), load(0), load(1): load(2) can't be merged with load(0)
> > because there is a gap between them, so load(0) is skipped and the
> > search moves on to load(1), which can be merged. But it never goes back
> > to merge load(0).
> >
> > The algorithm is now changed as follows:
> > (1) Find all loads around the start that may be merged, by their
> > distance to the start. The distance depends on the data type; for
> > 32-bit data, it is 4. Put them in a list.
> >
> > (2) Sort the list by distance from the start.
> >
> > (3) Search for the contiguous sequence that includes the start, and
> > merge it.
> >
> > V2: (1) Refine the sort and compare algorithm: first find all the IOs
> >     at a small offset from the start, then call std::sort.
> > (2) Check the number of candidate IOs first, which is favorable to
> >     performance since in most cases there is no chance to merge IOs.
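The three steps above can be sketched on plain byte offsets. `Access` and `findMergeRun` are hypothetical; the real pass works on LLVM load/store instructions, gets the distance from isLoadStoreCompatible, and also limits the merged vector width, which this sketch leaves out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// A memory access reduced to what the search needs: its position in the
// basic block and its byte offset from a shared base pointer.
struct Access {
  int index;
  int offset;
};

// Returns the gap-free run of accesses, sorted by offset, that contains
// accesses[start]. elemSize is the distance between neighbours (4 for
// 32-bit data).
std::vector<Access> findMergeRun(const std::vector<Access>& accesses,
                                 std::size_t start, int elemSize,
                                 int maxVecSize) {
  const Access s = accesses[start];

  // (1) Candidates: everything closer to the start than elemSize * maxVecSize.
  std::vector<Access> cand;
  for (const Access& a : accesses)
    if (std::abs(a.offset - s.offset) < elemSize * maxVecSize)
      cand.push_back(a);  // the start itself is included (distance 0)

  // Cheap exit: usually there is nothing to merge with.
  if (cand.size() < 2) return cand;

  // (2) Sort by offset, so program order no longer matters.
  std::sort(cand.begin(), cand.end(),
            [](const Access& x, const Access& y) { return x.offset < y.offset; });

  // (3) Grow the contiguous run around the start in both directions.
  std::size_t pos = 0;
  while (cand[pos].index != s.index) ++pos;
  std::size_t lo = pos, hi = pos;
  while (lo > 0 && cand[lo - 1].offset + elemSize == cand[lo].offset) --lo;
  while (hi + 1 < cand.size() && cand[hi].offset + elemSize == cand[hi + 1].offset)
    ++hi;
  return std::vector<Access>(cand.begin() + lo, cand.begin() + hi + 1);
}
```

For load(2), load(0), load(1) (offsets 8, 0 and 4 with 4-byte elements), starting from load(2) returns all three accesses ordered load(0), load(1), load(2), which a comparison against only the last merged load cannot find.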
> >
> > Signed-off-by: rander.wang 
> > ---
> >  backend/src/llvm/llvm_loadstore_optimization.cpp | 87 +---
> >  1 file changed, 78 insertions(+), 9 deletions(-)
> >
> > diff --git a/backend/src/llvm/llvm_loadstore_optimization.cpp b/backend/src/llvm/llvm_loadstore_optimization.cpp
> > index 5aa38be..c91c1a0 100644
> > --- a/backend/src/llvm/llvm_loadstore_optimization.cpp
> > +++ b/backend/src/llvm/llvm_loadstore_optimization.cpp
> > @@ -67,7 +67,7 @@ namespace gbe {
> >  bool isSimpleLoadStore(Value *I);
> >  bool optimizeLoadStore(BasicBlock );
> >
> > -bool isLoadStoreCompatible(Value *A, Value *B);
> > +bool isLoadStoreCompatible(Value *A, Value *B, int *dist, int* elementSize, int maxVecSize);
> >  void mergeLoad(BasicBlock , SmallVector );
> >  void mergeStore(BasicBlock , SmallVector );
> >  bool findConsecutiveAccess(BasicBlock ,
> > @@ -109,7 +109,7 @@ namespace gbe {
> >  return NULL;
> >}
> >
> > -  bool GenLoadStoreOptimization::isLoadStoreCompatible(Value *A, Value *B) {
> > +  bool GenLoadStoreOptimization::isLoadStoreCompatible(Value *A, Value *B, int *dist, int* elementSize, int maxVecSize) {
> >  Value *ptrA = getPointerOperand(A);
> >  Value *ptrB = getPointerOperand(B);
> >  unsigned ASA = getAddressSpace(A);
> > @@ -136,7 +136,11 @@ namespace gbe {
> >  // The Instructions are connsecutive if the size of the first load/store is
> >  // the same as the offset.
> >  int64_t sz = TD->getTypeStoreSize(Ty);
> > -return ((-offset) == sz);
> > +*dist = -offset;
> > +*elementSize = sz;
> > +
> > +//a insn with small distance from the search load/store is a candidate one
> > +return (abs(-offset) < sz*maxVecSize);
> >}
> >
> >void GenLoadStoreOptimization::mergeLoad(BasicBlock , SmallVector ) {
> > @@ -163,6 +167,25 @@ namespace gbe {
> >values[i]->replaceAllUsesWith(S);
> >  }
> >}
> > +
> > +  class mergedInfo{
> > +public:
> > +Instruction* mInsn;
> > +int mOffset;
> > +
> > +void init(Instruction* insn, int offset)
> > +{
> > +  mInsn = insn;
> > +  mOffset = offset;
> > +}
> > +  };
> > +
> > +  struct offsetSorter {
> > +bool operator()(mergedInfo* m0, mergedInfo* m1) const {
> > +return m0->mOffset < m1->mOffset;
> > +}
> > +  };
> > +
> >// When searching for consecutive memory access, we do it in a small window,
> >// if the window is too large, it would take up too much compiling time.
> >// An Important rule we have followed is don't try to change load/store order.
> > @@ -177,7 +200,6 @@ namespace gbe {
> >
> >

[Beignet] [PATCH] Runtime: refine max group size for SKL & KBL

2017-06-22 rander.wang

Change the max work-group size to 256, which is a reasonable size for
Gen9. According to performance tests, 256 gives a good improvement in
OpenCV with no regressions, so change it.
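max_work_group_size is the largest total number of work-items a work-group may contain on these devices. A sketch of how host code might choose a 1-D local size under that limit: `pickLocalSize` is a hypothetical helper, and a real program would query the limit with clGetDeviceInfo(CL_DEVICE_MAX_WORK_GROUP_SIZE) instead of hard-coding 256.

```cpp
#include <cassert>
#include <cstddef>

// Largest power-of-two local size that is <= maxWorkGroupSize and evenly
// divides globalSize (OpenCL 1.x requires the global size to be a multiple
// of the local size). Falls back to 1 when nothing larger fits.
std::size_t pickLocalSize(std::size_t globalSize, std::size_t maxWorkGroupSize) {
  std::size_t local = 1;
  while (local * 2 <= maxWorkGroupSize && globalSize % (local * 2) == 0)
    local *= 2;
  return local;
}
```

With the old limit of 512, a 1024-item range gets a local size of 512 from this helper; with the new limit of 256 it gets 256.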

Signed-off-by: rander.wang 
---
 src/cl_device_id.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/src/cl_device_id.c b/src/cl_device_id.c
index 6cba2b5..5ea13a9 100644
--- a/src/cl_device_id.c
+++ b/src/cl_device_id.c
@@ -149,7 +149,7 @@ static struct _cl_device_id intel_skl_gt1_device = {
   .max_thread_per_unit = 7,
   .sub_slice_count = 2,
   .max_work_item_sizes = {512, 512, 512},
-  .max_work_group_size = 512,
+  .max_work_group_size = 256,
   .max_clock_frequency = 1000,
 #include "cl_gen9_device.h"
 };
@@ -159,7 +159,7 @@ static struct _cl_device_id intel_skl_gt2_device = {
   .max_thread_per_unit = 7,
   .sub_slice_count = 3,
   .max_work_item_sizes = {512, 512, 512},
-  .max_work_group_size = 512,
+  .max_work_group_size = 256,
   .max_clock_frequency = 1000,
 #include "cl_gen9_device.h"
 };
@@ -169,7 +169,7 @@ static struct _cl_device_id intel_skl_gt3_device = {
   .max_thread_per_unit = 7,
   .sub_slice_count = 6,
   .max_work_item_sizes = {512, 512, 512},
-  .max_work_group_size = 512,
+  .max_work_group_size = 256,
   .max_clock_frequency = 1000,
 #include "cl_gen9_device.h"
 };
@@ -179,7 +179,7 @@ static struct _cl_device_id intel_skl_gt4_device = {
   .max_thread_per_unit = 7,
   .sub_slice_count = 9,
   .max_work_item_sizes = {512, 512, 512},
-  .max_work_group_size = 512,
+  .max_work_group_size = 256,
   .max_clock_frequency = 1000,
 #include "cl_gen9_device.h"
 };
@@ -209,7 +209,7 @@ static struct _cl_device_id intel_kbl_gt1_device = {
   .max_thread_per_unit = 7,
   .sub_slice_count = 2,
   .max_work_item_sizes = {512, 512, 512},
-  .max_work_group_size = 512,
+  .max_work_group_size = 256,
   .max_clock_frequency = 1000,
 #include "cl_gen9_device.h"
 };
@@ -219,7 +219,7 @@ static struct _cl_device_id intel_kbl_gt15_device = {
   .max_thread_per_unit = 7,
   .sub_slice_count = 3,
   .max_work_item_sizes = {512, 512, 512},
-  .max_work_group_size = 512,
+  .max_work_group_size = 256,
   .max_clock_frequency = 1000,
 #include "cl_gen9_device.h"
 };
@@ -229,7 +229,7 @@ static struct _cl_device_id intel_kbl_gt2_device = {
   .max_thread_per_unit = 7,
   .sub_slice_count = 3,
   .max_work_item_sizes = {512, 512, 512},
-  .max_work_group_size = 512,
+  .max_work_group_size = 256,
   .max_clock_frequency = 1000,
 #include "cl_gen9_device.h"
 };
@@ -239,7 +239,7 @@ static struct _cl_device_id intel_kbl_gt3_device = {
   .max_thread_per_unit = 7,
   .sub_slice_count = 6,
   .max_work_item_sizes = {512, 512, 512},
-  .max_work_group_size = 512,
+  .max_work_group_size = 256,
   .max_clock_frequency = 1000,
 #include "cl_gen9_device.h"
 };
@@ -249,7 +249,7 @@ static struct _cl_device_id intel_kbl_gt4_device = {
   .max_thread_per_unit = 7,
   .sub_slice_count = 9,
   .max_work_item_sizes = {512, 512, 512},
-  .max_work_group_size = 512,
+  .max_work_group_size = 256,
   .max_clock_frequency = 1000,
 #include "cl_gen9_device.h"
 };
-- 
2.7.4

___
Beignet mailing list
Beignet@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/beignet

[Beignet] [PATCH] backend: refine pow function

2017-06-22 rander.wang

This version takes 40% less time than before:
(1) Group the many branches that deal with corner cases into one branch.
(2) Use the hardware exp2 and log2 instructions to replace some instructions.

It passes the conformance tests and utest.
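The two ideas above, funnelling the corner cases through one branch and computing the main path with exp2/log2, can be illustrated with a deliberately simplified pow. This is a sketch, not the libocl code: it uses the C++ standard library's exp2/log2 where the patch uses Gen hardware instructions, computes 2^(y * log2|x|) directly (losing accuracy for large results, where Beignet keeps extra-precision hi/lo parts), and does not handle signed zeros. `classifyInt` and `powSketch` are hypothetical names; `classifyInt` follows the patch's yisint convention (0 = not an integer, 1 = odd, 2 = even).

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// yisint convention from the patch: 0 = y is not an integer,
// 1 = y is an odd integer, 2 = y is an even integer.
int classifyInt(float y) {
  uint32_t iy;
  std::memcpy(&iy, &y, sizeof iy);
  iy &= 0x7fffffffu;                    // bits of |y|
  if (iy >= 0x4b800000u) return 2;      // |y| >= 2^24 (or inf): even
  if (iy < 0x3f800000u) return 0;       // |y| < 1 (y == 0 is handled by the caller)
  int k = (int)(iy >> 23) - 0x7f;       // unbiased exponent, 0..23
  uint32_t j = iy >> (23 - k);          // drop the fraction bits
  if ((j << (23 - k)) != iy) return 0;  // non-zero fraction: not an integer
  return 2 - (int)(j & 1);
}

// Simplified powf: every corner case enters one branch; the common path
// (x > 0 finite, x != 1, y != 0) goes straight to 2^(y * log2|x|).
float powSketch(float x, float y) {
  const float ax = std::fabs(x);
  float sign = 1.0f;
  if (y == 0.0f || std::isnan(x) || std::isnan(y) || x < 0.0f || ax == 1.0f) {
    if (y == 0.0f || x == 1.0f) return 1.0f;                 // x**0, 1**y
    if (std::isnan(x) || std::isnan(y))
      return std::numeric_limits<float>::quiet_NaN();
    if (x == -1.0f && std::isinf(y)) return 1.0f;            // (-1)**(+-inf)
    if (x < 0.0f) {
      const int yisint = classifyInt(y);
      if (yisint == 0)                                       // (x<0)**(non-int)
        return std::numeric_limits<float>::quiet_NaN();
      if (yisint == 1) sign = -1.0f;                         // (x<0)**(odd int)
    }
  }
  return sign * std::exp2(y * std::log2(ax));
}
```

Zero and infinite x need no separate case in this sketch because log2 returns -inf or +inf and exp2 then saturates to 0 or inf.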

Signed-off-by: rander.wang 
---
 backend/src/libocl/tmpl/ocl_math_common.tmpl.cl | 294 
 1 file changed, 148 insertions(+), 146 deletions(-)

diff --git a/backend/src/libocl/tmpl/ocl_math_common.tmpl.cl b/backend/src/libocl/tmpl/ocl_math_common.tmpl.cl
index 2c0a702..6026629 100644
--- a/backend/src/libocl/tmpl/ocl_math_common.tmpl.cl
+++ b/backend/src/libocl/tmpl/ocl_math_common.tmpl.cl
@@ -2352,7 +2352,8 @@ OVERLOADABLE float __gen_ocl_internal_pow(float x, float y) {
   float z,ax,z_h,z_l,p_h,p_l;
   float y1,t1,t2,r,s,sn,t,u,v,w;
   int i,j,k,yisint,n;
-  int hx,hy,ix,iy,is;
+  int hy,ix,iy,is;
+  unsigned int hx;
   float bp,dp_h,dp_l,
   zero=  0.0,
   one  =  1.0,
@@ -2382,17 +2383,17 @@ OVERLOADABLE float __gen_ocl_internal_pow(float x, float y) {
   float retVal = 0.0f;
   bool bRet = false;
 
-  GEN_OCL_GET_FLOAT_WORD(hx,x);
+  hx = as_uint(x);
   GEN_OCL_GET_FLOAT_WORD(hy,y);
   ax   = __gen_ocl_fabs(x);
   ix = as_int(ax);  iy = as_int(fabs(y));
 
-  if(iy < 0x0080 || hx==0x3f80)
+  if(iy < 0x0080)
   {
  bRet = true;
  retVal = one;
   }
-  else if (ix > 0x7f80 || iy > 0x7f80)
+  else if (iy > 0x7f80)
   {
bRet = true;
retVal = NAN;
@@ -2403,120 +2404,152 @@ OVERLOADABLE float __gen_ocl_internal_pow(float x, float y) {
  * yisint = 1  ... y is an odd int
  * yisint = 2  ... y is an even int
  */
-  yisint  = 0;
-  if(hx<0) {
-k = (iy>>23)-0x7f;  /* exponent */
-j = iy>>(23-k);
-yisint = (iy>=0x3f80 && (j<<(23-k))==iy)? 2-(j&1):yisint;
-yisint = (iy>=0x4b80) ? 2:yisint;
-  }
-
-/* special value of x */
-  if(ix==0x7f80||ix==0||ix==0x3f80){
-z = ax;/*x is +-0,+-inf,+-1*/
+  sn = one; /* s (sign of result -ve**odd) = -1 else = 1 */
 
-z = (hy < 0)? one/z:z;
-z = ((hx<0) && (((ix-0x3f80)|yisint)==0))? NAN:z;
-z = ((hx<0) && (yisint==1))? -z:z;
+  if(hx >= 0x7f80)
+  {
+yisint  = 0;
+n = (hx>>31)-1;
 
-retVal = (bRet)? retVal:z;
-bRet = true;
-  }
+if (!retVal && ix > 0x7f80)
+{
+  bRet = true;
+  retVal = NAN;
+}
 
-  n = ((uint)hx>>31)-1;
+if(hx >= 0x8000) {
+  k = (iy>>23)-0x7f;/* exponent */
+  j = iy>>(23-k);
+  yisint = (iy>=0x3f80 && (j<<(23-k))==iy)? 2-(j&1):yisint;
+  yisint = (iy>=0x4b80) ? 2:yisint;
+}
 
-  /* (x<0)**(non-int) is NaN */
-  if(!bRet && (n|yisint)==0)
-  {
- bRet= true;
- retVal = NAN;
-  }
+  /* special value of x */
+if(ix==0x7f80||ix==0||ix==0x3f80){
+  z = ax; /*x is +-0,+-inf,+-1*/
+  z = (hy < 0)? one/z:z;
+  z = (((ix-0x3f80)|yisint)==0)? NAN:z;
+  z = (yisint==1)? -z:z;
+  retVal = (bRet)? retVal:z;
+  bRet = true;
+}
 
-  sn = one; /* s (sign of result -ve**odd) = -1 else = 1 */
-  if((n|(yisint-1))==0) sn = -one;/* (-ve)**(odd int) */
+/* (x<0)**(non-int) is NaN */
+if(!bRet && (n|yisint)==0)
+{
+   bRet= true;
+   retVal = NAN;
+}
 
-  /* |y| is huge */
-  if(iy>0x4d00)
-  { /* if |y| > 2**27 */
-   /* over/underflow if x is not close to one */
-   /* special value of y */
-   float b1 = (hy>=0)? y: zero;
-   float b2 = (hy<0)?-y: zero;
-   b1 = (ix > 0x3f80)? b1:b2;
-   retVal = (iy==0x7f80 && !bRet)? b1:retVal;
-   bRet = (iy==0x7f80 && !bRet)? true: bRet;
+if((n|(yisint-1))==0) sn = -one;/* (-ve)**(odd int) */
+  }
 
+/* special value of x */
+  if((ix&0x7f) == 0) {
+if(hx == 0x3f80)
+{
+  retVal = one;
+  bRet = true;
+}
 
-   b1 = (hy>0)? sn*huge*huge:0;
-   retVal = (ix>0x3f87 && !bRet)? b1:retVal;
-   bRet = (ix>0x3f87 && !bRet)? true:bRet;
+if(ix==0x7f80||ix==0) {
+  z = ax;  /*x is +0,+inf*/
+  z = (hy < 0)? one/z:z;
+  retVal = (bRet)? retVal:z;
+  bRet = true;
+}
+  }
 
-   /* now |1-x| is tiny <= 2**-20, suffice to compute
- log(x) by x-x^2/2+x^3/3-x^4/4 */
-   t = ax-1;   /* t has 20 trailing zeros */
-   w = (t*t)*((float)0.5-t*(0.f-t*0.25f));
-   u = ivln2_h*t;  /* ivln2_h has 16 sig. bits */
-   v = t*ivln2_l-w*ivln2;
-   t1 = u+v;
-   GEN_OCL_GET_FLOAT_WORD(is,t1);
-   GEN_OCL_SET_FLOAT_WORD(t1,is&0xf000);
-   t2 = v-(t1-u);
-  } 
-  else
+  /* |y| is not huge */
+  if(iy <= 0x4d00)
   {
-   float s2,s_h,s_l,t_h,t_l;
-   n = 0;
-   n  += ((ix)>>23)-0x7f;
-   j  = ix&0x007f;
-   /* determine interval */
-   ix = j|0x3f80; /* normalize ix */
-
-   n = (j >= 0x5db3d7)?n+1:n;
-   ix = (j >=

[Beignet] [PATCH] GBE: clean llvm module's clone and release.

2017-06-22 Yang Rong

This patch makes three changes:
1. Clone the module before calling LLVMLinkModules2, and remove the other
clones made for it.
2. Don't delete the module in function llvmToGen.
3. Add a function programNewFromLLVMFile so that genProgramNewFromLLVM
and buildFromLLVMModule only handle LLVM modules. programNewFromLLVMFile
is only used by clCreateProgramWithLLVMIntel; I think it is useless, and
maybe we could delete it entirely.

V2: define errDiag beside #if/#endif.
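The #if guards in this patch encode the LLVM release as LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR, so 3.8 becomes 38 and 3.9 becomes 39. A small sketch of that encoding; `llvmVersionCode`, `kLinkConsumesSource` and the fallback macro values are hypothetical, added only so the snippet compiles without LLVM headers. The encoding orders versions correctly only while the minor number stays below 10.

```cpp
#include <cassert>

// Same arithmetic as the patch's guards, e.g. (3, 8) -> 38.
constexpr int llvmVersionCode(int major, int minor) { return major * 10 + minor; }

static_assert(llvmVersionCode(3, 9) > llvmVersionCode(3, 8), "3.9 sorts after 3.8");
static_assert(llvmVersionCode(4, 0) > llvmVersionCode(3, 9), "4.0 sorts after 3.9");

// Hypothetical fallback values so the sketch builds without LLVM headers.
#ifndef LLVM_VERSION_MAJOR
#define LLVM_VERSION_MAJOR 3
#define LLVM_VERSION_MINOR 9
#endif

#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 39
// Per the patch's own comment under this guard, the link step removes its
// source module here, so code in this branch must pass a clone.
const bool kLinkConsumesSource = true;
#else
const bool kLinkConsumesSource = false;
#endif
```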
Signed-off-by: Yang Rong 
---
 backend/src/backend/gen_program.cpp|  5 +-
 backend/src/backend/program.cpp| 83 +-
 backend/src/backend/program.h  | 10 +++-
 backend/src/backend/program.hpp|  4 +-
 backend/src/llvm/llvm_bitcode_link.cpp |  3 +-
 backend/src/llvm/llvm_to_gen.cpp   | 19 +---
 backend/src/llvm/llvm_to_gen.hpp   |  2 +-
 src/cl_gbe_loader.cpp  |  5 ++
 src/cl_gbe_loader.h|  1 +
 src/cl_program.c   |  2 +-
 10 files changed, 76 insertions(+), 58 deletions(-)

diff --git a/backend/src/backend/gen_program.cpp b/backend/src/backend/gen_program.cpp
index cfb23fe..bb1d22f 100644
--- a/backend/src/backend/gen_program.cpp
+++ b/backend/src/backend/gen_program.cpp
@@ -455,7 +455,6 @@ namespace gbe {
   }
 
   static gbe_program genProgramNewFromLLVM(uint32_t deviceID,
-   const char *fileName,
const void* module,
const void* llvm_ctx,
const char* asm_file_name,
@@ -475,7 +474,7 @@ namespace gbe {
 #ifdef GBE_COMPILER_AVAILABLE
 std::string error;
 // Try to compile the program
-if (program->buildFromLLVMFile(fileName, module, error, optLevel) == false) {
+if (program->buildFromLLVMModule(module, error, optLevel) == false) {
   if (err != NULL && errSize != NULL && stringSize > 0u) {
 const size_t msgSize = std::min(error.size(), stringSize-1u);
 std::memcpy(err, error.c_str(), msgSize);
@@ -598,7 +597,7 @@ namespace gbe {
 acquireLLVMContextLock();
 llvm::Module* module = (llvm::Module*)p->module;
 
-if (p->buildFromLLVMFile(NULL, module, error, optLevel) == false) {
+if (p->buildFromLLVMModule(module, error, optLevel) == false) {
   if (err != NULL && errSize != NULL && stringSize > 0u) {
 const size_t msgSize = std::min(error.size(), stringSize-1u);
 std::memcpy(err, error.c_str(), msgSize);
diff --git a/backend/src/backend/program.cpp b/backend/src/backend/program.cpp
index 724058c..c06ae5a 100644
--- a/backend/src/backend/program.cpp
+++ b/backend/src/backend/program.cpp
@@ -40,6 +40,7 @@
 #include "llvm/Support/ManagedStatic.h"
 #include "llvm/Transforms/Utils/Cloning.h"
 #include "llvm/IR/LLVMContext.h"
+#include "llvm/IRReader/IRReader.h"
 #endif
 
 #include 
@@ -113,32 +114,17 @@ namespace gbe {
   IVAR(OCL_PROFILING_LOG, 0, 0, 1); // Int for different profiling types.
   BVAR(OCL_OUTPUT_BUILD_LOG, false);
 
-  bool Program::buildFromLLVMFile(const char *fileName,
- const void* module,
- std::string &error,
- int optLevel) {
+  bool Program::buildFromLLVMModule(const void* module,
+  std::string &error,
+  int optLevel) {
 ir::Unit *unit = new ir::Unit();
-llvm::Module * cloned_module = NULL;
 bool ret = false;
-if(module){
-#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 38
-  cloned_module = llvm::CloneModule((llvm::Module*)module).release();
-#else
-  cloned_module = llvm::CloneModule((llvm::Module*)module);
-#endif
-}
+
 bool strictMath = true;
 if (fast_relaxed_math || !OCL_STRICT_CONFORMANCE)
   strictMath = false;
-#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 39
-llvm::Module * linked_module = module ? llvm::CloneModule((llvm::Module*)module).release() : NULL;
-// Src now will be removed automatically. So clone it.
-if (llvmToGen(*unit, fileName, linked_module, optLevel, strictMath, OCL_PROFILING_LOG, error) == false) {
-#else
-if (llvmToGen(*unit, fileName, module, optLevel, strictMath, OCL_PROFILING_LOG, error) == false) {
-#endif
-  if (fileName)
-error = std::string(fileName) + " not found";
+
+if (llvmToGen(*unit, module, optLevel, strictMath, OCL_PROFILING_LOG, error) == false) {
   delete unit;
   return false;
 }
@@ -147,13 +133,8 @@ namespace gbe {
 if(!unit->getValid()) {
   delete unit;   //clear unit
   unit = new ir::Unit();
-  if(cloned_module){
-//suppose file exists and llvmToGen will not return false.
-llvmToGen(*unit, fileName, cloned_module, 0, strictMath, OCL_PROFILING_LOG, error);
-  }else{
-//suppose file

Re: [Beignet] [PATCH] GBE: clean llvm module's clone and release.

2017-06-22 Pan, Xiuli

LGTM.


-Original Message-
From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of Yang 
Rong
Sent: Thursday, June 22, 2017 14:04
To: beignet@lists.freedesktop.org
Cc: Yang, Rong R 
Subject: [Beignet] [PATCH] GBE: clean llvm module's clone and release.

This patch makes three changes:
1. Clone the module before calling LLVMLinkModules2, and remove the other
clones made for it.
2. Don't delete the module in function llvmToGen.
3. Add a function programNewFromLLVMFile so that genProgramNewFromLLVM and
buildFromLLVMModule only handle LLVM modules. programNewFromLLVMFile is only
used by clCreateProgramWithLLVMIntel; I think it is useless, and maybe we
could delete it entirely.

Signed-off-by: Yang Rong 
---
 backend/src/backend/gen_program.cpp|  5 +-
 backend/src/backend/program.cpp| 84 +-
 backend/src/backend/program.h  | 10 +++-
 backend/src/backend/program.hpp|  4 +-
 backend/src/llvm/llvm_bitcode_link.cpp |  3 +-
 backend/src/llvm/llvm_to_gen.cpp   | 19 +---
 backend/src/llvm/llvm_to_gen.hpp   |  2 +-
 src/cl_gbe_loader.cpp  |  5 ++
 src/cl_gbe_loader.h|  1 +
 src/cl_program.c   |  2 +-
 10 files changed, 77 insertions(+), 58 deletions(-)

diff --git a/backend/src/backend/gen_program.cpp b/backend/src/backend/gen_program.cpp
index cfb23fe..bb1d22f 100644
--- a/backend/src/backend/gen_program.cpp
+++ b/backend/src/backend/gen_program.cpp
@@ -455,7 +455,6 @@ namespace gbe {
   }
 
   static gbe_program genProgramNewFromLLVM(uint32_t deviceID,
-   const char *fileName,
const void* module,
const void* llvm_ctx,
 const char* asm_file_name,
@@ -475,7 +474,7 @@ namespace gbe {
 #ifdef GBE_COMPILER_AVAILABLE
 std::string error;
 // Try to compile the program
-if (program->buildFromLLVMFile(fileName, module, error, optLevel) == false) {
+if (program->buildFromLLVMModule(module, error, optLevel) == false) {
   if (err != NULL && errSize != NULL && stringSize > 0u) {
 const size_t msgSize = std::min(error.size(), stringSize-1u);
 std::memcpy(err, error.c_str(), msgSize);
@@ -598,7 +597,7 @@ namespace gbe {
 acquireLLVMContextLock();
 llvm::Module* module = (llvm::Module*)p->module;
 
-if (p->buildFromLLVMFile(NULL, module, error, optLevel) == false) {
+if (p->buildFromLLVMModule(module, error, optLevel) == false) {
   if (err != NULL && errSize != NULL && stringSize > 0u) {
 const size_t msgSize = std::min(error.size(), stringSize-1u);
 std::memcpy(err, error.c_str(), msgSize);
diff --git a/backend/src/backend/program.cpp b/backend/src/backend/program.cpp
index 724058c..740c5c2 100644
--- a/backend/src/backend/program.cpp
+++ b/backend/src/backend/program.cpp
@@ -40,6 +40,7 @@
 #include "llvm/Support/ManagedStatic.h"
 #include "llvm/Transforms/Utils/Cloning.h"
 #include "llvm/IR/LLVMContext.h"
+#include "llvm/IRReader/IRReader.h"
 #endif
 
 #include 
@@ -113,32 +114,17 @@ namespace gbe {
   IVAR(OCL_PROFILING_LOG, 0, 0, 1); // Int for different profiling types.
   BVAR(OCL_OUTPUT_BUILD_LOG, false);
 
-  bool Program::buildFromLLVMFile(const char *fileName,
- const void* module,
- std::string &error,
- int optLevel) {
+  bool Program::buildFromLLVMModule(const void* module,
+  std::string &error,
+  int optLevel) {
 ir::Unit *unit = new ir::Unit();
-llvm::Module * cloned_module = NULL;
 bool ret = false;
-if(module){
-#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 38
-  cloned_module = llvm::CloneModule((llvm::Module*)module).release();
-#else
-  cloned_module = llvm::CloneModule((llvm::Module*)module);
-#endif
-}
+
 bool strictMath = true;
 if (fast_relaxed_math || !OCL_STRICT_CONFORMANCE)
   strictMath = false;
-#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 39
-llvm::Module * linked_module = module ? llvm::CloneModule((llvm::Module*)module).release() : NULL;
-// Src now will be removed automatically. So clone it.
-if (llvmToGen(*unit, fileName, linked_module, optLevel, strictMath, OCL_PROFILING_LOG, error) == false) {
-#else
-if (llvmToGen(*unit, fileName, module, optLevel, strictMath, OCL_PROFILING_LOG, error) == false) {
-#endif
-  if (fileName)
-error = std::string(fileName) + " not found";
+
+if (llvmToGen(*unit, module, optLevel, strictMath, OCL_PROFILING_LOG, error) == false) {
   delete unit;
   return false;
 }
@@ -147,13 +133,8 @@ namespace gbe {
 if(!unit->getValid()) {
   delete unit;

Re: [Beignet] [PATCH V4] backend: add global immediate optimization

2017-06-22 Song, Ruiling

LGTM

Ruiling

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> rander.wang
> Sent: Wednesday, June 14, 2017 1:56 PM
> To: beig...@freedesktop.org
> Cc: Wang, Rander 
> Subject: [Beignet] [PATCH V4] backend: add global immediate optimization
> 
> There are some global immediates in LLVM's global variable list.
> These immediates can be folded into the instructions that use them. For
> the compiler_global_immediate_optimized test in utest, there are two
> global immediates:
> L0:
> MOV(1)  %42<0>:UD   :   0x0:UD
> MOV(1)  %43<0>:UD   :   0x30:UD
> 
> used by:
> ADD(16) %49<1>:D:   %42<0,1,0>:D   %48<8,8,1>:D
> ADD(16) %54<1>:D:   %43<0,1,0>:D   %53<8,8,1>:D
> 
> which can become:
> ADD(16) %49<1>:D:   %48<8,8,1>:D   0x0:UD
> ADD(16) %54<1>:D:   %53<8,8,1>:D   0x30:UD
> 
> Then the MOVs can be removed. After this optimization, an ADD of 0 can
> be changed to a MOV, and then local copy propagation can be applied.
> 
> V2: (1) add an environment variable to enable/disable the optimization
> (2) refine the architecture of the imm optimization: inherit from the
> global optimizer instead of the local block optimizer
> 
> V3: merge with the latest master driver
> 
> V4: (1) fix some type errors
> (2) remove the UD/D check, which is not needed
> (3) refine the immediate calculation for UD/D
> 
> Signed-off-by: rander.wang 
> ---
>  .../src/backend/gen_insn_selection_optimize.cpp| 367 +++--
>  1 file changed, 342 insertions(+), 25 deletions(-)
> 
> diff --git a/backend/src/backend/gen_insn_selection_optimize.cpp b/backend/src/backend/gen_insn_selection_optimize.cpp
> index 07547ec..eb93a20 100644
> --- a/backend/src/backend/gen_insn_selection_optimize.cpp
> +++ b/backend/src/backend/gen_insn_selection_optimize.cpp
> @@ -40,6 +40,33 @@ namespace gbe
>  return elements;
>}
> 
> +  class ReplaceInfo
> +  {
> +  public:
> +ReplaceInfo(SelectionInstruction &insn,
> +const GenRegister &intermedia,
> +const GenRegister &replacement) : insn(insn), intermedia(intermedia), replacement(replacement)
> +{
> +  assert(insn.opcode == SEL_OP_MOV || insn.opcode == SEL_OP_ADD);
> +  assert(&(insn.dst(0)) == &intermedia);
> +  this->elements = CalculateElements(intermedia, insn.state.execWidth);
> +  replacementOverwritten = false;
> +}
> +~ReplaceInfo()
> +{
> +  this->toBeReplaceds.clear();
> +}
> +
> +SelectionInstruction &insn;
> +const GenRegister &intermedia;
> +uint32_t elements;
> +const GenRegister &replacement;
> +set toBeReplaceds;
> +set toBeReplacedInsns;
> +bool replacementOverwritten;
> +GBE_CLASS(ReplaceInfo);
> +  };
> +
>class SelOptimizer
>{
>public:
> @@ -66,32 +93,7 @@ namespace gbe
> 
>private:
>  // local copy propagation
> -class ReplaceInfo
> -{
> -public:
> -  ReplaceInfo(SelectionInstruction& insn,
> -  const GenRegister& intermedia,
> -  const GenRegister& replacement) :
> -  insn(insn), intermedia(intermedia), replacement(replacement)
> -  {
> -assert(insn.opcode == SEL_OP_MOV || insn.opcode == SEL_OP_ADD);
> -assert(&(insn.dst(0)) == &intermedia);
> -this->elements = CalculateElements(intermedia, insn.state.execWidth);
> -replacementOverwritten = false;
> -  }
> -  ~ReplaceInfo()
> -  {
> -this->toBeReplaceds.clear();
> -  }
> 
> -  SelectionInstruction& insn;
> -  const GenRegister& intermedia;
> -  uint32_t elements;
> -  const GenRegister& replacement;
> -  set toBeReplaceds;
> -  bool replacementOverwritten;
> -  GBE_CLASS(ReplaceInfo);
> -};
>  typedef map ReplaceInfoMap;
>  ReplaceInfoMap replaceInfoMap;
>  void doLocalCopyPropagation();
> @@ -298,13 +300,328 @@ namespace gbe
>  virtual void run();
>};
> 
> +  class SelGlobalImmMovOpt : public SelGlobalOptimizer
> +  {
> +  public:
> +SelGlobalImmMovOpt(const GenContext& ctx, uint32_t features, intrusive_list *blockList) :
> +  SelGlobalOptimizer(ctx, features)
> +  {
> +mblockList = blockList;
> +  }
> +
> +virtual void run();
> +
> +void addToReplaceInfoMap(SelectionInstruction& insn);
> +void doGlobalCopyPropagation();
> +bool CanBeReplaced(const ReplaceInfo* info, SelectionInstruction& insn, const GenRegister& var);
> +void cleanReplaceInfoMap();
> +void doReplacement(ReplaceInfo* info);
> +
> +  private:
> +intrusive_list *mblockList;
> +
> +typedef map ReplaceInfoMap;
> +ReplaceInfoMap
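
The folding described in the V4 commit message above can be illustrated with a small C++ sketch. It is not the patch's code: `Op`, `Insn` and `foldImmediates` are hypothetical names over a simplified instruction model (not Beignet's `SelectionInstruction`), and it assumes every register is defined exactly once so that an immediate MOV reaches all of its uses. The real pass also has to handle register types, execution width and overwritten replacements, which this sketch ignores.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <map>
#include <utility>
#include <vector>

// Hypothetical, simplified instruction model; not Beignet's SelectionInstruction.
enum class Op { MOV_IMM, ADD, ADD_IMM, MOV };

struct Insn {
  Op op;
  int dst;       // destination register
  int src0;      // first source register, -1 if unused
  int src1;      // second source register, -1 if unused
  uint32_t imm;  // immediate operand (MOV_IMM and ADD_IMM only)
  bool dead;     // set when the instruction can be removed
};

// Fold registers that hold an immediate (defined by MOV_IMM) into the ADDs
// that read them. Assumes each register is defined exactly once.
void foldImmediates(std::vector<Insn>& insns) {
  // Pass 1: record which registers are loaded with an immediate.
  std::map<int, uint32_t> immOf;
  for (const Insn& i : insns)
    if (i.op == Op::MOV_IMM) immOf[i.dst] = i.imm;

  // Pass 2: rewrite "ADD reg, immReg" into "ADD_IMM reg, imm" and count the
  // reads of immediate registers that could not be folded.
  std::map<int, int> unfoldedUses;
  for (Insn& i : insns) {
    if (i.op == Op::ADD) {
      if (immOf.count(i.src0)) std::swap(i.src0, i.src1);  // ADD is commutative
      auto it = immOf.find(i.src1);
      if (it != immOf.end()) {
        i.op = Op::ADD_IMM;
        i.imm = it->second;
        i.src1 = -1;
      }
    }
    for (int s : {i.src0, i.src1})
      if (immOf.count(s)) ++unfoldedUses[s];
  }

  // Pass 3: a MOV whose uses were all folded is dead; an ADD of 0 becomes a
  // plain MOV, which a later copy-propagation pass can remove.
  for (Insn& i : insns) {
    if (i.op == Op::MOV_IMM && unfoldedUses[i.dst] == 0) i.dead = true;
    if (i.op == Op::ADD_IMM && i.imm == 0) i.op = Op::MOV;
  }
}
```

Applied to the two MOV/ADD pairs from the utest example, both MOVs become dead, the ADD with `0x0` turns into a plain MOV of `%48`, and the other ADD takes `0x30` directly as an immediate operand.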

Re: [Beignet] [PATCH V2] backend: refine load/store merging algorithm

2017-06-22 Thread Song, Ruiling

LGTM

Ruiling

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> rander.wang
> Sent: Friday, June 16, 2017 9:50 AM
> To: beig...@freedesktop.org
> Cc: Wang, Rander 
> Subject: [Beignet] [PATCH V2] backend: refine load/store merging algorithm
> 
>   The current algorithm works for the sequence load(0), load(1), load(2),
>   but it cannot handle load(2), load(0), load(1), because it only compares
>   the new load with the last merged load, not with all of the loads.
> 
>   For the sequence load(0), load(1), load(2): load(0) is the start, and
>   load(1) is found to be its successor with no gap, so it is put into a
>   merge FIFO. The start then moves to the top of the FIFO, load(1), which
>   is compared with load(2), so load(2) can be merged as well.
> 
>   For load(2), load(0), load(1): load(2) cannot be merged with load(0)
>   because there is a gap between them, so load(0) is skipped and the search
>   moves on to load(1). load(1) can be merged, but the algorithm never goes
>   back to merge load(0).
> 
>   This patch changes the algorithm:
>   (1) Find all loads that may be merged around the start, by their distance
>   to the start. The distance depends on the data type; for 32-bit data it
>   is 4 bytes. Put them in a list.
> 
>   (2) Sort the list by distance from the start.
> 
>   (3) Search for the continuous sequence that includes the start and merge it.
> 
>   V2: (1) Refine the sort and compare algorithm. First find all the IO at a
>   small offset from the start, then call std::sort.
>   (2) Check the number of candidate IO first, which helps performance
>   because in most cases there is no chance to merge IO.
> Signed-off-by: rander.wang 
> ---
>  backend/src/llvm/llvm_loadstore_optimization.cpp | 87
> +---
>  1 file changed, 78 insertions(+), 9 deletions(-)
> 
> diff --git a/backend/src/llvm/llvm_loadstore_optimization.cpp
> b/backend/src/llvm/llvm_loadstore_optimization.cpp
> index 5aa38be..c91c1a0 100644
> --- a/backend/src/llvm/llvm_loadstore_optimization.cpp
> +++ b/backend/src/llvm/llvm_loadstore_optimization.cpp
> @@ -67,7 +67,7 @@ namespace gbe {
>  bool isSimpleLoadStore(Value *I);
>  bool optimizeLoadStore(BasicBlock &BB);
> 
> -bool isLoadStoreCompatible(Value *A, Value *B);
> +bool isLoadStoreCompatible(Value *A, Value *B, int *dist, int* elementSize, int maxVecSize);
>  void mergeLoad(BasicBlock , SmallVector 
> );
>  void mergeStore(BasicBlock , SmallVector 
> );
>  bool findConsecutiveAccess(BasicBlock ,
> @@ -109,7 +109,7 @@ namespace gbe {
>  return NULL;
>}
> 
> -  bool GenLoadStoreOptimization::isLoadStoreCompatible(Value *A, Value *B) {
> +  bool GenLoadStoreOptimization::isLoadStoreCompatible(Value *A, Value *B, int *dist, int* elementSize, int maxVecSize) {
>  Value *ptrA = getPointerOperand(A);
>  Value *ptrB = getPointerOperand(B);
>  unsigned ASA = getAddressSpace(A);
> @@ -136,7 +136,11 @@ namespace gbe {
>  // The Instructions are connsecutive if the size of the first load/store 
> is
>  // the same as the offset.
>  int64_t sz = TD->getTypeStoreSize(Ty);
> -return ((-offset) == sz);
> +*dist = -offset;
> +*elementSize = sz;
> +
> +//an insn with a small distance from the search load/store is a candidate one
> +return (abs(-offset) < sz*maxVecSize);
>}
> 
>void GenLoadStoreOptimization::mergeLoad(BasicBlock ,
> SmallVector ) {
> @@ -163,6 +167,25 @@ namespace gbe {
>values[i]->replaceAllUsesWith(S);
>  }
>}
> +
> +  class mergedInfo{
> +public:
> +Instruction* mInsn;
> +int mOffset;
> +
> +void init(Instruction* insn, int offset)
> +{
> +  mInsn = insn;
> +  mOffset = offset;
> +}
> +  };
> +
> +  struct offsetSorter {
> +bool operator()(mergedInfo* m0, mergedInfo* m1) const {
> +return m0->mOffset < m1->mOffset;
> +}
> +  };
> +
>// When searching for consecutive memory access, we do it in a small 
> window,
>// if the window is too large, it would take up too much compiling time.
>// An Important rule we have followed is don't try to change load/store 
> order.
> @@ -177,7 +200,6 @@ namespace gbe {
> 
>  if(!isSimpleLoadStore(&*start)) return false;
> 
> -merged.push_back(&*start);
>  unsigned targetAddrSpace = getAddressSpace(&*start);
> 
>  BasicBlock::iterator E = BB.end();
> @@ -187,11 +209,27 @@ namespace gbe {
>  unsigned maxLimit = maxVecSize * 8;
>  bool reordered = false;
> 
> -for(unsigned ss = 0; J != E && ss <= maxLimit; ++ss, ++J) {
> -  if((isLoad && isa<LoadInst>(*J)) || (!isLoad && isa<StoreInst>(*J))) {
> -if(isLoadStoreCompatible(merged[merged.size()-1], &*J)) {
> -  merged.push_back(&*J);
> -}
> +bool ready = false;
> +int
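
The three steps in the V2 commit message above can be sketched in C++ over plain byte offsets. This is an illustration under stated assumptions, not the patch's code: `Candidate` and `findMergeRun` are hypothetical names, the window of nearby accesses and their offsets are assumed to be already collected (the patch's `isLoadStoreCompatible` reports the distance and element size), and capping the run at `maxVecSize` elements by trimming the high end first is a choice made here for simplicity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical candidate record, playing the role of the patch's mergedInfo:
// an access id and its byte offset relative to the start access.
struct Candidate {
  int id;
  int offset;
};

// Returns the ids of a gap-free run of elementSize-byte accesses that contains
// the start access (offset 0), in address order, capped at maxVecSize elements.
std::vector<int> findMergeRun(int startId, const std::vector<Candidate>& window,
                              int elementSize, int maxVecSize) {
  assert(elementSize > 0 && maxVecSize >= 1);

  // Step (1): keep the start plus accesses close enough to be merged with it.
  std::vector<Candidate> cands;
  cands.push_back({startId, 0});
  for (const Candidate& c : window)
    if (c.id != startId && std::abs(c.offset) < elementSize * maxVecSize)
      cands.push_back(c);

  // Step (2): sort by offset from the start.
  std::sort(cands.begin(), cands.end(),
            [](const Candidate& a, const Candidate& b) { return a.offset < b.offset; });

  // Step (3): grow a continuous run outward from the start in both directions.
  std::size_t s = 0;
  while (cands[s].id != startId) ++s;
  std::size_t lo = s, hi = s;
  while (lo > 0 && cands[lo].offset - cands[lo - 1].offset == elementSize) --lo;
  while (hi + 1 < cands.size() && cands[hi + 1].offset - cands[hi].offset == elementSize) ++hi;

  // Cap the run at maxVecSize elements, trimming the high end first.
  while (hi - lo + 1 > static_cast<std::size_t>(maxVecSize)) {
    if (hi > s) --hi; else ++lo;
  }

  std::vector<int> run;
  for (std::size_t k = lo; k <= hi; ++k) run.push_back(cands[k].id);
  return run;
}
```

For the problematic order load(2), load(0), load(1) with 32-bit elements, taking load(2) as the start at offset 0 gives candidates at offsets -8 and -4; after sorting, the run grows to all three accesses, whereas comparing only against the last merged load never returns to load(0).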

Re: [Beignet] [PATCH v2] Add missed kernel names into built-in kernel list.

2017-06-22 Thread yan . wang

Sorry for this.
Thanks for your modification.



yan.wang
 
From: Yang, Rong R
Date: 2017-06-22 14:23
To: yan.w...@linux.intel.com; beignet@lists.freedesktop.org
Subject: Re: [Beignet] [PATCH v2] Add missed kernel names into built-in kernel 
list.
Rename "__cl_cpy_region_unalign_same_offset;" to 
"__cl_copy_region_unalign_same_offset;",
and "__cl_copy_image_3d_to_2d;" is duplicated.
I have modified them and pushed, thanks.
 
> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> yan.w...@linux.intel.com
> Sent: Thursday, June 22, 2017 13:52
> To: beignet@lists.freedesktop.org
> Cc: Yan Wang 
> Subject: [Beignet] [PATCH v2] Add missed kernel names into built-in kernel
> list.
> 
> From: Yan Wang 
> 
> Signed-off-by: Yan Wang 
> ---
>  src/cl_gt_device.h | 17 +
>  1 file changed, 17 insertions(+)
> 
> diff --git a/src/cl_gt_device.h b/src/cl_gt_device.h
> index f6cb5f8..ff23b32 100644
> --- a/src/cl_gt_device.h
> +++ b/src/cl_gt_device.h
> @@ -115,16 +115,33 @@ DECL_INFO_STRING(built_in_kernels,
> "__cl_copy_region_align4;"
> "__cl_cpy_region_unalign_same_offset;"
> "__cl_copy_region_unalign_dst_offset;"
> "__cl_copy_region_unalign_src_offset;"
> +   "__cl_copy_region_unalign_same_offset;"
> "__cl_copy_buffer_rect;"
> +   "__cl_copy_buffer_rect_align4;"
> "__cl_copy_image_1d_to_1d;"
> "__cl_copy_image_2d_to_2d;"
> "__cl_copy_image_3d_to_2d;"
> "__cl_copy_image_2d_to_3d;"
> "__cl_copy_image_3d_to_3d;"
> +   "__cl_copy_image_3d_to_2d;"
> "__cl_copy_image_2d_to_buffer;"
> +   "__cl_copy_image_2d_to_buffer_align4;"
> +   "__cl_copy_image_2d_to_buffer_align16;"
> "__cl_copy_image_3d_to_buffer;"
> +   "__cl_copy_image_3d_to_buffer_align4;"
> +   "__cl_copy_image_3d_to_buffer_align16;"
> "__cl_copy_buffer_to_image_2d;"
> +   "__cl_copy_buffer_to_image_2d_align4;"
> +   "__cl_copy_buffer_to_image_2d_align16;"
> "__cl_copy_buffer_to_image_3d;"
> +   "__cl_copy_buffer_to_image_3d_align4;"
> +   "__cl_copy_buffer_to_image_3d_align16;"
> +   "__cl_copy_image_1d_array_to_1d_array;"
> +   "__cl_copy_image_2d_array_to_2d_array;"
> +   "__cl_copy_image_2d_array_to_2d;"
> +   "__cl_copy_image_2d_array_to_3d;"
> +   "__cl_copy_image_2d_to_2d_array;"
> +   "__cl_copy_image_3d_to_2d_array;"
> "__cl_fill_region_unalign;"
> "__cl_fill_region_align2;"
> "__cl_fill_region_align4;"
> --
> 2.7.4
> 
> ___
> Beignet mailing list
> Beignet@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/beignet
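
The entries inside `DECL_INFO_STRING(built_in_kernels, ...)` are adjacent string literals, which C concatenates into one semicolon-separated string, so a repeated entry such as `__cl_copy_image_3d_to_2d;` is easy to miss by eye. A small check like the following could flag repeated names; `findDuplicateKernels` is a hypothetical helper for illustration, not part of Beignet. It would not catch a misspelled name such as `__cl_cpy_region_unalign_same_offset`, since that is a distinct string.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns each name that occurs more than once in a
// semicolon-separated kernel list, in the order of its first repetition.
std::vector<std::string> findDuplicateKernels(const std::string& kernels) {
  std::set<std::string> seen, reported;
  std::vector<std::string> dups;
  std::istringstream in(kernels);
  std::string name;
  while (std::getline(in, name, ';')) {
    if (name.empty()) continue;  // skip empty tokens produced by ";;"
    // insert().second is false when the name was already in the set.
    if (!seen.insert(name).second && reported.insert(name).second)
      dups.push_back(name);
  }
  return dups;
}
```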

Re: [Beignet] [PATCH v2] Add missed kernel names into built-in kernel list.

2017-06-22 Thread Yang, Rong R

Rename "__cl_cpy_region_unalign_same_offset;" to 
"__cl_copy_region_unalign_same_offset;",
and "__cl_copy_image_3d_to_2d;" is duplicated.
 I have modified them and pushed, thanks.

