[Beignet] [PATCH] backend: add double constants defined by spec

2017-04-20 Thread rander.wang
Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/libocl/include/ocl_float.h | 34 -- 1 file changed, 32 insertions(+), 2 deletions(-) diff --git a/backend/src/libocl/include/ocl_float.h b/backend/src/libocl/include/ocl_float.h index 7

[Beignet] [PATCH] backend: add denorm support to double operations

2017-04-19 Thread rander.wang
Set the Double Precision Denorm Mode bit in the control register to enable double-precision denormals when the kernel contains a double operation. This is supported from IVB onward, so it is set in GenContext. Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/backend/gen_context.cpp
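A minimal C++ sketch of the decision described above, assuming a simplified instruction list in which each instruction records its data type. `DataType`, `Insn`, `kDoubleDenormModeBit` and `computeControlRegister` are hypothetical names, and the mask value is a placeholder rather than the real control-register bit position.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified set of data types an instruction can operate on.
enum class DataType { Float, Int, Double };

// Hypothetical instruction record: only the type matters for this decision.
struct Insn {
  DataType type;
};

// Placeholder mask for the Double Precision Denorm Mode bit. Assumption:
// the real bit position must be taken from the Gen hardware documentation.
constexpr uint32_t kDoubleDenormModeBit = 1u << 6;

// Returns the control-register value to program: the denorm bit is set only
// when at least one instruction operates on doubles.
uint32_t computeControlRegister(uint32_t current, const std::vector<Insn>& insns) {
  for (const Insn& insn : insns) {
    if (insn.type == DataType::Double) {
      return current | kDoubleDenormModeBit;
    }
  }
  return current;
}
```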

[Beignet] [PATCH] backend: refine the unpack operation to 64bit register

2017-04-20 Thread rander.wang
H4L4H5L5H6L6H7L7 Now define a dedicated interface class to deal with this case. Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/CMakeLists.txt | 1 + backend/src/backend/gen8_context.cpp | 28 +-- backend/src/backend/gen_context.cpp

[Beignet] [PATCH] backend: refine sel instruction

2017-04-18 Thread rander.wang
double, long, and ulong are not supported by the sel instruction with a conditional modifier; using it for these types outputs random data. Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/backend/gen_insn_selection.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff
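A minimal sketch of the type check this implies for the instruction selector. The `Type` enum and `canUseSelWithCondModifier` are hypothetical; the patch itself changes a single line in gen_insn_selection.cpp whose exact form is not shown here.

```cpp
#include <cassert>

// Hypothetical subset of the element types the instruction selector sees.
enum class Type { Float, Int, UInt, Long, ULong, Double };

// Per the patch description, SEL with a conditional modifier outputs random
// data for double, long and ulong, so the selector must avoid that form for
// these types (the alternative lowering is not shown here).
bool canUseSelWithCondModifier(Type t) {
  switch (t) {
    case Type::Double:
    case Type::Long:
    case Type::ULong:
      return false;
    default:
      return true;
  }
}
```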

[Beignet] [PATCH] backend: add denorm support to double operations

2017-04-18 Thread rander.wang
Set the Double Precision Denorm Mode bit in the control register to enable double-precision denormals when the kernel contains a double operation. This is supported from IVB onward, so it is set in GenContext. Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/backend/gen_context.cpp

[Beignet] [PATCH V4] backend: refine load store optimization

2017-07-11 Thread rander.wang
offset to the pointer of start. Passes OpenCV, conformance basic and compiler tests, and utests. V4: check the pointer type; if it is 64-bit, modify it by 64, otherwise by 32. Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/llvm/llvm_loadstore_optimization.cpp

[Beignet] [PATCH] backend: refine global immediate optimization

2017-06-30 Thread rander.wang
for ABS(UD) = UD on Gen, so delete it; otherwise it makes compilation fail on some platforms. Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/backend/gen_insn_selection_optimize.cpp | 4 1 file changed, 4 deletions(-) diff --git a/backend/src/b

[Beignet] [PATCH V2] backend: improve add zero pattern

2017-07-03 Thread rander.wang
Remove the negation check for adding zero; this optimization can also be applied when the source is negated. V2: refine the function name for zeroAdd. Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/backend/gen_insn_selection_optimize.cpp | 6 +++--- 1 file chan
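The zero-add rewrite can be sketched on a toy instruction form as follows. `Operand`, `Insn` and `tryZeroAdd` are hypothetical; the point is only that, without the negation check, a negated first source is kept as the MOV's source modifier.

```cpp
#include <cassert>
#include <cstdint>

enum class Opcode { Add, Mov };

// Hypothetical source operand: an immediate, or a register with an optional
// negate modifier.
struct Operand {
  bool isImm = false;
  int32_t imm = 0;
  int reg = -1;
  bool negate = false;
};

struct Insn {
  Opcode op;
  int dst;
  Operand src0;
  Operand src1;
};

// Rewrites "add dst, src0, 0" into "mov dst, src0". The negate modifier on
// src0 is carried over, so "add dst, -a, 0" becomes "mov dst, -a".
bool tryZeroAdd(Insn& insn) {
  if (insn.op != Opcode::Add) return false;
  if (!insn.src1.isImm || insn.src1.imm != 0) return false;
  insn.op = Opcode::Mov;
  insn.src1 = Operand{};  // MOV has a single source
  return true;
}
```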

[Beignet] [PATCH V3] backend: refine fdiv to rcp at some cases

2017-07-04 Thread rander.wang
the conformance test and utests. V2: refine negation flag. V3: modify negation by negate. Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/backend/gen_insn_selection.cpp | 28 1 file changed, 28 insertions(+) diff --git a/backe

[Beignet] [PATCH] backend: refine load store optimization

2017-07-02 Thread rander.wang
for overflow if there are too many insns. (2) Make sure the start insn is the first insn of the searched array, because if it is not, the offset may be invalid, and it is complex to modify the offset without error. Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/sr

[Beignet] [PATCH V3] backend: refine load store optimization

2017-07-06 Thread rander.wang
offset to the pointer of start. Passes OpenCV, conformance basic and compiler tests, and utests. Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/llvm/llvm_loadstore_optimization.cpp | 103 --- 1 file changed, 74 insertions(+), 29 deletions(-) diff

[Beignet] [PATCH V3] backend: improve add zero pattern

2017-07-06 Thread rander.wang
Remove the negation check for adding zero; this optimization can also be applied when the source is negated. V2: refine the function name for zeroAdd. V3: refine the build. Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/backend/gen_insn_selection_optimize.cp

[Beignet] [PATCH] backend: refine hypot function

2017-05-18 Thread rander.wang
The OpenCV test OCL_Magnitude is slow on Beignet because of hypot. Refine hypot by changing the algorithm and removing unnecessary code, for a 30% speedup. Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/libocl/tmpl/ocl_math_common.tmpl.c
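The patch does not say which algorithm replaced the old one. For illustration only, here is a common overflow-safe formulation of hypot that scales by the larger magnitude; `hypotSketch` is a hypothetical name and this is not claimed to be the patch's method.

```cpp
#include <cassert>
#include <cmath>

// Overflow-safe hypot sketch: divide by the larger magnitude so the squared
// term stays in [0, 1]. A textbook formulation, not necessarily the patch's.
float hypotSketch(float x, float y) {
  float ax = std::fabs(x);
  float ay = std::fabs(y);
  // IEEE 754 / C99 Annex F: hypot(+-inf, y) is +inf even when y is NaN.
  if (std::isinf(ax) || std::isinf(ay)) return INFINITY;
  if (std::isnan(ax) || std::isnan(ay)) return NAN;
  float big = ax > ay ? ax : ay;
  float small = ax > ay ? ay : ax;
  if (big == 0.0f) return 0.0f;
  float r = small / big;  // r is in [0, 1]
  return big * std::sqrt(1.0f + r * r);
}
```

Scaling keeps `r * r` within [0, 1], so inputs such as 3e30f and 4e30f do not overflow the intermediate square the way a naive `sqrt(x*x + y*y)` would.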

[Beignet] [PATCH] backend: add sqrt-div pattern to instruction select

2017-05-17 Thread rander.wang
There are some patterns like: sqrt r1, r2; load r4, 1.0; div r3, r4, r1; ===> rsqrt r3, r2; Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/backend/gen_insn_selection.cpp | 71 ++ 1 file changed, 71 insertions(+) diff --git
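A minimal sketch of the sqrt-div rewrite on a toy three-address form, assuming every register is written exactly once. `Op`, `Insn` and `foldSqrtDiv` are hypothetical names; the actual pattern lives in gen_insn_selection.cpp and is not reproduced here.

```cpp
#include <cassert>
#include <map>
#include <vector>

enum class Op { Sqrt, LoadImm, Div, Rsqrt };

// Hypothetical three-address instruction: dst = op(src0, src1) or dst = imm.
struct Insn {
  Op op;
  int dst;
  int src0 = -1;
  int src1 = -1;
  float imm = 0.0f;
};

// Rewrites   sqrt r1, r2 ; load r4, 1.0 ; div r3, r4, r1
// into       rsqrt r3, r2
// The now-unused sqrt/load are left for a later dead-code pass (assumption).
// Assumes each register is defined once, so the map lookups are unambiguous.
int foldSqrtDiv(std::vector<Insn>& insns) {
  std::map<int, const Insn*> def;  // register -> its defining instruction
  int rewrites = 0;
  for (Insn& insn : insns) {
    if (insn.op == Op::Div) {
      auto num = def.find(insn.src0);
      auto den = def.find(insn.src1);
      if (num != def.end() && den != def.end() &&
          num->second->op == Op::LoadImm && num->second->imm == 1.0f &&
          den->second->op == Op::Sqrt) {
        int root = den->second->src0;  // the value under the square root
        insn = Insn{Op::Rsqrt, insn.dst, root};
        ++rewrites;
      }
    }
    def[insn.dst] = &insn;
  }
  return rewrites;
}
```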

[Beignet] [PATCH 1/2] Backend: Add optimization for negtive modifier

2017-05-17 Thread rander.wang
, -b, so it is a Mov operation like LocalCopyPropagation Signed-off-by: rander.wang <rander.w...@intel.com> --- .../src/backend/gen_insn_selection_optimize.cpp| 35 +++--- 1 file changed, 31 insertions(+), 4 deletions(-) diff --git a/backend/src/b

[Beignet] [PATCH] backend: refine pow function

2017-06-22 Thread rander.wang
Now takes 40% less time than before: (1) group the many branches that deal with corner cases into one branch; (2) use HW exp2 and log2 to replace some instructions. Passes conformance tests and utests. Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/
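The identity behind using hardware exp2 and log2 is pow(x, y) = exp2(y * log2(x)) for finite x > 0. A sketch, assuming every other input goes through one grouped slow path, with `std::pow` standing in for that path; `powSketch` is hypothetical, and a conforming version also needs extra precision when forming `y * log2(x)`.

```cpp
#include <cassert>
#include <cmath>

// pow(x, y) = exp2(y * log2(x)) holds for finite x > 0. Every other input
// (zero, negative, infinite or NaN operands) is funneled into one grouped
// slow path; std::pow stands in for that path here.
float powSketch(float x, float y) {
  bool fastPath = x > 0.0f && std::isfinite(x) && std::isfinite(y);
  if (!fastPath) {
    return std::pow(x, y);  // single branch for all corner cases
  }
  return std::exp2(y * std::log2(x));
}
```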

[Beignet] [PATCH] Runtime: refine max group size for SKL & KBL

2017-06-22 Thread rander.wang
Now change the max group size to 256, which is a reasonable size for Gen9. According to performance tests, 256 gives a good improvement in OpenCV with no regression, so change it. Signed-off-by: rander.wang <rander.w...@intel.com> --- src/cl_device_id.c | 18 +-

[Beignet] [PATCH] backend: improve add zero pattern

2017-06-23 Thread rander.wang
Remove the negation check for adding zero; this optimization can also be applied when the source is negated. Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/backend/gen_insn_selection_optimize.cpp | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/backe

[Beignet] [PATCH] utests: added for optimization negtiveAdd

2017-05-19 Thread rander.wang
The negative add looks like: exp -a, which LLVM transforms to: add x, -a, 0; exp x. Signed-off-by: rander.wang <rander.w...@intel.com> --- kernels/compiler_remove_negtiveAdd.cl | 4 utests/CMakeLists.txt | 3 ++-

[Beignet] [PATCH] backend: add sqrt-div pattern to instruction select

2017-05-19 Thread rander.wang
There are some patterns like: sqrt r1, r2; load r4, 1.0; div r3, r4, r1; ===> rsqrt r3, r2; Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/backend/gen_insn_selection.cpp | 77 ++ 1 file changed, 77 insertions(+) diff --git

[Beignet] [PATCH] utests: add utest for sqrt-div optimization

2017-05-19 Thread rander.wang
Signed-off-by: rander.wang <rander.w...@intel.com> --- kernels/compiler_sqrtDiv.cl | 8 ++ utests/CMakeLists.txt | 3 ++- utests/compiler_sqrtDiv.cpp | 61 + 3 files changed, 71 insertions(+), 1 deletion(-) create mode 100644 k

[Beignet] [PATCH] utests: added for optimization negativeAdd

2017-05-22 Thread rander.wang
The negative add looks like: exp -a, which LLVM transforms to: add x, -a, 0; exp x. Signed-off-by: rander.wang <rander.w...@intel.com> --- kernels/compiler_remove_negative_add.cl | 4 utests/CMakeLists.txt | 3 ++-
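A sketch of the propagation this utest exercises, on a toy instruction form. `Src`, `Insn`, `propagateNegativeAdd` and the `Op::Other` opcode are hypothetical; the sketch assumes each register is written once and that some opcodes cannot take a source modifier, in which case a negated copy is not propagated into them.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical opcodes; Other stands for an instruction that cannot take a
// source modifier.
enum class Op { Add, Exp, Other };

struct Src {
  int reg = -1;
  bool negate = false;
  bool isImm = false;
  float imm = 0.0f;
};

struct Insn {
  Op op;
  int dst;
  Src src0;
  Src src1;
};

bool acceptsSrcModifier(Op op) { return op != Op::Other; }

// "add x, -a, 0" is only a copy of -a, so later reads of x can use -a
// directly ("exp x" becomes "exp -a"), as in local copy propagation.
// Assumes every register is written exactly once.
int propagateNegativeAdd(std::vector<Insn>& insns) {
  std::map<int, Src> copies;  // x -> the (possibly negated) source it copies
  int rewrites = 0;
  for (Insn& insn : insns) {
    for (Src* s : {&insn.src0, &insn.src1}) {
      if (s->isImm) continue;
      auto it = copies.find(s->reg);
      if (it == copies.end()) continue;
      bool negate = s->negate != it->second.negate;  // -(-a) == a
      if (negate && !acceptsSrcModifier(insn.op)) continue;
      *s = it->second;
      s->negate = negate;
      ++rewrites;
    }
    if (insn.op == Op::Add && !insn.src0.isImm && insn.src1.isImm &&
        insn.src1.imm == 0.0f) {
      copies[insn.dst] = insn.src0;
    }
  }
  return rewrites;
}
```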

[Beignet] [PATCH] backend: add global immediate optimization

2017-05-26 Thread rander.wang
,8,1>:D it can be ADD(16) %49<1>:D: %48<8,8,1>:D 0x0:UD ADD(16) %54<1>:D: %53<8,8,1>:D 0x30:UD Then the MOV can be removed, and after this optimization, ADD 0 can be changed to M
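A sketch of folding a `MOV reg, imm` into the ADDs that read it and then dropping the MOV, assuming every register is written exactly once across the kernel. `Src`, `Insn` and `foldGlobalImmediates` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

enum class Op { Mov, Add };

struct Src {
  bool isImm = false;
  int reg = -1;
  uint32_t imm = 0;
};

struct Insn {
  Op op;
  int dst;
  Src src0;
  Src src1;
};

// Folds "mov r, imm" into the ADDs that read r as their last source, then
// drops the mov once it has no remaining readers. Single assignment per
// register is what makes this rewrite safe across blocks.
void foldGlobalImmediates(std::vector<Insn>& insns) {
  std::map<int, uint32_t> immOf;  // register -> immediate it was loaded with
  for (const Insn& i : insns)
    if (i.op == Op::Mov && i.src0.isImm) immOf[i.dst] = i.src0.imm;

  std::map<int, int> remainingUses;
  for (Insn& i : insns) {
    // Only src1 is folded: Gen allows an immediate only as the last source.
    if (i.op == Op::Add && !i.src1.isImm && immOf.count(i.src1.reg))
      i.src1 = Src{true, -1, immOf[i.src1.reg]};
    for (const Src* s : {&i.src0, &i.src1})
      if (!s->isImm) ++remainingUses[s->reg];
  }

  std::vector<Insn> kept;
  for (const Insn& i : insns)
    if (!(i.op == Op::Mov && i.src0.isImm && remainingUses[i.dst] == 0))
      kept.push_back(i);
  insns.swap(kept);
}
```

With the input `mov r1, 0x30; add r54, r53, r1`, the result is the single `add r54, r53, 0x30`, the same shape as the ADD(16) lines above.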

[Beignet] [PATCH] utests: add utests for global imm optimized

2017-05-26 Thread rander.wang
<8,8,1>:D 0x0:UD ADD(16) %54<1>:D: %53<8,8,1>:D 0x30:UD Signed-off-by: rander.wang <rander.w...@intel.com> --- kernels/compiler_global_immediate_optimized.cl | 49 ++ utests/CMakeLists.txt | 3 +-

[Beignet] [PATCH 1/2] backend: fix tgamma error after restructure

2017-05-18 Thread rander.wang
Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/libocl/tmpl/ocl_math_common.tmpl.cl | 56 ++--- 1 file changed, 31 insertions(+), 25 deletions(-) diff --git a/backend/src/libocl/tmpl/ocl_math_common.tmpl.cl b/backend/src/libocl/tmpl/ocl_math_common.t

[Beignet] [PATCH] backend: add sqrt-div pattern to instruction select

2017-05-19 Thread rander.wang
There are some patterns like: sqrt r1, r2; load r4, 1.0; div r3, r4, r1; ===> rsqrt r3, r2; Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/backend/gen_insn_selection.cpp | 68 ++ 1 file changed, 68 insertions(+) diff --git

[Beignet] [PATCH] utests: add utests for load/store optimization

2017-06-04 Thread rander.wang
loads in this case can be merged from 8 to 4. Signed-off-by: rander.wang <rander.w...@intel.com> --- kernels/compiler_load_store_merging.cl | 18 utests/CMakeLists.txt | 3 +- utests/compiler_load_store_merging.cpp | 51 +++

[Beignet] [PATCH] backend: refine load/store merging algorithm

2017-06-04 Thread rander.wang
, for 32-bit data, the distance is 4. Put them in a list. (2) Sort the list by the distance from the start. (3) Search for the continuous sequence that includes the start, and merge it. Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/sr
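Steps (2) and (3) can be sketched as follows, assuming step (1) has already produced each candidate's byte distance from the start access. `Access`, `findMergeableRun` and the `maxRun` cap are hypothetical; the walk only goes forward from the start, in line with requiring the start to be the first insn of the searched array.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// A candidate memory access and its byte distance from the start access.
struct Access {
  int id;
  int64_t distance;  // the start access itself has distance 0
};

// Returns the ids of the accesses forming one contiguous run that begins at
// the start access, for elements of elemSize bytes (4 for 32-bit data).
std::vector<int> findMergeableRun(std::vector<Access> cands, int64_t elemSize,
                                  size_t maxRun) {
  // (2) sort the candidates by distance from the start
  std::sort(cands.begin(), cands.end(),
            [](const Access& a, const Access& b) { return a.distance < b.distance; });
  // (3) walk forward from the start while each next access is exactly one
  //     element further away, up to the hypothetical maxRun cap
  auto start = std::find_if(cands.begin(), cands.end(),
                            [](const Access& a) { return a.distance == 0; });
  std::vector<int> run;
  if (start == cands.end()) return run;
  run.push_back(start->id);
  int64_t expected = elemSize;
  for (auto it = start + 1; it != cands.end() && run.size() < maxRun; ++it) {
    if (it->distance != expected) break;  // gap: the sequence ends here
    run.push_back(it->id);
    expected += elemSize;
  }
  return run;
}
```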

[Beignet] [PATCH V3] backend: add global immediate optimization

2017-06-12 Thread rander.wang
hen local copy propagation can be done. V2: (1) add environment variable to enable/disable the optimization (2) refine the architecture of imm optimization, inherit from global optimizer not local block optimizer V3: merge with latest master driver

[Beignet] [PATCH V2] backend: add global immediate optimization

2017-06-11 Thread rander.wang
hen local copy propagation can be done. V2: (1) add environment variable to enable/disable the optimization (2) refine the architecture of imm optimization, inherit from global optimizer not local block optimizer Signed-off-by: rander.wang <rander.w.

[Beignet] [PATCH V4] backend: add global immediate optimization

2017-06-13 Thread rander.wang
V4: (1) refine some type errors; (2) remove the UD/D check, which is not needed; (3) refine the immediate calculation for UD/D. Signed-off-by: rander.wang <rander.w...@intel.com> --- .../src/backend/gen_insn_selection_optimize.cpp| 367 +++-- 1 file

[Beignet] [PATCH V2] backend: refine load/store merging algorithm

2017-06-15 Thread rander.wang
offset compared to the start, then call std::sort. (2) Check the number of candidate IOs, which is favorable to performance because in most cases there is no chance to merge IOs. Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/llvm/llvm_loadstore_optimization.cp

[Beignet] [PATCH] backend: refine math log function

2017-06-18 Thread rander.wang
Remove some unnecessary code, for a 20% improvement in the worst case. If x is a NaN, there were some if-return branches to return NaN; now they are replaced with add(x - x), which yields the same NaN. Passes the conformance tests and utests. Signed-off-by: rander.wang

[Beignet] [PATCH] backend: refine fdiv to rcp at some cases

2017-06-18 Thread rander.wang
the conformance test and utests Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/backend/gen_insn_selection.cpp | 29 + 1 file changed, 29 insertions(+) diff --git a/backend/src/backend/gen_insn_selection.cpp b/backend/src/backend/gen_insn_selecti

[Beignet] [PATCH] utests: add utest for fdiv to rcp

2017-06-19 Thread rander.wang
In this case, 1.0f/src and 2.0f/src can be converted, but 3.0f/src and i/src cannot. Signed-off-by: rander.wang <rander.w...@intel.com> --- kernels/compiler_fdiv2rcp.cl | 8 ++ utests/CMakeLists.txt| 3 ++- utests/compiler_fdiv2rcp.cp
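A sketch of the numerator check the utest describes. One plausible reason 2.0f qualifies while 3.0f does not is that multiplying by 2 only changes the exponent (barring overflow), so it adds no rounding error, while multiplying by 3 does. `canUseRcp` is hypothetical, and accepting -1.0f and -2.0f through a negate modifier is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

// Decides whether "num / src" may be rewritten using rcp(src).
// From the utest description: 1.0f/src and 2.0f/src qualify, while 3.0f/src
// and a non-constant numerator (i/src) do not. Treating -1.0f and -2.0f the
// same way, via a negate modifier, is an assumption.
bool canUseRcp(std::optional<float> constNumerator) {
  if (!constNumerator) return false;  // numerator is not a compile-time constant
  float mag = std::fabs(*constNumerator);
  return mag == 1.0f || mag == 2.0f;
}
```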

[Beignet] [PATCH] backend: refine the local copy propagation.

2017-06-13 Thread rander.wang
The src modifier is not supported by some instructions, so return false when it exists. This fixes a piglit % failure. Signed-off-by: rander.wang <rander.w...@intel.com> --- .../src/backend/gen_insn_selection_optimize.cpp| 32 ++ 1 file chang