[Beignet] [PATCH V4] backend: refine load store optimization

2017-07-11 Thread rander.wang
offset to the pointer of start. Passes OpenCV, conformance basic and compiler tests, and utests. V4: check the pointer type; if it is 64-bit, modify it by 64, otherwise by 32. Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/llvm/llvm_loadstore_optimization.cpp

[Beignet] [PATCH V3] backend: refine load store optimization

2017-07-06 Thread rander.wang
offset to the pointer of start. Passes OpenCV, conformance basic and compiler tests, and utests. Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/llvm/llvm_loadstore_optimization.cpp | 103 --- 1 file changed, 74 insertions(+), 29 deletions(-) diff

[Beignet] [PATCH V3] backend: improve add zero pattern

2017-07-06 Thread rander.wang
Remove the negation check for adding zero; the negated case can also use this optimization. V2: refine the function name for zeroAdd. V3: refine the build. Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/backend/gen_insn_selection_optimize.cp
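
A minimal sketch of the add-zero folding on a toy IR, assuming an `add` with an immediate-zero source can simply become a `mov` of the other source. With the negation check removed, a negated operand keeps its modifier, so `add dst, -a, 0` becomes `mov dst, -a`. `Opcode`, `Operand`, `Insn` and `foldAddZero` are hypothetical illustrations, not the types in `gen_insn_selection_optimize.cpp`.

```cpp
#include <cassert>
#include <optional>

// Hypothetical toy IR; these are not Beignet's SelectionInstruction types.
enum class Opcode { Add, Mov };

struct Operand {
  int reg = -1;               // virtual register, or -1 for an immediate
  std::optional<float> imm;   // immediate value when reg == -1
  bool negate = false;        // source negation modifier
};

struct Insn {
  Opcode op;
  int dst;
  Operand src0, src1;
};

static bool isZeroImm(const Operand &o) {
  return o.reg == -1 && o.imm.has_value() && *o.imm == 0.0f;
}

// Rewrites "add dst, x, 0" (or "add dst, 0, x") into "mov dst, x".
// No negation check: the modifier on x is kept, so
// "add dst, -a, 0" becomes "mov dst, -a".
bool foldAddZero(Insn &insn) {
  if (insn.op != Opcode::Add) return false;
  if (isZeroImm(insn.src0))
    insn.src0 = insn.src1;          // move the live operand into src0
  else if (!isZeroImm(insn.src1))
    return false;                   // neither source is an immediate zero
  insn.op = Opcode::Mov;
  insn.src1 = Operand{};            // MOV has a single source
  return true;
}
```

For integers the fold is exact. For floats, x + 0.0 differs from x only when x is -0.0 (the sum is +0.0), so the rewrite can change the sign of a zero result.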

[Beignet] [PATCH V3] backend: refine fdiv to rcp at some cases

2017-07-04 Thread rander.wang
the conformance tests and utests. V2: refine the negation flag. V3: change negation to negate. Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/backend/gen_insn_selection.cpp | 28 1 file changed, 28 insertions(+) diff --git a/backe

[Beignet] [PATCH V2] backend: improve add zero pattern

2017-07-03 Thread rander.wang
Remove the negation check for adding zero; the negated case can also use this optimization. V2: refine the function name for zeroAdd. Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/backend/gen_insn_selection_optimize.cpp | 6 +++--- 1 file chan

[Beignet] [PATCH] backend: refine load store optimization

2017-07-02 Thread rander.wang
for overflow if there are too many insns. (2) Make sure the start insn is the first insn of the searched array, because if it is not, the offset may be invalid, and it is complex to adjust the offset without error. Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/sr

[Beignet] [PATCH] backend: refine global immediate optimization

2017-06-30 Thread rander.wang
for ABS(UD) = UD on Gen, so delete it; otherwise compilation fails on some platforms. Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/backend/gen_insn_selection_optimize.cpp | 4 1 file changed, 4 deletions(-) diff --git a/backend/src/b

[Beignet] [PATCH] backend: improve add zero pattern

2017-06-23 Thread rander.wang
Remove the negation check for adding zero; the negated case can also use this optimization. Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/backend/gen_insn_selection_optimize.cpp | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/backe

[Beignet] [PATCH] Runtime: refine max group size for SKL & KBL

2017-06-22 Thread rander.wang
Now change the max group size to 256, a reasonable size for Gen9. According to performance tests, 256 gives a good improvement in OpenCV with no regressions, so change it. Signed-off-by: rander.wang <rander.w...@intel.com> --- src/cl_device_id.c | 18 +-

[Beignet] [PATCH] backend: refine pow function

2017-06-22 Thread rander.wang
Now it takes 40% less time than before: (1) group the many branches that handle corner cases into one branch; (2) use the HW exp2 and log2 to replace some instructions. Passes the conformance tests and utests. Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/
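
Both ideas in the pow patch can be illustrated with a scalar sketch, assuming positive finite x on the fast path: compute x^y as exp2(y * log2(x)), and route every special input through one branch. `powSketch` is a hypothetical name, and its slow path simply defers to `std::pow` instead of reproducing Beignet's corner-case handling.

```cpp
#include <cassert>
#include <cmath>

// Sketch: pow via exp2/log2, with all special inputs grouped into a
// single slow-path branch (delegated to std::pow for illustration).
float powSketch(float x, float y) {
  // !(x > 0) also catches NaN x; infinities are sent to the slow path too.
  bool corner = !(x > 0.0f) || !std::isfinite(x) || !std::isfinite(y);
  if (corner)
    return std::pow(x, y);              // one branch for x <= 0, inf, NaN
  return std::exp2(y * std::log2(x));   // fast path: 2^(y * log2 x)
}
```

The product y * log2(x) is rounded before exp2, so the relative error of this fast path grows with the magnitude of that product; a conformant implementation has to compensate for it.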

[Beignet] [PATCH] utests: add utest for fdiv to rcp

2017-06-19 Thread rander.wang
for this case, 1.0f/src and 2.0f/src can be converted, but 3.0f/src and i/src cannot. Signed-off-by: rander.wang <rander.w...@intel.com> --- kernels/compiler_fdiv2rcp.cl | 8 ++ utests/CMakeLists.txt| 3 ++- utests/compiler_fdiv2rcp.cp
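
A sketch of the conversion rule the utest describes, assuming the numerator must be a compile-time constant and using 1.0f / x to stand in for the hardware reciprocal: 1.0f/src becomes rcp(src), 2.0f/src becomes 2 * rcp(src), and 3.0f/src or i/src (a non-constant numerator) are left as divisions. `RcpRewrite` and `matchFdivToRcp` are hypothetical, and carrying the sign in `scale` is only an assumption about how the negation flag of the later revisions is used.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

// Hypothetical description of a rewritten "num / x": result = scale * rcp(x).
struct RcpRewrite {
  float scale;
};

// Accepts a constant numerator of magnitude 1.0f or 2.0f; the sign is kept
// in `scale` (assumption: the real patch tracks it as a negate flag).
std::optional<RcpRewrite> matchFdivToRcp(std::optional<float> constNumerator) {
  if (!constNumerator) return std::nullopt;        // e.g. i/src
  float m = std::fabs(*constNumerator);
  if (m == 1.0f || m == 2.0f) return RcpRewrite{*constNumerator};
  return std::nullopt;                             // e.g. 3.0f/src
}

// Reference evaluation of the rewritten form.
float evalRewrite(const RcpRewrite &r, float x) {
  return r.scale * (1.0f / x);
}
```

Multiplying by 2.0f only changes the exponent, so 2 * rcp(x) adds no rounding beyond that of rcp(x) (barring overflow), whereas 3.0f * rcp(x) introduces a second rounding; that is one plausible reason the 3.0f case stays a division.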

[Beignet] [PATCH] backend: refine fdiv to rcp at some cases

2017-06-18 Thread rander.wang
the conformance tests and utests. Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/backend/gen_insn_selection.cpp | 29 + 1 file changed, 29 insertions(+) diff --git a/backend/src/backend/gen_insn_selection.cpp b/backend/src/backend/gen_insn_selecti

[Beignet] [PATCH] backend: refine math log function

2017-06-18 Thread rander.wang
Remove some unnecessary code and get a 20% improvement in the worst case. If x is a NaN, there were some if-return statements to return NaN; now they are changed to add(x - x), which yields the same NaN. Passes the conformance tests and utests. Signed-off-by: rander.wang
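
The NaN trick can be checked in isolation. In IEEE 754 arithmetic x - x is +0 for every finite x and NaN when x is NaN, so adding it to a result leaves ordinary results unchanged and turns the result into NaN for NaN input, with no branch. `propagateNaN` is a hypothetical helper; the snippet does not show where Beignet's log applies the addition. Note that inf - inf is also NaN, so infinite inputs need handling elsewhere.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Branch-free NaN propagation: (x - x) is +0 for finite x and NaN for NaN x.
// inf - inf is also NaN, so infinite x must be dealt with separately.
float propagateNaN(float x, float result) {
  return result + (x - x);
}
```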

[Beignet] [PATCH V2] backend: refine load/store merging algorithm

2017-06-15 Thread rander.wang
offset compared to start, then call std::sort. (2) Check the number of candidate IOs, which is favorable to performance because in most cases there is no chance to merge IO. Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/llvm/llvm_loadstore_optimization.cp

[Beignet] [PATCH V4] backend: add global immediate optimization

2017-06-13 Thread rander.wang
V4: (1) fix some type errors; (2) remove the unneeded UD/D check; (3) refine the imm calculation for UD/D. Signed-off-by: rander.wang <rander.w...@intel.com> --- .../src/backend/gen_insn_selection_optimize.cpp| 367 +++-- 1 file

[Beignet] [PATCH] backend: refine the local copy propagation.

2017-06-13 Thread rander.wang
Source modifiers are not supported by some instructions, so return false when one is present. This fixes the piglit % failures. Signed-off-by: rander.wang <rander.w...@intel.com> --- .../src/backend/gen_insn_selection_optimize.cpp| 32 ++ 1 file chang

[Beignet] [PATCH V3] backend: add global immediate optimization

2017-06-12 Thread rander.wang
hen local copy propagation can be done. V2: (1) add an environment variable to enable/disable the optimization; (2) refine the architecture of the imm optimization so that it inherits from the global optimizer, not the local block optimizer. V3: merge with the latest master driver.

[Beignet] [PATCH V2] backend: add global immediate optimization

2017-06-11 Thread rander.wang
hen local copy propagation can be done. V2: (1) add an environment variable to enable/disable the optimization; (2) refine the architecture of the imm optimization so that it inherits from the global optimizer, not the local block optimizer. Signed-off-by: rander.wang <rander.w.

[Beignet] [PATCH] backend: refine load/store merging algorithm

2017-06-04 Thread rander.wang
, for 32-bit data, the distance is 4. Put them in a list. (2) Sort the list by the distance from the start. (3) Search for the continuous sequence that includes the start, and merge it. Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/sr
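
The three steps can be sketched as follows, assuming each candidate's byte offset from the start access is already known (so consecutive 32-bit elements differ by 4). `MemAccess`, `findMergeRun` and the `maxMerge` cap are hypothetical. The early return when the smallest offset is not the start mirrors the later revision in this list that requires the start insn to be the first insn of the searched array.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical candidate: a load or store whose address is start + offset.
struct MemAccess {
  int id;           // instruction index in the basic block
  int64_t offset;   // byte distance from the start access's pointer
};

// Returns the ids forming a contiguous run that begins at the start access
// (offset 0) and steps by elemSize bytes; the run is capped at maxMerge.
std::vector<int> findMergeRun(std::vector<MemAccess> cands,
                              int64_t elemSize, size_t maxMerge) {
  // (2) sort the candidates by their distance from the start
  std::sort(cands.begin(), cands.end(),
            [](const MemAccess &a, const MemAccess &b) {
              return a.offset < b.offset;
            });
  std::vector<int> run;
  // the start must come first; otherwise the offsets would need fixing up
  if (cands.empty() || cands.front().offset != 0) return run;
  run.push_back(cands.front().id);
  int64_t expected = elemSize;
  // (3) extend the run while the offsets stay consecutive
  for (size_t i = 1; i < cands.size() && run.size() < maxMerge; ++i) {
    if (cands[i].offset != expected) break;   // a gap or duplicate ends it
    run.push_back(cands[i].id);
    expected += elemSize;
  }
  return run;
}
```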

[Beignet] [PATCH] utests: add utests for load/store optimization

2017-06-04 Thread rander.wang
loads in this case can be merged from 8 into 4. Signed-off-by: rander.wang <rander.w...@intel.com> --- kernels/compiler_load_store_merging.cl | 18 utests/CMakeLists.txt | 3 +- utests/compiler_load_store_merging.cpp | 51 +++

[Beignet] [PATCH] backend: add global immediate optimization

2017-05-26 Thread rander.wang
,8,1>:D, it can be ADD(16) %49<1>:D: %48<8,8,1>:D 0x0:UD; ADD(16) %54<1>:D: %53<8,8,1>:D 0x30:UD. Then the MOV can be removed, and after this optimization, ADD 0 can be changed to M
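
A toy version of the MOV-immediate folding shown above, assuming every register has a single definition and that, as in the example, the immediate may only replace the last source operand. `Src`, `Inst` and `propagateImmediates` are hypothetical and skip the control-flow and type checks a real global pass needs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy IR operand: a virtual register or an immediate.
struct Src {
  bool isImm;
  int64_t value;   // register number, or the immediate value if isImm
};

struct Inst {
  std::string op;  // "MOV", "ADD", ...
  int dst;
  std::vector<Src> srcs;
};

// Assumes single definitions: "MOV rX, imm" means rX == imm everywhere.
// Folds such immediates into the last source of their users, then erases
// the MOVs whose destination is no longer read.
void propagateImmediates(std::vector<Inst> &code) {
  std::map<int, int64_t> immOf;
  for (const Inst &i : code)
    if (i.op == "MOV" && i.srcs.size() == 1 && i.srcs[0].isImm)
      immOf[i.dst] = i.srcs[0].value;

  for (Inst &i : code) {
    if (i.op == "MOV" || i.srcs.empty()) continue;
    Src &last = i.srcs.back();          // only the last source may be an imm
    if (last.isImm) continue;
    auto it = immOf.find(static_cast<int>(last.value));
    if (it != immOf.end()) last = Src{true, it->second};
  }

  std::map<int, int> reads;             // remaining register reads
  for (const Inst &i : code)
    for (const Src &s : i.srcs)
      if (!s.isImm) ++reads[static_cast<int>(s.value)];

  code.erase(std::remove_if(code.begin(), code.end(),
                            [&](const Inst &i) {
                              return i.op == "MOV" && immOf.count(i.dst) &&
                                     reads[i.dst] == 0;
                            }),
             code.end());
}
```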

[Beignet] [PATCH] utests: add utests for global imm optimized

2017-05-26 Thread rander.wang
<8,8,1>:D 0x0:UD ADD(16) %54<1>:D: %53<8,8,1>:D 0x30:UD Signed-off-by: rander.wang <rander.w...@intel.com> --- kernels/compiler_global_immediate_optimized.cl | 49 ++ utests/CMakeLists.txt | 3 +-

[Beignet] [PATCH] utests: added for optimization negativeAdd

2017-05-22 Thread rander.wang
the negative Add looks like: exp -a, which LLVM transforms into: add x, -a, 0; exp x. Signed-off-by: rander.wang <rander.w...@intel.com> --- kernels/compiler_remove_negative_add.cl | 4 utests/CMakeLists.txt | 3 ++-

[Beignet] [PATCH] utests: added for optimization negtiveAdd

2017-05-19 Thread rander.wang
the negative Add looks like: exp -a, which LLVM transforms into: add x, -a, 0; exp x. Signed-off-by: rander.wang <rander.w...@intel.com> --- kernels/compiler_remove_negtiveAdd.cl | 4 utests/CMakeLists.txt | 3 ++-

[Beignet] [PATCH] backend: add sqrt-div pattern to instruction select

2017-05-19 Thread rander.wang
there are some patterns like: sqrt r1, r2; load r4, 1.0; div r3, r4, r1; ===> rsqrt r3, r2. Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/backend/gen_insn_selection.cpp | 77 ++ 1 file changed, 77 insertions(+) diff --git
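
The pattern can be matched on a single-definition toy IR: if the numerator of a div is the constant 1.0 and its denominator comes from sqrt, the div can be selected as rsqrt of the sqrt's operand. `Def`, the `loadi` opcode and `matchSqrtDiv` are hypothetical stand-ins for Beignet's instruction-selection structures.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical toy IR: each register has exactly one defining instruction.
struct Def {
  std::string op;        // "sqrt", "loadi" (load immediate), "div", ...
  int src0 = -1, src1 = -1;
  float imm = 0.0f;      // used by "loadi"
};

// For divReg = div(num, den) with num == 1.0 and den = sqrt(x), returns x,
// so the division can be selected as rsqrt(x). Otherwise returns nothing.
std::optional<int> matchSqrtDiv(const std::map<int, Def> &defs, int divReg) {
  auto get = [&](int r) -> const Def * {
    auto it = defs.find(r);
    return it == defs.end() ? nullptr : &it->second;
  };
  const Def *div = get(divReg);
  if (!div || div->op != "div") return std::nullopt;
  const Def *num = get(div->src0);
  const Def *den = get(div->src1);
  if (!num || num->op != "loadi" || num->imm != 1.0f) return std::nullopt;
  if (!den || den->op != "sqrt") return std::nullopt;
  return den->src0;   // operand for rsqrt
}
```

If the sqrt result (r1 in the pattern) has other users, the sqrt itself must stay; only the div is replaced.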

[Beignet] [PATCH] utests: add utest for sqrt-div optimization

2017-05-19 Thread rander.wang
Signed-off-by: rander.wang <rander.w...@intel.com> --- kernels/compiler_sqrtDiv.cl | 8 ++ utests/CMakeLists.txt | 3 ++- utests/compiler_sqrtDiv.cpp | 61 + 3 files changed, 71 insertions(+), 1 deletion(-) create mode 100644 k

[Beignet] [PATCH] backend: add sqrt-div pattern to instruction select

2017-05-19 Thread rander.wang
there are some patterns like: sqrt r1, r2; load r4, 1.0; div r3, r4, r1; ===> rsqrt r3, r2. Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/backend/gen_insn_selection.cpp | 68 ++ 1 file changed, 68 insertions(+) diff --git

[Beignet] [PATCH 1/2] backend: fix tgamma error after restructure

2017-05-18 Thread rander.wang
Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/libocl/tmpl/ocl_math_common.tmpl.cl | 56 ++--- 1 file changed, 31 insertions(+), 25 deletions(-) diff --git a/backend/src/libocl/tmpl/ocl_math_common.tmpl.cl b/backend/src/libocl/tmpl/ocl_math_common.t

[Beignet] [PATCH] backend: refine hypot function

2017-05-18 Thread rander.wang
the OpenCV test OCL_Magnitude is slow on Beignet because of hypot. Refine hypot by changing the algorithm and removing unnecessary code, for a 30% improvement. Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/libocl/tmpl/ocl_math_common.tmpl.c
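
The patch does not say which algorithm replaced the old one. For reference, a common overflow-safe formulation scales by the larger magnitude: hypot(x, y) = big * sqrt(1 + (small/big)^2). The sketch below, under the hypothetical name `hypotSketch`, shows that technique with the usual special cases (an infinite argument wins over NaN); it is not Beignet's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// Overflow-safe hypot by scaling (a common technique, not Beignet's code).
float hypotSketch(float x, float y) {
  float ax = std::fabs(x), ay = std::fabs(y);
  if (std::isinf(ax) || std::isinf(ay))          // inf wins, even over NaN
    return std::numeric_limits<float>::infinity();
  if (std::isnan(ax) || std::isnan(ay))
    return std::numeric_limits<float>::quiet_NaN();
  float big = std::max(ax, ay), small = std::min(ax, ay);
  if (big == 0.0f) return 0.0f;                  // hypot(0, 0) = 0
  float r = small / big;                         // 0 <= r <= 1, no overflow
  return big * std::sqrt(1.0f + r * r);          // big * sqrt(1 + r^2)
}
```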

[Beignet] [PATCH 1/2] Backend: Add optimization for negative modifier

2017-05-17 Thread rander.wang
, -b, so it is a Mov operation like LocalCopyPropagation Signed-off-by: rander.wang <rander.w...@intel.com> --- .../src/backend/gen_insn_selection_optimize.cpp| 35 +++--- 1 file changed, 31 insertions(+), 4 deletions(-) diff --git a/backend/src/b

[Beignet] [PATCH] backend: add sqrt-div pattern to instruction select

2017-05-17 Thread rander.wang
there are some patterns like: sqrt r1, r2; load r4, 1.0; div r3, r4, r1; ===> rsqrt r3, r2. Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/backend/gen_insn_selection.cpp | 71 ++ 1 file changed, 71 insertions(+) diff --git

[Beignet] [PATCH] backend: add double constants defined by spec

2017-04-20 Thread rander.wang
Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/libocl/include/ocl_float.h | 34 -- 1 file changed, 32 insertions(+), 2 deletions(-) diff --git a/backend/src/libocl/include/ocl_float.h b/backend/src/libocl/include/ocl_float.h index 7
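
The snippet does not list which constants were added. As a cross-check for the double-precision limit macros OpenCL C defines (DBL_DIG, DBL_MANT_DIG, the exponent limits, DBL_MAX, DBL_MIN and DBL_EPSILON), the IEEE 754 binary64 values can be compared at compile time against `std::numeric_limits<double>`. The `kDbl*` names are hypothetical stand-ins for the macros.

```cpp
#include <cassert>
#include <limits>

// IEEE 754 binary64 limits; names mirror the OpenCL C DBL_* macros.
constexpr int    kDblDig      = 15;
constexpr int    kDblMantDig  = 53;
constexpr int    kDblMaxExp   = 1024;
constexpr int    kDblMinExp   = -1021;
constexpr int    kDblMax10Exp = 308;
constexpr int    kDblMin10Exp = -307;
constexpr double kDblMax      = 0x1.fffffffffffffp1023;
constexpr double kDblMin      = 0x1.0p-1022;
constexpr double kDblEpsilon  = 0x1.0p-52;

using L = std::numeric_limits<double>;
static_assert(kDblDig == L::digits10, "DBL_DIG");
static_assert(kDblMantDig == L::digits, "DBL_MANT_DIG");
static_assert(kDblMaxExp == L::max_exponent, "DBL_MAX_EXP");
static_assert(kDblMinExp == L::min_exponent, "DBL_MIN_EXP");
static_assert(kDblMax10Exp == L::max_exponent10, "DBL_MAX_10_EXP");
static_assert(kDblMin10Exp == L::min_exponent10, "DBL_MIN_10_EXP");
static_assert(kDblMax == L::max(), "DBL_MAX");
static_assert(kDblMin == L::min(), "DBL_MIN");
static_assert(kDblEpsilon == L::epsilon(), "DBL_EPSILON");
```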

[Beignet] [PATCH] backend: refine the unpack operation to 64bit register

2017-04-20 Thread rander.wang
H4L4H5L5H6L6H7L7. Now define a dedicated interface class to deal with this case. Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/CMakeLists.txt | 1 + backend/src/backend/gen8_context.cpp | 28 +-- backend/src/backend/gen_context.cpp

[Beignet] [PATCH] backend: add denorm support to double operations

2017-04-19 Thread rander.wang
Set the Double Precision Denorm Mode bit in the control register to enable double denormals when there is a double operation. This is supported from IVB onward, so it is set in GenContext. Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/backend/gen_context.cpp
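
The rule of setting the bit only when the kernel contains a double operation can be sketched as a scan over instruction result types. `Type`, `controlRegisterFor` and especially `kDoubleDenormBit` are hypothetical; the real bit position comes from the Gen hardware documentation, not from this patch note.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical IR summary: the element type of each instruction's result.
enum class Type { F32, F64, I32, I64 };

// Hypothetical bit index of "Double Precision Denorm Mode" in the control
// register; look up the real value in the hardware documentation.
constexpr unsigned kDoubleDenormBit = 6;

// Returns the control-register value the kernel prologue should write:
// the denorm bit is set only if some instruction operates on doubles.
uint32_t controlRegisterFor(const std::vector<Type> &insnTypes, uint32_t cr0) {
  for (Type t : insnTypes)
    if (t == Type::F64)
      return cr0 | (1u << kDoubleDenormBit);
  return cr0;
}
```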

[Beignet] [PATCH] backend: add denorm support to double operations

2017-04-18 Thread rander.wang
Set the Double Precision Denorm Mode bit in the control register to enable double denormals when there is a double operation. This is supported from IVB onward, so it is set in GenContext. Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/backend/gen_context.cpp

[Beignet] [PATCH] backend: refine sel instruction

2017-04-18 Thread rander.wang
double, long, and ulong are not supported by the sel instruction with a conditional modifier; using them outputs random data. Signed-off-by: rander.wang <rander.w...@intel.com> --- backend/src/backend/gen_insn_selection.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff
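
A sketch of the type check this fix implies. `ElemType` and `selCondModSupported` are hypothetical names. When the check fails, a selector would fall back to a form without a conditional modifier on SEL, for example a separate compare that sets a flag followed by a predicated SEL; that fallback is an assumption, since the one-line diff is not shown.

```cpp
#include <cassert>

// Hypothetical element types seen by instruction selection.
enum class ElemType { Float, Int, UInt, Double, Long, ULong };

// Per the patch note: SEL with a conditional modifier produces random data
// for double, long and ulong, so those types must not use that form.
bool selCondModSupported(ElemType t) {
  switch (t) {
    case ElemType::Double:
    case ElemType::Long:
    case ElemType::ULong:
      return false;
    default:
      return true;
  }
}
```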