Signed-off-by: rander.wang <rander.w...@intel.com>
---
backend/src/libocl/include/ocl_float.h | 34 --
1 file changed, 32 insertions(+), 2 deletions(-)
diff --git a/backend/src/libocl/include/ocl_float.h
b/backend/src/libocl/include/ocl_float.h
index 7
Set the Double Precision Denorm Mode bit in the control register to enable
double denormals
when there is a double operator. This is supported from IVB onward, so it is
set in GenContext
Signed-off-by: rander.wang <rander.w...@intel.com>
---
backend/src/backend/gen_context.cpp
H4L4H5L5H6L6H7L7
Now define a dedicated interface class to deal with this case
Signed-off-by: rander.wang <rander.w...@intel.com>
---
backend/src/CMakeLists.txt | 1 +
backend/src/backend/gen8_context.cpp | 28 +--
backend/src/backend/gen_context.cpp
double, long, and ulong are not supported by the sel instruction with a
conditional modifier;
otherwise it outputs random data
Signed-off-by: rander.wang <rander.w...@intel.com>
---
backend/src/backend/gen_insn_selection.cpp | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff
Set the Double Precision Denorm Mode bit in the control register to enable
double denormals
when there is a double operator. This is supported from IVB onward, so it is
set in GenContext
Signed-off-by: rander.wang <rander.w...@intel.com>
---
backend/src/backend/gen_context.cpp
offset to the pointer of start
pass OpenCV, conformance basic and compiler tests, utests
V4: check the pointer type; if it is 64-bit, modify it by 64, otherwise by 32
Signed-off-by: rander.wang <rander.w...@intel.com>
---
backend/src/llvm/llvm_loadstore_optimization.cpp
for ABS(UD) = UD on Gen, so delete it;
otherwise compilation fails on some platforms
Signed-off-by: rander.wang <rander.w...@intel.com>
---
backend/src/backend/gen_insn_selection_optimize.cpp | 4
1 file changed, 4 deletions(-)
diff --git a/backend/src/b
Remove the negation check for adding zero.
This optimization can also be applied in that case.
V2: refine the function name for zeroAdd
Signed-off-by: rander.wang <rander.w...@intel.com>
---
backend/src/backend/gen_insn_selection_optimize.cpp | 6 +++---
1 file chan
the conformance test and utests
V2: refine negation flag
V3: modify negation by negate
Signed-off-by: rander.wang <rander.w...@intel.com>
---
backend/src/backend/gen_insn_selection.cpp | 28
1 file changed, 28 insertions(+)
diff --git a/backe
for overflow if too many insn
(2) Make sure the start insn is the first insn of the searched array,
because if it is not the first, the offset may be invalid, and
it is complex to modify the offset without error
Signed-off-by: rander.wang <rander.w...@intel.com>
---
backend/sr
offset to the pointer of start
pass OpenCV, conformance basic and compiler tests, utests
Signed-off-by: rander.wang <rander.w...@intel.com>
---
backend/src/llvm/llvm_loadstore_optimization.cpp | 103 ---
1 file changed, 74 insertions(+), 29 deletions(-)
diff
Remove the negation check for adding zero.
This optimization can also be applied in that case.
V2: refine the function name for zeroAdd
V3: refine the build
Signed-off-by: rander.wang <rander.w...@intel.com>
---
backend/src/backend/gen_insn_selection_optimize.cp
The OpenCV test OCL_Magnitude is slow on Beignet because
of hypot. Refine hypot: change the algorithm and remove
unnecessary code for a 30% speedup
Signed-off-by: rander.wang <rander.w...@intel.com>
---
backend/src/libocl/tmpl/ocl_math_common.tmpl.c
There are some patterns like:
sqrt r1, r2;
load r4, 1.0; ===> rsqrt r3, r2
div r3, r4, r1;
Signed-off-by: rander.wang <rander.w...@intel.com>
---
backend/src/backend/gen_insn_selection.cpp | 71 ++
1 file changed, 71 insertions(+)
diff --git
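The sqrt/div pattern described in the message above can be sketched as a small peephole pass. This is only an illustration on a toy instruction list: the `Insn` struct and `foldSqrtDiv` are hypothetical names, not Beignet's selection IR, and the sketch assumes the intermediate registers have no other users.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy three-address instruction (hypothetical, not Beignet's SelectionInstruction).
struct Insn {
    std::string op;  // "sqrt", "load", "div", "rsqrt", ...
    int dst;         // destination register id
    int src0;        // first source register id, -1 if unused
    int src1;        // second source register id, -1 if unused
    float imm;       // immediate value, only meaningful for "load"
};

// Rewrites   sqrt t, x ; load one, 1.0 ; div d, one, t   into   rsqrt d, x.
// Assumption: t and one are not read by any other instruction.
std::vector<Insn> foldSqrtDiv(const std::vector<Insn>& in) {
    std::vector<Insn> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (i + 2 < in.size()) {
            const Insn& s = in[i];
            const Insn& l = in[i + 1];
            const Insn& d = in[i + 2];
            const bool match = s.op == "sqrt" && l.op == "load" && l.imm == 1.0f &&
                               d.op == "div" && d.src0 == l.dst && d.src1 == s.dst;
            if (match) {
                // Keep the div's destination, read the sqrt's original source.
                out.push_back({"rsqrt", d.dst, s.src0, -1, 0.0f});
                i += 2;  // skip the load and the div that were folded away
                continue;
            }
        }
        out.push_back(in[i]);
    }
    return out;
}
```

A real pass would also have to prove that the sqrt result and the loaded constant are dead after the div; the sketch skips that check for brevity.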
, -b, so it is a Mov operation like
LocalCopyPropagation
Signed-off-by: rander.wang <rander.w...@intel.com>
---
.../src/backend/gen_insn_selection_optimize.cpp| 35 +++---
1 file changed, 31 insertions(+), 4 deletions(-)
diff --git a/backend/src/b
Now takes 40% less time than before
(1) group many branches which deal with corner cases into one branch.
(2) use HW exp2 and log2 to replace some instructions
pass conformance tests and utest
Signed-off-by: rander.wang <rander.w...@intel.com>
---
backend/src/
Now change the max group size to 256. It is a reasonable
size for Gen9. According to performance tests, 256 gives
good gains in OpenCV with no regression, so change it
Signed-off-by: rander.wang <rander.w...@intel.com>
---
src/cl_device_id.c | 18 +-
Remove the negation check for adding zero.
This optimization can also be applied in that case.
Signed-off-by: rander.wang <rander.w...@intel.com>
---
backend/src/backend/gen_insn_selection_optimize.cpp | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/backe
The negative add looks like:
exp -a
LLVM transforms it to:
add x -a, 0
exp x
Signed-off-by: rander.wang <rander.w...@intel.com>
---
kernels/compiler_remove_negtiveAdd.cl | 4
utests/CMakeLists.txt | 3 ++-
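The "add x -a, 0; exp x" shape described above can be removed by forwarding the (possibly negated) source of the zero-add straight into its consumer. The sketch below shows that idea on a toy single-block instruction list; `Operand`, `Insn`, and `removeZeroAdd` are hypothetical names for illustration and do not reflect Beignet's actual optimizer classes. It assumes each register is written once and that the consumer supports a source negate modifier.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// A source operand with an optional negate modifier (hypothetical toy types).
struct Operand {
    int reg;
    bool negate;
};

// Toy instruction: dst = op(src); for "add" with addsZero, dst = src + 0.
struct Insn {
    std::string op;
    int dst;
    Operand src;
    bool addsZero;
};

// Drops every "add dst, src, 0" and rewrites later readers of dst to read
// src directly, carrying over the negate modifier.
std::vector<Insn> removeZeroAdd(const std::vector<Insn>& in) {
    std::map<int, Operand> alias;  // dst of a removed add -> forwarded operand
    std::vector<Insn> out;
    for (Insn insn : in) {
        auto it = alias.find(insn.src.reg);
        if (it != alias.end()) {
            // Two negations cancel, so combine the flags with XOR.
            insn.src.negate = insn.src.negate != it->second.negate;
            insn.src.reg = it->second.reg;
        }
        if (insn.op == "add" && insn.addsZero) {
            alias[insn.dst] = insn.src;  // remember the forwarding, emit nothing
            continue;
        }
        out.push_back(insn);
    }
    return out;
}
```

For the example in the message, the add disappears and the exp reads `-a` directly, which is the original "exp -a" form.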
There are some patterns like:
sqrt r1, r2;
load r4, 1.0; ===> rsqrt r3, r2
div r3, r4, r1;
Signed-off-by: rander.wang <rander.w...@intel.com>
---
backend/src/backend/gen_insn_selection.cpp | 77 ++
1 file changed, 77 insertions(+)
diff --git
Signed-off-by: rander.wang <rander.w...@intel.com>
---
kernels/compiler_sqrtDiv.cl | 8 ++
utests/CMakeLists.txt | 3 ++-
utests/compiler_sqrtDiv.cpp | 61 +
3 files changed, 71 insertions(+), 1 deletion(-)
create mode 100644 k
The negative add looks like:
exp -a
LLVM transforms it to:
add x -a, 0
exp x
Signed-off-by: rander.wang <rander.w...@intel.com>
---
kernels/compiler_remove_negative_add.cl | 4
utests/CMakeLists.txt | 3 ++-
,8,1>:D
it can be
ADD(16) %49<1>:D: %48<8,8,1>:D 0x0:UD
ADD(16) %54<1>:D: %53<8,8,1>:D 0x30:UD
Then the MOV can be removed. After this optimization, ADD 0 can be
changed
to M
<8,8,1>:D 0x0:UD
ADD(16) %54<1>:D: %53<8,8,1>:D 0x30:UD
Signed-off-by: rander.wang <rander.w...@intel.com>
---
kernels/compiler_global_immediate_optimized.cl | 49 ++
utests/CMakeLists.txt | 3 +-
Signed-off-by: rander.wang <rander.w...@intel.com>
---
backend/src/libocl/tmpl/ocl_math_common.tmpl.cl | 56 ++---
1 file changed, 31 insertions(+), 25 deletions(-)
diff --git a/backend/src/libocl/tmpl/ocl_math_common.tmpl.cl
b/backend/src/libocl/tmpl/ocl_math_common.t
There are some patterns like:
sqrt r1, r2;
load r4, 1.0; ===> rsqrt r3, r2
div r3, r4, r1;
Signed-off-by: rander.wang <rander.w...@intel.com>
---
backend/src/backend/gen_insn_selection.cpp | 68 ++
1 file changed, 68 insertions(+)
diff --git
loads in this case can be merged to 4 from 8
Signed-off-by: rander.wang <rander.w...@intel.com>
---
kernels/compiler_load_store_merging.cl | 18
utests/CMakeLists.txt | 3 +-
utests/compiler_load_store_merging.cpp | 51 +++
, for 32-bit data, the distance is 4. Put them in a list
(2) sort the list by the distance from the start.
(3) search the continuous sequence including the start to merge
Signed-off-by: rander.wang <rander.w...@intel.com>
---
backend/sr
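Steps (2) and (3) above, sorting candidates by distance from the start and then finding the contiguous sequence that includes the start, can be sketched as follows. This is a minimal illustration on plain byte offsets, not the LLVM pass itself; `contiguousRunAroundStart` is a hypothetical name, and the start access is assumed to be represented by offset 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// offsets: byte distances of candidate loads/stores from the start access
// (the start itself is offset 0). elemSize: access width in bytes, e.g. 4
// for 32-bit data. Returns the sorted run of consecutive offsets that
// contains the start, which is the set of accesses that could be merged.
std::vector<std::int64_t> contiguousRunAroundStart(std::vector<std::int64_t> offsets,
                                                   std::int64_t elemSize) {
    assert(elemSize > 0);
    std::sort(offsets.begin(), offsets.end());
    auto startIt = std::find(offsets.begin(), offsets.end(), 0);
    if (startIt == offsets.end()) return {};  // start not among the candidates

    // Extend the run backwards while neighbours are exactly one element apart.
    auto first = startIt;
    while (first != offsets.begin() && *first - *(first - 1) == elemSize) --first;

    // Extend the run forwards in the same way.
    auto last = startIt;
    while (last + 1 != offsets.end() && *(last + 1) - *last == elemSize) ++last;

    return std::vector<std::int64_t>(first, last + 1);
}
```

With offsets {8, 0, 4, 16, -4} and a 4-byte element, the run is -4, 0, 4, 8; the access at 16 is left out because there is a gap at 12.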
hen local copy propagation can be done.
V2: (1) add environment variable to enable/disable the optimization
(2) refine the architecture of imm optimization, inherit from global
optimizer not local block optimizer
V3: merge with latest master driver
hen local copy propagation can be done.
V2: (1) add environment variable to enable/disable the optimization
(2) refine the architecture of imm optimization, inherit from global
optimizer not local block optimizer
Signed-off-by: rander.wang <rander.w.
V4: (1) fix some type errors
(2) remove the unneeded UD/D check
(3) refine the immediate calculation for UD/D
Signed-off-by: rander.wang <rander.w...@intel.com>
---
.../src/backend/gen_insn_selection_optimize.cpp| 367 +++--
1 file
offset compared to start. Then call std::sort
(2) check the number of candidate IOs, which helps performance because
in most cases there is no chance to merge IO
Signed-off-by: rander.wang <rander.w...@intel.com>
---
backend/src/llvm/llvm_loadstore_optimization.cp
Remove some unnecessary code and get a 20% improvement
in the worst case. If X is a NaN, there were some if-return
paths to return NaN. Now change them to add (x - x), which
yields the same NaN
pass the conformance tests and utests
Signed-off-by: rander.wang
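The NaN trick mentioned above relies on IEEE 754 arithmetic: x - x is 0 for any finite x and NaN when x is NaN, so adding it to a result propagates NaN without a branch. The sketch below contrasts the two forms; `nanByBranch` and `nanBySubtract` are hypothetical helpers, not the library's math code, and the sketch assumes the compiler is not run with fast-math style flags that would fold x - x to 0.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Branching form: an explicit early return when the input is NaN.
float nanByBranch(float x, float result) {
    if (std::isnan(x)) return x;
    return result;
}

// Branchless form: (x - x) is 0.0f for finite x and NaN for NaN x,
// so the sum is either the unchanged result or NaN.
float nanBySubtract(float x, float result) {
    return result + (x - x);
}
```

One caveat: x - x is also NaN when x is infinite, so the branchless form only matches the branching one where infinities are handled separately or are expected to produce NaN as well.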
the conformance test and utests
Signed-off-by: rander.wang <rander.w...@intel.com>
---
backend/src/backend/gen_insn_selection.cpp | 29 +
1 file changed, 29 insertions(+)
diff --git a/backend/src/backend/gen_insn_selection.cpp
b/backend/src/backend/gen_insn_selecti
for this case 1.0f/src and 2.0f/src can be converted,
but 3.0f/src and i/src cannot
Signed-off-by: rander.wang <rander.w...@intel.com>
---
kernels/compiler_fdiv2rcp.cl | 8 ++
utests/CMakeLists.txt| 3 ++-
utests/compiler_fdiv2rcp.cp
Source modifiers are not supported by some instructions,
so return false when one exists. This fixes the piglit %
failure
Signed-off-by: rander.wang <rander.w...@intel.com>
---
.../src/backend/gen_insn_selection_optimize.cpp| 32 ++
1 file chang
36 matches