Re: [Beignet] [PATCH 2/2] Workgroup reduce add optimization using add4 and SLM

2016-03-04 Thread Lupescu, Grigore
May I connect remotely to an IVB machine, ideally the one on which you saw the tests fail? You could send the credentials via IM or mail. Could you please also test on other platforms such as HSW, SKL, or BSW to see whether the tests fail there too? -Original Message- From: Weng, Chuanbo Sent:

Re: [Beignet] [PATCH 2/2] Workgroup reduce add optimization using add4 and SLM

2016-03-04 Thread Weng, Chuanbo
The previous implementation works on IVB, and SLM on IVB is active. All the utest_run cases you list below fail. The failure looks like the following (the result seems to be calculated incorrectly): compiler_workgroup_reduce_add_float() [FAILED] Error: ((float *)buf_data[1])[i] == cpu_res at file /root/opencl/beignet

Re: [Beignet] [PATCH 2/2] Workgroup reduce add optimization using add4 and SLM

2016-03-04 Thread Lupescu, Grigore
The code doesn't use anything particular to the BDW platform, nor does it use anything that the previous implementation did not. Does the previous implementation (prior to the patch) work on IVB? I haven't tested it on IVB or HSW, just BDW. Another issue might be about the SLM in IVB -

[Beignet] [PATCH 1/3] GBE: Fix type mismatch bug.

2016-03-04 Thread Ruiling Song
The move instruction should have the same type for src and dst. Signed-off-by: Ruiling Song --- backend/src/llvm/llvm_gen_backend.cpp | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/backend/src/llvm/llvm_gen_backend.cpp b/backend/src/llvm/llvm_gen_backend.cpp index 6646536..32f8fe3

[Beignet] [PATCH 2/3] GBE: Fix SEL.bool issue.

2016-03-04 Thread Ruiling Song
The flag register is used by the condition source, so we have to store the dst register in a GRF. When this dst is used, the flag register is updated from the allocated GRF. Signed-off-by: Ruiling Song --- backend/src/backend/gen_reg_allocation.cpp | 5 + 1 file changed, 5 insertions(+) diff --git

[Beignet] [PATCH 3/3] add ocl 2.0 work_group_barrier support.

2016-03-04 Thread Ruiling Song
To do an image barrier, we need to: 1. flush the L3 RW cache. 2. do a barrier gateway. 3. flush the sampler cache. Note that the fence arguments may be ORed together. We need to support a non-immediate barrier() argument. Signed-off-by: Ruiling Song --- backend/src/backend/gen8_encoder.cpp | 24 ++
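
The three steps listed in this patch description, together with the note that fence flags may be ORed and may arrive as a non-immediate value, can be sketched roughly as below. This is only an illustration and not code from the patch: the flag values, the step names, and the function plan_image_barrier are all hypothetical, and the handling of local and global fences is deliberately left out because the message does not describe it.

```c
/* Hypothetical fence flag bits, for illustration only.
 * Real values come from the OpenCL C headers of the implementation. */
enum {
    FENCE_LOCAL  = 1 << 0,
    FENCE_GLOBAL = 1 << 1,
    FENCE_IMAGE  = 1 << 2
};

/* Hypothetical names for the three steps in the patch description. */
enum {
    STEP_FLUSH_L3_RW     = 1 << 0, /* 1. flush L3 RW cache   */
    STEP_BARRIER_GATEWAY = 1 << 1, /* 2. do a barrier gateway */
    STEP_FLUSH_SAMPLER   = 1 << 2  /* 3. flush sampler cache  */
};

/* Decide which steps to issue for a possibly ORed fence mask.
 * The mask is tested at run time, so it need not be a compile-time
 * constant (the "non-immediate argument" case).
 * Assumption: the barrier gateway is always issued; the two cache
 * flushes are added only when the image fence bit is set. What the
 * local and global bits require is not modeled here. */
static unsigned plan_image_barrier(unsigned fence_flags)
{
    unsigned steps = STEP_BARRIER_GATEWAY;
    if (fence_flags & FENCE_IMAGE)
        steps |= STEP_FLUSH_L3_RW | STEP_FLUSH_SAMPLER;
    return steps;
}
```

With these assumed values, a mask such as FENCE_LOCAL | FENCE_IMAGE selects all three steps, while FENCE_LOCAL alone selects only the barrier gateway.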

Re: [Beignet] [PATCH 2/2] Workgroup reduce add optimization using add4 and SLM

2016-03-04 Thread Weng, Chuanbo
Which platform is your code based on? I've run ./utest_run compiler_workgroup_reduce_add_float on IVB and BDW; it succeeds on BDW but fails on IVB. -Original Message- From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of Grigore Lupescu Sent: Tuesday, March 1, 2016 5:2