Re: [Beignet] [PATCH v2] Provide more possible candidate of load/store as possible.

2017-03-09 Thread Song, Ruiling

The load/store optimizer does not merge aggressively.
Normally, successive load instructions are not too far apart.
The performance difference is much higher than I thought.
So do the performance numbers come from the SKL platform? Have you tried this 
patch on BDW?
The performance behavior you observed may not apply to other platforms.
How much will the patch affect the compile time?

From: yan.wang [mailto:yan.w...@linux.intel.com]
Sent: Friday, March 10, 2017 10:52 AM
To: Song, Ruiling <ruiling.s...@intel.com>; beignet 
<beignet@lists.freedesktop.org>
Subject: Re: Re: [Beignet] [PATCH v2] Provide more possible candidate of 
load/store as possible.

It comes from darktable performance tuning.
For float type, maxVecSize is 4, so maxLimit = 4 * 8 = 32.
I am not sure of the reason for maxLimit = maxVecSize * 8.
32 is too small a search range, so no more mergeable loads are found after the 
leading load.
The patch improves the eaw_decompose kernel of darktable from 2.1876s to 1.8855s 
because it reduces the sends from 3 (2 float, 2 float, 1 float) to 2 (4 float, 
1 float).
There is another issue when compiling the eaw_decompose kernel, and I will submit 
another patch for it.
We at least need to set a lower bound for maxLimit, such as 150, to avoid a 
search range that is too small.
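
For reference, the arithmetic above can be written out as a small sketch. This is not Beignet code: the helper names oldSearchLimit and newSearchLimit are hypothetical and only mirror the two expressions in the diff, and the maxVecSize value of 32 is an assumed input chosen to show when the lower bound stops mattering.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helpers mirroring the two expressions in the diff.
unsigned oldSearchLimit(unsigned maxVecSize) { return maxVecSize * 8; }
unsigned newSearchLimit(unsigned maxVecSize) { return std::max(maxVecSize * 8, 150u); }

// Runtime check of the arithmetic quoted in the thread.
void checkSearchLimits() {
  assert(oldSearchLimit(4) == 32);    // float case: 4 * 8
  assert(newSearchLimit(4) == 150);   // raised to the lower bound of 150
  assert(newSearchLimit(32) == 256);  // assumed large maxVecSize keeps maxVecSize * 8
}
```

With these expressions the lower bound only takes effect while maxVecSize * 8 is below 150, so the float case moves from 32 to 150 and larger vector sizes are unchanged.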


yan.wang

From: Song, Ruiling
Date: 2017-03-10 10:39
To: yan.w...@linux.intel.com; beignet@lists.freedesktop.org
Subject: Re: [Beignet] [PATCH v2] Provide more possible candidate of load/store 
as possible.


> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> yan.w...@linux.intel.com
> Sent: Thursday, March 9, 2017 5:41 PM
> To: beignet@lists.freedesktop.org
> Cc: Yan Wang <yan.w...@linux.intel.com>
> Subject: [Beignet] [PATCH v2] Provide more possible candidate of load/store as
> possible.
>
> From: Yan Wang <yan.w...@linux.intel.com>
>
> Avoid a search range that is too small in some cases, such as a vector of floats.
> This lets more loads/stores be merged, improving performance.
>
> Signed-off-by: Yan Wang <yan.w...@linux.intel.com>
> ---
>  backend/src/llvm/llvm_loadstore_optimization.cpp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/backend/src/llvm/llvm_loadstore_optimization.cpp
> b/backend/src/llvm/llvm_loadstore_optimization.cpp
> index e797e98..e569a8e 100644
> --- a/backend/src/llvm/llvm_loadstore_optimization.cpp
> +++ b/backend/src/llvm/llvm_loadstore_optimization.cpp
> @@ -180,7 +180,7 @@ namespace gbe {
>  BasicBlock::iterator J = start;
>  ++J;
>
> -unsigned maxLimit = maxVecSize * 8;
> +unsigned maxLimit = std::max(maxVecSize * 8, 150u);

Could you give some performance numbers against some known benchmarks?
Please select a sufficiently complex OpenCL kernel. Maybe luxmark? Darktable?
How much would it benefit runtime performance, and how much would it hurt 
compile-time performance?
Then we can judge whether the change is reasonable.

Thanks!
Ruiling
>  bool reordered = false;
>
>  for(unsigned ss = 0; J != E && ss <= maxLimit; ++ss, ++J) {
> --
> 2.7.4
>
> ___
> Beignet mailing list
> Beignet@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/beignet


Re: [Beignet] [PATCH v2] Provide more possible candidate of load/store as possible.

2017-03-09 Thread yan . wang
There were some typos, sorry about that.
I have corrected them.



yan.wang
 
From: yan.wang
Date: 2017-03-10 10:52
To: ruiling.song; beignet
Subject: Re: [Beignet] [PATCH v2] Provide more possible candidate of load/store 
as possible.
It comes from darktable performance tuning.
For float type, maxVecSize is 4, so maxLimit = 4 * 8 = 32.
I am not sure of the reason for maxLimit = maxVecSize * 8.
32 is too small a search range, so no more mergeable loads are found after the 
leading load.
The patch improves the eaw_decompose kernel of darktable from 2.1876s to 1.8855s 
because it reduces the sends from 3 (2 float, 2 float, 1 float) to 2 (4 float, 
1 float).
There is another issue when compiling the eaw_decompose kernel, and I will submit 
another patch for it.
We at least need to set a lower bound for maxLimit, such as 150, to avoid a 
search range that is too small.
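
To illustrate why the window size matters, here is a simplified model of the bounded forward scan. It is only a sketch under stated assumptions: Inst, countMergeable and makeBlock are hypothetical names, the block layout is invented, and the model looks only at the window bound and consecutive offsets, ignoring every other check a real merging pass needs. The loop condition mirrors the `ss <= maxLimit` loop shown in the diff context.

```cpp
#include <cstddef>
#include <vector>

// Simplified stand-in for an instruction: either a float load from
// base + offset (in elements), or some unrelated instruction.
struct Inst {
  bool isLoad;
  int offset;  // only meaningful when isLoad is true
};

// Starting from the leading load at index `start`, scan forward while
// ss <= maxLimit and count how many loads continue the consecutive run,
// capped at maxVecSize elements.
unsigned countMergeable(const std::vector<Inst>& bb, std::size_t start,
                        unsigned maxVecSize, unsigned maxLimit) {
  unsigned merged = 1;  // the leading load itself
  int nextOffset = bb[start].offset + 1;
  std::size_t j = start + 1;
  for (unsigned ss = 0; j != bb.size() && ss <= maxLimit; ++ss, ++j) {
    if (merged == maxVecSize) break;  // vector is already full
    if (bb[j].isLoad && bb[j].offset == nextOffset) {
      ++merged;
      ++nextOffset;
    }
  }
  return merged;
}

// Build a block with loads at offsets 0..3, each followed by `gap`
// unrelated instructions.
std::vector<Inst> makeBlock(unsigned gap) {
  std::vector<Inst> bb;
  for (int off = 0; off < 4; ++off) {
    bb.push_back({true, off});
    for (unsigned g = 0; g < gap; ++g) bb.push_back({false, 0});
  }
  return bb;
}
```

In this model, with 20 unrelated instructions between loads, countMergeable(makeBlock(20), 0, 4, 32) stops after two loads, while a limit of 150 reaches all four. The cost is a longer scan per leading load, which is where the compile-time question comes from.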



yan.wang
 
From: Song, Ruiling
Date: 2017-03-10 10:39
To: yan.w...@linux.intel.com; beignet@lists.freedesktop.org
Subject: Re: [Beignet] [PATCH v2] Provide more possible candidate of load/store 
as possible.
 
 
> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> yan.w...@linux.intel.com
> Sent: Thursday, March 9, 2017 5:41 PM
> To: beignet@lists.freedesktop.org
> Cc: Yan Wang <yan.w...@linux.intel.com>
> Subject: [Beignet] [PATCH v2] Provide more possible candidate of load/store as
> possible.
> 
> From: Yan Wang <yan.w...@linux.intel.com>
> 
> Avoid a search range that is too small in some cases, such as a vector of floats.
> This lets more loads/stores be merged, improving performance.
> 
> Signed-off-by: Yan Wang <yan.w...@linux.intel.com>
> ---
>  backend/src/llvm/llvm_loadstore_optimization.cpp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/backend/src/llvm/llvm_loadstore_optimization.cpp
> b/backend/src/llvm/llvm_loadstore_optimization.cpp
> index e797e98..e569a8e 100644
> --- a/backend/src/llvm/llvm_loadstore_optimization.cpp
> +++ b/backend/src/llvm/llvm_loadstore_optimization.cpp
> @@ -180,7 +180,7 @@ namespace gbe {
>  BasicBlock::iterator J = start;
>  ++J;
> 
> -unsigned maxLimit = maxVecSize * 8;
> +unsigned maxLimit = std::max(maxVecSize * 8, 150u);
 
Could you give some performance numbers against some known benchmarks?
Please select a sufficiently complex OpenCL kernel. Maybe luxmark? Darktable?
How much would it benefit runtime performance, and how much would it hurt 
compile-time performance?
Then we can judge whether the change is reasonable.
 
Thanks!
Ruiling
>  bool reordered = false;
> 
>  for(unsigned ss = 0; J != E && ss <= maxLimit; ++ss, ++J) {
> --
> 2.7.4
> 


Re: [Beignet] [PATCH v2] Provide more possible candidate of load/store as possible.

2017-03-09 Thread yan . wang
It comes from darktable performance tuning.
For float type, maxVecSize is 4, so maxLimit = 4 * 8 = 32.
I am not sure of the reason for maxLimit = maxVecSize * 8.
32 is too small a search range, so no more mergeable loads are found after the 
leading load.
The patch improves the eaw_decompose kernel of darktable from 2.1876s to 1.8855s 
because it reduces the sends from 3 (2 float, 2 float, 1 float) to 2 (4 float, 
1 float).
There is another issue when compiling the eaw_decompose kernel, and I will submit 
another patch for it.
We at least need to set a lower bound for maxLimit, such as 150, to avoid a 
search range that is too small.



yan.wang
 
From: Song, Ruiling
Date: 2017-03-10 10:39
To: yan.w...@linux.intel.com; beignet@lists.freedesktop.org
Subject: Re: [Beignet] [PATCH v2] Provide more possible candidate of load/store 
as possible.
 
 
> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> yan.w...@linux.intel.com
> Sent: Thursday, March 9, 2017 5:41 PM
> To: beignet@lists.freedesktop.org
> Cc: Yan Wang <yan.w...@linux.intel.com>
> Subject: [Beignet] [PATCH v2] Provide more possible candidate of load/store as
> possible.
> 
> From: Yan Wang <yan.w...@linux.intel.com>
> 
> Avoid a search range that is too small in some cases, such as a vector of floats.
> This lets more loads/stores be merged, improving performance.
> 
> Signed-off-by: Yan Wang <yan.w...@linux.intel.com>
> ---
>  backend/src/llvm/llvm_loadstore_optimization.cpp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/backend/src/llvm/llvm_loadstore_optimization.cpp
> b/backend/src/llvm/llvm_loadstore_optimization.cpp
> index e797e98..e569a8e 100644
> --- a/backend/src/llvm/llvm_loadstore_optimization.cpp
> +++ b/backend/src/llvm/llvm_loadstore_optimization.cpp
> @@ -180,7 +180,7 @@ namespace gbe {
>  BasicBlock::iterator J = start;
>  ++J;
> 
> -unsigned maxLimit = maxVecSize * 8;
> +unsigned maxLimit = std::max(maxVecSize * 8, 150u);
 
Could you give some performance numbers against some known benchmarks?
Please select a sufficiently complex OpenCL kernel. Maybe luxmark? Darktable?
How much would it benefit runtime performance, and how much would it hurt 
compile-time performance?
Then we can judge whether the change is reasonable.
 
Thanks!
Ruiling
>  bool reordered = false;
> 
>  for(unsigned ss = 0; J != E && ss <= maxLimit; ++ss, ++J) {
> --
> 2.7.4
> 


Re: [Beignet] [PATCH v2] Provide more possible candidate of load/store as possible.

2017-03-09 Thread Song, Ruiling


> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> yan.w...@linux.intel.com
> Sent: Thursday, March 9, 2017 5:41 PM
> To: beignet@lists.freedesktop.org
> Cc: Yan Wang 
> Subject: [Beignet] [PATCH v2] Provide more possible candidate of load/store as
> possible.
> 
> From: Yan Wang 
> 
> Avoid a search range that is too small in some cases, such as a vector of floats.
> This lets more loads/stores be merged, improving performance.
> 
> Signed-off-by: Yan Wang 
> ---
>  backend/src/llvm/llvm_loadstore_optimization.cpp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/backend/src/llvm/llvm_loadstore_optimization.cpp
> b/backend/src/llvm/llvm_loadstore_optimization.cpp
> index e797e98..e569a8e 100644
> --- a/backend/src/llvm/llvm_loadstore_optimization.cpp
> +++ b/backend/src/llvm/llvm_loadstore_optimization.cpp
> @@ -180,7 +180,7 @@ namespace gbe {
>  BasicBlock::iterator J = start;
>  ++J;
> 
> -unsigned maxLimit = maxVecSize * 8;
> +unsigned maxLimit = std::max(maxVecSize * 8, 150u);

Could you give some performance numbers against some known benchmarks?
Please select a sufficiently complex OpenCL kernel. Maybe luxmark? Darktable?
How much would it benefit runtime performance, and how much would it hurt 
compile-time performance?
Then we can judge whether the change is reasonable.

Thanks!
Ruiling
>  bool reordered = false;
> 
>  for(unsigned ss = 0; J != E && ss <= maxLimit; ++ss, ++J) {
> --
> 2.7.4
> 