Re: [Mesa-dev] [PATCH 21/22] i965/vec4: Allow CSE on subset VF constant loads

2018-03-07 Thread Ian Romanick
On 03/05/2018 04:53 PM, Kenneth Graunke wrote:
> On Friday, February 23, 2018 3:56:06 PM PST Ian Romanick wrote:
>> From: Ian Romanick 
>>
>> No changes on other platforms.
>>
>> Haswell, Ivy Bridge, and Sandy Bridge had similar results. (Haswell shown)
>> total instructions in shared programs: 13059891 -> 13059884 (<.01%)
>> instructions in affected programs: 431 -> 424 (-1.62%)
>> helped: 7
>> HURT: 0
>> helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
>> helped stats (rel) min: 1.19% max: 5.26% x̄: 2.05% x̃: 1.49%
>> 95% mean confidence interval for instructions value: -1.00 -1.00
>> 95% mean confidence interval for instructions %-change: -3.39% -0.71%
>> Instructions are helped.
>>
>> total cycles in shared programs: 409260032 -> 409260018 (<.01%)
>> cycles in affected programs: 4228 -> 4214 (-0.33%)
>> helped: 7
>> HURT: 0
>> helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
>> helped stats (rel) min: 0.28% max: 2.04% x̄: 0.54% x̃: 0.28%
>> 95% mean confidence interval for cycles value: -2.00 -2.00
>> 95% mean confidence interval for cycles %-change: -1.15% 0.07%
>>
>> Inconclusive result (%-change mean confidence interval includes 0).
>>
>> Signed-off-by: Ian Romanick 
>> ---
>>  src/intel/compiler/brw_vec4_cse.cpp | 21 +
>>  1 file changed, 21 insertions(+)
>>
>> diff --git a/src/intel/compiler/brw_vec4_cse.cpp 
>> b/src/intel/compiler/brw_vec4_cse.cpp
>> index aa2f127..3302939 100644
>> --- a/src/intel/compiler/brw_vec4_cse.cpp
>> +++ b/src/intel/compiler/brw_vec4_cse.cpp
>> @@ -104,6 +104,27 @@ operands_match(const vec4_instruction *a, const 
>> vec4_instruction *b)
>>return xs[0].equals(ys[0]) &&
>>   ((xs[1].equals(ys[1]) && xs[2].equals(ys[2])) ||
>>(xs[2].equals(ys[1]) && xs[1].equals(ys[2])));
>> +   } else if (a->opcode == BRW_OPCODE_MOV &&
>> +  xs[0].file == IMM &&
>> +  xs[0].type == BRW_REGISTER_TYPE_VF) {
>> +  src_reg tmp_x = xs[0];
>> +  src_reg tmp_y = ys[0];
>> +
>> +  /* Smash out the values that are not part of the writemask.  Otherwise
>> +   * the equals and negative_equals operators will fail due to 
>> mismatches
>> +   * in unused components.
>> +   */
>> +  static const uint32_t mask[] = {
>> + 0x, 0x00ff, 0xff00, 0x,
>> + 0x00ff, 0x00ff00ff, 0x0000, 0x00ff,
>> + 0xff00, 0xffff, 0xff00ff00, 0xff00,
>> + 0x, 0x00ff, 0xff00, 0x
>> +  };
> 
> This looks a bit crazy at first glance...it isn't though, it's just a
> mapping from WRITEMASK_* to the proper byte masks.  But I think it might
> be clearer to write it with an explicit formula:
> 
> unsigned ab_writemask = a->dst.writemask & b->dst.writemask;
> uint32_t mask = ((ab_writemask & WRITEMASK_X) ? 0x00ff : 0) |
> ((ab_writemask & WRITEMASK_Y) ? 0xff00 : 0) |
> ((ab_writemask & WRITEMASK_Z) ? 0x00ff : 0) |
> ((ab_writemask & WRITEMASK_W) ? 0xff00 : 0);
> tmp_x.ud &= mask;
> tmp_y.ud &= mask;

I was going to complain that this would be bigger / less efficient /
something.  However, in a release build this is actually 48 bytes
smaller. :)

> This makes it pretty obvious what's going on.
> 
> Either way (though I prefer mine),
> Reviewed-by: Kenneth Graunke 
> 
>> +
>> +  tmp_x.ud &= mask[a->dst.writemask & b->dst.writemask];
>> +  tmp_y.ud &= mask[a->dst.writemask & b->dst.writemask];
>> +
>> +  return tmp_x.equals(tmp_y);
>> } else if (!a->is_commutative()) {
>>return xs[0].equals(ys[0]) && xs[1].equals(ys[1]) && 
>> xs[2].equals(ys[2]);
>> } else {
>>
> 




signature.asc
Description: OpenPGP digital signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 21/22] i965/vec4: Allow CSE on subset VF constant loads

2018-03-05 Thread Kenneth Graunke
On Friday, February 23, 2018 3:56:06 PM PST Ian Romanick wrote:
> From: Ian Romanick 
> 
> No changes on other platforms.
> 
> Haswell, Ivy Bridge, and Sandy Bridge had similar results. (Haswell shown)
> total instructions in shared programs: 13059891 -> 13059884 (<.01%)
> instructions in affected programs: 431 -> 424 (-1.62%)
> helped: 7
> HURT: 0
> helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
> helped stats (rel) min: 1.19% max: 5.26% x̄: 2.05% x̃: 1.49%
> 95% mean confidence interval for instructions value: -1.00 -1.00
> 95% mean confidence interval for instructions %-change: -3.39% -0.71%
> Instructions are helped.
> 
> total cycles in shared programs: 409260032 -> 409260018 (<.01%)
> cycles in affected programs: 4228 -> 4214 (-0.33%)
> helped: 7
> HURT: 0
> helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
> helped stats (rel) min: 0.28% max: 2.04% x̄: 0.54% x̃: 0.28%
> 95% mean confidence interval for cycles value: -2.00 -2.00
> 95% mean confidence interval for cycles %-change: -1.15% 0.07%
> 
> Inconclusive result (%-change mean confidence interval includes 0).
> 
> Signed-off-by: Ian Romanick 
> ---
>  src/intel/compiler/brw_vec4_cse.cpp | 21 +
>  1 file changed, 21 insertions(+)
> 
> diff --git a/src/intel/compiler/brw_vec4_cse.cpp 
> b/src/intel/compiler/brw_vec4_cse.cpp
> index aa2f127..3302939 100644
> --- a/src/intel/compiler/brw_vec4_cse.cpp
> +++ b/src/intel/compiler/brw_vec4_cse.cpp
> @@ -104,6 +104,27 @@ operands_match(const vec4_instruction *a, const 
> vec4_instruction *b)
>return xs[0].equals(ys[0]) &&
>   ((xs[1].equals(ys[1]) && xs[2].equals(ys[2])) ||
>(xs[2].equals(ys[1]) && xs[1].equals(ys[2])));
> +   } else if (a->opcode == BRW_OPCODE_MOV &&
> +  xs[0].file == IMM &&
> +  xs[0].type == BRW_REGISTER_TYPE_VF) {
> +  src_reg tmp_x = xs[0];
> +  src_reg tmp_y = ys[0];
> +
> +  /* Smash out the values that are not part of the writemask.  Otherwise
> +   * the equals and negative_equals operators will fail due to mismatches
> +   * in unused components.
> +   */
> +  static const uint32_t mask[] = {
> + 0x, 0x00ff, 0xff00, 0x,
> + 0x00ff, 0x00ff00ff, 0x0000, 0x00ff,
> + 0xff00, 0xffff, 0xff00ff00, 0xff00,
> + 0x, 0x00ff, 0xff00, 0x
> +  };

This looks a bit crazy at first glance...it isn't though, it's just a
mapping from WRITEMASK_* to the proper byte masks.  But I think it might
be clearer to write it with an explicit formula:

unsigned ab_writemask = a->dst.writemask & b->dst.writemask;
uint32_t mask = ((ab_writemask & WRITEMASK_X) ? 0x00ff : 0) |
((ab_writemask & WRITEMASK_Y) ? 0xff00 : 0) |
((ab_writemask & WRITEMASK_Z) ? 0x00ff : 0) |
((ab_writemask & WRITEMASK_W) ? 0xff00 : 0);
tmp_x.ud &= mask;
tmp_y.ud &= mask;

This makes it pretty obvious what's going on.

Either way (though I prefer mine),
Reviewed-by: Kenneth Graunke 

> +
> +  tmp_x.ud &= mask[a->dst.writemask & b->dst.writemask];
> +  tmp_y.ud &= mask[a->dst.writemask & b->dst.writemask];
> +
> +  return tmp_x.equals(tmp_y);
> } else if (!a->is_commutative()) {
>return xs[0].equals(ys[0]) && xs[1].equals(ys[1]) && 
> xs[2].equals(ys[2]);
> } else {
> 



signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 21/22] i965/vec4: Allow CSE on subset VF constant loads

2018-02-23 Thread Ian Romanick
From: Ian Romanick 

No changes on other platforms.

Haswell, Ivy Bridge, and Sandy Bridge had similar results. (Haswell shown)
total instructions in shared programs: 13059891 -> 13059884 (<.01%)
instructions in affected programs: 431 -> 424 (-1.62%)
helped: 7
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 1.19% max: 5.26% x̄: 2.05% x̃: 1.49%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -3.39% -0.71%
Instructions are helped.

total cycles in shared programs: 409260032 -> 409260018 (<.01%)
cycles in affected programs: 4228 -> 4214 (-0.33%)
helped: 7
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.28% max: 2.04% x̄: 0.54% x̃: 0.28%
95% mean confidence interval for cycles value: -2.00 -2.00
95% mean confidence interval for cycles %-change: -1.15% 0.07%

Inconclusive result (%-change mean confidence interval includes 0).

Signed-off-by: Ian Romanick 
---
 src/intel/compiler/brw_vec4_cse.cpp | 21 +
 1 file changed, 21 insertions(+)

diff --git a/src/intel/compiler/brw_vec4_cse.cpp 
b/src/intel/compiler/brw_vec4_cse.cpp
index aa2f127..3302939 100644
--- a/src/intel/compiler/brw_vec4_cse.cpp
+++ b/src/intel/compiler/brw_vec4_cse.cpp
@@ -104,6 +104,27 @@ operands_match(const vec4_instruction *a, const 
vec4_instruction *b)
   return xs[0].equals(ys[0]) &&
  ((xs[1].equals(ys[1]) && xs[2].equals(ys[2])) ||
   (xs[2].equals(ys[1]) && xs[1].equals(ys[2])));
+   } else if (a->opcode == BRW_OPCODE_MOV &&
+  xs[0].file == IMM &&
+  xs[0].type == BRW_REGISTER_TYPE_VF) {
+  src_reg tmp_x = xs[0];
+  src_reg tmp_y = ys[0];
+
+  /* Smash out the values that are not part of the writemask.  Otherwise
+   * the equals and negative_equals operators will fail due to mismatches
+   * in unused components.
+   */
+  static const uint32_t mask[] = {
+ 0x, 0x00ff, 0xff00, 0x,
+ 0x00ff, 0x00ff00ff, 0x0000, 0x00ff,
+ 0xff00, 0xffff, 0xff00ff00, 0xff00,
+ 0x, 0x00ff, 0xff00, 0x
+  };
+
+  tmp_x.ud &= mask[a->dst.writemask & b->dst.writemask];
+  tmp_y.ud &= mask[a->dst.writemask & b->dst.writemask];
+
+  return tmp_x.equals(tmp_y);
} else if (!a->is_commutative()) {
   return xs[0].equals(ys[0]) && xs[1].equals(ys[1]) && xs[2].equals(ys[2]);
} else {
-- 
2.9.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev