Re: [Mesa-dev] [PATCH 17/22] nir: Narrow some dot product operations

2018-03-07 Thread Ian Romanick
On 03/05/2018 04:35 PM, Matt Turner wrote:
> On Fri, Feb 23, 2018 at 3:56 PM, Ian Romanick  wrote:
>> From: Ian Romanick 
>>
>> On vector platforms, this helps elide some constant loads.
>>
>> No changes on Broadwell or Skylake.
>>
>> Haswell
>> total instructions in shared programs: 13093793 -> 13060163 (-0.26%)
>> instructions in affected programs: 1277532 -> 1243902 (-2.63%)
>> helped: 13216
>> HURT: 95
> 
> What's going on in the hurt shaders?

I'm not completely sure.  All of these shaders are negatively affected
by the DPH transformation.  Only one of the shaders is small enough (19
instructions) to easily examine.  In that case, it looks like a couple
things end up not getting loaded via VF.  Only one of those is the DPH
operand.  There appear to be changes in the constant loading in the
others as well, which causes the shaders to diverge slightly after about
5 instructions, making comparisons between the 70+ instruction shaders
frustrating at best.  Many of the shorter shaders had flow control, so
that exacerbated the issue.

I tried a couple modifications to the DPH pattern including
'vec4(is_used_once)' and 'c(is_not_const)'.  These had mixed results in
cycles, and didn't consistently help the instruction counts in the 95.

I did discover that I should have listed the transformations in the
opposite order.  As is, code that matches the last fdot4 pattern will
never become a multiply (speculation) because the previous
transformations will gradually convert it to an fdot2.
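
The ordering effect described above can be reproduced with a toy
first-match rewriter.  This is only a sketch: match(), substitute(),
rewrite() and the tuple encoding are hypothetical helpers, not the real
nir_search code, and it assumes rules are tried in list order with the
first match winning.

```python
# Toy model of first-match algebraic rewriting (NOT the real nir_search).
VARS = {"a", "b", "c", "d"}  # pattern variables; they match any operand


def match(pattern, expr, env):
    """Return True if expr matches pattern, binding variables in env."""
    if isinstance(pattern, str) and pattern in VARS:
        if pattern in env:
            return env[pattern] == expr
        env[pattern] = expr
        return True
    if isinstance(pattern, tuple):
        return (isinstance(expr, tuple) and len(expr) == len(pattern)
                and all(match(p, e, env) for p, e in zip(pattern, expr)))
    return pattern == expr  # opcode names and literal constants


def substitute(template, env):
    """Build the replacement expression from the bound variables."""
    if isinstance(template, str) and template in VARS:
        return env[template]
    if isinstance(template, tuple):
        return tuple(substitute(t, env) for t in template)
    return template


def rewrite(expr, rules):
    """Apply the first matching rule at the root until nothing matches."""
    changed = True
    while changed:
        changed = False
        for search, replace in rules:
            env = {}
            if match(search, expr, env):
                expr = substitute(replace, env)
                changed = True
                break
    return expr


# The six rules from the patch, in the order they were posted.
AS_POSTED = [
    (('fdot4', ('vec4', 'a', 'b', 'c', 1.0), 'd'), ('fdph', ('vec3', 'a', 'b', 'c'), 'd')),
    (('fdot4', ('vec4', 'a', 'b', 'c', 0.0), 'd'), ('fdot3', ('vec3', 'a', 'b', 'c'), 'd')),
    (('fdot4', ('vec4', 'a', 'b', 0.0, 0.0), 'c'), ('fdot2', ('vec2', 'a', 'b'), 'c')),
    (('fdot4', ('vec4', 'a', 0.0, 0.0, 0.0), 'b'), ('fmul', 'a', 'b')),
    (('fdot3', ('vec3', 'a', 'b', 0.0), 'c'), ('fdot2', ('vec2', 'a', 'b'), 'c')),
    (('fdot3', ('vec3', 'a', 0.0, 0.0), 'b'), ('fmul', 'a', 'b')),
]

# Same rules, most specific pattern first within each group.
FLIPPED = AS_POSTED[3::-1] + AS_POSTED[:3:-1]

# 'x' and 'v' stand for arbitrary non-constant operands.
expr = ('fdot4', ('vec4', 'x', 0.0, 0.0, 0.0), 'v')

print(rewrite(expr, AS_POSTED))  # gets stuck at an fdot2
print(rewrite(expr, FLIPPED))    # becomes an fmul directly
```

In this model the posted order narrows the expression one step at a
time (fdot4, fdot3, fdot2) and then stops, because no rule rewrites an
fdot2; the flipped order hits the fmul rule on the first try.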

Flipping the order helped instructions in 1 program but hurt cycles.
Looking at the changed shader, it appears that flipping the order allows
an fdot4 to be converted to an fdot2 instead of an fdot3.  This allows
CSE to eliminate the (new) fdot2.

Oddly, flipping the order made a shader-db run slightly slower...
1.113±0.711 seconds (0.429%±0.274%) at n=10 for a HSW run on my quadcore
HSW desktop.  I would have expected it to be slightly faster. *shrug*
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 17/22] nir: Narrow some dot product operations

2018-03-05 Thread Matt Turner
On Fri, Feb 23, 2018 at 3:56 PM, Ian Romanick  wrote:
> From: Ian Romanick 
>
> On vector platforms, this helps elide some constant loads.
>
> No changes on Broadwell or Skylake.
>
> Haswell
> total instructions in shared programs: 13093793 -> 13060163 (-0.26%)
> instructions in affected programs: 1277532 -> 1243902 (-2.63%)
> helped: 13216
> HURT: 95

What's going on in the hurt shaders?


Re: [Mesa-dev] [PATCH 17/22] nir: Narrow some dot product operations

2018-02-27 Thread Samuel Iglesias Gonsálvez
Reviewed-by: Samuel Iglesias Gonsálvez 


On 24/02/18 00:56, Ian Romanick wrote:
> From: Ian Romanick 
>
> On vector platforms, this helps elide some constant loads.
>
> No changes on Broadwell or Skylake.
>
> Haswell
> total instructions in shared programs: 13093793 -> 13060163 (-0.26%)
> instructions in affected programs: 1277532 -> 1243902 (-2.63%)
> helped: 13216
> HURT: 95
> helped stats (abs) min: 1 max: 18 x̄: 2.56 x̃: 2
> helped stats (rel) min: 0.21% max: 20.00% x̄: 3.63% x̃: 2.78%
> HURT stats (abs)   min: 1 max: 6 x̄: 1.77 x̃: 1
> HURT stats (rel)   min: 0.09% max: 5.56% x̄: 1.25% x̃: 1.19%
> 95% mean confidence interval for instructions value: -2.57 -2.49
> 95% mean confidence interval for instructions %-change: -3.65% -3.54%
> Instructions are helped.
>
> total cycles in shared programs: 409580819 -> 409268463 (-0.08%)
> cycles in affected programs: 71730652 -> 71418296 (-0.44%)
> helped: 9898
> HURT: 2352
> helped stats (abs) min: 2 max: 16014 x̄: 37.08 x̃: 16
> helped stats (rel) min: <.01% max: 35.55% x̄: 6.26% x̃: 4.50%
> HURT stats (abs)   min: 2 max: 276 x̄: 23.25 x̃: 6
> HURT stats (rel)   min: <.01% max: 40.00% x̄: 3.54% x̃: 1.97%
> 95% mean confidence interval for cycles value: -33.19 -17.80
> 95% mean confidence interval for cycles %-change: -4.50% -4.26%
> Cycles are helped.
>
> total fills in shared programs: 82059 -> 82052 (<.01%)
> fills in affected programs: 21 -> 14 (-33.33%)
> helped: 7
> HURT: 0
>
> Sandy Bridge and Ivy Bridge had similar results (Ivy Bridge shown)
> total instructions in shared programs: 11811851 -> 11780605 (-0.26%)
> instructions in affected programs: 1155007 -> 1123761 (-2.71%)
> helped: 12304
> HURT: 95
> helped stats (abs) min: 1 max: 18 x̄: 2.55 x̃: 2
> helped stats (rel) min: 0.21% max: 20.00% x̄: 3.69% x̃: 2.86%
> HURT stats (abs)   min: 1 max: 6 x̄: 1.77 x̃: 1
> HURT stats (rel)   min: 0.09% max: 5.56% x̄: 1.25% x̃: 1.19%
> 95% mean confidence interval for instructions value: -2.56 -2.48
> 95% mean confidence interval for instructions %-change: -3.71% -3.59%
> Instructions are helped.
>
> total cycles in shared programs: 257618409 -> 257316805 (-0.12%)
> cycles in affected programs: 71999580 -> 71697976 (-0.42%)
> helped: 9155
> HURT: 2380
> helped stats (abs) min: 2 max: 16014 x̄: 38.44 x̃: 16
> helped stats (rel) min: <.01% max: 35.75% x̄: 6.39% x̃: 4.62%
> HURT stats (abs)   min: 2 max: 290 x̄: 21.14 x̃: 4
> HURT stats (rel)   min: <.01% max: 41.55% x̄: 3.14% x̃: 1.33%
> 95% mean confidence interval for cycles value: -34.32 -17.97
> 95% mean confidence interval for cycles %-change: -4.55% -4.29%
> Cycles are helped.
>
> GM45 and Iron Lake had nearly identical results (Iron Lake shown)
> total instructions in shared programs: 7886750 -> 7879944 (-0.09%)
> instructions in affected programs: 373781 -> 366975 (-1.82%)
> helped: 3715
> HURT: 47
> helped stats (abs) min: 1 max: 8 x̄: 1.86 x̃: 1
> helped stats (rel) min: 0.22% max: 16.67% x̄: 2.88% x̃: 2.06%
> HURT stats (abs)   min: 1 max: 6 x̄: 2.55 x̃: 2
> HURT stats (rel)   min: 1.09% max: 5.00% x̄: 1.93% x̃: 2.35%
> 95% mean confidence interval for instructions value: -1.85 -1.77
> 95% mean confidence interval for instructions %-change: -2.91% -2.73%
> Instructions are helped.
>
> total cycles in shared programs: 178114636 -> 178095452 (-0.01%)
> cycles in affected programs: 7227666 -> 7208482 (-0.27%)
> helped: 3349
> HURT: 301
> helped stats (abs) min: 2 max: 90 x̄: 6.55 x̃: 4
> helped stats (rel) min: <.01% max: 14.18% x̄: 0.95% x̃: 0.63%
> HURT stats (abs)   min: 2 max: 42 x̄: 9.13 x̃: 10
> HURT stats (rel)   min: 0.01% max: 11.19% x̄: 1.22% x̃: 1.50%
> 95% mean confidence interval for cycles value: -5.52 -4.99
> 95% mean confidence interval for cycles %-change: -0.81% -0.73%
> Cycles are helped.
>
> Signed-off-by: Ian Romanick 
> ---
>  src/compiler/nir/nir_opt_algebraic.py | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py
> index 26ddf10..3366a43 100644
> --- a/src/compiler/nir/nir_opt_algebraic.py
> +++ b/src/compiler/nir/nir_opt_algebraic.py
> @@ -125,6 +125,14 @@ optimizations = [
> (('ffma', a, b, c), ('fadd', ('fmul', a, b), c), 'options->lower_ffma'),
> (('~fadd', ('fmul', a, b), c), ('ffma', a, b, c), 'options->fuse_ffma'),
>  
> +   (('fdot4', ('vec4', a, b,   c,   1.0), d), ('fdph',  ('vec3', a, b, c), d)),
> +   (('fdot4', ('vec4', a, b,   c,   0.0), d), ('fdot3', ('vec3', a, b, c), d)),
> +   (('fdot4', ('vec4', a, b,   0.0, 0.0), c), ('fdot2', ('vec2', a, b), c)),
> +   (('fdot4', ('vec4', a, 0.0, 0.0, 0.0), b), ('fmul', a, b)),
> +
> +   (('fdot3', ('vec3', a, b,   0.0), c), ('fdot2', ('vec2', a, b), c)),
> +   (('fdot3', ('vec3', a, 0.0, 0.0), b), ('fmul', a, b)),
> +
> # (a * #b + #c) << #d
> # ((a * #b) << #d) + (#c << #d)
> # (a * (#b << #d)) + (#c << #d)




[Mesa-dev] [PATCH 17/22] nir: Narrow some dot product operations

2018-02-23 Thread Ian Romanick
From: Ian Romanick 

On vector platforms, this helps elide some constant loads.

No changes on Broadwell or Skylake.

Haswell
total instructions in shared programs: 13093793 -> 13060163 (-0.26%)
instructions in affected programs: 1277532 -> 1243902 (-2.63%)
helped: 13216
HURT: 95
helped stats (abs) min: 1 max: 18 x̄: 2.56 x̃: 2
helped stats (rel) min: 0.21% max: 20.00% x̄: 3.63% x̃: 2.78%
HURT stats (abs)   min: 1 max: 6 x̄: 1.77 x̃: 1
HURT stats (rel)   min: 0.09% max: 5.56% x̄: 1.25% x̃: 1.19%
95% mean confidence interval for instructions value: -2.57 -2.49
95% mean confidence interval for instructions %-change: -3.65% -3.54%
Instructions are helped.

total cycles in shared programs: 409580819 -> 409268463 (-0.08%)
cycles in affected programs: 71730652 -> 71418296 (-0.44%)
helped: 9898
HURT: 2352
helped stats (abs) min: 2 max: 16014 x̄: 37.08 x̃: 16
helped stats (rel) min: <.01% max: 35.55% x̄: 6.26% x̃: 4.50%
HURT stats (abs)   min: 2 max: 276 x̄: 23.25 x̃: 6
HURT stats (rel)   min: <.01% max: 40.00% x̄: 3.54% x̃: 1.97%
95% mean confidence interval for cycles value: -33.19 -17.80
95% mean confidence interval for cycles %-change: -4.50% -4.26%
Cycles are helped.

total fills in shared programs: 82059 -> 82052 (<.01%)
fills in affected programs: 21 -> 14 (-33.33%)
helped: 7
HURT: 0

Sandy Bridge and Ivy Bridge had similar results (Ivy Bridge shown)
total instructions in shared programs: 11811851 -> 11780605 (-0.26%)
instructions in affected programs: 1155007 -> 1123761 (-2.71%)
helped: 12304
HURT: 95
helped stats (abs) min: 1 max: 18 x̄: 2.55 x̃: 2
helped stats (rel) min: 0.21% max: 20.00% x̄: 3.69% x̃: 2.86%
HURT stats (abs)   min: 1 max: 6 x̄: 1.77 x̃: 1
HURT stats (rel)   min: 0.09% max: 5.56% x̄: 1.25% x̃: 1.19%
95% mean confidence interval for instructions value: -2.56 -2.48
95% mean confidence interval for instructions %-change: -3.71% -3.59%
Instructions are helped.

total cycles in shared programs: 257618409 -> 257316805 (-0.12%)
cycles in affected programs: 71999580 -> 71697976 (-0.42%)
helped: 9155
HURT: 2380
helped stats (abs) min: 2 max: 16014 x̄: 38.44 x̃: 16
helped stats (rel) min: <.01% max: 35.75% x̄: 6.39% x̃: 4.62%
HURT stats (abs)   min: 2 max: 290 x̄: 21.14 x̃: 4
HURT stats (rel)   min: <.01% max: 41.55% x̄: 3.14% x̃: 1.33%
95% mean confidence interval for cycles value: -34.32 -17.97
95% mean confidence interval for cycles %-change: -4.55% -4.29%
Cycles are helped.

GM45 and Iron Lake had nearly identical results (Iron Lake shown)
total instructions in shared programs: 7886750 -> 7879944 (-0.09%)
instructions in affected programs: 373781 -> 366975 (-1.82%)
helped: 3715
HURT: 47
helped stats (abs) min: 1 max: 8 x̄: 1.86 x̃: 1
helped stats (rel) min: 0.22% max: 16.67% x̄: 2.88% x̃: 2.06%
HURT stats (abs)   min: 1 max: 6 x̄: 2.55 x̃: 2
HURT stats (rel)   min: 1.09% max: 5.00% x̄: 1.93% x̃: 2.35%
95% mean confidence interval for instructions value: -1.85 -1.77
95% mean confidence interval for instructions %-change: -2.91% -2.73%
Instructions are helped.

total cycles in shared programs: 178114636 -> 178095452 (-0.01%)
cycles in affected programs: 7227666 -> 7208482 (-0.27%)
helped: 3349
HURT: 301
helped stats (abs) min: 2 max: 90 x̄: 6.55 x̃: 4
helped stats (rel) min: <.01% max: 14.18% x̄: 0.95% x̃: 0.63%
HURT stats (abs)   min: 2 max: 42 x̄: 9.13 x̃: 10
HURT stats (rel)   min: 0.01% max: 11.19% x̄: 1.22% x̃: 1.50%
95% mean confidence interval for cycles value: -5.52 -4.99
95% mean confidence interval for cycles %-change: -0.81% -0.73%
Cycles are helped.

Signed-off-by: Ian Romanick 
---
 src/compiler/nir/nir_opt_algebraic.py | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py
index 26ddf10..3366a43 100644
--- a/src/compiler/nir/nir_opt_algebraic.py
+++ b/src/compiler/nir/nir_opt_algebraic.py
@@ -125,6 +125,14 @@ optimizations = [
(('ffma', a, b, c), ('fadd', ('fmul', a, b), c), 'options->lower_ffma'),
(('~fadd', ('fmul', a, b), c), ('ffma', a, b, c), 'options->fuse_ffma'),
 
+   (('fdot4', ('vec4', a, b,   c,   1.0), d), ('fdph',  ('vec3', a, b, c), d)),
+   (('fdot4', ('vec4', a, b,   c,   0.0), d), ('fdot3', ('vec3', a, b, c), d)),
+   (('fdot4', ('vec4', a, b,   0.0, 0.0), c), ('fdot2', ('vec2', a, b), c)),
+   (('fdot4', ('vec4', a, 0.0, 0.0, 0.0), b), ('fmul', a, b)),
+
+   (('fdot3', ('vec3', a, b,   0.0), c), ('fdot2', ('vec2', a, b), c)),
+   (('fdot3', ('vec3', a, 0.0, 0.0), b), ('fmul', a, b)),
+
# (a * #b + #c) << #d
# ((a * #b) << #d) + (#c << #d)
# (a * (#b << #d)) + (#c << #d)
-- 
2.9.5
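
The identities behind the six rules in the patch can be checked
numerically.  The following is a minimal sketch in plain Python rather
than NIR; fdot() and fdph() are hypothetical stand-ins for the NIR
opcodes, and the sample values are arbitrary.

```python
# Plain-Python stand-ins for the NIR opcodes (illustrative only).
def fdot(u, v):
    """Dot product over the first len(u) components of v."""
    return sum(x * y for x, y in zip(u, v))


def fdph(u3, v4):
    """Homogeneous dot product: dot(u.xyz, v.xyz) + v.w."""
    return fdot(u3, v4[:3]) + v4[3]


# Arbitrary finite sample values, chosen to be exactly representable.
a, b, c = 2.0, -3.0, 0.5
d = (4.0, 0.25, -8.0, 1.5)

# fdot4(vec4(a, b, c, 1.0), d) == fdph(vec3(a, b, c), d)
assert fdot((a, b, c, 1.0), d) == fdph((a, b, c), d)

# fdot4(vec4(a, b, c, 0.0), d) == fdot3(vec3(a, b, c), d)
assert fdot((a, b, c, 0.0), d) == fdot((a, b, c), d)

# fdot4(vec4(a, b, 0.0, 0.0), d) == fdot2(vec2(a, b), d)
assert fdot((a, b, 0.0, 0.0), d) == fdot((a, b), d)

# fdot4(vec4(a, 0.0, 0.0, 0.0), d) reduces to a single multiply by d.x
assert fdot((a, 0.0, 0.0, 0.0), d) == a * d[0]

# The fdot3 rules follow the same shape.
assert fdot((a, b, 0.0), d) == fdot((a, b), d)
assert fdot((a, 0.0, 0.0), d) == a * d[0]

# Caveat: the dropped 0.0 * d.w terms only vanish for finite d;
# 0.0 * inf is NaN under IEEE 754 arithmetic.
```

Each narrowed form reads fewer components of the constant vector,
which is where the elided constant loads in the commit message come
from.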
