From: Ian Romanick
For some reason, if I did not move the regular lowering to late
optimizations, the new lowering would never trigger. This also means
that the fsub lowering had to be added to late optimizations, and this
requires "intel/compiler: Repeat nir_opt_algebraic_late until no more
progress".
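
As a sanity check on the algebra only, here is a minimal Python sketch (not
Mesa code; the function names are hypothetical). It shows that the generic
lowering this patch deletes from the early optimizations, a + c(b - a), agrees
with the a(1-c) + bc form used in the new comment, and that a product of two
flrps sharing c can reuse (1-c). Exact rationals are used so float rounding
does not affect the comparison; real shader arithmetic may differ in the last
bits.

```python
# Sketch only: plain-Python stand-ins for the NIR expressions, not Mesa code.
from fractions import Fraction


def flrp_generic(a, b, c):
    # ('fadd', ('fmul', c, ('fsub', b, a)), a): the generic lowering
    return c * (b - a) + a


def flrp_expanded(a, b, c):
    # a(1-c) + bc: the form written in the new comment
    return a * (1 - c) + b * c


def product_shared(a, b, d, e, c):
    # flrp(a, b, c) * flrp(d, e, c), computing (1 - c) once for both factors
    one_minus_c = 1 - c
    return (a * one_minus_c + b * c) * (d * one_minus_c + e * c)


a, b, d, e, c = (Fraction(x) for x in ("0.25", "2", "-1.5", "4", "0.375"))
print(flrp_generic(a, b, c) == flrp_expanded(a, b, c))
print(product_shared(a, b, d, e, c)
      == flrp_generic(a, b, c) * flrp_generic(d, e, c))
```

Both comparisons hold for any exact inputs, since c(b - a) + a expands to
a - ac + bc.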
The loops removed by this patch are the same loops added by
"intel/compiler: Don't emit flrp for Gen4 or Gen5".
I am CC'ing people who are responsible for drivers that set lower_flrp32
as this patch will likely affect shader-db results for those drivers.
No changes on any Gen6+ platform.
Iron Lake
total instructions in shared programs: 7730019 -> 7731893 (0.02%)
instructions in affected programs: 139980 -> 141854 (1.34%)
helped: 262
HURT: 329
helped stats (abs) min: 1 max: 4 x̄: 3.00 x̃: 3
helped stats (rel) min: 0.11% max: 4.69% x̄: 1.70% x̃: 1.30%
HURT stats (abs) min: 1 max: 19 x̄: 8.09 x̃: 7
HURT stats (rel) min: 0.32% max: 23.53% x̄: 5.10% x̃: 4.74%
95% mean confidence interval for instructions value: 2.62 3.72
95% mean confidence interval for instructions %-change: 1.73% 2.44%
Instructions are HURT.
total cycles in shared programs: 177866190 -> 177851638 (<.01%)
cycles in affected programs: 18970354 -> 18955802 (-0.08%)
helped: 1700
HURT: 962
helped stats (abs) min: 2 max: 70 x̄: 17.40 x̃: 16
helped stats (rel) min: <.01% max: 3.36% x̄: 0.37% x̃: 0.23%
HURT stats (abs) min: 2 max: 114 x̄: 15.62 x̃: 6
HURT stats (rel) min: <.01% max: 10.50% x̄: 0.98% x̃: 0.39%
95% mean confidence interval for cycles value: -6.33 -4.60
95% mean confidence interval for cycles %-change: 0.07% 0.16%
Inconclusive result (value mean confidence interval and %-change mean
confidence interval disagree).
total loops in shared programs: 854 -> 850 (-0.47%)
loops in affected programs: 4 -> 0
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for loops value: -1.00 -1.00
95% mean confidence interval for loops %-change: -100.00% -100.00%
Loops are helped.
GM45
total instructions in shared programs: 4769335 -> 4770019 (0.01%)
instructions in affected programs: 90821 -> 91505 (0.75%)
helped: 219
HURT: 167
helped stats (abs) min: 1 max: 4 x̄: 3.00 x̃: 3
helped stats (rel) min: 0.11% max: 4.35% x̄: 1.70% x̃: 1.30%
HURT stats (abs) min: 1 max: 19 x̄: 8.02 x̃: 7
HURT stats (rel) min: 0.32% max: 22.86% x̄: 4.95% x̃: 4.57%
95% mean confidence interval for instructions value: 1.12 2.43
95% mean confidence interval for instructions %-change: 0.77% 1.59%
Instructions are HURT.
total cycles in shared programs: 121980262 -> 121970888 (<.01%)
cycles in affected programs: 12861602 -> 12852228 (-0.07%)
helped: 1040
HURT: 492
helped stats (abs) min: 2 max: 70 x̄: 17.65 x̃: 16
helped stats (rel) min: <.01% max: 3.36% x̄: 0.32% x̃: 0.21%
HURT stats (abs) min: 2 max: 114 x̄: 18.26 x̃: 6
HURT stats (rel) min: <.01% max: 10.50% x̄: 1.00% x̃: 0.35%
95% mean confidence interval for cycles value: -7.34 -4.89
95% mean confidence interval for cycles %-change: 0.05% 0.17%
Inconclusive result (value mean confidence interval and %-change mean
confidence interval disagree).
total loops in shared programs: 631 -> 629 (-0.32%)
loops in affected programs: 2 -> 0
helped: 2
HURT: 0
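
The percentages in the summaries above can be reproduced from the before and
after totals. A small sketch, assuming the figure in parentheses is the plain
relative change (the helper name is hypothetical and is not part of shader-db):

```python
# Sketch only: recompute the relative changes quoted in the report.
def pct_change(before, after):
    # Relative change in percent; positive means the count went up.
    return (after - before) / before * 100.0


# Iron Lake, instructions in affected programs
print(f"{pct_change(139980, 141854):.2f}%")
# GM45, instructions in affected programs
print(f"{pct_change(90821, 91505):.2f}%")
# Iron Lake, total loops in shared programs
print(f"{pct_change(854, 850):.2f}%")
```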
Signed-off-by: Ian Romanick
Cc: Marek Olšák
Cc: Rob Clark
Cc: Eric Anholt
---
src/compiler/nir/nir_opt_algebraic.py | 30 ++++++++++++++++++++++++++++--
1 file changed, 28 insertions(+), 2 deletions(-)
diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py
index f11a987c462..54f901e6cad 100644
--- a/src/compiler/nir/nir_opt_algebraic.py
+++ b/src/compiler/nir/nir_opt_algebraic.py
@@ -120,8 +120,6 @@ optimizations = [
(('flrp@64', 1.0, b, c), ('fadd', ('fsub', 1.0, c), ('fmul', b, c)), 'options->lower_flrp64'),
(('flrp@32', a, 1.0, c), ('fadd', a, ('fmul', c, ('fsub', 1.0, a))), 'options->lower_flrp32'),
(('flrp@64', a, 1.0, c), ('fadd', a, ('fmul', c, ('fsub', 1.0, a))), 'options->lower_flrp64'),
- (('flrp@32', a, b, c), ('fadd', ('fmul', c, ('fsub', b, a)), a), 'options->lower_flrp32'),
- (('flrp@64', a, b, c), ('fadd', ('fmul', c, ('fsub', b, a)), a), 'options->lower_flrp64'),
(('ffract', a), ('fsub', a, ('ffloor', a)), 'options->lower_ffract'),
(('~fadd', ('fmul', a, ('fadd', 1.0, ('fneg', ('b2f', c)))), ('fmul', b, ('b2f', c))), ('bcsel', c, b, a), 'options->lower_flrp32'),
(('~fadd@32', ('fmul', a, ('fadd', 1.0, ('fneg', c ))), ('fmul', b, c )), ('flrp', a, b, c), '!options->lower_flrp32'),
@@ -134,6 +132,30 @@ optimizations = [
(('ffma', a, b, c), ('fadd', ('fmul', a, b), c), 'options->lower_ffma'),
(('~fadd', ('fmul', a, b), c), ('ffma', a, b, c), 'options->fuse_ffma'),
+ # flrp(a, b, c) * flrp(d, e, c)
+ # (a(1-c) + bc) * (d(1-c) + ec)
+ #
+ # Since (1-c) is common, it is one operation less than the other
+ # expansion.
+ (('fm