From: Karol Herbst <g...@karolherbst.de>
helps shaders in saints row IV, bioshock infinite and shadow warrior
total instructions in shared programs : 1921966 -> 1910935 (-0.57%)
total gprs used in shared programs: 251863 -> 251728 (-0.05%)
total local used in shared programs :
reduces calls up to 50%
Signed-off-by: Karol Herbst <nouv...@karolherbst.de>
Reviewed-by: Ilia Mirkin <imir...@alum.mit.edu>
---
src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/src/gallium/drivers/nou
From: Karol Herbst <g...@karolherbst.de>
helps shaders in multiple games
total instructions in shared programs : 1925865 -> 1922112 (-0.19%)
total gprs used in shared programs: 251863 -> 251863 (0.00%)
total local used in shared programs : 5673 -> 5673 (0.00%)
total bytes
lgpr inst bytes
helped 0 0 62 62
hurt 0 0 0 0
v2: make the diff more clear and use swapSources
Signed-off-by: Karol Herbst <nouv...@karolherbst.de>
---
src/gallium/drive
From: Karol Herbst <g...@karolherbst.de>
helps shaders in multiple games
total instructions in shared programs : 1910935 -> 1901781 (-0.48%)
total gprs used in shared programs: 251728 -> 251728 (0.00%)
total local used in shared programs : 5673 -> 5673 (0.00%)
total bytes
From: Karol Herbst <g...@karolherbst.de>
total instructions in shared programs : 1895008 -> 1894759 (-0.01%)
total gprs used in shared programs: 251728 -> 251715 (-0.01%)
total local used in shared programs : 5673 -> 5673 (0.00%)
total bytes used in shared programs : 17377
w IV
shadow warrior
talos principle
unigine heaven/valley
wasteland 2
witcher 2
Karol Herbst (7):
nv50/ir: enable PostRaConstantFolding for [c0,f0)
nv50/ir: swap sources in PostRaConstantFolding when src0 is imm
nv50/ir: optimize neg(and(set, 1)) to set
nv50/ir: optimize shl(
: simplified the code
Signed-off-by: Karol Herbst <nouv...@karolherbst.de>
---
.../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 32 ++
1 file changed, 32 insertions(+)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
b/src/gallium/drive
the first patch improves some shaders in some games and the second one
fixes an issue if the optimization passes are rerun
Karol Herbst (2):
nv50/ir: optimize neg(and(set, 1)) to set
nv50/ir: we can't do the add to mad conversion when the mul saturates
.../drivers/nouveau/codegen
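The two commit titles above can be checked with plain arithmetic. The following is only a sketch, not the nv50 IR code: it assumes an integer "set" result is a 32-bit mask (0 for false, 0xffffffff for true) and that a saturate modifier clamps a float to [0, 1]. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: an integer "set" result is 0 (false) or 0xffffffff (true).
constexpr uint32_t kFalse = 0u;
constexpr uint32_t kTrue = 0xffffffffu;

// neg(and(x, 1)), computed with wrap-around unsigned arithmetic.
constexpr uint32_t negAndOne(uint32_t x) { return 0u - (x & 1u); }

// Assumption: saturate clamps a float to the range [0, 1].
constexpr float sat(float v) { return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v); }

// add(sat(mul(a, b)), c): the saturate applies to the product only.
constexpr float addOfSatMul(float a, float b, float c) { return sat(a * b) + c; }

// mad(a, b, c) with no saturate on the product.
constexpr float plainMad(float a, float b, float c) { return a * b + c; }

// Small self-check of both observations.
inline void checkIdentities()
{
   // neg(and(set, 1)) gives back the mask itself for both possible values.
   assert(negAndOne(kTrue) == kTrue);
   assert(negAndOne(kFalse) == kFalse);
   // With a = 2, b = 1, c = -1 the saturating mul changes the result,
   // so folding the add into a plain mad would not be equivalent.
   assert(addOfSatMul(2.0f, 1.0f, -1.0f) != plainMad(2.0f, 1.0f, -1.0f));
}
```

Calling checkIdentities() from a test program exercises both cases. The inputs 2, 1 and -1 are chosen so that every intermediate value is exactly representable as a float.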
---
src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 1d04a6d..7e58328 100644
---
From: Karol Herbst <g...@karolherbst.de>
helps some shaders in multiple games
total instructions in shared programs : 1922267 -> 1922121 (-0.01%)
total gprs used in shared programs: 251878 -> 251878 (0.00%)
total local used in shared programs : 5673 -> 5673 (0.00%)
t
From: Karol Herbst <g...@karolherbst.de>
helps shaders in some games
total instructions in shared programs : 1901958 -> 1895185 (-0.36%)
total gprs used in shared programs: 251739 -> 251739 (0.00%)
total local used in shared programs : 5673 -> 5673 (0.00%)
total bytes
From: Karol Herbst <g...@karolherbst.de>
helps shaders in multiple games
total instructions in shared programs : 1926020 -> 1922267 (-0.19%)
total gprs used in shared programs: 251878 -> 251878 (0.00%)
total local used in shared programs : 5673 -> 5673 (0.00%)
total bytes
From: Karol Herbst <g...@karolherbst.de>
helps shaders in multiple games
total instructions in shared programs : 192 -> 1901958 (-0.48%)
total gprs used in shared programs: 251739 -> 251739 (0.00%)
total local used in shared programs : 5673 -> 5673 (0.00%)
total bytes
From: Karol Herbst <g...@karolherbst.de>
helps shaders in saints row IV, bioshock infinite and shadow warrior
total instructions in shared programs : 1922121 -> 192 (-0.57%)
total gprs used in shared programs: 251878 -> 251739 (-0.06%)
total local used in shared programs :
w IV
shadow warrior
talos principle
unigine heaven/valley
wasteland 2
witcher 2
Karol Herbst (6):
nv50/ir: enable PostRaConstantFolding for [c0,f0)
nv50/ir: swap sources in PostRaConstantFolding when src0 is imm
nv50/ir: optimize neg(add(bool, 1)) to bool for OP_SET and OP_SLCT
nv50/ir: opti
reduces Pass rerun by around 40%
Signed-off-by: Karol Herbst <nouv...@karolherbst.de>
---
src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
b/src/g
From: Karol Herbst <g...@karolherbst.de>
sometimes an application might crash with a message like this:
ERROR: no viable spill candidates left
this is due to a memory corruption which only manifests when there is another RA
round
this fixes this
Signed-off-by: Karol Herbst
> Ilia Mirkin <imir...@alum.mit.edu> wrote on 26 January 2016 at 04:53:
>
> On Mon, Jan 25, 2016 at 9:57 AM, Karol Herbst <nouv...@karolherbst.de> wrote:
> > From: Karol Herbst <g...@karolherbst.de>
> >
> > helps some shaders in mul
3 3
hurt 0 0 0 0
Signed-off-by: Karol Herbst <nouv...@karolherbst.de>
---
src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
Karol Herbst (2):
nv50/ir: add PostRADCE Pass
nv50/ir: optimize sub(a, 0) to a
src/gallium/drivers/nouveau/codegen/nv50_ir.h | 2 +-
.../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 77 ++
2 files changed, 52 insertions(+), 27 deletions(-)
--
2.7.1
ograms : 5569 -> 5569 (0.00%)
total bytes used in shared programs : 16513528 -> 16451848 (-0.37%)
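The percentages in these shader-db summaries appear to be plain relative changes, (after - before) / before. A minimal sketch, with hypothetical helper names, that reproduces the -0.37% figure on the line above:

```cpp
#include <cassert>
#include <cmath>

// Relative change in percent; negative means the value went down.
double percentChange(double before, double after)
{
   return (after - before) / before * 100.0;
}

// Round to two decimals, the precision used in the reports.
double roundTo2(double v)
{
   return std::round(v * 100.0) / 100.0;
}

// Check against the "total bytes" line: 16513528 -> 16451848 (-0.37%).
inline void checkReportedFigure()
{
   double pct = roundTo2(percentChange(16513528.0, 16451848.0));
   assert(std::fabs(pct - (-0.37)) < 1e-9);
}
```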
v2: remove the DCE stuff from NV50PostRaConstantFolding altogether
only run this Pass with NV50_PROG_OPTIMIZE >= 1
Signed-off-by: Karol Herbst <nouv...@karolherbst.de>
---
src/ga
Hi all,
the game "Divinity: Original Sin - Enhanced Edition" uses
ARB_shading_language_include whenever it detects a non catalyst driver on Linux.
Apitraces from the game running on catalyst show that the shaders are simply
included within the game engine and replay fine with all mesa drivers as
=640:
inst_executed: 1.03G
inst_issued1: 614M -> 500M
inst_issued2: 213M -> 271M
score: 1021 -> 1056
Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
.../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 59 ++
1 file changed, 59 insertions(+)
diff --gi
the compiler pass aren't as big as with it.
Karol Herbst (4):
nv50: add target->hasDualIssueing()
nvc0/ir: don't dual issue instructions which depend on each other
nvc0/ir: dual issue two min/max instructions
nv50: add PostRADualIssue Pass
src/gallium/drivers/nouveau/codegen/nv50_ir.
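To illustrate the idea behind "don't dual issue instructions which depend on each other", here is a simplified sketch. It is not the nouveau implementation: the Insn struct and dependsOn() are hypothetical, registers are plain integers, and only read-after-write and write-after-write hazards are considered.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified instruction: one destination, several sources.
struct Insn {
   int def;               // destination register id, -1 if none
   std::vector<int> srcs; // source register ids
};

// True if `second` reads or overwrites the register written by `first`.
// Under the assumption of this sketch, such a pair must not be dual issued.
bool dependsOn(const Insn &second, const Insn &first)
{
   if (first.def < 0)
      return false;
   bool reads = std::find(second.srcs.begin(), second.srcs.end(),
                          first.def) != second.srcs.end();
   bool overwrites = second.def == first.def;
   return reads || overwrites;
}

inline void checkPairs()
{
   Insn a{1, {2, 3}};
   Insn reader{4, {1, 5}};      // reads r1, which a writes
   Insn independent{6, {2, 3}}; // shares only sources with a
   assert(dependsOn(reader, a));
   assert(!dependsOn(independent, a));
}
```

A real pass would also have to account for predicates, memory accesses and the hardware's own pairing rules, none of which are modelled here.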
> 1030
with dual_issue pass:
inst_executed: 1.03G
inst_issued1: 535M -> 500M
inst_issued2: 254M -> 271M
score: 1052 -> 1056
Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
.../drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp| 14 --
1 file changed, 12 in
no changes without a dual_issue pass
changes with for ./GpuTest /test=pixmark_piano /benchmark /no_scorebox /msaa=0
/benchmark_duration_ms=6 /width=1024 /height=640:
inst_executed: 1.03G
inst_issued1: 538M -> 535M
inst_issued2: 251M -> 254M
score: 1038 -> 1052
Signed-off-by: Kar
Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
src/gallium/drivers/nouveau/codegen/nv50_ir_target.h| 1 +
src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp | 7 ++-
src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.h | 1 +
3 files changed, 8 inse
2016-08-13 18:17 GMT+02:00 Ilia Mirkin <imir...@alum.mit.edu>:
> On Sat, Aug 13, 2016 at 6:02 AM, Karol Herbst <karolher...@gmail.com> wrote:
>> no changes without a dual_issue pass
>>
>> changes with for ./GpuTest /test=pixmark_piano /benchmark /no_scorebox
>
2016-08-13 19:27 GMT+02:00 Ilia Mirkin <imir...@alum.mit.edu>:
> On Sat, Aug 13, 2016 at 1:24 PM, karol herbst <karolher...@gmail.com> wrote:
>> 2016-08-13 18:17 GMT+02:00 Ilia Mirkin <imir...@alum.mit.edu>:
>>> On Sat, Aug 13, 2016 at 6:02 AM, Karol
2016-08-13 17:43 GMT+02:00 Tobias Klausmann
<tobias.johannes.klausm...@mni.thm.de>:
> Hi Karol,
>
> one question inline.
>
>
> On 13.08.2016 12:02, Karol Herbst wrote:
>>
>> min/max pairs can be dual issued on Kepler1
>>
>> changes for ./GpuTest /te
2016-08-13 21:33 GMT+02:00 Ilia Mirkin :
> On Sat, Aug 13, 2016 at 3:26 PM, Connor Abbott wrote:
>> So, I don't know much about how nv50 ir works, but to me this just
>> seems like a pretty slow implementation of a very limited instruction
>> scheduler.
slightly improves performance for GpuTest /test=pixmark_piano /benchmark
/no_scorebox /msaa=0 /benchmark_duration_ms=6 /width=1024 /height=640
score: 1031 -> 1033
observed from the binary generated by nvidia
Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
src/galliu
Hey,
nice work regarding the lmsensor bits. But I think it makes sense to
also wire the power things in, cause we actually expose them within
nouveau. Others might want or actually do the same as well.
Many thanks
2016-09-12 20:33 GMT+02:00 Steven Toth :
> Three new
2016-09-12 23:20 GMT+02:00 Steven Toth :
>> nice work regarding the lmsensor bits. But I think it makes sense to
>> also wire the power things in, cause we actually expose them within
>> nouveau. Others might want or actually do the same as well.
>
> Karol, thank you for your
well it won't for your GPU, it is currently Fermi (GF100+) only.
I guess I will add support for it later then
2016-09-13 13:14 GMT+02:00 Steven Toth :
>>> Ahh, my nouveau card must be too old then. I only get temperature from
>>> it. I have a 6yo(?) 8800 GTS. That being
2016-09-13 0:15 GMT+02:00 Steven Toth :
>> I think you expose Temperature, Voltage and Current. But Nouveau exposes
>> Temperature, Voltage, Fan and Power through hwmon.
>>
>> Read the "power" section here for more info:
>>
25837792 -> 25837192 (-0.00%)
        local  gpr  inst  bytes
helped      0    0    33     33
hurt        0    0     0      0
Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
src/gallium/drive
t way you won't break things and mupuf will appreciate. :)
>
I think you read the patches in the wrong order. The two first patches
are the changes in the emitter.
> On 10/08/2016 05:43 PM, Karol Herbst wrote:
>>
>> Signed-off-by: Karol Herbst <karolher...@gmail.c
we might want to add more folding passes here, so make it a bit more generic
Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
.../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 124 ++---
1 file changed, 62 insertions(+), 62 deletions(-)
diff --git a/src/gallium/d
just little random noise in shader-db
will help in the next patch
Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_
This series reworks the structure of the pass to make it easier to add
more optimisations to it.
Also implements folding for mad on gf100+ ISAs to reduce instruction count
by ~0.37%
I can only test it on a gk106 for now.
Karol Herbst (6):
nv50/ir: add LIMM form of mad to gk110
nv50/ir: add
Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
.../drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 49 ++
1 file changed, 32 insertions(+), 17 deletions(-)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
b/src/gallium/drivers/nouveau/c
Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
src/gallium/drivers/nouveau/codegen/nv50_ir.h| 2 +-
.../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 20 +++-
2 files changed, 8 insertions(+), 14 deletions(-)
diff --git a/src/gallium/drivers/nouveau/c
Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
.../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 32 --
1 file changed, 23 insertions(+), 9 deletions(-)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
b/src/gallium/drivers/nouveau/c
odeEmitterGM107::emitIMMD and it indeed does some magic there.
>
> On Sat, Oct 8, 2016 at 3:23 PM, Karol Herbst <karolher...@gmail.com> wrote:
>> the emit code uses 19 everywhere, so we should let
>> CodeEmitterGM107::longIMMD and TargetNVC0::insnCanLoad check against
>>
looks great, a few comments below
2016-10-08 21:55 GMT+02:00 Samuel Pitoiset :
> total instructions in shared programs :2286901 -> 2284473 (-0.11%)
> total gprs used in shared programs:335256 -> 335273 (0.01%)
> total local used in shared programs :31968 -> 31968
the emit code uses 19 everywhere, so we should let
CodeEmitterGM107::longIMMD and TargetNVC0::insnCanLoad check against
this too
Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 6 +++---
src/gallium/drivers/nouveau/c
2016-10-08 18:12 GMT+02:00 Samuel Pitoiset <samuel.pitoi...@gmail.com>:
> Usually we prefix with gm107/ir, gk110/ir, etc...
>
> More comments below.
>
> On 10/08/2016 05:43 PM, Karol Herbst wrote:
>>
>> Signed-off-by: Karol Herbst <karolher...@gmail.com>
2016-10-08 18:39 GMT+02:00 Samuel Pitoiset <samuel.pitoi...@gmail.com>:
>
>
> On 10/08/2016 05:43 PM, Karol Herbst wrote:
>>
>> Signed-off-by: Karol Herbst <karolher...@gmail.com>
>> ---
>> src/gallium/drivers/nouveau/codegen/nv50_ir.h
v2: renamed commit
reordered modifiers
add assert(dst == src2)
Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
.../drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 50 ++
1 file changed, 33 insertions(+), 17 deletions(-)
diff --git a/src/gallium/d
Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
src/gallium/drivers/nouveau/codegen/nv50_ir.h| 2 +-
.../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 20 +++-
2 files changed, 8 insertions(+), 14 deletions(-)
diff --git a/src/gallium/drivers/nouveau/c
t bytes
helped 0 26 4093 4093
hurt 0 20 61 61
Karol Herbst (6):
gk110/ir: add LIMM form of mad
gm107/ir: add LIMM form of mad
nv50/ir: replace post_ra_dead by Instruction::isDead
nv50/ir:
we might want to add more folding passes here, so make it a bit more generic
v2: leave the comment and reword commit message
Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
.../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 120 +++--
1 file changed, 62 inse
> 25743616 (-0.12%)
        local  gpr  inst  bytes
helped      0   26  1736   1736
hurt        0   20    78     78
v2: reorder to show the benefit of this patch
Signed-off-by: Karol Herbst <karolher...@gmail.com>
0 0 0
v2: removed TODO
reordered to show changes without RA modification
removed stale debugging print() call
Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
.../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 64 +++---
1 file changed
v2: renamed commit
reordered modifiers
add assert(dst == src2)
Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
.../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 35 --
1 file changed, 26 insertions(+), 9 deletions(-)
diff --git a/src/gallium/d
lgpr inst bytes
helped 0 25 100 100
hurt 0 0 0 0
Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 2 +-
1 file changed,
fixes a crash in the case simplify reports an error
Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp | 12 +++-
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.
2016-09-30 16:57 GMT+02:00 Ian Romanick :
> On 09/30/2016 06:23 AM, Brian Paul wrote:
>> On 09/30/2016 04:59 AM, Emil Velikov wrote:
>>> On 30 September 2016 at 03:31, Timothy Arceri
>>> wrote:
On Thu, 2016-09-29 at 19:17 -0700, Jason
2016-10-09 13:58 GMT+02:00 Samuel Pitoiset <samuel.pitoi...@gmail.com>:
>
>
> On 10/08/2016 10:04 PM, Karol Herbst wrote:
>>
>> looks great, a few comments below
>
>
> Thanks!
>
>>
>> 2016-10-08 21:55 GMT+02:00 Samuel Pitoiset <samuel.pit
2016-10-26 19:20 GMT+02:00 Samuel Pitoiset <samuel.pitoi...@gmail.com>:
>
>
> On 10/09/2016 11:04 AM, Karol Herbst wrote:
>>
>> v2: renamed commit
>> reordered modifiers
>> add assert(dst == src2)
>>
>> Signed-off-by: Karol Herbst <
we might want to add more folding passes here, so make it a bit more generic
v2: leave the comment and reword commit message
v4: rename it to PostRaLoadPropagation
Signed-off-by: Karol Herbst <karolher...@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>
---
This series reworks the structure of the pass to make it easier to add
more optimisations to it.
I have to rework the RA commit a bit and the post_ra_dead patch should be
submitted on its own.
v2: swapped the last two commits
v3: reworked order
v4: dropped last two patches
Karol Herbst (4
v2: renamed commit
reordered modifiers
add assert(dst == src2)
v3: reordered modifiers again
Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
.../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 35 --
.../drivers/nouveau/codegen/nv50_ir_peephole.cpp
v2: renamed commit
reordered modifiers
add assert(dst == src2)
v3: removed wrong neg mod emission
Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
.../drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 50 ++
.../drivers/nouveau/codegen/nv50_ir_peepho
0 0 0 0
v2: removed TODO
reordered to show changes without RA modification
removed stale debugging print() call
v3: remove predicate checks
enable only for gf100 ISA
Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
.../drivers/nouveau/codegen/nv50_i
Subject: [PATCH v5] gm107/ir: add LIMM form of mad
v2: renamed commit
reordered modifiers
add assert(dst == src2)
v3: reordered modifiers again
v5: no rounding bit for limms
Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
.../drivers/nouveau/codegen/nv50_ir_emit_gm107.cp
no regressions in piglit on my nve6
2016-11-06 15:05 GMT+01:00 Karol Herbst <karolher...@gmail.com>:
> This series reworks the structure of the pass to make it easier to add
> more optimisations to it.
>
> I have to rework the RA commit a bit and the post_ra_dead patch sho
2016-11-08 13:35 GMT+01:00 Juan A. Suarez Romero <jasua...@igalia.com>:
> On Sat, 2016-11-05 at 10:48 +0100, Karol Herbst wrote:
>> "#version 0512": 0:1(10): error: GLSL 3.30 is not supported.
>> Supported
>> versions are: 1.10, 1.20, 1.30, 1.00 ES, and 3.00
2016-11-05 2:50 GMT+01:00 Ian Romanick :
> (Sorry about the top post. Sent from my phone.)
>
> That expression will allow versions like 0130 as valid. If you just want to
> allow 0, you need a more complex regular expression. I feel like that's
> just a bandage... what
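The point about leading zeros can be shown with two patterns. This is only an illustration: the expression actually under review is not quoted in this snippet, so both patterns below are assumptions, and std::regex is used here purely for demonstration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical loose pattern: any run of digits, so "0130" is accepted.
bool looseVersion(const std::string &s)
{
   static const std::regex re("[0-9]+");
   return std::regex_match(s, re);
}

// Hypothetical stricter pattern: a lone "0", or a number without a leading zero.
bool strictVersion(const std::string &s)
{
   static const std::regex re("0|[1-9][0-9]*");
   return std::regex_match(s, re);
}

inline void checkVersions()
{
   assert(looseVersion("0130"));   // leading zero slips through
   assert(!strictVersion("0130")); // rejected by the stricter pattern
   assert(strictVersion("0"));     // a bare 0 is still accepted
   assert(strictVersion("130"));
}
```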
2016-11-07 10:05 GMT+01:00 Juan A. Suarez Romero <jasua...@igalia.com>:
> On Sat, 2016-11-05 at 10:48 +0100, Karol Herbst wrote:
>> 2016-11-05 2:50 GMT+01:00 Ian Romanick <i...@freedesktop.org>:
>> > (Sorry about the top post. Sent from my phone.)
>> >
>>
01%)
>>> total local used in shared programs : 9505 -> 9505 (0.00%)
>>> total bytes used in shared programs : 25837192 -> 25833736 (-0.01%)
>>>
>>> local gpr inst bytes
>>> helped 0 25
On 21 October 2016 8:30:33 a.m. GMT+02:00, Ilia Mirkin
wrote:
>Signed-off-by: Ilia Mirkin
>---
>.../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 23
>++
> 1 file changed, 19 insertions(+), 4 deletions(-)
>
>diff --git
2016-10-30 23:45 GMT+01:00 Matt Turner <matts...@gmail.com>:
> On Sun, Oct 30, 2016 at 2:20 PM, Karol Herbst <karolher...@gmail.com> wrote:
>> Signed-off-by: Karol Herbst <karolher...@gmail.com>
>>
>> fixup
>>
>> Signed-off-by: Karol Herbst <
Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
src/gallium/drivers/nouveau/codegen/nv50_ir.h | 2 +-
.../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 24 --
2 files changed, 9 insertions(+), 17 deletions(-)
diff --git a/src/gallium/drivers/nouveau/c
0 0 0 0
v2: removed TODO
reordered to show changes without RA modification
removed stale debugging print() call
v3: remove predicate checks
enable only for gf100 ISA
Signed-off-by: Karol Herbst <karolher...@gmail.com>
fixup
Signed-off-by: Karol Herbst <
we might want to add more folding passes here, so make it a bit more generic
v2: leave the comment and reword commit message
Signed-off-by: Karol Herbst <karolher...@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>
---
.../drivers/nouveau/codegen/nv50_ir_peephole
v2: renamed commit
reordered modifiers
add assert(dst == src2)
v3: reordered modifiers again
Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
.../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 35 --
.../drivers/nouveau/codegen/nv50_ir_peephole.cpp
4591
hurt 0 23 64 64
Karol Herbst (6):
nv50/ir: restructure postraconstantfolding pass
nv50/ir: implement mad post ra folding for nvc0+
nv50/ra: always prefer def == src2 for mad/sad
gk110/ir: add LIMM form of mad
gm107/ir: add LIMM form of m
        local  gpr  inst  bytes
helped      0   26  1937   1937
hurt        0   23    81     81
v2: reorder to show the benefit of this patch
Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
src/gallium/drivers/nouveau/code
v2: renamed commit
reordered modifiers
add assert(dst == src2)
v3: removed wrong neg mod emission
Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
.../drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 50 ++
.../drivers/nouveau/codegen/nv50_ir_peepho
for reference the bug I've created for this:
https://bugs.freedesktop.org/show_bug.cgi?id=97420
and thanks for fixing this
2016-11-04 13:22 GMT+01:00 Juan A. Suarez Romero :
> Shader can define #version as an integer, including 0.
>
> Initializes version to -1 to know later
2016-10-09 21:34 GMT+02:00 Ilia Mirkin <imir...@alum.mit.edu>:
> On Sun, Oct 9, 2016 at 3:28 PM, Karol Herbst <karolher...@gmail.com> wrote:
>> 2016-10-09 13:58 GMT+02:00 Samuel Pitoiset <samuel.pitoi...@gmail.com>:
>>>
>>>
>>> On 10/08/20
that game still depends on ARB_shading_language_include and it checks
for that extension by checking if the function pointers are there. One
hacky solution is this:
diff --git a/src/glx/glxcmds.c b/src/glx/glxcmds.c
index 63f4921..e1ab885 100644
--- a/src/glx/glxcmds.c
+++ b/src/glx/glxcmds.c
@@
ytes used in shared programs : 36123344 -> 36115776 (-0.02%)
        local  gpr  inst  bytes
helped      2   48   243    243
hurt        2    3    32     32
Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
s
ams: 481563 -> 481511 (-0.01%)
total local used in shared programs : 27469 -> 27469 (0.00%)
total bytes used in shared programs : 36139384 -> 36123344 (-0.04%)
Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp | 4 ++
v2: renamed commit
reordered modifiers
add assert(dst == src2)
v3: reordered modifiers again
v5: no rounding bit for limms
Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
.../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 34 --
.../drivers/nouveau/c
0 0 0
v2: removed TODO
reordered to show changes without RA modification
removed stale debugging print() call
v3: remove predicate checks
enable only for gf100 ISA
Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
.../drivers/nouveau/codegen/nv50_i
was "nv50/ir: PostRaConstantFolding improvements" before.
nothing really changed from the last version, just minor things.
Karol Herbst (5):
nv50/ir: restructure and rename postraconstantfolding pass
nv50/ir: implement mad post ra folding for nvc0+
gk110/ir: add LIMM form of mad
we might want to add more folding passes here, so make it a bit more generic
v2: leave the comment and reword commit message
v4: rename it to PostRaLoadPropagation
Signed-off-by: Karol Herbst <karolher...@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>
---
in shared programs : 36061888 -> 36056504 (-0.01%)
        local  gpr  inst  bytes
helped      0    0   228    228
hurt        0    0     0      0
Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
src/galli
v2: renamed commit
reordered modifiers
add assert(dst == src2)
v3: removed wrong neg mod emission
Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
.../drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 50 ++
.../drivers/nouveau/codegen/nv50_ir_peepho
35749888 -> 35214176 (-1.50%)
local gpr inst bytes
helped 17182940914091
hurt 4 44 3 3
Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
src/gallium/drive
Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 6 ++
1 file changed, 6 insertions(+)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
We are slowly getting to the point where we miss enough optimization
opportunities as a result of our own passes.
For this we need to fix AlgebraicOpt to be able to handle mods on sources
without creating new issues.
The last patch enables looping opts.
v2: update commit author
Karol Herbst
Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
.../drivers/nouveau/codegen/nv50_ir_peephole.cpp| 17 +++--
1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
b/src/gallium/drivers/nouveau/codegen/nv
Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
b/src/gallium/drivers/nouveau/c
From: Karol Herbst <nouv...@karolherbst.de>
With the shader cache, compilation time matters less.
As a side effect we can write more optimizations to produce better optimized
code.
total instructions in shared programs : 3931743 -> 3917512 (-0.36%)
total gprs used in shared programs
From: Karol Herbst <nouv...@karolherbst.de>
Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephol