Re: [Nouveau] [PATCH v2] nvc0/ir: propagate immediates to CALL input MOVs

2017-08-16 Thread Tobias Klausmann
ping on this v2

On 8/13/17 3:02 AM, Tobias Klausmann wrote:
> On using builtin functions we have to move the input to registers $0 and $1, 
> if
> one of the input value is an immediate, we fail to propagate the immediate:
>
> ...
> mov u32 $r477 0x0003 (0)
> ...
> mov u32 $r0 %r473 (0)
> mov u32 $r1 $r477 (0)
> call abs BUILTIN:0 (0)
> mov u32 %r495 $r1 (0)
> ...
>
> With this patch the immediate is propagated, potentially causing the first MOV
> to be superfluous, which we'd remove in that case:
>
> ...
>
> mov u32 $r0 %r473 (0)
> mov u32 $r1 0x0003 (0)
> call abs BUILTIN:0 (0)
> mov u32 %r495 $r1 (0)
> ...
>
> Shaderdb stats:
> total instructions in shared programs : 4893460 -> 4893324 (-0.00%)
> total gprs used in shared programs: 582972 -> 582881 (-0.02%)
> total local used in shared programs   : 17960 -> 17960 (0.00%)
>
> localgpr   inst  bytes
> helped   0  91 112 112
>   hurt   0   0   0   0
>
> v2:
>  implement some changes proposed by imirkin, the manual deletion of the dead
>  mov is necessary after ea22ac23e0 ("nvc0/ir: unlink values pre- and post-call
>  to division function") as the potentially dead mov is unlinked properly,
>  causing later passes to not notice the mov op at all and thus not cleaning it
>  up. That makes up a big chunk of the regression the above commit caused.
>  Keep the deletion of the op where it is, deleting it later unnecessarily 
> blows
>  up size of the change.
>
> Signed-off-by: Tobias Klausmann 
> ---
>  .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp   | 21 
> +++--
>  1 file changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
> index c8f0701572..7243b1d2e4 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
> @@ -47,8 +47,25 @@ NVC0LegalizeSSA::handleDIV(Instruction *i)
> int builtin;
>  
> bld.setPosition(i, false);
> -   bld.mkMovToReg(0, i->getSrc(0));
> -   bld.mkMovToReg(1, i->getSrc(1));
> +
> +   // Generate movs to the input regs for the call we want to generate
> +   for (int s = 0; i->srcExists(s); ++s) {
> +  Instruction *ld = i->getSrc(s)->getInsn();
> +  assert(ld->getSrc(0) != NULL);
> +  // check if we are moving an immediate, propagate it in that case
> +  if (!ld || ld->fixed || (ld->op != OP_LOAD && ld->op != OP_MOV) ||
> +!(ld->src(0).getFile() == FILE_IMMEDIATE))
> + bld.mkMovToReg(s, i->getSrc(s));
> +  else {
> + bld.mkMovToReg(s, ld->getSrc(0));
> + // Clear the src, to make code elimination possible here before we
> + // delete the instruction i later
> + i->setSrc(s, NULL);
> + if (ld->isDead())
> +delete_Instruction(prog, ld);
> +  }
> +   }
> +
> switch (i->dType) {
> case TYPE_U32: builtin = NVC0_BUILTIN_DIV_U32; break;
> case TYPE_S32: builtin = NVC0_BUILTIN_DIV_S32; break;
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


[Nouveau] [PATCH v2] nvc0/ir: propagate immediates to CALL input MOVs

2017-08-12 Thread Tobias Klausmann
On using builtin functions we have to move the input to registers $0 and $1, if
one of the input value is an immediate, we fail to propagate the immediate:

...
mov u32 $r477 0x0003 (0)
...
mov u32 $r0 %r473 (0)
mov u32 $r1 $r477 (0)
call abs BUILTIN:0 (0)
mov u32 %r495 $r1 (0)
...

With this patch the immediate is propagated, potentially causing the first MOV
to be superfluous, which we'd remove in that case:

...

mov u32 $r0 %r473 (0)
mov u32 $r1 0x0003 (0)
call abs BUILTIN:0 (0)
mov u32 %r495 $r1 (0)
...

Shaderdb stats:
total instructions in shared programs : 4893460 -> 4893324 (-0.00%)
total gprs used in shared programs: 582972 -> 582881 (-0.02%)
total local used in shared programs   : 17960 -> 17960 (0.00%)

localgpr   inst  bytes
helped   0  91 112 112
  hurt   0   0   0   0

v2:
 implement some changes proposed by imirkin, the manual deletion of the dead
 mov is necessary after ea22ac23e0 ("nvc0/ir: unlink values pre- and post-call
 to division function") as the potentially dead mov is unlinked properly,
 causing later passes to not notice the mov op at all and thus not cleaning it
 up. That makes up a big chunk of the regression the above commit caused.
 Keep the deletion of the op where it is, deleting it later unnecessarily blows
 up size of the change.

Signed-off-by: Tobias Klausmann 
---
 .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp   | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
index c8f0701572..7243b1d2e4 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
@@ -47,8 +47,25 @@ NVC0LegalizeSSA::handleDIV(Instruction *i)
int builtin;
 
bld.setPosition(i, false);
-   bld.mkMovToReg(0, i->getSrc(0));
-   bld.mkMovToReg(1, i->getSrc(1));
+
+   // Generate movs to the input regs for the call we want to generate
+   for (int s = 0; i->srcExists(s); ++s) {
+  Instruction *ld = i->getSrc(s)->getInsn();
+  assert(ld->getSrc(0) != NULL);
+  // check if we are moving an immediate, propagate it in that case
+  if (!ld || ld->fixed || (ld->op != OP_LOAD && ld->op != OP_MOV) ||
+!(ld->src(0).getFile() == FILE_IMMEDIATE))
+ bld.mkMovToReg(s, i->getSrc(s));
+  else {
+ bld.mkMovToReg(s, ld->getSrc(0));
+ // Clear the src, to make code elimination possible here before we
+ // delete the instruction i later
+ i->setSrc(s, NULL);
+ if (ld->isDead())
+delete_Instruction(prog, ld);
+  }
+   }
+
switch (i->dType) {
case TYPE_U32: builtin = NVC0_BUILTIN_DIV_U32; break;
case TYPE_S32: builtin = NVC0_BUILTIN_DIV_S32; break;
-- 
2.14.0

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau