Re: [Mesa-dev] [PATCH v2] nv50/ir: make a copy of tex src if it's referenced multiple times

2018-04-10 Thread Ilia Mirkin
On Tue, Apr 10, 2018 at 6:08 AM, Karol Herbst  wrote:
> I guess this fixes a bug somewhere?

Yeah... I describe it in the commit description, I thought. Here's the
situation:

%r1 = 5
%r2 = texsize %r1
%r3 = texsize %r1

Now, let's not worry about why those didn't get CSE'd. (Let's say they
refer to different textures, but query the same LOD.)

With the current code, r1, r2, and r3 all get joined to a single RIG
node with coalesceValues() which happens as part of the whole
JOIN_MASK_TEX thing (to make sure that src == dst regs for a tex op,
since nothing else can be encoded for nv50).

This is obviously bad - no way to make that RA happen -- the assigned
reg (to the RIG node) will overwrite the %r1 value after the first
texsize, and the second size will get a bogus LOD input in addition to
then also overwriting the result of the first texsize.

This is basically the same problem as a merge, and we get out of it
the same way as a merge -- adding extra copies. I refactored that
constraint code, although in hindsight perhaps I should have left it
alone and just pushed the tex onto the constrList and treat it like a
MERGE. I can go redo it that way too.

>
> On Tue, Apr 10, 2018 at 6:11 AM, Ilia Mirkin  wrote:
>> For nv50 we coalesce the srcs and defs into a single node. As such, we
>> can end up with impossible constraints if the source is referenced
>> after the tex operation (which, due to the coalescing of values, will
>> have overwritten it).
>>
>> This logic already exists for inserting moves for MERGE/UNION sources.
>> It's the exact same idea here, so leverage that code, which also
>> includes a few optimizations around not extending live ranges
>> unnecessarily.
>>
>> Signed-off-by: Ilia Mirkin 
>> ---
>>
>> v1 -> v2: make use of existing logic in insertConstraintMoves
>>
>>  src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp | 86 
>> --
>>  1 file changed, 49 insertions(+), 37 deletions(-)
>>
>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp 
>> b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
>> index 3a0e56e1385..7d107aca68d 100644
>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
>> @@ -257,6 +257,7 @@ private:
>> private:
>>virtual bool visit(BasicBlock *);
>>
>> +  void insertConstraintMove(Instruction *, int s);
>>bool insertConstraintMoves();
>>
>>void condenseDefs(Instruction *);
>> @@ -2216,6 +2217,8 @@ 
>> RegAlloc::InsertConstraintsPass::texConstraintNV50(TexInstruction *tex)
>> for (c = 0; tex->srcExists(c) || tex->defExists(c); ++c) {
>>if (!tex->srcExists(c))
>>   tex->setSrc(c, new_LValue(func, tex->getSrc(0)->asLValue()));
>> +  else
>> + insertConstraintMove(tex, c);
>>if (!tex->defExists(c))
>>   tex->setDef(c, new_LValue(func, tex->getDef(0)->asLValue()));
>> }
>> @@ -2288,6 +2291,51 @@ RegAlloc::InsertConstraintsPass::visit(BasicBlock *bb)
>> return true;
>>  }
>>
>> +void
>> +RegAlloc::InsertConstraintsPass::insertConstraintMove(Instruction *cst, int 
>> s)
>> +{
>> +   const uint8_t size = cst->src(s).getSize();
>> +
>> +   assert(cst->getSrc(s)->defs.size() == 1); // still SSA
>> +
>> +   Instruction *defi = cst->getSrc(s)->defs.front()->getInsn();
>> +   bool imm = defi->op == OP_MOV &&
>> +  defi->src(0).getFile() == FILE_IMMEDIATE;
>> +   bool load = defi->op == OP_LOAD &&
>> +  defi->src(0).getFile() == FILE_MEMORY_CONST &&
>> +  !defi->src(0).isIndirect(0);
>> +   // catch some cases where don't really need MOVs
>> +   if (cst->getSrc(s)->refCount() == 1 && !defi->constrainedDefs()) {
>> +  if (imm || load) {
>> + // Move the defi right before the cst. No point in expanding
>> + // the range.
>> + defi->bb->remove(defi);
>> + cst->bb->insertBefore(cst, defi);
>> +  }
>> +  return;
>> +   }
>> +
>> +   LValue *lval = new_LValue(func, cst->src(s).getFile());
>> +   lval->reg.size = size;
>> +
>> +   Instruction *mov = new_Instruction(func, OP_MOV, typeOfSize(size));
>> +   mov->setDef(0, lval);
>> +   mov->setSrc(0, cst->getSrc(s));
>> +
>> +   if (load) {
>> +  mov->op = OP_LOAD;
>> +  mov->setSrc(0, defi->getSrc(0));
>> +   } else if (imm) {
>> +  mov->setSrc(0, defi->getSrc(0));
>> +   }
>> +
>> +   if (defi->getPredicate())
>> +  mov->setPredicate(defi->cc, defi->getPredicate());
>> +
>> +   cst->setSrc(s, mov->getDef(0));
>> +   cst->bb->insertBefore(cst, mov);
>> +}
>> +
>>  // Insert extra moves so that, if multiple register constraints on a value 
>> are
>>  // in conflict, these conflicts can be resolved.
>>  bool
>> @@ -2328,46 +2376,10 @@ 
>> RegAlloc::InsertConstraintsPass::insertConstraintMoves()
>> cst->bb->insertBefore(cst, mov);
>> continue;
>>  }
>> -

Re: [Mesa-dev] [PATCH v2] nv50/ir: make a copy of tex src if it's referenced multiple times

2018-04-10 Thread Karol Herbst
I guess this fixes a bug somewhere?

On Tue, Apr 10, 2018 at 6:11 AM, Ilia Mirkin  wrote:
> For nv50 we coalesce the srcs and defs into a single node. As such, we
> can end up with impossible constraints if the source is referenced
> after the tex operation (which, due to the coalescing of values, will
> have overwritten it).
>
> This logic already exists for inserting moves for MERGE/UNION sources.
> It's the exact same idea here, so leverage that code, which also
> includes a few optimizations around not extending live ranges
> unnecessarily.
>
> Signed-off-by: Ilia Mirkin 
> ---
>
> v1 -> v2: make use of existing logic in insertConstraintMoves
>
>  src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp | 86 
> --
>  1 file changed, 49 insertions(+), 37 deletions(-)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
> index 3a0e56e1385..7d107aca68d 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
> @@ -257,6 +257,7 @@ private:
> private:
>virtual bool visit(BasicBlock *);
>
> +  void insertConstraintMove(Instruction *, int s);
>bool insertConstraintMoves();
>
>void condenseDefs(Instruction *);
> @@ -2216,6 +2217,8 @@ 
> RegAlloc::InsertConstraintsPass::texConstraintNV50(TexInstruction *tex)
> for (c = 0; tex->srcExists(c) || tex->defExists(c); ++c) {
>if (!tex->srcExists(c))
>   tex->setSrc(c, new_LValue(func, tex->getSrc(0)->asLValue()));
> +  else
> + insertConstraintMove(tex, c);
>if (!tex->defExists(c))
>   tex->setDef(c, new_LValue(func, tex->getDef(0)->asLValue()));
> }
> @@ -2288,6 +2291,51 @@ RegAlloc::InsertConstraintsPass::visit(BasicBlock *bb)
> return true;
>  }
>
> +void
> +RegAlloc::InsertConstraintsPass::insertConstraintMove(Instruction *cst, int 
> s)
> +{
> +   const uint8_t size = cst->src(s).getSize();
> +
> +   assert(cst->getSrc(s)->defs.size() == 1); // still SSA
> +
> +   Instruction *defi = cst->getSrc(s)->defs.front()->getInsn();
> +   bool imm = defi->op == OP_MOV &&
> +  defi->src(0).getFile() == FILE_IMMEDIATE;
> +   bool load = defi->op == OP_LOAD &&
> +  defi->src(0).getFile() == FILE_MEMORY_CONST &&
> +  !defi->src(0).isIndirect(0);
> +   // catch some cases where don't really need MOVs
> +   if (cst->getSrc(s)->refCount() == 1 && !defi->constrainedDefs()) {
> +  if (imm || load) {
> + // Move the defi right before the cst. No point in expanding
> + // the range.
> + defi->bb->remove(defi);
> + cst->bb->insertBefore(cst, defi);
> +  }
> +  return;
> +   }
> +
> +   LValue *lval = new_LValue(func, cst->src(s).getFile());
> +   lval->reg.size = size;
> +
> +   Instruction *mov = new_Instruction(func, OP_MOV, typeOfSize(size));
> +   mov->setDef(0, lval);
> +   mov->setSrc(0, cst->getSrc(s));
> +
> +   if (load) {
> +  mov->op = OP_LOAD;
> +  mov->setSrc(0, defi->getSrc(0));
> +   } else if (imm) {
> +  mov->setSrc(0, defi->getSrc(0));
> +   }
> +
> +   if (defi->getPredicate())
> +  mov->setPredicate(defi->cc, defi->getPredicate());
> +
> +   cst->setSrc(s, mov->getDef(0));
> +   cst->bb->insertBefore(cst, mov);
> +}
> +
>  // Insert extra moves so that, if multiple register constraints on a value 
> are
>  // in conflict, these conflicts can be resolved.
>  bool
> @@ -2328,46 +2376,10 @@ 
> RegAlloc::InsertConstraintsPass::insertConstraintMoves()
> cst->bb->insertBefore(cst, mov);
> continue;
>  }
> -assert(cst->getSrc(s)->defs.size() == 1); // still SSA
> -
> -Instruction *defi = cst->getSrc(s)->defs.front()->getInsn();
> -bool imm = defi->op == OP_MOV &&
> -   defi->src(0).getFile() == FILE_IMMEDIATE;
> -bool load = defi->op == OP_LOAD &&
> -   defi->src(0).getFile() == FILE_MEMORY_CONST &&
> -   !defi->src(0).isIndirect(0);
> -// catch some cases where don't really need MOVs
> -if (cst->getSrc(s)->refCount() == 1 && !defi->constrainedDefs()) 
> {
> -   if (imm || load) {
> -  // Move the defi right before the cst. No point in 
> expanding
> -  // the range.
> -  defi->bb->remove(defi);
> -  cst->bb->insertBefore(cst, defi);
> -   }
> -   continue;
> -}
>
> -LValue *lval = new_LValue(func, cst->src(s).getFile());
> -lval->reg.size = size;
> -
> -mov = new_Instruction(func, OP_MOV, typeOfSize(size));
> -mov->setDef(0, lval);
> -mov->setSrc(0, cst->getSrc(s));
> -
> -if (load) {
> -   mov->op = OP_LOAD;
> -   mov->setSrc(0, defi->getSrc(0));
> -}