Re: [Mesa-dev] [PATCH 1/4] i965: move subreg_offset to backend_reg

2016-08-25 Thread Iago Toral
On Wed, 2016-08-24 at 18:51 -0700, Francisco Jerez wrote:
> Iago Toral  writes:
> 
> > 
> > On Tue, 2016-08-23 at 12:58 -0700, Francisco Jerez wrote:
> > > 
> > > Iago Toral Quiroga  writes:
> > > 
> > > > 
> > > > 
> > > > So we can access it in the vec4 backend to handle byte offsets
> > > > into
> > > > registers.
> > > This change has deep implications in the meaning of the vec4
> > > register
> > > objects because the representation of register offsets is now
> > > split
> > > between 'reg_offset' and 'subreg_offset', and there are a *lot*
> > > of
> > > places that directly rely on the current representation of
> > > register
> > > offsets, just grep the VEC4 back-end for 'reg_offset' -- Every
> > > occurrence is now suspect of being inaccurate because the offset
> > > of a
> > > register is no longer guaranteed to be 32B-aligned, so comparing
> > > reg_offsets alone to find out whether two registers are
> > > equivalent or
> > > whether they overlap is no longer sufficient.
> > > 
> > > Do you *really* need to make this representation change for FP64?
> > No, in fact the fp64 series I sent did not need this at all.
> > 
> Right, but instead you used the subnr field at the IR level which
> suffers from pretty much the same problem.

Right. I think the reason this little series did not show any
regressions in piglit for me is that right now there is nothing in the
backend that uses offsets that are not 32b-aligned so this change does
not really affect anything except fp64 code, and even then it only
comes into effect when we split DF instructions and they have any 32-
bit arguments that are not uniforms.

> > mentioned that we should use a horiz_offset() helper that worked in
> > terms of channels and I assumed that also meant that you wanted
> > offset() to work in similar fashion, like we have in the FS
> > backend. I
> > am actually relieved to see that you did not mean this :)
> > 
> > That said, to implement horiz_offset() we need to use something
> > like
> > subnr or subreg_offset, right?
> Not necessarily, it depends on how much granularity you need the
> offset
> helpers to have, if 32B units is sufficient for FP64 I wouldn't
> bother
> to make this change yet (you can just assert that the calculated
> increment is a multiple of 32B).

That would be sufficient only for DF operands but if we are splitting a
DF instruction that uses a 32-bit source then we need a 16-byte offset
:-/

>   My point was that by using
> horiz_offset() in the SIMD lowering pass you could trim the pass down
> and avoid building this assumption into it -- The pass will only be
> limited by the underlying representation of register offsets and will
> automatically be able to handle lowering a larger set of register
> regions as soon as we change the representation without any
> additional
> effort.

Ok, yeah, that sounds reasonable.

> > 
> > so right after the simd splitting pass runs we are again in the
> > situation where offsets are split between two fields. You mentioned
> > in
> > another e-mail that we should probably postpone simd splitting
> > after
> > the main opt loop, so I guess that should remove the problem for
> > the
> > most part...
> > 
> I didn't mean to say you need to change that :), I just sent you a
> few
> questions asking how you were planning to handle some corner cases if
> you do it before optimization. 


Oh, alright :). So, replying to that question: for now we rely on the
scalarization pass that runs right before codegen to make sure that we
always produce correct code by fully scalarizing anything that isn't
natively supported. This version of the series scalarizes everything
without exception, but my work-in-progress v2 lets supported natively
swizzles combinations (that is, anything that does not require to split
instructions by components to produce valid swizzle combinations)
through without scalarization.

Going forward I was thinking that when we want to deal with splitting
instructions to produce valid swizzle combinations that should not be
handled by the simd splitting pass, because that is not a simd
splitting, it is a component/swizzle splitting. I thought that we might
want to do that in the scalarization pass itself right before codegen
and not run any opt passes after it. That pass would not need to do
writes to temporaries.

If we were to run opt passes after that component splitting then there
is another possible problem: that we end up undoing the splitting :-/

>  That said I doubt that doing SIMD
> lowering after the optimization loop would substantially change the
> situation here, even if you do it after the optimization loop we're
> likely to notice eventually that some optimization pass (e.g. copy
> propagation) is helpful after SIMD lowering (have a look at the
> optimization passes run after SIMD lowering in
> fs_visitor::optimize()),
> and you won't be able to run them unless you make sure that the SIMD
> lowering pass 

Re: [Mesa-dev] [PATCH 1/4] i965: move subreg_offset to backend_reg

2016-08-24 Thread Francisco Jerez
Iago Toral  writes:

> On Tue, 2016-08-23 at 12:58 -0700, Francisco Jerez wrote:
>> Iago Toral Quiroga  writes:
>> 
>> > 
>> > So we can access it in the vec4 backend to handle byte offsets into
>> > registers.
>> This change has deep implications in the meaning of the vec4 register
>> objects because the representation of register offsets is now split
>> between 'reg_offset' and 'subreg_offset', and there are a *lot* of
>> places that directly rely on the current representation of register
>> offsets, just grep the VEC4 back-end for 'reg_offset' -- Every
>> occurrence is now suspect of being inaccurate because the offset of a
>> register is no longer guaranteed to be 32B-aligned, so comparing
>> reg_offsets alone to find out whether two registers are equivalent or
>> whether they overlap is no longer sufficient.
>> 
>> Do you *really* need to make this representation change for FP64?
>
> No, in fact the fp64 series I sent did not need this at all.
>
Right, but instead you used the subnr field at the IR level which
suffers from pretty much the same problem.

> I only did this because in the review of the simd lowering pass you
> mentioned that we should use a horiz_offset() helper that worked in
> terms of channels and I assumed that also meant that you wanted
> offset() to work in similar fashion, like we have in the FS backend. I
> am actually relieved to see that you did not mean this :)
>
> That said, to implement horiz_offset() we need to use something like
> subnr or subreg_offset, right?

Not necessarily, it depends on how much granularity you need the offset
helpers to have, if 32B units is sufficient for FP64 I wouldn't bother
to make this change yet (you can just assert that the calculated
increment is a multiple of 32B).  My point was that by using
horiz_offset() in the SIMD lowering pass you could trim the pass down
and avoid building this assumption into it -- The pass will only be
limited by the underlying representation of register offsets and will
automatically be able to handle lowering a larger set of register
regions as soon as we change the representation without any additional
effort.

> so right after the simd splitting pass runs we are again in the
> situation where offsets are split between two fields. You mentioned in
> another e-mail that we should probably postpone simd splitting after
> the main opt loop, so I guess that should remove the problem for the
> most part...
>

I didn't mean to say you need to change that :), I just sent you a few
questions asking how you were planning to handle some corner cases if
you do it before optimization.  That said I doubt that doing SIMD
lowering after the optimization loop would substantially change the
situation here, even if you do it after the optimization loop we're
likely to notice eventually that some optimization pass (e.g. copy
propagation) is helpful after SIMD lowering (have a look at the
optimization passes run after SIMD lowering in fs_visitor::optimize()),
and you won't be able to run them unless you make sure that the SIMD
lowering pass spits out valid IR (i.e. IR in the form accepted by all
other optimization passes).

>>   Or is
>> there some way around it?  ISTR that in the big VEC4 FP64 series you
>> ended up using backend_reg::subnr for this (which is just an
>> undercover
>> version of subreg_offset so the same caveat applies :P) when you had
>> to
>> address both halves of a single-precision register during SIMD
>> lowering
>
> Right, I thought this use of subnr there was maybe a bit of an abuse
> since it seems that before fp64 we mostly only used it at codegen time
> so I thought that having a different field for this in the IR
> (subreg_offset, like we do in the FS backend) was probably a better
> idea, but maybe it doesn't really matter much. If you have no issues
> with using subnr instead of subreg_offset for this I am fine too.
>

I doubt that using the subnr field instead saves you from any of the
trouble caused by subreg_offset, unless you only set it to a non-zero
value at (quasi-)codegen time.

> The thing is that in the simd splitting pass I sent for review we
> assumed that we are only splitting DF instructions (because that's
> really the only case at the moment) and that the split DF dst and src
> regions are going to be exactly one SIMD register long (which is really
> what we end up with in all the cases where we need to split). Because
> of that, we can use the offset() helper to compute the start of the
> src/dst regions for DF operands (and subnr if any of the operands in
> the instruction is 32-bit) and because of that we added this assert:
>
> /* We always split so that each lowered instruction writes exactly to
>  * one register.
>  */
> assert(inst->regs_written == inst->exec_size / lowered_width);
>
> If that assert is not met by any DF instruction that we need to split
> it means that we are splitting things in smaller chunks than that and
> using offset() 

Re: [Mesa-dev] [PATCH 1/4] i965: move subreg_offset to backend_reg

2016-08-24 Thread Iago Toral
On Tue, 2016-08-23 at 12:58 -0700, Francisco Jerez wrote:
> Iago Toral Quiroga  writes:
> 
> > 
> > So we can access it in the vec4 backend to handle byte offsets into
> > registers.
> This change has deep implications in the meaning of the vec4 register
> objects because the representation of register offsets is now split
> between 'reg_offset' and 'subreg_offset', and there are a *lot* of
> places that directly rely on the current representation of register
> offsets, just grep the VEC4 back-end for 'reg_offset' -- Every
> occurrence is now suspect of being inaccurate because the offset of a
> register is no longer guaranteed to be 32B-aligned, so comparing
> reg_offsets alone to find out whether two registers are equivalent or
> whether they overlap is no longer sufficient.
> 
> Do you *really* need to make this representation change for FP64?

No, in fact the fp64 series I sent did not need this at all.

I only did this because in the review of the simd lowering pass you
mentioned that we should use a horiz_offset() helper that worked in
terms of channels and I assumed that also meant that you wanted
offset() to work in similar fashion, like we have in the FS backend. I
am actually relieved to see that you did not mean this :)

That said, to implement horiz_offset() we need to use something like
subnr or subreg_offset, right? so right after the simd splitting pass
runs we are again in the situation where offsets are split between two
fields. You mentioned in another e-mail that we should probably
postpone simd splitting after the main opt loop, so I guess that should
remove the problem for the most part...

>   Or is
> there some way around it?  ISTR that in the big VEC4 FP64 series you
> ended up using backend_reg::subnr for this (which is just an
> undercover
> version of subreg_offset so the same caveat applies :P) when you had
> to
> address both halves of a single-precision register during SIMD
> lowering

Right, I thought this use of subnr there was maybe a bit of an abuse
since it seems that before fp64 we mostly only used it at codegen time
so I thought that having a different field for this in the IR
(subreg_offset, like we do in the FS backend) was probably a better
idea, but maybe it doesn't really matter much. If you have no issues
with using subnr instead of subreg_offset for this I am fine too.

The thing is that in the simd splitting pass I sent for review we
assumed that we are only splitting DF instructions (because that's
really the only case at the moment) and that the split DF dst and src
regions are going to be exactly one SIMD register long (which is really
what we end up with in all the cases where we need to split). Because
of that, we can use the offset() helper to compute the start of the
src/dst regions for DF operands (and subnr if any of the operands in
the instruction is 32-bit) and because of that we added this assert:

/* We always split so that each lowered instruction writes exactly to
 * one register.
 */
assert(inst->regs_written == inst->exec_size / lowered_width);

If that assert is not met by any DF instruction that we need to split
it means that we are splitting things in smaller chunks than that and
using offset() when computing the split regions won't do what we need
for that scenario,  so that's why I added the assert. We don't have any
such cases at the moment, but in the review process you suggested that
we changed the implementation to use a horiz_offset() and remove that
assertion, so I understood that you wanted the splitting pass to deal
with regions that would not be register aligned, and that requires that
we have a way to express offsets that are not register aligned too, and
that's how I ended up doing this. I suppose I misunderstood your
comments?

> (Can you remind me what the exact use-case was for that?).

We use subnr in two scenarios:

1. To select the second half of a DF register (ZW), but we only do this
right before codegen, in the IR we work with the ZW wizzles.

2. To generate the second half of split instructions that use single-
precision sources that are not uniforms, since these would not be
register aligned after the split (they need to be 16-byte aligned).

>   To address
> the ZW components of a double-precision vector this seems less of a
> requirement because you can just use the logical (i.e. 64-bit-based)
> swizzles at the IR level and only translate to physical
> swizzles+offsets+strides during codegen time.

That's exactly what we are doing, but as I said, we also need to use
subnr when we split a DF instruction that has single-precision sources.
Off the top of my head this can happen, for example, with some virtual
opcodes like {pick/set}_{low/high}_32bit that we need for things like
packDouble2x32 and similar situations (basically when we need to
produce code that operates on the low/high 32-bit of a DF). There might
be other cases of DF operations that take a single-precision source, I
would have to 

Re: [Mesa-dev] [PATCH 1/4] i965: move subreg_offset to backend_reg

2016-08-23 Thread Francisco Jerez
Iago Toral Quiroga  writes:

> So we can access it in the vec4 backend to handle byte offsets into
> registers.

This change has deep implications in the meaning of the vec4 register
objects because the representation of register offsets is now split
between 'reg_offset' and 'subreg_offset', and there are a *lot* of
places that directly rely on the current representation of register
offsets, just grep the VEC4 back-end for 'reg_offset' -- Every
occurrence is now suspect of being inaccurate because the offset of a
register is no longer guaranteed to be 32B-aligned, so comparing
reg_offsets alone to find out whether two registers are equivalent or
whether they overlap is no longer sufficient.

Do you *really* need to make this representation change for FP64?  Or is
there some way around it?  ISTR that in the big VEC4 FP64 series you
ended up using backend_reg::subnr for this (which is just an undercover
version of subreg_offset so the same caveat applies :P) when you had to
address both halves of a single-precision register during SIMD lowering
(Can you remind me what the exact use-case was for that?).  To address
the ZW components of a double-precision vector this seems less of a
requirement because you can just use the logical (i.e. 64-bit-based)
swizzles at the IR level and only translate to physical
swizzles+offsets+strides during codegen time.

If the answer is that you definitely need this change, I think I'm going
to help you out with this because the change is going to be massive and
involve auditing the whole VEC4 back-end.  I think there are two lessons
to learn from the FS back-end:

 - Having the register offset split between two variables has been an
   endless source of bugs, because it's just too tempting to only take
   one of them into account and ignore the other.

 - It wouldn't be substantially more difficult to change reg_offset to
   be expressed in byte units instead, even if it involves going through
   every occurrence of reg_offset in the back-end, because adding a
   separate subreg_offset field invalidates every ocurrence of
   reg_offset in the back-end anyway.

> ---
>  src/mesa/drivers/dri/i965/brw_ir_fs.h  | 6 --
>  src/mesa/drivers/dri/i965/brw_shader.h | 6 ++
>  2 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_ir_fs.h 
> b/src/mesa/drivers/dri/i965/brw_ir_fs.h
> index f214483..00fbace 100644
> --- a/src/mesa/drivers/dri/i965/brw_ir_fs.h
> +++ b/src/mesa/drivers/dri/i965/brw_ir_fs.h
> @@ -52,12 +52,6 @@ public:
> /** Smear a channel of the reg to all channels. */
> fs_reg _smear(unsigned subreg);
>  
> -   /**
> -* Offset in bytes from the start of the register.  Values up to a
> -* backend_reg::reg_offset unit are valid.
> -*/
> -   int subreg_offset;
> -
> /** Register region horizontal stride */
> uint8_t stride;
>  };
> diff --git a/src/mesa/drivers/dri/i965/brw_shader.h 
> b/src/mesa/drivers/dri/i965/brw_shader.h
> index e61c080..ae23830 100644
> --- a/src/mesa/drivers/dri/i965/brw_shader.h
> +++ b/src/mesa/drivers/dri/i965/brw_shader.h
> @@ -75,6 +75,12 @@ struct backend_reg : private brw_reg
>  */
> uint16_t reg_offset;
>  
> +   /**
> +* Offset in bytes from the start of the register.  Values up to a
> +* backend_reg::reg_offset unit are valid.
> +*/
> +   uint16_t subreg_offset;
> +
> using brw_reg::type;
> using brw_reg::file;
> using brw_reg::negate;
> -- 
> 2.7.4


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/4] i965: move subreg_offset to backend_reg

2016-08-23 Thread Iago Toral Quiroga
So we can access it in the vec4 backend to handle byte offsets into
registers.
---
 src/mesa/drivers/dri/i965/brw_ir_fs.h  | 6 --
 src/mesa/drivers/dri/i965/brw_shader.h | 6 ++
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_ir_fs.h 
b/src/mesa/drivers/dri/i965/brw_ir_fs.h
index f214483..00fbace 100644
--- a/src/mesa/drivers/dri/i965/brw_ir_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_ir_fs.h
@@ -52,12 +52,6 @@ public:
/** Smear a channel of the reg to all channels. */
fs_reg _smear(unsigned subreg);
 
-   /**
-* Offset in bytes from the start of the register.  Values up to a
-* backend_reg::reg_offset unit are valid.
-*/
-   int subreg_offset;
-
/** Register region horizontal stride */
uint8_t stride;
 };
diff --git a/src/mesa/drivers/dri/i965/brw_shader.h 
b/src/mesa/drivers/dri/i965/brw_shader.h
index e61c080..ae23830 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.h
+++ b/src/mesa/drivers/dri/i965/brw_shader.h
@@ -75,6 +75,12 @@ struct backend_reg : private brw_reg
 */
uint16_t reg_offset;
 
+   /**
+* Offset in bytes from the start of the register.  Values up to a
+* backend_reg::reg_offset unit are valid.
+*/
+   uint16_t subreg_offset;
+
using brw_reg::type;
using brw_reg::file;
using brw_reg::negate;
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev