Re: [Mesa-dev] [PATCH 48/95] i965/vec4: add a force_vstride0 flag to src_reg
On Tue, 2016-09-13 at 22:12 -0700, Francisco Jerez wrote: > Iago Toralwrites: > > > > > On Mon, 2016-09-12 at 14:05 -0700, Francisco Jerez wrote: > > > > > > Iago Toral Quiroga writes: > > > > > > > > > > > > > > > We will use this in cases where we want to force the vstride of > > > > a > > > > src_reg > > > > to 0 to exploit a particular behavior of the hardware. It will > > > > come > > > > in > > > > handy to implement access to components Z/W. > > > > --- > > > > src/mesa/drivers/dri/i965/brw_ir_vec4.h | 1 + > > > > src/mesa/drivers/dri/i965/brw_vec4.cpp | 2 ++ > > > > 2 files changed, 3 insertions(+) > > > > > > > > diff --git a/src/mesa/drivers/dri/i965/brw_ir_vec4.h > > > > b/src/mesa/drivers/dri/i965/brw_ir_vec4.h > > > > index f66c093..f3cce4b 100644 > > > > --- a/src/mesa/drivers/dri/i965/brw_ir_vec4.h > > > > +++ b/src/mesa/drivers/dri/i965/brw_ir_vec4.h > > > > @@ -51,6 +51,7 @@ public: > > > > explicit src_reg(const dst_reg ); > > > > > > > > src_reg *reladdr; > > > > + bool force_vstride0; > > > I was wondering whether it would make more sense to unify this > > > with > > > the > > > FS back-end's fs_reg::stride (a numeric stride field is also > > > likely > > > more > > > convenient to do arithmetic on than a boolean) and promote it to > > > backend_reg? It could be defined as the number of components to > > > jump > > > over for each logical channel of the register, which is just the > > > vstride > > > in single-precision SIMD4x2 and the hstride in scalar mode. > > We could do that, but I thought it would be a good idea to make it > > clear that here we are using the vstride=0 with a very specific > > intention and we don't expect the hardware to do what it would be > > expected (we are trying to exploit a hardware bug after all). If we > > were to use a normal stride field for this I think we would make > > this > > intention much less obvious and other people reading the code would > > have a much harder time understanding what is really going on. > > Since we > > are being tricky here I think the extra field to signal that we are > > trying to do something "special" might be worth it: people can > > track > > where we read and write that field and see exactly where it is > > being > > used for the purpose of exploiting this particular hardware > > behavior. > > > Yes, I agree that the hardware's behavior on Gen7 with non-identity > vstride is tricky and special -- Special enough that *none* of the > VEC4 > optimization passes and IR-handling code need to be aware of it, > because > the field is only going to be used as internal book-keeping data > structure in convert_to_hw_regs() and immediately discarded. IOW > you're > storing an internal data structure of convert_to_hw_regs() as part of > the shared IR data structure, with no well-defined semantics and > which > no back-end code (not even convert_to_hw_regs()) is going to be able > to > honor. > > So if your argument for making the representation of vstride > unnecessarily non-orthogonal is that you want to discourage people > from > using it at the IR level (which is fair because it won't work at > all!), > I would argue that it doesn't belong in the IR data structures in the > first place, because you could just keep convert_to_hw_regs' internal > data structures internal to convert_to_hw_regs. (I don't actually > think > you need the data structure, neither internal nor external, but more > on > that later) Yes, that makes sense. > > > > > > > > But thinking about it some more, I wonder if it's really > > > necessary to > > > expose vertical strides at the IR level? Aren't you planing to > > > use > > > this > > > during the conversion to HW registers exclusively? Why don't you > > > set > > > the vstride field directly in that case? > > Yes, this is used exclusively at that time. The conversion to > > hardware > > registers in convert_to_hw_regs() happens in two stages now: > > > > We call our 'expand_64bit_swizzle_to_32bit()' helper first. This > > one > > takes care of checking the regioning on DF instructions, translate > > swizzles and set force_vstrid0 to true when needed (which is also > > the > > only place that would set this to true). Then the rest of the code > > in > > convert_to_hw_regs() just operates as usual, only that it will > > check > > the force_vstride0 setting to decide the vstride to use for DF > > regions. > > > > I did it like this because it allows us to keep the DF swizzle > > translation and regioning checking logic separated from the > > conversion > > to hardware registers, but this separation means that we need to > > tell > > the latter when it has to set the vstride to 0, thus the addition > > of > > the forcE_vstride0 field. I think having these two things separated > > makes sense and makes the code easier to read. We can keep both > > things > > separate and still avoid the force_vstride0 field by using a stride > >
Re: [Mesa-dev] [PATCH 48/95] i965/vec4: add a force_vstride0 flag to src_reg
Iago Toralwrites: > On Mon, 2016-09-12 at 14:05 -0700, Francisco Jerez wrote: >> Iago Toral Quiroga writes: >> >> > >> > We will use this in cases where we want to force the vstride of a >> > src_reg >> > to 0 to exploit a particular behavior of the hardware. It will come >> > in >> > handy to implement access to components Z/W. >> > --- >> > src/mesa/drivers/dri/i965/brw_ir_vec4.h | 1 + >> > src/mesa/drivers/dri/i965/brw_vec4.cpp | 2 ++ >> > 2 files changed, 3 insertions(+) >> > >> > diff --git a/src/mesa/drivers/dri/i965/brw_ir_vec4.h >> > b/src/mesa/drivers/dri/i965/brw_ir_vec4.h >> > index f66c093..f3cce4b 100644 >> > --- a/src/mesa/drivers/dri/i965/brw_ir_vec4.h >> > +++ b/src/mesa/drivers/dri/i965/brw_ir_vec4.h >> > @@ -51,6 +51,7 @@ public: >> > explicit src_reg(const dst_reg ); >> > >> > src_reg *reladdr; >> > + bool force_vstride0; >> I was wondering whether it would make more sense to unify this with >> the >> FS back-end's fs_reg::stride (a numeric stride field is also likely >> more >> convenient to do arithmetic on than a boolean) and promote it to >> backend_reg? It could be defined as the number of components to jump >> over for each logical channel of the register, which is just the >> vstride >> in single-precision SIMD4x2 and the hstride in scalar mode. > > We could do that, but I thought it would be a good idea to make it > clear that here we are using the vstride=0 with a very specific > intention and we don't expect the hardware to do what it would be > expected (we are trying to exploit a hardware bug after all). If we > were to use a normal stride field for this I think we would make this > intention much less obvious and other people reading the code would > have a much harder time understanding what is really going on. Since we > are being tricky here I think the extra field to signal that we are > trying to do something "special" might be worth it: people can track > where we read and write that field and see exactly where it is being > used for the purpose of exploiting this particular hardware behavior. > Yes, I agree that the hardware's behavior on Gen7 with non-identity vstride is tricky and special -- Special enough that *none* of the VEC4 optimization passes and IR-handling code need to be aware of it, because the field is only going to be used as internal book-keeping data structure in convert_to_hw_regs() and immediately discarded. IOW you're storing an internal data structure of convert_to_hw_regs() as part of the shared IR data structure, with no well-defined semantics and which no back-end code (not even convert_to_hw_regs()) is going to be able to honor. So if your argument for making the representation of vstride unnecessarily non-orthogonal is that you want to discourage people from using it at the IR level (which is fair because it won't work at all!), I would argue that it doesn't belong in the IR data structures in the first place, because you could just keep convert_to_hw_regs' internal data structures internal to convert_to_hw_regs. (I don't actually think you need the data structure, neither internal nor external, but more on that later) >> But thinking about it some more, I wonder if it's really necessary to >> expose vertical strides at the IR level? Aren't you planing to use >> this >> during the conversion to HW registers exclusively? Why don't you set >> the vstride field directly in that case? > > Yes, this is used exclusively at that time. The conversion to hardware > registers in convert_to_hw_regs() happens in two stages now: > > We call our 'expand_64bit_swizzle_to_32bit()' helper first. This one > takes care of checking the regioning on DF instructions, translate > swizzles and set force_vstrid0 to true when needed (which is also the > only place that would set this to true). Then the rest of the code in > convert_to_hw_regs() just operates as usual, only that it will check > the force_vstride0 setting to decide the vstride to use for DF regions. > > I did it like this because it allows us to keep the DF swizzle > translation and regioning checking logic separated from the conversion > to hardware registers, but this separation means that we need to tell > the latter when it has to set the vstride to 0, thus the addition of > the forcE_vstride0 field. I think having these two things separated > makes sense and makes the code easier to read. We can keep both things > separate and still avoid the force_vstride0 field by using a stride > field as you suggest above, but as I said, I think we might be doing a > rather tricky thing a bit less obvious than it should to other people. > Keeping these two tasks logically separate from each other sounds fine to me, but you don't need to extend the IR for them to exchange data. AFAICT expand_64bit_swizzle_to_32bit() is doing two things: - Calculate the hardware swizzle, which potentially involves an adjustment of the subregister offset
Re: [Mesa-dev] [PATCH 48/95] i965/vec4: add a force_vstride0 flag to src_reg
On Mon, 2016-09-12 at 14:05 -0700, Francisco Jerez wrote: > Iago Toral Quirogawrites: > > > > > We will use this in cases where we want to force the vstride of a > > src_reg > > to 0 to exploit a particular behavior of the hardware. It will come > > in > > handy to implement access to components Z/W. > > --- > > src/mesa/drivers/dri/i965/brw_ir_vec4.h | 1 + > > src/mesa/drivers/dri/i965/brw_vec4.cpp | 2 ++ > > 2 files changed, 3 insertions(+) > > > > diff --git a/src/mesa/drivers/dri/i965/brw_ir_vec4.h > > b/src/mesa/drivers/dri/i965/brw_ir_vec4.h > > index f66c093..f3cce4b 100644 > > --- a/src/mesa/drivers/dri/i965/brw_ir_vec4.h > > +++ b/src/mesa/drivers/dri/i965/brw_ir_vec4.h > > @@ -51,6 +51,7 @@ public: > > explicit src_reg(const dst_reg ); > > > > src_reg *reladdr; > > + bool force_vstride0; > I was wondering whether it would make more sense to unify this with > the > FS back-end's fs_reg::stride (a numeric stride field is also likely > more > convenient to do arithmetic on than a boolean) and promote it to > backend_reg? It could be defined as the number of components to jump > over for each logical channel of the register, which is just the > vstride > in single-precision SIMD4x2 and the hstride in scalar mode. We could do that, but I thought it would be a good idea to make it clear that here we are using the vstride=0 with a very specific intention and we don't expect the hardware to do what it would be expected (we are trying to exploit a hardware bug after all). If we were to use a normal stride field for this I think we would make this intention much less obvious and other people reading the code would have a much harder time understanding what is really going on. Since we are being tricky here I think the extra field to signal that we are trying to do something "special" might be worth it: people can track where we read and write that field and see exactly where it is being used for the purpose of exploiting this particular hardware behavior. > But thinking about it some more, I wonder if it's really necessary to > expose vertical strides at the IR level? Aren't you planing to use > this > during the conversion to HW registers exclusively? Why don't you set > the vstride field directly in that case? Yes, this is used exclusively at that time. The conversion to hardware registers in convert_to_hw_regs() happens in two stages now: We call our 'expand_64bit_swizzle_to_32bit()' helper first. This one takes care of checking the regioning on DF instructions, translate swizzles and set force_vstrid0 to true when needed (which is also the only place that would set this to true). Then the rest of the code in convert_to_hw_regs() just operates as usual, only that it will check the force_vstride0 setting to decide the vstride to use for DF regions. I did it like this because it allows us to keep the DF swizzle translation and regioning checking logic separated from the conversion to hardware registers, but this separation means that we need to tell the latter when it has to set the vstride to 0, thus the addition of the forcE_vstride0 field. I think having these two things separated makes sense and makes the code easier to read. We can keep both things separate and still avoid the force_vstride0 field by using a stride field as you suggest above, but as I said, I think we might be doing a rather tricky thing a bit less obvious than it should to other people. > > > > }; > > > > static inline src_reg > > diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp > > b/src/mesa/drivers/dri/i965/brw_vec4.cpp > > index a20b2fd..bfbbd96 100644 > > --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp > > +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp > > @@ -70,6 +70,7 @@ src_reg::src_reg(struct ::brw_reg reg) : > > { > > this->reg_offset = 0; > > this->reladdr = NULL; > > + this->force_vstride0 = false; > > } > > > > src_reg::src_reg(const dst_reg ) : > > @@ -77,6 +78,7 @@ src_reg::src_reg(const dst_reg ) : > > { > > this->reladdr = reg.reladdr; > > this->swizzle = brw_swizzle_for_mask(reg.writemask); > > + this->force_vstride0 = false; > > } > > > > void ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 48/95] i965/vec4: add a force_vstride0 flag to src_reg
Iago Toral Quirogawrites: > We will use this in cases where we want to force the vstride of a src_reg > to 0 to exploit a particular behavior of the hardware. It will come in > handy to implement access to components Z/W. > --- > src/mesa/drivers/dri/i965/brw_ir_vec4.h | 1 + > src/mesa/drivers/dri/i965/brw_vec4.cpp | 2 ++ > 2 files changed, 3 insertions(+) > > diff --git a/src/mesa/drivers/dri/i965/brw_ir_vec4.h > b/src/mesa/drivers/dri/i965/brw_ir_vec4.h > index f66c093..f3cce4b 100644 > --- a/src/mesa/drivers/dri/i965/brw_ir_vec4.h > +++ b/src/mesa/drivers/dri/i965/brw_ir_vec4.h > @@ -51,6 +51,7 @@ public: > explicit src_reg(const dst_reg ); > > src_reg *reladdr; > + bool force_vstride0; I was wondering whether it would make more sense to unify this with the FS back-end's fs_reg::stride (a numeric stride field is also likely more convenient to do arithmetic on than a boolean) and promote it to backend_reg? It could be defined as the number of components to jump over for each logical channel of the register, which is just the vstride in single-precision SIMD4x2 and the hstride in scalar mode. But thinking about it some more, I wonder if it's really necessary to expose vertical strides at the IR level? Aren't you planing to use this during the conversion to HW registers exclusively? Why don't you set the vstride field directly in that case? > }; > > static inline src_reg > diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp > b/src/mesa/drivers/dri/i965/brw_vec4.cpp > index a20b2fd..bfbbd96 100644 > --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp > +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp > @@ -70,6 +70,7 @@ src_reg::src_reg(struct ::brw_reg reg) : > { > this->reg_offset = 0; > this->reladdr = NULL; > + this->force_vstride0 = false; > } > > src_reg::src_reg(const dst_reg ) : > @@ -77,6 +78,7 @@ src_reg::src_reg(const dst_reg ) : > { > this->reladdr = reg.reladdr; > this->swizzle = brw_swizzle_for_mask(reg.writemask); > + this->force_vstride0 = false; > } > > void > -- > 2.7.4 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 48/95] i965/vec4: add a force_vstride0 flag to src_reg
We will use this in cases where we want to force the vstride of a src_reg to 0 to exploit a particular behavior of the hardware. It will come in handy to implement access to components Z/W. --- src/mesa/drivers/dri/i965/brw_ir_vec4.h | 1 + src/mesa/drivers/dri/i965/brw_vec4.cpp | 2 ++ 2 files changed, 3 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_ir_vec4.h b/src/mesa/drivers/dri/i965/brw_ir_vec4.h index f66c093..f3cce4b 100644 --- a/src/mesa/drivers/dri/i965/brw_ir_vec4.h +++ b/src/mesa/drivers/dri/i965/brw_ir_vec4.h @@ -51,6 +51,7 @@ public: explicit src_reg(const dst_reg ); src_reg *reladdr; + bool force_vstride0; }; static inline src_reg diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index a20b2fd..bfbbd96 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -70,6 +70,7 @@ src_reg::src_reg(struct ::brw_reg reg) : { this->reg_offset = 0; this->reladdr = NULL; + this->force_vstride0 = false; } src_reg::src_reg(const dst_reg ) : @@ -77,6 +78,7 @@ src_reg::src_reg(const dst_reg ) : { this->reladdr = reg.reladdr; this->swizzle = brw_swizzle_for_mask(reg.writemask); + this->force_vstride0 = false; } void -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev