Re: [Mesa-dev] [PATCH v3 08/24] i965/fs: fix dst stride in IVB/BYT type conversions

2017-02-15 Thread Samuel Iglesias Gonsálvez
On Thu, 2017-02-16 at 08:11 +0100, Samuel Iglesias Gonsálvez wrote:
> On Wed, 2017-02-15 at 12:08 -0800, Francisco Jerez wrote:
> > Samuel Iglesias Gonsálvez  writes:
> > 
> > > From: "Juan A. Suarez Romero" 
> > > 
> > > When converting a DF to 32-bit conversions, we set dst stride to
> > > 2,
> > > to fulfill alignment restrictions because the upper Dword of
> > > every
> > > Qword will be written with undefined value.
> > > 
> > > But in IVB/BYT, this is not necessary, as each DF conversion
> > > already
> > > writes 2, the first one the real value, and the second one a 0.
> > > That is, IVB/BYT already set stride = 2 implicitly, so we must
> > > set
> > > it to
> > > 1 explicitly to avoid ending up with stride = 4.
> > > 
> > > v2:
> > > - Fix typo (Matt)
> > > 
> > > v3:
> > > - Fix stride in the destination's brw_reg, don't modity IR
> > > (Curro)
> > > 
> > > Signed-off-by: Juan A. Suarez Romero 
> > > Signed-off-by: Samuel Iglesias Gonsálvez 
> > > ---
> > >  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 76
> > > +++---
> > >  1 file changed, 45 insertions(+), 31 deletions(-)
> > > 
> > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > > b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > > index 2f60ddd8706..e0a80d73b70 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > > +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > > @@ -55,7 +55,7 @@ brw_file_from_reg(fs_reg *reg)
> > >  
> > >  static struct brw_reg
> > >  brw_reg_from_fs_reg(const struct gen_device_info *devinfo,
> > > fs_inst
> > > *inst,
> > > -fs_reg *reg, bool compressed)
> > > +fs_reg *reg, bool is_dst, bool compressed)
> > >  {
> > > struct brw_reg brw_reg;
> > >  
> > > @@ -94,34 +94,48 @@ brw_reg_from_fs_reg(const struct
> > > gen_device_info *devinfo, fs_inst *inst,
> > >   const unsigned width = MIN2(reg_width, phys_width);
> > >   brw_reg = brw_vecn_reg(width, brw_file_from_reg(reg),
> > > reg->nr, 0);
> > >   brw_reg = stride(brw_reg, width * reg->stride, width,
> > > reg->stride);
> > > - /* From the IvyBridge PRM (EU Changes by Processor
> > > Generation, page 13):
> > > -  *  "Each DF (Double Float) operand uses an element
> > > size
> > > of 4 rather
> > > -  *   than 8 and all regioning parameters are twice what
> > > the values
> > > -  *   would be based on the true element size: ExecSize,
> > > Width,
> > > -  *   HorzStride, and VertStride. Each DF operand uses a
> > > pair of
> > > -  *   channels and all masking and swizzing should be
> > > adjusted
> > > -  *   appropriately."
> > > -  *
> > > -  * From the IvyBridge PRM (Special Requirements for
> > > Handling Double
> > > -  * Precision Data Types, page 71):
> > > -  *  "In Align1 mode, all regioning parameters like
> > > stride, execution
> > > -  *   size, and width must use the syntax of a pair of
> > > packed
> > > -  *   floats. The offsets for these data types must be
> > > 64-
> > > bit
> > > -  *   aligned. The execution size and regioning
> > > parameters
> > > are in terms
> > > -  *   of floats."
> > > -  *
> > > -  * Summarized: when handling DF-typed arguments,
> > > ExecSize,
> > > -  * VertStride, and Width must be doubled, and
> > > HorzStride
> > > must be
> > > -  * doubled when the region is not scalar.
> > > -  *
> > > -  * It applies to BayTrail too.
> > > -  */
> > > - if (devinfo->gen == 7 && !devinfo->is_haswell &&
> > > - type_sz(reg->type) == 8) {
> > > -brw_reg.width++;
> > > -if (brw_reg.vstride > 0)
> > > -   brw_reg.vstride++;
> > > -assert(brw_reg.hstride == BRW_HORIZONTAL_STRIDE_1);
> > > +
> > > + if (devinfo->gen == 7 && !devinfo->is_haswell) {
> > > +/* From the IvyBridge PRM (EU Changes by Processor
> > > Generation, page 13):
> > > + *  "Each DF (Double Float) operand uses an element
> > > size of 4 rather
> > > + *   than 8 and all regioning parameters are twice
> > > what the values
> > > + *   would be based on the true element size:
> > > ExecSize, Width,
> > > + *   HorzStride, and VertStride. Each DF operand
> > > uses
> > > a pair of
> > > + *   channels and all masking and swizzing should be
> > > adjusted
> > > + *   appropriately."
> > > + *
> > > + * From the IvyBridge PRM (Special Requirements for
> > > Handling Double
> > > + * Precision Data Types, page 71):
> > > + *  "In Align1 mode, all regioning parameters like
> > > stride, execution
> > > + *   size, and width must use the syntax of a pair
> > > of
> > 

Re: [Mesa-dev] [PATCH v3 08/24] i965/fs: fix dst stride in IVB/BYT type conversions

2017-02-15 Thread Samuel Iglesias Gonsálvez
On Wed, 2017-02-15 at 12:08 -0800, Francisco Jerez wrote:
> Samuel Iglesias Gonsálvez  writes:
> 
> > From: "Juan A. Suarez Romero" 
> > 
> > When converting a DF to 32-bit conversions, we set dst stride to 2,
> > to fulfill alignment restrictions because the upper Dword of every
> > Qword will be written with undefined value.
> > 
> > But in IVB/BYT, this is not necessary, as each DF conversion
> > already
> > writes 2, the first one the real value, and the second one a 0.
> > That is, IVB/BYT already set stride = 2 implicitly, so we must set
> > it to
> > 1 explicitly to avoid ending up with stride = 4.
> > 
> > v2:
> > - Fix typo (Matt)
> > 
> > v3:
> > - Fix stride in the destination's brw_reg, don't modity IR (Curro)
> > 
> > Signed-off-by: Juan A. Suarez Romero 
> > Signed-off-by: Samuel Iglesias Gonsálvez 
> > ---
> >  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 76
> > +++---
> >  1 file changed, 45 insertions(+), 31 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > index 2f60ddd8706..e0a80d73b70 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > @@ -55,7 +55,7 @@ brw_file_from_reg(fs_reg *reg)
> >  
> >  static struct brw_reg
> >  brw_reg_from_fs_reg(const struct gen_device_info *devinfo, fs_inst
> > *inst,
> > -fs_reg *reg, bool compressed)
> > +fs_reg *reg, bool is_dst, bool compressed)
> >  {
> > struct brw_reg brw_reg;
> >  
> > @@ -94,34 +94,48 @@ brw_reg_from_fs_reg(const struct
> > gen_device_info *devinfo, fs_inst *inst,
> >   const unsigned width = MIN2(reg_width, phys_width);
> >   brw_reg = brw_vecn_reg(width, brw_file_from_reg(reg),
> > reg->nr, 0);
> >   brw_reg = stride(brw_reg, width * reg->stride, width,
> > reg->stride);
> > - /* From the IvyBridge PRM (EU Changes by Processor
> > Generation, page 13):
> > -  *  "Each DF (Double Float) operand uses an element size
> > of 4 rather
> > -  *   than 8 and all regioning parameters are twice what
> > the values
> > -  *   would be based on the true element size: ExecSize,
> > Width,
> > -  *   HorzStride, and VertStride. Each DF operand uses a
> > pair of
> > -  *   channels and all masking and swizzing should be
> > adjusted
> > -  *   appropriately."
> > -  *
> > -  * From the IvyBridge PRM (Special Requirements for
> > Handling Double
> > -  * Precision Data Types, page 71):
> > -  *  "In Align1 mode, all regioning parameters like
> > stride, execution
> > -  *   size, and width must use the syntax of a pair of
> > packed
> > -  *   floats. The offsets for these data types must be 64-
> > bit
> > -  *   aligned. The execution size and regioning parameters
> > are in terms
> > -  *   of floats."
> > -  *
> > -  * Summarized: when handling DF-typed arguments,
> > ExecSize,
> > -  * VertStride, and Width must be doubled, and HorzStride
> > must be
> > -  * doubled when the region is not scalar.
> > -  *
> > -  * It applies to BayTrail too.
> > -  */
> > - if (devinfo->gen == 7 && !devinfo->is_haswell &&
> > - type_sz(reg->type) == 8) {
> > -brw_reg.width++;
> > -if (brw_reg.vstride > 0)
> > -   brw_reg.vstride++;
> > -assert(brw_reg.hstride == BRW_HORIZONTAL_STRIDE_1);
> > +
> > + if (devinfo->gen == 7 && !devinfo->is_haswell) {
> > +/* From the IvyBridge PRM (EU Changes by Processor
> > Generation, page 13):
> > + *  "Each DF (Double Float) operand uses an element
> > size of 4 rather
> > + *   than 8 and all regioning parameters are twice
> > what the values
> > + *   would be based on the true element size:
> > ExecSize, Width,
> > + *   HorzStride, and VertStride. Each DF operand uses
> > a pair of
> > + *   channels and all masking and swizzing should be
> > adjusted
> > + *   appropriately."
> > + *
> > + * From the IvyBridge PRM (Special Requirements for
> > Handling Double
> > + * Precision Data Types, page 71):
> > + *  "In Align1 mode, all regioning parameters like
> > stride, execution
> > + *   size, and width must use the syntax of a pair of
> > packed
> > + *   floats. The offsets for these data types must be
> > 64-bit
> > + *   aligned. The execution size and regioning
> > parameters are in terms
> > + *   of floats."
> > + *
> > + * Summarized: when handling DF-typed arguments,
> > ExecSize,
> > + * VertStride, and Width must 

Re: [Mesa-dev] [PATCH v3 08/24] i965/fs: fix dst stride in IVB/BYT type conversions

2017-02-15 Thread Francisco Jerez
Samuel Iglesias Gonsálvez  writes:

> From: "Juan A. Suarez Romero" 
>
> When converting a DF to 32-bit conversions, we set dst stride to 2,
> to fulfill alignment restrictions because the upper Dword of every
> Qword will be written with undefined value.
>
> But in IVB/BYT, this is not necessary, as each DF conversion already
> writes 2, the first one the real value, and the second one a 0.
> That is, IVB/BYT already set stride = 2 implicitly, so we must set it to
> 1 explicitly to avoid ending up with stride = 4.
>
> v2:
> - Fix typo (Matt)
>
> v3:
> - Fix stride in the destination's brw_reg, don't modity IR (Curro)
>
> Signed-off-by: Juan A. Suarez Romero 
> Signed-off-by: Samuel Iglesias Gonsálvez 
> ---
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 76 
> +++---
>  1 file changed, 45 insertions(+), 31 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> index 2f60ddd8706..e0a80d73b70 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> @@ -55,7 +55,7 @@ brw_file_from_reg(fs_reg *reg)
>  
>  static struct brw_reg
>  brw_reg_from_fs_reg(const struct gen_device_info *devinfo, fs_inst *inst,
> -fs_reg *reg, bool compressed)
> +fs_reg *reg, bool is_dst, bool compressed)
>  {
> struct brw_reg brw_reg;
>  
> @@ -94,34 +94,48 @@ brw_reg_from_fs_reg(const struct gen_device_info 
> *devinfo, fs_inst *inst,
>   const unsigned width = MIN2(reg_width, phys_width);
>   brw_reg = brw_vecn_reg(width, brw_file_from_reg(reg), reg->nr, 0);
>   brw_reg = stride(brw_reg, width * reg->stride, width, reg->stride);
> - /* From the IvyBridge PRM (EU Changes by Processor Generation, page 
> 13):
> -  *  "Each DF (Double Float) operand uses an element size of 4 rather
> -  *   than 8 and all regioning parameters are twice what the values
> -  *   would be based on the true element size: ExecSize, Width,
> -  *   HorzStride, and VertStride. Each DF operand uses a pair of
> -  *   channels and all masking and swizzing should be adjusted
> -  *   appropriately."
> -  *
> -  * From the IvyBridge PRM (Special Requirements for Handling Double
> -  * Precision Data Types, page 71):
> -  *  "In Align1 mode, all regioning parameters like stride, execution
> -  *   size, and width must use the syntax of a pair of packed
> -  *   floats. The offsets for these data types must be 64-bit
> -  *   aligned. The execution size and regioning parameters are in 
> terms
> -  *   of floats."
> -  *
> -  * Summarized: when handling DF-typed arguments, ExecSize,
> -  * VertStride, and Width must be doubled, and HorzStride must be
> -  * doubled when the region is not scalar.
> -  *
> -  * It applies to BayTrail too.
> -  */
> - if (devinfo->gen == 7 && !devinfo->is_haswell &&
> - type_sz(reg->type) == 8) {
> -brw_reg.width++;
> -if (brw_reg.vstride > 0)
> -   brw_reg.vstride++;
> -assert(brw_reg.hstride == BRW_HORIZONTAL_STRIDE_1);
> +
> + if (devinfo->gen == 7 && !devinfo->is_haswell) {
> +/* From the IvyBridge PRM (EU Changes by Processor Generation, 
> page 13):
> + *  "Each DF (Double Float) operand uses an element size of 4 
> rather
> + *   than 8 and all regioning parameters are twice what the 
> values
> + *   would be based on the true element size: ExecSize, Width,
> + *   HorzStride, and VertStride. Each DF operand uses a pair of
> + *   channels and all masking and swizzing should be adjusted
> + *   appropriately."
> + *
> + * From the IvyBridge PRM (Special Requirements for Handling 
> Double
> + * Precision Data Types, page 71):
> + *  "In Align1 mode, all regioning parameters like stride, 
> execution
> + *   size, and width must use the syntax of a pair of packed
> + *   floats. The offsets for these data types must be 64-bit
> + *   aligned. The execution size and regioning parameters are in 
> terms
> + *   of floats."
> + *
> + * Summarized: when handling DF-typed arguments, ExecSize,
> + * VertStride, and Width must be doubled, and HorzStride must be
> + * doubled when the region is not scalar.
> + *
> + * It applies to BayTrail too.
> + */
> +if (type_sz(reg->type) == 8) {
> +   brw_reg.width++;
> +   if (brw_reg.vstride > 0)
> +  brw_reg.vstride++;
> +  

[Mesa-dev] [PATCH v3 08/24] i965/fs: fix dst stride in IVB/BYT type conversions

2017-02-14 Thread Samuel Iglesias Gonsálvez
From: "Juan A. Suarez Romero" 

When converting a DF to 32-bit conversions, we set dst stride to 2,
to fulfill alignment restrictions because the upper Dword of every
Qword will be written with undefined value.

But in IVB/BYT, this is not necessary, as each DF conversion already
writes 2, the first one the real value, and the second one a 0.
That is, IVB/BYT already set stride = 2 implicitly, so we must set it to
1 explicitly to avoid ending up with stride = 4.

v2:
- Fix typo (Matt)

v3:
- Fix stride in the destination's brw_reg, don't modity IR (Curro)

Signed-off-by: Juan A. Suarez Romero 
Signed-off-by: Samuel Iglesias Gonsálvez 
---
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 76 +++---
 1 file changed, 45 insertions(+), 31 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 2f60ddd8706..e0a80d73b70 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -55,7 +55,7 @@ brw_file_from_reg(fs_reg *reg)
 
 static struct brw_reg
 brw_reg_from_fs_reg(const struct gen_device_info *devinfo, fs_inst *inst,
-fs_reg *reg, bool compressed)
+fs_reg *reg, bool is_dst, bool compressed)
 {
struct brw_reg brw_reg;
 
@@ -94,34 +94,48 @@ brw_reg_from_fs_reg(const struct gen_device_info *devinfo, 
fs_inst *inst,
  const unsigned width = MIN2(reg_width, phys_width);
  brw_reg = brw_vecn_reg(width, brw_file_from_reg(reg), reg->nr, 0);
  brw_reg = stride(brw_reg, width * reg->stride, width, reg->stride);
- /* From the IvyBridge PRM (EU Changes by Processor Generation, page 
13):
-  *  "Each DF (Double Float) operand uses an element size of 4 rather
-  *   than 8 and all regioning parameters are twice what the values
-  *   would be based on the true element size: ExecSize, Width,
-  *   HorzStride, and VertStride. Each DF operand uses a pair of
-  *   channels and all masking and swizzing should be adjusted
-  *   appropriately."
-  *
-  * From the IvyBridge PRM (Special Requirements for Handling Double
-  * Precision Data Types, page 71):
-  *  "In Align1 mode, all regioning parameters like stride, execution
-  *   size, and width must use the syntax of a pair of packed
-  *   floats. The offsets for these data types must be 64-bit
-  *   aligned. The execution size and regioning parameters are in terms
-  *   of floats."
-  *
-  * Summarized: when handling DF-typed arguments, ExecSize,
-  * VertStride, and Width must be doubled, and HorzStride must be
-  * doubled when the region is not scalar.
-  *
-  * It applies to BayTrail too.
-  */
- if (devinfo->gen == 7 && !devinfo->is_haswell &&
- type_sz(reg->type) == 8) {
-brw_reg.width++;
-if (brw_reg.vstride > 0)
-   brw_reg.vstride++;
-assert(brw_reg.hstride == BRW_HORIZONTAL_STRIDE_1);
+
+ if (devinfo->gen == 7 && !devinfo->is_haswell) {
+/* From the IvyBridge PRM (EU Changes by Processor Generation, 
page 13):
+ *  "Each DF (Double Float) operand uses an element size of 4 
rather
+ *   than 8 and all regioning parameters are twice what the values
+ *   would be based on the true element size: ExecSize, Width,
+ *   HorzStride, and VertStride. Each DF operand uses a pair of
+ *   channels and all masking and swizzing should be adjusted
+ *   appropriately."
+ *
+ * From the IvyBridge PRM (Special Requirements for Handling Double
+ * Precision Data Types, page 71):
+ *  "In Align1 mode, all regioning parameters like stride, 
execution
+ *   size, and width must use the syntax of a pair of packed
+ *   floats. The offsets for these data types must be 64-bit
+ *   aligned. The execution size and regioning parameters are in 
terms
+ *   of floats."
+ *
+ * Summarized: when handling DF-typed arguments, ExecSize,
+ * VertStride, and Width must be doubled, and HorzStride must be
+ * doubled when the region is not scalar.
+ *
+ * It applies to BayTrail too.
+ */
+if (type_sz(reg->type) == 8) {
+   brw_reg.width++;
+   if (brw_reg.vstride > 0)
+  brw_reg.vstride++;
+   assert(brw_reg.hstride == BRW_HORIZONTAL_STRIDE_1);
+}
+
+/* When converting from DF->F, we set destination's stride as 2 as 
an
+ * because each d2f conversion implicitly writes 2 F,
+ * being the first one