Re: [Mesa-dev] [PATCH v3 08/24] i965/fs: fix dst stride in IVB/BYT type conversions
On Thu, 2017-02-16 at 08:11 +0100, Samuel Iglesias Gonsálvez wrote: > On Wed, 2017-02-15 at 12:08 -0800, Francisco Jerez wrote: > > Samuel Iglesias Gonsálvezwrites: > > > > > From: "Juan A. Suarez Romero" > > > > > > When converting a DF to 32-bit conversions, we set dst stride to > > > 2, > > > to fulfill alignment restrictions because the upper Dword of > > > every > > > Qword will be written with undefined value. > > > > > > But in IVB/BYT, this is not necessary, as each DF conversion > > > already > > > writes 2, the first one the real value, and the second one a 0. > > > That is, IVB/BYT already set stride = 2 implicitly, so we must > > > set > > > it to > > > 1 explicitly to avoid ending up with stride = 4. > > > > > > v2: > > > - Fix typo (Matt) > > > > > > v3: > > > - Fix stride in the destination's brw_reg, don't modity IR > > > (Curro) > > > > > > Signed-off-by: Juan A. Suarez Romero > > > Signed-off-by: Samuel Iglesias Gonsálvez > > > --- > > > src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 76 > > > +++--- > > > 1 file changed, 45 insertions(+), 31 deletions(-) > > > > > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp > > > b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp > > > index 2f60ddd8706..e0a80d73b70 100644 > > > --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp > > > +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp > > > @@ -55,7 +55,7 @@ brw_file_from_reg(fs_reg *reg) > > > > > > static struct brw_reg > > > brw_reg_from_fs_reg(const struct gen_device_info *devinfo, > > > fs_inst > > > *inst, > > > -fs_reg *reg, bool compressed) > > > +fs_reg *reg, bool is_dst, bool compressed) > > > { > > > struct brw_reg brw_reg; > > > > > > @@ -94,34 +94,48 @@ brw_reg_from_fs_reg(const struct > > > gen_device_info *devinfo, fs_inst *inst, > > > const unsigned width = MIN2(reg_width, phys_width); > > > brw_reg = brw_vecn_reg(width, brw_file_from_reg(reg), > > > reg->nr, 0); > > > brw_reg = stride(brw_reg, width * reg->stride, width, > > > reg->stride); > > > - /* From the IvyBridge PRM (EU Changes by Processor > > > Generation, page 13): > > > - * "Each DF (Double Float) operand uses an element > > > size > > > of 4 rather > > > - * than 8 and all regioning parameters are twice what > > > the values > > > - * would be based on the true element size: ExecSize, > > > Width, > > > - * HorzStride, and VertStride. Each DF operand uses a > > > pair of > > > - * channels and all masking and swizzing should be > > > adjusted > > > - * appropriately." > > > - * > > > - * From the IvyBridge PRM (Special Requirements for > > > Handling Double > > > - * Precision Data Types, page 71): > > > - * "In Align1 mode, all regioning parameters like > > > stride, execution > > > - * size, and width must use the syntax of a pair of > > > packed > > > - * floats. The offsets for these data types must be > > > 64- > > > bit > > > - * aligned. The execution size and regioning > > > parameters > > > are in terms > > > - * of floats." > > > - * > > > - * Summarized: when handling DF-typed arguments, > > > ExecSize, > > > - * VertStride, and Width must be doubled, and > > > HorzStride > > > must be > > > - * doubled when the region is not scalar. > > > - * > > > - * It applies to BayTrail too. > > > - */ > > > - if (devinfo->gen == 7 && !devinfo->is_haswell && > > > - type_sz(reg->type) == 8) { > > > -brw_reg.width++; > > > -if (brw_reg.vstride > 0) > > > - brw_reg.vstride++; > > > -assert(brw_reg.hstride == BRW_HORIZONTAL_STRIDE_1); > > > + > > > + if (devinfo->gen == 7 && !devinfo->is_haswell) { > > > +/* From the IvyBridge PRM (EU Changes by Processor > > > Generation, page 13): > > > + * "Each DF (Double Float) operand uses an element > > > size of 4 rather > > > + * than 8 and all regioning parameters are twice > > > what the values > > > + * would be based on the true element size: > > > ExecSize, Width, > > > + * HorzStride, and VertStride. Each DF operand > > > uses > > > a pair of > > > + * channels and all masking and swizzing should be > > > adjusted > > > + * appropriately." > > > + * > > > + * From the IvyBridge PRM (Special Requirements for > > > Handling Double > > > + * Precision Data Types, page 71): > > > + * "In Align1 mode, all regioning parameters like > > > stride, execution > > > + * size, and width must use the syntax of a pair > > > of > >
Re: [Mesa-dev] [PATCH v3 08/24] i965/fs: fix dst stride in IVB/BYT type conversions
On Wed, 2017-02-15 at 12:08 -0800, Francisco Jerez wrote: > Samuel Iglesias Gonsálvezwrites: > > > From: "Juan A. Suarez Romero" > > > > When converting a DF to 32-bit conversions, we set dst stride to 2, > > to fulfill alignment restrictions because the upper Dword of every > > Qword will be written with undefined value. > > > > But in IVB/BYT, this is not necessary, as each DF conversion > > already > > writes 2, the first one the real value, and the second one a 0. > > That is, IVB/BYT already set stride = 2 implicitly, so we must set > > it to > > 1 explicitly to avoid ending up with stride = 4. > > > > v2: > > - Fix typo (Matt) > > > > v3: > > - Fix stride in the destination's brw_reg, don't modity IR (Curro) > > > > Signed-off-by: Juan A. Suarez Romero > > Signed-off-by: Samuel Iglesias Gonsálvez > > --- > > src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 76 > > +++--- > > 1 file changed, 45 insertions(+), 31 deletions(-) > > > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp > > b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp > > index 2f60ddd8706..e0a80d73b70 100644 > > --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp > > +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp > > @@ -55,7 +55,7 @@ brw_file_from_reg(fs_reg *reg) > > > > static struct brw_reg > > brw_reg_from_fs_reg(const struct gen_device_info *devinfo, fs_inst > > *inst, > > -fs_reg *reg, bool compressed) > > +fs_reg *reg, bool is_dst, bool compressed) > > { > > struct brw_reg brw_reg; > > > > @@ -94,34 +94,48 @@ brw_reg_from_fs_reg(const struct > > gen_device_info *devinfo, fs_inst *inst, > > const unsigned width = MIN2(reg_width, phys_width); > > brw_reg = brw_vecn_reg(width, brw_file_from_reg(reg), > > reg->nr, 0); > > brw_reg = stride(brw_reg, width * reg->stride, width, > > reg->stride); > > - /* From the IvyBridge PRM (EU Changes by Processor > > Generation, page 13): > > - * "Each DF (Double Float) operand uses an element size > > of 4 rather > > - * than 8 and all regioning parameters are twice what > > the values > > - * would be based on the true element size: ExecSize, > > Width, > > - * HorzStride, and VertStride. Each DF operand uses a > > pair of > > - * channels and all masking and swizzing should be > > adjusted > > - * appropriately." > > - * > > - * From the IvyBridge PRM (Special Requirements for > > Handling Double > > - * Precision Data Types, page 71): > > - * "In Align1 mode, all regioning parameters like > > stride, execution > > - * size, and width must use the syntax of a pair of > > packed > > - * floats. The offsets for these data types must be 64- > > bit > > - * aligned. The execution size and regioning parameters > > are in terms > > - * of floats." > > - * > > - * Summarized: when handling DF-typed arguments, > > ExecSize, > > - * VertStride, and Width must be doubled, and HorzStride > > must be > > - * doubled when the region is not scalar. > > - * > > - * It applies to BayTrail too. > > - */ > > - if (devinfo->gen == 7 && !devinfo->is_haswell && > > - type_sz(reg->type) == 8) { > > -brw_reg.width++; > > -if (brw_reg.vstride > 0) > > - brw_reg.vstride++; > > -assert(brw_reg.hstride == BRW_HORIZONTAL_STRIDE_1); > > + > > + if (devinfo->gen == 7 && !devinfo->is_haswell) { > > +/* From the IvyBridge PRM (EU Changes by Processor > > Generation, page 13): > > + * "Each DF (Double Float) operand uses an element > > size of 4 rather > > + * than 8 and all regioning parameters are twice > > what the values > > + * would be based on the true element size: > > ExecSize, Width, > > + * HorzStride, and VertStride. Each DF operand uses > > a pair of > > + * channels and all masking and swizzing should be > > adjusted > > + * appropriately." > > + * > > + * From the IvyBridge PRM (Special Requirements for > > Handling Double > > + * Precision Data Types, page 71): > > + * "In Align1 mode, all regioning parameters like > > stride, execution > > + * size, and width must use the syntax of a pair of > > packed > > + * floats. The offsets for these data types must be > > 64-bit > > + * aligned. The execution size and regioning > > parameters are in terms > > + * of floats." > > + * > > + * Summarized: when handling DF-typed arguments, > > ExecSize, > > + * VertStride, and Width must
Re: [Mesa-dev] [PATCH v3 08/24] i965/fs: fix dst stride in IVB/BYT type conversions
Samuel Iglesias Gonsálvezwrites: > From: "Juan A. Suarez Romero" > > When converting a DF to 32-bit conversions, we set dst stride to 2, > to fulfill alignment restrictions because the upper Dword of every > Qword will be written with undefined value. > > But in IVB/BYT, this is not necessary, as each DF conversion already > writes 2, the first one the real value, and the second one a 0. > That is, IVB/BYT already set stride = 2 implicitly, so we must set it to > 1 explicitly to avoid ending up with stride = 4. > > v2: > - Fix typo (Matt) > > v3: > - Fix stride in the destination's brw_reg, don't modity IR (Curro) > > Signed-off-by: Juan A. Suarez Romero > Signed-off-by: Samuel Iglesias Gonsálvez > --- > src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 76 > +++--- > 1 file changed, 45 insertions(+), 31 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp > b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp > index 2f60ddd8706..e0a80d73b70 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp > @@ -55,7 +55,7 @@ brw_file_from_reg(fs_reg *reg) > > static struct brw_reg > brw_reg_from_fs_reg(const struct gen_device_info *devinfo, fs_inst *inst, > -fs_reg *reg, bool compressed) > +fs_reg *reg, bool is_dst, bool compressed) > { > struct brw_reg brw_reg; > > @@ -94,34 +94,48 @@ brw_reg_from_fs_reg(const struct gen_device_info > *devinfo, fs_inst *inst, > const unsigned width = MIN2(reg_width, phys_width); > brw_reg = brw_vecn_reg(width, brw_file_from_reg(reg), reg->nr, 0); > brw_reg = stride(brw_reg, width * reg->stride, width, reg->stride); > - /* From the IvyBridge PRM (EU Changes by Processor Generation, page > 13): > - * "Each DF (Double Float) operand uses an element size of 4 rather > - * than 8 and all regioning parameters are twice what the values > - * would be based on the true element size: ExecSize, Width, > - * HorzStride, and VertStride. Each DF operand uses a pair of > - * channels and all masking and swizzing should be adjusted > - * appropriately." > - * > - * From the IvyBridge PRM (Special Requirements for Handling Double > - * Precision Data Types, page 71): > - * "In Align1 mode, all regioning parameters like stride, execution > - * size, and width must use the syntax of a pair of packed > - * floats. The offsets for these data types must be 64-bit > - * aligned. The execution size and regioning parameters are in > terms > - * of floats." > - * > - * Summarized: when handling DF-typed arguments, ExecSize, > - * VertStride, and Width must be doubled, and HorzStride must be > - * doubled when the region is not scalar. > - * > - * It applies to BayTrail too. > - */ > - if (devinfo->gen == 7 && !devinfo->is_haswell && > - type_sz(reg->type) == 8) { > -brw_reg.width++; > -if (brw_reg.vstride > 0) > - brw_reg.vstride++; > -assert(brw_reg.hstride == BRW_HORIZONTAL_STRIDE_1); > + > + if (devinfo->gen == 7 && !devinfo->is_haswell) { > +/* From the IvyBridge PRM (EU Changes by Processor Generation, > page 13): > + * "Each DF (Double Float) operand uses an element size of 4 > rather > + * than 8 and all regioning parameters are twice what the > values > + * would be based on the true element size: ExecSize, Width, > + * HorzStride, and VertStride. Each DF operand uses a pair of > + * channels and all masking and swizzing should be adjusted > + * appropriately." > + * > + * From the IvyBridge PRM (Special Requirements for Handling > Double > + * Precision Data Types, page 71): > + * "In Align1 mode, all regioning parameters like stride, > execution > + * size, and width must use the syntax of a pair of packed > + * floats. The offsets for these data types must be 64-bit > + * aligned. The execution size and regioning parameters are in > terms > + * of floats." > + * > + * Summarized: when handling DF-typed arguments, ExecSize, > + * VertStride, and Width must be doubled, and HorzStride must be > + * doubled when the region is not scalar. > + * > + * It applies to BayTrail too. > + */ > +if (type_sz(reg->type) == 8) { > + brw_reg.width++; > + if (brw_reg.vstride > 0) > + brw_reg.vstride++; > +
[Mesa-dev] [PATCH v3 08/24] i965/fs: fix dst stride in IVB/BYT type conversions
From: "Juan A. Suarez Romero"When converting a DF to 32-bit conversions, we set dst stride to 2, to fulfill alignment restrictions because the upper Dword of every Qword will be written with undefined value. But in IVB/BYT, this is not necessary, as each DF conversion already writes 2, the first one the real value, and the second one a 0. That is, IVB/BYT already set stride = 2 implicitly, so we must set it to 1 explicitly to avoid ending up with stride = 4. v2: - Fix typo (Matt) v3: - Fix stride in the destination's brw_reg, don't modity IR (Curro) Signed-off-by: Juan A. Suarez Romero Signed-off-by: Samuel Iglesias Gonsálvez --- src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 76 +++--- 1 file changed, 45 insertions(+), 31 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp index 2f60ddd8706..e0a80d73b70 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp @@ -55,7 +55,7 @@ brw_file_from_reg(fs_reg *reg) static struct brw_reg brw_reg_from_fs_reg(const struct gen_device_info *devinfo, fs_inst *inst, -fs_reg *reg, bool compressed) +fs_reg *reg, bool is_dst, bool compressed) { struct brw_reg brw_reg; @@ -94,34 +94,48 @@ brw_reg_from_fs_reg(const struct gen_device_info *devinfo, fs_inst *inst, const unsigned width = MIN2(reg_width, phys_width); brw_reg = brw_vecn_reg(width, brw_file_from_reg(reg), reg->nr, 0); brw_reg = stride(brw_reg, width * reg->stride, width, reg->stride); - /* From the IvyBridge PRM (EU Changes by Processor Generation, page 13): - * "Each DF (Double Float) operand uses an element size of 4 rather - * than 8 and all regioning parameters are twice what the values - * would be based on the true element size: ExecSize, Width, - * HorzStride, and VertStride. Each DF operand uses a pair of - * channels and all masking and swizzing should be adjusted - * appropriately." - * - * From the IvyBridge PRM (Special Requirements for Handling Double - * Precision Data Types, page 71): - * "In Align1 mode, all regioning parameters like stride, execution - * size, and width must use the syntax of a pair of packed - * floats. The offsets for these data types must be 64-bit - * aligned. The execution size and regioning parameters are in terms - * of floats." - * - * Summarized: when handling DF-typed arguments, ExecSize, - * VertStride, and Width must be doubled, and HorzStride must be - * doubled when the region is not scalar. - * - * It applies to BayTrail too. - */ - if (devinfo->gen == 7 && !devinfo->is_haswell && - type_sz(reg->type) == 8) { -brw_reg.width++; -if (brw_reg.vstride > 0) - brw_reg.vstride++; -assert(brw_reg.hstride == BRW_HORIZONTAL_STRIDE_1); + + if (devinfo->gen == 7 && !devinfo->is_haswell) { +/* From the IvyBridge PRM (EU Changes by Processor Generation, page 13): + * "Each DF (Double Float) operand uses an element size of 4 rather + * than 8 and all regioning parameters are twice what the values + * would be based on the true element size: ExecSize, Width, + * HorzStride, and VertStride. Each DF operand uses a pair of + * channels and all masking and swizzing should be adjusted + * appropriately." + * + * From the IvyBridge PRM (Special Requirements for Handling Double + * Precision Data Types, page 71): + * "In Align1 mode, all regioning parameters like stride, execution + * size, and width must use the syntax of a pair of packed + * floats. The offsets for these data types must be 64-bit + * aligned. The execution size and regioning parameters are in terms + * of floats." + * + * Summarized: when handling DF-typed arguments, ExecSize, + * VertStride, and Width must be doubled, and HorzStride must be + * doubled when the region is not scalar. + * + * It applies to BayTrail too. + */ +if (type_sz(reg->type) == 8) { + brw_reg.width++; + if (brw_reg.vstride > 0) + brw_reg.vstride++; + assert(brw_reg.hstride == BRW_HORIZONTAL_STRIDE_1); +} + +/* When converting from DF->F, we set destination's stride as 2 as an + * because each d2f conversion implicitly writes 2 F, + * being the first one