Re: [Mesa-dev] [PATCH 05/10] intel/fs: Respect CHV/BXT regioning restrictions in copy propagation pass.
On Mon, 2019-01-07 at 12:00 -0800, Francisco Jerez wrote: > Iago Toral writes: > > > On Sat, 2018-12-29 at 12:38 -0800, Francisco Jerez wrote: > > > Currently the visitor attempts to enforce the regioning > > > restrictions > > > that apply to double-precision instructions on CHV/BXT at NIR-to- > > > i965 > > > translation time. It is possible though for the copy propagation > > > pass > > > to violate this restriction if a strided move is propagated into > > > one > > > of the affected instructions. I've only reproduced this issue on > > > a > > > future platform but it could affect CHV/BXT too under the right > > > conditions. > > > > > > Cc: mesa-sta...@lists.freedesktop.org > > > --- > > > .../compiler/brw_fs_copy_propagation.cpp | 10 +++ > > > src/intel/compiler/brw_ir_fs.h| 28 > > > +++ > > > 2 files changed, 38 insertions(+) > > > > > > diff --git a/src/intel/compiler/brw_fs_copy_propagation.cpp > > > b/src/intel/compiler/brw_fs_copy_propagation.cpp > > > index a8ec1c34630..c23ce1ef426 100644 > > > --- a/src/intel/compiler/brw_fs_copy_propagation.cpp > > > +++ b/src/intel/compiler/brw_fs_copy_propagation.cpp > > > @@ -315,6 +315,16 @@ can_take_stride(fs_inst *inst, unsigned arg, > > > unsigned stride, > > > if (stride > 4) > > >return false; > > > > > > + /* Bail if the channels of the source need to be aligned to > > > the > > > byte offset > > > +* of the corresponding channel of the destination, and the > > > provided stride > > > +* would break this restriction. > > > +*/ > > > + if (has_dst_aligned_region_restriction(devinfo, inst) && > > > + !(type_sz(inst->src[arg].type) * stride == > > > + type_sz(inst->dst.type) * inst->dst.stride || > > > + stride == 0)) > > > + return false; > > > + > > > /* 3-source instructions can only be Align16, which restricts > > > what strides > > > * they can take. They can only take a stride of 1 (the usual > > > case), or 0 > > > * with a special "repctrl" bit. But the repctrl bit doesn't > > > work > > > for > > > diff --git a/src/intel/compiler/brw_ir_fs.h > > > b/src/intel/compiler/brw_ir_fs.h > > > index 07e7224e0f8..95b069a2e02 100644 > > > --- a/src/intel/compiler/brw_ir_fs.h > > > +++ b/src/intel/compiler/brw_ir_fs.h > > > @@ -486,4 +486,32 @@ get_exec_type_size(const fs_inst *inst) > > > return type_sz(get_exec_type(inst)); > > > } > > > > > > +/** > > > + * Return whether the following regioning restriction applies to > > > the > > > specified > > > + * instruction. From the Cherryview PRM Vol 7. "Register Region > > > + * Restrictions": > > > + * > > > + * "When source or destination datatype is 64b or operation is > > > integer DWord > > > + * multiply, regioning in Align1 must follow these rules: > > > + * > > > + * 1. Source and Destination horizontal stride must be aligned > > > to > > > the same qword. > > > + * 2. Regioning must ensure Src.Vstride = Src.Width * > > > Src.Hstride. > > > + * 3. Source and Destination offset must be the same, except > > > the > > > case of > > > + * scalar source." > > > + */ > > > +static inline bool > > > +has_dst_aligned_region_restriction(const gen_device_info > > > *devinfo, > > > + const fs_inst *inst) > > > +{ > > > + const brw_reg_type exec_type = get_exec_type(inst); > > > + const bool is_int_multiply = > > > !brw_reg_type_is_floating_point(exec_type) && > > > + (inst->opcode == BRW_OPCODE_MUL || inst->opcode == > > > BRW_OPCODE_MAD); > > > > Should this be extended to include MAC and MACH too? > > > > The documentation is unclear, but it doesn't look like that's the > case > according to the simulator, because those instructions don't do more > than a 16x16 or 32x16 bit integer multiply respectively. Ok, then it should be fine, thanks for looking into it. > > > + > > > + if (type_sz(inst->dst.type) > 4 || type_sz(exec_type) > 4 || > > > + (type_sz(exec_type) == 4 && is_int_multiply)) > > > + return devinfo->is_cherryview || > > > gen_device_info_is_9lp(devinfo); > > > > How about: > > > > if (devinfo->is_cherryview || gen_device_info_is_9lp(devinfo)) { > >... > > } else { > >return false; > > } > > > > since we only really need to do these checks in those platforms it > > might make a bit more sense to do it this way. > > > > Right now the difference is purely cosmetic, but in the future that > won't work for the platform this was designed for, I can send you > more > details off-list. That's fine, I was just looking at it from the point of view of the platforms this is required for at present. If writing it like this makes more sense to add future gens later on let's keep it as you wrote it. > > > + else > > > + return false; > > > +} > > > + > > > #endif ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org
Re: [Mesa-dev] [PATCH 05/10] intel/fs: Respect CHV/BXT regioning restrictions in copy propagation pass.
Iago Toral writes: > On Sat, 2018-12-29 at 12:38 -0800, Francisco Jerez wrote: >> Currently the visitor attempts to enforce the regioning restrictions >> that apply to double-precision instructions on CHV/BXT at NIR-to-i965 >> translation time. It is possible though for the copy propagation >> pass >> to violate this restriction if a strided move is propagated into one >> of the affected instructions. I've only reproduced this issue on a >> future platform but it could affect CHV/BXT too under the right >> conditions. >> >> Cc: mesa-sta...@lists.freedesktop.org >> --- >> .../compiler/brw_fs_copy_propagation.cpp | 10 +++ >> src/intel/compiler/brw_ir_fs.h| 28 >> +++ >> 2 files changed, 38 insertions(+) >> >> diff --git a/src/intel/compiler/brw_fs_copy_propagation.cpp >> b/src/intel/compiler/brw_fs_copy_propagation.cpp >> index a8ec1c34630..c23ce1ef426 100644 >> --- a/src/intel/compiler/brw_fs_copy_propagation.cpp >> +++ b/src/intel/compiler/brw_fs_copy_propagation.cpp >> @@ -315,6 +315,16 @@ can_take_stride(fs_inst *inst, unsigned arg, >> unsigned stride, >> if (stride > 4) >>return false; >> >> + /* Bail if the channels of the source need to be aligned to the >> byte offset >> +* of the corresponding channel of the destination, and the >> provided stride >> +* would break this restriction. >> +*/ >> + if (has_dst_aligned_region_restriction(devinfo, inst) && >> + !(type_sz(inst->src[arg].type) * stride == >> + type_sz(inst->dst.type) * inst->dst.stride || >> + stride == 0)) >> + return false; >> + >> /* 3-source instructions can only be Align16, which restricts >> what strides >> * they can take. They can only take a stride of 1 (the usual >> case), or 0 >> * with a special "repctrl" bit. But the repctrl bit doesn't work >> for >> diff --git a/src/intel/compiler/brw_ir_fs.h >> b/src/intel/compiler/brw_ir_fs.h >> index 07e7224e0f8..95b069a2e02 100644 >> --- a/src/intel/compiler/brw_ir_fs.h >> +++ b/src/intel/compiler/brw_ir_fs.h >> @@ -486,4 +486,32 @@ get_exec_type_size(const fs_inst *inst) >> return type_sz(get_exec_type(inst)); >> } >> >> +/** >> + * Return whether the following regioning restriction applies to the >> specified >> + * instruction. From the Cherryview PRM Vol 7. "Register Region >> + * Restrictions": >> + * >> + * "When source or destination datatype is 64b or operation is >> integer DWord >> + * multiply, regioning in Align1 must follow these rules: >> + * >> + * 1. Source and Destination horizontal stride must be aligned to >> the same qword. >> + * 2. Regioning must ensure Src.Vstride = Src.Width * Src.Hstride. >> + * 3. Source and Destination offset must be the same, except the >> case of >> + * scalar source." >> + */ >> +static inline bool >> +has_dst_aligned_region_restriction(const gen_device_info *devinfo, >> + const fs_inst *inst) >> +{ >> + const brw_reg_type exec_type = get_exec_type(inst); >> + const bool is_int_multiply = >> !brw_reg_type_is_floating_point(exec_type) && >> + (inst->opcode == BRW_OPCODE_MUL || inst->opcode == >> BRW_OPCODE_MAD); > > Should this be extended to include MAC and MACH too? > The documentation is unclear, but it doesn't look like that's the case according to the simulator, because those instructions don't do more than a 16x16 or 32x16 bit integer multiply respectively. >> + >> + if (type_sz(inst->dst.type) > 4 || type_sz(exec_type) > 4 || >> + (type_sz(exec_type) == 4 && is_int_multiply)) >> + return devinfo->is_cherryview || >> gen_device_info_is_9lp(devinfo); > > How about: > > if (devinfo->is_cherryview || gen_device_info_is_9lp(devinfo)) { >... > } else { >return false; > } > > since we only really need to do these checks in those platforms it > might make a bit more sense to do it this way. > Right now the difference is purely cosmetic, but in the future that won't work for the platform this was designed for, I can send you more details off-list. >> + else >> + return false; >> +} >> + >> #endif signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 05/10] intel/fs: Respect CHV/BXT regioning restrictions in copy propagation pass.
On Sat, 2018-12-29 at 12:38 -0800, Francisco Jerez wrote: > Currently the visitor attempts to enforce the regioning restrictions > that apply to double-precision instructions on CHV/BXT at NIR-to-i965 > translation time. It is possible though for the copy propagation > pass > to violate this restriction if a strided move is propagated into one > of the affected instructions. I've only reproduced this issue on a > future platform but it could affect CHV/BXT too under the right > conditions. > > Cc: mesa-sta...@lists.freedesktop.org > --- > .../compiler/brw_fs_copy_propagation.cpp | 10 +++ > src/intel/compiler/brw_ir_fs.h| 28 > +++ > 2 files changed, 38 insertions(+) > > diff --git a/src/intel/compiler/brw_fs_copy_propagation.cpp > b/src/intel/compiler/brw_fs_copy_propagation.cpp > index a8ec1c34630..c23ce1ef426 100644 > --- a/src/intel/compiler/brw_fs_copy_propagation.cpp > +++ b/src/intel/compiler/brw_fs_copy_propagation.cpp > @@ -315,6 +315,16 @@ can_take_stride(fs_inst *inst, unsigned arg, > unsigned stride, > if (stride > 4) >return false; > > + /* Bail if the channels of the source need to be aligned to the > byte offset > +* of the corresponding channel of the destination, and the > provided stride > +* would break this restriction. > +*/ > + if (has_dst_aligned_region_restriction(devinfo, inst) && > + !(type_sz(inst->src[arg].type) * stride == > + type_sz(inst->dst.type) * inst->dst.stride || > + stride == 0)) > + return false; > + > /* 3-source instructions can only be Align16, which restricts > what strides > * they can take. They can only take a stride of 1 (the usual > case), or 0 > * with a special "repctrl" bit. But the repctrl bit doesn't work > for > diff --git a/src/intel/compiler/brw_ir_fs.h > b/src/intel/compiler/brw_ir_fs.h > index 07e7224e0f8..95b069a2e02 100644 > --- a/src/intel/compiler/brw_ir_fs.h > +++ b/src/intel/compiler/brw_ir_fs.h > @@ -486,4 +486,32 @@ get_exec_type_size(const fs_inst *inst) > return type_sz(get_exec_type(inst)); > } > > +/** > + * Return whether the following regioning restriction applies to the > specified > + * instruction. From the Cherryview PRM Vol 7. "Register Region > + * Restrictions": > + * > + * "When source or destination datatype is 64b or operation is > integer DWord > + * multiply, regioning in Align1 must follow these rules: > + * > + * 1. Source and Destination horizontal stride must be aligned to > the same qword. > + * 2. Regioning must ensure Src.Vstride = Src.Width * Src.Hstride. > + * 3. Source and Destination offset must be the same, except the > case of > + * scalar source." > + */ > +static inline bool > +has_dst_aligned_region_restriction(const gen_device_info *devinfo, > + const fs_inst *inst) > +{ > + const brw_reg_type exec_type = get_exec_type(inst); > + const bool is_int_multiply = > !brw_reg_type_is_floating_point(exec_type) && > + (inst->opcode == BRW_OPCODE_MUL || inst->opcode == > BRW_OPCODE_MAD); Should this be extended to include MAC and MACH too? > + > + if (type_sz(inst->dst.type) > 4 || type_sz(exec_type) > 4 || > + (type_sz(exec_type) == 4 && is_int_multiply)) > + return devinfo->is_cherryview || > gen_device_info_is_9lp(devinfo); How about: if (devinfo->is_cherryview || gen_device_info_is_9lp(devinfo)) { ... } else { return false; } since we only really need to do these checks in those platforms it might make a bit more sense to do it this way. > + else > + return false; > +} > + > #endif ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 05/10] intel/fs: Respect CHV/BXT regioning restrictions in copy propagation pass.
Currently the visitor attempts to enforce the regioning restrictions that apply to double-precision instructions on CHV/BXT at NIR-to-i965 translation time. It is possible though for the copy propagation pass to violate this restriction if a strided move is propagated into one of the affected instructions. I've only reproduced this issue on a future platform but it could affect CHV/BXT too under the right conditions. Cc: mesa-sta...@lists.freedesktop.org --- .../compiler/brw_fs_copy_propagation.cpp | 10 +++ src/intel/compiler/brw_ir_fs.h| 28 +++ 2 files changed, 38 insertions(+) diff --git a/src/intel/compiler/brw_fs_copy_propagation.cpp b/src/intel/compiler/brw_fs_copy_propagation.cpp index a8ec1c34630..c23ce1ef426 100644 --- a/src/intel/compiler/brw_fs_copy_propagation.cpp +++ b/src/intel/compiler/brw_fs_copy_propagation.cpp @@ -315,6 +315,16 @@ can_take_stride(fs_inst *inst, unsigned arg, unsigned stride, if (stride > 4) return false; + /* Bail if the channels of the source need to be aligned to the byte offset +* of the corresponding channel of the destination, and the provided stride +* would break this restriction. +*/ + if (has_dst_aligned_region_restriction(devinfo, inst) && + !(type_sz(inst->src[arg].type) * stride == + type_sz(inst->dst.type) * inst->dst.stride || + stride == 0)) + return false; + /* 3-source instructions can only be Align16, which restricts what strides * they can take. They can only take a stride of 1 (the usual case), or 0 * with a special "repctrl" bit. But the repctrl bit doesn't work for diff --git a/src/intel/compiler/brw_ir_fs.h b/src/intel/compiler/brw_ir_fs.h index 07e7224e0f8..95b069a2e02 100644 --- a/src/intel/compiler/brw_ir_fs.h +++ b/src/intel/compiler/brw_ir_fs.h @@ -486,4 +486,32 @@ get_exec_type_size(const fs_inst *inst) return type_sz(get_exec_type(inst)); } +/** + * Return whether the following regioning restriction applies to the specified + * instruction. From the Cherryview PRM Vol 7. "Register Region + * Restrictions": + * + * "When source or destination datatype is 64b or operation is integer DWord + * multiply, regioning in Align1 must follow these rules: + * + * 1. Source and Destination horizontal stride must be aligned to the same qword. + * 2. Regioning must ensure Src.Vstride = Src.Width * Src.Hstride. + * 3. Source and Destination offset must be the same, except the case of + * scalar source." + */ +static inline bool +has_dst_aligned_region_restriction(const gen_device_info *devinfo, + const fs_inst *inst) +{ + const brw_reg_type exec_type = get_exec_type(inst); + const bool is_int_multiply = !brw_reg_type_is_floating_point(exec_type) && + (inst->opcode == BRW_OPCODE_MUL || inst->opcode == BRW_OPCODE_MAD); + + if (type_sz(inst->dst.type) > 4 || type_sz(exec_type) > 4 || + (type_sz(exec_type) == 4 && is_int_multiply)) + return devinfo->is_cherryview || gen_device_info_is_9lp(devinfo); + else + return false; +} + #endif -- 2.19.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev