Re: [Mesa-dev] [PATCH 18/22] i965/fs: Add infrastructure for generating CSEL instructions.
On Tue, Mar 6, 2018 at 11:28 AM, Ian Romanick wrote: > On 03/06/2018 11:24 AM, Matt Turner wrote: >> On Tue, Mar 6, 2018 at 11:22 AM, Ian Romanick wrote: >>> On 03/05/2018 04:40 PM, Matt Turner wrote: On Fri, Feb 23, 2018 at 3:56 PM, Ian Romanick wrote: > From: Kenneth Graunke > > v2 (idr): Don't allow CSEL with a non-float src2. > > v3 (idr): Add CSEL to fs_inst::flags_written. Suggested by Matt. brw_disassemble_inst fs_visitor::dump_instruction vec4_instruction:writes_flag vec4_visitor::dump_instruction >>> >>> Do we need to update vec4 code? This instruction is BDW+, and we don't >>> use the vec4 backend for anything on those platforms... and this >>> optimization is FS-only. >> >> Not strictly necessary, but those conditions are checking the same >> thing so I'd keep them in sync. > > Ok. How can I test various dump_instruction and brw_disassemble_inst paths? brw_disassemble_inst() is called when we use INTEL_DEBUG=fs,vs,gs,etc to disassemble individual instructions. INTEL_DEBUG=... on a shader with csel without the change to brw_disassemble_inst should print csel.cmod.f0, and with the change should just print csel.cmod. dump_instructions() prints the backend IR and is used for debugging optimization passes. INTEL_DEBUG=optimizer will write out text files after each optimization pass that makes progress. The change will prevent the .f0 from being printed, like in brw_disassemble_inst(). We don't want to print .f0 because it doesn't write (or even use!) .f0. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 18/22] i965/fs: Add infrastructure for generating CSEL instructions.
On 03/06/2018 11:24 AM, Matt Turner wrote: > On Tue, Mar 6, 2018 at 11:22 AM, Ian Romanick wrote: >> On 03/05/2018 04:40 PM, Matt Turner wrote: >>> On Fri, Feb 23, 2018 at 3:56 PM, Ian Romanick wrote: From: Kenneth Graunke v2 (idr): Don't allow CSEL with a non-float src2. v3 (idr): Add CSEL to fs_inst::flags_written. Suggested by Matt. >>> >>> brw_disassemble_inst >>> fs_visitor::dump_instruction >>> vec4_instruction:writes_flag >>> vec4_visitor::dump_instruction >> >> Do we need to update vec4 code? This instruction is BDW+, and we don't >> use the vec4 backend for anything on those platforms... and this >> optimization is FS-only. > > Not strictly necessary, but those conditions are checking the same > thing so I'd keep them in sync. Ok. How can I test various dump_instruction and brw_disassemble_inst paths? ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 18/22] i965/fs: Add infrastructure for generating CSEL instructions.
On Tue, Mar 6, 2018 at 11:22 AM, Ian Romanick wrote: > On 03/05/2018 04:40 PM, Matt Turner wrote: >> On Fri, Feb 23, 2018 at 3:56 PM, Ian Romanick wrote: >>> From: Kenneth Graunke >>> >>> v2 (idr): Don't allow CSEL with a non-float src2. >>> >>> v3 (idr): Add CSEL to fs_inst::flags_written. Suggested by Matt. >> >> brw_disassemble_inst >> fs_visitor::dump_instruction >> vec4_instruction:writes_flag >> vec4_visitor::dump_instruction > > Do we need to update vec4 code? This instruction is BDW+, and we don't > use the vec4 backend for anything on those platforms... and this > optimization is FS-only. Not strictly necessary, but those conditions are checking the same thing so I'd keep them in sync. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 18/22] i965/fs: Add infrastructure for generating CSEL instructions.
On 03/05/2018 04:40 PM, Matt Turner wrote: > On Fri, Feb 23, 2018 at 3:56 PM, Ian Romanick wrote: >> From: Kenneth Graunke >> >> v2 (idr): Don't allow CSEL with a non-float src2. >> >> v3 (idr): Add CSEL to fs_inst::flags_written. Suggested by Matt. > > brw_disassemble_inst > fs_visitor::dump_instruction > vec4_instruction:writes_flag > vec4_visitor::dump_instruction Do we need to update vec4 code? This instruction is BDW+, and we don't use the vec4 backend for anything on those platforms... and this optimization is FS-only. > should all be updated similarly. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 18/22] i965/fs: Add infrastructure for generating CSEL instructions.
On Fri, Feb 23, 2018 at 3:56 PM, Ian Romanick wrote: > From: Kenneth Graunke > > v2 (idr): Don't allow CSEL with a non-float src2. > > v3 (idr): Add CSEL to fs_inst::flags_written. Suggested by Matt. brw_disassemble_inst fs_visitor::dump_instruction vec4_instruction:writes_flag vec4_visitor::dump_instruction should all be updated similarly. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 18/22] i965/fs: Add infrastructure for generating CSEL instructions.
On Fri, Feb 23, 2018 at 3:56 PM, Ian Romanick wrote: > From: Kenneth Graunke > > v2 (idr): Don't allow CSEL with a non-float src2. > > v3 (idr): Add CSEL to fs_inst::flags_written. Suggested by Matt. > > Signed-off-by: Kenneth Graunke > Signed-off-by: Ian Romanick > --- > src/intel/compiler/brw_eu.h | 1 + > src/intel/compiler/brw_eu_emit.c| 1 + > src/intel/compiler/brw_fs.cpp | 1 + > src/intel/compiler/brw_fs_builder.h | 22 +- > src/intel/compiler/brw_fs_generator.cpp | 5 + > 5 files changed, 29 insertions(+), 1 deletion(-) > > diff --git a/src/intel/compiler/brw_eu.h b/src/intel/compiler/brw_eu.h > index 2d0f56f..3201624 100644 > --- a/src/intel/compiler/brw_eu.h > +++ b/src/intel/compiler/brw_eu.h > @@ -171,6 +171,7 @@ ALU2(SHR) > ALU2(SHL) > ALU1(DIM) > ALU2(ASR) > +ALU3(CSEL) > ALU1(F32TO16) > ALU1(F16TO32) > ALU2(ADD) > diff --git a/src/intel/compiler/brw_eu_emit.c > b/src/intel/compiler/brw_eu_emit.c > index c25d8d6..305a759 100644 > --- a/src/intel/compiler/brw_eu_emit.c > +++ b/src/intel/compiler/brw_eu_emit.c > @@ -953,6 +953,7 @@ ALU2(SHR) > ALU2(SHL) > ALU1(DIM) > ALU2(ASR) > +ALU3(CSEL) > ALU1(FRC) > ALU1(RNDD) > ALU2(MAC) > diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp > index bed632d..accae1b 100644 > --- a/src/intel/compiler/brw_fs.cpp > +++ b/src/intel/compiler/brw_fs.cpp > @@ -929,6 +929,7 @@ unsigned > fs_inst::flags_written() const > { > if ((conditional_mod && (opcode != BRW_OPCODE_SEL && > +opcode != BRW_OPCODE_CSEL && > opcode != BRW_OPCODE_IF && > opcode != BRW_OPCODE_WHILE)) || > opcode == FS_OPCODE_MOV_DISPATCH_TO_FLAGS) { > diff --git a/src/intel/compiler/brw_fs_builder.h > b/src/intel/compiler/brw_fs_builder.h > index 87394bc..47eac42 100644 > --- a/src/intel/compiler/brw_fs_builder.h > +++ b/src/intel/compiler/brw_fs_builder.h > @@ -458,7 +458,6 @@ namespace brw { >ALU1(BFREV) >ALU1(CBIT) >ALU2(CMPN) > - ALU3(CSEL) >ALU1(DIM) >ALU2(DP2) >ALU2(DP3) > @@ -534,6 +533,27 @@ namespace brw { >} > >/** > + * CSEL: dst = src2 0.0f ? src0 : src1 > + */ > + instruction * > + CSEL(const dst_reg &dst, const src_reg &src0, const src_reg &src1, > + const src_reg &src2, brw_conditional_mod condition) const > + { > + /* CSEL only operates on floats, so we can't do integer =/> > + * comparisons. Zero/non-zero (== and !=) comparisons almost work. > + * 0x8000 fails because it is -0.0, and -0.0 == 0.0. > + */ > + assert(src2.type == BRW_REGISTER_TYPE_F); > + > + return set_condmod(condition, > +emit(BRW_OPCODE_CSEL, > + retype(dst, BRW_REGISTER_TYPE_F), > + retype(src0, BRW_REGISTER_TYPE_F), > + retype(src1, BRW_REGISTER_TYPE_F), > + src2)); > + } > + > + /** > * Emit a linear interpolation instruction. > */ >instruction * > diff --git a/src/intel/compiler/brw_fs_generator.cpp > b/src/intel/compiler/brw_fs_generator.cpp > index cd5be05..9157367 100644 > --- a/src/intel/compiler/brw_fs_generator.cpp > +++ b/src/intel/compiler/brw_fs_generator.cpp > @@ -1826,6 +1826,11 @@ fs_generator::generate_code(const cfg_t *cfg, int > dispatch_width) >case BRW_OPCODE_SEL: > brw_SEL(p, dst, src[0], src[1]); > break; > + case BRW_OPCODE_CSEL: > + brw_set_default_access_mode(p, BRW_ALIGN_16); > + brw_CSEL(p, dst, src[0], src[1], src[2]); > + brw_set_default_access_mode(p, BRW_ALIGN_1); > + break; This is not okay, actually. Gen11 gets rid of Align16 mode, and even on Gen10 I've switched all 3-src instructions to use align1 mode. Just make it work like MAD/LRP/etc and it'll be fine. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 18/22] i965/fs: Add infrastructure for generating CSEL instructions.
On Monday, March 5, 2018 11:17:29 AM PST Ian Romanick wrote: > On 02/28/2018 12:58 AM, Samuel Iglesias Gonsálvez wrote: > > On 24/02/18 00:56, Ian Romanick wrote: > >> From: Kenneth Graunke > >> > >> v2 (idr): Don't allow CSEL with a non-float src2. > >> > >> v3 (idr): Add CSEL to fs_inst::flags_written. Suggested by Matt. > >> > >> Signed-off-by: Kenneth Graunke > >> Signed-off-by: Ian Romanick > >> --- > >> src/intel/compiler/brw_eu.h | 1 + > >> src/intel/compiler/brw_eu_emit.c| 1 + > >> src/intel/compiler/brw_fs.cpp | 1 + > >> src/intel/compiler/brw_fs_builder.h | 22 +- > >> src/intel/compiler/brw_fs_generator.cpp | 5 + > >> 5 files changed, 29 insertions(+), 1 deletion(-) > >> > >> diff --git a/src/intel/compiler/brw_fs_generator.cpp > >> b/src/intel/compiler/brw_fs_generator.cpp > >> index cd5be05..9157367 100644 > >> --- a/src/intel/compiler/brw_fs_generator.cpp > >> +++ b/src/intel/compiler/brw_fs_generator.cpp > >> @@ -1826,6 +1826,11 @@ fs_generator::generate_code(const cfg_t *cfg, int > >> dispatch_width) > >>case BRW_OPCODE_SEL: > >> brw_SEL(p, dst, src[0], src[1]); > >> break; > >> + case BRW_OPCODE_CSEL: > >> + brw_set_default_access_mode(p, BRW_ALIGN_16); > >> + brw_CSEL(p, dst, src[0], src[1], src[2]); > >> + brw_set_default_access_mode(p, BRW_ALIGN_1); > > > > Setting the access mode to ALIGN_1 is not needed, right? It will > > continue with the next instruction in the loop that sets ALIGN1 at the > > beginning of the loop. > > This was in Ken's original patch, so I'm not 100% sure. I've seen this > pattern around generating some other instructions, so my gut tells me > that it's done like this to be explicit and more defensive. Yeah, I don't think it's strictly necessary, but it's probably a good idea regardless, just to be defensive. Most instructions should be align1, and we just want to emit this one instruction in align16. A more conservative approach still would be to push/set_default/pop. I think when I wrote this, I was planning on eliminating push/pop, but that never happened (and probably won't at this point). I think it's fine as is, or I'd be fine with push/set/pop. signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 18/22] i965/fs: Add infrastructure for generating CSEL instructions.
On 02/28/2018 12:58 AM, Samuel Iglesias Gonsálvez wrote: > On 24/02/18 00:56, Ian Romanick wrote: >> From: Kenneth Graunke >> >> v2 (idr): Don't allow CSEL with a non-float src2. >> >> v3 (idr): Add CSEL to fs_inst::flags_written. Suggested by Matt. >> >> Signed-off-by: Kenneth Graunke >> Signed-off-by: Ian Romanick >> --- >> src/intel/compiler/brw_eu.h | 1 + >> src/intel/compiler/brw_eu_emit.c| 1 + >> src/intel/compiler/brw_fs.cpp | 1 + >> src/intel/compiler/brw_fs_builder.h | 22 +- >> src/intel/compiler/brw_fs_generator.cpp | 5 + >> 5 files changed, 29 insertions(+), 1 deletion(-) >> >> diff --git a/src/intel/compiler/brw_eu.h b/src/intel/compiler/brw_eu.h >> index 2d0f56f..3201624 100644 >> --- a/src/intel/compiler/brw_eu.h >> +++ b/src/intel/compiler/brw_eu.h >> @@ -171,6 +171,7 @@ ALU2(SHR) >> ALU2(SHL) >> ALU1(DIM) >> ALU2(ASR) >> +ALU3(CSEL) >> ALU1(F32TO16) >> ALU1(F16TO32) >> ALU2(ADD) >> diff --git a/src/intel/compiler/brw_eu_emit.c >> b/src/intel/compiler/brw_eu_emit.c >> index c25d8d6..305a759 100644 >> --- a/src/intel/compiler/brw_eu_emit.c >> +++ b/src/intel/compiler/brw_eu_emit.c >> @@ -953,6 +953,7 @@ ALU2(SHR) >> ALU2(SHL) >> ALU1(DIM) >> ALU2(ASR) >> +ALU3(CSEL) >> ALU1(FRC) >> ALU1(RNDD) >> ALU2(MAC) >> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp >> index bed632d..accae1b 100644 >> --- a/src/intel/compiler/brw_fs.cpp >> +++ b/src/intel/compiler/brw_fs.cpp >> @@ -929,6 +929,7 @@ unsigned >> fs_inst::flags_written() const >> { >> if ((conditional_mod && (opcode != BRW_OPCODE_SEL && >> +opcode != BRW_OPCODE_CSEL && >> opcode != BRW_OPCODE_IF && >> opcode != BRW_OPCODE_WHILE)) || >> opcode == FS_OPCODE_MOV_DISPATCH_TO_FLAGS) { >> diff --git a/src/intel/compiler/brw_fs_builder.h >> b/src/intel/compiler/brw_fs_builder.h >> index 87394bc..47eac42 100644 >> --- a/src/intel/compiler/brw_fs_builder.h >> +++ b/src/intel/compiler/brw_fs_builder.h >> @@ -458,7 +458,6 @@ namespace brw { >>ALU1(BFREV) >>ALU1(CBIT) >>ALU2(CMPN) >> - ALU3(CSEL) >>ALU1(DIM) >>ALU2(DP2) >>ALU2(DP3) >> @@ -534,6 +533,27 @@ namespace brw { >>} >> >>/** >> + * CSEL: dst = src2 0.0f ? src0 : src1 >> + */ >> + instruction * >> + CSEL(const dst_reg &dst, const src_reg &src0, const src_reg &src1, >> + const src_reg &src2, brw_conditional_mod condition) const >> + { >> + /* CSEL only operates on floats, so we can't do integer =/> >> + * comparisons. Zero/non-zero (== and !=) comparisons almost work. >> + * 0x8000 fails because it is -0.0, and -0.0 == 0.0. >> + */ >> + assert(src2.type == BRW_REGISTER_TYPE_F); >> + >> + return set_condmod(condition, >> +emit(BRW_OPCODE_CSEL, >> + retype(dst, BRW_REGISTER_TYPE_F), >> + retype(src0, BRW_REGISTER_TYPE_F), >> + retype(src1, BRW_REGISTER_TYPE_F), >> + src2)); >> + } >> + >> + /** >> * Emit a linear interpolation instruction. >> */ >>instruction * >> diff --git a/src/intel/compiler/brw_fs_generator.cpp >> b/src/intel/compiler/brw_fs_generator.cpp >> index cd5be05..9157367 100644 >> --- a/src/intel/compiler/brw_fs_generator.cpp >> +++ b/src/intel/compiler/brw_fs_generator.cpp >> @@ -1826,6 +1826,11 @@ fs_generator::generate_code(const cfg_t *cfg, int >> dispatch_width) >>case BRW_OPCODE_SEL: >> brw_SEL(p, dst, src[0], src[1]); >> break; >> + case BRW_OPCODE_CSEL: >> + brw_set_default_access_mode(p, BRW_ALIGN_16); >> + brw_CSEL(p, dst, src[0], src[1], src[2]); >> + brw_set_default_access_mode(p, BRW_ALIGN_1); > > Setting the access mode to ALIGN_1 is not needed, right? It will > continue with the next instruction in the loop that sets ALIGN1 at the > beginning of the loop. This was in Ken's original patch, so I'm not 100% sure. I've seen this pattern around generating some other instructions, so my gut tells me that it's done like this to be explicit and more defensive. > In any case, > > Reviewed-by: Samuel Iglesias Gonsálvez > > Sam > > >> + break; >>case BRW_OPCODE_BFREV: >> assert(devinfo->gen >= 7); >> brw_BFREV(p, retype(dst, BRW_REGISTER_TYPE_UD), > signature.asc Description: OpenPGP digital signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 18/22] i965/fs: Add infrastructure for generating CSEL instructions.
On 24/02/18 00:56, Ian Romanick wrote: > From: Kenneth Graunke > > v2 (idr): Don't allow CSEL with a non-float src2. > > v3 (idr): Add CSEL to fs_inst::flags_written. Suggested by Matt. > > Signed-off-by: Kenneth Graunke > Signed-off-by: Ian Romanick > --- > src/intel/compiler/brw_eu.h | 1 + > src/intel/compiler/brw_eu_emit.c| 1 + > src/intel/compiler/brw_fs.cpp | 1 + > src/intel/compiler/brw_fs_builder.h | 22 +- > src/intel/compiler/brw_fs_generator.cpp | 5 + > 5 files changed, 29 insertions(+), 1 deletion(-) > > diff --git a/src/intel/compiler/brw_eu.h b/src/intel/compiler/brw_eu.h > index 2d0f56f..3201624 100644 > --- a/src/intel/compiler/brw_eu.h > +++ b/src/intel/compiler/brw_eu.h > @@ -171,6 +171,7 @@ ALU2(SHR) > ALU2(SHL) > ALU1(DIM) > ALU2(ASR) > +ALU3(CSEL) > ALU1(F32TO16) > ALU1(F16TO32) > ALU2(ADD) > diff --git a/src/intel/compiler/brw_eu_emit.c > b/src/intel/compiler/brw_eu_emit.c > index c25d8d6..305a759 100644 > --- a/src/intel/compiler/brw_eu_emit.c > +++ b/src/intel/compiler/brw_eu_emit.c > @@ -953,6 +953,7 @@ ALU2(SHR) > ALU2(SHL) > ALU1(DIM) > ALU2(ASR) > +ALU3(CSEL) > ALU1(FRC) > ALU1(RNDD) > ALU2(MAC) > diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp > index bed632d..accae1b 100644 > --- a/src/intel/compiler/brw_fs.cpp > +++ b/src/intel/compiler/brw_fs.cpp > @@ -929,6 +929,7 @@ unsigned > fs_inst::flags_written() const > { > if ((conditional_mod && (opcode != BRW_OPCODE_SEL && > +opcode != BRW_OPCODE_CSEL && > opcode != BRW_OPCODE_IF && > opcode != BRW_OPCODE_WHILE)) || > opcode == FS_OPCODE_MOV_DISPATCH_TO_FLAGS) { > diff --git a/src/intel/compiler/brw_fs_builder.h > b/src/intel/compiler/brw_fs_builder.h > index 87394bc..47eac42 100644 > --- a/src/intel/compiler/brw_fs_builder.h > +++ b/src/intel/compiler/brw_fs_builder.h > @@ -458,7 +458,6 @@ namespace brw { >ALU1(BFREV) >ALU1(CBIT) >ALU2(CMPN) > - ALU3(CSEL) >ALU1(DIM) >ALU2(DP2) >ALU2(DP3) > @@ -534,6 +533,27 @@ namespace brw { >} > >/** > + * CSEL: dst = src2 0.0f ? src0 : src1 > + */ > + instruction * > + CSEL(const dst_reg &dst, const src_reg &src0, const src_reg &src1, > + const src_reg &src2, brw_conditional_mod condition) const > + { > + /* CSEL only operates on floats, so we can't do integer =/> > + * comparisons. Zero/non-zero (== and !=) comparisons almost work. > + * 0x8000 fails because it is -0.0, and -0.0 == 0.0. > + */ > + assert(src2.type == BRW_REGISTER_TYPE_F); > + > + return set_condmod(condition, > +emit(BRW_OPCODE_CSEL, > + retype(dst, BRW_REGISTER_TYPE_F), > + retype(src0, BRW_REGISTER_TYPE_F), > + retype(src1, BRW_REGISTER_TYPE_F), > + src2)); > + } > + > + /** > * Emit a linear interpolation instruction. > */ >instruction * > diff --git a/src/intel/compiler/brw_fs_generator.cpp > b/src/intel/compiler/brw_fs_generator.cpp > index cd5be05..9157367 100644 > --- a/src/intel/compiler/brw_fs_generator.cpp > +++ b/src/intel/compiler/brw_fs_generator.cpp > @@ -1826,6 +1826,11 @@ fs_generator::generate_code(const cfg_t *cfg, int > dispatch_width) >case BRW_OPCODE_SEL: >brw_SEL(p, dst, src[0], src[1]); >break; > + case BRW_OPCODE_CSEL: > + brw_set_default_access_mode(p, BRW_ALIGN_16); > + brw_CSEL(p, dst, src[0], src[1], src[2]); > + brw_set_default_access_mode(p, BRW_ALIGN_1); Setting the access mode to ALIGN_1 is not needed, right? It will continue with the next instruction in the loop that sets ALIGN1 at the beginning of the loop. In any case, Reviewed-by: Samuel Iglesias Gonsálvez Sam > + break; >case BRW_OPCODE_BFREV: > assert(devinfo->gen >= 7); > brw_BFREV(p, retype(dst, BRW_REGISTER_TYPE_UD), signature.asc Description: OpenPGP digital signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev