Re: [Mesa-dev] [PATCH 1/2] intel/fs: New method for register_byte_use_pattern for fs_inst
El 29/07/18 a las 19:47, Chema Casanova escribió: > El 28/07/18 a las 01:45, Francisco Jerez escribió: >> Chema Casanova writes: [...] > If we have a partial write/read: > > I understood that you my initial patter proposal would only be ok for > the first GRF of src[i]/dst (reg_offset == 0) > > periodic_mask(this->exec_size, /* count */ >this->src[i].stride * type_sz(this->src[i].type), /*step */ >type_sz(this->src[i].type), /* bits */ >this->src[i].offset % REG_SIZE); /* offset */ > > In the case we manage only reg_offset == 0 we get a huge improvement > reducing all problems many of the register_pressure we have now on all > SIMD8 shaders with 8/16bits test cases. > > I understood that you didn't agree that for cases where src/destination > use more than 1 GRF (reg_offset == 1) we can not guarantee that we can > apply the same internal offset (this->src[i].offset % REG_SIZE) as the > base register to calculate a patter. So It would be better to return ~0u > on reads or 0u in writes. > >>> Yes, but you could easily determine whether the mask is going to be invariant with respect to reg_offset (where reg_offset is within bounds) and in that case return the periodic_mask() expression above, otherwise return 0/~0u depending on whether reg_offset is within bounds. >>> >>> Ok, so we are within bounds, we don't have a predicated write, we are >>> not a send message. Then we have an ALU opcode and we return the >>> periodic_mask. >>> >> >> Those are all necessary but not sufficient conditions for the >> periodic_mask() expression above to give you the correct answer for any >> in-bounds reg_offset > 0, you should check that byte_offset < type_size >> * stride in addition. > > That's true. Fixed in v5. > > If we don't satisfy the condition then we return 0 on writes and ~0u on > reads. Could you have a look at the v5 to check if I can count with your R-b ? https://patchwork.freedesktop.org/patch/241482/ I suppose you didn't have time to have a look at the other patch of the series. "[v2,2/2] intel/fs: Improve liveness range calculation for partial writes" https://patchwork.freedesktop.org/patch/239839/ Thanks in advance, Chema ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] intel/fs: New method for register_byte_use_pattern for fs_inst
El 28/07/18 a las 01:45, Francisco Jerez escribió: > Chema Casanova writes: > >> El 27/07/18 a las 02:44, Francisco Jerez escribió: >>> Chema Casanova writes: >>> El 26/07/18 a las 20:02, Francisco Jerez escribió: > Chema Casanova writes: > >> El 20/07/18 a las 22:10, Francisco Jerez escribió: >>> Chema Casanova writes: >>> El 20/07/18 a las 00:34, Francisco Jerez escribió: > Chema Casanova writes: > >> El 14/07/18 a las 00:14, Francisco Jerez escribió: >>> Jose Maria Casanova Crespo writes: >>> For a register source/destination of an instruction the function returns the read/write byte pattern of a 32-byte registers as a unsigned int. The returned pattern takes into account the exec_size of the instruction, the type bitsize, the stride and if the register is source or destination. The objective of the functions if to help to know the read/written bytes of the instructions to improve the liveness analysis for partial read/writes. We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL and SHADER_OPCODE_BYTE_SCATTERED_WRITE because depending of the bitsize parameter they have a different read pattern. --- src/intel/compiler/brw_fs.cpp | 183 + src/intel/compiler/brw_ir_fs.h | 1 + 2 files changed, 184 insertions(+) diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp index 2b8363ca362..f3045c4ff6c 100644 --- a/src/intel/compiler/brw_fs.cpp +++ b/src/intel/compiler/brw_fs.cpp @@ -687,6 +687,189 @@ fs_inst::is_partial_write() const this->dst.offset % REG_SIZE != 0); } +/** + * Returns a 32-bit uint whose bits represent if the associated register byte + * has been read/written by the instruction. The returned pattern takes into + * account the exec_size of the instruction, the type bitsize and the register + * stride and the register is source or destination for the instruction. + * + * The objective of this function is to identify which parts of the register + * are read or written for operations that don't read/write a full register. + * So we can identify in live range variable analysis if a partial write has + * completelly defined the part of the register used by a partial read. So we + * avoid extending the liveness range because all data read was already + * defined although the wasn't completely written. + */ +unsigned +fs_inst::register_byte_use_pattern(const fs_reg , boolean is_dst) const +{ + if (is_dst) { >> >>> Please split into two functions (like fs_inst::src_read and >>> ::src_written) since that would make the call-sites of this method >>> more >>> self-documenting than a boolean parameter. You should be able to >>> share >>> code by refactoring the common logic into a separate function (see >>> below >>> for some suggestions on how that could be achieved). >> >> Sure, it would improve readability and simplifies the logic, I've >> chosen >> dst_write_pattern and src_read_pattern. >> >>> + /* We don't know what is written so we return the worts case */ >>> >>> "worst" >> >> Fixed. >> + if (this->predicate && this->opcode != BRW_OPCODE_SEL) + return 0; + /* We assume that send destinations are completely written */ + if (this->is_send_from_grf()) + return ~0u; >>> >>> Some send-like instructions won't be caught by this condition, you >>> should check for this->mlen != 0 in addition. >> >> Would it be enough to check for (this->mlen > 0) and forget about >> is_send_from_grf? I am using this approach in v2 I am sending. >> > > I don't think the mlen > 0 condition would catch all cases either... > E.g. FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD IIRC. You probably need > both > conditions. Sucks... That is
Re: [Mesa-dev] [PATCH 1/2] intel/fs: New method for register_byte_use_pattern for fs_inst
Chema Casanova writes: > El 27/07/18 a las 02:44, Francisco Jerez escribió: >> Chema Casanova writes: >> >>> El 26/07/18 a las 20:02, Francisco Jerez escribió: Chema Casanova writes: > El 20/07/18 a las 22:10, Francisco Jerez escribió: >> Chema Casanova writes: >> >>> El 20/07/18 a las 00:34, Francisco Jerez escribió: Chema Casanova writes: > El 14/07/18 a las 00:14, Francisco Jerez escribió: >> Jose Maria Casanova Crespo writes: >> >>> For a register source/destination of an instruction the function >>> returns >>> the read/write byte pattern of a 32-byte registers as a unsigned >>> int. >>> >>> The returned pattern takes into account the exec_size of the >>> instruction, >>> the type bitsize, the stride and if the register is source or >>> destination. >>> >>> The objective of the functions if to help to know the read/written >>> bytes >>> of the instructions to improve the liveness analysis for partial >>> read/writes. >>> >>> We manage special cases for >>> SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL >>> and SHADER_OPCODE_BYTE_SCATTERED_WRITE because depending of the >>> bitsize >>> parameter they have a different read pattern. >>> --- >>> src/intel/compiler/brw_fs.cpp | 183 >>> + >>> src/intel/compiler/brw_ir_fs.h | 1 + >>> 2 files changed, 184 insertions(+) >>> >>> diff --git a/src/intel/compiler/brw_fs.cpp >>> b/src/intel/compiler/brw_fs.cpp >>> index 2b8363ca362..f3045c4ff6c 100644 >>> --- a/src/intel/compiler/brw_fs.cpp >>> +++ b/src/intel/compiler/brw_fs.cpp >>> @@ -687,6 +687,189 @@ fs_inst::is_partial_write() const >>> this->dst.offset % REG_SIZE != 0); >>> } >>> >>> +/** >>> + * Returns a 32-bit uint whose bits represent if the associated >>> register byte >>> + * has been read/written by the instruction. The returned pattern >>> takes into >>> + * account the exec_size of the instruction, the type bitsize and >>> the register >>> + * stride and the register is source or destination for the >>> instruction. >>> + * >>> + * The objective of this function is to identify which parts of >>> the register >>> + * are read or written for operations that don't read/write a full >>> register. >>> + * So we can identify in live range variable analysis if a partial >>> write has >>> + * completelly defined the part of the register used by a partial >>> read. So we >>> + * avoid extending the liveness range because all data read was >>> already >>> + * defined although the wasn't completely written. >>> + */ >>> +unsigned >>> +fs_inst::register_byte_use_pattern(const fs_reg , boolean >>> is_dst) const >>> +{ >>> + if (is_dst) { > >> Please split into two functions (like fs_inst::src_read and >> ::src_written) since that would make the call-sites of this method >> more >> self-documenting than a boolean parameter. You should be able to >> share >> code by refactoring the common logic into a separate function (see >> below >> for some suggestions on how that could be achieved). > > Sure, it would improve readability and simplifies the logic, I've > chosen > dst_write_pattern and src_read_pattern. > >> >>> + /* We don't know what is written so we return the worts case >>> */ >> >> "worst" > > Fixed. > >>> + if (this->predicate && this->opcode != BRW_OPCODE_SEL) >>> + return 0; >>> + /* We assume that send destinations are completely written */ >>> + if (this->is_send_from_grf()) >>> + return ~0u; >> >> Some send-like instructions won't be caught by this condition, you >> should check for this->mlen != 0 in addition. > > Would it be enough to check for (this->mlen > 0) and forget about > is_send_from_grf? I am using this approach in v2 I am sending. > I don't think the mlen > 0 condition would catch all cases either... E.g. FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD IIRC. You probably need both conditions. Sucks... >>> >>> That is true, so now we have the: >>> (this->is_send_from_grf() || this->mlen != 0) >>> >>> + } else { >>> + /* byte_scattered_write_logical pattern of src[1] is 32-bit
[Mesa-dev] [PATCH 1/2] intel/fs: New method for register_byte_use_pattern for fs_inst
El 27/07/18 a las 02:44, Francisco Jerez escribió: > Chema Casanova writes: > >> El 26/07/18 a las 20:02, Francisco Jerez escribió: >>> Chema Casanova writes: >>> El 20/07/18 a las 22:10, Francisco Jerez escribió: > Chema Casanova writes: > >> El 20/07/18 a las 00:34, Francisco Jerez escribió: >>> Chema Casanova writes: >>> El 14/07/18 a las 00:14, Francisco Jerez escribió: > Jose Maria Casanova Crespo writes: > >> For a register source/destination of an instruction the function >> returns >> the read/write byte pattern of a 32-byte registers as a unsigned int. >> >> The returned pattern takes into account the exec_size of the >> instruction, >> the type bitsize, the stride and if the register is source or >> destination. >> >> The objective of the functions if to help to know the read/written >> bytes >> of the instructions to improve the liveness analysis for partial >> read/writes. >> >> We manage special cases for >> SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL >> and SHADER_OPCODE_BYTE_SCATTERED_WRITE because depending of the >> bitsize >> parameter they have a different read pattern. >> --- >> src/intel/compiler/brw_fs.cpp | 183 >> + >> src/intel/compiler/brw_ir_fs.h | 1 + >> 2 files changed, 184 insertions(+) >> >> diff --git a/src/intel/compiler/brw_fs.cpp >> b/src/intel/compiler/brw_fs.cpp >> index 2b8363ca362..f3045c4ff6c 100644 >> --- a/src/intel/compiler/brw_fs.cpp >> +++ b/src/intel/compiler/brw_fs.cpp >> @@ -687,6 +687,189 @@ fs_inst::is_partial_write() const >> this->dst.offset % REG_SIZE != 0); >> } >> >> +/** >> + * Returns a 32-bit uint whose bits represent if the associated >> register byte >> + * has been read/written by the instruction. The returned pattern >> takes into >> + * account the exec_size of the instruction, the type bitsize and >> the register >> + * stride and the register is source or destination for the >> instruction. >> + * >> + * The objective of this function is to identify which parts of the >> register >> + * are read or written for operations that don't read/write a full >> register. >> + * So we can identify in live range variable analysis if a partial >> write has >> + * completelly defined the part of the register used by a partial >> read. So we >> + * avoid extending the liveness range because all data read was >> already >> + * defined although the wasn't completely written. >> + */ >> +unsigned >> +fs_inst::register_byte_use_pattern(const fs_reg , boolean is_dst) >> const >> +{ >> + if (is_dst) { > Please split into two functions (like fs_inst::src_read and > ::src_written) since that would make the call-sites of this method > more > self-documenting than a boolean parameter. You should be able to > share > code by refactoring the common logic into a separate function (see > below > for some suggestions on how that could be achieved). Sure, it would improve readability and simplifies the logic, I've chosen dst_write_pattern and src_read_pattern. > >> + /* We don't know what is written so we return the worts case >> */ > > "worst" Fixed. >> + if (this->predicate && this->opcode != BRW_OPCODE_SEL) >> + return 0; >> + /* We assume that send destinations are completely written */ >> + if (this->is_send_from_grf()) >> + return ~0u; > > Some send-like instructions won't be caught by this condition, you > should check for this->mlen != 0 in addition. Would it be enough to check for (this->mlen > 0) and forget about is_send_from_grf? I am using this approach in v2 I am sending. >>> >>> I don't think the mlen > 0 condition would catch all cases either... >>> E.g. FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD IIRC. You probably need both >>> conditions. Sucks... >> >> That is true, so now we have the: >> (this->is_send_from_grf() || this->mlen != 0) >> >> + } else { >> + /* byte_scattered_write_logical pattern of src[1] is 32-bit >> aligned >> + * so the read pattern depends on the bitsize stored at src[4] >> + */ >> + if (this->opcode ==
Re: [Mesa-dev] [PATCH 1/2] intel/fs: New method for register_byte_use_pattern for fs_inst
Chema Casanova writes: > El 26/07/18 a las 20:02, Francisco Jerez escribió: >> Chema Casanova writes: >> >>> El 20/07/18 a las 22:10, Francisco Jerez escribió: Chema Casanova writes: > El 20/07/18 a las 00:34, Francisco Jerez escribió: >> Chema Casanova writes: >> >>> El 14/07/18 a las 00:14, Francisco Jerez escribió: Jose Maria Casanova Crespo writes: > For a register source/destination of an instruction the function > returns > the read/write byte pattern of a 32-byte registers as a unsigned int. > > The returned pattern takes into account the exec_size of the > instruction, > the type bitsize, the stride and if the register is source or > destination. > > The objective of the functions if to help to know the read/written > bytes > of the instructions to improve the liveness analysis for partial > read/writes. > > We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL > and SHADER_OPCODE_BYTE_SCATTERED_WRITE because depending of the > bitsize > parameter they have a different read pattern. > --- > src/intel/compiler/brw_fs.cpp | 183 > + > src/intel/compiler/brw_ir_fs.h | 1 + > 2 files changed, 184 insertions(+) > > diff --git a/src/intel/compiler/brw_fs.cpp > b/src/intel/compiler/brw_fs.cpp > index 2b8363ca362..f3045c4ff6c 100644 > --- a/src/intel/compiler/brw_fs.cpp > +++ b/src/intel/compiler/brw_fs.cpp > @@ -687,6 +687,189 @@ fs_inst::is_partial_write() const > this->dst.offset % REG_SIZE != 0); > } > > +/** > + * Returns a 32-bit uint whose bits represent if the associated > register byte > + * has been read/written by the instruction. The returned pattern > takes into > + * account the exec_size of the instruction, the type bitsize and > the register > + * stride and the register is source or destination for the > instruction. > + * > + * The objective of this function is to identify which parts of the > register > + * are read or written for operations that don't read/write a full > register. > + * So we can identify in live range variable analysis if a partial > write has > + * completelly defined the part of the register used by a partial > read. So we > + * avoid extending the liveness range because all data read was > already > + * defined although the wasn't completely written. > + */ > +unsigned > +fs_inst::register_byte_use_pattern(const fs_reg , boolean is_dst) > const > +{ > + if (is_dst) { >>> Please split into two functions (like fs_inst::src_read and ::src_written) since that would make the call-sites of this method more self-documenting than a boolean parameter. You should be able to share code by refactoring the common logic into a separate function (see below for some suggestions on how that could be achieved). >>> >>> Sure, it would improve readability and simplifies the logic, I've chosen >>> dst_write_pattern and src_read_pattern. >>> > + /* We don't know what is written so we return the worts case */ "worst" >>> >>> Fixed. >>> > + if (this->predicate && this->opcode != BRW_OPCODE_SEL) > + return 0; > + /* We assume that send destinations are completely written */ > + if (this->is_send_from_grf()) > + return ~0u; Some send-like instructions won't be caught by this condition, you should check for this->mlen != 0 in addition. >>> >>> Would it be enough to check for (this->mlen > 0) and forget about >>> is_send_from_grf? I am using this approach in v2 I am sending. >>> >> >> I don't think the mlen > 0 condition would catch all cases either... >> E.g. FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD IIRC. You probably need both >> conditions. Sucks... > > That is true, so now we have the: > (this->is_send_from_grf() || this->mlen != 0) > > + } else { > + /* byte_scattered_write_logical pattern of src[1] is 32-bit > aligned > + * so the read pattern depends on the bitsize stored at src[4] > + */ > + if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL > && > + this->src[1].nr == r.nr) { >>> I feel uncomfortable about attempting to guess the source the caller is referring to by comparing
Re: [Mesa-dev] [PATCH 1/2] intel/fs: New method for register_byte_use_pattern for fs_inst
El 26/07/18 a las 20:02, Francisco Jerez escribió: > Chema Casanova writes: > >> El 20/07/18 a las 22:10, Francisco Jerez escribió: >>> Chema Casanova writes: >>> El 20/07/18 a las 00:34, Francisco Jerez escribió: > Chema Casanova writes: > >> El 14/07/18 a las 00:14, Francisco Jerez escribió: >>> Jose Maria Casanova Crespo writes: >>> For a register source/destination of an instruction the function returns the read/write byte pattern of a 32-byte registers as a unsigned int. The returned pattern takes into account the exec_size of the instruction, the type bitsize, the stride and if the register is source or destination. The objective of the functions if to help to know the read/written bytes of the instructions to improve the liveness analysis for partial read/writes. We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL and SHADER_OPCODE_BYTE_SCATTERED_WRITE because depending of the bitsize parameter they have a different read pattern. --- src/intel/compiler/brw_fs.cpp | 183 + src/intel/compiler/brw_ir_fs.h | 1 + 2 files changed, 184 insertions(+) diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp index 2b8363ca362..f3045c4ff6c 100644 --- a/src/intel/compiler/brw_fs.cpp +++ b/src/intel/compiler/brw_fs.cpp @@ -687,6 +687,189 @@ fs_inst::is_partial_write() const this->dst.offset % REG_SIZE != 0); } +/** + * Returns a 32-bit uint whose bits represent if the associated register byte + * has been read/written by the instruction. The returned pattern takes into + * account the exec_size of the instruction, the type bitsize and the register + * stride and the register is source or destination for the instruction. + * + * The objective of this function is to identify which parts of the register + * are read or written for operations that don't read/write a full register. + * So we can identify in live range variable analysis if a partial write has + * completelly defined the part of the register used by a partial read. So we + * avoid extending the liveness range because all data read was already + * defined although the wasn't completely written. + */ +unsigned +fs_inst::register_byte_use_pattern(const fs_reg , boolean is_dst) const +{ + if (is_dst) { >> >>> Please split into two functions (like fs_inst::src_read and >>> ::src_written) since that would make the call-sites of this method more >>> self-documenting than a boolean parameter. You should be able to share >>> code by refactoring the common logic into a separate function (see below >>> for some suggestions on how that could be achieved). >> >> Sure, it would improve readability and simplifies the logic, I've chosen >> dst_write_pattern and src_read_pattern. >> >>> + /* We don't know what is written so we return the worts case */ >>> >>> "worst" >> >> Fixed. >> + if (this->predicate && this->opcode != BRW_OPCODE_SEL) + return 0; + /* We assume that send destinations are completely written */ + if (this->is_send_from_grf()) + return ~0u; >>> >>> Some send-like instructions won't be caught by this condition, you >>> should check for this->mlen != 0 in addition. >> >> Would it be enough to check for (this->mlen > 0) and forget about >> is_send_from_grf? I am using this approach in v2 I am sending. >> > > I don't think the mlen > 0 condition would catch all cases either... > E.g. FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD IIRC. You probably need both > conditions. Sucks... That is true, so now we have the: (this->is_send_from_grf() || this->mlen != 0) + } else { + /* byte_scattered_write_logical pattern of src[1] is 32-bit aligned + * so the read pattern depends on the bitsize stored at src[4] + */ + if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL && + this->src[1].nr == r.nr) { >> >>> I feel uncomfortable about attempting to guess the source the caller is >>> referring to by comparing the registers for equality. E.g. you could >>> potentially end up with two sources that compare equal but have >>> different semantics (e.g. as a result of CSE) which
Re: [Mesa-dev] [PATCH 1/2] intel/fs: New method for register_byte_use_pattern for fs_inst
Chema Casanova writes: > El 20/07/18 a las 22:10, Francisco Jerez escribió: >> Chema Casanova writes: >> >>> El 20/07/18 a las 00:34, Francisco Jerez escribió: Chema Casanova writes: > El 14/07/18 a las 00:14, Francisco Jerez escribió: >> Jose Maria Casanova Crespo writes: >> >>> For a register source/destination of an instruction the function returns >>> the read/write byte pattern of a 32-byte registers as a unsigned int. >>> >>> The returned pattern takes into account the exec_size of the >>> instruction, >>> the type bitsize, the stride and if the register is source or >>> destination. >>> >>> The objective of the functions if to help to know the read/written bytes >>> of the instructions to improve the liveness analysis for partial >>> read/writes. >>> >>> We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL >>> and SHADER_OPCODE_BYTE_SCATTERED_WRITE because depending of the bitsize >>> parameter they have a different read pattern. >>> --- >>> src/intel/compiler/brw_fs.cpp | 183 + >>> src/intel/compiler/brw_ir_fs.h | 1 + >>> 2 files changed, 184 insertions(+) >>> >>> diff --git a/src/intel/compiler/brw_fs.cpp >>> b/src/intel/compiler/brw_fs.cpp >>> index 2b8363ca362..f3045c4ff6c 100644 >>> --- a/src/intel/compiler/brw_fs.cpp >>> +++ b/src/intel/compiler/brw_fs.cpp >>> @@ -687,6 +687,189 @@ fs_inst::is_partial_write() const >>> this->dst.offset % REG_SIZE != 0); >>> } >>> >>> +/** >>> + * Returns a 32-bit uint whose bits represent if the associated >>> register byte >>> + * has been read/written by the instruction. The returned pattern >>> takes into >>> + * account the exec_size of the instruction, the type bitsize and the >>> register >>> + * stride and the register is source or destination for the >>> instruction. >>> + * >>> + * The objective of this function is to identify which parts of the >>> register >>> + * are read or written for operations that don't read/write a full >>> register. >>> + * So we can identify in live range variable analysis if a partial >>> write has >>> + * completelly defined the part of the register used by a partial >>> read. So we >>> + * avoid extending the liveness range because all data read was already >>> + * defined although the wasn't completely written. >>> + */ >>> +unsigned >>> +fs_inst::register_byte_use_pattern(const fs_reg , boolean is_dst) >>> const >>> +{ >>> + if (is_dst) { > >> Please split into two functions (like fs_inst::src_read and >> ::src_written) since that would make the call-sites of this method more >> self-documenting than a boolean parameter. You should be able to share >> code by refactoring the common logic into a separate function (see below >> for some suggestions on how that could be achieved). > > Sure, it would improve readability and simplifies the logic, I've chosen > dst_write_pattern and src_read_pattern. > >> >>> + /* We don't know what is written so we return the worts case */ >> >> "worst" > > Fixed. > >>> + if (this->predicate && this->opcode != BRW_OPCODE_SEL) >>> + return 0; >>> + /* We assume that send destinations are completely written */ >>> + if (this->is_send_from_grf()) >>> + return ~0u; >> >> Some send-like instructions won't be caught by this condition, you >> should check for this->mlen != 0 in addition. > > Would it be enough to check for (this->mlen > 0) and forget about > is_send_from_grf? I am using this approach in v2 I am sending. > I don't think the mlen > 0 condition would catch all cases either... E.g. FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD IIRC. You probably need both conditions. Sucks... >>> >>> That is true, so now we have the: >>> (this->is_send_from_grf() || this->mlen != 0) >>> >>> + } else { >>> + /* byte_scattered_write_logical pattern of src[1] is 32-bit >>> aligned >>> + * so the read pattern depends on the bitsize stored at src[4] >>> + */ >>> + if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL && >>> + this->src[1].nr == r.nr) { > >> I feel uncomfortable about attempting to guess the source the caller is >> referring to by comparing the registers for equality. E.g. you could >> potentially end up with two sources that compare equal but have >> different semantics (e.g. as a result of CSE) which might cause it to >> get the wrong answer. It would probably be better to pass a source >> index and a byte offset as argument instead of an fs_reg. > > I've didn't thought about CSE, I'm now
Re: [Mesa-dev] [PATCH 1/2] intel/fs: New method for register_byte_use_pattern for fs_inst
El 20/07/18 a las 22:10, Francisco Jerez escribió: > Chema Casanova writes: > >> El 20/07/18 a las 00:34, Francisco Jerez escribió: >>> Chema Casanova writes: >>> El 14/07/18 a las 00:14, Francisco Jerez escribió: > Jose Maria Casanova Crespo writes: > >> For a register source/destination of an instruction the function returns >> the read/write byte pattern of a 32-byte registers as a unsigned int. >> >> The returned pattern takes into account the exec_size of the instruction, >> the type bitsize, the stride and if the register is source or >> destination. >> >> The objective of the functions if to help to know the read/written bytes >> of the instructions to improve the liveness analysis for partial >> read/writes. >> >> We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL >> and SHADER_OPCODE_BYTE_SCATTERED_WRITE because depending of the bitsize >> parameter they have a different read pattern. >> --- >> src/intel/compiler/brw_fs.cpp | 183 + >> src/intel/compiler/brw_ir_fs.h | 1 + >> 2 files changed, 184 insertions(+) >> >> diff --git a/src/intel/compiler/brw_fs.cpp >> b/src/intel/compiler/brw_fs.cpp >> index 2b8363ca362..f3045c4ff6c 100644 >> --- a/src/intel/compiler/brw_fs.cpp >> +++ b/src/intel/compiler/brw_fs.cpp >> @@ -687,6 +687,189 @@ fs_inst::is_partial_write() const >> this->dst.offset % REG_SIZE != 0); >> } >> >> +/** >> + * Returns a 32-bit uint whose bits represent if the associated >> register byte >> + * has been read/written by the instruction. The returned pattern takes >> into >> + * account the exec_size of the instruction, the type bitsize and the >> register >> + * stride and the register is source or destination for the instruction. >> + * >> + * The objective of this function is to identify which parts of the >> register >> + * are read or written for operations that don't read/write a full >> register. >> + * So we can identify in live range variable analysis if a partial >> write has >> + * completelly defined the part of the register used by a partial read. >> So we >> + * avoid extending the liveness range because all data read was already >> + * defined although the wasn't completely written. >> + */ >> +unsigned >> +fs_inst::register_byte_use_pattern(const fs_reg , boolean is_dst) >> const >> +{ >> + if (is_dst) { > Please split into two functions (like fs_inst::src_read and > ::src_written) since that would make the call-sites of this method more > self-documenting than a boolean parameter. You should be able to share > code by refactoring the common logic into a separate function (see below > for some suggestions on how that could be achieved). Sure, it would improve readability and simplifies the logic, I've chosen dst_write_pattern and src_read_pattern. > >> + /* We don't know what is written so we return the worts case */ > > "worst" Fixed. >> + if (this->predicate && this->opcode != BRW_OPCODE_SEL) >> + return 0; >> + /* We assume that send destinations are completely written */ >> + if (this->is_send_from_grf()) >> + return ~0u; > > Some send-like instructions won't be caught by this condition, you > should check for this->mlen != 0 in addition. Would it be enough to check for (this->mlen > 0) and forget about is_send_from_grf? I am using this approach in v2 I am sending. >>> >>> I don't think the mlen > 0 condition would catch all cases either... >>> E.g. FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD IIRC. You probably need both >>> conditions. Sucks... >> >> That is true, so now we have the: >> (this->is_send_from_grf() || this->mlen != 0) >> >> + } else { >> + /* byte_scattered_write_logical pattern of src[1] is 32-bit >> aligned >> + * so the read pattern depends on the bitsize stored at src[4] >> + */ >> + if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL && >> + this->src[1].nr == r.nr) { > I feel uncomfortable about attempting to guess the source the caller is > referring to by comparing the registers for equality. E.g. you could > potentially end up with two sources that compare equal but have > different semantics (e.g. as a result of CSE) which might cause it to > get the wrong answer. It would probably be better to pass a source > index and a byte offset as argument instead of an fs_reg. I've didn't thought about CSE, I'm now receiving the number of source and the reg_offset. I'm using reg_offset instead of byte offsets as it simplifies the logic. Now we are using always
Re: [Mesa-dev] [PATCH 1/2] intel/fs: New method for register_byte_use_pattern for fs_inst
Chema Casanova writes: > El 20/07/18 a las 00:34, Francisco Jerez escribió: >> Chema Casanova writes: >> >>> El 14/07/18 a las 00:14, Francisco Jerez escribió: Jose Maria Casanova Crespo writes: > For a register source/destination of an instruction the function returns > the read/write byte pattern of a 32-byte registers as a unsigned int. > > The returned pattern takes into account the exec_size of the instruction, > the type bitsize, the stride and if the register is source or destination. > > The objective of the functions if to help to know the read/written bytes > of the instructions to improve the liveness analysis for partial > read/writes. > > We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL > and SHADER_OPCODE_BYTE_SCATTERED_WRITE because depending of the bitsize > parameter they have a different read pattern. > --- > src/intel/compiler/brw_fs.cpp | 183 + > src/intel/compiler/brw_ir_fs.h | 1 + > 2 files changed, 184 insertions(+) > > diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp > index 2b8363ca362..f3045c4ff6c 100644 > --- a/src/intel/compiler/brw_fs.cpp > +++ b/src/intel/compiler/brw_fs.cpp > @@ -687,6 +687,189 @@ fs_inst::is_partial_write() const > this->dst.offset % REG_SIZE != 0); > } > > +/** > + * Returns a 32-bit uint whose bits represent if the associated register > byte > + * has been read/written by the instruction. The returned pattern takes > into > + * account the exec_size of the instruction, the type bitsize and the > register > + * stride and the register is source or destination for the instruction. > + * > + * The objective of this function is to identify which parts of the > register > + * are read or written for operations that don't read/write a full > register. > + * So we can identify in live range variable analysis if a partial write > has > + * completelly defined the part of the register used by a partial read. > So we > + * avoid extending the liveness range because all data read was already > + * defined although the wasn't completely written. > + */ > +unsigned > +fs_inst::register_byte_use_pattern(const fs_reg , boolean is_dst) const > +{ > + if (is_dst) { >>> Please split into two functions (like fs_inst::src_read and ::src_written) since that would make the call-sites of this method more self-documenting than a boolean parameter. You should be able to share code by refactoring the common logic into a separate function (see below for some suggestions on how that could be achieved). >>> >>> Sure, it would improve readability and simplifies the logic, I've chosen >>> dst_write_pattern and src_read_pattern. >>> > + /* We don't know what is written so we return the worts case */ "worst" >>> >>> Fixed. >>> > + if (this->predicate && this->opcode != BRW_OPCODE_SEL) > + return 0; > + /* We assume that send destinations are completely written */ > + if (this->is_send_from_grf()) > + return ~0u; Some send-like instructions won't be caught by this condition, you should check for this->mlen != 0 in addition. >>> >>> Would it be enough to check for (this->mlen > 0) and forget about >>> is_send_from_grf? I am using this approach in v2 I am sending. >>> >> >> I don't think the mlen > 0 condition would catch all cases either... >> E.g. FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD IIRC. You probably need both >> conditions. Sucks... > > That is true, so now we have the: > (this->is_send_from_grf() || this->mlen != 0) > > + } else { > + /* byte_scattered_write_logical pattern of src[1] is 32-bit aligned > + * so the read pattern depends on the bitsize stored at src[4] > + */ > + if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL && > + this->src[1].nr == r.nr) { >>> I feel uncomfortable about attempting to guess the source the caller is referring to by comparing the registers for equality. E.g. you could potentially end up with two sources that compare equal but have different semantics (e.g. as a result of CSE) which might cause it to get the wrong answer. It would probably be better to pass a source index and a byte offset as argument instead of an fs_reg. >>> >>> I've didn't thought about CSE, I'm now receiving the number of source >>> and the reg_offset. I'm using reg_offset instead of byte offsets as it >>> simplifies the logic. Now we are using always the base src register to >>> do all the calculation > + switch (this->src[4].ud) { > + case 32: > +return ~0u; > + case 16: > +
Re: [Mesa-dev] [PATCH 1/2] intel/fs: New method for register_byte_use_pattern for fs_inst
El 20/07/18 a las 00:34, Francisco Jerez escribió: > Chema Casanova writes: > >> El 14/07/18 a las 00:14, Francisco Jerez escribió: >>> Jose Maria Casanova Crespo writes: >>> For a register source/destination of an instruction the function returns the read/write byte pattern of a 32-byte registers as a unsigned int. The returned pattern takes into account the exec_size of the instruction, the type bitsize, the stride and if the register is source or destination. The objective of the functions if to help to know the read/written bytes of the instructions to improve the liveness analysis for partial read/writes. We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL and SHADER_OPCODE_BYTE_SCATTERED_WRITE because depending of the bitsize parameter they have a different read pattern. --- src/intel/compiler/brw_fs.cpp | 183 + src/intel/compiler/brw_ir_fs.h | 1 + 2 files changed, 184 insertions(+) diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp index 2b8363ca362..f3045c4ff6c 100644 --- a/src/intel/compiler/brw_fs.cpp +++ b/src/intel/compiler/brw_fs.cpp @@ -687,6 +687,189 @@ fs_inst::is_partial_write() const this->dst.offset % REG_SIZE != 0); } +/** + * Returns a 32-bit uint whose bits represent if the associated register byte + * has been read/written by the instruction. The returned pattern takes into + * account the exec_size of the instruction, the type bitsize and the register + * stride and the register is source or destination for the instruction. + * + * The objective of this function is to identify which parts of the register + * are read or written for operations that don't read/write a full register. + * So we can identify in live range variable analysis if a partial write has + * completelly defined the part of the register used by a partial read. So we + * avoid extending the liveness range because all data read was already + * defined although the wasn't completely written. + */ +unsigned +fs_inst::register_byte_use_pattern(const fs_reg , boolean is_dst) const +{ + if (is_dst) { >> >>> Please split into two functions (like fs_inst::src_read and >>> ::src_written) since that would make the call-sites of this method more >>> self-documenting than a boolean parameter. You should be able to share >>> code by refactoring the common logic into a separate function (see below >>> for some suggestions on how that could be achieved). >> >> Sure, it would improve readability and simplifies the logic, I've chosen >> dst_write_pattern and src_read_pattern. >> >>> + /* We don't know what is written so we return the worts case */ >>> >>> "worst" >> >> Fixed. >> + if (this->predicate && this->opcode != BRW_OPCODE_SEL) + return 0; + /* We assume that send destinations are completely written */ + if (this->is_send_from_grf()) + return ~0u; >>> >>> Some send-like instructions won't be caught by this condition, you >>> should check for this->mlen != 0 in addition. >> >> Would it be enough to check for (this->mlen > 0) and forget about >> is_send_from_grf? I am using this approach in v2 I am sending. >> > > I don't think the mlen > 0 condition would catch all cases either... > E.g. FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD IIRC. You probably need both > conditions. Sucks... That is true, so now we have the: (this->is_send_from_grf() || this->mlen != 0) + } else { + /* byte_scattered_write_logical pattern of src[1] is 32-bit aligned + * so the read pattern depends on the bitsize stored at src[4] + */ + if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL && + this->src[1].nr == r.nr) { >> >>> I feel uncomfortable about attempting to guess the source the caller is >>> referring to by comparing the registers for equality. E.g. you could >>> potentially end up with two sources that compare equal but have >>> different semantics (e.g. as a result of CSE) which might cause it to >>> get the wrong answer. It would probably be better to pass a source >>> index and a byte offset as argument instead of an fs_reg. >> >> I've didn't thought about CSE, I'm now receiving the number of source >> and the reg_offset. I'm using reg_offset instead of byte offsets as it >> simplifies the logic. Now we are using always the base src register to >> do all the calculation + switch (this->src[4].ud) { + case 32: +return ~0u; + case 16: +return 0x; + case 8: +return 0x; + default: +
Re: [Mesa-dev] [PATCH 1/2] intel/fs: New method for register_byte_use_pattern for fs_inst
Chema Casanova writes: > El 14/07/18 a las 00:14, Francisco Jerez escribió: >> Jose Maria Casanova Crespo writes: >> >>> For a register source/destination of an instruction the function returns >>> the read/write byte pattern of a 32-byte registers as a unsigned int. >>> >>> The returned pattern takes into account the exec_size of the instruction, >>> the type bitsize, the stride and if the register is source or destination. >>> >>> The objective of the functions if to help to know the read/written bytes >>> of the instructions to improve the liveness analysis for partial >>> read/writes. >>> >>> We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL >>> and SHADER_OPCODE_BYTE_SCATTERED_WRITE because depending of the bitsize >>> parameter they have a different read pattern. >>> --- >>> src/intel/compiler/brw_fs.cpp | 183 + >>> src/intel/compiler/brw_ir_fs.h | 1 + >>> 2 files changed, 184 insertions(+) >>> >>> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp >>> index 2b8363ca362..f3045c4ff6c 100644 >>> --- a/src/intel/compiler/brw_fs.cpp >>> +++ b/src/intel/compiler/brw_fs.cpp >>> @@ -687,6 +687,189 @@ fs_inst::is_partial_write() const >>> this->dst.offset % REG_SIZE != 0); >>> } >>> >>> +/** >>> + * Returns a 32-bit uint whose bits represent if the associated register >>> byte >>> + * has been read/written by the instruction. The returned pattern takes >>> into >>> + * account the exec_size of the instruction, the type bitsize and the >>> register >>> + * stride and the register is source or destination for the instruction. >>> + * >>> + * The objective of this function is to identify which parts of the >>> register >>> + * are read or written for operations that don't read/write a full >>> register. >>> + * So we can identify in live range variable analysis if a partial write >>> has >>> + * completelly defined the part of the register used by a partial read. So >>> we >>> + * avoid extending the liveness range because all data read was already >>> + * defined although the wasn't completely written. >>> + */ >>> +unsigned >>> +fs_inst::register_byte_use_pattern(const fs_reg , boolean is_dst) const >>> +{ >>> + if (is_dst) { > >> Please split into two functions (like fs_inst::src_read and >> ::src_written) since that would make the call-sites of this method more >> self-documenting than a boolean parameter. You should be able to share >> code by refactoring the common logic into a separate function (see below >> for some suggestions on how that could be achieved). > > Sure, it would improve readability and simplifies the logic, I've chosen > dst_write_pattern and src_read_pattern. > >> >>> + /* We don't know what is written so we return the worts case */ >> >> "worst" > > Fixed. > >>> + if (this->predicate && this->opcode != BRW_OPCODE_SEL) >>> + return 0; >>> + /* We assume that send destinations are completely written */ >>> + if (this->is_send_from_grf()) >>> + return ~0u; >> >> Some send-like instructions won't be caught by this condition, you >> should check for this->mlen != 0 in addition. > > Would it be enough to check for (this->mlen > 0) and forget about > is_send_from_grf? I am using this approach in v2 I am sending. > I don't think the mlen > 0 condition would catch all cases either... E.g. FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD IIRC. You probably need both conditions. Sucks... >>> + } else { >>> + /* byte_scattered_write_logical pattern of src[1] is 32-bit aligned >>> + * so the read pattern depends on the bitsize stored at src[4] >>> + */ >>> + if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL && >>> + this->src[1].nr == r.nr) { > >> I feel uncomfortable about attempting to guess the source the caller is >> referring to by comparing the registers for equality. E.g. you could >> potentially end up with two sources that compare equal but have >> different semantics (e.g. as a result of CSE) which might cause it to >> get the wrong answer. It would probably be better to pass a source >> index and a byte offset as argument instead of an fs_reg. > > I've didn't thought about CSE, I'm now receiving the number of source > and the reg_offset. I'm using reg_offset instead of byte offsets as it > simplifies the logic. Now we are using always the base src register to > do all the calculation >>> + switch (this->src[4].ud) { >>> + case 32: >>> +return ~0u; >>> + case 16: >>> +return 0x; >>> + case 8: >>> +return 0x; >>> + default: >>> +unreachable("Unsupported bitsize at >>> byte_scattered_write_logical"); >>> + } >> >> Replace the above switch statement with a call to "periodic_mask(8, 4, >> this->src[4].ud / 8)" (see below for the definition). > > Ok. > >>> + } >>> + /* As for
[Mesa-dev] [PATCH 1/2] intel/fs: New method for register_byte_use_pattern for fs_inst
El 14/07/18 a las 00:14, Francisco Jerez escribió: > Jose Maria Casanova Crespo writes: > >> For a register source/destination of an instruction the function returns >> the read/write byte pattern of a 32-byte registers as a unsigned int. >> >> The returned pattern takes into account the exec_size of the instruction, >> the type bitsize, the stride and if the register is source or destination. >> >> The objective of the functions if to help to know the read/written bytes >> of the instructions to improve the liveness analysis for partial read/writes. >> >> We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL >> and SHADER_OPCODE_BYTE_SCATTERED_WRITE because depending of the bitsize >> parameter they have a different read pattern. >> --- >> src/intel/compiler/brw_fs.cpp | 183 + >> src/intel/compiler/brw_ir_fs.h | 1 + >> 2 files changed, 184 insertions(+) >> >> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp >> index 2b8363ca362..f3045c4ff6c 100644 >> --- a/src/intel/compiler/brw_fs.cpp >> +++ b/src/intel/compiler/brw_fs.cpp >> @@ -687,6 +687,189 @@ fs_inst::is_partial_write() const >> this->dst.offset % REG_SIZE != 0); >> } >> >> +/** >> + * Returns a 32-bit uint whose bits represent if the associated register >> byte >> + * has been read/written by the instruction. The returned pattern takes into >> + * account the exec_size of the instruction, the type bitsize and the >> register >> + * stride and the register is source or destination for the instruction. >> + * >> + * The objective of this function is to identify which parts of the register >> + * are read or written for operations that don't read/write a full register. >> + * So we can identify in live range variable analysis if a partial write has >> + * completelly defined the part of the register used by a partial read. So >> we >> + * avoid extending the liveness range because all data read was already >> + * defined although the wasn't completely written. >> + */ >> +unsigned >> +fs_inst::register_byte_use_pattern(const fs_reg , boolean is_dst) const >> +{ >> + if (is_dst) { > Please split into two functions (like fs_inst::src_read and > ::src_written) since that would make the call-sites of this method more > self-documenting than a boolean parameter. You should be able to share > code by refactoring the common logic into a separate function (see below > for some suggestions on how that could be achieved). Sure, it would improve readability and simplifies the logic, I've chosen dst_write_pattern and src_read_pattern. > >> + /* We don't know what is written so we return the worts case */ > > "worst" Fixed. >> + if (this->predicate && this->opcode != BRW_OPCODE_SEL) >> + return 0; >> + /* We assume that send destinations are completely written */ >> + if (this->is_send_from_grf()) >> + return ~0u; > > Some send-like instructions won't be caught by this condition, you > should check for this->mlen != 0 in addition. Would it be enough to check for (this->mlen > 0) and forget about is_send_from_grf? I am using this approach in v2 I am sending. >> + } else { >> + /* byte_scattered_write_logical pattern of src[1] is 32-bit aligned >> + * so the read pattern depends on the bitsize stored at src[4] >> + */ >> + if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL && >> + this->src[1].nr == r.nr) { > I feel uncomfortable about attempting to guess the source the caller is > referring to by comparing the registers for equality. E.g. you could > potentially end up with two sources that compare equal but have > different semantics (e.g. as a result of CSE) which might cause it to > get the wrong answer. It would probably be better to pass a source > index and a byte offset as argument instead of an fs_reg. I've didn't thought about CSE, I'm now receiving the number of source and the reg_offset. I'm using reg_offset instead of byte offsets as it simplifies the logic. Now we are using always the base src register to do all the calculation >> + switch (this->src[4].ud) { >> + case 32: >> +return ~0u; >> + case 16: >> +return 0x; >> + case 8: >> +return 0x; >> + default: >> +unreachable("Unsupported bitsize at >> byte_scattered_write_logical"); >> + } > > Replace the above switch statement with a call to "periodic_mask(8, 4, > this->src[4].ud / 8)" (see below for the definition). Ok. >> + } >> + /* As for byte_scattered_write_logical but we need to take into >> account >> + * that data written are in the payload offset 32 with SIMD8 and >> offset >> + * 64 with SIMD16. >> + */ >> + if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE && >> + this->src[0].nr == r.nr) { >> + fs_reg payload =
Re: [Mesa-dev] [PATCH 1/2] intel/fs: New method for register_byte_use_pattern for fs_inst
Jose Maria Casanova Crespo writes: > For a register source/destination of an instruction the function returns > the read/write byte pattern of a 32-byte registers as a unsigned int. > > The returned pattern takes into account the exec_size of the instruction, > the type bitsize, the stride and if the register is source or destination. > > The objective of the functions if to help to know the read/written bytes > of the instructions to improve the liveness analysis for partial read/writes. > > We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL > and SHADER_OPCODE_BYTE_SCATTERED_WRITE because depending of the bitsize > parameter they have a different read pattern. > --- > src/intel/compiler/brw_fs.cpp | 183 + > src/intel/compiler/brw_ir_fs.h | 1 + > 2 files changed, 184 insertions(+) > > diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp > index 2b8363ca362..f3045c4ff6c 100644 > --- a/src/intel/compiler/brw_fs.cpp > +++ b/src/intel/compiler/brw_fs.cpp > @@ -687,6 +687,189 @@ fs_inst::is_partial_write() const > this->dst.offset % REG_SIZE != 0); > } > > +/** > + * Returns a 32-bit uint whose bits represent if the associated register byte > + * has been read/written by the instruction. The returned pattern takes into > + * account the exec_size of the instruction, the type bitsize and the > register > + * stride and the register is source or destination for the instruction. > + * > + * The objective of this function is to identify which parts of the register > + * are read or written for operations that don't read/write a full register. > + * So we can identify in live range variable analysis if a partial write has > + * completelly defined the part of the register used by a partial read. So we > + * avoid extending the liveness range because all data read was already > + * defined although the wasn't completely written. > + */ > +unsigned > +fs_inst::register_byte_use_pattern(const fs_reg , boolean is_dst) const > +{ > + if (is_dst) { Please split into two functions (like fs_inst::src_read and ::src_written) since that would make the call-sites of this method more self-documenting than a boolean parameter. You should be able to share code by refactoring the common logic into a separate function (see below for some suggestions on how that could be achieved). > + /* We don't know what is written so we return the worts case */ "worst" > + if (this->predicate && this->opcode != BRW_OPCODE_SEL) > + return 0; > + /* We assume that send destinations are completelly written */ > + if (this->is_send_from_grf()) > + return ~0u; Some send-like instructions won't be caught by this condition, you should check for this->mlen != 0 in addition. > + } else { > + /* byte_scattered_write_logical pattern of src[1] is 32-bit aligned > + * so the read pattern depends on the bitsize stored at src[4] > + */ > + if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL && > + this->src[1].nr == r.nr) { I feel uncomfortable about attempting to guess the source the caller is referring to by comparing the registers for equality. E.g. you could potentially end up with two sources that compare equal but have different semantics (e.g. as a result of CSE) which might cause it to get the wrong answer. It would probably be better to pass a source index and a byte offset as argument instead of an fs_reg. > + switch (this->src[4].ud) { > + case 32: > +return ~0u; > + case 16: > +return 0x; > + case 8: > +return 0x; > + default: > +unreachable("Unsupported bitsize at > byte_scattered_write_logical"); > + } Replace the above switch statement with a call to "periodic_mask(8, 4, this->src[4].ud / 8)" (see below for the definition). > + } > + /* As for byte_scattered_write_logical but we need to take into account > + * that data written are in the payload offset 32 with SIMD8 and offset > + * 64 with SIMD16. > + */ > + if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE && > + this->src[0].nr == r.nr) { > + fs_reg payload = this->src[0]; > + payload.offset = REG_SIZE * this->exec_size / 8; byte_offset() is your friend. > + if (regions_overlap(r, REG_SIZE, > + payload, REG_SIZE * this->exec_size / 8)) { > +switch (this->src[2].ud) { > +case 32: > + return ~0u; > +case 16: > + return 0x; > +case 8: > + return 0x; > +default: > + unreachable("Unsupported bitsize at byte_scattered_write"); > +} Replace the above switch statement with a call to "periodic_mask(8, 4, this->src[2].ud / 8)". > + } else { > +
[Mesa-dev] [PATCH 1/2] intel/fs: New method for register_byte_use_pattern for fs_inst
For a register source/destination of an instruction the function returns the read/write byte pattern of a 32-byte registers as a unsigned int. The returned pattern takes into account the exec_size of the instruction, the type bitsize, the stride and if the register is source or destination. The objective of the functions if to help to know the read/written bytes of the instructions to improve the liveness analysis for partial read/writes. We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL and SHADER_OPCODE_BYTE_SCATTERED_WRITE because depending of the bitsize parameter they have a different read pattern. --- src/intel/compiler/brw_fs.cpp | 183 + src/intel/compiler/brw_ir_fs.h | 1 + 2 files changed, 184 insertions(+) diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp index 2b8363ca362..f3045c4ff6c 100644 --- a/src/intel/compiler/brw_fs.cpp +++ b/src/intel/compiler/brw_fs.cpp @@ -687,6 +687,189 @@ fs_inst::is_partial_write() const this->dst.offset % REG_SIZE != 0); } +/** + * Returns a 32-bit uint whose bits represent if the associated register byte + * has been read/written by the instruction. The returned pattern takes into + * account the exec_size of the instruction, the type bitsize and the register + * stride and the register is source or destination for the instruction. + * + * The objective of this function is to identify which parts of the register + * are read or written for operations that don't read/write a full register. + * So we can identify in live range variable analysis if a partial write has + * completelly defined the part of the register used by a partial read. So we + * avoid extending the liveness range because all data read was already + * defined although the wasn't completely written. + */ +unsigned +fs_inst::register_byte_use_pattern(const fs_reg , boolean is_dst) const +{ + if (is_dst) { + /* We don't know what is written so we return the worts case */ + if (this->predicate && this->opcode != BRW_OPCODE_SEL) + return 0; + /* We assume that send destinations are completelly written */ + if (this->is_send_from_grf()) + return ~0u; + } else { + /* byte_scattered_write_logical pattern of src[1] is 32-bit aligned + * so the read pattern depends on the bitsize stored at src[4] + */ + if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL && + this->src[1].nr == r.nr) { + switch (this->src[4].ud) { + case 32: +return ~0u; + case 16: +return 0x; + case 8: +return 0x; + default: +unreachable("Unsupported bitsize at byte_scattered_write_logical"); + } + } + /* As for byte_scattered_write_logical but we need to take into account + * that data written are in the payload offset 32 with SIMD8 and offset + * 64 with SIMD16. + */ + if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE && + this->src[0].nr == r.nr) { + fs_reg payload = this->src[0]; + payload.offset = REG_SIZE * this->exec_size / 8; + if (regions_overlap(r, REG_SIZE, + payload, REG_SIZE * this->exec_size / 8)) { +switch (this->src[2].ud) { +case 32: + return ~0u; +case 16: + return 0x; +case 8: + return 0x; +default: + unreachable("Unsupported bitsize at byte_scattered_write"); +} + } else { +return ~0u; + } + } + } + + /* We define the most conservative value in order to calculate liveness +* range. If it is a destination nothing is defined and if is a source +* all the bytes of the register could be read. So for release builds +* the unreachables would have always safe return value. */ + unsigned pattern = is_dst ? 0 : ~0u; + + /* In the general case we calculate the pattern for a specific register +* on base of the type_size and stride. We calculate the SIMD8 pattern +* and then we adjust the patter if needed for different exec_sizes +* and offset +*/ + switch (type_sz(r.type)){ + case 1: + switch (r.stride) { + case 0: + pattern = 0X1; + break; + case 1: + pattern = 0xff; + break; + case 2: + pattern = 0x; + break; + case 4: + pattern = 0x; + break; + case 8: + pattern = 0x01010101; + break; + default: + unreachable("Unknown pattern unsupported 8-bit stride"); + } + break; + case 2: + switch (r.stride) { + case 0: + pattern = 0X3; + break; + case 1: + pattern = 0x; + break; + case 2: + pattern = 0x; + break; +