Re: [Mesa-dev] [RFC] Solving the TGSI indirect addressing optimization problem
Am 12.03.2013 02:48, schrieb Marek Olšák: On Mon, Mar 11, 2013 at 1:44 PM, Christian König deathsim...@vodafone.de wrote: Hi everybody, this problem has been open for quite some time now, with a bunch of different opinions and sometimes even patches floating on the list. The solutions proposed or implemented so far all more or less incomplete, so this approach was designed in mind with both completeness and compatibility with existing code. Over all it's just an implementation of what Tom Stellard named solution #4 in this eMail thread: http://lists.freedesktop.org/archives/mesa-dev/2013-January/033264.html Hi Christian, this is definitely not the solution #4. According to the TGSI dump Christoph posted, it looks more like #3. Well, for me the main difference between proposal #3 and #4 is that #3 tries to identify the declaration to use with the supplied offset, while #4 uses a completely distinct identifier for that. The solution #4 completely changes the temporary file such that it becomes two-dimensional with the first index being a literal and the second index being either a literal or ADDR[literal], and it would always be like that regardless of whether drivers support that or not. One-dimensional indexing of TEMP is not allowed. For backward compatibility, the drivers that do not support it would only get a single array declaration TEMP[0][0..n] and TEMP[0][...] would be everywhere in the code. Ok, then I misunderstood you a bit, but I don't think the difference is so much. What I'm proposing is that we have an optional ArrayID attached to each declaration and refer to this ArrayID in the indirect addressing operand. To sum it up declarations should look something like this: DCL TEMP[0..3]// normal registers DCL TEMP[1][4..11]// indirectly accessed array DCL TEMP[2][12..15]// another indirectly accessed array DCL TEMP[16..17] LOCAL// local registers While an indirect operand might look like this: MOV TEMP[16], TEMP[1][ADDR[0].x-13] On the pro side for this approach is that it is compatible with all the existing state trackers and driver, and we don't need to generate different code depending on weather or not the driver supports this. I don't know much about TGSI internals, so I can't review this. I'd just like to say that TGSI dumps should make sense (2D indexing should be only allowed with 2D declarations) and tgsi_text_translate should be able to do the reverse - convert the dumps back to TGSI tokens. Completely agree with that, and beside writing documentation testing this is still one of the todos with this patchset. I have to admit that your approach looks a bit cleaner from the high above view. The problem with it is that it requires this additional 2D index on every operand, and we just don't have enough bits left for this. Even with my approach I need to make room for this ArrayID in the indirect addressing operand token, and this additional token is only there if the operand uses indirect adressing. Do you think we can live with my approach or is there any major downside I currently don't see? Thanks for the clarification, Christian. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC] Solving the TGSI indirect addressing optimization problem
On 12.03.2013 10:31, Christian König wrote: Am 12.03.2013 02:48, schrieb Marek Olšák: On Mon, Mar 11, 2013 at 1:44 PM, Christian König deathsim...@vodafone.de wrote: Hi everybody, this problem has been open for quite some time now, with a bunch of different opinions and sometimes even patches floating on the list. The solutions proposed or implemented so far all more or less incomplete, so this approach was designed in mind with both completeness and compatibility with existing code. Over all it's just an implementation of what Tom Stellard named solution #4 in this eMail thread: http://lists.freedesktop.org/archives/mesa-dev/2013-January/033264.html Hi Christian, this is definitely not the solution #4. According to the TGSI dump Christoph posted, it looks more like #3. Well, for me the main difference between proposal #3 and #4 is that #3 tries to identify the declaration to use with the supplied offset, while #4 uses a completely distinct identifier for that. The solution #4 completely changes the temporary file such that it becomes two-dimensional with the first index being a literal and the second index being either a literal or ADDR[literal], and it would always be like that regardless of whether drivers support that or not. One-dimensional indexing of TEMP is not allowed. For backward compatibility, the drivers that do not support it would only get a single array declaration TEMP[0][0..n] and TEMP[0][...] would be everywhere in the code. Ok, then I misunderstood you a bit, but I don't think the difference is so much. What I'm proposing is that we have an optional ArrayID attached to each declaration and refer to this ArrayID in the indirect addressing operand. To sum it up declarations should look something like this: DCL TEMP[0..3]// normal registers DCL TEMP[1][4..11]// indirectly accessed array DCL TEMP[2][12..15]// another indirectly accessed array DCL TEMP[16..17] LOCAL// local registers While an indirect operand might look like this: MOV TEMP[16], TEMP[1][ADDR[0].x-13] On the pro side for this approach is that it is compatible with all the existing state trackers and driver, and we don't need to generate different code depending on weather or not the driver supports this. I don't know much about TGSI internals, so I can't review this. I'd just like to say that TGSI dumps should make sense (2D indexing should be only allowed with 2D declarations) and tgsi_text_translate should be able to do the reverse - convert the dumps back to TGSI tokens. Completely agree with that, and beside writing documentation testing this is still one of the todos with this patchset. I have to admit that your approach looks a bit cleaner from the high above view. The problem with it is that it requires this additional 2D index on every operand, and we just don't have enough bits left for this. Even with my approach I need to make room for this ArrayID in the indirect addressing operand token, and this additional token is only there if the operand uses indirect adressing. Do you think we can live with my approach or is there any major downside I currently don't see? I can live with it. I think ... (I hope I don't regret this later; seems like this doesn't contain less information, then it's ok.) If the placement of the hint index offends someone, just write it as MOV TEMP[16], TEMP(1)[ADDR[0].x-13] or ... TEMP[ADDR[0].x-13 : 1] or TEMP[ADDR[0].x-13 supposedToBeIn [4,11]] or something ... nicer. Actually ... if TEMP[0] is placed at mem[0] and TEMP[4..1] is placed at, say, mem[0x1000 in bytes] do I have to load $register mem[$addr - 0xd0] (no this can't work) or load $regsiter mem[$addr - 0xd0 + 0x1000] (if you didn't adjust the offset) or load $register mem[$addr - 0xd0 + 0x1000 - 0x40] (if you already added the base TEMP to the immediate offset) This needs to be documented as well. Thanks for the clarification, Christian. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC] Solving the TGSI indirect addressing optimization problem
On 12.03.2013 12:10, Christoph Bumiller wrote: On 12.03.2013 10:31, Christian König wrote: Am 12.03.2013 02:48, schrieb Marek Olšák: On Mon, Mar 11, 2013 at 1:44 PM, Christian König deathsim...@vodafone.de wrote: Hi everybody, this problem has been open for quite some time now, with a bunch of different opinions and sometimes even patches floating on the list. The solutions proposed or implemented so far all more or less incomplete, so this approach was designed in mind with both completeness and compatibility with existing code. Over all it's just an implementation of what Tom Stellard named solution #4 in this eMail thread: http://lists.freedesktop.org/archives/mesa-dev/2013-January/033264.html Hi Christian, this is definitely not the solution #4. According to the TGSI dump Christoph posted, it looks more like #3. Well, for me the main difference between proposal #3 and #4 is that #3 tries to identify the declaration to use with the supplied offset, while #4 uses a completely distinct identifier for that. The solution #4 completely changes the temporary file such that it becomes two-dimensional with the first index being a literal and the second index being either a literal or ADDR[literal], and it would always be like that regardless of whether drivers support that or not. One-dimensional indexing of TEMP is not allowed. For backward compatibility, the drivers that do not support it would only get a single array declaration TEMP[0][0..n] and TEMP[0][...] would be everywhere in the code. Ok, then I misunderstood you a bit, but I don't think the difference is so much. What I'm proposing is that we have an optional ArrayID attached to each declaration and refer to this ArrayID in the indirect addressing operand. To sum it up declarations should look something like this: DCL TEMP[0..3]// normal registers DCL TEMP[1][4..11]// indirectly accessed array DCL TEMP[2][12..15]// another indirectly accessed array DCL TEMP[16..17] LOCAL// local registers While an indirect operand might look like this: MOV TEMP[16], TEMP[1][ADDR[0].x-13] On the pro side for this approach is that it is compatible with all the existing state trackers and driver, and we don't need to generate different code depending on weather or not the driver supports this. I don't know much about TGSI internals, so I can't review this. I'd just like to say that TGSI dumps should make sense (2D indexing should be only allowed with 2D declarations) and tgsi_text_translate should be able to do the reverse - convert the dumps back to TGSI tokens. Completely agree with that, and beside writing documentation testing this is still one of the todos with this patchset. I have to admit that your approach looks a bit cleaner from the high above view. The problem with it is that it requires this additional 2D index on every operand, and we just don't have enough bits left for this. Even with my approach I need to make room for this ArrayID in the indirect addressing operand token, and this additional token is only there if the operand uses indirect adressing. Do you think we can live with my approach or is there any major downside I currently don't see? One more thing. While you're at it (i.e. are familiar with the code), could you set the UsageMask in the TGSI declaration so we can pack scalar or vec2 arrays ? Also, you could then declare gl_ClipDistance outputs as DCL OUT[0..7].x, CLIPDIST so we can actually index clip distances properly ? With DCL OUT[0..1].xyzw, CLIPDIST we can't really index the individual components which leads to if ((index 3) == 0) MOV OUT[index / 4].x = value else if ((index 3) == 1) MOV OUT[index / 4].y = value which is unnecessary on some hardware. I can live with it. I think ... (I hope I don't regret this later; seems like this doesn't contain less information, then it's ok.) If the placement of the hint index offends someone, just write it as MOV TEMP[16], TEMP(1)[ADDR[0].x-13] or ... TEMP[ADDR[0].x-13 : 1] or TEMP[ADDR[0].x-13 supposedToBeIn [4,11]] or something ... nicer. Actually ... if TEMP[0] is placed at mem[0] and TEMP[4..1] is placed at, say, mem[0x1000 in bytes] do I have to load $register mem[$addr - 0xd0] (no this can't work) or load $regsiter mem[$addr - 0xd0 + 0x1000] (if you didn't adjust the offset) or load $register mem[$addr - 0xd0 + 0x1000 - 0x40] (if you already added the base TEMP to the immediate offset) This needs to be documented as well. Thanks for the clarification, Christian. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list
Re: [Mesa-dev] [RFC] Solving the TGSI indirect addressing optimization problem
On Tue, Mar 12, 2013 at 10:31 AM, Christian König deathsim...@vodafone.de wrote: Am 12.03.2013 02:48, schrieb Marek Olšák: On Mon, Mar 11, 2013 at 1:44 PM, Christian König deathsim...@vodafone.de wrote: Hi everybody, this problem has been open for quite some time now, with a bunch of different opinions and sometimes even patches floating on the list. The solutions proposed or implemented so far all more or less incomplete, so this approach was designed in mind with both completeness and compatibility with existing code. Over all it's just an implementation of what Tom Stellard named solution #4 in this eMail thread: http://lists.freedesktop.org/archives/mesa-dev/2013-January/033264.html Hi Christian, this is definitely not the solution #4. According to the TGSI dump Christoph posted, it looks more like #3. Well, for me the main difference between proposal #3 and #4 is that #3 tries to identify the declaration to use with the supplied offset, while #4 uses a completely distinct identifier for that. The solution #4 completely changes the temporary file such that it becomes two-dimensional with the first index being a literal and the second index being either a literal or ADDR[literal], and it would always be like that regardless of whether drivers support that or not. One-dimensional indexing of TEMP is not allowed. For backward compatibility, the drivers that do not support it would only get a single array declaration TEMP[0][0..n] and TEMP[0][...] would be everywhere in the code. Ok, then I misunderstood you a bit, but I don't think the difference is so much. What I'm proposing is that we have an optional ArrayID attached to each declaration and refer to this ArrayID in the indirect addressing operand. To sum it up declarations should look something like this: DCL TEMP[0..3]// normal registers DCL TEMP[1][4..11]// indirectly accessed array DCL TEMP[2][12..15]// another indirectly accessed array DCL TEMP[16..17] LOCAL// local registers While an indirect operand might look like this: MOV TEMP[16], TEMP[1][ADDR[0].x-13] On the pro side for this approach is that it is compatible with all the existing state trackers and driver, and we don't need to generate different code depending on weather or not the driver supports this. In that case, it would be better to avoid using the operator [] and use something else, because it has nothing to do with indexing. I don't know much about TGSI internals, so I can't review this. I'd just like to say that TGSI dumps should make sense (2D indexing should be only allowed with 2D declarations) and tgsi_text_translate should be able to do the reverse - convert the dumps back to TGSI tokens. Completely agree with that, and beside writing documentation testing this is still one of the todos with this patchset. I have to admit that your approach looks a bit cleaner from the high above view. The problem with it is that it requires this additional 2D index on every operand, and we just don't have enough bits left for this. Even with my approach I need to make room for this ArrayID in the indirect addressing operand token, and this additional token is only there if the operand uses indirect adressing. Do you think we can live with my approach or is there any major downside I currently don't see? I'm okay with your approach. Although it doesn't appear to be very clean, it gives the same amount of information to drivers. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC] Solving the TGSI indirect addressing optimization problem
- Original Message - On Tue, Mar 12, 2013 at 10:31 AM, Christian König deathsim...@vodafone.de wrote: Am 12.03.2013 02:48, schrieb Marek Olšák: On Mon, Mar 11, 2013 at 1:44 PM, Christian König deathsim...@vodafone.de wrote: Hi everybody, this problem has been open for quite some time now, with a bunch of different opinions and sometimes even patches floating on the list. The solutions proposed or implemented so far all more or less incomplete, so this approach was designed in mind with both completeness and compatibility with existing code. Over all it's just an implementation of what Tom Stellard named solution #4 in this eMail thread: http://lists.freedesktop.org/archives/mesa-dev/2013-January/033264.html Hi Christian, this is definitely not the solution #4. According to the TGSI dump Christoph posted, it looks more like #3. Well, for me the main difference between proposal #3 and #4 is that #3 tries to identify the declaration to use with the supplied offset, while #4 uses a completely distinct identifier for that. The solution #4 completely changes the temporary file such that it becomes two-dimensional with the first index being a literal and the second index being either a literal or ADDR[literal], and it would always be like that regardless of whether drivers support that or not. One-dimensional indexing of TEMP is not allowed. For backward compatibility, the drivers that do not support it would only get a single array declaration TEMP[0][0..n] and TEMP[0][...] would be everywhere in the code. Ok, then I misunderstood you a bit, but I don't think the difference is so much. What I'm proposing is that we have an optional ArrayID attached to each declaration and refer to this ArrayID in the indirect addressing operand. To sum it up declarations should look something like this: DCL TEMP[0..3]// normal registers DCL TEMP[1][4..11]// indirectly accessed array DCL TEMP[2][12..15]// another indirectly accessed array DCL TEMP[16..17] LOCAL// local registers While an indirect operand might look like this: MOV TEMP[16], TEMP[1][ADDR[0].x-13] On the pro side for this approach is that it is compatible with all the existing state trackers and driver, and we don't need to generate different code depending on weather or not the driver supports this. In that case, it would be better to avoid using the operator [] and use something else, because it has nothing to do with indexing. Note that [] operator is also used for const buffer selection. Jose ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [RFC] Solving the TGSI indirect addressing optimization problem
Hi everybody, this problem has been open for quite some time now, with a bunch of different opinions and sometimes even patches floating on the list. The solutions proposed or implemented so far all more or less incomplete, so this approach was designed in mind with both completeness and compatibility with existing code. Over all it's just an implementation of what Tom Stellard named solution #4 in this eMail thread: http://lists.freedesktop.org/archives/mesa-dev/2013-January/033264.html Please review and as usual comments are welcome, Christian. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC] Solving the TGSI indirect addressing optimization problem
On 11.03.2013 13:44, Christian König wrote: Hi everybody, this problem has been open for quite some time now, with a bunch of different opinions and sometimes even patches floating on the list. Nice, finally someone implements a proper solution. However, it seems like this isn't used for arrays in the IN and OUT files (varyings). Would it be much more work to use it there, too ? Fragment Shader inputs seem to be read with if (index == 0) return in[0] else if (index == 1) ... sequences. And I may have spotted a bug in the following shader: in vec4 vertex[2]; in vec4 color; out vec4 value[4]; uniform int i, j; void main() { gl_Position = vertex[i]; value[0] = vertex[0]; value[1] = vertex[1]; value[2] = vec4(0.0); value[3] = vec4(0.0); value[j] = color; } gives me DCL IN[0] DCL IN[1] DCL IN[2] DCL OUT[0], POSITION DCL OUT[1], GENERIC[12] DCL OUT[2], GENERIC[13] DCL OUT[3], GENERIC[14] DCL OUT[4], GENERIC[15] DCL CONST[0..1] DCL TEMP[0..3], LOCAL DCL TEMP[4], LOCAL DCL ADDR[0] IMM[0] FLT32 {0., 0., 0., 0.} 0: UARL ADDR[0].x, CONST[1]. 1: MOV TEMP[4], IN[ADDR[0].x] (not the bug) but this is invalid as there is no IN array, just single ones 2: MOV TEMP[0], IN[0] 3: MOV TEMP[1], IN[1] 4: MOV TEMP[2], IMM[0]. 5: MOV TEMP[3], IMM[0]. 6: UARL ADDR[0].x, CONST[0]. 7: MOV TEMP[1][ADDR[0].x], IN[2] why is this TEMP[1][] ? The array seems to be the first declaration ... 8: MOV OUT[1], TEMP[0] 9: MOV OUT[2], TEMP[1] 10: MOV OUT[3], TEMP[2] 11: MOV OUT[4], TEMP[3] 12: MOV OUT[0], TEMP[4] 13: END Ideally this would not use TEMP arrays at all though, but output arrays (I vaguely recall some radeon card doesn't support this though. Is that just outputs or also inputs ?). The solutions proposed or implemented so far all more or less incomplete, so this approach was designed in mind with both completeness and compatibility with existing code. Over all it's just an implementation of what Tom Stellard named solution #4 in this eMail thread: http://lists.freedesktop.org/archives/mesa-dev/2013-January/033264.html Please review and as usual comments are welcome, Christian. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC] Solving the TGSI indirect addressing optimization problem
Am 11.03.2013 14:47, schrieb Christoph Bumiller: On 11.03.2013 13:44, Christian König wrote: Hi everybody, this problem has been open for quite some time now, with a bunch of different opinions and sometimes even patches floating on the list. Nice, finally someone implements a proper solution. However, it seems like this isn't used for arrays in the IN and OUT files (varyings). Would it be much more work to use it there, too ? Shouldn't be to much of a problem, but I just wanted to solve temporaries first and when that's working look at all the rest. Fragment Shader inputs seem to be read with if (index == 0) return in[0] else if (index == 1) ... sequences. Well as said before it only handles temp arrays for now. That looks like the code that's generated if the driver reports to not have indirect support, do you know off hand where exactly that's handled? The glsl_to_tgsi code is unfortunately hard to read at best. And I may have spotted a bug in the following shader: in vec4 vertex[2]; in vec4 color; out vec4 value[4]; uniform int i, j; void main() { gl_Position = vertex[i]; value[0] = vertex[0]; value[1] = vertex[1]; value[2] = vec4(0.0); value[3] = vec4(0.0); value[j] = color; } gives me DCL IN[0] DCL IN[1] DCL IN[2] DCL OUT[0], POSITION DCL OUT[1], GENERIC[12] DCL OUT[2], GENERIC[13] DCL OUT[3], GENERIC[14] DCL OUT[4], GENERIC[15] DCL CONST[0..1] DCL TEMP[0..3], LOCAL DCL TEMP[4], LOCAL DCL ADDR[0] IMM[0] FLT32 {0., 0., 0., 0.} 0: UARL ADDR[0].x, CONST[1]. 1: MOV TEMP[4], IN[ADDR[0].x] (not the bug) but this is invalid as there is no IN array, just single ones 2: MOV TEMP[0], IN[0] 3: MOV TEMP[1], IN[1] 4: MOV TEMP[2], IMM[0]. 5: MOV TEMP[3], IMM[0]. 6: UARL ADDR[0].x, CONST[0]. 7: MOV TEMP[1][ADDR[0].x], IN[2] why is this TEMP[1][] ? The array seems to be the first declaration ... I numbered the declarations starting with 1 (and not 0), so I could use 0 as the SPECIAL case saying that we want to address the whole range of registers and not just one declaration. I did this just for compatibility reasons, so I could look at handling temps only, and doesn't bother to much with inputs/outputs. Well so far the patchset is just an RFC, and so I want to let the list see the patches before either implementing inputs/outputs as well or fully document such quirks/hacks. 8: MOV OUT[1], TEMP[0] 9: MOV OUT[2], TEMP[1] 10: MOV OUT[3], TEMP[2] 11: MOV OUT[4], TEMP[3] 12: MOV OUT[0], TEMP[4] 13: END Ideally this would not use TEMP arrays at all though, but output arrays (I vaguely recall some radeon card doesn't support this though. Is that just outputs or also inputs ?). More or less correct, modern radeons don't have an output register space, but instead have export instructions. In the current driver we allocate registers for temps and outputs to work around this, but in the example above it wouldn't be necessary. Inputs are just registers as well, either preloaded when starting the shader or filled in by special instructions (vector fetches, coordinate interpolation etc...). Thanks for the comments, Christian. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC] Solving the TGSI indirect addressing optimization problem
On 11.03.2013 15:38, Christian König wrote: Am 11.03.2013 14:47, schrieb Christoph Bumiller: On 11.03.2013 13:44, Christian König wrote: Hi everybody, this problem has been open for quite some time now, with a bunch of different opinions and sometimes even patches floating on the list. Nice, finally someone implements a proper solution. However, it seems like this isn't used for arrays in the IN and OUT files (varyings). Would it be much more work to use it there, too ? Shouldn't be to much of a problem, but I just wanted to solve temporaries first and when that's working look at all the rest. Fragment Shader inputs seem to be read with if (index == 0) return in[0] else if (index == 1) ... sequences. Well as said before it only handles temp arrays for now. That looks like the code that's generated if the driver reports to not have indirect support, do you know off hand where exactly that's handled? The glsl_to_tgsi code is unfortunately hard to read at best. Apologies, I didn't remember I that I didn't advertise indirect support for fragment shaders, indirect inputs would be supported though. The reason why I really want array support for inputs, too, is that input space location depends on semantic, and thus doesn't necessarily correspond to the TGSI order. Treatment of arrays should be consistent in the end, right now it looks like we're having, if you read this like C code: float temp0[4]; temp0[i] = x; but float in0, in1, in2, in3; x = in[i]; why is this TEMP[1][] ? The array seems to be the first declaration ... I numbered the declarations starting with 1 (and not 0), so I could use 0 as the SPECIAL case saying that we want to address the whole range of registers and not just one declaration. I did this just for compatibility reasons, so I could look at handling temps only, and doesn't bother to much with inputs/outputs. Well so far the patchset is just an RFC, and so I want to let the list see the patches before either implementing inputs/outputs as well or fully document such quirks/hacks. Ah, good to know. This should be documented (maybe it is and I missed it ?). At least in the comment above struct tgsi_ind_register's definition, which is what I'd look at first. Thanks again for doing this. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC] Solving the TGSI indirect addressing optimization problem
On 03/11/2013 06:44 AM, Christian König wrote: Hi everybody, this problem has been open for quite some time now, with a bunch of different opinions and sometimes even patches floating on the list. The solutions proposed or implemented so far all more or less incomplete, so this approach was designed in mind with both completeness and compatibility with existing code. Over all it's just an implementation of what Tom Stellard named solution #4 in this eMail thread: http://lists.freedesktop.org/archives/mesa-dev/2013-January/033264.html Please review and as usual comments are welcome, I still don't quite get what's going on here. In Christoph's reply, it seems he tested your patch and got TGSI code that looks like this: DCL IN[0] DCL IN[1] DCL IN[2] DCL OUT[0], POSITION DCL OUT[1], GENERIC[12] DCL OUT[2], GENERIC[13] DCL OUT[3], GENERIC[14] DCL OUT[4], GENERIC[15] DCL CONST[0..1] DCL TEMP[0..3], LOCAL DCL TEMP[4], LOCAL DCL ADDR[0] IMM[0] FLT32 {0., 0., 0., 0.} 0: UARL ADDR[0].x, CONST[1]. 1: MOV TEMP[4], IN[ADDR[0].x] (not the bug) but this is invalid as there is no IN array, just single ones 2: MOV TEMP[0], IN[0] 3: MOV TEMP[1], IN[1] 4: MOV TEMP[2], IMM[0]. 5: MOV TEMP[3], IMM[0]. 6: UARL ADDR[0].x, CONST[0]. 7: MOV TEMP[1][ADDR[0].x], IN[2] What exactly does LOCAL mean on the temp declarations? But in Jose's example, he wrote: DCL TEMP[1][0..70] DCL TEMP[2][0..7] MOV OUT[1], TEMP[1][ADDR[0].x] In this code, each chunk of temporaries has an explicit name as Marek suggested in his comments to the #4 proposal. What exactly is your proposal doing? Can you please provide some more sample TGSI code to illustrate what you're doing? And, how it would be extended for inputs/outputs? Thanks. -Brian ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC] Solving the TGSI indirect addressing optimization problem
On 11.03.2013 19:33, Brian Paul wrote: On 03/11/2013 06:44 AM, Christian König wrote: Hi everybody, this problem has been open for quite some time now, with a bunch of different opinions and sometimes even patches floating on the list. The solutions proposed or implemented so far all more or less incomplete, so this approach was designed in mind with both completeness and compatibility with existing code. Over all it's just an implementation of what Tom Stellard named solution #4 in this eMail thread: http://lists.freedesktop.org/archives/mesa-dev/2013-January/033264.html Please review and as usual comments are welcome, I still don't quite get what's going on here. In Christoph's reply, it seems he tested your patch and got TGSI code that looks like this: DCL IN[0] DCL IN[1] DCL IN[2] DCL OUT[0], POSITION DCL OUT[1], GENERIC[12] DCL OUT[2], GENERIC[13] DCL OUT[3], GENERIC[14] DCL OUT[4], GENERIC[15] DCL CONST[0..1] DCL TEMP[0..3], LOCAL DCL TEMP[4], LOCAL DCL ADDR[0] IMM[0] FLT32 {0., 0., 0., 0.} 0: UARL ADDR[0].x, CONST[1]. 1: MOV TEMP[4], IN[ADDR[0].x] (not the bug) but this is invalid as there is no IN array, just single ones 2: MOV TEMP[0], IN[0] 3: MOV TEMP[1], IN[1] 4: MOV TEMP[2], IMM[0]. 5: MOV TEMP[3], IMM[0]. 6: UARL ADDR[0].x, CONST[0]. 7: MOV TEMP[1][ADDR[0].x], IN[2] What exactly does LOCAL mean on the temp declarations? That the register isn't used for parameter passing between subroutines. Has been introduced a long time ago. See commit 2644952bd4dfa3b75112dee8dfd287a12d770705. But in Jose's example, he wrote: DCL TEMP[1][0..70] DCL TEMP[2][0..7] MOV OUT[1], TEMP[1][ADDR[0].x] In this code, each chunk of temporaries has an explicit name as Marek suggested in his comments to the #4 proposal. The point is that TEMP (and all other spaces likewise) are still a single space, i.e. without duplicate indices. The only real change is that an indirect access is supplied with the index of the declaration of which the range will be accessed. What exactly is your proposal doing? Can you please provide some more sample TGSI code to illustrate what you're doing? And, how it would be extended for inputs/outputs? Thanks. -Brian ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC] Solving the TGSI indirect addressing optimization problem
On Mon, Mar 11, 2013 at 1:44 PM, Christian König deathsim...@vodafone.de wrote: Hi everybody, this problem has been open for quite some time now, with a bunch of different opinions and sometimes even patches floating on the list. The solutions proposed or implemented so far all more or less incomplete, so this approach was designed in mind with both completeness and compatibility with existing code. Over all it's just an implementation of what Tom Stellard named solution #4 in this eMail thread: http://lists.freedesktop.org/archives/mesa-dev/2013-January/033264.html Hi Christian, this is definitely not the solution #4. According to the TGSI dump Christoph posted, it looks more like #3. The solution #4 completely changes the temporary file such that it becomes two-dimensional with the first index being a literal and the second index being either a literal or ADDR[literal], and it would always be like that regardless of whether drivers support that or not. One-dimensional indexing of TEMP is not allowed. For backward compatibility, the drivers that do not support it would only get a single array declaration TEMP[0][0..n] and TEMP[0][...] would be everywhere in the code. I don't know much about TGSI internals, so I can't review this. I'd just like to say that TGSI dumps should make sense (2D indexing should be only allowed with 2D declarations) and tgsi_text_translate should be able to do the reverse - convert the dumps back to TGSI tokens. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev