Re: [Mesa-dev] [RFC] Solving the TGSI indirect addressing optimization problem

2013-03-12 Thread Christian König

Am 12.03.2013 02:48, schrieb Marek Olšák:

On Mon, Mar 11, 2013 at 1:44 PM, Christian König
deathsim...@vodafone.de wrote:

Hi everybody,

this problem has been open for quite some time now, with a bunch of different
opinions and sometimes even patches floating on the list.

The solutions proposed or implemented so far all more or less incomplete, so
this approach was designed in mind with both completeness and compatibility
with existing code.

Over all it's just an implementation of what Tom Stellard named solution #4 in
this eMail thread: 
http://lists.freedesktop.org/archives/mesa-dev/2013-January/033264.html

Hi Christian,

this is definitely not the solution #4. According to the TGSI dump
Christoph posted, it looks more like #3.


Well, for me the main difference between proposal #3 and #4 is that #3 
tries to identify the declaration to use with the supplied offset, 
while #4 uses a completely distinct identifier for that.



The solution #4 completely changes the temporary file such that it
becomes two-dimensional with the first index being a literal and the
second index being either a literal or ADDR[literal], and it would
always be like that regardless of whether drivers support that or not.
One-dimensional indexing of TEMP is not allowed. For backward
compatibility, the drivers that do not support it would only get a
single array declaration TEMP[0][0..n] and TEMP[0][...] would be
everywhere in the code.


Ok, then I misunderstood you a bit, but I don't think the difference is 
so much.


What I'm proposing is that we have an optional ArrayID attached to 
each declaration and refer to this ArrayID in the indirect addressing 
operand. To sum it up declarations should look something like this:


DCL TEMP[0..3]// normal registers
DCL TEMP[1][4..11]// indirectly accessed array
DCL TEMP[2][12..15]// another indirectly accessed array
DCL TEMP[16..17] LOCAL// local registers

While an indirect operand might look like this:

MOV TEMP[16], TEMP[1][ADDR[0].x-13]

On the pro side for this approach is that it is compatible with all the 
existing state trackers and driver, and we don't need to generate 
different code depending on weather or not the driver supports this.



I don't know much about TGSI internals, so I can't review this. I'd
just like to say that TGSI dumps should make sense (2D indexing should
be only allowed with 2D declarations) and tgsi_text_translate should
be able to do the reverse - convert the dumps back to TGSI tokens.


Completely agree with that, and beside writing documentation testing 
this is still one of the todos with this patchset.


I have to admit that your approach looks a bit cleaner from the high 
above view. The problem with it is that it requires this additional 2D 
index on every operand, and we just don't have enough bits left for 
this. Even with my approach I need to make room for this ArrayID in the 
indirect addressing operand token, and this additional token is only 
there if the operand uses indirect adressing.


Do you think we can live with my approach or is there any major downside 
I currently don't see?


Thanks for the clarification,
Christian.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC] Solving the TGSI indirect addressing optimization problem

2013-03-12 Thread Christoph Bumiller
On 12.03.2013 10:31, Christian König wrote:
 Am 12.03.2013 02:48, schrieb Marek Olšák:
 On Mon, Mar 11, 2013 at 1:44 PM, Christian König
 deathsim...@vodafone.de wrote:
 Hi everybody,

 this problem has been open for quite some time now, with a bunch of
 different
 opinions and sometimes even patches floating on the list.

 The solutions proposed or implemented so far all more or less
 incomplete, so
 this approach was designed in mind with both completeness and
 compatibility
 with existing code.

 Over all it's just an implementation of what Tom Stellard named
 solution #4 in
 this eMail thread:
 http://lists.freedesktop.org/archives/mesa-dev/2013-January/033264.html
 Hi Christian,

 this is definitely not the solution #4. According to the TGSI dump
 Christoph posted, it looks more like #3.

 Well, for me the main difference between proposal #3 and #4 is that #3
 tries to identify the declaration to use with the supplied offset,
 while #4 uses a completely distinct identifier for that.

 The solution #4 completely changes the temporary file such that it
 becomes two-dimensional with the first index being a literal and the
 second index being either a literal or ADDR[literal], and it would
 always be like that regardless of whether drivers support that or not.
 One-dimensional indexing of TEMP is not allowed. For backward
 compatibility, the drivers that do not support it would only get a
 single array declaration TEMP[0][0..n] and TEMP[0][...] would be
 everywhere in the code.

 Ok, then I misunderstood you a bit, but I don't think the difference
 is so much.

 What I'm proposing is that we have an optional ArrayID attached to
 each declaration and refer to this ArrayID in the indirect
 addressing operand. To sum it up declarations should look something
 like this:

 DCL TEMP[0..3]// normal registers
 DCL TEMP[1][4..11]// indirectly accessed array
 DCL TEMP[2][12..15]// another indirectly accessed array
 DCL TEMP[16..17] LOCAL// local registers

 While an indirect operand might look like this:

 MOV TEMP[16], TEMP[1][ADDR[0].x-13]

 On the pro side for this approach is that it is compatible with all
 the existing state trackers and driver, and we don't need to generate
 different code depending on weather or not the driver supports this.

 I don't know much about TGSI internals, so I can't review this. I'd
 just like to say that TGSI dumps should make sense (2D indexing should
 be only allowed with 2D declarations) and tgsi_text_translate should
 be able to do the reverse - convert the dumps back to TGSI tokens.

 Completely agree with that, and beside writing documentation testing
 this is still one of the todos with this patchset.

 I have to admit that your approach looks a bit cleaner from the high
 above view. The problem with it is that it requires this additional 2D
 index on every operand, and we just don't have enough bits left for
 this. Even with my approach I need to make room for this ArrayID in
 the indirect addressing operand token, and this additional token is
 only there if the operand uses indirect adressing.

 Do you think we can live with my approach or is there any major
 downside I currently don't see?


I can live with it. I think ... (I hope I don't regret this later; seems
like this doesn't contain less information, then it's ok.)
If the placement of the hint index offends someone, just write it as
MOV TEMP[16], TEMP(1)[ADDR[0].x-13] or ...
TEMP[ADDR[0].x-13 : 1] or
TEMP[ADDR[0].x-13 supposedToBeIn [4,11]] or
something ... nicer.

Actually ...
if TEMP[0] is placed at mem[0]
and TEMP[4..1] is placed at, say, mem[0x1000 in bytes]

do I have to
load $register mem[$addr - 0xd0] (no this can't work) or
load $regsiter mem[$addr - 0xd0 + 0x1000] (if you didn't adjust the
offset) or
load $register mem[$addr - 0xd0 + 0x1000 - 0x40] (if you already added
the base TEMP to the immediate offset)

This needs to be documented as well.

 Thanks for the clarification,
 Christian.
 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC] Solving the TGSI indirect addressing optimization problem

2013-03-12 Thread Christoph Bumiller
On 12.03.2013 12:10, Christoph Bumiller wrote:
 On 12.03.2013 10:31, Christian König wrote:
 Am 12.03.2013 02:48, schrieb Marek Olšák:
 On Mon, Mar 11, 2013 at 1:44 PM, Christian König
 deathsim...@vodafone.de wrote:
 Hi everybody,

 this problem has been open for quite some time now, with a bunch of
 different
 opinions and sometimes even patches floating on the list.

 The solutions proposed or implemented so far all more or less
 incomplete, so
 this approach was designed in mind with both completeness and
 compatibility
 with existing code.

 Over all it's just an implementation of what Tom Stellard named
 solution #4 in
 this eMail thread:
 http://lists.freedesktop.org/archives/mesa-dev/2013-January/033264.html
 Hi Christian,

 this is definitely not the solution #4. According to the TGSI dump
 Christoph posted, it looks more like #3.
 Well, for me the main difference between proposal #3 and #4 is that #3
 tries to identify the declaration to use with the supplied offset,
 while #4 uses a completely distinct identifier for that.

 The solution #4 completely changes the temporary file such that it
 becomes two-dimensional with the first index being a literal and the
 second index being either a literal or ADDR[literal], and it would
 always be like that regardless of whether drivers support that or not.
 One-dimensional indexing of TEMP is not allowed. For backward
 compatibility, the drivers that do not support it would only get a
 single array declaration TEMP[0][0..n] and TEMP[0][...] would be
 everywhere in the code.
 Ok, then I misunderstood you a bit, but I don't think the difference
 is so much.

 What I'm proposing is that we have an optional ArrayID attached to
 each declaration and refer to this ArrayID in the indirect
 addressing operand. To sum it up declarations should look something
 like this:

 DCL TEMP[0..3]// normal registers
 DCL TEMP[1][4..11]// indirectly accessed array
 DCL TEMP[2][12..15]// another indirectly accessed array
 DCL TEMP[16..17] LOCAL// local registers

 While an indirect operand might look like this:

 MOV TEMP[16], TEMP[1][ADDR[0].x-13]

 On the pro side for this approach is that it is compatible with all
 the existing state trackers and driver, and we don't need to generate
 different code depending on weather or not the driver supports this.

 I don't know much about TGSI internals, so I can't review this. I'd
 just like to say that TGSI dumps should make sense (2D indexing should
 be only allowed with 2D declarations) and tgsi_text_translate should
 be able to do the reverse - convert the dumps back to TGSI tokens.
 Completely agree with that, and beside writing documentation testing
 this is still one of the todos with this patchset.

 I have to admit that your approach looks a bit cleaner from the high
 above view. The problem with it is that it requires this additional 2D
 index on every operand, and we just don't have enough bits left for
 this. Even with my approach I need to make room for this ArrayID in
 the indirect addressing operand token, and this additional token is
 only there if the operand uses indirect adressing.

 Do you think we can live with my approach or is there any major
 downside I currently don't see?


One more thing. While you're at it (i.e. are familiar with the code),
could you set the UsageMask in the TGSI declaration so we can pack
scalar or vec2 arrays ?
Also, you could then declare gl_ClipDistance outputs as

DCL OUT[0..7].x, CLIPDIST

so we can actually index clip distances properly ?

With
DCL OUT[0..1].xyzw, CLIPDIST we can't really index the individual
components which leads to
if ((index  3) == 0)
   MOV OUT[index / 4].x = value
else if ((index  3) == 1)
   MOV OUT[index / 4].y = value

which is unnecessary on some hardware.

 I can live with it. I think ... (I hope I don't regret this later; seems
 like this doesn't contain less information, then it's ok.)
 If the placement of the hint index offends someone, just write it as
 MOV TEMP[16], TEMP(1)[ADDR[0].x-13] or ...
 TEMP[ADDR[0].x-13 : 1] or
 TEMP[ADDR[0].x-13 supposedToBeIn [4,11]] or
 something ... nicer.

 Actually ...
 if TEMP[0] is placed at mem[0]
 and TEMP[4..1] is placed at, say, mem[0x1000 in bytes]

 do I have to
 load $register mem[$addr - 0xd0] (no this can't work) or
 load $regsiter mem[$addr - 0xd0 + 0x1000] (if you didn't adjust the
 offset) or
 load $register mem[$addr - 0xd0 + 0x1000 - 0x40] (if you already added
 the base TEMP to the immediate offset)

 This needs to be documented as well.

 Thanks for the clarification,
 Christian.
 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev


___
mesa-dev mailing list

Re: [Mesa-dev] [RFC] Solving the TGSI indirect addressing optimization problem

2013-03-12 Thread Marek Olšák
On Tue, Mar 12, 2013 at 10:31 AM, Christian König
deathsim...@vodafone.de wrote:
 Am 12.03.2013 02:48, schrieb Marek Olšák:

 On Mon, Mar 11, 2013 at 1:44 PM, Christian König
 deathsim...@vodafone.de wrote:

 Hi everybody,

 this problem has been open for quite some time now, with a bunch of
 different
 opinions and sometimes even patches floating on the list.

 The solutions proposed or implemented so far all more or less incomplete,
 so
 this approach was designed in mind with both completeness and
 compatibility
 with existing code.

 Over all it's just an implementation of what Tom Stellard named solution
 #4 in
 this eMail thread:
 http://lists.freedesktop.org/archives/mesa-dev/2013-January/033264.html

 Hi Christian,

 this is definitely not the solution #4. According to the TGSI dump
 Christoph posted, it looks more like #3.


 Well, for me the main difference between proposal #3 and #4 is that #3 tries
 to identify the declaration to use with the supplied offset, while #4 uses
 a completely distinct identifier for that.


 The solution #4 completely changes the temporary file such that it
 becomes two-dimensional with the first index being a literal and the
 second index being either a literal or ADDR[literal], and it would
 always be like that regardless of whether drivers support that or not.
 One-dimensional indexing of TEMP is not allowed. For backward
 compatibility, the drivers that do not support it would only get a
 single array declaration TEMP[0][0..n] and TEMP[0][...] would be
 everywhere in the code.


 Ok, then I misunderstood you a bit, but I don't think the difference is so
 much.

 What I'm proposing is that we have an optional ArrayID attached to each
 declaration and refer to this ArrayID in the indirect addressing operand.
 To sum it up declarations should look something like this:

 DCL TEMP[0..3]// normal registers
 DCL TEMP[1][4..11]// indirectly accessed array
 DCL TEMP[2][12..15]// another indirectly accessed array
 DCL TEMP[16..17] LOCAL// local registers

 While an indirect operand might look like this:

 MOV TEMP[16], TEMP[1][ADDR[0].x-13]

 On the pro side for this approach is that it is compatible with all the
 existing state trackers and driver, and we don't need to generate different
 code depending on weather or not the driver supports this.

In that case, it would be better to avoid using the operator [] and
use something else, because it has nothing to do with indexing.



 I don't know much about TGSI internals, so I can't review this. I'd
 just like to say that TGSI dumps should make sense (2D indexing should
 be only allowed with 2D declarations) and tgsi_text_translate should
 be able to do the reverse - convert the dumps back to TGSI tokens.


 Completely agree with that, and beside writing documentation testing this is
 still one of the todos with this patchset.

 I have to admit that your approach looks a bit cleaner from the high above
 view. The problem with it is that it requires this additional 2D index on
 every operand, and we just don't have enough bits left for this. Even with
 my approach I need to make room for this ArrayID in the indirect addressing
 operand token, and this additional token is only there if the operand uses
 indirect adressing.

 Do you think we can live with my approach or is there any major downside I
 currently don't see?

I'm okay with your approach. Although it doesn't appear to be very
clean, it gives the same amount of information to drivers.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC] Solving the TGSI indirect addressing optimization problem

2013-03-12 Thread Jose Fonseca


- Original Message -
 On Tue, Mar 12, 2013 at 10:31 AM, Christian König
 deathsim...@vodafone.de wrote:
  Am 12.03.2013 02:48, schrieb Marek Olšák:
 
  On Mon, Mar 11, 2013 at 1:44 PM, Christian König
  deathsim...@vodafone.de wrote:
 
  Hi everybody,
 
  this problem has been open for quite some time now, with a bunch of
  different
  opinions and sometimes even patches floating on the list.
 
  The solutions proposed or implemented so far all more or less incomplete,
  so
  this approach was designed in mind with both completeness and
  compatibility
  with existing code.
 
  Over all it's just an implementation of what Tom Stellard named solution
  #4 in
  this eMail thread:
  http://lists.freedesktop.org/archives/mesa-dev/2013-January/033264.html
 
  Hi Christian,
 
  this is definitely not the solution #4. According to the TGSI dump
  Christoph posted, it looks more like #3.
 
 
  Well, for me the main difference between proposal #3 and #4 is that #3
  tries
  to identify the declaration to use with the supplied offset, while #4
  uses
  a completely distinct identifier for that.
 
 
  The solution #4 completely changes the temporary file such that it
  becomes two-dimensional with the first index being a literal and the
  second index being either a literal or ADDR[literal], and it would
  always be like that regardless of whether drivers support that or not.
  One-dimensional indexing of TEMP is not allowed. For backward
  compatibility, the drivers that do not support it would only get a
  single array declaration TEMP[0][0..n] and TEMP[0][...] would be
  everywhere in the code.
 
 
  Ok, then I misunderstood you a bit, but I don't think the difference is so
  much.
 
  What I'm proposing is that we have an optional ArrayID attached to each
  declaration and refer to this ArrayID in the indirect addressing operand.
  To sum it up declarations should look something like this:
 
  DCL TEMP[0..3]// normal registers
  DCL TEMP[1][4..11]// indirectly accessed array
  DCL TEMP[2][12..15]// another indirectly accessed array
  DCL TEMP[16..17] LOCAL// local registers
 
  While an indirect operand might look like this:
 
  MOV TEMP[16], TEMP[1][ADDR[0].x-13]
 
  On the pro side for this approach is that it is compatible with all the
  existing state trackers and driver, and we don't need to generate different
  code depending on weather or not the driver supports this.
 
 In that case, it would be better to avoid using the operator [] and
 use something else, because it has nothing to do with indexing.

Note that [] operator is also used for const buffer selection.

Jose
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [RFC] Solving the TGSI indirect addressing optimization problem

2013-03-11 Thread Christian König
Hi everybody,

this problem has been open for quite some time now, with a bunch of different
opinions and sometimes even patches floating on the list.

The solutions proposed or implemented so far all more or less incomplete, so
this approach was designed in mind with both completeness and compatibility
with existing code.

Over all it's just an implementation of what Tom Stellard named solution #4 in
this eMail thread: 
http://lists.freedesktop.org/archives/mesa-dev/2013-January/033264.html

Please review and as usual comments are welcome,
Christian.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC] Solving the TGSI indirect addressing optimization problem

2013-03-11 Thread Christoph Bumiller
On 11.03.2013 13:44, Christian König wrote:
 Hi everybody,

 this problem has been open for quite some time now, with a bunch of different
 opinions and sometimes even patches floating on the list.
Nice, finally someone implements a proper solution.
However, it seems like this isn't used for arrays in the IN and OUT
files (varyings). Would it be much more work to use it there, too ?

Fragment Shader inputs seem to be read with if (index == 0) return
in[0] else if (index == 1) ... sequences.

And I may have spotted a bug in the following shader:

in vec4 vertex[2];
in vec4 color;
out vec4 value[4];

uniform int i, j;

void main()
{
gl_Position = vertex[i];

value[0] = vertex[0];
value[1] = vertex[1];
value[2] = vec4(0.0);
value[3] = vec4(0.0);
value[j] = color;
}

gives me

DCL IN[0]
DCL IN[1]
DCL IN[2]
DCL OUT[0], POSITION
DCL OUT[1], GENERIC[12]
DCL OUT[2], GENERIC[13]
DCL OUT[3], GENERIC[14]
DCL OUT[4], GENERIC[15]
DCL CONST[0..1]
DCL TEMP[0..3], LOCAL
DCL TEMP[4], LOCAL
DCL ADDR[0]
IMM[0] FLT32 {0., 0., 0., 0.}
  0: UARL ADDR[0].x, CONST[1].
  1: MOV TEMP[4], IN[ADDR[0].x]  (not the bug) but this is invalid as
there is no IN array, just single ones
  2: MOV TEMP[0], IN[0]
  3: MOV TEMP[1], IN[1]
  4: MOV TEMP[2], IMM[0].
  5: MOV TEMP[3], IMM[0].
  6: UARL ADDR[0].x, CONST[0].
  7: MOV TEMP[1][ADDR[0].x], IN[2]

why is this TEMP[1][] ? The array seems to be the first declaration ...

  8: MOV OUT[1], TEMP[0]
  9: MOV OUT[2], TEMP[1]
 10: MOV OUT[3], TEMP[2]
 11: MOV OUT[4], TEMP[3]
 12: MOV OUT[0], TEMP[4]
 13: END

Ideally this would not use TEMP arrays at all though, but output arrays
(I vaguely recall some radeon card doesn't support this though. Is that
just outputs or also inputs ?).

 The solutions proposed or implemented so far all more or less incomplete, so
 this approach was designed in mind with both completeness and compatibility
 with existing code.

 Over all it's just an implementation of what Tom Stellard named solution #4 in
 this eMail thread: 
 http://lists.freedesktop.org/archives/mesa-dev/2013-January/033264.html

 Please review and as usual comments are welcome,
 Christian.

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC] Solving the TGSI indirect addressing optimization problem

2013-03-11 Thread Christian König

Am 11.03.2013 14:47, schrieb Christoph Bumiller:

On 11.03.2013 13:44, Christian König wrote:

Hi everybody,

this problem has been open for quite some time now, with a bunch of different
opinions and sometimes even patches floating on the list.

Nice, finally someone implements a proper solution.
However, it seems like this isn't used for arrays in the IN and OUT
files (varyings). Would it be much more work to use it there, too ?


Shouldn't be to much of a problem, but I just wanted to solve 
temporaries first and when that's working look at all the rest.



Fragment Shader inputs seem to be read with if (index == 0) return
in[0] else if (index == 1) ... sequences.


Well as said before it only handles temp arrays for now. That looks like 
the code that's generated if the driver reports to not have indirect 
support, do you know off hand where exactly that's handled? The 
glsl_to_tgsi code is unfortunately hard to read at best.




And I may have spotted a bug in the following shader:

in vec4 vertex[2];
in vec4 color;
out vec4 value[4];

uniform int i, j;

void main()
{
 gl_Position = vertex[i];

 value[0] = vertex[0];
 value[1] = vertex[1];
 value[2] = vec4(0.0);
 value[3] = vec4(0.0);
 value[j] = color;
}

gives me

DCL IN[0]
DCL IN[1]
DCL IN[2]
DCL OUT[0], POSITION
DCL OUT[1], GENERIC[12]
DCL OUT[2], GENERIC[13]
DCL OUT[3], GENERIC[14]
DCL OUT[4], GENERIC[15]
DCL CONST[0..1]
DCL TEMP[0..3], LOCAL
DCL TEMP[4], LOCAL
DCL ADDR[0]
IMM[0] FLT32 {0., 0., 0., 0.}
   0: UARL ADDR[0].x, CONST[1].
   1: MOV TEMP[4], IN[ADDR[0].x]  (not the bug) but this is invalid as
there is no IN array, just single ones
   2: MOV TEMP[0], IN[0]
   3: MOV TEMP[1], IN[1]
   4: MOV TEMP[2], IMM[0].
   5: MOV TEMP[3], IMM[0].
   6: UARL ADDR[0].x, CONST[0].
   7: MOV TEMP[1][ADDR[0].x], IN[2]

why is this TEMP[1][] ? The array seems to be the first declaration ...


I numbered the declarations starting with 1 (and not 0), so I could use 
0 as the SPECIAL case saying that we want to address the whole range 
of registers and not just one declaration. I did this just for 
compatibility reasons, so I could look at handling temps only, and 
doesn't bother to much with inputs/outputs.


Well so far the patchset is just an RFC, and so I want to let the list 
see the patches before either implementing inputs/outputs as well or 
fully document such quirks/hacks.



   8: MOV OUT[1], TEMP[0]
   9: MOV OUT[2], TEMP[1]
  10: MOV OUT[3], TEMP[2]
  11: MOV OUT[4], TEMP[3]
  12: MOV OUT[0], TEMP[4]
  13: END

Ideally this would not use TEMP arrays at all though, but output arrays
(I vaguely recall some radeon card doesn't support this though. Is that
just outputs or also inputs ?).


More or less correct, modern radeons don't have an output register 
space, but instead have export instructions. In the current driver we 
allocate registers for temps and outputs to work around this, but in the 
example above it wouldn't be necessary.


Inputs are just registers as well, either preloaded when starting the 
shader or filled in by special instructions (vector fetches, coordinate 
interpolation etc...).


Thanks for the comments,
Christian.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC] Solving the TGSI indirect addressing optimization problem

2013-03-11 Thread Christoph Bumiller
On 11.03.2013 15:38, Christian König wrote:
 Am 11.03.2013 14:47, schrieb Christoph Bumiller:
 On 11.03.2013 13:44, Christian König wrote:
 Hi everybody,

 this problem has been open for quite some time now, with a bunch of
 different
 opinions and sometimes even patches floating on the list.
 Nice, finally someone implements a proper solution.
 However, it seems like this isn't used for arrays in the IN and OUT
 files (varyings). Would it be much more work to use it there, too ?

 Shouldn't be to much of a problem, but I just wanted to solve
 temporaries first and when that's working look at all the rest.

 Fragment Shader inputs seem to be read with if (index == 0) return
 in[0] else if (index == 1) ... sequences.

 Well as said before it only handles temp arrays for now. That looks
 like the code that's generated if the driver reports to not have
 indirect support, do you know off hand where exactly that's handled?
 The glsl_to_tgsi code is unfortunately hard to read at best.


Apologies, I didn't remember I that I didn't advertise indirect support
for fragment shaders, indirect inputs would be supported though.
The reason why I really want array support for inputs, too, is that
input space location depends on semantic, and thus doesn't necessarily
correspond to the TGSI order.

Treatment of arrays should be consistent in the end, right now it looks
like we're having, if you read this like C code:
float temp0[4];
temp0[i] = x;

but
float in0, in1, in2, in3;
x = in[i];

 why is this TEMP[1][] ? The array seems to be the first declaration ...

 I numbered the declarations starting with 1 (and not 0), so I could
 use 0 as the SPECIAL case saying that we want to address the whole
 range of registers and not just one declaration. I did this just for
 compatibility reasons, so I could look at handling temps only, and
 doesn't bother to much with inputs/outputs.

 Well so far the patchset is just an RFC, and so I want to let the list
 see the patches before either implementing inputs/outputs as well or
 fully document such quirks/hacks.


Ah, good to know. This should be documented (maybe it is and I missed it
?). At least in the comment above struct tgsi_ind_register's definition,
which is what I'd look at first.

Thanks again for doing this.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC] Solving the TGSI indirect addressing optimization problem

2013-03-11 Thread Brian Paul

On 03/11/2013 06:44 AM, Christian König wrote:

Hi everybody,

this problem has been open for quite some time now, with a bunch of different
opinions and sometimes even patches floating on the list.

The solutions proposed or implemented so far all more or less incomplete, so
this approach was designed in mind with both completeness and compatibility
with existing code.

Over all it's just an implementation of what Tom Stellard named solution #4 in
this eMail thread: 
http://lists.freedesktop.org/archives/mesa-dev/2013-January/033264.html

Please review and as usual comments are welcome,


I still don't quite get what's going on here.

In Christoph's reply, it seems he tested your patch and got TGSI code 
that looks like this:


DCL IN[0]
DCL IN[1]
DCL IN[2]
DCL OUT[0], POSITION
DCL OUT[1], GENERIC[12]
DCL OUT[2], GENERIC[13]
DCL OUT[3], GENERIC[14]
DCL OUT[4], GENERIC[15]
DCL CONST[0..1]
DCL TEMP[0..3], LOCAL
DCL TEMP[4], LOCAL
DCL ADDR[0]
IMM[0] FLT32 {0., 0., 0., 0.}
  0: UARL ADDR[0].x, CONST[1].
  1: MOV TEMP[4], IN[ADDR[0].x]  (not the bug) but this is invalid as
there is no IN array, just single ones
  2: MOV TEMP[0], IN[0]
  3: MOV TEMP[1], IN[1]
  4: MOV TEMP[2], IMM[0].
  5: MOV TEMP[3], IMM[0].
  6: UARL ADDR[0].x, CONST[0].
  7: MOV TEMP[1][ADDR[0].x], IN[2]

What exactly does LOCAL mean on the temp declarations?

But in Jose's example, he wrote:

  DCL TEMP[1][0..70]
  DCL TEMP[2][0..7]
  MOV OUT[1], TEMP[1][ADDR[0].x]

In this code, each chunk of temporaries has an explicit name as Marek 
suggested in his comments to the #4 proposal.


What exactly is your proposal doing?

Can you please provide some more sample TGSI code to illustrate what 
you're doing?  And, how it would be extended for inputs/outputs?


Thanks.

-Brian


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC] Solving the TGSI indirect addressing optimization problem

2013-03-11 Thread Christoph Bumiller
On 11.03.2013 19:33, Brian Paul wrote:
 On 03/11/2013 06:44 AM, Christian König wrote:
 Hi everybody,

 this problem has been open for quite some time now, with a bunch of
 different
 opinions and sometimes even patches floating on the list.

 The solutions proposed or implemented so far all more or less
 incomplete, so
 this approach was designed in mind with both completeness and
 compatibility
 with existing code.

 Over all it's just an implementation of what Tom Stellard named
 solution #4 in
 this eMail thread:
 http://lists.freedesktop.org/archives/mesa-dev/2013-January/033264.html

 Please review and as usual comments are welcome,

 I still don't quite get what's going on here.

 In Christoph's reply, it seems he tested your patch and got TGSI code
 that looks like this:

 DCL IN[0]
 DCL IN[1]
 DCL IN[2]
 DCL OUT[0], POSITION
 DCL OUT[1], GENERIC[12]
 DCL OUT[2], GENERIC[13]
 DCL OUT[3], GENERIC[14]
 DCL OUT[4], GENERIC[15]
 DCL CONST[0..1]
 DCL TEMP[0..3], LOCAL
 DCL TEMP[4], LOCAL
 DCL ADDR[0]
 IMM[0] FLT32 {0., 0., 0., 0.}
   0: UARL ADDR[0].x, CONST[1].
   1: MOV TEMP[4], IN[ADDR[0].x]  (not the bug) but this is invalid as
 there is no IN array, just single ones
   2: MOV TEMP[0], IN[0]
   3: MOV TEMP[1], IN[1]
   4: MOV TEMP[2], IMM[0].
   5: MOV TEMP[3], IMM[0].
   6: UARL ADDR[0].x, CONST[0].
   7: MOV TEMP[1][ADDR[0].x], IN[2]

 What exactly does LOCAL mean on the temp declarations?

That the register isn't used for parameter passing between subroutines.
Has been introduced a long time ago.
See commit 2644952bd4dfa3b75112dee8dfd287a12d770705.

 But in Jose's example, he wrote:

   DCL TEMP[1][0..70]
   DCL TEMP[2][0..7]
   MOV OUT[1], TEMP[1][ADDR[0].x]

 In this code, each chunk of temporaries has an explicit name as Marek
 suggested in his comments to the #4 proposal.


The point is that TEMP (and all other spaces likewise) are still a
single space, i.e. without duplicate indices. The only real change is
that an indirect access is supplied with the index of the declaration of
which the range will be accessed.

 What exactly is your proposal doing?

 Can you please provide some more sample TGSI code to illustrate what
 you're doing?  And, how it would be extended for inputs/outputs?

 Thanks.

 -Brian


 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC] Solving the TGSI indirect addressing optimization problem

2013-03-11 Thread Marek Olšák
On Mon, Mar 11, 2013 at 1:44 PM, Christian König
deathsim...@vodafone.de wrote:
 Hi everybody,

 this problem has been open for quite some time now, with a bunch of different
 opinions and sometimes even patches floating on the list.

 The solutions proposed or implemented so far all more or less incomplete, so
 this approach was designed in mind with both completeness and compatibility
 with existing code.

 Over all it's just an implementation of what Tom Stellard named solution #4 in
 this eMail thread: 
 http://lists.freedesktop.org/archives/mesa-dev/2013-January/033264.html

Hi Christian,

this is definitely not the solution #4. According to the TGSI dump
Christoph posted, it looks more like #3.

The solution #4 completely changes the temporary file such that it
becomes two-dimensional with the first index being a literal and the
second index being either a literal or ADDR[literal], and it would
always be like that regardless of whether drivers support that or not.
One-dimensional indexing of TEMP is not allowed. For backward
compatibility, the drivers that do not support it would only get a
single array declaration TEMP[0][0..n] and TEMP[0][...] would be
everywhere in the code.

I don't know much about TGSI internals, so I can't review this. I'd
just like to say that TGSI dumps should make sense (2D indexing should
be only allowed with 2D declarations) and tgsi_text_translate should
be able to do the reverse - convert the dumps back to TGSI tokens.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev