Re: [Mesa-dev] [PATCH 3/3] st/mesa: skip lower_output_reads when possible

2016-11-30 Thread Nicolai Hähnle

On 30.11.2016 12:43, Marek Olšák wrote:
> On Wed, Nov 30, 2016 at 9:09 AM, Nicolai Hähnle wrote:
>> On 29.11.2016 12:41, Marek Olšák wrote:
>>>
>>> For the series:
>>>
>>> Reviewed-by: Marek Olšák
>>>
>>> It was only a matter of time before this resurfaced. We used to
>>> have this, but some people didn't want it and removed it.
>>
>>
>> I can see how not reading from outputs might make life easier for some
>> drivers, but with the way we use LLVM, it was just redundant.
>>
>>
>>> I wonder if radeonsi implements output indirect indexing exactly like
>>> temps, or if there are differences.
>>
>>
>> Not quite. TCS is completely different, but in other shader stages output
>> indirect indexing uses the "fallback" path that builds an LLVM-level vector
>> out of the relevant part of the output file, then does insertelement, then
>> stores everything back to the output file.
>>
>> Temporary indirect indexing should always use the path where the array is
>> one big alloca (or one alloca per component), and we load/store from a
>> pointer into that array. The fallback path should only be used when the
>> state tracker doesn't provide ArrayIDs.
>>
>> If indirect indexing of outputs becomes a problem, we could try to
>> communicate ArrayIDs of outputs to improve that code.
>
> We do have ArrayIDs of outputs for all shader stages except for fragment.

You're right. They're currently only used to limit the size of the
temporary LLVM vector; we don't do the same alloca improvement that we do
for temporaries, though.

Nicolai

> Marek


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/3] st/mesa: skip lower_output_reads when possible

2016-11-30 Thread Marek Olšák
On Wed, Nov 30, 2016 at 9:09 AM, Nicolai Hähnle  wrote:
> On 29.11.2016 12:41, Marek Olšák wrote:
>>
>> For the series:
>>
>> Reviewed-by: Marek Olšák 
>>
>> It was only a matter of time before this resurfaced. We used to
>> have this, but some people didn't want it and removed it.
>
>
> I can see how not reading from outputs might make life easier for some
> drivers, but with the way we use LLVM, it was just redundant.
>
>
>> I wonder if radeonsi implements output indirect indexing exactly like
>> temps, or if there are differences.
>
>
> Not quite. TCS is completely different, but in other shader stages output
> indirect indexing uses the "fallback" path that builds an LLVM-level vector
> out of the relevant part of the output file, then does insertelement, then
> stores everything back to the output file.
>
> Temporary indirect indexing should always use the path where the array is
> one big alloca (or one alloca per component), and we load/store from a
> pointer into that array. The fallback path should only be used when the
> state tracker doesn't provide ArrayIDs.
>
> If indirect indexing of outputs becomes a problem, we could try to
> communicate ArrayIDs of outputs to improve that code.

We do have ArrayIDs of outputs for all shader stages except for fragment.

Marek


Re: [Mesa-dev] [PATCH 3/3] st/mesa: skip lower_output_reads when possible

2016-11-30 Thread Nicolai Hähnle

On 29.11.2016 12:41, Marek Olšák wrote:
> For the series:
>
> Reviewed-by: Marek Olšák
>
> It was only a matter of time before this resurfaced. We used to
> have this, but some people didn't want it and removed it.

I can see how not reading from outputs might make life easier for some
drivers, but with the way we use LLVM, it was just redundant.
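
For readers who don't know the pass, here is a rough C++ sketch of the
idea. It is a toy model, not Mesa code: the Reg/Inst types and the
functions lower_output_reads_sketch() and prepare_program() are made-up
names, and the bool parameter merely stands in for the
PIPE_CAP_TGSI_CAN_READ_OUTPUTS query. The assumption is that the lowering
redirects every output access to a shadow temporary and copies the
temporaries to the real outputs at the end of the program.

```cpp
#include <cassert>
#include <vector>

// Toy IR invented for this sketch (not Mesa's GLSL IR and not TGSI).
enum class File { Temp, Output };
enum class Op { Mov, Add };

struct Reg {
    File file;
    int index;
};

struct Inst {
    Op op;
    Reg dst;
    Reg src0;
    Reg src1; // ignored by Mov
};

// True if any instruction uses an output register as a source operand.
bool reads_output(const std::vector<Inst> &prog)
{
    for (const Inst &inst : prog) {
        if (inst.src0.file == File::Output)
            return true;
        if (inst.op == Op::Add && inst.src1.file == File::Output)
            return true;
    }
    return false;
}

// Assumed behaviour of the lowering: output i is shadowed by temporary
// (num_temps + i); all accesses go to the shadow, and one Mov per output
// is appended at the end to write the final value.
std::vector<Inst> lower_output_reads_sketch(std::vector<Inst> prog,
                                            int num_temps, int num_outputs)
{
    assert(num_temps >= 0 && num_outputs >= 0);
    auto shadow = [num_temps](Reg r) {
        return r.file == File::Output ? Reg{File::Temp, num_temps + r.index} : r;
    };
    for (Inst &inst : prog) {
        inst.dst = shadow(inst.dst);
        inst.src0 = shadow(inst.src0);
        inst.src1 = shadow(inst.src1);
    }
    for (int i = 0; i < num_outputs; ++i) {
        Reg tmp{File::Temp, num_temps + i};
        prog.push_back(Inst{Op::Mov, Reg{File::Output, i}, tmp, tmp});
    }
    return prog;
}

// Mirrors the shape of the patch: only lower when the driver cannot read
// outputs. can_read_outputs is a stand-in for the capability query.
std::vector<Inst> prepare_program(std::vector<Inst> prog, int num_temps,
                                  int num_outputs, bool can_read_outputs)
{
    if (!can_read_outputs)
        prog = lower_output_reads_sketch(prog, num_temps, num_outputs);
    return prog;
}
```

With can_read_outputs set, the program is passed through untouched, so a
backend that handles output reads itself avoids the extra temporaries and
the trailing copies.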




> I wonder if radeonsi implements output indirect indexing exactly like
> temps, or if there are differences.


Not quite. TCS is completely different, but in other shader stages 
output indirect indexing uses the "fallback" path that builds an 
LLVM-level vector out of the relevant part of the output file, then does 
insertelement, then stores everything back to the output file.


Temporary indirect indexing should always use the path where the array 
is one big alloca (or one alloca per component), and we load/store from 
a pointer into that array. The fallback path should only be used when 
the state tracker doesn't provide ArrayIDs.
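
To illustrate the difference between the two paths, here is a small C++
sketch. It only models the data movement and is not radeonsi or LLVM code:
RegisterFile, store_indirect_fallback() and store_indirect_pointer() are
hypothetical names, and the (first, count) slice is an assumed stand-in
for "the relevant part" of the register file. The returned write counts
are a toy metric; what LLVM actually emits depends on its optimizations.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kNumRegs = 8;
using RegisterFile = std::array<float, kNumRegs>;

// "Fallback"-style store: gather the slice [first, first + count) into a
// vector, replace one element (the insertelement step), then write the
// whole slice back. Returns the number of elements written to the file.
std::size_t store_indirect_fallback(RegisterFile &file, std::size_t first,
                                    std::size_t count, std::size_t index,
                                    float value)
{
    assert(first + count <= kNumRegs && index < count);
    std::vector<float> vec(file.begin() + first, file.begin() + first + count);
    vec[index] = value;
    std::copy(vec.begin(), vec.end(), file.begin() + first);
    return count;
}

// alloca-style store: the array lives in memory and a single element is
// written through a pointer into it. Returns the number of elements written.
std::size_t store_indirect_pointer(RegisterFile &file, std::size_t first,
                                   std::size_t count, std::size_t index,
                                   float value)
{
    assert(first + count <= kNumRegs && index < count);
    float *base = file.data() + first;
    base[index] = value;
    return 1;
}
```

Both functions leave the register file in the same state; the fallback
variant just touches every element of the slice to update one of them.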


If indirect indexing of outputs becomes a problem, we could try to 
communicate ArrayIDs of outputs to improve that code.


Nicolai

> Marek
>
>
> On Tue, Nov 29, 2016 at 12:01 PM, Nicolai Hähnle wrote:
>> From: Nicolai Hähnle
>>
>> ---
>>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>> index 8a247ea..7720edf 100644
>> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>> @@ -6438,21 +6438,22 @@ get_mesa_program_tgsi(struct gl_context *ctx,
>>
>>     v->have_sqrt = pscreen->get_shader_param(pscreen, ptarget,
>>                                              PIPE_SHADER_CAP_TGSI_SQRT_SUPPORTED);
>>     v->have_fma = pscreen->get_shader_param(pscreen, ptarget,
>>                                             PIPE_SHADER_CAP_TGSI_FMA_SUPPORTED);
>>
>>     _mesa_generate_parameters_list_for_uniforms(shader_program, shader,
>>                                                 prog->Parameters);
>>
>>     /* Remove reads from output registers. */
>> -   lower_output_reads(shader->Stage, shader->ir);
>> +   if (!pscreen->get_param(pscreen, PIPE_CAP_TGSI_CAN_READ_OUTPUTS))
>> +      lower_output_reads(shader->Stage, shader->ir);
>>
>>     /* Emit intermediate IR for main(). */
>>     visit_exec_list(shader->ir, v);
>>
>>  #if 0
>>     /* Print out some information (for debugging purposes) used by the
>>      * optimization passes. */
>>     {
>>        int i;
>>        int *first_writes = rzalloc_array(v->mem_ctx, int, v->next_temp);
>> --
>> 2.7.4



Re: [Mesa-dev] [PATCH 3/3] st/mesa: skip lower_output_reads when possible

2016-11-29 Thread Marek Olšák
For the series:

Reviewed-by: Marek Olšák 

It was only a matter of time before this resurfaced. We used to
have this, but some people didn't want it and removed it.

I wonder if radeonsi implements output indirect indexing exactly like
temps, or if there are differences.

Marek


On Tue, Nov 29, 2016 at 12:01 PM, Nicolai Hähnle  wrote:
> From: Nicolai Hähnle 
>
> ---
>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> index 8a247ea..7720edf 100644
> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> @@ -6438,21 +6438,22 @@ get_mesa_program_tgsi(struct gl_context *ctx,
>
>     v->have_sqrt = pscreen->get_shader_param(pscreen, ptarget,
>                                              PIPE_SHADER_CAP_TGSI_SQRT_SUPPORTED);
>     v->have_fma = pscreen->get_shader_param(pscreen, ptarget,
>                                             PIPE_SHADER_CAP_TGSI_FMA_SUPPORTED);
>
>     _mesa_generate_parameters_list_for_uniforms(shader_program, shader,
>                                                 prog->Parameters);
>
>     /* Remove reads from output registers. */
> -   lower_output_reads(shader->Stage, shader->ir);
> +   if (!pscreen->get_param(pscreen, PIPE_CAP_TGSI_CAN_READ_OUTPUTS))
> +      lower_output_reads(shader->Stage, shader->ir);
>
>     /* Emit intermediate IR for main(). */
>     visit_exec_list(shader->ir, v);
>
>  #if 0
>     /* Print out some information (for debugging purposes) used by the
>      * optimization passes. */
>     {
>        int i;
>        int *first_writes = rzalloc_array(v->mem_ctx, int, v->next_temp);
> --
> 2.7.4