Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats

2016-04-29 Thread Marek Olšák
On Fri, Apr 29, 2016 at 11:42 AM, Ian Romanick  wrote:
> On 02/09/2016 12:02 AM, Ian Romanick wrote:
>> I submitted a public spec bug for this issue:
>>
>> https://www.khronos.org/bugzilla/show_bug.cgi?id=1460
>>
>> I'm investigating whether a similar bug is needed for the SPIR-V
>> specification.
>>
>> I think an argument can be made for either the flush-to-zero or
>> non-flush-to-zero behavior in the case of unpackHalf2x16 and (possibly)
>> packHalf2x16.  The only place in the GLSL 4.50.5 specification that
>> mentions subnormal values is section 4.7.1 (Range and Precision).
>>
>> "The precision of stored single- and double-precision floating-point
>> variables is defined by the IEEE 754 standard for 32-bit and 64-bit
>> floating-point numbers... Any denormalized value input into a
>> shader or potentially generated by any operation in a shader can be
>> flushed to 0."
>>
>> Since there is no half-precision type in desktop GLSL, there is no
>> mention of 16-bit subnormal values.  As Roland mentioned before, all
>> 16-bit subnormal values are 32-bit normal values.
>>
>> As I mentioned before, from the point of view of an application
>> developer, the flush-to-zero behavior for unpackHalf2x16 is both
>> surprising and awful. :)
>>
>> While I think an argument can be made for either behavior, I also think
>> the argument for the non-flush-to-zero behavior is slightly stronger.
>> The case for flush-to-zero based on the above spec quotation fails for
>> two reasons.  First, the "input into [the] shader" is not a subnormal
>> number.  It is an integer.  Second, the "[value] potentially generated
>> by [the] operation" is not subnormal in single-precision.
>>
>> We've already determined that NVIDIA closed-source drivers do not flush
>> to zero.  I'm curious to know what AMD's closed-source drivers do for
>> 16-bit subnormal values supplied to unpackHalf2x16.  If they do not
>> flush to zero, then you had better believe that applications depend on
>> that behavior... and that also means that it doesn't matter very much
>> what piglit does or the spec does (or does not) say.  This is the sort
>> of situation where the spec changes to match application expectations
>> and shipping implementations... and Mesa drivers change to follow.  This
>> isn't even close to the first time through that loop.
>
> There is finally a conclusion to this issue.  The GLSL ES 3.2 spec says:
>
> "Returns a two-component floating-point vector with components
> obtained by unpacking a 32-bit unsigned integer into a pair of
> 16-bit values, interpreting those values as 16-bit floating-point
> numbers according to the OpenGL ES Specification, and converting
> them to 32-bit floating-point values."
>
> Since the OpenGL ES specification allows 16-bit floating-point denorms
> to flush to zero, the "interpreting those values as 16-bit
> floating-point numbers according to the OpenGL ES Specification" allows
> the flush to zero.
>
> The bug has been closed with the following comment:
>
>> For current versions and hardware, we have decided both ways have to
>> be supported: it is okay to flush to zero and it is okay to not flush to
>> zero.
>>
>> Due to the general desire to preserve values, we are looking at
>> future versions revisiting this current state.
>
> So... flushing is okay now, but it may not be okay in the future.  I don't think
> that helps answer the question about whether or not we should keep the
> piglit test that expects non-flush-to-zero behavior.
>

Thanks for the info. Since radeonsi now preserves 16-bit denorms on
all chips, we should be fine.

Marek


Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats

2016-04-29 Thread Ian Romanick
On 02/09/2016 12:02 AM, Ian Romanick wrote:
> I submitted a public spec bug for this issue:
> 
> https://www.khronos.org/bugzilla/show_bug.cgi?id=1460
> 
> I'm investigating whether a similar bug is needed for the SPIR-V
> specification.
> 
> I think an argument can be made for either the flush-to-zero or
> non-flush-to-zero behavior in the case of unpackHalf2x16 and (possibly)
> packHalf2x16.  The only place in the GLSL 4.50.5 specification that
> mentions subnormal values is section 4.7.1 (Range and Precision).
> 
> "The precision of stored single- and double-precision floating-point
> variables is defined by the IEEE 754 standard for 32-bit and 64-bit
> floating-point numbers... Any denormalized value input into a
> shader or potentially generated by any operation in a shader can be
> flushed to 0."
> 
> Since there is no half-precision type in desktop GLSL, there is no
> mention of 16-bit subnormal values.  As Roland mentioned before, all
> 16-bit subnormal values are 32-bit normal values.
> 
> As I mentioned before, from the point of view of an application
> developer, the flush-to-zero behavior for unpackHalf2x16 is both
> surprising and awful. :)
> 
> While I think an argument can be made for either behavior, I also think
> the argument for the non-flush-to-zero behavior is slightly stronger.
> The case for flush-to-zero based on the above spec quotation fails for
> two reasons.  First, the "input into [the] shader" is not a subnormal
> number.  It is an integer.  Second, the "[value] potentially generated
> by [the] operation" is not subnormal in single-precision.
> 
> We've already determined that NVIDIA closed-source drivers do not flush
> to zero.  I'm curious to know what AMD's closed-source drivers do for
> 16-bit subnormal values supplied to unpackHalf2x16.  If they do not
> flush to zero, then you had better believe that applications depend on
> that behavior... and that also means that it doesn't matter very much
> what piglit does or the spec does (or does not) say.  This is the sort
> of situation where the spec changes to match application expectations
> and shipping implementations... and Mesa drivers change to follow.  This
> isn't even close to the first time through that loop.

There is finally a conclusion to this issue.  The GLSL ES 3.2 spec says:

"Returns a two-component floating-point vector with components
obtained by unpacking a 32-bit unsigned integer into a pair of
16-bit values, interpreting those values as 16-bit floating-point
numbers according to the OpenGL ES Specification, and converting
them to 32-bit floating-point values."

Since the OpenGL ES specification allows 16-bit floating-point denorms
to flush to zero, the "interpreting those values as 16-bit
floating-point numbers according to the OpenGL ES Specification" allows
the flush to zero.
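
For illustration, here is a minimal C sketch of the two permitted behaviors,
written directly from the binary16 format (the helper names are mine, not
from any driver):

#include <stdint.h>
#include <math.h>

/* Decode one IEEE 754 binary16 value. Pass ftz = 1 to model the
 * flush-to-zero behavior the ES wording permits. */
static float half_to_float(uint16_t h, int ftz)
{
    float sign = (h & 0x8000) ? -1.0f : 1.0f;
    int exp  = (h >> 10) & 0x1f;
    int mant = h & 0x3ff;

    if (exp == 0) {
        if (mant == 0 || ftz)
            return sign * 0.0f;
        return sign * ldexpf((float)mant, -24); /* subnormal: mant * 2^-24 */
    }
    if (exp == 31)
        return mant ? nanf("") : sign * INFINITY;
    return sign * ldexpf(1.0f + mant / 1024.0f, exp - 15); /* normal */
}

/* unpackHalf2x16: low 16 bits -> .x, high 16 bits -> .y */
static void unpack_half_2x16(uint32_t v, float out[2], int ftz)
{
    out[0] = half_to_float((uint16_t)(v & 0xffff), ftz);
    out[1] = half_to_float((uint16_t)(v >> 16), ftz);
}

With ftz = 0 this is the NVIDIA behavior discussed below; with ftz = 1 it is
the flushing behavior the spec now also tolerates.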

The bug has been closed with the following comment:

> For current versions and hardware, we have decided both ways have to
> be supported: it is okay to flush to zero and it is okay to not flush to
> zero.
>
> Due to the general desire to preserve values, we are looking at
> future versions revisiting this current state.

So... flushing is okay now, but it may not be okay in the future.  I don't think
that helps answer the question about whether or not we should keep the
piglit test that expects non-flush-to-zero behavior.



Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats

2016-02-09 Thread Matt Arsenault

> On Feb 9, 2016, at 11:23, Tom Stellard  wrote:
> 
> We should still add +fp64-denormals even if the backend doesn't do
> anything with it now.

This is the default, so it doesn’t really matter anyway.

-Matt


Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats

2016-02-09 Thread Tom Stellard
On Mon, Feb 08, 2016 at 09:38:32PM +0100, Marek Olšák wrote:
> On Mon, Feb 8, 2016 at 5:08 PM, Tom Stellard  wrote:
> > On Sat, Feb 06, 2016 at 01:15:42PM +0100, Marek Olšák wrote:
> >> From: Marek Olšák 
> >>
> >> This fixes FP16 conversion instructions for VI, which has 16-bit floats,
> >> but not SI & CI, which can't disable denorms for those instructions.
> >
> > Do you know why this fixes FP16 conversions?  What does the OpenGL
> > spec say about denormal handling?
> 
> Yes, I know why. The patch explains everything as far as I can see
> though. What isn't clear?
> 

I got it now.

> SI & CI: Don't support FP16. FP16 conversions are hardcoded to emit
> and accept FP16 denormals.
> VI: Supports FP16. FP16 denormal support is now configurable and
> affects FP16 conversions as well (shared setting with FP64).
> 
> OpenGL doesn't require denormals. Piglit does. I think this is
> incorrect piglit behavior.
> 
> >
> >> ---
> >>  src/gallium/drivers/radeonsi/si_shader.c| 14 ++
> >>  src/gallium/drivers/radeonsi/si_state_shaders.c | 18 --
> >>  src/gallium/drivers/radeonsi/sid.h  |  3 +++
> >>  3 files changed, 29 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
> >> b/src/gallium/drivers/radeonsi/si_shader.c
> >> index a4680ce..3f1db70 100644
> >> --- a/src/gallium/drivers/radeonsi/si_shader.c
> >> +++ b/src/gallium/drivers/radeonsi/si_shader.c
> >> @@ -4155,6 +4155,20 @@ int si_compile_llvm(struct si_screen *sscreen,
> >>
> >>   si_shader_binary_read_config(binary, conf, 0);
> >>
> >> + /* Enable 64-bit and 16-bit denormals, because there is no 
> >> performance
> >> +  * cost.
> >> +  *
> >> +  * If denormals are enabled, all floating-point output modifiers are
> >> +  * ignored.
> >> +  *
> >> +  * Don't enable denormals for 32-bit floats, because:
> >> +  * - Floating-point output modifiers would be ignored by the hw.
> >> +  * - Some opcodes don't support denormals, such as v_mad_f32. We 
> >> would
> >> +  *   have to stop using those.
> >> +  * - SI & CI would be very slow.
> >> +  */
> >> + conf->float_mode |= V_00B028_FP_64_DENORMS;
> >> +
> >
> > Do SI/CI support fp64 denorms?  If so, won't this hurt performance?
> 
> Yes, they do. Fp64 denorms don't hurt performance. Only fp32 denorms
> do on SI & CI.
> 
> >
> > We should tell the compiler we are enabling fp-64 denorms by adding
> > +fp64-denormals to the feature string.  It would also be better to
> > read the float_mode value from the config registers emitted by the
> > compiler.
> 
> Yes, I agree, but LLVM only sets these parameters for compute or even
> HSA-only kernels, not for graphics shaders. We need to set the mode
> for all users _now_, not in 6 months. Last time I looked,
> +fp64-denormals had no effect on graphics shaders.
> 

We should still add +fp64-denormals even if the backend doesn't do
anything with it now.  This will make it easier if we have to use this
feature string to enable a fix for a bug in the backend,
because we will just be able to update LLVM.

I don't have a problem hard-coding float_mode for now, but once LLVM is
emitting the correct thing, we should pull the value from LLVM.
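
As a rough sketch of what pulling the value from the compiler could look
like, here is one way to scan the (register, value) pairs a config section
carries. The register offset (0x00B028, SPI_SHADER_PGM_RSRC1_PS), the
FLOAT_MODE position (bits [19:12] of RSRC1) and the 0xc0 fallback (my
reading of V_00B028_FP_64_DENORMS) are assumptions, not the actual
si_shader_binary_read_config code:

#include <stdint.h>

struct reg_pair { uint32_t reg, value; };

static unsigned float_mode_from_config(const struct reg_pair *cfg, int n)
{
    for (int i = 0; i < n; i++) {
        /* assumed: 0x00B028 = SPI_SHADER_PGM_RSRC1_PS,
         * FLOAT_MODE in bits [19:12] of RSRC1 */
        if (cfg[i].reg == 0x00B028)
            return (cfg[i].value >> 12) & 0xff;
    }
    return 0xc0; /* assumed fallback: fp64/fp16 denorms on, as the patch hardcodes */
}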

-Tom

> Marek


Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats

2016-02-08 Thread Matt Arsenault

> On Feb 8, 2016, at 08:08, Tom Stellard  wrote:
> 
> Do SI/CI support fp64 denorms?  If so, won't this hurt performance?
This is the only mode that should ever be used. I’m not sure why these are 
options. There technically are separate flush on input or flush on output 
options, but I’m not sure why they would be used.
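
For reference, the four per-format settings being talked about here, with
names as I recall them from LLVM's SIDefines.h (treat the exact values as a
sketch and check the ISA docs before relying on them):

/* Each 2-bit FP_DENORM field: bit 0 = allow input denorms,
 * bit 1 = allow output denorms. */
enum fp_denorm_mode {
    FP_DENORM_FLUSH_IN_FLUSH_OUT = 0, /* flush denormal inputs and results */
    FP_DENORM_FLUSH_OUT          = 1, /* keep denormal inputs, flush results */
    FP_DENORM_FLUSH_IN           = 2, /* flush denormal inputs, keep results */
    FP_DENORM_FLUSH_NONE         = 3, /* full IEEE behavior: keep both */
};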


Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats

2016-02-08 Thread Matt Arsenault

> On Feb 8, 2016, at 08:08, Tom Stellard  wrote:
> 
> Do SI/CI support fp64 denorms?  If so, won't this hurt performance?
> 
> We should tell the compiler we are enabling fp-64 denorms by adding
> +fp64-denormals to the feature string.  It would also be better to
> read the float_mode value from the config registers emitted by the
> compiler.

Yes, the runtime here should read the value out of the binary and enable it in
the compiler, rather than the runtime hardcoding it. If you wanted to load a
shader with different FP rules, for example, it should be able to switch.

-Matt


Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats

2016-02-08 Thread Tom Stellard
On Sat, Feb 06, 2016 at 01:15:42PM +0100, Marek Olšák wrote:
> From: Marek Olšák 
> 
> This fixes FP16 conversion instructions for VI, which has 16-bit floats,
> but not SI & CI, which can't disable denorms for those instructions.

Do you know why this fixes FP16 conversions?  What does the OpenGL
spec say about denormal handling?

> ---
>  src/gallium/drivers/radeonsi/si_shader.c| 14 ++
>  src/gallium/drivers/radeonsi/si_state_shaders.c | 18 --
>  src/gallium/drivers/radeonsi/sid.h  |  3 +++
>  3 files changed, 29 insertions(+), 6 deletions(-)
> 
> diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
> b/src/gallium/drivers/radeonsi/si_shader.c
> index a4680ce..3f1db70 100644
> --- a/src/gallium/drivers/radeonsi/si_shader.c
> +++ b/src/gallium/drivers/radeonsi/si_shader.c
> @@ -4155,6 +4155,20 @@ int si_compile_llvm(struct si_screen *sscreen,
>  
>   si_shader_binary_read_config(binary, conf, 0);
>  
> + /* Enable 64-bit and 16-bit denormals, because there is no performance
> +  * cost.
> +  *
> +  * If denormals are enabled, all floating-point output modifiers are
> +  * ignored.
> +  *
> +  * Don't enable denormals for 32-bit floats, because:
> +  * - Floating-point output modifiers would be ignored by the hw.
> +  * - Some opcodes don't support denormals, such as v_mad_f32. We would
> +  *   have to stop using those.
> +  * - SI & CI would be very slow.
> +  */
> + conf->float_mode |= V_00B028_FP_64_DENORMS;
> +

Do SI/CI support fp64 denorms?  If so, won't this hurt performance?

We should tell the compiler we are enabling fp-64 denorms by adding
+fp64-denormals to the feature string.  It would also be better to
read the float_mode value from the config registers emitted by the
compiler.


-Tom
>   FREE(binary->config);
>   FREE(binary->global_symbol_offsets);
>   binary->config = NULL;
> diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c 
> b/src/gallium/drivers/radeonsi/si_state_shaders.c
> index ce795c0..77a4e47 100644
> --- a/src/gallium/drivers/radeonsi/si_state_shaders.c
> +++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
> @@ -124,7 +124,8 @@ static void si_shader_ls(struct si_shader *shader)
>   shader->config.rsrc1 = S_00B528_VGPRS((shader->config.num_vgprs - 1) / 
> 4) |
>  S_00B528_SGPRS((num_sgprs - 1) / 8) |
>  S_00B528_VGPR_COMP_CNT(vgpr_comp_cnt) |
> -S_00B528_DX10_CLAMP(1);
> +S_00B528_DX10_CLAMP(1) |
> +S_00B528_FLOAT_MODE(shader->config.float_mode);
>   shader->config.rsrc2 = S_00B52C_USER_SGPR(num_user_sgprs) |
>  
> S_00B52C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0);
>  }
> @@ -157,7 +158,8 @@ static void si_shader_hs(struct si_shader *shader)
>   si_pm4_set_reg(pm4, R_00B428_SPI_SHADER_PGM_RSRC1_HS,
>  S_00B428_VGPRS((shader->config.num_vgprs - 1) / 4) |
>  S_00B428_SGPRS((num_sgprs - 1) / 8) |
> -S_00B428_DX10_CLAMP(1));
> +S_00B428_DX10_CLAMP(1) |
> +S_00B428_FLOAT_MODE(shader->config.float_mode));
>   si_pm4_set_reg(pm4, R_00B42C_SPI_SHADER_PGM_RSRC2_HS,
>  S_00B42C_USER_SGPR(num_user_sgprs) |
>  
> S_00B42C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0));
> @@ -203,7 +205,8 @@ static void si_shader_es(struct si_shader *shader)
>  S_00B328_VGPRS((shader->config.num_vgprs - 1) / 4) |
>  S_00B328_SGPRS((num_sgprs - 1) / 8) |
>  S_00B328_VGPR_COMP_CNT(vgpr_comp_cnt) |
> -S_00B328_DX10_CLAMP(1));
> +S_00B328_DX10_CLAMP(1) |
> +S_00B328_FLOAT_MODE(shader->config.float_mode));
>   si_pm4_set_reg(pm4, R_00B32C_SPI_SHADER_PGM_RSRC2_ES,
>  S_00B32C_USER_SGPR(num_user_sgprs) |
>  
> S_00B32C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0));
> @@ -292,7 +295,8 @@ static void si_shader_gs(struct si_shader *shader)
>   si_pm4_set_reg(pm4, R_00B228_SPI_SHADER_PGM_RSRC1_GS,
>  S_00B228_VGPRS((shader->config.num_vgprs - 1) / 4) |
>  S_00B228_SGPRS((num_sgprs - 1) / 8) |
> -S_00B228_DX10_CLAMP(1));
> +S_00B228_DX10_CLAMP(1) |
> +S_00B228_FLOAT_MODE(shader->config.float_mode));
>   si_pm4_set_reg(pm4, R_00B22C_SPI_SHADER_PGM_RSRC2_GS,
>  S_00B22C_USER_SGPR(num_user_sgprs) |
>  
> S_00B22C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0));
> @@ -381,7 +385,8 @@ static void si_shader_vs(struct si_shader *shader, struct 
> si_shader *gs)
>  S_00B128_VGPRS((shader->config.num_vgprs - 1) / 4) |
>  

Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats

2016-02-08 Thread Marek Olšák
On Mon, Feb 8, 2016 at 1:11 PM, Roland Scheidegger  wrote:
> This looks good to me, albeit I know nothing about the hw.
> So VI could do (just with some restrictions) even full-speed fp32 denorms
> whereas SI/CI can't? Interesting, I suppose that would be intended for

Yes, VI has full-speed fp32 denorms except for a few instructions
(e.g. MAD) which can be replaced by other instructions.

Marek


Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats

2016-02-08 Thread Roland Scheidegger
This looks good to me, albeit I know nothing about the hw.
So VI could do (just with some restrictions) even full-speed fp32 denorms
whereas SI/CI can't? Interesting, I suppose that would be intended for
compute. Intel x86 can't even do that (actually, I think Skylake can),
though certainly other CPUs have been able to do that for ages.

(Albeit there's still nothing in the glsl spec which says this is
required for fp16 pack...)

Roland

Am 06.02.2016 um 13:15 schrieb Marek Olšák:
> From: Marek Olšák 
> 
> This fixes FP16 conversion instructions for VI, which has 16-bit floats,
> but not SI & CI, which can't disable denorms for those instructions.
> ---
>  src/gallium/drivers/radeonsi/si_shader.c| 14 ++
>  src/gallium/drivers/radeonsi/si_state_shaders.c | 18 --
>  src/gallium/drivers/radeonsi/sid.h  |  3 +++
>  3 files changed, 29 insertions(+), 6 deletions(-)
> 
> diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
> b/src/gallium/drivers/radeonsi/si_shader.c
> index a4680ce..3f1db70 100644
> --- a/src/gallium/drivers/radeonsi/si_shader.c
> +++ b/src/gallium/drivers/radeonsi/si_shader.c
> @@ -4155,6 +4155,20 @@ int si_compile_llvm(struct si_screen *sscreen,
>  
>   si_shader_binary_read_config(binary, conf, 0);
>  
> + /* Enable 64-bit and 16-bit denormals, because there is no performance
> +  * cost.
> +  *
> +  * If denormals are enabled, all floating-point output modifiers are
> +  * ignored.
> +  *
> +  * Don't enable denormals for 32-bit floats, because:
> +  * - Floating-point output modifiers would be ignored by the hw.
> +  * - Some opcodes don't support denormals, such as v_mad_f32. We would
> +  *   have to stop using those.
> +  * - SI & CI would be very slow.
> +  */
> + conf->float_mode |= V_00B028_FP_64_DENORMS;
> +
>   FREE(binary->config);
>   FREE(binary->global_symbol_offsets);
>   binary->config = NULL;
> diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c 
> b/src/gallium/drivers/radeonsi/si_state_shaders.c
> index ce795c0..77a4e47 100644
> --- a/src/gallium/drivers/radeonsi/si_state_shaders.c
> +++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
> @@ -124,7 +124,8 @@ static void si_shader_ls(struct si_shader *shader)
>   shader->config.rsrc1 = S_00B528_VGPRS((shader->config.num_vgprs - 1) / 
> 4) |
>  S_00B528_SGPRS((num_sgprs - 1) / 8) |
>  S_00B528_VGPR_COMP_CNT(vgpr_comp_cnt) |
> -S_00B528_DX10_CLAMP(1);
> +S_00B528_DX10_CLAMP(1) |
> +S_00B528_FLOAT_MODE(shader->config.float_mode);
>   shader->config.rsrc2 = S_00B52C_USER_SGPR(num_user_sgprs) |
>  
> S_00B52C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0);
>  }
> @@ -157,7 +158,8 @@ static void si_shader_hs(struct si_shader *shader)
>   si_pm4_set_reg(pm4, R_00B428_SPI_SHADER_PGM_RSRC1_HS,
>  S_00B428_VGPRS((shader->config.num_vgprs - 1) / 4) |
>  S_00B428_SGPRS((num_sgprs - 1) / 8) |
> -S_00B428_DX10_CLAMP(1));
> +S_00B428_DX10_CLAMP(1) |
> +S_00B428_FLOAT_MODE(shader->config.float_mode));
>   si_pm4_set_reg(pm4, R_00B42C_SPI_SHADER_PGM_RSRC2_HS,
>  S_00B42C_USER_SGPR(num_user_sgprs) |
>  
> S_00B42C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0));
> @@ -203,7 +205,8 @@ static void si_shader_es(struct si_shader *shader)
>  S_00B328_VGPRS((shader->config.num_vgprs - 1) / 4) |
>  S_00B328_SGPRS((num_sgprs - 1) / 8) |
>  S_00B328_VGPR_COMP_CNT(vgpr_comp_cnt) |
> -S_00B328_DX10_CLAMP(1));
> +S_00B328_DX10_CLAMP(1) |
> +S_00B328_FLOAT_MODE(shader->config.float_mode));
>   si_pm4_set_reg(pm4, R_00B32C_SPI_SHADER_PGM_RSRC2_ES,
>  S_00B32C_USER_SGPR(num_user_sgprs) |
>  
> S_00B32C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0));
> @@ -292,7 +295,8 @@ static void si_shader_gs(struct si_shader *shader)
>   si_pm4_set_reg(pm4, R_00B228_SPI_SHADER_PGM_RSRC1_GS,
>  S_00B228_VGPRS((shader->config.num_vgprs - 1) / 4) |
>  S_00B228_SGPRS((num_sgprs - 1) / 8) |
> -S_00B228_DX10_CLAMP(1));
> +S_00B228_DX10_CLAMP(1) |
> +S_00B228_FLOAT_MODE(shader->config.float_mode));
>   si_pm4_set_reg(pm4, R_00B22C_SPI_SHADER_PGM_RSRC2_GS,
>  S_00B22C_USER_SGPR(num_user_sgprs) |
>  
> S_00B22C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0));
> @@ -381,7 +385,8 @@ static void si_shader_vs(struct si_shader *shader, struct 
> si_shader *gs)
>  

Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats

2016-02-08 Thread Matt Arsenault

> On Feb 8, 2016, at 12:38, Marek Olšák  wrote:
> 
>> 
>> We should tell the compiler we are enabling fp-64 denorms by adding
>> +fp64-denormals to the feature string.  It would also be better to
>> read the float_mode value from the config registers emitted by the
>> compiler.
> 
> Yes, I agree, but LLVM only sets these parameters for compute or even
> HSA-only kernels, not for graphics shaders. We need to set the mode
> for all users _now_, not in 6 months. Last time I looked,
> +fp64-denormals had no effect on graphics shaders.

This is a bug. I think I left these because the config register macro names 
were different for the other shader types, even though they appeared to be the 
same thing.

-Matt


Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats

2016-02-08 Thread Nicolai Hähnle

On 06.02.2016 07:15, Marek Olšák wrote:

From: Marek Olšák 

This fixes FP16 conversion instructions for VI, which has 16-bit floats,
but not SI & CI, which can't disable denorms for those instructions.


Reviewed-by: Nicolai Hähnle 


---
  src/gallium/drivers/radeonsi/si_shader.c| 14 ++
  src/gallium/drivers/radeonsi/si_state_shaders.c | 18 --
  src/gallium/drivers/radeonsi/sid.h  |  3 +++
  3 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index a4680ce..3f1db70 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -4155,6 +4155,20 @@ int si_compile_llvm(struct si_screen *sscreen,

si_shader_binary_read_config(binary, conf, 0);

+   /* Enable 64-bit and 16-bit denormals, because there is no performance
+* cost.
+*
+* If denormals are enabled, all floating-point output modifiers are
+* ignored.
+*
+* Don't enable denormals for 32-bit floats, because:
+* - Floating-point output modifiers would be ignored by the hw.
+* - Some opcodes don't support denormals, such as v_mad_f32. We would
+*   have to stop using those.
+* - SI & CI would be very slow.
+*/
+   conf->float_mode |= V_00B028_FP_64_DENORMS;
+
FREE(binary->config);
FREE(binary->global_symbol_offsets);
binary->config = NULL;
diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c 
b/src/gallium/drivers/radeonsi/si_state_shaders.c
index ce795c0..77a4e47 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -124,7 +124,8 @@ static void si_shader_ls(struct si_shader *shader)
shader->config.rsrc1 = S_00B528_VGPRS((shader->config.num_vgprs - 1) / 
4) |
   S_00B528_SGPRS((num_sgprs - 1) / 8) |
   S_00B528_VGPR_COMP_CNT(vgpr_comp_cnt) |
-  S_00B528_DX10_CLAMP(1);
+  S_00B528_DX10_CLAMP(1) |
+  S_00B528_FLOAT_MODE(shader->config.float_mode);
shader->config.rsrc2 = S_00B52C_USER_SGPR(num_user_sgprs) |
   
S_00B52C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0);
  }
@@ -157,7 +158,8 @@ static void si_shader_hs(struct si_shader *shader)
si_pm4_set_reg(pm4, R_00B428_SPI_SHADER_PGM_RSRC1_HS,
   S_00B428_VGPRS((shader->config.num_vgprs - 1) / 4) |
   S_00B428_SGPRS((num_sgprs - 1) / 8) |
-  S_00B428_DX10_CLAMP(1));
+  S_00B428_DX10_CLAMP(1) |
+  S_00B428_FLOAT_MODE(shader->config.float_mode));
si_pm4_set_reg(pm4, R_00B42C_SPI_SHADER_PGM_RSRC2_HS,
   S_00B42C_USER_SGPR(num_user_sgprs) |
   S_00B42C_SCRATCH_EN(shader->config.scratch_bytes_per_wave 
> 0));
@@ -203,7 +205,8 @@ static void si_shader_es(struct si_shader *shader)
   S_00B328_VGPRS((shader->config.num_vgprs - 1) / 4) |
   S_00B328_SGPRS((num_sgprs - 1) / 8) |
   S_00B328_VGPR_COMP_CNT(vgpr_comp_cnt) |
-  S_00B328_DX10_CLAMP(1));
+  S_00B328_DX10_CLAMP(1) |
+  S_00B328_FLOAT_MODE(shader->config.float_mode));
si_pm4_set_reg(pm4, R_00B32C_SPI_SHADER_PGM_RSRC2_ES,
   S_00B32C_USER_SGPR(num_user_sgprs) |
   S_00B32C_SCRATCH_EN(shader->config.scratch_bytes_per_wave 
> 0));
@@ -292,7 +295,8 @@ static void si_shader_gs(struct si_shader *shader)
si_pm4_set_reg(pm4, R_00B228_SPI_SHADER_PGM_RSRC1_GS,
   S_00B228_VGPRS((shader->config.num_vgprs - 1) / 4) |
   S_00B228_SGPRS((num_sgprs - 1) / 8) |
-  S_00B228_DX10_CLAMP(1));
+  S_00B228_DX10_CLAMP(1) |
+  S_00B228_FLOAT_MODE(shader->config.float_mode));
si_pm4_set_reg(pm4, R_00B22C_SPI_SHADER_PGM_RSRC2_GS,
   S_00B22C_USER_SGPR(num_user_sgprs) |
   S_00B22C_SCRATCH_EN(shader->config.scratch_bytes_per_wave 
> 0));
@@ -381,7 +385,8 @@ static void si_shader_vs(struct si_shader *shader, struct 
si_shader *gs)
   S_00B128_VGPRS((shader->config.num_vgprs - 1) / 4) |
   S_00B128_SGPRS((num_sgprs - 1) / 8) |
   S_00B128_VGPR_COMP_CNT(vgpr_comp_cnt) |
-  S_00B128_DX10_CLAMP(1));
+  S_00B128_DX10_CLAMP(1) |
+  S_00B128_FLOAT_MODE(shader->config.float_mode));
si_pm4_set_reg(pm4, R_00B12C_SPI_SHADER_PGM_RSRC2_VS,
   S_00B12C_USER_SGPR(num_user_sgprs) |
  

Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats

2016-02-08 Thread Marek Olšák
On Mon, Feb 8, 2016 at 5:16 PM, Matt Arsenault  wrote:
>
> On Feb 8, 2016, at 08:08, Tom Stellard  wrote:
>
> Do SI/CI support fp64 denorms?  If so, won't this hurt performance?
>
> We should tell the compiler we are enabling fp-64 denorms by adding
> +fp64-denormals to the feature string.  It would also be better to
> read the float_mode value from the config registers emitted by the
> compiler.
>
>
> Yes, the runtime here should read the value out of the binary and enable it
> in the compiler rather than the runtime hardcoding it. If you wanted to load
> a shader with different FP rules for example it should be able to switch.

This is not the runtime. This is the compiler. :) It's the middle-end
though, not the back-end.

I would use +fp64-denormals if it did something to graphics shaders.

Marek


Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats

2016-02-08 Thread Marek Olšák
On Mon, Feb 8, 2016 at 5:08 PM, Tom Stellard  wrote:
> On Sat, Feb 06, 2016 at 01:15:42PM +0100, Marek Olšák wrote:
>> From: Marek Olšák 
>>
>> This fixes FP16 conversion instructions for VI, which has 16-bit floats,
>> but not SI & CI, which can't disable denorms for those instructions.
>
> Do you know why this fixes FP16 conversions?  What does the OpenGL
> spec say about denormal handling?

Yes, I know why. The patch explains everything as far as I can see
though. What isn't clear?

SI & CI: Don't support FP16. FP16 conversions are hardcoded to emit
and accept FP16 denormals.
VI: Supports FP16. FP16 denormal support is now configurable and
affects FP16 conversions as well (shared setting with FP64).

OpenGL doesn't require denormals. Piglit does. I think this is
incorrect piglit behavior.
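
To make the disagreement concrete, here is the smallest case piglit trips
over, with the expected values computed by hand rather than by any GPU (a
sketch):

#include <stdio.h>
#include <math.h>

/* 0x0001 is the smallest binary16 subnormal, 2^-24. A conversion that
 * preserves denormals yields ~5.96e-8; a flushing one yields 0.0.
 * Piglit expects the former; OpenGL permits either. */
int main(void)
{
    float kept    = ldexpf(1.0f, -24);
    float flushed = 0.0f;
    printf("unpackHalf2x16(0x00000001).x: kept=%g, flushed=%g\n",
           kept, flushed);
    return 0;
}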

>
>> ---
>>  src/gallium/drivers/radeonsi/si_shader.c| 14 ++
>>  src/gallium/drivers/radeonsi/si_state_shaders.c | 18 --
>>  src/gallium/drivers/radeonsi/sid.h  |  3 +++
>>  3 files changed, 29 insertions(+), 6 deletions(-)
>>
>> diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
>> b/src/gallium/drivers/radeonsi/si_shader.c
>> index a4680ce..3f1db70 100644
>> --- a/src/gallium/drivers/radeonsi/si_shader.c
>> +++ b/src/gallium/drivers/radeonsi/si_shader.c
>> @@ -4155,6 +4155,20 @@ int si_compile_llvm(struct si_screen *sscreen,
>>
>>   si_shader_binary_read_config(binary, conf, 0);
>>
>> + /* Enable 64-bit and 16-bit denormals, because there is no performance
>> +  * cost.
>> +  *
>> +  * If denormals are enabled, all floating-point output modifiers are
>> +  * ignored.
>> +  *
>> +  * Don't enable denormals for 32-bit floats, because:
>> +  * - Floating-point output modifiers would be ignored by the hw.
>> +  * - Some opcodes don't support denormals, such as v_mad_f32. We would
>> +  *   have to stop using those.
>> +  * - SI & CI would be very slow.
>> +  */
>> + conf->float_mode |= V_00B028_FP_64_DENORMS;
>> +
>
> Do SI/CI support fp64 denorms?  If so, won't this hurt performance?

Yes, they do. Fp64 denorms don't hurt performance. Only fp32 denorms
do on SI & CI.

>
> We should tell the compiler we are enabling fp-64 denorms by adding
> +fp64-denormals to the feature string.  It would also be better to
> read the float_mode value from the config registers emitted by the
> compiler.

Yes, I agree, but LLVM only sets these parameters for compute or even
HSA-only kernels, not for graphics shaders. We need to set the mode
for all users _now_, not in 6 months. Last time I looked,
+fp64-denormals had no effect on graphics shaders.

Marek


Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats

2016-02-08 Thread Roland Scheidegger
Am 09.02.2016 um 00:53 schrieb Ian Romanick:
> On 02/08/2016 03:37 PM, Roland Scheidegger wrote:
>> Am 09.02.2016 um 00:02 schrieb Ian Romanick:
>>> On 02/08/2016 12:38 PM, Marek Olšák wrote:
 On Mon, Feb 8, 2016 at 5:08 PM, Tom Stellard  wrote:
> On Sat, Feb 06, 2016 at 01:15:42PM +0100, Marek Olšák wrote:
>> From: Marek Olšák 
>>
>> This fixes FP16 conversion instructions for VI, which has 16-bit floats,
>> but not SI & CI, which can't disable denorms for those instructions.
>
> Do you know why this fixes FP16 conversions?  What does the OpenGL
> spec say about denormal handling?

 Yes, I know why. The patch explains everything as far as I can see
 though. What isn't clear?

 SI & CI: Don't support FP16. FP16 conversions are hardcoded to emit
 and accept FP16 denormals.
 VI: Supports FP16. FP16 denormal support is now configurable and
 affects FP16 conversions as well (shared setting with FP64).

 OpenGL doesn't require denormals. Piglit does. I think this is
 incorrect piglit behavior.
>>>
>>> I submitted a public spec bug for this issue:
>>>
>>> https://www.khronos.org/bugzilla/show_bug.cgi?id=1460
>>>
>>> I'm investigating whether a similar bug is needed for the SPIR-V
>>> specification.
>>>
>>> I think an argument can be made for either the flush-to-zero or
>>> non-flush-to-zero behavior in the case of unpackHalf2x16 and (possibly)
>>> packHalf2x16.  The only place in the GLSL 4.50.5 specification that
>>> mentions subnormal values is section 4.7.1 (Range and Precision).
>>>
>>> "The precision of stored single- and double-precision floating-point
>>> variables is defined by the IEEE 754 standard for 32-bit and 64-bit
>>> floating-point numbers... Any denormalized value input into a
>>> shader or potentially generated by any operation in a shader can be
>>> flushed to 0."
>>>
>>> Since there is no half-precision type in desktop GLSL, there is no
>>> mention of 16-bit subnormal values.  As Roland mentioned before, all
>>> 16-bit subnormal values are 32-bit normal values.
>>>
>>> As I mentioned before, from the point of view of an application
>>> developer, the flush-to-zero behavior for unpackHalf2x16 is both
>>> surprising and awful. :)
>>>
>>> While I think an argument can be made for either behavior, I also think
>>> the argument for the non-flush-to-zero behavior is slightly stronger.
>>> The case for flush-to-zero based on the above spec quotation fails for
>>> two reasons.  First, the "input into [the] shader" is not a subnormal
>>> number.  It is an integer.  Second, the "[value] potentially generated
>>> by [the] operation" is not subnormal in single-precision.
>>
>> I don't disagree with that, however OTOH you could make an argument that
>> such a strong guarantee for packed half floats is inconsistent with
>> what's required for them elsewhere in GL. In particular half float
>> texture formats - these are still based on ARB_half_float_pixel. Which
>> says denormals are optional, infs are optional, NaNs are optional -
>> albeit that's not any different to ordinary floats...
> 
> Thanks for mentioning this. :)  The same issue had occurred to me, and I
> was trying to find some relevant text in the GL spec.  I hadn't thought
> to look in the extension spec.
The GL core spec 4.5 actually mentions pretty much the same within the
generic numeric bits, section 2.3.4.2 - except the extension explicitly
listed that exponent 0 and mantissa non-zero may be decoded to zero (and
similar for infs, NaNs). But the core spec text still says just that
"providing a denormalized number or negative zero to GL must yield
predictable results", so flush to zero is apparently still allowed.


> 
>> (And I still have the problem that d3d10 wants trunc behavior instead of
>> round... fwiw the precedent there in GL is also for r11g11b10 format,
>> which says round-to-nearest recommended but trunc allowed, and all too
>> large finite numbers converted to max finite (which is inconsistent with
>> nearest rounding). The spec is completely silent both within GLSL or GL
>> how rounding should be done for fp32 to fp16, albeit I don't disagree
>> round-to-nearest seems the most reasonable.)
> 
> The GLSL spec isn't silent.  Section 4.7.1 explicitly says, "The
> rounding mode cannot be set and is undefined."
Yes, but at least to me it's not really obvious this applies to all
operations - and at least the basic operations say "must be correctly
rounded"; what does that even mean if the rounding mode isn't defined in
the first place? Would the rounding mode have to be consistent for all
operations, so, always "trunc" for all operations would 

Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats

2016-02-08 Thread Marek Olšák
FWIW, D3D requires round-toward-zero for conversion to FP16 colorbuffers.

radeonsi does the same: round-toward-zero for colorbuffers and
round-to-nearest-even for the GLSL packing functions. This looks like
an inconsistency that may cause issues though.
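
A minimal C sketch of where the two rounding modes diverge on fp32 -> fp16
conversion; it only handles positive values in the normal half range (no
NaN/inf/denorm cases) and is written from the IEEE formats, not from any
driver code:

#include <stdint.h>
#include <string.h>
#include <stdio.h>

static uint16_t float_to_half(float f, int round_nearest_even)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    uint32_t exp  = (u >> 23) & 0xff; /* biased fp32 exponent */
    uint32_t mant = u & 0x7fffff;     /* 23-bit fp32 mantissa */
    /* rebias 127 -> 15, keep the top 10 mantissa bits (truncation = RTZ) */
    uint16_t h = (uint16_t)(((exp - 127 + 15) << 10) | (mant >> 13));
    if (round_nearest_even) {
        uint32_t rem = mant & 0x1fff; /* the 13 discarded bits */
        if (rem > 0x1000 || (rem == 0x1000 && (h & 1)))
            h++; /* round up; a carry into the exponent is correct here */
    }
    return h;
}

int main(void)
{
    float f = 1.000732421875f; /* past halfway between half 0x3c00 and 0x3c01 */
    printf("RTZ: 0x%04x  RNE: 0x%04x\n",
           float_to_half(f, 0), float_to_half(f, 1)); /* 0x3c00 vs 0x3c01 */
    return 0;
}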

Marek

On Tue, Feb 9, 2016 at 1:12 AM, Roland Scheidegger  wrote:
> Am 09.02.2016 um 00:53 schrieb Ian Romanick:
>> On 02/08/2016 03:37 PM, Roland Scheidegger wrote:
>>> Am 09.02.2016 um 00:02 schrieb Ian Romanick:
 On 02/08/2016 12:38 PM, Marek Olšák wrote:
> On Mon, Feb 8, 2016 at 5:08 PM, Tom Stellard  wrote:
>> On Sat, Feb 06, 2016 at 01:15:42PM +0100, Marek Olšák wrote:
>>> From: Marek Olšák 
>>>
>>> This fixes FP16 conversion instructions for VI, which has 16-bit floats,
>>> but not SI & CI, which can't disable denorms for those instructions.
>>
>> Do you know why this fixes FP16 conversions?  What does the OpenGL
>> spec say about denormal handling?
>
> Yes, I know why. The patch explains everything as far as I can see
> though. What isn't clear?
>
> SI & CI: Don't support FP16. FP16 conversions are hardcoded to emit
> and accept FP16 denormals.
> VI: Supports FP16. FP16 denormal support is now configurable and
> affects FP16 conversions as well (shared setting with FP64).
>
> OpenGL doesn't require denormals. Piglit does. I think this is
> incorrect piglit behavior.

 I submitted a public spec bug for this issue:

 https://www.khronos.org/bugzilla/show_bug.cgi?id=1460

 I'm investigating whether a similar bug is needed for the SPIR-V
 specification.

 I think an argument can be made for either the flush-to-zero or
 non-flush-to-zero behavior in the case of unpackHalf2x16 and (possibly)
 packHalf2x16.  The only place in the GLSL 4.50.5 specification that
 mentions subnormal values is section 4.7.1 (Range and Precision).

 "The precision of stored single- and double-precision floating-point
 variables is defined by the IEEE 754 standard for 32-bit and 64-bit
 floating-point numbers... Any denormalized value input into a
 shader or potentially generated by any operation in a shader can be
 flushed to 0."

 Since there is no half-precision type in desktop GLSL, there is no
 mention of 16-bit subnormal values.  As Roland mentioned before, all
 16-bit subnormal values are 32-bit normal values.

 As I mentioned before, from the point of view of an application
 developer, the flush-to-zero behavior for unpackHalf2x16 is both
 surprising and awful. :)

 While I think an argument can be made for either behavior, I also think
 the argument for the non-flush-to-zero behavior is slightly stronger.
 The case for flush-to-zero based on the above spec quotation fails for
 two reasons.  First, the "input into [the] shader" is not a subnormal
 number.  It is an integer.  Second, the "[value] potentially generated
 by [the] operation" is not subnormal in single-precision.
>>>
>>> I don't disagree with that, however OTOH you could make an argument that
>>> such a strong guarantee for packed half floats is inconsistent with
>>> what's required for them elsewhere in GL. In particular half float
>>> texture formats - these are still based on ARB_half_float_pixel. Which
>>> says denormals are optional, infs are optional, NaNs are optional -
>>> albeit that's not any different to ordinary floats...
>>
>> Thanks for mentioning this. :)  The same issue had occurred to me, and I
>> was trying to find some relevant text in the GL spec.  I hadn't thought
>> to look in the extension spec.
> GL core spec 4.5 actually mentions pretty much the same within the
> generic numeric bits, section 2.3.4.2 - except the extension bit has
> explicitly listed that exponent 0 and mantissa non-zero may be decoded
> to zero (and similar for infs, nans). But the core bits text still
> mentions just that "providing a denormalized number or negative zero to
> GL must yield predictable results" so flush to zero is apparently still
> allowed.
>
>
>>
>>> (And I still have the problem that d3d10 wants trunc behavior instead of
>>> round... fwiw the precedent there in GL is also for r11g11b10 format,
>>> which says round-to-nearest recommended but trunc allowed, and all too
>>> large finite numbers converted to max finite (which is inconsistent with
>>> nearest rounding). The spec is completely silent both within GLSL or GL
>>> how rounding should be done for fp32 to fp16, albeit I don't disagree
>>> round-to-nearest seems the most reasonable.)
>>
>> The GLSL spec isn't silent.  Section 4.7.1 explicitly says, "The
>> rounding mode cannot be set and is undefined."

Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats

2016-02-08 Thread Ian Romanick
On 02/08/2016 12:38 PM, Marek Olšák wrote:
> On Mon, Feb 8, 2016 at 5:08 PM, Tom Stellard  wrote:
>> On Sat, Feb 06, 2016 at 01:15:42PM +0100, Marek Olšák wrote:
>>> From: Marek Olšák 
>>>
>>> This fixes FP16 conversion instructions for VI, which has 16-bit floats,
>>> but not SI & CI, which can't disable denorms for those instructions.
>>
>> Do you know why this fixes FP16 conversions?  What does the OpenGL
>> spec say about denormal handling?
> 
> Yes, I know why. The patch explains everything as far as I can see
> though. What isn't clear?
> 
> SI & CI: Don't support FP16. FP16 conversions are hardcoded to emit
> and accept FP16 denormals.
> VI: Supports FP16. FP16 denormal support is now configurable and
> affects FP16 conversions as well (shared setting with FP64).
> 
> OpenGL doesn't require denormals. Piglit does. I think this is
> incorrect piglit behavior.

I submitted a public spec bug for this issue:

https://www.khronos.org/bugzilla/show_bug.cgi?id=1460

I'm investigating whether a similar bug is needed for the SPIR-V
specification.

I think an argument can be made for either the flush-to-zero or
non-flush-to-zero behavior in the case of unpackHalf2x16 and (possibly)
packHalf2x16.  The only place in the GLSL 4.50.5 specification that
mentions subnormal values is section 4.7.1 (Range and Precision).

"The precision of stored single- and double-precision floating-point
variables is defined by the IEEE 754 standard for 32-bit and 64-bit
floating-point numbers... Any denormalized value input into a
shader or potentially generated by any operation in a shader can be
flushed to 0."

Since there is no half-precision type in desktop GLSL, there is no
mention of 16-bit subnormal values.  As Roland mentioned before, all
16-bit subnormal values are 32-bit normal values.

As I mentioned before, from the point of view of an application
developer, the flush-to-zero behavior for unpackHalf2x16 is both
surprising and awful. :)

While I think an argument can be made for either behavior, I also think
the argument for the non-flush-to-zero behavior is slightly stronger.
The case for flush-to-zero based on the above spec quotation fails for
two reasons.  First, the "input into [the] shader" is not a subnormal
number.  It is an integer.  Second, the "[value] potentially generated
by [the] operation" is not subnormal in single-precision.
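
The second point is easy to check numerically: the smallest binary16
subnormal is 2^-24, far above the smallest fp32 normal, 2^-126. A quick
sanity check in C:

#include <float.h>
#include <math.h>
#include <stdio.h>

int main(void)
{
    float smallest_half_subnormal = ldexpf(1.0f, -24); /* ~5.96e-8 */
    printf("2^-24 = %g, FLT_MIN = %g, fp32-normal: %s\n",
           smallest_half_subnormal, FLT_MIN,
           smallest_half_subnormal >= FLT_MIN ? "yes" : "no");
    return 0;
}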

We've already determined that NVIDIA closed-source drivers do not flush
to zero.  I'm curious to know what AMD's closed-source drivers do for
16-bit subnormal values supplied to unpackHalf2x16.  If they do not
flush to zero, then you had better believe that applications depend on
that behavior... and that also means that it doesn't matter very much
what piglit does or the spec does (or does not) say.  This is the sort
of situation where the spec changes to match application expectations
and shipping implementations... and Mesa drivers change to follow.  This
isn't even close to the first time through that loop.

>>> ---
>>>  src/gallium/drivers/radeonsi/si_shader.c| 14 ++
>>>  src/gallium/drivers/radeonsi/si_state_shaders.c | 18 --
>>>  src/gallium/drivers/radeonsi/sid.h  |  3 +++
>>>  3 files changed, 29 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
>>> b/src/gallium/drivers/radeonsi/si_shader.c
>>> index a4680ce..3f1db70 100644
>>> --- a/src/gallium/drivers/radeonsi/si_shader.c
>>> +++ b/src/gallium/drivers/radeonsi/si_shader.c
>>> @@ -4155,6 +4155,20 @@ int si_compile_llvm(struct si_screen *sscreen,
>>>
>>>   si_shader_binary_read_config(binary, conf, 0);
>>>
>>> + /* Enable 64-bit and 16-bit denormals, because there is no performance
>>> +  * cost.
>>> +  *
>>> +  * If denormals are enabled, all floating-point output modifiers are
>>> +  * ignored.
>>> +  *
>>> +  * Don't enable denormals for 32-bit floats, because:
>>> +  * - Floating-point output modifiers would be ignored by the hw.
>>> +  * - Some opcodes don't support denormals, such as v_mad_f32. We would
>>> +  *   have to stop using those.
>>> +  * - SI & CI would be very slow.
>>> +  */
>>> + conf->float_mode |= V_00B028_FP_64_DENORMS;
>>> +
>>
>> Do SI/CI support fp64 denorms?  If so, won't this hurt performance?
> 
> Yes, they do. Fp64 denorms don't hurt performance. Only fp32 denorms
> do on SI & CI.
> 
>>
>> We should tell the compiler we are enabling fp-64 denorms by adding
>> +fp64-denormals to the feature string.  It would also be better to
>> read the float_mode value from the config registers emitted by the
>> compiler.
> 
> Yes, I agree, but LLVM only sets these parameters for compute or even
> HSA-only kernels, not for graphics shaders. We need to set the mode
> for all users _now_, not in 6 months. Last time I looked,
> +fp64-denormals had no effect on graphics shaders.
> 
> Marek
> 

Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats

2016-02-08 Thread Roland Scheidegger
Am 09.02.2016 um 00:02 schrieb Ian Romanick:
> On 02/08/2016 12:38 PM, Marek Olšák wrote:
>> On Mon, Feb 8, 2016 at 5:08 PM, Tom Stellard  wrote:
>>> On Sat, Feb 06, 2016 at 01:15:42PM +0100, Marek Olšák wrote:
 From: Marek Olšák 

 This fixes FP16 conversion instructions for VI, which has 16-bit floats,
 but not SI & CI, which can't disable denorms for those instructions.
>>>
>>> Do you know why this fixes FP16 conversions?  What does the OpenGL
>>> spec say about denormal handling?
>>
>> Yes, I know why. The patch explains everything as far as I can see
>> though. What isn't clear?
>>
>> SI & CI: Don't support FP16. FP16 conversions are hardcoded to emit
>> and accept FP16 denormals.
>> VI: Supports FP16. FP16 denormal support is now configurable and
>> affects FP16 conversions as well (shared setting with FP64).
>>
>> OpenGL doesn't require denormals. Piglit does. I think this is
>> incorrect piglit behavior.
> 
> I submitted a public spec bug for this issue:
> 
> https://www.khronos.org/bugzilla/show_bug.cgi?id=1460
> 
> I'm investigating whether a similar bug is needed for the SPIR-V
> specification.
> 
> I think an argument can be made for either the flush-to-zero or
> non-flush-to-zero behavior in the case of unpackHalf2x16 and (possibly)
> packHalf2x16.  The only place in the GLSL 4.50.5 specification that
> mentions subnormal values is section 4.7.1 (Range and Precision).
> 
> "The precision of stored single- and double-precision floating-point
> variables is defined by the IEEE 754 standard for 32-bit and 64-bit
> floating-point numbers... Any denormalized value input into a
> shader or potentially generated by any operation in a shader can be
> flushed to 0."
> 
> Since there is no half-precision type in desktop GLSL, there is no
> mention of 16-bit subnormal values.  As Roland mentioned before, all
> 16-bit subnormal values are 32-bit normal values.
> 
> As I mentioned before, from the point of view of an application
> developer, the flush-to-zero behavior for unpackHalf2x16 is both
> surprising and awful. :)
> 
> While I think an argument can be made for either behavior, I also think
> the argument for the non-flush-to-zero behavior is slightly stronger.
> The case for flush-to-zero based on the above spec quotation fails for
> two reasons.  First, the "input into [the] shader" is not a subnormal
> number.  It is an integer.  Second, the "[value] potentially generated
> by [the] operation" is not subnormal in single-precision.

I don't disagree with that, however OTOH you could make an argument that
such a strong guarantee for packed half floats is inconsistent with
what's required for them elsewhere in GL. In particular half float
texture formats - these are still based on ARB_half_float_pixel. Which
says denormals are optional, infs are optional, NaNs are optional -
albeit that's not any different to ordinary floats...
(And I still have the problem that d3d10 wants trunc behavior instead of
round... fwiw the precedent there in GL is also for r11g11b10 format,
which says round-to-nearest recommended but trunc allowed, and all too
large finite numbers converted to max finite (which is inconsistent with
nearest rounding). The spec is completely silent both within GLSL or GL
how rounding should be done for fp32 to fp16, albeit I don't disagree
round-to-nearest seems the most reasonable.)

Roland


> 
> We've already determined that NVIDIA closed-source drivers do not flush
> to zero.  I'm curious to know what AMD's closed-source drivers do for
> 16-bit subnormal values supplied to unpackHalf2x16.  If they do not
> flush to zero, then you had better believe that applications depend on
> that behavior... and that also means that it doesn't matter very much
> what piglit does or the spec does (or does not) say.  This is the sort
> of situation where the spec changes to match application expectations
> and shipping implementations... and Mesa drivers change to follow.  This
> isn't even close to the first time through that loop.




Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats

2016-02-08 Thread Ian Romanick
On 02/08/2016 03:37 PM, Roland Scheidegger wrote:
> Am 09.02.2016 um 00:02 schrieb Ian Romanick:
>> On 02/08/2016 12:38 PM, Marek Olšák wrote:
>>> On Mon, Feb 8, 2016 at 5:08 PM, Tom Stellard  wrote:
 On Sat, Feb 06, 2016 at 01:15:42PM +0100, Marek Olšák wrote:
> From: Marek Olšák 
>
> This fixes FP16 conversion instructions for VI, which has 16-bit floats,
> but not SI & CI, which can't disable denorms for those instructions.

 Do you know why this fixes FP16 conversions?  What does the OpenGL
 spec say about denormal handling?
>>>
>>> Yes, I know why. The patch explains everything as far as I can see
>>> though. What isn't clear?
>>>
>>> SI & CI: Don't support FP16. FP16 conversions are hardcoded to emit
>>> and accept FP16 denormals.
>>> VI: Supports FP16. FP16 denormal support is now configurable and
>>> affects FP16 conversions as well (shared setting with FP64).
>>>
>>> OpenGL doesn't require denormals. Piglit does. I think this is
>>> incorrect piglit behavior.
>>
>> I submitted a public spec bug for this issue:
>>
>> https://www.khronos.org/bugzilla/show_bug.cgi?id=1460
>>
>> I'm investigating whether a similar bug is needed for the SPIR-V
>> specification.
>>
>> I think an argument can be made for either the flush-to-zero or
>> non-flush-to-zero behavior in the case of unpackHalf2x16 and (possibly)
>> packHalf2x16.  The only place in the GLSL 4.50.5 specification that
>> mentions subnormal values is section 4.7.1 (Range and Precision).
>>
>> "The precision of stored single- and double-precision floating-point
>> variables is defined by the IEEE 754 standard for 32-bit and 64-bit
>> floating-point numbers... Any denormalized value input into a
>> shader or potentially generated by any operation in a shader can be
>> flushed to 0."
>>
>> Since there is no half-precision type in desktop GLSL, there is no
>> mention of 16-bit subnormal values.  As Roland mentioned before, all
>> 16-bit subnormal values are 32-bit normal values.
>>
>> As I mentioned before, from the point of view of an application
>> developer, the flush-to-zero behavior for unpackHalf2x16 is both
>> surprising and awful. :)
>>
>> While I think an argument can be made for either behavior, I also think
>> the argument for the non-flush-to-zero behavior is slightly stronger.
>> The case for flush-to-zero based on the above spec quotation fails for
>> two reasons.  First, the "input into [the] shader" is not a subnormal
>> number.  It is an integer.  Second, the "[value] potentially generated
>> by [the] operation" is not subnormal in single-precision.
> 
> I don't disagree with that, however OTOH you could make an argument that
> such a strong guarantee for packed half floats is inconsistent with
> what's required for them elsewhere in GL. In particular half float
> texture formats - these are still based on ARB_half_float_pixel. Which
> says denormals are optional, infs are optional, NaNs are optional -
> albeit that's not any different to ordinary floats...

Thanks for mentioning this. :)  The same issue had occurred to me, and I
was trying to find some relevant text in the GL spec.  I hadn't thought
to look in the extension spec.

> (And I still have the problem that d3d10 wants trunc behavior instead of
> round... fwiw the precedent there in GL is also for r11g11b10 format,
> which says round-to-nearest recommended but trunc allowed, and all too
> large finite numbers converted to max finite (which is inconsistent with
> nearest rounding). The spec is completely silent both within GLSL or GL
> how rounding should be done for fp32 to fp16, albeit I don't disagree
> round-to-nearest seems the most reasonable.)

The GLSL spec isn't silent.  Section 4.7.1 explicitly says, "The
rounding mode cannot be set and is undefined."

> Roland
> 
>> We've already determined that NVIDIA closed-source drivers do not flush
>> to zero.  I'm curious to know what AMD's closed-source drivers do for
>> 16-bit subnormal values supplied to unpackHalf2x16.  If they do not
>> flush to zero, then you had better believe that applications depend on
>> that behavior... and that also means that it doesn't matter very much
>> what piglit does or the spec does (or does not) say.  This is the sort
>> of situation where the spec changes to match application expectations
>> and shipping implementations... and Mesa drivers change to follow.  This
>> isn't even close to the first time through that loop.



[Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats

2016-02-06 Thread Marek Olšák
From: Marek Olšák 

This fixes FP16 conversion instructions for VI, which has 16-bit floats,
but not SI & CI, which can't disable denorms for those instructions.
---
 src/gallium/drivers/radeonsi/si_shader.c| 14 ++
 src/gallium/drivers/radeonsi/si_state_shaders.c | 18 --
 src/gallium/drivers/radeonsi/sid.h  |  3 +++
 3 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index a4680ce..3f1db70 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -4155,6 +4155,20 @@ int si_compile_llvm(struct si_screen *sscreen,
 
si_shader_binary_read_config(binary, conf, 0);
 
+   /* Enable 64-bit and 16-bit denormals, because there is no performance
+* cost.
+*
+* If denormals are enabled, all floating-point output modifiers are
+* ignored.
+*
+* Don't enable denormals for 32-bit floats, because:
+* - Floating-point output modifiers would be ignored by the hw.
+* - Some opcodes don't support denormals, such as v_mad_f32. We would
+*   have to stop using those.
+* - SI & CI would be very slow.
+*/
+   conf->float_mode |= V_00B028_FP_64_DENORMS;
+
FREE(binary->config);
FREE(binary->global_symbol_offsets);
binary->config = NULL;
diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c 
b/src/gallium/drivers/radeonsi/si_state_shaders.c
index ce795c0..77a4e47 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -124,7 +124,8 @@ static void si_shader_ls(struct si_shader *shader)
shader->config.rsrc1 = S_00B528_VGPRS((shader->config.num_vgprs - 1) / 
4) |
   S_00B528_SGPRS((num_sgprs - 1) / 8) |
   S_00B528_VGPR_COMP_CNT(vgpr_comp_cnt) |
-  S_00B528_DX10_CLAMP(1);
+  S_00B528_DX10_CLAMP(1) |
+  S_00B528_FLOAT_MODE(shader->config.float_mode);
shader->config.rsrc2 = S_00B52C_USER_SGPR(num_user_sgprs) |
   
S_00B52C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0);
 }
@@ -157,7 +158,8 @@ static void si_shader_hs(struct si_shader *shader)
si_pm4_set_reg(pm4, R_00B428_SPI_SHADER_PGM_RSRC1_HS,
   S_00B428_VGPRS((shader->config.num_vgprs - 1) / 4) |
   S_00B428_SGPRS((num_sgprs - 1) / 8) |
-  S_00B428_DX10_CLAMP(1));
+  S_00B428_DX10_CLAMP(1) |
+  S_00B428_FLOAT_MODE(shader->config.float_mode));
si_pm4_set_reg(pm4, R_00B42C_SPI_SHADER_PGM_RSRC2_HS,
   S_00B42C_USER_SGPR(num_user_sgprs) |
   
S_00B42C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0));
@@ -203,7 +205,8 @@ static void si_shader_es(struct si_shader *shader)
   S_00B328_VGPRS((shader->config.num_vgprs - 1) / 4) |
   S_00B328_SGPRS((num_sgprs - 1) / 8) |
   S_00B328_VGPR_COMP_CNT(vgpr_comp_cnt) |
-  S_00B328_DX10_CLAMP(1));
+  S_00B328_DX10_CLAMP(1) |
+  S_00B328_FLOAT_MODE(shader->config.float_mode));
si_pm4_set_reg(pm4, R_00B32C_SPI_SHADER_PGM_RSRC2_ES,
   S_00B32C_USER_SGPR(num_user_sgprs) |
   
S_00B32C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0));
@@ -292,7 +295,8 @@ static void si_shader_gs(struct si_shader *shader)
si_pm4_set_reg(pm4, R_00B228_SPI_SHADER_PGM_RSRC1_GS,
   S_00B228_VGPRS((shader->config.num_vgprs - 1) / 4) |
   S_00B228_SGPRS((num_sgprs - 1) / 8) |
-  S_00B228_DX10_CLAMP(1));
+  S_00B228_DX10_CLAMP(1) |
+  S_00B228_FLOAT_MODE(shader->config.float_mode));
si_pm4_set_reg(pm4, R_00B22C_SPI_SHADER_PGM_RSRC2_GS,
   S_00B22C_USER_SGPR(num_user_sgprs) |
   
S_00B22C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0));
@@ -381,7 +385,8 @@ static void si_shader_vs(struct si_shader *shader, struct 
si_shader *gs)
   S_00B128_VGPRS((shader->config.num_vgprs - 1) / 4) |
   S_00B128_SGPRS((num_sgprs - 1) / 8) |
   S_00B128_VGPR_COMP_CNT(vgpr_comp_cnt) |
-  S_00B128_DX10_CLAMP(1));
+  S_00B128_DX10_CLAMP(1) |
+  S_00B128_FLOAT_MODE(shader->config.float_mode));
si_pm4_set_reg(pm4, R_00B12C_SPI_SHADER_PGM_RSRC2_VS,
   S_00B12C_USER_SGPR(num_user_sgprs) |
   S_00B12C_SO_BASE0_EN(!!shader->selector->so.stride[0]) |
@@ -567,7 +572,8 @@ static void