Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats
On Fri, Apr 29, 2016 at 11:42 AM, Ian Romanick wrote: > On 02/09/2016 12:02 AM, Ian Romanick wrote: >> I submitted a public spec bug for this issue: >> >> https://www.khronos.org/bugzilla/show_bug.cgi?id=1460 >> >> I'm investigating whether a similar bug is needed for the SPIR-V >> specification. >> >> I think an argument can be made for either the flush-to-zero or >> non-flush-to-zero behavior in the case of unpackHalf2x16 and (possibly) >> packHalf2x16. The only place in the GLSL 4.50.5 specification that >> mentions subnormal values is section 4.7.1 (Range and Precision). >> >> "The precision of stored single- and double-precision floating-point >> variables is defined by the IEEE 754 standard for 32-bit and 64-bit >> floating-point numbers. ... Any denormalized value input into a >> shader or potentially generated by any operation in a shader can be >> flushed to 0." >> >> Since there is no half-precision type in desktop GLSL, there is no >> mention of 16-bit subnormal values. As Roland mentioned before, all >> 16-bit subnormal values are 32-bit normal values. >> >> As I mentioned before, from the point of view of an application >> developer, the flush-to-zero behavior for unpackHalf2x16 is both >> surprising and awful. :) >> >> While I think an argument can be made for either behavior, I also think >> the argument for the non-flush-to-zero behavior is slightly stronger. >> The case for flush-to-zero based on the above spec quotation fails for >> two reasons. First, the "input into [the] shader" is not a subnormal >> number. It is an integer. Second, the "[value] potentially generated >> by [the] operation" is not subnormal in single-precision. >> >> We've already determined that NVIDIA closed-source drivers do not flush >> to zero. I'm curious to know what AMD's closed-source drivers do for >> 16-bit subnormal values supplied to unpackHalf2x16. If they do not >> flush to zero, then you had better believe that applications depend on >> that behavior... and that also means that it doesn't matter very much >> what piglit does or the spec does (or does not) say. This is the sort >> of situation where the spec changes to match application expectations >> and shipping implementations... and Mesa drivers change to follow. This >> isn't even close to the first time through that loop. > > There is finally a conclusion to this issue. The GLSL ES 3.2 spec says: > > "Returns a two-component floating-point vector with components > obtained by unpacking a 32-bit unsigned integer into a pair of > 16-bit values, interpreting those values as 16-bit floating-point > numbers according to the OpenGL ES Specification, and converting > them to 32-bit floating-point values." > > Since the OpenGL ES specification allows 16-bit floating-point denorms > to flush to zero, the "interpreting those values as 16-bit > floating-point numbers according to the OpenGL ES Specification" allows > the flush to zero. > > The bug has been closed with the following comment: > >> For current versions and hardware, we have decided both ways have to >> be supported: it is okay to flush to zero and it is okay to not flush to >> zero. >> >> Due to the general desire to preserve values, we are looking at >> future versions revisiting this current state. > > So... flushing is okay now, but it may not be okay in the future. I don't think > that helps answer the question about whether or not we should keep the > piglit test that expects non-flush-to-zero behavior. Thanks for the info.
Since radeonsi now preserves 16-bit denorms on all chips, we should be fine. Marek
Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats
On 02/09/2016 12:02 AM, Ian Romanick wrote: > I submitted a public spec bug for this issue: > > https://www.khronos.org/bugzilla/show_bug.cgi?id=1460 > > I'm investigating whether a similar bug is needed for the SPIR-V > specification. > > I think an argument can be made for either the flush-to-zero or > non-flush-to-zero behavior in the case of unpackHalf2x16 and (possibly) > packHalf2x16. The only place in the GLSL 4.50.5 specification that > mentions subnormal values is section 4.7.1 (Range and Precision). > > "The precision of stored single- and double-precision floating-point > variables is defined by the IEEE 754 standard for 32-bit and 64-bit > floating-point numbers. ... Any denormalized value input into a > shader or potentially generated by any operation in a shader can be > flushed to 0." > > Since there is no half-precision type in desktop GLSL, there is no > mention of 16-bit subnormal values. As Roland mentioned before, all > 16-bit subnormal values are 32-bit normal values. > > As I mentioned before, from the point of view of an application > developer, the flush-to-zero behavior for unpackHalf2x16 is both > surprising and awful. :) > > While I think an argument can be made for either behavior, I also think > the argument for the non-flush-to-zero behavior is slightly stronger. > The case for flush-to-zero based on the above spec quotation fails for > two reasons. First, the "input into [the] shader" is not a subnormal > number. It is an integer. Second, the "[value] potentially generated > by [the] operation" is not subnormal in single-precision. > > We've already determined that NVIDIA closed-source drivers do not flush > to zero. I'm curious to know what AMD's closed-source drivers do for > 16-bit subnormal values supplied to unpackHalf2x16. If they do not > flush to zero, then you had better believe that applications depend on > that behavior... and that also means that it doesn't matter very much > what piglit does or the spec does (or does not) say. This is the sort > of situation where the spec changes to match application expectations > and shipping implementations... and Mesa drivers change to follow. This > isn't even close to the first time through that loop. There is finally a conclusion to this issue. The GLSL ES 3.2 spec says: "Returns a two-component floating-point vector with components obtained by unpacking a 32-bit unsigned integer into a pair of 16-bit values, interpreting those values as 16-bit floating-point numbers according to the OpenGL ES Specification, and converting them to 32-bit floating-point values." Since the OpenGL ES specification allows 16-bit floating-point denorms to flush to zero, the "interpreting those values as 16-bit floating-point numbers according to the OpenGL ES Specification" allows the flush to zero. The bug has been closed with the following comment: > For current versions and hardware, we have decided both ways have to > be supported: it is okay to flush to zero and it is okay to not flush to > zero. > > Due to the general desire to preserve values, we are looking at > future versions revisiting this current state. So... flushing is okay now, but it may not be okay in the future. I don't think that helps answer the question about whether or not we should keep the piglit test that expects non-flush-to-zero behavior.
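(For readers following the thread: below is a minimal sketch in C of the non-flush-to-zero semantics being argued for. This is my own illustration, not Mesa or piglit code, and half_to_float is a made-up helper name. It also shows concretely why every 16-bit subnormal is a normal 32-bit value: the smallest positive half subnormal is 2^-24, far above the smallest normal float32, 2^-126.)

#include <stdint.h>
#include <string.h>

/* One component of unpackHalf2x16, without flush-to-zero (illustrative). */
static float half_to_float(uint16_t h)
{
	uint32_t sign = (uint32_t)(h & 0x8000) << 16;
	uint32_t exp  = (h >> 10) & 0x1f;
	uint32_t mant = h & 0x3ff;
	uint32_t bits;
	float out;

	if (exp == 0x1f) {
		bits = sign | 0x7f800000 | (mant << 13);	/* Inf / NaN */
	} else if (exp == 0 && mant == 0) {
		bits = sign;					/* +/- 0 */
	} else if (exp == 0) {
		/* 16-bit subnormal: value = mant * 2^-24.  Renormalize;
		 * the float32 exponent lands in [-24, -15], so the result
		 * is always a normal float and nothing needs flushing. */
		int e = 127 - 15 + 1;
		while (!(mant & 0x400)) {
			mant <<= 1;
			e--;
		}
		bits = sign | ((uint32_t)e << 23) | ((mant & 0x3ff) << 13);
	} else {
		bits = sign | ((exp + 127 - 15) << 23) | (mant << 13);
	}
	memcpy(&out, &bits, 4);
	return out;
}

A flush-to-zero implementation would return +/-0 in the third branch instead; with this version, half_to_float(0x0001) yields 2^-24, i.e. about 5.96e-8.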
Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats
> On Feb 9, 2016, at 11:23, Tom Stellard wrote: > > We should still add +fp64-denormals even if the backend doesn't do > anything with it now. This is the default, so it doesn’t really matter anyway. -Matt
Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats
On Mon, Feb 08, 2016 at 09:38:32PM +0100, Marek Olšák wrote: > On Mon, Feb 8, 2016 at 5:08 PM, Tom Stellard wrote: > > On Sat, Feb 06, 2016 at 01:15:42PM +0100, Marek Olšák wrote: > >> From: Marek Olšák > >> > >> This fixes FP16 conversion instructions for VI, which has 16-bit floats, > >> but not SI & CI, which can't disable denorms for those instructions. > > > > Do you know why this fixes FP16 conversions? What does the OpenGL > > spec say about denormal handling? > > Yes, I know why. The patch explains everything as far as I can see > though. What isn't clear? I got it now. > SI & CI: Don't support FP16. FP16 conversions are hardcoded to emit > and accept FP16 denormals. > VI: Supports FP16. FP16 denormal support is now configurable and > affects FP16 conversions as well (shared setting with FP64). > > OpenGL doesn't require denormals. Piglit does. I think this is > incorrect piglit behavior. > > > > >> --- > >> src/gallium/drivers/radeonsi/si_shader.c| 14 ++ > >> src/gallium/drivers/radeonsi/si_state_shaders.c | 18 -- > >> src/gallium/drivers/radeonsi/sid.h | 3 +++ > >> 3 files changed, 29 insertions(+), 6 deletions(-) > >> > >> diff --git a/src/gallium/drivers/radeonsi/si_shader.c > >> b/src/gallium/drivers/radeonsi/si_shader.c > >> index a4680ce..3f1db70 100644 > >> --- a/src/gallium/drivers/radeonsi/si_shader.c > >> +++ b/src/gallium/drivers/radeonsi/si_shader.c > >> @@ -4155,6 +4155,20 @@ int si_compile_llvm(struct si_screen *sscreen, > >> > >> si_shader_binary_read_config(binary, conf, 0); > >> > >> + /* Enable 64-bit and 16-bit denormals, because there is no > >> performance > >> + * cost. > >> + * > >> + * If denormals are enabled, all floating-point output modifiers are > >> + * ignored. > >> + * > >> + * Don't enable denormals for 32-bit floats, because: > >> + * - Floating-point output modifiers would be ignored by the hw. > >> + * - Some opcodes don't support denormals, such as v_mad_f32. We > >> would > >> + * have to stop using those. > >> + * - SI & CI would be very slow. > >> + */ > >> + conf->float_mode |= V_00B028_FP_64_DENORMS; > >> + > > > > Do SI/CI support fp64 denorms? If so, won't this hurt performance? > > Yes, they do. Fp64 denorms don't hurt performance. Only fp32 denorms > do on SI & CI. > > > > We should tell the compiler we are enabling fp-64 denorms by adding > > +fp64-denormals to the feature string. It would also be better to > > read the float_mode value from the config registers emitted by the > > compiler. > > Yes, I agree, but LLVM only sets these parameters for compute or even > HSA-only kernels, not for graphics shaders. We need to set the mode > for all users _now_, not in 6 months. Last time I looked, > +fp64-denormals had no effect on graphics shaders. We should still add +fp64-denormals even if the backend doesn't do anything with it now. This will make it easier if we have to use this feature string to enable a fix for a bug in the backend, because we will just be able to update LLVM. I don't have a problem hard-coding float_mode for now, but once LLVM is emitting the correct thing, we should pull the value from LLVM. -Tom > Marek
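(A sketch of what Tom is suggesting, using LLVM's C API. The placement, triple, and CPU strings here are illustrative assumptions; radeonsi's real target-machine setup lives elsewhere in the tree.)

#include <llvm-c/TargetMachine.h>

/* Hypothetical: request fp64/fp16 denormal support through the target
 * feature string, so that an LLVM which starts honoring the feature for
 * graphics shaders picks it up without driver changes, and the driver
 * can then read float_mode back from the emitted config registers. */
static LLVMTargetMachineRef create_si_target_machine(LLVMTargetRef target)
{
	return LLVMCreateTargetMachine(target,
				       "amdgcn--",		/* triple (assumed) */
				       "tonga",			/* e.g. a VI part */
				       "+fp64-denormals",	/* Tom's suggestion */
				       LLVMCodeGenLevelDefault,
				       LLVMRelocDefault,
				       LLVMCodeModelDefault);
}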
Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats
> On Feb 8, 2016, at 08:08, Tom Stellard wrote: > > Do SI/CI support fp64 denorms? If so, won't this hurt performance? This is the only mode that should ever be used. I’m not sure why these are options. There technically are separate flush on input or flush on output options, but I’m not sure why they would be used.
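(For context, this is my reading of the GCN FLOAT_MODE encoding that the input/output options above refer to; treat the exact bit positions as an assumption to check against the ISA docs. FLOAT_MODE is an 8-bit field in the SPI_SHADER_PGM_RSRC1_* registers: bits [1:0] are the FP32 round mode, [3:2] the FP64/FP16 round mode, [5:4] the FP32 denorm mode, and [7:6] the FP64/FP16 denorm mode. In each 2-bit denorm mode, bit 0 allows input denorms and bit 1 allows output denorms.)

/* Hypothetical names for the 2-bit denorm modes: */
#define FP_DENORM_FLUSH_IN_FLUSH_OUT	0x0	/* flush input and output denorms */
#define FP_DENORM_KEEP_IN_FLUSH_OUT	0x1	/* allow input, flush output */
#define FP_DENORM_FLUSH_IN_KEEP_OUT	0x2	/* flush input, allow output */
#define FP_DENORM_KEEP_IN_KEEP_OUT	0x3	/* full denorm support */

/* On that reading, the patch's V_00B028_FP_64_DENORMS is 0x3 << 6 = 0xc0:
 * keep FP64/FP16 denorms on both input and output. */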
Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats
> On Feb 8, 2016, at 08:08, Tom Stellard wrote: > > Do SI/CI support fp64 denorms? If so, won't this hurt performance? > > We should tell the compiler we are enabling fp-64 denorms by adding > +fp64-denormals to the feature string. It would also be better to > read the float_mode value from the config registers emitted by the > compiler. Yes, the runtime here should read the value out of the binary and enable it in the compiler rather than the runtime hardcoding it. If you wanted to load a shader with different FP rules for example it should be able to switch. -Matt
Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats
On Sat, Feb 06, 2016 at 01:15:42PM +0100, Marek Olšák wrote: > From: Marek Olšák> > This fixes FP16 conversion instructions for VI, which has 16-bit floats, > but not SI & CI, which can't disable denorms for those instructions. Do you know why this fixes FP16 conversions? What does the OpenGL spec say about denormal handling? > --- > src/gallium/drivers/radeonsi/si_shader.c| 14 ++ > src/gallium/drivers/radeonsi/si_state_shaders.c | 18 -- > src/gallium/drivers/radeonsi/sid.h | 3 +++ > 3 files changed, 29 insertions(+), 6 deletions(-) > > diff --git a/src/gallium/drivers/radeonsi/si_shader.c > b/src/gallium/drivers/radeonsi/si_shader.c > index a4680ce..3f1db70 100644 > --- a/src/gallium/drivers/radeonsi/si_shader.c > +++ b/src/gallium/drivers/radeonsi/si_shader.c > @@ -4155,6 +4155,20 @@ int si_compile_llvm(struct si_screen *sscreen, > > si_shader_binary_read_config(binary, conf, 0); > > + /* Enable 64-bit and 16-bit denormals, because there is no performance > + * cost. > + * > + * If denormals are enabled, all floating-point output modifiers are > + * ignored. > + * > + * Don't enable denormals for 32-bit floats, because: > + * - Floating-point output modifiers would be ignored by the hw. > + * - Some opcodes don't support denormals, such as v_mad_f32. We would > + * have to stop using those. > + * - SI & CI would be very slow. > + */ > + conf->float_mode |= V_00B028_FP_64_DENORMS; > + Do SI/CI support fp64 denorms? If so, won't this hurt performance? We should tell the compiler we are enabling fp-64 denorms by adding +fp64-denormals to the feature string. It would also be better to read the float_mode value from the config registers emitted by the compiler. -Tom > FREE(binary->config); > FREE(binary->global_symbol_offsets); > binary->config = NULL; > diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c > b/src/gallium/drivers/radeonsi/si_state_shaders.c > index ce795c0..77a4e47 100644 > --- a/src/gallium/drivers/radeonsi/si_state_shaders.c > +++ b/src/gallium/drivers/radeonsi/si_state_shaders.c > @@ -124,7 +124,8 @@ static void si_shader_ls(struct si_shader *shader) > shader->config.rsrc1 = S_00B528_VGPRS((shader->config.num_vgprs - 1) / > 4) | > S_00B528_SGPRS((num_sgprs - 1) / 8) | > S_00B528_VGPR_COMP_CNT(vgpr_comp_cnt) | > -S_00B528_DX10_CLAMP(1); > +S_00B528_DX10_CLAMP(1) | > +S_00B528_FLOAT_MODE(shader->config.float_mode); > shader->config.rsrc2 = S_00B52C_USER_SGPR(num_user_sgprs) | > > S_00B52C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0); > } > @@ -157,7 +158,8 @@ static void si_shader_hs(struct si_shader *shader) > si_pm4_set_reg(pm4, R_00B428_SPI_SHADER_PGM_RSRC1_HS, > S_00B428_VGPRS((shader->config.num_vgprs - 1) / 4) | > S_00B428_SGPRS((num_sgprs - 1) / 8) | > -S_00B428_DX10_CLAMP(1)); > +S_00B428_DX10_CLAMP(1) | > +S_00B428_FLOAT_MODE(shader->config.float_mode)); > si_pm4_set_reg(pm4, R_00B42C_SPI_SHADER_PGM_RSRC2_HS, > S_00B42C_USER_SGPR(num_user_sgprs) | > > S_00B42C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0)); > @@ -203,7 +205,8 @@ static void si_shader_es(struct si_shader *shader) > S_00B328_VGPRS((shader->config.num_vgprs - 1) / 4) | > S_00B328_SGPRS((num_sgprs - 1) / 8) | > S_00B328_VGPR_COMP_CNT(vgpr_comp_cnt) | > -S_00B328_DX10_CLAMP(1)); > +S_00B328_DX10_CLAMP(1) | > +S_00B328_FLOAT_MODE(shader->config.float_mode)); > si_pm4_set_reg(pm4, R_00B32C_SPI_SHADER_PGM_RSRC2_ES, > S_00B32C_USER_SGPR(num_user_sgprs) | > > S_00B32C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0)); > @@ -292,7 +295,8 @@ static void si_shader_gs(struct si_shader
*shader) > si_pm4_set_reg(pm4, R_00B228_SPI_SHADER_PGM_RSRC1_GS, > S_00B228_VGPRS((shader->config.num_vgprs - 1) / 4) | > S_00B228_SGPRS((num_sgprs - 1) / 8) | > -S_00B228_DX10_CLAMP(1)); > +S_00B228_DX10_CLAMP(1) | > +S_00B228_FLOAT_MODE(shader->config.float_mode)); > si_pm4_set_reg(pm4, R_00B22C_SPI_SHADER_PGM_RSRC2_GS, > S_00B22C_USER_SGPR(num_user_sgprs) | > > S_00B22C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0)); > @@ -381,7 +385,8 @@ static void si_shader_vs(struct si_shader *shader, struct > si_shader *gs) > S_00B128_VGPRS((shader->config.num_vgprs - 1) / 4) | >
Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats
On Mon, Feb 8, 2016 at 1:11 PM, Roland Scheidegger wrote: > This looks good to me, albeit I know nothing about the hw. > So VI could do (just with some restrictions) even full-speed fp32 denorms > whereas SI/CI can't? Interesting, I suppose that would be intended for [compute]. Yes, VI has full-speed fp32 denorms except for a few instructions (e.g. MAD) which can be replaced by other instructions. Marek
Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats
This looks good to me, albeit I know nothing about the hw. So VI could do (just with some restrictions) even full-speed fp32 denorms whereas SI/CI can't? Interesting, I suppose that would be intended for compute. Intel x86 can't even do that (actually, I think Skylake can), though certainly other CPUs could do that for ages. (Albeit there's still nothing in the glsl spec which says this is required for fp16 pack...) Roland On 06.02.2016 13:15, Marek Olšák wrote: > From: Marek Olšák> > This fixes FP16 conversion instructions for VI, which has 16-bit floats, > but not SI & CI, which can't disable denorms for those instructions. > --- > src/gallium/drivers/radeonsi/si_shader.c| 14 ++ > src/gallium/drivers/radeonsi/si_state_shaders.c | 18 -- > src/gallium/drivers/radeonsi/sid.h | 3 +++ > 3 files changed, 29 insertions(+), 6 deletions(-) > > diff --git a/src/gallium/drivers/radeonsi/si_shader.c > b/src/gallium/drivers/radeonsi/si_shader.c > index a4680ce..3f1db70 100644 > --- a/src/gallium/drivers/radeonsi/si_shader.c > +++ b/src/gallium/drivers/radeonsi/si_shader.c > @@ -4155,6 +4155,20 @@ int si_compile_llvm(struct si_screen *sscreen, > > si_shader_binary_read_config(binary, conf, 0); > > + /* Enable 64-bit and 16-bit denormals, because there is no performance > + * cost. > + * > + * If denormals are enabled, all floating-point output modifiers are > + * ignored. > + * > + * Don't enable denormals for 32-bit floats, because: > + * - Floating-point output modifiers would be ignored by the hw. > + * - Some opcodes don't support denormals, such as v_mad_f32. We would > + * have to stop using those. > + * - SI & CI would be very slow. > + */ > + conf->float_mode |= V_00B028_FP_64_DENORMS; > + > FREE(binary->config); > FREE(binary->global_symbol_offsets); > binary->config = NULL; > diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c > b/src/gallium/drivers/radeonsi/si_state_shaders.c > index ce795c0..77a4e47 100644 > --- a/src/gallium/drivers/radeonsi/si_state_shaders.c > +++ b/src/gallium/drivers/radeonsi/si_state_shaders.c > @@ -124,7 +124,8 @@ static void si_shader_ls(struct si_shader *shader) > shader->config.rsrc1 = S_00B528_VGPRS((shader->config.num_vgprs - 1) / > 4) | > S_00B528_SGPRS((num_sgprs - 1) / 8) | > S_00B528_VGPR_COMP_CNT(vgpr_comp_cnt) | > -S_00B528_DX10_CLAMP(1); > +S_00B528_DX10_CLAMP(1) | > +S_00B528_FLOAT_MODE(shader->config.float_mode); > shader->config.rsrc2 = S_00B52C_USER_SGPR(num_user_sgprs) | > > S_00B52C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0); > } > @@ -157,7 +158,8 @@ static void si_shader_hs(struct si_shader *shader) > si_pm4_set_reg(pm4, R_00B428_SPI_SHADER_PGM_RSRC1_HS, > S_00B428_VGPRS((shader->config.num_vgprs - 1) / 4) | > S_00B428_SGPRS((num_sgprs - 1) / 8) | > -S_00B428_DX10_CLAMP(1)); > +S_00B428_DX10_CLAMP(1) | > +S_00B428_FLOAT_MODE(shader->config.float_mode)); > si_pm4_set_reg(pm4, R_00B42C_SPI_SHADER_PGM_RSRC2_HS, > S_00B42C_USER_SGPR(num_user_sgprs) | > > S_00B42C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0)); > @@ -203,7 +205,8 @@ static void si_shader_es(struct si_shader *shader) > S_00B328_VGPRS((shader->config.num_vgprs - 1) / 4) | > S_00B328_SGPRS((num_sgprs - 1) / 8) | > S_00B328_VGPR_COMP_CNT(vgpr_comp_cnt) | > -S_00B328_DX10_CLAMP(1)); > +S_00B328_DX10_CLAMP(1) | > +S_00B328_FLOAT_MODE(shader->config.float_mode)); > si_pm4_set_reg(pm4, R_00B32C_SPI_SHADER_PGM_RSRC2_ES, > S_00B32C_USER_SGPR(num_user_sgprs) | > > S_00B32C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0)); > @@ -292,7 +295,8 @@ static void
si_shader_gs(struct si_shader *shader) > si_pm4_set_reg(pm4, R_00B228_SPI_SHADER_PGM_RSRC1_GS, > S_00B228_VGPRS((shader->config.num_vgprs - 1) / 4) | > S_00B228_SGPRS((num_sgprs - 1) / 8) | > -S_00B228_DX10_CLAMP(1)); > +S_00B228_DX10_CLAMP(1) | > +S_00B228_FLOAT_MODE(shader->config.float_mode)); > si_pm4_set_reg(pm4, R_00B22C_SPI_SHADER_PGM_RSRC2_GS, > S_00B22C_USER_SGPR(num_user_sgprs) | > > S_00B22C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0)); > @@ -381,7 +385,8 @@ static void si_shader_vs(struct si_shader *shader, struct > si_shader *gs) >
Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats
> On Feb 8, 2016, at 12:38, Marek Olšák wrote: > >> >> We should tell the compiler we are enabling fp-64 denorms by adding >> +fp64-denormals to the feature string. It would also be better to >> read the float_mode value from the config registers emitted by the >> compiler. > > Yes, I agree, but LLVM only sets these parameters for compute or even > HSA-only kernels, not for graphics shaders. We need to set the mode > for all users _now_, not in 6 months. Last time I looked, > +fp64-denormals had no effect on graphics shaders. This is a bug. I think I left these because the config register macro names were different for the other shader types, even though they appeared to be the same thing. -Matt
Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats
On 06.02.2016 07:15, Marek Olšák wrote: From: Marek Olšák. This fixes FP16 conversion instructions for VI, which has 16-bit floats, but not SI & CI, which can't disable denorms for those instructions. Reviewed-by: Nicolai Hähnle --- src/gallium/drivers/radeonsi/si_shader.c| 14 ++ src/gallium/drivers/radeonsi/si_state_shaders.c | 18 -- src/gallium/drivers/radeonsi/sid.h | 3 +++ 3 files changed, 29 insertions(+), 6 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c index a4680ce..3f1db70 100644 --- a/src/gallium/drivers/radeonsi/si_shader.c +++ b/src/gallium/drivers/radeonsi/si_shader.c @@ -4155,6 +4155,20 @@ int si_compile_llvm(struct si_screen *sscreen, si_shader_binary_read_config(binary, conf, 0); + /* Enable 64-bit and 16-bit denormals, because there is no performance +* cost. +* +* If denormals are enabled, all floating-point output modifiers are +* ignored. +* +* Don't enable denormals for 32-bit floats, because: +* - Floating-point output modifiers would be ignored by the hw. +* - Some opcodes don't support denormals, such as v_mad_f32. We would +* have to stop using those. +* - SI & CI would be very slow. +*/ + conf->float_mode |= V_00B028_FP_64_DENORMS; + FREE(binary->config); FREE(binary->global_symbol_offsets); binary->config = NULL; diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c index ce795c0..77a4e47 100644 --- a/src/gallium/drivers/radeonsi/si_state_shaders.c +++ b/src/gallium/drivers/radeonsi/si_state_shaders.c @@ -124,7 +124,8 @@ static void si_shader_ls(struct si_shader *shader) shader->config.rsrc1 = S_00B528_VGPRS((shader->config.num_vgprs - 1) / 4) | S_00B528_SGPRS((num_sgprs - 1) / 8) | S_00B528_VGPR_COMP_CNT(vgpr_comp_cnt) | - S_00B528_DX10_CLAMP(1); + S_00B528_DX10_CLAMP(1) | + S_00B528_FLOAT_MODE(shader->config.float_mode); shader->config.rsrc2 = S_00B52C_USER_SGPR(num_user_sgprs) | S_00B52C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0); } @@ -157,7 +158,8 @@ static void si_shader_hs(struct si_shader *shader) si_pm4_set_reg(pm4, R_00B428_SPI_SHADER_PGM_RSRC1_HS, S_00B428_VGPRS((shader->config.num_vgprs - 1) / 4) | S_00B428_SGPRS((num_sgprs - 1) / 8) | - S_00B428_DX10_CLAMP(1)); + S_00B428_DX10_CLAMP(1) | + S_00B428_FLOAT_MODE(shader->config.float_mode)); si_pm4_set_reg(pm4, R_00B42C_SPI_SHADER_PGM_RSRC2_HS, S_00B42C_USER_SGPR(num_user_sgprs) | S_00B42C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0)); @@ -203,7 +205,8 @@ static void si_shader_es(struct si_shader *shader) S_00B328_VGPRS((shader->config.num_vgprs - 1) / 4) | S_00B328_SGPRS((num_sgprs - 1) / 8) | S_00B328_VGPR_COMP_CNT(vgpr_comp_cnt) | - S_00B328_DX10_CLAMP(1)); + S_00B328_DX10_CLAMP(1) | + S_00B328_FLOAT_MODE(shader->config.float_mode)); si_pm4_set_reg(pm4, R_00B32C_SPI_SHADER_PGM_RSRC2_ES, S_00B32C_USER_SGPR(num_user_sgprs) | S_00B32C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0)); @@ -292,7 +295,8 @@ static void si_shader_gs(struct si_shader *shader) si_pm4_set_reg(pm4, R_00B228_SPI_SHADER_PGM_RSRC1_GS, S_00B228_VGPRS((shader->config.num_vgprs - 1) / 4) | S_00B228_SGPRS((num_sgprs - 1) / 8) | - S_00B228_DX10_CLAMP(1)); + S_00B228_DX10_CLAMP(1) | + S_00B228_FLOAT_MODE(shader->config.float_mode)); si_pm4_set_reg(pm4, R_00B22C_SPI_SHADER_PGM_RSRC2_GS, S_00B22C_USER_SGPR(num_user_sgprs) | S_00B22C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0)); @@ -381,7 +385,8 @@ static void si_shader_vs(struct si_shader *shader, struct si_shader *gs)
S_00B128_VGPRS((shader->config.num_vgprs - 1) / 4) | S_00B128_SGPRS((num_sgprs - 1) / 8) | S_00B128_VGPR_COMP_CNT(vgpr_comp_cnt) | - S_00B128_DX10_CLAMP(1)); + S_00B128_DX10_CLAMP(1) | + S_00B128_FLOAT_MODE(shader->config.float_mode)); si_pm4_set_reg(pm4, R_00B12C_SPI_SHADER_PGM_RSRC2_VS, S_00B12C_USER_SGPR(num_user_sgprs) |
Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats
On Mon, Feb 8, 2016 at 5:16 PM, Matt Arsenault wrote: > > On Feb 8, 2016, at 08:08, Tom Stellard wrote: > > Do SI/CI support fp64 denorms? If so, won't this hurt performance? > > We should tell the compiler we are enabling fp-64 denorms by adding > +fp64-denormals to the feature string. It would also be better to > read the float_mode value from the config registers emitted by the > compiler. > > > Yes, the runtime here should read the value out of the binary and enable it > in the compiler rather than the runtime hardcoding it. If you wanted to load > a shader with different FP rules for example it should be able to switch. This is not the runtime. This is the compiler. :) It's the middle-end though, not the back-end. I would use +fp64-denormals if it did something to graphics shaders. Marek
Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats
On Mon, Feb 8, 2016 at 5:08 PM, Tom Stellard wrote: > On Sat, Feb 06, 2016 at 01:15:42PM +0100, Marek Olšák wrote: >> From: Marek Olšák >> >> This fixes FP16 conversion instructions for VI, which has 16-bit floats, >> but not SI & CI, which can't disable denorms for those instructions. > > Do you know why this fixes FP16 conversions? What does the OpenGL > spec say about denormal handling? Yes, I know why. The patch explains everything as far as I can see though. What isn't clear? SI & CI: Don't support FP16. FP16 conversions are hardcoded to emit and accept FP16 denormals. VI: Supports FP16. FP16 denormal support is now configurable and affects FP16 conversions as well (shared setting with FP64). OpenGL doesn't require denormals. Piglit does. I think this is incorrect piglit behavior. > >> --- >> src/gallium/drivers/radeonsi/si_shader.c| 14 ++ >> src/gallium/drivers/radeonsi/si_state_shaders.c | 18 -- >> src/gallium/drivers/radeonsi/sid.h | 3 +++ >> 3 files changed, 29 insertions(+), 6 deletions(-) >> >> diff --git a/src/gallium/drivers/radeonsi/si_shader.c >> b/src/gallium/drivers/radeonsi/si_shader.c >> index a4680ce..3f1db70 100644 >> --- a/src/gallium/drivers/radeonsi/si_shader.c >> +++ b/src/gallium/drivers/radeonsi/si_shader.c >> @@ -4155,6 +4155,20 @@ int si_compile_llvm(struct si_screen *sscreen, >> >> si_shader_binary_read_config(binary, conf, 0); >> >> + /* Enable 64-bit and 16-bit denormals, because there is no performance >> + * cost. >> + * >> + * If denormals are enabled, all floating-point output modifiers are >> + * ignored. >> + * >> + * Don't enable denormals for 32-bit floats, because: >> + * - Floating-point output modifiers would be ignored by the hw. >> + * - Some opcodes don't support denormals, such as v_mad_f32. We would >> + * have to stop using those. >> + * - SI & CI would be very slow. >> + */ >> + conf->float_mode |= V_00B028_FP_64_DENORMS; >> + > > Do SI/CI support fp64 denorms? If so, won't this hurt performance? Yes, they do. Fp64 denorms don't hurt performance. Only fp32 denorms do on SI & CI. > > We should tell the compiler we are enabling fp-64 denorms by adding > +fp64-denormals to the feature string. It would also be better to > read the float_mode value from the config registers emitted by the > compiler. Yes, I agree, but LLVM only sets these parameters for compute or even HSA-only kernels, not for graphics shaders. We need to set the mode for all users _now_, not in 6 months. Last time I looked, +fp64-denormals had no effect on graphics shaders. Marek
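(Restating Marek's summary as code, purely as an illustration: the helper below does not exist in Mesa, and names like chip_class/CIK just follow Mesa conventions.)

#include <stdbool.h>

/* Whether FP16 denorms survive the F32<->F16 conversion opcodes.
 * On SI/CI the conversions always honor FP16 denorms (not configurable);
 * on VI they follow the FP64/FP16 denorm bits in FLOAT_MODE, so setting
 * FP_64_DENORMS gives uniform behavior across all three generations. */
static bool si_fp16_denorms_preserved(enum chip_class chip,
				      unsigned float_mode)
{
	if (chip <= CIK)
		return true;	/* hardwired in v_cvt_f16_f32 / v_cvt_f32_f16 */
	return (float_mode & V_00B028_FP_64_DENORMS) == V_00B028_FP_64_DENORMS;
}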
Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats
On 09.02.2016 00:53, Ian Romanick wrote: > On 02/08/2016 03:37 PM, Roland Scheidegger wrote: >> On 09.02.2016 00:02, Ian Romanick wrote: >>> On 02/08/2016 12:38 PM, Marek Olšák wrote: On Mon, Feb 8, 2016 at 5:08 PM, Tom Stellard wrote: > On Sat, Feb 06, 2016 at 01:15:42PM +0100, Marek Olšák wrote: >> From: Marek Olšák >> >> This fixes FP16 conversion instructions for VI, which has 16-bit floats, >> but not SI & CI, which can't disable denorms for those instructions. > > Do you know why this fixes FP16 conversions? What does the OpenGL > spec say about denormal handling? Yes, I know why. The patch explains everything as far as I can see though. What isn't clear? SI & CI: Don't support FP16. FP16 conversions are hardcoded to emit and accept FP16 denormals. VI: Supports FP16. FP16 denormal support is now configurable and affects FP16 conversions as well (shared setting with FP64). OpenGL doesn't require denormals. Piglit does. I think this is incorrect piglit behavior. >>> >>> I submitted a public spec bug for this issue: >>> >>> https://www.khronos.org/bugzilla/show_bug.cgi?id=1460 >>> >>> >>> I'm investigating whether a similar bug is needed for the SPIR-V >>> specification. >>> >>> I think an argument can be made for either the flush-to-zero or >>> non-flush-to-zero behavior in the case of unpackHalf2x16 and (possibly) >>> packHalf2x16. The only place in the GLSL 4.50.5 specification that >>> mentions subnormal values is section 4.7.1 (Range and Precision). >>> >>> "The precision of stored single- and double-precision floating-point >>> variables is defined by the IEEE 754 standard for 32-bit and 64-bit >>> floating-point numbers. ... Any denormalized value input into a >>> shader or potentially generated by any operation in a shader can be >>> flushed to 0." >>> >>> Since there is no half-precision type in desktop GLSL, there is no >>> mention of 16-bit subnormal values. As Roland mentioned before, all >>> 16-bit subnormal values are 32-bit normal values. >>> >>> As I mentioned before, from the point of view of an application >>> developer, the flush-to-zero behavior for unpackHalf2x16 is both >>> surprising and awful. :) >>> >>> While I think an argument can be made for either behavior, I also think >>> the argument for the non-flush-to-zero behavior is slightly stronger. >>> The case for flush-to-zero based on the above spec quotation fails for >>> two reasons. First, the "input into [the] shader" is not a subnormal >>> number. It is an integer. Second, the "[value] potentially generated >>> by [the] operation" is not subnormal in single-precision. >> >> I don't disagree with that, however OTOH you could make an argument that >> such a strong guarantee for packed half floats is inconsistent with >> what's required for them elsewhere in GL. In particular half float >> texture formats - these are still based on ARB_half_float_pixel. Which >> says denormals are optional, infs are optional, NaNs are optional - >> albeit that's not any different to ordinary floats... > > Thanks for mentioning this. :) The same issue had occurred to me, and I > was trying to find some relevant text in the GL spec. I hadn't thought > to look in the extension spec.
GL core spec 4.5 actually mentions pretty much the same within the generic numeric bits, section 2.3.4.2 - except the extension bit has explicitly listed that exponent 0 and mantissa non-zero may be decoded to zero (and similar for infs, nans). But the core bits text still mentions just that "providing a denormalized number or negative zero to GL must yield predictable results" so flush to zero is apparently still allowed. > >> (And I still have the problem that d3d10 wants trunc behavior instead of >> round... fwiw the precedent there in GL is also for r11g11b10 format, >> which says round-to-nearest recommended but trunc allowed, and all too >> large finite numbers converted to max finite (which is inconsistent with >> nearest rounding). The spec is completely silent both within GLSL or GL >> how rounding should be done for fp32 to fp16, albeit I don't disagree >> round-to-nearest seems the most reasonable.) > > The GLSL spec isn't silent. Section 4.7.1 explicitly says, "The > rounding mode cannot be set and is undefined." Yes, but at least to me it's not really obvious this applies to all operations - and at least the basic operations say "must be correctly rounded"; what does this even mean if the rounding mode isn't defined in the first place? Would the rounding mode have to be consistent for all operations, so, always "trunc" for all operations would
Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats
FWIW, D3D requires round-toward-zero for conversion to FP16 colorbuffers. radeonsi does the same: round-toward-zero for colorbuffers and round-to-nearest-even for the GLSL packing functions. This looks like an inconsistency that may cause issues though. Marek On Tue, Feb 9, 2016 at 1:12 AM, Roland Scheidegger wrote: > On 09.02.2016 00:53, Ian Romanick wrote: >> On 02/08/2016 03:37 PM, Roland Scheidegger wrote: >>> On 09.02.2016 00:02, Ian Romanick wrote: On 02/08/2016 12:38 PM, Marek Olšák wrote: > On Mon, Feb 8, 2016 at 5:08 PM, Tom Stellard wrote: >> On Sat, Feb 06, 2016 at 01:15:42PM +0100, Marek Olšák wrote: >>> From: Marek Olšák >>> >>> This fixes FP16 conversion instructions for VI, which has 16-bit floats, >>> but not SI & CI, which can't disable denorms for those instructions. >> >> Do you know why this fixes FP16 conversions? What does the OpenGL >> spec say about denormal handling? > > Yes, I know why. The patch explains everything as far as I can see > though. What isn't clear? > > SI & CI: Don't support FP16. FP16 conversions are hardcoded to emit > and accept FP16 denormals. > VI: Supports FP16. FP16 denormal support is now configurable and > affects FP16 conversions as well (shared setting with FP64). > > OpenGL doesn't require denormals. Piglit does. I think this is > incorrect piglit behavior. I submitted a public spec bug for this issue: https://www.khronos.org/bugzilla/show_bug.cgi?id=1460 I'm investigating whether a similar bug is needed for the SPIR-V specification. I think an argument can be made for either the flush-to-zero or non-flush-to-zero behavior in the case of unpackHalf2x16 and (possibly) packHalf2x16. The only place in the GLSL 4.50.5 specification that mentions subnormal values is section 4.7.1 (Range and Precision). "The precision of stored single- and double-precision floating-point variables is defined by the IEEE 754 standard for 32-bit and 64-bit floating-point numbers. ... Any denormalized value input into a shader or potentially generated by any operation in a shader can be flushed to 0." Since there is no half-precision type in desktop GLSL, there is no mention of 16-bit subnormal values. As Roland mentioned before, all 16-bit subnormal values are 32-bit normal values. As I mentioned before, from the point of view of an application developer, the flush-to-zero behavior for unpackHalf2x16 is both surprising and awful. :) While I think an argument can be made for either behavior, I also think the argument for the non-flush-to-zero behavior is slightly stronger. The case for flush-to-zero based on the above spec quotation fails for two reasons. First, the "input into [the] shader" is not a subnormal number. It is an integer. Second, the "[value] potentially generated by [the] operation" is not subnormal in single-precision. >>> >>> I don't disagree with that, however OTOH you could make an argument that >>> such a strong guarantee for packed half floats is inconsistent with >>> what's required for them elsewhere in GL. In particular half float >>> texture formats - these are still based on ARB_half_float_pixel. Which >>> says denormals are optional, infs are optional, NaNs are optional - >>> albeit that's not any different to ordinary floats... >> >> Thanks for mentioning this.
:) The same issue had occurred to me, and I >> was trying to find some relevant text in the GL spec. I hadn't thought >> to look in the extension spec. > GL core spec 4.5 actually mentions pretty much the same within the > generic numeric bits, section 2.3.4.2 - except the extension bit has > explicitly listed that exponent 0 and mantissa non-zero may be decoded > to zero (and similar for infs, nans). But the core bits text still > mentions just that "providing a denormalized number or negative zero to > GL must yield predictable results" so flush to zero is apparently still > allowed. > > >> >>> (And I still have the problem that d3d10 wants trunc behavior instead of >>> round... fwiw the precedent there in GL is also for r11g11b10 format, >>> which says round-to-nearest recommended but trunc allowed, and all too >>> large finite numbers converted to max finite (which is inconsistent with >>> nearest rounding). The spec is completely silent both within GLSL or GL >>> how rounding should be done for fp32 to fp16, albeit I don't disagree >>> round-to-nearest seems the most reasonable.) >> >> The GLSL spec isn't
Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats
On 02/08/2016 12:38 PM, Marek Olšák wrote: > On Mon, Feb 8, 2016 at 5:08 PM, Tom Stellard wrote: >> On Sat, Feb 06, 2016 at 01:15:42PM +0100, Marek Olšák wrote: >>> From: Marek Olšák >>> >>> This fixes FP16 conversion instructions for VI, which has 16-bit floats, >>> but not SI & CI, which can't disable denorms for those instructions. >> >> Do you know why this fixes FP16 conversions? What does the OpenGL >> spec say about denormal handling? > > Yes, I know why. The patch explains everything as far as I can see > though. What isn't clear? > > SI & CI: Don't support FP16. FP16 conversions are hardcoded to emit > and accept FP16 denormals. > VI: Supports FP16. FP16 denormal support is now configurable and > affects FP16 conversions as well (shared setting with FP64). > > OpenGL doesn't require denormals. Piglit does. I think this is > incorrect piglit behavior. I submitted a public spec bug for this issue: https://www.khronos.org/bugzilla/show_bug.cgi?id=1460 I'm investigating whether a similar bug is needed for the SPIR-V specification. I think an argument can be made for either the flush-to-zero or non-flush-to-zero behavior in the case of unpackHalf2x16 and (possibly) packHalf2x16. The only place in the GLSL 4.50.5 specification that mentions subnormal values is section 4.7.1 (Range and Precision). "The precision of stored single- and double-precision floating-point variables is defined by the IEEE 754 standard for 32-bit and 64-bit floating-point numbers. ... Any denormalized value input into a shader or potentially generated by any operation in a shader can be flushed to 0." Since there is no half-precision type in desktop GLSL, there is no mention of 16-bit subnormal values. As Roland mentioned before, all 16-bit subnormal values are 32-bit normal values. As I mentioned before, from the point of view of an application developer, the flush-to-zero behavior for unpackHalf2x16 is both surprising and awful. :) While I think an argument can be made for either behavior, I also think the argument for the non-flush-to-zero behavior is slightly stronger. The case for flush-to-zero based on the above spec quotation fails for two reasons. First, the "input into [the] shader" is not a subnormal number. It is an integer. Second, the "[value] potentially generated by [the] operation" is not subnormal in single-precision. We've already determined that NVIDIA closed-source drivers do not flush to zero. I'm curious to know what AMD's closed-source drivers do for 16-bit subnormal values supplied to unpackHalf2x16. If they do not flush to zero, then you had better believe that applications depend on that behavior... and that also means that it doesn't matter very much what piglit does or the spec does (or does not) say. This is the sort of situation where the spec changes to match application expectations and shipping implementations... and Mesa drivers change to follow. This isn't even close to the first time through that loop.
>>> --- >>> src/gallium/drivers/radeonsi/si_shader.c| 14 ++ >>> src/gallium/drivers/radeonsi/si_state_shaders.c | 18 -- >>> src/gallium/drivers/radeonsi/sid.h | 3 +++ >>> 3 files changed, 29 insertions(+), 6 deletions(-) >>> >>> diff --git a/src/gallium/drivers/radeonsi/si_shader.c >>> b/src/gallium/drivers/radeonsi/si_shader.c >>> index a4680ce..3f1db70 100644 >>> --- a/src/gallium/drivers/radeonsi/si_shader.c >>> +++ b/src/gallium/drivers/radeonsi/si_shader.c >>> @@ -4155,6 +4155,20 @@ int si_compile_llvm(struct si_screen *sscreen, >>> >>> si_shader_binary_read_config(binary, conf, 0); >>> >>> + /* Enable 64-bit and 16-bit denormals, because there is no performance >>> + * cost. >>> + * >>> + * If denormals are enabled, all floating-point output modifiers are >>> + * ignored. >>> + * >>> + * Don't enable denormals for 32-bit floats, because: >>> + * - Floating-point output modifiers would be ignored by the hw. >>> + * - Some opcodes don't support denormals, such as v_mad_f32. We would >>> + * have to stop using those. >>> + * - SI & CI would be very slow. >>> + */ >>> + conf->float_mode |= V_00B028_FP_64_DENORMS; >>> + >> >> Do SI/CI support fp64 denorms? If so, won't this hurt performance? > > Yes, they do. Fp64 denorms don't hurt performance. Only fp32 denorms > do on SI & CI. > >> >> We should tell the compiler we are enabling fp-64 denorms by adding >> +fp64-denormals to the feature string. It would also be better to >> read the float_mode value from the config registers emitted by the >> compiler. > > Yes, I agree, but LLVM only sets these parameters for compute or even > HSA-only kernels, not for graphics shaders. We need to set the mode > for all users _now_, not in 6 months. Last time I looked, > +fp64-denormals had no effect on graphics shaders. > > Marek >
Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats
On 09.02.2016 00:02, Ian Romanick wrote: > On 02/08/2016 12:38 PM, Marek Olšák wrote: >> On Mon, Feb 8, 2016 at 5:08 PM, Tom Stellard wrote: >>> On Sat, Feb 06, 2016 at 01:15:42PM +0100, Marek Olšák wrote: From: Marek Olšák This fixes FP16 conversion instructions for VI, which has 16-bit floats, but not SI & CI, which can't disable denorms for those instructions. Do you know why this fixes FP16 conversions? What does the OpenGL spec say about denormal handling? >> >> Yes, I know why. The patch explains everything as far as I can see >> though. What isn't clear? >> >> SI & CI: Don't support FP16. FP16 conversions are hardcoded to emit >> and accept FP16 denormals. >> VI: Supports FP16. FP16 denormal support is now configurable and >> affects FP16 conversions as well (shared setting with FP64). >> >> OpenGL doesn't require denormals. Piglit does. I think this is >> incorrect piglit behavior. > > I submitted a public spec bug for this issue: > > https://www.khronos.org/bugzilla/show_bug.cgi?id=1460 > > I'm investigating whether a similar bug is needed for the SPIR-V > specification. > > I think an argument can be made for either the flush-to-zero or > non-flush-to-zero behavior in the case of unpackHalf2x16 and (possibly) > packHalf2x16. The only place in the GLSL 4.50.5 specification that > mentions subnormal values is section 4.7.1 (Range and Precision). > > "The precision of stored single- and double-precision floating-point > variables is defined by the IEEE 754 standard for 32-bit and 64-bit > floating-point numbers. ... Any denormalized value input into a > shader or potentially generated by any operation in a shader can be > flushed to 0." > > Since there is no half-precision type in desktop GLSL, there is no > mention of 16-bit subnormal values. As Roland mentioned before, all > 16-bit subnormal values are 32-bit normal values. > > As I mentioned before, from the point of view of an application > developer, the flush-to-zero behavior for unpackHalf2x16 is both > surprising and awful. :) > > While I think an argument can be made for either behavior, I also think > the argument for the non-flush-to-zero behavior is slightly stronger. > The case for flush-to-zero based on the above spec quotation fails for > two reasons. First, the "input into [the] shader" is not a subnormal > number. It is an integer. Second, the "[value] potentially generated > by [the] operation" is not subnormal in single-precision. I don't disagree with that, however OTOH you could make an argument that such a strong guarantee for packed half floats is inconsistent with what's required for them elsewhere in GL. In particular half float texture formats - these are still based on ARB_half_float_pixel. Which says denormals are optional, infs are optional, NaNs are optional - albeit that's not any different to ordinary floats... (And I still have the problem that d3d10 wants trunc behavior instead of round... fwiw the precedent there in GL is also for r11g11b10 format, which says round-to-nearest recommended but trunc allowed, and all too large finite numbers converted to max finite (which is inconsistent with nearest rounding). The spec is completely silent both within GLSL or GL how rounding should be done for fp32 to fp16, albeit I don't disagree round-to-nearest seems the most reasonable.) Roland > > We've already determined that NVIDIA closed-source drivers do not flush > to zero. I'm curious to know what AMD's closed-source drivers do for > 16-bit subnormal values supplied to unpackHalf2x16.
If they do not > flush to zero, then you had better believe that applications depend on > that behavior... and that also means that it doesn't matter very much > what piglit does or the spec does (or does not) say. This is the sort > of situation where the spec changes to match application expectations > and shipping implementations... and Mesa drivers change to follow. This > isn't even close to the first time through that loop.
Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats
On 02/08/2016 03:37 PM, Roland Scheidegger wrote: > On 09.02.2016 00:02, Ian Romanick wrote: >> On 02/08/2016 12:38 PM, Marek Olšák wrote: >>> On Mon, Feb 8, 2016 at 5:08 PM, Tom Stellard wrote: On Sat, Feb 06, 2016 at 01:15:42PM +0100, Marek Olšák wrote: > From: Marek Olšák > > This fixes FP16 conversion instructions for VI, which has 16-bit floats, > but not SI & CI, which can't disable denorms for those instructions. Do you know why this fixes FP16 conversions? What does the OpenGL spec say about denormal handling? >>> >>> Yes, I know why. The patch explains everything as far as I can see >>> though. What isn't clear? >>> >>> SI & CI: Don't support FP16. FP16 conversions are hardcoded to emit >>> and accept FP16 denormals. >>> VI: Supports FP16. FP16 denormal support is now configurable and >>> affects FP16 conversions as well (shared setting with FP64). >>> >>> OpenGL doesn't require denormals. Piglit does. I think this is >>> incorrect piglit behavior. >> >> I submitted a public spec bug for this issue: >> >> https://www.khronos.org/bugzilla/show_bug.cgi?id=1460 >> >> I'm investigating whether a similar bug is needed for the SPIR-V >> specification. >> >> I think an argument can be made for either the flush-to-zero or >> non-flush-to-zero behavior in the case of unpackHalf2x16 and (possibly) >> packHalf2x16. The only place in the GLSL 4.50.5 specification that >> mentions subnormal values is section 4.7.1 (Range and Precision). >> >> "The precision of stored single- and double-precision floating-point >> variables is defined by the IEEE 754 standard for 32-bit and 64-bit >> floating-point numbers. ... Any denormalized value input into a >> shader or potentially generated by any operation in a shader can be >> flushed to 0." >> >> Since there is no half-precision type in desktop GLSL, there is no >> mention of 16-bit subnormal values. As Roland mentioned before, all >> 16-bit subnormal values are 32-bit normal values. >> >> As I mentioned before, from the point of view of an application >> developer, the flush-to-zero behavior for unpackHalf2x16 is both >> surprising and awful. :) >> >> While I think an argument can be made for either behavior, I also think >> the argument for the non-flush-to-zero behavior is slightly stronger. >> The case for flush-to-zero based on the above spec quotation fails for >> two reasons. First, the "input into [the] shader" is not a subnormal >> number. It is an integer. Second, the "[value] potentially generated >> by [the] operation" is not subnormal in single-precision. > > I don't disagree with that, however OTOH you could make an argument that > such a strong guarantee for packed half floats is inconsistent with > what's required for them elsewhere in GL. In particular half float > texture formats - these are still based on ARB_half_float_pixel. Which > says denormals are optional, infs are optional, NaNs are optional - > albeit that's not any different to ordinary floats... Thanks for mentioning this. :) The same issue had occurred to me, and I was trying to find some relevant text in the GL spec. I hadn't thought to look in the extension spec. > (And I still have the problem that d3d10 wants trunc behavior instead of > round... fwiw the precedent there in GL is also for r11g11b10 format, > which says round-to-nearest recommended but trunc allowed, and all too > large finite numbers converted to max finite (which is inconsistent with > nearest rounding).
The spec is completely silent both within GLSL or GL > how rounding should be done for fp32 to fp16, albeit I don't disagree > round-to-nearest seems the most reasonable.) The GLSL spec isn't silent. Section 4.7.1 explicitly says, "The rounding mode cannot be set and is undefined." > Roland > >> We've already determined that NVIDIA closed-source drivers do not flush >> to zero. I'm curious to know what AMD's closed-source drivers do for >> 16-bit subnormal values supplied to unpackHalf2x16. If they do not >> flush to zero, then you had better believe that applications depend on >> that behavior... and that also means that it doesn't matter very much >> what piglit does or the spec does (or does not) say. This is the sort >> of situation where the spec changes to match application expectations >> and shipping implementations... and Mesa drivers change to follow. This >> isn't even close to the first time through that loop.
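(To make the rounding question concrete: an illustrative fp32-to-fp16 conversion, my own sketch rather than driver code, supporting both behaviors discussed in this subthread - round-to-nearest-even, which Marek says radeonsi uses for the GLSL packing functions, and round-toward-zero with overflow clamped to the largest finite half, which D3D requires for FP16 colorbuffers.)

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static uint16_t float_to_half(float f, bool round_nearest_even)
{
	uint32_t x;
	memcpy(&x, &f, 4);

	uint32_t sign = (x >> 16) & 0x8000;
	int32_t  exp  = (int32_t)((x >> 23) & 0xff) - 127 + 15;	/* rebias */
	uint32_t mant = x & 0x7fffff;

	if (((x >> 23) & 0xff) == 0xff)		/* Inf or NaN */
		return (uint16_t)(sign | 0x7c00 | (mant ? 0x200 : 0));

	if (exp >= 31)				/* finite overflow */
		return (uint16_t)(sign | (round_nearest_even ? 0x7c00 : 0x7bff));

	if (exp <= 0) {				/* half denormal or zero */
		if (exp < -10)
			return (uint16_t)sign;
		mant |= 0x800000;		/* make the leading 1 explicit */
		uint32_t shift = (uint32_t)(14 - exp);	/* 14..24 */
		uint16_t h = (uint16_t)(sign | (mant >> shift));
		if (round_nearest_even) {
			uint32_t rem  = mant & ((1u << shift) - 1);
			uint32_t half = 1u << (shift - 1);
			if (rem > half || (rem == half && (h & 1)))
				h++;
		}
		return h;
	}

	uint16_t h = (uint16_t)(sign | ((uint32_t)exp << 10) | (mant >> 13));
	if (round_nearest_even) {
		uint32_t rem = mant & 0x1fff;
		if (rem > 0x1000 || (rem == 0x1000 && (h & 1)))
			h++;	/* a carry into the exponent correctly gives Inf */
	}
	return h;
}

For example, float_to_half(100000.0f, true) gives 0x7c00 (Inf) while float_to_half(100000.0f, false) gives 0x7bff (65504); that divergence is exactly the inconsistency Marek points out above.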
[Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats
From: Marek Olšák. This fixes FP16 conversion instructions for VI, which has 16-bit floats, but not SI & CI, which can't disable denorms for those instructions. --- src/gallium/drivers/radeonsi/si_shader.c| 14 ++ src/gallium/drivers/radeonsi/si_state_shaders.c | 18 -- src/gallium/drivers/radeonsi/sid.h | 3 +++ 3 files changed, 29 insertions(+), 6 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c index a4680ce..3f1db70 100644 --- a/src/gallium/drivers/radeonsi/si_shader.c +++ b/src/gallium/drivers/radeonsi/si_shader.c @@ -4155,6 +4155,20 @@ int si_compile_llvm(struct si_screen *sscreen, si_shader_binary_read_config(binary, conf, 0); + /* Enable 64-bit and 16-bit denormals, because there is no performance +* cost. +* +* If denormals are enabled, all floating-point output modifiers are +* ignored. +* +* Don't enable denormals for 32-bit floats, because: +* - Floating-point output modifiers would be ignored by the hw. +* - Some opcodes don't support denormals, such as v_mad_f32. We would +* have to stop using those. +* - SI & CI would be very slow. +*/ + conf->float_mode |= V_00B028_FP_64_DENORMS; + FREE(binary->config); FREE(binary->global_symbol_offsets); binary->config = NULL; diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c index ce795c0..77a4e47 100644 --- a/src/gallium/drivers/radeonsi/si_state_shaders.c +++ b/src/gallium/drivers/radeonsi/si_state_shaders.c @@ -124,7 +124,8 @@ static void si_shader_ls(struct si_shader *shader) shader->config.rsrc1 = S_00B528_VGPRS((shader->config.num_vgprs - 1) / 4) | S_00B528_SGPRS((num_sgprs - 1) / 8) | S_00B528_VGPR_COMP_CNT(vgpr_comp_cnt) | - S_00B528_DX10_CLAMP(1); + S_00B528_DX10_CLAMP(1) | + S_00B528_FLOAT_MODE(shader->config.float_mode); shader->config.rsrc2 = S_00B52C_USER_SGPR(num_user_sgprs) | S_00B52C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0); } @@ -157,7 +158,8 @@ static void si_shader_hs(struct si_shader *shader) si_pm4_set_reg(pm4, R_00B428_SPI_SHADER_PGM_RSRC1_HS, S_00B428_VGPRS((shader->config.num_vgprs - 1) / 4) | S_00B428_SGPRS((num_sgprs - 1) / 8) | - S_00B428_DX10_CLAMP(1)); + S_00B428_DX10_CLAMP(1) | + S_00B428_FLOAT_MODE(shader->config.float_mode)); si_pm4_set_reg(pm4, R_00B42C_SPI_SHADER_PGM_RSRC2_HS, S_00B42C_USER_SGPR(num_user_sgprs) | S_00B42C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0)); @@ -203,7 +205,8 @@ static void si_shader_es(struct si_shader *shader) S_00B328_VGPRS((shader->config.num_vgprs - 1) / 4) | S_00B328_SGPRS((num_sgprs - 1) / 8) | S_00B328_VGPR_COMP_CNT(vgpr_comp_cnt) | - S_00B328_DX10_CLAMP(1)); + S_00B328_DX10_CLAMP(1) | + S_00B328_FLOAT_MODE(shader->config.float_mode)); si_pm4_set_reg(pm4, R_00B32C_SPI_SHADER_PGM_RSRC2_ES, S_00B32C_USER_SGPR(num_user_sgprs) | S_00B32C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0)); @@ -292,7 +295,8 @@ static void si_shader_gs(struct si_shader *shader) si_pm4_set_reg(pm4, R_00B228_SPI_SHADER_PGM_RSRC1_GS, S_00B228_VGPRS((shader->config.num_vgprs - 1) / 4) | S_00B228_SGPRS((num_sgprs - 1) / 8) | - S_00B228_DX10_CLAMP(1)); + S_00B228_DX10_CLAMP(1) | + S_00B228_FLOAT_MODE(shader->config.float_mode)); si_pm4_set_reg(pm4, R_00B22C_SPI_SHADER_PGM_RSRC2_GS, S_00B22C_USER_SGPR(num_user_sgprs) | S_00B22C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0)); @@ -381,7 +385,8 @@ static void si_shader_vs(struct si_shader *shader, struct si_shader *gs) S_00B128_VGPRS((shader->config.num_vgprs - 1) / 4) | S_00B128_SGPRS((num_sgprs - 1) / 8)
| S_00B128_VGPR_COMP_CNT(vgpr_comp_cnt) | - S_00B128_DX10_CLAMP(1)); + S_00B128_DX10_CLAMP(1) | + S_00B128_FLOAT_MODE(shader->config.float_mode)); si_pm4_set_reg(pm4, R_00B12C_SPI_SHADER_PGM_RSRC2_VS, S_00B12C_USER_SGPR(num_user_sgprs) | S_00B12C_SO_BASE0_EN(!!shader->selector->so.stride[0]) | @@ -567,7 +572,8 @@ static void