Re: [Mesa-dev] [PATCH 10/12] st/nine: Remove usage of SQRT in ff code

2016-02-08 Thread Marek Olšák
On Mon, Feb 8, 2016 at 12:33 AM, Ilia Mirkin  wrote:
> On Sun, Feb 7, 2016 at 6:26 PM, Axel Davy  wrote:
>> On 08/02/2016 00:21, Ilia Mirkin wrote:
>>>
>>> On Sun, Feb 7, 2016 at 6:13 PM, Axel Davy  wrote:

 SQRT is not supported everywhere, so replace
 it by RSQ + RCP

 Signed-off-by: Axel Davy 
 ---
   src/gallium/state_trackers/nine/nine_ff.c | 3 ++-
   1 file changed, 2 insertions(+), 1 deletion(-)

 diff --git a/src/gallium/state_trackers/nine/nine_ff.c
 b/src/gallium/state_trackers/nine/nine_ff.c
 index a5466a7..894fc63 100644
 --- a/src/gallium/state_trackers/nine/nine_ff.c
 +++ b/src/gallium/state_trackers/nine/nine_ff.c
 @@ -563,7 +563,8 @@ nine_ff_build_vs(struct NineDevice9 *device, struct
 vs_build_ctx *vs)
   struct ureg_src cPsz2 = ureg_DECL_constant(ureg, 27);

   ureg_DP3(ureg, tmp_x, ureg_src(r[1]), ureg_src(r[1]));
 -ureg_SQRT(ureg, tmp_y, _X(tmp));
 +ureg_RSQ(ureg, tmp_y, _X(tmp));
 +ureg_RCP(ureg, tmp_y, _Y(tmp));
>>>
>>> I'd recommend doing
>>>
>>> ureg_MUL(ureg, tmp_y, _Y(tmp), _X(tmp))
>>>
>>> instead. That should be (a) more numerically stable (rcp doesn't have
>>> great precision), and (b) not blow up for 0.
>>
>> Ok for the precision, but I'm not sure for 0
>>
>> With the mul version, with 0, it ends up computing inf * 0 = NaN,
>> whereas with the rcp version, it does 1/inf == 0 (as far as I know),
>> which is the expected result.
>
> Hmmm... not sure what RSQ(0) returns actually. I assumed it was NaN.
> What you really want is a "flush nan to 0" option on the mul like nvc0
> has, but there's no way to express that in TGSI.
>
> Perhaps you can keep the SQRT if PIPE_CAP_TGSI_SQRT is exposed, and
> otherwise do the MUL or the RCP. FWIW this is what glsl_to_tgsi does:
>
>  emit_scalar(ir, TGSI_OPCODE_RSQ, result_dst, op[0]);
>  emit_asm(ir, TGSI_OPCODE_MUL, result_dst, result_src, op[0]);
>  /* For incoming channels <= 0, set the result to 0. */
>  op[0].negate = ~op[0].negate;
>  emit_asm(ir, TGSI_OPCODE_CMP, result_dst,
>   op[0], result_src, st_src_reg_for_float(0.0));

FWIW, "NaN to 0" is always enabled on radeon. We also enable it for compute.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 10/12] st/nine: Remove usage of SQRT in ff code

2016-02-07 Thread Ilia Mirkin
On Sun, Feb 7, 2016 at 6:26 PM, Axel Davy  wrote:
> On 08/02/2016 00:21, Ilia Mirkin wrote:
>>
>> On Sun, Feb 7, 2016 at 6:13 PM, Axel Davy  wrote:
>>>
>>> SQRT is not supported everywhere, so replace
>>> it by RSQ + RCP
>>>
>>> Signed-off-by: Axel Davy 
>>> ---
>>>   src/gallium/state_trackers/nine/nine_ff.c | 3 ++-
>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/src/gallium/state_trackers/nine/nine_ff.c
>>> b/src/gallium/state_trackers/nine/nine_ff.c
>>> index a5466a7..894fc63 100644
>>> --- a/src/gallium/state_trackers/nine/nine_ff.c
>>> +++ b/src/gallium/state_trackers/nine/nine_ff.c
>>> @@ -563,7 +563,8 @@ nine_ff_build_vs(struct NineDevice9 *device, struct
>>> vs_build_ctx *vs)
>>>   struct ureg_src cPsz2 = ureg_DECL_constant(ureg, 27);
>>>
>>>   ureg_DP3(ureg, tmp_x, ureg_src(r[1]), ureg_src(r[1]));
>>> -ureg_SQRT(ureg, tmp_y, _X(tmp));
>>> +ureg_RSQ(ureg, tmp_y, _X(tmp));
>>> +ureg_RCP(ureg, tmp_y, _Y(tmp));
>>
>> I'd recommend doing
>>
>> ureg_MUL(ureg, tmp_y, _Y(tmp), _X(tmp))
>>
>> instead. That should be (a) more numerically stable (rcp doesn't have
>> great precision), and (b) not blow up for 0.
>
> Ok for the precision, but I'm not sure for 0
>
> With the mul version, with 0, it ends up computing inf * 0 = NaN,
> whereas with the rcp version, it does 1/inf == 0 (as far as I know),
> which is the expected result.

Hmmm... not sure what RSQ(0) returns actually. I assumed it was NaN.
What you really want is a "flush nan to 0" option on the mul like nvc0
has, but there's no way to express that in TGSI.

Perhaps you can keep the SQRT if PIPE_CAP_TGSI_SQRT is exposed, and
otherwise do the MUL or the RCP. FWIW this is what glsl_to_tgsi does:

 emit_scalar(ir, TGSI_OPCODE_RSQ, result_dst, op[0]);
 emit_asm(ir, TGSI_OPCODE_MUL, result_dst, result_src, op[0]);
 /* For incoming channels <= 0, set the result to 0. */
 op[0].negate = ~op[0].negate;
 emit_asm(ir, TGSI_OPCODE_CMP, result_dst,
  op[0], result_src, st_src_reg_for_float(0.0));
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 10/12] st/nine: Remove usage of SQRT in ff code

2016-02-07 Thread Axel Davy
SQRT is not supported everywhere, so replace
it by RSQ + RCP

Signed-off-by: Axel Davy 
---
 src/gallium/state_trackers/nine/nine_ff.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/state_trackers/nine/nine_ff.c 
b/src/gallium/state_trackers/nine/nine_ff.c
index a5466a7..894fc63 100644
--- a/src/gallium/state_trackers/nine/nine_ff.c
+++ b/src/gallium/state_trackers/nine/nine_ff.c
@@ -563,7 +563,8 @@ nine_ff_build_vs(struct NineDevice9 *device, struct 
vs_build_ctx *vs)
 struct ureg_src cPsz2 = ureg_DECL_constant(ureg, 27);
 
 ureg_DP3(ureg, tmp_x, ureg_src(r[1]), ureg_src(r[1]));
-ureg_SQRT(ureg, tmp_y, _X(tmp));
+ureg_RSQ(ureg, tmp_y, _X(tmp));
+ureg_RCP(ureg, tmp_y, _Y(tmp));
 ureg_MAD(ureg, tmp_x, _Y(tmp), _(cPsz2), _(cPsz2));
 ureg_MAD(ureg, tmp_x, _Y(tmp), _X(tmp), _(cPsz1));
 ureg_RCP(ureg, tmp_x, ureg_src(tmp));
-- 
2.7.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 10/12] st/nine: Remove usage of SQRT in ff code

2016-02-07 Thread Ilia Mirkin
On Sun, Feb 7, 2016 at 6:13 PM, Axel Davy  wrote:
> SQRT is not supported everywhere, so replace
> it by RSQ + RCP
>
> Signed-off-by: Axel Davy 
> ---
>  src/gallium/state_trackers/nine/nine_ff.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/src/gallium/state_trackers/nine/nine_ff.c 
> b/src/gallium/state_trackers/nine/nine_ff.c
> index a5466a7..894fc63 100644
> --- a/src/gallium/state_trackers/nine/nine_ff.c
> +++ b/src/gallium/state_trackers/nine/nine_ff.c
> @@ -563,7 +563,8 @@ nine_ff_build_vs(struct NineDevice9 *device, struct 
> vs_build_ctx *vs)
>  struct ureg_src cPsz2 = ureg_DECL_constant(ureg, 27);
>
>  ureg_DP3(ureg, tmp_x, ureg_src(r[1]), ureg_src(r[1]));
> -ureg_SQRT(ureg, tmp_y, _X(tmp));
> +ureg_RSQ(ureg, tmp_y, _X(tmp));
> +ureg_RCP(ureg, tmp_y, _Y(tmp));

I'd recommend doing

ureg_MUL(ureg, tmp_y, _Y(tmp), _X(tmp))

instead. That should be (a) more numerically stable (rcp doesn't have
great precision), and (b) not blow up for 0.

>  ureg_MAD(ureg, tmp_x, _Y(tmp), _(cPsz2), _(cPsz2));
>  ureg_MAD(ureg, tmp_x, _Y(tmp), _X(tmp), _(cPsz1));
>  ureg_RCP(ureg, tmp_x, ureg_src(tmp));
> --
> 2.7.0
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 10/12] st/nine: Remove usage of SQRT in ff code

2016-02-07 Thread Axel Davy

On 08/02/2016 00:21, Ilia Mirkin wrote:

On Sun, Feb 7, 2016 at 6:13 PM, Axel Davy  wrote:

SQRT is not supported everywhere, so replace
it by RSQ + RCP

Signed-off-by: Axel Davy 
---
  src/gallium/state_trackers/nine/nine_ff.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/state_trackers/nine/nine_ff.c 
b/src/gallium/state_trackers/nine/nine_ff.c
index a5466a7..894fc63 100644
--- a/src/gallium/state_trackers/nine/nine_ff.c
+++ b/src/gallium/state_trackers/nine/nine_ff.c
@@ -563,7 +563,8 @@ nine_ff_build_vs(struct NineDevice9 *device, struct 
vs_build_ctx *vs)
  struct ureg_src cPsz2 = ureg_DECL_constant(ureg, 27);

  ureg_DP3(ureg, tmp_x, ureg_src(r[1]), ureg_src(r[1]));
-ureg_SQRT(ureg, tmp_y, _X(tmp));
+ureg_RSQ(ureg, tmp_y, _X(tmp));
+ureg_RCP(ureg, tmp_y, _Y(tmp));

I'd recommend doing

ureg_MUL(ureg, tmp_y, _Y(tmp), _X(tmp))

instead. That should be (a) more numerically stable (rcp doesn't have
great precision), and (b) not blow up for 0.

Ok for the precision, but I'm not sure for 0

With the mul version, with 0, it ends up computing inf * 0 = NaN,
whereas with the rcp version, it does 1/inf == 0 (as far as I know),
which is the expected result.




  ureg_MAD(ureg, tmp_x, _Y(tmp), _(cPsz2), _(cPsz2));
  ureg_MAD(ureg, tmp_x, _Y(tmp), _X(tmp), _(cPsz1));
  ureg_RCP(ureg, tmp_x, ureg_src(tmp));
--
2.7.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev