Re: [Mesa-dev] [PATCH] i965: Work around SIN/COS output range problem.

2016-04-11 Thread Martin Peres

On 22/03/16 23:31, Ian Romanick wrote:

On 03/17/2016 09:18 AM, Martin Peres wrote:

On 16/03/16 19:33, Kenneth Graunke wrote:

The SIN and COS instructions on Intel hardware can produce values
slightly outside of the [-1.0, 1.0] range for a small set of values.
Obviously, this can break everyone's expectations about trig functions.

According to an internal presentation, the COS instruction can produce
a value up to 1.27 for inputs in the range (0.08296, 0.09888).  One
suggested workaround is to multiply by 0.7, scaling down the
amplitude slightly.  Apparently this also minimizes the error function,
reducing the maximum error from 0.6 to about 0.3.

I chose to apply this only when not saturating, as saturate already
clamps to 1.0.  This may or may not be a good idea.

Fixes 16 dEQP precision tests

 dEQP-GLES31.functional.shaders.builtin_functions.precision.
 {cos,sin}.{highp,mediump}_compute.{scalar,vec2,vec4,vec4}.

at the cost of making every sin and cos call more expensive.

Signed-off-by: Kenneth Graunke 
---
   src/mesa/drivers/dri/i965/brw_fs_nir.cpp   | 26
--
   src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 26
--
   2 files changed, 40 insertions(+), 12 deletions(-)

This has been in the Vulkan tree for a while - we needed it to pass the
Vulkan CTS, as it contains these same dEQP tests.

I haven't run shader-db yet, but I don't expect we'll like the results.

The patch is pretty sketchy, too.  I'm sort of tempted to hide it behind
an INTEL_STRICT_CONFORMANCE=1 option, like we had way back in the day...

FYI, a quick run on hsw_gt2 shows -10.45% on Gputest:voplosion.

Can you explain this result?  -10.45% of what?  Instructions?  FPS?  And
this is comparing what to what?  Before this patch to after?


So sorry Ian, I completely missed your email!

Yes, it was just applying this patch and I was talking about FPS.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Work around SIN/COS output range problem.

2016-03-22 Thread Kenneth Graunke
On Tuesday, March 22, 2016 2:31:42 PM PDT Ian Romanick wrote:
> On 03/17/2016 09:18 AM, Martin Peres wrote:
> > On 16/03/16 19:33, Kenneth Graunke wrote:
> >> The SIN and COS instructions on Intel hardware can produce values
> >> slightly outside of the [-1.0, 1.0] range for a small set of values.
> >> Obviously, this can break everyone's expectations about trig functions.
> >>
> >> According to an internal presentation, the COS instruction can produce
> >> a value up to 1.27 for inputs in the range (0.08296, 0.09888).  One
> >> suggested workaround is to multiply by 0.7, scaling down the
> >> amplitude slightly.  Apparently this also minimizes the error function,
> >> reducing the maximum error from 0.6 to about 0.3.
> >>
> >> I chose to apply this only when not saturating, as saturate already
> >> clamps to 1.0.  This may or may not be a good idea.
> >>
> >> Fixes 16 dEQP precision tests
> >>
> >> dEQP-GLES31.functional.shaders.builtin_functions.precision.
> >> {cos,sin}.{highp,mediump}_compute.{scalar,vec2,vec4,vec4}.
> >>
> >> at the cost of making every sin and cos call more expensive.
> >>
> >> Signed-off-by: Kenneth Graunke 
> >> ---
> >>   src/mesa/drivers/dri/i965/brw_fs_nir.cpp   | 26
> >> --
> >>   src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 26
> >> --
> >>   2 files changed, 40 insertions(+), 12 deletions(-)
> >>
> >> This has been in the Vulkan tree for a while - we needed it to pass the
> >> Vulkan CTS, as it contains these same dEQP tests.
> >>
> >> I haven't run shader-db yet, but I don't expect we'll like the results.
> >>
> >> The patch is pretty sketchy, too.  I'm sort of tempted to hide it behind
> >> an INTEL_STRICT_CONFORMANCE=1 option, like we had way back in the day...
> > 
> > FYI, a quick run on hsw_gt2 shows -10.45% on Gputest:voplosion.
> 
> Can you explain this result?  -10.45% of what?  Instructions?  FPS?  And
> this is comparing what to what?  Before this patch to after?

I think FPS goes down by 10.45% when applying this patch.  Pretty dire.

--Ken


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Work around SIN/COS output range problem.

2016-03-22 Thread Ian Romanick
On 03/17/2016 09:18 AM, Martin Peres wrote:
> On 16/03/16 19:33, Kenneth Graunke wrote:
>> The SIN and COS instructions on Intel hardware can produce values
>> slightly outside of the [-1.0, 1.0] range for a small set of values.
>> Obviously, this can break everyone's expectations about trig functions.
>>
>> According to an internal presentation, the COS instruction can produce
>> a value up to 1.27 for inputs in the range (0.08296, 0.09888).  One
>> suggested workaround is to multiply by 0.7, scaling down the
>> amplitude slightly.  Apparently this also minimizes the error function,
>> reducing the maximum error from 0.6 to about 0.3.
>>
>> I chose to apply this only when not saturating, as saturate already
>> clamps to 1.0.  This may or may not be a good idea.
>>
>> Fixes 16 dEQP precision tests
>>
>> dEQP-GLES31.functional.shaders.builtin_functions.precision.
>> {cos,sin}.{highp,mediump}_compute.{scalar,vec2,vec4,vec4}.
>>
>> at the cost of making every sin and cos call more expensive.
>>
>> Signed-off-by: Kenneth Graunke 
>> ---
>>   src/mesa/drivers/dri/i965/brw_fs_nir.cpp   | 26
>> --
>>   src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 26
>> --
>>   2 files changed, 40 insertions(+), 12 deletions(-)
>>
>> This has been in the Vulkan tree for a while - we needed it to pass the
>> Vulkan CTS, as it contains these same dEQP tests.
>>
>> I haven't run shader-db yet, but I don't expect we'll like the results.
>>
>> The patch is pretty sketchy, too.  I'm sort of tempted to hide it behind
>> an INTEL_STRICT_CONFORMANCE=1 option, like we had way back in the day...
> 
> FYI, a quick run on hsw_gt2 shows -10.45% on Gputest:voplosion.

Can you explain this result?  -10.45% of what?  Instructions?  FPS?  And
this is comparing what to what?  Before this patch to after?

> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965: Work around SIN/COS output range problem.

2016-03-20 Thread Kenneth Graunke
The SIN and COS instructions on Intel hardware can produce values
slightly outside of the [-1.0, 1.0] range for a small set of values.
Obviously, this can break everyone's expectations about trig functions.

According to an internal presentation, the COS instruction can produce
a value up to 1.27 for inputs in the range (0.08296, 0.09888).  One
suggested workaround is to multiply by 0.7, scaling down the
amplitude slightly.  Apparently this also minimizes the error function,
reducing the maximum error from 0.6 to about 0.3.

I chose to apply this only when not saturating, as saturate already
clamps to 1.0.  This may or may not be a good idea.

Fixes 16 dEQP precision tests

   dEQP-GLES31.functional.shaders.builtin_functions.precision.
   {cos,sin}.{highp,mediump}_compute.{scalar,vec2,vec4,vec4}.

at the cost of making every sin and cos call more expensive.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp   | 26 --
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 26 --
 2 files changed, 40 insertions(+), 12 deletions(-)

This has been in the Vulkan tree for a while - we needed it to pass the
Vulkan CTS, as it contains these same dEQP tests.

I haven't run shader-db yet, but I don't expect we'll like the results.

The patch is pretty sketchy, too.  I'm sort of tempted to hide it behind
an INTEL_STRICT_CONFORMANCE=1 option, like we had way back in the day...

diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index cde8f0b..2f56f01 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -774,15 +774,29 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, 
nir_alu_instr *instr)
   inst->saturate = instr->dest.saturate;
   break;
 
-   case nir_op_fsin:
-  inst = bld.emit(SHADER_OPCODE_SIN, result, op[0]);
-  inst->saturate = instr->dest.saturate;
+   case nir_op_fsin: {
+  fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_F);
+  inst = bld.emit(SHADER_OPCODE_SIN, tmp, op[0]);
+  if (instr->dest.saturate) {
+ inst->dst = result;
+ inst->saturate = true;
+  } else {
+ bld.MUL(result, tmp, brw_imm_f(0.7));
+  }
   break;
+   }
 
-   case nir_op_fcos:
-  inst = bld.emit(SHADER_OPCODE_COS, result, op[0]);
-  inst->saturate = instr->dest.saturate;
+   case nir_op_fcos: {
+  fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_F);
+  inst = bld.emit(SHADER_OPCODE_COS, tmp, op[0]);
+  if (instr->dest.saturate) {
+ inst->dst = result;
+ inst->saturate = true;
+  } else {
+ bld.MUL(result, tmp, brw_imm_f(0.7));
+  }
   break;
+   }
 
case nir_op_fddx:
   if (fs_key->high_quality_derivatives) {
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 52977f1..c7a2408 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -1100,15 +1100,29 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
   inst->saturate = instr->dest.saturate;
   break;
 
-   case nir_op_fsin:
-  inst = emit_math(SHADER_OPCODE_SIN, dst, op[0]);
-  inst->saturate = instr->dest.saturate;
+   case nir_op_fsin: {
+  src_reg tmp = src_reg(this, glsl_type::vec4_type);
+  inst = emit_math(SHADER_OPCODE_SIN, dst_reg(tmp), op[0]);
+  if (instr->dest.saturate) {
+ inst->dst = dst;
+ inst->saturate = true;
+  } else {
+ emit(MUL(dst, tmp, brw_imm_f(0.7)));
+  }
   break;
+   }
 
-   case nir_op_fcos:
-  inst = emit_math(SHADER_OPCODE_COS, dst, op[0]);
-  inst->saturate = instr->dest.saturate;
+   case nir_op_fcos: {
+  src_reg tmp = src_reg(this, glsl_type::vec4_type);
+  inst = emit_math(SHADER_OPCODE_COS, dst_reg(tmp), op[0]);
+  if (instr->dest.saturate) {
+ inst->dst = dst;
+ inst->saturate = true;
+  } else {
+ emit(MUL(dst, tmp, brw_imm_f(0.7)));
+  }
   break;
+   }
 
case nir_op_idiv:
case nir_op_udiv:
-- 
2.7.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Work around SIN/COS output range problem.

2016-03-19 Thread Kenneth Graunke
On Thursday, March 17, 2016 6:18:39 PM PDT Martin Peres wrote:
> On 16/03/16 19:33, Kenneth Graunke wrote:
> > The SIN and COS instructions on Intel hardware can produce values
> > slightly outside of the [-1.0, 1.0] range for a small set of values.
> > Obviously, this can break everyone's expectations about trig functions.
> >
> > According to an internal presentation, the COS instruction can produce
> > a value up to 1.27 for inputs in the range (0.08296, 0.09888).  One
> > suggested workaround is to multiply by 0.7, scaling down the
> > amplitude slightly.  Apparently this also minimizes the error function,
> > reducing the maximum error from 0.6 to about 0.3.
> >
> > I chose to apply this only when not saturating, as saturate already
> > clamps to 1.0.  This may or may not be a good idea.
> >
> > Fixes 16 dEQP precision tests
> >
> > dEQP-GLES31.functional.shaders.builtin_functions.precision.
> > {cos,sin}.{highp,mediump}_compute.{scalar,vec2,vec4,vec4}.
> >
> > at the cost of making every sin and cos call more expensive.
> >
> > Signed-off-by: Kenneth Graunke 
> > ---
> >   src/mesa/drivers/dri/i965/brw_fs_nir.cpp   | 26 +++
+--
> >   src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 26 +++
+--
> >   2 files changed, 40 insertions(+), 12 deletions(-)
> >
> > This has been in the Vulkan tree for a while - we needed it to pass the
> > Vulkan CTS, as it contains these same dEQP tests.
> >
> > I haven't run shader-db yet, but I don't expect we'll like the results.
> >
> > The patch is pretty sketchy, too.  I'm sort of tempted to hide it behind
> > an INTEL_STRICT_CONFORMANCE=1 option, like we had way back in the day...
> 
> FYI, a quick run on hsw_gt2 shows -10.45% on Gputest:voplosion.

Thanks!  It sounds like we probably should hide this behind some kind of
INTEL_STRICT_CONFORMANCE option or build time flag.  sin and cos are
really common, and making them cost effectively twice as much is really
painful.

I'll come up with a v2 at some point, let's NAK this for now.


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Work around SIN/COS output range problem.

2016-03-19 Thread Martin Peres

On 16/03/16 19:33, Kenneth Graunke wrote:

The SIN and COS instructions on Intel hardware can produce values
slightly outside of the [-1.0, 1.0] range for a small set of values.
Obviously, this can break everyone's expectations about trig functions.

According to an internal presentation, the COS instruction can produce
a value up to 1.27 for inputs in the range (0.08296, 0.09888).  One
suggested workaround is to multiply by 0.7, scaling down the
amplitude slightly.  Apparently this also minimizes the error function,
reducing the maximum error from 0.6 to about 0.3.

I chose to apply this only when not saturating, as saturate already
clamps to 1.0.  This may or may not be a good idea.

Fixes 16 dEQP precision tests

dEQP-GLES31.functional.shaders.builtin_functions.precision.
{cos,sin}.{highp,mediump}_compute.{scalar,vec2,vec4,vec4}.

at the cost of making every sin and cos call more expensive.

Signed-off-by: Kenneth Graunke 
---
  src/mesa/drivers/dri/i965/brw_fs_nir.cpp   | 26 --
  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 26 --
  2 files changed, 40 insertions(+), 12 deletions(-)

This has been in the Vulkan tree for a while - we needed it to pass the
Vulkan CTS, as it contains these same dEQP tests.

I haven't run shader-db yet, but I don't expect we'll like the results.

The patch is pretty sketchy, too.  I'm sort of tempted to hide it behind
an INTEL_STRICT_CONFORMANCE=1 option, like we had way back in the day...


FYI, a quick run on hsw_gt2 shows -10.45% on Gputest:voplosion.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev