Re: [Mesa-dev] [PATCH 1/6] glsl: Optimize pow(x, 2) into x * x.

2014-03-12 Thread Erik Faye-Lund
On Wed, Mar 12, 2014 at 1:32 AM, Eric Anholt e...@anholt.net wrote:
 Erik Faye-Lund kusmab...@gmail.com writes:

 On Wed, Mar 12, 2014 at 12:00 AM, Eric Anholt e...@anholt.net wrote:
 Erik Faye-Lund kusmab...@gmail.com writes:

 On Tue, Mar 11, 2014 at 7:27 PM, Eric Anholt e...@anholt.net wrote:
 Erik Faye-Lund kusmab...@gmail.com writes:

 On Tue, Mar 11, 2014 at 2:50 PM, Erik Faye-Lund kusmab...@gmail.com 
 wrote:
 On Mon, Mar 10, 2014 at 11:54 PM, Matt Turner matts...@gmail.com 
 wrote:
 Cuts two instructions out of SynMark's Gl32VSInstancing benchmark.
 ---
  src/glsl/opt_algebraic.cpp | 8 ++++++++
  1 file changed, 8 insertions(+)

 diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
 index 5c49a78..8494bd9 100644
 --- a/src/glsl/opt_algebraic.cpp
 +++ b/src/glsl/opt_algebraic.cpp
 @@ -528,6 +528,14 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
        if (is_vec_two(op_const[0]))
           return expr(ir_unop_exp2, ir->operands[1]);

 +      if (is_vec_two(op_const[1])) {
 +         ir_variable *x = new(ir) ir_variable(ir->operands[1]->type, "x",
 +                                              ir_var_temporary);
 +         base_ir->insert_before(x);
 +         base_ir->insert_before(assign(x, ir->operands[0]));
 +         return mul(x, x);
 +      }
 +

 Is this safe? Since many GPUs implement pow(x, y) as exp2(log2(x) *
 y), this will give different results for if y comes from a uniform vs
 if it's a constant, no?

 Yes, but that wouldn't be covered by the invariant keyword.

 To be a bit more clear: I don't think this is valid for expressions
 writing to variables marked as invariant (or expressions taking part
 in the calculations that lead up to invariant variable writes).

 I can't find anything allowing variance like this in the invariance
 section of the GLSL 3.30 specifications. In particular, the list
 following "To guarantee invariance of a particular output variable
 across two programs, the following must also be true" doesn't seem to
 require the values to be passed from the same source, only that the
 same values are passed. And in this case, the value 2.0 is usually
 exactly representable no matter what path it took there.

 Perhaps I'm being a bit too pedantic here, though.

 This file would do the same thing on the same expression tree in two
 different programs, so invariant is fine (we've probably got other
 problems with invariant, though).  The keyword you're probably thinking
 of is precise, which isn't in GLSL we implement yet.

 Are you saying that this only rewrites "x = pow(y, 2.0)" and not
 "const float e = 2.0; x = pow(y, e);"? If so, my point is moot,
 indeed. But if that's *not* the case, then I think we're in trouble
 still.

 The second would also get rewritten, because other passes will move the
 2.0 into the pow.

 I thought I understood your objection, but now I don't.  I think you'll
 have to lay out the pair of shaders involving the invariant keyword that
 you think that would be broken by this pass.

 My understanding is that
 ---8<---
 invariant varying float v;
 attribute float a;
 const float e = 2.0;
 void main()
 {
     v = pow(a, e);
 }
 ---8<---
 and
 ---8<---
 invariant varying float v;
 attribute float a;
 uniform float e;
 void main()
 {
     v = pow(a, e);
 }
 ---8<---
 ...should produce the exact same result, as long as the latter is
 passed 2.0 as the uniform e.

 Because v is marked as invariant, the expressions writing to v are the
 same, and the values passed in are the same.

 If we rewrite the first one to do a * a, we get a different result
 on implementations that do exp2(log2(a) * 2.0) for the latter, due
 to floating-point normalization in the intermediate steps.

 I don't think that's what the spec authors intended from the keyword.  I
 think what they intended was that if you had uniform float e in both
 cases, but different code for setting *other* lvalues, that you'd still
 get the same result in v.

I think that *might* be correct, but it doesn't seem to be what's
actually defined. Invariance comes from ESSL, and FWIW, I was one of
the spec authors in this case. But I don't remember the details down
to this level, and the spec doesn't seem to clarify either.

However, since constant expressions are given some slack wrt how they're
evaluated, I'm inclined to think that you're right about the spirit of
the spec.

AFAIR, we introduced invariant to get rid of ftransform (since we'd
already gotten rid of fixed-function state), so that multi-pass
rendering algorithms could be guaranteed that all passes ended up
covering the exact same fragments at the exact same depth coordinate.
And in cases like that, the inputs would really be of the same kind
all the time, I guess.

So yeah, perhaps. But I wouldn't feel safe about optimizations like
this without a clarification from Khronos.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/6] glsl: Optimize pow(x, 2) into x * x.

2014-03-12 Thread Ian Romanick
On 03/12/2014 01:29 AM, Erik Faye-Lund wrote:
 On Wed, Mar 12, 2014 at 1:32 AM, Eric Anholt e...@anholt.net wrote:
 Erik Faye-Lund kusmab...@gmail.com writes:

 On Wed, Mar 12, 2014 at 12:00 AM, Eric Anholt e...@anholt.net wrote:
 Erik Faye-Lund kusmab...@gmail.com writes:

 On Tue, Mar 11, 2014 at 7:27 PM, Eric Anholt e...@anholt.net wrote:
 Erik Faye-Lund kusmab...@gmail.com writes:

 On Tue, Mar 11, 2014 at 2:50 PM, Erik Faye-Lund kusmab...@gmail.com 
 wrote:
 On Mon, Mar 10, 2014 at 11:54 PM, Matt Turner matts...@gmail.com 
 wrote:
 Cuts two instructions out of SynMark's Gl32VSInstancing benchmark.
 ---
  src/glsl/opt_algebraic.cpp | 8 ++++++++
  1 file changed, 8 insertions(+)

 diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
 index 5c49a78..8494bd9 100644
 --- a/src/glsl/opt_algebraic.cpp
 +++ b/src/glsl/opt_algebraic.cpp
 @@ -528,6 +528,14 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
        if (is_vec_two(op_const[0]))
           return expr(ir_unop_exp2, ir->operands[1]);

 +      if (is_vec_two(op_const[1])) {
 +         ir_variable *x = new(ir) ir_variable(ir->operands[1]->type, "x",
 +                                              ir_var_temporary);
 +         base_ir->insert_before(x);
 +         base_ir->insert_before(assign(x, ir->operands[0]));
 +         return mul(x, x);
 +      }
 +

 Is this safe? Since many GPUs implement pow(x, y) as exp2(log2(x) *
 y), this will give different results for if y comes from a uniform vs
 if it's a constant, no?

 Yes, but that wouldn't be covered by the invariant keyword.

 To be a bit more clear: I don't think this is valid for expressions
 writing to variables marked as invariant (or expressions taking part
 in the calculations that lead up to invariant variable writes).

 I can't find anything allowing variance like this in the invariance
 section of the GLSL 3.30 specifications. In particular, the list
 following "To guarantee invariance of a particular output variable
 across two programs, the following must also be true" doesn't seem to
 require the values to be passed from the same source, only that the
 same values are passed. And in this case, the value 2.0 is usually
 exactly representable no matter what path it took there.

 Perhaps I'm being a bit too pedantic here, though.

 This file would do the same thing on the same expression tree in two
 different programs, so invariant is fine (we've probably got other
 problems with invariant, though).  The keyword you're probably thinking
 of is precise, which isn't in GLSL we implement yet.

 Are you saying that this only rewrites "x = pow(y, 2.0)" and not
 "const float e = 2.0; x = pow(y, e);"? If so, my point is moot,
 indeed. But if that's *not* the case, then I think we're in trouble
 still.

 The second would also get rewritten, because other passes will move the
 2.0 into the pow.

 I thought I understood your objection, but now I don't.  I think you'll
 have to lay out the pair of shaders involving the invariant keyword that
 you think that would be broken by this pass.

 My understanding is that
 ---8<---
 invariant varying float v;
 attribute float a;
 const float e = 2.0;
 void main()
 {
     v = pow(a, e);
 }
 ---8<---
 and
 ---8<---
 invariant varying float v;
 attribute float a;
 uniform float e;
 void main()
 {
     v = pow(a, e);
 }
 ---8<---
 ...should produce the exact same result, as long as the latter is
 passed 2.0 as the uniform e.

 Because v is marked as invariant, the expressions writing to v are the
 same, and the values passed in are the same.

 If we rewrite the first one to do a * a, we get a different result
 on implementations that do exp2(log2(a) * 2.0) for the latter, due
 to floating-point normalization in the intermediate steps.

 I don't think that's what the spec authors intended from the keyword.  I
 think what they intended was that if you had uniform float e in both
 cases, but different code for setting *other* lvalues, that you'd still
 get the same result in v.
 
 I think that *might* be correct, but it doesn't seem to be what's
 actually defined. Invariance comes from ESSL, and FWIW, I was one of
 the spec authors in this case. But I don't remember the details down
 to this level, and the spec doesn't seem to clarify either.
 
 However, since constant expressions are given some slack wrt how they're
 evaluated, I'm inclined to think that you're right about the spirit of
 the spec.
 
 AFAIR, we introduced invariant to get rid of ftransform (since we'd
 already gotten rid of fixed-function state), so that multi-pass
 rendering algorithms could be guaranteed that all passes ended up
 covering the exact same fragments at the exact same depth coordinate.
 And in cases like that, the inputs would really be of the same kind
 all the time, I guess.

I believe the intention was to prevent inter-expression optimizations
from causing different shaders to produce different results.
Relative to this example, you can imagine:

attribute float a, b;

Re: [Mesa-dev] [PATCH 1/6] glsl: Optimize pow(x, 2) into x * x.

2014-03-11 Thread Erik Faye-Lund
On Mon, Mar 10, 2014 at 11:54 PM, Matt Turner matts...@gmail.com wrote:
 Cuts two instructions out of SynMark's Gl32VSInstancing benchmark.
 ---
  src/glsl/opt_algebraic.cpp | 8 ++++++++
  1 file changed, 8 insertions(+)

 diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
 index 5c49a78..8494bd9 100644
 --- a/src/glsl/opt_algebraic.cpp
 +++ b/src/glsl/opt_algebraic.cpp
 @@ -528,6 +528,14 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
        if (is_vec_two(op_const[0]))
           return expr(ir_unop_exp2, ir->operands[1]);

 +      if (is_vec_two(op_const[1])) {
 +         ir_variable *x = new(ir) ir_variable(ir->operands[1]->type, "x",
 +                                              ir_var_temporary);
 +         base_ir->insert_before(x);
 +         base_ir->insert_before(assign(x, ir->operands[0]));
 +         return mul(x, x);
 +      }
 +

Is this safe? Since many GPUs implement pow(x, y) as exp2(log2(x) *
y), this will give different results for if y comes from a uniform vs
if it's a constant, no?


Re: [Mesa-dev] [PATCH 1/6] glsl: Optimize pow(x, 2) into x * x.

2014-03-11 Thread Erik Faye-Lund
On Tue, Mar 11, 2014 at 2:50 PM, Erik Faye-Lund kusmab...@gmail.com wrote:
 On Mon, Mar 10, 2014 at 11:54 PM, Matt Turner matts...@gmail.com wrote:
 Cuts two instructions out of SynMark's Gl32VSInstancing benchmark.
 ---
  src/glsl/opt_algebraic.cpp | 8 ++++++++
  1 file changed, 8 insertions(+)

 diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
 index 5c49a78..8494bd9 100644
 --- a/src/glsl/opt_algebraic.cpp
 +++ b/src/glsl/opt_algebraic.cpp
 @@ -528,6 +528,14 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
        if (is_vec_two(op_const[0]))
           return expr(ir_unop_exp2, ir->operands[1]);

 +      if (is_vec_two(op_const[1])) {
 +         ir_variable *x = new(ir) ir_variable(ir->operands[1]->type, "x",
 +                                              ir_var_temporary);
 +         base_ir->insert_before(x);
 +         base_ir->insert_before(assign(x, ir->operands[0]));
 +         return mul(x, x);
 +      }
 +

 Is this safe? Since many GPUs implement pow(x, y) as exp2(log2(x) *
 y), this will give different results for if y comes from a uniform vs
 if it's a constant, no?

To be a bit more clear: I don't think this is valid for expressions
writing to variables marked as invariant (or expressions taking part
in the calculations that lead up to invariant variable writes).

I can't find anything allowing variance like this in the invariance
section of the GLSL 3.30 specifications. In particular, the list
following "To guarantee invariance of a particular output variable
across two programs, the following must also be true" doesn't seem to
require the values to be passed from the same source, only that the
same values are passed. And in this case, the value 2.0 is usually
exactly representable no matter what path it took there.

Perhaps I'm being a bit too pedantic here, though.


Re: [Mesa-dev] [PATCH 1/6] glsl: Optimize pow(x, 2) into x * x.

2014-03-11 Thread Ian Romanick
On 03/10/2014 07:21 PM, Roland Scheidegger wrote:
 Am 11.03.2014 01:23, schrieb Ian Romanick:
 I had a pretty similar patch on the top of my pow-optimization branch.
 I also expand x**3 and x**4.  I had hoped that would enable some cases
 to expand then merge to MADs.  It should also be faster on older GENs
 where POW perf sucks.  I didn't send it out because I wanted to add a
 similar optimization in the back end that would turn x*x*x*x back into
 x**4 on GPUs where the POW would be faster.
 I have no idea what performance POW has on newer intel gpu hw (since in
 contrast to older pre-snb hw with separate mathbox the manual doesn't
 list throughput for extended math functions, at least I never found it),
 but I find it highly unlikely that a POW has a cost lower than 2 muls
 anywhere.

The architecture has changed quite a bit, so math box is kind of a
thing of the past... and there was much rejoicing.  The timings that we
use in the compiler backend are 22 cycles for POW, and 14 cycles for MUL
on Haswell.  The numbers are similar (but slightly longer) on
Sandybridge and Ivybridge.

 Roland
 
 
 I also didn't have anything in shader-db that benefitted from x**2 or
 x**3.  It seems like there were a couple that would be modified by a
 x**5 flattening, but I think that would universally be slower

 On 03/10/2014 03:54 PM, Matt Turner wrote:
 Cuts two instructions out of SynMark's Gl32VSInstancing benchmark.
 ---
  src/glsl/opt_algebraic.cpp | 8 ++++++++
  1 file changed, 8 insertions(+)

 diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
 index 5c49a78..8494bd9 100644
 --- a/src/glsl/opt_algebraic.cpp
 +++ b/src/glsl/opt_algebraic.cpp
 @@ -528,6 +528,14 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
        if (is_vec_two(op_const[0]))
           return expr(ir_unop_exp2, ir->operands[1]);

 +      if (is_vec_two(op_const[1])) {
 +         ir_variable *x = new(ir) ir_variable(ir->operands[1]->type, "x",
 +                                              ir_var_temporary);
 +         base_ir->insert_before(x);
 +         base_ir->insert_before(assign(x, ir->operands[0]));
 +         return mul(x, x);
 +      }
 +
        break;

     case ir_unop_rcp:



Re: [Mesa-dev] [PATCH 1/6] glsl: Optimize pow(x, 2) into x * x.

2014-03-11 Thread Roland Scheidegger
Am 11.03.2014 17:29, schrieb Ian Romanick:
 On 03/10/2014 07:21 PM, Roland Scheidegger wrote:
 Am 11.03.2014 01:23, schrieb Ian Romanick:
 I had a pretty similar patch on the top of my pow-optimization branch.
 I also expand x**3 and x**4.  I had hoped that would enable some cases
 to expand then merge to MADs.  It should also be faster on older GENs
 where POW perf sucks.  I didn't send it out because I wanted to add a
 similar optimization in the back end that would turn x*x*x*x back into
 x**4 on GPUs where the POW would be faster.
 I have no idea what performance POW has on newer intel gpu hw (since in
 contrast to older pre-snb hw with separate mathbox the manual doesn't
 list throughput for extended math functions, at least I never found it),
 but I find it highly unlikely that a POW has a cost lower than 2 muls
 anywhere.
 
 The architecture has changed quite a bit, so math box is kind of a
 thing of the past...
That's why I said pre-SNB hw :-).

 and there was much rejoicing.  The timings that we
 use in the compiler backend are 22 cycles for POW, and 14 cycles for MUL
 on Haswell.  The numbers are similar (but slightly longer) on
 Sandybridge and Ivybridge.
I think that works if you just care about latency. It appears you have
a base latency of 14 cycles for anything but 22 for POW; however, it
looks to me like POW is significantly more expensive than that suggests.
(That is, if you tried to issue nothing but POWs, or probably other
functions from the extended math group, you'd find you could only get
1/4 or so of the throughput you get with MULs, since you probably cannot
issue that function every two cycles, but you can do that with MULs.
Just a guess though, assuming that during these additional latency
cycles the hw cannot do another POW, and even if true maybe latency is
really still more relevant in practice. But as said, that's just a wild
guess; I blame the docs for that :-).)


 Roland


 I also didn't have anything in shader-db that benefitted from x**2 or
 x**3.  It seems like there were a couple that would be modified by a
 x**5 flattening, but I think that would universally be slower

 On 03/10/2014 03:54 PM, Matt Turner wrote:
 Cuts two instructions out of SynMark's Gl32VSInstancing benchmark.
 ---
  src/glsl/opt_algebraic.cpp | 8 ++++++++
  1 file changed, 8 insertions(+)

 diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
 index 5c49a78..8494bd9 100644
 --- a/src/glsl/opt_algebraic.cpp
 +++ b/src/glsl/opt_algebraic.cpp
 @@ -528,6 +528,14 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
        if (is_vec_two(op_const[0]))
           return expr(ir_unop_exp2, ir->operands[1]);

 +      if (is_vec_two(op_const[1])) {
 +         ir_variable *x = new(ir) ir_variable(ir->operands[1]->type, "x",
 +                                              ir_var_temporary);
 +         base_ir->insert_before(x);
 +         base_ir->insert_before(assign(x, ir->operands[0]));
 +         return mul(x, x);
 +      }
 +
        break;

     case ir_unop_rcp:


Re: [Mesa-dev] [PATCH 1/6] glsl: Optimize pow(x, 2) into x * x.

2014-03-11 Thread Eric Anholt
Erik Faye-Lund kusmab...@gmail.com writes:

 On Tue, Mar 11, 2014 at 2:50 PM, Erik Faye-Lund kusmab...@gmail.com wrote:
 On Mon, Mar 10, 2014 at 11:54 PM, Matt Turner matts...@gmail.com wrote:
 Cuts two instructions out of SynMark's Gl32VSInstancing benchmark.
 ---
  src/glsl/opt_algebraic.cpp | 8 ++++++++
  1 file changed, 8 insertions(+)

 diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
 index 5c49a78..8494bd9 100644
 --- a/src/glsl/opt_algebraic.cpp
 +++ b/src/glsl/opt_algebraic.cpp
 @@ -528,6 +528,14 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
        if (is_vec_two(op_const[0]))
           return expr(ir_unop_exp2, ir->operands[1]);

 +      if (is_vec_two(op_const[1])) {
 +         ir_variable *x = new(ir) ir_variable(ir->operands[1]->type, "x",
 +                                              ir_var_temporary);
 +         base_ir->insert_before(x);
 +         base_ir->insert_before(assign(x, ir->operands[0]));
 +         return mul(x, x);
 +      }
 +

 Is this safe? Since many GPUs implement pow(x, y) as exp2(log2(x) *
 y), this will give different results for if y comes from a uniform vs
 if it's a constant, no?

Yes, but that wouldn't be covered by the invariant keyword.

 To be a bit more clear: I don't think this is valid for expressions
 writing to variables marked as invariant (or expressions taking part
 in the calculations that lead up to invariant variable writes).

 I can't find anything allowing variance like this in the invariance
 section of the GLSL 3.30 specifications. In particular, the list
 following "To guarantee invariance of a particular output variable
 across two programs, the following must also be true" doesn't seem to
 require the values to be passed from the same source, only that the
 same values are passed. And in this case, the value 2.0 is usually
 exactly representable no matter what path it took there.

 Perhaps I'm being a bit too pedantic here, though.

This file would do the same thing on the same expression tree in two
different programs, so invariant is fine (we've probably got other
problems with invariant, though).  The keyword you're probably thinking
of is precise, which isn't in GLSL we implement yet.




Re: [Mesa-dev] [PATCH 1/6] glsl: Optimize pow(x, 2) into x * x.

2014-03-11 Thread Paul Berry
On 10 March 2014 17:23, Ian Romanick i...@freedesktop.org wrote:

 I had a pretty similar patch on the top of my pow-optimization branch.
 I also expand x**3 and x**4.  I had hoped that would enable some cases
 to expand then merge to MADs.  It should also be faster on older GENs
 where POW perf sucks.  I didn't send it out because I wanted to add a
 similar optimization in the back end that would turn x*x*x*x back into
 x**4 on GPUs where the POW would be faster.


Be careful with that one, though: pow is undefined when x < 0, so x*x*x*x
=> pow(x, 4) isn't a valid conversion in general--it only works if you know
the operation started off as a pow in the first place.  (Note: you can do
x*x*x*x => pow(abs(x), 4), but that doesn't generalize to odd powers).
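
A minimal sketch of that caveat in standalone C++ (not Mesa IR). The
helper lowered_pow is a hypothetical stand-in for hardware that computes
pow as exp2(log2(x) * y); all function names here are illustrative
assumptions, and host libm rounding may differ from any GPU.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical model of a GPU-style pow: exp2(log2(x) * y).
// log2 of a negative number is NaN, so the result is NaN for x < 0.
inline float lowered_pow(float x, float y) {
    return std::exp2(std::log2(x) * y);
}

// Plain repeated multiplication, well defined for any finite x.
inline float mul4(float x) { return x * x * x * x; }
inline float mul3(float x) { return x * x * x; }

// True when a and b agree to within a small relative tolerance.
inline bool close_enough(float a, float b) {
    return std::fabs(a - b) <= 1e-4f * std::fabs(b);
}
```

With x = -2, mul4(x) is 16 while lowered_pow(x, 4) is NaN, so folding the
multiplies back into a pow changes the result. Wrapping the base in abs
recovers the even case, but mul3(x) is negative while lowered_pow(abs(x), 3)
is positive, which is why the trick does not carry over to odd powers.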



 I also didn't have anything in shader-db that benefitted from x**2 or
 x**3.  It seems like there were a couple that would be modified by a
 x**5 flattening, but I think that would universally be slower

 On 03/10/2014 03:54 PM, Matt Turner wrote:
  Cuts two instructions out of SynMark's Gl32VSInstancing benchmark.
  ---
   src/glsl/opt_algebraic.cpp | 8 ++++++++
   1 file changed, 8 insertions(+)

  diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
  index 5c49a78..8494bd9 100644
  --- a/src/glsl/opt_algebraic.cpp
  +++ b/src/glsl/opt_algebraic.cpp
  @@ -528,6 +528,14 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
         if (is_vec_two(op_const[0]))
            return expr(ir_unop_exp2, ir->operands[1]);

  +      if (is_vec_two(op_const[1])) {
  +         ir_variable *x = new(ir) ir_variable(ir->operands[1]->type, "x",
  +                                              ir_var_temporary);
  +         base_ir->insert_before(x);
  +         base_ir->insert_before(assign(x, ir->operands[0]));
  +         return mul(x, x);
  +      }
  +
         break;

      case ir_unop_rcp:
 



Re: [Mesa-dev] [PATCH 1/6] glsl: Optimize pow(x, 2) into x * x.

2014-03-11 Thread Matt Turner
On Tue, Mar 11, 2014 at 10:35 AM, Roland Scheidegger srol...@vmware.com wrote:
 Am 11.03.2014 17:29, schrieb Ian Romanick:
 and there was much rejoicing.  The timings that we
 use in the compiler backend are 22 cycles for POW, and 14 cycles for MUL
 on Haswell.  The numbers are similar (but slightly longer) on
 Sandybridge and Ivybridge.
 I think that works if you just care about latency. It appears you have
 a base latency of 14 cycles for anything but 22 for POW; however, it
 looks to me like POW is significantly more expensive than that suggests.
 (That is, if you tried to issue nothing but POWs, or probably other
 functions from the extended math group, you'd find you could only get
 1/4 or so of the throughput you get with MULs, since you probably cannot
 issue that function every two cycles, but you can do that with MULs.
 Just a guess though, assuming that during these additional latency
 cycles the hw cannot do another POW, and even if true maybe latency is
 really still more relevant in practice. But as said, that's just a wild
 guess; I blame the docs for that :-).)

Nope, you're right. Haswell can issue 8 multiplies per EU per cycle,
but only one pow.
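
As a rough back-of-the-envelope sketch of what those issue rates imply,
here is the arithmetic in C++. The two rates are the ones stated above;
the helper issue_cycles and the constant names are assumptions made for
illustration, and the model deliberately ignores latency, dependencies
and scheduling.

```cpp
#include <cassert>

// Issue rates per EU per cycle on Haswell, as stated in this thread.
constexpr long kMulsPerCycle = 8;
constexpr long kPowsPerCycle = 1;

// Cycles needed to issue `ops` operations at `per_cycle` operations per
// cycle, rounded up. Throughput only: no latency or dependency modelling.
constexpr long issue_cycles(long ops, long per_cycle) {
    return (ops + per_cycle - 1) / per_cycle;
}
```

Under this simplified model, 64 independent squarings take 8 issue cycles
as multiplies and 64 as pows, on top of the latency gap quoted earlier.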


Re: [Mesa-dev] [PATCH 1/6] glsl: Optimize pow(x, 2) into x * x.

2014-03-11 Thread Erik Faye-Lund
On Tue, Mar 11, 2014 at 7:27 PM, Eric Anholt e...@anholt.net wrote:
 Erik Faye-Lund kusmab...@gmail.com writes:

 On Tue, Mar 11, 2014 at 2:50 PM, Erik Faye-Lund kusmab...@gmail.com wrote:
 On Mon, Mar 10, 2014 at 11:54 PM, Matt Turner matts...@gmail.com wrote:
 Cuts two instructions out of SynMark's Gl32VSInstancing benchmark.
 ---
  src/glsl/opt_algebraic.cpp | 8 ++++++++
  1 file changed, 8 insertions(+)

 diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
 index 5c49a78..8494bd9 100644
 --- a/src/glsl/opt_algebraic.cpp
 +++ b/src/glsl/opt_algebraic.cpp
 @@ -528,6 +528,14 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
        if (is_vec_two(op_const[0]))
           return expr(ir_unop_exp2, ir->operands[1]);

 +      if (is_vec_two(op_const[1])) {
 +         ir_variable *x = new(ir) ir_variable(ir->operands[1]->type, "x",
 +                                              ir_var_temporary);
 +         base_ir->insert_before(x);
 +         base_ir->insert_before(assign(x, ir->operands[0]));
 +         return mul(x, x);
 +      }
 +

 Is this safe? Since many GPUs implement pow(x, y) as exp2(log2(x) *
 y), this will give different results for if y comes from a uniform vs
 if it's a constant, no?

 Yes, but that wouldn't be covered by the invariant keyword.

 To be a bit more clear: I don't think this is valid for expressions
 writing to variables marked as invariant (or expressions taking part
 in the calculations that lead up to invariant variable writes).

 I can't find anything allowing variance like this in the invariance
 section of the GLSL 3.30 specifications. In particular, the list
 following "To guarantee invariance of a particular output variable
 across two programs, the following must also be true" doesn't seem to
 require the values to be passed from the same source, only that the
 same values are passed. And in this case, the value 2.0 is usually
 exactly representable no matter what path it took there.

 Perhaps I'm being a bit too pedantic here, though.

 This file would do the same thing on the same expression tree in two
 different programs, so invariant is fine (we've probably got other
 problems with invariant, though).  The keyword you're probably thinking
 of is precise, which isn't in GLSL we implement yet.

Are you saying that this only rewrites "x = pow(y, 2.0)" and not
"const float e = 2.0; x = pow(y, e);"? If so, my point is moot,
indeed. But if that's *not* the case, then I think we're in trouble
still.


Re: [Mesa-dev] [PATCH 1/6] glsl: Optimize pow(x, 2) into x * x.

2014-03-11 Thread Eric Anholt
Erik Faye-Lund kusmab...@gmail.com writes:

 On Tue, Mar 11, 2014 at 7:27 PM, Eric Anholt e...@anholt.net wrote:
 Erik Faye-Lund kusmab...@gmail.com writes:

 On Tue, Mar 11, 2014 at 2:50 PM, Erik Faye-Lund kusmab...@gmail.com wrote:
 On Mon, Mar 10, 2014 at 11:54 PM, Matt Turner matts...@gmail.com wrote:
 Cuts two instructions out of SynMark's Gl32VSInstancing benchmark.
 ---
  src/glsl/opt_algebraic.cpp | 8 ++++++++
  1 file changed, 8 insertions(+)

 diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
 index 5c49a78..8494bd9 100644
 --- a/src/glsl/opt_algebraic.cpp
 +++ b/src/glsl/opt_algebraic.cpp
 @@ -528,6 +528,14 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
        if (is_vec_two(op_const[0]))
           return expr(ir_unop_exp2, ir->operands[1]);

 +      if (is_vec_two(op_const[1])) {
 +         ir_variable *x = new(ir) ir_variable(ir->operands[1]->type, "x",
 +                                              ir_var_temporary);
 +         base_ir->insert_before(x);
 +         base_ir->insert_before(assign(x, ir->operands[0]));
 +         return mul(x, x);
 +      }
 +

 Is this safe? Since many GPUs implement pow(x, y) as exp2(log2(x) *
 y), this will give different results for if y comes from a uniform vs
 if it's a constant, no?

 Yes, but that wouldn't be covered by the invariant keyword.

 To be a bit more clear: I don't think this is valid for expressions
 writing to variables marked as invariant (or expressions taking part
 in the calculations that lead up to invariant variable writes).

 I can't find anything allowing variance like this in the invariance
 section of the GLSL 3.30 specifications. In particular, the list
 following "To guarantee invariance of a particular output variable
 across two programs, the following must also be true" doesn't seem to
 require the values to be passed from the same source, only that the
 same values are passed. And in this case, the value 2.0 is usually
 exactly representable no matter what path it took there.

 Perhaps I'm being a bit too pedantic here, though.

 This file would do the same thing on the same expression tree in two
 different programs, so invariant is fine (we've probably got other
 problems with invariant, though).  The keyword you're probably thinking
 of is precise, which isn't in GLSL we implement yet.

 Are you saying that this only rewrites "x = pow(y, 2.0)" and not
 "const float e = 2.0; x = pow(y, e);"? If so, my point is moot,
 indeed. But if that's *not* the case, then I think we're in trouble
 still.

The second would also get rewritten, because other passes will move the
2.0 into the pow.

I thought I understood your objection, but now I don't.  I think you'll
have to lay out the pair of shaders involving the invariant keyword that
you think that would be broken by this pass.




Re: [Mesa-dev] [PATCH 1/6] glsl: Optimize pow(x, 2) into x * x.

2014-03-11 Thread Erik Faye-Lund
On Wed, Mar 12, 2014 at 12:00 AM, Eric Anholt e...@anholt.net wrote:
 Erik Faye-Lund kusmab...@gmail.com writes:

 On Tue, Mar 11, 2014 at 7:27 PM, Eric Anholt e...@anholt.net wrote:
 Erik Faye-Lund kusmab...@gmail.com writes:

 On Tue, Mar 11, 2014 at 2:50 PM, Erik Faye-Lund kusmab...@gmail.com 
 wrote:
 On Mon, Mar 10, 2014 at 11:54 PM, Matt Turner matts...@gmail.com wrote:
 Cuts two instructions out of SynMark's Gl32VSInstancing benchmark.
 ---
  src/glsl/opt_algebraic.cpp | 8 ++++++++
  1 file changed, 8 insertions(+)

 diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
 index 5c49a78..8494bd9 100644
 --- a/src/glsl/opt_algebraic.cpp
 +++ b/src/glsl/opt_algebraic.cpp
 @@ -528,6 +528,14 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
        if (is_vec_two(op_const[0]))
           return expr(ir_unop_exp2, ir->operands[1]);

 +      if (is_vec_two(op_const[1])) {
 +         ir_variable *x = new(ir) ir_variable(ir->operands[1]->type, "x",
 +                                              ir_var_temporary);
 +         base_ir->insert_before(x);
 +         base_ir->insert_before(assign(x, ir->operands[0]));
 +         return mul(x, x);
 +      }
 +

 Is this safe? Since many GPUs implement pow(x, y) as exp2(log2(x) *
 y), this will give different results for if y comes from a uniform vs
 if it's a constant, no?

 Yes, but that wouldn't be covered by the invariant keyword.

 To be a bit more clear: I don't think this is valid for expressions
 writing to variables marked as invariant (or expressions taking part
 in the calculations that lead up to invariant variable writes).

 I can't find anything allowing variance like this in the invariance
 section of the GLSL 3.30 specifications. In particular, the list
 following "To guarantee invariance of a particular output variable
 across two programs, the following must also be true" doesn't seem to
 require the values to be passed from the same source, only that the
 same values are passed. And in this case, the value 2.0 is usually
 exactly representable no matter what path it took there.

 Perhaps I'm being a bit too pedantic here, though.

 This file would do the same thing on the same expression tree in two
 different programs, so invariant is fine (we've probably got other
 problems with invariant, though).  The keyword you're probably thinking
 of is precise, which isn't in GLSL we implement yet.

 Are you saying that this only rewrites x = pow(y, 2.0) and not
 const float e = 2.0; x = pow(y, e);? If so, my point is moot,
 indeed. But if that's *not* the case, then I think we're in trouble
 still.

 The second would also get rewritten, because other passes will move the
 2.0 into the pow.

 I thought I understood your objection, but now I don't.  I think you'll
 have to lay out the pair of shaders involving the invariant keyword that
 you think that would be broken by this pass.

My understanding is that
---8<---
invariant varying float v;
attribute float a;
const float e = 2.0;
void main()
{
    v = pow(a, e);
}
---8<---
and
---8<---
invariant varying float v;
attribute float a;
uniform float e;
void main()
{
    v = pow(a, e);
}
---8<---
...should produce the exact same result, as long as the latter is
passed 2.0 as the uniform e.

Because v is marked as invariant, the expressions writing to v are the
same, and the values passed in are the same.

If we rewrite the first one to do a * a, we get a different result
on implementations that do exp2(log2(a) * 2.0) for the latter, due
to floating-point normalization in the intermediate steps.
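
To make the rounding concern concrete, here is a minimal C++ sketch that compares the two lowerings on the CPU in single precision. The function names are made up for illustration, and the host math library only stands in for a GPU's exp2/log2 units, so this shows the kind of divergence I mean rather than what any particular driver or GPU produces.

```cpp
#include <cassert>
#include <cmath>

// pow(a, 2.0) lowered the way the patch does it: a single multiply.
float pow2_via_mul(float a) {
    return a * a;
}

// pow(a, e) lowered as exp2(log2(a) * e), evaluated in single precision.
// Assumption: this models the exp2/log2 lowering discussed above; real
// hardware may round each step differently from the host math library.
float pow_via_exp2_log2(float a, float e) {
    return std::exp2(std::log2(a) * e);
}

// Counts how many of `steps` evenly spaced inputs in [lo, hi] give results
// that compare unequal between the two lowerings.
int count_mismatches(float lo, float hi, int steps) {
    assert(steps > 1 && lo > 0.0f && lo < hi);
    int mismatches = 0;
    for (int i = 0; i < steps; ++i) {
        float t = static_cast<float>(i) / static_cast<float>(steps - 1);
        float a = lo + (hi - lo) * t;
        if (pow2_via_mul(a) != pow_via_exp2_log2(a, 2.0f))
            ++mismatches;
    }
    return mismatches;
}
```

Calling count_mismatches(1.0f, 2.0f, 1000) gives a rough idea of how often the two paths disagree; the exact count depends on the math library, so I'm not quoting a number here.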
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/6] glsl: Optimize pow(x, 2) into x * x.

2014-03-11 Thread Eric Anholt
Erik Faye-Lund kusmab...@gmail.com writes:

 On Wed, Mar 12, 2014 at 12:00 AM, Eric Anholt e...@anholt.net wrote:
 Erik Faye-Lund kusmab...@gmail.com writes:

 On Tue, Mar 11, 2014 at 7:27 PM, Eric Anholt e...@anholt.net wrote:
 Erik Faye-Lund kusmab...@gmail.com writes:

 On Tue, Mar 11, 2014 at 2:50 PM, Erik Faye-Lund kusmab...@gmail.com 
 wrote:
 On Mon, Mar 10, 2014 at 11:54 PM, Matt Turner matts...@gmail.com wrote:
 Cuts two instructions out of SynMark's Gl32VSInstancing benchmark.
 ---
  src/glsl/opt_algebraic.cpp | 8 ++++++++
  1 file changed, 8 insertions(+)

 diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
 index 5c49a78..8494bd9 100644
 --- a/src/glsl/opt_algebraic.cpp
 +++ b/src/glsl/opt_algebraic.cpp
 @@ -528,6 +528,14 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
        if (is_vec_two(op_const[0]))
           return expr(ir_unop_exp2, ir->operands[1]);

 +      if (is_vec_two(op_const[1])) {
 +         ir_variable *x = new(ir) ir_variable(ir->operands[1]->type, "x",
 +                                              ir_var_temporary);
 +         base_ir->insert_before(x);
 +         base_ir->insert_before(assign(x, ir->operands[0]));
 +         return mul(x, x);
 +      }
 +

 Is this safe? Since many GPUs implement pow(x, y) as exp2(log2(x) *
 y), this will give different results depending on whether y comes from
 a uniform or is a constant, no?

 Yes, but that wouldn't be covered by the invariant keyword.

 To be a bit more clear: I don't think this is valid for expressions
 writing to variables marked as invariant (or expressions taking part
 in the calculations that lead up to invariant variable writes).

 I can't find anything allowing variance like this in the invariance
 section of the GLSL 3.30 specification. In particular, the list
 following "To guarantee invariance of a particular output variable
 across two programs, the following must also be true" doesn't seem to
 require the values to be passed from the same source, only that the
 same values are passed. And in this case, the value 2.0 is usually
 exactly representable no matter what path it took there.

 Perhaps I'm being a bit too pedantic here, though.

 This file would do the same thing on the same expression tree in two
 different programs, so invariant is fine (we've probably got other
 problems with invariant, though).  The keyword you're probably thinking
 of is "precise", which isn't in the GLSL we implement yet.

 Are you saying that this only rewrites "x = pow(y, 2.0)" and not
 "const float e = 2.0; x = pow(y, e);"? If so, my point is moot,
 indeed. But if that's *not* the case, then I think we're in trouble
 still.

 The second would also get rewritten, because other passes will move the
 2.0 into the pow.

 I thought I understood your objection, but now I don't.  I think you'll
 have to lay out the pair of shaders involving the invariant keyword
 that you think would be broken by this pass.

 My understanding is that
 ---8<---
 invariant varying float v;
 attribute float a;
 const float e = 2.0;
 void main()
 {
     v = pow(a, e);
 }
 ---8<---
 and
 ---8<---
 invariant varying float v;
 attribute float a;
 uniform float e;
 void main()
 {
     v = pow(a, e);
 }
 ---8<---
 ...should produce the exact same result, as long as the latter is
 passed 2.0 as the uniform e.

 Because v is marked as invariant, the expressions writing to v are the
 same, and the values passed in are the same.

 If we rewrite the first one to do a * a, we get a different result
 on implementations that do exp2(log2(a) * 2.0) for the latter, due
 to floating-point normalization in the intermediate steps.

I don't think that's what the spec authors intended with the keyword.  I
think what they intended was that if you had "uniform float e" in both
cases, but different code for setting *other* lvalues, you'd still
get the same result in v.




[Mesa-dev] [PATCH 1/6] glsl: Optimize pow(x, 2) into x * x.

2014-03-10 Thread Matt Turner
Cuts two instructions out of SynMark's Gl32VSInstancing benchmark.
---
 src/glsl/opt_algebraic.cpp | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
index 5c49a78..8494bd9 100644
--- a/src/glsl/opt_algebraic.cpp
+++ b/src/glsl/opt_algebraic.cpp
@@ -528,6 +528,14 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
       if (is_vec_two(op_const[0]))
          return expr(ir_unop_exp2, ir->operands[1]);
 
+      if (is_vec_two(op_const[1])) {
+         ir_variable *x = new(ir) ir_variable(ir->operands[1]->type, "x",
+                                              ir_var_temporary);
+         base_ir->insert_before(x);
+         base_ir->insert_before(assign(x, ir->operands[0]));
+         return mul(x, x);
+      }
+
       break;
 
    case ir_unop_rcp:
-- 
1.8.3.2
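
For readers unfamiliar with the pass, the shape of the rewrite can be sketched on a toy expression tree. Everything below (Expr, Assignment, rewrite_pow2 and the helper constructors) is hypothetical and is not Mesa's IR API; it only mirrors the idea in the hunk above: copy the base operand into a temporary ahead of the current statement, then return a multiply of that temporary with itself, presumably so that the operand is evaluated once and its subtree is not referenced from two places.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy expression tree for illustration only; this is NOT Mesa's ir_expression.
struct Expr {
    enum Kind { Var, Const, Pow, Mul };
    Kind kind = Var;
    std::string name;                // used by Var
    float value = 0.0f;              // used by Const
    std::unique_ptr<Expr> lhs, rhs;  // used by Pow and Mul
};
using ExprPtr = std::unique_ptr<Expr>;

ExprPtr make_var(const std::string &name) {
    ExprPtr e(new Expr());
    e->kind = Expr::Var;
    e->name = name;
    return e;
}

ExprPtr make_const(float value) {
    ExprPtr e(new Expr());
    e->kind = Expr::Const;
    e->value = value;
    return e;
}

ExprPtr make_binary(Expr::Kind kind, ExprPtr lhs, ExprPtr rhs) {
    ExprPtr e(new Expr());
    e->kind = kind;
    e->lhs = std::move(lhs);
    e->rhs = std::move(rhs);
    return e;
}

// "temp = value", to be emitted before the statement being rewritten.
struct Assignment {
    std::string temp;
    ExprPtr value;
};

// Rewrites pow(base, 2.0) into "temp = base; temp * temp".
// Any other expression is returned unchanged.
ExprPtr rewrite_pow2(ExprPtr e, std::vector<Assignment> &prologue) {
    assert(e != nullptr);
    bool is_pow2 = e->kind == Expr::Pow && e->rhs &&
                   e->rhs->kind == Expr::Const && e->rhs->value == 2.0f;
    if (!is_pow2)
        return e;

    // Move the base into a fresh temporary so the tree holds it only once.
    std::string temp = "x" + std::to_string(prologue.size());
    Assignment a;
    a.temp = temp;
    a.value = std::move(e->lhs);
    prologue.push_back(std::move(a));

    return make_binary(Expr::Mul, make_var(temp), make_var(temp));
}
```

Rewriting pow(a, 2.0) this way yields one prologue assignment (x0 = a) and the expression x0 * x0, while pow(a, 3.0) is left alone.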



Re: [Mesa-dev] [PATCH 1/6] glsl: Optimize pow(x, 2) into x * x.

2014-03-10 Thread Ian Romanick
I had a pretty similar patch on the top of my pow-optimization branch.
I also expand x**3 and x**4.  I had hoped that would enable some cases
to expand then merge to MADs.  It should also be faster on older GENs
where POW perf sucks.  I didn't send it out because I wanted to add a
similar optimization in the back end that would turn x*x*x*x back into
x**4 on GPUs where the POW would be faster.

I also didn't have anything in shader-db that benefited from x**2 or
x**3.  It seems like there were a couple that would be modified by a
x**5 flattening, but I think that would universally be slower.
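
As a rough guide to how many multiplies each flattening costs, here is a small C++ sketch that expands x**n by repeated squaring and counts the multiplies. The names are invented for illustration, and the left-to-right binary method used here is an assumption, not necessarily what the branch does. Under that method x**2 takes 1 multiply, x**3 and x**4 take 2, and x**5 takes 3.

```cpp
#include <cassert>

// Result of expanding x**n into multiplies (hypothetical helper type).
struct PowExpansion {
    float value;     // x raised to the n-th power
    int multiplies;  // number of multiply operations used
};

// Expands x**n for n >= 1 with the left-to-right binary method:
// square for every bit below the top one, multiply by x when the bit is set.
PowExpansion expand_pow(float x, unsigned n) {
    assert(n >= 1);

    // Find the highest set bit of n.
    unsigned bit = 1;
    while ((bit << 1) != 0 && (bit << 1) <= n)
        bit <<= 1;

    PowExpansion r = {x, 0};
    for (bit >>= 1; bit != 0; bit >>= 1) {
        r.value *= r.value;
        ++r.multiplies;
        if (n & bit) {
            r.value *= x;
            ++r.multiplies;
        }
    }
    return r;
}
```

Whether a given count beats a POW instruction is hardware-specific, which is why the back-end would need the option of folding the multiplies back into a pow.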

On 03/10/2014 03:54 PM, Matt Turner wrote:
 Cuts two instructions out of SynMark's Gl32VSInstancing benchmark.
 ---
  src/glsl/opt_algebraic.cpp | 8 ++++++++
  1 file changed, 8 insertions(+)
 
 diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
 index 5c49a78..8494bd9 100644
 --- a/src/glsl/opt_algebraic.cpp
 +++ b/src/glsl/opt_algebraic.cpp
 @@ -528,6 +528,14 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
        if (is_vec_two(op_const[0]))
           return expr(ir_unop_exp2, ir->operands[1]);
  
 +      if (is_vec_two(op_const[1])) {
 +         ir_variable *x = new(ir) ir_variable(ir->operands[1]->type, "x",
 +                                              ir_var_temporary);
 +         base_ir->insert_before(x);
 +         base_ir->insert_before(assign(x, ir->operands[0]));
 +         return mul(x, x);
 +      }
 +
        break;
  
     case ir_unop_rcp:
 



Re: [Mesa-dev] [PATCH 1/6] glsl: Optimize pow(x, 2) into x * x.

2014-03-10 Thread Roland Scheidegger
On 11.03.2014 01:23, Ian Romanick wrote:
 I had a pretty similar patch on the top of my pow-optimization branch.
 I also expand x**3 and x**4.  I had hoped that would enable some cases
 to expand then merge to MADs.  It should also be faster on older GENs
 where POW perf sucks.  I didn't send it out because I wanted to add a
 similar optimization in the back end that would turn x*x*x*x back into
 x**4 on GPUs where the POW would be faster.
I have no idea what performance POW has on newer Intel GPU hw (in
contrast to older pre-SNB hw with a separate mathbox, the manual doesn't
list throughput for extended math functions, or at least I never found it),
but I find it highly unlikely that a POW costs less than 2 muls
anywhere.

Roland


 I also didn't have anything in shader-db that benefited from x**2 or
 x**3.  It seems like there were a couple that would be modified by a
 x**5 flattening, but I think that would universally be slower.
 
 On 03/10/2014 03:54 PM, Matt Turner wrote:
 Cuts two instructions out of SynMark's Gl32VSInstancing benchmark.
 ---
  src/glsl/opt_algebraic.cpp | 8 ++++++++
  1 file changed, 8 insertions(+)

 diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
 index 5c49a78..8494bd9 100644
 --- a/src/glsl/opt_algebraic.cpp
 +++ b/src/glsl/opt_algebraic.cpp
 @@ -528,6 +528,14 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
        if (is_vec_two(op_const[0]))
           return expr(ir_unop_exp2, ir->operands[1]);
  
 +      if (is_vec_two(op_const[1])) {
 +         ir_variable *x = new(ir) ir_variable(ir->operands[1]->type, "x",
 +                                              ir_var_temporary);
 +         base_ir->insert_before(x);
 +         base_ir->insert_before(assign(x, ir->operands[0]));
 +         return mul(x, x);
 +      }
 +
        break;
  
     case ir_unop_rcp:

 