Re: [Mesa-dev] [PATCH 1/6] glsl: Optimize pow(x, 2) into x * x.
On Wed, Mar 12, 2014 at 1:32 AM, Eric Anholt e...@anholt.net wrote:
> Erik Faye-Lund kusmab...@gmail.com writes:
>> On Wed, Mar 12, 2014 at 12:00 AM, Eric Anholt e...@anholt.net wrote:
>>> Erik Faye-Lund kusmab...@gmail.com writes:
>>>> On Tue, Mar 11, 2014 at 7:27 PM, Eric Anholt e...@anholt.net wrote:
>>>>> Erik Faye-Lund kusmab...@gmail.com writes:
>>>>>> On Tue, Mar 11, 2014 at 2:50 PM, Erik Faye-Lund kusmab...@gmail.com wrote:
>>>>>>> On Mon, Mar 10, 2014 at 11:54 PM, Matt Turner matts...@gmail.com wrote:
>>>>>>>> Cuts two instructions out of SynMark's Gl32VSInstancing benchmark.
>>>>>>>> ---
>>>>>>>>  src/glsl/opt_algebraic.cpp | 8 ++++++++
>>>>>>>>  1 file changed, 8 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
>>>>>>>> index 5c49a78..8494bd9 100644
>>>>>>>> --- a/src/glsl/opt_algebraic.cpp
>>>>>>>> +++ b/src/glsl/opt_algebraic.cpp
>>>>>>>> @@ -528,6 +528,14 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
>>>>>>>>        if (is_vec_two(op_const[0]))
>>>>>>>>           return expr(ir_unop_exp2, ir->operands[1]);
>>>>>>>>
>>>>>>>> +      if (is_vec_two(op_const[1])) {
>>>>>>>> +         ir_variable *x = new(ir) ir_variable(ir->operands[1]->type, "x",
>>>>>>>> +                                              ir_var_temporary);
>>>>>>>> +         base_ir->insert_before(x);
>>>>>>>> +         base_ir->insert_before(assign(x, ir->operands[0]));
>>>>>>>> +         return mul(x, x);
>>>>>>>> +      }
>>>>>>>> +
>>>>>>>
>>>>>>> Is this safe? Since many GPUs implement pow(x, y) as
>>>>>>> exp2(log2(x) * y), this will give different results for if y comes
>>>>>>> from a uniform vs if it's a constant, no?
>>>>>
>>>>> Yes, but that wouldn't be covered by the invariant keyword.
>>>>>
>>>>>> To be a bit more clear: I don't think this is valid for expressions
>>>>>> writing to variables marked as invariant (or expressions taking part
>>>>>> in the calculations that lead up to invariant variable writes).
>>>>>>
>>>>>> I can't find anything allowing variance like this in the invariance
>>>>>> section of the GLSL 3.30 specifications. In particular, the list
>>>>>> following "To guarantee invariance of a particular output variable
>>>>>> across two programs, the following must also be true" doesn't seem
>>>>>> to require the values to be passed from the same source, only that
>>>>>> the same values are passed. And in this case, the value 2.0 is
>>>>>> usually exactly representable no matter what path it took there.
>>>>>>
>>>>>> Perhaps I'm being a bit too pedantic here, though.
>>>>>
>>>>> This file would do the same thing on the same expression tree in two
>>>>> different programs, so invariant is fine (we've probably got other
>>>>> problems with invariant, though). The keyword you're probably
>>>>> thinking of is precise, which isn't in the GLSL we implement yet.
>>>>
>>>> Are you saying that this only rewrites "x = pow(y, 2.0)" and not
>>>> "const float e = 2.0; x = pow(y, e);"? If so, my point is moot,
>>>> indeed. But if that's *not* the case, then I think we're in trouble
>>>> still.
>>>
>>> The second would also get rewritten, because other passes will move
>>> the 2.0 into the pow.
>>>
>>> I thought I understood your objection, but now I don't. I think
>>> you'll have to lay out the pair of shaders involving the invariant
>>> keyword that you think would be broken by this pass.
>>
>> My understanding is that
>>
>> ---8<---
>> invariant varying float v;
>> attribute float a;
>> const float e = 2.0;
>> void main()
>> {
>>     v = pow(a, e);
>> }
>> ---8<---
>>
>> and
>>
>> ---8<---
>> invariant varying float v;
>> attribute float a;
>> uniform float e;
>> void main()
>> {
>>     v = pow(a, e);
>> }
>> ---8<---
>>
>> ...should produce the exact same result, as long as the latter is
>> passed 2.0 as the uniform e.
>>
>> Because v is marked as invariant, the expressions writing to v are
>> the same, and the values passed in are the same.
>>
>> If we rewrite the first one to do "a * a", we get a different result
>> on implementations that do "exp2(log2(a) * 2.0)" for the latter, due
>> to floating-point normalization in the intermediate steps.
>
> I don't think that's what the spec authors intended from the
> keyword. I think what they intended was that if you had uniform
> float e in both cases, but different code for setting *other*
> lvalues, that you'd still get the same result in v.

I think that *might* be correct, but it doesn't seem to be what's
actually defined. Invariance comes from ESSL, and FWIW, I was one of
the spec authors in this case. But I don't remember the details down
to this level, and the spec doesn't seem to clarify either.

However, since constant expressions are given some slack wrt how
they're evaluated, I'm inclined to think that you're right about the
spirit of the spec.

AFAIR, we introduced invariant to get rid of ftransform (since we'd
already gotten rid of fixed-function state), so that multi-pass
rendering algorithms could be guaranteed that all passes ended up
covering the exact same fragments at the exact same depth
coordinate. And in cases like that, the inputs would really be of
the same kind all the time, I guess.

So yeah, perhaps. But I wouldn't feel safe about optimizations like
this without a clarification from Khronos.

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/6] glsl: Optimize pow(x, 2) into x * x.
On 03/12/2014 01:29 AM, Erik Faye-Lund wrote: On Wed, Mar 12, 2014 at 1:32 AM, Eric Anholt e...@anholt.net wrote: Erik Faye-Lund kusmab...@gmail.com writes: On Wed, Mar 12, 2014 at 12:00 AM, Eric Anholt e...@anholt.net wrote: Erik Faye-Lund kusmab...@gmail.com writes: On Tue, Mar 11, 2014 at 7:27 PM, Eric Anholt e...@anholt.net wrote: Erik Faye-Lund kusmab...@gmail.com writes: On Tue, Mar 11, 2014 at 2:50 PM, Erik Faye-Lund kusmab...@gmail.com wrote: On Mon, Mar 10, 2014 at 11:54 PM, Matt Turner matts...@gmail.com wrote: Cuts two instructions out of SynMark's Gl32VSInstancing benchmark. --- src/glsl/opt_algebraic.cpp | 8 1 file changed, 8 insertions(+) diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp index 5c49a78..8494bd9 100644 --- a/src/glsl/opt_algebraic.cpp +++ b/src/glsl/opt_algebraic.cpp @@ -528,6 +528,14 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir) if (is_vec_two(op_const[0])) return expr(ir_unop_exp2, ir-operands[1]); + if (is_vec_two(op_const[1])) { + ir_variable *x = new(ir) ir_variable(ir-operands[1]-type, x, + ir_var_temporary); + base_ir-insert_before(x); + base_ir-insert_before(assign(x, ir-operands[0])); + return mul(x, x); + } + Is this safe? Since many GPUs implement pow(x, y) as exp2(log2(x) * y), this will give different results for if y comes from a uniform vs if it's a constant, no? Yes, but that wouldn't be covered by the invariant keyword. To be a bit more clear: I don't think this is valid for expressions writing to variables marked as invariant (or expressions taking part in the calculations that leads up to invariant variable writes). I can't find anything allowing variance like this in the invariance section of the GLSL 3.30 specifications. In particular, the list following To guarantee invariance of a particular output variable across two programs, the following must also be true doesn't seem to require the values to be passed from the same source, only that the same values are passed. 
And in this case, the value 2.0 is usually exactly representable no matter what path it took there. Perhaps I'm being a bit too pedantic here, though. This file would do the same thing on the same expression tree in two different programs, so invariant is fine (we've probably got other problems with invariant, though). The keyword you're probably thinking of is precise, which isn't in GLSL we implement yet. Are you saying that this only rewrites x = pow(y, 2.0) and not const float e = 2.0; x = pow(y, e);? If so, my point is moot, indeed. But if that's *not* the case, then I think we're in trouble still. The second would also get rewritten, because other passes will move the 2.0 into the pow. I thought I understood your objection, but now I don't. I think you'll have to lay out the pair of shaders involving the invariant keyword that you think that would be broken by this pass. My understanding is that ---8--- invariant varying float v; attribute float a; const float e = 2.0; void main() { v = pow(a, e); } ---8--- and ---8--- invariant varying float v; attribute float a; uniform float e; void main() { v = pow(a, e); } ---8--- ...should produce the exact same result, as long as the latter is passed 2.0 as the uniform e. Because v is marked as invariant, the expressions writing to v are the same, and the values passed in are the same. If we rewrite the first one to do a * a, we get a different result on implementations that do exp2(log2(a) * 2.0) for the latter, due to floating-point normalization in the intermediate steps. I don't think that's what the spec authors intended from the keyword. I think what they intended was that if you had uniform float e in both cases, but different code for setting *other* lvalues, that you'd still get the same result in v. I think that *might* be correct, but it doesn't seem to be what's actually defined. Invariance comes from ESSL, and FWIW, I was one of the spec authors in this case. 
But I don't remember the details down to this level, and the spec doesn't seem to clarify either. However, since constant expressions are given some slack wrt how it's evaluated, I'm inclined to think that you're right about the spirit of the spec. AFAIR, we introduced invariant to get rid of ftransform (since we'd already gotten rid of fixed-function state), so that multi-pass rendering algorthms could be guaranteed that all passes ended up covering the exact same fragments at the exact same depth coordinate. And in cases like that, the inputs would really be of the same kind all the time, I guess. I believe the intention was to prevent inter-expression optimizations from causing different shaders from producing different results. Relative to this example, you can imagine: attribute float a, b;
Re: [Mesa-dev] [PATCH 1/6] glsl: Optimize pow(x, 2) into x * x.
On Mon, Mar 10, 2014 at 11:54 PM, Matt Turner matts...@gmail.com wrote:
> Cuts two instructions out of SynMark's Gl32VSInstancing benchmark.
> ---
>  src/glsl/opt_algebraic.cpp | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
> index 5c49a78..8494bd9 100644
> --- a/src/glsl/opt_algebraic.cpp
> +++ b/src/glsl/opt_algebraic.cpp
> @@ -528,6 +528,14 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
>        if (is_vec_two(op_const[0]))
>           return expr(ir_unop_exp2, ir->operands[1]);
>
> +      if (is_vec_two(op_const[1])) {
> +         ir_variable *x = new(ir) ir_variable(ir->operands[1]->type, "x",
> +                                              ir_var_temporary);
> +         base_ir->insert_before(x);
> +         base_ir->insert_before(assign(x, ir->operands[0]));
> +         return mul(x, x);
> +      }
> +

Is this safe? Since many GPUs implement pow(x, y) as
exp2(log2(x) * y), this will give different results for if y comes
from a uniform vs if it's a constant, no?
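The concern above can be tried out on the CPU. The following is only a rough sketch in C++ (not GLSL, and not tied to any GPU's actual pow lowering): it compares the exp2(log2(x) * y) form with a plain x * x for y = 2.0. The names pow_via_exp2_log2, square and count_mismatches are made up for this illustration, and the number of mismatching inputs depends on the math library in use, so no particular count is claimed here.

```cpp
#include <cmath>
#include <cstddef>

// pow(x, y) written the way the thread says many GPUs lower it.
// This is an illustration only; real hardware may differ.
inline float pow_via_exp2_log2(float x, float y) {
    return std::exp2(std::log2(x) * y);
}

// What the patch emits when the exponent is the constant 2.0.
inline float square(float x) {
    return x * x;
}

// Counts how many of `steps` evenly spaced inputs in [lo, hi] give
// results that are not exactly equal between the two forms.
inline std::size_t count_mismatches(float lo, float hi, std::size_t steps) {
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < steps; ++i) {
        float t = steps > 1
                      ? static_cast<float>(i) / static_cast<float>(steps - 1)
                      : 0.0f;
        float x = lo + (hi - lo) * t;
        if (pow_via_exp2_log2(x, 2.0f) != square(x)) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

Calling count_mismatches(1.0f, 2.0f, 1000) gives a feel for how often the two forms disagree in the last bits on a given platform. For a negative base the difference is more drastic: log2 of a negative number is NaN, so the exp2/log2 form yields NaN while x * x yields a positive value.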
Re: [Mesa-dev] [PATCH 1/6] glsl: Optimize pow(x, 2) into x * x.
On Tue, Mar 11, 2014 at 2:50 PM, Erik Faye-Lund kusmab...@gmail.com wrote:
> On Mon, Mar 10, 2014 at 11:54 PM, Matt Turner matts...@gmail.com wrote:
>> Cuts two instructions out of SynMark's Gl32VSInstancing benchmark.
>> ---
>>  src/glsl/opt_algebraic.cpp | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
>> index 5c49a78..8494bd9 100644
>> --- a/src/glsl/opt_algebraic.cpp
>> +++ b/src/glsl/opt_algebraic.cpp
>> @@ -528,6 +528,14 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
>>        if (is_vec_two(op_const[0]))
>>           return expr(ir_unop_exp2, ir->operands[1]);
>>
>> +      if (is_vec_two(op_const[1])) {
>> +         ir_variable *x = new(ir) ir_variable(ir->operands[1]->type, "x",
>> +                                              ir_var_temporary);
>> +         base_ir->insert_before(x);
>> +         base_ir->insert_before(assign(x, ir->operands[0]));
>> +         return mul(x, x);
>> +      }
>> +
>
> Is this safe? Since many GPUs implement pow(x, y) as
> exp2(log2(x) * y), this will give different results for if y comes
> from a uniform vs if it's a constant, no?

To be a bit more clear: I don't think this is valid for expressions
writing to variables marked as invariant (or expressions taking part
in the calculations that lead up to invariant variable writes).

I can't find anything allowing variance like this in the invariance
section of the GLSL 3.30 specifications. In particular, the list
following "To guarantee invariance of a particular output variable
across two programs, the following must also be true" doesn't seem
to require the values to be passed from the same source, only that
the same values are passed. And in this case, the value 2.0 is
usually exactly representable no matter what path it took there.

Perhaps I'm being a bit too pedantic here, though.
Re: [Mesa-dev] [PATCH 1/6] glsl: Optimize pow(x, 2) into x * x.
On 03/10/2014 07:21 PM, Roland Scheidegger wrote:
> On 11.03.2014 01:23, Ian Romanick wrote:
>> I had a pretty similar patch on the top of my pow-optimization
>> branch. I also expand x**3 and x**4. I had hoped that would enable
>> some cases to expand then merge to MADs. It should also be faster
>> on older GENs where POW perf sucks. I didn't send it out because I
>> wanted to add a similar optimization in the back end that would
>> turn x*x*x*x back into x**4 on GPUs where the POW would be faster.
>
> I have no idea what performance POW has on newer intel gpu hw (since
> in contrast to older pre-snb hw with separate mathbox the manual
> doesn't list throughput for extended math functions, at least I
> never found it), but I find it highly unlikely that a POW has a cost
> lower than 2 muls anywhere.

The architecture has changed quite a bit, so math box is kind of a
thing of the past... and there was much rejoicing.

The timings that we use in the compiler backend are 22 cycles for
POW, and 14 cycles for MUL on Haswell. The numbers are similar (but
slightly longer) on Sandybridge and Ivybridge.

> Roland
>
>> I also didn't have anything in shader-db that benefitted from x**2
>> or x**3. It seems like there were a couple that would be modified
>> by a x**5 flattening, but I think that would universally be slower
>>
>> On 03/10/2014 03:54 PM, Matt Turner wrote:
>>> Cuts two instructions out of SynMark's Gl32VSInstancing benchmark.
>>> ---
>>>  src/glsl/opt_algebraic.cpp | 8 ++++++++
>>>  1 file changed, 8 insertions(+)
>>>
>>> diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
>>> index 5c49a78..8494bd9 100644
>>> --- a/src/glsl/opt_algebraic.cpp
>>> +++ b/src/glsl/opt_algebraic.cpp
>>> @@ -528,6 +528,14 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
>>>        if (is_vec_two(op_const[0]))
>>>           return expr(ir_unop_exp2, ir->operands[1]);
>>>
>>> +      if (is_vec_two(op_const[1])) {
>>> +         ir_variable *x = new(ir) ir_variable(ir->operands[1]->type, "x",
>>> +                                              ir_var_temporary);
>>> +         base_ir->insert_before(x);
>>> +         base_ir->insert_before(assign(x, ir->operands[0]));
>>> +         return mul(x, x);
>>> +      }
>>> +
>>>        break;
>>>
>>>     case ir_unop_rcp:
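For readers who do not have the Mesa tree at hand, the kind of flattening discussed above (x**2 from the patch, x**3 and x**4 from the unsent branch, x**5 left alone) can be sketched on a toy expression tree. Everything below is hypothetical: Expr, flatten_pow and the helper names are invented for this illustration and are not Mesa's ir_expression API, and the sketch does not create a temporary the way the patch does.

```cpp
#include <cmath>
#include <memory>
#include <utility>

// Toy expression tree; NOT Mesa's IR.
struct Expr {
    enum Kind { Var, Const, Mul, Pow } kind = Var;
    float value = 0.0f;              // used when kind == Const
    std::shared_ptr<Expr> lhs, rhs;  // used when kind == Mul or Pow
};
using ExprPtr = std::shared_ptr<Expr>;

inline ExprPtr make_node(Expr::Kind kind, float value, ExprPtr lhs, ExprPtr rhs) {
    auto e = std::make_shared<Expr>();
    e->kind = kind;
    e->value = value;
    e->lhs = std::move(lhs);
    e->rhs = std::move(rhs);
    return e;
}
inline ExprPtr var() { return make_node(Expr::Var, 0.0f, nullptr, nullptr); }
inline ExprPtr constant(float v) { return make_node(Expr::Const, v, nullptr, nullptr); }
inline ExprPtr mul(ExprPtr a, ExprPtr b) {
    return make_node(Expr::Mul, 0.0f, std::move(a), std::move(b));
}
inline ExprPtr pow_expr(ExprPtr a, ExprPtr b) {
    return make_node(Expr::Pow, 0.0f, std::move(a), std::move(b));
}

// Rewrites pow(x, n) into repeated multiplication when n is the
// constant 2, 3 or 4; any other expression is returned unchanged.
inline ExprPtr flatten_pow(const ExprPtr &e) {
    if (e->kind != Expr::Pow || e->rhs->kind != Expr::Const) return e;
    const float n = e->rhs->value;
    const ExprPtr &x = e->lhs;  // the base node is shared, not copied
    if (n == 2.0f) return mul(x, x);
    if (n == 3.0f) return mul(mul(x, x), x);
    if (n == 4.0f) {
        ExprPtr sq = mul(x, x);
        return mul(sq, sq);
    }
    return e;
}

// Evaluates the tree with the single variable bound to `x`.
inline float eval(const ExprPtr &e, float x) {
    switch (e->kind) {
    case Expr::Var:   return x;
    case Expr::Const: return e->value;
    case Expr::Mul:   return eval(e->lhs, x) * eval(e->rhs, x);
    case Expr::Pow:   return std::pow(eval(e->lhs, x), eval(e->rhs, x));
    }
    return 0.0f;
}
```

For example, flatten_pow(pow_expr(var(), constant(2.0f))) yields a Mul node, while an exponent of 5.0 is left as a Pow node. A back end could apply the reverse mapping where POW is cheaper, subject to the sign caveat raised elsewhere in the thread.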
Re: [Mesa-dev] [PATCH 1/6] glsl: Optimize pow(x, 2) into x * x.
On 11.03.2014 17:29, Ian Romanick wrote:
> On 03/10/2014 07:21 PM, Roland Scheidegger wrote:
>> On 11.03.2014 01:23, Ian Romanick wrote:
>>> I had a pretty similar patch on the top of my pow-optimization
>>> branch. I also expand x**3 and x**4. I had hoped that would enable
>>> some cases to expand then merge to MADs. It should also be faster
>>> on older GENs where POW perf sucks. I didn't send it out because I
>>> wanted to add a similar optimization in the back end that would
>>> turn x*x*x*x back into x**4 on GPUs where the POW would be faster.
>>
>> I have no idea what performance POW has on newer intel gpu hw (since
>> in contrast to older pre-snb hw with separate mathbox the manual
>> doesn't list throughput for extended math functions, at least I
>> never found it), but I find it highly unlikely that a POW has a cost
>> lower than 2 muls anywhere.
>
> The architecture has changed quite a bit, so math box is kind of a
> thing of the past...

That's why I said pre-SNB hw :-).

> and there was much rejoicing.
>
> The timings that we use in the compiler backend are 22 cycles for
> POW, and 14 cycles for MUL on Haswell. The numbers are similar (but
> slightly longer) on Sandybridge and Ivybridge.

I think that works if you just care about latency. Since it appears
you have a base latency of 14 cycles for anything, but 22 for POW,
it looks to me like POW is significantly more expensive. (That is,
if you'd try to issue nothing but POWs or probably other functions
from the extended math group, you'd find you could only get 1/4 or
so of the throughput you get with MULs, since you probably cannot
issue that function every two cycles, but you can do that with MULs.
Just a guess though, assuming that during these additional latency
cycles the hw cannot do another POW, and even if true maybe latency
is really still more relevant in practice. But as said that's just a
wild guess, I blame the docs for that :-).)

Roland

>>> I also didn't have anything in shader-db that benefitted from x**2
>>> or x**3. It seems like there were a couple that would be modified
>>> by a x**5 flattening, but I think that would universally be slower
>>>
>>> On 03/10/2014 03:54 PM, Matt Turner wrote:
>>>> Cuts two instructions out of SynMark's Gl32VSInstancing benchmark.
>>>> ---
>>>>  src/glsl/opt_algebraic.cpp | 8 ++++++++
>>>>  1 file changed, 8 insertions(+)
>>>>
>>>> diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
>>>> index 5c49a78..8494bd9 100644
>>>> --- a/src/glsl/opt_algebraic.cpp
>>>> +++ b/src/glsl/opt_algebraic.cpp
>>>> @@ -528,6 +528,14 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
>>>>        if (is_vec_two(op_const[0]))
>>>>           return expr(ir_unop_exp2, ir->operands[1]);
>>>>
>>>> +      if (is_vec_two(op_const[1])) {
>>>> +         ir_variable *x = new(ir) ir_variable(ir->operands[1]->type, "x",
>>>> +                                              ir_var_temporary);
>>>> +         base_ir->insert_before(x);
>>>> +         base_ir->insert_before(assign(x, ir->operands[0]));
>>>> +         return mul(x, x);
>>>> +      }
>>>> +
>>>>        break;
>>>>
>>>>     case ir_unop_rcp:
Re: [Mesa-dev] [PATCH 1/6] glsl: Optimize pow(x, 2) into x * x.
Erik Faye-Lund kusmab...@gmail.com writes:
> On Tue, Mar 11, 2014 at 2:50 PM, Erik Faye-Lund kusmab...@gmail.com wrote:
>> On Mon, Mar 10, 2014 at 11:54 PM, Matt Turner matts...@gmail.com wrote:
>>> Cuts two instructions out of SynMark's Gl32VSInstancing benchmark.
>>> ---
>>>  src/glsl/opt_algebraic.cpp | 8 ++++++++
>>>  1 file changed, 8 insertions(+)
>>>
>>> diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
>>> index 5c49a78..8494bd9 100644
>>> --- a/src/glsl/opt_algebraic.cpp
>>> +++ b/src/glsl/opt_algebraic.cpp
>>> @@ -528,6 +528,14 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
>>>        if (is_vec_two(op_const[0]))
>>>           return expr(ir_unop_exp2, ir->operands[1]);
>>>
>>> +      if (is_vec_two(op_const[1])) {
>>> +         ir_variable *x = new(ir) ir_variable(ir->operands[1]->type, "x",
>>> +                                              ir_var_temporary);
>>> +         base_ir->insert_before(x);
>>> +         base_ir->insert_before(assign(x, ir->operands[0]));
>>> +         return mul(x, x);
>>> +      }
>>> +
>>
>> Is this safe? Since many GPUs implement pow(x, y) as
>> exp2(log2(x) * y), this will give different results for if y comes
>> from a uniform vs if it's a constant, no?

Yes, but that wouldn't be covered by the invariant keyword.

> To be a bit more clear: I don't think this is valid for expressions
> writing to variables marked as invariant (or expressions taking part
> in the calculations that lead up to invariant variable writes).
>
> I can't find anything allowing variance like this in the invariance
> section of the GLSL 3.30 specifications. In particular, the list
> following "To guarantee invariance of a particular output variable
> across two programs, the following must also be true" doesn't seem
> to require the values to be passed from the same source, only that
> the same values are passed. And in this case, the value 2.0 is
> usually exactly representable no matter what path it took there.
>
> Perhaps I'm being a bit too pedantic here, though.

This file would do the same thing on the same expression tree in two
different programs, so invariant is fine (we've probably got other
problems with invariant, though). The keyword you're probably
thinking of is precise, which isn't in the GLSL we implement yet.
Re: [Mesa-dev] [PATCH 1/6] glsl: Optimize pow(x, 2) into x * x.
On 10 March 2014 17:23, Ian Romanick i...@freedesktop.org wrote:
> I had a pretty similar patch on the top of my pow-optimization
> branch. I also expand x**3 and x**4. I had hoped that would enable
> some cases to expand then merge to MADs. It should also be faster
> on older GENs where POW perf sucks. I didn't send it out because I
> wanted to add a similar optimization in the back end that would
> turn x*x*x*x back into x**4 on GPUs where the POW would be faster.

Be careful with that one, though: pow is undefined when x < 0, so
x*x*x*x => pow(x, 4) isn't a valid conversion in general--it only
works if you know the operation started off as a pow in the first
place. (Note: you can do x*x*x*x => pow(abs(x), 4), but that doesn't
generalize to odd powers).

> I also didn't have anything in shader-db that benefitted from x**2
> or x**3. It seems like there were a couple that would be modified
> by a x**5 flattening, but I think that would universally be slower
>
> On 03/10/2014 03:54 PM, Matt Turner wrote:
>> Cuts two instructions out of SynMark's Gl32VSInstancing benchmark.
>> ---
>>  src/glsl/opt_algebraic.cpp | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
>> index 5c49a78..8494bd9 100644
>> --- a/src/glsl/opt_algebraic.cpp
>> +++ b/src/glsl/opt_algebraic.cpp
>> @@ -528,6 +528,14 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
>>        if (is_vec_two(op_const[0]))
>>           return expr(ir_unop_exp2, ir->operands[1]);
>>
>> +      if (is_vec_two(op_const[1])) {
>> +         ir_variable *x = new(ir) ir_variable(ir->operands[1]->type, "x",
>> +                                              ir_var_temporary);
>> +         base_ir->insert_before(x);
>> +         base_ir->insert_before(assign(x, ir->operands[0]));
>> +         return mul(x, x);
>> +      }
>> +
>>        break;
>>
>>     case ir_unop_rcp:
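The sign argument above can be checked with a small C++ sketch. One caveat up front: C++ std::pow is defined for negative bases with integer-valued exponents, unlike GLSL pow, so this only illustrates why wrapping the base in abs() preserves the result for even powers and breaks it for odd ones. The function names are invented for the illustration.

```cpp
#include <cmath>

// Fourth power by repeated multiplication; always non-negative.
inline float mul4(float x) { return x * x * x * x; }

// Candidate replacement for mul4: abs() keeps the pow base non-negative.
inline float pow4_via_abs(float x) { return std::pow(std::fabs(x), 4.0f); }

// Third power by repeated multiplication; keeps the sign of x.
inline float mul3(float x) { return x * x * x; }

// The same trick applied to an odd power loses the sign of x.
inline float pow3_via_abs(float x) { return std::pow(std::fabs(x), 3.0f); }
```

With x = -2, mul4 and pow4_via_abs are both positive, whereas mul3 gives -8 and pow3_via_abs gives a positive value, which is why the abs() form does not generalize to odd powers.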
Re: [Mesa-dev] [PATCH 1/6] glsl: Optimize pow(x, 2) into x * x.
On Tue, Mar 11, 2014 at 10:35 AM, Roland Scheidegger srol...@vmware.com wrote:
> Am 11.03.2014 17:29, schrieb Ian Romanick:
>> and there was much rejoicing.
>>
>> The timings that we use in the compiler backend are 22 cycles for
>> POW, and 14 cycles for MUL on Haswell. The numbers are similar (but
>> slightly longer) on Sandybridge and Ivybridge.
>
> I think that works if you just care about latency. Since it appears
> you have a base latency of 14 cycles for anything, but 22 for POW
> however it looks to me like POW is significantly more expensive.
> (That is, if you'd try to issue nothing but POWs or probably other
> functions from the extended math group, you'd find you could only
> get 1/4 or so from the throughput you get with MULs, since you
> probably cannot issue that function every two cycles, but you can
> do that with MULs. Just a guess though, assuming that during these
> additional latency cycles the hw cannot do another POW, and even if
> true maybe latency is really still more relevant in practice. But
> as said that's just a wild guess I blame the docs for that :-).)

Nope, you're right. Haswell can issue 8 multiplies per EU per cycle,
but only one pow.
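To make the latency versus throughput distinction concrete, here is a deliberately crude C++ sketch. It only plugs in the figures quoted in this thread for Haswell (14 and 22 cycles of latency, 8 multiplies versus one pow issued per EU per cycle). The OpCost struct and issue_cycles function are hypothetical and ignore everything else a real scheduler would model.

```cpp
// Per-EU cost figures as quoted in the thread for Haswell.
// The struct and its field names are assumptions made for this sketch.
struct OpCost {
    double latency_cycles;    // cycles until a single result is available
    double issues_per_cycle;  // independent ops that can start each cycle
};

constexpr OpCost kMul{14.0, 8.0};
constexpr OpCost kPow{22.0, 1.0};

// Cycles needed just to issue n independent ops back to back,
// ignoring latency and all other pipeline effects.
constexpr double issue_cycles(double n_ops, OpCost cost) {
    return n_ops / cost.issues_per_cycle;
}
```

Under this model, 64 independent multiplies take 8 issue cycles while 64 independent pows take 64, an 8x gap, even though the latency gap is only 22 versus 14 cycles.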
Re: [Mesa-dev] [PATCH 1/6] glsl: Optimize pow(x, 2) into x * x.
On Tue, Mar 11, 2014 at 7:27 PM, Eric Anholt e...@anholt.net wrote:
> Erik Faye-Lund kusmab...@gmail.com writes:
>> On Tue, Mar 11, 2014 at 2:50 PM, Erik Faye-Lund kusmab...@gmail.com wrote:
>>> On Mon, Mar 10, 2014 at 11:54 PM, Matt Turner matts...@gmail.com wrote:
>>>> Cuts two instructions out of SynMark's Gl32VSInstancing benchmark.
>>>> ---
>>>>  src/glsl/opt_algebraic.cpp | 8 ++++++++
>>>>  1 file changed, 8 insertions(+)
>>>>
>>>> diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
>>>> index 5c49a78..8494bd9 100644
>>>> --- a/src/glsl/opt_algebraic.cpp
>>>> +++ b/src/glsl/opt_algebraic.cpp
>>>> @@ -528,6 +528,14 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
>>>>        if (is_vec_two(op_const[0]))
>>>>           return expr(ir_unop_exp2, ir->operands[1]);
>>>>
>>>> +      if (is_vec_two(op_const[1])) {
>>>> +         ir_variable *x = new(ir) ir_variable(ir->operands[1]->type, "x",
>>>> +                                              ir_var_temporary);
>>>> +         base_ir->insert_before(x);
>>>> +         base_ir->insert_before(assign(x, ir->operands[0]));
>>>> +         return mul(x, x);
>>>> +      }
>>>> +
>>>
>>> Is this safe? Since many GPUs implement pow(x, y) as
>>> exp2(log2(x) * y), this will give different results for if y comes
>>> from a uniform vs if it's a constant, no?
>
> Yes, but that wouldn't be covered by the invariant keyword.
>
>> To be a bit more clear: I don't think this is valid for expressions
>> writing to variables marked as invariant (or expressions taking part
>> in the calculations that lead up to invariant variable writes).
>>
>> I can't find anything allowing variance like this in the invariance
>> section of the GLSL 3.30 specifications. In particular, the list
>> following "To guarantee invariance of a particular output variable
>> across two programs, the following must also be true" doesn't seem
>> to require the values to be passed from the same source, only that
>> the same values are passed. And in this case, the value 2.0 is
>> usually exactly representable no matter what path it took there.
>>
>> Perhaps I'm being a bit too pedantic here, though.
>
> This file would do the same thing on the same expression tree in two
> different programs, so invariant is fine (we've probably got other
> problems with invariant, though). The keyword you're probably
> thinking of is precise, which isn't in the GLSL we implement yet.

Are you saying that this only rewrites "x = pow(y, 2.0)" and not
"const float e = 2.0; x = pow(y, e);"? If so, my point is moot,
indeed. But if that's *not* the case, then I think we're in trouble
still.
Re: [Mesa-dev] [PATCH 1/6] glsl: Optimize pow(x, 2) into x * x.
Erik Faye-Lund kusmab...@gmail.com writes:
> On Tue, Mar 11, 2014 at 7:27 PM, Eric Anholt e...@anholt.net wrote:
>> Erik Faye-Lund kusmab...@gmail.com writes:
>>> On Tue, Mar 11, 2014 at 2:50 PM, Erik Faye-Lund kusmab...@gmail.com wrote:
>>>> On Mon, Mar 10, 2014 at 11:54 PM, Matt Turner matts...@gmail.com wrote:
>>>>> Cuts two instructions out of SynMark's Gl32VSInstancing benchmark.
>>>>> ---
>>>>>  src/glsl/opt_algebraic.cpp | 8 ++++++++
>>>>>  1 file changed, 8 insertions(+)
>>>>>
>>>>> diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
>>>>> index 5c49a78..8494bd9 100644
>>>>> --- a/src/glsl/opt_algebraic.cpp
>>>>> +++ b/src/glsl/opt_algebraic.cpp
>>>>> @@ -528,6 +528,14 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
>>>>>        if (is_vec_two(op_const[0]))
>>>>>           return expr(ir_unop_exp2, ir->operands[1]);
>>>>>
>>>>> +      if (is_vec_two(op_const[1])) {
>>>>> +         ir_variable *x = new(ir) ir_variable(ir->operands[1]->type, "x",
>>>>> +                                              ir_var_temporary);
>>>>> +         base_ir->insert_before(x);
>>>>> +         base_ir->insert_before(assign(x, ir->operands[0]));
>>>>> +         return mul(x, x);
>>>>> +      }
>>>>> +
>>>>
>>>> Is this safe? Since many GPUs implement pow(x, y) as
>>>> exp2(log2(x) * y), this will give different results for if y comes
>>>> from a uniform vs if it's a constant, no?
>>
>> Yes, but that wouldn't be covered by the invariant keyword.
>>
>>> To be a bit more clear: I don't think this is valid for expressions
>>> writing to variables marked as invariant (or expressions taking part
>>> in the calculations that lead up to invariant variable writes).
>>>
>>> I can't find anything allowing variance like this in the invariance
>>> section of the GLSL 3.30 specifications. In particular, the list
>>> following "To guarantee invariance of a particular output variable
>>> across two programs, the following must also be true" doesn't seem
>>> to require the values to be passed from the same source, only that
>>> the same values are passed. And in this case, the value 2.0 is
>>> usually exactly representable no matter what path it took there.
>>>
>>> Perhaps I'm being a bit too pedantic here, though.
>>
>> This file would do the same thing on the same expression tree in two
>> different programs, so invariant is fine (we've probably got other
>> problems with invariant, though). The keyword you're probably
>> thinking of is precise, which isn't in the GLSL we implement yet.
>
> Are you saying that this only rewrites "x = pow(y, 2.0)" and not
> "const float e = 2.0; x = pow(y, e);"? If so, my point is moot,
> indeed. But if that's *not* the case, then I think we're in trouble
> still.

The second would also get rewritten, because other passes will move
the 2.0 into the pow.

I thought I understood your objection, but now I don't. I think
you'll have to lay out the pair of shaders involving the invariant
keyword that you think would be broken by this pass.
Re: [Mesa-dev] [PATCH 1/6] glsl: Optimize pow(x, 2) into x * x.
On Wed, Mar 12, 2014 at 12:00 AM, Eric Anholt e...@anholt.net wrote:
> Erik Faye-Lund kusmab...@gmail.com writes:
>> On Tue, Mar 11, 2014 at 7:27 PM, Eric Anholt e...@anholt.net wrote:
>>> Erik Faye-Lund kusmab...@gmail.com writes:
>>>> On Tue, Mar 11, 2014 at 2:50 PM, Erik Faye-Lund kusmab...@gmail.com wrote:
>>>>> On Mon, Mar 10, 2014 at 11:54 PM, Matt Turner matts...@gmail.com wrote:
>>>>>> Cuts two instructions out of SynMark's Gl32VSInstancing benchmark.
>>>>>> ---
>>>>>>  src/glsl/opt_algebraic.cpp | 8 ++++++++
>>>>>>  1 file changed, 8 insertions(+)
>>>>>>
>>>>>> diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
>>>>>> index 5c49a78..8494bd9 100644
>>>>>> --- a/src/glsl/opt_algebraic.cpp
>>>>>> +++ b/src/glsl/opt_algebraic.cpp
>>>>>> @@ -528,6 +528,14 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
>>>>>>        if (is_vec_two(op_const[0]))
>>>>>>           return expr(ir_unop_exp2, ir->operands[1]);
>>>>>>
>>>>>> +      if (is_vec_two(op_const[1])) {
>>>>>> +         ir_variable *x = new(ir) ir_variable(ir->operands[1]->type, "x",
>>>>>> +                                              ir_var_temporary);
>>>>>> +         base_ir->insert_before(x);
>>>>>> +         base_ir->insert_before(assign(x, ir->operands[0]));
>>>>>> +         return mul(x, x);
>>>>>> +      }
>>>>>> +
>>>>>
>>>>> Is this safe? Since many GPUs implement pow(x, y) as
>>>>> exp2(log2(x) * y), this will give different results for if y comes
>>>>> from a uniform vs if it's a constant, no?
>>>
>>> Yes, but that wouldn't be covered by the invariant keyword.
>>>
>>>> To be a bit more clear: I don't think this is valid for expressions
>>>> writing to variables marked as invariant (or expressions taking part
>>>> in the calculations that lead up to invariant variable writes).
>>>>
>>>> I can't find anything allowing variance like this in the invariance
>>>> section of the GLSL 3.30 specifications. In particular, the list
>>>> following "To guarantee invariance of a particular output variable
>>>> across two programs, the following must also be true" doesn't seem
>>>> to require the values to be passed from the same source, only that
>>>> the same values are passed. And in this case, the value 2.0 is
>>>> usually exactly representable no matter what path it took there.
>>>>
>>>> Perhaps I'm being a bit too pedantic here, though.
>>>
>>> This file would do the same thing on the same expression tree in two
>>> different programs, so invariant is fine (we've probably got other
>>> problems with invariant, though). The keyword you're probably
>>> thinking of is precise, which isn't in the GLSL we implement yet.
>>
>> Are you saying that this only rewrites "x = pow(y, 2.0)" and not
>> "const float e = 2.0; x = pow(y, e);"? If so, my point is moot,
>> indeed. But if that's *not* the case, then I think we're in trouble
>> still.
>
> The second would also get rewritten, because other passes will move
> the 2.0 into the pow.
>
> I thought I understood your objection, but now I don't. I think
> you'll have to lay out the pair of shaders involving the invariant
> keyword that you think would be broken by this pass.

My understanding is that

---8<---
invariant varying float v;
attribute float a;
const float e = 2.0;
void main()
{
    v = pow(a, e);
}
---8<---

and

---8<---
invariant varying float v;
attribute float a;
uniform float e;
void main()
{
    v = pow(a, e);
}
---8<---

...should produce the exact same result, as long as the latter is
passed 2.0 as the uniform e.

Because v is marked as invariant, the expressions writing to v are
the same, and the values passed in are the same.

If we rewrite the first one to do "a * a", we get a different result
on implementations that do "exp2(log2(a) * 2.0)" for the latter, due
to floating-point normalization in the intermediate steps.
Re: [Mesa-dev] [PATCH 1/6] glsl: Optimize pow(x, 2) into x * x.
Erik Faye-Lund kusmab...@gmail.com writes:
> My understanding is that
>
> ---8<---
> invariant varying float v;
> attribute float a;
> const float e = 2.0;
> void main() { v = pow(a, e); }
> ---8<---
>
> and
>
> ---8<---
> invariant varying float v;
> attribute float a;
> uniform float e;
> void main() { v = pow(a, e); }
> ---8<---
>
> ...should produce the exact same result, as long as the latter is passed 2.0 as the uniform e.
>
> Because v is marked as invariant, the expressions writing to v are the same, and the values passed in are the same.
>
> If we rewrite the first one to do "a * a", we get a different result on implementations that do "exp2(log2(a) * 2.0)" for the latter, due to floating-point normalization in the intermediate steps.

I don't think that's what the spec authors intended from the keyword. I think what they intended was that if you had "uniform float e" in both cases, but different code for setting *other* lvalues, you'd still get the same result in v.
[Mesa-dev] [PATCH 1/6] glsl: Optimize pow(x, 2) into x * x.
Cuts two instructions out of SynMark's Gl32VSInstancing benchmark.
---
 src/glsl/opt_algebraic.cpp | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
index 5c49a78..8494bd9 100644
--- a/src/glsl/opt_algebraic.cpp
+++ b/src/glsl/opt_algebraic.cpp
@@ -528,6 +528,14 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
       if (is_vec_two(op_const[0]))
          return expr(ir_unop_exp2, ir->operands[1]);
 
+      if (is_vec_two(op_const[1])) {
+         ir_variable *x = new(ir) ir_variable(ir->operands[1]->type, "x",
+                                              ir_var_temporary);
+         base_ir->insert_before(x);
+         base_ir->insert_before(assign(x, ir->operands[0]));
+         return mul(x, x);
+      }
+
       break;
 
    case ir_unop_rcp:
-- 
1.8.3.2
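To make the shape of the transformation concrete, here is a small self-contained C++ sketch of the same rewrite on a toy expression tree. The `Expr` type and the helpers `constant`, `input`, `mul`, `pow_expr`, `optimize`, and `eval` are hypothetical and are not Mesa's IR classes. In particular, the toy lets both factors point at one shared operand node, whereas the patch assigns the operand to a temporary variable and multiplies the temporary by itself.

```cpp
#include <cassert>
#include <cmath>
#include <memory>

// Hypothetical toy expression tree; these are not Mesa's ir_expression types.
struct Expr {
    enum Kind { Constant, Input, Mul, Pow } kind;
    float value;                     // only meaningful for Constant
    std::shared_ptr<Expr> lhs, rhs;  // only meaningful for Mul and Pow
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr constant(float v) {
    return std::make_shared<Expr>(Expr{Expr::Constant, v, nullptr, nullptr});
}
ExprPtr input() {
    return std::make_shared<Expr>(Expr{Expr::Input, 0.0f, nullptr, nullptr});
}
ExprPtr mul(ExprPtr a, ExprPtr b) {
    return std::make_shared<Expr>(Expr{Expr::Mul, 0.0f, a, b});
}
ExprPtr pow_expr(ExprPtr base, ExprPtr exponent) {
    return std::make_shared<Expr>(Expr{Expr::Pow, 0.0f, base, exponent});
}

// Rewrites pow(e, 2.0) into e * e and returns every other expression unchanged.
ExprPtr optimize(ExprPtr e) {
    assert(e);
    if (e->kind == Expr::Pow && e->rhs->kind == Expr::Constant &&
        e->rhs->value == 2.0f) {
        // Both factors refer to the same operand node, which plays the role
        // of the temporary "x" in the patch.
        return mul(e->lhs, e->lhs);
    }
    return e;
}

// Evaluates the tree for a given value of the single input.
float eval(const ExprPtr &e, float in) {
    switch (e->kind) {
    case Expr::Constant: return e->value;
    case Expr::Input:    return in;
    case Expr::Mul:      return eval(e->lhs, in) * eval(e->rhs, in);
    case Expr::Pow:      return std::pow(eval(e->lhs, in), eval(e->rhs, in));
    }
    return 0.0f;
}
```

Only a constant exponent of exactly 2.0 triggers the rewrite here; pow with any other exponent, or with 2.0 as the base, passes through untouched.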
Re: [Mesa-dev] [PATCH 1/6] glsl: Optimize pow(x, 2) into x * x.
I had a pretty similar patch on the top of my pow-optimization branch. I also expand x**3 and x**4. I had hoped that would enable some cases to expand then merge to MADs. It should also be faster on older GENs where POW perf sucks. I didn't send it out because I wanted to add a similar optimization in the back end that would turn x*x*x*x back into x**4 on GPUs where the POW would be faster.

I also didn't have anything in shader-db that benefitted from x**2 or x**3. It seems like there were a couple that would be modified by an x**5 flattening, but I think that would universally be slower.
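The expansions mentioned in this message can be written out with explicit multiply counts. This is a hedged sketch only: the branch is not shown in the thread, so the function names and the `MulCounter` helper below are assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical helper that counts multiplications so the cost of each
// expansion is visible.
struct MulCounter {
    int muls = 0;
    float mul(float a, float b) {
        ++muls;
        return a * b;
    }
};

// x**2: one multiply.
float square(MulCounter &c, float x) {
    return c.mul(x, x);
}

// x**3: two multiplies.
float cube(MulCounter &c, float x) {
    return c.mul(c.mul(x, x), x);
}

// x**4: two multiplies, because the square is computed once and reused.
float fourth_power(MulCounter &c, float x) {
    float t = c.mul(x, x);
    return c.mul(t, t);
}

// x**5: three multiplies.
float fifth_power(MulCounter &c, float x) {
    float t = c.mul(x, x);
    return c.mul(c.mul(t, t), x);
}

// Small sanity check of one value and its multiply count.
void self_check() {
    MulCounter c;
    assert(fourth_power(c, 2.0f) == 16.0f);
    assert(c.muls == 2);
}
```

x**4 costs the same two multiplies as x**3, while x**5 needs three. Whether any of these beats a native POW depends on the hardware, which is the trade-off the message describes.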
Re: [Mesa-dev] [PATCH 1/6] glsl: Optimize pow(x, 2) into x * x.
On 11.03.2014 01:23, Ian Romanick wrote:
> I had a pretty similar patch on the top of my pow-optimization branch. I also expand x**3 and x**4. I had hoped that would enable some cases to expand then merge to MADs. It should also be faster on older GENs where POW perf sucks. I didn't send it out because I wanted to add a similar optimization in the back end that would turn x*x*x*x back into x**4 on GPUs where the POW would be faster.

I have no idea what performance POW has on newer Intel GPU hardware (in contrast to older pre-SNB hardware with a separate mathbox, the manual doesn't list throughput for extended math functions, or at least I never found it), but I find it highly unlikely that a POW costs less than 2 muls anywhere.

Roland

> I also didn't have anything in shader-db that benefitted from x**2 or x**3. It seems like there were a couple that would be modified by an x**5 flattening, but I think that would universally be slower.