Re: unified shader for layer rendering

2013-10-16 Thread Benoit Jacob
2013/10/10 Benoit Jacob jacob.benoi...@gmail.com

 this is the kind of work that would require very careful performance
 measurements


Here is a benchmark:
http://people.mozilla.org/~bjacob/webglbranchingbenchmark/webglbranchingbenchmark.html

Some results:
http://people.mozilla.org/~bjacob/webglbranchingbenchmark/webglbranchingbenchmarkresults.txt

Benoit
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: unified shader for layer rendering

2013-10-11 Thread Nicholas Cameron
On Friday, October 11, 2013 5:50:05 AM UTC+13, Benoit Girard wrote:
 On Thu, Oct 10, 2013 at 7:59 AM, Andreas Gal andreas@gmail.com wrote:

  Rationale:
  switching shaders tends to be expensive.

 In my opinion this is the only argument for working on this at the moment.

I think almost the opposite :-) I am not sure if it is true that switching 
shaders is expensive - my understanding is that it is common in games to have 
many more shaders than we have and to switch them more frequently - and that is 
what GPUs are optimised for. Perhaps Dan G can confirm or refute that.

The advantage to me is that we have a single shader and avoid the combinatorial 
explosion when we add more shaders for things like SVG filters/CSS compositing. 
That may well be worth a performance hit. There may also be other options - 
there is prior art here; the search term is 'shader permutations' (I looked 
into this a little 18 months ago, but do not remember if I found anything 
useful). We should explore all options before jumping on this, I think.

In terms of performance - there is a trade-off here between branching and 
switching shaders, both of which are 'known' to be slow and also known to have 
been optimised in some GPUs/drivers. So we need to do some serious 
investigation to find out where the better perf is. In particular, I think this 
may be a case where low-end mobile GPUs and very old GPUs (the two areas where 
we really care about perf) may have very different characteristics. So, the 
right answer for b2g may not be the right answer for Firefox on Windows XP.

I have not recently been discussing new shaders, perhaps you are thinking of 
mstange who is looking at HW implementations of SVG filters?

 Particularly at the moment where we're overwhelmed with high-priority
 desktop and mobile graphics work, I'd like to see numbers before we
 consider a change. I have seen no indications that we get hurt by switching
 shaders. I suspected it might matter when we start to have 100s of layers
 in a single page, but we always fall down for another reason before this
 can become a problem. I'd like to be able to answer 'In which use cases
 would patching this lead to a user-measurable improvement?' before working
 on this. Right now we have a long list of bugs where we have a clear answer
 to that question. Patching this would let us check off 'use the GPU
 optimally' from the GPU best-practice dev guides and will later help us
 batch draw calls more aggressively, but I'd like to have data to support
 this first.

 Also, old Android drivers are a bit touchy with shaders, so I recommend
 budgeting some dev time for resolving those issues.

 I know that roc and nrc have some plans for introducing more shaders which
 will make a unified shader approach more difficult. I'll let them weigh in
 here.

 On the flip side, I suspect a single unified shader will be faster to
 compile than the several shaders we have on the start-up path.



Re: unified shader for layer rendering

2013-10-11 Thread Benoit Jacob
2013/10/11 Nicholas Cameron nick.r.came...@gmail.com

 The advantage to me is that we have a single shader and avoid the
 combinatorial explosion when we add more shaders for things like SVG
 filters/CSS compositing.



[...snip...]

 I have not recently been discussing new shaders, perhaps you are thinking
 of mstange who is looking at HW implementations of SVG filters?


Incidentally, I just looked into the feasibility of implementing
constant-time-regardless-of-operands (necessary for filter security)
filters in OpenGL shaders, as a similar topic is being discussed at the
moment on the WebGL mailing list, and there is a serious problem:

Newer GPUs (since roughly 2008 for high-end desktop GPUs, since 2013 for
high-end mobile GPUs) have IEEE754-conformant floating point with
denormals, and denormals may be slow there too.

https://developer.nvidia.com/content/cuda-pro-tip-flush-denormals-confidence
http://malideveloper.arm.com/engage-with-mali/benchmarking-floating-point-precision-part-iii/

I suggested on the Khronos public_webgl list that one way this could be
solved in the future would be to write an OpenGL extension spec to force
flush-to-zero behavior, avoiding denormals. As far as I know, flush-to-zero
is currently a CUDA compiler flag but isn't exposed to OpenGL.

The NVIDIA whitepaper above also hints at this only being a problem with
multi-instruction functions such as square-root and inverse-square-root
(which is already a problem for e.g. lighting filters, which need to
normalize a vector), but that would at best be very NVIDIA-specific; in
general, denormals are a minority case that requires special handling, so
their slowness is rather universal; all x86 and ARM CPUs that I tested have
slow denormals.
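To make the flush-to-zero idea concrete, here is a sketch in Python (CPU
doubles rather than GPU floats; `is_subnormal` and `ftz` are illustrative
helpers, not an existing API):

```python
import sys

# Smallest positive *normal* double; nonzero values below it are subnormal.
MIN_NORMAL = sys.float_info.min

def is_subnormal(x):
    return x != 0.0 and abs(x) < MIN_NORMAL

def ftz(x):
    # Flush-to-zero: replace subnormal results with 0.0. This is what a
    # hypothetical GL extension (like the existing CUDA compiler flag)
    # would ask the hardware to do, skipping the slow denormal paths.
    return 0.0 if is_subnormal(x) else x

tiny = MIN_NORMAL / 4.0       # a subnormal value
print(is_subnormal(tiny))     # True
print(ftz(tiny))              # 0.0
print(ftz(1.0))               # 1.0
```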

Benoit


unified shader for layer rendering

2013-10-10 Thread Andreas Gal
Hi,

we currently have a zoo of shaders to render layers:

  RGBALayerProgramType,
  BGRALayerProgramType,
  RGBXLayerProgramType,
  BGRXLayerProgramType,
  RGBARectLayerProgramType,
  RGBXRectLayerProgramType,
  BGRARectLayerProgramType,
  RGBAExternalLayerProgramType,
  ColorLayerProgramType,
  YCbCrLayerProgramType,
  ComponentAlphaPass1ProgramType,
  ComponentAlphaPass1RGBProgramType,
  ComponentAlphaPass2ProgramType,
  ComponentAlphaPass2RGBProgramType,

(I have just eliminated the Copy2D variants, so omitted here.)

Next, I would like to replace everything but the YCbCr and ComponentAlpha 
shaders with one unified shader (attached below).

Rationale:

Most of our shader programs only differ minimally in cycle count, and we are 
generally memory bound, not GPU cycle bound (even on mobile). In addition, GPUs 
are actually very efficient at branching, as long as the branch is uniform and 
doesn't change direction per pixel or vertex (the driver essentially compiles 
variants and runs those). Last but not least, switching shaders tends to be 
expensive.

Proposed approach:

We use a single shader to replace the current 8 layer shaders. I verified with 
the Mali shader compiler that the shortest path (color layer) is pretty close 
to the old color shader (now 3 cycles, due to the opacity multiplication; was 
1). For a lot of scenes we will be able to render without ever switching 
shaders, so that should more than make up for the extra cycles, especially 
since we are memory bound anyway.

More uniforms have to be set per shader invocation, but that should be pretty 
cheap.

I completely dropped the distinction between 2D and 3D masks. The 3D path 
should be able to handle the 2D case, and the cycle savings are minimal and, 
as mentioned before, irrelevant.

An important advantage is that with this approach we can now easily add 
additional layer effects to the pipeline without exponentially exploding the 
number of programs 
(RGBXRectLayerProgramWithGrayscaleAndWithoutOpacityButMaskAndNotMask3D…).

Also, last but not least, this reduces code complexity quite a bit.

Feedback welcome.

Thanks,

Andreas

---

// Base color (will be rendered if layer texture is not read).
uniform vec4 uColor;

// Layer texture (disabled for color layers).
uniform bool uTextureEnabled;
varying vec2 vTexCoord;
uniform vec2 uTexCoordMultiplier;
uniform bool uTextureBGRA; // Default is RGBA.
uniform bool uTextureNoAlpha;
uniform float uTextureOpacity;
uniform sampler2D uTexture;
uniform bool uTextureUseExternalOES;
uniform samplerExternalOES uTextureExternalOES;
#ifndef GL_ES
uniform bool uTextureUseRect;
uniform sampler2DRect uTextureRect;
#endif

// Masking (optional)
uniform bool uMaskEnabled;
varying vec3 vMaskCoord;
uniform sampler2D uMaskTexture;

void main()
{
  vec4 color = uColor;
  if (uTextureEnabled) {
    vec2 texCoord = vTexCoord * uTexCoordMultiplier;
    if (uTextureUseExternalOES) {
      color = texture2D(uTextureExternalOES, texCoord);
#ifndef GL_ES
    } else if (uTextureUseRect) {
      color = texture2DRect(uTextureRect, texCoord);
#endif
    } else {
      color = texture2D(uTexture, texCoord);
    }
    if (uTextureBGRA) {
      color = color.bgra;
    }
    if (uTextureNoAlpha) {
      color = vec4(color.rgb, 1.0);
    }
    color *= uTextureOpacity;
  }
  if (uMaskEnabled) {
    color *= texture2D(uMaskTexture, vMaskCoord.xy / vMaskCoord.z).r;
  }
  gl_FragColor = color;
}
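To illustrate how the old program types collapse into uniform settings on
this one shader, here is a sketch (Python; the mapping is inferred from the
shader source above and is illustrative, not actual Gecko code):

```python
# Uniform values that make the unified shader mimic each old program type.
# Inferred from the shader source above; illustrative only.
UNIFIED_CONFIG = {
    "ColorLayerProgramType": dict(uTextureEnabled=False,
                                  uTextureBGRA=False, uTextureNoAlpha=False),
    "RGBALayerProgramType":  dict(uTextureEnabled=True,
                                  uTextureBGRA=False, uTextureNoAlpha=False),
    "BGRALayerProgramType":  dict(uTextureEnabled=True,
                                  uTextureBGRA=True,  uTextureNoAlpha=False),
    "RGBXLayerProgramType":  dict(uTextureEnabled=True,
                                  uTextureBGRA=False, uTextureNoAlpha=True),
    "BGRXLayerProgramType":  dict(uTextureEnabled=True,
                                  uTextureBGRA=True,  uTextureNoAlpha=True),
}

def uniforms_for(program_type):
    # One glUseProgram, ever; per-layer we would only update these booleans.
    return UNIFIED_CONFIG[program_type]

print(uniforms_for("BGRXLayerProgramType"))
```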



Re: unified shader for layer rendering

2013-10-10 Thread Benoit Girard
On Thu, Oct 10, 2013 at 7:59 AM, Andreas Gal andreas@gmail.com wrote:

 Rationale:
 switching shaders tends to be expensive.


In my opinion this is the only argument for working on this at the moment.
Particularly at the moment where we're overwhelmed with high-priority
desktop and mobile graphics work, I'd like to see numbers before we
consider a change. I have seen no indications that we get hurt by switching
shaders. I suspected it might matter when we start to have 100s of layers
in a single page, but we always fall down for another reason before this
can become a problem. I'd like to be able to answer 'In which use cases
would patching this lead to a user-measurable improvement?' before working
on this. Right now we have a long list of bugs where we have a clear answer
to that question. Patching this would let us check off 'use the GPU
optimally' from the GPU best-practice dev guides and will later help us
batch draw calls more aggressively, but I'd like to have data to support
this first.

Also, old Android drivers are a bit touchy with shaders, so I recommend
budgeting some dev time for resolving those issues.

I know that roc and nrc have some plans for introducing more shaders which
will make a unified shader approach more difficult. I'll let them weigh in
here.

On the flip side, I suspect a single unified shader will be faster to
compile than the several shaders we have on the start-up path.


Re: unified shader for layer rendering

2013-10-10 Thread Benoit Jacob
I'll pile on what Benoit G said --- this is the kind of work that would
require very careful performance measurements before we commit to it.

Also, like Benoit said, we have seen no indication that glUseProgram is
hurting us. General GPU wisdom is that switching programs is not per se
expensive as long as one is not relinking them. And the general performance
caveat with any state change - that it forces drawing to be split into
multiple draw calls - also applies to updating uniforms, so we're not
escaping it here.

In addition to that, not all GPUs have real branching. My Sandy Bridge
Intel chipset has real branching, but older Intel integrated GPUs don't,
and I'd be very surprised if all of the mobile GPUs we're currently
supporting did. To put this in perspective, in the world of discrete
desktop NVIDIA GPUs, this was only introduced in the Geforce 6 series. Old,
but a lot more advanced than some integrated/mobile devices we still
support. On GPUs that are not capable of actual branching, if...else blocks
are implemented by executing all branches and masking the result. On such
GPUs, a unified shader would run considerably slower, basically N times
slower for N branches. Even on GPUs with branching, each branch has a
cost and we have N of them, so in all cases the unified shader approach
introduces new (at least potential) scalability issues.
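The masking behavior on non-branching GPUs can be sketched like this
(Python stand-in for per-fragment execution; purely illustrative):

```python
def select(cond, a, b):
    # GLSL-style select: per-fragment, pick a where cond holds, else b.
    return [x if c else y for c, x, y in zip(cond, a, b)]

def branchless_if(cond, then_vals, else_vals):
    # Without real branching, the GPU computes BOTH sides for every
    # fragment and masks the result, so the cost is the sum of both
    # branches, not whichever branch was taken.
    work = len(then_vals) + len(else_vals)
    return select(cond, then_vals, else_vals), work

result, work = branchless_if([True, False, True], [1, 1, 1], [2, 2, 2])
print(result)  # [1, 2, 1]
print(work)    # 6: every fragment paid for both branches
```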

So if we wanted to invest in this, we would need to conduct careful
benchmarking on a wide range of hardware.

Benoit


2013/10/10 Benoit Girard bgir...@mozilla.com

 [...snip...]



Re: unified shader for layer rendering

2013-10-10 Thread Benoit Jacob
2013/10/10 Benoit Jacob jacob.benoi...@gmail.com

 I'll pile on what Benoit G said --- this is the kind of work that would
 require very careful performance measurements before we commit to it.

 Also, like Benoit said, we have seen no indication that glUseProgram is
 hurting us. General GPU wisdom is that switching programs is not per se
 expensive as long as one is not relinking them, and besides the general
 performance caveat with any state change, forcing to split drawing into
 multiple draw-calls, which also applies to updating uniforms, so we're not
 escaping it here.

 In addition to that, not all GPUs have real branching. My Sandy Bridge
 Intel chipset has real branching, but older Intel integrated GPUs don't,
 and I'd be very surprised if all of the mobile GPUs we're currently
 supporting did. To put this in perspective, in the world of discrete
 desktop NVIDIA GPUs, this was only introduced in the Geforce 6 series.


In fact, even on a Geforce 6, we only get full real CPU-like (MIMD)
branching in vertex shaders, not in fragment shaders.

http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter34.html

Benoit


Re: unified shader for layer rendering

2013-10-10 Thread Nicolas Silva
I do appreciate the fact that it reduces complexity (in addition to less
state changes).

I agree that the decision of dedicating resources on that rather than on
other high priority projects that are in the pipes should be motivated by
some numbers.

Cheers,

Nical




On Thu, Oct 10, 2013 at 11:04 AM, Benoit Jacob jacob.benoi...@gmail.com wrote:

 [...snip...]



Re: unified shader for layer rendering

2013-10-10 Thread Milan Sreckovic
I didn't see anything in this message that suggested we should drop everything 
we're doing and start on this right now, but most of the early comments I'm 
seeing are about exactly that.  Let's make that a separate discussion.

If we didn't have all these variations, what would we do?  Would we just do 
one, then add more (and add code complexity) only if we find significant 
performance improvements?  I'd have to go with probably yes on that one :)

Let's see if we can focus the discussion on what should we have, and then on 
how do we get there, and let's not worry about when and how for now.  Gets in 
the way.

--
- Milan

On 2013-10-10, at 7:59 , Andreas Gal andreas@gmail.com wrote:

 [...snip...]



Re: unified shader for layer rendering

2013-10-10 Thread Milan Sreckovic
Vlad put in a 'let's see if we can cache compiled shaders' bug a few weeks 
ago, and perhaps that is something we should consider when discussing shaders 
in general.  I didn't know about recompiling when some uniforms change, 
though - that's good intel.
--
- Milan

On 2013-10-10, at 15:13 , Jeff Gilbert jgilb...@mozilla.com wrote:

 I'll also add a note that just because we aren't recompiling doesn't mean the 
 driver isn't.
 If we change enough (or maybe just the correct) uniforms, this can cause the 
 driver to recompile the shader, which is indeed slow. Trying to unify too 
 many shader types might just tickle this.
 
 Some drivers will shoot us a warning via KHR_debug that we can catch, when 
 shader-recompilation happens.
 
 -Jeff
 
 - Original Message -
 From: Nicolas Silva nical.si...@gmail.com
 To: Benoit Jacob jacob.benoi...@gmail.com
 Cc: Benoit Girard bgir...@mozilla.com, dev-platform@lists.mozilla.org, 
 Andreas Gal andreas@gmail.com
 Sent: Thursday, October 10, 2013 11:23:45 AM
 Subject: Re: unified shader for layer rendering
 
 [...snip...]
