Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs

2013-08-28 Thread Marek Olšák
Yeah, st/mesa also compiles shaders on the first use, so we've got 3
places to fix: Wine, st/mesa, the driver.

Marek

On Wed, Aug 28, 2013 at 2:07 AM, Vadim Girlin vadimgir...@gmail.com wrote:
 On 08/28/2013 02:59 AM, Marek Olšák wrote:

 First, you won't really see any significant continual difference in
 frame rate no matter how many shader variants you have unless you are
 very CPU-bound. The problem is shader compilation on the first use,
 that's where you get a big hiccup. Try Skyrim for example: You have to
 first look around and see every object that's around you and get
 unpleasant stuttering before you can actually go on and play the game.
 Yes, this also Wine's fault that it compiles shaders on the first use
 too, but we don't have to be as bad as Wine, do we? Valve also
 reported shader recompilations on the first use being a serious issue
 with open source drivers.


 I perfectly understand that deferred compilation is exactly the problem that
 makes the games freeze due to shader compilation on first use when something
 new appears on the screen, but I don't think we can solve this problem in
 the *driver* by trying to compile early, because AFAICS currently the
 shaders are passed to the driver too late anyway, and this happens not only
 with wine. E.g. when I run Heaven in a window with MESA_GLSL=dump
 R600_DEBUG=ps,vs, so that I can see Heaven's window and console output at
 the same time, what I see is that most of GL dumps happen while Heaven shows
 splash screen with loading progress, but most of the driver's dumps appear
 on the first frame and few more times during benchmark. It looks like
 compilation is deferred somewhere in the stack before the driver, or am I
 missing something?

 Vadim




 Marek

 On Tue, Aug 27, 2013 at 11:52 PM, Vadim Girlin vadimgir...@gmail.com
 wrote:

 On 08/28/2013 12:43 AM, Marek Olšák wrote:


 Shader variants are BAD, BAD, BAD. Have you ever played an AAA game
 with a Mesa driver that likes to compile shader variants on first use?
 It's HORRIBLE.



 I don't think that shader variants are bad, but it's definitely bad when
 we
 are compiling variants that are never used. Currently glxgears compiles
 18
 ps/vs shaders. In my branch with initial GS support [1] I switched
 handling
 of the shaders to deferred compilation, that is, shaders are compiled
 only
 before the actual draw. I found later that it's not really required for
 GS,
 but IIRC this change results in only 5 shaders being compiled for
 glxgears
 instead of 18. It seems most of the useless variants are results of state
 changes between creation of the shader state (initial compilation) and
 actual draw call.

 I had some concerns about increased overhead with those changes, and it's
 actually noticeable with drawoverhead demo, but I didn't see any
 regressions
 with a few real apps that I tested, e.g. glxgears even showed slightly
 better performance with these changes. Probably I also implemented it in
 a
 not very optimal way (I was mostly concentrated on GS support) and the
 overhead can be reduced.

 One more thing is duplicate shaders, I've analyzed shader dumps from
 Unigine
 Heaven 3.0 some time ago and found that from about 320 compiled shaders,
 only about 180 (50%) were unique, others were duplicates (detected by
 comparing the bytecode dumps for them in an automated way), maybe they
 had
 different shader keys (which still resulted in the same bytecode), but I
 suspect duplicate pipe shaders were also involved. Unfortunately I didn't
 have a time to investigate it more thoroughly since then.

 So my point is that we don't really need to eliminate shader variants,
 first
 we need to eliminate compilation of unused variants and duplicate
 shaders.
 Also we might want to consider offloading of the compilation to separate
 thread(s) and caching of shader binaries between runs.

 Vadim

   [1] http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-geom-shaders



 What the patch does is probably the right solution. At least
 alpha-test state changes don't cause shader recompilation and
 re-binding, which also negatively affects performance. Ideally we
 shouldn't depend on the framebuffer state at all, but we need to
 emulate the TGSI property FS_COLOR0_WRITES_ALL_CBUFS. I think we
 should always be fine with key.nr_cbufs forced to 8 for any shader
 without that property. I expect app developers to do the right thing
 and not write outputs they don't need.

 Marek

 On Tue, Aug 27, 2013 at 9:00 PM, Roland Scheidegger srol...@vmware.com
 wrote:


 Not that I'm qualified to review r600 code, but couldn't you create
 different shader variants depending on whether you need alpha test? At
 least I would assume shader exports aren't free.

 Roland

 Am 27.08.2013 19:56, schrieb Vadim Girlin:


 We need to export at least one color if the shader writes it,
 even when nr_cbufs==0.

 Signed-off-by: Vadim Girlin vadimgir...@gmail.com
 ---

 Tested on evergreen with multiple combinations of backends 

Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs

2013-08-28 Thread Christian König
Well, for this discussion let's just assume that we fixed the delay in 
the upper layers of the stack and the driver sees the shader code as 
soon as the application (if I understood it correctly Vadim has just 
volunteered for the job).


Also let's assume that shaders are small and having allot of shader 
variants around after they are compiled isn't bad.


In this case the probably best solution is to compile early and try to 
make the shaders as state invariant as possible, e.g. don't do 
optimizations like getting ride of extra exports for case where we don't 
need the alpha test or if it's just a dependency on a boolean then have 
both variants covered by the bytecode and use a bit constant to choose 
between the two etc...


As a second step the driver should create a optimized version of the 
shader in a background thread when we know all the state that is/was 
active when the shader is used.


Of course you need a bit of heuristic for this, cause sometimes it is 
better to switch between shader variants and other times it is better to 
have one variant covering all the different states and just use bit 
constants to choose between them.


Just some thoughts on this topic,
Christian.

PS: My mail server is once more driving me nuts, please ignore the extra 
copy if you get this mail twice.


Am 28.08.2013 02:07, schrieb Vadim Girlin:

On 08/28/2013 02:59 AM, Marek Olšák wrote:

First, you won't really see any significant continual difference in
frame rate no matter how many shader variants you have unless you are
very CPU-bound. The problem is shader compilation on the first use,
that's where you get a big hiccup. Try Skyrim for example: You have to
first look around and see every object that's around you and get
unpleasant stuttering before you can actually go on and play the game.
Yes, this also Wine's fault that it compiles shaders on the first use
too, but we don't have to be as bad as Wine, do we? Valve also
reported shader recompilations on the first use being a serious issue
with open source drivers.


I perfectly understand that deferred compilation is exactly the 
problem that makes the games freeze due to shader compilation on first 
use when something new appears on the screen, but I don't think we can 
solve this problem in the *driver* by trying to compile early, because 
AFAICS currently the shaders are passed to the driver too late anyway, 
and this happens not only with wine. E.g. when I run Heaven in a 
window with MESA_GLSL=dump R600_DEBUG=ps,vs, so that I can see 
Heaven's window and console output at the same time, what I see is 
that most of GL dumps happen while Heaven shows splash screen with 
loading progress, but most of the driver's dumps appear on the first 
frame and few more times during benchmark. It looks like compilation 
is deferred somewhere in the stack before the driver, or am I missing 
something?


Vadim




Marek

On Tue, Aug 27, 2013 at 11:52 PM, Vadim Girlin 
vadimgir...@gmail.com wrote:

On 08/28/2013 12:43 AM, Marek Olšák wrote:


Shader variants are BAD, BAD, BAD. Have you ever played an AAA game
with a Mesa driver that likes to compile shader variants on first use?
It's HORRIBLE.



I don't think that shader variants are bad, but it's definitely bad 
when we
are compiling variants that are never used. Currently glxgears 
compiles 18
ps/vs shaders. In my branch with initial GS support [1] I switched 
handling
of the shaders to deferred compilation, that is, shaders are 
compiled only
before the actual draw. I found later that it's not really required 
for GS,
but IIRC this change results in only 5 shaders being compiled for 
glxgears
instead of 18. It seems most of the useless variants are results of 
state

changes between creation of the shader state (initial compilation) and
actual draw call.

I had some concerns about increased overhead with those changes, and 
it's
actually noticeable with drawoverhead demo, but I didn't see any 
regressions

with a few real apps that I tested, e.g. glxgears even showed slightly
better performance with these changes. Probably I also implemented 
it in a

not very optimal way (I was mostly concentrated on GS support) and the
overhead can be reduced.

One more thing is duplicate shaders, I've analyzed shader dumps from 
Unigine
Heaven 3.0 some time ago and found that from about 320 compiled 
shaders,

only about 180 (50%) were unique, others were duplicates (detected by
comparing the bytecode dumps for them in an automated way), maybe 
they had
different shader keys (which still resulted in the same bytecode), 
but I
suspect duplicate pipe shaders were also involved. Unfortunately I 
didn't

have a time to investigate it more thoroughly since then.

So my point is that we don't really need to eliminate shader 
variants, first
we need to eliminate compilation of unused variants and duplicate 
shaders.
Also we might want to consider offloading of the compilation to 
separate

thread(s) and caching of shader binaries 

Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs

2013-08-28 Thread Henri Verbeet
On 28 August 2013 12:17, Marek Olšák mar...@gmail.com wrote:
 Yeah, st/mesa also compiles shaders on the first use, so we've got 3
 places to fix: Wine, st/mesa, the driver.

For what it's worth, while Wine definitely has some room for
improvement in this regard, in some cases we don't get the shaders any
earlier from the application either.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs

2013-08-28 Thread Vadim Girlin

On 08/28/2013 01:15 PM, Christian König wrote:

Well, for this discussion let's just assume that we fixed the delay in
the upper layers of the stack and the driver sees the shader code as
soon as the application (if I understood it correctly Vadim has just
volunteered for the job).


No, I'm not really volunteering to implement that. :)
I'm not even sure if it's possible in reasonable time. In fact it was 
more like a theoretical discussion about what would be required for the 
early compilation in the driver to make sense.


Perhaps I failed to explain it, but actually my point is that while the 
compilation is deferred in upper layers and nobody is going to change 
this (if it's possible at all), it doesn't make sense to try compiling 
early in the driver. I think we might prefer to defer the compilation in 
the driver as well - it doesn't make overall situation any worse, but 
can make it better by not compiling unused variants at least.


Vadim


Also let's assume that shaders are small and having allot of shader
variants around after they are compiled isn't bad.

In this case the probably best solution is to compile early and try to
make the shaders as state invariant as possible, e.g. don't do
optimizations like getting ride of extra exports for case where we don't
need the alpha test or if it's just a dependency on a boolean then have
both variants covered by the bytecode and use a bit constant to choose
between the two etc...

As a second step the driver should create a optimized version of the
shader in a background thread when we know all the state that is/was
active when the shader is used.

Of course you need a bit of heuristic for this, cause sometimes it is
better to switch between shader variants and other times it is better to
have one variant covering all the different states and just use bit
constants to choose between them.

Just some thoughts on this topic,
Christian.

PS: My mail server is once more driving me nuts, please ignore the extra
copy if you get this mail twice.

Am 28.08.2013 02:07, schrieb Vadim Girlin:

On 08/28/2013 02:59 AM, Marek Olšák wrote:

First, you won't really see any significant continual difference in
frame rate no matter how many shader variants you have unless you are
very CPU-bound. The problem is shader compilation on the first use,
that's where you get a big hiccup. Try Skyrim for example: You have to
first look around and see every object that's around you and get
unpleasant stuttering before you can actually go on and play the game.
Yes, this also Wine's fault that it compiles shaders on the first use
too, but we don't have to be as bad as Wine, do we? Valve also
reported shader recompilations on the first use being a serious issue
with open source drivers.


I perfectly understand that deferred compilation is exactly the
problem that makes the games freeze due to shader compilation on first
use when something new appears on the screen, but I don't think we can
solve this problem in the *driver* by trying to compile early, because
AFAICS currently the shaders are passed to the driver too late anyway,
and this happens not only with wine. E.g. when I run Heaven in a
window with MESA_GLSL=dump R600_DEBUG=ps,vs, so that I can see
Heaven's window and console output at the same time, what I see is
that most of GL dumps happen while Heaven shows splash screen with
loading progress, but most of the driver's dumps appear on the first
frame and few more times during benchmark. It looks like compilation
is deferred somewhere in the stack before the driver, or am I missing
something?

Vadim




Marek

On Tue, Aug 27, 2013 at 11:52 PM, Vadim Girlin
vadimgir...@gmail.com wrote:

On 08/28/2013 12:43 AM, Marek Olšák wrote:


Shader variants are BAD, BAD, BAD. Have you ever played an AAA game
with a Mesa driver that likes to compile shader variants on first use?
It's HORRIBLE.



I don't think that shader variants are bad, but it's definitely bad
when we
are compiling variants that are never used. Currently glxgears
compiles 18
ps/vs shaders. In my branch with initial GS support [1] I switched
handling
of the shaders to deferred compilation, that is, shaders are
compiled only
before the actual draw. I found later that it's not really required
for GS,
but IIRC this change results in only 5 shaders being compiled for
glxgears
instead of 18. It seems most of the useless variants are results of
state
changes between creation of the shader state (initial compilation) and
actual draw call.

I had some concerns about increased overhead with those changes, and
it's
actually noticeable with drawoverhead demo, but I didn't see any
regressions
with a few real apps that I tested, e.g. glxgears even showed slightly
better performance with these changes. Probably I also implemented
it in a
not very optimal way (I was mostly concentrated on GS support) and the
overhead can be reduced.

One more thing is duplicate shaders, I've analyzed shader dumps from
Unigine
Heaven 3.0 some time 

Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs

2013-08-28 Thread Marek Olšák
On Wed, Aug 28, 2013 at 7:56 PM, Vadim Girlin vadimgir...@gmail.com wrote:
 On 08/28/2013 01:15 PM, Christian König wrote:

 Well, for this discussion let's just assume that we fixed the delay in
 the upper layers of the stack and the driver sees the shader code as
 soon as the application (if I understood it correctly Vadim has just
 volunteered for the job).


 No, I'm not really volunteering to implement that. :)
 I'm not even sure if it's possible in reasonable time. In fact it was more
 like a theoretical discussion about what would be required for the early
 compilation in the driver to make sense.

 Perhaps I failed to explain it, but actually my point is that while the
 compilation is deferred in upper layers and nobody is going to change this
 (if it's possible at all), it doesn't make sense to try compiling early in
 the driver. I think we might prefer to defer the compilation in the driver
 as well - it doesn't make overall situation any worse, but can make it
 better by not compiling unused variants at least.

Sounds good to me.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs

2013-08-27 Thread Roland Scheidegger
Not that I'm qualified to review r600 code, but couldn't you create
different shader variants depending on whether you need alpha test? At
least I would assume shader exports aren't free.

Roland

Am 27.08.2013 19:56, schrieb Vadim Girlin:
 We need to export at least one color if the shader writes it,
 even when nr_cbufs==0.
 
 Signed-off-by: Vadim Girlin vadimgir...@gmail.com
 ---
 
 Tested on evergreen with multiple combinations of backends - no regressions,
 fixes some tests:
 
   default- fixes fb-alphatest-nocolor and fb_alphatest-nocolor-ff
   default+sb - fixes fb-alphatest-nocolor and fb_alphatest-nocolor-ff
   llvm   - fixes about 25 tests related to depth/stencil
   llvm+sb- fixes about 300 tests (llvm's depth/stencil issues and
regressions cased by reordering of exports in sb)
 
 With this patch, there are no regressions with default+sb vs default.
 There is one regression with llvm+sb vs llvm - fs-texturegrad-miplevels,
 AFAICS it's a problem with llvm backend uncovered by sb - SET_GRADIENTS_V/H 
 instructions are not placed in the same TEX clause with corresponding 
 SAMPLE_G.
 
  src/gallium/drivers/r600/r600_shader.c | 7 ---
  1 file changed, 4 insertions(+), 3 deletions(-)
 
 diff --git a/src/gallium/drivers/r600/r600_shader.c 
 b/src/gallium/drivers/r600/r600_shader.c
 index 300b5c4..f7eab76 100644
 --- a/src/gallium/drivers/r600/r600_shader.c
 +++ b/src/gallium/drivers/r600/r600_shader.c
 @@ -918,6 +918,7 @@ static int r600_shader_from_tgsi(struct r600_screen 
 *rscreen,
   unsigned opcode;
   int i, j, k, r = 0;
   int next_pos_base = 60, next_param_base = 0;
 + int max_color_exports = MAX2(key.nr_cbufs, 1);
   /* Declarations used by llvm code */
   bool use_llvm = false;
   bool indirect_gprs;
 @@ -1130,7 +1131,7 @@ static int r600_shader_from_tgsi(struct r600_screen 
 *rscreen,
   radeon_llvm_ctx.face_gpr = ctx.face_gpr;
   radeon_llvm_ctx.r600_inputs = ctx.shader-input;
   radeon_llvm_ctx.r600_outputs = ctx.shader-output;
 - radeon_llvm_ctx.color_buffer_count = MAX2(key.nr_cbufs , 1);
 + radeon_llvm_ctx.color_buffer_count = max_color_exports;
   radeon_llvm_ctx.chip_class = ctx.bc-chip_class;
   radeon_llvm_ctx.fs_color_all = shader-fs_write_all  
 (rscreen-chip_class = EVERGREEN);
   radeon_llvm_ctx.stream_outputs = so;
 @@ -1440,7 +1441,7 @@ static int r600_shader_from_tgsi(struct r600_screen 
 *rscreen,
   case TGSI_PROCESSOR_FRAGMENT:
   if (shader-output[i].name == TGSI_SEMANTIC_COLOR) {
   /* never export more colors than the number of 
 CBs */
 - if (shader-output[i].sid = key.nr_cbufs) {
 + if (shader-output[i].sid = max_color_exports) 
 {
   /* skip export */
   j--;
   continue;
 @@ -1450,7 +1451,7 @@ static int r600_shader_from_tgsi(struct r600_screen 
 *rscreen,
   output[j].type = 
 V_SQ_CF_ALLOC_EXPORT_WORD0_SQ_EXPORT_PIXEL;
   shader-nr_ps_color_exports++;
   if (shader-fs_write_all  
 (rscreen-chip_class = EVERGREEN)) {
 - for (k = 1; k  key.nr_cbufs; k++) {
 + for (k = 1; k  max_color_exports; k++) 
 {
   j++;
   memset(output[j], 0, 
 sizeof(struct r600_bytecode_output));
   output[j].gpr = 
 shader-output[i].gpr;
 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs

2013-08-27 Thread Marek Olšák
Shader variants are BAD, BAD, BAD. Have you ever played an AAA game
with a Mesa driver that likes to compile shader variants on first use?
It's HORRIBLE.

What the patch does is probably the right solution. At least
alpha-test state changes don't cause shader recompilation and
re-binding, which also negatively affects performance. Ideally we
shouldn't depend on the framebuffer state at all, but we need to
emulate the TGSI property FS_COLOR0_WRITES_ALL_CBUFS. I think we
should always be fine with key.nr_cbufs forced to 8 for any shader
without that property. I expect app developers to do the right thing
and not write outputs they don't need.

Marek

On Tue, Aug 27, 2013 at 9:00 PM, Roland Scheidegger srol...@vmware.com wrote:
 Not that I'm qualified to review r600 code, but couldn't you create
 different shader variants depending on whether you need alpha test? At
 least I would assume shader exports aren't free.

 Roland

 Am 27.08.2013 19:56, schrieb Vadim Girlin:
 We need to export at least one color if the shader writes it,
 even when nr_cbufs==0.

 Signed-off-by: Vadim Girlin vadimgir...@gmail.com
 ---

 Tested on evergreen with multiple combinations of backends - no regressions,
 fixes some tests:

   default- fixes fb-alphatest-nocolor and fb_alphatest-nocolor-ff
   default+sb - fixes fb-alphatest-nocolor and fb_alphatest-nocolor-ff
   llvm   - fixes about 25 tests related to depth/stencil
   llvm+sb- fixes about 300 tests (llvm's depth/stencil issues and
regressions cased by reordering of exports in sb)

 With this patch, there are no regressions with default+sb vs default.
 There is one regression with llvm+sb vs llvm - fs-texturegrad-miplevels,
 AFAICS it's a problem with llvm backend uncovered by sb - SET_GRADIENTS_V/H
 instructions are not placed in the same TEX clause with corresponding 
 SAMPLE_G.

  src/gallium/drivers/r600/r600_shader.c | 7 ---
  1 file changed, 4 insertions(+), 3 deletions(-)

 diff --git a/src/gallium/drivers/r600/r600_shader.c 
 b/src/gallium/drivers/r600/r600_shader.c
 index 300b5c4..f7eab76 100644
 --- a/src/gallium/drivers/r600/r600_shader.c
 +++ b/src/gallium/drivers/r600/r600_shader.c
 @@ -918,6 +918,7 @@ static int r600_shader_from_tgsi(struct r600_screen 
 *rscreen,
   unsigned opcode;
   int i, j, k, r = 0;
   int next_pos_base = 60, next_param_base = 0;
 + int max_color_exports = MAX2(key.nr_cbufs, 1);
   /* Declarations used by llvm code */
   bool use_llvm = false;
   bool indirect_gprs;
 @@ -1130,7 +1131,7 @@ static int r600_shader_from_tgsi(struct r600_screen 
 *rscreen,
   radeon_llvm_ctx.face_gpr = ctx.face_gpr;
   radeon_llvm_ctx.r600_inputs = ctx.shader-input;
   radeon_llvm_ctx.r600_outputs = ctx.shader-output;
 - radeon_llvm_ctx.color_buffer_count = MAX2(key.nr_cbufs , 1);
 + radeon_llvm_ctx.color_buffer_count = max_color_exports;
   radeon_llvm_ctx.chip_class = ctx.bc-chip_class;
   radeon_llvm_ctx.fs_color_all = shader-fs_write_all  
 (rscreen-chip_class = EVERGREEN);
   radeon_llvm_ctx.stream_outputs = so;
 @@ -1440,7 +1441,7 @@ static int r600_shader_from_tgsi(struct r600_screen 
 *rscreen,
   case TGSI_PROCESSOR_FRAGMENT:
   if (shader-output[i].name == TGSI_SEMANTIC_COLOR) {
   /* never export more colors than the number of 
 CBs */
 - if (shader-output[i].sid = key.nr_cbufs) {
 + if (shader-output[i].sid = 
 max_color_exports) {
   /* skip export */
   j--;
   continue;
 @@ -1450,7 +1451,7 @@ static int r600_shader_from_tgsi(struct r600_screen 
 *rscreen,
   output[j].type = 
 V_SQ_CF_ALLOC_EXPORT_WORD0_SQ_EXPORT_PIXEL;
   shader-nr_ps_color_exports++;
   if (shader-fs_write_all  
 (rscreen-chip_class = EVERGREEN)) {
 - for (k = 1; k  key.nr_cbufs; k++) {
 + for (k = 1; k  max_color_exports; 
 k++) {
   j++;
   memset(output[j], 0, 
 sizeof(struct r600_bytecode_output));
   output[j].gpr = 
 shader-output[i].gpr;

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs

2013-08-27 Thread Vadim Girlin

On 08/27/2013 11:00 PM, Roland Scheidegger wrote:

Not that I'm qualified to review r600 code, but couldn't you create
different shader variants depending on whether you need alpha test? At
least I would assume shader exports aren't free.


I thought about performance, but my main concern now is to avoid serious 
regressions after enabling sb, we can try to improve it later.


Even if we won't emit this color export, we'll have fake export (with 
all color components masked) instead, and I'm not sure whether it's 
cheaper. Possibly hardware can see that there is no actual memory write, 
but benchmarks are needed to prove it.


Also there is another possible improvement for exports - sometimes we 
need to export depth/stencil but no colors, probably we can get rid of 
fake color export as well in such cases. Anyway, this also needs 
additional testing/benchmarking.


Vadim



Roland

Am 27.08.2013 19:56, schrieb Vadim Girlin:

We need to export at least one color if the shader writes it,
even when nr_cbufs==0.

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---

Tested on evergreen with multiple combinations of backends - no regressions,
fixes some tests:

   default- fixes fb-alphatest-nocolor and fb_alphatest-nocolor-ff
   default+sb - fixes fb-alphatest-nocolor and fb_alphatest-nocolor-ff
   llvm   - fixes about 25 tests related to depth/stencil
   llvm+sb- fixes about 300 tests (llvm's depth/stencil issues and
regressions cased by reordering of exports in sb)

With this patch, there are no regressions with default+sb vs default.
There is one regression with llvm+sb vs llvm - fs-texturegrad-miplevels,
AFAICS it's a problem with llvm backend uncovered by sb - SET_GRADIENTS_V/H
instructions are not placed in the same TEX clause with corresponding SAMPLE_G.

  src/gallium/drivers/r600/r600_shader.c | 7 ---
  1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 300b5c4..f7eab76 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -918,6 +918,7 @@ static int r600_shader_from_tgsi(struct r600_screen 
*rscreen,
unsigned opcode;
int i, j, k, r = 0;
int next_pos_base = 60, next_param_base = 0;
+   int max_color_exports = MAX2(key.nr_cbufs, 1);
/* Declarations used by llvm code */
bool use_llvm = false;
bool indirect_gprs;
@@ -1130,7 +1131,7 @@ static int r600_shader_from_tgsi(struct r600_screen 
*rscreen,
radeon_llvm_ctx.face_gpr = ctx.face_gpr;
radeon_llvm_ctx.r600_inputs = ctx.shader-input;
radeon_llvm_ctx.r600_outputs = ctx.shader-output;
-   radeon_llvm_ctx.color_buffer_count = MAX2(key.nr_cbufs , 1);
+   radeon_llvm_ctx.color_buffer_count = max_color_exports;
radeon_llvm_ctx.chip_class = ctx.bc-chip_class;
radeon_llvm_ctx.fs_color_all = shader-fs_write_all  
(rscreen-chip_class = EVERGREEN);
radeon_llvm_ctx.stream_outputs = so;
@@ -1440,7 +1441,7 @@ static int r600_shader_from_tgsi(struct r600_screen 
*rscreen,
case TGSI_PROCESSOR_FRAGMENT:
if (shader-output[i].name == TGSI_SEMANTIC_COLOR) {
/* never export more colors than the number of 
CBs */
-   if (shader-output[i].sid = key.nr_cbufs) {
+   if (shader-output[i].sid = max_color_exports) 
{
/* skip export */
j--;
continue;
@@ -1450,7 +1451,7 @@ static int r600_shader_from_tgsi(struct r600_screen 
*rscreen,
output[j].type = 
V_SQ_CF_ALLOC_EXPORT_WORD0_SQ_EXPORT_PIXEL;
shader-nr_ps_color_exports++;
if (shader-fs_write_all  (rscreen-chip_class 
= EVERGREEN)) {
-   for (k = 1; k  key.nr_cbufs; k++) {
+   for (k = 1; k  max_color_exports; k++) 
{
j++;
memset(output[j], 0, 
sizeof(struct r600_bytecode_output));
output[j].gpr = 
shader-output[i].gpr;


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs

2013-08-27 Thread Vadim Girlin

On 08/28/2013 12:43 AM, Marek Olšák wrote:

Shader variants are BAD, BAD, BAD. Have you ever played an AAA game
with a Mesa driver that likes to compile shader variants on first use?
It's HORRIBLE.


I don't think that shader variants are bad, but it's definitely bad when 
we are compiling variants that are never used. Currently glxgears 
compiles 18 ps/vs shaders. In my branch with initial GS support [1] I 
switched handling of the shaders to deferred compilation, that is, 
shaders are compiled only before the actual draw. I found later that 
it's not really required for GS, but IIRC this change results in only 5 
shaders being compiled for glxgears instead of 18. It seems most of the 
useless variants are results of state changes between creation of the 
shader state (initial compilation) and actual draw call.


I had some concerns about increased overhead with those changes, and 
it's actually noticeable with drawoverhead demo, but I didn't see any 
regressions with a few real apps that I tested, e.g. glxgears even 
showed slightly better performance with these changes. Probably I also 
implemented it in a not very optimal way (I was mostly concentrated on 
GS support) and the overhead can be reduced.


One more thing is duplicate shaders, I've analyzed shader dumps from 
Unigine Heaven 3.0 some time ago and found that from about 320 compiled 
shaders, only about 180 (50%) were unique, others were duplicates 
(detected by comparing the bytecode dumps for them in an automated way), 
maybe they had different shader keys (which still resulted in the same 
bytecode), but I suspect duplicate pipe shaders were also involved. 
Unfortunately I didn't have a time to investigate it more thoroughly 
since then.


So my point is that we don't really need to eliminate shader variants, 
first we need to eliminate compilation of unused variants and duplicate 
shaders. Also we might want to consider offloading of the compilation to 
separate thread(s) and caching of shader binaries between runs.


Vadim

 [1] http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-geom-shaders



What the patch does is probably the right solution. At least
alpha-test state changes don't cause shader recompilation and
re-binding, which also negatively affects performance. Ideally we
shouldn't depend on the framebuffer state at all, but we need to
emulate the TGSI property FS_COLOR0_WRITES_ALL_CBUFS. I think we
should always be fine with key.nr_cbufs forced to 8 for any shader
without that property. I expect app developers to do the right thing
and not write outputs they don't need.

Marek

On Tue, Aug 27, 2013 at 9:00 PM, Roland Scheidegger srol...@vmware.com wrote:

Not that I'm qualified to review r600 code, but couldn't you create
different shader variants depending on whether you need alpha test? At
least I would assume shader exports aren't free.

Roland

Am 27.08.2013 19:56, schrieb Vadim Girlin:

We need to export at least one color if the shader writes it,
even when nr_cbufs==0.

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---

Tested on evergreen with multiple combinations of backends - no regressions,
fixes some tests:

   default- fixes fb-alphatest-nocolor and fb_alphatest-nocolor-ff
   default+sb - fixes fb-alphatest-nocolor and fb_alphatest-nocolor-ff
   llvm   - fixes about 25 tests related to depth/stencil
   llvm+sb- fixes about 300 tests (llvm's depth/stencil issues and
regressions cased by reordering of exports in sb)

With this patch, there are no regressions with default+sb vs default.
There is one regression with llvm+sb vs llvm - fs-texturegrad-miplevels,
AFAICS it's a problem with llvm backend uncovered by sb - SET_GRADIENTS_V/H
instructions are not placed in the same TEX clause with corresponding SAMPLE_G.

  src/gallium/drivers/r600/r600_shader.c | 7 ---
  1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 300b5c4..f7eab76 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -918,6 +918,7 @@ static int r600_shader_from_tgsi(struct r600_screen 
*rscreen,
   unsigned opcode;
   int i, j, k, r = 0;
   int next_pos_base = 60, next_param_base = 0;
+ int max_color_exports = MAX2(key.nr_cbufs, 1);
   /* Declarations used by llvm code */
   bool use_llvm = false;
   bool indirect_gprs;
@@ -1130,7 +1131,7 @@ static int r600_shader_from_tgsi(struct r600_screen 
*rscreen,
   radeon_llvm_ctx.face_gpr = ctx.face_gpr;
   radeon_llvm_ctx.r600_inputs = ctx.shader-input;
   radeon_llvm_ctx.r600_outputs = ctx.shader-output;
- radeon_llvm_ctx.color_buffer_count = MAX2(key.nr_cbufs , 1);
+ radeon_llvm_ctx.color_buffer_count = max_color_exports;
   radeon_llvm_ctx.chip_class = ctx.bc-chip_class;
   radeon_llvm_ctx.fs_color_all = 

Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs

2013-08-27 Thread Roland Scheidegger
Am 27.08.2013 23:52, schrieb Vadim Girlin:
 On 08/28/2013 12:43 AM, Marek Olšák wrote:
 Shader variants are BAD, BAD, BAD. Have you ever played an AAA game
 with a Mesa driver that likes to compile shader variants on first use?
 It's HORRIBLE.
 
 I don't think that shader variants are bad, but it's definitely bad when
 we are compiling variants that are never used. Currently glxgears
 compiles 18 ps/vs shaders. In my branch with initial GS support [1] I
 switched handling of the shaders to deferred compilation, that is,
 shaders are compiled only before the actual draw. I found later that
 it's not really required for GS, but IIRC this change results in only 5
 shaders being compiled for glxgears instead of 18. It seems most of the
 useless variants are results of state changes between creation of the
 shader state (initial compilation) and actual draw call.
 
 I had some concerns about increased overhead with those changes, and
 it's actually noticeable with drawoverhead demo, but I didn't see any
 regressions with a few real apps that I tested, e.g. glxgears even
 showed slightly better performance with these changes. Probably I also
 implemented it in a not very optimal way (I was mostly concentrated on
 GS support) and the overhead can be reduced.
 
 One more thing is duplicate shaders, I've analyzed shader dumps from
 Unigine Heaven 3.0 some time ago and found that from about 320 compiled
 shaders, only about 180 (50%) were unique, others were duplicates
 (detected by comparing the bytecode dumps for them in an automated way),
 maybe they had different shader keys (which still resulted in the same
 bytecode), but I suspect duplicate pipe shaders were also involved.
 Unfortunately I didn't have a time to investigate it more thoroughly
 since then.
 
 So my point is that we don't really need to eliminate shader variants,
 first we need to eliminate compilation of unused variants and duplicate
 shaders. Also we might want to consider offloading of the compilation to
 separate thread(s) and caching of shader binaries between runs.

Hmm ok that seems a way more complicated problem than I thought :-).
Compile early and you might compile variants you will never use, compile
late and the delay might be noticeable.
I just thought it might be unlikely you'd actually need two variants -
e.g. some depth exporting shader is probably unlikely to use alpha test.
But ok I guess it shouldn't write color in this case, so even then it
might never be worth bothering. Was just a random idea ;-).

Roland


 
 Vadim
 
  [1] http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-geom-shaders
 

 What the patch does is probably the right solution. At least
 alpha-test state changes don't cause shader recompilation and
 re-binding, which also negatively affects performance. Ideally we
 shouldn't depend on the framebuffer state at all, but we need to
 emulate the TGSI property FS_COLOR0_WRITES_ALL_CBUFS. I think we
 should always be fine with key.nr_cbufs forced to 8 for any shader
 without that property. I expect app developers to do the right thing
 and not write outputs they don't need.

 Marek

 On Tue, Aug 27, 2013 at 9:00 PM, Roland Scheidegger
 srol...@vmware.com wrote:
 Not that I'm qualified to review r600 code, but couldn't you create
 different shader variants depending on whether you need alpha test? At
 least I would assume shader exports aren't free.

 Roland

 Am 27.08.2013 19:56, schrieb Vadim Girlin:
 We need to export at least one color if the shader writes it,
 even when nr_cbufs==0.

 Signed-off-by: Vadim Girlin vadimgir...@gmail.com
 ---

 Tested on evergreen with multiple combinations of backends - no
 regressions,
 fixes some tests:

default- fixes fb-alphatest-nocolor and fb_alphatest-nocolor-ff
default+sb - fixes fb-alphatest-nocolor and fb_alphatest-nocolor-ff
llvm   - fixes about 25 tests related to depth/stencil
llvm+sb- fixes about 300 tests (llvm's depth/stencil issues and
 regressions cased by reordering of exports in sb)

 With this patch, there are no regressions with default+sb vs default.
 There is one regression with llvm+sb vs llvm -
 fs-texturegrad-miplevels,
 AFAICS it's a problem with llvm backend uncovered by sb -
 SET_GRADIENTS_V/H
 instructions are not placed in the same TEX clause with
 corresponding SAMPLE_G.

   src/gallium/drivers/r600/r600_shader.c | 7 ---
   1 file changed, 4 insertions(+), 3 deletions(-)

 diff --git a/src/gallium/drivers/r600/r600_shader.c
 b/src/gallium/drivers/r600/r600_shader.c
 index 300b5c4..f7eab76 100644
 --- a/src/gallium/drivers/r600/r600_shader.c
 +++ b/src/gallium/drivers/r600/r600_shader.c
 @@ -918,6 +918,7 @@ static int r600_shader_from_tgsi(struct
 r600_screen *rscreen,
unsigned opcode;
int i, j, k, r = 0;
int next_pos_base = 60, next_param_base = 0;
 + int max_color_exports = MAX2(key.nr_cbufs, 1);
/* Declarations used by llvm code */
bool use_llvm = false;

Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs

2013-08-27 Thread Marek Olšák
First, you won't really see any significant continual difference in
frame rate no matter how many shader variants you have unless you are
very CPU-bound. The problem is shader compilation on the first use,
that's where you get a big hiccup. Try Skyrim for example: You have to
first look around and see every object that's around you and get
unpleasant stuttering before you can actually go on and play the game.
Yes, this also Wine's fault that it compiles shaders on the first use
too, but we don't have to be as bad as Wine, do we? Valve also
reported shader recompilations on the first use being a serious issue
with open source drivers.

Marek

On Tue, Aug 27, 2013 at 11:52 PM, Vadim Girlin vadimgir...@gmail.com wrote:
 On 08/28/2013 12:43 AM, Marek Olšák wrote:

 Shader variants are BAD, BAD, BAD. Have you ever played an AAA game
 with a Mesa driver that likes to compile shader variants on first use?
 It's HORRIBLE.


 I don't think that shader variants are bad, but it's definitely bad when we
 are compiling variants that are never used. Currently glxgears compiles 18
 ps/vs shaders. In my branch with initial GS support [1] I switched handling
 of the shaders to deferred compilation, that is, shaders are compiled only
 before the actual draw. I found later that it's not really required for GS,
 but IIRC this change results in only 5 shaders being compiled for glxgears
 instead of 18. It seems most of the useless variants are results of state
 changes between creation of the shader state (initial compilation) and
 actual draw call.

 I had some concerns about increased overhead with those changes, and it's
 actually noticeable with drawoverhead demo, but I didn't see any regressions
 with a few real apps that I tested, e.g. glxgears even showed slightly
 better performance with these changes. Probably I also implemented it in a
 not very optimal way (I was mostly concentrated on GS support) and the
 overhead can be reduced.

 One more thing is duplicate shaders, I've analyzed shader dumps from Unigine
 Heaven 3.0 some time ago and found that from about 320 compiled shaders,
 only about 180 (50%) were unique, others were duplicates (detected by
 comparing the bytecode dumps for them in an automated way), maybe they had
 different shader keys (which still resulted in the same bytecode), but I
 suspect duplicate pipe shaders were also involved. Unfortunately I didn't
 have a time to investigate it more thoroughly since then.

 So my point is that we don't really need to eliminate shader variants, first
 we need to eliminate compilation of unused variants and duplicate shaders.
 Also we might want to consider offloading of the compilation to separate
 thread(s) and caching of shader binaries between runs.

 Vadim

  [1] http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-geom-shaders



 What the patch does is probably the right solution. At least
 alpha-test state changes don't cause shader recompilation and
 re-binding, which also negatively affects performance. Ideally we
 shouldn't depend on the framebuffer state at all, but we need to
 emulate the TGSI property FS_COLOR0_WRITES_ALL_CBUFS. I think we
 should always be fine with key.nr_cbufs forced to 8 for any shader
 without that property. I expect app developers to do the right thing
 and not write outputs they don't need.

 Marek

 On Tue, Aug 27, 2013 at 9:00 PM, Roland Scheidegger srol...@vmware.com
 wrote:

 Not that I'm qualified to review r600 code, but couldn't you create
 different shader variants depending on whether you need alpha test? At
 least I would assume shader exports aren't free.

 Roland

 Am 27.08.2013 19:56, schrieb Vadim Girlin:

 We need to export at least one color if the shader writes it,
 even when nr_cbufs==0.

 Signed-off-by: Vadim Girlin vadimgir...@gmail.com
 ---

 Tested on evergreen with multiple combinations of backends - no
 regressions,
 fixes some tests:

default- fixes fb-alphatest-nocolor and fb_alphatest-nocolor-ff
default+sb - fixes fb-alphatest-nocolor and fb_alphatest-nocolor-ff
llvm   - fixes about 25 tests related to depth/stencil
llvm+sb- fixes about 300 tests (llvm's depth/stencil issues and
 regressions cased by reordering of exports in sb)

 With this patch, there are no regressions with default+sb vs default.
 There is one regression with llvm+sb vs llvm - fs-texturegrad-miplevels,
 AFAICS it's a problem with llvm backend uncovered by sb -
 SET_GRADIENTS_V/H
 instructions are not placed in the same TEX clause with corresponding
 SAMPLE_G.

   src/gallium/drivers/r600/r600_shader.c | 7 ---
   1 file changed, 4 insertions(+), 3 deletions(-)

 diff --git a/src/gallium/drivers/r600/r600_shader.c
 b/src/gallium/drivers/r600/r600_shader.c
 index 300b5c4..f7eab76 100644
 --- a/src/gallium/drivers/r600/r600_shader.c
 +++ b/src/gallium/drivers/r600/r600_shader.c
 @@ -918,6 +918,7 @@ static int r600_shader_from_tgsi(struct r600_screen
 *rscreen,
unsigned 

Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs

2013-08-27 Thread Vadim Girlin

On 08/28/2013 02:28 AM, Roland Scheidegger wrote:

Am 27.08.2013 23:52, schrieb Vadim Girlin:

On 08/28/2013 12:43 AM, Marek Olšák wrote:

Shader variants are BAD, BAD, BAD. Have you ever played an AAA game
with a Mesa driver that likes to compile shader variants on first use?
It's HORRIBLE.


I don't think that shader variants are bad, but it's definitely bad when
we are compiling variants that are never used. Currently glxgears
compiles 18 ps/vs shaders. In my branch with initial GS support [1] I
switched handling of the shaders to deferred compilation, that is,
shaders are compiled only before the actual draw. I found later that
it's not really required for GS, but IIRC this change results in only 5
shaders being compiled for glxgears instead of 18. It seems most of the
useless variants are results of state changes between creation of the
shader state (initial compilation) and actual draw call.

I had some concerns about increased overhead with those changes, and
it's actually noticeable with drawoverhead demo, but I didn't see any
regressions with a few real apps that I tested, e.g. glxgears even
showed slightly better performance with these changes. Probably I also
implemented it in a not very optimal way (I was mostly concentrated on
GS support) and the overhead can be reduced.

One more thing is duplicate shaders, I've analyzed shader dumps from
Unigine Heaven 3.0 some time ago and found that from about 320 compiled
shaders, only about 180 (50%) were unique, others were duplicates
(detected by comparing the bytecode dumps for them in an automated way),
maybe they had different shader keys (which still resulted in the same
bytecode), but I suspect duplicate pipe shaders were also involved.
Unfortunately I didn't have a time to investigate it more thoroughly
since then.

So my point is that we don't really need to eliminate shader variants,
first we need to eliminate compilation of unused variants and duplicate
shaders. Also we might want to consider offloading of the compilation to
separate thread(s) and caching of shader binaries between runs.


Hmm ok that seems a way more complicated problem than I thought :-).
Compile early and you might compile variants you will never use, compile
late and the delay might be noticeable.


Compilation of unused variants is not bad if they are compiled at the 
game/level loading time, I think many apps are trying to compile shaders 
early to avoid freezes during gameplay. But trying to compile early in 
the driver doesn't make sense currently because it's already too late 
anyway, if I'm not missing something, it's deferred in mesa or state 
tracker.


Otherwise probably it would be preferable for the driver to precompile 
variants that are likely to be used (but only if we really can do it 
early, at the loading time when shaders are created by the app).



I just thought it might be unlikely you'd actually need two variants -
e.g. some depth exporting shader is probably unlikely to use alpha test.
But ok I guess it shouldn't write color in this case, so even then it
might never be worth bothering. Was just a random idea ;-).


I think it's a good idea that just needs some benchmarking to make sure 
that it can provide any real benefits. I only wanted to say that it can 
be done separately from this fix.


Vadim




Roland




Vadim

  [1] http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-geom-shaders



What the patch does is probably the right solution. At least
alpha-test state changes don't cause shader recompilation and
re-binding, which also negatively affects performance. Ideally we
shouldn't depend on the framebuffer state at all, but we need to
emulate the TGSI property FS_COLOR0_WRITES_ALL_CBUFS. I think we
should always be fine with key.nr_cbufs forced to 8 for any shader
without that property. I expect app developers to do the right thing
and not write outputs they don't need.

Marek

On Tue, Aug 27, 2013 at 9:00 PM, Roland Scheidegger
srol...@vmware.com wrote:

Not that I'm qualified to review r600 code, but couldn't you create
different shader variants depending on whether you need alpha test? At
least I would assume shader exports aren't free.

Roland

Am 27.08.2013 19:56, schrieb Vadim Girlin:

We need to export at least one color if the shader writes it,
even when nr_cbufs==0.

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---

Tested on evergreen with multiple combinations of backends - no
regressions,
fixes some tests:

default- fixes fb-alphatest-nocolor and fb_alphatest-nocolor-ff
default+sb - fixes fb-alphatest-nocolor and fb_alphatest-nocolor-ff
llvm   - fixes about 25 tests related to depth/stencil
llvm+sb- fixes about 300 tests (llvm's depth/stencil issues and
 regressions cased by reordering of exports in sb)

With this patch, there are no regressions with default+sb vs default.
There is one regression with llvm+sb vs llvm -
fs-texturegrad-miplevels,
AFAICS it's a problem with llvm 

Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs

2013-08-27 Thread Vadim Girlin

On 08/28/2013 02:59 AM, Marek Olšák wrote:

First, you won't really see any significant continual difference in
frame rate no matter how many shader variants you have unless you are
very CPU-bound. The problem is shader compilation on the first use,
that's where you get a big hiccup. Try Skyrim for example: You have to
first look around and see every object that's around you and get
unpleasant stuttering before you can actually go on and play the game.
Yes, this also Wine's fault that it compiles shaders on the first use
too, but we don't have to be as bad as Wine, do we? Valve also
reported shader recompilations on the first use being a serious issue
with open source drivers.


I perfectly understand that deferred compilation is exactly the problem 
that makes the games freeze due to shader compilation on first use when 
something new appears on the screen, but I don't think we can solve this 
problem in the *driver* by trying to compile early, because AFAICS 
currently the shaders are passed to the driver too late anyway, and this 
happens not only with wine. E.g. when I run Heaven in a window with 
MESA_GLSL=dump R600_DEBUG=ps,vs, so that I can see Heaven's window and 
console output at the same time, what I see is that most of GL dumps 
happen while Heaven shows splash screen with loading progress, but most 
of the driver's dumps appear on the first frame and few more times 
during benchmark. It looks like compilation is deferred somewhere in the 
stack before the driver, or am I missing something?


Vadim




Marek

On Tue, Aug 27, 2013 at 11:52 PM, Vadim Girlin vadimgir...@gmail.com wrote:

On 08/28/2013 12:43 AM, Marek Olšák wrote:


Shader variants are BAD, BAD, BAD. Have you ever played an AAA game
with a Mesa driver that likes to compile shader variants on first use?
It's HORRIBLE.



I don't think that shader variants are bad, but it's definitely bad when we
are compiling variants that are never used. Currently glxgears compiles 18
ps/vs shaders. In my branch with initial GS support [1] I switched handling
of the shaders to deferred compilation, that is, shaders are compiled only
before the actual draw. I found later that it's not really required for GS,
but IIRC this change results in only 5 shaders being compiled for glxgears
instead of 18. It seems most of the useless variants are results of state
changes between creation of the shader state (initial compilation) and
actual draw call.

I had some concerns about increased overhead with those changes, and it's
actually noticeable with drawoverhead demo, but I didn't see any regressions
with a few real apps that I tested, e.g. glxgears even showed slightly
better performance with these changes. Probably I also implemented it in a
not very optimal way (I was mostly concentrated on GS support) and the
overhead can be reduced.

One more thing is duplicate shaders, I've analyzed shader dumps from Unigine
Heaven 3.0 some time ago and found that from about 320 compiled shaders,
only about 180 (50%) were unique, others were duplicates (detected by
comparing the bytecode dumps for them in an automated way), maybe they had
different shader keys (which still resulted in the same bytecode), but I
suspect duplicate pipe shaders were also involved. Unfortunately I didn't
have a time to investigate it more thoroughly since then.

So my point is that we don't really need to eliminate shader variants, first
we need to eliminate compilation of unused variants and duplicate shaders.
Also we might want to consider offloading of the compilation to separate
thread(s) and caching of shader binaries between runs.

Vadim

  [1] http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-geom-shaders




What the patch does is probably the right solution. At least
alpha-test state changes don't cause shader recompilation and
re-binding, which also negatively affects performance. Ideally we
shouldn't depend on the framebuffer state at all, but we need to
emulate the TGSI property FS_COLOR0_WRITES_ALL_CBUFS. I think we
should always be fine with key.nr_cbufs forced to 8 for any shader
without that property. I expect app developers to do the right thing
and not write outputs they don't need.

Marek

On Tue, Aug 27, 2013 at 9:00 PM, Roland Scheidegger srol...@vmware.com
wrote:


Not that I'm qualified to review r600 code, but couldn't you create
different shader variants depending on whether you need alpha test? At
least I would assume shader exports aren't free.

Roland

Am 27.08.2013 19:56, schrieb Vadim Girlin:


We need to export at least one color if the shader writes it,
even when nr_cbufs==0.

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---

Tested on evergreen with multiple combinations of backends - no
regressions,
fixes some tests:

default- fixes fb-alphatest-nocolor and fb_alphatest-nocolor-ff
default+sb - fixes fb-alphatest-nocolor and fb_alphatest-nocolor-ff
llvm   - fixes about 25 tests related to depth/stencil
llvm+sb- fixes