Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs
Yeah, st/mesa also compiles shaders on first use, so we've got three places to fix: Wine, st/mesa, and the driver.

Marek

On Wed, Aug 28, 2013 at 2:07 AM, Vadim Girlin vadimgir...@gmail.com wrote:
> On 08/28/2013 02:59 AM, Marek Olšák wrote:
>> First, you won't really see any significant continual difference in
>> frame rate no matter how many shader variants you have, unless you are
>> very CPU-bound. The problem is shader compilation on first use; that's
>> where you get a big hiccup. Try Skyrim, for example: you first have to
>> look around and see every object around you, getting unpleasant
>> stuttering, before you can actually go on and play the game. Yes, it's
>> also Wine's fault that it compiles shaders on first use, but we don't
>> have to be as bad as Wine, do we?
>
> Valve also reported shader recompilation on first use being a serious
> issue with open source drivers. I perfectly understand that deferred
> compilation is exactly the problem that makes games freeze due to
> shader compilation on first use when something new appears on the
> screen, but I don't think we can solve this problem in the *driver* by
> trying to compile early, because AFAICS the shaders are currently
> passed to the driver too late anyway, and this happens not only with
> Wine. E.g. when I run Heaven in a window with MESA_GLSL=dump
> R600_DEBUG=ps,vs, so that I can see Heaven's window and console output
> at the same time, what I see is that most of the GL dumps happen while
> Heaven shows the splash screen with the loading progress bar, but most
> of the driver's dumps appear on the first frame and a few more times
> during the benchmark. It looks like compilation is deferred somewhere
> in the stack before the driver, or am I missing something?
>
> Vadim
Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs
Well, for this discussion let's just assume that we fixed the delay in the upper layers of the stack and that the driver sees the shader code as soon as the application does (if I understood it correctly, Vadim has just volunteered for the job). Also let's assume that shaders are small and that having a lot of shader variants around after they are compiled isn't bad.

In this case the best solution is probably to compile early and try to make the shaders as state-invariant as possible: e.g. don't do optimizations like getting rid of extra exports for the case where we don't need the alpha test, and if it's just a dependency on a boolean, have both variants covered by the bytecode and use a bit constant to choose between the two, etc.

As a second step, the driver should create an optimized version of the shader in a background thread once we know all the state that is/was active when the shader is used. Of course you need a bit of heuristic for this, because sometimes it is better to switch between shader variants and other times it is better to have one variant covering all the different states and just use bit constants to choose between them.

Just some thoughts on this topic,
Christian.

PS: My mail server is once more driving me nuts; please ignore the extra copy if you get this mail twice.

Am 28.08.2013 02:07, schrieb Vadim Girlin:
> [snip]
Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs
On 28 August 2013 12:17, Marek Olšák mar...@gmail.com wrote:
> Yeah, st/mesa also compiles shaders on the first use, so we've got 3
> places to fix: Wine, st/mesa, the driver.

For what it's worth, while Wine definitely has some room for improvement in this regard, in some cases we don't get the shaders any earlier from the application either.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs
On 08/28/2013 01:15 PM, Christian König wrote:
> Well, for this discussion let's just assume that we fixed the delay in
> the upper layers of the stack and the driver sees the shader code as
> soon as the application (if I understood it correctly Vadim has just
> volunteered for the job).

No, I'm not really volunteering to implement that. :) I'm not even sure it's possible in reasonable time. In fact it was more of a theoretical discussion about what would be required for early compilation in the driver to make sense.

Perhaps I failed to explain it, but my point is actually that while compilation is deferred in the upper layers and nobody is going to change this (if it's possible at all), it doesn't make sense to try compiling early in the driver. I think we might prefer to defer the compilation in the driver as well; it doesn't make the overall situation any worse, but it can make it better by at least not compiling unused variants.

Vadim
Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs
On Wed, Aug 28, 2013 at 7:56 PM, Vadim Girlin vadimgir...@gmail.com wrote:
> I think we might prefer to defer the compilation in the driver as
> well; it doesn't make the overall situation any worse, but it can make
> it better by at least not compiling unused variants.

Sounds good to me.

Marek
Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs
Not that I'm qualified to review r600 code, but couldn't you create different shader variants depending on whether you need alpha test? At least I would assume shader exports aren't free.

Roland

Am 27.08.2013 19:56, schrieb Vadim Girlin:
> We need to export at least one color if the shader writes it, even
> when nr_cbufs == 0.
>
> Signed-off-by: Vadim Girlin vadimgir...@gmail.com
> ---
> Tested on evergreen with multiple combinations of backends; no
> regressions, fixes some tests:
>
>   default    - fixes fb-alphatest-nocolor and fb_alphatest-nocolor-ff
>   default+sb - fixes fb-alphatest-nocolor and fb_alphatest-nocolor-ff
>   llvm       - fixes about 25 tests related to depth/stencil
>   llvm+sb    - fixes about 300 tests (llvm's depth/stencil issues and
>                regressions caused by reordering of exports in sb)
>
> With this patch, there are no regressions with default+sb vs. default.
> There is one regression with llvm+sb vs. llvm (fs-texturegrad-miplevels);
> AFAICS it's a problem with the llvm backend uncovered by sb:
> SET_GRADIENTS_V/H instructions are not placed in the same TEX clause
> as the corresponding SAMPLE_G.
>  src/gallium/drivers/r600/r600_shader.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
> index 300b5c4..f7eab76 100644
> --- a/src/gallium/drivers/r600/r600_shader.c
> +++ b/src/gallium/drivers/r600/r600_shader.c
> @@ -918,6 +918,7 @@ static int r600_shader_from_tgsi(struct r600_screen *rscreen,
>  	unsigned opcode;
>  	int i, j, k, r = 0;
>  	int next_pos_base = 60, next_param_base = 0;
> +	int max_color_exports = MAX2(key.nr_cbufs, 1);
>  	/* Declarations used by llvm code */
>  	bool use_llvm = false;
>  	bool indirect_gprs;
> @@ -1130,7 +1131,7 @@
>  	radeon_llvm_ctx.face_gpr = ctx.face_gpr;
>  	radeon_llvm_ctx.r600_inputs = ctx.shader->input;
>  	radeon_llvm_ctx.r600_outputs = ctx.shader->output;
> -	radeon_llvm_ctx.color_buffer_count = MAX2(key.nr_cbufs, 1);
> +	radeon_llvm_ctx.color_buffer_count = max_color_exports;
>  	radeon_llvm_ctx.chip_class = ctx.bc->chip_class;
>  	radeon_llvm_ctx.fs_color_all = shader->fs_write_all && (rscreen->chip_class >= EVERGREEN);
>  	radeon_llvm_ctx.stream_outputs = &so;
> @@ -1440,7 +1441,7 @@
>  	case TGSI_PROCESSOR_FRAGMENT:
>  		if (shader->output[i].name == TGSI_SEMANTIC_COLOR) {
>  			/* never export more colors than the number of CBs */
> -			if (shader->output[i].sid >= key.nr_cbufs) {
> +			if (shader->output[i].sid >= max_color_exports) {
>  				/* skip export */
>  				j--;
>  				continue;
> @@ -1450,7 +1451,7 @@
>  			output[j].type = V_SQ_CF_ALLOC_EXPORT_WORD0_SQ_EXPORT_PIXEL;
>  			shader->nr_ps_color_exports++;
>  			if (shader->fs_write_all && (rscreen->chip_class >= EVERGREEN)) {
> -				for (k = 1; k < key.nr_cbufs; k++) {
> +				for (k = 1; k < max_color_exports; k++) {
>  					j++;
>  					memset(&output[j], 0, sizeof(struct r600_bytecode_output));
>  					output[j].gpr = shader->output[i].gpr;
Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs
Shader variants are BAD, BAD, BAD. Have you ever played an AAA game with a Mesa driver that likes to compile shader variants on first use? It's HORRIBLE.

What the patch does is probably the right solution. At least alpha-test state changes don't cause shader recompilation and re-binding, which also negatively affects performance. Ideally we shouldn't depend on the framebuffer state at all, but we need to emulate the TGSI property FS_COLOR0_WRITES_ALL_CBUFS. I think we should always be fine with key.nr_cbufs forced to 8 for any shader without that property. I expect app developers to do the right thing and not write outputs they don't need.

Marek

On Tue, Aug 27, 2013 at 9:00 PM, Roland Scheidegger srol...@vmware.com wrote:
> Not that I'm qualified to review r600 code, but couldn't you create
> different shader variants depending on whether you need alpha test? At
> least I would assume shader exports aren't free.
>
> Roland
>
> Am 27.08.2013 19:56, schrieb Vadim Girlin:
>> We need to export at least one color if the shader writes it, even
>> when nr_cbufs == 0.
>> [snip]
Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs
On 08/27/2013 11:00 PM, Roland Scheidegger wrote:
> Not that I'm qualified to review r600 code, but couldn't you create
> different shader variants depending on whether you need alpha test? At
> least I would assume shader exports aren't free.

I thought about performance, but my main concern now is to avoid serious regressions after enabling sb; we can try to improve it later. Even if we don't emit this color export, we'll have a fake export (with all color components masked) instead, and I'm not sure whether that's cheaper. Possibly the hardware can see that there is no actual memory write, but benchmarks are needed to prove it.

There is also another possible improvement for exports: sometimes we need to export depth/stencil but no colors, and in such cases we can probably get rid of the fake color export as well. Anyway, this also needs additional testing/benchmarking.

Vadim
Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs
On 08/28/2013 12:43 AM, Marek Olšák wrote:
> Shader variants are BAD, BAD, BAD. Have you ever played an AAA game
> with a Mesa driver that likes to compile shader variants on first use?
> It's HORRIBLE.

I don't think shader variants as such are bad, but it's definitely bad when we compile variants that are never used. Currently glxgears compiles 18 ps/vs shaders. In my branch with initial GS support [1] I switched shader handling to deferred compilation, that is, shaders are compiled only before the actual draw. I later found that it's not really required for GS, but IIRC this change results in only 5 shaders being compiled for glxgears instead of 18. It seems most of the useless variants result from state changes between creation of the shader state (initial compilation) and the actual draw call.

I had some concerns about increased overhead with those changes, and it's actually noticeable with the drawoverhead demo, but I didn't see any regressions with the few real apps that I tested; e.g. glxgears even showed slightly better performance with these changes. Probably I also implemented it in a not very optimal way (I was mostly concentrating on GS support) and the overhead can be reduced.

One more thing is duplicate shaders. I analyzed shader dumps from Unigine Heaven 3.0 some time ago and found that of about 320 compiled shaders, only about 180 (roughly half) were unique; the others were duplicates, detected by comparing their bytecode dumps in an automated way. Maybe they had different shader keys (which still resulted in the same bytecode), but I suspect duplicate pipe shaders were also involved. Unfortunately I haven't had time to investigate it more thoroughly since then.

So my point is that we don't really need to eliminate shader variants; first we need to eliminate compilation of unused variants and of duplicate shaders. Also we might want to consider offloading compilation to separate thread(s) and caching shader binaries between runs.
Vadim

[1] http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-geom-shaders
Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs
Am 27.08.2013 23:52, schrieb Vadim Girlin:

On 08/28/2013 12:43 AM, Marek Olšák wrote:

Shader variants are BAD, BAD, BAD. Have you ever played an AAA game with a Mesa driver that likes to compile shader variants on first use? It's HORRIBLE.

I don't think that shader variants are bad, but it's definitely bad when we are compiling variants that are never used. Currently glxgears compiles 18 ps/vs shaders.

In my branch with initial GS support [1] I switched handling of the shaders to deferred compilation; that is, shaders are compiled only before the actual draw. I found later that it's not really required for GS, but IIRC this change results in only 5 shaders being compiled for glxgears instead of 18. It seems most of the useless variants result from state changes between creation of the shader state (initial compilation) and the actual draw call.

I had some concerns about increased overhead with those changes, and it's actually noticeable with the drawoverhead demo, but I didn't see any regressions with a few real apps that I tested; e.g. glxgears even showed slightly better performance with these changes. Probably I also implemented it in a not very optimal way (I was mostly concentrated on GS support) and the overhead can be reduced.

One more thing is duplicate shaders. I analyzed shader dumps from Unigine Heaven 3.0 some time ago and found that of about 320 compiled shaders, only about 180 (50%) were unique; the others were duplicates (detected by comparing their bytecode dumps in an automated way). Maybe they had different shader keys (which still resulted in the same bytecode), but I suspect duplicate pipe shaders were also involved. Unfortunately I haven't had time to investigate it more thoroughly since then.

So my point is that we don't really need to eliminate shader variants; first we need to eliminate compilation of unused variants and duplicate shaders.
Also we might want to consider offloading of the compilation to separate thread(s) and caching of shader binaries between runs.

Hmm, ok, that seems a way more complicated problem than I thought :-). Compile early and you might compile variants you will never use; compile late and the delay might be noticeable.

I just thought it might be unlikely you'd actually need two variants - e.g. some depth-exporting shader is probably unlikely to use alpha test. But ok, I guess it shouldn't write color in this case, so even then it might never be worth bothering. Was just a random idea ;-).

Roland
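Vadim's duplicate detection was done offline by diffing bytecode dumps; inside a driver the same check could be approximated by hashing the compiled bytecode and sharing one binary between variants that hash (and then compare) equal. This is only an illustration - `bytecode_hash` is a hypothetical helper (FNV-1a is an arbitrary choice here), not existing r600g code:

```c
#include <stddef.h>
#include <stdint.h>

/* Hash the compiled bytecode (FNV-1a over dwords). A driver could key a
 * hash table on this value and, on a match confirmed by a full memcmp,
 * point a new variant at an already-uploaded binary instead of keeping
 * and re-uploading a duplicate copy. */
static uint64_t bytecode_hash(const uint32_t *bc, size_t ndwords)
{
    uint64_t h = 1469598103934665603ull;   /* FNV-1a 64-bit offset basis */
    for (size_t i = 0; i < ndwords; i++) {
        h ^= bc[i];
        h *= 1099511628211ull;             /* FNV-1a 64-bit prime */
    }
    return h;
}
```

A hash match alone is not proof of equality - distinct programs can collide - so a byte-wise compare would still be needed before sharing a binary.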
Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs
First, you won't really see any significant continual difference in frame rate no matter how many shader variants you have, unless you are very CPU-bound. The problem is shader compilation on first use; that's where you get a big hiccup. Try Skyrim, for example: you have to first look around and see every object that's around you, with unpleasant stuttering, before you can actually go on and play the game. Yes, it's also Wine's fault that it compiles shaders on first use, but we don't have to be as bad as Wine, do we? Valve also reported shader recompilation on first use being a serious issue with open source drivers.

Marek

On Tue, Aug 27, 2013 at 11:52 PM, Vadim Girlin vadimgir...@gmail.com wrote:

[...]
Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs
On 08/28/2013 02:28 AM, Roland Scheidegger wrote:

[...]
> > Also we might want to consider offloading of the compilation to separate thread(s) and caching of shader binaries between runs.
>
> Hmm ok that seems a way more complicated problem than I thought :-). Compile early and you might compile variants you will never use, compile late and the delay might be noticeable.

Compilation of unused variants is not bad if they are compiled at game/level loading time; I think many apps try to compile shaders early to avoid freezes during gameplay. But trying to compile early in the driver doesn't make sense currently, because it's already too late anyway - unless I'm missing something, compilation is deferred in Mesa or the state tracker before it reaches the driver. Otherwise it would probably be preferable for the driver to precompile the variants that are likely to be used (but only if we really can do it early, at loading time, when the shaders are created by the app).

> I just thought it might be unlikely you'd actually need two variants - e.g. some depth-exporting shader is probably unlikely to use alpha test. But ok I guess it shouldn't write color in this case, so even then it might never be worth bothering. Was just a random idea ;-).

I think it's a good idea that just needs some benchmarking to make sure it can provide real benefits. I only wanted to say that it can be done separately from this fix.

Vadim
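The offloading Vadim suggests could look roughly like the POSIX-threads sketch below. Every name in it (compile_job, compile_shader, compile_async) is hypothetical - r600g has no such API - and a real driver would queue jobs on a worker pool and join (or substitute a fallback variant) at draw time rather than immediately:

```c
#include <pthread.h>

/* Hypothetical compile job; stands in for the state the backend needs. */
struct compile_job {
    const char *source;  /* stands in for the TGSI/IR handed to the backend */
    int ready;           /* set once a binary is available */
};

/* Stand-in for the actual backend compiler. */
static void compile_shader(struct compile_job *job)
{
    (void)job->source;
    job->ready = 1;
}

static void *compile_worker(void *arg)
{
    compile_shader((struct compile_job *)arg);
    return NULL;
}

/* Kick off compilation on a thread, falling back to a synchronous
 * compile if thread creation fails; returns whether a binary is ready.
 * A real driver would defer the join until the shader's first draw. */
static int compile_async(struct compile_job *job)
{
    pthread_t tid;
    if (pthread_create(&tid, NULL, compile_worker, job) != 0)
        compile_shader(job);
    else
        pthread_join(tid, NULL);
    return job->ready;
}
```

The join here is immediate only to keep the sketch self-contained; the whole point of the scheme is to overlap compilation with application loading so the first draw rarely has to wait.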
Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs
On 08/28/2013 02:59 AM, Marek Olšák wrote:

[...]

I perfectly understand that deferred compilation is exactly the problem that makes games freeze due to shader compilation on first use when something new appears on the screen, but I don't think we can solve this problem in the *driver* by trying to compile early, because AFAICS the shaders are currently passed to the driver too late anyway, and this happens not only with Wine.

E.g. when I run Heaven in a window with MESA_GLSL=dump R600_DEBUG=ps,vs, so that I can see Heaven's window and console output at the same time, most of the GL dumps happen while Heaven shows the splash screen with the loading progress bar, but most of the driver's dumps appear on the first frame and a few more times during the benchmark. It looks like compilation is deferred somewhere in the stack before the driver, or am I missing something?

Vadim