Re: [Mesa-dev] [PATCH v3 00/13] TGSI: improved live range tracking, also including arrays
FWIW I'm not really qualified to review this, but this alleviates the concerns I had some time ago about doing spilling for r600 before sb. So this makes all sense to me. Roland Am 28.04.2018 um 21:30 schrieb Gert Wollny: > this is another update of the series I've sent before. > > v3: > - Add new test mesa/st/tests/meson.build > - rebase patches to latest HEAD > > this is the merged version of two series [1] (TGSI: split, merge > and interleave arrays) and [2] (mesa/st/glsl_to_tgsi: Properly > resolve life times for simple if/else + use constructs) I sent > earlier. Considering that both parts target the same optimization > step and fix a bug if both are applied, I thought it is better to > add this second patch to the series. Changes refer to v1 of [1]: > > v2: > - rebase patches to latest HEAD > - add some code that allows obtaining some statistics about register >and instruction usage > - Add patch [2] that improves resolving the live range estimation with >simple if/else and use constructs. By adding this patch the series >fixes https://bugs.freedesktop.org/show_bug.cgi?id=105371 > > v1: > Patch 1: Split arrays that are only accessed directly: > I posted a first version off the the array splitting in patch 1 some > time ago. Eric Anholt pointed out that this might be done in > opt_array_splitting.cpp, but in another comment Timothy pointed out > that this is far from trivial, and he also pointed out that he was > proposing similar patches for NIR, but since currently no NIR->TGSI > transformation is available, TGSI based drivers can't make use of this. > > While the reminder off the series could be applied without this patch, I > think it makes less sense to do all the optimizations on arrays that could > simply be split into individual registers, so I repost the patch with some > changes. > > I tried to be exhaustive with comments and make the variable any type names > self-explaining, but since I've been staring at this code for a long time I > don't think I am capable of seeing any problems any more, so comments are > very > welcome. > > Best, > Gert > > PS: I have no git write access. > > > [1] https://patchwork.freedesktop.org/series/37991/ > [2] https://patchwork.freedesktop.org/series/39471/ > > > Gert Wollny (13): > mesa/st/glsl_to_tgsi: Add method to collect some statistics > mesa/st: glsl_to_tgsi: Split arrays who's elements are only accessed > directly > mesa/st/glsl_to_tgsi: Properly resolve life times simple if/else + use > constructs > mesa/st/glsl_to_tgsi:rename lifetime to register_live_range > mesa/st: Add helper classes for array merging and interleaving > mesa/st/glsl_to_tgsi: Add class to track array live range > mesa/st/glsl_to_tgsi:Add array merge logic > mesa/st/tests: Add tests for array merge helper classes. > mesa/st/glsl_to_tgsi: rename access_record to register_merge_record > and some more renames > mesa/st/glsl_to_tgsi: move evaluation of read mask up in the call > hierarchy > mesa/st/glsl_to_tgsi: add class for array access tracking > mesa/st/glsl_to_tgsi: add array life range evaluation into tracking > code > mesa/st/glsl_to_tgsi: Expose array live range tracking and merging > > src/mesa/Makefile.sources | 2 + > src/mesa/meson.build | 2 + > src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 205 +- > .../state_tracker/st_glsl_to_tgsi_array_merge.cpp | 739 > + > .../state_tracker/st_glsl_to_tgsi_array_merge.h| 186 ++ > .../state_tracker/st_glsl_to_tgsi_temprename.cpp | 304 ++--- > .../state_tracker/st_glsl_to_tgsi_temprename.h | 45 +- > src/mesa/state_tracker/tests/Makefile.am | 20 +- > src/mesa/state_tracker/tests/meson.build | 14 + > src/mesa/state_tracker/tests/st_tests_common.cpp | 25 +- > src/mesa/state_tracker/tests/st_tests_common.h | 20 +- > .../tests/test_glsl_to_tgsi_array_merge.cpp| 296 + > .../tests/test_glsl_to_tgsi_lifetime.cpp | 33 +- > 13 files changed, 1760 insertions(+), 131 deletions(-) > create mode 100644 src/mesa/state_tracker/st_glsl_to_tgsi_array_merge.cpp > create mode 100644 src/mesa/state_tracker/st_glsl_to_tgsi_array_merge.h > create mode 100644 > src/mesa/state_tracker/tests/test_glsl_to_tgsi_array_merge.cpp > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v3 00/13] TGSI: improved live range tracking, also including arrays
[...] > > $R600_DEBUG=merge ST_DEBUG=tgsi ./run wollny/ > > ATTENTION: default value of option > allow_glsl_extension_directive_midshader overridden by environment. > run: state_tracker/st_glsl_to_tgsi.cpp:5783: ureg_dst > dst_register(st_translate*, gl_register_file, unsigned int, unsigned > int): Assertion `array_id && array_id <= t->num_temp_arrays' failed. This would indicate that the array is split, but still addressed somewhere, i.e. I don't look at all possible locations where arrays are used. I think I debug this by letting the R600 driver advertise bindless texture support (obvioulsy it will not pass the tests, because it is not implemented and the hardware may not even support it, but the GLSL->TGSI conversion should still work. > > > I can, if you dont mind waiting for an answer sometimes. > I will rely on this, but I also will take some time do work on it. > But maybe even easier: is there an implicit/explicit magic number I > can play with to see if it changes anything? Unfortunately, in my code there are no specific magic numbers to change, if there is anything then it is probably in the LLVM backend (that is not used with r600). The only think you could try is to disable one of the two parts in my code: one is the split_array(), called around st_glsl_to_tgsi:7018 and the other is where the array merging and interleaving are applied, i.e. line st_glsl_to_tgsi:5537 My take is that this latter step is responsible for the regressions, especially the array interleaving could make it more difficult to optimize, because if the optimizer pass doesn't look at a per-component access, but on the higher level per-register base, then register live- ranges for the array elements will become longer, making it more difficult to schedule everything without spilling. > ATM it seems like your code improves half the shaders its affecting a > lot and hurting the other half bad like it hits an invisible wall. > I think one problem could be the relationship between VGPRs and SGPRs > used and max Wavefronts achieved. Unfortunately, on r600 such fine grained information is not available, so that I also have no real clue. Cheer, Gert ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v3 00/13] TGSI: improved live range tracking, also including arrays
Am 29.04.2018 um 21:44 schrieb Gert Wollny: > Hello Benedict, > > thanks for all the testing! thanks for all the developing ;) > > On 29.04.2018 12:12, Benedikt Schemmer wrote: >>> Which are the names of these test? I'd like to check this on r600, >>> because here I didn't see any regressions last time I checked. >>> >> might of course be different on r600 (is bindless available?), >> also shader-db is more sensitive to problems than piglit >> >> 1. tests/spec/arb_bindless_texture/compiler/images/arrays-of-struct.frag >> 2. tests/spec/arb_bindless_texture/compiler/samplers/arrays-of-struct.frag > Indeed, bindless testures are not available on r600, so it is quite > difficult for me to test this. I would guess that parameters related to > this might be stored in the TGSI declaration that I currently don't check. > > If you have time for it, could you send me a TGSI dump of one of these > shaders? > With "ST_DEBUG=tgsi" this should be possible. I created 'R600_DEBUG=merge' so I can switch without having to recompile. $ST_DEBUG=tgsi ./run wollny/ ATTENTION: default value of option allow_glsl_extension_directive_midshader overridden by environment. FRAG DCL TEMP[0..55], ARRAY(1), LOCAL IMM[0] INT32 {0, 0, 0, 0} IMM[1] FLT32 {1., 2., 3., 4.} 0: STORE TEMP[0], IMM[0]., IMM[1], 2D 1: END wollny/41d88325fd2a57cb6af40de02dc281ee0683cc40_2.shader_test - LLVM diagnostic (remark): :0:0: 12 instructions in function wollny/41d88325fd2a57cb6af40de02dc281ee0683cc40_2.shader_test - Shader Stats: SGPRS: 16 VGPRS: 16 Code Size: 80 LDS: 0 Scratch: 0 Max Waves: 8 Spilled SGPRs: 0 Spilled VGPRs: 0 PrivMem VGPRs: 0 FRAG DCL OUT[0], COLOR DCL TEMP[0..1], LOCAL DCL TEMP[2..57], ARRAY(1), LOCAL IMM[0] FLT32 {0., 0., 0., 0.} 0: MOV TEMP[0].xy, IMM[0]. 1: TEX TEMP[1], TEMP[0], TEMP[2].xyxy, 2D 2: MOV OUT[0], TEMP[1] 3: END wollny/a115868a349cd666b842a0e70f47451b4463903a_2.shader_test - LLVM diagnostic (remark): :0:0: 11 instructions in function wollny/a115868a349cd666b842a0e70f47451b4463903a_2.shader_test - Shader Stats: SGPRS: 24 VGPRS: 16 Code Size: 72 LDS: 0 Scratch: 0 Max Waves: 8 Spilled SGPRs: 0 Spilled VGPRs: 0 PrivMem VGPRs: 0 Thread 0 took 0.13 seconds and compiled 2 shaders (not including SIMD16) with 1 GL context switches $R600_DEBUG=merge ST_DEBUG=tgsi ./run wollny/ ATTENTION: default value of option allow_glsl_extension_directive_midshader overridden by environment. run: state_tracker/st_glsl_to_tgsi.cpp:5783: ureg_dst dst_register(st_translate*, gl_register_file, unsigned int, unsigned int): Assertion `array_id && array_id <= t->num_temp_arrays' failed. => CRASHED <= while processing these shaders: wollny/41d88325fd2a57cb6af40de02dc281ee0683cc40_2.shader_test >> >>> For radeonsi my guess would be that the llvm optimizer works better >>> when the registers are not yet merged, and that would be the reason why >>> register_merge is disabled. >> well at least sometimes it doesn't, low hanging fruit maybe? > Unfortunately, I can't test on radeonsi I can, if you dont mind waiting for an answer sometimes. But maybe even easier: is there an implicit/explicit magic number I can play with to see if it changes anything? ATM it seems like your code improves half the shaders its affecting a lot and hurting the other half bad like it hits an invisible wall. I think one problem could be the relationship between VGPRs and SGPRs used and max Wavefronts achieved. This is somewhat similar to NIR although that changes things all over the place. > > Best, > Gert > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v3 00/13] TGSI: improved live range tracking, also including arrays
Hello Benedict, thanks for all the testing! On 29.04.2018 12:12, Benedikt Schemmer wrote: >> Which are the names of these test? I'd like to check this on r600, >> because here I didn't see any regressions last time I checked. >> > might of course be different on r600 (is bindless available?), > also shader-db is more sensitive to problems than piglit > > 1. tests/spec/arb_bindless_texture/compiler/images/arrays-of-struct.frag > 2. tests/spec/arb_bindless_texture/compiler/samplers/arrays-of-struct.frag Indeed, bindless testures are not available on r600, so it is quite difficult for me to test this. I would guess that parameters related to this might be stored in the TGSI declaration that I currently don't check. If you have time for it, could you send me a TGSI dump of one of these shaders? With "ST_DEBUG=tgsi" this should be possible. > >> For radeonsi my guess would be that the llvm optimizer works better >> when the registers are not yet merged, and that would be the reason why >> register_merge is disabled. > well at least sometimes it doesn't, low hanging fruit maybe? Unfortunately, I can't test on radeonsi Best, Gert ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v3 00/13] TGSI: improved live range tracking, also including arrays
sorry the last one wasnt correct: register merge vs yours Max Increase: SGPRS: 80 -> 96 (20.00 %) (in shaders/deusex_mankind/7c39f71090a9db19ac2e1542ea12804ae6c6495b_4864.shader_test) VGPRS: 64 -> 84 (31.25 %) (in shaders/dirtrally/0859b69789591d7046e211400b1edd9a7cfca734_742.shader_test) Spilled SGPRs: 0 -> 16 (0.00 %) (in shaders/deusex_mankind/d64e2084204e29749639e8fbd9a1e507c7e5e1dd_6840.shader_test) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 24 -> 32 (33.33 %) (in shaders/deusex_mankind/28cac87049d8c833e72296a5a02ea6118f1144e5_5876.shader_test) Scratch size: 28 -> 36 (28.57 %) dwords per thread (in shaders/deusex_mankind/28cac87049d8c833e72296a5a02ea6118f1144e5_5876.shader_test) Code Size: 6988 -> 8036 (15.00 %) bytes (in shaders/cat/1847.shader_test) LDS: 0 -> 0 (0.00 %) blocks Max Waves: 5 -> 7 (40.00 %) (in shaders/ruiner/0967c5fce7fc456496b1cfa25fbb1d1c4dcf9bed_2958.shader_test) Wait states: 0 -> 0 (0.00 %) Max Decrease: SGPRS: 96 -> 72 (-25.00 %) (in shaders/deusex_mankind/c1f098c7b14b1ba291cfa9bba4e41ba91acaba30_3630.shader_test) VGPRS: 80 -> 68 (-15.00 %) (in shaders/dirtrally/710d3319bc986ea003f1a84ec6d3c01b2a8b9ded_2482.shader_test) Spilled SGPRs: 19 -> 0 (-100.00 %) (in shaders/deusex_mankind/0749c9ae23417f918c7286fe502ff5de4cb8e1a0_3276.shader_test) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 9080 -> 8712 (-4.05 %) bytes (in shaders/dirtrally/2e1a8e04b49dee3e39b6f30a7c707f395abf_2480.shader_test) LDS: 0 -> 0 (0.00 %) blocks Max Waves: 8 -> 5 (-37.50 %) (in shaders/deusex_mankind/8dabec49e5b6c3b1cbcbaee194eff69f164d72f4_3968.shader_test) Wait states: 0 -> 0 (0.00 %) PERCENTAGE DELTASShaders SGPRs VGPRs SpillSGPR SpillVGPR PrivVGPR Scratch CodeSize MaxWaves Waits 0ad6 . . . . . . . . . aer 590 . . . . . . . . . alien_isolation 1414 . . . . . . . . . anholt10 . . . . . . . . . bioshock_infinite 2581 .0.04 % . . . .0.01 % . . blackmesa584 . . . . . . . . . cat 573 -0.06 % -0.09 % . . . .0.08 %0.05 % . csgo1392 . . . . . . . . . deadisland_definitive 1776 . . . . . . . . . deadisland_original11602 . . . . . . . . . deadisland_riptide_..293 . . . . . . . . . deusex_mankind 5051 -0.06 % . -4.64 % . 33.33 % 28.57 % . -0.01 % . dirtrally7870.04 %0.49 % -0.46 % . . .0.03 % -0.23 % . dolphin 22 . . . . . . . . . dyinglight 4012 . . . . . . . . . eurotruck2 216 . . . . . . . . . f1_2015 746 . -0.03 %1.77 % . . .0.05 % . . glamor16 . . . . . . . . . hl2ep1 294 . . . . . . . . . hl2ep2 154 . . . . . . . . . hl2lostcoast 66 . . . . . . . . . hlsl3582 . . . . . . . . . humus-celshading 4 . . . . . . . . . humus-domino 6 . . . . . . . . . humus-dynamicbranching24 . . . . . . . . . humus-hdr 10 . . . . . . . . . humus-portals 2 . . . . . . . .
Re: [Mesa-dev] [PATCH v3 00/13] TGSI: improved live range tracking, also including arrays
Real world (old vs new) nothing vs register merge: PERCENTAGE DELTASShaders SGPRs VGPRs SpillSGPR SpillVGPR PrivVGPR Scratch CodeSize MaxWaves Waits 0ad6 . . . . . . . . . aer 590 .0.26 % -20.00 % . . .0.34 % . . alien_isolation 1414 . . . . . . . . . anholt10 . . . . . . . . . bioshock_infinite 2581 -0.02 % . . . . .0.12 % . . blackmesa584 . . . . . . . . . cat 573 . -0.02 % . . . .0.11 % . . csgo1392 . . -0.88 % . . . -0.03 % . . deadisland_definitive 17760.06 % . . . . .0.15 %0.01 % . deadisland_original11602 . . . . . .0.05 % . . deadisland_riptide_..293 -0.06 %0.06 % . . . .0.32 % . . deusex_mankind 50510.14 % -0.01 % -1.57 % . . .0.18 % . . dirtrally787 -0.04 %0.15 %1.08 % . . .0.27 % -0.07 % . dolphin 22 . . . . . . . . . dyinglight 4012 .0.05 % . . . .0.34 % -0.01 % . eurotruck2 216 . . . . . . . . . f1_2015 746 -0.04 %0.02 %0.93 % . . .0.08 % . . glamor16 -2.33 % . . . . .3.97 % . . hl2ep1 294 . . . . . . . . . hl2ep2 154 . . . . . . . . . hl2lostcoast 66 . . . . . . . . . hlsl3582 . . . . . . -0.14 % . . humus-celshading 4 . . . . . . . . . humus-domino 6 . . . . . . . . . humus-dynamicbranching24 . . . . . . . . . humus-hdr 10 . . . . . . . . . humus-portals 2 . . . . . . . . . humus-volumetricfog.. 6 . . . . . . . . . kerbal 1016 .0.11 % . . . .0.31 % . . larago 664 . . . . . .0.01 % . . madmax 354 . -0.08 % . . . . -0.02 %0.04 % . metro2033redux 4410 .0.03 % . . . .0.06 % -0.01 % . nexuiz80 . . . . . . . . . ruiner 685 . -0.02 % . . . .0.04 % . . sauerbraten7 . . . . . . . . . serioussam2017 736 . .7.59 % . . .0.05 % . . soma 436 . . . . . . . . . specops 1814 . . . . . .0.35 % . . stellaris434 . . . . . .0.11 % . . supertuxkart 4 . . . . . . . . . talos762 -0.02 % .0.09 % . . .0.01 % . . tesseract
Re: [Mesa-dev] [PATCH v3 00/13] TGSI: improved live range tracking, also including arrays
Am 29.04.2018 um 11:34 schrieb Gert Wollny: > Am Sonntag, den 29.04.2018, 10:43 +0200 schrieb Benedikt Schemmer: >> Hi Gert, >> >> couldn't resist at least to try what would happen if I enable >> register merge for radeonsi: >> >> PERCENTAGE DELTASShaders SGPRs VGPRs SpillSGPR >> SpillVGPR PrivVGPR Scratch CodeSize MaxWaves Waits >> piglit 80732 -0.16 % -0.02 >> % . .0.87 %0.86 %0.04 % . . >> -- >> >> All affected 513 -17.58 % -2.30 >> % . .4.12 %5.87 %1.73 %0.10 % . >> -- >> >> Total 80732 -0.16 % -0.02 >> % . .0.87 %0.86 %0.04 % . . >> >> I had already removed the defines around the debug code so thats also >> happily outputting data. >> >> fails with two piglit shaders: > Which are the names of these test? I'd like to check this on r600, > because here I didn't see any regressions last time I checked. > might of course be different on r600 (is bindless available?), also shader-db is more sensitive to problems than piglit 1. tests/spec/arb_bindless_texture/compiler/images/arrays-of-struct.frag 2. tests/spec/arb_bindless_texture/compiler/samplers/arrays-of-struct.frag > >> Real world is a little different: > > I guess these tests refer to enabled register_merge - without and with > this patch set, no? > > Out of curiosity, did you also look at how enabling register_merge > (before this series) impacts the result as compared to the normal > operation of radeonsi? no I didn't but if I do (old vs new) nothing vs register merge: PERCENTAGE DELTASShaders SGPRs VGPRs SpillSGPR SpillVGPR PrivVGPR Scratch CodeSize MaxWaves Waits piglit 80732 . . . . . .0.06 % . . -- All affected 4350.89 %0.36 % . . . .3.61 % -0.06 % . -- Total 80732 . . . . . .0.06 % . . register merge vs yours: PERCENTAGE DELTASShaders SGPRs VGPRs SpillSGPR SpillVGPR PrivVGPR Scratch CodeSize MaxWaves Waits piglit 80732 -0.16 % -0.02 % . .0.87 % 0.86 % -0.02 % . . -- All affected 83 -56.92 % -14.22 % . . 11.67 % 16.67 % -2.86 %0.93 % . -- Total 80732 -0.16 % -0.02 % . .0.87 % 0.86 % -0.02 % . . nothing vs yours: PERCENTAGE DELTASShaders SGPRs VGPRs SpillSGPR SpillVGPR PrivVGPR Scratch CodeSize MaxWaves Waits piglit 80732 -0.16 % -0.02 % . .0.87 % 0.86 %0.04 % . . -- All affected 513 -17.58 % -2.30 % . .4.12 % 5.87 %1.73 %0.10 % . -- Total 80732 -0.16 % -0.02 % . .0.87 % 0.86 %0.04 % . . > > >> If theres an easy way to figure out when your code makes it worse and >> when its an improvement this would be really nice. > > My insentive for this series was, that on r600 the arrays are allocated > before the final optimization pass on the byte code that requires that > the number of registers is <= 124. When I started this no spilling was > implemented, and shaders with too many arrays and registers would > simply fail. Now spilling is impelmented, but AFAIK reducing the > numbers of registers in the final optimization pass does not result in > changed spilling, so bringing down the number of registers before tgsi- > to-bytecode is still of interest. > > For radeonsi my guess would be that the llvm optimizer works better > when the registers are not yet merged, and that would be the reason why > register_merge is disabled. well at least sometimes it doesn't, low hanging fruit
Re: [Mesa-dev] [PATCH v3 00/13] TGSI: improved live range tracking, also including arrays
Am Sonntag, den 29.04.2018, 10:43 +0200 schrieb Benedikt Schemmer: > Hi Gert, > > couldn't resist at least to try what would happen if I enable > register merge for radeonsi: > > PERCENTAGE DELTASShaders SGPRs VGPRs SpillSGPR > SpillVGPR PrivVGPR Scratch CodeSize MaxWaves Waits > piglit 80732 -0.16 % -0.02 > % . .0.87 %0.86 %0.04 % . . > -- > > All affected 513 -17.58 % -2.30 > % . .4.12 %5.87 %1.73 %0.10 % . > -- > > Total 80732 -0.16 % -0.02 > % . .0.87 %0.86 %0.04 % . . > > I had already removed the defines around the debug code so thats also > happily outputting data. > > fails with two piglit shaders: Which are the names of these test? I'd like to check this on r600, because here I didn't see any regressions last time I checked. > Real world is a little different: I guess these tests refer to enabled register_merge - without and with this patch set, no? Out of curiosity, did you also look at how enabling register_merge (before this series) impacts the result as compared to the normal operation of radeonsi? > If theres an easy way to figure out when your code makes it worse and > when its an improvement this would be really nice. My insentive for this series was, that on r600 the arrays are allocated before the final optimization pass on the byte code that requires that the number of registers is <= 124. When I started this no spilling was implemented, and shaders with too many arrays and registers would simply fail. Now spilling is impelmented, but AFAIK reducing the numbers of registers in the final optimization pass does not result in changed spilling, so bringing down the number of registers before tgsi- to-bytecode is still of interest. For radeonsi my guess would be that the llvm optimizer works better when the registers are not yet merged, and that would be the reason why register_merge is disabled. In any case, Timothy wrote in this thread [1] (last message) that he had similar patches for NIR. Best, Gert [1] https://patchwork.freedesktop.org/patch/189842/ ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v3 00/13] TGSI: improved live range tracking, also including arrays
Hi Gert, couldn't resist at least to try what would happen if I enable register merge for radeonsi: PERCENTAGE DELTASShaders SGPRs VGPRs SpillSGPR SpillVGPR PrivVGPR Scratch CodeSize MaxWaves Waits piglit 80732 -0.16 % -0.02 % . .0.87 % 0.86 %0.04 % . . -- All affected 513 -17.58 % -2.30 % . .4.12 % 5.87 %1.73 %0.10 % . -- Total 80732 -0.16 % -0.02 % . .0.87 % 0.86 %0.04 % . . I had already removed the defines around the debug code so thats also happily outputting data. fails with two piglit shaders: [require] GLSL >= 3.30 [fragment shader] // [config] // expect_result: pass // glsl_version: 3.30 // require_extensions: GL_ARB_bindless_texture GL_ARB_shader_image_load_store // [end config] #version 330 #extension GL_ARB_bindless_texture: require #extension GL_ARB_shader_image_load_store: enable #extension GL_ARB_arrays_of_arrays: enable struct s { writeonly image2D img[3][2]; int y; }; void main() { s a[2][4]; imageStore(a[0][0].img[0][0], ivec2(0, 0), vec4(1, 2, 3, 4)); } and [require] GLSL >= 3.30 [fragment shader] // [config] // expect_result: pass // glsl_version: 3.30 // require_extensions: GL_ARB_bindless_texture // [end config] #version 330 #extension GL_ARB_bindless_texture: require #extension GL_ARB_arrays_of_arrays: enable struct s { sampler2D tex[3][2]; int y; }; out vec4 color; void main() { s a[2][4]; color = texture2D(a[0][0].tex[0][0], vec2(0, 0)); } Real world is a little different: Max Increase: SGPRS: 72 -> 96 (33.33 %) (in shaders/cat/1787.shader_test) VGPRS: 64 -> 84 (31.25 %) (in shaders/dirtrally/0859b69789591d7046e211400b1edd9a7cfca734_742.shader_test) Spilled SGPRs: 0 -> 16 (0.00 %) (in shaders/deusex_mankind/d64e2084204e29749639e8fbd9a1e507c7e5e1dd_6840.shader_test) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 24 -> 32 (33.33 %) (in shaders/deusex_mankind/28cac87049d8c833e72296a5a02ea6118f1144e5_5876.shader_test) Scratch size: 28 -> 36 (28.57 %) dwords per thread (in shaders/deusex_mankind/28cac87049d8c833e72296a5a02ea6118f1144e5_5876.shader_test) Code Size: 6988 -> 8036 (15.00 %) bytes (in shaders/cat/1847.shader_test) LDS: 0 -> 0 (0.00 %) blocks Max Waves: 5 -> 7 (40.00 %) (in shaders/ruiner/0967c5fce7fc456496b1cfa25fbb1d1c4dcf9bed_2958.shader_test) Wait states: 0 -> 0 (0.00 %) Max Decrease: SGPRS: 104 -> 64 (-38.46 %) (in shaders/deusex_mankind/480ddf21b1076d36f9ffd9911389656b5d8e12cb_2878.shader_test) VGPRS: 44 -> 36 (-18.18 %) (in shaders/ruiner/0967c5fce7fc456496b1cfa25fbb1d1c4dcf9bed_2958.shader_test) Spilled SGPRs: 19 -> 0 (-100.00 %) (in shaders/deusex_mankind/0749c9ae23417f918c7286fe502ff5de4cb8e1a0_3276.shader_test) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 17576 -> 17276 (-1.71 %) bytes (in shaders/ruiner/75b96ff36f5328b9ff9366f0d0fd58a1046f51bc_3053.shader_test) LDS: 0 -> 0 (0.00 %) blocks Max Waves: 8 -> 5 (-37.50 %) (in shaders/deusex_mankind/8dabec49e5b6c3b1cbcbaee194eff69f164d72f4_3968.shader_test) Wait states: 0 -> 0 (0.00 %) PERCENTAGE DELTASShaders SGPRs VGPRs SpillSGPR SpillVGPR PrivVGPR Scratch CodeSize MaxWaves Waits 0ad6 . . . . . . . . . aer 590 .0.26 % -20.00 % . . .0.34 % . . alien_isolation 1414 . . . . . . . . . anholt10 . . . . . . . . . bioshock_infinite 2581 -0.02 %0.03 % . . . .0.13 % . . blackmesa584 . . . . . . . . . cat 573 -0.06 % -0.12 % . . . .0.20 %0.05 % . csgo1392 . . -0.88 % . . . -0.03 % . . deadisland_definitive 17760.06 % . . . . .0.15 %0.01 % . deadisland_original11602 . . . . . .0.05 % . . deadisland_riptide_..293 -0.06 %0.06 % . . . .0.32 % . . deusex_mankind 50510.08 %
Re: [Mesa-dev] [PATCH v3 00/13] TGSI: improved live range tracking, also including arrays
Hello Benedikt, Am Sonntag, den 29.04.2018, 00:06 +0200 schrieb Benedikt Schemmer: > Hi Gert > > Am 28.04.2018 um 23:51 schrieb Gert Wollny: > > Am Samstag, den 28.04.2018, 22:43 +0200 schrieb Benedikt Schemmer: > > > The patches apply cleanly, however I just did a shader-db test > > > run > > > and can't find a difference with your patch > > > applied, am I doing something wrong? > > > > AFAIK radeonsi doesn't use the register-merge optimizer in TGSI. > > > > Ah, ok. Was wondering why your debug code doesn't output anything. > Makes sense now ;) Not exactly, the reason there is no output is because -DNDEBUG is set. Without it the statistics should also be printed out on radeonsi, but thinking of it I should probably disable it when register_merge is not accessed, because without this the numbers will be inflated and don't have much meaning. > So is this useless on radeonsi? Indeed. > Seemed interesting to me. :) it certainly helps on r600 > > > compile times went up though: > > > > This is strange, because "see above". Did you compile with debug > > information and c++11 or higher enables? ... > > > > > not intentionally: Then you should actually not run any code that this series adds to mesa. I checked again, apart from the debugging output nothing will ever be run if a drivers that report PIPE_SHADER_CAP_TGSI_SKIP_MERGE_REGISTERS != 0 (as does radeonsi). Best, Gert ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v3 00/13] TGSI: improved live range tracking, also including arrays
Hi Gert Am 28.04.2018 um 23:51 schrieb Gert Wollny: > Am Samstag, den 28.04.2018, 22:43 +0200 schrieb Benedikt Schemmer: >> The patches apply cleanly, however I just did a shader-db test run >> and can't find a difference with your patch >> applied, am I doing something wrong? > > AFAIK radeonsi doesn't use the register-merge optimizer in TGSI. > Ah, ok. Was wondering why your debug code doesn't output anything. Makes sense now ;) So is this useless on radeonsi? Seemed interesting to me. >> >> compile times went up though: > This is strange, because "see above". Did you compile with debug > information and c++11 or higher enables? In this case there is one > access to a static variable (first patch) that per c++11 standard > should be thread save, which means that there might be a mutex > protecting access to that variable, and this would explain the longer > runtime in a multi-threaded environment. > not intentionally: prefix: /usr/local exec_prefix: ${prefix} libdir: ${prefix}/lib includedir: ${prefix}/include OpenGL: yes (ES1: yes ES2: yes) OSMesa: libOSMesa DRI platform:drm DRI drivers: i915 i965 nouveau r200 radeon swrast DRI driver dir: ${prefix}/lib/dri GLX: DRI-based EGL: yes EGL drivers: builtin:egl_dri2 builtin:egl_dri3 GBM: yes EGL/Vulkan/VL platforms: x11 wayland drm Vulkan drivers: intel radeon Vulkan ICD dir: ${datarootdir}/vulkan/icd.d llvm:yes llvm-config: llvm-config-6.0 llvm-version:6.0.0 Gallium drivers: nouveau svga r600 r300 i915 virgl radeonsi swrast Gallium st: mesa xa xvmc vdpau omx_bellagio va nine clover HUD extra stats: yes HUD lmsensors: yes Shared libs: yes Static libs: no Shared-glapi:yes CFLAGS: -O3 -fstack-protector-strong -Wall -Wextra -Werror=format-security -fno-omit-frame-pointer -Wall -Werror=implicit-function-declaration -Werror=missing-prototypes -Wmissing-prototypes -fno-math-errno -fno-trapping-math -std=c99 CXXFLAGS:-O3 -fstack-protector-strong -Wall -Wextra -Werror=format-security -fno-omit-frame-pointer -Wall -fno-math-errno -fno-trapping-math CXX11_CXXFLAGS: LDFLAGS: -Bsymbolic-functions -z relro Macros: -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -D_GNU_SOURCE -DUSE_SSE41 -DUSE_GCC_ATOMIC_BUILTINS -DNDEBUG -DTEXTURE_FLOAT_ENABLED -DUSE_X86_64_ASM -DHAVE_SYS_SYSCTL_H -DHAVE_STRTOF -DHAVE_MKOSTEMP -DHAVE_TIMESPEC_GET -DHAVE_STRTOD_L -DHAVE_DL_ITERATE_PHDR -DHAVE_POSIX_MEMALIGN -DHAVE_ZLIB -DHAVE_LINUX_FUTEX_H -DHAVE_GALLIUM_EXTRA_HUD=1 -DHAVE_LIBSENSORS=1 -DHAVE_LIBDRM -DGLX_USE_DRM -DGLX_INDIRECT_RENDERING -DGLX_DIRECT_RENDERING -DGLX_USE_TLS -DHAVE_X11_PLATFORM -DHAVE_WAYLAND_PLATFORM -DWL_HIDE_DEPRECATED -DHAVE_DRM_PLATFORM -DHAVE_DRI3 -DHAVE_DRI3_MODIFIERS -DENABLE_SHADER_CACHE -DHAVE_MINCORE -DHAVE_ST_VDPAU -DHAVE_LLVM=0x0600 -DMESA_LLVM_VERSION_PATCH=0 LLVM_CFLAGS: -I/usr/lib/llvm-6.0/include -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS LLVM_CXXFLAGS: -I/usr/lib/llvm-6.0/include -std=c++0x -std=c++11 -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS LLVM_CPPFLAGS: -I/usr/lib/llvm-6.0/include -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS LLVM_LDFLAGS:-L/usr/lib/llvm-6.0/lib PYTHON2: python2.7 Run 'make' to build Mesa Cheers, Benedikt > Best, > Gert > >> >> before: >> Thread 3 took 113.72 seconds and compiled 17899 shaders (not >> including SIMD16) with 2232 GL context switches >> Thread 5 took 113.23 seconds and compiled 17767 shaders (not >> including SIMD16) with 2150 GL context switches >> Thread 7 took 116.63 seconds and compiled 18030 shaders (not >> including SIMD16) with 2219 GL context switches >> Thread 1 took 117.10 seconds and compiled 17966 shaders (not >> including SIMD16) with 2154 GL context switches >> Thread 4 took 113.76 seconds and compiled 18097 shaders (not >> including SIMD16) with 2285 GL context switches >> Thread 2 took 113.61 seconds and compiled 17111 shaders (not >> including SIMD16) with 1934 GL context switches >> Thread 6 took 118.93 seconds and compiled 17887 shaders (not >> including SIMD16) with 2205 GL context switches >> Thread 0 took 112.91 seconds and compiled 18232 shaders (not >> including SIMD16) with 2321 GL context switches >> >> with your patch: >> Thread 1 took 119.41 seconds and compiled 18495 shaders (not >> including SIMD16) with 2237 GL context switches >> Thread 7 took 122.11 seconds and compiled 17228 shaders (not >> including SIMD16) with 2105 GL context switches >> Thread 4 took 120.57
Re: [Mesa-dev] [PATCH v3 00/13] TGSI: improved live range tracking, also including arrays
Am Samstag, den 28.04.2018, 22:43 +0200 schrieb Benedikt Schemmer: > The patches apply cleanly, however I just did a shader-db test run > and can't find a difference with your patch > applied, am I doing something wrong? AFAIK radeonsi doesn't use the register-merge optimizer in TGSI. > > compile times went up though: This is strange, because "see above". Did you compile with debug information and c++11 or higher enables? In this case there is one access to a static variable (first patch) that per c++11 standard should be thread save, which means that there might be a mutex protecting access to that variable, and this would explain the longer runtime in a multi-threaded environment. Best, Gert > > before: > Thread 3 took 113.72 seconds and compiled 17899 shaders (not > including SIMD16) with 2232 GL context switches > Thread 5 took 113.23 seconds and compiled 17767 shaders (not > including SIMD16) with 2150 GL context switches > Thread 7 took 116.63 seconds and compiled 18030 shaders (not > including SIMD16) with 2219 GL context switches > Thread 1 took 117.10 seconds and compiled 17966 shaders (not > including SIMD16) with 2154 GL context switches > Thread 4 took 113.76 seconds and compiled 18097 shaders (not > including SIMD16) with 2285 GL context switches > Thread 2 took 113.61 seconds and compiled 17111 shaders (not > including SIMD16) with 1934 GL context switches > Thread 6 took 118.93 seconds and compiled 17887 shaders (not > including SIMD16) with 2205 GL context switches > Thread 0 took 112.91 seconds and compiled 18232 shaders (not > including SIMD16) with 2321 GL context switches > > with your patch: > Thread 1 took 119.41 seconds and compiled 18495 shaders (not > including SIMD16) with 2237 GL context switches > Thread 7 took 122.11 seconds and compiled 17228 shaders (not > including SIMD16) with 2105 GL context switches > Thread 4 took 120.57 seconds and compiled 17989 shaders (not > including SIMD16) with 2165 GL context switches > Thread 5 took 119.79 seconds and compiled 17709 shaders (not > including SIMD16) with 2190 GL context switches > Thread 6 took 121.95 seconds and compiled 17804 shaders (not > including SIMD16) with 2209 GL context switches > Thread 2 took 121.43 seconds and compiled 17819 shaders (not > including SIMD16) with 2167 GL context switches > Thread 0 took 117.16 seconds and compiled 18180 shaders (not > including SIMD16) with 2151 GL context switches > Thread 3 took 124.79 seconds and compiled 17765 shaders (not > including SIMD16) with 2176 GL context switches > > > > Radeon RX 560 Series (POLARIS11, DRM 3.26.0, 4.17.0-rc2+, LLVM 6.0.0) > > Mesa git of today > > PERCENTAGE DELTASShaders SGPRs VGPRs SpillSGPR > SpillVGPR PrivVGPR Scratch CodeSize MaxWaves Waits > 0ad6 . . . . > . . . . . > aer 590 . . . . > . . . . . > alien_isolation 1414 . . . . > . . . . . > anholt10 . . . . > . . . . . > bioshock_infinite 2581 . . . . > . . . . . > blackmesa584 . . . . > . . . . . > cat 573 . . . . > . . . . . > csgo1392 . . . . > . . . . . > deadisland_definitive 1776 . . . . > . . . . . > deadisland_original11602 . . . . > . . . . . > deadisland_riptide_..293 . . . . > . . . . . > deusex_mankind 5051 . . . . > . . . . . > dirtrally787 . . . . > . . . . . > dolphin 22 . . . . > . . . . . > dyinglight 4012 . . . . > . . . . . > eurotruck2 216 . . . . > . . . . . > f1_2015 746 . . . . > . . . . . > glamor16 . . . . > . . . . . > hl2ep1 294
Re: [Mesa-dev] [PATCH v3 00/13] TGSI: improved live range tracking, also including arrays
The patches apply cleanly, however I just did a shader-db test run and can't find a difference with your patch applied, am I doing something wrong? compile times went up though: before: Thread 3 took 113.72 seconds and compiled 17899 shaders (not including SIMD16) with 2232 GL context switches Thread 5 took 113.23 seconds and compiled 17767 shaders (not including SIMD16) with 2150 GL context switches Thread 7 took 116.63 seconds and compiled 18030 shaders (not including SIMD16) with 2219 GL context switches Thread 1 took 117.10 seconds and compiled 17966 shaders (not including SIMD16) with 2154 GL context switches Thread 4 took 113.76 seconds and compiled 18097 shaders (not including SIMD16) with 2285 GL context switches Thread 2 took 113.61 seconds and compiled 17111 shaders (not including SIMD16) with 1934 GL context switches Thread 6 took 118.93 seconds and compiled 17887 shaders (not including SIMD16) with 2205 GL context switches Thread 0 took 112.91 seconds and compiled 18232 shaders (not including SIMD16) with 2321 GL context switches with your patch: Thread 1 took 119.41 seconds and compiled 18495 shaders (not including SIMD16) with 2237 GL context switches Thread 7 took 122.11 seconds and compiled 17228 shaders (not including SIMD16) with 2105 GL context switches Thread 4 took 120.57 seconds and compiled 17989 shaders (not including SIMD16) with 2165 GL context switches Thread 5 took 119.79 seconds and compiled 17709 shaders (not including SIMD16) with 2190 GL context switches Thread 6 took 121.95 seconds and compiled 17804 shaders (not including SIMD16) with 2209 GL context switches Thread 2 took 121.43 seconds and compiled 17819 shaders (not including SIMD16) with 2167 GL context switches Thread 0 took 117.16 seconds and compiled 18180 shaders (not including SIMD16) with 2151 GL context switches Thread 3 took 124.79 seconds and compiled 17765 shaders (not including SIMD16) with 2176 GL context switches Radeon RX 560 Series (POLARIS11, DRM 3.26.0, 4.17.0-rc2+, LLVM 6.0.0) Mesa git of today PERCENTAGE DELTASShaders SGPRs VGPRs SpillSGPR SpillVGPR PrivVGPR Scratch CodeSize MaxWaves Waits 0ad6 . . . . . . . . . aer 590 . . . . . . . . . alien_isolation 1414 . . . . . . . . . anholt10 . . . . . . . . . bioshock_infinite 2581 . . . . . . . . . blackmesa584 . . . . . . . . . cat 573 . . . . . . . . . csgo1392 . . . . . . . . . deadisland_definitive 1776 . . . . . . . . . deadisland_original11602 . . . . . . . . . deadisland_riptide_..293 . . . . . . . . . deusex_mankind 5051 . . . . . . . . . dirtrally787 . . . . . . . . . dolphin 22 . . . . . . . . . dyinglight 4012 . . . . . . . . . eurotruck2 216 . . . . . . . . . f1_2015 746 . . . . . . . . . glamor16 . . . . . . . . . hl2ep1 294 . . . . . . . . . hl2ep2 154 . . . . . . . . . hl2lostcoast 66 . . . . . . . . . hlsl3582 . . . . . . . . . humus-celshading 4 . . . . . . . . . humus-domino 6 . . . . . . . . .