Re: [Mesa-dev] r600g shader optimization
On Wed, 2011-10-19 at 10:49 -0400, Tom Stellard wrote:
> On Fri, 2011-10-07 at 10:14 -0400, Vadim Girlin wrote:
> > Hi,
> >
> > Recently I've been working on the shader optimization for r600g, and
> > now I have the initial working implementation of a simple alu
> > scheduler and register allocator. It has no piglit regressions,
> > though it's still a work in progress and there are known issues with
> > some applications.
> >
> > I've pushed the working branch to github:
> > https://github.com/VadimGirlin/mesa/tree/r600_shader_opt
>
> Hi Vadim,
>
> What's the current status of this branch? Is there anything in there
> that is stable and ready to merge?

I think it's not ready yet. I'm still working on features such as barrier bit usage for parallel execution of the alu and fetch instructions. Experiments with some relatively simple algorithms for this don't show any benefit, so I'm thinking about rewriting the scheduler to use a more complex approach.

So far, simple unification of the fetch instructions to minimize mixing of the different clause types (without using the barrier bit for parallel execution) gives the best result for me. This is already implemented in my branch, though it might need a still-uncommitted patch to reduce register pressure in some cases.

There are also regressions with some applications. Currently I'm mostly working on the new features, but at least all known issues should probably be fixed before trying to merge it. Finally, it will need cleanup and coding style fixes.

Vadim

> -Tom

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
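To illustrate why keeping fetch instructions together can help, here is a minimal sketch that counts how many clauses an instruction sequence would need if a new clause starts every time the clause type changes. It is not code from the branch: the `clause_type` enum and `count_clauses` function are hypothetical names, and the model assumes only two clause types and ignores clause size limits and data dependencies.

```c
#include <stddef.h>

/* Hypothetical clause types; a simplification for illustration only. */
enum clause_type { CLAUSE_ALU, CLAUSE_FETCH };

/*
 * Count the clauses needed for a linear instruction sequence, assuming a
 * new clause begins whenever the clause type differs from the previous
 * instruction's type. Clause size limits are deliberately ignored.
 */
size_t count_clauses(const enum clause_type *insts, size_t n)
{
    size_t clauses = 0;

    for (size_t i = 0; i < n; i++) {
        if (i == 0 || insts[i] != insts[i - 1])
            clauses++;
    }
    return clauses;
}
```

Under this simplified model, the interleaved order ALU, FETCH, ALU, FETCH needs four clauses, while the same instructions ordered FETCH, FETCH, ALU, ALU need only two. Whether a given reordering is legal depends on the dependencies between the instructions, which this sketch does not check.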
Re: [Mesa-dev] r600g shader optimization
Hi Vadim,

wow, quite impressive. Does it also contain peephole optimisation? I tried to implement that once, but failed because I never got all the dependencies between opcodes resolved correctly.

Rescheduling export instructions and setting the barrier flag of CF instructions correctly can also be quite an improvement.

Just a side note: in commit "r600g: make some functions in r600_asm.c externally accessible" you make a whole bunch of functions externally accessible, but didn't add a proper prefix like r600_ to the function names. That could lead to a bit of confusion when somebody else tries to hack on the code.

Regards,
Christian.

On Friday, 07.10.2011, at 18:14 +0400, Vadim Girlin wrote:
> Hi,
>
> Recently I've been working on the shader optimization for r600g, and
> now I have the initial working implementation of a simple alu scheduler
> and register allocator. It has no piglit regressions, though it's still
> a work in progress and there are known issues with some applications.
>
> I've pushed the working branch to github:
> https://github.com/VadimGirlin/mesa/tree/r600_shader_opt
>
> Currently it supports evergreen only, but I'm planning to make it work
> with other chips too.
>
> It uses struct r600_bytecode as the source, converting it to an
> SSA-based internal representation. I'm going to implement some
> optimization passes at that phase, but currently it then goes straight
> to the final steps: register allocation, alu scheduling, and building
> new bytecode.
>
> I'm attaching as an example the dump for one of the shaders in
> glxgears. You can get such a dump for all shaders before and after
> processing by setting the R600_OPT_DUMP environment variable to 2.
> Setting this variable to 1 will only print some information for the
> processed shaders: size, number of gprs, and number of alu instruction
> groups.
>
> Vadim
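As a rough illustration of the two R600_OPT_DUMP levels described in the quoted mail, the sketch below maps the variable's value to a dump level. Only the variable name and the meanings of 1 and 2 come from the mail; the enum, the function names, and the choice to treat any other value as "no dump" are assumptions made for this example and are not taken from the branch.

```c
#include <stdlib.h>

/* Hypothetical names; only the values 1 and 2 are described in the mail. */
enum opt_dump_level {
    OPT_DUMP_NONE  = 0, /* assumption: unset or unrecognized value */
    OPT_DUMP_STATS = 1, /* size, number of gprs, number of alu groups */
    OPT_DUMP_FULL  = 2  /* full shader dump before and after processing */
};

/* Translate the textual value of the variable into a dump level. */
enum opt_dump_level parse_opt_dump(const char *value)
{
    if (value == NULL)
        return OPT_DUMP_NONE;

    switch (strtol(value, NULL, 10)) {
    case 1:
        return OPT_DUMP_STATS;
    case 2:
        return OPT_DUMP_FULL;
    default:
        /* Assumption: anything else disables dumping. */
        return OPT_DUMP_NONE;
    }
}

/* Read the level from the environment of the current process. */
enum opt_dump_level get_opt_dump_level(void)
{
    return parse_opt_dump(getenv("R600_OPT_DUMP"));
}
```

With this sketch, running an application with R600_OPT_DUMP set to 2 in its environment would select the full before/after dump, and 1 would select the summary only.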
Re: [Mesa-dev] r600g shader optimization
On Sat, 2011-10-08 at 11:35 +0200, Christian König wrote:
> Hi Vadim,
>
> wow, quite impressive. Does it also contain peephole optimisation?

Not yet. I want to create a stable framework first (conversion to the internal representation and back to the bytecode), and this is still not finished. Only some simple optimizations that are integrated into these phases are currently implemented, e.g. dead code elimination. Once this task is completed, it will be possible to insert any optimizations in the middle, using the SSA form, def/use chains and other precomputed information, without the need to bother about most of the hardware-specific details.

> I tried to implement that once, but failed because I never got all the
> dependencies between opcodes resolved correctly.
>
> Rescheduling export instructions and setting the barrier flag of CF
> instructions correctly can also be quite an improvement.

Currently I'm working on global scheduling, and I hope it will handle this.

> Just a side note: in commit "r600g: make some functions in r600_asm.c
> externally accessible" you make a whole bunch of functions externally
> accessible, but didn't add a proper prefix like r600_ to the function
> names. That could lead to a bit of confusion when somebody else tries
> to hack on the code.

I'll fix this, though I'll probably use some universal function to compute all instruction properties once and store them as flags, instead of using the existing is_*_inst functions all over the code. Bank swizzle checking should also be done in a way that is more efficient for my code, so I won't need check_and_set_bank_swizzles either. This commit is a temporary solution. There are a lot of such things in this code that were implemented in a quick and simple way just to check whether this would work at all; they will be improved later.

Vadim

> Regards,
> Christian.
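The idea of computing instruction properties once and storing them as flags, rather than calling predicate functions repeatedly, could look roughly like the sketch below. Everything in it is hypothetical: the opcode names, the flag names, the `struct inst` layout, and the classification in `compute_inst_flags` are placeholders for illustration, not the real r600 opcode tables or the code in the branch.

```c
#include <stdint.h>

/* Hypothetical property flags, one bit per property. */
enum inst_flag {
    IF_TRANS_ONLY  = 1u << 0, /* must run in the transcendental slot */
    IF_VECTOR_ONLY = 1u << 1, /* must run in a vector slot */
    IF_REDUCTION   = 1u << 2  /* occupies several slots as one operation */
};

/* Placeholder opcodes; not the real hardware encoding. */
enum { OP_ADD, OP_MUL, OP_SIN, OP_DOT4 };

struct inst {
    unsigned opcode;
    uint32_t flags; /* filled in once by inst_init() */
};

/* Single place where properties are derived from the opcode. */
uint32_t compute_inst_flags(unsigned opcode)
{
    switch (opcode) {
    case OP_SIN:
        return IF_TRANS_ONLY;
    case OP_DOT4:
        return IF_VECTOR_ONLY | IF_REDUCTION;
    default:
        return 0; /* assumption: may run in any slot */
    }
}

/* Compute the flags once, when the instruction is created. */
void inst_init(struct inst *i, unsigned opcode)
{
    i->opcode = opcode;
    i->flags = compute_inst_flags(opcode);
}

/* Cheap query that replaces repeated predicate calls. */
int inst_has(const struct inst *i, uint32_t flag)
{
    return (i->flags & flag) != 0;
}
```

Later passes would then test `inst_has(&i, IF_REDUCTION)` instead of re-deriving the property from the opcode each time, which keeps the opcode knowledge in one function.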