Re: [Mesa-dev] r600g: status of the r600-sb branch
On 04/21/2013 04:04 AM, Marek Olšák wrote:

Ah, I didn't know you had any other env vars. It's preferable to have as many boolean flags as possible handled by a single env var, because it's easier to use (R600_DUMP_SHADERS counts as a pretty ugly list of boolean flags hidden behind a magic number). Feel free to have separate env vars for more complex parameters. I skimmed through some of your code and the coding style looks good. I'm also okay with C++; it really seems like the right choice here. However, I agree with the argument that one header file per .cpp might not always be a good idea, especially if the header file is pretty small.

Thanks for reviewing. I pushed to my repo the branch with the following changes:

- changes to existing r600g code split out of the main big patch
- small header files merged into sb_pass.h, sb_ir.h, sb_bc.h
- added new R600_DEBUG flags to replace multiple env vars (a sketch of the flag-table approach follows at the end of this message):
    sb     - Enable optimization of graphics shaders
    sbcl   - Enable optimization of compute shaders
    sbdry  - Dry run, optimize but don't use new bytecode
    sbstat - Print optimization statistics (currently the time only)
    sbdump - Print IR after some passes
- added debug_id (shader index) to struct r600_bytecode; ids are assigned to each shader in r600_bytecode_init and printed in the shader dump header. This is intended to avoid reinventing shader numbering in different places for dumps and debugging.
- some minor cleanups

Updated branch can be found here: http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-sb-2

Vadim

Marek

On Sat, Apr 20, 2013 at 11:02 AM, Vadim Girlin vadimgir...@gmail.com wrote:
On 04/20/2013 03:11 AM, Marek Olšák wrote:

Please don't add any new environment variables and use R600_DEBUG instead. The other environment variables are deprecated.

I agree, those vars probably need some cleanup; they were added before R600_DEBUG appeared. Though I'm afraid some of my options won't fit well into the R600_DEBUG flags unless we add support for name/value pairs with optional custom parsers. E.g. I have a group of env vars to define the range of included/excluded shaders for optimization and the mode (include/exclude/off). I thought about doing this with a single var and a custom parser to specify the range, e.g. as 10-20, but after all it's just a debug feature, not intended for everyday use, and so far I've failed to convince myself that it's worth the effort.

I can implement support for custom parsers for R600_DEBUG, but do we really need it? Maybe it would be enough to add e.g. sb (instead of the R600_SB var) to the R600_DEBUG flags for enabling it, probably together with other boolean options such as R600_SB_USE_NEW_BYTECODE, but leave the more complicated internal debug options as is?

Vadim

There is a table for R600_DEBUG in r600_pipe.c and it even comes with a help feature: R600_DEBUG=help

Marek

On Fri, Apr 19, 2013 at 4:48 PM, Vadim Girlin vadimgir...@gmail.com wrote:

Hi,

In the previous status update I said that the r600-sb branch was not ready to be merged yet, but recently I've done some cleanups and reworks, and though I haven't finished everything I planned initially, I think it's now in a better state and may be considered for merging.

I'm interested to know whether people think that merging the r600-sb branch makes sense at all. I'll try to explain here why it makes sense to me.
Although I understand that the development of the llvm backend is a primary goal for the r600g developers, it's a complicated process and may require quite some time to achieve good results regarding shader/compiler performance, while this branch already works and provides good results in many cases. That's why I think it makes sense to merge this branch as a non-default backend, at least as a temporary solution for shader performance problems. We can always get rid of it if it becomes too much of a maintenance burden, or when the llvm backend catches up in terms of shader performance and compilation speed/overhead.

Regarding support and maintenance of this code, I'll try to do my best to fix possible issues, and so far there are no known unfixed issues. I tested it with many apps on evergreen and fixed all issues with other chips that were reported to me on the list or privately after the last status announcement. There are no piglit regressions on evergreen when this branch is used with both the default and llvm backends.

This code was intentionally separated as much as possible from the other parts of the driver; basically there are just two functions used from r600g, and the shader code is passed to/from r600-sb as hardware bytecode that is not going to change. I think it won't require any modifications at all to stay in sync with most changes in r600g. Some work might be required though if we want to add support for new hw features that are currently unused, e.g. geometry shaders, new instruction types for compute shaders, etc., but I think
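A minimal sketch of the flag-table approach referenced above: one env var carrying named boolean flags plus a help listing, in the spirit of the R600_DEBUG table in r600_pipe.c. The type, flag names, and bit values here are illustrative assumptions, not the actual Mesa code.

#include <cstdio>
#include <sstream>
#include <string>

// Hypothetical flag bits; the real values live in r600_pipe.c.
enum {
    DBG_SB      = 1u << 0,
    DBG_SB_CS   = 1u << 1,
    DBG_SB_DRY  = 1u << 2,
    DBG_SB_STAT = 1u << 3,
    DBG_SB_DUMP = 1u << 4,
};

struct debug_named_value {
    const char *name;   // flag name as written in the env var
    unsigned    flag;   // bit to set when the name is present
    const char *desc;   // printed by "help"
};

static const debug_named_value sb_debug_options[] = {
    { "sb",     DBG_SB,      "Enable optimization of graphics shaders" },
    { "sbcl",   DBG_SB_CS,   "Enable optimization of compute shaders" },
    { "sbdry",  DBG_SB_DRY,  "Dry run, optimize but don't use new bytecode" },
    { "sbstat", DBG_SB_STAT, "Print optimization statistics" },
    { "sbdump", DBG_SB_DUMP, "Print IR after some passes" },
};

// Parse a comma-separated flag list such as "sb,sbstat", or "help".
static unsigned parse_debug_flags(const std::string &env)
{
    unsigned flags = 0;
    std::stringstream ss(env);
    std::string tok;
    while (std::getline(ss, tok, ',')) {
        if (tok == "help") {
            for (const auto &v : sb_debug_options)
                std::printf("  %-8s %s\n", v.name, v.desc);
            continue;
        }
        for (const auto &v : sb_debug_options)
            if (tok == v.name)
                flags |= v.flag;
    }
    return flags;
}

With a table like this, R600_DEBUG=sb,sbstat would set two flags and R600_DEBUG=help would print the available names.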
Re: [Mesa-dev] r600g: status of the r600-sb branch
On 04/20/2013 03:11 AM, Marek Olšák wrote:

Please don't add any new environment variables and use R600_DEBUG instead. The other environment variables are deprecated.

I agree, those vars probably need some cleanup; they were added before R600_DEBUG appeared. Though I'm afraid some of my options won't fit well into the R600_DEBUG flags unless we add support for name/value pairs with optional custom parsers. E.g. I have a group of env vars to define the range of included/excluded shaders for optimization and the mode (include/exclude/off). I thought about doing this with a single var and a custom parser to specify the range, e.g. as 10-20 (a sketch of such a parser follows this message), but after all it's just a debug feature, not intended for everyday use, and so far I've failed to convince myself that it's worth the effort.

I can implement support for custom parsers for R600_DEBUG, but do we really need it? Maybe it would be enough to add e.g. sb (instead of the R600_SB var) to the R600_DEBUG flags for enabling it, probably together with other boolean options such as R600_SB_USE_NEW_BYTECODE, but leave the more complicated internal debug options as is?

Vadim

There is a table for R600_DEBUG in r600_pipe.c and it even comes with a help feature: R600_DEBUG=help

Marek

On Fri, Apr 19, 2013 at 4:48 PM, Vadim Girlin vadimgir...@gmail.com wrote:

Hi,

In the previous status update I said that the r600-sb branch was not ready to be merged yet, but recently I've done some cleanups and reworks, and though I haven't finished everything I planned initially, I think it's now in a better state and may be considered for merging.

I'm interested to know whether people think that merging the r600-sb branch makes sense at all. I'll try to explain here why it makes sense to me.

Although I understand that the development of the llvm backend is a primary goal for the r600g developers, it's a complicated process and may require quite some time to achieve good results regarding shader/compiler performance, while this branch already works and provides good results in many cases. That's why I think it makes sense to merge this branch as a non-default backend, at least as a temporary solution for shader performance problems. We can always get rid of it if it becomes too much of a maintenance burden, or when the llvm backend catches up in terms of shader performance and compilation speed/overhead.

Regarding support and maintenance of this code, I'll try to do my best to fix possible issues, and so far there are no known unfixed issues. I tested it with many apps on evergreen and fixed all issues with other chips that were reported to me on the list or privately after the last status announcement. There are no piglit regressions on evergreen when this branch is used with both the default and llvm backends.

This code was intentionally separated as much as possible from the other parts of the driver; basically there are just two functions used from r600g, and the shader code is passed to/from r600-sb as hardware bytecode that is not going to change. I think it won't require any modifications at all to stay in sync with most changes in r600g. Some work might be required though if we want to add support for new hw features that are currently unused, e.g. geometry shaders, new instruction types for compute shaders, etc., but I think I'll be able to catch up when it's implemented in the driver and the default or llvm backend. E.g.
this branch already works for me on evergreen with some simple OpenCL kernels, including bfgminer, where it increases performance of the kernel compiled with the llvm backend by more than 20% for me.

Besides the performance benefits, I think that an alternative backend might also help with debugging of the default or llvm backend; in some cases it helped me by exposing bugs that are not very obvious otherwise. E.g. it may be hard to compare the dumps from the default and llvm backends to spot a regression because they are too different, but after processing both shaders with r600-sb the code is usually transformed to a more common form, and often this makes it easier to compare and find the differences in shader logic.

One additional feature that might help with llvm backend debugging is the disassembler that works on the hardware bytecode instead of the internal r600g bytecode structs. This results in more readable shader dumps for instructions passed in native hw encoding from the llvm backend. I think this can also help to catch more potential bugs related to bytecode building in r600g/llvm. Currently r600-sb uses its bytecode disassembler for all shader dumps, including the fetch shaders, even when optimization is not enabled. Basically it can replace r600_bytecode_disasm and related code completely.

Below are some quick benchmarks for shader performance and compilation time, to demonstrate that r600-sb might currently provide better performance for users, at least in some cases. As an example of the shaders with
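A minimal sketch of the shader-range debug filter described above (the include/exclude/off mode plus a 10-20 style range). The env var syntax, names, and types are hypothetical, not the actual r600-sb variables.

#include <cstdio>
#include <cstring>

enum sb_filter_mode { SB_FILTER_OFF, SB_FILTER_INCLUDE, SB_FILTER_EXCLUDE };

struct sb_shader_filter {
    sb_filter_mode mode;
    int first, last;    // inclusive range of shader debug ids
};

// Parse e.g. "include:10-20" or "exclude:5-8"; anything else means off.
static sb_shader_filter parse_shader_filter(const char *env)
{
    sb_shader_filter f = { SB_FILTER_OFF, 0, 0 };
    if (!env)
        return f;
    if (!std::strncmp(env, "include:", 8))
        f.mode = SB_FILTER_INCLUDE;
    else if (!std::strncmp(env, "exclude:", 8))
        f.mode = SB_FILTER_EXCLUDE;
    else
        return f;
    if (std::sscanf(env + 8, "%d-%d", &f.first, &f.last) != 2)
        f.mode = SB_FILTER_OFF;   // malformed range: disable the filter
    return f;
}

// Decide whether the optimizer should process a given shader.
static bool should_optimize(const sb_shader_filter &f, int debug_id)
{
    bool in_range = debug_id >= f.first && debug_id <= f.last;
    switch (f.mode) {
    case SB_FILTER_INCLUDE: return in_range;
    case SB_FILTER_EXCLUDE: return !in_range;
    default:                return true;    // no filtering
    }
}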
Re: [Mesa-dev] r600g: status of the r600-sb branch
On 04/20/2013 01:42 PM, Christian König wrote:
Am 19.04.2013 18:50, schrieb Vadim Girlin:
On 04/19/2013 08:35 PM, Christian König wrote:

Hey Vadim,

Am 19.04.2013 18:18, schrieb Vadim Girlin: [SNIP]

In theory, yes, some optimizations in this branch are typically used at earlier compilation stages, not on the target machine code. On the other hand, there are some differences that might make it harder, e.g. many algorithms require SSA form, and though it's possible to do similar optimizations without SSA, it would be hard to implement. Also I wanted to support both the default backend and the llvm backend for increased testing coverage, and to be able to compare the efficiency of the algorithms in my experiments etc.

Yeah I know, a missing SSA implementation is also something that has always bothered me a bit with both TGSI and GLSL (though I haven't done much with GLSL, so maybe I misjudge here). Can you name the different algorithms used?

There is a short description of the algorithms and passes in the notes.markdown file [1] in that branch; there are also links at the end to full descriptions of some algorithms, though some of them were modified/adapted for this branch.

It's not a strict prerequisite, but I think we both agree that doing things like LICM on R600 bytecode isn't the best idea overall (when doing it on GLSL would be beneficial for all drivers, not only r600).

In fact there is no special LICM pass; it's done by GCM (Global Code Motion, [2]), which could also be called a global scheduler. In my branch this pass is combined with some hw-specific scheduling logic, e.g. grouping fetch/alu instructions to reduce clause type switching in the code and the number of required CF instructions; potentially it can also schedule clauses to expose more parallelism via the BARRIER bit.

Yeah, I already thought that you're using something like this. On one hand that is really good, because it is specialized and so produces really optimal code for the r600 target. But on the other hand it's bad, because it is specialized and so produces really optimal code ONLY for the r600 target. I think such a pass at a higher level (GLSL IR or TGSI) would at least need some callbacks or caps to be tunable for the target.

Anyway, the result of the GCM pass is affected by the CFG structure, so when the target applies e.g. if-conversion or any other target-specific control flow optimization, you might want to apply a similar pass again at the target instruction level for better results, and then the previous pass on the higher-level IR looks not very useful. Also there are some high-level operations that are translated to a bunch of target instructions, e.g. integer division on r600. A high-level pass can't hoist i/5 (where i is the loop counter) out of the loop, but after translation to target instructions it's possible to hoist some of the resulting instructions, producing more efficient code.

One more point is that GCM achieves its best efficiency when used with a GVN (Global Value Numbering) pass: GCM allows GVN to not care about code placement while eliminating redundant operations, so you'd probably want to implement a high-level GVN pass as well. I think it's possible to implement GVN-GCM at the GLSL or TGSI level, but I suspect it would require a lot more effort than these passes took in my branch, and would be less efficient.
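To make the i/5 point above concrete: a high-level IR sees a single division that depends on the loop counter, so nothing can be hoisted, but the lowered form exposes a loop-invariant piece. This is a generic sketch of the standard magic-number expansion for unsigned division by 5, not r600 bytecode.

#include <cstdint>
#include <cstdio>

int main()
{
    for (uint32_t i = 0; i < 20; i++) {
        // Lowered form of q = i / 5:
        uint32_t magic = 0xCCCCCCCDu;            // loop-invariant, hoistable
        uint64_t prod  = (uint64_t)i * magic;    // depends on i, stays in loop
        uint32_t q     = (uint32_t)(prod >> 34); // depends on i, stays in loop
        std::printf("%u / 5 = %u\n", i, q);
    }
    return 0;
}

A target-level code-motion pass can move the constant load (and anything else that does not depend on i) out of the loop, which is exactly the opportunity that is invisible before lowering.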
Just speculating: what would it take to make those passes run on the LLVM machine instruction representation instead of your own representation?

The main difference between the IRs is the representation of control flow: r600-sb relies on the fact that the r600 arch doesn't have arbitrary control flow, which renders CFGs superfluous. Implementing these passes on CFGs would be more complicated; it would also require the computation of dominance frontiers, loop detection and analysis, etc. On r600-sb's IR these passes are greatly simplified.

Regarding GCM, the original algorithm as described in that pdf works on a CFG, so it shouldn't be hard to implement in LLVM, but I'm not sure how it would fit into the LLVM infrastructure. LLVM has GVN-PRE, LICM, and other passes that together do basically the same thing as GVN-GCM, so if you implement it, you might want to get rid of LLVM's own passes that duplicate that functionality, and I'm not sure whether that would be easy; possibly there are some interdependencies etc. Also I saw mentions of some plans (e.g. [1], [2]) regarding the implementation of global code motion in LLVM, so it looks like there is already some work in progress.

Vadim

[1] http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120709/146206.html
[2] http://markmail.org/message/2td3fnnggk6oripp#query:+page:1+mid:2td3fnnggk6oripp+state:results

Christian.

Vadim

[1] http://cgit.freedesktop.org/~vadimg/mesa/tree/src/gallium/drivers/r600/sb/notes.markdown?h=r600-sb
[2] http://www.cs.washington.edu/education/courses/cse501/06wi/reading/click-pldi95.pdf
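For illustration, a structured IR in the spirit described above can represent the program as a tree of nested regions, so dominance falls out of nesting and no CFG, dominator tree, or dominance frontiers are needed. The class names here are hypothetical, not the actual r600-sb IR.

#include <memory>
#include <vector>

struct node {
    virtual ~node() = default;
};

// A single machine-level operation; operands are SSA value ids.
struct instruction : node {
    unsigned opcode = 0;
    std::vector<unsigned> src_values;
    std::vector<unsigned> dst_values;
};

// Children execute in order. A parent region dominates everything in its
// body, so dominance queries reduce to ancestor checks in the tree.
struct container_node : node {
    std::vector<std::unique_ptr<node>> body;
};

struct if_node : container_node {
    unsigned condition_value = 0;   // body runs only when this holds
};

struct loop_node : container_node {
    // Loop depth is just the number of loop_node ancestors, which is all
    // a GCM-style scheduler needs to sink code out of loops.
};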
Re: [Mesa-dev] r600g: status of the r600-sb branch
Am 20.04.2013 13:12, schrieb Vadim Girlin: [SNIP]
Oh, I wasn't talking about replacing any LLVM passes, more like extending them to provide the same amount of functionality. Also I didn't have LLVM IR in mind while writing this, but rather the machine instruction representation they use. Well, you have quite a lot of C++
Re: [Mesa-dev] r600g: status of the r600-sb branch
On 04/20/2013 03:38 PM, Christian König wrote: [SNIP]

Oh, I wasn't talking about replacing any LLVM passes, more like extending them to provide the same amount of functionality. Also I didn't have LLVM IR in mind while writing this, but rather the machine instruction representation they use. Well, you have quite a lot of C++
Re: [Mesa-dev] r600g: status of the r600-sb branch
On 19 April 2013 18:01, Vadim Girlin vadimgir...@gmail.com wrote:

The choice of C++ (unlike in my previous branch, which used C) was mostly driven by the fact that optimization algorithms usually deal with a lot of different complex data structures, containers, etc., and C++ allows the implementation of all such things to be isolated in separate and easily replaceable classes so that the code can concentrate on the logic, making it cleaner and more readable.

I'm sure it would be good fun to have a discussion about the relative merits of C and C++, though I think I've seen enough actual C++ that you're not going to convince me it's the better language. However, I don't think that should be the main consideration. It's probably more important to consider what current and potential new contributors prefer, and on Linux, particularly for the more low-level stuff, I suspect that pretty much means C.

I haven't tried to keep it as a series of independent patches because during development most changes were pretty intrusive and introduced new features; some parts were seriously reworked/rewritten more than once, requiring changes in other parts, especially when the intermediate representation of the code was changed. It was usually easier for me to simply fix the new regressions in the new code than to revert changes and lose new features, so bisection wouldn't be very helpful anyway. That's why I didn't even try to keep the history. Anyway, most of the code in the branch is new, so I don't think that the history of patches that rewrite the same code a few times during development would make it more readable than simply reading the final code.

I think I'm just going to disagree there. (But of course that's all just my personal opinion, which probably doesn't carry a lot of weight at the moment.)
Re: [Mesa-dev] r600g: status of the r600-sb branch
On 04/20/2013 07:05 PM, Henri Verbeet wrote:
On 19 April 2013 18:01, Vadim Girlin vadimgir...@gmail.com wrote:

The choice of C++ (unlike in my previous branch, which used C) was mostly driven by the fact that optimization algorithms usually deal with a lot of different complex data structures, containers, etc., and C++ allows the implementation of all such things to be isolated in separate and easily replaceable classes so that the code can concentrate on the logic, making it cleaner and more readable.

I'm sure it would be good fun to have a discussion about the relative merits of C and C++, though I think I've seen enough actual C++ that you're not going to convince me it's the better language.

I never wanted to convince you that C++ is the better language, I just wanted to explain why I decided to switch from C to C++ in this particular case.

However, I don't think that should be the main consideration. It's probably more important to consider what current and potential new contributors prefer, and on Linux, particularly for the more low-level stuff, I suspect that pretty much means C.

Well, it may be considered low-level stuff because it's part of the driver. On the other hand, I'd rather think of it as part of the compiler, and compilers (especially optimization algorithms) don't really look like low-level stuff to me; it depends on the definition of low-level stuff, though. To name a few examples, we can look at the compilers/optimizing backends used by mesa/gallium: the GLSL compiler (written in C++), LLVM (written in C++), the backends for the nvidia drivers (written in C++)...

Vadim

I haven't tried to keep it as a series of independent patches because during development most changes were pretty intrusive and introduced new features; some parts were seriously reworked/rewritten more than once, requiring changes in other parts, especially when the intermediate representation of the code was changed. It was usually easier for me to simply fix the new regressions in the new code than to revert changes and lose new features, so bisection wouldn't be very helpful anyway. That's why I didn't even try to keep the history. Anyway, most of the code in the branch is new, so I don't think that the history of patches that rewrite the same code a few times during development would make it more readable than simply reading the final code.

I think I'm just going to disagree there. (But of course that's all just my personal opinion, which probably doesn't carry a lot of weight at the moment.)
Re: [Mesa-dev] r600g: status of the r600-sb branch
Ah, I didn't know you had any other env vars. It's preferable to have as many boolean flags as possible handled by a single env var, because it's easier to use (R600_DUMP_SHADERS counts as a pretty ugly list of boolean flags hidden behind a magic number). Feel free to have separate env vars for more complex parameters.

I skimmed through some of your code and the coding style looks good. I'm also okay with C++; it really seems like the right choice here. However, I agree with the argument that one header file per .cpp might not always be a good idea, especially if the header file is pretty small.

Marek

On Sat, Apr 20, 2013 at 11:02 AM, Vadim Girlin vadimgir...@gmail.com wrote:
On 04/20/2013 03:11 AM, Marek Olšák wrote:

Please don't add any new environment variables and use R600_DEBUG instead. The other environment variables are deprecated.

I agree, those vars probably need some cleanup; they were added before R600_DEBUG appeared. Though I'm afraid some of my options won't fit well into the R600_DEBUG flags unless we add support for name/value pairs with optional custom parsers. E.g. I have a group of env vars to define the range of included/excluded shaders for optimization and the mode (include/exclude/off). I thought about doing this with a single var and a custom parser to specify the range, e.g. as 10-20, but after all it's just a debug feature, not intended for everyday use, and so far I've failed to convince myself that it's worth the effort.

I can implement support for custom parsers for R600_DEBUG, but do we really need it? Maybe it would be enough to add e.g. sb (instead of the R600_SB var) to the R600_DEBUG flags for enabling it, probably together with other boolean options such as R600_SB_USE_NEW_BYTECODE, but leave the more complicated internal debug options as is?

Vadim

There is a table for R600_DEBUG in r600_pipe.c and it even comes with a help feature: R600_DEBUG=help

Marek

On Fri, Apr 19, 2013 at 4:48 PM, Vadim Girlin vadimgir...@gmail.com wrote:

Hi,

In the previous status update I said that the r600-sb branch was not ready to be merged yet, but recently I've done some cleanups and reworks, and though I haven't finished everything I planned initially, I think it's now in a better state and may be considered for merging.

I'm interested to know whether people think that merging the r600-sb branch makes sense at all. I'll try to explain here why it makes sense to me.

Although I understand that the development of the llvm backend is a primary goal for the r600g developers, it's a complicated process and may require quite some time to achieve good results regarding shader/compiler performance, while this branch already works and provides good results in many cases. That's why I think it makes sense to merge this branch as a non-default backend, at least as a temporary solution for shader performance problems. We can always get rid of it if it becomes too much of a maintenance burden, or when the llvm backend catches up in terms of shader performance and compilation speed/overhead.

Regarding support and maintenance of this code, I'll try to do my best to fix possible issues, and so far there are no known unfixed issues. I tested it with many apps on evergreen and fixed all issues with other chips that were reported to me on the list or privately after the last status announcement. There are no piglit regressions on evergreen when this branch is used with both the default and llvm backends.
This code was intentionally separated as much as possible from the other parts of the driver; basically there are just two functions used from r600g, and the shader code is passed to/from r600-sb as hardware bytecode that is not going to change. I think it won't require any modifications at all to stay in sync with most changes in r600g. Some work might be required though if we want to add support for new hw features that are currently unused, e.g. geometry shaders, new instruction types for compute shaders, etc., but I think I'll be able to catch up when it's implemented in the driver and the default or llvm backend. E.g. this branch already works for me on evergreen with some simple OpenCL kernels, including bfgminer, where it increases performance of the kernel compiled with the llvm backend by more than 20% for me.

Besides the performance benefits, I think that an alternative backend might also help with debugging of the default or llvm backend; in some cases it helped me by exposing bugs that are not very obvious otherwise. E.g. it may be hard to compare the dumps from the default and llvm backends to spot a regression because they are too different, but after processing both shaders with r600-sb the code is usually transformed to a more common form, and often this makes it easier to compare and find the differences in shader logic.

One additional feature that might help with llvm backend
Re: [Mesa-dev] r600g: status of the r600-sb branch
Hi Vadim,

from your description it seems to be a post-processing stage working on the bytecode of the shaders which, in addition to that, is quite separated from the rest of the driver. If that's the case then I don't really see a reason why we shouldn't merge it, but at least in the beginning it should probably be disabled by default.

On the other hand, we should question whether there are any optimizations in there that could be done at earlier stages, something like on the GLSL level for example?

Cheers,
Christian.

Am 19.04.2013 16:48, schrieb Vadim Girlin:

Hi,

In the previous status update I said that the r600-sb branch was not ready to be merged yet, but recently I've done some cleanups and reworks, and though I haven't finished everything I planned initially, I think it's now in a better state and may be considered for merging.

I'm interested to know whether people think that merging the r600-sb branch makes sense at all. I'll try to explain here why it makes sense to me.

Although I understand that the development of the llvm backend is a primary goal for the r600g developers, it's a complicated process and may require quite some time to achieve good results regarding shader/compiler performance, while this branch already works and provides good results in many cases. That's why I think it makes sense to merge this branch as a non-default backend, at least as a temporary solution for shader performance problems. We can always get rid of it if it becomes too much of a maintenance burden, or when the llvm backend catches up in terms of shader performance and compilation speed/overhead.

Regarding support and maintenance of this code, I'll try to do my best to fix possible issues, and so far there are no known unfixed issues. I tested it with many apps on evergreen and fixed all issues with other chips that were reported to me on the list or privately after the last status announcement. There are no piglit regressions on evergreen when this branch is used with both the default and llvm backends.

This code was intentionally separated as much as possible from the other parts of the driver; basically there are just two functions used from r600g, and the shader code is passed to/from r600-sb as hardware bytecode that is not going to change. I think it won't require any modifications at all to stay in sync with most changes in r600g. Some work might be required though if we want to add support for new hw features that are currently unused, e.g. geometry shaders, new instruction types for compute shaders, etc., but I think I'll be able to catch up when it's implemented in the driver and the default or llvm backend. E.g. this branch already works for me on evergreen with some simple OpenCL kernels, including bfgminer, where it increases performance of the kernel compiled with the llvm backend by more than 20% for me.

Besides the performance benefits, I think that an alternative backend might also help with debugging of the default or llvm backend; in some cases it helped me by exposing bugs that are not very obvious otherwise. E.g. it may be hard to compare the dumps from the default and llvm backends to spot a regression because they are too different, but after processing both shaders with r600-sb the code is usually transformed to a more common form, and often this makes it easier to compare and find the differences in shader logic.
One additional feature that might help with llvm backend debugging is the disassembler that works on the hardware bytecode instead of the internal r600g bytecode structs. This results in more readable shader dumps for instructions passed in native hw encoding from the llvm backend. I think this can also help to catch more potential bugs related to bytecode building in r600g/llvm. Currently r600-sb uses its bytecode disassembler for all shader dumps, including the fetch shaders, even when optimization is not enabled. Basically it can replace r600_bytecode_disasm and related code completely.

Below are some quick benchmarks for shader performance and compilation time, to demonstrate that r600-sb might currently provide better performance for users, at least in some cases. As an example of shaders with good optimization opportunities I used the application that computes and renders atmospheric scattering effects, mentioned in the previous thread: http://lists.freedesktop.org/archives/mesa-dev/2013-February/034682.html

Here are current results for that app (Main.noprecompute, frames per second) with the default backend, default backend + r600-sb, and llvm backend:

  def   def+sb   llvm
  240   590      248

Another quick benchmark is OpenCL kernel performance with bfgminer (megahash/s):

  llvm   llvm+sb
  68     87

One more benchmark is for compilation speed/overhead. I used two piglit tests: the first compiles a lot of shaders (IIRC more than a thousand), the second compiles
Re: [Mesa-dev] r600g: status of the r600-sb branch
On 19 April 2013 16:48, Vadim Girlin vadimgir...@gmail.com wrote:

In the previous status update I said that the r600-sb branch was not ready to be merged yet, but recently I've done some cleanups and reworks, and though I haven't finished everything I planned initially, I think it's now in a better state and may be considered for merging. I'm interested to know whether people think that merging the r600-sb branch makes sense at all. I'll try to explain here why it makes sense to me.

Personally, I'd be in favour of merging this at some point. While I haven't exactly done extensive testing or benchmarking with the branch, the things I did try at least worked correctly, so I'd say that's a good start at least.

I'm afraid I can't claim extensive review either, but I guess the most obvious things I don't like about it are that it's C++, and spread over a large number of 1000-line files. Similarly, I don't really see the point of having a header file for just about every .cpp file; one for private interfaces and one for the public interface should probably be plenty. I'm not quite sure how others feel about that, although I suspect I'm not alone in at least the preference of C over C++. I also suspect it would help if this was some kind of logical, bisectable series of patches instead of a single commit that adds 18k+ lines.
Re: [Mesa-dev] r600g: status of the r600-sb branch
On 04/19/2013 07:23 PM, Henri Verbeet wrote:
On 19 April 2013 16:48, Vadim Girlin vadimgir...@gmail.com wrote:

In the previous status update I said that the r600-sb branch was not ready to be merged yet, but recently I've done some cleanups and reworks, and though I haven't finished everything I planned initially, I think it's now in a better state and may be considered for merging. I'm interested to know whether people think that merging the r600-sb branch makes sense at all. I'll try to explain here why it makes sense to me.

Personally, I'd be in favour of merging this at some point. While I haven't exactly done extensive testing or benchmarking with the branch, the things I did try at least worked correctly, so I'd say that's a good start at least. I'm afraid I can't claim extensive review either, but I guess the most obvious things I don't like about it are that it's C++, and spread over a large number of 1000-line files. Similarly, I don't really see the point of having a header file for just about every .cpp file; one for private interfaces and one for the public interface should probably be plenty.

I thought about that, but I'm just not sure what the preferred way would be. I agree that a lot of small files don't look very good; on the other hand, it makes all the classes better separated and more readable, which is why I wasn't sure which way is best. Of course I can merge some files together if that's preferable.

I'm not quite sure how others feel about that, although I suspect I'm not alone in at least the preference of C over C++.

The choice of C++ (unlike in my previous branch, which used C) was mostly driven by the fact that optimization algorithms usually deal with a lot of different complex data structures, containers, etc., and C++ allows the implementation of all such things to be isolated in separate and easily replaceable classes so that the code can concentrate on the logic, making it cleaner and more readable (a sketch of this kind of separation follows this message).

I also suspect it would help if this was some kind of logical, bisectable series of patches instead of a single commit that adds 18k+ lines.

I haven't tried to keep it as a series of independent patches because during development most changes were pretty intrusive and introduced new features; some parts were seriously reworked/rewritten more than once, requiring changes in other parts, especially when the intermediate representation of the code was changed. It was usually easier for me to simply fix the new regressions in the new code than to revert changes and lose new features, so bisection wouldn't be very helpful anyway. That's why I didn't even try to keep the history. Anyway, most of the code in the branch is new, so I don't think that the history of patches that rewrite the same code a few times during development would make it more readable than simply reading the final code.

Vadim
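A small sketch of the kind of separation the C++ argument above is about: optimization logic written against a compact pass interface, with data-structure details hidden in replaceable classes. The shapes below are hypothetical, not the actual sb_pass.h.

#include <string>
#include <vector>

struct shader;   // opaque IR container owned by the compiler

class pass {
public:
    explicit pass(const char *name) : name_(name) {}
    virtual ~pass() = default;
    virtual bool run(shader &sh) = 0;   // false aborts the pipeline
    const std::string &name() const { return name_; }
private:
    std::string name_;
};

// The driver-facing logic stays trivial even as individual passes grow
// complex internal data structures behind their own class boundaries.
inline bool run_passes(shader &sh, const std::vector<pass *> &pipeline)
{
    for (pass *p : pipeline)
        if (!p->run(sh))
            return false;
    return true;
}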
Re: [Mesa-dev] r600g: status of the r600-sb branch
On 04/19/2013 07:13 PM, Christian König wrote:

Hi Vadim,

from your description it seems to be a post-processing stage working on the bytecode of the shaders which, in addition to that, is quite separated from the rest of the driver.

Yes, currently it's more like a post-processing stage, though on the other hand the only thing missing for it to be considered a complete backend is an initial TGSI translator (that is, a sort of instruction selection pass); basically that's exactly what the default backend in r600g does. I thought about writing a direct translator from TGSI to my IR, but it would require some time and the benefits aren't very clear, except for slightly reduced translation time. It's easier to rely on the default backend for that, and it also simplifies debugging by providing the ability to see and compare both the source (after the default backend) and the optimized bytecode.

If that's the case then I don't really see a reason why we shouldn't merge it, but at least in the beginning it should probably be disabled by default.

Yes, I agree that it's better to make it disabled by default. It's currently enabled in my branch just to simplify testing, but I'll change that if we merge the branch.

On the other hand, we should question whether there are any optimizations in there that could be done at earlier stages, something like on the GLSL level for example?

In theory, yes, some optimizations in this branch are typically used at earlier compilation stages, not on the target machine code. On the other hand, there are some differences that might make it harder, e.g. many algorithms require SSA form, and though it's possible to do similar optimizations without SSA, it would be hard to implement. Also I wanted to support both the default backend and the llvm backend for increased testing coverage, and to be able to compare the efficiency of the algorithms in my experiments etc.

Vadim

Cheers,
Christian.

Am 19.04.2013 16:48, schrieb Vadim Girlin:

Hi,

In the previous status update I said that the r600-sb branch was not ready to be merged yet, but recently I've done some cleanups and reworks, and though I haven't finished everything I planned initially, I think it's now in a better state and may be considered for merging.

I'm interested to know whether people think that merging the r600-sb branch makes sense at all. I'll try to explain here why it makes sense to me.

Although I understand that the development of the llvm backend is a primary goal for the r600g developers, it's a complicated process and may require quite some time to achieve good results regarding shader/compiler performance, while this branch already works and provides good results in many cases. That's why I think it makes sense to merge this branch as a non-default backend, at least as a temporary solution for shader performance problems. We can always get rid of it if it becomes too much of a maintenance burden, or when the llvm backend catches up in terms of shader performance and compilation speed/overhead.

Regarding support and maintenance of this code, I'll try to do my best to fix possible issues, and so far there are no known unfixed issues. I tested it with many apps on evergreen and fixed all issues with other chips that were reported to me on the list or privately after the last status announcement. There are no piglit regressions on evergreen when this branch is used with both the default and llvm backends.
This code was intentionally separated as much as possible from the other parts of the driver; basically there are just two functions used from r600g, and the shader code is passed to/from r600-sb as hardware bytecode that is not going to change (a sketch of such a boundary follows this message). I think it won't require any modifications at all to stay in sync with most changes in r600g. Some work might be required though if we want to add support for new hw features that are currently unused, e.g. geometry shaders, new instruction types for compute shaders, etc., but I think I'll be able to catch up when it's implemented in the driver and the default or llvm backend. E.g. this branch already works for me on evergreen with some simple OpenCL kernels, including bfgminer, where it increases performance of the kernel compiled with the llvm backend by more than 20% for me.

Besides the performance benefits, I think that an alternative backend might also help with debugging of the default or llvm backend; in some cases it helped me by exposing bugs that are not very obvious otherwise. E.g. it may be hard to compare the dumps from the default and llvm backends to spot a regression because they are too different, but after processing both shaders with r600-sb the code is usually transformed to a more common form, and often this makes it easier to compare and find the differences in shader logic.

One additional feature that might help with llvm backend debugging is the disassembler that works on the hardware bytecode instead of the internal r600g bytecode
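A sketch of what the narrow boundary described above ("just two functions") might look like; the names and signatures are illustrative assumptions, not the actual interface exported to r600g.

#include <cstddef>
#include <cstdint>

// Optimize a finished shader: takes hardware bytecode produced by the
// default r600g backend (or the llvm backend), returns replacement
// bytecode, or an error so the caller keeps the original bytecode
// (cf. the "dry run" debug mode mentioned earlier in the thread).
extern "C" int r600_sb_bytecode_process(const uint32_t *bytecode,
                                        size_t ndwords,
                                        uint32_t **out_bytecode,
                                        size_t *out_ndwords);

// Disassemble hardware bytecode for shader dumps; because it reads the
// native hw encoding rather than r600g's bytecode structs, it works the
// same for shaders built by r600g, by the llvm backend, or for fetch
// shaders.
extern "C" void r600_sb_bytecode_dump(const uint32_t *bytecode,
                                      size_t ndwords);

Keeping the boundary this narrow is what makes the claim above plausible: r600g internals can change freely as long as the hardware bytecode format stays fixed.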
Re: [Mesa-dev] r600g: status of the r600-sb branch
Hey Vadim,

Am 19.04.2013 18:18, schrieb Vadim Girlin: [SNIP]

In theory, yes, some optimizations in this branch are typically used at earlier compilation stages, not on the target machine code. On the other hand, there are some differences that might make it harder, e.g. many algorithms require SSA form, and though it's possible to do similar optimizations without SSA, it would be hard to implement. Also I wanted to support both the default backend and the llvm backend for increased testing coverage, and to be able to compare the efficiency of the algorithms in my experiments etc.

Yeah I know, a missing SSA implementation is also something that has always bothered me a bit with both TGSI and GLSL (though I haven't done much with GLSL, so maybe I misjudge here). Can you name the different algorithms used?

It's not a strict prerequisite, but I think we both agree that doing things like LICM on R600 bytecode isn't the best idea overall (when doing it on GLSL would be beneficial for all drivers, not only r600).

Regards,
Christian.
Re: [Mesa-dev] r600g: status of the r600-sb branch
On 04/19/2013 08:35 PM, Christian König wrote:

Hey Vadim,

Am 19.04.2013 18:18, schrieb Vadim Girlin: [SNIP]

In theory, yes, some optimizations in this branch are typically used at earlier compilation stages, not on the target machine code. On the other hand, there are some differences that might make it harder, e.g. many algorithms require SSA form, and though it's possible to do similar optimizations without SSA, it would be hard to implement. Also I wanted to support both the default backend and the llvm backend for increased testing coverage, and to be able to compare the efficiency of the algorithms in my experiments etc.

Yeah I know, a missing SSA implementation is also something that has always bothered me a bit with both TGSI and GLSL (though I haven't done much with GLSL, so maybe I misjudge here). Can you name the different algorithms used?

There is a short description of the algorithms and passes in the notes.markdown file [1] in that branch; there are also links at the end to full descriptions of some algorithms, though some of them were modified/adapted for this branch.

It's not a strict prerequisite, but I think we both agree that doing things like LICM on R600 bytecode isn't the best idea overall (when doing it on GLSL would be beneficial for all drivers, not only r600).

In fact there is no special LICM pass; it's done by GCM (Global Code Motion, [2]), which could also be called a global scheduler. In my branch this pass is combined with some hw-specific scheduling logic, e.g. grouping fetch/alu instructions to reduce clause type switching in the code and the number of required CF instructions; potentially it can also schedule clauses to expose more parallelism via the BARRIER bit.

Vadim

[1] http://cgit.freedesktop.org/~vadimg/mesa/tree/src/gallium/drivers/r600/sb/notes.markdown?h=r600-sb
[2] http://www.cs.washington.edu/education/courses/cse501/06wi/reading/click-pldi95.pdf

Regards,
Christian.
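For readers following the GVN-GCM discussion above, a toy sketch of the GVN half: equivalent expressions receive the same value number regardless of where they appear, exposing redundancies, while placement is left to a later GCM pass. Hypothetical types, not the r600-sb implementation.

#include <cstdio>
#include <map>
#include <tuple>

using value_id = unsigned;

// Key: (opcode, value numbers of the two operands).
using expr_key = std::tuple<unsigned, value_id, value_id>;

struct gvn_table {
    std::map<expr_key, value_id> numbering;
    value_id next_id = 0;

    value_id fresh() { return next_id++; }   // for inputs/constants

    // Equivalent expressions get the same value number, which makes
    // redundant computations visible independent of code placement.
    value_id number(unsigned opcode, value_id a, value_id b) {
        expr_key key(opcode, a, b);
        auto it = numbering.find(key);
        if (it != numbering.end())
            return it->second;
        value_id id = fresh();
        numbering[key] = id;
        return id;
    }
};

int main()
{
    enum { ADD = 1, MUL = 2 };
    gvn_table gvn;
    value_id x = gvn.fresh(), y = gvn.fresh();   // two inputs
    value_id a = gvn.number(ADD, x, y);          // a = x + y
    value_id b = gvn.number(ADD, x, y);          // redundant copy of a
    value_id c = gvn.number(MUL, a, b);          // sees that a == b
    std::printf("a=%u b=%u c=%u (a==b: %s)\n",
                a, b, c, a == b ? "yes" : "no");
    return 0;
}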
Re: [Mesa-dev] r600g: status of the r600-sb branch
Please don't add any new environment variables and use R600_DEBUG instead. The other environment variables are deprecated. There is a table for R600_DEBUG in r600_pipe.c and it even comes with a help feature: R600_DEBUG=help

Marek

On Fri, Apr 19, 2013 at 4:48 PM, Vadim Girlin vadimgir...@gmail.com wrote:

Hi,

In the previous status update I said that the r600-sb branch was not ready to be merged yet, but recently I've done some cleanups and reworks, and though I haven't finished everything I planned initially, I think it's now in a better state and may be considered for merging.

I'm interested to know whether people think that merging the r600-sb branch makes sense at all. I'll try to explain here why it makes sense to me.

Although I understand that the development of the llvm backend is a primary goal for the r600g developers, it's a complicated process and may require quite some time to achieve good results regarding shader/compiler performance, while this branch already works and provides good results in many cases. That's why I think it makes sense to merge this branch as a non-default backend, at least as a temporary solution for shader performance problems. We can always get rid of it if it becomes too much of a maintenance burden, or when the llvm backend catches up in terms of shader performance and compilation speed/overhead.

Regarding support and maintenance of this code, I'll try to do my best to fix possible issues, and so far there are no known unfixed issues. I tested it with many apps on evergreen and fixed all issues with other chips that were reported to me on the list or privately after the last status announcement. There are no piglit regressions on evergreen when this branch is used with both the default and llvm backends.

This code was intentionally separated as much as possible from the other parts of the driver; basically there are just two functions used from r600g, and the shader code is passed to/from r600-sb as hardware bytecode that is not going to change. I think it won't require any modifications at all to stay in sync with most changes in r600g. Some work might be required though if we want to add support for new hw features that are currently unused, e.g. geometry shaders, new instruction types for compute shaders, etc., but I think I'll be able to catch up when it's implemented in the driver and the default or llvm backend. E.g. this branch already works for me on evergreen with some simple OpenCL kernels, including bfgminer, where it increases performance of the kernel compiled with the llvm backend by more than 20% for me.

Besides the performance benefits, I think that an alternative backend might also help with debugging of the default or llvm backend; in some cases it helped me by exposing bugs that are not very obvious otherwise. E.g. it may be hard to compare the dumps from the default and llvm backends to spot a regression because they are too different, but after processing both shaders with r600-sb the code is usually transformed to a more common form, and often this makes it easier to compare and find the differences in shader logic.

One additional feature that might help with llvm backend debugging is the disassembler that works on the hardware bytecode instead of the internal r600g bytecode structs. This results in more readable shader dumps for instructions passed in native hw encoding from the llvm backend. I think this can also help to catch more potential bugs related to bytecode building in r600g/llvm.
Currently r600-sb uses its bytecode disassembler for all shader dumps, including the fetch shaders, even when optimization is not enabled. Basically it can replace r600_bytecode_disasm and related code completely.

Below are some quick benchmarks for shader performance and compilation time, to demonstrate that r600-sb might currently provide better performance for users, at least in some cases. As an example of shaders with good optimization opportunities I used the application that computes and renders atmospheric scattering effects, mentioned in the previous thread: http://lists.freedesktop.org/archives/mesa-dev/2013-February/034682.html

Here are current results for that app (Main.noprecompute, frames per second) with the default backend, default backend + r600-sb, and llvm backend:

  def   def+sb   llvm
  240   590      248

Another quick benchmark is OpenCL kernel performance with bfgminer (megahash/s):

  llvm   llvm+sb
  68     87

One more benchmark is for compilation speed/overhead. I used two piglit tests: the first compiles a lot of shaders (IIRC more than a thousand), the second compiles a few huge shaders. The result is test run time in seconds; this includes not only the compilation time, but it shows the difference anyway:

                     def   def+sb   llvm
  tfb max-varyings   10    14       53
  fp-long-alu