Re: [Mesa-dev] r600g: status of the r600-sb branch

2013-04-21 Thread Vadim Girlin

On 04/21/2013 04:04 AM, Marek Olšák wrote:

Ah, I didn't know you had any other env vars. It's preferable to have
as many boolean flags as possible handled by a single env var, because
it's easier to use (R600_DUMP_SHADERS counts as a pretty ugly list of
boolean flags hidden behind a magic number). Feel free to have
separate env vars for more complex parameters.

I skimmed through some of your code and the coding style looks good.
I'm also okay with C++, it really seems like the right choice here.
However I agree with the argument that one header file per cpp might
not always be a good idea, especially if the header file is pretty
small.



Thanks for reviewing. I pushed to my repo the branch with the following 
changes:


- changes to existing r600g code split out from the main big patch

- small header files merged into sb_pass.h, sb_ir.h, sb_bc.h

- added new R600_DEBUG flags to replace multiple env vars (a rough sketch 
of how such flags might be parsed is shown after this list):
sb - Enable optimization of graphics shaders
sbcl - Enable optimization of compute shaders
sbdry - Dry run, optimize but don't use new bytecode
sbstat - Print optimization statistics (currently the time only)
sbdump - Print IR after some passes

- added debug_id (shader index) to struct r600_bytecode; IDs are assigned 
to each shader in r600_bytecode_init and printed in the shader dump 
header. This is intended to avoid reinventing shader numbering in 
different places for dumps and debugging.


- some minor cleanups
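
For reference, here is a rough sketch of how boolean flags like these could 
be parsed out of a single debug variable; the flag bits, the option table 
and the helper below are made up for illustration and are not the actual 
r600_pipe.c code:

```cpp
#include <cstdio>
#include <cstring>

/* Hypothetical flag bits and option table -- illustration only,
 * not the real r600g definitions. */
enum {
   DBG_SB      = 1 << 0,
   DBG_SB_CS   = 1 << 1,
   DBG_SB_DRY  = 1 << 2,
   DBG_SB_STAT = 1 << 3,
   DBG_SB_DUMP = 1 << 4,
};

struct debug_option { const char *name; unsigned flag; const char *desc; };

static const debug_option options[] = {
   { "sb",     DBG_SB,      "Enable optimization of graphics shaders" },
   { "sbcl",   DBG_SB_CS,   "Enable optimization of compute shaders" },
   { "sbdry",  DBG_SB_DRY,  "Dry run, optimize but don't use new bytecode" },
   { "sbstat", DBG_SB_STAT, "Print optimization statistics" },
   { "sbdump", DBG_SB_DUMP, "Print IR after some passes" },
};

/* Parse a comma-separated list such as R600_DEBUG="sb,sbstat";
 * "help" prints the known flags. */
static unsigned parse_debug_flags(const char *env)
{
   unsigned flags = 0;
   if (!env)
      return 0;
   for (const char *p = env; *p; ) {
      size_t len = strcspn(p, ",");
      if (len == 4 && !strncmp(p, "help", 4)) {
         for (const debug_option &o : options)
            printf("%-8s %s\n", o.name, o.desc);
      } else {
         for (const debug_option &o : options)
            if (len == strlen(o.name) && !strncmp(p, o.name, len))
               flags |= o.flag;
      }
      p += len + (p[len] == ',');
   }
   return flags;
}
```

With something along these lines, R600_DEBUG=sb,sbstat would enable the 
optimizer together with statistics, and R600_DEBUG=help would list the 
available flags.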

Updated branch can be found here:

  http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-sb-2

Vadim


Marek

On Sat, Apr 20, 2013 at 11:02 AM, Vadim Girlin vadimgir...@gmail.com wrote:

On 04/20/2013 03:11 AM, Marek Olšák wrote:


Please don't add any new environment variables and use R600_DEBUG
instead. The other environment variables are deprecated.



I agree, those vars probably need some cleanup, they were added before
R600_DEBUG appeared.

Though I'm afraid some of my options won't fit well into the R600_DEBUG
flags, unless we add support for name/value pairs with optional
custom parsers.

E.g. I have a group of env vars that define the range of included/excluded
shaders for optimization and the mode (include/exclude/off). I thought about
doing this with a single var and a custom parser to specify the range, e.g. as
10-20, but after all it's just a debug feature, not intended for everyday
use, and so far I've failed to convince myself that it's worth the effort.

I can implement support for custom parsers for R600_DEBUG, but do we
really need it? Maybe it would be enough to add e.g. an sb flag to the
R600_DEBUG flags instead of the R600_SB var for enabling it (probably
together with other boolean options such as R600_SB_USE_NEW_BYTECODE)
but leave more complicated internal debug options as is?

Vadim



There is a table for R600_DEBUG in r600_pipe.c and it even comes with
a help feature: R600_DEBUG=help

Marek

On Fri, Apr 19, 2013 at 4:48 PM, Vadim Girlin vadimgir...@gmail.com
wrote:


Hi,

In the previous status update I said that the r600-sb branch is not ready
to
be merged yet, but recently I've done some cleanups and reworks, and
though
I haven't finished everything that I planned initially, I think now it's
in
a better state and may be considered for merging.

I'm interested to know if the people think that merging of the r600-sb
branch makes sense at all. I'll try to explain here why it makes sense to
me.

Although I understand that the development of llvm backend is a primary
goal
for the r600g developers, it's a complicated process and may require
quite
some time to achieve good results regarding the shader/compiler
performance,
and at the same time this branch already works and provides good results
in
many cases. That's why I think it makes sense to merge this branch as a
non-default backend at least as a temporary solution for shader
performance
problems. We can always get rid of it if it becomes too much a
maintenance
burden or when llvm backend catches up in terms of shader performance and
compilation speed/overhead.

Regarding the support and maintenance of this code, I'll try to do my
best
to fix possible issues, and so far there are no known unfixed issues. I
tested it with many apps on evergreen and fixed all issues with other
chips
that were reported to me on the list or privately after the last status
announce. There are no piglit regressions on evergreen when this branch
is
used with both default and llvm backends.

This code was intentionally separated as much as possible from the other
parts of the driver; basically there are just two functions used from r600g,
and the shader code is passed to/from r600-sb as hardware bytecode that is
not going to change. I think it won't require any modifications at all to
keep it in sync with most changes in r600g.
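
Just to illustrate how small that boundary is, the interface can be imagined 
roughly like this (hypothetical names and signatures, not the actual 
r600g/r600-sb entry points):

```cpp
#include <cstdint>
#include <cstddef>

/* Hypothetical boundary between r600g and the optimizer (illustrative
 * names/signatures only): the driver hands over finished hardware
 * bytecode and gets optimized hardware bytecode back, so driver-internal
 * changes don't affect the optimizer. */
struct hw_bytecode {
   const uint32_t *dwords;  /* raw hardware bytecode */
   size_t          ndw;     /* number of dwords */
};

/* 1) Optimize a shader: consumes hw bytecode, produces new hw bytecode;
 *    on error the driver simply keeps the original code. */
int sb_optimize_bytecode(const hw_bytecode &in, hw_bytecode *out);

/* 2) Disassemble/dump a shader in native hw encoding (usable for fetch
 *    shaders and llvm-backend output as well). */
void sb_dump_bytecode(const hw_bytecode &bc);
```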

Some work might be required though if we'll want to add support for the
new
hw features that are currently unused, e.g. geometry shaders, new
instruction types for compute shaders, etc, but I think 

Re: [Mesa-dev] r600g: status of the r600-sb branch

2013-04-20 Thread Vadim Girlin

On 04/20/2013 03:11 AM, Marek Olšák wrote:

Please don't add any new environment variables and use R600_DEBUG
instead. The other environment variables are deprecated.


I agree, those vars probably need some cleanup; they were added before 
R600_DEBUG appeared.


Though I'm afraid some of my options won't fit well into the R600_DEBUG 
flags, unless we add support for name/value pairs with optional 
custom parsers.


E.g. I have a group of env vars that define the range of included/excluded 
shaders for optimization and the mode (include/exclude/off). I thought about 
doing this with a single var and a custom parser to specify the range, e.g. 
as 10-20, but after all it's just a debug feature, not intended for 
everyday use, and so far I've failed to convince myself that it's worth the 
effort.


I can implement support for custom parsers for R600_DEBUG, but do we 
really need it? Maybe it would be enough to add e.g. an sb flag to the 
R600_DEBUG flags instead of the R600_SB var for enabling it (probably 
together with other boolean options such as R600_SB_USE_NEW_BYTECODE) but 
leave more complicated internal debug options as is?
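
For illustration only, a minimal sketch of what such a custom-parsed option 
could look like; the option name, the 10-20 range syntax and the helpers 
below are hypothetical:

```cpp
#include <cstdio>

/* Hypothetical "optimize only shaders 10-20" debug option. */
struct shader_range { unsigned lo, hi; bool valid; };

static shader_range parse_range(const char *value)
{
   shader_range r = { 0, 0, false };
   unsigned lo, hi;
   if (value && sscanf(value, "%u-%u", &lo, &hi) == 2 && lo <= hi) {
      r.lo = lo;
      r.hi = hi;
      r.valid = true;
   }
   return r;
}

/* Decide whether a shader (identified by its debug_id) is selected. */
static bool shader_selected(const shader_range &r, unsigned debug_id)
{
   return !r.valid || (debug_id >= r.lo && debug_id <= r.hi);
}
```

A generic R600_DEBUG parser would then only need to split name=value pairs 
and hand the value part to a per-option callback like parse_range().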


Vadim


There is a table for R600_DEBUG in r600_pipe.c and it even comes with
a help feature: R600_DEBUG=help

Marek

On Fri, Apr 19, 2013 at 4:48 PM, Vadim Girlin vadimgir...@gmail.com wrote:

Hi,

In the previous status update I said that the r600-sb branch is not ready to
be merged yet, but recently I've done some cleanups and reworks, and though
I haven't finished everything that I planned initially, I think now it's in
a better state and may be considered for merging.

I'm interested to know if the people think that merging of the r600-sb
branch makes sense at all. I'll try to explain here why it makes sense to
me.

Although I understand that the development of llvm backend is a primary goal
for the r600g developers, it's a complicated process and may require quite
some time to achieve good results regarding the shader/compiler performance,
and at the same time this branch already works and provides good results in
many cases. That's why I think it makes sense to merge this branch as a
non-default backend at least as a temporary solution for shader performance
problems. We can always get rid of it if it becomes too much a maintenance
burden or when llvm backend catches up in terms of shader performance and
compilation speed/overhead.

Regarding the support and maintenance of this code, I'll try to do my best
to fix possible issues, and so far there are no known unfixed issues. I
tested it with many apps on evergreen and fixed all issues with other chips
that were reported to me on the list or privately after the last status
announce. There are no piglit regressions on evergreen when this branch is
used with both default and llvm backends.

This code was intentionally separated as much as possible from the other
parts of the driver, basically there are just two functions used from r600g,
and the shader code is passed to/from r600-sb as a hardware bytecode that is
not going to change. I think it won't require any modifications at all to
keep it in sync with the most changes in r600g.

Some work might be required though if we'll want to add support for the new
hw features that are currently unused, e.g. geometry shaders, new
instruction types for compute shaders, etc, but I think I'll be able to
catch up when it's implemented in the driver and default or llvm backend.
E.g. this branch already works for me on evergreen with some simple OpenCL
kernels, including bfgminer where it increases performance of the kernel
compiled with llvm backend by more than 20% for me.

Besides the performance benefits, I think that alternative backend also
might help with debugging of the default or llvm backend, in some cases it
helped me by exposing the bugs that are not very obvious otherwise, e.g. it
may be hard to compare the dumps from default and llvm backend to spot the
regression because they are too different, but after processing both shaders
with r600-sb the code is usually transformed to some more common form, and
often this makes it easier to compare and find the differences in shader
logic.

One additional feature that might help with llvm backend debugging is the
disassembler that works on the hardware bytecode instead of the internal
r600g bytecode structs. This results in more readable shader dumps for
instructions passed in native hw encoding from the llvm backend. I think this
can also help catch more potential bugs related to bytecode building in
r600g/llvm. Currently r600-sb uses its bytecode disassembler for all shader
dumps, including the fetch shaders, even when optimization is not enabled.
Basically it can replace r600_bytecode_disasm and related code completely.
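
Very roughly, the difference is decoding raw dwords versus pretty-printing 
driver structs; a toy decoder over an invented word layout (not the real 
r600 ISA encoding) looks something like this:

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

/* Toy disassembler working directly on hardware-style dwords. The field
 * layout here is invented for illustration; the real r600 encoding is
 * different and family-dependent. */
static void toy_disasm(const std::vector<uint32_t> &code)
{
   for (size_t i = 0; i < code.size(); ++i) {
      unsigned w      = code[i];
      unsigned opcode = w & 0xff;          /* pretend bits [7:0]   */
      unsigned dst    = (w >> 8)  & 0x7f;  /* pretend bits [14:8]  */
      unsigned src    = (w >> 16) & 0x7f;  /* pretend bits [22:16] */
      printf("%04zu: op=0x%02x dst=r%u src=r%u\n", i, opcode, dst, src);
   }
}
```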

Below are some quick benchmarks for shader performance and compilation time,
to demonstrate that currently r600-sb might provide better performance for
users, at least in some cases.

As an example of the shaders with 

Re: [Mesa-dev] r600g: status of the r600-sb branch

2013-04-20 Thread Vadim Girlin

On 04/20/2013 01:42 PM, Christian König wrote:

Am 19.04.2013 18:50, schrieb Vadim Girlin:

On 04/19/2013 08:35 PM, Christian König wrote:

Hey Vadim,

Am 19.04.2013 18:18, schrieb Vadim Girlin:

[SNIP]

In theory, yes, some optimizations in this branch are typically used
on the earlier compilation stages, not on the target machine code. On
the other hand, there are some differences that might make it harder,
e.g. many algorithms require SSA form, and though it's possible to do
similar optimizations without SSA, it would be hard to implement. Also
I wanted to support both default backend and llvm backend for
increased testing coverage and to be able to compare the efficiency of
the algorithms in my experiments etc.


Yeah I know, missing an SSA implementation is also something that always
bothered me a bit with both TGSI and GLSL (while I haven't done much
with GLSL, so maybe I misjudge here).

Can you name the different algorithms used?


There is a short description of the algorithms and passes in the
notes.markdown file [1] in that branch, there are also links in the
end to the full description of some algorithms, though some of them
were modified/adapted for this branch.


It's not a strict prerequisite, but I think we both agree that doing
things like LICM on R600 bytecode isn't the best idea over all (when
doing it on GLSL would be beneficial for all drivers not only r600).


In fact there is no special LICM pass, it's done by the GCM (Global
Code Motion, [2]), which probably could be also called global
scheduler. In fact in my branch this pass is combined with some
hw-specific scheduling logic, e.g. grouping fetch/alu instructions to
reduce clause type switching in the code and the number of required CF
instructions, potentially it can also schedule clauses to expose more
parallelism with the BARRIER bit usage.



Yeah I already thought that you're using something like this.

On one hand that is really good, because it is specialized and so produces
really optimal code for the r600 target. But on the other hand it's bad,
because it is specialized and so produces really optimal code ONLY for the
r600 target


I think such a pass at a higher level (GLSL IR or TGSI) would at least need 
some callbacks or caps to be tunable for the target.


Anyway, the result of the GCM pass is affected by the CFG structure, so when 
the target applies e.g. if-conversion or any other target-specific 
control flow optimization, you might want to apply a similar pass again at 
the target instruction level for better results, and then the earlier pass 
on the higher-level IR doesn't look very useful.


Also there are some high-level operations that are translated to a 
bunch of target instructions, e.g. integer division on r600. A high-level 
pass can't hoist i/5 (where i is the loop counter) out of the loop, but 
after translation to target instructions it's possible to hoist some of 
the resulting instructions, producing more efficient code.
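
A contrived illustration of that point (the magic-constant expansion below 
is the generic divide-by-5 lowering, not the exact r600 instruction 
sequence):

```cpp
#include <cstdint>

/* "i / 5" can't be hoisted at a high level because i changes on every
 * iteration, but once the division is expanded into target-level
 * operations, the magic constant (and its load) is loop-invariant and
 * can be hoisted, while only the multiply/shift stay inside the loop. */
uint32_t sum_of_fifths(uint32_t n)
{
   const uint64_t magic = 0xCCCCCCCDull;  /* invariant part, hoisted */
   uint32_t sum = 0;
   for (uint32_t i = 0; i < n; ++i) {
      uint32_t q = (uint32_t)((i * magic) >> 34);  /* == i / 5 */
      sum += q;
   }
   return sum;
}
```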


One more point is that GCM achieves the best efficiency when used 
with a GVN (Global Value Numbering) pass; e.g. GCM allows GVN to not care 
about code placement while eliminating redundant operations, so 
you'll probably want to implement a high-level GVN pass as well.
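
A minimal sketch of the value-numbering half of that combination, over a 
hypothetical IR keyed by (opcode, operand value numbers) rather than 
r600-sb's actual classes:

```cpp
#include <cstdint>
#include <map>
#include <tuple>

using ValueNum = uint32_t;
using ExprKey  = std::tuple<int /*opcode*/, ValueNum, ValueNum>;

/* Expressions are keyed by (opcode, operand value numbers), so two
 * computations that compute the same thing get the same number and the
 * redundant one can be dropped, regardless of where it sits in the code. */
struct gvn_table {
   std::map<ExprKey, ValueNum> table;
   ValueNum next = 0;

   ValueNum number(int op, ValueNum a, ValueNum b) {
      ExprKey key{op, a, b};
      auto it = table.find(key);
      if (it != table.end())
         return it->second;        /* redundant computation found */
      ValueNum vn = next++;
      table.emplace(key, vn);      /* fresh value */
      return vn;
   }
};
```

Because equivalent expressions collapse to one value number, GCM is then 
free to place the single surviving computation wherever dominance and 
latency allow.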


I think it's possible to implement GVN-GCM at the GLSL or TGSI level, but I 
suspect it would require a lot more effort than implementing these passes 
in my branch did, and it would be less efficient.




Just speculating, what would it take to make those passes run on the
LLVM Machine Instruction representation instead of your own representation?


The main difference between the IRs is the representation of control flow: 
r600-sb relies on the fact that the r600 arch doesn't have arbitrary control 
flow, which renders CFGs superfluous. Implementing these passes on 
CFGs would be more complicated; it would also require the computation of 
dominance frontiers, loop detection and analysis, etc. On r600-sb's 
IR these passes are greatly simplified.


Regarding GCM, the original algorithm as described in that pdf works on 
the CFG, so it shouldn't be hard to implement in LLVM, but I'm not sure 
how it would fit into the LLVM infrastructure. LLVM has GVN-PRE, LICM and 
other passes that together do basically the same thing as GVN-GCM, so if 
you implement it, you might want to get rid of LLVM's own passes that 
duplicate the same functionality, and I'm not sure whether that would be 
easy; possibly there are some interdependencies etc. Also I saw mentions 
of some plans (e.g. [1],[2]) regarding the implementation of global code 
motion in LLVM, so it looks like there is already some work in progress.


Vadim

[1] 
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120709/146206.html
[2] 
http://markmail.org/message/2td3fnnggk6oripp#query:+page:1+mid:2td3fnnggk6oripp+state:results




Christian.


Vadim

 [1]
http://cgit.freedesktop.org/~vadimg/mesa/tree/src/gallium/drivers/r600/sb/notes.markdown?h=r600-sb

 [2]

Re: [Mesa-dev] r600g: status of the r600-sb branch

2013-04-20 Thread Christian König

Am 20.04.2013 13:12, schrieb Vadim Girlin:

On 04/20/2013 01:42 PM, Christian König wrote:

Am 19.04.2013 18:50, schrieb Vadim Girlin:

On 04/19/2013 08:35 PM, Christian König wrote:

Hey Vadim,

Am 19.04.2013 18:18, schrieb Vadim Girlin:

[SNIP]

In theory, yes, some optimizations in this branch are typically used
on the earlier compilation stages, not on the target machine code. On
the other hand, there are some differences that might make it harder,
e.g. many algorithms require SSA form, and though it's possible to do
similar optimizations without SSA, it would be hard to implement. 
Also

I wanted to support both default backend and llvm backend for
increased testing coverage and to be able to compare the 
efficiency of

the algorithms in my experiments etc.


Yeah I know, missing an SSA implementation is also something that 
always

bothered me a bit with both TGSI and GLSL (while I haven't done much
with GLSL, so maybe I misjudge here).

Can you name the different algorithms used?


There is a short description of the algorithms and passes in the
notes.markdown file [1] in that branch, there are also links in the
end to the full description of some algorithms, though some of them
were modified/adapted for this branch.


It's not a strict prerequisite, but I think we both agree that doing
things like LICM on R600 bytecode isn't the best idea over all (when
doing it on GLSL would be beneficial for all drivers not only r600).


In fact there is no special LICM pass, it's done by the GCM (Global
Code Motion, [2]), which probably could be also called global
scheduler. In fact in my branch this pass is combined with some
hw-specific scheduling logic, e.g. grouping fetch/alu instructions to
reduce clause type switching in the code and the number of required CF
instructions, potentially it can also schedule clauses to expose more
parallelism with the BARRIER bit usage.



Yeah I already thought that you're using something like this.

On one hand that is really good, because it is specialized and so produces
really optimal code for the r600 target. But on the other hand it's bad,
because it is specialized and so produces really optimal code ONLY for the
r600 target


I think such pass on higher level (GLSL IR or TGSI) would at least 
need some callbacks or caps to be tunable for the target.


Anyway the result of GCM pass is affected by the CFG structure, so 
when the target applies e.g. if-conversion or any other 
target-specific control flow optimization, this means that you might 
want to apply similar pass again on the target instruction level for 
better results, and then previous pass on higher level IR looks not 
very useful.


Also there are some high level operations that are translated to the 
bunch of target instructions, e.g. integer division on r600. 
High-level pass can't hoist i/5 (where i is loop counter) out of the 
loop, but after translation to target instructions it's possible to 
hoist some of the resulting instructions, producing more efficient code.


One more point is that GCM allows to achieve best efficiency when used 
with GVN (Global Value Numbering) pass, e.g. GCM allows GVN to not 
care about code placement during elimination of redundant operations, 
so you'll probably want to implement high-level GVN pass as well.


I think it's possible to implement GVN-GCM on GLSL or TGSI level, but 
I suspect it will require a lot more efforts than it was required by 
implementation of these passes in my branch, and will be less efficient.




Just speculating, what would it take to make those passes run on the
LLVM Machine Instruction representation instead of your own 
representation?


Main difference between IRs is the representation of control flow, 
r600-sb relies on the fact that r600 arch doesn't have arbitrary 
control flow, this renders CFGs superfluous. Implementation of these 
passes on CFGs will be more complicated, it will also require the 
computation of dominance frontiers, loops detection and analysis, etc. 
On the r600-sb's IR these passes are greatly simplified.


Regarding the GCM, original algorithm as described in that pdf works 
on the CFG, so it shouldn't be hard to implement in LLVM, but I'm not 
sure how it will fit into the LLVM infrastructure. LLVM has GVN-PRE, 
LICM and other passes that together do basically the same thing as 
GVN-GCM, so if you implement it, you might want to get rid of LLVM's 
own passes that duplicate the same functionality, and I'm not sure if 
this would be easy, possibly there are some interdependencies etc. 
Also I saw mentions of some plans (e.g. [1],[2]) regarding the 
implementation of global code motion in LLVM, looks like there is 
already some work in progress.




Oh, I wasn't talking about replacing any LLVM passes, more like extending 
them to provide the same amount of functionality. Also I didn't have LLVM 
IR in mind while writing this, but rather the machine instruction 
representation they use.


Well you have quite a lot of C++ 

Re: [Mesa-dev] r600g: status of the r600-sb branch

2013-04-20 Thread Vadim Girlin

On 04/20/2013 03:38 PM, Christian König wrote:

Am 20.04.2013 13:12, schrieb Vadim Girlin:

On 04/20/2013 01:42 PM, Christian König wrote:

Am 19.04.2013 18:50, schrieb Vadim Girlin:

On 04/19/2013 08:35 PM, Christian König wrote:

Hey Vadim,

Am 19.04.2013 18:18, schrieb Vadim Girlin:

[SNIP]

In theory, yes, some optimizations in this branch are typically used
on the earlier compilation stages, not on the target machine code. On
the other hand, there are some differences that might make it harder,
e.g. many algorithms require SSA form, and though it's possible to do
similar optimizations without SSA, it would be hard to implement.
Also
I wanted to support both default backend and llvm backend for
increased testing coverage and to be able to compare the
efficiency of
the algorithms in my experiments etc.


Yeah I know, missing an SSA implementation is also something that
always
bothered me a bit with both TGSI and GLSL (while I haven't done much
with GLSL, so maybe I misjudge here).

Can you name the different algorithms used?


There is a short description of the algorithms and passes in the
notes.markdown file [1] in that branch, there are also links in the
end to the full description of some algorithms, though some of them
were modified/adapted for this branch.


It's not a strict prerequisite, but I think we both agree that doing
things like LICM on R600 bytecode isn't the best idea over all (when
doing it on GLSL would be beneficial for all drivers not only r600).


In fact there is no special LICM pass, it's done by the GCM (Global
Code Motion, [2]), which probably could be also called global
scheduler. In fact in my branch this pass is combined with some
hw-specific scheduling logic, e.g. grouping fetch/alu instructions to
reduce clause type switching in the code and the number of required CF
instructions, potentially it can also schedule clauses to expose more
parallelism with the BARRIER bit usage.



Yeah I already thought that you're using something like this.

On one hand that is really good, because it is specialized and so produces
really optimal code for the r600 target. But on the other hand it's bad,
because it is specialized and so produces really optimal code ONLY for the
r600 target


I think such pass on higher level (GLSL IR or TGSI) would at least
need some callbacks or caps to be tunable for the target.

Anyway the result of GCM pass is affected by the CFG structure, so
when the target applies e.g. if-conversion or any other
target-specific control flow optimization, this means that you might
want to apply similar pass again on the target instruction level for
better results, and then previous pass on higher level IR looks not
very useful.

Also there are some high level operations that are translated to the
bunch of target instructions, e.g. integer division on r600.
High-level pass can't hoist i/5 (where i is loop counter) out of the
loop, but after translation to target instructions it's possible to
hoist some of the resulting instructions, producing more efficient code.

One more point is that GCM allows to achieve best efficiency when used
with GVN (Global Value Numbering) pass, e.g. GCM allows GVN to not
care about code placement during elimination of redundant operations,
so you'll probably want to implement high-level GVN pass as well.

I think it's possible to implement GVN-GCM on GLSL or TGSI level, but
I suspect it will require a lot more efforts than it was required by
implementation of these passes in my branch, and will be less efficient.



Just speculating, what would it take to make those passes run on the
LLVM Machine Instruction representation instead of your own
representation?


Main difference between IRs is the representation of control flow,
r600-sb relies on the fact that r600 arch doesn't have arbitrary
control flow, this renders CFGs superfluous. Implementation of these
passes on CFGs will be more complicated, it will also require the
computation of dominance frontiers, loops detection and analysis, etc.
On the r600-sb's IR these passes are greatly simplified.

Regarding the GCM, original algorithm as described in that pdf works
on the CFG, so it shouldn't be hard to implement in LLVM, but I'm not
sure how it will fit into the LLVM infrastructure. LLVM has GVN-PRE,
LICM and other passes that together do basically the same thing as
GVN-GCM, so if you implement it, you might want to get rid of LLVM's
own passes that duplicate the same functionality, and I'm not sure if
this would be easy, possibly there are some interdependencies etc.
Also I saw mentions of some plans (e.g. [1],[2]) regarding the
implementation of global code motion in LLVM, looks like there is
already some work in progress.



Oh, I wasn't talking about replacing any LLVM passes, more like extending
them to provide the same amount of functionality. Also I didn't have LLVM
IR in mind while writing this, but rather the machine instruction
representation they use.

Well you have quite a lot of C++ 

Re: [Mesa-dev] r600g: status of the r600-sb branch

2013-04-20 Thread Henri Verbeet
On 19 April 2013 18:01, Vadim Girlin vadimgir...@gmail.com wrote:
 The choice of C++ (unlike in my previous branch that used C) was mostly
 driven by the fact that optimization algorithms usually deal with a lot of
 different complex data structures, containers, etc, and C++ allows to
 isolate implementation of all such things in separate and easily replaceable
 classes and concentrate on the logic, making the code more clean and
 readable.

I'm sure it would be good fun to have a discussion about the relative
merits of C and C++, though I think I've seen enough actual C++ that
you're not going to convince me it's the better language. However, I
don't think that should be the main consideration. It's probably more
important to consider what current and potential new contributors
prefer, and on Linux, particularly for the more low-level stuff, I
suspect that pretty much means C.

 I haven't tried to keep it as a series of independent patches because during
 the development most changes were pretty intrusive and introduced new
 features, some parts were seriously reworked/rewritten more than one time,
 requiring changes in other parts, especially when intermediate
 representation of the code was changed. It was usually easier for me to
 simply fix the new regressions in the new code than to revert any changes
 and lose new features, so bisection wouldn't be very helpful anyway. That's
 why I didn't even try to keep the history. Anyway most of the code in the
 branch is new, so I don't think that the history of the patches that rewrite
 the same code few times during a development would make it more readable
 than simply reading the final code.

I think I'm just going to disagree there. (But of course that's all
just my personal opinion, which probably doesn't carry a lot of weight
at the moment.)


Re: [Mesa-dev] r600g: status of the r600-sb branch

2013-04-20 Thread Vadim Girlin

On 04/20/2013 07:05 PM, Henri Verbeet wrote:

On 19 April 2013 18:01, Vadim Girlin vadimgir...@gmail.com wrote:

The choice of C++ (unlike in my previous branch that used C) was mostly
driven by the fact that optimization algorithms usually deal with a lot of
different complex data structures, containers, etc, and C++ allows to
isolate implementation of all such things in separate and easily replaceable
classes and concentrate on the logic, making the code more clean and
readable.


I'm sure it would be good fun to have a discussion about the relative
merits of C and C++, though I think I've seen enough actual C++ that
you're not going to convince me it's the better language.


I never wanted to convince you that C++ is the better language, I just 
wanted to explain why I decided to switch from C to C++ in this 
particular case.



However, I
don't think that should be the main consideration. It's probably more
important to consider what current and potential new contributors
prefer, and on Linux, particularly for the more low-level stuff, I
suspect that pretty much means C.


Well, it may be considered low-level stuff because it's part of 
the driver. On the other hand, I'd rather think of it as part of a 
compiler, and compilers (especially optimization algorithms) don't 
really look like low-level stuff to me. It depends on the definition of 
low-level stuff, though.


To name a few examples, we can look at the compilers/optimizing backends 
used by mesa/gallium: the GLSL compiler (written in C++), LLVM (written in 
C++), the backends for the nvidia drivers (written in C++)...


Vadim




I haven't tried to keep it as a series of independent patches because during
the development most changes were pretty intrusive and introduced new
features, some parts were seriously reworked/rewritten more than one time,
requiring changes in other parts, especially when intermediate
representation of the code was changed. It was usually easier for me to
simply fix the new regressions in the new code than to revert any changes
and lose new features, so bisection wouldn't be very helpful anyway. That's
why I didn't even try to keep the history. Anyway most of the code in the
branch is new, so I don't think that the history of the patches that rewrite
the same code few times during a development would make it more readable
than simply reading the final code.


I think I'm just going to disagree there. (But of course that's all
just my personal opinion, which probably doesn't carry a lot of weight
at the moment.)





Re: [Mesa-dev] r600g: status of the r600-sb branch

2013-04-20 Thread Marek Olšák
Ah, I didn't know you had any other env vars. It's preferable to have
as many boolean flags as possible handled by a single env var, because
it's easier to use (R600_DUMP_SHADERS counts as a pretty ugly list of
boolean flags hidden behind a magic number). Feel free to have
separate env vars for more complex parameters.

I skimmed through some of your code and the coding style looks good.
I'm also okay with C++, it really seems like the right choice here.
However I agree with the argument that one header file per cpp might
not always be a good idea, especially if the header file is pretty
small.

Marek

On Sat, Apr 20, 2013 at 11:02 AM, Vadim Girlin vadimgir...@gmail.com wrote:
 On 04/20/2013 03:11 AM, Marek Olšák wrote:

 Please don't add any new environment variables and use R600_DEBUG
 instead. The other environment variables are deprecated.


 I agree, those vars probably need some cleanup, they were added before
 R600_DEBUG appeared.

 Though I'm afraid some of my options won't fit well into the R600_DEBUG
 flags, unless we'll add support for the name/value pairs with optional
 custom parsers.

 E.g. I have a group of env vars that define the range of included/excluded
 shaders for optimization and the mode (include/exclude/off). I thought about
 doing this with a single var and a custom parser to specify the range e.g. as
 10-20, but after all it's just a debug feature, not intended for everyday
 use, and so far I've failed to convince myself that it's worth the effort.

 I can implement support for custom parsers for R600_DEBUG, but do we
 really need it? Maybe it would be enough to add e.g. an sb flag to R600_DEBUG
 instead of the R600_SB var for enabling it (probably together with other
 boolean options such as R600_SB_USE_NEW_BYTECODE) but leave more complicated
 internal debug options as is?

 Vadim


 There is a table for R600_DEBUG in r600_pipe.c and it even comes with
 a help feature: R600_DEBUG=help

 Marek

 On Fri, Apr 19, 2013 at 4:48 PM, Vadim Girlin vadimgir...@gmail.com
 wrote:

 Hi,

 In the previous status update I said that the r600-sb branch is not ready
 to
 be merged yet, but recently I've done some cleanups and reworks, and
 though
 I haven't finished everything that I planned initially, I think now it's
 in
 a better state and may be considered for merging.

 I'm interested to know if the people think that merging of the r600-sb
 branch makes sense at all. I'll try to explain here why it makes sense to
 me.

 Although I understand that the development of llvm backend is a primary
 goal
 for the r600g developers, it's a complicated process and may require
 quite
 some time to achieve good results regarding the shader/compiler
 performance,
 and at the same time this branch already works and provides good results
 in
 many cases. That's why I think it makes sense to merge this branch as a
 non-default backend at least as a temporary solution for shader
 performance
 problems. We can always get rid of it if it becomes too much a
 maintenance
 burden or when llvm backend catches up in terms of shader performance and
 compilation speed/overhead.

 Regarding the support and maintenance of this code, I'll try to do my
 best
 to fix possible issues, and so far there are no known unfixed issues. I
 tested it with many apps on evergreen and fixed all issues with other
 chips
 that were reported to me on the list or privately after the last status
 announce. There are no piglit regressions on evergreen when this branch
 is
 used with both default and llvm backends.

 This code was intentionally separated as much as possible from the other
 parts of the driver, basically there are just two functions used from
 r600g,
 and the shader code is passed to/from r600-sb as a hardware bytecode that
 is
 not going to change. I think it won't require any modifications at all to
 keep it in sync with the most changes in r600g.

 Some work might be required though if we'll want to add support for the
 new
 hw features that are currently unused, e.g. geometry shaders, new
 instruction types for compute shaders, etc, but I think I'll be able to
 catch up when it's implemented in the driver and default or llvm backend.
 E.g. this branch already works for me on evergreen with some simple
 OpenCL
 kernels, including bfgminer where it increases performance of the kernel
 compiled with llvm backend by more than 20% for me.

 Besides the performance benefits, I think that alternative backend also
 might help with debugging of the default or llvm backend, in some cases
 it
 helped me by exposing the bugs that are not very obvious otherwise, e.g.
 it
 may be hard to compare the dumps from default and llvm backend to spot
 the
 regression because they are too different, but after processing both
 shaders
 with r600-sb the code is usually transformed to some more common form,
 and
 often this makes it easier to compare and find the differences in shader
 logic.

 One additional feature that might help with llvm backend 

Re: [Mesa-dev] r600g: status of the r600-sb branch

2013-04-19 Thread Christian König

Hi Vadim,

From your description it seems to be a post-processing stage working on 
the bytecode of the shaders, and in addition to that it is quite separate 
from the rest of the driver.


If that's the case then I don't really see a reason why we shouldn't 
merge it, but at least at the beginning it should probably be disabled 
by default.


On the other hand we should ask whether there are any optimizations in 
there that could be done at earlier stages, for example at the GLSL 
level.


Cheers,
Christian.

Am 19.04.2013 16:48, schrieb Vadim Girlin:

Hi,

In the previous status update I said that the r600-sb branch is not 
ready to be merged yet, but recently I've done some cleanups and 
reworks, and though I haven't finished everything that I planned 
initially, I think now it's in a better state and may be considered 
for merging.


I'm interested to know if the people think that merging of the r600-sb 
branch makes sense at all. I'll try to explain here why it makes sense 
to me.


Although I understand that the development of llvm backend is a 
primary goal for the r600g developers, it's a complicated process and 
may require quite some time to achieve good results regarding the 
shader/compiler performance, and at the same time this branch already 
works and provides good results in many cases. That's why I think it 
makes sense to merge this branch as a non-default backend at least as 
a temporary solution for shader performance problems. We can always 
get rid of it if it becomes too much a maintenance burden or when llvm 
backend catches up in terms of shader performance and compilation 
speed/overhead.


Regarding the support and maintenance of this code, I'll try to do my 
best to fix possible issues, and so far there are no known unfixed 
issues. I tested it with many apps on evergreen and fixed all issues 
with other chips that were reported to me on the list or privately 
after the last status announce. There are no piglit regressions on 
evergreen when this branch is used with both default and llvm backends.


This code was intentionally separated as much as possible from the 
other parts of the driver, basically there are just two functions used 
from r600g, and the shader code is passed to/from r600-sb as a 
hardware bytecode that is not going to change. I think it won't 
require any modifications at all to keep it in sync with the most 
changes in r600g.


Some work might be required though if we'll want to add support for 
the new hw features that are currently unused, e.g. geometry shaders, 
new instruction types for compute shaders, etc, but I think I'll be 
able to catch up when it's implemented in the driver and default or 
llvm backend. E.g. this branch already works for me on evergreen with 
some simple OpenCL kernels, including bfgminer where it increases 
performance of the kernel compiled with llvm backend by more than 20% 
for me.


Besides the performance benefits, I think that alternative backend 
also might help with debugging of the default or llvm backend, in some 
cases it helped me by exposing the bugs that are not very obvious 
otherwise, e.g. it may be hard to compare the dumps from default and 
llvm backend to spot the regression because they are too different, 
but after processing both shaders with r600-sb the code is usually 
transformed to some more common form, and often this makes it easier 
to compare and find the differences in shader logic.


One additional feature that might help with llvm backend debugging is 
the disassembler that works on the hardware bytecode instead of the 
internal r600g bytecode structs. This results in the more readable 
shader dumps for instructions passed in native hw encoding from llvm 
backend. I think this also can help to catch more potential bugs 
related to bytecode building in r600g/llvm. Currently r600-sb uses its 
bytecode disassembler for all shader dumps, including the fetch 
shaders, even when optimization is not enabled. Basically it can 
replace r600_bytecode_disasm and related code completely.


Below are some quick benchmarks for shader performance and compilation 
time, to demonstrate that currently r600-sb might provide better 
performance for users, at least in some cases.


As an example of the shaders with good optimization opportunities I 
used the application that computes and renders atmospheric scattering 
effects, it was mentioned in the previous thread:

http://lists.freedesktop.org/archives/mesa-dev/2013-February/034682.html

Here are current results for that app (Main.noprecompute, frames per 
second) with default backend, default backend + r600-sb, and llvm 
backend:

def    def+sb    llvm
240    590       248

Another quick benchmark is an OpenCL kernel performance with bfgminer 
(megahash/s):

llvm    llvm+sb
68      87

One more benchmark is for compilation speed/overhead - I used two 
piglit tests, first compiles a lot of shaders (IIRC more than 
thousand), second compiles 

Re: [Mesa-dev] r600g: status of the r600-sb branch

2013-04-19 Thread Henri Verbeet
On 19 April 2013 16:48, Vadim Girlin vadimgir...@gmail.com wrote:
 In the previous status update I said that the r600-sb branch is not ready to
 be merged yet, but recently I've done some cleanups and reworks, and though
 I haven't finished everything that I planned initially, I think now it's in
 a better state and may be considered for merging.

 I'm interested to know if the people think that merging of the r600-sb
 branch makes sense at all. I'll try to explain here why it makes sense to
 me.

Personally, I'd be in favour of merging this at some point. While I
haven't exactly done extensive testing or benchmarking with the
branch, the things I did try at least worked correctly, so I'd say
that's a good start at least.

I'm afraid I can't claim extensive review either, but I guess the most
obvious things I don't like about it are that it's C++, and spread
over a large number of <1000 line files. Similarly, I don't really
see the point of having a header file for just about each .cpp file.
One for private interfaces and one for the public interface should
probably be plenty. I'm not quite sure how others feel about that,
although I suspect I'm not alone in at least the preference of C over
C++. I also suspect it would help if this was some kind of logical,
bisectable series of patches instead of a single commit that adds 18k+
lines.


Re: [Mesa-dev] r600g: status of the r600-sb branch

2013-04-19 Thread Vadim Girlin

On 04/19/2013 07:23 PM, Henri Verbeet wrote:

On 19 April 2013 16:48, Vadim Girlin vadimgir...@gmail.com wrote:

In the previous status update I said that the r600-sb branch is not ready to
be merged yet, but recently I've done some cleanups and reworks, and though
I haven't finished everything that I planned initially, I think now it's in
a better state and may be considered for merging.

I'm interested to know if the people think that merging of the r600-sb
branch makes sense at all. I'll try to explain here why it makes sense to
me.


Personally, I'd be in favour of merging this at some point. While I
haven't exactly done extensive testing or benchmarking with the
branch, the things I did try at least worked correctly, so I'd say
that's a good start at least.

I'm afraid I can't claim extensive review either, but I guess the most
obvious things I don't like about it are that it's C++, and spread
over a large number of <1000 line files. Similarly, I don't really
see the point of having a header file for just about each .cpp file.
One for private interfaces and one for the public interface should
probably be plenty.


I thought about that, but I'm just not sure what the preferred way would 
be. I agree that a lot of small files don't look very good; on the 
other hand it keeps all the classes better separated and readable, which is 
why I was not sure which way is best. Of course I can merge some files 
together if that's preferable.



I'm not quite sure how others feel about that,
although I suspect I'm not alone in at least the preference of C over
C++.


The choice of C++ (unlike in my previous branch, which used C) was mostly 
driven by the fact that optimization algorithms usually deal with a lot 
of different complex data structures, containers, etc., and C++ allows me to 
isolate the implementation of all such things in separate and easily 
replaceable classes and concentrate on the logic, making the code 
cleaner and more readable.



I also suspect it would help if this was some kind of logical,
bisectable series of patches instead of a single commit that adds 18k+
lines.


I haven't tried to keep it as a series of independent patches because 
during development most changes were pretty intrusive and introduced 
new features, and some parts were seriously reworked/rewritten more than 
once, requiring changes in other parts, especially when the intermediate 
representation of the code was changed. It was usually easier for me to 
simply fix the new regressions in the new code than to revert any 
changes and lose new features, so bisection wouldn't be very helpful 
anyway. That's why I didn't even try to keep the history. Anyway, most of 
the code in the branch is new, so I don't think that a history of patches 
that rewrite the same code a few times during development would 
make it more readable than simply reading the final code.


Vadim


Re: [Mesa-dev] r600g: status of the r600-sb branch

2013-04-19 Thread Vadim Girlin

On 04/19/2013 07:13 PM, Christian König wrote:

Hi Vadim,

from your description it seems to be a post processing stage working on
the bytecode of the shaders and additional to that is quite separated
from the rest of the driver.


Yes, currently it's more like a post-processing stage, though on the 
other hand the only thing missing to consider it a complete backend 
is an initial TGSI translator (that is, a sort of instruction selection 
pass), which is basically exactly what the default backend in r600g does. I 
thought about writing a direct translator from TGSI to my IR, but it would 
require some time and the benefits aren't very clear beyond slightly 
reduced translation time. It's easier to rely on the default backend for 
that, and it also simplifies debugging by providing the ability to see and 
compare both the source (after the default backend) and the optimized 
bytecode.




If that's the case then I don't really see a reason why we shouldn't
merge it, but at least at the beginning it should probably be disabled
by default.


Yes, I agree that it's better to have it disabled by default; it's 
currently enabled in my branch just to simplify testing, but I'll change 
that if we merge the branch.




On the other hand we should question if there are any optimizations in
there that could be done on earlier stages, something like on the GLSL
level for example?


In theory, yes, some optimizations in this branch are typically applied at 
earlier compilation stages, not on the target machine code. On the 
other hand, there are some differences that might make that harder, e.g. 
many algorithms require SSA form, and though it's possible to do similar 
optimizations without SSA, they would be hard to implement. Also I wanted 
to support both the default backend and the llvm backend, for increased 
testing coverage and to be able to compare the efficiency of the algorithms 
in my experiments etc.
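
As a tiny illustration of the SSA idea in plain C++ (purely illustrative 
names, nothing r600-specific): every value gets exactly one definition, and 
the join point becomes an explicit select playing the role of a phi, which 
is what makes def-use information trivial for the algorithms mentioned 
above:

```cpp
/* Tiny illustration of SSA: every value is a distinct single-assignment
 * variable, and the join point is an explicit select acting as a phi. */
int ssa_example(int a, int b, bool c)
{
   const int x1 = a + b;        /* one definition per value         */
   const int x2 = a - b;
   const int x3 = c ? x2 : x1;  /* "phi": value from the taken path */
   const int y1 = x3 * 2;
   return y1;
}
```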


Vadim



Cheers,
Christian.

Am 19.04.2013 16:48, schrieb Vadim Girlin:

Hi,

In the previous status update I said that the r600-sb branch is not
ready to be merged yet, but recently I've done some cleanups and
reworks, and though I haven't finished everything that I planned
initially, I think now it's in a better state and may be considered
for merging.

I'm interested to know if the people think that merging of the r600-sb
branch makes sense at all. I'll try to explain here why it makes sense
to me.

Although I understand that the development of llvm backend is a
primary goal for the r600g developers, it's a complicated process and
may require quite some time to achieve good results regarding the
shader/compiler performance, and at the same time this branch already
works and provides good results in many cases. That's why I think it
makes sense to merge this branch as a non-default backend at least as
a temporary solution for shader performance problems. We can always
get rid of it if it becomes too much a maintenance burden or when llvm
backend catches up in terms of shader performance and compilation
speed/overhead.

Regarding the support and maintenance of this code, I'll try to do my
best to fix possible issues, and so far there are no known unfixed
issues. I tested it with many apps on evergreen and fixed all issues
with other chips that were reported to me on the list or privately
after the last status announce. There are no piglit regressions on
evergreen when this branch is used with both default and llvm backends.

This code was intentionally separated as much as possible from the
other parts of the driver, basically there are just two functions used
from r600g, and the shader code is passed to/from r600-sb as a
hardware bytecode that is not going to change. I think it won't
require any modifications at all to keep it in sync with the most
changes in r600g.

Some work might be required though if we'll want to add support for
the new hw features that are currently unused, e.g. geometry shaders,
new instruction types for compute shaders, etc, but I think I'll be
able to catch up when it's implemented in the driver and default or
llvm backend. E.g. this branch already works for me on evergreen with
some simple OpenCL kernels, including bfgminer where it increases
performance of the kernel compiled with llvm backend by more than 20%
for me.

Besides the performance benefits, I think that alternative backend
also might help with debugging of the default or llvm backend, in some
cases it helped me by exposing the bugs that are not very obvious
otherwise, e.g. it may be hard to compare the dumps from default and
llvm backend to spot the regression because they are too different,
but after processing both shaders with r600-sb the code is usually
transformed to some more common form, and often this makes it easier
to compare and find the differences in shader logic.

One additional feature that might help with llvm backend debugging is
the disassembler that works on the hardware bytecode instead of the
internal r600g bytecode 

Re: [Mesa-dev] r600g: status of the r600-sb branch

2013-04-19 Thread Christian König

Hey Vadim,

Am 19.04.2013 18:18, schrieb Vadim Girlin:

[SNIP]

In theory, yes, some optimizations in this branch are typically used 
on the earlier compilation stages, not on the target machine code. On 
the other hand, there are some differences that might make it harder, 
e.g. many algorithms require SSA form, and though it's possible to do 
similar optimizations without SSA, it would be hard to implement. Also 
I wanted to support both default backend and llvm backend for 
increased testing coverage and to be able to compare the efficiency of 
the algorithms in my experiments etc.


Yeah I know, the missing SSA implementation is also something that always 
bothered me a bit with both TGSI and GLSL (though I haven't done much 
with GLSL, so maybe I misjudge here).


Can you name the different algorithms used?

It's not a strict prerequisite, but I think we both agree that doing 
things like LICM on R600 bytecode isn't the best idea overall (whereas 
doing it on GLSL would be beneficial for all drivers, not only r600).


Regards,
Christian.


Re: [Mesa-dev] r600g: status of the r600-sb branch

2013-04-19 Thread Vadim Girlin

On 04/19/2013 08:35 PM, Christian König wrote:

Hey Vadim,

Am 19.04.2013 18:18, schrieb Vadim Girlin:

[SNIP]

In theory, yes, some optimizations in this branch are typically used
on the earlier compilation stages, not on the target machine code. On
the other hand, there are some differences that might make it harder,
e.g. many algorithms require SSA form, and though it's possible to do
similar optimizations without SSA, it would be hard to implement. Also
I wanted to support both default backend and llvm backend for
increased testing coverage and to be able to compare the efficiency of
the algorithms in my experiments etc.


Yeah I know, missing an SSA implementation is also something that always
bothered me a bit with both TGSI and GLSL (while I haven't done much
with GLSL, so maybe I misjudge here).

Can you name the different algorithms used?


There is a short description of the algorithms and passes in the 
notes.markdown file [1] in that branch; there are also links at the end 
to full descriptions of some algorithms, though some of them were 
modified/adapted for this branch.



It's not a strict prerequisite, but I think we both agree that doing
things like LICM on R600 bytecode isn't the best idea over all (when
doing it on GLSL would be beneficial for all drivers not only r600).


In fact there is no special LICM pass; it's done by GCM (Global Code 
Motion, [2]), which could probably also be called a global scheduler. In 
my branch this pass is combined with some hw-specific scheduling 
logic, e.g. grouping fetch/alu instructions to reduce clause type 
switching in the code and the number of required CF instructions; 
potentially it can also schedule clauses to expose more parallelism with 
the BARRIER bit usage.
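
A very rough sketch of the clause-grouping idea, ignoring real dependencies 
and the actual r600 clause rules (the names and types below are 
hypothetical):

```cpp
#include <algorithm>
#include <vector>

enum class Kind { Alu, Fetch };

/* Each switch between ALU and FETCH instructions starts a new clause
 * (and a CF entry), so fewer type changes means fewer clauses. */
static int count_clauses(const std::vector<Kind> &seq)
{
   int clauses = 0;
   for (size_t i = 0; i < seq.size(); ++i)
      if (i == 0 || seq[i] != seq[i - 1])
         ++clauses;
   return clauses;
}

/* With no dependencies at all, a stable partition by kind gives the
 * minimal number of clauses; the real pass of course has to respect
 * dependencies and clause size limits. */
static std::vector<Kind> naive_group(std::vector<Kind> seq)
{
   std::stable_partition(seq.begin(), seq.end(),
                         [](Kind k) { return k == Kind::Fetch; });
   return seq;
}
```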


Vadim

 [1] 
http://cgit.freedesktop.org/~vadimg/mesa/tree/src/gallium/drivers/r600/sb/notes.markdown?h=r600-sb
 [2] 
http://www.cs.washington.edu/education/courses/cse501/06wi/reading/click-pldi95.pdf



Regards,
Christian.




Re: [Mesa-dev] r600g: status of the r600-sb branch

2013-04-19 Thread Marek Olšák
Please don't add any new environment variables and use R600_DEBUG
instead. The other environment variables are deprecated.

There is a table for R600_DEBUG in r600_pipe.c and it even comes with
a help feature: R600_DEBUG=help

Marek

On Fri, Apr 19, 2013 at 4:48 PM, Vadim Girlin vadimgir...@gmail.com wrote:
 Hi,

 In the previous status update I said that the r600-sb branch is not ready to
 be merged yet, but recently I've done some cleanups and reworks, and though
 I haven't finished everything that I planned initially, I think now it's in
 a better state and may be considered for merging.

 I'm interested to know if the people think that merging of the r600-sb
 branch makes sense at all. I'll try to explain here why it makes sense to
 me.

 Although I understand that the development of llvm backend is a primary goal
 for the r600g developers, it's a complicated process and may require quite
 some time to achieve good results regarding the shader/compiler performance,
 and at the same time this branch already works and provides good results in
 many cases. That's why I think it makes sense to merge this branch as a
 non-default backend at least as a temporary solution for shader performance
 problems. We can always get rid of it if it becomes too much a maintenance
 burden or when llvm backend catches up in terms of shader performance and
 compilation speed/overhead.

 Regarding the support and maintenance of this code, I'll try to do my best
 to fix possible issues, and so far there are no known unfixed issues. I
 tested it with many apps on evergreen and fixed all issues with other chips
 that were reported to me on the list or privately after the last status
 announce. There are no piglit regressions on evergreen when this branch is
 used with both default and llvm backends.

 This code was intentionally separated as much as possible from the other
 parts of the driver, basically there are just two functions used from r600g,
 and the shader code is passed to/from r600-sb as a hardware bytecode that is
 not going to change. I think it won't require any modifications at all to
 keep it in sync with the most changes in r600g.

 Some work might be required though if we'll want to add support for the new
 hw features that are currently unused, e.g. geometry shaders, new
 instruction types for compute shaders, etc, but I think I'll be able to
 catch up when it's implemented in the driver and default or llvm backend.
 E.g. this branch already works for me on evergreen with some simple OpenCL
 kernels, including bfgminer where it increases performance of the kernel
 compiled with llvm backend by more than 20% for me.

 Besides the performance benefits, I think that alternative backend also
 might help with debugging of the default or llvm backend, in some cases it
 helped me by exposing the bugs that are not very obvious otherwise, e.g. it
 may be hard to compare the dumps from default and llvm backend to spot the
 regression because they are too different, but after processing both shaders
 with r600-sb the code is usually transformed to some more common form, and
 often this makes it easier to compare and find the differences in shader
 logic.

 One additional feature that might help with llvm backend debugging is the
 disassembler that works on the hardware bytecode instead of the internal
 r600g bytecode structs. This results in the more readable shader dumps for
 instructions passed in native hw encoding from llvm backend. I think this
 also can help to catch more potential bugs related to bytecode building in
 r600g/llvm. Currently r600-sb uses its bytecode disassembler for all shader
 dumps, including the fetch shaders, even when optimization is not enabled.
 Basically it can replace r600_bytecode_disasm and related code completely.

 Below are some quick benchmarks for shader performance and compilation time,
 to demonstrate that currently r600-sb might provide better performance for
 users, at least in some cases.

 As an example of the shaders with good optimization opportunities I used the
 application that computes and renders atmospheric scattering effects, it was
 mentioned in the previous thread:
 http://lists.freedesktop.org/archives/mesa-dev/2013-February/034682.html

 Here are current results for that app (Main.noprecompute, frames per second)
 with default backend, default backend + r600-sb, and llvm backend:
 def def+sb  llvm
 240 590 248

 Another quick benchmark is an OpenCL kernel performance with bfgminer
 (megahash/s):
 llvm    llvm+sb
 68      87

 One more benchmark is for compilation speed/overhead - I used two piglit
 tests, first compiles a lot of shaders (IIRC more than thousand), second
 compiles a few huge shaders. Result is a test run time in seconds, this
 includes not only the compilation time but anyway shows the difference:
                     def    def+sb    llvm
 tfb max-varyings    10     14        53
 fp-long-alu