Re: [Mesa-dev] [PATCH v2] r600/sb: bail out if prepare_alu_group() doesn't find a proper scheduling

2017-10-18 Thread Vadim Girlin

On 10/16/2017 10:06 PM, Gert Wollny wrote:

It is possible that the optimizer ends up in an infinite loop in
post_scheduler::schedule_alu(), because post_scheduler::prepare_alu_group()
does not find a proper scheduling. This can be deduced from
pending.count() being larger than zero and not getting smaller.

This patch works around this problem by signalling this failure so that the
optimizer bails out and the un-optimized shader is used.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103142
Signed-off-by: Gert Wollny 
---
Change w.r.t. v1:
- In schedule_alu() if pending.count() == 0 then don't expect that
   this value is reduced by a call to  prepare_alu_group(), instead
   continue the loop until it is exited by "break".
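
The bail-out described above can be sketched roughly as follows. This is only a minimal illustration, not the actual sb_sched.cpp code: `toy_scheduler`, its `pending` queue and the rule that negative entries can never be scheduled are assumptions made up for the example. The point is the progress check: if a pass does not shrink the pending set, the loop reports failure instead of spinning forever.

```cpp
#include <cstddef>
#include <deque>

// Hypothetical stand-in for post_scheduler; not the real Mesa class.
struct toy_scheduler {
    std::deque<int> pending;

    // Tries to schedule one pending entry. In this toy model a negative
    // entry can never be scheduled, which mimics an instruction that
    // prepare_alu_group() cannot place.
    bool prepare_group() {
        for (auto it = pending.begin(); it != pending.end(); ++it) {
            if (*it >= 0) {
                pending.erase(it);
                return true;
            }
        }
        return false;
    }

    // Returns true if everything was scheduled, false as soon as one pass
    // leaves the number of pending entries unchanged.
    bool schedule() {
        while (!pending.empty()) {
            const std::size_t before = pending.size();
            prepare_group();
            if (pending.size() >= before)
                return false;
        }
        return true;
    }
};

// Mirrors the "0 on success, non-zero on failure" convention of run().
int run(toy_scheduler &s) { return s.schedule() ? 0 : 1; }
```

A caller that gets a non-zero result from `run()` would then keep the un-optimized shader, as the commit message says.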

I've added you, Vadim, as the original author to the CC; maybe you can shed a
bit more light on what might be going wrong here, and whether there is an easy
real fix instead of just a workaround.


I honestly barely remember all the related details, sorry. I guess you 
know that code a lot better than me now. That VLIW scheduling/packing 
stuff was pretty complicated even when I worked on it. :)

Now after 4 years I'm too scared to touch it.

So I can't reasonably review it now, but if this patch fixes the bug and 
doesn't result in any regressions, and if Glenn and Dave have no 
objections, I guess it's ok to push it, you can add my "acked-by".


I'd push it but I'm not ready to take responsibility for any possible 
fallout. I hope Dave or whoever maintains r600g will help with that.


Thanks for fixing it.



best regards,
Gert

Note: Submitter has no mesa-git write access.

  src/gallium/drivers/r600/sb/sb_sched.cpp | 43 
  src/gallium/drivers/r600/sb/sb_sched.h   |  8 +++---
  2 files changed, 31 insertions(+), 20 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_sched.cpp b/src/gallium/drivers/r600/sb/sb_sched.cpp
index 5113b75684..2fbec2f77e 100644
--- a/src/gallium/drivers/r600/sb/sb_sched.cpp
+++ b/src/gallium/drivers/r600/sb/sb_sched.cpp
@@ -711,22 +711,24 @@ void alu_group_tracker::update_flags(alu_node* n) {
  }
  
  int post_scheduler::run() {

-   run_on(sh.root);
-   return 0;
+   return run_on(sh.root) ? 0 : 1;
  }
  
-void post_scheduler::run_on(container_node* n) {

-
+bool post_scheduler::run_on(container_node* n) {
+   int r = true;
for (node_riterator I = n->rbegin(), E = n->rend(); I != E; ++I) {
if (I->is_container()) {
if (I->subtype == NST_BB) {
bb_node* bb = static_cast<bb_node*>(*I);
-   schedule_bb(bb);
+   r = schedule_bb(bb);
} else {
-   run_on(static_cast<container_node*>(*I));
+   r = run_on(static_cast<container_node*>(*I));
}
+   if (!r)
+   break;
}
}
+   return r;
  }
  
  void post_scheduler::init_uc_val(container_node *c, value *v) {

@@ -758,7 +760,7 @@ unsigned post_scheduler::init_ucm(container_node *c, node *n) {
return F == ucm.end() ? 0 : F->second;
  }
  
-void post_scheduler::schedule_bb(bb_node* bb) {

+bool post_scheduler::schedule_bb(bb_node* bb) {
PSC_DUMP(
sblog << "scheduling BB " << bb->id << "\n";
if (!pending.empty())
@@ -791,8 +793,10 @@ void post_scheduler::schedule_bb(bb_node* bb) {
  
  		if (n->is_alu_clause()) {

n->remove();
-   process_alu(static_cast<container_node*>(n));
-   continue;
+   bool r = process_alu(static_cast<container_node*>(n));
+   if (r)
+   continue;
+   return false;
}
  
  		n->remove();

@@ -800,6 +804,7 @@ void post_scheduler::schedule_bb(bb_node* bb) {
}
  
  	this->cur_bb = NULL;

+   return true;
  }
  
  void post_scheduler::init_regmap() {

@@ -933,10 +938,10 @@ void post_scheduler::process_fetch(container_node *c) {
cur_bb->push_front(c);
  }
  
-void post_scheduler::process_alu(container_node *c) {

+bool post_scheduler::process_alu(container_node *c) {
  
  	if (c->empty())

-   return;
+   return true;
  
  	ucm.clear();

alu.reset();
@@ -973,7 +978,7 @@ void post_scheduler::process_alu(container_node *c) {
}
}
  
-	schedule_alu(c);

+   return schedule_alu(c);
  }
  
  void post_scheduler::update_local_interferences() {

@@ -1135,15 +1140,20 @@ void post_scheduler::emit_clause() {
emit_index_registers();
  }
  
-void post_scheduler::schedule_alu(container_node *c) {

+bool post_scheduler::schedule_alu(container_node *c) {
  
  	assert(!ready.empty() || !ready_copies.empty());
  
-	while (1) {

-

Re: [Mesa-dev] [PATCH] r600/sb: remove superfluos assert

2017-09-13 Thread Vadim Girlin

On 09/13/2017 11:16 AM, Gert Wollny wrote:

On Tuesday, 12.09.2017 at 23:44 +0200, Glenn Kennard wrote:


Vadim is correct, the fix is to extend the check in the if case above
to also exclude TGSI_FILE_SYSTEM_VALUE, and keep the assert in place.
ie:

   if (pshader->indirect_files & ~((1 << TGSI_FILE_CONSTANT) | (1 <<
TGSI_FILE_SAMPLER) | (1 << TGSI_FILE_SYSTEM_VALUE))) {


Good, I'll update the patch accordingly. I guess the else path below is
then only a fall-back for non-debug builds that makes all GPRs available
as one big array to keep the code somehow valid for execution, right?


Yes, it's just a safe fall-back in case we don't have proper array 
info for some reason. It makes the backend assume that all GPRs can be 
accessed indirectly.
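
As a rough sketch of the extended check Glenn suggests: the mask test only asks for array info when some indirectly addressed file other than the excluded ones is present. The `toy_file` values below are made-up stand-ins (the real TGSI_FILE_* constants are defined in Mesa's headers and may differ), and `expects_array_info` is a hypothetical helper name, not something from r600g.

```cpp
#include <cstdint>

// Illustrative register-file indices; NOT the real TGSI_FILE_* values.
enum toy_file : unsigned {
    TOY_FILE_CONSTANT = 1,
    TOY_FILE_TEMPORARY = 4,
    TOY_FILE_SAMPLER = 5,
    TOY_FILE_SYSTEM_VALUE = 8
};

constexpr std::uint32_t file_bit(toy_file f) { return 1u << f; }

// True when an indirectly addressed file other than CONSTANT, SAMPLER or
// SYSTEM_VALUE is present, i.e. when the assert on num_arrays would apply.
constexpr bool expects_array_info(std::uint32_t indirect_files) {
    return (indirect_files & ~(file_bit(TOY_FILE_CONSTANT) |
                               file_bit(TOY_FILE_SAMPLER) |
                               file_bit(TOY_FILE_SYSTEM_VALUE))) != 0;
}
```

With this shape, a shader that only indexes SV indirectly (like the failing piglit test) skips the assert, while indirect access to temporaries still requires array info.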




I think I'd like to add a comment for that when I submit the new patch,
because it is kind of irritating to see an assert and then a code path
that seems to properly handle the case that would make the assert fail.

if (pshader->num_arrays) {
...
} else {
sh->add_gpr_array(0, pshader->bc.ngpr, 0x0F);
}

Best,
Gert




___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] r600/sb: remove superfluos assert

2017-09-12 Thread Vadim Girlin

On 09/12/2017 12:49 PM, Gert Wollny wrote:

On Tuesday, 12.09.2017 at 09:56 +0300, Vadim Girlin wrote:

On 09/11/2017 07:09 PM, Emil Velikov wrote:



Anyway, if num_arrays is 0 there, I suspect it can be a result of
some other issue. At the very least it looks like a potential
performance problem, because in that case we assume all shader
registers can be  accessed with indirect addressing and it can limit
the optimizations significantly. So it might make sense to figure out
why it's zero in the first place, in theory it shouldn't happen.
Maybe something is wrong with the indirect_files bits?


The shader that's failing is this (i.e. no arrays, and indirect access
only to SV).


Is the tested feature really supported by r600g? AFAICS the indirect 
index value is unused in the shader code.


Anyway, at first glance it looks like we don't need indirect addressing 
for GPRs in this case, so the outer "if" around that assert probably 
should handle this case too and skip the assert. I'm not 100% sure though.




FRAG
DCL SV[0], SAMPLEMASK
DCL OUT[0], COLOR
DCL CONST[0][0]
DCL TEMP[0..1], LOCAL
DCL ADDR[0]
IMM[0] FLT32 {1., 0., 0., 0.}
IMM[1] INT32 {1, 0, 0, 0}
   0: MOV TEMP[0], IMM[0].xyyx
   1: UARL ADDR[0].x, CONST[0][0].
   2: USEQ TEMP[1].x, SV[ADDR[0].x]., IMM[1].
   3: UIF TEMP[1].
   4:   MOV TEMP[0].xy, IMM[0].yxyy
   5: ENDIF
   6: MOV OUT[0], TEMP[0]
   7: END

= SHADER #12 == PS/BARTS/EVERGREEN =
= 36 dw = 8 gprs = 1 stack =
  4005 a418 ALU_PUSH_BEFORE 7 @10 KC0[CB0:0-15]
0010  00f9 00400c90 1 x: MOV R2.x,  1.0
0012  04f8 20400c90   y: MOV R2.y,  0
0014  04f8 40400c90   z: MOV R2.z,  0
0016  00f9 60400c90   w: MOV R2.w,  1.0
0018  8080 00800c90   t: MOV R4.x,  KC0[0].x
0020  801f4800 00601d10 2 x: SETE_INT   R3.x,  R0.z, 1
0022  801f00fe 00e0229c 3 MP  x: PRED_SETNE_INT R7.x,  PV.x, 0
0002  0003 8281 JUMP @6 POP:1
0004  000c a804 ALU_POP_AFTER 2 @24
0024  04f8 00400c90 4 x: MOV R2.x,  0
0026  80f9 20400c90   y: MOV R2.y,  1.0
0006  000e a00c ALU 4 @28
0028  0002 00200c90 5 x: MOV R1.x,  R2.x
0030  0402 20200c90   y: MOV R1.y,  R2.y
0032  0802 40200c90   z: MOV R1.z,  R2.z
0034  8c02 60200c90   w: MOV R1.w,  R2.w
0008  c0008000 95200688 EXPORT_DONE PIXEL 0 R1.xyzw  EOP
= SHADER_END





Re: [Mesa-dev] [PATCH] r600/sb: remove superfluos assert

2017-09-12 Thread Vadim Girlin

On 09/11/2017 07:09 PM, Emil Velikov wrote:

On 11 September 2017 at 15:39, Gert Wollny <gw.foss...@gmail.com> wrote:

The assert checks whether pshader->num_arrays != 0, but the code
after the assert actually branches based on the same check.

Removing this assert fixes:
   piglit spec@arb_gpu_shader5@execution@samplemaskin-indirect


Both assert() and if () have existed since day 1, with below commit.
Perhaps Vadim has some ideas what happened here?


I guess the assert was added initially just to make sure that I set 
indirect_files and num_arrays fields correctly elsewhere and everything 
related to the indirect arrays works as I expect.


Many features were added since then, so my assumptions from that time 
could be wrong now, I'm just not sure off-hand.


Anyway, if num_arrays is 0 there, I suspect it can be a result of some 
other issue. At the very least it looks like a potential performance 
problem, because in that case we assume all shader registers can be 
accessed with indirect addressing and it can limit the optimizations 
significantly. So it might make sense to figure out why it's zero in the 
first place, in theory it shouldn't happen. Maybe something is wrong 
with the indirect_files bits?


I'm adding Glenn to cc too, AFAIU he has added some related features 
since then, so possibly he knows better.





Cc: Vadim Girlin <vadimgir...@gmail.com>
Fixes: 2cd76917934 ("r600g/sb: initial commit of the optimizing shader backend")

-Emil





Re: [Mesa-dev] [PATCH] r600g: Fix handling of TGSI_OPCODE_ARR with SB

2015-08-22 Thread Vadim Girlin

On 08/13/15 21:30, Glenn Kennard wrote:

FLT_TO_INT goes in the vector pipes on evergreen/NI,
not the trans unit as on earlier chips.


FWIW, AFAIK it works in trans as well, it just uses a different rounding mode.

According to the description in the EG ISA doc: Channels 0-3 use
the 32-bit round mode state; channel 4 uses truncation.

So vector slots use the default rounding mode, while the trans slot always
uses trunc.

That said, I have no objections to the change. I think it makes 
sense to limit it to the expected behavior. I had hoped to control it 
somewhere later, but never got close to it.


So just FYI.
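
To make the difference concrete, here is a small host-side sketch of the two behaviors. It is not driver code: it assumes the "round mode state" of the vector slots is round-to-nearest-even (the thread only says "default"), and both function names are hypothetical.

```cpp
#include <cfenv>
#include <cmath>

// Models a vector slot, ASSUMING the round mode state is round-to-nearest-even.
int flt_to_int_vector(float x) {
    std::fesetround(FE_TONEAREST);
    return static_cast<int>(std::nearbyint(x));
}

// Models the trans slot, which always truncates toward zero.
int flt_to_int_trans(float x) {
    return static_cast<int>(std::trunc(x));
}
```

For an input such as 2.7 the two models disagree (3 versus 2), which is why it matters which slot the instruction is allowed to land in.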




Signed-off-by: Glenn Kennard glenn.kenn...@gmail.com
---
Fixes issue found on nine: https://github.com/iXit/Mesa-3D/issues/119

  src/gallium/drivers/r600/r600_isa.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/r600_isa.h b/src/gallium/drivers/r600/r600_isa.h
index 381f06d..fdbe1c0 100644
--- a/src/gallium/drivers/r600/r600_isa.h
+++ b/src/gallium/drivers/r600/r600_isa.h
@@ -262,7 +262,7 @@ static const struct alu_op_info alu_op_table[] = {
 {PRED_SETNE_PUSH_INT,   2, { 0x4D, 0x4D },{  AF_VS, AF_VS, AF_VS, AF_VS},  AF_PRED_PUSH | AF_CC_NE | AF_INT_CMP },
 {PRED_SETLT_PUSH_INT,   2, { 0x4E, 0x4E },{  AF_VS, AF_VS, AF_VS, AF_VS},  AF_PRED_PUSH | AF_CC_LT | AF_INT_CMP },
 {PRED_SETLE_PUSH_INT,   2, { 0x4F, 0x4F },{  AF_VS, AF_VS, AF_VS, AF_VS},  AF_PRED_PUSH | AF_CC_LE | AF_INT_CMP },
-{FLT_TO_INT,1, { 0x6B, 0x50 },{   AF_S,  AF_S, AF_VS, AF_VS},  AF_INT_DST | AF_CVT },
+{FLT_TO_INT,1, { 0x6B, 0x50 },{   AF_S,  AF_S,  AF_V,  AF_V},  AF_INT_DST | AF_CVT },
 {BFREV_INT, 1, {   -1, 0x51 },{  0, 0, AF_VS, AF_VS},  AF_INT_DST },
 {ADDC_UINT, 2, {   -1, 0x52 },{  0, 0, AF_VS, AF_VS},  AF_UINT_DST },
 {SUBB_UINT, 2, {   -1, 0x53 },{  0, 0, AF_VS, AF_VS},  AF_UINT_DST },





Re: [Mesa-dev] r600/sb loop issue

2014-12-16 Thread Vadim Girlin

On 12/16/2014 05:44 AM, Dave Airlie wrote:

On 16 December 2014 at 08:59, Vadim Girlin vadimgir...@gmail.com wrote:

On 12/16/2014 01:30 AM, Dave Airlie wrote:




New patch is attached, the only difference is in the sb_sched.cpp (it
disables copy coalescing for some unsafe cases, so it may leave more MOVs
than previously, but I don't think there will be any noticeable effect on
performance).

So far I don't see any problems with it, but I don't have many GL apps on
the test machine. At least lightsmark and unigine demos work for me.



Based on my limited understanding of the code:

Acked-by: Alex Deucher alexander.deuc...@amd.com




Alex, thanks for the review, I understand you wanted it to get into mesa
release, but it really needs careful testing with more apps, so far I hoped
Dave would do it as long as he's looking into these issues anyway. In theory
I can also install steam on the test machine and some games, it just needs
the time and I'm not sure if I'll find it, so far my main job is sufficient
to make me pretty tired.

Current scheduler in SB is very fragile after adding handling for all
special cases discovered during initial debugging etc, I said since the very
beginning that I'd like to rewrite it, if only I had time. So any change
like this can potentially break some apps even if piglit passes, and I'm not
ready to take responsibility for that if I commit it myself, I just don't
have time to deal with all possible consequences on all supported chips.

If you think it's ok, just push this patch (it requires revert of the
previous Dave's commit 7b0067d2). I'm really sorry that I can't do more to
help with it.



Myself and Glenn are looking at it, Glenn noticed a piglit regression
from this yesterday, I'll reproduce today and take a look.



Hi, Dave & Glenn,

Thanks for looking into it. FWIW, when I worked on it I ran piglit's
quick tests and didn't see any regressions on evergreen (juniper 5750).
There were some failed tests in some piglit runs, but AFAIU they were just
random.


Turns out we had a pre-existing fail that we noticed, not a regression.

I'm going to push this, since it's better than what is there, we can
see if some public testing notices any big issues also.


Thanks, Dave. I'm really sorry that I can't pay as much attention to 
that code as I'd like, and I really appreciate your and Glenn's efforts 
in maintaining it.


(In case someone thinks it's my fault, I must remind you that I warned 
I wouldn't be able to support it even before it was merged. So please 
don't blame me :) ).



Re: [Mesa-dev] r600/sb loop issue

2014-12-15 Thread Vadim Girlin

On 12/12/2014 05:28 PM, Alex Deucher wrote:

On Wed, Dec 10, 2014 at 6:50 AM, Vadim Girlin vadimgir...@gmail.com wrote:

On 12/09/2014 07:39 AM, Vadim Girlin wrote:


On 12/09/2014 05:18 AM, Dave Airlie wrote:


On 8 December 2014 at 20:41, Vadim Girlin vadimgir...@gmail.com wrote:


On 12/06/2014 07:13 AM, Vadim Girlin wrote:



On 12/04/2014 01:43 AM, Dave Airlie wrote:



Hi Vadim,

I've been looking with Glenn's help into a bug in sb for a couple of
weeks now triggered by a change in how GLSL generates switch
statements.

I understand you probably aren't too interested in r600g but I believe
I'm hitting a design level problem and I would like some advice.

So it appears that GLSL can create loops that don't repeat for switch
statements, and it appears SB wasn't ready to handle such a thing.




Hi, Dave,

I suspect we should rather get rid of such loops somehow, i.e. convert
to something else, the loop that never repeats is not really a loop
anyway. AFAICS continue is not supported in switch statements
according to GLSL specs, so the loops generated for switch will never be
repeated. Am I missing something? Even if repeating is possible somehow,
at least we can get rid of the loops that are not repeated.

I think loops are less efficient than other control flow instructions on
r600g hw (at least because they increase stack usage), and possibly on
other hw too.

In fact it seems sb basically gets rid of it already in IR, it just
doesn't know how to translate resulting control flow to ISA, because so
far it only supports specific control flow structure for if-then-else
that was previously preserved during optimizations. I think it may be
not very hard to implement support for that in finalizer, I'll look into
it.




In fact handling that control flow in finalizer is not as easy as I hoped,
probably impossible, at least if we want to make it efficient. I forgot
about the limitations of R600 ISA.

OTOH it seems I've managed to fix the issues with loops, the patch is
attached (it's meant to be used instead of 7b0067d2). There are no piglit
regressions on evergreen, but I didn't test any real apps.


This does seem to fix the problems in piglit, and looks close to what
I was attempting but written by someone who knows what they are doing :-)

What is the sb_sched.cpp change at the end for?



It fixes those scheduler/regalloc errors for switch tests.

Unfortunately, now I've installed some benchmarks for testing and AFAICS
this patch breaks at least lightsmark 2008, so it seems the condition
removed by the patch was there for a reason.

I'll probably try to come up with better fix.



New patch is attached, the only difference is in the sb_sched.cpp (it
disables copy coalescing for some unsafe cases, so it may leave more MOVs
than previously, but I don't think there will be any noticeable effect on
performance).

So far I don't see any problems with it, but I don't have many GL apps on
the test machine. At least lightsmark and unigine demos work for me.



Based on my limited understanding of the code:

Acked-by: Alex Deucher alexander.deuc...@amd.com


Alex, thanks for the review, I understand you wanted it to get into mesa 
release, but it really needs careful testing with more apps, so far I 
hoped Dave would do it as long as he's looking into these issues anyway. 
In theory I can also install steam on the test machine and some games, 
it just needs the time and I'm not sure if I'll find it, so far my main 
job is sufficient to make me pretty tired.


Current scheduler in SB is very fragile after adding handling for all 
special cases discovered during initial debugging etc, I said since the 
very beginning that I'd like to rewrite it, if only I had time. So any 
change like this can potentially break some apps even if piglit passes, 
and I'm not ready to take responsibility for that if I commit it myself, 
I just don't have time to deal with all possible consequences on all 
supported chips.


If you think it's ok, just push this patch (it requires revert of the 
previous Dave's commit 7b0067d2). I'm really sorry that I can't do more 
to help with it.


Vadim





Vadim




Vadim



Dave.








Re: [Mesa-dev] r600/sb loop issue

2014-12-15 Thread Vadim Girlin

On 12/16/2014 01:30 AM, Dave Airlie wrote:



New patch is attached, the only difference is in the sb_sched.cpp (it
disables copy coalescing for some unsafe cases, so it may leave more MOVs
than previously, but I don't think there will be any noticeable effect on
performance).

So far I don't see any problems with it, but I don't have many GL apps on
the test machine. At least lightsmark and unigine demos work for me.



Based on my limited understanding of the code:

Acked-by: Alex Deucher alexander.deuc...@amd.com



Alex, thanks for the review, I understand you wanted it to get into mesa
release, but it really needs careful testing with more apps, so far I hoped
Dave would do it as long as he's looking into these issues anyway. In theory
I can also install steam on the test machine and some games, it just needs
the time and I'm not sure if I'll find it, so far my main job is sufficient
to make me pretty tired.

Current scheduler in SB is very fragile after adding handling for all
special cases discovered during initial debugging etc, I said since the very
beginning that I'd like to rewrite it, if only I had time. So any change
like this can potentially break some apps even if piglit passes, and I'm not
ready to take responsibility for that if I commit it myself, I just don't
have time to deal with all possible consequences on all supported chips.

If you think it's ok, just push this patch (it requires revert of the
previous Dave's commit 7b0067d2). I'm really sorry that I can't do more to
help with it.


Myself and Glenn are looking at it, Glenn noticed a piglit regression
from this yesterday, I'll reproduce today and take a look.


Hi, Dave & Glenn,

Thanks for looking into it. FWIW, when I worked on it I ran piglit's 
quick tests and didn't see any regressions on evergreen (juniper 5750). 
There were some failed tests in some piglit runs, but AFAIU they were 
just random.


If there are any problems with this fix, I'll be glad to try to help, if 
time allows.


Vadim




Dave.





Re: [Mesa-dev] r600/sb loop issue

2014-12-10 Thread Vadim Girlin

On 12/09/2014 07:39 AM, Vadim Girlin wrote:

On 12/09/2014 05:18 AM, Dave Airlie wrote:

On 8 December 2014 at 20:41, Vadim Girlin vadimgir...@gmail.com wrote:

On 12/06/2014 07:13 AM, Vadim Girlin wrote:


On 12/04/2014 01:43 AM, Dave Airlie wrote:


Hi Vadim,

I've been looking with Glenn's help into a bug in sb for a couple of
weeks now triggered by a change in how GLSL generates switch
statements.

I understand you probably aren't too interested in r600g but I believe
I'm hitting a design level problem and I would like some advice.

So it appears that GLSL can create loops that don't repeat for switch
statements, and it appears SB wasn't ready to handle such a thing.



Hi, Dave,

I suspect we should rather get rid of such loops somehow, i.e. convert
to something else, the loop that never repeats is not really a loop
anyway. AFAICS continue is not supported in switch statements
according to GLSL specs, so the loops generated for switch will never be
repeated. Am I missing something? Even if repeating is possible somehow,
at least we can get rid of the loops that are not repeated.

I think loops are less efficient than other control flow instructions on
r600g hw (at least because they increase stack usage), and possibly on
other hw too.

In fact it seems sb basically gets rid of it already in IR, it just
doesn't know how to translate resulting control flow to ISA, because so
far it only supports specific control flow structure for if-then-else
that was previously preserved during optimizations. I think it may be
not very hard to implement support for that in finalizer, I'll look into
it.



In fact handling that control flow in finalizer is not as easy as I hoped,
probably impossible, at least if we want to make it efficient. I forgot
about the limitations of R600 ISA.

OTOH it seems I've managed to fix the issues with loops, the patch is
attached (it's meant to be used instead of 7b0067d2). There are no piglit
regressions on evergreen, but I didn't test any real apps.


This does seem to fix the problems in piglit, and looks close to what
I was attempting but written by someone who knows what they are doing :-)

What is the sb_sched.cpp change at the end for?


It fixes those scheduler/regalloc errors for switch tests.

Unfortunately, now I've installed some benchmarks for testing and AFAICS
this patch breaks at least lightsmark 2008, so it seems the condition
removed by the patch was there for a reason.

I'll probably try to come up with better fix.


New patch is attached, the only difference is in the sb_sched.cpp (it 
disables copy coalescing for some unsafe cases, so it may leave more 
MOVs than previously, but I don't think there will be any noticeable 
effect on performance).


So far I don't see any problems with it, but I don't have many GL apps 
on the test machine. At least lightsmark and unigine demos work for me.


Vadim




Vadim



Dave.





From d2d16fa39c7b4e871d67e05bad92a540d7e5ea68 Mon Sep 17 00:00:00 2001
From: Vadim Girlin vadimgir...@gmail.com
Date: Wed, 10 Dec 2014 14:41:10 +0300
Subject: [PATCH] r600g/sb: fix issues with loops created for switch

---
 src/gallium/drivers/r600/sb/sb_bc_finalize.cpp   | 2 ++
 src/gallium/drivers/r600/sb/sb_bc_parser.cpp | 2 ++
 src/gallium/drivers/r600/sb/sb_if_conversion.cpp | 4 ++--
 src/gallium/drivers/r600/sb/sb_ir.h  | 9 +++--
 src/gallium/drivers/r600/sb/sb_sched.cpp | 3 +++
 5 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
index f0849ca..3f362c4 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
@@ -110,6 +110,8 @@ int bc_finalizer::run() {
 
 void bc_finalizer::finalize_loop(region_node* r) {
 
+	update_nstack(r);
+
 	cf_node *loop_start = sh.create_cf(CF_OP_LOOP_START_DX10);
 	cf_node *loop_end = sh.create_cf(CF_OP_LOOP_END);
 
diff --git a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
index d787e5b..403f938 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
@@ -758,6 +758,8 @@ int bc_parser::prepare_loop(cf_node* c) {
 	c->insert_before(reg);
 	rep->move(c, end->next);
 
+	reg->src_loop = true;
+
 	loop_stack.push(reg);
 	return 0;
 }
diff --git a/src/gallium/drivers/r600/sb/sb_if_conversion.cpp b/src/gallium/drivers/r600/sb/sb_if_conversion.cpp
index 93edace..3f2b1b1 100644
--- a/src/gallium/drivers/r600/sb/sb_if_conversion.cpp
+++ b/src/gallium/drivers/r600/sb/sb_if_conversion.cpp
@@ -115,13 +115,13 @@ void if_conversion::convert_kill_instructions(region_node *r,
 bool if_conversion::check_and_convert(region_node *r) {
 
 	depart_node *nd1 = static_cast<depart_node*>(r->first);
-	if (!nd1->is_depart())
+	if (!nd1->is_depart() || nd1->target != r)
 		return false;
 	if_node *nif = static_cast<if_node*>(nd1->first);
 	if (!nif->is_if

Re: [Mesa-dev] r600/sb loop issue

2014-12-08 Thread Vadim Girlin

On 12/06/2014 07:13 AM, Vadim Girlin wrote:

On 12/04/2014 01:43 AM, Dave Airlie wrote:

Hi Vadim,

I've been looking with Glenn's help into a bug in sb for a couple of
weeks now triggered by a change in how GLSL generates switch
statements.

I understand you probably aren't too interested in r600g but I believe
I'm hitting a design level problem and I would like some advice.

So it appears that GLSL can create loops that don't repeat for switch
statements, and it appears SB wasn't ready to handle such a thing.


Hi, Dave,

I suspect we should rather get rid of such loops somehow, i.e. convert
to something else, the loop that never repeats is not really a loop
anyway. AFAICS continue is not supported in switch statements
according to GLSL specs, so the loops generated for switch will never be
repeated. Am I missing something? Even if repeating is possible somehow,
at least we can get rid of the loops that are not repeated.
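
To illustrate the kind of control flow under discussion, here is a plain C++ sketch of a switch, an equivalent single-pass loop where every path ends in a break, and a loop-free if/else version. It is only an analogy under my own assumptions, not the actual IR the GLSL compiler emits, and the function names are invented for the example.

```cpp
// Ordinary switch with a break in every case.
int with_switch(int sel) {
    int r = 0;
    switch (sel) {
    case 0: r = 10; break;
    case 1: r = 20; break;
    default: r = 30; break;
    }
    return r;
}

// The same logic expressed as a "loop" whose body never repeats:
// every path through it ends in a break.
int with_single_pass_loop(int sel) {
    int r = 0;
    for (;;) {
        if (sel == 0) { r = 10; break; }
        if (sel == 1) { r = 20; break; }
        r = 30;
        break; // unconditional: the back edge is never taken
    }
    return r;
}

// What one would prefer to generate: no loop construct at all.
int without_loop(int sel) {
    if (sel == 0)
        return 10;
    else if (sel == 1)
        return 20;
    return 30;
}
```

All three functions compute the same result; the second form is the one that needs loop instructions (and loop stack space) without ever iterating.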

I think loops are less efficient than other control flow instructions on
r600g hw (at least because they increase stack usage), and possibly on
other hw too.

In fact it seems sb basically gets rid of it already in IR, it just
doesn't know how to translate resulting control flow to ISA, because so
far it only supports specific control flow structure for if-then-else
that was previously preserved during optimizations. I think it may be
not very hard to implement support for that in finalizer, I'll look into
it.


In fact handling that control flow in finalizer is not as easy as I 
hoped, probably impossible, at least if we want to make it efficient. I 
forgot about the limitations of R600 ISA.


OTOH it seems I've managed to fix the issues with loops, the patch is 
attached (it's meant to be used instead of 7b0067d2). There are no 
piglit regressions on evergreen, but I didn't test any real apps.


Vadim






sb has the ->is_loop() and it just checks !repeats.empty(), so this
meant in the finalizer code we'd fall into the if statement which
would then assert.

I hacked/fixed (more hacked), this in
7b0067d23a6f64cf83c42e7f11b2cd4100c569fe
which attempts to detect single pass loops and handle things that way.

However this led to stack depth calculations being incorrectly done,
so I moved the single loop detect into the is_loop check, (see
attached patch).

This fixes the rendering in some places, but led to a regression in
tests/shaders/glsl-vs-continue-in-switch-in-do-while.shader_test
error at : PHI t76||FP@R3.x,   t128||FP@R3.x, t115||FP@R3.x,
t102||FP@R3.x, t89||FP@R3.x : expected
 operand value t115||FP@R3.x, gpr contains t17||FP@R3.x
error at : PHI t76||FP@R3.x,   t128||FP@R3.x, t115||FP@R3.x,
t102||FP@R3.x, t89||FP@R3.x : expected
 operand value t102||FP@R3.x, gpr contains t17||FP@R3.x

Now Glenn suspected this was due to the is_loop check in
sb_shader.cpp:create_bbs, and changing that check to only detect
repeating loops removes that issue, but introduces stack sizing issues
again, resulting in lockups/random rendering.

So I just want to ask had you considered single loops with an always
break in sb design,


I didn't see such loops with any test cases, so I didn't even think
about it.


and perhaps some idea where things are going so wrong with the
register alloc above.


Not sure, but as long as the only repeat node is optimized away in
bc_parser because it's useless due to unconditional break, I suspect it
may be not easy to make all other code think that it's still a loop.
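
The problem can be pictured with a toy model. This is a hedged sketch, not the real sb_ir.h: `toy_region` is a made-up type, and combining a `src_loop` flag with the repeats check is my assumption about how the flag set in the attached patch is meant to be used, since the sb_ir.h hunk itself is not visible in this thread.

```cpp
#include <vector>

// Hypothetical, simplified stand-in for sb's region_node.
struct toy_region {
    std::vector<int> repeats;  // stand-in for the list of repeat nodes
    bool src_loop = false;     // set by the parser when the source had a loop

    // Check as described by Dave: a region is a loop only if it has repeats.
    bool is_loop_by_repeats() const { return !repeats.empty(); }

    // Assumed variant: also remember that the region came from a loop,
    // even after its only repeat node has been optimized away.
    bool is_loop_with_flag() const { return src_loop || !repeats.empty(); }
};
```

A region built from a single-pass loop would have no repeats left, so only the second check still reports it as a loop.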

I've tried a quick fix to not optimize the repeat away for such loops,
but it results in other issues, probably it will require handling this
as a special case in other places, so it doesn't look like a good idea
either.

I'll try to implement the solution that I described above, that is,
translate resulting control flow back to ISA. If it won't be too much
work, it's probably the best way and it won't use loop instructions in
the end.



I suspect I'll keep digging into this, but it's getting to the edges of
the brain space/time I can find!

Dave.





From 4967ef90847f921fc0ef7c018ae7ae8048d2a6ce Mon Sep 17 00:00:00 2001
From: Vadim Girlin vadimgir...@gmail.com
Date: Mon, 8 Dec 2014 13:11:48 +0300
Subject: [PATCH] r600g/sb: fix issues with loops created for switch statements

---
 src/gallium/drivers/r600/sb/sb_bc_finalize.cpp   | 2 ++
 src/gallium/drivers/r600/sb/sb_bc_parser.cpp | 2 ++
 src/gallium/drivers/r600/sb/sb_if_conversion.cpp | 4 ++--
 src/gallium/drivers/r600/sb/sb_ir.h  | 9 +++--
 src/gallium/drivers/r600/sb/sb_sched.cpp | 2 +-
 5 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
index f0849ca..3f362c4 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
@@ -110,6 +110,8 @@ int bc_finalizer::run() {
 
 void bc_finalizer::finalize_loop(region_node* r

Re: [Mesa-dev] r600/sb loop issue

2014-12-08 Thread Vadim Girlin

On 12/09/2014 05:18 AM, Dave Airlie wrote:

On 8 December 2014 at 20:41, Vadim Girlin vadimgir...@gmail.com wrote:

On 12/06/2014 07:13 AM, Vadim Girlin wrote:


On 12/04/2014 01:43 AM, Dave Airlie wrote:


Hi Vadim,

I've been looking with Glenn's help into a bug in sb for a couple of
weeks now triggered by a change in how GLSL generates switch
statements.

I understand you probably aren't too interested in r600g but I believe
I'm hitting a design level problem and I would like some advice.

So it appears that GLSL can create loops that don't repeat for switch
statements, and it appears SB wasn't ready to handle such a thing.



Hi, Dave,

I suspect we should rather get rid of such loops somehow, i.e. convert
to something else, the loop that never repeats is not really a loop
anyway. AFAICS continue is not supported in switch statements
according to GLSL specs, so the loops generated for switch will never be
repeated. Am I missing something? Even if repeating is possible somehow,
at least we can get rid of the loops that are not repeated.

I think loops are less efficient than other control flow instructions on
r600g hw (at least because they increase stack usage), and possibly on
other hw too.

In fact it seems sb basically gets rid of it already in IR, it just
doesn't know how to translate resulting control flow to ISA, because so
far it only supports specific control flow structure for if-then-else
that was previously preserved during optimizations. I think it may be
not very hard to implement support for that in finalizer, I'll look into
it.



In fact handling that control flow in finalizer is not as easy as I hoped,
probably impossible, at least if we want to make it efficient. I forgot
about the limitations of R600 ISA.

OTOH it seems I've managed to fix the issues with loops, the patch is
attached (it's meant to be used instead of 7b0067d2). There are no piglit
regressions on evergreen, but I didn't test any real apps.


This does seem to fix the problems in piglit, and looks close to what
I was attempting but written by someone who knows what they are doing :-)

What is the sb_sched.cpp change at the end for?


It fixes those scheduler/regalloc errors for switch tests.

Unfortunately, now I've installed some benchmarks for testing and AFAICS 
this patch breaks at least lightsmark 2008, so it seems the condition 
removed by the patch was there for a reason.


I'll probably try to come up with better fix.

Vadim



Dave.



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] r600/sb loop issue

2014-12-05 Thread Vadim Girlin

On 12/04/2014 01:43 AM, Dave Airlie wrote:

Hi Vadim,

I've been looking with Glenn's help into a bug in sb for a couple of
weeks now triggered by a change in how GLSL generates switch
statements.

I understand you probably aren't too interested in r600g but I believe
I'm hitting a design level problem and I would like some advice.

So it appears that GLSL can create loops that don't repeat for switch
statements, and it appears SB wasn't ready to handle such a thing.


Hi, Dave,

I suspect we should rather get rid of such loops somehow, i.e. convert 
to something else, the loop that never repeats is not really a loop 
anyway. AFAICS continue is not supported in switch statements 
according to GLSL specs, so the loops generated for switch will never be 
repeated. Am I missing something? Even if repeating is possible somehow, 
at least we can get rid of the loops that are not repeated.


I think loops are less efficient than other control flow instructions on 
r600g hw (at least because they increase stack usage), and possibly on 
other hw too.


In fact it seems sb basically gets rid of it already in IR, it just 
doesn't know how to translate resulting control flow to ISA, because so 
far it only supports specific control flow structure for if-then-else 
that was previously preserved during optimizations. I think it may be 
not very hard to implement support for that in finalizer, I'll look into it.




sb has the ->is_loop() and it just checks !repeats.empty(), so this
meant in the finalizer code we'd fall into the if statement which
would then assert.

I hacked/fixed (more hacked), this in 7b0067d23a6f64cf83c42e7f11b2cd4100c569fe
which attempts to detect single pass loops and handle things that way.

However this led to stack depth calculations being incorrectly done,
so I moved the single loop detect into the is_loop check, (see
attached patch).

This fixes the rendering in some places, but led to a regression in
tests/shaders/glsl-vs-continue-in-switch-in-do-while.shader_test
error at : PHI t76||FP@R3.x,   t128||FP@R3.x, t115||FP@R3.x,
t102||FP@R3.x, t89||FP@R3.x : expected
 operand value t115||FP@R3.x, gpr contains t17||FP@R3.x
error at : PHI t76||FP@R3.x,   t128||FP@R3.x, t115||FP@R3.x,
t102||FP@R3.x, t89||FP@R3.x : expected
 operand value t102||FP@R3.x, gpr contains t17||FP@R3.x

Now Glenn suspected this was due to the is_loop check in
sb_shader.cpp:create_bbs,
and changing that check to only detect repeating loops removes that issue,
but introduces stack sizing issues again, resulting in lockups/random rendering.

So I just want to ask had you considered single loops with an always
break in sb design,


I didn't see such loops with any test cases, so I didn't even think 
about it.



and perhaps some idea where things are going so wrong with the
register alloc above.


Not sure, but as long as the only repeat node is optimized away in 
bc_parser because it's useless due to unconditional break, I suspect it 
may be not easy to make all other code think that it's still a loop.


I've tried a quick fix to not optimize the repeat away for such loops, 
but it results in other issues, probably it will require handling this 
as a special case in other places, so it doesn't look like a good idea 
either.


I'll try to implement the solution that I described above, that is, 
translate resulting control flow back to ISA. If it won't be too much 
work, it's probably the best way and it won't use loop instructions in 
the end.




I suspect I'll keep digging into this, but it's getting to the edges of
the brain space/time I can find!

Dave.





Re: [Mesa-dev] r600/sb loop issue

2014-12-05 Thread Vadim Girlin

On 12/06/2014 07:50 AM, Matt Turner wrote:

On Fri, Dec 5, 2014 at 8:13 PM, Vadim Girlin vadimgir...@gmail.com wrote:

I suspect we should rather get rid of such loops somehow, i.e. convert to
something else, the loop that never repeats is not really a loop anyway.
AFAICS continue is not supported in switch statements according to GLSL
specs, so the loops generated for switch will never be repeated. Am I
missing something? Even if repeating is possible somehow, at least we can
get rid of the loops that are not repeated.


I don't think that's true. I don't see anything in the spec that would
lead me to believe continue cannot occur in a switch statement.


I've double-checked some versions of GLSL spec (1.30, 1.50, 3.30, 4.40) 
and all of them say the same (section 6.4 Jumps):


The continue jump is used only in loops.


In fact, we have some relatively complicated shaders that have a
continue in a switch. See
tests/shaders/glsl-fs-continue-in-switch-in-do-while.shader_test





Re: [Mesa-dev] r600/sb loop issue

2014-12-05 Thread Vadim Girlin

On 12/06/2014 08:01 AM, Matt Turner wrote:

On Fri, Dec 5, 2014 at 8:56 PM, Vadim Girlin vadimgir...@gmail.com wrote:

On 12/06/2014 07:50 AM, Matt Turner wrote:


On Fri, Dec 5, 2014 at 8:13 PM, Vadim Girlin vadimgir...@gmail.com
wrote:


I suspect we should rather get rid of such loops somehow, i.e. convert to
something else, the loop that never repeats is not really a loop anyway.
AFAICS continue is not supported in switch statements according to GLSL
specs, so the loops generated for switch will never be repeated. Am I
missing something? Even if repeating is possible somehow, at least we can
get rid of the loops that are not repeated.



I don't think that's true. I don't see anything in the spec that would
lead me to believe continue cannot occur in a switch statement.



I've double-checked some versions of GLSL spec (1.30, 1.50, 3.30, 4.40) and
all of them say the same (section 6.4 Jumps):

The continue jump is used only in loops.


Sure, but isn't the continue below in a loop?

do {
switch (...) {
case ...:
   continue;
}
} while (...);



Ah, now I see, you're right. I just was mostly thinking about that loop 
that is created for a switch in IR, not about source, and somehow 
confused these things.


Thanks for pointing that out. Hopefully such cases won't complicate the 
problem in sb even more, need to check those tests.
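The point Matt makes can be checked with a small sketch. It is written in C++ rather than GLSL, on the assumption that the jump semantics are the same for this construct: a continue inside a switch that is nested in a do-while targets the loop, while a break only leaves the switch. The names LoopCounts and run_loop are made up for the illustration.

```cpp
// Hypothetical counters, for illustration only.
struct LoopCounts {
    int iterations;    // how many times the loop body was entered
    int tail_reached;  // how many times the code after the switch ran
};

LoopCounts run_loop(int limit) {
    LoopCounts c{0, 0};
    int i = 0;
    do {
        ++c.iterations;
        switch (i % 2) {
        case 0:
            ++i;
            continue;  // skips the rest of the body and goes to the while condition
        default:
            break;     // leaves only the switch
        }
        ++i;
        ++c.tail_reached;
    } while (i < limit);
    return c;
}
```

So a loop that encloses a switch can legitimately repeat because of a continue in one of the cases, which is a different situation from the non-repeating loop generated for the switch itself.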



The grammar is pretty unambiguous.

  jump_statement:
 CONTINUE SEMICOLON
 BREAK SEMICOLON
 RETURN SEMICOLON
 RETURN expression SEMICOLON
 DISCARD SEMICOLON // Fragment shader only.

If continue can't be in a switch, neither can break. :)





Re: [Mesa-dev] [PATCH 02/16] radeonsi: Initial geometry shader support

2014-01-28 Thread Vadim Girlin
On Wed, 2014-01-29 at 07:13 +1000, Dave Airlie wrote:
  3) In si_init_gs_rings:
  - could you please use readable decimal numbers for specifying the
  sizes? Like 1024 * 1024 * ...
  [...]
  - isn't 64 MB too many for a ring buffer?
 
  I can write the numbers any way you like. :) But I just copied them from
  the corresponding r600g patches; I don't know yet how these numbers were
  derived, or what the constraints are for the ring buffer sizes. I'm
  trying to find out more about this.
 
 
 I don't think they are derived from anything yet, they were just big
 numbers Vadim used,

IIRC all these magic numbers were taken from the fglrx command stream
for some simple GS test on my 512MB juniper card.

Vadim

 
 I suppose we can calculate them from max vertices for the geom shader
 * number of outputs  * size of each output.
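A rough sketch of that arithmetic, assuming nothing about the real hardware constraints: the helper name, its parameters and the example values in the test are hypothetical, and the result is only the output footprint of a single GS invocation, not a complete ring size. The constant shows the "1024 * 1024 * ..." spelling asked for in the review.

```cpp
#include <cstdint>

// 64 MB written out in the readable form requested in the review.
constexpr std::uint64_t kRingBytes64MiB = 64ull * 1024 * 1024;

// Hypothetical helper: bytes written by one geometry shader invocation,
// computed as max vertices * number of outputs * size of each output.
constexpr std::uint64_t gs_output_bytes(std::uint32_t max_vertices,
                                        std::uint32_t num_outputs,
                                        std::uint32_t bytes_per_output) {
    return static_cast<std::uint64_t>(max_vertices) * num_outputs * bytes_per_output;
}
```

How many invocations the ring has to hold at once is not stated in this thread, so the multiplier for a full ring is left open here.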
 
 Dave.





Re: [Mesa-dev] [PATCH 4/9] gallium-tgsi: add TGSI_OPCODE_{FMA-POPCNT-MSB-LSB} description

2014-01-07 Thread Vadim Girlin
On Tue, 2014-01-07 at 21:49 +0100, Marek Olšák wrote:
 FYI, Evergreen has dedicated instructions for both MAD and FMA. FMA
 seems to be available on DX11 chips only.

FWIW, not all evergreen chips support FMA, only high-end chips that
support FP64 (I guess cypress only), according to the isa docs:

 Instructions
 FMA
 Description
 Fused single-precision multiply-add. Only for double-precision parts.
 dst = src0 * src1 + src2
 

Vadim


 
 Marek
 
 On Tue, Jan 7, 2014 at 8:20 PM, Roland Scheidegger srol...@vmware.com wrote:
  Yes that is certainly related. I'm actually not entirely sure what is
  allowed in glsl by default as OpenGL seems to have some lax rules
  regarding precision in any case (float calculations not required but
  allowed to use denorms, at least earlier versions weren't required to
  support Infs either, and so on).
  It is quite possible the MAD we were always using would have been
  allowed to really do fma (at least with OpenGL), unless the precise
  qualifier was used (which isn't supported yet?).
  TGSI also isn't really watertight about such issues either (that is if
  you use it with hw such as r300 then you certainly don't expect ieee754
  rules to be followed but if you've got a d3d10-capable backend then you
  are expected to follow rules specified there which are _mostly_
  ieee754-2008).
  So I'm not really sure if TGSI MAD should be allowed to do either
  rounding or not, but someday it should be figured out and spelled out
  explicitly in docs.
 
  Roland
 
 
  Am 07.01.2014 19:24, schrieb Maxence Le Doré:
  I forgot the link :
 
  http://www.geeks3d.com/20120106/precise-qualifier-in-glsl-and-nvidia-geforce-cards/
 
  2014/1/7 Maxence Le Doré maxence.led...@gmail.com:
  For this reason, GLSL 4.0 introduces the 'precise' qualifier. I invite
  you to take a look at this article.
 
  2014/1/6 Roland Scheidegger srol...@vmware.com:
  Am 05.01.2014 01:34, schrieb Maxence Le Doré:
  FMA(a,b,c) keeps extra precision (usually 1 more bit of mantissa,
  afaik) for the result a*b and adds this to c, to finally produce an
  IEEE754 32-bit float result.
 
  MAD(a,b,c) produces an IEEE754 32-bit float product a*b and adds it to c.
 
  So, fma can be slightly more accurate. That extra accuracy is
  much appreciated.
 
  Actually in newer languages (such as opencl) mad is used to indicate
  intermediate rounding does not matter, so if your cpu can do fma but not
  mul+add in a single cycle it is allowed to use fma instead.
  FMA OTOH of course forces no intermediate rounding.
  Our tgsi definitions certainly initially were meaning intermediate
  rounding should take place, I don't know if we need to keep it that way
  or could repurpose that slightly (so if you require the intermediate
  rounding you'd just use mul+add).
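The rounding difference under discussion can be shown with a short C++ sketch. It is not TGSI and not driver code; the function names are invented, and the volatile temporary is only there to force the product to be rounded to a 32-bit float before the add, so the compiler cannot fuse the two operations.

```cpp
#include <cmath>

// MAD with intermediate rounding: the product is rounded to float, then added.
float mul_then_add(float a, float b, float c) {
    volatile float product = a * b;  // volatile forces the rounded float to be materialized
    return product + c;
}

// FMA: a * b + c with a single rounding at the end.
float fused_multiply_add(float a, float b, float c) {
    return std::fma(a, b, c);
}
```

With a = b = 1 + 2^-12 and c = -(1 + 2^-11), the exact product is 1 + 2^-11 + 2^-24. Rounding that product to float drops the 2^-24 term, so the separate multiply and add returns 0, while the fused version returns 2^-24.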
 
  Roland
 
 
 
 
 
  2014/1/5 Marek Olšák mar...@gmail.com:
  How is FMA different from MAD?
 
  Please document the new opcodes in src/gallium/docs/source/tgsi.rst.
 
  Marek
 
  On Sun, Jan 5, 2014 at 12:42 AM, Maxence Le Doré
  maxence.led...@gmail.com wrote:
  From: Maxence Le Doré Maxence Le Doré
 
  ---
   src/gallium/auxiliary/tgsi/tgsi_info.c   | 16 
   src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h |  6 ++
   src/gallium/include/pipe/p_shader_tokens.h   |  9 -
   3 files changed, 30 insertions(+), 1 deletion(-)
 
  diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c 
  b/src/gallium/auxiliary/tgsi/tgsi_info.c
  index 0beef44..ed55940 100644
  --- a/src/gallium/auxiliary/tgsi/tgsi_info.c
  +++ b/src/gallium/auxiliary/tgsi/tgsi_info.c
  @@ -221,6 +221,12 @@ static const struct tgsi_opcode_info opcode_info[TGSI_OPCODE_LAST] =
  { 1, 3, 1, 0, 0, 0, OTHR, "TXL2", TGSI_OPCODE_TXL2 },
  { 1, 2, 0, 0, 0, 0, COMP, "IMUL_HI", TGSI_OPCODE_IMUL_HI },
  { 1, 2, 0, 0, 0, 0, COMP, "UMUL_HI", TGSI_OPCODE_UMUL_HI },
  +   { 1, 3, 0, 0, 0, 0, COMP, "FMA", TGSI_OPCODE_FMA },
  +   { 1, 1, 0, 0, 0, 0, COMP, "POPCNT", TGSI_OPCODE_POPCNT },
  +   { 1, 1, 0, 0, 0, 0, COMP, "IMSB", TGSI_OPCODE_IMSB },
  +   { 1, 1, 0, 0, 0, 0, COMP, "ILSB", TGSI_OPCODE_ILSB },
  +   { 1, 1, 0, 0, 0, 0, COMP, "UMSB", TGSI_OPCODE_UMSB },
  +   { 1, 1, 0, 0, 0, 0, COMP, "ULSB", TGSI_OPCODE_ULSB },
   };
 
   const struct tgsi_opcode_info *
  @@ -321,6 +327,11 @@ tgsi_opcode_infer_type( uint opcode )
  case TGSI_OPCODE_IABS:
  case TGSI_OPCODE_ISSG:
  case TGSI_OPCODE_IMUL_HI:
  +   case TGSI_OPCODE_POPCNT:
  +   case TGSI_OPCODE_ILSB:
  +   case TGSI_OPCODE_IMSB:
  +   case TGSI_OPCODE_ULSB:
  +   case TGSI_OPCODE_UMSB:
 return TGSI_TYPE_SIGNED;
  default:
 return TGSI_TYPE_FLOAT;
  @@ -344,9 +355,14 @@ tgsi_opcode_infer_src_type( uint opcode )
  case TGSI_OPCODE_SAMPLE_I:
  case 

Re: [Mesa-dev] [PATCH] r600g/sb: fix stack size computation on evergreen

2013-12-09 Thread Vadim Girlin
On Mon, 2013-12-09 at 10:56 -0500, Tom Stellard wrote:
 On Sat, Dec 07, 2013 at 07:06:36PM +0400, Vadim Girlin wrote:
  On evergreen we have to reserve 1 stack element in some additional cases
  besides the ones mentioned in the docs, but stack size computation was
  recently reimplemented exactly as described in the docs by the patch that
  added workarounds for stack issues on EG/CM, resulting in regressions
  with some apps (Serious Sam 3).
  
  This patch fixes it by restoring previous behavior.
  
  Fixes https://bugs.freedesktop.org/show_bug.cgi?id=72369
  
  Signed-off-by: Vadim Girlin vadimgir...@gmail.com
  Cc: 10.0 mesa-sta...@lists.freedesktop.org
  ---
   src/gallium/drivers/r600/sb/sb_bc_finalize.cpp | 16 
   1 file changed, 12 insertions(+), 4 deletions(-)
  
  diff --git a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp 
  b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
  index bc71cf8..355eb63 100644
  --- a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
  +++ b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
  @@ -770,7 +770,6 @@ void bc_finalizer::update_ngpr(unsigned gpr) {
   unsigned bc_finalizer::get_stack_depth(node *n, unsigned loops,
  unsigned ifs, unsigned add) {
  unsigned stack_elements = add;
  -   bool has_non_wqm_push_with_loops_on_stack = false;
  bool has_non_wqm_push = (add != 0);
  region_node *r = n->is_region() ?
  static_cast<region_node*>(n) : n->get_parent_region();
  @@ -781,8 +780,6 @@ unsigned bc_finalizer::get_stack_depth(node *n, unsigned loops,
  while (r) {
  if (r->is_loop()) {
  ++loops;
  -   if (has_non_wqm_push)
  -   has_non_wqm_push_with_loops_on_stack = true;
  } else {
  ++ifs;
  has_non_wqm_push = true;
  @@ -795,15 +792,26 @@ unsigned bc_finalizer::get_stack_depth(node *n, 
  unsigned loops,
  switch (ctx.hw_class) {
  case HW_CLASS_R600:
  case HW_CLASS_R700:
  +   // If any non-WQM push is invoked, 2 elements should be 
  reserved.
  if (has_non_wqm_push)
  stack_elements += 2;
  break;
  case HW_CLASS_CAYMAN:
  +   // If any stack operation is invoked, 2 elements should be 
  reserved
  if (stack_elements)
  stack_elements += 2;
  break;
  case HW_CLASS_EVERGREEN:
  -   if (has_non_wqm_push_with_loops_on_stack)
  +   // According to the docs we need to reserve 1 element for each 
  of the
  +   // following cases:
  +   //   1) non-WQM push is used with WQM/LOOP frames on stack
  +   //   2) ALU_ELSE_AFTER is used at the point of max stack usage
  +   // NOTE:
  +   // It was found that the conditions above are not sufficient, 
  there are
  +   // other cases where we also need to reserve stack space, 
  that's why
  +   // we always reserve 1 stack element if we have non-WQM push on 
  stack.
  +   // Condition 2 is ignored for now because we don't use this 
  instruction.
  +   if (has_non_wqm_push)
  ++stack_elements;
 
 The kernel analyzer reports a stack size of 2 for compute shaders that
 have 3 levels of ALU_PUSH_BEFORE.  This would suggest that you either need to
 reserve 2 sub-entries (stack_elements in the sb code) when there is a
 non-wqm push, or apply the CAYMAN rules to EVERGREEN.
 
 It is possible, though, that the kernel analyzer is over-allocating and
 this patch is correct, but I don't have any evidence for this yet.

Is there any test that fails with this patch? AFAIK this algorithm
worked fine for about 8 months in both old and sb backends, so I'd
rather prefer to have any evidence that this is not correct before
increasing stack allocation and reducing performance.

Vadim

 
 -Tom
 
 
  break;
  }
  -- 
  1.8.4.2
  





[Mesa-dev] [PATCH] r600g/sb: fix stack size computation on evergreen

2013-12-07 Thread Vadim Girlin
On evergreen we have to reserve 1 stack element in some additional cases
besides the ones mentioned in the docs, but stack size computation was
recently reimplemented exactly as described in the docs by the patch that
added workarounds for stack issues on EG/CM, resulting in regressions
with some apps (Serious Sam 3).

This patch fixes it by restoring previous behavior.

Fixes https://bugs.freedesktop.org/show_bug.cgi?id=72369

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
Cc: 10.0 mesa-sta...@lists.freedesktop.org
---
 src/gallium/drivers/r600/sb/sb_bc_finalize.cpp | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp 
b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
index bc71cf8..355eb63 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
@@ -770,7 +770,6 @@ void bc_finalizer::update_ngpr(unsigned gpr) {
 unsigned bc_finalizer::get_stack_depth(node *n, unsigned loops,
unsigned ifs, unsigned add) {
unsigned stack_elements = add;
-   bool has_non_wqm_push_with_loops_on_stack = false;
bool has_non_wqm_push = (add != 0);
region_node *r = n->is_region() ?
static_cast<region_node*>(n) : n->get_parent_region();
@@ -781,8 +780,6 @@ unsigned bc_finalizer::get_stack_depth(node *n, unsigned loops,
while (r) {
if (r->is_loop()) {
++loops;
-   if (has_non_wqm_push)
-   has_non_wqm_push_with_loops_on_stack = true;
} else {
++ifs;
has_non_wqm_push = true;
@@ -795,15 +792,26 @@ unsigned bc_finalizer::get_stack_depth(node *n, unsigned loops,
switch (ctx.hw_class) {
case HW_CLASS_R600:
case HW_CLASS_R700:
+   // If any non-WQM push is invoked, 2 elements should be reserved.
if (has_non_wqm_push)
stack_elements += 2;
break;
case HW_CLASS_CAYMAN:
+   // If any stack operation is invoked, 2 elements should be reserved
if (stack_elements)
stack_elements += 2;
break;
case HW_CLASS_EVERGREEN:
-   if (has_non_wqm_push_with_loops_on_stack)
+   // According to the docs we need to reserve 1 element for each of the
+   // following cases:
+   //   1) non-WQM push is used with WQM/LOOP frames on stack
+   //   2) ALU_ELSE_AFTER is used at the point of max stack usage
+   // NOTE:
+   // It was found that the conditions above are not sufficient, there are
+   // other cases where we also need to reserve stack space, that's why
+   // we always reserve 1 stack element if we have non-WQM push on stack.
+   // Condition 2 is ignored for now because we don't use this instruction.
+   if (has_non_wqm_push)
++stack_elements;
break;
}
-- 
1.8.4.2
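
The reservation rules that the patch ends up with can be restated as a small standalone sketch. This is not the sb code: HwClass and reserved_stack_elements are hypothetical names, and the function covers only the extra elements reserved per hardware class as described in the patch comments, not the rest of get_stack_depth().

```cpp
#include <cassert>

// Hypothetical stand-ins for the hardware classes named in the patch.
enum class HwClass { R600, R700, Evergreen, Cayman };

// Extra stack elements to reserve, following only the rules in the patch comments.
unsigned reserved_stack_elements(HwClass hw, unsigned stack_elements,
                                 bool has_non_wqm_push) {
    switch (hw) {
    case HwClass::R600:
    case HwClass::R700:
        // Any non-WQM push: reserve 2 elements.
        return has_non_wqm_push ? 2u : 0u;
    case HwClass::Cayman:
        // Any stack operation at all: reserve 2 elements.
        return stack_elements != 0 ? 2u : 0u;
    case HwClass::Evergreen:
        // Restored behavior: always 1 element when a non-WQM push is on the stack.
        return has_non_wqm_push ? 1u : 0u;
    }
    assert(false && "unknown hardware class");
    return 0;
}
```

Under these rules an Evergreen shader with a non-WQM push reserves one element whether or not a loop frame is on the stack, which is the case the docs-based computation missed.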



[Mesa-dev] [PATCH] r600g/sb: fix value::is_fixed()

2013-10-27 Thread Vadim Girlin
---
cc: Andreas Boll andreas.boll@gmail.com

Andreas, this patch should fix the issue with SB on RV770 that you 
reported on IRC (assert with interpolation-mixed.shader_test).

There are no piglit regressions with this patch on my evergreen,
but I can't test with r700 or any other chips.

 src/gallium/drivers/r600/sb/sb_valtable.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_valtable.cpp 
b/src/gallium/drivers/r600/sb/sb_valtable.cpp
index 00aee66..0d39e9c 100644
--- a/src/gallium/drivers/r600/sb/sb_valtable.cpp
+++ b/src/gallium/drivers/r600/sb/sb_valtable.cpp
@@ -255,8 +255,8 @@ void value::set_prealloc() {
 bool value::is_fixed() {
if (array && array->gpr)
return true;
-   if (chunk)
-   return chunk->is_fixed();
+   if (chunk && chunk->is_fixed())
+   return true;
return flags & VLF_FIXED;
 }
 
-- 
1.8.3.1



Re: [Mesa-dev] [PATCH] r600g/sb: Initialize shader::dce_flags.

2013-10-19 Thread Vadim Girlin

On 10/19/2013 06:18 AM, Vinson Lee wrote:

Fixes "Uninitialized scalar field" defect reported by Coverity.

Signed-off-by: Vinson Lee v...@freedesktop.org


Reviewed-by: Vadim Girlin vadimgir...@gmail.com


---
  src/gallium/drivers/r600/sb/sb_shader.cpp | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/sb/sb_shader.cpp 
b/src/gallium/drivers/r600/sb/sb_shader.cpp
index 98e52b1..38617a8 100644
--- a/src/gallium/drivers/r600/sb/sb_shader.cpp
+++ b/src/gallium/drivers/r600/sb/sb_shader.cpp
@@ -39,7 +39,8 @@ shader::shader(sb_context &sctx, shader_target t, unsigned id)
coal(*this), bbs(),
target(t), vt(ex), ex(*this), root(),
compute_interferences(),
-  has_alu_predication(), uses_gradients(), safe_math(), ngpr(), nstack() {}
+  has_alu_predication(),
+  uses_gradients(), safe_math(), ngpr(), nstack(), dce_flags() {}

  bool shader::assign_slot(alu_node* n, alu_node *slots[5]) {






[Mesa-dev] [PATCH] r600g/sb: fix issue with DCE between GVN and GCM (v2)

2013-10-14 Thread Vadim Girlin
We can't perform DCE using the liveness pass between GVN and GCM
because it relies on the correct schedule, but GVN doesn't care about
preserving correctness - it's rescheduled later by GCM.

This patch makes dce_cleanup pass perform simple DCE
between GVN and GCM instead of relying on liveness pass.
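
The idea of a schedule-independent DCE can be sketched as follows. This is not the dce_cleanup implementation: Instr and remove_unused are hypothetical, the real pass works on def-use information already computed by the def_use pass, and the dont_kill field only loosely mirrors NF_DONT_KILL. The sketch removes any instruction whose result has no readers, without looking at instruction order, and repeats until nothing changes.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical instruction: writes one value id, reads several.
struct Instr {
    int dst;                // id of the value this instruction defines
    std::vector<int> srcs;  // ids of the values it reads
    bool dont_kill;         // must be kept even if dst is unused
};

// Removes instructions whose result is never read; returns how many were removed.
std::size_t remove_unused(std::vector<Instr> &prog) {
    std::size_t removed = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        // Count the readers of every value in the current program.
        std::unordered_map<int, int> uses;
        for (const Instr &in : prog)
            for (int s : in.srcs)
                ++uses[s];
        for (auto it = prog.begin(); it != prog.end();) {
            if (!it->dont_kill && uses[it->dst] == 0) {
                it = prog.erase(it);
                ++removed;
                changed = true;
            } else {
                ++it;
            }
        }
    }
    return removed;
}
```

Because the decision depends only on use counts, it stays valid while the instructions are in the arbitrary order left behind by GVN, which is the property the liveness-based approach lacked.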

Fixes https://bugs.freedesktop.org/show_bug.cgi?id=70088

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---
 src/gallium/drivers/r600/sb/sb_core.cpp| 10 --
 src/gallium/drivers/r600/sb/sb_dce_cleanup.cpp | 22 ++
 src/gallium/drivers/r600/sb/sb_pass.h  |  7 +--
 src/gallium/drivers/r600/sb/sb_shader.h| 12 
 4 files changed, 39 insertions(+), 12 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_core.cpp 
b/src/gallium/drivers/r600/sb/sb_core.cpp
index b5dd88e..9fd9d9a 100644
--- a/src/gallium/drivers/r600/sb/sb_core.cpp
+++ b/src/gallium/drivers/r600/sb/sb_core.cpp
@@ -184,6 +184,8 @@ int r600_sb_bytecode_process(struct r600_context *rctx,
SB_RUN_PASS(psi_ops,1);
 
SB_RUN_PASS(liveness,   0);
+
+   sh->dce_flags = DF_REMOVE_DEAD | DF_EXPAND;
SB_RUN_PASS(dce_cleanup,0);
SB_RUN_PASS(def_use,0);
 
@@ -201,9 +203,10 @@ int r600_sb_bytecode_process(struct r600_context *rctx,
 
SB_RUN_PASS(gvn,1);
 
-   SB_RUN_PASS(liveness,   0);
+   SB_RUN_PASS(def_use,1);
+
+   sh->dce_flags = DF_REMOVE_DEAD | DF_REMOVE_UNUSED;
SB_RUN_PASS(dce_cleanup,1);
-   SB_RUN_PASS(def_use,0);
 
SB_RUN_PASS(ra_split,   0);
SB_RUN_PASS(def_use,0);
@@ -217,6 +220,9 @@ int r600_sb_bytecode_process(struct r600_context *rctx,
sh->compute_interferences = true;
SB_RUN_PASS(liveness,   0);
 
+   sh->dce_flags = DF_REMOVE_DEAD;
+   SB_RUN_PASS(dce_cleanup,1);
+
SB_RUN_PASS(ra_coalesce,1);
SB_RUN_PASS(ra_init,1);
 
diff --git a/src/gallium/drivers/r600/sb/sb_dce_cleanup.cpp 
b/src/gallium/drivers/r600/sb/sb_dce_cleanup.cpp
index f879395..79aef91 100644
--- a/src/gallium/drivers/r600/sb/sb_dce_cleanup.cpp
+++ b/src/gallium/drivers/r600/sb/sb_dce_cleanup.cpp
@@ -56,7 +56,8 @@ bool dce_cleanup::visit(cf_node& n, bool enter) {
else
cleanup_dst(n);
} else {
-   if (n.bc.op_ptr->flags & (CF_CLAUSE | CF_BRANCH | CF_LOOP))
+   if ((sh.dce_flags & DF_EXPAND) &&
+   (n.bc.op_ptr->flags & (CF_CLAUSE | CF_BRANCH | CF_LOOP)))
n.expand();
}
return true;
@@ -107,19 +108,20 @@ bool dce_cleanup::visit(region_node& n, bool enter) {
 }
 
 void dce_cleanup::cleanup_dst(node& n) {
-   cleanup_dst_vec(n.dst);
+   if (!cleanup_dst_vec(n.dst) && remove_unused &&
+   !n.dst.empty() && !(n.flags & NF_DONT_KILL) && n.parent)
+   n.remove();
 }
 
 bool dce_cleanup::visit(container_node& n, bool enter) {
-   if (enter) {
+   if (enter)
cleanup_dst(n);
-   } else {
-
-   }
return true;
 }
 
-void dce_cleanup::cleanup_dst_vec(vvec& vv) {
+bool dce_cleanup::cleanup_dst_vec(vvec& vv) {
+   bool alive = false;
+
for (vvec::iterator I = vv.begin(), E = vv.end(); I != E; ++I) {
value* &v = *I;
if (!v)
@@ -128,9 +130,13 @@ void dce_cleanup::cleanup_dst_vec(vvec& vv) {
if (v->gvn_source && v->gvn_source->is_dead())
v->gvn_source = NULL;
 
-   if (v->is_dead())
+   if (v->is_dead() || (remove_unused && !v->is_rel() && !v->uses))
v = NULL;
+   else
+   alive = true;
}
+
+   return alive;
 }
 
 } // namespace r600_sb
diff --git a/src/gallium/drivers/r600/sb/sb_pass.h 
b/src/gallium/drivers/r600/sb/sb_pass.h
index 95d2a20..a3f8515 100644
--- a/src/gallium/drivers/r600/sb/sb_pass.h
+++ b/src/gallium/drivers/r600/sb/sb_pass.h
@@ -119,9 +119,12 @@ public:
 class dce_cleanup : public vpass {
using vpass::visit;
 
+   bool remove_unused;
+
 public:
 
-   dce_cleanup(shader& s) : vpass(s) {}
+   dce_cleanup(shader& s) : vpass(s),
+   remove_unused(s.dce_flags & DF_REMOVE_UNUSED) {}
 
virtual bool visit(node& n, bool enter);
virtual bool visit(alu_group_node& n, bool enter);
@@ -135,7 +138,7 @@ public:
 private:
 
void cleanup_dst(node& n);
-   void cleanup_dst_vec(vvec& vv);
+   bool cleanup_dst_vec(vvec& vv);
 
 };
 
diff --git a/src/gallium/drivers/r600/sb/sb_shader.h 
b/src/gallium/drivers/r600/sb/sb_shader.h
index e515d31..7955bba 100644
--- a/src/gallium/drivers/r600/sb/sb_shader.h
+++ b/src/gallium/drivers/r600/sb

[Mesa-dev] [PATCH] r600g: fix tgsi_op2_s with trans-only instructions

2013-10-11 Thread Vadim Girlin
This fixes the issue where dst and src are the same register and the operation
on one channel overwrites the source for the other channels, e.g.:

UMUL TEMP[2].xyz, TEMP[0].xyzz, TEMP[2].

In this example the result of the operation on channel x is written to
TEMP[2].x and then used as the second source operand for channels y and z
instead of the original value in TEMP[2].x.

This patch stores the results in temp reg and moves them to
dst after performing operation on all channels.
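
The hazard can be reproduced outside the driver with a small sketch. It assumes, since the swizzle of the second operand is cut off in the example above, that every channel reads channel x of the second source; Vec4 and both function names are hypothetical. The first function writes each result straight into dst, the second stages the results in a temporary and copies them afterwards.

```cpp
#include <array>
#include <cstdint>

using Vec4 = std::array<std::uint32_t, 4>;

// Writes each channel directly to dst. If dst aliases src1, channel x is
// overwritten before channels y and z have read it.
void umul_in_place(Vec4 &dst, const Vec4 &src0, const Vec4 &src1, unsigned channels) {
    for (unsigned i = 0; i < channels; ++i)
        dst[i] = src0[i] * src1[0];
}

// Stages all results in a temporary, then copies them to dst.
void umul_via_temp(Vec4 &dst, const Vec4 &src0, const Vec4 &src1, unsigned channels) {
    Vec4 tmp{};
    for (unsigned i = 0; i < channels; ++i)
        tmp[i] = src0[i] * src1[0];
    for (unsigned i = 0; i < channels; ++i)
        dst[i] = tmp[i];
}
```

With src0 = (2, 3, 4) and dst = src1 holding 5 in channel x, the in-place version yields (10, 30, 40) while the staged version yields the intended (10, 15, 20). The patch pays for the extra MOVs only when trans_only is set and more than one destination channel is written.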

Fixes https://bugs.freedesktop.org/show_bug.cgi?id=70327

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---
 src/gallium/drivers/r600/r600_shader.c | 36 +-
 1 file changed, 31 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index d17d670..aed2100 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -1638,15 +1638,22 @@ static int tgsi_op2_s(struct r600_shader_ctx *ctx, int swap, int trans_only)
 {
struct tgsi_full_instruction *inst = &ctx->parse.FullToken.FullInstruction;
struct r600_bytecode_alu alu;
-   int i, j, r;
-   int lasti = tgsi_last_instruction(inst->Dst[0].Register.WriteMask);
+   unsigned write_mask = inst->Dst[0].Register.WriteMask;
+   int i, j, r, lasti = tgsi_last_instruction(write_mask);
+   /* use temp register if trans_only and more than one dst component */
+   int use_tmp = trans_only && (write_mask ^ (1 << lasti));
 
-   for (i = 0; i < lasti + 1; i++) {
-   if (!(inst->Dst[0].Register.WriteMask & (1 << i)))
+   for (i = 0; i <= lasti; i++) {
+   if (!(write_mask & (1 << i)))
continue;
 
memset(&alu, 0, sizeof(struct r600_bytecode_alu));
-   tgsi_dst(ctx, &inst->Dst[0], i, &alu.dst);
+   if (use_tmp) {
+   alu.dst.sel = ctx->temp_reg;
+   alu.dst.chan = i;
+   alu.dst.write = 1;
+   } else
+   tgsi_dst(ctx, &inst->Dst[0], i, &alu.dst);
 
alu.op = ctx->inst_info->op;
if (!swap) {
@@ -1675,6 +1682,25 @@ static int tgsi_op2_s(struct r600_shader_ctx *ctx, int swap, int trans_only)
if (r)
return r;
}
+
+   if (use_tmp) {
+   /* move result from temp to dst */
+   for (i = 0; i <= lasti; i++) {
+   if (!(write_mask & (1 << i)))
+   continue;
+
+   memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+   alu.op = ALU_OP1_MOV;
+   tgsi_dst(ctx, &inst->Dst[0], i, &alu.dst);
+   alu.src[0].sel = ctx->temp_reg;
+   alu.src[0].chan = i;
+   alu.last = (i == lasti);
+
+   r = r600_bytecode_add_alu(ctx->bc, &alu);
+   if (r)
+   return r;
+   }
+   }
return 0;
 }
 
-- 
1.8.3.1



[Mesa-dev] [PATCH] radeonsi: pass alpha_ref value to PS in the user sgpr

2013-10-10 Thread Vadim Girlin
Currently it's hardcoded in the shader, so every change requires
compilation of the shader variant, killing the performance
in Serious Sam 3 and probably other apps.

This patch passes alpha_ref in the user sgpr and removes it from
the shader key.
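
Why the key matters can be shown with a toy model of a variant cache. It is only a sketch: Draw and the two counting functions are hypothetical and have nothing to do with the radeonsi data structures. Each distinct key stands for one shader compilation.

```cpp
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical per-draw alpha test state.
struct Draw {
    int alpha_func;
    float alpha_ref;
};

// Variants needed when alpha_ref is part of the shader key.
std::size_t variants_with_ref_in_key(const std::vector<Draw> &draws) {
    std::set<std::pair<int, float>> keys;
    for (const Draw &d : draws)
        keys.insert({d.alpha_func, d.alpha_ref});
    return keys.size();
}

// Variants needed when alpha_ref is passed at draw time instead.
std::size_t variants_with_ref_as_param(const std::vector<Draw> &draws) {
    std::set<int> keys;
    for (const Draw &d : draws)
        keys.insert(d.alpha_func);
    return keys.size();
}
```

An application that animates the reference value while keeping the compare function fixed needs one variant per distinct value in the first scheme and a single variant in the second.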

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---
 src/gallium/drivers/radeonsi/radeonsi_shader.c |  8 --
 src/gallium/drivers/radeonsi/radeonsi_shader.h | 39 +-
 src/gallium/drivers/radeonsi/si_state.c|  7 ++---
 3 files changed, 29 insertions(+), 25 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/radeonsi_shader.c 
b/src/gallium/drivers/radeonsi/radeonsi_shader.c
index 97ed4e3..5279bb0 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_shader.c
+++ b/src/gallium/drivers/radeonsi/radeonsi_shader.c
@@ -570,11 +570,14 @@ static void si_alpha_test(struct lp_build_tgsi_context *bld_base,
 
if (si_shader_ctx->shader->key.ps.alpha_func != PIPE_FUNC_NEVER) {
LLVMValueRef out_ptr = si_shader_ctx->radeon_bld.soa.outputs[index][3];
+   LLVMValueRef alpha_ref = LLVMGetParam(si_shader_ctx->radeon_bld.main_fn,
+   SI_PARAM_ALPHA_REF);
+
LLVMValueRef alpha_pass =
lp_build_cmp(&bld_base->base,
 si_shader_ctx->shader->key.ps.alpha_func,
 LLVMBuildLoad(gallivm->builder, out_ptr, ""),
-lp_build_const_float(gallivm, si_shader_ctx->shader->key.ps.alpha_ref));
+alpha_ref);
LLVMValueRef arg =
lp_build_select(&bld_base->base,
alpha_pass,
@@ -1569,7 +1572,7 @@ static void create_function(struct si_shader_context *si_shader_ctx)
 {
struct lp_build_tgsi_context *bld_base = &si_shader_ctx->radeon_bld.soa.bld_base;
struct gallivm_state *gallivm = bld_base->base.gallivm;
-   LLVMTypeRef params[20], f32, i8, i32, v2i32, v3i32;
+   LLVMTypeRef params[21], f32, i8, i32, v2i32, v3i32;
unsigned i, last_sgpr, num_params;
 
i8 = LLVMInt8TypeInContext(gallivm->context);
@@ -1614,6 +1617,7 @@ static void create_function(struct si_shader_context *si_shader_ctx)
break;
 
case TGSI_PROCESSOR_FRAGMENT:
+   params[SI_PARAM_ALPHA_REF] = f32;
params[SI_PARAM_PRIM_MASK] = i32;
last_sgpr = SI_PARAM_PRIM_MASK;
params[SI_PARAM_PERSP_SAMPLE] = v2i32;
diff --git a/src/gallium/drivers/radeonsi/radeonsi_shader.h b/src/gallium/drivers/radeonsi/radeonsi_shader.h
index 1db8bb8..c9e851a 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_shader.h
+++ b/src/gallium/drivers/radeonsi/radeonsi_shader.h
@@ -37,9 +37,10 @@
 #define SI_SGPR_VERTEX_BUFFER   6  /* VS only */
 #define SI_SGPR_SO_BUFFER       8  /* VS only, stream-out */
 #define SI_SGPR_START_INSTANCE  10 /* VS only */
+#define SI_SGPR_ALPHA_REF       6  /* PS only */
 
 #define SI_VS_NUM_USER_SGPR     11
-#define SI_PS_NUM_USER_SGPR     6
+#define SI_PS_NUM_USER_SGPR     7
 
 /* LLVM function parameter indices */
 #define SI_PARAM_CONST 0
@@ -53,23 +54,24 @@
 /* the other VS parameters are assigned dynamically */
 
 /* PS only parameters */
-#define SI_PARAM_PRIM_MASK         3
-#define SI_PARAM_PERSP_SAMPLE      4
-#define SI_PARAM_PERSP_CENTER      5
-#define SI_PARAM_PERSP_CENTROID    6
-#define SI_PARAM_PERSP_PULL_MODEL  7
-#define SI_PARAM_LINEAR_SAMPLE     8
-#define SI_PARAM_LINEAR_CENTER     9
-#define SI_PARAM_LINEAR_CENTROID   10
-#define SI_PARAM_LINE_STIPPLE_TEX  11
-#define SI_PARAM_POS_X_FLOAT       12
-#define SI_PARAM_POS_Y_FLOAT       13
-#define SI_PARAM_POS_Z_FLOAT       14
-#define SI_PARAM_POS_W_FLOAT       15
-#define SI_PARAM_FRONT_FACE        16
-#define SI_PARAM_ANCILLARY         17
-#define SI_PARAM_SAMPLE_COVERAGE   18
-#define SI_PARAM_POS_FIXED_PT      19
+#define SI_PARAM_ALPHA_REF         3
+#define SI_PARAM_PRIM_MASK         4
+#define SI_PARAM_PERSP_SAMPLE      5
+#define SI_PARAM_PERSP_CENTER      6
+#define SI_PARAM_PERSP_CENTROID    7
+#define SI_PARAM_PERSP_PULL_MODEL  8
+#define SI_PARAM_LINEAR_SAMPLE     9
+#define SI_PARAM_LINEAR_CENTER     10
+#define SI_PARAM_LINEAR_CENTROID   11
+#define SI_PARAM_LINE_STIPPLE_TEX  12
+#define SI_PARAM_POS_X_FLOAT       13
+#define SI_PARAM_POS_Y_FLOAT       14
+#define SI_PARAM_POS_Z_FLOAT       15
+#define SI_PARAM_POS_W_FLOAT       16
+#define SI_PARAM_FRONT_FACE        17
+#define SI_PARAM_ANCILLARY         18
+#define SI_PARAM_SAMPLE_COVERAGE   19
+#define SI_PARAM_POS_FIXED_PT      20
 
 struct si_shader_io {
     unsigned    name;
@@ -124,7 +126,6 @@ union si_shader_key {
     unsigned    alpha_func:3

Re: [Mesa-dev] [PATCH] radeonsi: pass alpha_ref value to PS in the user sgpr

2013-10-10 Thread Vadim Girlin

On 10/10/2013 02:11 PM, Michel Dänzer wrote:

On Thu, 2013-10-10 at 12:49 +0400, Vadim Girlin wrote:

Currently it's hardcoded in the shader, so every change requires
compilation of the shader variant, killing the performance
in Serious Sam 3 and probably other apps.

This patch passes alpha_ref in the user sgpr and removes it from
the shader key.

Signed-off-by: Vadim Girlin <vadimgir...@gmail.com>


Reviewed-by: Michel Dänzer <michel.daen...@amd.com>

I presume this causes no regressions with piglit quick.tests.


Yes, there are no regressions with piglit. Thanks for reviewing.

By the way, I'm also not sure if this is the right way of doing it,
especially if we'll need to pass more parameters for any new features.


Some other approach might be preferable, e.g. putting it, together with
any other data we may need in the future, into an internal const buffer
(as we do in r600g for clip planes etc.). Or maybe there are other ways
on SI that I'm not aware of yet?


Vadim


Re: [Mesa-dev] [PATCH] radeonsi: pass alpha_ref value to PS in the user sgpr

2013-10-10 Thread Vadim Girlin

On 10/10/2013 08:10 PM, Christian König wrote:


That strongly depends on how often we use a parameter. The docs speak of
a penalty associated with loading each SGPR, so we should try to use as
few as possible, but loading something from constant space is also
costly without proper support for the constant IB.


By the way, AFAICS some SGPR inputs are often not used at all in the
shaders. I guess we might want to use a per-shader mapping of parameters
to SGPRs, so that we only load the values actually used by each shader.
The compiler would need to pack the required input SGPRs into the lowest
indices and provide the parameter-to-SGPR map to the driver. OTOH this
would slightly increase the driver's workload, so I'm not really sure yet
if it's worth it; it looks like we already have pretty significant overhead.
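
The per-shader mapping idea above can be sketched roughly as follows. This is
only an illustration under stated assumptions: PARAM_COUNT, SGPR_UNUSED and
pack_user_sgprs() are hypothetical names, not existing radeonsi code, and each
parameter is assumed to occupy exactly one SGPR.

```c
#include <assert.h>
#include <stdint.h>

#define PARAM_COUNT 8      /* hypothetical number of user parameters */
#define SGPR_UNUSED (-1)   /* marks a parameter the shader never reads */

/*
 * Assign every parameter whose bit is set in used_mask to the lowest
 * free SGPR index, in parameter order. Unused parameters get
 * SGPR_UNUSED. Returns the number of SGPRs the driver would have to
 * load for this shader.
 *
 * Assumption for the sketch: one SGPR per parameter.
 */
static int pack_user_sgprs(uint32_t used_mask, int map[PARAM_COUNT])
{
    int next_sgpr = 0;

    for (int param = 0; param < PARAM_COUNT; param++) {
        if (used_mask & (1u << param))
            map[param] = next_sgpr++;
        else
            map[param] = SGPR_UNUSED;
    }
    return next_sgpr;
}
```

The driver would then consult the map when emitting user SGPR values and skip
every entry marked SGPR_UNUSED; that lookup is the extra per-draw work
mentioned above.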


Vadim



Re: [Mesa-dev] [PATCH] r600g/sb: Move variable dereference after null check.

2013-09-28 Thread Vadim Girlin

On 09/28/2013 10:08 AM, Vinson Lee wrote:

Fixes "Dereference before null check" defect reported by Coverity.

Signed-off-by: Vinson Lee <v...@freedesktop.org>


Reviewed-by: Vadim Girlin <vadimgir...@gmail.com>


---
  src/gallium/drivers/r600/sb/sb_ra_init.cpp | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/sb/sb_ra_init.cpp b/src/gallium/drivers/r600/sb/sb_ra_init.cpp
index 0b332a9..e53aba5 100644
--- a/src/gallium/drivers/r600/sb/sb_ra_init.cpp
+++ b/src/gallium/drivers/r600/sb/sb_ra_init.cpp
@@ -395,11 +395,12 @@ void ra_init::color_bs_constraint(ra_constraint* c) {

     for (vvec::iterator I = vv.begin(), E = vv.end(); I != E; ++I) {
         value *v = *I;
-        sel_chan gpr = v->get_final_gpr();

         if (!v || v->is_dead())
             continue;

+        sel_chan gpr = v->get_final_gpr();
+
         val_set interf;

         if (v->chunk)





Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs

2013-08-28 Thread Vadim Girlin

On 08/28/2013 01:15 PM, Christian König wrote:

Well, for this discussion let's just assume that we fixed the delay in
the upper layers of the stack and the driver sees the shader code as
soon as the application provides it (if I understood it correctly, Vadim
has just volunteered for the job).


No, I'm not really volunteering to implement that. :)
I'm not even sure if it's possible in a reasonable time. In fact it was
more of a theoretical discussion about what would be required for
early compilation in the driver to make sense.


Perhaps I failed to explain it, but my point is that as long as
compilation is deferred in the upper layers and nobody is going to change
this (if it's possible at all), it doesn't make sense to try compiling
early in the driver. I think we might prefer to defer the compilation in
the driver as well. It doesn't make the overall situation any worse, and
it can make it better, at least by not compiling unused variants.
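
A minimal sketch of what deferring compilation to draw time looks like,
assuming a hypothetical variant table (struct shader, struct variant and
get_variant_for_draw() are made up for illustration and do not reflect the
actual r600g data structures):

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VARIANTS 16   /* hypothetical cap, for the sketch only */

struct variant {
    unsigned key;        /* state bits the bytecode depends on */
    int compiled;        /* stand-in for the real bytecode */
};

struct shader {
    struct variant variants[MAX_VARIANTS];
    size_t num_variants;
    unsigned compile_count;   /* how many compilations actually ran */
};

/* Stand-in for the expensive compile step. */
static void compile_variant(struct shader *sh, struct variant *v, unsigned key)
{
    v->key = key;
    v->compiled = 1;
    sh->compile_count++;
}

/*
 * Called at draw time with the key derived from the current state.
 * Returns the matching variant, compiling it only if it does not
 * exist yet. Returns NULL when the table is full.
 */
static struct variant *get_variant_for_draw(struct shader *sh, unsigned key)
{
    for (size_t i = 0; i < sh->num_variants; i++) {
        if (sh->variants[i].key == key)
            return &sh->variants[i];
    }
    if (sh->num_variants == MAX_VARIANTS)
        return NULL;

    struct variant *v = &sh->variants[sh->num_variants++];
    compile_variant(sh, v, key);
    return v;
}
```

Because nothing is compiled when the shader state is created, state changes
that happen between creation and the first draw never produce a variant.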


Vadim


Also let's assume that shaders are small and that having a lot of shader
variants around after they are compiled isn't bad.

In this case probably the best solution is to compile early and try to
make the shaders as state invariant as possible, e.g. don't do
optimizations like getting rid of extra exports for the case where we
don't need the alpha test, or if it's just a dependency on a boolean then
have both variants covered by the bytecode and use a bit constant to
choose between the two etc...
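
The "bit constant" idea can be illustrated in plain C standing in for shader
bytecode: a single compiled function covers both states and branches on a
runtime flag, instead of two separately compiled variants. The function name
and the choice of a GREATER comparison are assumptions made for this sketch.

```c
#include <assert.h>

/*
 * Returns 1 if the fragment survives, 0 if it is discarded.
 * alpha_test_enabled plays the role of the bit constant: it is set by
 * the driver per draw, so toggling the alpha test needs no recompile.
 * Assumption for the sketch: the compare function is fixed to GREATER.
 */
static int fragment_survives(float alpha, float alpha_ref,
                             int alpha_test_enabled)
{
    if (!alpha_test_enabled)
        return 1;
    return alpha > alpha_ref;
}
```

The trade-off is the one described above: the state-invariant version carries
the branch and the extra input on every invocation, while a specialized
variant would not.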

As a second step the driver should create an optimized version of the
shader in a background thread when we know all the state that is/was
active when the shader is used.

Of course you need a bit of a heuristic for this, because sometimes it is
better to switch between shader variants and other times it is better to
have one variant covering all the different states and just use bit
constants to choose between them.

Just some thoughts on this topic,
Christian.

PS: My mail server is once more driving me nuts, please ignore the extra
copy if you get this mail twice.


[Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs

2013-08-27 Thread Vadim Girlin
We need to export at least one color if the shader writes it,
even when nr_cbufs==0.

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---

Tested on evergreen with multiple combinations of backends - no regressions,
fixes some tests:

  default    - fixes fb-alphatest-nocolor and fb_alphatest-nocolor-ff
  default+sb - fixes fb-alphatest-nocolor and fb_alphatest-nocolor-ff
  llvm       - fixes about 25 tests related to depth/stencil
  llvm+sb    - fixes about 300 tests (llvm's depth/stencil issues and
               regressions caused by reordering of exports in sb)

With this patch, there are no regressions with default+sb vs default.
There is one regression with llvm+sb vs llvm - fs-texturegrad-miplevels,
AFAICS it's a problem with llvm backend uncovered by sb - SET_GRADIENTS_V/H
instructions are not placed in the same TEX clause with corresponding SAMPLE_G.
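
The core of the change can be sketched in isolation as below. This is a
simplified stand-alone illustration, not the driver code: should_export_color()
is a hypothetical helper, and MAX2 is defined locally here rather than taken
from Mesa's headers.

```c
#include <assert.h>

/* Local stand-in for Mesa's MAX2 macro. */
#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/*
 * Decide whether the color output with semantic index sid is exported.
 * Clamping the limit to at least 1 keeps color 0 exported even when no
 * color buffers are bound (nr_cbufs == 0).
 */
static int should_export_color(int sid, int nr_cbufs)
{
    int max_color_exports = MAX2(nr_cbufs, 1);

    return sid < max_color_exports;
}
```

With nr_cbufs == 0 the unclamped test (sid >= nr_cbufs) would skip every color
export, including color 0; the clamp is what keeps that one export alive.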

 src/gallium/drivers/r600/r600_shader.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index 300b5c4..f7eab76 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -918,6 +918,7 @@ static int r600_shader_from_tgsi(struct r600_screen *rscreen,
     unsigned opcode;
     int i, j, k, r = 0;
     int next_pos_base = 60, next_param_base = 0;
+    int max_color_exports = MAX2(key.nr_cbufs, 1);
     /* Declarations used by llvm code */
     bool use_llvm = false;
     bool indirect_gprs;
@@ -1130,7 +1131,7 @@ static int r600_shader_from_tgsi(struct r600_screen *rscreen,
         radeon_llvm_ctx.face_gpr = ctx.face_gpr;
         radeon_llvm_ctx.r600_inputs = ctx.shader->input;
         radeon_llvm_ctx.r600_outputs = ctx.shader->output;
-        radeon_llvm_ctx.color_buffer_count = MAX2(key.nr_cbufs , 1);
+        radeon_llvm_ctx.color_buffer_count = max_color_exports;
         radeon_llvm_ctx.chip_class = ctx.bc->chip_class;
         radeon_llvm_ctx.fs_color_all = shader->fs_write_all && (rscreen->chip_class >= EVERGREEN);
         radeon_llvm_ctx.stream_outputs = &so;
@@ -1440,7 +1441,7 @@ static int r600_shader_from_tgsi(struct r600_screen *rscreen,
         case TGSI_PROCESSOR_FRAGMENT:
             if (shader->output[i].name == TGSI_SEMANTIC_COLOR) {
                 /* never export more colors than the number of CBs */
-                if (shader->output[i].sid >= key.nr_cbufs) {
+                if (shader->output[i].sid >= max_color_exports) {
                     /* skip export */
                     j--;
                     continue;
@@ -1450,7 +1451,7 @@ static int r600_shader_from_tgsi(struct r600_screen *rscreen,
                 output[j].type = V_SQ_CF_ALLOC_EXPORT_WORD0_SQ_EXPORT_PIXEL;
                 shader->nr_ps_color_exports++;
                 if (shader->fs_write_all && (rscreen->chip_class >= EVERGREEN)) {
-                    for (k = 1; k < key.nr_cbufs; k++) {
+                    for (k = 1; k < max_color_exports; k++) {
                         j++;
                         memset(&output[j], 0, sizeof(struct r600_bytecode_output));
                         output[j].gpr = shader->output[i].gpr;
-- 
1.8.3.1



Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs

2013-08-27 Thread Vadim Girlin

On 08/27/2013 11:00 PM, Roland Scheidegger wrote:

Not that I'm qualified to review r600 code, but couldn't you create
different shader variants depending on whether you need alpha test? At
least I would assume shader exports aren't free.


I thought about performance, but my main concern now is to avoid serious
regressions after enabling sb; we can try to improve it later.


Even if we don't emit this color export, we'll have a fake export (with
all color components masked) instead, and I'm not sure whether that's
cheaper. Possibly the hardware can see that there is no actual memory
write, but benchmarks are needed to prove it.


There is also another possible improvement for exports: sometimes we
need to export depth/stencil but no colors, and we can probably get rid
of the fake color export in such cases as well. Anyway, this also needs
additional testing/benchmarking.


Vadim



Roland



Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs

2013-08-27 Thread Vadim Girlin

On 08/28/2013 12:43 AM, Marek Olšák wrote:

Shader variants are BAD, BAD, BAD. Have you ever played an AAA game
with a Mesa driver that likes to compile shader variants on first use?
It's HORRIBLE.


I don't think that shader variants are bad, but it's definitely bad when 
we are compiling variants that are never used. Currently glxgears 
compiles 18 ps/vs shaders. In my branch with initial GS support [1] I 
switched handling of the shaders to deferred compilation, that is, 
shaders are compiled only before the actual draw. I found later that 
it's not really required for GS, but IIRC this change results in only 5 
shaders being compiled for glxgears instead of 18. It seems most of the 
useless variants are results of state changes between creation of the 
shader state (initial compilation) and actual draw call.


I had some concerns about increased overhead with those changes, and
it's actually noticeable with the drawoverhead demo, but I didn't see any
regressions with a few real apps that I tested, e.g. glxgears even
showed slightly better performance with these changes. I probably also
implemented it in a not very optimal way (I was mostly concentrating on
GS support), so the overhead can be reduced.


One more thing is duplicate shaders. I analyzed shader dumps from
Unigine Heaven 3.0 some time ago and found that of about 320 compiled
shaders, only about 180 (50%) were unique; the others were duplicates
(detected by comparing their bytecode dumps in an automated way).
Maybe they had different shader keys (which still resulted in the same
bytecode), but I suspect duplicate pipe shaders were also involved.
Unfortunately I haven't had time to investigate it more thoroughly
since then.
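
One way such a duplicate count could be automated is sketched below. This is
not the script that was actually used for the Heaven dumps (that is not shown
in this thread); struct blob and count_unique() are hypothetical names, and
the quadratic comparison is only meant for a few hundred shaders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One compiled shader: a pointer to its bytecode and its size in dwords. */
struct blob {
    const uint32_t *dw;
    size_t ndw;
};

/*
 * Count shaders whose bytecode differs from every earlier shader in
 * the array. Two shaders are duplicates when their bytecode has the
 * same length and identical contents.
 */
static size_t count_unique(const struct blob *blobs, size_t n)
{
    size_t unique = 0;

    for (size_t i = 0; i < n; i++) {
        int seen = 0;

        for (size_t j = 0; j < i && !seen; j++) {
            seen = blobs[i].ndw == blobs[j].ndw &&
                   memcmp(blobs[i].dw, blobs[j].dw,
                          blobs[i].ndw * sizeof(uint32_t)) == 0;
        }
        if (!seen)
            unique++;
    }
    return unique;
}
```

Comparing final bytecode rather than shader keys also catches the case
mentioned above where different keys compile to the same binary.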


So my point is that we don't really need to eliminate shader variants, 
first we need to eliminate compilation of unused variants and duplicate 
shaders. Also we might want to consider offloading of the compilation to 
separate thread(s) and caching of shader binaries between runs.


Vadim

 [1] http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-geom-shaders



What the patch does is probably the right solution. At least
alpha-test state changes don't cause shader recompilation and
re-binding, which also negatively affects performance. Ideally we
shouldn't depend on the framebuffer state at all, but we need to
emulate the TGSI property FS_COLOR0_WRITES_ALL_CBUFS. I think we
should always be fine with key.nr_cbufs forced to 8 for any shader
without that property. I expect app developers to do the right thing
and not write outputs they don't need.

Marek

On Tue, Aug 27, 2013 at 9:00 PM, Roland Scheidegger srol...@vmware.com wrote:

Not that I'm qualified to review r600 code, but couldn't you create
different shader variants depending on whether you need alpha test? At
least I would assume shader exports aren't free.

Roland


Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs

2013-08-27 Thread Vadim Girlin

On 08/28/2013 02:28 AM, Roland Scheidegger wrote:


Hmm ok that seems a way more complicated problem than I thought :-).
Compile early and you might compile variants you will never use, compile
late and the delay might be noticeable.


Compilation of unused variants is not bad if they are compiled at
game/level loading time; I think many apps try to compile shaders
early to avoid freezes during gameplay. But trying to compile early in
the driver doesn't make sense currently because it's already too late
anyway: if I'm not missing something, compilation is deferred in mesa or
the state tracker.


Otherwise it would probably be preferable for the driver to precompile
variants that are likely to be used (but only if we really can do it
early, at loading time when the shaders are created by the app).



I just thought it might be unlikely you'd actually need two variants -
e.g. some depth exporting shader is probably unlikely to use alpha test.
But ok I guess it shouldn't write color in this case, so even then it
might never be worth bothering. Was just a random idea ;-).


I think it's a good idea that just needs some benchmarking to make sure 
that it can provide any real benefits. I only wanted to say that it can 
be done separately from this fix.


Vadim




Roland





Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs

2013-08-27 Thread Vadim Girlin

On 08/28/2013 02:59 AM, Marek Olšák wrote:

First, you won't really see any significant continual difference in
frame rate no matter how many shader variants you have unless you are
very CPU-bound. The problem is shader compilation on the first use,
that's where you get a big hiccup. Try Skyrim for example: You have to
first look around and see every object that's around you and get
unpleasant stuttering before you can actually go on and play the game.
Yes, this is also Wine's fault, as it compiles shaders on the first use
too, but we don't have to be as bad as Wine, do we? Valve also
reported shader recompilations on the first use being a serious issue
with open source drivers.


I perfectly understand that deferred compilation is exactly the problem
that makes games freeze due to shader compilation on first use when
something new appears on the screen, but I don't think we can solve this
problem in the *driver* by trying to compile early, because AFAICS
currently the shaders are passed to the driver too late anyway, and this
happens not only with wine. E.g. when I run Heaven in a window with
MESA_GLSL=dump R600_DEBUG=ps,vs, so that I can see Heaven's window and
console output at the same time, what I see is that most of the GL dumps
happen while Heaven shows the splash screen with loading progress, but
most of the driver's dumps appear on the first frame and a few more times
during the benchmark. It looks like compilation is deferred somewhere in
the stack before the driver, or am I missing something?


Vadim




Marek

On Tue, Aug 27, 2013 at 11:52 PM, Vadim Girlin vadimgir...@gmail.com wrote:

On 08/28/2013 12:43 AM, Marek Olšák wrote:


Shader variants are BAD, BAD, BAD. Have you ever played an AAA game
with a Mesa driver that likes to compile shader variants on first use?
It's HORRIBLE.



I don't think that shader variants are bad, but it's definitely bad when we
are compiling variants that are never used. Currently glxgears compiles 18
ps/vs shaders. In my branch with initial GS support [1] I switched handling
of the shaders to deferred compilation, that is, shaders are compiled only
before the actual draw. I found later that it's not really required for GS,
but IIRC this change results in only 5 shaders being compiled for glxgears
instead of 18. It seems most of the useless variants are results of state
changes between creation of the shader state (initial compilation) and
actual draw call.

I had some concerns about increased overhead with those changes, and it's
actually noticeable with the drawoverhead demo, but I didn't see any regressions
with a few real apps that I tested, e.g. glxgears even showed slightly
better performance with these changes. I also probably implemented it in a
suboptimal way (I was mostly concentrating on GS support), so the
overhead can likely be reduced.

One more thing is duplicate shaders. I analyzed shader dumps from Unigine
Heaven 3.0 some time ago and found that of about 320 compiled shaders,
only about 180 (50%) were unique; the others were duplicates (detected by
comparing their bytecode dumps in an automated way). Maybe they had
different shader keys (which still resulted in the same bytecode), but I
suspect duplicate pipe shaders were also involved. Unfortunately I haven't
had time to investigate it more thoroughly since then.
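
The duplicate check mentioned above can be sketched roughly as follows. This is only an illustration of "compare the bytecode and count distinct blobs"; `Bytecode` and `count_unique` are made-up names, and the real analysis was done on text dumps rather than in-process.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <stdint.h>
#include <vector>

typedef std::vector<uint32_t> Bytecode;

// Counts how many distinct bytecode blobs a list of compiled shaders contains.
// std::set compares vectors lexicographically, so identical blobs collapse.
std::size_t count_unique(const std::vector<Bytecode>& shaders) {
    std::set<Bytecode> seen(shaders.begin(), shaders.end());
    return seen.size();
}
```

The difference between the number of compiled shaders and `count_unique()` is the number of compilations that a bytecode-level cache could have avoided.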

So my point is that we don't really need to eliminate shader variants, first
we need to eliminate compilation of unused variants and duplicate shaders.
Also we might want to consider offloading of the compilation to separate
thread(s) and caching of shader binaries between runs.

Vadim

  [1] http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-geom-shaders




What the patch does is probably the right solution. At least
alpha-test state changes don't cause shader recompilation and
re-binding, which also negatively affects performance. Ideally we
shouldn't depend on the framebuffer state at all, but we need to
emulate the TGSI property FS_COLOR0_WRITES_ALL_CBUFS. I think we
should always be fine with key.nr_cbufs forced to 8 for any shader
without that property. I expect app developers to do the right thing
and not write outputs they don't need.

Marek

On Tue, Aug 27, 2013 at 9:00 PM, Roland Scheidegger srol...@vmware.com
wrote:


Not that I'm qualified to review r600 code, but couldn't you create
different shader variants depending on whether you need alpha test? At
least I would assume shader exports aren't free.

Roland

Am 27.08.2013 19:56, schrieb Vadim Girlin:


We need to export at least one color if the shader writes it,
even when nr_cbufs==0.

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---

Tested on evergreen with multiple combinations of backends - no
regressions,
fixes some tests:

default- fixes fb-alphatest-nocolor and fb_alphatest-nocolor-ff
default+sb - fixes fb-alphatest-nocolor and fb_alphatest-nocolor-ff
llvm   - fixes about 25 tests related to depth/stencil
llvm+sb- fixes

Re: [Mesa-dev] [PATCH] r600g/llvm: don't export more colors than the number of CBs

2013-08-24 Thread Vadim Girlin

On 08/24/2013 02:31 PM, Marek Olšák wrote:

Like Christoph said, COLOR0 (if available) must always be exported for
alpha test.


Are there any piglit tests for that? I didn't see any regressions with 
this patch (at least on evergreen), possibly I messed up the testing 
somehow. Also I think old backend uses the same logic.


Vadim



Marek

On Sat, Aug 24, 2013 at 3:30 AM, Vadim Girlin vadimgir...@gmail.com wrote:

Currently llvm backend always exports at least one color in pixel
shader even if no color buffers are enabled. With depth/stencil exports
this can result in the following code:

EXPORT       PIXEL 0   R0.xyzw  VPM
EXPORT       PIXEL 61  R1.x___  VPM
EXPORT_DONE  PIXEL 61  R0._x__  VPM  EOP

AFAIU with zero color buffers no memory is reserved for colors in the export
ring and all exports in this example actually write to the same location.
The code above still works fine in this particular case, because correct
values are written last, but reordering can break it (especially with SB
which tends to reorder the exports).

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---

This fixes regressions with LLVM+SB, so I consider it as a prerequisite
for enabling SB by default. Also it fixes some issues with LLVM backend alone.
Tested on evergreen only (I don't have other hw), needs testing on
pre-evergreen GPUs.

  src/gallium/drivers/r600/r600_llvm.c   | 2 +-
  src/gallium/drivers/r600/r600_shader.c | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_llvm.c b/src/gallium/drivers/r600/r600_llvm.c
index 03a68e4..d2f4aff 100644
--- a/src/gallium/drivers/r600/r600_llvm.c
+++ b/src/gallium/drivers/r600/r600_llvm.c
@@ -333,8 +333,8 @@ static void llvm_emit_epilogue(struct lp_build_tgsi_context * bld_base)
         } else if (ctx->type == TGSI_PROCESSOR_FRAGMENT) {
             switch (ctx->r600_outputs[i].name) {
             case TGSI_SEMANTIC_COLOR:
-                has_color = true;
                 if ( color_count < ctx->color_buffer_count) {
+                    has_color = true;
                     LLVMValueRef args[3];
                     args[0] = output;
                     if (ctx->fs_color_all) {
diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index fb766c4..85f8469 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -1130,7 +1130,7 @@ static int r600_shader_from_tgsi(struct r600_screen *rscreen,
         radeon_llvm_ctx.face_gpr = ctx.face_gpr;
         radeon_llvm_ctx.r600_inputs = ctx.shader->input;
         radeon_llvm_ctx.r600_outputs = ctx.shader->output;
-        radeon_llvm_ctx.color_buffer_count = MAX2(key.nr_cbufs , 1);
+        radeon_llvm_ctx.color_buffer_count = key.nr_cbufs;
         radeon_llvm_ctx.chip_class = ctx.bc->chip_class;
         radeon_llvm_ctx.fs_color_all = shader->fs_write_all && (rscreen->chip_class >= EVERGREEN);
         radeon_llvm_ctx.stream_outputs = so;
--
1.8.3.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev




Re: [Mesa-dev] [PATCH] r600g/llvm: don't export more colors than the number of CBs

2013-08-24 Thread Vadim Girlin

On 08/24/2013 07:24 PM, Marek Olšák wrote:

See piglit/fbo-alphatest-nocolor.


Ah, it seems I just compared the wrong results when I was testing all
combinations of backends and looking for regressions.


Now I think the problem is that although the llvm backend correctly emits the
color export with nr_cbufs==0, it still relies on the
nr_ps_color_exports value computed in the old backend path (which is
currently broken for that case), and this resulted in the regressions
that I wanted to fix. I'll send a new patch.


Vadim



Marek

On Sat, Aug 24, 2013 at 3:12 PM, Vadim Girlin vadimgir...@gmail.com wrote:

On 08/24/2013 02:31 PM, Marek Olšák wrote:


Like Christoph said, COLOR0 (if available) must always be exported for
alpha test.



Are there any piglit tests for that? I didn't see any regressions with this
patch (at least on evergreen), possibly I messed up the testing somehow.
Also I think old backend uses the same logic.

Vadim




Marek

On Sat, Aug 24, 2013 at 3:30 AM, Vadim Girlin vadimgir...@gmail.com
wrote:


Currently llvm backend always exports at least one color in pixel
shader even if no color buffers are enabled. With depth/stencil exports
this can result in the following code:

EXPORT       PIXEL 0   R0.xyzw  VPM
EXPORT       PIXEL 61  R1.x___  VPM
EXPORT_DONE  PIXEL 61  R0._x__  VPM  EOP

AFAIU with zero color buffers no memory is reserved for colors in the
export
ring and all exports in this example actually write to the same location.
The code above still works fine in this particular case, because correct
values are written last, but reordering can break it (especially with SB
which tends to reorder the exports).

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---

This fixes regressions with LLVM+SB, so I consider it as a prerequisite
for enabling SB by default. Also it fixes some issues with LLVM backend
alone.
Tested on evergreen only (I don't have other hw), needs testing on
pre-evergreen GPUs.

   src/gallium/drivers/r600/r600_llvm.c   | 2 +-
   src/gallium/drivers/r600/r600_shader.c | 2 +-
   2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_llvm.c b/src/gallium/drivers/r600/r600_llvm.c
index 03a68e4..d2f4aff 100644
--- a/src/gallium/drivers/r600/r600_llvm.c
+++ b/src/gallium/drivers/r600/r600_llvm.c
@@ -333,8 +333,8 @@ static void llvm_emit_epilogue(struct lp_build_tgsi_context * bld_base)
         } else if (ctx->type == TGSI_PROCESSOR_FRAGMENT) {
             switch (ctx->r600_outputs[i].name) {
             case TGSI_SEMANTIC_COLOR:
-                has_color = true;
                 if ( color_count < ctx->color_buffer_count) {
+                    has_color = true;
                     LLVMValueRef args[3];
                     args[0] = output;
                     if (ctx->fs_color_all) {
diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index fb766c4..85f8469 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -1130,7 +1130,7 @@ static int r600_shader_from_tgsi(struct r600_screen *rscreen,
         radeon_llvm_ctx.face_gpr = ctx.face_gpr;
         radeon_llvm_ctx.r600_inputs = ctx.shader->input;
         radeon_llvm_ctx.r600_outputs = ctx.shader->output;
-        radeon_llvm_ctx.color_buffer_count = MAX2(key.nr_cbufs , 1);
+        radeon_llvm_ctx.color_buffer_count = key.nr_cbufs;
         radeon_llvm_ctx.chip_class = ctx.bc->chip_class;
         radeon_llvm_ctx.fs_color_all = shader->fs_write_all && (rscreen->chip_class >= EVERGREEN);
         radeon_llvm_ctx.stream_outputs = so;
--
1.8.3.1








[Mesa-dev] [PATCH] r600g/llvm: don't export more colors than the number of CBs

2013-08-23 Thread Vadim Girlin
Currently llvm backend always exports at least one color in pixel
shader even if no color buffers are enabled. With depth/stencil exports
this can result in the following code:

EXPORT       PIXEL 0   R0.xyzw  VPM
EXPORT       PIXEL 61  R1.x___  VPM
EXPORT_DONE  PIXEL 61  R0._x__  VPM  EOP

AFAIU with zero color buffers no memory is reserved for colors in the export
ring and all exports in this example actually write to the same location.
The code above still works fine in this particular case, because correct
values are written last, but reordering can break it (especially with SB
which tends to reorder the exports).

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---

This fixes regressions with LLVM+SB, so I consider it as a prerequisite
for enabling SB by default. Also it fixes some issues with LLVM backend alone.
Tested on evergreen only (I don't have other hw), needs testing on
pre-evergreen GPUs.

 src/gallium/drivers/r600/r600_llvm.c   | 2 +-
 src/gallium/drivers/r600/r600_shader.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_llvm.c b/src/gallium/drivers/r600/r600_llvm.c
index 03a68e4..d2f4aff 100644
--- a/src/gallium/drivers/r600/r600_llvm.c
+++ b/src/gallium/drivers/r600/r600_llvm.c
@@ -333,8 +333,8 @@ static void llvm_emit_epilogue(struct lp_build_tgsi_context * bld_base)
         } else if (ctx->type == TGSI_PROCESSOR_FRAGMENT) {
             switch (ctx->r600_outputs[i].name) {
             case TGSI_SEMANTIC_COLOR:
-                has_color = true;
                 if ( color_count < ctx->color_buffer_count) {
+                    has_color = true;
                     LLVMValueRef args[3];
                     args[0] = output;
                     if (ctx->fs_color_all) {
diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index fb766c4..85f8469 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -1130,7 +1130,7 @@ static int r600_shader_from_tgsi(struct r600_screen *rscreen,
         radeon_llvm_ctx.face_gpr = ctx.face_gpr;
         radeon_llvm_ctx.r600_inputs = ctx.shader->input;
         radeon_llvm_ctx.r600_outputs = ctx.shader->output;
-        radeon_llvm_ctx.color_buffer_count = MAX2(key.nr_cbufs , 1);
+        radeon_llvm_ctx.color_buffer_count = key.nr_cbufs;
         radeon_llvm_ctx.chip_class = ctx.bc->chip_class;
         radeon_llvm_ctx.fs_color_all = shader->fs_write_all && (rscreen->chip_class >= EVERGREEN);
         radeon_llvm_ctx.stream_outputs = so;
-- 
1.8.3.1



[Mesa-dev] [PATCH] [RFC] r600g: enable SB backend by default

2013-08-22 Thread Vadim Girlin
Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---
 src/gallium/drivers/r600/r600_asm.c| 3 ++-
 src/gallium/drivers/r600/r600_pipe.c   | 4 ++--
 src/gallium/drivers/r600/r600_pipe.h   | 2 +-
 src/gallium/drivers/r600/r600_shader.c | 2 +-
 4 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_asm.c b/src/gallium/drivers/r600/r600_asm.c
index b8eedae..a0492a6 100644
--- a/src/gallium/drivers/r600/r600_asm.c
+++ b/src/gallium/drivers/r600/r600_asm.c
@@ -2281,7 +2281,8 @@ void *r600_create_vertex_fetch_shader(struct pipe_context *ctx,
     uint32_t *bytecode;
     int i, j, r, fs_size;
     struct r600_fetch_shader *shader;
-    unsigned sb_disasm = rctx->screen->debug_flags & (DBG_SB_DISASM | DBG_SB);
+    unsigned no_sb = rctx->screen->debug_flags & DBG_NO_SB;
+    unsigned sb_disasm = !no_sb || (rctx->screen->debug_flags & DBG_SB_DISASM);
 
     assert(count < 32);
 
diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c
index 2be5910..edd50f0 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -67,8 +67,8 @@ static const struct debug_named_value debug_options[] = {
     { "noinvalrange", DBG_NO_DISCARD_RANGE, "Disable handling of INVALIDATE_RANGE map flags" },
 
     /* shader backend */
-    { "sb", DBG_SB, "Enable optimization of graphics shaders" },
-    { "sbcl", DBG_SB_CS, "Enable optimization of compute shaders" },
+    { "nosb", DBG_NO_SB, "Disable sb backend for graphics shaders" },
+    { "sbcl", DBG_SB_CS, "Enable sb backend for compute shaders" },
     { "sbdry", DBG_SB_DRY_RUN, "Don't use optimized bytecode (just print the dumps)" },
     { "sbstat", DBG_SB_STAT, "Print optimization statistics for shaders" },
     { "sbdump", DBG_SB_DUMP, "Print IR dumps after some optimization passes" },
diff --git a/src/gallium/drivers/r600/r600_pipe.h b/src/gallium/drivers/r600/r600_pipe.h
index 21d68c9..398ac89 100644
--- a/src/gallium/drivers/r600/r600_pipe.h
+++ b/src/gallium/drivers/r600/r600_pipe.h
@@ -249,7 +249,7 @@ typedef boolean (*r600g_dma_blit_t)(struct pipe_context *ctx,
 #define DBG_NO_ASYNC_DMA       (1 << 19)
 #define DBG_NO_DISCARD_RANGE   (1 << 20)
 /* shader backend */
-#define DBG_SB                 (1 << 21)
+#define DBG_NO_SB              (1 << 21)
 #define DBG_SB_CS              (1 << 22)
 #define DBG_SB_DRY_RUN         (1 << 23)
 #define DBG_SB_STAT            (1 << 24)
diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index fb766c4..1563430 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -140,7 +140,7 @@ int r600_pipe_shader_create(struct pipe_context *ctx,
     int r, i;
     uint32_t *ptr;
     bool dump = r600_can_dump_shader(rctx->screen, tgsi_get_processor_type(sel->tokens));
-    unsigned use_sb = rctx->screen->debug_flags & DBG_SB;
+    unsigned use_sb = !(rctx->screen->debug_flags & DBG_NO_SB);
     unsigned sb_disasm = use_sb || (rctx->screen->debug_flags & DBG_SB_DISASM);
 
     shader->shader.bc.isa = rctx->isa;
-- 
1.8.3.1



Re: [Mesa-dev] [PATCH] r600g/sb: Initialize cf_node::bc.

2013-08-19 Thread Vadim Girlin

On 08/19/2013 01:35 AM, Vinson Lee wrote:

Fixes Uninitialized pointer field defect reported by Coverity.

Signed-off-by: Vinson Lee v...@freedesktop.org
---
  src/gallium/drivers/r600/sb/sb_ir.h | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_ir.h 
b/src/gallium/drivers/r600/sb/sb_ir.h
index c838f62..b696e77 100644
--- a/src/gallium/drivers/r600/sb/sb_ir.h
+++ b/src/gallium/drivers/r600/sb/sb_ir.h
@@ -962,8 +962,8 @@ public:

  class cf_node : public container_node {
  protected:
-   cf_node() : container_node(NT_OP, NST_CF_INST), jump_target(),
-   jump_after_target() {};
+   cf_node() : container_node(NT_OP, NST_CF_INST), bc(),
+   jump_target(), jump_after_target() {};


Hi, Vinson,

IIRC I switched the initialization of bc struct from constructor 
initializer list to explicit memset due to reported issues with older 
gcc versions, it failed to initialize the struct properly. See commit 
41005d.


Constructors of cf_node (as well as fetch_node, alu_node) are protected 
and called only by helper functions (create_cf, create_fetch, 
create_alu) in friend class r600_sb::shader that create nodes in pool, 
memset for bc is called right after constructor in these functions, so 
actually bc is always initialized. I don't remember why I didn't use 
memset in constructor body though, maybe moving memset there would 
silence Coverity?
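
A rough sketch of the pattern under discussion, assuming a plain-data member and a pool-style factory: the constructor is protected, a friend class creates nodes with placement new, and the zeroing happens in the constructor body so that every construction path covers it. `bc_like`, `node_like` and `pool_owner` are invented stand-ins, not the real sb classes.

```cpp
#include <cassert>
#include <cstring>
#include <new>

// Hypothetical plain-data struct standing in for bc_cf / bc_alu / bc_fetch.
struct bc_like {
    unsigned op;
    unsigned barrier;
    void* ptr;
};

class pool_owner;

class node_like {
protected:
    // Zeroing inside the constructor body: the member is initialized no
    // matter which code path constructs the node.
    node_like() { std::memset(&bc, 0, sizeof(bc)); }
public:
    bc_like bc;
    friend class pool_owner;
};

class pool_owner {
public:
    node_like* create() {
        // Fill the storage with garbage first, to show that the constructor
        // (not the allocator) is what clears the struct.
        std::memset(storage_, 0xAB, sizeof(storage_));
        return new (storage_) node_like();
    }
private:
    alignas(node_like) unsigned char storage_[sizeof(node_like)];
};
```

Compared with calling memset in the factory right after the constructor, this keeps the initialization visible to a static analyzer that only looks at the constructor.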


Vadim


  public:
bc_cf bc;






Re: [Mesa-dev] [PATCH] r600g/sb: Move memsets of member structs to within constructor bodies.

2013-08-19 Thread Vadim Girlin

On 08/19/2013 11:50 AM, Vinson Lee wrote:

Silences Uninitialized pointer field defects reported by Coverity.

Signed-off-by: Vinson Lee v...@freedesktop.org


Reviewed-by: Vadim Girlin vadimgir...@gmail.com


---
  src/gallium/drivers/r600/sb/sb_ir.h   | 6 +++---
  src/gallium/drivers/r600/sb/sb_shader.cpp | 3 ---
  2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_ir.h b/src/gallium/drivers/r600/sb/sb_ir.h
index c838f62..a74d6cb 100644
--- a/src/gallium/drivers/r600/sb/sb_ir.h
+++ b/src/gallium/drivers/r600/sb/sb_ir.h
@@ -963,7 +963,7 @@ public:
  class cf_node : public container_node {
  protected:
      cf_node() : container_node(NT_OP, NST_CF_INST), jump_target(),
-         jump_after_target() {};
+         jump_after_target() { memset(&bc, 0, sizeof(bc_cf)); };
  public:
      bc_cf bc;

@@ -982,7 +982,7 @@ public:

  class alu_node : public node {
  protected:
-     alu_node() : node(NT_OP, NST_ALU_INST) {};
+     alu_node() : node(NT_OP, NST_ALU_INST) { memset(&bc, 0, sizeof(bc_alu)); };
  public:
      bc_alu bc;

@@ -1028,7 +1028,7 @@ public:

  class fetch_node : public node {
  protected:
-     fetch_node() : node(NT_OP, NST_FETCH_INST) {};
+     fetch_node() : node(NT_OP, NST_FETCH_INST) { memset(&bc, 0, sizeof(bc_fetch)); };
  public:
      bc_fetch bc;

diff --git a/src/gallium/drivers/r600/sb/sb_shader.cpp b/src/gallium/drivers/r600/sb/sb_shader.cpp
index 9fc47ae..98e52b1 100644
--- a/src/gallium/drivers/r600/sb/sb_shader.cpp
+++ b/src/gallium/drivers/r600/sb/sb_shader.cpp
@@ -260,7 +260,6 @@ node* shader::create_node(node_type nt, node_subtype nst, node_flags flags) {

  alu_node* shader::create_alu() {
      alu_node* n = new (pool.allocate(sizeof(alu_node))) alu_node();
-     memset(&n->bc, 0, sizeof(bc_alu));
      all_nodes.push_back(n);
      return n;
  }
@@ -281,7 +280,6 @@ alu_packed_node* shader::create_alu_packed() {

  cf_node* shader::create_cf() {
      cf_node* n = new (pool.allocate(sizeof(cf_node))) cf_node();
-     memset(&n->bc, 0, sizeof(bc_cf));
      n->bc.barrier = 1;
      all_nodes.push_back(n);
      return n;
@@ -289,7 +287,6 @@ cf_node* shader::create_cf() {

  fetch_node* shader::create_fetch() {
      fetch_node* n = new (pool.allocate(sizeof(fetch_node))) fetch_node();
-     memset(&n->bc, 0, sizeof(bc_fetch));
      all_nodes.push_back(n);
      return n;
  }




[Mesa-dev] [PATCH] r600g/sb: use MULADD workaround on R7xx for MULADD_IEEE

2013-08-10 Thread Vadim Girlin
Looks like the same issue that was seen with MULADD in the trans slot on
R7xx also affects MULADD_IEEE (maybe it affects all OP3 instructions and
MULADD is just the most frequently used one?). The workaround is to never put
affected instructions into the trans slot.

IIRC it was mostly observed when affected instructions had kcache operands
and some specific bank swizzles, but I have no R7xx hw to verify that, and
I'm still not sure whether it affects R6xx. The condition can probably be
narrowed to allow better ALU packing in some cases.
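
As an illustration of the workaround, here is a small sketch of a slot bitmask, assuming bits 0..3 stand for the vector slots x/y/z/w and bit 4 for the trans slot (which is what the 0x0F value in the patch suggests). The enum and `restrict_slots` are hypothetical; the patch itself assigns 0x0F directly, while this sketch masks an existing value.

```cpp
#include <cassert>

// Assumed bit layout: four vector slots plus the trans slot.
enum {
    SLOT_X = 1 << 0,
    SLOT_Y = 1 << 1,
    SLOT_Z = 1 << 2,
    SLOT_W = 1 << 3,
    SLOT_TRANS = 1 << 4
};

// Hypothetical helper: an affected op on a pre-Evergreen chip may only use
// the vector slots, so the trans bit is dropped from the allowed mask.
unsigned restrict_slots(unsigned allowed, bool affected_op, bool is_egcm) {
    if (affected_op && !is_egcm)
        allowed &= 0x0F;  // keep x/y/z/w, clear the trans slot bit
    return allowed;
}
```

Narrowing the condition, as the message suggests, would amount to making `affected_op` true in fewer cases so that more instructions keep the trans bit.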

Fixes https://bugs.freedesktop.org/show_bug.cgi?id=67927

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
Cc: 9.2 mesa-sta...@lists.freedesktop.org
---
 src/gallium/drivers/r600/sb/sb_sched.cpp | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/sb/sb_sched.cpp b/src/gallium/drivers/r600/sb/sb_sched.cpp
index f0e41f5..2792315 100644
--- a/src/gallium/drivers/r600/sb/sb_sched.cpp
+++ b/src/gallium/drivers/r600/sb/sb_sched.cpp
@@ -1490,7 +1490,8 @@ unsigned post_scheduler::try_add_instruction(node *n) {
 
         // FIXME workaround for some problems with MULADD in trans slot on r700,
         // (is it really needed on r600?)
-        if (a->bc.op == ALU_OP3_MULADD && !ctx.is_egcm()) {
+        if ((a->bc.op == ALU_OP3_MULADD || a->bc.op == ALU_OP3_MULADD_IEEE) &&
+                !ctx.is_egcm()) {
             allowed_slots = 0x0F;
         }
 
-- 
1.8.3.1



Re: [Mesa-dev] [PATCH] r600g/sb: Dump correct value for CND.

2013-08-04 Thread Vadim Girlin

On 08/04/2013 11:02 AM, Vinson Lee wrote:

Fixes Copy-paste error reported by Coverity.

Signed-off-by: Vinson Lee v...@freedesktop.org
---
  src/gallium/drivers/r600/sb/sb_bc_dump.cpp | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/sb/sb_bc_dump.cpp b/src/gallium/drivers/r600/sb/sb_bc_dump.cpp
index 9d76465..9b1420d 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_dump.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_dump.cpp
@@ -174,7 +174,7 @@ void bc_dump::dump(cf_node &n) {
         }

         if (n.bc.cond)
-            s << " CND:" << n.bc.pop_count;
+            s << " CND:" << n.bc.cond;

         if (n.bc.pop_count)
             s << " POP:" << n.bc.pop_count;



Reviewed-by: Vadim Girlin vadimgir...@gmail.com



Re: [Mesa-dev] [PATCH] r600g/sb: Fix Android build

2013-07-03 Thread Vadim Girlin

On 06/28/2013 01:31 AM, Tom Stellard wrote:

From: Chih-Wei Huang cwhu...@android-x86.org

Add the sb CXX files to the Android Makefile and also stop using some
c++11 features.
---
  src/gallium/drivers/r600/Android.mk | 5 +++--
  src/gallium/drivers/r600/sb/sb_bc.h | 4 ++--
  src/gallium/drivers/r600/sb/sb_ra_init.cpp  | 2 +-
  src/gallium/drivers/r600/sb/sb_valtable.cpp | 4 ++--
  4 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/r600/Android.mk 
b/src/gallium/drivers/r600/Android.mk
index e5188bb..4d2f69f 100644
--- a/src/gallium/drivers/r600/Android.mk
+++ b/src/gallium/drivers/r600/Android.mk
@@ -28,11 +28,12 @@ include $(LOCAL_PATH)/Makefile.sources

  include $(CLEAR_VARS)

-LOCAL_SRC_FILES := $(C_SOURCES)
+LOCAL_SRC_FILES := $(C_SOURCES) $(CXX_SOURCES)

-LOCAL_C_INCLUDES :=
+LOCAL_C_INCLUDES := $(DRM_TOP)

  LOCAL_MODULE := libmesa_pipe_r600

+include external/stlport/libstlport.mk
  include $(GALLIUM_COMMON_MK)
  include $(BUILD_STATIC_LIBRARY)
diff --git a/src/gallium/drivers/r600/sb/sb_bc.h 
b/src/gallium/drivers/r600/sb/sb_bc.h
index 25255a7..73c250d 100644
--- a/src/gallium/drivers/r600/sb/sb_bc.h
+++ b/src/gallium/drivers/r600/sb/sb_bc.h
@@ -846,7 +846,7 @@ public:
unsigned ndw() { return bc.size(); }

void write_data(uint32_t* dst) {
-   memcpy(dst, bc.data(), 4 * bc.size());
+   std::copy(bc.begin(), bc.end(), dst);
}

void align(unsigned a) {
@@ -870,7 +870,7 @@ public:
}

unsigned get_pos() { return pos; }
-   uint32_t *data() { return bc.data(); }
+   uint32_t *data() { return bc.begin(); }


This results in a type conversion error for me with gcc 4.8.1 (Fedora 19).
We can probably simply use &bc[0] here.
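
For reference, a minimal sketch of getting a raw pointer to a vector's storage without the C++11 `data()` member, by taking the address of the first element. `raw_data` is a made-up helper name; the empty-vector guard is an addition for safety, since an empty vector has no element 0 to take the address of.

```cpp
#include <cassert>
#include <stdint.h>
#include <vector>

// Pre-C++11 way to reach the contiguous storage of a std::vector.
// Returns a null pointer for an empty vector instead of indexing into it.
uint32_t* raw_data(std::vector<uint32_t>& v) {
    return v.empty() ? 0 : &v[0];
}
```

Returning `v.begin()` instead only compiles when the iterator type happens to be a plain pointer, which is presumably why it fails with libstdc++.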


PS Sorry for the late reply, I'm sick now so I haven't checked email 
recently. Also I'm not sure when I'll be able to look into it and run 
any tests myself, so if this issue is fixed and there are no other 
regressions, I'm OK with this patch.




       bytecode & operator <<(uint32_t v) {
           if (pos == ndw()) {
diff --git a/src/gallium/drivers/r600/sb/sb_ra_init.cpp b/src/gallium/drivers/r600/sb/sb_ra_init.cpp
index bfe5ab9..24b24a0 100644
--- a/src/gallium/drivers/r600/sb/sb_ra_init.cpp
+++ b/src/gallium/drivers/r600/sb/sb_ra_init.cpp
@@ -680,7 +680,7 @@ void ra_split::split_vec(vvec &vv, vvec &v1, vvec &v2, bool allow_swz) {

           value *t;
           vvec::iterator F =
-                  allow_swz ? find(v2.begin(), v2.end(), o) : v2.end();
+                  allow_swz ? std::find(v2.begin(), v2.end(), o) : v2.end();

           if (F != v2.end()) {
               t = *(v1.begin() + (F - v2.begin()));
diff --git a/src/gallium/drivers/r600/sb/sb_valtable.cpp b/src/gallium/drivers/r600/sb/sb_valtable.cpp
index 5e6aca0..00aee66 100644
--- a/src/gallium/drivers/r600/sb/sb_valtable.cpp
+++ b/src/gallium/drivers/r600/sb/sb_valtable.cpp
@@ -207,7 +207,7 @@ void value_table::get_values(vvec& v) {

      for(vt_table::iterator I = hashtable.begin(), E = hashtable.end();
              I != E; ++I) {
-         T = copy(I->begin(), I->end(), T);
+         T = std::copy(I->begin(), I->end(), T);
      }
  }

@@ -368,7 +368,7 @@ inline bool sb_bitset::set_chk(unsigned id, bool bit) {
  }

  void sb_bitset::clear() {
-     memset(data.data(), 0, sizeof(basetype) * data.size());
+     std::fill(data.begin(), data.end(), 0);
  }

  void sb_bitset::resize(unsigned size) {





Re: [Mesa-dev] [PATCH] r600g/sb: improve math optimizations

2013-06-05 Thread Vadim Girlin

On 06/05/2013 12:01 AM, Grigori Goronzy wrote:

On 31.05.2013 14:37, Vadim Girlin wrote:

There are no regressions on evergreen with piglit tests or any
other apps that I tested, with and without llvm backend.
(Issue with Unigine Heaven that I mentioned on #dri-devel
yesterday was in fact caused by my own well-hidden bug, now it's fixed).

Improvements for real apps probably won't be very noticeable in many
cases,
but this still might help some apps, e.g. this improves shader2 test
of the fill benchmark in mesa demos.



I see noticeable FPS improvements (~7%) with one of my older pixel
shader effects here, a plasma-like thingie.

But this also breaks rendering in some other cases, e.g.
http://www.iquilezles.org/apps/shadertoy/index2.html?p=Heart
looks wrong. The colors are off.



Thanks for testing, I'll fix this issue and send updated patch.

Vadim


Best regards
Grigori




[Mesa-dev] [PATCH] r600g/sb: improve math optimizations v2

2013-06-05 Thread Vadim Girlin
This patch adds support for some math optimizations that are generally
considered unsafe, which is why they are currently disabled for compute
shaders.

GL requirements are less strict, so they are enabled for
GL shaders by default. In case of any issues with
applications that rely on higher precision than guaranteed by GL,
the 'sbsafemath' option in R600_DEBUG allows disabling them.

v2 - always set proper src vector size for transformed instructions
   - check for clamp modifier in the expr_handler::fold_assoc
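
To illustrate why reassociation of the kind flagged by AF_M_ASSOC counts as "unsafe", here is a tiny standalone sketch (not code from this patch) showing that two groupings of the same float sum can differ. The function names are invented for the example, and it assumes default IEEE float semantics (no -ffast-math).

```cpp
#include <cassert>

// Two mathematically equal groupings of the same three-term sum.
float sum_left(float a, float b, float c)  { return (a + b) + c; }
float sum_right(float a, float b, float c) { return a + (b + c); }
```

With a = 1e20f, b = -1e20f, c = 1.0f the left grouping cancels the large terms first and keeps the 1.0f, while the right grouping absorbs the 1.0f into -1e20f before the cancellation, so an optimizer that regroups operands can change the result.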

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---
 src/gallium/drivers/r600/r600_isa.h  |  19 +-
 src/gallium/drivers/r600/r600_pipe.c |   1 +
 src/gallium/drivers/r600/r600_pipe.h |   1 +
 src/gallium/drivers/r600/sb/sb_bc.h  |   1 +
 src/gallium/drivers/r600/sb/sb_bc_parser.cpp |   2 +
 src/gallium/drivers/r600/sb/sb_context.cpp   |   1 +
 src/gallium/drivers/r600/sb/sb_core.cpp  |   1 +
 src/gallium/drivers/r600/sb/sb_expr.cpp  | 448 ---
 src/gallium/drivers/r600/sb/sb_expr.h|   4 +
 src/gallium/drivers/r600/sb/sb_shader.cpp|   2 +-
 src/gallium/drivers/r600/sb/sb_shader.h  |   2 +
 11 files changed, 435 insertions(+), 47 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_isa.h b/src/gallium/drivers/r600/r600_isa.h
index 89d..c6bb869 100644
--- a/src/gallium/drivers/r600/r600_isa.h
+++ b/src/gallium/drivers/r600/r600_isa.h
@@ -84,7 +84,8 @@ enum alu_op_flags
      * includes MULADDs (considering the MUL part on src0 and src1 only) */
     AF_M_COMM = (1 << 23),
 
-    /* associative operation ((a op b) op c) == (a op (b op c))  */
+    /* associative operation ((a op b) op c) == (a op (b op c)),
+     * includes MULADDs (considering the MUL part on src0 and src1 only) */
     AF_M_ASSOC = (1 << 24),
 
     AF_PRED_PUSH = (1 << 25),
@@ -373,11 +374,11 @@ static const struct alu_op_info alu_op_table[] = {
     {"SAD_ACCUM_HI_UINT", 3, {   -1, 0x0F },{  0,     0,     AF_V,  AF_V},  AF_UINT_DST },
     {"MULADD_UINT24",     3, {   -1, 0x10 },{  0,     0,     AF_V,  AF_V},  AF_UINT_DST | AF_24 },
     {"LDS_IDX_OP",        3, {   -1, 0x11 },{  0,     0,     AF_V,  AF_V},  0 },
-    {"MULADD",            3, { 0x10, 0x14 },{  AF_VS, AF_VS, AF_VS, AF_VS},  AF_M_COMM },
-    {"MULADD_M2",         3, { 0x11, 0x15 },{  AF_VS, AF_VS, AF_VS, AF_VS},  AF_M_COMM },
-    {"MULADD_M4",         3, { 0x12, 0x16 },{  AF_VS, AF_VS, AF_VS, AF_VS},  AF_M_COMM },
-    {"MULADD_D2",         3, { 0x13, 0x17 },{  AF_VS, AF_VS, AF_VS, AF_VS},  AF_M_COMM },
-    {"MULADD_IEEE",       3, { 0x14, 0x18 },{  AF_VS, AF_VS, AF_VS, AF_VS},  AF_M_COMM | AF_IEEE },
+    {"MULADD",            3, { 0x10, 0x14 },{  AF_VS, AF_VS, AF_VS, AF_VS},  AF_M_COMM | AF_M_ASSOC },
+    {"MULADD_M2",         3, { 0x11, 0x15 },{  AF_VS, AF_VS, AF_VS, AF_VS},  AF_M_COMM | AF_M_ASSOC },
+    {"MULADD_M4",         3, { 0x12, 0x16 },{  AF_VS, AF_VS, AF_VS, AF_VS},  AF_M_COMM | AF_M_ASSOC },
+    {"MULADD_D2",         3, { 0x13, 0x17 },{  AF_VS, AF_VS, AF_VS, AF_VS},  AF_M_COMM | AF_M_ASSOC },
+    {"MULADD_IEEE",       3, { 0x14, 0x18 },{  AF_VS, AF_VS, AF_VS, AF_VS},  AF_M_COMM | AF_M_ASSOC | AF_IEEE },
     {"CNDE",              3, { 0x18, 0x19 },{  AF_VS, AF_VS, AF_VS, AF_VS},  AF_CMOV | AF_CC_E },
     {"CNDGT",             3, { 0x19, 0x1A },{  AF_VS, AF_VS, AF_VS, AF_VS},  AF_CMOV | AF_CC_GT },
     {"CNDGE",             3, { 0x1A, 0x1B },{  AF_VS, AF_VS, AF_VS, AF_VS},  AF_CMOV | AF_CC_GE },
@@ -397,9 +398,9 @@ static const struct alu_op_info alu_op_table[] = {
     {"MUL_LIT_M2",        3, { 0x0D,   -1 },{  AF_VS, AF_VS, 0,     0},  0 },
     {"MUL_LIT_M4",        3, { 0x0E,   -1 },{  AF_VS, AF_VS, 0,     0},  0 },
     {"MUL_LIT_D2",        3, { 0x0F,   -1 },{  AF_VS, AF_VS, 0,     0},  0 },
-    {"MULADD_IEEE_M2",    3, { 0x15,   -1 },{  AF_VS, AF_VS, 0,     0},  AF_IEEE },
-    {"MULADD_IEEE_M4",    3, { 0x16,   -1 },{  AF_VS, AF_VS, 0,     0},  AF_IEEE },
-    {"MULADD_IEEE_D2",    3, { 0x17,   -1 },{  AF_VS, AF_VS, 0,     0},  AF_IEEE },
+    {"MULADD_IEEE_M2",    3, { 0x15,   -1 },{  AF_VS, AF_VS, 0,     0},  AF_M_COMM | AF_M_ASSOC | AF_IEEE },
+    {"MULADD_IEEE_M4",    3, { 0x16,   -1 },{  AF_VS, AF_VS, 0,     0},  AF_M_COMM | AF_M_ASSOC | AF_IEEE },
+    {"MULADD_IEEE_D2",    3, { 0x17,   -1 },{  AF_VS, AF_VS, 0,     0},  AF_M_COMM | AF_M_ASSOC | AF_IEEE },
 
{LDS_ADD,   2, {   -1, 0x0011 },{  0, 
0,  AF_V

[Mesa-dev] [PATCH] r600g/sb: improve math optimizations

2013-05-31 Thread Vadim Girlin
This patch adds support for some math optimizations that are generally
considered unsafe, which is why they are currently disabled for compute
shaders.

GL requirements are less strict, so they are enabled for
GL shaders by default. In case of any issues with
applications that rely on higher precision than guaranteed by GL,
the 'sbsafemath' option in R600_DEBUG allows disabling them.

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---

There are no regressions on evergreen with piglit tests or any
other apps that I tested, with and without llvm backend.
(Issue with Unigine Heaven that I mentioned on #dri-devel
yesterday was in fact caused by my own well-hidden bug, now it's fixed).

Improvements for real apps probably won't be very noticeable in many cases,
but this still might help some apps, e.g. this improves shader2 test
of the fill benchmark in mesa demos.

 src/gallium/drivers/r600/r600_isa.h  |  19 +-
 src/gallium/drivers/r600/r600_pipe.c |   1 +
 src/gallium/drivers/r600/r600_pipe.h |   1 +
 src/gallium/drivers/r600/sb/sb_bc.h  |   1 +
 src/gallium/drivers/r600/sb/sb_bc_parser.cpp |   2 +
 src/gallium/drivers/r600/sb/sb_context.cpp   |   1 +
 src/gallium/drivers/r600/sb/sb_core.cpp  |   1 +
 src/gallium/drivers/r600/sb/sb_expr.cpp  | 444 ---
 src/gallium/drivers/r600/sb/sb_expr.h|   4 +
 src/gallium/drivers/r600/sb/sb_shader.cpp|   2 +-
 src/gallium/drivers/r600/sb/sb_shader.h  |   2 +
 11 files changed, 431 insertions(+), 47 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_isa.h 
b/src/gallium/drivers/r600/r600_isa.h
index 89d..c6bb869 100644
--- a/src/gallium/drivers/r600/r600_isa.h
+++ b/src/gallium/drivers/r600/r600_isa.h
@@ -84,7 +84,8 @@ enum alu_op_flags
 * includes MULADDs (considering the MUL part on src0 and src1 only) */
AF_M_COMM = (1 << 23),
 
-   /* associative operation ((a op b) op c) == (a op (b op c))  */
+   /* associative operation ((a op b) op c) == (a op (b op c)),
+* includes MULADDs (considering the MUL part on src0 and src1 only) */
AF_M_ASSOC = (1 << 24),
 
AF_PRED_PUSH = (1 << 25),
@@ -373,11 +374,11 @@ static const struct alu_op_info alu_op_table[] = {
{SAD_ACCUM_HI_UINT, 3, {   -1, 0x0F },{  0, 
0,  AF_V,  AF_V},  AF_UINT_DST },
{MULADD_UINT24, 3, {   -1, 0x10 },{  0, 
0,  AF_V,  AF_V},  AF_UINT_DST | AF_24 },
{LDS_IDX_OP,3, {   -1, 0x11 },{  0, 
0,  AF_V,  AF_V},  0 },
-   {MULADD,3, { 0x10, 0x14 },{  AF_VS, 
AF_VS, AF_VS, AF_VS},  AF_M_COMM },
-   {MULADD_M2, 3, { 0x11, 0x15 },{  AF_VS, 
AF_VS, AF_VS, AF_VS},  AF_M_COMM },
-   {MULADD_M4, 3, { 0x12, 0x16 },{  AF_VS, 
AF_VS, AF_VS, AF_VS},  AF_M_COMM },
-   {MULADD_D2, 3, { 0x13, 0x17 },{  AF_VS, 
AF_VS, AF_VS, AF_VS},  AF_M_COMM },
-   {MULADD_IEEE,   3, { 0x14, 0x18 },{  AF_VS, 
AF_VS, AF_VS, AF_VS},  AF_M_COMM | AF_IEEE },
+   {MULADD,3, { 0x10, 0x14 },{  AF_VS, 
AF_VS, AF_VS, AF_VS},  AF_M_COMM | AF_M_ASSOC },
+   {MULADD_M2, 3, { 0x11, 0x15 },{  AF_VS, 
AF_VS, AF_VS, AF_VS},  AF_M_COMM | AF_M_ASSOC },
+   {MULADD_M4, 3, { 0x12, 0x16 },{  AF_VS, 
AF_VS, AF_VS, AF_VS},  AF_M_COMM | AF_M_ASSOC },
+   {MULADD_D2, 3, { 0x13, 0x17 },{  AF_VS, 
AF_VS, AF_VS, AF_VS},  AF_M_COMM | AF_M_ASSOC },
+   {MULADD_IEEE,   3, { 0x14, 0x18 },{  AF_VS, 
AF_VS, AF_VS, AF_VS},  AF_M_COMM | AF_M_ASSOC | AF_IEEE },
{CNDE,  3, { 0x18, 0x19 },{  AF_VS, 
AF_VS, AF_VS, AF_VS},  AF_CMOV | AF_CC_E },
{CNDGT, 3, { 0x19, 0x1A },{  AF_VS, 
AF_VS, AF_VS, AF_VS},  AF_CMOV | AF_CC_GT },
{CNDGE, 3, { 0x1A, 0x1B },{  AF_VS, 
AF_VS, AF_VS, AF_VS},  AF_CMOV | AF_CC_GE },
@@ -397,9 +398,9 @@ static const struct alu_op_info alu_op_table[] = {
{MUL_LIT_M2,3, { 0x0D,   -1 },{  AF_VS, 
AF_VS, 0, 0},  0 },
{MUL_LIT_M4,3, { 0x0E,   -1 },{  AF_VS, 
AF_VS, 0, 0},  0 },
{MUL_LIT_D2,3, { 0x0F,   -1 },{  AF_VS, 
AF_VS, 0, 0},  0 },
-   {MULADD_IEEE_M2,3, { 0x15,   -1 },{  AF_VS, 
AF_VS, 0, 0},  AF_IEEE },
-   {MULADD_IEEE_M4,3, { 0x16,   -1 },{  AF_VS, 
AF_VS, 0, 0},  AF_IEEE },
-   {MULADD_IEEE_D2,3, { 0x17,   -1 },{  AF_VS, 
AF_VS, 0, 0},  AF_IEEE },
+   {MULADD_IEEE_M2,3, { 0x15,   -1 },{  AF_VS, 
AF_VS, 0, 0},  AF_M_COMM | AF_M_ASSOC | AF_IEEE },
+   {MULADD_IEEE_M4,3

Re: [Mesa-dev] [PATCH 1/2] r600g: add ISA info for RAT instructions

2013-05-29 Thread Vadim Girlin

On 05/30/2013 05:48 AM, Tom Stellard wrote:

On Mon, May 27, 2013 at 02:15:21AM +0400, Vadim Girlin wrote:

This will help to improve dumps of the compute shaders,
also it will be required for complete handling of RAT instructions in sb.

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---
  src/gallium/drivers/r600/r600_isa.c |  19 ++
  src/gallium/drivers/r600/r600_isa.h | 132 
  2 files changed, 151 insertions(+)

diff --git a/src/gallium/drivers/r600/r600_isa.c 
b/src/gallium/drivers/r600/r600_isa.c
index 4c6ccac..c99352f 100644
--- a/src/gallium/drivers/r600/r600_isa.c
+++ b/src/gallium/drivers/r600/r600_isa.c
@@ -81,6 +81,23 @@ int r600_isa_init(struct r600_context *ctx, struct r600_isa 
*isa) {
isa->cf_map[opc] = i + 1;
}

+   /* RAT instructions are not available on pre-evergreen */
+   if (ctx->chip_class >= EVERGREEN) {
+   unsigned column = isa->hw_class - ISA_CC_EVERGREEN;
+
+   isa->rat_map = calloc(64, sizeof(unsigned));
+   if (!isa->rat_map)
+   return -1;
+
+   for (i = 0; i < TABLE_SIZE(rat_op_table); ++i) {
+   const struct rat_op_info *op = &rat_op_table[i];
+   unsigned opc = op->opcode[column];
+   if (opc == -1)
+   continue;
+   isa->rat_map[opc] = i + 1;
+   }
+   }
+
return 0;
  }

@@ -97,6 +114,8 @@ int r600_isa_destroy(struct r600_isa *isa) {
free(isa->fetch_map);
if (isa->cf_map)
free(isa->cf_map);
+   if (isa->rat_map)
+   free(isa->rat_map);

free(isa);
return 0;
diff --git a/src/gallium/drivers/r600/r600_isa.h 
b/src/gallium/drivers/r600/r600_isa.h
index 89d..4055a04 100644
--- a/src/gallium/drivers/r600/r600_isa.h
+++ b/src/gallium/drivers/r600/r600_isa.h
@@ -147,6 +147,12 @@ enum cf_op_flags
CF_LOOP_START = (1<<14)
  };

+enum rat_op_flags
+{
+   RF_RTN  = (1 << 0),
+
+};
+
  /* ALU instruction info */
  struct alu_op_info
  {
@@ -182,6 +188,15 @@ struct cf_op_info
int flags;
  };

+/* CF RAT instruction info */
+struct rat_op_info
+{
+   const char * name;
+   /* 0 - EG, 1 - CM */
+   int opcode[2];
+   int flags;
+};
+
  static const struct alu_op_info alu_op_table[] = {
{ADD,   2, { 0x00, 0x00 },{  AF_VS, 
AF_VS, AF_VS, AF_VS},  AF_M_COMM | AF_M_ASSOC },
{MUL,   2, { 0x01, 0x01 },{  AF_VS, 
AF_VS, AF_VS, AF_VS},  AF_M_COMM | AF_M_ASSOC },
@@ -665,6 +680,97 @@ static const struct cf_op_info cf_op_table[] = {
{CF_NATIVE, { 0x00, 0x00, 0x00, 0x00 },  
0  }
  };

+static const struct rat_op_info rat_op_table[] = {
+   {"NOP", {0x00, 0x00}, 0},
+   {"STORE_TYPED", {0x01, 0x01}, 0},
+   {"STORE_RAW",   {0x02,   -1}, 0},


The LLVM backend uses the STORE_RAW instruction on Cayman
even though the docs don't list it as a legal instruction.  Will this
cause the code to crash?


Yes, I think it will cause an assert (or a crash if asserts are disabled),
though only if sbcl or sbdisasm are enabled in R600_DEBUG.



I have it on my TODO list to use the STORE_* instructions on Cayman, but
I'm not sure when I'll be able to get to it.  Maybe for now you can
enable this opcode on Cayman too.


Of course, if it works then possibly the doc is not completely correct. 
On the other hand, I wonder if the doc is right and this might explain 
some issues people report with compute on cayman.


Anyway I'll enable this opcode on Cayman for now to avoid the problems.
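
The failure mode discussed above can be illustrated with a small standalone sketch. It mirrors the shape of the reverse map built in r600_isa_init() (hardware opcode to table index plus one, with 0 meaning "no entry"), but every name in it (rat_op_info_sketch, rat_ops_sketch, build_rat_map, find_rat_op) is hypothetical and not part of Mesa. Returning a null pointer for an unmapped opcode is an assumption made for illustration; per the reply above, the real code would hit an assert in that case.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, trimmed-down stand-in for rat_op_info from the patch.
struct rat_op_info_sketch {
    const char *name;
    int opcode[2]; // 0 - EG, 1 - CM; -1 means "not available on this chip"
};

// Three entries copied from the patch; STORE_RAW has no Cayman opcode there.
static const rat_op_info_sketch rat_ops_sketch[] = {
    {"NOP",         {0x00, 0x00}},
    {"STORE_TYPED", {0x01, 0x01}},
    {"STORE_RAW",   {0x02,   -1}},
};

static const std::size_t rat_ops_sketch_count =
    sizeof(rat_ops_sketch) / sizeof(rat_ops_sketch[0]);

// Build the map: hardware opcode -> table index + 1 (0 = unknown opcode).
std::vector<unsigned> build_rat_map(int column) {
    std::vector<unsigned> map(64, 0u);
    for (std::size_t i = 0; i < rat_ops_sketch_count; ++i) {
        int opc = rat_ops_sketch[i].opcode[column];
        if (opc == -1)
            continue; // instruction does not exist on this chip class
        map[static_cast<std::size_t>(opc)] = static_cast<unsigned>(i) + 1;
    }
    return map;
}

// Look up a decoded hardware opcode; nullptr means the table has no entry.
const rat_op_info_sketch *find_rat_op(const std::vector<unsigned> &map,
                                      unsigned hw_opcode) {
    if (hw_opcode >= map.size() || map[hw_opcode] == 0)
        return nullptr;
    return &rat_ops_sketch[map[hw_opcode] - 1];
}
```

With column 1 (Cayman), opcode 0x02 has no entry in this sketch, which is exactly the case that occurs when the LLVM backend emits STORE_RAW there; enabling the opcode in the Cayman column makes the lookup succeed.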



-Tom


+   {"STORE_RAW_FDENORM",   {0x03,   -1}, 0},
+   {"CMPXCHG_INT", {0x04, 0x00}, 0},


There is a typo in the Cayman opcode: it should be 0x04 instead of 0x00.
I'll fix it too.


Vadim



+   {"CMPXCHG_FLT", {0x05,   -1}, 0},
+   {"CMPXCHG_FDENORM", {0x06,   -1}, 0},
+   {"ADD", {0x07, 0x07}, 0},
+   {"SUB", {0x08, 0x08}, 0},
+   {"RSUB", {0x09, 0x09}, 0},
+   {"MIN_INT", {0x0A, 0x0A}, 0},
+   {"MIN_UINT", {0x0B, 0x0B}, 0},
+   {"MAX_INT", {0x0C, 0x0C}, 0},
+   {"MAX_UINT", {0x0D, 0x0D}, 0},
+   {"AND", {0x0E, 0x0E}, 0},
+   {"OR",  {0x0F, 0x0F}, 0},
+   {"XOR", {0x10, 0x10}, 0},
+   {"MSKOR",   {0x11,   -1}, 0},
+   {"INC_UINT", {0x12, 0x12}, 0},
+   {"DEC_UINT", {0x13, 0x13}, 0},
+
+   {"STORE_DWORD", {  -1, 0x14}, 0

[Mesa-dev] [PATCH 1/2] r600g: add ISA info for RAT instructions

2013-05-26 Thread Vadim Girlin
This will help to improve dumps of the compute shaders,
also it will be required for complete handling of RAT instructions in sb.

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---
 src/gallium/drivers/r600/r600_isa.c |  19 ++
 src/gallium/drivers/r600/r600_isa.h | 132 
 2 files changed, 151 insertions(+)

diff --git a/src/gallium/drivers/r600/r600_isa.c 
b/src/gallium/drivers/r600/r600_isa.c
index 4c6ccac..c99352f 100644
--- a/src/gallium/drivers/r600/r600_isa.c
+++ b/src/gallium/drivers/r600/r600_isa.c
@@ -81,6 +81,23 @@ int r600_isa_init(struct r600_context *ctx, struct r600_isa 
*isa) {
isa->cf_map[opc] = i + 1;
}
 
+   /* RAT instructions are not available on pre-evergreen */
+   if (ctx->chip_class >= EVERGREEN) {
+   unsigned column = isa->hw_class - ISA_CC_EVERGREEN;
+
+   isa->rat_map = calloc(64, sizeof(unsigned));
+   if (!isa->rat_map)
+   return -1;
+
+   for (i = 0; i < TABLE_SIZE(rat_op_table); ++i) {
+   const struct rat_op_info *op = &rat_op_table[i];
+   unsigned opc = op->opcode[column];
+   if (opc == -1)
+   continue;
+   isa->rat_map[opc] = i + 1;
+   }
+   }
+
return 0;
 }
 
@@ -97,6 +114,8 @@ int r600_isa_destroy(struct r600_isa *isa) {
free(isa->fetch_map);
if (isa->cf_map)
free(isa->cf_map);
+   if (isa->rat_map)
+   free(isa->rat_map);
 
free(isa);
return 0;
diff --git a/src/gallium/drivers/r600/r600_isa.h 
b/src/gallium/drivers/r600/r600_isa.h
index 89d..4055a04 100644
--- a/src/gallium/drivers/r600/r600_isa.h
+++ b/src/gallium/drivers/r600/r600_isa.h
@@ -147,6 +147,12 @@ enum cf_op_flags
CF_LOOP_START = (1<<14)
 };
 
+enum rat_op_flags
+{
+   RF_RTN  = (1 << 0),
+
+};
+
 /* ALU instruction info */
 struct alu_op_info
 {
@@ -182,6 +188,15 @@ struct cf_op_info
int flags;
 };
 
+/* CF RAT instruction info */
+struct rat_op_info
+{
+   const char * name;
+   /* 0 - EG, 1 - CM */
+   int opcode[2];
+   int flags;
+};
+
 static const struct alu_op_info alu_op_table[] = {
{ADD,   2, { 0x00, 0x00 },{  AF_VS, 
AF_VS, AF_VS, AF_VS},  AF_M_COMM | AF_M_ASSOC },
{MUL,   2, { 0x01, 0x01 },{  AF_VS, 
AF_VS, AF_VS, AF_VS},  AF_M_COMM | AF_M_ASSOC },
@@ -665,6 +680,97 @@ static const struct cf_op_info cf_op_table[] = {
{CF_NATIVE, { 0x00, 0x00, 0x00, 0x00 },  
0  }
 };
 
+static const struct rat_op_info rat_op_table[] = {
+   {"NOP", {0x00, 0x00}, 0},
+   {"STORE_TYPED", {0x01, 0x01}, 0},
+   {"STORE_RAW",   {0x02,   -1}, 0},
+   {"STORE_RAW_FDENORM",   {0x03,   -1}, 0},
+   {"CMPXCHG_INT", {0x04, 0x00}, 0},
+   {"CMPXCHG_FLT", {0x05,   -1}, 0},
+   {"CMPXCHG_FDENORM", {0x06,   -1}, 0},
+   {"ADD", {0x07, 0x07}, 0},
+   {"SUB", {0x08, 0x08}, 0},
+   {"RSUB", {0x09, 0x09}, 0},
+   {"MIN_INT", {0x0A, 0x0A}, 0},
+   {"MIN_UINT", {0x0B, 0x0B}, 0},
+   {"MAX_INT", {0x0C, 0x0C}, 0},
+   {"MAX_UINT", {0x0D, 0x0D}, 0},
+   {"AND", {0x0E, 0x0E}, 0},
+   {"OR",  {0x0F, 0x0F}, 0},
+   {"XOR", {0x10, 0x10}, 0},
+   {"MSKOR",   {0x11,   -1}, 0},
+   {"INC_UINT", {0x12, 0x12}, 0},
+   {"DEC_UINT", {0x13, 0x13}, 0},
+
+   {"STORE_DWORD", {  -1, 0x14}, 0},
+   {"STORE_SHORT", {  -1, 0x15}, 0},
+   {"STORE_BYTE",  {  -1, 0x16}, 0},
+
+   {"NOP_RTN_INTERNAL", {0x20, 0x20}, 0},
+
+   {"XCHG_RTN", {0x22, 0x22}, RF_RTN },
+   {"XCHG_FDENORM_RTN", {0x23,   -1}, RF_RTN },
+   {"CMPXCHG_INT_RTN", {0x24, 0x24}, RF_RTN },
+   {"CMPXCHG_FLT_RTN", {0x25, 0x25}, RF_RTN },
+   {"CMPXCHG_FDENORM_RTN", {0x26, 0x26}, RF_RTN },
+   {"ADD_RTN", {0x27, 0x27}, RF_RTN },
+   {"SUB_RTN", {0x28, 0x28}, RF_RTN },
+   {"RSUB_RTN", {0x29, 0x29}, RF_RTN },
+   {"MIN_INT_RTN", {0x2A, 0x2A}, RF_RTN },
+   {"MIN_UINT_RTN", {0x2B, 0x2B}, RF_RTN },
+   {"MAX_INT_RTN", {0x2C, 0x2C}, RF_RTN },
+   {"MAX_UINT_RTN", {0x2D, 0x2D}, RF_RTN

[Mesa-dev] [PATCH 2/2] r600g/sb: use ISA info for RAT instructions

2013-05-26 Thread Vadim Girlin
Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---
 src/gallium/drivers/r600/sb/sb_bc.h   | 12 +++-
 src/gallium/drivers/r600/sb/sb_bc_builder.cpp |  2 +-
 src/gallium/drivers/r600/sb/sb_bc_decoder.cpp |  5 -
 src/gallium/drivers/r600/sb/sb_bc_dump.cpp| 13 +++--
 4 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_bc.h 
b/src/gallium/drivers/r600/sb/sb_bc.h
index 25255a7..9f546be 100644
--- a/src/gallium/drivers/r600/sb/sb_bc.h
+++ b/src/gallium/drivers/r600/sb/sb_bc.h
@@ -470,10 +470,16 @@ struct bc_cf {
unsigned comp_mask:4;
 
unsigned rat_id:4;
-   unsigned rat_inst:6;
unsigned rat_index_mode:2;
 
+   const rat_op_info *rat_op_ptr;
+   unsigned rat_op;
+
void set_op(unsigned op) { this->op = op; op_ptr = r600_isa_cf(op); }
+   void set_rat_op(unsigned op) {
+   this->rat_op = op;
+   rat_op_ptr = r600_isa_rat(op);
+   }
 
bool is_alu_extended() {
assert(op_ptr->flags & CF_ALU);
@@ -652,6 +658,10 @@ public:
return r600_isa_cf_opcode(isa->hw_class, op);
}
 
+   unsigned rat_opcode(unsigned op) {
+   return r600_isa_rat_opcode(isa->hw_class, op);
+   }
+
unsigned alu_opcode(unsigned op) {
return r600_isa_alu_opcode(isa->hw_class, op);
}
diff --git a/src/gallium/drivers/r600/sb/sb_bc_builder.cpp 
b/src/gallium/drivers/r600/sb/sb_bc_builder.cpp
index 55e2a85..4322f45 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_builder.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_builder.cpp
@@ -267,7 +267,7 @@ int bc_builder::build_cf_exp(cf_node* n) {
.INDEX_GPR(bc.index_gpr)
.RAT_ID(bc.rat_id)
.RAT_INDEX_MODE(bc.rat_index_mode)
-   .RAT_INST(bc.rat_inst)
+   .RAT_INST(ctx.rat_opcode(bc.rat_op))
.RW_GPR(bc.rw_gpr)
.RW_REL(bc.rw_rel)
.TYPE(bc.type);
diff --git a/src/gallium/drivers/r600/sb/sb_bc_decoder.cpp 
b/src/gallium/drivers/r600/sb/sb_bc_decoder.cpp
index 5e233f9..0f3c57a 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_decoder.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_decoder.cpp
@@ -242,13 +242,16 @@ int bc_decoder::decode_cf_mem(unsigned & i, bc_cf& bc) {
} else {
assert(ctx.is_egcm());
CF_ALLOC_EXPORT_WORD0_RAT_EGCM w0(dw0);
+   unsigned rat_opcode = w0.get_RAT_INST();
+
+   bc.set_rat_op(r600_isa_rat_by_opcode(ctx.isa, rat_opcode));
+
bc.elem_size = w0.get_ELEM_SIZE();
bc.index_gpr = w0.get_INDEX_GPR();
bc.rw_gpr = w0.get_RW_GPR();
bc.rw_rel = w0.get_RW_REL();
bc.type = w0.get_TYPE();
bc.rat_id = w0.get_RAT_ID();
-   bc.rat_inst = w0.get_RAT_INST();
bc.rat_index_mode = w0.get_RAT_INDEX_MODE();
}
 
diff --git a/src/gallium/drivers/r600/sb/sb_bc_dump.cpp 
b/src/gallium/drivers/r600/sb/sb_bc_dump.cpp
index 9d76465..152a33f 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_dump.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_dump.cpp
@@ -140,15 +140,24 @@ void bc_dump::dump(cf_node n) {
} else if (n.bc.op_ptr-flags  (CF_STRM | CF_RAT)) {
static const char *exp_type[] = {WRITE, WRITE_IND, 
WRITE_ACK,
WRITE_IND_ACK};
+
+   bool rat = (n.bc.op_ptr-flags  CF_RAT) != 0;
+
fill_to(s, 18);
s exp_type[n.bc.type]   ;
+
+   if (rat) {
+   s  n.bc.rat_op_ptr-name   ;
+   }
+
s.print_wl(n.bc.array_base, 5);
s   R  n.bc.rw_gpr  .;
for (int k = 0; k  4; ++k)
s  ((n.bc.comp_mask  (1  k)) ? chans[k] : '_');
 
-   if ((n.bc.op_ptr-flags  CF_RAT)  (n.bc.type  1)) {
-   s  , @R  n.bc.index_gpr  .xyz;
+   if (rat) {
+   if (n.bc.type  1)
+   s  , @R  n.bc.index_gpr  .xyz;
}
 
sES:  n.bc.elem_size;
-- 
1.8.2.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] SIGFPE in libdrm_radeon on evergreen

2013-05-20 Thread Vadim Girlin

On 05/20/2013 11:27 AM, Dragomir Ivanov wrote:

0x769058d7 in eg_surface_init_2d (surf_man=0x633ea0, surf=0x88d848,
level=0x88dea8, bpe=1, tile_split=0, offset=65536, start_level=0)

It looks like division by 0. tile_split=0 from the call site.


Yes, I'm just not sure why tile_split is 0 here or what the best way to
fix it is. Possibly this is in fact a consequence of some problem in
r600g, not in libdrm, though libdrm should probably handle it more
gracefully anyway.
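
As a rough illustration of what "handling it more gracefully" could look like, here is a minimal sketch of a guarded version of the failing division (slice_pt = tileb / tile_split at radeon_surface.c:651 in the backtrace). The helper name compute_slices_per_tile and its signature are hypothetical and do not exist in libdrm; it only shows the check, not the actual fix, and it says nothing about why tile_split ends up as 0.

```cpp
// Hypothetical helper, not part of libdrm: performs the division from
// radeon_surface.c:651 only when the divisor is valid.
// Returns false (and leaves *slice_pt untouched) when tile_split is 0,
// so the caller can report an error instead of raising SIGFPE.
bool compute_slices_per_tile(unsigned tileb, unsigned tile_split,
                             unsigned *slice_pt) {
    if (tile_split == 0)
        return false;
    *slice_pt = tileb / tile_split;
    return true;
}
```

A caller would propagate the failure up to radeon_surface_init() so that surface creation fails cleanly rather than crashing the application.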


Vadim




On Mon, May 20, 2013 at 4:11 AM, Vadim Girlin vadimgir...@gmail.com wrote:


Reduced test app attached and below is gdb backtrace. I suspect something
is not initialized properly but I'm not very familiar with this code.

Vadim


Program received signal SIGFPE, Arithmetic exception.
0x769058d7 in eg_surface_init_2d (surf_man=0x633ea0,
surf=0x88d848, level=0x88dea8, bpe=1, tile_split=0, offset=65536,
start_level=0)
  at radeon_surface.c:651
651 slice_pt = tileb / tile_split;

#0  0x769058d7 in eg_surface_init_2d (surf_man=0x633ea0,
surf=0x88d848, level=0x88dea8, bpe=1, tile_split=0, offset=65536,
start_level=0)
  at radeon_surface.c:651
#1  0x76905eea in eg_surface_init_2d_miptrees (surf_man=0x633ea0,
surf=0x88d848) at radeon_surface.c:807
#2  0x76906062 in eg_surface_init (surf_man=0x633ea0,
surf=0x88d848) at radeon_surface.c:863
#3  0x76907fe6 in radeon_surface_init (surf_man=0x633ea0,
surf=0x88d848) at radeon_surface.c:1901
#4  0x7713260b in radeon_drm_winsys_surface_init (rws=0x6339a0,
surf=0x88d848) at radeon_drm_winsys.c:477
#5  0x770a3e1c in r600_setup_surface (screen=0x6340d0,
rtex=0x88d760, pitch_in_bytes_override=0) at r600_texture.c:203
#6  0x770a4774 in r600_texture_create_object (screen=0x6340d0,
base=0x7fffd6d0, pitch_in_bytes_override=0, buf=0x0,
surface=0x7fffc8e0)
  at r600_texture.c:432
#7  0x770a5268 in r600_texture_create (screen=0x6340d0,
templ=0x7fffd6d0) at r600_texture.c:607
#8  0x7708a5bd in r600_resource_create (screen=0x6340d0,
templ=0x7fffd6d0) at r600_resource.c:38
#9  0x77125579 in dri2_drawable_process_buffers
(drawable=0x88af80, buffers=0x88aea0, buffer_count=1, atts=0x88b628,
att_count=2) at dri2.c:283
#10 0x7712590a in dri2_allocate_textures (drawable=0x88af80,
statts=0x88b628, statts_count=2) at dri2.c:404
#11 0x77123e6a in dri_st_framebuffer_validate (stfbi=0x88af80,
statts=0x88b628, count=2, out=0x7fffd840) at dri_drawable.c:81
#12 0x76e461c1 in st_framebuffer_validate (stfb=0x88b1e0,
st=0x883870) at ../../src/mesa/state_tracker/st_manager.c:193
#13 0x76e472a8 in st_api_make_current (stapi=0x7761b9e0
<st_gl_api>, stctxi=0x883870, stdrawi=0x88af80, streadi=0x88af80)
  at ../../src/mesa/state_tracker/st_manager.c:721
#14 0x77122ce8 in dri_make_current (cPriv=0x7fdb70,
driDrawPriv=0x88af40, driReadPriv=0x88af40) at dri_context.c:255
#15 0x76c6ba1f in driBindContext (pcp=0x7fdb70, pdp=0x88af40,
prp=0x88af40) at ../../../../src/mesa/drivers/dri/common/dri_util.c:382
#16 0x77dc57e3 in dri2_bind_context (context=0x7fd9d0,
old=0x616650, draw=67108873, read=67108873) at dri2_glx.c:172
#17 0x77d8c253 in MakeContextCurrent (dpy=0x602040, draw=67108873,
read=67108873, gc_user=0x7fd9d0) at glxcurrent.c:269
#18 0x00384e82713c in fgOpenWindow () from /lib64/libglut.so.3
#19 0x00384e825afa in fgCreateWindow () from /lib64/libglut.so.3
#20 0x00384e825b95 in fgCreateMenu () from /lib64/libglut.so.3
#21 0x00384e823cd3 in glutCreateMenu () from /lib64/libglut.so.3
#22 0x00400816 in main (argc=1, argv=0x7fffdf18) at test.c:17










[Mesa-dev] [PATCH 1/2] r600g/sb: separate bytecode decoding and parsing

2013-05-11 Thread Vadim Girlin
Parsing and IR construction are required for optimization only;
they are unnecessary if we only need to print the shader dump.
This should make the new disassembler more tolerant of any new
features in the bytecode.

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---
 src/gallium/drivers/r600/sb/sb_bc.h   |  27 ++--
 src/gallium/drivers/r600/sb/sb_bc_builder.cpp |   4 -
 src/gallium/drivers/r600/sb/sb_bc_parser.cpp  | 224 +-
 src/gallium/drivers/r600/sb/sb_core.cpp   |  45 --
 src/gallium/drivers/r600/sb/sb_shader.cpp |   4 +-
 src/gallium/drivers/r600/sb/sb_shader.h   |   3 +-
 6 files changed, 163 insertions(+), 144 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_bc.h 
b/src/gallium/drivers/r600/sb/sb_bc.h
index 9c6ed46..9f65098 100644
--- a/src/gallium/drivers/r600/sb/sb_bc.h
+++ b/src/gallium/drivers/r600/sb/sb_bc.h
@@ -674,40 +674,39 @@ class bc_parser {
typedef std::stack<region_node*> region_stack;
region_stack loop_stack;
 
-   int enable_dump;
-   int optimize;
-
 public:
 
-   bc_parser(sb_context &sctx, r600_bytecode *bc, r600_shader* pshader,
- int dump_source, int optimize) :
+   bc_parser(sb_context &sctx, r600_bytecode *bc, r600_shader* pshader) :
ctx(sctx), dec(), bc(bc), pshader(pshader),
dw(), bc_ndw(), max_cf(),
sh(), error(), slots(), cgroup(),
-   cf_map(), loop_stack(), enable_dump(dump_source),
-   optimize(optimize) { }
+   cf_map(), loop_stack() { }
 
-   int parse();
+   int decode();
+   int prepare();
 
shader* get_shader() { assert(!error); return sh; }
 
 private:
 
-   int parse_shader();
+   int decode_shader();
 
int parse_decls();
 
-   int parse_cf(unsigned &i, bool &eop);
+   int decode_cf(unsigned &i, bool &eop);
 
-   int parse_alu_clause(cf_node *cf);
-   int parse_alu_group(cf_node* cf, unsigned i, unsigned gcnt);
+   int decode_alu_clause(cf_node *cf);
+   int decode_alu_group(cf_node* cf, unsigned i, unsigned gcnt);
 
-   int parse_fetch_clause(cf_node *cf);
+   int decode_fetch_clause(cf_node *cf);
 
int prepare_ir();
+   int prepare_alu_clause(cf_node *cf);
+   int prepare_alu_group(cf_node* cf, alu_group_node *g);
+   int prepare_fetch_clause(cf_node *cf);
+
int prepare_loop(cf_node *c);
int prepare_if(cf_node *c);
-   int prepare_alu_clause(cf_node *c);
 
 };
 
diff --git a/src/gallium/drivers/r600/sb/sb_bc_builder.cpp 
b/src/gallium/drivers/r600/sb/sb_bc_builder.cpp
index b0c2e41..f40e469 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_builder.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_builder.cpp
@@ -94,10 +94,6 @@ int bc_builder::build() {
cf_pos = bb.get_pos();
}
 
-   if (sh.enable_dump) {
-   bc_dump(sh, cerr, bb).run();
-   }
-
return 0;
 }
 
diff --git a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp 
b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
index 8329287..9f3ecc5 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
@@ -47,7 +47,7 @@ namespace r600_sb {
 
 using std::cerr;
 
-int bc_parser::parse() {
+int bc_parser::decode() {
 
dw = bc->bytecode;
bc_ndw = bc->ndw;
@@ -71,47 +71,27 @@ int bc_parser::parse() {
t = TARGET_FETCH;
}
 
-   sh = new shader(ctx, t, bc->debug_id, enable_dump);
-   int r = parse_shader();
+   sh = new shader(ctx, t, bc->debug_id);
+   int r = decode_shader();
 
delete dec;
 
-   if (r)
-   return r;
-
sh->ngpr = bc->ngpr;
sh->nstack = bc->nstack;
 
-   if (sh->target != TARGET_FETCH) {
-   sh->src_stats.ndw = bc->ndw;
-   sh->collect_stats(false);
-   }
-
-   if (enable_dump) {
-   bc_dump(*sh, cerr, bc->bytecode, bc_ndw).run();
-   }
-
-   if (!optimize)
-   return 0;
-
-   prepare_ir();
-
return r;
 }
 
-int bc_parser::parse_shader() {
+int bc_parser::decode_shader() {
int r = 0;
unsigned i = 0;
bool eop = false;
 
sh->init();
 
-   if (pshader)
-   parse_decls();
-
do {
eop = false;
-   if ((r = parse_cf(i, eop)))
+   if ((r = decode_cf(i, eop)))
return r;
 
} while (!eop || (i >> 1) <= max_cf);
@@ -119,34 +99,34 @@ int bc_parser::parse_shader() {
return 0;
 }
 
-int bc_parser::parse_decls() {
-
-// sh->prepare_regs(rs.bc.ngpr);
-
-   if (pshader->indirect_files & ~(1 << TGSI_FILE_CONSTANT)) {
+int bc_parser::prepare() {
+   int r = 0;
+   if ((r = parse_decls()))
+   return r;
+   if ((r = prepare_ir()))
+   return r;
+   return 0;
+}
 
-#if SB_NO_ARRAY_INFO
+int bc_parser::parse_decls

Re: [Mesa-dev] r600g missing Bump mapping

2013-05-08 Thread Vadim Girlin

On 05/09/2013 02:42 AM, Dragomir Ivanov wrote:

Hi there,
I just fired Doom3 on 64 -bit Arch Linux (no 32 libs involved), to test
r600g progress.
Game runs fine, but I can't see bump mapping effects as on Catalyst under
windows. They are enabled in the options. Does Mesa/r600g support bumps?
AMD E-350 here. Evergreen class GPU.


Here are two screenshots made with git mesa on evergreen with Ultra 
settings, the only difference is toggled bump mapping option:


http://i.imgur.com/Cl0hamf.jpg
http://i.imgur.com/4IsjrR3.jpg

To me it looks like bump mapping works. Could you provide more detailed 
info (with screenshots etc) to demonstrate your issue? Also you might 
want to try resetting game options to default to make sure that you 
don't have any nonstandard tweaks.


Vadim



OpenGL renderer string: Gallium 0.4 on AMD PALM


OpenGL core profile version string: 3.1 (Core Profile) Mesa 9.1.2


OpenGL core profile shading language version string: 1.40

Linux localhost 3.8.11-1-ARCH #1 SMP PREEMPT Wed May 1 20:18:57 CEST 2013
x86_64 GNU/Linux








Re: [Mesa-dev] r600g missing Bump mapping

2013-05-08 Thread Vadim Girlin

On 05/09/2013 05:42 AM, Dragomir Ivanov wrote:

Hmm, Vadim... it works indeed. I can't push up to Ultra, but I see the
difference as on your screenshots.
Interestingly when I play it on win with catalyst, everything is WOW, on
r600g is Meh...
Unfortunately I erased windows, so I can't supply screenshot, but
subjectively it was way more beautiful on the same graphics level.


IIRC Doom3 tries to autodetect some settings and I guess there are 
differences in the game configuration with different drivers. Possibly 
some settings are misdetected with r600g, in this case I guess running 
it with the configuration file created for catalyst might help.


Also you might want to check game's console output for any hints, e.g. 
like this:


guessing video ram ( use +set sys_videoRam to force ) ..
guess failed, return default low-end VRAM setting ( 64MB VRAM )

Though I don't see this message with 64-bit port, looks like detection 
logic was changed there.


Vadim




On Thu, May 9, 2013 at 4:15 AM, Vadim Girlin vadimgir...@gmail.com wrote:


On 05/09/2013 02:42 AM, Dragomir Ivanov wrote:


Hi there,
I just fired Doom3 on 64 -bit Arch Linux (no 32 libs involved), to test
r600g progress.
Game runs fine, but I can't see bump mapping effects as on Catalyst under
windows. They are enabled in the options. Does Mesa/r600g support bumps?
AMD E-350 here. Evergreen class GPU.



Here are two screenshots made with git mesa on evergreen with Ultra
settings, the only difference is toggled bump mapping option:

http://i.imgur.com/Cl0hamf.jpg
http://i.imgur.com/4IsjrR3.jpg

To me it looks like bump mapping works. Could you provide more detailed
info (with screenshots etc) to demonstrate your issue? Also you might want
to try resetting game options to default to make sure that you don't have
any nonstandard tweaks.

Vadim



OpenGL renderer string: Gallium 0.4 on AMD PALM


OpenGL core profile version string: 3.1 (Core Profile) Mesa 9.1.2


OpenGL core profile shading language version string: 1.40

Linux localhost 3.8.11-1-ARCH #1 SMP PREEMPT Wed May 1 20:18:57 CEST 2013
x86_64 GNU/Linux













[Mesa-dev] [PATCH 1/3] r600g/sb: fix kcache handling on r6xx

2013-05-04 Thread Vadim Girlin
Use the same limit for kcache constants in alu group on r6xx as on other
chips (two const pairs). Relaxing this will require additional checks to
make sure that all 4 consts in the group come from 2 kcache sets (clause
limit), probably without noticeable improvements of shader performance.

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---
 src/gallium/drivers/r600/sb/sb_sched.cpp | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/sb/sb_sched.cpp 
b/src/gallium/drivers/r600/sb/sb_sched.cpp
index b21b342..d0045ce 100644
--- a/src/gallium/drivers/r600/sb/sb_sched.cpp
+++ b/src/gallium/drivers/r600/sb/sb_sched.cpp
@@ -43,7 +43,11 @@ namespace r600_sb {
 using std::cerr;
 
 rp_kcache_tracker::rp_kcache_tracker(shader &sh) : rp(), uc(),
-   sel_count(sh.get_ctx().is_r600() ? 4 : 2) {}
+   // FIXME: for now we'll use two const pairs limit for r600, same as
+   // for other chips, otherwise additional check in alu_group_tracker is
+   // required to make sure that all 4 consts in the group fit into 2
+   // kcache sets
+   sel_count(2) {}
 
 bool rp_kcache_tracker::try_reserve(sel_chan r) {
unsigned sel = kc_sel(r);
-- 
1.8.2.1



[Mesa-dev] [PATCH 3/3] r600g/sb: optimize some cases for CNDxx instructions

2013-05-04 Thread Vadim Girlin
We can replace CNDxx with MOV (and possibly eliminate it after
propagation) in the following cases:

If src1 is equal to src2 in a CNDxx instruction then the result doesn't
depend on the condition and we can replace the instruction with
MOV dst, src1.

If src0 is const then we can evaluate the condition at compile time and
also replace the instruction with MOV.
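
The two cases above can be sketched in isolation as follows. This is a simplified model, not the r600/sb code: the types Cond and Operand and the function select_cnd_source are hypothetical, only float comparisons against zero are modelled, and source modifiers (neg/abs) are ignored. It assumes the CNDxx semantics implied by the patch, namely dst = (src0 cc 0) ? src1 : src2.

```cpp
// Hypothetical condition codes for CNDE / CNDGT / CNDGE.
enum class Cond { E, GT, GE };

// Hypothetical operand model: 'id' identifies the value, so two operands
// with the same id are known to hold the same value.
struct Operand {
    bool is_const;
    float value; // meaningful only when is_const is true
    int id;
};

// Returns the index (1 or 2) of the source that a MOV should copy,
// or 0 if the CNDxx instruction cannot be simplified.
int select_cnd_source(Cond cc, const Operand &src0,
                      const Operand &src1, const Operand &src2) {
    // Case 1: both candidates are the same value, the condition is irrelevant.
    if (src1.id == src2.id)
        return 1;

    // Case 2: the condition operand is constant, evaluate it now.
    if (src0.is_const) {
        bool cond = false;
        switch (cc) {
        case Cond::E:  cond = src0.value == 0.0f; break;
        case Cond::GT: cond = src0.value >  0.0f; break;
        case Cond::GE: cond = src0.value >= 0.0f; break;
        }
        return cond ? 1 : 2;
    }

    return 0; // nothing known at compile time
}
```

For example, CNDGT with a constant src0 of 1.0 always selects src1, so the instruction becomes MOV dst, src1 and may later disappear entirely through copy propagation.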

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---
 src/gallium/drivers/r600/sb/sb_expr.cpp | 84 +++--
 src/gallium/drivers/r600/sb/sb_expr.h   |  2 +
 2 files changed, 81 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_expr.cpp 
b/src/gallium/drivers/r600/sb/sb_expr.cpp
index e3c7858..8582c8e 100644
--- a/src/gallium/drivers/r600/sb/sb_expr.cpp
+++ b/src/gallium/drivers/r600/sb/sb_expr.cpp
@@ -432,13 +432,63 @@ bool expr_handler::fold_alu_op2(alu_node& n) {
return true;
 }
 
+bool expr_handler::evaluate_condition(unsigned alu_cnd_flags,
+  literal s1, literal s2) {
+
+   unsigned cmp_type = alu_cnd_flags & AF_CMP_TYPE_MASK;
+   unsigned cc = alu_cnd_flags & AF_CC_MASK;
+
+   switch (cmp_type) {
+   case AF_FLOAT_CMP: {
+   switch (cc) {
+   case AF_CC_E : return s1.f == s2.f;
+   case AF_CC_GT: return s1.f >  s2.f;
+   case AF_CC_GE: return s1.f >= s2.f;
+   case AF_CC_NE: return s1.f != s2.f;
+   case AF_CC_LT: return s1.f <  s2.f;
+   case AF_CC_LE: return s1.f <= s2.f;
+   default:
+   assert(!"invalid condition code");
+   return false;
+   }
+   }
+   case AF_INT_CMP: {
+   switch (cc) {
+   case AF_CC_E : return s1.i == s2.i;
+   case AF_CC_GT: return s1.i >  s2.i;
+   case AF_CC_GE: return s1.i >= s2.i;
+   case AF_CC_NE: return s1.i != s2.i;
+   case AF_CC_LT: return s1.i <  s2.i;
+   case AF_CC_LE: return s1.i <= s2.i;
+   default:
+   assert(!"invalid condition code");
+   return false;
+   }
+   }
+   case AF_UINT_CMP: {
+   switch (cc) {
+   case AF_CC_E : return s1.u == s2.u;
+   case AF_CC_GT: return s1.u >  s2.u;
+   case AF_CC_GE: return s1.u >= s2.u;
+   case AF_CC_NE: return s1.u != s2.u;
+   case AF_CC_LT: return s1.u <  s2.u;
+   case AF_CC_LE: return s1.u <= s2.u;
+   default:
+   assert(!"invalid condition code");
+   return false;
+   }
+   }
+   default:
+   assert(!"invalid cmp_type");
+   return false;
+   }
+}
+
 bool expr_handler::fold_alu_op3(alu_node& n) {
 
if (n.src.size() < 3)
return false;
 
-   // TODO handle CNDxx by some common path
-
value* v0 = n.src[0];
value* v1 = n.src[1];
value* v2 = n.src[2];
@@ -449,9 +499,6 @@ bool expr_handler::fold_alu_op3(alu_node n) {
bool isc1 = v1->is_const();
bool isc2 = v2->is_const();
 
-   if (!isc0 && !isc1 && !isc2)
-   return false;
-
literal dv, cv0, cv1, cv2;
 
if (isc0) {
@@ -469,6 +516,33 @@ bool expr_handler::fold_alu_op3(alu_node n) {
apply_alu_src_mod(n.bc, 2, cv2);
}
 
+   if (n.bc.op_ptr->flags & AF_CMOV) {
+   int src = 0;
+
+   if (v1->gvalue() == v2->gvalue() &&
+   n.bc.src[1].neg == n.bc.src[2].neg) {
+   // result doesn't depend on condition, convert to MOV
+   src = 1;
+   } else if (isc0) {
+   // src0 is const, condition can be evaluated, convert to MOV
+   bool cond = evaluate_condition(n.bc.op_ptr->flags & (AF_CC_MASK |
+   AF_CMP_TYPE_MASK), cv0, literal(0));
+   src = cond ? 1 : 2;
+   }
+
+   if (src) {
+   // if src is selected, convert to MOV
+   n.bc.src[0] = n.bc.src[src];
+   n.src[0] = n.src[src];
+   n.src.resize(1);
+   n.bc.set_op(ALU_OP1_MOV);
+   return fold_alu_op1(n);
+   }
+   }
+
+   if (!isc0 && !isc1 && !isc2)
+   return false;
+
if (isc0 && isc1 && isc2) {
switch (n.bc.op) {
case ALU_OP3_MULADD: dv = cv0.f * cv1.f + cv2.f; break;
diff --git a/src/gallium/drivers/r600/sb/sb_expr.h 
b/src/gallium/drivers/r600/sb/sb_expr.h
index 7f3bd15..c7f7dbf 100644
--- a/src/gallium/drivers/r600/sb/sb_expr.h
+++ b/src/gallium/drivers/r600/sb/sb_expr.h
@@ -76,6 +76,8 @@ public:
void apply_alu_dst_mod(const bc_alu bc, literal v);
 
void assign_source(value *dst, value *src

[Mesa-dev] [PATCH 2/3] r600g/sb: fix memory leaks

2013-05-04 Thread Vadim Girlin
Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---
 src/gallium/drivers/r600/sb/sb_bc_parser.cpp | 3 ++-
 src/gallium/drivers/r600/sb/sb_shader.cpp| 5 +
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp 
b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
index e1478d3..8329287 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
@@ -74,6 +74,8 @@ int bc_parser::parse() {
sh = new shader(ctx, t, bc->debug_id, enable_dump);
int r = parse_shader();
 
+   delete dec;
+
if (r)
return r;
 
@@ -94,7 +96,6 @@ int bc_parser::parse() {
 
prepare_ir();
 
-   delete dec;
return r;
 }
 
diff --git a/src/gallium/drivers/r600/sb/sb_shader.cpp b/src/gallium/drivers/r600/sb/sb_shader.cpp
index 9bda84f..5944ba6 100644
--- a/src/gallium/drivers/r600/sb/sb_shader.cpp
+++ b/src/gallium/drivers/r600/sb/sb_shader.cpp
@@ -355,6 +355,11 @@ shader::~shader() {
for (node_vec::iterator I = all_nodes.begin(), E = all_nodes.end();
I != E; ++I)
(*I)->~node();
+
+   for (gpr_array_vec::iterator I = gpr_arrays.begin(), E = gpr_arrays.end();
+           I != E; ++I) {
+       delete *I;
+   }
 }
 
 void shader::dump_ir() {
-- 
1.8.2.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] r600g: use old shader disassembler by default

2013-05-03 Thread Vadim Girlin
The new disassembler is not yet completely isolated from further processing
in r600g/sb that is not required for printing the dump, so it is more likely
to fail if the bytecode contains any unexpected features.

This patch adds an sbdisasm flag for R600_DEBUG that enables the new
disassembler in r600g/sb for shader dumps when shader optimization
is not enabled.

If shader optimization is enabled, the new disassembler is used by default.

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---
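The selection logic described in the commit message can be sketched on its
own. This is only an illustration, not the driver code: select_dump_path and
DumpPath are hypothetical names, DBG_SB_DISASM uses the bit from this patch,
and the bit chosen for DBG_SB is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// DBG_SB_DISASM uses the bit added by this patch; the DBG_SB bit is assumed.
constexpr uint32_t DBG_SB        = 1u << 21;  // assumption, for illustration
constexpr uint32_t DBG_SB_DISASM = 1u << 27;

// Hypothetical result type: which code path handles the shader.
enum class DumpPath { None, OldDisasm, SbProcess };

// Mirrors the branch structure the patch adds to r600_pipe_shader_create():
// the old disassembler is used for dumps unless sb is requested either for
// optimization (sb) or for dumping only (sbdisasm).
inline DumpPath select_dump_path(uint32_t debug_flags, bool dump) {
    const bool use_sb    = (debug_flags & DBG_SB) != 0;
    const bool sb_disasm = use_sb || (debug_flags & DBG_SB_DISASM) != 0;

    if (dump && !sb_disasm)
        return DumpPath::OldDisasm;
    if ((dump && sb_disasm) || use_sb)
        return DumpPath::SbProcess;
    return DumpPath::None;
}
```

With no sb flags a dump goes through the old disassembler; with sbdisasm the
dump goes through r600g/sb without optimization; with sb the shader always
goes through r600g/sb, whether or not a dump was requested.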
 src/gallium/drivers/r600/r600_asm.c| 13 +++--
 src/gallium/drivers/r600/r600_pipe.c   |  1 +
 src/gallium/drivers/r600/r600_pipe.h   |  1 +
 src/gallium/drivers/r600/r600_shader.c | 22 +-
 4 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_asm.c b/src/gallium/drivers/r600/r600_asm.c
index 81b84ec..df0376a 100644
--- a/src/gallium/drivers/r600/r600_asm.c
+++ b/src/gallium/drivers/r600/r600_asm.c
@@ -2281,6 +2281,7 @@ void *r600_create_vertex_fetch_shader(struct pipe_context *ctx,
uint32_t *bytecode;
int i, j, r, fs_size;
struct r600_fetch_shader *shader;
+   unsigned sb_disasm = rctx->screen->debug_flags & (DBG_SB_DISASM | DBG_SB);
 
assert(count < 32);
 
@@ -2387,13 +2388,13 @@ void *r600_create_vertex_fetch_shader(struct pipe_context *ctx,
fprintf(stderr, "\n");
}
 
-#if 0
-   r600_bytecode_disasm(&bc);
+   if (!sb_disasm) {
+       r600_bytecode_disasm(&bc);
 
-   fprintf(stderr, "__\n");
-#else
-   r600_sb_bytecode_process(rctx, &bc, NULL, 1 /*dump*/, 0 /*optimize*/);
-#endif
+       fprintf(stderr, "__\n");
+   } else {
+       r600_sb_bytecode_process(rctx, &bc, NULL, 1 /*dump*/, 0 /*optimize*/);
+   }
}
 
fs_size = bc.ndw*4;
diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c
index 4991fb2..daadaeb 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -73,6 +73,7 @@ static const struct debug_named_value debug_options[] = {
{ "sbstat", DBG_SB_STAT, "Print optimization statistics for shaders" },
{ "sbdump", DBG_SB_DUMP, "Print IR dumps after some optimization passes" },
{ "sbnofallback", DBG_SB_NO_FALLBACK, "Abort on errors instead of fallback" },
+   { "sbdisasm", DBG_SB_DISASM, "Use sb disassembler for shader dumps" },
 
DEBUG_NAMED_VALUE_END /* must be last */
 };
diff --git a/src/gallium/drivers/r600/r600_pipe.h b/src/gallium/drivers/r600/r600_pipe.h
index 61e2022..bb4e429 100644
--- a/src/gallium/drivers/r600/r600_pipe.h
+++ b/src/gallium/drivers/r600/r600_pipe.h
@@ -264,6 +264,7 @@ typedef boolean (*r600g_dma_blit_t)(struct pipe_context *ctx,
 #define DBG_SB_STAT            (1 << 24)
 #define DBG_SB_DUMP            (1 << 25)
 #define DBG_SB_NO_FALLBACK     (1 << 26)
+#define DBG_SB_DISASM          (1 << 27)
 
 struct r600_tiling_info {
unsigned num_channels;
diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index 49218e5..9afd57f 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -141,6 +141,7 @@ int r600_pipe_shader_create(struct pipe_context *ctx,
uint32_t *ptr;
bool dump = r600_can_dump_shader(rctx->screen, tgsi_get_processor_type(sel->tokens));
unsigned use_sb = rctx->screen->debug_flags & DBG_SB;
+   unsigned sb_disasm = use_sb || (rctx->screen->debug_flags & DBG_SB_DISASM);
 
shader->shader.bc.isa = rctx->isa;
 
@@ -163,21 +164,18 @@ int r600_pipe_shader_create(struct pipe_context *ctx,
return r;
}
 
-#if 0
-   if (dump) {
+   if (dump && !sb_disasm) {
fprintf(stderr, "--\n");
r600_bytecode_disasm(&shader->shader.bc);
fprintf(stderr, "__\n");
-   }
-#else
-   if (dump || use_sb) {
-       r = r600_sb_bytecode_process(rctx, &shader->shader.bc, &shader->shader, dump, use_sb);
+   } else if ((dump && sb_disasm) || use_sb) {
+       r = r600_sb_bytecode_process(rctx, &shader->shader.bc, &shader->shader,
+                                    dump, use_sb);
if (r) {
R600_ERR("r600_sb_bytecode_process failed !\n");
return r;
}
}
-#endif
 
/* Store the shader in a buffer. */
if (shader->bo == NULL) {
@@ -307,6 +305,8 @@ int r600_compute_shader_create(struct pipe_context * ctx,
boolean use_kill = false;
bool dump = (r600_ctx->screen->debug_flags & DBG_CS) != 0

Re: [Mesa-dev] [PATCH] r600g: Correctly initialize the shader key

2013-05-03 Thread Vadim Girlin

On 05/03/2013 03:10 PM, Lauri Kasanen wrote:

Assigning a struct only copies the members - any padding is left as is.

Thus this code:

struct foo;
foo = bar;

leaves the padding of foo intact, ie uninitialized random garbage.

This patch fixes constant shader recompiles by initializing the struct
to zero.

Signed-off-by: Lauri Kasanen c...@gmx.com
---
  src/gallium/drivers/r600/r600_state_common.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_state_common.c b/src/gallium/drivers/r600/r600_state_common.c
index 87a2e2e..bf7cc39 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -710,7 +710,7 @@ static int r600_shader_select(struct pipe_context *ctx, struct r600_pipe_shader_selector* sel,
  bool *dirty)
  {
-   struct r600_shader_key key;
+   struct r600_shader_key key = {0};


I suspect the effect of this initialization on padding is undefined. 
Probably it's safer to use memset.
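
A minimal sketch of the memset approach, assuming the key is later compared
byte-wise. The struct layout and the names init_key and keys_equal are
hypothetical; the real r600_shader_key has different members. The key is
filled in place because copying a struct by value need not preserve padding,
which is the problem described above. Strictly speaking the language does not
promise that padding stays zero after member stores either, but in practice
common compilers leave it alone.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical key layout for illustration; not the real r600_shader_key.
struct shader_key {
    uint8_t  color_two_side;  // typically followed by 3 padding bytes
    uint32_t nr_cbufs;
};

// Fill the key in place: clear every byte first (members and padding),
// then set the members.
inline void init_key(shader_key *key, uint8_t two_side, uint32_t nr_cbufs) {
    assert(key != nullptr);
    std::memset(key, 0, sizeof(*key));
    key->color_two_side = two_side;
    key->nr_cbufs = nr_cbufs;
}

// Byte-wise comparison, as a variant cache keyed on the raw struct would do.
inline bool keys_equal(const shader_key &a, const shader_key &b) {
    return std::memcmp(&a, &b, sizeof(shader_key)) == 0;
}
```

Two keys initialized this way with the same member values compare equal even
if the memory they occupy held different garbage beforehand.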


Vadim


struct r600_context *rctx = (struct r600_context *)ctx;
struct r600_pipe_shader * shader = NULL;
int r;





Re: [Mesa-dev] [PATCH 3/3] r600g: Don't set the dest cache bits on surface sync for R600_CONTEXT_FLUSH_AND_INV

2013-05-03 Thread Vadim Girlin

This patch results in lockups with Heaven on juniper for me.

Vadim


On 04/26/2013 09:21 PM, Tom Stellard wrote:

From: Tom Stellard thomas.stell...@amd.com

We are already emitting a EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT packet
when this flush flag is set, so flushing the dest caches with a
SURFACE_SYNC should not be necessary.

The motivation for this change is that emitting a SURFACE_SYNC packet with
the CB bits set was causing compute shaders to hang on Cayman.
---
  src/gallium/drivers/r600/r600_hw_context.c | 28 +---
  1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_hw_context.c b/src/gallium/drivers/r600/r600_hw_context.c
index b4fb3bf..8aebd25 100644
--- a/src/gallium/drivers/r600/r600_hw_context.c
+++ b/src/gallium/drivers/r600/r600_hw_context.c
@@ -231,21 +231,19 @@ void r600_flush_emit(struct r600_context *rctx)
cs->buf[cs->cdw++] = PKT3(PKT3_EVENT_WRITE, 0, 0);
cs->buf[cs->cdw++] = EVENT_TYPE(EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT) | EVENT_INDEX(0);
if (rctx->chip_class >= EVERGREEN) {
-   cp_coher_cntl = S_0085F0_CB0_DEST_BASE_ENA(1) |
-   S_0085F0_CB1_DEST_BASE_ENA(1) |
-   S_0085F0_CB2_DEST_BASE_ENA(1) |
-   S_0085F0_CB3_DEST_BASE_ENA(1) |
-   S_0085F0_CB4_DEST_BASE_ENA(1) |
-   S_0085F0_CB5_DEST_BASE_ENA(1) |
-   S_0085F0_CB6_DEST_BASE_ENA(1) |
-   S_0085F0_CB7_DEST_BASE_ENA(1) |
-   S_0085F0_CB8_DEST_BASE_ENA(1) |
-   S_0085F0_CB9_DEST_BASE_ENA(1) |
-   S_0085F0_CB10_DEST_BASE_ENA(1) |
-   S_0085F0_CB11_DEST_BASE_ENA(1) |
-   S_0085F0_DB_DEST_BASE_ENA(1) |
-   S_0085F0_TC_ACTION_ENA(1) |
-   S_0085F0_CB_ACTION_ENA(1) |
+   /* We were previously setting the CB and DB bits on
+* cp_coher_cntl, but this is unnecessary since
+* we are emitting the
+* EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT packet.
+* Setting the CB bits was causing lockups when using
+* compute on cayman.
+*
+* XXX: Do even need to emit a surface sync packet here?
+* Prior to e5e4c07e7964a3258ed02b530bcdc24c0650204b
+* surface sync was not being emitted with the
+* R600_CONTEXT_FLUSH_AND_INV flag.
+*/
+   cp_coher_cntl = S_0085F0_TC_ACTION_ENA(1) |
S_0085F0_DB_ACTION_ENA(1) |
S_0085F0_SH_ACTION_ENA(1) |
S_0085F0_SMX_ACTION_ENA(1) |





Re: [Mesa-dev] [PATCH 3/3] r600g: Don't set the dest cache bits on surface sync for R600_CONTEXT_FLUSH_AND_INV

2013-05-03 Thread Vadim Girlin

On 05/03/2013 05:36 PM, Alex Deucher wrote:

On Fri, May 3, 2013 at 9:30 AM, Vadim Girlin vadimgir...@gmail.com wrote:

This patch results in lockups with Heaven on juniper for me.


Does dropping the surface_sync packet completely help?  We shouldn't
need a surface_sync packet after a CACHE_FLUSH_AND_INV_EVENT packet
and prior to e5e4c07e7964a3258ed02b530bcdc24c0650204b we never emitted
it.


Yes, this patch fixed it.

Vadim



Alex

diff --git a/src/gallium/drivers/r600/r600_hw_context.c b/src/gallium/drivers/r600/r600_hw_context.c
index 6d8b2cf..944b666 100644
--- a/src/gallium/drivers/r600/r600_hw_context.c
+++ b/src/gallium/drivers/r600/r600_hw_context.c
@@ -226,32 +226,6 @@ void r600_flush_emit(struct r600_context *rctx)
 if (rctx->flags & R600_CONTEXT_FLUSH_AND_INV) {
 cs->buf[cs->cdw++] = PKT3(PKT3_EVENT_WRITE, 0, 0);
 cs->buf[cs->cdw++] = EVENT_TYPE(EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT) | EVENT_INDEX(0);
-   if (rctx->chip_class >= EVERGREEN) {
-   /* We were previously setting the CB and DB bits on
-* cp_coher_cntl, but this is unnecessary since
-* we are emitting the
-* EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT packet.
-* Setting the CB bits was causing lockups when using
-* compute on cayman.
-*
-* XXX: Do even need to emit a surface sync packet here?
-* Prior to e5e4c07e7964a3258ed02b530bcdc24c0650204b
-* surface sync was not being emitted with the
-* R600_CONTEXT_FLUSH_AND_INV flag.
-*/
-   cp_coher_cntl = S_0085F0_TC_ACTION_ENA(1) |
-   S_0085F0_DB_ACTION_ENA(1) |
-   S_0085F0_SH_ACTION_ENA(1) |
-   S_0085F0_SMX_ACTION_ENA(1) |
-   S_0085F0_FULL_CACHE_ENA(1);
-   } else {
-   cp_coher_cntl = S_0085F0_SMX_ACTION_ENA(1) |
-   S_0085F0_SH_ACTION_ENA(1) |
-   S_0085F0_VC_ACTION_ENA(1) |
-   S_0085F0_TC_ACTION_ENA(1) |
-   S_0085F0_FULL_CACHE_ENA(1);
-   }
-   emit_flush = 1;
 }

 if (rctx->flags & R600_CONTEXT_INVAL_READ_CACHES) {




Re: [Mesa-dev] [PATCH 3/3] radeonsi: fix the max vertex shader input limit

2013-05-02 Thread Vadim Girlin

On 05/02/2013 11:06 AM, Michel Dänzer wrote:

On Thu, 2013-05-02 at 05:45 +0200, Marek Olšák wrote:

---
  src/gallium/drivers/radeonsi/radeonsi_pipe.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeonsi/radeonsi_pipe.c b/src/gallium/drivers/radeonsi/radeonsi_pipe.c
index c923c67..3b9be54 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_pipe.c
+++ b/src/gallium/drivers/radeonsi/radeonsi_pipe.c
@@ -481,7 +481,7 @@ static int r600_get_shader_param(struct pipe_screen* pscreen, unsigned shader, e
 case PIPE_SHADER_CAP_MAX_CONTROL_FLOW_DEPTH:
 return 32;
 case PIPE_SHADER_CAP_MAX_INPUTS:
-   return 32;
+   return shader == PIPE_SHADER_VERTEX ? 16 : 32;


For r600g, I assume the limit of 16 is due to the number of hardware
registers available for vertex shader inputs,


AFAIK there is no such limit on r600 hw either. IIRC there are at least 
32 registers for semantic fetch mapping, but I think we aren't limited 
even by that, because we can use non-semantic fetches (and currently we 
don't use semantic fetches at all). Am I missing something?


Vadim


but as of SI the state is
no longer stored in registers but in resource descriptors in a BO. In
theory, I think we could even support many more inputs than 32, but
let's just leave it at that for now.





[Mesa-dev] [PATCH 1/4] r600g/sb: fix allocation of indirectly addressed input arrays

2013-05-02 Thread Vadim Girlin
Some inputs may be preloaded into predefined GPRs,
so we can't reallocate arrays with such inputs.

Fixes issues with webgl demo: http://oos.moxiecode.com/js_webgl/snake/

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---
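The idea of leaving pinned arrays where they are can be sketched in
isolation. This is a simplified model, not the r600/sb allocator: gpr_array
here is a hypothetical stand-in, gpr == 0 is assumed to mean "not yet
assigned", and the sketch does not check whether newly assigned ranges
overlap a pinned array.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for the r600/sb gpr_array.
struct gpr_array {
    unsigned base_gpr;  // location in the original bytecode
    unsigned size;      // number of registers in the array
    unsigned gpr;       // assigned location, 0 = not yet assigned (assumed)
};

// Assign consecutive registers starting at first_free to every array that
// is not pinned. Returns the first register after the last assigned array.
inline unsigned alloc_arrays(std::vector<gpr_array> &arrays,
                             unsigned first_free) {
    unsigned next = first_free;
    for (gpr_array &a : arrays) {
        if (a.gpr)      // preallocated (e.g. holds preloaded inputs)
            continue;   // keep it at its original location
        a.gpr = next;
        next += a.size;
    }
    return next;
}
```

An array whose gpr was set beforehand, for example by pinning it to base_gpr
as the sb_shader.cpp hunk below does, keeps that location; only the remaining
arrays are moved.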
 src/gallium/drivers/r600/sb/sb_ra_coalesce.cpp | 14 +-
 src/gallium/drivers/r600/sb/sb_ra_init.cpp |  6 ++
 src/gallium/drivers/r600/sb/sb_shader.cpp  | 13 +
 src/gallium/drivers/r600/sb/sb_shader.h|  2 +-
 4 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_ra_coalesce.cpp b/src/gallium/drivers/r600/sb/sb_ra_coalesce.cpp
index cec4bbc..25c46f7 100644
--- a/src/gallium/drivers/r600/sb/sb_ra_coalesce.cpp
+++ b/src/gallium/drivers/r600/sb/sb_ra_coalesce.cpp
@@ -345,13 +345,17 @@ void coalescer::init_reg_bitset(sb_bitset &bs, val_set &vs) {
for (val_set::iterator I = vs.begin(sh), E = vs.end(sh); I != E; ++I) {
value *v = *I;
 
-   if (!v->is_sgpr())
+   if (!v->is_any_gpr())
continue;
 
-   if (v->gpr) {
-       if (v->gpr >= bs.size())
-           bs.resize(v->gpr + 64);
-       bs.set(v->gpr, 1);
+   unsigned gpr = v->get_final_gpr();
+   if (!gpr)
+       continue;
+
+   if (gpr) {
+       if (gpr >= bs.size())
+           bs.resize(gpr + 64);
+       bs.set(gpr, 1);
}
}
 }
diff --git a/src/gallium/drivers/r600/sb/sb_ra_init.cpp b/src/gallium/drivers/r600/sb/sb_ra_init.cpp
index 0447f29..99ff6ff 100644
--- a/src/gallium/drivers/r600/sb/sb_ra_init.cpp
+++ b/src/gallium/drivers/r600/sb/sb_ra_init.cpp
@@ -244,6 +244,12 @@ void ra_init::alloc_arrays() {
cerr << "\n";
);
 
+   // skip preallocated arrays (e.g. with preloaded inputs)
+   if (a->gpr) {
+       RA_DUMP( cerr << "FIXED at " << a->gpr << "\n"; );
+       continue;
+   }
+
bool dead = a-is_dead();
 
if (dead) {
diff --git a/src/gallium/drivers/r600/sb/sb_shader.cpp b/src/gallium/drivers/r600/sb/sb_shader.cpp
index 6dd3678..3fd6ea4 100644
--- a/src/gallium/drivers/r600/sb/sb_shader.cpp
+++ b/src/gallium/drivers/r600/sb/sb_shader.cpp
@@ -61,7 +61,7 @@ bool shader::assign_slot(alu_node* n, alu_node *slots[5]) {
return true;
 }
 
-void shader::add_gpr_values(vvec &vec, unsigned gpr, unsigned comp_mask,
+void shader::add_pinned_gpr_values(vvec &vec, unsigned gpr, unsigned comp_mask,
 bool src) {
unsigned chan = 0;
while (comp_mask) {
@@ -72,6 +72,11 @@ void shader::add_gpr_values(vvec &vec, unsigned gpr, unsigned comp_mask,
v->gpr = v->pin_gpr = v->select;
v->fix();
}
+   if (v->array && !v->array->gpr) {
+       // if pinned value can be accessed with indirect addressing
+       // pin the entire array to its original location
+       v->array->gpr = v->array->base_gpr;
+   }
vec.push_back(v);
}
comp_mask >>= 1;
@@ -199,7 +204,7 @@ void shader::add_input(unsigned gpr, bool preloaded, unsigned comp_mask) {
i.comp_mask = comp_mask;
 
if (preloaded) {
-   add_gpr_values(root->dst, gpr, comp_mask, true);
+   add_pinned_gpr_values(root->dst, gpr, comp_mask, true);
}
 
 }
@@ -217,9 +222,9 @@ void shader::init_call_fs(cf_node* cf) {
for(inputs_vec::const_iterator I = inputs.begin(),
E = inputs.end(); I != E; ++I, ++gpr) {
if (!I->preloaded)
-   add_gpr_values(cf->dst, gpr, I->comp_mask, false);
+   add_pinned_gpr_values(cf->dst, gpr, I->comp_mask, false);
else
-   add_gpr_values(cf->src, gpr, I->comp_mask, true);
+   add_pinned_gpr_values(cf->src, gpr, I->comp_mask, true);
}
 }
 
diff --git a/src/gallium/drivers/r600/sb/sb_shader.h b/src/gallium/drivers/r600/sb/sb_shader.h
index aa71d54..b2e3837 100644
--- a/src/gallium/drivers/r600/sb/sb_shader.h
+++ b/src/gallium/drivers/r600/sb/sb_shader.h
@@ -315,7 +315,7 @@ public:
value* get_value_version(value* v, unsigned ver);
 
void init();
-   void add_gpr_values(vvec &vec, unsigned gpr, unsigned comp_mask, bool src);
+   void add_pinned_gpr_values(vvec &vec, unsigned gpr, unsigned comp_mask, bool src);
 
void dump_ir();
 
-- 
1.8.2.1



[Mesa-dev] [PATCH 2/4] r600g/sb: fix handling of interference sets in post_scheduler

2013-05-02 Thread Vadim Girlin
post_scheduler clears the interference set of a reallocatable value when
the value first becomes live, and then updates it to take into account
the modified order of operations, but this was not handled properly
if the value first appears as a source in a copy operation.

Fixes issues with webgl demo: http://madebyevan.com/webgl-water/

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---
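The interface change in this patch (val_set reference becomes a pointer that
may be NULL) is the usual "optional output parameter" pattern. A minimal
sketch of it, using hypothetical stand-in types rather than the real sb
classes:

```cpp
#include <cassert>
#include <set>
#include <vector>

using value_id = int;                 // hypothetical stand-in for value*
using val_set  = std::set<value_id>;  // hypothetical stand-in for sb val_set

// Mark every source value as live. If the caller passes a set in `born`,
// also record the values that became live during this call; callers that
// do not need that information pass a null pointer.
inline void update_live(const std::vector<value_id> &srcs, val_set &live,
                        val_set *born) {
    for (value_id v : srcs) {
        const bool newly_live = live.insert(v).second;
        if (newly_live && born)
            born->insert(v);
    }
}
```

This lets check_copy() reuse the common liveness update without having to
supply a set it would never read, which is what update_live(n, NULL) does in
the hunk below.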
 src/gallium/drivers/r600/sb/sb_sched.cpp | 12 ++--
 src/gallium/drivers/r600/sb/sb_sched.h   |  4 ++--
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_sched.cpp b/src/gallium/drivers/r600/sb/sb_sched.cpp
index 7e9eacc..d7c1795 100644
--- a/src/gallium/drivers/r600/sb/sb_sched.cpp
+++ b/src/gallium/drivers/r600/sb/sb_sched.cpp
@@ -874,7 +874,7 @@ void post_scheduler::update_local_interferences() {
}
 }
 
-void post_scheduler::update_live_src_vec(vvec &vv, val_set &born, bool src) {
+void post_scheduler::update_live_src_vec(vvec &vv, val_set *born, bool src) {
for (vvec::iterator I = vv.begin(), E = vv.end(); I != E; ++I) {
value *v = *I;
 
@@ -892,7 +892,8 @@ void post_scheduler::update_live_src_vec(vvec &vv, val_set &born, bool src) {
cleared_interf.add_val(v);
}
}
-   born.add_val(v);
+   if (born)
+       born->add_val(v);
}
} else if (v->is_rel()) {
if (!v->rel->is_any_gpr())
@@ -924,7 +925,7 @@ void post_scheduler::update_live_dst_vec(vvec &vv) {
}
 }
 
-void post_scheduler::update_live(node *n, val_set &born) {
+void post_scheduler::update_live(node *n, val_set *born) {
update_live_dst_vec(n->dst);
update_live_src_vec(n->src, born, true);
update_live_src_vec(n->dst, born, false);
@@ -948,7 +949,7 @@ void post_scheduler::process_group() {
if (!n)
continue;
 
-   update_live(n, vals_born);
+   update_live(n, &vals_born);
}
 
PSC_DUMP(
@@ -1550,8 +1551,7 @@ bool post_scheduler::check_copy(node *n) {
if (s->is_prealloc() && !map_src_val(s))
return true;
 
-   live.remove_val(d);
-   live.add_val(s);
+   update_live(n, NULL);
 
release_src_values(n);
n->remove();
diff --git a/src/gallium/drivers/r600/sb/sb_sched.h b/src/gallium/drivers/r600/sb/sb_sched.h
index e74046c..a74484f 100644
--- a/src/gallium/drivers/r600/sb/sb_sched.h
+++ b/src/gallium/drivers/r600/sb/sb_sched.h
@@ -297,9 +297,9 @@ public:
bool recolor_local(value *v);
 
void update_local_interferences();
-   void update_live_src_vec(vvec &vv, val_set &born, bool src);
+   void update_live_src_vec(vvec &vv, val_set *born, bool src);
void update_live_dst_vec(vvec &vv);
-   void update_live(node *n, val_set &born);
+   void update_live(node *n, val_set *born);
void process_group();
 
void set_color_local_val(value *v, sel_chan color);
-- 
1.8.2.1



[Mesa-dev] [PATCH 3/4] r600g/sb: silence warnings with gcc 4.8

2013-05-02 Thread Vadim Girlin
Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---
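The simplified find_free_bit() in this patch boils down to "skip empty words,
then count trailing zeros". A standalone sketch of that search, assuming
32-bit words (so a shift of 5; the real bt_index_shift is not shown here) and
the GCC/Clang builtin __builtin_ctz; the vector-based signature is
hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// A set bit means "free". Returns the index of the lowest set bit plus one,
// or 0 if no bit is set, following the convention in the patch.
inline unsigned find_free_bit(const std::vector<uint32_t> &words) {
    for (unsigned elt = 0; elt < words.size(); ++elt) {
        if (words[elt]) {
            // __builtin_ctz is undefined for 0, hence the check above.
            const unsigned bit = (elt << 5) + __builtin_ctz(words[elt]);
            return bit + 1;
        }
    }
    return 0;
}
```

Returning index + 1 keeps 0 available as the "nothing free" result, which is
why the caller in ra_init::color() can assert on a non-zero value.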
 src/gallium/drivers/r600/sb/sb_ra_init.cpp | 25 +++--
 src/gallium/drivers/r600/sb/sb_sched.cpp   |  4 
 2 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_ra_init.cpp b/src/gallium/drivers/r600/sb/sb_ra_init.cpp
index 99ff6ff..03b8efd 100644
--- a/src/gallium/drivers/r600/sb/sb_ra_init.cpp
+++ b/src/gallium/drivers/r600/sb/sb_ra_init.cpp
@@ -75,7 +75,7 @@ public:
 
void set(unsigned index, unsigned val);
 
-   sel_chan find_free_bit(unsigned start);
+   sel_chan find_free_bit();
sel_chan find_free_chans(unsigned mask);
sel_chan find_free_array(unsigned size, unsigned mask);
 
@@ -148,24 +148,21 @@ void regbits::set(unsigned index, unsigned val) {
 }
 
 // free register for ra means the bit is set
-sel_chan regbits::find_free_bit(unsigned start) {
-   unsigned elt = start >> bt_index_shift;
-   unsigned bit = start & bt_index_mask;
-
-   unsigned end = start < MAX_GPR - num_temps ? MAX_GPR - num_temps : MAX_GPR;
+sel_chan regbits::find_free_bit() {
+   unsigned elt = 0;
+   unsigned bit = 0;
 
-   while (elt < end && !dta[elt]) {
+   while (elt < size && !dta[elt])
++elt;
-   bit = 0;
-   }
 
-   if (elt >= end)
+   if (elt >= size)
return 0;
 
-   // FIXME this seems broken when not starting from 0
+   bit = __builtin_ctz(dta[elt]) + (elt << bt_index_shift);
+
+   assert(bit < MAX_GPR - num_temps);
 
-   bit += __builtin_ctz(dta[elt]);
-   return ((elt << bt_index_shift) | bit) + 1;
+   return bit + 1;
 }
 
 // find free gpr component to use as indirectly addressable array
@@ -482,7 +479,7 @@ void ra_init::color(value* v) {
unsigned mask = 1 << v->pin_gpr.chan();
c = rb.find_free_chans(mask) + v->pin_gpr.chan();
} else {
-   c = rb.find_free_bit(0);
+   c = rb.find_free_bit();
}
 
assert(c && c.sel() < 128 - ctx.alu_temp_gprs && "color failed");
diff --git a/src/gallium/drivers/r600/sb/sb_sched.cpp b/src/gallium/drivers/r600/sb/sb_sched.cpp
index d7c1795..b21b342 100644
--- a/src/gallium/drivers/r600/sb/sb_sched.cpp
+++ b/src/gallium/drivers/r600/sb/sb_sched.cpp
@@ -542,6 +542,10 @@ bool alu_group_tracker::try_reserve(alu_node* n) {
 
assert(first_slot != ~0 && last_slot != ~0);
 
+   // silence "array subscript is above array bounds" with gcc 4.8
+   if (last_slot >= 5)
+       abort();
+
int i = first_nf;
alu_node *a = slots[i];
bool backtrack = false;
-- 
1.8.2.1



[Mesa-dev] [PATCH 4/4] r600g/sb: don't run unnecessary passes

2013-05-02 Thread Vadim Girlin
Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---
 src/gallium/drivers/r600/sb/sb_core.cpp | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_core.cpp b/src/gallium/drivers/r600/sb/sb_core.cpp
index 9f81ed4..b919fa4 100644
--- a/src/gallium/drivers/r600/sb/sb_core.cpp
+++ b/src/gallium/drivers/r600/sb/sb_core.cpp
@@ -187,9 +187,6 @@ int r600_sb_bytecode_process(struct r600_context *rctx,
SB_RUN_PASS(dce_cleanup,1);
SB_RUN_PASS(def_use,0);
 
-   SB_RUN_PASS(liveness,   0);
-   SB_RUN_PASS(dce_cleanup,0);
-
SB_RUN_PASS(ra_split,   0);
SB_RUN_PASS(def_use,0);
 
-- 
1.8.2.1



Re: [Mesa-dev] [PATCH 3/3] radeonsi: fix the max vertex shader input limit

2013-05-02 Thread Vadim Girlin

On 05/02/2013 07:55 PM, Marek Olšák wrote:

AFAIK, there are 16 fetch shader resources. These are the resource
slots for r600:

Ah, you are right (though it's higher on EG as Alex wrote). Anyway, I'm 
not against your patch, I just wanted to understand where this limit 
comes from. I think this cap itself is a bit misleading, because its 
description in the docs is about input registers, not about 
resources/buffers, and shader inputs do not necessarily come from 
separate resources.


Vadim



[offset .. +count]
PS: 0   .. +160
VS: 160 .. +160
FS: 320 .. +16
GS: 336 .. +160

Marek





Re: [Mesa-dev] r600 sb test results

2013-05-02 Thread Vadim Girlin

On 05/02/2013 06:34 PM, Lauri Kasanen wrote:

On Thu, 02 May 2013 00:45:13 +0400
Vadim Girlin vadimgir...@gmail.com wrote:


On 05/01/2013 11:36 PM, Lauri Kasanen wrote:

Now that it built, I could test your optimizations in my own apps.
These are on current master 8eef6ad, on a RV710 (HD 4350 pci-e).

In one of my private apps, using R600_DEBUG=sb caused regressions: FPS
went from 28 to 7, the SSAO shader gave visual distortions/flicker, and
the cpu was constantly pegged.

Here's the output from R600_DEBUG=sb,sbstat in case it helps:
http://bayfiles.net/file/Pmkh/PUj0Ru/vadim.gz

It seems as if it's constantly handling new shaders? My app certainly
issues no new shaders, they are all linked when the app starts.


r600g may rebuild shaders at runtime because some GL features are
implemented in shader code, so if your app changes some specific GL
states (e.g. two-sided rendering mode), then r600g has to build and
switch between different shader variants.


It mainly uses the stencil buffer, the clear color is changed in
various passes, some occlusion queries with color masks, but nothing
exotic. New uniforms are of course sent each frame.


On the other hand, r600g caches shader variants specifically to prevent
repeated rebuilding of shaders; it looks like that doesn't work in your
case for some reason. Optimization takes more time than rebuilding with
the default backend, which explains the performance regression.

Could you provide some test app that reproduces these issues?


It's quite time-consuming to cut it down, and full apitraces of it are
several gigs (far too much to upload with my connection). I'll see if I
can isolate just the SSAO, with minimal textures, to get a smaller
trace.


I'm almost sure that the same issue that you have with glxgears affects 
your app too, so you might want to wait until we resolve the problem 
with gears; possibly this will solve the other rendering issues as well.





Please also send me the dump with R600_DEBUG=sb,ps,vs, maybe I'll be
able to spot anything wrong there.


http://bayfiles.net/file/PmY5/xgIdlZ/foo.gz


Let me know what you need to debug this.

- Lauri

PS: I'm not sure if this should be public or not, I think you're the
only one working on it?


Yes, I doubt that anyone else will work on it. On the other hand, I think
reporting this on the list might help other users who hit similar
issues. Also, at least in this case it looks more like a problem in
r600g, so I'm cc'ing mesa-dev; r600-sb just made the issue more
noticeable because rebuilding shaders with optimization takes more
time.


Using standard r600g, the cpu usage is less than 25% of one core, so
nothing was showing it was constantly rebuilding shaders. Is there some
way I could've found it was doing that, and if so, why?


You could run the app with R600_DEBUG=ps,vs (without sb) - it will 
also print the dump of every built shader. r600-sb doesn't affect the 
logic of shader rebuilding, it just processes the shaders when asked by 
r600g, so I think you'll see the same - a lot of built shaders. You 
could even try this with older mesa (before r600-sb was merged) to be sure.


As for the cause of the rebuilding, I don't see any changes in the shaders 
in your dump that might be explained by state changes: exactly the same 
shaders are rebuilt more than once, and so far I don't know why. You might 
want to step through the r600_shader_select function with a debugger to 
see what's going wrong. It computes the key for the required shader 
variant using r600_shader_selector_key, then searches the list of variants 
for an already built shader with the same key, and builds a new one only 
if it can't find an existing shader. It looks like something fails there.
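
To make the lookup concrete, here is a small sketch of a variant cache of
this kind. It is only a model of the behaviour described above, not the
r600g code: shader_key, shader_variant and shader_selector are hypothetical
simplified types, and the byte-wise memcmp comparison is an assumption about
how keys are matched.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical key: the state that forces a different shader variant.
struct shader_key {
    uint32_t two_side;
    uint32_t nr_cbufs;
};

struct shader_variant {
    shader_key key;
    int id;  // stands in for the compiled shader
};

struct shader_selector {
    std::vector<shader_variant> variants;
    int builds = 0;  // how many times a variant had to be built

    // Return the variant matching `key`, building it only on a cache miss.
    const shader_variant &select(const shader_key &key) {
        for (const shader_variant &v : variants) {
            if (std::memcmp(&v.key, &key, sizeof(key)) == 0)
                return v;  // already built, reuse it
        }
        ++builds;  // miss: "compile" a new variant
        variants.push_back({key, builds});
        return variants.back();
    }
};
```

If two keys that should be identical differ in even one byte, every lookup
misses and the build counter keeps growing, which would look exactly like the
constant rebuilding in the dump.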


By the way, I wouldn't be very surprised if some old gcc release simply 
mishandles the bitfields that are used to store both the keys of 
shader variants in r600g and the bytecode data in r600-sb (the same data 
that ends up being broken in your glxgears dump). IIRC there were 
bitfield-related bugs.


Vadim



- Lauri





[Mesa-dev] [PATCH] r600g/sb: use hex instead of binary constants

2013-05-01 Thread Vadim Girlin
This should fix build issues with GCC < 4.3

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---

cc: Lauri Kasanen c...@gmx.com
Lauri, please test to make sure that I didn't miss anything.
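
For reference, a small sketch of the two masks this patch rewrites, spelled
without binary literals (a GNU extension that older GCC releases do not
accept). The names low_bits, MASK_VECTOR_SLOTS and MASK_SCALAR_SLOT are
hypothetical and only illustrate that the hex values carry the same bits:

```cpp
#include <cassert>

// Mask with the n lowest bits set, written without a binary literal.
constexpr unsigned low_bits(unsigned n) {
    return (1u << n) - 1u;
}

constexpr unsigned MASK_VECTOR_SLOTS = 0x0F;  // bits 0..3, binary 1111
constexpr unsigned MASK_SCALAR_SLOT  = 0x10;  // bit 4, binary 10000

static_assert(low_bits(4) == MASK_VECTOR_SLOTS, "0x0F covers four slots");
static_assert((1u << 4) == MASK_SCALAR_SLOT, "0x10 is the fifth slot");
```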

 src/gallium/drivers/r600/r600_shader.c   |  6 +++---
 src/gallium/drivers/r600/sb/sb_bc.h  |  4 ++--
 src/gallium/drivers/r600/sb/sb_bc_parser.cpp | 10 +-
 src/gallium/drivers/r600/sb/sb_ra_init.cpp   |  2 +-
 src/gallium/drivers/r600/sb/sb_sched.cpp |  8 
 5 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index fd3fe39..49218e5 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -1005,7 +1005,7 @@ static int tgsi_declaration(struct r600_shader_ctx *ctx)
r600_add_gpr_array(ctx-shader,
   
ctx-file_offset[TGSI_FILE_TEMPORARY] +
   
d-Range.First,
-  d-Range.Last - d-Range.First + 
1, 0b);
+  d-Range.Last - d-Range.First + 
1, 0x0F);
}
}
break;
@@ -1421,13 +1421,13 @@ static int r600_shader_from_tgsi(struct r600_screen 
*rscreen,
r600_add_gpr_array(shader, 
ctx.file_offset[TGSI_FILE_INPUT],
   ctx.file_offset[TGSI_FILE_OUTPUT] -
   ctx.file_offset[TGSI_FILE_INPUT],
-  0b1111);
+  0x0F);
}
if (ctx.info.indirect_files & (1 << TGSI_FILE_OUTPUT)) {
r600_add_gpr_array(shader, 
ctx.file_offset[TGSI_FILE_OUTPUT],
   ctx.file_offset[TGSI_FILE_TEMPORARY] 
-
   ctx.file_offset[TGSI_FILE_OUTPUT],
-  0b1111);
+  0x0F);
}
}
 
diff --git a/src/gallium/drivers/r600/sb/sb_bc.h 
b/src/gallium/drivers/r600/sb/sb_bc.h
index 0b9bc07..9c6ed46 100644
--- a/src/gallium/drivers/r600/sb/sb_bc.h
+++ b/src/gallium/drivers/r600/sb/sb_bc.h
@@ -553,9 +553,9 @@ public:
unsigned mask = 0;
unsigned slot_flags = alu_slots(op_ptr);
if (slot_flags & AF_V)
-   mask = 0b01111;
+   mask = 0x0F;
if (!is_cayman() && (slot_flags & AF_S))
-   mask |= 0b10000;
+   mask |= 0x10;
return mask;
}
 
diff --git a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp 
b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
index cc75528..e1478d3 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
@@ -126,7 +126,7 @@ int bc_parser::parse_decls() {
 
 #if SB_NO_ARRAY_INFO
 
-   sh->add_gpr_array(0, pshader->bc.ngpr, 0b1111);
+   sh->add_gpr_array(0, pshader->bc.ngpr, 0x0F);
 
 #else
 
@@ -140,7 +140,7 @@ int bc_parser::parse_decls() {
}
 
} else {
-   sh->add_gpr_array(0, pshader->bc.ngpr, 0b1111);
+   sh->add_gpr_array(0, pshader->bc.ngpr, 0x0F);
}
 
 
@@ -149,7 +149,7 @@ int bc_parser::parse_decls() {
}
 
if (sh->target == TARGET_VS)
-   sh->add_input(0, 1, 0b1111);
+   sh->add_input(0, 1, 0x0F);
 
bool ps_interp = ctx.hw_class >= HW_CLASS_EVERGREEN
 && sh->target == TARGET_PS;
@@ -159,7 +159,7 @@ int bc_parser::parse_decls() {
for (unsigned i = 0; i < pshader->ninput; ++i) {
r600_shader_io & in = pshader->input[i];
bool preloaded = sh->target == TARGET_PS && !(ps_interp && 
in.spi_sid);
-   sh->add_input(in.gpr, preloaded, /*in.write_mask*/ 0b1111);
+   sh->add_input(in.gpr, preloaded, /*in.write_mask*/ 0x0F);
if (ps_interp && in.spi_sid) {
if (in.interpolate == TGSI_INTERPOLATE_LINEAR ||
in.interpolate == 
TGSI_INTERPOLATE_COLOR)
@@ -176,7 +176,7 @@ int bc_parser::parse_decls() {
unsigned gpr = 0;
 
while (mask) {
-   sh->add_input(gpr, true, mask & 0b1111);
+   sh->add_input(gpr, true, mask & 0x0F);
++gpr;
mask >>= 4;
}
diff --git a/src/gallium/drivers/r600/sb/sb_ra_init.cpp 
b/src/gallium/drivers/r600/sb/sb_ra_init.cpp
index 75b2d5d..0447f29 100644
--- a/src/gallium/drivers/r600/sb/sb_ra_init.cpp
+++ b/src/gallium/drivers/r600/sb/sb_ra_init.cpp
@@ -360,7 +360,7 @@ void

Re: [Mesa-dev] r600-sb: glxgears wrong rendering

2013-05-01 Thread Vadim Girlin

On 05/01/2013 11:42 PM, Lauri Kasanen wrote:

Hi

Running R600_DEBUG=sb glxgears on a RV710 gives wrong output:
http://i40.tinypic.com/t7gx09.png

This is on current master, git-8eef6ad.

Let me know what you need to debug this.


Please send me the output with R600_DEBUG=sb,ps,vs

Vadim


Re: [Mesa-dev] r600 sb test results

2013-05-01 Thread Vadim Girlin

On 05/01/2013 11:36 PM, Lauri Kasanen wrote:

Hi Vadim

Now that it built, I could test your optimizations in my own apps.
These are on current master 8eef6ad, on a RV710 (HD 4350 pci-e).

In one of my private apps, using R600_DEBUG=sb caused regressions: FPS
went from 28 to 7, the SSAO shader gave visual distortions/flicker, and
the cpu was constantly pegged.

Here's the output from R600_DEBUG=sb,sbstat in case it helps:
http://bayfiles.net/file/Pmkh/PUj0Ru/vadim.gz

It seems as if it's constantly handling new shaders? My app certainly
issues no new shaders, they are all linked when the app starts.


Hi,

r600g may rebuild shaders at runtime because some GL features are 
implemented in shader code, so if your app changes some specific GL 
states (e.g. two-sided rendering mode), then r600g has to build and 
switch between different shader variants.


On the other hand, there is caching of shader variants in r600g, 
implemented specifically to prevent repeated rebuilding of shaders. It looks 
like it doesn't work in your case for some reason. Optimization takes 
more time than rebuilding with the default backend, which explains the 
performance regression.


Could you provide some test app that reproduces these issues?

Please also send me the dump with R600_DEBUG=sb,ps,vs, maybe I'll be 
able to spot anything wrong there.




Let me know what you need to debug this.

- Lauri

PS: I'm not sure if this should be public or not, I think you're the
only one working on it?


Yes, I doubt that anyone else will work on it, on the other hand I think 
reporting this on the list might help other users who will possibly hit 
similar issues. Also at least in this case it looks rather like a 
problem in r600g, so I'm cc'ing mesa-dev, r600-sb just made this issue 
more noticeable because shader rebuilding with optimization requires 
more time.


Vadim


[Mesa-dev] [PATCH] r600g: mask unused source components for SAMPLE

2013-04-27 Thread Vadim Girlin
This results in cleaner shader code and may improve the quality of
optimized code produced by r600-sb due to eliminated false dependencies
in some cases.

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---

There are no piglit regressions with this patch on evergreen.

I consider this as a prerequisite for r600-sb branch, it fixes the performance
regression with optimized shaders uncovered by some recent changes to tgsi
and/or r600 codegen.

If there are no objections or new suggestions, is it OK to push the latest
version of r600-sb-2 branch [1] that includes this patch?

The changes in the branch after the recent mail include 3 additional patches
to improve handling of some corner cases (they fix some issues reported on IRC),
also they add switching to unoptimized code in case of possible internal
optimization problems, and new option sbnofallback for R600_DEBUG to disable
such fallback. 

Vadim

  [1] http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-sb-2

 src/gallium/drivers/r600/r600_shader.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 0204f80..aa88252 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -4739,6 +4739,26 @@ static int tgsi_tex(struct r600_shader_ctx *ctx)
/* the array index is read from Z */
tex.coord_type_z = 0;
 
+   /* mask unused source components */
+   if (opcode == FETCH_OP_SAMPLE) {
+   switch (inst->Texture.Texture) {
+   case TGSI_TEXTURE_2D:
+   case TGSI_TEXTURE_RECT:
+   tex.src_sel_z = 7;
+   tex.src_sel_w = 7;
+   break;
+   case TGSI_TEXTURE_1D_ARRAY:
+   tex.src_sel_y = 7;
+   tex.src_sel_w = 7;
+   break;
+   case TGSI_TEXTURE_1D:
+   tex.src_sel_y = 7;
+   tex.src_sel_z = 7;
+   tex.src_sel_w = 7;
+   break;
+   }
+   }
+
r = r600_bytecode_add_tex(ctx->bc, &tex);
if (r)
return r;
-- 
1.8.2.1



Re: [Mesa-dev] [PATCH] r600g: mask unused source components for SAMPLE

2013-04-27 Thread Vadim Girlin

On 04/27/2013 02:53 PM, Marek Olšák wrote:

Reviewed-by: Marek Olšák mar...@gmail.com

This looks incomplete though. There are a lot more texture opcodes and
texture targets which could be handled there as well.


Yes, this patch handles the most trivial cases, though I think they are the 
most frequently used cases as well. Also it covers all the cases known to me 
where this caused problems for optimization.


I'll look into other cases later - they are more complex, so there are 
more chances to break something (I'm not sure about piglit coverage for 
this), and IIRC many of them either actually use all components of the 
source register or modify the swizzles in such a way that there are no 
unused components, e.g. xyzz with SHADOW2D/SAMPLE_C.
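
For the simple cases the patch covers, the selection logic can be sketched as a standalone function. This is a simplified illustration, not the driver code: `TexTarget`, `SEL_UNUSED` and `mask_unused` are hypothetical names, and the value 7 is simply the one the patch writes into the unused source selects.

```cpp
#include <array>

// Hypothetical stand-ins for the TGSI texture targets handled by the patch.
enum class TexTarget { Tex1D, Tex1DArray, Tex2D, TexRect, Other };

constexpr int SEL_UNUSED = 7;  // value the patch uses for a masked component

// Takes the source selects {x, y, z, w} of a plain SAMPLE and masks the
// components that the given target does not read.
std::array<int, 4> mask_unused(TexTarget target, std::array<int, 4> sel) {
    switch (target) {
    case TexTarget::Tex2D:
    case TexTarget::TexRect:
        sel[2] = sel[3] = SEL_UNUSED;           // only x, y are read
        break;
    case TexTarget::Tex1DArray:
        sel[1] = sel[3] = SEL_UNUSED;           // x, plus array index in z
        break;
    case TexTarget::Tex1D:
        sel[1] = sel[2] = sel[3] = SEL_UNUSED;  // only x is read
        break;
    case TexTarget::Other:
        break;                                  // left untouched
    }
    return sel;
}
```

With fewer live source components, the optimizer has fewer apparent dependencies on the source register to honor.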


Vadim


Marek

On Sat, Apr 27, 2013 at 10:29 AM, Vadim Girlin vadimgir...@gmail.com wrote:

This results in cleaner shader code and may improve the quality of
optimized code produced by r600-sb due to eliminated false dependencies
in some cases.

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---

There are no piglit regressions with this patch on evergreen.

I consider this as a prerequisite for r600-sb branch, it fixes the performance
regression with optimized shaders uncovered by some recent changes to tgsi
and/or r600 codegen.

If there are no objections or new suggestions, is it OK to push the latest
version of r600-sb-2 branch [1] that includes this patch?

The changes in the branch after the recent mail include 3 additional patches
to improve handling of some corner cases (they fix some issues reported on IRC),
also they add switching to unoptimized code in case of possible internal
optimization problems, and new option sbnofallback for R600_DEBUG to disable
such fallback.

Vadim

   [1] http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-sb-2

  src/gallium/drivers/r600/r600_shader.c | 20 
  1 file changed, 20 insertions(+)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 0204f80..aa88252 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -4739,6 +4739,26 @@ static int tgsi_tex(struct r600_shader_ctx *ctx)
 /* the array index is read from Z */
 tex.coord_type_z = 0;

+   /* mask unused source components */
+   if (opcode == FETCH_OP_SAMPLE) {
+   switch (inst->Texture.Texture) {
+   case TGSI_TEXTURE_2D:
+   case TGSI_TEXTURE_RECT:
+   tex.src_sel_z = 7;
+   tex.src_sel_w = 7;
+   break;
+   case TGSI_TEXTURE_1D_ARRAY:
+   tex.src_sel_y = 7;
+   tex.src_sel_w = 7;
+   break;
+   case TGSI_TEXTURE_1D:
+   tex.src_sel_y = 7;
+   tex.src_sel_z = 7;
+   tex.src_sel_w = 7;
+   break;
+   }
+   }
+
 r = r600_bytecode_add_tex(ctx->bc, &tex);
 if (r)
 return r;
--
1.8.2.1



Re: [Mesa-dev] r600g: status of the r600-sb branch

2013-04-21 Thread Vadim Girlin

On 04/21/2013 04:04 AM, Marek Olšák wrote:

Ah, I didn't know you had any other env vars. It's preferable to have
as many boolean flags as possible handled by a single env var, because
it's easier to use (R600_DUMP_SHADERS counts as a pretty ugly list of
boolean flags hidden behind a magic number). Feel free to have
separate env vars for more complex parameters.

I skimmed through some of your code and the coding style looks good.
I'm also okay with C++, it really seems like the right choice here.
However I agree with the argument that one header file per cpp might
not always be a good idea, especially if the header file is pretty
small.



Thanks for reviewing. I pushed to my repo the branch with the following 
changes:


- changes to existing r600g code split out from the main big patch

- small header files merged into sb_pass.h, sb_ir.h, sb_bc.h

- added new R600_DEBUG flags to replace multiple env vars:
sb - Enable optimization of graphics shaders
sbcl - Enable optimization of compute shaders
sbdry - Dry run, optimize but don't use new bytecode
sbstat - Print optimization statistics (currently the time only)
sbdump - Print IR after some passes.

- added debug_id (shader index) to struct r600_bytecode, id's are 
assigned to each shader in r600_bytecode_init and printed in the shader 
dump header, it's intended to avoid reinventing shader numbering in 
different places for dumps and debugging.


- some minor cleanups
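
As an illustration of how a single comma-separated variable can carry these boolean flags, here is a minimal sketch. It is not the Mesa implementation: the `DBG_*` bit values, `FlagName` and `parse_debug_flags` are hypothetical, and only the flag names are taken from the list above.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical bit assignments for the flags listed above.
enum DebugFlag : std::uint32_t {
    DBG_SB     = 1u << 0,
    DBG_SBCL   = 1u << 1,
    DBG_SBDRY  = 1u << 2,
    DBG_SBSTAT = 1u << 3,
    DBG_SBDUMP = 1u << 4,
};

struct FlagName {
    const char *name;
    std::uint32_t bit;
};

// Parse a value such as "sb,sbstat" into a bitmask; unknown names are ignored.
std::uint32_t parse_debug_flags(const std::string &value) {
    static const FlagName table[] = {
        {"sb", DBG_SB},         {"sbcl", DBG_SBCL},     {"sbdry", DBG_SBDRY},
        {"sbstat", DBG_SBSTAT}, {"sbdump", DBG_SBDUMP},
    };
    std::uint32_t flags = 0;
    std::istringstream in(value);
    std::string token;
    while (std::getline(in, token, ',')) {
        for (const FlagName &entry : table)
            if (token == entry.name)
                flags |= entry.bit;
    }
    return flags;
}
```

For example, `R600_DEBUG=sb,sbstat` would map to `DBG_SB | DBG_SBSTAT` in this sketch.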

Updated branch can be found here:

  http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-sb-2

Vadim


Marek

On Sat, Apr 20, 2013 at 11:02 AM, Vadim Girlin vadimgir...@gmail.com wrote:

On 04/20/2013 03:11 AM, Marek Olšák wrote:


Please don't add any new environment variables and use R600_DEBUG
instead. The other environment variables are deprecated.



I agree, those vars probably need some cleanup, they were added before
R600_DEBUG appeared.

Though I'm afraid some of my options won't fit well into the R600_DEBUG
flags, unless we'll add support for the name/value pairs with optional
custom parsers.

E.g. I have a group of env vars to define the range of included/excluded
shaders for optimization and mode (include/exclude/off), I thought about
doing this with a single var and custom parser to specify the range e.g. as
10-20, but after all it's just a debug feature, not intended for everyday
use, and so far I failed to convince myself that it's worth the effort.

I can implement the support for custom parsers for R600_DEBUG, but do we
really need it? Maybe it would be enough to add e.g. sb instead of R600_SB
var to the R600_DEBUG flags for enabling it (probably together with other
boolean options such as R600_SB_USE_NEW_BYTECODE) but leave more complicated
internal debug options as is?

Vadim



There is a table for R600_DEBUG in r600_pipe.c and it even comes with
a help feature: R600_DEBUG=help

Marek

On Fri, Apr 19, 2013 at 4:48 PM, Vadim Girlin vadimgir...@gmail.com
wrote:


Hi,

In the previous status update I said that the r600-sb branch is not ready to
be merged yet, but recently I've done some cleanups and reworks, and though
I haven't finished everything that I planned initially, I think now it's in
a better state and may be considered for merging.

I'm interested to know if the people think that merging of the r600-sb
branch makes sense at all. I'll try to explain here why it makes sense to
me.

Although I understand that the development of llvm backend is a primary goal
for the r600g developers, it's a complicated process and may require quite
some time to achieve good results regarding the shader/compiler performance,
and at the same time this branch already works and provides good results in
many cases. That's why I think it makes sense to merge this branch as a
non-default backend at least as a temporary solution for shader performance
problems. We can always get rid of it if it becomes too much a maintenance
burden or when llvm backend catches up in terms of shader performance and
compilation speed/overhead.

Regarding the support and maintenance of this code, I'll try to do my best
to fix possible issues, and so far there are no known unfixed issues. I
tested it with many apps on evergreen and fixed all issues with other chips
that were reported to me on the list or privately after the last status
announce. There are no piglit regressions on evergreen when this branch is
used with both default and llvm backends.

This code was intentionally separated as much as possible from the other
parts of the driver, basically there are just two functions used from r600g,
and the shader code is passed to/from r600-sb as a hardware bytecode that is
not going to change. I think it won't require any modifications at all to
keep it in sync with the most changes in r600g.

Some work might be required though if we'll want to add support for the new
hw features that are currently unused, e.g. geometry shaders, new
instruction types for compute shaders, etc, but I think

Re: [Mesa-dev] r600g: status of the r600-sb branch

2013-04-20 Thread Vadim Girlin

On 04/20/2013 03:11 AM, Marek Olšák wrote:

Please don't add any new environment variables and use R600_DEBUG
instead. The other environment variables are deprecated.


I agree, those vars probably need some cleanup, they were added before 
R600_DEBUG appeared.


Though I'm afraid some of my options won't fit well into the R600_DEBUG 
flags, unless we'll add support for the name/value pairs with optional 
custom parsers.


E.g. I have a group of env vars to define the range of included/excluded 
shaders for optimization and mode (include/exclude/off), I thought about 
doing this with a single var and custom parser to specify the range e.g. 
as 10-20, but after all it's just a debug feature, not intended for 
everyday use, and so far I failed to convince myself that it's worth the 
efforts.


I can implement the support for custom parsers for R600_DEBUG, but do we 
really need it? Maybe it would be enough to add e.g. sb instead of 
R600_SB var to the R600_DEBUG flags for enabling it (probably together 
with other boolean options such as R600_SB_USE_NEW_BYTECODE) but leave 
more complicated internal debug options as is?
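
For what it's worth, a custom parser for a shader range such as 10-20 would not need to be large. The following is a hypothetical sketch, not code from the branch; `ShaderRange`, `parse_range` and `in_range` are made-up names.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical inclusive range of shader ids.
struct ShaderRange {
    unsigned first;
    unsigned last;
    bool valid;
};

// Accepts "N" or "N-M" with N <= M; anything else yields valid == false.
ShaderRange parse_range(const std::string &s) {
    ShaderRange r{0, 0, false};
    const char *p = s.c_str();
    if (!std::isdigit(static_cast<unsigned char>(*p)))
        return r;
    char *end = nullptr;
    unsigned long a = std::strtoul(p, &end, 10);
    unsigned long b = a;
    if (*end == '-') {
        const char *q = end + 1;
        if (!std::isdigit(static_cast<unsigned char>(*q)))
            return r;
        b = std::strtoul(q, &end, 10);
    }
    if (*end != '\0' || a > b)
        return r;
    r.first = static_cast<unsigned>(a);
    r.last = static_cast<unsigned>(b);
    r.valid = true;
    return r;
}

// True if shader `id` falls inside a valid range.
bool in_range(const ShaderRange &r, unsigned id) {
    return r.valid && id >= r.first && id <= r.last;
}
```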


Vadim


There is a table for R600_DEBUG in r600_pipe.c and it even comes with
a help feature: R600_DEBUG=help

Marek

On Fri, Apr 19, 2013 at 4:48 PM, Vadim Girlin vadimgir...@gmail.com wrote:

Hi,

In the previous status update I said that the r600-sb branch is not ready to
be merged yet, but recently I've done some cleanups and reworks, and though
I haven't finished everything that I planned initially, I think now it's in
a better state and may be considered for merging.

I'm interested to know if the people think that merging of the r600-sb
branch makes sense at all. I'll try to explain here why it makes sense to
me.

Although I understand that the development of llvm backend is a primary goal
for the r600g developers, it's a complicated process and may require quite
some time to achieve good results regarding the shader/compiler performance,
and at the same time this branch already works and provides good results in
many cases. That's why I think it makes sense to merge this branch as a
non-default backend at least as a temporary solution for shader performance
problems. We can always get rid of it if it becomes too much a maintenance
burden or when llvm backend catches up in terms of shader performance and
compilation speed/overhead.

Regarding the support and maintenance of this code, I'll try to do my best
to fix possible issues, and so far there are no known unfixed issues. I
tested it with many apps on evergreen and fixed all issues with other chips
that were reported to me on the list or privately after the last status
announce. There are no piglit regressions on evergreen when this branch is
used with both default and llvm backends.

This code was intentionally separated as much as possible from the other
parts of the driver, basically there are just two functions used from r600g,
and the shader code is passed to/from r600-sb as a hardware bytecode that is
not going to change. I think it won't require any modifications at all to
keep it in sync with the most changes in r600g.

Some work might be required though if we'll want to add support for the new
hw features that are currently unused, e.g. geometry shaders, new
instruction types for compute shaders, etc, but I think I'll be able to
catch up when it's implemented in the driver and default or llvm backend.
E.g. this branch already works for me on evergreen with some simple OpenCL
kernels, including bfgminer where it increases performance of the kernel
compiled with llvm backend by more than 20% for me.

Besides the performance benefits, I think that alternative backend also
might help with debugging of the default or llvm backend, in some cases it
helped me by exposing the bugs that are not very obvious otherwise, e.g. it
may be hard to compare the dumps from default and llvm backend to spot the
regression because they are too different, but after processing both shaders
with r600-sb the code is usually transformed to some more common form, and
often this makes it easier to compare and find the differences in shader
logic.

One additional feature that might help with llvm backend debugging is the
disassembler that works on the hardware bytecode instead of the internal
r600g bytecode structs. This results in the more readable shader dumps for
instructions passed in native hw encoding from llvm backend. I think this
also can help to catch more potential bugs related to bytecode building in
r600g/llvm. Currently r600-sb uses its bytecode disassembler for all shader
dumps, including the fetch shaders, even when optimization is not enabled.
Basically it can replace r600_bytecode_disasm and related code completely.

Below are some quick benchmarks for shader performance and compilation time,
to demonstrate that currently r600-sb might provide better performance for
users, at least in some cases.

As an example of the shaders

Re: [Mesa-dev] r600g: status of the r600-sb branch

2013-04-20 Thread Vadim Girlin

On 04/20/2013 01:42 PM, Christian König wrote:

Am 19.04.2013 18:50, schrieb Vadim Girlin:

On 04/19/2013 08:35 PM, Christian König wrote:

Hey Vadim,

Am 19.04.2013 18:18, schrieb Vadim Girlin:

[SNIP]

In theory, yes, some optimizations in this branch are typically used
on the earlier compilation stages, not on the target machine code. On
the other hand, there are some differences that might make it harder,
e.g. many algorithms require SSA form, and though it's possible to do
similar optimizations without SSA, it would be hard to implement. Also
I wanted to support both default backend and llvm backend for
increased testing coverage and to be able to compare the efficiency of
the algorithms in my experiments etc.


Yeah I know, missing an SSA implementation is also something that always
bothered me a bit with both TGSI and GLSL (while I haven't done much
with GLSL, so maybe I misjudge here).

Can you name the different algorithms used?


There is a short description of the algorithms and passes in the
notes.markdown file [1] in that branch, there are also links in the
end to the full description of some algorithms, though some of them
were modified/adapted for this branch.


It's not a strict prerequisite, but I think we both agree that doing
things like LICM on R600 bytecode isn't the best idea over all (when
doing it on GLSL would be beneficial for all drivers not only r600).


In fact there is no special LICM pass, it's done by the GCM (Global
Code Motion, [2]), which probably could be also called global
scheduler. In fact in my branch this pass is combined with some
hw-specific scheduling logic, e.g. grouping fetch/alu instructions to
reduce clause type switching in the code and the number of required CF
instructions, potentially it can also schedule clauses to expose more
parallelism with the BARRIER bit usage.
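
A toy model of the clause-grouping idea, under the strong assumption that all instructions are independent (a real scheduler has to respect data dependencies and clause size limits). `Kind`, `count_clauses` and `group_by_kind` are hypothetical names used only for this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

enum class Kind { Alu, Fetch };

// Each maximal run of same-kind instructions forms one clause, and each
// clause costs one CF instruction.
int count_clauses(const std::vector<Kind> &code) {
    int clauses = 0;
    for (std::size_t i = 0; i < code.size(); ++i)
        if (i == 0 || code[i] != code[i - 1])
            ++clauses;
    return clauses;
}

// Assuming every instruction is independent, moving all fetches in front of
// all ALU instructions leaves at most two clauses.
std::vector<Kind> group_by_kind(std::vector<Kind> code) {
    std::stable_partition(code.begin(), code.end(),
                          [](Kind k) { return k == Kind::Fetch; });
    return code;
}
```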



Yeah I already thought that you're using something like this.

On one hand that is really good, cause it is specialized on so produces
really optimal code for the r600 target. But on the other hand it's bad,
cause it is specialized on so produces really optimal code ONLY on the
r600 target


I think such pass on higher level (GLSL IR or TGSI) would at least need 
some callbacks or caps to be tunable for the target.


Anyway the result of GCM pass is affected by the CFG structure, so when 
the target applies e.g. if-conversion or any other target-specific 
control flow optimization, this means that you might want to apply 
similar pass again on the target instruction level for better results, 
and then previous pass on higher level IR looks not very useful.


Also there are some high-level operations that are translated to a 
bunch of target instructions, e.g. integer division on r600. A high-level 
pass can't hoist i/5 (where i is the loop counter) out of the loop, but 
after translation to target instructions it's possible to hoist some of 
the resulting instructions, producing more efficient code.
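
To make the division example concrete, here is a sketch of the effect. It is a simplified stand-in for a multi-instruction lowering, not the actual r600 sequence; `divide_all` is a hypothetical name and `d` must be nonzero. The reciprocal depends only on the divisor, so it can sit outside the loop, while the multiply and the correction step stay inside.

```cpp
#include <cstdint>
#include <vector>

// Computes i / d for i in [0, n) using a lowered sequence of
// reciprocal, multiply-high and one correction step. Requires d != 0.
std::vector<std::uint32_t> divide_all(std::uint32_t n, std::uint32_t d) {
    // Loop-invariant part: depends only on d, so it is hoisted.
    const std::uint64_t rcp = (std::uint64_t{1} << 32) / d;

    std::vector<std::uint32_t> out;
    out.reserve(n);
    for (std::uint32_t i = 0; i < n; ++i) {
        // The estimate is either the exact quotient or one too small.
        std::uint32_t q = static_cast<std::uint32_t>((i * rcp) >> 32);
        std::uint32_t r = i - q * d;
        if (r >= d)
            ++q;  // single correction step
        out.push_back(q);
    }
    return out;
}
```

At the source level the whole `i / d` is one operation that depends on `i`, so nothing can be hoisted; only after lowering does the invariant part become visible.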


One more point is that GCM achieves the best efficiency when used 
together with a GVN (Global Value Numbering) pass, e.g. GCM allows GVN to 
not care about code placement during elimination of redundant operations, 
so you'll probably want to implement a high-level GVN pass as well.
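
The core of value numbering can be sketched in a few lines. This is a heavily simplified, hash-table style illustration, not the GVN pass from the branch: `Expr` and `number_values` are hypothetical, operands are already value numbers, and commutativity is not handled.

```cpp
#include <map>
#include <tuple>
#include <vector>

// An operation whose operands are value numbers of earlier values.
struct Expr {
    char op;
    int lhs;
    int rhs;
};

// Assigns a value number to every expression. Expressions with the same
// operator and operand numbers get the same number, i.e. they are redundant.
// `first_free` is the first number not already taken by the inputs.
std::vector<int> number_values(const std::vector<Expr> &exprs, int first_free) {
    std::map<std::tuple<char, int, int>, int> table;
    std::vector<int> numbers;
    int next = first_free;
    for (const Expr &e : exprs) {
        auto key = std::make_tuple(e.op, e.lhs, e.rhs);
        auto it = table.find(key);
        if (it == table.end())
            it = table.emplace(key, next++).first;
        numbers.push_back(it->second);
    }
    return numbers;
}
```

Note that this only decides which computations are equal; where the single surviving instruction should be placed is left to the code motion pass.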


I think it's possible to implement GVN-GCM on the GLSL or TGSI level, but I 
suspect it will require a lot more effort than the implementation of these 
passes in my branch did, and will be less efficient.




Just speculating, what would it take to make those passes run on the
LLVM Machine Instruction representation instead of your own representation?


The main difference between the IRs is the representation of control flow. 
r600-sb relies on the fact that the r600 arch doesn't have arbitrary control 
flow, which renders CFGs superfluous. Implementing these passes on 
CFGs will be more complicated; it will also require the computation of 
dominance frontiers, loop detection and analysis, etc. On r600-sb's 
IR these passes are greatly simplified.
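
To illustrate why structured control flow simplifies things, consider a tree-shaped IR where loops and ifs are explicit nodes. This is only a sketch with hypothetical names (`NodeKind`, `Node`, `max_loop_depth`), not the real r600-sb IR. Loop nesting falls out of a plain recursion, with no dominator computation or loop detection.

```cpp
#include <algorithm>
#include <memory>
#include <vector>

enum class NodeKind { Block, If, Loop };

// A structured region: loops and ifs contain their bodies as children.
struct Node {
    NodeKind kind;
    std::vector<std::unique_ptr<Node>> children;
    explicit Node(NodeKind k) : kind(k) {}
};

// Maximum loop nesting depth of the region rooted at `n`.
int max_loop_depth(const Node &n) {
    int inner = 0;
    for (const auto &child : n.children)
        inner = std::max(inner, max_loop_depth(*child));
    return inner + (n.kind == NodeKind::Loop ? 1 : 0);
}
```

On a general CFG the same question first requires finding back edges and natural loops.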


Regarding the GCM, original algorithm as described in that pdf works on 
the CFG, so it shouldn't be hard to implement in LLVM, but I'm not sure 
how it will fit into the LLVM infrastructure. LLVM has GVN-PRE, LICM and 
other passes that together do basically the same thing as GVN-GCM, so if 
you implement it, you might want to get rid of LLVM's own passes that 
duplicate the same functionality, and I'm not sure if this would be 
easy, possibly there are some interdependencies etc. Also I saw mentions 
of some plans (e.g. [1],[2]) regarding the implementation of global code 
motion in LLVM, looks like there is already some work in progress.


Vadim

[1] 
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120709/146206.html
[2] 
http://markmail.org/message/2td3fnnggk6oripp#query:+page:1+mid:2td3fnnggk6oripp+state:results




Christian.


Vadim

 [1]
http://cgit.freedesktop.org/~vadimg/mesa/tree/src/gallium/drivers/r600/sb/notes.markdown?h=r600-sb

 [2

Re: [Mesa-dev] r600g: status of the r600-sb branch

2013-04-20 Thread Vadim Girlin

On 04/20/2013 03:38 PM, Christian König wrote:

Am 20.04.2013 13:12, schrieb Vadim Girlin:

On 04/20/2013 01:42 PM, Christian König wrote:

Am 19.04.2013 18:50, schrieb Vadim Girlin:

On 04/19/2013 08:35 PM, Christian König wrote:

Hey Vadim,

Am 19.04.2013 18:18, schrieb Vadim Girlin:

[SNIP]

In theory, yes, some optimizations in this branch are typically used
on the earlier compilation stages, not on the target machine code. On
the other hand, there are some differences that might make it harder,
e.g. many algorithms require SSA form, and though it's possible to do
similar optimizations without SSA, it would be hard to implement. Also
I wanted to support both default backend and llvm backend for
increased testing coverage and to be able to compare the efficiency of
the algorithms in my experiments etc.


Yeah I know, missing an SSA implementation is also something that always
bothered me a bit with both TGSI and GLSL (while I haven't done much
with GLSL, so maybe I misjudge here).

Can you name the different algorithms used?


There is a short description of the algorithms and passes in the
notes.markdown file [1] in that branch, there are also links in the
end to the full description of some algorithms, though some of them
were modified/adapted for this branch.


It's not a strict prerequisite, but I think we both agree that doing
things like LICM on R600 bytecode isn't the best idea over all (when
doing it on GLSL would be beneficial for all drivers not only r600).


In fact there is no special LICM pass, it's done by the GCM (Global
Code Motion, [2]), which probably could be also called global
scheduler. In fact in my branch this pass is combined with some
hw-specific scheduling logic, e.g. grouping fetch/alu instructions to
reduce clause type switching in the code and the number of required CF
instructions, potentially it can also schedule clauses to expose more
parallelism with the BARRIER bit usage.



Yeah I already thought that you're using something like this.

On one hand that is really good, cause it is specialized on so produces
really optimal code for the r600 target. But on the other hand it's bad,
cause it is specialized on so produces really optimal code ONLY on the
r600 target


I think such pass on higher level (GLSL IR or TGSI) would at least
need some callbacks or caps to be tunable for the target.

Anyway the result of GCM pass is affected by the CFG structure, so
when the target applies e.g. if-conversion or any other
target-specific control flow optimization, this means that you might
want to apply similar pass again on the target instruction level for
better results, and then previous pass on higher level IR looks not
very useful.

Also there are some high level operations that are translated to the
bunch of target instructions, e.g. integer division on r600.
High-level pass can't hoist i/5 (where i is loop counter) out of the
loop, but after translation to target instructions it's possible to
hoist some of the resulting instructions, producing more efficient code.

One more point is that GCM allows to achieve best efficiency when used
with GVN (Global Value Numbering) pass, e.g. GCM allows GVN to not
care about code placement during elimination of redundant operations,
so you'll probably want to implement high-level GVN pass as well.

I think it's possible to implement GVN-GCM on GLSL or TGSI level, but
I suspect it will require a lot more efforts than it was required by
implementation of these passes in my branch, and will be less efficient.



Just speculating, what would it take to make those passes run on the
LLVM Machine Instruction representation instead of your own
representation?


Main difference between IRs is the representation of control flow,
r600-sb relies on the fact that r600 arch doesn't have arbitrary
control flow, this renders CFGs superfluous. Implementation of these
passes on CFGs will be more complicated, it will also require the
computation of dominance frontiers, loops detection and analysis, etc.
On the r600-sb's IR these passes are greatly simplified.

Regarding the GCM, original algorithm as described in that pdf works
on the CFG, so it shouldn't be hard to implement in LLVM, but I'm not
sure how it will fit into the LLVM infrastructure. LLVM has GVN-PRE,
LICM and other passes that together do basically the same thing as
GVN-GCM, so if you implement it, you might want to get rid of LLVM's
own passes that duplicate the same functionality, and I'm not sure if
this would be easy, possibly there are some interdependencies etc.
Also I saw mentions of some plans (e.g. [1],[2]) regarding the
implementation of global code motion in LLVM, looks like there is
already some work in progress.



Oh, I wasn't talking about replacing any LLVM passes, more like extending
them to provide the same amount of functionality. Also I hadn't had LLVM
IR in mind while writing this, but more the machine instruction
representation they use.

Well you have quite a lot of C

Re: [Mesa-dev] r600g: status of the r600-sb branch

2013-04-20 Thread Vadim Girlin

On 04/20/2013 07:05 PM, Henri Verbeet wrote:

On 19 April 2013 18:01, Vadim Girlin vadimgir...@gmail.com wrote:

The choice of C++ (unlike in my previous branch that used C) was mostly
driven by the fact that optimization algorithms usually deal with a lot of
different complex data structures, containers, etc, and C++ allows to
isolate implementation of all such things in separate and easily replaceable
classes and concentrate on the logic, making the code more clean and
readable.


I'm sure it would be good fun to have a discussion about the relative
merits of C and C++, though I think I've seen enough actual C++ that
you're not going to convince me it's the better language.


I never wanted to convince you that C++ is a better language, I just 
wanted to explain why I decided to switch from C to C++ in this 
particular case.



However, I
don't think that should be the main consideration. It's probably more
important to consider what current and potential new contributors
prefer, and on Linux, particularly for the more low-level stuff, I
suspect that pretty much means C.


Well, it may be considered low-level stuff because it's a part of 
the driver. On the other hand, I'd rather think of it as a part of the 
compiler, and compilers (especially optimization algorithms) don't 
really look like low-level stuff to me. Depends on the definition of 
low-level stuff though.


To name a few examples, we can look at the compilers/optimizing backends 
used by mesa/gallium: GLSL compiler (written in C++), LLVM (written in 
C++), backends for nvidia drivers (written in C++)...


Vadim




I haven't tried to keep it as a series of independent patches because during
the development most changes were pretty intrusive and introduced new
features, some parts were seriously reworked/rewritten more than one time,
requiring changes in other parts, especially when intermediate
representation of the code was changed. It was usually easier for me to
simply fix the new regressions in the new code than to revert any changes
and lose new features, so bisection wouldn't be very helpful anyway. That's
why I didn't even try to keep the history. Anyway most of the code in the
branch is new, so I don't think that the history of the patches that rewrite
the same code few times during a development would make it more readable
than simply reading the final code.


I think I'm just going to disagree there. (But of course that's all
just my personal opinion, which probably doesn't carry a lot of weight
at the moment.)



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] r600g: status of the r600-sb branch

2013-04-19 Thread Vadim Girlin

Hi,

In the previous status update I said that the r600-sb branch is not 
ready to be merged yet, but recently I've done some cleanups and 
reworks, and though I haven't finished everything that I planned 
initially, I think now it's in a better state and may be considered for 
merging.


I'm interested to know if the people think that merging of the r600-sb 
branch makes sense at all. I'll try to explain here why it makes sense 
to me.


Although I understand that the development of llvm backend is a primary 
goal for the r600g developers, it's a complicated process and may 
require quite some time to achieve good results regarding the 
shader/compiler performance, and at the same time this branch already 
works and provides good results in many cases. That's why I think it 
makes sense to merge this branch as a non-default backend at least as a 
temporary solution for shader performance problems. We can always get 
rid of it if it becomes too much a maintenance burden or when llvm 
backend catches up in terms of shader performance and compilation 
speed/overhead.


Regarding the support and maintenance of this code, I'll try to do my 
best to fix possible issues, and so far there are no known unfixed 
issues. I tested it with many apps on evergreen and fixed all issues 
with other chips that were reported to me on the list or privately after 
the last status announcement. There are no piglit regressions on evergreen 
when this branch is used with both default and llvm backends.


This code was intentionally separated as much as possible from the other 
parts of the driver, basically there are just two functions used from 
r600g, and the shader code is passed to/from r600-sb as a hardware 
bytecode that is not going to change. I think it won't require any 
modifications at all to keep it in sync with the most changes in r600g.


Some work might be required though if we'll want to add support for the 
new hw features that are currently unused, e.g. geometry shaders, new 
instruction types for compute shaders, etc, but I think I'll be able to 
catch up when it's implemented in the driver and default or llvm 
backend. E.g. this branch already works for me on evergreen with some 
simple OpenCL kernels, including bfgminer where it increases performance 
of the kernel compiled with llvm backend by more than 20% for me.


Besides the performance benefits, I think that alternative backend also 
might help with debugging of the default or llvm backend, in some cases 
it helped me by exposing the bugs that are not very obvious otherwise, 
e.g. it may be hard to compare the dumps from default and llvm backend 
to spot the regression because they are too different, but after 
processing both shaders with r600-sb the code is usually transformed to 
some more common form, and often this makes it easier to compare and 
find the differences in shader logic.


One additional feature that might help with llvm backend debugging is 
the disassembler that works on the hardware bytecode instead of the 
internal r600g bytecode structs. This results in the more readable 
shader dumps for instructions passed in native hw encoding from llvm 
backend. I think this also can help to catch more potential bugs related 
to bytecode building in r600g/llvm. Currently r600-sb uses its bytecode 
disassembler for all shader dumps, including the fetch shaders, even 
when optimization is not enabled. Basically it can replace 
r600_bytecode_disasm and related code completely.


Below are some quick benchmarks for shader performance and compilation 
time, to demonstrate that currently r600-sb might provide better 
performance for users, at least in some cases.


As an example of the shaders with good optimization opportunities I used 
the application that computes and renders atmospheric scattering 
effects, it was mentioned in the previous thread:

http://lists.freedesktop.org/archives/mesa-dev/2013-February/034682.html

Here are current results for that app (Main.noprecompute, frames per 
second) with default backend, default backend + r600-sb, and llvm backend:

def     def+sb  llvm
240     590     248

Another quick benchmark is an OpenCL kernel performance with bfgminer 
(megahash/s):

llvm    llvm+sb
68      87
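
For reference, the relative gains implied by the two tables above can be worked out with a few lines of C. This is only an illustrative calculation on the numbers quoted in this message; the helper name gain_percent is made up for the example.

```c
/* Relative gain of `with_sb` over `base`, in percent.
 * Hypothetical helper, not part of Mesa or r600-sb. */
static double gain_percent(double base, double with_sb)
{
    return (with_sb / base - 1.0) * 100.0;
}
```

With the quoted figures, gain_percent(240.0, 590.0) is roughly 146 for the scattering demo (def vs def+sb) and gain_percent(68.0, 87.0) is roughly 28 for bfgminer (llvm vs llvm+sb), which agrees with the "more than 20%" mentioned earlier in this message.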

One more benchmark is for compilation speed/overhead - I used two piglit 
tests, first compiles a lot of shaders (IIRC more than thousand), second 
compiles a few huge shaders. Result is a test run time in seconds, this 
includes not only the compilation time but anyway shows the difference:

                  def     def+sb  llvm
tfb max-varyings  10      14      53
fp-long-alu       0.17    0.38    0.68

This is especially important for GL apps, because longer compilation 
time results in the more significant freezes in the games etc. As for 
the quality of the compiled code in this test, of course generally llvm 
backend is already able to produce better code in some 

Re: [Mesa-dev] r600g: status of the r600-sb branch

2013-04-19 Thread Vadim Girlin

On 04/19/2013 07:23 PM, Henri Verbeet wrote:

On 19 April 2013 16:48, Vadim Girlin vadimgir...@gmail.com wrote:

In the previous status update I said that the r600-sb branch is not ready to
be merged yet, but recently I've done some cleanups and reworks, and though
I haven't finished everything that I planned initially, I think now it's in
a better state and may be considered for merging.

I'm interested to know if the people think that merging of the r600-sb
branch makes sense at all. I'll try to explain here why it makes sense to
me.


Personally, I'd be in favour of merging this at some point. While I
haven't exactly done extensive testing or benchmarking with the
branch, the things I did try at least worked correctly, so I'd say
that's a good start at least.

I'm afraid I can't claim extensive review either, but I guess the most
obvious things I don't like about it are that it's C++, and spread
over a large number of < 1000 line files. Similarly, I don't really
see the point of having a header file for just about each .cpp file.
One for private interfaces and one for the public interface should
probably be plenty.


I thought about that, but I'm just not sure what would be a preferred 
way. I agree that a lot of small files don't look very good, on the 
other hand it makes all classes better separated and readable, that's 
why I was not sure which way is best. Of course I can merge some files 
together if it's preferable.



I'm not quite sure how others feel about that,
although I suspect I'm not alone in at least the preference of C over
C++.


The choice of C++ (unlike in my previous branch that used C) was mostly 
driven by the fact that optimization algorithms usually deal with a lot 
of different complex data structures, containers, etc, and C++ allows to 
isolate implementation of all such things in separate and easily 
replaceable classes and concentrate on the logic, making the code more 
clean and readable.



I also suspect it would help if this was some kind of logical,
bisectable series of patches instead of a single commit that adds 18k+
lines.


I haven't tried to keep it as a series of independent patches because 
during the development most changes were pretty intrusive and introduced 
new features, some parts were seriously reworked/rewritten more than one 
time, requiring changes in other parts, especially when intermediate 
representation of the code was changed. It was usually easier for me to 
simply fix the new regressions in the new code than to revert any 
changes and lose new features, so bisection wouldn't be very helpful 
anyway. That's why I didn't even try to keep the history. Anyway most of 
the code in the branch is new, so I don't think that the history of the 
patches that rewrite the same code few times during a development would 
make it more readable than simply reading the final code.


Vadim


Re: [Mesa-dev] r600g: status of the r600-sb branch

2013-04-19 Thread Vadim Girlin

On 04/19/2013 07:13 PM, Christian König wrote:

Hi Vadim,

from your description it seems to be a post processing stage working on
the bytecode of the shaders and additional to that is quite separated
from the rest of the driver.


Yes, currently it's more like a post-processing stage, though on the 
other hand the only missing thing to consider it as a complete backend 
is an initial TGSI translator (that is, a sort of instruction selection 
pass). Basically it's exactly what default backend in the r600g does. I 
thought about writing direct translator from TGSI to my IR, but it would 
require some time and benefits aren't very clear, except the slightly 
reduced translation time. It's easier to rely on the default backend for 
that, and also it simplifies debugging by providing the ability to see 
and compare both the source (after default backend) and optimized bytecode.




If that's the case then I don't really see a reason why we shouldn't
merge it, but at least at the beginning it should probably be disabled
by default.


Yes, I agree that it's better to make it disabled by default, it's 
currently enabled in my branch just to simplify testing, but I'll change 
that if we merge the branch.




On the other hand we should question if there are any optimizations in
there that could be done on earlier stages, something like on the GLSL
level for example?


In theory, yes, some optimizations in this branch are typically used on 
the earlier compilation stages, not on the target machine code. On the 
other hand, there are some differences that might make it harder, e.g. 
many algorithms require SSA form, and though it's possible to do similar 
optimizations without SSA, it would be hard to implement. Also I wanted 
to support both default backend and llvm backend for increased testing 
coverage and to be able to compare the efficiency of the algorithms in 
my experiments etc.


Vadim



Cheers,
Christian.

Am 19.04.2013 16:48, schrieb Vadim Girlin:

Hi,

In the previous status update I said that the r600-sb branch is not
ready to be merged yet, but recently I've done some cleanups and
reworks, and though I haven't finished everything that I planned
initially, I think now it's in a better state and may be considered
for merging.

I'm interested to know if the people think that merging of the r600-sb
branch makes sense at all. I'll try to explain here why it makes sense
to me.

Although I understand that the development of llvm backend is a
primary goal for the r600g developers, it's a complicated process and
may require quite some time to achieve good results regarding the
shader/compiler performance, and at the same time this branch already
works and provides good results in many cases. That's why I think it
makes sense to merge this branch as a non-default backend at least as
a temporary solution for shader performance problems. We can always
get rid of it if it becomes too much a maintenance burden or when llvm
backend catches up in terms of shader performance and compilation
speed/overhead.

Regarding the support and maintenance of this code, I'll try to do my
best to fix possible issues, and so far there are no known unfixed
issues. I tested it with many apps on evergreen and fixed all issues
with other chips that were reported to me on the list or privately
after the last status announcement. There are no piglit regressions on
evergreen when this branch is used with both default and llvm backends.

This code was intentionally separated as much as possible from the
other parts of the driver, basically there are just two functions used
from r600g, and the shader code is passed to/from r600-sb as a
hardware bytecode that is not going to change. I think it won't
require any modifications at all to keep it in sync with the most
changes in r600g.

Some work might be required though if we'll want to add support for
the new hw features that are currently unused, e.g. geometry shaders,
new instruction types for compute shaders, etc, but I think I'll be
able to catch up when it's implemented in the driver and default or
llvm backend. E.g. this branch already works for me on evergreen with
some simple OpenCL kernels, including bfgminer where it increases
performance of the kernel compiled with llvm backend by more than 20%
for me.

Besides the performance benefits, I think that alternative backend
also might help with debugging of the default or llvm backend, in some
cases it helped me by exposing the bugs that are not very obvious
otherwise, e.g. it may be hard to compare the dumps from default and
llvm backend to spot the regression because they are too different,
but after processing both shaders with r600-sb the code is usually
transformed to some more common form, and often this makes it easier
to compare and find the differences in shader logic.

One additional feature that might help with llvm backend debugging is
the disassembler that works on the hardware bytecode instead of the
internal r600g bytecode

Re: [Mesa-dev] r600g: status of the r600-sb branch

2013-04-19 Thread Vadim Girlin

On 04/19/2013 08:35 PM, Christian König wrote:

Hey Vadim,

Am 19.04.2013 18:18, schrieb Vadim Girlin:

[SNIP]

In theory, yes, some optimizations in this branch are typically used
on the earlier compilation stages, not on the target machine code. On
the other hand, there are some differences that might make it harder,
e.g. many algorithms require SSA form, and though it's possible to do
similar optimizations without SSA, it would be hard to implement. Also
I wanted to support both default backend and llvm backend for
increased testing coverage and to be able to compare the efficiency of
the algorithms in my experiments etc.


Yeah I know, missing an SSA implementation is also something that always
bothered me a bit with both TGSI and GLSL (while I haven't done much
with GLSL, so maybe I misjudge here).

Can you name the different algorithms used?


There is a short description of the algorithms and passes in the 
notes.markdown file [1] in that branch, there are also links in the end 
to the full description of some algorithms, though some of them were 
modified/adapted for this branch.



It's not a strict prerequisite, but I think we both agree that doing
things like LICM on R600 bytecode isn't the best idea over all (when
doing it on GLSL would be beneficial for all drivers not only r600).


In fact there is no special LICM pass, it's done by the GCM (Global Code 
Motion, [2]), which probably could be also called global scheduler. In 
fact in my branch this pass is combined with some hw-specific scheduling 
logic, e.g. grouping fetch/alu instructions to reduce clause type 
switching in the code and the number of required CF instructions, 
potentially it can also schedule clauses to expose more parallelism with 
the BARRIER bit usage.
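
To illustrate why a separate LICM pass is not needed, here is a minimal sketch of the block selection step as described in [2]: an instruction may legally live anywhere on the dominator-tree path between its earliest and latest block, and the scheduler prefers the block with the smallest loop nesting depth on that path. This is not the code from the branch; struct block, loop_depth and pick_block are names invented for the example, and the sketch assumes that `early` dominates `late`.

```c
#include <stddef.h>

/* Toy basic block: only what the placement decision needs. */
struct block {
    const struct block *idom;  /* immediate dominator, NULL for the entry */
    int loop_depth;            /* 0 = outside any loop */
};

/* Walk from the latest legal block up the dominator tree to the earliest
 * legal one and return the block with the smallest loop depth.
 * On ties the later block is kept, so values stay close to their uses. */
static const struct block *
pick_block(const struct block *early, const struct block *late)
{
    const struct block *best = late;
    const struct block *b = late;

    while (b != early && b->idom != NULL) {
        b = b->idom;
        if (b->loop_depth < best->loop_depth)
            best = b;
    }
    return best;
}
```

For an instruction whose operands are all defined before the loop, `early` lies outside the loop, so the walk ends up choosing a block in front of the loop. That is the effect a dedicated LICM pass would otherwise provide.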


Vadim

 [1] 
http://cgit.freedesktop.org/~vadimg/mesa/tree/src/gallium/drivers/r600/sb/notes.markdown?h=r600-sb
 [2] 
http://www.cs.washington.edu/education/courses/cse501/06wi/reading/click-pldi95.pdf



Regards,
Christian.




Re: [Mesa-dev] [PATCH v2] r600g: Workaround for a harware bug with nested loops on Cayman

2013-04-16 Thread Vadim Girlin

On 04/15/2013 11:22 PM, Martin Andersson wrote:

There is a hardware bug on Cayman where a BREAK/CONTINUE followed by
LOOP_STARTxxx for nested loops may put the branch stack into a state
such that ALU_PUSH_BEFORE doesn't work as expected. Workaround this
by replacing the ALU_PUSH_BEFORE with a PUSH + ALU

Fixes piglit tests EXT_transform_feedback/order*

v2: Use existing loop count and improve comment
---
  src/gallium/drivers/r600/r600_shader.c | 17 ++---
  1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 6dbca50..f4398fd 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -5490,7 +5490,7 @@ static int tgsi_opdst(struct r600_shader_ctx *ctx)
return 0;
  }

-static int emit_logic_pred(struct r600_shader_ctx *ctx, int opcode)
+static int emit_logic_pred(struct r600_shader_ctx *ctx, int opcode, int 
alu_type)
  {
struct r600_bytecode_alu alu;
int r;
@@ -5510,7 +5510,7 @@ static int emit_logic_pred(struct r600_shader_ctx *ctx, 
int opcode)

alu.last = 1;

-   r = r600_bytecode_add_alu_type(ctx->bc, &alu, CF_OP_ALU_PUSH_BEFORE);
+   r = r600_bytecode_add_alu_type(ctx->bc, &alu, alu_type);
if (r)
return r;
return 0;
@@ -5730,7 +5730,18 @@ static void break_loop_on_flag(struct r600_shader_ctx 
*ctx, unsigned fc_sp)

  static int tgsi_if(struct r600_shader_ctx *ctx)
  {
-   emit_logic_pred(ctx, ALU_OP2_PRED_SETNE_INT);
+   int alu_type = CF_OP_ALU_PUSH_BEFORE;
+
+   /* There is a hardware bug on Cayman where a BREAK/CONTINUE followed by
+* LOOP_STARTxxx for nested loops may put the branch stack into a state
+* such that ALU_PUSH_BEFORE doesn't work as expected. Workaround this
+* by replacing the ALU_PUSH_BEFORE with a PUSH + ALU */
+   if (ctx->bc->chip_class == CAYMAN && ctx->bc->stack.loop > 1) {
+   r600_bytecode_add_cfinst(ctx->bc, CF_OP_PUSH);


Oh, it seems I overlooked a potential issue here: jump address for PUSH is 
not set properly, so I guess there will be GPU lockups in case of a 
jump. Ideally we could set it to jump over the whole IF-ENDIF block if 
there are no active threads, but I think it's a rare case, so simplest 
fix is to avoid computation of the address and set jump address for PUSH 
to the next instruction, like this:


ctx->bc->cf_last->cf_addr = ctx->bc->cf_last->id + 2;

We can improve it later but anyway ALU_PUSH_BEFORE never jumped at all 
so I think at least we won't have any serious performance regressions.
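
A toy model of that fix, in case it helps to see it in isolation. It assumes, as the suggested line implies, that CF instruction ids advance by 2 per instruction, so `id + 2` addresses the instruction right after the PUSH. The struct, the enum and both helpers are simplified stand-ins written for this sketch, not the r600g definitions.

```c
#include <stddef.h>

/* Simplified stand-ins for the r600g CF opcodes used in this thread. */
enum cf_op { CF_OP_PUSH, CF_OP_ALU, CF_OP_JUMP };

struct cf_inst {
    enum cf_op op;
    unsigned id;       /* position of this CF instruction */
    unsigned cf_addr;  /* jump target, in the same units as id */
};

/* Assumption taken from the "+ 2" in the suggested fix. */
#define CF_ID_STRIDE 2u

/* Append a CF instruction to a toy program and return it. */
static struct cf_inst *add_cf(struct cf_inst *prog, size_t *count, enum cf_op op)
{
    struct cf_inst *cf = &prog[*count];

    cf->op = op;
    cf->id = (unsigned)(*count) * CF_ID_STRIDE;
    cf->cf_addr = 0;
    (*count)++;
    return cf;
}

/* Emit a PUSH whose jump target is simply the following instruction,
 * so a taken jump behaves like a fall-through. */
static struct cf_inst *emit_push_fallthrough(struct cf_inst *prog, size_t *count)
{
    struct cf_inst *push = add_cf(prog, count, CF_OP_PUSH);

    push->cf_addr = push->id + CF_ID_STRIDE;
    return push;
}
```

The point of the design is that no address has to be patched later: the target is known the moment the PUSH is emitted, at the cost of never skipping the IF-ENDIF block when no threads are active.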


Everything else looks ok, so I think I'll commit your patch with this 
change soon if there are no objections.


Vadim


+   alu_type = CF_OP_ALU;
+   }
+
+   emit_logic_pred(ctx, ALU_OP2_PRED_SETNE_INT, alu_type);

r600_bytecode_add_cfinst(ctx->bc, CF_OP_JUMP);






Re: [Mesa-dev] [PATCH] r600g: Workaround for a nested loop bug on Cayman

2013-04-15 Thread Vadim Girlin

On 04/15/2013 10:52 AM, Martin Andersson wrote:

On Mon, Apr 15, 2013 at 1:09 AM, Vadim Girlin vadimgir...@gmail.com wrote:

On 04/15/2013 01:05 AM, Martin Andersson wrote:


There is a bug where a BREAK/CONTINUE followed by LOOP_STARTxxx for nested
loops may put the branch stack into a state such that ALU_PUSH_BEFORE
doesn't work as expected. Workaround this by replacing the ALU_PUSH_BEFORE
with a PUSH + ALU for nested loops.

Fixes piglit tests:
spec/!OpenGL 1.1/read-front
spec/EXT_transform_feedback/order*
spec/glsl-1.40/uniform_buffer/fs-struct-pad

No piglit regressions.
---
   src/gallium/drivers/r600/r600_shader.c | 33
++---
   1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c
b/src/gallium/drivers/r600/r600_shader.c
index 6dbca50..aee011e 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -252,6 +252,7 @@ static int tgsi_endif(struct r600_shader_ctx *ctx);
   static int tgsi_bgnloop(struct r600_shader_ctx *ctx);
   static int tgsi_endloop(struct r600_shader_ctx *ctx);
   static int tgsi_loop_brk_cont(struct r600_shader_ctx *ctx);
+static bool need_cayman_loop_bug_workaround(struct r600_shader_ctx *ctx);

   /*
* bytestream - r600 shader
@@ -5490,7 +5491,7 @@ static int tgsi_opdst(struct r600_shader_ctx *ctx)
 return 0;
   }

-static int emit_logic_pred(struct r600_shader_ctx *ctx, int opcode)
+static int emit_logic_pred(struct r600_shader_ctx *ctx, int opcode, int
alu_type)
   {
 struct r600_bytecode_alu alu;
 int r;
@@ -5510,7 +5511,7 @@ static int emit_logic_pred(struct r600_shader_ctx
*ctx, int opcode)

 alu.last = 1;

-   r = r600_bytecode_add_alu_type(ctx->bc, &alu,
CF_OP_ALU_PUSH_BEFORE);
+   r = r600_bytecode_add_alu_type(ctx->bc, &alu, alu_type);
 if (r)
 return r;
 return 0;
@@ -5730,7 +5731,20 @@ static void break_loop_on_flag(struct
r600_shader_ctx *ctx, unsigned fc_sp)

   static int tgsi_if(struct r600_shader_ctx *ctx)
   {
-   emit_logic_pred(ctx, ALU_OP2_PRED_SETNE_INT);
+   int alu_type = CF_OP_ALU_PUSH_BEFORE;
+
+   /*
+  There is a bug where a BREAK/CONTINUE followed by LOOP_STARTxxx
for nested
+  loops may put the branch stack into a state such that
ALU_PUSH_BEFORE
+  doesn't work as expected. Workaround this by replacing the
ALU_PUSH_BEFORE
+  with a PUSH + ALU for nested loops.
+*/
+   if (ctx->bc->chip_class == CAYMAN &&
need_cayman_loop_bug_workaround(ctx)) {



We already have current loop level for the stack size computation, see
r600_bytecode::stack, so I think need_cayman_loop_bug_workaround call may be
replaced with ctx->bc->stack.loop > 1, if I'm not missing something.


Ok, will try that tonight. Should I add a comment that it is a hardware bug?


Yes, you might want to clarify that it's a hw bug on cayman, though I 
think it's OK either way. Also git complains about some trailing spaces 
in your patch.


With that change for condition (and removal of the need_cayman_... 
function that becomes unused) and fixed whitespace issues, the patch 
looks good to me.


Vadim




Vadim



+   r600_bytecode_add_cfinst(ctx->bc, CF_OP_PUSH);
+   alu_type = CF_OP_ALU;
+   }
+
+   emit_logic_pred(ctx, ALU_OP2_PRED_SETNE_INT, alu_type);

 r600_bytecode_add_cfinst(ctx->bc, CF_OP_JUMP);

@@ -5834,6 +5848,19 @@ static int tgsi_loop_brk_cont(struct
r600_shader_ctx *ctx)
 return 0;
   }

+static bool need_cayman_loop_bug_workaround(struct r600_shader_ctx *ctx)
+{
+   unsigned int fscp;
+   int num_loops = 0;
+   for (fscp = ctx->bc->fc_sp; fscp > 0; fscp--)
+   {
+   if (FC_LOOP == ctx->bc->fc_stack[fscp].type)
+   ++num_loops;
+   }
+
+   return num_loops >= 2;
+}
+
   static int tgsi_umad(struct r600_shader_ctx *ctx)
   {
 struct tgsi_full_instruction *inst =
&ctx->parse.FullToken.FullInstruction;





[Mesa-dev] [PATCH] gallium: handle drirc disable_glsl_line_continuations option

2013-04-15 Thread Vadim Girlin
Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---
 src/gallium/include/state_tracker/st_api.h  | 1 +
 src/gallium/state_trackers/dri/common/dri_context.c | 2 ++
 src/gallium/state_trackers/dri/common/dri_screen.c  | 3 ++-
 src/mesa/state_tracker/st_extensions.c  | 3 +++
 4 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/gallium/include/state_tracker/st_api.h 
b/src/gallium/include/state_tracker/st_api.h
index 9f3d2a1..52c9dc0 100644
--- a/src/gallium/include/state_tracker/st_api.h
+++ b/src/gallium/include/state_tracker/st_api.h
@@ -240,6 +240,7 @@ struct st_visual
 struct st_config_options
 {
boolean force_glsl_extensions_warn;
+   boolean disable_glsl_line_continuations;
 };
 
 /**
diff --git a/src/gallium/state_trackers/dri/common/dri_context.c 
b/src/gallium/state_trackers/dri/common/dri_context.c
index 49cd794..58a710d 100644
--- a/src/gallium/state_trackers/dri/common/dri_context.c
+++ b/src/gallium/state_trackers/dri/common/dri_context.c
@@ -54,6 +54,8 @@ static void dri_fill_st_options(struct st_config_options 
*options,
 {
options->force_glsl_extensions_warn =
   driQueryOptionb(optionCache, "force_glsl_extensions_warn");
+   options->disable_glsl_line_continuations =
+  driQueryOptionb(optionCache, "disable_glsl_line_continuations");
 }
 
 GLboolean
diff --git a/src/gallium/state_trackers/dri/common/dri_screen.c 
b/src/gallium/state_trackers/dri/common/dri_screen.c
index 2f525a2..fd2971c 100644
--- a/src/gallium/state_trackers/dri/common/dri_screen.c
+++ b/src/gallium/state_trackers/dri/common/dri_screen.c
@@ -66,6 +66,7 @@ PUBLIC const char __driConfigOptions[] =
 
   DRI_CONF_SECTION_DEBUG
  DRI_CONF_FORCE_GLSL_EXTENSIONS_WARN(false)
+ DRI_CONF_DISABLE_GLSL_LINE_CONTINUATIONS(false)
   DRI_CONF_SECTION_END
 
   DRI_CONF_SECTION_MISCELLANEOUS
@@ -75,7 +76,7 @@ PUBLIC const char __driConfigOptions[] =
 
 #define false 0
 
-static const uint __driNConfigOptions = 11;
+static const uint __driNConfigOptions = 12;
 
 static const __DRIconfig **
 dri_fill_in_modes(struct dri_screen *screen)
diff --git a/src/mesa/state_tracker/st_extensions.c 
b/src/mesa/state_tracker/st_extensions.c
index f986480..ffb9f7e 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -714,6 +714,9 @@ void st_init_extensions(struct st_context *st)
if (st->options.force_glsl_extensions_warn)
   ctx->Const.ForceGLSLExtensionsWarn = 1;
 
+   if (st->options.disable_glsl_line_continuations)
+  ctx->Const.DisableGLSLLineContinuations = 1;
+
ctx->Const.MinMapBufferAlignment =
   screen->get_param(screen, PIPE_CAP_MIN_MAP_BUFFER_ALIGNMENT);
if (ctx->Const.MinMapBufferAlignment >= 64) {
-- 
1.8.1.4



Re: [Mesa-dev] [PATCH] r600g: Fix UMAD on Cayman

2013-04-14 Thread Vadim Girlin

On 04/13/2013 09:54 PM, Martin Andersson wrote:

On Sat, Apr 13, 2013 at 4:23 AM, Vadim Girlin vadimgir...@gmail.com wrote:

On 04/12/2013 11:36 PM, Martin Andersson wrote:


I have made some progress with this issue.

Vadim, I did as you suggested and tried to mimic the output from the
shader analyser
tool. I used your patch as a base and then tried various ways to see
what would work.
After many tries (and lockups) I did manage to get the
ext_transform_feedback/order
test to pass.

It is a very ugly patch but it should illustrate what the problem (and
potential solution) is.

Your test program fails however because explicit break statements do
not work. It
should be possible to use the same code for the explicit breaks as for
the implicit
loop break. The reason it does not is that I detect the implicit break
with a hack and
it does not work for explicit breaks.

The problem is that I need to detect the break statement when creating the
corresponding if statement. So that I can treat it differently than
other regular if
statements. Anyone knows how I could do that, or is this the wrong
approach?



It doesn't work with my test app because IF/ENDIF blocks with BRK may
contain other code, so you can't simply throw away IF/ENDIF making that code
execute unconditionally.


Yeah my hack is not a viable option.


By the way, shader analyzer in some cases also produces the code with
JUMP/POP around PRED_SET-BREAK, though I'm not sure if that code will really
work as expected with catalyst. Possibly we're simply missing something in
the hardware configuration.

Also there is one thing that I didn't take into account in my initial patch
- r600g converts ALU followed by POP to ALU_POP_AFTER and this might explain
why my initial patch doesn't work. Possibly if we prevent that optimization
for ALU containing PRED_SET-BREAK and leave separate POP, it might be enough
to make it work. I'm attaching the additional patch that will force POP to
be a separate instruction in this case, please test it (on top of the my
first patch). This would be at least not very intrusive.


No, that patch did not help either.


If this won't help, then I think we should understand what exactly we are
trying to fix before implementing any big changes, possibly there is a
better solution or at least a more clean workaround. In the worst case we
can return to your approach and improve it to handle other cases.


I'm starting to think that there is nothing wrong with the shader
compiler. It seems to me that a push, pop inside a nested loop clears
the break status on a thread.

shift_reg = 1u;
count = 0u;
while (true) {
    if (x == 1u)
        break;
    while (true) {
        if (x != 1u)
            count = 10u;
        if (x == 1u)
            count = 20u;
        break;
    }
    shift_reg = 2u;
    break;
}

input: x == 0
actual output: shift_reg == 2, count == 10
expected output: shift_reg == 2, count == 10

input: x == 1
actual output: shift_reg == 2, count == 20
expected output: shift_reg == 1, count == 0

If I swap the if statements in the inner loop I get different results.

shift_reg = 1u;
count = 0u;
while (true) {
    if (x == 1u)
        break;
    while (true) {
        if (x == 1u)
            count = 20u;
        if (x != 1u)
            count = 10u;
        break;
    }
    shift_reg = 2u;
    break;
}

input: x == 0
actual output: shift_reg == 2, count == 10
expected output: shift_reg == 2, count == 10

input: x == 1
actual output: shift_reg == 2, count == 0
expected output: shift_reg == 1, count == 0

I tested both cases on mesa master and mesa master + Vadims two
patches with the same results.
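
As a cross-check of the "expected output" lines, the first test shader can be transcribed to plain C and run on the CPU, where no branch stack is involved. This is a direct transcription of the snippet above; struct result and the function name reference are invented for the sketch.

```c
/* Values the shader writes, gathered in one struct for convenience. */
struct result {
    unsigned shift_reg;
    unsigned count;
};

/* CPU transcription of the first nested-loop test case. */
static struct result reference(unsigned x)
{
    struct result r = { 1u, 0u };

    while (1) {
        if (x == 1u)
            break;              /* threads with x == 1 leave here */
        while (1) {
            if (x != 1u)
                r.count = 10u;
            if (x == 1u)
                r.count = 20u;  /* unreachable for x == 1, see break above */
            break;
        }
        r.shift_reg = 2u;
        break;
    }
    return r;
}
```

For x == 1 the outer break is taken before the inner loop is entered, so neither count nor shift_reg may change. Any other result on the GPU means threads that already broke out were re-enabled inside the inner loop.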



This turned out to be a known issue with cayman: BREAK/CONTINUE followed 
by LOOP_STARTxxx for nested loop may put the branch stack into the state 
such that ALU_PUSH_BEFORE doesn't work as expected.


It seems the simplest workaround is either to avoid ALU_PUSH_BEFORE in 
nested loops completely or to replace it with separate PUSH and ALU.


We can check if we actually have BREAK/CONTINUE in the outer loop before 
LOOP_START for the inner loop, but I think it will be true in most 
cases, so the simplest fix for r600g is to replace all ALU_PUSH_BEFORE 
with PUSH + ALU in the nested loops on cayman.
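
A minimal sketch of that decision, assuming the driver knows the chip class and the current loop nesting depth when it emits an IF. The enums, struct if_plan and plan_if are toy definitions made up for this example (only the names CAYMAN, CF_OP_ALU, CF_OP_PUSH and CF_OP_ALU_PUSH_BEFORE are borrowed from the thread), so treat it as an illustration rather than the r600g code.

```c
/* Toy stand-ins; the real driver has its own definitions. */
enum chip_class { R600, R700, EVERGREEN, CAYMAN };
enum cf_op { CF_OP_ALU, CF_OP_ALU_PUSH_BEFORE, CF_OP_PUSH };

/* What to emit for the predicate of an IF. */
struct if_plan {
    int emit_push;          /* 1: emit a separate PUSH first */
    enum cf_op alu_clause;  /* clause type for the predicate ALU */
};

/* loop_depth: number of loops currently open (1 = not nested). */
static struct if_plan plan_if(enum chip_class chip, unsigned loop_depth)
{
    struct if_plan p = { 0, CF_OP_ALU_PUSH_BEFORE };

    /* Workaround only for nested loops on Cayman: split the combined
     * instruction into PUSH followed by a plain ALU clause. */
    if (chip == CAYMAN && loop_depth > 1) {
        p.emit_push = 1;
        p.alu_clause = CF_OP_ALU;
    }
    return p;
}
```

The check deliberately ignores whether a BREAK/CONTINUE actually precedes the inner LOOP_START, since, as noted above, that is expected to be true in most nested loops anyway.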


Vadim


//Martin


Vadim



//Martin

On Thu, Apr 11, 2013 at 5:31 PM, Vadim Girlin vadimgir...@gmail.com
wrote:


On 04/11/2013 02:08 AM, Marek Olšák wrote:



Here's the output:

creating vs ...
shader compilation status: OK
creating fs ...
shader compilation status: OK
thread #0 (0;0) : ref = 16608
thread #1 (1;0) : ref = 27873
thread #2 (0;1) : ref = 16608
thread #3 (1;1) : ref = 27877
results:
thread 0 (0, 0): expected = 16608, observed = 27876, FAIL
thread 1 (1, 0): expected = 27873, observed = 27873, OK
thread 2 (0, 1): expected = 16608, observed = 27876, FAIL
thread 3 (1, 1): expected = 27877, observed = 27877, OK



Thanks. According to these results, it looks like

Re: [Mesa-dev] [PATCH] r600g: Workaround for a nested loop bug on Cayman

2013-04-14 Thread Vadim Girlin

On 04/15/2013 01:05 AM, Martin Andersson wrote:

There is a bug where a BREAK/CONTINUE followed by LOOP_STARTxxx for nested
loops may put the branch stack into a state such that ALU_PUSH_BEFORE
doesn't work as expected. Workaround this by replacing the ALU_PUSH_BEFORE
with a PUSH + ALU for nested loops.

Fixes piglit tests:
spec/!OpenGL 1.1/read-front
spec/EXT_transform_feedback/order*
spec/glsl-1.40/uniform_buffer/fs-struct-pad

No piglit regressions.
---
  src/gallium/drivers/r600/r600_shader.c | 33 ++---
  1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 6dbca50..aee011e 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -252,6 +252,7 @@ static int tgsi_endif(struct r600_shader_ctx *ctx);
  static int tgsi_bgnloop(struct r600_shader_ctx *ctx);
  static int tgsi_endloop(struct r600_shader_ctx *ctx);
  static int tgsi_loop_brk_cont(struct r600_shader_ctx *ctx);
+static bool need_cayman_loop_bug_workaround(struct r600_shader_ctx *ctx);

  /*
   * bytestream - r600 shader
@@ -5490,7 +5491,7 @@ static int tgsi_opdst(struct r600_shader_ctx *ctx)
return 0;
  }

-static int emit_logic_pred(struct r600_shader_ctx *ctx, int opcode)
+static int emit_logic_pred(struct r600_shader_ctx *ctx, int opcode, int alu_type)
  {
struct r600_bytecode_alu alu;
int r;
@@ -5510,7 +5511,7 @@ static int emit_logic_pred(struct r600_shader_ctx *ctx, int opcode)

alu.last = 1;

-   r = r600_bytecode_add_alu_type(ctx->bc, &alu, CF_OP_ALU_PUSH_BEFORE);
+   r = r600_bytecode_add_alu_type(ctx->bc, &alu, alu_type);
if (r)
return r;
return 0;
@@ -5730,7 +5731,20 @@ static void break_loop_on_flag(struct r600_shader_ctx *ctx, unsigned fc_sp)

  static int tgsi_if(struct r600_shader_ctx *ctx)
  {
-   emit_logic_pred(ctx, ALU_OP2_PRED_SETNE_INT);
+   int alu_type = CF_OP_ALU_PUSH_BEFORE;
+
+   /*
+  There is a bug where a BREAK/CONTINUE followed by LOOP_STARTxxx for nested
+  loops may put the branch stack into a state such that ALU_PUSH_BEFORE
+  doesn't work as expected. Workaround this by replacing the ALU_PUSH_BEFORE
+  with a PUSH + ALU for nested loops.
+*/
+   if (ctx->bc->chip_class == CAYMAN && need_cayman_loop_bug_workaround(ctx)) {


We already have the current loop level for the stack size computation, see 
r600_bytecode::stack, so I think the need_cayman_loop_bug_workaround call 
may be replaced with ctx->bc->stack.loop > 1, if I'm not missing 
something.
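
For illustration, a minimal sketch of the two ways of detecting a nested loop discussed here, assuming simplified stand-in types (fc_entry, bytecode_sketch and all function names are hypothetical, not the real r600g structures). One variant walks the flow-control stack and counts loop entries, as the patch does; the other reads a loop-depth counter that is updated whenever a loop is opened or closed. As long as the counter is maintained on every push and pop, both should agree.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the r600g flow-control state. */
enum fc_type { FC_NONE, FC_IF, FC_LOOP };

#define FC_STACK_MAX 32

struct fc_entry { enum fc_type type; };

struct bytecode_sketch {
	struct fc_entry fc_stack[FC_STACK_MAX + 1]; /* slot 0 is unused */
	unsigned fc_sp;                             /* index of the top entry */
	unsigned loop_depth;                        /* plays the role of stack.loop */
};

/* Open a flow-control block and keep the loop counter in sync. */
static void fc_push(struct bytecode_sketch *bc, enum fc_type type)
{
	assert(bc->fc_sp < FC_STACK_MAX);
	bc->fc_stack[++bc->fc_sp].type = type;
	if (type == FC_LOOP)
		bc->loop_depth++;
}

/* Close the innermost flow-control block. */
static void fc_pop(struct bytecode_sketch *bc)
{
	assert(bc->fc_sp > 0);
	if (bc->fc_stack[bc->fc_sp].type == FC_LOOP)
		bc->loop_depth--;
	bc->fc_sp--;
}

/* Variant 1: walk the stack and count the enclosing loops. */
static bool nested_loop_by_walk(const struct bytecode_sketch *bc)
{
	unsigned sp, num_loops = 0;

	for (sp = bc->fc_sp; sp > 0; sp--)
		if (bc->fc_stack[sp].type == FC_LOOP)
			num_loops++;
	return num_loops >= 2;
}

/* Variant 2: use the loop-depth counter directly. */
static bool nested_loop_by_counter(const struct bytecode_sketch *bc)
{
	return bc->loop_depth > 1;
}
```

The counter variant avoids the stack walk on every IF, at the cost of relying on the counter being kept up to date elsewhere.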


Vadim


+   r600_bytecode_add_cfinst(ctx->bc, CF_OP_PUSH);
+   alu_type = CF_OP_ALU;
+   }
+
+   emit_logic_pred(ctx, ALU_OP2_PRED_SETNE_INT, alu_type);

r600_bytecode_add_cfinst(ctx->bc, CF_OP_JUMP);

@@ -5834,6 +5848,19 @@ static int tgsi_loop_brk_cont(struct r600_shader_ctx *ctx)
return 0;
  }

+static bool need_cayman_loop_bug_workaround(struct r600_shader_ctx *ctx)
+{
+   unsigned int fscp;
+   int num_loops = 0;
+   for (fscp = ctx->bc->fc_sp; fscp > 0; fscp--)
+   {
+   if (FC_LOOP == ctx->bc->fc_stack[fscp].type)
+   ++num_loops;
+   }
+
+   return num_loops >= 2;
+}
+
  static int tgsi_umad(struct r600_shader_ctx *ctx)
  {
struct tgsi_full_instruction *inst = &ctx->parse.FullToken.FullInstruction;



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] r600g: Fix UMAD on Cayman

2013-04-12 Thread Vadim Girlin

On 04/12/2013 11:36 PM, Martin Andersson wrote:

I have made some progress with this issue.

Vadim, I did as you suggested and tried to mimic the output from the
shader analyser tool. I used your patch as a base and then tried various
ways to see what would work. After many tries (and lockups) I did manage
to get the ext_transform_feedback/order test to pass.

It is a very ugly patch but it should illustrate what the problem (and
potential solution) is.

Your test program fails, however, because explicit break statements do
not work. It should be possible to use the same code for the explicit
breaks as for the implicit loop break. The reason it does not work is that
I detect the implicit break with a hack, and that hack does not work for
explicit breaks.

The problem is that I need to detect the break statement when creating the
corresponding if statement, so that I can treat it differently from
other regular if statements. Does anyone know how I could do that, or is
this the wrong approach?



It doesn't work with my test app because IF/ENDIF blocks with BRK may 
contain other code, so you can't simply throw away IF/ENDIF making that 
code execute unconditionally.


By the way, shader analyzer in some cases also produces the code with 
JUMP/POP around PRED_SET-BREAK, though I'm not sure if that code will 
really work as expected with catalyst. Possibly we're simply missing 
something in the hardware configuration.


Also there is one thing that I didn't take into account in my initial 
patch - r600g converts ALU followed by POP to ALU_POP_AFTER and this 
might explain why my initial patch doesn't work. Possibly if we prevent 
that optimization for ALU containing PRED_SET-BREAK and leave separate 
POP, it might be enough to make it work. I'm attaching the additional 
patch that will force POP to be a separate instruction in this case, 
please test it (on top of the my first patch). This would be at least 
not very intrusive.
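
To illustrate the idea, here is a minimal sketch of such a peephole rule, assuming a simplified CF instruction record (the cf_sketch type, its has_pred_set_break flag and the add_cf/add_pop helpers are hypothetical, not the r600g API). A POP that follows a plain ALU clause is folded into ALU_POP_AFTER, unless that clause contains a PRED_SET with a BREAK execute-mask op, in which case the POP is emitted as a separate instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified CF instruction list. */
enum cf_op_sketch { CF_ALU, CF_ALU_POP_AFTER, CF_POP };

struct cf_sketch {
	enum cf_op_sketch op;
	bool has_pred_set_break; /* clause contains PRED_SET with a BREAK mask op */
};

#define CF_MAX 16

struct cf_list_sketch {
	struct cf_sketch cf[CF_MAX];
	size_t count;
};

/* Append a CF instruction to the list. */
static void add_cf(struct cf_list_sketch *l, enum cf_op_sketch op, bool brk)
{
	assert(l->count < CF_MAX);
	l->cf[l->count].op = op;
	l->cf[l->count].has_pred_set_break = brk;
	l->count++;
}

/* Emit a POP, merging it into the preceding ALU clause when that is allowed. */
static void add_pop(struct cf_list_sketch *l)
{
	if (l->count > 0) {
		struct cf_sketch *last = &l->cf[l->count - 1];

		if (last->op == CF_ALU && !last->has_pred_set_break) {
			last->op = CF_ALU_POP_AFTER; /* usual ALU + POP merge */
			return;
		}
	}
	add_cf(l, CF_POP, false); /* keep POP as a separate instruction */
}
```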


If this doesn't help, then I think we should understand what exactly we 
are trying to fix before implementing any big changes; possibly there is 
a better solution or at least a cleaner workaround. In the worst case 
we can return to your approach and improve it to handle other cases.


Vadim


//Martin

On Thu, Apr 11, 2013 at 5:31 PM, Vadim Girlin vadimgir...@gmail.com wrote:

On 04/11/2013 02:08 AM, Marek Olšák wrote:


Here's the output:

creating vs ...
shader compilation status: OK
creating fs ...
shader compilation status: OK
thread #0 (0;0) : ref = 16608
thread #1 (1;0) : ref = 27873
thread #2 (0;1) : ref = 16608
thread #3 (1;1) : ref = 27877
results:
   thread 0 (0, 0): expected = 16608, observed = 27876, FAIL
   thread 1 (1, 0): expected = 27873, observed = 27873, OK
   thread 2 (0, 1): expected = 16608, observed = 27876, FAIL
   thread 3 (1, 1): expected = 27877, observed = 27877, OK



Thanks. According to these results, it looks like LOOP_START_DX10 for inner
loop somehow reactivates the threads that were put into inactive-break state
by the LOOP_BREAK in the outer loop. Also it seems LOOP_BREAK in the inner
loop doesn't work as expected in this case. In other words, it looks weird.

I can't explain why this would happen. It might be interesting to run these
tests with the llvm backend to see if there are any differences.

It might help if we implement LOOP_BREAK via EXECUTE_MASK_OP in
the PRED_SET encoding as in my earlier patch, but without any stack push/pop
operations and jumps (where possible), closer to what catalyst
(shader analyzer) does. I'm not sure if it will help though, and anyway
we'll need stack operations in some cases, so I'm afraid this won't fix the
issue completely.

So far I have no other ideas.

Vadim


Marek


On Wed, Apr 10, 2013 at 11:42 PM, Vadim Girlin
vadimgir...@gmail.com wrote:


On 04/10/2013 01:53 PM, Marek Olšák wrote:


glsl-fs-loop-nested passes here.

nstack is 3 and adding 4 to it doesn't help.



Ok, thanks.

Also I wrote a simple test app that should reproduce the issue if it's
really related to diverging control flow with nested loops and might give
more information about what's going wrong.

The source is in the attachment and needs to be compiled with -lGL -lglut
-lGLEW. The app renders four points and computes some value for each
point
in the loops similar to the transform feedback order test, but it doesn't
use tfb. It should render four green or red squares depending on
correctness of the result.

Here is the correct output produced for me on evergreen:

   thread 0 (0, 0): expected = 16608, observed = 16608, OK
   thread 1 (1, 0): expected = 27873, observed = 27873, OK
   thread 2 (0, 1): expected = 16608, observed = 16608, OK
   thread 3 (1, 1): expected = 27877, observed = 27877, OK

Please post the output if it fails on cayman.

Vadim




Marek


On Wed, Apr 10, 2013 at 8:46 AM, Vadim Girlin vadimgir...@gmail.com
wrote:

   On 04/10/2013 03:58 AM, Marek Olšák wrote:



   Hi Vadim,



your patch does not fix the test.



Hmm

Re: [Mesa-dev] [PATCH] r600g: Fix UMAD on Cayman

2013-04-11 Thread Vadim Girlin

On 04/11/2013 02:08 AM, Marek Olšák wrote:

Here's the output:

creating vs ...
shader compilation status: OK
creating fs ...
shader compilation status: OK
thread #0 (0;0) : ref = 16608
thread #1 (1;0) : ref = 27873
thread #2 (0;1) : ref = 16608
thread #3 (1;1) : ref = 27877
results:
  thread 0 (0, 0): expected = 16608, observed = 27876, FAIL
  thread 1 (1, 0): expected = 27873, observed = 27873, OK
  thread 2 (0, 1): expected = 16608, observed = 27876, FAIL
  thread 3 (1, 1): expected = 27877, observed = 27877, OK



Thanks. According to these results, it looks like LOOP_START_DX10 for 
inner loop somehow reactivates the threads that were put into 
inactive-break state by the LOOP_BREAK in the outer loop. Also it seems 
LOOP_BREAK in the inner loop doesn't work as expected in this case. In 
other words, it looks weird.


I can't explain why this would happen. It might be interesting to run 
these tests with the llvm backend to see if there are any differences.


It might help if we implement LOOP_BREAK via EXECUTE_MASK_OP 
in the PRED_SET encoding as in my earlier patch, but without any stack 
push/pop operations and jumps (where possible), closer to what 
catalyst (shader analyzer) does. I'm not sure if it will help though, 
and anyway we'll need stack operations in some cases, so I'm afraid this 
won't fix the issue completely.


So far I have no other ideas.

Vadim


Marek


On Wed, Apr 10, 2013 at 11:42 PM, Vadim Girlin vadimgir...@gmail.com wrote:


On 04/10/2013 01:53 PM, Marek Olšák wrote:


glsl-fs-loop-nested passes here.

nstack is 3 and adding 4 to it doesn't help.



Ok, thanks.

Also I wrote a simple test app that should reproduce the issue if it's
really related to diverging control flow with nested loops and might give
more information about what's going wrong.

The source is in the attachment and needs to be compiled with -lGL -lglut
-lGLEW. The app renders four points and computes some value for each point
in the loops similar to the transform feedback order test, but it doesn't
use tfb. It should render four green or red squares depending on
correctness of the result.

Here is the correct output produced for me on evergreen:

  thread 0 (0, 0): expected = 16608, observed = 16608, OK
  thread 1 (1, 0): expected = 27873, observed = 27873, OK
  thread 2 (0, 1): expected = 16608, observed = 16608, OK
  thread 3 (1, 1): expected = 27877, observed = 27877, OK

Please post the output if it fails on cayman.

Vadim




Marek


On Wed, Apr 10, 2013 at 8:46 AM, Vadim Girlin vadimgir...@gmail.com
wrote:

  On 04/10/2013 03:58 AM, Marek Olšák wrote:


  Hi Vadim,


your patch does not fix the test.



Hmm, I'm out of ideas then. Thanks for testing.

I've checked the shader dump few times but I don't see anything obviously
wrong there, and the same code (except the minor ALU grouping changes due
to the VLIW4/VLIW5 difference) works fine for me on evergreen.

According to the Martin's observations it looks like if the threads that
shouldn't execute the loop body were incorrectly left in the active
state.
LOOP_BREAK should put them into the inactive-break state, but something
goes wrong. Do the other piglit tests with nested loops (e.g.
glsl-fs-loop-nested) work on cayman? Though possibly there are no other
tests with the diverging loops as in this case.

I'll try to write a simpler test with the diverging loops to see if the
issue is really caused by the incorrect control flow handling, and to
figure out the exact instruction that results in the incorrect active
state.

Also, it's probably worth checking whether the stack size is correct for that
shader (latest mesa should print the nstack value in the shader disassembly
header, I think it should be 3 for that shader) and maybe try adding some
constant, e.g. 4, to bc->nstack in r600_bytecode_build just to be
sure that we reserve enough stack space, though I don't think stack
size is the cause of this issue.

Vadim



  Marek



On Tue, Apr 9, 2013 at 11:30 PM, Vadim Girlin vadimgir...@gmail.com
wrote:

   On 04/09/2013 10:58 AM, Martin Andersson wrote:



   On Tue, Apr 9, 2013 at 3:18 AM, Marek Olšák mar...@gmail.com
wrote:



   Pushed, thanks. The transform feedback test still doesn't pass, but
at


least
the hardlocks are gone.


  Thanks, I have looked into the other issue as well

http://lists.freedesktop.org/archives/mesa-dev/2013-March/036941.html









The problem arises when there are nested loops. If I rework the code
so there are
no nested

Re: [Mesa-dev] [PATCH] r600g: Fix UMAD on Cayman

2013-04-10 Thread Vadim Girlin

On 04/10/2013 03:58 AM, Marek Olšák wrote:

Hi Vadim,

your patch does not fix the test.


Hmm, I'm out of ideas then. Thanks for testing.

I've checked the shader dump a few times but I don't see anything 
obviously wrong there, and the same code (except for minor ALU grouping 
changes due to the VLIW4/VLIW5 difference) works fine for me on evergreen.


According to Martin's observations, it looks as if the threads that 
shouldn't execute the loop body were incorrectly left in the active 
state. LOOP_BREAK should put them into the inactive-break state, but 
something goes wrong. Do the other piglit tests with nested loops (e.g. 
glsl-fs-loop-nested) work on cayman? Though possibly there are no other 
tests with diverging loops as in this case.


I'll try to write a simpler test with diverging loops to see if the 
issue is really caused by incorrect control flow handling, and to 
figure out the exact instruction that results in the incorrect active state.


Also, it's probably worth checking whether the stack size is correct for 
that shader (latest mesa should print the nstack value in the shader 
disassembly header, I think it should be 3 for that shader) and maybe try 
adding some constant, e.g. 4, to bc->nstack in r600_bytecode_build just 
to be sure that we reserve enough stack space, though I don't think 
stack size is the cause of this issue.


Vadim




Marek


On Tue, Apr 9, 2013 at 11:30 PM, Vadim Girlin vadimgir...@gmail.com wrote:


On 04/09/2013 10:58 AM, Martin Andersson wrote:


On Tue, Apr 9, 2013 at 3:18 AM, Marek Olšák mar...@gmail.com wrote:


Pushed, thanks. The transform feedback test still doesn't pass, but at
least
the hardlocks are gone.



Thanks, I have looked into the other issue as well
http://lists.freedesktop.org/archives/mesa-dev/2013-March/036941.html

The problem arises when there are nested loops. If I rework the code
so there are
no nested loops the issue disappears. At least one pixel also needs to
enter the
outer loop. The pixels that should enter the outer loop behaves
correctly. It is those
pixels that should not enter the outer loop that misbehaves. It does
not matter if they
also fails the test for the inner loop, they will still execute the
instruction inside. That
leads to the strange results for that test.



Please test the attached patch.

Vadim



The strangeness is easier to see if the NUM_POINTS in the
ext_transform_feedback/
order.c are run with smaller values,like 3, 6 and 9. Disable the code
that fail the test
and print starting_x, shift_reg_final and iteration_count.

Marek, since you implemented transform feedback for r600, do you think
the issue
is with the tranform feedback code or the shader compiler or some other
thing?

//Martin




___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev






___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] GLSL compiler bug

2013-04-10 Thread Vadim Girlin

Hi,

It seems there is a bug in the compiler. The problem may be reproduced 
with the following shader (complete shader_test file attached):


void main()
{
float f = 0.0;
while (true) {
f = 1.0;
break;
f = 0.5;
}
gl_FragColor = vec4(1.0 - f, f, 0.0, 1.0);
}

The result of compilation is equal to:

while (true) {
break;
}
gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);

In other words, GLSL compiler eliminates both assignments to f in the 
loop body and the resulting value of the f variable is 0.
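
For reference, the expected behaviour can be sketched in plain C (the vec4 struct and the shade function are illustrative names only). The first assignment runs, break leaves the loop before the second one, so f should end up as 1.0 and the fragment should be green, which is what the "probe all rgba 0.0 1.0 0.0 1.0" line in the attached test checks.

```c
#include <assert.h>

struct vec4 { float r, g, b, a; };

/* Plain C rendering of the fragment shader body. */
static struct vec4 shade(void)
{
	float f = 0.0f;

	while (1) {
		f = 1.0f;
		break;
		f = 0.5f; /* unreachable: the loop is left before this line */
	}

	/* The break must not discard the assignment that precedes it. */
	assert(f == 1.0f);

	struct vec4 color = { 1.0f - f, f, 0.0f, 1.0f };
	return color;
}
```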


Vadim

[require]
GLSL = 1.20

[vertex shader]
void main()
{
gl_Position = gl_Vertex;
}

[fragment shader]
void main()
{
float f = 0.0;
while (true) {
f = 1.0;
break;
f = 0.5;
}
gl_FragColor = vec4(1.0 - f, f, 0.0, 1.0);
}

[test]
clear color 0.0 0.0 0.0 0.0
clear

draw rect -1 -1 2 2
probe all rgba 0.0 1.0 0.0 1.0
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] r600g: Fix UMAD on Cayman

2013-04-10 Thread Vadim Girlin

On 04/10/2013 01:53 PM, Marek Olšák wrote:

glsl-fs-loop-nested passes here.

nstack is 3 and adding 4 to it doesn't help.


Ok, thanks.

Also I wrote a simple test app that should reproduce the issue if it's 
really related to diverging control flow with nested loops and might 
give more information about what's going wrong.


The source is in the attachment and needs to be compiled with -lGL 
-lglut -lGLEW. The app renders four points and computes some value for 
each point in the loops similar to the transform feedback order test, 
but it doesn't use tfb. It should render four green or red squares 
depending on correctness of the result.


Here is the correct output produced for me on evergreen:

 thread 0 (0, 0): expected = 16608, observed = 16608, OK
 thread 1 (1, 0): expected = 27873, observed = 27873, OK
 thread 2 (0, 1): expected = 16608, observed = 16608, OK
 thread 3 (1, 1): expected = 27877, observed = 27877, OK

Please post the output if it fails on cayman.

Vadim




Marek


On Wed, Apr 10, 2013 at 8:46 AM, Vadim Girlin vadimgir...@gmail.com wrote:


On 04/10/2013 03:58 AM, Marek Olšák wrote:


Hi Vadim,

your patch does not fix the test.



Hmm, I'm out of ideas then. Thanks for testing.

I've checked the shader dump few times but I don't see anything obviously
wrong there, and the same code (except the minor ALU grouping changes due
to the VLIW4/VLIW5 difference) works fine for me on evergreen.

According to the Martin's observations it looks like if the threads that
shouldn't execute the loop body were incorrectly left in the active state.
LOOP_BREAK should put them into the inactive-break state, but something
goes wrong. Do the other piglit tests with nested loops (e.g.
glsl-fs-loop-nested) work on cayman? Though possibly there are no other
tests with the diverging loops as in this case.

I'll try to write a simpler test with the diverging loops to see if the
issue is really caused by the incorrect control flow handling, and to
figure out the exact instruction that results in the incorrect active state.

Also, it's probably worth checking whether the stack size is correct for that
shader (latest mesa should print the nstack value in the shader disassembly
header, I think it should be 3 for that shader) and maybe try adding some
constant, e.g. 4, to bc->nstack in r600_bytecode_build just to be
sure that we reserve enough stack space, though I don't think stack size
is the cause of this issue.

Vadim




Marek


On Tue, Apr 9, 2013 at 11:30 PM, Vadim Girlin vadimgir...@gmail.com
wrote:

  On 04/09/2013 10:58 AM, Martin Andersson wrote:


  On Tue, Apr 9, 2013 at 3:18 AM, Marek Olšák mar...@gmail.com wrote:


  Pushed, thanks. The transform feedback test still doesn't pass, but at

least
the hardlocks are gone.



Thanks, I have looked into the other issue as well
http://lists.freedesktop.org/archives/mesa-dev/2013-March/036941.html





The problem arises when there are nested loops. If I rework the code
so there are
no nested loops the issue disappears. At least one pixel also needs to
enter the
outer loop. The pixels that should enter the outer loop behaves
correctly. It is those
pixels that should not enter the outer loop that misbehaves. It does
not matter if they
also fails the test for the inner loop, they will still execute the
instruction inside. That
leads to the strange results for that test.



Please test the attached patch.

Vadim


  The strangeness is easier to see if the NUM_POINTS in the

ext_transform_feedback/
order.c are run with smaller values,like 3, 6 and 9. Disable the code
that fail the test
and print starting_x, shift_reg_final and iteration_count.

Marek, since you implemented transform feedback for r600, do you think
the issue
is with the tranform feedback code or the shader compiler or some other
thing?

//Martin


















#include <stdio.h>
#include <stdlib.h>
#include <GL/glew.h>
#include <GL/glut.h>

const char *vss =
#version 130\n
in int x, y, ref;
flat out int b, fref;
void main() {
b = 0;
int i = 0, j = 0;
b |= 32;
while (true) {
b |= 64

Re: [Mesa-dev] [PATCH] r600g: Fix UMAD on Cayman

2013-04-09 Thread Vadim Girlin

On 04/09/2013 10:58 AM, Martin Andersson wrote:

On Tue, Apr 9, 2013 at 3:18 AM, Marek Olšák mar...@gmail.com wrote:

Pushed, thanks. The transform feedback test still doesn't pass, but at least
the hardlocks are gone.


Thanks, I have looked into the other issue as well
http://lists.freedesktop.org/archives/mesa-dev/2013-March/036941.html

The problem arises when there are nested loops. If I rework the code so
there are no nested loops the issue disappears. At least one pixel also
needs to enter the outer loop. The pixels that should enter the outer loop
behave correctly. It is those pixels that should not enter the outer loop
that misbehave. It does not matter if they also fail the test for the
inner loop, they will still execute the instruction inside. That leads to
the strange results for that test.


Please test the attached patch.

Vadim



The strangeness is easier to see if ext_transform_feedback/order.c is run
with smaller values of NUM_POINTS, like 3, 6 and 9. Disable the code that
fails the test and print starting_x, shift_reg_final and iteration_count.

Marek, since you implemented transform feedback for r600, do you think the issue
is with the transform feedback code or the shader compiler or some other thing?

//Martin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev



From 46456ca7ecfa3f0b107b1f9106d024f9f239a571 Mon Sep 17 00:00:00 2001
From: Vadim Girlin vadimgir...@gmail.com
Date: Wed, 10 Apr 2013 01:20:19 +0400
Subject: [PATCH] r600g: use ALU EXECUTE_MASK_OP on cayman instead of
 LOOP_BREAK/CONTINUE

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---
 src/gallium/drivers/r600/r600_asm.c| 14 --
 src/gallium/drivers/r600/r600_shader.c | 24 +++-
 src/gallium/drivers/r600/r600d.h   |  5 +
 3 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_asm.c b/src/gallium/drivers/r600/r600_asm.c
index 26a848a..2874adf 100644
--- a/src/gallium/drivers/r600/r600_asm.c
+++ b/src/gallium/drivers/r600/r600_asm.c
@@ -1985,6 +1985,7 @@ void r600_bytecode_disasm(struct r600_bytecode *bc)
 		LIST_FOR_EACH_ENTRY(alu, &cf->alu, list) {
 			const char *omod_str[] = {"","*2","*4","/2"};
 			const struct alu_op_info *aop = r600_isa_alu(alu->op);
+			bool cm_execmask_op = alu->execute_mask && bc->chip_class == CAYMAN;
 			int o = 0;
 
 			r600_bytecode_alu_nliterals(bc, alu, literal, &nliteral);
@@ -1997,8 +1998,10 @@ void r600_bytecode_disasm(struct r600_bytecode *bc)
 	alu->update_pred ? 'P':' ',
 	alu->pred_sel ? alu->pred_sel==2 ? '0':'1':' ');
 
-			o += fprintf(stderr, "%s%s%s ", aop->name,
-	omod_str[alu->omod], alu->dst.clamp ? "_sat":"");
+			o += fprintf(stderr, "%s ", aop->name);
+			if (!cm_execmask_op)
+o += fprintf(stderr, "%s ", omod_str[alu->omod]);
+			o += fprintf(stderr, "%s ", alu->dst.clamp ? "_sat":"");
 
 			o += print_indent(o,60);
 			o += print_dst(alu);
@@ -2012,6 +2015,13 @@ void r600_bytecode_disasm(struct r600_bytecode *bc)
 o += fprintf(stderr, "  BS:%d", alu->bank_swizzle);
 			}
 
+			if (cm_execmask_op && alu->omod) {
+static const char* cm_em_op_names[] =
+	{"BREAK", "CONTINUE", "KILL"};
+
+fprintf(stderr, "  %s", cm_em_op_names[alu->omod - 1]);
+			}
+
 			fprintf(stderr, "\n");
 			id += 2;
 
diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index f801707..d1cac36 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -5827,7 +5827,29 @@ static int tgsi_loop_brk_cont(struct r600_shader_ctx *ctx)
 		return -EINVAL;
 	}
 
-	r600_bytecode_add_cfinst(ctx->bc, ctx->inst_info->op);
+
+	if (ctx->bc->chip_class == CAYMAN) {
+		struct r600_bytecode_alu alu = {};
+		int r;
+
+		alu.op = ALU_OP2_PRED_SETE;
+		alu.src[0].sel = V_SQ_ALU_SRC_0;
+		alu.src[1].sel = V_SQ_ALU_SRC_1;
+
+		if (ctx->inst_info->op == CF_OP_LOOP_BREAK)
+			alu.omod = SQ_ALU_EXECUTE_MASK_OP_BREAK;
+		else
+			alu.omod = SQ_ALU_EXECUTE_MASK_OP_CONTINUE;
+
+		alu.execute_mask = 1;
+		alu.last = 1;
+
+		r = r600_bytecode_add_alu(ctx->bc, &alu);
+		if (r)
+			return r;
+	} else {
+		r600_bytecode_add_cfinst(ctx->bc, ctx->inst_info->op);
+	}
+	}
 
 	fc_set_mid(ctx, fscp);
 
diff --git a/src/gallium/drivers/r600/r600d.h b/src/gallium/drivers/r600/r600d.h
index 9b31383..679dd81 100644
--- a/src/gallium/drivers/r600/r600d.h
+++ b/src/gallium/drivers/r600/r600d.h
@@ -3698,4 +3698,9 @@
 #define DMA_PACKET_CONSTANT_FILL	0xd /* 7xx only */
 #define DMA_PACKET_NOP			0xf
 
+#define SQ_ALU_EXECUTE_MASK_OP_DEACTIVATE 0x0
+#define SQ_ALU_EXECUTE_MASK_OP_BREAK      0x1
+#define SQ_ALU_EXECUTE_MASK_OP_CONTINUE   0x2
+#define SQ_ALU_EXECUTE_MASK_OP_KILL       0x3
+
 #endif
-- 
1.8.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] radeonsi: add support for compressed texture

2013-04-08 Thread Vadim Girlin

On 04/08/2013 02:03 PM, Marek Olšák wrote:

On Mon, Apr 8, 2013 at 11:29 AM, Michel Dänzer mic...@daenzer.net wrote:


On Fre, 2013-04-05 at 17:36 -0400, j.gli...@gmail.com wrote:

From: Jerome Glisse jgli...@redhat.com

Most test pass, issue are with border color and swizzle.


FWIW, those issues are there with non-compressed formats as well. I'm
afraid we might need to change the hardware border colour depending on
the swizzle.



I don't think so. The issue with the swizzled border color seems to be a
bad hardware design decision present since r600 rather than a hardware bug.
I tried fixing it for older chipsets with no success. I doubt the hw
designers fixed this for SI. The problem is the hardware tries to guess
what the border color swizzle is from the combined pipe_format+sampler view
swizzle combination. You need 2 texture swizzle states in the texture unit
for the border color to be swizzled correctly, because texels must be
swizzled by the pipe_format swizzle and sampler view swizzle, but the
border color must be swizzled by the sampler view only. The main problem is
that the hardware internally tries to undo the pipe_format swizzle in a way
that just doesn't work. I don't remember the exact swizzles being used by
hardware, but I got crazy cases like if I set texture swizzle to ywzx, the
border color will be ywyy. There is no way to access those zx components of
the border color for that specific swizzling. For some cases, the hardware
succeeds in guessing what the border color should be, e.g. if I set texture
swizzle to .zyxw, the returned border color will be .xyzw (and that would
be correct if the swizzle came from pipe_format, and incorrect if the
swizzle came from sampler view).



I also looked into this issue some time ago (on evergreen) and IIRC I 
found that the swizzle is actually applied twice to border color in most 
cases (at least when swizzle_y is not 2 or 3), I think it's just a bug 
(or we are missing something in the hw configuration).


Anyway, according to my tests in many cases (960 of 1296 total swizzles, 
74%) it's possible to apply some precomputed swizzle to border color 
before writing it to the registers to get the correct result in the end, 
but I'm not sure if it makes sense to implement that.
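
As a quick check of the numbers above: assuming each of the four channels can select one of six sources (X, Y, Z, W, 0 or 1), the total number of swizzles is 6^4 = 1296, and 960 of them is about 74%. The sketch below does that arithmetic and adds a hypothetical apply_swizzle helper to show what pre-swizzling a border color would mean in practice (the enum and function names are illustrative only, not driver code).

```c
#include <assert.h>

/* Hypothetical swizzle selectors: four components plus constant 0 and 1. */
enum swz { SWZ_X, SWZ_Y, SWZ_Z, SWZ_W, SWZ_0, SWZ_1, SWZ_COUNT };

/* Number of distinct 4-channel swizzles: SWZ_COUNT choices per channel. */
static unsigned total_swizzles(void)
{
	unsigned n = 1;

	for (int ch = 0; ch < 4; ch++)
		n *= SWZ_COUNT;
	return n;
}

/* Apply a swizzle to a 4-component color, writing the result to out. */
static void apply_swizzle(const enum swz swz[4], const float in[4], float out[4])
{
	for (int ch = 0; ch < 4; ch++) {
		assert(swz[ch] < SWZ_COUNT);
		switch (swz[ch]) {
		case SWZ_0: out[ch] = 0.0f; break;
		case SWZ_1: out[ch] = 1.0f; break;
		default:    out[ch] = in[swz[ch]]; break;
		}
	}
}
```

A precomputed correction would be one more swizzle of this kind, applied to the border color before it is written to the registers; which correction matches which texture swizzle would have to come from measurements such as the ones described above.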


Vadim


It was easy with r300, because I could just undo pipe_format swizzling
before passing the border color to the hardware.

Marek



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] r600g: don't reserve more stack space than required v4

2013-04-01 Thread Vadim Girlin

On 04/02/2013 12:48 AM, Vincent Lejeune wrote:

Btw where can I find some more info on stack_size?
I assumed it should represent the maximum number of stacked exec_masks,
but it looks like it is possible to have many more manually pushed exec_mask
levels than reported by nstack (iiuc a push counts as much as 1/4 of a loop level).

Yes, different instructions consume different amounts of stack space. 
There is an explanation in the ISA docs, section 3.6.5 Stack 
Allocation; it's basically correct but don't expect it to be precise 
regarding the special cases (e.g. in the cayman isa doc the comments in 
table 3.6 look like a copy-paste from the r600/r700 docs instead of 
cayman-specific comments). I've added the additional info that I have 
regarding the special cases for chip generations, and my notes, as 
comments in the patch (see the callstack_update_max_depth function).
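
As a rough illustration of the "1/4 of a loop level" observation, here is a simplified sizing sketch. It assumes that a loop level takes one full stack entry, that a push takes a quarter of an entry (one element), and that the total is rounded up to whole entries. The chip-specific special cases mentioned above are deliberately left out, and the names are hypothetical, so this is not the actual computation done in the patch.

```c
#include <assert.h>

#define ELEMENTS_PER_ENTRY 4u

/* Simplified estimate of the number of stack entries to reserve. */
static unsigned stack_entries_sketch(unsigned loop_levels, unsigned pushes)
{
	unsigned elements;

	/* Sanity bound so the arithmetic below cannot overflow. */
	assert(loop_levels < 1024 && pushes < 1024);

	/* A loop level costs a whole entry, a push costs a single element. */
	elements = loop_levels * ELEMENTS_PER_ENTRY + pushes;

	/* Round up to whole entries. */
	return (elements + ELEMENTS_PER_ENTRY - 1) / ELEMENTS_PER_ENTRY;
}
```

Under these assumptions, two loop levels plus one push need three entries, while up to four pushes on their own fit into a single entry.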


Vadim






- Original message -

From: Vadim Girlin vadimgir...@gmail.com
To: Vincent Lejeune v...@ovi.com
Cc: Alex Deucher alexdeuc...@gmail.com; mesa-dev@lists.freedesktop.org mesa-dev@lists.freedesktop.org
Sent: Sunday, March 31, 2013, 10:34 PM
Subject: Re: [Mesa-dev] [PATCH] r600g: don't reserve more stack space than required v4

On 04/01/2013 12:00 AM, Vincent Lejeune wrote:

  Hi Vadim,

  Does this patch work ? (It's still not pushed)


It works for me on evergreen, but I'm not sure about other chip generations.
I wanted to ask somebody to test it, but the problem is that the piglit coverage
for this is not enough (e.g. initial version of this patch had no regressions
with piglit but resulted in artifacts with Heaven). I thought about adding more
control flow tests but haven't written them yet. The same algorithm
seemingly works in my r600-sb branch with other chips, but the test coverage
with that branch is even lower due to the if-conversion that eliminates most of
the conditional control flow.

I usually prefer not to push any patches until I'm sure that they are not
breaking anything. But well, possibly in this case it's easier to simply
push it and wait for the bug reports. I think I'll check if it needs
rebasing and push it in a day or two if there are no objections.

Vadim


  I'm working on doing native control flow for llvm and intend to port
  your patch on the control flow reservation.


  Vincent




___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] r600g: Add a Cayman specific version of UMAD

2013-03-31 Thread Vadim Girlin

On 03/31/2013 01:01 PM, Martin Andersson wrote:

On Sun, Mar 31, 2013 at 1:08 AM, Vadim Girlin vadimgir...@gmail.com wrote:

On 03/30/2013 05:35 AM, Martin Andersson wrote:


I found an issue with the shader compiler for Cayman when I looked
into why the ext_transform_feedback/order test case caused a GPU stall.
It turned out the stall was an infinite loop that was the result of broken
calculation in the shader function. The issue is that Cayman uses the
tgsi_umad function for UMAD, but that does not work since it does not
populate the y, z and w slots for UMUL that cayman requires.

This patch implements a cayman_umad. There are some things I'm unsure of
though.

The UMUL for Cayman is compiled to, as far as I can tell, ALU_OP2_MULLO_INT
and not ALU_OP2_MULLO_UINT. So I do not know if I should use the int or the
uint version in cayman_umad. In the patch I used the uint one, because that
seemed the most logical.



Probably the use of MULLO_INT for UMUL on cayman is just a typo, AFAIK
MULLO_UINT should be used.


Ok, I will send a patch for that as well then.





The add part of UMAD I copied from tgsi_umad and that had a loop around the
variable lasti, but the variable lasti is usually not used in cayman
specific code.



The only difference with umad on cayman is in the mul part - each MULLO_UINT
should be expanded to 4 slots on cayman. Add part doesn't need any changes.



This is used in tgsi functions.
int lasti = tgsi_last_instruction(inst->Dst[0].Register.WriteMask);



This is used to determine last written vector component from the write mask,
so that if tgsi instruction doesn't write e.g. W component, we don't have to
emit R600 instruction(s) for that component.




But in cayman specific code this is used instead.
int last_slot = (inst->Dst[0].Register.WriteMask & 0x8) ? 4 : 3;



This is used for instructions like RECIP_xxx (see the comment at
r600_shader.c:40) that should be expanded to 3 slots with optional 4th slot
if the write to the W component is required, but MULLO_UINT is different -
it should be expanded to 4 instruction slots always. By the way, it seems
cayman_mul_int_instr is incorrect in this regard.




It does not work to switch lasti with last_slot, since that makes the loop
run too many times (in my test case lasti is 0 and last_slot is 3). So I just
removed the loop, was that correct or should I resolve that in some other way?



No, it's not correct, there should be a loop over the vector components for
addition as well - it should be performed in the same way as on the
pre-cayman chips. In your patch you are only performing the addition for one
component.

Basically, the only required change for UMAD on cayman is that you need to
expand each one-slot MULLO_xx on pre-cayman into 4 instruction slots on
cayman.


Should I keep the cayman_umad function or should I modify tgsi_umad and
add the cayman specific part there?


I think it's better to modify tgsi_umad (to avoid unnecessary code 
duplication).


Vadim




Vadim




Martin Andersson (1):
r600g: Add a Cayman specific version of UMAD

   src/gallium/drivers/r600/r600_shader.c | 47
+-
   1 file changed, 46 insertions(+), 1 deletion(-)





//Martin





Re: [Mesa-dev] [PATCH] r600g: don't reserve more stack space than required v4

2013-03-31 Thread Vadim Girlin

On 04/01/2013 12:00 AM, Vincent Lejeune wrote:

Hi Vadim,

Does this patch work ? (It's still not pushed)


It works for me on evergreen, but I'm not sure about other chip 
generations. I wanted to ask somebody to test it, but the problem is 
that the piglit coverage for this is not enough (e.g. initial version of 
this patch had no regressions with piglit but resulted in artifacts with 
Heaven). I thought about adding more control flow tests but haven't 
written them yet. The same algorithm seemingly works in my r600-sb 
branch with other chips, but the test coverage with that branch is even 
lower due to the if-conversion that eliminates most of the conditional 
control flow.


I usually prefer not to push any patches until I'm sure that they are 
not breaking anything. But well, possibly in this case it's easier to 
simply push it and wait for the bug reports. I think I'll check if it 
needs rebasing and push it in a day or two if there are no objections.


Vadim


I'm working on doing native control flow for llvm and intend to port your patch 
on the control flow reservation.

Vincent




Re: [Mesa-dev] Possible bug with r600g shader compiler

2013-03-31 Thread Vadim Girlin

On 03/31/2013 04:51 PM, Martin Andersson wrote:

Hi,

I think I have found a bug in the r600g shader compiler. I have an AMD 6950
and I'm running mesa from git.

The bug is exercised by the vertex shader program in piglit
ext_transform_feedback/order.c

I have simplified the shader program so the compiled shader is easier to read:

#version 130
in uint starting_x;
flat out uint starting_x_copy;
flat out uint iteration_count;
flat out uint shift_reg_final;
uniform uint shift_count;

void main()
{
  gl_Position = vec4(0.0);
  uint x = starting_x;
  uint count = 0u;
  uint shift_reg = 1u;
  starting_x_copy = starting_x;
  uint k;
  while (x != 0u) {
    shift_reg = shift_count;
    for (k = 0u; k < shift_count; ++k)
      ++count;
    x = 0u;
  }
  iteration_count = count;
  shift_reg_final = shift_reg;
}

It compiles to, http://pastebin.com/cQ8rbKCv.

input:
shift_count 64
starting_x 0

actual output:
iteration_count 1
shift_reg 1

expected output:
iteration_count 0
shift_reg 1

When the shader is run with starting_x set to 0 the iteration_count output is 1.
That should be impossible since the ++count is inside the while loop guarded
by x != 0. That the iteration_count is 1 and not 64 is also strange, it seems
to somehow have gotten past the while guard but only executed one iteration in
the for loop before exiting again. Another thing to note is that shift_reg
is not set to 64.
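
As a sanity check on the expected values, the shader's control flow can be mirrored on the CPU. This is only a sketch: `reference_shader` and `struct shader_outputs` are hypothetical names made up for illustration and are not part of piglit or Mesa.

```c
#include <stdint.h>

/* Outputs of the simplified vertex shader (field names follow the GLSL). */
struct shader_outputs {
    uint32_t starting_x_copy;
    uint32_t iteration_count;
    uint32_t shift_reg_final;
};

/* Hypothetical CPU reference: follows the GLSL statement by statement. */
static struct shader_outputs
reference_shader(uint32_t starting_x, uint32_t shift_count)
{
    struct shader_outputs out;
    uint32_t x = starting_x;
    uint32_t count = 0u;
    uint32_t shift_reg = 1u;
    uint32_t k;

    out.starting_x_copy = starting_x;
    while (x != 0u) {
        shift_reg = shift_count;
        for (k = 0u; k < shift_count; ++k)
            ++count;
        x = 0u; /* the outer loop body runs at most once */
    }
    out.iteration_count = count;
    out.shift_reg_final = shift_reg;
    return out;
}
```

With starting_x set to 0 the outer loop body is never entered, so the reference yields iteration_count 0 and shift_reg_final 1. With any nonzero starting_x it yields shift_count for both.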

If I write 64 instead of shift_count in the for loop (k < 64u) (effectively
optimizing it to 64 add statements instead of a loop) or switch the while
to an if, the program behaves as expected. That leads me to believe that
the issue is with the two nested loops.

The docs mention something about nested flow control for PRED_SETE_64:

The instruction can also establish a predicate result (execute or skip) for
subsequent predicated instruction execution. This additional control allows a
compiler to support one-instruction issue for if-elseif operations, or an
integer result for nested flow-control, by using single-precision operations
to manipulate a predicate counter.

But the while and for loops are compiled to PRED_SETNE_INT which does not have
that comment. Anyway, I just wanted to include that comment in case it was
relevant.


Predication is not used with the default compiler backend in r600g 
(currently it may be used with the llvm backend only), so it's not 
relevant. Anyway, this comment applies to all PRED_xxx instructions. 
An omitted comment in the docs doesn't mean anything; some things in the 
docs may even be incorrect.




Does anyone know what's wrong or have any ideas for how I could debug it further?


You might want to modify the test to get rid of the transform feedback, 
just to make sure that it's not a transform feedback issue.


Vadim



//Martin


Re: [Mesa-dev] [PATCH] r600g: Add a Cayman specific version of UMAD

2013-03-30 Thread Vadim Girlin

On 03/30/2013 05:35 AM, Martin Andersson wrote:

I found an issue with the shader compiler for Cayman when I looked
into why the ext_transform_feedback/order test case caused a GPU stall.
It turned out the stall was an infinite loop that was the result of broken
calculation in the shader function. The issue is that Cayman uses the
tgsi_umad function for UMAD, but that does not work since it does not
populate the y, z and w slots for UMUL that cayman requires.

This patch implements a cayman_umad. There are some things I'm unsure of though.

The UMUL for Cayman is compiled to, as far as I can tell, ALU_OP2_MULLO_INT and
not ALU_OP2_MULLO_UINT. So I do not know if I should use the int or the uint
version in cayman_umad. In the patch I used the uint one, because that seemed
the most logical.


Probably the use of MULLO_INT for UMUL on cayman is just a typo, AFAIK 
MULLO_UINT should be used.




The add part of UMAD I copied from tgsi_umad and that had a loop around the
variable lasti, but the variable lasti is usually not used in cayman specific
code.



The only difference with umad on cayman is in the mul part - each 
MULLO_UINT should be expanded to 4 slots on cayman. Add part doesn't 
need any changes.



This is used in tgsi functions.
int lasti = tgsi_last_instruction(inst->Dst[0].Register.WriteMask);


This is used to determine last written vector component from the write 
mask, so that if tgsi instruction doesn't write e.g. W component, we 
don't have to emit R600 instruction(s) for that component.




But in cayman specific code this is used instead.
int last_slot = (inst->Dst[0].Register.WriteMask & 0x8) ? 4 : 3;


This is used for instructions like RECIP_xxx (see the comment at 
r600_shader.c:40) that should be expanded to 3 slots with optional 4th 
slot if the write to the W component is required, but MULLO_UINT is 
different - it should be expanded to 4 instruction slots always. By the 
way, it seems cayman_mul_int_instr is incorrect in this regard.




It does not work to switch lasti with last_slot, since that makes the loop run
too many times (in my test case lasti is 0 and last_slot is 3). So I just removed
the loop, was that correct or should I resolve that in some other way?


No, it's not correct, there should be a loop over the vector components 
for addition as well - it should be performed in the same way as on the 
pre-cayman chips. In your patch you are only performing the addition for 
one component.


Basically, the only required change for UMAD on cayman is that you need 
to expand each one-slot MULLO_xx on pre-cayman into 4 instruction slots 
on cayman.
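
To make the difference between the two values concrete, here is a minimal sketch of the write-mask logic discussed above. The helper names are made up for illustration; the first function only approximates what tgsi_last_instruction is described as doing and is not copied from r600_shader.c.

```c
#include <assert.h>

/* Index (0 = X .. 3 = W) of the last component enabled in a 4-bit
 * write mask. Assumed behaviour, based on the description above. */
static int last_written_component(unsigned writemask)
{
    int i, lasti = 0;

    assert(writemask <= 0xF);
    for (i = 0; i < 4; i++) {
        if (writemask & (1u << i))
            lasti = i;
    }
    return lasti;
}

/* Slot count for RECIP-style ops on cayman: 3 slots, plus a 4th slot
 * only when the W component is written. */
static int recip_style_slot_count(unsigned writemask)
{
    return (writemask & 0x8) ? 4 : 3;
}

/* Slot count for MULLO_UINT on cayman as described above: always 4,
 * regardless of the write mask. */
static int mullo_slot_count(unsigned writemask)
{
    (void)writemask;
    return 4;
}
```

For a write mask of 0x1 (X only) this gives lasti = 0 and a RECIP-style slot count of 3, which matches the values reported in the test case, while the multiply still needs all 4 slots.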


Vadim



Martin Andersson (1):
   r600g: Add a Cayman specific version of UMAD

  src/gallium/drivers/r600/r600_shader.c | 47 +-
  1 file changed, 46 insertions(+), 1 deletion(-)





Re: [Mesa-dev] [PATCH] r600g: fix range handling for tgsi input/output declarations

2013-03-28 Thread Vadim Girlin

On 03/28/2013 01:01 PM, Christian König wrote:

On 27.03.2013 20:37, Vadim Girlin wrote:

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---
  src/gallium/drivers/r600/r600_shader.c | 19 +++
  1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c
b/src/gallium/drivers/r600/r600_shader.c
index 29facf7..d4c9c03 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -874,12 +874,12 @@ static int select_twoside_color(struct r600_shader_ctx *ctx, int front, int back
 static int tgsi_declaration(struct r600_shader_ctx *ctx)
 {
 	struct tgsi_full_declaration *d = &ctx->parse.FullToken.FullDeclaration;
-	unsigned i;
-	int r;
+	int r, i, j, count = d->Range.Last - d->Range.First + 1;
 	switch (d->Declaration.File) {
 	case TGSI_FILE_INPUT:
-		i = ctx->shader->ninput++;
+		i = ctx->shader->ninput;
+		ctx->shader->ninput += count;
 		ctx->shader->input[i].name = d->Semantic.Name;
 		ctx->shader->input[i].sid = d->Semantic.Index;
 		ctx->shader->input[i].interpolate = d->Interp.Interpolate;
@@ -903,9 +903,15 @@ static int tgsi_declaration(struct r600_shader_ctx *ctx)
 					return r;
 			}
 		}
+		for (j = 1; j < count; ++j) {
+			memcpy(&ctx->shader->input[i + j], &ctx->shader->input[i],
+			       sizeof(struct r600_shader_io));


Instead of memcpy, shouldn't an assignment do the trick here as well?


Yes, assignment should work fine, I just used to use memcpy in such 
cases for some reason. I'll replace memcpy with assignment.


Also I think second part (outputs handling) can be dropped for now - 
currently we only need to handle the inputs (for HUD shaders), and later 
when array declarations for inputs/outputs will be implemented in TGSI 
probably we'll need to update the parser in r600g anyway - I'm just not 
sure yet how the semantic indices should be handled for input/output arrays.
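
For reference, a minimal sketch of the same range expansion using struct assignment instead of memcpy. The types here are simplified, hypothetical stand-ins (struct shader_io has only three fields and MAX_IO is an arbitrary capacity); this is not the actual r600g code.

```c
#include <assert.h>

#define MAX_IO 32 /* hypothetical capacity, not taken from r600g */

/* Hypothetical stand-in for struct r600_shader_io. */
struct shader_io {
    int name;
    int sid;
    int gpr;
};

struct shader {
    unsigned ninput;
    struct shader_io input[MAX_IO];
};

/* Reserve one entry per register in the declared range first..last and
 * return the index of the first entry. */
static unsigned declare_input_range(struct shader *s, int name, int sid,
                                    int base_gpr, unsigned first, unsigned last)
{
    unsigned count, i, j;

    assert(last >= first);
    count = last - first + 1;
    i = s->ninput;
    assert(i + count <= MAX_IO);
    s->ninput += count;

    s->input[i].name = name;
    s->input[i].sid = sid;
    s->input[i].gpr = base_gpr + (int)first;

    for (j = 1; j < count; ++j) {
        /* Plain struct assignment copies every field, like the memcpy. */
        s->input[i + j] = s->input[i];
        s->input[i + j].gpr += (int)j;
    }
    return i;
}
```

Struct assignment lets the compiler check that both sides have the same type, which memcpy with an explicit sizeof does not.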


Vadim




+			ctx->shader->input[i + j].gpr += j;
+		}
 		break;
 	case TGSI_FILE_OUTPUT:
-		i = ctx->shader->noutput++;
+		i = ctx->shader->noutput;
+		ctx->shader->noutput += count;
 		ctx->shader->output[i].name = d->Semantic.Name;
 		ctx->shader->output[i].sid = d->Semantic.Index;
 		ctx->shader->output[i].gpr = ctx->file_offset[TGSI_FILE_OUTPUT] + d->Range.First;
@@ -933,6 +939,11 @@ static int tgsi_declaration(struct r600_shader_ctx *ctx)
 				break;
 			}
 		}
+		for (j = 1; j < count; ++j) {
+			memcpy(&ctx->shader->output[i + j], &ctx->shader->output[i],
+			       sizeof(struct r600_shader_io));


Same here.


+			ctx->shader->output[i + j].gpr += j;
+		}
 		break;
 	case TGSI_FILE_CONSTANT:
 	case TGSI_FILE_TEMPORARY:


Christian.




Re: [Mesa-dev] [PATCH] R600: Emit CF_ALU and use true kcache register.

2013-03-28 Thread Vadim Girlin

On 03/28/2013 09:47 PM, Vincent Lejeune wrote:

 [snip]


diff --git a/lib/Target/R600/R600RegisterInfo.td 
b/lib/Target/R600/R600RegisterInfo.td
index ce5994c..3ee6623 100644
--- a/lib/Target/R600/R600RegisterInfo.td
+++ b/lib/Target/R600/R600RegisterInfo.td
@@ -43,6 +43,37 @@ foreach Index = 0-127 in {
 Index;
  }

+// KCACHE_BANK0
+foreach Index = 159-128 in {
+  foreach Chan = [ X, Y, Z, W ] in {
+// 32-bit Temporary Registers
+def KC0_#Index#_#Chan : R600RegWithChan KC0[#Index#-128].#Chan, Index, 
Chan;
+  }
+  // 128-bit Temporary Registers
+  def KC0_#Index#_XYZW : R600Reg_128 KC0[#Index#-128].XYZW,
+ [!castRegister(KC0_#Index#_X),
+  !castRegister(KC0_#Index#_Y),
+  !castRegister(KC0_#Index#_Z),
+  !castRegister(KC0_#Index#_W)],
+ Index;
+}
+
+// KCACHE_BANK1
+foreach Index = 191-159 in {


Probably 160 should be used instead of 159 here (and in the two 
occurrences below)?
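
To illustrate the off-by-one, here is a small sketch of the index arithmetic. It assumes each kcache bank covers 32 register indices, as the register class ranges 128..159 and 160..191 later in this patch suggest; `kcache_decode`, `struct kc_ref` and the constants are hypothetical names, not LLVM code.

```c
#include <assert.h>

enum {
    KC_BANK_SIZE = 32,  /* assumed from the 128..159 and 160..191 ranges */
    KC0_BASE     = 128,
    KC1_BASE     = 160
};

struct kc_ref {
    int bank;   /* 0 for KC0, 1 for KC1 */
    int offset; /* register offset inside the bank */
};

/* Split a register index in 128..191 into a bank and an in-bank offset. */
static struct kc_ref kcache_decode(int index)
{
    struct kc_ref ref;

    assert(index >= KC0_BASE && index < KC1_BASE + KC_BANK_SIZE);
    ref.bank = (index - KC0_BASE) / KC_BANK_SIZE;
    ref.offset = (index - KC0_BASE) % KC_BANK_SIZE;
    return ref;
}
```

Index 159 is the last register of bank 0 and index 160 is the first register of bank 1. Subtracting 159 from a bank 1 index would therefore give offsets 1..32 instead of 0..31.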


Vadim


+  foreach Chan = [ X, Y, Z, W ] in {
+// 32-bit Temporary Registers
+def KC1_#Index#_#Chan : R600RegWithChan KC1[#Index#-159].#Chan, Index, 
Chan;
+  }
+  // 128-bit Temporary Registers
+  def KC1_#Index#_XYZW : R600Reg_128 KC1[#Index#-159].XYZW,
+ [!castRegister(KC1_#Index#_X),
+  !castRegister(KC1_#Index#_Y),
+  !castRegister(KC1_#Index#_Z),
+  !castRegister(KC1_#Index#_W)],
+ Index;
+}
+
+
  // Array Base Register holding input in FS
  foreach Index = 448-480 in {
def ArrayBase#Index :  R600RegARRAY_BASE, Index;
@@ -80,6 +111,38 @@ def R600_Addr : RegisterClass AMDGPU, [i32], 127, (add (sequence 
Addr%u_X,

  } // End isAllocatable = 0

+def R600_KC0_X : RegisterClass AMDGPU, [f32, i32], 32,
+  (add (sequence KC0_%u_X, 128, 159));
+
+def R600_KC0_Y : RegisterClass AMDGPU, [f32, i32], 32,
+  (add (sequence KC0_%u_Y, 128, 159));
+
+def R600_KC0_Z : RegisterClass AMDGPU, [f32, i32], 32,
+  (add (sequence KC0_%u_Z, 128, 159));
+
+def R600_KC0_W : RegisterClass AMDGPU, [f32, i32], 32,
+  (add (sequence KC0_%u_W, 128, 159));
+
+def R600_KC0 : RegisterClass AMDGPU, [f32, i32], 32,
+   (interleave R600_KC0_X, R600_KC0_Y,
+   R600_KC0_Z, R600_KC0_W);
+
+def R600_KC1_X : RegisterClass AMDGPU, [f32, i32], 32,
+  (add (sequence KC1_%u_X, 160, 191));
+
+def R600_KC1_Y : RegisterClass AMDGPU, [f32, i32], 32,
+  (add (sequence KC1_%u_Y, 160, 191));
+
+def R600_KC1_Z : RegisterClass AMDGPU, [f32, i32], 32,
+  (add (sequence KC1_%u_Z, 160, 191));
+
+def R600_KC1_W : RegisterClass AMDGPU, [f32, i32], 32,
+  (add (sequence KC1_%u_W, 160, 191));
+
+def R600_KC1 : RegisterClass AMDGPU, [f32, i32], 32,
+   (interleave R600_KC1_X, R600_KC1_Y,
+   R600_KC1_Z, R600_KC1_W);
+
  def R600_TReg32_X : RegisterClass AMDGPU, [f32, i32], 32,
 (add (sequence T%u_X, 0, 127), AR_X);

diff --git a/test/CodeGen/R600/kcache-fold.ll b/test/CodeGen/R600/kcache-fold.ll
index e8e2bf5..3d70e4b 100644
--- a/test/CodeGen/R600/kcache-fold.ll
+++ b/test/CodeGen/R600/kcache-fold.ll
@@ -1,7 +1,7 @@
  ;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s

  ; CHECK: @main1
-; CHECK: MOV T{{[0-9]+\.[XYZW], CBuf0\[[0-9]+\]\.[XYZW]}}
+; CHECK: MOV T{{[0-9]+\.[XYZW], KC0}}
  define void @main1() {
  main_body:
   %0 = load <4 x float> addrspace(8)* null





[Mesa-dev] [PATCH] r600g: fix range handling for tgsi input/output declarations

2013-03-27 Thread Vadim Girlin
Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---
 src/gallium/drivers/r600/r600_shader.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index 29facf7..d4c9c03 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -874,12 +874,12 @@ static int select_twoside_color(struct r600_shader_ctx *ctx, int front, int back
 static int tgsi_declaration(struct r600_shader_ctx *ctx)
 {
 	struct tgsi_full_declaration *d = &ctx->parse.FullToken.FullDeclaration;
-	unsigned i;
-	int r;
+	int r, i, j, count = d->Range.Last - d->Range.First + 1;
 
 	switch (d->Declaration.File) {
 	case TGSI_FILE_INPUT:
-		i = ctx->shader->ninput++;
+		i = ctx->shader->ninput;
+		ctx->shader->ninput += count;
 		ctx->shader->input[i].name = d->Semantic.Name;
 		ctx->shader->input[i].sid = d->Semantic.Index;
 		ctx->shader->input[i].interpolate = d->Interp.Interpolate;
@@ -903,9 +903,15 @@ static int tgsi_declaration(struct r600_shader_ctx *ctx)
 					return r;
 			}
 		}
+		for (j = 1; j < count; ++j) {
+			memcpy(&ctx->shader->input[i + j], &ctx->shader->input[i],
+			       sizeof(struct r600_shader_io));
+			ctx->shader->input[i + j].gpr += j;
+		}
 		break;
 	case TGSI_FILE_OUTPUT:
-		i = ctx->shader->noutput++;
+		i = ctx->shader->noutput;
+		ctx->shader->noutput += count;
 		ctx->shader->output[i].name = d->Semantic.Name;
 		ctx->shader->output[i].sid = d->Semantic.Index;
 		ctx->shader->output[i].gpr = ctx->file_offset[TGSI_FILE_OUTPUT] + d->Range.First;
@@ -933,6 +939,11 @@ static int tgsi_declaration(struct r600_shader_ctx *ctx)
 				break;
 			}
 		}
+		for (j = 1; j < count; ++j) {
+			memcpy(&ctx->shader->output[i + j], &ctx->shader->output[i],
+			       sizeof(struct r600_shader_io));
+			ctx->shader->output[i + j].gpr += j;
+		}
 		break;
 	case TGSI_FILE_CONSTANT:
 	case TGSI_FILE_TEMPORARY:
-- 
1.8.1.4



Re: [Mesa-dev] [PATCH 0/5] Head-up display for Gallium DRI2 drivers

2013-03-26 Thread Vadim Girlin

On 03/26/2013 02:00 AM, Marek Olšák wrote:

On Mon, Mar 25, 2013 at 10:38 PM, Ondrej Holecek aaa...@gmail.com wrote:

On Saturday 23 of March 2013 00:50:59 Marek Olšák wrote:

Hi everyone, one image is better than a thousand words:
...


Hi,

I tried your patches and hit a few problems. First, they do not apply
cleanly on master as they expect another patch of yours, "cso: add constant
buffer save/restore feature for postprocessing", to be present. But I guess you
are aware of that.


Yes, I sent the patch to mesa-dev earlier.



The second problem is that when I build mesa with HUD on my 32-bit virtual machine,
HUD works (with a 32-bit app of course). When I build it on 64-bit (both are the same
up-to-date OS, openSUSE 12.3), HUD is not working (with a 64-bit app). I managed to
track it down to failed IMM instruction parsing during the HUD_create function. It
appears that the translate_ctx structure in tgsi_text_translate (file
src/gallium/auxiliary/tgsi/tgsi_text.c) is not initialized to zeros on my
64-bit system; instead ctx.num_immediates is equal to 1 and hence triggers the
"Immediates must be sorted" error.
The following fixes HUD for me (note that I really don't know whether I am breaking
something else in mesa here):

diff --git a/src/gallium/auxiliary/tgsi/tgsi_text.c
b/src/gallium/auxiliary/tgsi/tgsi_text.c
index 6b97bee..247ec75 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_text.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_text.c
@@ -1577,6 +1577,7 @@ tgsi_text_translate(
 ctx.tokens = tokens;
 ctx.tokens_cur = tokens;
 ctx.tokens_end = tokens + num_tokens;
+   ctx.num_immediates = 0;

 if (!translate( &ctx ))
return FALSE;


I've sent a fix for this a couple of days ago:

http://www.mail-archive.com/mesa-dev@lists.freedesktop.org/msg36038.html



The third issue is that on both the 32-bit and 64-bit builds, fonts are not displayed
in HUD. I see graphs and transparent background rectangles for text but no
text is visible. I have not solved this one yet.


Your driver must support the I8_UNORM texture format.


I think this may also be related to a TGSI declaration of vertex shader 
inputs that some drivers don't expect:


DCL IN[0..1]

At least r600g expects the separate declaration for each input, though 
fortunately it still works in this case because parsed declarations of 
VS inputs aren't really used in r600g. I noticed exactly the same issue 
(missing text) with my r600-sb branch because it relies on the number of 
the parsed inputs from r600g's tgsi translator. It's 1 in this case 
instead of 2, so second input register is considered undefined and 
optimized away.
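
As a rough illustration of the counting problem, the sketch below derives the number of inputs from a declaration line. `declared_input_count` is a hypothetical helper that only understands the two textual forms shown in its comments; it is not the Mesa TGSI text parser.

```c
#include <stdio.h>

/* Number of input registers declared by a line such as "DCL IN[0]" or
 * "DCL IN[0..1]"; returns 0 if the line has neither form. */
static unsigned declared_input_count(const char *decl)
{
    unsigned first, last;

    /* Range form: every register from first to last is declared. */
    if (sscanf(decl, "DCL IN[%u..%u]", &first, &last) == 2)
        return last >= first ? last - first + 1 : 0;

    /* Single-register form. */
    if (sscanf(decl, "DCL IN[%u]", &first) == 1)
        return 1;

    return 0;
}
```

A translator that adds 1 per declaration sees one input for "DCL IN[0..1]", whereas counting the range gives 2.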


I suspect that some other drivers may also handle this declaration 
incorrectly and this may explain the issue.


Vadim





One last thought: is it intentional that when a wrong query is entered, the hud graph
is displayed but empty? Maybe some text like "wrong query XXX" would be a good
hint. I know it is printed on stdout, but looking for warnings in chatty apps
like openarena is a little tricky.


Yes, it's intentional. I guess I can at least make it not draw an empty pane.

Marek

