Re: [Mesa-dev] [PATCH v2] r600/sb: bail out if prepare_alu_group() doesn't find a proper scheduling
On 10/16/2017 10:06 PM, Gert Wollny wrote:

It is possible that the optimizer ends up in an infinite loop in post_scheduler::schedule_alu(), because post_scheduler::prepare_alu_group() does not find a proper scheduling. This can be deduced from pending.count() being larger than zero and not getting smaller.

This patch works around this problem by signalling this failure so that the optimizer bails out and the un-optimized shader is used.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103142
Signed-off-by: Gert Wollny
---
Change w.r.t. v1:
- In schedule_alu(), if pending.count() == 0 then don't expect that this value is reduced by a call to prepare_alu_group(); instead continue the loop until it is exited by "break".

I've added you, Vadim, as the original author to the CC; maybe you can shed a bit more light on what might be going wrong here, and whether there is an easy real fix instead of just a workaround.

I honestly barely remember all the related details, sorry. I guess you know that code a lot better than I do now. That VLIW scheduling/packing stuff was pretty complicated even when I worked on it. :) Now, after 4 years, I'm too scared to touch it.

So I can't reasonably review it now, but if this patch fixes the bug and doesn't result in any regressions, and if Glenn and Dave have no objections, I guess it's ok to push it; you can add my "acked-by". I'd push it, but I'm not ready to take responsibility for any possible fallout. I hope Dave or whoever maintains r600g will help with that.

Thanks for fixing it.

best regards,
Gert

Note: Submitter has no mesa-git write access.
 src/gallium/drivers/r600/sb/sb_sched.cpp | 43
 src/gallium/drivers/r600/sb/sb_sched.h | 8 +++---
 2 files changed, 31 insertions(+), 20 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_sched.cpp b/src/gallium/drivers/r600/sb/sb_sched.cpp
index 5113b75684..2fbec2f77e 100644
--- a/src/gallium/drivers/r600/sb/sb_sched.cpp
+++ b/src/gallium/drivers/r600/sb/sb_sched.cpp
@@ -711,22 +711,24 @@ void alu_group_tracker::update_flags(alu_node* n) {
 }
 int post_scheduler::run() {
-	run_on(sh.root);
-	return 0;
+	return run_on(sh.root) ? 0 : 1;
 }
-void post_scheduler::run_on(container_node* n) {
-
+bool post_scheduler::run_on(container_node* n) {
+	int r = true;
 	for (node_riterator I = n->rbegin(), E = n->rend(); I != E; ++I) {
 		if (I->is_container()) {
 			if (I->subtype == NST_BB) {
 				bb_node* bb = static_cast<bb_node*>(*I);
-				schedule_bb(bb);
+				r = schedule_bb(bb);
 			} else {
-				run_on(static_cast<container_node*>(*I));
+				r = run_on(static_cast<container_node*>(*I));
 			}
+			if (!r)
+				break;
 		}
 	}
+	return r;
 }
 void post_scheduler::init_uc_val(container_node *c, value *v) {
@@ -758,7 +760,7 @@ unsigned post_scheduler::init_ucm(container_node *c, node *n) {
 	return F == ucm.end() ? 0 : F->second;
 }
-void post_scheduler::schedule_bb(bb_node* bb) {
+bool post_scheduler::schedule_bb(bb_node* bb) {
 	PSC_DUMP(
 		sblog << "scheduling BB " << bb->id << "\n";
 		if (!pending.empty())
@@ -791,8 +793,10 @@ void post_scheduler::schedule_bb(bb_node* bb) {
 		if (n->is_alu_clause()) {
 			n->remove();
-			process_alu(static_cast<container_node*>(n));
-			continue;
+			bool r = process_alu(static_cast<container_node*>(n));
+			if (r)
+				continue;
+			return false;
 		}
 		n->remove();
@@ -800,6 +804,7 @@ void post_scheduler::schedule_bb(bb_node* bb) {
 	}
 	this->cur_bb = NULL;
+	return true;
 }
 void post_scheduler::init_regmap() {
@@ -933,10 +938,10 @@ void post_scheduler::process_fetch(container_node *c) {
 	cur_bb->push_front(c);
 }
-void post_scheduler::process_alu(container_node *c) {
+bool post_scheduler::process_alu(container_node *c) {
 	if (c->empty())
-		return;
+		return true;
 	ucm.clear();
 	alu.reset();
@@ -973,7 +978,7 @@ void post_scheduler::process_alu(container_node *c) {
 		}
 	}
-	schedule_alu(c);
+	return schedule_alu(c);
 }
 void post_scheduler::update_local_interferences() {
@@ -1135,15 +1140,20 @@ void post_scheduler::emit_clause() {
 	emit_index_registers();
 }
-void post_scheduler::schedule_alu(container_node *c) {
+bool post_scheduler::schedule_alu(container_node *c) {
 	assert(!ready.empty() || !ready_copies.empty());
-	while (1) {
-
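
The bail-out idea from the patch description can be illustrated in isolation. The following is only a minimal sketch, not the Mesa code: toy_scheduler, prepare_group(), schedule() and run_toy() are hypothetical names, and the "fits" predicate stands in for whatever makes prepare_alu_group() fail. It shows a scheduling loop that returns false when the pending queue stops shrinking, instead of spinning forever, and a run() wrapper that maps that to a non-zero result.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>

// Hypothetical stand-in for the scheduler's pending queue; the real
// post_scheduler keeps far more state than this.
struct toy_scheduler {
	std::deque<int> pending;

	// Stand-in for prepare_alu_group(): consumes the front item only if
	// the predicate accepts it, and reports how many items it consumed.
	std::size_t prepare_group(const std::function<bool(int)> &fits) {
		if (!pending.empty() && fits(pending.front())) {
			pending.pop_front();
			return 1;
		}
		return 0;
	}

	// Returns false as soon as an iteration makes no progress, so the
	// caller can fall back instead of looping forever.
	bool schedule(const std::function<bool(int)> &fits) {
		while (!pending.empty()) {
			std::size_t before = pending.size();
			prepare_group(fits);
			if (pending.size() >= before)
				return false; // pending did not shrink: give up
		}
		assert(pending.empty());
		return true;
	}
};

// Mirrors the shape of the patched run(): 0 on success, 1 on failure.
inline int run_toy(toy_scheduler &s, const std::function<bool(int)> &fits) {
	return s.schedule(fits) ? 0 : 1;
}
```

With a predicate that rejects one item, run_toy() returns 1 and leaves the unschedulable items in pending, which is the point where a caller would discard the optimized result and keep the un-optimized shader.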
Re: [Mesa-dev] [PATCH] r600/sb: remove superfluos assert
On 09/13/2017 11:16 AM, Gert Wollny wrote:
On Tuesday, 12.09.2017, 23:44 +0200, Glenn Kennard wrote:

Vadim is correct, the fix is to extend the check in the if case above to also exclude TGSI_FILE_SYSTEM_VALUE, and keep the assert in place, i.e.:

if (pshader->indirect_files & ~((1 << TGSI_FILE_CONSTANT) | (1 << TGSI_FILE_SAMPLER) | (1 << TGSI_FILE_SYSTEM_VALUE))) {

Good, I'll update the patch accordingly. I guess the else path below is then only a fall-back for non-debug builds that makes all GPRs available as one big array, to keep the code somehow valid for execution, right?

Yes, it's just a safe fall-back in case we don't have proper array info for some reason. It makes the backend assume that all GPRs can be accessed indirectly.

I think I'd like to add a comment for that when I submit the new patch, because it is kind of irritating to see an assert and then a code path that seems to properly handle the case that would make the assert fail.

if (pshader->num_arrays) {
	...
} else {
	sh->add_gpr_array(0, pshader->bc.ngpr, 0x0F);
}

Best,
Gert

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
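
The extended mask check suggested above can be tried out on its own. This is a sketch under assumptions: the toy_file values are illustrative and are not claimed to match the real TGSI_FILE_* constants, and file_bit(), excluded_files and needs_gpr_arrays() are hypothetical helper names. It shows that a shader whose only indirect access goes to system values no longer enters the branch that expects array info.

```cpp
#include <cstdint>

// Illustrative register-file indices; the real TGSI_FILE_* values are
// defined by Mesa and may differ from these.
enum toy_file : unsigned {
	FILE_CONSTANT = 1,
	FILE_TEMPORARY = 4,
	FILE_SAMPLER = 5,
	FILE_SYSTEM_VALUE = 9,
};

constexpr std::uint32_t file_bit(toy_file f) { return 1u << f; }

// Files that the proposed check excludes from the "needs arrays" test.
constexpr std::uint32_t excluded_files =
	file_bit(FILE_CONSTANT) | file_bit(FILE_SAMPLER) |
	file_bit(FILE_SYSTEM_VALUE);

// True when some indirectly addressed file is left after masking out the
// excluded ones, i.e. when the assert on num_arrays would be reached.
constexpr bool needs_gpr_arrays(std::uint32_t indirect_files) {
	return (indirect_files & ~excluded_files) != 0;
}
```

For the failing piglit shader, which per the thread has indirect access only to SV, needs_gpr_arrays(file_bit(FILE_SYSTEM_VALUE)) is false, so the assert is skipped; indirect access to temporaries still takes the branch.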
Re: [Mesa-dev] [PATCH] r600/sb: remove superfluos assert
On 09/12/2017 12:49 PM, Gert Wollny wrote:
On Tuesday, 12.09.2017, 09:56 +0300, Vadim Girlin wrote:
On 09/11/2017 07:09 PM, Emil Velikov wrote:

Anyway, if num_arrays is 0 there, I suspect it can be a result of some other issue. At the very least it looks like a potential performance problem, because in that case we assume all shader registers can be accessed with indirect addressing and it can limit the optimizations significantly. So it might make sense to figure out why it's zero in the first place; in theory it shouldn't happen. Maybe something is wrong with the indirect_files bits?

The shader that's failing is this (i.e. no arrays, and indirect access only to SV).

Is the tested feature really supported by r600g? AFAICS the indirect index value is unused in the shader code.

Anyway, at first glance it looks like we don't need indirect addressing for GPRs in this case, so the outer "if" around that assert probably should handle this case too and skip the assert. I'm not 100% sure though.

FRAG
DCL SV[0], SAMPLEMASK
DCL OUT[0], COLOR
DCL CONST[0][0]
DCL TEMP[0..1], LOCAL
DCL ADDR[0]
IMM[0] FLT32 {1., 0., 0., 0.}
IMM[1] INT32 {1, 0, 0, 0}
0: MOV TEMP[0], IMM[0].xyyx
1: UARL ADDR[0].x, CONST[0][0].
2: USEQ TEMP[1].x, SV[ADDR[0].x]., IMM[1].
3: UIF TEMP[1].
4: MOV TEMP[0].xy, IMM[0].yxyy
5: ENDIF
6: MOV OUT[0], TEMP[0]
7: END

= SHADER #12 == PS/BARTS/EVERGREEN =
= 36 dw = 8 gprs = 1 stack =
4005 a418 ALU_PUSH_BEFORE 7 @10 KC0[CB0:0-15]
0010 00f9 00400c90 1 x: MOV R2.x, 1.0
0012 04f8 20400c90 y: MOV R2.y, 0
0014 04f8 40400c90 z: MOV R2.z, 0
0016 00f9 60400c90 w: MOV R2.w, 1.0
0018 8080 00800c90 t: MOV R4.x, KC0[0].x
0020 801f4800 00601d10 2 x: SETE_INT R3.x, R0.z, 1
0022 801f00fe 00e0229c 3 MP x: PRED_SETNE_INT R7.x, PV.x, 0
0002 0003 8281 JUMP @6 POP:1
0004 000c a804 ALU_POP_AFTER 2 @24
0024 04f8 00400c90 4 x: MOV R2.x, 0
0026 80f9 20400c90 y: MOV R2.y, 1.0
0006 000e a00c ALU 4 @28
0028 0002 00200c90 5 x: MOV R1.x, R2.x
0030 0402 20200c90 y: MOV R1.y, R2.y
0032 0802 40200c90 z: MOV R1.z, R2.z
0034 8c02 60200c90 w: MOV R1.w, R2.w
0008 c0008000 95200688 EXPORT_DONE PIXEL 0 R1.xyzw EOP
= SHADER_END
Re: [Mesa-dev] [PATCH] r600/sb: remove superfluos assert
On 09/11/2017 07:09 PM, Emil Velikov wrote:
On 11 September 2017 at 15:39, Gert Wollny <gw.foss...@gmail.com> wrote:

The assert checks whether pshader->num_arrays != 0, but the code after the assert actually branches based on the same check. Removing this assert fixes:

piglit spec@arb_gpu_shader5@execution@samplemaskin-indirect

Both assert() and if () have existed since day 1, with the commit below. Perhaps Vadim has some ideas what happened here?

I guess the assert was added initially just to make sure that I set the indirect_files and num_arrays fields correctly elsewhere and that everything related to the indirect arrays works as I expect. Many features were added since then, so my assumptions from that time could be wrong now; I'm just not sure off-hand.

Anyway, if num_arrays is 0 there, I suspect it can be a result of some other issue. At the very least it looks like a potential performance problem, because in that case we assume all shader registers can be accessed with indirect addressing and it can limit the optimizations significantly. So it might make sense to figure out why it's zero in the first place; in theory it shouldn't happen. Maybe something is wrong with the indirect_files bits?

I'm adding Glenn to cc too; AFAIU he has added some related features since then, so possibly he knows better.

Cc: Vadim Girlin <vadimgir...@gmail.com>
Fixes: 2cd76917934 ("r600g/sb: initial commit of the optimizing shader backend")

-Emil
Re: [Mesa-dev] [PATCH] r600g: Fix handling of TGSI_OPCODE_ARR with SB
On 08/13/15 21:30, Glenn Kennard wrote:

FLT_TO_INT goes in the vector pipes on evergreen/NI, not the trans unit as on earlier chips.

FWIW, AFAIK it works in trans as well, it just uses a different rounding mode. According to the description in the EG ISA doc:

Channels 0-3 use the 32-bit round mode state; channel 4 uses truncation.

So vector slots use the default rounding mode, and the trans slot always uses trunc.

That said, I have no objections to that change. I think it makes sense to limit it to the expected behavior; I hoped to control it somewhere later, but never got close to it. So just FYI.

Signed-off-by: Glenn Kennard glenn.kenn...@gmail.com
---
Fixes issue found on nine: https://github.com/iXit/Mesa-3D/issues/119

 src/gallium/drivers/r600/r600_isa.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/r600_isa.h b/src/gallium/drivers/r600/r600_isa.h
index 381f06d..fdbe1c0 100644
--- a/src/gallium/drivers/r600/r600_isa.h
+++ b/src/gallium/drivers/r600/r600_isa.h
@@ -262,7 +262,7 @@ static const struct alu_op_info alu_op_table[] = {
 	{PRED_SETNE_PUSH_INT, 2, { 0x4D, 0x4D },{ AF_VS, AF_VS, AF_VS, AF_VS}, AF_PRED_PUSH | AF_CC_NE | AF_INT_CMP },
 	{PRED_SETLT_PUSH_INT, 2, { 0x4E, 0x4E },{ AF_VS, AF_VS, AF_VS, AF_VS}, AF_PRED_PUSH | AF_CC_LT | AF_INT_CMP },
 	{PRED_SETLE_PUSH_INT, 2, { 0x4F, 0x4F },{ AF_VS, AF_VS, AF_VS, AF_VS}, AF_PRED_PUSH | AF_CC_LE | AF_INT_CMP },
-	{FLT_TO_INT,1, { 0x6B, 0x50 },{ AF_S, AF_S, AF_VS, AF_VS}, AF_INT_DST | AF_CVT },
+	{FLT_TO_INT,1, { 0x6B, 0x50 },{ AF_S, AF_S, AF_V, AF_V}, AF_INT_DST | AF_CVT },
 	{BFREV_INT, 1, { -1, 0x51 },{ 0, 0, AF_VS, AF_VS}, AF_INT_DST },
 	{ADDC_UINT, 2, { -1, 0x52 },{ 0, 0, AF_VS, AF_VS}, AF_UINT_DST },
 	{SUBB_UINT, 2, { -1, 0x53 },{ 0, 0, AF_VS, AF_VS}, AF_UINT_DST },
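
Why the slot matters for FLT_TO_INT can be shown on the CPU. This is only an illustration, assuming the round mode state is round-to-nearest-even (the host default for std::nearbyint); the hardware state may be configured differently, and flt_to_int_vector() and flt_to_int_trans() are hypothetical names, not driver functions.

```cpp
#include <cmath>

// Conversion as a vector slot might perform it, ASSUMING the round mode
// state is round-to-nearest-even. std::nearbyint honours the current
// floating-point rounding mode, which defaults to round-to-nearest.
inline int flt_to_int_vector(float v) {
	return static_cast<int>(std::nearbyint(v));
}

// Conversion as the trans slot performs it according to the quoted ISA
// text: always truncation toward zero.
inline int flt_to_int_trans(float v) {
	return static_cast<int>(std::trunc(v));
}
```

Under that assumption, 2.7f converts to 3 in a vector slot but to 2 in the trans slot, so letting the scheduler pick either slot would make the result depend on packing. Restricting the opcode to AF_V removes that ambiguity.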
Re: [Mesa-dev] r600/sb loop issue
On 12/16/2014 05:44 AM, Dave Airlie wrote: On 16 December 2014 at 08:59, Vadim Girlin vadimgir...@gmail.com wrote: On 12/16/2014 01:30 AM, Dave Airlie wrote: New patch is attached, the only difference is in the sb_sched.cpp (it disables copy coalescing for some unsafe cases, so it may leave more MOVs than previously, but I don't think there will be any noticeable effect on performance). So far I don't see any problems with it, but I don't have many GL apps on the test machine. At least lightsmark and unigine demos work for me. Based on my limited understanding of the code: Acked-by: Alex Deucher alexander.deuc...@amd.com Alex, thanks for the review, I understand you wanted it to get into mesa release, but it really needs careful testing with more apps, so far I hoped Dave would do it as long as he's looking into these issues anyway. In theory I can also install steam on the test machine and some games, it just needs the time and I'm not sure if I'll find it, so far my main job is sufficient to make me pretty tired. Current scheduler in SB is very fragile after adding handling for all special cases discovered during initial debugging etc, I said since the very beginning that I'd like to rewrite it, if only I had time. So any change like this can potentially break some apps even if piglit passes, and I'm not ready to take responsibility for that if I commit it myself, I just don't have time to deal with all possible consequences on all supported chips. If you think it's ok, just push this patch (it requires revert of the previous Dave's commit 7b0067d2). I'm really sorry that I can't do more to help with it. Myself and Glenn are looking at it, Glenn noticed a piglit regression from this yesterday, I'll reproduce today and take a look. Hi, Dave Glenn, Thanks for looking into it. FWIW, when I worked on it I've ran piglit's quick tests and didn't see any regressions on evergreen (juniper 5750). 
There were some failed tests in some piglit runs, but AFAIU they were just random.

Turns out we had a pre-existing fail that we noticed, not a regression. I'm going to push this, since it's better than what is there; we can also see if some public testing notices any big issues.

Thanks, Dave. I'm really sorry that I can't pay as much attention to that code as I'd like, and I really appreciate your and Glenn's efforts in maintaining it. (In case someone thinks it's my fault, I must remind you that I warned I wouldn't be able to support it even before it was merged. So please don't blame me :) ).
Re: [Mesa-dev] r600/sb loop issue
On 12/12/2014 05:28 PM, Alex Deucher wrote: On Wed, Dec 10, 2014 at 6:50 AM, Vadim Girlin vadimgir...@gmail.com wrote: On 12/09/2014 07:39 AM, Vadim Girlin wrote: On 12/09/2014 05:18 AM, Dave Airlie wrote: On 8 December 2014 at 20:41, Vadim Girlin vadimgir...@gmail.com wrote: On 12/06/2014 07:13 AM, Vadim Girlin wrote: On 12/04/2014 01:43 AM, Dave Airlie wrote: Hi Vadim, I've been looking with Glenn's help into a bug in sb for a couple of weeks now triggered by a change in how GLSL generates switch statements. I understand you probably aren't too interested in r600g but I believe I'm hitting a design level problem and I would like some advice. So it appears that GLSL can create loops that don't repeat for switch statements, and it appears SB wasn't ready to handle such a thing. Hi, Dave, I suspect we should rather get rid of such loops somehow, i.e. convert to something else, the loop that never repeats is not really a loop anyway. AFAICS continue is not supported in switch statements according to GLSL specs, so the loops generated for switch will never be repeated. Am I missing something? Even if repeating is possible somehow, at least we can get rid of the loops that are not repeated. I think loops are less efficient than other control flow instructions on r600g hw (at least because they increase stack usage), and possibly on other hw too. In fact it seems sb basically gets rid of it already in IR, it just doesn't know how to translate resulting control flow to ISA, because so far it only supports specific control flow structure for if-then-else that was previously preserved during optimizations. I think it may be not very hard to implement support for that in finalizer, I'll look into it. In fact handling that control flow in finalizer is not as easy as I hoped, probably impossible, at least if we want to make it efficient. I forgot about the limitations of R600 ISA. 
OTOH it seems I've managed to fix the issues with loops, the patch is attached (it's meant to be used instead of 7b0067d2). There are no piglit regressions on evergreen, but I didn't test any real apps. This does seem to fix the problems in piglit, and looks close to what I was attempting but written by someone who knows what they are doing :-) What is the sb_sched.cpp change for at the end for? It fixes those scheduler/regalloc errors for switch tests. Unfortunately, now I've installed some benchmarks for testing and AFAICS this patch breaks at least lightsmark 2008, so it seems the condition removed by the patch was there for a reason. I'll probably try to come up with better fix. New patch is attached, the only difference is in the sb_sched.cpp (it disables copy coalescing for some unsafe cases, so it may leave more MOVs than previously, but I don't think there will be any noticeable effect on performance). So far I don't see any problems with it, but I don't have many GL apps on the test machine. At least lightsmark and unigine demos work for me. Based on my limited understanding of the code: Acked-by: Alex Deucher alexander.deuc...@amd.com Alex, thanks for the review, I understand you wanted it to get into mesa release, but it really needs careful testing with more apps, so far I hoped Dave would do it as long as he's looking into these issues anyway. In theory I can also install steam on the test machine and some games, it just needs the time and I'm not sure if I'll find it, so far my main job is sufficient to make me pretty tired. Current scheduler in SB is very fragile after adding handling for all special cases discovered during initial debugging etc, I said since the very beginning that I'd like to rewrite it, if only I had time. 
So any change like this can potentially break some apps even if piglit passes, and I'm not ready to take responsibility for that if I commit it myself, I just don't have time to deal with all possible consequences on all supported chips. If you think it's ok, just push this patch (it requires revert of the previous Dave's commit 7b0067d2). I'm really sorry that I can't do more to help with it. Vadim Vadim Vadim Dave.
Re: [Mesa-dev] r600/sb loop issue
On 12/16/2014 01:30 AM, Dave Airlie wrote: New patch is attached, the only difference is in the sb_sched.cpp (it disables copy coalescing for some unsafe cases, so it may leave more MOVs than previously, but I don't think there will be any noticeable effect on performance). So far I don't see any problems with it, but I don't have many GL apps on the test machine. At least lightsmark and unigine demos work for me. Based on my limited understanding of the code: Acked-by: Alex Deucher alexander.deuc...@amd.com Alex, thanks for the review, I understand you wanted it to get into mesa release, but it really needs careful testing with more apps, so far I hoped Dave would do it as long as he's looking into these issues anyway. In theory I can also install steam on the test machine and some games, it just needs the time and I'm not sure if I'll find it, so far my main job is sufficient to make me pretty tired. Current scheduler in SB is very fragile after adding handling for all special cases discovered during initial debugging etc, I said since the very beginning that I'd like to rewrite it, if only I had time. So any change like this can potentially break some apps even if piglit passes, and I'm not ready to take responsibility for that if I commit it myself, I just don't have time to deal with all possible consequences on all supported chips. If you think it's ok, just push this patch (it requires revert of the previous Dave's commit 7b0067d2). I'm really sorry that I can't do more to help with it. Myself and Glenn are looking at it, Glenn noticed a piglit regression from this yesterday, I'll reproduce today and take a look. Hi, Dave Glenn, Thanks for looking into it. FWIW, when I worked on it I've ran piglit's quick tests and didn't see any regressions on evergreen (juniper 5750). There were some failed tests in some piglit runs, but AFAIU they were just random. If there are any problems with this fix, I'll be glad to try to help, if time allows. Vadim Dave. 
Re: [Mesa-dev] r600/sb loop issue
On 12/09/2014 07:39 AM, Vadim Girlin wrote: On 12/09/2014 05:18 AM, Dave Airlie wrote: On 8 December 2014 at 20:41, Vadim Girlin vadimgir...@gmail.com wrote: On 12/06/2014 07:13 AM, Vadim Girlin wrote: On 12/04/2014 01:43 AM, Dave Airlie wrote: Hi Vadim, I've been looking with Glenn's help into a bug in sb for a couple of weeks now triggered by a change in how GLSL generates switch statements. I understand you probably aren't too interested in r600g but I believe I'm hitting a design level problem and I would like some advice. So it appears that GLSL can create loops that don't repeat for switch statements, and it appears SB wasn't ready to handle such a thing. Hi, Dave, I suspect we should rather get rid of such loops somehow, i.e. convert to something else, the loop that never repeats is not really a loop anyway. AFAICS continue is not supported in switch statements according to GLSL specs, so the loops generated for switch will never be repeated. Am I missing something? Even if repeating is possible somehow, at least we can get rid of the loops that are not repeated. I think loops are less efficient than other control flow instructions on r600g hw (at least because they increase stack usage), and possibly on other hw too. In fact it seems sb basically gets rid of it already in IR, it just doesn't know how to translate resulting control flow to ISA, because so far it only supports specific control flow structure for if-then-else that was previously preserved during optimizations. I think it may be not very hard to implement support for that in finalizer, I'll look into it. In fact handling that control flow in finalizer is not as easy as I hoped, probably impossible, at least if we want to make it efficient. I forgot about the limitations of R600 ISA. OTOH it seems I've managed to fix the issues with loops, the patch is attached (it's meant to be used instead of 7b0067d2). There are no piglit regressions on evergreen, but I didn't test any real apps. 
This does seem to fix the problems in piglit, and looks close to what I was attempting but written by someone who knows what they are doing :-) What is the sb_sched.cpp change at the end for?

It fixes those scheduler/regalloc errors for switch tests. Unfortunately, now I've installed some benchmarks for testing and AFAICS this patch breaks at least lightsmark 2008, so it seems the condition removed by the patch was there for a reason. I'll probably try to come up with a better fix.

New patch is attached, the only difference is in sb_sched.cpp (it disables copy coalescing for some unsafe cases, so it may leave more MOVs than previously, but I don't think there will be any noticeable effect on performance). So far I don't see any problems with it, but I don't have many GL apps on the test machine. At least lightsmark and unigine demos work for me.

Vadim

Vadim

Dave.

From d2d16fa39c7b4e871d67e05bad92a540d7e5ea68 Mon Sep 17 00:00:00 2001
From: Vadim Girlin vadimgir...@gmail.com
Date: Wed, 10 Dec 2014 14:41:10 +0300
Subject: [PATCH] r600g/sb: fix issues with loops created for switch
---
 src/gallium/drivers/r600/sb/sb_bc_finalize.cpp | 2 ++
 src/gallium/drivers/r600/sb/sb_bc_parser.cpp | 2 ++
 src/gallium/drivers/r600/sb/sb_if_conversion.cpp | 4 ++--
 src/gallium/drivers/r600/sb/sb_ir.h | 9 +++--
 src/gallium/drivers/r600/sb/sb_sched.cpp | 3 +++
 5 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
index f0849ca..3f362c4 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
@@ -110,6 +110,8 @@ int bc_finalizer::run() {
 void bc_finalizer::finalize_loop(region_node* r) {
+	update_nstack(r);
+
 	cf_node *loop_start = sh.create_cf(CF_OP_LOOP_START_DX10);
 	cf_node *loop_end = sh.create_cf(CF_OP_LOOP_END);
diff --git a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
index d787e5b..403f938 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
@@ -758,6 +758,8 @@ int bc_parser::prepare_loop(cf_node* c) {
 	c->insert_before(reg);
 	rep->move(c, end->next);
+	reg->src_loop = true;
+
 	loop_stack.push(reg);
 	return 0;
 }
diff --git a/src/gallium/drivers/r600/sb/sb_if_conversion.cpp b/src/gallium/drivers/r600/sb/sb_if_conversion.cpp
index 93edace..3f2b1b1 100644
--- a/src/gallium/drivers/r600/sb/sb_if_conversion.cpp
+++ b/src/gallium/drivers/r600/sb/sb_if_conversion.cpp
@@ -115,13 +115,13 @@ void if_conversion::convert_kill_instructions(region_node *r,
 bool if_conversion::check_and_convert(region_node *r) {
 	depart_node *nd1 = static_cast<depart_node*>(r->first);
-	if (!nd1->is_depart())
+	if (!nd1->is_depart() || nd1->target != r)
 		return false;
 	if_node *nif = static_cast<if_node*>(nd1->first);
 	if (!nif->is_if
Re: [Mesa-dev] r600/sb loop issue
On 12/06/2014 07:13 AM, Vadim Girlin wrote: On 12/04/2014 01:43 AM, Dave Airlie wrote: Hi Vadim, I've been looking with Glenn's help into a bug in sb for a couple of weeks now triggered by a change in how GLSL generates switch statements. I understand you probably aren't too interested in r600g but I believe I'm hitting a design level problem and I would like some advice. So it appears that GLSL can create loops that don't repeat for switch statements, and it appears SB wasn't ready to handle such a thing. Hi, Dave, I suspect we should rather get rid of such loops somehow, i.e. convert to something else, the loop that never repeats is not really a loop anyway. AFAICS continue is not supported in switch statements according to GLSL specs, so the loops generated for switch will never be repeated. Am I missing something? Even if repeating is possible somehow, at least we can get rid of the loops that are not repeated. I think loops are less efficient than other control flow instructions on r600g hw (at least because they increase stack usage), and possibly on other hw too. In fact it seems sb basically gets rid of it already in IR, it just doesn't know how to translate resulting control flow to ISA, because so far it only supports specific control flow structure for if-then-else that was previously preserved during optimizations. I think it may be not very hard to implement support for that in finalizer, I'll look into it. In fact handling that control flow in finalizer is not as easy as I hoped, probably impossible, at least if we want to make it efficient. I forgot about the limitations of R600 ISA. OTOH it seems I've managed to fix the issues with loops, the patch is attached (it's meant to be used instead of 7b0067d2). There are no piglit regressions on evergreen, but I didn't test any real apps. Vadim sb has the -is_loop() and it just checks !repeats.empty(), so this meant in the finalizer code we'd fall into the if statement which would then assert. 
I hacked/fixed (more hacked) this in 7b0067d23a6f64cf83c42e7f11b2cd4100c569fe, which attempts to detect single pass loops and handle things that way. However, this led to stack depth calculations being done incorrectly, so I moved the single loop detection into the is_loop check (see attached patch). This fixes the rendering in some places, but led to a regression in tests/shaders/glsl-vs-continue-in-switch-in-do-while.shader_test:

error at : PHI t76||FP@R3.x, t128||FP@R3.x, t115||FP@R3.x, t102||FP@R3.x, t89||FP@R3.x : expected operand value t115||FP@R3.x, gpr contains t17||FP@R3.x
error at : PHI t76||FP@R3.x, t128||FP@R3.x, t115||FP@R3.x, t102||FP@R3.x, t89||FP@R3.x : expected operand value t102||FP@R3.x, gpr contains t17||FP@R3.x

Now Glenn suspected this was due to the is_loop check in sb_shader.cpp:create_bbs, and changing that check to only detect repeating loops removes that issue, but introduces stack sizing issues again, resulting in lockups/random rendering.

So I just want to ask: had you considered single loops with an always break in the sb design,

I didn't see such loops with any test cases, so I didn't even think about it.

and perhaps some idea where things are going so wrong with the register alloc above.

Not sure, but as long as the only repeat node is optimized away in bc_parser because it's useless due to the unconditional break, I suspect it may not be easy to make all the other code think that it's still a loop. I've tried a quick fix to not optimize the repeat away for such loops, but it results in other issues; it will probably require handling this as a special case in other places, so it doesn't look like a good idea either.

I'll try to implement the solution that I described above, that is, translate the resulting control flow back to ISA. If it isn't too much work, it's probably the best way, and it won't use loop instructions in the end.

I suspect I'll keep digging into this, but it's getting to the edges of the brain space/time I can find!

Dave.
From 4967ef90847f921fc0ef7c018ae7ae8048d2a6ce Mon Sep 17 00:00:00 2001
From: Vadim Girlin vadimgir...@gmail.com
Date: Mon, 8 Dec 2014 13:11:48 +0300
Subject: [PATCH] r600g/sb: fix issues with loops created for switch statements
---
 src/gallium/drivers/r600/sb/sb_bc_finalize.cpp | 2 ++
 src/gallium/drivers/r600/sb/sb_bc_parser.cpp | 2 ++
 src/gallium/drivers/r600/sb/sb_if_conversion.cpp | 4 ++--
 src/gallium/drivers/r600/sb/sb_ir.h | 9 +++--
 src/gallium/drivers/r600/sb/sb_sched.cpp | 2 +-
 5 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
index f0849ca..3f362c4 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
@@ -110,6 +110,8 @@ int bc_finalizer::run() {
 void bc_finalizer::finalize_loop(region_node* r
Re: [Mesa-dev] r600/sb loop issue
On 12/09/2014 05:18 AM, Dave Airlie wrote: On 8 December 2014 at 20:41, Vadim Girlin vadimgir...@gmail.com wrote: On 12/06/2014 07:13 AM, Vadim Girlin wrote: On 12/04/2014 01:43 AM, Dave Airlie wrote: Hi Vadim, I've been looking with Glenn's help into a bug in sb for a couple of weeks now triggered by a change in how GLSL generates switch statements. I understand you probably aren't too interested in r600g but I believe I'm hitting a design level problem and I would like some advice. So it appears that GLSL can create loops that don't repeat for switch statements, and it appears SB wasn't ready to handle such a thing. Hi, Dave, I suspect we should rather get rid of such loops somehow, i.e. convert to something else, the loop that never repeats is not really a loop anyway. AFAICS continue is not supported in switch statements according to GLSL specs, so the loops generated for switch will never be repeated. Am I missing something? Even if repeating is possible somehow, at least we can get rid of the loops that are not repeated. I think loops are less efficient than other control flow instructions on r600g hw (at least because they increase stack usage), and possibly on other hw too. In fact it seems sb basically gets rid of it already in IR, it just doesn't know how to translate resulting control flow to ISA, because so far it only supports specific control flow structure for if-then-else that was previously preserved during optimizations. I think it may be not very hard to implement support for that in finalizer, I'll look into it. In fact handling that control flow in finalizer is not as easy as I hoped, probably impossible, at least if we want to make it efficient. I forgot about the limitations of R600 ISA. OTOH it seems I've managed to fix the issues with loops, the patch is attached (it's meant to be used instead of 7b0067d2). There are no piglit regressions on evergreen, but I didn't test any real apps. 
This does seem to fix the problems in piglit, and looks close to what I was attempting but written by someone who knows what they are doing :-) What is the sb_sched.cpp change at the end for? It fixes those scheduler/regalloc errors for switch tests. Unfortunately, now I've installed some benchmarks for testing and AFAICS this patch breaks at least lightsmark 2008, so it seems the condition removed by the patch was there for a reason. I'll probably try to come up with a better fix. Vadim Dave. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] r600/sb loop issue
On 12/04/2014 01:43 AM, Dave Airlie wrote:
> Hi Vadim,
>
> I've been looking with Glenn's help into a bug in sb for a couple of weeks now, triggered by a change in how GLSL generates switch statements. I understand you probably aren't too interested in r600g, but I believe I'm hitting a design-level problem and I would like some advice.
>
> So it appears that GLSL can create loops that don't repeat for switch statements, and it appears SB wasn't ready to handle such a thing.

Hi, Dave,

I suspect we should rather get rid of such loops somehow, i.e. convert them to something else; a loop that never repeats is not really a loop anyway. AFAICS continue is not supported in switch statements according to GLSL specs, so the loops generated for switch will never be repeated. Am I missing something? Even if repeating is possible somehow, at least we can get rid of the loops that are not repeated. I think loops are less efficient than other control flow instructions on r600g hw (at least because they increase stack usage), and possibly on other hw too.

In fact it seems sb basically gets rid of it already in IR, it just doesn't know how to translate the resulting control flow to ISA, because so far it only supports the specific control flow structure for if-then-else that was previously preserved during optimizations. I think it may not be very hard to implement support for that in the finalizer, I'll look into it.

> sb has the ->is_loop() and it just checks !repeats.empty(), so this meant in the finalizer code we'd fall into the if statement which would then assert.
>
> I hacked/fixed (more hacked) this in 7b0067d23a6f64cf83c42e7f11b2cd4100c569fe, which attempts to detect single-pass loops and handle things that way. However, this led to stack depth calculations being done incorrectly, so I moved the single-loop detection into the is_loop check (see attached patch).
>
> This fixes the rendering in some places, but led to a regression in tests/shaders/glsl-vs-continue-in-switch-in-do-while.shader_test:
>
>     error at : PHI t76||FP@R3.x, t128||FP@R3.x, t115||FP@R3.x, t102||FP@R3.x, t89||FP@R3.x : expected operand value t115||FP@R3.x, gpr contains t17||FP@R3.x
>     error at : PHI t76||FP@R3.x, t128||FP@R3.x, t115||FP@R3.x, t102||FP@R3.x, t89||FP@R3.x : expected operand value t102||FP@R3.x, gpr contains t17||FP@R3.x
>
> Now Glenn suspected this was due to the is_loop check in sb_shader.cpp:create_bbs, and changing that check to only detect repeating loops removes that issue, but introduces stack sizing issues again, resulting in lockups/random rendering.
>
> So I just want to ask, had you considered single loops with an always break in sb design,

I didn't see such loops with any test cases, so I didn't even think about it.

> and perhaps some idea where things are going so wrong with the register alloc above.

Not sure, but as long as the only repeat node is optimized away in bc_parser because it's useless due to the unconditional break, I suspect it may not be easy to make all the other code think that it's still a loop. I've tried a quick fix to not optimize the repeat away for such loops, but it results in other issues; it will probably require handling this as a special case in other places, so it doesn't look like a good idea either.

I'll try to implement the solution that I described above, that is, translate the resulting control flow back to ISA. If it isn't too much work, it's probably the best way, and it won't use loop instructions in the end.

> I suspect I'll keep digging into this, but it's getting to the edges of the brain space/time I can find!
>
> Dave.
Re: [Mesa-dev] r600/sb loop issue
On 12/06/2014 07:50 AM, Matt Turner wrote:
> On Fri, Dec 5, 2014 at 8:13 PM, Vadim Girlin vadimgir...@gmail.com wrote:
>> I suspect we should rather get rid of such loops somehow, i.e. convert to something else, the loop that never repeats is not really a loop anyway. AFAICS continue is not supported in switch statements according to GLSL specs, so the loops generated for switch will never be repeated. Am I missing something? Even if repeating is possible somehow, at least we can get rid of the loops that are not repeated.
>
> I don't think that's true. I don't see anything in the spec that would lead me to believe continue cannot occur in a switch statement.

I've double-checked some versions of GLSL spec (1.30, 1.50, 3.30, 4.40) and all of them say the same (section 6.4 Jumps):

    The continue jump is used only in loops.

> In fact, we have some relatively complicated shaders that have a continue in a switch. See tests/shaders/glsl-fs-continue-in-switch-in-do-while.shader_test
Re: [Mesa-dev] r600/sb loop issue
On 12/06/2014 08:01 AM, Matt Turner wrote:
> On Fri, Dec 5, 2014 at 8:56 PM, Vadim Girlin vadimgir...@gmail.com wrote:
>> On 12/06/2014 07:50 AM, Matt Turner wrote:
>>> On Fri, Dec 5, 2014 at 8:13 PM, Vadim Girlin vadimgir...@gmail.com wrote:
>>>> I suspect we should rather get rid of such loops somehow, i.e. convert to something else, the loop that never repeats is not really a loop anyway. AFAICS continue is not supported in switch statements according to GLSL specs, so the loops generated for switch will never be repeated. Am I missing something? Even if repeating is possible somehow, at least we can get rid of the loops that are not repeated.
>>>
>>> I don't think that's true. I don't see anything in the spec that would lead me to believe continue cannot occur in a switch statement.
>>
>> I've double-checked some versions of GLSL spec (1.30, 1.50, 3.30, 4.40) and all of them say the same (section 6.4 Jumps):
>>
>>     The continue jump is used only in loops.
>
> Sure, but isn't the continue below in a loop?
>
>     do {
>         switch (...) {
>         case ...:
>             continue;
>         }
>     } while (...);

Ah, now I see, you're right. I just was mostly thinking about that loop that is created for a switch in IR, not about source, and somehow confused these things. Thanks for pointing that out. Hopefully such cases won't complicate the problem in sb even more, need to check those tests.

> The grammar is pretty unambiguous.
>
>     jump_statement:
>         CONTINUE SEMICOLON
>         BREAK SEMICOLON
>         RETURN SEMICOLON
>         RETURN expression SEMICOLON
>         DISCARD SEMICOLON // Fragment shader only.
>
> If continue can't be in a switch, neither can break. :)
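The point settled in this exchange (a continue inside a switch binds to the enclosing loop, while break only leaves the switch) can be checked with a small C++ analogue. This is only a sketch: it assumes GLSL follows the same rule as C++ here, which is what the quoted grammar and spec text suggest, and count_after_switch is a made-up name for illustration.

```cpp
#include <cassert>

// Hypothetical helper: counts how many iterations reach the statement
// placed after the switch inside a do-while loop.
int count_after_switch(int iterations) {
    int reached = 0;
    int i = 0;
    do {
        ++i;
        switch (i % 2) {
        case 1:
            continue;  // binds to the do-while: jumps to the loop condition
        default:
            break;     // binds to the switch: execution falls through to ++reached
        }
        ++reached;     // skipped whenever the case above executes continue
    } while (i < iterations);
    return reached;
}
```

With six iterations only the even ones reach the counter, so the loop really does repeat through the switch, which is the case sb has to handle.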
Re: [Mesa-dev] [PATCH 02/16] radeonsi: Initial geometry shader support
On Wed, 2014-01-29 at 07:13 +1000, Dave Airlie wrote:
>>> 3) In si_init_gs_rings:
>>> - could you please use readable decimal numbers for specifying the sizes? Like 1024 * 1024 * ...
>>> [...]
>>> - isn't 64 MB too many for a ring buffer?
>>
>> I can write the numbers any way you like. :) But I just copied them from the corresponding r600g patches; I don't know yet how these numbers were derived, or what the constraints are for the ring buffer sizes. I'm trying to find out more about this.
>
> I don't think they are derived from anything yet, they were just big numbers Vadim used,

IIRC all these magic numbers were taken from the fglrx command stream for some simple GS test on my 512MB juniper card.

Vadim

> I suppose we can calculate them from max vertices for the geom shader * number of outputs * size of each output.
>
> Dave.
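Dave's suggested formula and the request for readable size constants could look roughly like the sketch below. Everything here is hypothetical: the function and constant names are invented, and the thread does not say how many invocations or primitives in flight the ring has to cover, so the product is only the per-invocation term from the formula.

```cpp
#include <cassert>
#include <cstdint>

// Readable spelling of the 64 MB size questioned in the review
// (hypothetical constant name).
constexpr std::uint64_t kGsRingSize64MiB = 64ull * 1024 * 1024;

// Hypothetical helper implementing only the product Dave mentions:
// max vertices * number of outputs * size of each output.
std::uint64_t gs_ring_bytes_per_invocation(std::uint32_t max_vertices,
                                           std::uint32_t num_outputs,
                                           std::uint32_t bytes_per_output) {
    return static_cast<std::uint64_t>(max_vertices) * num_outputs * bytes_per_output;
}
```

For example, 3 vertices with 4 outputs of 16 bytes each (one vec4 of floats) give 192 bytes per invocation under this assumption.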
Re: [Mesa-dev] [PATCH 4/9] gallium-tgsi: add TGSI_OPCODE_{FMA-POPCNT-MSB-LSB} description
On Tue, 2014-01-07 at 21:49 +0100, Marek Olšák wrote:

FYI, Evergreen has dedicated instructions for both MAD and FMA. FMA seems to be available on DX11 chips only.

FWIW, not all evergreen chips support FMA, only high-end chips that support FP64 (I guess cypress only), according to the ISA docs:

    Instructions: FMA
    Description: Fused single-precision multiply-add. Only for double-precision parts.
        dst = src0 * src1 + src2

Vadim

Marek

On Tue, Jan 7, 2014 at 8:20 PM, Roland Scheidegger srol...@vmware.com wrote:

Yes that is certainly related. I'm actually not entirely sure what is allowed in glsl by default as OpenGL seems to have some lax rules regarding precision in any case (float calculations not required but allowed to use denorms, at least earlier versions weren't required to support Infs either and so on). It is quite possible the MAD we were always using would have been allowed to really do fma (at least with OpenGL), unless the precise qualifier was used (which isn't supported yet?). TGSI also isn't really watertight about such issues either (that is if you use it with hw such as r300 then you certainly don't expect ieee754 rules to be followed but if you've got a d3d10-capable backend then you are expected to follow rules specified there which are _mostly_ ieee754-2008). So I'm not really sure if TGSI MAD should be allowed to do either rounding or not, but someday it should be figured out and spelled out explicitly in docs.

Roland

On 07.01.2014 19:24, Maxence Le Doré wrote:

I forgot the link: http://www.geeks3d.com/20120106/precise-qualifier-in-glsl-and-nvidia-geforce-cards/

2014/1/7 Maxence Le Doré maxence.led...@gmail.com:

For this reason, GLSL 4.0 introduces the 'precise' qualifier.
I invite you to take a look at this article.

2014/1/6 Roland Scheidegger srol...@vmware.com:

On 05.01.2014 01:34, Maxence Le Doré wrote:

FMA(a,b,c) keeps extra precision (usually 1 more bit of mantissa, afaik) for the result a*b and adds this to c, to finally produce an IEEE754 32-bit float result. MAD(a,b,c) produces an IEEE754 32-bit float product a*b and adds it to c. So, FMA can be slightly more accurate, and that accuracy is very much appreciated.

Actually in newer languages (such as OpenCL) mad is used to indicate intermediate rounding does not matter, so if your cpu can do fma but not mul+add in a single cycle it is allowed to use fma instead. FMA OTOH of course forces no intermediate rounding. Our tgsi definitions certainly initially were meaning intermediate rounding should take place, I don't know if we need to keep it that way or could repurpose that slightly (so if you require the intermediate rounding you'd just use mul+add).

Roland

2014/1/5 Marek Olšák mar...@gmail.com:

How is FMA different from MAD? Please document the new opcodes in src/gallium/docs/source/tgsi.rst.
Marek On Sun, Jan 5, 2014 at 12:42 AM, Maxence Le Doré maxence.led...@gmail.com wrote: From: Maxence Le Doré Maxence Le Doré --- src/gallium/auxiliary/tgsi/tgsi_info.c | 16 src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h | 6 ++ src/gallium/include/pipe/p_shader_tokens.h | 9 - 3 files changed, 30 insertions(+), 1 deletion(-) diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c b/src/gallium/auxiliary/tgsi/tgsi_info.c index 0beef44..ed55940 100644 --- a/src/gallium/auxiliary/tgsi/tgsi_info.c +++ b/src/gallium/auxiliary/tgsi/tgsi_info.c @@ -221,6 +221,12 @@ static const struct tgsi_opcode_info opcode_info[TGSI_OPCODE_LAST] = { 1, 3, 1, 0, 0, 0, OTHR, TXL2, TGSI_OPCODE_TXL2 }, { 1, 2, 0, 0, 0, 0, COMP, IMUL_HI, TGSI_OPCODE_IMUL_HI }, { 1, 2, 0, 0, 0, 0, COMP, UMUL_HI, TGSI_OPCODE_UMUL_HI }, + { 1, 3, 0, 0, 0, 0, COMP, FMA, TGSI_OPCODE_FMA }, + { 1, 1, 0, 0, 0, 0, COMP, POPCNT, TGSI_OPCODE_POPCNT }, + { 1, 1, 0, 0, 0, 0, COMP, IMSB, TGSI_OPCODE_IMSB }, + { 1, 1, 0, 0, 0, 0, COMP, ILSB, TGSI_OPCODE_ILSB }, + { 1, 1, 0, 0, 0, 0, COMP, UMSB, TGSI_OPCODE_UMSB }, + { 1, 1, 0, 0, 0, 0, COMP, ULSB, TGSI_OPCODE_ULSB }, }; const struct tgsi_opcode_info * @@ -321,6 +327,11 @@ tgsi_opcode_infer_type( uint opcode ) case TGSI_OPCODE_IABS: case TGSI_OPCODE_ISSG: case TGSI_OPCODE_IMUL_HI: + case TGSI_OPCODE_POPCNT: + case TGSI_OPCODE_ILSB: + case TGSI_OPCODE_IMSB: + case TGSI_OPCODE_ULSB: + case TGSI_OPCODE_UMSB: return TGSI_TYPE_SIGNED; default: return TGSI_TYPE_FLOAT; @@ -344,9 +355,14 @@ tgsi_opcode_infer_src_type( uint opcode ) case TGSI_OPCODE_SAMPLE_I: case
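The FMA versus MAD question raised in this thread comes down to whether the product a*b is rounded to float before the addition. A minimal CPU-side sketch in C++ follows; it says nothing about what any particular GPU or the TGSI opcodes actually do, and mad_rounded and fma_fused are illustrative names, not Mesa functions.

```cpp
#include <cassert>
#include <cmath>

// Multiply-add with intermediate rounding: the product is stored to a
// volatile float, so it is rounded to single precision and cannot be
// contracted into a fused operation by the compiler.
float mad_rounded(float a, float b, float c) {
    volatile float product = a * b;
    return product + c;
}

// Fused multiply-add: std::fma rounds only once, after the addition.
float fma_fused(float a, float b, float c) {
    return std::fma(a, b, c);
}
```

With a = 1 + 2^-12 the exact square is 1 + 2^-11 + 2^-24, which does not fit in a float and rounds to 1 + 2^-11. Adding c = -(1 + 2^-11) therefore gives 0 with intermediate rounding, while the fused version keeps the 2^-24 term.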
Re: [Mesa-dev] [PATCH] r600g/sb: fix stack size computation on evergreen
On Mon, 2013-12-09 at 10:56 -0500, Tom Stellard wrote:
> On Sat, Dec 07, 2013 at 07:06:36PM +0400, Vadim Girlin wrote:
>> On evergreen we have to reserve 1 stack element in some additional cases besides the ones mentioned in the docs, but stack size computation was recently reimplemented exactly as described in the docs by the patch that added workarounds for stack issues on EG/CM, resulting in regressions with some apps (Serious Sam 3).
>>
>> This patch fixes it by restoring previous behavior.
>>
>> Fixes https://bugs.freedesktop.org/show_bug.cgi?id=72369
>>
>> Signed-off-by: Vadim Girlin vadimgir...@gmail.com
>> Cc: 10.0 mesa-sta...@lists.freedesktop.org
>> ---
>>  src/gallium/drivers/r600/sb/sb_bc_finalize.cpp | 16
>>  1 file changed, 12 insertions(+), 4 deletions(-)
>>
>> diff --git a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
>> index bc71cf8..355eb63 100644
>> --- a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
>> +++ b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
>> @@ -770,7 +770,6 @@ void bc_finalizer::update_ngpr(unsigned gpr) {
>>  unsigned bc_finalizer::get_stack_depth(node *n, unsigned &loops,
>>                                         unsigned &ifs, unsigned add) {
>>  	unsigned stack_elements = add;
>> -	bool has_non_wqm_push_with_loops_on_stack = false;
>>  	bool has_non_wqm_push = (add != 0);
>>  	region_node *r = n->is_region() ?
>>  			static_cast<region_node*>(n) : n->get_parent_region();
>> @@ -781,8 +780,6 @@ unsigned bc_finalizer::get_stack_depth(node *n, unsigned &loops,
>>  	while (r) {
>>  		if (r->is_loop()) {
>>  			++loops;
>> -			if (has_non_wqm_push)
>> -				has_non_wqm_push_with_loops_on_stack = true;
>>  		} else {
>>  			++ifs;
>>  			has_non_wqm_push = true;
>> @@ -795,15 +792,26 @@ unsigned bc_finalizer::get_stack_depth(node *n, unsigned &loops,
>>  	switch (ctx.hw_class) {
>>  	case HW_CLASS_R600:
>>  	case HW_CLASS_R700:
>> +		// If any non-WQM push is invoked, 2 elements should be reserved.
>>  		if (has_non_wqm_push)
>>  			stack_elements += 2;
>>  		break;
>>  	case HW_CLASS_CAYMAN:
>> +		// If any stack operation is invoked, 2 elements should be reserved
>>  		if (stack_elements)
>>  			stack_elements += 2;
>>  		break;
>>  	case HW_CLASS_EVERGREEN:
>> -		if (has_non_wqm_push_with_loops_on_stack)
>> +		// According to the docs we need to reserve 1 element for each of the
>> +		// following cases:
>> +		//   1) non-WQM push is used with WQM/LOOP frames on stack
>> +		//   2) ALU_ELSE_AFTER is used at the point of max stack usage
>> +		// NOTE:
>> +		// It was found that the conditions above are not sufficient, there are
>> +		// other cases where we also need to reserve stack space, that's why
>> +		// we always reserve 1 stack element if we have non-WQM push on stack.
>> +		// Condition 2 is ignored for now because we don't use this instruction.
>> +		if (has_non_wqm_push)
>>  			++stack_elements;
>
> The kernel analyzer reports a stack size of 2 for compute shaders that have 3 levels of ALU_PUSH_BEFORE. This would suggest that you either need to reserve 2 sub-entries (stack_elements in the sb code) when there is a non-wqm push, or apply the CAYMAN rules to EVERGREEN.
>
> It is possible, though, that the kernel analyzer is over-allocating and this patch is correct, but I don't have any evidence for this yet.

Is there any test that fails with this patch? AFAIK this algorithm worked fine for about 8 months in both old and sb backends, so I'd rather prefer to have any evidence that this is not correct before increasing stack allocation and reducing performance.

Vadim

> -Tom
>
>>  		break;
>>  	}
>> --
>> 1.8.4.2
[Mesa-dev] [PATCH] r600g/sb: fix stack size computation on evergreen
On evergreen we have to reserve 1 stack element in some additional cases besides the ones mentioned in the docs, but stack size computation was recently reimplemented exactly as described in the docs by the patch that added workarounds for stack issues on EG/CM, resulting in regressions with some apps (Serious Sam 3).

This patch fixes it by restoring previous behavior.

Fixes https://bugs.freedesktop.org/show_bug.cgi?id=72369

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
Cc: 10.0 mesa-sta...@lists.freedesktop.org
---
 src/gallium/drivers/r600/sb/sb_bc_finalize.cpp | 16
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
index bc71cf8..355eb63 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
@@ -770,7 +770,6 @@ void bc_finalizer::update_ngpr(unsigned gpr) {
 unsigned bc_finalizer::get_stack_depth(node *n, unsigned &loops,
                                        unsigned &ifs, unsigned add) {
 	unsigned stack_elements = add;
-	bool has_non_wqm_push_with_loops_on_stack = false;
 	bool has_non_wqm_push = (add != 0);
 	region_node *r = n->is_region() ?
 			static_cast<region_node*>(n) : n->get_parent_region();
@@ -781,8 +780,6 @@ unsigned bc_finalizer::get_stack_depth(node *n, unsigned &loops,
 	while (r) {
 		if (r->is_loop()) {
 			++loops;
-			if (has_non_wqm_push)
-				has_non_wqm_push_with_loops_on_stack = true;
 		} else {
 			++ifs;
 			has_non_wqm_push = true;
@@ -795,15 +792,26 @@ unsigned bc_finalizer::get_stack_depth(node *n, unsigned &loops,
 	switch (ctx.hw_class) {
 	case HW_CLASS_R600:
 	case HW_CLASS_R700:
+		// If any non-WQM push is invoked, 2 elements should be reserved.
 		if (has_non_wqm_push)
 			stack_elements += 2;
 		break;
 	case HW_CLASS_CAYMAN:
+		// If any stack operation is invoked, 2 elements should be reserved
 		if (stack_elements)
 			stack_elements += 2;
 		break;
 	case HW_CLASS_EVERGREEN:
-		if (has_non_wqm_push_with_loops_on_stack)
+		// According to the docs we need to reserve 1 element for each of the
+		// following cases:
+		//   1) non-WQM push is used with WQM/LOOP frames on stack
+		//   2) ALU_ELSE_AFTER is used at the point of max stack usage
+		// NOTE:
+		// It was found that the conditions above are not sufficient, there are
+		// other cases where we also need to reserve stack space, that's why
+		// we always reserve 1 stack element if we have non-WQM push on stack.
+		// Condition 2 is ignored for now because we don't use this instruction.
+		if (has_non_wqm_push)
 			++stack_elements;
 		break;
 	}
--
1.8.4.2
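The per-chip reservation rules in the last hunk of the patch above can be isolated into a small standalone function for experimenting. This is a sketch, not the Mesa code: HwClass and reserve_stack_elements are invented names, and the function takes stack_elements and has_non_wqm_push as inputs instead of deriving them from the region tree as get_stack_depth() does.

```cpp
#include <cassert>

// Stand-in for the hw_class values used in the patch (hypothetical enum).
enum class HwClass { R600, R700, Evergreen, Cayman };

// Applies only the extra-reservation step shown in the patch's switch.
unsigned reserve_stack_elements(HwClass hw, unsigned stack_elements,
                                bool has_non_wqm_push) {
    switch (hw) {
    case HwClass::R600:
    case HwClass::R700:
        // Any non-WQM push reserves 2 elements.
        if (has_non_wqm_push)
            stack_elements += 2;
        break;
    case HwClass::Cayman:
        // Any stack operation at all reserves 2 elements.
        if (stack_elements)
            stack_elements += 2;
        break;
    case HwClass::Evergreen:
        // Restored behavior: always reserve 1 element when a non-WQM push
        // is on the stack, not only when loops are also on the stack.
        if (has_non_wqm_push)
            ++stack_elements;
        break;
    }
    return stack_elements;
}
```

Under these rules an Evergreen shader with 3 elements and a non-WQM push ends up with 4, whereas applying the Cayman rule to the same input, as raised in the review, would give 5.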
[Mesa-dev] [PATCH] r600g/sb: fix value::is_fixed()
---

cc: Andreas Boll andreas.boll@gmail.com

Andreas, this patch should fix the issue with SB on RV770 that you reported on IRC (assert with interpolation-mixed.shader_test). There are no piglit regressions with this patch on my evergreen, but I can't test with r700 or any other chips.

 src/gallium/drivers/r600/sb/sb_valtable.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_valtable.cpp b/src/gallium/drivers/r600/sb/sb_valtable.cpp
index 00aee66..0d39e9c 100644
--- a/src/gallium/drivers/r600/sb/sb_valtable.cpp
+++ b/src/gallium/drivers/r600/sb/sb_valtable.cpp
@@ -255,8 +255,8 @@ void value::set_prealloc() {
 bool value::is_fixed() {
 	if (array && array->gpr)
 		return true;
-	if (chunk)
-		return chunk->is_fixed();
+	if (chunk && chunk->is_fixed())
+		return true;
 	return flags & VLF_FIXED;
 }
--
1.8.3.1

___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
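The behavioral difference in the is_fixed() change is easiest to see with a value that belongs to a non-fixed chunk but carries the fixed flag itself. The sketch below uses simplified stand-in types (Chunk, Array, Value) and an arbitrary bit for VLF_FIXED; none of this is the real sb data model.

```cpp
#include <cassert>

constexpr unsigned VLF_FIXED = 1u << 0;  // bit position is an assumption

struct Chunk {
    bool fixed;
    bool is_fixed() const { return fixed; }
};

struct Array {
    unsigned gpr;  // non-zero once a register has been assigned (assumption)
};

struct Value {
    const Array* array = nullptr;
    const Chunk* chunk = nullptr;
    unsigned flags = 0;

    // Before the patch: a value inside any chunk defers entirely to the chunk.
    bool is_fixed_old() const {
        if (array && array->gpr)
            return true;
        if (chunk)
            return chunk->is_fixed();
        return (flags & VLF_FIXED) != 0;
    }

    // After the patch: a non-fixed chunk no longer hides the value's own flag.
    bool is_fixed_new() const {
        if (array && array->gpr)
            return true;
        if (chunk && chunk->is_fixed())
            return true;
        return (flags & VLF_FIXED) != 0;
    }
};
```

For a value with VLF_FIXED set and a chunk that is not fixed, the old version returns false and the new one returns true; all other combinations agree.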
Re: [Mesa-dev] [PATCH] r600g/sb: Initialize shader::dce_flags.
On 10/19/2013 06:18 AM, Vinson Lee wrote:
> Fixes Uninitialized scalar field defect reported by Coverity.
>
> Signed-off-by: Vinson Lee v...@freedesktop.org

Reviewed-by: Vadim Girlin vadimgir...@gmail.com

> ---
>  src/gallium/drivers/r600/sb/sb_shader.cpp | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/src/gallium/drivers/r600/sb/sb_shader.cpp b/src/gallium/drivers/r600/sb/sb_shader.cpp
> index 98e52b1..38617a8 100644
> --- a/src/gallium/drivers/r600/sb/sb_shader.cpp
> +++ b/src/gallium/drivers/r600/sb/sb_shader.cpp
> @@ -39,7 +39,8 @@ shader::shader(sb_context &sctx, shader_target t, unsigned id)
>  	coal(*this), bbs(),
>  	target(t), vt(ex), ex(*this), root(),
>  	compute_interferences(),
> -	has_alu_predication(), uses_gradients(), safe_math(), ngpr(), nstack() {}
> +	has_alu_predication(),
> +	uses_gradients(), safe_math(), ngpr(), nstack(), dce_flags() {}
>
>  bool shader::assign_slot(alu_node* n, alu_node *slots[5]) {
[Mesa-dev] [PATCH] r600g/sb: fix issue with DCE between GVN and GCM (v2)
We can't perform DCE using the liveness pass between GVN and GCM because it relies on the correct schedule, but GVN doesn't care about preserving correctness - it's rescheduled later by GCM. This patch makes dce_cleanup pass perform simple DCE between GVN and GCM instead of relying on liveness pass. Fixes https://bugs.freedesktop.org/show_bug.cgi?id=70088 Signed-off-by: Vadim Girlin vadimgir...@gmail.com --- src/gallium/drivers/r600/sb/sb_core.cpp| 10 -- src/gallium/drivers/r600/sb/sb_dce_cleanup.cpp | 22 ++ src/gallium/drivers/r600/sb/sb_pass.h | 7 +-- src/gallium/drivers/r600/sb/sb_shader.h| 12 4 files changed, 39 insertions(+), 12 deletions(-) diff --git a/src/gallium/drivers/r600/sb/sb_core.cpp b/src/gallium/drivers/r600/sb/sb_core.cpp index b5dd88e..9fd9d9a 100644 --- a/src/gallium/drivers/r600/sb/sb_core.cpp +++ b/src/gallium/drivers/r600/sb/sb_core.cpp @@ -184,6 +184,8 @@ int r600_sb_bytecode_process(struct r600_context *rctx, SB_RUN_PASS(psi_ops,1); SB_RUN_PASS(liveness, 0); + + sh-dce_flags = DF_REMOVE_DEAD | DF_EXPAND; SB_RUN_PASS(dce_cleanup,0); SB_RUN_PASS(def_use,0); @@ -201,9 +203,10 @@ int r600_sb_bytecode_process(struct r600_context *rctx, SB_RUN_PASS(gvn,1); - SB_RUN_PASS(liveness, 0); + SB_RUN_PASS(def_use,1); + + sh-dce_flags = DF_REMOVE_DEAD | DF_REMOVE_UNUSED; SB_RUN_PASS(dce_cleanup,1); - SB_RUN_PASS(def_use,0); SB_RUN_PASS(ra_split, 0); SB_RUN_PASS(def_use,0); @@ -217,6 +220,9 @@ int r600_sb_bytecode_process(struct r600_context *rctx, sh-compute_interferences = true; SB_RUN_PASS(liveness, 0); + sh-dce_flags = DF_REMOVE_DEAD; + SB_RUN_PASS(dce_cleanup,1); + SB_RUN_PASS(ra_coalesce,1); SB_RUN_PASS(ra_init,1); diff --git a/src/gallium/drivers/r600/sb/sb_dce_cleanup.cpp b/src/gallium/drivers/r600/sb/sb_dce_cleanup.cpp index f879395..79aef91 100644 --- a/src/gallium/drivers/r600/sb/sb_dce_cleanup.cpp +++ b/src/gallium/drivers/r600/sb/sb_dce_cleanup.cpp @@ -56,7 +56,8 @@ bool dce_cleanup::visit(cf_node n, bool enter) { else cleanup_dst(n); } else { 
- if (n.bc.op_ptr-flags (CF_CLAUSE | CF_BRANCH | CF_LOOP)) + if ((sh.dce_flags DF_EXPAND) + (n.bc.op_ptr-flags (CF_CLAUSE | CF_BRANCH | CF_LOOP))) n.expand(); } return true; @@ -107,19 +108,20 @@ bool dce_cleanup::visit(region_node n, bool enter) { } void dce_cleanup::cleanup_dst(node n) { - cleanup_dst_vec(n.dst); + if (!cleanup_dst_vec(n.dst) remove_unused + !n.dst.empty() !(n.flags NF_DONT_KILL) n.parent) + n.remove(); } bool dce_cleanup::visit(container_node n, bool enter) { - if (enter) { + if (enter) cleanup_dst(n); - } else { - - } return true; } -void dce_cleanup::cleanup_dst_vec(vvec vv) { +bool dce_cleanup::cleanup_dst_vec(vvec vv) { + bool alive = false; + for (vvec::iterator I = vv.begin(), E = vv.end(); I != E; ++I) { value* v = *I; if (!v) @@ -128,9 +130,13 @@ void dce_cleanup::cleanup_dst_vec(vvec vv) { if (v-gvn_source v-gvn_source-is_dead()) v-gvn_source = NULL; - if (v-is_dead()) + if (v-is_dead() || (remove_unused !v-is_rel() !v-uses)) v = NULL; + else + alive = true; } + + return alive; } } // namespace r600_sb diff --git a/src/gallium/drivers/r600/sb/sb_pass.h b/src/gallium/drivers/r600/sb/sb_pass.h index 95d2a20..a3f8515 100644 --- a/src/gallium/drivers/r600/sb/sb_pass.h +++ b/src/gallium/drivers/r600/sb/sb_pass.h @@ -119,9 +119,12 @@ public: class dce_cleanup : public vpass { using vpass::visit; + bool remove_unused; + public: - dce_cleanup(shader s) : vpass(s) {} + dce_cleanup(shader s) : vpass(s), + remove_unused(s.dce_flags DF_REMOVE_UNUSED) {} virtual bool visit(node n, bool enter); virtual bool visit(alu_group_node n, bool enter); @@ -135,7 +138,7 @@ public: private: void cleanup_dst(node n); - void cleanup_dst_vec(vvec vv); + bool cleanup_dst_vec(vvec vv); }; diff --git a/src/gallium/drivers/r600/sb/sb_shader.h b/src/gallium/drivers/r600/sb/sb_shader.h index e515d31..7955bba 100644 --- a/src/gallium/drivers/r600/sb/sb_shader.h +++ b/src/gallium/drivers/r600/sb
[Mesa-dev] [PATCH] r600g: fix tgsi_op2_s with trans-only instructions
This fixes the issue when dst and src is the same reg and operation on one channel overwrites the source for other channels, e.g.:

    UMUL TEMP[2].xyz, TEMP[0].xyzz, TEMP[2].

In this example the result of the operation on channel x is written in TEMP[2].x and then used as a second source operand for channels y and z instead of original value in TEMP[2].x.

This patch stores the results in temp reg and moves them to dst after performing operation on all channels.

Fixes https://bugs.freedesktop.org/show_bug.cgi?id=70327

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---
 src/gallium/drivers/r600/r600_shader.c | 36 +-
 1 file changed, 31 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index d17d670..aed2100 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -1638,15 +1638,22 @@ static int tgsi_op2_s(struct r600_shader_ctx *ctx, int swap, int trans_only)
 {
 	struct tgsi_full_instruction *inst = &ctx->parse.FullToken.FullInstruction;
 	struct r600_bytecode_alu alu;
-	int i, j, r;
-	int lasti = tgsi_last_instruction(inst->Dst[0].Register.WriteMask);
+	unsigned write_mask = inst->Dst[0].Register.WriteMask;
+	int i, j, r, lasti = tgsi_last_instruction(write_mask);
+	/* use temp register if trans_only and more than one dst component */
+	int use_tmp = trans_only && (write_mask ^ (1 << lasti));

-	for (i = 0; i < lasti + 1; i++) {
-		if (!(inst->Dst[0].Register.WriteMask & (1 << i)))
+	for (i = 0; i <= lasti; i++) {
+		if (!(write_mask & (1 << i)))
 			continue;

 		memset(&alu, 0, sizeof(struct r600_bytecode_alu));
-		tgsi_dst(ctx, &inst->Dst[0], i, &alu.dst);
+		if (use_tmp) {
+			alu.dst.sel = ctx->temp_reg;
+			alu.dst.chan = i;
+			alu.dst.write = 1;
+		} else
+			tgsi_dst(ctx, &inst->Dst[0], i, &alu.dst);

 		alu.op = ctx->inst_info->op;
 		if (!swap) {
@@ -1675,6 +1682,25 @@ static int tgsi_op2_s(struct r600_shader_ctx *ctx, int swap, int trans_only)
 		if (r)
 			return r;
 	}
+
+	if (use_tmp) {
+		/* move result from temp to dst */
+		for (i = 0; i <= lasti; i++) {
+			if (!(write_mask & (1 << i)))
+				continue;
+
+			memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+			alu.op = ALU_OP1_MOV;
+			tgsi_dst(ctx, &inst->Dst[0], i, &alu.dst);
+			alu.src[0].sel = ctx->temp_reg;
+			alu.src[0].chan = i;
+			alu.last = (i == lasti);
+
+			r = r600_bytecode_add_alu(ctx->bc, &alu);
+			if (r)
+				return r;
+		}
+	}
 	return 0;
 }
--
1.8.3.1
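The hazard described in the commit message above (one channel's result clobbering a source that later channels still read) can be reproduced on the CPU with plain arrays. The sketch below is an illustration only: Vec4, umul_direct and umul_via_temp are invented names, and it assumes the second operand reads channel x for every destination channel, which is how the commit message describes the example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec4 = std::array<std::uint32_t, 4>;

// Channel-by-channel multiply written straight into dst. When dst and src1
// are the same register, the write to dst[0] changes what later channels read.
void umul_direct(Vec4& dst, const Vec4& src0, const Vec4& src1) {
    for (int c = 0; c < 3; ++c)
        dst[c] = src0[c] * src1[0];
}

// Same operation, but results go to a temporary first and are moved to dst
// only after all channels are computed, mirroring the use_tmp path of the patch.
void umul_via_temp(Vec4& dst, const Vec4& src0, const Vec4& src1) {
    Vec4 tmp{};
    for (int c = 0; c < 3; ++c)
        tmp[c] = src0[c] * src1[0];
    for (int c = 0; c < 3; ++c)
        dst[c] = tmp[c];
}
```

With TEMP[0] = (2, 3, 4, 0) and TEMP[2].x = 5, the direct version yields (10, 30, 40) because y and z multiply by the already overwritten x, while the temp version yields the intended (10, 15, 20).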
[Mesa-dev] [PATCH] radeonsi: pass alpha_ref value to PS in the user sgpr
Currently it's hardcoded in the shader, so every change requires compilation of the shader variant, killing the performance in Serious Sam 3 and probably other apps.

This patch passes alpha_ref in the user sgpr and removes it from the shader key.

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---
 src/gallium/drivers/radeonsi/radeonsi_shader.c | 8 --
 src/gallium/drivers/radeonsi/radeonsi_shader.h | 39 +-
 src/gallium/drivers/radeonsi/si_state.c        | 7 ++---
 3 files changed, 29 insertions(+), 25 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/radeonsi_shader.c b/src/gallium/drivers/radeonsi/radeonsi_shader.c
index 97ed4e3..5279bb0 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_shader.c
+++ b/src/gallium/drivers/radeonsi/radeonsi_shader.c
@@ -570,11 +570,14 @@ static void si_alpha_test(struct lp_build_tgsi_context *bld_base,
     if (si_shader_ctx->shader->key.ps.alpha_func != PIPE_FUNC_NEVER) {
         LLVMValueRef out_ptr = si_shader_ctx->radeon_bld.soa.outputs[index][3];
+        LLVMValueRef alpha_ref = LLVMGetParam(si_shader_ctx->radeon_bld.main_fn,
+                SI_PARAM_ALPHA_REF);
+
         LLVMValueRef alpha_pass =
             lp_build_cmp(&bld_base->base,
                      si_shader_ctx->shader->key.ps.alpha_func,
                      LLVMBuildLoad(gallivm->builder, out_ptr, ""),
-                     lp_build_const_float(gallivm, si_shader_ctx->shader->key.ps.alpha_ref));
+                     alpha_ref);
         LLVMValueRef arg =
             lp_build_select(&bld_base->base,
                     alpha_pass,
@@ -1569,7 +1572,7 @@ static void create_function(struct si_shader_context *si_shader_ctx)
 {
     struct lp_build_tgsi_context *bld_base = &si_shader_ctx->radeon_bld.soa.bld_base;
     struct gallivm_state *gallivm = bld_base->base.gallivm;
-    LLVMTypeRef params[20], f32, i8, i32, v2i32, v3i32;
+    LLVMTypeRef params[21], f32, i8, i32, v2i32, v3i32;
     unsigned i, last_sgpr, num_params;

     i8 = LLVMInt8TypeInContext(gallivm->context);
@@ -1614,6 +1617,7 @@ static void create_function(struct si_shader_context *si_shader_ctx)
         break;

     case TGSI_PROCESSOR_FRAGMENT:
+        params[SI_PARAM_ALPHA_REF] = f32;
         params[SI_PARAM_PRIM_MASK] = i32;
         last_sgpr = SI_PARAM_PRIM_MASK;
         params[SI_PARAM_PERSP_SAMPLE] = v2i32;
diff --git a/src/gallium/drivers/radeonsi/radeonsi_shader.h b/src/gallium/drivers/radeonsi/radeonsi_shader.h
index 1db8bb8..c9e851a 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_shader.h
+++ b/src/gallium/drivers/radeonsi/radeonsi_shader.h
@@ -37,9 +37,10 @@
 #define SI_SGPR_VERTEX_BUFFER   6 /* VS only */
 #define SI_SGPR_SO_BUFFER       8 /* VS only, stream-out */
 #define SI_SGPR_START_INSTANCE  10 /* VS only */
+#define SI_SGPR_ALPHA_REF       6 /* PS only */

 #define SI_VS_NUM_USER_SGPR     11
-#define SI_PS_NUM_USER_SGPR     6
+#define SI_PS_NUM_USER_SGPR     7

 /* LLVM function parameter indices */
 #define SI_PARAM_CONST          0
@@ -53,23 +54,24 @@
 /* the other VS parameters are assigned dynamically */

 /* PS only parameters */
-#define SI_PARAM_PRIM_MASK          3
-#define SI_PARAM_PERSP_SAMPLE       4
-#define SI_PARAM_PERSP_CENTER       5
-#define SI_PARAM_PERSP_CENTROID     6
-#define SI_PARAM_PERSP_PULL_MODEL   7
-#define SI_PARAM_LINEAR_SAMPLE      8
-#define SI_PARAM_LINEAR_CENTER      9
-#define SI_PARAM_LINEAR_CENTROID    10
-#define SI_PARAM_LINE_STIPPLE_TEX   11
-#define SI_PARAM_POS_X_FLOAT        12
-#define SI_PARAM_POS_Y_FLOAT        13
-#define SI_PARAM_POS_Z_FLOAT        14
-#define SI_PARAM_POS_W_FLOAT        15
-#define SI_PARAM_FRONT_FACE         16
-#define SI_PARAM_ANCILLARY          17
-#define SI_PARAM_SAMPLE_COVERAGE    18
-#define SI_PARAM_POS_FIXED_PT       19
+#define SI_PARAM_ALPHA_REF          3
+#define SI_PARAM_PRIM_MASK          4
+#define SI_PARAM_PERSP_SAMPLE       5
+#define SI_PARAM_PERSP_CENTER       6
+#define SI_PARAM_PERSP_CENTROID     7
+#define SI_PARAM_PERSP_PULL_MODEL   8
+#define SI_PARAM_LINEAR_SAMPLE      9
+#define SI_PARAM_LINEAR_CENTER      10
+#define SI_PARAM_LINEAR_CENTROID    11
+#define SI_PARAM_LINE_STIPPLE_TEX   12
+#define SI_PARAM_POS_X_FLOAT        13
+#define SI_PARAM_POS_Y_FLOAT        14
+#define SI_PARAM_POS_Z_FLOAT        15
+#define SI_PARAM_POS_W_FLOAT        16
+#define SI_PARAM_FRONT_FACE         17
+#define SI_PARAM_ANCILLARY          18
+#define SI_PARAM_SAMPLE_COVERAGE    19
+#define SI_PARAM_POS_FIXED_PT       20

 struct si_shader_io {
     unsigned        name;
@@ -124,7 +126,6 @@ union si_shader_key {
             unsigned    alpha_func:3
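
To illustrate why dropping alpha_ref from the shader key matters, here is a minimal sketch of a variant cache. It is not the radeonsi implementation: `ps_key`, `get_variant`, `set_alpha_state`, `num_compiles` and `user_sgpr_alpha_ref` are hypothetical names invented for this example, and "compilation" is only simulated by a counter. The point is that the key holds only the compare function, while the reference value is written to a separate slot that stands in for the user SGPR.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the pixel-shader part of the key.
 * Only state that changes the generated code belongs here. */
struct ps_key {
    unsigned alpha_func; /* compare function, still baked into the shader */
};

struct ps_variant {
    struct ps_key key;
    int in_use;
};

#define MAX_VARIANTS 8

static struct ps_variant variants[MAX_VARIANTS];
static unsigned num_compiles;      /* counts simulated compilations */
static float user_sgpr_alpha_ref;  /* stands in for the user SGPR slot */

/* Look up a variant by key; "compile" a new one only on a cache miss. */
static struct ps_variant *get_variant(const struct ps_key *key)
{
    assert(key != NULL);

    for (int i = 0; i < MAX_VARIANTS; i++) {
        if (variants[i].in_use &&
            variants[i].key.alpha_func == key->alpha_func)
            return &variants[i];
    }
    for (int i = 0; i < MAX_VARIANTS; i++) {
        if (!variants[i].in_use) {
            variants[i].key = *key;
            variants[i].in_use = 1;
            num_compiles++;
            return &variants[i];
        }
    }
    return NULL; /* cache full; a real driver would evict or grow */
}

/* Alpha-test state change: the reference value only updates the SGPR slot,
 * so it can never cause a cache miss on its own. */
static struct ps_variant *set_alpha_state(unsigned func, float ref)
{
    struct ps_key key = { func };

    user_sgpr_alpha_ref = ref;
    return get_variant(&key);
}
```

Calling `set_alpha_state` repeatedly with the same function and different reference values leaves `num_compiles` at 1. If the reference value were part of `ps_key`, each new value would add a variant.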
Re: [Mesa-dev] [PATCH] radeonsi: pass alpha_ref value to PS in the user sgpr
On 10/10/2013 02:11 PM, Michel Dänzer wrote:
> On Thu, 2013-10-10 at 12:49 +0400, Vadim Girlin wrote:
>> Currently it's hardcoded in the shader, so every change requires compilation of the shader variant, killing the performance in Serious Sam 3 and probably other apps.
>>
>> This patch passes alpha_ref in the user sgpr and removes it from the shader key.
>>
>> Signed-off-by: Vadim Girlin vadimgir...@gmail.com
>
> Reviewed-by: Michel Dänzer michel.daen...@amd.com
>
> I presume this causes no regressions with piglit quick.tests.

Yes, there are no regressions with piglit. Thanks for reviewing.

By the way, I'm also not sure if this is the right way of doing it, especially if we need to pass more parameters for new features. Other approaches might be preferable, e.g. putting it, together with any other data that we may need in the future, into an internal const buffer (like we do in r600g for clip planes etc.). Or maybe there are other ways on SI that I'm not aware of yet?

Vadim
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radeonsi: pass alpha_ref value to PS in the user sgpr
On 10/10/2013 08:10 PM, Christian König wrote:
> On 10.10.2013 18:02, Vadim Girlin wrote:
>> On 10/10/2013 02:11 PM, Michel Dänzer wrote:
>>> On Thu, 2013-10-10 at 12:49 +0400, Vadim Girlin wrote:
>>>> Currently it's hardcoded in the shader, so every change requires compilation of the shader variant, killing the performance in Serious Sam 3 and probably other apps.
>>>>
>>>> This patch passes alpha_ref in the user sgpr and removes it from the shader key.
>>>>
>>>> Signed-off-by: Vadim Girlin vadimgir...@gmail.com
>>>
>>> Reviewed-by: Michel Dänzer michel.daen...@amd.com
>>>
>>> I presume this causes no regressions with piglit quick.tests.
>>
>> Yes, there are no regressions with piglit. Thanks for reviewing.
>>
>> By the way, I'm also not sure if this is the right way of doing it, especially if we need to pass more parameters for new features. Other approaches might be preferable, e.g. putting it, together with any other data that we may need in the future, into an internal const buffer (like we do in r600g for clip planes etc.). Or maybe there are other ways on SI that I'm not aware of yet?
>
> That strongly depends on how often we use a parameter. The docs speak of a penalty associated with loading each SGPR, so we should try to use as few as possible, but loading something from constant space is also costly without proper support for the constant IB.

By the way, AFAICS some SGPR inputs are often not used at all in the shaders. I guess we might want to use a per-shader mapping of parameters to SGPRs, so that we only load the values that each shader actually uses. The compiler would need to pack the required input SGPRs into the lowest indices and provide the parameter-to-SGPR map to the driver. OTOH this would slightly increase the amount of work in the driver, so I'm not really sure yet whether it's worth it; it looks like we already have a pretty significant overhead.

Vadim
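
The per-shader mapping idea from the message above can be sketched as follows. This is only an illustration under assumptions: `NUM_PARAMS`, `struct sgpr_map` and `pack_sgprs` are hypothetical names, and the "used" information is assumed to arrive from the compiler as a simple bitmask. Nothing here reflects an existing radeonsi interface.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PARAMS  8     /* hypothetical number of logical shader inputs */
#define SGPR_UNUSED (-1)

struct sgpr_map {
    int slot[NUM_PARAMS]; /* logical parameter -> SGPR index, or SGPR_UNUSED */
    unsigned num_sgprs;   /* how many SGPRs the driver has to load */
};

/* Pack the parameters marked in used_mask into the lowest SGPR indices,
 * keeping their relative order, and report the resulting map. */
static struct sgpr_map pack_sgprs(uint32_t used_mask)
{
    struct sgpr_map map;
    unsigned next = 0;

    assert(used_mask < (1u << NUM_PARAMS));

    for (unsigned p = 0; p < NUM_PARAMS; p++) {
        if (used_mask & (1u << p))
            map.slot[p] = (int)next++;
        else
            map.slot[p] = SGPR_UNUSED;
    }
    map.num_sgprs = next;
    return map;
}
```

With such a map the driver would upload only `num_sgprs` values per shader, at the cost of consulting the map on every state update. That cost is the extra driver work mentioned in the message.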
Re: [Mesa-dev] [PATCH] r600g/sb: Move variable dereference after null check.
On 09/28/2013 10:08 AM, Vinson Lee wrote:
> Fixes "Dereference before null check" defect reported by Coverity.
>
> Signed-off-by: Vinson Lee v...@freedesktop.org

Reviewed-by: Vadim Girlin vadimgir...@gmail.com

> ---
>  src/gallium/drivers/r600/sb/sb_ra_init.cpp | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/src/gallium/drivers/r600/sb/sb_ra_init.cpp b/src/gallium/drivers/r600/sb/sb_ra_init.cpp
> index 0b332a9..e53aba5 100644
> --- a/src/gallium/drivers/r600/sb/sb_ra_init.cpp
> +++ b/src/gallium/drivers/r600/sb/sb_ra_init.cpp
> @@ -395,11 +395,12 @@ void ra_init::color_bs_constraint(ra_constraint* c) {
>      for (vvec::iterator I = vv.begin(), E = vv.end(); I != E; ++I) {
>          value *v = *I;
> -        sel_chan gpr = v->get_final_gpr();
>
>          if (!v || v->is_dead())
>              continue;
>
> +        sel_chan gpr = v->get_final_gpr();
> +
>          val_set interf;
>
>          if (v->chunk)
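
The pattern behind this fix can be shown in isolation. The sketch below is plain C rather than the C++ of sb_ra_init.cpp, and `struct value` and `sum_live_gprs` are hypothetical stand-ins, not the sb types. It only demonstrates the ordering the patch establishes: test the pointer first, read through it afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the sb "value" class. */
struct value {
    int dead;
    int final_gpr;
};

/* Sums the final GPR of every live value, skipping NULL entries. */
static int sum_live_gprs(struct value *const *vals, size_t n)
{
    int sum = 0;

    assert(vals != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        const struct value *v = vals[i];

        /* Check first: reading v->final_gpr before this test would be
         * undefined behaviour whenever v is NULL. */
        if (!v || v->dead)
            continue;

        int gpr = v->final_gpr; /* safe: v is known to be non-NULL here */
        sum += gpr;
    }
    return sum;
}
```

Besides avoiding the crash, moving the read below the check matters because a compiler may assume a pointer that has already been dereferenced is non-NULL and drop the later test.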
Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs
On 08/28/2013 01:15 PM, Christian König wrote:
> Well, for this discussion let's just assume that we fixed the delay in the upper layers of the stack and the driver sees the shader code as soon as the application (if I understood it correctly Vadim has just volunteered for the job).

No, I'm not really volunteering to implement that. :) I'm not even sure if it's possible in reasonable time. In fact it was more like a theoretical discussion about what would be required for early compilation in the driver to make sense.

Perhaps I failed to explain it, but my point is that as long as the compilation is deferred in the upper layers and nobody is going to change this (if it's possible at all), it doesn't make sense to try compiling early in the driver. I think we might prefer to defer the compilation in the driver as well. It doesn't make the overall situation any worse, but it can make it better by at least not compiling unused variants.

Vadim

> Also let's assume that shaders are small and having a lot of shader variants around after they are compiled isn't bad.
>
> In this case probably the best solution is to compile early and try to make the shaders as state invariant as possible, e.g. don't do optimizations like getting rid of extra exports for the case where we don't need the alpha test, or if it's just a dependency on a boolean then have both variants covered by the bytecode and use a bit constant to choose between the two etc...
>
> As a second step the driver should create an optimized version of the shader in a background thread when we know all the state that is/was active when the shader is used. Of course you need a bit of a heuristic for this, because sometimes it is better to switch between shader variants and other times it is better to have one variant covering all the different states and just use bit constants to choose between them.
>
> Just some thoughts on this topic,
> Christian.
>
> PS: My mail server is once more driving me nuts, please ignore the extra copy if you get this mail twice.
>
> On 28.08.2013 02:07, Vadim Girlin wrote:
>> On 08/28/2013 02:59 AM, Marek Olšák wrote:
>>> First, you won't really see any significant continual difference in frame rate no matter how many shader variants you have unless you are very CPU-bound. The problem is shader compilation on the first use, that's where you get a big hiccup. Try Skyrim for example: You have to first look around and see every object that's around you and get unpleasant stuttering before you can actually go on and play the game. Yes, this is also Wine's fault that it compiles shaders on the first use too, but we don't have to be as bad as Wine, do we? Valve also reported shader recompilations on the first use being a serious issue with open source drivers.
>>
>> I perfectly understand that deferred compilation is exactly the problem that makes the games freeze due to shader compilation on first use when something new appears on the screen, but I don't think we can solve this problem in the *driver* by trying to compile early, because AFAICS currently the shaders are passed to the driver too late anyway, and this happens not only with wine.
>>
>> E.g. when I run Heaven in a window with MESA_GLSL=dump R600_DEBUG=ps,vs, so that I can see Heaven's window and console output at the same time, what I see is that most of the GL dumps happen while Heaven shows the splash screen with loading progress, but most of the driver's dumps appear on the first frame and a few more times during the benchmark. It looks like compilation is deferred somewhere in the stack before the driver, or am I missing something?
>>
>> Vadim
>>
>>> Marek
>>>
>>> On Tue, Aug 27, 2013 at 11:52 PM, Vadim Girlin vadimgir...@gmail.com wrote:
>>>> On 08/28/2013 12:43 AM, Marek Olšák wrote:
>>>>> Shader variants are BAD, BAD, BAD. Have you ever played an AAA game with a Mesa driver that likes to compile shader variants on first use? It's HORRIBLE.
[Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs
We need to export at least one color if the shader writes it, even when nr_cbufs==0.

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---

Tested on evergreen with multiple combinations of backends - no regressions, fixes some tests:
  default    - fixes fbo-alphatest-nocolor and fbo-alphatest-nocolor-ff
  default+sb - fixes fbo-alphatest-nocolor and fbo-alphatest-nocolor-ff
  llvm       - fixes about 25 tests related to depth/stencil
  llvm+sb    - fixes about 300 tests (llvm's depth/stencil issues and regressions caused by reordering of exports in sb)

With this patch, there are no regressions with default+sb vs default.

There is one regression with llvm+sb vs llvm - fs-texturegrad-miplevels. AFAICS it's a problem with the llvm backend uncovered by sb: SET_GRADIENTS_V/H instructions are not placed in the same TEX clause with the corresponding SAMPLE_G.

 src/gallium/drivers/r600/r600_shader.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index 300b5c4..f7eab76 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -918,6 +918,7 @@ static int r600_shader_from_tgsi(struct r600_screen *rscreen,
     unsigned opcode;
     int i, j, k, r = 0;
     int next_pos_base = 60, next_param_base = 0;
+    int max_color_exports = MAX2(key.nr_cbufs, 1);
     /* Declarations used by llvm code */
     bool use_llvm = false;
     bool indirect_gprs;
@@ -1130,7 +1131,7 @@ static int r600_shader_from_tgsi(struct r600_screen *rscreen,
         radeon_llvm_ctx.face_gpr = ctx.face_gpr;
         radeon_llvm_ctx.r600_inputs = ctx.shader->input;
         radeon_llvm_ctx.r600_outputs = ctx.shader->output;
-        radeon_llvm_ctx.color_buffer_count = MAX2(key.nr_cbufs , 1);
+        radeon_llvm_ctx.color_buffer_count = max_color_exports;
         radeon_llvm_ctx.chip_class = ctx.bc->chip_class;
         radeon_llvm_ctx.fs_color_all = shader->fs_write_all && (rscreen->chip_class >= EVERGREEN);
         radeon_llvm_ctx.stream_outputs = &so;
@@ -1440,7 +1441,7 @@ static int r600_shader_from_tgsi(struct r600_screen *rscreen,
         case TGSI_PROCESSOR_FRAGMENT:
             if (shader->output[i].name == TGSI_SEMANTIC_COLOR) {
                 /* never export more colors than the number of CBs */
-                if (shader->output[i].sid >= key.nr_cbufs) {
+                if (shader->output[i].sid >= max_color_exports) {
                     /* skip export */
                     j--;
                     continue;
@@ -1450,7 +1451,7 @@ static int r600_shader_from_tgsi(struct r600_screen *rscreen,
                 output[j].type = V_SQ_CF_ALLOC_EXPORT_WORD0_SQ_EXPORT_PIXEL;
                 shader->nr_ps_color_exports++;
                 if (shader->fs_write_all && (rscreen->chip_class >= EVERGREEN)) {
-                    for (k = 1; k < key.nr_cbufs; k++) {
+                    for (k = 1; k < max_color_exports; k++) {
                         j++;
                         memset(&output[j], 0, sizeof(struct r600_bytecode_output));
                         output[j].gpr = shader->output[i].gpr;
--
1.8.3.1
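
The effect of clamping with MAX2(key.nr_cbufs, 1) can be checked with a small stand-alone model. This is a simplified sketch, not the driver code: `count_color_exports` is a hypothetical helper that only models the "skip export" test from the third hunk and ignores fs_write_all and everything else in r600_shader_from_tgsi().

```c
#include <assert.h>
#include <stddef.h>

#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Returns how many of the shader's color outputs would be exported.
 * color_sids holds the semantic index of each color output written
 * by the shader; nr_cbufs is the number of bound color buffers. */
static unsigned count_color_exports(const unsigned *color_sids,
                                    size_t num_colors, unsigned nr_cbufs)
{
    /* At least one color export is kept even with no color buffers,
     * so that COLOR0 is still available for the alpha test. */
    unsigned max_color_exports = MAX2(nr_cbufs, 1u);
    unsigned exported = 0;

    assert(color_sids != NULL || num_colors == 0);

    for (size_t i = 0; i < num_colors; i++) {
        if (color_sids[i] >= max_color_exports)
            continue; /* skip export */
        exported++;
    }
    return exported;
}
```

For a shader writing COLOR0..COLOR2, this model exports one color with nr_cbufs == 0, two with nr_cbufs == 2, and all three with nr_cbufs == 8. Comparing against nr_cbufs directly, as the pre-patch code did, would drop COLOR0 in the first case.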
Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs
On 08/27/2013 11:00 PM, Roland Scheidegger wrote:
> Not that I'm qualified to review r600 code, but couldn't you create different shader variants depending on whether you need alpha test? At least I would assume shader exports aren't free.

I thought about performance, but my main concern now is to avoid serious regressions after enabling sb; we can try to improve it later. Even if we don't emit this color export, we'll have a fake export (with all color components masked) instead, and I'm not sure whether it's cheaper. Possibly the hardware can see that there is no actual memory write, but benchmarks are needed to prove it.

Also, there is another possible improvement for exports: sometimes we need to export depth/stencil but no colors, so we can probably get rid of the fake color export in such cases as well. Anyway, this also needs additional testing/benchmarking.

Vadim

> Roland
>
> On 27.08.2013 19:56, Vadim Girlin wrote:
>> We need to export at least one color if the shader writes it, even when nr_cbufs==0.
>>
>> Signed-off-by: Vadim Girlin vadimgir...@gmail.com
>> ---
>>
>> Tested on evergreen with multiple combinations of backends - no regressions, fixes some tests:
>>   default    - fixes fbo-alphatest-nocolor and fbo-alphatest-nocolor-ff
>>   default+sb - fixes fbo-alphatest-nocolor and fbo-alphatest-nocolor-ff
>>   llvm       - fixes about 25 tests related to depth/stencil
>>   llvm+sb    - fixes about 300 tests (llvm's depth/stencil issues and regressions caused by reordering of exports in sb)
>>
>> With this patch, there are no regressions with default+sb vs default.
>>
>> There is one regression with llvm+sb vs llvm - fs-texturegrad-miplevels. AFAICS it's a problem with the llvm backend uncovered by sb: SET_GRADIENTS_V/H instructions are not placed in the same TEX clause with the corresponding SAMPLE_G.
Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs
On 08/28/2013 12:43 AM, Marek Olšák wrote:
> Shader variants are BAD, BAD, BAD. Have you ever played an AAA game with a Mesa driver that likes to compile shader variants on first use? It's HORRIBLE.

I don't think that shader variants are bad, but it's definitely bad when we are compiling variants that are never used. Currently glxgears compiles 18 ps/vs shaders. In my branch with initial GS support [1] I switched handling of the shaders to deferred compilation, that is, shaders are compiled only before the actual draw. I found later that it's not really required for GS, but IIRC this change results in only 5 shaders being compiled for glxgears instead of 18. It seems most of the useless variants are results of state changes between creation of the shader state (initial compilation) and the actual draw call.

I had some concerns about increased overhead with those changes, and it's actually noticeable with the drawoverhead demo, but I didn't see any regressions with a few real apps that I tested, e.g. glxgears even showed slightly better performance with these changes. Probably I also implemented it in a not very optimal way (I was mostly concentrated on GS support) and the overhead can be reduced.

One more thing is duplicate shaders. I analyzed shader dumps from Unigine Heaven 3.0 some time ago and found that of about 320 compiled shaders, only about 180 (roughly 56%) were unique; the others were duplicates (detected by comparing their bytecode dumps in an automated way). Maybe they had different shader keys (which still resulted in the same bytecode), but I suspect duplicate pipe shaders were also involved. Unfortunately I haven't had time to investigate it more thoroughly since then.

So my point is that we don't really need to eliminate shader variants; first we need to eliminate compilation of unused variants and duplicate shaders. Also we might want to consider offloading the compilation to separate thread(s) and caching shader binaries between runs.

Vadim

[1] http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-geom-shaders

> What the patch does is probably the right solution. At least alpha-test state changes don't cause shader recompilation and re-binding, which also negatively affects performance. Ideally we shouldn't depend on the framebuffer state at all, but we need to emulate the TGSI property FS_COLOR0_WRITES_ALL_CBUFS. I think we should always be fine with key.nr_cbufs forced to 8 for any shader without that property. I expect app developers to do the right thing and not write outputs they don't need.
>
> Marek
>
> On Tue, Aug 27, 2013 at 9:00 PM, Roland Scheidegger srol...@vmware.com wrote:
>> Not that I'm qualified to review r600 code, but couldn't you create different shader variants depending on whether you need alpha test? At least I would assume shader exports aren't free.
>>
>> Roland
>>
>> On 27.08.2013 19:56, Vadim Girlin wrote:
>>> We need to export at least one color if the shader writes it, even when nr_cbufs==0.
>>>
>>> Signed-off-by: Vadim Girlin vadimgir...@gmail.com
>>> ---
>>>
>>> Tested on evergreen with multiple combinations of backends - no regressions, fixes some tests:
>>>   default    - fixes fbo-alphatest-nocolor and fbo-alphatest-nocolor-ff
>>>   default+sb - fixes fbo-alphatest-nocolor and fbo-alphatest-nocolor-ff
>>>   llvm       - fixes about 25 tests related to depth/stencil
>>>   llvm+sb    - fixes about 300 tests (llvm's depth/stencil issues and regressions caused by reordering of exports in sb)
>>>
>>> With this patch, there are no regressions with default+sb vs default.
>>>
>>> There is one regression with llvm+sb vs llvm - fs-texturegrad-miplevels. AFAICS it's a problem with the llvm backend uncovered by sb: SET_GRADIENTS_V/H instructions are not placed in the same TEX clause with the corresponding SAMPLE_G.
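
The duplicate check described in the message above (comparing bytecode across compiled shaders) can be sketched roughly as follows. The original analysis was done on text dumps with a script that is not shown in the thread, so this is only an assumed equivalent: `struct shader_binary` and `count_unique_binaries` are hypothetical names, and the comparison is a plain quadratic memcmp rather than a hash table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of one compiled shader: raw bytecode plus its size. */
struct shader_binary {
    const unsigned char *bytecode;
    size_t size;
};

/* Counts the distinct bytecode blobs among n compiled shaders.
 * Two shaders are duplicates when size and contents both match,
 * regardless of which shader key produced them. */
static size_t count_unique_binaries(const struct shader_binary *bins, size_t n)
{
    size_t unique = 0;

    assert(bins != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        int seen = 0;

        /* Compare against every earlier binary; stop at the first match. */
        for (size_t j = 0; j < i && !seen; j++) {
            seen = bins[i].size == bins[j].size &&
                   memcmp(bins[i].bytecode, bins[j].bytecode,
                          bins[i].size) == 0;
        }
        if (!seen)
            unique++;
    }
    return unique;
}
```

A driver-side cache would more likely key a hash table on the bytecode (or on the inputs that determine it) so that a duplicate is detected before compilation rather than after. The quadratic scan here is only meant for offline analysis of a few hundred dumps.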
Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs
On 08/28/2013 02:28 AM, Roland Scheidegger wrote: Am 27.08.2013 23:52, schrieb Vadim Girlin: On 08/28/2013 12:43 AM, Marek Olšák wrote: Shader variants are BAD, BAD, BAD. Have you ever played an AAA game with a Mesa driver that likes to compile shader variants on first use? It's HORRIBLE. I don't think that shader variants are bad, but it's definitely bad when we are compiling variants that are never used. Currently glxgears compiles 18 ps/vs shaders. In my branch with initial GS support [1] I switched handling of the shaders to deferred compilation, that is, shaders are compiled only before the actual draw. I found later that it's not really required for GS, but IIRC this change results in only 5 shaders being compiled for glxgears instead of 18. It seems most of the useless variants are results of state changes between creation of the shader state (initial compilation) and actual draw call. I had some concerns about increased overhead with those changes, and it's actually noticeable with drawoverhead demo, but I didn't see any regressions with a few real apps that I tested, e.g. glxgears even showed slightly better performance with these changes. Probably I also implemented it in a not very optimal way (I was mostly concentrated on GS support) and the overhead can be reduced. One more thing is duplicate shaders, I've analyzed shader dumps from Unigine Heaven 3.0 some time ago and found that from about 320 compiled shaders, only about 180 (50%) were unique, others were duplicates (detected by comparing the bytecode dumps for them in an automated way), maybe they had different shader keys (which still resulted in the same bytecode), but I suspect duplicate pipe shaders were also involved. Unfortunately I didn't have a time to investigate it more thoroughly since then. So my point is that we don't really need to eliminate shader variants, first we need to eliminate compilation of unused variants and duplicate shaders. 
Also we might want to consider offloading of the compilation to separate thread(s) and caching of shader binaries between runs. Hmm ok that seems a way more complicated problem than I thought :-). Compile early and you might compile variants you will never use, compile late and the delay might be noticeable. Compilation of unused variants is not bad if they are compiled at the game/level loading time, I think many apps are trying to compile shaders early to avoid freezes during gameplay. But trying to compile early in the driver doesn't make sense currently because it's already too late anyway, if I'm not missing something, it's deferred in mesa or state tracker. Otherwise probably it would be preferable for the driver to precompile variants that are likely to be used (but only if we really can do it early, at the loading time when shaders are created by the app). I just thought it might be unlikely you'd actually need two variants - e.g. some depth exporting shader is probably unlikely to use alpha test. But ok I guess it shouldn't write color in this case, so even then it might never be worth bothering. Was just a random idea ;-). I think it's a good idea that just needs some benchmarking to make sure that it can provide any real benefits. I only wanted to say that it can be done separately from this fix. Vadim Roland Vadim [1] http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-geom-shaders What the patch does is probably the right solution. At least alpha-test state changes don't cause shader recompilation and re-binding, which also negatively affects performance. Ideally we shouldn't depend on the framebuffer state at all, but we need to emulate the TGSI property FS_COLOR0_WRITES_ALL_CBUFS. I think we should always be fine with key.nr_cbufs forced to 8 for any shader without that property. I expect app developers to do the right thing and not write outputs they don't need. 
Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs
On 08/28/2013 02:59 AM, Marek Olšák wrote: First, you won't really see any significant continual difference in frame rate no matter how many shader variants you have unless you are very CPU-bound. The problem is shader compilation on the first use, that's where you get a big hiccup. Try Skyrim for example: You have to first look around and see every object that's around you and get unpleasant stuttering before you can actually go on and play the game. Yes, this also Wine's fault that it compiles shaders on the first use too, but we don't have to be as bad as Wine, do we? Valve also reported shader recompilations on the first use being a serious issue with open source drivers. I perfectly understand that deferred compilation is exactly the problem that makes the games freeze due to shader compilation on first use when something new appears on the screen, but I don't think we can solve this problem in the *driver* by trying to compile early, because AFAICS currently the shaders are passed to the driver too late anyway, and this happens not only with wine. E.g. when I run Heaven in a window with MESA_GLSL=dump R600_DEBUG=ps,vs, so that I can see Heaven's window and console output at the same time, what I see is that most of GL dumps happen while Heaven shows splash screen with loading progress, but most of the driver's dumps appear on the first frame and few more times during benchmark. It looks like compilation is deferred somewhere in the stack before the driver, or am I missing something? Vadim Marek On Tue, Aug 27, 2013 at 11:52 PM, Vadim Girlin vadimgir...@gmail.com wrote: On 08/28/2013 12:43 AM, Marek Olšák wrote: Shader variants are BAD, BAD, BAD. Have you ever played an AAA game with a Mesa driver that likes to compile shader variants on first use? It's HORRIBLE. I don't think that shader variants are bad, but it's definitely bad when we are compiling variants that are never used. Currently glxgears compiles 18 ps/vs shaders. 
Re: [Mesa-dev] [PATCH] r600g/llvm: don't export more colors than the number of CBs
On 08/24/2013 02:31 PM, Marek Olšák wrote: Like Christoph said, COLOR0 (if available) must always be exported for alpha test. Are there any piglit tests for that? I didn't see any regressions with this patch (at least on evergreen), possibly I messed up the testing somehow. Also I think old backend uses the same logic. Vadim Marek On Sat, Aug 24, 2013 at 3:30 AM, Vadim Girlin vadimgir...@gmail.com wrote: Currently llvm backend always exports at least one color in pixel shader even if no color buffers are enabled. With depth/stencil exports this can result in the following code: EXPORT PIXEL 0 R0.xyzw VPM EXPORT PIXEL 61R1.x___ VPM EXPORT_DONEPIXEL 61R0._x__ VPM EOP AFAIU with zero color buffers no memory is reserved for colors in the export ring and all exports in this example actually write to the same location. The code above still works fine in this particular case, because correct values are written last, but reordering can break it (especially with SB which tends to reorder the exports). Signed-off-by: Vadim Girlin vadimgir...@gmail.com --- This fixes regressions with LLVM+SB, so I consider it as a prerequisite for enabling SB by default. Also it fixes some issues with LLVM backend alone. Tested on evergreen only (I don't have other hw), needs testing on pre-evergreen GPUs. 
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] r600g/llvm: don't export more colors than the number of CBs
On 08/24/2013 07:24 PM, Marek Olšák wrote:
> See piglit/fbo-alphatest-nocolor.

Ah, it seems I just compared the wrong results when I was testing all combinations of backends and looking for regressions. Now I think the problem is that, even though the llvm backend correctly emits the color export with nr_cbufs==0, it still relies on the nr_ps_color_exports value computed in the old backend path (which is currently broken for that case), and this resulted in the regressions that I wanted to fix. I'll send a new patch.

Vadim

> Marek
>
> On Sat, Aug 24, 2013 at 3:12 PM, Vadim Girlin vadimgir...@gmail.com wrote:
>> On 08/24/2013 02:31 PM, Marek Olšák wrote:
>>> Like Christoph said, COLOR0 (if available) must always be exported for alpha test.
>>
>> Are there any piglit tests for that? I didn't see any regressions with this patch (at least on evergreen), possibly I messed up the testing somehow. Also I think old backend uses the same logic.
>>
>> Vadim
>>
>>> Marek
>>>
>>> On Sat, Aug 24, 2013 at 3:30 AM, Vadim Girlin vadimgir...@gmail.com wrote:
>>>> Currently llvm backend always exports at least one color in pixel shader even if no color buffers are enabled. With depth/stencil exports this can result in the following code:
>>>>
>>>>   EXPORT      PIXEL 0  R0.xyzw VPM
>>>>   EXPORT      PIXEL 61 R1.x___ VPM
>>>>   EXPORT_DONE PIXEL 61 R0._x__ VPM EOP
>>>>
>>>> AFAIU with zero color buffers no memory is reserved for colors in the export ring and all exports in this example actually write to the same location. The code above still works fine in this particular case, because correct values are written last, but reordering can break it (especially with SB which tends to reorder the exports).
>>>>
>>>> Signed-off-by: Vadim Girlin vadimgir...@gmail.com
>>>> ---
>>>> This fixes regressions with LLVM+SB, so I consider it as a prerequisite for enabling SB by default. Also it fixes some issues with LLVM backend alone. Tested on evergreen only (I don't have other hw), needs testing on pre-evergreen GPUs.
[Mesa-dev] [PATCH] r600g/llvm: don't export more colors than the number of CBs
Currently llvm backend always exports at least one color in pixel shader even if no color buffers are enabled. With depth/stencil exports this can result in the following code:

  EXPORT      PIXEL 0  R0.xyzw VPM
  EXPORT      PIXEL 61 R1.x___ VPM
  EXPORT_DONE PIXEL 61 R0._x__ VPM EOP

AFAIU with zero color buffers no memory is reserved for colors in the export ring and all exports in this example actually write to the same location. The code above still works fine in this particular case, because correct values are written last, but reordering can break it (especially with SB which tends to reorder the exports).

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---
This fixes regressions with LLVM+SB, so I consider it as a prerequisite for enabling SB by default. Also it fixes some issues with LLVM backend alone. Tested on evergreen only (I don't have other hw), needs testing on pre-evergreen GPUs.

 src/gallium/drivers/r600/r600_llvm.c   | 2 +-
 src/gallium/drivers/r600/r600_shader.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_llvm.c b/src/gallium/drivers/r600/r600_llvm.c
index 03a68e4..d2f4aff 100644
--- a/src/gallium/drivers/r600/r600_llvm.c
+++ b/src/gallium/drivers/r600/r600_llvm.c
@@ -333,8 +333,8 @@ static void llvm_emit_epilogue(struct lp_build_tgsi_context * bld_base)
 			} else if (ctx->type == TGSI_PROCESSOR_FRAGMENT) {
 				switch (ctx->r600_outputs[i].name) {
 				case TGSI_SEMANTIC_COLOR:
-					has_color = true;
 					if ( color_count < ctx->color_buffer_count) {
+						has_color = true;
 						LLVMValueRef args[3];
 						args[0] = output;
 						if (ctx->fs_color_all) {
diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index fb766c4..85f8469 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -1130,7 +1130,7 @@ static int r600_shader_from_tgsi(struct r600_screen *rscreen,
 		radeon_llvm_ctx.face_gpr = ctx.face_gpr;
 		radeon_llvm_ctx.r600_inputs = ctx.shader->input;
 		radeon_llvm_ctx.r600_outputs = ctx.shader->output;
-		radeon_llvm_ctx.color_buffer_count = MAX2(key.nr_cbufs , 1);
+		radeon_llvm_ctx.color_buffer_count = key.nr_cbufs;
 		radeon_llvm_ctx.chip_class = ctx.bc->chip_class;
 		radeon_llvm_ctx.fs_color_all = shader->fs_write_all && (rscreen->chip_class >= EVERGREEN);
 		radeon_llvm_ctx.stream_outputs = so;
--
1.8.3.1
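A note for readers of the archive: the effect of this patch on the number of color exports can be sketched as below. This is only an illustration under simplifying assumptions; `color_exports` and its parameters are hypothetical names, not driver code, and the `fs_color_all` case is ignored.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not part of r600g): how many color exports are emitted
// for a pixel shader that writes `color_outputs` colors while `nr_cbufs` color
// buffers are bound. `patched` selects the behavior after this patch.
unsigned color_exports(unsigned color_outputs, unsigned nr_cbufs, bool patched) {
    // Before the patch the limit was MAX2(key.nr_cbufs, 1), so one color was
    // still exported with zero color buffers; after it the limit is nr_cbufs.
    unsigned limit = patched ? nr_cbufs : std::max(nr_cbufs, 1u);
    return std::min(color_outputs, limit);
}
```

With zero color buffers the unpatched logic yields one export and the patched logic yields none. As the replies in this thread point out, COLOR0 still has to be exported for alpha test, which is why a follow-up patch was announced.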
[Mesa-dev] [PATCH] [RFC] r600g: enable SB backend by default
Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---
 src/gallium/drivers/r600/r600_asm.c    | 3 ++-
 src/gallium/drivers/r600/r600_pipe.c   | 4 ++--
 src/gallium/drivers/r600/r600_pipe.h   | 2 +-
 src/gallium/drivers/r600/r600_shader.c | 2 +-
 4 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_asm.c b/src/gallium/drivers/r600/r600_asm.c
index b8eedae..a0492a6 100644
--- a/src/gallium/drivers/r600/r600_asm.c
+++ b/src/gallium/drivers/r600/r600_asm.c
@@ -2281,7 +2281,8 @@ void *r600_create_vertex_fetch_shader(struct pipe_context *ctx,
 	uint32_t *bytecode;
 	int i, j, r, fs_size;
 	struct r600_fetch_shader *shader;
-	unsigned sb_disasm = rctx->screen->debug_flags & (DBG_SB_DISASM | DBG_SB);
+	unsigned no_sb = rctx->screen->debug_flags & DBG_NO_SB;
+	unsigned sb_disasm = !no_sb || (rctx->screen->debug_flags & DBG_SB_DISASM);

 	assert(count < 32);

diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c
index 2be5910..edd50f0 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -67,8 +67,8 @@ static const struct debug_named_value debug_options[] = {
 	{ "noinvalrange", DBG_NO_DISCARD_RANGE, "Disable handling of INVALIDATE_RANGE map flags" },

 	/* shader backend */
-	{ "sb", DBG_SB, "Enable optimization of graphics shaders" },
-	{ "sbcl", DBG_SB_CS, "Enable optimization of compute shaders" },
+	{ "nosb", DBG_NO_SB, "Disable sb backend for graphics shaders" },
+	{ "sbcl", DBG_SB_CS, "Enable sb backend for compute shaders" },
 	{ "sbdry", DBG_SB_DRY_RUN, "Don't use optimized bytecode (just print the dumps)" },
 	{ "sbstat", DBG_SB_STAT, "Print optimization statistics for shaders" },
 	{ "sbdump", DBG_SB_DUMP, "Print IR dumps after some optimization passes" },
diff --git a/src/gallium/drivers/r600/r600_pipe.h b/src/gallium/drivers/r600/r600_pipe.h
index 21d68c9..398ac89 100644
--- a/src/gallium/drivers/r600/r600_pipe.h
+++ b/src/gallium/drivers/r600/r600_pipe.h
@@ -249,7 +249,7 @@ typedef boolean (*r600g_dma_blit_t)(struct pipe_context *ctx,
 #define DBG_NO_ASYNC_DMA	(1 << 19)
 #define DBG_NO_DISCARD_RANGE	(1 << 20)
 /* shader backend */
-#define DBG_SB			(1 << 21)
+#define DBG_NO_SB		(1 << 21)
 #define DBG_SB_CS		(1 << 22)
 #define DBG_SB_DRY_RUN		(1 << 23)
 #define DBG_SB_STAT		(1 << 24)
diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index fb766c4..1563430 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -140,7 +140,7 @@ int r600_pipe_shader_create(struct pipe_context *ctx,
 	int r, i;
 	uint32_t *ptr;
 	bool dump = r600_can_dump_shader(rctx->screen, tgsi_get_processor_type(sel->tokens));
-	unsigned use_sb = rctx->screen->debug_flags & DBG_SB;
+	unsigned use_sb = !(rctx->screen->debug_flags & DBG_NO_SB);
 	unsigned sb_disasm = use_sb || (rctx->screen->debug_flags & DBG_SB_DISASM);

 	shader->shader.bc.isa = rctx->isa;
--
1.8.3.1
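For readers skimming the diff: the patch turns an opt-in flag into an opt-out flag, so the default (no flags set) now selects the sb backend. A minimal sketch of the resulting logic follows. The `EX_` names are illustrative; bit 21 matches the patch, while the bit chosen for the disassembly flag is an assumption, since the patch does not show it.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative flag values. Bit 21 is taken from the patch; the bit used for
// EX_DBG_SB_DISASM is an assumption made only for this sketch.
constexpr uint32_t EX_DBG_NO_SB     = 1u << 21;
constexpr uint32_t EX_DBG_SB_DISASM = 1u << 27;

// sb is used unless it is explicitly disabled.
bool use_sb(uint32_t debug_flags) {
    return !(debug_flags & EX_DBG_NO_SB);
}

// The sb disassembler is used whenever sb is active, or when requested.
bool sb_disasm(uint32_t debug_flags) {
    return use_sb(debug_flags) || (debug_flags & EX_DBG_SB_DISASM) != 0;
}
```

With no debug flags at all, both functions return true, which is the point of the change.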
Re: [Mesa-dev] [PATCH] r600g/sb: Initialize cf_node::bc.
On 08/19/2013 01:35 AM, Vinson Lee wrote:
> Fixes "Uninitialized pointer field" defect reported by Coverity.
>
> Signed-off-by: Vinson Lee v...@freedesktop.org
> ---
>  src/gallium/drivers/r600/sb/sb_ir.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/sb/sb_ir.h b/src/gallium/drivers/r600/sb/sb_ir.h
> index c838f62..b696e77 100644
> --- a/src/gallium/drivers/r600/sb/sb_ir.h
> +++ b/src/gallium/drivers/r600/sb/sb_ir.h
> @@ -962,8 +962,8 @@ public:
>  class cf_node : public container_node {
>  protected:
> -	cf_node() : container_node(NT_OP, NST_CF_INST), jump_target(),
> -		jump_after_target() {};
> +	cf_node() : container_node(NT_OP, NST_CF_INST), bc(),
> +		jump_target(), jump_after_target() {};

Hi, Vinson,

IIRC I switched the initialization of the bc struct from the constructor initializer list to an explicit memset due to reported issues with older gcc versions, which failed to initialize the struct properly. See commit 41005d.

Constructors of cf_node (as well as fetch_node, alu_node) are protected and called only by helper functions (create_cf, create_fetch, create_alu) in the friend class r600_sb::shader that create nodes in the pool. memset for bc is called right after the constructor in these functions, so actually bc is always initialized.

I don't remember why I didn't use memset in the constructor body though, maybe moving memset there would silence Coverity?

Vadim

>  public:
>  	bc_cf bc;
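The pattern discussed above (protected constructor, friend factory that placement-constructs nodes in a pool, memset of the bytecode struct) can be sketched roughly as follows. All `_example` types are simplified stand-ins invented for this note, not the real sb classes, and the "pool" is just a single aligned buffer.

```cpp
#include <cassert>
#include <cstring>
#include <new>

// Simplified stand-in for bc_cf: a plain struct with no constructors.
struct bc_cf_example {
    unsigned op;
    unsigned barrier;
    unsigned pop_count;
};

class cf_node_example {
    friend class shader_example;   // only the factory may construct nodes
protected:
    // Zeroing in the constructor body means no code path can observe an
    // uninitialized bc, which is what static analyzers look for.
    cf_node_example() { std::memset(&bc, 0, sizeof(bc)); }
public:
    bc_cf_example bc;
};

class shader_example {
    // One-slot "pool"; filled with garbage first to mimic reused memory.
    alignas(cf_node_example) unsigned char storage[sizeof(cf_node_example)];
public:
    cf_node_example *create_cf() {
        std::memset(storage, 0xFF, sizeof(storage));
        cf_node_example *n = new (storage) cf_node_example();
        n->bc.barrier = 1;   // mirrors the default set in shader::create_cf()
        return n;
    }
};
```

Whether the memset lives in the factory or in the constructor body, the struct ends up zeroed; placing it in the constructor simply makes that visible locally.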
Re: [Mesa-dev] [PATCH] r600g/sb: Move memsets of member structs to within constructor bodies.
On 08/19/2013 11:50 AM, Vinson Lee wrote:
> Silences "Uninitialized pointer field" defects reported by Coverity.
>
> Signed-off-by: Vinson Lee v...@freedesktop.org

Reviewed-by: Vadim Girlin vadimgir...@gmail.com

> ---
>  src/gallium/drivers/r600/sb/sb_ir.h       | 6 +++---
>  src/gallium/drivers/r600/sb/sb_shader.cpp | 3 ---
>  2 files changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/sb/sb_ir.h b/src/gallium/drivers/r600/sb/sb_ir.h
> index c838f62..a74d6cb 100644
> --- a/src/gallium/drivers/r600/sb/sb_ir.h
> +++ b/src/gallium/drivers/r600/sb/sb_ir.h
> @@ -963,7 +963,7 @@ public:
>  class cf_node : public container_node {
>  protected:
>  	cf_node() : container_node(NT_OP, NST_CF_INST), jump_target(),
> -		jump_after_target() {};
> +		jump_after_target() { memset(&bc, 0, sizeof(bc_cf)); };
>
>  public:
>  	bc_cf bc;
> @@ -982,7 +982,7 @@ public:
>
>  class alu_node : public node {
>  protected:
> -	alu_node() : node(NT_OP, NST_ALU_INST) {};
> +	alu_node() : node(NT_OP, NST_ALU_INST) { memset(&bc, 0, sizeof(bc_alu)); };
>
>  public:
>  	bc_alu bc;
> @@ -1028,7 +1028,7 @@ public:
>
>  class fetch_node : public node {
>  protected:
> -	fetch_node() : node(NT_OP, NST_FETCH_INST) {};
> +	fetch_node() : node(NT_OP, NST_FETCH_INST) { memset(&bc, 0, sizeof(bc_fetch)); };
>
>  public:
>  	bc_fetch bc;
> diff --git a/src/gallium/drivers/r600/sb/sb_shader.cpp b/src/gallium/drivers/r600/sb/sb_shader.cpp
> index 9fc47ae..98e52b1 100644
> --- a/src/gallium/drivers/r600/sb/sb_shader.cpp
> +++ b/src/gallium/drivers/r600/sb/sb_shader.cpp
> @@ -260,7 +260,6 @@ node* shader::create_node(node_type nt, node_subtype nst, node_flags flags) {
>
>  alu_node* shader::create_alu() {
>  	alu_node* n = new (pool.allocate(sizeof(alu_node))) alu_node();
> -	memset(&n->bc, 0, sizeof(bc_alu));
>  	all_nodes.push_back(n);
>  	return n;
>  }
> @@ -281,7 +280,6 @@ alu_packed_node* shader::create_alu_packed() {
>
>  cf_node* shader::create_cf() {
>  	cf_node* n = new (pool.allocate(sizeof(cf_node))) cf_node();
> -	memset(&n->bc, 0, sizeof(bc_cf));
>  	n->bc.barrier = 1;
>  	all_nodes.push_back(n);
>  	return n;
> @@ -289,7 +287,6 @@ cf_node* shader::create_cf() {
>
>  fetch_node* shader::create_fetch() {
>  	fetch_node* n = new (pool.allocate(sizeof(fetch_node))) fetch_node();
> -	memset(&n->bc, 0, sizeof(bc_fetch));
>  	all_nodes.push_back(n);
>  	return n;
>  }
[Mesa-dev] [PATCH] r600g/sb: use MULADD workaround on R7xx for MULADD_IEEE
Looks like the same issue that was seen with MULADD in the trans slot on R7xx also affects MULADD_IEEE (maybe all OP3 instructions, and MULADD is just the most frequently used one?).

The workaround is to never put affected instructions into the trans slot. IIRC it was mostly observed when affected instructions had kcache operands and some specific bank swizzles, but I have no R7xx hw to verify that; also I'm still not sure whether it affects R6xx. Probably the condition can be narrowed to allow better ALU packing in some cases.

Fixes https://bugs.freedesktop.org/show_bug.cgi?id=67927

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
Cc: 9.2 mesa-sta...@lists.freedesktop.org
---
 src/gallium/drivers/r600/sb/sb_sched.cpp | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/sb/sb_sched.cpp b/src/gallium/drivers/r600/sb/sb_sched.cpp
index f0e41f5..2792315 100644
--- a/src/gallium/drivers/r600/sb/sb_sched.cpp
+++ b/src/gallium/drivers/r600/sb/sb_sched.cpp
@@ -1490,7 +1490,8 @@ unsigned post_scheduler::try_add_instruction(node *n) {

 		// FIXME workaround for some problems with MULADD in trans slot on r700,
 		// (is it really needed on r600?)
-		if (a->bc.op == ALU_OP3_MULADD && !ctx.is_egcm()) {
+		if ((a->bc.op == ALU_OP3_MULADD || a->bc.op == ALU_OP3_MULADD_IEEE) &&
+				!ctx.is_egcm()) {
 			allowed_slots &= 0x0F;
 		}

--
1.8.3.1
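As an illustration of the workaround described above, here is a small sketch of restricting the slot mask for the affected opcodes. The enum, the constants and the function are hypothetical names for this note; the only value taken from the patch is the 0x0F mask, and reading it as "the four vector slots, with bit 4 as the trans slot" is an assumption of the sketch.

```cpp
#include <cassert>

// Hypothetical opcode subset, for illustration only.
enum alu_op_example { EX_OP2_ADD, EX_OP3_MULADD, EX_OP3_MULADD_IEEE };

// Assumed layout: bits 0-3 are the vector slots x, y, z, w; bit 4 is trans.
constexpr unsigned EX_SLOTS_VECTOR = 0x0F;
constexpr unsigned EX_SLOTS_ALL    = 0x1F;

// Returns the slots an instruction may be scheduled into. `is_egcm` is true
// on evergreen/cayman, where the workaround is not applied.
unsigned allowed_slots(alu_op_example op, bool is_egcm) {
    unsigned mask = EX_SLOTS_ALL;
    if ((op == EX_OP3_MULADD || op == EX_OP3_MULADD_IEEE) && !is_egcm)
        mask &= EX_SLOTS_VECTOR;   // never the trans slot on R6xx/R7xx
    return mask;
}
```

The cost of the workaround is visible in the mask: one fewer slot for these instructions, hence the remark that a narrower condition might allow better ALU packing.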
Re: [Mesa-dev] [PATCH] r600g/sb: Dump correct value for CND.
On 08/04/2013 11:02 AM, Vinson Lee wrote:
> Fixes "Copy-paste error" reported by Coverity.
>
> Signed-off-by: Vinson Lee v...@freedesktop.org
> ---
>  src/gallium/drivers/r600/sb/sb_bc_dump.cpp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/gallium/drivers/r600/sb/sb_bc_dump.cpp b/src/gallium/drivers/r600/sb/sb_bc_dump.cpp
> index 9d76465..9b1420d 100644
> --- a/src/gallium/drivers/r600/sb/sb_bc_dump.cpp
> +++ b/src/gallium/drivers/r600/sb/sb_bc_dump.cpp
> @@ -174,7 +174,7 @@ void bc_dump::dump(cf_node &n) {
>  	}
>
>  	if (n.bc.cond)
> -		s << " CND:" << n.bc.pop_count;
> +		s << " CND:" << n.bc.cond;
>
>  	if (n.bc.pop_count)
>  		s << " POP:" << n.bc.pop_count;

Reviewed-by: Vadim Girlin vadimgir...@gmail.com
Re: [Mesa-dev] [PATCH] r600g/sb: Fix Android build
On 06/28/2013 01:31 AM, Tom Stellard wrote:
> From: Chih-Wei Huang cwhu...@android-x86.org
>
> Add the sb CXX files to the Android Makefile and also stop using some c++11 features.
> ---
>  src/gallium/drivers/r600/Android.mk         | 5 +++--
>  src/gallium/drivers/r600/sb/sb_bc.h         | 4 ++--
>  src/gallium/drivers/r600/sb/sb_ra_init.cpp  | 2 +-
>  src/gallium/drivers/r600/sb/sb_valtable.cpp | 4 ++--
>  4 files changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/Android.mk b/src/gallium/drivers/r600/Android.mk
> index e5188bb..4d2f69f 100644
> --- a/src/gallium/drivers/r600/Android.mk
> +++ b/src/gallium/drivers/r600/Android.mk
> @@ -28,11 +28,12 @@ include $(LOCAL_PATH)/Makefile.sources
>
>  include $(CLEAR_VARS)
>
> -LOCAL_SRC_FILES := $(C_SOURCES)
> +LOCAL_SRC_FILES := $(C_SOURCES) $(CXX_SOURCES)
>
> -LOCAL_C_INCLUDES :=
> +LOCAL_C_INCLUDES := $(DRM_TOP)
>
>  LOCAL_MODULE := libmesa_pipe_r600
>
> +include external/stlport/libstlport.mk
>  include $(GALLIUM_COMMON_MK)
>  include $(BUILD_STATIC_LIBRARY)
> diff --git a/src/gallium/drivers/r600/sb/sb_bc.h b/src/gallium/drivers/r600/sb/sb_bc.h
> index 25255a7..73c250d 100644
> --- a/src/gallium/drivers/r600/sb/sb_bc.h
> +++ b/src/gallium/drivers/r600/sb/sb_bc.h
> @@ -846,7 +846,7 @@ public:
>  	unsigned ndw() { return bc.size(); }
>
>  	void write_data(uint32_t* dst) {
> -		memcpy(dst, bc.data(), 4 * bc.size());
> +		std::copy(bc.begin(), bc.end(), dst);
>  	}
>
>  	void align(unsigned a) {
> @@ -870,7 +870,7 @@ public:
>  	}
>
>  	unsigned get_pos() { return pos; }
> -	uint32_t *data() { return bc.data(); }
> +	uint32_t *data() { return bc.begin(); }

This results in a type conversion error for me with gcc 4.8.1 (fedora 19). Probably we can simply use &bc[0] here.

PS Sorry for the late reply, I'm sick now so I haven't checked email recently. Also I'm not sure when I'll be able to look into it and run any tests myself, so if this issue is fixed and there are no other regressions, I'm OK with this patch.

>
>  	bytecode & operator <<(uint32_t v) {
>  		if (pos == ndw()) {
> diff --git a/src/gallium/drivers/r600/sb/sb_ra_init.cpp b/src/gallium/drivers/r600/sb/sb_ra_init.cpp
> index bfe5ab9..24b24a0 100644
> --- a/src/gallium/drivers/r600/sb/sb_ra_init.cpp
> +++ b/src/gallium/drivers/r600/sb/sb_ra_init.cpp
> @@ -680,7 +680,7 @@ void ra_split::split_vec(vvec &vv, vvec &v1, vvec &v2, bool allow_swz) {
>  			value *t;
>  			vvec::iterator F =
> -				allow_swz ? find(v2.begin(), v2.end(), o) : v2.end();
> +				allow_swz ? std::find(v2.begin(), v2.end(), o) : v2.end();
>
>  			if (F != v2.end()) {
>  				t = *(v1.begin() + (F - v2.begin()));
> diff --git a/src/gallium/drivers/r600/sb/sb_valtable.cpp b/src/gallium/drivers/r600/sb/sb_valtable.cpp
> index 5e6aca0..00aee66 100644
> --- a/src/gallium/drivers/r600/sb/sb_valtable.cpp
> +++ b/src/gallium/drivers/r600/sb/sb_valtable.cpp
> @@ -207,7 +207,7 @@ void value_table::get_values(vvec& v) {
>
>  	for(vt_table::iterator I = hashtable.begin(),
>  			E = hashtable.end(); I != E; ++I) {
> -		T = copy(I->begin(), I->end(), T);
> +		T = std::copy(I->begin(), I->end(), T);
>  	}
>  }
>
> @@ -368,7 +368,7 @@ inline bool sb_bitset::set_chk(unsigned id, bool bit) {
>  }
>
>  void sb_bitset::clear() {
> -	memset(data.data(), 0, sizeof(basetype) * data.size());
> +	std::fill(data.begin(), data.end(), 0);
>  }
>
>  void sb_bitset::resize(unsigned size) {
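For reference, the two replacements discussed in this review can be sketched without `std::vector::data()`, which only exists since C++11. The free functions below are illustrative stand-ins for the member functions in the patch, and the empty-vector guard is an addition of this sketch rather than something the thread asks for.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Copies all dwords of `bc` into `dst` using iterators only.
// `dst` must have room for bc.size() elements.
void write_data(const std::vector<uint32_t> &bc, uint32_t *dst) {
    std::copy(bc.begin(), bc.end(), dst);
}

// Pre-C++11 way to get a raw pointer to the storage. An iterator is not
// guaranteed to be convertible to a pointer, which is the conversion error
// mentioned above; taking the address of the first element works instead.
uint32_t *data_ptr(std::vector<uint32_t> &bc) {
    return bc.empty() ? 0 : &bc[0];   // &bc[0] is invalid on an empty vector
}
```

A caller would use `data_ptr(v)` wherever `v.data()` was used before, keeping in mind that the pointer is invalidated by any reallocation of the vector.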
Re: [Mesa-dev] [PATCH] r600g/sb: improve math optimizations
On 06/05/2013 12:01 AM, Grigori Goronzy wrote:
> On 31.05.2013 14:37, Vadim Girlin wrote:
>> There are no regressions on evergreen with piglit tests or any other apps that I tested, with and without llvm backend. (Issue with Unigine Heaven that I mentioned on #dri-devel yesterday was in fact caused by my own well-hidden bug, now it's fixed). Improvements for real apps probably won't be very noticeable in many cases, but this still might help some apps, e.g. this improves shader2 test of the fill benchmark in mesa demos.
>
> I see noticeable FPS improvements (~7%) with one of my older pixel shader effects here, a plasma-like thingie. But this also breaks rendering in some other cases, e.g. http://www.iquilezles.org/apps/shadertoy/index2.html?p=Heart looks wrong. The colors are off.

Thanks for testing, I'll fix this issue and send an updated patch.

Vadim

> Best regards
> Grigori
[Mesa-dev] [PATCH] r600g/sb: improve math optimizations v2
This patch adds support for some math optimizations that are generally considered unsafe, which is why they are currently disabled for compute shaders. GL requirements are less strict, so they are enabled for GL shaders by default. In case of any issues with applications that rely on higher precision than guaranteed by GL, the 'sbsafemath' option in R600_DEBUG allows disabling them.

v2 - always set proper src vector size for transformed instructions
   - check for clamp modifier in the expr_handler::fold_assoc

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---
 src/gallium/drivers/r600/r600_isa.h          | 19 +-
 src/gallium/drivers/r600/r600_pipe.c         | 1 +
 src/gallium/drivers/r600/r600_pipe.h         | 1 +
 src/gallium/drivers/r600/sb/sb_bc.h          | 1 +
 src/gallium/drivers/r600/sb/sb_bc_parser.cpp | 2 +
 src/gallium/drivers/r600/sb/sb_context.cpp   | 1 +
 src/gallium/drivers/r600/sb/sb_core.cpp      | 1 +
 src/gallium/drivers/r600/sb/sb_expr.cpp      | 448 ---
 src/gallium/drivers/r600/sb/sb_expr.h        | 4 +
 src/gallium/drivers/r600/sb/sb_shader.cpp    | 2 +-
 src/gallium/drivers/r600/sb/sb_shader.h      | 2 +
 11 files changed, 435 insertions(+), 47 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_isa.h b/src/gallium/drivers/r600/r600_isa.h
index 89d..c6bb869 100644
--- a/src/gallium/drivers/r600/r600_isa.h
+++ b/src/gallium/drivers/r600/r600_isa.h
@@ -84,7 +84,8 @@ enum alu_op_flags
 	 * includes MULADDs (considering the MUL part on src0 and src1 only) */
 	AF_M_COMM = (1 << 23),

-	/* associative operation ((a op b) op c) == (a op (b op c)) */
+	/* associative operation ((a op b) op c) == (a op (b op c)),
+	 * includes MULADDs (considering the MUL part on src0 and src1 only) */
 	AF_M_ASSOC = (1 << 24),

 	AF_PRED_PUSH = (1 << 25),
@@ -373,11 +374,11 @@ static const struct alu_op_info alu_op_table[] = {
 	{"SAD_ACCUM_HI_UINT", 3, { -1, 0x0F },{ 0, 0, AF_V, AF_V}, AF_UINT_DST },
 	{"MULADD_UINT24", 3, { -1, 0x10 },{ 0, 0, AF_V, AF_V}, AF_UINT_DST | AF_24 },
 	{"LDS_IDX_OP", 3, { -1, 0x11 },{ 0, 0, AF_V, AF_V}, 0 },
-	{"MULADD", 3, { 0x10, 0x14 },{ AF_VS, AF_VS, AF_VS, AF_VS}, AF_M_COMM },
-	{"MULADD_M2", 3, { 0x11, 0x15 },{ AF_VS, AF_VS, AF_VS, AF_VS}, AF_M_COMM },
-	{"MULADD_M4", 3, { 0x12, 0x16 },{ AF_VS, AF_VS, AF_VS, AF_VS}, AF_M_COMM },
-	{"MULADD_D2", 3, { 0x13, 0x17 },{ AF_VS, AF_VS, AF_VS, AF_VS}, AF_M_COMM },
-	{"MULADD_IEEE", 3, { 0x14, 0x18 },{ AF_VS, AF_VS, AF_VS, AF_VS}, AF_M_COMM | AF_IEEE },
+	{"MULADD", 3, { 0x10, 0x14 },{ AF_VS, AF_VS, AF_VS, AF_VS}, AF_M_COMM | AF_M_ASSOC },
+	{"MULADD_M2", 3, { 0x11, 0x15 },{ AF_VS, AF_VS, AF_VS, AF_VS}, AF_M_COMM | AF_M_ASSOC },
+	{"MULADD_M4", 3, { 0x12, 0x16 },{ AF_VS, AF_VS, AF_VS, AF_VS}, AF_M_COMM | AF_M_ASSOC },
+	{"MULADD_D2", 3, { 0x13, 0x17 },{ AF_VS, AF_VS, AF_VS, AF_VS}, AF_M_COMM | AF_M_ASSOC },
+	{"MULADD_IEEE", 3, { 0x14, 0x18 },{ AF_VS, AF_VS, AF_VS, AF_VS}, AF_M_COMM | AF_M_ASSOC | AF_IEEE },
 	{"CNDE", 3, { 0x18, 0x19 },{ AF_VS, AF_VS, AF_VS, AF_VS}, AF_CMOV | AF_CC_E },
 	{"CNDGT", 3, { 0x19, 0x1A },{ AF_VS, AF_VS, AF_VS, AF_VS}, AF_CMOV | AF_CC_GT },
 	{"CNDGE", 3, { 0x1A, 0x1B },{ AF_VS, AF_VS, AF_VS, AF_VS}, AF_CMOV | AF_CC_GE },
@@ -397,9 +398,9 @@ static const struct alu_op_info alu_op_table[] = {
 	{"MUL_LIT_M2", 3, { 0x0D, -1 },{ AF_VS, AF_VS, 0, 0}, 0 },
 	{"MUL_LIT_M4", 3, { 0x0E, -1 },{ AF_VS, AF_VS, 0, 0}, 0 },
 	{"MUL_LIT_D2", 3, { 0x0F, -1 },{ AF_VS, AF_VS, 0, 0}, 0 },
-	{"MULADD_IEEE_M2", 3, { 0x15, -1 },{ AF_VS, AF_VS, 0, 0}, AF_IEEE },
-	{"MULADD_IEEE_M4", 3, { 0x16, -1 },{ AF_VS, AF_VS, 0, 0}, AF_IEEE },
-	{"MULADD_IEEE_D2", 3, { 0x17, -1 },{ AF_VS, AF_VS, 0, 0}, AF_IEEE },
+	{"MULADD_IEEE_M2", 3, { 0x15, -1 },{ AF_VS, AF_VS, 0, 0}, AF_M_COMM | AF_M_ASSOC | AF_IEEE },
+	{"MULADD_IEEE_M4", 3, { 0x16, -1 },{ AF_VS, AF_VS, 0, 0}, AF_M_COMM | AF_M_ASSOC | AF_IEEE },
+	{"MULADD_IEEE_D2", 3, { 0x17, -1 },{ AF_VS, AF_VS, 0, 0}, AF_M_COMM | AF_M_ASSOC | AF_IEEE },
 	{"LDS_ADD", 2, { -1, 0x0011 },{ 0, 0, AF_V
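As background on why reassociation counts as an unsafe math optimization: floating-point addition is not associative, so regrouping operands can change the result. The sketch below is a generic illustration and not code from sb_expr.cpp; it uses `double` with exaggerated magnitudes so the effect is unambiguous, whereas shaders work with 32-bit floats and usually differ only in the last bits.

```cpp
#include <cassert>

// Evaluation order as written in the source expression.
double sum_as_written(double a, double b, double c) {
    return (a + b) + c;
}

// The same sum after an optimizer regroups the operands.
double sum_reassociated(double a, double b, double c) {
    return a + (b + c);
}
```

For a = 1e100, b = -1e100, c = 1.0 the first form yields 1.0, while the second yields 0.0 because b + c rounds back to -1e100. This kind of difference is what an option such as 'sbsafemath' is meant to avoid for applications that depend on exact evaluation order.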
[Mesa-dev] [PATCH] r600g/sb: improve math optimizations
This patch adds support for some math optimizations that are generally considered unsafe, which is why they are currently disabled for compute shaders. GL requirements are less strict, so they are enabled for GL shaders by default. In case of any issues with applications that rely on higher precision than guaranteed by GL, the 'sbsafemath' option in R600_DEBUG allows disabling them.

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---
There are no regressions on evergreen with piglit tests or any other apps that I tested, with and without llvm backend. (Issue with Unigine Heaven that I mentioned on #dri-devel yesterday was in fact caused by my own well-hidden bug, now it's fixed). Improvements for real apps probably won't be very noticeable in many cases, but this still might help some apps, e.g. this improves shader2 test of the fill benchmark in mesa demos.

 src/gallium/drivers/r600/r600_isa.h          | 19 +-
 src/gallium/drivers/r600/r600_pipe.c         | 1 +
 src/gallium/drivers/r600/r600_pipe.h         | 1 +
 src/gallium/drivers/r600/sb/sb_bc.h          | 1 +
 src/gallium/drivers/r600/sb/sb_bc_parser.cpp | 2 +
 src/gallium/drivers/r600/sb/sb_context.cpp   | 1 +
 src/gallium/drivers/r600/sb/sb_core.cpp      | 1 +
 src/gallium/drivers/r600/sb/sb_expr.cpp      | 444 ---
 src/gallium/drivers/r600/sb/sb_expr.h        | 4 +
 src/gallium/drivers/r600/sb/sb_shader.cpp    | 2 +-
 src/gallium/drivers/r600/sb/sb_shader.h      | 2 +
 11 files changed, 431 insertions(+), 47 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_isa.h b/src/gallium/drivers/r600/r600_isa.h
index 89d..c6bb869 100644
--- a/src/gallium/drivers/r600/r600_isa.h
+++ b/src/gallium/drivers/r600/r600_isa.h
@@ -84,7 +84,8 @@ enum alu_op_flags
 	 * includes MULADDs (considering the MUL part on src0 and src1 only) */
 	AF_M_COMM = (1 << 23),

-	/* associative operation ((a op b) op c) == (a op (b op c)) */
+	/* associative operation ((a op b) op c) == (a op (b op c)),
+	 * includes MULADDs (considering the MUL part on src0 and src1 only) */
 	AF_M_ASSOC = (1 << 24),

 	AF_PRED_PUSH = (1 << 25),
@@ -373,11 +374,11 @@ static const struct alu_op_info alu_op_table[] = {
 	{"SAD_ACCUM_HI_UINT", 3, { -1, 0x0F },{ 0, 0, AF_V, AF_V}, AF_UINT_DST },
 	{"MULADD_UINT24", 3, { -1, 0x10 },{ 0, 0, AF_V, AF_V}, AF_UINT_DST | AF_24 },
 	{"LDS_IDX_OP", 3, { -1, 0x11 },{ 0, 0, AF_V, AF_V}, 0 },
-	{"MULADD", 3, { 0x10, 0x14 },{ AF_VS, AF_VS, AF_VS, AF_VS}, AF_M_COMM },
-	{"MULADD_M2", 3, { 0x11, 0x15 },{ AF_VS, AF_VS, AF_VS, AF_VS}, AF_M_COMM },
-	{"MULADD_M4", 3, { 0x12, 0x16 },{ AF_VS, AF_VS, AF_VS, AF_VS}, AF_M_COMM },
-	{"MULADD_D2", 3, { 0x13, 0x17 },{ AF_VS, AF_VS, AF_VS, AF_VS}, AF_M_COMM },
-	{"MULADD_IEEE", 3, { 0x14, 0x18 },{ AF_VS, AF_VS, AF_VS, AF_VS}, AF_M_COMM | AF_IEEE },
+	{"MULADD", 3, { 0x10, 0x14 },{ AF_VS, AF_VS, AF_VS, AF_VS}, AF_M_COMM | AF_M_ASSOC },
+	{"MULADD_M2", 3, { 0x11, 0x15 },{ AF_VS, AF_VS, AF_VS, AF_VS}, AF_M_COMM | AF_M_ASSOC },
+	{"MULADD_M4", 3, { 0x12, 0x16 },{ AF_VS, AF_VS, AF_VS, AF_VS}, AF_M_COMM | AF_M_ASSOC },
+	{"MULADD_D2", 3, { 0x13, 0x17 },{ AF_VS, AF_VS, AF_VS, AF_VS}, AF_M_COMM | AF_M_ASSOC },
+	{"MULADD_IEEE", 3, { 0x14, 0x18 },{ AF_VS, AF_VS, AF_VS, AF_VS}, AF_M_COMM | AF_M_ASSOC | AF_IEEE },
 	{"CNDE", 3, { 0x18, 0x19 },{ AF_VS, AF_VS, AF_VS, AF_VS}, AF_CMOV | AF_CC_E },
 	{"CNDGT", 3, { 0x19, 0x1A },{ AF_VS, AF_VS, AF_VS, AF_VS}, AF_CMOV | AF_CC_GT },
 	{"CNDGE", 3, { 0x1A, 0x1B },{ AF_VS, AF_VS, AF_VS, AF_VS}, AF_CMOV | AF_CC_GE },
@@ -397,9 +398,9 @@ static const struct alu_op_info alu_op_table[] = {
 	{"MUL_LIT_M2", 3, { 0x0D, -1 },{ AF_VS, AF_VS, 0, 0}, 0 },
 	{"MUL_LIT_M4", 3, { 0x0E, -1 },{ AF_VS, AF_VS, 0, 0}, 0 },
 	{"MUL_LIT_D2", 3, { 0x0F, -1 },{ AF_VS, AF_VS, 0, 0}, 0 },
-	{"MULADD_IEEE_M2", 3, { 0x15, -1 },{ AF_VS, AF_VS, 0, 0}, AF_IEEE },
-	{"MULADD_IEEE_M4", 3, { 0x16, -1 },{ AF_VS, AF_VS, 0, 0}, AF_IEEE },
-	{"MULADD_IEEE_D2", 3, { 0x17, -1 },{ AF_VS, AF_VS, 0, 0}, AF_IEEE },
+	{"MULADD_IEEE_M2", 3, { 0x15, -1 },{ AF_VS, AF_VS, 0, 0}, AF_M_COMM | AF_M_ASSOC | AF_IEEE },
+	{"MULADD_IEEE_M4", 3
Re: [Mesa-dev] [PATCH 1/2] r600g: add ISA info for RAT instructions
On 05/30/2013 05:48 AM, Tom Stellard wrote:
> On Mon, May 27, 2013 at 02:15:21AM +0400, Vadim Girlin wrote:
>> This will help to improve dumps of the compute shaders, also it will be required for complete handling of RAT instructions in sb.
>>
>> Signed-off-by: Vadim Girlin vadimgir...@gmail.com
>> ---
>>  src/gallium/drivers/r600/r600_isa.c | 19 ++
>>  src/gallium/drivers/r600/r600_isa.h | 132
>>  2 files changed, 151 insertions(+)
>>
>> diff --git a/src/gallium/drivers/r600/r600_isa.c b/src/gallium/drivers/r600/r600_isa.c
>> index 4c6ccac..c99352f 100644
>> --- a/src/gallium/drivers/r600/r600_isa.c
>> +++ b/src/gallium/drivers/r600/r600_isa.c
>> @@ -81,6 +81,23 @@ int r600_isa_init(struct r600_context *ctx, struct r600_isa *isa) {
>>  		isa->cf_map[opc] = i + 1;
>>  	}
>>
>> +	/* RAT instructions are not available on pre-evergreen */
>> +	if (ctx->chip_class >= EVERGREEN) {
>> +		unsigned column = isa->hw_class - ISA_CC_EVERGREEN;
>> +
>> +		isa->rat_map = calloc(64, sizeof(unsigned));
>> +		if (!isa->rat_map)
>> +			return -1;
>> +
>> +		for (i = 0; i < TABLE_SIZE(rat_op_table); ++i) {
>> +			const struct rat_op_info *op = &rat_op_table[i];
>> +			unsigned opc = op->opcode[column];
>> +			if (opc == -1)
>> +				continue;
>> +			isa->rat_map[opc] = i + 1;
>> +		}
>> +	}
>> +
>>  	return 0;
>>  }
>>
>> @@ -97,6 +114,8 @@ int r600_isa_destroy(struct r600_isa *isa) {
>>  		free(isa->fetch_map);
>>  	if (isa->cf_map)
>>  		free(isa->cf_map);
>> +	if (isa->rat_map)
>> +		free(isa->rat_map);
>>
>>  	free(isa);
>>  	return 0;
>> diff --git a/src/gallium/drivers/r600/r600_isa.h b/src/gallium/drivers/r600/r600_isa.h
>> index 89d..4055a04 100644
>> --- a/src/gallium/drivers/r600/r600_isa.h
>> +++ b/src/gallium/drivers/r600/r600_isa.h
>> @@ -147,6 +147,12 @@ enum cf_op_flags
>>  	CF_LOOP_START = (1 << 14)
>>  };
>>
>> +enum rat_op_flags
>> +{
>> +	RF_RTN = (1 << 0),
>> +
>> +};
>> +
>>  /* ALU instruction info */
>>  struct alu_op_info
>>  {
>> @@ -182,6 +188,15 @@ struct cf_op_info
>>  	int flags;
>>  };
>>
>> +/* CF RAT instruction info */
>> +struct rat_op_info
>> +{
>> +	const char * name;
>> +	/* 0 - EG, 1 - CM */
>> +	int opcode[2];
>> +	int flags;
>> +};
>> +
>>  static const struct alu_op_info alu_op_table[] = {
>>  	{"ADD", 2, { 0x00, 0x00 },{ AF_VS, AF_VS, AF_VS, AF_VS}, AF_M_COMM | AF_M_ASSOC },
>>  	{"MUL", 2, { 0x01, 0x01 },{ AF_VS, AF_VS, AF_VS, AF_VS}, AF_M_COMM | AF_M_ASSOC },
>> @@ -665,6 +680,97 @@ static const struct cf_op_info cf_op_table[] = {
>>  	{"CF_NATIVE", { 0x00, 0x00, 0x00, 0x00 }, 0 }
>>  };
>>
>> +static const struct rat_op_info rat_op_table[] = {
>> +	{"NOP", {0x00, 0x00}, 0},
>> +	{"STORE_TYPED", {0x01, 0x01}, 0},
>> +	{"STORE_RAW", {0x02, -1}, 0},
>
> The LLVM backend uses the STORE_RAW instructions on Cayman even though the docs don't list it as a legal instruction. Will this cause the code to crash?

Yes, I think it will cause an assert (or a crash if asserts are disabled), though only if sbcl or sbdisasm are enabled in R600_DEBUG.

> I have it on my TODO list to use the STORE_* instructions on Cayman, but I'm not sure when I'll be able to get to it. Maybe for now you can enable this opcode on Cayman too.

Of course, if it works then possibly the doc is not completely correct. On the other hand, I wonder if the doc is right and this might explain some issues people report with compute on cayman. Anyway I'll enable this opcode on Cayman for now to avoid the problems.

> -Tom
>
>> +	{"STORE_RAW_FDENORM", {0x03, -1}, 0},
>> +	{"CMPXCHG_INT", {0x04, 0x00}, 0},

There is a typo in the cayman opcode, it should be 0x04 instead of 0x00, I'll fix it too.

Vadim

>> +	{"CMPXCHG_FLT", {0x05, -1}, 0},
>> +	{"CMPXCHG_FDENORM", {0x06, -1}, 0},
>> +	{"ADD", {0x07, 0x07}, 0},
>> +	{"SUB", {0x08, 0x08}, 0},
>> +	{"RSUB", {0x09, 0x09}, 0},
>> +	{"MIN_INT", {0x0A, 0x0A}, 0},
>> +	{"MIN_UINT", {0x0B, 0x0B}, 0},
>> +	{"MAX_INT", {0x0C, 0x0C}, 0},
>> +	{"MAX_UINT", {0x0D, 0x0D}, 0},
>> +	{"AND", {0x0E, 0x0E}, 0},
>> +	{"OR", {0x0F, 0x0F}, 0},
>> +	{"XOR", {0x10, 0x10}, 0},
>> +	{"MSKOR", {0x11, -1}, 0},
>> +	{"INC_UINT", {0x12, 0x12}, 0},
>> +	{"DEC_UINT", {0x13, 0x13}, 0},
>> +
>> +	{"STORE_DWORD", { -1, 0x14}, 0
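The reverse lookup built in r600_isa_init() above (hardware opcode to table index plus one, with -1 meaning "not available on this chip") can be sketched in isolation as follows. The `_example` names and the four-row table are an illustrative excerpt, not the full table from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct rat_op_info_example {
    const char *name;
    int opcode[2];   // column 0: evergreen, column 1: cayman; -1 = unsupported
    int flags;
};

// Small excerpt of the table from the patch, for illustration only.
static const rat_op_info_example rat_ops_example[] = {
    {"NOP",         {0x00, 0x00}, 0},
    {"STORE_TYPED", {0x01, 0x01}, 0},
    {"STORE_RAW",   {0x02, -1},   0},
    {"STORE_DWORD", {-1,   0x14}, 0},
};

// Builds a 64-entry map from hardware opcode to (table index + 1).
// A value of 0 means the opcode is unknown for the selected column.
std::vector<unsigned> build_rat_map(int column) {
    std::vector<unsigned> map(64, 0u);
    const std::size_t n = sizeof(rat_ops_example) / sizeof(rat_ops_example[0]);
    for (std::size_t i = 0; i < n; ++i) {
        int opc = rat_ops_example[i].opcode[column];
        if (opc == -1)
            continue;   // instruction does not exist on this chip class
        map[static_cast<std::size_t>(opc)] = static_cast<unsigned>(i + 1);
    }
    return map;
}
```

Because later rows overwrite earlier ones, two rows sharing an opcode in the same column would leave only the later one reachable, which is presumably why the 0x00 typo in the Cayman column of CMPXCHG_INT matters: it would collide with NOP.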
[Mesa-dev] [PATCH 1/2] r600g: add ISA info for RAT instructions
This will help to improve dumps of the compute shaders, also it will be required for complete handling of RAT instructions in sb. Signed-off-by: Vadim Girlin vadimgir...@gmail.com --- src/gallium/drivers/r600/r600_isa.c | 19 ++ src/gallium/drivers/r600/r600_isa.h | 132 2 files changed, 151 insertions(+) diff --git a/src/gallium/drivers/r600/r600_isa.c b/src/gallium/drivers/r600/r600_isa.c index 4c6ccac..c99352f 100644 --- a/src/gallium/drivers/r600/r600_isa.c +++ b/src/gallium/drivers/r600/r600_isa.c @@ -81,6 +81,23 @@ int r600_isa_init(struct r600_context *ctx, struct r600_isa *isa) { isa-cf_map[opc] = i + 1; } + /* RAT instructions are not available on pre-evergreen */ + if (ctx-chip_class = EVERGREEN) { + unsigned column = isa-hw_class - ISA_CC_EVERGREEN; + + isa-rat_map = calloc(64, sizeof(unsigned)); + if (!isa-rat_map) + return -1; + + for (i = 0; i TABLE_SIZE(rat_op_table); ++i) { + const struct rat_op_info *op = rat_op_table[i]; + unsigned opc = op-opcode[column]; + if (opc == -1) + continue; + isa-rat_map[opc] = i + 1; + } + } + return 0; } @@ -97,6 +114,8 @@ int r600_isa_destroy(struct r600_isa *isa) { free(isa-fetch_map); if (isa-cf_map) free(isa-cf_map); + if (isa-rat_map) + free(isa-rat_map); free(isa); return 0; diff --git a/src/gallium/drivers/r600/r600_isa.h b/src/gallium/drivers/r600/r600_isa.h index 89d..4055a04 100644 --- a/src/gallium/drivers/r600/r600_isa.h +++ b/src/gallium/drivers/r600/r600_isa.h @@ -147,6 +147,12 @@ enum cf_op_flags CF_LOOP_START = (114) }; +enum rat_op_flags +{ + RF_RTN = (1 0), + +}; + /* ALU instruction info */ struct alu_op_info { @@ -182,6 +188,15 @@ struct cf_op_info int flags; }; +/* CF RAT instruction info */ +struct rat_op_info +{ + const char * name; + /* 0 - EG, 1 - CM */ + int opcode[2]; + int flags; +}; + static const struct alu_op_info alu_op_table[] = { {ADD, 2, { 0x00, 0x00 },{ AF_VS, AF_VS, AF_VS, AF_VS}, AF_M_COMM | AF_M_ASSOC }, {MUL, 2, { 0x01, 0x01 },{ AF_VS, AF_VS, AF_VS, AF_VS}, AF_M_COMM | AF_M_ASSOC 
}, @@ -665,6 +680,97 @@ static const struct cf_op_info cf_op_table[] = { {CF_NATIVE, { 0x00, 0x00, 0x00, 0x00 }, 0 } }; +static const struct rat_op_info rat_op_table[] = { + {NOP, {0x00, 0x00}, 0}, + {STORE_TYPED, {0x01, 0x01}, 0}, + {STORE_RAW, {0x02, -1}, 0}, + {STORE_RAW_FDENORM, {0x03, -1}, 0}, + {CMPXCHG_INT, {0x04, 0x00}, 0}, + {CMPXCHG_FLT, {0x05, -1}, 0}, + {CMPXCHG_FDENORM, {0x06, -1}, 0}, + {ADD, {0x07, 0x07}, 0}, + {SUB, {0x08, 0x08}, 0}, + {RSUB,{0x09, 0x09}, 0}, + {MIN_INT, {0x0A, 0x0A}, 0}, + {MIN_UINT,{0x0B, 0x0B}, 0}, + {MAX_INT, {0x0C, 0x0C}, 0}, + {MAX_UINT,{0x0D, 0x0D}, 0}, + {AND, {0x0E, 0x0E}, 0}, + {OR, {0x0F, 0x0F}, 0}, + {XOR, {0x10, 0x10}, 0}, + {MSKOR, {0x11, -1}, 0}, + {INC_UINT,{0x12, 0x12}, 0}, + {DEC_UINT,{0x13, 0x13}, 0}, + + {STORE_DWORD, { -1, 0x14}, 0}, + {STORE_SHORT, { -1, 0x15}, 0}, + {STORE_BYTE, { -1, 0x16}, 0}, + + {NOP_RTN_INTERNAL,{0x20, 0x20}, 0}, + + {XCHG_RTN,{0x22, 0x22}, RF_RTN }, + {XCHG_FDENORM_RTN,{0x23, -1}, RF_RTN }, + {CMPXCHG_INT_RTN, {0x24, 0x24}, RF_RTN }, + {CMPXCHG_FLT_RTN, {0x25, 0x25}, RF_RTN }, + {CMPXCHG_FDENORM_RTN, {0x26, 0x26}, RF_RTN }, + {ADD_RTN, {0x27, 0x27}, RF_RTN }, + {SUB_RTN, {0x28, 0x28}, RF_RTN }, + {RSUB_RTN,{0x29, 0x29}, RF_RTN }, + {MIN_INT_RTN, {0x2A, 0x2A}, RF_RTN }, + {MIN_UINT_RTN,{0x2B, 0x2B}, RF_RTN }, + {MAX_INT_RTN, {0x2C, 0x2C}, RF_RTN }, + {MAX_UINT_RTN,{0x2D, 0x2D}, RF_RTN
[Mesa-dev] [PATCH 2/2] r600g/sb: use ISA info for RAT instructions
Signed-off-by: Vadim Girlin <vadimgir...@gmail.com>
---
 src/gallium/drivers/r600/sb/sb_bc.h           | 12 +++++++++++-
 src/gallium/drivers/r600/sb/sb_bc_builder.cpp |  2 +-
 src/gallium/drivers/r600/sb/sb_bc_decoder.cpp |  5 ++++-
 src/gallium/drivers/r600/sb/sb_bc_dump.cpp    | 13 +++++++++++--
 4 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_bc.h b/src/gallium/drivers/r600/sb/sb_bc.h
index 25255a7..9f546be 100644
--- a/src/gallium/drivers/r600/sb/sb_bc.h
+++ b/src/gallium/drivers/r600/sb/sb_bc.h
@@ -470,10 +470,16 @@ struct bc_cf {
 	unsigned comp_mask:4;
 
 	unsigned rat_id:4;
-	unsigned rat_inst:6;
 	unsigned rat_index_mode:2;
 
+	const rat_op_info *rat_op_ptr;
+	unsigned rat_op;
+
 	void set_op(unsigned op) { this->op = op; op_ptr = r600_isa_cf(op); }
+	void set_rat_op(unsigned op) {
+		this->rat_op = op;
+		rat_op_ptr = r600_isa_rat(op);
+	}
 
 	bool is_alu_extended() {
 		assert(op_ptr->flags & CF_ALU);
@@ -652,6 +658,10 @@ public:
 		return r600_isa_cf_opcode(isa->hw_class, op);
 	}
 
+	unsigned rat_opcode(unsigned op) {
+		return r600_isa_rat_opcode(isa->hw_class, op);
+	}
+
 	unsigned alu_opcode(unsigned op) {
 		return r600_isa_alu_opcode(isa->hw_class, op);
 	}
diff --git a/src/gallium/drivers/r600/sb/sb_bc_builder.cpp b/src/gallium/drivers/r600/sb/sb_bc_builder.cpp
index 55e2a85..4322f45 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_builder.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_builder.cpp
@@ -267,7 +267,7 @@ int bc_builder::build_cf_exp(cf_node* n) {
 					.INDEX_GPR(bc.index_gpr)
 					.RAT_ID(bc.rat_id)
 					.RAT_INDEX_MODE(bc.rat_index_mode)
-					.RAT_INST(bc.rat_inst)
+					.RAT_INST(ctx.rat_opcode(bc.rat_op))
 					.RW_GPR(bc.rw_gpr)
 					.RW_REL(bc.rw_rel)
 					.TYPE(bc.type);
diff --git a/src/gallium/drivers/r600/sb/sb_bc_decoder.cpp b/src/gallium/drivers/r600/sb/sb_bc_decoder.cpp
index 5e233f9..0f3c57a 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_decoder.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_decoder.cpp
@@ -242,13 +242,16 @@ int bc_decoder::decode_cf_mem(unsigned i, bc_cf& bc) {
 	} else {
 		assert(ctx.is_egcm());
 		CF_ALLOC_EXPORT_WORD0_RAT_EGCM w0(dw0);
+		unsigned rat_opcode = w0.get_RAT_INST();
+
+		bc.set_rat_op(r600_isa_rat_by_opcode(ctx.isa, rat_opcode));
+
 		bc.elem_size = w0.get_ELEM_SIZE();
 		bc.index_gpr = w0.get_INDEX_GPR();
 		bc.rw_gpr = w0.get_RW_GPR();
 		bc.rw_rel = w0.get_RW_REL();
 		bc.type = w0.get_TYPE();
 		bc.rat_id = w0.get_RAT_ID();
-		bc.rat_inst = w0.get_RAT_INST();
 		bc.rat_index_mode = w0.get_RAT_INDEX_MODE();
 	}
 
diff --git a/src/gallium/drivers/r600/sb/sb_bc_dump.cpp b/src/gallium/drivers/r600/sb/sb_bc_dump.cpp
index 9d76465..152a33f 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_dump.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_dump.cpp
@@ -140,15 +140,24 @@ void bc_dump::dump(cf_node& n) {
 	} else if (n.bc.op_ptr->flags & (CF_STRM | CF_RAT)) {
 		static const char *exp_type[] = {"WRITE", "WRITE_IND", "WRITE_ACK",
 				"WRITE_IND_ACK"};
+
+		bool rat = (n.bc.op_ptr->flags & CF_RAT) != 0;
+
 		fill_to(s, 18);
 		s << exp_type[n.bc.type] << " ";
+
+		if (rat) {
+			s << n.bc.rat_op_ptr->name << " ";
+		}
+
 		s.print_wl(n.bc.array_base, 5);
 		s << " R" << n.bc.rw_gpr << ".";
 		for (int k = 0; k < 4; ++k)
 			s << ((n.bc.comp_mask & (1 << k)) ? chans[k] : '_');
 
-		if ((n.bc.op_ptr->flags & CF_RAT) && (n.bc.type & 1)) {
-			s << ", @R" << n.bc.index_gpr << ".xyz";
+		if (rat) {
+			if (n.bc.type & 1)
+				s << ", @R" << n.bc.index_gpr << ".xyz";
 		}
 
 		s << "  ES:" << n.bc.elem_size;
-- 
1.8.2.1

___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] SIGFPE in libdrm_radeon on evergreen
On 05/20/2013 11:27 AM, Dragomir Ivanov wrote:
>> 0x769058d7 in eg_surface_init_2d (surf_man=0x633ea0, surf=0x88d848,
>> level=0x88dea8, bpe=1, tile_split=0, offset=65536, start_level=0)
>
> It looks like division by 0. tile_split=0 from the call site.

Yes, I'm just not sure why tile_split is 0 here and what is the best way to fix it. Possibly this is in fact a consequence of some problem in r600g, not in libdrm. Though probably libdrm should handle it more gracefully anyway.

Vadim

> On Mon, May 20, 2013 at 4:11 AM, Vadim Girlin <vadimgir...@gmail.com> wrote:
>
>> Reduced test app attached and below is gdb backtrace. I suspect something
>> is not initialized properly but I'm not very familiar with this code.
>>
>> Vadim
>>
>> Program received signal SIGFPE, Arithmetic exception.
>> 0x769058d7 in eg_surface_init_2d (surf_man=0x633ea0, surf=0x88d848, level=0x88dea8, bpe=1, tile_split=0, offset=65536, start_level=0) at radeon_surface.c:651
>> 651         slice_pt = tileb / tile_split;
>> #0  0x769058d7 in eg_surface_init_2d (surf_man=0x633ea0, surf=0x88d848, level=0x88dea8, bpe=1, tile_split=0, offset=65536, start_level=0) at radeon_surface.c:651
>> #1  0x76905eea in eg_surface_init_2d_miptrees (surf_man=0x633ea0, surf=0x88d848) at radeon_surface.c:807
>> #2  0x76906062 in eg_surface_init (surf_man=0x633ea0, surf=0x88d848) at radeon_surface.c:863
>> #3  0x76907fe6 in radeon_surface_init (surf_man=0x633ea0, surf=0x88d848) at radeon_surface.c:1901
>> #4  0x7713260b in radeon_drm_winsys_surface_init (rws=0x6339a0, surf=0x88d848) at radeon_drm_winsys.c:477
>> #5  0x770a3e1c in r600_setup_surface (screen=0x6340d0, rtex=0x88d760, pitch_in_bytes_override=0) at r600_texture.c:203
>> #6  0x770a4774 in r600_texture_create_object (screen=0x6340d0, base=0x7fffd6d0, pitch_in_bytes_override=0, buf=0x0, surface=0x7fffc8e0) at r600_texture.c:432
>> #7  0x770a5268 in r600_texture_create (screen=0x6340d0, templ=0x7fffd6d0) at r600_texture.c:607
>> #8  0x7708a5bd in r600_resource_create (screen=0x6340d0, templ=0x7fffd6d0) at r600_resource.c:38
>> #9  0x77125579 in dri2_drawable_process_buffers (drawable=0x88af80, buffers=0x88aea0, buffer_count=1, atts=0x88b628, att_count=2) at dri2.c:283
>> #10 0x7712590a in dri2_allocate_textures (drawable=0x88af80, statts=0x88b628, statts_count=2) at dri2.c:404
>> #11 0x77123e6a in dri_st_framebuffer_validate (stfbi=0x88af80, statts=0x88b628, count=2, out=0x7fffd840) at dri_drawable.c:81
>> #12 0x76e461c1 in st_framebuffer_validate (stfb=0x88b1e0, st=0x883870) at ../../src/mesa/state_tracker/st_manager.c:193
>> #13 0x76e472a8 in st_api_make_current (stapi=0x7761b9e0 <st_gl_api>, stctxi=0x883870, stdrawi=0x88af80, streadi=0x88af80) at ../../src/mesa/state_tracker/st_manager.c:721
>> #14 0x77122ce8 in dri_make_current (cPriv=0x7fdb70, driDrawPriv=0x88af40, driReadPriv=0x88af40) at dri_context.c:255
>> #15 0x76c6ba1f in driBindContext (pcp=0x7fdb70, pdp=0x88af40, prp=0x88af40) at ../../../../src/mesa/drivers/dri/common/dri_util.c:382
>> #16 0x77dc57e3 in dri2_bind_context (context=0x7fd9d0, old=0x616650, draw=67108873, read=67108873) at dri2_glx.c:172
>> #17 0x77d8c253 in MakeContextCurrent (dpy=0x602040, draw=67108873, read=67108873, gc_user=0x7fd9d0) at glxcurrent.c:269
>> #18 0x00384e82713c in fgOpenWindow () from /lib64/libglut.so.3
>> #19 0x00384e825afa in fgCreateWindow () from /lib64/libglut.so.3
>> #20 0x00384e825b95 in fgCreateMenu () from /lib64/libglut.so.3
>> #21 0x00384e823cd3 in glutCreateMenu () from /lib64/libglut.so.3
>> #22 0x00400816 in main (argc=1, argv=0x7fffdf18) at test.c:17
[Mesa-dev] [PATCH 1/2] r600g/sb: separate bytecode decoding and parsing
Parsing and ir construction is required for optimization only, it's unnecessary if we only need to print shader dump. This should make new disassembler more tolerant to any new features in the bytecode. Signed-off-by: Vadim Girlin vadimgir...@gmail.com --- src/gallium/drivers/r600/sb/sb_bc.h | 27 ++-- src/gallium/drivers/r600/sb/sb_bc_builder.cpp | 4 - src/gallium/drivers/r600/sb/sb_bc_parser.cpp | 224 +- src/gallium/drivers/r600/sb/sb_core.cpp | 45 -- src/gallium/drivers/r600/sb/sb_shader.cpp | 4 +- src/gallium/drivers/r600/sb/sb_shader.h | 3 +- 6 files changed, 163 insertions(+), 144 deletions(-) diff --git a/src/gallium/drivers/r600/sb/sb_bc.h b/src/gallium/drivers/r600/sb/sb_bc.h index 9c6ed46..9f65098 100644 --- a/src/gallium/drivers/r600/sb/sb_bc.h +++ b/src/gallium/drivers/r600/sb/sb_bc.h @@ -674,40 +674,39 @@ class bc_parser { typedef std::stackregion_node* region_stack; region_stack loop_stack; - int enable_dump; - int optimize; - public: - bc_parser(sb_context sctx, r600_bytecode *bc, r600_shader* pshader, - int dump_source, int optimize) : + bc_parser(sb_context sctx, r600_bytecode *bc, r600_shader* pshader) : ctx(sctx), dec(), bc(bc), pshader(pshader), dw(), bc_ndw(), max_cf(), sh(), error(), slots(), cgroup(), - cf_map(), loop_stack(), enable_dump(dump_source), - optimize(optimize) { } + cf_map(), loop_stack() { } - int parse(); + int decode(); + int prepare(); shader* get_shader() { assert(!error); return sh; } private: - int parse_shader(); + int decode_shader(); int parse_decls(); - int parse_cf(unsigned i, bool eop); + int decode_cf(unsigned i, bool eop); - int parse_alu_clause(cf_node *cf); - int parse_alu_group(cf_node* cf, unsigned i, unsigned gcnt); + int decode_alu_clause(cf_node *cf); + int decode_alu_group(cf_node* cf, unsigned i, unsigned gcnt); - int parse_fetch_clause(cf_node *cf); + int decode_fetch_clause(cf_node *cf); int prepare_ir(); + int prepare_alu_clause(cf_node *cf); + int prepare_alu_group(cf_node* cf, alu_group_node *g); + int 
prepare_fetch_clause(cf_node *cf); + int prepare_loop(cf_node *c); int prepare_if(cf_node *c); - int prepare_alu_clause(cf_node *c); }; diff --git a/src/gallium/drivers/r600/sb/sb_bc_builder.cpp b/src/gallium/drivers/r600/sb/sb_bc_builder.cpp index b0c2e41..f40e469 100644 --- a/src/gallium/drivers/r600/sb/sb_bc_builder.cpp +++ b/src/gallium/drivers/r600/sb/sb_bc_builder.cpp @@ -94,10 +94,6 @@ int bc_builder::build() { cf_pos = bb.get_pos(); } - if (sh.enable_dump) { - bc_dump(sh, cerr, bb).run(); - } - return 0; } diff --git a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp index 8329287..9f3ecc5 100644 --- a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp +++ b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp @@ -47,7 +47,7 @@ namespace r600_sb { using std::cerr; -int bc_parser::parse() { +int bc_parser::decode() { dw = bc-bytecode; bc_ndw = bc-ndw; @@ -71,47 +71,27 @@ int bc_parser::parse() { t = TARGET_FETCH; } - sh = new shader(ctx, t, bc-debug_id, enable_dump); - int r = parse_shader(); + sh = new shader(ctx, t, bc-debug_id); + int r = decode_shader(); delete dec; - if (r) - return r; - sh-ngpr = bc-ngpr; sh-nstack = bc-nstack; - if (sh-target != TARGET_FETCH) { - sh-src_stats.ndw = bc-ndw; - sh-collect_stats(false); - } - - if (enable_dump) { - bc_dump(*sh, cerr, bc-bytecode, bc_ndw).run(); - } - - if (!optimize) - return 0; - - prepare_ir(); - return r; } -int bc_parser::parse_shader() { +int bc_parser::decode_shader() { int r = 0; unsigned i = 0; bool eop = false; sh-init(); - if (pshader) - parse_decls(); - do { eop = false; - if ((r = parse_cf(i, eop))) + if ((r = decode_cf(i, eop))) return r; } while (!eop || (i 1) = max_cf); @@ -119,34 +99,34 @@ int bc_parser::parse_shader() { return 0; } -int bc_parser::parse_decls() { - -// sh-prepare_regs(rs.bc.ngpr); - - if (pshader-indirect_files ~(1 TGSI_FILE_CONSTANT)) { +int bc_parser::prepare() { + int r = 0; + if ((r = parse_decls())) + return r; + if ((r = prepare_ir())) + 
return r; + return 0; +} -#if SB_NO_ARRAY_INFO +int bc_parser::parse_decls
Re: [Mesa-dev] r600g missing Bump mapping
On 05/09/2013 02:42 AM, Dragomir Ivanov wrote:
> Hi there,
> I just fired Doom3 on 64-bit Arch Linux (no 32 libs involved), to test
> r600g progress. Game runs fine, but I can't see bump mapping effects as on
> Catalyst under windows. They are enabled in the options. Does Mesa/r600g
> support bumps? AMD E-350 here. Evergreen class GPU.

Here are two screenshots made with git mesa on evergreen with Ultra settings, the only difference is the toggled bump mapping option:

http://i.imgur.com/Cl0hamf.jpg
http://i.imgur.com/4IsjrR3.jpg

To me it looks like bump mapping works. Could you provide more detailed info (with screenshots etc) to demonstrate your issue?

Also you might want to try resetting game options to default to make sure that you don't have any nonstandard tweaks.

Vadim

> OpenGL renderer string: Gallium 0.4 on AMD PALM
> OpenGL core profile version string: 3.1 (Core Profile) Mesa 9.1.2
> OpenGL core profile shading language version string: 1.40
>
> Linux localhost 3.8.11-1-ARCH #1 SMP PREEMPT Wed May 1 20:18:57 CEST 2013 x86_64 GNU/Linux
Re: [Mesa-dev] r600g missing Bump mapping
On 05/09/2013 05:42 AM, Dragomir Ivanov wrote:
> Hmm, Vadim... it works indeed. I can't push up to Ultra, but I see the
> difference as on your screenshots. Interestingly when I play it on win with
> catalyst, everything is WOW, on r600g is Meh... Unfortunately I erased
> windows, so I can't supply screenshot, but subjectively it was way more
> beautiful on the same graphics level.

IIRC Doom3 tries to autodetect some settings and I guess there are differences in the game configuration with different drivers. Possibly some settings are misdetected with r600g, in this case I guess running it with the configuration file created for catalyst might help.

Also you might want to check the game's console output for any hints, e.g. like this:

  guessing video ram ( use +set sys_videoRam to force ) ..
  guess failed, return default low-end VRAM setting ( 64MB VRAM )

Though I don't see this message with the 64-bit port, looks like the detection logic was changed there.

Vadim

> On Thu, May 9, 2013 at 4:15 AM, Vadim Girlin <vadimgir...@gmail.com> wrote:
>
>> On 05/09/2013 02:42 AM, Dragomir Ivanov wrote:
>>> Hi there,
>>> I just fired Doom3 on 64-bit Arch Linux (no 32 libs involved), to test
>>> r600g progress. Game runs fine, but I can't see bump mapping effects as on
>>> Catalyst under windows. They are enabled in the options. Does Mesa/r600g
>>> support bumps? AMD E-350 here. Evergreen class GPU.
>>
>> Here are two screenshots made with git mesa on evergreen with Ultra
>> settings, the only difference is toggled bump mapping option:
>>
>> http://i.imgur.com/Cl0hamf.jpg
>> http://i.imgur.com/4IsjrR3.jpg
>>
>> To me it looks like bump mapping works. Could you provide more detailed
>> info (with screenshots etc) to demonstrate your issue?
>>
>> Also you might want to try resetting game options to default to make sure
>> that you don't have any nonstandard tweaks.
>>
>> Vadim
>>
>>> OpenGL renderer string: Gallium 0.4 on AMD PALM
>>> OpenGL core profile version string: 3.1 (Core Profile) Mesa 9.1.2
>>> OpenGL core profile shading language version string: 1.40
>>>
>>> Linux localhost 3.8.11-1-ARCH #1 SMP PREEMPT Wed May 1 20:18:57 CEST 2013 x86_64 GNU/Linux
[Mesa-dev] [PATCH 1/3] r600g/sb: fix kcache handling on r6xx
Use the same limit for kcache constants in alu group on r6xx as on other chips (two const pairs). Relaxing this will require additional checks to make sure that all 4 consts in the group come from 2 kcache sets (clause limit), probably without noticeable improvements of shader performance.

Signed-off-by: Vadim Girlin <vadimgir...@gmail.com>
---
 src/gallium/drivers/r600/sb/sb_sched.cpp | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/sb/sb_sched.cpp b/src/gallium/drivers/r600/sb/sb_sched.cpp
index b21b342..d0045ce 100644
--- a/src/gallium/drivers/r600/sb/sb_sched.cpp
+++ b/src/gallium/drivers/r600/sb/sb_sched.cpp
@@ -43,7 +43,11 @@ namespace r600_sb {
 using std::cerr;
 
 rp_kcache_tracker::rp_kcache_tracker(shader &sh) : rp(), uc(),
-		sel_count(sh.get_ctx().is_r600() ? 4 : 2) {}
+		// FIXME: for now we'll use two const pairs limit for r600, same as
+		// for other chips, otherwise additional check in alu_group_tracker is
+		// required to make sure that all 4 consts in the group fit into 2
+		// kcache sets
+		sel_count(2) {}
 
 bool rp_kcache_tracker::try_reserve(sel_chan r) {
 	unsigned sel = kc_sel(r);
-- 
1.8.2.1
[Mesa-dev] [PATCH 3/3] r600g/sb: optimize some cases for CNDxx instructions
We can replace CNDxx with MOV (and possibly eliminate after propagation) in following cases:

If src1 is equal to src2 in CNDxx instruction then the result doesn't depend on condition and we can replace the instruction with "MOV dst, src1".

If src0 is const then we can evaluate the condition at compile time and also replace it with MOV.

Signed-off-by: Vadim Girlin <vadimgir...@gmail.com>
---
 src/gallium/drivers/r600/sb/sb_expr.cpp | 84 +++--
 src/gallium/drivers/r600/sb/sb_expr.h   |  2 +
 2 files changed, 81 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_expr.cpp b/src/gallium/drivers/r600/sb/sb_expr.cpp
index e3c7858..8582c8e 100644
--- a/src/gallium/drivers/r600/sb/sb_expr.cpp
+++ b/src/gallium/drivers/r600/sb/sb_expr.cpp
@@ -432,13 +432,63 @@ bool expr_handler::fold_alu_op2(alu_node& n) {
 	return true;
 }
 
+bool expr_handler::evaluate_condition(unsigned alu_cnd_flags,
+                                      literal s1, literal s2) {
+
+	unsigned cmp_type = alu_cnd_flags & AF_CMP_TYPE_MASK;
+	unsigned cc = alu_cnd_flags & AF_CC_MASK;
+
+	switch (cmp_type) {
+	case AF_FLOAT_CMP: {
+		switch (cc) {
+		case AF_CC_E : return s1.f == s2.f;
+		case AF_CC_GT: return s1.f >  s2.f;
+		case AF_CC_GE: return s1.f >= s2.f;
+		case AF_CC_NE: return s1.f != s2.f;
+		case AF_CC_LT: return s1.f <  s2.f;
+		case AF_CC_LE: return s1.f <= s2.f;
+		default:
+			assert(!"invalid condition code");
+			return false;
+		}
+	}
+	case AF_INT_CMP: {
+		switch (cc) {
+		case AF_CC_E : return s1.i == s2.i;
+		case AF_CC_GT: return s1.i >  s2.i;
+		case AF_CC_GE: return s1.i >= s2.i;
+		case AF_CC_NE: return s1.i != s2.i;
+		case AF_CC_LT: return s1.i <  s2.i;
+		case AF_CC_LE: return s1.i <= s2.i;
+		default:
+			assert(!"invalid condition code");
+			return false;
+		}
+	}
+	case AF_UINT_CMP: {
+		switch (cc) {
+		case AF_CC_E : return s1.u == s2.u;
+		case AF_CC_GT: return s1.u >  s2.u;
+		case AF_CC_GE: return s1.u >= s2.u;
+		case AF_CC_NE: return s1.u != s2.u;
+		case AF_CC_LT: return s1.u <  s2.u;
+		case AF_CC_LE: return s1.u <= s2.u;
+		default:
+			assert(!"invalid condition code");
+			return false;
+		}
+	}
+	default:
+		assert(!"invalid cmp_type");
+		return false;
+	}
+}
+
 bool expr_handler::fold_alu_op3(alu_node& n) {
 
 	if (n.src.size() < 3)
 		return false;
 
-	// TODO handle CNDxx by some common path
-
 	value* v0 = n.src[0];
 	value* v1 = n.src[1];
 	value* v2 = n.src[2];
@@ -449,9 +499,6 @@ bool expr_handler::fold_alu_op3(alu_node& n) {
 	bool isc1 = v1->is_const();
 	bool isc2 = v2->is_const();
 
-	if (!isc0 && !isc1 && !isc2)
-		return false;
-
 	literal dv, cv0, cv1, cv2;
 
 	if (isc0) {
@@ -469,6 +516,33 @@ bool expr_handler::fold_alu_op3(alu_node& n) {
 		apply_alu_src_mod(n.bc, 2, cv2);
 	}
 
+	if (n.bc.op_ptr->flags & AF_CMOV) {
+		int src = 0;
+
+		if (v1->gvalue() == v2->gvalue() &&
+				n.bc.src[1].neg == n.bc.src[2].neg) {
+			// result doesn't depend on condition, convert to MOV
+			src = 1;
+		} else if (isc0) {
+			// src0 is const, condition can be evaluated, convert to MOV
+			bool cond = evaluate_condition(n.bc.op_ptr->flags & (AF_CC_MASK |
+					AF_CMP_TYPE_MASK), cv0, literal(0));
+			src = cond ? 1 : 2;
+		}
+
+		if (src) {
+			// if src is selected, convert to MOV
+			n.bc.src[0] = n.bc.src[src];
+			n.src[0] = n.src[src];
+			n.src.resize(1);
+			n.bc.set_op(ALU_OP1_MOV);
+			return fold_alu_op1(n);
+		}
+	}
+
+	if (!isc0 && !isc1 && !isc2)
+		return false;
+
 	if (isc0 && isc1 && isc2) {
 		switch (n.bc.op) {
 		case ALU_OP3_MULADD: dv = cv0.f * cv1.f + cv2.f; break;
diff --git a/src/gallium/drivers/r600/sb/sb_expr.h b/src/gallium/drivers/r600/sb/sb_expr.h
index 7f3bd15..c7f7dbf 100644
--- a/src/gallium/drivers/r600/sb/sb_expr.h
+++ b/src/gallium/drivers/r600/sb/sb_expr.h
@@ -76,6 +76,8 @@ public:
 	void apply_alu_dst_mod(const bc_alu &bc, literal &v);
 
 	void assign_source(value *dst, value *src
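The two cases from the commit message can be restated as a small decision function. What follows is a simplified sketch in C, not the sb code: it only models the float CNDE/CNDGT/CNDGE forms (src0 compared against 0.0), and the names `cnd_cc`, `cnd_holds` and `cnd_fold_source` are made up for this example.

```c
#include <assert.h>

/* Hypothetical condition codes; the real code uses the AF_CC_* flags. */
enum cnd_cc { CND_E, CND_GT, CND_GE };

/* Evaluate "src0 <cc> 0.0f" at compile time. */
static int cnd_holds(enum cnd_cc cc, float src0)
{
	switch (cc) {
	case CND_E:  return src0 == 0.0f;
	case CND_GT: return src0 >  0.0f;
	case CND_GE: return src0 >= 0.0f;
	default:
		assert(!"invalid condition code");
		return 0;
	}
}

/* Decide whether "CNDxx dst, src0, src1, src2" can become a MOV.
 * Returns 1 or 2 for "MOV dst, src1" / "MOV dst, src2", or 0 if the
 * instruction has to stay as it is. */
static int cnd_fold_source(enum cnd_cc cc, int src0_is_const, float src0,
			   int src1_same_as_src2)
{
	if (src1_same_as_src2)
		return 1;	/* result doesn't depend on the condition */
	if (src0_is_const)
		return cnd_holds(cc, src0) ? 1 : 2;
	return 0;
}
```

The order of the checks mirrors the patch: the "src1 equals src2" case is tested first because it applies even when src0 is not a constant.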
[Mesa-dev] [PATCH 2/3] r600g/sb: fix memory leaks
Signed-off-by: Vadim Girlin <vadimgir...@gmail.com>
---
 src/gallium/drivers/r600/sb/sb_bc_parser.cpp | 3 ++-
 src/gallium/drivers/r600/sb/sb_shader.cpp    | 5 +++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
index e1478d3..8329287 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
@@ -74,6 +74,8 @@ int bc_parser::parse() {
 	sh = new shader(ctx, t, bc->debug_id, enable_dump);
 	int r = parse_shader();
 
+	delete dec;
+
 	if (r)
 		return r;
 
@@ -94,7 +96,6 @@ int bc_parser::parse() {
 
 	prepare_ir();
 
-	delete dec;
 	return r;
 }
 
diff --git a/src/gallium/drivers/r600/sb/sb_shader.cpp b/src/gallium/drivers/r600/sb/sb_shader.cpp
index 9bda84f..5944ba6 100644
--- a/src/gallium/drivers/r600/sb/sb_shader.cpp
+++ b/src/gallium/drivers/r600/sb/sb_shader.cpp
@@ -355,6 +355,11 @@ shader::~shader() {
 	for (node_vec::iterator I = all_nodes.begin(), E = all_nodes.end();
 			I != E; ++I)
 		(*I)->~node();
+
+	for (gpr_array_vec::iterator I = gpr_arrays.begin(), E = gpr_arrays.end();
+			I != E; ++I) {
+		delete *I;
+	}
 }
 
 void shader::dump_ir() {
-- 
1.8.2.1
[Mesa-dev] [PATCH] r600g: use old shader disassembler by default
New disassembler is not completely isolated yet from further processing in r600g/sb that is not required for printing the dump, so it has higher probability to fail in case of any unexpected features in the bytecode. This patch adds sbdisasm flag for R600_DEBUG that allows to use new disassembler in r600g/sb for shader dumps when shader optimization is not enabled. If shader optimization is enabled, new disassembler is used by default. Signed-off-by: Vadim Girlin vadimgir...@gmail.com --- src/gallium/drivers/r600/r600_asm.c| 13 +++-- src/gallium/drivers/r600/r600_pipe.c | 1 + src/gallium/drivers/r600/r600_pipe.h | 1 + src/gallium/drivers/r600/r600_shader.c | 22 +- 4 files changed, 18 insertions(+), 19 deletions(-) diff --git a/src/gallium/drivers/r600/r600_asm.c b/src/gallium/drivers/r600/r600_asm.c index 81b84ec..df0376a 100644 --- a/src/gallium/drivers/r600/r600_asm.c +++ b/src/gallium/drivers/r600/r600_asm.c @@ -2281,6 +2281,7 @@ void *r600_create_vertex_fetch_shader(struct pipe_context *ctx, uint32_t *bytecode; int i, j, r, fs_size; struct r600_fetch_shader *shader; + unsigned sb_disasm = rctx-screen-debug_flags (DBG_SB_DISASM | DBG_SB); assert(count 32); @@ -2387,13 +2388,13 @@ void *r600_create_vertex_fetch_shader(struct pipe_context *ctx, fprintf(stderr, \n); } -#if 0 - r600_bytecode_disasm(bc); + if (!sb_disasm) { + r600_bytecode_disasm(bc); - fprintf(stderr, __\n); -#else - r600_sb_bytecode_process(rctx, bc, NULL, 1 /*dump*/, 0 /*optimize*/); -#endif + fprintf(stderr, __\n); + } else { + r600_sb_bytecode_process(rctx, bc, NULL, 1 /*dump*/, 0 /*optimize*/); + } } fs_size = bc.ndw*4; diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c index 4991fb2..daadaeb 100644 --- a/src/gallium/drivers/r600/r600_pipe.c +++ b/src/gallium/drivers/r600/r600_pipe.c @@ -73,6 +73,7 @@ static const struct debug_named_value debug_options[] = { { sbstat, DBG_SB_STAT, Print optimization statistics for shaders }, { sbdump, DBG_SB_DUMP, Print IR 
dumps after some optimization passes }, { sbnofallback, DBG_SB_NO_FALLBACK, Abort on errors instead of fallback }, + { sbdisasm, DBG_SB_DISASM, Use sb disassembler for shader dumps }, DEBUG_NAMED_VALUE_END /* must be last */ }; diff --git a/src/gallium/drivers/r600/r600_pipe.h b/src/gallium/drivers/r600/r600_pipe.h index 61e2022..bb4e429 100644 --- a/src/gallium/drivers/r600/r600_pipe.h +++ b/src/gallium/drivers/r600/r600_pipe.h @@ -264,6 +264,7 @@ typedef boolean (*r600g_dma_blit_t)(struct pipe_context *ctx, #define DBG_SB_STAT(1 24) #define DBG_SB_DUMP(1 25) #define DBG_SB_NO_FALLBACK (1 26) +#define DBG_SB_DISASM (1 27) struct r600_tiling_info { unsigned num_channels; diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c index 49218e5..9afd57f 100644 --- a/src/gallium/drivers/r600/r600_shader.c +++ b/src/gallium/drivers/r600/r600_shader.c @@ -141,6 +141,7 @@ int r600_pipe_shader_create(struct pipe_context *ctx, uint32_t *ptr; bool dump = r600_can_dump_shader(rctx-screen, tgsi_get_processor_type(sel-tokens)); unsigned use_sb = rctx-screen-debug_flags DBG_SB; + unsigned sb_disasm = use_sb || (rctx-screen-debug_flags DBG_SB_DISASM); shader-shader.bc.isa = rctx-isa; @@ -163,21 +164,18 @@ int r600_pipe_shader_create(struct pipe_context *ctx, return r; } -#if 0 - if (dump) { + if (dump !sb_disasm) { fprintf(stderr, --\n); r600_bytecode_disasm(shader-shader.bc); fprintf(stderr, __\n); - } -#else - if (dump || use_sb) { - r = r600_sb_bytecode_process(rctx, shader-shader.bc, shader-shader, dump, use_sb); + } else if ((dump sb_disasm) || use_sb) { + r = r600_sb_bytecode_process(rctx, shader-shader.bc, shader-shader, +dump, use_sb); if (r) { R600_ERR(r600_sb_bytecode_process failed !\n); return r; } } -#endif /* Store the shader in a buffer. */ if (shader-bo == NULL) { @@ -307,6 +305,8 @@ int r600_compute_shader_create(struct pipe_context * ctx, boolean use_kill = false; bool dump = (r600_ctx-screen-debug_flags DBG_CS) != 0
Re: [Mesa-dev] [PATCH] r600g: Correctly initialize the shader key
On 05/03/2013 03:10 PM, Lauri Kasanen wrote:
> Assigning a struct only copies the members - any padding is left as is.
> Thus this code:
>
> struct foo;
> foo = bar;
>
> leaves the padding of foo intact, ie uninitialized random garbage.
>
> This patch fixes constant shader recompiles by initializing the struct to
> zero.
>
> Signed-off-by: Lauri Kasanen <c...@gmx.com>
> ---
>  src/gallium/drivers/r600/r600_state_common.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/r600_state_common.c b/src/gallium/drivers/r600/r600_state_common.c
> index 87a2e2e..bf7cc39 100644
> --- a/src/gallium/drivers/r600/r600_state_common.c
> +++ b/src/gallium/drivers/r600/r600_state_common.c
> @@ -710,7 +710,7 @@ static int r600_shader_select(struct pipe_context *ctx,
>          struct r600_pipe_shader_selector* sel,
>          bool *dirty)
>  {
> -	struct r600_shader_key key;
> +	struct r600_shader_key key = {0};

I suspect the effect of this initialization on padding is undefined. Probably it's safer to use memset.

Vadim

>  	struct r600_context *rctx = (struct r600_context *)ctx;
>  	struct r600_pipe_shader * shader = NULL;
>  	int r;
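To make the memset suggestion concrete, here is a minimal sketch in C. `struct example_key`, `key_init` and `keys_equal` are hypothetical stand-ins and not the r600g structures or functions; the assumption is that keys are compared bytewise (memcmp), which is the situation where padding content matters. Clearing the whole object first gives every byte, padding included, a known value before the members are set; in practice compilers leave the padding alone on the member stores that follow.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical key with likely padding after the first member. */
struct example_key {
	unsigned char color_two_side;
	unsigned int nr_cbufs;
};

/* Fill the key in place: clear all bytes first, then set the members. */
static void key_init(struct example_key *key, unsigned char two_side,
		     unsigned int nr_cbufs)
{
	assert(key != NULL);

	memset(key, 0, sizeof(*key));
	key->color_two_side = two_side;
	key->nr_cbufs = nr_cbufs;
}

/* Bytewise comparison, as a cache lookup by key might do. */
static int keys_equal(const struct example_key *a, const struct example_key *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}
```

The key is initialized through a pointer on purpose: returning or assigning the struct by value would bring back the question raised in the patch description about what happens to the padding on a copy.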
Re: [Mesa-dev] [PATCH 3/3] r600g: Don't set the dest cache bits on surface sync for R600_CONTEXT_FLUSH_AND_INV
This patch results in lockups with Heaven on juniper for me.

Vadim

On 04/26/2013 09:21 PM, Tom Stellard wrote:
> From: Tom Stellard <thomas.stell...@amd.com>
>
> We are already emitting a EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT packet
> when this flush flag is set, so flushing the dest caches with a
> SURFACE_SYNC should not be necessary.
>
> The motivation for this change is that emitting a SURFACE_SYNC packet
> with the CB bits set was causing compute shaders to hang on Cayman.
> ---
>  src/gallium/drivers/r600/r600_hw_context.c | 28 +++++++++++++---------------
>  1 file changed, 13 insertions(+), 15 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/r600_hw_context.c b/src/gallium/drivers/r600/r600_hw_context.c
> index b4fb3bf..8aebd25 100644
> --- a/src/gallium/drivers/r600/r600_hw_context.c
> +++ b/src/gallium/drivers/r600/r600_hw_context.c
> @@ -231,21 +231,19 @@ void r600_flush_emit(struct r600_context *rctx)
>  		cs->buf[cs->cdw++] = PKT3(PKT3_EVENT_WRITE, 0, 0);
>  		cs->buf[cs->cdw++] = EVENT_TYPE(EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT) | EVENT_INDEX(0);
>  		if (rctx->chip_class >= EVERGREEN) {
> -			cp_coher_cntl = S_0085F0_CB0_DEST_BASE_ENA(1) |
> -				S_0085F0_CB1_DEST_BASE_ENA(1) |
> -				S_0085F0_CB2_DEST_BASE_ENA(1) |
> -				S_0085F0_CB3_DEST_BASE_ENA(1) |
> -				S_0085F0_CB4_DEST_BASE_ENA(1) |
> -				S_0085F0_CB5_DEST_BASE_ENA(1) |
> -				S_0085F0_CB6_DEST_BASE_ENA(1) |
> -				S_0085F0_CB7_DEST_BASE_ENA(1) |
> -				S_0085F0_CB8_DEST_BASE_ENA(1) |
> -				S_0085F0_CB9_DEST_BASE_ENA(1) |
> -				S_0085F0_CB10_DEST_BASE_ENA(1) |
> -				S_0085F0_CB11_DEST_BASE_ENA(1) |
> -				S_0085F0_DB_DEST_BASE_ENA(1) |
> -				S_0085F0_TC_ACTION_ENA(1) |
> -				S_0085F0_CB_ACTION_ENA(1) |
> +			/* We were previously setting the CB and DB bits on
> +			 * cp_coher_cntl, but this is unnecessary since
> +			 * we are emitting the
> +			 * EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT packet.
> +			 * Setting the CB bits was causing lockups when using
> +			 * compute on cayman.
> +			 *
> +			 * XXX: Do even need to emit a surface sync packet here?
> +			 * Prior to e5e4c07e7964a3258ed02b530bcdc24c0650204b
> +			 * surface sync was not being emitted with the
> +			 * R600_CONTEXT_FLUSH_AND_INV flag.
> +			 */
> +			cp_coher_cntl = S_0085F0_TC_ACTION_ENA(1) |
>  				S_0085F0_DB_ACTION_ENA(1) |
>  				S_0085F0_SH_ACTION_ENA(1) |
>  				S_0085F0_SMX_ACTION_ENA(1) |
Re: [Mesa-dev] [PATCH 3/3] r600g: Don't set the dest cache bits on surface sync for R600_CONTEXT_FLUSH_AND_INV
On 05/03/2013 05:36 PM, Alex Deucher wrote: On Fri, May 3, 2013 at 9:30 AM, Vadim Girlin vadimgir...@gmail.com wrote: This patch results in lockups with Heaven on juniper for me. Does dropping the surface_sync packet completely help? We shouldn't need a surface_sync packet after a CACHE_FLUSH_AND_INV_EVENT packet and prior to e5e4c07e7964a3258ed02b530bcdc24c0650204b we never emitted it. Yes, this patch fixed it. Vadim Alex diff --git a/src/gallium/drivers/r600/r600_hw_context.c b/src/gallium/drivers/r600/r600_hw_context.c index 6d8b2cf..944b666 100644 --- a/src/gallium/drivers/r600/r600_hw_context.c +++ b/src/gallium/drivers/r600/r600_hw_context.c @@ -226,32 +226,6 @@ void r600_flush_emit(struct r600_context *rctx) if (rctx-flags R600_CONTEXT_FLUSH_AND_INV) { cs-buf[cs-cdw++] = PKT3(PKT3_EVENT_WRITE, 0, 0); cs-buf[cs-cdw++] = EVENT_TYPE(EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT) | EVENT_INDEX(0); - if (rctx-chip_class = EVERGREEN) { - /* We were previously setting the CB and DB bits on -* cp_coher_cntl, but this is unnecessary since -* we are emitting the -* EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT packet. -* Setting the CB bits was causing lockups when using -* compute on cayman. -* -* XXX: Do even need to emit a surface sync packet here? -* Prior to e5e4c07e7964a3258ed02b530bcdc24c0650204b -* surface sync was not being emitted with the -* R600_CONTEXT_FLUSH_AND_INV flag. 
-*/ - cp_coher_cntl = S_0085F0_TC_ACTION_ENA(1) | - S_0085F0_DB_ACTION_ENA(1) | - S_0085F0_SH_ACTION_ENA(1) | - S_0085F0_SMX_ACTION_ENA(1) | - S_0085F0_FULL_CACHE_ENA(1); - } else { - cp_coher_cntl = S_0085F0_SMX_ACTION_ENA(1) | - S_0085F0_SH_ACTION_ENA(1) | - S_0085F0_VC_ACTION_ENA(1) | - S_0085F0_TC_ACTION_ENA(1) | - S_0085F0_FULL_CACHE_ENA(1); - } - emit_flush = 1; } if (rctx-flags R600_CONTEXT_INVAL_READ_CACHES) { Vadim On 04/26/2013 09:21 PM, Tom Stellard wrote: From: Tom Stellard thomas.stell...@amd.com We are already emitting a EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT packet when this flush flag is set, so flushing the dest caches with a SURFACE_SYNC should not be necessary. The motivation for this change is that emitting a SURFACE_SYNC packet with the CB bits set was causing compute shaders to hang on Cayman. --- src/gallium/drivers/r600/r600_hw_context.c | 28 +--- 1 file changed, 13 insertions(+), 15 deletions(-) diff --git a/src/gallium/drivers/r600/r600_hw_context.c b/src/gallium/drivers/r600/r600_hw_context.c index b4fb3bf..8aebd25 100644 --- a/src/gallium/drivers/r600/r600_hw_context.c +++ b/src/gallium/drivers/r600/r600_hw_context.c @@ -231,21 +231,19 @@ void r600_flush_emit(struct r600_context *rctx) cs-buf[cs-cdw++] = PKT3(PKT3_EVENT_WRITE, 0, 0); cs-buf[cs-cdw++] = EVENT_TYPE(EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT) | EVENT_INDEX(0); if (rctx-chip_class = EVERGREEN) { - cp_coher_cntl = S_0085F0_CB0_DEST_BASE_ENA(1) | - S_0085F0_CB1_DEST_BASE_ENA(1) | - S_0085F0_CB2_DEST_BASE_ENA(1) | - S_0085F0_CB3_DEST_BASE_ENA(1) | - S_0085F0_CB4_DEST_BASE_ENA(1) | - S_0085F0_CB5_DEST_BASE_ENA(1) | - S_0085F0_CB6_DEST_BASE_ENA(1) | - S_0085F0_CB7_DEST_BASE_ENA(1) | - S_0085F0_CB8_DEST_BASE_ENA(1) | - S_0085F0_CB9_DEST_BASE_ENA(1) | - S_0085F0_CB10_DEST_BASE_ENA(1) | - S_0085F0_CB11_DEST_BASE_ENA(1) | - S_0085F0_DB_DEST_BASE_ENA(1) | - S_0085F0_TC_ACTION_ENA(1) | - S_0085F0_CB_ACTION_ENA(1) | + /* We were previously setting the CB and DB bits on +* cp_coher_cntl, but 
this is unnecessary since +* we are emitting the +* EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT packet
Re: [Mesa-dev] [PATCH 3/3] radeonsi: fix the max vertex shader input limit
On 05/02/2013 11:06 AM, Michel Dänzer wrote:
> On Thu, 2013-05-02 at 05:45 +0200, Marek Olšák wrote:
>> ---
>>  src/gallium/drivers/radeonsi/radeonsi_pipe.c |    2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/src/gallium/drivers/radeonsi/radeonsi_pipe.c b/src/gallium/drivers/radeonsi/radeonsi_pipe.c
>> index c923c67..3b9be54 100644
>> --- a/src/gallium/drivers/radeonsi/radeonsi_pipe.c
>> +++ b/src/gallium/drivers/radeonsi/radeonsi_pipe.c
>> @@ -481,7 +481,7 @@ static int r600_get_shader_param(struct pipe_screen* pscreen, unsigned shader, e
>>  	case PIPE_SHADER_CAP_MAX_CONTROL_FLOW_DEPTH:
>>  		return 32;
>>  	case PIPE_SHADER_CAP_MAX_INPUTS:
>> -		return 32;
>> +		return shader == PIPE_SHADER_VERTEX ? 16 : 32;
>
> For r600g, I assume the limit of 16 is due to the number of hardware
> registers available for vertex shader inputs,

AFAIK there is no such limit on r600 hw either. IIRC there are at least 32 registers for semantic fetch mapping, but I think we aren't limited even by this because we can use non-semantic fetches (and currently we don't use semantic fetches at all). Am I missing something?

Vadim

> but as of SI the state is no longer stored in registers but in resource
> descriptors in a BO. In theory, I think we could even support many more
> inputs than 32, but let's just leave it at that for now.
[Mesa-dev] [PATCH 1/4] r600g/sb: fix allocation of indirectly addressed input arrays
Some inputs may be preloaded into predefined GPRs, so we can't reallocate arrays with such inputs. Fixes issues with webgl demo: http://oos.moxiecode.com/js_webgl/snake/ Signed-off-by: Vadim Girlin vadimgir...@gmail.com --- src/gallium/drivers/r600/sb/sb_ra_coalesce.cpp | 14 +- src/gallium/drivers/r600/sb/sb_ra_init.cpp | 6 ++ src/gallium/drivers/r600/sb/sb_shader.cpp | 13 + src/gallium/drivers/r600/sb/sb_shader.h| 2 +- 4 files changed, 25 insertions(+), 10 deletions(-) diff --git a/src/gallium/drivers/r600/sb/sb_ra_coalesce.cpp b/src/gallium/drivers/r600/sb/sb_ra_coalesce.cpp index cec4bbc..25c46f7 100644 --- a/src/gallium/drivers/r600/sb/sb_ra_coalesce.cpp +++ b/src/gallium/drivers/r600/sb/sb_ra_coalesce.cpp @@ -345,13 +345,17 @@ void coalescer::init_reg_bitset(sb_bitset bs, val_set vs) { for (val_set::iterator I = vs.begin(sh), E = vs.end(sh); I != E; ++I) { value *v = *I; - if (!v-is_sgpr()) + if (!v-is_any_gpr()) continue; - if (v-gpr) { - if (v-gpr = bs.size()) - bs.resize(v-gpr + 64); - bs.set(v-gpr, 1); + unsigned gpr = v-get_final_gpr(); + if (!gpr) + continue; + + if (gpr) { + if (gpr = bs.size()) + bs.resize(gpr + 64); + bs.set(gpr, 1); } } } diff --git a/src/gallium/drivers/r600/sb/sb_ra_init.cpp b/src/gallium/drivers/r600/sb/sb_ra_init.cpp index 0447f29..99ff6ff 100644 --- a/src/gallium/drivers/r600/sb/sb_ra_init.cpp +++ b/src/gallium/drivers/r600/sb/sb_ra_init.cpp @@ -244,6 +244,12 @@ void ra_init::alloc_arrays() { cerr \n; ); + // skip preallocated arrays (e.g. 
with preloaded inputs) + if (a-gpr) { + RA_DUMP( cerr FIXED at a-gpr \n; ); + continue; + } + bool dead = a-is_dead(); if (dead) { diff --git a/src/gallium/drivers/r600/sb/sb_shader.cpp b/src/gallium/drivers/r600/sb/sb_shader.cpp index 6dd3678..3fd6ea4 100644 --- a/src/gallium/drivers/r600/sb/sb_shader.cpp +++ b/src/gallium/drivers/r600/sb/sb_shader.cpp @@ -61,7 +61,7 @@ bool shader::assign_slot(alu_node* n, alu_node *slots[5]) { return true; } -void shader::add_gpr_values(vvec vec, unsigned gpr, unsigned comp_mask, +void shader::add_pinned_gpr_values(vvec vec, unsigned gpr, unsigned comp_mask, bool src) { unsigned chan = 0; while (comp_mask) { @@ -72,6 +72,11 @@ void shader::add_gpr_values(vvec vec, unsigned gpr, unsigned comp_mask, v-gpr = v-pin_gpr = v-select; v-fix(); } + if (v-array !v-array-gpr) { + // if pinned value can be accessed with indirect addressing + // pin the entire array to its original location + v-array-gpr = v-array-base_gpr; + } vec.push_back(v); } comp_mask = 1; @@ -199,7 +204,7 @@ void shader::add_input(unsigned gpr, bool preloaded, unsigned comp_mask) { i.comp_mask = comp_mask; if (preloaded) { - add_gpr_values(root-dst, gpr, comp_mask, true); + add_pinned_gpr_values(root-dst, gpr, comp_mask, true); } } @@ -217,9 +222,9 @@ void shader::init_call_fs(cf_node* cf) { for(inputs_vec::const_iterator I = inputs.begin(), E = inputs.end(); I != E; ++I, ++gpr) { if (!I-preloaded) - add_gpr_values(cf-dst, gpr, I-comp_mask, false); + add_pinned_gpr_values(cf-dst, gpr, I-comp_mask, false); else - add_gpr_values(cf-src, gpr, I-comp_mask, true); + add_pinned_gpr_values(cf-src, gpr, I-comp_mask, true); } } diff --git a/src/gallium/drivers/r600/sb/sb_shader.h b/src/gallium/drivers/r600/sb/sb_shader.h index aa71d54..b2e3837 100644 --- a/src/gallium/drivers/r600/sb/sb_shader.h +++ b/src/gallium/drivers/r600/sb/sb_shader.h @@ -315,7 +315,7 @@ public: value* get_value_version(value* v, unsigned ver); void init(); - void add_gpr_values(vvec vec, unsigned gpr, 
unsigned comp_mask, bool src); + void add_pinned_gpr_values(vvec vec, unsigned gpr, unsigned comp_mask, bool src); void dump_ir(); -- 1.8.2.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/4] r600g/sb: fix handling of interference sets in post_scheduler
post_scheduler clears interference set for reallocatable values when the value becomes live first time, and then updates it to take into account modified order of operations, but this was not handled properly if the value appears first time as a source in copy operation. Fixes issues with webgl demo: http://madebyevan.com/webgl-water/ Signed-off-by: Vadim Girlin vadimgir...@gmail.com --- src/gallium/drivers/r600/sb/sb_sched.cpp | 12 ++-- src/gallium/drivers/r600/sb/sb_sched.h | 4 ++-- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/src/gallium/drivers/r600/sb/sb_sched.cpp b/src/gallium/drivers/r600/sb/sb_sched.cpp index 7e9eacc..d7c1795 100644 --- a/src/gallium/drivers/r600/sb/sb_sched.cpp +++ b/src/gallium/drivers/r600/sb/sb_sched.cpp @@ -874,7 +874,7 @@ void post_scheduler::update_local_interferences() { } } -void post_scheduler::update_live_src_vec(vvec vv, val_set born, bool src) { +void post_scheduler::update_live_src_vec(vvec vv, val_set *born, bool src) { for (vvec::iterator I = vv.begin(), E = vv.end(); I != E; ++I) { value *v = *I; @@ -892,7 +892,8 @@ void post_scheduler::update_live_src_vec(vvec vv, val_set born, bool src) { cleared_interf.add_val(v); } } - born.add_val(v); + if (born) + born-add_val(v); } } else if (v-is_rel()) { if (!v-rel-is_any_gpr()) @@ -924,7 +925,7 @@ void post_scheduler::update_live_dst_vec(vvec vv) { } } -void post_scheduler::update_live(node *n, val_set born) { +void post_scheduler::update_live(node *n, val_set *born) { update_live_dst_vec(n-dst); update_live_src_vec(n-src, born, true); update_live_src_vec(n-dst, born, false); @@ -948,7 +949,7 @@ void post_scheduler::process_group() { if (!n) continue; - update_live(n, vals_born); + update_live(n, vals_born); } PSC_DUMP( @@ -1550,8 +1551,7 @@ bool post_scheduler::check_copy(node *n) { if (s-is_prealloc() !map_src_val(s)) return true; - live.remove_val(d); - live.add_val(s); + update_live(n, NULL); release_src_values(n); n-remove(); diff --git 
a/src/gallium/drivers/r600/sb/sb_sched.h b/src/gallium/drivers/r600/sb/sb_sched.h index e74046c..a74484f 100644 --- a/src/gallium/drivers/r600/sb/sb_sched.h +++ b/src/gallium/drivers/r600/sb/sb_sched.h @@ -297,9 +297,9 @@ public: bool recolor_local(value *v); void update_local_interferences(); - void update_live_src_vec(vvec vv, val_set born, bool src); + void update_live_src_vec(vvec vv, val_set *born, bool src); void update_live_dst_vec(vvec vv); - void update_live(node *n, val_set born); + void update_live(node *n, val_set *born); void process_group(); void set_color_local_val(value *v, sel_chan color); -- 1.8.2.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
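The interface change in this patch (reference parameter turned into a pointer that may be NULL) can be illustrated with a small sketch. This is a simplified model, not the sb code: plain ints stand in for `value*`, and `live_tracker` is a hypothetical name.

```cpp
#include <set>

// Simplified liveness bookkeeping; ints stand in for sb values.
struct live_tracker {
    std::set<int> live;

    // Marks v as live. If `born` is non-null, values that became live
    // during this call are also recorded there. Callers that only need
    // the live set updated (like the copy case in the patch) pass nullptr.
    void update_live(int v, std::set<int> *born) {
        bool newly_live = live.insert(v).second;
        if (newly_live && born)
            born->insert(v);
    }
};
```

With a reference parameter every caller has to supply a set, even when it has no use for the result; the pointer form lets one code path serve both kinds of caller.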
[Mesa-dev] [PATCH 3/4] r600g/sb: silence warnings with gcc 4.8
Signed-off-by: Vadim Girlin vadimgir...@gmail.com --- src/gallium/drivers/r600/sb/sb_ra_init.cpp | 25 +++-- src/gallium/drivers/r600/sb/sb_sched.cpp | 4 2 files changed, 15 insertions(+), 14 deletions(-) diff --git a/src/gallium/drivers/r600/sb/sb_ra_init.cpp b/src/gallium/drivers/r600/sb/sb_ra_init.cpp index 99ff6ff..03b8efd 100644 --- a/src/gallium/drivers/r600/sb/sb_ra_init.cpp +++ b/src/gallium/drivers/r600/sb/sb_ra_init.cpp @@ -75,7 +75,7 @@ public: void set(unsigned index, unsigned val); - sel_chan find_free_bit(unsigned start); + sel_chan find_free_bit(); sel_chan find_free_chans(unsigned mask); sel_chan find_free_array(unsigned size, unsigned mask); @@ -148,24 +148,21 @@ void regbits::set(unsigned index, unsigned val) { } // free register for ra means the bit is set -sel_chan regbits::find_free_bit(unsigned start) { - unsigned elt = start bt_index_shift; - unsigned bit = start bt_index_mask; - - unsigned end = start MAX_GPR - num_temps ? MAX_GPR - num_temps : MAX_GPR; +sel_chan regbits::find_free_bit() { + unsigned elt = 0; + unsigned bit = 0; - while (elt end !dta[elt]) { + while (elt size !dta[elt]) ++elt; - bit = 0; - } - if (elt = end) + if (elt = size) return 0; - // FIXME this seems broken when not starting from 0 + bit = __builtin_ctz(dta[elt]) + (elt bt_index_shift); + + assert(bit MAX_GPR - num_temps); - bit += __builtin_ctz(dta[elt]); - return ((elt bt_index_shift) | bit) + 1; + return bit + 1; } // find free gpr component to use as indirectly addressable array @@ -482,7 +479,7 @@ void ra_init::color(value* v) { unsigned mask = 1 v-pin_gpr.chan(); c = rb.find_free_chans(mask) + v-pin_gpr.chan(); } else { - c = rb.find_free_bit(0); + c = rb.find_free_bit(); } assert(c c.sel() 128 - ctx.alu_temp_gprs color failed); diff --git a/src/gallium/drivers/r600/sb/sb_sched.cpp b/src/gallium/drivers/r600/sb/sb_sched.cpp index d7c1795..b21b342 100644 --- a/src/gallium/drivers/r600/sb/sb_sched.cpp +++ b/src/gallium/drivers/r600/sb/sb_sched.cpp @@ -542,6 +542,10 
@@ bool alu_group_tracker::try_reserve(alu_node* n) { assert(first_slot != ~0 last_slot != ~0); + // silence array subscript is above array bounds with gcc 4.8 + if (last_slot = 5) + abort(); + int i = first_nf; alu_node *a = slots[i]; bool backtrack = false; -- 1.8.2.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
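As a rough illustration of the simplified `find_free_bit()` logic in this patch, the sketch below scans an array of 32-bit words for the first set bit and returns a 1-based index, with 0 meaning "nothing free". The function name and the fixed 32-bit word size are assumptions made for the example; `__builtin_ctz` is the GCC/Clang builtin the patch itself uses.

```cpp
#include <cstddef>
#include <cstdint>

// Returns the 1-based index of the lowest set bit in words[0..count),
// or 0 if no bit is set. A set bit means "free", as in the patch comment.
inline unsigned find_first_set(const uint32_t *words, size_t count) {
    size_t elt = 0;

    // Skip words that have no free bit at all.
    while (elt < count && !words[elt])
        ++elt;

    if (elt >= count)
        return 0;

    // __builtin_ctz is undefined for 0; words[elt] is non-zero here.
    unsigned bit = static_cast<unsigned>(__builtin_ctz(words[elt]))
                 + static_cast<unsigned>(elt * 32);

    return bit + 1;
}
```

Returning `bit + 1` keeps 0 available as the failure value, which is the same convention the patched function follows.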
[Mesa-dev] [PATCH 4/4] r600g/sb: don't run unnecessary passes
Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---
 src/gallium/drivers/r600/sb/sb_core.cpp | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_core.cpp b/src/gallium/drivers/r600/sb/sb_core.cpp
index 9f81ed4..b919fa4 100644
--- a/src/gallium/drivers/r600/sb/sb_core.cpp
+++ b/src/gallium/drivers/r600/sb/sb_core.cpp
@@ -187,9 +187,6 @@ int r600_sb_bytecode_process(struct r600_context *rctx,
 	SB_RUN_PASS(dce_cleanup,1);
 	SB_RUN_PASS(def_use,0);
 
-	SB_RUN_PASS(liveness, 0);
-	SB_RUN_PASS(dce_cleanup,0);
-
 	SB_RUN_PASS(ra_split, 0);
 	SB_RUN_PASS(def_use,0);
 
-- 
1.8.2.1
Re: [Mesa-dev] [PATCH 3/3] radeonsi: fix the max vertex shader input limit
On 05/02/2013 07:55 PM, Marek Olšák wrote: AFAIK, there are 16 fetch shader resources. These are the resource slots for r600: Ah, you are right (though it's higher on EG as Alex wrote). Anyway, I'm not against your patch, I just wanted to understand where this limit comes from. I think this cap itself is a bit misleading, because its description in the docs is about input registers, not about resources/buffers, and shader inputs do not necessarily come from separate resources. Vadim [offset .. +count] PS: 0 .. +160 VS: 160 .. +160 FS: 320 .. +16 GS: 336 .. +160 Marek On Thu, May 2, 2013 at 5:04 PM, Vadim Girlin vadimgir...@gmail.com wrote: On 05/02/2013 11:06 AM, Michel Dänzer wrote: On Don, 2013-05-02 at 05:45 +0200, Marek Olšák wrote: --- src/gallium/drivers/radeonsi/radeonsi_pipe.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/drivers/radeonsi/radeonsi_pipe.c b/src/gallium/drivers/radeonsi/radeonsi_pipe.c index c923c67..3b9be54 100644 --- a/src/gallium/drivers/radeonsi/radeonsi_pipe.c +++ b/src/gallium/drivers/radeonsi/radeonsi_pipe.c @@ -481,7 +481,7 @@ static int r600_get_shader_param(struct pipe_screen* pscreen, unsigned shader, e case PIPE_SHADER_CAP_MAX_CONTROL_FLOW_DEPTH: return 32; case PIPE_SHADER_CAP_MAX_INPUTS: - return 32; + return shader == PIPE_SHADER_VERTEX ? 16 : 32; For r600g, I assume the limit of 16 is due to the number of hardware registers available for vertex shader inputs, AFAIK there is no such limit on r600 hw as well. IIRC there are at least 32 registers for semantic fetch mapping, but I think we aren't limited even by this because we can use non-semantic fetches (and currently we don't use semantic fetches at all). Am I missing something? Vadim but as of SI the state is no longer stored in registers but in resource descriptors in a BO. In theory, I think we could even support many more inputs than 32, but let's just leave it at that for now. 
Re: [Mesa-dev] r600 sb test results
On 05/02/2013 06:34 PM, Lauri Kasanen wrote: On Thu, 02 May 2013 00:45:13 +0400 Vadim Girlin vadimgir...@gmail.com wrote: On 05/01/2013 11:36 PM, Lauri Kasanen wrote: Now that it built, I could test your optimizations in my own apps. These are on current master 8eef6ad, on a RV710 (HD 4350 pci-e). In one of my private apps, using R600_DEBUG=sb caused regressions: FPS went from 28 to 7, the SSAO shader gave visual distortions/flicker, and the cpu was constantly pegged. Here's the output from R600_DEBUG=sb,sbstat in case it helps: http://bayfiles.net/file/Pmkh/PUj0Ru/vadim.gz It seems as if it's constantly handling new shaders? My app certainly issues no new shaders, they are all linked when the app starts. r600g may rebuild shaders at runtime because some GL features are implemented in shader code, so if your app changes some specific GL states (e.g. two-sided rendering mode), then r600g has to build and switch between different shader variants. It mainly uses the stencil buffer, the clear color is changed in various passes, some occlusion queries with color masks, but nothing exotic. New uniforms are of course sent each frame. On the other hand there is caching of shader variants in r600g implemented specially to prevent repetitive rebuilding of shaders, looks like it doesn't work in your case for some reason. Optimizations take more time than rebuilding with default backend, that explains performance regression. Could you provide some test app that reproduces these issues? It's quite time-taking to cut it down, and apitraces of it in full are several gigs (far too much to upload with my connection). I'll see if I can get just the SSAO isolated, with minimal textures, to get a smaller trace. I'm almost sure that the same issue that you have with glxgears affects your app too, so you might want to wait until we resolve the problem with gears, possibly this will solve other rendering issues as well. 
Please also send me the dump with R600_DEBUG=sb,ps,vs, maybe I'll be able to spot anything wrong there. http://bayfiles.net/file/PmY5/xgIdlZ/foo.gz Let me know what you need to debug this. - Lauri PS: I'm not sure if this should be public or not, I think you're the only one working on it? Yes, I doubt that anyone else will work on it, on the other hand I think reporting this on the list might help other users who will possibly hit similar issues. Also at least in this case it looks rather like a problem in r600g, so I'm cc'ing mesa-dev, r600-sb just made this issue more noticeable because shader rebuilding with optimization requires more time. Using standard r600g, the cpu usage is less than 25% of one core, so nothing was showing it was constantly rebuilding shaders. Is there some way I could've found it was doing that, and if so, why? You could run the app with R600_DEBUG=ps,vs (without sb) - it will also print the dump of every built shader. r600-sb doesn't affect the logic of shader rebuilding, it just processes the shaders when asked by r600g, so I think you'll see the same - a lot of built shaders. You could even try this with older mesa (before r600-sb was merged) to be sure. As for the cause of rebuilding, I don't see any changes in the shaders in your dump that might be explained by state changes, it's exactly the same shaders rebuilt more than once, so far I don't know why. You might want to look into r600_shader_select function with debugger to see what's going wrong, it computes the key for required shader variant using r600_shader_selector_key, then looks at the list of variants to find already built shader with the same key, and builds a new one only if it can't find existing shader. Looks like something fails there. 
By the way, I won't be very surprised if some old gcc release simply fails at handling bitfields, which are used to store both the keys of shader variants in r600g and bytecode data in r600-sb (the same data that ends up being broken in your glxgears dump). IIRC there were bitfield-related bugs.

Vadim

> - Lauri
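The variant lookup described in this reply (compute a key, search the existing variants, build only on a miss) can be sketched roughly as follows. This is an illustrative model and not the r600g code: `variant_key`, `variant` and `selector` are hypothetical types, and the key has a single member so that no padding is involved.

```cpp
#include <cstring>
#include <vector>

// Hypothetical key with one member, so there are no padding bytes.
struct variant_key {
    unsigned flags;
};

struct variant {
    variant_key key;
    int id;
};

struct selector {
    std::vector<variant> variants;
    int builds = 0; // how many times a new variant had to be built

    // Return the id of the variant matching `key`, building a new one
    // only if no existing variant has the same key bytes.
    int select(const variant_key &key) {
        for (const variant &v : variants)
            if (std::memcmp(&v.key, &key, sizeof(key)) == 0)
                return v.id;

        variant v{key, builds++};
        variants.push_back(v);
        return v.id;
    }
};
```

If the comparison ever sees differing bytes for logically identical keys, every call becomes a miss and `builds` keeps growing, which matches the symptom of the same shaders being rebuilt over and over.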
[Mesa-dev] [PATCH] r600g/sb: use hex instead of binary constants
This should fix build issues with GCC 4.3 Signed-off-by: Vadim Girlin vadimgir...@gmail.com --- cc: Lauri Kasanen c...@gmx.com Lauri, please test to make sure that I didn't miss anything. src/gallium/drivers/r600/r600_shader.c | 6 +++--- src/gallium/drivers/r600/sb/sb_bc.h | 4 ++-- src/gallium/drivers/r600/sb/sb_bc_parser.cpp | 10 +- src/gallium/drivers/r600/sb/sb_ra_init.cpp | 2 +- src/gallium/drivers/r600/sb/sb_sched.cpp | 8 5 files changed, 15 insertions(+), 15 deletions(-) diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c index fd3fe39..49218e5 100644 --- a/src/gallium/drivers/r600/r600_shader.c +++ b/src/gallium/drivers/r600/r600_shader.c @@ -1005,7 +1005,7 @@ static int tgsi_declaration(struct r600_shader_ctx *ctx) r600_add_gpr_array(ctx-shader, ctx-file_offset[TGSI_FILE_TEMPORARY] + d-Range.First, - d-Range.Last - d-Range.First + 1, 0b); + d-Range.Last - d-Range.First + 1, 0x0F); } } break; @@ -1421,13 +1421,13 @@ static int r600_shader_from_tgsi(struct r600_screen *rscreen, r600_add_gpr_array(shader, ctx.file_offset[TGSI_FILE_INPUT], ctx.file_offset[TGSI_FILE_OUTPUT] - ctx.file_offset[TGSI_FILE_INPUT], - 0b); + 0x0F); } if (ctx.info.indirect_files (1 TGSI_FILE_OUTPUT)) { r600_add_gpr_array(shader, ctx.file_offset[TGSI_FILE_OUTPUT], ctx.file_offset[TGSI_FILE_TEMPORARY] - ctx.file_offset[TGSI_FILE_OUTPUT], - 0b); + 0x0F); } } diff --git a/src/gallium/drivers/r600/sb/sb_bc.h b/src/gallium/drivers/r600/sb/sb_bc.h index 0b9bc07..9c6ed46 100644 --- a/src/gallium/drivers/r600/sb/sb_bc.h +++ b/src/gallium/drivers/r600/sb/sb_bc.h @@ -553,9 +553,9 @@ public: unsigned mask = 0; unsigned slot_flags = alu_slots(op_ptr); if (slot_flags AF_V) - mask = 0b0; + mask = 0x0F; if (!is_cayman() (slot_flags AF_S)) - mask |= 0b1; + mask |= 0x10; return mask; } diff --git a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp index cc75528..e1478d3 100644 --- 
a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp +++ b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp @@ -126,7 +126,7 @@ int bc_parser::parse_decls() { #if SB_NO_ARRAY_INFO - sh-add_gpr_array(0, pshader-bc.ngpr, 0b); + sh-add_gpr_array(0, pshader-bc.ngpr, 0x0F); #else @@ -140,7 +140,7 @@ int bc_parser::parse_decls() { } } else { - sh-add_gpr_array(0, pshader-bc.ngpr, 0b); + sh-add_gpr_array(0, pshader-bc.ngpr, 0x0F); } @@ -149,7 +149,7 @@ int bc_parser::parse_decls() { } if (sh-target == TARGET_VS) - sh-add_input(0, 1, 0b); + sh-add_input(0, 1, 0x0F); bool ps_interp = ctx.hw_class = HW_CLASS_EVERGREEN sh-target == TARGET_PS; @@ -159,7 +159,7 @@ int bc_parser::parse_decls() { for (unsigned i = 0; i pshader-ninput; ++i) { r600_shader_io in = pshader-input[i]; bool preloaded = sh-target == TARGET_PS !(ps_interp in.spi_sid); - sh-add_input(in.gpr, preloaded, /*in.write_mask*/ 0b); + sh-add_input(in.gpr, preloaded, /*in.write_mask*/ 0x0F); if (ps_interp in.spi_sid) { if (in.interpolate == TGSI_INTERPOLATE_LINEAR || in.interpolate == TGSI_INTERPOLATE_COLOR) @@ -176,7 +176,7 @@ int bc_parser::parse_decls() { unsigned gpr = 0; while (mask) { - sh-add_input(gpr, true, mask 0b); + sh-add_input(gpr, true, mask 0x0F); ++gpr; mask = 4; } diff --git a/src/gallium/drivers/r600/sb/sb_ra_init.cpp b/src/gallium/drivers/r600/sb/sb_ra_init.cpp index 75b2d5d..0447f29 100644 --- a/src/gallium/drivers/r600/sb/sb_ra_init.cpp +++ b/src/gallium/drivers/r600/sb/sb_ra_init.cpp @@ -360,7 +360,7 @@ void
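To show how the hex masks in this patch are used, here is a small sketch modelled on the `while (mask)` loop in `parse_decls()`: a packed mask with four component bits per register is split into per-register component masks with `0x0F`. The function name and the return type are assumptions for the example; hex constants are used because every C and C++ compiler accepts them.

```cpp
#include <utility>
#include <vector>

// Splits a packed mask (4 component bits per GPR, lowest GPR first)
// into (gpr index, component mask) pairs.
inline std::vector<std::pair<unsigned, unsigned>> split_component_mask(unsigned mask) {
    std::vector<std::pair<unsigned, unsigned>> out;
    unsigned gpr = 0;

    while (mask) {
        // 0x0F selects the x, y, z, w bits of the current register.
        out.push_back(std::make_pair(gpr, mask & 0x0Fu));
        ++gpr;
        mask >>= 4;
    }
    return out;
}
```

For example, a packed mask of `0x3F` describes two registers: all four components of the first one and the x and y components of the second.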
Re: [Mesa-dev] r600-sb: glxgears wrong rendering
On 05/01/2013 11:42 PM, Lauri Kasanen wrote:
> Hi
>
> Running R600_DEBUG=sb glxgears on a RV710 gives wrong output:
> http://i40.tinypic.com/t7gx09.png
>
> This is on current master, git-8eef6ad. Let me know what you need to
> debug this.

Please send me the output with R600_DEBUG=sb,ps,vs

Vadim
Re: [Mesa-dev] r600 sb test results
On 05/01/2013 11:36 PM, Lauri Kasanen wrote:
> Hi Vadim
>
> Now that it built, I could test your optimizations in my own apps.
> These are on current master 8eef6ad, on a RV710 (HD 4350 pci-e).
>
> In one of my private apps, using R600_DEBUG=sb caused regressions: FPS
> went from 28 to 7, the SSAO shader gave visual distortions/flicker, and
> the cpu was constantly pegged.
>
> Here's the output from R600_DEBUG=sb,sbstat in case it helps:
> http://bayfiles.net/file/Pmkh/PUj0Ru/vadim.gz
>
> It seems as if it's constantly handling new shaders? My app certainly
> issues no new shaders, they are all linked when the app starts.

Hi,

r600g may rebuild shaders at runtime because some GL features are implemented in shader code, so if your app changes some specific GL states (e.g. two-sided rendering mode), then r600g has to build and switch between different shader variants.

On the other hand, r600g caches shader variants specifically to prevent repeated rebuilding of shaders; it looks like this doesn't work in your case for some reason. Optimization takes more time than rebuilding with the default backend, which explains the performance regression.

Could you provide some test app that reproduces these issues?

Please also send me the dump with R600_DEBUG=sb,ps,vs, maybe I'll be able to spot something wrong there.

> Let me know what you need to debug this.
>
> - Lauri
>
> PS: I'm not sure if this should be public or not, I think you're the
> only one working on it?

Yes, I doubt that anyone else will work on it; on the other hand, I think reporting this on the list might help other users who may hit similar issues. Also, at least in this case it looks more like a problem in r600g, so I'm cc'ing mesa-dev. r600-sb just made this issue more noticeable because shader rebuilding with optimization requires more time.

Vadim
[Mesa-dev] [PATCH] r600g: mask unused source components for SAMPLE
This results in cleaner shader code and may improve the quality of optimized code produced by r600-sb due to eliminated false dependencies in some cases.

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---

There are no piglit regressions with this patch on evergreen.

I consider this a prerequisite for the r600-sb branch; it fixes the performance regression with optimized shaders uncovered by some recent changes to tgsi and/or r600 codegen.

If there are no objections or new suggestions, is it OK to push the latest version of r600-sb-2 branch [1] that includes this patch?

The changes in the branch since the recent mail include 3 additional patches that improve handling of some corner cases (they fix some issues reported on IRC). They also add switching to unoptimized code in case of possible internal optimization problems, and a new option sbnofallback for R600_DEBUG to disable such fallback.

Vadim

[1] http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-sb-2

 src/gallium/drivers/r600/r600_shader.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index 0204f80..aa88252 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -4739,6 +4739,26 @@ static int tgsi_tex(struct r600_shader_ctx *ctx)
 	/* the array index is read from Z */
 	tex.coord_type_z = 0;
 
+	/* mask unused source components */
+	if (opcode == FETCH_OP_SAMPLE) {
+		switch (inst->Texture.Texture) {
+		case TGSI_TEXTURE_2D:
+		case TGSI_TEXTURE_RECT:
+			tex.src_sel_z = 7;
+			tex.src_sel_w = 7;
+			break;
+		case TGSI_TEXTURE_1D_ARRAY:
+			tex.src_sel_y = 7;
+			tex.src_sel_w = 7;
+			break;
+		case TGSI_TEXTURE_1D:
+			tex.src_sel_y = 7;
+			tex.src_sel_z = 7;
+			tex.src_sel_w = 7;
+			break;
+		}
+	}
+
 	r = r600_bytecode_add_tex(ctx->bc, &tex);
 	if (r)
 		return r;
-- 
1.8.2.1
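The per-target masking in the patch can be restated as a small standalone function, which may make the mapping easier to check. The enum, the struct and the function name below are hypothetical; only the value 7 for a masked component and the target-to-component mapping are taken from the patch.

```cpp
// Hypothetical stand-ins for the TGSI texture targets handled by the patch.
enum tex_target { TEX_1D, TEX_1D_ARRAY, TEX_2D, TEX_RECT, TEX_3D };

// Value the patch writes into src_sel_* for components that are not read.
const unsigned SEL_MASKED = 7;

struct src_swizzle {
    unsigned x, y, z, w;
};

// Returns the swizzle with components unused by the given target masked.
// Targets not listed are returned unchanged, as in the patch.
inline src_swizzle mask_unused(tex_target target, src_swizzle s) {
    switch (target) {
    case TEX_2D:
    case TEX_RECT:
        s.z = SEL_MASKED;
        s.w = SEL_MASKED;
        break;
    case TEX_1D_ARRAY:
        // the array index stays in Z, so only Y and W are unused
        s.y = SEL_MASKED;
        s.w = SEL_MASKED;
        break;
    case TEX_1D:
        s.y = SEL_MASKED;
        s.z = SEL_MASKED;
        s.w = SEL_MASKED;
        break;
    default:
        break;
    }
    return s;
}
```

Masking the unused components means the optimizer no longer sees the fetch as depending on whatever instruction last wrote those channels.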
Re: [Mesa-dev] [PATCH] r600g: mask unused source components for SAMPLE
On 04/27/2013 02:53 PM, Marek Olšák wrote: Reviewed-by: Marek Olšák mar...@gmail.com This looks incomplete though. There are a lot more texture opcodes and texture targets which could be handled there as well. Yes, this patch handles most trivial cases, though I think they are most frequently used cases as well. Also it covers all known to me cases where it caused problems for optimization. I'll look into other cases later - they are more complex, so there is more chances to break something (I'm not sure about piglit coverage for this), and IIRC many of them either actually use all components of source register or modify the swizzles in such a way that there is no unused components, e.g. xyzz with SHADOW2D/SAMPLE_C. Vadim Marek On Sat, Apr 27, 2013 at 10:29 AM, Vadim Girlin vadimgir...@gmail.com wrote: This results in more clean shader code and may improve the quality of optimized code produced by r600-sb due to eliminated false dependencies in some cases. Signed-off-by: Vadim Girlin vadimgir...@gmail.com --- There are no piglit regressions with this patch on evergreen. I consider this as a prerequisite for r600-sb branch, it fixes the performance regression with optimized shaders uncovered by some recent changes to tgsi and/or r600 codegen. If there are no objections or new suggestions, is it OK to push the latest version of r600-sb-2 branch [1] that includes this patch? The changes in the branch after the recent mail include 3 additional patches to improve handling of some corner cases (they fix some issues reported on IRC), also they add switching to unoptimized code in case of possible internal optimization problems, and new option sbnofallback for R600_DEBUG to disable such fallback. 
Vadim [1] http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-sb-2 src/gallium/drivers/r600/r600_shader.c | 20 1 file changed, 20 insertions(+) diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c index 0204f80..aa88252 100644 --- a/src/gallium/drivers/r600/r600_shader.c +++ b/src/gallium/drivers/r600/r600_shader.c @@ -4739,6 +4739,26 @@ static int tgsi_tex(struct r600_shader_ctx *ctx) /* the array index is read from Z */ tex.coord_type_z = 0; + /* mask unused source components */ + if (opcode == FETCH_OP_SAMPLE) { + switch (inst-Texture.Texture) { + case TGSI_TEXTURE_2D: + case TGSI_TEXTURE_RECT: + tex.src_sel_z = 7; + tex.src_sel_w = 7; + break; + case TGSI_TEXTURE_1D_ARRAY: + tex.src_sel_y = 7; + tex.src_sel_w = 7; + break; + case TGSI_TEXTURE_1D: + tex.src_sel_y = 7; + tex.src_sel_z = 7; + tex.src_sel_w = 7; + break; + } + } + r = r600_bytecode_add_tex(ctx-bc, tex); if (r) return r; -- 1.8.2.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] r600g: status of the r600-sb branch
On 04/21/2013 04:04 AM, Marek Olšák wrote:
> Ah, I didn't know you had any other env vars. It's preferable to have as many boolean flags as possible handled by a single env var, because it's easier to use (R600_DUMP_SHADERS counts as a pretty ugly list of boolean flags hidden behind a magic number). Feel free to have separate env vars for more complex parameters.
>
> I skimmed through some of your code and the coding style looks good. I'm also okay with C++, it really seems like the right choice here. However I agree with the argument that one header file per cpp might not always be a good idea, especially if the header file is pretty small.

Thanks for reviewing. I pushed to my repo the branch with the following changes:

- changes to existing r600g code split out from the main big patch
- small header files merged into sb_pass.h, sb_ir.h, sb_bc.h
- added new R600_DEBUG flags to replace multiple env vars:
    sb     - Enable optimization of graphics shaders
    sbcl   - Enable optimization of compute shaders
    sbdry  - Dry run, optimize but don't use new bytecode
    sbstat - Print optimization statistics (currently the time only)
    sbdump - Print IR after some passes.
- added debug_id (shader index) to struct r600_bytecode; ids are assigned to each shader in r600_bytecode_init and printed in the shader dump header. This is intended to avoid reinventing shader numbering in different places for dumps and debugging.
- some minor cleanups

Updated branch can be found here: http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-sb-2

Vadim

> Marek
>
> On Sat, Apr 20, 2013 at 11:02 AM, Vadim Girlin vadimgir...@gmail.com wrote:
>> On 04/20/2013 03:11 AM, Marek Olšák wrote:
>>> Please don't add any new environment variables and use R600_DEBUG instead. The other environment variables are deprecated.
>>
>> I agree, those vars probably need some cleanup, they were added before R600_DEBUG appeared.
Though I'm afraid some of my options won't fit well into the R600_DEBUG flags, unless we'll add support for the name/value pairs with optional custom parsers. E.g. I have a group of env vars to define the range of included/excluded shaders for optimization and mode (include/exclude/off), I thought about doing this with a single var and custom parser to specify the range e.g. as 10-20, but after all it's just a debug feature, not intended for everyday use, and so far I failed to convince myself that it's worth the efforts. I can implement the support for custom parsers for R600_DEBUG, but do we really need it? Maybe it would be enough to add e.g.sb instead of R600_SB var to the R600_DEBUG flags for enabling it (probably together with other boolean options such as R600_SB_USE_NEW_BYTECODE) but leave more complicated internal debug options as is? Vadim There is a table for R600_DEBUG in r600_pipe.c and it even comes with a help feature: R600_DEBUG=help Marek On Fri, Apr 19, 2013 at 4:48 PM, Vadim Girlin vadimgir...@gmail.com wrote: Hi, In the previous status update I said that the r600-sb branch is not ready to be merged yet, but recently I've done some cleanups and reworks, and though I haven't finished everything that I planned initially, I think now it's in a better state and may be considered for merging. I'm interested to know if the people think that merging of the r600-sb branch makes sense at all. I'll try to explain here why it makes sense to me. Although I understand that the development of llvm backend is a primary goal for the r600g developers, it's a complicated process and may require quite some time to achieve good results regarding the shader/compiler performance, and at the same time this branch already works and provides good results in many cases. That's why I think it makes sense to merge this branch as a non-default backend at least as a temporary solution for shader performance problems. 
We can always get rid of it if it becomes too much a maintenance burden or when llvm backend catches up in terms of shader performance and compilation speed/overhead. Regarding the support and maintenance of this code, I'll try to do my best to fix possible issues, and so far there are no known unfixed issues. I tested it with many apps on evergreen and fixed all issues with other chips that were reported to me on the list or privately after the last status announce. There are no piglit regressions on evergreen when this branch is used with both default and llvm backends. This code was intentionally separated as much as possible from the other parts of the driver, basically there are just two functions used from r600g, and the shader code is passed to/from r600-sb as a hardware bytecode that is not going to change. I think it won't require any modifications at all to keep it in sync with the most changes in r600g. Some work might be required though if we'll want to add support for the new hw features that are currently unused, e.g. geometry shaders, new instruction types for compute shaders, etc, but I think
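As an illustration of the "many boolean flags in one variable" approach discussed in this thread, here is a rough C sketch of a table-driven flag parser. The flag names and descriptions are the ones listed in the mail; everything else (the DBG_* constants, the struct, the function name, and the comma as separator) is an assumption made for the example and is not taken from r600_pipe.c.

```c
#include <string.h>

/* Hypothetical bit assignments; the real driver may use different values. */
enum {
    DBG_SB     = 1u << 0,
    DBG_SBCL   = 1u << 1,
    DBG_SBDRY  = 1u << 2,
    DBG_SBSTAT = 1u << 3,
    DBG_SBDUMP = 1u << 4
};

struct dbg_flag {
    const char *name;
    unsigned bit;
    const char *desc;
};

/* One table drives both parsing and a possible "help" listing. */
static const struct dbg_flag dbg_flags[] = {
    { "sb",     DBG_SB,     "Enable optimization of graphics shaders" },
    { "sbcl",   DBG_SBCL,   "Enable optimization of compute shaders" },
    { "sbdry",  DBG_SBDRY,  "Dry run, optimize but don't use new bytecode" },
    { "sbstat", DBG_SBSTAT, "Print optimization statistics" },
    { "sbdump", DBG_SBDUMP, "Print IR after some passes" }
};

/* Turn a string such as "sb,sbstat" into a bitmask; unknown names are ignored. */
static unsigned parse_debug_flags(const char *s)
{
    unsigned mask = 0;

    if (!s)
        return 0;

    while (*s) {
        size_t len = strcspn(s, ","); /* length of the current token */
        size_t i;

        for (i = 0; i < sizeof(dbg_flags) / sizeof(dbg_flags[0]); i++) {
            if (strlen(dbg_flags[i].name) == len &&
                strncmp(s, dbg_flags[i].name, len) == 0)
                mask |= dbg_flags[i].bit;
        }

        s += len;
        if (*s == ',')
            s++; /* skip the separator */
    }
    return mask;
}
```

A caller would typically pass the result of getenv("R600_DEBUG") and test individual bits, e.g. `if (flags & DBG_SBDRY)`.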
Re: [Mesa-dev] r600g: status of the r600-sb branch
On 04/20/2013 03:11 AM, Marek Olšák wrote: Please don't add any new environment variables and use R600_DEBUG instead. The other environment variables are deprecated. I agree, those vars probably need some cleanup, they were added before R600_DEBUG appeared. Though I'm afraid some of my options won't fit well into the R600_DEBUG flags, unless we'll add support for the name/value pairs with optional custom parsers. E.g. I have a group of env vars to define the range of included/excluded shaders for optimization and mode (include/exclude/off), I thought about doing this with a single var and custom parser to specify the range e.g. as 10-20, but after all it's just a debug feature, not intended for everyday use, and so far I failed to convince myself that it's worth the efforts. I can implement the support for custom parsers for R600_DEBUG, but do we really need it? Maybe it would be enough to add e.g.sb instead of R600_SB var to the R600_DEBUG flags for enabling it (probably together with other boolean options such as R600_SB_USE_NEW_BYTECODE) but leave more complicated internal debug options as is? Vadim There is a table for R600_DEBUG in r600_pipe.c and it even comes with a help feature: R600_DEBUG=help Marek On Fri, Apr 19, 2013 at 4:48 PM, Vadim Girlin vadimgir...@gmail.com wrote: Hi, In the previous status update I said that the r600-sb branch is not ready to be merged yet, but recently I've done some cleanups and reworks, and though I haven't finished everything that I planned initially, I think now it's in a better state and may be considered for merging. I'm interested to know if the people think that merging of the r600-sb branch makes sense at all. I'll try to explain here why it makes sense to me. 
Although I understand that the development of llvm backend is a primary goal for the r600g developers, it's a complicated process and may require quite some time to achieve good results regarding the shader/compiler performance, and at the same time this branch already works and provides good results in many cases. That's why I think it makes sense to merge this branch as a non-default backend at least as a temporary solution for shader performance problems. We can always get rid of it if it becomes too much a maintenance burden or when llvm backend catches up in terms of shader performance and compilation speed/overhead. Regarding the support and maintenance of this code, I'll try to do my best to fix possible issues, and so far there are no known unfixed issues. I tested it with many apps on evergreen and fixed all issues with other chips that were reported to me on the list or privately after the last status announce. There are no piglit regressions on evergreen when this branch is used with both default and llvm backends. This code was intentionally separated as much as possible from the other parts of the driver, basically there are just two functions used from r600g, and the shader code is passed to/from r600-sb as a hardware bytecode that is not going to change. I think it won't require any modifications at all to keep it in sync with the most changes in r600g. Some work might be required though if we'll want to add support for the new hw features that are currently unused, e.g. geometry shaders, new instruction types for compute shaders, etc, but I think I'll be able to catch up when it's implemented in the driver and default or llvm backend. E.g. this branch already works for me on evergreen with some simple OpenCL kernels, including bfgminer where it increases performance of the kernel compiled with llvm backend by more than 20% for me. 
Besides the performance benefits, I think that alternative backend also might help with debugging of the default or llvm backend, in some cases it helped me by exposing the bugs that are not very obvious otherwise, e.g. it may be hard to compare the dumps from default and llvm backend to spot the regression because they are too different, but after processing both shaders with r600-sb the code is usually transformed to some more common form, and often this makes it easier to compare and find the differences in shader logic. One additional feature that might help with llvm backend debugging is the disassembler that works on the hardware bytecode instead of the internal r600g bytecode structs. This results in the more readable shader dumps for instructions passed in native hw encoding from llvm backend. I think this also can help to catch more potential bugs related to bytecode building in r600g/llvm. Currently r600-sb uses its bytecode disassembler for all shader dumps, including the fetch shaders, even when optimization is not enabled. Basically it can replace r600_bytecode_disasm and related code completely. Below are some quick benchmarks for shader performance and compilation time, to demonstrate that currently r600-sb might provide better performance for users, at least in some cases. As an example of the shaders
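The "single var and custom parser" idea for a shader range such as 10-20, mentioned earlier in this mail, could look roughly like the C sketch below. All names here are hypothetical, and the accepted syntax (two unsigned numbers joined by a dash, nothing after them) is an assumption; the mail only gives "10-20" as an example.

```c
#include <stdbool.h>
#include <stdio.h>

/* Inclusive range of shader indices selected for optimization. */
struct shader_range {
    unsigned first;
    unsigned last;
};

/* Parse "first-last"; reject malformed input and reversed ranges. */
static bool parse_range(const char *s, struct shader_range *r)
{
    unsigned a, b;
    char extra;

    /* Exactly two conversions must succeed; a third means trailing junk. */
    if (!s || sscanf(s, "%u-%u%c", &a, &b, &extra) != 2 || a > b)
        return false;

    r->first = a;
    r->last = b;
    return true;
}

/* True if the shader with the given index falls inside the range. */
static bool in_range(const struct shader_range *r, unsigned id)
{
    return id >= r->first && id <= r->last;
}
```

With a per-shader index available, a check like `in_range(&range, id)` would be enough to include or exclude a shader while bisecting a miscompiled one.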
Re: [Mesa-dev] r600g: status of the r600-sb branch
On 04/20/2013 01:42 PM, Christian König wrote: Am 19.04.2013 18:50, schrieb Vadim Girlin: On 04/19/2013 08:35 PM, Christian König wrote: Hey Vadim, Am 19.04.2013 18:18, schrieb Vadim Girlin: [SNIP] In theory, yes, some optimizations in this branch are typically used on the earlier compilation stages, not on the target machine code. On the other hand, there are some differences that might make it harder, e.g. many algorithms require SSA form, and though it's possible to do similar optimizations without SSA, it would be hard to implement. Also I wanted to support both default backend and llvm backend for increased testing coverage and to be able to compare the efficiency of the algorithms in my experiments etc. Yeah I know, missing an SSA implementation is also something that always bothered me a bit with both TGSI and GLSL (while I haven't done much with GLSL, so maybe I misjudge here). Can you name the different algorithms used? There is a short description of the algorithms and passes in the notes.markdown file [1] in that branch, there are also links in the end to the full description of some algorithms, though some of them were modified/adapted for this branch. It's not a strict prerequisite, but I think we both agree that doing things like LICM on R600 bytecode isn't the best idea over all (when doing it on GLSL would be beneficial for all drivers not only r600). In fact there is no special LICM pass, it's done by the GCM (Global Code Motion, [2]), which probably could be also called global scheduler. In fact in my branch this pass is combined with some hw-specific scheduling logic, e.g. grouping fetch/alu instructions to reduce clause type switching in the code and the number of required CF instructions, potentially it can also schedule clauses to expose more parallelism with the BARRIER bit usage. Yeah I already thought that you're using something like this. 
> On one hand that is really good, because it is specialized and so produces really optimal code for the r600 target. But on the other hand it's bad, because it is specialized and so produces really optimal code ONLY on the r600 target.

I think such a pass on a higher level (GLSL IR or TGSI) would at least need some callbacks or caps to be tunable for the target. Anyway, the result of the GCM pass is affected by the CFG structure, so when the target applies e.g. if-conversion or any other target-specific control flow optimization, you might want to apply a similar pass again on the target instruction level for better results, and then the previous pass on the higher-level IR does not look very useful.

Also, there are some high-level operations that are translated to a bunch of target instructions, e.g. integer division on r600. A high-level pass can't hoist i/5 (where i is the loop counter) out of the loop, but after translation to target instructions it's possible to hoist some of the resulting instructions, producing more efficient code.

One more point is that GCM achieves the best efficiency when used together with a GVN (Global Value Numbering) pass, e.g. GCM allows GVN to not care about code placement during elimination of redundant operations, so you'll probably want to implement a high-level GVN pass as well. I think it's possible to implement GVN-GCM on the GLSL or TGSI level, but I suspect it would require a lot more effort than the implementation of these passes in my branch did, and would be less efficient.

> Just speculating, what would it take to make those passes run on the LLVM Machine Instruction representation instead of your own representation?

The main difference between the IRs is the representation of control flow: r600-sb relies on the fact that the r600 arch doesn't have arbitrary control flow, which renders CFGs superfluous.
Implementation of these passes on CFGs will be more complicated, it will also require the computation of dominance frontiers, loops detection and analysis, etc. On the r600-sb's IR these passes are greatly simplified. Regarding the GCM, original algorithm as described in that pdf works on the CFG, so it shouldn't be hard to implement in LLVM, but I'm not sure how it will fit into the LLVM infrastructure. LLVM has GVN-PRE, LICM and other passes that together do basically the same thing as GVN-GCM, so if you implement it, you might want to get rid of LLVM's own passes that duplicate the same functionality, and I'm not sure if this would be easy, possibly there are some interdependencies etc. Also I saw mentions of some plans (e.g. [1],[2]) regarding the implementation of global code motion in LLVM, looks like there is already some work in progress. Vadim [1] http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120709/146206.html [2] http://markmail.org/message/2td3fnnggk6oripp#query:+page:1+mid:2td3fnnggk6oripp+state:results Christian. Vadim [1] http://cgit.freedesktop.org/~vadimg/mesa/tree/src/gallium/drivers/r600/sb/notes.markdown?h=r600-sb [2
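To make the integer division remark above more concrete, here is a simplified C sketch. It is not the instruction sequence r600g actually emits; it only assumes that a division expands into a reciprocal-like step that depends on the divisor alone, followed by a multiply and a correction that depend on the dividend. All function names are made up for the example.

```c
#include <stdint.h>

/* Loop-invariant part: depends only on the divisor d (d must be non-zero). */
static uint64_t recip_u32(uint32_t d)
{
    return (UINT64_C(1) << 32) / d; /* floor(2^32 / d) */
}

/* Per-iteration part: multiply by the reciprocal, then correct by at most one. */
static uint32_t div_by_recip(uint32_t n, uint32_t d, uint64_t m)
{
    uint32_t q = (uint32_t)(((uint64_t)n * m) >> 32); /* estimate, never too big */

    if (n - q * d >= d) /* the estimate can be one too small */
        q++;
    return q;
}

/* Sum of i / d for i in [0, count); the reciprocal is computed once, outside the loop. */
static uint32_t sum_quotients(uint32_t count, uint32_t d)
{
    uint64_t m = recip_u32(d); /* hoisted: the same for every iteration */
    uint32_t sum = 0;
    uint32_t i;

    for (i = 0; i < count; i++)
        sum += div_by_recip(i, d, m);
    return sum;
}
```

A pass that only sees the single high-level operation i / d has nothing to move, because the quotient changes with i. After the expansion, the reciprocal computation is an ordinary loop-invariant instruction, and a code motion pass working on target instructions can place it outside the loop.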
Re: [Mesa-dev] r600g: status of the r600-sb branch
On 04/20/2013 03:38 PM, Christian König wrote: Am 20.04.2013 13:12, schrieb Vadim Girlin: On 04/20/2013 01:42 PM, Christian König wrote: Am 19.04.2013 18:50, schrieb Vadim Girlin: On 04/19/2013 08:35 PM, Christian König wrote: Hey Vadim, Am 19.04.2013 18:18, schrieb Vadim Girlin: [SNIP] In theory, yes, some optimizations in this branch are typically used on the earlier compilation stages, not on the target machine code. On the other hand, there are some differences that might make it harder, e.g. many algorithms require SSA form, and though it's possible to do similar optimizations without SSA, it would be hard to implement. Also I wanted to support both default backend and llvm backend for increased testing coverage and to be able to compare the efficiency of the algorithms in my experiments etc. Yeah I know, missing an SSA implementation is also something that always bothered me a bit with both TGSI and GLSL (while I haven't done much with GLSL, so maybe I misjudge here). Can you name the different algorithms used? There is a short description of the algorithms and passes in the notes.markdown file [1] in that branch, there are also links in the end to the full description of some algorithms, though some of them were modified/adapted for this branch. It's not a strict prerequisite, but I think we both agree that doing things like LICM on R600 bytecode isn't the best idea over all (when doing it on GLSL would be beneficial for all drivers not only r600). In fact there is no special LICM pass, it's done by the GCM (Global Code Motion, [2]), which probably could be also called global scheduler. In fact in my branch this pass is combined with some hw-specific scheduling logic, e.g. grouping fetch/alu instructions to reduce clause type switching in the code and the number of required CF instructions, potentially it can also schedule clauses to expose more parallelism with the BARRIER bit usage. Yeah I already thought that you're using something like this. 
>>> On one hand that is really good, because it is specialized and so produces really optimal code for the r600 target. But on the other hand it's bad, because it is specialized and so produces really optimal code ONLY on the r600 target.
>>
>> I think such a pass on a higher level (GLSL IR or TGSI) would at least need some callbacks or caps to be tunable for the target. Anyway, the result of the GCM pass is affected by the CFG structure, so when the target applies e.g. if-conversion or any other target-specific control flow optimization, you might want to apply a similar pass again on the target instruction level for better results, and then the previous pass on the higher-level IR does not look very useful.
>>
>> Also, there are some high-level operations that are translated to a bunch of target instructions, e.g. integer division on r600. A high-level pass can't hoist i/5 (where i is the loop counter) out of the loop, but after translation to target instructions it's possible to hoist some of the resulting instructions, producing more efficient code.
>>
>> One more point is that GCM achieves the best efficiency when used together with a GVN (Global Value Numbering) pass, e.g. GCM allows GVN to not care about code placement during elimination of redundant operations, so you'll probably want to implement a high-level GVN pass as well. I think it's possible to implement GVN-GCM on the GLSL or TGSI level, but I suspect it would require a lot more effort than the implementation of these passes in my branch did, and would be less efficient.
>>
>>> Just speculating, what would it take to make those passes run on the LLVM Machine Instruction representation instead of your own representation?
>>
>> The main difference between the IRs is the representation of control flow: r600-sb relies on the fact that the r600 arch doesn't have arbitrary control flow, which renders CFGs superfluous.
>> Implementation of these passes on CFGs will be more complicated; it will also require the computation of dominance frontiers, loop detection and analysis, etc. On r600-sb's IR these passes are greatly simplified. Regarding GCM, the original algorithm as described in that pdf works on the CFG, so it shouldn't be hard to implement in LLVM, but I'm not sure how it will fit into the LLVM infrastructure. LLVM has GVN-PRE, LICM and other passes that together do basically the same thing as GVN-GCM, so if you implement it, you might want to get rid of LLVM's own passes that duplicate the same functionality, and I'm not sure if this would be easy, possibly there are some interdependencies etc. Also I saw mentions of some plans (e.g. [1],[2]) regarding the implementation of global code motion in LLVM, looks like there is already some work in progress.

> Oh, I wasn't talking about replacing any LLVM passes, more like extending them to provide the same amount of functionality. Also I hadn't had LLVM IR in mind while writing this, but more the machine instruction representation they use. Well you have quite a lot of C
Re: [Mesa-dev] r600g: status of the r600-sb branch
On 04/20/2013 07:05 PM, Henri Verbeet wrote: On 19 April 2013 18:01, Vadim Girlin vadimgir...@gmail.com wrote: The choice of C++ (unlike in my previous branch that used C) was mostly driven by the fact that optimization algorithms usually deal with a lot of different complex data structures, containers, etc, and C++ allows to isolate implementation of all such things in separate and easily replaceable classes and concentrate on the logic, making the code more clean and readable. I'm sure it would be good fun to have a discussion about the relative merits of C and C++, though I think I've seen enough actual C++ that you're not going to convince me it's the better language. I never wanted to convince you that C++ is better language, I just wanted to explain why I decided to switch from C to C++ in this particular case. However, I don't think that should be the main consideration. It's probably more important to consider what current and potential new contributors prefer, and on Linux, particularly for the more low-level stuff, I suspect that pretty much means C. Well, it may be considered as a low-level stuff because it's a part of the driver. On the other hand, I'd rather think of it as a part of the compiler, and compilers (especially optimization algorithms) don't really look like a low-level stuff to me. Depends on the definition of the low-level stuff though. To name a few examples, we can look at the compilers/optimizing backends used by mesa/gallium: GLSL compiler (written in C++). LLVM (written in C++), backends for nvidia drivers (written in C++)... Vadim I haven't tried to keep it as a series of independent patches because during the development most changes were pretty intrusive and introduced new features, some parts were seriously reworked/rewritten more than one time, requiring changes in other parts, especially when intermediate representation of the code was changed. 
It was usually easier for me to simply fix the new regressions in the new code than to revert any changes and lose new features, so bisection wouldn't be very helpful anyway. That's why I didn't even try to keep the history. Anyway most of the code in the branch is new, so I don't think that the history of the patches that rewrite the same code few times during a development would make it more readable than simply reading the final code. I think I'm just going to disagree there. (But of course that's all just my personal opinion, which probably doesn't carry a lot of weight at the moment.)
[Mesa-dev] r600g: status of the r600-sb branch
Hi, In the previous status update I said that the r600-sb branch is not ready to be merged yet, but recently I've done some cleanups and reworks, and though I haven't finished everything that I planned initially, I think now it's in a better state and may be considered for merging. I'm interested to know if the people think that merging of the r600-sb branch makes sense at all. I'll try to explain here why it makes sense to me. Although I understand that the development of llvm backend is a primary goal for the r600g developers, it's a complicated process and may require quite some time to achieve good results regarding the shader/compiler performance, and at the same time this branch already works and provides good results in many cases. That's why I think it makes sense to merge this branch as a non-default backend at least as a temporary solution for shader performance problems. We can always get rid of it if it becomes too much a maintenance burden or when llvm backend catches up in terms of shader performance and compilation speed/overhead. Regarding the support and maintenance of this code, I'll try to do my best to fix possible issues, and so far there are no known unfixed issues. I tested it with many apps on evergreen and fixed all issues with other chips that were reported to me on the list or privately after the last status announce. There are no piglit regressions on evergreen when this branch is used with both default and llvm backends. This code was intentionally separated as much as possible from the other parts of the driver, basically there are just two functions used from r600g, and the shader code is passed to/from r600-sb as a hardware bytecode that is not going to change. I think it won't require any modifications at all to keep it in sync with the most changes in r600g. Some work might be required though if we'll want to add support for the new hw features that are currently unused, e.g. 
geometry shaders, new instruction types for compute shaders, etc, but I think I'll be able to catch up when it's implemented in the driver and default or llvm backend. E.g. this branch already works for me on evergreen with some simple OpenCL kernels, including bfgminer where it increases performance of the kernel compiled with llvm backend by more than 20% for me. Besides the performance benefits, I think that alternative backend also might help with debugging of the default or llvm backend, in some cases it helped me by exposing the bugs that are not very obvious otherwise, e.g. it may be hard to compare the dumps from default and llvm backend to spot the regression because they are too different, but after processing both shaders with r600-sb the code is usually transformed to some more common form, and often this makes it easier to compare and find the differences in shader logic. One additional feature that might help with llvm backend debugging is the disassembler that works on the hardware bytecode instead of the internal r600g bytecode structs. This results in the more readable shader dumps for instructions passed in native hw encoding from llvm backend. I think this also can help to catch more potential bugs related to bytecode building in r600g/llvm. Currently r600-sb uses its bytecode disassembler for all shader dumps, including the fetch shaders, even when optimization is not enabled. Basically it can replace r600_bytecode_disasm and related code completely. Below are some quick benchmarks for shader performance and compilation time, to demonstrate that currently r600-sb might provide better performance for users, at least in some cases. 
As an example of the shaders with good optimization opportunities I used the application that computes and renders atmospheric scattering effects, it was mentioned in the previous thread: http://lists.freedesktop.org/archives/mesa-dev/2013-February/034682.html

Here are current results for that app (Main.noprecompute, frames per second) with default backend, default backend + r600-sb, and llvm backend:

    def    def+sb    llvm
    240    590       248

Another quick benchmark is an OpenCL kernel performance with bfgminer (megahash/s):

    llvm    llvm+sb
    68      87

One more benchmark is for compilation speed/overhead - I used two piglit tests, first compiles a lot of shaders (IIRC more than thousand), second compiles a few huge shaders. Result is a test run time in seconds, this includes not only the compilation time but anyway shows the difference:

                        def     def+sb    llvm
    tfb max-varyings    10      14        53
    fp-long-alu         0.17    0.38      0.68

This is especially important for GL apps, because longer compilation time results in the more significant freezes in the games etc. As for the quality of the compiled code in this test, of course generally llvm backend is already able to produce better code in some
Re: [Mesa-dev] r600g: status of the r600-sb branch
On 04/19/2013 07:23 PM, Henri Verbeet wrote: On 19 April 2013 16:48, Vadim Girlin vadimgir...@gmail.com wrote: In the previous status update I said that the r600-sb branch is not ready to be merged yet, but recently I've done some cleanups and reworks, and though I haven't finished everything that I planned initially, I think now it's in a better state and may be considered for merging. I'm interested to know if the people think that merging of the r600-sb branch makes sense at all. I'll try to explain here why it makes sense to me. Personally, I'd be in favour of merging this at some point. While I haven't exactly done extensive testing or benchmarking with the branch, the things I did try at least worked correctly, so I'd say that's a good start at least. I'm afraid I can't claim extensive review either, but I guess the most obvious things I don't like about it are that it's C++, and spread over a large number of 1000 line files. Similarly, I don't really see the point of having a header file for just about each .cpp file. One for private interfaces and one for the public interface should probably be plenty. I thought about that, but I'm just not sure what would be a preferred way. I agree that a lot of small files don't look very good, on the other hand it makes all classes better separated and readable, that's why I was not sure which way is best. Of course I can merge some files together if it's preferable. I'm not quite sure how others feel about that, although I suspect I'm not alone in at least the preference of C over C++. The choice of C++ (unlike in my previous branch that used C) was mostly driven by the fact that optimization algorithms usually deal with a lot of different complex data structures, containers, etc, and C++ allows to isolate implementation of all such things in separate and easily replaceable classes and concentrate on the logic, making the code more clean and readable. 
I also suspect it would help if this was some kind of logical, bisectable series of patches instead of a single commit that adds 18k+ lines. I haven't tried to keep it as a series of independent patches because during the development most changes were pretty intrusive and introduced new features, some parts were seriously reworked/rewritten more than one time, requiring changes in other parts, especially when intermediate representation of the code was changed. It was usually easier for me to simply fix the new regressions in the new code than to revert any changes and lose new features, so bisection wouldn't be very helpful anyway. That's why I didn't even try to keep the history. Anyway most of the code in the branch is new, so I don't think that the history of the patches that rewrite the same code few times during a development would make it more readable than simply reading the final code. Vadim
Re: [Mesa-dev] r600g: status of the r600-sb branch
On 04/19/2013 07:13 PM, Christian König wrote:
> Hi Vadim,
>
> from your description it seems to be a post processing stage working on the bytecode of the shaders and additional to that is quite separated from the rest of the driver.

Yes, currently it's more like a post-processing stage, though on the other hand the only missing thing to consider it as a complete backend is an initial TGSI translator (that is, a sort of instruction selection pass). Basically it's exactly what default backend in the r600g does. I thought about writing direct translator from TGSI to my IR, but it would require some time and benefits aren't very clear, except the slightly reduced translation time. It's easier to rely on the default backend for that, and also it simplifies debugging by providing the ability to see and compare both the source (after default backend) and optimized bytecode.

> If that's the case then I don't really see a reason why we shouldn't merge it, but at least at the beginning it should probably be disabled by default.

Yes, I agree that it's better to make it disabled as default, it's currently enabled in my branch just to simplify testing, but I'll change that in case if we'll merge the branch.

> On the other hand we should question if there are any optimizations in there that could be done on earlier stages, something like on the GLSL level for example?

In theory, yes, some optimizations in this branch are typically used on the earlier compilation stages, not on the target machine code. On the other hand, there are some differences that might make it harder, e.g. many algorithms require SSA form, and though it's possible to do similar optimizations without SSA, it would be hard to implement. Also I wanted to support both default backend and llvm backend for increased testing coverage and to be able to compare the efficiency of the algorithms in my experiments etc.

Vadim

> Cheers,
> Christian.
Am 19.04.2013 16:48, schrieb Vadim Girlin: Hi, In the previous status update I said that the r600-sb branch is not ready to be merged yet, but recently I've done some cleanups and reworks, and though I haven't finished everything that I planned initially, I think now it's in a better state and may be considered for merging. I'm interested to know if the people think that merging of the r600-sb branch makes sense at all. I'll try to explain here why it makes sense to me. Although I understand that the development of llvm backend is a primary goal for the r600g developers, it's a complicated process and may require quite some time to achieve good results regarding the shader/compiler performance, and at the same time this branch already works and provides good results in many cases. That's why I think it makes sense to merge this branch as a non-default backend at least as a temporary solution for shader performance problems. We can always get rid of it if it becomes too much a maintenance burden or when llvm backend catches up in terms of shader performance and compilation speed/overhead. Regarding the support and maintenance of this code, I'll try to do my best to fix possible issues, and so far there are no known unfixed issues. I tested it with many apps on evergreen and fixed all issues with other chips that were reported to me on the list or privately after the last status announce. There are no piglit regressions on evergreen when this branch is used with both default and llvm backends. This code was intentionally separated as much as possible from the other parts of the driver, basically there are just two functions used from r600g, and the shader code is passed to/from r600-sb as a hardware bytecode that is not going to change. I think it won't require any modifications at all to keep it in sync with the most changes in r600g. Some work might be required though if we'll want to add support for the new hw features that are currently unused, e.g. 
geometry shaders, new instruction types for compute shaders, etc, but I think I'll be able to catch up when it's implemented in the driver and default or llvm backend. E.g. this branch already works for me on evergreen with some simple OpenCL kernels, including bfgminer where it increases performance of the kernel compiled with llvm backend by more than 20% for me. Besides the performance benefits, I think that alternative backend also might help with debugging of the default or llvm backend, in some cases it helped me by exposing the bugs that are not very obvious otherwise, e.g. it may be hard to compare the dumps from default and llvm backend to spot the regression because they are too different, but after processing both shaders with r600-sb the code is usually transformed to some more common form, and often this makes it easier to compare and find the differences in shader logic. One additional feature that might help with llvm backend debugging is the disassembler that works on the hardware bytecode instead of the internal r600g bytecode
Re: [Mesa-dev] r600g: status of the r600-sb branch
On 04/19/2013 08:35 PM, Christian König wrote:
> Hey Vadim,
>
> On 19.04.2013 at 18:18, Vadim Girlin wrote:
>> [SNIP]
>>
>> In theory, yes, some optimizations in this branch are typically used on the earlier compilation stages, not on the target machine code. On the other hand, there are some differences that might make it harder, e.g. many algorithms require SSA form, and though it's possible to do similar optimizations without SSA, it would be hard to implement. Also I wanted to support both default backend and llvm backend for increased testing coverage and to be able to compare the efficiency of the algorithms in my experiments etc.
>
> Yeah I know, missing an SSA implementation is also something that always bothered me a bit with both TGSI and GLSL (while I haven't done much with GLSL, so maybe I misjudge here).
>
> Can you name the different algorithms used?

There is a short description of the algorithms and passes in the notes.markdown file [1] in that branch. There are also links at the end to the full descriptions of some algorithms, though some of them were modified/adapted for this branch.

> It's not a strict prerequisite, but I think we both agree that doing things like LICM on R600 bytecode isn't the best idea over all (when doing it on GLSL would be beneficial for all drivers not only r600).

In fact there is no special LICM pass; it's done by GCM (Global Code Motion, [2]), which could probably also be called a global scheduler. In my branch this pass is combined with some hw-specific scheduling logic, e.g. grouping fetch/alu instructions to reduce clause type switching in the code and the number of required CF instructions. Potentially it can also schedule clauses to expose more parallelism with the BARRIER bit usage.

Vadim

[1] http://cgit.freedesktop.org/~vadimg/mesa/tree/src/gallium/drivers/r600/sb/notes.markdown?h=r600-sb
[2] http://www.cs.washington.edu/education/courses/cse501/06wi/reading/click-pldi95.pdf

> Regards,
> Christian.
Re: [Mesa-dev] [PATCH v2] r600g: Workaround for a hardware bug with nested loops on Cayman
On 04/15/2013 11:22 PM, Martin Andersson wrote:
> There is a hardware bug on Cayman where a BREAK/CONTINUE followed by
> LOOP_STARTxxx for nested loops may put the branch stack into a state
> such that ALU_PUSH_BEFORE doesn't work as expected. Workaround this
> by replacing the ALU_PUSH_BEFORE with a PUSH + ALU
>
> Fixes piglit tests EXT_transform_feedback/order*
>
> v2: Use existing loop count and improve comment
> ---
>  src/gallium/drivers/r600/r600_shader.c | 17 ++--
>  1 file changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
> index 6dbca50..f4398fd 100644
> --- a/src/gallium/drivers/r600/r600_shader.c
> +++ b/src/gallium/drivers/r600/r600_shader.c
> @@ -5490,7 +5490,7 @@ static int tgsi_opdst(struct r600_shader_ctx *ctx)
>  	return 0;
>  }
>
> -static int emit_logic_pred(struct r600_shader_ctx *ctx, int opcode)
> +static int emit_logic_pred(struct r600_shader_ctx *ctx, int opcode, int alu_type)
>  {
>  	struct r600_bytecode_alu alu;
>  	int r;
> @@ -5510,7 +5510,7 @@ static int emit_logic_pred(struct r600_shader_ctx *ctx, int opcode)
>
>  	alu.last = 1;
>
> -	r = r600_bytecode_add_alu_type(ctx->bc, &alu, CF_OP_ALU_PUSH_BEFORE);
> +	r = r600_bytecode_add_alu_type(ctx->bc, &alu, alu_type);
>  	if (r)
>  		return r;
>  	return 0;
> @@ -5730,7 +5730,18 @@ static void break_loop_on_flag(struct r600_shader_ctx *ctx, unsigned fc_sp)
>
>  static int tgsi_if(struct r600_shader_ctx *ctx)
>  {
> -	emit_logic_pred(ctx, ALU_OP2_PRED_SETNE_INT);
> +	int alu_type = CF_OP_ALU_PUSH_BEFORE;
> +
> +	/* There is a hardware bug on Cayman where a BREAK/CONTINUE followed by
> +	 * LOOP_STARTxxx for nested loops may put the branch stack into a state
> +	 * such that ALU_PUSH_BEFORE doesn't work as expected. Workaround this
> +	 * by replacing the ALU_PUSH_BEFORE with a PUSH + ALU */
> +	if (ctx->bc->chip_class == CAYMAN && ctx->bc->stack.loop > 1) {
> +		r600_bytecode_add_cfinst(ctx->bc, CF_OP_PUSH);

Oh, it seems I overlooked a potential issue here: the jump address for PUSH is not set properly, so I guess there will be GPU lockups in case of a jump. Ideally we could set it to jump over the whole IF-ENDIF block if there are no active threads, but I think that's a rare case, so the simplest fix is to avoid computing the address and set the jump address for PUSH to the next instruction, like this:

	ctx->bc->cf_last->cf_addr = ctx->bc->cf_last->id + 2;

We can improve it later, but anyway ALU_PUSH_BEFORE never jumped at all, so I think at least we won't have any serious performance regressions.

Everything else looks ok, so I think I'll commit your patch with this change soon if there are no objections.

Vadim

> +		alu_type = CF_OP_ALU;
> +	}
> +
> +	emit_logic_pred(ctx, ALU_OP2_PRED_SETNE_INT, alu_type);
>
>  	r600_bytecode_add_cfinst(ctx->bc, CF_OP_JUMP);
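The decision discussed in this message can be sketched in isolation. The following is only an illustration under stated assumptions: the enums and structs are hypothetical stand-ins for the r600g types (only the fields mentioned in the thread are modelled), and `plan_if` and `set_push_fallthrough` are made-up helper names, not functions from the driver. The `id + 2` offset is taken from the message above, which describes it as the address of the next instruction.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the r600g types; not the driver's definitions. */
enum chip_class { EVERGREEN, CAYMAN };
enum cf_op { CF_OP_ALU, CF_OP_ALU_PUSH_BEFORE, CF_OP_PUSH };

struct cf_inst {
	enum cf_op op;
	unsigned id;      /* position of this CF instruction */
	unsigned cf_addr; /* jump target */
};

struct if_plan {
	bool emit_push;      /* emit a separate PUSH before the ALU clause */
	enum cf_op alu_type; /* CF op used for the predicate ALU clause */
};

/* Choose how the predicate of an IF is emitted: the combined
 * ALU_PUSH_BEFORE by default, or PUSH + ALU inside nested loops on Cayman. */
static struct if_plan plan_if(enum chip_class chip, unsigned loop_depth)
{
	struct if_plan plan = { false, CF_OP_ALU_PUSH_BEFORE };

	if (chip == CAYMAN && loop_depth > 1) {
		plan.emit_push = true;
		plan.alu_type = CF_OP_ALU;
	}
	return plan;
}

/* Point a separate PUSH at the following instruction, mirroring the
 * "cf_addr = id + 2" suggestion from the review. */
static void set_push_fallthrough(struct cf_inst *push)
{
	assert(push->op == CF_OP_PUSH);
	push->cf_addr = push->id + 2;
}
```

With these stand-ins, `plan_if(CAYMAN, 2)` asks for a separate PUSH, while `plan_if(CAYMAN, 1)` and any Evergreen case keep ALU_PUSH_BEFORE.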
Re: [Mesa-dev] [PATCH] r600g: Workaround for a nested loop bug on Cayman
On 04/15/2013 10:52 AM, Martin Andersson wrote: On Mon, Apr 15, 2013 at 1:09 AM, Vadim Girlin vadimgir...@gmail.com wrote: On 04/15/2013 01:05 AM, Martin Andersson wrote: There is a bug where a BREAK/CONTINUE followed by LOOP_STARTxxx for nested loops may put the branch stack into a state such that ALU_PUSH_BEFORE doesn't work as expected. Workaround this by replacing the ALU_PUSH_BEFORE with a PUSH + ALU for nested loops. Fixes piglit tests: spec/!OpenGL 1.1/read-front spec/EXT_transform_feedback/order* spec/glsl-1.40/uniform_buffer/fs-struct-pad No piglit regressions. --- src/gallium/drivers/r600/r600_shader.c | 33 ++--- 1 file changed, 30 insertions(+), 3 deletions(-) diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c index 6dbca50..aee011e 100644 --- a/src/gallium/drivers/r600/r600_shader.c +++ b/src/gallium/drivers/r600/r600_shader.c @@ -252,6 +252,7 @@ static int tgsi_endif(struct r600_shader_ctx *ctx); static int tgsi_bgnloop(struct r600_shader_ctx *ctx); static int tgsi_endloop(struct r600_shader_ctx *ctx); static int tgsi_loop_brk_cont(struct r600_shader_ctx *ctx); +static bool need_cayman_loop_bug_workaround(struct r600_shader_ctx *ctx); /* * bytestream - r600 shader @@ -5490,7 +5491,7 @@ static int tgsi_opdst(struct r600_shader_ctx *ctx) return 0; } -static int emit_logic_pred(struct r600_shader_ctx *ctx, int opcode) +static int emit_logic_pred(struct r600_shader_ctx *ctx, int opcode, int alu_type) { struct r600_bytecode_alu alu; int r; @@ -5510,7 +5511,7 @@ static int emit_logic_pred(struct r600_shader_ctx *ctx, int opcode) alu.last = 1; - r = r600_bytecode_add_alu_type(ctx-bc, alu, CF_OP_ALU_PUSH_BEFORE); + r = r600_bytecode_add_alu_type(ctx-bc, alu, alu_type); if (r) return r; return 0; @@ -5730,7 +5731,20 @@ static void break_loop_on_flag(struct r600_shader_ctx *ctx, unsigned fc_sp) static int tgsi_if(struct r600_shader_ctx *ctx) { - emit_logic_pred(ctx, ALU_OP2_PRED_SETNE_INT); + int alu_type = 
CF_OP_ALU_PUSH_BEFORE; + + /* + There is a bug where a BREAK/CONTINUE followed by LOOP_STARTxxx for nested + loops may put the branch stack into a state such that ALU_PUSH_BEFORE + doesn't work as expected. Workaround this by replacing the ALU_PUSH_BEFORE + with a PUSH + ALU for nested loops. +*/ + if (ctx-bc-chip_class == CAYMAN need_cayman_loop_bug_workaround(ctx)) { We already have current loop level for the stack size computation, see r600_bytecode::stack, so I think need_cayman_loop_bug_workaround call may be replaced with ctx-bc-stack.loop 1, if I'm not missing something. Ok, will try that tonight. Should I add a comment that it is a hardware bug? Yes, you might want to clarify that it's a hw bug on cayman, though I think it's OK either way. Also git complains about some trailing spaces in your patch. With that change for condition (and removal of the need_cayman_... function that becomes unused) and fixed whitespace issues, the patch looks good to me. Vadim Vadim + r600_bytecode_add_cfinst(ctx-bc, CF_OP_PUSH); + alu_type = CF_OP_ALU; + } + + emit_logic_pred(ctx, ALU_OP2_PRED_SETNE_INT, alu_type); r600_bytecode_add_cfinst(ctx-bc, CF_OP_JUMP); @@ -5834,6 +5848,19 @@ static int tgsi_loop_brk_cont(struct r600_shader_ctx *ctx) return 0; } +static bool need_cayman_loop_bug_workaround(struct r600_shader_ctx *ctx) +{ + unsigned int fscp; + int num_loops = 0; + for (fscp = ctx-bc-fc_sp; fscp 0; fscp--) + { + if (FC_LOOP == ctx-bc-fc_stack[fscp].type) + ++num_loops; + } + + return num_loops = 2; +} + static int tgsi_umad(struct r600_shader_ctx *ctx) { struct tgsi_full_instruction *inst = ctx-parse.FullToken.FullInstruction; ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] gallium: handle drirc disable_glsl_line_continuations option
Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---
 src/gallium/include/state_tracker/st_api.h          | 1 +
 src/gallium/state_trackers/dri/common/dri_context.c | 2 ++
 src/gallium/state_trackers/dri/common/dri_screen.c  | 3 ++-
 src/mesa/state_tracker/st_extensions.c              | 3 +++
 4 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/gallium/include/state_tracker/st_api.h b/src/gallium/include/state_tracker/st_api.h
index 9f3d2a1..52c9dc0 100644
--- a/src/gallium/include/state_tracker/st_api.h
+++ b/src/gallium/include/state_tracker/st_api.h
@@ -240,6 +240,7 @@ struct st_visual
 struct st_config_options
 {
    boolean force_glsl_extensions_warn;
+   boolean disable_glsl_line_continuations;
 };

 /**
diff --git a/src/gallium/state_trackers/dri/common/dri_context.c b/src/gallium/state_trackers/dri/common/dri_context.c
index 49cd794..58a710d 100644
--- a/src/gallium/state_trackers/dri/common/dri_context.c
+++ b/src/gallium/state_trackers/dri/common/dri_context.c
@@ -54,6 +54,8 @@ static void dri_fill_st_options(struct st_config_options *options,
 {
    options->force_glsl_extensions_warn =
       driQueryOptionb(optionCache, "force_glsl_extensions_warn");
+   options->disable_glsl_line_continuations =
+      driQueryOptionb(optionCache, "disable_glsl_line_continuations");
 }

 GLboolean
diff --git a/src/gallium/state_trackers/dri/common/dri_screen.c b/src/gallium/state_trackers/dri/common/dri_screen.c
index 2f525a2..fd2971c 100644
--- a/src/gallium/state_trackers/dri/common/dri_screen.c
+++ b/src/gallium/state_trackers/dri/common/dri_screen.c
@@ -66,6 +66,7 @@ PUBLIC const char __driConfigOptions[] =

    DRI_CONF_SECTION_DEBUG
       DRI_CONF_FORCE_GLSL_EXTENSIONS_WARN(false)
+      DRI_CONF_DISABLE_GLSL_LINE_CONTINUATIONS(false)
    DRI_CONF_SECTION_END

    DRI_CONF_SECTION_MISCELLANEOUS
@@ -75,7 +76,7 @@ PUBLIC const char __driConfigOptions[] =

 #define false 0

-static const uint __driNConfigOptions = 11;
+static const uint __driNConfigOptions = 12;

 static const __DRIconfig **
 dri_fill_in_modes(struct dri_screen *screen)
diff --git a/src/mesa/state_tracker/st_extensions.c b/src/mesa/state_tracker/st_extensions.c
index f986480..ffb9f7e 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -714,6 +714,9 @@ void st_init_extensions(struct st_context *st)
    if (st->options.force_glsl_extensions_warn)
       ctx->Const.ForceGLSLExtensionsWarn = 1;

+   if (st->options.disable_glsl_line_continuations)
+      ctx->Const.DisableGLSLLineContinuations = 1;
+
    ctx->Const.MinMapBufferAlignment =
       screen->get_param(screen, PIPE_CAP_MIN_MAP_BUFFER_ALIGNMENT);
    if (ctx->Const.MinMapBufferAlignment >= 64) {
--
1.8.1.4
Re: [Mesa-dev] [PATCH] r600g: Fix UMAD on Cayman
On 04/13/2013 09:54 PM, Martin Andersson wrote: On Sat, Apr 13, 2013 at 4:23 AM, Vadim Girlin vadimgir...@gmail.com wrote: On 04/12/2013 11:36 PM, Martin Andersson wrote: I have made some progress with this issue. Vadim, I did as you suggested and tried to mimic the output from the shader analyser tool. I used your patch as a base and then tried various ways to see what would work. After many tries (and lockups) I did managed to get the ext_transform_feedback/order test to pass. It is a very ugly patch but it should illustrate what the problem (and potential solution) is. Your test program fails however because explicit break statements do not work. It should be possible to use the same code for the explicit breaks as for the implicit loop break.The reason it does not is that I detect the implicit break with a hack and it does notwork for explicit breaks. The problem is that I need to detect the break statement when creating the corresponding if statement. So that I can treat it differently than other regular if statements. Anyone knows how I could do that, or is this the wrong approach? It doesn't work with my test app because IF/ENDIF blocks with BRK may contain other code, so you can't simply throw away IF/ENDIF making that code execute unconditionally. Yeah my hack is not an viable option. By the way, shader analyzer in some cases also produces the code with JUMP/POP around PRED_SET-BREAK, though I'm not sure if that code will really work as expected with catalyst. Possibly we're simply missing something in the hardware configuration. Also there is one thing that I didn't take into account in my initial patch - r600g converts ALU followed by POP to ALU_POP_AFTER and this might explain why my initial patch doesn't work. Possibly if we prevent that optimization for ALU containing PRED_SET-BREAK and leave separate POP, it might be enough to make it work. 
I'm attaching the additional patch that will force POP to be a separate instruction in this case, please test it (on top of the my first patch). This would be at least not very intrusive. No, that patch did not help either. If this won't help, then I think we should understand what exactly we are trying to fix before implementing any big changes, possibly there is a better solution or at least a more clean workaround. In the worst case we can return to your approach and improve it to handle other cases. I'm starting to think that there is nothing wrong with the shader compiler. It seems to me that a push, pop inside a nested loop clears the break status on a thread. shift_reg = 1u; count = 0u; while (true) { if (x == 1u) break; while (true) { if (x != 1u) count = 10u; if (x == 1u) count = 20u; break; } shift_reg = 2u; break; } input: x == 0 actual ouput: shift_reg == 2, count == 10 expected output: shift_reg == 2, count == 10 input: x == 1 actual ouput: shift_reg == 2, count == 20 expected output: shift_reg == 1, count == 0 If I swap the if statements in the inner loop I get different results. shift_reg = 1u; count = 0u; while (true) { if (x == 1u) break; while (true) { if (x == 1u) count = 20u; if (x != 1u) count = 10u; break; } shift_reg = 2u; break; } input: x == 0 actual ouput: shift_reg == 2, count == 10 expected output: shift_reg == 2, count == 10 input: x == 1 actual ouput: shift_reg == 2, count == 0 expected output: shift_reg == 1, count == 0 I tested both cases on mesa master and mesa master + Vadims two patches with the same results. This turned out to be a known issue with cayman: BREAK/CONTINUE followed by LOOP_STARTxxx for nested loop may put the branch stack into the state such that ALU_PUSH_BEFORE doesn't work as expected. It seems the simplest workaround is either to avoid ALU_PUSH_BEFORE in nested loops completely or to replace it with separate PUSH and ALU. 
We can check if we actually have BREAK/CONTINUE in the outer loop before LOOP_START for the inner loop, but I think it will be true in most cases, so the simplest fix for r600g is to replace all ALU_PUSH_BEFORE with PUSH + ALU in the nested loops on cayman. Vadim //Martin Vadim //Martin On Thu, Apr 11, 2013 at 5:31 PM, Vadim Girlin vadimgir...@gmail.com wrote: On 04/11/2013 02:08 AM, Marek Olšák wrote: Here's the output: creating vs ... shader compilation status: OK creating fs ... shader compilation status: OK thread #0 (0;0) : ref = 16608 thread #1 (1;0) : ref = 27873 thread #2 (0;1) : ref = 16608 thread #3 (1;1) : ref = 27877 results: thread 0 (0, 0): expected = 16608, observed = 27876, FAIL thread 1 (1, 0): expected = 27873, observed = 27873, OK thread 2 (0, 1): expected = 16608, observed = 27876, FAIL thread 3 (1, 1): expected = 27877, observed = 27877, OK Thanks. According to these results, it looks like
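The "expected output" values quoted earlier in this message for the first GLSL variant can be reproduced on the CPU. The following is a plain C transcription of that loop nest, offered as a sketch only: `struct result`, `run_nested` and `check_expected` are hypothetical names introduced here, and the code says nothing about how the hardware executes the shader; it only shows what a correct scalar evaluation yields.

```c
#include <assert.h>

struct result {
	unsigned shift_reg;
	unsigned count;
};

/* First variant from the message: "x != 1u" is tested before "x == 1u"
 * in the inner loop. */
static struct result run_nested(unsigned x)
{
	struct result r = { 1u, 0u };

	while (1) {
		if (x == 1u)
			break;              /* outer break: nothing below runs */
		while (1) {
			if (x != 1u)
				r.count = 10u;
			if (x == 1u)
				r.count = 20u;
			break;
		}
		r.shift_reg = 2u;
		break;
	}
	return r;
}

/* Compare against the expected values listed in the message. */
static void check_expected(void)
{
	struct result a = run_nested(0u);
	struct result b = run_nested(1u);

	assert(a.shift_reg == 2u && a.count == 10u); /* input x == 0 */
	assert(b.shift_reg == 1u && b.count == 0u);  /* input x == 1 */
}
```

For x == 1 the outer break is taken first, so neither assignment in the inner loop should be visible. The "actual" Cayman results in the message differ from this reference in exactly that case.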
Re: [Mesa-dev] [PATCH] r600g: Workaround for a nested loop bug on Cayman
On 04/15/2013 01:05 AM, Martin Andersson wrote:
> There is a bug where a BREAK/CONTINUE followed by LOOP_STARTxxx for nested
> loops may put the branch stack into a state such that ALU_PUSH_BEFORE
> doesn't work as expected. Workaround this by replacing the ALU_PUSH_BEFORE
> with a PUSH + ALU for nested loops.
>
> Fixes piglit tests:
> spec/!OpenGL 1.1/read-front
> spec/EXT_transform_feedback/order*
> spec/glsl-1.40/uniform_buffer/fs-struct-pad
>
> No piglit regressions.
> ---
>  src/gallium/drivers/r600/r600_shader.c | 33 ++--
>  1 file changed, 30 insertions(+), 3 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
> index 6dbca50..aee011e 100644
> --- a/src/gallium/drivers/r600/r600_shader.c
> +++ b/src/gallium/drivers/r600/r600_shader.c
> @@ -252,6 +252,7 @@ static int tgsi_endif(struct r600_shader_ctx *ctx);
>  static int tgsi_bgnloop(struct r600_shader_ctx *ctx);
>  static int tgsi_endloop(struct r600_shader_ctx *ctx);
>  static int tgsi_loop_brk_cont(struct r600_shader_ctx *ctx);
> +static bool need_cayman_loop_bug_workaround(struct r600_shader_ctx *ctx);
>
>  /*
>   * bytestream -> r600 shader
> @@ -5490,7 +5491,7 @@ static int tgsi_opdst(struct r600_shader_ctx *ctx)
>  	return 0;
>  }
>
> -static int emit_logic_pred(struct r600_shader_ctx *ctx, int opcode)
> +static int emit_logic_pred(struct r600_shader_ctx *ctx, int opcode, int alu_type)
>  {
>  	struct r600_bytecode_alu alu;
>  	int r;
> @@ -5510,7 +5511,7 @@ static int emit_logic_pred(struct r600_shader_ctx *ctx, int opcode)
>
>  	alu.last = 1;
>
> -	r = r600_bytecode_add_alu_type(ctx->bc, &alu, CF_OP_ALU_PUSH_BEFORE);
> +	r = r600_bytecode_add_alu_type(ctx->bc, &alu, alu_type);
>  	if (r)
>  		return r;
>  	return 0;
> @@ -5730,7 +5731,20 @@ static void break_loop_on_flag(struct r600_shader_ctx *ctx, unsigned fc_sp)
>
>  static int tgsi_if(struct r600_shader_ctx *ctx)
>  {
> -	emit_logic_pred(ctx, ALU_OP2_PRED_SETNE_INT);
> +	int alu_type = CF_OP_ALU_PUSH_BEFORE;
> +
> +	/*
> +	 There is a bug where a BREAK/CONTINUE followed by LOOP_STARTxxx for nested
> +	 loops may put the branch stack into a state such that ALU_PUSH_BEFORE
> +	 doesn't work as expected. Workaround this by replacing the ALU_PUSH_BEFORE
> +	 with a PUSH + ALU for nested loops.
> +	*/
> +	if (ctx->bc->chip_class == CAYMAN && need_cayman_loop_bug_workaround(ctx)) {

We already have the current loop level for the stack size computation, see r600_bytecode::stack, so I think the need_cayman_loop_bug_workaround call may be replaced with ctx->bc->stack.loop > 1, if I'm not missing something.

Vadim

> +		r600_bytecode_add_cfinst(ctx->bc, CF_OP_PUSH);
> +		alu_type = CF_OP_ALU;
> +	}
> +
> +	emit_logic_pred(ctx, ALU_OP2_PRED_SETNE_INT, alu_type);
>
>  	r600_bytecode_add_cfinst(ctx->bc, CF_OP_JUMP);
>
> @@ -5834,6 +5848,19 @@ static int tgsi_loop_brk_cont(struct r600_shader_ctx *ctx)
>  	return 0;
>  }
>
> +static bool need_cayman_loop_bug_workaround(struct r600_shader_ctx *ctx)
> +{
> +	unsigned int fscp;
> +	int num_loops = 0;
> +	for (fscp = ctx->bc->fc_sp; fscp > 0; fscp--)
> +	{
> +		if (FC_LOOP == ctx->bc->fc_stack[fscp].type)
> +			++num_loops;
> +	}
> +
> +	return num_loops >= 2;
> +}
> +
>  static int tgsi_umad(struct r600_shader_ctx *ctx)
>  {
>  	struct tgsi_full_instruction *inst = &ctx->parse.FullToken.FullInstruction;
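The review comment above proposes replacing a scan of the flow-control stack with an already-maintained loop-depth counter. A small model can show why the two checks agree when the counter is kept in step with pushes and pops. This is a sketch, not driver code: `struct fc_state`, `fc_push`, `fc_pop` and both predicate functions are hypothetical, and the layout (entries stored at indices 1..fc_sp) is an assumption inferred from the `fscp > 0` loop in the quoted patch.

```c
#include <assert.h>
#include <stdbool.h>

enum fc_type { FC_NONE, FC_IF, FC_LOOP }; /* hypothetical subset */

#define FC_STACK_MAX 32

struct fc_state {
	enum fc_type fc_stack[FC_STACK_MAX + 1]; /* entries live at 1..fc_sp */
	unsigned fc_sp;
	unsigned loop_depth; /* running counter, in the role of stack.loop */
};

/* Enter a flow-control block and keep the loop counter in step. */
static void fc_push(struct fc_state *s, enum fc_type t)
{
	assert(s->fc_sp < FC_STACK_MAX);
	s->fc_stack[++s->fc_sp] = t;
	if (t == FC_LOOP)
		s->loop_depth++;
}

/* Leave the innermost flow-control block. */
static void fc_pop(struct fc_state *s)
{
	assert(s->fc_sp > 0);
	if (s->fc_stack[s->fc_sp] == FC_LOOP)
		s->loop_depth--;
	s->fc_sp--;
}

/* Variant from the first patch: walk the stack and count loops. */
static bool nested_by_scan(const struct fc_state *s)
{
	int num_loops = 0;

	for (unsigned i = s->fc_sp; i > 0; i--)
		if (s->fc_stack[i] == FC_LOOP)
			++num_loops;
	return num_loops >= 2;
}

/* Variant suggested in review: read the counter directly. */
static bool nested_by_counter(const struct fc_state *s)
{
	return s->loop_depth > 1;
}
```

The counter version is constant-time and needs no extra helper, which fits the suggestion to drop the function once the condition is changed.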
Re: [Mesa-dev] [PATCH] r600g: Fix UMAD on Cayman
On 04/12/2013 11:36 PM, Martin Andersson wrote: I have made some progress with this issue. Vadim, I did as you suggested and tried to mimic the output from the shader analyser tool. I used your patch as a base and then tried various ways to see what would work. After many tries (and lockups) I did managed to get the ext_transform_feedback/order test to pass. It is a very ugly patch but it should illustrate what the problem (and potential solution) is. Your test program fails however because explicit break statements do not work. It should be possible to use the same code for the explicit breaks as for the implicit loop break.The reason it does not is that I detect the implicit break with a hack and it does notwork for explicit breaks. The problem is that I need to detect the break statement when creating the corresponding if statement. So that I can treat it differently than other regular if statements. Anyone knows how I could do that, or is this the wrong approach? It doesn't work with my test app because IF/ENDIF blocks with BRK may contain other code, so you can't simply throw away IF/ENDIF making that code execute unconditionally. By the way, shader analyzer in some cases also produces the code with JUMP/POP around PRED_SET-BREAK, though I'm not sure if that code will really work as expected with catalyst. Possibly we're simply missing something in the hardware configuration. Also there is one thing that I didn't take into account in my initial patch - r600g converts ALU followed by POP to ALU_POP_AFTER and this might explain why my initial patch doesn't work. Possibly if we prevent that optimization for ALU containing PRED_SET-BREAK and leave separate POP, it might be enough to make it work. I'm attaching the additional patch that will force POP to be a separate instruction in this case, please test it (on top of the my first patch). This would be at least not very intrusive. 
If this won't help, then I think we should understand what exactly we are trying to fix before implementing any big changes, possibly there is a better solution or at least a more clean workaround. In the worst case we can return to your approach and improve it to handle other cases. Vadim //Martin On Thu, Apr 11, 2013 at 5:31 PM, Vadim Girlin vadimgir...@gmail.com wrote: On 04/11/2013 02:08 AM, Marek Olšák wrote: Here's the output: creating vs ... shader compilation status: OK creating fs ... shader compilation status: OK thread #0 (0;0) : ref = 16608 thread #1 (1;0) : ref = 27873 thread #2 (0;1) : ref = 16608 thread #3 (1;1) : ref = 27877 results: thread 0 (0, 0): expected = 16608, observed = 27876, FAIL thread 1 (1, 0): expected = 27873, observed = 27873, OK thread 2 (0, 1): expected = 16608, observed = 27876, FAIL thread 3 (1, 1): expected = 27877, observed = 27877, OK Thanks. According to these results, it looks like LOOP_START_DX10 for inner loop somehow reactivates the threads that were put into inactive-break state by the LOOP_BREAK in the outer loop. Also it seems LOOP_BREAK in the inner loop doesn't work as expected in this case. In other words, it looks weird. I can't explain why would this happen. It might be interesting to run these tests with llvm backend to see if there are any differences. Probably it might help if we'll implement LOOP_BREAK via EXECUTE_MASK_OP in the PRED_SET encoding as in my earlier patch, but without any stack push/pop operations and jumps (where it's possible), closer to what the catalyst (shader analyzer) does. I'm not sure if it will help though, and anyway we'll need stack operations in some cases, so I'm afraid this won't fix the issue completely. So far I have no other ideas. Vadim Marek On Wed, Apr 10, 2013 at 11:42 PM, Vadim Girlin vadimgir...@gmail.comwrote: On 04/10/2013 01:53 PM, Marek Olšák wrote: glsl-fs-loop-nested passes here. nstack is 3 and adding 4 to it doesn't help. Ok, thanks. 
Also I wrote a simple test app that should reproduce the issue if it's really related to diverging control flow with nested loops and might more information about what's going wrong. The source is in the attachment and needs to be compiled with -lGL -lglut -lGLEW. The app renders four points and computes some value for each point in the loops similar to the transform feedback order test, but it doesn't use tfb. It should render four green or red squares depending on correctness of the result. Here is the correct output produced for me on evergreen: thread 0 (0, 0): expected = 16608, observed = 16608, OK thread 1 (1, 0): expected = 27873, observed = 27873, OK thread 2 (0, 1): expected = 16608, observed = 16608, OK thread 3 (1, 1): expected = 27877, observed = 27877, OK Please post the output if it fails on cayman. Vadim Marek On Wed, Apr 10, 2013 at 8:46 AM, Vadim Girlin vadimgir...@gmail.com wrote: On 04/10/2013 03:58 AM, Marek Olšák wrote: Hi Vadim, your patch does not fix the test. Hmm
Re: [Mesa-dev] [PATCH] r600g: Fix UMAD on Cayman
On 04/11/2013 02:08 AM, Marek Olšák wrote: Here's the output: creating vs ... shader compilation status: OK creating fs ... shader compilation status: OK thread #0 (0;0) : ref = 16608 thread #1 (1;0) : ref = 27873 thread #2 (0;1) : ref = 16608 thread #3 (1;1) : ref = 27877 results: thread 0 (0, 0): expected = 16608, observed = 27876, FAIL thread 1 (1, 0): expected = 27873, observed = 27873, OK thread 2 (0, 1): expected = 16608, observed = 27876, FAIL thread 3 (1, 1): expected = 27877, observed = 27877, OK Thanks. According to these results, it looks like LOOP_START_DX10 for inner loop somehow reactivates the threads that were put into inactive-break state by the LOOP_BREAK in the outer loop. Also it seems LOOP_BREAK in the inner loop doesn't work as expected in this case. In other words, it looks weird. I can't explain why would this happen. It might be interesting to run these tests with llvm backend to see if there are any differences. Probably it might help if we'll implement LOOP_BREAK via EXECUTE_MASK_OP in the PRED_SET encoding as in my earlier patch, but without any stack push/pop operations and jumps (where it's possible), closer to what the catalyst (shader analyzer) does. I'm not sure if it will help though, and anyway we'll need stack operations in some cases, so I'm afraid this won't fix the issue completely. So far I have no other ideas. Vadim Marek On Wed, Apr 10, 2013 at 11:42 PM, Vadim Girlin vadimgir...@gmail.comwrote: On 04/10/2013 01:53 PM, Marek Olšák wrote: glsl-fs-loop-nested passes here. nstack is 3 and adding 4 to it doesn't help. Ok, thanks. Also I wrote a simple test app that should reproduce the issue if it's really related to diverging control flow with nested loops and might more information about what's going wrong. The source is in the attachment and needs to be compiled with -lGL -lglut -lGLEW. 
The app renders four points and computes some value for each point in the loops similar to the transform feedback order test, but it doesn't use tfb. It should render four green or red squares depending on correctness of the result. Here is the correct output produced for me on evergreen: thread 0 (0, 0): expected = 16608, observed = 16608, OK thread 1 (1, 0): expected = 27873, observed = 27873, OK thread 2 (0, 1): expected = 16608, observed = 16608, OK thread 3 (1, 1): expected = 27877, observed = 27877, OK Please post the output if it fails on cayman. Vadim Marek On Wed, Apr 10, 2013 at 8:46 AM, Vadim Girlin vadimgir...@gmail.com wrote: On 04/10/2013 03:58 AM, Marek Olšák wrote: Hi Vadim, your patch does not fix the test. Hmm, I'm out of ideas then. Thanks for testing. I've checked the shader dump few times but I don't see anything obviously wrong there, and the same code (except the minor ALU grouping changes due to the VLIW4/VLIW5 difference) works fine for me on evergreen. According to the Martin's observations it looks like if the threads that shouldn't execute the loop body were incorrectly left in the active state. LOOP_BREAK should put them into the inactive-break state, but something goes wrong. Do the other piglit tests with nested loops (e.g. glsl-fs-loop-nested) work on cayman? Though possibly there are no other tests with the diverging loops as in this case. I'll try to write a simpler test with the diverging loops to see if the issue is really caused by the incorrect control flow handling, and to figure out the exact instruction that results in the incorrect active state. Also probably it worth checking if the stack size is correct for that shader (latest mesa should print nstack value in the shader disassemble header, I think it should be 3 for that shader) and maybe try adding some constant, e.g. 
4 to the bc->nstack in the r600_bytecode_build just to be sure that we reserve enough stack space, though I don't think stack size is the cause of this issue. Vadim Marek On Tue, Apr 9, 2013 at 11:30 PM, Vadim Girlin vadimgir...@gmail.com wrote: On 04/09/2013 10:58 AM, Martin Andersson wrote: On Tue, Apr 9, 2013 at 3:18 AM, Marek Olšák mar...@gmail.com wrote: Pushed, thanks. The transform feedback test still doesn't pass, but at least the hardlocks are gone. Thanks, I have looked into the other issue as well http://lists.freedesktop.org/archives/mesa-dev/2013-March/036941.html The problem arises when there are nested loops. If I rework the code so there are no nested
Re: [Mesa-dev] [PATCH] r600g: Fix UMAD on Cayman
On 04/10/2013 03:58 AM, Marek Olšák wrote: Hi Vadim, your patch does not fix the test. Hmm, I'm out of ideas then. Thanks for testing. I've checked the shader dump a few times but I don't see anything obviously wrong there, and the same code (except for minor ALU grouping changes due to the VLIW4/VLIW5 difference) works fine for me on evergreen. According to Martin's observations it looks as if the threads that shouldn't execute the loop body were incorrectly left in the active state. LOOP_BREAK should put them into the inactive-break state, but something goes wrong. Do the other piglit tests with nested loops (e.g. glsl-fs-loop-nested) work on cayman? Though possibly there are no other tests with diverging loops as in this case. I'll try to write a simpler test with diverging loops to see if the issue is really caused by incorrect control flow handling, and to figure out the exact instruction that results in the incorrect active state. It's also probably worth checking whether the stack size is correct for that shader (latest mesa should print the nstack value in the shader disassembly header, I think it should be 3 for that shader) and maybe try adding some constant, e.g. 4, to bc->nstack in r600_bytecode_build just to be sure that we reserve enough stack space, though I don't think stack size is the cause of this issue. Vadim Marek On Tue, Apr 9, 2013 at 11:30 PM, Vadim Girlin vadimgir...@gmail.com wrote: On 04/09/2013 10:58 AM, Martin Andersson wrote: On Tue, Apr 9, 2013 at 3:18 AM, Marek Olšák mar...@gmail.com wrote: Pushed, thanks. The transform feedback test still doesn't pass, but at least the hardlocks are gone. Thanks, I have looked into the other issue as well http://lists.freedesktop.org/archives/mesa-dev/2013-March/036941.html The problem arises when there are nested loops. If I rework the code so there are no nested loops the issue disappears.
At least one pixel also needs to enter the outer loop. The pixels that should enter the outer loop behave correctly. It is the pixels that should not enter the outer loop that misbehave. It does not matter if they also fail the test for the inner loop, they will still execute the instruction inside. That leads to the strange results for that test. Please test the attached patch. Vadim The strangeness is easier to see if the test in ext_transform_feedback/order.c is run with smaller values of NUM_POINTS, like 3, 6 and 9. Disable the code that fails the test and print starting_x, shift_reg_final and iteration_count. Marek, since you implemented transform feedback for r600, do you think the issue is with the transform feedback code or the shader compiler or some other thing? //Martin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] GLSL compiler bug
Hi,

It seems there is a bug in the compiler. The problem may be reproduced with the following shader (complete shader_test file attached):

    void main()
    {
        float f = 0.0;
        while (true) {
            f = 1.0;
            break;
            f = 0.5;
        }
        gl_FragColor = vec4(1.0 - f, f, 0.0, 1.0);
    }

The result of compilation is equal to:

    while (true) {
        break;
    }
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);

In other words, GLSL compiler eliminates both assignments to f in the loop body and the resulting value of the f variable is 0.

Vadim

    [require]
    GLSL >= 1.20

    [vertex shader]
    void main()
    {
        gl_Position = gl_Vertex;
    }

    [fragment shader]
    void main()
    {
        float f = 0.0;
        while (true) {
            f = 1.0;
            break;
            f = 0.5;
        }
        gl_FragColor = vec4(1.0 - f, f, 0.0, 1.0);
    }

    [test]
    clear color 0.0 0.0 0.0 0.0
    clear
    draw rect -1 -1 2 2
    probe all rgba 0.0 1.0 0.0 1.0

___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
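To make the expected result concrete, here is a minimal C transliteration of the fragment shader's control flow. It is only a sketch: `struct color` and `shade` are names invented for this illustration, and plain C floats stand in for the GLSL types. The assignment after `break` is unreachable, but the one before it is not, so `f` should be 1.0 on exit and the colour should be green, which is what the `probe all rgba` line checks.

```c
/* Hypothetical stand-in for GLSL's vec4; not part of Mesa or piglit. */
struct color { float r, g, b, a; };

/* Mirrors the control flow of the fragment shader in the report. */
static struct color shade(void)
{
    float f = 0.0f;

    while (1) {
        f = 1.0f;   /* reachable: runs exactly once */
        break;
        f = 0.5f;   /* unreachable: the only assignment that may be dropped */
    }

    struct color c = { 1.0f - f, f, 0.0f, 1.0f };
    return c;
}
```

A correct dead-code pass may remove `f = 0.5`, but removing `f = 1.0` as well changes the observable result from green to red.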
Re: [Mesa-dev] [PATCH] r600g: Fix UMAD on Cayman
On 04/10/2013 01:53 PM, Marek Olšák wrote: glsl-fs-loop-nested passes here. nstack is 3 and adding 4 to it doesn't help. Ok, thanks. Also I wrote a simple test app that should reproduce the issue if it's really related to diverging control flow with nested loops and might more information about what's going wrong. The source is in the attachment and needs to be compiled with -lGL -lglut -lGLEW. The app renders four points and computes some value for each point in the loops similar to the transform feedback order test, but it doesn't use tfb. It should render four green or red squares depending on correctness of the result. Here is the correct output produced for me on evergreen: thread 0 (0, 0): expected = 16608, observed = 16608, OK thread 1 (1, 0): expected = 27873, observed = 27873, OK thread 2 (0, 1): expected = 16608, observed = 16608, OK thread 3 (1, 1): expected = 27877, observed = 27877, OK Please post the output if it fails on cayman. Vadim Marek On Wed, Apr 10, 2013 at 8:46 AM, Vadim Girlin vadimgir...@gmail.com wrote: On 04/10/2013 03:58 AM, Marek Olšák wrote: Hi Vadim, your patch does not fix the test. Hmm, I'm out of ideas then. Thanks for testing. I've checked the shader dump few times but I don't see anything obviously wrong there, and the same code (except the minor ALU grouping changes due to the VLIW4/VLIW5 difference) works fine for me on evergreen. According to the Martin's observations it looks like if the threads that shouldn't execute the loop body were incorrectly left in the active state. LOOP_BREAK should put them into the inactive-break state, but something goes wrong. Do the other piglit tests with nested loops (e.g. glsl-fs-loop-nested) work on cayman? Though possibly there are no other tests with the diverging loops as in this case. 
I'll try to write a simpler test with diverging loops to see if the issue is really caused by incorrect control flow handling, and to figure out the exact instruction that results in the incorrect active state. It's also probably worth checking whether the stack size is correct for that shader (latest mesa should print the nstack value in the shader disassembly header, I think it should be 3 for that shader) and maybe try adding some constant, e.g. 4, to bc->nstack in r600_bytecode_build just to be sure that we reserve enough stack space, though I don't think stack size is the cause of this issue. Vadim Marek On Tue, Apr 9, 2013 at 11:30 PM, Vadim Girlin vadimgir...@gmail.com wrote: On 04/09/2013 10:58 AM, Martin Andersson wrote: On Tue, Apr 9, 2013 at 3:18 AM, Marek Olšák mar...@gmail.com wrote: Pushed, thanks. The transform feedback test still doesn't pass, but at least the hardlocks are gone. Thanks, I have looked into the other issue as well http://lists.freedesktop.org/archives/mesa-dev/2013-March/036941.html The problem arises when there are nested loops. If I rework the code so there are no nested loops the issue disappears. At least one pixel also needs to enter the outer loop. The pixels that should enter the outer loop behave correctly. It is the pixels that should not enter the outer loop that misbehave. It does not matter if they also fail the test for the inner loop, they will still execute the instruction inside. That leads to the strange results for that test. Please test the attached patch. Vadim The strangeness is easier to see if the test in ext_transform_feedback/order.c is run with smaller values of NUM_POINTS, like 3, 6 and 9.
Disable the code that fails the test and print starting_x, shift_reg_final and iteration_count. Marek, since you implemented transform feedback for r600, do you think the issue is with the transform feedback code or the shader compiler or some other thing? //Martin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev

#include <stdio.h>
#include <stdlib.h>
#include <GL/glew.h>
#include <GL/glut.h>

const char *vss = #version 130\n in int x, y, ref; flat out int b, fref; void main() { b = 0; int i = 0, j = 0; b |= 32; while (true) { b |= 64
Re: [Mesa-dev] [PATCH] r600g: Fix UMAD on Cayman
On 04/09/2013 10:58 AM, Martin Andersson wrote: On Tue, Apr 9, 2013 at 3:18 AM, Marek Olšák mar...@gmail.com wrote: Pushed, thanks. The transform feedback test still doesn't pass, but at least the hardlocks are gone. Thanks, I have looked into the other issue as well http://lists.freedesktop.org/archives/mesa-dev/2013-March/036941.html The problem arises when there are nested loops. If I rework the code so there are no nested loops the issue disappears. At least one pixel also needs to enter the outer loop. The pixels that should enter the outer loop behaves correctly. It is those pixels that should not enter the outer loop that misbehaves. It does not matter if they also fails the test for the inner loop, they will still execute the instruction inside. That leads to the strange results for that test. Please test the attached patch. Vadim The strangeness is easier to see if the NUM_POINTS in the ext_transform_feedback/ order.c are run with smaller values,like 3, 6 and 9. Disable the code that fail the test and print starting_x, shift_reg_final and iteration_count. Marek, since you implemented transform feedback for r600, do you think the issue is with the tranform feedback code or the shader compiler or some other thing? 
//Martin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev

From 46456ca7ecfa3f0b107b1f9106d024f9f239a571 Mon Sep 17 00:00:00 2001
From: Vadim Girlin vadimgir...@gmail.com
Date: Wed, 10 Apr 2013 01:20:19 +0400
Subject: [PATCH] r600g: use ALU EXECUTE_MASK_OP on cayman instead of LOOP_BREAK/CONTINUE

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---
 src/gallium/drivers/r600/r600_asm.c    | 14 --
 src/gallium/drivers/r600/r600_shader.c | 24 +++-
 src/gallium/drivers/r600/r600d.h       |  5 +
 3 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_asm.c b/src/gallium/drivers/r600/r600_asm.c
index 26a848a..2874adf 100644
--- a/src/gallium/drivers/r600/r600_asm.c
+++ b/src/gallium/drivers/r600/r600_asm.c
@@ -1985,6 +1985,7 @@ void r600_bytecode_disasm(struct r600_bytecode *bc)
             LIST_FOR_EACH_ENTRY(alu, &cf->alu, list) {
                 const char *omod_str[] = {"","*2","*4","/2"};
                 const struct alu_op_info *aop = r600_isa_alu(alu->op);
+                bool cm_execmask_op = alu->execute_mask && bc->chip_class == CAYMAN;
                 int o = 0;
 
                 r600_bytecode_alu_nliterals(bc, alu, literal, &nliteral);
@@ -1997,8 +1998,10 @@ void r600_bytecode_disasm(struct r600_bytecode *bc)
                         alu->update_pred ? 'P':' ',
                         alu->pred_sel ? alu->pred_sel==2 ? '0':'1':' ');
 
-                o += fprintf(stderr, "%s%s%s ", aop->name,
-                        omod_str[alu->omod], alu->dst.clamp ? "_sat":"");
+                o += fprintf(stderr, "%s ", aop->name);
+                if (!cm_execmask_op)
+                    o += fprintf(stderr, "%s ", omod_str[alu->omod]);
+                o += fprintf(stderr, "%s ", alu->dst.clamp ? "_sat":"");
 
                 o += print_indent(o,60);
                 o += print_dst(alu);
@@ -2012,6 +2015,13 @@ void r600_bytecode_disasm(struct r600_bytecode *bc)
                     o += fprintf(stderr, "BS:%d", alu->bank_swizzle);
                 }
 
+                if (cm_execmask_op && alu->omod) {
+                    static const char* cm_em_op_names[] =
+                        {"BREAK", "CONTINUE", "KILL"};
+
+                    fprintf(stderr, "%s", cm_em_op_names[alu->omod - 1]);
+                }
+
                 fprintf(stderr, "\n");
                 id += 2;
 
diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index f801707..d1cac36 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -5827,7 +5827,29 @@ static int tgsi_loop_brk_cont(struct r600_shader_ctx *ctx)
         return -EINVAL;
     }
 
-    r600_bytecode_add_cfinst(ctx->bc, ctx->inst_info->op);
+
+    if (ctx->bc->chip_class == CAYMAN) {
+        struct r600_bytecode_alu alu = {};
+        int r;
+
+        alu.op = ALU_OP2_PRED_SETE;
+        alu.src[0].sel = V_SQ_ALU_SRC_0;
+        alu.src[1].sel = V_SQ_ALU_SRC_1;
+
+        if (ctx->inst_info->op == CF_OP_LOOP_BREAK)
+            alu.omod = SQ_ALU_EXECUTE_MASK_OP_BREAK;
+        else
+            alu.omod = SQ_ALU_EXECUTE_MASK_OP_CONTINUE;
+
+        alu.execute_mask = 1;
+        alu.last = 1;
+
+        r = r600_bytecode_add_alu(ctx->bc, &alu);
+        if (r)
+            return r;
+    } else {
+        r600_bytecode_add_cfinst(ctx->bc, ctx->inst_info->op);
+    }
 
     fc_set_mid(ctx, fscp);
 
diff --git a/src/gallium/drivers/r600/r600d.h b/src/gallium/drivers/r600/r600d.h
index 9b31383..679dd81 100644
--- a/src/gallium/drivers/r600/r600d.h
+++ b/src/gallium/drivers/r600/r600d.h
@@ -3698,4 +3698,9 @@
 #define DMA_PACKET_CONSTANT_FILL 0xd /* 7xx only */
 #define DMA_PACKET_NOP 0xf
 
+#define SQ_ALU_EXECUTE_MASK_OP_DEACTIVATE 0x0
+#define SQ_ALU_EXECUTE_MASK_OP_BREAK 0x1
+#define SQ_ALU_EXECUTE_MASK_OP_CONTINUE 0x2
+#define SQ_ALU_EXECUTE_MASK_OP_KILL 0x3
+
 #endif
-- 
1.8.1.4

___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
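For readers following the disassembler hunk, the lookup it adds can be sketched in isolation. In the patch the omod field of an ALU instruction with execute_mask set is reused on cayman as the execute-mask operation, and the name table is indexed with `omod - 1`. The helper name `execmask_op_name` below is an assumption made up for this sketch, and the range check is an addition that the patch itself does not have; the numeric values are copied from the r600d.h hunk.

```c
#include <stddef.h>

/* Values copied from the r600d.h hunk of the patch. */
#define SQ_ALU_EXECUTE_MASK_OP_DEACTIVATE 0x0
#define SQ_ALU_EXECUTE_MASK_OP_BREAK      0x1
#define SQ_ALU_EXECUTE_MASK_OP_CONTINUE   0x2
#define SQ_ALU_EXECUTE_MASK_OP_KILL       0x3

/*
 * Hypothetical helper (not in Mesa): returns the suffix the patched
 * disassembler would print for a given omod value, or NULL when it
 * prints nothing (omod == 0) or the value is out of range.
 */
static const char *execmask_op_name(unsigned omod)
{
    static const char *names[] = { "BREAK", "CONTINUE", "KILL" };

    if (omod == SQ_ALU_EXECUTE_MASK_OP_DEACTIVATE ||
        omod > SQ_ALU_EXECUTE_MASK_OP_KILL)
        return NULL;
    return names[omod - 1];
}
```

With this mapping, the ALU instruction emitted by tgsi_loop_brk_cont for CF_OP_LOOP_BREAK would be annotated with BREAK, and any other op handled there with CONTINUE.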
Re: [Mesa-dev] [PATCH 2/2] radeonsi: add support for compressed texture
On 04/08/2013 02:03 PM, Marek Olšák wrote: On Mon, Apr 8, 2013 at 11:29 AM, Michel Dänzer mic...@daenzer.net wrote: On Fre, 2013-04-05 at 17:36 -0400, j.gli...@gmail.com wrote: From: Jerome Glisse jgli...@redhat.com Most test pass, issue are with border color and swizzle. FWIW, those issues are there with non-compressed formats as well. I'm afraid we might need to change the hardware border colour depending on the swizzle. I don't think so. The issue with the swizzled border color seems to be a bad hardware design decision present since r600 rather than a hardware bug. I tried fixing it for older chipsets with no success. I doubt the hw designers fixed this for SI. The problem is the hardware tries to guess what the border color swizzle is from the combined pipe_format+sampler view swizzle combination. You need 2 texture swizzle states in the texture unit for the border color to be swizzled correctly, because texels must be swizzled by the pipe_format swizzle and sampler view swizzle, but the border color must be swizzled by the sampler view only. The main problem is that the hardware internally tries to undo the pipe_format swizzle in a way that just doesn't work. I don't remember the exact swizzles being used by hardware, but I got crazy cases like if I set texture swizzle to ywzx, the border color will be ywyy. There is no way to access those zx components of the border color for that specific swizzling. For some cases, the hardware succeeds in guessing what the border color should be, e.g. if I set texture swizzle to .zyxw, the returned border color will be .xyzw (and that would be correct if the swizzle came from pipe_format, and incorrect if the swizzle came from sampler view). I also looked into this issue some time ago (on evergreen) and IIRC I found that the swizzle is actually applied twice to border color in most cases (at least when swizzle_y is not 2 or 3), I think it's just a bug (or we are missing something in the hw configuration). 
Anyway, according to my tests in many cases (960 of 1296 total swizzles, 74%) it's possible to apply some precomputed swizzle to border color before writing it to the registers to get the correct result in the end, but I'm not sure if it makes sense to implement that. Vadim It was easy with r300, because I could just undo pipe_format swizzling before passing the border color to the hardware. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
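The figures quoted above can be checked with a few lines of C. This is a sketch under one assumption that the thread does not state explicitly: that the 1296 total comes from four swizzle components with six choices each (x, y, z, w, 0, 1), since 6^4 = 1296. The function names are invented for the illustration.

```c
/* Number of 4-component swizzles when each component has `choices` options. */
static unsigned swizzle_count(unsigned choices)
{
    unsigned n = 1;

    for (int i = 0; i < 4; i++)
        n *= choices;
    return n;
}

/* Integer percentage of part in total, rounded down. */
static unsigned percent(unsigned part, unsigned total)
{
    return part * 100u / total;
}
```

Under that assumption, `swizzle_count(6)` gives the 1296 total and `percent(960, 1296)` gives the 74% quoted in the message.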
Re: [Mesa-dev] [PATCH] r600g: don't reserve more stack space than required v4
On 04/02/2013 12:48 AM, Vincent Lejeune wrote: Btw where can I find some more info on stack_size? I assumed it should represent the maximum number of stacked exec_mask levels, but it looks like it is possible to have many more manually pushed exec_mask levels than reported by nstack (iiuc a push counts as much as 1/4 of a loop level). Yes, different instructions consume different amounts of stack space. There is an explanation in the ISA docs, section 3.6.5 Stack Allocation. It's basically correct, but don't expect it to be precise regarding the special cases (e.g. in the cayman ISA doc the comments in table 3.6 look like a copy-paste from the r600/r700 docs instead of cayman-specific comments). I've added the additional info that I have regarding the special cases for chip generations, together with my notes, as comments in the patch (see the callstack_update_max_depth function). Vadim ----- Original message ----- From: Vadim Girlin vadimgir...@gmail.com To: Vincent Lejeune v...@ovi.com Cc: Alex Deucher alexdeuc...@gmail.com; mesa-dev@lists.freedesktop.org Sent: Sunday, 31 March 2013, 22:34 Subject: Re: [Mesa-dev] [PATCH] r600g: don't reserve more stack space than required v4 On 04/01/2013 12:00 AM, Vincent Lejeune wrote: Hi Vadim, Does this patch work? (It's still not pushed) It works for me on evergreen, but I'm not sure about other chip generations. I wanted to ask somebody to test it, but the problem is that the piglit coverage for this is not enough (e.g. the initial version of this patch had no regressions with piglit but resulted in artifacts with Heaven). I thought about adding more control flow tests but haven't written them yet. The same algorithm seemingly works in my r600-sb branch with other chips, but the test coverage with that branch is even lower due to the if-conversion that eliminates most of the conditional control flow. I usually prefer not to push any patches until I'm sure that they are not breaking anything.
But well, possibly in this case it's easier to simply push it and wait for the bug reports. I think I'll check if it needs rebasing and push it in a day or two if there are no objections. Vadim I'm working on doing native control flow for llvm and intend to port your patch on the control flow reservation. Vincent ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] r600g: Add a Cayman specific version of UMAD
On 03/31/2013 01:01 PM, Martin Andersson wrote: On Sun, Mar 31, 2013 at 1:08 AM, Vadim Girlin vadimgir...@gmail.com wrote: On 03/30/2013 05:35 AM, Martin Andersson wrote: I found an issue with the shader compiler for Cayman when I looked into why the ext_transform_feedback/order test case caused a GPU stall. It turned out the stall was an infinite loop that was the result of a broken calculation in the shader function. The issue is that Cayman uses the tgsi_umad function for UMAD, but that does not work since it does not populate the y, z and w slots for UMUL that cayman requires. This patch implements a cayman_umad. There are some things I'm unsure of though. The UMUL for Cayman is compiled to, as far as I can tell, ALU_OP2_MULLO_INT and not ALU_OP2_MULLO_UINT. So I do not know if I should use the int or the uint version in cayman_umad. In the patch I used the uint one, because that seemed the most logical. Probably the use of MULLO_INT for UMUL on cayman is just a typo, AFAIK MULLO_UINT should be used. Ok, I will send a patch for that as well then. The add part of UMAD I copied from tgsi_umad and that had a loop around the variable lasti, but the variable lasti is usually not used in cayman specific code. The only difference with umad on cayman is in the mul part - each MULLO_UINT should be expanded to 4 slots on cayman. The add part doesn't need any changes. This is used in tgsi functions. int lasti = tgsi_last_instruction(inst->Dst[0].Register.WriteMask); This is used to determine the last written vector component from the write mask, so that if the tgsi instruction doesn't write e.g. the W component, we don't have to emit R600 instruction(s) for that component. But in cayman specific code this is used instead. int last_slot = (inst->Dst[0].Register.WriteMask & 0x8) ? 4 : 3;
This is used for instructions like RECIP_xxx (see the comment at r600_shader.c:40) that should be expanded to 3 slots with an optional 4th slot if the write to the W component is required, but MULLO_UINT is different - it should always be expanded to 4 instruction slots. By the way, it seems cayman_mul_int_instr is incorrect in this regard. It does not work to switch lasti with last_slot, since that makes the loop run too many times (in my test case lasti is 0 and last_slot is 3). So I just removed the loop, was that correct or should I resolve that in some other way? No, it's not correct, there should be a loop over the vector components for addition as well - it should be performed in the same way as on the pre-cayman chips. In your patch you are only performing the addition for one component. Basically, the only required change for UMAD on cayman is that you need to expand each one-slot MULLO_xx on pre-cayman into 4 instruction slots on cayman. Should I keep the cayman_umad function or should I modify tgsi_umad and add the cayman specific part there? I think it's better to modify tgsi_umad (to avoid unnecessary code duplication). Vadim Vadim Martin Andersson (1): r600g: Add a Cayman specific version of UMAD src/gallium/drivers/r600/r600_shader.c | 47 +- 1 file changed, 46 insertions(+), 1 deletion(-) //Martin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
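The difference between lasti and last_slot discussed above can be sketched as two small helpers. This is an illustration only: `last_written_component` follows the description given in the thread for tgsi_last_instruction (index of the last written component), and both function names and the `MULLO_SLOTS_CAYMAN` constant are assumptions made up for this sketch rather than Mesa identifiers.

```c
/* Per the thread, MULLO_UINT always takes all 4 slots on cayman. */
#define MULLO_SLOTS_CAYMAN 4

/*
 * Index of the highest component (0 = X .. 3 = W) set in a 4-bit
 * TGSI write mask; what the thread calls lasti.
 */
static int last_written_component(unsigned writemask)
{
    int lasti = 0;

    for (int i = 0; i < 4; i++)
        if (writemask & (1u << i))
            lasti = i;
    return lasti;
}

/*
 * Slot count for RECIP-like ops on cayman: 3 slots, plus a 4th
 * only when the W component (bit 3) is written.
 */
static int cayman_last_slot(unsigned writemask)
{
    return (writemask & 0x8) ? 4 : 3;
}
```

For a write mask of 0x1 (X only) these return 0 and 3, matching the values Martin reports from his test case. The two numbers mean different things (a component index versus a slot count), which is why swapping one for the other in a loop bound changes how many iterations run.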
Re: [Mesa-dev] [PATCH] r600g: don't reserve more stack space than required v4
On 04/01/2013 12:00 AM, Vincent Lejeune wrote: Hi Vadim, Does this patch work ? (It's still not pushed) It works for me on evergreen, but I'm not sure about other chip generations. I wanted to ask somebody to test it, but the problem is that the piglit coverage for this is not enough (e.g. initial version of this patch had no regressions with piglit but resulted in artifacts with Heaven). I thought about adding more control flow tests but haven't written them yet. The same algorithm seemingly works in my r600-sb branch with other chips, but the test coverage with that branch is even lower due to the if-conversion that eliminates most of the conditional control flow. I usually prefer not to push any patches until I'm sure that they are not breaking anything. But well, possibly in this case it's easier to simply push it and wait for the bug reports. I think I'll check if it needs rebasing and push it in a day or two if there are no objections. Vadim I'm working on doing native control flow for llvm and intend to port your patch on the control flow reservation. Vincent ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Possible bug with r600g shader compiler
On 03/31/2013 04:51 PM, Martin Andersson wrote:

Hi, I think I have found a bug in the r600g shader compiler. I have an AMD 6950 and I'm running mesa from git. The bug is exercised by the vertex shader program in piglit ext_transform_feedback/order.c. I have simplified the shader program so the compiled shader is easier to read:

    #version 130
    in uint starting_x;
    flat out uint starting_x_copy;
    flat out uint iteration_count;
    flat out uint shift_reg_final;
    uniform uint shift_count;

    void main()
    {
        gl_Position = vec4(0.0);
        uint x = starting_x;
        uint count = 0u;
        uint shift_reg = 1u;
        starting_x_copy = starting_x;
        uint k;
        while (x != 0u) {
            shift_reg = shift_count;
            for (k = 0u; k < shift_count; ++k)
                ++count;
            x = 0u;
        }
        iteration_count = count;
        shift_reg_final = shift_reg;
    }

It compiles to http://pastebin.com/cQ8rbKCv.

    input:           shift_count 64, starting_x 0
    actual output:   iteration_count 1, shift_reg 1
    expected output: iteration_count 0, shift_reg 1

When the shader is run with starting_x set to 0 the iteration_count output is 1. That should be impossible since the ++count is inside the while loop guarded by x != 0. That the iteration_count is 1 and not 64 is also strange, it seems to somehow have gotten past the while guard but only executed one iteration in the for loop before exiting again. Another thing to note is that shift_reg is not set to 64. If I write 64 instead of shift_count in the for loop (k < 64u) (effectively optimizing it to 64 add statements instead of a loop) or switch the while to an if, the program behaves as expected. That leads me to believe that the issue is with the two nested loops. The docs mention something about nested flow control for PRED_SETE_64. The instruction can also establish a predicate result (execute or skip) for subsequent predicated instruction execution.
This additional control allows a compiler to support one-instruction issue for if-elseif operations, or an integer result for nested flow-control, by using single-precision operations to manipulate a predicate counter. But the while and for loops are compiled to PRED_SETNE_INT which does not have that comment. Anyway, I just wanted to include that comment in case it was relevant. Predication is not used with the default compiler backend in r600g (currently it may be used with the llvm backend only), so it's not relevant. Anyway, this comment applies to all PRED_xxx instructions. Omitted comment in the docs doesn't mean anything, some things in the docs may be even incorrect. Anyone knows whats wrong or have any ideas for how I could debug it further? You might want to modify the test to get rid of the transform feedback, just to make sure that it's not a transform feedback issue. Vadim //Martin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
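As a reference point for the expected values in Martin's report, the simplified vertex shader can be modelled on the CPU. This is a sketch only: `struct shader_out` and `run_shader` are names invented here, and C's `unsigned` stands in for GLSL `uint`.

```c
/* Hypothetical container for the two outputs of interest. */
struct shader_out {
    unsigned iteration_count;
    unsigned shift_reg_final;
};

/* CPU model of the simplified vertex shader from the report. */
static struct shader_out run_shader(unsigned starting_x, unsigned shift_count)
{
    unsigned x = starting_x;
    unsigned count = 0u;
    unsigned shift_reg = 1u;
    unsigned k;

    while (x != 0u) {
        shift_reg = shift_count;
        for (k = 0u; k < shift_count; ++k)
            ++count;
        x = 0u;   /* the outer loop body runs at most once */
    }

    struct shader_out out = { count, shift_reg };
    return out;
}
```

With starting_x = 0 and shift_count = 64 the model never enters the outer loop, so it yields iteration_count 0 and shift_reg 1, the expected output quoted in the message. A thread with a nonzero starting_x yields 64 and 64. The reported result of 1 and 1 matches neither path, which fits the later diagnosis in the thread that inactive threads execute an instruction inside the loop.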
Re: [Mesa-dev] [PATCH] r600g: Add a Cayman specific version of UMAD
On 03/30/2013 05:35 AM, Martin Andersson wrote: I found an issue with the shader compiler for Cayman when I looked into why the ext_transform_feedback/order test case caused a GPU stall. It turned out the stall was an infinite loop that was the result of a broken calculation in the shader function. The issue is that Cayman uses the tgsi_umad function for UMAD, but that does not work since it does not populate the y, z and w slots for UMUL that cayman requires. This patch implements a cayman_umad. There are some things I'm unsure of though. The UMUL for Cayman is compiled to, as far as I can tell, ALU_OP2_MULLO_INT and not ALU_OP2_MULLO_UINT. So I do not know if I should use the int or the uint version in cayman_umad. In the patch I used the uint one, because that seemed the most logical. Probably the use of MULLO_INT for UMUL on cayman is just a typo, AFAIK MULLO_UINT should be used. The add part of UMAD I copied from tgsi_umad and that had a loop around the variable lasti, but the variable lasti is usually not used in cayman specific code. The only difference with umad on cayman is in the mul part - each MULLO_UINT should be expanded to 4 slots on cayman. The add part doesn't need any changes. This is used in tgsi functions. int lasti = tgsi_last_instruction(inst->Dst[0].Register.WriteMask); This is used to determine the last written vector component from the write mask, so that if the tgsi instruction doesn't write e.g. the W component, we don't have to emit R600 instruction(s) for that component. But in cayman specific code this is used instead. int last_slot = (inst->Dst[0].Register.WriteMask & 0x8) ? 4 : 3; This is used for instructions like RECIP_xxx (see the comment at r600_shader.c:40) that should be expanded to 3 slots with an optional 4th slot if the write to the W component is required, but MULLO_UINT is different - it should always be expanded to 4 instruction slots. By the way, it seems cayman_mul_int_instr is incorrect in this regard.
It does not work to switch lasti with last_slot, since that makes the loop run too many times (in my test case lasti is 0 and last_slot is 3). So I just removed the loop, was that correct or should i resolve that in some other way? No, it's not correct, there should be a loop over the vector components for addition as well - it should be performed in the same way as on the pre-cayman chips. In your patch you are only performing the addition for one component. Basically, the only required change for UMAD on cayman is that you need to expand each one-slot MULLO_xx on pre-cayman into 4 instruction slots on cayman. Vadim Martin Andersson (1): r600g: Add a Cayman specific version of UMAD src/gallium/drivers/r600/r600_shader.c | 47 +- 1 file changed, 46 insertions(+), 1 deletion(-) ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] r600g: fix range handling for tgsi input/output declarations
On 03/28/2013 01:01 PM, � wrote: Am 27.03.2013 20:37, schrieb Vadim Girlin: Signed-off-by: Vadim Girlin vadimgir...@gmail.com --- src/gallium/drivers/r600/r600_shader.c | 19 +++ 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c index 29facf7..d4c9c03 100644 --- a/src/gallium/drivers/r600/r600_shader.c +++ b/src/gallium/drivers/r600/r600_shader.c @@ -874,12 +874,12 @@ static int select_twoside_color(struct r600_shader_ctx *ctx, int front, int back static int tgsi_declaration(struct r600_shader_ctx *ctx) { struct tgsi_full_declaration *d = ctx-parse.FullToken.FullDeclaration; -unsigned i; -int r; +int r, i, j, count = d-Range.Last - d-Range.First + 1; switch (d-Declaration.File) { case TGSI_FILE_INPUT: -i = ctx-shader-ninput++; +i = ctx-shader-ninput; +ctx-shader-ninput += count; ctx-shader-input[i].name = d-Semantic.Name; ctx-shader-input[i].sid = d-Semantic.Index; ctx-shader-input[i].interpolate = d-Interp.Interpolate; @@ -903,9 +903,15 @@ static int tgsi_declaration(struct r600_shader_ctx *ctx) return r; } } +for (j = 1; j count; ++j) { +memcpy(ctx-shader-input[i + j], ctx-shader-input[i], + sizeof(struct r600_shader_io)); Instead of memcpy, shouldn't an assignment do the trick here as well? Yes, assignment should work fine, I just used to use memcpy in such cases for some reason. I'll replace memcpy with assignment. Also I think second part (outputs handling) can be dropped for now - currently we only need to handle the inputs (for HUD shaders), and later when array declarations for inputs/outputs will be implemented in TGSI probably we'll need to update the parser in r600g anyway - I'm just not sure yet how the semantic indices should be handled for input/output arrays. 
Vadim +ctx-shader-input[i + j].gpr += j; +} break; case TGSI_FILE_OUTPUT: -i = ctx-shader-noutput++; +i = ctx-shader-noutput; +ctx-shader-noutput += count; ctx-shader-output[i].name = d-Semantic.Name; ctx-shader-output[i].sid = d-Semantic.Index; ctx-shader-output[i].gpr = ctx-file_offset[TGSI_FILE_OUTPUT] + d-Range.First; @@ -933,6 +939,11 @@ static int tgsi_declaration(struct r600_shader_ctx *ctx) break; } } +for (j = 1; j count; ++j) { +memcpy(ctx-shader-output[i + j], ctx-shader-output[i], + sizeof(struct r600_shader_io)); Same here. +ctx-shader-output[i + j].gpr += j; +} break; case TGSI_FILE_CONSTANT: case TGSI_FILE_TEMPORARY: Christian. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
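The range expansion that the patch performs, with the struct assignment suggested in review instead of memcpy, can be sketched as follows. `struct io_decl` is a deliberately simplified, hypothetical stand-in for struct r600_shader_io, and `expand_range` is a name invented for this illustration; the real code works on ctx->shader->input[] and fills in more fields.

```c
/* Hypothetical, simplified stand-in for struct r600_shader_io. */
struct io_decl {
    int name;   /* semantic name */
    int sid;    /* semantic index */
    int gpr;    /* register assigned to this input/output */
};

/*
 * Expand one declaration covering registers first..last into
 * (last - first + 1) consecutive entries starting at io[n].
 * Each copy gets the base entry's fields with gpr advanced by
 * its offset in the range. Returns the new entry count.
 * The caller must make sure io[] has room for the whole range.
 */
static int expand_range(struct io_decl *io, int n, struct io_decl base,
                        int first, int last)
{
    int count = last - first + 1;

    io[n] = base;
    for (int j = 1; j < count; ++j) {
        io[n + j] = io[n];      /* plain struct assignment, no memcpy */
        io[n + j].gpr += j;
    }
    return n + count;
}
```

A declaration spanning three registers therefore produces three entries with consecutive gpr values. As noted in the thread, how the semantic index should vary across such a range is left open, so this sketch copies it unchanged just as the patch does.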
Re: [Mesa-dev] [PATCH] R600: Emit CF_ALU and use true kcache register.
On 03/28/2013 09:47 PM, Vincent Lejeune wrote:

[snip]

> diff --git a/lib/Target/R600/R600RegisterInfo.td b/lib/Target/R600/R600RegisterInfo.td
> index ce5994c..3ee6623 100644
> --- a/lib/Target/R600/R600RegisterInfo.td
> +++ b/lib/Target/R600/R600RegisterInfo.td
> @@ -43,6 +43,37 @@ foreach Index = 0-127 in {
>                                     Index>;
>  }
>
> +// KCACHE_BANK0
> +foreach Index = 159-128 in {
> +  foreach Chan = [ "X", "Y", "Z", "W" ] in {
> +    // 32-bit Temporary Registers
> +    def KC0_#Index#_#Chan : R600RegWithChan <"KC0["#Index#"-128]."#Chan, Index, Chan>;
> +  }
> +  // 128-bit Temporary Registers
> +  def KC0_#Index#_XYZW : R600Reg_128 <"KC0["#Index#"-128].XYZW",
> +                                 [!cast<Register>("KC0_"#Index#"_X"),
> +                                  !cast<Register>("KC0_"#Index#"_Y"),
> +                                  !cast<Register>("KC0_"#Index#"_Z"),
> +                                  !cast<Register>("KC0_"#Index#"_W")],
> +                                 Index>;
> +}
> +
> +// KCACHE_BANK1
> +foreach Index = 191-159 in {

Probably 160 should be used instead of 159 here (and in the two occurrences below)?

Vadim

> +  foreach Chan = [ "X", "Y", "Z", "W" ] in {
> +    // 32-bit Temporary Registers
> +    def KC1_#Index#_#Chan : R600RegWithChan <"KC1["#Index#"-159]."#Chan, Index, Chan>;
> +  }
> +  // 128-bit Temporary Registers
> +  def KC1_#Index#_XYZW : R600Reg_128 <"KC1["#Index#"-159].XYZW",
> +                                 [!cast<Register>("KC1_"#Index#"_X"),
> +                                  !cast<Register>("KC1_"#Index#"_Y"),
> +                                  !cast<Register>("KC1_"#Index#"_Z"),
> +                                  !cast<Register>("KC1_"#Index#"_W")],
> +                                 Index>;
> +}
> +
> +
>  // Array Base Register holding input in FS
>  foreach Index = 448-480 in {
>    def ArrayBase#Index : R600Reg<"ARRAY_BASE", Index>;
> @@ -80,6 +111,38 @@ def R600_Addr : RegisterClass <"AMDGPU", [i32], 127, (add (sequence "Addr%u_X",
>
>  } // End isAllocatable = 0
>
> +def R600_KC0_X : RegisterClass <"AMDGPU", [f32, i32], 32,
> +                              (add (sequence "KC0_%u_X", 128, 159))>;
> +
> +def R600_KC0_Y : RegisterClass <"AMDGPU", [f32, i32], 32,
> +                              (add (sequence "KC0_%u_Y", 128, 159))>;
> +
> +def R600_KC0_Z : RegisterClass <"AMDGPU", [f32, i32], 32,
> +                              (add (sequence "KC0_%u_Z", 128, 159))>;
> +
> +def R600_KC0_W : RegisterClass <"AMDGPU", [f32, i32], 32,
> +                              (add (sequence "KC0_%u_W", 128, 159))>;
> +
> +def R600_KC0 : RegisterClass <"AMDGPU", [f32, i32], 32,
> +                                   (interleave R600_KC0_X, R600_KC0_Y,
> +                                               R600_KC0_Z, R600_KC0_W)>;
> +
> +def R600_KC1_X : RegisterClass <"AMDGPU", [f32, i32], 32,
> +                              (add (sequence "KC1_%u_X", 160, 191))>;
> +
> +def R600_KC1_Y : RegisterClass <"AMDGPU", [f32, i32], 32,
> +                              (add (sequence "KC1_%u_Y", 160, 191))>;
> +
> +def R600_KC1_Z : RegisterClass <"AMDGPU", [f32, i32], 32,
> +                              (add (sequence "KC1_%u_Z", 160, 191))>;
> +
> +def R600_KC1_W : RegisterClass <"AMDGPU", [f32, i32], 32,
> +                              (add (sequence "KC1_%u_W", 160, 191))>;
> +
> +def R600_KC1 : RegisterClass <"AMDGPU", [f32, i32], 32,
> +                                   (interleave R600_KC1_X, R600_KC1_Y,
> +                                               R600_KC1_Z, R600_KC1_W)>;
> +
>  def R600_TReg32_X : RegisterClass <"AMDGPU", [f32, i32], 32,
>                                     (add (sequence "T%u_X", 0, 127), AR_X)>;
>
> diff --git a/test/CodeGen/R600/kcache-fold.ll b/test/CodeGen/R600/kcache-fold.ll
> index e8e2bf5..3d70e4b 100644
> --- a/test/CodeGen/R600/kcache-fold.ll
> +++ b/test/CodeGen/R600/kcache-fold.ll
> @@ -1,7 +1,7 @@
>  ;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
>
>  ; CHECK: @main1
> -; CHECK: MOV T{{[0-9]+\.[XYZW], CBuf0\[[0-9]+\]\.[XYZW]}}
> +; CHECK: MOV T{{[0-9]+\.[XYZW], KC0}}
>  define void @main1() {
>  main_body:
>    %0 = load <4 x float> addrspace(8)* null

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
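Editorial illustration of the review comment above, not part of the patch. This is a minimal sketch that assumes each kcache bank spans 32 consecutive register indices, with bank 0 at 128..159 and bank 1 at 160..191, as the register classes in the patch suggest. The helper names `kc_bank` and `kc_offset` are hypothetical and do not exist in LLVM.

```c
/* Assumption taken from the patch's register classes:
 * bank 0 covers register indices 128..159, bank 1 covers 160..191. */
enum { KC_BANK_SIZE = 32, KC_BANK0_BASE = 128 };

/* Hypothetical helper: returns the kcache bank (0 or 1) that holds this
 * register index, or -1 if the index lies outside both banks. */
static int kc_bank(unsigned index)
{
    if (index < KC_BANK0_BASE || index >= KC_BANK0_BASE + 2 * KC_BANK_SIZE)
        return -1;
    return (int)((index - KC_BANK0_BASE) / KC_BANK_SIZE);
}

/* Hypothetical helper: offset of the register inside its bank (0..31).
 * Only meaningful when kc_bank(index) is not -1. */
static unsigned kc_offset(unsigned index)
{
    return (index - KC_BANK0_BASE) % KC_BANK_SIZE;
}
```

Under these assumptions index 160 is the first register of bank 1 and sits at offset 0. Subtracting 159 instead would give offset 1 for that register, and a range starting at 159 would also pull the last bank-0 index into bank 1, which appears to be what the comment points at.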
[Mesa-dev] [PATCH] r600g: fix range handling for tgsi input/output declarations
Signed-off-by: Vadim Girlin <vadimgir...@gmail.com>
---
 src/gallium/drivers/r600/r600_shader.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index 29facf7..d4c9c03 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -874,12 +874,12 @@ static int select_twoside_color(struct r600_shader_ctx *ctx, int front, int back
 static int tgsi_declaration(struct r600_shader_ctx *ctx)
 {
 	struct tgsi_full_declaration *d = &ctx->parse.FullToken.FullDeclaration;
-	unsigned i;
-	int r;
+	int r, i, j, count = d->Range.Last - d->Range.First + 1;
 
 	switch (d->Declaration.File) {
 	case TGSI_FILE_INPUT:
-		i = ctx->shader->ninput++;
+		i = ctx->shader->ninput;
+		ctx->shader->ninput += count;
 		ctx->shader->input[i].name = d->Semantic.Name;
 		ctx->shader->input[i].sid = d->Semantic.Index;
 		ctx->shader->input[i].interpolate = d->Interp.Interpolate;
@@ -903,9 +903,15 @@ static int tgsi_declaration(struct r600_shader_ctx *ctx)
 					return r;
 			}
 		}
+		for (j = 1; j < count; ++j) {
+			memcpy(&ctx->shader->input[i + j], &ctx->shader->input[i],
+				sizeof(struct r600_shader_io));
+			ctx->shader->input[i + j].gpr += j;
+		}
 		break;
 	case TGSI_FILE_OUTPUT:
-		i = ctx->shader->noutput++;
+		i = ctx->shader->noutput;
+		ctx->shader->noutput += count;
 		ctx->shader->output[i].name = d->Semantic.Name;
 		ctx->shader->output[i].sid = d->Semantic.Index;
 		ctx->shader->output[i].gpr = ctx->file_offset[TGSI_FILE_OUTPUT] + d->Range.First;
@@ -933,6 +939,11 @@ static int tgsi_declaration(struct r600_shader_ctx *ctx)
 				break;
 			}
 		}
+		for (j = 1; j < count; ++j) {
+			memcpy(&ctx->shader->output[i + j], &ctx->shader->output[i],
+				sizeof(struct r600_shader_io));
+			ctx->shader->output[i + j].gpr += j;
+		}
 		break;
 	case TGSI_FILE_CONSTANT:
 	case TGSI_FILE_TEMPORARY:
-- 
1.8.1.4
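Editorial illustration, not part of the patch. This is a stand-alone sketch of the range expansion the patch performs: a declaration covering registers First..Last yields Last - First + 1 entries, and each extra entry is a copy of the first with its gpr advanced by its position in the range. `struct io_decl` and `expand_range` are hypothetical, simplified stand-ins for `struct r600_shader_io` and the loops added to `tgsi_declaration`. The caller is assumed to provide an array with room for the whole range.

```c
#include <string.h>

/* Hypothetical, cut-down stand-in for struct r600_shader_io. */
struct io_decl {
    int name;
    int sid;
    int gpr;
};

/* Expand one ranged declaration into one entry per register.
 *
 * io          array of entries; must have room for the whole range
 * n           number of entries already in use
 * first_entry fully initialised entry for the first register of the range
 * first, last inclusive register range of the declaration
 *
 * Returns the new number of entries in use. */
static int expand_range(struct io_decl *io, int n, struct io_decl first_entry,
                        int first, int last)
{
    int count = last - first + 1;

    io[n] = first_entry;
    for (int j = 1; j < count; ++j) {
        /* Same semantic information, next consecutive register. */
        memcpy(&io[n + j], &io[n], sizeof(struct io_decl));
        io[n + j].gpr += j;
    }
    return n + count;
}
```

For a declaration such as `DCL IN[0..1]`, calling `expand_range(io, 0, entry, 0, 1)` returns 2 and leaves two entries whose gpr values differ by one, where the pre-patch code would have counted a single input.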
Re: [Mesa-dev] [PATCH 0/5] Head-up display for Gallium DRI2 drivers
On 03/26/2013 02:00 AM, Marek Olšák wrote:
> On Mon, Mar 25, 2013 at 10:38 PM, Ondrej Holecek <aaa...@gmail.com> wrote:
>> On Saturday 23 of March 2013 00:50:59 Marek Olšák wrote:
>>> Hi everyone,
>>>
>>> one image is better than a thousand words:
>>> ...
>>
>> Hi,
>>
>> I tried your patches and hit a few problems.
>>
>> First, they do not apply cleanly on master, because they expect another patch of yours, "cso: add constant buffer save/restore feature for postprocessing", to be present. But I guess you are aware of that.
>
> Yes, I sent the patch to mesa-dev earlier.
>
>> The second problem is that when I build Mesa with the HUD on my 32-bit virtual machine, the HUD works (with a 32-bit app, of course). When I build it on 64-bit (both run the same up-to-date OS, openSUSE 12.3), the HUD does not work (with a 64-bit app). I managed to track it down to failed IMM instruction parsing in the HUD_create function. It appears that the translate_ctx structure in tgsi_text_translate (file src/gallium/auxiliary/tgsi/tgsi_text.c) is not initialized to zeros on my 64-bit system. Instead, ctx.num_immediates is equal to 1, which triggers the "Immediates must be sorted" error. The following fixes the HUD for me (note that I really don't know whether I am breaking something else in Mesa here):
>>
>> diff --git a/src/gallium/auxiliary/tgsi/tgsi_text.c b/src/gallium/auxiliary/tgsi/tgsi_text.c
>> index 6b97bee..247ec75 100644
>> --- a/src/gallium/auxiliary/tgsi/tgsi_text.c
>> +++ b/src/gallium/auxiliary/tgsi/tgsi_text.c
>> @@ -1577,6 +1577,7 @@ tgsi_text_translate(
>>     ctx.tokens = tokens;
>>     ctx.tokens_cur = tokens;
>>     ctx.tokens_end = tokens + num_tokens;
>> +   ctx.num_immediates = 0;
>>
>>     if (!translate( &ctx ))
>>        return FALSE;
>
> I've sent a fix for this a couple of days ago:
> http://www.mail-archive.com/mesa-dev@lists.freedesktop.org/msg36038.html
>
>> The third issue is that on both the 32-bit and the 64-bit build, fonts are not displayed in the HUD. I see graphs and transparent background rectangles for text, but no text is visible. I have not solved this one yet.
>
> Your driver must support the I8_UNORM texture format.

I think this may also be related to the TGSI declaration of vertex shader inputs, which some drivers do not expect in this form:

DCL IN[0..1]

At least r600g expects a separate declaration for each input, though fortunately it still works in this case because the parsed declarations of VS inputs aren't really used in r600g. I noticed exactly the same issue (missing text) with my r600-sb branch, because it relies on the number of parsed inputs from r600g's TGSI translator. That number is 1 in this case instead of 2, so the second input register is considered undefined and is optimized away. I suspect that some other drivers may also handle this declaration incorrectly, which may explain the issue.

Vadim

>> One last thought: is it intentional that when a wrong query is entered, the HUD graph is displayed but empty? Maybe some text like "wrong query XXX" would be a good hint. I know it is printed on stdout, but looking for warnings in chatty apps like openarena is a little tricky.
>
> Yes, it's intentional. I guess I can at least make it not draw an empty pane.
>
> Marek
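Editorial illustration of the uninitialized-context problem reported earlier in this thread, not code from Mesa. This is a minimal sketch that assumes the parser rejects an immediate whose index does not equal the running count, which is how the report describes the "Immediates must be sorted" failure. `struct parse_ctx`, `parse_ctx_init` and `accept_immediate` are hypothetical names.

```c
#include <string.h>

/* Hypothetical, cut-down stand-in for the text parser's context. */
struct parse_ctx {
    const char *text;
    unsigned num_immediates;
};

/* Zero the whole context first so that every field, including ones added
 * later, starts from a known state, then set the caller-supplied fields. */
static void parse_ctx_init(struct parse_ctx *ctx, const char *text)
{
    memset(ctx, 0, sizeof(*ctx));
    ctx->text = text;
}

/* Hypothetical check modelled on the reported error: the n-th immediate
 * must be declared with index n. Returns 1 if accepted, 0 otherwise. */
static int accept_immediate(struct parse_ctx *ctx, unsigned index)
{
    if (index != ctx->num_immediates)
        return 0;
    ctx->num_immediates++;
    return 1;
}
```

A context that lives on the stack and only has some of its fields assigned leaves `num_immediates` at whatever value happens to be in memory, which would explain why the failure appeared on one build and not the other. Clearing the whole structure is one way to avoid depending on that; setting the single field, as the quoted diff does, is another.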