I just realized I didn't send it to the list:
There was yet another problem with reordering of instructions. The
attached patch (which is against my earlier patch) should fix this.
~Nicolai
On 3/18/07, Oliver McFadden [EMAIL PROTECTED] wrote:
Another thought: the same changes are probably needed for the vertprog code. I
think there are also a lot of bugs there.
On 3/18/07, Oliver McFadden [EMAIL PROTECTED] wrote:
This patch seems to break one of my longer fragment programs. I believe this
is because it's running out of registers, but I haven't looked into it in
detail yet.
I think this patch should be committed, but directly followed by a patch to
reduce the number of registers used.
On 3/18/07, Nicolai Haehnle [EMAIL PROTECTED] wrote:
There were a number of bugs related to the pairing of vector and
scalar operations where swizzles ended up using the wrong source
register, or an instruction was moved forward and ended up overwriting
an aliased register.
The new algorithm for register allocation is slightly conservative and
may run out of registers before it's strictly necessary. On the plus
side, it Just Works.
Pairing of instructions is done whenever possible, and in more cases
than before, so in practice this change should be a net win.
The patch mostly fixes glean/texCombine. One remaining problem is that
the code duplicates constants and parameters all over the place and
therefore quickly runs out of resources and falls back to software.
I'm going to look into that as well.
Please test and commit this patch. If you notice any regressions,
please tell me (but the tests are looking good).
~Nicolai
commit 1ec4703585171f504180425b65dfab92be2a7782
Author: Nicolai Haehnle [EMAIL PROTECTED]
Date: Sun Mar 18 13:29:18 2007 +0100
r300: Fix fragment program reordering
Do not move an instruction that writes to a temp forward past an instruction
that reads the same temporary.
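
The rule in the commit message can be sketched in isolation. This is a minimal,
simplified model, not the driver code: the struct, the MASK_* constants, and the
helpers note_read() and earliest_allowed_write() are hypothetical stand-ins
chosen for illustration, and the meaning of `reserved` as a pre-existing lower
bound is an assumption. Each temporary remembers the last slot that read its
vector (xyz) and scalar (w) parts, and a later write to a part is never placed
before that slot.

```c
#include <assert.h>

/* Hypothetical stand-ins for WRITEMASK_XYZ and WRITEMASK_W. */
#define MASK_XYZ 0x7
#define MASK_W   0x8

/* Simplified, hypothetical model of the per-register lifetime record. */
struct temp_lifetime {
    int reserved;         /* assumed: lower bound that was already tracked */
    int vector_lastread;  /* last slot that read the xyz part */
    int scalar_lastread;  /* last slot that read the w part */
};

static int max_int(int a, int b)
{
    return a > b ? a : b;
}

/* Record that the instruction placed in slot `pos` reads the parts of the
 * temporary selected by `mask`. The last-read slot only ever moves forward. */
static void note_read(struct temp_lifetime *t, int pos, int mask)
{
    assert(pos >= 0);
    if (mask & MASK_XYZ)
        t->vector_lastread = max_int(t->vector_lastread, pos);
    if (mask & MASK_W)
        t->scalar_lastread = max_int(t->scalar_lastread, pos);
}

/* Earliest slot in which a write to the parts selected by `mask` may go:
 * never before the old lower bound, and never before the last read of any
 * part that is about to be overwritten. */
static int earliest_allowed_write(const struct temp_lifetime *t, int mask)
{
    int pos = t->reserved;

    if (mask & MASK_XYZ)
        pos = max_int(pos, t->vector_lastread);
    if (mask & MASK_W)
        pos = max_int(pos, t->scalar_lastread);
    return pos;
}
```

In this model, a read of the xyz part in slot 5 leaves a write to only the w
part free to be scheduled earlier. Tracking the two parts separately is what
keeps vector/scalar pairing possible.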
diff --git a/src/mesa/drivers/dri/r300/r300_context.h b/src/mesa/drivers/dri/r300/r300_context.h
index bc43953..29436ab 100644
--- a/src/mesa/drivers/dri/r300/r300_context.h
+++ b/src/mesa/drivers/dri/r300/r300_context.h
@@ -674,6 +674,11 @@ struct reg_lifetime {
emitted instruction that writes to the register */
int vector_valid;
int scalar_valid;
+
+ /* Index to the slot where the register was last read.
+ This is also the first slot in which the register may be written again */
+ int vector_lastread;
+ int scalar_lastread;
};
diff --git a/src/mesa/drivers/dri/r300/r300_fragprog.c b/src/mesa/drivers/dri/r300/r300_fragprog.c
index 3c54830..89e9f65 100644
--- a/src/mesa/drivers/dri/r300/r300_fragprog.c
+++ b/src/mesa/drivers/dri/r300/r300_fragprog.c
@@ -1026,10 +1026,11 @@ static void emit_tex(struct r300_fragment_program *rp,
*/
static int get_earliest_allowed_write(
struct r300_fragment_program* rp,
- GLuint dest)
+ GLuint dest, int mask)
{
COMPILE_STATE;
int idx;
+ int pos;
GLuint index = REG_GET_INDEX(dest);
assert(REG_GET_VALID(dest));
@@ -1047,7 +1048,17 @@ static int get_earliest_allowed_write(
return 0;
}
- return cs->hwtemps[idx].reserved;
+ pos = cs->hwtemps[idx].reserved;
+ if (mask & WRITEMASK_XYZ) {
+ if (pos < cs->hwtemps[idx].vector_lastread)
+ pos = cs->hwtemps[idx].vector_lastread;
+ }
+ if (mask & WRITEMASK_W) {
+ if (pos < cs->hwtemps[idx].scalar_lastread)
+ pos = cs->hwtemps[idx].scalar_lastread;
+ }
+
+ return pos;
}
@@ -1070,7 +1081,8 @@ static int find_and_prepare_slot(struct r300_fragment_program* rp,
GLboolean emit_sop,
int argc,
GLuint* src,
- GLuint dest)
+ GLuint dest,
+ int mask)
{
COMPILE_STATE;
int hwsrc[3];
@@ -1092,7 +1104,7 @@ static int find_and_prepare_slot(struct r300_fragment_program* rp,
if (emit_sop)
used |= SLOT_OP_SCALAR;
- pos = get_earliest_allowed_write(rp, dest);
+ pos = get_earliest_allowed_write(rp, dest, mask);
if (rp->node[rp->cur_node].alu_offset > pos)
pos = rp->node[rp->cur_node].alu_offset;
@@ -1191,6 +1203,21 @@ static int find_and_prepare_slot(struct r300_fragment_program* rp,
cs->slot[pos].ssrc[i] = tempssrc[i];
}
+ for(i = 0; i < argc; ++i) {
+ if (REG_GET_TYPE(src[i]) == REG_TYPE_TEMP) {
+ int regnr = hwsrc[i] & 31;
+
+ if (used & (SLOT_SRC_VECTOR << i)) {
+ if (cs->hwtemps[regnr].vector_lastread < pos)
+ cs->hwtemps[regnr].vector_lastread = pos;
+ }
+ if (used & (SLOT_SRC_SCALAR << i)) {
+ if (cs->hwtemps[regnr].scalar_lastread < pos)
+ cs->hwtemps[regnr].scalar_lastread = pos;
+ }
+ }
+ }
+
// Emit the source fetch code
rp->alu.inst[pos].inst1 &= ~R300_FPI1_SRC_MASK;
rp->alu.inst[pos].inst1 |=
@@ -1287,7 +1314,7 @@ static void emit_arith(struct r300_fragment_program *rp,
if ((mask & WRITEMASK_W) || vop == R300_FPI0_OUTC_REPL_ALPHA)
emit_sop = GL_TRUE;
- pos = find_and_prepare_slot(rp, emit_vop, emit_sop, argc, src, dest);
+ pos = find_and_prepare_slot(rp, emit_vop, emit_sop, argc, src, dest,