Re: [Nouveau] [Mesa-dev] [PATCH 1/2] nvc0/ir: detect AND/SHR pairs and convert into EXTBF
On Tue, Aug 18, 2015 at 6:49 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
> Some shaders appear to extract bits using shift/and combos. Detect
> (some) of those and convert to EXTBF instead.

What is EXTBF? Extract byte to float?

I ask because Unigine Heaven has shaders that pack 3x byte-integers
into one component of a vec4 and extracts them with shifts/ands and
converts them to floats, and i965 could do the extraction and
conversion in a single instruction. I'm curious if this is the same
thing you're optimizing.

I thought about adding an extract_byte(src, byte_num) operation, but
i965's copy propagation caused me some headache and I shelved it.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [Mesa-dev] [PATCH 1/2] nvc0/ir: detect AND/SHR pairs and convert into EXTBF
On Tue, Aug 18, 2015 at 6:57 PM, Matt Turner matts...@gmail.com wrote:
> On Tue, Aug 18, 2015 at 6:49 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
>> Some shaders appear to extract bits using shift/and combos. Detect
>> (some) of those and convert to EXTBF instead.
>
> What is EXTBF? Extract byte to float?
>
> I ask because Unigine Heaven has shaders that pack 3x byte-integers
> into one component of a vec4 and extracts them with shifts/ands and
> converts them to floats, and i965 could do the extraction and
> conversion in a single instruction. I'm curious if this is the same
> thing you're optimizing.

Well, I apparently just needed to read your second patch's commit
message to confirm my suspicions.

> I thought about adding an extract_byte(src, byte_num) operation, but
> i965's copy propagation caused me some headache and I shelved it.
Re: [Nouveau] [Mesa-dev] [PATCH 1/2] nvc0/ir: detect AND/SHR pairs and convert into EXTBF
On Tue, Aug 18, 2015 at 9:57 PM, Matt Turner matts...@gmail.com wrote:
> On Tue, Aug 18, 2015 at 6:49 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
>> Some shaders appear to extract bits using shift/and combos. Detect
>> (some) of those and convert to EXTBF instead.
>
> What is EXTBF? Extract byte to float?

Extract Bitfield.

> I ask because Unigine Heaven has shaders that pack 3x byte-integers
> into one component of a vec4 and extracts them with shifts/ands and
> converts them to floats, and i965 could do the extraction and
> conversion in a single instruction. I'm curious if this is the same
> thing you're optimizing.
>
> I thought about adding an extract_byte(src, byte_num) operation, but
> i965's copy propagation caused me some headache and I shelved it.

Yes, I think it's the same shader... it's doing a texelFetch() and
then grabbing bytes 0, 1, 2 off that. The generated shader code after
the second patch does:

/*05d0*/ TLD.LL.P R0, R24, 0x0, 2D, 0x3;
/*05d8*/ TEXDEPBAR 0x0;
/*05e0*/ I2F.F32.U8 R2, R1;
/*05e8*/ FFMA.FTZ R2, R2, R15, R19;
/*05f0*/ I2F.F32.U8 R8, R1.B1;
/*05f8*/ FFMA.FTZ R8, R8, R15, R19;
/*0608*/ I2F.F32.U8 R1, R1.B2;

I'll let you guess what these things mean. TLD = texelfetch :)

-ilia
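For readers following the thread, here is a minimal C++ sketch of the source-level pattern under discussion: three bytes packed into one 32-bit value, pulled out with shifts and masks, then converted to float. The helper name unpack_bytes_to_float is hypothetical and does not come from the shader or from Mesa. The register names in the comments refer to the disassembly above.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper (not from Mesa or the shader): unpack the three low
// bytes of a 32-bit texel with shift/and and convert each one to float.
inline std::array<float, 3> unpack_bytes_to_float(uint32_t texel)
{
   return {
      static_cast<float>(texel & 0xffu),          // byte 0, cf. I2F.F32.U8 R2, R1
      static_cast<float>((texel >> 8) & 0xffu),   // byte 1, cf. I2F.F32.U8 R8, R1.B1
      static_cast<float>((texel >> 16) & 0xffu),  // byte 2, cf. I2F.F32.U8 R1, R1.B2
   };
}
```

Each array element corresponds to one shift/and/convert chain, which is what the two patches try to collapse into a single conversion with a byte selector.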
[Nouveau] [PATCH 1/2] nvc0/ir: detect AND/SHR pairs and convert into EXTBF
Some shaders appear to extract bits using shift/and combos. Detect
(some) of those and convert to EXTBF instead.

Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
---
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 66 +++---
 1 file changed, 46 insertions(+), 20 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 3841c33..b0e74f0 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -1023,27 +1023,53 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
 
    case OP_AND:
    {
-      CmpInstruction *cmp = i->getSrc(t)->getInsn()->asCmp();
-      if (!cmp || cmp->op == OP_SLCT || cmp->getDef(0)->refCount() > 1)
-         return;
-      if (!prog->getTarget()->isOpSupported(cmp->op, TYPE_F32))
-         return;
-      if (imm0.reg.data.f32 != 1.0)
-         return;
-      if (i->getSrc(t)->getInsn()->dType != TYPE_U32)
-         return;
+      Instruction *src = i->getSrc(t)->getInsn();
+      ImmediateValue imm1;
+      if (imm0.reg.data.u32 == 0) {
+         i->op = OP_MOV;
+         i->setSrc(0, new_ImmediateValue(prog, 0u));
+         i->src(0).mod = Modifier(0);
+         i->setSrc(1, NULL);
+      } else if (imm0.reg.data.u32 == ~0U) {
+         i->op = i->src(t).mod.getOp();
+         if (t) {
+            i->setSrc(0, i->getSrc(t));
+            i->src(0).mod = i->src(t).mod;
+         }
+         i->setSrc(1, NULL);
+      } else if (src->asCmp()) {
+         CmpInstruction *cmp = src->asCmp();
+         if (!cmp || cmp->op == OP_SLCT || cmp->getDef(0)->refCount() > 1)
+            return;
+         if (!prog->getTarget()->isOpSupported(cmp->op, TYPE_F32))
+            return;
+         if (imm0.reg.data.f32 != 1.0)
+            return;
+         if (cmp->dType != TYPE_U32)
+            return;
 
-      i->getSrc(t)->getInsn()->dType = TYPE_F32;
-      if (i->src(t).mod != Modifier(0)) {
-         assert(i->src(t).mod == Modifier(NV50_IR_MOD_NOT));
-         i->src(t).mod = Modifier(0);
-         cmp->setCond = inverseCondCode(cmp->setCond);
-      }
-      i->op = OP_MOV;
-      i->setSrc(s, NULL);
-      if (t) {
-         i->setSrc(0, i->getSrc(t));
-         i->setSrc(t, NULL);
+         cmp->dType = TYPE_F32;
+         if (i->src(t).mod != Modifier(0)) {
+            assert(i->src(t).mod == Modifier(NV50_IR_MOD_NOT));
+            i->src(t).mod = Modifier(0);
+            cmp->setCond = inverseCondCode(cmp->setCond);
+         }
+         i->op = OP_MOV;
+         i->setSrc(s, NULL);
+         if (t) {
+            i->setSrc(0, i->getSrc(t));
+            i->setSrc(t, NULL);
+         }
+      } else if (prog->getTarget()->isOpSupported(OP_EXTBF, TYPE_U32) &&
+                 src->op == OP_SHR &&
+                 src->src(1).getImmediate(imm1) &&
+                 i->src(t).mod == Modifier(0) &&
+                 util_is_power_of_two(imm0.reg.data.u32 + 1)) {
+         // low byte = offset, high byte = width
+         uint32_t ext = (util_last_bit(imm0.reg.data.u32) << 8) | imm1.reg.data.u32;
+         i->op = OP_EXTBF;
+         i->setSrc(0, src->getSrc(0));
+         i->setSrc(1, new_ImmediateValue(prog, ext));
       }
    }
       break;
-- 
2.4.6
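As a standalone illustration of the new EXTBF branch, the sketch below models the match condition and the immediate encoding described by the patch comment (low byte = offset, next byte = width). Everything here is an assumption for illustration: is_power_of_two and last_bit are hypothetical stand-ins for Mesa's util_is_power_of_two() and util_last_bit(), and extbf() is a plain software reference model, not the hardware instruction.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the Mesa utility functions used by the patch.
inline bool is_power_of_two(uint32_t v) { return v != 0 && (v & (v - 1)) == 0; }

// Position of the highest set bit, counting from 1 (0 if no bit is set).
inline unsigned last_bit(uint32_t v)
{
   unsigned n = 0;
   while (v) {
      ++n;
      v >>= 1;
   }
   return n;
}

// Returns true and fills `ext` when AND(mask, SHR(x, shift)) can be written
// as a single bitfield extract, i.e. when mask is a contiguous run of low
// bits. Encoding follows the patch comment: low byte = offset, then width.
inline bool encode_extbf(uint32_t mask, uint32_t shift, uint32_t &ext)
{
   if (shift > 31 || !is_power_of_two(mask + 1))
      return false;
   ext = (last_bit(mask) << 8) | shift;
   return true;
}

// Software reference model of an unsigned bitfield extract taking that
// immediate (an assumption, not a description of the hardware).
inline uint32_t extbf(uint32_t x, uint32_t ext)
{
   const unsigned offset = ext & 0xff;
   const unsigned width = (ext >> 8) & 0xff;
   const uint32_t field_mask = width >= 32 ? ~0u : (1u << width) - 1u;
   return (x >> offset) & field_mask;
}
```

For example, a mask of 0xff with a shift of 8 encodes to 0x0808, and extbf(x, 0x0808) then matches (x >> 8) & 0xff. A mask such as 0xf0 is rejected because it does not start at bit 0.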
[Nouveau] [PATCH 2/2] nvc0/ir: detect i2f/i2i which operate on specific bytes/words
Some Unigine shaders have been observed to unpack bytes out of 32-bit
integers and convert them to floats. I2F/I2I can handle this sort of
thing directly. Detect the handleable situations.

This misses 16-bit word capabilities in nv50, but I haven't seen shaders
that would actually make use of that.

Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
---
 .../drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 1 +
 .../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 2 +
 .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 4 ++
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 79 --
 4 files changed, 82 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
index f06056f..8f15429 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
@@ -933,6 +933,7 @@ CodeEmitterGK110::emitCVT(const Instruction *i)
 
    code[0] |= typeSizeofLog2(dType) << 10;
    code[0] |= typeSizeofLog2(i->sType) << 12;
+   code[1] |= i->subOp << 12;
 
    if (isSignedIntType(dType))
       code[0] |= 0x4000;
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
index ef5c87d..6e22788 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
@@ -818,6 +818,7 @@ CodeEmitterGM107::emitI2F()
    emitField(0x31, 1, (insn->op == OP_ABS) || insn->src(0).mod.abs());
    emitCC (0x2f);
    emitField(0x2d, 1, (insn->op == OP_NEG) || insn->src(0).mod.neg());
+   emitField(0x29, 2, insn->subOp);
    emitRND (0x27, rnd, -1);
    emitField(0x0d, 1, isSignedType(insn->sType));
    emitField(0x0a, 2, util_logbase2(typeSizeof(insn->sType)));
@@ -850,6 +851,7 @@ CodeEmitterGM107::emitI2I()
    emitField(0x31, 1, (insn->op == OP_ABS) || insn->src(0).mod.abs());
    emitCC (0x2f);
    emitField(0x2d, 1, (insn->op == OP_NEG) || insn->src(0).mod.neg());
+   emitField(0x29, 2, insn->subOp);
    emitField(0x0d, 1, isSignedType(insn->sType));
    emitField(0x0c, 1, isSignedType(insn->dType));
    emitField(0x0a, 2, util_logbase2(typeSizeof(insn->sType)));
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
index 5703712..6bf5219 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
@@ -1020,6 +1020,10 @@ CodeEmitterNVC0::emitCVT(Instruction *i)
    code[0] |= util_logbase2(typeSizeof(dType)) << 20;
    code[0] |= util_logbase2(typeSizeof(i->sType)) << 23;
 
+   // for 8/16 source types, the byte/word is in subOp. word 1 is
+   // represented as 2.
+   code[1] |= i->subOp << 0x17;
+
    if (sat)
       code[0] |= 0x20;
    if (abs)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index b0e74f0..e37420c 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -1312,7 +1312,8 @@ private:
    void handleRCP(Instruction *);
    void handleSLCT(Instruction *);
    void handleLOGOP(Instruction *);
-   void handleCVT(Instruction *);
+   void handleCVT_NEG(Instruction *);
+   void handleCVT_EXTBF(Instruction *);
    void handleSUCLAMP(Instruction *);
 
    BuildUtil bld;
@@ -1563,12 +1564,12 @@ AlgebraicOpt::handleLOGOP(Instruction *logop)
 // nv50:
 //  F2I(NEG(I2F(ABS(SET))))
 void
-AlgebraicOpt::handleCVT(Instruction *cvt)
+AlgebraicOpt::handleCVT_NEG(Instruction *cvt)
 {
+   Instruction *insn = cvt->getSrc(0)->getInsn();
    if (cvt->sType != TYPE_F32 ||
        cvt->dType != TYPE_S32 || cvt->src(0).mod != Modifier(0))
       return;
-   Instruction *insn = cvt->getSrc(0)->getInsn();
    if (!insn || insn->op != OP_NEG || insn->dType != TYPE_F32)
       return;
    if (insn->src(0).mod != Modifier(0))
@@ -1598,6 +1599,74 @@ AlgebraicOpt::handleCVT(Instruction *cvt)
    delete_Instruction(prog, cvt);
 }
 
+// Some shaders extract packed bytes out of words and convert them to
+// e.g. float. The Fermi+ CVT instruction can extract those directly, as can
+// nv50 for word sizes.
+//
+// CVT(EXTBF(x, byte/word))
+// CVT(AND(bytemask, x))
+// CVT(AND(bytemask, SHR(x, 8/16/24)))
+void
+AlgebraicOpt::handleCVT_EXTBF(Instruction *cvt)
+{
+   Instruction *insn = cvt->getSrc(0)->getInsn();
+   ImmediateValue imm0, imm1;
+   Value *arg = NULL;
+   unsigned width, offset;
+   if ((cvt->sType != TYPE_U32 && cvt->sType != TYPE_S32) || !insn)
+      return;
+   if (insn->op == OP_EXTBF && insn->src(1).getImmediate(imm0)) {
+      width = (imm0.reg.data.u32 >> 8) & 0xff;
+      offset = imm0.reg.data.u32 & 0xff;
+      arg = insn->getSrc(0);
+
+      if (width != 8 && width != 16)
+         return;
+      if (width == 8
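The idea behind handleCVT_EXTBF can be sketched independently of the Mesa classes: decode the width and offset from the EXTBF immediate, accept only whole bytes or 16-bit words, and derive the byte/word selector that the emitters write from subOp. The function name cvt_subop_for_field is hypothetical. The alignment and range checks are assumptions about what a byte/word selector can express, not a reconstruction of the rest of the patch. The selector value follows the emitter comment above ("word 1 is represented as 2").

```cpp
#include <cstdint>

// Hypothetical helper: given an EXTBF immediate (low byte = offset, next
// byte = width), return the CVT byte/word selector, or -1 if the field
// cannot be expressed as a byte or word selection.
inline int cvt_subop_for_field(uint32_t ext)
{
   const unsigned width = (ext >> 8) & 0xff;
   const unsigned offset = ext & 0xff;

   // Only whole bytes and 16-bit words are candidates.
   if (width != 8 && width != 16)
      return -1;

   // Assumption: the field must be aligned to its own size and fit inside
   // the 32-bit source register.
   if (offset % width != 0 || offset + width > 32)
      return -1;

   // Selector counts in bytes: bytes 0..3 map to 0..3, word 1 maps to 2.
   return static_cast<int>(offset / 8);
}
```

With this model, the immediate 0x0808 (byte at bit 8) selects 1, matching the R1.B1 operand in the disassembly earlier in the thread, while an unaligned field such as 0x0804 is rejected.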
[Nouveau] [Bug 91125] [NVE7] Nouveau read fault, locking up the gpu
https://bugs.freedesktop.org/show_bug.cgi?id=91125

--- Comment #6 from David March davidmarch...@gmail.com ---
Probable duplicate bug (though this current bug was filed earlier) with
more information: https://bugs.freedesktop.org/show_bug.cgi?id=91598

Apparently, per the above, "fail set_domain" means that it is running
out of VRAM.

-- 
You are receiving this mail because:
You are the assignee for the bug.
[Nouveau] [Bug 91598] Broken Rendering of Plasma 5 Desktop
https://bugs.freedesktop.org/show_bug.cgi?id=91598

--- Comment #11 from David March davidmarch...@gmail.com ---
I was able to capture an apitrace when some textual corruption (but not
the full menu corruption that happens in the later stages) started
appearing on the plasmashell menus. However, playing back the trace does
not recreate the issue, so I doubt it will do much good. If you still
want it, let me know. The trace is 4.3 GB, and gzip takes it down to
about 2.5 GB. If you only need to see what it looks like when the
corruption occurs, would some before-and-after screenshots perhaps work
better?

Also, I have found that after closing some applications and then running
'echo 1 > /proc/sys/vm/drop_caches' I can almost immediately restore
proper rendering. Next time I will test further to see whether dropping
caches always works even without closing any applications. If it does, I
would think that narrows down what is going on with this bug and what is
triggering it.
[Nouveau] [Bug 91598] Broken Rendering of Plasma 5 Desktop
https://bugs.freedesktop.org/show_bug.cgi?id=91598

--- Comment #12 from David March davidmarch...@gmail.com ---
Probably the same bug is responsible:
https://bugs.freedesktop.org/show_bug.cgi?id=91125
That report includes screenshots of the corruption from another user.