Re: [Nouveau] [Mesa-dev] [PATCH 1/2] nvc0/ir: detect AND/SHR pairs and convert into EXTBF

2015-08-18 Thread Matt Turner
On Tue, Aug 18, 2015 at 6:49 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
> Some shaders appear to extract bits using shift/and combos. Detect
> (some) of those and convert to EXTBF instead.

What is EXTBF? Extract byte to float?

I ask because Unigine Heaven has shaders that pack 3x byte-integers
into one component of a vec4 and extracts them with shifts/ands and
converts them to floats, and i965 could do the extraction and
conversion in a single instruction. I'm curious if this is the same
thing you're optimizing.

I thought about adding an extract_byte(src, byte_num) operation, but
i965's copy propagation caused me some headache and I shelved it.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [Mesa-dev] [PATCH 1/2] nvc0/ir: detect AND/SHR pairs and convert into EXTBF

2015-08-18 Thread Matt Turner
On Tue, Aug 18, 2015 at 6:57 PM, Matt Turner <matts...@gmail.com> wrote:
> On Tue, Aug 18, 2015 at 6:49 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
>> Some shaders appear to extract bits using shift/and combos. Detect
>> (some) of those and convert to EXTBF instead.
>
> What is EXTBF? Extract byte to float?
>
> I ask because Unigine Heaven has shaders that pack 3x byte-integers
> into one component of a vec4 and extracts them with shifts/ands and
> converts them to floats, and i965 could do the extraction and
> conversion in a single instruction. I'm curious if this is the same
> thing you're optimizing.

Well, I apparently just needed to read your second patch's commit
message to confirm my suspicions.

> I thought about adding an extract_byte(src, byte_num) operation, but
> i965's copy propagation caused me some headache and I shelved it.


Re: [Nouveau] [Mesa-dev] [PATCH 1/2] nvc0/ir: detect AND/SHR pairs and convert into EXTBF

2015-08-18 Thread Ilia Mirkin
On Tue, Aug 18, 2015 at 9:57 PM, Matt Turner <matts...@gmail.com> wrote:
> On Tue, Aug 18, 2015 at 6:49 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
>> Some shaders appear to extract bits using shift/and combos. Detect
>> (some) of those and convert to EXTBF instead.
>
> What is EXTBF? Extract byte to float?

Extract Bitfield.
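
For readers unfamiliar with the operation, a rough software model may help. This is only a sketch: it assumes the immediate layout stated in the patch comment (low byte = offset, high byte = width) and ignores signed variants. The names pack_extbf_imm and extbf_u32 are hypothetical helpers for illustration, not Mesa or hardware APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: pack offset/width the way the patch comment describes
// (low byte = offset, high byte = width).
inline uint32_t pack_extbf_imm(uint32_t offset, uint32_t width)
{
   assert(offset < 32 && width >= 1 && width <= 32);
   return (width << 8) | offset;
}

// Hypothetical software model of an unsigned bitfield extract.
inline uint32_t extbf_u32(uint32_t x, uint32_t imm)
{
   const uint32_t offset = imm & 0xff;
   const uint32_t width = (imm >> 8) & 0xff;
   assert(offset < 32 && width >= 1 && width <= 32);
   // Avoid the undefined shift by 32 when the field covers the whole word.
   const uint32_t mask = (width == 32) ? ~0u : ((1u << width) - 1u);
   return (x >> offset) & mask;
}
```

Under this model, extbf_u32(x, pack_extbf_imm(8, 8)) computes the same value as (x >> 8) & 0xff, which is the shift/and combo the patch looks for.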


> I ask because Unigine Heaven has shaders that pack 3x byte-integers
> into one component of a vec4 and extracts them with shifts/ands and
> converts them to floats, and i965 could do the extraction and
> conversion in a single instruction. I'm curious if this is the same
> thing you're optimizing.
>
> I thought about adding an extract_byte(src, byte_num) operation, but
> i965's copy propagation caused me some headache and I shelved it.

Yes, I think it's the same shader... it's doing a texelFetch() and
then grabbing bytes 0, 1, 2 off that.

The generated shader code after the second patch does:

/*05d0*/   TLD.LL.P R0, R24, 0x0, 2D, 0x3;
/*05d8*/   TEXDEPBAR 0x0;
/*05e0*/   I2F.F32.U8 R2, R1;
/*05e8*/   FFMA.FTZ R2, R2, R15, R19;
/*05f0*/   I2F.F32.U8 R8, R1.B1;
/*05f8*/   FFMA.FTZ R8, R8, R15, R19;
/*0608*/   I2F.F32.U8 R1, R1.B2;

I'll let you guess what these things mean. TLD = texelfetch :)
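
As a hint for anyone who would rather not guess: going by the second patch's commit message, the I2F.F32.U8 lines with a .B1 or .B2 suffix appear to convert a single byte of R1 straight to float, which is what the shader's shift/and/convert sequence computes. A minimal C++ model under that reading, with a hypothetical function name:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of "I2F.F32.U8 Rd, Rs.Bn": select byte n of a 32-bit
// register and convert it to float in one step. This is an interpretation of
// the disassembly above, not a description taken from hardware docs.
inline float i2f_f32_u8(uint32_t reg, unsigned byte_index)
{
   assert(byte_index < 4);
   const uint32_t byte = (reg >> (8u * byte_index)) & 0xffu;
   return static_cast<float>(byte);
}
```

In this model, bytes 0, 1 and 2 of the fetched texel correspond to the plain R1, R1.B1 and R1.B2 operands in the listing.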

  -ilia


[Nouveau] [PATCH 1/2] nvc0/ir: detect AND/SHR pairs and convert into EXTBF

2015-08-18 Thread Ilia Mirkin
Some shaders appear to extract bits using shift/and combos. Detect
(some) of those and convert to EXTBF instead.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 66 +++---
 1 file changed, 46 insertions(+), 20 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 3841c33..b0e74f0 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -1023,27 +1023,53 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
 
    case OP_AND:
    {
-      CmpInstruction *cmp = i->getSrc(t)->getInsn()->asCmp();
-      if (!cmp || cmp->op == OP_SLCT || cmp->getDef(0)->refCount() > 1)
-         return;
-      if (!prog->getTarget()->isOpSupported(cmp->op, TYPE_F32))
-         return;
-      if (imm0.reg.data.f32 != 1.0)
-         return;
-      if (i->getSrc(t)->getInsn()->dType != TYPE_U32)
-         return;
+      Instruction *src = i->getSrc(t)->getInsn();
+      ImmediateValue imm1;
+      if (imm0.reg.data.u32 == 0) {
+         i->op = OP_MOV;
+         i->setSrc(0, new_ImmediateValue(prog, 0u));
+         i->src(0).mod = Modifier(0);
+         i->setSrc(1, NULL);
+      } else if (imm0.reg.data.u32 == ~0U) {
+         i->op = i->src(t).mod.getOp();
+         if (t) {
+            i->setSrc(0, i->getSrc(t));
+            i->src(0).mod = i->src(t).mod;
+         }
+         i->setSrc(1, NULL);
+      } else if (src->asCmp()) {
+         CmpInstruction *cmp = src->asCmp();
+         if (!cmp || cmp->op == OP_SLCT || cmp->getDef(0)->refCount() > 1)
+            return;
+         if (!prog->getTarget()->isOpSupported(cmp->op, TYPE_F32))
+            return;
+         if (imm0.reg.data.f32 != 1.0)
+            return;
+         if (cmp->dType != TYPE_U32)
+            return;
 
-      i->getSrc(t)->getInsn()->dType = TYPE_F32;
-      if (i->src(t).mod != Modifier(0)) {
-         assert(i->src(t).mod == Modifier(NV50_IR_MOD_NOT));
-         i->src(t).mod = Modifier(0);
-         cmp->setCond = inverseCondCode(cmp->setCond);
-      }
-      i->op = OP_MOV;
-      i->setSrc(s, NULL);
-      if (t) {
-         i->setSrc(0, i->getSrc(t));
-         i->setSrc(t, NULL);
+         cmp->dType = TYPE_F32;
+         if (i->src(t).mod != Modifier(0)) {
+            assert(i->src(t).mod == Modifier(NV50_IR_MOD_NOT));
+            i->src(t).mod = Modifier(0);
+            cmp->setCond = inverseCondCode(cmp->setCond);
+         }
+         i->op = OP_MOV;
+         i->setSrc(s, NULL);
+         if (t) {
+            i->setSrc(0, i->getSrc(t));
+            i->setSrc(t, NULL);
+         }
+      } else if (prog->getTarget()->isOpSupported(OP_EXTBF, TYPE_U32) &&
+                 src->op == OP_SHR &&
+                 src->src(1).getImmediate(imm1) &&
+                 i->src(t).mod == Modifier(0) &&
+                 util_is_power_of_two(imm0.reg.data.u32 + 1)) {
+         // low byte = offset, high byte = width
+         uint32_t ext = (util_last_bit(imm0.reg.data.u32) << 8) | imm1.reg.data.u32;
+         i->op = OP_EXTBF;
+         i->setSrc(0, src->getSrc(0));
+         i->setSrc(1, new_ImmediateValue(prog, ext));
       }
    }
       break;
-- 
2.4.6
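
The AND/SHR match in the last branch of the hunk above can be modeled outside the compiler. The sketch below is an illustration only, not the Mesa code: and_shr_to_extbf_imm is a hypothetical name, it returns 0 for "no match" (a valid immediate always has a non-zero width byte), and it assumes that all-zero and all-one masks never reach it because earlier branches of the patch handle them. The shift range check is also an assumption added for the model.

```cpp
#include <cstdint>

// Hypothetical model of the check on AND(mask, SHR(x, shift)).
// Returns the EXTBF immediate, or 0 when the pattern does not apply.
inline uint32_t and_shr_to_extbf_imm(uint32_t mask, uint32_t shift)
{
   // Assumption: 0 and ~0 masks are handled elsewhere; shifts of 32 or more
   // are not considered.
   if (mask == 0 || mask == ~0u || shift > 31)
      return 0;
   // mask + 1 must be a power of two, i.e. mask is a run of low set bits.
   if ((mask & (mask + 1u)) != 0)
      return 0;

   // Count bits up to the highest set one, which is what util_last_bit
   // yields for such a mask.
   uint32_t width = 0;
   for (uint32_t m = mask; m != 0; m >>= 1)
      ++width;

   return (width << 8) | shift;   // low byte = offset, high byte = width
}
```

For example, a mask of 0xff applied after a shift by 8 yields 0x0808 in this model, while a mask such as 0xf0 is rejected because it is not a run of low bits.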



[Nouveau] [PATCH 2/2] nvc0/ir: detect i2f/i2i which operate on specific bytes/words

2015-08-18 Thread Ilia Mirkin
Some Unigine shaders have been observed to unpack bytes out of 32-bit
integers and convert them to floats. I2F/I2I can handle this sort of
thing directly. Detect the handleable situations.

This misses 16-bit word capabilities in nv50, but I haven't seen shaders
that would actually make use of that.
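
To illustrate what "handleable" could mean here, the following sketch maps an aligned 8- or 16-bit field to a byte-granular selector. It is an assumption-laden model, not the patch's code: the function name cvt_sub_selector is hypothetical, and the encoding (selector = bit offset / 8) is inferred only from the emitter comment in this patch saying that word 1 is represented as 2.

```cpp
#include <cstdint>

// Hypothetical helper: map an aligned 8- or 16-bit field starting at bit
// `offset` to a selector in 0..3. Under the assumed encoding the upper
// 16-bit word gets selector 2. Returns -1 when the field cannot be folded
// into the conversion.
inline int cvt_sub_selector(unsigned width, unsigned offset)
{
   if (width != 8 && width != 16)
      return -1;
   // The field must be naturally aligned and lie within the 32-bit source.
   if (offset % width != 0 || offset + width > 32)
      return -1;
   return static_cast<int>(offset / 8);
}
```

In this model a byte at bit 8 selects 1, the upper word at bit 16 selects 2, and a 16-bit field at bit 8 is rejected as unaligned.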

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
 .../drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp |  1 +
 .../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp |  2 +
 .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  |  4 ++
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 79 --
 4 files changed, 82 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
index f06056f..8f15429 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
@@ -933,6 +933,7 @@ CodeEmitterGK110::emitCVT(const Instruction *i)
 
    code[0] |= typeSizeofLog2(dType) << 10;
    code[0] |= typeSizeofLog2(i->sType) << 12;
+   code[1] |= i->subOp << 12;
 
    if (isSignedIntType(dType))
       code[0] |= 0x4000;
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
index ef5c87d..6e22788 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
@@ -818,6 +818,7 @@ CodeEmitterGM107::emitI2F()
    emitField(0x31, 1, (insn->op == OP_ABS) || insn->src(0).mod.abs());
    emitCC   (0x2f);
    emitField(0x2d, 1, (insn->op == OP_NEG) || insn->src(0).mod.neg());
+   emitField(0x29, 2, insn->subOp);
    emitRND  (0x27, rnd, -1);
    emitField(0x0d, 1, isSignedType(insn->sType));
    emitField(0x0a, 2, util_logbase2(typeSizeof(insn->sType)));
@@ -850,6 +851,7 @@ CodeEmitterGM107::emitI2I()
    emitField(0x31, 1, (insn->op == OP_ABS) || insn->src(0).mod.abs());
    emitCC   (0x2f);
    emitField(0x2d, 1, (insn->op == OP_NEG) || insn->src(0).mod.neg());
+   emitField(0x29, 2, insn->subOp);
    emitField(0x0d, 1, isSignedType(insn->sType));
    emitField(0x0c, 1, isSignedType(insn->dType));
    emitField(0x0a, 2, util_logbase2(typeSizeof(insn->sType)));
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
index 5703712..6bf5219 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
@@ -1020,6 +1020,10 @@ CodeEmitterNVC0::emitCVT(Instruction *i)
       code[0] |= util_logbase2(typeSizeof(dType)) << 20;
       code[0] |= util_logbase2(typeSizeof(i->sType)) << 23;
 
+      // for 8/16 source types, the byte/word is in subOp. word 1 is
+      // represented as 2.
+      code[1] |= i->subOp << 0x17;
+
       if (sat)
          code[0] |= 0x20;
       if (abs)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index b0e74f0..e37420c 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -1312,7 +1312,8 @@ private:
    void handleRCP(Instruction *);
    void handleSLCT(Instruction *);
    void handleLOGOP(Instruction *);
-   void handleCVT(Instruction *);
+   void handleCVT_NEG(Instruction *);
+   void handleCVT_EXTBF(Instruction *);
    void handleSUCLAMP(Instruction *);
 
    BuildUtil bld;
@@ -1563,12 +1564,12 @@ AlgebraicOpt::handleLOGOP(Instruction *logop)
 // nv50:
 //  F2I(NEG(I2F(ABS(SET))))
 void
-AlgebraicOpt::handleCVT(Instruction *cvt)
+AlgebraicOpt::handleCVT_NEG(Instruction *cvt)
 {
+   Instruction *insn = cvt->getSrc(0)->getInsn();
    if (cvt->sType != TYPE_F32 ||
        cvt->dType != TYPE_S32 || cvt->src(0).mod != Modifier(0))
       return;
-   Instruction *insn = cvt->getSrc(0)->getInsn();
    if (!insn || insn->op != OP_NEG || insn->dType != TYPE_F32)
       return;
    if (insn->src(0).mod != Modifier(0))
@@ -1598,6 +1599,74 @@ AlgebraicOpt::handleCVT(Instruction *cvt)
    delete_Instruction(prog, cvt);
 }
 
+// Some shaders extract packed bytes out of words and convert them to
+// e.g. float. The Fermi+ CVT instruction can extract those directly, as can
+// nv50 for word sizes.
+//
+// CVT(EXTBF(x, byte/word))
+// CVT(AND(bytemask, x))
+// CVT(AND(bytemask, SHR(x, 8/16/24)))
+void
+AlgebraicOpt::handleCVT_EXTBF(Instruction *cvt)
+{
+   Instruction *insn = cvt->getSrc(0)->getInsn();
+   ImmediateValue imm0, imm1;
+   Value *arg = NULL;
+   unsigned width, offset;
+   if ((cvt->sType != TYPE_U32 && cvt->sType != TYPE_S32) || !insn)
+      return;
+   if (insn->op == OP_EXTBF && insn->src(1).getImmediate(imm0)) {
+      width = (imm0.reg.data.u32 >> 8) & 0xff;
+      offset = imm0.reg.data.u32 & 0xff;
+      arg = insn->getSrc(0);
+
+      if (width != 8 && width != 16)
+         return;
+      if (width == 8 &&

[Nouveau] [Bug 91125] [NVE7] Nouveau read fault, locking up the gpu

2015-08-18 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=91125

--- Comment #6 from David March <davidmarch...@gmail.com> ---
Probable duplicate bug (though this current bug was filed earlier) with more
information:

https://bugs.freedesktop.org/show_bug.cgi?id=91598

Apparently, per the above, "fail set_domain" means that it is running out of VRAM.

-- 
You are receiving this mail because:
You are the assignee for the bug.


[Nouveau] [Bug 91598] Broken Rendering of Plasma 5 Desktop

2015-08-18 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=91598

--- Comment #11 from David March <davidmarch...@gmail.com> ---
I was able to initiate an apitrace when some textual corruptions (but not the
full menu corruptions which happen in the later stages) started appearing on
the plasmashell menus.  However upon playing back the trace it does not
recreate the issue so I doubt it will do much good?  If you still want it let
me know.  The trace is 4.3 GB and gzip is able to take it down to about 2.5 GB.

If you only need to see it to see what it looks like when the corruptions occur
then would some before and after screenshots perhaps work better?

Also I have found that after closing some applications and then doing 'echo 1 >
/proc/sys/vm/drop_caches' I can almost immediately restore proper rendering
again.  Next time I will test further to see if dropping caches always works
even without closing any applications.  If it does I would think that narrows
down what is going on with this bug and what is triggering it.



[Nouveau] [Bug 91598] Broken Rendering of Plasma 5 Desktop

2015-08-18 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=91598

--- Comment #12 from David March <davidmarch...@gmail.com> ---
Probably the same bug responsible:
https://bugs.freedesktop.org/show_bug.cgi?id=91125 includes screenshots of
corruption from another user.
