Re: [Mesa-dev] [PATCH 1/4] llvmpipe: (trivial) minimally simplify mask construction

2017-01-04 Thread Jose Fonseca

On 21/12/16 04:01, srol...@vmware.com wrote:

From: Roland Scheidegger 

simd instruction sets usually have comparisons for equal, not unequal.
So use a different comparison against the mask itself - which also means
we don't need a all-zero as well as a all-one (for the pxor) reg.

Also add code to avoid scalar expansion of i1 values which we definitely
shouldn't do. There's problems with this though with llvm select
interaction, so it's disabled (basically using llvm select instead of
intrinsics may still produce atrocious code, even in cases where we
figured it should not, albeit I think this could probably be fixed
with some better selection of optimization passes, but I have zero
idea there really).
---
 src/gallium/auxiliary/gallivm/lp_bld_logic.c |  2 ++
 src/gallium/drivers/llvmpipe/lp_bld_depth.c  | 52 ++--
 src/gallium/drivers/llvmpipe/lp_state_fs.c   | 16 +
 3 files changed, 53 insertions(+), 17 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_logic.c 
b/src/gallium/auxiliary/gallivm/lp_bld_logic.c
index 1a50e82..524917a 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_logic.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_logic.c
@@ -327,6 +327,8 @@ lp_build_select(struct lp_build_context *bld,
* supported yet for a long time, and LLVM will generate poor code when
* the mask is not the result of a comparison.
* Also, llvm 3.7 may miscompile them (bug 94972).
+   * XXX: Even if the instruction was an SExt, this may still produce
+   * terrible code. Try piglit stencil-twoside.
*/

   /* Convert the mask to a vector of booleans.
diff --git a/src/gallium/drivers/llvmpipe/lp_bld_depth.c 
b/src/gallium/drivers/llvmpipe/lp_bld_depth.c
index 0c27c2f..d5d5c5a 100644
--- a/src/gallium/drivers/llvmpipe/lp_bld_depth.c
+++ b/src/gallium/drivers/llvmpipe/lp_bld_depth.c
@@ -963,16 +963,48 @@ lp_build_depth_stencil_test(struct gallivm_state *gallivm,
if (stencil[0].enabled) {

   if (face) {
- LLVMValueRef zero = lp_build_const_int32(gallivm, 0);
-
- /* front_facing = face != 0 ? ~0 : 0 */
- front_facing = LLVMBuildICmp(builder, LLVMIntNE, face, zero, "");
- front_facing = LLVMBuildSExt(builder, front_facing,
-  LLVMIntTypeInContext(gallivm->context,
- 
s_bld.type.length*s_bld.type.width),
-  "");
- front_facing = LLVMBuildBitCast(builder, front_facing,
- s_bld.int_vec_type, "");
+ if (0) {
+/*
+ * XXX: the scalar expansion below produces atrocious code
+ * (basically producing a 64bit scalar value, then moving the 2
+ * 32bit pieces separately to simd, plus 4 shuffles, which is
+ * seriously lame). But the scalar-simd transitions are always
+ * tricky, so no big surprise there.
+ * This here would be way better, however llvm has some serious
+ * trouble later using it in the select, probably because it will
+ * recognize the expression as constant and move the simd value
+ * away (out of the loop) - and then it will suddenly try
+ * constructing i1 high-bit masks out of it later...
+ * (Try piglit stencil-twoside.)
+ * Note this is NOT due to using SExt/Trunc, it fails exactly the
+ * same even when using native compare/select.
+ * I cannot reproduce this problem when using stand-alone compiler
+ * though, suggesting some problem with optimization passes...
+ * (With stand-alone compilation, the construction of this mask
+ * value, no matter if the easy 3 instruction here or the complex
+ * 16+ one below, never gets separated from where it's used.)
+ * The scalar code still has the same problem, but the generated
+ * code looks a bit better at least for some reason, even if
+ * mostly by luck (the fundamental issue clearly is the same).
+ */



+front_facing = lp_build_broadcast(gallivm, s_bld.vec_type, face);
+/* front_facing = face != 0 ? ~0 : 0 */
+front_facing = lp_build_compare(gallivm, s_bld.type,
+PIPE_FUNC_NOTEQUAL,
+front_facing, s_bld.zero);
+ } else {


Let's leave the command about the ICmp/SExt innefficiencies, but remove 
this dead path.




+LLVMValueRef zero = lp_build_const_int32(gallivm, 0);
+
+/* front_facing = face != 0 ? ~0 : 0 */
+front_facing = LLVMBuildICmp(builder, LLVMIntNE, face, zero, "");
+front_facing = LLVMBuildSExt(builder, front_facing,
+ 

[Mesa-dev] [PATCH 1/4] llvmpipe: (trivial) minimally simplify mask construction

2016-12-20 Thread sroland
From: Roland Scheidegger 

simd instruction sets usually have comparisons for equal, not unequal.
So use a different comparison against the mask itself - which also means
we don't need a all-zero as well as a all-one (for the pxor) reg.

Also add code to avoid scalar expansion of i1 values which we definitely
shouldn't do. There's problems with this though with llvm select
interaction, so it's disabled (basically using llvm select instead of
intrinsics may still produce atrocious code, even in cases where we
figured it should not, albeit I think this could probably be fixed
with some better selection of optimization passes, but I have zero
idea there really).
---
 src/gallium/auxiliary/gallivm/lp_bld_logic.c |  2 ++
 src/gallium/drivers/llvmpipe/lp_bld_depth.c  | 52 ++--
 src/gallium/drivers/llvmpipe/lp_state_fs.c   | 16 +
 3 files changed, 53 insertions(+), 17 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_logic.c 
b/src/gallium/auxiliary/gallivm/lp_bld_logic.c
index 1a50e82..524917a 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_logic.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_logic.c
@@ -327,6 +327,8 @@ lp_build_select(struct lp_build_context *bld,
* supported yet for a long time, and LLVM will generate poor code when
* the mask is not the result of a comparison.
* Also, llvm 3.7 may miscompile them (bug 94972).
+   * XXX: Even if the instruction was an SExt, this may still produce
+   * terrible code. Try piglit stencil-twoside.
*/
 
   /* Convert the mask to a vector of booleans.
diff --git a/src/gallium/drivers/llvmpipe/lp_bld_depth.c 
b/src/gallium/drivers/llvmpipe/lp_bld_depth.c
index 0c27c2f..d5d5c5a 100644
--- a/src/gallium/drivers/llvmpipe/lp_bld_depth.c
+++ b/src/gallium/drivers/llvmpipe/lp_bld_depth.c
@@ -963,16 +963,48 @@ lp_build_depth_stencil_test(struct gallivm_state *gallivm,
if (stencil[0].enabled) {
 
   if (face) {
- LLVMValueRef zero = lp_build_const_int32(gallivm, 0);
-
- /* front_facing = face != 0 ? ~0 : 0 */
- front_facing = LLVMBuildICmp(builder, LLVMIntNE, face, zero, "");
- front_facing = LLVMBuildSExt(builder, front_facing,
-  LLVMIntTypeInContext(gallivm->context,
- 
s_bld.type.length*s_bld.type.width),
-  "");
- front_facing = LLVMBuildBitCast(builder, front_facing,
- s_bld.int_vec_type, "");
+ if (0) {
+/*
+ * XXX: the scalar expansion below produces atrocious code
+ * (basically producing a 64bit scalar value, then moving the 2
+ * 32bit pieces separately to simd, plus 4 shuffles, which is
+ * seriously lame). But the scalar-simd transitions are always
+ * tricky, so no big surprise there.
+ * This here would be way better, however llvm has some serious
+ * trouble later using it in the select, probably because it will
+ * recognize the expression as constant and move the simd value
+ * away (out of the loop) - and then it will suddenly try
+ * constructing i1 high-bit masks out of it later...
+ * (Try piglit stencil-twoside.)
+ * Note this is NOT due to using SExt/Trunc, it fails exactly the
+ * same even when using native compare/select.
+ * I cannot reproduce this problem when using stand-alone compiler
+ * though, suggesting some problem with optimization passes...
+ * (With stand-alone compilation, the construction of this mask
+ * value, no matter if the easy 3 instruction here or the complex
+ * 16+ one below, never gets separated from where it's used.)
+ * The scalar code still has the same problem, but the generated
+ * code looks a bit better at least for some reason, even if
+ * mostly by luck (the fundamental issue clearly is the same).
+ */
+front_facing = lp_build_broadcast(gallivm, s_bld.vec_type, face);
+/* front_facing = face != 0 ? ~0 : 0 */
+front_facing = lp_build_compare(gallivm, s_bld.type,
+PIPE_FUNC_NOTEQUAL,
+front_facing, s_bld.zero);
+ } else {
+LLVMValueRef zero = lp_build_const_int32(gallivm, 0);
+
+/* front_facing = face != 0 ? ~0 : 0 */
+front_facing = LLVMBuildICmp(builder, LLVMIntNE, face, zero, "");
+front_facing = LLVMBuildSExt(builder, front_facing,
+ LLVMIntTypeInContext(gallivm->context,
+
s_bld.type.length*s_bld.type.width),
+ "");
+