Re: [rfc rs6000] troubles with gimple folding for vec_sel

2018-10-24 Thread Segher Boessenkool
On Wed, Oct 24, 2018 at 02:27:07PM +0200, Richard Biener wrote:
> If vec_sel is really a bitwise merge then there's no choice but using
> BIT_{AND,IOR,NOT}_EXPR for open-coding it.  I suppose the
> original builtin expanded to an UNSPEC because there's nothing
> in RTL besides bitwise operations describing the semantics.

The machine instructions vsel/xxsel work by bit, yes.


Segher


Re: [rfc rs6000] troubles with gimple folding for vec_sel

2018-10-24 Thread Richard Biener
On Tue, Oct 23, 2018 at 11:47 PM Will Schmidt  wrote:
>
> Hi all,
> I've been attempting to get early gimple-folding to work with the
> vec_sel intrinsic for powerpc, and I've run into a snag or two such that
> I'm not sure how to best proceed.  Code snippet is below, followed by a
> description of the issues as I interpret them below.
>
> Apologies for the ramble, Thanks in advance, ...  :-)
>
> -Will
>
> ---8<---
> /* vector selects */
> /* d = vec_sel (a, b, c)
> Each bit of the result vector (d) has the value of
> the corresponding bit of (a) if the corresponding bit
> of (c) is 0. Otherwise, each bit of the result vector
> has the value of the corresponding bit of (b).  */
> case ALTIVEC_BUILTIN_VSEL_16QI:
> case ALTIVEC_BUILTIN_VSEL_8HI:
> case ALTIVEC_BUILTIN_VSEL_4SI:
> case ALTIVEC_BUILTIN_VSEL_2DI:
> case ALTIVEC_BUILTIN_VSEL_4SF:
> case ALTIVEC_BUILTIN_VSEL_2DF:
> {
> tree cond_tree = gimple_call_arg (stmt, 2);
> tree then_tree = gimple_call_arg (stmt, 0);
> tree else_tree = gimple_call_arg (stmt, 1);
> lhs = gimple_call_lhs (stmt);
> location_t loc = gimple_location (stmt);
> gimple_seq stmts = NULL;
>tree truth_cond_tree_type
>  = build_same_sized_truth_vector_type (TREE_TYPE(cond_tree));
>tree truth_cond_tree
>  = gimple_convert (&stmts, loc, truth_cond_tree_type, cond_tree);
>gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
>g = gimple_build_assign (lhs, VEC_COND_EXPR, truth_cond_tree,
> then_tree, else_tree);
>gimple_set_location (g, gimple_location (stmt));
>gsi_replace (gsi, g, true);
>return true;
> }
> ---8<---
>
> First issue (easier?) - The above code snippet works for the
> existing powerpc/fold-vec-sel-* testcases, except that when comparing
> before and after codegen, we end up with an extra pair of instructions
> (vspltisw and vcmpgtsh ~~ splat zero, compare).  This appears to be due
> to taking the "Fake op0 < 0" path during optabs.c:
> expand_vec_cond_expr(),  I've not fully exhausted my debug on that
> front, but mention it in case this is something obvious.  And, this is
> probably minor with respect to issue 2.
>
> Second issue - This works for the simple tests, so seems like the
> implementation should be close to correct, but triggers an ICE
> when trying to build some of the pre-existing tests.  After some
> investigation, this appears to be specific to those tests that have
> non-variable values for the condition vector.
>
> For instance, our powerpc/altivec-32.c testcase contains:
>   unsigned k = 1;
>   a = (vector unsigned) { 0, 0, 0, 1 };
>   b = c = (vector unsigned) { 0, 0, 0, 0 };
>   a = vec_add (a, vec_splats (k));
>   b = vec_add (b, a);
>   c = vec_sel (c, a, b);
>
> which ends up as...
> c = vec_sel ({0,0,0,0}, {1,1,1,2}, {1,1,1,2});
> Our condition vector is the last argument, so this gets rearranged a bit
> when we build the vec_cond_expr, and we eventually get (at gimple time)
>
>   _1 = VEC_COND_EXPR <{ 1(OVF), 1(OVF), 1(OVF), 2(OVF) }, { 0, 0, 0, 0 }, { 
> 1, 1, 1, 2 }>;
>
> And we subsequently ICE (at expand time) when we hit a gcc_unreachable()
> in expr.c const_vector_mask_from_tree() when we try to map that '2(OVF)'
> value to a (boolean) zero or minus_one, and fail.
>
> during RTL pass: expand
> dump file: altivec-34.c.230r.expand
> /home/willschm/gcc/gcc-mainline-regtest_patches/gcc/testsuite/gcc.target/powerpc/altivec-34.c:
>  In function ‘foo’:
> /home/willschm/gcc/gcc-mainline-regtest_patches/gcc/testsuite/gcc.target/powerpc/altivec-34.c:21:6:
>  internal compiler error: in const_vector_mask_from_tree, at expr.c:12247
> 0x1059fba7 const_vector_mask_from_tree
> /home/willschm/gcc/gcc-mainline-regtest_patches/gcc/expr.c:12247
>
> Ultimately, this seems like an impasse between the vec_sel intrinsic
> being a bit-wise operation, and the truth_vector being of a boolean
> type.   But.. i'm not certain.
> I had at an earlier time tried to implement vec_sel() up using a mix of
> BIT_NOT_EXPR, BIT_AND_EXPR, BIT_IOR_EXPR, but ended up with some
> incredibly horrible codegen.  (which would be a no-go).  I may need to
> revisit that..

If vec_sel is really a bitwise merge then there's no choice but using
BIT_{AND,IOR,NOT}_EXPR for open-coding it.  I suppose the
original builtin expanded to an UNSPEC because there's nothing
in RTL besides bitwise operations describing the semantics.

Now - for constants that do select on element boundary (which seems
to be what you implemented) I'd have chosen not a VEC_COND_EXPR
but instead a VEC_PERM_EXPR (you have to translate the constant
bit-mask to a constant vector of indexes).  I don't think VEC_COND_EXPR
is a good choice for the middle-end or RTL expansion.

Richard.

> Thanks,
> -Will
>
>
>


[rfc rs6000] troubles with gimple folding for vec_sel

2018-10-23 Thread Will Schmidt
Hi all, 
I've been attempting to get early gimple-folding to work with the
vec_sel intrinsic for powerpc, and I've run into a snag or two such that
I'm not sure how to best proceed.  Code snippet is below, followed by a
description of the issues as I interpret them below.

Apologies for the ramble, Thanks in advance, ...  :-)

-Will

---8<---
/* vector selects */
/* d = vec_sel (a, b, c)
Each bit of the result vector (d) has the value of
the corresponding bit of (a) if the corresponding bit
of (c) is 0. Otherwise, each bit of the result vector
has the value of the corresponding bit of (b).  */
case ALTIVEC_BUILTIN_VSEL_16QI:
case ALTIVEC_BUILTIN_VSEL_8HI:
case ALTIVEC_BUILTIN_VSEL_4SI:
case ALTIVEC_BUILTIN_VSEL_2DI:
case ALTIVEC_BUILTIN_VSEL_4SF:
case ALTIVEC_BUILTIN_VSEL_2DF:
{
tree cond_tree = gimple_call_arg (stmt, 2);
tree then_tree = gimple_call_arg (stmt, 0);
tree else_tree = gimple_call_arg (stmt, 1);
lhs = gimple_call_lhs (stmt);
location_t loc = gimple_location (stmt);
gimple_seq stmts = NULL;
   tree truth_cond_tree_type
 = build_same_sized_truth_vector_type (TREE_TYPE(cond_tree));
   tree truth_cond_tree
 = gimple_convert (&stmts, loc, truth_cond_tree_type, cond_tree);
   gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
   g = gimple_build_assign (lhs, VEC_COND_EXPR, truth_cond_tree,
then_tree, else_tree);
   gimple_set_location (g, gimple_location (stmt));
   gsi_replace (gsi, g, true);
   return true;
}
---8<---

First issue (easier?) - The above code snippet works for the
existing powerpc/fold-vec-sel-* testcases, except that when comparing
before and after codegen, we end up with an extra pair of instructions
(vspltisw and vcmpgtsh ~~ splat zero, compare).  This appears to be due
to taking the "Fake op0 < 0" path during optabs.c:
expand_vec_cond_expr(),  I've not fully exhausted my debug on that
front, but mention it in case this is something obvious.  And, this is
probably minor with respect to issue 2.

Second issue - This works for the simple tests, so seems like the
implementation should be close to correct, but triggers an ICE
when trying to build some of the pre-existing tests.  After some
investigation, this appears to be specific to those tests that have
non-variable values for the condition vector.

For instance, our powerpc/altivec-32.c testcase contains: 
  unsigned k = 1;
  a = (vector unsigned) { 0, 0, 0, 1 };
  b = c = (vector unsigned) { 0, 0, 0, 0 };
  a = vec_add (a, vec_splats (k));
  b = vec_add (b, a);
  c = vec_sel (c, a, b);

which ends up as...
c = vec_sel ({0,0,0,0}, {1,1,1,2}, {1,1,1,2});
Our condition vector is the last argument, so this gets rearranged a bit
when we build the vec_cond_expr, and we eventually get (at gimple time)

  _1 = VEC_COND_EXPR <{ 1(OVF), 1(OVF), 1(OVF), 2(OVF) }, { 0, 0, 0, 0 }, { 1, 
1, 1, 2 }>;

And we subsequently ICE (at expand time) when we hit a gcc_unreachable()
in expr.c const_vector_mask_from_tree() when we try to map that '2(OVF)'
value to a (boolean) zero or minus_one, and fail.

during RTL pass: expand
dump file: altivec-34.c.230r.expand
/home/willschm/gcc/gcc-mainline-regtest_patches/gcc/testsuite/gcc.target/powerpc/altivec-34.c:
 In function ‘foo’:
/home/willschm/gcc/gcc-mainline-regtest_patches/gcc/testsuite/gcc.target/powerpc/altivec-34.c:21:6:
 internal compiler error: in const_vector_mask_from_tree, at expr.c:12247
0x1059fba7 const_vector_mask_from_tree
/home/willschm/gcc/gcc-mainline-regtest_patches/gcc/expr.c:12247

Ultimately, this seems like an impasse between the vec_sel intrinsic
being a bit-wise operation, and the truth_vector being of a boolean
type.   But.. i'm not certain.
I had at an earlier time tried to implement vec_sel() up using a mix of
BIT_NOT_EXPR, BIT_AND_EXPR, BIT_IOR_EXPR, but ended up with some
incredibly horrible codegen.  (which would be a no-go).  I may need to
revisit that..

Thanks,
-Will