On Tue, Oct 23, 2018 at 11:47 PM Will Schmidt wrote:
>
> Hi all,
> I've been attempting to get early gimple-folding to work with the
> vec_sel intrinsic for powerpc, and I've run into a snag or two such that
> I'm not sure how to best proceed. Code snippet is below, followed by a
> description of the issues as I interpret them below.
>
> Apologies for the ramble, Thanks in advance, ... :-)
>
> -Will
>
> ---8<---
> /* vector selects */
> /* d = vec_sel (a, b, c)
> Each bit of the result vector (d) has the value of
> the corresponding bit of (a) if the corresponding bit
> of (c) is 0. Otherwise, each bit of the result vector
> has the value of the corresponding bit of (b). */
> case ALTIVEC_BUILTIN_VSEL_16QI:
> case ALTIVEC_BUILTIN_VSEL_8HI:
> case ALTIVEC_BUILTIN_VSEL_4SI:
> case ALTIVEC_BUILTIN_VSEL_2DI:
> case ALTIVEC_BUILTIN_VSEL_4SF:
> case ALTIVEC_BUILTIN_VSEL_2DF:
> {
> tree cond_tree = gimple_call_arg (stmt, 2);
> tree then_tree = gimple_call_arg (stmt, 0);
> tree else_tree = gimple_call_arg (stmt, 1);
> lhs = gimple_call_lhs (stmt);
> location_t loc = gimple_location (stmt);
> gimple_seq stmts = NULL;
>tree truth_cond_tree_type
> = build_same_sized_truth_vector_type (TREE_TYPE(cond_tree));
>tree truth_cond_tree
> = gimple_convert (&stmts, loc, truth_cond_tree_type, cond_tree);
>gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
>g = gimple_build_assign (lhs, VEC_COND_EXPR, truth_cond_tree,
> then_tree, else_tree);
>gimple_set_location (g, gimple_location (stmt));
>gsi_replace (gsi, g, true);
>return true;
> }
> ---8<---
>
> First issue (easier?) - The above code snippet works for the
> existing powerpc/fold-vec-sel-* testcases, except that when comparing
> before and after codegen, we end up with an extra pair of instructions
> (vspltisw and vcmpgtsh ~~ splat zero, compare). This appears to be due
> to taking the "Fake op0 < 0" path during optabs.c:
> expand_vec_cond_expr(), I've not fully exhausted my debug on that
> front, but mention it in case this is something obvious. And, this is
> probably minor with respect to issue 2.
>
> Second issue - This works for the simple tests, so seems like the
> implementation should be close to correct, but triggers an ICE
> when trying to build some of the pre-existing tests. After some
> investigation, this appears to be specific to those tests that have
> non-variable values for the condition vector.
>
> For instance, our powerpc/altivec-32.c testcase contains:
> unsigned k = 1;
> a = (vector unsigned) { 0, 0, 0, 1 };
> b = c = (vector unsigned) { 0, 0, 0, 0 };
> a = vec_add (a, vec_splats (k));
> b = vec_add (b, a);
> c = vec_sel (c, a, b);
>
> which ends up as...
> c = vec_sel ({0,0,0,0}, {1,1,1,2}, {1,1,1,2});
> Our condition vector is the last argument, so this gets rearranged a bit
> when we build the vec_cond_expr, and we eventually get (at gimple time)
>
> _1 = VEC_COND_EXPR <{ 1(OVF), 1(OVF), 1(OVF), 2(OVF) }, { 0, 0, 0, 0 }, {
> 1, 1, 1, 2 }>;
>
> And we subsequently ICE (at expand time) when we hit a gcc_unreachable()
> in expr.c const_vector_mask_from_tree() when we try to map that '2(OVF)'
> value to a (boolean) zero or minus_one, and fail.
>
> during RTL pass: expand
> dump file: altivec-34.c.230r.expand
> /home/willschm/gcc/gcc-mainline-regtest_patches/gcc/testsuite/gcc.target/powerpc/altivec-34.c:
> In function ‘foo’:
> /home/willschm/gcc/gcc-mainline-regtest_patches/gcc/testsuite/gcc.target/powerpc/altivec-34.c:21:6:
> internal compiler error: in const_vector_mask_from_tree, at expr.c:12247
> 0x1059fba7 const_vector_mask_from_tree
> /home/willschm/gcc/gcc-mainline-regtest_patches/gcc/expr.c:12247
>
> Ultimately, this seems like an impasse between the vec_sel intrinsic
> being a bit-wise operation, and the truth_vector being of a boolean
> type. But.. i'm not certain.
> I had at an earlier time tried to implement vec_sel() up using a mix of
> BIT_NOT_EXPR, BIT_AND_EXPR, BIT_IOR_EXPR, but ended up with some
> incredibly horrible codegen. (which would be a no-go). I may need to
> revisit that..
If vec_sel is really a bitwise merge then there's no choice but using
BIT_{AND,IOR,NOT}_EXPR for open-coding it. I suppose the
original builtin expanded to an UNSPEC because there's nothing
in RTL besides bitwise operations describing the semantics.
Now - for constants that do select on element boundary (which seems
to be what you implemented) I'd have chosen not a VEC_COND_EXPR
but instead a VEC_PERM_EXPR (you have to translate the constant
bit-mask to a constant vector of indexes). I don't think VEC_COND_EXPR
is a good choice for the middle-end or RTL expansion.
Richard.
> Thanks,
> -Will
>
>
>