Re: [RFC] Try vector as a new representation for vector masks

2015-09-25 Thread Ilya Enkovich
2015-09-23 16:53 GMT+03:00 Richard Biener :
> On Wed, Sep 23, 2015 at 3:41 PM, Ilya Enkovich  wrote:
>> 2015-09-18 16:40 GMT+03:00 Ilya Enkovich :
>>> 2015-09-18 15:22 GMT+03:00 Richard Biener :

 I was thinking about targets not supporting generating vec
 (of whatever mode) from a comparison directly but only via
 a COND_EXPR.
>>>
>>> Where may these direct comparisons come from? Vectorizer never
>>> generates unsupported statements. It means we get them from
>>> gimplifier? So touch optabs in gimplifier to avoid direct comparisons?
>>> Actually vect lowering checks if we are able to make comparison and
>>> expand also uses vec_cond to expand vector comparison, so probably we
>>> may live with them.
>>>

 Not sure if we are always talking about the same thing for
 "bool patterns".  I'd remove bool patterns completely, IMHO
 they are not necessary at all.
>>>
>>> I refer to transformations made by vect_recog_bool_pattern. Don't see
>>> how to remove them completely for targets not supporting comparison
>>> vectorization.
>>>

 I think we do allow this, just the vectorizer doesn't expect it.  In the 
 long
 run I want to get rid of the GENERIC exprs in both COND_EXPR and
 VEC_COND_EXPR.  Just didn't have the time to do this...
>>>
>>> That would be nice. As a first step I'd like to support optabs for
>>> VEC_COND_EXPR directly using vec.
>>>
>>> Thanks,
>>> Ilya
>>>

 Richard.
>>
>> Hi Richard,
>>
>> Do you think we have enough confidence approach is working and we may
>> start integrating it into trunk? What would be integration plan then?
>
> I'm still worried about the vec vector size vs. element size
> issue (well, somewhat).

Yeah, I hit another problem related to element size in vec lowering.
It uses inner type sizes in expand_vector_piecewise and bool vector
expand goes in a wrong way. There were also other places with similar
problems and therefore I want to try to use bools of different sizes
and see how it goes. Also having different sized bools may be useful
to represent masks pack/unpack in scalar code.

>
> Otherwise the integration plan would be
>
>  1) put in the vector GIMPLE type support and change the vector
> comparison type IL requirement to be vector,
> fixing all fallout
>
>  2) get support for directly expanding vector comparisons to
> vector and make use of that from the x86 backend
>
>  3) make the vectorizer generate the above if supported
>
> I think independent improvements are
>
>  1) remove (most) of the bool patterns from the vectorizer
>
>  2) make VEC_COND_EXPR not have a GENERIC comparison embedded

Sounds great!

Ilya

>
> (same for COND_EXPR?)
>
> Richard.
>
>> Thanks,
>> Ilya


Re: [RFC] Try vector as a new representation for vector masks

2015-09-25 Thread Richard Biener
On Thu, Sep 24, 2015 at 6:37 PM, Richard Henderson  wrote:
> On 09/24/2015 01:09 AM, Richard Biener wrote:
>> Both are basically a (target) restriction on how we should expand a 
>> conditional
>> move (and its condition).  It's techincally convenient to tie both together 
>> by
>> having them in the same statement but it's also techincally very incovenient
>> in other places.  I'd say for targets where
>>
>> tem_1 = a_2 < b_3;
>> res_4 = tem_1 ? c_5 : d_6;
>> res_7 = tem_1 ? x_8 : z_9;
>>
>> presents a serious issue ("re-using" the flags register) out-of-SSA should
>> duplicate the conditionals so that TER can do its job (and RTL expansion
>> should use TER to get at the flags setter).
>
> Sure it's a target restriction, but it's an extremely common one.  Essentially
> all of our production platforms have it.  What do we gain by adding some sort
> of target hook for this?

A cleaner IL, no GENERIC expression tree building in GIMPLE (I guess that's
sth Andrew needs for his GIMPLE types project as well), less awkward
special-casing of comparisons based on context in code like genmatch.c
or in value-numbering.

>> I imagine that if we expand the above to adjacent statements the CPUs can
>> re-use the condition code.
>
> Sure, but IMO it should be the job of RTL CSE to make that decision, after all
> of the uses (and clobbers) of the flags register have been exposed.
>
>> To me where the condition is in GIMPLE is an implementation detail and the
>> inconveniences outweight the benefits.
>
> Why is a 3-operand gimple statement fine, but a 4-operand gimple statement
> inconvenient?

The inconvenience is not the number of operands but that we have two operation
codes and that we compute two values but only have an SSA name def for one
of them.  Oh, and did I mention that second operation is GENERIC?

So one way to clean things up would be to no longer use a GIMPLE_ASSIGN for
x = a < b ? c : d but instead use a GIMPLE_COND and give that a SSA def
for the result, using the true/false label operand places for 'c' and 'd'.

That still wouldn't get the compare a SSA def but at least it would get rid of
the 2nd operator code and the GENERIC expression operand.

>From the GIMPLE side forcing out the comparison to a separate stmt looks
more obvious and if we're considering doing a different thing then we may as
well think of how to represent predicating arbitrary stmts or how to explicitely
model condition codes in GIMPLE.

It kind of looks like we want a GIMPLE PARALLEL ... (we already have
a GIMPLE stmt with multiple defs - GIMPLE_ASM)

Richard.

>
> r~


Re: [RFC] Try vector as a new representation for vector masks

2015-09-24 Thread Richard Henderson
On 09/24/2015 01:09 AM, Richard Biener wrote:
> Both are basically a (target) restriction on how we should expand a 
> conditional
> move (and its condition).  It's techincally convenient to tie both together by
> having them in the same statement but it's also techincally very incovenient
> in other places.  I'd say for targets where
> 
> tem_1 = a_2 < b_3;
> res_4 = tem_1 ? c_5 : d_6;
> res_7 = tem_1 ? x_8 : z_9;
> 
> presents a serious issue ("re-using" the flags register) out-of-SSA should
> duplicate the conditionals so that TER can do its job (and RTL expansion
> should use TER to get at the flags setter).

Sure it's a target restriction, but it's an extremely common one.  Essentially
all of our production platforms have it.  What do we gain by adding some sort
of target hook for this?

> I imagine that if we expand the above to adjacent statements the CPUs can
> re-use the condition code.

Sure, but IMO it should be the job of RTL CSE to make that decision, after all
of the uses (and clobbers) of the flags register have been exposed.

> To me where the condition is in GIMPLE is an implementation detail and the
> inconveniences outweight the benefits.

Why is a 3-operand gimple statement fine, but a 4-operand gimple statement
inconvenient?


r~


Re: [RFC] Try vector as a new representation for vector masks

2015-09-24 Thread Richard Biener
On Wed, Sep 23, 2015 at 8:44 PM, Richard Henderson  wrote:
> On 09/23/2015 06:53 AM, Richard Biener wrote:
>> I think independent improvements are
>>
>>  1) remove (most) of the bool patterns from the vectorizer
>>
>>  2) make VEC_COND_EXPR not have a GENERIC comparison embedded
>>
>> (same for COND_EXPR?)
>
> Careful.
>
> The reason that COND_EXPR have embedded comparisons is to handle flags
> registers.  You can't separate the setting of the flags from the using of the
> flags on most targets, because there's only one flags register.
>
> The same is true for VEC_COND_EXPR with respect to MIPS.  The base 
> architecture
> has 8 floating-point comparison result flags, and the vector compare
> instructions are fixed to set fcc[0:width-1].  So again there's only one
> possible output location for the result of the compare.
>
> MIPS is going to present a problem if we attempt to generalize logical
> combinations of these vector, since one has to use several instructions
> (or one insn and pre-load constants into two registers) to get the fcc bits 
> out
> into a form we can manipulate.

Both are basically a (target) restriction on how we should expand a conditional
move (and its condition).  It's techincally convenient to tie both together by
having them in the same statement but it's also techincally very incovenient
in other places.  I'd say for targets where

tem_1 = a_2 < b_3;
res_4 = tem_1 ? c_5 : d_6;
res_7 = tem_1 ? x_8 : z_9;

presents a serious issue ("re-using" the flags register) out-of-SSA
should duplicate
the conditionals so that TER can do its job (and RTL expansion should use TER
to get at the flags setter).  I imagine that if we expand the above to
adjacent statements
the CPUs can re-use the condition code.

To me where the condition is in GIMPLE is an implementation detail and the
inconveniences outweight the benefits.

Maybe we should make the effects of TER on the statement schedule explicitely
visible to make debugging that easier and remove the implicit scheduling from
the SSA name expansion code (basically require SSA names do have expanded defs).
That way we have the chance to perform pre-expansion "scheduling" in a more
predictable way leaving only the parts of the expansion using TER that want to
see a bigger expression (like [VEC_]COND_EXPR expansion eventually).

Richard.

>
> r~


Re: [RFC] Try vector as a new representation for vector masks

2015-09-23 Thread Richard Henderson
On 09/23/2015 06:53 AM, Richard Biener wrote:
> I think independent improvements are
> 
>  1) remove (most) of the bool patterns from the vectorizer
> 
>  2) make VEC_COND_EXPR not have a GENERIC comparison embedded
> 
> (same for COND_EXPR?)

Careful.

The reason that COND_EXPR have embedded comparisons is to handle flags
registers.  You can't separate the setting of the flags from the using of the
flags on most targets, because there's only one flags register.

The same is true for VEC_COND_EXPR with respect to MIPS.  The base architecture
has 8 floating-point comparison result flags, and the vector compare
instructions are fixed to set fcc[0:width-1].  So again there's only one
possible output location for the result of the compare.

MIPS is going to present a problem if we attempt to generalize logical
combinations of these vector, since one has to use several instructions
(or one insn and pre-load constants into two registers) to get the fcc bits out
into a form we can manipulate.


r~


Re: [RFC] Try vector as a new representation for vector masks

2015-09-23 Thread Richard Biener
On Wed, Sep 23, 2015 at 3:41 PM, Ilya Enkovich  wrote:
> 2015-09-18 16:40 GMT+03:00 Ilya Enkovich :
>> 2015-09-18 15:22 GMT+03:00 Richard Biener :
>>>
>>> I was thinking about targets not supporting generating vec
>>> (of whatever mode) from a comparison directly but only via
>>> a COND_EXPR.
>>
>> Where may these direct comparisons come from? Vectorizer never
>> generates unsupported statements. It means we get them from
>> gimplifier? So touch optabs in gimplifier to avoid direct comparisons?
>> Actually vect lowering checks if we are able to make comparison and
>> expand also uses vec_cond to expand vector comparison, so probably we
>> may live with them.
>>
>>>
>>> Not sure if we are always talking about the same thing for
>>> "bool patterns".  I'd remove bool patterns completely, IMHO
>>> they are not necessary at all.
>>
>> I refer to transformations made by vect_recog_bool_pattern. Don't see
>> how to remove them completely for targets not supporting comparison
>> vectorization.
>>
>>>
>>> I think we do allow this, just the vectorizer doesn't expect it.  In the 
>>> long
>>> run I want to get rid of the GENERIC exprs in both COND_EXPR and
>>> VEC_COND_EXPR.  Just didn't have the time to do this...
>>
>> That would be nice. As a first step I'd like to support optabs for
>> VEC_COND_EXPR directly using vec.
>>
>> Thanks,
>> Ilya
>>
>>>
>>> Richard.
>
> Hi Richard,
>
> Do you think we have enough confidence approach is working and we may
> start integrating it into trunk? What would be integration plan then?

I'm still worried about the vec vector size vs. element size
issue (well, somewhat).

Otherwise the integration plan would be

 1) put in the vector GIMPLE type support and change the vector
comparison type IL requirement to be vector,
fixing all fallout

 2) get support for directly expanding vector comparisons to
vector and make use of that from the x86 backend

 3) make the vectorizer generate the above if supported

I think independent improvements are

 1) remove (most) of the bool patterns from the vectorizer

 2) make VEC_COND_EXPR not have a GENERIC comparison embedded

(same for COND_EXPR?)

Richard.

> Thanks,
> Ilya


Re: [RFC] Try vector as a new representation for vector masks

2015-09-23 Thread Richard Biener
On Fri, Sep 18, 2015 at 3:40 PM, Ilya Enkovich  wrote:
> 2015-09-18 15:22 GMT+03:00 Richard Biener :
>> On Thu, Sep 3, 2015 at 3:57 PM, Ilya Enkovich  wrote:
>>> 2015-09-03 15:11 GMT+03:00 Richard Biener :
 On Thu, Sep 3, 2015 at 2:03 PM, Ilya Enkovich  
 wrote:
> Adding CCs.
>
> 2015-09-03 15:03 GMT+03:00 Ilya Enkovich :
>> 2015-09-01 17:25 GMT+03:00 Richard Biener :
>>
>> Totally disabling old style vector comparison and bool pattern is a
>> goal but doing hat would mean a lot of regressions for many targets.
>> Do you want to it to be tried to estimate amount of changes required
>> and reveal possible issues? What would be integration plan for these
>> changes? Do you want to just introduce new vector in GIMPLE
>> disabling bool patterns and then resolving vectorization regression on
>> all targets or allow them live together with following target switch
>> one by one from bool patterns with finally removing them? Not all
>> targets are likely to be adopted fast I suppose.

 Well, the frontends already create vec_cond exprs I believe.  So for
 bool patterns the vectorizer would have to do the same, but the
 comparison result in there would still use vec.  Thus the scalar

  _Bool a = b < c;
  _Bool c = a || d;
  if (c)

 would become

  vec a = VEC_COND ;
  vec c = a | d;
>>>
>>> This should be identical to
>>>
>>> vec<_Bool> a = a < b;
>>> vec<_Bool> c = a | d;
>>>
>>> where vec<_Bool> has VxSI mode. And we should prefer it in case target
>>> supports vector comparison into vec, right?
>>>

 when the target does not have vecs directly and otherwise
 vec directly (dropping the VEC_COND).

 Just the vector comparison inside the VEC_COND would always
 have vec type.
>>>
>>> I don't really understand what you mean by 'doesn't have vecs
>>> dirrectly' here. Currently I have a hook to ask for a vec mode
>>> and assume target doesn't support it in case it returns VOIDmode. But
>>> in such case I have no mode to use for vec inside VEC_COND
>>> either.
>>
>> I was thinking about targets not supporting generating vec
>> (of whatever mode) from a comparison directly but only via
>> a COND_EXPR.
>
> Where may these direct comparisons come from? Vectorizer never
> generates unsupported statements. It means we get them from
> gimplifier?

That's what I say - the vecotirzer wouldn't generate them.

> So touch optabs in gimplifier to avoid direct comparisons?
> Actually vect lowering checks if we are able to make comparison and
> expand also uses vec_cond to expand vector comparison, so probably we
> may live with them.
>
>>
>>> In default implementation of the new target hook I always return
>>> integer vector mode (to have default behavior similar to the current
>>> one). It should allow me to use vec for conditions in all
>>> vec_cond. But we'd need some other trigger for bool patterns to apply.
>>> Probably check vec_cmp optab in check_bool_pattern and don't convert
>>> in case comparison is supported by target? Or control it via
>>> additional hook.
>>
>> Not sure if we are always talking about the same thing for
>> "bool patterns".  I'd remove bool patterns completely, IMHO
>> they are not necessary at all.
>
> I refer to transformations made by vect_recog_bool_pattern. Don't see
> how to remove them completely for targets not supporting comparison
> vectorization.

The vectorizer can vectorize comparisons by emitting a VEC_COND_EXPR
(the bool pattern would turn the comparison into a COND_EXPR).  I don't
see how the pattern intermediate step is necessary.  The important part
is to get the desired vector type of the comparison determined.

>>

 And the "bool patterns" I am talking about are those in
 tree-vect-patterns.c, not any targets instruction patterns.
>>>
>>> I refer to them also. BTW bool patterns also pull comparison into
>>> vec_cond. Thus we cannot have SSA_NAME in vec_cond as a condition. I
>>> think with vector comparisons in place we should allow SSA_NAME as
>>> conditions in VEC_COND for better CSE. That should require new vcond
>>> optabs though.
>>
>> I think we do allow this, just the vectorizer doesn't expect it.  In the long
>> run I want to get rid of the GENERIC exprs in both COND_EXPR and
>> VEC_COND_EXPR.  Just didn't have the time to do this...
>
> That would be nice. As a first step I'd like to support optabs for
> VEC_COND_EXPR directly using vec.
>
> Thanks,
> Ilya
>
>>
>> Richard.
>>
>>> Ilya
>>>

 Richard.

>>
>> Ilya


Re: [RFC] Try vector as a new representation for vector masks

2015-09-23 Thread Ilya Enkovich
2015-09-18 16:40 GMT+03:00 Ilya Enkovich :
> 2015-09-18 15:22 GMT+03:00 Richard Biener :
>>
>> I was thinking about targets not supporting generating vec
>> (of whatever mode) from a comparison directly but only via
>> a COND_EXPR.
>
> Where may these direct comparisons come from? Vectorizer never
> generates unsupported statements. It means we get them from
> gimplifier? So touch optabs in gimplifier to avoid direct comparisons?
> Actually vect lowering checks if we are able to make comparison and
> expand also uses vec_cond to expand vector comparison, so probably we
> may live with them.
>
>>
>> Not sure if we are always talking about the same thing for
>> "bool patterns".  I'd remove bool patterns completely, IMHO
>> they are not necessary at all.
>
> I refer to transformations made by vect_recog_bool_pattern. Don't see
> how to remove them completely for targets not supporting comparison
> vectorization.
>
>>
>> I think we do allow this, just the vectorizer doesn't expect it.  In the long
>> run I want to get rid of the GENERIC exprs in both COND_EXPR and
>> VEC_COND_EXPR.  Just didn't have the time to do this...
>
> That would be nice. As a first step I'd like to support optabs for
> VEC_COND_EXPR directly using vec.
>
> Thanks,
> Ilya
>
>>
>> Richard.

Hi Richard,

Do you think we have enough confidence approach is working and we may
start integrating it into trunk? What would be integration plan then?

Thanks,
Ilya


Re: [RFC] Try vector as a new representation for vector masks

2015-09-21 Thread Richard Henderson
On 09/21/2015 05:08 AM, Ilya Enkovich wrote:
> There is no any conversion here, maskload_optab is a convert_optab
> because it uses two modes, one for value and the other one for mask.

Ah, I see.  In which case I think we ought to come up with a different name.
C.f. get_vcond_icode.


r~


Re: [RFC] Try vector as a new representation for vector masks

2015-09-21 Thread Ilya Enkovich
2015-09-18 19:50 GMT+03:00 Richard Henderson :
> On 09/18/2015 06:21 AM, Ilya Enkovich wrote:
 +machine_mode
 +default_get_mask_mode (unsigned nunits, unsigned vector_size)
 +{
 +  unsigned elem_size = vector_size / nunits;
 +  machine_mode elem_mode
 += smallest_mode_for_size (elem_size * BITS_PER_UNIT, MODE_INT);
>>>
>>> Why these arguments as opposed to passing elem_size?  It seems that every 
>>> hook
>>> is going to have to do this division...
>>
>> Every target would have nunits = vector_size / elem_size because
>> nunits is used to create a vector mode. Thus no difference.
>
> I meant passing nunits and elem_size, but not vector_size.  Thus no division
> required.  If the target does require the vector size, it could be obtained by
> multiplication, which is cheaper.  But in cases like this we'd not require
> either mult or div.

OK

>
 @@ -1885,7 +1885,9 @@ expand_MASK_LOAD (gcall *stmt)
create_output_operand (&ops[0], target, TYPE_MODE (type));
create_fixed_operand (&ops[1], mem);
create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
 -  expand_insn (optab_handler (maskload_optab, TYPE_MODE (type)), 3, ops);
 +  expand_insn (convert_optab_handler (maskload_optab, TYPE_MODE (type),
 +   TYPE_MODE (TREE_TYPE (maskt))),
 +3, ops);
>>>
>>> Why do we now need a conversion here?
>>
>> Mask mode was implicit for masked loads and stores. Now it becomes
>> explicit because we may load the same value using different masks.
>> E.g. for i386 we may load 256bit vector using both vector and scalar
>> masks.
>
> Ok, sure, the mask mode is needed, I get that.  But that doesn't answer the
> question regarding conversion.  Why would convert_optab_handler be needed to
> *change* the mode of the mask.  I assume that's not actually possible, with 
> the
> target hook already having chosen the proper mode for the mask.

There is no any conversion here, maskload_optab is a convert_optab
because it uses two modes, one for value and the other one for mask.

Ilya

>
>
> r~


Re: [RFC] Try vector as a new representation for vector masks

2015-09-18 Thread Richard Henderson
On 09/18/2015 06:21 AM, Ilya Enkovich wrote:
>>> +machine_mode
>>> +default_get_mask_mode (unsigned nunits, unsigned vector_size)
>>> +{
>>> +  unsigned elem_size = vector_size / nunits;
>>> +  machine_mode elem_mode
>>> += smallest_mode_for_size (elem_size * BITS_PER_UNIT, MODE_INT);
>>
>> Why these arguments as opposed to passing elem_size?  It seems that every 
>> hook
>> is going to have to do this division...
> 
> Every target would have nunits = vector_size / elem_size because
> nunits is used to create a vector mode. Thus no difference.

I meant passing nunits and elem_size, but not vector_size.  Thus no division
required.  If the target does require the vector size, it could be obtained by
multiplication, which is cheaper.  But in cases like this we'd not require
either mult or div.

>>> @@ -1885,7 +1885,9 @@ expand_MASK_LOAD (gcall *stmt)
>>>create_output_operand (&ops[0], target, TYPE_MODE (type));
>>>create_fixed_operand (&ops[1], mem);
>>>create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
>>> -  expand_insn (optab_handler (maskload_optab, TYPE_MODE (type)), 3, ops);
>>> +  expand_insn (convert_optab_handler (maskload_optab, TYPE_MODE (type),
>>> +   TYPE_MODE (TREE_TYPE (maskt))),
>>> +3, ops);
>>
>> Why do we now need a conversion here?
> 
> Mask mode was implicit for masked loads and stores. Now it becomes
> explicit because we may load the same value using different masks.
> E.g. for i386 we may load 256bit vector using both vector and scalar
> masks.

Ok, sure, the mask mode is needed, I get that.  But that doesn't answer the
question regarding conversion.  Why would convert_optab_handler be needed to
*change* the mode of the mask.  I assume that's not actually possible, with the
target hook already having chosen the proper mode for the mask.


r~


Re: [RFC] Try vector as a new representation for vector masks

2015-09-18 Thread Ilya Enkovich
2015-09-18 15:29 GMT+03:00 Richard Biener :
> On Tue, Sep 15, 2015 at 3:52 PM, Ilya Enkovich  wrote:
>> On 08 Sep 15:37, Ilya Enkovich wrote:
>>> 2015-09-04 23:42 GMT+03:00 Jeff Law :
>>> >
>>> > So do we have enough confidence in this representation that we want to go
>>> > ahead and commit to it?
>>>
>>> I think new representation fits nice mostly. There are some places
>>> where I have to make some exceptions for vector of bools to make it
>>> work. This is mostly to avoid target modifications. I'd like to avoid
>>> necessity to change all targets currently supporting vec_cond. It
>>> makes me add some special handling of vec in GIMPLE, e.g. I add
>>> a special code in vect_init_vector to build vec invariants with
>>> proper casting to int. Otherwise I'd need to do it on a target side.
>>>
>>> I made several fixes and current patch (still allowing integer vector
>>> result for vector comparison and applying bool patterns) passes
>>> bootstrap and regression testing on x86_64. Now I'll try to fully
>>> switch to vec and see how it goes.
>>>
>>> Thanks,
>>> Ilya
>>>
>>
>> Hi,
>>
>> I made a step forward forcing vector comparisons have a mask (vec) 
>> result and disabling bool patterns in case vector comparison is supported by 
>> target.  Several issues were met.
>>
>>  - c/c++ front-ends generate vector comparison with integer vector result.  
>> I had to make some modifications to use vec_cond instead.  Don't know if 
>> there are other front-ends producing vector comparisons.
>>  - vector lowering fails to expand vector masks due to mismatch of type and 
>> mode sizes.  I fixed vector type size computation to match mode size and 
>> added a special handling of mask expand.
>>  - I disabled canonical type creation for vector mask because we can't 
>> layout it with VOID mode. I don't know why we may need a canonical type 
>> here.  But get_mask_mode call may be moved into type layout to get it.
>>  - Expand of vec constants/contstructors requires special handling.  
>> Common case should require target hooks/optabs to expand vector into 
>> required mode.  But I suppose we want to have a generic code to handle 
>> vector of int mode case to avoid modification of multiple targets which use 
>> default vec modes.
>
> One complication you might run into currently is that at the moment we
> require the comparison result to be
> of the same size as the comparison operands.  This means that
> vector with, say, 4 elements has
> to support different modes for v4si < v4si vs. v4df < v4df (if you
> think of x86 with its multiple vector sizes).
> That's for the "fallback" non-mask vector only of course.  Does
> that mean we have to use different
> bool types with different modes here?

I though about boolean types with different sizes/modes. I still avoid
them but it causes some ugliness. E.g. sizeof(innertype)*nelems !=
sizeof(vectortype) for vec. I causes some special handling in
type layout and problems in lowering because BIT_FIELD_REF uses more
bits than resulting type has. I use additional comparison to handle
it. Richard also proposed to extract one bit only for bools. Don't
know if differently sized boolean types may help to resolve this issue
or create more problems.

>
> So the other possibility is to never expose the fallback vector
> anywhere but make sure to lower to
> vector via VEC_COND_EXPRs.  After all it's only the vectorizer
> that should create stmts with
> vector LHS and the vectorizer is already careful to only
> generate code supported by the target.

In case vec has integer vector mode, comparison should be
handled similar to VEC_COND_EXPR by vect lowering and expand which
should be enough to have it properly handled on targets with no
vec support.

Thanks,
Ilya

>
>> Currently 'make check' shows two types of regression.
>>   - missed vector expression pattern recongnition (MIN, MAX, ABX, VEC_COND). 
>>  This must be due to my front-end changes.  Hope it will be easy to fix.
>>   - missed vectorization. All of them appear due to bool patterns disabling. 
>>  I didn't look into all of them but it seems the main problem is in mixed 
>> type sizes.  With bool patterns and integer vector masks we just put 
>> int->(other sized int) conversion for masks and it gives us required mask 
>> transformation.  With boolean mask we don't have a proper scalar statements 
>> to do that.  I think mask widening/narrowing may be directly supported in 
>> masked statements vectorization.  Going to look into it.
>>
>> I attach what I currently have for a prototype.  It grows bigger so I split 
>> into several parts.
>>
>> Thanks,
>> Ilya


Re: [RFC] Try vector as a new representation for vector masks

2015-09-18 Thread Ilya Enkovich
2015-09-18 15:22 GMT+03:00 Richard Biener :
> On Thu, Sep 3, 2015 at 3:57 PM, Ilya Enkovich  wrote:
>> 2015-09-03 15:11 GMT+03:00 Richard Biener :
>>> On Thu, Sep 3, 2015 at 2:03 PM, Ilya Enkovich  
>>> wrote:
 Adding CCs.

 2015-09-03 15:03 GMT+03:00 Ilya Enkovich :
> 2015-09-01 17:25 GMT+03:00 Richard Biener :
>
> Totally disabling old style vector comparison and bool pattern is a
> goal but doing hat would mean a lot of regressions for many targets.
> Do you want to it to be tried to estimate amount of changes required
> and reveal possible issues? What would be integration plan for these
> changes? Do you want to just introduce new vector in GIMPLE
> disabling bool patterns and then resolving vectorization regression on
> all targets or allow them live together with following target switch
> one by one from bool patterns with finally removing them? Not all
> targets are likely to be adopted fast I suppose.
>>>
>>> Well, the frontends already create vec_cond exprs I believe.  So for
>>> bool patterns the vectorizer would have to do the same, but the
>>> comparison result in there would still use vec.  Thus the scalar
>>>
>>>  _Bool a = b < c;
>>>  _Bool c = a || d;
>>>  if (c)
>>>
>>> would become
>>>
>>>  vec a = VEC_COND ;
>>>  vec c = a | d;
>>
>> This should be identical to
>>
>> vec<_Bool> a = a < b;
>> vec<_Bool> c = a | d;
>>
>> where vec<_Bool> has VxSI mode. And we should prefer it in case target
>> supports vector comparison into vec, right?
>>
>>>
>>> when the target does not have vecs directly and otherwise
>>> vec directly (dropping the VEC_COND).
>>>
>>> Just the vector comparison inside the VEC_COND would always
>>> have vec type.
>>
>> I don't really understand what you mean by 'doesn't have vecs
>> dirrectly' here. Currently I have a hook to ask for a vec mode
>> and assume target doesn't support it in case it returns VOIDmode. But
>> in such case I have no mode to use for vec inside VEC_COND
>> either.
>
> I was thinking about targets not supporting generating vec
> (of whatever mode) from a comparison directly but only via
> a COND_EXPR.

Where may these direct comparisons come from? Vectorizer never
generates unsupported statements. It means we get them from
gimplifier? So touch optabs in gimplifier to avoid direct comparisons?
Actually vect lowering checks if we are able to make comparison and
expand also uses vec_cond to expand vector comparison, so probably we
may live with them.

>
>> In default implementation of the new target hook I always return
>> integer vector mode (to have default behavior similar to the current
>> one). It should allow me to use vec for conditions in all
>> vec_cond. But we'd need some other trigger for bool patterns to apply.
>> Probably check vec_cmp optab in check_bool_pattern and don't convert
>> in case comparison is supported by target? Or control it via
>> additional hook.
>
> Not sure if we are always talking about the same thing for
> "bool patterns".  I'd remove bool patterns completely, IMHO
> they are not necessary at all.

I refer to transformations made by vect_recog_bool_pattern. Don't see
how to remove them completely for targets not supporting comparison
vectorization.

>
>>>
>>> And the "bool patterns" I am talking about are those in
>>> tree-vect-patterns.c, not any targets instruction patterns.
>>
>> I refer to them also. BTW bool patterns also pull comparison into
>> vec_cond. Thus we cannot have SSA_NAME in vec_cond as a condition. I
>> think with vector comparisons in place we should allow SSA_NAME as
>> conditions in VEC_COND for better CSE. That should require new vcond
>> optabs though.
>
> I think we do allow this, just the vectorizer doesn't expect it.  In the long
> run I want to get rid of the GENERIC exprs in both COND_EXPR and
> VEC_COND_EXPR.  Just didn't have the time to do this...

That would be nice. As a first step I'd like to support optabs for
VEC_COND_EXPR directly using vec.

Thanks,
Ilya

>
> Richard.
>
>> Ilya
>>
>>>
>>> Richard.
>>>
>
> Ilya


Re: [RFC] Try vector as a new representation for vector masks

2015-09-18 Thread Ilya Enkovich
2015-09-17 20:35 GMT+03:00 Richard Henderson :
> On 09/15/2015 06:52 AM, Ilya Enkovich wrote:
>> I made a step forward forcing vector comparisons have a mask (vec) 
>> result and disabling bool patterns in case vector comparison is supported by 
>> target.  Several issues were met.
>>
>>  - c/c++ front-ends generate vector comparison with integer vector result.  
>> I had to make some modifications to use vec_cond instead.  Don't know if 
>> there are other front-ends producing vector comparisons.
>>  - vector lowering fails to expand vector masks due to mismatch of type and 
>> mode sizes.  I fixed vector type size computation to match mode size and 
>> added a special handling of mask expand.
>>  - I disabled canonical type creation for vector mask because we can't 
>> layout it with VOID mode. I don't know why we may need a canonical type 
>> here.  But get_mask_mode call may be moved into type layout to get it.
>>  - Expand of vec constants/contstructors requires special handling.  
>> Common case should require target hooks/optabs to expand vector into 
>> required mode.  But I suppose we want to have a generic code to handle 
>> vector of int mode case to avoid modification of multiple targets which use 
>> default vec modes.
>>
>> Currently 'make check' shows two types of regression.
>>   - missed vector expression pattern recongnition (MIN, MAX, ABX, VEC_COND). 
>>  This must be due to my front-end changes.  Hope it will be easy to fix.
>>   - missed vectorization. All of them appear due to bool patterns disabling. 
>>  I didn't look into all of them but it seems the main problem is in mixed 
>> type sizes.  With bool patterns and integer vector masks we just put 
>> int->(other sized int) conversion for masks and it gives us required mask 
>> transformation.  With boolean mask we don't have a proper scalar statements 
>> to do that.  I think mask widening/narrowing may be directly supported in 
>> masked statements vectorization.  Going to look into it.
>>
>> I attach what I currently have for a prototype.  It grows bigger so I split 
>> into several parts.
>
> The general approach looks good.
>

Great!

>
>> +/* By defaults a vector of integers is used as a mask.  */
>> +
>> +machine_mode
>> +default_get_mask_mode (unsigned nunits, unsigned vector_size)
>> +{
>> +  unsigned elem_size = vector_size / nunits;
>> +  machine_mode elem_mode
>> += smallest_mode_for_size (elem_size * BITS_PER_UNIT, MODE_INT);
>
> Why these arguments as opposed to passing elem_size?  It seems that every hook
> is going to have to do this division...

Every target would have nunits = vector_size / elem_size because
nunits is used to create a vector mode. Thus no difference.

>
>> +#define VECTOR_MASK_TYPE_P(TYPE) \
>> +  (TREE_CODE (TYPE) == VECTOR_TYPE   \
>> +   && TREE_CODE (TREE_TYPE (TYPE)) == BOOLEAN_TYPE)
>
> Perhaps better as VECTOR_BOOLEAN_TYPE_P, since that's exactly what's being 
> tested?

OK

>
>> @@ -3464,10 +3464,10 @@ verify_gimple_comparison (tree type, tree op0, tree 
>> op1)
>>return true;
>>  }
>>  }
>> -  /* Or an integer vector type with the same size and element count
>> +  /* Or a boolean vector type with the same element count
>>   as the comparison operand types.  */
>>else if (TREE_CODE (type) == VECTOR_TYPE
>> -&& TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE)
>> +&& TREE_CODE (TREE_TYPE (type)) == BOOLEAN_TYPE)
>
> VECTOR_BOOLEAN_TYPE_P.
>
>> @@ -122,7 +122,19 @@ tree_vec_extract (gimple_stmt_iterator *gsi, tree type,
>> tree t, tree bitsize, tree bitpos)
>>  {
>>if (bitpos)
>> -return gimplify_build3 (gsi, BIT_FIELD_REF, type, t, bitsize, bitpos);
>> +{
>> +  if (TREE_CODE (type) == BOOLEAN_TYPE)
>> + {
>> +   tree itype
>> + = build_nonstandard_integer_type (tree_to_uhwi (bitsize), 0);
>> +   tree field = gimplify_build3 (gsi, BIT_FIELD_REF, itype, t,
>> + bitsize, bitpos);
>> +   return gimplify_build2 (gsi, NE_EXPR, type, field,
>> +   build_zero_cst (itype));
>> + }
>> +  else
>> + return gimplify_build3 (gsi, BIT_FIELD_REF, type, t, bitsize, bitpos);
>> +}
>>else
>>  return gimplify_build1 (gsi, VIEW_CONVERT_EXPR, type, t);
>>  }
>
> So... this is us lowering vector operations on a target that doesn't support
> them.  Which means that we get to decide what value is produced for a
> comparison?  Which means that we can settle on the "normal" -1, correct?
>
> Which means that we ought not need to extract the entire element and then
> compare for non-zero, but rather simply extract a single bit from the element,
> and directly call that a boolean result, correct?

Didn't think about that. I'll give it a try.

>
> I assume you tested all this code with -mno-sse or equivalent arch default?

I didn't make some special runs for that. Just used regression testi

Re: [RFC] Try vector as a new representation for vector masks

2015-09-18 Thread Richard Biener
On Tue, Sep 15, 2015 at 3:52 PM, Ilya Enkovich  wrote:
> On 08 Sep 15:37, Ilya Enkovich wrote:
>> 2015-09-04 23:42 GMT+03:00 Jeff Law :
>> >
>> > So do we have enough confidence in this representation that we want to go
>> > ahead and commit to it?
>>
>> I think new representation fits nice mostly. There are some places
>> where I have to make some exceptions for vector of bools to make it
>> work. This is mostly to avoid target modifications. I'd like to avoid
>> necessity to change all targets currently supporting vec_cond. It
>> makes me add some special handling of vec in GIMPLE, e.g. I add
>> a special code in vect_init_vector to build vec invariants with
>> proper casting to int. Otherwise I'd need to do it on a target side.
>>
>> I made several fixes and current patch (still allowing integer vector
>> result for vector comparison and applying bool patterns) passes
>> bootstrap and regression testing on x86_64. Now I'll try to fully
>> switch to vec and see how it goes.
>>
>> Thanks,
>> Ilya
>>
>
> Hi,
>
> I made a step forward forcing vector comparisons have a mask (vec) 
> result and disabling bool patterns in case vector comparison is supported by 
> target.  Several issues were met.
>
>  - c/c++ front-ends generate vector comparison with integer vector result.  I 
> had to make some modifications to use vec_cond instead.  Don't know if there 
> are other front-ends producing vector comparisons.
>  - vector lowering fails to expand vector masks due to mismatch of type and 
> mode sizes.  I fixed vector type size computation to match mode size and 
> added a special handling of mask expand.
>  - I disabled canonical type creation for vector mask because we can't layout 
> it with VOID mode. I don't know why we may need a canonical type here.  But 
> get_mask_mode call may be moved into type layout to get it.
>  - Expand of vec constants/contstructors requires special handling.  
> Common case should require target hooks/optabs to expand vector into required 
> mode.  But I suppose we want to have a generic code to handle vector of int 
> mode case to avoid modification of multiple targets which use default 
> vec modes.

One complication you might run into currently is that at the moment we
require the comparison result to be
of the same size as the comparison operands.  This means that
vector with, say, 4 elements has
to support different modes for v4si < v4si vs. v4df < v4df (if you
think of x86 with its multiple vector sizes).
That's for the "fallback" non-mask vector only of course.  Does
that mean we have to use different
bool types with different modes here?

So the other possibility is to never expose the fallback vector
anywhere but make sure to lower to
vector via VEC_COND_EXPRs.  After all it's only the vectorizer
that should create stmts with
vector LHS and the vectorizer is already careful to only
generate code supported by the target.

> Currently 'make check' shows two types of regression.
>   - missed vector expression pattern recongnition (MIN, MAX, ABX, VEC_COND).  
> This must be due to my front-end changes.  Hope it will be easy to fix.
>   - missed vectorization. All of them appear due to bool patterns disabling.  
> I didn't look into all of them but it seems the main problem is in mixed type 
> sizes.  With bool patterns and integer vector masks we just put int->(other 
> sized int) conversion for masks and it gives us required mask transformation. 
>  With boolean mask we don't have a proper scalar statements to do that.  I 
> think mask widening/narrowing may be directly supported in masked statements 
> vectorization.  Going to look into it.
>
> I attach what I currently have for a prototype.  It grows bigger so I split 
> into several parts.
>
> Thanks,
> Ilya
> --
> * avx512-vec-bool-01-add-truth-vector.ChangeLog
>
> 2015-09-15  Ilya Enkovich  
>
> * doc/tm.texi: Regenerated.
> * doc/tm.texi.in (TARGET_VECTORIZE_GET_MASK_MODE): New.
> * stor-layout.c (layout_type): Use mode to get vector mask size.
> (vector_type_mode): Likewise.
> * target.def (get_mask_mode): New.
> * targhooks.c (default_vector_alignment): Use mode alignment
> for vector masks.
> (default_get_mask_mode): New.
> * targhooks.h (default_get_mask_mode): New.
> * tree.c (make_vector_type): Vector mask has no canonical type.
> (build_truth_vector_type): New.
> (build_same_sized_truth_vector_type): New.
> (truth_type_for): Support vector masks.
> * tree.h (VECTOR_MASK_TYPE_P): New.
> (build_truth_vector_type): New.
> (build_same_sized_truth_vector_type): New.
>
> * avx512-vec-bool-02-no-int-vec-cmp.ChangeLog
>
> gcc/
>
> 2015-09-15  Ilya Enkovich  
>
> * tree-cfg.c (verify_gimple_comparison) Require vector mask
> type for vector comparison.
> (verify_gimple_assign_ternary): Likewise.
>
> gcc/c
>
> 2015-09-15  Ilya Enkovich  
>
> * c-typeck.c (build_con

Re: [RFC] Try vector as a new representation for vector masks

2015-09-18 Thread Richard Biener
On Thu, Sep 3, 2015 at 3:57 PM, Ilya Enkovich  wrote:
> 2015-09-03 15:11 GMT+03:00 Richard Biener :
>> On Thu, Sep 3, 2015 at 2:03 PM, Ilya Enkovich  wrote:
>>> Adding CCs.
>>>
>>> 2015-09-03 15:03 GMT+03:00 Ilya Enkovich :
 2015-09-01 17:25 GMT+03:00 Richard Biener :

 Totally disabling old style vector comparison and bool pattern is a
 goal but doing hat would mean a lot of regressions for many targets.
 Do you want to it to be tried to estimate amount of changes required
 and reveal possible issues? What would be integration plan for these
 changes? Do you want to just introduce new vector in GIMPLE
 disabling bool patterns and then resolving vectorization regression on
 all targets or allow them live together with following target switch
 one by one from bool patterns with finally removing them? Not all
 targets are likely to be adopted fast I suppose.
>>
>> Well, the frontends already create vec_cond exprs I believe.  So for
>> bool patterns the vectorizer would have to do the same, but the
>> comparison result in there would still use vec.  Thus the scalar
>>
>>  _Bool a = b < c;
>>  _Bool c = a || d;
>>  if (c)
>>
>> would become
>>
>>  vec a = VEC_COND ;
>>  vec c = a | d;
>
> This should be identical to
>
> vec<_Bool> a = a < b;
> vec<_Bool> c = a | d;
>
> where vec<_Bool> has VxSI mode. And we should prefer it in case target
> supports vector comparison into vec, right?
>
>>
>> when the target does not have vecs directly and otherwise
>> vec directly (dropping the VEC_COND).
>>
>> Just the vector comparison inside the VEC_COND would always
>> have vec type.
>
> I don't really understand what you mean by 'doesn't have vecs
> dirrectly' here. Currently I have a hook to ask for a vec mode
> and assume target doesn't support it in case it returns VOIDmode. But
> in such case I have no mode to use for vec inside VEC_COND
> either.

I was thinking about targets not supporting generating vec
(of whatever mode) from a comparison directly but only via
a COND_EXPR.

> In default implementation of the new target hook I always return
> integer vector mode (to have default behavior similar to the current
> one). It should allow me to use vec for conditions in all
> vec_cond. But we'd need some other trigger for bool patterns to apply.
> Probably check vec_cmp optab in check_bool_pattern and don't convert
> in case comparison is supported by target? Or control it via
> additional hook.

Not sure if we are always talking about the same thing for
"bool patterns".  I'd remove bool patterns completely, IMHO
they are not necessary at all.

>>
>> And the "bool patterns" I am talking about are those in
>> tree-vect-patterns.c, not any targets instruction patterns.
>
> I refer to them also. BTW bool patterns also pull comparison into
> vec_cond. Thus we cannot have SSA_NAME in vec_cond as a condition. I
> think with vector comparisons in place we should allow SSA_NAME as
> conditions in VEC_COND for better CSE. That should require new vcond
> optabs though.

I think we do allow this, just the vectorizer doesn't expect it.  In the long
run I want to get rid of the GENERIC exprs in both COND_EXPR and
VEC_COND_EXPR.  Just didn't have the time to do this...

Richard.

> Ilya
>
>>
>> Richard.
>>

 Ilya


Re: [RFC] Try vector as a new representation for vector masks

2015-09-17 Thread Richard Henderson
On 09/15/2015 06:52 AM, Ilya Enkovich wrote:
> I made a step forward forcing vector comparisons have a mask (vec) 
> result and disabling bool patterns in case vector comparison is supported by 
> target.  Several issues were met.
> 
>  - c/c++ front-ends generate vector comparison with integer vector result.  I 
> had to make some modifications to use vec_cond instead.  Don't know if there 
> are other front-ends producing vector comparisons.
>  - vector lowering fails to expand vector masks due to mismatch of type and 
> mode sizes.  I fixed vector type size computation to match mode size and 
> added a special handling of mask expand.
>  - I disabled canonical type creation for vector mask because we can't layout 
> it with VOID mode. I don't know why we may need a canonical type here.  But 
> get_mask_mode call may be moved into type layout to get it.
>  - Expand of vec constants/contstructors requires special handling.  
> Common case should require target hooks/optabs to expand vector into required 
> mode.  But I suppose we want to have a generic code to handle vector of int 
> mode case to avoid modification of multiple targets which use default 
> vec modes.
> 
> Currently 'make check' shows two types of regression.
>   - missed vector expression pattern recongnition (MIN, MAX, ABX, VEC_COND).  
> This must be due to my front-end changes.  Hope it will be easy to fix.
>   - missed vectorization. All of them appear due to bool patterns disabling.  
> I didn't look into all of them but it seems the main problem is in mixed type 
> sizes.  With bool patterns and integer vector masks we just put int->(other 
> sized int) conversion for masks and it gives us required mask transformation. 
>  With boolean mask we don't have a proper scalar statements to do that.  I 
> think mask widening/narrowing may be directly supported in masked statements 
> vectorization.  Going to look into it.
> 
> I attach what I currently have for a prototype.  It grows bigger so I split 
> into several parts.

The general approach looks good.


> +/* By defaults a vector of integers is used as a mask.  */
> +
> +machine_mode
> +default_get_mask_mode (unsigned nunits, unsigned vector_size)
> +{
> +  unsigned elem_size = vector_size / nunits;
> +  machine_mode elem_mode
> += smallest_mode_for_size (elem_size * BITS_PER_UNIT, MODE_INT);

Why these arguments as opposed to passing elem_size?  It seems that every hook
is going to have to do this division...

> +#define VECTOR_MASK_TYPE_P(TYPE) \
> +  (TREE_CODE (TYPE) == VECTOR_TYPE   \
> +   && TREE_CODE (TREE_TYPE (TYPE)) == BOOLEAN_TYPE)

Perhaps better as VECTOR_BOOLEAN_TYPE_P, since that's exactly what's being 
tested?

> @@ -3464,10 +3464,10 @@ verify_gimple_comparison (tree type, tree op0, tree 
> op1)
>return true;
>  }
>  }
> -  /* Or an integer vector type with the same size and element count
> +  /* Or a boolean vector type with the same element count
>   as the comparison operand types.  */
>else if (TREE_CODE (type) == VECTOR_TYPE
> -&& TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE)
> +&& TREE_CODE (TREE_TYPE (type)) == BOOLEAN_TYPE)

VECTOR_BOOLEAN_TYPE_P.

> @@ -122,7 +122,19 @@ tree_vec_extract (gimple_stmt_iterator *gsi, tree type,
> tree t, tree bitsize, tree bitpos)
>  {
>if (bitpos)
> -return gimplify_build3 (gsi, BIT_FIELD_REF, type, t, bitsize, bitpos);
> +{
> +  if (TREE_CODE (type) == BOOLEAN_TYPE)
> + {
> +   tree itype
> + = build_nonstandard_integer_type (tree_to_uhwi (bitsize), 0);
> +   tree field = gimplify_build3 (gsi, BIT_FIELD_REF, itype, t,
> + bitsize, bitpos);
> +   return gimplify_build2 (gsi, NE_EXPR, type, field,
> +   build_zero_cst (itype));
> + }
> +  else
> + return gimplify_build3 (gsi, BIT_FIELD_REF, type, t, bitsize, bitpos);
> +}
>else
>  return gimplify_build1 (gsi, VIEW_CONVERT_EXPR, type, t);
>  }

So... this is us lowering vector operations on a target that doesn't support
them.  Which means that we get to decide what value is produced for a
comparison?  Which means that we can settle on the "normal" -1, correct?

Which means that we ought not need to extract the entire element and then
compare for non-zero, but rather simply extract a single bit from the element,
and directly call that a boolean result, correct?

I assume you tested all this code with -mno-sse or equivalent arch default?

> @@ -1885,7 +1885,9 @@ expand_MASK_LOAD (gcall *stmt)
>create_output_operand (&ops[0], target, TYPE_MODE (type));
>create_fixed_operand (&ops[1], mem);
>create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
> -  expand_insn (optab_handler (maskload_optab, TYPE_MODE (type)), 3, ops);
> +  expand_insn (convert_optab_handler (maskload_optab, TYPE_MODE (type),
> + 

Re: [RFC] Try vector as a new representation for vector masks

2015-09-15 Thread Ilya Enkovich
On 08 Sep 15:37, Ilya Enkovich wrote:
> 2015-09-04 23:42 GMT+03:00 Jeff Law :
> >
> > So do we have enough confidence in this representation that we want to go
> > ahead and commit to it?
> 
> I think new representation fits nice mostly. There are some places
> where I have to make some exceptions for vector of bools to make it
> work. This is mostly to avoid target modifications. I'd like to avoid
> necessity to change all targets currently supporting vec_cond. It
> makes me add some special handling of vec in GIMPLE, e.g. I add
> a special code in vect_init_vector to build vec invariants with
> proper casting to int. Otherwise I'd need to do it on a target side.
> 
> I made several fixes and current patch (still allowing integer vector
> result for vector comparison and applying bool patterns) passes
> bootstrap and regression testing on x86_64. Now I'll try to fully
> switch to vec and see how it goes.
> 
> Thanks,
> Ilya
> 

Hi,

I made a step forward forcing vector comparisons have a mask (vec) result 
and disabling bool patterns in case vector comparison is supported by target.  
Several issues were met.

 - c/c++ front-ends generate vector comparison with integer vector result.  I 
had to make some modifications to use vec_cond instead.  Don't know if there 
are other front-ends producing vector comparisons.
 - vector lowering fails to expand vector masks due to mismatch of type and 
mode sizes.  I fixed vector type size computation to match mode size and added 
a special handling of mask expand.
 - I disabled canonical type creation for vector mask because we can't layout 
it with VOID mode. I don't know why we may need a canonical type here.  But 
get_mask_mode call may be moved into type layout to get it.
 - Expand of vec constants/contstructors requires special handling.  
Common case should require target hooks/optabs to expand vector into required 
mode.  But I suppose we want to have a generic code to handle vector of int 
mode case to avoid modification of multiple targets which use default vec 
modes.

Currently 'make check' shows two types of regression.
  - missed vector expression pattern recongnition (MIN, MAX, ABX, VEC_COND).  
This must be due to my front-end changes.  Hope it will be easy to fix.
  - missed vectorization. All of them appear due to bool patterns disabling.  I 
didn't look into all of them but it seems the main problem is in mixed type 
sizes.  With bool patterns and integer vector masks we just put int->(other 
sized int) conversion for masks and it gives us required mask transformation.  
With boolean mask we don't have a proper scalar statements to do that.  I think 
mask widening/narrowing may be directly supported in masked statements 
vectorization.  Going to look into it.

I attach what I currently have for a prototype.  It grows bigger so I split 
into several parts.

Thanks,
Ilya
--
* avx512-vec-bool-01-add-truth-vector.ChangeLog

2015-09-15  Ilya Enkovich  

* doc/tm.texi: Regenerated.
* doc/tm.texi.in (TARGET_VECTORIZE_GET_MASK_MODE): New.
* stor-layout.c (layout_type): Use mode to get vector mask size.
(vector_type_mode): Likewise.
* target.def (get_mask_mode): New.
* targhooks.c (default_vector_alignment): Use mode alignment
for vector masks.
(default_get_mask_mode): New.
* targhooks.h (default_get_mask_mode): New.
* tree.c (make_vector_type): Vector mask has no canonical type.
(build_truth_vector_type): New.
(build_same_sized_truth_vector_type): New.
(truth_type_for): Support vector masks.
* tree.h (VECTOR_MASK_TYPE_P): New.
(build_truth_vector_type): New.
(build_same_sized_truth_vector_type): New.

* avx512-vec-bool-02-no-int-vec-cmp.ChangeLog

gcc/

2015-09-15  Ilya Enkovich  

* tree-cfg.c (verify_gimple_comparison) Require vector mask
type for vector comparison.
(verify_gimple_assign_ternary): Likewise.

gcc/c

2015-09-15  Ilya Enkovich  

* c-typeck.c (build_conditional_expr): Use vector mask
type for vector comparison.
(build_vec_cmp): New.
(build_binary_op): Use build_vec_cmp for comparison.

gcc/cp

2015-09-15  Ilya Enkovich  

* call.c (build_conditional_expr_1): Use vector mask
type for vector comparison.
* typeck.c (build_vec_cmp): New.
(cp_build_binary_op): Use build_vec_cmp for comparison.

* avx512-vec-bool-03-vec-lower.ChangeLog

2015-09-15  Ilya Enkovich  

* tree-vect-generic.c (tree_vec_extract): Use additional
comparison when extracting boolean value.
(do_bool_compare): New.
(expand_vector_comparison): Add casts for vector mask.
(expand_vector_divmod): Use vector mask type for vector
comparison.
(expand_vector_operations_1) Skip scalar mode mask statements.

* avx512-vec-bool-04-vectorize.ChangeLog

gcc/

2015-09-15  Ilya Enkovich  

* expr.c (do_store_flag): Us

Re: [RFC] Try vector as a new representation for vector masks

2015-09-08 Thread Ilya Enkovich
2015-09-04 23:42 GMT+03:00 Jeff Law :
> On 09/01/2015 07:08 AM, Ilya Enkovich wrote:
>>
>> On 27 Aug 09:55, Richard Biener wrote:
>>>
>>> I suggest you try modifying those parts first according to this
>>> scheme that will most likely uncover issues we missed.
>>>
>>> Thanks, Richard.
>>>
>>
>> I tried to implement this scheme and apply it for MASK_LOAD and
>> MASK_STORE.  There were no major issues (for now).
>
> So do we have enough confidence in this representation that we want to go
> ahead and commit to it?

I think new representation fits nice mostly. There are some places
where I have to make some exceptions for vector of bools to make it
work. This is mostly to avoid target modifications. I'd like to avoid
necessity to change all targets currently supporting vec_cond. It
makes me add some special handling of vec in GIMPLE, e.g. I add
a special code in vect_init_vector to build vec invariants with
proper casting to int. Otherwise I'd need to do it on a target side.

I made several fixes and current patch (still allowing integer vector
result for vector comparison and applying bool patterns) passes
bootstrap and regression testing on x86_64. Now I'll try to fully
switch to vec and see how it goes.

Thanks,
Ilya

>
>>
>> I had to introduce significant number of new patterns in i386 target
>> to support new optabs.  The reason was vector compare was never
>> expanded separately and always was a part of a vec_cond expansion.
>
> One could argue we should have fixed this already, so I don't see the new
> patterns as a bad thing, but instead they're addressing a long term
> mis-design.
>
>>
>>
>> For now I still don't disable bool patterns, thus new masks apply to
>> masked loads and stores only.  Patch is also not tested and tried on
>> several small tests only.  Could you please look at what I currently
>> have and say if it's in sync with your view on vector masking?
>
> I'm going to let Richi run with this for the most part -- but I will chime
> in with a thank you for being willing to bounce this around a bit while we
> figure out the representational issues.
>
>
> jeff



gcc/

2015-09-08  Ilya Enkovich  

* config/i386/i386-protos.h (ix86_expand_mask_vec_cmp): New.
(ix86_expand_int_vec_cmp): New.
(ix86_expand_fp_vec_cmp): New.
* config/i386/i386.c (ix86_expand_sse_cmp): Allow NULL for
op_true and op_false.
(ix86_int_cmp_code_to_pcmp_immediate): New.
(ix86_fp_cmp_code_to_pcmp_immediate): New.
(ix86_cmp_code_to_pcmp_immediate): New.
(ix86_expand_mask_vec_cmp): New.
(ix86_expand_fp_vec_cmp): New.
(ix86_expand_int_sse_cmp): New.
(ix86_expand_int_vcond): Use ix86_expand_int_sse_cmp.
(ix86_expand_int_vec_cmp): New.
(ix86_get_mask_mode): New.
(TARGET_VECTORIZE_GET_MASK_MODE): New.
* config/i386/sse.md (avx512fmaskmodelower): New.
(vec_cmp): New.
(vec_cmp): New.
(vec_cmpv2div2di): New.
(vec_cmpu): New.
(vec_cmpu): New.
(vec_cmpuv2div2di): New.
(maskload): Rename to ...
(maskload): ... this.
(maskstore): Rename to ...
(maskstore): ... this.
(maskload): New.
(maskstore): New.
* doc/tm.texi: Regenerated.
* doc/tm.texi.in (TARGET_VECTORIZE_GET_MASK_MODE): New.
* expr.c (do_store_flag): Use expand_vec_cmp_expr for mask results.
* internal-fn.c (expand_MASK_LOAD): Adjust to optab changes.
(expand_MASK_STORE): Likewise.
* optabs.c (vector_compare_rtx): Add OPNO arg.
(expand_vec_cond_expr): Adjust to vector_compare_rtx change.
(get_vec_cmp_icode): New.
(expand_vec_cmp_expr_p): New.
(expand_vec_cmp_expr): New.
(can_vec_mask_load_store_p): Add MASK_MODE arg.
* optabs.def (vec_cmp_optab): New.
(vec_cmpu_optab): New.
(maskload_optab): Transform into convert optab.
(maskstore_optab): Likewise.
* optabs.h (expand_vec_cmp_expr_p): New.
(expand_vec_cmp_expr): New.
(can_vec_mask_load_store_p): Add MASK_MODE arg.
* target.def (get_mask_mode): New.
* targhooks.c (default_vector_alignment): Use mode alignment
for vector masks.
(default_get_mask_mode): New.
* targhooks.h (default_get_mask_mode): New.
* tree-cfg.c (verify_gimple_comparison): Support vector mask.
* tree-if-conv.c (ifcvt_can_use_mask_load_store): Adjust to
can_vec_mask_load_store_p signature change.
(predicate_mem_writes): Use boolean mask.
* tree-vect-data-refs.c (vect_get_new_vect_var): Support vect_mask_var.
(vect_create_destination_var): Likewise.
* tree-vect-generic.c (expand_vector_comparison): Use
expand_vec_cmp_expr_p for comparison availability.
(expand_vector_operations_1): Ignore mask statements with scalar mode.
* tree-vect-loop.c (vect_determine_vectorization_factor): Ignore mask
operations for VF.  Add mask type computation.
* tree-vect-stmts.c (vect_init_vector): Support mask invariants.
(vect_get_vec_def_for_operand): Support mask constant.
(vectorizable_mask_load_store): Adjust to can_vec_mask_load_store_p
signature change.
(vectorizable_comparison): New.
(vect_analyze_stmt): Add vectorizable_comparison.
(vect_transform_stmt): Likewise.
(get_mask_type_for_scalar_type): New.
* tree-vectorizer.h (enum stmt_vec_info_type): Add vect_mask_var
(enum st

Re: [RFC] Try vector as a new representation for vector masks

2015-09-04 Thread Jeff Law

On 09/01/2015 07:08 AM, Ilya Enkovich wrote:

On 27 Aug 09:55, Richard Biener wrote:

I suggest you try modifying those parts first according to this
scheme that will most likely uncover issues we missed.

Thanks, Richard.



I tried to implement this scheme and apply it for MASK_LOAD and
MASK_STORE.  There were no major issues (for now).
So do we have enough confidence in this representation that we want to 
go ahead and commit to it?




I had to introduce significant number of new patterns in i386 target
to support new optabs.  The reason was vector compare was never
expanded separately and always was a part of a vec_cond expansion.
One could argue we should have fixed this already, so I don't see the 
new patterns as a bad thing, but instead they're addressing a long term 
mis-design.





For now I still don't disable bool patterns, thus new masks apply to
masked loads and stores only.  Patch is also not tested and tried on
several small tests only.  Could you please look at what I currently
have and say if it's in sync with your view on vector masking?
I'm going to let Richi run with this for the most part -- but I will 
chime in with a thank you for being willing to bounce this around a bit 
while we figure out the representational issues.



jeff


Re: [RFC] Try vector as a new representation for vector masks

2015-09-03 Thread Ilya Enkovich
2015-09-03 15:11 GMT+03:00 Richard Biener :
> On Thu, Sep 3, 2015 at 2:03 PM, Ilya Enkovich  wrote:
>> Adding CCs.
>>
>> 2015-09-03 15:03 GMT+03:00 Ilya Enkovich :
>>> 2015-09-01 17:25 GMT+03:00 Richard Biener :
>>>
>>> Totally disabling old style vector comparison and bool pattern is a
>>> goal but doing hat would mean a lot of regressions for many targets.
>>> Do you want to it to be tried to estimate amount of changes required
>>> and reveal possible issues? What would be integration plan for these
>>> changes? Do you want to just introduce new vector in GIMPLE
>>> disabling bool patterns and then resolving vectorization regression on
>>> all targets or allow them live together with following target switch
>>> one by one from bool patterns with finally removing them? Not all
>>> targets are likely to be adopted fast I suppose.
>
> Well, the frontends already create vec_cond exprs I believe.  So for
> bool patterns the vectorizer would have to do the same, but the
> comparison result in there would still use vec.  Thus the scalar
>
>  _Bool a = b < c;
>  _Bool c = a || d;
>  if (c)
>
> would become
>
>  vec a = VEC_COND ;
>  vec c = a | d;

This should be identical to

vec<_Bool> a = a < b;
vec<_Bool> c = a | d;

where vec<_Bool> has VxSI mode. And we should prefer it in case target
supports vector comparison into vec, right?

>
> when the target does not have vecs directly and otherwise
> vec directly (dropping the VEC_COND).
>
> Just the vector comparison inside the VEC_COND would always
> have vec type.

I don't really understand what you mean by 'doesn't have vecs
dirrectly' here. Currently I have a hook to ask for a vec mode
and assume target doesn't support it in case it returns VOIDmode. But
in such case I have no mode to use for vec inside VEC_COND
either.

In default implementation of the new target hook I always return
integer vector mode (to have default behavior similar to the current
one). It should allow me to use vec for conditions in all
vec_cond. But we'd need some other trigger for bool patterns to apply.
Probably check vec_cmp optab in check_bool_pattern and don't convert
in case comparison is supported by target? Or control it via
additional hook.

>
> And the "bool patterns" I am talking about are those in
> tree-vect-patterns.c, not any targets instruction patterns.

I refer to them also. BTW bool patterns also pull comparison into
vec_cond. Thus we cannot have SSA_NAME in vec_cond as a condition. I
think with vector comparisons in place we should allow SSA_NAME as
conditions in VEC_COND for better CSE. That should require new vcond
optabs though.

Ilya

>
> Richard.
>
>>>
>>> Ilya


Re: [RFC] Try vector as a new representation for vector masks

2015-09-03 Thread Richard Biener
On Thu, Sep 3, 2015 at 2:03 PM, Ilya Enkovich  wrote:
> Adding CCs.
>
> 2015-09-03 15:03 GMT+03:00 Ilya Enkovich :
>> 2015-09-01 17:25 GMT+03:00 Richard Biener :
>>> On Tue, Sep 1, 2015 at 3:08 PM, Ilya Enkovich  
>>> wrote:
 On 27 Aug 09:55, Richard Biener wrote:
> On Wed, Aug 26, 2015 at 5:51 PM, Ilya Enkovich  
> wrote:
> >
> > Yes, I want to try it. But getting rid of bool patterns would mean
> > support for all targets currently supporting vec_cond. Would it be OK
> > to have vector mask co-exist with bool patterns for some time?
>
> No, I'd like to remove the bool patterns anyhow - the vectorizer should 
> be able
> to figure out the correct vector type (or mask type) from the uses.  
> Currently
> it simply looks at the stmts LHS type but as all stmt operands already 
> have
> vector types it can as well compute the result type from those.  We'd 
> want to
> have a helper function that does this result type computation as I figure 
> it
> will be needed in multiple places.
>
> This is now on my personal TODO list (but that's already quite long for 
> GCC 6),
> so if you manage to get to that...  see
> tree-vect-loop.c:vect_determine_vectorization_factor
> which computes STMT_VINFO_VECTYPE for all stmts but loads (loads get their
> vector type set from data-ref analysis already - there 'bool' loads
> correctly get
> VNQImode).  There is a basic-block / SLP part as well that would need to 
> use
> the helper function (eventually with some SLP discovery order issue).
>
> > Thus first step would be to require vector for MASK_LOAD and
> > MASK_STORE and support it for i386 (the only user of MASK_LOAD and
> > MASK_STORE).
>
> You can certainly try that first, but as soon as you hit complications 
> with
> needing to adjust bool patterns then I'd rather get rid of them.
>
> >
> > I can directly build a vector type with specified mode to avoid it. 
> > Smth. like:
> >
> > mask_mode = targetm.vectorize.get_mask_mode (nunits, 
> > current_vector_size);
> > mask_type = make_vector_type (bool_type_node, nunits, mask_mode);
>
> Hmm, indeed, that might be a (good) solution.  Btw, in this case
> target attribute
> boundaries would be "ignored" (that is, TYPE_MODE wouldn't change 
> depending
> on the active target).  There would also be no way for the user to
> declare vector
> in source (which is good because of that target attribute issue...).
>
> So yeah.  Adding a tree.c:build_truth_vector_type (unsigned nunits)
> and adjusting
> truth_type_for is the way to go.
>
> I suggest you try modifying those parts first according to this scheme
> that will most
> likely uncover issues we missed.
>
> Thanks,
> Richard.
>

 I tried to implement this scheme and apply it for MASK_LOAD and 
 MASK_STORE.  There were no major issues (for now).

 build_truth_vector_type and get_mask_type_for_scalar_type were added to 
 build a mask type.  It is always a vector of bools but its mode is 
 determined by a target using number of units and currently used vector 
 length.

 As previously I fixed if-conversion to apply boolean masks for loads and 
 stores which automatically disables bool patterns for them and flow goes 
 by a mask path.  Vectorization factor computation is fixed to have a 
 separate computation for mask types.  Comparison is now handled separately 
 by vectorizer and is vectorized into vector comparison.

 Optabs for masked loads and stores were transformed into convert optabs.  
 Now it is checked using both value and mask modes.

 Optabs for comparison were added.  These are also convert optabs checking 
 value and result type.

 I had to introduce significant number of new patterns in i386 target to 
 support new optabs.  The reason was vector compare was never expanded 
 separately and always was a part of a vec_cond expansion.
>>>
>>> Indeed.
>>>
 As a result it's possible to use the sage GIMPLE representation for both 
 vector and scalar masks target type.  Here is an example I used as a 
 simple test:

   for (i=0; i>>>   {
 float t = a[i];
 if (t > 0.0f && t < 1.0e+2f)
   if (c[i] != 0)
 c[i] = 1;
   }

 Produced vector GIMPLE (before expand):

   vect_t_5.22_105 = MEM[base: _256, offset: 0B];
   mask__6.23_107 = vect_t_5.22_105 > { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
 0.0 };
   mask__7.25_109 = vect_t_5.22_105 < { 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 
 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2 };
   mask__8.27_110 = mask__6.23_107 & mask__7.25_109;
   vect__9.29_116 = MASK_LOAD (vectp_c.30_114, 0B, mask__8.27_110);
   mask__36.33_119 = vect__9.29_116 != { 0, 0, 0, 0, 

Re: [RFC] Try vector as a new representation for vector masks

2015-09-03 Thread Ilya Enkovich
Adding CCs.

2015-09-03 15:03 GMT+03:00 Ilya Enkovich :
> 2015-09-01 17:25 GMT+03:00 Richard Biener :
>> On Tue, Sep 1, 2015 at 3:08 PM, Ilya Enkovich  wrote:
>>> On 27 Aug 09:55, Richard Biener wrote:
 On Wed, Aug 26, 2015 at 5:51 PM, Ilya Enkovich  
 wrote:
 >
 > Yes, I want to try it. But getting rid of bool patterns would mean
 > support for all targets currently supporting vec_cond. Would it be OK
 > to have vector mask co-exist with bool patterns for some time?

 No, I'd like to remove the bool patterns anyhow - the vectorizer should be 
 able
 to figure out the correct vector type (or mask type) from the uses.  
 Currently
 it simply looks at the stmts LHS type but as all stmt operands already have
 vector types it can as well compute the result type from those.  We'd want 
 to
 have a helper function that does this result type computation as I figure 
 it
 will be needed in multiple places.

 This is now on my personal TODO list (but that's already quite long for 
 GCC 6),
 so if you manage to get to that...  see
 tree-vect-loop.c:vect_determine_vectorization_factor
 which computes STMT_VINFO_VECTYPE for all stmts but loads (loads get their
 vector type set from data-ref analysis already - there 'bool' loads
 correctly get
 VNQImode).  There is a basic-block / SLP part as well that would need to 
 use
 the helper function (eventually with some SLP discovery order issue).

 > Thus first step would be to require vector for MASK_LOAD and
 > MASK_STORE and support it for i386 (the only user of MASK_LOAD and
 > MASK_STORE).

 You can certainly try that first, but as soon as you hit complications with
 needing to adjust bool patterns then I'd rather get rid of them.

 >
 > I can directly build a vector type with specified mode to avoid it. 
 > Smth. like:
 >
 > mask_mode = targetm.vectorize.get_mask_mode (nunits, 
 > current_vector_size);
 > mask_type = make_vector_type (bool_type_node, nunits, mask_mode);

 Hmm, indeed, that might be a (good) solution.  Btw, in this case
 target attribute
 boundaries would be "ignored" (that is, TYPE_MODE wouldn't change depending
 on the active target).  There would also be no way for the user to
 declare vector
 in source (which is good because of that target attribute issue...).

 So yeah.  Adding a tree.c:build_truth_vector_type (unsigned nunits)
 and adjusting
 truth_type_for is the way to go.

 I suggest you try modifying those parts first according to this scheme
 that will most
 likely uncover issues we missed.

 Thanks,
 Richard.

>>>
>>> I tried to implement this scheme and apply it for MASK_LOAD and MASK_STORE. 
>>>  There were no major issues (for now).
>>>
>>> build_truth_vector_type and get_mask_type_for_scalar_type were added to 
>>> build a mask type.  It is always a vector of bools but its mode is 
>>> determined by a target using number of units and currently used vector 
>>> length.
>>>
>>> As previously I fixed if-conversion to apply boolean masks for loads and 
>>> stores which automatically disables bool patterns for them and flow goes by 
>>> a mask path.  Vectorization factor computation is fixed to have a separate 
>>> computation for mask types.  Comparison is now handled separately by 
>>> vectorizer and is vectorized into vector comparison.
>>>
>>> Optabs for masked loads and stores were transformed into convert optabs.  
>>> Now it is checked using both value and mask modes.
>>>
>>> Optabs for comparison were added.  These are also convert optabs checking 
>>> value and result type.
>>>
>>> I had to introduce significant number of new patterns in i386 target to 
>>> support new optabs.  The reason was vector compare was never expanded 
>>> separately and always was a part of a vec_cond expansion.
>>
>> Indeed.
>>
>>> As a result it's possible to use the sage GIMPLE representation for both 
>>> vector and scalar masks target type.  Here is an example I used as a simple 
>>> test:
>>>
>>>   for (i=0; i>>   {
>>> float t = a[i];
>>> if (t > 0.0f && t < 1.0e+2f)
>>>   if (c[i] != 0)
>>> c[i] = 1;
>>>   }
>>>
>>> Produced vector GIMPLE (before expand):
>>>
>>>   vect_t_5.22_105 = MEM[base: _256, offset: 0B];
>>>   mask__6.23_107 = vect_t_5.22_105 > { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
>>> 0.0 };
>>>   mask__7.25_109 = vect_t_5.22_105 < { 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 
>>> 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2 };
>>>   mask__8.27_110 = mask__6.23_107 & mask__7.25_109;
>>>   vect__9.29_116 = MASK_LOAD (vectp_c.30_114, 0B, mask__8.27_110);
>>>   mask__36.33_119 = vect__9.29_116 != { 0, 0, 0, 0, 0, 0, 0, 0 };
>>>   mask__37.35_120 = mask__8.27_110 & mask__36.33_119;
>>>   MASK_STORE (vectp_c.38_125, 0B, mask__37.35_120, { 1, 1, 1, 1, 1, 1, 1, 1 
>>> });
>>
>> Looks good to me.
>>
>>> Produce

Re: [RFC] Try vector as a new representation for vector masks

2015-09-01 Thread Richard Biener
On Tue, Sep 1, 2015 at 3:08 PM, Ilya Enkovich  wrote:
> On 27 Aug 09:55, Richard Biener wrote:
>> On Wed, Aug 26, 2015 at 5:51 PM, Ilya Enkovich  
>> wrote:
>> >
>> > Yes, I want to try it. But getting rid of bool patterns would mean
>> > support for all targets currently supporting vec_cond. Would it be OK
>> > to have vector mask co-exist with bool patterns for some time?
>>
>> No, I'd like to remove the bool patterns anyhow - the vectorizer should be 
>> able
>> to figure out the correct vector type (or mask type) from the uses.  
>> Currently
>> it simply looks at the stmts LHS type but as all stmt operands already have
>> vector types it can as well compute the result type from those.  We'd want to
>> have a helper function that does this result type computation as I figure it
>> will be needed in multiple places.
>>
>> This is now on my personal TODO list (but that's already quite long for GCC 
>> 6),
>> so if you manage to get to that...  see
>> tree-vect-loop.c:vect_determine_vectorization_factor
>> which computes STMT_VINFO_VECTYPE for all stmts but loads (loads get their
>> vector type set from data-ref analysis already - there 'bool' loads
>> correctly get
>> VNQImode).  There is a basic-block / SLP part as well that would need to use
>> the helper function (eventually with some SLP discovery order issue).
>>
>> > Thus first step would be to require vector for MASK_LOAD and
>> > MASK_STORE and support it for i386 (the only user of MASK_LOAD and
>> > MASK_STORE).
>>
>> You can certainly try that first, but as soon as you hit complications with
>> needing to adjust bool patterns then I'd rather get rid of them.
>>
>> >
>> > I can directly build a vector type with specified mode to avoid it. Smth. 
>> > like:
>> >
>> > mask_mode = targetm.vectorize.get_mask_mode (nunits, current_vector_size);
>> > mask_type = make_vector_type (bool_type_node, nunits, mask_mode);
>>
>> Hmm, indeed, that might be a (good) solution.  Btw, in this case
>> target attribute
>> boundaries would be "ignored" (that is, TYPE_MODE wouldn't change depending
>> on the active target).  There would also be no way for the user to
>> declare vector
>> in source (which is good because of that target attribute issue...).
>>
>> So yeah.  Adding a tree.c:build_truth_vector_type (unsigned nunits)
>> and adjusting
>> truth_type_for is the way to go.
>>
>> I suggest you try modifying those parts first according to this scheme
>> that will most
>> likely uncover issues we missed.
>>
>> Thanks,
>> Richard.
>>
>
> I tried to implement this scheme and apply it for MASK_LOAD and MASK_STORE.  
> There were no major issues (for now).
>
> build_truth_vector_type and get_mask_type_for_scalar_type were added to build 
> a mask type.  It is always a vector of bools but its mode is determined by a 
> target using number of units and currently used vector length.
>
> As previously I fixed if-conversion to apply boolean masks for loads and 
> stores which automatically disables bool patterns for them and flow goes by a 
> mask path.  Vectorization factor computation is fixed to have a separate 
> computation for mask types.  Comparison is now handled separately by 
> vectorizer and is vectorized into vector comparison.
>
> Optabs for masked loads and stores were transformed into convert optabs.  Now 
> it is checked using both value and mask modes.
>
> Optabs for comparison were added.  These are also convert optabs checking 
> value and result type.
>
> I had to introduce significant number of new patterns in i386 target to 
> support new optabs.  The reason was vector compare was never expanded 
> separately and always was a part of a vec_cond expansion.

Indeed.

> As a result it's possible to use the sage GIMPLE representation for both 
> vector and scalar masks target type.  Here is an example I used as a simple 
> test:
>
>   for (i=0; i   {
> float t = a[i];
> if (t > 0.0f && t < 1.0e+2f)
>   if (c[i] != 0)
> c[i] = 1;
>   }
>
> Produced vector GIMPLE (before expand):
>
>   vect_t_5.22_105 = MEM[base: _256, offset: 0B];
>   mask__6.23_107 = vect_t_5.22_105 > { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 
> };
>   mask__7.25_109 = vect_t_5.22_105 < { 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 
> 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2 };
>   mask__8.27_110 = mask__6.23_107 & mask__7.25_109;
>   vect__9.29_116 = MASK_LOAD (vectp_c.30_114, 0B, mask__8.27_110);
>   mask__36.33_119 = vect__9.29_116 != { 0, 0, 0, 0, 0, 0, 0, 0 };
>   mask__37.35_120 = mask__8.27_110 & mask__36.33_119;
>   MASK_STORE (vectp_c.38_125, 0B, mask__37.35_120, { 1, 1, 1, 1, 1, 1, 1, 1 
> });

Looks good to me.

> Produced assembler on AVX-512:
>
> vmovups (%rdi), %zmm0
> vcmpps  $25, %zmm5, %zmm0, %k1
> vcmpps  $22, %zmm3, %zmm0, %k1{%k1}
> vmovdqa32   -64(%rdx), %zmm2{%k1}
> vpcmpd  $4, %zmm1, %zmm2, %k1{%k1}
> vmovdqa32   %zmm4, (%rcx){%k1}
>
> Produced assembler on AVX-2:
>
> vmovups (%rdx), %xmm1
> vinser

[RFC] Try vector as a new representation for vector masks

2015-09-01 Thread Ilya Enkovich
On 27 Aug 09:55, Richard Biener wrote:
> On Wed, Aug 26, 2015 at 5:51 PM, Ilya Enkovich  wrote:
> >
> > Yes, I want to try it. But getting rid of bool patterns would mean
> > support for all targets currently supporting vec_cond. Would it be OK
> > to have vector mask co-exist with bool patterns for some time?
> 
> No, I'd like to remove the bool patterns anyhow - the vectorizer should be 
> able
> to figure out the correct vector type (or mask type) from the uses.  Currently
> it simply looks at the stmts LHS type but as all stmt operands already have
> vector types it can as well compute the result type from those.  We'd want to
> have a helper function that does this result type computation as I figure it
> will be needed in multiple places.
> 
> This is now on my personal TODO list (but that's already quite long for GCC 
> 6),
> so if you manage to get to that...  see
> tree-vect-loop.c:vect_determine_vectorization_factor
> which computes STMT_VINFO_VECTYPE for all stmts but loads (loads get their
> vector type set from data-ref analysis already - there 'bool' loads
> correctly get
> VNQImode).  There is a basic-block / SLP part as well that would need to use
> the helper function (eventually with some SLP discovery order issue).
> 
> > Thus first step would be to require vector for MASK_LOAD and
> > MASK_STORE and support it for i386 (the only user of MASK_LOAD and
> > MASK_STORE).
> 
> You can certainly try that first, but as soon as you hit complications with
> needing to adjust bool patterns then I'd rather get rid of them.
> 
> >
> > I can directly build a vector type with specified mode to avoid it. Smth. 
> > like:
> >
> > mask_mode = targetm.vectorize.get_mask_mode (nunits, current_vector_size);
> > mask_type = make_vector_type (bool_type_node, nunits, mask_mode);
> 
> Hmm, indeed, that might be a (good) solution.  Btw, in this case
> target attribute
> boundaries would be "ignored" (that is, TYPE_MODE wouldn't change depending
> on the active target).  There would also be no way for the user to
> declare vector
> in source (which is good because of that target attribute issue...).
> 
> So yeah.  Adding a tree.c:build_truth_vector_type (unsigned nunits)
> and adjusting
> truth_type_for is the way to go.
> 
> I suggest you try modifying those parts first according to this scheme
> that will most
> likely uncover issues we missed.
> 
> Thanks,
> Richard.
> 

I tried to implement this scheme and apply it for MASK_LOAD and MASK_STORE.  
There were no major issues (for now).

build_truth_vector_type and get_mask_type_for_scalar_type were added to build a 
mask type.  It is always a vector of bools but its mode is determined by a 
target using number of units and currently used vector length.

As previously I fixed if-conversion to apply boolean masks for loads and stores 
which automatically disables bool patterns for them and flow goes by a mask 
path.  Vectorization factor computation is fixed to have a separate computation 
for mask types.  Comparison is now handled separately by vectorizer and is 
vectorized into vector comparison.

Optabs for masked loads and stores were transformed into convert optabs.  Now 
it is checked using both value and mask modes.

Optabs for comparison were added.  These are also convert optabs checking value 
and result type.

I had to introduce significant number of new patterns in i386 target to support 
new optabs.  The reason was vector compare was never expanded separately and 
always was a part of a vec_cond expansion.

As a result it's possible to use the sage GIMPLE representation for both vector 
and scalar masks target type.  Here is an example I used as a simple test:

  for (i=0; i 0.0f && t < 1.0e+2f)
  if (c[i] != 0)
c[i] = 1;
  }

Produced vector GIMPLE (before expand):

  vect_t_5.22_105 = MEM[base: _256, offset: 0B];
  mask__6.23_107 = vect_t_5.22_105 > { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 };
  mask__7.25_109 = vect_t_5.22_105 < { 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 
1.0e+2, 1.0e+2, 1.0e+2 };
  mask__8.27_110 = mask__6.23_107 & mask__7.25_109;
  vect__9.29_116 = MASK_LOAD (vectp_c.30_114, 0B, mask__8.27_110);
  mask__36.33_119 = vect__9.29_116 != { 0, 0, 0, 0, 0, 0, 0, 0 };
  mask__37.35_120 = mask__8.27_110 & mask__36.33_119;
  MASK_STORE (vectp_c.38_125, 0B, mask__37.35_120, { 1, 1, 1, 1, 1, 1, 1, 1 });

Produced assembler on AVX-512:

vmovups (%rdi), %zmm0
vcmpps  $25, %zmm5, %zmm0, %k1
vcmpps  $22, %zmm3, %zmm0, %k1{%k1}
vmovdqa32   -64(%rdx), %zmm2{%k1}
vpcmpd  $4, %zmm1, %zmm2, %k1{%k1}
vmovdqa32   %zmm4, (%rcx){%k1}

Produced assembler on AVX-2:

vmovups (%rdx), %xmm1
vinsertf128 $0x1, -16(%rdx), %ymm1, %ymm1
vcmpltps%ymm1, %ymm3, %ymm0
vcmpltps%ymm5, %ymm1, %ymm1
vpand   %ymm0, %ymm1, %ymm0
vpmaskmovd  -32(%rcx), %ymm0, %ymm1
vpcmpeqd%ymm2, %ymm1, %ymm1
vpcmpeqd%ymm2