Re: [PATCH] avoid modifying type in attribute access handler [PR94098]

2020-03-18 Thread Jason Merrill via Gcc-patches

On 3/18/20 6:04 PM, Martin Sebor wrote:

On 3/16/20 3:10 PM, Jason Merrill wrote:

On 3/16/20 4:41 PM, Martin Sebor wrote:

The recent fix to avoid modifying in place the type argument in
handle_access_attribute (PR 92721) was incomplete and didn't fully
resolve the problem (an ICE in the C++ front-end).  The attached
patch removes the remaining modification that causes the ICE.  In
addition, the change adjusts checking calls to functions declared
with the attribute to scan all its instances.

The attached patch was tested on x86_64-linux.

I'm puzzled that the ICE only triggers in the C++ front-end and not
also in C.  The code that issues the internal error is in comptypes()
in cp/typeck.c and has this comment:

 /* The two types are structurally equivalent, but their
    canonical types were different. This is a failure of the
    canonical type propagation code.*/
 internal_error
   ("canonical types differ for identical types %qT and %qT",
    t1, t2);

What is "the canonical type propagation code" it refers to?


Generally, code that makes sure that TYPE_CANONICAL equality is 
equivalent to type identity, not any one specific place.



Is it valid to modify the type in an attribute handler


Only if (flags & ATTR_FLAG_TYPE_IN_PLACE).


If not, and modifying a type in place is invalid, I'd expect
decl_attributes to enforce that.  I looked for other attribute handlers
to see if any also modify the type argument in place (by adding or
removing attributes) but couldn't really find any.  So if it is
invalid I'd like to add such an assertion (probably for GCC 11), but
before I do I want to make sure I'm not missing something.


Generally messing with _ATTRIBUTES happens in decl_attributes: 
changing it directly if it's a DECL or ATTR_FLAG_TYPE_IN_PLACE, 
otherwise using build_type_attribute_variant.  If you need to do it in 
your handler, you should follow the same pattern.


That's the conclusion I came to as well, but thanks for confirming
it.  With the patch I don't need to make the change, but since it's
not obvious that it's a no-no, and since it's apparently only detected
under very special conditions, I'm wondering if it's possible to
detect it more reliably than only in the C++ comptypes.  The trouble is
that I don't know exactly what is allowed and what isn't, and what to
look for to tell if the handler did something that's not allowed.

The C++ ICE triggered because the redeclared function's type is
considered the same as the original (structural_comptypes()
returns true) but the declarations' canonical types are different.
structural_comptypes() is C++-specific and I don't know what
alternative to call in the middle-end to compare the types and
get the equivalent result.


I think type_cache_hasher::equal is the closest, but it looks like it 
doesn't check everything.



PS As a data point, I found just two attribute handlers in
c-attribs.c that modify a type in place: one for attribute const
and the other noreturn.  They both do it for function pointers
to set the 'const' and 'noreturn' bits on the pointed-to types,
and by calling build_type_variant.


Hmm, yes, that does sound like a bug.

Jason



Re: [PATCH] avoid -Wredundant-tags on a first declaration in use (PR 93824)

2020-03-18 Thread Jason Merrill via Gcc-patches

On 3/12/20 6:38 PM, Martin Sebor wrote:

On 3/12/20 11:03 AM, Martin Sebor wrote:

On 3/11/20 3:30 PM, Martin Sebor wrote:

On 3/11/20 2:10 PM, Jason Merrill wrote:

On 3/11/20 12:57 PM, Martin Sebor wrote:

On 3/9/20 6:08 PM, Jason Merrill wrote:

On 3/9/20 5:39 PM, Martin Sebor wrote:

On 3/9/20 1:40 PM, Jason Merrill wrote:

On 3/9/20 12:31 PM, Martin Sebor wrote:

On 2/28/20 1:24 PM, Jason Merrill wrote:

On 2/28/20 12:45 PM, Martin Sebor wrote:

On 2/28/20 9:58 AM, Jason Merrill wrote:

On 2/24/20 6:58 PM, Martin Sebor wrote:
-Wredundant-tags doesn't consider type declarations that are also
the first uses of the type, such as in 'void f (struct S);', and
issues false positives for those.  According to the report, that's
making it harder to use the warning to clean up LibreOffice.

The attached patch extends -Wredundant-tags to avoid these false
positives by relying on the same class_decl_loc_t::class2loc mapping
as -Wmismatched-tags.  The patch also somewhat improves the detection
of both issues in template declarations (though more work is still
needed there).


+ a new entry for it and return unless it's a declaration
+ involving a template that may need to be diagnosed by
+ -Wredundant-tags.  */
   *rdl = class_decl_loc_t (class_key, false, def_p);
-  return;
+  if (TREE_CODE (decl) != TEMPLATE_DECL)
+    return;


How can the first appearance of a class template be redundant?


I'm not sure I correctly understand the question.  The comment says
"involving a template" (i.e., not one of the first declarations of
a template).  The test case that corresponds to this test is:

   template  struct S7 { };
   struct S7 s7v;  // { dg-warning "\\\[-Wredundant-tags" }


where DECL is the TEMPLATE_DECL of S7.

As I mentioned, more work is still needed to handle templates right
because some redundant tags are still not diagnosed.  For example:

   template  struct S7 { };
   template 
   using U = struct S7;   // missing warning


When we get here for an instance of a template, it doesn't 
make sense to treat it as a new type.


If decl is a template and type_decl is an instance of that 
template, do we want to (before the lookup) change type_decl 
to the template or the corresponding generic TYPE_DECL, which 
should already be in the table?


I'm struggling with how to do this.  Given type (a RECORD_TYPE) and
type_decl (a TEMPLATE_DECL) representing the use of a template, how
do I get the corresponding template (or its explicit or partial
specialization) in the three cases below?

   1) Instance of the primary:
  template  class A;
  struct A a;

   2) Instance of an explicit specialization:
  template  class B;
  template <> struct B;
  class B b;

   3) Instance of a partial specialization:
  template  class C;
  template  struct C;
  class C c;

By trial and (lots of) error I figured out that in both (1) and (2),
but not in (3), TYPE_MAIN_DECL (TYPE_TI_TEMPLATE (type)) returns
the template's type_decl.

Is there some function to call to get it in (3), or even better,
in all three cases?


I think you're looking for most_general_template.


I don't think that's quite what I'm looking for.  At least it doesn't
return the template or its specialization in all three cases above.


Ah, true, that function stops at specializations.  Oddly, I don't
think there's currently a similar function that looks through them.
You could create one that does a simple loop through
DECL_TI_TEMPLATE like is_specialization_of.


Thanks for the tip.  Even with that I'm having trouble with partial
specializations.  For example in:

   template    struct S;
   template  class S;
   extern class  S s1;
   extern struct S s2;  // expect -Wmismatched-tags

how do I find the declaration of the partial specialization when given
the type in the extern declaration?  A loop in my find_template_for()
function (similar to is_specialization_of) only visits the implicit
specialization S (i.e., its own type) and the primary.


Is that a problem?  The name is from the primary template, so does 
it matter for this warning whether there's an explicit 
specialization involved?


I don't understand the question.  S is an instance of
the partial specialization.  To diagnose the right mismatch the warning
needs to know how to find the template (i.e., either the primary, or
the explicit or partial specialization) the instance corresponds to and
the class-key it was declared with.  As it is, while GCC does diagnose
the right declaration (that of s2), it does that thanks to a bug:
because it finds and uses the type and class-key used to declare s1.
If we get rid of s1 it doesn't diagnose anything.

I tried using DECL_TEMPLATE_SPECIALIZATIONS() to get the list of
the partial specializations but it doesn't like any of the arguments
I've given it (it ICEs).


With this fixed, here's the algorithm I tried:

1) for a type T of a template instantiation (s1 

Re: [PATCH V2] correct COUNT and PROB for unrolled loop

2020-03-18 Thread Jiufu Guo via Gcc-patches
Jiufu Guo  writes:

Hi!

I'd like to ping the following patch.  As we are near the end of GCC 10
stage 4, it seems I should ask for approval for the GCC 11 trunk.

Thanks,
Jiufu Guo

> Hi Honza and all,
>
> I updated the patch a little as below. Bootstrap and regtest are ok
> on powerpc64le.
>
> Is OK for trunk?
>
> Thanks for comments.
> Jiufu
>
> diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c
> index 727e951..ded0046 100644
> --- a/gcc/cfgloopmanip.c
> +++ b/gcc/cfgloopmanip.c
> @@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "gimplify-me.h"
>  #include "tree-ssa-loop-manip.h"
>  #include "dumpfile.h"
> +#include "cfgrtl.h"
>  
>  static void copy_loops_to (class loop **, int,
>  class loop *);
> @@ -1258,14 +1259,30 @@ duplicate_loop_to_header_edge (class loop *loop, edge e,
> /* If original loop is executed COUNT_IN times, the unrolled
>loop will account SCALE_MAIN_DEN times.  */
> scale_main = count_in.probability_in (scale_main_den);
> +
> +   /* If we are guessing at the number of iterations and count_in
> +  becomes unrealistically small, reset probability.  */
> +   if (!(count_in.reliable_p () || loop->any_estimate))
> + {
> +   profile_count new_count_in = count_in.apply_probability (scale_main);
> +   profile_count preheader_count = loop_preheader_edge (loop)->count ();
> +   if (new_count_in.apply_scale (1, 10) < preheader_count)
> + scale_main = profile_probability::likely ();
> + }
> +
> scale_act = scale_main * prob_pass_main;
>   }
>else
>   {
> +   profile_count new_loop_count;
> profile_count preheader_count = e->count ();
> -   for (i = 0; i < ndupl; i++)
> - scale_main = scale_main * scale_step[i];
> scale_act = preheader_count.probability_in (count_in);
> +   /* Compute final preheader count after peeling NDUPL copies.  */
> +   for (i = 0; i < ndupl; i++)
> + preheader_count = preheader_count.apply_probability (scale_step[i]);
> +   /* Subtract out exit(s) from peeled copies.  */
> +   new_loop_count = count_in - (e->count () - preheader_count);
> +   scale_main = new_loop_count.probability_in (count_in);
>   }
>  }
>  
> @@ -1381,6 +1398,38 @@ duplicate_loop_to_header_edge (class loop *loop, edge e,
> scale_bbs_frequencies (new_bbs, n, scale_act);
> scale_act = scale_act * scale_step[j];
>   }
> +
> +  /* Need to update PROB of exit edge and corresponding COUNT.  */
> +  if (orig && is_latch && (!bitmap_bit_p (wont_exit, j + 1))
> +   && bbs_to_scale)
> + {
> +   edge new_exit = new_spec_edges[SE_ORIG];
> +   profile_count new_count_in = new_exit->src->count;
> +   profile_count preheader_count = loop_preheader_edge (loop)->count ();
> +   edge e;
> +   edge_iterator ei;
> +
> +   FOR_EACH_EDGE (e, ei, new_exit->src->succs)
> + if (e != new_exit)
> +   break;
> +
> +   gcc_assert (e && e != new_exit);
> +
> +   new_exit->probability = preheader_count.probability_in (new_count_in);
> +   e->probability = new_exit->probability.invert ();
> +
> +   profile_count new_latch_count
> + = new_exit->src->count.apply_probability (e->probability);
> +   profile_count old_latch_count = e->dest->count;
> +
> +   EXECUTE_IF_SET_IN_BITMAP (bbs_to_scale, 0, i, bi)
> + scale_bbs_frequencies_profile_count (new_bbs + i, 1,
> +  new_latch_count,
> +  old_latch_count);
> +
> +   if (current_ir_type () != IR_GIMPLE)
> + update_br_prob_note (e->src);
> + }
>  }
>free (new_bbs);
>free (orig_loops);
> diff --git a/gcc/testsuite/gcc.dg/pr68212.c b/gcc/testsuite/gcc.dg/pr68212.c
> new file mode 100644
> index 000..f3b7c22
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr68212.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-tree-vectorize -funroll-loops --param max-unroll-times=4 -fdump-rtl-alignments" } */
> +
> +void foo(long int *a, long int *b, long int n)
> +{
> +  long int i;
> +
> +  for (i = 0; i < n; i++)
> +a[i] = *b;
> +}
> +
> +/* { dg-final { scan-rtl-dump-times "internal loop alignment added" 1 "alignments"} } */
> +


RE: [PATCH PR94026] combine missed opportunity to simplify comparisons with zero

2020-03-18 Thread Yangfei (Felix)
Hi,

> -Original Message-
> From: Segher Boessenkool [mailto:seg...@kernel.crashing.org]
> Sent: Thursday, March 19, 2020 7:52 AM
> To: Yangfei (Felix) 
> Cc: gcc-patches@gcc.gnu.org; Zhanghaijian (A) 
> Subject: Re: [PATCH PR94026] combine missed opportunity to simplify
> comparisons with zero
> 
> Hi!
> 
> On Tue, Mar 17, 2020 at 02:05:19AM +, Yangfei (Felix) wrote:
> > > Trying 7 -> 8:
> > > 7: r99:SI=r103:SI>>0x8
> > >   REG_DEAD r103:SI
> > > 8: r100:SI=r99:SI&0x6
> > >   REG_DEAD r99:SI
> > > Failed to match this instruction:
> > > (set (reg:SI 100)
> > > (and:SI (lshiftrt:SI (reg:SI 103)
> > > (const_int 8 [0x8]))
> > > (const_int 6 [0x6])))
> > >
> > > That should match already, perhaps with a splitter.  aarch64 does
> > > not have very generic rotate-and-mask (or -insert) instructions, so
> > > the
> > > aarch64 backend needs to help combine with the less trivial cases.
> > >
> > > If you have a splitter for *this* one, all else will probably work
> > > "automatically": you split it to two ubfm, and the second of those
> > > can then merge into the compare instruction, and everything works out.
> >
> > Do you mean splitting the above pattern into a combination of ubfx and 
> > ubfiz?
> (Both are aliases of ubfm).
> 
> Sure.  The problem with aarch64's bitfield instruction is that either the
> source or the dest has to be right-aligned, which isn't natural for the
> compiler.
> 
> > I still don't see how the benefit can be achieved.
> > The following is the expected assembly for the test case:
> > tst x0, 1536
> > cset    w0, ne
> > ret
> > This may not happen when the remaining ubfx is there.  Also what
> > instruction would be matched when ubfiz is merged into the compare?
> > Anything I missed?
> 
> The second insn could combine with the compare, and then that can combine
> back further.

Let me paste the RTL input to the combine phase:
/*
(insn 6 3 7 2 (set (reg:SI 98)
(ashiftrt:SI (reg:SI 102)
(const_int 8 [0x8]))) "foo.c":3:16 742 {*aarch64_ashr_sisd_or_int_si3}
 (expr_list:REG_DEAD (reg:SI 102)
(nil)))
(note 7 6 8 2 NOTE_INSN_DELETED)
(insn 8 7 9 2 (set (reg:CC_NZ 66 cc)
(compare:CC_NZ (and:SI (reg:SI 98)
(const_int 6 [0x6]))
(const_int 0 [0]))) "foo.c":5:8 698 {*andsi3nr_compare0}
 (expr_list:REG_DEAD (reg:SI 98)
(nil)))
(note 9 8 14 2 NOTE_INSN_DELETED)
(insn 14 9 15 2 (set (reg/i:SI 0 x0)
(ne:SI (reg:CC_NZ 66 cc)
(const_int 0 [0]))) "foo.c":10:1 494 {aarch64_cstoresi}
 (expr_list:REG_DEAD (reg:CC 66 cc)
(nil)))
*/

Two issues that I can see here:
1. When the ubfiz is combined with the compare, the combined insn does not
necessarily mean an equality comparison with zero.
   This is also the case when all three insns (ubfx & ubfiz & compare) are
combined together.

2. Given that the patterns for ubfx and ubfiz are already not simple, I am
afraid the pattern we get by combining the three would be quite complex.
   And even more complex when further merged with insn 14 here in order to
make sure that we are doing an equality comparison with zero.

So it looks difficult when we go this port-specific way without matching a
"zero_extract".

> Another approach:
> 
> Trying 7 -> 9:
> 7: r99:SI=r103:SI>>0x8
>   REG_DEAD r103:SI
> 9: cc:CC_NZ=cmp(r99:SI&0x6,0)
>   REG_DEAD r99:SI
> Failed to match this instruction:
> (set (reg:CC_NZ 66 cc)
> (compare:CC_NZ (and:SI (lshiftrt:SI (reg:SI 103)
> (const_int 8 [0x8]))
> (const_int 6 [0x6]))
> (const_int 0 [0])))
> 
> This can be recognised as just that "tst" insn, no?  But combine (or
> simplify-rtx) should get rid of the shift here, just the "and" is simpler 
> after all (it
> just needs to change the constant for that).

No, this does not mean an equality comparison with zero.  I have mentioned this 
in my previous mail.  

Thanks,
Felix


Re: [PATCH] avoid treating more incompatible redeclarations as builtin-ins [PR94040]

2020-03-18 Thread Martin Sebor via Gcc-patches

On 3/18/20 1:04 PM, Jakub Jelinek wrote:

On Wed, Mar 18, 2020 at 12:57:18PM -0600, Martin Sebor via Gcc-patches wrote:

I noticed this last night:

   https://sourceware.org/pipermail/glibc-cvs/2020q1/069150.html

Presumably that's the fix.


Or maybe for REAL_TYPE just care here about TYPE_MODE which should be all
that matters?  If double and long double are the same, it isn't a big deal.
And similarly for INTEGER_TYPEs only care about TYPE_MODE/TYPE_PRECISION?
If unsigned long and unsigned long long are the same, why should we care?


There are a few reasons why diagnosing incompatible declarations
(of built-ins or any other kind) is helpful even for same size
types.

First, -Wbuiltin-declaration-mismatch is documented to "warn if
a built-in function is declared with an incompatible signature."
Distinct types like unsigned long and long long are incompatible
and redeclaring functions with incompatible argument types (or
return types) makes their signatures incompatible, and the code
undefined.  Detecting incompatibilities is the purpose of
the warning, irrespective of whether or not a subset of them
might be considered "big deal" in some situations[*].

Second, the TYPE_MODE test isn't sufficient to discriminate between
signed and unsigned types with the same precision, and those can
cause subtle bugs to go undetected.

Third, built-in type mismatches tend to cause us headaches (such
as ICEs) in parts of the compiler (the middle-end or other parts
of the front ends) that are updated with the assumption of type
compatibility in the C/C++ sense.  They are an unnecessary gotcha
to keep in mind that's easy to forget about or not get exactly
right and that leads to wasted resources: users or testers
reporting them as bugs and developers fixing them at the end
of each release.

The overly loose matching based on TYPE_MODE was in place before
GCC 8.  Since then, we have been tightening up these checks.  It
would be a step backward to change direction and start encouraging
sloppy code, glossing over latent bugs, and exposing ourselves to
more reports of ICEs.  Specialized projects like Glibc that have
a legitimate need for declaring symbols with incompatible types
have the option to disable either the warnings or the built-ins
themselves.  Nothing indicates that this practice is commonplace
or that GCC is too strict (it is still more permissive than
the same option in Clang that predates GCC's by a number of
years.)

Martin

[*] As a precedent for warning on similarly "benign" mismatches
consider issuing -Wformat when passing same size arguments to
directives like %i, %li, and %lli, or (for issuing "portability"
warnings for code that's fully valid and safe for the current
target), -Wchar-subscripts when -fno-signed-char is in effect.


Re: [PATCH v2] generate EH info for volatile asm statements (PR93981)

2020-03-18 Thread Segher Boessenkool
On Tue, Mar 17, 2020 at 03:32:34PM +, Michael Matz wrote:
> On Mon, 16 Mar 2020, Richard Sandiford wrote:
> > Similarly for non-call exceptions on other statements.  It sounds like 
> > what you're describing requires the corresponding definition to happen 
> > for memory outputs regardless of whether the asm throws or not, so that 
> > the memory appears to change on both excecution paths.  Otherwise, the 
> > compiler would be able to assume that the memory operand still has its 
> > original value in the exception handler.
> 
> Well, it's both: on the exception path the compiler has to assume that the 
> the value wasn't changed (so that former defines are regarded as dead) or 
> that it already has changed (so that the effects the throwing 
> "instruction" had on the result (if any) aren't lost).  The easiest for 
> this is to regard the result place as also being an input.
> 
> (If broadened to all instructions under -fnon-call-exceptions, and not
> just to asms, this will have quite a bad effect on optimization
> capabilities, but I believe with enough force it's already possible now
> to construct miscompiling testcases with the right mixtures of return
> types and ABIs.)

It's a tradeoff: do we want this to work for almost no one and get PRs
that we cannot solve, or do we generate slightly worse assembler code
for -fnon-call-exceptions?  I don't think this is a difficult decision
to make, considering that you already get pretty bad performance with
that flag (if indeed it works correctly at all).


Segher


Re: [PATCH] RISC-V: Using fmv.x.w/fmv.w.x rather than fmv.x.s/fmv.s.x

2020-03-18 Thread Maciej W. Rozycki via Gcc-patches
On Wed, 18 Mar 2020, Jim Wilson wrote:

> >  The new mnemonics have been supported by GAS for a little while now and
> > the old ones have been retained, however this is still a change that
> > breaks backwards compatibility.  So I wonder if we shouldn't have an
> > autoconf test included for this feature, and either resort to wiring GCC
> > to keep using the old mnemonics or bail out at GCC compilation time if
> > GAS is found not to handle the new ones.
> >
> >  At the very least I think we ought to document the minimum version of
> > binutils now required by GCC for RISC-V support.
> 
> The new opcodes were added to gas on 2017-09-27, and I can't recommend
> using any binutils or gcc release that predates 2018-01-01 because
> they are all known to be buggy, or incompatible with the current ISA
> definition.  So I don't see any need for a configure test for this
> change.  Anyone missing the new instructions in gas has bigger
> problems to worry about.

 Fair enough.

> As for the minimum binutils version, I would strongly recommend the
> most recent one released before the gcc release that you are using,
> though it is likely that anything back to 2018-01-01 would work, just
> not as well.

 For me it's not an issue as I actively work on the toolchain and keep all 
checkouts close to the current tips of the respective master branches.  
However binary package maintainers or end users of the toolchain need to 
know the dependencies between component versions whether they want to 
build the pieces from sources or combine them from prebuilt packages.

 Our installation instructions state binutils 2.28 as the requirement for 
all the RISC-V targets, however the change for fmv.x.w/fmv.w.x instruction 
support was only added in the binutils 2.30 development cycle.

  Maciej


Re: [PATCH PR94026] combine missed opportunity to simplify comparisons with zero

2020-03-18 Thread Segher Boessenkool
Hi!

On Tue, Mar 17, 2020 at 02:05:19AM +, Yangfei (Felix) wrote:
> > Trying 7 -> 8:
> > 7: r99:SI=r103:SI>>0x8
> >   REG_DEAD r103:SI
> > 8: r100:SI=r99:SI&0x6
> >   REG_DEAD r99:SI
> > Failed to match this instruction:
> > (set (reg:SI 100)
> > (and:SI (lshiftrt:SI (reg:SI 103)
> > (const_int 8 [0x8]))
> > (const_int 6 [0x6])))
> > 
> > That should match already, perhaps with a splitter.  aarch64 does not have
> > very generic rotate-and-mask (or -insert) instructions, so the
> > aarch64 backend needs to help combine with the less trivial cases.
> > 
> > If you have a splitter for *this* one, all else will probably work
> > "automatically": you split it to two ubfm, and the second of those can then
> > merge into the compare instruction, and everything works out.
> 
> Do you mean splitting the above pattern into a combination of ubfx and ubfiz? 
>  (Both are aliases of ubfm).  

Sure.  The problem with aarch64's bitfield instruction is that either the
source or the dest has to be right-aligned, which isn't natural for the
compiler.

> I still don't see how the benefit can be achieved.  
> The following is the expected assembly for the test case:  
> tst x0, 1536
> cset    w0, ne
> ret
> This may not happen when the remaining ubfx is there.  Also what instruction 
> would be matched when ubfiz is merged into the compare?  
> Anything I missed?  

The second insn could combine with the compare, and then that can combine
back further.

Another approach:

Trying 7 -> 9:
7: r99:SI=r103:SI>>0x8
  REG_DEAD r103:SI
9: cc:CC_NZ=cmp(r99:SI&0x6,0)
  REG_DEAD r99:SI
Failed to match this instruction:
(set (reg:CC_NZ 66 cc)
(compare:CC_NZ (and:SI (lshiftrt:SI (reg:SI 103)
(const_int 8 [0x8]))
(const_int 6 [0x6]))
(const_int 0 [0])))

This can be recognised as just that "tst" insn, no?  But combine (or
simplify-rtx) should get rid of the shift here, just the "and" is
simpler after all (it just needs to change the constant for that).

> Also it's interesting to see how this may affect on those archs.  

Yes.  Which is why "canonicalisation" rules that are really just "this
works better on targets A and B" do not usually work well.  Rules that
are predictable and that actually simplify the code might still need
all targets to update (and target maintainers will grumble, myself
included), but at least that is a way forwards (and not backwards or
sideways).


Segher


[committed] libstdc++: Fix is_trivially_constructible (PR 94033)

2020-03-18 Thread Jonathan Wakely via Gcc-patches
This attempts to make is_nothrow_constructible more robust (and
efficient to compile) by not depending on is_constructible. Instead the
__is_constructible intrinsic is used directly. The helper class
__is_nt_constructible_impl which checks whether the construction is
non-throwing now takes a bool template parameter that is substituted by
the result of the intrinsic. This fixes the reported bug by not using
the already-instantiated (and incorrect) value of std::is_constructible.
I don't think it really fixes the problem in general, because
std::is_nothrow_constructible itself could already have been
instantiated in a context where it gives the wrong result. A proper fix
needs to be done in the compiler.

PR libstdc++/94033
* include/std/type_traits (__is_nt_default_constructible_atom): Remove.
(__is_nt_default_constructible_impl): Remove.
(__is_nothrow_default_constructible_impl): Remove.
(__is_nt_constructible_impl): Add bool template parameter. Adjust
partial specializations.
(__is_nothrow_constructible_impl): Replace class template with alias
template.
(is_nothrow_default_constructible): Derive from alias template
__is_nothrow_constructible_impl instead of
__is_nothrow_default_constructible_impl.
* testsuite/20_util/is_nothrow_constructible/94003.cc: New test.

Tested powerpc64le-linux, committed to master.

commit b3341826531e80e02f194460b4fbe1b0541c0463
Author: Jonathan Wakely 
Date:   Wed Mar 18 23:19:12 2020 +

libstdc++: Fix is_trivially_constructible (PR 94033)

This attempts to make is_nothrow_constructible more robust (and
efficient to compile) by not depending on is_constructible. Instead the
__is_constructible intrinsic is used directly. The helper class
__is_nt_constructible_impl which checks whether the construction is
non-throwing now takes a bool template parameter that is substituted by
the result of the intrinsic. This fixes the reported bug by not using
the already-instantiated (and incorrect) value of std::is_constructible.
I don't think it really fixes the problem in general, because
std::is_nothrow_constructible itself could already have been
instantiated in a context where it gives the wrong result. A proper fix
needs to be done in the compiler.

PR libstdc++/94033
* include/std/type_traits (__is_nt_default_constructible_atom): 
Remove.
(__is_nt_default_constructible_impl): Remove.
(__is_nothrow_default_constructible_impl): Remove.
(__is_nt_constructible_impl): Add bool template parameter. Adjust
partial specializations.
(__is_nothrow_constructible_impl): Replace class template with alias
template.
(is_nothrow_default_constructible): Derive from alias template
__is_nothrow_constructible_impl instead of
__is_nothrow_default_constructible_impl.
* testsuite/20_util/is_nothrow_constructible/94003.cc: New test.

diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 14aa2b37a4f..68abf148a38 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -961,61 +961,35 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
"template argument must be a complete class or an unbounded array");
 };
 
-  template
-struct __is_nt_default_constructible_atom
-: public integral_constant
+  template
+struct __is_nt_constructible_impl
+: public false_type
 { };
 
-  template::value>
-struct __is_nt_default_constructible_impl;
-
-  template
-struct __is_nt_default_constructible_impl<_Tp, true>
-: public __and_<__is_array_known_bounds<_Tp>,
-   __is_nt_default_constructible_atom::type>>
-{ };
-
-  template
-struct __is_nt_default_constructible_impl<_Tp, false>
-: public __is_nt_default_constructible_atom<_Tp>
-{ };
-
-  template
-using __is_nothrow_default_constructible_impl
-  = __and_<__is_constructible_impl<_Tp>,
-  __is_nt_default_constructible_impl<_Tp>>;
-
-  /// is_nothrow_default_constructible
-  template
-struct is_nothrow_default_constructible
-: public __is_nothrow_default_constructible_impl<_Tp>::type
-{
-  static_assert(std::__is_complete_or_unbounded(__type_identity<_Tp>{}),
-   "template argument must be a complete class or an unbounded array");
-};
-
   template
-struct __is_nt_constructible_impl
-: public integral_constant()...))>
+struct __is_nt_constructible_impl
+: public __bool_constant()...))>
 { };
 
   template
-struct __is_nt_constructible_impl<_Tp, _Arg>
-: public integral_constant(declval<_Arg>()))>
+struct __is_nt_constructible_impl
+: public __bool_constant(std::declval<_Arg>()))>
 { };
 
   template
-struct __is_nt_constructible_impl<_Tp>
-: public 

Re: Re: Re: [PATCH, rs6000] Add command line and builtin compatibility

2020-03-18 Thread Carl Love via Gcc-patches
Segher:

> 
> Yes, but only for this fprnd vs. 2.06 (vsx) situation.  Like we
> already
> have:
> 
>   if (TARGET_DIRECT_MOVE && !TARGET_VSX)
> {
>   if (rs6000_isa_flags_explicit & OPTION_MASK_DIRECT_MOVE)
> error ("%qs requires %qs", "-mdirect-move", "-mvsx");
>   rs6000_isa_flags &= ~OPTION_MASK_DIRECT_MOVE;
> }
> 
> (and many other cases there), we could do this there as well (so,
> don't
> allow -mvsx (maybe via a -mcpu= etc.) at the same time as -mno-
> fprnd).
> 

I redid the patch to try and make it more general.  It looks to me like
TARGET_VSX is set for Power 7 and newer.  I set up a test similar to the
example checking TARGET_VSX.  So if you are on a Power 7 then -mvsx is
set for you, i.e. the user would not have to explicitly use the option.
My objection to the error message in the example is that the user
wouldn't necessarily know what processor or ISA is implied by -mvsx.
So in my error message I called out the processor number.  We could do
it based on ISA.  I figure the user is more likely to know the
processor version than the ISA level supported by the processor, so I
went with the processor number in the patch.  Thoughts?

gcc -mno-fprnd -g -mcpu=power7 -c vsx-builtin-3.c
cc1: error: ‘-mno-fprnd’ not compatible with Power 7 and newer

 Carl Love


---
From 212d2521437e7c32801b851bf9e23a9a12418de0 Mon Sep 17 00:00:00 2001
From: Carl Love 
Date: Wed, 11 Mar 2020 14:33:31 -0500
Subject: [PATCH] rs6000: Add command line and builtin compatibility check

PR target/87583

gcc/ChangeLog

2020-03-18  Carl Love  

* config/rs6000/rs6000.c (rs6000_option_override_internal):
Add check for TARGET_FPRND for Power 7 or newer.
---
 gcc/config/rs6000/rs6000.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index ac9455e3b7c..5c72a863dbf 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -3716,6 +3716,14 @@ rs6000_option_override_internal (bool global_init_p)
   rs6000_isa_flags &= ~OPTION_MASK_CRYPTO;
 }
 
+  if (!TARGET_FPRND && TARGET_VSX)
+    {
+      if (rs6000_isa_flags_explicit & OPTION_MASK_FPRND)
+	/* TARGET_VSX = 1 implies Power 7 and newer.  */
+	error ("%qs not compatible with Power 7 and newer", "-mno-fprnd");
+      rs6000_isa_flags &= ~OPTION_MASK_FPRND;
+    }
+
   if (TARGET_DIRECT_MOVE && !TARGET_VSX)
 {
   if (rs6000_isa_flags_explicit & OPTION_MASK_DIRECT_MOVE)
-- 
2.17.1




Re: [PATCH] RISC-V: Using fmv.x.w/fmv.w.x rather than fmv.x.s/fmv.s.x

2020-03-18 Thread Jim Wilson
On Tue, Mar 17, 2020 at 2:42 PM Maciej W. Rozycki  wrote:
> On Tue, 18 Feb 2020, Kito Cheng wrote:
> >  - fmv.x.s/fmv.s.x renamed to fmv.x.w/fmv.w.x in the latest RISC-V ISA
> >manual.
>
>  The new mnemonics have been supported by GAS for a little while now and
> the old ones have been retained, however this is still a change that
> breaks backwards compatibility.  So I wonder if we shouldn't have an
> autoconf test included for this feature, and either resort to wiring GCC
> to keep using the old mnemonics or bail out at GCC compilation time if
> GAS is found not to handle the new ones.
>
>  At the very least I think we ought to document the minimum version of
> binutils now required by GCC for RISC-V support.

The new opcodes were added to gas on 2017-09-27, and I can't recommend
using any binutils or gcc release that predates 2018-01-01 because
they are all known to be buggy, or incompatible with the current ISA
definition.  So I don't see any need for a configure test for this
change.  Anyone missing the new instructions in gas has bigger
problems to worry about.

Speaking of which, the ISA is unfortunately still making the
occasional backwards incompatible change, though I and others keep
complaining about that.  There was a break between the privilege spec
1.9 and 1.9.1, and there was a break between the priv spec 1.9.1 and
1.11.  Though I'm told that the goal is no breaks from priv spec 1.10
forward, and that was released 2017-05-17 so we can't properly support
any priv spec predating that.  Fortunately the priv spec only affects
people doing OS level work, or bare metal work.  But there have been
some unpriv isa spec breaks too though not in any critical areas.  For
instance some instructions like fence.i and the csr* insns have been
moved out of the base ISA into extensions and we haven't decided how
to handle that yet.  binutils and gcc still think they are part of the
base ISA.  The syntax for specifying architecture extensions changed,
sx extensions were dropped, and it used to be that x came before s but
now s comes before x.  We decided to drop support for ISA strings
supported before the 2019-12-13 unpriv spec because there were no
known uses of s or sx extensions that would be affected by the change,
and it was too complicated trying to support both the old and new
syntax.  I realize that you would like perfect compatibility, but that
won't be possible until the RISC-V ecosystem is more mature.  At least
for the linux support, we are being very careful not to change
anything that would break linux.  That is just for rv64 linux though.
rv32 linux is not upstream yet, and still adding breaking changes
because of Y2038 work.  There was a very minor ABI change last year
that affects rv64 linux, but it was obscure enough that no one testing
gcc-10 seems to have been affected by it.  There are also no official
distro releases that we need backward compatibility with yet.

As for the minimum binutils version, I would strongly recommend the
most recent one released before the gcc release that you are using,
though it is likely that anything back to 2018-01-01 would work, just
not as well.

Jim


Re: [stage1][PATCH] optgen: make more sanity checks for enums.

2020-03-18 Thread Joseph Myers
On Wed, 18 Mar 2020, Martin Liška wrote:

> On 3/17/20 11:41 PM, Martin Sebor wrote:
> > The script reports errors by emitting them as #error directives into
> > standard output (so they cause the build to fail). Should this new
> > routine do the same thing?  (/dev/stderr is also not available on all
> > flavors of UNIX but I'm not sure how much that matters here.)
> 
> Good point Martin. Yes, #error emission works fine here:
> 
> ./options.h:1:2: error: #error Empty option argument 'Enum' during parsing of:
> Enum (diagnostic_prefixing_rule) String(once)
> Value(DIAGNOSTICS_SHOW_PREFIX_ONCE)
> 1 | #error Empty option argument 'Enum' during parsing of: Enum
> (diagnostic_prefixing_rule) String(once) Value(DIAGNOSTICS_SHOW_PREFIX_ONCE)
>   |  ^
> 
> There's an updated version of the patch.

This version is also OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH v2][gcc] libgccjit: handle long literals in playback::context::new_string_literal

2020-03-18 Thread Andrea Corallo
Andrea Corallo  writes:

> Hi all,
>
> second version of the patch for the 200-character limit for literal
> strings, addressing comments.
>
> make check-jit is passing clean.
>
> Best Regards
>   Andrea
>
> gcc/jit/ChangeLog
> 2020-??-??  Andrea Corallo  
>
>   * jit-playback.h
>   (gcc::jit::playback::context m_recording_ctxt): Remove
>   m_char_array_type_node field.
>   * jit-playback.c
>   (playback::context::context) Remove m_char_array_type_node from member
>   initializer list.
>   (playback::context::new_string_literal) Fix logic to handle string
>   length > 200.
>
> gcc/testsuite/ChangeLog
> 2020-??-??  Andrea Corallo  
>
>   * jit.dg/all-non-failing-tests.h: Add test-long-string-literal.c.
>   * jit.dg/test-long-string-literal.c: New testcase.

Kind ping, okay for trunk?


Re: [PATCH V3][gcc] libgccjit: introduce version entry points

2020-03-18 Thread Andrea Corallo
Hi all,

Updated version of the patch mainly addressing comments on the
concurrency issues.

I came to the conclusion that the caching should be done in the
function that we decide to be thread safe.  However I haven't touched
parse_basever in any direction in the hope of having this still in
stage4.  As a result I've mostly applied the mutex solution.

'make check-jit' runs clean

Bests

  Andrea

gcc/jit/ChangeLog
2020-??-??  Andrea Corallo  
David Malcolm  

* docs/topics/compatibility.rst (LIBGCCJIT_ABI_13): New ABI tag
plus add version paragraph.
* libgccjit++.h (namespace gccjit::version): Add new namespace.
* libgccjit.c (gcc_jit_version_major, gcc_jit_version_minor)
(gcc_jit_version_patchlevel): New functions.
* libgccjit.h (LIBGCCJIT_HAVE_gcc_jit_version): New macro.
(gcc_jit_version_major, gcc_jit_version_minor)
(gcc_jit_version_patchlevel): New functions.
* libgccjit.map (LIBGCCJIT_ABI_13) New ABI tag.

gcc/testsuite/ChangeLog
2020-??-??  Andrea Corallo  

* jit.dg/test-version.c: New testcase.
* jit.dg/all-non-failing-tests.h: Add test-version.c.
From ad86baf8472c6684aed9b62652922a83e147952a Mon Sep 17 00:00:00 2001
From: AndreaCorallo 
Date: Sun, 8 Mar 2020 13:46:33 +
Subject: [PATCH] Add new version entry point

---
 gcc/jit/docs/topics/compatibility.rst| 33 ++
 gcc/jit/libgccjit++.h| 22 ++
 gcc/jit/libgccjit.c  | 46 
 gcc/jit/libgccjit.h  | 16 +++
 gcc/jit/libgccjit.map|  9 +++-
 gcc/testsuite/jit.dg/all-non-failing-tests.h |  7 +++
 gcc/testsuite/jit.dg/test-version.c  | 26 +++
 7 files changed, 158 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/jit.dg/test-version.c

diff --git a/gcc/jit/docs/topics/compatibility.rst b/gcc/jit/docs/topics/compatibility.rst
index a6faee0810e..0c0ce070d72 100644
--- a/gcc/jit/docs/topics/compatibility.rst
+++ b/gcc/jit/docs/topics/compatibility.rst
@@ -61,6 +61,28 @@ You can see the symbol tags provided by libgccjit.so using ``objdump``:
LIBGCCJIT_ABI_0
[...snip...]
 
+Programmatically checking version
+*********************************
+
+Client code can programmatically check libgccjit version using:
+
+.. function::  int gcc_jit_version_major (void)
+
+   Return libgccjit major version.  This is analogous to __GNUC__ in C code.
+
+.. function::  int gcc_jit_version_minor (void)
+
+   Return libgccjit minor version.  This is analogous to
+   __GNUC_MINOR__ in C code.
+
+.. function::  int gcc_jit_version_patchlevel (void)
+
+   Return libgccjit patchlevel version.  This is analogous to
+   __GNUC_PATCHLEVEL__ in C code.
+
+.. note:: These entry points have been added with ``LIBGCCJIT_ABI_13``
+  (see below).
+
 ABI symbol tags
 ***************
 
@@ -182,3 +204,14 @@ entrypoints:
 
 ``LIBGCCJIT_ABI_12`` covers the addition of
 :func:`gcc_jit_context_new_bitfield`
+
+``LIBGCCJIT_ABI_13``
+--------------------
+``LIBGCCJIT_ABI_13`` covers the addition of version functions via API
+entrypoints:
+
+  * :func:`gcc_jit_version_major`
+
+  * :func:`gcc_jit_version_minor`
+
+  * :func:`gcc_jit_version_patchlevel`
diff --git a/gcc/jit/libgccjit++.h b/gcc/jit/libgccjit++.h
index 82a62d614c5..69e67766640 100644
--- a/gcc/jit/libgccjit++.h
+++ b/gcc/jit/libgccjit++.h
@@ -49,6 +49,8 @@ namespace gccjit
   class timer;
   class auto_time;
 
+  namespace version {};
+
   /* Errors within the API become C++ exceptions of this class.  */
   class error
   {
@@ -1913,6 +1915,26 @@ auto_time::~auto_time ()
   m_timer.pop (m_item_name);
 }
 
+namespace version
+{
+inline int
+major_v ()
+{
+  return gcc_jit_version_major ();
+}
+
+inline int
+minor_v ()
+{
+  return gcc_jit_version_minor ();
+}
+
+inline int
+patchlevel_v ()
+{
+  return gcc_jit_version_patchlevel ();
+}
+} // namespace version
 } // namespace gccjit
 
 #endif /* #ifndef LIBGCCJIT_PLUS_PLUS_H */
diff --git a/gcc/jit/libgccjit.h b/gcc/jit/libgccjit.h
index 21a0dc09b03..1c5a12e9c01 100644
--- a/gcc/jit/libgccjit.h
+++ b/gcc/jit/libgccjit.h
@@ -1487,6 +1487,22 @@ gcc_jit_context_new_rvalue_from_vector (gcc_jit_context *ctxt,
 	size_t num_elements,
 	gcc_jit_rvalue **elements);
 
+#define LIBGCCJIT_HAVE_gcc_jit_version
+
+/* Functions to retrieve libgccjit version.
+   Analogous to __GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__ in C code.
+
+   These API entrypoints were added in LIBGCCJIT_ABI_13; you can test for their
+   presence using
+ #ifdef LIBGCCJIT_HAVE_gcc_jit_version
+ */
+extern int
+gcc_jit_version_major (void);
+extern int
+gcc_jit_version_minor (void);
+extern int
+gcc_jit_version_patchlevel (void);
+
 #ifdef __cplusplus
 }
 #endif /* __cplusplus */
diff --git a/gcc/jit/libgccjit.c b/gcc/jit/libgccjit.c
index 83055fc297b..a29e9885e59 100644
--- a/gcc/jit/libgccjit.c
+++ 

Re: [PATCH] c++: Fix constexpr evaluation of self-modifying CONSTRUCTORs [PR94066]

2020-03-18 Thread Patrick Palka via Gcc-patches
On Wed, 18 Mar 2020, Jason Merrill wrote:

> On 3/18/20 11:58 AM, Patrick Palka wrote:
> > On Wed, 18 Mar 2020, Patrick Palka wrote:
> > 
> > > On Tue, 17 Mar 2020, Jason Merrill wrote:
> > > 
> > > > On 3/16/20 1:39 PM, Patrick Palka wrote:
> > > > > In this PR, we are performing constexpr evaluation of a CONSTRUCTOR of
> > > > > type
> > > > > union U which looks like
> > > > > 
> > > > > {.a=foo (&)}.
> > > > > 
> > > > > Since the function foo takes a reference to the CONSTRUCTOR we're
> > > > > building,
> > > > > it
> > > > > could potentially modify the CONSTRUCTOR from under us.  In particular
> > > > > since
> > > > > U
> > > > > is a union, the evaluation of a's initializer could change the active
> > > > > member
> > > > > from a to another member -- something which cxx_eval_bare_aggregate
> > > > > doesn't
> > > > > expect to happen.
> > > > > 
> > > > > Upon further investigation, it turns out this issue is not limited to
> > > > > constructors of UNION_TYPE and not limited to cxx_eval_bare_aggregate
> > > > > either.
> > > > > For example, within cxx_eval_store_expression we may be evaluating an
> > > > > assignment
> > > > > such as (this comes from the test pr94066-2.C):
> > > > > 
> > > > > ((union U *) this)->a = TARGET_EXPR ;
> > > > 
> > > > I assume this is actually an INIT_EXPR, or we would have preevaluated
> > > > and not
> > > > had this problem.
> > > 
> > > Yes exactly, I should have specified that the above is an INIT_EXPR and
> > > not a MODIFY_EXPR.
> > > 
> > > > 
> > > > > where evaluation of foo could change the active member of *this, which
> > > > > was
> > > > > set
> > > > > earlier in cxx_eval_store_expression to 'a'.  And if U is a
> > > > > RECORD_TYPE,
> > > > > then
> > > > > evaluation of foo could add new fields to *this, thereby making stale
> > > > > the
> > > > > 'valp'
> > > > > pointer to the target constructor_elt through which we're later
> > > > > assigning.
> > > > > 
> > > > > So in short, it seems that both cxx_eval_bare_aggregate and
> > > > > cxx_eval_store_expression do not anticipate that a constructor_elt's
> > > > > initializer
> > > > > could modify the underlying CONSTRUCTOR as a side-effect.
> > > > 
> > > > Oof.  If this is well-formed, it's because initialization of a doesn't
> > > > actually start until the return statement of foo, so we're probably
> > > > wrong to
> > > > create a CONSTRUCTOR to hold the value of 'a' before evaluating foo.
> > > > Perhaps
> > > > init_subob_ctx shouldn't preemptively create a CONSTRUCTOR, and
> > > > similarly for
> > > > the cxx_eval_store_expression !preeval code.
> > > 
> > > Hmm, I think I see what you mean.  I'll look into this.
> > 
> > In cpp0x/constexpr-array12.C we have
> > 
> >  struct A { int ar[3]; };
> >  constexpr A a1 = { 0, a1.ar[0] };
> > 
> > the initializer for a1 is a CONSTRUCTOR with the form
> > 
> >  {.ar={0, (int) VIEW_CONVERT_EXPR(a1).ar[0]}}
> > 
> > If we don't preemptively create a CONSTRUCTOR in cxx_eval_bare_aggregate
> > to hold the value of 'ar' before evaluating its initializer, then we
> > won't be able to resolve the 'a1.ar[0]' later on, and we will reject
> > this otherwise valid test case with an "accessing an uninitialized array
> > element" diagnostic.  So it seems we need to continue creating a
> > CONSTRUCTOR in cxx_eval_bare_aggregate before evaluating the initializer
> > of an aggregate sub-object to handle self-referential CONSTRUCTORs like
> > the one above.
> > 
> > Then again, clang is going with rejecting the original testcase with the
> > following justification: https://bugs.llvm.org/show_bug.cgi?id=45133#c1
> > Should we follow suit?
> 
> Yes, let's.  There's no need to bend over backward to allow this kind of
> pathological testcase.  Hubert is proposing that the initialization of u.a
> starts with the call to foo, and the testcase has undefined behavior because
> it ends the lifetime of u.a in the middle of its initialization.

Got it.  I filed PR c++/94219 to track the testcase that uses a struct
in place of the union, on which we also ICE but this ICE is not a
regression from what I can tell.  For GCC 10 I'll work on a minimal
patch to reject the original testcase from PR c++/94066.



[PING][PATCH] avoid -Wredundant-tags on a first declaration in use (PR 93824)

2020-03-18 Thread Martin Sebor via Gcc-patches

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-March/541962.html

On 3/12/20 4:38 PM, Martin Sebor wrote:

On 3/12/20 11:03 AM, Martin Sebor wrote:

On 3/11/20 3:30 PM, Martin Sebor wrote:

On 3/11/20 2:10 PM, Jason Merrill wrote:

On 3/11/20 12:57 PM, Martin Sebor wrote:

On 3/9/20 6:08 PM, Jason Merrill wrote:

On 3/9/20 5:39 PM, Martin Sebor wrote:

On 3/9/20 1:40 PM, Jason Merrill wrote:

On 3/9/20 12:31 PM, Martin Sebor wrote:

On 2/28/20 1:24 PM, Jason Merrill wrote:

On 2/28/20 12:45 PM, Martin Sebor wrote:

On 2/28/20 9:58 AM, Jason Merrill wrote:

On 2/24/20 6:58 PM, Martin Sebor wrote:
-Wredundant-tags doesn't consider type declarations that are also
the first uses of the type, such as in 'void f (struct S);', and
issues false positives for those.  According to the reporter that's
making it harder to use the warning to clean up LibreOffice.

The attached patch extends -Wredundant-tags to avoid these false
positives by relying on the same class_decl_loc_t::class2loc mapping
as -Wmismatched-tags.  The patch also somewhat improves the detection
of both issues in template declarations (though more work is still
needed there).


+	 a new entry for it and return unless it's a declaration
+	 involving a template that may need to be diagnosed by
+	 -Wredundant-tags.  */
   *rdl = class_decl_loc_t (class_key, false, def_p);
-  return;
+  if (TREE_CODE (decl) != TEMPLATE_DECL)
+    return;


How can the first appearance of a class template be redundant?


I'm not sure I correctly understand the question.  The comment says
"involving a template" (i.e., not one of the first declarations of
a template).  The test case that corresponds to this test is:

   template  struct S7 { };
   struct S7 s7v;  // { dg-warning "\\\[-Wredundant-tags" }


where DECL is the TEMPLATE_DECL of S7.

As I mentioned, more work is still needed to handle templates right
because some redundant tags are still not diagnosed.  For example:


   template  struct S7 { };
   template 
   using U = struct S7;   // missing warning


When we get here for an instance of a template, it doesn't 
make sense to treat it as a new type.


If decl is a template and type_decl is an instance of that 
template, do we want to (before the lookup) change type_decl 
to the template or the corresponding generic TYPE_DECL, which 
should already be in the table?


I'm struggling with how to do this.  Given type (a RECORD_TYPE) and
type_decl (a TEMPLATE_DECL) representing the use of a template, how
do I get the corresponding template (or its explicit or partial
specialization) in the three cases below?

   1) Instance of the primary:
  template  class A;
  struct A a;

   2) Instance of an explicit specialization:
  template  class B;
  template <> struct B;
  class B b;

   3) Instance of a partial specialization:
  template  class C;
  template  struct C;
  class C c;

By trial and (lots of) error I figured out that in both (1) and (2),
but not in (3), TYPE_MAIN_DECL (TYPE_TI_TEMPLATE (type)) returns
the template's type_decl.

Is there some function to call to get it in (3), or even better,
in all three cases?


I think you're looking for most_general_template.


I don't think that's quite what I'm looking for.  At least it doesn't
return the template or its specialization in all three cases above.


Ah, true, that function stops at specializations.  Oddly, I don't 
think there's currently a similar function that looks through 
them. You could create one that does a simple loop through 
DECL_TI_TEMPLATE like is_specialization_of.


Thanks for the tip.  Even with that I'm having trouble with partial
specializations.  For example in:

   template    struct S;
   template  class S;
   extern class  S s1;
   extern struct S s2;  // expect -Wmismatched-tags

how do I find the declaration of the partial specialization when given
the type in the extern declaration?  A loop in my find_template_for()
function (similar to is_specialization_of) only visits the implicit
specialization S (i.e., its own type) and the primary.


Is that a problem?  The name is from the primary template, so does 
it matter for this warning whether there's an explicit 
specialization involved?


I don't understand the question.  S is an instance of
the partial specialization.  To diagnose the right mismatch the warning
needs to know how to find the template (i.e., either the primary, or
the explicit or partial specialization) the instance corresponds to and
the class-key it was declared with.  As it is, while GCC does diagnose
the right declaration (that of s2), it does that thanks to a bug:
because it finds and uses the type and class-key used to declare s1.
If we get rid of s1 it doesn't diagnose anything.

I tried using DECL_TEMPLATE_SPECIALIZATIONS() to get the list of
the partial specializations but it doesn't like any of the arguments
I've given it (it ICEs).


With this fixed, here's the 

[committed] [PR rtl/90275] Complete recent change

2020-03-18 Thread Jeff Law via Gcc-patches
So I must have botched this when I hand-applied Richard's patch and used that to
generate a new one (I went back to his original and verified he got it right).

We had a test like 

  && MEM_P (whatever) 

And wanted to include regs, i.e.

  && (MEM_P (whatever) || REG_P (whatever))

I added the latter, but didn't remove the former.  As a result the
testcase still failed.

This applies the obvious bit to remove the && MEM_P (whatever) line.

You could legitimately ask why the tester didn't flag the failure.  The tester
only looks for regressions.  A new test that fails is ignored.  I'd like to
change that one day, but for now that's where we are to avoid excessive noise.

Anyway, I put the attached patch into my tester last week.  And:

http://gcc.gnu.org/jenkins/job/arm-linux-gnueabi/962/console

Tests that now work, but didn't before (1 tests):

gcc.c-torture/compile/pr90275.c   -O3 -g  (test for excess errors)

Committing to the trunk as obvious.

Jeff
commit 529ea7d9596b26ba103578eeab448e9862a2d2c5
Author: Jeff Law 
Date:   Wed Mar 18 16:07:28 2020 -0600

Complete change to resolve pr90275.

PR rtl-optimization/90275
* cse.c (cse_insn): Delete no-op register moves too.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 8694f272a9c..3a2e491113e 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2020-03-12  Richard Sandiford  
+
+   PR rtl-optimization/90275
+   * cse.c (cse_insn): Delete no-op register moves too.
+
 2020-03-18  Martin Sebor  
 
PR ipa/92799
diff --git a/gcc/cse.c b/gcc/cse.c
index 08984c17040..3e8724b3fed 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -5329,7 +5329,6 @@ cse_insn (rtx_insn *insn)
  else if (n_sets == 1
   && !CALL_P (insn)
   && (MEM_P (trial) || REG_P (trial))
-  && MEM_P (dest)
   && rtx_equal_p (trial, dest)
   && !side_effects_p (dest)
   && (cfun->can_delete_dead_exceptions


Re: [PATCH] avoid modifying type in attribute access handler [PR94098]

2020-03-18 Thread Martin Sebor via Gcc-patches

On 3/16/20 3:10 PM, Jason Merrill wrote:

On 3/16/20 4:41 PM, Martin Sebor wrote:

The recent fix to avoid modifying in place the type argument in
handle_access_attribute (PR 92721) was incomplete and didn't fully
resolve the problem (an ICE in the C++ front-end).  The attached
patch removes the remaining modification that causes the ICE.  In
addition, the change adjusts checking calls to functions declared
with the attribute to scan all its instances.

The attached patch was tested on x86_64-linux.

I'm puzzled that the ICE only triggers in the C++ front-end and not
also in C.  The code that issues the internal error is in comptypes()
in cp/typeck.c and has this comment:

 /* The two types are structurally equivalent, but their
    canonical types were different. This is a failure of the
    canonical type propagation code.*/
 internal_error
   ("canonical types differ for identical types %qT and %qT",
    t1, t2);

What is "the canonical type propagation code" it refers to?


Generally, code that makes sure that TYPE_CANONICAL equality is 
equivalent to type identity, not any one specific place.



Is it valid to modify the type in an attribute handler


Only if (flags & ATTR_FLAG_TYPE_IN_PLACE).


If not, and if modifying a type in place is not valid I'd expect
decl_attributes to enforce it.  I looked for other attribute handlers
to see if any also modify the type argument in place (by adding or
removing attributes) but couldn't really find any.  So if it is
invalid I'd like to add such an assertion (probably for GCC 11) but
before I do I want to make sure I'm not missing something.


Generally messing with _ATTRIBUTES happens in decl_attributes: changing 
it directly if it's a DECL or ATTR_FLAG_TYPE_IN_PLACE, otherwise using 
build_type_attribute_variant.  If you need to do it in your handler, you 
should follow the same pattern.


That's the conclusion I came to as well, but thanks for confirming
it.  With the patch I don't need to make the change, but since it's
not obvious that it's a no-no, and since it's apparently only detected
under very special conditions, what I'm wondering is whether it's
possible to detect it more reliably than only in C++ comptypes.  The
trouble is that I don't know exactly what is allowed and what isn't,
or what to look for to tell whether the handler did something that's
not allowed.

The C++ ICE triggered because the redeclared function's type is
considered the same as the original (structural_comptypes()
returns true) but the declarations' canonical types are different.
structural_comptypes() is C++-specific and I don't know what
alternative to call in the middle-end to compare the types and
get the equivalent result.

Martin

PS As a data point, I found just two attribute handlers in
c-attribs.c that modify a type in place: one for attribute const
and the other noreturn.  They both do it for function pointers
to set the 'const' and 'noreturn' bits on the pointed to types,
and by calling build_type_variant.


Re: [RS6000] make PLT loads volatile

2020-03-18 Thread Segher Boessenkool
Hi!

On Sat, Mar 14, 2020 at 09:30:02AM +1030, Alan Modra wrote:
> On Fri, Mar 13, 2020 at 10:40:38AM -0500, Segher Boessenkool wrote:
> > > Using a call-saved register to cache a load out of the PLT looks
> > > really silly
> > 
> > Who said anything about using call-saved registers?  GCC will usually
> > make a stack slot for this, and only use a non-volatile register when
> > that is profitable.  (I know it is a bit too aggressive with it, but
> > that is a generic problem).
> 
> Using a stack slot comes about due to hoisting then running out of
> call-saved registers in the loop.  Score another reason not to hoist
> PLT loads.

LRA can do this directly, without ever using call-saved registers.  There
are some other passes that can do rematerialisation as well.  Not enough
yet, but GCC does *not* use non-volatile registers to save values, unless
it thinks that is cheaper (which it currently thinks too often, but that
is a separate problem).

> > > when the inline PLT call is turned back into a direct
> > > call by the linker.
> > 
> > Ah, so yeah, for direct calls we do not want this.  I was thinking this
> > was about indirect calls (via a bctrl that is), dunno how I got that
> > misperception.  Sorry.
> > 
> > What is this like for indirect calls (at C level)?  Does your patch do
> > anything to those?
> 
> No effect at all.  To put your mind at rest on this point you can
> verify quite easily by noticing that UNSPECV_PLT* is only generated in
> rs6000_longcall_ref, and calls to that function are conditional on
> GET_CODE (func_desc) == SYMBOL_REF.

Could you please send a new patch (could be the same patch even) that
is easier to review for me?  With things like all of the above info in
the message (describing the setting, the problem, and the solution).

Or should I read the original message another ten times until it clicks?
It certainly is possible this matter just is hard to grasp :-/

Thanks in advance,


Segher


Re: [PATCH] c++: Adjust handling of COMPOUND_EXPRs in cp_build_binary_op [PR91993]

2020-03-18 Thread Jason Merrill via Gcc-patches

On 3/5/20 2:51 AM, Jakub Jelinek wrote:

Hi!

As the testcases shows, the -Wconversion warning behaves quite differently
when -fsanitize=undefined vs. when not sanitizing, but in the end it is
not something specific to sanitizing, if a user uses
   return static_cast(static_cast((d++, a) << 1U) | b) | c;
instead of
   return static_cast(static_cast(a << 1U) | b) | c;
and thus there is some COMPOUND_EXPR involved, cp_build_binary_op behaves
significantly different, e.g. shorten_binary_op will have different result
(uc for the case without COMPOUND_EXPR, int with it), but it isn't limited
to that.

The following patch attempts to handle those the same, basically ignoring
everything but the ultimately last operand of COMPOUND_EXPR(s) and treating
the other COMPOUND_EXPR(s) operand(s) just as side-effects that need to be
evaluated first.


How about improving get_narrower to handle COMPOUND_EXPR?  I'd think 
that would do the trick without affecting evaluation order.


Jason



Re: c: ignore initializers for elements of variable-size types [PR93577]

2020-03-18 Thread Jeff Law via Gcc-patches
On Tue, 2020-03-17 at 14:27 +0100, Christophe Lyon wrote:
> 
> > Jeff pointed out off-list that using:
> > 
> >N:  ... /* { dg-error {...} } */
> >  N+1:  /* { dg-error {...} "" { target *-*-* } .-1 } */
> > 
> > led to two identical test names for line N.  This was causing the
> > results to fluctuate when using contrib/compare_tests (which I admit
> > I don't do, so hadn't noticed).  Your patch fixes the cases that
> > mattered, but for future-proofing reasons, this patch adds proper
> > test names for the remaining instances.
> > 
> 
> Thanks.
> 
> Just checked, there are many more testcases with duplicate "names"
> (266 under gcc.target/aarch64 only) :-(
Yea.  It's a fairly common issue and not immediately obvious unless someone is
using the compare_tests script.

I think when this came up with some of Martin Sebor's work we agreed to fault in
fixes to the existing tests, so I think that would apply here as well.

Jeff
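The convention being applied is to give each dg-error directive a distinct test-name string so that two expected diagnostics on the same source line no longer collide in the test summary.  A sketch of the directive shape, with placeholder messages and names:

```c
void f (struct S);       /* { dg-error "parameter" "first diagnostic" } */
/* { dg-warning "anonymous" "second diagnostic" { target *-*-* } .-1 } */
```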
> 



Re: [PATCH] drop weakref attribute on function definitions (PR 92799)

2020-03-18 Thread Martin Sebor via Gcc-patches

On 3/12/20 2:10 PM, Jeff Law wrote:

On Mon, 2020-03-09 at 14:33 -0600, Martin Sebor wrote:

On 3/5/20 5:26 PM, Jeff Law wrote:

On Fri, 2020-02-14 at 15:41 -0700, Martin Sebor wrote:

Because attribute weakref introduces a kind of a definition, it can
only be applied to declarations of symbols that are not defined.  GCC
normally issues a warning when the attribute is applied to a defined
symbol, but PR 92799 shows that it misses some cases on which it then
leads to an ICE.

The ICE was introduced in GCC 4.5.  Prior to then, GCC accepted such
invalid definitions and silently dropped the weakref attribute.

The attached patch avoids the ICE while again dropping the invalid
attribute from the definition, except with the (now) usual warning.

Tested on x86_64-linux.

I also looked for code bases that make use of attribute weakref to
rebuild them as another test but couldn't find any.  (There are
a couple of instances in the Linux kernel but they look #ifdef'd
out).  Does anyone know of any that do use it that I could try to
build on Linux?

So you added this check

... || DECL_INITIAL (decl) != error_mark_node

Do you need to check that DECL_INITIAL is not NULL?  IIUC DECL_INITIAL in
this context is a tri-state.

NULL -- DECL is not a function definition
error_mark_node -- it was a function definition, but the body was free'd
everything else -- the function definition


I've only seen two values come up for a function declared weakref in
the test suite: error_mark_node and something with the TREE_CODE of
BLOCK (the block where the weakref function is used when it's also
explicitly defined in the code, and when the attribute is subsequently
diagnosed by the warning).

So when it's a BLOCK it's giving you the outermost block scope for the function
which we usually use to generate debug info.

The key point is that we use ERROR_MARK_NODE because a non-NULL DECL_INITIAL
denotes an actual function definition, so we can't set it to NULL blindly.  See
cgraph_node::expand, cgraph_node:release_body, rest_of_handle_final and perhaps
other places.  Another example would be setting up the ifunc resolver.  It's a
real function so DECL_INITIAL must be non-NULL, but we don't really have a block
structure we can point to, so instead we set it to error_mark_node
(make_dispatcher_decl).

I wonder if the earlier node->definition check ultimately prevents DECL_INITIAL
from being NULL at this point.  According to cgraph.h that field is true when
the symbol corresponds to a definition in the current unit.  That would seem to
indicate yes.

So I think we can go forward with your patch as-is.


Thanks.  I committed it in r10-7267.  I'll give it a few days and then
backport it to release branches unless there's any concern with it.

Martin


Re: [PATCH] c++: Fix constexpr evaluation of self-modifying CONSTRUCTORs [PR94066]

2020-03-18 Thread Jason Merrill via Gcc-patches

On 3/18/20 11:58 AM, Patrick Palka wrote:

On Wed, 18 Mar 2020, Patrick Palka wrote:


On Tue, 17 Mar 2020, Jason Merrill wrote:


On 3/16/20 1:39 PM, Patrick Palka wrote:

In this PR, we are performing constexpr evaluation of a CONSTRUCTOR of type
union U which looks like

{.a=foo (&)}.

Since the function foo takes a reference to the CONSTRUCTOR we're building,
it could potentially modify the CONSTRUCTOR from under us.  In particular,
since U is a union, the evaluation of a's initializer could change the active
member from a to another member -- something which cxx_eval_bare_aggregate
doesn't expect to happen.

Upon further investigation, it turns out this issue is not limited to
constructors of UNION_TYPE and not limited to cxx_eval_bare_aggregate either.
For example, within cxx_eval_store_expression we may be evaluating an
assignment such as (this comes from the test pr94066-2.C):

((union U *) this)->a = TARGET_EXPR ;


I assume this is actually an INIT_EXPR, or we would have preevaluated and not
had this problem.


Yes exactly, I should have specified that the above is an INIT_EXPR and
not a MODIFY_EXPR.




where evaluation of foo could change the active member of *this, which was
set earlier in cxx_eval_store_expression to 'a'.  And if U is a RECORD_TYPE,
then evaluation of foo could add new fields to *this, thereby making stale
the 'valp' pointer to the target constructor_elt through which we're later
assigning.

So in short, it seems that both cxx_eval_bare_aggregate and
cxx_eval_store_expression do not anticipate that a constructor_elt's
initializer could modify the underlying CONSTRUCTOR as a side-effect.


Oof.  If this is well-formed, it's because initialization of a doesn't
actually start until the return statement of foo, so we're probably wrong to
create a CONSTRUCTOR to hold the value of 'a' before evaluating foo.  Perhaps
init_subob_ctx shouldn't preemptively create a CONSTRUCTOR, and similarly for
the cxx_eval_store_expression !preeval code.


Hmm, I think I see what you mean.  I'll look into this.


In cpp0x/constexpr-array12.C we have

 struct A { int ar[3]; };
 constexpr A a1 = { 0, a1.ar[0] };

the initializer for a1 is a CONSTRUCTOR with the form

 {.ar={0, (int) VIEW_CONVERT_EXPR(a1).ar[0]}}

If we don't preemptively create a CONSTRUCTOR in cxx_eval_bare_aggregate
to hold the value of 'ar' before evaluating its initializer, then we
won't be able to resolve the 'a1.ar[0]' later on, and we will reject
this otherwise valid test case with an "accessing an uninitialized array
element" diagnostic.  So it seems we need to continue creating a
CONSTRUCTOR in cxx_eval_bare_aggregate before evaluating the initializer
of an aggregate sub-object to handle self-referential CONSTRUCTORs like
the one above.

Then again, clang is going with rejecting the original testcase with the
following justification: https://bugs.llvm.org/show_bug.cgi?id=45133#c1
Should we follow suit?


Yes, let's.  There's no need to bend over backward to allow this kind of 
pathological testcase.  Hubert is proposing that the initialization of 
u.a starts with the call to foo, and the testcase has undefined behavior 
because it ends the lifetime of u.a in the middle of its initialization.


Jason



Re: [PATCH] Fix PR90332 by extending half size vector mode

2020-03-18 Thread Segher Boessenkool
On Wed, Mar 18, 2020 at 10:12:00PM +0800, Kewen.Lin wrote:
> > Btw, why not implement the necessary vector init patterns?
> 
> Power doesn't support a 64-bit vector size; it looks a bit hacky and
> confusing to introduce this kind of mode just for some optab requirement,
> but I admit the optab hack can immediately make it work.  :)

But it opens up all kinds of other problems.  To begin with, how is a
short vector mapped to a "real" vector?

We don't have ops on short integer types, either, for similar reasons.


Segher


[PATCH] Rearrange detection of temporary directory for NetBSD

2020-03-18 Thread Kamil Rytarowski
Set /tmp first, then /var/tmp. /tmp is volatile on NetBSD and
/var/tmp is not. This improves performance in the common case.
The downstream copy of GCC has been patched with this preference
since 2015.

Remove the occurrence of /usr/tmp as it was never valid for NetBSD.
It was already actively disabled in the GCC manual page in 1996 and
in the GCC source code since at least 1998.

This change is not a matter of user preference but of Operating
System defaults that disagree with libiberty's detection order.

No functional change for other Operating Systems/environments.

libiberty/ChangeLog:

* make-temp-file.c (choose_tmpdir): Honor NetBSD specific paths.
---
 libiberty/ChangeLog        | 4 ++++
 libiberty/make-temp-file.c | 8 +++++++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/libiberty/ChangeLog b/libiberty/ChangeLog
index 106c107e91a..18b9357aaed 100644
--- a/libiberty/ChangeLog
+++ b/libiberty/ChangeLog
@@ -1,3 +1,7 @@
+2020-03-18  Kamil Rytarowski  
+
+   * make-temp-file.c (choose_tmpdir): Honor NetBSD specific paths.
+
 2020-03-05  Egeyar Bagcioglu  

* simple-object.c (handle_lto_debug_sections): Name
diff --git a/libiberty/make-temp-file.c b/libiberty/make-temp-file.c
index cb08c27af6f..674333f042b 100644
--- a/libiberty/make-temp-file.c
+++ b/libiberty/make-temp-file.c
@@ -129,10 +129,16 @@ choose_tmpdir (void)
base = try_dir (P_tmpdir, base);
 #endif

-  /* Try /var/tmp, /usr/tmp, then /tmp.  */
+#if defined(__NetBSD__)
+  /* Try /tmp (volatile), then /var/tmp (non-volatile) on NetBSD.  */
+  base = try_dir (tmp, base);
+  base = try_dir (vartmp, base);
+#else
+  /* For others try /var/tmp, /usr/tmp, then /tmp.  */
   base = try_dir (vartmp, base);
   base = try_dir (usrtmp, base);
   base = try_dir (tmp, base);
+#endif

   /* If all else fails, use the current directory!  */
   if (base == 0)
--
2.25.0



Re: [PATCH] middle-end/94188 fix fold of addr expression generation

2020-03-18 Thread Richard Biener
On March 18, 2020 6:20:29 PM GMT+01:00, Maxim Kuvyrkov 
 wrote:
>
>> On 17 Mar 2020, at 17:40, Richard Biener  wrote:
>> 
>> 
>> This adds a missing type conversion to build_fold_addr_expr and
>adjusts
>> fallout - build_fold_addr_expr was used as a convenience to build an
>> ADDR_EXPR but some callers do not expect the result to be simplified
>> to something else.
>> 
>> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>> 
>> This is the 3rd or 4th attempt and I hope to have caught all fallout
>> with this.  I think it's inevitable that we fix the mistake in
>> build_fold_addr_expr.
>> 
>> Richard.
>> 
>> 2020-03-17  Richard Biener  
>> 
>>  PR middle-end/94188
>>  * fold-const.c (build_fold_addr_expr): Convert address to
>>  correct type.
>>  * asan.c (maybe_create_ssa_name): Strip useless type conversions.
>>  * gimple-fold.c (gimple_fold_stmt_to_constant_1): Use build1
>>  to build the ADDR_EXPR which we don't really want to simplify.
>>  * tree-ssa-dom.c (record_equivalences_from_stmt): Likewise.
>>  * tree-ssa-loop-im.c (gather_mem_refs_stmt): Likewise.
>>  * tree-ssa-forwprop.c (forward_propagate_addr_expr_1): Likewise.
>>  (simplify_builtin_call): Strip useless type conversions.
>>  * tree-ssa-strlen.c (new_strinfo): Likewise.
>> 
>>  * gcc.dg/pr94188.c: New testcase.
>
>Hi Richard,
>
>This breaks Linux kernel build on 32-bit ARM:
>
>00:01:29 ./include/linux/string.h:333:9: internal compiler error: in
>gen_movsi, at config/arm/arm.md:6291
>00:01:29 make[2]: *** [sound/drivers/serial-u16550.o] Error 1
>
>Would you please investigate?  Let me know if you need any help
>reproducing the problem.

Please file a bug report with preprocessed source and instructions how to 
configure a cross to reproduce this. 

The change has caused more fallout than I expected... 

Thanks, 
Richard. 

>Kernel’s build line is (assuming cross-compilation):
>make CC=/path/to/arm-linux-gnueabihf-gcc ARCH=arm
>CROSS_COMPILE=arm-linux-gnueabihf- HOSTCC=gcc allyesconfig
>
>Regards,
>
>--
>Maxim Kuvyrkov
>https://www.linaro.org



Re: [PATCH] c++: Include the constraint parameter mapping in diagnostic constraint contexts

2020-03-18 Thread Patrick Palka via Gcc-patches
On Wed, 18 Mar 2020, Patrick Palka wrote:

> When diagnosing a constraint error, we currently try to print the constraint
> inside a diagnostic constraint context with its template arguments substituted
> in.  If substitution fails, then we instead just print the dependent
> form, as in the test case below:
> 
>   gcc/testsuite/g++.dg/concepts/diagnostic6.C:14:15: error: static assertion failed
>  14 | static_assert(E); // { dg-error "static assertion failed|not a class" }
> |   ^~
>   gcc/testsuite/g++.dg/concepts/diagnostic6.C:14:15: note: constraints not satisfied
>   gcc/testsuite/g++.dg/concepts/diagnostic6.C:4:11:   required for the satisfaction of ‘C’
>   gcc/testsuite/g++.dg/concepts/diagnostic6.C:8:11:   required for the satisfaction of ‘D’
>   gcc/testsuite/g++.dg/concepts/diagnostic6.C:14:15: error: ‘int’ is not a class, struct, or union type
> 
> But printing just the dependent form sometimes makes it difficult to decipher
> the diagnostic.  In the above example, for instance, there's no indication of
> how the template argument 'int' relates to either of the 'T's.
> 
> This patch improves the situation by changing these diagnostics to always
> print the dependent form of the constraint, and alongside it the (preferably
> substituted) constraint parameter mapping.  So with the same test case below
> we now get:
> 
>   gcc/testsuite/g++.dg/concepts/diagnostic6.C:14:15: error: static assertion failed
>  14 | static_assert(E); // { dg-error "static assertion failed|not a class" }
> |   ^~
>   gcc/testsuite/g++.dg/concepts/diagnostic6.C:14:15: note: constraints not satisfied
>   gcc/testsuite/g++.dg/concepts/diagnostic6.C:4:11:   required for the satisfaction of ‘C’ [with T = typename T::type]
>   gcc/testsuite/g++.dg/concepts/diagnostic6.C:8:11:   required for the satisfaction of ‘D’ [with T = int]
>   gcc/testsuite/g++.dg/concepts/diagnostic6.C:14:15: error: ‘int’ is not a class, struct, or union type
> 
> This change arguably makes it easier to figure out what's going on whenever a
> constraint fails due to substitution resulting in an invalid type rather than
> failing due to the constraint evaluating to false.
> 
> Tested on x86_64-pc-linux-gnu, does this look reasonable?  I'm not sure if
> printing an unsubstituted parameter mapping (like in the line 4 message above)
> is always useful, but perhaps it's better than nothing?
> 

Ah sorry, this is an old version of the patch.  Please consider this one
instead:

-- >8 --

gcc/cp/ChangeLog:

* cxx-pretty-print.c (pp_cxx_parameter_mapping): Make extern.
* cxx-pretty-print.h (pp_cxx_parameter_mapping): Declare.
* error.c (rebuild_concept_check): Delete.
(print_concept_check_info): Print the dependent form of the constraint
and the preferably substituted parameter mapping alongside it.

gcc/testsuite/ChangeLog:

* g++.dg/concepts/diagnostic6.C: New test.
---
 gcc/cp/cxx-pretty-print.c   |  2 +-
 gcc/cp/cxx-pretty-print.h   |  1 +
 gcc/cp/error.c  | 41 -
 gcc/testsuite/g++.dg/concepts/diagnostic6.C | 14 +++
 4 files changed, 32 insertions(+), 26 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/concepts/diagnostic6.C

diff --git a/gcc/cp/cxx-pretty-print.c b/gcc/cp/cxx-pretty-print.c
index 100154e400f..1e89c3f9033 100644
--- a/gcc/cp/cxx-pretty-print.c
+++ b/gcc/cp/cxx-pretty-print.c
@@ -2878,7 +2878,7 @@ pp_cxx_check_constraint (cxx_pretty_printer *pp, tree t)
 /* Output the "[with ...]" clause for a parameter mapping of an atomic
constraint.   */
 
-static void
+void
 pp_cxx_parameter_mapping (cxx_pretty_printer *pp, tree map)
 {
   for (tree p = map; p; p = TREE_CHAIN (p))
diff --git a/gcc/cp/cxx-pretty-print.h b/gcc/cp/cxx-pretty-print.h
index 7c7347f57ba..494f3fdde59 100644
--- a/gcc/cp/cxx-pretty-print.h
+++ b/gcc/cp/cxx-pretty-print.h
@@ -112,5 +112,6 @@ void pp_cxx_conjunction (cxx_pretty_printer *, tree);
 void pp_cxx_disjunction (cxx_pretty_printer *, tree);
 void pp_cxx_constraint (cxx_pretty_printer *, tree);
 void pp_cxx_constrained_type_spec (cxx_pretty_printer *, tree);
+void pp_cxx_parameter_mapping (cxx_pretty_printer *, tree);
 
 #endif /* GCC_CXX_PRETTY_PRINT_H */
diff --git a/gcc/cp/error.c b/gcc/cp/error.c
index cc51b6ffe13..fc08558e9b6 100644
--- a/gcc/cp/error.c
+++ b/gcc/cp/error.c
@@ -3680,27 +3680,6 @@ print_location (diagnostic_context *context, location_t 
loc)
  "locus", xloc.file, xloc.line);
 }
 
-/* Instantiate the concept check for the purpose of diagnosing an error.  */
-
-static tree
-rebuild_concept_check (tree expr, tree map, tree args)
-{
-  /* Instantiate the parameter mapping for the template-id.  */
-  map = tsubst_parameter_mapping (map, args, tf_none, NULL_TREE);
-  if (map == error_mark_node)
-return error_mark_node;
-  args = get_mapped_args 

[PATCH] c++: Include the constraint parameter mapping in diagnostic constraint contexts

2020-03-18 Thread Patrick Palka via Gcc-patches
When diagnosing a constraint error, we currently try to print the constraint
inside a diagnostic constraint context with its template arguments substituted
in.  If substitution fails, then we instead just print the dependent
form, as in the test case below:

  gcc/testsuite/g++.dg/concepts/diagnostic6.C:14:15: error: static assertion failed
 14 | static_assert(E); // { dg-error "static assertion failed|not a class" }
|   ^~
  gcc/testsuite/g++.dg/concepts/diagnostic6.C:14:15: note: constraints not satisfied
  gcc/testsuite/g++.dg/concepts/diagnostic6.C:4:11:   required for the satisfaction of ‘C’
  gcc/testsuite/g++.dg/concepts/diagnostic6.C:8:11:   required for the satisfaction of ‘D’
  gcc/testsuite/g++.dg/concepts/diagnostic6.C:14:15: error: ‘int’ is not a class, struct, or union type

But printing just the dependent form sometimes makes it difficult to decipher
the diagnostic.  In the above example, for instance, there's no indication of
how the template argument 'int' relates to either of the 'T's.

This patch improves the situation by changing these diagnostics to always print
the dependent form of the constraint, and alongside it the (preferably
substituted) constraint parameter mapping.  So with the same test case below we
now get:

  gcc/testsuite/g++.dg/concepts/diagnostic6.C:14:15: error: static assertion failed
 14 | static_assert(E); // { dg-error "static assertion failed|not a class" }
|   ^~
  gcc/testsuite/g++.dg/concepts/diagnostic6.C:14:15: note: constraints not satisfied
  gcc/testsuite/g++.dg/concepts/diagnostic6.C:4:11:   required for the satisfaction of ‘C’ [with T = typename T::type]
  gcc/testsuite/g++.dg/concepts/diagnostic6.C:8:11:   required for the satisfaction of ‘D’ [with T = int]
  gcc/testsuite/g++.dg/concepts/diagnostic6.C:14:15: error: ‘int’ is not a class, struct, or union type

This change arguably makes it easier to figure out what's going on whenever a
constraint fails due to substitution resulting in an invalid type rather than
failing due to the constraint evaluating to false.

Tested on x86_64-pc-linux-gnu, does this look reasonable?  I'm not sure if
printing an unsubstituted parameter mapping (like in the line 4 message above)
is always useful, but perhaps it's better than nothing?

gcc/cp/ChangeLog:

* cxx-pretty-print.c (pp_cxx_parameter_mapping): Make extern.
* cxx-pretty-print.h (pp_cxx_parameter_mapping): Declare.
* error.c (rebuild_concept_check): Delete.
(print_concept_check_info): Print the uninstantiated constraint and the
parameter mapping alongside it.

gcc/testsuite/ChangeLog:

* g++.dg/concepts/diagnostic6.C: New test.
---
 gcc/cp/cxx-pretty-print.c   |  2 +-
 gcc/cp/cxx-pretty-print.h   |  1 +
 gcc/cp/error.c  | 39 -
 gcc/testsuite/g++.dg/concepts/diagnostic6.C | 14 
 4 files changed, 30 insertions(+), 26 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/concepts/diagnostic6.C

diff --git a/gcc/cp/cxx-pretty-print.c b/gcc/cp/cxx-pretty-print.c
index 100154e400f..1e89c3f9033 100644
--- a/gcc/cp/cxx-pretty-print.c
+++ b/gcc/cp/cxx-pretty-print.c
@@ -2878,7 +2878,7 @@ pp_cxx_check_constraint (cxx_pretty_printer *pp, tree t)
 /* Output the "[with ...]" clause for a parameter mapping of an atomic
constraint.   */
 
-static void
+void
 pp_cxx_parameter_mapping (cxx_pretty_printer *pp, tree map)
 {
   for (tree p = map; p; p = TREE_CHAIN (p))
diff --git a/gcc/cp/cxx-pretty-print.h b/gcc/cp/cxx-pretty-print.h
index 7c7347f57ba..494f3fdde59 100644
--- a/gcc/cp/cxx-pretty-print.h
+++ b/gcc/cp/cxx-pretty-print.h
@@ -112,5 +112,6 @@ void pp_cxx_conjunction (cxx_pretty_printer *, tree);
 void pp_cxx_disjunction (cxx_pretty_printer *, tree);
 void pp_cxx_constraint (cxx_pretty_printer *, tree);
 void pp_cxx_constrained_type_spec (cxx_pretty_printer *, tree);
+void pp_cxx_parameter_mapping (cxx_pretty_printer *, tree);
 
 #endif /* GCC_CXX_PRETTY_PRINT_H */
diff --git a/gcc/cp/error.c b/gcc/cp/error.c
index cc51b6ffe13..4bf835e84a1 100644
--- a/gcc/cp/error.c
+++ b/gcc/cp/error.c
@@ -3680,27 +3680,6 @@ print_location (diagnostic_context *context, location_t 
loc)
  "locus", xloc.file, xloc.line);
 }
 
-/* Instantiate the concept check for the purpose of diagnosing an error.  */
-
-static tree
-rebuild_concept_check (tree expr, tree map, tree args)
-{
-  /* Instantiate the parameter mapping for the template-id.  */
-  map = tsubst_parameter_mapping (map, args, tf_none, NULL_TREE);
-  if (map == error_mark_node)
-return error_mark_node;
-  args = get_mapped_args (map);
-
-  /* Rebuild the template id using substituted arguments. Substituting
- directly through the expression will trigger recursive satisfaction,
- so don't do that.  */
-  tree id = unpack_concept_check (expr);
-  args = tsubst_template_args (TREE_OPERAND (id, 

RE: [PATCH v2][ARM][GCC][8/5x]: Remaining MVE store intrinsics which store a half word, word or double word to memory.

2020-03-18 Thread Kyrylo Tkachov
Hi Srinath,

> -Original Message-
> From: Srinath Parvathaneni 
> Sent: 18 March 2020 17:18
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov 
> Subject: [PATCH v2][ARM][GCC][8/5x]: Remaining MVE store intrinsics which
> store a half word, word or double word to memory.
> 
> Hello Kyrill,
> 
> Following patch is the rebased version of v1.
> (version v1) https://gcc.gnu.org/pipermail/gcc-patches/2019-
> November/534340.html
> 
> 
> 
> Hello,
> 
> This patch supports the following MVE ACLE store intrinsics, which store a
> halfword, word or double word to memory.
> 
> vstrdq_scatter_base_p_s64, vstrdq_scatter_base_p_u64,
> vstrdq_scatter_base_s64, vstrdq_scatter_base_u64,
> vstrdq_scatter_offset_p_s64, vstrdq_scatter_offset_p_u64,
> vstrdq_scatter_offset_s64, vstrdq_scatter_offset_u64,
> vstrdq_scatter_shifted_offset_p_s64,
> vstrdq_scatter_shifted_offset_p_u64, vstrdq_scatter_shifted_offset_s64,
> vstrdq_scatter_shifted_offset_u64, vstrhq_scatter_offset_f16,
> vstrhq_scatter_offset_p_f16, vstrhq_scatter_shifted_offset_f16,
> vstrhq_scatter_shifted_offset_p_f16,
> vstrwq_scatter_base_f32, vstrwq_scatter_base_p_f32,
> vstrwq_scatter_offset_f32, vstrwq_scatter_offset_p_f32,
> vstrwq_scatter_offset_p_s32, vstrwq_scatter_offset_p_u32,
> vstrwq_scatter_offset_s32, vstrwq_scatter_offset_u32,
> vstrwq_scatter_shifted_offset_f32,
> vstrwq_scatter_shifted_offset_p_f32, vstrwq_scatter_shifted_offset_p_s32,
> vstrwq_scatter_shifted_offset_p_u32, vstrwq_scatter_shifted_offset_s32,
> vstrwq_scatter_shifted_offset_u32.
> 
> Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more
> details.
> [1]  https://developer.arm.com/architectures/instruction-sets/simd-
> isas/helium/mve-intrinsics
> 
> In this patch a new predicate "Ri" is defined to check the immediate is in the
> range of +/-1016 and multiple of 8.
> 
> Regression tested on arm-none-eabi and found no regressions.
> 
> Ok for trunk?

Thanks, I've pushed this patch to master.
Kyrill

> 
> Thanks,
> Srinath.
> 
> gcc/ChangeLog:
> 
> 2019-11-05  Andre Vieira  
> Mihail Ionescu  
> Srinath Parvathaneni  
> 
>   * config/arm/arm_mve.h (vstrdq_scatter_base_p_s64): Define
> macro.
>   (vstrdq_scatter_base_p_u64): Likewise.
>   (vstrdq_scatter_base_s64): Likewise.
>   (vstrdq_scatter_base_u64): Likewise.
>   (vstrdq_scatter_offset_p_s64): Likewise.
>   (vstrdq_scatter_offset_p_u64): Likewise.
>   (vstrdq_scatter_offset_s64): Likewise.
>   (vstrdq_scatter_offset_u64): Likewise.
>   (vstrdq_scatter_shifted_offset_p_s64): Likewise.
>   (vstrdq_scatter_shifted_offset_p_u64): Likewise.
>   (vstrdq_scatter_shifted_offset_s64): Likewise.
>   (vstrdq_scatter_shifted_offset_u64): Likewise.
>   (vstrhq_scatter_offset_f16): Likewise.
>   (vstrhq_scatter_offset_p_f16): Likewise.
>   (vstrhq_scatter_shifted_offset_f16): Likewise.
>   (vstrhq_scatter_shifted_offset_p_f16): Likewise.
>   (vstrwq_scatter_base_f32): Likewise.
>   (vstrwq_scatter_base_p_f32): Likewise.
>   (vstrwq_scatter_offset_f32): Likewise.
>   (vstrwq_scatter_offset_p_f32): Likewise.
>   (vstrwq_scatter_offset_p_s32): Likewise.
>   (vstrwq_scatter_offset_p_u32): Likewise.
>   (vstrwq_scatter_offset_s32): Likewise.
>   (vstrwq_scatter_offset_u32): Likewise.
>   (vstrwq_scatter_shifted_offset_f32): Likewise.
>   (vstrwq_scatter_shifted_offset_p_f32): Likewise.
>   (vstrwq_scatter_shifted_offset_p_s32): Likewise.
>   (vstrwq_scatter_shifted_offset_p_u32): Likewise.
>   (vstrwq_scatter_shifted_offset_s32): Likewise.
>   (vstrwq_scatter_shifted_offset_u32): Likewise.
>   (__arm_vstrdq_scatter_base_p_s64): Define intrinsic.
>   (__arm_vstrdq_scatter_base_p_u64): Likewise.
>   (__arm_vstrdq_scatter_base_s64): Likewise.
>   (__arm_vstrdq_scatter_base_u64): Likewise.
>   (__arm_vstrdq_scatter_offset_p_s64): Likewise.
>   (__arm_vstrdq_scatter_offset_p_u64): Likewise.
>   (__arm_vstrdq_scatter_offset_s64): Likewise.
>   (__arm_vstrdq_scatter_offset_u64): Likewise.
>   (__arm_vstrdq_scatter_shifted_offset_p_s64): Likewise.
>   (__arm_vstrdq_scatter_shifted_offset_p_u64): Likewise.
>   (__arm_vstrdq_scatter_shifted_offset_s64): Likewise.
>   (__arm_vstrdq_scatter_shifted_offset_u64): Likewise.
>   (__arm_vstrwq_scatter_offset_p_s32): Likewise.
>   (__arm_vstrwq_scatter_offset_p_u32): Likewise.
>   (__arm_vstrwq_scatter_offset_s32): Likewise.
>   (__arm_vstrwq_scatter_offset_u32): Likewise.
>   (__arm_vstrwq_scatter_shifted_offset_p_s32): Likewise.
>   (__arm_vstrwq_scatter_shifted_offset_p_u32): Likewise.
>   (__arm_vstrwq_scatter_shifted_offset_s32): Likewise.
>   (__arm_vstrwq_scatter_shifted_offset_u32): Likewise.
>   (__arm_vstrhq_scatter_offset_f16): Likewise.
>   (__arm_vstrhq_scatter_offset_p_f16): Likewise.
>   (__arm_vstrhq_scatter_shifted_offset_f16): 

RE: [PATCH v2][ARM][GCC][7/5x]: MVE store intrinsics which store a byte, half word or word to memory.

2020-03-18 Thread Kyrylo Tkachov
Hi Srinath,

> -Original Message-
> From: Srinath Parvathaneni 
> Sent: 18 March 2020 17:18
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov 
> Subject: [PATCH v2][ARM][GCC][7/5x]: MVE store intrinsics which store a
> byte, half word or word to memory.
> 
> Hello Kyrill,
> 
> Following patch is the rebased version of v1.
> (version v1) https://gcc.gnu.org/pipermail/gcc-patches/2019-
> November/534335.html
> 
> 
> 
> Hello,
> 
> This patch supports the following MVE ACLE store intrinsics, which store a
> byte, halfword, or word to memory.
> 
> vst1q_f32, vst1q_f16, vst1q_s8, vst1q_s32, vst1q_s16, vst1q_u8, vst1q_u32,
> vst1q_u16, vstrhq_f16, vstrhq_scatter_offset_s32, vstrhq_scatter_offset_s16,
> vstrhq_scatter_offset_u32, vstrhq_scatter_offset_u16,
> vstrhq_scatter_offset_p_s32, vstrhq_scatter_offset_p_s16,
> vstrhq_scatter_offset_p_u32, vstrhq_scatter_offset_p_u16,
> vstrhq_scatter_shifted_offset_s32,
> vstrhq_scatter_shifted_offset_s16, vstrhq_scatter_shifted_offset_u32,
> vstrhq_scatter_shifted_offset_u16, vstrhq_scatter_shifted_offset_p_s32,
> vstrhq_scatter_shifted_offset_p_s16, vstrhq_scatter_shifted_offset_p_u32,
> vstrhq_scatter_shifted_offset_p_u16, vstrhq_s32, vstrhq_s16, vstrhq_u32,
> vstrhq_u16, vstrhq_p_f16, vstrhq_p_s32, vstrhq_p_s16, vstrhq_p_u32,
> vstrhq_p_u16, vstrwq_f32, vstrwq_s32, vstrwq_u32, vstrwq_p_f32,
> vstrwq_p_s32, vstrwq_p_u32.
> 
> Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more
> details.
> [1]  https://developer.arm.com/architectures/instruction-sets/simd-
> isas/helium/mve-intrinsics
> 
> Regression tested on arm-none-eabi and found no regressions.
> 
> Ok for trunk?

Thanks, I've pushed this patch to master.
Kyrill

> 
> Thanks,
> Srinath.
> 
> gcc/ChangeLog:
> 
> 2019-11-01  Andre Vieira  
> Mihail Ionescu  
> Srinath Parvathaneni  
> 
>   * config/arm/arm_mve.h (vst1q_f32): Define macro.
>   (vst1q_f16): Likewise.
>   (vst1q_s8): Likewise.
>   (vst1q_s32): Likewise.
>   (vst1q_s16): Likewise.
>   (vst1q_u8): Likewise.
>   (vst1q_u32): Likewise.
>   (vst1q_u16): Likewise.
>   (vstrhq_f16): Likewise.
>   (vstrhq_scatter_offset_s32): Likewise.
>   (vstrhq_scatter_offset_s16): Likewise.
>   (vstrhq_scatter_offset_u32): Likewise.
>   (vstrhq_scatter_offset_u16): Likewise.
>   (vstrhq_scatter_offset_p_s32): Likewise.
>   (vstrhq_scatter_offset_p_s16): Likewise.
>   (vstrhq_scatter_offset_p_u32): Likewise.
>   (vstrhq_scatter_offset_p_u16): Likewise.
>   (vstrhq_scatter_shifted_offset_s32): Likewise.
>   (vstrhq_scatter_shifted_offset_s16): Likewise.
>   (vstrhq_scatter_shifted_offset_u32): Likewise.
>   (vstrhq_scatter_shifted_offset_u16): Likewise.
>   (vstrhq_scatter_shifted_offset_p_s32): Likewise.
>   (vstrhq_scatter_shifted_offset_p_s16): Likewise.
>   (vstrhq_scatter_shifted_offset_p_u32): Likewise.
>   (vstrhq_scatter_shifted_offset_p_u16): Likewise.
>   (vstrhq_s32): Likewise.
>   (vstrhq_s16): Likewise.
>   (vstrhq_u32): Likewise.
>   (vstrhq_u16): Likewise.
>   (vstrhq_p_f16): Likewise.
>   (vstrhq_p_s32): Likewise.
>   (vstrhq_p_s16): Likewise.
>   (vstrhq_p_u32): Likewise.
>   (vstrhq_p_u16): Likewise.
>   (vstrwq_f32): Likewise.
>   (vstrwq_s32): Likewise.
>   (vstrwq_u32): Likewise.
>   (vstrwq_p_f32): Likewise.
>   (vstrwq_p_s32): Likewise.
>   (vstrwq_p_u32): Likewise.
>   (__arm_vst1q_s8): Define intrinsic.
>   (__arm_vst1q_s32): Likewise.
>   (__arm_vst1q_s16): Likewise.
>   (__arm_vst1q_u8): Likewise.
>   (__arm_vst1q_u32): Likewise.
>   (__arm_vst1q_u16): Likewise.
>   (__arm_vstrhq_scatter_offset_s32): Likewise.
>   (__arm_vstrhq_scatter_offset_s16): Likewise.
>   (__arm_vstrhq_scatter_offset_u32): Likewise.
>   (__arm_vstrhq_scatter_offset_u16): Likewise.
>   (__arm_vstrhq_scatter_offset_p_s32): Likewise.
>   (__arm_vstrhq_scatter_offset_p_s16): Likewise.
>   (__arm_vstrhq_scatter_offset_p_u32): Likewise.
>   (__arm_vstrhq_scatter_offset_p_u16): Likewise.
>   (__arm_vstrhq_scatter_shifted_offset_s32): Likewise.
>   (__arm_vstrhq_scatter_shifted_offset_s16): Likewise.
>   (__arm_vstrhq_scatter_shifted_offset_u32): Likewise.
>   (__arm_vstrhq_scatter_shifted_offset_u16): Likewise.
>   (__arm_vstrhq_scatter_shifted_offset_p_s32): Likewise.
>   (__arm_vstrhq_scatter_shifted_offset_p_s16): Likewise.
>   (__arm_vstrhq_scatter_shifted_offset_p_u32): Likewise.
>   (__arm_vstrhq_scatter_shifted_offset_p_u16): Likewise.
>   (__arm_vstrhq_s32): Likewise.
>   (__arm_vstrhq_s16): Likewise.
>   (__arm_vstrhq_u32): Likewise.
>   (__arm_vstrhq_u16): Likewise.
>   (__arm_vstrhq_p_s32): Likewise.
>   (__arm_vstrhq_p_s16): Likewise.
>   (__arm_vstrhq_p_u32): Likewise.
>   (__arm_vstrhq_p_u16): Likewise.
>   (__arm_vstrwq_s32): 

Re: [PATCH] avoid treating more incompatible redeclarations as builtin-ins [PR94040]

2020-03-18 Thread Jakub Jelinek via Gcc-patches
On Wed, Mar 18, 2020 at 12:57:18PM -0600, Martin Sebor via Gcc-patches wrote:
> I noticed this last night:
> 
>   https://sourceware.org/pipermail/glibc-cvs/2020q1/069150.html
> 
> Presumably that's the fix.

Or maybe for REAL_TYPE just care here about TYPE_MODE which should be all
that matters?  If double and long double are the same, it isn't a big deal.
And similarly for INTEGER_TYPEs only care about TYPE_MODE/TYPE_PRECISION?
If unsigned long and unsigned long long are the same, why should we care?

Jakub



RE: [PATCH v2][ARM][GCC][6/5x]: Remaining MVE load intrinsics which load a half word, word or double word from memory.

2020-03-18 Thread Kyrylo Tkachov
Hi Srinath,

> -Original Message-
> From: Srinath Parvathaneni 
> Sent: 18 March 2020 17:18
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov 
> Subject: [PATCH v2][ARM][GCC][6/5x]: Remaining MVE load intrinsics which
> load a half word, word or double word from memory.
> 
> Hello Kyrill,
> 
> Following patch is the rebased version of v1.
> (version v1) https://gcc.gnu.org/pipermail/gcc-patches/2019-
> November/534354.html
> 
> 
> 
> Hello,
> 
> This patch supports the following remaining MVE ACLE load intrinsics, which
> load a halfword, word or double word from memory.
> 
> vldrdq_gather_base_s64, vldrdq_gather_base_u64,
> vldrdq_gather_base_z_s64,
> vldrdq_gather_base_z_u64, vldrdq_gather_offset_s64,
> vldrdq_gather_offset_u64,
> vldrdq_gather_offset_z_s64, vldrdq_gather_offset_z_u64,
> vldrdq_gather_shifted_offset_s64,
> vldrdq_gather_shifted_offset_u64, vldrdq_gather_shifted_offset_z_s64,
> vldrdq_gather_shifted_offset_z_u64, vldrhq_gather_offset_f16,
> vldrhq_gather_offset_z_f16,
> vldrhq_gather_shifted_offset_f16, vldrhq_gather_shifted_offset_z_f16,
> vldrwq_gather_base_f32,
> vldrwq_gather_base_z_f32, vldrwq_gather_offset_f32,
> vldrwq_gather_offset_s32,
> vldrwq_gather_offset_u32, vldrwq_gather_offset_z_f32,
> vldrwq_gather_offset_z_s32,
> vldrwq_gather_offset_z_u32, vldrwq_gather_shifted_offset_f32,
> vldrwq_gather_shifted_offset_s32,
> vldrwq_gather_shifted_offset_u32, vldrwq_gather_shifted_offset_z_f32,
> vldrwq_gather_shifted_offset_z_s32, vldrwq_gather_shifted_offset_z_u32.
> 
> Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more
> details.
> [1]  https://developer.arm.com/architectures/instruction-sets/simd-
> isas/helium/mve-intrinsics
> 
> Regression tested on arm-none-eabi and found no regressions.
> 
> Ok for trunk?

Thanks, I've pushed this patch to master.
Kyrill

> 
> Thanks,
> Srinath.
> 
> gcc/ChangeLog:
> 
> 2019-11-01  Andre Vieira  
> Mihail Ionescu  
> Srinath Parvathaneni  
> 
>   * config/arm/arm_mve.h (vldrdq_gather_base_s64): Define macro.
>   (vldrdq_gather_base_u64): Likewise.
>   (vldrdq_gather_base_z_s64): Likewise.
>   (vldrdq_gather_base_z_u64): Likewise.
>   (vldrdq_gather_offset_s64): Likewise.
>   (vldrdq_gather_offset_u64): Likewise.
>   (vldrdq_gather_offset_z_s64): Likewise.
>   (vldrdq_gather_offset_z_u64): Likewise.
>   (vldrdq_gather_shifted_offset_s64): Likewise.
>   (vldrdq_gather_shifted_offset_u64): Likewise.
>   (vldrdq_gather_shifted_offset_z_s64): Likewise.
>   (vldrdq_gather_shifted_offset_z_u64): Likewise.
>   (vldrhq_gather_offset_f16): Likewise.
>   (vldrhq_gather_offset_z_f16): Likewise.
>   (vldrhq_gather_shifted_offset_f16): Likewise.
>   (vldrhq_gather_shifted_offset_z_f16): Likewise.
>   (vldrwq_gather_base_f32): Likewise.
>   (vldrwq_gather_base_z_f32): Likewise.
>   (vldrwq_gather_offset_f32): Likewise.
>   (vldrwq_gather_offset_s32): Likewise.
>   (vldrwq_gather_offset_u32): Likewise.
>   (vldrwq_gather_offset_z_f32): Likewise.
>   (vldrwq_gather_offset_z_s32): Likewise.
>   (vldrwq_gather_offset_z_u32): Likewise.
>   (vldrwq_gather_shifted_offset_f32): Likewise.
>   (vldrwq_gather_shifted_offset_s32): Likewise.
>   (vldrwq_gather_shifted_offset_u32): Likewise.
>   (vldrwq_gather_shifted_offset_z_f32): Likewise.
>   (vldrwq_gather_shifted_offset_z_s32): Likewise.
>   (vldrwq_gather_shifted_offset_z_u32): Likewise.
>   (__arm_vldrdq_gather_base_s64): Define intrinsic.
>   (__arm_vldrdq_gather_base_u64): Likewise.
>   (__arm_vldrdq_gather_base_z_s64): Likewise.
>   (__arm_vldrdq_gather_base_z_u64): Likewise.
>   (__arm_vldrdq_gather_offset_s64): Likewise.
>   (__arm_vldrdq_gather_offset_u64): Likewise.
>   (__arm_vldrdq_gather_offset_z_s64): Likewise.
>   (__arm_vldrdq_gather_offset_z_u64): Likewise.
>   (__arm_vldrdq_gather_shifted_offset_s64): Likewise.
>   (__arm_vldrdq_gather_shifted_offset_u64): Likewise.
>   (__arm_vldrdq_gather_shifted_offset_z_s64): Likewise.
>   (__arm_vldrdq_gather_shifted_offset_z_u64): Likewise.
>   (__arm_vldrwq_gather_offset_s32): Likewise.
>   (__arm_vldrwq_gather_offset_u32): Likewise.
>   (__arm_vldrwq_gather_offset_z_s32): Likewise.
>   (__arm_vldrwq_gather_offset_z_u32): Likewise.
>   (__arm_vldrwq_gather_shifted_offset_s32): Likewise.
>   (__arm_vldrwq_gather_shifted_offset_u32): Likewise.
>   (__arm_vldrwq_gather_shifted_offset_z_s32): Likewise.
>   (__arm_vldrwq_gather_shifted_offset_z_u32): Likewise.
>   (__arm_vldrhq_gather_offset_f16): Likewise.
>   (__arm_vldrhq_gather_offset_z_f16): Likewise.
>   (__arm_vldrhq_gather_shifted_offset_f16): Likewise.
>   (__arm_vldrhq_gather_shifted_offset_z_f16): Likewise.
>   (__arm_vldrwq_gather_base_f32): Likewise.
>   (__arm_vldrwq_gather_base_z_f32): Likewise.
>   

Re: [PATCH] avoid treating more incompatible redeclarations as builtin-ins [PR94040]

2020-03-18 Thread Martin Sebor via Gcc-patches

On 3/18/20 8:30 AM, Jeff Law wrote:

On Wed, 2020-03-18 at 14:25 +, Szabolcs Nagy wrote:

The 03/13/2020 10:45, Martin Sebor via Gcc-patches wrote:

On 3/12/20 7:17 PM, Joseph Myers wrote:

On Thu, 5 Mar 2020, Martin Sebor wrote:


Tested on x86_64-linux.  Is this acceptable for GCC 10?  How about 9?


OK for GCC 10.


Thank you.  I committed it to trunk in r10-7162.


arm glibc build fails for me since this commit.

../sysdeps/ieee754/dbl-64/s_modf.c:84:28: error: conflicting types for built-in
function 'modfl'; expected 'long double(long double,  long double *)'
[-Werror=builtin-declaration-mismatch]
   84 | libm_alias_double (__modf, modf)
      |                            ^~~~

it seems this used to compile but not any more:

double modf (double x, double *p) { return x; }
extern __typeof (modf) modfl __attribute__ ((weak, alias ("modf")))
__attribute__ ((__copy__ (modf)));

I think Joseph posted something this morning that might fix this.


I noticed this last night:

  https://sourceware.org/pipermail/glibc-cvs/2020q1/069150.html

Presumably that's the fix.

Martin


RE: [PATCH v2][ARM][GCC][5/5x]: MVE ACLE load intrinsics which load a byte, halfword, or word from memory.

2020-03-18 Thread Kyrylo Tkachov
Hi Srinath,

> -Original Message-
> From: Srinath Parvathaneni 
> Sent: 18 March 2020 17:18
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov 
> Subject: [PATCH v2][ARM][GCC][5/5x]: MVE ACLE load intrinsics which load a
> byte, halfword, or word from memory.
> 
> Hello Kyrill,
> 
> Following patch is the rebased version of v1.
> (version v1) https://gcc.gnu.org/pipermail/gcc-patches/2019-
> November/534352.html
> 
> 
> 
> Hello,
> 
> This patch supports the following MVE ACLE load intrinsics which load a byte,
> halfword,
> or word from memory.
> vld1q_s8, vld1q_s32, vld1q_s16, vld1q_u8, vld1q_u32, vld1q_u16,
> vldrhq_gather_offset_s32,
> vldrhq_gather_offset_s16, vldrhq_gather_offset_u32,
> vldrhq_gather_offset_u16,
> vldrhq_gather_offset_z_s32, vldrhq_gather_offset_z_s16,
> vldrhq_gather_offset_z_u32,
> vldrhq_gather_offset_z_u16, vldrhq_gather_shifted_offset_s32,vldrwq_f32,
> vldrwq_z_f32,
> vldrhq_gather_shifted_offset_s16, vldrhq_gather_shifted_offset_u32,
> vldrhq_gather_shifted_offset_u16, vldrhq_gather_shifted_offset_z_s32,
> vldrhq_gather_shifted_offset_z_s16, vldrhq_gather_shifted_offset_z_u32,
> vldrhq_gather_shifted_offset_z_u16, vldrhq_s32, vldrhq_s16, vldrhq_u32,
> vldrhq_u16,
> vldrhq_z_s32, vldrhq_z_s16, vldrhq_z_u32, vldrhq_z_u16, vldrwq_s32,
> vldrwq_u32,
> vldrwq_z_s32, vldrwq_z_u32, vld1q_f32, vld1q_f16, vldrhq_f16,
> vldrhq_z_f16.
> 
> Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more
> details.
> [1]  https://developer.arm.com/architectures/instruction-sets/simd-
> isas/helium/mve-intrinsics
> 
> Regression tested on arm-none-eabi and found no regressions.
> 
> Ok for trunk?

Thanks, I've pushed this patch to master.
Kyrill

> 
> Thanks,
> Srinath.
> 
> gcc/ChangeLog:
> 
> 2019-11-01  Andre Vieira  
>   Mihail Ionescu  
>   Srinath Parvathaneni  
> 
>   * config/arm/arm_mve.h (vld1q_s8): Define macro.
>   (vld1q_s32): Likewise.
>   (vld1q_s16): Likewise.
>   (vld1q_u8): Likewise.
>   (vld1q_u32): Likewise.
>   (vld1q_u16): Likewise.
>   (vldrhq_gather_offset_s32): Likewise.
>   (vldrhq_gather_offset_s16): Likewise.
>   (vldrhq_gather_offset_u32): Likewise.
>   (vldrhq_gather_offset_u16): Likewise.
>   (vldrhq_gather_offset_z_s32): Likewise.
>   (vldrhq_gather_offset_z_s16): Likewise.
>   (vldrhq_gather_offset_z_u32): Likewise.
>   (vldrhq_gather_offset_z_u16): Likewise.
>   (vldrhq_gather_shifted_offset_s32): Likewise.
>   (vldrhq_gather_shifted_offset_s16): Likewise.
>   (vldrhq_gather_shifted_offset_u32): Likewise.
>   (vldrhq_gather_shifted_offset_u16): Likewise.
>   (vldrhq_gather_shifted_offset_z_s32): Likewise.
>   (vldrhq_gather_shifted_offset_z_s16): Likewise.
>   (vldrhq_gather_shifted_offset_z_u32): Likewise.
>   (vldrhq_gather_shifted_offset_z_u16): Likewise.
>   (vldrhq_s32): Likewise.
>   (vldrhq_s16): Likewise.
>   (vldrhq_u32): Likewise.
>   (vldrhq_u16): Likewise.
>   (vldrhq_z_s32): Likewise.
>   (vldrhq_z_s16): Likewise.
>   (vldrhq_z_u32): Likewise.
>   (vldrhq_z_u16): Likewise.
>   (vldrwq_s32): Likewise.
>   (vldrwq_u32): Likewise.
>   (vldrwq_z_s32): Likewise.
>   (vldrwq_z_u32): Likewise.
>   (vld1q_f32): Likewise.
>   (vld1q_f16): Likewise.
>   (vldrhq_f16): Likewise.
>   (vldrhq_z_f16): Likewise.
>   (vldrwq_f32): Likewise.
>   (vldrwq_z_f32): Likewise.
>   (__arm_vld1q_s8): Define intrinsic.
>   (__arm_vld1q_s32): Likewise.
>   (__arm_vld1q_s16): Likewise.
>   (__arm_vld1q_u8): Likewise.
>   (__arm_vld1q_u32): Likewise.
>   (__arm_vld1q_u16): Likewise.
>   (__arm_vldrhq_gather_offset_s32): Likewise.
>   (__arm_vldrhq_gather_offset_s16): Likewise.
>   (__arm_vldrhq_gather_offset_u32): Likewise.
>   (__arm_vldrhq_gather_offset_u16): Likewise.
>   (__arm_vldrhq_gather_offset_z_s32): Likewise.
>   (__arm_vldrhq_gather_offset_z_s16): Likewise.
>   (__arm_vldrhq_gather_offset_z_u32): Likewise.
>   (__arm_vldrhq_gather_offset_z_u16): Likewise.
>   (__arm_vldrhq_gather_shifted_offset_s32): Likewise.
>   (__arm_vldrhq_gather_shifted_offset_s16): Likewise.
>   (__arm_vldrhq_gather_shifted_offset_u32): Likewise.
>   (__arm_vldrhq_gather_shifted_offset_u16): Likewise.
>   (__arm_vldrhq_gather_shifted_offset_z_s32): Likewise.
>   (__arm_vldrhq_gather_shifted_offset_z_s16): Likewise.
>   (__arm_vldrhq_gather_shifted_offset_z_u32): Likewise.
>   (__arm_vldrhq_gather_shifted_offset_z_u16): Likewise.
>   (__arm_vldrhq_s32): Likewise.
>   (__arm_vldrhq_s16): Likewise.
>   (__arm_vldrhq_u32): Likewise.
>   (__arm_vldrhq_u16): Likewise.
>   (__arm_vldrhq_z_s32): Likewise.
>   (__arm_vldrhq_z_s16): Likewise.
>   (__arm_vldrhq_z_u32): Likewise.
>   (__arm_vldrhq_z_u16): Likewise.
>   (__arm_vldrwq_s32): Likewise.
>   (__arm_vldrwq_u32): 

RE: [PATCH v2][ARM][GCC][4/5x]: MVE load intrinsics with zero(_z) suffix.

2020-03-18 Thread Kyrylo Tkachov
Hi Srinath,

> -Original Message-
> From: Srinath Parvathaneni 
> Sent: 18 March 2020 17:18
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov 
> Subject: [PATCH v2][ARM][GCC][4/5x]: MVE load intrinsics with zero(_z)
> suffix.
> 
> Hello Kyrill,
> 
> Following patch is the rebased version of v1.
> (version v1) https://gcc.gnu.org/pipermail/gcc-patches/2019-
> November/534333.html
> 
> 
> 
> Hello,
> 
> This patch supports the following MVE ACLE load intrinsics with zero(_z)
> suffix.
> * ``_z`` (zero), which indicates false-predicated lanes are filled with
> zeroes; these suffixes are only used for load instructions.
> 
> vldrbq_gather_offset_z_s16, vldrbq_gather_offset_z_u8,
> vldrbq_gather_offset_z_s32, vldrbq_gather_offset_z_u16,
> vldrbq_gather_offset_z_u32, vldrbq_gather_offset_z_s8, vldrbq_z_s16,
> vldrbq_z_u8, vldrbq_z_s8, vldrbq_z_s32, vldrbq_z_u16, vldrbq_z_u32,
> vldrwq_gather_base_z_u32, vldrwq_gather_base_z_s32.
> 
> Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more
> details.
> [1]  https://developer.arm.com/architectures/instruction-sets/simd-
> isas/helium/mve-intrinsics
> 
> Regression tested on arm-none-eabi and found no regressions.
> 
> Ok for trunk?

Thanks, I've pushed this patch to master.
Kyrill

> 
> Thanks,
> Srinath.
> 
> gcc/ChangeLog:
> 
> 2019-11-01  Andre Vieira  
>   Mihail Ionescu  
>   Srinath Parvathaneni  
> 
>   * config/arm/arm-builtins.c (LDRGBS_Z_QUALIFIERS): Define builtin
>   qualifier.
>   (LDRGBU_Z_QUALIFIERS): Likewise.
>   (LDRGS_Z_QUALIFIERS): Likewise.
>   (LDRGU_Z_QUALIFIERS): Likewise.
>   (LDRS_Z_QUALIFIERS): Likewise.
>   (LDRU_Z_QUALIFIERS): Likewise.
>   * config/arm/arm_mve.h (vldrbq_gather_offset_z_s16): Define
> macro.
>   (vldrbq_gather_offset_z_u8): Likewise.
>   (vldrbq_gather_offset_z_s32): Likewise.
>   (vldrbq_gather_offset_z_u16): Likewise.
>   (vldrbq_gather_offset_z_u32): Likewise.
>   (vldrbq_gather_offset_z_s8): Likewise.
>   (vldrbq_z_s16): Likewise.
>   (vldrbq_z_u8): Likewise.
>   (vldrbq_z_s8): Likewise.
>   (vldrbq_z_s32): Likewise.
>   (vldrbq_z_u16): Likewise.
>   (vldrbq_z_u32): Likewise.
>   (vldrwq_gather_base_z_u32): Likewise.
>   (vldrwq_gather_base_z_s32): Likewise.
>   (__arm_vldrbq_gather_offset_z_s8): Define intrinsic.
>   (__arm_vldrbq_gather_offset_z_s32): Likewise.
>   (__arm_vldrbq_gather_offset_z_s16): Likewise.
>   (__arm_vldrbq_gather_offset_z_u8): Likewise.
>   (__arm_vldrbq_gather_offset_z_u32): Likewise.
>   (__arm_vldrbq_gather_offset_z_u16): Likewise.
>   (__arm_vldrbq_z_s8): Likewise.
>   (__arm_vldrbq_z_s32): Likewise.
>   (__arm_vldrbq_z_s16): Likewise.
>   (__arm_vldrbq_z_u8): Likewise.
>   (__arm_vldrbq_z_u32): Likewise.
>   (__arm_vldrbq_z_u16): Likewise.
>   (__arm_vldrwq_gather_base_z_s32): Likewise.
>   (__arm_vldrwq_gather_base_z_u32): Likewise.
>   (vldrbq_gather_offset_z): Define polymorphic variant.
>   * config/arm/arm_mve_builtins.def (LDRGBS_Z_QUALIFIERS): Use
> builtin
> qualifier.
> (LDRGBU_Z_QUALIFIERS): Likewise.
> (LDRGS_Z_QUALIFIERS): Likewise.
> (LDRGU_Z_QUALIFIERS): Likewise.
> (LDRS_Z_QUALIFIERS): Likewise.
> (LDRU_Z_QUALIFIERS): Likewise.
>   * config/arm/mve.md (mve_vldrbq_gather_offset_z_):
> Define
>   RTL pattern.
>   (mve_vldrbq_z_): Likewise.
>   (mve_vldrwq_gather_base_z_v4si): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-11-01  Andre Vieira  
>   Mihail Ionescu  
>   Srinath Parvathaneni  
> 
>   * gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_s16.c: New
> test.
>   * gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_s32.c:
> Likewise.
>   * gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_s8.c:
> Likewise.
>   * gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_u16.c:
> Likewise.
>   * gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_u32.c:
> Likewise.
>   * gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_u8.c:
> Likewise.
>   * gcc.target/arm/mve/intrinsics/vldrbq_z_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vldrbq_z_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vldrbq_z_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vldrbq_z_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vldrbq_z_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vldrbq_z_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vldrwq_gather_base_z_s32.c:
> Likewise.
>   * gcc.target/arm/mve/intrinsics/vldrwq_gather_base_z_u32.c:
> Likewise.
> 
> 
> ### Attachment also inlined for ease of reply
> ###
> 
> 
> diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
> index c87fa3118510e4de90ac9afe08608fb2315f4809..c3deb9efc8849019141b6430543e93605fda4af4 100644
> --- 

RE: [PATCH v2][ARM][GCC][3/5x]: MVE store intrinsics with predicated suffix.

2020-03-18 Thread Kyrylo Tkachov
Hi Srinath,

> -Original Message-
> From: Srinath Parvathaneni 
> Sent: 18 March 2020 17:18
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov 
> Subject: [PATCH v2][ARM][GCC][3/5x]: MVE store intrinsics with predicated
> suffix.
> 
> Hello Kyrill,
> 
> Following patch is the rebased version of v1.
> (version v1) https://gcc.gnu.org/pipermail/gcc-patches/2019-
> November/534337.html
> 
> 
> 
> Hello,
> 
> This patch supports the following MVE ACLE store intrinsics with predicated
> suffix.
> 
> vstrbq_p_s8, vstrbq_p_s32, vstrbq_p_s16, vstrbq_p_u8, vstrbq_p_u32,
> vstrbq_p_u16, vstrbq_scatter_offset_p_s8, vstrbq_scatter_offset_p_s32,
> vstrbq_scatter_offset_p_s16, vstrbq_scatter_offset_p_u8,
> vstrbq_scatter_offset_p_u32, vstrbq_scatter_offset_p_u16,
> vstrwq_scatter_base_p_s32, vstrwq_scatter_base_p_u32.
> 
> Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more
> details.
> [1]  https://developer.arm.com/architectures/instruction-sets/simd-
> isas/helium/mve-intrinsics
> 
> Regression tested on arm-none-eabi and found no regressions.
> 
> Ok for trunk?

Thanks, I've pushed this patch to master.
Kyrill

> 
> Thanks,
> Srinath.
> 
> gcc/ChangeLog:
> 
> 2019-11-01  Andre Vieira  
>   Mihail Ionescu  
>   Srinath Parvathaneni  
> 
>   * config/arm/arm-builtins.c (STRS_P_QUALIFIERS): Define builtin
>   qualifier.
>   (STRU_P_QUALIFIERS): Likewise.
>   (STRSU_P_QUALIFIERS): Likewise.
>   (STRSS_P_QUALIFIERS): Likewise.
>   (STRSBS_P_QUALIFIERS): Likewise.
>   (STRSBU_P_QUALIFIERS): Likewise.
>   * config/arm/arm_mve.h (vstrbq_p_s8): Define macro.
>   (vstrbq_p_s32): Likewise.
>   (vstrbq_p_s16): Likewise.
>   (vstrbq_p_u8): Likewise.
>   (vstrbq_p_u32): Likewise.
>   (vstrbq_p_u16): Likewise.
>   (vstrbq_scatter_offset_p_s8): Likewise.
>   (vstrbq_scatter_offset_p_s32): Likewise.
>   (vstrbq_scatter_offset_p_s16): Likewise.
>   (vstrbq_scatter_offset_p_u8): Likewise.
>   (vstrbq_scatter_offset_p_u32): Likewise.
>   (vstrbq_scatter_offset_p_u16): Likewise.
>   (vstrwq_scatter_base_p_s32): Likewise.
>   (vstrwq_scatter_base_p_u32): Likewise.
>   (__arm_vstrbq_p_s8): Define intrinsic.
>   (__arm_vstrbq_p_s32): Likewise.
>   (__arm_vstrbq_p_s16): Likewise.
>   (__arm_vstrbq_p_u8): Likewise.
>   (__arm_vstrbq_p_u32): Likewise.
>   (__arm_vstrbq_p_u16): Likewise.
>   (__arm_vstrbq_scatter_offset_p_s8): Likewise.
>   (__arm_vstrbq_scatter_offset_p_s32): Likewise.
>   (__arm_vstrbq_scatter_offset_p_s16): Likewise.
>   (__arm_vstrbq_scatter_offset_p_u8): Likewise.
>   (__arm_vstrbq_scatter_offset_p_u32): Likewise.
>   (__arm_vstrbq_scatter_offset_p_u16): Likewise.
>   (__arm_vstrwq_scatter_base_p_s32): Likewise.
>   (__arm_vstrwq_scatter_base_p_u32): Likewise.
>   (vstrbq_p): Define polymorphic variant.
>   (vstrbq_scatter_offset_p): Likewise.
>   (vstrwq_scatter_base_p): Likewise.
>   * config/arm/arm_mve_builtins.def (STRS_P_QUALIFIERS): Use
> builtin
>   qualifier.
> (STRU_P_QUALIFIERS): Likewise.
> (STRSU_P_QUALIFIERS): Likewise.
> (STRSS_P_QUALIFIERS): Likewise.
> (STRSBS_P_QUALIFIERS): Likewise.
> (STRSBU_P_QUALIFIERS): Likewise.
>   * config/arm/mve.md
> (mve_vstrbq_scatter_offset_p_): Define
>   RTL pattern.
>   (mve_vstrwq_scatter_base_p_v4si): Likewise.
>   (mve_vstrbq_p_): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-11-01  Andre Vieira  
>   Mihail Ionescu  
>   Srinath Parvathaneni  
> 
>   * gcc.target/arm/mve/intrinsics/vstrbq_p_s16.c: New test.
>   * gcc.target/arm/mve/intrinsics/vstrbq_p_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_p_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_p_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_p_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_p_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_s16.c:
> Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_s32.c:
> Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_s8.c:
> Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_u16.c:
> Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_u32.c:
> Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_u8.c:
> Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_s32.c:
> Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_u32.c:
> Likewise.
> 
> 
> ### Attachment also inlined for ease of reply
> ###
> 
> 
> diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
> index aced55f52d317e8deafdc6a6804db3b80c00fd80..c87fa3118510e4de90ac9afe08608fb2315f4809 100644
> --- a/gcc/config/arm/arm-builtins.c
> 

RE: [PATCH v2][ARM][GCC][2/5x]: MVE load intrinsics.

2020-03-18 Thread Kyrylo Tkachov
Hi Srinath,

> -Original Message-
> From: Srinath Parvathaneni 
> Sent: 18 March 2020 17:18
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov 
> Subject: [PATCH v2][ARM][GCC][2/5x]: MVE load intrinsics.
> 
> Hello Kyrill,
> 
> Following patch is the rebased version of v1.
> (version v1) https://gcc.gnu.org/pipermail/gcc-patches/2019-
> November/534338.html
> 
> 
> 
> Hello,
> 
> This patch supports the following MVE ACLE load intrinsics.
> 
> vldrbq_gather_offset_u8, vldrbq_gather_offset_s8, vldrbq_s8, vldrbq_u8,
> vldrbq_gather_offset_u16, vldrbq_gather_offset_s16, vldrbq_s16,
> vldrbq_u16, vldrbq_gather_offset_u32, vldrbq_gather_offset_s32,
> vldrbq_s32, vldrbq_u32, vldrwq_gather_base_s32, vldrwq_gather_base_u32.
> 
> Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more
> details.
> [1]  https://developer.arm.com/architectures/instruction-sets/simd-
> isas/helium/mve-intrinsics
> 
> Regression tested on arm-none-eabi and found no regressions.
> 
> Ok for trunk?

Thanks, I've pushed this patch to master.

Kyrill

> 
> Thanks,
> Srinath.
> 
> gcc/ChangeLog:
> 
> 2019-11-01  Andre Vieira  
>   Mihail Ionescu  
>   Srinath Parvathaneni  
> 
>   * config/arm/arm-builtins.c (LDRGU_QUALIFIERS): Define builtin
>   qualifier.
>   (LDRGS_QUALIFIERS): Likewise.
>   (LDRS_QUALIFIERS): Likewise.
>   (LDRU_QUALIFIERS): Likewise.
>   (LDRGBS_QUALIFIERS): Likewise.
>   (LDRGBU_QUALIFIERS): Likewise.
>   * config/arm/arm_mve.h (vldrbq_gather_offset_u8): Define macro.
>   (vldrbq_gather_offset_s8): Likewise.
>   (vldrbq_s8): Likewise.
>   (vldrbq_u8): Likewise.
>   (vldrbq_gather_offset_u16): Likewise.
>   (vldrbq_gather_offset_s16): Likewise.
>   (vldrbq_s16): Likewise.
>   (vldrbq_u16): Likewise.
>   (vldrbq_gather_offset_u32): Likewise.
>   (vldrbq_gather_offset_s32): Likewise.
>   (vldrbq_s32): Likewise.
>   (vldrbq_u32): Likewise.
>   (vldrwq_gather_base_s32): Likewise.
>   (vldrwq_gather_base_u32): Likewise.
>   (__arm_vldrbq_gather_offset_u8): Define intrinsic.
>   (__arm_vldrbq_gather_offset_s8): Likewise.
>   (__arm_vldrbq_s8): Likewise.
>   (__arm_vldrbq_u8): Likewise.
>   (__arm_vldrbq_gather_offset_u16): Likewise.
>   (__arm_vldrbq_gather_offset_s16): Likewise.
>   (__arm_vldrbq_s16): Likewise.
>   (__arm_vldrbq_u16): Likewise.
>   (__arm_vldrbq_gather_offset_u32): Likewise.
>   (__arm_vldrbq_gather_offset_s32): Likewise.
>   (__arm_vldrbq_s32): Likewise.
>   (__arm_vldrbq_u32): Likewise.
>   (__arm_vldrwq_gather_base_s32): Likewise.
>   (__arm_vldrwq_gather_base_u32): Likewise.
>   (vldrbq_gather_offset): Define polymorphic variant.
>   * config/arm/arm_mve_builtins.def (LDRGU_QUALIFIERS): Use
> builtin
>   qualifier.
> (LDRGS_QUALIFIERS): Likewise.
> (LDRS_QUALIFIERS): Likewise.
> (LDRU_QUALIFIERS): Likewise.
> (LDRGBS_QUALIFIERS): Likewise.
> (LDRGBU_QUALIFIERS): Likewise.
>   * config/arm/mve.md (VLDRBGOQ): Define iterator.
>   (VLDRBQ): Likewise.
>   (VLDRWGBQ): Likewise.
>   (mve_vldrbq_gather_offset_): Define RTL pattern.
>   (mve_vldrbq_): Likewise.
>   (mve_vldrwq_gather_base_v4si): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-11-01  Andre Vieira  
>   Mihail Ionescu  
>   Srinath Parvathaneni  
> 
>   * gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_s16.c: New
> test.
>   * gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_s32.c:
> Likewise.
>   * gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_u16.c:
> Likewise.
>   * gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_u32.c:
> Likewise.
>   * gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vldrbq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vldrbq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vldrbq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vldrbq_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vldrbq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vldrbq_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vldrwq_gather_base_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vldrwq_gather_base_u32.c: Likewise.
> 
> 
> ### Attachment also inlined for ease of reply
> ###
> 
> 
> diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
> index b285f074285116ce621e324b644d43efb6538b9d..aced55f52d317e8deafdc6a6804db3b80c00fd80 100644
> --- a/gcc/config/arm/arm-builtins.c
> +++ b/gcc/config/arm/arm-builtins.c
> @@ -612,6 +612,36 @@ arm_strsbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>qualifier_unsigned};
>  #define STRSBU_QUALIFIERS (arm_strsbu_qualifiers)
> 
> +static enum arm_type_qualifiers
> 

RE: [PATCH v2][ARM][GCC][1/5x]: MVE store intrinsics.

2020-03-18 Thread Kyrylo Tkachov
Hi Srinath,

> -Original Message-
> From: Srinath Parvathaneni 
> Sent: 18 March 2020 17:18
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov 
> Subject: [PATCH v2][ARM][GCC][1/5x]: MVE store intrinsics.
> 
> Hello Kyrill,
> 
> Following patch is the rebased version of v1.
> (version v1) https://gcc.gnu.org/pipermail/gcc-patches/2019-
> November/534334.html
> 
> 
> 
> Hello,
> 
> This patch supports the following MVE ACLE store intrinsics.
> 
> vstrbq_scatter_offset_s8, vstrbq_scatter_offset_s32,
> vstrbq_scatter_offset_s16, vstrbq_scatter_offset_u8,
> vstrbq_scatter_offset_u32, vstrbq_scatter_offset_u16, vstrbq_s8, vstrbq_s32,
> vstrbq_s16, vstrbq_u8, vstrbq_u32, vstrbq_u16, vstrwq_scatter_base_s32,
> vstrwq_scatter_base_u32.
> 
> Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more
> details.
> [1]  https://developer.arm.com/architectures/instruction-sets/simd-
> isas/helium/mve-intrinsics
> 
> Regression tested on arm-none-eabi and found no regressions.
> 
> Ok for trunk?

Thanks, I've pushed this patch into master.
Kyrill

> 
> Thanks,
> Srinath.
> 
> gcc/ChangeLog:
> 
> 2019-11-01  Andre Vieira  
>   Mihail Ionescu  
>   Srinath Parvathaneni  
> 
>   * config/arm/arm-builtins.c (STRS_QUALIFIERS): Define builtin
> qualifier.
>   (STRU_QUALIFIERS): Likewise.
>   (STRSS_QUALIFIERS): Likewise.
>   (STRSU_QUALIFIERS): Likewise.
>   (STRSBS_QUALIFIERS): Likewise.
>   (STRSBU_QUALIFIERS): Likewise.
>   * config/arm/arm_mve.h (vstrbq_s8): Define macro.
>   (vstrbq_u8): Likewise.
>   (vstrbq_u16): Likewise.
>   (vstrbq_scatter_offset_s8): Likewise.
>   (vstrbq_scatter_offset_u8): Likewise.
>   (vstrbq_scatter_offset_u16): Likewise.
>   (vstrbq_s16): Likewise.
>   (vstrbq_u32): Likewise.
>   (vstrbq_scatter_offset_s16): Likewise.
>   (vstrbq_scatter_offset_u32): Likewise.
>   (vstrbq_s32): Likewise.
>   (vstrbq_scatter_offset_s32): Likewise.
>   (vstrwq_scatter_base_s32): Likewise.
>   (vstrwq_scatter_base_u32): Likewise.
>   (__arm_vstrbq_scatter_offset_s8): Define intrinsic.
>   (__arm_vstrbq_scatter_offset_s32): Likewise.
>   (__arm_vstrbq_scatter_offset_s16): Likewise.
>   (__arm_vstrbq_scatter_offset_u8): Likewise.
>   (__arm_vstrbq_scatter_offset_u32): Likewise.
>   (__arm_vstrbq_scatter_offset_u16): Likewise.
>   (__arm_vstrbq_s8): Likewise.
>   (__arm_vstrbq_s32): Likewise.
>   (__arm_vstrbq_s16): Likewise.
>   (__arm_vstrbq_u8): Likewise.
>   (__arm_vstrbq_u32): Likewise.
>   (__arm_vstrbq_u16): Likewise.
>   (__arm_vstrwq_scatter_base_s32): Likewise.
>   (__arm_vstrwq_scatter_base_u32): Likewise.
>   (vstrbq): Define polymorphic variant.
>   (vstrbq_scatter_offset): Likewise.
>   (vstrwq_scatter_base): Likewise.
>   * config/arm/arm_mve_builtins.def (STRS_QUALIFIERS): Use builtin
>   qualifier.
> (STRU_QUALIFIERS): Likewise.
> (STRSS_QUALIFIERS): Likewise.
> (STRSU_QUALIFIERS): Likewise.
> (STRSBS_QUALIFIERS): Likewise.
> (STRSBU_QUALIFIERS): Likewise.
>   * config/arm/mve.md (MVE_B_ELEM): Define mode attribute
> iterator.
>   (VSTRWSBQ): Define iterators.
>   (VSTRBSOQ): Likewise.
>   (VSTRBQ): Likewise.
>   (mve_vstrbq_): Define RTL pattern.
>   (mve_vstrbq_scatter_offset_): Likewise.
>   (mve_vstrwq_scatter_base_v4si): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-11-01  Andre Vieira  
>   Mihail Ionescu  
>   Srinath Parvathaneni  
> 
>   * gcc.target/arm/mve/intrinsics/vstrbq_s16.c: New test.
>   * gcc.target/arm/mve/intrinsics/vstrbq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_s16.c:
> Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_s32.c:
> Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_u16.c:
> Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_u32.c:
> Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrbq_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_u32.c: Likewise.
> 
> 
> ### Attachment also inlined for ease of reply
> ###
> 
> 
> diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
> index 26f0379f62b95886414d2eb4d7c6a6c4fc235e60..b285f074285116ce621e324b644d43efb6538b9d 100644
> --- a/gcc/config/arm/arm-builtins.c
> +++ b/gcc/config/arm/arm-builtins.c
> @@ -579,6 +579,39 @@
> 

[PATCH v2][ARM][GCC][4/5x]: MVE load intrinsics with zero(_z) suffix.

2020-03-18 Thread Srinath Parvathaneni
Hello Kyrill,

Following patch is the rebased version of v1.
(version v1) https://gcc.gnu.org/pipermail/gcc-patches/2019-November/534333.html



Hello,

This patch supports the following MVE ACLE load intrinsics with zero(_z)
suffix.
* ``_z`` (zero), which indicates false-predicated lanes are filled with zeroes;
these suffixes are only used for load instructions.

vldrbq_gather_offset_z_s16, vldrbq_gather_offset_z_u8, 
vldrbq_gather_offset_z_s32,
vldrbq_gather_offset_z_u16, vldrbq_gather_offset_z_u32, 
vldrbq_gather_offset_z_s8,
vldrbq_z_s16, vldrbq_z_u8, vldrbq_z_s8, vldrbq_z_s32, vldrbq_z_u16, 
vldrbq_z_u32,
vldrwq_gather_base_z_u32, vldrwq_gather_base_z_s32.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more 
details.
[1]  
https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-11-01  Andre Vieira  
Mihail Ionescu  
Srinath Parvathaneni  

* config/arm/arm-builtins.c (LDRGBS_Z_QUALIFIERS): Define builtin
qualifier.
(LDRGBU_Z_QUALIFIERS): Likewise.
(LDRGS_Z_QUALIFIERS): Likewise.
(LDRGU_Z_QUALIFIERS): Likewise.
(LDRS_Z_QUALIFIERS): Likewise.
(LDRU_Z_QUALIFIERS): Likewise.
* config/arm/arm_mve.h (vldrbq_gather_offset_z_s16): Define macro.
(vldrbq_gather_offset_z_u8): Likewise.
(vldrbq_gather_offset_z_s32): Likewise.
(vldrbq_gather_offset_z_u16): Likewise.
(vldrbq_gather_offset_z_u32): Likewise.
(vldrbq_gather_offset_z_s8): Likewise.
(vldrbq_z_s16): Likewise.
(vldrbq_z_u8): Likewise.
(vldrbq_z_s8): Likewise.
(vldrbq_z_s32): Likewise.
(vldrbq_z_u16): Likewise.
(vldrbq_z_u32): Likewise.
(vldrwq_gather_base_z_u32): Likewise.
(vldrwq_gather_base_z_s32): Likewise.
(__arm_vldrbq_gather_offset_z_s8): Define intrinsic.
(__arm_vldrbq_gather_offset_z_s32): Likewise.
(__arm_vldrbq_gather_offset_z_s16): Likewise.
(__arm_vldrbq_gather_offset_z_u8): Likewise.
(__arm_vldrbq_gather_offset_z_u32): Likewise.
(__arm_vldrbq_gather_offset_z_u16): Likewise.
(__arm_vldrbq_z_s8): Likewise.
(__arm_vldrbq_z_s32): Likewise.
(__arm_vldrbq_z_s16): Likewise.
(__arm_vldrbq_z_u8): Likewise.
(__arm_vldrbq_z_u32): Likewise.
(__arm_vldrbq_z_u16): Likewise.
(__arm_vldrwq_gather_base_z_s32): Likewise.
(__arm_vldrwq_gather_base_z_u32): Likewise.
(vldrbq_gather_offset_z): Define polymorphic variant.
* config/arm/arm_mve_builtins.def (LDRGBS_Z_QUALIFIERS): Use builtin
qualifier.
(LDRGBU_Z_QUALIFIERS): Likewise.
(LDRGS_Z_QUALIFIERS): Likewise.
(LDRGU_Z_QUALIFIERS): Likewise.
(LDRS_Z_QUALIFIERS): Likewise.
(LDRU_Z_QUALIFIERS): Likewise.
* config/arm/mve.md (mve_vldrbq_gather_offset_z_): Define
RTL pattern.
(mve_vldrbq_z_): Likewise.
(mve_vldrwq_gather_base_z_v4si): Likewise.

gcc/testsuite/ChangeLog:

2019-11-01  Andre Vieira  
Mihail Ionescu  
Srinath Parvathaneni  

* gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_s16.c: New test.
* gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrbq_z_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrbq_z_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrbq_z_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrbq_z_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrbq_z_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrbq_z_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_z_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_z_u32.c: Likewise.


### Attachment also inlined for ease of reply###


diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index c87fa3118510e4de90ac9afe08608fb2315f4809..c3deb9efc8849019141b6430543e93605fda4af4 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -677,6 +677,40 @@ arm_ldrgbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate};
 #define LDRGBU_QUALIFIERS (arm_ldrgbu_qualifiers)
 
+static enum arm_type_qualifiers
+arm_ldrgbs_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_unsigned, qualifier_immediate,
+  qualifier_unsigned};

[PATCH v2][ARM][GCC][6/5x]: Remaining MVE load intrinsics which loads half word and word or double word from memory.

2020-03-18 Thread Srinath Parvathaneni
Hello Kyrill,

Following patch is the rebased version of v1.
(version v1) https://gcc.gnu.org/pipermail/gcc-patches/2019-November/534354.html



Hello,

This patch supports the remaining MVE ACLE load intrinsics, which load a
halfword, word, or double word from memory.

vldrdq_gather_base_s64, vldrdq_gather_base_u64, vldrdq_gather_base_z_s64,
vldrdq_gather_base_z_u64, vldrdq_gather_offset_s64, vldrdq_gather_offset_u64,
vldrdq_gather_offset_z_s64, vldrdq_gather_offset_z_u64, 
vldrdq_gather_shifted_offset_s64,
vldrdq_gather_shifted_offset_u64, vldrdq_gather_shifted_offset_z_s64,
vldrdq_gather_shifted_offset_z_u64, vldrhq_gather_offset_f16, 
vldrhq_gather_offset_z_f16,
vldrhq_gather_shifted_offset_f16, vldrhq_gather_shifted_offset_z_f16, 
vldrwq_gather_base_f32,
vldrwq_gather_base_z_f32, vldrwq_gather_offset_f32, vldrwq_gather_offset_s32,
vldrwq_gather_offset_u32, vldrwq_gather_offset_z_f32, 
vldrwq_gather_offset_z_s32,
vldrwq_gather_offset_z_u32, vldrwq_gather_shifted_offset_f32, 
vldrwq_gather_shifted_offset_s32,
vldrwq_gather_shifted_offset_u32, vldrwq_gather_shifted_offset_z_f32,
vldrwq_gather_shifted_offset_z_s32, vldrwq_gather_shifted_offset_z_u32.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more 
details.
[1]  
https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-11-01  Andre Vieira  
Mihail Ionescu  
Srinath Parvathaneni  

* config/arm/arm_mve.h (vldrdq_gather_base_s64): Define macro.
(vldrdq_gather_base_u64): Likewise.
(vldrdq_gather_base_z_s64): Likewise.
(vldrdq_gather_base_z_u64): Likewise.
(vldrdq_gather_offset_s64): Likewise.
(vldrdq_gather_offset_u64): Likewise.
(vldrdq_gather_offset_z_s64): Likewise.
(vldrdq_gather_offset_z_u64): Likewise.
(vldrdq_gather_shifted_offset_s64): Likewise.
(vldrdq_gather_shifted_offset_u64): Likewise.
(vldrdq_gather_shifted_offset_z_s64): Likewise.
(vldrdq_gather_shifted_offset_z_u64): Likewise.
(vldrhq_gather_offset_f16): Likewise.
(vldrhq_gather_offset_z_f16): Likewise.
(vldrhq_gather_shifted_offset_f16): Likewise.
(vldrhq_gather_shifted_offset_z_f16): Likewise.
(vldrwq_gather_base_f32): Likewise.
(vldrwq_gather_base_z_f32): Likewise.
(vldrwq_gather_offset_f32): Likewise.
(vldrwq_gather_offset_s32): Likewise.
(vldrwq_gather_offset_u32): Likewise.
(vldrwq_gather_offset_z_f32): Likewise.
(vldrwq_gather_offset_z_s32): Likewise.
(vldrwq_gather_offset_z_u32): Likewise.
(vldrwq_gather_shifted_offset_f32): Likewise.
(vldrwq_gather_shifted_offset_s32): Likewise.
(vldrwq_gather_shifted_offset_u32): Likewise.
(vldrwq_gather_shifted_offset_z_f32): Likewise.
(vldrwq_gather_shifted_offset_z_s32): Likewise.
(vldrwq_gather_shifted_offset_z_u32): Likewise.
(__arm_vldrdq_gather_base_s64): Define intrinsic.
(__arm_vldrdq_gather_base_u64): Likewise.
(__arm_vldrdq_gather_base_z_s64): Likewise.
(__arm_vldrdq_gather_base_z_u64): Likewise.
(__arm_vldrdq_gather_offset_s64): Likewise.
(__arm_vldrdq_gather_offset_u64): Likewise.
(__arm_vldrdq_gather_offset_z_s64): Likewise.
(__arm_vldrdq_gather_offset_z_u64): Likewise.
(__arm_vldrdq_gather_shifted_offset_s64): Likewise.
(__arm_vldrdq_gather_shifted_offset_u64): Likewise.
(__arm_vldrdq_gather_shifted_offset_z_s64): Likewise.
(__arm_vldrdq_gather_shifted_offset_z_u64): Likewise.
(__arm_vldrwq_gather_offset_s32): Likewise.
(__arm_vldrwq_gather_offset_u32): Likewise.
(__arm_vldrwq_gather_offset_z_s32): Likewise.
(__arm_vldrwq_gather_offset_z_u32): Likewise.
(__arm_vldrwq_gather_shifted_offset_s32): Likewise.
(__arm_vldrwq_gather_shifted_offset_u32): Likewise.
(__arm_vldrwq_gather_shifted_offset_z_s32): Likewise.
(__arm_vldrwq_gather_shifted_offset_z_u32): Likewise.
(__arm_vldrhq_gather_offset_f16): Likewise.
(__arm_vldrhq_gather_offset_z_f16): Likewise.
(__arm_vldrhq_gather_shifted_offset_f16): Likewise.
(__arm_vldrhq_gather_shifted_offset_z_f16): Likewise.
(__arm_vldrwq_gather_base_f32): Likewise.
(__arm_vldrwq_gather_base_z_f32): Likewise.
(__arm_vldrwq_gather_offset_f32): Likewise.
(__arm_vldrwq_gather_offset_z_f32): Likewise.
(__arm_vldrwq_gather_shifted_offset_f32): Likewise.
(__arm_vldrwq_gather_shifted_offset_z_f32): Likewise.
(vldrhq_gather_offset): Define polymorphic variant.
(vldrhq_gather_offset_z): Likewise.
(vldrhq_gather_shifted_offset): Likewise.
(vldrhq_gather_shifted_offset_z): Likewise.

[PATCH v2][ARM][GCC][8/5x]: Remaining MVE store intrinsics which store a halfword, word or double word to memory.

2020-03-18 Thread Srinath Parvathaneni
Hello Kyrill,

The following patch is the rebased version of v1.
(version v1) https://gcc.gnu.org/pipermail/gcc-patches/2019-November/534340.html



Hello,

This patch supports the remaining MVE ACLE store intrinsics, which store a
halfword, word or double word to memory.

vstrdq_scatter_base_p_s64, vstrdq_scatter_base_p_u64, vstrdq_scatter_base_s64,
vstrdq_scatter_base_u64, vstrdq_scatter_offset_p_s64, 
vstrdq_scatter_offset_p_u64,
vstrdq_scatter_offset_s64, vstrdq_scatter_offset_u64, 
vstrdq_scatter_shifted_offset_p_s64,
vstrdq_scatter_shifted_offset_p_u64, vstrdq_scatter_shifted_offset_s64,
vstrdq_scatter_shifted_offset_u64, vstrhq_scatter_offset_f16, 
vstrhq_scatter_offset_p_f16,
vstrhq_scatter_shifted_offset_f16, vstrhq_scatter_shifted_offset_p_f16,
vstrwq_scatter_base_f32, vstrwq_scatter_base_p_f32, vstrwq_scatter_offset_f32,
vstrwq_scatter_offset_p_f32, vstrwq_scatter_offset_p_s32, 
vstrwq_scatter_offset_p_u32,
vstrwq_scatter_offset_s32, vstrwq_scatter_offset_u32, 
vstrwq_scatter_shifted_offset_f32,
vstrwq_scatter_shifted_offset_p_f32, vstrwq_scatter_shifted_offset_p_s32,
vstrwq_scatter_shifted_offset_p_u32, vstrwq_scatter_shifted_offset_s32,
vstrwq_scatter_shifted_offset_u32.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more 
details.
[1]  
https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

In this patch a new predicate "Ri" is defined to check that the immediate is in
the range +/-1016 and is a multiple of 8.

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-11-05  Andre Vieira  
Mihail Ionescu  
Srinath Parvathaneni  

* config/arm/arm_mve.h (vstrdq_scatter_base_p_s64): Define macro.
(vstrdq_scatter_base_p_u64): Likewise.
(vstrdq_scatter_base_s64): Likewise.
(vstrdq_scatter_base_u64): Likewise.
(vstrdq_scatter_offset_p_s64): Likewise.
(vstrdq_scatter_offset_p_u64): Likewise.
(vstrdq_scatter_offset_s64): Likewise.
(vstrdq_scatter_offset_u64): Likewise.
(vstrdq_scatter_shifted_offset_p_s64): Likewise.
(vstrdq_scatter_shifted_offset_p_u64): Likewise.
(vstrdq_scatter_shifted_offset_s64): Likewise.
(vstrdq_scatter_shifted_offset_u64): Likewise.
(vstrhq_scatter_offset_f16): Likewise.
(vstrhq_scatter_offset_p_f16): Likewise.
(vstrhq_scatter_shifted_offset_f16): Likewise.
(vstrhq_scatter_shifted_offset_p_f16): Likewise.
(vstrwq_scatter_base_f32): Likewise.
(vstrwq_scatter_base_p_f32): Likewise.
(vstrwq_scatter_offset_f32): Likewise.
(vstrwq_scatter_offset_p_f32): Likewise.
(vstrwq_scatter_offset_p_s32): Likewise.
(vstrwq_scatter_offset_p_u32): Likewise.
(vstrwq_scatter_offset_s32): Likewise.
(vstrwq_scatter_offset_u32): Likewise.
(vstrwq_scatter_shifted_offset_f32): Likewise.
(vstrwq_scatter_shifted_offset_p_f32): Likewise.
(vstrwq_scatter_shifted_offset_p_s32): Likewise.
(vstrwq_scatter_shifted_offset_p_u32): Likewise.
(vstrwq_scatter_shifted_offset_s32): Likewise.
(vstrwq_scatter_shifted_offset_u32): Likewise.
(__arm_vstrdq_scatter_base_p_s64): Define intrinsic.
(__arm_vstrdq_scatter_base_p_u64): Likewise.
(__arm_vstrdq_scatter_base_s64): Likewise.
(__arm_vstrdq_scatter_base_u64): Likewise.
(__arm_vstrdq_scatter_offset_p_s64): Likewise.
(__arm_vstrdq_scatter_offset_p_u64): Likewise.
(__arm_vstrdq_scatter_offset_s64): Likewise.
(__arm_vstrdq_scatter_offset_u64): Likewise.
(__arm_vstrdq_scatter_shifted_offset_p_s64): Likewise.
(__arm_vstrdq_scatter_shifted_offset_p_u64): Likewise.
(__arm_vstrdq_scatter_shifted_offset_s64): Likewise.
(__arm_vstrdq_scatter_shifted_offset_u64): Likewise.
(__arm_vstrwq_scatter_offset_p_s32): Likewise.
(__arm_vstrwq_scatter_offset_p_u32): Likewise.
(__arm_vstrwq_scatter_offset_s32): Likewise.
(__arm_vstrwq_scatter_offset_u32): Likewise.
(__arm_vstrwq_scatter_shifted_offset_p_s32): Likewise.
(__arm_vstrwq_scatter_shifted_offset_p_u32): Likewise.
(__arm_vstrwq_scatter_shifted_offset_s32): Likewise.
(__arm_vstrwq_scatter_shifted_offset_u32): Likewise.
(__arm_vstrhq_scatter_offset_f16): Likewise.
(__arm_vstrhq_scatter_offset_p_f16): Likewise.
(__arm_vstrhq_scatter_shifted_offset_f16): Likewise.
(__arm_vstrhq_scatter_shifted_offset_p_f16): Likewise.
(__arm_vstrwq_scatter_base_f32): Likewise.
(__arm_vstrwq_scatter_base_p_f32): Likewise.
(__arm_vstrwq_scatter_offset_f32): Likewise.
(__arm_vstrwq_scatter_offset_p_f32): Likewise.
(__arm_vstrwq_scatter_shifted_offset_f32): Likewise.
(__arm_vstrwq_scatter_shifted_offset_p_f32): Likewise.

[PATCH v2][ARM][GCC][7/5x]: MVE store intrinsics which store a byte, halfword or word to memory.

2020-03-18 Thread Srinath Parvathaneni
Hello Kyrill,

The following patch is the rebased version of v1.
(version v1) https://gcc.gnu.org/pipermail/gcc-patches/2019-November/534335.html



Hello,

This patch supports the following MVE ACLE store intrinsics, which store a
byte, halfword, or word to memory.

vst1q_f32, vst1q_f16, vst1q_s8, vst1q_s32, vst1q_s16, vst1q_u8, vst1q_u32, 
vst1q_u16,
vstrhq_f16, vstrhq_scatter_offset_s32, vstrhq_scatter_offset_s16, 
vstrhq_scatter_offset_u32,
vstrhq_scatter_offset_u16, vstrhq_scatter_offset_p_s32, 
vstrhq_scatter_offset_p_s16,
vstrhq_scatter_offset_p_u32, vstrhq_scatter_offset_p_u16, 
vstrhq_scatter_shifted_offset_s32,
vstrhq_scatter_shifted_offset_s16, vstrhq_scatter_shifted_offset_u32,
vstrhq_scatter_shifted_offset_u16, vstrhq_scatter_shifted_offset_p_s32,
vstrhq_scatter_shifted_offset_p_s16, vstrhq_scatter_shifted_offset_p_u32,
vstrhq_scatter_shifted_offset_p_u16, vstrhq_s32, vstrhq_s16, vstrhq_u32, 
vstrhq_u16,
vstrhq_p_f16, vstrhq_p_s32, vstrhq_p_s16, vstrhq_p_u32, vstrhq_p_u16, 
vstrwq_f32,
vstrwq_s32, vstrwq_u32, vstrwq_p_f32, vstrwq_p_s32, vstrwq_p_u32.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more 
details.
[1]  
https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-11-01  Andre Vieira  
Mihail Ionescu  
Srinath Parvathaneni  

* config/arm/arm_mve.h (vst1q_f32): Define macro.
(vst1q_f16): Likewise.
(vst1q_s8): Likewise.
(vst1q_s32): Likewise.
(vst1q_s16): Likewise.
(vst1q_u8): Likewise.
(vst1q_u32): Likewise.
(vst1q_u16): Likewise.
(vstrhq_f16): Likewise.
(vstrhq_scatter_offset_s32): Likewise.
(vstrhq_scatter_offset_s16): Likewise.
(vstrhq_scatter_offset_u32): Likewise.
(vstrhq_scatter_offset_u16): Likewise.
(vstrhq_scatter_offset_p_s32): Likewise.
(vstrhq_scatter_offset_p_s16): Likewise.
(vstrhq_scatter_offset_p_u32): Likewise.
(vstrhq_scatter_offset_p_u16): Likewise.
(vstrhq_scatter_shifted_offset_s32): Likewise.
(vstrhq_scatter_shifted_offset_s16): Likewise.
(vstrhq_scatter_shifted_offset_u32): Likewise.
(vstrhq_scatter_shifted_offset_u16): Likewise.
(vstrhq_scatter_shifted_offset_p_s32): Likewise.
(vstrhq_scatter_shifted_offset_p_s16): Likewise.
(vstrhq_scatter_shifted_offset_p_u32): Likewise.
(vstrhq_scatter_shifted_offset_p_u16): Likewise.
(vstrhq_s32): Likewise.
(vstrhq_s16): Likewise.
(vstrhq_u32): Likewise.
(vstrhq_u16): Likewise.
(vstrhq_p_f16): Likewise.
(vstrhq_p_s32): Likewise.
(vstrhq_p_s16): Likewise.
(vstrhq_p_u32): Likewise.
(vstrhq_p_u16): Likewise.
(vstrwq_f32): Likewise.
(vstrwq_s32): Likewise.
(vstrwq_u32): Likewise.
(vstrwq_p_f32): Likewise.
(vstrwq_p_s32): Likewise.
(vstrwq_p_u32): Likewise.
(__arm_vst1q_s8): Define intrinsic.
(__arm_vst1q_s32): Likewise.
(__arm_vst1q_s16): Likewise.
(__arm_vst1q_u8): Likewise.
(__arm_vst1q_u32): Likewise.
(__arm_vst1q_u16): Likewise.
(__arm_vstrhq_scatter_offset_s32): Likewise.
(__arm_vstrhq_scatter_offset_s16): Likewise.
(__arm_vstrhq_scatter_offset_u32): Likewise.
(__arm_vstrhq_scatter_offset_u16): Likewise.
(__arm_vstrhq_scatter_offset_p_s32): Likewise.
(__arm_vstrhq_scatter_offset_p_s16): Likewise.
(__arm_vstrhq_scatter_offset_p_u32): Likewise.
(__arm_vstrhq_scatter_offset_p_u16): Likewise.
(__arm_vstrhq_scatter_shifted_offset_s32): Likewise.
(__arm_vstrhq_scatter_shifted_offset_s16): Likewise.
(__arm_vstrhq_scatter_shifted_offset_u32): Likewise.
(__arm_vstrhq_scatter_shifted_offset_u16): Likewise.
(__arm_vstrhq_scatter_shifted_offset_p_s32): Likewise.
(__arm_vstrhq_scatter_shifted_offset_p_s16): Likewise.
(__arm_vstrhq_scatter_shifted_offset_p_u32): Likewise.
(__arm_vstrhq_scatter_shifted_offset_p_u16): Likewise.
(__arm_vstrhq_s32): Likewise.
(__arm_vstrhq_s16): Likewise.
(__arm_vstrhq_u32): Likewise.
(__arm_vstrhq_u16): Likewise.
(__arm_vstrhq_p_s32): Likewise.
(__arm_vstrhq_p_s16): Likewise.
(__arm_vstrhq_p_u32): Likewise.
(__arm_vstrhq_p_u16): Likewise.
(__arm_vstrwq_s32): Likewise.
(__arm_vstrwq_u32): Likewise.
(__arm_vstrwq_p_s32): Likewise.
(__arm_vstrwq_p_u32): Likewise.
(__arm_vstrwq_p_f32): Likewise.
(__arm_vstrwq_f32): Likewise.
(__arm_vst1q_f32): Likewise.
(__arm_vst1q_f16): Likewise.
(__arm_vstrhq_f16): Likewise.
(__arm_vstrhq_p_f16): Likewise.
(vst1q): Define polymorphic variant.
  

[PATCH v2][ARM][GCC][5/5x]: MVE ACLE load intrinsics which load a byte, halfword, or word from memory.

2020-03-18 Thread Srinath Parvathaneni
Hello Kyrill,

The following patch is the rebased version of v1.
(version v1) https://gcc.gnu.org/pipermail/gcc-patches/2019-November/534352.html



Hello,

This patch supports the following MVE ACLE load intrinsics which load a byte, 
halfword,
or word from memory.
vld1q_s8, vld1q_s32, vld1q_s16, vld1q_u8, vld1q_u32, vld1q_u16, 
vldrhq_gather_offset_s32,
vldrhq_gather_offset_s16, vldrhq_gather_offset_u32, vldrhq_gather_offset_u16,
vldrhq_gather_offset_z_s32, vldrhq_gather_offset_z_s16, 
vldrhq_gather_offset_z_u32,
vldrhq_gather_offset_z_u16, vldrhq_gather_shifted_offset_s32,vldrwq_f32, 
vldrwq_z_f32,
vldrhq_gather_shifted_offset_s16, vldrhq_gather_shifted_offset_u32,
vldrhq_gather_shifted_offset_u16, vldrhq_gather_shifted_offset_z_s32,
vldrhq_gather_shifted_offset_z_s16, vldrhq_gather_shifted_offset_z_u32,
vldrhq_gather_shifted_offset_z_u16, vldrhq_s32, vldrhq_s16, vldrhq_u32, 
vldrhq_u16,
vldrhq_z_s32, vldrhq_z_s16, vldrhq_z_u32, vldrhq_z_u16, vldrwq_s32, vldrwq_u32,
vldrwq_z_s32, vldrwq_z_u32, vld1q_f32, vld1q_f16, vldrhq_f16, vldrhq_z_f16.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more 
details.
[1]  
https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-11-01  Andre Vieira  
Mihail Ionescu  
Srinath Parvathaneni  

* config/arm/arm_mve.h (vld1q_s8): Define macro.
(vld1q_s32): Likewise.
(vld1q_s16): Likewise.
(vld1q_u8): Likewise.
(vld1q_u32): Likewise.
(vld1q_u16): Likewise.
(vldrhq_gather_offset_s32): Likewise.
(vldrhq_gather_offset_s16): Likewise.
(vldrhq_gather_offset_u32): Likewise.
(vldrhq_gather_offset_u16): Likewise.
(vldrhq_gather_offset_z_s32): Likewise.
(vldrhq_gather_offset_z_s16): Likewise.
(vldrhq_gather_offset_z_u32): Likewise.
(vldrhq_gather_offset_z_u16): Likewise.
(vldrhq_gather_shifted_offset_s32): Likewise.
(vldrhq_gather_shifted_offset_s16): Likewise.
(vldrhq_gather_shifted_offset_u32): Likewise.
(vldrhq_gather_shifted_offset_u16): Likewise.
(vldrhq_gather_shifted_offset_z_s32): Likewise.
(vldrhq_gather_shifted_offset_z_s16): Likewise.
(vldrhq_gather_shifted_offset_z_u32): Likewise.
(vldrhq_gather_shifted_offset_z_u16): Likewise.
(vldrhq_s32): Likewise.
(vldrhq_s16): Likewise.
(vldrhq_u32): Likewise.
(vldrhq_u16): Likewise.
(vldrhq_z_s32): Likewise.
(vldrhq_z_s16): Likewise.
(vldrhq_z_u32): Likewise.
(vldrhq_z_u16): Likewise.
(vldrwq_s32): Likewise.
(vldrwq_u32): Likewise.
(vldrwq_z_s32): Likewise.
(vldrwq_z_u32): Likewise.
(vld1q_f32): Likewise.
(vld1q_f16): Likewise.
(vldrhq_f16): Likewise.
(vldrhq_z_f16): Likewise.
(vldrwq_f32): Likewise.
(vldrwq_z_f32): Likewise.
(__arm_vld1q_s8): Define intrinsic.
(__arm_vld1q_s32): Likewise.
(__arm_vld1q_s16): Likewise.
(__arm_vld1q_u8): Likewise.
(__arm_vld1q_u32): Likewise.
(__arm_vld1q_u16): Likewise.
(__arm_vldrhq_gather_offset_s32): Likewise.
(__arm_vldrhq_gather_offset_s16): Likewise.
(__arm_vldrhq_gather_offset_u32): Likewise.
(__arm_vldrhq_gather_offset_u16): Likewise.
(__arm_vldrhq_gather_offset_z_s32): Likewise.
(__arm_vldrhq_gather_offset_z_s16): Likewise.
(__arm_vldrhq_gather_offset_z_u32): Likewise.
(__arm_vldrhq_gather_offset_z_u16): Likewise.
(__arm_vldrhq_gather_shifted_offset_s32): Likewise.
(__arm_vldrhq_gather_shifted_offset_s16): Likewise.
(__arm_vldrhq_gather_shifted_offset_u32): Likewise.
(__arm_vldrhq_gather_shifted_offset_u16): Likewise.
(__arm_vldrhq_gather_shifted_offset_z_s32): Likewise.
(__arm_vldrhq_gather_shifted_offset_z_s16): Likewise.
(__arm_vldrhq_gather_shifted_offset_z_u32): Likewise.
(__arm_vldrhq_gather_shifted_offset_z_u16): Likewise.
(__arm_vldrhq_s32): Likewise.
(__arm_vldrhq_s16): Likewise.
(__arm_vldrhq_u32): Likewise.
(__arm_vldrhq_u16): Likewise.
(__arm_vldrhq_z_s32): Likewise.
(__arm_vldrhq_z_s16): Likewise.
(__arm_vldrhq_z_u32): Likewise.
(__arm_vldrhq_z_u16): Likewise.
(__arm_vldrwq_s32): Likewise.
(__arm_vldrwq_u32): Likewise.
(__arm_vldrwq_z_s32): Likewise.
(__arm_vldrwq_z_u32): Likewise.
(__arm_vld1q_f32): Likewise.
(__arm_vld1q_f16): Likewise.
(__arm_vldrwq_f32): Likewise.
(__arm_vldrwq_z_f32): Likewise.
(__arm_vldrhq_z_f16): Likewise.
(__arm_vldrhq_f16): Likewise.
(vld1q): Define polymorphic variant.
(vldrhq_gather_offset): Likewise.

[PATCH v2][ARM][GCC][3/5x]: MVE store intrinsics with predicated suffix.

2020-03-18 Thread Srinath Parvathaneni
Hello Kyrill,

The following patch is the rebased version of v1.
(version v1) https://gcc.gnu.org/pipermail/gcc-patches/2019-November/534337.html



Hello,

This patch supports the following MVE ACLE store intrinsics with predicated
suffix.

vstrbq_p_s8, vstrbq_p_s32, vstrbq_p_s16, vstrbq_p_u8, vstrbq_p_u32,
vstrbq_p_u16, vstrbq_scatter_offset_p_s8, vstrbq_scatter_offset_p_s32,
vstrbq_scatter_offset_p_s16, vstrbq_scatter_offset_p_u8,
vstrbq_scatter_offset_p_u32, vstrbq_scatter_offset_p_u16,
vstrwq_scatter_base_p_s32, vstrwq_scatter_base_p_u32.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more 
details.
[1]  
https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-11-01  Andre Vieira  
Mihail Ionescu  
Srinath Parvathaneni  

* config/arm/arm-builtins.c (STRS_P_QUALIFIERS): Define builtin
qualifier.
(STRU_P_QUALIFIERS): Likewise.
(STRSU_P_QUALIFIERS): Likewise.
(STRSS_P_QUALIFIERS): Likewise.
(STRSBS_P_QUALIFIERS): Likewise.
(STRSBU_P_QUALIFIERS): Likewise.
* config/arm/arm_mve.h (vstrbq_p_s8): Define macro.
(vstrbq_p_s32): Likewise.
(vstrbq_p_s16): Likewise.
(vstrbq_p_u8): Likewise.
(vstrbq_p_u32): Likewise.
(vstrbq_p_u16): Likewise.
(vstrbq_scatter_offset_p_s8): Likewise.
(vstrbq_scatter_offset_p_s32): Likewise.
(vstrbq_scatter_offset_p_s16): Likewise.
(vstrbq_scatter_offset_p_u8): Likewise.
(vstrbq_scatter_offset_p_u32): Likewise.
(vstrbq_scatter_offset_p_u16): Likewise.
(vstrwq_scatter_base_p_s32): Likewise.
(vstrwq_scatter_base_p_u32): Likewise.
(__arm_vstrbq_p_s8): Define intrinsic.
(__arm_vstrbq_p_s32): Likewise.
(__arm_vstrbq_p_s16): Likewise.
(__arm_vstrbq_p_u8): Likewise.
(__arm_vstrbq_p_u32): Likewise.
(__arm_vstrbq_p_u16): Likewise.
(__arm_vstrbq_scatter_offset_p_s8): Likewise.
(__arm_vstrbq_scatter_offset_p_s32): Likewise.
(__arm_vstrbq_scatter_offset_p_s16): Likewise.
(__arm_vstrbq_scatter_offset_p_u8): Likewise.
(__arm_vstrbq_scatter_offset_p_u32): Likewise.
(__arm_vstrbq_scatter_offset_p_u16): Likewise.
(__arm_vstrwq_scatter_base_p_s32): Likewise.
(__arm_vstrwq_scatter_base_p_u32): Likewise.
(vstrbq_p): Define polymorphic variant.
(vstrbq_scatter_offset_p): Likewise.
(vstrwq_scatter_base_p): Likewise.
* config/arm/arm_mve_builtins.def (STRS_P_QUALIFIERS): Use builtin
qualifier.
(STRU_P_QUALIFIERS): Likewise.
(STRSU_P_QUALIFIERS): Likewise.
(STRSS_P_QUALIFIERS): Likewise.
(STRSBS_P_QUALIFIERS): Likewise.
(STRSBU_P_QUALIFIERS): Likewise.
* config/arm/mve.md (mve_vstrbq_scatter_offset_p_): Define
RTL pattern.
(mve_vstrwq_scatter_base_p_v4si): Likewise.
(mve_vstrbq_p_): Likewise.

gcc/testsuite/ChangeLog:

2019-11-01  Andre Vieira  
Mihail Ionescu  
Srinath Parvathaneni  

* gcc.target/arm/mve/intrinsics/vstrbq_p_s16.c: New test.
* gcc.target/arm/mve/intrinsics/vstrbq_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrbq_p_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrbq_p_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrbq_p_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrbq_p_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_u32.c: Likewise.


### Attachment also inlined for ease of reply###


diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 
aced55f52d317e8deafdc6a6804db3b80c00fd80..c87fa3118510e4de90ac9afe08608fb2315f4809
 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -613,6 +613,41 @@ arm_strsbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define STRSBU_QUALIFIERS (arm_strsbu_qualifiers)
 
 static enum arm_type_qualifiers
+arm_strs_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_pointer, qualifier_none, qualifier_unsigned};
+#define STRS_P_QUALIFIERS (arm_strs_p_qualifiers)
+
+static enum arm_type_qualifiers

[PATCH v2][ARM][GCC][1/5x]: MVE store intrinsics.

2020-03-18 Thread Srinath Parvathaneni
Hello Kyrill,

The following patch is the rebased version of v1.
(version v1) https://gcc.gnu.org/pipermail/gcc-patches/2019-November/534334.html



Hello,

This patch supports the following MVE ACLE store intrinsics.

vstrbq_scatter_offset_s8, vstrbq_scatter_offset_s32, vstrbq_scatter_offset_s16,
vstrbq_scatter_offset_u8, vstrbq_scatter_offset_u32, vstrbq_scatter_offset_u16,
vstrbq_s8, vstrbq_s32, vstrbq_s16, vstrbq_u8, vstrbq_u32, vstrbq_u16,
vstrwq_scatter_base_s32, vstrwq_scatter_base_u32.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more 
details.
[1]  
https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-11-01  Andre Vieira  
Mihail Ionescu  
Srinath Parvathaneni  

* config/arm/arm-builtins.c (STRS_QUALIFIERS): Define builtin qualifier.
(STRU_QUALIFIERS): Likewise.
(STRSS_QUALIFIERS): Likewise.
(STRSU_QUALIFIERS): Likewise.
(STRSBS_QUALIFIERS): Likewise.
(STRSBU_QUALIFIERS): Likewise.
* config/arm/arm_mve.h (vstrbq_s8): Define macro.
(vstrbq_u8): Likewise.
(vstrbq_u16): Likewise.
(vstrbq_scatter_offset_s8): Likewise.
(vstrbq_scatter_offset_u8): Likewise.
(vstrbq_scatter_offset_u16): Likewise.
(vstrbq_s16): Likewise.
(vstrbq_u32): Likewise.
(vstrbq_scatter_offset_s16): Likewise.
(vstrbq_scatter_offset_u32): Likewise.
(vstrbq_s32): Likewise.
(vstrbq_scatter_offset_s32): Likewise.
(vstrwq_scatter_base_s32): Likewise.
(vstrwq_scatter_base_u32): Likewise.
(__arm_vstrbq_scatter_offset_s8): Define intrinsic.
(__arm_vstrbq_scatter_offset_s32): Likewise.
(__arm_vstrbq_scatter_offset_s16): Likewise.
(__arm_vstrbq_scatter_offset_u8): Likewise.
(__arm_vstrbq_scatter_offset_u32): Likewise.
(__arm_vstrbq_scatter_offset_u16): Likewise.
(__arm_vstrbq_s8): Likewise.
(__arm_vstrbq_s32): Likewise.
(__arm_vstrbq_s16): Likewise.
(__arm_vstrbq_u8): Likewise.
(__arm_vstrbq_u32): Likewise.
(__arm_vstrbq_u16): Likewise.
(__arm_vstrwq_scatter_base_s32): Likewise.
(__arm_vstrwq_scatter_base_u32): Likewise.
(vstrbq): Define polymorphic variant.
(vstrbq_scatter_offset): Likewise.
(vstrwq_scatter_base): Likewise.
* config/arm/arm_mve_builtins.def (STRS_QUALIFIERS): Use builtin
qualifier.
(STRU_QUALIFIERS): Likewise.
(STRSS_QUALIFIERS): Likewise.
(STRSU_QUALIFIERS): Likewise.
(STRSBS_QUALIFIERS): Likewise.
(STRSBU_QUALIFIERS): Likewise.
* config/arm/mve.md (MVE_B_ELEM): Define mode attribute iterator.
(VSTRWSBQ): Define iterators.
(VSTRBSOQ): Likewise. 
(VSTRBQ): Likewise.
(mve_vstrbq_): Define RTL pattern.
(mve_vstrbq_scatter_offset_): Likewise.
(mve_vstrwq_scatter_base_v4si): Likewise.

gcc/testsuite/ChangeLog:

2019-11-01  Andre Vieira  
Mihail Ionescu  
Srinath Parvathaneni  

* gcc.target/arm/mve/intrinsics/vstrbq_s16.c: New test.
* gcc.target/arm/mve/intrinsics/vstrbq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrbq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrbq_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrbq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrbq_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_u32.c: Likewise.


### Attachment also inlined for ease of reply###


diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 
26f0379f62b95886414d2eb4d7c6a6c4fc235e60..b285f074285116ce621e324b644d43efb6538b9d
 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -579,6 +579,39 @@ 
arm_quadop_unone_unone_unone_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define QUADOP_UNONE_UNONE_UNONE_NONE_UNONE_QUALIFIERS \
   (arm_quadop_unone_unone_unone_none_unone_qualifiers)
 
+static enum arm_type_qualifiers
+arm_strs_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_pointer, qualifier_none };
+#define STRS_QUALIFIERS (arm_strs_qualifiers)
+
+static enum 

[PATCH v2][ARM][GCC][2/5x]: MVE load intrinsics.

2020-03-18 Thread Srinath Parvathaneni
Hello Kyrill,

The following patch is the rebased version of v1.
(version v1) https://gcc.gnu.org/pipermail/gcc-patches/2019-November/534338.html



Hello,

This patch supports the following MVE ACLE load intrinsics.

vldrbq_gather_offset_u8, vldrbq_gather_offset_s8, vldrbq_s8, vldrbq_u8,
vldrbq_gather_offset_u16, vldrbq_gather_offset_s16, vldrbq_s16, vldrbq_u16,
vldrbq_gather_offset_u32, vldrbq_gather_offset_s32, vldrbq_s32, vldrbq_u32,
vldrwq_gather_base_s32, vldrwq_gather_base_u32.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more 
details.
[1]  
https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-11-01  Andre Vieira  
Mihail Ionescu  
Srinath Parvathaneni  

* config/arm/arm-builtins.c (LDRGU_QUALIFIERS): Define builtin
qualifier.
(LDRGS_QUALIFIERS): Likewise.
(LDRS_QUALIFIERS): Likewise.
(LDRU_QUALIFIERS): Likewise.
(LDRGBS_QUALIFIERS): Likewise.
(LDRGBU_QUALIFIERS): Likewise.
* config/arm/arm_mve.h (vldrbq_gather_offset_u8): Define macro.
(vldrbq_gather_offset_s8): Likewise.
(vldrbq_s8): Likewise.
(vldrbq_u8): Likewise.
(vldrbq_gather_offset_u16): Likewise.
(vldrbq_gather_offset_s16): Likewise.
(vldrbq_s16): Likewise.
(vldrbq_u16): Likewise.
(vldrbq_gather_offset_u32): Likewise.
(vldrbq_gather_offset_s32): Likewise.
(vldrbq_s32): Likewise.
(vldrbq_u32): Likewise.
(vldrwq_gather_base_s32): Likewise.
(vldrwq_gather_base_u32): Likewise.
(__arm_vldrbq_gather_offset_u8): Define intrinsic.
(__arm_vldrbq_gather_offset_s8): Likewise.
(__arm_vldrbq_s8): Likewise.
(__arm_vldrbq_u8): Likewise.
(__arm_vldrbq_gather_offset_u16): Likewise.
(__arm_vldrbq_gather_offset_s16): Likewise.
(__arm_vldrbq_s16): Likewise.
(__arm_vldrbq_u16): Likewise.
(__arm_vldrbq_gather_offset_u32): Likewise.
(__arm_vldrbq_gather_offset_s32): Likewise.
(__arm_vldrbq_s32): Likewise.
(__arm_vldrbq_u32): Likewise.
(__arm_vldrwq_gather_base_s32): Likewise.
(__arm_vldrwq_gather_base_u32): Likewise.
(vldrbq_gather_offset): Define polymorphic variant.
* config/arm/arm_mve_builtins.def (LDRGU_QUALIFIERS): Use builtin
qualifier.
(LDRGS_QUALIFIERS): Likewise.
(LDRS_QUALIFIERS): Likewise.
(LDRU_QUALIFIERS): Likewise.
(LDRGBS_QUALIFIERS): Likewise.
(LDRGBU_QUALIFIERS): Likewise.
* config/arm/mve.md (VLDRBGOQ): Define iterator.
(VLDRBQ): Likewise. 
(VLDRWGBQ): Likewise.
(mve_vldrbq_gather_offset_): Define RTL pattern.
(mve_vldrbq_): Likewise.
(mve_vldrwq_gather_base_v4si): Likewise.

gcc/testsuite/ChangeLog:

2019-11-01  Andre Vieira  
Mihail Ionescu  
Srinath Parvathaneni  

* gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_s16.c: New test.
* gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrbq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrbq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrbq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrbq_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrbq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrbq_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_u32.c: Likewise.


### Attachment also inlined for ease of reply###


diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 
b285f074285116ce621e324b644d43efb6538b9d..aced55f52d317e8deafdc6a6804db3b80c00fd80
 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -612,6 +612,36 @@ arm_strsbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   qualifier_unsigned};
 #define STRSBU_QUALIFIERS (arm_strsbu_qualifiers)
 
+static enum arm_type_qualifiers
+arm_ldrgu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_pointer, qualifier_unsigned};
+#define LDRGU_QUALIFIERS (arm_ldrgu_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ldrgs_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_pointer, qualifier_unsigned};
+#define LDRGS_QUALIFIERS (arm_ldrgs_qualifiers)
+
+static enum arm_type_qualifiers

Re: [PATCH] middle-end/94188 fix fold of addr expression generation

2020-03-18 Thread Maxim Kuvyrkov via Gcc-patches


> On 17 Mar 2020, at 17:40, Richard Biener  wrote:
> 
> 
> This adds a missing type conversion to build_fold_addr_expr and adjusts
> fallout - build_fold_addr_expr was used as a convenience to build an
> ADDR_EXPR but some callers do not expect the result to be simplified
> to something else.
> 
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> 
> This is the 3rd or 4th attempt and I hope to have caught all fallout 
> with this.  I think it's inevitable we fix the mistake in
> build_fold_addr_expr.
> 
> Richard.
> 
> 2020-03-17  Richard Biener  
> 
>   PR middle-end/94188
>   * fold-const.c (build_fold_addr_expr): Convert address to
>   correct type.
>   * asan.c (maybe_create_ssa_name): Strip useless type conversions.
>   * gimple-fold.c (gimple_fold_stmt_to_constant_1): Use build1
>   to build the ADDR_EXPR which we don't really want to simplify.
>   * tree-ssa-dom.c (record_equivalences_from_stmt): Likewise.
>   * tree-ssa-loop-im.c (gather_mem_refs_stmt): Likewise.
>   * tree-ssa-forwprop.c (forward_propagate_addr_expr_1): Likewise.
>   (simplify_builtin_call): Strip useless type conversions.
>   * tree-ssa-strlen.c (new_strinfo): Likewise.
> 
>   * gcc.dg/pr94188.c: New testcase.

Hi Richard,

This breaks the Linux kernel build on 32-bit ARM:

00:01:29 ./include/linux/string.h:333:9: internal compiler error: in gen_movsi, 
at config/arm/arm.md:6291
00:01:29 make[2]: *** [sound/drivers/serial-u16550.o] Error 1

Would you please investigate?  Let me know if you need any help reproducing the 
problem.

Kernel’s build line is (assuming cross-compilation):
make CC=/path/to/arm-linux-gnueabihf-gcc ARCH=arm 
CROSS_COMPILE=arm-linux-gnueabihf- HOSTCC=gcc allyesconfig

Regards,

--
Maxim Kuvyrkov
https://www.linaro.org




RE: [PATCH v2][ARM][GCC][4/4x]: MVE intrinsics with quaternary operands.

2020-03-18 Thread Kyrylo Tkachov
Hi Srinath,

> -Original Message-
> From: Srinath Parvathaneni 
> Sent: 18 March 2020 11:32
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov 
> Subject: [PATCH v2][ARM][GCC][4/4x]: MVE intrinsics with quaternary
> operands.
> 
> Hello Kyrill,
> 
> Following patch is the rebased version of v1.
> (version v1) https://gcc.gnu.org/pipermail/gcc-patches/2019-
> November/534345.html
> 
> 
> 
> Hello,
> 
> This patch supports following MVE ACLE intrinsics with quaternary operands.
> 
> vabdq_m_f32, vabdq_m_f16, vaddq_m_f32, vaddq_m_f16, vaddq_m_n_f32,
> vaddq_m_n_f16, vandq_m_f32, vandq_m_f16, vbicq_m_f32, vbicq_m_f16,
> vbrsrq_m_n_f32, vbrsrq_m_n_f16, vcaddq_rot270_m_f32,
> vcaddq_rot270_m_f16, vcaddq_rot90_m_f32, vcaddq_rot90_m_f16,
> vcmlaq_m_f32, vcmlaq_m_f16, vcmlaq_rot180_m_f32,
> vcmlaq_rot180_m_f16, vcmlaq_rot270_m_f32, vcmlaq_rot270_m_f16,
> vcmlaq_rot90_m_f32, vcmlaq_rot90_m_f16, vcmulq_m_f32, vcmulq_m_f16,
> vcmulq_rot180_m_f32, vcmulq_rot180_m_f16, vcmulq_rot270_m_f32,
> vcmulq_rot270_m_f16, vcmulq_rot90_m_f32, vcmulq_rot90_m_f16,
> vcvtq_m_n_s32_f32, vcvtq_m_n_s16_f16, vcvtq_m_n_u32_f32,
> vcvtq_m_n_u16_f16, veorq_m_f32, veorq_m_f16, vfmaq_m_f32,
> vfmaq_m_f16, vfmaq_m_n_f32, vfmaq_m_n_f16, vfmasq_m_n_f32,
> vfmasq_m_n_f16, vfmsq_m_f32, vfmsq_m_f16, vmaxnmq_m_f32,
> vmaxnmq_m_f16, vminnmq_m_f32, vminnmq_m_f16, vmulq_m_f32,
> vmulq_m_f16, vmulq_m_n_f32, vmulq_m_n_f16, vornq_m_f32,
> vornq_m_f16, vorrq_m_f32, vorrq_m_f16, vsubq_m_f32, vsubq_m_f16,
> vsubq_m_n_f32, vsubq_m_n_f16.
> 
> Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more
> details.
> [1]  https://developer.arm.com/architectures/instruction-sets/simd-
> isas/helium/mve-intrinsics
> 
> Regression tested on arm-none-eabi and found no regressions.
> 
> Ok for trunk?

Thanks, I've pushed this patch to master.
Kyrill

> 
> Thanks,
> Srinath.
> 
> gcc/ChangeLog:
> 
> 2019-10-31  Andre Vieira  
> Mihail Ionescu  
> Srinath Parvathaneni  
> 
>   * config/arm/arm_mve.h (vabdq_m_f32): Define macro.
>   (vabdq_m_f16): Likewise.
>   (vaddq_m_f32): Likewise.
>   (vaddq_m_f16): Likewise.
>   (vaddq_m_n_f32): Likewise.
>   (vaddq_m_n_f16): Likewise.
>   (vandq_m_f32): Likewise.
>   (vandq_m_f16): Likewise.
>   (vbicq_m_f32): Likewise.
>   (vbicq_m_f16): Likewise.
>   (vbrsrq_m_n_f32): Likewise.
>   (vbrsrq_m_n_f16): Likewise.
>   (vcaddq_rot270_m_f32): Likewise.
>   (vcaddq_rot270_m_f16): Likewise.
>   (vcaddq_rot90_m_f32): Likewise.
>   (vcaddq_rot90_m_f16): Likewise.
>   (vcmlaq_m_f32): Likewise.
>   (vcmlaq_m_f16): Likewise.
>   (vcmlaq_rot180_m_f32): Likewise.
>   (vcmlaq_rot180_m_f16): Likewise.
>   (vcmlaq_rot270_m_f32): Likewise.
>   (vcmlaq_rot270_m_f16): Likewise.
>   (vcmlaq_rot90_m_f32): Likewise.
>   (vcmlaq_rot90_m_f16): Likewise.
>   (vcmulq_m_f32): Likewise.
>   (vcmulq_m_f16): Likewise.
>   (vcmulq_rot180_m_f32): Likewise.
>   (vcmulq_rot180_m_f16): Likewise.
>   (vcmulq_rot270_m_f32): Likewise.
>   (vcmulq_rot270_m_f16): Likewise.
>   (vcmulq_rot90_m_f32): Likewise.
>   (vcmulq_rot90_m_f16): Likewise.
>   (vcvtq_m_n_s32_f32): Likewise.
>   (vcvtq_m_n_s16_f16): Likewise.
>   (vcvtq_m_n_u32_f32): Likewise.
>   (vcvtq_m_n_u16_f16): Likewise.
>   (veorq_m_f32): Likewise.
>   (veorq_m_f16): Likewise.
>   (vfmaq_m_f32): Likewise.
>   (vfmaq_m_f16): Likewise.
>   (vfmaq_m_n_f32): Likewise.
>   (vfmaq_m_n_f16): Likewise.
>   (vfmasq_m_n_f32): Likewise.
>   (vfmasq_m_n_f16): Likewise.
>   (vfmsq_m_f32): Likewise.
>   (vfmsq_m_f16): Likewise.
>   (vmaxnmq_m_f32): Likewise.
>   (vmaxnmq_m_f16): Likewise.
>   (vminnmq_m_f32): Likewise.
>   (vminnmq_m_f16): Likewise.
>   (vmulq_m_f32): Likewise.
>   (vmulq_m_f16): Likewise.
>   (vmulq_m_n_f32): Likewise.
>   (vmulq_m_n_f16): Likewise.
>   (vornq_m_f32): Likewise.
>   (vornq_m_f16): Likewise.
>   (vorrq_m_f32): Likewise.
>   (vorrq_m_f16): Likewise.
>   (vsubq_m_f32): Likewise.
>   (vsubq_m_f16): Likewise.
>   (vsubq_m_n_f32): Likewise.
>   (vsubq_m_n_f16): Likewise.
>   (__attribute__): Likewise.
>   (__arm_vabdq_m_f32): Likewise.
>   (__arm_vabdq_m_f16): Likewise.
>   (__arm_vaddq_m_f32): Likewise.
>   (__arm_vaddq_m_f16): Likewise.
>   (__arm_vaddq_m_n_f32): Likewise.
>   (__arm_vaddq_m_n_f16): Likewise.
>   (__arm_vandq_m_f32): Likewise.
>   (__arm_vandq_m_f16): Likewise.
>   (__arm_vbicq_m_f32): Likewise.
>   (__arm_vbicq_m_f16): Likewise.
>   (__arm_vbrsrq_m_n_f32): Likewise.
>   (__arm_vbrsrq_m_n_f16): Likewise.
>   (__arm_vcaddq_rot270_m_f32): Likewise.
>   (__arm_vcaddq_rot270_m_f16): Likewise.
>   (__arm_vcaddq_rot90_m_f32): Likewise.
>   (__arm_vcaddq_rot90_m_f16): Likewise.
>   (__arm_vcmlaq_m_f32): Likewise.
>  

RE: [PATCH v2][ARM][GCC][3/4x]: MVE intrinsics with quaternary operands.

2020-03-18 Thread Kyrylo Tkachov
Hi Srinath,

> -Original Message-
> From: Srinath Parvathaneni 
> Sent: 18 March 2020 11:32
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov 
> Subject: [PATCH v2][ARM][GCC][3/4x]: MVE intrinsics with quaternary
> operands.
> 
> Hello Kyrill,
> 
> Following patch is the rebased version of v1.
> (version v1) https://gcc.gnu.org/pipermail/gcc-patches/2019-
> November/534324.html
> 
> 
> Hello,
> 
> This patch supports following MVE ACLE intrinsics with quaternary operands.
> 
> vmlaldavaq_p_s16, vmlaldavaq_p_s32, vmlaldavaq_p_u16,
> vmlaldavaq_p_u32, vmlaldavaxq_p_s16, vmlaldavaxq_p_s32,
> vmlaldavaxq_p_u16, vmlaldavaxq_p_u32, vmlsldavaq_p_s16,
> vmlsldavaq_p_s32, vmlsldavaxq_p_s16, vmlsldavaxq_p_s32,
> vmullbq_poly_m_p16, vmullbq_poly_m_p8, vmulltq_poly_m_p16,
> vmulltq_poly_m_p8, vqdmullbq_m_n_s16, vqdmullbq_m_n_s32,
> vqdmullbq_m_s16, vqdmullbq_m_s32, vqdmulltq_m_n_s16,
> vqdmulltq_m_n_s32, vqdmulltq_m_s16, vqdmulltq_m_s32,
> vqrshrnbq_m_n_s16, vqrshrnbq_m_n_s32, vqrshrnbq_m_n_u16,
> vqrshrnbq_m_n_u32, vqrshrntq_m_n_s16, vqrshrntq_m_n_s32,
> vqrshrntq_m_n_u16, vqrshrntq_m_n_u32, vqrshrunbq_m_n_s16,
> vqrshrunbq_m_n_s32, vqrshruntq_m_n_s16, vqrshruntq_m_n_s32,
> vqshrnbq_m_n_s16, vqshrnbq_m_n_s32, vqshrnbq_m_n_u16,
> vqshrnbq_m_n_u32, vqshrntq_m_n_s16, vqshrntq_m_n_s32,
> vqshrntq_m_n_u16, vqshrntq_m_n_u32, vqshrunbq_m_n_s16,
> vqshrunbq_m_n_s32, vqshruntq_m_n_s16, vqshruntq_m_n_s32,
> vrmlaldavhaq_p_s32, vrmlaldavhaq_p_u32, vrmlaldavhaxq_p_s32,
> vrmlsldavhaq_p_s32, vrmlsldavhaxq_p_s32, vrshrnbq_m_n_s16,
> vrshrnbq_m_n_s32, vrshrnbq_m_n_u16, vrshrnbq_m_n_u32,
> vrshrntq_m_n_s16, vrshrntq_m_n_s32, vrshrntq_m_n_u16,
> vrshrntq_m_n_u32, vshllbq_m_n_s16, vshllbq_m_n_s8, vshllbq_m_n_u16,
> vshllbq_m_n_u8, vshlltq_m_n_s16, vshlltq_m_n_s8, vshlltq_m_n_u16,
> vshlltq_m_n_u8, vshrnbq_m_n_s16, vshrnbq_m_n_s32, vshrnbq_m_n_u16,
> vshrnbq_m_n_u32, vshrntq_m_n_s16, vshrntq_m_n_s32, vshrntq_m_n_u16,
> vshrntq_m_n_u32.
> 
> Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more
> details.
> [1] https://developer.arm.com/architectures/instruction-sets/simd-
> isas/helium/mve-intrinsics
> 
> Regression tested on arm-none-eabi and found no regressions.
> 
> Ok for trunk?

Thanks, I've pushed this patch to master.
Kyrill

> 
> Thanks,
> Srinath.
> 
> gcc/ChangeLog:
> 
> 2019-10-31  Andre Vieira  
>   Mihail Ionescu  
>   Srinath Parvathaneni  
> 
>   * config/arm/arm-protos.h (arm_mve_immediate_check): Declare.
>   * config/arm/arm.c (arm_mve_immediate_check): Define function to
>   check mode and integer value.
>   * config/arm/arm_mve.h (vmlaldavaq_p_s32): Define macro.
>   (vmlaldavaq_p_s16): Likewise.
>   (vmlaldavaq_p_u32): Likewise.
>   (vmlaldavaq_p_u16): Likewise.
>   (vmlaldavaxq_p_s32): Likewise.
>   (vmlaldavaxq_p_s16): Likewise.
>   (vmlaldavaxq_p_u32): Likewise.
>   (vmlaldavaxq_p_u16): Likewise.
>   (vmlsldavaq_p_s32): Likewise.
>   (vmlsldavaq_p_s16): Likewise.
>   (vmlsldavaxq_p_s32): Likewise.
>   (vmlsldavaxq_p_s16): Likewise.
>   (vmullbq_poly_m_p8): Likewise.
>   (vmullbq_poly_m_p16): Likewise.
>   (vmulltq_poly_m_p8): Likewise.
>   (vmulltq_poly_m_p16): Likewise.
>   (vqdmullbq_m_n_s32): Likewise.
>   (vqdmullbq_m_n_s16): Likewise.
>   (vqdmullbq_m_s32): Likewise.
>   (vqdmullbq_m_s16): Likewise.
>   (vqdmulltq_m_n_s32): Likewise.
>   (vqdmulltq_m_n_s16): Likewise.
>   (vqdmulltq_m_s32): Likewise.
>   (vqdmulltq_m_s16): Likewise.
>   (vqrshrnbq_m_n_s32): Likewise.
>   (vqrshrnbq_m_n_s16): Likewise.
>   (vqrshrnbq_m_n_u32): Likewise.
>   (vqrshrnbq_m_n_u16): Likewise.
>   (vqrshrntq_m_n_s32): Likewise.
>   (vqrshrntq_m_n_s16): Likewise.
>   (vqrshrntq_m_n_u32): Likewise.
>   (vqrshrntq_m_n_u16): Likewise.
>   (vqrshrunbq_m_n_s32): Likewise.
>   (vqrshrunbq_m_n_s16): Likewise.
>   (vqrshruntq_m_n_s32): Likewise.
>   (vqrshruntq_m_n_s16): Likewise.
>   (vqshrnbq_m_n_s32): Likewise.
>   (vqshrnbq_m_n_s16): Likewise.
>   (vqshrnbq_m_n_u32): Likewise.
>   (vqshrnbq_m_n_u16): Likewise.
>   (vqshrntq_m_n_s32): Likewise.
>   (vqshrntq_m_n_s16): Likewise.
>   (vqshrntq_m_n_u32): Likewise.
>   (vqshrntq_m_n_u16): Likewise.
>   (vqshrunbq_m_n_s32): Likewise.
>   (vqshrunbq_m_n_s16): Likewise.
>   (vqshruntq_m_n_s32): Likewise.
>   (vqshruntq_m_n_s16): Likewise.
>   (vrmlaldavhaq_p_s32): Likewise.
>   (vrmlaldavhaq_p_u32): Likewise.
>   (vrmlaldavhaxq_p_s32): Likewise.
>   (vrmlsldavhaq_p_s32): Likewise.
>   (vrmlsldavhaxq_p_s32): Likewise.
>   (vrshrnbq_m_n_s32): Likewise.
>   (vrshrnbq_m_n_s16): Likewise.
>   (vrshrnbq_m_n_u32): Likewise.
>   (vrshrnbq_m_n_u16): Likewise.
>   (vrshrntq_m_n_s32): Likewise.
>   (vrshrntq_m_n_s16): Likewise.
>   (vrshrntq_m_n_u32): Likewise.
>   (vrshrntq_m_n_u16): Likewise.
>   

RE: [PATCH v2][ARM][GCC][2/4x]: MVE intrinsics with quaternary operands.

2020-03-18 Thread Kyrylo Tkachov
Hi Srinath,

> -Original Message-
> From: Srinath Parvathaneni 
> Sent: 18 March 2020 11:31
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov 
> Subject: [PATCH v2][ARM][GCC][2/4x]: MVE intrinsics with quaternary
> operands.
> 
> Hello Kyrill,
> 
> Following patch is the rebased version of v1.
> (version v1) https://gcc.gnu.org/pipermail/gcc-patches/2019-
> November/534339.html
> 
> 
> 
> Hello,
> 
> This patch supports following MVE ACLE intrinsics with quaternary operands.
> 
> vabdq_m_s8, vabdq_m_s32, vabdq_m_s16, vabdq_m_u8, vabdq_m_u32,
> vabdq_m_u16, vaddq_m_n_s8, vaddq_m_n_s32, vaddq_m_n_s16,
> vaddq_m_n_u8, vaddq_m_n_u32, vaddq_m_n_u16, vaddq_m_s8,
> vaddq_m_s32, vaddq_m_s16, vaddq_m_u8, vaddq_m_u32, vaddq_m_u16,
> vandq_m_s8, vandq_m_s32, vandq_m_s16, vandq_m_u8, vandq_m_u32,
> vandq_m_u16, vbicq_m_s8, vbicq_m_s32, vbicq_m_s16, vbicq_m_u8,
> vbicq_m_u32, vbicq_m_u16, vbrsrq_m_n_s8, vbrsrq_m_n_s32,
> vbrsrq_m_n_s16, vbrsrq_m_n_u8, vbrsrq_m_n_u32, vbrsrq_m_n_u16,
> vcaddq_rot270_m_s8, vcaddq_rot270_m_s32, vcaddq_rot270_m_s16,
> vcaddq_rot270_m_u8, vcaddq_rot270_m_u32, vcaddq_rot270_m_u16,
> vcaddq_rot90_m_s8, vcaddq_rot90_m_s32, vcaddq_rot90_m_s16,
> vcaddq_rot90_m_u8, vcaddq_rot90_m_u32, vcaddq_rot90_m_u16,
> veorq_m_s8, veorq_m_s32, veorq_m_s16, veorq_m_u8, veorq_m_u32,
> veorq_m_u16, vhaddq_m_n_s8, vhaddq_m_n_s32, vhaddq_m_n_s16,
> vhaddq_m_n_u8, vhaddq_m_n_u32, vhaddq_m_n_u16, vhaddq_m_s8,
> vhaddq_m_s32, vhaddq_m_s16, vhaddq_m_u8, vhaddq_m_u32,
> vhaddq_m_u16, vhcaddq_rot270_m_s8, vhcaddq_rot270_m_s32,
> vhcaddq_rot270_m_s16, vhcaddq_rot90_m_s8, vhcaddq_rot90_m_s32,
> vhcaddq_rot90_m_s16, vhsubq_m_n_s8, vhsubq_m_n_s32,
> vhsubq_m_n_s16, vhsubq_m_n_u8, vhsubq_m_n_u32, vhsubq_m_n_u16,
> vhsubq_m_s8, vhsubq_m_s32, vhsubq_m_s16, vhsubq_m_u8,
> vhsubq_m_u32, vhsubq_m_u16, vmaxq_m_s8, vmaxq_m_s32,
> vmaxq_m_s16, vmaxq_m_u8, vmaxq_m_u32, vmaxq_m_u16, vminq_m_s8,
> vminq_m_s32, vminq_m_s16, vminq_m_u8, vminq_m_u32, vminq_m_u16,
> vmladavaq_p_s8, vmladavaq_p_s32, vmladavaq_p_s16, vmladavaq_p_u8,
> vmladavaq_p_u32, vmladavaq_p_u16, vmladavaxq_p_s8,
> vmladavaxq_p_s32, vmladavaxq_p_s16, vmlaq_m_n_s8, vmlaq_m_n_s32,
> vmlaq_m_n_s16, vmlaq_m_n_u8, vmlaq_m_n_u32, vmlaq_m_n_u16,
> vmlasq_m_n_s8, vmlasq_m_n_s32, vmlasq_m_n_s16, vmlasq_m_n_u8,
> vmlasq_m_n_u32, vmlasq_m_n_u16, vmlsdavaq_p_s8, vmlsdavaq_p_s32,
> vmlsdavaq_p_s16, vmlsdavaxq_p_s8, vmlsdavaxq_p_s32, vmlsdavaxq_p_s16,
> vmulhq_m_s8, vmulhq_m_s32, vmulhq_m_s16, vmulhq_m_u8,
> vmulhq_m_u32, vmulhq_m_u16, vmullbq_int_m_s8, vmullbq_int_m_s32,
> vmullbq_int_m_s16, vmullbq_int_m_u8, vmullbq_int_m_u32,
> vmullbq_int_m_u16, vmulltq_int_m_s8, vmulltq_int_m_s32,
> vmulltq_int_m_s16, vmulltq_int_m_u8, vmulltq_int_m_u32,
> vmulltq_int_m_u16, vmulq_m_n_s8, vmulq_m_n_s32, vmulq_m_n_s16,
> vmulq_m_n_u8, vmulq_m_n_u32, vmulq_m_n_u16, vmulq_m_s8,
> vmulq_m_s32, vmulq_m_s16, vmulq_m_u8, vmulq_m_u32, vmulq_m_u16,
> vornq_m_s8, vornq_m_s32, vornq_m_s16, vornq_m_u8, vornq_m_u32,
> vornq_m_u16, vorrq_m_s8, vorrq_m_s32, vorrq_m_s16, vorrq_m_u8,
> vorrq_m_u32, vorrq_m_u16, vqaddq_m_n_s8, vqaddq_m_n_s32,
> vqaddq_m_n_s16, vqaddq_m_n_u8, vqaddq_m_n_u32, vqaddq_m_n_u16,
> vqaddq_m_s8, vqaddq_m_s32, vqaddq_m_s16, vqaddq_m_u8,
> vqaddq_m_u32, vqaddq_m_u16, vqdmladhq_m_s8, vqdmladhq_m_s32,
> vqdmladhq_m_s16, vqdmladhxq_m_s8, vqdmladhxq_m_s32,
> vqdmladhxq_m_s16, vqdmlahq_m_n_s8, vqdmlahq_m_n_s32,
> vqdmlahq_m_n_s16, vqdmlahq_m_n_u8, vqdmlahq_m_n_u32,
> vqdmlahq_m_n_u16, vqdmlsdhq_m_s8, vqdmlsdhq_m_s32,
> vqdmlsdhq_m_s16, vqdmlsdhxq_m_s8, vqdmlsdhxq_m_s32,
> vqdmlsdhxq_m_s16, vqdmulhq_m_n_s8, vqdmulhq_m_n_s32,
> vqdmulhq_m_n_s16, vqdmulhq_m_s8, vqdmulhq_m_s32, vqdmulhq_m_s16,
> vqrdmladhq_m_s8, vqrdmladhq_m_s32, vqrdmladhq_m_s16,
> vqrdmladhxq_m_s8, vqrdmladhxq_m_s32, vqrdmladhxq_m_s16,
> vqrdmlahq_m_n_s8, vqrdmlahq_m_n_s32, vqrdmlahq_m_n_s16,
> vqrdmlahq_m_n_u8, vqrdmlahq_m_n_u32, vqrdmlahq_m_n_u16,
> vqrdmlashq_m_n_s8, vqrdmlashq_m_n_s32, vqrdmlashq_m_n_s16,
> vqrdmlashq_m_n_u8, vqrdmlashq_m_n_u32, vqrdmlashq_m_n_u16,
> vqrdmlsdhq_m_s8, vqrdmlsdhq_m_s32, vqrdmlsdhq_m_s16,
> vqrdmlsdhxq_m_s8, vqrdmlsdhxq_m_s32, vqrdmlsdhxq_m_s16,
> vqrdmulhq_m_n_s8, vqrdmulhq_m_n_s32, vqrdmulhq_m_n_s16,
> vqrdmulhq_m_s8, vqrdmulhq_m_s32, vqrdmulhq_m_s16, vqrshlq_m_s8,
> vqrshlq_m_s32, vqrshlq_m_s16, vqrshlq_m_u8, vqrshlq_m_u32,
> vqrshlq_m_u16, vqshlq_m_n_s8, vqshlq_m_n_s32, vqshlq_m_n_s16,
> vqshlq_m_n_u8, vqshlq_m_n_u32, vqshlq_m_n_u16, vqshlq_m_s8,
> vqshlq_m_s32, vqshlq_m_s16, vqshlq_m_u8, vqshlq_m_u32, vqshlq_m_u16,
> vqsubq_m_n_s8, vqsubq_m_n_s32, vqsubq_m_n_s16, vqsubq_m_n_u8,
> vqsubq_m_n_u32, vqsubq_m_n_u16, vqsubq_m_s8, vqsubq_m_s32,
> vqsubq_m_s16, vqsubq_m_u8, vqsubq_m_u32, vqsubq_m_u16,
> vrhaddq_m_s8, vrhaddq_m_s32, vrhaddq_m_s16, vrhaddq_m_u8,
> vrhaddq_m_u32, vrhaddq_m_u16, vrmulhq_m_s8, vrmulhq_m_s32,
> vrmulhq_m_s16, vrmulhq_m_u8, vrmulhq_m_u32, vrmulhq_m_u16,
> vrshlq_m_s8, vrshlq_m_s32, vrshlq_m_s16, vrshlq_m_u8, vrshlq_m_u32,
> vrshlq_m_u16, vrshrq_m_n_s8, 

RE: [PATCH v2][ARM][GCC][1/4x]: MVE intrinsics with quaternary operands.

2020-03-18 Thread Kyrylo Tkachov
Hi Srinath,

> -Original Message-
> From: Srinath Parvathaneni 
> Sent: 18 March 2020 11:29
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov 
> Subject: [PATCH v2][ARM][GCC][1/4x]: MVE intrinsics with quaternary
> operands.
> 
> Hello Kyrill,
> 
> Following patch is the rebased version of v1.
> (version v1) https://gcc.gnu.org/pipermail/gcc-patches/2019-
> November/534332.html
> 
> 
> 
> Hello,
> 
> This patch supports following MVE ACLE intrinsics with quaternary operands.
> 
> vsriq_m_n_s8, vsubq_m_s8, vsubq_x_s8, vcvtq_m_n_f16_u16,
> vcvtq_x_n_f16_u16,
> vqshluq_m_n_s8, vabavq_p_s8, vsriq_m_n_u8, vshlq_m_u8, vshlq_x_u8,
> vsubq_m_u8,
> vsubq_x_u8, vabavq_p_u8, vshlq_m_s8, vshlq_x_s8, vcvtq_m_n_f16_s16,
> vcvtq_x_n_f16_s16, vsriq_m_n_s16, vsubq_m_s16, vsubq_x_s16,
> vcvtq_m_n_f32_u32,
> vcvtq_x_n_f32_u32, vqshluq_m_n_s16, vabavq_p_s16, vsriq_m_n_u16,
> vshlq_m_u16, vshlq_x_u16, vsubq_m_u16, vsubq_x_u16, vabavq_p_u16,
> vshlq_m_s16,
> vshlq_x_s16, vcvtq_m_n_f32_s32, vcvtq_x_n_f32_s32, vsriq_m_n_s32,
> vsubq_m_s32,
> vsubq_x_s32, vqshluq_m_n_s32, vabavq_p_s32, vsriq_m_n_u32,
> vshlq_m_u32,
> vshlq_x_u32, vsubq_m_u32, vsubq_x_u32, vabavq_p_u32, vshlq_m_s32,
> vshlq_x_s32.
> 
> Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more
> details.
> [1] https://developer.arm.com/architectures/instruction-sets/simd-
> isas/helium/mve-intrinsics
> 
> Regression tested on arm-none-eabi and found no regressions.
> 
> Ok for trunk?

Thanks, I've pushed this patch to master.
Kyrill


> 
> Thanks,
> Srinath.
> 
> gcc/ChangeLog:
> 
> 2019-10-29  Andre Vieira  
> Mihail Ionescu  
> Srinath Parvathaneni  
> 
>   * config/arm/arm-builtins.c
> (QUADOP_UNONE_UNONE_NONE_NONE_UNONE_QUALIFIERS):
>   Define builtin qualifier.
>   (QUADOP_NONE_NONE_NONE_NONE_UNONE_QUALIFIERS):
> Likewise.
>   (QUADOP_NONE_NONE_NONE_IMM_UNONE_QUALIFIERS): Likewise.
>   (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE_QUALIFIERS):
> Likewise.
>   (QUADOP_UNONE_UNONE_NONE_IMM_UNONE_QUALIFIERS):
> Likewise.
>   (QUADOP_NONE_NONE_UNONE_IMM_UNONE_QUALIFIERS):
> Likewise.
>   (QUADOP_UNONE_UNONE_UNONE_IMM_UNONE_QUALIFIERS):
> Likewise.
>   (QUADOP_UNONE_UNONE_UNONE_NONE_UNONE_QUALIFIERS):
> Likewise.
>   * config/arm/arm_mve.h (vsriq_m_n_s8): Define macro.
>   (vsubq_m_s8): Likewise.
>   (vcvtq_m_n_f16_u16): Likewise.
>   (vqshluq_m_n_s8): Likewise.
>   (vabavq_p_s8): Likewise.
>   (vsriq_m_n_u8): Likewise.
>   (vshlq_m_u8): Likewise.
>   (vsubq_m_u8): Likewise.
>   (vabavq_p_u8): Likewise.
>   (vshlq_m_s8): Likewise.
>   (vcvtq_m_n_f16_s16): Likewise.
>   (vsriq_m_n_s16): Likewise.
>   (vsubq_m_s16): Likewise.
>   (vcvtq_m_n_f32_u32): Likewise.
>   (vqshluq_m_n_s16): Likewise.
>   (vabavq_p_s16): Likewise.
>   (vsriq_m_n_u16): Likewise.
>   (vshlq_m_u16): Likewise.
>   (vsubq_m_u16): Likewise.
>   (vabavq_p_u16): Likewise.
>   (vshlq_m_s16): Likewise.
>   (vcvtq_m_n_f32_s32): Likewise.
>   (vsriq_m_n_s32): Likewise.
>   (vsubq_m_s32): Likewise.
>   (vqshluq_m_n_s32): Likewise.
>   (vabavq_p_s32): Likewise.
>   (vsriq_m_n_u32): Likewise.
>   (vshlq_m_u32): Likewise.
>   (vsubq_m_u32): Likewise.
>   (vabavq_p_u32): Likewise.
>   (vshlq_m_s32): Likewise.
>   (__arm_vsriq_m_n_s8): Define intrinsic.
>   (__arm_vsubq_m_s8): Likewise.
>   (__arm_vqshluq_m_n_s8): Likewise.
>   (__arm_vabavq_p_s8): Likewise.
>   (__arm_vsriq_m_n_u8): Likewise.
>   (__arm_vshlq_m_u8): Likewise.
>   (__arm_vsubq_m_u8): Likewise.
>   (__arm_vabavq_p_u8): Likewise.
>   (__arm_vshlq_m_s8): Likewise.
>   (__arm_vsriq_m_n_s16): Likewise.
>   (__arm_vsubq_m_s16): Likewise.
>   (__arm_vqshluq_m_n_s16): Likewise.
>   (__arm_vabavq_p_s16): Likewise.
>   (__arm_vsriq_m_n_u16): Likewise.
>   (__arm_vshlq_m_u16): Likewise.
>   (__arm_vsubq_m_u16): Likewise.
>   (__arm_vabavq_p_u16): Likewise.
>   (__arm_vshlq_m_s16): Likewise.
>   (__arm_vsriq_m_n_s32): Likewise.
>   (__arm_vsubq_m_s32): Likewise.
>   (__arm_vqshluq_m_n_s32): Likewise.
>   (__arm_vabavq_p_s32): Likewise.
>   (__arm_vsriq_m_n_u32): Likewise.
>   (__arm_vshlq_m_u32): Likewise.
>   (__arm_vsubq_m_u32): Likewise.
>   (__arm_vabavq_p_u32): Likewise.
>   (__arm_vshlq_m_s32): Likewise.
>   (__arm_vcvtq_m_n_f16_u16): Likewise.
>   (__arm_vcvtq_m_n_f16_s16): Likewise.
>   (__arm_vcvtq_m_n_f32_u32): Likewise.
>   (__arm_vcvtq_m_n_f32_s32): Likewise.
>   (vcvtq_m_n): Define polymorphic variant.
>   (vqshluq_m_n): Likewise.
>   (vshlq_m): Likewise.
>   (vsriq_m_n): Likewise.
>   (vsubq_m): Likewise.
>   (vabavq_p): Likewise.
>   * config/arm/arm_mve_builtins.def
>   (QUADOP_UNONE_UNONE_NONE_NONE_UNONE_QUALIFIERS): Use
> builtin qualifier.
>   

RE: [PATCH v3][ARM][GCC][3/3x]: MVE intrinsics with ternary operands.

2020-03-18 Thread Kyrylo Tkachov
Hi Srinath,

> -Original Message-
> From: Srinath Parvathaneni 
> Sent: 18 March 2020 11:21
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov 
> Subject: [PATCH v3][ARM][GCC][3/3x]: MVE intrinsics with ternary operands.
> 
> Hello Kyrill,
> 
> Following patch is the rebased version of v2.
> (version v2) https://gcc.gnu.org/pipermail/gcc-patches/2020-
> March/542068.html
> 
> 
> 
> Hello,
> 
> This patch supports following MVE ACLE intrinsics with ternary operands.
> 
> vrmlaldavhaxq_s32, vrmlsldavhaq_s32, vrmlsldavhaxq_s32, vaddlvaq_p_s32,
> vcvtbq_m_f16_f32, vcvtbq_m_f32_f16, vcvttq_m_f16_f32, vcvttq_m_f32_f16,
> vrev16q_m_s8, vrev32q_m_f16, vrmlaldavhq_p_s32, vrmlaldavhxq_p_s32,
> vrmlsldavhq_p_s32, vrmlsldavhxq_p_s32, vaddlvaq_p_u32, vrev16q_m_u8,
> vrmlaldavhq_p_u32, vmvnq_m_n_s16, vorrq_m_n_s16, vqrshrntq_n_s16,
> vqshrnbq_n_s16, vqshrntq_n_s16, vrshrnbq_n_s16, vrshrntq_n_s16,
> vshrnbq_n_s16, vshrntq_n_s16, vcmlaq_f16, vcmlaq_rot180_f16,
> vcmlaq_rot270_f16, vcmlaq_rot90_f16, vfmaq_f16, vfmaq_n_f16,
> vfmasq_n_f16, vfmsq_f16, vmlaldavaq_s16, vmlaldavaxq_s16,
> vmlsldavaq_s16, vmlsldavaxq_s16, vabsq_m_f16, vcvtmq_m_s16_f16,
> vcvtnq_m_s16_f16, vcvtpq_m_s16_f16, vcvtq_m_s16_f16, vdupq_m_n_f16,
> vmaxnmaq_m_f16, vmaxnmavq_p_f16, vmaxnmvq_p_f16,
> vminnmaq_m_f16, vminnmavq_p_f16, vminnmvq_p_f16, vmlaldavq_p_s16,
> vmlaldavxq_p_s16, vmlsldavq_p_s16, vmlsldavxq_p_s16, vmovlbq_m_s8,
> vmovltq_m_s8, vmovnbq_m_s16, vmovntq_m_s16, vnegq_m_f16,
> vpselq_f16, vqmovnbq_m_s16, vqmovntq_m_s16, vrev32q_m_s8,
> vrev64q_m_f16, vrndaq_m_f16, vrndmq_m_f16, vrndnq_m_f16,
> vrndpq_m_f16, vrndq_m_f16, vrndxq_m_f16, vcmpeqq_m_n_f16,
> vcmpgeq_m_f16, vcmpgeq_m_n_f16, vcmpgtq_m_f16, vcmpgtq_m_n_f16,
> vcmpleq_m_f16, vcmpleq_m_n_f16, vcmpltq_m_f16, vcmpltq_m_n_f16,
> vcmpneq_m_f16, vcmpneq_m_n_f16, vmvnq_m_n_u16, vorrq_m_n_u16,
> vqrshruntq_n_s16, vqshrunbq_n_s16, vqshruntq_n_s16, vcvtmq_m_u16_f16,
> vcvtnq_m_u16_f16, vcvtpq_m_u16_f16, vcvtq_m_u16_f16,
> vqmovunbq_m_s16, vqmovuntq_m_s16, vqrshrntq_n_u16, vqshrnbq_n_u16,
> vqshrntq_n_u16, vrshrnbq_n_u16, vrshrntq_n_u16, vshrnbq_n_u16,
> vshrntq_n_u16, vmlaldavaq_u16, vmlaldavaxq_u16, vmlaldavq_p_u16,
> vmlaldavxq_p_u16, vmovlbq_m_u8, vmovltq_m_u8, vmovnbq_m_u16,
> vmovntq_m_u16, vqmovnbq_m_u16, vqmovntq_m_u16, vrev32q_m_u8,
> vmvnq_m_n_s32, vorrq_m_n_s32, vqrshrntq_n_s32, vqshrnbq_n_s32,
> vqshrntq_n_s32, vrshrnbq_n_s32, vrshrntq_n_s32, vshrnbq_n_s32,
> vshrntq_n_s32, vcmlaq_f32, vcmlaq_rot180_f32, vcmlaq_rot270_f32,
> vcmlaq_rot90_f32, vfmaq_f32, vfmaq_n_f32, vfmasq_n_f32, vfmsq_f32,
> vmlaldavaq_s32, vmlaldavaxq_s32, vmlsldavaq_s32, vmlsldavaxq_s32,
> vabsq_m_f32, vcvtmq_m_s32_f32, vcvtnq_m_s32_f32, vcvtpq_m_s32_f32,
> vcvtq_m_s32_f32, vdupq_m_n_f32, vmaxnmaq_m_f32, vmaxnmavq_p_f32,
> vmaxnmvq_p_f32, vminnmaq_m_f32, vminnmavq_p_f32, vminnmvq_p_f32,
> vmlaldavq_p_s32, vmlaldavxq_p_s32, vmlsldavq_p_s32, vmlsldavxq_p_s32,
> vmovlbq_m_s16, vmovltq_m_s16, vmovnbq_m_s32, vmovntq_m_s32,
> vnegq_m_f32, vpselq_f32, vqmovnbq_m_s32, vqmovntq_m_s32,
> vrev32q_m_s16, vrev64q_m_f32, vrndaq_m_f32, vrndmq_m_f32,
> vrndnq_m_f32, vrndpq_m_f32, vrndq_m_f32, vrndxq_m_f32,
> vcmpeqq_m_n_f32, vcmpgeq_m_f32, vcmpgeq_m_n_f32, vcmpgtq_m_f32,
> vcmpgtq_m_n_f32, vcmpleq_m_f32, vcmpleq_m_n_f32, vcmpltq_m_f32,
> vcmpltq_m_n_f32, vcmpneq_m_f32, vcmpneq_m_n_f32, vmvnq_m_n_u32,
> vorrq_m_n_u32, vqrshruntq_n_s32, vqshrunbq_n_s32, vqshruntq_n_s32,
> vcvtmq_m_u32_f32, vcvtnq_m_u32_f32, vcvtpq_m_u32_f32,
> vcvtq_m_u32_f32, vqmovunbq_m_s32, vqmovuntq_m_s32,
> vqrshrntq_n_u32, vqshrnbq_n_u32, vqshrntq_n_u32, vrshrnbq_n_u32,
> vrshrntq_n_u32, vshrnbq_n_u32, vshrntq_n_u32, vmlaldavaq_u32,
> vmlaldavaxq_u32, vmlaldavq_p_u32, vmlaldavxq_p_u32, vmovlbq_m_u16,
> vmovltq_m_u16, vmovnbq_m_u32, vmovntq_m_u32, vqmovnbq_m_u32,
> vqmovntq_m_u32, vrev32q_m_u16.
> 
> Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more
> details.
> [1] https://developer.arm.com/architectures/instruction-sets/simd-
> isas/helium/mve-intrinsics
> 
> Regression tested on arm-none-eabi and found no regressions.
> 
> Ok for trunk?

Thanks, I've pushed this patch to master.
Kyrill

> 
> Thanks,
> Srinath.
> 
> gcc/ChangeLog:
> 
> 2019-10-29  Andre Vieira  
> Mihail Ionescu  
> Srinath Parvathaneni  
> 
>   * config/arm/arm_mve.h (vrmlaldavhaxq_s32): Define macro.
>   (vrmlsldavhaq_s32): Likewise.
>   (vrmlsldavhaxq_s32): Likewise.
>   (vaddlvaq_p_s32): Likewise.
>   (vcvtbq_m_f16_f32): Likewise.
>   (vcvtbq_m_f32_f16): Likewise.
>   (vcvttq_m_f16_f32): Likewise.
>   (vcvttq_m_f32_f16): Likewise.
>   (vrev16q_m_s8): Likewise.
>   (vrev32q_m_f16): Likewise.
>   (vrmlaldavhq_p_s32): Likewise.
>   (vrmlaldavhxq_p_s32): Likewise.
>   (vrmlsldavhq_p_s32): Likewise.
>   (vrmlsldavhxq_p_s32): Likewise.
>   (vaddlvaq_p_u32): Likewise.
>   (vrev16q_m_u8): Likewise.
>   (vrmlaldavhq_p_u32): Likewise.
>   (vmvnq_m_n_s16): Likewise.

RE: [PATCH v4][ARM][GCC][2/3x]: MVE intrinsics with ternary operands.

2020-03-18 Thread Kyrylo Tkachov
Hi Srinath,

> -Original Message-
> From: Srinath Parvathaneni 
> Sent: 18 March 2020 16:16
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov 
> Subject: [PATCH v4][ARM][GCC][2/3x]: MVE intrinsics with ternary operands.
> 
> Hello Kyrill,
> 
> Following patch is the rebased version of v3.
> (version v3) https://gcc.gnu.org/pipermail/gcc-patches/2020-
> March/542207.html
> 
> 
> 
> Hello,
> 
> This patch supports following MVE ACLE intrinsics with ternary operands.
> 
> vpselq_u8, vpselq_s8, vrev64q_m_u8, vqrdmlashq_n_u8, vqrdmlahq_n_u8,
> vqdmlahq_n_u8, vmvnq_m_u8, vmlasq_n_u8, vmlaq_n_u8, vmladavq_p_u8,
> vmladavaq_u8, vminvq_p_u8, vmaxvq_p_u8, vdupq_m_n_u8,
> vcmpneq_m_u8, vcmpneq_m_n_u8, vcmphiq_m_u8, vcmphiq_m_n_u8,
> vcmpeqq_m_u8, vcmpeqq_m_n_u8, vcmpcsq_m_u8, vcmpcsq_m_n_u8,
> vclzq_m_u8, vaddvaq_p_u8, vsriq_n_u8, vsliq_n_u8, vshlq_m_r_u8,
> vrshlq_m_n_u8, vqshlq_m_r_u8, vqrshlq_m_n_u8, vminavq_p_s8,
> vminaq_m_s8, vmaxavq_p_s8, vmaxaq_m_s8, vcmpneq_m_s8,
> vcmpneq_m_n_s8, vcmpltq_m_s8, vcmpltq_m_n_s8, vcmpleq_m_s8,
> vcmpleq_m_n_s8, vcmpgtq_m_s8, vcmpgtq_m_n_s8, vcmpgeq_m_s8,
> vcmpgeq_m_n_s8, vcmpeqq_m_s8, vcmpeqq_m_n_s8, vshlq_m_r_s8,
> vrshlq_m_n_s8, vrev64q_m_s8, vqshlq_m_r_s8, vqrshlq_m_n_s8,
> vqnegq_m_s8, vqabsq_m_s8, vnegq_m_s8, vmvnq_m_s8, vmlsdavxq_p_s8,
> vmlsdavq_p_s8, vmladavxq_p_s8, vmladavq_p_s8, vminvq_p_s8,
> vmaxvq_p_s8, vdupq_m_n_s8, vclzq_m_s8, vclsq_m_s8, vaddvaq_p_s8,
> vabsq_m_s8, vqrdmlsdhxq_s8, vqrdmlsdhq_s8, vqrdmlashq_n_s8,
> vqrdmlahq_n_s8, vqrdmladhxq_s8, vqrdmladhq_s8, vqdmlsdhxq_s8,
> vqdmlsdhq_s8, vqdmlahq_n_s8, vqdmladhxq_s8, vqdmladhq_s8,
> vmlsdavaxq_s8, vmlsdavaq_s8, vmlasq_n_s8, vmlaq_n_s8, vmladavaxq_s8,
> vmladavaq_s8, vsriq_n_s8, vsliq_n_s8, vpselq_u16, vpselq_s16,
> vrev64q_m_u16, vqrdmlashq_n_u16, vqrdmlahq_n_u16, vqdmlahq_n_u16,
> vmvnq_m_u16, vmlasq_n_u16, vmlaq_n_u16, vmladavq_p_u16,
> vmladavaq_u16, vminvq_p_u16, vmaxvq_p_u16, vdupq_m_n_u16,
> vcmpneq_m_u16, vcmpneq_m_n_u16, vcmphiq_m_u16, vcmphiq_m_n_u16,
> vcmpeqq_m_u16, vcmpeqq_m_n_u16, vcmpcsq_m_u16, vcmpcsq_m_n_u16,
> vclzq_m_u16, vaddvaq_p_u16, vsriq_n_u16, vsliq_n_u16, vshlq_m_r_u16,
> vrshlq_m_n_u16, vqshlq_m_r_u16, vqrshlq_m_n_u16, vminavq_p_s16,
> vminaq_m_s16, vmaxavq_p_s16, vmaxaq_m_s16, vcmpneq_m_s16,
> vcmpneq_m_n_s16, vcmpltq_m_s16, vcmpltq_m_n_s16, vcmpleq_m_s16,
> vcmpleq_m_n_s16, vcmpgtq_m_s16, vcmpgtq_m_n_s16, vcmpgeq_m_s16,
> vcmpgeq_m_n_s16, vcmpeqq_m_s16, vcmpeqq_m_n_s16, vshlq_m_r_s16,
> vrshlq_m_n_s16, vrev64q_m_s16, vqshlq_m_r_s16, vqrshlq_m_n_s16,
> vqnegq_m_s16, vqabsq_m_s16, vnegq_m_s16, vmvnq_m_s16,
> vmlsdavxq_p_s16, vmlsdavq_p_s16, vmladavxq_p_s16, vmladavq_p_s16,
> vminvq_p_s16, vmaxvq_p_s16, vdupq_m_n_s16, vclzq_m_s16, vclsq_m_s16,
> vaddvaq_p_s16, vabsq_m_s16, vqrdmlsdhxq_s16, vqrdmlsdhq_s16,
> vqrdmlashq_n_s16, vqrdmlahq_n_s16, vqrdmladhxq_s16, vqrdmladhq_s16,
> vqdmlsdhxq_s16, vqdmlsdhq_s16, vqdmlahq_n_s16, vqdmladhxq_s16,
> vqdmladhq_s16, vmlsdavaxq_s16, vmlsdavaq_s16, vmlasq_n_s16,
> vmlaq_n_s16, vmladavaxq_s16, vmladavaq_s16, vsriq_n_s16, vsliq_n_s16,
> vpselq_u32, vpselq_s32, vrev64q_m_u32, vqrdmlashq_n_u32,
> vqrdmlahq_n_u32, vqdmlahq_n_u32, vmvnq_m_u32, vmlasq_n_u32,
> vmlaq_n_u32, vmladavq_p_u32, vmladavaq_u32, vminvq_p_u32,
> vmaxvq_p_u32, vdupq_m_n_u32, vcmpneq_m_u32, vcmpneq_m_n_u32,
> vcmphiq_m_u32, vcmphiq_m_n_u32, vcmpeqq_m_u32, vcmpeqq_m_n_u32,
> vcmpcsq_m_u32, vcmpcsq_m_n_u32, vclzq_m_u32, vaddvaq_p_u32,
> vsriq_n_u32, vsliq_n_u32, vshlq_m_r_u32, vrshlq_m_n_u32,
> vqshlq_m_r_u32, vqrshlq_m_n_u32, vminavq_p_s32, vminaq_m_s32,
> vmaxavq_p_s32, vmaxaq_m_s32, vcmpneq_m_s32, vcmpneq_m_n_s32,
> vcmpltq_m_s32, vcmpltq_m_n_s32, vcmpleq_m_s32, vcmpleq_m_n_s32,
> vcmpgtq_m_s32, vcmpgtq_m_n_s32, vcmpgeq_m_s32, vcmpgeq_m_n_s32,
> vcmpeqq_m_s32, vcmpeqq_m_n_s32, vshlq_m_r_s32, vrshlq_m_n_s32,
> vrev64q_m_s32, vqshlq_m_r_s32, vqrshlq_m_n_s32, vqnegq_m_s32,
> vqabsq_m_s32, vnegq_m_s32, vmvnq_m_s32, vmlsdavxq_p_s32,
> vmlsdavq_p_s32, vmladavxq_p_s32, vmladavq_p_s32, vminvq_p_s32,
> vmaxvq_p_s32, vdupq_m_n_s32, vclzq_m_s32, vclsq_m_s32,
> vaddvaq_p_s32, vabsq_m_s32, vqrdmlsdhxq_s32, vqrdmlsdhq_s32,
> vqrdmlashq_n_s32, vqrdmlahq_n_s32, vqrdmladhxq_s32, vqrdmladhq_s32,
> vqdmlsdhxq_s32, vqdmlsdhq_s32, vqdmlahq_n_s32, vqdmladhxq_s32,
> vqdmladhq_s32, vmlsdavaxq_s32, vmlsdavaq_s32, vmlasq_n_s32,
> vmlaq_n_s32, vmladavaxq_s32, vmladavaq_s32, vsriq_n_s32, vsliq_n_s32,
> vpselq_u64, vpselq_s64.
> 
> Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more
> details.
> [1] https://developer.arm.com/architectures/instruction-sets/simd-
> isas/helium/mve-intrinsics
> 
> In this patch new constraints "Rc" and "Re" are added, which check that the
> constant is within the range of 0 to 15 and 0 to 31 respectively.
> 
> Also, new predicates "mve_imm_15" and "mve_imm_31" are added to check the
> matching constraints Rc and Re respectively.
> 
> Regression tested on arm-none-eabi and found no regressions.
> 
> Ok for trunk?

Thanks, I've pushed this patch to master.
Kyrill

[PATCH v4][ARM][GCC][2/3x]: MVE intrinsics with ternary operands.

2020-03-18 Thread Srinath Parvathaneni
Hello Kyrill,

The following patch is the rebased version of v3.
(version v3) https://gcc.gnu.org/pipermail/gcc-patches/2020-March/542207.html



Hello,

This patch supports following MVE ACLE intrinsics with ternary operands.

vpselq_u8, vpselq_s8, vrev64q_m_u8, vqrdmlashq_n_u8, vqrdmlahq_n_u8,
vqdmlahq_n_u8, vmvnq_m_u8, vmlasq_n_u8, vmlaq_n_u8, vmladavq_p_u8,
vmladavaq_u8, vminvq_p_u8, vmaxvq_p_u8, vdupq_m_n_u8, vcmpneq_m_u8,
vcmpneq_m_n_u8, vcmphiq_m_u8, vcmphiq_m_n_u8, vcmpeqq_m_u8,
vcmpeqq_m_n_u8, vcmpcsq_m_u8, vcmpcsq_m_n_u8, vclzq_m_u8, vaddvaq_p_u8,
vsriq_n_u8, vsliq_n_u8, vshlq_m_r_u8, vrshlq_m_n_u8, vqshlq_m_r_u8,
vqrshlq_m_n_u8, vminavq_p_s8, vminaq_m_s8, vmaxavq_p_s8, vmaxaq_m_s8,
vcmpneq_m_s8, vcmpneq_m_n_s8, vcmpltq_m_s8, vcmpltq_m_n_s8, vcmpleq_m_s8,
vcmpleq_m_n_s8, vcmpgtq_m_s8, vcmpgtq_m_n_s8, vcmpgeq_m_s8, vcmpgeq_m_n_s8,
vcmpeqq_m_s8, vcmpeqq_m_n_s8, vshlq_m_r_s8, vrshlq_m_n_s8, vrev64q_m_s8,
vqshlq_m_r_s8, vqrshlq_m_n_s8, vqnegq_m_s8, vqabsq_m_s8, vnegq_m_s8,
vmvnq_m_s8, vmlsdavxq_p_s8, vmlsdavq_p_s8, vmladavxq_p_s8, vmladavq_p_s8,
vminvq_p_s8, vmaxvq_p_s8, vdupq_m_n_s8, vclzq_m_s8, vclsq_m_s8, vaddvaq_p_s8,
vabsq_m_s8, vqrdmlsdhxq_s8, vqrdmlsdhq_s8, vqrdmlashq_n_s8, vqrdmlahq_n_s8,
vqrdmladhxq_s8, vqrdmladhq_s8, vqdmlsdhxq_s8, vqdmlsdhq_s8, vqdmlahq_n_s8,
vqdmladhxq_s8, vqdmladhq_s8, vmlsdavaxq_s8, vmlsdavaq_s8, vmlasq_n_s8,
vmlaq_n_s8, vmladavaxq_s8, vmladavaq_s8, vsriq_n_s8, vsliq_n_s8, vpselq_u16,
vpselq_s16, vrev64q_m_u16, vqrdmlashq_n_u16, vqrdmlahq_n_u16, vqdmlahq_n_u16,
vmvnq_m_u16, vmlasq_n_u16, vmlaq_n_u16, vmladavq_p_u16, vmladavaq_u16,
vminvq_p_u16, vmaxvq_p_u16, vdupq_m_n_u16, vcmpneq_m_u16, vcmpneq_m_n_u16,
vcmphiq_m_u16, vcmphiq_m_n_u16, vcmpeqq_m_u16, vcmpeqq_m_n_u16, vcmpcsq_m_u16,
vcmpcsq_m_n_u16, vclzq_m_u16, vaddvaq_p_u16, vsriq_n_u16, vsliq_n_u16,
vshlq_m_r_u16, vrshlq_m_n_u16, vqshlq_m_r_u16, vqrshlq_m_n_u16, vminavq_p_s16,
vminaq_m_s16, vmaxavq_p_s16, vmaxaq_m_s16, vcmpneq_m_s16, vcmpneq_m_n_s16,
vcmpltq_m_s16, vcmpltq_m_n_s16, vcmpleq_m_s16, vcmpleq_m_n_s16, vcmpgtq_m_s16,
vcmpgtq_m_n_s16, vcmpgeq_m_s16, vcmpgeq_m_n_s16, vcmpeqq_m_s16, vcmpeqq_m_n_s16,
vshlq_m_r_s16, vrshlq_m_n_s16, vrev64q_m_s16, vqshlq_m_r_s16, vqrshlq_m_n_s16,
vqnegq_m_s16, vqabsq_m_s16, vnegq_m_s16, vmvnq_m_s16, vmlsdavxq_p_s16,
vmlsdavq_p_s16, vmladavxq_p_s16, vmladavq_p_s16, vminvq_p_s16, vmaxvq_p_s16,
vdupq_m_n_s16, vclzq_m_s16, vclsq_m_s16, vaddvaq_p_s16, vabsq_m_s16,
vqrdmlsdhxq_s16, vqrdmlsdhq_s16, vqrdmlashq_n_s16, vqrdmlahq_n_s16,
vqrdmladhxq_s16, vqrdmladhq_s16, vqdmlsdhxq_s16, vqdmlsdhq_s16, vqdmlahq_n_s16,
vqdmladhxq_s16, vqdmladhq_s16, vmlsdavaxq_s16, vmlsdavaq_s16, vmlasq_n_s16,
vmlaq_n_s16, vmladavaxq_s16, vmladavaq_s16, vsriq_n_s16, vsliq_n_s16, 
vpselq_u32,
vpselq_s32, vrev64q_m_u32, vqrdmlashq_n_u32, vqrdmlahq_n_u32, vqdmlahq_n_u32,
vmvnq_m_u32, vmlasq_n_u32, vmlaq_n_u32, vmladavq_p_u32, vmladavaq_u32,
vminvq_p_u32, vmaxvq_p_u32, vdupq_m_n_u32, vcmpneq_m_u32, vcmpneq_m_n_u32,
vcmphiq_m_u32, vcmphiq_m_n_u32, vcmpeqq_m_u32, vcmpeqq_m_n_u32, vcmpcsq_m_u32,
vcmpcsq_m_n_u32, vclzq_m_u32, vaddvaq_p_u32, vsriq_n_u32, vsliq_n_u32,
vshlq_m_r_u32, vrshlq_m_n_u32, vqshlq_m_r_u32, vqrshlq_m_n_u32, vminavq_p_s32,
vminaq_m_s32, vmaxavq_p_s32, vmaxaq_m_s32, vcmpneq_m_s32, vcmpneq_m_n_s32,
vcmpltq_m_s32, vcmpltq_m_n_s32, vcmpleq_m_s32, vcmpleq_m_n_s32, vcmpgtq_m_s32,
vcmpgtq_m_n_s32, vcmpgeq_m_s32, vcmpgeq_m_n_s32, vcmpeqq_m_s32, vcmpeqq_m_n_s32,
vshlq_m_r_s32, vrshlq_m_n_s32, vrev64q_m_s32, vqshlq_m_r_s32, vqrshlq_m_n_s32,
vqnegq_m_s32, vqabsq_m_s32, vnegq_m_s32, vmvnq_m_s32, vmlsdavxq_p_s32,
vmlsdavq_p_s32, vmladavxq_p_s32, vmladavq_p_s32, vminvq_p_s32, vmaxvq_p_s32,
vdupq_m_n_s32, vclzq_m_s32, vclsq_m_s32, vaddvaq_p_s32, vabsq_m_s32,
vqrdmlsdhxq_s32, vqrdmlsdhq_s32, vqrdmlashq_n_s32, vqrdmlahq_n_s32,
vqrdmladhxq_s32, vqrdmladhq_s32, vqdmlsdhxq_s32, vqdmlsdhq_s32, vqdmlahq_n_s32,
vqdmladhxq_s32, vqdmladhq_s32, vmlsdavaxq_s32, vmlsdavaq_s32, vmlasq_n_s32,
vmlaq_n_s32, vmladavaxq_s32, vmladavaq_s32, vsriq_n_s32, vsliq_n_s32,
vpselq_u64, vpselq_s64.

Please refer to the M-profile Vector Extension (MVE) intrinsics [1] for more
details.
[1] 
https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

In this patch new constraints "Rc" and "Re" are added, which check that the
constant is within the range of 0 to 15 and 0 to 31 respectively.

Also new predicates "mve_imm_15" and "mve_imm_31" are added, to check the
matching constraints Rc and Re respectively.
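Such a constraint/predicate pair is typically defined in
config/arm/constraints.md and config/arm/predicates.md along the following
lines (a hedged sketch of the idiom, not the exact committed text):

```lisp
(define_constraint "Rc"
  "@internal Constant in the range 0 to 15 for MVE."
  (and (match_code "const_int")
       (match_test "IN_RANGE (ival, 0, 15)")))

(define_predicate "mve_imm_15"
  (match_test "satisfies_constraint_Rc (op)"))
```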

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-10-25  Andre Vieira  
Mihail Ionescu  
Srinath Parvathaneni  

* config/arm/arm_mve.h (vpselq_u8): Define macro.
(vpselq_s8): Likewise.
(vrev64q_m_u8): Likewise.
(vqrdmlashq_n_u8): Likewise.
(vqrdmlahq_n_u8): Likewise.
(vqdmlahq_n_u8): Likewise.
(vmvnq_m_u8): Likewise.
(vmlasq_n_u8): Likewise.

Re: [PATCH] c++: Fix constexpr evaluation of self-modifying CONSTRUCTORs [PR94066]

2020-03-18 Thread Patrick Palka via Gcc-patches
On Wed, 18 Mar 2020, Patrick Palka wrote:

> On Tue, 17 Mar 2020, Jason Merrill wrote:
> 
> > On 3/16/20 1:39 PM, Patrick Palka wrote:
> > > In this PR, we are performing constexpr evaluation of a CONSTRUCTOR of 
> > > type
> > > union U which looks like
> > > 
> > >{.a=foo (&)}.
> > > 
> > > Since the function foo takes a reference to the CONSTRUCTOR we're 
> > > building,
> > > it
> > > could potentially modify the CONSTRUCTOR from under us.  In particular 
> > > since
> > > U
> > > is a union, the evaluation of a's initializer could change the active 
> > > member
> > > from a to another member -- something which cxx_eval_bare_aggregate 
> > > doesn't
> > > expect to happen.
> > > 
> > > Upon further investigation, it turns out this issue is not limited to
> > > constructors of UNION_TYPE and not limited to cxx_eval_bare_aggregate
> > > either.
> > > For example, within cxx_eval_store_expression we may be evaluating an
> > > assignment
> > > such as (this comes from the test pr94066-2.C):
> > > 
> > >((union U *) this)->a = TARGET_EXPR ;
> > 
> > I assume this is actually an INIT_EXPR, or we would have preevaluated and 
> > not
> > had this problem.
> 
> Yes exactly, I should have specified that the above is an INIT_EXPR and
> not a MODIFY_EXPR.
> 
> > 
> > > where evaluation of foo could change the active member of *this, which was
> > > set
> > > earlier in cxx_eval_store_expression to 'a'.  And if U is a RECORD_TYPE,
> > > then
> > > evaluation of foo could add new fields to *this, thereby making stale the
> > > 'valp'
> > > pointer to the target constructor_elt through which we're later assigning.
> > > 
> > > So in short, it seems that both cxx_eval_bare_aggregate and
> > > cxx_eval_store_expression do not anticipate that a constructor_elt's
> > > initializer
> > > could modify the underlying CONSTRUCTOR as a side-effect.
> > 
> > Oof.  If this is well-formed, it's because initialization of a doesn't
> > actually start until the return statement of foo, so we're probably wrong to
> > create a CONSTRUCTOR to hold the value of 'a' before evaluating foo.  
> > Perhaps
> > init_subob_ctx shouldn't preemptively create a CONSTRUCTOR, and similarly 
> > for
> > the cxx_eval_store_expression !preeval code.
> 
> Hmm, I think I see what you mean.  I'll look into this.

In cpp0x/constexpr-array12.C we have

struct A { int ar[3]; };
constexpr A a1 = { 0, a1.ar[0] };

the initializer for a1 is a CONSTRUCTOR with the form

{.ar={0, (int) VIEW_CONVERT_EXPR(a1).ar[0]}}

If we don't preemptively create a CONSTRUCTOR in cxx_eval_bare_aggregate
to hold the value of 'ar' before evaluating its initializer, then we
won't be able to resolve the 'a1.ar[0]' later on, and we will reject
this otherwise valid test case with an "accessing an uninitialized array
element" diagnostic.  So it seems we need to continue creating a
CONSTRUCTOR in cxx_eval_bare_aggregate before evaluating the initializer
of an aggregate sub-object to handle self-referential CONSTRUCTORs like
the one above.

Then again, clang is going with rejecting the original testcase with the
following justification: https://bugs.llvm.org/show_bug.cgi?id=45133#c1
Should we follow suit?
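The stale-'valp' hazard described earlier has a close analogy in ordinary
C++ (a hedged illustration only -- the names are made up, and std::vector
stands in for the constructor_elt vec):

```cpp
#include <cassert>
#include <vector>

// Returns the value stored through the recomputed pointer.
int update_after_growth ()
{
  std::vector<int> ctor = {1, 2, 3};
  int *valp = &ctor[1];   // cached pointer into the container
  ctor.push_back (4);     // "evaluation adds new fields": may reallocate
  valp = &ctor[1];        // recompute, as the patch re-finds the elt
  *valp = 42;
  return ctor[1];
}
```

Holding `valp` across the mutation is exactly the bug; recomputing it after
evaluation is the patch's fix.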

> 
> > 
> > > To fix this problem, this patch makes cxx_eval_bare_aggregate and
> > > cxx_eval_store_expression recompute the pointer to the constructor_elt
> > > through
> > > which we're assigning, after the initializer has been evaluated.
> > > 
> > > I am worried that the use of get_or_insert_ctor_field in
> > > cxx_eval_bare_aggregate
> > > might introduce quadratic behavior where there wasn't before.  I wonder if
> > > there's a cheap heuristic we can use in cxx_eval_bare_aggregate to 
> > > determine
> > > whether "self-modification" took place, and in which case we could avoid
> > > calling
> > > get_or_insert_ctor_field and do the fast thing we that we were doing 
> > > before
> > > the
> > > patch.
> > 
> > We could remember the (integer) index of the constructor element and verify
> > that our sub-ctors are still at those indices before doing a new search.
> 
> Good idea, this should essentially eliminate the overhead of the second
> set of calls to get_or_insert_ctor_fields in cxx_eval_store_expression
> in the common case where the initializer doesn't mutate the underlying
> CONSTRUCTORs.  I added this optimization to v2 of the patch, through a
> new 'pos_hint' parameter of get_or_insert_ctor_fields.
> 
> As another optimization to get_or_insert_ctor_fields, we can first check
> whether the bit_position of INDEX is greater than the bit_position of
> the last constructor element of CTOR.  In this case we can immediately
> append a new constructor element to CTOR, and avoid iterating over the
> whole TYPE_FIELDS chain and over CTOR.  I think this optimization
> eliminates the potential quadratic behavior in cxx_eval_bare_aggregate
> in the common case.
> 
> With these two optimizations, the added overhead of the recomputations
> added by this patch 

Re: [PATCH v2] generate EH info for volatile asm statements (PR93981)

2020-03-18 Thread J.W. Jagersma via Gcc-patches
Hi Michael, thanks for your response.

On 2020-03-17 16:32, Michael Matz wrote:
> Hello,
> 
> On Mon, 16 Mar 2020, Richard Sandiford wrote:
> 
>> Segher Boessenkool  writes:
>>> On Mon, Mar 16, 2020 at 05:47:03PM +, Richard Sandiford wrote:
 Segher Boessenkool  writes:
>> we do delete "x = 1" for f1.   I think that's the expected behaviour.
>> We don't yet delete the initialisation in f2, but I think in principle
>> we could.
>
> Right.  And this is incorrect if the asm may throw.

 Well...

>> So the kind of contract I was advocating was:
>>
>> - the compiler can't make any assumptions about what the asm does
>>   or doesn't do to output operands when an exception is raised
>>
>> - the source code can't make any assumption about the values bound
>>   to output operands when an exception is raised

 ...with this interpretation, the deletions above would be correct even
 if the asm throws.
>>>
>>> The write to "x" *before the asm* is deleted.  I cannot think of any
>>> interpretation where that is correct (this does not involve inline asm
>>> at all: it is deleting an observable side effect before the exception).
>>
>> It's correct under the contract above :-)
>>
> And the easiest (and only feasible?) way to do this is for the compiler
> to automatically make an input for every output as well, imo.

 Modifying the asm like that feels a bit dangerous,
>>>
>>> Yes, obviously.  The other option is to accept that almost all existing
>>> inline asm will have UB, with -fnon-call-exceptions.  I think that is
>>> an even less desirable option.
>>>
 And the other problem
 still exists: the compiler might assume that the output isn't modified
 unless the asm completes normally.
>>>
>>> I don't understand what this means?  As far as the compiler is concerned
>>> any asm is just one instruction?  And it all executes completely always.
>>> You need to do things with the constraints to tell the compiler it does
>>> not know some of the values around.  If you have both an input and an
>>> output for a variable, the compiler does not know what value is written
>>> to it, and it might just be the one that was the input already (which is
>>> the same effect as not writing it at all).
>>
>> Normally, for SSA names in something like:
>>
>>   _1 = foo ()
>>
>> the definition of _1 does not take place when foo throws.
> 
> Mostly, but maybe we need to lift this somewhen.  E.g. when we support 
> SSA form for non-registers; the actual return might then be via invisible 
> reference, and hence the result might be changed even if foo throws.  That 
> also could happen right now for some return types depending on the 
> architecture (think large float types).  Our workaround for some of these 
> cases (where it's obvious that the result will lie in memory) is to put 
> the real copy-out into an extra gimple insn and make the LHS be a 
> temporary; but of course we don't want that with too large types.
> 
>> Similarly for non-call exceptions on other statements.  It sounds like 
>> what you're describing requires the corresponding definition to happen 
>> for memory outputs regardless of whether the asm throws or not, so that 
>> the memory appears to change on both execution paths.  Otherwise, the 
>> compiler would be able to assume that the memory operand still has its 
>> original value in the exception handler.
> 
> Well, it's both: on the exception path the compiler has to assume that 
> the value wasn't changed (so that former defines are regarded as dead) or 
> that it already has changed (so that the effects the throwing 
> "instruction" had on the result (if any) aren't lost).  The easiest for 
> this is to regard the result place as also being an input.

The way I see it, there are two options: either you use the outputs
when an exception is thrown, or you don't.

The first option is more or less what my first patch did, but it was
incomplete.  Changing each output to in+out would make that work
correctly.

The second is what I have implemented now, each output is assigned via
a temporary which is then assigned back to the variable bound to this
output.  On exception, this temporary is discarded.  However this is
not possible for asms that write directly to memory, so those cases are
treated like option #1.

I think the second option is easier on optimization since any previous
assignments can be removed from the normal code path, and registers
don't need to be loaded with known-valid values before the asm.  The
first option is more consistent since all outputs are treated the same,
but more dangerous, as the asm may write incomplete values before
throwing.

So, I prefer the second option, but I don't really have a strong
opinion either way.  As long as I can catch my exceptions I'm fine with
anything.  If the consensus is that the first option is preferable,
then I'll implement and submit that.
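At the source level, the two options can be sketched roughly as follows
(hedged: hypothetical function names, and an empty asm template with a tied
input operand so the sketch stays portable across targets):

```cpp
#include <cassert>

// Option 1: the output is also an input ("+r"), so the compiler must
// assume the variable may hold either the old or a new value if the
// asm throws -- prior assignments to it cannot be deleted.
int option_one ()
{
  int x = 1;                          // must survive under option 1
  __asm__ volatile ("" : "+r" (x));   // "+r": output tied to input
  return x;
}

// Option 2: write through a temporary and copy back, so on an exception
// path the original variable is untouched and the temporary is discarded.
int option_two ()
{
  int x = 1;
  int tmp;
  __asm__ volatile ("" : "=r" (tmp) : "0" (7));  // tmp receives 7
  x = tmp;                            // copy-back on normal completion
  return x;
}
```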

> (If broadened 

[Patch, committed] Fix libgomp.oacc-fortran/atomic_capture-1.f90

2020-03-18 Thread Tobias Burnus

The test case does the following:

   igot = 1

   !$acc parallel loop copy (igot, itmp)
 do i = 1, N
   !$acc atomic capture
   iarr(i) = igot
   igot = max (igot, i)
   !$acc end atomic
 end do
   !$acc end parallel loop

And then checks that "all(iarr < N)". That works fine as long as
no other code accesses "igot" after the i=N thread has set
it. Otherwise, all later accesses to igot will be N – and
thus some iarr(i) will be N, causing FAILs.

Or in other words: The current code either works if i=N is
run last (e.g. serial code) or when all concurrent accesses
access "igot" early enough such that none (or at least not
i=N) has modified it.
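The race can be replayed deterministically by running the captures in an
adversarial order (a hedged C++ replay of the Fortran loop, not the test
itself):

```cpp
#include <algorithm>

// Replay the atomic-capture loop with the i = N iteration scheduled
// first, as a concurrent schedule may do; return the value captured
// by iteration i = 1.
int captured_after_max_first ()
{
  const int N = 8;
  int igot = 1;
  int iarr[N + 1] = {0};
  const int order[N] = {N, 1, 2, 3, 4, 5, 6, 7};
  for (int k = 0; k < N; ++k)
    {
      int i = order[k];
      iarr[i] = igot;            // atomic capture of the old value
      igot = std::max (igot, i); // atomic update
    }
  // Every capture after the first now sees N, so the old check
  // "all (iarr < N)" fails even though the atomics behaved correctly.
  return iarr[1];
}
```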

While serial ("host") and Nvidia work (PASS),
AMDGCN runs did fail before this patch.

The OG9 commit by Julian already fixed this, but it was
never put on the GCC trunk (= 10.0), cf.
868d3ad10f2dfd532a494bfe1513200eb361a6de on devel/omp/gcc-9.

While Julian's patch modified much more (also a C test case),
I have committed a minimal version which only fixes the
issue mentioned above for MIN and for MAX.

Committed as r10-7252-g26cbcfe5fce57b090b0f2336aad27d84b725f760

Cheers,

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
commit 26cbcfe5fce57b090b0f2336aad27d84b725f760
Author: Tobias Burnus 
Date:   Wed Mar 18 16:28:08 2020 +0100

Fix libgomp.oacc-fortran/atomic_capture-1.f90

2020-03-18  Julian Brown 
Tobias Burnus  

* testsuite/libgomp.oacc-fortran/atomic_capture-1.f90: Really make
it work concurrently.

diff --git a/libgomp/ChangeLog b/libgomp/ChangeLog
index 3dbe94bc982..9a1065fef4e 100644
--- a/libgomp/ChangeLog
+++ b/libgomp/ChangeLog
@@ -1,3 +1,9 @@
+2020-03-18  Julian Brown 
+	Tobias Burnus  
+
+	* testsuite/libgomp.oacc-fortran/atomic_capture-1.f90: Really make
+	it work concurrently.
+
 2020-03-18  Tobias Burnus  
 
 	* testsuite/libgomp.oacc-c++/firstprivate-mappings-1.C: Add
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/atomic_capture-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/atomic_capture-1.f90
index 5a4a1e03f64..536b3f0030c 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/atomic_capture-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/atomic_capture-1.f90
@@ -275,8 +275,9 @@ program main
   if (ltmp .neqv. .not. lexp) STOP 33
   if (lgot .neqv. lexp) STOP 34
 
-  igot = 1
+  igot = 0
   iexp = N
+  iarr = -42
 
   !$acc parallel loop copy (igot, itmp)
 do i = 1, N
@@ -287,13 +288,24 @@ program main
 end do
   !$acc end parallel loop
 
+  if (igot /= N) stop 107
+  itmp = 0
+  do i = 1, N
+ if (iarr(i) == 0) then
+   itmp = i
+   exit
+ end if
+  end do
+  ! At most one iarr element can be 0.
   do i = 1, N
- if (.not. (1 <= iarr(i) .and. iarr(i) < iexp)) STOP 35
+ if ((iarr(i) == 0 .and. i /= itmp) &
+ .or. iarr(i) < 0 .or. iarr(i) >= N) STOP 35
   end do
   if (igot /= iexp) STOP 36
 
-  igot = N
+  igot = N + 1
   iexp = 1
+  iarr = -42
 
   !$acc parallel loop copy (igot, itmp)
 do i = 1, N
@@ -304,8 +316,18 @@ program main
 end do
   !$acc end parallel loop
 
+  if (igot /= 1) stop 108
+  itmp = N + 1
+  ! At most one iarr element can be N+1.
+  do i = 1, N
+ if (iarr(i) == N + 1) then
+   itmp = i
+   exit
+ end if
+  end do
   do i = 1, N
- if (.not. (iarr(i) == 1 .or. iarr(i) == N)) STOP 37
+ if ((iarr(i) == N + 1 .and. i /= itmp) &
+ .or. iarr(i) <= 0 .or. iarr(i) > N + 1) STOP 37
   end do
   if (igot /= iexp) STOP 38
 


Re: [PATCH] Fix PR90332 by extending half size vector mode

2020-03-18 Thread Richard Biener via Gcc-patches
On Wed, Mar 18, 2020 at 2:56 PM Kewen.Lin  wrote:
>
> Hi Richi,
>
> Thanks for your comments.
>
> on 2020/3/18 下午6:39, Richard Biener wrote:
> > On Wed, Mar 18, 2020 at 11:06 AM Kewen.Lin  wrote:
> >>
> >> Hi,
> >>
> >> As PR90332 shows, the current scalar epilogue peeling for gaps
> >> elimination requires expected vec_init optab with two half size
> >> vector mode.  On Power, we don't support vector mode like V8QI,
> >> so can't support optab like vec_initv16qiv8qi.  But we want to
> >> leverage existing scalar mode like DI to init the desirable
> >> vector mode.  This patch is to extend the existing support for
> >> Power, as evaluated on Power9 we can see expected 1.9% speed up
> >> on SPEC2017 525.x264_r.
> >>
> >> Bootstrapped/regtested on powerpc64le-linux-gnu (LE) P8 and P9.
> >>
> >> Is it ok for trunk?
> >
> > There's already code exercising such a case in vectorizable_load
> > (VMAT_STRIDED_SLP) which you could have factored out.
> >
>
> Nice, will refer to and factor it.
>
> >  vectype, bool slp,
> >  than the alignment boundary B.  Every vector access will
> >  be a multiple of B and so we are guaranteed to access a
> >  non-gap element in the same B-sized block.  */
> > + machine_mode half_mode;
> >   if (overrun_p
> >   && gap < (vect_known_alignment_in_bytes (first_dr_info)
> > / vect_get_scalar_dr_size (first_dr_info)))
> > -   overrun_p = false;
> > -
> > +   {
> > + overrun_p = false;
> > + if (known_eq (nunits, (group_size - gap) * 2)
> > + && known_eq (nunits, group_size)
> > + && get_half_mode_for_vector (vectype, _mode))
> > +   DR_GROUP_HALF_MODE (first_stmt_info) = half_mode;
> > +   }
> >
> > why do you need to amend this case?
> >
>
> This path can set overrun_p to false, so some cases can fall into
> the "no peeling for gaps" hunk in vectorizable_load.  Since I used
> DR_GROUP_HALF_MODE to save the half mode, if some case matches
> this condition, the vectorizable_load hunk can get an uninitialized
> DR_GROUP_HALF_MODE.  But even with the proposed recomputing way, I
> think we still need to check the vec_init optab here if the
> known_eq half-size conditions hold?

Hmm, but for the above case it's fine to access the excess elements.

I guess the vectorizable_load code needs to be amended with
the alignment check or we do need to store somewhere our
decision to use smaller loads.

>
> > I don't like storing DR_GROUP_HALF_MODE very much, later
> > you need a vector type and it looks cheap enough to recompute
> > it where you need it?  Iff then it doesn't belong to DR_GROUP
> > but to the stmt-info.
> >
>
> OK, I was intended not to recompute it for time saving, will
> throw it away.
>
> > I realize the original optimization was kind of a hack (and I was too
> > lazy to implement the integer mode construction path ...).
> >
> > So, can you factor out the existing code into a function returning
> > the vector type for construction for a vector type and a
> > pieces size?  So for V16QI and a pieces-size of 4 we'd
> > get either V16QI back (then construction from V4QI pieces
> > should work) or V4SI (then construction from SImode pieces
> > should work)?  Eventually as secondary output provide that
> > piece type (SI / V4QI).
>
> Sure.  I'm not very good at picking function names; does the name
> suitable_vector_and_pieces sound good?
>   ie. tree suitable_vector_and_pieces (tree vtype, tree *ptype);

tree vector_vector_composition_type (tree vtype, poly_uint64 nelts,
tree *ptype);

where nelts specifies the number of vtype elements in a piece.

Richard.

>
> BR,
> Kewen
>


Re: [PATCH] avoid treating more incompatible redeclarations as builtin-ins [PR94040]

2020-03-18 Thread Jeff Law via Gcc-patches
On Wed, 2020-03-18 at 14:25 +, Szabolcs Nagy wrote:
> The 03/13/2020 10:45, Martin Sebor via Gcc-patches wrote:
> > On 3/12/20 7:17 PM, Joseph Myers wrote:
> > > On Thu, 5 Mar 2020, Martin Sebor wrote:
> > > 
> > > > Tested on x86_64-linux.  Is this acceptable for GCC 10?  How about 9?
> > > 
> > > OK for GCC 10.
> > 
> > Thank you.  I committed it to trunk in r10-7162.
> 
> arm glibc build fails for me since this commit.
> 
> ../sysdeps/ieee754/dbl-64/s_modf.c:84:28: error: conflicting types for 
> built-in 
> function 'modfl'; expected 'long double(long double,  long double *)' [-
> Werror=builtin-declaration-mismatch]
>84 | libm_alias_double (__modf, modf)
>   |^~~~
> 
> it seems this used to compile but not any more:
> 
> double modf (double x, double *p) { return x; }
> extern __typeof (modf) modfl __attribute__ ((weak, alias ("modf")))
> __attribute__ ((__copy__ (modf)));
I think Joseph posted something this morning that might fix this.  

jeff



Re: [PATCH] avoid treating more incompatible redeclarations as builtin-ins [PR94040]

2020-03-18 Thread Szabolcs Nagy
The 03/13/2020 10:45, Martin Sebor via Gcc-patches wrote:
> On 3/12/20 7:17 PM, Joseph Myers wrote:
> > On Thu, 5 Mar 2020, Martin Sebor wrote:
> > 
> > > Tested on x86_64-linux.  Is this acceptable for GCC 10?  How about 9?
> > 
> > OK for GCC 10.
> 
> Thank you.  I committed it to trunk in r10-7162.

i see glibc build failure on arm since this commit:

../sysdeps/ieee754/dbl-64/s_modf.c:84:28: error: conflicting types for built-in 
function 'modfl'; expected 'long double(long double,  long double *)' 
[-Werror=builtin-declaration-mismatch]
   84 | libm_alias_double (__modf, modf)
  |^~~~

it seems this used to compile:

double modf (double x, double *p) { return x; }
extern __typeof (modf) modfl __attribute__ ((weak, alias ("modf"))) 
__attribute__ ((__copy__ (modf)));



[committed] analyzer: make summarized dumps more comprehensive

2020-03-18 Thread David Malcolm via Gcc-patches
The previous implementation of summarized dumps within
region_model::dump_to_pp showed only the "top-level" keys within the
current frame and for globals, and thus didn't e.g. show the values
of fields of structs, or elements of arrays.

This patch rewrites it to gather a vec of representative path_vars
for all regions, using this to generate the dump, so that all expressible
lvalues ought to make it to the summarized dump.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to master as 884d914111228eed977d794f38e4cc88bf132a58.

gcc/analyzer/ChangeLog:
* region-model.cc: Include "stor-layout.h".
(region_model::dump_to_pp): Rather than calling
dump_summary_of_map on each of the current frame and the globals,
instead get a vec of representative path_vars for all regions,
and then dump a summary of all of them.
(region_model::dump_summary_of_map): Delete, rewriting into...
(region_model::dump_summary_of_rep_path_vars): ...this new
function, working on a vec of path_vars.
(region_model::set_value): New overload.
(region_model::get_representative_path_var): Rename
"parent_region" local to "parent_reg" and consolidate with other
local.  Guard test for grandparent being stack on parent_reg being
non-NULL.  Move handling for parent being an array_region to
within guard for parent_reg being non-NULL.
(selftest::make_test_compound_type): New function.
(selftest::test_dump_2): New selftest.
(selftest::test_dump_3): New selftest.
(selftest::test_stack_frames): Update expected output from
simplified dump to show "a" and "b" from parent frame and "y" in
child frame.
(selftest::analyzer_region_model_cc_tests): Call test_dump_2 and
test_dump_3.
* region-model.h (region_model::set_value): New overload decl.
(region_model::dump_summary_of_map): Delete.
(region_model::dump_summary_of_rep_path_vars): New.
---
 gcc/analyzer/region-model.cc | 261 ++-
 gcc/analyzer/region-model.h  |   6 +-
 2 files changed, 199 insertions(+), 68 deletions(-)

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 45a190299ea..d0554adb675 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -59,6 +59,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "analyzer/sm.h"
 #include "analyzer/pending-diagnostic.h"
 #include "analyzer/analyzer-selftests.h"
+#include "stor-layout.h"
 
 #if ENABLE_ANALYZER
 
@@ -3538,7 +3539,7 @@ region_model::dump_dot (const char *path) const
 /* Dump a multiline representation of this model to PP, showing the
region hierarchy, the svalues, and any constraints.
 
-   If SUMMARIZE is true, show only the most pertient information,
+   If SUMMARIZE is true, show only the most pertinent information,
in a form that attempts to be less verbose.
Otherwise, show all information.  */
 
@@ -3547,18 +3548,23 @@ region_model::dump_to_pp (pretty_printer *pp, bool 
summarize) const
 {
   if (summarize)
 {
-  bool is_first = true;
-  region_id frame_id = get_current_frame_id ();
-  frame_region *frame = get_region  (frame_id);
-  if (frame)
-   dump_summary_of_map (pp, frame, _first);
-
-  region_id globals_id = get_globals_region_id ();
-  map_region *globals = get_region  (globals_id);
-  if (globals)
-   dump_summary_of_map (pp, globals, _first);
+  auto_vec rep_path_vars;
 
   unsigned i;
+  region *reg;
+  FOR_EACH_VEC_ELT (m_regions, i, reg)
+   {
+ region_id rid = region_id::from_int (i);
+ path_var pv = get_representative_path_var (rid);
+ if (pv.m_tree)
+   rep_path_vars.safe_push (pv);
+   }
+  bool is_first = true;
+
+  /* Work with a copy in case the get_lvalue calls change anything
+(they shouldn't).  */
+  region_model copy (*this);
+  copy.dump_summary_of_rep_path_vars (pp, _path_vars, _first);
 
   equiv_class *ec;
   FOR_EACH_VEC_ELT (m_constraints->m_equiv_classes, i, ec)
@@ -3680,37 +3686,28 @@ dump_vec_of_tree (pretty_printer *pp,
   pp_printf (pp, "}: %s", label);
 }
 
-/* Dump *MAP_REGION to PP in compact form, updating *IS_FIRST.
-   Subroutine of region_model::dump_to_pp for use on stack frames and for
-   the "globals" region.  */
+/* Dump all *REP_PATH_VARS to PP in compact form, updating *IS_FIRST.
+   Subroutine of region_model::dump_to_pp.  */
 
 void
-region_model::dump_summary_of_map (pretty_printer *pp,
-  map_region *map_region,
-  bool *is_first) const
-{
-  /* Get the keys, sorted by tree_cmp.  In particular, this ought
- to alphabetize any decls.  */
-  auto_vec keys (map_region->elements ());
-  for (map_region::iterator_t iter = map_region->begin ();
-   iter != map_region->end 

[committed] analyzer: add test coverage for fixed ICE [PR94047]

2020-03-18 Thread David Malcolm via Gcc-patches
PR analyzer/94047 reports an ICE, which turned out to be caused
by the erroneous use of TREE_TYPE on the view region's type
in region_model::get_representative_path_var that I introduced
in r10-7024-ge516294a1acb28d44cfd583cc6a80354044e and
fixed in g:787477a226033e36be3f6d16b71be13dd917e982.

This patch adds a regression test for the ICE.

Successfully regrtested on x86_64-pc-linux-gnu.
Pushed to master as f665beeba625490bd96a593d23e00726d969cf98.

gcc/testsuite/ChangeLog:
PR analyzer/94047
* gcc.dg/analyzer/pr94047.c: New test.
---
 gcc/testsuite/gcc.dg/analyzer/pr94047.c | 23 +++
 1 file changed, 23 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr94047.c

diff --git a/gcc/testsuite/gcc.dg/analyzer/pr94047.c 
b/gcc/testsuite/gcc.dg/analyzer/pr94047.c
new file mode 100644
index 000..d989a254c9e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/pr94047.c
@@ -0,0 +1,23 @@
+/* { dg-additional-options "-Wno-analyzer-too-complex" } */
+/* TODO: the above ought not to be necessary, but currently is due to a
+   state explosion within the for loop.  */
+
+typedef struct list
+{
+  struct list *next;
+} tlist;
+
+void
+bar (struct list *l)
+{
+  l->next = l->next->next;
+}
+
+void
+foo (void)
+{
+  struct list l;
+  tlist t = l;
+  for (;;)
+bar ();
+}
-- 
2.21.0



[committed] analyzer: introduce noop_region_model_context

2020-03-18 Thread David Malcolm via Gcc-patches
tentative_region_model_context and test_region_model_context are both
forced to implement numerous pure virtual vfuncs of the abstract
region_model_context.

This patch adds a noop_region_model_context which provides empty
implementations of all of region_model_context's pure virtual functions,
and subclasses the above classes from that, rather than from
region_model_context directly.
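The refactoring is the classic "no-op base" (null object) pattern; a
minimal sketch outside the analyzer, with made-up class and vfunc names:

```cpp
// Stand-in for the abstract region_model_context interface.
struct context
{
  virtual ~context () = default;
  virtual void warn (const char *) = 0;
  virtual int on_purge () = 0;
};

// Empty implementations of every pure virtual, like
// noop_region_model_context.
struct noop_context : context
{
  void warn (const char *) override {}
  int on_purge () override { return 0; }
};

// Subclasses now override only the vfuncs they actually care about.
struct counting_context : noop_context
{
  int warnings = 0;
  void warn (const char *) override { ++warnings; }
};
```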

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to master as 7d9c107ab1eab331e7011513b11e26b78850d614.

gcc/analyzer/ChangeLog:
* region-model.h (class noop_region_model_context): New subclass
of region_model_context.
(class tentative_region_model_context): Inherit from
noop_region_model_context rather than from region_model_context;
drop redundant vfunc implementations.
(class test_region_model_context): Likewise.
---
 gcc/analyzer/region-model.h | 84 ++---
 1 file changed, 23 insertions(+), 61 deletions(-)

diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index c1fe592e30c..035b611b813 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -1972,42 +1972,50 @@ class region_model_context
const dump_location_t ) = 0;
 };
 
-/* A subclass of region_model_context for determining if operations fail
-   e.g. "can we generate a region for the lvalue of EXPR?".  */
+/* A "do nothing" subclass of region_model_context.  */
 
-class tentative_region_model_context : public region_model_context
+class noop_region_model_context : public region_model_context
 {
 public:
-  tentative_region_model_context () : m_num_unexpected_codes (0) {}
-
-  void warn (pending_diagnostic *) FINAL OVERRIDE {}
-  void remap_svalue_ids (const svalue_id_map &) FINAL OVERRIDE {}
-  int on_svalue_purge (svalue_id, const svalue_id_map &) FINAL OVERRIDE
+  void warn (pending_diagnostic *) OVERRIDE {}
+  void remap_svalue_ids (const svalue_id_map &) OVERRIDE {}
+  int on_svalue_purge (svalue_id, const svalue_id_map &) OVERRIDE
   {
 return 0;
   }
-  logger *get_logger () FINAL OVERRIDE { return NULL; }
+  logger *get_logger () OVERRIDE { return NULL; }
   void on_inherited_svalue (svalue_id parent_sid ATTRIBUTE_UNUSED,
svalue_id child_sid  ATTRIBUTE_UNUSED)
-FINAL OVERRIDE
+OVERRIDE
   {
   }
   void on_cast (svalue_id src_sid ATTRIBUTE_UNUSED,
-   svalue_id dst_sid ATTRIBUTE_UNUSED) FINAL OVERRIDE
+   svalue_id dst_sid ATTRIBUTE_UNUSED) OVERRIDE
   {
   }
   void on_condition (tree lhs ATTRIBUTE_UNUSED,
 enum tree_code op ATTRIBUTE_UNUSED,
-tree rhs ATTRIBUTE_UNUSED) FINAL OVERRIDE
+tree rhs ATTRIBUTE_UNUSED) OVERRIDE
   {
   }
-  void on_unknown_change (svalue_id sid ATTRIBUTE_UNUSED) FINAL OVERRIDE
+  void on_unknown_change (svalue_id sid ATTRIBUTE_UNUSED) OVERRIDE
   {
   }
   void on_phi (const gphi *phi ATTRIBUTE_UNUSED,
-  tree rhs ATTRIBUTE_UNUSED) FINAL OVERRIDE
+  tree rhs ATTRIBUTE_UNUSED) OVERRIDE
   {
   }
+  void on_unexpected_tree_code (tree, const dump_location_t &) OVERRIDE {}
+};
+
+/* A subclass of region_model_context for determining if operations fail
+   e.g. "can we generate a region for the lvalue of EXPR?".  */
+
+class tentative_region_model_context : public noop_region_model_context
+{
+public:
+  tentative_region_model_context () : m_num_unexpected_codes (0) {}
+
   void on_unexpected_tree_code (tree, const dump_location_t &)
 FINAL OVERRIDE
   {
@@ -2143,7 +2151,7 @@ using namespace ::selftest;
 /* An implementation of region_model_context for use in selftests, which
stores any pending_diagnostic instances passed to it.  */
 
-class test_region_model_context : public region_model_context
+class test_region_model_context : public noop_region_model_context
 {
 public:
   void warn (pending_diagnostic *d) FINAL OVERRIDE
@@ -2151,54 +2159,8 @@ public:
 m_diagnostics.safe_push (d);
   }
 
-  void remap_svalue_ids (const svalue_id_map &) FINAL OVERRIDE
-  {
-/* Empty.  */
-  }
-
-#if 0
-  bool can_purge_p (svalue_id) FINAL OVERRIDE
-  {
-return true;
-  }
-#endif
-
-  int on_svalue_purge (svalue_id, const svalue_id_map &) FINAL OVERRIDE
-  {
-/* Empty.  */
-return 0;
-  }
-
-  logger *get_logger () FINAL OVERRIDE { return NULL; }
-
-  void on_inherited_svalue (svalue_id parent_sid ATTRIBUTE_UNUSED,
-   svalue_id child_sid  ATTRIBUTE_UNUSED)
-FINAL OVERRIDE
-  {
-  }
-
-  void on_cast (svalue_id src_sid ATTRIBUTE_UNUSED,
-   svalue_id dst_sid ATTRIBUTE_UNUSED) FINAL OVERRIDE
-  {
-  }
-
   unsigned get_num_diagnostics () const { return m_diagnostics.length (); }
 
-  void on_condition (tree lhs ATTRIBUTE_UNUSED,
-enum tree_code op ATTRIBUTE_UNUSED,
-tree rhs ATTRIBUTE_UNUSED) FINAL OVERRIDE
-  {
-  }
-

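The refactor above splits a "do nothing" base class out of the tentative context, so that subclasses only override the hooks they care about. A minimal sketch of the same pattern, with hypothetical names rather than the analyzer's real interface:

```cpp
#include <cassert>

/* A pure interface, a "do nothing" implementation of it, and a subclass
   that overrides only the one hook it cares about.  Illustrative only.  */

class context
{
public:
  virtual ~context () {}
  virtual void warn (const char *) = 0;
  virtual void on_unexpected_tree_code () = 0;
};

class noop_context : public context
{
public:
  void warn (const char *) override {}
  void on_unexpected_tree_code () override {}
};

class tentative_context : public noop_context
{
public:
  /* Count failures instead of ignoring them.  */
  void on_unexpected_tree_code () override { ++m_num_unexpected_codes; }
  bool had_errors_p () const { return m_num_unexpected_codes > 0; }

private:
  int m_num_unexpected_codes = 0;
};
```

With the noop base in place, each new hook added to the interface needs a stub in exactly one place.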
[committed] analyzer: tweaks to exploded_node ctor

2020-03-18 Thread David Malcolm via Gcc-patches
I have followup work that touches this, so it's easiest to get this
cleanup in first.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to master as 0db2cd177020920e187ef47791d52cf689133a25.

gcc/analyzer/ChangeLog:
* engine.cc (exploded_node::exploded_node): Move implementation
here from header; accept point_and_state by const reference rather
than by value.
* exploded-graph.h (exploded_node::exploded_node): Pass
point_and_state by const reference rather than by value.  Move
body to engine.cc.
---
 gcc/analyzer/engine.cc| 11 +++
 gcc/analyzer/exploded-graph.h |  7 +--
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index 2431ae34474..977626fd1b8 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -795,6 +795,17 @@ print_enode_indices (pretty_printer *pp,
 }
 }
 
+/* class exploded_node : public dnode.  */
+
+/* exploded_node's ctor.  */
+
+exploded_node::exploded_node (const point_and_state &ps,
+ int index)
+: m_ps (ps), m_status (STATUS_WORKLIST), m_index (index)
+{
+  gcc_checking_assert (ps.get_state ().m_region_model->canonicalized_p ());
+}
+
 /* For use by dump_dot, get a value for the .dot "fillcolor" attribute.
Colorize by sm-state, to make it easier to see how sm-state propagates
through the exploded_graph.  */
diff --git a/gcc/analyzer/exploded-graph.h b/gcc/analyzer/exploded-graph.h
index c0a520a9961..b9a561848b3 100644
--- a/gcc/analyzer/exploded-graph.h
+++ b/gcc/analyzer/exploded-graph.h
@@ -175,12 +175,7 @@ class exploded_node : public dnode
 STATUS_MERGER
   };
 
-  exploded_node (point_and_state ps,
-int index)
-  : m_ps (ps), m_status (STATUS_WORKLIST), m_index (index)
-  {
-gcc_checking_assert (ps.get_state ().m_region_model->canonicalized_p ());
-  }
+  exploded_node (const point_and_state &ps, int index);
 
   hashval_t hash () const { return m_ps.hash (); }
 
-- 
2.21.0
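The change switches the constructor to take point_and_state by const reference rather than by value. The difference can be sketched with a hypothetical copy-counting type (illustrative only, not the analyzer's classes):

```cpp
#include <cassert>

/* A stand-in for a potentially large state object; each copy bumps a
   counter so we can see how many copies each ctor style performs.  */
struct big_state
{
  big_state () = default;
  big_state (const big_state &other) : copies (other.copies + 1) {}
  int copies = 0;
};

/* By value: one copy into the parameter, another into the member.  */
struct node_by_value
{
  node_by_value (big_state s) : m_s (s) {}
  big_state m_s;
};

/* By const reference: a single copy, directly into the member.  */
struct node_by_cref
{
  node_by_cref (const big_state &s) : m_s (s) {}
  big_state m_s;
};
```

For a class that stores the argument anyway, const reference halves the copies without changing the interface's meaning.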



Re: [PATCH] Fix PR90332 by extending half size vector mode

2020-03-18 Thread Kewen.Lin via Gcc-patches
on 2020/3/18 6:40 PM, Richard Biener wrote:
> On Wed, Mar 18, 2020 at 11:39 AM Richard Biener
>  wrote:
>>
>> On Wed, Mar 18, 2020 at 11:06 AM Kewen.Lin  wrote:
>>>
>>> Hi,
>>>
>>> As PR90332 shows, the current scalar epilogue peeling for gaps
>>> elimination requires expected vec_init optab with two half size
>>> vector mode.  On Power, we don't support vector mode like V8QI,
>>> so can't support optab like vec_initv16qiv8qi.  But we want to
>>> leverage existing scalar mode like DI to init the desirable
>>> vector mode.  This patch is to extend the existing support for
>>> Power, as evaluated on Power9 we can see expected 1.9% speed up
>>> on SPEC2017 525.x264_r.
>>>
>>> Bootstrapped/regtested on powerpc64le-linux-gnu (LE) P8 and P9.
>>>
>>> Is it ok for trunk?
>>
>> There's already code exercising such a case in vectorizable_load
>> (VMAT_STRIDED_SLP) which you could have factored out.
>>
>>  vectype, bool slp,
>>  than the alignment boundary B.  Every vector access will
>>  be a multiple of B and so we are guaranteed to access a
>>  non-gap element in the same B-sized block.  */
>> + machine_mode half_mode;
>>   if (overrun_p
>>   && gap < (vect_known_alignment_in_bytes (first_dr_info)
>> / vect_get_scalar_dr_size (first_dr_info)))
>> -   overrun_p = false;
>> -
>> +   {
>> + overrun_p = false;
>> + if (known_eq (nunits, (group_size - gap) * 2)
>> + && known_eq (nunits, group_size)
>> + && get_half_mode_for_vector (vectype, &half_mode))
>> +   DR_GROUP_HALF_MODE (first_stmt_info) = half_mode;
>> +   }
>>
>> why do you need to amend this case?
>>
>> I don't like storing DR_GROUP_HALF_MODE very much, later
>> you need a vector type and it looks cheap enough to recompute
>> it where you need it?  Iff then it doesn't belong to DR_GROUP
>> but to the stmt-info.
>>
>> I realize the original optimization was kind of a hack (and I was too
>> lazy to implement the integer mode construction path ...).
>>
>> So, can you factor out the existing code into a function returning
>> the vector type for construction for a vector type and a
>> pieces size?  So for V16QI and a pieces-size of 4 we'd
>> get either V16QI back (then construction from V4QI pieces
>> should work) or V4SI (then construction from SImode pieces
>> should work)?  Eventually as secondary output provide that
>> piece type (SI / V4QI).
> 
> Btw, why not implement the neccessary vector init patterns?
> 

Power doesn't support 64bit vector size, it looks a bit hacky and
confusing to introduce this kind of mode just for some optab requirement,
but I admit the optab hack can immediately make it work.  :)

BR,
Kewen



Re: [PATCH] Fix PR90332 by extending half size vector mode

2020-03-18 Thread Kewen.Lin via Gcc-patches
Hi Richi,

Thanks for your comments.

on 2020/3/18 6:39 PM, Richard Biener wrote:
> On Wed, Mar 18, 2020 at 11:06 AM Kewen.Lin  wrote:
>>
>> Hi,
>>
>> As PR90332 shows, the current scalar epilogue peeling for gaps
>> elimination requires expected vec_init optab with two half size
>> vector mode.  On Power, we don't support vector mode like V8QI,
>> so can't support optab like vec_initv16qiv8qi.  But we want to
>> leverage existing scalar mode like DI to init the desirable
>> vector mode.  This patch is to extend the existing support for
>> Power, as evaluated on Power9 we can see expected 1.9% speed up
>> on SPEC2017 525.x264_r.
>>
>> Bootstrapped/regtested on powerpc64le-linux-gnu (LE) P8 and P9.
>>
>> Is it ok for trunk?
> 
> There's already code exercising such a case in vectorizable_load
> (VMAT_STRIDED_SLP) which you could have factored out.
> 

Nice, will refer to and factor it.

>  vectype, bool slp,
>  than the alignment boundary B.  Every vector access will
>  be a multiple of B and so we are guaranteed to access a
>  non-gap element in the same B-sized block.  */
> + machine_mode half_mode;
>   if (overrun_p
>   && gap < (vect_known_alignment_in_bytes (first_dr_info)
> / vect_get_scalar_dr_size (first_dr_info)))
> -   overrun_p = false;
> -
> +   {
> + overrun_p = false;
> + if (known_eq (nunits, (group_size - gap) * 2)
> + && known_eq (nunits, group_size)
> + && get_half_mode_for_vector (vectype, &half_mode))
> +   DR_GROUP_HALF_MODE (first_stmt_info) = half_mode;
> +   }
> 
> why do you need to amend this case?
> 

This path can set overrun_p to false, so some cases can fall into
the "no peeling for gaps" hunk in vectorizable_load.  Since I used
DR_GROUP_HALF_MODE to save the half mode, if some case matches
this condition, the vectorizable_load hunk can get an uninitialized
DR_GROUP_HALF_MODE.  But even with the proposed recomputing way, I
think we still need to check the vec_init optab here if the
known_eq half-size conditions hold?


> I don't like storing DR_GROUP_HALF_MODE very much, later
> you need a vector type and it looks cheap enough to recompute
> it where you need it?  Iff then it doesn't belong to DR_GROUP
> but to the stmt-info.
> 

OK, I had intended not to recompute it to save time; I will
throw it away.

> I realize the original optimization was kind of a hack (and I was too
> lazy to implement the integer mode construction path ...).
> 
> So, can you factor out the existing code into a function returning
> the vector type for construction for a vector type and a
> pieces size?  So for V16QI and a pieces-size of 4 we'd
> get either V16QI back (then construction from V4QI pieces
> should work) or V4SI (then construction from SImode pieces
> should work)?  Eventually as secondary output provide that
> piece type (SI / V4QI).

Sure.  I'm not very good at picking function names; does
suitable_vector_and_pieces sound good?
  ie. tree suitable_vector_and_pieces (tree vtype, tree *ptype);


BR,
Kewen
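The idea under discussion, initializing a full-size vector from existing scalar-mode (DImode-like) pieces when half-size vector modes such as V8QI don't exist, can be modeled in plain C++. This is purely illustrative; the real transformation happens on RTL modes, and the function name here is made up:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

/* Assemble a 16-byte "vector" value from two 8-byte scalar pieces,
   the way a V16QI could be built from two DImode values when V8QI
   (and hence vec_initv16qiv8qi) is unavailable.  */
std::array<uint8_t, 16>
build_from_di_pieces (uint64_t lo, uint64_t hi)
{
  std::array<uint8_t, 16> v;
  std::memcpy (v.data (), &lo, 8);      /* low half from one scalar piece */
  std::memcpy (v.data () + 8, &hi, 8);  /* high half from the other */
  return v;
}
```

The helper being requested would make this choice generically: for a V16QI result and a 4-byte piece, return either V16QI (build from V4QI pieces) or V4SI (build from SImode pieces), depending on which modes the target actually supports.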



Re: [PATCH] c++: Fix constexpr evaluation of self-modifying CONSTRUCTORs [PR94066]

2020-03-18 Thread Patrick Palka via Gcc-patches
On Tue, 17 Mar 2020, Jason Merrill wrote:

> On 3/16/20 1:39 PM, Patrick Palka wrote:
> > In this PR, we are performing constexpr evaluation of a CONSTRUCTOR of type
> > union U which looks like
> > 
> >{.a=foo (&)}.
> > 
> > Since the function foo takes a reference to the CONSTRUCTOR we're building,
> > it
> > could potentially modify the CONSTRUCTOR from under us.  In particular since
> > U
> > is a union, the evaluation of a's initializer could change the active member
> > from a to another member -- something which cxx_eval_bare_aggregate doesn't
> > expect to happen.
> > 
> > Upon further investigation, it turns out this issue is not limited to
> > constructors of UNION_TYPE and not limited to cxx_eval_bare_aggregate
> > either.
> > For example, within cxx_eval_store_expression we may be evaluating an
> > assignment
> > such as (this comes from the test pr94066-2.C):
> > 
> >((union U *) this)->a = TARGET_EXPR ;
> 
> I assume this is actually an INIT_EXPR, or we would have preevaluated and not
> had this problem.

Yes exactly, I should have specified that the above is an INIT_EXPR and
not a MODIFY_EXPR.

> 
> > where evaluation of foo could change the active member of *this, which was
> > set
> > earlier in cxx_eval_store_expression to 'a'.  And if U is a RECORD_TYPE,
> > then
> > evaluation of foo could add new fields to *this, thereby making stale the
> > 'valp'
> > pointer to the target constructor_elt through which we're later assigning.
> > 
> > So in short, it seems that both cxx_eval_bare_aggregate and
> > cxx_eval_store_expression do not anticipate that a constructor_elt's
> > initializer
> > could modify the underlying CONSTRUCTOR as a side-effect.
> 
> Oof.  If this is well-formed, it's because initialization of a doesn't
> actually start until the return statement of foo, so we're probably wrong to
> create a CONSTRUCTOR to hold the value of 'a' before evaluating foo.  Perhaps
> init_subob_ctx shouldn't preemptively create a CONSTRUCTOR, and similarly for
> the cxx_eval_store_expression !preeval code.

Hmm, I think I see what you mean.  I'll look into this.

> 
> > To fix this problem, this patch makes cxx_eval_bare_aggregate and
> > cxx_eval_store_expression recompute the pointer to the constructor_elt
> > through
> > which we're assigning, after the initializer has been evaluated.
> > 
> > I am worried that the use of get_or_insert_ctor_field in
> > cxx_eval_bare_aggregate
> > might introduce quadratic behavior where there wasn't before.  I wonder if
> > there's a cheap heuristic we can use in cxx_eval_bare_aggregate to determine
> > whether "self-modification" took place, and in which case we could avoid
> > calling
> > get_or_insert_ctor_field and do the fast thing we that we were doing before
> > the
> > patch.
> 
> We could remember the (integer) index of the constructor element and verify
> that our sub-ctors are still at those indices before doing a new search.

Good idea, this should essentially eliminate the overhead of the second
set of calls to get_or_insert_ctor_fields in cxx_eval_store_expression
in the common case where the initializer doesn't mutate the underlying
CONSTRUCTORs.  I added this optimization to v2 of the patch, through a
new 'pos_hint' parameter of get_or_insert_ctor_fields.

As another optimization to get_or_insert_ctor_fields, we can first check
whether the bit_position of INDEX is greater than the bit_position of
the last constructor element of CTOR.  In this case we can immediately
append a new constructor element to CTOR, and avoid iterating over the
whole TYPE_FIELDS chain and over CTOR.  I think this optimization
eliminates the potential quadratic behavior in cxx_eval_bare_aggregate
in the common case.

With these two optimizations, the added overhead of the recomputations
added by this patch should be negligible.

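The two optimizations described above can be sketched as follows. This is a toy model, not the actual constexpr.c code: a CONSTRUCTOR is modeled as a vector of (index, value) pairs, and `pos_hint` plays the role of the remembered element position:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using ctor = std::vector<std::pair<int, int>>;

int *
get_or_insert_field (ctor &c, int index, size_t pos_hint)
{
  /* Fast path 1: the remembered hint still points at the right element,
     i.e. the initializer didn't mutate the CONSTRUCTOR under us.  */
  if (pos_hint < c.size () && c[pos_hint].first == index)
    return &c[pos_hint].second;
  /* Fast path 2: a new index past the last element can be appended
     immediately, avoiding a scan over all existing elements.  */
  if (c.empty () || index > c.back ().first)
    {
      c.push_back ({index, 0});
      return &c.back ().second;
    }
  /* Slow path: linear scan (this is the potentially quadratic part).  */
  for (auto &elt : c)
    if (elt.first == index)
      return &elt.second;
  c.push_back ({index, 0});
  return &c.back ().second;
}
```

In the common case of in-order initialization without self-modification, only the two fast paths run, so the recomputation after each initializer stays cheap.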
-- >8 --

gcc/cp/ChangeLog:

PR c++/94066
* constexpr.c (get_or_insert_ctor_field): Split out (while adding
handling for VECTOR_TYPEs, and common-case optimizations) from ...
(cxx_eval_store_expression): ... here.  Record the sequence of indexes
into INDEXES that yields the subobject we're assigning to.  Record the
integer offsets of the constructor indexes we're assigning through into
INDEX_POSITIONS.  After evaluating the initializer of the store
expression, recompute valp using INDEXES and using INDEX_POSITIONS as
hints.
(cxx_eval_bare_aggregate): Use get_or_insert_ctor_field to recompute the
pointer to the constructor_elt we're assigning through after evaluating
each initializer.

gcc/testsuite/ChangeLog:

PR c++/94066
* g++.dg/cpp1y/pr94066.C: New test.
* g++.dg/cpp1y/pr94066-2.C: New test.
* g++.dg/cpp1y/pr94066-3.C: New test.
* g++.dg/cpp1y/pr94066-4.C: New test.
* g++.dg/cpp1y/pr94066-5.C: New test.
---
 gcc/cp/constexpr.c | 219 -
 

[committed] amdgcn: Fix vector compare modes

2020-03-18 Thread Andrew Stubbs
This patch fixes a problem which has existed for a long time, but showed 
itself again after my previous patch to add conditional vector operators.


The solution is to set STORE_FLAG_VALUE properly. (More details in the 
patch header.)


Andrew
amdgcn: Fix vector compare modes

The GCN VCC register holds 64 CC values in one register, one bit for each
vector lane.

Previously we avoided problems with invalid optimizations by not declaring
a mode for the comparison operators, but it turns out that causes other
problems (and build warnings).

Instead, the optimization issues can be avoided by setting
STORE_FLAG_VALUE to -1, meaning that all the bits are significant.

(It would be better if we could set STORE_FLAG_VALUE according to the
known mask or vector size, but we can't.)

2020-03-18  Andrew Stubbs  

	gcc/
	* config/gcn/gcn-valu.md (vec_cmpdi): Set operand 1 to DImode.
	(vec_cmpdi_dup): Likewise.
	* config/gcn/gcn.h (STORE_FLAG_VALUE): Set to -1.

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 68d89fadc9e..d3620688a9c 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -2549,7 +2549,7 @@
 
 (define_insn "vec_cmpdi"
   [(set (match_operand:DI 0 "register_operand"	  "=cV,cV,  e, e,Sg,Sg")
-	(match_operator 1 "gcn_fp_compare_operator"
+	(match_operator:DI 1 "gcn_fp_compare_operator"
 	  [(match_operand:VCMP_MODE 2 "gcn_alu_operand"
 		  "vSv, B,vSv, B, v,vA")
 	   (match_operand:VCMP_MODE 3 "gcn_vop3_operand"
@@ -2658,7 +2658,7 @@
 
 (define_insn "vec_cmpdi_dup"
   [(set (match_operand:DI 0 "register_operand"		   "=cV,cV, e,e,Sg")
-	(match_operator 1 "gcn_fp_compare_operator"
+	(match_operator:DI 1 "gcn_fp_compare_operator"
 	  [(vec_duplicate:VCMP_MODE
 	 (match_operand: 2 "gcn_alu_operand"
 			   " Sv, B,Sv,B, A"))
diff --git a/gcc/config/gcn/gcn.h b/gcc/config/gcn/gcn.h
index 0efa99f3bee..9993a995d05 100644
--- a/gcc/config/gcn/gcn.h
+++ b/gcc/config/gcn/gcn.h
@@ -607,6 +607,10 @@ enum gcn_builtin_codes
 #define SLOW_BYTE_ACCESS 0
 #define WORD_REGISTER_OPERATIONS 1
 
+/* Flag values are either BImode or DImode, but either way the compiler
+   should assume that all the bits are live.  */
+#define STORE_FLAG_VALUE -1
+
 /* Definitions for register eliminations.
 
This is an array of structures.  Each structure initializes one pair

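The "one bit per lane" representation, and why a STORE_FLAG_VALUE of -1 (all bits significant) fits it, can be modeled in scalar code. This is an illustrative sketch, not GCN's exact semantics:

```cpp
#include <cassert>
#include <cstdint>

/* Model of a GCN-style condition register: the result of a vector
   compare is a 64-bit mask with one bit per lane, so combining two
   lane-wise compares is just a bitwise operation on the masks.  */
uint64_t
cmp_gt_mask (const int *a, const int *b, unsigned nlanes)
{
  uint64_t mask = 0;
  for (unsigned lane = 0; lane < nlanes; ++lane)
    if (a[lane] > b[lane])
      mask |= uint64_t (1) << lane;
  return mask;
}
```

Because every bit of the mask carries a lane's result, the compiler must not assume only the low bit is meaningful, which is what declaring DImode comparisons with STORE_FLAG_VALUE of -1 expresses.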

[committed 2/2] libstdc++: Fix compilation of <stop_token> with Clang

2020-03-18 Thread Jonathan Wakely via Gcc-patches

Clang 9 supports C++20 via -std=c++2a but doesn't support three-way
comparisons, so <stop_token> fails to compile. When the compiler doesn't
support default comparisons, this patch defines operator== and
operator!= for the _Stop_state_ref class. That is enough for the header
to be compiled with Clang. It allows operator== for stop_token and
stop_source to work, but not operator!= because that isn't explicitly
defined.

* include/std/stop_token (stop_token::_Stop_state_ref): Define
comparison operators explicitly if the compiler won't synthesize them.

Tested powerpc64le-linux, committed to master.


commit e5de406f9967ef4b0bbdbcbc0320869d2bf04558
Author: Jonathan Wakely 
Date:   Wed Mar 18 12:55:29 2020 +

libstdc++: Fix compilation of <stop_token> with Clang

Clang 9 supports C++20 via -std=c++2a but doesn't support three-way
comparisons, so <stop_token> fails to compile. When the compiler doesn't
support default comparisons, this patch defines operator== and
operator!= for the _Stop_state_ref class. That is enough for the header
to be compiled with Clang. It allows operator== for stop_token and
stop_source to work, but not operator!= because that isn't explicitly
defined.

* include/std/stop_token (stop_token::_Stop_state_ref): Define
comparison operators explicitly if the compiler won't synthesize them.

diff --git a/libstdc++-v3/include/std/stop_token b/libstdc++-v3/include/std/stop_token
index 6fb8ae05197..87beb08c71d 100644
--- a/libstdc++-v3/include/std/stop_token
+++ b/libstdc++-v3/include/std/stop_token
@@ -456,8 +456,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   _Stop_state_t* operator->() const noexcept { return _M_ptr; }
 
+#if __cpp_impl_three_way_comparison >= 201907L
   friend bool
   operator==(const _Stop_state_ref&, const _Stop_state_ref&) = default;
+#else
+  friend bool
+  operator==(const _Stop_state_ref& __lhs, const _Stop_state_ref& __rhs)
+  noexcept
+  { return __lhs._M_ptr == __rhs._M_ptr; }
+
+  friend bool
+  operator!=(const _Stop_state_ref& __lhs, const _Stop_state_ref& __rhs)
+  noexcept
+  { return __lhs._M_ptr != __rhs._M_ptr; }
+#endif
 
 private:
   _Stop_state_t* _M_ptr = nullptr;

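The workaround boils down to spelling out both comparison operators when defaulted comparisons are unavailable, since defining operator== alone does not give you operator!= before C++20's rewritten comparisons. A minimal standalone sketch with a hypothetical type mirroring _Stop_state_ref's shape:

```cpp
#include <cassert>

/* A thin pointer-wrapper, compared by the pointer it holds.  Both
   operators are written out explicitly, as in the #else branch above.  */
struct state_ref
{
  const int *ptr = nullptr;

  friend bool operator== (const state_ref &l, const state_ref &r) noexcept
  { return l.ptr == r.ptr; }

  friend bool operator!= (const state_ref &l, const state_ref &r) noexcept
  { return l.ptr != r.ptr; }
};
```

With `= default` this would be one line, but the explicit pair keeps the header compiling on compilers that accept -std=c++2a without implementing defaulted comparisons.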

[committed] amdgcn: Add cond_add/sub/and/ior/xor for all vector modes

2020-03-18 Thread Andrew Stubbs
This patch adds support for new conditional vector operators, including 
cond_mult, and more modes for cond_add, cond_sub, cond_and, cond_ior and 
cond_xor. This allows vectorization of more algorithms and several new 
test passes.


The min and max operators remain on the to do list because those require 
extends and truncates for some modes.


Andrew
amdgcn: Add cond_add/sub/and/ior/xor for all vector modes

2020-03-18  Andrew Stubbs  

	gcc/
	* config/gcn/gcn-valu.md (COND_MODE): Delete.
	(COND_INT_MODE): Delete.
	(cond_op): Add "mult".
	(cond_): Use VEC_ALLREG_MODE.
	(cond_): Use VEC_ALLREG_INT_MODE.

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index a8034f77798..68d89fadc9e 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -2903,19 +2903,15 @@
 DONE;
   })
 
-; FIXME this should be VEC_REG_MODE, but not all dependencies are implemented.
-(define_mode_iterator COND_MODE [V64SI V64DI V64SF V64DF])
-(define_mode_iterator COND_INT_MODE [V64SI V64DI])
-
-(define_code_iterator cond_op [plus minus])
+(define_code_iterator cond_op [plus minus mult])
 
 (define_expand "cond_"
-  [(match_operand:COND_MODE 0 "register_operand")
+  [(match_operand:VEC_ALLREG_MODE 0 "register_operand")
(match_operand:DI 1 "register_operand")
-   (cond_op:COND_MODE
- (match_operand:COND_MODE 2 "gcn_alu_operand")
- (match_operand:COND_MODE 3 "gcn_alu_operand"))
-   (match_operand:COND_MODE 4 "register_operand")]
+   (cond_op:VEC_ALLREG_MODE
+ (match_operand:VEC_ALLREG_MODE 2 "gcn_alu_operand")
+ (match_operand:VEC_ALLREG_MODE 3 "gcn_alu_operand"))
+   (match_operand:VEC_ALLREG_MODE 4 "register_operand")]
   ""
   {
 operands[1] = force_reg (DImode, operands[1]);
@@ -2927,15 +2923,16 @@
 DONE;
   })
 
+;; TODO smin umin smax umax
 (define_code_iterator cond_bitop [and ior xor])
 
 (define_expand "cond_"
-  [(match_operand:COND_INT_MODE 0 "register_operand")
+  [(match_operand:VEC_ALLREG_INT_MODE 0 "register_operand")
(match_operand:DI 1 "register_operand")
-   (cond_bitop:COND_INT_MODE
- (match_operand:COND_INT_MODE 2 "gcn_alu_operand")
- (match_operand:COND_INT_MODE 3 "gcn_alu_operand"))
-   (match_operand:COND_INT_MODE 4 "register_operand")]
+   (cond_bitop:VEC_ALLREG_INT_MODE
+ (match_operand:VEC_ALLREG_INT_MODE 2 "gcn_alu_operand")
+ (match_operand:VEC_ALLREG_INT_MODE 3 "gcn_alu_operand"))
+   (match_operand:VEC_ALLREG_INT_MODE 4 "register_operand")]
   ""
   {
 operands[1] = force_reg (DImode, operands[1]);

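The semantics a cond_* expander provides can be modeled per lane in scalar code: where the mask bit is set the lane gets `a op b`, otherwise it gets the fallback operand (operand 4 in the patterns above). An illustrative sketch, not the expander itself:

```cpp
#include <cassert>
#include <cstdint>

/* Scalar model of cond_add: dst[lane] = mask-bit ? a + b : fallback.
   The same shape applies to sub, mult, and the bitwise cond_ ops.  */
void
cond_add (int *dst, uint64_t mask, const int *a, const int *b,
	  const int *fallback, unsigned nlanes)
{
  for (unsigned lane = 0; lane < nlanes; ++lane)
    dst[lane] = ((mask >> lane) & 1) ? a[lane] + b[lane] : fallback[lane];
}
```

Widening the mode iterators to VEC_ALLREG_MODE means the vectorizer can emit this conditional form for every vector mode the target supports, not just the four modes the old COND_MODE iterator listed.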

[committed 1/2] libstdc++: Fix compilation with released versions of Clang

2020-03-18 Thread Jonathan Wakely via Gcc-patches
Clang 9 supports C++20 via -std=c++2a but doesn't support Concepts, so
several of the new additions related to the Ranges library fail to
compile with -std=c++2a. The new definition of iterator_traits and the
definition of default_sentinel_t are guarded by __cpp_lib_concepts, so
check that in addition to __cplusplus > 201703L.

* include/bits/stl_algobase.h (__lexicographical_compare_aux): Check
__cpp_lib_concepts before using iter_reference_t.
* include/bits/stream_iterator.h (istream_iterator): Check
__cpp_lib_concepts before using default_sentinel_t.
* include/bits/streambuf_iterator.h (istreambuf_iterator): Likewise.

Tested powerpc64le-linux, committed to master.


commit 07522ae90b5bae2ca95b64f3a4de60bea0cc0567
Author: Jonathan Wakely 
Date:   Wed Mar 18 12:55:29 2020 +

libstdc++: Fix compilation with released versions of Clang

Clang 9 supports C++20 via -std=c++2a but doesn't support Concepts, so
several of the new additions related to the Ranges library fail to
compile with -std=c++2a. The new definition of iterator_traits and the
definition of default_sentinel_t are guarded by __cpp_lib_concepts, so
check that in addition to __cplusplus > 201703L.

* include/bits/stl_algobase.h (__lexicographical_compare_aux): Check
__cpp_lib_concepts before using iter_reference_t.
* include/bits/stream_iterator.h (istream_iterator): Check
__cpp_lib_concepts before using default_sentinel_t.
* include/bits/streambuf_iterator.h (istreambuf_iterator): Likewise.

diff --git a/libstdc++-v3/include/bits/stl_algobase.h 
b/libstdc++-v3/include/bits/stl_algobase.h
index 8f3ca885f03..a7e92d4b473 100644
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -1283,7 +1283,7 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
 && !__gnu_cxx::__numeric_traits<_ValueType2>::__is_signed
 && __is_pointer<_II1>::__value
 && __is_pointer<_II2>::__value
-#if __cplusplus > 201703L
+#if __cplusplus > 201703L && __cpp_lib_concepts
 // For C++20 iterator_traits::value_type is non-volatile
 // so __is_byte could be true, but we can't use memcmp with
 // volatile data.
diff --git a/libstdc++-v3/include/bits/stream_iterator.h 
b/libstdc++-v3/include/bits/stream_iterator.h
index 9d8ead092b8..bd5ba2a80c0 100644
--- a/libstdc++-v3/include/bits/stream_iterator.h
+++ b/libstdc++-v3/include/bits/stream_iterator.h
@@ -77,7 +77,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _M_ok(__obj._M_ok)
   { }
 
-#if __cplusplus > 201703L
+#if __cplusplus > 201703L && __cpp_lib_concepts
   constexpr
   istream_iterator(default_sentinel_t)
   noexcept(is_nothrow_default_constructible_v<_Tp>)
@@ -153,7 +153,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   operator!=(const istream_iterator& __x, const istream_iterator& __y)
   { return !__x._M_equal(__y); }
 
-#if __cplusplus > 201703L
+#if __cplusplus > 201703L && __cpp_lib_concepts
   friend bool
   operator==(const istream_iterator& __i, default_sentinel_t)
   { return !__i._M_stream; }
diff --git a/libstdc++-v3/include/bits/streambuf_iterator.h 
b/libstdc++-v3/include/bits/streambuf_iterator.h
index fc06c50040c..d3f1610fc8d 100644
--- a/libstdc++-v3/include/bits/streambuf_iterator.h
+++ b/libstdc++-v3/include/bits/streambuf_iterator.h
@@ -115,7 +115,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _GLIBCXX_CONSTEXPR istreambuf_iterator() _GLIBCXX_USE_NOEXCEPT
   : _M_sbuf(0), _M_c(traits_type::eof()) { }
 
-#if __cplusplus > 201703L
+#if __cplusplus > 201703L && __cpp_lib_concepts
   constexpr istreambuf_iterator(default_sentinel_t) noexcept
   : istreambuf_iterator() { }
 #endif
@@ -215,7 +215,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
return traits_type::eq_int_type(__c, __eof);
   }
 
-#if __cplusplus > 201703L
+#if __cplusplus > 201703L && __cpp_lib_concepts
   friend bool
   operator==(const istreambuf_iterator& __i, default_sentinel_t __s)
   { return __i._M_at_eof(); }


[PATCH] middle-end/94206 fix memset folding to avoid types with padding

2020-03-18 Thread Richard Biener


This makes sure that the store a memset is folded to uses a type
covering all bits.

Bootstrapped / tested on x86_64-unknown-linux-gnu, pushed.

Richard.

2020-03-18   Richard Biener  

PR middle-end/94206
* gimple-fold.c (gimple_fold_builtin_memset): Avoid using
partial int modes or not mode-precision integer types for
the store.

* gcc.dg/torture/pr94206.c: New testcase.
---
 gcc/gimple-fold.c  |  6 ++
 gcc/testsuite/gcc.dg/torture/pr94206.c | 17 +
 2 files changed, 23 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr94206.c

diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 3f17de974ed..c5939f19f59 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -1235,12 +1235,18 @@ gimple_fold_builtin_memset (gimple_stmt_iterator *gsi, 
tree c, tree len)
 
   length = tree_to_uhwi (len);
   if (GET_MODE_SIZE (SCALAR_INT_TYPE_MODE (etype)) != length
+  || (GET_MODE_PRECISION (SCALAR_INT_TYPE_MODE (etype))
+ != GET_MODE_BITSIZE (SCALAR_INT_TYPE_MODE (etype)))
   || get_pointer_alignment (dest) / BITS_PER_UNIT < length)
 return NULL_TREE;
 
   if (length > HOST_BITS_PER_WIDE_INT / BITS_PER_UNIT)
 return NULL_TREE;
 
+  if (!type_has_mode_precision_p (etype))
+etype = lang_hooks.types.type_for_mode (SCALAR_INT_TYPE_MODE (etype),
+   TYPE_UNSIGNED (etype));
+
   if (integer_zerop (c))
 cval = 0;
   else
diff --git a/gcc/testsuite/gcc.dg/torture/pr94206.c 
b/gcc/testsuite/gcc.dg/torture/pr94206.c
new file mode 100644
index 000..9e54bba4ed4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr94206.c
@@ -0,0 +1,17 @@
+/* { dg-do run { target lp64 } } */
+
+struct {
+unsigned long x:33;
+} s;
+typedef __typeof__(s.x + 0) uint33;
+
+int main()
+{
+  uint33 x;
+  __builtin_memset(&x, -1, sizeof x);
+  unsigned long u;
+  __builtin_memcpy(&u, &x, sizeof u);
+  if (u != -1ul)
+__builtin_abort ();
+  return 0;
+}
-- 
2.16.4


[committed] Fix up duplicated duplicated words in comments

2020-03-18 Thread Jakub Jelinek via Gcc-patches
Hi!

Another set of duplicated word fixes for things I've missed last time.
These include e.g. *.cc files I forgot about, or duplicated words at the start
or end of line.

Tested on x86_64-linux, committed to trunk as obvious.

2020-03-18  Jakub Jelinek  

* asan.c (get_mem_refs_of_builtin_call): Fix up duplicated word issue
in a comment.
* config/arc/arc.c (frame_stack_add): Likewise.
* gimple-loop-versioning.cc (loop_versioning::analyze_arbitrary_term):
Likewise.
* ipa-predicate.c (predicate::remap_after_inlining): Likewise.
* tree-ssa-strlen.h (handle_printf_call): Likewise.
* tree-ssa-strlen.c (is_strlen_related_p): Likewise.
* optinfo-emit-json.cc (optrecord_json_writer::add_record): Likewise.
analyzer/
* sm-malloc.cc (malloc_state_machine::on_stmt): Fix up duplicated word
issue in a comment.
* region-model.cc (region_model::make_region_for_unexpected_tree_code,
region_model::delete_region_and_descendents): Likewise.
* engine.cc (class exploded_cluster): Likewise.
* diagnostic-manager.cc (class path_builder): Likewise.
cp/
* constraint.cc (resolve_function_concept_check, subsumes_constraints,
strictly_subsumes): Fix up duplicated word issue in a comment.
* coroutines.cc (build_init_or_final_await, captures_temporary):
Likewise.
* logic.cc (dnf_size_r, cnf_size_r): Likewise.
* pt.c (append_type_to_template_for_access_check): Likewise.
d/
* expr.cc (ExprVisitor::visit (CatAssignExp *)): Fix up duplicated
word issue in a comment.
* d-target.cc (Target::FPTypeProperties::max): Likewise.
fortran/
* class.c (generate_finalization_wrapper): Fix up duplicated word
issue in a comment.
* trans-types.c (gfc_get_nodesc_array_type): Likewise.

--- gcc/asan.c.jj   2020-01-12 11:54:36.191416758 +0100
+++ gcc/asan.c  2020-03-18 11:40:49.402322262 +0100
@@ -794,7 +794,7 @@ get_mem_refs_of_builtin_call (gcall *cal
   handle_builtin_alloca (call, iter);
   break;
 /* And now the __atomic* and __sync builtins.
-   These are handled differently from the classical memory memory
+   These are handled differently from the classical memory
access builtins above.  */
 
 case BUILT_IN_ATOMIC_LOAD_1:
--- gcc/config/arc/arc.c.jj 2020-03-14 08:14:47.063741917 +0100
+++ gcc/config/arc/arc.c2020-03-18 11:30:29.758456109 +0100
@@ -2607,7 +2607,7 @@ frame_stack_add (HOST_WIDE_INT offset)
register.
 
During compilation of a function the frame size is evaluated
-   multiple times, it is not until the reload pass is complete the the
+   multiple times, it is not until the reload pass is complete the
frame size is considered fixed (it is at this point that space for
all spills has been allocated).  However the frame_pointer_needed
variable is not set true until the register allocation pass, as a
--- gcc/cp/constraint.cc.jj 2020-03-18 08:52:00.401441356 +0100
+++ gcc/cp/constraint.cc2020-03-18 11:32:52.317354771 +0100
@@ -316,7 +316,7 @@ resolve_function_concept_overload (tree
   return cands;
 }
 
-/* Determine if the the call expression CALL is a constraint check, and
+/* Determine if the call expression CALL is a constraint check, and
return the concept declaration and arguments being checked. If CALL
does not denote a constraint check, return NULL.  */
 
@@ -2958,7 +2958,7 @@ equivalently_constrained (tree d1, tree
  Partial ordering of constraints
 ---*/
 
-/* Returns true when the the constraints in A subsume those in B.  */
+/* Returns true when the constraints in A subsume those in B.  */
 
 bool
 subsumes_constraints (tree a, tree b)
@@ -2968,7 +2968,7 @@ subsumes_constraints (tree a, tree b)
   return subsumes (a, b);
 }
 
-/* Returns true when the the constraints in CI (with arguments
+/* Returns true when the constraints in CI (with arguments
ARGS) strictly subsume the associated constraints of TMPL.  */
 
 bool
--- gcc/cp/coroutines.cc.jj 2020-03-15 22:35:30.361489995 +0100
+++ gcc/cp/coroutines.cc2020-03-18 11:36:40.804986823 +0100
@@ -2466,7 +2466,7 @@ build_init_or_final_await (location_t lo
 return error_mark_node;
 
   /* So build the co_await for this */
-  /* For initial/final suspends the call is is "a" per [expr.await] 3.2.  */
+  /* For initial/final suspends the call is "a" per [expr.await] 3.2.  */
   return build_co_await (loc, setup_call, (is_final ? FINAL_SUSPEND_POINT
: INITIAL_SUSPEND_POINT));
 }
@@ -2547,7 +2547,7 @@ static tree
 captures_temporary (tree *stmt, int *do_subtree, void *d)
 {
   /* Stop recursing if we see an await expression, the subtrees
- of that will be handled when it it processed.  */
+ of that will be handled when it is processed.  */

Re: Reply: [PATCH PR94201] aarch64: ICE in tiny code model for ilp32

2020-03-18 Thread Richard Sandiford
"duanbo (C)"  writes:
> Thank you for your suggestions.  It looks like I pasted the wrong test
> case; I'm sorry for that.
> I have modified the patch accordingly, and it now uses the correct test.
> I have run a full test with no new failures: the newly added test fails
> without the patch and passes after applying it.
> Attached please find my new patch.  Can you sponsor it if it's OK to go?

Thanks, pushed as g:d91480dee934478063fe5945b73ff3c108e40a91

Richard


[PATCH v2][ARM][GCC][4/4x]: MVE intrinsics with quaternary operands.

2020-03-18 Thread Srinath Parvathaneni
Hello Kyrill,

Following patch is the rebased version of v1.
(version v1) https://gcc.gnu.org/pipermail/gcc-patches/2019-November/534345.html



Hello,

This patch supports following MVE ACLE intrinsics with quaternary operands.

vabdq_m_f32, vabdq_m_f16, vaddq_m_f32, vaddq_m_f16, vaddq_m_n_f32, 
vaddq_m_n_f16,
vandq_m_f32, vandq_m_f16, vbicq_m_f32, vbicq_m_f16, vbrsrq_m_n_f32, 
vbrsrq_m_n_f16,
vcaddq_rot270_m_f32, vcaddq_rot270_m_f16, vcaddq_rot90_m_f32, 
vcaddq_rot90_m_f16,
vcmlaq_m_f32, vcmlaq_m_f16, vcmlaq_rot180_m_f32, vcmlaq_rot180_m_f16,
vcmlaq_rot270_m_f32, vcmlaq_rot270_m_f16, vcmlaq_rot90_m_f32, 
vcmlaq_rot90_m_f16,
vcmulq_m_f32, vcmulq_m_f16, vcmulq_rot180_m_f32, vcmulq_rot180_m_f16,
vcmulq_rot270_m_f32, vcmulq_rot270_m_f16, vcmulq_rot90_m_f32, 
vcmulq_rot90_m_f16,
vcvtq_m_n_s32_f32, vcvtq_m_n_s16_f16, vcvtq_m_n_u32_f32, vcvtq_m_n_u16_f16,
veorq_m_f32, veorq_m_f16, vfmaq_m_f32, vfmaq_m_f16, vfmaq_m_n_f32, 
vfmaq_m_n_f16,
vfmasq_m_n_f32, vfmasq_m_n_f16, vfmsq_m_f32, vfmsq_m_f16, vmaxnmq_m_f32,
vmaxnmq_m_f16, vminnmq_m_f32, vminnmq_m_f16, vmulq_m_f32, vmulq_m_f16,
vmulq_m_n_f32, vmulq_m_n_f16, vornq_m_f32, vornq_m_f16, vorrq_m_f32, 
vorrq_m_f16,
vsubq_m_f32, vsubq_m_f16, vsubq_m_n_f32, vsubq_m_n_f16.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more 
details.
[1]  
https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-10-31  Andre Vieira  
Mihail Ionescu  
Srinath Parvathaneni  

* config/arm/arm_mve.h (vabdq_m_f32): Define macro.
(vabdq_m_f16): Likewise.
(vaddq_m_f32): Likewise.
(vaddq_m_f16): Likewise.
(vaddq_m_n_f32): Likewise.
(vaddq_m_n_f16): Likewise.
(vandq_m_f32): Likewise.
(vandq_m_f16): Likewise.
(vbicq_m_f32): Likewise.
(vbicq_m_f16): Likewise.
(vbrsrq_m_n_f32): Likewise.
(vbrsrq_m_n_f16): Likewise.
(vcaddq_rot270_m_f32): Likewise.
(vcaddq_rot270_m_f16): Likewise.
(vcaddq_rot90_m_f32): Likewise.
(vcaddq_rot90_m_f16): Likewise.
(vcmlaq_m_f32): Likewise.
(vcmlaq_m_f16): Likewise.
(vcmlaq_rot180_m_f32): Likewise.
(vcmlaq_rot180_m_f16): Likewise.
(vcmlaq_rot270_m_f32): Likewise.
(vcmlaq_rot270_m_f16): Likewise.
(vcmlaq_rot90_m_f32): Likewise.
(vcmlaq_rot90_m_f16): Likewise.
(vcmulq_m_f32): Likewise.
(vcmulq_m_f16): Likewise.
(vcmulq_rot180_m_f32): Likewise.
(vcmulq_rot180_m_f16): Likewise.
(vcmulq_rot270_m_f32): Likewise.
(vcmulq_rot270_m_f16): Likewise.
(vcmulq_rot90_m_f32): Likewise.
(vcmulq_rot90_m_f16): Likewise.
(vcvtq_m_n_s32_f32): Likewise.
(vcvtq_m_n_s16_f16): Likewise.
(vcvtq_m_n_u32_f32): Likewise.
(vcvtq_m_n_u16_f16): Likewise.
(veorq_m_f32): Likewise.
(veorq_m_f16): Likewise.
(vfmaq_m_f32): Likewise.
(vfmaq_m_f16): Likewise.
(vfmaq_m_n_f32): Likewise.
(vfmaq_m_n_f16): Likewise.
(vfmasq_m_n_f32): Likewise.
(vfmasq_m_n_f16): Likewise.
(vfmsq_m_f32): Likewise.
(vfmsq_m_f16): Likewise.
(vmaxnmq_m_f32): Likewise.
(vmaxnmq_m_f16): Likewise.
(vminnmq_m_f32): Likewise.
(vminnmq_m_f16): Likewise.
(vmulq_m_f32): Likewise.
(vmulq_m_f16): Likewise.
(vmulq_m_n_f32): Likewise.
(vmulq_m_n_f16): Likewise.
(vornq_m_f32): Likewise.
(vornq_m_f16): Likewise.
(vorrq_m_f32): Likewise.
(vorrq_m_f16): Likewise.
(vsubq_m_f32): Likewise.
(vsubq_m_f16): Likewise.
(vsubq_m_n_f32): Likewise.
(vsubq_m_n_f16): Likewise.
(__attribute__): Likewise.
(__arm_vabdq_m_f32): Likewise.
(__arm_vabdq_m_f16): Likewise.
(__arm_vaddq_m_f32): Likewise.
(__arm_vaddq_m_f16): Likewise.
(__arm_vaddq_m_n_f32): Likewise.
(__arm_vaddq_m_n_f16): Likewise.
(__arm_vandq_m_f32): Likewise.
(__arm_vandq_m_f16): Likewise.
(__arm_vbicq_m_f32): Likewise.
(__arm_vbicq_m_f16): Likewise.
(__arm_vbrsrq_m_n_f32): Likewise.
(__arm_vbrsrq_m_n_f16): Likewise.
(__arm_vcaddq_rot270_m_f32): Likewise.
(__arm_vcaddq_rot270_m_f16): Likewise.
(__arm_vcaddq_rot90_m_f32): Likewise.
(__arm_vcaddq_rot90_m_f16): Likewise.
(__arm_vcmlaq_m_f32): Likewise.
(__arm_vcmlaq_m_f16): Likewise.
(__arm_vcmlaq_rot180_m_f32): Likewise.
(__arm_vcmlaq_rot180_m_f16): Likewise.
(__arm_vcmlaq_rot270_m_f32): Likewise.
(__arm_vcmlaq_rot270_m_f16): Likewise.
(__arm_vcmlaq_rot90_m_f32): Likewise.
(__arm_vcmlaq_rot90_m_f16): Likewise.
(__arm_vcmulq_m_f32): Likewise.

Re: [PATCH v3][ARM][GCC][1/3x]: MVE intrinsics with ternary operands.

2020-03-18 Thread Srinath Parvathaneni
Hi Kyrill,

This patch was already committed.
I resent it by mistake.
Sorry, please ignore this.

Regards
SRI.

From: Gcc-patches  on behalf of Srinath 
Parvathaneni 
Sent: 18 March 2020 11:18
To: gcc-patches@gcc.gnu.org 
Subject: [PATCH v3][ARM][GCC][1/3x]: MVE intrinsics with ternary operands.

Hello Kyrill,

Following patch is the rebased version of v2.
(version v2) https://gcc.gnu.org/pipermail/gcc-patches/2020-March/542068.html



Hello,

This patch supports following MVE ACLE intrinsics with ternary operands.

vabavq_s8, vabavq_s16, vabavq_s32, vbicq_m_n_s16, vbicq_m_n_s32,
vbicq_m_n_u16, vbicq_m_n_u32, vcmpeqq_m_f16, vcmpeqq_m_f32,
vcvtaq_m_s16_f16, vcvtaq_m_u16_f16, vcvtaq_m_s32_f32, vcvtaq_m_u32_f32,
vcvtq_m_f16_s16, vcvtq_m_f16_u16, vcvtq_m_f32_s32, vcvtq_m_f32_u32,
vqrshrnbq_n_s16, vqrshrnbq_n_u16, vqrshrnbq_n_s32, vqrshrnbq_n_u32,
vqrshrunbq_n_s16, vqrshrunbq_n_s32, vrmlaldavhaq_s32, vrmlaldavhaq_u32,
vshlcq_s8, vshlcq_u8, vshlcq_s16, vshlcq_u16, vshlcq_s32, vshlcq_u32,
vabavq_s8, vabavq_s16, vabavq_s32.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more 
details.
[1] 
https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-10-23  Andre Vieira  
Mihail Ionescu  
Srinath Parvathaneni  

* config/arm/arm-builtins.c (TERNOP_UNONE_UNONE_UNONE_IMM_QUALIFIERS):
Define qualifier for ternary operands.
(TERNOP_UNONE_UNONE_NONE_NONE_QUALIFIERS): Likewise.
(TERNOP_UNONE_NONE_UNONE_IMM_QUALIFIERS): Likewise.
(TERNOP_NONE_NONE_UNONE_IMM_QUALIFIERS): Likewise.
(TERNOP_UNONE_UNONE_NONE_IMM_QUALIFIERS): Likewise.
(TERNOP_UNONE_UNONE_NONE_UNONE_QUALIFIERS): Likewise.
(TERNOP_UNONE_UNONE_IMM_UNONE_QUALIFIERS): Likewise.
(TERNOP_UNONE_NONE_NONE_UNONE_QUALIFIERS): Likewise.
(TERNOP_NONE_NONE_NONE_IMM_QUALIFIERS): Likewise.
(TERNOP_NONE_NONE_NONE_UNONE_QUALIFIERS): Likewise.
(TERNOP_NONE_NONE_IMM_UNONE_QUALIFIERS): Likewise.
(TERNOP_NONE_NONE_UNONE_UNONE_QUALIFIERS): Likewise.
(TERNOP_UNONE_UNONE_UNONE_UNONE_QUALIFIERS): Likewise.
(TERNOP_NONE_NONE_NONE_NONE_QUALIFIERS): Likewise.
* config/arm/arm_mve.h (vabavq_s8): Define macro.
(vabavq_s16): Likewise.
(vabavq_s32): Likewise.
(vbicq_m_n_s16): Likewise.
(vbicq_m_n_s32): Likewise.
(vbicq_m_n_u16): Likewise.
(vbicq_m_n_u32): Likewise.
(vcmpeqq_m_f16): Likewise.
(vcmpeqq_m_f32): Likewise.
(vcvtaq_m_s16_f16): Likewise.
(vcvtaq_m_u16_f16): Likewise.
(vcvtaq_m_s32_f32): Likewise.
(vcvtaq_m_u32_f32): Likewise.
(vcvtq_m_f16_s16): Likewise.
(vcvtq_m_f16_u16): Likewise.
(vcvtq_m_f32_s32): Likewise.
(vcvtq_m_f32_u32): Likewise.
(vqrshrnbq_n_s16): Likewise.
(vqrshrnbq_n_u16): Likewise.
(vqrshrnbq_n_s32): Likewise.
(vqrshrnbq_n_u32): Likewise.
(vqrshrunbq_n_s16): Likewise.
(vqrshrunbq_n_s32): Likewise.
(vrmlaldavhaq_s32): Likewise.
(vrmlaldavhaq_u32): Likewise.
(vshlcq_s8): Likewise.
(vshlcq_u8): Likewise.
(vshlcq_s16): Likewise.
(vshlcq_u16): Likewise.
(vshlcq_s32): Likewise.
(vshlcq_u32): Likewise.
(vabavq_u8): Likewise.
(vabavq_u16): Likewise.
(vabavq_u32): Likewise.
(__arm_vabavq_s8): Define intrinsic.
(__arm_vabavq_s16): Likewise.
(__arm_vabavq_s32): Likewise.
(__arm_vabavq_u8): Likewise.
(__arm_vabavq_u16): Likewise.
(__arm_vabavq_u32): Likewise.
(__arm_vbicq_m_n_s16): Likewise.
(__arm_vbicq_m_n_s32): Likewise.
(__arm_vbicq_m_n_u16): Likewise.
(__arm_vbicq_m_n_u32): Likewise.
(__arm_vqrshrnbq_n_s16): Likewise.
(__arm_vqrshrnbq_n_u16): Likewise.
(__arm_vqrshrnbq_n_s32): Likewise.
(__arm_vqrshrnbq_n_u32): Likewise.
(__arm_vqrshrunbq_n_s16): Likewise.
(__arm_vqrshrunbq_n_s32): Likewise.
(__arm_vrmlaldavhaq_s32): Likewise.
(__arm_vrmlaldavhaq_u32): Likewise.
(__arm_vshlcq_s8): Likewise.
(__arm_vshlcq_u8): Likewise.
(__arm_vshlcq_s16): Likewise.
(__arm_vshlcq_u16): Likewise.
(__arm_vshlcq_s32): Likewise.
(__arm_vshlcq_u32): Likewise.
(__arm_vcmpeqq_m_f16): Likewise.
(__arm_vcmpeqq_m_f32): Likewise.
(__arm_vcvtaq_m_s16_f16): Likewise.
(__arm_vcvtaq_m_u16_f16): Likewise.
(__arm_vcvtaq_m_s32_f32): Likewise.
(__arm_vcvtaq_m_u32_f32): Likewise.
(__arm_vcvtq_m_f16_s16): Likewise.
(__arm_vcvtq_m_f16_u16): Likewise.
(__arm_vcvtq_m_f32_s32): Likewise.
(__arm_vcvtq_m_f32_u32): Likewise.

[PATCH v2][ARM][GCC][3/4x]: MVE intrinsics with quaternary operands.

2020-03-18 Thread Srinath Parvathaneni
Hello Kyrill,

Following patch is the rebased version of v1.
(version v1) https://gcc.gnu.org/pipermail/gcc-patches/2019-November/534324.html


Hello,

This patch supports following MVE ACLE intrinsics with quaternary operands.

vmlaldavaq_p_s16, vmlaldavaq_p_s32, vmlaldavaq_p_u16, vmlaldavaq_p_u32, 
vmlaldavaxq_p_s16,
vmlaldavaxq_p_s32, vmlaldavaxq_p_u16, vmlaldavaxq_p_u32, vmlsldavaq_p_s16, 
vmlsldavaq_p_s32,
vmlsldavaxq_p_s16, vmlsldavaxq_p_s32, vmullbq_poly_m_p16, vmullbq_poly_m_p8,
vmulltq_poly_m_p16, vmulltq_poly_m_p8, vqdmullbq_m_n_s16, vqdmullbq_m_n_s32, 
vqdmullbq_m_s16,
vqdmullbq_m_s32, vqdmulltq_m_n_s16, vqdmulltq_m_n_s32, vqdmulltq_m_s16, 
vqdmulltq_m_s32,
vqrshrnbq_m_n_s16, vqrshrnbq_m_n_s32, vqrshrnbq_m_n_u16, vqrshrnbq_m_n_u32, 
vqrshrntq_m_n_s16,
vqrshrntq_m_n_s32, vqrshrntq_m_n_u16, vqrshrntq_m_n_u32, vqrshrunbq_m_n_s16, 
vqrshrunbq_m_n_s32,
vqrshruntq_m_n_s16, vqrshruntq_m_n_s32, vqshrnbq_m_n_s16, vqshrnbq_m_n_s32, 
vqshrnbq_m_n_u16,
vqshrnbq_m_n_u32, vqshrntq_m_n_s16, vqshrntq_m_n_s32, vqshrntq_m_n_u16, 
vqshrntq_m_n_u32,
vqshrunbq_m_n_s16, vqshrunbq_m_n_s32, vqshruntq_m_n_s16, vqshruntq_m_n_s32, 
vrmlaldavhaq_p_s32,
vrmlaldavhaq_p_u32, vrmlaldavhaxq_p_s32, vrmlsldavhaq_p_s32, 
vrmlsldavhaxq_p_s32,
vrshrnbq_m_n_s16, vrshrnbq_m_n_s32, vrshrnbq_m_n_u16, vrshrnbq_m_n_u32, 
vrshrntq_m_n_s16,
vrshrntq_m_n_s32, vrshrntq_m_n_u16, vrshrntq_m_n_u32, vshllbq_m_n_s16, 
vshllbq_m_n_s8,
vshllbq_m_n_u16, vshllbq_m_n_u8, vshlltq_m_n_s16, vshlltq_m_n_s8, 
vshlltq_m_n_u16, vshlltq_m_n_u8,
vshrnbq_m_n_s16, vshrnbq_m_n_s32, vshrnbq_m_n_u16, vshrnbq_m_n_u32, 
vshrntq_m_n_s16,
vshrntq_m_n_s32, vshrntq_m_n_u16, vshrntq_m_n_u32.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more 
details.
[1] 
https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-10-31  Andre Vieira  
Mihail Ionescu  
Srinath Parvathaneni  

* config/arm/arm-protos.h (arm_mve_immediate_check): 
* config/arm/arm.c (arm_mve_immediate_check): Define function to check
mode and integer value.
* config/arm/arm_mve.h (vmlaldavaq_p_s32): Define macro.
(vmlaldavaq_p_s16): Likewise.
(vmlaldavaq_p_u32): Likewise.
(vmlaldavaq_p_u16): Likewise.
(vmlaldavaxq_p_s32): Likewise.
(vmlaldavaxq_p_s16): Likewise.
(vmlaldavaxq_p_u32): Likewise.
(vmlaldavaxq_p_u16): Likewise.
(vmlsldavaq_p_s32): Likewise.
(vmlsldavaq_p_s16): Likewise.
(vmlsldavaxq_p_s32): Likewise.
(vmlsldavaxq_p_s16): Likewise.
(vmullbq_poly_m_p8): Likewise.
(vmullbq_poly_m_p16): Likewise.
(vmulltq_poly_m_p8): Likewise.
(vmulltq_poly_m_p16): Likewise.
(vqdmullbq_m_n_s32): Likewise.
(vqdmullbq_m_n_s16): Likewise.
(vqdmullbq_m_s32): Likewise.
(vqdmullbq_m_s16): Likewise.
(vqdmulltq_m_n_s32): Likewise.
(vqdmulltq_m_n_s16): Likewise.
(vqdmulltq_m_s32): Likewise.
(vqdmulltq_m_s16): Likewise.
(vqrshrnbq_m_n_s32): Likewise.
(vqrshrnbq_m_n_s16): Likewise.
(vqrshrnbq_m_n_u32): Likewise.
(vqrshrnbq_m_n_u16): Likewise.
(vqrshrntq_m_n_s32): Likewise.
(vqrshrntq_m_n_s16): Likewise.
(vqrshrntq_m_n_u32): Likewise.
(vqrshrntq_m_n_u16): Likewise.
(vqrshrunbq_m_n_s32): Likewise.
(vqrshrunbq_m_n_s16): Likewise.
(vqrshruntq_m_n_s32): Likewise.
(vqrshruntq_m_n_s16): Likewise.
(vqshrnbq_m_n_s32): Likewise.
(vqshrnbq_m_n_s16): Likewise.
(vqshrnbq_m_n_u32): Likewise.
(vqshrnbq_m_n_u16): Likewise.
(vqshrntq_m_n_s32): Likewise.
(vqshrntq_m_n_s16): Likewise.
(vqshrntq_m_n_u32): Likewise.
(vqshrntq_m_n_u16): Likewise.
(vqshrunbq_m_n_s32): Likewise.
(vqshrunbq_m_n_s16): Likewise.
(vqshruntq_m_n_s32): Likewise.
(vqshruntq_m_n_s16): Likewise.
(vrmlaldavhaq_p_s32): Likewise.
(vrmlaldavhaq_p_u32): Likewise.
(vrmlaldavhaxq_p_s32): Likewise.
(vrmlsldavhaq_p_s32): Likewise.
(vrmlsldavhaxq_p_s32): Likewise.
(vrshrnbq_m_n_s32): Likewise.
(vrshrnbq_m_n_s16): Likewise.
(vrshrnbq_m_n_u32): Likewise.
(vrshrnbq_m_n_u16): Likewise.
(vrshrntq_m_n_s32): Likewise.
(vrshrntq_m_n_s16): Likewise.
(vrshrntq_m_n_u32): Likewise.
(vrshrntq_m_n_u16): Likewise.
(vshllbq_m_n_s8): Likewise.
(vshllbq_m_n_s16): Likewise.
(vshllbq_m_n_u8): Likewise.
(vshllbq_m_n_u16): Likewise.
(vshlltq_m_n_s8): Likewise.
(vshlltq_m_n_s16): Likewise.
(vshlltq_m_n_u8): Likewise.
(vshlltq_m_n_u16): Likewise.
(vshrnbq_m_n_s32): Likewise.
(vshrnbq_m_n_s16): Likewise.
(vshrnbq_m_n_u32): 

[PATCH v2][ARM][GCC][2/4x]: MVE intrinsics with quaternary operands.

2020-03-18 Thread Srinath Parvathaneni
Hello Kyrill,

Following patch is the rebased version of v1.
(version v1) https://gcc.gnu.org/pipermail/gcc-patches/2019-November/534339.html



Hello,

This patch supports following MVE ACLE intrinsics with quaternary operands.

vabdq_m_s8, vabdq_m_s32, vabdq_m_s16, vabdq_m_u8, vabdq_m_u32, vabdq_m_u16, 
vaddq_m_n_s8,
vaddq_m_n_s32, vaddq_m_n_s16, vaddq_m_n_u8, vaddq_m_n_u32, vaddq_m_n_u16, 
vaddq_m_s8,
vaddq_m_s32, vaddq_m_s16, vaddq_m_u8, vaddq_m_u32, vaddq_m_u16, vandq_m_s8, 
vandq_m_s32,
vandq_m_s16, vandq_m_u8, vandq_m_u32, vandq_m_u16, vbicq_m_s8, vbicq_m_s32, 
vbicq_m_s16,
vbicq_m_u8, vbicq_m_u32, vbicq_m_u16, vbrsrq_m_n_s8, vbrsrq_m_n_s32, 
vbrsrq_m_n_s16,
vbrsrq_m_n_u8, vbrsrq_m_n_u32, vbrsrq_m_n_u16, vcaddq_rot270_m_s8, 
vcaddq_rot270_m_s32,
vcaddq_rot270_m_s16, vcaddq_rot270_m_u8, vcaddq_rot270_m_u32, 
vcaddq_rot270_m_u16,
vcaddq_rot90_m_s8, vcaddq_rot90_m_s32, vcaddq_rot90_m_s16, vcaddq_rot90_m_u8,
vcaddq_rot90_m_u32, vcaddq_rot90_m_u16, veorq_m_s8, veorq_m_s32, veorq_m_s16, 
veorq_m_u8,
veorq_m_u32, veorq_m_u16, vhaddq_m_n_s8, vhaddq_m_n_s32, vhaddq_m_n_s16, 
vhaddq_m_n_u8,
vhaddq_m_n_u32, vhaddq_m_n_u16, vhaddq_m_s8, vhaddq_m_s32, vhaddq_m_s16, 
vhaddq_m_u8,
vhaddq_m_u32, vhaddq_m_u16, vhcaddq_rot270_m_s8, vhcaddq_rot270_m_s32, 
vhcaddq_rot270_m_s16,
vhcaddq_rot90_m_s8, vhcaddq_rot90_m_s32, vhcaddq_rot90_m_s16, vhsubq_m_n_s8, 
vhsubq_m_n_s32,
vhsubq_m_n_s16, vhsubq_m_n_u8, vhsubq_m_n_u32, vhsubq_m_n_u16, vhsubq_m_s8, 
vhsubq_m_s32,
vhsubq_m_s16, vhsubq_m_u8, vhsubq_m_u32, vhsubq_m_u16, vmaxq_m_s8, vmaxq_m_s32, 
vmaxq_m_s16,
vmaxq_m_u8, vmaxq_m_u32, vmaxq_m_u16, vminq_m_s8, vminq_m_s32, vminq_m_s16, 
vminq_m_u8,
vminq_m_u32, vminq_m_u16, vmladavaq_p_s8, vmladavaq_p_s32, vmladavaq_p_s16, 
vmladavaq_p_u8,
vmladavaq_p_u32, vmladavaq_p_u16, vmladavaxq_p_s8, vmladavaxq_p_s32, 
vmladavaxq_p_s16,
vmlaq_m_n_s8, vmlaq_m_n_s32, vmlaq_m_n_s16, vmlaq_m_n_u8, vmlaq_m_n_u32, 
vmlaq_m_n_u16,
vmlasq_m_n_s8, vmlasq_m_n_s32, vmlasq_m_n_s16, vmlasq_m_n_u8, vmlasq_m_n_u32, 
vmlasq_m_n_u16,
vmlsdavaq_p_s8, vmlsdavaq_p_s32, vmlsdavaq_p_s16, vmlsdavaxq_p_s8, 
vmlsdavaxq_p_s32,
vmlsdavaxq_p_s16, vmulhq_m_s8, vmulhq_m_s32, vmulhq_m_s16, vmulhq_m_u8, 
vmulhq_m_u32,
vmulhq_m_u16, vmullbq_int_m_s8, vmullbq_int_m_s32, vmullbq_int_m_s16, 
vmullbq_int_m_u8,
vmullbq_int_m_u32, vmullbq_int_m_u16, vmulltq_int_m_s8, vmulltq_int_m_s32, 
vmulltq_int_m_s16,
vmulltq_int_m_u8, vmulltq_int_m_u32, vmulltq_int_m_u16, vmulq_m_n_s8, 
vmulq_m_n_s32,
vmulq_m_n_s16, vmulq_m_n_u8, vmulq_m_n_u32, vmulq_m_n_u16, vmulq_m_s8, 
vmulq_m_s32,
vmulq_m_s16, vmulq_m_u8, vmulq_m_u32, vmulq_m_u16, vornq_m_s8, vornq_m_s32, 
vornq_m_s16,
vornq_m_u8, vornq_m_u32, vornq_m_u16, vorrq_m_s8, vorrq_m_s32, vorrq_m_s16, 
vorrq_m_u8,
vorrq_m_u32, vorrq_m_u16, vqaddq_m_n_s8, vqaddq_m_n_s32, vqaddq_m_n_s16, 
vqaddq_m_n_u8,
vqaddq_m_n_u32, vqaddq_m_n_u16, vqaddq_m_s8, vqaddq_m_s32, vqaddq_m_s16, 
vqaddq_m_u8, 
vqaddq_m_u32, vqaddq_m_u16, vqdmladhq_m_s8, vqdmladhq_m_s32, vqdmladhq_m_s16, 
vqdmladhxq_m_s8,
vqdmladhxq_m_s32, vqdmladhxq_m_s16, vqdmlahq_m_n_s8, vqdmlahq_m_n_s32, 
vqdmlahq_m_n_s16,
vqdmlahq_m_n_u8, vqdmlahq_m_n_u32, vqdmlahq_m_n_u16, vqdmlsdhq_m_s8, 
vqdmlsdhq_m_s32,
vqdmlsdhq_m_s16, vqdmlsdhxq_m_s8, vqdmlsdhxq_m_s32, vqdmlsdhxq_m_s16, 
vqdmulhq_m_n_s8,
vqdmulhq_m_n_s32, vqdmulhq_m_n_s16, vqdmulhq_m_s8, vqdmulhq_m_s32, 
vqdmulhq_m_s16,
vqrdmladhq_m_s8, vqrdmladhq_m_s32, vqrdmladhq_m_s16, vqrdmladhxq_m_s8, 
vqrdmladhxq_m_s32,
vqrdmladhxq_m_s16, vqrdmlahq_m_n_s8, vqrdmlahq_m_n_s32, vqrdmlahq_m_n_s16, 
vqrdmlahq_m_n_u8,
vqrdmlahq_m_n_u32, vqrdmlahq_m_n_u16, vqrdmlashq_m_n_s8, vqrdmlashq_m_n_s32, 
vqrdmlashq_m_n_s16,
vqrdmlashq_m_n_u8, vqrdmlashq_m_n_u32, vqrdmlashq_m_n_u16, vqrdmlsdhq_m_s8, 
vqrdmlsdhq_m_s32,
vqrdmlsdhq_m_s16, vqrdmlsdhxq_m_s8, vqrdmlsdhxq_m_s32, vqrdmlsdhxq_m_s16, 
vqrdmulhq_m_n_s8,
vqrdmulhq_m_n_s32, vqrdmulhq_m_n_s16, vqrdmulhq_m_s8, vqrdmulhq_m_s32, 
vqrdmulhq_m_s16,
vqrshlq_m_s8, vqrshlq_m_s32, vqrshlq_m_s16, vqrshlq_m_u8, vqrshlq_m_u32, 
vqrshlq_m_u16,
vqshlq_m_n_s8, vqshlq_m_n_s32, vqshlq_m_n_s16, vqshlq_m_n_u8, vqshlq_m_n_u32, 
vqshlq_m_n_u16,
vqshlq_m_s8, vqshlq_m_s32, vqshlq_m_s16, vqshlq_m_u8, vqshlq_m_u32, 
vqshlq_m_u16, 
vqsubq_m_n_s8, vqsubq_m_n_s32, vqsubq_m_n_s16, vqsubq_m_n_u8, vqsubq_m_n_u32, 
vqsubq_m_n_u16,
vqsubq_m_s8, vqsubq_m_s32, vqsubq_m_s16, vqsubq_m_u8, vqsubq_m_u32, 
vqsubq_m_u16,
vrhaddq_m_s8, vrhaddq_m_s32, vrhaddq_m_s16, vrhaddq_m_u8, vrhaddq_m_u32, 
vrhaddq_m_u16,
vrmulhq_m_s8, vrmulhq_m_s32, vrmulhq_m_s16, vrmulhq_m_u8, vrmulhq_m_u32, 
vrmulhq_m_u16,
vrshlq_m_s8, vrshlq_m_s32, vrshlq_m_s16, vrshlq_m_u8, vrshlq_m_u32, 
vrshlq_m_u16, vrshrq_m_n_s8,
vrshrq_m_n_s32, vrshrq_m_n_s16, vrshrq_m_n_u8, vrshrq_m_n_u32, vrshrq_m_n_u16, 
vshlq_m_n_s8,
vshlq_m_n_s32, vshlq_m_n_s16, vshlq_m_n_u8, vshlq_m_n_u32, vshlq_m_n_u16, 
vshrq_m_n_s8,
vshrq_m_n_s32, vshrq_m_n_s16, vshrq_m_n_u8, vshrq_m_n_u32, vshrq_m_n_u16, 
vsliq_m_n_s8,
vsliq_m_n_s32, vsliq_m_n_s16, vsliq_m_n_u8, vsliq_m_n_u32, vsliq_m_n_u16, 
vsubq_m_n_s8,

[PATCH v2][ARM][GCC][1/4x]: MVE intrinsics with quaternary operands.

2020-03-18 Thread Srinath Parvathaneni
Hello Kyrill,

Following patch is the rebased version of v1.
(version v1) https://gcc.gnu.org/pipermail/gcc-patches/2019-November/534332.html



Hello,

This patch supports following MVE ACLE intrinsics with quaternary operands.

vsriq_m_n_s8, vsubq_m_s8, vsubq_x_s8, vcvtq_m_n_f16_u16, vcvtq_x_n_f16_u16,
vqshluq_m_n_s8, vabavq_p_s8, vsriq_m_n_u8, vshlq_m_u8, vshlq_x_u8, vsubq_m_u8,
vsubq_x_u8, vabavq_p_u8, vshlq_m_s8, vshlq_x_s8, vcvtq_m_n_f16_s16,
vcvtq_x_n_f16_s16, vsriq_m_n_s16, vsubq_m_s16, vsubq_x_s16, vcvtq_m_n_f32_u32,
vcvtq_x_n_f32_u32, vqshluq_m_n_s16, vabavq_p_s16, vsriq_m_n_u16,
vshlq_m_u16, vshlq_x_u16, vsubq_m_u16, vsubq_x_u16, vabavq_p_u16, vshlq_m_s16,
vshlq_x_s16, vcvtq_m_n_f32_s32, vcvtq_x_n_f32_s32, vsriq_m_n_s32, vsubq_m_s32,
vsubq_x_s32, vqshluq_m_n_s32, vabavq_p_s32, vsriq_m_n_u32, vshlq_m_u32,
vshlq_x_u32, vsubq_m_u32, vsubq_x_u32, vabavq_p_u32, vshlq_m_s32, vshlq_x_s32.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more 
details.
[1] 
https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-10-29  Andre Vieira  
Mihail Ionescu  
Srinath Parvathaneni  

* config/arm/arm-builtins.c 
(QUADOP_UNONE_UNONE_NONE_NONE_UNONE_QUALIFIERS):
Define builtin qualifier.
(QUADOP_NONE_NONE_NONE_NONE_UNONE_QUALIFIERS): Likewise.
(QUADOP_NONE_NONE_NONE_IMM_UNONE_QUALIFIERS): Likewise.
(QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE_QUALIFIERS): Likewise.
(QUADOP_UNONE_UNONE_NONE_IMM_UNONE_QUALIFIERS): Likewise.
(QUADOP_NONE_NONE_UNONE_IMM_UNONE_QUALIFIERS): Likewise.
(QUADOP_UNONE_UNONE_UNONE_IMM_UNONE_QUALIFIERS): Likewise.
(QUADOP_UNONE_UNONE_UNONE_NONE_UNONE_QUALIFIERS): Likewise.
* config/arm/arm_mve.h (vsriq_m_n_s8): Define macro.
(vsubq_m_s8): Likewise.
(vcvtq_m_n_f16_u16): Likewise.
(vqshluq_m_n_s8): Likewise.
(vabavq_p_s8): Likewise.
(vsriq_m_n_u8): Likewise.
(vshlq_m_u8): Likewise.
(vsubq_m_u8): Likewise.
(vabavq_p_u8): Likewise.
(vshlq_m_s8): Likewise.
(vcvtq_m_n_f16_s16): Likewise.
(vsriq_m_n_s16): Likewise.
(vsubq_m_s16): Likewise.
(vcvtq_m_n_f32_u32): Likewise.
(vqshluq_m_n_s16): Likewise.
(vabavq_p_s16): Likewise.
(vsriq_m_n_u16): Likewise.
(vshlq_m_u16): Likewise.
(vsubq_m_u16): Likewise.
(vabavq_p_u16): Likewise.
(vshlq_m_s16): Likewise.
(vcvtq_m_n_f32_s32): Likewise.
(vsriq_m_n_s32): Likewise.
(vsubq_m_s32): Likewise.
(vqshluq_m_n_s32): Likewise.
(vabavq_p_s32): Likewise.
(vsriq_m_n_u32): Likewise.
(vshlq_m_u32): Likewise.
(vsubq_m_u32): Likewise.
(vabavq_p_u32): Likewise.
(vshlq_m_s32): Likewise.
(__arm_vsriq_m_n_s8): Define intrinsic.
(__arm_vsubq_m_s8): Likewise.
(__arm_vqshluq_m_n_s8): Likewise.
(__arm_vabavq_p_s8): Likewise.
(__arm_vsriq_m_n_u8): Likewise.
(__arm_vshlq_m_u8): Likewise.
(__arm_vsubq_m_u8): Likewise.
(__arm_vabavq_p_u8): Likewise.
(__arm_vshlq_m_s8): Likewise.
(__arm_vsriq_m_n_s16): Likewise.
(__arm_vsubq_m_s16): Likewise.
(__arm_vqshluq_m_n_s16): Likewise.
(__arm_vabavq_p_s16): Likewise.
(__arm_vsriq_m_n_u16): Likewise.
(__arm_vshlq_m_u16): Likewise.
(__arm_vsubq_m_u16): Likewise.
(__arm_vabavq_p_u16): Likewise.
(__arm_vshlq_m_s16): Likewise.
(__arm_vsriq_m_n_s32): Likewise.
(__arm_vsubq_m_s32): Likewise.
(__arm_vqshluq_m_n_s32): Likewise.
(__arm_vabavq_p_s32): Likewise.
(__arm_vsriq_m_n_u32): Likewise.
(__arm_vshlq_m_u32): Likewise.
(__arm_vsubq_m_u32): Likewise.
(__arm_vabavq_p_u32): Likewise.
(__arm_vshlq_m_s32): Likewise.
(__arm_vcvtq_m_n_f16_u16): Likewise.
(__arm_vcvtq_m_n_f16_s16): Likewise.
(__arm_vcvtq_m_n_f32_u32): Likewise.
(__arm_vcvtq_m_n_f32_s32): Likewise.
(vcvtq_m_n): Define polymorphic variant.
(vqshluq_m_n): Likewise.
(vshlq_m): Likewise.
(vsriq_m_n): Likewise.
(vsubq_m): Likewise.
(vabavq_p): Likewise.
* config/arm/arm_mve_builtins.def
(QUADOP_UNONE_UNONE_NONE_NONE_UNONE_QUALIFIERS): Use builtin qualifier.
(QUADOP_NONE_NONE_NONE_NONE_UNONE_QUALIFIERS): Likewise.
(QUADOP_NONE_NONE_NONE_IMM_UNONE_QUALIFIERS): Likewise.
(QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE_QUALIFIERS): Likewise.
(QUADOP_UNONE_UNONE_NONE_IMM_UNONE_QUALIFIERS): Likewise.
(QUADOP_NONE_NONE_UNONE_IMM_UNONE_QUALIFIERS): Likewise.
(QUADOP_UNONE_UNONE_UNONE_IMM_UNONE_QUALIFIERS): Likewise.

[committed] aarch64: Treat p12-p15 as call-preserved in SVE PCS functions

2020-03-18 Thread Richard Sandiford
Due to a stupid mistake that I can't really explain, I'd got the
treatment of p12-p15 mixed up when adding support for the SVE PCS.
The registers are supposed to be call-preserved rather than
call-clobbered.

The fix is simple, but it has quite a big effect on the PCS tests
(as it should!).

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Pushed.

Richard


2020-03-18  Richard Sandiford  

gcc/
* config/aarch64/aarch64.c (aarch64_sve_abi): Treat p12-p15 as
call-preserved for SVE PCS functions.
(aarch64_layout_frame): Cope with up to 12 predicate save slots.
Optimize the case in which there are no following vector save slots.

gcc/testsuite/
* gcc.target/aarch64/sve/acle/general/cpy_1.c: Leave gaps in the
check-function-bodies patterns for p15 to be saved.
* gcc.target/aarch64/sve/pcs/args_1.c (callee_pred): Expect two
predicates to be saved.
* gcc.target/aarch64/sve/pcs/saves_1_be_nowrap.c (test_1): Expect
p12-p15 to be saved and restored.
(test_2): Remove p12-p15 from the clobber list.
* gcc.target/aarch64/sve/pcs/saves_1_be_wrap.c (test_1): Expect
p12-p15 to be saved and restored.
(test_2): Remove p12-p15 from the clobber list.
* gcc.target/aarch64/sve/pcs/saves_1_le_nowrap.c (test_1): Expect
p12-p15 to be saved and restored.
(test_2): Remove p12-p15 from the clobber list.
* gcc.target/aarch64/sve/pcs/saves_1_le_wrap.c (test_1): Expect
p12-p15 to be saved and restored.
(test_2): Remove p12-p15 from the clobber list.
* gcc.target/aarch64/sve/pcs/saves_2_be_nowrap.c: Expect p12-p15
to be saved and restored.
* gcc.target/aarch64/sve/pcs/saves_2_be_wrap.c: Likewise.
* gcc.target/aarch64/sve/pcs/saves_2_le_nowrap.c: Likewise.
* gcc.target/aarch64/sve/pcs/saves_2_le_wrap.c: Likewise.
* gcc.target/aarch64/sve/pcs/saves_4_be.c: Likewise.
* gcc.target/aarch64/sve/pcs/saves_4_le.c: Likewise.
* gcc.target/aarch64/sve/pcs/saves_5_be.c: Likewise.
* gcc.target/aarch64/sve/pcs/saves_5_le.c: Likewise.
* gcc.target/aarch64/sve/pcs/stack_clash_1.c (test_1): Likewise.
(test_2): Remove p12-p15 from the clobber list.
* gcc.target/aarch64/sve/pcs/stack_clash_1_128.c (test_1): Expect
p12-p15 to be saved and restored.
(test_2): Remove p12-p15 from the clobber list.
* gcc.target/aarch64/sve/pcs/stack_clash_1_256.c (test_1): Expect
p12-p15 to be saved and restored.
(test_2): Remove p12-p15 from the clobber list.
(test_4): Expect only 16 bytes of stack to be allocated for the
predicate save slot.
* gcc.target/aarch64/sve/pcs/stack_clash_1_512.c (test_1): Expect
p12-p15 to be saved and restored.
(test_2): Remove p12-p15 from the clobber list.
(test_4): Expect only 16 bytes of stack to be allocated for the
predicate save slot.
* gcc.target/aarch64/sve/pcs/stack_clash_1_1024.c (test_1): Expect
p12-p15 to be saved and restored.
(test_2): Remove p12-p15 from the clobber list.
(test_4): Expect only 16 bytes of stack to be allocated for the
predicate save slot.
* gcc.target/aarch64/sve/pcs/stack_clash_1_2048.c (test_1): Expect
p12-p15 to be saved and restored.
(test_2): Remove p12-p15 from the clobber list.
(test_4): Expect only 32 bytes of stack to be allocated for the
predicate save slot.
* gcc.target/aarch64/sve/pcs/stack_clash_2_256.c: Use z16 rather
than p4 to create a vector-sized save slot.
* gcc.target/aarch64/sve/pcs/stack_clash_2_512.c: Likewise.
* gcc.target/aarch64/sve/pcs/stack_clash_2_1024.c: Likewise.
* gcc.target/aarch64/sve/pcs/stack_clash_2_2048.c: Likewise.
---
 gcc/config/aarch64/aarch64.c  |  31 +-
 .../aarch64/sve/acle/general/cpy_1.c  |   4 +
 .../gcc.target/aarch64/sve/pcs/args_1.c   |   6 +
 .../aarch64/sve/pcs/saves_1_be_nowrap.c   |  78 +++--
 .../aarch64/sve/pcs/saves_1_be_wrap.c |  78 +++--
 .../aarch64/sve/pcs/saves_1_le_nowrap.c   |  78 +++--
 .../aarch64/sve/pcs/saves_1_le_wrap.c |  78 +++--
 .../aarch64/sve/pcs/saves_2_be_nowrap.c   | 304 ++
 .../aarch64/sve/pcs/saves_2_be_wrap.c | 304 ++
 .../aarch64/sve/pcs/saves_2_le_nowrap.c   | 304 ++
 .../aarch64/sve/pcs/saves_2_le_wrap.c | 304 ++
 .../gcc.target/aarch64/sve/pcs/saves_4_be.c   |  78 +++--
 .../gcc.target/aarch64/sve/pcs/saves_4_le.c   |  78 +++--
 .../gcc.target/aarch64/sve/pcs/saves_5_be.c   |  76 +++--
 .../gcc.target/aarch64/sve/pcs/saves_5_le.c   |  76 +++--
 .../aarch64/sve/pcs/stack_clash_1.c   |  81 ++---
 .../aarch64/sve/pcs/stack_clash_1_1024.c  |  82 ++---
 .../aarch64/sve/pcs/stack_clash_1_128.c 

[PATCH v3][ARM][GCC][3/3x]: MVE intrinsics with ternary operands.

2020-03-18 Thread Srinath Parvathaneni
Hello Kyrill,

Following patch is the rebased version of v2.
(version v2) https://gcc.gnu.org/pipermail/gcc-patches/2020-March/542068.html



Hello,

This patch supports following MVE ACLE intrinsics with ternary operands.

vrmlaldavhaxq_s32, vrmlsldavhaq_s32, vrmlsldavhaxq_s32, vaddlvaq_p_s32, 
vcvtbq_m_f16_f32,
vcvtbq_m_f32_f16, vcvttq_m_f16_f32, vcvttq_m_f32_f16, vrev16q_m_s8, 
vrev32q_m_f16,
vrmlaldavhq_p_s32, vrmlaldavhxq_p_s32, vrmlsldavhq_p_s32, vrmlsldavhxq_p_s32, 
vaddlvaq_p_u32,
vrev16q_m_u8, vrmlaldavhq_p_u32, vmvnq_m_n_s16, vorrq_m_n_s16, vqrshrntq_n_s16, 
vqshrnbq_n_s16,
vqshrntq_n_s16, vrshrnbq_n_s16, vrshrntq_n_s16, vshrnbq_n_s16, vshrntq_n_s16, 
vcmlaq_f16,
vcmlaq_rot180_f16, vcmlaq_rot270_f16, vcmlaq_rot90_f16, vfmaq_f16, vfmaq_n_f16, 
vfmasq_n_f16,
vfmsq_f16, vmlaldavaq_s16, vmlaldavaxq_s16, vmlsldavaq_s16, vmlsldavaxq_s16, 
vabsq_m_f16,
vcvtmq_m_s16_f16, vcvtnq_m_s16_f16, vcvtpq_m_s16_f16, vcvtq_m_s16_f16, 
vdupq_m_n_f16,
vmaxnmaq_m_f16, vmaxnmavq_p_f16, vmaxnmvq_p_f16, vminnmaq_m_f16, 
vminnmavq_p_f16, 
vminnmvq_p_f16, vmlaldavq_p_s16, vmlaldavxq_p_s16, vmlsldavq_p_s16, 
vmlsldavxq_p_s16, 
vmovlbq_m_s8, vmovltq_m_s8, vmovnbq_m_s16, vmovntq_m_s16, vnegq_m_f16, 
vpselq_f16,
vqmovnbq_m_s16, vqmovntq_m_s16, vrev32q_m_s8, vrev64q_m_f16, vrndaq_m_f16, 
vrndmq_m_f16,
vrndnq_m_f16, vrndpq_m_f16, vrndq_m_f16, vrndxq_m_f16, vcmpeqq_m_n_f16, 
vcmpgeq_m_f16,
vcmpgeq_m_n_f16, vcmpgtq_m_f16, vcmpgtq_m_n_f16, vcmpleq_m_f16, vcmpleq_m_n_f16,
vcmpltq_m_f16, vcmpltq_m_n_f16, vcmpneq_m_f16, vcmpneq_m_n_f16, vmvnq_m_n_u16,
vorrq_m_n_u16, vqrshruntq_n_s16, vqshrunbq_n_s16, vqshruntq_n_s16, 
vcvtmq_m_u16_f16,
vcvtnq_m_u16_f16, vcvtpq_m_u16_f16, vcvtq_m_u16_f16, vqmovunbq_m_s16, 
vqmovuntq_m_s16,
vqrshrntq_n_u16, vqshrnbq_n_u16, vqshrntq_n_u16, vrshrnbq_n_u16, 
vrshrntq_n_u16, 
vshrnbq_n_u16, vshrntq_n_u16, vmlaldavaq_u16, vmlaldavaxq_u16, vmlaldavq_p_u16,
vmlaldavxq_p_u16, vmovlbq_m_u8, vmovltq_m_u8, vmovnbq_m_u16, vmovntq_m_u16, 
vqmovnbq_m_u16,
vqmovntq_m_u16, vrev32q_m_u8, vmvnq_m_n_s32, vorrq_m_n_s32, vqrshrntq_n_s32, 
vqshrnbq_n_s32,
vqshrntq_n_s32, vrshrnbq_n_s32, vrshrntq_n_s32, vshrnbq_n_s32, vshrntq_n_s32, 
vcmlaq_f32,
vcmlaq_rot180_f32, vcmlaq_rot270_f32, vcmlaq_rot90_f32, vfmaq_f32, vfmaq_n_f32, 
vfmasq_n_f32,
vfmsq_f32, vmlaldavaq_s32, vmlaldavaxq_s32, vmlsldavaq_s32, vmlsldavaxq_s32, 
vabsq_m_f32,
vcvtmq_m_s32_f32, vcvtnq_m_s32_f32, vcvtpq_m_s32_f32, vcvtq_m_s32_f32, 
vdupq_m_n_f32,
vmaxnmaq_m_f32, vmaxnmavq_p_f32, vmaxnmvq_p_f32, vminnmaq_m_f32, 
vminnmavq_p_f32,
vminnmvq_p_f32, vmlaldavq_p_s32, vmlaldavxq_p_s32, vmlsldavq_p_s32, 
vmlsldavxq_p_s32,
vmovlbq_m_s16, vmovltq_m_s16, vmovnbq_m_s32, vmovntq_m_s32, vnegq_m_f32, 
vpselq_f32,
vqmovnbq_m_s32, vqmovntq_m_s32, vrev32q_m_s16, vrev64q_m_f32, vrndaq_m_f32, 
vrndmq_m_f32,
vrndnq_m_f32, vrndpq_m_f32, vrndq_m_f32, vrndxq_m_f32, vcmpeqq_m_n_f32, 
vcmpgeq_m_f32,
vcmpgeq_m_n_f32, vcmpgtq_m_f32, vcmpgtq_m_n_f32, vcmpleq_m_f32, vcmpleq_m_n_f32,
vcmpltq_m_f32, vcmpltq_m_n_f32, vcmpneq_m_f32, vcmpneq_m_n_f32, vmvnq_m_n_u32, 
vorrq_m_n_u32,
vqrshruntq_n_s32, vqshrunbq_n_s32, vqshruntq_n_s32, vcvtmq_m_u32_f32, 
vcvtnq_m_u32_f32,
vcvtpq_m_u32_f32, vcvtq_m_u32_f32, vqmovunbq_m_s32, vqmovuntq_m_s32, 
vqrshrntq_n_u32,
vqshrnbq_n_u32, vqshrntq_n_u32, vrshrnbq_n_u32, vrshrntq_n_u32, vshrnbq_n_u32, 
vshrntq_n_u32,
vmlaldavaq_u32, vmlaldavaxq_u32, vmlaldavq_p_u32, vmlaldavxq_p_u32, 
vmovlbq_m_u16,
vmovltq_m_u16, vmovnbq_m_u32, vmovntq_m_u32, vqmovnbq_m_u32, vqmovntq_m_u32, 
vrev32q_m_u16.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more 
details.
[1] 
https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-10-29  Andre Vieira  
Mihail Ionescu  
Srinath Parvathaneni  

* config/arm/arm_mve.h (vrmlaldavhaxq_s32): Define macro.
(vrmlsldavhaq_s32): Likewise.
(vrmlsldavhaxq_s32): Likewise.
(vaddlvaq_p_s32): Likewise.
(vcvtbq_m_f16_f32): Likewise.
(vcvtbq_m_f32_f16): Likewise.
(vcvttq_m_f16_f32): Likewise.
(vcvttq_m_f32_f16): Likewise.
(vrev16q_m_s8): Likewise.
(vrev32q_m_f16): Likewise.
(vrmlaldavhq_p_s32): Likewise.
(vrmlaldavhxq_p_s32): Likewise.
(vrmlsldavhq_p_s32): Likewise.
(vrmlsldavhxq_p_s32): Likewise.
(vaddlvaq_p_u32): Likewise.
(vrev16q_m_u8): Likewise.
(vrmlaldavhq_p_u32): Likewise.
(vmvnq_m_n_s16): Likewise.
(vorrq_m_n_s16): Likewise.
(vqrshrntq_n_s16): Likewise.
(vqshrnbq_n_s16): Likewise.
(vqshrntq_n_s16): Likewise.
(vrshrnbq_n_s16): Likewise.
(vrshrntq_n_s16): Likewise.
(vshrnbq_n_s16): Likewise.
(vshrntq_n_s16): Likewise.
(vcmlaq_f16): Likewise.
(vcmlaq_rot180_f16): Likewise.
(vcmlaq_rot270_f16): Likewise.

[PATCH v3][ARM][GCC][2/3x]: MVE intrinsics with ternary operands.

2020-03-18 Thread Srinath Parvathaneni
Hello Kyrill,

The following patch is the rebased version of v2.
(version v2) https://gcc.gnu.org/pipermail/gcc-patches/2020-March/542067.html




Hello,

This patch supports following MVE ACLE intrinsics with ternary operands.

vpselq_u8, vpselq_s8, vrev64q_m_u8, vqrdmlashq_n_u8, vqrdmlahq_n_u8,
vqdmlahq_n_u8, vmvnq_m_u8, vmlasq_n_u8, vmlaq_n_u8, vmladavq_p_u8,
vmladavaq_u8, vminvq_p_u8, vmaxvq_p_u8, vdupq_m_n_u8, vcmpneq_m_u8,
vcmpneq_m_n_u8, vcmphiq_m_u8, vcmphiq_m_n_u8, vcmpeqq_m_u8,
vcmpeqq_m_n_u8, vcmpcsq_m_u8, vcmpcsq_m_n_u8, vclzq_m_u8, vaddvaq_p_u8,
vsriq_n_u8, vsliq_n_u8, vshlq_m_r_u8, vrshlq_m_n_u8, vqshlq_m_r_u8,
vqrshlq_m_n_u8, vminavq_p_s8, vminaq_m_s8, vmaxavq_p_s8, vmaxaq_m_s8,
vcmpneq_m_s8, vcmpneq_m_n_s8, vcmpltq_m_s8, vcmpltq_m_n_s8, vcmpleq_m_s8,
vcmpleq_m_n_s8, vcmpgtq_m_s8, vcmpgtq_m_n_s8, vcmpgeq_m_s8, vcmpgeq_m_n_s8,
vcmpeqq_m_s8, vcmpeqq_m_n_s8, vshlq_m_r_s8, vrshlq_m_n_s8, vrev64q_m_s8,
vqshlq_m_r_s8, vqrshlq_m_n_s8, vqnegq_m_s8, vqabsq_m_s8, vnegq_m_s8,
vmvnq_m_s8, vmlsdavxq_p_s8, vmlsdavq_p_s8, vmladavxq_p_s8, vmladavq_p_s8,
vminvq_p_s8, vmaxvq_p_s8, vdupq_m_n_s8, vclzq_m_s8, vclsq_m_s8, vaddvaq_p_s8,
vabsq_m_s8, vqrdmlsdhxq_s8, vqrdmlsdhq_s8, vqrdmlashq_n_s8, vqrdmlahq_n_s8,
vqrdmladhxq_s8, vqrdmladhq_s8, vqdmlsdhxq_s8, vqdmlsdhq_s8, vqdmlahq_n_s8,
vqdmladhxq_s8, vqdmladhq_s8, vmlsdavaxq_s8, vmlsdavaq_s8, vmlasq_n_s8,
vmlaq_n_s8, vmladavaxq_s8, vmladavaq_s8, vsriq_n_s8, vsliq_n_s8, vpselq_u16,
vpselq_s16, vrev64q_m_u16, vqrdmlashq_n_u16, vqrdmlahq_n_u16, vqdmlahq_n_u16,
vmvnq_m_u16, vmlasq_n_u16, vmlaq_n_u16, vmladavq_p_u16, vmladavaq_u16,
vminvq_p_u16, vmaxvq_p_u16, vdupq_m_n_u16, vcmpneq_m_u16, vcmpneq_m_n_u16,
vcmphiq_m_u16, vcmphiq_m_n_u16, vcmpeqq_m_u16, vcmpeqq_m_n_u16, vcmpcsq_m_u16,
vcmpcsq_m_n_u16, vclzq_m_u16, vaddvaq_p_u16, vsriq_n_u16, vsliq_n_u16,
vshlq_m_r_u16, vrshlq_m_n_u16, vqshlq_m_r_u16, vqrshlq_m_n_u16, vminavq_p_s16,
vminaq_m_s16, vmaxavq_p_s16, vmaxaq_m_s16, vcmpneq_m_s16, vcmpneq_m_n_s16,
vcmpltq_m_s16, vcmpltq_m_n_s16, vcmpleq_m_s16, vcmpleq_m_n_s16, vcmpgtq_m_s16,
vcmpgtq_m_n_s16, vcmpgeq_m_s16, vcmpgeq_m_n_s16, vcmpeqq_m_s16, vcmpeqq_m_n_s16,
vshlq_m_r_s16, vrshlq_m_n_s16, vrev64q_m_s16, vqshlq_m_r_s16, vqrshlq_m_n_s16,
vqnegq_m_s16, vqabsq_m_s16, vnegq_m_s16, vmvnq_m_s16, vmlsdavxq_p_s16,
vmlsdavq_p_s16, vmladavxq_p_s16, vmladavq_p_s16, vminvq_p_s16, vmaxvq_p_s16,
vdupq_m_n_s16, vclzq_m_s16, vclsq_m_s16, vaddvaq_p_s16, vabsq_m_s16,
vqrdmlsdhxq_s16, vqrdmlsdhq_s16, vqrdmlashq_n_s16, vqrdmlahq_n_s16,
vqrdmladhxq_s16, vqrdmladhq_s16, vqdmlsdhxq_s16, vqdmlsdhq_s16, vqdmlahq_n_s16,
vqdmladhxq_s16, vqdmladhq_s16, vmlsdavaxq_s16, vmlsdavaq_s16, vmlasq_n_s16,
vmlaq_n_s16, vmladavaxq_s16, vmladavaq_s16, vsriq_n_s16, vsliq_n_s16, 
vpselq_u32,
vpselq_s32, vrev64q_m_u32, vqrdmlashq_n_u32, vqrdmlahq_n_u32, vqdmlahq_n_u32,
vmvnq_m_u32, vmlasq_n_u32, vmlaq_n_u32, vmladavq_p_u32, vmladavaq_u32,
vminvq_p_u32, vmaxvq_p_u32, vdupq_m_n_u32, vcmpneq_m_u32, vcmpneq_m_n_u32,
vcmphiq_m_u32, vcmphiq_m_n_u32, vcmpeqq_m_u32, vcmpeqq_m_n_u32, vcmpcsq_m_u32,
vcmpcsq_m_n_u32, vclzq_m_u32, vaddvaq_p_u32, vsriq_n_u32, vsliq_n_u32,
vshlq_m_r_u32, vrshlq_m_n_u32, vqshlq_m_r_u32, vqrshlq_m_n_u32, vminavq_p_s32,
vminaq_m_s32, vmaxavq_p_s32, vmaxaq_m_s32, vcmpneq_m_s32, vcmpneq_m_n_s32,
vcmpltq_m_s32, vcmpltq_m_n_s32, vcmpleq_m_s32, vcmpleq_m_n_s32, vcmpgtq_m_s32,
vcmpgtq_m_n_s32, vcmpgeq_m_s32, vcmpgeq_m_n_s32, vcmpeqq_m_s32, vcmpeqq_m_n_s32,
vshlq_m_r_s32, vrshlq_m_n_s32, vrev64q_m_s32, vqshlq_m_r_s32, vqrshlq_m_n_s32,
vqnegq_m_s32, vqabsq_m_s32, vnegq_m_s32, vmvnq_m_s32, vmlsdavxq_p_s32,
vmlsdavq_p_s32, vmladavxq_p_s32, vmladavq_p_s32, vminvq_p_s32, vmaxvq_p_s32,
vdupq_m_n_s32, vclzq_m_s32, vclsq_m_s32, vaddvaq_p_s32, vabsq_m_s32,
vqrdmlsdhxq_s32, vqrdmlsdhq_s32, vqrdmlashq_n_s32, vqrdmlahq_n_s32,
vqrdmladhxq_s32, vqrdmladhq_s32, vqdmlsdhxq_s32, vqdmlsdhq_s32, vqdmlahq_n_s32,
vqdmladhxq_s32, vqdmladhq_s32, vmlsdavaxq_s32, vmlsdavaq_s32, vmlasq_n_s32,
vmlaq_n_s32, vmladavaxq_s32, vmladavaq_s32, vsriq_n_s32, vsliq_n_s32,
vpselq_u64, vpselq_s64.

Please refer to the M-profile Vector Extension (MVE) intrinsics documentation [1] for more details.
[1] 
https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

In this patch new constraints "Rc" and "Re" are added, which check that the
constant is within the range 0 to 15 and 0 to 31 respectively.

Also new predicates "mve_imm_15" and "mve_imm_31" are added, to check for a
match against the constraints Rc and Re respectively.
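For reference, GCC constraints and predicates of this kind are normally declared in config/arm/constraints.md and config/arm/predicates.md. The fragment below is only an illustrative sketch of what such definitions typically look like; the authoritative text is the patch itself:

```lisp
;; Sketch only -- see the patch for the real definitions.
(define_constraint "Rc"
  "@internal Constant in the range 0 to 15 for MVE."
  (and (match_code "const_int")
       (match_test "IN_RANGE (ival, 0, 15)")))

(define_constraint "Re"
  "@internal Constant in the range 0 to 31 for MVE."
  (and (match_code "const_int")
       (match_test "IN_RANGE (ival, 0, 31)")))

(define_predicate "mve_imm_15"
  (match_test "satisfies_constraint_Rc (op)"))

(define_predicate "mve_imm_31"
  (match_test "satisfies_constraint_Re (op)"))
```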

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-10-25  Andre Vieira  
Mihail Ionescu  
Srinath Parvathaneni  

* config/arm/arm_mve.h (vpselq_u8): Define macro.
(vpselq_s8): Likewise.
(vrev64q_m_u8): Likewise.
(vqrdmlashq_n_u8): Likewise.
(vqrdmlahq_n_u8): Likewise.
(vqdmlahq_n_u8): Likewise.
(vmvnq_m_u8): Likewise.
(vmlasq_n_u8): Likewise.

[PATCH v3][ARM][GCC][1/3x]: MVE intrinsics with ternary operands.

2020-03-18 Thread Srinath Parvathaneni
Hello Kyrill,

The following patch is the rebased version of v2.
(version v2) https://gcc.gnu.org/pipermail/gcc-patches/2020-March/542068.html



Hello,

This patch supports following MVE ACLE intrinsics with ternary operands.

vabavq_s8, vabavq_s16, vabavq_s32, vbicq_m_n_s16, vbicq_m_n_s32,
vbicq_m_n_u16, vbicq_m_n_u32, vcmpeqq_m_f16, vcmpeqq_m_f32,
vcvtaq_m_s16_f16, vcvtaq_m_u16_f16, vcvtaq_m_s32_f32, vcvtaq_m_u32_f32,
vcvtq_m_f16_s16, vcvtq_m_f16_u16, vcvtq_m_f32_s32, vcvtq_m_f32_u32,
vqrshrnbq_n_s16, vqrshrnbq_n_u16, vqrshrnbq_n_s32, vqrshrnbq_n_u32,
vqrshrunbq_n_s16, vqrshrunbq_n_s32, vrmlaldavhaq_s32, vrmlaldavhaq_u32,
vshlcq_s8, vshlcq_u8, vshlcq_s16, vshlcq_u16, vshlcq_s32, vshlcq_u32,
vabavq_s8, vabavq_s16, vabavq_s32.
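As one concrete example of these ternary shapes, vabavq (absolute-difference-and-accumulate) takes a scalar accumulator plus two vectors and returns the accumulator plus the sum of absolute lane differences. The scalar model below is my illustration of those semantics, inferred from the ACLE naming; it is not code from the patch:

```python
def vabavq(acc, b, c):
    # Scalar model of vabavq: add the absolute difference of each
    # lane pair into a 32-bit accumulator.
    for x, y in zip(b, c):
        acc += abs(x - y)
    return acc & 0xFFFFFFFF  # the accumulator is a 32-bit register

print(vabavq(10, [1, 5, -3], [4, 2, 3]))  # 10 + 3 + 3 + 6 = 22
```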

Please refer to the M-profile Vector Extension (MVE) intrinsics documentation [1] for more details.
[1] 
https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-10-23  Andre Vieira  
Mihail Ionescu  
Srinath Parvathaneni  

* config/arm/arm-builtins.c (TERNOP_UNONE_UNONE_UNONE_IMM_QUALIFIERS):
Define qualifier for ternary operands.
(TERNOP_UNONE_UNONE_NONE_NONE_QUALIFIERS): Likewise.
(TERNOP_UNONE_NONE_UNONE_IMM_QUALIFIERS): Likewise.
(TERNOP_NONE_NONE_UNONE_IMM_QUALIFIERS): Likewise.
(TERNOP_UNONE_UNONE_NONE_IMM_QUALIFIERS): Likewise.
(TERNOP_UNONE_UNONE_NONE_UNONE_QUALIFIERS): Likewise.
(TERNOP_UNONE_UNONE_IMM_UNONE_QUALIFIERS): Likewise.
(TERNOP_UNONE_NONE_NONE_UNONE_QUALIFIERS): Likewise.
(TERNOP_NONE_NONE_NONE_IMM_QUALIFIERS): Likewise.
(TERNOP_NONE_NONE_NONE_UNONE_QUALIFIERS): Likewise.
(TERNOP_NONE_NONE_IMM_UNONE_QUALIFIERS): Likewise.
(TERNOP_NONE_NONE_UNONE_UNONE_QUALIFIERS): Likewise.
(TERNOP_UNONE_UNONE_UNONE_UNONE_QUALIFIERS): Likewise.
(TERNOP_NONE_NONE_NONE_NONE_QUALIFIERS): Likewise.
* config/arm/arm_mve.h (vabavq_s8): Define macro.
(vabavq_s16): Likewise.
(vabavq_s32): Likewise.
(vbicq_m_n_s16): Likewise.
(vbicq_m_n_s32): Likewise.
(vbicq_m_n_u16): Likewise.
(vbicq_m_n_u32): Likewise.
(vcmpeqq_m_f16): Likewise.
(vcmpeqq_m_f32): Likewise.
(vcvtaq_m_s16_f16): Likewise.
(vcvtaq_m_u16_f16): Likewise.
(vcvtaq_m_s32_f32): Likewise.
(vcvtaq_m_u32_f32): Likewise.
(vcvtq_m_f16_s16): Likewise.
(vcvtq_m_f16_u16): Likewise.
(vcvtq_m_f32_s32): Likewise.
(vcvtq_m_f32_u32): Likewise.
(vqrshrnbq_n_s16): Likewise.
(vqrshrnbq_n_u16): Likewise.
(vqrshrnbq_n_s32): Likewise.
(vqrshrnbq_n_u32): Likewise.
(vqrshrunbq_n_s16): Likewise.
(vqrshrunbq_n_s32): Likewise.
(vrmlaldavhaq_s32): Likewise.
(vrmlaldavhaq_u32): Likewise.
(vshlcq_s8): Likewise.
(vshlcq_u8): Likewise.
(vshlcq_s16): Likewise.
(vshlcq_u16): Likewise.
(vshlcq_s32): Likewise.
(vshlcq_u32): Likewise.
(vabavq_u8): Likewise.
(vabavq_u16): Likewise.
(vabavq_u32): Likewise.
(__arm_vabavq_s8): Define intrinsic.
(__arm_vabavq_s16): Likewise.
(__arm_vabavq_s32): Likewise.
(__arm_vabavq_u8): Likewise.
(__arm_vabavq_u16): Likewise.
(__arm_vabavq_u32): Likewise.
(__arm_vbicq_m_n_s16): Likewise.
(__arm_vbicq_m_n_s32): Likewise.
(__arm_vbicq_m_n_u16): Likewise.
(__arm_vbicq_m_n_u32): Likewise.
(__arm_vqrshrnbq_n_s16): Likewise.
(__arm_vqrshrnbq_n_u16): Likewise.
(__arm_vqrshrnbq_n_s32): Likewise.
(__arm_vqrshrnbq_n_u32): Likewise.
(__arm_vqrshrunbq_n_s16): Likewise.
(__arm_vqrshrunbq_n_s32): Likewise.
(__arm_vrmlaldavhaq_s32): Likewise.
(__arm_vrmlaldavhaq_u32): Likewise.
(__arm_vshlcq_s8): Likewise.
(__arm_vshlcq_u8): Likewise.
(__arm_vshlcq_s16): Likewise.
(__arm_vshlcq_u16): Likewise.
(__arm_vshlcq_s32): Likewise.
(__arm_vshlcq_u32): Likewise.
(__arm_vcmpeqq_m_f16): Likewise.
(__arm_vcmpeqq_m_f32): Likewise.
(__arm_vcvtaq_m_s16_f16): Likewise.
(__arm_vcvtaq_m_u16_f16): Likewise.
(__arm_vcvtaq_m_s32_f32): Likewise.
(__arm_vcvtaq_m_u32_f32): Likewise.
(__arm_vcvtq_m_f16_s16): Likewise.
(__arm_vcvtq_m_f16_u16): Likewise.
(__arm_vcvtq_m_f32_s32): Likewise.
(__arm_vcvtq_m_f32_u32): Likewise.
(vcvtaq_m): Define polymorphic variant.
(vcvtq_m): Likewise.
(vabavq): Likewise.
(vshlcq): Likewise.
(vbicq_m_n): Likewise.
(vqrshrnbq_n): Likewise.
(vqrshrunbq_n): Likewise.
* config/arm/arm_mve_builtins.def
(TERNOP_UNONE_UNONE_UNONE_IMM_QUALIFIERS): Use the builtin 

[Patch,committed] libgomp testsuite - disable long double for AMDGCN

2020-03-18 Thread Tobias Burnus

The two OpenACC firstprivate-mappings-1.{c,C} testcases use
long double, but not with nvidia; this patch also disables it for gcn.

Additionally, it moves the '#define DO_LONG_DOUBLE 0' to the
libgomp file (before, it was in the included ../../gcc/testsuite file).

Committed as r10-7238-g4da9288745d8f9c0d6918b685522e89c277020c7

Cheers,

Tobias

PS: Without that patch, it fails with:
lto1: fatal error: amdgcn-amdhsa - 80-bit-precision floating-point numbers 
unsupported (mode 'XF')

2020-03-18  Tobias Burnus  

	* testsuite/libgomp.oacc-c++/firstprivate-mappings-1.C: Add
	#define DO_LONG_DOUBLE; set to 1, except for nvidia + gcn.
	* libgomp.oacc-c-c++-common/firstprivate-mappings-1.c: Likewise.

	* g++.dg/goacc/firstprivate-mappings-1.C: Only set DO_LONG_DOUBLE if
	not defined; update comments.
	* c-c++-common/goacc/firstprivate-mappings-1.c: Likewise.

 gcc/testsuite/c-c++-common/goacc/firstprivate-mappings-1.c   | 12 
 gcc/testsuite/g++.dg/goacc/firstprivate-mappings-1.C | 12 
 libgomp/testsuite/libgomp.oacc-c++/firstprivate-mappings-1.C |  9 +
 .../libgomp.oacc-c-c++-common/firstprivate-mappings-1.c  |  9 +
 4 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/goacc/firstprivate-mappings-1.c b/gcc/testsuite/c-c++-common/goacc/firstprivate-mappings-1.c
index 33576c50eca..7987beaed9a 100644
--- a/gcc/testsuite/c-c++-common/goacc/firstprivate-mappings-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/firstprivate-mappings-1.c
@@ -2,7 +2,9 @@
 
 /* This file is also sourced from
'../../../../libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-mappings-1.c'
-   as an execution test.  */
+   as an execution test.
+
+   'long double' tests are compiled/used unless DO_LONG_DOUBLE is set to 0.  */
 
 /* See also '../../g++.dg/goacc/firstprivate-mappings-1.C'.  */
 
@@ -24,13 +26,7 @@
 # define HAVE_INT128 0
 #endif
 
-
-/* The one is only relevant for offloading compilation; will always be enabled
-   when doing tree scanning.  */
-#ifdef ACC_DEVICE_TYPE_nvidia
-/* PR71064.  */
-# define DO_LONG_DOUBLE 0
-#else
+#ifndef DO_LONG_DOUBLE
 # define DO_LONG_DOUBLE 1
 #endif
 
diff --git a/gcc/testsuite/g++.dg/goacc/firstprivate-mappings-1.C b/gcc/testsuite/g++.dg/goacc/firstprivate-mappings-1.C
index 639bf3f3299..1b1badb1a90 100644
--- a/gcc/testsuite/g++.dg/goacc/firstprivate-mappings-1.C
+++ b/gcc/testsuite/g++.dg/goacc/firstprivate-mappings-1.C
@@ -2,7 +2,9 @@
 
 /* This file is also sourced from
'../../../../libgomp/testsuite/libgomp.oacc-c++/firstprivate-mappings-1.C'
-   as an execution test.  */
+   as an execution test.
+
+   'long double' tests are compiled/used unless DO_LONG_DOUBLE is set to 0.  */
 
 /* See also '../../c-c++-common/goacc/firstprivate-mappings-1.c'.  */
 
@@ -21,13 +23,7 @@
 # define HAVE_INT128 0
 #endif
 
-
-/* The one is only relevant for offloading compilation; will always be enabled
-   when doing tree scanning.  */
-#ifdef ACC_DEVICE_TYPE_nvidia
-/* PR71064.  */
-# define DO_LONG_DOUBLE 0
-#else
+#ifndef DO_LONG_DOUBLE
 # define DO_LONG_DOUBLE 1
 #endif
 
diff --git a/libgomp/testsuite/libgomp.oacc-c++/firstprivate-mappings-1.C b/libgomp/testsuite/libgomp.oacc-c++/firstprivate-mappings-1.C
index c8dba9e5d1c..7b3e670073c 100644
--- a/libgomp/testsuite/libgomp.oacc-c++/firstprivate-mappings-1.C
+++ b/libgomp/testsuite/libgomp.oacc-c++/firstprivate-mappings-1.C
@@ -1,3 +1,12 @@
 /* Verify OpenACC 'firstprivate' mappings for C++ reference types.  */
 
+/* PR middle-end/48591 */
+/* PR other/71064 */
+/* Set to 0 for offloading targets not supporting long double.  */
+#if defined(ACC_DEVICE_TYPE_nvidia) || defined(ACC_DEVICE_TYPE_gcn)
+# define DO_LONG_DOUBLE 0
+#else
+# define DO_LONG_DOUBLE 1
+#endif
+
 #include "../../../gcc/testsuite/g++.dg/goacc/firstprivate-mappings-1.C"
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-mappings-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-mappings-1.c
index 4a8b310414c..253f8bf0bd0 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-mappings-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-mappings-1.c
@@ -3,4 +3,13 @@
 /* { dg-additional-options "-Wno-psabi" } as apparently we're doing funny
things with vector arguments.  */
 
+/* PR middle-end/48591 */
+/* PR other/71064 */
+/* Set to 0 for offloading targets not supporting long double.  */
+#if defined(ACC_DEVICE_TYPE_nvidia) || defined(ACC_DEVICE_TYPE_gcn)
+# define DO_LONG_DOUBLE 0
+#else
+# define DO_LONG_DOUBLE 1
+#endif
+
 #include "../../../gcc/testsuite/c-c++-common/goacc/firstprivate-mappings-1.c"



Re: [PATCH] Fix PR90332 by extending half size vector mode

2020-03-18 Thread Richard Biener via Gcc-patches
On Wed, Mar 18, 2020 at 11:39 AM Richard Biener
 wrote:
>
> On Wed, Mar 18, 2020 at 11:06 AM Kewen.Lin  wrote:
> >
> > Hi,
> >
> > As PR90332 shows, the current scalar epilogue peeling for gaps
> > elimination requires expected vec_init optab with two half size
> > vector mode.  On Power, we don't support vector mode like V8QI,
> > so can't support optab like vec_initv16qiv8qi.  But we want to
> > leverage existing scalar mode like DI to init the desirable
> > vector mode.  This patch is to extend the existing support for
> > Power, as evaluated on Power9 we can see expected 1.9% speed up
> > on SPEC2017 525.x264_r.
> >
> > Bootstrapped/regtested on powerpc64le-linux-gnu (LE) P8 and P9.
> >
> > Is it ok for trunk?
>
> There's already code exercising such a case in vectorizable_load
> (VMAT_STRIDED_SLP) which you could have factored out.
>
>  vectype, bool slp,
>  than the alignment boundary B.  Every vector access will
>  be a multiple of B and so we are guaranteed to access a
>  non-gap element in the same B-sized block.  */
> + machine_mode half_mode;
>   if (overrun_p
>   && gap < (vect_known_alignment_in_bytes (first_dr_info)
> / vect_get_scalar_dr_size (first_dr_info)))
> -   overrun_p = false;
> -
> +   {
> + overrun_p = false;
> + if (known_eq (nunits, (group_size - gap) * 2)
> + && known_eq (nunits, group_size)
> + && get_half_mode_for_vector (vectype, _mode))
> +   DR_GROUP_HALF_MODE (first_stmt_info) = half_mode;
> +   }
>
> why do you need to amend this case?
>
> I don't like storing DR_GROUP_HALF_MODE very much, later
> you need a vector type and it looks cheap enough to recompute
> it where you need it?  Iff then it doesn't belong to DR_GROUP
> but to the stmt-info.
>
> I realize the original optimization was kind of a hack (and I was too
> lazy to implement the integer mode construction path ...).
>
> So, can you factor out the existing code into a function returning
> the vector type for construction for a vector type and a
> pieces size?  So for V16QI and a pieces-size of 4 we'd
> get either V16QI back (then construction from V4QI pieces
> should work) or V4SI (then construction from SImode pieces
> should work)?  Eventually as secondary output provide that
> piece type (SI / V4QI).

Btw, why not implement the necessary vector init patterns?

> Thanks,
> Richard.
>
> > BR,
> > Kewen
> > ---
> >
> > gcc/ChangeLog
> >
> > 2020-MM-DD  Kewen Lin  
> >
> > PR tree-optimization/90332
> > * gcc/tree-vectorizer.h (struct _stmt_vec_info): Add half_mode 
> > field.
> > (DR_GROUP_HALF_MODE): New macro.
> > * gcc/tree-vect-stmts.c (get_half_mode_for_vector): New function.
> > (get_group_load_store_type): Call get_half_mode_for_vector to query 
> > target
> > whether support half size mode and update DR_GROUP_HALF_MODE if yes.
> > (vectorizable_load): Build appropriate vector type based on
> > DR_GROUP_HALF_MODE.


Re: [PATCH] Fix PR90332 by extending half size vector mode

2020-03-18 Thread Richard Biener via Gcc-patches
On Wed, Mar 18, 2020 at 11:06 AM Kewen.Lin  wrote:
>
> Hi,
>
> As PR90332 shows, the current scalar epilogue peeling for gaps
> elimination requires expected vec_init optab with two half size
> vector mode.  On Power, we don't support vector mode like V8QI,
> so can't support optab like vec_initv16qiv8qi.  But we want to
> leverage existing scalar mode like DI to init the desirable
> vector mode.  This patch is to extend the existing support for
> Power, as evaluated on Power9 we can see expected 1.9% speed up
> on SPEC2017 525.x264_r.
>
> Bootstrapped/regtested on powerpc64le-linux-gnu (LE) P8 and P9.
>
> Is it ok for trunk?

There's already code exercising such a case in vectorizable_load
(VMAT_STRIDED_SLP) which you could have factored out.

 vectype, bool slp,
 than the alignment boundary B.  Every vector access will
 be a multiple of B and so we are guaranteed to access a
 non-gap element in the same B-sized block.  */
+ machine_mode half_mode;
  if (overrun_p
  && gap < (vect_known_alignment_in_bytes (first_dr_info)
/ vect_get_scalar_dr_size (first_dr_info)))
-   overrun_p = false;
-
+   {
+ overrun_p = false;
+ if (known_eq (nunits, (group_size - gap) * 2)
+ && known_eq (nunits, group_size)
+ && get_half_mode_for_vector (vectype, _mode))
+   DR_GROUP_HALF_MODE (first_stmt_info) = half_mode;
+   }

why do you need to amend this case?

I don't like storing DR_GROUP_HALF_MODE very much, later
you need a vector type and it looks cheap enough to recompute
it where you need it?  Iff then it doesn't belong to DR_GROUP
but to the stmt-info.

I realize the original optimization was kind of a hack (and I was too
lazy to implement the integer mode construction path ...).

So, can you factor out the existing code into a function returning
the vector type for construction for a vector type and a
pieces size?  So for V16QI and a pieces-size of 4 we'd
get either V16QI back (then construction from V4QI pieces
should work) or V4SI (then construction from SImode pieces
should work)?  Eventually as secondary output provide that
piece type (SI / V4QI).

Thanks,
Richard.

> BR,
> Kewen
> ---
>
> gcc/ChangeLog
>
> 2020-MM-DD  Kewen Lin  
>
> PR tree-optimization/90332
> * gcc/tree-vectorizer.h (struct _stmt_vec_info): Add half_mode field.
> (DR_GROUP_HALF_MODE): New macro.
> * gcc/tree-vect-stmts.c (get_half_mode_for_vector): New function.
> (get_group_load_store_type): Call get_half_mode_for_vector to query 
> target
> whether support half size mode and update DR_GROUP_HALF_MODE if yes.
> (vectorizable_load): Build appropriate vector type based on
> DR_GROUP_HALF_MODE.
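Richard's V16QI example above can be made concrete with a small model: for a 16-lane vector of 8-bit elements built from 4-element pieces, the factored-out helper would either keep the original vector mode (building it from V4QI sub-vectors) or reinterpret it as a narrower vector of wider integer lanes (V4SI built from SImode pieces). The Python sketch of that decision below is my illustration, not GCC code:

```python
def pick_construction(nunits, elem_bits, piece_elems, half_vector_supported):
    # Sketch: return (vector built, piece used) for constructing a
    # vector of NUNITS lanes x ELEM_BITS bits from PIECE_ELEMS-lane
    # pieces, preferring sub-vector pieces when the target has them.
    piece_bits = elem_bits * piece_elems
    if half_vector_supported:
        # e.g. V16QI assembled from V4QI sub-vectors
        return ((nunits, elem_bits), ("vector", piece_elems, elem_bits))
    # e.g. V4SI assembled from SImode scalars: same total bits,
    # fewer but wider lanes
    return ((nunits // piece_elems, piece_bits), ("scalar", piece_bits))

print(pick_construction(16, 8, 4, True))   # V16QI from V4QI pieces
print(pick_construction(16, 8, 4, False))  # V4SI from 32-bit scalars
```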


Re: [PATCH] Bump LTO bytecode version.

2020-03-18 Thread Martin Liška

On 3/18/20 10:34 AM, Richard Biener wrote:

Yes, we don't really bump every time we change something.


Fine, then please forget about the patch.

Martin


[PATCH] Fix PR90332 by extending half size vector mode

2020-03-18 Thread Kewen.Lin via Gcc-patches
Hi,

As PR90332 shows, the current elimination of scalar epilogue peeling
for gaps requires a vec_init optab taking two half-size vector modes.
On Power, we don't support vector modes like V8QI, so we can't provide
an optab like vec_initv16qiv8qi.  But we want to leverage an existing
scalar mode like DI to initialize the desired vector mode.  This patch
extends the existing support for Power; as evaluated on Power9, we see
an expected 1.9% speed-up on SPEC2017 525.x264_r.

Bootstrapped/regtested on powerpc64le-linux-gnu (LE) P8 and P9.

Is it ok for trunk?  

BR,
Kewen
---

gcc/ChangeLog

2020-MM-DD  Kewen Lin  

PR tree-optimization/90332
* gcc/tree-vectorizer.h (struct _stmt_vec_info): Add half_mode field.
(DR_GROUP_HALF_MODE): New macro.
* gcc/tree-vect-stmts.c (get_half_mode_for_vector): New function.
(get_group_load_store_type): Call get_half_mode_for_vector to query 
target
whether support half size mode and update DR_GROUP_HALF_MODE if yes.
(vectorizable_load): Build appropriate vector type based on
DR_GROUP_HALF_MODE.
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 2ca8e494680..24ec0d3759d 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -2220,6 +2220,52 @@ vect_get_store_rhs (stmt_vec_info stmt_info)
   gcc_unreachable ();
 }
 
+/* Function GET_HALF_MODE_FOR_VECTOR
+
+   If target supports either of:
+ - One vector mode, whose size is half of given vector size and whose
+   element mode is the same as that of given vector.  Meanwhile, it's
+   available to init given vector with two of them.
+ - One scalar mode, whose size is half of given vector size.  Meanwhile,
+   vector mode with two of them exists and it's available to init it with
+   two of them.
+   return true and save the mode in HMODE.  Otherwise, return false.
+
+   VECTYPE is type of given vector type.  */
+
+static bool
+get_half_mode_for_vector (tree vectype, machine_mode *hmode)
+{
+  gcc_assert (VECTOR_TYPE_P (vectype));
+  machine_mode vec_mode = TYPE_MODE (vectype);
+  scalar_mode elmode = SCALAR_TYPE_MODE (TREE_TYPE (vectype));
+
+  /* Check whether half size vector mode supported.  */
+  gcc_assert (GET_MODE_NUNITS (vec_mode).is_constant ());
+  poly_uint64 n_half_units = exact_div (GET_MODE_NUNITS (vec_mode), 2);
+  if (related_vector_mode (vec_mode, elmode, n_half_units).exists (hmode)
+  && convert_optab_handler (vec_init_optab, vec_mode, *hmode)
+  != CODE_FOR_nothing)
+return true;
+
+  /* Check whether half size scalar mode supported.  */
+  poly_uint64 half_size = exact_div (GET_MODE_BITSIZE (vec_mode), 2);
+  opt_machine_mode smode
+= mode_for_size (half_size, GET_MODE_CLASS (elmode), 0);
+  if (!smode.exists ())
+return false;
+  *hmode = smode.require ();
+
+  machine_mode new_vec_mode;
+  if (related_vector_mode (vec_mode, as_a (*hmode), 2)
+   .exists (_vec_mode)
+  && convert_optab_handler (vec_init_optab, new_vec_mode, *hmode)
+  != CODE_FOR_nothing)
+return true;
+
+  return false;
+}
+
 /* A subroutine of get_load_store_type, with a subset of the same
arguments.  Handle the case where STMT_INFO is part of a grouped load
or store.
@@ -2290,33 +2336,36 @@ get_group_load_store_type (stmt_vec_info stmt_info, 
tree vectype, bool slp,
 than the alignment boundary B.  Every vector access will
 be a multiple of B and so we are guaranteed to access a
 non-gap element in the same B-sized block.  */
+ machine_mode half_mode;
  if (overrun_p
  && gap < (vect_known_alignment_in_bytes (first_dr_info)
/ vect_get_scalar_dr_size (first_dr_info)))
-   overrun_p = false;
-
+   {
+ overrun_p = false;
+ if (known_eq (nunits, (group_size - gap) * 2)
+ && known_eq (nunits, group_size)
+ && get_half_mode_for_vector (vectype, _mode))
+   DR_GROUP_HALF_MODE (first_stmt_info) = half_mode;
+   }
  /* If the gap splits the vector in half and the target
 can do half-vector operations avoid the epilogue peeling
 by simply loading half of the vector only.  Usually
 the construction with an upper zero half will be elided.  */
  dr_alignment_support alignment_support_scheme;
- scalar_mode elmode = SCALAR_TYPE_MODE (TREE_TYPE (vectype));
- machine_mode vmode;
  if (overrun_p
  && !masked_p
  && (((alignment_support_scheme
- = vect_supportable_dr_alignment (first_dr_info, false)))
-  == dr_aligned
+   = vect_supportable_dr_alignment (first_dr_info, false)))
+   == dr_aligned
  || alignment_support_scheme == dr_unaligned_supported)
  && known_eq (nunits, (group_size - gap) * 2)
  && known_eq 

Re: [PATCH] Bump LTO bytecode version.

2020-03-18 Thread Richard Biener via Gcc-patches
On Wed, Mar 18, 2020 at 10:00 AM Martin Liška  wrote:
>
> On 3/18/20 9:56 AM, Richard Biener wrote:
> > On Wed, Mar 18, 2020 at 9:54 AM Martin Liška  wrote:
> >>
> >> Hi.
> >>
> >> I would like to bump LTO bytecode version for the upcoming GCC 10.1 
> >> release.
> >>
> >> Ready for master?
> >
> > Um, is there any recent change warranting it?
>
> The API extension reshuffles lto_section_type enum values.
>
> >  The version is already different
> > from GCC 9s and I'd rather wait until we're closer to the actual release?  
> > Note
> > the LTO major doesn't match the GCC major ...
>
> But yes, the last change happened in:
>
> commit 86c23d9314c4081c13ebf629fd3393de4e316bf6
> Author: Jakub Jelinek 
> Date:   Thu May 16 11:30:41 2019 +0200
>
>  * lto-streamer.h (LTO_major_version): Bump to 9.
>
>  From-SVN: r271284
>
> which is right after stage1 opened.
> Is it fine to break LTO bytecode during the development of a new release?

Yes, we don't really bump every time we change something.

Richard.

> Martin
>
> >
> > Richard.
> >
> >> Martin
>



Re: [stage1][PATCH] Change semantics of -frecord-gcc-switches and add -frecord-gcc-switches-format.

2020-03-18 Thread Martin Liška

On 3/17/20 7:43 PM, Egeyar Bagcioglu wrote:

Hi Martin,

I like the patch. It definitely serves our purposes at Oracle and provides 
another way to do what my previous patches did as well.

1) It keeps the backwards compatibility regarding -frecord-gcc-switches; 
therefore, removes my related doubts about your previous patch.

2) It still makes use of -frecord-gcc-switches. The new option is only to 
control the format. This addresses some previous objections to having a new 
option doing something similar. Now the new option controls the behaviour of 
the existing one and that behaviour can be further extended.

3) It uses an environment variable as Jakub suggested.

The patch looks good and I confirm that it works for our purposes.


Hello.

Thank you for the support.



Having said that, I have to ask for recognition in this patch for my and my 
company's contributions. Can you please keep my name and my work email in the 
changelog and in the commit message?


Sure, sorry I forgot.

Martin



Thanks
Egeyar



On 3/17/20 2:53 PM, Martin Liška wrote:

Hi.

I'm sending enhanced patch that makes the following changes:
- a new option -frecord-gcc-switches-format is added; the option
  selects format (processed, driver) for all options that record
  GCC command line
- Dwarf gen_produce_string is now used in -fverbose-asm
- The .s file is affected in the following way:

BEFORE:

# GNU C17 (SUSE Linux) version 9.2.1 20200128 [revision 
83f65674e78d97d27537361de1a9d74067ff228d] (x86_64-suse-linux)
#    compiled by GNU C version 9.2.1 20200128 [revision 
83f65674e78d97d27537361de1a9d74067ff228d], GMP version 6.2.0, MPFR version 
4.0.2, MPC version 1.1.0, isl version isl-0.22.1-GMP

# GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
# options passed:  -fpreprocessed test.i -march=znver1 -mmmx -mno-3dnow
# -msse -msse2 -msse3 -mssse3 -msse4a -mcx16 -msahf -mmovbe -maes -msha
# -mpclmul -mpopcnt -mabm -mno-lwp -mfma -mno-fma4 -mno-xop -mbmi -mno-sgx
# -mbmi2 -mno-pconfig -mno-wbnoinvd -mno-tbm -mavx -mavx2 -msse4.2 -msse4.1
# -mlzcnt -mno-rtm -mno-hle -mrdrnd -mf16c -mfsgsbase -mrdseed -mprfchw
# -madx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd
# -mno-avx512pf -mno-prefetchwt1 -mclflushopt -mxsavec -mxsaves
# -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma -mno-avx512vbmi
# -mno-avx5124fmaps -mno-avx5124vnniw -mno-clwb -mmwaitx -mclzero -mno-pku
# -mno-rdpid -mno-gfni -mno-shstk -mno-avx512vbmi2 -mno-avx512vnni
# -mno-vaes -mno-vpclmulqdq -mno-avx512bitalg -mno-movdiri -mno-movdir64b
# -mno-waitpkg -mno-cldemote -mno-ptwrite --param l1-cache-size=32
# --param l1-cache-line-size=64 --param l2-cache-size=512 -mtune=znver1
# -grecord-gcc-switches -g -fverbose-asm -frecord-gcc-switches
# options enabled:  -faggressive-loop-optimizations -fassume-phsa
# -fasynchronous-unwind-tables -fauto-inc-dec -fcommon
# -fdelete-null-pointer-checks -fdwarf2-cfi-asm -fearly-inlining
# -feliminate-unused-debug-types -ffp-int-builtin-inexact -ffunction-cse
# -fgcse-lm -fgnu-runtime -fgnu-unique -fident -finline-atomics
# -fipa-stack-alignment -fira-hoist-pressure -fira-share-save-slots
# -fira-share-spill-slots -fivopts -fkeep-static-consts
# -fleading-underscore -flifetime-dse -flto-odr-type-merging -fmath-errno
# -fmerge-debug-strings -fpeephole -fplt -fprefetch-loop-arrays
# -frecord-gcc-switches -freg-struct-return -fsched-critical-path-heuristic
# -fsched-dep-count-heuristic -fsched-group-heuristic -fsched-interblock
# -fsched-last-insn-heuristic -fsched-rank-heuristic -fsched-spec
# -fsched-spec-insn-heuristic -fsched-stalled-insns-dep -fschedule-fusion
# -fsemantic-interposition -fshow-column -fshrink-wrap-separate
# -fsigned-zeros -fsplit-ivs-in-unroller -fssa-backprop -fstdarg-opt
# -fstrict-volatile-bitfields -fsync-libcalls -ftrapping-math -ftree-cselim
# -ftree-forwprop -ftree-loop-if-convert -ftree-loop-im -ftree-loop-ivcanon
# -ftree-loop-optimize -ftree-parallelize-loops= -ftree-phiprop
# -ftree-reassoc -ftree-scev-cprop -funit-at-a-time -funwind-tables
# -fverbose-asm -fzero-initialized-in-bss -m128bit-long-double -m64 -m80387
# -mabm -madx -maes -malign-stringops -mavx -mavx2
# -mavx256-split-unaligned-store -mbmi -mbmi2 -mclflushopt -mclzero -mcx16
# -mf16c -mfancy-math-387 -mfma -mfp-ret-in-387 -mfsgsbase -mfxsr -mglibc
# -mieee-fp -mlong-double-80 -mlzcnt -mmmx -mmovbe -mmwaitx -mpclmul
# -mpopcnt -mprfchw -mpush-args -mrdrnd -mrdseed -mred-zone -msahf -msha
# -msse -msse2 -msse3 -msse4 -msse4.1 -msse4.2 -msse4a -mssse3 -mstv
# -mtls-direct-seg-refs -mvzeroupper -mxsave -mxsavec -mxsaveopt -mxsaves

AFTER:

# GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
# GNU C17 10.0.1 20200317 (experimental) -march=znver1 -mmmx -mno-3dnow -msse 
-msse2 -msse3 -mssse3 -msse4a -mcx16 -msahf -mmovbe -maes -msha -mpclmul 
-mpopcnt -mabm -mno-lwp -mfma -mno-fma4 -mno-xop -mbmi -mno-sgx -mbmi2 
-mno-pconfig -mno-wbnoinvd -mno-tbm -mavx -mavx2 
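Independent of the format question discussed in this thread, the recorded switches land in a dedicated .GCC.command.line note section of the object file, which can be read back with binutils. A minimal sketch, using only the long-standing -frecord-gcc-switches option (the -frecord-gcc-switches-format= spelling is merely what this patch proposes and may not exist in your compiler):

```shell
# Record the compiler command line into the object file, then dump it.
echo 'int main (void) { return 0; }' > t.c
gcc -O2 -frecord-gcc-switches -c t.c -o t.o
readelf -p .GCC.command.line t.o
```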

RE: [PATCH][Arm][1/3] Support for Arm Custom Datapath Extension (CDE): enable the feature

2020-03-18 Thread Kyrylo Tkachov
Hi Dennis,

> -Original Message-
> From: Dennis Zhang 
> Sent: 12 March 2020 12:06
> To: gcc-patches@gcc.gnu.org
> Cc: nd ; Richard Earnshaw ; 
> Ramana Radhakrishnan ; Kyrylo Tkachov 
> 
> Subject: [PATCH][Arm][1/3] Support for Arm Custom Datapath Extension
> (CDE): enable the feature
> 
> Hi all,
> 
> This patch is part of a series that adds support for the Armv8-M
> Custom Datapath Extension (CDE).
> It defines the command-line options cdecp0-cdecp7 to enable CDE on the
> corresponding coprocessors 0-7.
> It also adds new check-effective-target procedures for the CDE feature.
> 
> The ISA has been announced at
> https://developer.arm.com/architectures/instruction-sets/custom-instructions
> 
> Regtested and bootstrapped.
> 
> Is it OK to commit please?

Can you please rebase this patch on top of the recent MVE commits?
It currently doesn't apply cleanly to trunk.
Thanks,
Kyrill

> 
> Cheers
> Dennis
> 
> gcc/ChangeLog:
> 
> 2020-03-11  Dennis Zhang  
> 
> * config.gcc: Add arm_cde.h.
> * config/arm/arm-c.c (arm_cpu_builtins): Define or undefine 
> __ARM_FEATURE_CDE and __ARM_FEATURE_CDE_COPROC.
> * config/arm/arm-cpus.in (cdecp0, cdecp1, ..., cdecp7): New options.
> * config/arm/arm.c (arm_option_reconfigure_globals): Configure 
> arm_arch_cde and arm_arch_cde_coproc to store the feature bits.
> * config/arm/arm.h (TARGET_CDE): New macro.
> * config/arm/arm_cde.h: New file.
> * doc/invoke.texi: Document cdecp[0-7] options.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-03-11  Dennis Zhang  
> 
> * gcc.target/arm/pragma_cde.c: New test.
> * lib/target-supports.exp (arm_v8m_main_cde): New check effective.
> (arm_v8m_main_cde_fp, arm_v8_1m_main_cde_mve): Likewise.


Re: [PATCH] Bump LTO bytecode version.

2020-03-18 Thread Martin Liška

On 3/18/20 9:56 AM, Richard Biener wrote:

On Wed, Mar 18, 2020 at 9:54 AM Martin Liška  wrote:


Hi.

I would like to bump the LTO bytecode version for the upcoming GCC 10.1 release.

Ready for master?


Um, is there any recent change warranting it?


The API extension reshuffles lto_section_type enum values.


 The version is already different
from GCC 9's, and I'd rather wait until we're closer to the actual release?  Note
the LTO major doesn't match the GCC major ...


But yes, the last change happened in:

commit 86c23d9314c4081c13ebf629fd3393de4e316bf6
Author: Jakub Jelinek 
Date:   Thu May 16 11:30:41 2019 +0200

* lto-streamer.h (LTO_major_version): Bump to 9.

From-SVN: r271284


which is right after stage1 opened.
Is it fine to break the LTO bytecode format during development of a new release?

Martin



Richard.


Martin




Re: [PATCH] Bump LTO bytecode version.

2020-03-18 Thread Richard Biener via Gcc-patches
On Wed, Mar 18, 2020 at 9:54 AM Martin Liška  wrote:
>
> Hi.
>
> I would like to bump the LTO bytecode version for the upcoming GCC 10.1 release.
>
> Ready for master?

Um, is there any recent change warranting it?  The version is already different
from GCC 9's, and I'd rather wait until we're closer to the actual release?  Note
the LTO major doesn't match the GCC major ...

Richard.

> Martin


[PATCH] Bump LTO bytecode version.

2020-03-18 Thread Martin Liška

Hi.

I would like to bump the LTO bytecode version for the upcoming GCC 10.1 release.

Ready for master?
Martin
>From b48f4187e11da79d1b0a932b1202f882defc Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Wed, 18 Mar 2020 09:40:24 +0100
Subject: [PATCH 3/3] Bump LTO bytecode version.

gcc/ChangeLog:

2020-03-18  Martin Liska  

	* lto-streamer.h (LTO_major_version): Bump to 10.
---
 gcc/lto-streamer.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index 76aa6fe34b8..9c1c7539462 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -120,7 +120,7 @@ along with GCC; see the file COPYING3.  If not see
  String are represented in the table as pairs, a length in ULEB128
  form followed by the data for the string.  */
 
-#define LTO_major_version 9
+#define LTO_major_version 10
 #define LTO_minor_version 0
 
 typedef unsigned char	lto_decl_flags_t;
-- 
2.25.1



Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-18 Thread Martin Liška

On 3/18/20 12:27 AM, Jan Hubicka wrote:

Hi.

There's an updated version of the patch.
Changes from the previous version:
- comment added to ld_plugin_symbol
- new section renamed to ext_symtab
- assert added for loop iterations in produce_symtab and 
produce_symtab_extension

Hi,
I hope this is the last version of the patch.


Hello.

Yes.



2020-03-12  Martin Liska  

* lto-section-in.c: Add extension_symtab.

ext_symtab  :)


Fixed.


diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c
index c17dd69dbdd..78b015be696 100644
--- a/gcc/lto-section-in.c
+++ b/gcc/lto-section-in.c
@@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] =
"mode_table",
"hsa",
"lto",
-  "ipa_sra"
+  "ipa_sra",
+  "ext_symtab"

I would move ext_symtab next to symtab so the sections remain at least a
bit reasonably ordered.


Ok, I'll adjust it and I will send a separate patch where we bump 
LTO_major_version.

  
+/* Write extension information for symbols (symbol type, section flags).  */

+
+static void
+write_symbol_extension_info (tree t)
+{
+  unsigned char c;

Do we still use vertical whitespace after decls per GNU coding style?


Dunno. This seems to me like a nit.


diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index 25bf6c468f7..4f82b439360 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -236,6 +236,7 @@ enum lto_section_type
LTO_section_ipa_hsa,
LTO_section_lto,
LTO_section_ipa_sra,
+  LTO_section_symtab_extension,

I guess symtab_ext to match the actual section name?


No. See e.g. LTO_section_jump_functions - "jmpfuncs". We want to have more
descriptive enum names.


LTO_N_SECTION_TYPES /* Must be last.  */
  };
  
diff --git a/include/lto-symtab.h b/include/lto-symtab.h

index 0ce0de10121..47f0ff27df8 100644
--- a/include/lto-symtab.h
+++ b/include/lto-symtab.h
@@ -38,4 +38,16 @@ enum gcc_plugin_symbol_visibility
  GCCPV_HIDDEN
};
  
+enum gcc_plugin_symbol_type

+{
+  GCCST_UNKNOWN,
+  GCCST_FUNCTION,
+  GCCST_VARIABLE,
+};
+
+enum gcc_plugin_symbol_section_flags
+{
+  GCCSSS_BSS = 1
+};


Probably comments here?


No. These are just shadow copies of the enum types from plugin-api.h,
which are documented there.


+
  #endif /* GCC_LTO_SYMTAB_H  */
+/* Parse an entry of the extended IL symbol table.  The data to be parsed
+   is pointed to by P and the result is written to ENTRY.  Returns the
+   address of the next entry.  */
+
+static char *
+parse_table_entry_extension (char *p, struct ld_plugin_symbol *entry)
+{
+  unsigned char t;
+  enum ld_plugin_symbol_type symbol_types[] =
+{
+  LDST_UNKNOWN,
+  LDST_FUNCTION,
+  LDST_VARIABLE,
+};
+
+  t = *p;
+  check (t <= 3, LDPL_FATAL, "invalid symbol type found");
+  entry->symbol_type = symbol_types[t];
+  p++;
+  entry->section_flags = *p;
+  p++;
+
+  return p;
+}


I think we have a chance to make some plan for future extensions without
introducing too many additional sections.

Currently there are 2 bytes per entry, while only 3 bits of them are
actively used.  If we invent the next flag to pass we can use the unused
bits, but we need a way to indicate to the plugin that the bit is defined.
This could be done by a simple version byte at the beginning of the
ext_symtab section, which will be 0 now; once we define extra bits we
bump it up to 1.


I like the suggested change, it can help us in the future.



It is not that important given that even an empty file results in a 2k LTO
object file, but I think it would be nicer in the long run.

+  /* This is for compatibility with older ABIs.  */

Perhaps say here that this ABI defined only "int def;"


Good point.



The patch look good to me. Thanks for the work!


Thanks. I'm sending an updated patch that I've just tested with lto.exp and
with both binutils master and HJ's branch that uses the new API.

Martin


Honza

+#ifdef __BIG_ENDIAN__
+  char unused;
+  char section_flags;
+  char symbol_type;
+  char def;
+#else
+  char def;
+  char symbol_type;
+  char section_flags;
+  char unused;
+#endif
int visibility;
uint64_t size;
char *comdat_key;
@@ -123,6 +134,20 @@ enum ld_plugin_symbol_visibility
LDPV_HIDDEN
  };
  
+/* The type of the symbol.  */

+
+enum ld_plugin_symbol_type
+{
+  LDST_UNKNOWN,
+  LDST_FUNCTION,
+  LDST_VARIABLE,
+};
+
+enum ld_plugin_symbol_section_flags
+{
+  LDSSS_BSS = 1
+};
+
  /* How a symbol is resolved.  */
  
  enum ld_plugin_symbol_resolution

@@ -431,7 +456,9 @@ enum ld_plugin_tag
LDPT_GET_INPUT_SECTION_ALIGNMENT = 29,
LDPT_GET_INPUT_SECTION_SIZE = 30,
LDPT_REGISTER_NEW_INPUT_HOOK = 31,
-  LDPT_GET_WRAP_SYMBOLS = 32
+  LDPT_GET_WRAP_SYMBOLS = 32,
+  LDPT_ADD_SYMBOLS_V2 = 33,
+  LDPT_GET_SYMBOLS_V4 = 34,
  };
  
  /* The plugin transfer vector.  */

--
2.25.1





>From 492e7dc5b5f792b2e9f92b5fc77e47fe9ee98da7 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Fri, 6 Mar 2020 18:09:35 +0100
Subject: [PATCH 2/3] API extension for binutils (type of symbols).


Re: [stage1][PATCH] optgen: make more sanity checks for enums.

2020-03-18 Thread Martin Liška

On 3/17/20 11:41 PM, Martin Sebor wrote:

The script reports errors by emitting them as #error directives into
standard output (so they cause the build to fail). Should this new
routine do the same thing?  (/dev/stderr is also not available on all
flavors of UNIX but I'm not sure how much that matters here.)


Good point, Martin. Yes, #error emission works fine here:

./options.h:1:2: error: #error Empty option argument 'Enum' during parsing of: 
Enum (diagnostic_prefixing_rule) String(once) 
Value(DIAGNOSTICS_SHOW_PREFIX_ONCE)
1 | #error Empty option argument 'Enum' during parsing of: Enum 
(diagnostic_prefixing_rule) String(once) Value(DIAGNOSTICS_SHOW_PREFIX_ONCE)
  |  ^

There's an updated version of the patch.
Martin
>From e6ceafae43ab8735f0ad3d18a0cfc7f3dfbfd3ec Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Fri, 21 Feb 2020 10:40:57 +0100
Subject: [PATCH] optgen: make more sanity checks for enums.

gcc/ChangeLog:

2020-03-17  Martin Liska  

	* opt-functions.awk (opt_args_non_empty): New function.
	* opt-read.awk: Use the function for various option arguments.
---
 gcc/opt-functions.awk | 10 ++
 gcc/opt-read.awk  | 10 +-
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/gcc/opt-functions.awk b/gcc/opt-functions.awk
index 2f0442dc563..b4952b89315 100644
--- a/gcc/opt-functions.awk
+++ b/gcc/opt-functions.awk
@@ -72,6 +72,16 @@ function opt_args(name, flags)
 	return flags
 }
 
+# If FLAGS contains a "NAME(...argument...)" flag, return the value
+# of the argument.  Print an error message otherwise.
+function opt_args_non_empty(name, flags, description)
+{
+	args = opt_args(name, flags)
+	if (args == "")
+		print "#error Empty option argument '" name "' during parsing of: " flags
+	return args
+}
+
 # Return the Nth comma-separated element of S.  Return the empty string
 # if S does not contain N elements.
 function nth_arg(n, s)
diff --git a/gcc/opt-read.awk b/gcc/opt-read.awk
index a2e16f29aff..9bb9dfcf6ca 100644
--- a/gcc/opt-read.awk
+++ b/gcc/opt-read.awk
@@ -81,8 +81,8 @@ BEGIN {
 		}
 		else if ($1 == "Enum") {
 			props = $2
-			name = opt_args("Name", props)
-			type = opt_args("Type", props)
+			name = opt_args_non_empty("Name", props)
+			type = opt_args_non_empty("Type", props)
 			unknown_error = opt_args("UnknownError", props)
 			enum_names[n_enums] = name
 			enum_type[name] = type
@@ -93,9 +93,9 @@ BEGIN {
 		}
 		else if ($1 == "EnumValue")  {
 			props = $2
-			enum_name = opt_args("Enum", props)
-			string = opt_args("String", props)
-			value = opt_args("Value", props)
+			enum_name = opt_args_non_empty("Enum", props)
+			string = opt_args_non_empty("String", props)
+			value = opt_args_non_empty("Value", props)
 			val_flags = "0"
 			val_flags = val_flags \
 			  test_flag("Canonical", props, "| CL_ENUM_CANONICAL") \
-- 
2.25.1