Re: [PATCH] options: Make --help= to emit values post-overrided

2020-08-13 Thread Kewen.Lin via Gcc-patches
Hi Richard,

Thanks for the comments!

on 2020/8/13 12:10 AM, Richard Sandiford wrote:
> "Kewen.Lin"  writes:
>> Hi Segher,
>>
>> on 2020/8/7 10:42, Segher Boessenkool wrote:
>>> Hi!
>>>
>>> On Fri, Aug 07, 2020 at 10:44:10AM +0800, Kewen.Lin wrote:
> I think this makes a lot of sense.
>
>> btw, not sure whether it's a good idea to move target_option_override_hook
>> call into print_specific_help and use one function local static
>> variable to control it's called once for all kinds of help dumping
>> (possible combination), then can remove the calls in function
>> common_handle_option.
>
> I cannot easily imagine what that will look like...  it could easily be
> worse than what you have here (callbacks aren't so nice, but there are
> worse things).
>

 I attached opts_alt2.diff to be more specific for this, both alt1 and alt2
 follow the existing callback scheme, alt2 aims to avoid possible multiple
 times target_option_override_hook calls when we have several --help= or
 similar, but I guess alt1 is also fine since the hook should be allowed to
 be called more than once.
> 
> Yeah.  I guess ideally (and independently of this patch) we'd have a
> flag_checking assert that override_options is idempotent, but that
> might be tricky to implement.
> 
> It looks like there's a subtle (pre-existing) difference in what --help
> and --help= do.  --help already calls target_option_override_hook,
> but does it at the point that the option occurs.  --help= instead
> queues the help until we've finished processing other arguments,
> and would therefore take later options into account.
> 

Yes, it is.

> I don't know that one is obviously better than the other though.
> 
>> […]
>> *opts_alt1.diff*
>>
>> gcc/ChangeLog:
>>
>>  * opts-global.c (decode_options): Adjust call to print_help.
>>  * opts.c (print_help): Add one function point argument
>>  target_option_override_hook and call it before print_specific_help.
>>  * opts.h (print_help): Add one more argument to function declare.
> 
> I think personally I'd prefer an option (3): call
> target_option_override_hook directly in decode_options,
> if help_option_arguments is nonempty.  Like you say,
> decode_options appears to be the only caller of print_help.
> 

Good idea!  The related patch is attached.  Unlike opts_alt{1,2}, it
can still call target_option_override_hook even when print_specific_help
ends up not being called, e.g. when lang_mask is CL_DRIVER or
include_flags is empty.  But I think that's fine.
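
To make the ordering concrete, here is a minimal standalone sketch of the
option (3) flow.  It is not GCC code: struct options_model and the functions
target_override, queue_help and flush_help are hypothetical stand-ins made up
for this illustration.  Help requests are queued while options are parsed, and
the override hook runs once, and only if something was queued, before any help
value is reported.

```c
#include <stddef.h>

#define MAX_HELP_ARGS 8

/* Hypothetical model of the option state; not GCC's struct gcc_options.  */
struct options_model
{
  int unroll_limit;                     /* 0 means "not set by the user" */
  int override_calls;                   /* how often the hook ran */
  const char *help_args[MAX_HELP_ARGS]; /* queued --help= arguments */
  size_t n_help_args;
  int reported[MAX_HELP_ARGS];          /* value each help request would show */
};

/* Stand-in for the target override hook: fills in a default.  */
void
target_override (struct options_model *o)
{
  if (o->unroll_limit == 0)
    o->unroll_limit = 4;
  o->override_calls++;
}

/* Called while parsing: only remember the request.  */
void
queue_help (struct options_model *o, const char *arg)
{
  if (o->n_help_args < MAX_HELP_ARGS)
    o->help_args[o->n_help_args++] = arg;
}

/* Called after all options are parsed.  */
void
flush_help (struct options_model *o)
{
  if (o->n_help_args == 0)
    return;                     /* no help requested: hook is not called */

  target_override (o);          /* once, before any help output */

  for (size_t i = 0; i < o->n_help_args; i++)
    o->reported[i] = o->unroll_limit;   /* post-override value */
}
```

With two queued requests the hook still runs a single time, which is the
behaviour opts_alt2 was aiming for, without needing a function-local static.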

Also bootstrapped/regtested on powerpc64le-linux-gnu P8.

BR,
Kewen
-
gcc/ChangeLog:

* opts-global.c (decode_options): Call target_option_override_hook
before it prints for --help=*.


diff --git a/gcc/opts-global.c b/gcc/opts-global.c
index b1a8429dc3c..fc332871cb8 100644
--- a/gcc/opts-global.c
+++ b/gcc/opts-global.c
@@ -327,8 +327,14 @@ decode_options (struct gcc_options *opts, struct gcc_options *opts_set,
   unsigned i;
   const char *arg;
 
-  FOR_EACH_VEC_ELT (help_option_arguments, i, arg)
-    print_help (opts, lang_mask, arg);
+  if (!help_option_arguments.is_empty ())
+    {
+      /* Consider post-overrided values for --help=*.  */
+      target_option_override_hook ();
+
+      FOR_EACH_VEC_ELT (help_option_arguments, i, arg)
+        print_help (opts, lang_mask, arg);
+    }
 }
 
 /* Hold command-line options associated with stack limitation.  */


reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM targets

2020-08-13 Thread Hans-Peter Nilsson via Gcc-patches
Originally I thought to bootstrap this patch on MIPS and SPARC
since they're both delayed-branch-slot targets but I
reconsidered, as neither is a TARGET_FLAGS_REGNUM target.  It
seems only visium and CRIS have this feature set, and I see no
trace of visium in either newlib or the simulator next to
glibc.  So, I just tested cris-elf.

This handles TARGET_FLAGS_REGNUM clobbering insns as delay-slot
fillers using a method similar to that in commit 33c2207d3fda,
where care was taken for fill_simple_delay_slots to allow such
insns when scanning for delay-slot fillers *backwards* (before
the insn).

A TARGET_FLAGS_REGNUM target is typically a former cc0 target.
For cc0 targets, insns don't mention clobbering cc0, so the
clobbers are mentioned in the "resources" only as a special
entity and only for compare-insns and branches, where the cc0
value matters.

In contrast, with TARGET_FLAGS_REGNUM, most insns clobber it and
the register liveness detection in reorg.c / resource.c treats
that as a blocker (for other insns mentioning it, i.e. most)
when looking for delay-slot-filling candidates.  This means that
when comparing code and performance for a delay-slot cc0 target
before and after the de-cc0 conversion, the inability to fill a
delay slot after conversion manifests as a regression.  This was
one such case, for CRIS, with random_bitstring in
gcc.c-torture/execute/arith-rand-ll.c as well as the target
libgcc division function.

After this, all known performance regressions compared to cc0
are fixed.
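
As a rough standalone model of the extra acceptance test that the patch
below adds to fill_slots_from_thread: the sketch uses plain booleans in
place of GCC's resource sets and REG notes, and struct trial_info and
flags_filter_allows are hypothetical names invented for this illustration.

```c
#include <stdbool.h>

/* Hypothetical summary of what reorg would know about a candidate insn.  */
struct trial_info
{
  bool sets_flags;          /* the insn writes the flags register */
  bool flags_result_unused; /* REG_UNUSED note for flags, i.e. a clobber */
};

/* Returns true if the flags-specific filter lets TRIAL into the delay
   slot.  FLAGS_SET_BY_SKIPPED models "the insns passed over so far have
   set the flags register" (the bit in "set").  */
bool
flags_filter_allows (bool filter_flags, const struct trial_info *trial,
                     bool flags_set_by_skipped)
{
  return (!filter_flags
          || !trial->sets_flags
          || trial->flags_result_unused
          || !flags_set_by_skipped);
}
```

The only combination this rejects is a candidate whose flags result is live
while an insn that was passed over has also set the flags register.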

Ok to commit?

gcc:
PR target/93372
* reorg.c (fill_slots_from_thread): Allow trial insns that clobber
TARGET_FLAGS_REGNUM as delay-slot fillers.

gcc/testsuite:
PR target/93372
* gcc.target/cris/pr93372-47.c: New test.
---
 gcc/reorg.c| 37 +-
 gcc/testsuite/gcc.target/cris/pr93372-47.c | 49 ++
 2 files changed, 85 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/cris/pr93372-47.c

diff --git a/gcc/reorg.c b/gcc/reorg.c
index dfd7494bf..83161caa0 100644
--- a/gcc/reorg.c
+++ b/gcc/reorg.c
@@ -2411,6 +2411,21 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx condition,
   CLEAR_RESOURCE (&needed);
   CLEAR_RESOURCE (&set);
 
+  /* Handle the flags register specially, to be able to accept a
+     candidate that clobbers it.  See also fill_simple_delay_slots.  */
+  bool filter_flags
+    = (slots_to_fill == 1
+       && targetm.flags_regnum != INVALID_REGNUM
+       && find_regno_note (insn, REG_DEAD, targetm.flags_regnum));
+  struct resources fset;
+  struct resources flags_res;
+  if (filter_flags)
+    {
+      CLEAR_RESOURCE (&fset);
+      CLEAR_RESOURCE (&flags_res);
+      SET_HARD_REG_BIT (flags_res.regs, targetm.flags_regnum);
+    }
+
   /* If we do not own this thread, we must stop as soon as we find
  something that we can't put in a delay slot, since all we can do
  is branch into THREAD at a later point.  Therefore, labels stop
@@ -2439,8 +2454,18 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx condition,
       /* If TRIAL conflicts with the insns ahead of it, we lose.  Also,
          don't separate or copy insns that set and use CC0.  */
       if (! insn_references_resource_p (trial, &set, true)
-          && ! insn_sets_resource_p (trial, &set, true)
+          && ! insn_sets_resource_p (trial, filter_flags ? &fset : &set, true)
           && ! insn_sets_resource_p (trial, &needed, true)
+          /* If we're handling sets to the flags register specially, we
+             only allow an insn into a delay-slot, if it either:
+             - doesn't set the flags register,
+             - the "set" of the flags register isn't used (clobbered),
+             - insns between the delay-slot insn and the trial-insn
+             as accounted in "set", have not affected the flags register.  */
+          && (! filter_flags
+              || ! insn_sets_resource_p (trial, &flags_res, true)
+              || find_regno_note (trial, REG_UNUSED, targetm.flags_regnum)
+              || ! TEST_HARD_REG_BIT (set.regs, targetm.flags_regnum))
           && (!HAVE_cc0 || (! (reg_mentioned_p (cc0_rtx, pat)
                               && (! own_thread || ! sets_cc0_p (pat)))))
           && ! can_throw_internal (trial))
@@ -2618,6 +2643,16 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx condition,
           lose = 1;
           mark_set_resources (trial, &set, 0, MARK_SRC_DEST_CALL);
           mark_referenced_resources (trial, &needed, true);
+          if (filter_flags)
+            {
+              mark_set_resources (trial, &fset, 0, MARK_SRC_DEST_CALL);
+
+              /* Groups of flags-register setters with users should not
+                 affect opportunities to move flags-register-setting insns
+                 (clobbers) into the delay-slot.  */
+              CLEAR_HARD_REG_BIT (needed.regs, targetm.flags_regnum);
+              CLEAR_HARD_REG_BIT (fset.regs, targetm.flags_regnum);
+            }
 
   /* Ensure we don't put insns between the setting of cc and the 

Re: [PATCH] [PR target/96350]Force ENDBR immediate into memory to avoid fake ENDBR opcode.

2020-08-13 Thread Hongtao Liu via Gcc-patches
On Tue, Aug 11, 2020 at 5:56 PM Uros Bizjak  wrote:
>
> On Tue, Aug 11, 2020 at 11:36 AM Hongtao Liu  wrote:
> >
> > On Tue, Aug 11, 2020 at 4:38 PM Uros Bizjak  wrote:
> > >
> > > On Tue, Aug 11, 2020 at 5:30 AM Hongtao Liu  wrote:
> > > >
> > > > Hi:
> > > >   The issue is described in the bugzilla.
> > > >   Bootstrap is ok, regression test for i386/x86-64 backend is ok.
> > > >   Ok for trunk?
> > > >
> > > > ChangeLog
> > > > gcc/
> > > > PR target/96350
> > > > * config/i386/i386.c (ix86_legitimate_constant_p): Return
> > > > false for ENDBR immediate.
> > > > (ix86_legitimate_address_p): Ditto.
> > > > * config/i386/predicates.md
> > > > (x86_64_immediate_operand): Exclude ENDBR immediate.
> > > > (x86_64_zext_immediate_operand): Ditto.
> > > > (x86_64_dwzext_immediate_operand): Ditto.
> > > > (ix86_not_endbr_immediate_operand): New predicate.
> > > >
> > > > gcc/testsuite
> > > > * gcc.target/i386/endbr_immediate.c: New test.
> > >
> > > +;; Return true if VALUE isn't an ENDBR opcode in immediate field.
> > > +(define_predicate "ix86_not_endbr_immediate_operand"
> > > +  (match_test "1")
> > >
> > > Please reverse the above logic to introduce
> > > ix86_endbr_immediate_operand, that returns true for unwanted
> > > immediate. Something like:
> > >
> > > (define_predicate "ix86_endbr_immediate_operand"
> > >   (match_code "const_int")
> > > ...
> > >
> > > And you will be able to use it like:
> > >
> > > > if (ix86_endbr_immediate_operand (x, VOIDmode))
> > >   return false;
> > >
> >
> > Changed.
>
> No, it is not.
>
> +  if ((flag_cf_protection & CF_BRANCH)
> +  && CONST_INT_P (op))
>
> You don't need to check for const ints here.
>
> And please rewrite the body of the function to something like (untested):
>
> {
>   unsigned HOST_WIDE_INT val = TARGET_64BIT ? 0xfa1e0ff3 : 0xfb1e0ff3;
>
>   if (x == val)
>     return 1;
>
>   if (TARGET_64BIT)
>     for (; x >= val; x >>= 8)
>       if (x == val)
>         return 1;
>
>   return 0;
> }
>
> so it will at least *look* like some thoughts have been spent on this.
> I don't plan to review the code where it is obvious from the first
> look that it was thrown together in a hurry. Please get some internal
> company signoff first. Ping me in a week for a review.
>

Sorry for the hurry, I know your time is precious.

> Uros.
> >
> > >/* Otherwise we handle everything else in the move patterns.  */
> > > -  return true;
> > > +  return ix86_not_endbr_immediate_operand (x, VOIDmode);
> > >  }
> > >
> > > Please handle this in CASE_CONST_SCALAR_INT: part.
> > >
> > > +  if (disp && !ix86_not_endbr_immediate_operand (disp, VOIDmode))
> > > +return false;
> > >
> > > And this in:
> > >
> > >   /* Validate displacement.  */
> > >   if (disp)
> > > {
> > >
> >
> > Changed.
>
> A better place for these new special cases is at the beginning of the
> part I referred, not at the end.
>

Yes.

> Uros.

Update patch.
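
For reference, the byte-wise check suggested in the review above can be
tried outside GCC with a small standalone sketch.  This is only an
illustration under assumptions: imm_has_endbr and its target_64bit
parameter are hypothetical names, plain uint64_t stands in for
HOST_WIDE_INT, and the two constants are the ones quoted in the review
(as far as I know they are the little-endian immediates for the endbr64
and endbr32 byte sequences).

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if IMM would put the ENDBR byte pattern into an
   immediate field, following the loop shape from the review.  */
bool
imm_has_endbr (uint64_t imm, bool target_64bit)
{
  const uint64_t val = target_64bit ? 0xfa1e0ff3u : 0xfb1e0ff3u;

  if (imm == val)
    return true;

  /* The encoding is byte based, so for 64-bit immediates also look at
     the value with low bytes shifted away.  */
  if (target_64bit)
    for (; imm >= val; imm >>= 8)
      if (imm == val)
        return true;

  return false;
}
```

In this sketch 0xfa1e0ff300 is flagged, because dropping its low byte leaves
exactly the pattern, while a value such as 0x01fa1e0ff3, with a non-zero byte
above the pattern, is not caught by this particular loop.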

-- 
BR,
Hongtao
From d89dfb93e54dd3a9717fdb4d3f58cccf93b15072 Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Tue, 4 Aug 2020 10:00:13 +0800
Subject: [PATCH] Force ENDBR immediate into memory.

gcc/
	PR target/96350
	* config/i386/i386.c (ix86_legitimate_constant_p): Return
	false for ENDBR immediate.
	(ix86_legitimate_address_p): Ditto.
	* config/i386/predicates.md
	(x86_64_immediate_operand): Exclude ENDBR immediate.
	(x86_64_zext_immediate_operand): Ditto.
	(x86_64_dwzext_immediate_operand): Ditto.
	(ix86_endbr_immediate_operand): New predicate.

gcc/testsuite
	* gcc.target/i386/endbr_immediate.c: New test.
---
 gcc/config/i386/i386.c|   6 +
 gcc/config/i386/predicates.md |  30 +++
 .../gcc.target/i386/endbr_immediate.c | 198 ++
 3 files changed, 234 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/endbr_immediate.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 8ea6a4d7ea7..ea92626e08e 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -10056,6 +10056,9 @@ ix86_legitimate_constant_p (machine_mode mode, rtx x)
   break;
 
 CASE_CONST_SCALAR_INT:
+  if (ix86_endbr_immediate_operand (x, VOIDmode))
+	return false;
+
   switch (mode)
 	{
 	case E_TImode:
@@ -10449,6 +10452,9 @@ ix86_legitimate_address_p (machine_mode, rtx addr, bool strict)
   /* Validate displacement.  */
   if (disp)
 {
+  if (ix86_endbr_immediate_operand (disp, VOIDmode))
+	return false;
+
   if (GET_CODE (disp) == CONST
 	  && GET_CODE (XEXP (disp, 0)) == UNSPEC
 	  && XINT (XEXP (disp, 0), 1) != UNSPEC_MACHOPIC_OFFSET)
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 07e69d555c0..25d63bdb940 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -130,10 +130,35 @@
 (define_predicate "symbol_operand"
   (match_code "symbol_ref"))
 
+;; Return true if VALUE is an ENDBR opcode in immediate field.

[committed][testsuite] Add missing require-effective-target allloca

2020-08-13 Thread Tom de Vries
Hi,

Add missing require-effective-target alloca directives.

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[testsuite] Add missing require-effective-target allloca

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr92088-1.c: Add require-effective-target alloca.
* gcc.dg/torture/pr92088-2.c: Same.
* gcc.dg/torture/pr93124.c: Same.
* gcc.dg/torture/pr94479.c: Same.
* gcc.dg/tree-ssa/builtin-sprintf-warn-22.c: Same.

---
 gcc/testsuite/gcc.dg/torture/pr92088-1.c| 1 +
 gcc/testsuite/gcc.dg/torture/pr92088-2.c| 1 +
 gcc/testsuite/gcc.dg/torture/pr93124.c  | 1 +
 gcc/testsuite/gcc.dg/torture/pr94479.c  | 1 +
 gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-22.c | 3 ++-
 5 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/torture/pr92088-1.c 
b/gcc/testsuite/gcc.dg/torture/pr92088-1.c
index b56f8ad665e..488bdcbcce6 100644
--- a/gcc/testsuite/gcc.dg/torture/pr92088-1.c
+++ b/gcc/testsuite/gcc.dg/torture/pr92088-1.c
@@ -1,4 +1,5 @@
 /* { dg-do run } */
+/* { dg-require-effective-target alloca } */
 
 int __attribute__((noipa))
 g (char *p)
diff --git a/gcc/testsuite/gcc.dg/torture/pr92088-2.c 
b/gcc/testsuite/gcc.dg/torture/pr92088-2.c
index a20a01cd1ce..6c9e5048d28 100644
--- a/gcc/testsuite/gcc.dg/torture/pr92088-2.c
+++ b/gcc/testsuite/gcc.dg/torture/pr92088-2.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target alloca } */
 
 void foo(int n)
 {
diff --git a/gcc/testsuite/gcc.dg/torture/pr93124.c 
b/gcc/testsuite/gcc.dg/torture/pr93124.c
index 16bc8b54f14..0d361d8c7cf 100644
--- a/gcc/testsuite/gcc.dg/torture/pr93124.c
+++ b/gcc/testsuite/gcc.dg/torture/pr93124.c
@@ -1,4 +1,5 @@
 /* { dg-additional-options "-fno-rerun-cse-after-loop 
-fno-guess-branch-probability -fno-tree-fre" } */
+/* { dg-require-effective-target alloca } */
 
 int x;
 
diff --git a/gcc/testsuite/gcc.dg/torture/pr94479.c 
b/gcc/testsuite/gcc.dg/torture/pr94479.c
index 53285bb4f38..3e4058279aa 100644
--- a/gcc/testsuite/gcc.dg/torture/pr94479.c
+++ b/gcc/testsuite/gcc.dg/torture/pr94479.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-stack-check "specific" } */
 /* { dg-additional-options "-fstack-check -w" } */
+/* { dg-require-effective-target alloca } */
 
 int a;
 struct b {
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-22.c 
b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-22.c
index 6fd1bca3c7b..685a4fd8c89 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-22.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-22.c
@@ -1,7 +1,8 @@
 /* PR tree-optimization/91567 - Spurious -Wformat-overflow warnings building
glibc (32-bit only)
{ dg-do compile }
-   { dg-options "-O2 -Wall -ftrack-macro-expansion=0" } */
+   { dg-options "-O2 -Wall -ftrack-macro-expansion=0" }
+   { dg-require-effective-target alloca } */
 
 typedef __SIZE_TYPE__ size_t;
 


Re: [EXTERNAL] Re: [Patch 1/5] rs6000, Add 128-bit sign extension support

2020-08-13 Thread will schmidt via Gcc-patches
On Thu, 2020-08-13 at 17:55 -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Aug 13, 2020 at 05:11:11PM -0500, will schmidt wrote:
> > > > That is probably a level of detail that is not
> > > > really needed in the GCC code comment.  Probably best to just
> > > > change the comment to read something like "ISA 3.0 sign extend
> > > > builtins".
> > > 
> > > Sounds good.
> > 
> > As long as there are no issues defining the builtins for 3.0 here.
> > AFAIK they are not documented in ISA 3.0.  This is a happy accident
> > that these ISA 3.1 builtins can be implemented with existing
> > support.
> 
> There are *no* builtins defined in the ISA!  The insns are just ISA
> 3.0 instructions.
> 

Ok. 

So then maybe just "Sign extend builtins" and leave off the ISA
reference altogether.

:-)

thanks
-WIll

> 
> Segher



Re: [Patch 2/5] rs6000, 128-bit multiply, divide, modulo, shift, compare

2020-08-13 Thread will schmidt via Gcc-patches
On Tue, 2020-08-11 at 12:22 -0700, Carl Love wrote:
> Segher, Will:
> 
> Patch 2, adds support for divide, modulo, shift, compare of 128-bit
> integers.  The support adds the instruction and builtin support.
> 
>  Carl Love
> 
> 
> ---
> rs6000, 128-bit multiply, divide, shift, compare
> 
> gcc/ChangeLog
> 
> 2020-08-10  Carl Love  
>   * config/rs6000/altivec.h (vec_signextq, vec_dive, vec_mod): Add define
>   for new builtins .

Looks like there is also a change to the parameters for vec_rlnm(a,b,c)
here.  

>   * config/rs6000/altivec.md (UNSPEC_VMULEUD, UNSPEC_VMULESD,
>   UNSPEC_VMULOUD, UNSPEC_VMULOSD): New unspecs.
ok

>   (altivec_eqv1ti, altivec_gtv1ti, altivec_gtuv1ti, altivec_vmuleud,
>   altivec_vmuloud, altivec_vmulesd, altivec_vmulosd, altivec_vrlq,
>   altivec_vrlqmi, altivec_vrlqmi_inst, altivec_vrlqnm,
>   altivec_vrlqnm_inst, altivec_vslq, altivec_vsrq, altivec_vsraq,
>   altivec_vcmpequt_p, altivec_vcmpgtst_p, altivec_vcmpgtut_p): New
>   define_insn.
>   (vec_widen_umult_even_v2di, vec_widen_smult_even_v2di,
>   vec_widen_umult_odd_v2di, vec_widen_smult_odd_v2di, altivec_vrlqmi,
>   altivec_vrlqnm): New define_expands.

Also a whitespace fix in there.
ok.

>   * config/rs6000/rs6000-builtin.def (BU_P10_P, BU_P10_128BIT_1,
>   BU_P10_128BIT_2, BU_P10_128BIT_3): New macro definitions.


Is this consistent with the other recent changes that reworked some of
those macro definition names?


>   (VCMPEQUT_P, VCMPGTST_P, VCMPGTUT_P): Add macro expansions.

>   (VCMPGTUT, VCMPGTST, VCMPEQUT, CMPNET, CMPGE_1TI,
>   CMPGE_U1TI, CMPLE_1TI, CMPLE_U1TI, VNOR_V1TI_UNS, VNOR_V1TI, VCMPNET_P,
>   VCMPAET_P): New macro expansions.

>   (VSIGNEXTSD2Q,VMULEUD, VMULESD, VMULOUD, VMULOSD, VRLQ, VSLQ,

comma+space 

>   VSRQ, VSRAQ, VRLQNM, DIV_V1TI, UDIV_V1TI, DIVES_V1TI, DIVEU_V1TI,


>   MODS_V1TI, MODU_V1TI, VRLQMI): New macro expansions.

>   (VRLQ, VSLQ, VSRQ, VSRAQ, SIGNEXT): New overload expansions.


DIVE, MOD  missing.



>   * config/rs6000/rs6000-call.c (P10_BUILTIN_VCMPEQUT,
>   P10_BUILTIN_VCMPEQUT, P10_BUILTIN_CMPGE_1TI,

Duplication of P10_BUILTIN_VCMPEQUT.  

>   P10_BUILTIN_CMPGE_U1TI, P10_BUILTIN_VCMPGTUT,
>   P10_BUILTIN_VCMPGTST, P10_BUILTIN_CMPLE_1TI,

missing P10_BUILTIN_VCMPLE_U1TI

>   P10_BUILTIN_128BIT_DIV_V1TI, P10_BUILTIN_128BIT_UDIV_V1TI,
>   P10_BUILTIN_128BIT_VMULESD, P10_BUILTIN_128BIT_VMULEUD,
>   P10_BUILTIN_128BIT_VMULOSD, P10_BUILTIN_128BIT_VMULOUD,

>   P10_BUILTIN_VNOR_V1TI, P10_BUILTIN_VNOR_V1TI_UNS,

>   P10_BUILTIN_128BIT_VRLQ, P10_BUILTIN_128BIT_VRLQMI,
>   P10_BUILTIN_128BIT_VRLQNM, P10_BUILTIN_128BIT_VSLQ,

>   P10_BUILTIN_128BIT_VSRQ, P10_BUILTIN_128BIT_VSRAQ,

>   P10_BUILTIN_VCMPGTUT_P, P10_BUILTIN_VCMPGTST_P,
>   P10_BUILTIN_VCMPEQUT_P, P10_BUILTIN_VCMPGTUT_P,
>   P10_BUILTIN_VCMPGTST_P, P10_BUILTIN_CMPNET,

>   P10_BUILTIN_VCMPNET_P, P10_BUILTIN_VCMPAET_P,
>   P10_BUILTIN_128BIT_VSIGNEXTSD2Q, P10_BUILTIN_128BIT_DIVES_V1TI,
>   P10_BUILTIN_128BIT_MODS_V1TI, P10_BUILTIN_128BIT_MODU_V1TI):
>   New overloaded definitions.


>   (int_ftype_int_v1ti_v1ti) [P10_BUILTIN_VCMPEQUT,
>   P10_BUILTIN_CMPNET, P10_BUILTIN_CMPGE_1TI,
>   P10_BUILTIN_CMPGE_U1TI, P10_BUILTIN_VCMPGTUT,
>   P10_BUILTIN_VCMPGTST, P10_BUILTIN_CMPLE_1TI,
>   P10_BUILTIN_CMPLE_U1TI, E_V1TImode]: New case statements.

Those are part of (rs6000_gimple_fold_builtin). 

Also may be worth a sniff check of the generated code to ensure the
folding behaves properly.


>   (int_ftype_int_v1ti_v1ti) [bool_V1TI_type_node, int_ftype_int_v1ti_v1ti]:
>   New assignments.

ok.


missing (altivec_init_builtins): Add E_V1TImode case.


>   (int_ftype_int_v1ti_v1ti)[P10_BUILTIN_128BIT_VMULEUD,
>   P10_BUILTIN_128BIT_VMULOUD, P10_BUILTIN_128BIT_DIVEU_V1TI,
>   P10_BUILTIN_128BIT_MODU_V1TI, P10_BUILTIN_CMPGE_U1TI,
>   P10_BUILTIN_VCMPGTUT, P10_BUILTIN_VCMPEQUT]: New case statements.

Those are part of (builtin_function_type).


>   * config/rs6000/r6000.c (rs6000_builtin_mask_calculate): New
>   TARGET_TI_VECTOR_OPS definition.

>   (rs6000_option_override_internal): Add if TARGET_POWER10 statement.

comment below.


>   (rs6000_handle_altivec_attribute)[ E_TImode, E_V1TImode]: New case
>   statements.
>   (rs6000_opt_masks): Add ti-vector-ops entry.

ok.

>   * config/rs6000/r6000.h (MASK_TI_VECTOR_OPS, RS6000_BTM_P10_128BIT,
>   RS6000_BTM_TI_VECTOR_OPS, bool_V1TI_type_node): New defines.

>   (rs6000_builtin_type_index): New enum value RS6000_BTI_bool_V1TI.

>   * config/rs6000/rs6000.opt: New mti-vector-ops entry.

comment below.

>   * config/rs6000/vector.md (vector_eqv1ti, vector_gtv1ti,
>   vector_nltv1ti, vector_gtuv1ti, vector_nltuv1ti, vector_ngtv1ti,
>   vector_ngtuv1ti, 

[committed] analyzer: add regression test [PR96598]

2020-08-13 Thread David Malcolm via Gcc-patches
PR analyzer/96598 reports that -fanalyzer issues a false
  warning: use of NULL 'str' where non-null expected
with gcc 10.2 when used with -fsanitize=undefined.

This was fixed by g:808f4dfeb3a95f50f15e71148e5c1067f90a126d.

gcc/testsuite/ChangeLog:
PR analyzer/96598
* gcc.dg/analyzer/pr96598.c: New test.
---
 gcc/testsuite/gcc.dg/analyzer/pr96598.c | 26 +
 1 file changed, 26 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr96598.c

diff --git a/gcc/testsuite/gcc.dg/analyzer/pr96598.c 
b/gcc/testsuite/gcc.dg/analyzer/pr96598.c
new file mode 100644
index 000..b4354cd3394
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/pr96598.c
@@ -0,0 +1,26 @@
+/* { dg-additional-options "-O0 -fsanitize=undefined" } */
+
+extern char *foo (char *dest, const char *src)
+  __attribute__ ((__nonnull__ (1, 2)));
+
+unsigned bar(const char *str)
+  __attribute__ ((__nonnull__ ()));
+
+unsigned test(const char *str, unsigned **pv)
+  __attribute__ ((__nonnull__ ()));
+
+unsigned test(const char* str, unsigned **pv)
+{
+  char buffer[130];
+
+  *pv = 0;
+
+  foo(buffer, str);
+  if (bar(buffer))
+{
+  const char *ptr = 0;
+  foo(buffer, str);
+  return bar(buffer);
+}
+  return 0;
+}
-- 
2.26.2



Re: [PATCH 2/5] C front end support to detect out-of-bounds accesses to array parameters

2020-08-13 Thread Martin Sebor via Gcc-patches

On 8/12/20 5:19 PM, Joseph Myers wrote:

On Fri, 7 Aug 2020, Martin Sebor via Gcc-patches wrote:


I don't see anything in the tests in this patch to cover this sort of case
(arrays of pointers, including arrays of pointers to arrays etc.).


I've added a few test cases and reworked the declarator parsing
(get_parm_array_spec) a bit, fixing some bugs.


I don't think get_parm_array_spec is yet logically right (and I don't see
tests of the sort of cases I'm concerned about, such as arrays of pointers
to arrays, or pointers with attributes applied to them).


Please have a look at the tests at the end of Warray-parameter-4.c.
They should exercise the cases you brought up (as I understand them).
There are some tests for attributes in Warray-parameter.c.


You have logic

+  if (pd->kind == cdk_pointer
+ && (!next || next->kind == cdk_id))
+   {
+ /* Do nothing for the common case of a pointer.  The fact that
+the parameter is one can be deduced from the absence of
+an arg spec for it.  */
+ return attrs;
+   }

which is correct as far as it goes (when it returns with nothing done,
it's correct to do so, because the argument is indeed a pointer), but
incomplete:

* Maybe cdk_pointer is followed by cdk_attrs before cdk_id.  In this case
the code won't return.


I think I see the problem you're pointing out (I just don't see how
to trigger it or test that it doesn't happen).  If the tweak in
the attached update doesn't fix it a test case would be helpful.



* Maybe the code is correct to continue because we're in the case of an
array of pointers (cdk_array follows).  But as I understand it, the intent
is to set up an "arg spec" that describes only the (multidimensional)
array that is the parameter itself - not any array pointed to.  And it
looks to me like, in the case of an array of pointers to arrays, both sets
of array bounds would end up in the spec constructed.


Ideally, I'd like to check even pointers to arrays and so they should
be recorded somewhere.  The middle end code doesn't do any checking
of those yet for out-of-bounds accesses.  It wasn't a goal for
the first iteration so I've tweaked the code to avoid recording them.


What I think is correct is for both cdk_pointer and cdk_function to result
in the spec built up so far being cleared (regardless of what follows
cdk_pointer or cdk_function), rather than early return, so that the spec
present at the end is for the innermost sequence of array declarators
(possibly with attributes involved as well).  (cdk_function shouldn't
actually be an issue, since functions can't return arrays or functions,
but logically it seems appropriate to treat it like cdk_pointer.)
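
A minimal sketch of the "clear on pointer or function" rule described
above, using a simplified model rather than the front end's real
declarator structures: enum decl_kind, struct decl_node and
param_array_bounds are hypothetical, and the sketch assumes the chain is
walked towards the identifier.

```c
#include <stddef.h>

/* Hypothetical, simplified declarator kinds.  */
enum decl_kind { DK_ID, DK_ARRAY, DK_POINTER, DK_FUNCTION, DK_ATTRS };

struct decl_node
{
  enum decl_kind kind;
  unsigned bound;                /* meaningful only for DK_ARRAY */
  const struct decl_node *next;  /* next declarator, towards the identifier */
};

/* Collect into OUT (at most CAP entries) the bounds of the array
   declarators nearest the identifier, in walk order.  Returns the
   number of bounds stored.  */
size_t
param_array_bounds (const struct decl_node *d, unsigned *out, size_t cap)
{
  size_t n = 0;

  for (; d != NULL && d->kind != DK_ID; d = d->next)
    switch (d->kind)
      {
      case DK_POINTER:
      case DK_FUNCTION:
        /* Bounds seen so far belong to a pointed-to or returned type.  */
        n = 0;
        break;
      case DK_ARRAY:
        if (n < cap)
          out[n++] = d->bound;
        break;
      default:
        /* DK_ATTRS: look past it instead of stopping.  */
        break;
      }

  return n;
}
```

For an array of three pointers to arrays of five elements, this model
reports only the bound 3; for a plain pointer parameter it reports nothing.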


I've added a test for cdk_function to the one for cdk_pointer.



Then, the code

+  if (pd->kind == cdk_id)
+   {
+ /* Extract the upper bound from a parameter of an array type.  */

also seems misplaced.  If the type specifiers for the parameter are a
typedef for an array type, that array type should be processed *before*
the declarator to get the correct semantics (as if the bounds from those
type specifiers were given in the declarator), not at the end which gets
that type out of order with respect to array declarators.  (Processing
before the declarator also means clearing the results of that processing
if a pointer declarator is encountered at any point, because in that case
the array type in the type specifiers is irrelevant.)


I'm not sure I follow you here.  Can you show me what you mean on
a piece of code?  This test case (which IIUC does what you described)
works as expected:

$ cat q.c && gcc -O2 -S -Wall q.c
typedef int A[7][9];

void f (A[3][5]);
void f (A[1][5]);

void g (void)
{
  A a[2][5];
  f (a);
}
q.c:4:9: warning: argument 1 of type ‘int[1][5][7][9]’ with mismatched bound [-Warray-parameter=]
4 | void f (A[1][5]);
  | ^~~
q.c:3:9: note: previously declared as ‘int[3][5][7][9]’
3 | void f (A[3][5]);
  | ^~~
q.c: In function ‘g’:
q.c:9:3: warning: ‘f’ accessing 3780 bytes in a region of size 2520 [-Wstringop-overflow=]
9 |   f (a);
  |   ^
q.c:9:3: note: referencing argument 1 of type ‘int (*)[5][7][9]’
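
The two byte counts in the second warning follow directly from the
declarations in q.c.  A small check, assuming a 4-byte int as on the usual
Linux targets (declared_bytes and passed_bytes are helper names made up
for this note):

```c
#include <stddef.h>

typedef int A[7][9];

/* Bytes implied by the first declaration, void f (A[3][5]).  */
size_t
declared_bytes (void)
{
  return sizeof (A[3][5]);      /* 3 * 5 * 7 * 9 * sizeof (int) */
}

/* Bytes in the object actually passed, A a[2][5].  */
size_t
passed_bytes (void)
{
  return sizeof (A[2][5]);      /* 2 * 5 * 7 * 9 * sizeof (int) */
}
```

With sizeof (int) == 4 that is 945 * 4 = 3780 bytes expected by f against
630 * 4 = 2520 bytes available in a.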



The logic

+ /* Skip all constant bounds except the most significant one.
+The interior ones are included in the array type.  */
+ if (next && (next->kind == cdk_array || next->kind == cdk_pointer))
+   continue;

is another example of code that fails to look past cdk_attrs.


It should be handled by the tweak I added in the attached revision.

Martin

[2/5] - C front end support to detect out-of-bounds accesses to array parameters.

gcc/c-family/ChangeLog:

	PR c/50584
	* c-common.h (warn_parm_array_mismatch): Declare new function.
	(has_attribute): Move declaration of an existing function.
	(build_attr_access_from_parms): Declare new function.
	* c.opt (-Warray-parameter, -Wvla-parameter): New 

Re: [Patch 1/5] rs6000, Add 128-bit sign extension support

2020-08-13 Thread Segher Boessenkool
Hi!

On Thu, Aug 13, 2020 at 05:11:11PM -0500, will schmidt wrote:
> > > That is probably a level of detail that is not
> > > really needed in the GCC code comment.  Probably best to just
> > > change
> > > the comment to read something like "ISA 3.0 sign extend builtins". 
> > 
> > Sounds good.
> 
> As long as there are no issues defining the builtins for 3.0 here.
> AFAIK they are not documented in ISA 3.0.  This is a happy accident
> that these ISA 3.1 builtins can be implemented with existing support.

There are *no* builtins defined in the ISA!  The insns are just ISA 3.0
instructions.


Segher


Re: [RFC PATCH v1 1/1] PPC64: Implement POWER Architecture Vector Function ABI.

2020-08-13 Thread Segher Boessenkool
Hi!

This is about the Power binding to some OpenMP API, right?  It has
nothing to do with "vector" or "ABI" -- we have vectors already, and
we have ABIs already, more than enough of each.

It is very very VERY hard to review this without being told the proper
setting here.


On Fri, Aug 07, 2020 at 08:35:52PM +, Bert Tenjy wrote:
> This patch adds functionality to enable use of POWER Architecture's
> VSX extensions to speed up certain code sequences.

It does?  Oh, to implement some OpenMP stuff?

> The document describing POWER Architecture Vector Function interface is
> tentatively at: https://sourceware.org/glibc/wiki/Homepage?action=AttachFile;
> do=view=powerarchvectfuncabi.html

"This page does not exist yet. You can create a new empty page, or use
one of the page templates."

> 4. Changes to files vect-simd-clone-{1,4,5,8}.c are needed since 
> PPC64 has only 128bit-wide vector bus. x86_64 for which the tests were
> initially written has buses wider than that for AVX and higher architectures.

There is no "vector bus".  All Power vector registers are 128 bits, yes.

> 5. Per Segher's response to v0, we still need to agree a name for the 
> guiding document whose name is currently 'POWER Architecture Vector Function 
> ABI'.

Not just the document title.  You should use terminology that agrees with
everything else, that isn't using the same words for different things,
that isn't super confusing, throughout the patch :-)

> +/* Implement TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN.  */

The documentation for this hook says ((lack of) line wraps verbatim):

@deftypefn {Target Hook} int TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN 
(struct cgraph_node *@var{}, struct cgraph_simd_clone *@var{}, @var{tree}, 
@var{int})
This hook should set @var{vecsize_mangle}, @var{vecsize_int}, 
@var{vecsize_float}
fields in @var{simd_clone} structure pointed by @var{clone_info} argument and 
also
@var{simdlen} field if it was previously 0.
The hook should return 0 if SIMD clones shouldn't be emitted,
or number of @var{vecsize_mangle} variants that should be emitted.
@end deftypefn

so I have no idea what this hook should do.  Two of the four arguments
are left completely undefined, to start with.

> +
> +static int
> +rs6000_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
> +struct cgraph_simd_clone *clonei,
> +tree base_type, int num)

Indent is wrong here, btw (should use tabs, and everything should align
to that first "struct"...)  You need to be a bit more creative here,
maybe use shorter function names to begin with?  And use names that say
what the functions are *for*, or giving some context.

"simd" means nothing here.  More than  half of our backend is SIMD
stuff.  That has only sideways to do with what you call "simd" here :-/

> +  tree t;
> +  int i;

Declare things at first use, *in* the first use if you can (you usually
can).

> +  bool decl_arg_p = (node->definition || type_arg_types == NULL_TREE);

This isn't a predicate, don't call it _p please.  It's a boolean, and
"decl_arg" isn't very meaningful either.

You might want to factor this differently altogether.

> +  for (t = (decl_arg_p ? DECL_ARGUMENTS (node->decl) : type_arg_types), i = 0;
> +   t && t != void_list_node; t = TREE_CHAIN (t), i++)

Do all that "i" stuff not in the "for" expression itself (but in the
body).  If a for expression doesn't fit on one line, it usually is more
readable if you put all three parts on separate lines.  Complex
initialisation like here is more readable if you do it *before* the
loop.
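
As a rough illustration of that advice (not code from the patch), here is a
minimal sketch on a hypothetical linked list.  The names Node and
weighted_sum are made up for the example; they only stand in for the
TREE_CHAIN-style walk in the quoted loop.

```cpp
#include <cassert>

// Hypothetical stand-in for a TREE_CHAIN-style singly linked list.
struct Node
{
  int value;
  Node *next;
};

// Sum the values of one of two lists, weighting each by its position.
int
weighted_sum (Node *alt_list, Node *main_list, bool use_alt)
{
  // The complex initialisation happens before the loop, not inside it.
  Node *t = use_alt ? alt_list : main_list;
  int i = 0;
  int sum = 0;

  for (; t != nullptr; t = t->next)
    {
      sum += t->value * i;
      // Index bookkeeping lives in the body, not in the "for" expression.
      i++;
    }

  return sum;
}
```

The loop header now only says "walk the list"; the index is ordinary state
in the body.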

> + case E_QImode:
> + case E_HImode:
> + case E_SImode:
> + case E_DImode:
> + case E_SFmode:
> + case E_DFmode:

> +   warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
> +   "unsupported argument type %qT for simd", arg_type);

That isn't all types.  But you do ISA 2.07?  Put that in a comment
somewhere then please.

> +  if (TARGET_VSX)
> +{
> +  clonei->vecsize_mangle = 'b';
> +  ret = 1;
> +}

That is ISA 2.06 (Power 7), which you do not support I think?

> +  switch (clonei->vecsize_mangle)

I don't know what this is.

> +void
> +rs6000_simd_clone_adjust (struct cgraph_node *node)
> +{
> +}

Don't define it if it doesn't do anything?  Or is it required to exist
even if it doesn't ever do anything?

> +static int
> +rs6000_simd_clone_usable (struct cgraph_node *node)
> +{
> +  switch (node->simdclone->vecsize_mangle)
> +{
> +  case 'b':

(wrong indentation, "case" should align with "{")

> +if (!TARGET_VSX)
> +  return -1;
> +return 0;
> +  default:
> +gcc_unreachable ();
> +}
> +}

Please don't use switch statements where a simple "if" would do.

static int
rs6000_simd_clone_usable (struct cgraph_node *node)
{
  gcc_assert (node->simdclone->vecsize_mangle == 'b');

  if (TARGET_VSX)
return 0;

  return 

[PATCH] libstdc++: testsuite: Address random failure in pthread_create() [PR54185]

2020-08-13 Thread Lewis Hyatt via Gcc-patches
Hello-

The attached patch was discussed briefly on PR 54185 here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54185#c14
The test case for this PR sometimes fails due to random failures in
pthread_create() that are not related to the original PR. This patch fixes
it up by ignoring those failures. The test case was designed to repeat the
same test 1000 times to attempt to reproduce a race condition, so I think it is
OK if some of those iterations are simply skipped.
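
A minimal sketch of the idea, assuming a simplified setup rather than the
actual test: threads wait on a condition variable with a predicate, and a
std::system_error from thread creation just ends the round early.  The
function name run_round and its structure are hypothetical; the real change
is in the diff further down.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <system_error>
#include <thread>
#include <vector>

// Start up to WANTED threads that block until released.
// Returns how many threads actually ran.
int
run_round (int wanted)
{
  std::mutex mx;
  std::condition_variable cond;
  bool notified = false;
  int started = 0;
  std::vector<std::thread> threads;

  auto worker = [&] {
    std::unique_lock<std::mutex> lock (mx);
    ++started;
    // The predicate makes the wait safe even if the notification
    // was already sent before this thread got here.
    cond.wait (lock, [&] { return notified; });
  };

  for (int i = 0; i < wanted; ++i)
    {
      try
	{
	  threads.emplace_back (worker);
	}
      catch (const std::system_error &)
	{
	  // Thread creation may fail due to resource limits; stop
	  // creating more and release the ones that did start.
	  break;
	}
    }

  {
    std::lock_guard<std::mutex> lock (mx);
    notified = true;
  }
  cond.notify_all ();

  for (auto &t : threads)
    t.join ();

  assert (started == static_cast<int> (threads.size ()));
  return started;
}
```

Without the notified flag, a thread that starts late could wait forever,
which is why the sketch (like the patch) waits with a predicate.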

Thanks for taking a look at it; I can commit it if it makes sense.

-Lewis
libstdc++: testsuite: Address random failure in pthread_create() [PR54185]

The test for this PR calls pthread_create() many times in a row, which may fail
with EAGAIN sometimes. Avoid generating a test failure in this case.

libstdc++-v3/ChangeLog:

PR libstdc++/54185
* testsuite/30_threads/condition_variable/54185.cc: Make test robust
to random pthread_create() failures.

diff --git a/libstdc++-v3/testsuite/30_threads/condition_variable/54185.cc b/libstdc++-v3/testsuite/30_threads/condition_variable/54185.cc
index ea0d5bb8740..cbd21e11e57 100644
--- a/libstdc++-v3/testsuite/30_threads/condition_variable/54185.cc
+++ b/libstdc++-v3/testsuite/30_threads/condition_variable/54185.cc
@@ -31,19 +31,25 @@
 std::condition_variable* cond = nullptr;
 std::mutex mx;
 int started = 0;
+bool notified = false;
 int constexpr NUM_THREADS = 10;
 
+static void finalize_cond()
+{
+  /* Lock should be held when calling this.  */
+  notified = true;
+  cond->notify_all();
+  delete cond;
+  cond = nullptr;
+}
+
 void do_thread_a()
 {
   std::unique_lock lock(mx);
   if(++started >= NUM_THREADS)
-  {
-cond->notify_all();
-delete cond;
-cond = nullptr;
-  }
+finalize_cond();
   else
-cond->wait(lock);
+cond->wait(lock, [] { return notified; });
 }
 
 int main(){
@@ -51,11 +57,24 @@ int main(){
   for(int j = 0; j < 1000; ++j)
   {
 started = 0;
+notified = false;
 cond = new std::condition_variable;
 for (int i = 0; i < NUM_THREADS; ++i)
-  vec.emplace_back(&do_thread_a);
-for (int i = 0; i < NUM_THREADS; ++i)
-  vec[i].join();
+  {
+   try
+ {
+   vec.emplace_back(&do_thread_a);
+ }
+   catch(const std::system_error&)
+ {
+   /* Thread creation may fail due to resource limits; just move on.  */
+   std::unique_lock lock(mx);
+   finalize_cond();
+   break;
+ }
+  }
+for(auto& thread: vec)
+  thread.join();
 vec.clear();
   }
 }


Re: [EXTERNAL] Re: [Patch 1/5] rs6000, Add 128-bit sign extension support

2020-08-13 Thread will schmidt via Gcc-patches
On Thu, 2020-08-13 at 13:29 -0500, Segher Boessenkool wrote:
> On Thu, Aug 13, 2020 at 11:09:10AM -0700, Carl Love wrote:
> > The builtins
> > 
> > vector signed int vec_signexti (vector signed char a)
> > vector signed long long vec_signextll (vector signed char a)
> > vector signed int vec_signexti (vector signed short a)
> > vector signed long long vec_signextll (vector signed short a)
> > vector signed long long vec_signextll (vector signed int a)
> > 
> > were defined in the function prototypes directory in box called
> > "RFC 2608 - 128-bit Binary Integer Operations".  The document describes
> > the new P10 builtins.  However, this subset of the newly defined builtins
> > for P10 can be implemented with existing Power 9 instructions.  That was
> > the point of the comment.
> 
> Ah, I see :-)
> 
> > That is probably a level of detail that is not really needed in the
> > GCC code comment.  Probably best to just change the comment to read
> > something like "ISA 3.0 sign extend builtins".
> 
> Sounds good.

As long as there are no issues defining the builtins for 3.0 here.
AFAIK they are not documented in ISA 3.0.  This is a happy accident
that these ISA 3.1 builtins can be implemented with existing support.

> 
> > My thought for calling it out is that they could be back ported to an
> > earlier GCC version since they use Power 9 instructions but it is
> > probably not worth the effort unless there is an explicit request for
> > them.
> 
> Yeah.  Thanks for the explanation!
> 
> 
> Segher



Re: [PATCH v2] rs6000: ICE when using an MMA type as a function param or return value [PR96506]

2020-08-13 Thread Segher Boessenkool
On Thu, Aug 13, 2020 at 01:58:31PM -0500, Peter Bergner wrote:
> On 8/12/20 8:59 PM, Peter Bergner wrote:
> > On 8/12/20 8:00 PM, Segher Boessenkool wrote:
> >> On Wed, Aug 12, 2020 at 03:32:18PM -0500, Peter Bergner wrote:
> > Ok, how about this comment then?
> > 
> > @@ -6444,8 +6444,30 @@ machine_mode
> >  rs6000_promote_function_mode (const_tree type ATTRIBUTE_UNUSED,
> >   machine_mode mode,
> >   int *punsignedp ATTRIBUTE_UNUSED,
> > - const_tree, int)
> > + const_tree, int for_return)
> >  {
> > +  /* Warning: this is a static local variable and not always NULL!
> > + This function is called multiple times for the same function
> > + and return value.  PREV_FUNC is used to keep track of the
> > + first time we encounter a function's return value in order
> > + to not report an error with that return value multiple times.  */
> > +  static struct function *prev_func = NULL;
> 
> Approved offline, so I pushed this to trunk.  Thanks!
> 
> Are we ok to backport this to GCC 10?  If you don't want this
> trickery in GCC 10, we could just backport the param handling
> which doesn't use the trickery and leave the return value
> unhandled.

It's okay for backporting as well.  It's all kind of wrong, but it will
in practice just work anyway:

1) struct function is GTY, and you don't mark the static variable here
as root, so the thing it points into might have gone away; but we really
only use the pointer here, we don't deref anything explicitly, so the
generated program won't either (hopefully, etc.)

2) Similarly, some other struct function for another function may (in
theory) be allocated at the same address when next we are called; that
other function will then never get the warning.

We'd need to store a flag in the struct function (or similarly) itself,
to make things kosher.  How do similar warnings elsewhere handle this?
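
A minimal sketch of what such a per-function flag could look like.  This is
an assumption for illustration only: func_info is a hypothetical stand-in
for struct function, and the field and helper names are invented, not
existing GCC interfaces.

```cpp
#include <cassert>

// Hypothetical stand-in for GCC's struct function, with an added flag.
struct func_info
{
  bool mma_return_reported = false;
};

// Return true only the first time this is called for a given function.
// The state lives in the object itself, so it cannot be confused with
// another object that later happens to be allocated at the same address.
bool
should_report (func_info &fn)
{
  if (fn.mma_return_reported)
    return false;

  fn.mma_return_reported = true;
  return true;
}
```

Compared with a static pointer to the previously seen function, nothing
here depends on object addresses or on the object still being alive.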


Anyway, okay for trunk and backports.  Thanks!


Segher


Re: [RFC PATCH v1 1/1] PPC64: Implement POWER Architecture Vector Function ABI.

2020-08-13 Thread Jakub Jelinek via Gcc-patches
On Thu, Aug 13, 2020 at 08:40:22PM +, GT wrote:
> I'm looking at ix86_simd_clone_adjust and also aarch64_simd_clone_adjust.
> The latter is much simpler and I see how I would add PPC attribute "vsx"
> similarly. If I was to follow the ix86_simd_clone_adjust organization, then
> ix86_valid_target_attribute_p called near the end of the function is a
> problem. Because it in turn calls ix86_valid_target_attribute_tree and this
> last function doesn't have a similarly named function in PPC code.
>
> Also, once the attribute "vsx" is added, where is it used? I mean that in the
> sense of where is execution conditioned on the definition of say, the "sse2"
> string in x86_64?

You need to trigger what the middle-end and backend will do if you use
explicit __attribute__((target ("vsx"))) on the function, so in the end it
needs to do some parsing, create a TARGET_OPTION_NODE with the right option
changes and put it to the function.

Jakub



Re: [RFC PATCH v1 1/1] PPC64: Implement POWER Architecture Vector Function ABI.

2020-08-13 Thread GT via Gcc-patches
‐‐‐ Original Message ‐‐‐
On Monday, August 10, 2020 2:07 PM, Jakub Jelinek  wrote:

> On Mon, Aug 10, 2020 at 05:29:49PM +, GT wrote:
>
> > > For PowerPC, if all you want to support is b which requires VSX, then the
> > > right thing is for !TREE_PUBLIC functions return 0 if !TARGET_VSX and
> > > otherwise set vecsize_mangle to 'b' and in the end return 1, for exported
> > > functions always set it to 'b' (and in the end return 1).
> > > Then ensure that the 'b' variants of function definitions get
> > > target ("vsx") attribute added if !TARGET_VSX.
> >
> > So setting attribute "vsx" for 'b' variants of function definitions is what
> > should go in function rs6000_simd_clone_usable?
>
> No. That function should say if the particular clone ('b' in this case) is
> usable from some caller, and the answer for your 'b' is TARGET_VSX is
> required to be non-zero.
>
> The adjustment should go into the simd_clone_adjust target hook, see
> what ix86_simd_clone_adjust does (though, that one has more variants to
> handle).
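
The decision logic described in the quoted text can be sketched roughly as
follows.  This is only an illustration under stated assumptions: clone_info,
compute_vecsize and clone_usable are simplified, hypothetical stand-ins, and
the booleans replace TREE_PUBLIC and TARGET_VSX; none of this is the real
hook signature.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in; not GCC's cgraph_simd_clone.
struct clone_info
{
  char vecsize_mangle = 0;
};

// Return the number of variants to emit (0 or 1 in this sketch).
int
compute_vecsize (bool is_public, bool target_vsx, clone_info &clonei)
{
  // Local functions without VSX get no clones at all.
  if (!is_public && !target_vsx)
    return 0;

  // Exported functions always get the 'b' variant.
  clonei.vecsize_mangle = 'b';
  return 1;
}

// Return 0 if the clone is usable from the caller, -1 otherwise,
// mirroring the return values in the quoted patch.
int
clone_usable (const clone_info &clonei, bool target_vsx)
{
  assert (clonei.vecsize_mangle == 'b');
  return target_vsx ? 0 : -1;
}
```

Adding the target ("vsx") attribute to 'b' definitions when !TARGET_VSX is
the part that belongs in the adjust hook and is not shown here.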

I'm looking at ix86_simd_clone_adjust and also aarch64_simd_clone_adjust.
The latter is much simpler and I see how I would add PPC attribute "vsx"
similarly. If I was to follow the ix86_simd_clone_adjust organization, then
ix86_valid_target_attribute_p called near the end of the function is a
problem. Because it in turn calls ix86_valid_target_attribute_tree and this
last function doesn't have a similarly named function in PPC code.

Also, once the attribute "vsx" is added, where is it used? I mean that in the
sense of where is execution conditioned on the definition of say, the "sse2"
string in x86_64?

Bert.



Re: [PATCH] rs6000, restrict bfloat convert intrinsic to Power 10. Fix BU_P10V macro definitions.

2020-08-13 Thread Peter Bergner via Gcc-patches
On 8/13/20 3:00 PM, Carl Love wrote:
> On Thu, 2020-08-13 at 14:48 -0500, Bill Schmidt wrote:
>> OK, but that was just meant as an example.  We have a fair number of
>> things that changed names, so I was somewhat surprised.  It could be
>> that all of these are likewise hidden via the overload mechanism.
>> Just checking to be sure.
> 
> OK, I will go dig thru the test cases in a similar way for all of the
> changes just to make sure.  I didn't get any test failures but yea, a
> lot of changes so let's double check.

I too was surprised there were no testsuite changes required.
If you ran the testsuite twice with the unpatched and patched
builds (as is required for patch submission) and there were no
regressions, then great.  Wow, but great.

Peter




Re: [PATCH] rs6000, restrict bfloat convert intrinsic to Power 10. Fix BU_P10V macro definitions.

2020-08-13 Thread Carl Love via Gcc-patches
Bill:


On Thu, 2020-08-13 at 14:48 -0500, Bill Schmidt wrote:
> OK, but that was just meant as an example.  We have a fair number of
> things that changed names, so I was somewhat surprised.  It could be
> that all of these are likewise hidden via the overload mechanism.
> Just checking to be sure.

OK, I will go dig thru the test cases in a similar way for all of the
changes just to make sure.  I didn't get any test failures but yea, a
lot of changes so let's double check.

 Carl



Re: [PATCH] rs6000, restrict bfloat convert intrinsic to Power 10. Fix BU_P10V macro definitions.

2020-08-13 Thread Bill Schmidt via Gcc-patches

On 8/13/20 2:24 PM, Carl Love wrote:

Bill:

On Thu, 2020-08-13 at 13:38 -0500, Bill Schmidt wrote:

Hi Carl,

Thanks for cleaning up the consistency issue.  The new names and
related adjustments LGTM.

Are there no affected test cases that need adjusting?  That surprises
me.  For example, didn't __builtin_altivec_xxeval become
__builtin_vsx_xxeval as a result of this change?  Does that not appear
in any test cases?

Thanks,

Bill

In gcc/config/rs6000/rs6000-builtin.def we have

#define vec_ternarylogic(a, b, c, d)   __builtin_vec_xxeval (a, b, c, d)

The vec_ternarylogic() builtin is used in test files
gcc/testsuite/gcc.target/powerpc/vec-ternarylogic-X.c where X stands
for 1, 2, 3, 4, 5, 6, 7, 8, 9.

In gcc/config/rs6000/rs6000-builtin.def

BU_P10V_VSX_4 (XXEVAL, "xxeval", CONST, xxeval)

now expands to __builtin_vsx_xxeval as you expect.

I do not  see a test case that uses the old builtin name
__builtin_altivec_xxeval.

carll@genoa:~/GCC/gcc-mainline-935/gcc/testsuite/gcc.target/powerpc$
grep -r  xxeval *
vec-ternarylogic-0.c:/* { dg-final { scan-assembler {\mxxeval\M} } } */
vec-ternarylogic-2.c:/* { dg-final { scan-assembler {\mxxeval\M} } } */
vec-ternarylogic-3.c:/* { dg-final { scan-assembler {\mxxeval\M} } } */
vec-ternarylogic-4.c:/* { dg-final { scan-assembler {\mxxeval\M} } } */
vec-ternarylogic-6.c:/* { dg-final { scan-assembler {\mxxeval\M} } } */
vec-ternarylogic-8.c:/* { dg-final { scan-assembler {\mxxeval\M} } } */
vec-ternarylogic-9.c:/* { dg-final { scan-assembler {\mxxeval\M} } } */
carll@genoa:~/GCC/gcc-mainline-935/gcc/testsuite/gcc.target/powerpc$

There just seems to be the various tests that are expected to generate
the xxeval instruction.  As far as I can see there is no test program that uses 
the __builtin_altivec_xxeval name.



OK, but that was just meant as an example.  We have a fair number of 
things that changed names, so I was somewhat surprised.  It could be 
that all of these are likewise hidden via the overload mechanism.  Just 
checking to be sure.


Thanks,
Bill



  Carl



Re: [PATCH] rs6000, restrict bfloat convert intrinsic to Power 10. Fix BU_P10V macro definitions.

2020-08-13 Thread Carl Love via Gcc-patches
Bill:

On Thu, 2020-08-13 at 13:38 -0500, Bill Schmidt wrote:
> Hi Carl,
> 
> Thanks for cleaning up the consistency issue.  The new names and
> related adjustments LGTM.
>
> Are there no affected test cases that need adjusting?  That surprises
> me.  For example, didn't __builtin_altivec_xxeval become
> __builtin_vsx_xxeval as a result of this change?  Does that not appear
> in any test cases?
> 
> Thanks,
> 
> Bill

In gcc/config/rs6000/rs6000-builtin.def we have

#define vec_ternarylogic(a, b, c, d)   __builtin_vec_xxeval (a, b, c, d)

The vec_ternarylogic() builtin is used in test files 
gcc/testsuite/gcc.target/powerpc/vec-ternarylogic-X.c where X stands
for 1, 2, 3, 4, 5, 6, 7, 8, 9.

In gcc/config/rs6000/rs6000-builtin.def

BU_P10V_VSX_4 (XXEVAL, "xxeval", CONST, xxeval) 

now expands to __builtin_vsx_xxeval as you expect.

I do not  see a test case that uses the old builtin name
__builtin_altivec_xxeval.

carll@genoa:~/GCC/gcc-mainline-935/gcc/testsuite/gcc.target/powerpc$
grep -r  xxeval *
vec-ternarylogic-0.c:/* { dg-final { scan-assembler {\mxxeval\M} } } */
vec-ternarylogic-2.c:/* { dg-final { scan-assembler {\mxxeval\M} } } */
vec-ternarylogic-3.c:/* { dg-final { scan-assembler {\mxxeval\M} } } */
vec-ternarylogic-4.c:/* { dg-final { scan-assembler {\mxxeval\M} } } */
vec-ternarylogic-6.c:/* { dg-final { scan-assembler {\mxxeval\M} } } */
vec-ternarylogic-8.c:/* { dg-final { scan-assembler {\mxxeval\M} } } */
vec-ternarylogic-9.c:/* { dg-final { scan-assembler {\mxxeval\M} } } */
carll@genoa:~/GCC/gcc-mainline-935/gcc/testsuite/gcc.target/powerpc$ 

There just seems to be the various tests that are expected to generate
the xxeval instruction.  As far as I can see there is no test program that uses 
the __builtin_altivec_xxeval name. 

 Carl 



Re: [PATCH 1/5] infrastructure to detect out-of-bounds accesses to array parameters

2020-08-13 Thread Jeff Law via Gcc-patches
On Fri, 2020-08-07 at 11:08 -0600, Martin Sebor via Gcc-patches wrote:
> On 7/28/20 7:16 PM, Martin Sebor wrote:
> > Patch 1 adds the basic infrastructure to support array/VLA bounds
> > in attribute access.  It extends the access string specification
> > to describe function parameters of array types (including VLAs),
> > extends the attr_access class to parse the string and store
> > the data in a form that's easy to work with, and implements
> > checking of various kinds of mismatches between redeclarations.
> > It doesn't actually enable anything new so no new tests are added.
> 
> Joseph's comments on patch 2 in the series prompted me to change
> how the array (and VLA) function parameters are formatted: instead
> of implementing it mostly outside the pretty printer (which, to do
> completely correctly, would require reimplementing what the pretty
> printer already does) I instead enhanced the pretty printer.  That
> let me simplify the formatting done in the helper.  The attached
> revision reflects this simplification (the only change from
> the original is to attr_access::array_as_string).
> 
> Martin
> [1/5] - Infrastructure to detect out-of-bounds accesses to array parameters.
> 
> gcc/ChangeLog:
> 
>   PR c/50584
>   * attribs.c (decl_attributes): Also pass decl along with type
>   attributes to handlers.
>   (init_attr_rdwr_indices): Change second argument to attribute chain.
>   Handle internal attribute representation in addition to external.
>   (get_parm_access): New function.
>   (attr_access::to_internal_string): Define new member function.
>   (attr_access::to_external_string): Define new member function.
>   (attr_access::vla_bounds): Define new member function.
>   * attribs.h (struct attr_access): Declare new members.
>   (attr_access::from_mode_char): Define new member function.
>   (get_parm_access): Declare new function.
>   * calls.c (initialize_argument_information): Pass function type
>   attributes to init_attr_rdwr_indices.
>   * tree-ssa-uninit.c (maybe_warn_pass_by_reference): Same.
> 
> gcc/c-family/ChangeLog:
> 
>   PR c/50584
>   * c-attribs.c (c_common_attribute_table): Add "arg spec" attribute.
>   (handle_argspec_attribute): New function.
>   (get_argument, get_argument_type): New functions.
>   (append_access_attrs): Add overload.  Handle internal attribute
>   representation in addition to external.
>   (handle_access_attribute): Handle internal attribute representation
>   in addition to external.
>   (build_attr_access_from_parms): New function.
>   * c-warn.c (parm_array_as_string): Define new function.
>   (plus_one):  Define new function.
>   (warn_parm_array_mismatch): Define new function.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR c/50584
>   * gcc.dg/attr-access-read-write-2.c: Adjust text of expected messages.
LGTM.
jeff
> 



Re: Add cold attribute to one time construction APIs

2020-08-13 Thread Aditya K via Gcc-patches
sure.
--

From: Jonathan Wakely 
Sent: Thursday, August 13, 2020 11:13 AM
To: Aditya K 
Cc: libstdc++ ; gcc-patches 
Subject: Re: Add cold attribute to one time construction APIs 
 
Please CC the libstdc++ list on all libstdc++ patches.

On Thu, 13 Aug 2020 at 17:51, Aditya K  wrote:
>
> Revised patch with _GLIBCXX_COLD added at the end.
>
> ```
> commit 3dc9f9a8461b1c88e991ceb517e5fdd81f268d1e
> Author: Aditya Kumar <1894981+hiradi...@users.noreply.github.com>
> Date:   Thu Aug 13 09:41:34 2020 -0700
>
> Add cold attribute to one time construction APIs
>
> __cxa_guard_acquire is used for only one purpose,
> namely guarding local static variable initialization,
> and since that purpose is definitionally cold, it should be attributed as cold.
> Similarly for __cxa_guard_release and __cxa_guard_abort
>
> diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
> index b1fad59d4..f6f954eef 100644
> --- a/libstdc++-v3/include/bits/c++config
> +++ b/libstdc++-v3/include/bits/c++config
> @@ -35,20 +35,21 @@
>
>  // The datestamp of the C++ library in compressed ISO date format.
>  #define __GLIBCXX__
>
>  // Macros for various attributes.
>  //   _GLIBCXX_PURE
>  //   _GLIBCXX_CONST
>  //   _GLIBCXX_NORETURN
>  //   _GLIBCXX_NOTHROW
>  //   _GLIBCXX_VISIBILITY
> +//   _GLIBCXX_COLD
>  #ifndef _GLIBCXX_PURE
>  # define _GLIBCXX_PURE __attribute__ ((__pure__))
>  #endif
>
>  #ifndef _GLIBCXX_CONST
>  # define _GLIBCXX_CONST __attribute__ ((__const__))
>  #endif
>
>  #ifndef _GLIBCXX_NORETURN
>  # define _GLIBCXX_NORETURN __attribute__ ((__noreturn__))
> @@ -67,20 +68,24 @@
>  #define _GLIBCXX_HAVE_ATTRIBUTE_VISIBILITY
>
>  #if _GLIBCXX_HAVE_ATTRIBUTE_VISIBILITY
>  # define _GLIBCXX_VISIBILITY(V) __attribute__ ((__visibility__ (#V)))
>  #else
>  // If this is not supplied by the OS-specific or CPU-specific
>  // headers included below, it will be defined to an empty default.
>  # define _GLIBCXX_VISIBILITY(V) _GLIBCXX_PSEUDO_VISIBILITY(V)
>  #endif
>
> +#ifndef _GLIBCXX_COLD
> +# define _GLIBCXX_COLD __attribute__ ((cold))
> +#endif
> +
>  // Macros for deprecated attributes.
>  //   _GLIBCXX_USE_DEPRECATED
>  //   _GLIBCXX_DEPRECATED
>  //   _GLIBCXX17_DEPRECATED
>  //   _GLIBCXX20_DEPRECATED( string-literal )
>  #ifndef _GLIBCXX_USE_DEPRECATED
>  # define _GLIBCXX_USE_DEPRECATED 1
>  #endif
>
>  #if defined(__DEPRECATED) && (__cplusplus >= 201103L)
> diff --git a/libstdc++-v3/libsupc++/cxxabi.h b/libstdc++-v3/libsupc++/cxxabi.h
> index 000713ecd..24c1366e2 100644
> --- a/libstdc++-v3/libsupc++/cxxabi.h
> +++ b/libstdc++-v3/libsupc++/cxxabi.h
> @@ -108,27 +108,27 @@ namespace __cxxabiv1
>    __cxa_vec_delete2(void* __array_address, size_t __element_size,
> size_t __padding_size, __cxa_cdtor_type __destructor,
> void (*__dealloc) (void*));
>
>    void
>    __cxa_vec_delete3(void* __array_address, size_t __element_size,
> size_t __padding_size, __cxa_cdtor_type __destructor,
> void (*__dealloc) (void*, size_t));
>
>    int
> -  __cxa_guard_acquire(__guard*);
> +  __cxa_guard_acquire(__guard*) _GLIBCXX_COLD;
>
>    void
> -  __cxa_guard_release(__guard*) _GLIBCXX_NOTHROW;
> +  __cxa_guard_release(__guard*) _GLIBCXX_NOTHROW _GLIBCXX_COLD;
>
>    void
> -  __cxa_guard_abort(__guard*) _GLIBCXX_NOTHROW;
> +  __cxa_guard_abort(__guard*) _GLIBCXX_NOTHROW _GLIBCXX_COLD;
>
>    // DSO destruction.
>    int
>    __cxa_atexit(void (*)(void*), void*, void*) _GLIBCXX_NOTHROW;
>
>    void
>    __cxa_finalize(void*);
>
>    // TLS destruction.
>    int
> ```
>
> From: Aditya K
> Sent: Thursday, August 13, 2020 10:47 AM
> To: Jeff Law via Gcc-patches ; jwakely@gmail.com 
> 
> Subject: Add cold attribute to one time construction APIs
>
> This would help the compiler optimize local static objects.
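
For readers unfamiliar with the attribute, a minimal sketch of how a cold
annotation is attached to a rarely executed function.  MY_COLD,
expensive_init and get_value are hypothetical names for illustration; the
patch itself uses _GLIBCXX_COLD on the __cxa_guard_* declarations.  The
sketch assumes a compiler that accepts GNU-style attributes (GCC or Clang).

```cpp
#include <cassert>

// Hypothetical macro, analogous in spirit to _GLIBCXX_COLD.
#ifndef MY_COLD
# define MY_COLD __attribute__ ((cold))
#endif

// Counts how often the slow path actually runs.
int slow_init_calls = 0;

// The attribute goes at the end of the declaration, as in the revised patch.
int expensive_init () MY_COLD;

int
expensive_init ()
{
  ++slow_init_calls;
  return 42;
}

int
get_value ()
{
  // Local static initialisation runs expensive_init only once; later
  // calls take the fast path and never reach the cold function.
  static int value = expensive_init ();
  return value;
}
```

The attribute is only a hint about how often the function is expected to
run; it does not change what the program computes.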
>
> ```
> commit e2f299679ddf56a6d6d71ea9d589cd76b2ca107b
> Author: Aditya Kumar <1894981+hiradi...@users.noreply.github.com>
> Date:   Thu Aug 13 09:41:34 2020 -0700
>
> Add cold attribute to one time construction APIs
>
> __cxa_guard_acquire is used for only one purpose,
> namely guarding local static variable initialization,
> and since that purpose is definitionally cold, it should be attributed as cold.
> Similarly for __cxa_guard_release and __cxa_guard_abort
>
> diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
> index b1fad59d4..359e955a7 100644
> --- a/libstdc++-v3/include/bits/c++config
> +++ b/libstdc++-v3/include/bits/c++config
> @@ -39,20 +39,24 @@
>  // Macros for various attributes.
>  //   _GLIBCXX_PURE
>  //   _GLIBCXX_CONST
>  //   _GLIBCXX_NORETURN
>  //   _GLIBCXX_NOTHROW
>  //   _GLIBCXX_VISIBILITY
>  #ifndef _GLIBCXX_PURE
>  # define _GLIBCXX_PURE __attribute__ ((__pure__))
>  #endif
>
> +#ifndef _GLIBCXX_COLD
> +# define _GLIBCXX_COLD __attribute__ ((cold))
> +#endif
> +
>  #ifndef _GLIBCXX_CONST
>  # define _GLIBCXX_CONST 

Re: [PATCH v2] rs6000: ICE when using an MMA type as a function param or return value [PR96506]

2020-08-13 Thread Peter Bergner via Gcc-patches
On 8/12/20 8:59 PM, Peter Bergner wrote:
> On 8/12/20 8:00 PM, Segher Boessenkool wrote:
>> On Wed, Aug 12, 2020 at 03:32:18PM -0500, Peter Bergner wrote:
> Ok, how about this comment then?
> 
> @@ -6444,8 +6444,30 @@ machine_mode
>  rs6000_promote_function_mode (const_tree type ATTRIBUTE_UNUSED,
>   machine_mode mode,
>   int *punsignedp ATTRIBUTE_UNUSED,
> - const_tree, int)
> + const_tree, int for_return)
>  {
> +  /* Warning: this is a static local variable and not always NULL!
> + This function is called multiple times for the same function
> + and return value.  PREV_FUNC is used to keep track of the
> + first time we encounter a function's return value in order
> + to not report an error with that return value multiple times.  */
> +  static struct function *prev_func = NULL;

Approved offline, so I pushed this to trunk.  Thanks!

Are we ok to backport this to GCC 10?  If you don't want this
trickery in GCC 10, we could just backport the param handling
which doesn't use the trickery and leave the return value
unhandled.

Peter





[committed]: i386: Improve CET builtin expanders.

2020-08-13 Thread Uros Bizjak via Gcc-patches
Several fixes to CET builtin expanders:

a) Split out explicit zeroing of RDSSP output operand.
b) Use DImode memory operand for RSTORSSP and CLRSSBSY instructions.
c) Use parameterized pattern names to simplify calling of named patterns.

2020-08-13  Uroš Bizjak  

gcc/ChangeLog:

* config/i386/i386-builtin.def (CET_NORMAL): Merge to CET BDESC array.
(__builtin_ia32_rdsspd, __builtin_ia32_rdsspq, __builtin_ia32_incsspd)
(__builtin_ia32_incsspq, __builtin_ia32_wrssd, __builtin_ia32_wrssq)
(__builtin_ia32_wrussd, __builtin_ia32_wrussq): Use CODE_FOR_nothing.
* config/i386/i386-builtins.c: Remove handling of CET_NORMAL builtins.
* config/i386/i386.md (@rdssp): Implement as parametrized
name pattern.  Use SWI48 mode iterator.  Introduce input operand
and remove explicit XOR zeroing from insn template.
(@incssp): Implement as parametrized name pattern.
Use SWI48 mode iterator.
(@wrss): Ditto.
(@wruss): Ditto.
(rstorssp): Remove expander.  Rename insn pattern from *rstorssp.
Use DImode memory operand.
(clrssbsy): Remove expander.  Rename insn pattern from *clrssbsy.
Use DImode memory operand.
(save_stack_nonlocal): Update for parametrized name patterns.
Use cleared register as an argument to gen_rdssp.
(restore_stack_nonlocal): Update for parametrized name patterns.
* config/i386/i386-expand.c (ix86_expand_builtin):
[case IX86_BUILTIN_RDSSPD, case IX86_BUILTIN_RDSSPQ]: Expand here.
[case IX86_BUILTIN_INCSSPD, case IX86_BUILTIN_INCSSPQ]: Ditto.
[case IX86_BUILTIN_RSTORSSP, case IX86_BUILTIN_CLRSSBSY]:
Generate DImode memory operand.
[case IX86_BUILTIN_WRSSD, case IX86_BUILTIN_WRSSQ]
[case IX86_BUILTIN_WRUSSD, case IX86_BUILTIN_WRUSSQ]:
Update for parameterized name patterns.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 6270068fba1..25b80868bd3 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -3126,21 +3126,17 @@ BDESC_END (MULTI_ARG, CET)
 
 /* CET.  */
 BDESC_FIRST (cet, CET,
-   OPTION_MASK_ISA_SHSTK, 0, CODE_FOR_incsspsi, "__builtin_ia32_incsspd", 
IX86_BUILTIN_INCSSPD, UNKNOWN, (int) VOID_FTYPE_UNSIGNED)
-BDESC (OPTION_MASK_ISA_SHSTK | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_incsspdi, 
"__builtin_ia32_incsspq", IX86_BUILTIN_INCSSPQ, UNKNOWN, (int) 
VOID_FTYPE_UINT64)
+   OPTION_MASK_ISA_SHSTK, 0, CODE_FOR_nothing, "__builtin_ia32_rdsspd", 
IX86_BUILTIN_RDSSPD, UNKNOWN, (int) UINT_FTYPE_VOID)
+BDESC (OPTION_MASK_ISA_SHSTK | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_nothing, 
"__builtin_ia32_rdsspq", IX86_BUILTIN_RDSSPQ, UNKNOWN, (int) UINT64_FTYPE_VOID)
+BDESC (OPTION_MASK_ISA_SHSTK, 0, CODE_FOR_nothing, "__builtin_ia32_incsspd", 
IX86_BUILTIN_INCSSPD, UNKNOWN, (int) VOID_FTYPE_UNSIGNED)
+BDESC (OPTION_MASK_ISA_SHSTK | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_nothing, 
"__builtin_ia32_incsspq", IX86_BUILTIN_INCSSPQ, UNKNOWN, (int) 
VOID_FTYPE_UINT64)
 BDESC (OPTION_MASK_ISA_SHSTK, 0, CODE_FOR_saveprevssp, 
"__builtin_ia32_saveprevssp", IX86_BUILTIN_SAVEPREVSSP, UNKNOWN, (int) 
VOID_FTYPE_VOID)
 BDESC (OPTION_MASK_ISA_SHSTK, 0, CODE_FOR_rstorssp, "__builtin_ia32_rstorssp", 
IX86_BUILTIN_RSTORSSP, UNKNOWN, (int) VOID_FTYPE_PVOID)
-BDESC (OPTION_MASK_ISA_SHSTK, 0, CODE_FOR_wrsssi, "__builtin_ia32_wrssd", 
IX86_BUILTIN_WRSSD, UNKNOWN, (int) VOID_FTYPE_UNSIGNED_PVOID)
-BDESC (OPTION_MASK_ISA_SHSTK | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_wrssdi, 
"__builtin_ia32_wrssq", IX86_BUILTIN_WRSSQ, UNKNOWN, (int) 
VOID_FTYPE_UINT64_PVOID)
-BDESC (OPTION_MASK_ISA_SHSTK, 0, CODE_FOR_wrusssi, "__builtin_ia32_wrussd", 
IX86_BUILTIN_WRUSSD, UNKNOWN, (int) VOID_FTYPE_UNSIGNED_PVOID)
-BDESC (OPTION_MASK_ISA_SHSTK | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_wrussdi, 
"__builtin_ia32_wrussq", IX86_BUILTIN_WRUSSQ, UNKNOWN, (int) 
VOID_FTYPE_UINT64_PVOID)
+BDESC (OPTION_MASK_ISA_SHSTK, 0, CODE_FOR_nothing, "__builtin_ia32_wrssd", 
IX86_BUILTIN_WRSSD, UNKNOWN, (int) VOID_FTYPE_UNSIGNED_PVOID)
+BDESC (OPTION_MASK_ISA_SHSTK | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_nothing, 
"__builtin_ia32_wrssq", IX86_BUILTIN_WRSSQ, UNKNOWN, (int) 
VOID_FTYPE_UINT64_PVOID)
+BDESC (OPTION_MASK_ISA_SHSTK, 0, CODE_FOR_nothing, "__builtin_ia32_wrussd", 
IX86_BUILTIN_WRUSSD, UNKNOWN, (int) VOID_FTYPE_UNSIGNED_PVOID)
+BDESC (OPTION_MASK_ISA_SHSTK | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_nothing, 
"__builtin_ia32_wrussq", IX86_BUILTIN_WRUSSQ, UNKNOWN, (int) 
VOID_FTYPE_UINT64_PVOID)
 BDESC (OPTION_MASK_ISA_SHSTK, 0, CODE_FOR_setssbsy, "__builtin_ia32_setssbsy", 
IX86_BUILTIN_SETSSBSY, UNKNOWN, (int) VOID_FTYPE_VOID)
 BDESC (OPTION_MASK_ISA_SHSTK, 0, CODE_FOR_clrssbsy, "__builtin_ia32_clrssbsy", 
IX86_BUILTIN_CLRSSBSY, UNKNOWN, (int) VOID_FTYPE_PVOID)
 
-BDESC_END (CET, CET_NORMAL)
-
-BDESC_FIRST (cet_rdssp, CET_NORMAL,
-   OPTION_MASK_ISA_SHSTK, 0, CODE_FOR_rdsspsi, "__builtin_ia32_rdsspd", 
IX86_BUILTIN_RDSSPD, UNKNOWN, (int) 

Re: [PATCH] rs6000, restrict bfloat convert intrinsic to Power 10. Fix BU_P10V macro definitions.

2020-08-13 Thread Bill Schmidt via Gcc-patches

On 8/13/20 11:12 AM, Carl Love wrote:

GCC maintainers:

The macro expansion for the bfloat convert intrinsics XVCVBF16SP and
XVCVSPBF16 needs to be restricted to P10.

The macro expansions BU_P10V_0, BU_P10V_1, BU_P10V_2, BU_P10V_3 expand
the name field as "__builtin_altivec_".  These macro expansions are
being used for both VSX and Altivec instructions.  There needs to be
separate expansions for VSX with the name field "__builtin_vsx_" and
for Altivec with the name field "__builtin_altivec_".

The following patch creates new macro expansions BU_P10V_VSX_# and
BU_P10V_AV_# for the VSX and Altivec instructions respectively.  The
new names are consistent with the P8 and P9 naming convention for the
VSX and Altivec instructions.

The macro expansion for XVCVBF16SP and XVCVSPBF16 is changed from
BU_VSX_1 to BU_P10V_VSX_1 to restrict it to P10 and beyond.  Also MISC
is changed to CONST in the macro expansion call.

The side effect of creating the macro expansions for VSX and Altivec is
it changes all of the expanded names.  The patch fixes all the uses of
the expanded names as needed for the new VSX and Altivec macros.

The patch has been run on

powerpc64le-unknown-linux-gnu (Power 8 LE)
powerpc64le-unknown-linux-gnu (Power 9 LE)

with no regressions.

Please let me know if the patch is acceptable for trunk.



Hi Carl,

Thanks for cleaning up the consistency issue.  The new names and related 
adjustments LGTM.


Are there no affected test cases that need adjusting?  That surprises 
me.  For example, didn't __builtin_altivec_xxeval become 
__builtin_vsx_xxeval as a result of this change?  Does that not appear 
in any test cases?


Thanks,

Bill



 Carl Love

-
[PATCH] rs6000, restrict bfloat convert intrinsic to Power 10. Fix BU_P10V 
macro definitions.

gcc/ChangeLog

2020-08-12  Carl Love  
* config/rs6000/rs6000-builtin.def (BU_P10V_0, BU_P10V_1,
BU_P10V_2, BU_P10V_3): Rename BU_P10V_VSX_0, BU_P10V_VSX_1,
BU_P10V_VSX_2, BU_P10V_VSX_3 respectively.
(BU_P10V_4): Remove.
(BU_P10V_AV_0, BU_P10V_AV_1, BU_P10V_AV_2, BU_P10V_AV_3, BU_P10V_AV_4):
New definitions for Power 10 Altivec macros.
(VSTRIBR, VSTRIHR, VSTRIBL, VSTRIHL, VSTRIBR_P, VSTRIHR_P,
VSTRIBL_P, VSTRIHL_P, MTVSRBM, MTVSRHM, MTVSRWM, MTVSRDM, MTVSRQM,
VEXPANDMB, VEXPANDMH, VEXPANDMW, VEXPANDMD, VEXPANDMQ, VEXTRACTMB,
VEXTRACTMH, VEXTRACTMW, VEXTRACTMD, VEXTRACTMQ): Replace macro
expansion BU_P10V_1 with BU_P10V_AV_1.
(VCLRLB, VCLRRB, VCFUGED, VCLZDM, VCTZDM, VPDEPD, VPEXTD, VGNB,
VCNTMBB, VCNTMBH, VCNTMBW, VCNTMBD): Replace macro expansion
BU_P10V_2 with BU_P10V_AV_2.
(VEXTRACTBL, VEXTRACTHL, VEXTRACTWL, VEXTRACTDL, VEXTRACTBR, VEXTRACTHR,
VEXTRACTWR, VEXTRACTDR, VINSERTGPRBL, VINSERTGPRHL, VINSERTGPRWL,
VINSERTGPRDL, VINSERTVPRBL, VINSERTVPRHL, VINSERTVPRWL, VINSERTGPRBR,
VINSERTGPRHR, VINSERTGPRWR, VINSERTGPRDR, VINSERTVPRBR, VINSERTVPRHR,
VINSERTVPRWR, VREPLACE_ELT_V4SI, VREPLACE_ELT_UV4SI, VREPLACE_ELT_V2DF,
VREPLACE_ELT_V4SF, VREPLACE_ELT_V2DI, VREPLACE_ELT_UV2DI, VREPLACE_UN_V4SI,
VREPLACE_UN_UV4SI, VREPLACE_UN_V4SF, VREPLACE_UN_V2DI, VREPLACE_UN_UV2DI,
VREPLACE_UN_V2DF, VSLDB_V16QI, VSLDB_V8HI, VSLDB_V4SI, VSLDB_V2DI,
VSRDB_V16QI, VSRDB_V8HI, VSRDB_V4SI, VSRDB_V2DI): Replace macro expansion
BU_P10V_3 with BU_P10V_AV_3.
(VXXSPLTIW_V4SI, VXXSPLTIW_V4SF, VXXSPLTID): Replace macro expansion
BU_P10V_1 with BU_P10V_AV_1.
(XXGENPCVM_V16QI, XXGENPCVM_V8HI, XXGENPCVM_V4SI, XXGENPCVM_V2DI):
Replace macro expansion BU_P10V_2 with BU_P10V_VSX_2.
(VXXSPLTI32DX_V4SI, VXXSPLTI32DX_V4SF, VXXBLEND_V16QI, VXXBLEND_V8HI,
VXXBLEND_V4SI, VXXBLEND_V2DI, VXXBLEND_V4SF, VXXBLEND_V2DF): Replace macro
expansion BU_P10V_3 with BU_P10V_VSX_3.
(XXEVAL, VXXPERMX): Replace macro expansion BU_P10V_4 with BU_P10V_VSX_4.
(XVCVBF16SP, XVCVSPBF16): Replace macro expansion BU_VSX_1 with
BU_P10V_VSX_1. Also change MISC to CONST.
* config/rs6000/rs6000-c.c: (P10_BUILTIN_VXXPERMX): Replace with
P10V_BUILTIN_VXXPERMX.
(P10_BUILTIN_VCLRLB, P10_BUILTIN_VCLRLB, P10_BUILTIN_VCLRRB,
P10_BUILTIN_VGNB, P10_BUILTIN_XXEVAL, P10_BUILTIN_VXXPERMX,
P10_BUILTIN_VEXTRACTBL, P10_BUILTIN_VEXTRACTHL, P10_BUILTIN_VEXTRACTWL,
P10_BUILTIN_VEXTRACTDL, P10_BUILTIN_VINSERTGPRHL,
P10_BUILTIN_VINSERTGPRWL, P10_BUILTIN_VINSERTGPRDL,
P10_BUILTIN_VINSERTVPRBL, P10_BUILTIN_VINSERTVPRHL,
P10_BUILTIN_VEXTRACTBR, P10_BUILTIN_VEXTRACTHR,
P10_BUILTIN_VEXTRACTWR, P10_BUILTIN_VEXTRACTDR,
P10_BUILTIN_VINSERTGPRBR, P10_BUILTIN_VINSERTGPRHR,
P10_BUILTIN_VINSERTGPRWR, P10_BUILTIN_VINSERTGPRDR,
P10_BUILTIN_VINSERTVPRBR, 

Re: [Patch 1/5] rs6000, Add 128-bit sign extension support

2020-08-13 Thread Segher Boessenkool
On Thu, Aug 13, 2020 at 11:09:10AM -0700, Carl Love wrote:
> The builtins
> 
> vector signed int vec_signexti (vector signed char a)
> vector signed long long vec_signextll (vector signed char a)
> vector signed int vec_signexti (vector signed short a)
> vector signed long long vec_signextll (vector signed short a)
> vector signed long long vec_signextll (vector signed int a)
> 
> were defined in the function prototypes directory in box called "RFC
> 2608 - 128-bit Binary Integer Operations".  They document the new P10
> builtins.  However, this subset of the newly defined builtins for P10
> can be implemented with existing Power 9 instructions.  That was the
> point of the comment.

Ah, I see :-)

> That is probably a level of detail that is not
> really needed in the GCC code comment.  Probably best to just change
> the comment to read something like "ISA 3.0 sign extend builtins". 

Sounds good.

> My thought for calling it out is that they could be back ported to an
> earlier GCC version since they use Power 9 instructions but it is
> probably not worth the effort unless there is an explicit request for
> them. 

Yeah.  Thanks for the explanation!


Segher


c++: Unconfuse lookup_name_real API a bit

2020-08-13 Thread Nathan Sidwell
modules has uncovered some exciting issues with hidden friends, and I 
need to wander into name lookup again.  This is a piece I punted on the 
first time round.  But now I have C++11, it's better than it would have 
been :)


The API for lookup_name_real is really confusing.  This addresses the 
part where we have NONCLASS to say DON'T search class scopes, and 
BLOCK_P to say DO search block scopes.  I've added a single bitmask to 
explicitly say which scopes to search.  I used an enum class so one 
can't accidentally misorder it.  It's also reordered so we don't mix it 
up with the parameters that say what kind of thing we're looking for.
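
For readers who have not looked at the patch, a minimal sketch of the scoped-enum bitmask idea follows. The enumerator values, the `demo` namespace and the `searches` helper are assumptions for illustration and are not copied from name-lookup.h; only the enumerator names that appear in the patch are reused.

```cpp
namespace demo
{
// A scoped enum does not convert implicitly to or from int, so it cannot
// be swapped with a neighbouring int parameter without a compile error.
enum class LOOK_where : unsigned
{
  BLOCK = 1 << 0,      // look in block scopes
  CLASS = 1 << 1,      // look in class scopes
  NAMESPACE = 1 << 2,  // look in namespace scopes

  BLOCK_NAMESPACE = BLOCK | NAMESPACE
};

// Scoped enums have no built-in bitwise operators, hence the overloads.
constexpr LOOK_where
operator| (LOOK_where a, LOOK_where b)
{
  return LOOK_where (unsigned (a) | unsigned (b));
}

constexpr LOOK_where
operator& (LOOK_where a, LOOK_where b)
{
  return LOOK_where (unsigned (a) & unsigned (b));
}

// Hypothetical helper: does WHERE ask for SCOPE to be searched?
constexpr bool
searches (LOOK_where where, LOOK_where scope)
{
  return unsigned (where & scope) != 0;
}
}
```

With this shape a call site names the scopes it wants, for example `LOOK_where::BLOCK | LOOK_where::NAMESPACE`, instead of passing a pair of flags with opposite polarity.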


gcc/cp/
* name-lookup.h (enum class LOOK_where): New.
(operator|, operator&): Overloads for it.
(lookup_name_real): Replace NONCLASS & BLOCK_P parms with WHERE.
* name-lookup.c (identifier_type_value_w): Adjust
lookup_name_real call.
(lookup_name_real_1): Replace NONCLASS and BLOCK_P parameters
with WHERE bitmask. Don't search namespaces if not asked to.
(lookup_name_real): Adjust lookup_name_real_1 call.
(lookup_name_nonclass, lookup_name)
(lookup_name_prefer_type): Likewise.
* call.c (build_operator_new_call)
(add_operator_candidates): Adjust lookup_name_real calls.
* parser.c (cp_parser_lookup_name): Likewise.
* pt.c (tsubst_friend_class, lookup_init_capture_pack)
(tsubst_expr): Likewise.
* semantics.c (capture_decltype): Likewise.
libcc1/
* libcp1plugin.cc (plugin_build_dependent_expr): Likewise.

pushing to trunk

nathan
--
Nathan Sidwell
diff --git c/gcc/cp/call.c w/gcc/cp/call.c
index f164b211c9f..47a368d069d 100644
--- c/gcc/cp/call.c
+++ w/gcc/cp/call.c
@@ -4704,7 +4704,7 @@ build_operator_new_call (tree fnname, vec **args,
up in the global scope.
 
  we disregard block-scope declarations of "operator new".  */
-  fns = lookup_name_real (fnname, 0, 1, /*block_p=*/false, 0, 0);
+  fns = lookup_name_real (fnname, LOOK_where::NAMESPACE, 0, 0, 0);
   fns = lookup_arg_dependent (fnname, fns, *args);
 
   if (align_arg)
@@ -5982,7 +5982,8 @@ add_operator_candidates (z_candidate **candidates,
  consider.  */
   if (!memonly)
 {
-  tree fns = lookup_name_real (fnname, 0, 1, /*block_p=*/true, 0, 0);
+  tree fns = lookup_name_real (fnname, LOOK_where::BLOCK_NAMESPACE,
+   0, 0, 0);
   fns = lookup_arg_dependent (fnname, fns, arglist);
   add_candidates (fns, NULL_TREE, arglist, NULL_TREE,
 		  NULL_TREE, false, NULL_TREE, NULL_TREE,
diff --git c/gcc/cp/name-lookup.c w/gcc/cp/name-lookup.c
index 9f30d907a09..4fdac9421d1 100644
--- c/gcc/cp/name-lookup.c
+++ w/gcc/cp/name-lookup.c
@@ -3741,7 +3741,7 @@ identifier_type_value_1 (tree id)
 return REAL_IDENTIFIER_TYPE_VALUE (id);
   /* Have to search for it. It must be on the global level, now.
  Ask lookup_name not to return non-types.  */
-  id = lookup_name_real (id, 2, 1, /*block_p=*/true, 0, 0);
+  id = lookup_name_real (id, LOOK_where::BLOCK_NAMESPACE, 2, 0, 0);
   if (id)
 return TREE_TYPE (id);
   return NULL_TREE;
@@ -6413,10 +6413,16 @@ innermost_non_namespace_value (tree name)
namespace of variables, functions and typedefs.  Return a ..._DECL
node of some kind representing its definition if there is only one
such declaration, or return a TREE_LIST with all the overloaded
-   definitions if there are many, or return 0 if it is undefined.
+   definitions if there are many, or return NULL_TREE if it is undefined.
Hidden name, either friend declaration or built-in function, are
not ignored.
 
+   WHERE controls which scopes are considered.  It is a bit mask of
+   LOOKUP_where::BLOCK (look in block scope), LOOKUP_where::CLASS
+   (look in class scopes) & LOOKUP_where::NAMESPACE (look in namespace
+   scopes).  It is an error for no bits to be set.  These scopes are
+   searched from innermost to outermost.
+
If PREFER_TYPE is > 0, we prefer TYPE_DECLs or namespaces.
If PREFER_TYPE is > 1, we reject non-type decls (e.g. namespaces).
Otherwise we prefer non-TYPE_DECLs.
@@ -6425,12 +6431,14 @@ innermost_non_namespace_value (tree name)
BLOCK_P is false, bindings in block scopes are ignored.  */
 
 static tree
-lookup_name_real_1 (tree name, int prefer_type, int nonclass, bool block_p,
+lookup_name_real_1 (tree name, LOOK_where where, int prefer_type,
 		int namespaces_only, int flags)
 {
   cxx_binding *iter;
   tree val = NULL_TREE;
 
+  gcc_checking_assert (unsigned (where) != 0);
+
   query_oracle (name);
 
   /* Conversion operators are handled specially because ordinary
@@ -6468,17 +6476,19 @@ lookup_name_real_1 (tree name, int prefer_type, int nonclass, bool block_p,
   /* First, look in non-namespace scopes.  */
 
   if (current_class_type == NULL_TREE)
-nonclass = 1;
+/* Maybe avoid searching the binding stack at all.  */
+where = LOOK_where (unsigned 

RE: [Patch 1/5] rs6000, Add 128-bit sign extension support

2020-08-13 Thread Carl Love via Gcc-patches
Segher:

On Thu, 2020-08-13 at 12:36 -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Aug 11, 2020 at 12:22:37PM -0700, Carl Love wrote:
> > +/* Sign extend builtins that work on ISA 3.0, but not defined
> > until ISA 3.1.  */
> 
> What does this mean?  Not defined in GCC before now?  Does it need
> backporting?  Not defined in older versions of the ELFv2 ABI (or
> vector
> doc) and we do not want a backport?
> 
> > +  /* Sign extend builtins that work work on ISA 3.0, not added
> > until ISA 3.1 */

The builtins

vector signed int vec_signexti (vector signed char a)
vector signed long long vec_signextll (vector signed char a)
vector signed int vec_signexti (vector signed short a)
vector signed long long vec_signextll (vector signed short a)
vector signed long long vec_signextll (vector signed int a)

were defined in the function prototypes directory in box called "RFC
2608 - 128-bit Binary Integer Operations".  They document the new P10
builtins.  However, this subset of the newly defined builtins for P10
can be implemented with existing Power 9 instructions.  That was the
point of the comment.  That is probably a level of detail that is not
really needed in the GCC code comment.  Probably best to just change
the comment to read something like "ISA 3.0 sign extend builtins". 

My thought for calling it out is that they could be back ported to an
earlier GCC version since they use Power 9 instructions but it is
probably not worth the effort unless there is an explicit request for
them. 

 Carl 



Re: [PATCH] improve memcmp and memchr constant folding (PR 78257)

2020-08-13 Thread Jakub Jelinek via Gcc-patches
On Thu, Aug 13, 2020 at 11:44:46AM -0600, Martin Sebor via Gcc-patches wrote:
> The underlined code above zeroes out the bytes of elements with
> no initializers as well as any padding between fields.  It doesn't
> consider CONSTRUCTOR_NO_CLEARING.  I didn't know about that bit so
> I looked it up.  According to the internals manual:
> 
>   Unrepresented fields will be cleared (zeroed), unless the
>   CONSTRUCTOR_NO_CLEARING flag is set, in which case their value
>   becomes undefined.

CONSTRUCTOR_NO_CLEARING shouldn't be relevant to the middle-end (after
gimplification).
Static variable initializers have zero initialization with or without that
bit, and other than that we only allow empty CONSTRUCTORs to mean all zeros
or VECTOR CONSTRUCTORs where missing elts are zero initialized too but
shouldn't really appear.

Jakub



Re: [PATCH] improve memcmp and memchr constant folding (PR 78257)

2020-08-13 Thread Martin Sebor via Gcc-patches

On 8/13/20 10:21 AM, Jeff Law wrote:

On Fri, 2020-07-31 at 17:55 -0600, Martin Sebor via Gcc-patches wrote:

The folders for these functions (and some others) call c_getstr
which relies on string_constant to return the representation of
constant strings.  Because the function doesn't handle constants
of other types, including aggregates, memcmp or memchr calls
involving those are not folded when they could be.

The attached patch extends the algorithm used by string_constant
to also handle constant aggregates involving elements or members
of the same types as native_encode_expr.  (The change restores
the empty initializer optimization inadvertently disabled in
the fix for pr96058.)

To avoid accidentally misusing either string_constant or c_getstr
with non-strings I have introduced a pair of new functions to get
the representation of those: byte_representation and getbyterep.
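
As an illustration of the kind of call the description is about, here is a small sketch with constant aggregates rather than strings. The names are made up for the example, and nothing here claims that a particular GCC version folds these calls; it only shows inputs whose byte representation is fully known at compile time.

```cpp
#include <cstring>

// Constant arrays of a non-character type.  The elements of `a` without an
// initializer are zero, so the two arrays have identical bytes.
static const int a[4] = { 1, 2 };
static const int b[4] = { 1, 2, 0, 0 };

// A constant aggregate with two char members and no padding.
struct pair_t { char first; char second; };
static const pair_t p = { 'x', 'y' };

// Both results depend only on constant initializers, so a compiler that
// can see the byte representation could in principle evaluate them early.
bool
arrays_equal ()
{
  return std::memcmp (a, b, sizeof a) == 0;
}

bool
has_y ()
{
  return std::memchr (&p, 'y', sizeof p) != nullptr;
}
```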

Tested on x86_64-linux.

Martin



PR tree-optimization/78257 - missing memcmp optimization with constant arrays

gcc/ChangeLog:

PR middle-end/78257
* builtins.c (expand_builtin_memory_copy_args): Rename called function.
(expand_builtin_stpcpy_1): Remove argument from call.
(expand_builtin_memcmp): Rename called function.
(inline_expand_builtin_bytecmp): Same.
* expr.c (convert_to_bytes): New function.
(constant_byte_string): New function (formerly string_constant).
(string_constant): Call constant_byte_string.
(byte_representation): New function.
* expr.h (byte_representation): Declare.
* fold-const-call.c (fold_const_call): Rename called function.
* fold-const.c (c_getstr): Remove an argument.
(getbyterep): Define a new function.
* fold-const.h (c_getstr): Remove an argument.
(getbyterep): Declare a new function.
* gimple-fold.c (gimple_fold_builtin_memory_op): Rename callee.
(gimple_fold_builtin_string_compare): Same.
(gimple_fold_builtin_memchr): Same.

gcc/testsuite/ChangeLog:

PR middle-end/78257
* gcc.dg/memchr.c: New test.
* gcc.dg/memcmp-2.c: New test.
* gcc.dg/memcmp-3.c: New test.
* gcc.dg/memcmp-4.c: New test.

diff --git a/gcc/expr.c b/gcc/expr.c
index a150fa0d3b5..a124df54655 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -11594,15 +11594,103 @@ is_aligning_offset (const_tree offset, const_tree exp)
/* This must now be the address of EXP.  */
return TREE_CODE (offset) == ADDR_EXPR && TREE_OPERAND (offset, 0) == exp;
  }
-
-/* Return the tree node if an ARG corresponds to a string constant or zero
-   if it doesn't.  If we return nonzero, set *PTR_OFFSET to the (possibly
-   non-constant) offset in bytes within the string that ARG is accessing.
-   If MEM_SIZE is non-zero the storage size of the memory is returned.
-   If DECL is non-zero the constant declaration is returned if available.  */
  
-tree

-string_constant (tree arg, tree *ptr_offset, tree *mem_size, tree *decl)
+/* If EXPR is a constant initializer (either an expression or CONSTRUCTOR),
+   attempt to obtain its native representation as an array of nonzero BYTES.
+   Return true on success and false on failure (the latter without modifying
+   BYTES).  */
+
+static bool
+convert_to_bytes (tree type, tree expr, vec *bytes)
+{
+  if (TREE_CODE (expr) == CONSTRUCTOR)
+{
+  /* Set to the size of the CONSTRUCTOR elements.  */
+  unsigned HOST_WIDE_INT ctor_size = bytes->length ();
+
+  if (TREE_CODE (type) == ARRAY_TYPE)
+   {
+ tree val, idx;
+ tree eltype = TREE_TYPE (type);
+ unsigned HOST_WIDE_INT elsize =
+   tree_to_uhwi (TYPE_SIZE_UNIT (eltype));
+ unsigned HOST_WIDE_INT i, last_idx = HOST_WIDE_INT_M1U;
+ FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (expr), i, idx, val)
+   {
+ /* Append zeros for elements with no initializers.  */
+ if (!tree_fits_uhwi_p (idx))
+   return false;
+ unsigned HOST_WIDE_INT cur_idx = tree_to_uhwi (idx);
+ if (unsigned HOST_WIDE_INT size = cur_idx - (last_idx + 1))
+   {
+ size = size * elsize + bytes->length ();
+ bytes->safe_grow_cleared (size);

  ^^^


+   }
+
+ if (!convert_to_bytes (eltype, val, bytes))
+   return false;
+
+ last_idx = cur_idx;
+   }
+   }
+  else if (TREE_CODE (type) == RECORD_TYPE)
+   {
+ tree val, fld;
+ unsigned HOST_WIDE_INT i;
+ FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (expr), i, fld, val)
+   {
+ /* Append zeros for members with no initializers and
+any padding.  */
+ unsigned HOST_WIDE_INT cur_off = int_byte_position (fld);
+ if (bytes->length () < cur_off)
+   bytes->safe_grow_cleared (cur_off);


Re: [Patch 1/5] rs6000, Add 128-bit sign extension support

2020-08-13 Thread Segher Boessenkool
Hi!

On Tue, Aug 11, 2020 at 12:22:37PM -0700, Carl Love wrote:
> +/* Sign extend builtins that work on ISA 3.0, but not defined until ISA 3.1. 
>  */

What does this mean?  Not defined in GCC before now?  Does it need
backporting?  Not defined in older versions of the ELFv2 ABI (or vector
doc) and we do not want a backport?

> +  /* Sign extend builtins that work work on ISA 3.0, not added until ISA 3.1 
> */

Same (also "work work").

> +uThe following sign extension builtins are provided.

(stray "u")

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c
> @@ -0,0 +1,128 @@
> +/* { dg-do run { target { powerpc*-*-linux* && { lp64 && p9vector_hw } } } } 
> */

/* { dg-do run { target { lp64 && p9vector_hw } } } */

or such; or do you require Linux actually?

> +/* { dg-options "-O2 -mdejagnu-cpu=power9 -save-temps" } */

Is -save-temps needed?  Not for the scan-assembler at least.

Okay for trunk with those details taken care of.  Thanks!


Segher


Re: Add cold attribute to one time construction APIs

2020-08-13 Thread Jonathan Wakely via Gcc-patches
Please CC the libstdc++ list on all libstdc++ patches.

On Thu, 13 Aug 2020 at 17:51, Aditya K  wrote:
>
> Revised patch with _GLIBCXX_COLD added at the end.
>
> ```
> commit 3dc9f9a8461b1c88e991ceb517e5fdd81f268d1e
> Author: Aditya Kumar <1894981+hiradi...@users.noreply.github.com>
> Date:   Thu Aug 13 09:41:34 2020 -0700
>
> Add cold attribute to one time construction APIs
>
> __cxa_guard_acquire is used for only one purpose,
> namely guarding local static variable initialization,
> and since that purpose is definitionally cold, it should be attributed as 
> cold.
> Similarly for __cxa_guard_release and __cxa_guard_abort
>
> diff --git a/libstdc++-v3/include/bits/c++config 
> b/libstdc++-v3/include/bits/c++config
> index b1fad59d4..f6f954eef 100644
> --- a/libstdc++-v3/include/bits/c++config
> +++ b/libstdc++-v3/include/bits/c++config
> @@ -35,20 +35,21 @@
>
>  // The datestamp of the C++ library in compressed ISO date format.
>  #define __GLIBCXX__
>
>  // Macros for various attributes.
>  //   _GLIBCXX_PURE
>  //   _GLIBCXX_CONST
>  //   _GLIBCXX_NORETURN
>  //   _GLIBCXX_NOTHROW
>  //   _GLIBCXX_VISIBILITY
> +//   _GLIBCXX_COLD
>  #ifndef _GLIBCXX_PURE
>  # define _GLIBCXX_PURE __attribute__ ((__pure__))
>  #endif
>
>  #ifndef _GLIBCXX_CONST
>  # define _GLIBCXX_CONST __attribute__ ((__const__))
>  #endif
>
>  #ifndef _GLIBCXX_NORETURN
>  # define _GLIBCXX_NORETURN __attribute__ ((__noreturn__))
> @@ -67,20 +68,24 @@
>  #define _GLIBCXX_HAVE_ATTRIBUTE_VISIBILITY
>
>  #if _GLIBCXX_HAVE_ATTRIBUTE_VISIBILITY
>  # define _GLIBCXX_VISIBILITY(V) __attribute__ ((__visibility__ (#V)))
>  #else
>  // If this is not supplied by the OS-specific or CPU-specific
>  // headers included below, it will be defined to an empty default.
>  # define _GLIBCXX_VISIBILITY(V) _GLIBCXX_PSEUDO_VISIBILITY(V)
>  #endif
>
> +#ifndef _GLIBCXX_COLD
> +# define _GLIBCXX_COLD __attribute__ ((cold))
> +#endif
> +
>  // Macros for deprecated attributes.
>  //   _GLIBCXX_USE_DEPRECATED
>  //   _GLIBCXX_DEPRECATED
>  //   _GLIBCXX17_DEPRECATED
>  //   _GLIBCXX20_DEPRECATED( string-literal )
>  #ifndef _GLIBCXX_USE_DEPRECATED
>  # define _GLIBCXX_USE_DEPRECATED 1
>  #endif
>
>  #if defined(__DEPRECATED) && (__cplusplus >= 201103L)
> diff --git a/libstdc++-v3/libsupc++/cxxabi.h b/libstdc++-v3/libsupc++/cxxabi.h
> index 000713ecd..24c1366e2 100644
> --- a/libstdc++-v3/libsupc++/cxxabi.h
> +++ b/libstdc++-v3/libsupc++/cxxabi.h
> @@ -108,27 +108,27 @@ namespace __cxxabiv1
>__cxa_vec_delete2(void* __array_address, size_t __element_size,
> size_t __padding_size, __cxa_cdtor_type __destructor,
> void (*__dealloc) (void*));
>
>void
>__cxa_vec_delete3(void* __array_address, size_t __element_size,
> size_t __padding_size, __cxa_cdtor_type __destructor,
> void (*__dealloc) (void*, size_t));
>
>int
> -  __cxa_guard_acquire(__guard*);
> +  __cxa_guard_acquire(__guard*) _GLIBCXX_COLD;
>
>void
> -  __cxa_guard_release(__guard*) _GLIBCXX_NOTHROW;
> +  __cxa_guard_release(__guard*) _GLIBCXX_NOTHROW _GLIBCXX_COLD;
>
>void
> -  __cxa_guard_abort(__guard*) _GLIBCXX_NOTHROW;
> +  __cxa_guard_abort(__guard*) _GLIBCXX_NOTHROW _GLIBCXX_COLD;
>
>// DSO destruction.
>int
>__cxa_atexit(void (*)(void*), void*, void*) _GLIBCXX_NOTHROW;
>
>void
>__cxa_finalize(void*);
>
>// TLS destruction.
>int
> ```
>
> From: Aditya K
> Sent: Thursday, August 13, 2020 10:47 AM
> To: Jeff Law via Gcc-patches ; jwakely@gmail.com 
> 
> Subject: Add cold attribute to one time construction APIs
>
> This would help the compiler optimize local static objects.
>
> ```
> commit e2f299679ddf56a6d6d71ea9d589cd76b2ca107b
> Author: Aditya Kumar <1894981+hiradi...@users.noreply.github.com>
> Date:   Thu Aug 13 09:41:34 2020 -0700
>
> Add cold attribute to one time construction APIs
>
> __cxa_guard_acquire is used for only one purpose,
> namely guarding local static variable initialization,
> and since that purpose is definitionally cold, it should be attributed as 
> cold.
> Similarly for __cxa_guard_release and __cxa_guard_abort
>
> diff --git a/libstdc++-v3/include/bits/c++config 
> b/libstdc++-v3/include/bits/c++config
> index b1fad59d4..359e955a7 100644
> --- a/libstdc++-v3/include/bits/c++config
> +++ b/libstdc++-v3/include/bits/c++config
> @@ -39,20 +39,24 @@
>  // Macros for various attributes.
>  //   _GLIBCXX_PURE
>  //   _GLIBCXX_CONST
>  //   _GLIBCXX_NORETURN
>  //   _GLIBCXX_NOTHROW
>  //   _GLIBCXX_VISIBILITY
>  #ifndef _GLIBCXX_PURE
>  # define _GLIBCXX_PURE __attribute__ ((__pure__))
>  #endif
>
> +#ifndef _GLIBCXX_COLD
> +# define _GLIBCXX_COLD __attribute__ ((cold))
> +#endif
> +
>  #ifndef _GLIBCXX_CONST
>  # define _GLIBCXX_CONST __attribute__ ((__const__))
>  #endif
>
>  #ifndef _GLIBCXX_NORETURN
>  # define _GLIBCXX_NORETURN __attribute__ ((__noreturn__))
>  #endif
>
>  // See below for C++
>  #ifndef 

Re: Add cold attribute to one time construction APIs

2020-08-13 Thread Aditya K via Gcc-patches
FYI libc++ patch sent for review: https://reviews.llvm.org/D85873

Re: [PATCH] c++: premature analysis of requires-expression [PR96410]

2020-08-13 Thread Jason Merrill via Gcc-patches

On 8/13/20 11:21 AM, Patrick Palka wrote:

On Mon, 10 Aug 2020, Jason Merrill wrote:


On 8/10/20 2:18 PM, Patrick Palka wrote:

On Mon, 10 Aug 2020, Patrick Palka wrote:


In the below testcase, semantic analysis of the requires-expressions in
the generic lambda must be delayed until instantiation of the lambda
because the requirements depend on the lambda's template arguments.  But
tsubst_requires_expr does semantic analysis even during regeneration of
the lambda, which leads to various bogus errors and ICEs since some
subroutines aren't prepared to handle dependent/template trees.

This patch adjusts subroutines of tsubst_requires_expr to avoid doing
some problematic semantic analyses when processing_template_decl.
In particular, expr_noexcept_p generally can't be checked on a dependent
expression.  Next, tsubst_nested_requirement should avoid checking
satisfaction when processing_template_decl.  And similarly for
convert_to_void (called from tsubst_valid_expression_requirement).


I wonder if, instead of trying to do a partial substitution into a
requires-expression at all, we want to use the
PACK_EXPANSION_EXTRA_ARGS/IF_STMT_EXTRA_ARGS mechanism to remember the
arguments for later satisfaction?


IIUC, avoiding partial substitution into a requires-expression would
mean we'd go from currently accepting the following testcase to
rejecting it, because we'd now instantiate B::type as part of the
first requirement before first noticing the SFINAE error in the second
requirement (which depends only on the outer template argument, and
which would determine the value of the requires-expression):

   template<typename T>
   struct B { using type = T::fatal; };

   template<typename T>
   constexpr auto foo() {
     return [] <typename U> (U) {
       return requires { typename B<U>::type; typename T::type; };
     };
   };

   int i = foo<int>()(0);

I guess this is exactly the kind of testcase that motivates using the
PACK_EXPANSION_EXTRA_ARGS/IF_STMT_EXTRA_ARGS mechanism for
requires-expressions?


I think so, yes.




Bootstrapped and regtested on x86_64-pc-linux-gnu, and also tested
against the cmcstl2 project.  Does this look OK to commit?

gcc/cp/ChangeLog:

PR c++/96409
PR c++/96410
* constraint.cc (tsubst_compound_requirement): When
processing_template_decl, don't check noexcept of the
substituted expression.
(tsubst_nested_requirement): Just substitute into the constraint
when processing_template_decl.
* cvt.c (convert_to_void): Don't resolve concept checks when
processing_template_decl.

gcc/testsuite/ChangeLog:

PR c++/96409
PR c++/96410
* g++.dg/cpp2a/concepts-lambda13.C: New test.
---
   gcc/cp/constraint.cc  |  9 ++-
   gcc/cp/cvt.c  |  2 +-
   .../g++.dg/cpp2a/concepts-lambda13.C  | 25 +++
   3 files changed, 34 insertions(+), 2 deletions(-)
   create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-lambda13.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index e4aace596e7..db2036502a7 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -1993,7 +1993,8 @@ tsubst_compound_requirement (tree t, tree args, subst_info info)
   /* Check the noexcept condition.  */
 bool noexcept_p = COMPOUND_REQ_NOEXCEPT_P (t);
-  if (noexcept_p && !expr_noexcept_p (expr, tf_none))
+  if (!processing_template_decl
+  && noexcept_p && !expr_noexcept_p (expr, tf_none))
   return error_mark_node;
   /* Substitute through the type expression, if any.  */
@@ -2023,6 +2024,12 @@ static tree
   tsubst_nested_requirement (tree t, tree args, subst_info info)
   {
 /* Ensure that we're in an evaluation context prior to satisfaction.
*/
+  if (processing_template_decl)
+{
+  tree r = tsubst_constraint (TREE_OPERAND (t, 0), args,
+ info.complain, info.in_decl);


Oops, the patch is missing a check for error_mark_node here, so that
upon substitution failure we immediately resolve the requires-expression
to false.  Here's an updated patch with the check and a regression test
added:

-- >8 --

gcc/cp/ChangeLog:

PR c++/96409
PR c++/96410
* constraint.cc (tsubst_compound_requirement): When
processing_template_decl, don't check noexcept of the
substituted expression.
(tsubst_nested_requirement): Just substitute into the constraint
when processing_template_decl.
* cvt.c (convert_to_void): Don't resolve concept checks when
processing_template_decl.

gcc/testsuite/ChangeLog:

PR c++/96409
PR c++/96410
* g++.dg/cpp2a/concepts-lambda13.C: New test.
* g++.dg/cpp2a/concepts-lambda14.C: New test.
---
   gcc/cp/constraint.cc  | 11 +++-
   gcc/cp/cvt.c  |  2 +-
   .../g++.dg/cpp2a/concepts-lambda13.C  | 25 +++
   

Re: Add cold attribute to one time construction APIs

2020-08-13 Thread Aditya K via Gcc-patches
Revised patch with _GLIBCXX_COLD added at the end.

```
commit 3dc9f9a8461b1c88e991ceb517e5fdd81f268d1e
Author: Aditya Kumar <1894981+hiradi...@users.noreply.github.com>
Date:   Thu Aug 13 09:41:34 2020 -0700

Add cold attribute to one time construction APIs

__cxa_guard_acquire is used for only one purpose,
namely guarding local static variable initialization,
and since that purpose is definitionally cold, it should be attributed as 
cold.
Similarly for __cxa_guard_release and __cxa_guard_abort

diff --git a/libstdc++-v3/include/bits/c++config 
b/libstdc++-v3/include/bits/c++config
index b1fad59d4..f6f954eef 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -35,20 +35,21 @@
 
 // The datestamp of the C++ library in compressed ISO date format.
 #define __GLIBCXX__
 
 // Macros for various attributes.
 //   _GLIBCXX_PURE
 //   _GLIBCXX_CONST
 //   _GLIBCXX_NORETURN
 //   _GLIBCXX_NOTHROW
 //   _GLIBCXX_VISIBILITY
+//   _GLIBCXX_COLD
 #ifndef _GLIBCXX_PURE
 # define _GLIBCXX_PURE __attribute__ ((__pure__))
 #endif
 
 #ifndef _GLIBCXX_CONST
 # define _GLIBCXX_CONST __attribute__ ((__const__))
 #endif
 
 #ifndef _GLIBCXX_NORETURN
 # define _GLIBCXX_NORETURN __attribute__ ((__noreturn__))
@@ -67,20 +68,24 @@
 #define _GLIBCXX_HAVE_ATTRIBUTE_VISIBILITY
 
 #if _GLIBCXX_HAVE_ATTRIBUTE_VISIBILITY
 # define _GLIBCXX_VISIBILITY(V) __attribute__ ((__visibility__ (#V)))
 #else
 // If this is not supplied by the OS-specific or CPU-specific
 // headers included below, it will be defined to an empty default.
 # define _GLIBCXX_VISIBILITY(V) _GLIBCXX_PSEUDO_VISIBILITY(V)
 #endif
 
+#ifndef _GLIBCXX_COLD
+# define _GLIBCXX_COLD __attribute__ ((cold))
+#endif
+
 // Macros for deprecated attributes.
 //   _GLIBCXX_USE_DEPRECATED
 //   _GLIBCXX_DEPRECATED
 //   _GLIBCXX17_DEPRECATED
 //   _GLIBCXX20_DEPRECATED( string-literal )
 #ifndef _GLIBCXX_USE_DEPRECATED
 # define _GLIBCXX_USE_DEPRECATED 1
 #endif
 
 #if defined(__DEPRECATED) && (__cplusplus >= 201103L)
diff --git a/libstdc++-v3/libsupc++/cxxabi.h b/libstdc++-v3/libsupc++/cxxabi.h
index 000713ecd..24c1366e2 100644
--- a/libstdc++-v3/libsupc++/cxxabi.h
+++ b/libstdc++-v3/libsupc++/cxxabi.h
@@ -108,27 +108,27 @@ namespace __cxxabiv1
   __cxa_vec_delete2(void* __array_address, size_t __element_size,
size_t __padding_size, __cxa_cdtor_type __destructor,
void (*__dealloc) (void*));
 
   void
   __cxa_vec_delete3(void* __array_address, size_t __element_size,
size_t __padding_size, __cxa_cdtor_type __destructor,
void (*__dealloc) (void*, size_t));
 
   int
-  __cxa_guard_acquire(__guard*);
+  __cxa_guard_acquire(__guard*) _GLIBCXX_COLD;
 
   void
-  __cxa_guard_release(__guard*) _GLIBCXX_NOTHROW;
+  __cxa_guard_release(__guard*) _GLIBCXX_NOTHROW _GLIBCXX_COLD;
 
   void
-  __cxa_guard_abort(__guard*) _GLIBCXX_NOTHROW;
+  __cxa_guard_abort(__guard*) _GLIBCXX_NOTHROW _GLIBCXX_COLD;
 
   // DSO destruction.
   int
   __cxa_atexit(void (*)(void*), void*, void*) _GLIBCXX_NOTHROW;
 
   void
   __cxa_finalize(void*);
 
   // TLS destruction.
   int
```

From: Aditya K
Sent: Thursday, August 13, 2020 10:47 AM
To: Jeff Law via Gcc-patches ; jwakely@gmail.com 

Subject: Add cold attribute to one time construction APIs 
 
This would help the compiler optimize local static objects.
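
For context, a rough sketch of the guard pattern around a local static, with the guard functions declared cold. The `demo_` names are hypothetical stand-ins, not the libsupc++ entry points, and the sketch is deliberately single-threaded and ignores the exception path that would call the abort function.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the guard variable and the ABI entry points.
using demo_guard = std::uint64_t;

#define DEMO_COLD __attribute__ ((cold))

// The attribute goes at the end of the declaration, as in the patch.
int demo_guard_acquire (demo_guard *g) DEMO_COLD;
void demo_guard_release (demo_guard *g) DEMO_COLD;

// Simplified definitions: acquire succeeds only while the guard is unset.
int demo_guard_acquire (demo_guard *g) { return *g == 0; }
void demo_guard_release (demo_guard *g) { *g = 1; }

static int init_count = 0;
static int expensive_init () { ++init_count; return 42; }

// Roughly what `static int value = expensive_init ();` turns into: a cheap
// check on every call, with the guard calls on the rarely taken path.
int
get_value ()
{
  static demo_guard guard = 0;
  static int value;
  if (guard == 0)
    {
      if (demo_guard_acquire (&guard))
        {
          value = expensive_init ();
          demo_guard_release (&guard);
        }
    }
  return value;
}
```

Marking the guard functions cold tells the compiler that the branch leading to them is unlikely, which is the effect the patch is after.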

```
commit e2f299679ddf56a6d6d71ea9d589cd76b2ca107b
Author: Aditya Kumar <1894981+hiradi...@users.noreply.github.com>
Date:   Thu Aug 13 09:41:34 2020 -0700

    Add cold attribute to one time construction APIs
    
    __cxa_guard_acquire is used for only one purpose,
    namely guarding local static variable initialization,
    and since that purpose is definitionally cold, it should be attributed as 
cold.
    Similarly for __cxa_guard_release and __cxa_guard_abort

diff --git a/libstdc++-v3/include/bits/c++config 
b/libstdc++-v3/include/bits/c++config
index b1fad59d4..359e955a7 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -39,20 +39,24 @@
 // Macros for various attributes.
 //   _GLIBCXX_PURE
 //   _GLIBCXX_CONST
 //   _GLIBCXX_NORETURN
 //   _GLIBCXX_NOTHROW
 //   _GLIBCXX_VISIBILITY
 #ifndef _GLIBCXX_PURE
 # define _GLIBCXX_PURE __attribute__ ((__pure__))
 #endif
 
+#ifndef _GLIBCXX_COLD
+# define _GLIBCXX_COLD __attribute__ ((cold))
+#endif
+
 #ifndef _GLIBCXX_CONST
 # define _GLIBCXX_CONST __attribute__ ((__const__))
 #endif
 
 #ifndef _GLIBCXX_NORETURN
 # define _GLIBCXX_NORETURN __attribute__ ((__noreturn__))
 #endif
 
 // See below for C++
 #ifndef _GLIBCXX_NOTHROW
diff --git a/libstdc++-v3/libsupc++/cxxabi.h b/libstdc++-v3/libsupc++/cxxabi.h
index 000713ecd..24c1366e2 100644
--- a/libstdc++-v3/libsupc++/cxxabi.h
+++ b/libstdc++-v3/libsupc++/cxxabi.h
@@ -108,27 +108,27 @@ namespace __cxxabiv1
   __cxa_vec_delete2(void* __array_address, size_t __element_size,
 size_t __padding_size, 
```

Re: [PATCH 5/5] extend -Warray-bounds to detect out-of-bounds accesses to array parameters

2020-08-13 Thread Jeff Law via Gcc-patches
On Tue, 2020-07-28 at 19:24 -0600, Martin Sebor via Gcc-patches wrote:
> Patch 5 adds support for -Warray-bounds to detect out of bounds accesses
> in functions that take array/VLA arguments.  The changes also enable
> the warning for dynamically allocated memory and with it the detection
> of accesses that are only partially out of bounds (e.g., accessing
> a four byte int in the last two bytes of a buffer).  In hindsight this
> seems independent of the attribute access enhancement so I suppose it
> could have been split up into a separate change but I doubt it would
> reduce the size of the diff by more than 30 lines.

> [5/5] - Extend -Warray-bounds to detect out-of-bounds accesses to array 
> parameters.
> 
> gcc/ChangeLog:
> 
>   PR middle-end/82608
>   PR middle-end/94195
>   PR c/50584
>   PR middle-end/84051
>   * gimple-array-bounds.cc (get_base_decl): New function.
>   (get_ref_size): New function.
>   (trailing_array): New function.
>   (array_bounds_checker::check_array_ref): Call them.  Handle arrays
>   declared in function parameters.
>   (array_bounds_checker::check_mem_ref):  Same.  Handle references to
>   dynamically allocated arrays.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR middle-end/82608
>   PR middle-end/94195
>   PR c/50584
>   PR middle-end/84051
>   * gcc.dg/Warray-bounds-63.c: New test.
>   * gcc.dg/Warray-bounds-64.c: New test.
>   * gcc.dg/Warray-bounds-65.c: New test.
>   * gcc.dg/Warray-bounds-66.c: New test.
> 
> diff --git a/gcc/gimple-array-bounds.cc b/gcc/gimple-array-bounds.cc
> index c2dd6663c3a..b93ef7a7b74 100644
> --- a/gcc/gimple-array-bounds.cc
> +++ b/gcc/gimple-array-bounds.cc
> @@ -36,6 +36,8 @@ along with GCC; see the file COPYING3.  If not see
>  #include "vr-values.h"
>  #include "domwalk.h"
>  #include "tree-cfg.h"
> +#include "attribs.h"
> +#include "builtins.h"
>  
>  // This purposely returns a value_range, not a value_range_equiv, to
>  // break the dependency on equivalences for this pass.
> @@ -46,19 +48,137 @@ array_bounds_checker::get_value_range (const_tree op)
>return ranges->get_value_range (op);
>  }
>  
> +/* Try to determine the DECL that REF refers to.  Return the DECL or
> +   the expression closest to it.  Used in informational notes pointing
> +   to referenced objects or function parameters.  */
> +
> +static tree
> +get_base_decl (tree ref)
[ ... ]

> +
> +/* Return the constant byte size of the object or type referenced by
> +   the MEM_REF ARG.  On success, set *PREF to the DECL or expression
> +   ARG refers to.  Otherwise return null.  */
> +
> +static tree
> +get_ref_size (tree arg, tree *pref)
[ ... ]
I'm surprised we don't already have routines to do this.  
get_ref_base_and_extent perhaps?

Otherwise it seems reasonable to me.  
Jeff
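
To make the quoted description concrete, here is a minimal sketch of an access past the bound declared in an array parameter. The function names are hypothetical, the out-of-bounds line is left commented out so that nothing invalid runs, and whether a given GCC release diagnoses it in C++ as well as in C is an assumption (the tests named in the ChangeLog live under gcc.dg).

```cpp
#include <cassert>

// The bound in the parameter declaration says callers pass at least four
// elements.  The parameter is really a pointer, so the language itself
// does not enforce the bound.
int sum4(int a[4]) {
  int s = 0;
  for (int i = 0; i < 4; ++i)  // indices 0..3 are within the declared bound
    s += a[i];
  // s += a[4];  // one past the declared bound: the kind of access the
  //             // extended -Warray-bounds is meant to flag
  return s;
}

// Usage sketch: only in-bounds accesses are executed here.
void demo_sum4() {
  int buf[4] = {1, 2, 3, 4};
  assert(sum4(buf) == 10);
}
```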


> 



Add cold attribute to one time construction APIs

2020-08-13 Thread Aditya K via Gcc-patches
This would help the compiler optimize local static objects.

```
commit e2f299679ddf56a6d6d71ea9d589cd76b2ca107b
Author: Aditya Kumar <1894981+hiradi...@users.noreply.github.com>
Date:   Thu Aug 13 09:41:34 2020 -0700

Add cold attribute to one time construction APIs

__cxa_guard_acquire is used for only one purpose,
namely guarding local static variable initialization,
and since that purpose is definitionally cold, it should be attributed as cold.
Similarly for __cxa_guard_release and __cxa_guard_abort

diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index b1fad59d4..359e955a7 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -39,20 +39,24 @@
 // Macros for various attributes.
 //   _GLIBCXX_PURE
 //   _GLIBCXX_CONST
 //   _GLIBCXX_NORETURN
 //   _GLIBCXX_NOTHROW
 //   _GLIBCXX_VISIBILITY
 #ifndef _GLIBCXX_PURE
 # define _GLIBCXX_PURE __attribute__ ((__pure__))
 #endif
 
+#ifndef _GLIBCXX_COLD
+# define _GLIBCXX_COLD __attribute__ ((cold))
+#endif
+
 #ifndef _GLIBCXX_CONST
 # define _GLIBCXX_CONST __attribute__ ((__const__))
 #endif
 
 #ifndef _GLIBCXX_NORETURN
 # define _GLIBCXX_NORETURN __attribute__ ((__noreturn__))
 #endif
 
 // See below for C++
 #ifndef _GLIBCXX_NOTHROW
diff --git a/libstdc++-v3/libsupc++/cxxabi.h b/libstdc++-v3/libsupc++/cxxabi.h
index 000713ecd..24c1366e2 100644
--- a/libstdc++-v3/libsupc++/cxxabi.h
+++ b/libstdc++-v3/libsupc++/cxxabi.h
@@ -108,27 +108,27 @@ namespace __cxxabiv1
   __cxa_vec_delete2(void* __array_address, size_t __element_size,
size_t __padding_size, __cxa_cdtor_type __destructor,
void (*__dealloc) (void*));
 
   void
   __cxa_vec_delete3(void* __array_address, size_t __element_size,
size_t __padding_size, __cxa_cdtor_type __destructor,
void (*__dealloc) (void*, size_t));
 
   int
-  __cxa_guard_acquire(__guard*);
+  __cxa_guard_acquire(__guard*) _GLIBCXX_COLD;
 
   void
-  __cxa_guard_release(__guard*) _GLIBCXX_NOTHROW;
+  __cxa_guard_release(__guard*) _GLIBCXX_NOTHROW _GLIBCXX_COLD;
 
   void
-  __cxa_guard_abort(__guard*) _GLIBCXX_NOTHROW;
+  __cxa_guard_abort(__guard*) _GLIBCXX_NOTHROW _GLIBCXX_COLD;
 
   // DSO destruction.
   int
   __cxa_atexit(void (*)(void*), void*, void*) _GLIBCXX_NOTHROW;
 
   void
   __cxa_finalize(void*);
 
   // TLS destruction.
   int
```
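
For readers unfamiliar with where the guard functions come from, the following is a rough sketch of a function-local static with a dynamic initializer, which is the construct the patch is about. The names `expensive_init`, `get_value` and `init_calls` are hypothetical, and the commented pseudo-expansion only approximates what the compiler emits. The cold attribute is applied here to the initializer purely to show the attribute syntax; the patch itself applies it to the `__cxa_guard_*` declarations.

```cpp
#include <cassert>

// Counts how often the initializer actually runs.
static int init_calls = 0;

// The cold attribute marks a function as unlikely to be executed, so the
// compiler can keep calls to it off the hot path.
__attribute__((cold)) int expensive_init() {
  ++init_calls;
  return 42;
}

int get_value() {
  // For a local static with a dynamic initializer the compiler emits,
  // roughly:
  //   if (guard not yet set)
  //     if (__cxa_guard_acquire (&guard))
  //       { value = expensive_init (); __cxa_guard_release (&guard); }
  static int value = expensive_init();
  return value;
}

// Usage sketch: the initializer runs once however often get_value is called.
void demo_local_static() {
  assert(get_value() == 42);
  assert(get_value() == 42);
  assert(init_calls == 1);
}
```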

[PATCH V2 4/4] Change C front end to emit structured loop and switch tree nodes.

2020-08-13 Thread Sandra Loosemore
2020-08-12  Sandra Loosemore  

gcc/c
* c-decl.c (c_break_label, c_cont_label): Delete, and replace
with...
(in_statement): New.
(start_function): Adjust for above change.
(c_push_function_context, c_pop_function_context): Likewise.
* c-lang.h (struct language_function): Likewise.
* c-objc-common.h (LANG_HOOKS_BLOCK_MAY_FALLTHRU): Define.
* c-parser.c (objc_foreach_break_label, objc_foreach_continue_label):
New.
(c_parser_statement_after_labels): Adjust calls to c_finish_bc_stmt.
(c_parser_switch_statement): Adjust break/switch context handling
and calls to renamed functions.
(c_parser_while_statement): Adjust break/switch context handling and
build a WHILE_STMT.
(c_parser_do_statement): Ditto, with DO_STMT respectively.
(c_parser_for_statement): Ditto, with FOR_STMT respectively.
(c_parser_omp_for_loop): Adjust break/switch context handling.
* c-tree.h (c_break_label, c_cont_label): Delete.
(IN_SWITCH_STMT, IN_ITERATION_STMT): Define.
(IN_OMP_BLOCK, IN_OMP_FOR, IN_OBJC_FOREACH): Define.
(in_statement, switch_statement_break_seen_p): Declare.
(c_start_case, c_finish_case): Renamed to...
(c_start_switch, c_finish_switch).
(c_finish_bc_stmt): Adjust arguments.
* c-typeck.c (build_function_call_vec): Don't try to print
statements with %qE format.
(struct c_switch):  Rename switch_expr field to switch_stmt.
Add break_stmt_seen_p field.
(c_start_case): Rename to c_start_switch.  Build a SWITCH_STMT
instead of a SWITCH_EXPR.  Update for changes to struct c_switch.
(do_case): Update for changes to struct c_switch.
(c_finish_case): Rename to c_finish_switch.  Update for changes to
struct c_switch and change of representation from SWITCH_EXPR to
SWITCH_STMT.
(c_finish_loop): Delete.
(c_finish_bc_stmt): Update to reflect changes to break/continue
state representation.  Build a BREAK_STMT or CONTINUE_STMT instead
of a GOTO_EXPR except for objc foreach loops.

gcc/objc
* objc-act.c (objc_start_method_definition): Update to reflect
changes to break/continue state bookkeeping in C front end.

gcc/testsuite/
* gcc.dg/gomp/block-7.c: Update expected error message wording.
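
The switch from two label variables to a single in_statement bitmask can be illustrated with a small sketch. The flag values, the helper functions and the RAII scope type below are assumptions made for illustration; the real definitions are in c-tree.h and the real bookkeeping is done by hand in the parser.

```cpp
#include <cassert>

// Flag values are assumed for illustration only.
enum : unsigned char {
  IN_SWITCH_STMT = 1,
  IN_ITERATION_STMT = 2
};

// Zero when not inside any loop or switch, otherwise a bitmask.
static unsigned char in_statement = 0;

// 'break' is valid inside a loop or a switch.
bool break_ok() {
  return (in_statement & (IN_SWITCH_STMT | IN_ITERATION_STMT)) != 0;
}

// 'continue' is valid only inside a loop.
bool continue_ok() { return (in_statement & IN_ITERATION_STMT) != 0; }

// Hypothetical helper: set a flag on entry and restore the old mask on exit,
// mirroring a save/restore around a nested statement.
struct StatementScope {
  unsigned char saved;
  explicit StatementScope(unsigned char flag) : saved(in_statement) {
    in_statement = static_cast<unsigned char>(in_statement | flag);
  }
  ~StatementScope() { in_statement = saved; }
};

void demo_in_statement() {
  assert(!break_ok() && !continue_ok());   // plain function body
  {
    StatementScope sw(IN_SWITCH_STMT);
    assert(break_ok() && !continue_ok());  // inside a switch only
    {
      StatementScope loop(IN_ITERATION_STMT);
      assert(break_ok() && continue_ok()); // loop nested in the switch
    }
    assert(!continue_ok());                // loop flag restored
  }
  assert(in_statement == 0);
}
```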
---
 gcc/c/c-decl.c                      |  18 ++-
 gcc/c/c-lang.h                      |   3 +-
 gcc/c/c-objc-common.h               |   2 +
 gcc/c/c-parser.c                    | 125 ++--
 gcc/c/c-tree.h                      |  21 +++-
 gcc/c/c-typeck.c                    | 227 +++-
 gcc/objc/ChangeLog                  |   5 +
 gcc/objc/objc-act.c                 |   6 +-
 gcc/testsuite/gcc.dg/gomp/block-7.c |  12 +-
 9 files changed, 169 insertions(+), 250 deletions(-)

diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 5d6b504..c82af63 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -112,9 +112,9 @@ struct obstack parser_obstack;
 
 static GTY(()) struct stmt_tree_s c_stmt_tree;
 
-/* State saving variables.  */
-tree c_break_label;
-tree c_cont_label;
+/* Zero if we are not in an iteration or switch statement, otherwise
+   a bitmask.  See bitmask definitions in c-tree.h.  */
+unsigned char in_statement;
 
 /* A list of decls to be made automatically visible in each file scope.  */
 static GTY(()) tree visible_builtins;
@@ -9160,10 +9160,8 @@ start_function (struct c_declspecs *declspecs, struct 
c_declarator *declarator,
   warn_about_return_type = 0;
   c_switch_stack = NULL;
 
-  /* Indicate no valid break/continue context by setting these variables
- to some non-null, non-label value.  We'll notice and emit the proper
- error message in c_finish_bc_stmt.  */
-  c_break_label = c_cont_label = size_zero_node;
+  /* Indicate no valid break/continue context.  */
+  in_statement = 0;
 
   decl1 = grokdeclarator (declarator, declspecs, FUNCDEF, true, NULL,
  , NULL, NULL, DEPRECATED_NORMAL);
@@ -10164,8 +10162,7 @@ c_push_function_context (void)
 
   p->base.x_stmt_tree = c_stmt_tree;
   c_stmt_tree.x_cur_stmt_list = vec_safe_copy (c_stmt_tree.x_cur_stmt_list);
-  p->x_break_label = c_break_label;
-  p->x_cont_label = c_cont_label;
+  p->x_in_statement = in_statement;
   p->x_switch_stack = c_switch_stack;
   p->arg_info = current_function_arg_info;
   p->returns_value = current_function_returns_value;
@@ -10204,8 +10201,7 @@ c_pop_function_context (void)
 
   c_stmt_tree = p->base.x_stmt_tree;
   p->base.x_stmt_tree.x_cur_stmt_list = NULL;
-  c_break_label = p->x_break_label;
-  c_cont_label = p->x_cont_label;
+  in_statement = p->x_in_statement;
   c_switch_stack = p->x_switch_stack;
   current_function_arg_info = p->arg_info;
   current_function_returns_value = p->returns_value;
diff --git a/gcc/c/c-lang.h 

[PATCH V2 3/4] Work around bootstrap failure in Fortran front end.

2020-08-13 Thread Sandra Loosemore
Switching the C++ front end to lower loops the same way as the C front
end triggered this error when bootstrapping the Fortran front end:

/path/to/gcc/fortran/interface.c:3546:12: error: '*new_arg' may be used uninitialized [-Werror=maybe-uninitialized]
 3546 |   new_arg[i]->next = NULL;
  |   ~^

Work around this by adding an assertion, which seems appropriate for
documentation and good coding practices anyway.
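
The general pattern behind this workaround can be sketched as follows. This is not the Fortran front end code: `first_slot` is a hypothetical function, and the standard `assert` stands in for `gcc_assert`, which differs in how it behaves in release builds.

```cpp
#include <cassert>

// Fill the first n slots of a small fixed buffer, then read slot 0.
// Without extra information a compiler may reason that the loop can run
// zero times, leaving slots[0] unwritten when it is read.
int first_slot(int n) {
  int slots[8];
  for (int i = 0; i < n && i < 8; ++i)
    slots[i] = i + 1;

  // State the invariant explicitly: at least one iteration ran.
  // With assertions enabled, the failing path never reaches the read.
  assert(n > 0);
  return slots[0];
}
```

The assertion also documents a precondition that was previously implicit, which is the "documentation" benefit mentioned above.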

2020-08-12  Sandra Loosemore  

gcc/fortran/
* interface.c (gfc_compare_actual_formal): Add assertion after
main processing loop to silence maybe-uninitialized error.
---
 gcc/fortran/interface.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/fortran/interface.c b/gcc/fortran/interface.c
index 7985fc7..9fea94c 100644
--- a/gcc/fortran/interface.c
+++ b/gcc/fortran/interface.c
@@ -3527,6 +3527,10 @@ gfc_compare_actual_formal (gfc_actual_arglist **ap, 
gfc_formal_arglist *formal,
}
 }
 
+  /* We should have handled the cases where the formal arglist is null
+ already.  */
+  gcc_assert (n > 0);
+
   /* The argument lists are compatible.  We now relink a new actual
  argument list with null arguments in the right places.  The head
  of the list remains the head.  */
-- 
2.8.1



[PATCH V2 0/4] Unify C and C++ handling of loops and switches

2020-08-13 Thread Sandra Loosemore
This is a revised version of the patch set originally posted
last November:

https://gcc.gnu.org/pipermail/gcc-patches/2019-November/534142.html

In addition to generally updating and rebasing the patches to reflect
other changes on mainline in the meantime, for this version I have
switched to using the C lowering strategy (directly to goto form)
rather than the C++ one (to LOOP_EXPR) because of regressions in the C
optimization tests.  Besides the ones previously noted in the original
patch submission, there were a bunch of new ones since November.  Some
of them were trivial to fix (e.g., flipping branch probabilities to
reflect the different sense of the loop exit condition in the
C++-style output), but I wasn't making much progress on others and
eventually decided to pursue the "plan B" of using the C-style output
everywhere, as discussed here:

https://gcc.gnu.org/pipermail/gcc-patches/2019-December/536536.html

The only regression I ran into with this was a bootstrap failure
building the Fortran front end from a new -Wmaybe-uninitialized error.
This might be a false positive but part 3 of the new series works
around it by adding an assertion to give g++ a hint.  Unfortunately I
had no luck in trying to reduce this to a standalone test case, but I
did observe that the failure went away when I compiled that file with
debugging enabled.  :-S  I could file a PR to look into this further if
the workaround is good enough for now.

-Sandra


Sandra Loosemore (4):
  Move loop and switch tree data structures from cp/ to c-family/.
  Use C-style loop lowering instead of C++-style.
  Work around bootstrap failure in Fortran front end.
  Change C front end to emit structured loop and switch tree nodes.

 gcc/c-family/c-common.c             |  24 ++
 gcc/c-family/c-common.def           |  24 ++
 gcc/c-family/c-common.h             |  53 +++-
 gcc/c-family/c-dump.c               |  38 +++
 gcc/c-family/c-gimplify.c           | 422 
 gcc/c-family/c-pretty-print.c       |  92 ++-
 gcc/c/c-decl.c                      |  18 +-
 gcc/c/c-lang.h                      |   3 +-
 gcc/c/c-objc-common.h               |   2 +
 gcc/c/c-parser.c                    | 125 +-
 gcc/c/c-tree.h                      |  21 +-
 gcc/c/c-typeck.c                    | 227 ++---
 gcc/cp/cp-gimplify.c                | 469 +++-
 gcc/cp/cp-objcp-common.c            |  13 +-
 gcc/cp/cp-tree.def                  |  23 --
 gcc/cp/cp-tree.h                    |  40 ---
 gcc/cp/cxx-pretty-print.c           |  78 --
 gcc/cp/dump.c                       |  31 ---
 gcc/doc/generic.texi                |  56 +++--
 gcc/fortran/interface.c             |   4 +
 gcc/objc/ChangeLog                  |   5 +
 gcc/objc/objc-act.c                 |   6 +-
 gcc/testsuite/gcc.dg/gomp/block-7.c |  12 +-
 23 files changed, 938 insertions(+), 848 deletions(-)

-- 
2.8.1



[PATCH V2 1/4] Move loop and switch tree data structures from cp/ to c-family/.

2020-08-13 Thread Sandra Loosemore
This patch moves the definitions for DO_STMT, FOR_STMT, WHILE_STMT,
SWITCH_STMT, BREAK_STMT, and CONTINUE_STMT from the C++ front end to
c-family.  This includes the genericizers, pretty-printers, and dump
support as well as the tree definitions and accessors.  Some related
code for OMP_FOR and similar OMP constructs is also moved.

2020-08-12  Sandra Loosemore  

gcc/c-family/
* c-common.c (c_block_may_fallthru): New, split from
cxx_block_may_fallthru in the cp front end.
(c_common_init_ts): Move handling of loop and switch-related
statements here from the cp front end.
* c-common.def (FOR_STMT, WHILE_STMT, DO_STMT): Move here from
cp front end.
(BREAK_STMT, CONTINUE_STMT, SWITCH_STMT): Likewise.
* c-common.h (c_block_may_fallthru): Declare.
(bc_state_t): Move here from cp front end.
(save_bc_state, restore_bc_state): Declare.
(c_genericize_control_stmt): Declare.
(WHILE_COND, WHILE_BODY): Likewise.
(DO_COND, DO_BODY): Likewise.
(FOR_INIT_STMT, FOR_COND, FOR_EXPR, FOR_BODY, FOR_SCOPE): Likewise.
(SWITCH_STMT_COND, SWITCH_STMT_BODY): Likewise.
(SWITCH_STMT_TYPE, SWITCH_STMT_SCOPE): Likewise.
(SWITCH_STMT_ALL_CASES_P, SWITCH_STMT_NO_BREAK_P): Likewise.
(LABEL_DECL_BREAK, LABEL_DECL_CONTINUE): Likewise.
* c-dump.c (dump_stmt): Copy from cp front end.
(c_dump_tree): Move code to handle structured loop and switch
tree nodes here from cp front end.
* c-gimplify.c: Adjust includes.
(enum bc_t, bc_label, begin_bc_block, finish_bc_block): Move from
cp front end.
(save_bc_state, restore_bc_state): New functions using old code
from cp front end.
(get_bc_label, expr_loc_or_loc): Move from cp front end.
(genericize_c_loop): Move from cp front end.
(genericize_for_stmt, genericize_while_stmt): Likewise.
(genericize_do_stmt, genericize_switch_stmt): Likewise.
(genericize_continue_stmt, genericize_break_stmt): Likewise.
(genericize_omp_for_stmt): Likewise.
(c_genericize_control_stmt): New function using code split from
cp front end.
(c_genericize_control_r): New.
(c_genericize): Call walk_tree with c_genericize_control_r.
* c-pretty-print.c (c_pretty_printer::statement): Move code to handle
structured loop and switch tree nodes here from cp front end.

gcc/cp/
* cp-gimplify.c (enum bc_t, bc_label): Move to c-family.
(begin_bc_block, finish_bc_block, get_bc_label): Likewise.
(genericize_cp_loop): Likewise.
(genericize_for_stmt, genericize_while_stmt): Likewise.
(genericize_do_stmt, genericize_switch_stmt): Likewise.
(genericize_continue_stmt, genericize_break_stmt): Likewise.
(genericize_omp_for_stmt): Likewise.
(cp_genericize_r): Call c_genericize_control_stmt instead of
above functions directly.
(cp_genericize): Call save_bc_state and restore_bc_state instead
of manipulating bc_label directly.
* cp-objcp-common.c (cxx_block_may_fallthru): Defer to
c_block_may_fallthru instead of handling SWITCH_STMT here.
(cp_common_init_ts): Move handling of loop and switch-related
statements to c-family.
* cp-tree.def (FOR_STMT, WHILE_STMT, DO_STMT): Move to c-family.
(BREAK_STMT, CONTINUE_STMT, SWITCH_STMT): Likewise.
* cp-tree.h (LABEL_DECL_BREAK, LABEL_DECL_CONTINUE): Likewise.
(WHILE_COND, WHILE_BODY): Likewise.
(DO_COND, DO_BODY): Likewise.
(FOR_INIT_STMT, FOR_COND, FOR_EXPR, FOR_BODY, FOR_SCOPE): Likewise.
(SWITCH_STMT_COND, SWITCH_STMT_BODY): Likewise.
(SWITCH_STMT_TYPE, SWITCH_STMT_SCOPE): Likewise.
(SWITCH_STMT_ALL_CASES_P, SWITCH_STMT_NO_BREAK_P): Likewise.
* cxx-pretty-print.c (cxx_pretty_printer::statement): Move code
to handle structured loop and switch tree nodes to c-family.
* dump.c (cp_dump_tree): Likewise.

gcc/
* doc/generic.texi (Basic Statements): Document SWITCH_EXPR here,
not SWITCH_STMT.
(Statements for C and C++): Rename node to reflect what
the introduction already says about sharing between C and C++
front ends.  Copy-edit and correct documentation for structured
loops and switch.
---
 gcc/c-family/c-common.c       |  24 +++
 gcc/c-family/c-common.def     |  24 +++
 gcc/c-family/c-common.h       |  53 -
 gcc/c-family/c-dump.c         |  38 
 gcc/c-family/c-gimplify.c     | 408 
 gcc/c-family/c-pretty-print.c |  92 -
 gcc/cp/cp-gimplify.c          | 469 --
 gcc/cp/cp-objcp-common.c      |  13 +-
 gcc/cp/cp-tree.def            |  23 ---
 gcc/cp/cp-tree.h              |  40 
 gcc/cp/cxx-pretty-print.c     |  78 ---
 

[PATCH V2 2/4] Use C-style loop lowering instead of C++-style.

2020-08-13 Thread Sandra Loosemore
The C and C++ front ends used to use the same strategy of lowering
loops to gotos with the end test canonicalized to the bottom of the
loop.  In 2014 the C++ front end was changed to emit LOOP_EXPRs
instead (commit 1a45860e7757ee054f6bf98bee4ebe5c661dfb90).

As part of the unification of the C and C++ loop handling, it's
desirable to use the same lowering strategy for both languages.
Applying the C++ strategy to C caused a number of regressions in C
optimization tests, related to flipping the sense of the COND_EXPR for
the exit test and changes in block ordering in the output code.  Many
of these regressions just require updating regexps in the test cases
but a few appear to be genuine optimization failures.  Since it
appears the optimizers handle the C code better than C++ code, let's
go back to using the C strategy for both languages.  The rationale for
the 2014 C++ patch (support for constexpr evaluation) has been solved
in other ways meanwhile.
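
As a rough illustration of the C-style strategy described above, the sketch below hand-lowers a while loop into gotos with the exit test at the bottom. The function and label names are hypothetical, and the real lowering operates on GENERIC trees rather than source text.

```cpp
#include <cassert>

// Structured form, as written in the source.
int sum_while(int n) {
  int s = 0, i = 0;
  while (i < n) {
    s += i;
    ++i;
  }
  return s;
}

// Hand-lowered sketch: branch to the condition first, then test at the
// bottom of the loop and jump either back to the top or out of the loop.
int sum_goto(int n) {
  int s = 0, i = 0;
  goto cond;
top:
  s += i;
  ++i;
cond:
  if (i < n)
    goto top;
  else
    goto brk;
brk:
  return s;
}

// Usage sketch: both forms compute the same result.
void demo_loop_lowering() {
  assert(sum_while(5) == 10);
  assert(sum_goto(5) == 10);
  assert(sum_goto(0) == 0);
}
```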

2020-08-12  Sandra Loosemore  

gcc/c-family/
* c-gimplify.c (genericize_c_loop): Rewrite to match
c_finish_loop in c-typeck.c.
---
 gcc/c-family/c-gimplify.c | 110 ++
 1 file changed, 62 insertions(+), 48 deletions(-)

diff --git a/gcc/c-family/c-gimplify.c b/gcc/c-family/c-gimplify.c
index db930fc..8b326c9 100644
--- a/gcc/c-family/c-gimplify.c
+++ b/gcc/c-family/c-gimplify.c
@@ -217,9 +217,10 @@ genericize_c_loop (tree *stmt_p, location_t start_locus, 
tree cond, tree body,
   void *data, walk_tree_fn func, walk_tree_lh lh)
 {
   tree blab, clab;
-  tree exit = NULL;
+  tree entry = NULL, exit = NULL, t;
   tree stmt_list = NULL;
-  tree debug_begin = NULL;
+  location_t cond_locus = expr_loc_or_loc (cond, start_locus);
+  location_t incr_locus = expr_loc_or_loc (incr, start_locus);
 
   protected_set_expr_location_if_unset (incr, start_locus);
 
@@ -232,35 +233,68 @@ genericize_c_loop (tree *stmt_p, location_t start_locus, 
tree cond, tree body,
   walk_tree_1 (, func, data, NULL, lh);
   *walk_subtrees = 0;
 
-  if (MAY_HAVE_DEBUG_MARKER_STMTS
-  && (!cond || !integer_zerop (cond)))
+  /* If condition is zero don't generate a loop construct.  */
+  if (cond && integer_zerop (cond))
 {
-  debug_begin = build0 (DEBUG_BEGIN_STMT, void_type_node);
-  SET_EXPR_LOCATION (debug_begin, expr_loc_or_loc (cond, start_locus));
+  if (cond_is_first)
+   {
+ t = build1_loc (start_locus, GOTO_EXPR, void_type_node,
+ get_bc_label (bc_break));
+ append_to_statement_list (t, &stmt_list);
+   }
 }
-
-  if (cond && TREE_CODE (cond) != INTEGER_CST)
+  else
 {
-  /* If COND is constant, don't bother building an exit.  If it's false,
-we won't build a loop.  If it's true, any exits are in the body.  */
-  location_t cloc = expr_loc_or_loc (cond, start_locus);
-  exit = build1_loc (cloc, GOTO_EXPR, void_type_node,
-get_bc_label (bc_break));
-  exit = fold_build3_loc (cloc, COND_EXPR, void_type_node, cond,
- build_empty_stmt (cloc), exit);
-}
+  /* Expand to gotos.  */
+  tree top = build1 (LABEL_EXPR, void_type_node,
+create_artificial_label (start_locus));
 
-  if (exit && cond_is_first)
-{
-  append_to_statement_list (debug_begin, &stmt_list);
-  debug_begin = NULL_TREE;
-  append_to_statement_list (exit, &stmt_list);
+  /* If we have an exit condition, then we build an IF with gotos either
+out of the loop, or to the top of it.  If there's no exit condition,
+then we just build a jump back to the top.  */
+  exit = build1 (GOTO_EXPR, void_type_node, LABEL_EXPR_LABEL (top));
+
+  if (cond && !integer_nonzerop (cond))
+   {
+ /* Canonicalize the loop condition to the end.  This means
+generating a branch to the loop condition.  Reuse the
+continue label, if there is no incr expression.  */
+ if (cond_is_first)
+   {
+ if (incr)
+   {
+ entry = build1 (LABEL_EXPR, void_type_node,
+ create_artificial_label (start_locus));
+ t = build1_loc (start_locus, GOTO_EXPR, void_type_node,
+ LABEL_EXPR_LABEL (entry));
+   }
+ else
+   t = build1_loc (start_locus, GOTO_EXPR, void_type_node,
+   get_bc_label (bc_continue));
+ append_to_statement_list (t, &stmt_list);
+   }
+
+ t = build1 (GOTO_EXPR, void_type_node, get_bc_label (bc_break));
+ exit = fold_build3_loc (cond_locus,
+ COND_EXPR, void_type_node, cond, exit, t);
+   }
+  else
+   {
+ /* For the backward-goto's location of an unconditional loop
+use the beginning of the body, or, if there is none, the
+

Re: [PATCH 4/5] - extend -Wstringop-overflow to detect out-of-bounds accesses to array parameters

2020-08-13 Thread Jeff Law via Gcc-patches
On Tue, 2020-07-28 at 19:22 -0600, Martin Sebor via Gcc-patches wrote:
> Patch 4 adds support to the machinery behind -Wstringop-overflow
> to issue warnings for (likely) out of bounds accesses in calls to
> functions with the internal attribute access specification.  This
> implements the feature pr50584 asks for (plus more).

> [4/5] - Extend -Wstringop-overflow to detect out-of-bounds accesses to array 
> parameters.
> 
> gcc/ChangeLog:
> 
>   * tree-ssa-uninit.c (maybe_warn_pass_by_reference): Handle attribute
>   access internal representation of arrays.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/uninit-37.c: New test.
> 
> gcc/ChangeLog:
> 
>   PR c/50584
>   * builtins.c (warn_for_access): Add argument.  Distinguish between
>   reads and writes.
>   (check_access): Add argument.  Distinguish between reads and writes.
>   (gimple_call_alloc_size): Set range even on failure.
>   (gimple_parm_array_size): New function.
>   (compute_objsize): Call it.
>   (check_memop_access): Pass check_access an additional argument.
>   (expand_builtin_memchr, expand_builtin_strcat): Same.
>   (expand_builtin_strcpy, expand_builtin_stpcpy_1): Same.
>   (expand_builtin_stpncpy, check_strncat_sizes): Same.
>   (expand_builtin_strncat, expand_builtin_strncpy): Same.
>   (expand_builtin_memcmp): Same.
>   * builtins.h (compute_objsize): Declare a new overload.
>   (gimple_parm_array_size): Declare.
>   (check_access): Add argument.
>   * calls.c (append_attrname): Simplify.
>   (maybe_warn_rdwr_sizes): Handle internal attribute access.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR c/50584
>   * c-c++-common/Wsizeof-pointer-memaccess1.c: Disable new expected
>   warnings.
>   * g++.dg/ext/attr-access.C: Update text of expected warnings.
>   * gcc.dg/Wstringop-overflow-23.c: Same.
>   * gcc.dg/Wstringop-overflow-24.c: Same.
>   * gcc.dg/attr-access-none.c: Same.
>   * gcc.dg/Wstringop-overflow-40.c: New test.
>   * gcc.dg/attr-access-2.c: New test.
OK once the prereqs are approved.

jeff
> 



Re: [PATCH 0/5] add checking of function array parameters (PR 50584)

2020-08-13 Thread Jeff Law via Gcc-patches
On Tue, 2020-07-28 at 19:20 -0600, Martin Sebor via Gcc-patches wrote:
> Patch 3 adjusts tree-ssa-uninit.c to the changes to attribute access but
> has only a cosmetic effect on informational notes in -Wuninitialized.
> [3/5] - Make use of new attribute access infrastructure in tree-ssa-uninit.c.
> 
> gcc/ChangeLog:
> 
>   * tree-ssa-uninit.c (maybe_warn_pass_by_reference): Handle attribute
>   access internal representation of arrays.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/uninit-37.c: New test.
> 
This is fine once the prereqs are approved.

jeff



Re: [PATCH] improve memcmp and memchr constant folding (PR 78257)

2020-08-13 Thread Jeff Law via Gcc-patches
On Fri, 2020-07-31 at 17:55 -0600, Martin Sebor via Gcc-patches wrote:
> The folders for these functions (and some others) call c_getstr
> which relies on string_constant to return the representation of
> constant strings.  Because the function doesn't handle constants
> of other types, including aggregates, memcmp or memchr calls
> involving those are not folded when they could be.
> 
> The attached patch extends the algorithm used by string_constant
> to also handle constant aggregates involving elements or members
> of the same types as native_encode_expr.  (The change restores
> the empty initializer optimization inadvertently disabled in
> the fix for pr96058.)
> 
> To avoid accidentally misusing either string_constant or c_getstr
> with non-strings I have introduced a pair of new functions to get
> the representation of those: byte_representation and getbyterep.
> 
> Tested on x86_64-linux.
> 
> Martin
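
A small sketch of the kind of call the quoted description has in mind: memcmp applied to constant arrays that are not character strings. The array and function names are hypothetical, and whether a particular compiler version folds these calls at compile time is an assumption; the code only demonstrates the values such folding would have to produce.

```cpp
#include <cassert>
#include <cstring>

// Constant aggregates of a non-character type.
static const int a[] = {1, 2, 3, 4};
static const int b[] = {1, 2, 3, 5};

// The first three elements match, so this comparison yields equality.
bool same_prefix() { return std::memcmp(a, b, 3 * sizeof(int)) == 0; }

// The fourth element differs, so comparing the whole arrays does not.
bool same_all() { return std::memcmp(a, b, sizeof a) == 0; }

// Usage sketch.
void demo_memcmp() {
  assert(same_prefix());
  assert(!same_all());
}
```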

> PR tree-optimization/78257 - missing memcmp optimization with constant arrays
> 
> gcc/ChangeLog:
> 
>   PR middle-end/78257
>   * builtins.c (expand_builtin_memory_copy_args): Rename called function.
>   (expand_builtin_stpcpy_1): Remove argument from call.
>   (expand_builtin_memcmp): Rename called function.
>   (inline_expand_builtin_bytecmp): Same.
>   * expr.c (convert_to_bytes): New function.
>   (constant_byte_string): New function (formerly string_constant).
>   (string_constant): Call constant_byte_string.
>   (byte_representation): New function.
>   * expr.h (byte_representation): Declare.
>   * fold-const-call.c (fold_const_call): Rename called function.
>   * fold-const.c (c_getstr): Remove an argument.
>   (getbyterep): Define a new function.
>   * fold-const.h (c_getstr): Remove an argument.
>   (getbyterep): Declare a new function.
>   * gimple-fold.c (gimple_fold_builtin_memory_op): Rename callee.
>   (gimple_fold_builtin_string_compare): Same.
>   (gimple_fold_builtin_memchr): Same.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR middle-end/78257
>   * gcc.dg/memchr.c: New test.
>   * gcc.dg/memcmp-2.c: New test.
>   * gcc.dg/memcmp-3.c: New test.
>   * gcc.dg/memcmp-4.c: New test.
> 
> diff --git a/gcc/expr.c b/gcc/expr.c
> index a150fa0d3b5..a124df54655 100644
> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -11594,15 +11594,103 @@ is_aligning_offset (const_tree offset, const_tree 
> exp)
>/* This must now be the address of EXP.  */
>return TREE_CODE (offset) == ADDR_EXPR && TREE_OPERAND (offset, 0) == exp;
>  }
> -
> -/* Return the tree node if an ARG corresponds to a string constant or zero
> -   if it doesn't.  If we return nonzero, set *PTR_OFFSET to the (possibly
> -   non-constant) offset in bytes within the string that ARG is accessing.
> -   If MEM_SIZE is non-zero the storage size of the memory is returned.
> -   If DECL is non-zero the constant declaration is returned if available.  */
>  
> -tree
> -string_constant (tree arg, tree *ptr_offset, tree *mem_size, tree *decl)
> +/* If EXPR is a constant initializer (either an expression or CONSTRUCTOR),
> +   attempt to obtain its native representation as an array of nonzero BYTES.
> +   Return true on success and false on failure (the latter without modifying
> +   BYTES).  */
> +
> +static bool
> +convert_to_bytes (tree type, tree expr, vec<unsigned char> *bytes)
> +{
> +  if (TREE_CODE (expr) == CONSTRUCTOR)
> +{
> +  /* Set to the size of the CONSTRUCTOR elements.  */
> +  unsigned HOST_WIDE_INT ctor_size = bytes->length ();
> +
> +  if (TREE_CODE (type) == ARRAY_TYPE)
> + {
> +   tree val, idx;
> +   tree eltype = TREE_TYPE (type);
> +   unsigned HOST_WIDE_INT elsize =
> + tree_to_uhwi (TYPE_SIZE_UNIT (eltype));
> +   unsigned HOST_WIDE_INT i, last_idx = HOST_WIDE_INT_M1U;
> +   FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (expr), i, idx, val)
> + {
> +   /* Append zeros for elements with no initializers.  */
> +   if (!tree_fits_uhwi_p (idx))
> + return false;
> +   unsigned HOST_WIDE_INT cur_idx = tree_to_uhwi (idx);
> +   if (unsigned HOST_WIDE_INT size = cur_idx - (last_idx + 1))
> + {
> +   size = size * elsize + bytes->length ();
> +   bytes->safe_grow_cleared (size);
> + }
> +
> +   if (!convert_to_bytes (eltype, val, bytes))
> + return false;
> +
> +   last_idx = cur_idx;
> + }
> + }
> +  else if (TREE_CODE (type) == RECORD_TYPE)
> + {
> +   tree val, fld;
> +   unsigned HOST_WIDE_INT i;
> +   FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (expr), i, fld, val)
> + {
> +   /* Append zeros for members with no initializers and
> +  any padding.  */
> +   unsigned HOST_WIDE_INT cur_off = int_byte_position (fld);
> +   if (bytes->length () < cur_off)
> + bytes->safe_grow_cleared (cur_off);
> +

[PATCH] rs6000, restrict bfloat convert intrinsic to Power 10. Fix BU_P10V macro definitions.

2020-08-13 Thread Carl Love via Gcc-patches
GCC maintainers:

The macro expansion for the bfloat convert intrinsics XVCVBF16SP and
XVCVSPBF16 needs to be restricted to P10.

The macro expansions BU_P10V_0, BU_P10V_1, BU_P10V_2, BU_P10V_3 expand
the name field as "__builtin_altivec_".  These macro expansions are
being used for both VSX and Altivec instructions.  There needs to be
separate expansions for VSX with the name field "__builtin_vsx_" and
for Altivec with the name field "__builtin_altivec_".  

The following patch creates new macro expansions BU_P10V_VSX_# and 
BU_P10V_AV_# for the VSX and Altivec instructions respectively.  The
new names are consistent with the P8 and P9 naming convention for the
VSX and Altivec instructions.
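
The naming issue can be shown with a much simplified sketch. The two macros below are hypothetical stand-ins, not the real BU_P10V_* definitions (which take more arguments and build builtin table entries), and the suffix "example_op" is a placeholder rather than a real builtin name.

```cpp
#include <cassert>
#include <cstring>

// Simplified stand-ins: each macro family pastes its own prefix onto the
// name, so VSX and Altivec builtins end up in separate name spaces.
#define SKETCH_P10V_AV_NAME(NAME)  "__builtin_altivec_" NAME
#define SKETCH_P10V_VSX_NAME(NAME) "__builtin_vsx_" NAME

static const char *const av_name  = SKETCH_P10V_AV_NAME("example_op");
static const char *const vsx_name = SKETCH_P10V_VSX_NAME("example_op");

// Usage sketch: the same operation name expands differently per family.
void demo_builtin_names() {
  assert(std::strcmp(av_name, "__builtin_altivec_example_op") == 0);
  assert(std::strcmp(vsx_name, "__builtin_vsx_example_op") == 0);
}
```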

The macro expansion for XVCVBF16SP and XVCVSPBF16 is changed from
BU_VSX_1 to BU_P10V_VSX_1 to restrict it to P10 and beyond.  Also MISC
is changed to CONST in the macro expansion call.

The side effect of creating the macro expansions for VSX and Altivec is
it changes all of the expanded names.  The patch fixes all the uses of
the expanded names as needed for the new VSX and Altivec macros.

The patch has been run on 

powerpc64le-unknown-linux-gnu (Power 8 LE)
powerpc64le-unknown-linux-gnu (Power 9 LE)

with no regressions.

Please let me know if the patch is acceptable for trunk.

Carl Love

-
[PATCH] rs6000, restrict bfloat convert intrinsic to Power 10. Fix BU_P10V 
macro definitions.

gcc/ChangeLog

2020-08-12  Carl Love  
* config/rs6000/rs6000-builtin.def (BU_P10V_0, BU_P10V_1,
BU_P10V_2, BU_P10V_3): Rename to BU_P10V_VSX_0, BU_P10V_VSX_1,
BU_P10V_VSX_2, BU_P10V_VSX_3 respectively.
(BU_P10V_4): Remove.
(BU_P10V_AV_0, BU_P10V_AV_1, BU_P10V_AV_2, BU_P10V_AV_3, BU_P10V_AV_4):
New definitions for Power 10 Altivec macros.
(VSTRIBR, VSTRIHR, VSTRIBL, VSTRIHL, VSTRIBR_P, VSTRIHR_P,
VSTRIBL_P, VSTRIHL_P, MTVSRBM, MTVSRHM, MTVSRWM, MTVSRDM, MTVSRQM,
VEXPANDMB, VEXPANDMH, VEXPANDMW, VEXPANDMD, VEXPANDMQ, VEXTRACTMB,
VEXTRACTMH, VEXTRACTMW, VEXTRACTMD, VEXTRACTMQ): Replace macro
expansion BU_P10V_1 with BU_P10V_AV_1.
(VCLRLB, VCLRRB, VCFUGED, VCLZDM, VCTZDM, VPDEPD, VPEXTD, VGNB,
VCNTMBB, VCNTMBH, VCNTMBW, VCNTMBD): Replace macro expansion
BU_P10V_2 with BU_P10V_AV_2.
(VEXTRACTBL, VEXTRACTHL, VEXTRACTWL, VEXTRACTDL, VEXTRACTBR, VEXTRACTHR,
VEXTRACTWR, VEXTRACTDR, VINSERTGPRBL, VINSERTGPRHL, VINSERTGPRWL,
VINSERTGPRDL, VINSERTVPRBL, VINSERTVPRHL, VINSERTVPRWL, VINSERTGPRBR,
VINSERTGPRHR, VINSERTGPRWR, VINSERTGPRDR, VINSERTVPRBR, VINSERTVPRHR,
VINSERTVPRWR, VREPLACE_ELT_V4SI, VREPLACE_ELT_UV4SI, VREPLACE_ELT_V2DF,
VREPLACE_ELT_V4SF, VREPLACE_ELT_V2DI, VREPLACE_ELT_UV2DI,
VREPLACE_UN_V4SI, VREPLACE_UN_UV4SI, VREPLACE_UN_V4SF, VREPLACE_UN_V2DI,
VREPLACE_UN_UV2DI, VREPLACE_UN_V2DF, VSLDB_V16QI, VSLDB_V8HI, VSLDB_V4SI,
VSLDB_V2DI, VSRDB_V16QI, VSRDB_V8HI, VSRDB_V4SI, VSRDB_V2DI): Replace
macro expansion BU_P10V_3 with BU_P10V_AV_3.
(VXXSPLTIW_V4SI, VXXSPLTIW_V4SF, VXXSPLTID): Replace macro expansion
BU_P10V_1 with BU_P10V_AV_1.
(XXGENPCVM_V16QI, XXGENPCVM_V8HI, XXGENPCVM_V4SI, XXGENPCVM_V2DI):
Replace macro expansion BU_P10V_2 with BU_P10V_VSX_2.
(VXXSPLTI32DX_V4SI, VXXSPLTI32DX_V4SF, VXXBLEND_V16QI, VXXBLEND_V8HI,
VXXBLEND_V4SI, VXXBLEND_V2DI, VXXBLEND_V4SF, VXXBLEND_V2DF): Replace
macro expansion BU_P10V_3 with BU_P10V_VSX_3.
(XXEVAL, VXXPERMX): Replace macro expansion BU_P10V_4 with
BU_P10V_VSX_4.
(XVCVBF16SP, XVCVSPBF16): Replace macro expansion BU_VSX_1 with
BU_P10V_VSX_1. Also change MISC to CONST.
* config/rs6000/rs6000-c.c (P10_BUILTIN_VXXPERMX): Replace with
P10V_BUILTIN_VXXPERMX.
(P10_BUILTIN_VCLRLB, P10_BUILTIN_VCLRLB, P10_BUILTIN_VCLRRB,
P10_BUILTIN_VGNB, P10_BUILTIN_XXEVAL, P10_BUILTIN_VXXPERMX,
P10_BUILTIN_VEXTRACTBL, P10_BUILTIN_VEXTRACTHL, P10_BUILTIN_VEXTRACTWL,
P10_BUILTIN_VEXTRACTDL, P10_BUILTIN_VINSERTGPRHL,
P10_BUILTIN_VINSERTGPRWL, P10_BUILTIN_VINSERTGPRDL,
P10_BUILTIN_VINSERTVPRBL, P10_BUILTIN_VINSERTVPRHL,
P10_BUILTIN_VEXTRACTBR, P10_BUILTIN_VEXTRACTHR,
P10_BUILTIN_VEXTRACTWR, P10_BUILTIN_VEXTRACTDR,
P10_BUILTIN_VINSERTGPRBR, P10_BUILTIN_VINSERTGPRHR,
P10_BUILTIN_VINSERTGPRWR, P10_BUILTIN_VINSERTGPRDR,
P10_BUILTIN_VINSERTVPRBR, P10_BUILTIN_VINSERTVPRHR,
P10_BUILTIN_VINSERTVPRWR, P10_BUILTIN_VREPLACE_ELT_UV4SI,
P10_BUILTIN_VREPLACE_ELT_V4SI, P10_BUILTIN_VREPLACE_ELT_UV2DI,
P10_BUILTIN_VREPLACE_ELT_V2DI, P10_BUILTIN_VREPLACE_ELT_V2DF,
P10_BUILTIN_VREPLACE_UN_UV4SI, P10_BUILTIN_VREPLACE_UN_V4SI,
P10_BUILTIN_VREPLACE_UN_V4SF, P10_BUILTIN_VREPLACE_UN_UV2DI,

Re: [PATCH] avoid -Wnonnull on synthesized condition in static_cast (PR 96003)

2020-08-13 Thread Jeff Law via Gcc-patches
On Fri, 2020-07-17 at 13:00 -0600, Martin Sebor via Gcc-patches wrote:
> The recent enhancement to treat the implicit this pointer argument
> as nonnull in member functions triggers spurious -Wnonnull for
> the synthesized conditional expression the C++ front end replaces
> the pointer with in some static_cast expressions.  The front end
> already sets the no-warning bit for the test but not for the whole
> conditional expression, so the attached fix extends the same solution
> to it.
> 
> The consequence of this fix is that user-written code like this:
> 
>   static_cast<T*>(p ? p : 0)->f ();
> or
>   static_cast<T*>(p ? p : nullptr)->f ();
> 
> don't trigger the warning because they are both transformed into
> the same expression as:
> 
>   static_cast<T*>(p)->f ();
> 
> What still does trigger it is this:
> 
>   static_cast<T*>(p ? p : (T*)0)->f ();
> 
> because here it's the inner COND_EXPR's no-warning bit that's set
> (the outer one is clear), whereas in the former expressions it's
> the other way around.  It would be nice if this worked consistently
> but I didn't see an easy way to do that and more than a quick fix
> seems outside the scope for this bug.
> 
> Another case reported by someone else in the same bug involves
> a dynamic_cast.  A simplified test case goes something like this:
> 
>   if (dynamic_cast<T*>(p))
>     dynamic_cast<T*>(p)->f ();
> 
> The root cause is the same: the front end emitting the COND_EXPR
> 
>((p != 0) ? ((T*)__dynamic_cast(p, (& _ZTI1B), (& _ZTI1C), 0)) : 0)
> 
> I decided not to suppress the warning in this case because doing
> so would also suppress it in unconditional calls with the result
> of the cast:
> 
>   dynamic_cast<T*>(p)->f ();
> 
> and that doesn't seem helpful.  Instead, I'd suggest making
> the second cast in the if statement a cast to the reference type T&:
> 
>   if (dynamic_cast<T*>(p))
>     dynamic_cast<T&>(*p).f ();
Hmmm, I wonder if this would fix a handful of errors I got when doing the
testing of the Ranger work for Aldy.  Let me throw the new version into the
tester and respin just the failing packages.

Jeff
> 
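For readers wondering why a static_cast would involve a conditional at all, here is a minimal sketch. It assumes the relevant casts are those that need a pointer adjustment, such as converting to a base class that does not sit at offset zero; the types A, B, C and the helper to_second_base are hypothetical. The language guarantees that a null pointer converts to a null pointer, so the compiler cannot simply add the offset and has to test the pointer first.

```cpp
// Two non-empty bases: the B subobject of C typically sits at a nonzero
// offset, so converting C* to B* needs an address adjustment.
struct A { int a = 1; };
struct B { int b = 2; };
struct C : A, B {};

// Conceptually this is "p ? adjusted (p) : null": a null C* must stay
// null rather than become a small nonzero address.
inline B *to_second_base(C *p) {
  return static_cast<B *>(p);
}
```

The null test exists only to preserve null, which is why warning about it as if the user had written it is spurious.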



Re: [committed] libstdc++: Deprecate the --enable-cheaders=c_std configuration

2020-08-13 Thread Jonathan Wakely via Gcc-patches

On 13/08/20 16:39 +0100, Jonathan Wakely wrote:

On 13/08/20 16:37 +0100, Jonathan Wakely wrote:

These headers do not offer any tangible benefit compared to the default
c_global version. They are not actively maintained meaning that they
have bugs which have already been fixed for the c_global headers.

This change adds a warning if they are used, and requires a new
--enable-cheaders-obsolete option to allow their use. Unless we receive
reports from users who rely on the c_std headers they should be removed
at some point in future.

libstdc++-v3/ChangeLog:

* acinclude.m4 (GLIBCXX_ENABLE_CHEADERS): Warn if the c_std
option is used and fail unless --enable-cheaders-obsolete is
also used.
* configure: Regenerate.

Tested powerpc64le-linux. Committed to trunk.


And this is the change for the release notes. Pushed to wwwdocs.





commit fb20e4dc5242fb245187dd0294f316b5136c8c7d
Author: Jonathan Wakely 
Date:   Thu Aug 13 16:37:58 2020 +0100

   Libstdc++ cheaders=c_std configuration is deprecated

diff --git a/htdocs/gcc-11/changes.html b/htdocs/gcc-11/changes.html
index 1975c6c0..8526b87a 100644
--- a/htdocs/gcc-11/changes.html
+++ b/htdocs/gcc-11/changes.html
@@ -46,6 +46,11 @@ a work-in-progress.

  -gsplit-dwarf no longer enables debug info generation
  on its own but requires a separate -g for this.
+
+  The libstdc++ configure option --enable-cheaders=c_std
+  is deprecated and will be removed in a future release. It should be
+  possible to use --enable-cheaders=c_global (the default)
+  with no change in behaviour. 



And the fix for the validation errors! Oops. Pushed to wwwdocs.


commit b468e3fdfd164c473ae1cd3facfc9ec9af9d25a9
Author: Jonathan Wakely 
Date:   Thu Aug 13 16:43:45 2020 +0100

Replace obsolete <tt> elements with <code>

diff --git a/htdocs/gcc-11/changes.html b/htdocs/gcc-11/changes.html
index 8526b87a..708bb6ac 100644
--- a/htdocs/gcc-11/changes.html
+++ b/htdocs/gcc-11/changes.html
@@ -47,9 +47,9 @@ a work-in-progress.
   -gsplit-dwarf no longer enables debug info generation
   on its own but requires a separate -g for this.
 
-  The libstdc++ configure option --enable-cheaders=c_std
+  The libstdc++ configure option --enable-cheaders=c_std
   is deprecated and will be removed in a future release. It should be
-  possible to use --enable-cheaders=c_global (the default)
+  possible to use --enable-cheaders=c_global (the default)
   with no change in behaviour. 
 
 


Re: [committed] libstdc++: Deprecate the --enable-cheaders=c_std configuration

2020-08-13 Thread Jonathan Wakely via Gcc-patches

On 13/08/20 16:37 +0100, Jonathan Wakely wrote:

These headers do not offer any tangible benefit compared to the default
c_global version. They are not actively maintained meaning that they
have bugs which have already been fixed for the c_global headers.

This change adds a warning if they are used, and requires a new
--enable-cheaders-obsolete option to allow their use. Unless we receive
reports from users who rely on the c_std headers they should be removed
at some point in future.

libstdc++-v3/ChangeLog:

* acinclude.m4 (GLIBCXX_ENABLE_CHEADERS): Warn if the c_std
option is used and fail unless --enable-cheaders-obsolete is
also used.
* configure: Regenerate.

Tested powerpc64le-linux. Committed to trunk.


And this is the change for the release notes. Pushed to wwwdocs.


commit fb20e4dc5242fb245187dd0294f316b5136c8c7d
Author: Jonathan Wakely 
Date:   Thu Aug 13 16:37:58 2020 +0100

Libstdc++ cheaders=c_std configuration is deprecated

diff --git a/htdocs/gcc-11/changes.html b/htdocs/gcc-11/changes.html
index 1975c6c0..8526b87a 100644
--- a/htdocs/gcc-11/changes.html
+++ b/htdocs/gcc-11/changes.html
@@ -46,6 +46,11 @@ a work-in-progress.
 
   -gsplit-dwarf no longer enables debug info generation
   on its own but requires a separate -g for this.
+
+  The libstdc++ configure option --enable-cheaders=c_std
+  is deprecated and will be removed in a future release. It should be
+  possible to use --enable-cheaders=c_global (the default)
+  with no change in behaviour. 
 
 
 


[committed] libstdc++: Deprecate the --enable-cheaders=c_std configuration

2020-08-13 Thread Jonathan Wakely via Gcc-patches
These headers do not offer any tangible benefit compared to the default
c_global version. They are not actively maintained meaning that they
have bugs which have already been fixed for the c_global headers.

This change adds a warning if they are used, and requires a new
--enable-cheaders-obsolete option to allow their use. Unless we receive
reports from users who rely on the c_std headers they should be removed
at some point in future.

libstdc++-v3/ChangeLog:

* acinclude.m4 (GLIBCXX_ENABLE_CHEADERS): Warn if the c_std
option is used and fail unless --enable-cheaders-obsolete is
also used.
* configure: Regenerate.

Tested powerpc64le-linux. Committed to trunk.

commit 55484a0f816ef9ad7e13fb1057751223ed8471d8
Author: Jonathan Wakely 
Date:   Thu Aug 13 16:33:28 2020

libstdc++: Deprecate the --enable-cheaders=c_std configuration

These headers do not offer any tangible benefit compared to the default
c_global version. They are not actively maintained meaning that they
have bugs which have already been fixed for the c_global headers.

This change adds a warning if they are used, and requires a new
--enable-cheaders-obsolete option to allow their use. Unless we receive
reports from users who rely on the c_std headers they should be removed
at some point in future.

libstdc++-v3/ChangeLog:

* acinclude.m4 (GLIBCXX_ENABLE_CHEADERS): Warn if the c_std
option is used and fail unless --enable-cheaders-obsolete is
also used.
* configure: Regenerate.

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index 26cf2197549..133125ec4fa 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -2377,12 +2377,24 @@ dnl
 dnl --enable-cheaders= [does stuff].
 dnl --disable-cheaders [does not do anything, really].
 dnl  +  Usage:  GLIBCXX_ENABLE_CHEADERS[(DEFAULT)]
-dnl   Where DEFAULT is either 'c' or 'c_std' or 'c_global'.
+dnl   Where DEFAULT is either 'c' or 'c_global' or 'c_std'.
+dnl
+dnl To use the obsolete 'c_std' headers use --enable-cheaders-obsolete as
+dnl well as --enable-cheaders=c_std, otherwise configure will fail.
 dnl
 AC_DEFUN([GLIBCXX_ENABLE_CHEADERS], [
+  GLIBCXX_ENABLE(cheaders-obsolete,no,,
+[allow use of obsolete "C" headers for g++])
   GLIBCXX_ENABLE(cheaders,$1,[[[=KIND]]],
-[construct "C" headers for g++], [permit c|c_std|c_global])
+[construct "C" headers for g++], [permit c|c_global|c_std])
   AC_MSG_NOTICE("C" header strategy set to $enable_cheaders)
+  if test $enable_cheaders = c_std ; then
+AC_MSG_WARN([the --enable-cheaders=c_std configuration is obsolete, c_global should be used instead])
+AC_MSG_WARN([if you are unable to use c_global please report a bug or inform libstd...@gcc.gnu.org])
+if test $enable_cheaders_obsolete != yes ; then
+  AC_MSG_ERROR(use --enable-cheaders-obsolete to use c_std "C" headers)
+fi
+  fi
 
   C_INCLUDE_DIR='${glibcxx_srcdir}/include/'$enable_cheaders
 


[Patch, fortran] PRs 96100 and 96101 - Problems with string lengths of array constructors

2020-08-13 Thread Paul Richard Thomas via Gcc-patches
Hi All,

The fix for PR96101 is rather trivial and is the last chunk of the patch.
Finding the fix for PR96100 took a silly amount of time but it now looks
rather obvious. Trying to evaluate the string length by calling
gfc_conv_expr_descriptor, when this function is already failing to find it
is kind of doomed to failure :-) Therefore, gfc_conv_expr is used with
tse.descriptor_only set. This has the effect of ignoring trailing array
references and making use of gfc_conv_component_ref's being able to extract
the hidden string length for deferred length components. Finally, the
string length of the first element in the array constructor is set if this
is a deferred length component.

Regtests OK on FC31/x86_64 - OK for master?

Paul

This patch fixes PR96100 and PR96101 by making some minor changes to
the evaluation of string lengths for gfc_conv_expr_descriptor.

2020-08-13  Paul Thomas  

gcc/fortran
PR fortran/96100
PR fortran/96101
* trans-array.c (get_array_charlen): Tidy up the evaluation of
the string length for array constructors. Avoid trailing array
references. Ensure string lengths of deferred length components
are set. For the parentheses operator, apply the string length to both
the primary expression and the enclosed expression.

gcc/testsuite/
PR fortran/96100
PR fortran/96101
* gfortran.dg/char_length_23.f90: New test.
diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index 8f93b43bafb..ad3286487e8 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -7018,7 +7018,12 @@ get_array_charlen (gfc_expr *expr, gfc_se *se)
   e = gfc_constructor_first (expr->value.constructor)->expr;
 
   gfc_init_se (&tse, NULL);
+
+  /* Avoid evaluating trailing array references since all we need is
+	 the string length.  */
   if (e->rank)
+	tse.descriptor_only = 1;
+  if (e->rank && e->expr_type != EXPR_VARIABLE)
 	gfc_conv_expr_descriptor (&tse, e);
   else
 	gfc_conv_expr (&tse, e);
@@ -7036,14 +7041,26 @@ get_array_charlen (gfc_expr *expr, gfc_se *se)
   gfc_add_modify (&se->pre, expr->ts.u.cl->backend_decl,
 		  tse.string_length);
 
+  /* Make sure that deferred length components point to the hidden
+	 string_length component.  */
+  if (TREE_CODE (tse.expr) == COMPONENT_REF
+	  && TREE_CODE (tse.string_length) == COMPONENT_REF
+	  && TREE_OPERAND (tse.expr, 0) == TREE_OPERAND (tse.string_length, 0))
+	e->ts.u.cl->backend_decl = expr->ts.u.cl->backend_decl;
+
   return;
 
 case EXPR_OP:
   get_array_charlen (expr->value.op.op1, se);
 
-  /* For parentheses the expression ts.u.cl is identical.  */
+  /* For parentheses the expression ts.u.cl should be identical.  */
   if (expr->value.op.op == INTRINSIC_PARENTHESES)
-	return;
+	{
+	  if (expr->value.op.op1->ts.u.cl != expr->ts.u.cl)
+	expr->ts.u.cl->backend_decl
+			= expr->value.op.op1->ts.u.cl->backend_decl;
+	  return;
+	}
 
   expr->ts.u.cl->backend_decl =
 		gfc_create_var (gfc_charlen_type_node, "sln");
! { dg-do compile }
!
! Test the fix for PRs 96100 and 96101.
!
! Contributed by Gerhardt Steinmetz  
!
program p
   type t
  character(:), allocatable :: c(:)
   end type
   type(t) :: x
   character(:), allocatable :: w

! PR96100
   allocate(x%c(2), source = 'def')
   associate (y => [x%c(1:1)])   ! ICE
 print *,y
   end associate

! PR96101
   w = 'abc'
   associate (y => ([w(:)]))
  print *, y ! ICE
   end associate

end


Re: [PATCH] c++: premature analysis of requires-expression [PR96410]

2020-08-13 Thread Patrick Palka via Gcc-patches
On Mon, 10 Aug 2020, Jason Merrill wrote:

> On 8/10/20 2:18 PM, Patrick Palka wrote:
> > On Mon, 10 Aug 2020, Patrick Palka wrote:
> > 
> > > In the below testcase, semantic analysis of the requires-expressions in
> > > the generic lambda must be delayed until instantiation of the lambda
> > > because the requirements depend on the lambda's template arguments.  But
> > > tsubst_requires_expr does semantic analysis even during regeneration of
> > > the lambda, which leads to various bogus errors and ICEs since some
> > > subroutines aren't prepared to handle dependent/template trees.
> > > 
> > > This patch adjusts subroutines of tsubst_requires_expr to avoid doing
> > > some problematic semantic analyses when processing_template_decl.
> > > In particular, expr_noexcept_p generally can't be checked on a dependent
> > > expression.  Next, tsubst_nested_requirement should avoid checking
> > > satisfaction when processing_template_decl.  And similarly for
> > > convert_to_void (called from tsubst_valid_expression_requirement).
> 
> I wonder if, instead of trying to do a partial substitution into a
> requires-expression at all, we want to use the
> PACK_EXPANSION_EXTRA_ARGS/IF_STMT_EXTRA_ARGS mechanism to remember the
> arguments for later satisfaction?

IIUC, avoiding partial substitution into a requires-expression would
mean we'd go from currently accepting the following testcase to
rejecting it, because we'd now instantiate B::type as part of the
first requirement before first noticing the SFINAE error in the second
requirement (which depends only on the outer template argument, and
which would determine the value of the requires-expression):

  template <typename T>
  struct B { using type = T::fatal; };

  template <typename T>
  constexpr auto foo() {
    return [] <typename U> (U) {
      return requires { typename B<U>::type; typename T::type; };
    };
  };

  int i = foo<int>()(0);

I guess this is exactly the kind of testcase that motivates using the
PACK_EXPANSION_EXTRA_ARGS/IF_STMT_EXTRA_ARGS mechanism for
requires-expressions?

> 
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu, and also tested
> > > against the cmcstl2 project.  Does this look OK to commit?
> > > 
> > > gcc/cp/ChangeLog:
> > > 
> > >   PR c++/96409
> > >   PR c++/96410
> > >   * constraint.cc (tsubst_compound_requirement): When
> > >   processing_template_decl, don't check noexcept of the
> > >   substituted expression.
> > >   (tsubst_nested_requirement): Just substitute into the constraint
> > >   when processing_template_decl.
> > >   * cvt.c (convert_to_void): Don't resolve concept checks when
> > >   processing_template_decl.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > >   PR c++/96409
> > >   PR c++/96410
> > >   * g++.dg/cpp2a/concepts-lambda13.C: New test.
> > > ---
> > >   gcc/cp/constraint.cc  |  9 ++-
> > >   gcc/cp/cvt.c  |  2 +-
> > >   .../g++.dg/cpp2a/concepts-lambda13.C  | 25 +++
> > >   3 files changed, 34 insertions(+), 2 deletions(-)
> > >   create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-lambda13.C
> > > 
> > > diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
> > > index e4aace596e7..db2036502a7 100644
> > > --- a/gcc/cp/constraint.cc
> > > +++ b/gcc/cp/constraint.cc
> > > @@ -1993,7 +1993,8 @@ tsubst_compound_requirement (tree t, tree args,
> > > subst_info info)
> > >   /* Check the noexcept condition.  */
> > > bool noexcept_p = COMPOUND_REQ_NOEXCEPT_P (t);
> > > -  if (noexcept_p && !expr_noexcept_p (expr, tf_none))
> > > +  if (!processing_template_decl
> > > +  && noexcept_p && !expr_noexcept_p (expr, tf_none))
> > >   return error_mark_node;
> > >   /* Substitute through the type expression, if any.  */
> > > @@ -2023,6 +2024,12 @@ static tree
> > >   tsubst_nested_requirement (tree t, tree args, subst_info info)
> > >   {
> > > /* Ensure that we're in an evaluation context prior to satisfaction.
> > > */
> > > +  if (processing_template_decl)
> > > +{
> > > +  tree r = tsubst_constraint (TREE_OPERAND (t, 0), args,
> > > +   info.complain, info.in_decl);
> > 
> > Oops, the patch is missing a check for error_mark_node here, so that
> > upon substitution failure we immediately resolve the requires-expression
> > to false.  Here's an updated patch with the check and a regression test
> > added:
> > 
> > -- >8 --
> > 
> > gcc/cp/ChangeLog:
> > 
> > PR c++/96409
> > PR c++/96410
> > * constraint.cc (tsubst_compound_requirement): When
> > processing_template_decl, don't check noexcept of the
> > substituted expression.
> > (tsubst_nested_requirement): Just substitute into the constraint
> > when processing_template_decl.
> > * cvt.c (convert_to_void): Don't resolve concept checks when
> > processing_template_decl.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > PR c++/96409
> > PR c++/96410
> > * g++.dg/cpp2a/concepts-lambda13.C: New test.
> >   

Re: Do not combine PRED_LOOP_GUARD and PRED_LOOP_GUARD_WITH_RECURSION

2020-08-13 Thread Martin Liška

On 8/12/20 1:02 PM, Tamar Christina wrote:

Hmm the regression on exchange2 is 11%. Or 1.22% on specint 2017 overall..
This is a rather big regression.. What would be the correct way to do this?


I can confirm that many of our LNT tester configurations spotted that:

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=232.407.0
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=226.407.0
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=294.407.0
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=33.407.0

Martin


Re: [Patch, fortran] PR93671 - gfortran 8-10 ICE on intrinsic assignment to allocatable derived-type component of coarray

2020-08-13 Thread Andre Vehreschild
Hi Thomas,

thanks for the review. Committed to trunk.

That's what I am here for: to fix some issues with coarrays.

Regards,
Andre

On Wed, 12 Aug 2020 18:58:01 +0200
Thomas Koenig  wrote:

> Hi Andre,
>
> > Regtests ok on FC31.x86_64. Ok for trunk?
>
> Good thing you're back!  Any help with bugfixing is
> highly appreciated, and Coarrays certainly can use
> some work.
>
> The patch is OK for trunk.
>
> Best regards
>
>   Thomas


--
Andre Vehreschild * Email: vehre ad gmx dot de


[PATCH] arm: Require MVE memory operand for destination of vst1q intrinsic

2020-08-13 Thread Joe Ramsay
From: Joe Ramsay 

Hi,

Previously, the machine description patterns for vst1q accepted a generic memory
operand for the destination, which could lead to an unrecognised builtin when
expanding vst1q* intrinsics. This change fixes the patterns to only accept MVE
memory operands.

Tested on arm-none-eabi, clean w.r.t. gcc and CMSIS-DSP testsuites. OK for
trunk?

Thanks,
Joe

gcc/ChangeLog:

2020-08-13  Joe Ramsay 

* config/arm/mve.md (mve_vst1q_f): Require MVE memory operand for
destination.
(mve_vst1q_): Likewise.

gcc/testsuite/ChangeLog:

2020-08-13  Joe Ramsay 

* gcc.target/arm/mve/intrinsics/vst1q_f16.c: Add test that only MVE
memory operand is accepted.
* gcc.target/arm/mve/intrinsics/vst1q_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vst1q_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vst1q_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vst1q_u8.c: Likewise.
---
 gcc/config/arm/mve.md   |  4 ++--
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c | 10 +++---
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c | 10 +++---
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c  | 10 +++---
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c | 10 +++---
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c  | 10 +++---
 6 files changed, 37 insertions(+), 17 deletions(-)

diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 9758862..465b39a 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -9330,7 +9330,7 @@
   [(set_attr "length" "4")])
 
 (define_expand "mve_vst1q_f"
-  [(match_operand: 0 "memory_operand")
+  [(match_operand: 0 "mve_memory_operand")
(unspec: [(match_operand:MVE_0 1 "s_register_operand")] VST1Q_F)
   ]
   "TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT"
@@ -9340,7 +9340,7 @@
 })
 
 (define_expand "mve_vst1q_"
-  [(match_operand:MVE_2 0 "memory_operand")
+  [(match_operand:MVE_2 0 "mve_memory_operand")
(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand")] VST1Q)
   ]
   "TARGET_HAVE_MVE"
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
index 363b4ca..312b746 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
@@ -10,12 +10,16 @@ foo (float16_t * addr, float16x8_t value)
   vst1q_f16 (addr, value);
 }
 
-/* { dg-final { scan-assembler "vstrh.16"  }  } */
-
 void
 foo1 (float16_t * addr, float16x8_t value)
 {
   vst1q (addr, value);
 }
 
-/* { dg-final { scan-assembler "vstrh.16"  }  } */
+/* { dg-final { scan-assembler-times "vstrh.16" 2 }  } */
+
+void
+foo2 (float16_t a, float16x8_t x)
+{
+  vst1q (&a, x);
+}
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
index 37c4713..cd14e2c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
@@ -10,12 +10,16 @@ foo (int16_t * addr, int16x8_t value)
   vst1q_s16 (addr, value);
 }
 
-/* { dg-final { scan-assembler "vstrh.16"  }  } */
-
 void
 foo1 (int16_t * addr, int16x8_t value)
 {
   vst1q (addr, value);
 }
 
-/* { dg-final { scan-assembler "vstrh.16"  }  } */
+/* { dg-final { scan-assembler-times "vstrh.16" 2 }  } */
+
+void
+foo2 (int16_t a, int16x8_t x)
+{
+  vst1q (&a, x);
+}
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
index fe5edea..0004c80 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
@@ -10,12 +10,16 @@ foo (int8_t * addr, int8x16_t value)
   vst1q_s8 (addr, value);
 }
 
-/* { dg-final { scan-assembler "vstrb.8"  }  } */
-
 void
 foo1 (int8_t * addr, int8x16_t value)
 {
   vst1q (addr, value);
 }
 
-/* { dg-final { scan-assembler "vstrb.8"  }  } */
+/* { dg-final { scan-assembler-times "vstrb.8" 2 }  } */
+
+void
+foo2 (int8_t a, int8x16_t x)
+{
+  vst1q (&a, x);
+}
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
index a4c8c1a..248e7ce 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
@@ -10,12 +10,16 @@ foo (uint16_t * addr, uint16x8_t value)
   vst1q_u16 (addr, value);
 }
 
-/* { dg-final { scan-assembler "vstrh.16"  }  } */
-
 void
 foo1 (uint16_t * addr, uint16x8_t value)
 {
   vst1q (addr, value);
 }
 
-/* { dg-final { scan-assembler "vstrh.16"  }  } */
+/* { dg-final { scan-assembler-times "vstrh.16" 2 }  } */
+
+void
+foo2 (uint16_t a, uint16x8_t x)
+{
+  vst1q (&a, x);
+}
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c
index bf20b6d..f8b48a6 

[PATCH] arm: Remove coercion from scalar argument to vmin & vmax intrinsics

2020-08-13 Thread Joe Ramsay
From: Joe Ramsay 

Hi,

This patch fixes an issue with vmin* and vmax* intrinsics which accept
a scalar argument. Previously, when the scalar was of a different width
from the vector elements, this would generate __ARM_undef. This change
allows the scalar argument to be implicitly converted to the correct
width. Also tidied up the relevant unit tests, some of which would
have passed even if only one of two or three intrinsic calls had
compiled correctly.
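The difference between the two behaviours can be sketched by analogy in portable C++. This is not how arm_mve.h is written; Int16x8, strict_match and max_across are hypothetical names, and the reduction is a plain loop rather than an MVE instruction. The sketch contrasts a dispatch that only accepts a scalar of exactly the lane type with an ordinary parameter that lets a narrower scalar convert implicitly.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for a vector of eight 16-bit lanes.
struct Int16x8 {
  std::int16_t lane[8];
};

// Strict dispatch: resolves only when the scalar has exactly the lane
// type, so a scalar of any other width is rejected.
template <typename Scalar>
constexpr bool strict_match() {
  return std::is_same<Scalar, std::int16_t>::value;
}

// Relaxed form: the scalar parameter is declared with the lane type, so
// a narrower integer argument is converted implicitly at the call.
inline std::int16_t max_across(std::int16_t start, const Int16x8 &v) {
  std::int16_t best = start;
  for (std::int16_t x : v.lane)
    if (x > best)
      best = x;
  return best;
}
```

Calling max_across with an int8_t scalar works, whereas the strict check rejects that type outright, which is the analogue of the mismatch the patch removes.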

Bootstrapped and tested on arm-none-eabi, gcc and CMSIS_DSP
testsuites are clean. OK for trunk?

Thanks,
Joe

gcc/ChangeLog:

2020-08-10  Joe Ramsay 

* config/arm/arm_mve.h (__arm_vmaxnmavq): Remove coercion of scalar
argument.
(__arm_vmaxnmvq): Likewise.
(__arm_vminnmavq): Likewise.
(__arm_vminnmvq): Likewise.
(__arm_vmaxnmavq_p): Likewise.
(__arm_vmaxnmvq_p): Likewise (and delete duplicate definition).
(__arm_vminnmavq_p): Likewise.
(__arm_vminnmvq_p): Likewise.
(__arm_vmaxavq): Likewise.
(__arm_vmaxavq_p): Likewise.
(__arm_vmaxvq): Likewise.
(__arm_vmaxvq_p): Likewise.
(__arm_vminavq): Likewise.
(__arm_vminavq_p): Likewise.
(__arm_vminvq): Likewise.
(__arm_vminvq_p): Likewise.

gcc/testsuite/ChangeLog:

2020-08-10  Joe Ramsay 

* gcc.target/arm/mve/intrinsics/vmaxavq_p_s16.c: Add test for mismatched
width of scalar argument.
* gcc.target/arm/mve/intrinsics/vmaxavq_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxavq_p_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxavq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxavq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxavq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmavq_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmavq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmvq_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmvq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_p_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_p_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_p_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_p_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_p_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminavq_p_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminavq_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminavq_p_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminavq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminavq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminavq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmavq_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmavq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmavq_p_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmavq_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmvq_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmvq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmvq_p_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmvq_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_p_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_p_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_p_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_p_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_p_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_u8.c: Likewise.
---
 gcc/config/arm/arm_mve.h   | 110 ++---
 .../gcc.target/arm/mve/intrinsics/vmaxavq_p_s16.c  |  11 ++-
 .../gcc.target/arm/mve/intrinsics/vmaxavq_p_s32.c  

Re: [PATCH] emit-rtl.c: Allow splitting of RTX_FRAME_RELATED_P insns?

2020-08-13 Thread Senthil Kumar Selvaraj via Gcc-patches


Richard Sandiford writes:

> Senthil Kumar via Gcc-patches  writes:
>> Hi,
>>
>>   I'm working on converting the AVR backend to MODE_CC, following
>>   the steps described for case #2 in the CC0 transition wiki page,
>>   and I've implemented the first three bullet
>>   points (https://github.com/saaadhu/gcc-avr-cc0/tree/avr-cc0-squashed). With
>>   the below patch, there are zero regressions (for mega and xmega
>>   subarchs) compared to the current mainline, as of yesterday.
>>
>>   The wiki suggests using post-reload splitters, so that's the
>>   direction I took, but I ran into an issue where split_insn
>>   bails out early if RTX_FRAME_RELATED_P is true - this means
>>   that splits for REG_CC clobbering insns with
>>   RTX_FRAME_RELATED_P will never execute, resulting in a
>>   could-not-split insn ICE in the final stage.
>>
>>   I see that the recog.c:peep2_attempt allows splitting of a
>>   RTX_FRAME_RELATED_P insn, provided the result of the split is a
>>   single insn. Would it be ok to modify try_split also to
>>   allow those kinds of insns (tentative patch attached, code
>>   copied over from peep2_attempt, only setting old and new_insn)? Or is there
>>   a different approach to fix this?
>
> I agree there's no obvious reason why splitting to a single insn
> should be rejected but a peephole2 to a single instruction should be OK.
> And reusing the existing, tried-and-tested code is the way to go.
>
> But could you split the code out of peep2_attempt into a subroutine
> (probably still in recog.c) and reuse it in try_split?

How does the below patch look? Bootstrapped and regression-tested on
x86_64-linux.
>
> BTW, just to check: is your email address in MAINTAINERS still correct?

It was out-of-date, yes - updated now.

Regards
Senthil


2020-08-13  Senthil Kumar Selvaraj  
   
gcc/ChangeLog:

* emit-rtl.c (try_split): Call copy_frame_info_to_split_insn
to split certain RTX_FRAME_RELATED_P insns.
* recog.c (copy_frame_info_to_split_insn): New function.
(peep2_attempt): Split copying of frame related info of
RTX_FRAME_RELATED_P insns into above function and call it.
* recog.h (copy_frame_info_to_split_insn): Declare it.
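
As an illustration of the gate this ChangeLog describes, here is a toy model of the decision try_split makes with the patch applied. It is only a sketch: SplitDecision and decide_split are hypothetical names that do not exist in GCC, the booleans stand in for RTX_FRAME_RELATED_P (trial) and reload_completed, and the copying of REG_CFA_* notes is not modelled at all.

```cpp
// Toy model only: GCC's real code walks rtx_insn chains, not plain ints.
enum class SplitDecision { Reject, Accept, AcceptAndCopyFrameInfo };

// frame_related    : stands in for RTX_FRAME_RELATED_P (trial)
// reload_completed : stands in for GCC's reload_completed flag
// split_insn_count : number of insns the splitter produced
constexpr SplitDecision
decide_split (bool frame_related, bool reload_completed, int split_insn_count)
{
  return !frame_related
           ? SplitDecision::Accept
           : (!reload_completed || split_insn_count != 1)
               ? SplitDecision::Reject
               : SplitDecision::AcceptAndCopyFrameInfo;
}

// The only frame-related case that is accepted: one insn, after reload.
static_assert (decide_split (true, true, 1)
               == SplitDecision::AcceptAndCopyFrameInfo,
               "single post-reload insn keeps its frame info");
```

Before the patch, every call with frame_related set would have ended in the Reject branch, which is what led to the could-not-split ICE described earlier in the thread.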

diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index f9b0e9714d9..3706f0a03fd 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -3822,10 +3822,6 @@ try_split (rtx pat, rtx_insn *trial, int last)
   int njumps = 0;
   rtx_insn *call_insn = NULL;
 
-  /* We're not good at redistributing frame information.  */
-  if (RTX_FRAME_RELATED_P (trial))
-return trial;
-
   if (any_condjump_p (trial)
   && (note = find_reg_note (trial, REG_BR_PROB, 0)))
 split_branch_probability
@@ -3842,6 +3838,7 @@ try_split (rtx pat, rtx_insn *trial, int last)
   if (!seq)
 return trial;
 
+  int split_insn_count = 0;
   /* Avoid infinite loop if any insn of the result matches
  the original pattern.  */
   insn_last = seq;
@@ -3850,11 +3847,25 @@ try_split (rtx pat, rtx_insn *trial, int last)
   if (INSN_P (insn_last)
  && rtx_equal_p (PATTERN (insn_last), pat))
return trial;
+  split_insn_count++;
   if (!NEXT_INSN (insn_last))
break;
   insn_last = NEXT_INSN (insn_last);
 }
 
+  /* We're not good at redistributing frame information if
+ the split occurs before reload or if it results in more
+ than one insn.  */
+  if (RTX_FRAME_RELATED_P (trial))
+{
+  if (!reload_completed || split_insn_count != 1)
+return trial;
+
+  rtx_insn *new_insn = seq;
+  rtx_insn *old_insn = trial;
+  copy_frame_info_to_split_insn (old_insn, new_insn);
+}
+
   /* We will be adding the new sequence to the function.  The splitters
  may have introduced invalid RTL sharing, so unshare the sequence now.  */
   unshare_all_rtl_in_chain (seq);
diff --git a/gcc/recog.c b/gcc/recog.c
index 25f19b1b1cf..e024597f9d7 100644
--- a/gcc/recog.c
+++ b/gcc/recog.c
@@ -3277,6 +3277,78 @@ peep2_reinit_state (regset live)
   COPY_REG_SET (peep2_insn_data[MAX_INSNS_PER_PEEP2].live_before, live);
 }
 
+/* Copies frame related info of an insn (old_insn) to the single
+   insn (new_insn) that was obtained by splitting old_insn.  */
+
+void
+copy_frame_info_to_split_insn (rtx_insn *old_insn, rtx_insn *new_insn)
+{
+  bool any_note = false;
+  rtx note;
+
+  if (!RTX_FRAME_RELATED_P (old_insn))
+return;
+
+  RTX_FRAME_RELATED_P (new_insn) = 1;
+
+  /* Allow the backend to fill in a note during the split.  */
+  for (note = REG_NOTES (new_insn); note ; note = XEXP (note, 1))
+switch (REG_NOTE_KIND (note))
+  {
+  case REG_FRAME_RELATED_EXPR:
+  case REG_CFA_DEF_CFA:
+  case REG_CFA_ADJUST_CFA:
+  case REG_CFA_OFFSET:
+  case REG_CFA_REGISTER:
+  case REG_CFA_EXPRESSION:
+  case REG_CFA_RESTORE:
+  case REG_CFA_SET_VDRAP:
+any_note = true;
+break;
+  default:
+break;
+  }
+
+  /* If the backend didn't supply a note, copy 

Re: r11-2663 causes static_assert failure

2020-08-13 Thread Jakub Jelinek via Gcc-patches
On Thu, Aug 13, 2020 at 01:55:40PM +0100, Iain Sandoe wrote:
> diff --git a/gcc/vec.h b/gcc/vec.h
> index db48e97..a8fca34 100644
> --- a/gcc/vec.h
> +++ b/gcc/vec.h
> @@ -623,7 +623,7 @@ public:
>/* FIXME - These fields should be private, but we need to cater to
>compilers that have stricter notions of PODness for types.  */
>vec_prefix m_vecpfx;
> -  T m_vecdata[1];
> +  T m_vecdata[1] __attribute__((__aligned__(alignof(T))));

That uses an extension, so won't help us if the system compiler doesn't
support that extension, and will unnecessarily waste bytes in all the long
long (and some double?) vecs.

Perhaps even better we can just use vec itself rather than creating a new
artificial type, and just query the alignof the m_vecdata member in that
case:

--- gcc/vec.h.jj  2020-08-12 12:45:58.410686880 +0200
+++ gcc/vec.h   2020-08-13 15:03:15.823777382 +0200
@@ -1281,10 +1281,11 @@ template
 inline size_t
 vec::embedded_size (unsigned alloc)
 {
-  struct alignas (T) U { char data[sizeof (T)]; };
+  vec *v;
+  struct alignas (alignof (v->m_vecdata)) U { char data[sizeof (T)]; };
   typedef vec vec_embedded;
-  static_assert (sizeof (vec_embedded) == sizeof(vec), "");
-  static_assert (alignof (vec_embedded) == alignof(vec), "");
+  static_assert (sizeof (vec_embedded) == sizeof (vec), "");
+  static_assert (alignof (vec_embedded) == alignof (vec), "");
   return offsetof (vec_embedded, m_vecdata) + alloc * sizeof (T);
 }
 


Jakub
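
For readers following the diffs, the stand-in trick used by embedded_size can be sketched outside of GCC roughly as below. This is a simplified model under stated assumptions, not the vec.h code: prefix, embedded_vec, stand_in, base and payload are hypothetical names, and the layout only imitates a prefix followed by a one-element trailing array. The idea is that offsetof is applied to a container of char buffers with the size and alignment of T, so it stays well-defined even when T is not standard-layout.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-ins for vec_prefix and the embedded vec layout.
struct prefix { unsigned alloc; unsigned num; };

template <typename T>
struct embedded_vec
{
  prefix pfx;
  T data[1];   // first element of the trailing array
};

// A payload that is not standard-layout (data members in base and derived),
// so offsetof on embedded_vec<payload> is only conditionally supported.
struct base { int id; };
struct payload : base { double weight; };
static_assert (!std::is_standard_layout<payload>::value,
               "payload is meant to be non-standard-layout");

// Char buffer with the size and the alignment of T.
template <typename T>
struct alignas (T) stand_in { char data[sizeof (T)]; };

template <typename T>
constexpr std::size_t
embedded_size (unsigned alloc)
{
  typedef embedded_vec<stand_in<T> > vec_stand_in;
  // These mirror the two static_asserts in the diff.  On a host where
  // alignof (T) is larger than the alignment T gets as a struct member,
  // the second one is the assertion that fires.
  static_assert (sizeof (vec_stand_in) == sizeof (embedded_vec<T>),
                 "stand-in container must have the same size");
  static_assert (alignof (vec_stand_in) == alignof (embedded_vec<T>),
                 "stand-in container must have the same alignment");
  return offsetof (vec_stand_in, data) + alloc * sizeof (T);
}
```

A call such as embedded_size<payload> (4) then yields the number of bytes needed for the prefix plus four elements, without ever applying offsetof to a non-standard-layout type.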



Re: r11-2663 causes static_assert failure

2020-08-13 Thread Jakub Jelinek via Gcc-patches
On Thu, Aug 13, 2020 at 01:46:53PM +0100, Iain Sandoe wrote:
> Jakub Jelinek via Gcc-patches  wrote:
> 
> > On Thu, Aug 13, 2020 at 02:06:21PM +0200, Tobias Burnus wrote:
> > > Build server is x86_64-gnu-linux, "i686-pc-linux-gnu-g++" is a
> > > native binary of GCC 5.2.0
> > > [Don't ask why i686 and not x86_64 is used.]
> > 
> > Ah, ok, so this boils down to:
> > int i = alignof (long long int);
> > struct TTT { char a; long long int b; };
> > int j = alignof (TTT);
> > ending up to be 8, 4 with -O2 -m32 in GCC 4.8 - 7.x and
> > only in GCC 8+ 4, 4.
> 
> Also the case for long long (darwin) and double (darwin, aix) where the
> embedded alignment
> is 4 and the natural alignment is 8 - as ABI (so independent of compiler
> version).
> (which breaks bootstrap on all 32b Darwin hosts at least)
> 
> If the desired outcome is that the embedded vector is aligned to the natural
> alignment
> of the type that could be forced on that field of the vec?

The desired outcome is to get the offsetof value even for
vec with non-standard-layout type payload, and so in that case build

Jakub



Re: r11-2663 causes static_assert failure

2020-08-13 Thread Iain Sandoe

Iain Sandoe via Gcc-patches  wrote:


Jakub Jelinek via Gcc-patches  wrote:


On Thu, Aug 13, 2020 at 02:06:21PM +0200, Tobias Burnus wrote:
Build server is x86_64-gnu-linux, "i686-pc-linux-gnu-g++" is a native  
binary of GCC 5.2.0

[Don't ask why i686 and not x86_64 is used.]


Ah, ok, so this boils down to:
int i = alignof (long long int);
struct TTT { char a; long long int b; };
int j = alignof (TTT);
ending up to be 8, 4 with -O2 -m32 in GCC 4.8 - 7.x and
only in GCC 8+ 4, 4.


Also the case for long long (darwin) and double (darwin, aix) where the
embedded alignment
is 4 and the natural alignment is 8 - as ABI (so independent of compiler
version).

(which breaks bootstrap on all 32b Darwin hosts at least)

If the desired outcome is that the embedded vector is aligned to the  
natural alignment

of the type that could be forced on that field of the vec?


i.e:

diff --git a/gcc/vec.h b/gcc/vec.h
index db48e97..a8fca34 100644
--- a/gcc/vec.h
+++ b/gcc/vec.h
@@ -623,7 +623,7 @@ public:
   /* FIXME - These fields should be private, but we need to cater to
 compilers that have stricter notions of PODness for types.  */
   vec_prefix m_vecpfx;
-  T m_vecdata[1];
+  T m_vecdata[1] __attribute__((__aligned__(alignof(T))));
 };




Re: [PATCH] ipa-inline: Improve growth accumulation for recursive calls

2020-08-13 Thread Jan Hubicka
> 
> Thanks for the information :)  Tamar replied that there is another
> regression *on exchange2 of 11%*.  I've also rebased my code and confirmed
> that it really is getting even slower than before (reverting the patch
> brings the performance back)...

Yep, we need to figure out how to fix this - the summary on the IRA issue is
interesting.  I was aware of it, but never really looked into what
IRA does wrong.

Basically what happened historically is that when exchange2 was newly
added to SPEC, Martin and I looked at it and noticed the issue with the
loop nest being predicted very aggressively toward the innermost loop,
which led the loop optimizer to do funny things on loops that really
must not iterate too many times, since we know that the frequency of
the recursive call is strictly less than 1.

We spent some time on tuning the inliner for low trip count loops and also
added the LOOP_GUARD_WITH_RECURSION heuristic, which was meant to reduce
the probability of entering a loop that contains a recursive call - this
should be a pattern in all similar backtrack-like algorithms.
The conditions terminating the walk should be likely or the program
would never finish.
> 
> > 
> > Now if ipa-cp decides to duplicate digits a few times we have a new
> > problem.  The tree of recursion is organized in a way that the depth is
> > bounded by 10 (which GCC does not know) and moreover most time is not
> > spent on very deep levels of recursion.
> > 
> > For that you have the patch which increases frequencies of recursively
> > cloned nodes, however it still seems to me as very specific hack for
> > exchange: I do not see how to guess where most of time is spent.
> > Even for very regular trees, by master theorem, it depends on very
> > little differences in the estimates of recursion frequency whether most
> > of time is spent on the top of tree, bottom or things are balanced.
> 
> The build is not PGO, so I am not clear how profile count will affect the 
> ipa-cp and ipa-inline decision. 

Even without PGO the counts are used.  predict.c first estimates the
branch probabilities and then these are propagated to estimated counts
of basic blocks, and this is still used throughout the compiler to drive
the optimization decisions (so ipa-inline computes the estimated runtime
effects of the inlining, all weighted by the counts; ipa-cp does
similarly).

What we ended up with was a bug in the patch adding LOOP_GUARD_WITH_RECURSION
which resulted in the loop guards in exchange being predicted by both
LOOP_GUARD_WITH_RECURSION and LOOP_GUARD.
Since the first claims an 85% chance that the loop will not be entered and
the second 75%, the combined outcome got over 90% probability, and combining
10 conditions resulted in a very small frequency of the recursive edge.
It did help IRA to allocate sanely, but not for good reasons,
so we ended up with exchange improvements and did not notice the
bug (this is the one fixed by the patch above).
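
The "over 90%" figure and the resulting tiny frequency can be reproduced with a few lines of arithmetic. The sketch below assumes that the two hit rates are combined with the Dempster-Shafer rule (to my knowledge that is how predict.c merges ordinary heuristics) and that the ten conditions behave like ten nested guards that must all be entered; combine and pow_n are hypothetical helper names, not GCC functions.

```cpp
// Dempster-Shafer combination of two independent "not entered" estimates.
constexpr double
combine (double p1, double p2)
{
  return p1 * p2 / (p1 * p2 + (1.0 - p1) * (1.0 - p2));
}

// x raised to a small non-negative integer power.
constexpr double
pow_n (double x, int n)
{
  return n == 0 ? 1.0 : x * pow_n (x, n - 1);
}

// Both predictors applied to one guard: 0.6375 / 0.675, about 0.944.
constexpr double not_entered = combine (0.85, 0.75);

// Chance of getting through ten such guards in a row: about 2.8e-13.
constexpr double nested_both = pow_n (1.0 - not_entered, 10);

// The same with only the 85% predictor: about 5.8e-9.
constexpr double nested_one = pow_n (1.0 - 0.85, 10);
```

Under these assumptions the double prediction lowers the estimated frequency of the innermost recursive edge by roughly four orders of magnitude compared with applying the 85% predictor alone.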

For some releases PGO performance is slower than non-PGO
https://lnt.opensuse.org/db_default/v4/SPEC/spec_report/options
which I think is a combination of IRA and bad decisions in some loop
opts.  The other problem is that vectorizer tends to blow up the
register pressure too.

> Since there are no other callers outside of these specialized nodes, the
> guessed profile count should be equal?  The perf tool shows that even though
> each specialized node is called only once, none of them takes the same time
> for each call:
> 
>   40.65%  exchange2_gcc.o  exchange2_gcc.orig.slow  [.] __brute_force_MOD_digits_2.constprop.4
>   16.31%  exchange2_gcc.o  exchange2_gcc.orig.slow  [.] __brute_force_MOD_digits_2.constprop.3
>   10.91%  exchange2_gcc.o  libgfortran.so.5.0.0     [.] _gfortran_mminloc0_4_i4
>    5.41%  exchange2_gcc.o  exchange2_gcc.orig.slow  [.] __brute_force_MOD_digits_2.constprop.6
>    4.68%  exchange2_gcc.o  exchange2_gcc.orig.slow  [.] __logic_MOD_new_solver
>    3.76%  exchange2_gcc.o  exchange2_gcc.orig.slow  [.] __brute_force_MOD_digits_2.constprop.5
>    1.07%  exchange2_gcc.o  exchange2_gcc.orig.slow  [.] __brute_force_MOD_digits_2.constprop.7
>    0.84%  exchange2_gcc.o  exchange2_gcc.orig.slow  [.] __brute_force_MOD_brute.constprop.0
>    0.47%  exchange2_gcc.o  exchange2_gcc.orig.slow  [.] __brute_force_MOD_digits_2.constprop.2
>    0.24%  exchange2_gcc.o  exchange2_gcc.orig.slow  [.] __brute_force_MOD_digits_2.constprop.1
>    0.24%  exchange2_gcc.o  exchange2_gcc.orig.slow  [.] __brute_force_MOD_covered.constprop.0
>    0.11%  exchange2_gcc.o  exchange2_gcc.orig.slow  [.] __brute_force_MOD_reflected.constprop.0
>    0.00%  exchange2_gcc.o  exchange2_gcc.orig.slow  [.] __brute_force_MOD_brute.constprop.1
> 
> 
> digits_2.constprop.4 & digits_2.constprop.3 takes most of the execution time,
> So profile count and frequency seem not very helpful for this case? 

Yep, you can not really determine the time spent on each of recursion
levels from the recursive edge probability since you can 

[PATCH] Add missing vn_reference_t::punned initialization

2020-08-13 Thread Martin Liška

As mentioned in the PR, we miss one initialization of ::punned
in vn_reference_lookup_call.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

PR tree-optimization/96597
* tree-ssa-sccvn.c (vn_reference_lookup_call): Add missing
initialization of ::punned.
(vn_reference_insert): Use consistently false instead of 0.
(vn_reference_insert_pieces): Likewise.
---
 gcc/tree-ssa-sccvn.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index 934ae40670d..789d3664db5 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -3578,6 +3578,7 @@ vn_reference_lookup_call (gcall *call, vn_reference_t *vnresult,
   vr->vuse = vuse ? SSA_VAL (vuse) : NULL_TREE;
   vr->operands = valueize_shared_reference_ops_from_call (call);
   vr->type = gimple_expr_type (call);
+  vr->punned = false;
   vr->set = 0;
   vr->base_set = 0;
   vr->hashcode = vn_reference_compute_hash (vr);
@@ -3601,7 +3602,7 @@ vn_reference_insert (tree op, tree result, tree vuse, tree vdef)
   vr1->vuse = vuse_ssa_val (vuse);
   vr1->operands = valueize_shared_reference_ops_from_ref (op, ).copy ();
   vr1->type = TREE_TYPE (op);
-  vr1->punned = 0;
+  vr1->punned = false;
   ao_ref op_ref;
   ao_ref_init (&op_ref, op);
   vr1->set = ao_ref_alias_set (&op_ref);
@@ -3661,7 +3662,7 @@ vn_reference_insert_pieces (tree vuse, alias_set_type set,
   vr1->vuse = vuse_ssa_val (vuse);
   vr1->operands = valueize_refs (operands);
   vr1->type = type;
-  vr1->punned = 0;
+  vr1->punned = false;
   vr1->set = set;
   vr1->base_set = base_set;
   vr1->hashcode = vn_reference_compute_hash (vr1);
--
2.28.0



Re: r11-2663 causes static_assert failure

2020-08-13 Thread Iain Sandoe via Gcc-patches

Jakub Jelinek via Gcc-patches  wrote:


On Thu, Aug 13, 2020 at 02:06:21PM +0200, Tobias Burnus wrote:
Build server is x86_64-gnu-linux, "i686-pc-linux-gnu-g++" is a native  
binary of GCC 5.2.0

[Don't ask why i686 and not x86_64 is used.]


Ah, ok, so this boils down to:
int i = alignof (long long int);
struct TTT { char a; long long int b; };
int j = alignof (TTT);
ending up to be 8, 4 with -O2 -m32 in GCC 4.8 - 7.x and
only in GCC 8+ 4, 4.


Also the case for long long (darwin) and double (darwin, aix) where the
embedded alignment
is 4 and the natural alignment is 8 - as ABI (so independent of compiler
version).

(which breaks bootstrap on all 32b Darwin hosts at least)

If the desired outcome is that the embedded vector is aligned to the  
natural alignment

of the type that could be forced on that field of the vec?

Iain




So perhaps we want:
--- gcc/vec.h.jj  2020-08-12 12:45:58.410686880 +0200
+++ gcc/vec.h   2020-08-13 14:18:06.967041880 +0200
@@ -1281,10 +1281,11 @@ template
inline size_t
vec::embedded_size (unsigned alloc)
{
-  struct alignas (T) U { char data[sizeof (T)]; };
+  struct V { char a; T b; };
+  struct alignas (V) U { char data[sizeof (T)]; };
  typedef vec vec_embedded;
-  static_assert (sizeof (vec_embedded) == sizeof(vec), "");
-  static_assert (alignof (vec_embedded) == alignof(vec), "");
+  static_assert (sizeof (vec_embedded) == sizeof (vec), "");
+  static_assert (alignof (vec_embedded) == alignof (vec), "");
  return offsetof (vec_embedded, m_vecdata) + alloc * sizeof (T);
}

or
--- gcc/vec.h.jj  2020-08-12 12:45:58.410686880 +0200
+++ gcc/vec.h   2020-08-13 14:18:06.967041880 +0200
@@ -1281,10 +1281,11 @@ template
inline size_t
vec::embedded_size (unsigned alloc)
{
-  struct alignas (T) U { char data[sizeof (T)]; };
+  struct V { char a; T b; } *v;
+  struct alignas (alignof (v->b)) U { char data[sizeof (T)]; };
  typedef vec vec_embedded;
-  static_assert (sizeof (vec_embedded) == sizeof(vec), "");
-  static_assert (alignof (vec_embedded) == alignof(vec), "");
+  static_assert (sizeof (vec_embedded) == sizeof (vec), "");
+  static_assert (alignof (vec_embedded) == alignof (vec), "");
  return offsetof (vec_embedded, m_vecdata) + alloc * sizeof (T);
}

?

Jakub





Re: r11-2663 causes static_assert failure

2020-08-13 Thread Jakub Jelinek via Gcc-patches
On Thu, Aug 13, 2020 at 02:25:42PM +0200, Jakub Jelinek via Gcc-patches wrote:
> --- gcc/vec.h.jj  2020-08-12 12:45:58.410686880 +0200
> +++ gcc/vec.h 2020-08-13 14:18:06.967041880 +0200
> @@ -1281,10 +1281,11 @@ template
>  inline size_t
>  vec::embedded_size (unsigned alloc)
>  {
> -  struct alignas (T) U { char data[sizeof (T)]; };
> +  struct V { char a; T b; } *v;
> +  struct alignas (alignof (v->b)) U { char data[sizeof (T)]; };
>typedef vec vec_embedded;
> -  static_assert (sizeof (vec_embedded) == sizeof(vec), "");
> -  static_assert (alignof (vec_embedded) == alignof(vec), "");
> +  static_assert (sizeof (vec_embedded) == sizeof (vec), "");
> +  static_assert (alignof (vec_embedded) == alignof (vec), "");
>return offsetof (vec_embedded, m_vecdata) + alloc * sizeof (T);
>  }
>  

I guess this version is better.  Or maybe we should use the old
embedded_size version with offsetof for std::is_standard_layout (T)
and only use the new one (and in that case we don't need hacks like this
otherwise)?

Jakub



Re: r11-2663 causes static_assert failure

2020-08-13 Thread Jakub Jelinek via Gcc-patches
On Thu, Aug 13, 2020 at 02:06:21PM +0200, Tobias Burnus wrote:
> Build server is x86_64-gnu-linux, "i686-pc-linux-gnu-g++" is a native binary 
> of GCC 5.2.0
> [Don't ask why i686 and not x86_64 is used.]

Ah, ok, so this boils down to:
int i = alignof (long long int);
struct TTT { char a; long long int b; };
int j = alignof (TTT);
ending up to be 8, 4 with -O2 -m32 in GCC 4.8 - 7.x and
only in GCC 8+ 4, 4.
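
The observation above is easy to check on a given host with a small translation unit. This is only a sketch: type_align, struct_align, field_offset and report_alignment are hypothetical names introduced for the illustration, and the numbers it reports depend on the compiler version and ABI in use.

```cpp
#include <cstddef>
#include <cstdio>

struct TTT { char a; long long int b; };

// Alignment the compiler reports for the bare type.
constexpr std::size_t type_align = alignof (long long int);
// Alignment of a struct embedding the type; this follows the field alignment.
constexpr std::size_t struct_align = alignof (TTT);
// Offset actually given to the embedded member.
constexpr std::size_t field_offset = offsetof (TTT, b);

// The member sits at the first offset that satisfies the struct's alignment,
// so this holds whether or not type_align and struct_align agree.
static_assert (field_offset == struct_align,
               "b is placed at the struct's own alignment");

// Print the three values for the current compiler and ABI.
inline void
report_alignment ()
{
  std::printf ("alignof (long long int) = %zu\n", type_align);
  std::printf ("alignof (TTT)           = %zu\n", struct_align);
  std::printf ("offsetof (TTT, b)       = %zu\n", field_offset);
}
```

Calling report_alignment () from main in a build with -m32 should show whether the host compiler reports different values for the first two lines, which is the situation that trips the static_assert in vec.h.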

So perhaps we want:
--- gcc/vec.h.jj  2020-08-12 12:45:58.410686880 +0200
+++ gcc/vec.h   2020-08-13 14:18:06.967041880 +0200
@@ -1281,10 +1281,11 @@ template
 inline size_t
 vec::embedded_size (unsigned alloc)
 {
-  struct alignas (T) U { char data[sizeof (T)]; };
+  struct V { char a; T b; };
+  struct alignas (V) U { char data[sizeof (T)]; };
   typedef vec vec_embedded;
-  static_assert (sizeof (vec_embedded) == sizeof(vec), "");
-  static_assert (alignof (vec_embedded) == alignof(vec), "");
+  static_assert (sizeof (vec_embedded) == sizeof (vec), "");
+  static_assert (alignof (vec_embedded) == alignof (vec), "");
   return offsetof (vec_embedded, m_vecdata) + alloc * sizeof (T);
 }

or
--- gcc/vec.h.jj  2020-08-12 12:45:58.410686880 +0200
+++ gcc/vec.h   2020-08-13 14:18:06.967041880 +0200
@@ -1281,10 +1281,11 @@ template
 inline size_t
 vec::embedded_size (unsigned alloc)
 {
-  struct alignas (T) U { char data[sizeof (T)]; };
+  struct V { char a; T b; } *v;
+  struct alignas (alignof (v->b)) U { char data[sizeof (T)]; };
   typedef vec vec_embedded;
-  static_assert (sizeof (vec_embedded) == sizeof(vec), "");
-  static_assert (alignof (vec_embedded) == alignof(vec), "");
+  static_assert (sizeof (vec_embedded) == sizeof (vec), "");
+  static_assert (alignof (vec_embedded) == alignof (vec), "");
   return offsetof (vec_embedded, m_vecdata) + alloc * sizeof (T);
 }
 
?

Jakub



Re: [PATCH] AArch64: Add if condition in aarch64_function_value [PR96479]

2020-08-13 Thread Richard Sandiford
Christophe Lyon  writes:
> On Thu, 13 Aug 2020 at 03:54, qiaopeixin  wrote:
>>
>> Thanks for the review and commit.
>>
>> All the best,
>> Peixin
>>
>> -Original Message-
>> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
>> Sent: 2020年8月13日 0:25
>> To: qiaopeixin 
>> Cc: gcc-patches@gcc.gnu.org
>> Subject: Re: [PATCH] AArch64: Add if condition in aarch64_function_value 
>> [PR96479]
>>
>> qiaopeixin  writes:
>> > Hi,
>> >
>> > The test case vector-subscript-2.c in the gcc testsuite will report an ICE 
>> > in the expand pass since '-mgeneral-regs-only' is incompatible with the 
>> > use of V4SI mode. I propose to report the diagnostic information instead 
>> > of ICE, and the problem has been discussed on PR 96479.
>> >
>> > I attached the patch to solve the problem. Bootstrapped and tested on 
>> > aarch64-linux-gnu. Any suggestions?
>>
>> Thanks, pushed.  I was initially sceptical because raising an error here and 
>> in aarch64_layout_arg is a hack.  Both functions are just query functions 
>> and shouldn't have any side effects.
>>
>> The approach we took for FP modes seemed better: we define the FP move 
>> patterns unconditionally, and raise an error if we try to emit an FP move 
>> with !TARGET_FLOAT.  This defers any error reporting until we actually try 
>> to generate code that depends on TARGET_FLOAT.
>>
>> But I guess SIMD stuff is different.  There's no reason in principle why you 
>> can't use:
>>
>>   unsigned short __attribute__((vector_size(8)))
>>
>> *within* a function with -mgeneral-regs-only.  It would just need to be 
>> emulated, in the same way as for:
>>
>>   unsigned short __attribute__((vector_size(4)))
>>
>> So it would be wrong to define the SIMD move patterns unconditionally and 
>> raise an error there.
>>
>> So all in all, I agree this is the best we can do given the current 
>> infrastructure.
>>
>
> Since this patch was committed my buildbot is broken for
> aarch64-linux-gnu because it now fails to build glibc-2.29:
> ../stdlib/bits/stdlib-float.h: In function 'atof':
> ../stdlib/bits/stdlib-float.h:26:1: error: '-mgeneral-regs-only' is
> incompatible with the use of floating-point types

Thanks for the heads-up.  I've reverted the patch for now.

Looking more closely, it seems like aarch64_init_cumulative_args
already tries to catch the problem that the patch was fixing:

  if (!silent_p
  && !TARGET_FLOAT
  && fndecl && TREE_PUBLIC (fndecl)
  && fntype && fntype != error_mark_node)
{
  const_tree type = TREE_TYPE (fntype);
  machine_mode mode ATTRIBUTE_UNUSED; /* To pass pointer as argument.  */
  int nregs ATTRIBUTE_UNUSED; /* Likewise.  */
      if (aarch64_vfp_is_call_or_return_candidate (TYPE_MODE (type), type,
                                                   &mode, &nregs, NULL, false))
aarch64_err_no_fpadvsimd (TYPE_MODE (type));
}

The only reason it doesn't work for the testcase is the TREE_PUBLIC
condition.  TBH I'm not sure why it or the fndecl test is there:
this is just as problematic when calling via a function pointer
or when calling a static function.

Richard


Re: r11-2663 causes static_assert failure

2020-08-13 Thread Tobias Burnus

On 8/13/20 1:52 PM, Jakub Jelinek wrote:


On Thu, Aug 13, 2020 at 01:38:07PM +0200, Tobias Burnus wrote:

I got a bit lost in this thread – but the
commit r11-2663-g82c4b78dbef6f03838e3040688c934360a09513f
"Replace std::vector<> usage in ipa-fnsummary.c with GCC's vec<>."

Causes here:

gcc-mainline/gcc/vec.h:1287:3: error: static assertion failed:
static_assert (alignof (vec_embedded) == alignof(vec), "");
^

On which instantiation is it, and what is your system compiler?
Do you get it when building the i686 native compiler, the i686 to powerpc64le
cross-compiler, or the powerpc64le to nvptx-none cross-compiler?


Build server is x86_64-gnu-linux, "i686-pc-linux-gnu-g++" is a native binary of 
GCC 5.2.0
[Don't ask why i686 and not x86_64 is used.]

Run command was:

i686-pc-linux-gnu-g++  -std=gnu++11 -c -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE
-DGENERATOR_FILE  -I. -Ibuild -I.../gcc-mainline/gcc 
-I.../gcc-mainline/gcc/build -I.../gcc-mainline/gcc/../include  
-I.../gcc-mainline/gcc/../libcpp/include  \
-o build/genemit.o .../gcc-mainline/gcc/genemit.c
In file included from .../gcc-mainline/gcc/rtl.h:30:0,
 from .../gcc-mainline/gcc/genautomata.c:111:
.../gcc-mainline/gcc/vec.h: In instantiation of 'static size_t vec::embedded_size(unsigned int) [with T = long long int; A = va_heap; 
size_t = unsigned int]':
.../gcc-mainline/gcc/vec.h:288:58:   required from 'static void 
va_heap::reserve(vec*&, unsigned int, bool) [with T = 
long long int]'
.../gcc-mainline/gcc/vec.h:1749:20:   required from 'bool 
vec::reserve(unsigned int, bool) [with T = long long int]'
.../gcc-mainline/gcc/vec.h:1858:11:   required from 'T* vec::safe_push(const 
T&) [with T = long long int]'
.../gcc-mainline/gcc/genautomata.c:7454:40:   required from here
.../gcc-mainline/gcc/vec.h:1287:3: error: static assertion failed:
   static_assert (alignof (vec_embedded) == alignof(vec), "");
   ^
Makefile:2719: recipe for target 'build/genautomata.o' failed
make[1]: *** [build/genautomata.o] Error 1

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter


[Patch] configure: Also check C++11 (flags) for ${build} compiler not only for ${host}

2020-08-13 Thread Tobias Burnus

Here, the ${host} compiler was newer, supporting C++11 by default
while the ${build} compiler required -std=c++11.
Hence, the build failed.

OK?

Tobias

configure: Also check C++11 (flags) for ${build} compiler not only for ${host}

config/ChangeLog:

	* ax_cxx_compile_stdcxx.m4: Add fourth argument to check also
	the CXX_FOR_BUILD compiler.

ChangeLog:

	* configure.ac: Run AX_CXX_COMPILE_STDCXX also for ${build} compiler,
	if not the same as ${host}.
	* configure: Regenerate.

 config/ax_cxx_compile_stdcxx.m4 |   36 +-
 configure   | 1009 ++-
 configure.ac|4 +
 3 files changed, 1040 insertions(+), 9 deletions(-)
diff --git a/configure.ac b/configure.ac
index 1a53ed418e4..392389fb2fb 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1470,6 +1470,10 @@ if test "$enable_bootstrap:$GXX" = "yes:yes"; then
   CXX="$CXX -std=c++11"
 elif test "$have_compiler" = yes; then
   AX_CXX_COMPILE_STDCXX(11)
+
+  if test "${build}" != "${host}"; then
+AX_CXX_COMPILE_STDCXX(11, [], [], [_FOR_BUILD])
+  fi
 fi
 
 # Used for setting $lt_cv_objdir
diff --git a/config/ax_cxx_compile_stdcxx.m4 b/config/ax_cxx_compile_stdcxx.m4
index 9413da624d2..0cd515fc65b 100644
--- a/config/ax_cxx_compile_stdcxx.m4
+++ b/config/ax_cxx_compile_stdcxx.m4
@@ -25,6 +25,10 @@
 #   regardless, after defining HAVE_CXX${VERSION} if and only if a
 #   supporting mode is found.
 #
+#   If the fourth argument is the CXX/CXXFLAG/CPPFLAG suffix, e.g.
+#   "_FOR_BUILD".
+#
+#
 # LICENSE
 #
 #   Copyright (c) 2008 Benjamin Kosnik 
@@ -60,12 +64,21 @@ AC_DEFUN([AX_CXX_COMPILE_STDCXX], [dnl
 [$3], [mandatory], [ax_cxx_compile_cxx$1_required=true],
 [$3], [optional], [ax_cxx_compile_cxx$1_required=false],
 [m4_fatal([invalid third argument `$3' to AX_CXX_COMPILE_STDCXX])])
+  m4_if([$4], [], [],
+[$4], [_FOR_BUILD], [],
+[m4_fatal([invalid fourth argument `$4' to AX_CXX_COMPILE_STDCXX])])dnl
   AC_LANG_PUSH([C++])dnl
   ac_success=no
-
+  m4_if([$4], [_FOR_BUILD],
+[ax_cv_cxx_compile_cxx$1_orig_cxx="$CXX"
+ ax_cv_cxx_compile_cxx$1_orig_cxxflags="$CXXFLAGS"
+ ax_cv_cxx_compile_cxx$1_orig_cppflags="$CPPFLAGS"
+ CXX="$CXX$4"
+ CXXFLAGS="$CXXFLAGS$4"
+ CPPFLAGS="$CPPFLAGS$4"])
   m4_if([$2], [], [dnl
 AC_CACHE_CHECK(whether $CXX supports C++$1 features by default,
-		   ax_cv_cxx_compile_cxx$1,
+		   ax_cv_cxx_compile_cxx$1$4,
   [AC_COMPILE_IFELSE([AC_LANG_SOURCE([_AX_CXX_COMPILE_STDCXX_testbody_$1])],
 [ax_cv_cxx_compile_cxx$1=yes],
 [ax_cv_cxx_compile_cxx$1=no])])
@@ -77,7 +90,7 @@ AC_DEFUN([AX_CXX_COMPILE_STDCXX], [dnl
   if test x$ac_success = xno; then
 for alternative in ${ax_cxx_compile_alternatives}; do
   switch="-std=gnu++${alternative}"
-  cachevar=AS_TR_SH([ax_cv_cxx_compile_cxx$1_$switch])
+  cachevar=AS_TR_SH([ax_cv_cxx_compile_cxx$1$4_$switch])
   AC_CACHE_CHECK(whether $CXX supports C++$1 features with $switch,
  $cachevar,
 [ac_save_CXX="$CXX"
@@ -104,7 +117,7 @@ AC_DEFUN([AX_CXX_COMPILE_STDCXX], [dnl
 dnl Cray's crayCC needs "-h std=c++11"
 for alternative in ${ax_cxx_compile_alternatives}; do
   for switch in -std=c++${alternative} +std=c++${alternative} "-h std=c++${alternative}"; do
-cachevar=AS_TR_SH([ax_cv_cxx_compile_cxx$1_$switch])
+cachevar=AS_TR_SH([ax_cv_cxx_compile_cxx$1$4_$switch])
 AC_CACHE_CHECK(whether $CXX supports C++$1 features with $switch,
$cachevar,
   [ac_save_CXX="$CXX"
@@ -127,6 +140,13 @@ AC_DEFUN([AX_CXX_COMPILE_STDCXX], [dnl
   fi
 done
   fi])
+  m4_if([$4], [_FOR_BUILD],
+[CXX$4="$CXX"
+ CXXFLAGS$4="$CXXFLAGS"
+ CPPFLAGS$4="$CPPFLAGS"
+ CXX="$ax_cv_cxx_compile_cxx$1_orig_cxx"
+ CXXFLAGS="$ax_cv_cxx_compile_cxx$1_orig_cxxflags"
+ CPPFLAGS="$ax_cv_cxx_compile_cxx$1_orig_cppflags"])
   AC_LANG_POP([C++])
   if test x$ax_cxx_compile_cxx$1_required = xtrue; then
 if test x$ac_success = xno; then
@@ -134,14 +154,14 @@ AC_DEFUN([AX_CXX_COMPILE_STDCXX], [dnl
 fi
   fi
   if test x$ac_success = xno; then
-HAVE_CXX$1=0
+HAVE_CXX$1$4=0
 AC_MSG_NOTICE([No compiler with C++$1 support was found])
   else
-HAVE_CXX$1=1
-AC_DEFINE(HAVE_CXX$1,1,
+HAVE_CXX$1$4=1
+AC_DEFINE(HAVE_CXX$1$4,1,
   [define if the compiler supports basic C++$1 syntax])
   fi
-  AC_SUBST(HAVE_CXX$1)
+  AC_SUBST(HAVE_CXX$1$4)
 ])
 
 
diff --git a/configure b/configure
index a0c5aca9e8d..f5973922565 100755
--- a/configure
+++ b/configure
@@ -694,6 +694,7 @@ extra_mpc_gmp_configure_flags
 extra_mpfr_configure_flags
 gmpinc
 gmplibs
+HAVE_CXX11_FOR_BUILD
 HAVE_CXX11
 

Re: r11-2663 causes static_assert failure (was: Re: std:vec for classes with constructor?)

2020-08-13 Thread Jakub Jelinek via Gcc-patches
On Thu, Aug 13, 2020 at 01:38:07PM +0200, Tobias Burnus wrote:
> Hi,
> 
> I got a bit lost in this thread – but the
> commit r11-2663-g82c4b78dbef6f03838e3040688c934360a09513f
> "Replace std::vector<> usage in ipa-fnsummary.c with GCC's vec<>."
> 
> Causes here:
> 
> gcc-mainline/gcc/vec.h:1287:3: error: static assertion failed:
>static_assert (alignof (vec_embedded) == alignof(vec), "");
>^
> for a '--build=i686-pc-linux-gnu --host=powerpc64le-linux-gnu 
> --target=nvptx-none'
> build.

On which instantiation is it, and what is your system compiler?
Do you get it when building the i686 native compiler, the i686 to powerpc64le
cross-compiler, or the powerpc64le to nvptx-none cross-compiler?

Jakub



Re: [PATCH] AArch64: Add if condition in aarch64_function_value [PR96479]

2020-08-13 Thread Christophe Lyon via Gcc-patches
Hi,


On Thu, 13 Aug 2020 at 03:54, qiaopeixin  wrote:
>
> Thanks for the review and commit.
>
> All the best,
> Peixin
>
> -Original Message-
> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
> Sent: 2020年8月13日 0:25
> To: qiaopeixin 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] AArch64: Add if condition in aarch64_function_value 
> [PR96479]
>
> qiaopeixin  writes:
> > Hi,
> >
> > The test case vector-subscript-2.c in the gcc testsuite will report an ICE 
> > in the expand pass since '-mgeneral-regs-only' is incompatible with the use 
> > of V4SI mode. I propose to report the diagnostic information instead of 
> > ICE, and the problem has been discussed on PR 96479.
> >
> > I attached the patch to solve the problem. Bootstrapped and tested on 
> > aarch64-linux-gnu. Any suggestions?
>
> Thanks, pushed.  I was initially sceptical because raising an error here and 
> in aarch64_layout_arg is a hack.  Both functions are just query functions and 
> shouldn't have any side effects.
>
> The approach we took for FP modes seemed better: we define the FP move 
> patterns unconditionally, and raise an error if we try to emit an FP move 
> with !TARGET_FLOAT.  This defers any error reporting until we actually try to 
> generate code that depends on TARGET_FLOAT.
>
> But I guess SIMD stuff is different.  There's no reason in principle why you 
> can't use:
>
>   unsigned short __attribute__((vector_size(8)))
>
> *within* a function with -mgeneral-regs-only.  It would just need to be 
> emulated, in the same way as for:
>
>   unsigned short __attribute__((vector_size(4)))
>
> So it would be wrong to define the SIMD move patterns unconditionally and 
> raise an error there.
>
> So all in all, I agree this is the best we can do given the current 
> infrastructure.
>

Since this patch was committed my buildbot is broken for
aarch64-linux-gnu because it now fails to build glibc-2.29:
../stdlib/bits/stdlib-float.h: In function 'atof':
../stdlib/bits/stdlib-float.h:26:1: error: '-mgeneral-regs-only' is
incompatible with the use of floating-point types

I haven't yet tried a more recent glibc version, not sure if 2.29 is
considered obsolete?

Christophe

> Thanks,
> Richard
>


r11-2663 causes static_assert failure (was: Re: std:vec for classes with constructor?)

2020-08-13 Thread Tobias Burnus

Hi,

I got a bit lost in this thread – but the
commit r11-2663-g82c4b78dbef6f03838e3040688c934360a09513f
"Replace std::vector<> usage in ipa-fnsummary.c with GCC's vec<>."

Causes here:

gcc-mainline/gcc/vec.h:1287:3: error: static assertion failed:
   static_assert (alignof (vec_embedded) == alignof(vec), "");
   ^
for a '--build=i686-pc-linux-gnu --host=powerpc64le-linux-gnu 
--target=nvptx-none'
build.

Tobias


On 8/7/20 8:04 PM, Aldy Hernandez via Gcc-patches wrote:


Is the attached patch what y'all suggest?

The patch reverts the GCC vec<> usage in ipa-fnsummary.c and
implements the above suggestion.

Bootstraps, tests running...

If tests pass, is this OK?

Thanks.
Aldy

On Fri, Aug 7, 2020 at 11:17 AM Jonathan Wakely via Gcc-patches
 wrote:

On 07/08/20 10:55 +0200, Jakub Jelinek wrote:

On Fri, Aug 07, 2020 at 09:34:38AM +0100, Jonathan Wakely via Gcc-patches wrote:

Now that you say it, vec has a T[1] member so depending on T
there might be a side-effect as invoking its CTOR?  Or it might
even not compile if there is no default CTOR available...  Ick :/

Right.

Does the GTY stuff add members to the struct, or otherwise alter its
layout?

Or perhaps use offsetof on an alternate structure that should have the same
layout.

Yes that's what I was going to suggest next.


So instead of
  typedef vec<T, A, vl_embed> vec_embedded;
  return offsetof (vec_embedded, m_vecdata) + alloc * sizeof (T);
use
  struct alignas (T) U { char data[sizeof (T)]; };
  typedef vec<U, A, vl_embed> vec_embedded;

static_assert(sizeof(vec_embedded) == sizeof(vec), "");
static_assert(alignof(vec_embedded) == alignof(vec), "");


  return offsetof (vec_embedded, m_vecdata) + alloc * sizeof (T);
where vec_embedded should have the same offset of m_vecdata as vec.
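
A minimal sketch of the suggestion above, for illustration only. Here toy_vec is a hypothetical, much simplified stand-in for GCC's embedded vec (the real template in vec.h has more parameters and members), and with_ctor is an invented element type without a default constructor. The size is computed through a char-array proxy U that has the same size and alignment as T, so the constructors of T are never involved.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-in for GCC's embedded vec.
template <typename T>
struct toy_vec
{
  unsigned m_alloc;
  unsigned m_num;
  T m_vecdata[1];
};

// Invented element type with a user-provided constructor and no default one.
struct with_ctor
{
  with_ctor (int v) : value (v) {}
  int value;
};

template <typename T>
inline std::size_t
embedded_size (unsigned alloc)
{
  // U has the size and alignment of T but is a plain aggregate.
  struct alignas (T) U { char data[sizeof (T)]; };
  typedef toy_vec<U> vec_embedded;

  // The proxy layout must agree with the layout of the real type.
  static_assert (sizeof (vec_embedded) == sizeof (toy_vec<T>), "");
  static_assert (alignof (vec_embedded) == alignof (toy_vec<T>), "");

  return offsetof (vec_embedded, m_vecdata) + alloc * sizeof (T);
}

// Usage check: header (two unsigned fields) plus four elements.
inline void
check_embedded_size ()
{
  assert (embedded_size<with_ctor> (4)
          == 2 * sizeof (unsigned) + 4 * sizeof (with_ctor));
}
```

The two static_asserts play the same role as in the snippet quoted above: if the proxy and the real type ever disagree on size or alignment, the build fails instead of silently computing a wrong offset.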

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter


Re: [PATCH] Alternate fix to PR target/96558: Robustify ix86_expand_clear.

2020-08-13 Thread Uros Bizjak via Gcc-patches
On Thu, Aug 13, 2020 at 10:28 AM Roger Sayle  wrote:
>
>
> This patch is an alternate/supplementary fix to the recent regression
> PR target/96558.  Currently ix86_expand_clear may/should only be called
> with a general register DEST after reload_completed.  With the simple
> change below, this function now checks these conditions itself, and
> does the right thing (or at least something reasonable) rather than ICE.
>
> This change alone is sufficient to fix the recent regression, and allow
> the recently added testcase to pass, but following the "why fix something
> just once" maxim, I propose adding both solutions (to reduce the risk
> of surprises in the future).  Leaving the peephole2 fix in place is
> reasonable, as scheduling or a later pass eventually moves the condition
> code setter/use next to each other, so moving the vpxor with the
> peephole2 provides no (additional) benefit.  i.e. gcc.dg/pr96558.c contains:
>
> vpxor   %xmm0, %xmm0, %xmm0
> testl   %eax, %eax
> jne .L91
>
> This patch has been tested on x86_64-pc-linux-gnu (without the peephole2
> solution) with a make bootstrap and make -k check with no new failures,
> but allows the recently added testcase to pass.


But we gain nothing for non-GENERAL_REG_P registers. movdi/movsi
patterns can load 0 to XMM registers with pxor by itself, also with
mask registers. The reason that this peephole exists is due to the
fact that XOR with GENERAL_REGs clobbers flags, so we can't just stick
 "xor reg, reg" to movdi/movsi pattern when zero is loaded. We would
like to do so, but we can't - so we have to complicate our lives with
a peephole pattern that checks for live FLAGS_REG before
transformation.

I think that we should assert that only GENERAL_REGS should be
processed by ix86_expand_clear.

Uros.

> Ok for mainline?
>
>
> 2020-08-19  Roger Sayle  
>
> gcc/ChangeLog
> PR target/96558
> * config/i386/i386-expand.c (ix86_expand_clear): Explicitly
> test for reload_completed and GENERAL_REG_P, and emit a simple
> set of const0_rtx otherwise.
>
> Thanks in advance.
> Sorry for the inconvenience.
> Roger
> --
> Roger Sayle
> NextMove Software
> Cambridge, UK
>
>


Re: [PATCH] nvptx: Add support for subword compare-and-swap

2020-08-13 Thread Tom de Vries
On 7/20/20 3:19 PM, Kwok Cheung Yeung wrote:
> On 01/07/2020 3:28 pm, Tom de Vries wrote:
>> So, I think gcc needs a copy of (some of) the
>> gcc/testsuite/gcc.dg/ia64-sync-*.c tests for effective target
>> sync_char_short.
>>
>> However, since this patch only adds partial support, we cannot enable
>> sync_char_short for nvptx yet.  So, if you stick to partial support, you
>> should add a char/short copy of ia64-sync-3.c to gcc.target/nvptx (which
>> ideally could be an include of a generic test-case that is active for
>> sync_char_short only, with mention that it can be removed once
>> sync_char_short is enabled for nvptx).
>>
> 
> I have added gcc.target/nvptx/sync.c, which is a version of
> ia64-sync-3.c extended to test chars and shorts too.

I've:
- added the short/char part of that as gcc.dg/ia64-sync-5.c
- included the sync_int_long ones of gcc.dg/ia64-sync-* in
  gcc.target/nvptx
- did the same for gcc.dg/ia64-sync-5.c as part of this patch

> I kept the original
> int and long tests because sync_int_long isn't indicated as being
> supported on nvptx either.
> 

Yep, I proposed a patch to enable that:
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551842.html .

>> I looked at the implementation, and it looks ok to me, though I think we
>> need to make explicit in a comment what the assumptions are:
>> - that we have read and write access to the entire word, and
>> - that the word is not volatile.
>>
> 
> I've added some extra comments in the implementation. Like I said
> previously, the loop accounts for the larger word being volatile.
> 

Right, I know what that loop is intended to do. The loop, though, may
manifest worst-case as a hang, so I've mentioned that in the comment.
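
For readers following along, here is a rough sketch of the general technique being discussed: emulating a 1-byte compare-and-swap with a 4-byte one plus a retry loop. It is not the committed libgcc/config/nvptx/atomic.c. The helper name subword_cas_u8 is made up, and the sketch assumes little-endian byte order and that the whole aligned 4-byte word containing the byte may be read and written.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, for illustration only.  Assumes little-endian byte
// order and read/write access to the aligned 4-byte word containing *ptr.
// Returns the byte value observed at *ptr, like __sync_val_compare_and_swap.
inline uint8_t
subword_cas_u8 (volatile uint8_t *ptr, uint8_t expected, uint8_t desired)
{
  assert (ptr != nullptr);

  uintptr_t addr = reinterpret_cast<uintptr_t> (ptr);
  volatile uint32_t *word
    = reinterpret_cast<volatile uint32_t *> (addr & ~static_cast<uintptr_t> (3));
  unsigned shift = static_cast<unsigned> (addr & 3) * 8;
  uint32_t mask = static_cast<uint32_t> (0xff) << shift;

  for (;;)
    {
      uint32_t old_word = *word;
      uint8_t old_byte = static_cast<uint8_t> ((old_word & mask) >> shift);
      if (old_byte != expected)
        return old_byte;

      uint32_t new_word
        = (old_word & ~mask) | (static_cast<uint32_t> (desired) << shift);
      if (__sync_val_compare_and_swap (word, old_word, new_word) == old_word)
        return expected;

      // Some byte of the word changed between the load and the CAS; retry.
      // If the neighbouring bytes keep changing, this loop may never exit.
    }
}
```

The retry at the bottom of the loop is where the worst-case hang mentioned above would come from: the 4-byte CAS can keep failing because of writes to the other bytes of the word, even though the byte of interest never changes.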

>> As for the oacc test-case, you could add the __int128 bit, perhaps along
>> the lines of how things are done in
>> libgomp/testsuite/libgomp.c++/target-8.C ?
>>
> 
> I've added an extra test for __int128 types in my libgomp testcase that
> runs if 128-bit types are supported.
> 
> I've tested that there are no regressions with the patch on standalone
> nvptx, and that the new reduction-16.c testcase passes with both nvptx
> and AMD GCN offloading.
> 
> Is this version okay for master and og10?
> 

Pushed as attached to master.

Thanks,
- Tom

nvptx: Add support for subword compare-and-swap

This adds support for __sync_val_compare_and_swap and
__sync_bool_compare_and_swap for 1-byte and 2-byte long
values, which are not natively supported on nvptx.

Build and reg-tested on nvptx.
Build and reg-tested libgomp on x86_64 with nvptx accelerator.

2020-07-16  Kwok Cheung Yeung  

	libgcc/
	* config/nvptx/atomic.c: New.
	* config/nvptx/t-nvptx (LIB2ADD): Add atomic.c.

	gcc/testsuite/
	* gcc.target/nvptx/sync-5.c: New.

	libgomp/
	* testsuite/libgomp.c-c++-common/reduction-16.c: New.

---
 gcc/testsuite/gcc.target/nvptx/ia64-sync-5.c   |  2 +
 libgcc/config/nvptx/atomic.c   | 73 ++
 libgcc/config/nvptx/t-nvptx|  3 +-
 .../testsuite/libgomp.c-c++-common/reduction-16.c  | 53 
 4 files changed, 130 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/nvptx/ia64-sync-5.c b/gcc/testsuite/gcc.target/nvptx/ia64-sync-5.c
new file mode 100644
index 000..ec40f2ca7a9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/ia64-sync-5.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "../../gcc.dg/ia64-sync-5.c"
diff --git a/libgcc/config/nvptx/atomic.c b/libgcc/config/nvptx/atomic.c
new file mode 100644
index 000..e1ea078692a
--- /dev/null
+++ b/libgcc/config/nvptx/atomic.c
@@ -0,0 +1,73 @@
+/* NVPTX atomic operations
+   Copyright (C) 2020 Free Software Foundation, Inc.
+   Contributed by Mentor Graphics.
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include 
+
+/* Implement __sync_val_compare_and_swap and __sync_bool_compare_and_swap
+   for 1 and 2-byte values (which are not natively supported) in terms of
+   __sync_val_compare_and_swap for 4-byte values (which is supported).
+   This assumes that the contents of the word 

Re: [COMMITTED 0/4] bpf: backports to releases/gcc-10

2020-08-13 Thread Martin Liška

On 8/13/20 10:25 AM, Jose E. Marchesi wrote:



On 8/12/20 9:12 PM, Jose E. Marchesi wrote:

1) CHERRY_PICK_PREFIX = '(cherry picked from commit ' and I used
 a slightly different wording.


Yes, you used a bit different wording :)


2) If I am not mistaken while reading the script, the CHERRY_PICK
line
 should be part of the ChangeLog entries (indented, etc) and I did put
 it before the ChangeLog entries instead.


No, it should be placed at the end of a commit message (what git cherry-pick -x 
does).


... but with the right wording :)

Well, sorry for the inconveniences.  I will make sure to use the script
and check the output of gcc-verify -p for the backport notices from now
on.



Don't worry, it's just a nit. All of us are learning how to live with the
auto-generated ChangeLog entries.

Martin


RE: [PATCH] Alternate fix to PR target/96558: Robustify ix86_expand_clear.

2020-08-13 Thread Roger Sayle

Doh! ENOPATCH.

-Original Message-
From: Roger Sayle  
Sent: 13 August 2020 09:29
To: 'GCC Patches' 
Cc: 'Uros Bizjak' 
Subject: [PATCH] Alternate fix to PR target/96558: Robustify
ix86_expand_clear.


This patch is an alternate/supplementary fix to the recent regression PR
target/96558.  Currently ix86_expand_clear may/should only be called with a
general register DEST after reload_completed.  With the simple change below,
this function now checks these conditions itself, and does the right thing
(or at least something reasonable) rather than ICE.

This change alone is sufficient to fix the recent regression, and allow the
recently added testcase to pass, but following the "why fix something just
once" maxim, I propose adding both solutions (to reduce the risk of
surprises in the future).  Leaving the peephole2 fix in place is reasonable,
as scheduling or a later pass eventually moves the condition code setter/use
next to each other, so moving the vpxor with the
peephole2 provides no (additional) benefit.  i.e. gcc.dg/pr96558.c contains:

vpxor   %xmm0, %xmm0, %xmm0
testl   %eax, %eax
jne .L91

This patch has been tested on x86_64-pc-linux-gnu (without the peephole2
solution) with a make bootstrap and make -k check with no new failures, but
allows the recently added testcase to pass.

Ok for mainline?


2020-08-19  Roger Sayle  

gcc/ChangeLog
PR target/96558
* config/i386/i386-expand.c (ix86_expand_clear): Explicitly
test for reload_completed and GENERAL_REG_P, and emit a simple
set of const0_rtx otherwise.

Thanks in advance.
Sorry for the inconvenience.
Roger
--
Roger Sayle
NextMove Software
Cambridge, UK

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index f441ba9..cf3e741 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -164,18 +164,22 @@ ix86_expand_clear (rtx dest)
   rtx tmp;
 
   /* We play register width games, which are only valid after reload.  */
-  gcc_assert (reload_completed);
-
-  /* Avoid HImode and its attendant prefix byte.  */
-  if (GET_MODE_SIZE (GET_MODE (dest)) < 4)
-dest = gen_rtx_REG (SImode, REGNO (dest));
-  tmp = gen_rtx_SET (dest, const0_rtx);
-
-  if (!TARGET_USE_MOV0 || optimize_insn_for_size_p ())
+  if (reload_completed && GENERAL_REG_P (dest))
 {
-  rtx clob = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, FLAGS_REG));
-  tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, tmp, clob));
+  /* Avoid HImode and its attendant prefix byte.  */
+  if (GET_MODE_SIZE (GET_MODE (dest)) < 4)
+dest = gen_rtx_REG (SImode, REGNO (dest));
+  tmp = gen_rtx_SET (dest, const0_rtx);
+
+  if (!TARGET_USE_MOV0 || optimize_insn_for_size_p ())
+   {
+ rtx clob = gen_rtx_CLOBBER (VOIDmode,
+ gen_rtx_REG (CCmode, FLAGS_REG));
+ tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, tmp, clob));
+   }
 }
+  else
+tmp = gen_rtx_SET (dest, const0_rtx);
 
   emit_insn (tmp);
 }


[PATCH] Alternate fix to PR target/96558: Robustify ix86_expand_clear.

2020-08-13 Thread Roger Sayle


This patch is an alternate/supplementary fix to the recent regression
PR target/96558.  Currently ix86_expand_clear may/should only be called
with a general register DEST after reload_completed.  With the simple
change below, this function now checks these conditions itself, and
does the right thing (or at least something reasonable) rather than ICE.

This change alone is sufficient to fix the recent regression, and allow
the recently added testcase to pass, but following the "why fix something
just once" maxim, I propose adding both solutions (to reduce the risk
of surprises in the future).  Leaving the peephole2 fix in place is
reasonable, as scheduling or a later pass eventually moves the condition
code setter/use next to each other, so moving the vpxor with the
peephole2 provides no (additional) benefit.  i.e. gcc.dg/pr96558.c contains:

vpxor   %xmm0, %xmm0, %xmm0
testl   %eax, %eax
jne .L91

This patch has been tested on x86_64-pc-linux-gnu (without the peephole2
solution) with a make bootstrap and make -k check with no new failures,
but allows the recently added testcase to pass.

Ok for mainline?


2020-08-19  Roger Sayle  

gcc/ChangeLog
PR target/96558
* config/i386/i386-expand.c (ix86_expand_clear): Explicitly
test for reload_completed and GENERAL_REG_P, and emit a simple
set of const0_rtx otherwise.

Thanks in advance.
Sorry for the inconvenience.
Roger
--
Roger Sayle
NextMove Software
Cambridge, UK




Re: [COMMITTED 0/4] bpf: backports to releases/gcc-10

2020-08-13 Thread Jose E. Marchesi via Gcc-patches


> On 8/12/20 9:12 PM, Jose E. Marchesi wrote:
>> 1) CHERRY_PICK_PREFIX = '(cherry picked from commit ' and I used
>> a slightly different wording.
>
> Yes, you used a bit different wording :)
>
>> 2) If I am not mistaken while reading the script, the CHERRY_PICK
>> line
>> should be part of the ChangeLog entries (indented, etc) and I did put
>> it before the ChangeLog entries instead.
>
> No, it should be placed at the end of a commit message (what git cherry-pick 
> -x does).

... but with the right wording :)

Well, sorry for the inconveniences.  I will make sure to use the script
and check the output of gcc-verify -p for the backport notices from now
on.



Re: [PATCH] ipa: fix bit CPP when combined with IPA bit CP

2020-08-13 Thread Martin Liška

On 8/12/20 7:03 PM, Martin Liška wrote:

There's an updated version of the patch that is approved by Honza.

I'm going to install it (and I'll backport it as well).

Martin


And there's one obvious fix that aligns code in 
ipcp_bits_lattice::set_to_constant
with ipcp_bits_lattice::meet_with_1.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
I'm going to install it.

Thanks,
Martin
>From 9a3f6f50693c0ef3dbdcbfe9931fd7d30e59f67d Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Thu, 13 Aug 2020 09:38:41 +0200
Subject: [PATCH] ipa: fix ICE in get_default_value

The patch aligns code with ipcp_bits_lattice::set_to_constant
where we properly mask m_value with m_mask. The same should
be done here.

gcc/ChangeLog:

	PR ipa/96482
	* ipa-cp.c (ipcp_bits_lattice::meet_with_1): Mask m_value
	with m_mask.

gcc/testsuite/ChangeLog:

	PR ipa/96482
	* gcc.dg/ipa/pr96482-2.c: New test.
---
 gcc/ipa-cp.c |  2 +-
 gcc/testsuite/gcc.dg/ipa/pr96482-2.c | 33 
 2 files changed, 34 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr96482-2.c

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 2b21280d919..e4910a04ffa 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -1048,7 +1048,7 @@ ipcp_bits_lattice::meet_with_1 (widest_int value, widest_int mask,
 
   widest_int old_mask = m_mask;
   m_mask = (m_mask | mask) | (m_value ^ value);
-  m_value &= value;
+  m_value &= ~m_mask;
 
   if (wi::sext (m_mask, precision) == -1)
 return set_to_bottom ();
diff --git a/gcc/testsuite/gcc.dg/ipa/pr96482-2.c b/gcc/testsuite/gcc.dg/ipa/pr96482-2.c
new file mode 100644
index 000..54b71ac4fc0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr96482-2.c
@@ -0,0 +1,33 @@
+/* PR ipa/96482 */
+/* { dg-do compile } */
+/* { dg-options "-O2"  } */
+
+int i2c_transfer();
+void _dev_err();
+
+struct i2c_msg {
+  char bufaddr;
+  int adapterdev;
+} wdt87xx_i2c_xfer_client;
+
+int wdt87xx_i2c_xfer_client_0, wdt87xx_i2c_xfer_rxdata, wdt87xx_get_string_str_idx;
+
+void
+static wdt87xx_i2c_xfer(void *txdata, unsigned rxlen) {
+  struct i2c_msg msgs[] = {wdt87xx_i2c_xfer_client_0, rxlen,
+   wdt87xx_i2c_xfer_rxdata};
+  int error = i2c_transfer(wdt87xx_i2c_xfer_client, msgs);
+  _dev_err("", __func__, error);
+}
+static void wdt87xx_get_string(unsigned len) {
+  char tx_buf[] = {wdt87xx_get_string_str_idx, 3};
+  int rx_len = len + 2;
+  wdt87xx_i2c_xfer(tx_buf, rx_len);
+}
+
+void
+wdt87xx_ts_probe_tx_buf() {
+  wdt87xx_get_string(34);
+  wdt87xx_get_string(8);
+  wdt87xx_i2c_xfer(wdt87xx_ts_probe_tx_buf, 2);
+}
-- 
2.28.0
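
As a rough illustration of why the masking in meet_with_1 matters, here is a toy version of the meet using plain 64-bit integers instead of widest_int. The names toy_bits and meet are made up, and the sketch assumes the convention that a set mask bit marks the corresponding value bit as unknown, so value bits under the mask should always be zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy lattice: a set bit in MASK means "this bit is unknown".
struct toy_bits
{
  uint64_t value;
  uint64_t mask;
};

// Meet A with an incoming (VALUE, MASK) pair, following the shape of the
// patched code: widen the mask first, then clear every unknown bit in value.
inline toy_bits
meet (toy_bits a, uint64_t value, uint64_t mask)
{
  // Precondition: A itself has no value bits set under its mask.
  assert ((a.value & a.mask) == 0);

  toy_bits r;
  r.mask = (a.mask | mask) | (a.value ^ value);
  r.value = a.value & ~r.mask;
  return r;
}
```

With the pre-patch form (value &= incoming value), meeting {value 1, mask 0} with an incoming value 1 and mask 1 would leave value 1 under mask 1; the masked form yields value 0 there.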



Re: [PATCH] ipa-inline: Improve growth accumulation for recursive calls

2020-08-13 Thread Jan Hubicka
> Hi!
> 
> On Wed, Aug 12, 2020 at 09:03:35PM +0200, Richard Biener wrote:
> > On August 12, 2020 7:53:07 PM GMT+02:00, Jan Hubicka  wrote:
> > >> From: Xiong Hu Luo 
> > >> 523.xalancbmk_r +1.32%
> > >> 541.leela_r +1.51%
> > >> 548.exchange2_r +31.87%
> > >> 507.cactuBSSN_r +0.80%
> > >> 526.blender_r   +1.25%
> > >> 538.imagick_r   +1.82%
> 
> > >> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> > >> index 0211f08964f..11903ac1960 100644
> > >> --- a/gcc/cgraph.h
> > >> +++ b/gcc/cgraph.h
> > >> @@ -3314,6 +3314,8 @@ cgraph_edge::recursive_p (void)
> > >>cgraph_node *c = callee->ultimate_alias_target ();
> > >>if (caller->inlined_to)
> > >>  return caller->inlined_to->decl == c->decl;
> > >> +  else if (caller->clone_of && c->clone_of)
> > >> +return caller->clone_of->decl == c->clone_of->decl;
> > >>else
> > >>  return caller->decl == c->decl;
> > >
> > >If you clone the function so it is no longer self recursive, it does
> > >not
> > >make much sense to lie to optimizers that the function is still
> > >recursive.
> 
> Like Richard says below (if I understand him right, sorry if not), the
> function still *is* recursive in its group of clones.

The test above is not a heuristic.  Its purpose is to determine when
the offline body will be eliminated after inlining the call.  This happens
when
 1) this is the last call to the function
 2) the function is not used otherwise (exported or visible)
 3) the call is not self recursive
In case 3 the problem is that inlining will introduce a new call to the
function itself and the offline copy will still be needed.

Here we see a chain of clones calling
 clone1->clone2->clone3->clone4...->clone9->clone1
Inlining clone2 to clone3 will eliminate the offline copy of clone3 and
will reduce code size.

The inliner has an analysis part, which intends to model code size/time
accurately, and a heuristics part. The patch makes the size/time analysis
produce wrong results, while this needs to be solved in the heuristics part.

Note that with a bit of optimization we should be able to eliminate the call
clone9->clone1 because it is dead (I believe we don't do that only
because the recursion limit is set 1 off). This however does not make
inlining clone2->clone3 any better idea. The same is true if the user wrote
the clones explicitly.


> 
> > >The inlining would be harmful even if the programer did cloning by
> > >hand.
> > >I guess main problem is the extreme register pressure issue combining
> > >loop depth of 10 in caller with loop depth of 10 in callee just because
> > >the function is called once.
> > >
> > >The negative effect is most likely also due to wrong profile estimate
> > >which drives IRA to optimize wrong spot.  But I wonder if we simply
> > >don't want to teach inlining function called once to not construct
> > >large
> > >loop depths?  Something like do not inline if caller loop depth
> > >is over 3 or so?
> > 
> > I don't think that's good by itself (consider leaf functions and x86 xmm 
> > reg ABI across calls). Even with large loop depth abstraction penalty 
> > removal can make inlining worth it. For the testcase the recursiveness is 
> > what looks special (recursion from a deeper loop nest level). 
> 
> Yes, the loop stuff / register pressure issues might help for the
> exchange result, but what about the other five above?

I think such a large loop nest (20 if you combine caller+callee) rarely
happens in practice, so it may be a good heuristic. I guess it depends on
how many exchange2-only hacks we want in the compiler.  We should aim to
make changes that help other codebases too.

The ipa-cp recursive cloning machinery IMO has good potential to help
other code which has a fixed cap on the depth of recursion.
Consider the following code:

int array[1000];
static int
fact (int a)
{
  int ret = 1;
  if (a > 1)
    ret = fact (a - 1) * a;
  /* Clear a slice of the array so the recursive call is not a tail call.  */
  for (int i = 0; i < a; i++)
    array[i+(a-1)*(a-1)/2] = 0;
  return ret;
}

int
test ()
{
  return fact (10);
}

This computes the factorial and at the same time clears memory (just to
prevent a tail call from happening and also to make the functions a bit bigger).

Here it is a win to produce all 10 clones, inline them together,
precompute factorial and combine the memsets.

We have a few problems visible on this testcase:
 1) ipa-cp needs a reduction of its evaluation thresholds
 2) the estimated profile is bogus (since we estimate the recursion with
 probability 1/3 and after 10 levels of recursion the inner factorial
 gets probability almost 0)
 3) the inliner will still happily flatten the function, not noticing that
 it has to be cold.  This is because it will do that pairwise and never see
 the combined probability.
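
To put a number on point 2, here is a small sketch of the arithmetic. It assumes, as a simplification, that each recursion level is simply scaled by the same probability p; nested_frequency is a made-up helper, not anything in GCC.

```cpp
#include <cassert>

// Hypothetical helper: relative frequency of the innermost level when each
// of DEPTH recursion levels is entered with probability P.
inline double
nested_frequency (double p, int depth)
{
  assert (p >= 0.0 && p <= 1.0 && depth >= 0);

  double freq = 1.0;
  for (int i = 0; i < depth; i++)
    freq *= p;
  return freq;
}
```

For p = 1/3 and depth 10 this gives 1/59049, i.e. roughly 1.7e-5, which is why the innermost clone looks essentially cold to any pass that sees the combined estimate.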

Honza


[committed] openmp: Add support for non-rectangular loops in taskloop construct

2020-08-13 Thread Jakub Jelinek via Gcc-patches
Hi!

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2020-08-13  Jakub Jelinek  

* gimplify.c (gimplify_omp_taskloop_expr): New function.
(gimplify_omp_for): Use it.  For OMP_FOR_NON_RECTANGULAR
loops adjust in outer taskloop the var-outer decls.
* omp-expand.c (expand_omp_taskloop_for_inner): Handle non-rectangular
loops.
(expand_omp_for): Don't reject non-rectangular taskloop.
* omp-general.c (omp_extract_for_data): Don't assert that
non-rectangular loops have static schedule, instead treat loop->m1
or loop->m2 as if loop->n1 or loop->n2 is non-constant.

* testsuite/libgomp.c/loop-22.c (main): Add some further tests.
* testsuite/libgomp.c/loop-23.c (main): Likewise.
* testsuite/libgomp.c/loop-24.c: New test.

--- gcc/gimplify.c.jj   2020-08-11 14:20:35.150935247 +0200
+++ gcc/gimplify.c  2020-08-12 12:55:41.128589197 +0200
@@ -10996,6 +10996,37 @@ gimplify_omp_task (tree *expr_p, gimple_
   *expr_p = NULL_TREE;
 }
 
+/* Helper function for gimplify_omp_for.  If *TP is not a gimple constant,
+   force it into a temporary initialized in PRE_P and add firstprivate clause
+   to ORIG_FOR_STMT.  */
+
+static void
+gimplify_omp_taskloop_expr (tree type, tree *tp, gimple_seq *pre_p,
+   tree orig_for_stmt)
+{
+  if (*tp == NULL || is_gimple_constant (*tp))
+return;
+
+  *tp = get_initialized_tmp_var (*tp, pre_p, NULL, false);
+  /* Reference to pointer conversion is considered useless,
+ but is significant for firstprivate clause.  Force it
+ here.  */
+  if (type
+  && TREE_CODE (type) == POINTER_TYPE
+  && TREE_CODE (TREE_TYPE (*tp)) == REFERENCE_TYPE)
+{
+  tree v = create_tmp_var (TYPE_MAIN_VARIANT (type));
+  tree m = build2 (INIT_EXPR, TREE_TYPE (v), v, *tp);
+  gimplify_and_add (m, pre_p);
+  *tp = v;
+}
+
+  tree c = build_omp_clause (input_location, OMP_CLAUSE_FIRSTPRIVATE);
+  OMP_CLAUSE_DECL (c) = *tp;
+  OMP_CLAUSE_CHAIN (c) = OMP_FOR_CLAUSES (orig_for_stmt);
+  OMP_FOR_CLAUSES (orig_for_stmt) = c;
+}
+
 /* Gimplify the gross structure of an OMP_FOR statement.  */
 
 static enum gimplify_status
@@ -11298,65 +11329,34 @@ gimplify_omp_for (tree *expr_p, gimple_s
   for (i = 0; i < TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt)); i++)
{
  t = TREE_VEC_ELT (OMP_FOR_INIT (for_stmt), i);
- if (!is_gimple_constant (TREE_OPERAND (t, 1)))
-   {
+ gimple_seq *for_pre_p = (gimple_seq_empty_p (for_pre_body)
+  ? pre_p : &for_pre_body);
  tree type = TREE_TYPE (TREE_OPERAND (t, 0));
- TREE_OPERAND (t, 1)
-   = get_initialized_tmp_var (TREE_OPERAND (t, 1),
-  gimple_seq_empty_p (for_pre_body)
-  ? pre_p : &for_pre_body, NULL,
-  false);
- /* Reference to pointer conversion is considered useless,
-but is significant for firstprivate clause.  Force it
-here.  */
- if (TREE_CODE (type) == POINTER_TYPE
- && (TREE_CODE (TREE_TYPE (TREE_OPERAND (t, 1)))
- == REFERENCE_TYPE))
-   {
- tree v = create_tmp_var (TYPE_MAIN_VARIANT (type));
- tree m = build2 (INIT_EXPR, TREE_TYPE (v), v,
-  TREE_OPERAND (t, 1));
- gimplify_and_add (m, gimple_seq_empty_p (for_pre_body)
-  ? pre_p : &for_pre_body);
- TREE_OPERAND (t, 1) = v;
-   }
- tree c = build_omp_clause (input_location,
-OMP_CLAUSE_FIRSTPRIVATE);
- OMP_CLAUSE_DECL (c) = TREE_OPERAND (t, 1);
- OMP_CLAUSE_CHAIN (c) = OMP_FOR_CLAUSES (orig_for_stmt);
- OMP_FOR_CLAUSES (orig_for_stmt) = c;
+ if (TREE_CODE (TREE_OPERAND (t, 1)) == TREE_VEC)
+   {
+ tree v = TREE_OPERAND (t, 1);
+ gimplify_omp_taskloop_expr (type, &TREE_VEC_ELT (v, 1),
+ for_pre_p, orig_for_stmt);
+ gimplify_omp_taskloop_expr (type, &TREE_VEC_ELT (v, 2),
+ for_pre_p, orig_for_stmt);
}
+ else
+   gimplify_omp_taskloop_expr (type, &TREE_OPERAND (t, 1), for_pre_p,
+   orig_for_stmt);
 
  /* Handle OMP_FOR_COND.  */
  t = TREE_VEC_ELT (OMP_FOR_COND (for_stmt), i);
- if (!is_gimple_constant (TREE_OPERAND (t, 1)))
+ if (TREE_CODE (TREE_OPERAND (t, 1)) == TREE_VEC)
{
- tree type = TREE_TYPE (TREE_OPERAND (t, 0));
- TREE_OPERAND (t, 1)
-   = get_initialized_tmp_var (TREE_OPERAND (t, 1),
-  

Re: [COMMITTED 0/4] bpf: backports to releases/gcc-10

2020-08-13 Thread Martin Liška

On 8/12/20 9:12 PM, Jose E. Marchesi wrote:

1) CHERRY_PICK_PREFIX = '(cherry picked from commit ' and I used
a slightly different wording.


Yes, you used a bit different wording :)



2) If I am not mistaken while reading the script, the CHERRY_PICK line
should be part of the ChangeLog entries (indented, etc) and I did put
it before the ChangeLog entries instead.


No, it should be placed at the end of a commit message (what git cherry-pick -x 
does).

Martin


Re: [PATCH] ipa-inline: Improve growth accumulation for recursive calls

2020-08-13 Thread luoxhu via Gcc-patches
Hi,

On 2020/8/13 01:53, Jan Hubicka wrote:
> Hello,
> with Martin we spent some time looking into exchange2 and my
> understanding of the problem is the following:
> 
> There is the self recursive function digits_2 with the property that it
> has 10 nested loops and calls itself from the innermost.
> Now we do not do an amazing job of guessing the profile since it is quite
> atypical. The first observation is that the callback frequency needs to be
> less than 1, otherwise the program never terminates; however, with 10
> nested loops one needs to predict every loop to iterate just a few times
> and the conditionals guarding them as not very likely. For that we added
> PRED_LOOP_GUARD_WITH_RECURSION some time ago and I fixed it yesterday
> (causing a regression in exchange since the bad profile turned out to
> disable some harmful vectorization) and I also now added a cap to the
> self recursive frequency so things do not get mispropagated by ipa-cp.

Thanks for the information :)  Tamar replied that there is another
regression (*on exchange2 is 11%.*). I've also rebased my code and confirmed
that it is really getting even slower than before (reverting the patch
brings the performance back)...

> 
> Now if ipa-cp decides to duplicate digits a few times we have a new
> problem.  The tree of recursion is organized in a way that the depth is
> bounded by 10 (which GCC does not know) and moreover most time is not
> spent on very deep levels of recursion.
> 
> For that you have the patch which increases frequencies of recursively
> cloned nodes, however it still seems to me as very specific hack for
> exchange: I do not see how to guess where most of time is spent.
> Even for very regular trees, by master theorem, it depends on very
> little differences in the estimates of recursion frequency whether most
> of time is spent on the top of tree, bottom or things are balanced.

The build is not PGO, so I am not clear how the profile count will affect
the ipa-cp and ipa-inline decisions.
Since there are no other callers outside of these specialized nodes, the
guessed profile counts should be equal?  The perf tool shows that even
though each specialized node is called only once, they do not take the same
time per call:
  40.65%  exchange2_gcc.o  exchange2_gcc.orig.slow  [.] __brute_force_MOD_digits_2.constprop.4
  16.31%  exchange2_gcc.o  exchange2_gcc.orig.slow  [.] __brute_force_MOD_digits_2.constprop.3
  10.91%  exchange2_gcc.o  libgfortran.so.5.0.0     [.] _gfortran_mminloc0_4_i4
   5.41%  exchange2_gcc.o  exchange2_gcc.orig.slow  [.] __brute_force_MOD_digits_2.constprop.6
   4.68%  exchange2_gcc.o  exchange2_gcc.orig.slow  [.] __logic_MOD_new_solver
   3.76%  exchange2_gcc.o  exchange2_gcc.orig.slow  [.] __brute_force_MOD_digits_2.constprop.5
   1.07%  exchange2_gcc.o  exchange2_gcc.orig.slow  [.] __brute_force_MOD_digits_2.constprop.7
   0.84%  exchange2_gcc.o  exchange2_gcc.orig.slow  [.] __brute_force_MOD_brute.constprop.0
   0.47%  exchange2_gcc.o  exchange2_gcc.orig.slow  [.] __brute_force_MOD_digits_2.constprop.2
   0.24%  exchange2_gcc.o  exchange2_gcc.orig.slow  [.] __brute_force_MOD_digits_2.constprop.1
   0.24%  exchange2_gcc.o  exchange2_gcc.orig.slow  [.] __brute_force_MOD_covered.constprop.0
   0.11%  exchange2_gcc.o  exchange2_gcc.orig.slow  [.] __brute_force_MOD_reflected.constprop.0
   0.00%  exchange2_gcc.o  exchange2_gcc.orig.slow  [.] __brute_force_MOD_brute.constprop.1


digits_2.constprop.4 and digits_2.constprop.3 take most of the execution
time, so profile count and frequency do not seem very helpful for this case?
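
As a rough tally of the listing above, the following sketch normalizes the self-time of the digits_2 clones against each other. The percentages are copied from the perf output; the helper name is hypothetical.

```python
# Self-time percentages of the digits_2 clones, copied from the perf listing.
DIGITS_2_CLONES = {
    "constprop.1": 0.24,
    "constprop.2": 0.47,
    "constprop.3": 16.31,
    "constprop.4": 40.65,
    "constprop.5": 3.76,
    "constprop.6": 5.41,
    "constprop.7": 1.07,
}


def clone_shares(samples):
    """Normalize per-clone percentages so that they sum to 1.0."""
    total = sum(samples.values())
    return {name: pct / total for name, pct in samples.items()}


if __name__ == "__main__":
    shares = clone_shares(DIGITS_2_CLONES)
    # Print the clones from hottest to coldest.
    for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"digits_2.{name}: {share:.1%} of the time spent in digits_2 clones")
```

If every clone is called once and cost were uniform, each of the seven listed clones would get roughly one seventh; the measured distribution is far from that.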

> 
> With algorithms doing backtracking, like exchange, the likelihood of
> recursion decreases with deeper recursion levels, but we do not know how
> quickly or what the level is.


> 
>> From: Xiong Hu Luo 
>>
>> For SPEC2017 exchange2, there is a large recursive function digits_2
>> (function size 1300) that generates specialized nodes from digits_2.1 to
>> digits_2.8 with the added build option:
>>
>> --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80
>>
>> The ipa-inline pass will consider inlining these nodes, which are called
>> only once, but inlining these large functions too deeply causes serious
>> register spilling and a performance drop, as follows.
>>
>> inlineA: brute (inline digits_2.1, 2.2, 2.3, 2.4) -> digits_2.5 (inline 2.6, 2.7, 2.8)
>> inlineB: digits_2.1 (inline digits_2.2, 2.3) -> call digits_2.4 (inline digits_2.5, 2.6) -> call digits_2.7 (inline 2.8)
>> inlineC: brute (inline digits_2) -> call 2.1 -> 2.2 (inline 2.3) -> 2.4 -> 2.5 -> 2.6 (inline 2.7) -> 2.8
>> inlineD: brute -> call digits_2 -> call 2.1 -> call 2.2 -> 2.3 -> 2.4 -> 2.5 -> 2.6 -> 2.7 -> 2.8
>>
>> Performance diff:
>> inlineB is ~25% faster than inlineA;
>> inlineC is ~20% faster than inlineB;
>> inlineD is ~30% faster than inlineC.
>>
>> The master GCC code now generates an inline sequence like inlineB; this
>> patch makes the ipa-inline pass behave like inlineD by:
>>   1) The growth accumulation for recursive 

[PATCH] C-SKY: Fix assembling error with -mfloat-abi=hard.

2020-08-13 Thread Jojo R
gcc/ChangeLog:
* gcc/config/csky/csky-elf.h (ASM_SPEC): Use mfloat-abi.
* gcc/config/csky/csky-linux-elf.h (ASM_SPEC): Use mfloat-abi.

---
 gcc/config/csky/csky-elf.h   | 2 ++
 gcc/config/csky/csky-linux-elf.h | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/gcc/config/csky/csky-elf.h b/gcc/config/csky/csky-elf.h
index 0a319c0..15a0e73 100644
--- a/gcc/config/csky/csky-elf.h
+++ b/gcc/config/csky/csky-elf.h
@@ -47,6 +47,8 @@
   %{mcpu=*:-mcpu=%*}   \
   %{march=*:-march=%*} \
   %{mhard-float:-mhard-float}  \
+  %{mfloat-abi=softfp:-mhard-float} \
+  %{mfloat-abi=hard:-mhard-float}   \
   %{melrw:-melrw}  \
   %{mno-elrw:-mno-elrw}\
   %{mistack:-mistack}  \
diff --git a/gcc/config/csky/csky-linux-elf.h b/gcc/config/csky/csky-linux-elf.h
index 2f052fd..9a57dd04 100644
--- a/gcc/config/csky/csky-linux-elf.h
+++ b/gcc/config/csky/csky-linux-elf.h
@@ -47,6 +47,8 @@
   %{mcpu=*:-mcpu=%*}   \
   %{march=*:-march=%*} \
   %{mhard-float:-mhard-float}  \
+  %{mfloat-abi=softfp:-mhard-float} \
+  %{mfloat-abi=hard:-mhard-float}   \
   %{melrw:-melrw}  \
   %{mno-elrw:-mno-elrw}\
   %{mistack:-mistack}  \
-- 
1.9.1
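
For readers less familiar with GCC spec strings: `%{S:X}` substitutes `X` when the switch `-S` was given to the driver. The sketch below emulates only that exact-match form for the two added lines. It is an illustration, not GCC's spec processor, and the function and table names are hypothetical.

```python
# The two rules added by the patch, written as (driver switch, assembler flag).
ADDED_RULES = [
    ("mfloat-abi=softfp", "-mhard-float"),
    ("mfloat-abi=hard", "-mhard-float"),
]


def assembler_flags(driver_options, rules=ADDED_RULES):
    """Emulate `%{S:X}`: emit X for every rule whose switch -S was given."""
    flags = []
    for switch, substitution in rules:
        if "-" + switch in driver_options:
            flags.append(substitution)
    return flags


if __name__ == "__main__":
    for options in (["-mfloat-abi=hard"], ["-mfloat-abi=softfp"], ["-mfloat-abi=soft"]):
        print(options, "->", assembler_flags(options))
```

In this model both `-mfloat-abi=hard` and `-mfloat-abi=softfp` cause `-mhard-float` to be passed to the assembler, while `-mfloat-abi=soft` adds nothing.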