Re: [PATCH] i386: Update naked-1.c for PIC

2017-07-31 Thread Uros Bizjak
On Tue, Aug 1, 2017 at 12:48 AM, H.J. Lu  wrote:
> For ia32 targets, -fPIC may generate
>
> call __x86.get_pc_thunk.ax
> ...
> __x86.get_pc_thunk.ax:
> movl(%esp), %eax
> ret
>
> We should check "ret" only for non-PIC or non-ia32 targets.

I have added -fno-pic to dg-options.

Tested on x86_64-linux-gnu {,-m32} and committed.

Uros.

Index: gcc.target/i386/naked-1.c
===
--- gcc.target/i386/naked-1.c   (revision 250745)
+++ gcc.target/i386/naked-1.c   (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O0" } */
+/* { dg-options "-O0 -fno-pic" } */

 /* Verify that __attribute__((naked)) produces a naked function
that does not use ret to return but traps at the end.  */


[PING #2] [PATCH] enhance -Wrestrict to handle string built-ins (PR 78918)

2017-07-31 Thread Martin Sebor

Richard,

in discussing this work Jeff mentioned that your comments on
the tree-ssa-alias.c parts would be helpful.  When you have
a chance could you please give it a once over and let me know
if you have any suggestions or concerns?  There are no visible
changes to existing clients of the pass, just extensions that
are relied on only by the new diagnostics.

  https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01264.html

I expect to revisit the sprintf %s patch mentioned below and
see how to simplify it by taking advantage of the changes
implemented here.

Thanks
Martin

On 07/24/2017 09:13 PM, Martin Sebor wrote:

Ping:

  https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01264.html

This change is related to

  [PATCH] enhance -Wrestrict for sprintf %s arguments
  https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01176.html

On 07/20/2017 02:45 PM, Martin Sebor wrote:

With more testing (building GDB, Glibc, Busybox, and the Linux
kernel) I found a few bugs and weaknesses in the initial patch.
Attached is version 2 that fixes the uncovered problems and makes
further enhancements to handle more cases.

Martin

On 07/16/2017 05:47 PM, Martin Sebor wrote:

Being implemented in the front end, the -Wrestrict warning
detects only trivial instances of violations.  The attached
patch extends the implementation to the middle-end where
data flow and alias analysis can be combined to detect even
complex cases of overlap.  This work is independent of but
follows on the patch below (waiting for review):

  https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00036.html

The changes rely on extending the tree-ssa-{alias,structalias}
machinery in a simple way to answer the "must-alias" kind of
question in addition to the current "may-alias."

The rest of the changes are in gimple-fold.c, tree-ssa-strlen.c,
and builtins.c.

Even though this change makes -Wrestrict a lot more useful, it's
not a complete implementation.  Not all built-ins are handled yet
(e.g., strncat), and support for user-defined functions is still
subject to the limitations of the front end implementation.  To
complete the support, handlers for the missing string built-ins
will need to be added to tree-ssa-strlen.c, and the remaining
bits should be moved from the front end to somewhere in
the middle-end (e.g., calls.c).

Martin








Re: PING: [PATCH] PR driver/81523: Make -static override -pie

2017-07-31 Thread H.J. Lu
On Mon, Jul 31, 2017 at 5:37 PM, Alan Modra  wrote:
> On Mon, Jul 31, 2017 at 08:04:13AM -0700, H.J. Lu wrote:
>> On Mon, Jul 24, 2017 at 10:24 AM, H.J. Lu  wrote:
>> > On Sun, Jul 23, 2017 at 8:14 AM, H.J. Lu  wrote:
>> >> -static and -pie together behave differently depending on whether GCC is
>> >> configured with --enable-default-pie.  On x86, "-static -pie" fails to
>> >> create executable when --enable-default-pie isn't used, but creates a
>> >> static executable when --enable-default-pie is used.  This patch makes
>> >> -static completely override -pie to create a static executable, regardless
>> >> if --enable-default-pie is used to configure GCC.
>> >>
>> >> OK for master?
>> >>
>> >> H.J.
>> >> --
>> >> 2017-07-23  Alan Modra  
>> >> H.J. Lu  
>> >>
>> >> gcc/
>> >>
>> >> PR driver/81523
>> >> * gcc.c (NO_PIE_SPEC): Delete.
>> >> (PIE_SPEC): Define as !no-pie/pie.  Move static|shared|r
>> >> exclusion..
>> >> (LINK_PIE_SPEC): ..to here.
>> >> * config/gnu-user.h (GNU_USER_TARGET_STARTFILE_SPEC): Correct
>> >> chain of crtbegin*.o selection, update for PIE_SPEC changes and
>> >> format.
>> >> (GNU_USER_TARGET_ENDFILE_SPEC): Similarly.
>> >> * config/sol2.h (STARTFILE_CRTBEGIN_SPEC): Similarly.
>> >> (ENDFILE_CRTEND_SPEC): Similarly.
>> >>
>> >
>> > We need to add %{no-pie:} to LINK_COMMAND_SPEC to prevent an error
>> > message when PIE isn't enabled by default.   Here is the updated patch with
>> > a testcase.
>> >
>>
>> PING.  I am enclosing the patch here.
>
> For anyone looking at this, the patch enclosed here is mine from
> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01678.html minus some PR
> numbers and the powerpc specific part, plus testsuite (which looks to
> me like HJ copied pie-static-1.c to pie-static-2.c then forgot to
> change the order of -pie and -static), plus a tweak to
> LINK_COMMAND_SPEC.  The LINK_COMMAND_SPEC change looks good to me,
> except for the comment which doesn't exactly match the code.  That
> change is necessary because I ignorantly removed %{no-pie:} from
> LINK_PIE_SPEC without adding it back elsewhere.

Here is the updated patch.

-- 
H.J.
From a2ed5b4c96978fba81423e562cdcb4b85c8cf9cc Mon Sep 17 00:00:00 2001
From: Alan Modra 
Date: Thu, 20 Jul 2017 09:57:36 -0700
Subject: [PATCH] PR driver/81523: Make -static override -pie

-static and -pie together behave differently depending on whether GCC is
configured with --enable-default-pie.  On x86, "-static -pie" fails to
create executable when --enable-default-pie isn't used, but creates a
static executable when --enable-default-pie is used.  This patch makes
-static completely override -pie to create a static executable, regardless
if --enable-default-pie is used to configure GCC.

2017-07-24  Alan Modra  
	H.J. Lu  

gcc/

	PR driver/81523
	* gcc.c (NO_PIE_SPEC): Delete.
	(PIE_SPEC): Define as !no-pie/pie.  Move static|shared|r
	exclusion..
	(LINK_PIE_SPEC): ..to here.
	(LINK_COMMAND_SPEC): Support -no-pie.
	* config/gnu-user.h (GNU_USER_TARGET_STARTFILE_SPEC): Correct
	chain of crtbegin*.o selection, update for PIE_SPEC changes and
	format.
	(GNU_USER_TARGET_ENDFILE_SPEC): Similarly.
	* config/sol2.h (STARTFILE_CRTBEGIN_SPEC): Similarly.
	(ENDFILE_CRTEND_SPEC): Similarly.

gcc/testsuite/

	PR driver/81523
	* gcc.dg/pie-7.c: New test.
	* gcc.dg/pie-static-1.c: Likewise.
	* gcc.dg/pie-static-2.c: Likewise.
---
 gcc/config/gnu-user.h   | 34 --
 gcc/config/sol2.h   | 12 ++--
 gcc/gcc.c   | 14 +++---
 gcc/testsuite/gcc.dg/pie-7.c|  7 +++
 gcc/testsuite/gcc.dg/pie-static-1.c |  7 +++
 gcc/testsuite/gcc.dg/pie-static-2.c |  7 +++
 6 files changed, 58 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pie-7.c
 create mode 100644 gcc/testsuite/gcc.dg/pie-static-1.c
 create mode 100644 gcc/testsuite/gcc.dg/pie-static-2.c

diff --git a/gcc/config/gnu-user.h b/gcc/config/gnu-user.h
index 2787a3d16be..de605b0c466 100644
--- a/gcc/config/gnu-user.h
+++ b/gcc/config/gnu-user.h
@@ -50,19 +50,28 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 
 #if defined HAVE_LD_PIE
 #define GNU_USER_TARGET_STARTFILE_SPEC \
-  "%{!shared: %{pg|p|profile:gcrt1.o%s;: \
-%{" PIE_SPEC ":Scrt1.o%s} %{" NO_PIE_SPEC ":crt1.o%s}}} \
-   crti.o%s %{static:crtbeginT.o%s;: %{shared:crtbeginS.o%s} \
-	  %{" PIE_SPEC ":crtbeginS.o%s} \
-	  %{" NO_PIE_SPEC ":crtbegin.o%s}} \
+  "%{shared:; \
+ pg|p|profile:gcrt1.o%s; \
+ static:crt1.o%s; \
+ " PIE_SPEC ":Scrt1.o%s; \
+ :crt1.o%s} \
+   crti.o%s \
+   %{static:crtbeginT.o%s; \
+ shared|" PIE_SPEC ":crtbeginS.o%s; \
+ :crtbegin.o%s} \

Re: [PATCH 0/2] add unique_ptr class

2017-07-31 Thread David Malcolm
On Mon, 2017-07-31 at 19:46 -0400, tbsaunde+...@tbsaunde.org wrote:
> From: Trevor Saunders 
> 
> Hi,
> 
> I've been saying I'd do this for a long time, but I'm finally getting
> to
> importing the C++98 compatable unique_ptr class Pedro wrote for gdb a
> while
> back.  I believe the gtl namespace also comes from Pedro, but GNU
> template
> library seems as reasonable as any other name I can come up
> with.  I'm not sure
> at the moment what outside of gcc may want to use this, but putting
> it include/
> at least allows us to use it in libcpp which may be useful.  I didn't
> include
> too much usage in this series, but I believe other people have wanted
> this too,

Thanks for posting this.

For example, I could use this in:

  [PATCH 1/3] c-family: add name_hint/deferred_diagnostic
  https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00439.html

e.g. as discussed here:

https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00123.html

Dave

> so I'm reasonably confident it will get a fair amount of usage.
> 
> patches individually bootstrapped + regtested on ppc64-linux-gnu, ok?
> 
> Trev
> 
> 


Re: PING: [PATCH] PR driver/81523: Make -static override -pie

2017-07-31 Thread Alan Modra
On Mon, Jul 31, 2017 at 08:04:13AM -0700, H.J. Lu wrote:
> On Mon, Jul 24, 2017 at 10:24 AM, H.J. Lu  wrote:
> > On Sun, Jul 23, 2017 at 8:14 AM, H.J. Lu  wrote:
> >> -static and -pie together behave differently depending on whether GCC is
> >> configured with --enable-default-pie.  On x86, "-static -pie" fails to
> >> create executable when --enable-default-pie isn't used, but creates a
> >> static executable when --enable-default-pie is used.  This patch makes
> >> -static completely override -pie to create a static executable, regardless
> >> if --enable-default-pie is used to configure GCC.
> >>
> >> OK for master?
> >>
> >> H.J.
> >> --
> >> 2017-07-23  Alan Modra  
> >> H.J. Lu  
> >>
> >> gcc/
> >>
> >> PR driver/81523
> >> * gcc.c (NO_PIE_SPEC): Delete.
> >> (PIE_SPEC): Define as !no-pie/pie.  Move static|shared|r
> >> exclusion..
> >> (LINK_PIE_SPEC): ..to here.
> >> * config/gnu-user.h (GNU_USER_TARGET_STARTFILE_SPEC): Correct
> >> chain of crtbegin*.o selection, update for PIE_SPEC changes and
> >> format.
> >> (GNU_USER_TARGET_ENDFILE_SPEC): Similarly.
> >> * config/sol2.h (STARTFILE_CRTBEGIN_SPEC): Similarly.
> >> (ENDFILE_CRTEND_SPEC): Similarly.
> >>
> >
> > We need to add %{no-pie:} to LINK_COMMAND_SPEC to prevent an error
> > message when PIE isn't enabled by default.   Here is the updated patch with
> > a testcase.
> >
> 
> PING.  I am enclosing the patch here.

For anyone looking at this, the patch enclosed here is mine from
https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01678.html minus some PR
numbers and the powerpc specific part, plus testsuite (which looks to
me like HJ copied pie-static-1.c to pie-static-2.c then forgot to
change the order of -pie and -static), plus a tweak to
LINK_COMMAND_SPEC.  The LINK_COMMAND_SPEC change looks good to me,
except for the comment which doesn't exactly match the code.  That
change is necessary because I ignorantly removed %{no-pie:} from
LINK_PIE_SPEC without adding it back elsewhere.

> 
> Thanks.
> 
> 
> -- 
> H.J.

> From ea702c99286ab92a4b94f676d2340ce55fd173c3 Mon Sep 17 00:00:00 2001
> From: Alan Modra 
> Date: Thu, 20 Jul 2017 09:57:36 -0700
> Subject: [PATCH] PR driver/81523: Make -static override -pie
> 
> -static and -pie together behave differently depending on whether GCC is
> configured with --enable-default-pie.  On x86, "-static -pie" fails to
> create executable when --enable-default-pie isn't used, but creates a
> static executable when --enable-default-pie is used.  This patch makes
> -static completely override -pie to create a static executable, regardless
> if --enable-default-pie is used to configure GCC.
> 
> 2017-07-24  Alan Modra  
>   H.J. Lu  
> 
> gcc/
> 
>   PR driver/81523
>   * gcc.c (NO_PIE_SPEC): Delete.
>   (PIE_SPEC): Define as !no-pie/pie.  Move static|shared|r
>   exclusion..
>   (LINK_PIE_SPEC): ..to here.
>   (LINK_COMMAND_SPEC): Support -no-pie.
>   * config/gnu-user.h (GNU_USER_TARGET_STARTFILE_SPEC): Correct
>   chain of crtbegin*.o selection, update for PIE_SPEC changes and
>   format.
>   (GNU_USER_TARGET_ENDFILE_SPEC): Similarly.
>   * config/sol2.h (STARTFILE_CRTBEGIN_SPEC): Similarly.
>   (ENDFILE_CRTEND_SPEC): Similarly.
> 
> gcc/testsuite/
> 
>   PR driver/81523
>   * gcc.dg/pie-7.c: New test.
>   * gcc.dg/pie-static-1.c: Likewise.
>   * gcc.dg/pie-static-2.c: Likewise.
> ---
>  gcc/config/gnu-user.h   | 34 --
>  gcc/config/sol2.h   | 12 ++--
>  gcc/gcc.c   | 10 +-
>  gcc/testsuite/gcc.dg/pie-7.c|  7 +++
>  gcc/testsuite/gcc.dg/pie-static-1.c |  7 +++
>  gcc/testsuite/gcc.dg/pie-static-2.c |  7 +++
>  6 files changed, 56 insertions(+), 21 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pie-7.c
>  create mode 100644 gcc/testsuite/gcc.dg/pie-static-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/pie-static-2.c
> 
> diff --git a/gcc/config/gnu-user.h b/gcc/config/gnu-user.h
> index 2787a3d16be..de605b0c466 100644
> --- a/gcc/config/gnu-user.h
> +++ b/gcc/config/gnu-user.h
> @@ -50,19 +50,28 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
> If not, see
>  
>  #if defined HAVE_LD_PIE
>  #define GNU_USER_TARGET_STARTFILE_SPEC \
> -  "%{!shared: %{pg|p|profile:gcrt1.o%s;: \
> -%{" PIE_SPEC ":Scrt1.o%s} %{" NO_PIE_SPEC ":crt1.o%s}}} \
> -   crti.o%s %{static:crtbeginT.o%s;: %{shared:crtbeginS.o%s} \
> -   %{" PIE_SPEC ":crtbeginS.o%s} \
> -   %{" NO_PIE_SPEC ":crtbegin.o%s}} \
> +  "%{shared:; \
> + pg|p|profile:gcrt1.o%s; \
> + static:crt1.o%s; \
> + " PIE_SPEC ":Scrt1.o%s; \
> + :crt1.o%s} \
> +   crti.o%s \
> +   

[PATCH 1/5] Rename existing insn_cost to insn_sched_cost

2017-07-31 Thread Segher Boessenkool
haifa-sched exports an insn_cost function, but it is only used in a
few places and specialised to scheduling.  This patch renames it to
insn_sched_cost.

---
 gcc/haifa-sched.c  | 14 +++---
 gcc/sched-int.h|  2 +-
 gcc/sched-rgn.c|  4 ++--
 gcc/sel-sched-ir.c |  2 +-
 4 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
index af0ed27..d1378d0 100644
--- a/gcc/haifa-sched.c
+++ b/gcc/haifa-sched.c
@@ -225,7 +225,7 @@ struct common_sched_info_def *common_sched_info;
 #define FEEDS_BACKTRACK_INSN(INSN) (HID (INSN)->feeds_backtrack_insn)
 #define SHADOW_P(INSN) (HID (INSN)->shadow_p)
 #define MUST_RECOMPUTE_SPEC_P(INSN) (HID (INSN)->must_recompute_spec)
-/* Cached cost of the instruction.  Use insn_cost to get cost of the
+/* Cached cost of the instruction.  Use insn_sched_cost to get cost of the
insn.  -1 here means that the field is not initialized.  */
 #define INSN_COST(INSN)(HID (INSN)->cost)
 
@@ -1383,7 +1383,7 @@ static rtx_insn *nonscheduled_insns_begin;
This is the number of cycles between instruction issue and
instruction results.  */
 int
-insn_cost (rtx_insn *insn)
+insn_sched_cost (rtx_insn *insn)
 {
   int cost;
 
@@ -1470,7 +1470,7 @@ dep_cost_1 (dep_t link, dw_t dw)
 {
   enum reg_note dep_type = DEP_TYPE (link);
 
-  cost = insn_cost (insn);
+  cost = insn_sched_cost (insn);
 
   if (INSN_CODE (insn) >= 0)
{
@@ -1608,11 +1608,11 @@ priority (rtx_insn *insn)
  INSN_FUSION_PRIORITY (insn) = this_fusion_priority;
}
   else if (dep_list_size (insn, SD_LIST_FORW) == 0)
-   /* ??? We should set INSN_PRIORITY to insn_cost when and insn has
-  some forward deps but all of them are ignored by
+   /* ??? We should set INSN_PRIORITY to insn_sched_cost when and insn
+  has some forward deps but all of them are ignored by
   contributes_to_priority hook.  At the moment we set priority of
   such insn to 0.  */
-   this_priority = insn_cost (insn);
+   this_priority = insn_sched_cost (insn);
   else
{
  rtx_insn *prev_first, *twin;
@@ -1683,7 +1683,7 @@ priority (rtx_insn *insn)
{
  gcc_assert (this_priority == -1);
 
- this_priority = insn_cost (insn);
+ this_priority = insn_sched_cost (insn);
}
 
   INSN_PRIORITY (insn) = this_priority;
diff --git a/gcc/sched-int.h b/gcc/sched-int.h
index 624d892..2af8f9f 100644
--- a/gcc/sched-int.h
+++ b/gcc/sched-int.h
@@ -1403,7 +1403,7 @@ extern void get_ebb_head_tail (basic_block, basic_block,
   rtx_insn **, rtx_insn **);
 extern int no_real_insns_p (const rtx_insn *, const rtx_insn *);
 
-extern int insn_cost (rtx_insn *);
+extern int insn_sched_cost (rtx_insn *);
 extern int dep_cost_1 (dep_t, dw_t);
 extern int dep_cost (dep_t);
 extern int set_priorities (rtx_insn *, rtx_insn *);
diff --git a/gcc/sched-rgn.c b/gcc/sched-rgn.c
index 492094e..ba5b47c 100644
--- a/gcc/sched-rgn.c
+++ b/gcc/sched-rgn.c
@@ -2837,8 +2837,8 @@ void debug_dependencies (rtx_insn *head, rtx_insn *tail)
   : INSN_PRIORITY (insn))
: INSN_PRIORITY (insn)),
   (sel_sched_p () ? (sched_emulate_haifa_p ? -1
-  : insn_cost (insn))
-   : insn_cost (insn)));
+  : insn_sched_cost (insn))
+   : insn_sched_cost (insn)));
 
   if (recog_memoized (insn) < 0)
fprintf (sched_dump, "nothing");
diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c
index fa88259..c0e835f 100644
--- a/gcc/sel-sched-ir.c
+++ b/gcc/sel-sched-ir.c
@@ -1324,7 +1324,7 @@ sel_insn_rtx_cost (rtx_insn *insn)
 }
 
 /* Return the cost of the VI.
-   !!! FIXME: Unify with haifa-sched.c: insn_cost ().  */
+   !!! FIXME: Unify with haifa-sched.c: insn_sched_cost ().  */
 int
 sel_vinsn_cost (vinsn_t vi)
 {
-- 
1.9.3



[PATCH 0/5] RFC, WIP: RTL cost improvements

2017-07-31 Thread Segher Boessenkool
This series creates pattern_cost and insn_cost functions that together
replace the existing insn_rtx_cost function.

pattern_cost is like the old insn_rtx_cost function; insn_cost takes
an actual rtx_insn * as input, not just a pattern.

Also a targetm.insn_cost is added, which targets can use to implement
a more exact cost more easily.

The combine patch is pretty gross (but functional), it needs some
refactoring (to not call recog so often).  The rs6000 patch is very
much a work in progress.

How does this look?  Is this the right direction?


Segher


Segher Boessenkool (5):
  Rename existing insn_cost to insn_sched_cost
  Replace insn_rtx_cost with insn_cost and pattern_cost
  combine: Use insn_cost instead of pattern_cost everywhere
  Add targetm.insn_cost hook
  rs6000: Implement insn_cost hook

 gcc/cfgrtl.c   |  7 +++
 gcc/combine.c  | 40 +---
 gcc/config/rs6000/rs6000.c | 14 ++
 gcc/doc/tm.texi| 12 
 gcc/doc/tm.texi.in |  2 ++
 gcc/dse.c  |  2 +-
 gcc/haifa-sched.c  | 14 +++---
 gcc/ifcvt.c| 12 ++--
 gcc/rtl.h  |  3 ++-
 gcc/rtlanal.c  | 16 ++--
 gcc/sched-int.h|  2 +-
 gcc/sched-rgn.c|  4 ++--
 gcc/sel-sched-ir.c |  2 +-
 gcc/target.def | 14 ++
 14 files changed, 108 insertions(+), 36 deletions(-)

-- 
1.9.3



[PATCH 5/5] rs6000: Implement insn_cost hook

2017-07-31 Thread Segher Boessenkool
This is a pretty minimalistic implementation of the insn_cost hook: it
just counts how many machine instructions will be generated.  Some
improvements are needed: loads should get extra cost; some instructions
like mul and div should be more expensive than others; and it exposes
some suboptimalities in our machine description files.  Still, good
enough for most testing.

---
 gcc/config/rs6000/rs6000.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index a7d2e7e..b4fda69 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1761,6 +1761,8 @@ static const struct attribute_spec 
rs6000_attribute_table[] =
 #define TARGET_RTX_COSTS rs6000_rtx_costs
 #undef TARGET_ADDRESS_COST
 #define TARGET_ADDRESS_COST hook_int_rtx_mode_as_bool_0
+#undef TARGET_INSN_COST
+#define TARGET_INSN_COST rs6000_insn_cost
 
 #undef TARGET_INIT_DWARF_REG_SIZES_EXTRA
 #define TARGET_INIT_DWARF_REG_SIZES_EXTRA rs6000_init_dwarf_reg_sizes_extra
@@ -34521,6 +34523,18 @@ rs6000_debug_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
   return ret;
 }
 
+static int
+rs6000_insn_cost (rtx_insn *insn, bool speed)
+{
+  if (recog_memoized (insn) < 0)
+return 0;
+
+  if (!speed)
+return get_attr_length (insn);
+
+  return COSTS_N_INSNS (get_attr_length (insn) / 4);
+}
+
 /* Debug form of ADDRESS_COST that is selected if -mdebug=cost.  */
 
 static int
-- 
1.9.3



Re: [PATCH 00/19] cleanup of memory stats prototypes

2017-07-31 Thread Martin Sebor

On 07/31/2017 05:11 PM, Trevor Saunders wrote:

On Mon, Jul 31, 2017 at 04:58:15PM -0600, Martin Sebor wrote:

On 07/31/2017 03:34 PM, Trevor Saunders wrote:

On Mon, Jul 31, 2017 at 02:56:40PM -0600, Martin Sebor wrote:

On 07/27/2017 02:30 AM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders 

The preC++ way of passing information about the call site of a function was to
use a macro that passed __file__, __LINE__, and __FUNCTION__ to a function with
the same name with _stat appended to it.  The way this is now done with C++ is
to have arguments where the default value is __LINE__, __FILE__, and
__FUNCTION__ in the caller.  This has the significant advantage that if you
look for "^function (" you find the correct function, where in the C way of
doing things you need to realize its a macro and check the definition of the
macro to see what to look for next.  So this removes a layer of indirection,
and makes things somewhat more consistant in using the C++ way of doing things.


So that's what these things are for! :)

I must be missing something either about the macros or about your
changes.  The way I read the changes it seems that it's no longer
possible to call, say,

  t = make_node (code);

and have __FILE__, __LINE__, and __FUNCTION__ passed as arguments
behind the scenes.

Instead, to achieve that, make_node has to be called like so

  t = make_node (code PASS_MEM_STAT);

Otherwise calling make_node() with just one argument will end up
calling it with the defaults.


calling it with the defaults will do the same thing the macro used to
do, so its fine unless you want to pass in a location that you got as an
argument.


Sorry, I'm still not sure I understand.  Please bear with me
if I'm just being slow.

When GATHER_STATISTICS is defined, make_node_stat is declared
like so in current GCC (without your patch):

  extern tree
  make_node_stat (enum tree_code , const char * _loc_name,
  int _loc_line,  const char * _loc_function);

and the make_node macro like so:

  #define make_node(t) make_node_stat (t MEM_STAT_INFO)

with MEM_STAT_INFO being defined as follows:

  #define MEM_STAT_INFO , ALONE_MEM_STAT_INFO
  #define ALONE_MEM_STAT_INFO __FILE__, __LINE__, __FUNCTION__

So a call to

  t = make_node (code);

expands to

  t = make_node_stat (code, __FILE__, __LINE__, __FUNCTION__);

If I'm reading your patch correctly (please correct me if/where
I'm not) then it treats the same call as just an ordinary call
to the make_node function with no macro expansion taking place
at the call  site.  I.e., it will be made with the _loc_name,
_loc_line, and _loc_function arguments set to whatever their
defaults are where the function is declared.  To have the call
do the same thing as before it will have to be made with explicit
arguments, e.g., like so:

  t = make_node (code MEM_STAT_INFO);

That seems like a (significant) difference.


I'm guessing you are missing that the functions when used as the default
argument provide the location of the call site of the function, not the
location of the prototype.  Which is not the most obvious thing.


I am indeed missing that.  How do/can they do that when that's
not how C++ 98 default arguments work?  (It's only possible with
GCC's __builtin_{FILE,LINE,FUNCTION} but not without these
extensions.)


They use the extensions if available, and otherwise provide the wrong
values.  However that's fine because this is just for profiling gcc
itself so demanding you use gcc 4.8 or greater, or bootstrap the
compiler you are going to profile isn't a big deal.


Ah, okay, thanks for your patience with me.  I didn't realize
some language features/extensions get enabled as GCC bootstraps.



perhaps we should just make --enable-gather-detailed-mem-statss require
a recent gcc instead of supporting it but providing bad data for some
things?


That would make sense to me.  Either that, or documenting this
caveat as a known limitation somewhere users of the option will
notice (or both).

Martin


[PATCH 4/5] Add targetm.insn_cost hook

2017-07-31 Thread Segher Boessenkool
This introduces a hook to implement insn_cost.  If a target does not
implement the hook, the old function (i.e. pattern_cost) is used.

---
 gcc/doc/tm.texi| 12 
 gcc/doc/tm.texi.in |  2 ++
 gcc/rtlanal.c  |  3 +++
 gcc/target.def | 14 ++
 4 files changed, 31 insertions(+)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 23e85c7..dfba37f 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6640,6 +6640,18 @@ should probably only be given to addresses with 
different numbers of
 registers on machines with lots of registers.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_INSN_COST (rtx_insn *@var{insn}, bool 
@var{speed})
+This target hook describes the relative costs of RTL instructions.
+
+In implementing this hook, you can use the construct
+@code{COSTS_N_INSNS (@var{n})} to specify a cost equal to @var{n} fast
+instructions.
+
+When optimizing for code size, i.e.@: when @code{speed} is
+false, this target hook should be used to estimate the relative
+size cost of an expression, again relative to @code{COSTS_N_INSNS}.
+@end deftypefn
+
 @deftypefn {Target Hook} {unsigned int} TARGET_MAX_NOCE_IFCVT_SEQ_COST (edge 
@var{e})
 This hook returns a value in the same units as @code{TARGET_RTX_COSTS},
 giving the maximum acceptable cost for a sequence generated by the RTL
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 6df08a2..6e81341 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4796,6 +4796,8 @@ Define this macro if a non-short-circuit operation 
produced by
 
 @hook TARGET_ADDRESS_COST
 
+@hook TARGET_INSN_COST
+
 @hook TARGET_MAX_NOCE_IFCVT_SEQ_COST
 
 @hook TARGET_NOCE_CONVERSION_PROFITABLE_P
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 8f05546..aca194e 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -5314,6 +5314,9 @@ pattern_cost (rtx pat, bool speed)
 int
 insn_cost (rtx_insn *insn, bool speed)
 {
+  if (targetm.insn_cost)
+return targetm.insn_cost (insn, speed);
+
   return pattern_cost (PATTERN (insn), speed);
 }
 
diff --git a/gcc/target.def b/gcc/target.def
index 6d67b1f..2eb411d 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -3647,6 +3647,20 @@ registers on machines with lots of registers.",
  int, (rtx address, machine_mode mode, addr_space_t as, bool speed),
  default_address_cost)
 
+/* Compute a cost for INSN.  */
+DEFHOOK
+(insn_cost,
+ "This target hook describes the relative costs of RTL instructions.\n\
+\n\
+In implementing this hook, you can use the construct\n\
+@code{COSTS_N_INSNS (@var{n})} to specify a cost equal to @var{n} fast\n\
+instructions.\n\
+\n\
+When optimizing for code size, i.e.@: when @code{speed} is\n\
+false, this target hook should be used to estimate the relative\n\
+size cost of an expression, again relative to @code{COSTS_N_INSNS}.",
+ int, (rtx_insn *insn, bool speed), NULL)
+
 /* Give a cost, in RTX Costs units, for an edge.  Like BRANCH_COST, but with
well defined units.  */
 DEFHOOK
-- 
1.9.3



[PATCH 2/5] Replace insn_rtx_cost with insn_cost and pattern_cost

2017-07-31 Thread Segher Boessenkool
This renames insn_rtx_cost to pattern cost, and adds a new function
insn_cost that takes an rtx_insn * instead of an instruction pattern
as input.  It uses the latter function anywhere an instruction is
readily available (instead of just an instruction pattern).

The actual implementation of insn_cost just calls pattern_cost on
the pattern of the instruction; no functional change yet.

---
 gcc/cfgrtl.c  |  7 +++
 gcc/combine.c | 17 -
 gcc/dse.c |  2 +-
 gcc/ifcvt.c   | 12 ++--
 gcc/rtl.h |  3 ++-
 gcc/rtlanal.c | 13 +++--
 6 files changed, 31 insertions(+), 23 deletions(-)

diff --git a/gcc/cfgrtl.c b/gcc/cfgrtl.c
index 6ef47b7..739d1bb 100644
--- a/gcc/cfgrtl.c
+++ b/gcc/cfgrtl.c
@@ -5039,14 +5039,13 @@ rtl_account_profile_record (basic_block bb, int 
after_pass,
   FOR_BB_INSNS (bb, insn)
 if (INSN_P (insn))
   {
-   record->size[after_pass]
- += insn_rtx_cost (PATTERN (insn), false);
+   record->size[after_pass] += insn_cost (insn, false);
if (bb->count.initialized_p ())
  record->time[after_pass]
-   += insn_rtx_cost (PATTERN (insn), true) * bb->count.to_gcov_type ();
+   += insn_cost (insn, true) * bb->count.to_gcov_type ();
else if (profile_status_for_fn (cfun) == PROFILE_GUESSED)
  record->time[after_pass]
-   += insn_rtx_cost (PATTERN (insn), true) * bb->frequency;
+   += insn_cost (insn, true) * bb->frequency;
   }
 }
 
diff --git a/gcc/combine.c b/gcc/combine.c
index c5200db..aadd328 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -311,7 +311,7 @@ static bool optimize_this_for_speed_p;
 
 static int max_uid_known;
 
-/* The following array records the insn_rtx_cost for every insn
+/* The following array records the insn_cost for every insn
in the instruction stream.  */
 
 static int *uid_insn_cost;
@@ -841,7 +841,7 @@ do_SUBST_LINK (struct insn_link **into, struct insn_link 
*newval)
 #define SUBST_LINK(oldval, newval) do_SUBST_LINK (, newval)
 
 /* Subroutine of try_combine.  Determine whether the replacement patterns
-   NEWPAT, NEWI2PAT and NEWOTHERPAT are cheaper according to insn_rtx_cost
+   NEWPAT, NEWI2PAT and NEWOTHERPAT are cheaper according to insn_cost
than the original sequence I0, I1, I2, I3 and undobuf.other_insn.  Note
that I0, I1 and/or NEWI2PAT may be NULL_RTX.  Similarly, NEWOTHERPAT and
undobuf.other_insn may also both be NULL_RTX.  Return false if the cost
@@ -888,11 +888,11 @@ combine_validate_cost (rtx_insn *i0, rtx_insn *i1, 
rtx_insn *i2, rtx_insn *i3,
 old_cost -= i1_cost;
 
 
-  /* Calculate the replacement insn_rtx_costs.  */
-  new_i3_cost = insn_rtx_cost (newpat, optimize_this_for_speed_p);
+  /* Calculate the replacement pattern_costs.  */
+  new_i3_cost = pattern_cost (newpat, optimize_this_for_speed_p);
   if (newi2pat)
 {
-  new_i2_cost = insn_rtx_cost (newi2pat, optimize_this_for_speed_p);
+  new_i2_cost = pattern_cost (newi2pat, optimize_this_for_speed_p);
   new_cost = (new_i2_cost > 0 && new_i3_cost > 0)
 ? new_i2_cost + new_i3_cost : 0;
 }
@@ -907,7 +907,7 @@ combine_validate_cost (rtx_insn *i0, rtx_insn *i1, rtx_insn 
*i2, rtx_insn *i3,
   int old_other_cost, new_other_cost;
 
   old_other_cost = INSN_COST (undobuf.other_insn);
-  new_other_cost = insn_rtx_cost (newotherpat, optimize_this_for_speed_p);
+  new_other_cost = pattern_cost (newotherpat, optimize_this_for_speed_p);
   if (old_other_cost > 0 && new_other_cost > 0)
{
  old_cost += old_other_cost;
@@ -1208,10 +1208,9 @@ combine_instructions (rtx_insn *f, unsigned int nregs)
  set_nonzero_bits_and_sign_copies (XEXP (links, 0), NULL_RTX,
insn);
 
-   /* Record the current insn_rtx_cost of this instruction.  */
+   /* Record the current insn_cost of this instruction.  */
if (NONJUMP_INSN_P (insn))
- INSN_COST (insn) = insn_rtx_cost (PATTERN (insn),
-   optimize_this_for_speed_p);
+ INSN_COST (insn) = insn_cost (insn, optimize_this_for_speed_p);
if (dump_file)
  {
fprintf (dump_file, "insn_cost %d for ", INSN_COST (insn));
diff --git a/gcc/dse.c b/gcc/dse.c
index f87dd50..9ffb696 100644
--- a/gcc/dse.c
+++ b/gcc/dse.c
@@ -1650,7 +1650,7 @@ find_shift_sequence (int access_size,
   cost = 0;
   for (insn = shift_seq; insn != NULL_RTX; insn = NEXT_INSN (insn))
if (INSN_P (insn))
- cost += insn_rtx_cost (PATTERN (insn), speed);
+ cost += insn_cost (insn, speed);
 
   /* The computation up to here is essentially independent
 of the arguments and could be precomputed.  It may
diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 46a13c4..7de287f 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -121,7 +121,7 @@ count_bb_insns (const_basic_block bb)
 

[PATCH] rs6000: Trailing comma warning in enum

2017-07-31 Thread Segher Boessenkool
Committed.


Segher


2017-07-31  Segher Boessenkool  

* config/rs6000/rs6000.c (enum rs6000_reg_type): Delete trailing comma.

---
 gcc/config/rs6000/rs6000.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index efa3a5e..cba03da 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -437,7 +437,7 @@ enum rs6000_reg_type {
   ALTIVEC_REG_TYPE,
   FPR_REG_TYPE,
   SPR_REG_TYPE,
-  CR_REG_TYPE,
+  CR_REG_TYPE
 };
 
 /* Map register class to register type.  */
-- 
1.9.3



[PATCH 3/5] combine: Use insn_cost instead of pattern_cost everywhere

2017-07-31 Thread Segher Boessenkool
This changes combine to always use insn_cost, not pattern_cost.  I don't
intend to commit it like this: it calls recog more often than necessary,
and it is very ugly (no, don't look at it).  But it's good enough to test
things with.

---
 gcc/combine.c | 31 +--
 1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/gcc/combine.c b/gcc/combine.c
index aadd328..d0bd4d3 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -856,7 +856,7 @@ combine_validate_cost (rtx_insn *i0, rtx_insn *i1, rtx_insn 
*i2, rtx_insn *i3,
   int new_i2_cost, new_i3_cost;
   int old_cost, new_cost;
 
-  /* Lookup the original insn_rtx_costs.  */
+  /* Lookup the original insn_costs.  */
   i2_cost = INSN_COST (i2);
   i3_cost = INSN_COST (i3);
 
@@ -888,11 +888,23 @@ combine_validate_cost (rtx_insn *i0, rtx_insn *i1, 
rtx_insn *i2, rtx_insn *i3,
 old_cost -= i1_cost;
 
 
-  /* Calculate the replacement pattern_costs.  */
-  new_i3_cost = pattern_cost (newpat, optimize_this_for_speed_p);
+  /* Calculate the replacement insn_costs.  */
+  rtx tmp = PATTERN (i3);
+  PATTERN (i3) = newpat;
+  int tmpi = INSN_CODE (i3);
+  INSN_CODE (i3) = -1;
+  new_i3_cost = insn_cost (i3, optimize_this_for_speed_p);
+  PATTERN (i3) = tmp;
+  INSN_CODE (i3) = tmpi;
   if (newi2pat)
 {
-  new_i2_cost = pattern_cost (newi2pat, optimize_this_for_speed_p);
+  tmp = PATTERN (i2);
+  PATTERN (i2) = newi2pat;
+  tmpi = INSN_CODE (i2);
+  INSN_CODE (i2) = -1;
+  new_i2_cost = insn_cost (i2, optimize_this_for_speed_p);
+  PATTERN (i2) = tmp;
+  INSN_CODE (i2) = tmpi;
   new_cost = (new_i2_cost > 0 && new_i3_cost > 0)
 ? new_i2_cost + new_i3_cost : 0;
 }
@@ -907,7 +919,14 @@ combine_validate_cost (rtx_insn *i0, rtx_insn *i1, 
rtx_insn *i2, rtx_insn *i3,
   int old_other_cost, new_other_cost;
 
   old_other_cost = INSN_COST (undobuf.other_insn);
-  new_other_cost = pattern_cost (newotherpat, optimize_this_for_speed_p);
+  tmp = PATTERN (undobuf.other_insn);
+  PATTERN (undobuf.other_insn) = newotherpat;
+  tmpi = INSN_CODE (undobuf.other_insn);
+  INSN_CODE (undobuf.other_insn) = -1;
+  new_other_cost = insn_cost (undobuf.other_insn,
+ optimize_this_for_speed_p);
+  PATTERN (undobuf.other_insn) = tmp;
+  INSN_CODE (undobuf.other_insn) = tmpi;
   if (old_other_cost > 0 && new_other_cost > 0)
{
  old_cost += old_other_cost;
@@ -4081,7 +4100,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
}
 }
 
-  /* Only allow this combination if insn_rtx_costs reports that the
+  /* Only allow this combination if insn_cost reports that the
  replacement instructions are cheaper than the originals.  */
   if (!combine_validate_cost (i0, i1, i2, i3, newpat, newi2pat, other_pat))
 {
-- 
1.9.3



[Patch] Testsuite fixes for failures caused by patch for PR 80925 (part deux)

2017-07-31 Thread Steve Ellcey
Here is a second set of patches to fix tests that started failing
with the patch for PR 80925.  It uses the same testsuite changes
I did for no-section-anchors-vect-69.c and section-anchors-vect-69.c
on new set of gcc.dg/vect/vect-* tests.

2017-07-31  Steve Ellcey  

PR tree-optimization/80925
* gcc.dg/vect/vect-28.c: Add
--param vect-max-peeling-for-alignment=0 option.
Remove unaligned access and peeling checks.
* gcc.dg/vect/vect-33-big-array.c: Ditto.
* gcc.dg/vect/vect-70.c: Ditto.
* gcc.dg/vect/vect-87.c: Ditto.
* gcc.dg/vect/vect-88.c: Ditto.
* gcc.dg/vect/vect-91.c: Ditto.
* gcc.dg/vect/vect-93.c: Ditto.diff --git a/gcc/testsuite/gcc.dg/vect/vect-28.c b/gcc/testsuite/gcc.dg/vect/vect-28.c
index b28fbd9..e213df1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-28.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-28.c
@@ -1,4 +1,5 @@
 /* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "--param vect-max-peeling-for-alignment=0" } */
 
 #include 
 #include "tree-vect.h"
@@ -39,6 +40,4 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  } } */
-/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 0 "vect" } } */
-/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" { target { vector_alignment_reachable } } } } */
 /* { dg-final { scan-tree-dump-times "Alignment of access forced using versioning" 1 "vect" { target { {! vector_alignment_reachable} && {! vect_hw_misalign} } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-33-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-33-big-array.c
index 5ad3953..c1aa399 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-33-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-33-big-array.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "--param vect-max-peeling-for-alignment=0" } */
 
 #include 
 #include "tree-vect.h"
@@ -38,6 +39,4 @@ int main (void)
 
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  } } */
-/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 0 "vect" } } */
-/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" { target vector_alignment_reachable } } } */
 /* { dg-final { scan-tree-dump-times "Alignment of access forced using versioning" 1 "vect" { target { {! vector_alignment_reachable} && {! vect_hw_misalign} } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-70.c b/gcc/testsuite/gcc.dg/vect/vect-70.c
index 0ec06a2..a110f9c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-70.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-70.c
@@ -1,4 +1,5 @@
 /* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "--param vect-max-peeling-for-alignment=0" } */
 
 #include 
 #include "tree-vect.h"
@@ -63,6 +64,4 @@ int main (void)
 }
   
 /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
-/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 0 "vect" } } */
-/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" {target { vector_alignment_reachable} } } } */
 /* { dg-final { scan-tree-dump-times "Alignment of access forced using versioning" 1 "vect" {target {{! vector_alignment_reachable} && {! vect_hw_misalign} } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-87.c b/gcc/testsuite/gcc.dg/vect/vect-87.c
index 4f74397..17b1dcd 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-87.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-87.c
@@ -1,4 +1,5 @@
 /* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "--param vect-max-peeling-for-alignment=0" } */
 
 #include 
 #include "tree-vect.h"
@@ -50,6 +51,4 @@ int main (void)
 
 /* Fails for targets that don't vectorize PLUS (e.g alpha).  */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
-/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 0 "vect" } } */
-/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" {target vector_alignment_reachable} } } */
 /* { dg-final { scan-tree-dump-times "Alignment of access forced using versioning" 1 "vect" {target { {! vector_alignment_reachable} && {! vect_hw_misalign} } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-88.c b/gcc/testsuite/gcc.dg/vect/vect-88.c
index f35c525..b99cb4d 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-88.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-88.c
@@ -1,4 +1,5 @@
 /* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "--param vect-max-peeling-for-alignment=0" } */
 
 #include 
 #include "tree-vect.h"
@@ -50,6 +51,4 @@ int main (void)
 
 /* Fails for targets that don't vectorize PLUS (e.g alpha).  */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
-/* { dg-final { scan-tree-dump-times 

[PATCH 1/2] add unique_ptr header

2017-07-31 Thread tbsaunde+gcc
From: Trevor Saunders 

For most of the history of this see 
https://sourceware.org/ml/gdb-patches/2016-10/msg00223.html
The changes are mostly s/gdb/gtl/g

include/ChangeLog:

2017-07-29  Trevor Saunders  

* unique-ptr.h: New file.
---
 include/unique-ptr.h | 386 +++
 1 file changed, 386 insertions(+)
 create mode 100644 include/unique-ptr.h

diff --git a/include/unique-ptr.h b/include/unique-ptr.h
new file mode 100644
index 000..7903a5abefe
--- /dev/null
+++ b/include/unique-ptr.h
@@ -0,0 +1,386 @@
+/* gtl::unique_ptr, a simple std::unique_ptr replacement for C++03.
+
+   Copyright (C) 2007-2016 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see .  */
+
+/* gtl::unique_ptr defines a C++ owning smart pointer that exposes a
+   subset of the std::unique_ptr API.
+
+   In fact, when compiled with a C++11 compiler, gtl::unique_ptr
+   actually _is_ std::unique_ptr.  When compiled with a C++03 compiler
+   OTOH, it's an hand coded std::unique_ptr emulation that assumes
+   code is correct and doesn't try to be too smart.
+
+   This supports custom deleters, but not _stateful_ deleters, so you
+   can't use those in C++11 mode either.  Only the managed pointer is
+   stored in the smart pointer.  That could be changed; it simply
+   wasn't found necessary.
+
+   At the end of the file you'll find a gtl::unique_ptr partial
+   specialization that uses a custom (stateless) deleter:
+   gtl::unique_xmalloc_ptr.  That is used to manage pointers to
+   objects allocated with xmalloc.
+
+   The C++03 version was originally based on GCC 7.0's std::auto_ptr
+   and then heavily customized to behave more like C++11's
+   std::unique_ptr, but at this point, it no longer shares much at all
+   with the original file.  But, that's the history and the reason for
+   the copyright's starting year.
+
+   The C++03 version lets you shoot yourself in the foot, since
+   similarly to std::auto_ptr, the copy constructor and assignment
+   operators actually move.  Also, in the name of simplicity, no
+   effort is spent on using SFINAE to prevent invalid conversions,
+   etc.  This is not really a problem, because the goal here is to
+   allow code that would be correct using std::unique_ptr to be
+   equally correct in C++03 mode, and, just as efficient.  If client
+   code compiles correctly with a C++11 (or newer) compiler, we know
+   we're not doing anything invalid by mistake.
+
+   Usage notes:
+
+   - Putting gtl::unique_ptr in standard containers is not supported,
+ since C++03 containers are not move-aware (and our emulation
+ relies on copy actually moving).
+
+   - Since there's no nullptr in C++03, gtl::unique_ptr allows
+ implicit initialization and assignment from NULL instead.
+
+   - To check whether there's an associated managed object, all these
+ work as expected:
+
+  if (ptr)
+  if (!ptr)
+  if (ptr != NULL)
+  if (ptr == NULL)
+  if (NULL != ptr)
+  if (NULL == ptr)
+*/
+
+#ifndef GTL_UNIQUE_PTR_H
+#define GTL_UNIQUE_PTR_H 1
+
+#include 
+
+namespace gtl
+{
+
+#if __cplusplus >= 201103
+
+/* In C++11 mode, all we need is import the standard
+   std::unique_ptr.  */
+template using unique_ptr = std::unique_ptr;
+
+/* Pull in move as well.  */
+using std::move;
+
+#else /* C++11 */
+
+/* Default destruction policy used by gtl::unique_ptr when no deleter
+   is specified.  Uses delete.  */
+
+template
+struct default_delete
+{
+  void operator () (T *ptr) const { delete ptr; }
+};
+
+/* Specialization for arrays.  Uses delete[].  */
+
+template
+struct default_delete
+{
+  void operator () (T *ptr) const { delete [] ptr; }
+};
+
+namespace detail
+{
+/* Type used to support implicit construction from NULL:
+
+ gtl::unique_ptr func ()
+ {
+ return NULL;
+ }
+
+   and assignment from NULL:
+
+ gtl::unique_ptr ptr ();
+ ...
+ ptr = NULL;
+
+  It is intentionally not defined anywhere.  */
+struct nullptr_t;
+
+/* Base class of our unique_ptr emulation.  Contains code common to
+   both unique_ptr and unique_ptr.  */
+
+template
+class unique_ptr_base
+{
+public:
+  typedef T *pointer;
+  typedef T element_type;
+  typedef D deleter_type;
+
+  /* 

[PATCH 2/2] use unique_ptr some

2017-07-31 Thread tbsaunde+gcc
From: Trevor Saunders 

gcc/ChangeLog:

2017-07-31  Trevor Saunders  

* cse.c (find_comparison_args): Make visited a unique_ptr.
* lto-streamer-out.c (write_global_references): Make data a
unique_ptr.
* tree-cfg.c (move_sese_region_to_fn): Make several variables
unique_ptrs.
---
 gcc/cse.c  |  7 +++
 gcc/lto-streamer-out.c |  6 +++---
 gcc/tree-cfg.c | 38 +-
 3 files changed, 19 insertions(+), 32 deletions(-)

diff --git a/gcc/cse.c b/gcc/cse.c
index 6a968d19788..45da9b2da9d 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -20,6 +20,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
+#include "unique-ptr.h"
 #include "backend.h"
 #include "target.h"
 #include "rtl.h"
@@ -2887,7 +2888,7 @@ find_comparison_args (enum rtx_code code, rtx *parg1, rtx 
*parg2,
  machine_mode *pmode1, machine_mode *pmode2)
 {
   rtx arg1, arg2;
-  hash_set *visited = NULL;
+  gtl::unique_ptr visited;
   /* Set nonzero when we find something of interest.  */
   rtx x = NULL;
 
@@ -2904,7 +2905,7 @@ find_comparison_args (enum rtx_code code, rtx *parg1, rtx 
*parg2,
   if (x)
{
  if (!visited)
-   visited = new hash_set;
+   visited.reset (new hash_set);
  visited->add (x);
  x = 0;
}
@@ -3067,8 +3068,6 @@ find_comparison_args (enum rtx_code code, rtx *parg1, rtx 
*parg2,
   *pmode1 = GET_MODE (arg1), *pmode2 = GET_MODE (arg2);
   *parg1 = fold_rtx (arg1, 0), *parg2 = fold_rtx (arg2, 0);
 
-  if (visited)
-delete visited;
   return code;
 }
 
diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
index 41fba318cb5..61576c30266 100644
--- a/gcc/lto-streamer-out.c
+++ b/gcc/lto-streamer-out.c
@@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
+#include "unique-ptr.h"
 #include "backend.h"
 #include "target.h"
 #include "rtl.h"
@@ -2434,7 +2435,7 @@ write_global_references (struct output_block *ob,
   const uint32_t size = lto_tree_ref_encoder_size (encoder);
 
   /* Write size and slot indexes as 32-bit unsigned numbers. */
-  uint32_t *data = XNEWVEC (uint32_t, size + 1);
+  gtl::unique_ptr data (new uint32_t[size + 1]);
   data[0] = size;
 
   for (index = 0; index < size; index++)
@@ -2447,8 +2448,7 @@ write_global_references (struct output_block *ob,
   data[index + 1] = slot_num;
 }
 
-  lto_write_data (data, sizeof (int32_t) * (size + 1));
-  free (data);
+  lto_write_data (data.get (), sizeof (int32_t) * (size + 1));
 }
 
 
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 733c92fcdd0..604f799926c 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
+#include "unique-ptr.h"
 #include "backend.h"
 #include "target.h"
 #include "rtl.h"
@@ -7252,10 +7253,8 @@ move_sese_region_to_fn (struct function *dest_cfun, 
basic_block entry_bb,
 {
   vec bbs, dom_bbs;
   basic_block dom_entry = get_immediate_dominator (CDI_DOMINATORS, entry_bb);
-  basic_block after, bb, *entry_pred, *exit_succ, abb;
+  basic_block after, bb, abb;
   struct function *saved_cfun = cfun;
-  int *entry_flag, *exit_flag;
-  profile_probability *entry_prob, *exit_prob;
   unsigned i, num_entry_edges, num_exit_edges, num_nodes;
   edge e;
   edge_iterator ei;
@@ -7291,9 +7290,10 @@ move_sese_region_to_fn (struct function *dest_cfun, 
basic_block entry_bb,
  EXIT_BB so that we can re-attach them to the new basic block that
  will replace the region.  */
   num_entry_edges = EDGE_COUNT (entry_bb->preds);
-  entry_pred = XNEWVEC (basic_block, num_entry_edges);
-  entry_flag = XNEWVEC (int, num_entry_edges);
-  entry_prob = XNEWVEC (profile_probability, num_entry_edges);
+  gtl::unique_ptr entry_pred (new basic_block[num_entry_edges]);
+  gtl::unique_ptr entry_flag (new int[num_entry_edges]);
+  gtl::unique_ptr entry_prob
+(new profile_probability[num_entry_edges]);
   i = 0;
   for (ei = ei_start (entry_bb->preds); (e = ei_safe_edge (ei)) != NULL;)
 {
@@ -7303,12 +7303,15 @@ move_sese_region_to_fn (struct function *dest_cfun, 
basic_block entry_bb,
   remove_edge (e);
 }
 
+  gtl::unique_ptr exit_succ;
+  gtl::unique_ptr exit_flag;
+  gtl::unique_ptr exit_prob;
   if (exit_bb)
 {
   num_exit_edges = EDGE_COUNT (exit_bb->succs);
-  exit_succ = XNEWVEC (basic_block, num_exit_edges);
-  exit_flag = XNEWVEC (int, num_exit_edges);
-  exit_prob = XNEWVEC (profile_probability, num_exit_edges);
+  exit_succ.reset (new basic_block[num_exit_edges]);
+  exit_flag.reset (new 

[PATCH 0/2] add unique_ptr class

2017-07-31 Thread tbsaunde+gcc
From: Trevor Saunders 

Hi,

I've been saying I'd do this for a long time, but I'm finally getting to
importing the C++98 compatable unique_ptr class Pedro wrote for gdb a while
back.  I believe the gtl namespace also comes from Pedro, but GNU template
library seems as reasonable as any other name I can come up with.  I'm not sure
at the moment what outside of gcc may want to use this, but putting it include/
at least allows us to use it in libcpp which may be useful.  I didn't include
too much usage in this series, but I believe other people have wanted this too,
so I'm reasonably confident it will get a fair amount of usage.

patches individually bootstrapped + regtested on ppc64-linux-gnu, ok?

Trev




Re: [PATCH 00/19] cleanup of memory stats prototypes

2017-07-31 Thread Trevor Saunders
On Mon, Jul 31, 2017 at 04:58:15PM -0600, Martin Sebor wrote:
> On 07/31/2017 03:34 PM, Trevor Saunders wrote:
> > On Mon, Jul 31, 2017 at 02:56:40PM -0600, Martin Sebor wrote:
> > > On 07/27/2017 02:30 AM, tbsaunde+...@tbsaunde.org wrote:
> > > > From: Trevor Saunders 
> > > > 
> > > > The preC++ way of passing information about the call site of a function 
> > > > was to
> > > > use a macro that passed __file__, __LINE__, and __FUNCTION__ to a 
> > > > function with
> > > > the same name with _stat appended to it.  The way this is now done with 
> > > > C++ is
> > > > to have arguments where the default value is __LINE__, __FILE__, and
> > > > __FUNCTION__ in the caller.  This has the significant advantage that if 
> > > > you
> > > > look for "^function (" you find the correct function, where in the C 
> > > > way of
> > > > doing things you need to realize its a macro and check the definition 
> > > > of the
> > > > macro to see what to look for next.  So this removes a layer of 
> > > > indirection,
> > > > and makes things somewhat more consistant in using the C++ way of doing 
> > > > things.
> > > 
> > > So that's what these things are for! :)
> > > 
> > > I must be missing something either about the macros or about your
> > > changes.  The way I read the changes it seems that it's no longer
> > > possible to call, say,
> > > 
> > >   t = make_node (code);
> > > 
> > > and have __FILE__, __LINE__, and __FUNCTION__ passed as arguments
> > > behind the scenes.
> > > 
> > > Instead, to achieve that, make_node has to be called like so
> > > 
> > >   t = make_node (code PASS_MEM_STAT);
> > > 
> > > Otherwise calling make_node() with just one argument will end up
> > > calling it with the defaults.
> > 
> > calling it with the defaults will do the same thing the macro used to
> > do, so its fine unless you want to pass in a location that you got as an
> > argument.
> 
> Sorry, I'm still not sure I understand.  Please bear with me
> if I'm just being slow.
> 
> When GATHER_STATISTICS is defined, make_node_stat is declared
> like so in current GCC (without your patch):
> 
>   extern tree
>   make_node_stat (enum tree_code , const char * _loc_name,
>   int _loc_line,  const char * _loc_function);
> 
> and the make_node macro like so:
> 
>   #define make_node(t) make_node_stat (t MEM_STAT_INFO)
> 
> with MEM_STAT_INFO being defined as follows:
> 
>   #define MEM_STAT_INFO , ALONE_MEM_STAT_INFO
>   #define ALONE_MEM_STAT_INFO __FILE__, __LINE__, __FUNCTION__
> 
> So a call to
> 
>   t = make_node (code);
> 
> expands to
> 
>   t = make_node_stat (code, __FILE__, __LINE__, __FUNCTION__);
> 
> If I'm reading your patch correctly (please correct me if/where
> I'm not) then it treats the same call as just an ordinary call
> to the make_node function with no macro expansion taking place
> at the call  site.  I.e., it will be made with the _loc_name,
> _loc_line, and _loc_function arguments set to whatever their
> defaults are where the function is declared.  To have the call
> do the same thing as before it will have to be made with explicit
> arguments, e.g., like so:
> 
>   t = make_node (code MEM_STAT_INFO);
> 
> That seems like a (significant) difference.
> 
> > I'm guessing you are missing that the functions when used as the default
> > argument provide the location of the call site of the function, not the
> > location of the prototype.  Which is not the most obvious thing.
> 
> I am indeed missing that.  How do/can they do that when that's
> not how C++ 98 default arguments work?  (It's only possible with
> GCC's __builtin_{FILE,LINE,FUNCTION} but not without these
> extensions.)

They use the extensions if available, and otherwise provide the wrong
values.  However that's fine because this is just for profiling gcc
itself so demanding you use gcc 4.8 or greater, or bootstrap the
compiler you are going to profile isn't a big deal.

perhaps we should just make --enable-gather-detailed-mem-statss require
a recent gcc instead of supporting it but providing bad data for some
things?

Trev

> 
> Martin
> 


Re: ping [PATCH] [MSP430] Fix PR78849: ICE on initialization of global struct containing __int20 array

2017-07-31 Thread Joseph Myers
On Wed, 26 Jul 2017, Jeff Law wrote:

> TYPE_SIZE, according to my understanding, should be a tree for the size
> of the expression in bits.
> 
> The problem is for msp430 that size varies depending on where it's used.
>  ie, in a register an object might have a bitsize of 20 bits, but in
> memory its size is 32 bits.
> 
> I don't think that TYPE_SIZE has any concept that the use context is
> relevant to selecting the proper size.

TYPE_SIZE_UNIT is unambiguously the memory size, including padding; it's 
used for sizeof.  TYPE_SIZE may be less clear.  We've had issues before 
with unions with x87 long double, which has 80-bit precision in registers 
but is 12-byte or 16-byte in memory; that was wrong code (bug 71522) 
rather than an ICE, but the long double case may be useful for comparison 
of what various type properties are set to in such cases.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 4/N] Recover GOTO predictor.

2017-07-31 Thread David Edelsohn
This patch breaks bootstrap on AIX and probably is a hidden bug on all targets.

With the patch, libbacktrace/xcoff.c fails to compile with the error:

/nasfarm/edelsohn/src/src/libbacktrace/xcoff.c: In function 'xcoff_add':
/nasfarm/edelsohn/src/src/libbacktrace/xcoff.c:822:13: error: 'incl'
may be used uninitialized in this function
[-Werror=maybe-uninitialized]
filename = incl->filename;
~^~~~
/nasfarm/edelsohn/src/src/libbacktrace/xcoff.c:777:22: note: 'incl'
was declared here
   struct xcoff_incl *incl;
  ^~~~

incl is used immediately above the line that produces the warning --
with no warning.

I can provide a pre-processed xcoff.c, if you need it.

Thanks, David


Re: [PATCH 00/19] cleanup of memory stats prototypes

2017-07-31 Thread Martin Sebor

On 07/31/2017 03:34 PM, Trevor Saunders wrote:

On Mon, Jul 31, 2017 at 02:56:40PM -0600, Martin Sebor wrote:

On 07/27/2017 02:30 AM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders 

The preC++ way of passing information about the call site of a function was to
use a macro that passed __file__, __LINE__, and __FUNCTION__ to a function with
the same name with _stat appended to it.  The way this is now done with C++ is
to have arguments where the default value is __LINE__, __FILE__, and
__FUNCTION__ in the caller.  This has the significant advantage that if you
look for "^function (" you find the correct function, where in the C way of
doing things you need to realize its a macro and check the definition of the
macro to see what to look for next.  So this removes a layer of indirection,
and makes things somewhat more consistant in using the C++ way of doing things.


So that's what these things are for! :)

I must be missing something either about the macros or about your
changes.  The way I read the changes it seems that it's no longer
possible to call, say,

  t = make_node (code);

and have __FILE__, __LINE__, and __FUNCTION__ passed as arguments
behind the scenes.

Instead, to achieve that, make_node has to be called like so

  t = make_node (code PASS_MEM_STAT);

Otherwise calling make_node() with just one argument will end up
calling it with the defaults.


calling it with the defaults will do the same thing the macro used to
do, so its fine unless you want to pass in a location that you got as an
argument.


Sorry, I'm still not sure I understand.  Please bear with me
if I'm just being slow.

When GATHER_STATISTICS is defined, make_node_stat is declared
like so in current GCC (without your patch):

  extern tree
  make_node_stat (enum tree_code , const char * _loc_name,
  int _loc_line,  const char * _loc_function);

and the make_node macro like so:

  #define make_node(t) make_node_stat (t MEM_STAT_INFO)

with MEM_STAT_INFO being defined as follows:

  #define MEM_STAT_INFO , ALONE_MEM_STAT_INFO
  #define ALONE_MEM_STAT_INFO __FILE__, __LINE__, __FUNCTION__

So a call to

  t = make_node (code);

expands to

  t = make_node_stat (code, __FILE__, __LINE__, __FUNCTION__);

If I'm reading your patch correctly (please correct me if/where
I'm not) then it treats the same call as just an ordinary call
to the make_node function with no macro expansion taking place
at the call  site.  I.e., it will be made with the _loc_name,
_loc_line, and _loc_function arguments set to whatever their
defaults are where the function is declared.  To have the call
do the same thing as before it will have to be made with explicit
arguments, e.g., like so:

  t = make_node (code MEM_STAT_INFO);

That seems like a (significant) difference.


I'm guessing you are missing that the functions when used as the default
argument provide the location of the call site of the function, not the
location of the prototype.  Which is not the most obvious thing.


I am indeed missing that.  How do/can they do that when that's
not how C++ 98 default arguments work?  (It's only possible with
GCC's __builtin_{FILE,LINE,FUNCTION} but not without these
extensions.)

Martin



[PATCH] i386: Update naked-1.c for PIC

2017-07-31 Thread H.J. Lu
For ia32 targets, -fPIC may generate

call __x86.get_pc_thunk.ax
...
__x86.get_pc_thunk.ax:
movl(%esp), %eax
ret

We should check "ret" only for non-PIC or non-ia32 targets.

OK for trunk?

H.J.
---
* gcc.target/i386/naked-1.c: Check "ret" only for non-PIC or
non-ia32 targets.
---
 gcc/testsuite/gcc.target/i386/naked-1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/naked-1.c 
b/gcc/testsuite/gcc.target/i386/naked-1.c
index 440dbe9ee7a..dda354371ba 100644
--- a/gcc/testsuite/gcc.target/i386/naked-1.c
+++ b/gcc/testsuite/gcc.target/i386/naked-1.c
@@ -11,4 +11,4 @@ foo (void)
 }
 /* { dg-final { scan-assembler "# naked" } } */
 /* { dg-final { scan-assembler "ud2" } } */
-/* { dg-final { scan-assembler-not "ret" } } */
+/* { dg-final { scan-assembler-not "ret" { target { nonpic || { ! ia32 } } } } 
} */
-- 
2.13.3



Re: C PATCH to detect clashing attributes (PR c/81544)

2017-07-31 Thread Joseph Myers
On Tue, 25 Jul 2017, Marek Polacek wrote:

> PR c/81544 complaints that we aren't detecting clashing noreturn /
> warn_unused_result attributes so this patch adds that checking.  Martin
> plans to do more systematic checking in this area but meanwhile we
> might want to go with this.
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?

I'd expect exactly the same cases to be diagnosed for the two attributes 
on the same declaration, in either order, whether or not inside a single 
__attribute__, as are diagnosed when multiple declarations are involved 
(this applies to all such cases of invalid attribute combinations, not 
just this one).  Whether or not this patch achieves this, the testcase 
doesn't seem to cover the case of a single declaration with both 
attributes (and I don't see an existing such test in c-c++-common or 
gcc.dg either).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [RFC] Remaining references of Java

2017-07-31 Thread Joseph Myers
On Tue, 25 Jul 2017, Richard Biener wrote:

> > 2) fold-const.c:
> >
> >   1882/* The following code implements the floating point to integer
> >   1883   conversion rules required by the Java Language Specification,
> >   1884   that IEEE NaNs are mapped to zero and values that overflow
> >   1885   the target precision saturate, i.e. values greater than
> >   1886   INT_MAX are mapped to INT_MAX, and values less than INT_MIN
> >   1887   are mapped to INT_MIN.  These semantics are allowed by the
> >   1888   C and C++ standards that simply state that the behavior of
> >   1889   FP-to-integer conversion is unspecified upon overflow.  */
> >   1890
> >   1891wide_int val;
> >   1892REAL_VALUE_TYPE r;
> >   1893REAL_VALUE_TYPE x = TREE_REAL_CST (arg1);
> >
> > Can we somehow remove that Richi?
> 
> Remove what?  The comment?  It still matches the code and we have to
> do sth.  So I don't see why we should change this.
> 
> Maybe Joseph knows whether future standards / TRs recommend sth different
> or exactly this.

The requirement under Annex F is to return an unspecified value (with 
"invalid" raised) for such overflowing / NaN conversions to integer.

"unspecified value" means that any particular execution of the code must 
behave as if some consistent value was chosen; after

double d;
int i = d;

you can't have one subsequent conditional show i == 0 and another show i 
== INT_MAX.  But it's fine for different executions of that code to use 
different values.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 00/19] cleanup of memory stats prototypes

2017-07-31 Thread Trevor Saunders
On Mon, Jul 31, 2017 at 02:56:40PM -0600, Martin Sebor wrote:
> On 07/27/2017 02:30 AM, tbsaunde+...@tbsaunde.org wrote:
> > From: Trevor Saunders 
> > 
> > The preC++ way of passing information about the call site of a function was 
> > to
> > use a macro that passed __file__, __LINE__, and __FUNCTION__ to a function 
> > with
> > the same name with _stat appended to it.  The way this is now done with C++ 
> > is
> > to have arguments where the default value is __LINE__, __FILE__, and
> > __FUNCTION__ in the caller.  This has the significant advantage that if you
> > look for "^function (" you find the correct function, where in the C way of
> > doing things you need to realize its a macro and check the definition of the
> > macro to see what to look for next.  So this removes a layer of indirection,
> > and makes things somewhat more consistant in using the C++ way of doing 
> > things.
> 
> So that's what these things are for! :)
> 
> I must be missing something either about the macros or about your
> changes.  The way I read the changes it seems that it's no longer
> possible to call, say,
> 
>   t = make_node (code);
> 
> and have __FILE__, __LINE__, and __FUNCTION__ passed as arguments
> behind the scenes.
> 
> Instead, to achieve that, make_node has to be called like so
> 
>   t = make_node (code PASS_MEM_STAT);
> 
> Otherwise calling make_node() with just one argument will end up
> calling it with the defaults.

calling it with the defaults will do the same thing the macro used to
do, so its fine unless you want to pass in a location that you got as an
argument.

I'm guessing you are missing that the functions when used as the default
argument provide the location of the call site of the function, not the
location of the prototype.  Which is not the most obvious thing.

Trev

> 
> Martin
> 


Re: [PATCH 00/19] cleanup of memory stats prototypes

2017-07-31 Thread Martin Sebor

On 07/27/2017 02:30 AM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders 

The preC++ way of passing information about the call site of a function was to
use a macro that passed __file__, __LINE__, and __FUNCTION__ to a function with
the same name with _stat appended to it.  The way this is now done with C++ is
to have arguments where the default value is __LINE__, __FILE__, and
__FUNCTION__ in the caller.  This has the significant advantage that if you
look for "^function (" you find the correct function, where in the C way of
doing things you need to realize its a macro and check the definition of the
macro to see what to look for next.  So this removes a layer of indirection,
and makes things somewhat more consistant in using the C++ way of doing things.


So that's what these things are for! :)

I must be missing something either about the macros or about your
changes.  The way I read the changes it seems that it's no longer
possible to call, say,

  t = make_node (code);

and have __FILE__, __LINE__, and __FUNCTION__ passed as arguments
behind the scenes.

Instead, to achieve that, make_node has to be called like so

  t = make_node (code PASS_MEM_STAT);

Otherwise calling make_node() with just one argument will end up
calling it with the defaults.

Martin



Re: [PATCH, RFC] Proposed PowerPC IEEE 128-bit floating point changes

2017-07-31 Thread Joseph Myers
On Fri, 21 Jul 2017, Michael Meissner wrote:

> The first change is to enable the C language to use _Float128 keyword (but not
> __float128) without having to use the -mfloat128 option on power7-power9
> systems.  My question is in the TR that introduced _Float128, is there any
> expectation that outside of the built-in functions we already provide, that we
> need to provide runtime functions?  Yes, glibc 2.26 will be coming along
> shortly, and it should provide most/all of the F128 functions, but distros
> won't pick this library up for some time.

In formal standards terms, TS 18661-3 builds on TS 18661-1 (for binary 
floating point) or 18661-2 (for decimal floating point), both of which add 
the contents of ,  and the numeric conversion functions 
from  to the library features required of freestanding 
implementations.

I don't think that makes any difference to what GCC should do, however.  
Just as stdatomic.h is provided by GCC, though strictly a hosted library 
feature rather than one required of freestanding implementations, and just 
as GCC always requires a C library to provide memcpy / memmove / memset, 
even the case of a freestanding implementation requires a cooperating 
compiler / library pair, and in this case it makes the most sense for 
those library facilities (both C11 ones and newer TS 18661 ones such as 
for float128) to come from an external libc/libm.  At some point I should 
update standards.texi to discuss these variations from the old rule of 
thumb of GCC providing exactly those library facilities required of 
freestanding implementations.  Practically, code expecting compiler 
support for _Float128 to imply library support is just like code expecting 
__STDC_VERSION__ to imply C11 library support; people may write such code, 
but it's not practically portable.

-- 
Joseph S. Myers
jos...@codesourcery.com


New French PO file for 'gcc' (version 7.1.0)

2017-07-31 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the French team of translators.  The file is available at:

http://translationproject.org/latest/gcc/fr.po

(This file, 'gcc-7.1.0.fr.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: [PATCH, rs6000] Fix PR81622

2017-07-31 Thread Bill Schmidt

> On Jul 31, 2017, at 11:27 AM, Jakub Jelinek  wrote:
> 
> On Mon, Jul 31, 2017 at 11:19:26AM -0500, Bill Schmidt wrote:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81622 reports an ICE-on-invalid
>> for the vec_ld and vec_st built-in functions.  This fires when the last
>> argument of the built-in is not a pointer or array type, as is required.
>> We break on this during early expansion of the built-ins into tree code
>> during parsing.  The solution, as with other ill-formed uses of these
>> built-ins, is to avoid the early expansion when the argument has an invalid
>> type, so that normal error handling can kick in later.
>> 
>> (The long-term solution is to move the vec_ld and vec_st built-ins to the
>> gimple folding work that Will Schmidt has been doing, but that hasn't
>> happened yet.)
>> 
>> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.
>> Is this ok for trunk and GCC 7?  I'd like to get it into 7.2 since it
>> is a 7 regression.
> 
> See the patch I've attached in the PR, this isn't sufficient
> (and for the ARRAY_TYPE I wonder if you can ever see there ARRAY_TYPE),
> the function has various other issues, including e.g. ICE on
> vec_cmpne (1, 2) with -mpower9.

Yes, I'll withdraw the patch (but the ARRAY_TYPE thing is necessary as
we've discussed).  I'll step out of your way on this one since you've got it
well in hand.  It would be great to have a fix in for 7.2, though.

Bill

> 
>   Jakub
> 



Re: [PATCH] enhance overflow and truncation detection in strncpy and strncat (PR 81117)

2017-07-31 Thread Martin Sebor

So I think the fixes exposed by the new warning are OK to go in as-is
immediately if you wish to do so.  Minor questions on the actual
improved warnings inline.



Sure, thanks.




-static inline tree
+static tree
 compute_objsize (tree dest, int ostype)
 {

...

+  type = TYPE_MAIN_VARIANT (type);
+
+  if (TREE_CODE (type) == ARRAY_TYPE)
+return TYPE_SIZE_UNIT (type);

So I *think* TYPE_SIZE_UNIT isn't necessarily guaranteed to be a
INTEGER_CST, it could be a non-constant expression for the size.  Are
the callers of compute_objsize prepared to handle that?  Just to be
clear, I'd prefer to return TYPE_SIZE_UNIT even if it's not an
INTEGER_CST, I'm just worried about the callers ability to handle that
correctly.


They should be prepared for it.  If not, it's a bug.  I've added
a few more test cases though I'm not sure the case you're concerned
about actually arises (VLA sizes are represented as gimple calls to
__builtin_alloca_with_align so the code doesn't get this far).




diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d0b9050..e4208d9 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -5170,6 +5170,57 @@ whether to issue a warning.  Similarly to 
@option{-Wstringop-overflow=3} this
 setting of the option may result in warnings for benign code.
 @end table

+@item -Wstringop-truncation
+@opindex Wstringop-overflow
+@opindex Wno-stringop-overflow


Looks like I have a couple of typos up there.  The option name
should be the same in all three entries.  Let me fix that.


+Warn for calls to bounded string manipulation functions such as @code{strncat},
+@code{strncpy}, and @code{stpncpy} that may either truncate the copied string
+or leave the destination unchanged.
+
+In the following example, the call to @code{strncat} specifies the length
+of the source string as the bound.  If the destination contains a non-empty
+string the copy of the source will be truncated.  Therefore, the call is
+diagnosed.  To avoid the warning use @code{strlen (d) - 4)} as the bound;
+
+@smallexample
+void append (char *d)
+@{
+  strncat (d, ".txt", 4);
+@}
+@end smallexample

Sorry.  I don't follow this particular example.  Where's the truncation
when strlen (SRC) == N for strncat?  In that case aren't we going to
append SRC without the nul terminator, then nul terminate the result?
Isn't that the same as appending SRC with its nul terminator?  Am I
missing something here?


You're right that there is no truncation and the effect is
the same but only in the unlikely case when the destination
is empty.  Otherwise the result is truncated.

(Although well defined, calling strncat with an empty destination
and with the bound as big as the source is a misuse of the API.
The expected way to copy a string is to call one of the copy
functions and the warning relies on this as the basis for the
diagnostic.)



Also your advice for avoiding the warning seems wrong.  Isn't it going
to produce a huge value if strlen (d) < 4?


Right, that advice is wrong.  I had fixed this in my local tree
before posting the patch but must not have updated the patch.
Let me refresh the patch.


 @item -Wsizeof-array-argument
 @opindex Wsizeof-array-argument
diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index d94dc9c..5d021f9 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -1856,21 +1893,68 @@ gimple_fold_builtin_strncat (gimple_stmt_iterator *gsi)
   return true;
 }

+  if (!nowarn && cmpsrc <= 0)
+   {
+ /* To avoid certain truncation the specified bound should also
+not be equal to (or less than) the length of the source.  */
+ location_t loc = gimple_location (stmt);
+ if (warning_at (loc, OPT_Wstringop_truncation,
+ cmpsrc == 0
+ ? G_("%qD specified bound %E equals source length")
+ : G_("%qD specified bound %E is less than source "
+  "length %u"),
+ gimple_call_fndecl (stmt), len, srclen))
+   gimple_set_no_warning (stmt, true);
+   }

So I think this corresponds to the example above I asked about.  Where's
the truncation when the bound is equal to the source length?


Yes.  Hopefully the above answers the question.


NIT: s/potentiall/potentially/



Sure. Fixed along with the other typo above.




@@ -2512,6 +2651,19 @@ strlen_optimize_stmt (gimple_stmt_iterator *gsi)
  case BUILT_IN_STPCPY_CHK_CHKP:
handle_builtin_strcpy (DECL_FUNCTION_CODE (callee), gsi);
break;
+
+ case BUILT_IN_STRNCAT:
+ case BUILT_IN_STRNCAT_CHK:
+   check_builtin_strncat (DECL_FUNCTION_CODE (callee), gsi);
+   break;
+
+ case BUILT_IN_STPNCPY:
+ case BUILT_IN_STPNCPY_CHK:
+ case BUILT_IN_STRNCPY:
+ case BUILT_IN_STRNCPY_CHK:
+   check_builtin_stxncpy (DECL_FUNCTION_CODE (callee), gsi);
+   break;
+

So we've got calls to check the 

[PATCH, Fortran] Support for legacy %FILL fields in STRUCTUREs

2017-07-31 Thread Fritz Reese
All,

Attached is a patch which extends the DEC STRUCTURE support to include
special fields named '%FILL'. These fields generate anonymous
inaccessible components, commonly used in DEC FORTRAN to control
alignment of fields within STRUCTUREs. The patch is fairly
straightforward. The match is performed during parsing in decl.c
(variable_decl) by generating a component with an internal anonymous
name which is an invalid Fortran identifier to replace any components
specified as '%FILL'.

Regtests on x86_64-redhat-linux. OK for trunk?

---
Fritz Reese


>From a94d1aefe608447ff0de469ca90cbcc4942f6168 Mon Sep 17 00:00:00 2001
From: Fritz Reese 
Date: Mon, 31 Jul 2017 14:18:25 -0400
Subject: [PATCH] Support for legacy %FILL field in STRUCTUREs.

gcc/fortran/
* decl.c (attr_seen): New static variable.
* decl.c (variable_decl): Match %FILL in STRUCTURE body.
* gfortran.texi: Update documentation.

gcc/testsuite/gfortran.dg/
* dec_structure_18.f90, dec_structure_19.f90, dec_structure_20.f90,
dec_structure_21.f90: New.
---
From a94d1aefe608447ff0de469ca90cbcc4942f6168 Mon Sep 17 00:00:00 2001
From: Fritz Reese 
Date: Mon, 31 Jul 2017 14:18:25 -0400
Subject: [PATCH] Support for legacy %FILL field in STRUCTUREs.

gcc/fortran/
	* decl.c (attr_seen): New static variable.
	* decl.c (variable_decl): Match %FILL in STRUCTURE body.
	* gfortran.texi: Update documentation.

gcc/testsuite/gfortran.dg/
	* dec_structure_18.f90, dec_structure_19.f90, dec_structure_20.f90,
	dec_structure_21.f90: New.
---
 gcc/fortran/decl.c | 56 +-
 gcc/fortran/gfortran.texi  | 14 +++
 gcc/testsuite/gfortran.dg/dec_structure_18.f90 | 38 +
 gcc/testsuite/gfortran.dg/dec_structure_19.f90 | 38 +
 gcc/testsuite/gfortran.dg/dec_structure_20.f90 | 18 +
 gcc/testsuite/gfortran.dg/dec_structure_21.f90 | 10 +
 6 files changed, 172 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/dec_structure_18.f90
 create mode 100644 gcc/testsuite/gfortran.dg/dec_structure_19.f90
 create mode 100644 gcc/testsuite/gfortran.dg/dec_structure_20.f90
 create mode 100644 gcc/testsuite/gfortran.dg/dec_structure_21.f90

diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
index bd310703557..0c56b4010bc 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -54,6 +54,7 @@ static gfc_typespec current_ts;
 static symbol_attribute current_attr;
 static gfc_array_spec *current_as;
 static int colon_seen;
+static int attr_seen;
 
 /* The current binding label (if any).  */
 static const char* curr_binding_label;
@@ -2140,6 +2141,7 @@ static match
 variable_decl (int elem)
 {
   char name[GFC_MAX_SYMBOL_LEN + 1];
+  static unsigned int fill_id = 0;
   gfc_expr *initializer, *char_len;
   gfc_array_spec *as;
   gfc_array_spec *cp_as; /* Extra copy for Cray Pointees.  */
@@ -2157,9 +2159,47 @@ variable_decl (int elem)
   /* When we get here, we've just matched a list of attributes and
  maybe a type and a double colon.  The next thing we expect to see
  is the name of the symbol.  */
-  m = gfc_match_name (name);
+
+  /* If we are parsing a structure with legacy support, we allow the symbol
+ name to be '%FILL' which gives it an anonymous (inaccessible) name.  */
+  m = MATCH_NO;
+  gfc_gobble_whitespace ();
+  if (gfc_peek_ascii_char () == '%')
+{
+  gfc_next_ascii_char ();
+  m = gfc_match ("fill");
+}
+
   if (m != MATCH_YES)
-goto cleanup;
+{
+  m = gfc_match_name (name);
+  if (m != MATCH_YES)
+	goto cleanup;
+}
+
+  else
+{
+  m = MATCH_ERROR;
+  if (gfc_current_state () != COMP_STRUCTURE)
+	{
+	  if (flag_dec_structure)
+	gfc_error ("%s not allowed outside STRUCTURE at %C", "%FILL");
+	  else
+	gfc_error ("%s at %C is a DEC extension, enable with "
+		   "%<-fdec-structure%>", "%FILL");
+	  goto cleanup;
+	}
+
+  if (attr_seen)
+	{
+	  gfc_error ("%s entity cannot have attributes at %C", "%FILL");
+	  goto cleanup;
+	}
+
+  /* %FILL components are given invalid fortran names.  */
+  snprintf (name, GFC_MAX_SYMBOL_LEN + 1, "%%FILL%u", fill_id++);
+  m = MATCH_YES;
+}
 
   var_locus = gfc_current_locus;
 
@@ -2260,6 +2300,14 @@ variable_decl (int elem)
 	}
 }
 
+  /* %FILL components may not have initializers.  */
+  if (strncmp (name, "%FILL", 5) == 0 && gfc_match_eos () != MATCH_YES)
+{
+  gfc_error ("%s entity cannot have an initializer at %C", "%FILL");
+  m = MATCH_ERROR;
+  goto cleanup;
+}
+
   /*  If this symbol has already shown up in a Cray Pointer declaration,
   and this is not a component declaration,
   then we want to set the type & bail out.  */
@@ -3858,6 +3906,7 @@ match_attr_spec (void)
 
   current_as = NULL;
   colon_seen = 0;
+  attr_seen = 0;
 
   /* See if we get all of the keywords up to the 

[PATCH] Update myself in MAINTAINERS

2017-07-31 Thread Richard Henderson
A change of jobs requires a change of email.  And I'm going to finally
admit that I've not been active gcc development in the last year or two.
I am far enough behind that I cannot globally review changes.  That said,
I do not plan to abandon Alpha just yet.


r~
---
 MAINTAINERS | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 18bfbbed619..7effec1287f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -22,7 +22,6 @@ Maintainers
 
 Richard Earnshaw   
 Richard Biener 
-Richard Henderson  
 Jakub Jelinek  
 Richard Kenner 
 Jeff Law   
@@ -42,7 +41,7 @@ their own patches from other maintainers or reviewers.
 
 aarch64 port   Marcus Shawcroft
 aarch64 port   Richard Earnshaw
-alpha port Richard Henderson   
+alpha port Richard Henderson   
 arc port   Joern Rennecke  
 arm port   Nick Clifton
 arm port   Richard Earnshaw
@@ -101,7 +100,6 @@ s390 port   Andreas Krebbel 

 score port Chen Liqin  
 sh portAlexandre Oliva 
 sh portOleg Endo   
-sparc port Richard Henderson   
 sparc port David S. Miller 
 sparc port Eric Botcazou   
 spu port   Trevor Smigiel  

@@ -141,7 +139,6 @@ cygwin, mingw-w64   Jonathan Yong   
<10wa...@gmail.com>
Language Front Ends Maintainers
 
 C front end/ISO C99Joseph Myers
-C front end/ISO C99Richard Henderson   
 Ada front end  Arnaud Charlet  
 Ada front end  Eric Botcazou   
 BRIG (HSAIL) front end Pekka Jääskeläinen  

@@ -162,7 +159,6 @@ fp-bit  Ian Lance Taylor

 libdecnumber   Ben Elliston
 libgcc Ian Lance Taylor
 libgo  Ian Lance Taylor
-libgompRichard Henderson   
 libgompJakub Jelinek   
 libiberty  DJ Delorie  
 libiberty  Ian Lance Taylor
@@ -417,6 +413,7 @@ Mark Heffernan  

 George Helffrich   
 Daniel Hellstrom   
 Fergus Henderson   
+Richard Henderson  
 Stuart Henderson   
 Matthew Hiller 
 Kazu Hirata
-- 
2.13.3



Re: [1/2] PR 78736: New warning -Wenum-conversion

2017-07-31 Thread Prathamesh Kulkarni
On 11 July 2017 at 17:59, Prathamesh Kulkarni
 wrote:
> On 13 June 2017 at 01:47, Joseph Myers  wrote:
>> This is OK with one fix:
>>
>>> +C ObjC Var(warn_enum_conversion) Init(0) Warning LangEnabledBy(C Objc,Wall)
>>
>> I believe the LangEnabledBy arguments are case-sensitive, so you need to
>> have ObjC not Objc there for it to work correctly.  (*.opt parsing isn't
>> very good at detecting typos and giving errors rather than silently
>> ignoring things.)
> Hi,
> Sorry for the late response, I was on a vacation.
> The attached patch is rebased and bootstrap+tested on 
> x86_64-unknown-linux-gnu.
> I have modified it slightly to not warn for enums with different names
> but having same value ranges.
> For eg:
> enum e1 { e1_1, e1_2 };
> enum e2 { e2_1, e2_2 };
>
> enum e1 x = e2_1;
> With this version, there would be no warning for the above assignment
> since both e1 and e2 have
> same value ranges.  Is that OK ?
>
> The patch has following fallouts in the testsuite:
>
> a) libgomp:
> I initially assume it was a false positive because I thought enum
> gomp_schedule_type
> and enum omp_sched_t have same value-ranges but it looks like omp_sched_t
> has range [1, 4] while gomp_schedule_type has range [0, 4] with one
> extra element.
> Is the warning then correct for this case ?
>
> b) libgfortran:
> i) Implicit conversion from unit_mode to file_mode
> ii) Implicit conversion from unit_sign_s to unit_sign.
> I suppose the warning is OK for these cases since unit_mode, file_mode
> have different value-ranges and similarly for unit_sign_s, unit_sign ?
>
> Also I tested the warning by compiling the kernel for x86_64 with
> allmodconifg (attached), and there
> have been quite few instances of the warning (attached). I have been
> through few cases which I don't think are false positives
> but I wonder then whether we should relegate the warning to Wextra instead ?
ping: https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00514.html

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
>>
>> --
>> Joseph S. Myers
>> jos...@codesourcery.com


Re: [PATCH 00/19] cleanup of memory stats prototypes

2017-07-31 Thread Jeff Law
On 07/27/2017 02:43 AM, Richard Biener wrote:
> On Thu, Jul 27, 2017 at 10:30 AM,   wrote:
>> From: Trevor Saunders 
>>
>> The preC++ way of passing information about the call site of a function was 
>> to
>> use a macro that passed __file__, __LINE__, and __FUNCTION__ to a function 
>> with
>> the same name with _stat appended to it.  The way this is now done with C++ 
>> is
>> to have arguments where the default value is __LINE__, __FILE__, and
>> __FUNCTION__ in the caller.  This has the significant advantage that if you
>> look for "^function (" you find the correct function, where in the C way of
>> doing things you need to realize its a macro and check the definition of the
>> macro to see what to look for next.  So this removes a layer of indirection,
>> and makes things somewhat more consistant in using the C++ way of doing 
>> things.
>>
>> patches independently bootstrapped and regtested on ppc64le-linux-gnu.  I
>> successfully ran make all-gcc with --enable-gather-detailed-mem-stats, but
>> couldn't complete a bootstrap before the series was applied, because the
>> ddrs_table in tree-loop-distribution.c causes memory statistics gathering to 
>> crash before the series as well as after it.  ok?
> 
> Thanks!  This was on my list of things todo...
> 
> The series is ok.
Just wanted to relay a thanks from me too.  I always find it annoying to
remember which functions are _stat thingies when debugging.  No more!

Jeff


Re: [PATCH] Make mempcpy more optimal (PR middle-end/70140).

2017-07-31 Thread Jeff Law
On 07/20/2017 12:59 AM, Martin Liška wrote:
> Hello.
> 
> Following patch does sharing of expansion for mem{p,}cpy and also strpcy 
> (with a known constant as source)
> so that we use same type of expansion (direct insns emission, direct emission 
> with a loop instruction and
> library call). As mentioned in the PR, glibc does not provide an optimized 
> version for majority of targets.
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
> Ready to be installed?
> Martin
> 
> gcc/testsuite/ChangeLog:
> 
> 2017-07-17  Martin Liska  
> 
>   PR middle-end/70140
>   * gcc.dg/string-opt-1.c: Adjust test-case to scan for memcpy.
> 
> gcc/ChangeLog:
> 
> 2017-07-17  Martin Liska  
> 
>   PR middle-end/70140
>   * builtins.c (expand_builtin_memcpy_args): Remove.
>   (expand_builtin_memcpy): Call newly added function
>   expand_builtin_memory_copy_args.
>   (expand_builtin_memcpy_with_bounds): Likewise.
>   (expand_builtin_mempcpy): Remove last argument.
>   (expand_builtin_mempcpy_with_bounds): Likewise.
>   (expand_builtin_memory_copy_args): New function created from
>   expand_builtin_mempcpy_args with small modifications.
>   (expand_builtin_mempcpy_args): Remove.
>   (expand_builtin_stpcpy): Remove unused argument.
>   (expand_builtin): Likewise.
>   (expand_builtin_with_bounds): Likewise.


> 
> 
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index 2deef725620..016f68d2cb6 100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -3493,94 +3419,103 @@ expand_builtin_mempcpy_with_bounds (tree exp, rtx 
> target, machine_mode mode)
>  }
>  }
>  
> -/* Helper function to do the actual work for expand_builtin_mempcpy.  The
> -   arguments to the builtin_mempcpy call DEST, SRC, and LEN are broken out
> -   so that this can also be called without constructing an actual CALL_EXPR.
> -   The other arguments and return value are the same as for
> -   expand_builtin_mempcpy.  */
> +/* Helper function to do the actual work for expand of memory copy family
> +   functions (memcpy, mempcpy, stpcpy).  Expansing should assign LEN bytes
"Expansing"?

OK with the nit fixed.


Jeff




Re: [PATCH 6/6] qsort comparator consistency checking

2017-07-31 Thread Alexander Monakov
On Mon, 31 Jul 2017, Jeff Law wrote:
> I  must have missed something.  Can't you just define
> 
> qsort (BASE, NMEMB, SIZE, COMPARE) into
> 
> qsort_chk (BASE, NMEMB, SIZE, COMPARE)
> 
> That shouldn't affect the qsort from vec?  Right?  Or am I missing something

If you do

  #define qsort(base, n, sz, cmp) qsort_chk (base, n, sz, cmp)

then all invocations of vec::qsort, i.e.

  myvec.qsort (mycmp);

will yield a preprocessing error due to wrong number of arguments supplied
to the qsort macro (one instead of four).

Alexander


Re: [RFC] propagate malloc attribute in ipa-pure-const pass

2017-07-31 Thread Prathamesh Kulkarni
On 23 May 2017 at 19:10, Prathamesh Kulkarni
 wrote:
> On 19 May 2017 at 19:02, Jan Hubicka  wrote:
>>>
>>> * LTO and memory management
>>> This is a general question about LTO and memory management.
>>> IIUC the following sequence takes place during normal LTO:
>>> LGEN: generate_summary, write_summary
>>> WPA: read_summary, execute ipa passes, write_opt_summary
>>>
>>> So I assumed it was OK in LGEN to allocate return_callees_map in
>>> generate_summary and free it in write_summary and during WPA, allocate
>>> return_callees_map in read_summary and free it after execute (since
>>> write_opt_summary does not require return_callees_map).
>>>
>>> However with fat LTO, it seems the sequence changes for LGEN with
>>> execute phase takes place after write_summary. However since
>>> return_callees_map is freed in pure_const_write_summary and
>>> propagate_malloc() accesses it in execute stage, it results in
>>> segmentation fault.
>>>
>>> To work around this, I am using the following hack in 
>>> pure_const_write_summary:
>>> // FIXME: Do not free if -ffat-lto-objects is enabled.
>>> if (!global_options.x_flag_fat_lto_objects)
>>>   free_return_callees_map ();
>>> Is there a better approach for handling this ?
>>
>> I think most passes just do not free summaries with -flto.  We probably want
>> to fix it to make it possible to compile multiple units i.e. from plugin by
>> adding release_summaries method...
>> So I would say it is OK to do the same as others do and leak it with -flto.
>>> diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
>>> index e457166ea39..724c26e03f6 100644
>>> --- a/gcc/ipa-pure-const.c
>>> +++ b/gcc/ipa-pure-const.c
>>> @@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "tree-scalar-evolution.h"
>>>  #include "intl.h"
>>>  #include "opts.h"
>>> +#include "ssa.h"
>>>
>>>  /* Lattice values for const and pure functions.  Everything starts out
>>> being const, then may drop to pure and then neither depending on
>>> @@ -69,6 +70,15 @@ enum pure_const_state_e
>>>
>>>  const char *pure_const_names[3] = {"const", "pure", "neither"};
>>>
>>> +enum malloc_state_e
>>> +{
>>> +  PURE_CONST_MALLOC_TOP,
>>> +  PURE_CONST_MALLOC,
>>> +  PURE_CONST_MALLOC_BOTTOM
>>> +};
>>
>> It took me a while to work out what PURE_CONST means here :)
>> I would just call it something like STATE_MALLOC_TOP... or so.
>> ipa_pure_const is outdated name from the time pass was doing only
>> those two.
>>> @@ -109,6 +121,10 @@ typedef struct funct_state_d * funct_state;
>>>
>>>  static vec funct_state_vec;
>>>
>>> +/* A map from node to subset of callees. The subset contains those callees
>>> + * whose return-value is returned by the node. */
>>> +static hash_map< cgraph_node *, vec* > *return_callees_map;
>>> +
>>
>> Hehe, a special case of return jump function.  We ought to support those 
>> more generally.
>> How do you keep it up to date over callgraph changes?
>>> @@ -921,6 +1055,23 @@ end:
>>>if (TREE_NOTHROW (decl))
>>>  l->can_throw = false;
>>>
>>> +  if (ipa)
>>> +{
>>> +  vec v = vNULL;
>>> +  l->malloc_state = PURE_CONST_MALLOC_BOTTOM;
>>> +  if (DECL_IS_MALLOC (decl))
>>> + l->malloc_state = PURE_CONST_MALLOC;
>>> +  else if (malloc_candidate_p (DECL_STRUCT_FUNCTION (decl), v))
>>> + {
>>> +   l->malloc_state = PURE_CONST_MALLOC_TOP;
>>> +   vec *callees_p = new vec (vNULL);
>>> +   for (unsigned i = 0; i < v.length (); ++i)
>>> + callees_p->safe_push (v[i]);
>>> +   return_callees_map->put (fn, callees_p);
>>> + }
>>> +  v.release ();
>>> +}
>>> +
>>
>> I would do non-ipa variant, too.  I think most attributes can be detected 
>> that way
>> as well.
>>
>> The patch generally makes sense to me.  It would be nice to make it easier 
>> to write such
>> a basic propagators across callgraph (perhaps adding a template doing the 
>> basic
>> propagation logic). Also I think you need to solve the problem with keeping 
>> your
>> summaries up to date across callgraph node removal and duplications.
> Thanks for the suggestions, I will try to address them in a follow-up patch.
> IIUC, I would need to modify ipa-pure-const cgraph hooks -
> add_new_function, remove_node_data, duplicate_node_data
> to keep return_callees_map up-to-date across callgraph node insertions
> and removal ?
>
> Also, if instead of having a separate data-structure like return_callees_map,
> should we rather have a flag within cgraph_edge, which marks that the
> caller may return the value of the callee ?
Hi,
Sorry for the very late response. I have attached an updated version
of the prototype patch,
which adds a non-ipa variant, and keeps return_callees_map up-to-date
across callgraph
node insertions and removal. For the non-ipa variant,
malloc_candidate_p() additionally checks
that all the "return callees" have DECL_IS_MALLOC set to true.
Bootstrapped+tested and LTO bootstrapped+tested 

Re: [PATCH 6/6] qsort comparator consistency checking

2017-07-31 Thread Jeff Law
On 07/15/2017 02:47 PM, Alexander Monakov wrote:
> This is the updated qsort comparator verifier.
> 
> Since we have vec::qsort(cmp), the patch uses the macro argument counting
> trick to redirect only the four-argument invocations of qsort to qsort_chk.
> I realize that won't win much sympathies, but a patch doing mass-renaming
> of qsort in the whole GCC codebase would win even fewer, I suspect.
> 
> The macro ENABLE_FULL_QSORT_CHECKING could be used to enable full O(n^2)
> checking, but there's no accompanying configure.ac change.  I'm not sure
> that would be useful in practice, so I'd rather turn it into #if 0.
> 
> The suppression in genmodes was needed because it's not linked with vec.o.
> 
>   * genmodes.c (calc_wider_mode): Suppress qsort macro.
> * system.h [CHECKING_P] (qsort): Redirect to qsort_chk.
> (qsort_nochk): New macro.
> (qsort_chk): Declare.
> (qsort_disable_checking): Declare.
> * vec.c (qsort_chk_error): New static function.
> (qsort_disable_checking): Define.
> (qsort_chk): New function.
I  must have missed something.  Can't you just define

qsort (BASE, NMEMB, SIZE, COMPARE) into

qsort_chk (BASE, NMEMB, SIZE, COMPARE)

That shouldn't affect the qsort from vec?  Right?  Or am I missing something

Jeff


Re: [PATCH 1/2] x86,s390: add compiler memory barriers when expanding atomic_thread_fence (PR 80640)

2017-07-31 Thread Alexander Monakov
On Mon, 31 Jul 2017, Jeff Law wrote:
> > Please consider that expand_mem_thread_fence is used to place fences around
> > seq-cst atomic loads when the backend doesn't provide a direct 
> > pattern.
> > With compiler barriers on both sides of the machine barrier, the generated
> > sequence for a seq-cst atomic load will be 7 insns:
> > 
> >   asm volatile ("":::"memory");
> >   machine_seq_cst_fence ();
> >   asm volatile ("":::"memory");
> >   dst = mem[src];
> >   asm volatile ("":::"memory");
> >   machine_seq_cst_fence ();
> >   asm volatile ("":::"memory");
> > 
> > I can easily imagine people looking at RTL dumps with this overkill fencing
> > being unhappy about this.
> But the extra fences aren't actually going to impact anything except
> perhaps an unmeasurable compile-time hit.  ie, it may look bad, but I'd
> have a hard time believing it matters in practice.

I agree it doesn't matter that much from compile-time/memory standpoint.
I meant it matters from the standpoint of a person working with the RTL dumps,
i.e. having to work through all that, especially if they need to work with
non-slim dumps.

> > I'd be more happy with detecting empty expansion via get_last_insn ().
> That seems like an unnecessary complication to me.

I think it's quite simple by GCC standards:

  {
rtx_insn last = get_last_insn ();
emit_insn (targetm.gen_mem_thread_fence (GEN_INT (model)));
if (last == get_last_insn () && !is_mm_relaxed (model))
  expand_asm_memory_barrier ();
  }

> I'd prefer instead to just emit the necessary fencing in the generic code
> and update the MD docs so that maintainers know they don't have to emit
> the RTL fences themselves.

I agree the docs need an update, but hopefully not in this way.  The legacy
__sync_synchronize barrier has always been required to be a compiler barrier
when expanded to RTL, and it's quite natural to use the same RTL structure
for new atomic fences as the old memory_barrier expansion.  The only problem
in current practice seen so far is with empty expansion for new patterns.

Thanks.
Alexander


Re: [PATCH] Add -std=c++2a

2017-07-31 Thread Joseph Myers
On Thu, 20 Jul 2017, Andrew Sutton wrote:

> This adds a new C++ dialect, enabled by -std=c++2a.

I don't see documentation here.  I'd expect additions to invoke.texi and 
standards.texi discussing the new option.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Fix libquadmath regression (bootstrap failure) on FreeBSD

2017-07-31 Thread Joseph Myers
On Wed, 19 Jul 2017, Jakub Jelinek wrote:

> >  And can you please push this
> > to glibc as well (which I do not have access to)?
> 
> glibc uses u_int32_t in all the places of *powl.c where libquadmath
> uses uint32_t, and various other files, so I guess it is intentional.

I'd consider it a relic of old fdlibm code, i.e. code (all over libm) that 
hasn't yet been cleaned up to use uintN_t instead of u_intN_t internally 
(but which it would be desirable to clean up like that).

I think it would be desirable to be able to do libquadmath imports via a 
script that automatically applies substitutions to turn glibc code into 
code that can be used in libquadmath.  Right now required substitutions 
would include s/u_int/uint/, and converting L() to apply appropriate 
suffixes, and converting function calls to the *q form, while some other 
substitutions such as s/_Float128/__float128/ would not actually be needed 
for the code to build but would make the results of substitution closer to 
the present libquadmath contents, and so easier to compare.  If then the 
number of substitutions needed can be reduced over time (e.g. defining L() 
as a macro in quadmath-imp.h rather than substituting it in the sources), 
that would be good.  Moving glibc code (across libm, not just ldbl-128) to 
use uintN_t would be in that category.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 5/6] haifa-sched.c: give up qsort checking when autoprefetch heuristic is in use

2017-07-31 Thread Jeff Law
On 07/15/2017 02:47 PM, Alexander Monakov wrote:
> The autopref_rank_for_schedule sub-comparator and its subroutine
> autopref_rank_data lack transitivity.  Skip checking if they are in use.
> 
> This heuristic is disabled by default everywhere except ARM and AArch64,
> so on other targets this does not suppress  checking all the time.
> 
>   * haifa-sched.c (ready_sort_real): Disable qsort checking if
> autoprefetch heuristic will be used.
> (autopref_rank_for_schedule): Add FIXME.
This is also OK once the final patch for consistency checking is approved.

jeff


Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts

2017-07-31 Thread Bill Schmidt

> On Jul 31, 2017, at 12:40 PM, Michael Meissner  
> wrote:
> 
> On Sun, Jul 30, 2017 at 09:00:58AM -0500, Bill Schmidt wrote:
 (define_insn "vsx_concat_"
 -  [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=,we")
 +  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we")
(vec_concat:VSX_D
 -   (match_operand: 1 "gpc_reg_operand" ",b")
 -   (match_operand: 2 "gpc_reg_operand" ",b")))]
 +   (match_operand: 1 "gpc_reg_operand" "wa,b")
 +   (match_operand: 2 "gpc_reg_operand" "wa,b")))]
  "VECTOR_MEM_VSX_P (mode)"
 {
  if (which_alternative == 0)
 -return (BYTES_BIG_ENDIAN
 -  ? "xxpermdi %x0,%x1,%x2,0"
 -  : "xxpermdi %x0,%x2,%x1,0");
 +return rs6000_emit_xxpermdi (operands, NULL_RTX, NULL_RTX);
 
  else if (which_alternative == 1)
 -return (BYTES_BIG_ENDIAN
 +return (VECTOR_ELT_ORDER_BIG
? "mtvsrdd %x0,%1,%2"
: "mtvsrdd %x0,%2,%1");
>>> 
>>> This one could be
>>> 
>>> {
>>> if (!BYTES_BIG_ENDIAN)
>> 
>> !VECTOR_ELT_ORDER_BIG (to accommodate -maltivec=be).  (We have some general 
>> bitrot associated with -maltivec=be that needs to be addressed, or we need 
>> to give up on it altogether.  Still of two minds about this.)
>> 
>> Bill
> 
> In this case, I believe I tested -maltivec=be, and BYTES_BIG_ENDIAN is correct
> (I originally had it using VECTOR_ELT_ORDER_BIG and got failures).  But I need
> to look at it again.

Hi Mike,

You misunderstand me, I think you had it right (you did move to 
VECTOR_ELT_ORDER_BIG here)
but I just wanted to clarify that Segher's suggestion would also need to use 
VECTOR_ELT_ORDER_BIG.

Thanks,
Bill

> 
> -- 
> Michael Meissner, IBM
> IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
> email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts

2017-07-31 Thread Michael Meissner
On Sun, Jul 30, 2017 at 09:00:58AM -0500, Bill Schmidt wrote:
> >> (define_insn "vsx_concat_"
> >> -  [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=,we")
> >> +  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we")
> >>(vec_concat:VSX_D
> >> -   (match_operand: 1 "gpc_reg_operand" ",b")
> >> -   (match_operand: 2 "gpc_reg_operand" ",b")))]
> >> +   (match_operand: 1 "gpc_reg_operand" "wa,b")
> >> +   (match_operand: 2 "gpc_reg_operand" "wa,b")))]
> >>   "VECTOR_MEM_VSX_P (mode)"
> >> {
> >>   if (which_alternative == 0)
> >> -return (BYTES_BIG_ENDIAN
> >> -  ? "xxpermdi %x0,%x1,%x2,0"
> >> -  : "xxpermdi %x0,%x2,%x1,0");
> >> +return rs6000_emit_xxpermdi (operands, NULL_RTX, NULL_RTX);
> >> 
> >>   else if (which_alternative == 1)
> >> -return (BYTES_BIG_ENDIAN
> >> +return (VECTOR_ELT_ORDER_BIG
> >>? "mtvsrdd %x0,%1,%2"
> >>: "mtvsrdd %x0,%2,%1");
> > 
> > This one could be
> > 
> > {
> >  if (!BYTES_BIG_ENDIAN)
> 
> !VECTOR_ELT_ORDER_BIG (to accommodate -maltivec=be).  (We have some general 
> bitrot associated with -maltivec=be that needs to be addressed, or we need to 
> give up on it altogether.  Still of two minds about this.)
> 
> Bill

In this case, I believe I tested -maltivec=be, and BYTES_BIG_ENDIAN is correct
(I originally had it using VECTOR_ELT_ORDER_BIG and got failures).  But I need
to look at it again.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH 4/6] lra-assigns.c: give up on qsort checking in assign_by_spills

2017-07-31 Thread Jeff Law
On 07/15/2017 02:47 PM, Alexander Monakov wrote:
> The reload_pseudo_compare_func comparator, when used from assign_by_spills,
> can be non-transitive, indicating A < B < C < A if both A and C satisfy
> !bitmap_bit_p (_reload_pseudos, rAC), but B does not.
> 
> This function was originally a proper comparator, and the problematic
> clause was added to fix PR 57878:
> https://gcc.gnu.org/ml/gcc-patches/2013-07/msg00732.html
> 
> That the comparator is invalid implies that that PR, if it still exists,
> can reappear (but probably under more complicated circumstances).
> 
> This looks like a sensitive area, so disabling checking is the only
> obvious approach.
> 
>   * lra-assigns.c (reload_pseudo_compare_func): Add a FIXME.
> (assign_by_spills): Use non-checking qsort.
This is OK once the final comparator consistency checking patch is approved.

jeff


Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts

2017-07-31 Thread Michael Meissner
On Fri, Jul 28, 2017 at 04:08:50PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Jul 27, 2017 at 07:21:14PM -0400, Michael Meissner wrote:
> > This patches optimizes the PowerPC vector set operation for 64-bit doubles 
> > and
> > longs where the elements in the vector set may have been extracted from 
> > another
> > vector (PR target/81593):
> 
> > * config/rs6000/rs6000.c (rs6000_emit_xxpermdi): New function to
> > emit XXPERMDI accessing either double word in either vector
> > register inputs.
> 
> "emit" is not a good name for this: that is generally used for something
> that does emit_insn, i.e. put an insn in the instruction stream.  This
> function returns a string a define_insn can return.  For the rl* insns
> I called the similar functions rs6000_insn_for_*, maybe something like
> that is better here?

Yeah, I should have used rs6000_output_xxpermdi or similar (or output_xxpermdi,
etc.), which is what the other functions used.

> > +/* Emit a XXPERMDI instruction that can extract from either double word of 
> > the
> > +   two arguments.  ELEMENT1 and ELEMENT2 are either NULL or they are 0/1 
> > giving
> > +   which double word to be used for the operand.  */
> > +
> > +const char *
> > +rs6000_emit_xxpermdi (rtx operands[], rtx element1, rtx element2)
> > +{
> > +  int op1_dword = (!element1) ? 0 : INTVAL (element1);
> > +  int op2_dword = (!element2) ? 0 : INTVAL (element2);
> > +
> > +  gcc_assert (IN_RANGE (op1_dword | op2_dword, 0, 1));
> > +
> > +  if (BYTES_BIG_ENDIAN)
> > +{
> > +  operands[3] = GEN_INT (2*op1_dword + op2_dword);
> > +  return "xxpermdi %x0,%x1,%x2,%3";
> > +}
> > +  else
> > +{
> > +  if (element1)
> > +   op1_dword = 1 - op1_dword;
> > +
> > +  if (element2)
> > +   op2_dword = 1 - op2_dword;
> > +
> > +  operands[3] = GEN_INT (op1_dword + 2*op2_dword);
> > +  return "xxpermdi %x0,%x2,%x1,%3";
> > +}
> > +}
> 
> I think calling this with the rtx elementN args makes this only more
> complicated (the function comment doesn't say what they are or what
> NULL means, btw).

Ok, let me think on it.

> 
> >  (define_insn "vsx_concat_"
> > -  [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=,we")
> > +  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we")
> > (vec_concat:VSX_D
> > -(match_operand: 1 "gpc_reg_operand" ",b")
> > -(match_operand: 2 "gpc_reg_operand" ",b")))]
> > +(match_operand: 1 "gpc_reg_operand" "wa,b")
> > +(match_operand: 2 "gpc_reg_operand" "wa,b")))]
> >"VECTOR_MEM_VSX_P (mode)"
> >  {
> >if (which_alternative == 0)
> > -return (BYTES_BIG_ENDIAN
> > -   ? "xxpermdi %x0,%x1,%x2,0"
> > -   : "xxpermdi %x0,%x2,%x1,0");
> > +return rs6000_emit_xxpermdi (operands, NULL_RTX, NULL_RTX);
> >  
> >else if (which_alternative == 1)
> > -return (BYTES_BIG_ENDIAN
> > +return (VECTOR_ELT_ORDER_BIG
> > ? "mtvsrdd %x0,%1,%2"
> > : "mtvsrdd %x0,%2,%1");
> 
> This one could be
> 
> {
>   if (!BYTES_BIG_ENDIAN)
> std::swap (operands[1], operands[2]);
> 
>   switch (which_alternative)
> {
> case 0:
>   return "xxpermdi %x0,%x1,%x2,0";
> case 1:
>   return "mtvsrdd %x0,%1,%2";
> default:
>   gcc_unreachable ();
> }
> }

> (Could/should we use xxmrghd instead?  Do all supported assemblers know
> that extended mnemonic, is it actually more readable?)

For me no, xxpermdi is clearer.  But if you want xxmrghd, I can do it.

> > --- gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c
> > (svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c)
> >  (revision 0)
> > +++ gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c
> > (.../gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c)  (revision 250640)
> > @@ -0,0 +1,15 @@
> > +/* { dg-do compile { target { powerpc*-*-* } } } */
> > +/* { dg-skip-if "" { powerpc*-*-darwin* } } */
> > +/* { dg-require-effective-target powerpc_vsx_ok } */
> > +/* { dg-options "-O2 -mvsx" } */
> > +
> > +vector double
> > +test_vpasted (vector double high, vector double low)
> > +{
> > +  vector double res;
> > +  res[1] = high[1];
> > +  res[0] = low[0];
> > +  return res;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */
> 
> In this and the other testcase, should you test no other insns at all
> are generated?

It is kind of hard to test a negative, without trying to guess what possible
instructions could be generated.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH 1/2] x86,s390: add compiler memory barriers when expanding atomic_thread_fence (PR 80640)

2017-07-31 Thread Jeff Law
On 07/31/2017 11:02 AM, Alexander Monakov wrote:
> On Mon, 31 Jul 2017, Jeff Law wrote:
 In the middle end patch, do we need a barrier before the fence as well?
 The post-fence barrier prevents reordering the fence with anything which
 follows the fence.  But do we have to also prevent reordering the fence
 with prior instructions with any of the memory models?  This isn't my
 area of expertise, so if it's dumb question, don't hesitate to let me
 know :-)
>>>
>>> That depends on how pessimistic we want to be with respect to backend
>>> getting it wrong.  My expectation here is that if a backend emits non-empty
>>> RTL, the produced sequence for the fence itself acts as a compiler memory
>>> barrier.
>> Perhaps. But do we really want to rely on that?  EMitting a scheduling
>> barrier prior to these atomics is virtually free.
> 
> Please consider that expand_mem_thread_fence is used to place fences around
> seq-cst atomic loads when the backend doesn't provide a direct pattern.
> With compiler barriers on both sides of the machine barrier, the generated
> sequence for a seq-cst atomic load will be 7 insns:
> 
>   asm volatile ("":::"memory");
>   machine_seq_cst_fence ();
>   asm volatile ("":::"memory");
>   dst = mem[src];
>   asm volatile ("":::"memory");
>   machine_seq_cst_fence ();
>   asm volatile ("":::"memory");
> 
> I can easily imagine people looking at RTL dumps with this overkill fencing
> being unhappy about this.
But the extra fences aren't actually going to impact anything except
perhaps an unmeasurable compile-time hit.  ie, it may look bad, but I'd
have a hard time believing it matters in practice.

> 
> I'd be more happy with detecting empty expansion via get_last_insn ().
That seems like an unnecessary complication to me.  I'd prefer instead
to just emit the necessary fencing in the generic code and update the MD
docs so that maintainers know they don't have to emit the RTL fences
themselves.

Jeff


Re: [PATCH] enhance overflow and truncation detection in strncpy and strncat (PR 81117)

2017-07-31 Thread Jeff Law
On 07/08/2017 02:45 PM, Martin Sebor wrote:
> PR 81117 asks for improved detection of common misuses(*) of
> strncpy and strncat.  The attached patch is my solution.  It
> consists of three related sets of changes:
> 
> 1) Adds a new option, -Wstringop-truncation, that diagnoses calls
> to strncpy, and stpncpy (and also strncat) that truncate the copy.
> This helps highlight the common but incorrect assumption that
> the first two functions NUL-terminate the copy (see, for example,
> CWE-170)  For strncat, it helps detect cases of inadvertent
> truncation of the source string by passing in a bound that's
> less than or equal to its length.
> 
> 2) Enhances -Wstringon-overflow to diagnose calls of the form
> strncpy(D, S, N) where the bound N depends on a call to strlen(S).
> This misuse is common in legacy code when, often in response to
> the adoption of a secure coding initiative, while replacing uses
> of strcpy with strncpy, the engineer either makes a mistake, or
> doesn't have a good enough understanding of how the function works,
> or does only the bare minimum to satisfy the requirement to avoid
> using strcpy without actually improving anything.
> 
> 3) Enhances -Wsizeof-pointer-memaccess to also warn about uses of
> the functions to copy an array to a destination of an unknown size
> that specify the size of the array as the bound.  Given the
> pervasive [mis]use of strncpy to bound the copy to the size of
> the destination, instances like this suggest a bug: a possible
> buffer overflow due to an excessive bound (see, for example,
> CWE-806).  In cases when the call is safe, it's equivalent to
> the corresponding call to memcpy which is clearer and can be
> more efficient.
> 
> Martin
> 
> PS By coincidence rather than by intent, the strncat warnings
> are a superset of Clang's -Wstrncat-size.  AFAICS, Clang only
> warns when the destination is an array of known size and
> doesn't have a corresponding warning for strncpy.
> 
> [*] Here's some background into these misuses.
> 
> The purpose of the historical strncpy function introduced in V7
> UNIX was to completely fill an array of chars with data, either
> by copying an initial portion of a source string, or by clearing
> it.  I.e., its purpose wasn't to create NUL-terminated strings.
> An example of its use was to fill the directory entry d_name
> array (dirent::d_name) with the name of a file.
> 
> The original purpose of the strncat function, on the other hand,
> was to append a not necessarily NUL-terminated array of chars
> to a string to form a NUL-terminated concatenation of the two.
> An example use case is appending a directory entry (struct
> dirent::d_name) that need not be NUL-terminated, to form
> a pathname which does.
> 
> Largely due to a misunderstanding of the functions' purpose they
> have become commonly used (and misused) to make "safe," bounded
> string copies by safeguarding against accidentally overflowing
> the destination.  This has led to great many bugs and security
> vulnerabilities.
> 
> 
> gcc-81117.diff
> 
> 
> PR c/81117 - Improve buffer overflow checking in strncpy
> 
> gcc/ChangeLog:
> 
>   PR c/81117
>   * builtins.c (compute_objsize): Handle arrays that
>   compute_builtin_object_size likes to fail for.
>   (check_strncpy_sizes): New function.
>   (expand_builtin_strncpy): Call check_strncpy_sizes.
>   * doc/invoke.texi (-Wsizeof-pointer-memaccess): Document enhancement.
>   (-Wstringop-truncation): Document new option.
>   * gimple-fold.c (gimple_fold_builtin_strncpy): Implement
>   -Wstringop-truncation.
>   (gimple_fold_builtin_strncat): Same.
>   * gimple.c (gimple_build_call_from_tree): Set call location.
>   * tree-ssa-strlen.c (strlen_to_stridx): New global variable.
>   (is_strlen_related_p): New function.
>   (check_builtin_strxncpy, check_builtin_strncat): Same.
>   handle_builtin_strlen): Use strlen_to_stridx.
>   (strlen_optimize_stmt): Handle flavors of strncat, strncpy, and
>   stpncpy.
>   Use strlen_to_stridx.
>   (pass_strlen::execute): Release strlen_to_stridx.
> 
> gcc/ada/ChangeLog:
> 
>   PR c/81117
>   * adadecode.c (__gnat_decode): Replace pointless strncpy with
>   memcpy.
>   * argv.c (__gnat_fill_arg): Same.
> 
> gcc/c-family/ChangeLog:
> 
>   PR c/81117
>   * c-common.c (resort_sorted_fields): Replace pointless strncpy
>   with memcpy.
>   * c-warn.c (sizeof_pointer_memaccess_warning): Handle arrays.
>   * c.opt (-Wstriingop-truncation): New option.
> 
> gcc/fortran/ChangeLog:
> 
>   PR c/81117
>   * decl.c (build_sym): Replace pointless strncpy with strcpy.
> 
> gcc/objc/ChangeLog:
> 
>   PR c/81117
>   * objc-encoding.c (encode_type): Replace pointless strncpy with
>   memcpy.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR c/81117
>   * c-c++-common/Wsizeof-pointer-memaccess3.c: New test.
>   * c-c++-common/Wstringop-overflow.c: New test.
> 

Re: [PATCH] libstdc++: Support std::is_aggregate on clang++ (was [cfe-dev] clang++: std::is_aggregate unusable with clang-5.0/libstdc++-7)

2017-07-31 Thread Tim Song
On Mon, Jul 31, 2017 at 11:13 AM, Tim Song  wrote:
>
> https://clang.llvm.org/docs/LanguageExtensions.html#checks-for-type-trait-primitives
> seems to suggest using __has_extension instead.

Hmm, that doesn't work. Oh well.


Re: [PATCH 0/6] [i386] PR80969 Fix ICE with -mabi=ms -mavx512f

2017-07-31 Thread Daniel Santos
Well I just learned how to test 32-bit earlier and I've uncovered a 
problem when running 32-bit tests.  Do you want me to commit the the two 
patches (squashed together) in the mean time?


Thanks,
Daniel




[PATCH, Fortran] Slight cleanup in set_dec_flags

2017-07-31 Thread Fritz Reese
All,

Attached is a trivial patch which exists for posterity: currently
set_dec_flags(int value) enables support for legacy standards and
disables warnings for them etc... Unfortunately it does this
irrespective of the int value parameter. Since set_dec_flags(0) is
called on initialization, the legacy standards are enabled even when
-fdec is not specified on the command-line. However, since
set_dec_flags is called *just before* set_default_std_flags() in
gfc_init_options() the standards are correctly reset to defaults. I
believe a guard around setting the legacy standards from set_dec_flags
to ensure they are only set with value != 0 is appropriate for clarity
and correctness.

I plan to commit to trunk soon if there no objections since the patch
is trivial and does not effectively change compiler behavior at
present.

---
Fritz Reese


>From 805b4909deb450216c1dc522d834173455baca69 Mon Sep 17 00:00:00 2001
From: Fritz Reese 
Date: Mon, 31 Jul 2017 12:58:30 -0400
Subject: [PATCH] Only set legacy std in set_dec_flags when value!=0 for
 posterity

gcc/fortran/
* options.c (set_dec_flags): Only set legacy standards when value
is not zero.
From 805b4909deb450216c1dc522d834173455baca69 Mon Sep 17 00:00:00 2001
From: Fritz Reese 
Date: Mon, 31 Jul 2017 12:58:30 -0400
Subject: [PATCH] Only set legacy std in set_dec_flags when value!=0 for
 posterity

gcc/fortran/
	* options.c (set_dec_flags): Only set legacy standards when value
	is not zero.
---
 gcc/fortran/options.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/gcc/fortran/options.c b/gcc/fortran/options.c
index 8cb861bf513..c2f8b8a9398 100644
--- a/gcc/fortran/options.c
+++ b/gcc/fortran/options.c
@@ -52,11 +52,13 @@ set_default_std_flags (void)
 static void
 set_dec_flags (int value)
 {
-  /* Allow legacy code without warnings.  */
-  gfc_option.allow_std |= GFC_STD_F95_OBS | GFC_STD_F95_DEL
-| GFC_STD_GNU | GFC_STD_LEGACY;
-  gfc_option.warn_std &= ~(GFC_STD_LEGACY | GFC_STD_F95_DEL);
-
+  if (value)
+{
+  /* Allow legacy code without warnings.  */
+  gfc_option.allow_std |= GFC_STD_F95_OBS | GFC_STD_F95_DEL
+| GFC_STD_GNU | GFC_STD_LEGACY;
+  gfc_option.warn_std &= ~(GFC_STD_LEGACY | GFC_STD_F95_DEL);
+}
 
   /* Set other DEC compatibility extensions.  */
   flag_dollar_ok |= value;
-- 
2.12.2



[PATCH, Fortran] Correctly set -fd-lines-as-comments with -fdec

2017-07-31 Thread Fritz Reese
All,

This is a simple patch. The original intent was for -fdec to set
-fd-lines-as-comments by default if flag_d_lines was unspecified by
the user. However, currently flag_d_lines is interrogated in
set_dec_flags(), usually before its final value is determined. The
attached patch fixes this behavior to work as intended.

Any objections for trunk? If not I think it is safe to commit.

---
Fritz Reese

>From e2761d73e818a5095bcc130ddbafe27022e25ba6 Mon Sep 17 00:00:00 2001
From: Fritz Reese 
Date: Mon, 31 Jul 2017 12:46:10 -0400
Subject: [PATCH] Correctly set -fd-lines-as-comments with -fdec

gcc/fortran/
* options.c (set_dec_flags, gfc_post_options): Set flag_d_lines
with -fdec when not set by user.

gcc/testsuite/gfortran.dg/
* dec_d_lines_1.f, dec_d_lines_2.f: New.
From e2761d73e818a5095bcc130ddbafe27022e25ba6 Mon Sep 17 00:00:00 2001
From: Fritz Reese 
Date: Mon, 31 Jul 2017 12:46:10 -0400
Subject: [PATCH] Don't override -fd-lines-as-code with -fdec.

gcc/fortran/
	* options.c (set_dec_flags, gfc_post_options): Only set flag_d_lines
	with -fdec when not set by user.

gcc/testsuite/gfortran.dg/
	* dec_d_lines_1.f, dec_d_lines_2.f: New.
---
 gcc/fortran/options.c | 14 +-
 gcc/testsuite/gfortran.dg/dec_d_lines_1.f |  9 +
 gcc/testsuite/gfortran.dg/dec_d_lines_2.f |  8 
 3 files changed, 26 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/dec_d_lines_1.f
 create mode 100644 gcc/testsuite/gfortran.dg/dec_d_lines_2.f

diff --git a/gcc/fortran/options.c b/gcc/fortran/options.c
index 1af76aa81ec..8cb861bf513 100644
--- a/gcc/fortran/options.c
+++ b/gcc/fortran/options.c
@@ -57,9 +57,6 @@ set_dec_flags (int value)
 | GFC_STD_GNU | GFC_STD_LEGACY;
   gfc_option.warn_std &= ~(GFC_STD_LEGACY | GFC_STD_F95_DEL);
 
-  /* Set -fd-lines-as-comments by default.  */
-  if (value && gfc_current_form != FORM_FREE && gfc_option.flag_d_lines == -1)
-gfc_option.flag_d_lines = 0;
 
   /* Set other DEC compatibility extensions.  */
   flag_dollar_ok |= value;
@@ -337,8 +334,15 @@ gfc_post_options (const char **pfilename)
 	diagnostic_classify_diagnostic (global_dc, OPT_Wline_truncation,
 	DK_ERROR, UNKNOWN_LOCATION);
 }
-  else if (warn_line_truncation == -1)
-warn_line_truncation = 0;
+  else
+{
+  /* With -fdec, set -fd-lines-as-comments by default in fixed form.  */
+  if (flag_dec && gfc_option.flag_d_lines == -1)
+	gfc_option.flag_d_lines = 0;
+
+  if (warn_line_truncation == -1)
+	warn_line_truncation = 0;
+}
 
   /* If -pedantic, warn about the use of GNU extensions.  */
   if (pedantic && (gfc_option.allow_std & GFC_STD_GNU) != 0)
diff --git a/gcc/testsuite/gfortran.dg/dec_d_lines_1.f b/gcc/testsuite/gfortran.dg/dec_d_lines_1.f
new file mode 100644
index 000..2cc7a01daff
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/dec_d_lines_1.f
@@ -0,0 +1,9 @@
+! { dg-do compile }
+! { dg-options "-ffixed-form -fd-lines-as-code -fdec" }
+!
+! Ensure -fd-lines-as-code is not overridden by -fdec.
+!
+  i = 0
+d end
+  subroutine s
+D end
diff --git a/gcc/testsuite/gfortran.dg/dec_d_lines_2.f b/gcc/testsuite/gfortran.dg/dec_d_lines_2.f
new file mode 100644
index 000..31eaf5f2328
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/dec_d_lines_2.f
@@ -0,0 +1,8 @@
+! { dg-do compile }
+! { dg-options "-ffixed-form -fdec" }
+!
+! Ensure -fd-lines-as-comments is enabled by default with -fdec.
+!
+d This is a comment.
+D This line, too.
+  end
-- 
2.12.2



Re: C PATCH to further improve diagnostic for -Wsign-compare (PR c/81417)

2017-07-31 Thread Martin Sebor

On 07/31/2017 10:05 AM, Marek Polacek wrote:

On Mon, Jul 31, 2017 at 09:54:03AM -0600, Martin Sebor wrote:

On 07/31/2017 08:14 AM, Marek Polacek wrote:

This patch improves the diagnostic of -Wsign-compare for ?: by also printing
the types, similarly to my recent patch.  But we can do even better here if we
actually point to the operand in question, so I passed the locations of the
operands from the parser.  So instead of

x.c:8:16: warning: signed and unsigned type in conditional expression 
[-Wsign-compare]
   return x ? y : -1;
^
you'll now see:

x.c:8:18: warning: operand of conditional expression changes signedness: 'int' 
to 'unsigned int' [-Wsign-compare]
   return x ? y : -1;
  ^


I like that this is more informative than the last warning you
committed for this bug: it says what type the operand is converted
to.  The last one only shows what the types of the operands are but
leaves users guessing as to what that might mean (integer promotion
rules are often poorly understood).  Where the last warning prints

  comparison of integer expressions of different signedness: ‘int’ and
‘unsigned int’

it would be nice to go back and add this detail to it as well, and
have it print something like this instead:

  comparison of integer expressions of different signedness changes type of
the second operand from ‘int’ to ‘unsigned int’

Where constant expressions are involved it would also be helpful
to show the result of the conversion.  For instance:

  comparison between ‘int’ and ‘unsigned int’ changes the value of the
second operand from ‘-1’ to ‘4294967296’


Hmm, interesting.  I could do that.  How do other people feel about this?


Since you ask for input from others, let me mention that I also
would find it helpful if we could come up with some sort of high
level direction on what information to include in diagnostics and
how to present it (e.g., a section on the GCC Diagnostic Guidelines
page on the Wiki).  Not only would it help guide us in implementing
new diagnostics but it would also result in a more consistent and
uniform user experience.

For what it's worth, my own preference is to err on the side of
providing more detail rather than less so the changes I made in
r248431 to -Wconversion reflect that.  So for example there, GCC
might now print:

  unsigned conversion from ‘int‘ to ‘unsigned char‘ changes value from 
‘-128‘ to ‘128‘"


My suggestions above were based on the style I chose here.

Martin


Re: [PATCH 1/2] x86,s390: add compiler memory barriers when expanding atomic_thread_fence (PR 80640)

2017-07-31 Thread Alexander Monakov
On Mon, 31 Jul 2017, Jeff Law wrote:
> >> In the middle end patch, do we need a barrier before the fence as well?
> >> The post-fence barrier prevents reordering the fence with anything which
> >> follows the fence.  But do we have to also prevent reordering the fence
> >> with prior instructions with any of the memory models?  This isn't my
> >> area of expertise, so if it's dumb question, don't hesitate to let me
> >> know :-)
> > 
> > That depends on how pessimistic we want to be with respect to backend
> > getting it wrong.  My expectation here is that if a backend emits non-empty
> > RTL, the produced sequence for the fence itself acts as a compiler memory
> > barrier.
> Perhaps. But do we really want to rely on that?  EMitting a scheduling
> barrier prior to these atomics is virtually free.

Please consider that expand_mem_thread_fence is used to place fences around
seq-cst atomic loads when the backend doesn't provide a direct pattern.
With compiler barriers on both sides of the machine barrier, the generated
sequence for a seq-cst atomic load will be 7 insns:

  asm volatile ("":::"memory");
  machine_seq_cst_fence ();
  asm volatile ("":::"memory");
  dst = mem[src];
  asm volatile ("":::"memory");
  machine_seq_cst_fence ();
  asm volatile ("":::"memory");

I can easily imagine people looking at RTL dumps with this overkill fencing
being unhappy about this.

I'd be more happy with detecting empty expansion via get_last_insn ().

Thanks.
Alexander


Re: [PATCH, rs6000] Fix PR81622

2017-07-31 Thread Jakub Jelinek
On Mon, Jul 31, 2017 at 11:19:26AM -0500, Bill Schmidt wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81622 reports an ICE-on-invalid
> for the vec_ld and vec_st built-in functions.  This fires when the last
> argument of the built-in is not a pointer or array type, as is required.
> We break on this during early expansion of the built-ins into tree code
> during parsing.  The solution, as with other ill-formed uses of these
> built-ins, is to avoid the early expansion when the argument has an invalid
> type, so that normal error handling can kick in later.
> 
> (The long-term solution is to move the vec_ld and vec_st built-ins to the
> gimple folding work that Will Schmidt has been doing, but that hasn't
> happened yet.)
> 
> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.
> Is this ok for trunk and GCC 7?  I'd like to get it into 7.2 since it
> is a 7 regression.

See the patch I've attached in the PR, this isn't sufficient
(and for the ARRAY_TYPE I wonder if you can ever see there ARRAY_TYPE),
the function has various other issues, including e.g. ICE on
vec_cmpne (1, 2) with -mpower9.

Jakub


[PATCH, rs6000] Fix PR81622

2017-07-31 Thread Bill Schmidt
Hi,

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81622 reports an ICE-on-invalid
for the vec_ld and vec_st built-in functions.  This fires when the last
argument of the built-in is not a pointer or array type, as is required.
We break on this during early expansion of the built-ins into tree code
during parsing.  The solution, as with other ill-formed uses of these
built-ins, is to avoid the early expansion when the argument has an invalid
type, so that normal error handling can kick in later.

(The long-term solution is to move the vec_ld and vec_st built-ins to the
gimple folding work that Will Schmidt has been doing, but that hasn't
happened yet.)

Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.
Is this ok for trunk and GCC 7?  I'd like to get it into 7.2 since it
is a 7 regression.

Thanks,
Bill


[gcc]

2017-07-31  Bill Schmidt  

PR target/81622
* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
Don't expand vec_ld or vec_st early if the last argument isn't a
pointer or array type.

[gcc/testsuite]

2017-07-31  Bill Schmidt  

PR target/81622
* gcc.target/powerpc/pr81622.c: New file.


Index: gcc/config/rs6000/rs6000-c.c
===
--- gcc/config/rs6000/rs6000-c.c(revision 250721)
+++ gcc/config/rs6000/rs6000-c.c(working copy)
@@ -6454,10 +6454,13 @@ altivec_resolve_overloaded_builtin (location_t loc
  consider this for all memory access built-ins.
 
  When -maltivec=be is specified, or the wrong number of arguments
- is provided, simply punt to existing built-in processing.  */
+ is provided, or the second argument isn't a pointer, simply punt
+ to existing built-in processing.  */
   if (fcode == ALTIVEC_BUILTIN_VEC_LD
   && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)
-  && nargs == 2)
+  && nargs == 2
+  && (POINTER_TYPE_P ((*arglist)[1])
+ || TREE_CODE (TREE_TYPE ((*arglist)[1])) == ARRAY_TYPE))
 {
   tree arg0 = (*arglist)[0];
   tree arg1 = (*arglist)[1];
@@ -6528,7 +6531,9 @@ altivec_resolve_overloaded_builtin (location_t loc
   /* Similarly for stvx.  */
   if (fcode == ALTIVEC_BUILTIN_VEC_ST
   && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)
-  && nargs == 3)
+  && nargs == 3
+  && (POINTER_TYPE_P ((*arglist)[2])
+ || TREE_CODE (TREE_TYPE ((*arglist)[2])) == ARRAY_TYPE))
 {
   tree arg0 = (*arglist)[0];
   tree arg1 = (*arglist)[1];
Index: gcc/testsuite/gcc.target/powerpc/pr81622.c
===
--- gcc/testsuite/gcc.target/powerpc/pr81622.c  (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr81622.c  (working copy)
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+
+void a()
+{
+  __builtin_vec_ld (1, 2); /* { dg-error "invalid parameter combination for 
AltiVec intrinsic __builtin_vec_ld" } */
+}
+
+void b()
+{
+  __builtin_vec_st (0, 1, 2); /* { dg-error "invalid parameter combination for 
AltiVec intrinsic __builtin_vec_st" } */
+}




[PATCH] Fix reassoc var_bound range test optimization (PR tree-optimization/81588)

2017-07-31 Thread Jakub Jelinek
Hi!

This patch fixes a couple of issues in optimize_range_tests_var_bound.
One is that if we have ranges[i].in_p, we should be inverting the comparison
rather than just swapping or not swapping operands.  Then because
we want to handle only LT/LE, we want to swap operands and the comparison
code if we have GT/GE after the optional inversion.  And finally, using
ranges[i].in_p to determine if we should swap at the end is wrong,
in the pr81588 testcase we have a negation in between and thus
ranges[i].in_p doesn't match the opcode - so, we should use opcode and if
opcode is ERROR_MARK (i.e. the inter-bb range test optimization), we should
check oe->rank) and finally shouldn't swap comparison operands, but invert
the comparison code if we need to invert.

In the earlier version of the patch, I made a mistake and miscompiled
trip_count_to_ahead_ratio_too_small_p in tree-ssa-loop-prefetch.c,
so that simplified is also included in another testcase.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/7.2?

2017-07-31  Jakub Jelinek  

PR tree-optimization/81588
* tree-ssa-reassoc.c (optimize_range_tests_var_bound): If
ranges[i].in_p, invert comparison code ccode.  For >/>=,
swap rhs1 and rhs2 and comparison code unconditionally,
for rank is BIT_IOR_EXPR.

* gcc.dg/tree-ssa/pr81588.c: New test.
* gcc.dg/pr81588.c: New test.
* gcc.c-torture/execute/pr81588.c: New test.

--- gcc/tree-ssa-reassoc.c.jj   2017-07-31 10:19:22.777332269 +0200
+++ gcc/tree-ssa-reassoc.c  2017-07-31 13:14:33.488618172 +0200
@@ -2958,17 +2958,26 @@ optimize_range_tests_var_bound (enum tre
{
case GT_EXPR:
case GE_EXPR:
- if (!ranges[i].in_p)
-   std::swap (rhs1, rhs2);
+   case LT_EXPR:
+   case LE_EXPR:
+ break;
+   default:
+ continue;
+   }
+  if (ranges[i].in_p)
+   ccode = invert_tree_comparison (ccode, false);
+  switch (ccode)
+   {
+   case GT_EXPR:
+   case GE_EXPR:
+ std::swap (rhs1, rhs2);
  ccode = swap_tree_comparison (ccode);
  break;
case LT_EXPR:
case LE_EXPR:
- if (ranges[i].in_p)
-   std::swap (rhs1, rhs2);
  break;
default:
- continue;
+ gcc_unreachable ();
}
 
   int *idx = map->get (rhs1);
@@ -3015,8 +3024,14 @@ optimize_range_tests_var_bound (enum tre
  fprintf (dump_file, "\n");
}
 
-  if (ranges[i].in_p)
-   std::swap (rhs1, rhs2);
+  operand_entry *oe = (*ops)[ranges[i].idx];
+  ranges[i].in_p = 0;
+  if (opcode == BIT_IOR_EXPR
+ || (opcode == ERROR_MARK && oe->rank == BIT_IOR_EXPR))
+   {
+ ranges[i].in_p = 1;
+ ccode = invert_tree_comparison (ccode, false);
+   }
 
   unsigned int uid = gimple_uid (stmt);
   gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
@@ -3043,7 +3058,6 @@ optimize_range_tests_var_bound (enum tre
}
   else
{
- operand_entry *oe = (*ops)[ranges[i].idx];
  tree ctype = oe->op ? TREE_TYPE (oe->op) : boolean_type_node;
  if (!INTEGRAL_TYPE_P (ctype)
  || (TREE_CODE (ctype) != BOOLEAN_TYPE
@@ -3065,7 +3079,7 @@ optimize_range_tests_var_bound (enum tre
  ranges[i].high = ranges[i].low;
}
   ranges[i].strict_overflow_p = false;
-  operand_entry *oe = (*ops)[ranges[*idx].idx];
+  oe = (*ops)[ranges[*idx].idx];
   /* Now change all the other range test immediate uses, so that
 those tests will be optimized away.  */
   if (opcode == ERROR_MARK)
--- gcc/testsuite/gcc.dg/tree-ssa/pr81588.c.jj  2017-07-31 12:30:16.917723926 
+0200
+++ gcc/testsuite/gcc.dg/tree-ssa/pr81588.c 2017-07-31 12:30:16.917723926 
+0200
@@ -0,0 +1,15 @@
+/* PR tree-optimization/81588 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-reassoc1-details" } */
+   
+extern long long int a, c;
+extern unsigned short b;
+
+/* { dg-final { scan-tree-dump-times "Optimizing range test \[^\n\r]* and 
comparison" 1 "reassoc1" } } */
+
+__attribute__((noinline, noclone)) void
+foo (void)
+{
+  if ((b > a) != (1 + (a < 0)))
+c = 0;
+}
--- gcc/testsuite/gcc.dg/pr81588.c.jj   2017-07-31 12:30:16.917723926 +0200
+++ gcc/testsuite/gcc.dg/pr81588.c  2017-07-31 12:30:16.917723926 +0200
@@ -0,0 +1,26 @@
+/* PR tree-optimization/81588 */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+long long int a = 5011877430933453486LL, c = 1;
+unsigned short b = 24847;
+
+#include "tree-ssa/pr81588.c"
+
+int
+main ()
+{
+  foo ();
+  if (c != 0)
+__builtin_abort ();
+  a = 24846;
+  c = 1;
+  foo ();
+  if (c != 1)
+__builtin_abort ();
+  a = -5;
+  foo ();
+  if (c != 0)
+__builtin_abort ();
+  return 0;
+}
--- gcc/testsuite/gcc.c-torture/execute/pr81588.c.jj2017-07-31 
13:15:11.832181205 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr81588.c   

Re: [WWW PATCH]: Mention that x86 now supports "naked" function attribute.

2017-07-31 Thread Gerald Pfeifer
On Mon, 31 Jul 2017, Uros Bizjak wrote:
> One liner that mentions new addition to x86 port.
> 
> OK for wwwdocs?

Sure.  (Even without asking. ;-)

Thanks,
Gerald


Re: C PATCH to further improve diagnostic for -Wsign-compare (PR c/81417)

2017-07-31 Thread Marek Polacek
On Mon, Jul 31, 2017 at 09:54:03AM -0600, Martin Sebor wrote:
> On 07/31/2017 08:14 AM, Marek Polacek wrote:
> > This patch improves the diagnostic of -Wsign-compare for ?: by also printing
> > the types, similarly to my recent patch.  But we can do even better here if 
> > we
> > actually point to the operand in question, so I passed the locations of the
> > operands from the parser.  So instead of
> > 
> > x.c:8:16: warning: signed and unsigned type in conditional expression 
> > [-Wsign-compare]
> >return x ? y : -1;
> > ^
> > you'll now see:
> > 
> > x.c:8:18: warning: operand of conditional expression changes signedness: 
> > 'int' to 'unsigned int' [-Wsign-compare]
> >return x ? y : -1;
> >   ^
> 
> I like that this is more informative than the last warning you
> committed for this bug: it says what type the operand is converted
> to.  The last one only shows what the types of the operands are but
> leaves users guessing as to what that might mean (integer promotion
> rules are often poorly understood).  Where the last warning prints
> 
>   comparison of integer expressions of different signedness: ‘int’ and
> ‘unsigned int’
> 
> it would be nice to go back and add this detail to it as well, and
> have it print something like this instead:
> 
>   comparison of integer expressions of different signedness changes type of
> the second operand from ‘int’ to ‘unsigned int’
> 
> Where constant expressions are involved it would also be helpful
> to show the result of the conversion.  For instance:
> 
>   comparison between ‘int’ and ‘unsigned int’ changes the value of the
> second operand from ‘-1’ to ‘4294967296’

Hmm, interesting.  I could do that.  How do other people feel about this?

Marek


[PATCH][AArch64] PR71951: Fix unwinding with -fomit-frame-pointer

2017-07-31 Thread Wilco Dijkstra
As described in PR71951, if libgcc is built with -fomit-frame-pointer,
unwinding crashes, for example while doing a backtrace.  The underlying
reason is the Dwarf unwinder does not setup the frame pointer register
in the initialization code.  When later unwinding a function that uses
the frame pointer, it tries to read FP using _Unwind_GetGR, and this
crashes if has never restored FP.  To unwind correctly the first frame
must save and restore FP (it is unwound in a special way so that it
uses SP instead of FP).  This is done by adding -fno-omit-frame-pointer.

OK for commit and backport to GCC6/7?

ChangeLog:
2017-07-31  Wilco Dijkstra  

PR target/71951
* config/aarch64/aarch64.h (LIBGCC2_UNWIND_ATTRIBUTE): Define.

--
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 
7f91edb5713d7e8eda2f0a024a0f97b4e111c4b0..03fd93046bdbdb03bd7d0c4573928f504640f7e1
 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -971,4 +971,12 @@ extern const char *host_detect_local_cpu (int argc, const 
char **argv);
 extern tree aarch64_fp16_type_node;
 extern tree aarch64_fp16_ptr_type_node;
 
+/* The generic unwind code in libgcc does not initialize the frame pointer.
+   So in order to unwind a function using a frame pointer, the very first
+   function that is unwound must save the frame pointer.  That way the frame
+   pointer is restored and its value is now valid - otherwise _Unwind_GetGR
+   crashes.  Libgcc can now be safely built with -fomit-frame-pointer.  */
+#define LIBGCC2_UNWIND_ATTRIBUTE \
+  __attribute__((optimize ("no-omit-frame-pointer")))
+
 #endif /* GCC_AARCH64_H */


Re: C PATCH to further improve diagnostic for -Wsign-compare (PR c/81417)

2017-07-31 Thread Martin Sebor

On 07/31/2017 08:14 AM, Marek Polacek wrote:

This patch improves the diagnostic of -Wsign-compare for ?: by also printing
the types, similarly to my recent patch.  But we can do even better here if we
actually point to the operand in question, so I passed the locations of the
operands from the parser.  So instead of

x.c:8:16: warning: signed and unsigned type in conditional expression 
[-Wsign-compare]
   return x ? y : -1;
^
you'll now see:

x.c:8:18: warning: operand of conditional expression changes signedness: 'int' 
to 'unsigned int' [-Wsign-compare]
   return x ? y : -1;
  ^


I like that this is more informative than the last warning you
committed for this bug: it says what type the operand is converted
to.  The last one only shows what the types of the operands are but
leaves users guessing as to what that might mean (integer promotion
rules are often poorly understood).  Where the last warning prints

  comparison of integer expressions of different signedness: ‘int’ and 
‘unsigned int’


it would be nice to go back and add this detail to it as well, and
have it print something like this instead:

  comparison of integer expressions of different signedness changes 
type of the second operand from ‘int’ to ‘unsigned int’


Where constant expressions are involved it would also be helpful
to show the result of the conversion.  For instance:

  comparison between ‘int’ and ‘unsigned int’ changes the value of the 
second operand from ‘-1’ to ‘4294967296’


Martin



Bootstrapped/regtested on x86_64-linux, ok for trunk?

2017-07-31  Marek Polacek  

PR c/81417
* c-array-notation.c (fix_builtin_array_notation_fn): Update calls to
build_conditional_expr. 
* c-parser.c (c_parser_conditional_expression): Pass the locations of
OP1 and OP2 down to build_conditional_expr.
* c-tree.h (build_conditional_expr): Update declaration.
* c-typeck.c (build_conditional_expr): Add location_t parameters.
For -Wsign-compare, also print the types.

* objc-next-runtime-abi-02.c (build_v2_build_objc_method_call): Update
a call to build_conditional_expr.

* Wsign-compare-1.c: New test.
* gcc.dg/compare1.c: Adjust dg-bogus.
* gcc.dg/compare2.c: Likewise.
* gcc.dg/compare3.c: Likewise.
* gcc.dg/compare7.c: Likewise.
* gcc.dg/compare8.c: Likewise.
* gcc.dg/compare9.c: Likewise.
* gcc.dg/pr11492.c: Likewise.

diff --git gcc/c/c-array-notation.c gcc/c/c-array-notation.c
index e430f5c681b..40f1cfdabb8 100644
--- gcc/c/c-array-notation.c
+++ gcc/c/c-array-notation.c
@@ -355,8 +355,9 @@ fix_builtin_array_notation_fn (tree an_builtin_fn, tree 
*new_var)
   new_cond_expr = build2 (NE_EXPR, TREE_TYPE (func_parm), func_parm,
  build_zero_cst (TREE_TYPE (func_parm)));
   new_expr = build_conditional_expr
-   (location, new_cond_expr, false, new_yes_expr,
-TREE_TYPE (new_yes_expr), new_no_expr, TREE_TYPE (new_no_expr));
+   (location, new_cond_expr, false,
+new_yes_expr, TREE_TYPE (new_yes_expr), location,
+new_no_expr, TREE_TYPE (new_no_expr), location);
   break;
 case BUILT_IN_CILKPLUS_SEC_REDUCE_ALL_NONZERO:
   new_var_init = build_modify_expr
@@ -375,8 +376,9 @@ fix_builtin_array_notation_fn (tree an_builtin_fn, tree 
*new_var)
   new_cond_expr = build2 (EQ_EXPR, TREE_TYPE (func_parm), func_parm,
  build_zero_cst (TREE_TYPE (func_parm)));
   new_expr = build_conditional_expr
-   (location, new_cond_expr, false, new_yes_expr,
-TREE_TYPE (new_yes_expr), new_no_expr, TREE_TYPE (new_no_expr));
+   (location, new_cond_expr, false,
+new_yes_expr, TREE_TYPE (new_yes_expr), location,
+new_no_expr, TREE_TYPE (new_no_expr), location);
   break;
 case BUILT_IN_CILKPLUS_SEC_REDUCE_ANY_ZERO:
   new_var_init = build_modify_expr
@@ -394,8 +396,9 @@ fix_builtin_array_notation_fn (tree an_builtin_fn, tree 
*new_var)
   new_cond_expr = build2 (EQ_EXPR, TREE_TYPE (func_parm), func_parm,
  build_zero_cst (TREE_TYPE (func_parm)));
   new_expr = build_conditional_expr
-   (location, new_cond_expr, false, new_yes_expr,
-TREE_TYPE (new_yes_expr), new_no_expr, TREE_TYPE (new_no_expr));
+   (location, new_cond_expr, false,
+new_yes_expr, TREE_TYPE (new_yes_expr), location,
+new_no_expr, TREE_TYPE (new_no_expr), location);
   break;
 case BUILT_IN_CILKPLUS_SEC_REDUCE_ANY_NONZERO:
   new_var_init = build_modify_expr
@@ -413,8 +416,9 @@ fix_builtin_array_notation_fn (tree an_builtin_fn, tree 
*new_var)
   new_cond_expr = build2 (NE_EXPR, TREE_TYPE (func_parm), func_parm,
  build_zero_cst (TREE_TYPE (func_parm)));
   new_expr = build_conditional_expr
-   (location, new_cond_expr, false, 

Re: [PATCH][RFA/RFC] Stack clash mitigation patch 05/08

2017-07-31 Thread Jeff Law
On 07/21/2017 02:25 PM, Segher Boessenkool wrote:
> On Thu, Jul 20, 2017 at 08:20:52AM -0600, Jeff Law wrote:
>>> Can only combine-stack-adjustments do this?  It seems like something
>>> many passes could do, and then your new note doesn't help.
>> SO far it's only been observed with c-s-a, but further auditing is
>> certainly desirable here, particularly with the upcoming changes to the
>> generic dynamic alloca handling.
>>
>> In the V2 patch only backends would emit unrolled inline alloca/probe
>> sequences like what you see above and only for prologues.  Thus there
>> were a very limited number of passes to be concerned about.
>>
>> In the V3 patch we have unrolled inline probing for the dynamic space as
>> well, so this kind of sequence is exposed to everything after
>> gimple->rtl expansion.
>>
>> Unfortunately, the most thorough checker we have is x86 and on that
>> target, because of stack alignment issues, we'll never see a constant
>> size in the dynamic space and thus no unrolled inlined alloca/probe
>> sequences.
>>
>> In reality I suspect that with teh hard register references, most passes
>> are going to leave those insns alone, but some auditing is necessary.
> 
> This is similar to what rs6000 uses stack_tie for.  You want the
> prevent a store to the stack (the probe) from being moved after a
> later stack pointer update.  By pretending (in the insn pattern)
> there is a store to stack with that stack pointer update, nothing
> can move stores after it.
FWIW, we found a case where the scheduler would muck things up.
Essentially it has code rewrites address computations and memory
references in the hopes of breaking dependency chains.  In many ways
it's a lot like c-s-a.

Anyway, it broke the dependency chain and as a result we ended up with
unsafe sequences on aarch64.   THe V3 patch posted last night addresses
that in two ways.  First the scheduler knows about the magic note in the
same manner as c-s-a.  And we emit scheduling barriers in the probing
code to prevent undesirable movement.

Jeff


Re: [PATCH 1/2] x86,s390: add compiler memory barriers when expanding atomic_thread_fence (PR 80640)

2017-07-31 Thread Jeff Law
On 07/26/2017 12:13 PM, Alexander Monakov wrote:
> On Wed, 26 Jul 2017, Jeff Law wrote:
>> I'm not sure what you mean by extraneous compiler barriers -- isn't the
>> worst case scenario here that the target emits them as well?  So there
>> would be an extraneous one in that case, but that ought to be a "don't
>> care".
> 
> Yes, exactly this.
> 
>> In the middle end patch, do we need a barrier before the fence as well?
>> The post-fence barrier prevents reordering the fence with anything which
>> follows the fence.  But do we have to also prevent reordering the fence
>> with prior instructions with any of the memory models?  This isn't my
>> area of expertise, so if it's dumb question, don't hesitate to let me
>> know :-)
> 
> That depends on how pessimistic we want to be with respect to backend
> getting it wrong.  My expectation here is that if a backend emits non-empty
> RTL, the produced sequence for the fence itself acts as a compiler memory
> barrier.
Perhaps. But do we really want to rely on that?  EMitting a scheduling
barrier prior to these atomics is virtually free.

Jeff


Re: [PATCH 1/2] x86,s390: add compiler memory barriers when expanding atomic_thread_fence (PR 80640)

2017-07-31 Thread Jeff Law
On 07/26/2017 02:02 PM, Alexander Monakov wrote:
> On Wed, 26 Jul 2017, Alexander Monakov wrote:
> 
>> On Wed, 26 Jul 2017, Jeff Law wrote:
>>> I'm not sure what you mean by extraneous compiler barriers -- isn't the
>>> worst case scenario here that the target emits them as well?  So there
>>> would be an extraneous one in that case, but that ought to be a "don't
>>> care".
>>
>> Yes, exactly this.
> 
> I've just realized that we can detect if the backend produced empty RTL
> sequence by looking at get_last_insn () before/after gen_mem_thread_fence.
> This way we can emit a compiler barrier iff the backend didn't emit a
> machine barrier.
We could.  But I suspect just emitting the barriers in the generic code
is the way to go.  If we have a redundant barrier, the compile time cost
is minimal and I don't think the additional code to avoid creating the
exxtra barrier is worth it.

jeff


Re: C PATCH to further improve diagnostic for -Wsign-compare (PR c/81417)

2017-07-31 Thread David Malcolm
On Mon, 2017-07-31 at 16:14 +0200, Marek Polacek wrote:
> This patch improves the diagnostic of -Wsign-compare for ?: by also
> printing
> the types, similarly to my recent patch.  But we can do even better
> here if we
> actually point to the operand in question, so I passed the locations
> of the
> operands from the parser.

Thanks for updating the patch.

>   So instead of 
> 
> x.c:8:16: warning: signed and unsigned type in conditional expression
> [-Wsign-compare]
>return x ? y : -1;
> ^
> you'll now see:
> 
> x.c:8:18: warning: operand of conditional expression changes
> signedness: 'int' to 'unsigned int' [-Wsign-compare]
>return x ? y : -1;
>   ^

That's an improvement, but I would have expected it to underline the
whole of the pertinent subexpression e.g.:

   return x ? y : -1;
  ^~

rather than just:

   return x ? y : -1;
  ^

>From my reading of the patch, it's capturing just the location of the
first token within the subexpression (hence e.g. just the minus token
in the example above, which happens to make some sense for this case,
but wouldn't in general).

Hopefully you can get at the location_t for the whole of the
subexpression using c_expr, rather than peeking at the first token.

> Bootstrapped/regtested on x86_64-linux, ok for trunk?

The patch doesn't have a testcase for the location information; please
add one, using -fdiagnostics-show-caret and dg-begin-multiline-output/
dg-end-multiline-output.  Please ensure that the pertinent expressions
are more than one character wide, so that the test properly verifies
the underlining.

Thanks
Dave


> 2017-07-31  Marek Polacek  
> 
>   PR c/81417
>   * c-array-notation.c (fix_builtin_array_notation_fn): Update
> calls to
>   build_conditional_expr. 
>   * c-parser.c (c_parser_conditional_expression): Pass the
> locations of
>   OP1 and OP2 down to build_conditional_expr.
>   * c-tree.h (build_conditional_expr): Update declaration.
>   * c-typeck.c (build_conditional_expr): Add location_t
> parameters.
>   For -Wsign-compare, also print the types.
> 
>   * objc-next-runtime-abi-02.c (build_v2_build_objc_method_call):
> Update
>   a call to build_conditional_expr.
> 
>   * Wsign-compare-1.c: New test.
>   * gcc.dg/compare1.c: Adjust dg-bogus.
>   * gcc.dg/compare2.c: Likewise.
>   * gcc.dg/compare3.c: Likewise.
>   * gcc.dg/compare7.c: Likewise.
>   * gcc.dg/compare8.c: Likewise.
>   * gcc.dg/compare9.c: Likewise.
>   * gcc.dg/pr11492.c: Likewise.
> 
> diff --git gcc/c/c-array-notation.c gcc/c/c-array-notation.c
> index e430f5c681b..40f1cfdabb8 100644
> --- gcc/c/c-array-notation.c
> +++ gcc/c/c-array-notation.c
> @@ -355,8 +355,9 @@ fix_builtin_array_notation_fn (tree
> an_builtin_fn, tree *new_var)
>new_cond_expr = build2 (NE_EXPR, TREE_TYPE (func_parm),
> func_parm,
> build_zero_cst (TREE_TYPE
> (func_parm)));
>new_expr = build_conditional_expr
> - (location, new_cond_expr, false, new_yes_expr,
> -  TREE_TYPE (new_yes_expr), new_no_expr, TREE_TYPE
> (new_no_expr));
> + (location, new_cond_expr, false,
> +  new_yes_expr, TREE_TYPE (new_yes_expr), location,
> +  new_no_expr, TREE_TYPE (new_no_expr), location);
>break;
>  case BUILT_IN_CILKPLUS_SEC_REDUCE_ALL_NONZERO:
>new_var_init = build_modify_expr
> @@ -375,8 +376,9 @@ fix_builtin_array_notation_fn (tree
> an_builtin_fn, tree *new_var)
>new_cond_expr = build2 (EQ_EXPR, TREE_TYPE (func_parm),
> func_parm,
> build_zero_cst (TREE_TYPE
> (func_parm)));
>new_expr = build_conditional_expr
> - (location, new_cond_expr, false, new_yes_expr,
> -  TREE_TYPE (new_yes_expr), new_no_expr, TREE_TYPE
> (new_no_expr));
> + (location, new_cond_expr, false,
> +  new_yes_expr, TREE_TYPE (new_yes_expr), location,
> +  new_no_expr, TREE_TYPE (new_no_expr), location);
>break;
>  case BUILT_IN_CILKPLUS_SEC_REDUCE_ANY_ZERO:
>new_var_init = build_modify_expr
> @@ -394,8 +396,9 @@ fix_builtin_array_notation_fn (tree
> an_builtin_fn, tree *new_var)
>new_cond_expr = build2 (EQ_EXPR, TREE_TYPE (func_parm),
> func_parm,
> build_zero_cst (TREE_TYPE
> (func_parm)));
>new_expr = build_conditional_expr
> - (location, new_cond_expr, false, new_yes_expr,
> -  TREE_TYPE (new_yes_expr), new_no_expr, TREE_TYPE
> (new_no_expr));   
> + (location, new_cond_expr, false,
> +  new_yes_expr, TREE_TYPE (new_yes_expr), location,
> +  new_no_expr, TREE_TYPE (new_no_expr), location);
>break;
>  case BUILT_IN_CILKPLUS_SEC_REDUCE_ANY_NONZERO:
>new_var_init = build_modify_expr
> @@ -413,8 +416,9 @@ fix_builtin_array_notation_fn (tree
> an_builtin_fn, tree *new_var)
>new_cond_expr = build2 (NE_EXPR, TREE_TYPE (func_parm),
> 

Re: [PATCH] libstdc++: Support std::is_aggregate on clang++ (was [cfe-dev] clang++: std::is_aggregate unusable with clang-5.0/libstdc++-7)

2017-07-31 Thread Tim Song
On Mon, Jul 31, 2017 at 10:53 AM, Jonathan Wakely  wrote:
> On 27/07/17 16:27 +0900, Katsuhiko Nishimra wrote:
>>
>> From 56c4a18d0d8c8ce7aa1239880138775e4db06645 Mon Sep 17 00:00:00 2001
>> From: Katsuhiko Nishimra 
>> Date: Thu, 27 Jul 2017 16:03:54 +0900
>> Subject: [PATCH] libstdc++: Support std::is_aggregate on clang++
>>
>> Currently, libstdc++ tries to detect __is_aggregate built-in macro using
>> __has_builtin, but this fails on clang++ because __has_builtin on
>> clang++ detects only built-in functions, not built-in macros. This patch
>> adds a test using __is_identifier. Tested on clang++
>> 5.0.0-svn308422-1~exp1 and g++ 7.1.0-10 from Debian unstable.
>> ---
>> libstdc++-v3/include/std/type_traits | 5 +
>> 1 file changed, 5 insertions(+)
>>
>> diff --git a/libstdc++-v3/include/std/type_traits
>> b/libstdc++-v3/include/std/type_traits
>> index 390b6f40a..e7ec402fb 100644
>> --- a/libstdc++-v3/include/std/type_traits
>> +++ b/libstdc++-v3/include/std/type_traits
>> @@ -2894,6 +2894,11 @@ template 
>>
>> #if __GNUC__ >= 7
>> # define _GLIBCXX_HAVE_BUILTIN_IS_AGGREGATE 1
>> +#elif defined(__is_identifier)
>> +// For clang
>> +# if ! __is_identifier(__is_aggregate)
>> +#  define _GLIBCXX_HAVE_BUILTIN_IS_AGGREGATE 1
>> +# endif
>> #elif defined __has_builtin
>> // For non-GNU compilers:
>> # if __has_builtin(__is_aggregate)
>
>
> This __has_bultin check only exists for Clang, so should be replaced
> by the correct __is_identifier check, not left there in addition to
> it.
>
>

https://clang.llvm.org/docs/LanguageExtensions.html#checks-for-type-trait-primitives
seems to suggest using __has_extension instead.


PING: [PATCH] PR driver/81523: Make -static override -pie

2017-07-31 Thread H.J. Lu
On Mon, Jul 24, 2017 at 10:24 AM, H.J. Lu  wrote:
> On Sun, Jul 23, 2017 at 8:14 AM, H.J. Lu  wrote:
>> -static and -pie together behave differently depending on whether GCC is
>> configured with --enable-default-pie.  On x86, "-static -pie" fails to
>> create executable when --enable-default-pie isn't used, but creates a
>> static executable when --enable-default-pie is used.  This patch makes
>> -static completely override -pie to create a static executable, regardless
>> if --enable-default-pie is used to configure GCC.
>>
>> OK for master?
>>
>> H.J.
>> --
>> 2017-07-23  Alan Modra  
>> H.J. Lu  
>>
>> gcc/
>>
>> PR driver/81523
>> * gcc.c (NO_PIE_SPEC): Delete.
>> (PIE_SPEC): Define as !no-pie/pie.  Move static|shared|r
>> exclusion..
>> (LINK_PIE_SPEC): ..to here.
>> * config/gnu-user.h (GNU_USER_TARGET_STARTFILE_SPEC): Correct
>> chain of crtbegin*.o selection, update for PIE_SPEC changes and
>> format.
>> (GNU_USER_TARGET_ENDFILE_SPEC): Similarly.
>> * config/sol2.h (STARTFILE_CRTBEGIN_SPEC): Similarly.
>> (ENDFILE_CRTEND_SPEC): Similarly.
>>
>
> We need to add %{no-pie:} to LINK_COMMAND_SPEC to prevent an error
> message when PIE isn't enabled by default.   Here is the updated patch with
> a testcase.
>

PING.  I am enclosing the patch here.

Thanks.


-- 
H.J.
From ea702c99286ab92a4b94f676d2340ce55fd173c3 Mon Sep 17 00:00:00 2001
From: Alan Modra 
Date: Thu, 20 Jul 2017 09:57:36 -0700
Subject: [PATCH] PR driver/81523: Make -static override -pie

-static and -pie together behave differently depending on whether GCC is
configured with --enable-default-pie.  On x86, "-static -pie" fails to
create executable when --enable-default-pie isn't used, but creates a
static executable when --enable-default-pie is used.  This patch makes
-static completely override -pie to create a static executable, regardless
if --enable-default-pie is used to configure GCC.

2017-07-24  Alan Modra  
	H.J. Lu  

gcc/

	PR driver/81523
	* gcc.c (NO_PIE_SPEC): Delete.
	(PIE_SPEC): Define as !no-pie/pie.  Move static|shared|r
	exclusion..
	(LINK_PIE_SPEC): ..to here.
	(LINK_COMMAND_SPEC): Support -no-pie.
	* config/gnu-user.h (GNU_USER_TARGET_STARTFILE_SPEC): Correct
	chain of crtbegin*.o selection, update for PIE_SPEC changes and
	format.
	(GNU_USER_TARGET_ENDFILE_SPEC): Similarly.
	* config/sol2.h (STARTFILE_CRTBEGIN_SPEC): Similarly.
	(ENDFILE_CRTEND_SPEC): Similarly.

gcc/testsuite/

	PR driver/81523
	* gcc.dg/pie-7.c: New test.
	* gcc.dg/pie-static-1.c: Likewise.
	* gcc.dg/pie-static-2.c: Likewise.
---
 gcc/config/gnu-user.h   | 34 --
 gcc/config/sol2.h   | 12 ++--
 gcc/gcc.c   | 10 +-
 gcc/testsuite/gcc.dg/pie-7.c|  7 +++
 gcc/testsuite/gcc.dg/pie-static-1.c |  7 +++
 gcc/testsuite/gcc.dg/pie-static-2.c |  7 +++
 6 files changed, 56 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pie-7.c
 create mode 100644 gcc/testsuite/gcc.dg/pie-static-1.c
 create mode 100644 gcc/testsuite/gcc.dg/pie-static-2.c

diff --git a/gcc/config/gnu-user.h b/gcc/config/gnu-user.h
index 2787a3d16be..de605b0c466 100644
--- a/gcc/config/gnu-user.h
+++ b/gcc/config/gnu-user.h
@@ -50,19 +50,28 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 
 #if defined HAVE_LD_PIE
 #define GNU_USER_TARGET_STARTFILE_SPEC \
-  "%{!shared: %{pg|p|profile:gcrt1.o%s;: \
-%{" PIE_SPEC ":Scrt1.o%s} %{" NO_PIE_SPEC ":crt1.o%s}}} \
-   crti.o%s %{static:crtbeginT.o%s;: %{shared:crtbeginS.o%s} \
-	  %{" PIE_SPEC ":crtbeginS.o%s} \
-	  %{" NO_PIE_SPEC ":crtbegin.o%s}} \
+  "%{shared:; \
+ pg|p|profile:gcrt1.o%s; \
+ static:crt1.o%s; \
+ " PIE_SPEC ":Scrt1.o%s; \
+ :crt1.o%s} \
+   crti.o%s \
+   %{static:crtbeginT.o%s; \
+ shared|" PIE_SPEC ":crtbeginS.o%s; \
+ :crtbegin.o%s} \
%{fvtable-verify=none:%s; \
  fvtable-verify=preinit:vtv_start_preinit.o%s; \
  fvtable-verify=std:vtv_start.o%s} \
" CRTOFFLOADBEGIN
 #else
 #define GNU_USER_TARGET_STARTFILE_SPEC \
-  "%{!shared: %{pg|p|profile:gcrt1.o%s;:crt1.o%s}} \
-   crti.o%s %{static:crtbeginT.o%s;shared|pie:crtbeginS.o%s;:crtbegin.o%s} \
+  "%{shared:; \
+ pg|p|profile:gcrt1.o%s; \
+ :crt1.o%s} \
+   crti.o%s \
+   %{static:crtbeginT.o%s; \
+ shared|pie:crtbeginS.o%s; \
+ :crtbegin.o%s} \
%{fvtable-verify=none:%s; \
  fvtable-verify=preinit:vtv_start_preinit.o%s; \
  fvtable-verify=std:vtv_start.o%s} \
@@ -82,15 +91,20 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
   "%{fvtable-verify=none:%s; \
  fvtable-verify=preinit:vtv_end_preinit.o%s; \
  fvtable-verify=std:vtv_end.o%s} \
-   %{shared:crtendS.o%s;: 

Re: [PATCH] libstdc++: Support std::is_aggregate on clang++ (was [cfe-dev] clang++: std::is_aggregate unusable with clang-5.0/libstdc++-7)

2017-07-31 Thread Jonathan Wakely

On 27/07/17 16:27 +0900, Katsuhiko Nishimra wrote:

From 56c4a18d0d8c8ce7aa1239880138775e4db06645 Mon Sep 17 00:00:00 2001
From: Katsuhiko Nishimra 
Date: Thu, 27 Jul 2017 16:03:54 +0900
Subject: [PATCH] libstdc++: Support std::is_aggregate on clang++

Currently, libstdc++ tries to detect __is_aggregate built-in macro using
__has_builtin, but this fails on clang++ because __has_builtin on
clang++ detects only built-in functions, not built-in macros. This patch
adds a test using __is_identifier. Tested on clang++
5.0.0-svn308422-1~exp1 and g++ 7.1.0-10 from Debian unstable.
---
libstdc++-v3/include/std/type_traits | 5 +
1 file changed, 5 insertions(+)

diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 390b6f40a..e7ec402fb 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -2894,6 +2894,11 @@ template 

#if __GNUC__ >= 7
# define _GLIBCXX_HAVE_BUILTIN_IS_AGGREGATE 1
+#elif defined(__is_identifier)
+// For clang
+# if ! __is_identifier(__is_aggregate)
+#  define _GLIBCXX_HAVE_BUILTIN_IS_AGGREGATE 1
+# endif
#elif defined __has_builtin
// For non-GNU compilers:
# if __has_builtin(__is_aggregate)


This __has_bultin check only exists for Clang, so should be replaced
by the correct __is_identifier check, not left there in addition to
it.




C PATCH to further improve diagnostic for -Wsign-compare (PR c/81417)

2017-07-31 Thread Marek Polacek
This patch improves the diagnostic of -Wsign-compare for ?: by also printing
the types, similarly to my recent patch.  But we can do even better here if we
actually point to the operand in question, so I passed the locations of the
operands from the parser.  So instead of 

x.c:8:16: warning: signed and unsigned type in conditional expression 
[-Wsign-compare]
   return x ? y : -1;
^
you'll now see:

x.c:8:18: warning: operand of conditional expression changes signedness: 'int' 
to 'unsigned int' [-Wsign-compare]
   return x ? y : -1;
  ^

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2017-07-31  Marek Polacek  

PR c/81417
* c-array-notation.c (fix_builtin_array_notation_fn): Update calls to
build_conditional_expr. 
* c-parser.c (c_parser_conditional_expression): Pass the locations of
OP1 and OP2 down to build_conditional_expr.
* c-tree.h (build_conditional_expr): Update declaration.
* c-typeck.c (build_conditional_expr): Add location_t parameters.
For -Wsign-compare, also print the types.

* objc-next-runtime-abi-02.c (build_v2_build_objc_method_call): Update
a call to build_conditional_expr.

* Wsign-compare-1.c: New test.
* gcc.dg/compare1.c: Adjust dg-bogus.
* gcc.dg/compare2.c: Likewise.
* gcc.dg/compare3.c: Likewise.
* gcc.dg/compare7.c: Likewise.
* gcc.dg/compare8.c: Likewise.
* gcc.dg/compare9.c: Likewise.
* gcc.dg/pr11492.c: Likewise.

diff --git gcc/c/c-array-notation.c gcc/c/c-array-notation.c
index e430f5c681b..40f1cfdabb8 100644
--- gcc/c/c-array-notation.c
+++ gcc/c/c-array-notation.c
@@ -355,8 +355,9 @@ fix_builtin_array_notation_fn (tree an_builtin_fn, tree 
*new_var)
   new_cond_expr = build2 (NE_EXPR, TREE_TYPE (func_parm), func_parm,
  build_zero_cst (TREE_TYPE (func_parm)));
   new_expr = build_conditional_expr
-   (location, new_cond_expr, false, new_yes_expr,
-TREE_TYPE (new_yes_expr), new_no_expr, TREE_TYPE (new_no_expr));
+   (location, new_cond_expr, false,
+new_yes_expr, TREE_TYPE (new_yes_expr), location,
+new_no_expr, TREE_TYPE (new_no_expr), location);
   break;
 case BUILT_IN_CILKPLUS_SEC_REDUCE_ALL_NONZERO:
   new_var_init = build_modify_expr
@@ -375,8 +376,9 @@ fix_builtin_array_notation_fn (tree an_builtin_fn, tree 
*new_var)
   new_cond_expr = build2 (EQ_EXPR, TREE_TYPE (func_parm), func_parm,
  build_zero_cst (TREE_TYPE (func_parm)));
   new_expr = build_conditional_expr
-   (location, new_cond_expr, false, new_yes_expr,
-TREE_TYPE (new_yes_expr), new_no_expr, TREE_TYPE (new_no_expr));
+   (location, new_cond_expr, false,
+new_yes_expr, TREE_TYPE (new_yes_expr), location,
+new_no_expr, TREE_TYPE (new_no_expr), location);
   break;
 case BUILT_IN_CILKPLUS_SEC_REDUCE_ANY_ZERO:
   new_var_init = build_modify_expr
@@ -394,8 +396,9 @@ fix_builtin_array_notation_fn (tree an_builtin_fn, tree 
*new_var)
   new_cond_expr = build2 (EQ_EXPR, TREE_TYPE (func_parm), func_parm,
  build_zero_cst (TREE_TYPE (func_parm)));
   new_expr = build_conditional_expr
-   (location, new_cond_expr, false, new_yes_expr,
-TREE_TYPE (new_yes_expr), new_no_expr, TREE_TYPE (new_no_expr));   
+   (location, new_cond_expr, false,
+new_yes_expr, TREE_TYPE (new_yes_expr), location,
+new_no_expr, TREE_TYPE (new_no_expr), location);
   break;
 case BUILT_IN_CILKPLUS_SEC_REDUCE_ANY_NONZERO:
   new_var_init = build_modify_expr
@@ -413,8 +416,9 @@ fix_builtin_array_notation_fn (tree an_builtin_fn, tree 
*new_var)
   new_cond_expr = build2 (NE_EXPR, TREE_TYPE (func_parm), func_parm,
  build_zero_cst (TREE_TYPE (func_parm)));
   new_expr = build_conditional_expr
-   (location, new_cond_expr, false, new_yes_expr,
-TREE_TYPE (new_yes_expr), new_no_expr, TREE_TYPE (new_no_expr));   
+   (location, new_cond_expr, false,
+new_yes_expr, TREE_TYPE (new_yes_expr), location,
+new_no_expr, TREE_TYPE (new_no_expr), location);
   break;
 case BUILT_IN_CILKPLUS_SEC_REDUCE_MAX:
   if (TYPE_MIN_VALUE (new_var_type))
@@ -434,7 +438,8 @@ fix_builtin_array_notation_fn (tree an_builtin_fn, tree 
*new_var)
   new_expr = build_conditional_expr
(location,
 build2 (LT_EXPR, TREE_TYPE (*new_var), *new_var, func_parm), false,
-new_yes_expr, TREE_TYPE (*new_var), new_no_expr, TREE_TYPE (*new_var));
+new_yes_expr, TREE_TYPE (*new_var), location,
+new_no_expr, TREE_TYPE (*new_var), location);
   break;
 case BUILT_IN_CILKPLUS_SEC_REDUCE_MIN:
   if (TYPE_MAX_VALUE (new_var_type))
@@ -454,7 +459,8 @@ fix_builtin_array_notation_fn (tree an_builtin_fn, tree 
*new_var)
  

[PATCH PR81267]Rewrite into loop closed ssa form in case of any store-store chain

2017-07-31 Thread Bin Cheng
Hi,
This simple patch fixes the ICE by rewriting into loop closed ssa form in case
of any store-store chain.  We maybe able to avoid that for some cases that
eliminated stores only store loop invariant values, but only with more checks
when inserting final store instructions.
Bootstrap and test on x86_64 and AArch64 ongoing.  Is it OK?

Thanks,
bin
2017-07-31  Bin Cheng  <bin.ch...@arm.com>

PR tree-optimization/81627
* tree-predcom.c (prepare_finalizers): Always rewrite into loop
closed ssa form for store-store chain.

gcc/testsuite/ChangeLog
2017-07-31  Bin Cheng  <bin.ch...@arm.com>

PR tree-optimization/81627
* gcc.dg/tree-ssa/pr81627.c: New.From d366015187de926a8fe3248325b229bed99b27b5 Mon Sep 17 00:00:00 2001
From: Bin Cheng <binch...@e108451-lin.cambridge.arm.com>
Date: Mon, 31 Jul 2017 11:16:44 +0100
Subject: [PATCH 2/2] pr81627-20170731.txt

---
 gcc/testsuite/gcc.dg/tree-ssa/pr81627.c | 28 
 gcc/tree-predcom.c  | 10 +-
 2 files changed, 33 insertions(+), 5 deletions(-)
 create mode 100755 gcc/testsuite/gcc.dg/tree-ssa/pr81627.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr81627.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr81627.c
new file mode 100755
index 000..7421c49
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr81627.c
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -fno-tree-loop-vectorize -fdump-tree-pcom-details" } */
+
+int a, b, c, d[6], e = 3, f;
+
+void abort (void);
+void fn1 ()
+{
+  for (b = 1; b < 5; b++)
+{
+  for (c = 0; c < 5; c++)
+d[b] = e;
+  if (a)
+f++;
+  d[b + 1] = 1;
+}
+}
+
+int main ()
+{
+  fn1 ();
+  if (d[0] != 0 || d[1] != 3 || d[2] != 3
+  || d[3] != 3 || d[4] != 3 || d[5] != 1)
+abort ();
+
+  return 0;
+}
+/* { dg-final { scan-tree-dump-times "Store-stores chain" 1 "pcom" } } */
diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c
index f7a57a4..4538773 100644
--- a/gcc/tree-predcom.c
+++ b/gcc/tree-predcom.c
@@ -2983,11 +2983,11 @@ prepare_finalizers (struct loop *loop, vec 
chains)
   if (prepare_finalizers_chain (loop, chain))
{
  i++;
- /* We don't corrupt loop closed ssa form for store elimination
-chain if eliminated stores only store loop invariant values
-into memory.  */
- if (!chain->inv_store_elimination)
-   loop_closed_ssa |= (!chain->inv_store_elimination);
+ /* Be conservative, assume loop closed ssa form is corrupted
+by store-store chain.  Though it's not always the case if
+eliminated stores only store loop invariant values into
+memory.  */
+ loop_closed_ssa = true;
}
   else
{
-- 
1.9.1



[PATCH PR81620]Don't set has_max_use_after flag for store-store chain

2017-07-31 Thread Bin Cheng
Hi,
This simple patch fixes the ICE by not setting has_max_use_after flag for
store-store chain because there is no use at all.
Bootstrap and test on x86_64 and AArch64 ongoing.  Is it OK if no failure?

Thanks,
bin
2017-07-31  Bin Cheng  <bin.ch...@arm.com>

PR tree-optimization/81620
* tree-predcom.c (add_ref_to_chain): Don't set has_max_use_after
for store-store chain.

gcc/testsuite/ChangeLog
2017-07-31  Bin Cheng  <bin.ch...@arm.com>

PR tree-optimization/81620
* gcc.dg/tree-ssa/pr81620-1.c: New.
* gcc.dg/tree-ssa/pr81620-2.c: New.From 4e8f67bb1cc09ef475f9cfbb8e847f9f422c3e44 Mon Sep 17 00:00:00 2001
From: Bin Cheng <binch...@e108451-lin.cambridge.arm.com>
Date: Mon, 31 Jul 2017 10:24:07 +0100
Subject: [PATCH 1/2] pr81620-20170731.txt

---
 gcc/testsuite/gcc.dg/tree-ssa/pr81620-1.c | 20 
 gcc/testsuite/gcc.dg/tree-ssa/pr81620-2.c | 25 +
 gcc/tree-predcom.c|  4 +++-
 3 files changed, 48 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr81620-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr81620-2.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr81620-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr81620-1.c
new file mode 100644
index 000..f8f2dd8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr81620-1.c
@@ -0,0 +1,20 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -fno-tree-loop-vectorize -fdump-tree-pcom-details" } */
+
+int a[7];
+char b;
+void abort (void);
+
+int main() {
+  b = 4;
+  for (; b; b--) {
+a[b] = b;
+a[b + 2] = 1;
+  }
+  if (a[0] != 0 || a[1] != 1 || a[2] != 2
+  || a[3] != 1 || a[4] != 1 || a[5] != 1 || a[6] != 1)
+abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Store-stores chain" 1 "pcom" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr81620-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr81620-2.c
new file mode 100644
index 000..85a8e35
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr81620-2.c
@@ -0,0 +1,25 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -fno-tree-loop-vectorize -fdump-tree-pcom-details" } */
+
+int a[200];
+char b;
+void abort (void);
+
+int main() {
+  int i;
+  b = 100;
+  for (; b; b--) {
+a[b] = 2;
+a[b + 2] = 1;
+  }
+
+  if (a[0] != 0 || a[1] != 2 || a[2] != 2)
+abort ();
+  for (i = 3; i < 103; i++)
+if (a[i] != 1)
+abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Store-stores chain" 1 "pcom" } } */
diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c
index a4011bf..f7a57a4 100644
--- a/gcc/tree-predcom.c
+++ b/gcc/tree-predcom.c
@@ -1069,7 +1069,9 @@ add_ref_to_chain (chain_p chain, dref ref)
   chain->has_max_use_after = false;
 }
 
-  if (ref->distance == chain->length
+  /* Don't set the flag for store-store chain since there is no use.  */
+  if (chain->type != CT_STORE_STORE
+  && ref->distance == chain->length
   && ref->pos > root->pos)
 chain->has_max_use_after = true;
 
-- 
1.9.1



Re: [PATCH 3/6] [i386] Remove machine_function::call_ms2sysv_pad_out

2017-07-31 Thread Uros Bizjak
On Mon, Jul 31, 2017 at 1:24 PM, Daniel Santos  wrote:
> The -mcall-ms2sysv-xlogues project added the boolean fields
> call_ms2sysv_pad_in and call_ms2sysv_pad_out to struct machine_function
> to track rather or not an additional 8 bytes of padding was needed for
> stack alignment prior to and after the stub save area.  This design was
> based upon the faulty assumption the function body would not require a
> stack alignment greater than 16 bytes.  This continues to work well for
> managing padding prior to the stub save area, but will not work for the
> outgoing alignment.
>
> Rather than changing machine_function::call_ms2sysv_pad_out to a larger
> type, this patch removes it, thus transferring responsibility for stack
> alignment following the stub save area from class xlogue_layout to the
> body of ix86_compute_frame_layout.  Since the 64-bit va_arg register
> save area is always a multiple of 16-bytes in size (176 for System V ABI
> and 96 for Microsoft ABI), the ROUND_UP calculation for the stack offset
> at the start of the function body (frame.frame_pointer_offset) will
> assure there is enough room for any padding needed to keep the save area
> for SSE va_args 16-byte aligned, so no modification is needed for that
> calculation.
>
> Signed-off-by: Daniel Santos 

LGTM.

OK for mainline.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386.c | 18 --
>  gcc/config/i386/i386.h |  8 ++--
>  2 files changed, 6 insertions(+), 20 deletions(-)
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 47c5608c3cd..e2e9546a27c 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -2491,9 +2491,7 @@ public:
>  unsigned last_reg = m->call_ms2sysv_extra_regs + MIN_REGS - 1;
>
>  gcc_assert (m->call_ms2sysv_extra_regs <= MAX_EXTRA_REGS);
> -return m_regs[last_reg].offset
> -  + (m->call_ms2sysv_pad_out ? 8 : 0)
> -  + STUB_INDEX_OFFSET;
> +return m_regs[last_reg].offset + STUB_INDEX_OFFSET;
>}
>
>/* Returns the offset for the base pointer used by the stub.  */
> @@ -12849,13 +12847,12 @@ ix86_compute_frame_layout (void)
> {
>   unsigned count = xlogue_layout::count_stub_managed_regs ();
>   m->call_ms2sysv_extra_regs = count - xlogue_layout::MIN_REGS;
> + m->call_ms2sysv_pad_in = 0;
> }
>  }
>
>frame->nregs = ix86_nsaved_regs ();
>frame->nsseregs = ix86_nsaved_sseregs ();
> -  m->call_ms2sysv_pad_in = 0;
> -  m->call_ms2sysv_pad_out = 0;
>
>/* 64-bit MS ABI seem to require stack alignment to be always 16,
>   except for function prologues, leaf functions and when the defult
> @@ -12957,15 +12954,7 @@ ix86_compute_frame_layout (void)
>gcc_assert (!frame->nsseregs);
>
>m->call_ms2sysv_pad_in = !!(offset & UNITS_PER_WORD);
> -
> -  /* Select an appropriate layout for incoming stack offset.  */
> -  const struct xlogue_layout  = xlogue_layout::get_instance ();
> -
> -  if ((offset + xlogue.get_stack_space_used ()) & UNITS_PER_WORD)
> -   m->call_ms2sysv_pad_out = 1;
> -
> -  offset += xlogue.get_stack_space_used ();
> -  gcc_assert (!(offset & 0xf));
> +  offset += xlogue_layout::get_instance ().get_stack_space_used ();
>  }
>
>/* Align and set SSE register save area.  */
> @@ -12993,6 +12982,7 @@ ix86_compute_frame_layout (void)
>
>/* Align start of frame for local function.  */
>if (stack_realign_fp
> +  || m->call_ms2sysv
>|| offset != frame->sse_reg_save_offset
>|| size != 0
>|| !crtl->is_leaf
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index 1648bdf1556..b08e45f68d4 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -2646,17 +2646,13 @@ struct GTY(()) machine_function {
>BOOL_BITFIELD arg_reg_available : 1;
>
>/* If true, we're out-of-lining reg save/restore for regs clobbered
> - by ms_abi functions calling a sysv function.  */
> + by 64-bit ms_abi functions calling a sysv_abi function.  */
>BOOL_BITFIELD call_ms2sysv : 1;
>
>/* If true, the incoming 16-byte aligned stack has an offset (of 8) and
> - needs padding.  */
> + needs padding prior to out-of-line stub save/restore area.  */
>BOOL_BITFIELD call_ms2sysv_pad_in : 1;
>
> -  /* If true, the size of the stub save area plus inline int reg saves will
> - result in an 8 byte offset, so needs padding.  */
> -  BOOL_BITFIELD call_ms2sysv_pad_out : 1;
> -
>/* This is the number of extra registers saved by stub (valid range is
>   0-6). Each additional register is only saved/restored by the stubs
>   if all successive ones are. (Will always be zero when using a hard
> --
> 2.13.3
>


Re: [PATCH 2/6] [i386] Remove ix86_frame::outlined_save_offset

2017-07-31 Thread Uros Bizjak
On Mon, Jul 31, 2017 at 1:24 PM, Daniel Santos  wrote:
> This value was used in an earlier incarnation of the
> -mcall-ms2sysv-xlogues patch set but is now set and never read.  The
> value of ix86_frame::sse_reg_save_offset serves the same purpose.

OK as obvious patch.

Thanks,
Uros.

> Signed-off-by: Daniel Santos 
> ---
>  gcc/config/i386/i386.c | 1 -
>  gcc/config/i386/i386.h | 4 +---
>  2 files changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 690631dfe43..47c5608c3cd 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -12966,7 +12966,6 @@ ix86_compute_frame_layout (void)
>
>offset += xlogue.get_stack_space_used ();
>gcc_assert (!(offset & 0xf));
> -  frame->outlined_save_offset = offset;
>  }
>
>/* Align and set SSE register save area.  */
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index ce5bb7f6677..1648bdf1556 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -2477,8 +2477,7 @@ enum avx_u128_state
> <- end of stub-saved/restored regs
>   [padding1]
> ]
> -   <- outlined_save_offset
> -   <- sse_regs_save_offset
> +   <- sse_reg_save_offset
> [padding2]
>|<- FRAME_POINTER
> [va_arg registers]  |
> @@ -2504,7 +2503,6 @@ struct GTY(()) ix86_frame
>HOST_WIDE_INT reg_save_offset;
>HOST_WIDE_INT stack_realign_allocate_offset;
>HOST_WIDE_INT stack_realign_offset;
> -  HOST_WIDE_INT outlined_save_offset;
>HOST_WIDE_INT sse_reg_save_offset;
>
>/* When save_regs_using_mov is set, emit prologue using
> --
> 2.13.3
>


Re: [PATCH] Compile pr79793-[12].c with -mtune=generic

2017-07-31 Thread Uros Bizjak
On Mon, Jul 31, 2017 at 3:47 PM, H.J. Lu  wrote:
> pr79793-1.c and pr79793-2.c are failed when GCC is configured with
> --with-cpu=slm since lea is used to adjust stack, instead of sub/add.
> This patch uses -mtune=generic to always generate sub and add.
>
> OK for trunk?

OK.

Thanks,
Uros.

> H.J.
> 
> * gcc.target/i386/pr79793-1.c: Compile with -mtune=generic.
> * gcc.target/i386/pr79793-2.c: Likewise.
> ---
>  gcc/testsuite/gcc.target/i386/pr79793-1.c | 2 +-
>  gcc/testsuite/gcc.target/i386/pr79793-2.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr79793-1.c 
> b/gcc/testsuite/gcc.target/i386/pr79793-1.c
> index a382fe9c5e2..1cc67a83ba3 100644
> --- a/gcc/testsuite/gcc.target/i386/pr79793-1.c
> +++ b/gcc/testsuite/gcc.target/i386/pr79793-1.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target { *-*-linux* && { ! ia32 } } } } */
> -/* { dg-options "-O2 -mgeneral-regs-only" } */
> +/* { dg-options "-O2 -mgeneral-regs-only -mtune=generic" } */
>
>  void
>   __attribute__ ((interrupt))
> diff --git a/gcc/testsuite/gcc.target/i386/pr79793-2.c 
> b/gcc/testsuite/gcc.target/i386/pr79793-2.c
> index f6ae5aed33a..e1e6463e120 100644
> --- a/gcc/testsuite/gcc.target/i386/pr79793-2.c
> +++ b/gcc/testsuite/gcc.target/i386/pr79793-2.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target { *-*-linux* && { ! ia32 } } } } */
> -/* { dg-options "-O2 -mgeneral-regs-only" } */
> +/* { dg-options "-O2 -mgeneral-regs-only -mtune=generic" } */
>
>  typedef unsigned int uword_t __attribute__ ((mode (__word__)));
>
> --
> 2.13.3
>


[PATCH] Compile pr79793-[12].c with -mtune=generic

2017-07-31 Thread H.J. Lu
pr79793-1.c and pr79793-2.c are failed when GCC is configured with
--with-cpu=slm since lea is used to adjust stack, instead of sub/add.
This patch uses -mtune=generic to always generate sub and add.

OK for trunk?

H.J.

* gcc.target/i386/pr79793-1.c: Compile with -mtune=generic.
* gcc.target/i386/pr79793-2.c: Likewise.
---
 gcc/testsuite/gcc.target/i386/pr79793-1.c | 2 +-
 gcc/testsuite/gcc.target/i386/pr79793-2.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr79793-1.c 
b/gcc/testsuite/gcc.target/i386/pr79793-1.c
index a382fe9c5e2..1cc67a83ba3 100644
--- a/gcc/testsuite/gcc.target/i386/pr79793-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr79793-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { *-*-linux* && { ! ia32 } } } } */
-/* { dg-options "-O2 -mgeneral-regs-only" } */
+/* { dg-options "-O2 -mgeneral-regs-only -mtune=generic" } */
 
 void
  __attribute__ ((interrupt))
diff --git a/gcc/testsuite/gcc.target/i386/pr79793-2.c 
b/gcc/testsuite/gcc.target/i386/pr79793-2.c
index f6ae5aed33a..e1e6463e120 100644
--- a/gcc/testsuite/gcc.target/i386/pr79793-2.c
+++ b/gcc/testsuite/gcc.target/i386/pr79793-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { *-*-linux* && { ! ia32 } } } } */
-/* { dg-options "-O2 -mgeneral-regs-only" } */
+/* { dg-options "-O2 -mgeneral-regs-only -mtune=generic" } */
 
 typedef unsigned int uword_t __attribute__ ((mode (__word__)));
 
-- 
2.13.3



Re: [PATCH] Fix typo in std::stack (PR libstdc++/81599)

2017-07-31 Thread Marek Polacek
On Mon, Jul 31, 2017 at 04:37:19PM +0300, Ville Voutilainen wrote:
> On 31 July 2017 at 16:25, Marek Polacek  wrote:
> > The documentation of std::stack says that the underlying container must 
> > support
> > pop_front, but that is wrong, it meant to say pop_back, so this patch fixes
> > that.
> 
> Indeed, the documentation has a copy-pasto originating from bits/stl_queue.h.
> 
> > Ok for trunk?
> 
> 
> I can't approve the patch, but I suggest committing it as obvious.

Yea, will do.  Thanks,

Marek


Re: [PATCH] Fix typo in std::stack (PR libstdc++/81599)

2017-07-31 Thread Ville Voutilainen
On 31 July 2017 at 16:25, Marek Polacek  wrote:
> The documentation of std::stack says that the underlying container must 
> support
> pop_front, but that is wrong, it meant to say pop_back, so this patch fixes
> that.

Indeed, the documentation has a copy-pasto originating from bits/stl_queue.h.

> Ok for trunk?


I can't approve the patch, but I suggest committing it as obvious.


[WWW PATCH]: Mention that x86 now supports "naked" function attribute.

2017-07-31 Thread Uros Bizjak
One liner that mentions new addition to x86 port.

OK for wwwdocs?

Uros.
Index: htdocs/gcc-8/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-8/changes.html,v
retrieving revision 1.7
diff -u -r1.7 changes.html
--- htdocs/gcc-8/changes.html   3 Jul 2017 16:37:04 -   1.7
+++ htdocs/gcc-8/changes.html   31 Jul 2017 13:26:13 -
@@ -119,7 +119,8 @@
 
 IA-32/x86-64
 
-  
+  
+The x86 port now supports the naked function attribute.
 
 
 


[PATCH] Fix typo in std::stack (PR libstdc++/81599)

2017-07-31 Thread Marek Polacek
The documentation of std::stack says that the underlying container must support
pop_front, but that is wrong, it meant to say pop_back, so this patch fixes
that.

Ok for trunk?

2017-07-31  Marek Polacek  

PR libstdc++/81599
* include/bits/stl_stack.h: Fix typo.

diff --git gcc/include/bits/stl_stack.h gcc/include/bits/stl_stack.h
index ac59ec715cf..5f2b4ab4486 100644
--- gcc/include/bits/stl_stack.h
+++ gcc/include/bits/stl_stack.h
@@ -86,7 +86,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*
*  The second template parameter defines the type of the underlying
*  sequence/container.  It defaults to std::deque, but it can be
-   *  any type that supports @c back, @c push_back, and @c pop_front,
+   *  any type that supports @c back, @c push_back, and @c pop_back,
*  such as std::list, std::vector, or an appropriate user-defined
*  type.
*

Marek


Re: [PATCH] Fix PR81354 (rewrite gimple_split_edge)

2017-07-31 Thread Bill Schmidt
That would certainly be much simpler!  I'll regstrap it and test it on the other
occurrence I've found to be certain.

-- Bill

> On Jul 31, 2017, at 4:15 AM, Richard Biener  
> wrote:
> 
> On Sun, Jul 30, 2017 at 8:04 PM, Bill Schmidt
>  wrote:
>> Hi,
>> 
>> PR81354 identifies a latent bug that can happen in SLSR since the
>> conditional candidate support was first added.  SLSR relies on the
>> address of a GIMPLE PHI remaining constant during the course of the
>> optimization pass, but it needs to split edges.  The use of
>> make_single_succ_edge and reinstall_phi_args in gimple_split_edge
>> causes GIMPLE PHI statements to be temporarily expanded to add a
>> predecessor, and then rebuilt to have the original number of
>> predecessors.  The expansion usually, if not always, causes the PHI
>> statement to change address.  Thus gimple_split_edge needs to be
>> rewritten to perform in-situ replacement of PHI arguments.
>> 
>> The required pieces of make_single_succ_edge have been extracted into
>> two places:  make_replacement_pred_edge, and some fixup code at the
>> end of gimple_split_edge.  The division is necessary because the
>> destination of the original edge must remember its original
>> predecessors for the switch processing in
>> gimple_redirect_edge_and_branch_1 to work properly.
>> 
>> The function gimple_redirect_edge_and_branch was factored into two
>> pieces so that most of it can be used by gimple_split_edge without
>> calling ssa_redirect_edge, which also interferes with PHIs.  The
>> useful bits of ssa_redirect_edge are factored out into the next three
>> lines of gimple_split_edge.
>> 
>> Similarly, redirect_eh_edge had already been similarly factored into
>> redirect_eh_edge_1 and ssa_redirect_edge.  I took advantage of that
>> and exposed redirect_eh_edge_1 for use in gimple_redirect_edge_and_branch_1.
>> 
>> I've added the test from PR81354 as a torture test, but as we've seen,
>> small changes elsewhere in the optimizer can easily hide the problem.
>> 
>> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.
>> Is this ok for trunk?  Eventually this needs to be backported to GCC 5,
>> 6, and 7 if that's acceptable, since PR81354 was observed on
>> gcc-5-branch.  I haven't yet prepared the backports.
> 
> I don't like make_replacement_pred_edge too much.  Wouldn't it work
> to make sure we first shrink and then re-grow like if we simply do the
> redirect_edge_and_branch before the make_single_succ_edge call?
> At least quick testing shows it fixes the testcase on the GCC 6 branch for me.
> 
> Index: gcc/tree-cfg.c
> ===
> --- gcc/tree-cfg.c  (revision 250732)
> +++ gcc/tree-cfg.c  (working copy)
> @@ -2753,12 +2753,16 @@ gimple_split_edge (edge edge_in)
>   new_bb = create_empty_bb (after_bb);
>   new_bb->frequency = EDGE_FREQUENCY (edge_in);
>   new_bb->count = edge_in->count;
> +
> +  /* First redirect the existing edge to avoid reallocating
> + PHI nodes in dest.  */
> +  e = redirect_edge_and_branch (edge_in, new_bb);
> +  gcc_assert (e == edge_in);
> +
>   new_edge = make_edge (new_bb, dest, EDGE_FALLTHRU);
>   new_edge->probability = REG_BR_PROB_BASE;
>   new_edge->count = edge_in->count;
> 
> -  e = redirect_edge_and_branch (edge_in, new_bb);
> -  gcc_assert (e == edge_in);
>   reinstall_phi_args (new_edge, e);
> 
>   return new_bb;
> 
> Sorry for misleading you to a complex solution.
> 
> Thanks,
> Richard.
> 
>> Thanks,
>> Bill
>> 
>> 
>> [gcc]
>> 
>> 2017-07-30  Bill Schmidt  
>> 
>>PR tree-optimization/81354
>>* tree-cfg.c (gimple_redirect_edge_and_branch_1): New decl.
>>(reinstall_phi_args): Delete function.
>>(make_replacement_pred_edge): New function.
>>(gimple_split_edge): Rewrite.
>>(gimple_redirect_edge_and_branch_1): New function, factored
>>from...
>>(gimple_redirect_edge_and_branch): ...here.
>>(split_critical_edges): Don't re-split already split edges.
>>* tree-eh.c (redirect_eh_edge_1): Make visible.
>>* tree-eh.h (redirect_eh_edge_1): Likewise.
>> 
>> [gcc/testsuite]
>> 
>> 2017-07-30  Bill Schmidt  
>> 
>>PR tree-optimization/81354
>>* g++.dg/torture/pr81354.C: New file.
>> 
>> 
>> Index: gcc/testsuite/g++.dg/torture/pr81354.C
>> ===
>> --- gcc/testsuite/g++.dg/torture/pr81354.C  (nonexistent)
>> +++ gcc/testsuite/g++.dg/torture/pr81354.C  (working copy)
>> @@ -0,0 +1,24 @@
>> +// PR81354 reported this test as crashing in a limited range of revisions.
>> +// { dg-do compile }
>> +
>> +struct T { double a; double b; };
>> +
>> +void foo(T Ad[], int As[2])
>> +{
>> +  int j;
>> +  int i;
>> +  int Bs[2] = {0,0};
>> +  T Bd[16];
>> +
>> +  for (j = 0; j < 4; j++) {
>> +for (i = 0; i + 

Re: [PATCH, i386]: Implement attribute ((naked))

2017-07-31 Thread Uros Bizjak
On Sun, Jul 30, 2017 at 10:14 PM, Uros Bizjak  wrote:
> Hello!
>
> attribute ((naked)) generates function body without function frame,
> and as shown in PR 25967 [1], users are looking for this feature also
> for x86 targets. Recently, Daniel introduced a testcase that would
> benefit from this attribute.

Following additional patch enables passing arguments to a naked
function. Testcases gcc.target/i386/naked-3.c and
gcc.target/i386/naked-4.c show necessary decorations (i.e. mregparm
for -m32 and volatile "ret" for function result) to reliably pass
function arguments to and function result from naked functions.

2017-07-31  Uros Bizjak  

PR target/25967
* config/i386/i386.c (ix86_allocate_stack_slots_for_args):
New function.
(TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS): Define.

testsuite/ChangeLog:

2017-07-31  Uros Bizjak  

PR target/25967
* gcc.target/i386/naked-3.c (dg-options): Use -O0.
(naked): Add attribute regparm(1) for x86_32 targets.
Add integer argument.  Remove global "data" variable.
(main): Pass integer argument to naked function.
* gcc.target/i386/naked-4.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {-m32}.

Committed to mainline SVN.

Uros.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 250736)
+++ config/i386/i386.c  (working copy)
@@ -31676,6 +31676,13 @@ ix86_trampoline_init (rtx m_tramp, tree fndecl, rt
 }
 
 static bool
+ix86_allocate_stack_slots_for_args (void)
+{
+  /* Naked functions should not allocate stack slots for arguments.  */
+  return !ix86_function_naked (current_function_decl);
+}
+
+static bool
 ix86_warn_func_return (tree decl)
 {
   /* Naked functions are implemented entirely in assembly, including the
@@ -52727,6 +52734,8 @@ ix86_run_selftests (void)
 #define TARGET_SETUP_INCOMING_VARARGS ix86_setup_incoming_varargs
 #undef TARGET_MUST_PASS_IN_STACK
 #define TARGET_MUST_PASS_IN_STACK ix86_must_pass_in_stack
+#undef TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS
+#define TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS ix86_allocate_stack_slots_for_args
 #undef TARGET_FUNCTION_ARG_ADVANCE
 #define TARGET_FUNCTION_ARG_ADVANCE ix86_function_arg_advance
 #undef TARGET_FUNCTION_ARG
Index: testsuite/gcc.target/i386/naked-3.c
===
--- testsuite/gcc.target/i386/naked-3.c (revision 250736)
+++ testsuite/gcc.target/i386/naked-3.c (working copy)
@@ -1,17 +1,18 @@
 /* { dg-do run { target *-*-linux* *-*-gnu* } } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O0" } */
 
 #include 
 #include 
 #include 
 
-int data;
-
 /* Verify that naked function traps at the end.  */
 
 void
 __attribute__((naked, noinline, noclone))
-naked (void)
+#ifdef __i386__
+__attribute__((regparm(1)))
+#endif
+naked (int data)
 {
   if (data == 0x12345678)
 return;
@@ -32,8 +33,7 @@ int main ()
   s.sa_flags = 0;
   sigaction (SIGILL, , NULL);
 
-  data = 0x12345678;
-  naked ();
+  naked (0x12345678);
 
   abort ();
 }
Index: testsuite/gcc.target/i386/naked-4.c
===
--- testsuite/gcc.target/i386/naked-4.c (nonexistent)
+++ testsuite/gcc.target/i386/naked-4.c (working copy)
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O0" } */
+/* { dg-additional-options "-mregparm=3" { target ia32 } } */
+
+/* Verify that __attribute__((naked)) produces a naked function 
+   that does not allocate stack slots for args.  */
+extern void bar (int);
+
+int
+__attribute__((naked))
+foo (int a, int b, int c)
+{
+  bar (c);
+  asm volatile ("ret" :: "a" (b));
+}
+
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */


[PATCH, committed] Add myself to MAINTAINERS

2017-07-31 Thread Robin Dapp
ChangeLog:

2017-07-31  Robin Dapp  

* MAINTAINERS (write after approval): Add myself.
Index: MAINTAINERS
===
--- MAINTAINERS	(revision 250740)
+++ MAINTAINERS	(working copy)
@@ -356,6 +356,7 @@
 Lawrence Crowl	
 Ian Dall	
 David Daney	
+Robin Dapp	
 Simon Dardis	
 Bud Davis	
 Chris Demetriou	


Re: [PATCH v12] add -fpatchable-function-entry=N,M option

2017-07-31 Thread Maxim Kuvyrkov
On Jul 26, 2017, at 5:33 PM, Andreas Schwab  wrote:
> 
> On Jul 26 2017, Torsten Duwe  wrote:
> 
>> On Wed, Jul 26, 2017 at 04:16:25PM +0200, Andreas Schwab wrote:
>>> On Jul 07 2017, Torsten Duwe  wrote:
>>> 
 diff --git a/gcc/testsuite/c-c++-common/patchable_function_entry-decl.c 
 b/gcc/testsuite/c-c++-common/patchable_function_entry-decl.c
 new file mode 100644
 index 000..8514b10e820
 --- /dev/null
 +++ b/gcc/testsuite/c-c++-common/patchable_function_entry-decl.c
 @@ -0,0 +1,16 @@
 +/* { dg-do compile } */
 +/* { dg-options "-O2 -fpatchable-function-entry=3,1" } */
 +/* { dg-final { scan-assembler-times "nop" 2 } } */
>>> 
>>> This fails on ia64.
>> 
>> The solution is fairly obvious: on architectures where the nop is not called
>> "nop" provide a custom, cpu-specific test, or document the failure.
> 
> But on ia64, a nop _is_ called nop.

The problem here is that ia64 backend emits "nop" instructions to pad IA64 
bundles.  The 2 nops at the beginning are [as expected] from the patchable 
attribute, but [unexpected] nops after ld8.mov and before "add r8" are 
generated by ia64 bundle packing.

nop 0
nop 0
.prologue
.body
.mmi
addl r14 = @ltoffx(a#), r1
;;
ld8.mov r14 = [r14], a#
nop 0
;;
.mmi
ld4 r14 = [r14]
;;
shladd r8 = r14, 2, r0
nop 0
;;
.mib
nop 0
add r8 = r8, r14
br.ret.sptk.many b0

I don't see an easy way to correctly differentiate between "attribute" nops and 
"bundle" nops, so XFAILing these tests on ia64 seems like a valid approach.

I speculate that other tests fail on ia64 for the same reason, but I didn't 
check.

--
Maxim Kuvyrkov
www.linaro.org





Re: [PATCH] Make inlining consistent in LTO and non-LTO mode (PR target/71991).

2017-07-31 Thread Martin Liška
Honza?

Thanks,
Martin

On 06/30/2017 03:50 PM, Martin Liška wrote:
> On 06/28/2017 05:18 PM, Jan Hubicka wrote:
>>> On 06/28/2017 04:24 PM, Jan Hubicka wrote:
> -  /* If callee has no option attributes, then it is ok to inline.  */
> -  if (!callee_tree)
> +  /* If callee has no option attributes (or default),
> + then it is ok to inline.  */
> +  if (!callee_tree || callee_tree == target_option_default_node)

 I am not sure this actually makes sense, because 
 target_option_default_node is not very
 meaningful for LTO (it contains whatever was passed to LTO driver). 
>>>
>>> I see!
>>>
>>>  Perhaps one can check
 for explicit optimization/machine attribute and whether caller and callee 
 come from
> 
> I'm not sure what you mean by 'for explicit optimization/machine attribute' ?
> 
> I'm attaching a new patch, is it closer?
> 
> Martin
> 
 same compilation unit, though this is quite hackish and will do unexpected 
 things with COMDATs.
>>>
>>> That's quite cumbersome. Any other idea than marking the PR as won't fix?
>>
>> Yep, it is not prettiest. The problem is that the concept that callee can 
>> change semantics
>> when no explicit attribute is present is sloppy.  I am not sure how many 
>> programs rely on it
>> (it is kind of surprising to see functions not being inlined into your 
>> target attribute annotated
>> function I guess).
>> Note that we check for original file in inliner already - this can be done 
>> by comparing lto_file_data
>> of corresponding cgraph nodes.
>>
>> Honza
>>
> 



[Committed] S/390: Support z14 as CPU name.

2017-07-31 Thread Andreas Krebbel
With IBM z14 officially announced we can add support for z14 as
preferred CPU name.  We still pass arch12 to Binutils in order to keep
older Binutils versions supported.

Bootstrapped and regression-tested on s390x.

Committed to mainline and GCC 7 branch.

Bye,

-Andreas-


gcc/ChangeLog:

2017-07-31  Andreas Krebbel  

* config.gcc: Add z14.
* config/s390/driver-native.c (s390_host_detect_local_cpu): Add
CPU model numbers for z13s and z14.
* config/s390/s390-c.c (s390_resolve_overloaded_builtin): Replace
arch12 with z14.
* config/s390/s390-opts.h (enum processor_type): Rename
PROCESSOR_ARCH12 to PROCESSOR_3906_Z14.
* config/s390/s390.c (processor_table): Add field for CPU name to
be passed to Binutils.
(s390_asm_output_machine_for_arch): Use the new field in
processor_table for Binutils.
(s390_expand_builtin): Replace arch12 with z14.
(s390_issue_rate): Rename PROCESSOR_ARCH12 to PROCESSOR_3906_Z14.
(s390_get_sched_attrmask): Likewise.
(s390_get_unit_mask): Likewise.
* config/s390/s390.opt: Add z14 to processor_type enum.
---
 gcc/config.gcc  |  2 +-
 gcc/config/s390/driver-native.c |  6 +-
 gcc/config/s390/s390-c.c|  4 ++--
 gcc/config/s390/s390-opts.h |  2 +-
 gcc/config/s390/s390.c  | 38 +-
 gcc/config/s390/s390.opt|  5 -
 6 files changed, 34 insertions(+), 23 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 9196843..a9196cd 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4336,7 +4336,7 @@ case "${target}" in
for which in arch tune; do
eval "val=\$with_$which"
case ${val} in
-   "" | native | g5 | g6 | z900 | z990 | z9-109 | z9-ec | 
z10 | z196 | zEC12 | z13 | arch3 | arch5 | arch6 | arch7 | arch8 | arch9 | 
arch10 | arch11 | arch12)
+   "" | native | g5 | g6 | z900 | z990 | z9-109 | z9-ec | 
z10 | z196 | zEC12 | z13 | z14 | arch3 | arch5 | arch6 | arch7 | arch8 | arch9 
| arch10 | arch11 | arch12)
# OK
;;
*)
diff --git a/gcc/config/s390/driver-native.c b/gcc/config/s390/driver-native.c
index 4bcddb4..acb9836 100644
--- a/gcc/config/s390/driver-native.c
+++ b/gcc/config/s390/driver-native.c
@@ -112,10 +112,14 @@ s390_host_detect_local_cpu (int argc, const char **argv)
  cpu = "zEC12";
  break;
case 0x2964:
+   case 0x2965:
  cpu = "z13";
  break;
+   case 0x3906:
+ cpu = "z14";
+ break;
default:
- cpu = "arch12";
+ cpu = "z14";
  break;
}
}
diff --git a/gcc/config/s390/s390-c.c b/gcc/config/s390/s390-c.c
index 35c3545..07224ad 100644
--- a/gcc/config/s390/s390-c.c
+++ b/gcc/config/s390/s390-c.c
@@ -886,7 +886,7 @@ s390_resolve_overloaded_builtin (location_t loc,
 
   if (!TARGET_VXE && (ob_flags & B_VXE))
 {
-  error_at (loc, "%qF requires -march=arch12 or higher", ob_fndecl);
+  error_at (loc, "%qF requires z14 or higher", ob_fndecl);
   return error_mark_node;
 }
 
@@ -963,7 +963,7 @@ s390_resolve_overloaded_builtin (location_t loc,
   if (!TARGET_VXE
   && bflags_overloaded_builtin_var[last_match_index] & B_VXE)
 {
-  error_at (loc, "%qs matching variant requires -march=arch12 or higher",
+  error_at (loc, "%qs matching variant requires z14 or higher",
IDENTIFIER_POINTER (DECL_NAME (ob_fndecl)));
   return error_mark_node;
 }
diff --git a/gcc/config/s390/s390-opts.h b/gcc/config/s390/s390-opts.h
index 65ac4f8..6d506e2 100644
--- a/gcc/config/s390/s390-opts.h
+++ b/gcc/config/s390/s390-opts.h
@@ -38,7 +38,7 @@ enum processor_type
   PROCESSOR_2817_Z196,
   PROCESSOR_2827_ZEC12,
   PROCESSOR_2964_Z13,
-  PROCESSOR_ARCH12,
+  PROCESSOR_3906_Z14,
   PROCESSOR_NATIVE,
   PROCESSOR_max
 };
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 7be22d9..c408d59 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -318,24 +318,27 @@ struct processor_costs zEC12_cost =
 
 static struct
 {
+  /* The preferred name to be used in user visible output.  */
   const char *const name;
+  /* CPU name as it should be passed to Binutils via .machine  */
+  const char *const binutils_name;
   const enum processor_type processor;
   const struct processor_costs *cost;
 }
 const processor_table[] =
 {
-  { "g5", PROCESSOR_9672_G5, _cost },
-  { "g6", PROCESSOR_9672_G6, _cost },
-  { "z900",   PROCESSOR_2064_Z900,   _cost },
-  { "z990",   PROCESSOR_2084_Z990,   _cost },
-  { "z9-109", PROCESSOR_2094_Z9_109, _109_cost },
-  { "z9-ec",  PROCESSOR_2094_Z9_EC,  _109_cost },
-  { "z10",PROCESSOR_2097_Z10,  

Re: C PATCH to display types when printing a conversion warning (PR c/81233)

2017-07-31 Thread Marek Polacek
Ping.

On Thu, Jul 20, 2017 at 12:53:10PM +0200, Marek Polacek wrote:
> On Wed, Jul 19, 2017 at 10:51:33AM -0400, David Malcolm wrote:
> > The changes to diagnostic-core.h and diagnostic.c are OK.
> 
> Thanks.
>  
> > > Also,
> > > PEDWARN_FOR_ASSIGNMENT didn't work with the addition of printing TYPE
> > > and
> > > RHSTYPE so I just decided to unroll the macro instead of making it
> > > even more
> > > ugly.
> > > This patch is long but it's mainly because of the testsuite fallout.
> > 
> > The comment by PEDWARN_FOR_ASSIGNMENT says:
> > 
> > 
> >   /* This macro is used to emit diagnostics to ensure that all format
> >  strings are complete sentences, visible to gettext and checked
> > at
> >  compile time.  */
> > 
> > I wonder if it's possible to convert it to an inline function to get
> > the same test coverage, without unrolling the macro?
> 
> Yeah, I tried, but the resulting inline function would have to have 12
> parameters if I count well and that didn't seem like a win.  Perhaps
> splitting convert_for_assignment would make sense, but likely not as
> part of this patch.

Marek


[PATCH][v2] Introduce TARGET_SUPPORTS_ALIASES

2017-07-31 Thread Martin Liška
On 07/31/2017 11:57 AM, Yuri Gribov wrote:
> On Mon, Jul 31, 2017 at 9:04 AM, Martin Liška  wrote:
>> Hi.
>>
>> Doing the transformation suggested by Honza.
>>
>> Patch can bootstrap on ppc64le-redhat-linux and x86_64-linux-gnu and 
>> survives regression tests.
>> And I also verified that works on hppa2.0w-hp-hpux11.11 (target w/o aliasing 
>> support).
>>
>> Ready to be installed?
> 
> A nit - you can probly get rid of ATTRIBUTE_UNUSED in note_mangling_alias now.
> 
> -Y
> 

Sure.

Done in v2.

Martin
>From 78ee08b25d22125cb1fa248bac98ef1e84504761 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 25 Jul 2017 13:11:28 +0200
Subject: [PATCH] Introduce TARGET_SUPPORTS_ALIASES

gcc/c-family/ChangeLog:

2017-07-25  Martin Liska  

	* c-opts.c (c_common_post_options): Replace ASM_OUTPUT_DEF with
	TARGET_SUPPORTS_ALIASES.

gcc/ChangeLog:

2017-07-25  Martin Liska  

	* asan.c (asan_protect_global): Replace ASM_OUTPUT_DEF with
	TARGET_SUPPORTS_ALIASES.
	* cgraph.c (cgraph_node::create_same_body_alias): Likewise.
	* ipa-visibility.c (can_replace_by_local_alias): Likewise.
	(optimize_weakref): Likewise.
	* symtab.c (symtab_node::noninterposable_alias): Likewise.
	* varpool.c (varpool_node::create_extra_name_alias): Likewise.
	* defaults.h: Introduce TARGET_SUPPORTS_ALIASES.

gcc/cp/ChangeLog:

2017-07-25  Martin Liska  

	* decl2.c (get_tls_init_fn): Replace ASM_OUTPUT_DEF with
	TARGET_SUPPORTS_ALIASES.
	(handle_tls_init): Likewise.
	(note_mangling_alias): Likewise.  Remove ATTRIBUTE_UNUSED for
	both arguments.
	* optimize.c (can_alias_cdtor): Likewise.
---
 gcc/asan.c|  4 +---
 gcc/c-family/c-opts.c | 22 --
 gcc/cgraph.c  |  7 ---
 gcc/cp/decl2.c| 25 +++--
 gcc/cp/optimize.c |  6 +++---
 gcc/defaults.h|  9 +
 gcc/ipa-visibility.c  | 15 +--
 gcc/symtab.c  |  6 +++---
 gcc/varpool.c |  6 +++---
 9 files changed, 51 insertions(+), 49 deletions(-)

diff --git a/gcc/asan.c b/gcc/asan.c
index 5f9275f6425..d8cb2b52c8b 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -1663,10 +1663,8 @@ asan_protect_global (tree decl)
   if (lookup_attribute ("weakref", DECL_ATTRIBUTES (decl)))
 return false;
 
-#ifndef ASM_OUTPUT_DEF
-  if (asan_needs_local_alias (decl))
+  if (!TARGET_SUPPORTS_ALIASES && asan_needs_local_alias (decl))
 return false;
-#endif
 
   return true;
 }
diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index 1657e7a4390..0b13a188c1b 100644
--- a/gcc/c-family/c-opts.c
+++ b/gcc/c-family/c-opts.c
@@ -957,16 +957,18 @@ c_common_post_options (const char **pfilename)
 
   if (flag_extern_tls_init)
 {
-#if !defined (ASM_OUTPUT_DEF) || !SUPPORTS_WEAK
-  /* Lazy TLS initialization for a variable in another TU requires
-	 alias and weak reference support. */
-  if (flag_extern_tls_init > 0)
-	sorry ("external TLS initialization functions not supported "
-	   "on this target");
-  flag_extern_tls_init = 0;
-#else
-  flag_extern_tls_init = 1;
-#endif
+  if (!TARGET_SUPPORTS_ALIASES || !SUPPORTS_WEAK)
+	{
+	  /* Lazy TLS initialization for a variable in another TU requires
+	 alias and weak reference support.  */
+	  if (flag_extern_tls_init > 0)
+	sorry ("external TLS initialization functions not supported "
+		   "on this target");
+
+	  flag_extern_tls_init = 0;
+	}
+  else
+	flag_extern_tls_init = 1;
 }
 
   if (num_in_fnames > 1)
diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index d7c9ba61795..849989443b1 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -582,10 +582,11 @@ cgraph_node *
 cgraph_node::create_same_body_alias (tree alias, tree decl)
 {
   cgraph_node *n;
-#ifndef ASM_OUTPUT_DEF
+
   /* If aliases aren't supported by the assembler, fail.  */
-  return NULL;
-#endif
+  if (!TARGET_SUPPORTS_ALIASES)
+return NULL;
+
   /* Langhooks can create same body aliases of symbols not defined.
  Those are useless. Drop them on the floor.  */
   if (symtab->global_info_ready)
diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index 2a52f8ca3e2..29a2e3cf02d 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -3156,11 +3156,9 @@ get_tls_init_fn (tree var)
   if (!flag_extern_tls_init && DECL_EXTERNAL (var))
 return NULL_TREE;
 
-#ifdef ASM_OUTPUT_DEF
   /* If the variable is internal, or if we can't generate aliases,
  call the local init function directly.  */
-  if (!TREE_PUBLIC (var))
-#endif
+  if (!TREE_PUBLIC (var) || !TARGET_SUPPORTS_ALIASES)
 return get_local_tls_init_fn ();
 
   tree sname = mangle_tls_init_fn (var);
@@ -4241,9 +4239,8 @@ handle_tls_init (void)
   tree init = TREE_PURPOSE (vars);
   one_static_initialization_or_destruction (var, init, true);
 
-#ifdef ASM_OUTPUT_DEF
   /* Output init aliases even with -fno-extern-tls-init.  */
-  if (TREE_PUBLIC (var))
+  if (TARGET_SUPPORTS_ALIASES && TREE_PUBLIC (var))
 	{
 

[PATCH 6/6] [i386, testsuite] Add tests, fix bug in check_avx2_hw_available

2017-07-31 Thread Daniel Santos
The testcase in the PR is used as a base and relevant variants are added
to test other factors affected by the patch set.

pr80969-1.c   Base test case.
pr80969-2.c   With ms to sysv call.
pr80969-2a.c  With ms to sysv call using stubs.
pr80969-3.c   With alloca (for DRAP test).
pr80969-4.c   With va_args passed via va_list
pr80969-4a.c  With va_args passed via va_list and ms to sysv call.
pr80969-4b.c  With va_args passed via va_list and ms to sysv call using
  stubs.

Signed-off-by: Daniel Santos 
---
 gcc/testsuite/gcc.target/i386/pr80969-1.c  |  16 
 gcc/testsuite/gcc.target/i386/pr80969-2.c  |  26 ++
 gcc/testsuite/gcc.target/i386/pr80969-2a.c |  26 ++
 gcc/testsuite/gcc.target/i386/pr80969-3.c  |  31 
 gcc/testsuite/gcc.target/i386/pr80969-4.c  | 123 
 gcc/testsuite/gcc.target/i386/pr80969-4a.c | 124 +
 gcc/testsuite/gcc.target/i386/pr80969-4b.c | 124 +
 gcc/testsuite/lib/target-supports.exp  |  66 +++
 8 files changed, 536 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-2a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4b.c

diff --git a/gcc/testsuite/gcc.target/i386/pr80969-1.c 
b/gcc/testsuite/gcc.target/i386/pr80969-1.c
new file mode 100644
index 000..eb8d767a778
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-1.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+int a[56];
+int b;
+int main (int argc, char *argv[]) {
+  int c;
+  for (; b; b++) {
+c = b;
+if (b & 1)
+  c = 2;
+a[b] = c;
+  }
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-2.c 
b/gcc/testsuite/gcc.target/i386/pr80969-2.c
new file mode 100644
index 000..e868d6c7e5c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-2.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+/* Test when calling a sysv func.  */
+
+int a[56];
+int b;
+
+static void __attribute__((sysv_abi)) sysv ()
+{
+}
+
+void __attribute__((sysv_abi)) (*volatile const sysv_noinfo)() = sysv;
+
+int main (int argc, char *argv[]) {
+  int c;
+  sysv_noinfo ();
+  for (; b; b++) {
+c = b;
+if (b & 1)
+  c = 2;
+a[b] = c;
+  }
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-2a.c 
b/gcc/testsuite/gcc.target/i386/pr80969-2a.c
new file mode 100644
index 000..071a90534a4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-2a.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f -mcall-ms2sysv-xlogues" } */
+/* { dg-require-effective-target avx512f } */
+
+/* Test when calling a sysv func using save/restore stubs.  */
+
+int a[56];
+int b;
+
+static void __attribute__((sysv_abi)) sysv ()
+{
+}
+
+void __attribute__((sysv_abi)) (*volatile const sysv_noinfo)() = sysv;
+
+int main (int argc, char *argv[]) {
+  int c;
+  sysv_noinfo ();
+  for (; b; b++) {
+c = b;
+if (b & 1)
+  c = 2;
+a[b] = c;
+  }
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-3.c 
b/gcc/testsuite/gcc.target/i386/pr80969-3.c
new file mode 100644
index 000..5982981b55c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-3.c
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+/* Test with alloca (and DRAP).  */
+
+#include 
+
+int a[56];
+volatile int b = -12345;
+volatile const int d = 42;
+
+void foo (int *x, int y, int z)
+{
+}
+
+void (*volatile const foo_noinfo)(int *, int, int) = foo;
+
+int main (int argc, char *argv[]) {
+  int c;
+  int *e = alloca (d);
+  foo_noinfo (e, d, 0);
+  for (; b; b++) {
+c = b;
+if (b & 1)
+  c = 2;
+foo_noinfo (e, d, c);
+a[-(b % 56)] = c;
+  }
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-4.c 
b/gcc/testsuite/gcc.target/i386/pr80969-4.c
new file mode 100644
index 000..1ec54d081cd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-4.c
@@ -0,0 +1,123 @@
+/* { dg-do run { target avx512f_runtime } } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+/* Test with avx512 and va_args.  */
+
+#include 
+#include 
+
+#include "avx-check.h"
+
+int a[56];
+int b;
+
+__m128 n1 = { -283.3, -23.3, 213.4, 1119.03 };
+__m512d n2 = { -93.83, 893.318, 3994.3, -39484.0, 830.32, -328.32, 3.14159, 
2.99792 };
+__m128i n3 = { 893, -3180 } ;

[PATCH 5/6] [i386] Modify SP realignment in ix86_expand_prologue, et. al.

2017-07-31 Thread Daniel Santos
The SP allocation calculation is now done in ix86_compute_frame_layout
and the result stored in ix86_frame::stack_realign_allocate.  This
change also updates comments for choose_baseaddr to clarify that the
alignment returned doesn't necessarily reflect the alignment of the
cfa_offset passed (e.g., you can pass cfa_offset 48 and it can return an
alignment of 64 bytes).

Since the alignment required may be more than 16-bytes, we cannot defer
SP allocation to ix86_emit_outlined_ms2sysv_save (when it's enabled), so
that function needs to be updated as well.

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386.c | 54 +++---
 1 file changed, 29 insertions(+), 25 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index e92f322de0c..7e1fc4dfbf5 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -13273,10 +13273,13 @@ choose_basereg (HOST_WIDE_INT cfa_offset, rtx 
_reg,
 }
 
 /* Return an RTX that points to CFA_OFFSET within the stack frame and
-   the alignment of address.  If align is non-null, it should point to
+   the alignment of address.  If ALIGN is non-null, it should point to
an alignment value (in bits) that is preferred or zero and will
-   recieve the alignment of the base register that was selected.  The
-   valid base registers are taken from CFUN->MACHINE->FS.  */
+   recieve the alignment of the base register that was selected,
+   irrespective of rather or not CFA_OFFSET is a multiple of that
+   alignment value.
+
+   The valid base registers are taken from CFUN->MACHINE->FS.  */
 
 static rtx
 choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align)
@@ -14322,35 +14325,35 @@ ix86_emit_outlined_ms2sysv_save (const struct 
ix86_frame )
   rtx sym, addr;
   rtx rax = gen_rtx_REG (word_mode, AX_REG);
   const struct xlogue_layout  = xlogue_layout::get_instance ();
-  HOST_WIDE_INT rax_offset = xlogue.get_stub_ptr_offset () + m->fs.sp_offset;
-  HOST_WIDE_INT stack_alloc_size = frame.stack_pointer_offset - 
m->fs.sp_offset;
-  HOST_WIDE_INT stack_align_off_in = xlogue.get_stack_align_off_in ();
+  HOST_WIDE_INT allocate = frame.stack_pointer_offset - m->fs.sp_offset;
+
+  /* AL should only be live with sysv_abi.  */
+  gcc_assert (!ix86_eax_live_at_start_p ());
+
+  /* Setup RAX as the stub's base pointer.  We use stack_realign_offset rather
+ we've actually realigned the stack or not.  */
+  align = GET_MODE_ALIGNMENT (V4SFmode);
+  addr = choose_baseaddr (frame.stack_realign_offset
+ + xlogue.get_stub_ptr_offset (), );
+  gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode));
+  emit_insn (gen_rtx_SET (rax, addr));
 
-  /* Verify that the incoming stack 16-byte alignment offset matches the
- layout we're using.  */
-  gcc_assert (stack_align_off_in == (m->fs.sp_offset & UNITS_PER_WORD));
+  /* Allocate stack if not already done.  */
+  if (allocate > 0)
+  pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
+   GEN_INT (-allocate), -1, false);
 
   /* Get the stub symbol.  */
   sym = xlogue.get_stub_rtx (frame_pointer_needed ? XLOGUE_STUB_SAVE_HFP
  : XLOGUE_STUB_SAVE);
   RTVEC_ELT (v, vi++) = gen_rtx_USE (VOIDmode, sym);
 
-  /* Setup RAX as the stub's base pointer.  */
-  align = GET_MODE_ALIGNMENT (V4SFmode);
-  addr = choose_baseaddr (rax_offset, );
-  gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode));
-  insn = emit_insn (gen_rtx_SET (rax, addr));
-
-  gcc_assert (stack_alloc_size >= xlogue.get_stack_space_used ());
-  pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
-GEN_INT (-stack_alloc_size), -1,
-m->fs.cfa_reg == stack_pointer_rtx);
   for (i = 0; i < ncregs; ++i)
 {
   const xlogue_layout::reginfo  = xlogue.get_reginfo (i);
   rtx reg = gen_rtx_REG ((SSE_REGNO_P (r.regno) ? V4SFmode : word_mode),
 r.regno);
-  RTVEC_ELT (v, vi++) = gen_frame_store (reg, rax, -r.offset);;
+  RTVEC_ELT (v, vi++) = gen_frame_store (reg, rax, -r.offset);
 }
 
   gcc_assert (vi == (unsigned)GET_NUM_ELEM (v));
@@ -14608,8 +14611,8 @@ ix86_expand_prologue (void)
 that we must allocate the size of the register save area before
 performing the actual alignment.  Otherwise we cannot guarantee
 that there's enough storage above the realignment point.  */
-  allocate = frame.stack_realign_allocate_offset - m->fs.sp_offset;
-  if (allocate && !m->call_ms2sysv)
+  allocate = frame.stack_realign_allocate;
+  if (allocate)
 pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
   GEN_INT (-allocate), -1, false);
 
@@ -14618,8 +14621,7 @@ ix86_expand_prologue (void)
stack_pointer_rtx,
  

[PATCH 4/6] [i386] Modify ix86_compute_frame_layout

2017-07-31 Thread Daniel Santos
These changes affect how the stack frame is calculated from the region
starting at frame.reg_save_offset until frame.frame_pointer_offset,
which includes either the stub save area or the (inline) SSE register
save area and the va_args register save area.

The calculation used when not realigning the stack pointer is the same,
but when when realigning we calculate the 16-byte aligned space needed
in reverse so that the stack realignment boundary at
frame.stack_realign_offset may not necessarily be a multiple of
stack_alignment_needed, but the value of frame.frame_pointer_offset
will. This results in a properly aligned stack for the function body and
avoids wasting stack space.

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386.c | 116 +
 gcc/config/i386/i386.h |   2 +-
 2 files changed, 80 insertions(+), 38 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index e2e9546a27c..e92f322de0c 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12874,6 +12874,14 @@ ix86_compute_frame_layout (void)
   gcc_assert (preferred_alignment >= STACK_BOUNDARY / BITS_PER_UNIT);
   gcc_assert (preferred_alignment <= stack_alignment_needed);
 
+  /* The only ABI saving SSE regs should be 64-bit ms_abi.  */
+  gcc_assert (TARGET_64BIT || !frame->nsseregs);
+  if (TARGET_64BIT && m->call_ms2sysv)
+{
+  gcc_assert (stack_alignment_needed >= 16);
+  gcc_assert (!frame->nsseregs);
+}
+
   /* For SEH we have to limit the amount of code movement into the prologue.
  At present we do this via a BLOCKAGE, at which point there's very little
  scheduling that can be done, which means that there's very little point
@@ -12936,54 +12944,88 @@ ix86_compute_frame_layout (void)
   if (TARGET_SEH)
 frame->hard_frame_pointer_offset = offset;
 
-  /* When re-aligning the stack frame, but not saving SSE registers, this
- is the offset we want adjust the stack pointer to.  */
-  frame->stack_realign_allocate_offset = offset;
+  /* Calculate the size of the va-arg area (not including padding, if any).  */
+  frame->va_arg_size = ix86_varargs_gpr_size + ix86_varargs_fpr_size;
 
-  /* The re-aligned stack starts here.  Values before this point are not
- directly comparable with values below this point.  Use sp_valid_at
- to determine if the stack pointer is valid for a given offset and
- fp_valid_at for the frame pointer.  */
   if (stack_realign_fp)
-offset = ROUND_UP (offset, stack_alignment_needed);
-  frame->stack_realign_offset = offset;
-
-  if (TARGET_64BIT && m->call_ms2sysv)
 {
-  gcc_assert (stack_alignment_needed >= 16);
-  gcc_assert (!frame->nsseregs);
+  /* We may need a 16-byte aligned stack for the remainder of the
+register save area, but the stack frame for the local function
+may require a greater alignment if using AVX/2/512.  In order
+to avoid wasting space, we first calculate the space needed for
+the rest of the register saves, add that to the stack pointer,
+and then realign the stack to the boundary of the start of the
+frame for the local function.  */
+  HOST_WIDE_INT space_needed = 0;
+  HOST_WIDE_INT sse_reg_space_needed = 0;
 
-  m->call_ms2sysv_pad_in = !!(offset & UNITS_PER_WORD);
-  offset += xlogue_layout::get_instance ().get_stack_space_used ();
-}
+  if (TARGET_64BIT)
+   {
+ if (m->call_ms2sysv)
+   {
+ m->call_ms2sysv_pad_in = 0;
+ space_needed = xlogue_layout::get_instance 
().get_stack_space_used ();
+   }
 
-  /* Align and set SSE register save area.  */
-  else if (frame->nsseregs)
-{
-  /* The only ABI that has saved SSE registers (Win64) also has a
-16-byte aligned default stack.  However, many programs violate
-the ABI, and Wine64 forces stack realignment to compensate.
+ else if (frame->nsseregs)
+   /* The only ABI that has saved SSE registers (Win64) also has a
+  16-byte aligned default stack.  However, many programs violate
+  the ABI, and Wine64 forces stack realignment to compensate.  */
+   space_needed = frame->nsseregs * 16;
+
+ sse_reg_space_needed = space_needed = ROUND_UP (space_needed, 16);
+
+ /* 64-bit frame->va_arg_size should always be a multiple of 16, but
+rounding to be pedantic.  */
+ space_needed = ROUND_UP (space_needed + frame->va_arg_size, 16);
+   }
+  else
+   space_needed = frame->va_arg_size;
+
+  /* Record the allocation size required prior to the realignment AND.  */
+  frame->stack_realign_allocate = space_needed;
+
+  /* The re-aligned stack starts at frame->stack_realign_offset.  Values
+before this point are not directly comparable with values below
+this point.  Use sp_valid_at to determine if the stack pointer is
+  

[PATCH 3/6] [i386] Remove machine_function::call_ms2sysv_pad_out

2017-07-31 Thread Daniel Santos
The -mcall-ms2sysv-xlogues project added the boolean fields
call_ms2sysv_pad_in and call_ms2sysv_pad_out to struct machine_function
to track rather or not an additional 8 bytes of padding was needed for
stack alignment prior to and after the stub save area.  This design was
based upon the faulty assumption the function body would not require a
stack alignment greater than 16 bytes.  This continues to work well for
managing padding prior to the stub save area, but will not work for the
outgoing alignment.

Rather than changing machine_function::call_ms2sysv_pad_out to a larger
type, this patch removes it, thus transferring responsibility for stack
alignment following the stub save area from class xlogue_layout to the
body of ix86_compute_frame_layout.  Since the 64-bit va_arg register
save area is always a multiple of 16-bytes in size (176 for System V ABI
and 96 for Microsoft ABI), the ROUND_UP calculation for the stack offset
at the start of the function body (frame.frame_pointer_offset) will
assure there is enough room for any padding needed to keep the save area
for SSE va_args 16-byte aligned, so no modification is needed for that
calculation.

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386.c | 18 --
 gcc/config/i386/i386.h |  8 ++--
 2 files changed, 6 insertions(+), 20 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 47c5608c3cd..e2e9546a27c 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2491,9 +2491,7 @@ public:
 unsigned last_reg = m->call_ms2sysv_extra_regs + MIN_REGS - 1;
 
 gcc_assert (m->call_ms2sysv_extra_regs <= MAX_EXTRA_REGS);
-return m_regs[last_reg].offset
-  + (m->call_ms2sysv_pad_out ? 8 : 0)
-  + STUB_INDEX_OFFSET;
+return m_regs[last_reg].offset + STUB_INDEX_OFFSET;
   }
 
   /* Returns the offset for the base pointer used by the stub.  */
@@ -12849,13 +12847,12 @@ ix86_compute_frame_layout (void)
{
  unsigned count = xlogue_layout::count_stub_managed_regs ();
  m->call_ms2sysv_extra_regs = count - xlogue_layout::MIN_REGS;
+ m->call_ms2sysv_pad_in = 0;
}
 }
 
   frame->nregs = ix86_nsaved_regs ();
   frame->nsseregs = ix86_nsaved_sseregs ();
-  m->call_ms2sysv_pad_in = 0;
-  m->call_ms2sysv_pad_out = 0;
 
   /* 64-bit MS ABI seem to require stack alignment to be always 16,
  except for function prologues, leaf functions and when the defult
@@ -12957,15 +12954,7 @@ ix86_compute_frame_layout (void)
   gcc_assert (!frame->nsseregs);
 
   m->call_ms2sysv_pad_in = !!(offset & UNITS_PER_WORD);
-
-  /* Select an appropriate layout for incoming stack offset.  */
-  const struct xlogue_layout  = xlogue_layout::get_instance ();
-
-  if ((offset + xlogue.get_stack_space_used ()) & UNITS_PER_WORD)
-   m->call_ms2sysv_pad_out = 1;
-
-  offset += xlogue.get_stack_space_used ();
-  gcc_assert (!(offset & 0xf));
+  offset += xlogue_layout::get_instance ().get_stack_space_used ();
 }
 
   /* Align and set SSE register save area.  */
@@ -12993,6 +12982,7 @@ ix86_compute_frame_layout (void)
 
   /* Align start of frame for local function.  */
   if (stack_realign_fp
+  || m->call_ms2sysv
   || offset != frame->sse_reg_save_offset
   || size != 0
   || !crtl->is_leaf
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 1648bdf1556..b08e45f68d4 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2646,17 +2646,13 @@ struct GTY(()) machine_function {
   BOOL_BITFIELD arg_reg_available : 1;
 
   /* If true, we're out-of-lining reg save/restore for regs clobbered
- by ms_abi functions calling a sysv function.  */
+ by 64-bit ms_abi functions calling a sysv_abi function.  */
   BOOL_BITFIELD call_ms2sysv : 1;
 
   /* If true, the incoming 16-byte aligned stack has an offset (of 8) and
- needs padding.  */
+ needs padding prior to out-of-line stub save/restore area.  */
   BOOL_BITFIELD call_ms2sysv_pad_in : 1;
 
-  /* If true, the size of the stub save area plus inline int reg saves will
- result in an 8 byte offset, so needs padding.  */
-  BOOL_BITFIELD call_ms2sysv_pad_out : 1;
-
   /* This is the number of extra registers saved by stub (valid range is
  0-6). Each additional register is only saved/restored by the stubs
  if all successive ones are. (Will always be zero when using a hard
-- 
2.13.3



[PATCH 2/6] [i386] Remove ix86_frame::outlined_save_offset

2017-07-31 Thread Daniel Santos
This value was used in an earlier incarnation of the
-mcall-ms2sysv-xlogues patch set but is now set and never read.  The
value of ix86_frame::sse_reg_save_offset serves the same purpose.

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386.c | 1 -
 gcc/config/i386/i386.h | 4 +---
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 690631dfe43..47c5608c3cd 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12966,7 +12966,6 @@ ix86_compute_frame_layout (void)
 
   offset += xlogue.get_stack_space_used ();
   gcc_assert (!(offset & 0xf));
-  frame->outlined_save_offset = offset;
 }
 
   /* Align and set SSE register save area.  */
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index ce5bb7f6677..1648bdf1556 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2477,8 +2477,7 @@ enum avx_u128_state
<- end of stub-saved/restored regs
  [padding1]
]
-   <- outlined_save_offset
-   <- sse_regs_save_offset
+   <- sse_reg_save_offset
[padding2]
   |<- FRAME_POINTER
[va_arg registers]  |
@@ -2504,7 +2503,6 @@ struct GTY(()) ix86_frame
   HOST_WIDE_INT reg_save_offset;
   HOST_WIDE_INT stack_realign_allocate_offset;
   HOST_WIDE_INT stack_realign_offset;
-  HOST_WIDE_INT outlined_save_offset;
   HOST_WIDE_INT sse_reg_save_offset;
 
   /* When save_regs_using_mov is set, emit prologue using
-- 
2.13.3



[PATCH 1/6] [i386] Correct comments, add assertions to sp_valid_at and fp_valid_at

2017-07-31 Thread Daniel Santos
When we realign the stack frame (without DRAP), there may be a range of
CFA offsets that should never be touched because they are alignment
padding and any reference to them is almost certainly an error.
Previously, only the offset of where the realigned stack frame starts
was recorded and checked in sp_valid_at and fp_valid_at.

This change adds sp_realigned_fp_last to struct machine_frame_state to
record the last valid offset from which the frame pointer can be used
when the stack pointer is realigned and modifies sp_valid_at and
fp_valid_at to fail an assertion when passed an offset in the "no-man's
land" between these two values.

Comments for struct machine_frame_state incorrectly stated that a
realigned stack pointer could be used to access offsets equal to or
greater than sp_realigned_offset, but it is only valid for offsets that
are greater.  This was the (incorrect) behaviour of sp_valid_at and
fp_valid_at prior to r250587 and this change now corrects the
documentation and adds clarification of the CFA-relative calculation.

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386.c | 45 ++---
 gcc/config/i386/i386.h | 18 +-
 2 files changed, 43 insertions(+), 20 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index f1486ff3750..690631dfe43 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -13102,26 +13102,36 @@ choose_baseaddr_len (unsigned int regno, 
HOST_WIDE_INT offset)
   return len;
 }
 
-/* Determine if the stack pointer is valid for accessing the cfa_offset.
-   The register is saved at CFA - CFA_OFFSET.  */
+/* Determine if the stack pointer is valid for accessing the CFA_OFFSET in
+   the frame save area.  The register is saved at CFA - CFA_OFFSET.  */
 
-static inline bool
+static bool
 sp_valid_at (HOST_WIDE_INT cfa_offset)
 {
   const struct machine_frame_state  = cfun->machine->fs;
-  return fs.sp_valid && !(fs.sp_realigned
- && cfa_offset <= fs.sp_realigned_offset);
+  if (fs.sp_realigned && cfa_offset <= fs.sp_realigned_offset)
+{
+  /* Validate that the cfa_offset isn't in a "no-man's land".  */
+  gcc_assert (cfa_offset <= fs.sp_realigned_fp_last);
+  return false;
+}
+  return fs.sp_valid;
 }
 
-/* Determine if the frame pointer is valid for accessing the cfa_offset.
-   The register is saved at CFA - CFA_OFFSET.  */
+/* Determine if the frame pointer is valid for accessing the CFA_OFFSET in
+   the frame save area.  The register is saved at CFA - CFA_OFFSET.  */
 
 static inline bool
 fp_valid_at (HOST_WIDE_INT cfa_offset)
 {
   const struct machine_frame_state  = cfun->machine->fs;
-  return fs.fp_valid && !(fs.sp_valid && fs.sp_realigned
- && cfa_offset > fs.sp_realigned_offset);
+  if (fs.sp_realigned && cfa_offset > fs.sp_realigned_fp_last)
+{
+  /* Validate that the cfa_offset isn't in a "no-man's land".  */
+  gcc_assert (cfa_offset >= fs.sp_realigned_offset);
+  return false;
+}
+  return fs.fp_valid;
 }
 
 /* Choose a base register based upon alignment requested, speed and/or
@@ -14560,6 +14570,9 @@ ix86_expand_prologue (void)
   int align_bytes = crtl->stack_alignment_needed / BITS_PER_UNIT;
   gcc_assert (align_bytes > MIN_STACK_BOUNDARY / BITS_PER_UNIT);
 
+  /* Record last valid frame pointer offset.  */
+  m->fs.sp_realigned_fp_last = m->fs.sp_offset;
+
   /* The computation of the size of the re-aligned stack frame means
 that we must allocate the size of the register save area before
 performing the actual alignment.  Otherwise we cannot guarantee
@@ -14573,13 +14586,15 @@ ix86_expand_prologue (void)
   insn = emit_insn (ix86_gen_andsp (stack_pointer_rtx,
stack_pointer_rtx,
GEN_INT (-align_bytes)));
-  /* For the purposes of register save area addressing, the stack
-pointer can no longer be used to access anything in the frame
-below m->fs.sp_realigned_offset and the frame pointer cannot be
-used for anything at or above.  */
   m->fs.sp_offset = ROUND_UP (m->fs.sp_offset, align_bytes);
   m->fs.sp_realigned = true;
   m->fs.sp_realigned_offset = m->fs.sp_offset - frame.nsseregs * 16;
+  /* The stack pointer may no longer be equal to CFA - m->fs.sp_offset.
+Beyond this point, stack access should be done via choose_baseaddr or
+by using sp_valid_at and fp_valid_at to determine the correct base
+register.  Henceforth, any CFA offset should be thought of as logical
+and not physical.  */
+  gcc_assert (m->fs.sp_realigned_offset >= m->fs.sp_realigned_fp_last);
   gcc_assert (m->fs.sp_realigned_offset == frame.stack_realign_offset);
   /* SEH unwind emit doesn't currently support REG_CFA_EXPRESSION, which
 is needed to describe where a register 

[PATCH 0/6] [i386] PR80969 Fix ICE with -mabi=ms -mavx512f

2017-07-31 Thread Daniel Santos
When working on the Wine64 project to use aligned SSE MOVs after SP 
realignment and adding -mcall-ms2sysv-xlogues, I overlooked the fact 
that the function body may require a stack alignment greater than 
16-bytes.  This can result in an ICE with -mabi=ms -mavx512f and some 
other cases.  This patch set reworks the strategy for calculating the 
frame layout following normal (inline) integral register saves (at 
frame.reg_save_offset) to the start of the frame for the local function 
(frame.frame_pointer_offset).


I've completed a bootstrap and full regression test with no additional 
failures, but I don't have access to a machine with avx512 extensions.  
I have manually run the tests that need it using the Intel SDE, but I 
haven't been able to validate that my 
check_effective_target_avx512f_runtime code in 
gcc/testsuite/lib/target-supports.exp is correctly enabling the tests 
for pr80969-4*.c.


As an aside note, I still have some rework of the ms-sysv.exp tests that 
I haven't yet to submitted and in which I'm adding more tests for cases 
with uncommon stacks, as in PR 81563.


Thanks,
Daniel
2017-07-23  Daniel Santos  

* config/i386/i386.h (ix86_frame::outlined_save_offset): Remove field.
(ix86_frame::stack_realign_allocate_offset): Likewise.
(ix86_frame::stack_realign_allocate): New field.
(struct machine_frame_state): Modify comments.
(machine_frame_state::sp_realigned_fp_end): New field.
(machine_function::call_ms2sysv_pad_out): Remove field.
* config/i386/i386.c (xlogue_layout::get_stack_space_used): Modify.
(ix86_compute_frame_layout): Likewise.
(sp_valid_at): Likewise.
(fp_valid_at): Likewise.
(choose_baseaddr): Modify comments.
(ix86_emit_outlined_ms2sysv_save): Modify.
(ix86_expand_prologue): Likewise.
(ix86_expand_epilogue): Modify comments.
2017-07-23  Daniel Santos  
* gcc.target/i386/pr80969-1.c: New testcase.
* gcc.target/i386/pr80969-2a.c: Likewise.
* gcc.target/i386/pr80969-2.c: Likewise.
* gcc.target/i386/pr80969-3.c: Likewise.
* gcc.target/i386/pr80969-4a.c: Likewise.
* gcc.target/i386/pr80969-4b.c: Likewise.
* gcc.target/i386/pr80969-4.c: Likewise.


C++ PATCH to fix ICE with bit-fields and ?: (PR c++/81607)

2017-07-31 Thread Marek Polacek
We crash in the gimplifier because it sees a shift expression with different
types of op1/op2.  The problem arises in cp_fold: it gets "1 ? d.c : 0" of type
int which is folded to "d.c", but it then sees that d.c is a bit-field with
declared type long int, so if produces "(long int) d.c" -- but that differs
from the original COND_EXPR type!  Folding should not change the type of the
CODE_EXPR, thus this fix.

Bootstrapped/regtested on x86_64-linux, ok for trunk/7/6?

2017-07-31  Marek Polacek  

PR c++/81607
* cp-gimplify.c (cp_fold): If folding exposed a branch of
a COND_EXPR, convert it to the original type of the COND_EXPR, if
they differ.   

* g++.dg/other/bitfield6.C: New test.

diff --git gcc/cp/cp-gimplify.c gcc/cp/cp-gimplify.c
index a9563b1a8cd..9fc4ab18207 100644
--- gcc/cp/cp-gimplify.c
+++ gcc/cp/cp-gimplify.c
@@ -2314,9 +2314,9 @@ cp_fold (tree x)
 
   /* A COND_EXPR might have incompatible types in branches if one or both
 arms are bitfields.  If folding exposed such a branch, fix it up.  */
-  if (TREE_CODE (x) != code)
-   if (tree type = is_bitfield_expr_with_lowered_type (x))
- x = fold_convert (type, x);
+  if (TREE_CODE (x) != code
+ && !useless_type_conversion_p (TREE_TYPE (org_x), TREE_TYPE (x)))
+   x = fold_convert (TREE_TYPE (org_x), x);
 
   break;
 
diff --git gcc/testsuite/g++.dg/other/bitfield6.C 
gcc/testsuite/g++.dg/other/bitfield6.C
index e69de29bb2d..c1e8a17989b 100644
--- gcc/testsuite/g++.dg/other/bitfield6.C
+++ gcc/testsuite/g++.dg/other/bitfield6.C
@@ -0,0 +1,9 @@
+// PR c++/81607
+
+int a;
+
+struct b {
+  long c : 32;
+} d;
+
+char f = (903092 ? int(d.c) : 0) << a;

Marek


Re: [PING] [PATCH v4 0/12] [i386] Improve 64-bit Microsoft to System V ABI pro/epilogues

2017-07-31 Thread Daniel Santos

On 07/28/2017 09:41 AM, H.J. Lu wrote:

On Fri, Jul 28, 2017 at 6:57 AM, Daniel Santos  wrote:

On 07/26/2017 02:03 PM, H.J. Lu wrote:

This patch caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81563

Hello.  I've rebased my patch set and I'm now retesting.  I'm afraid that
your changes are wrong because my my sp_valid_at and fp_valid_at functions
are wrong -- these are supposed to be for the base offset and not the CFA
offset, sorry about that.  This means that the check in choose_basereg (and
thus choose_baseaddr) have been wrong as well.  I'm retesting now.

Please check your change with gcc.target/i386/pr81563.c.

Thanks.


I'm still getting used to x86 stack math and and briefly I thought that 
my understanding of the CFA was wrong and that I had messed up 
sp_valid_at and fp_valid_at, but I was mistaken, so sorry for the false 
alarm.  My rebased patches pass all tests, so it's OK.


Re: [PATCH] Introduce TARGET_SUPPORTS_ALIASES

2017-07-31 Thread Yuri Gribov
On Mon, Jul 31, 2017 at 9:04 AM, Martin Liška  wrote:
> Hi.
>
> Doing the transformation suggested by Honza.
>
> Patch can bootstrap on ppc64le-redhat-linux and x86_64-linux-gnu and survives 
> regression tests.
> And I also verified that works on hppa2.0w-hp-hpux11.11 (target w/o aliasing 
> support).
>
> Ready to be installed?

A nit - you can probly get rid of ATTRIBUTE_UNUSED in note_mangling_alias now.

-Y


Re: [ARM, VXworks] Fix build

2017-07-31 Thread Richard Earnshaw (lists)
On 25/07/17 11:31, Olivier Hainque wrote:
> Hi Richard,
> 
> (back from a few days away)
> 
>> On Jul 17, 2017, at 12:01 , Eric Botcazou  wrote:
>>
>>> That's good news.  Does that mean we'll be able to drop the old stuff
>>> though?  I'd really like to make progress towards removing the old ABI
>>> support from GCC.
>>
>> Yes, I'd think so, but Olivier knows better.
>>
>>> Note that considering a port for deprecation is only the first step.  It
>>> does not mean that it has been deprecated.  Sometimes the only way to
>>> find out if a port really is wanted is to make such a threat...
>>
>> No disagreement. ;-)
> 
> In this instance, no doubt we want to keep the port in. As I mentioned
> off-list, the port is still very active from our (AdaCore) perspective,
> I have just recently been enlisted as maintainer and plan to contribute
> significant updates.
> 
> Part of that is support for VxWorks 7, which has switched to the new
> ABI.
> 
> Regarding removal of old ABI support, which release were you
> targeting ?
> 
> On the VxWorks front, where we adapt to what the system toolchains
> do, it will mean dropping support for VxWorks versions prior to 7,
> which is not so old - couple of years I think. Not the end of the
> world, but an extra release cycle can make a difference.
> 
> Is VxWorks alone in this category ?

Pretty close, I think.  The only other ARM port using the old ABI that
I'm aware of is NetBSD.  That doesn't use GCC as it's default compiler
these days and even there an EABI port is in use (I suspect Clang
requires it).

So until I became aware of the VXworks port using the old ABI, I thought
there was only one remaining port - VXworks makes that 2 but both seem
to have a transition plan of sorts.

I think deprecating in gcc-8 with removal in GCC-9 is probably viable on
that basis, so that's my opening bid.  Given that gcc-8 will have a
2-year support window that means support for the old ABI through to
~2020, at which point v2 of the EABI will itself be 15 years old.

R.

> 
> Olivier
> 
> 
> 
> 
> 



Re: [PATCH] Fix PR81354 (rewrite gimple_split_edge)

2017-07-31 Thread Richard Biener
On Sun, Jul 30, 2017 at 8:04 PM, Bill Schmidt
 wrote:
> Hi,
>
> PR81354 identifies a latent bug that can happen in SLSR since the
> conditional candidate support was first added.  SLSR relies on the
> address of a GIMPLE PHI remaining constant during the course of the
> optimization pass, but it needs to split edges.  The use of
> make_single_succ_edge and reinstall_phi_args in gimple_split_edge
> causes GIMPLE PHI statements to be temporarily expanded to add a
> predecessor, and then rebuilt to have the original number of
> predecessors.  The expansion usually, if not always, causes the PHI
> statement to change address.  Thus gimple_split_edge needs to be
> rewritten to perform in-situ replacement of PHI arguments.
>
> The required pieces of make_single_succ_edge have been extracted into
> two places:  make_replacement_pred_edge, and some fixup code at the
> end of gimple_split_edge.  The division is necessary because the
> destination of the original edge must remember its original
> predecessors for the switch processing in
> gimple_redirect_edge_and_branch_1 to work properly.
>
> The function gimple_redirect_edge_and_branch was factored into two
> pieces so that most of it can be used by gimple_split_edge without
> calling ssa_redirect_edge, which also interferes with PHIs.  The
> useful bits of ssa_redirect_edge are factored out into the next three
> lines of gimple_split_edge.
>
> Similarly, redirect_eh_edge had already been similarly factored into
> redirect_eh_edge_1 and ssa_redirect_edge.  I took advantage of that
> and exposed redirect_eh_edge_1 for use in gimple_redirect_edge_and_branch_1.
>
> I've added the test from PR81354 as a torture test, but as we've seen,
> small changes elsewhere in the optimizer can easily hide the problem.
>
> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.
> Is this ok for trunk?  Eventually this needs to be backported to GCC 5,
> 6, and 7 if that's acceptable, since PR81354 was observed on
> gcc-5-branch.  I haven't yet prepared the backports.

I don't like make_replacement_pred_edge too much.  Wouldn't it work
to make sure we first shrink and then re-grow like if we simply do the
redirect_edge_and_branch before the make_single_succ_edge call?
At least quick testing shows it fixes the testcase on the GCC 6 branch for me.

Index: gcc/tree-cfg.c
===
--- gcc/tree-cfg.c  (revision 250732)
+++ gcc/tree-cfg.c  (working copy)
@@ -2753,12 +2753,16 @@ gimple_split_edge (edge edge_in)
   new_bb = create_empty_bb (after_bb);
   new_bb->frequency = EDGE_FREQUENCY (edge_in);
   new_bb->count = edge_in->count;
+
+  /* First redirect the existing edge to avoid reallocating
+ PHI nodes in dest.  */
+  e = redirect_edge_and_branch (edge_in, new_bb);
+  gcc_assert (e == edge_in);
+
   new_edge = make_edge (new_bb, dest, EDGE_FALLTHRU);
   new_edge->probability = REG_BR_PROB_BASE;
   new_edge->count = edge_in->count;

-  e = redirect_edge_and_branch (edge_in, new_bb);
-  gcc_assert (e == edge_in);
   reinstall_phi_args (new_edge, e);

   return new_bb;

Sorry for misleading you to a complex solution.

Thanks,
Richard.

> Thanks,
> Bill
>
>
> [gcc]
>
> 2017-07-30  Bill Schmidt  
>
> PR tree-optimization/81354
> * tree-cfg.c (gimple_redirect_edge_and_branch_1): New decl.
> (reinstall_phi_args): Delete function.
> (make_replacement_pred_edge): New function.
> (gimple_split_edge): Rewrite.
> (gimple_redirect_edge_and_branch_1): New function, factored
> from...
> (gimple_redirect_edge_and_branch): ...here.
> (split_critical_edges): Don't re-split already split edges.
> * tree-eh.c (redirect_eh_edge_1): Make visible.
> * tree-eh.h (redirect_eh_edge_1): Likewise.
>
> [gcc/testsuite]
>
> 2017-07-30  Bill Schmidt  
>
> PR tree-optimization/81354
> * g++.dg/torture/pr81354.C: New file.
>
>
> Index: gcc/testsuite/g++.dg/torture/pr81354.C
> ===
> --- gcc/testsuite/g++.dg/torture/pr81354.C  (nonexistent)
> +++ gcc/testsuite/g++.dg/torture/pr81354.C  (working copy)
> @@ -0,0 +1,24 @@
> +// PR81354 reported this test as crashing in a limited range of revisions.
> +// { dg-do compile }
> +
> +struct T { double a; double b; };
> +
> +void foo(T Ad[], int As[2])
> +{
> +  int j;
> +  int i;
> +  int Bs[2] = {0,0};
> +  T Bd[16];
> +
> +  for (j = 0; j < 4; j++) {
> +for (i = 0; i + 1 <= j + 1; i++) {
> +  Ad[i + As[0] * j] = Bd[i + Bs[0] * j];
> +}
> +
> +i = j + 1;  // <- comment out this line and it does not crash
> +for (; i + 1 < 5; i++) {
> +  Ad[i + As[0] * j].a = 0.0;
> +  Ad[i + As[0] * j].b = 0.0;
> +}
> +  }
> +}
> Index: gcc/tree-cfg.c
> ===

Re: [PATCH 4/N] Recover GOTO predictor.

2017-07-31 Thread Richard Biener
On Mon, Jul 31, 2017 at 9:46 AM, Martin Liška  wrote:
> Richi?

4 is fine.

> Thanks
>
> On 06/30/2017 03:48 PM, Martin Liška wrote:
>> On 06/22/2017 12:27 PM, Richard Biener wrote:
>>> On Wed, Jun 21, 2017 at 3:06 PM, Martin Liška  wrote:
 Hello.

 There's one additional predictor enhancement that is GOTO predict that
 used to working. Following patch adds expect statement for C and C++ family
 languages.

 There's one fallout which is vrp24.c test-case, where only 'Simplified 
 relational'
 appears just once. Adding Richi and Patrick who can probably help how to 
 fix the
 test-case.
>>>
>>> Happens to be optimized better now, there's only one predicate to simplify
>>> left in the IL input to VRP1.  I suggest to change it to match 1 times and 
>>> add
>>> -fdump-tree-optimized to dg-options and
>>>
>>> /* { dg-final { scan-tree-dump-times "if " 3 "optimized" } } */
>>>
>>> to verify we have 3 conditions left.
>>
>> One small note, I see 4 conditions in optimized dump:
>>
>> sss (struct rtx_def * insn, int code1, int code2, int code3)
>> {
>>   int D1544;
>>   struct rtx_def * body;
>>   _Bool D1562;
>>
>>[100.00%] [count: INV]:
>>   body_5 = insn_4(D)->u.fld[5].rt_rtx;
>>   D1544_6 = body_5->code;
>>   if (D1544_6 == 55)
>> goto  (L7); [34.00%] [count: INV]
>>   else
>> goto ; [66.00%] [count: INV]
>>
>>[66.00%] [count: INV]:
>>   if (code3_7(D) == 99)
>> goto ; [34.00%] [count: INV]
>>   else
>> goto  (L16); [66.00%] [count: INV]
>>
>>[22.44%] [count: INV]:
>>   D1562_9 = code1_8(D) == 10;
>>   if (D1562_9 == 1)
>> goto  (L7); [64.00%] [count: INV]
>>   else
>> goto  (L16); [36.00%] [count: INV]
>>
>>[9.82%] [count: INV]:
>>   arf ();
>>
>>[46.68%] [count: INV]:
>>   frob (); [tail call]
>>   goto  (L16); [100.00%] [count: INV]
>>
>> L7 [48.36%] [count: INV]:
>>   if (code2_12(D) == 42)
>> goto ; [20.24%] [count: INV]
>>   else
>> goto ; [79.76%] [count: INV]
>>
>> L16 [100.00%] [count: INV]:
>>   return;
>>
>> }
>>
>> Is it a problem or adjusting to 4 is fine?
>>
>> Martin
>>
>>>
 Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

 Ready to be installed?
 Martin
>>
>


  1   2   >