[PATCH] Fix PR87906

2018-11-06 Thread Richard Biener


This adds a workaround for LTO decl merging prevailing a
non-ultimate origin decl, breaking invariants of the middle-end.
In the future (GCC 10) I hope to have DIE references here so
this will not be an issue there anymore.
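
As a rough illustration of the invariant only (a toy model, not GCC's
tree representation; toy_decl, toy_block, toy_decl_origin and
toy_fixup_block_origin are hypothetical names), the fixup replaces a
block's abstract origin with that decl's own origin whenever merging
picked a decl that is not its own origin:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a decl: abstract_origin is NULL when the decl is
   its own (ultimate) origin.  */
struct toy_decl
{
  struct toy_decl *abstract_origin;
};

/* Toy stand-in for a BLOCK whose abstract origin is a decl.  */
struct toy_block
{
  struct toy_decl *abstract_origin;
};

/* Same idea as DECL_ORIGIN: the recorded origin if there is one,
   otherwise the decl itself.  */
static struct toy_decl *
toy_decl_origin (struct toy_decl *d)
{
  assert (d != NULL);
  return d->abstract_origin ? d->abstract_origin : d;
}

/* If the block points at a decl that is not its own origin, make it
   point at that decl's origin instead.  A block without an abstract
   origin is left alone.  */
static void
toy_fixup_block_origin (struct toy_block *b)
{
  if (b->abstract_origin)
    b->abstract_origin = toy_decl_origin (b->abstract_origin);
}
```

The toy only models the case where the abstract origin is a decl; in
the real streamer the origin can also be another BLOCK, which is why
the patch checks DECL_P first.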

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

From ff035da8314ea8e0889b99bb338e67dd5dae455b Mon Sep 17 00:00:00 2001
From: Richard Guenther 
Date: Wed, 7 Nov 2018 08:56:52 +0100
Subject: [PATCH] fix-pr87906

2018-11-07  Richard Biener  

PR lto/87906
* tree-streamer-in.c (lto_input_ts_block_tree_pointers): Fixup
BLOCK_ABSTRACT_ORIGIN to be the ultimate origin.

* g++.dg/lto/pr87906_0.C: New testcase.
* g++.dg/lto/pr87906_1.C: Likewise.

diff --git a/gcc/testsuite/g++.dg/lto/pr87906_0.C b/gcc/testsuite/g++.dg/lto/pr87906_0.C
new file mode 100644
index 000..08e7ed3ba07
--- /dev/null
+++ b/gcc/testsuite/g++.dg/lto/pr87906_0.C
@@ -0,0 +1,35 @@
+// { dg-lto-do link }
+// { dg-lto-options { { -O -fPIC -flto } } }
+// { dg-extra-ld-options "-shared -nostdlib" }
+
+namespace com {
+namespace sun {
+namespace star {}
+} // namespace sun
+} // namespace com
+namespace a = com::sun::star;
+namespace com {
+namespace sun {
+namespace star {
+namespace uno {
+class a {
+public:
+  ~a();
+};
+
+class b {
+public:
+  ~b();
+  a c;
+};
+class c {
+  b e;
+};
+class RuntimeException : b {};
+} // namespace uno
+} // namespace star
+} // namespace sun
+} // namespace com
+template <typename> void d(int) { throw a::uno::RuntimeException(); }
+int f;
+void g() { d<int>(f); }
diff --git a/gcc/testsuite/g++.dg/lto/pr87906_1.C b/gcc/testsuite/g++.dg/lto/pr87906_1.C
new file mode 100644
index 000..ee5849fd604
--- /dev/null
+++ b/gcc/testsuite/g++.dg/lto/pr87906_1.C
@@ -0,0 +1,23 @@
+namespace com {
+namespace sun {
+namespace star {
+namespace uno {
+class a {
+public:
+  ~a();
+};
+class b {
+public:
+  ~b();
+  a c;
+};
+class RuntimeException : b {};
+} // namespace uno
+class C : uno::RuntimeException {};
+} // namespace star
+} // namespace sun
+} // namespace com
+using com::sun::star::C;
+using com::sun::star::uno::RuntimeException;
+void d() { throw RuntimeException(); }
+void e() { C(); }
diff --git a/gcc/tree-streamer-in.c b/gcc/tree-streamer-in.c
index bd98ed0b128..5c989a221cd 100644
--- a/gcc/tree-streamer-in.c
+++ b/gcc/tree-streamer-in.c
@@ -915,6 +915,12 @@ lto_input_ts_block_tree_pointers (struct lto_input_block *ib,
 
   BLOCK_SUPERCONTEXT (expr) = stream_read_tree (ib, data_in);
   BLOCK_ABSTRACT_ORIGIN (expr) = stream_read_tree (ib, data_in);
+  /* We may end up prevailing a decl with DECL_ORIGIN (t) != t here
+     which breaks the invariant that BLOCK_ABSTRACT_ORIGIN is the
+     ultimate origin.  Fixup here.
+     ???  This should get fixed with moving to DIE references.  */
+  if (DECL_P (BLOCK_ORIGIN (expr)))
+    BLOCK_ABSTRACT_ORIGIN (expr) = DECL_ORIGIN (BLOCK_ABSTRACT_ORIGIN (expr));
   /* Do not stream BLOCK_NONLOCALIZED_VARS.  We cannot handle debug information
      for early inlined BLOCKs so drop it on the floor instead of ICEing in
      dwarf2out.c.  */


Re: Fix bug in fld_type_variant

2018-11-06 Thread Jan Hubicka
> Hi,
> in fld_type_variant I lost code copying alignment. This patch fixes it
> and also checks that newly constructed variant is indeed compatible.
> 
> Bootstrapped/regtested x86_64-linux, committed as obvious.
Hi,
it turns out that the situation is more subtle. We need to copy the
alignment for normal types (such as variants of pointers), but we must
not copy it when we create an incomplete variant of a complete type. The
alignment of an incomplete variant is always BITS_PER_UNIT.
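
A minimal sketch of that rule, using a toy type record rather than
GCC trees (toy_type, toy_variant_equal_p, toy_make_variant and
TOY_BITS_PER_UNIT are all made up for illustration):

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_BITS_PER_UNIT 8

/* Toy stand-in for a type node.  */
struct toy_type
{
  unsigned quals;
  unsigned align;          /* alignment in bits */
  bool record_or_union;
  bool complete;
};

/* Compare T with a candidate variant V.  Alignment is ignored when V
   is an incomplete variant of a record or union type.  */
static bool
toy_variant_equal_p (const struct toy_type *t, const struct toy_type *v)
{
  if (t->quals != v->quals)
    return false;
  if ((!t->record_or_union || v->complete) && t->align != v->align)
    return false;
  return true;
}

/* Build a variant of T.  Only non-record types and complete variants
   inherit the alignment of T; an incomplete record variant keeps the
   minimal alignment.  */
static struct toy_type
toy_make_variant (const struct toy_type *t, unsigned quals, bool complete)
{
  struct toy_type v
    = { quals, TOY_BITS_PER_UNIT, t->record_or_union, complete };
  if (!v.record_or_union || v.complete)
    v.align = t->align;
  assert (toy_variant_equal_p (t, &v) || quals != t->quals);
  return v;
}
```

The assert at the end of toy_make_variant plays the role of the
gcc_checking_assert in the patch: a freshly built variant with the same
qualifiers must compare equal to its source.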

I have regtested and LTO-bootstrapped the attached patch, which further
reduces the number of type duplicates from 500 to 200.

Honza

* tree.c (fld_type_variant_equal_p): Skip TYPE_ALIGN check when
building incomplete variant of complete type.
(fld_type_variant): Do not copy TYPE_ALIGN when building incomplete
variant of complete type.
Index: tree.c
===================================================================
--- tree.c  (revision 265848)
+++ tree.c  (working copy)
@@ -5106,12 +5106,15 @@ static bool
 fld_type_variant_equal_p (tree t, tree v)
 {
   if (TYPE_QUALS (t) != TYPE_QUALS (v)
-  || TYPE_ALIGN (t) != TYPE_ALIGN (v)
+  /* We want to match incomplete variants with complete types.
+In this case we need to ignore alignment.   */
+  || ((!RECORD_OR_UNION_TYPE_P (t) || COMPLETE_TYPE_P (v))
+ && TYPE_ALIGN (t) != TYPE_ALIGN (v))
   || fld_simplified_type_name (t) != fld_simplified_type_name (v)
   || !attribute_list_equal (TYPE_ATTRIBUTES (t),
TYPE_ATTRIBUTES (v)))
 return false;

   return true;
 }
 
@@ -5134,7 +5137,10 @@ fld_type_variant (tree first, tree t, st
   TYPE_NAME (v) = TYPE_NAME (t);
   TYPE_ATTRIBUTES (v) = TYPE_ATTRIBUTES (t);
   TYPE_CANONICAL (v) = TYPE_CANONICAL (t);
-  SET_TYPE_ALIGN (v, TYPE_ALIGN (t));
+  /* Variants of incomplete types should have alignment 
+ set to BITS_PER_UNIT.  Do not copy the actual alignment.  */
+  if (!RECORD_OR_UNION_TYPE_P (v) || COMPLETE_TYPE_P (v))
+SET_TYPE_ALIGN (v, TYPE_ALIGN (t));
   gcc_checking_assert (fld_type_variant_equal_p (t,v));
   add_tree_to_fld_list (v, fld);
   return v;


[PR87793] reject non-toplevel unspecs in debug loc exprs on x86

2018-11-06 Thread Alexandre Oliva
Before revision 254025, we'd reject UNSPECs in debug loc exprs.
TARGET_CONST_NOT_OK_FOR_DEBUG_P still rejects them by default and on all
ports that override it, except for x86, which accepts @gotoff unspecs.
We can indeed accept them in top-level expressions, but not as
subexpressions: the assembler rejects the difference between two
@gotoff symbols, for example.
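
To make the rule concrete, here is a small sketch over a toy
expression node instead of real RTL (toy_rtx, toy_code, toy_is_unspec
and toy_not_ok_for_debug_p are hypothetical names): an unspec is fine
on its own, but not as a direct operand of a unary or binary operation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A handful of toy expression codes; real RTL has many more.  */
enum toy_code
{
  TOY_SYMBOL, TOY_CONST, TOY_UNSPEC, TOY_NEG, TOY_PLUS, TOY_MINUS
};

struct toy_rtx
{
  enum toy_code code;
  const struct toy_rtx *op0;
  const struct toy_rtx *op1;
};

static bool
toy_is_unspec (const struct toy_rtx *x)
{
  return x != NULL && x->code == TOY_UNSPEC;
}

/* Return true if X must be rejected for debug output.  Only direct
   operands are inspected here; this sketch assumes the caller applies
   the check to every subexpression in turn.  */
static bool
toy_not_ok_for_debug_p (const struct toy_rtx *x)
{
  assert (x != NULL);
  switch (x->code)
    {
    case TOY_NEG:
      return toy_is_unspec (x->op0);
    case TOY_PLUS:
    case TOY_MINUS:
      return toy_is_unspec (x->op0) || toy_is_unspec (x->op1);
    default:
      return false;
    }
}
```

With this, a lone sym@gotoff is accepted, while the difference of two
@gotoff unspecs is rejected.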

We could simplify such a difference and drop the @gotoffs, provided
that the symbols are in the same section; we could also accept
@gotoffs plus literal constants.  However, accepting those but
rejecting such combinations as subexpressions would be ugly, and most
likely not worth the trouble: sym@gotoff+litconst hardly makes sense
as a standalone expression, and the difference between @gotoffs should
be avoided to begin with, as follows.

Ideally, the debug loc exprs would use the symbolic data in
REG_EQUIV/REG_EQUAL notes, or delegitimized addresses, instead of
simplifying the difference between two legitimized addresses so that
the occurrences of the GOT register cancel each other.  That would
require some more elaborate surgery in var-tracking and cselib than
would be appropriate at this stage.

Regstrapped on x86_64- and i686-linux-gnu.  Ok to install?


for  gcc/ChangeLog

PR target/87793
* config/i386/i386.c (ix86_const_not_ok_for_debug_p): Reject
non-toplevel UNSPEC.

for  gcc/testsuite/ChangeLog

PR target/87793
* gcc.dg/pr87793.c: New.
---
 gcc/config/i386/i386.c |   12 +++
 gcc/testsuite/gcc.dg/pr87793.c |   42 
 2 files changed, 54 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr87793.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ae8971c82b0a..424a4b20631c 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -17172,6 +17172,18 @@ ix86_const_not_ok_for_debug_p (rtx x)
   if (SYMBOL_REF_P (x) && strcmp (XSTR (x, 0), GOT_SYMBOL_NAME) == 0)
 return true;
 
+  /* Reject UNSPECs within expressions.  We could accept symbol@gotoff
+ + literal_constant, but that would hardly come up in practice,
+ and it's not worth the trouble of having to reject that as an
+ operand to pretty much anything else.  */
+  if (UNARY_P (x)
+  && GET_CODE (XEXP (x, 0)) == UNSPEC)
+return true;
+  if (BINARY_P (x)
+  && (GET_CODE (XEXP (x, 0)) == UNSPEC
+ || GET_CODE (XEXP (x, 1)) == UNSPEC))
+return true;
+
   return false;
 }
 
diff --git a/gcc/testsuite/gcc.dg/pr87793.c b/gcc/testsuite/gcc.dg/pr87793.c
new file mode 100644
index ..3194313a265d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr87793.c
@@ -0,0 +1,42 @@
+/* { dg-do compile } */
+/* { dg-options "-fpic -Os -g" } */
+
+struct fit_loadable_tbl {
+   int type;
+   void (*handler)(int data, int size);
+};
+
+#define ll_entry_start(_type, _list)					\
+({									\
+	static char start[0] __attribute__((aligned(4)))		\
+	__attribute__((unused, section(".u_boot_list_2_"#_list"_1")));	\
+	(_type *)&start;						\
+})
+
+#define ll_entry_end(_type, _list)					\
+({									\
+	static char end[0] __attribute__((aligned(4)))			\
+	__attribute__((unused, section(".u_boot_list_2_"#_list"_3")));	\
+	(_type *)&end;							\
+})
+
+#define ll_entry_count(_type, _list)   \
+   ({  \
+   _type *start = ll_entry_start(_type, _list);\
+   _type *end = ll_entry_end(_type, _list);\
+   unsigned int _ll_result = end - start;  \
+   _ll_result; \
+   })
+
+void test(int img_type, int img_data, int img_len)
+{
+   int i;
+   const unsigned int count =
+   ll_entry_count(struct fit_loadable_tbl, fit_loadable);
+   struct fit_loadable_tbl *fit_loadable_handler =
+   ll_entry_start(struct fit_loadable_tbl, fit_loadable);
+
+   for (i = 0; i < count; i++, fit_loadable_handler++)
+   if (fit_loadable_handler->type == img_type)
+   fit_loadable_handler->handler(img_data, img_len);
+}

-- 
Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
Be the change, be Free!         FSF Latin America board member
GNU Toolchain Engineer                Free Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás-GNUChe


Re: [PATCH] x86: Optimize VFIXUPIMM* patterns with multiple-alternative constraints

2018-11-06 Thread Uros Bizjak
On Tue, Nov 6, 2018 at 11:16 AM Wei Xiao  wrote:
>
> Hi maintainers,
>
> The attached patch intends to optimize VFIXUPIMM* patterns with
> multiple-alternative constraints and
> 4 patterns are combined into 2 patterns. Tested with bootstrap and
> regression tests on x86_64. No regressions.
>
> Is it OK for trunk?

I'm not convinced that this particular optimization is a good idea.
Looking at the patch, you have to add a whole bunch of substs just to
merge two pattern sets. Also, the approach diverges from the established
approach of handling zero masks. The latter raises maintenance costs
for no compelling reason.

I'd say to leave these patterns the way they are.

Uros.


Re: Update libquadmath fmaq from glibc, fix nanq issues

2018-11-06 Thread Jakub Jelinek
On Wed, Nov 07, 2018 at 12:29:21AM +, Joseph Myers wrote:
> This patch extends update-quadmath.py to update fmaq from glibc.
> 
> The issue in that function was that quadmath-imp.h had a struct in a
> union with mant_high and mant_low fields (up to 64-bit) whereas glibc
> has mantissa0, mantissa1, mantissa2 and mantissa3 (up to 32-bit).  The
> patch changes those fields to be the same as in glibc, moving printf /
> strtod code that also uses those fields back to closer to the glibc
> form.  This allows fmaq to be updated automatically from glibc (which
> brings in at least one bug fix from glibc from 2015).
> 
> nanq was also using the mant_high field name, and had other issues: it
> only partly initialized the union from which a value was returned, and
> setting mant_high to 1 meant a signaling NaN would be returned rather
> than a quiet NaN.  This patch fixes those issues as part of updating
> it to use the changed interfaces (but does not fix the issue of not
> using the argument).
> 
> Bootstrapped with no regressions on x86_64-pc-linux-gnu.

I don't know about dropping the HAVE_FENV_H/USE_FENV_H stuff; don't we
support libquadmath on targets that don't have fenv.h?
In other sources, like e.g. expq.c, the USE_FENV_H guards are still kept.

Jakub


[PATCH 6/6] [RS6000] inline plt call sequences

2018-11-06 Thread Alan Modra
Finally, the point of the previous patches in this series, support for
inline PLT calls, keyed off -fno-plt.  This emits code using new
relocations that tie all insns in the sequence together, so that the
linker can edit the sequence back to a direct call should the call
target turn out to be local.  An example of ELFv2 code to call puts is
as follows:

	.reloc .,R_PPC64_PLTSEQ,puts
	std 2,24(1)
	.reloc .,R_PPC64_PLT16_HA,puts
	addis 12,2,0
	.reloc .,R_PPC64_PLT16_LO_DS,puts
	ld 12,0(12)
	.reloc .,R_PPC64_PLTSEQ,puts
	mtctr 12
	.reloc .,R_PPC64_PLTCALL,puts
	bctrl
	ld 2,24(1)

"addis 12,2,puts@plt@ha" and "ld 12,puts@plt@l(12)" are also supported
by the assembler.  gcc instead uses the explicit R_PPC64_PLT16_HA and
R_PPC64_PLT16_LO_DS relocs because when the call is to __tls_get_addr
an extra reloc is emitted at every place where one is shown above, to
specify the __tls_get_addr arg.  The linker expects the extra reloc to
come first.  .reloc enforces that ordering.
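
For illustration only, the ordering can be sketched as a helper that
formats the sequence shown above, with every .reloc directive placed
directly before the insn it annotates.  toy_format_pltseq is a
hypothetical name; this is not rs6000_output_pltseq, and the
__tls_get_addr case with its extra reloc is not modelled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format the ELFv2 inline PLT call sequence for SYM into BUF.  Each
   .reloc line comes immediately before the instruction it applies to.
   Returns what snprintf returns.  */
static int
toy_format_pltseq (char *buf, size_t size, const char *sym)
{
  assert (buf != NULL && sym != NULL);
  return snprintf (buf, size,
                   "\t.reloc .,R_PPC64_PLTSEQ,%s\n"
                   "\tstd 2,24(1)\n"
                   "\t.reloc .,R_PPC64_PLT16_HA,%s\n"
                   "\taddis 12,2,0\n"
                   "\t.reloc .,R_PPC64_PLT16_LO_DS,%s\n"
                   "\tld 12,0(12)\n"
                   "\t.reloc .,R_PPC64_PLTSEQ,%s\n"
                   "\tmtctr 12\n"
                   "\t.reloc .,R_PPC64_PLTCALL,%s\n"
                   "\tbctrl\n"
                   "\tld 2,24(1)\n",
                   sym, sym, sym, sym, sym);
}
```

Calling toy_format_pltseq (buf, sizeof buf, "puts") with a buffer of a
few hundred bytes reproduces the listing above.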

The patch also changes code emitted for longcalls if the assembler
supports the new marker relocs, so that these too can be edited.  One
side effect of longcalls using PLT16 relocs is that they can now be
resolved lazily by ld.so.

I don't support lazy inline PLT calls for ELFv1, because ELFv1 would
need barriers to reliably load both the function address and toc
pointer from the PLT.  ELFv1 -fno-plt uses the longcall sequence
instead, which isn't edited by GNU ld.

* config.in (HAVE_AS_PLTSEQ): Add.
* config/rs6000/predicates.md (indirect_call_operand): New.
* config/rs6000/rs6000-protos.h (rs6000_output_pltseq): Declare.
* config/rs6000/rs6000.c (init_cumulative_args): Set cookie
CALL_LONG for -fno-plt.
(print_operand ): Handle UNSPEC_PLTSEQ.
(rs6000_output_indirect_call): Emit .reloc directives for
UNSPEC_PLTSEQ calls.
(rs6000_output_pltseq): New function.
(rs6000_longcall_ref): Add arg parameter.  Use PLT16 insns if
relocs supported by assembler.  Move SYMBOL_REF test to callers.
(rs6000_call_aix): Adjust rs6000_longcall_ref call.  Package
insns in UNSPEC_PLTSEQ, preserving original func_desc.
(rs6000_call_sysv): Likewise.
(rs6000_sibcall_sysv): New function.
* config/rs6000/rs6000-protos.h (rs6000_sibcall_sysv): Declare.
* config/rs6000/rs6000.h (HAVE_AS_PLTSEQ): Provide default.
* config/rs6000/rs6000.md (UNSPEC_PLTSEQ, UNSPEC_PLT16_HA,
UNSPEC_PLT16_LO): New.
(pltseq_tocsave, pltseq_plt16_ha, pltseq_plt16_lo, pltseq_mtctr): New.
(call_indirect_nonlocal_sysv): Don't differentiate zero from non-zero
cookie in constraints.  Test explicitly for flags in length attr.
Handle unspec operand 1.
(call_value_indirect_nonlocal_sysv): Likewise.
(call_indirect_aix, call_value_indirect_aix): Handle unspec operand 1.
(call_indirect_elfv2, call_value_indirect_elfv2): Likewise.
(sibcall, sibcall_value): Use rs6000_sibcall_sysv.
(sibcall_indirect_nonlocal_sysv): New pattern.
(sibcall_value_indirect_nonlocal_sysv): Likewise.
(sibcall_nonlocal_sysv, sibcall_value_nonlocal_sysv): Remove indirect
call alternatives.
* configure.ac: Check for gas plt sequence marker support.
* configure: Regenerate.

diff --git a/gcc/config.in b/gcc/config.in
index 67a1e6cfc4c..86ff5e8636b 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -577,6 +577,12 @@
 #endif
 
 
+/* Define if your assembler supports R_PPC*_PLTSEQ relocations. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_AS_PLTSEQ
+#endif
+
+
 /* Define if your assembler supports .ref */
 #ifndef USED_FOR_TARGET
 #undef HAVE_AS_REF
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 8f13c1457e4..805d92ea1f1 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -1055,6 +1055,24 @@ (define_predicate "call_operand"
  || REGNO (op) >= FIRST_PSEUDO_REGISTER")
  (match_code "symbol_ref")))
 
+;; Return 1 if the operand, used inside a MEM, is a valid first argument
+;; to an indirect CALL.  This is LR, CTR, or a PLTSEQ unspec using CTR.
+(define_predicate "indirect_call_operand"
+  (match_code "reg,unspec")
+{
+  if (REG_P (op))
+return (REGNO (op) == LR_REGNO
+   || REGNO (op) == CTR_REGNO);
+  if (GET_CODE (op) == UNSPEC)
+{
+  if (XINT (op, 1) != UNSPEC_PLTSEQ)
+   return false;
+  op = XVECEXP (op, 0, 0);
+  return REG_P (op) && REGNO (op) == CTR_REGNO;
+}
+  return false;
+})
+
 ;; Return 1 if the operand is a SYMBOL_REF for a function known to be in
 ;; this file.
 (define_predicate "current_file_function_operand"
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 9e84c692a9b..27a161983da 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ 

[PATCH 5/6] [RS6000] Use standard call patterns for __tls_get_addr calls

2018-11-06 Thread Alan Modra
The current code handling __tls_get_addr calls for powerpc*-linux
generates a call then overwrites the call insn with a special
tls_{gd,ld}_{aix,sysv} pattern.  It's done that way to support
!TARGET_TLS_MARKERS, where the arg setup insns need to be emitted
immediately before the branch and link.  When TARGET_TLS_MARKERS, the
arg setup insns are split from the actual call, but we then have a
non-standard call pattern that needs to be carried through to output.

This patch changes that scheme, to instead use the standard call
patterns for __tls_get_addr calls, except for the now rare
!TARGET_TLS_MARKERS case.  Doing it this way should be better for
maintenance as the !TARGET_TLS_MARKERS code can eventually disappear.
It also makes it possible to support longcalls (and in following
patches, inline plt calls) for __tls_get_addr without introducing yet
more special call patterns.

__tls_get_addr calls do however need to be different to standard
calls, because when TARGET_TLS_MARKERS the calls are decorated with an
argument specifier, eg. "bl __tls_get_addr(thread_var@tlsgd)" that
causes a reloc to be emitted by the assembler tying the call to its
arg setup insns.  I chose to smuggle the arg in the currently unused
stack size rtl.
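
A toy sketch of what the decorated call looks like on output, assuming
the TLS argument arrives as an optional extra operand
(toy_format_call is a hypothetical helper, not rs6000_output_call):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a direct call to FUNC.  TLSARG is NULL for an ordinary call;
   otherwise it is the argument specifier that ties the call to its
   argument setup insns.  Returns what snprintf returns.  */
static int
toy_format_call (char *buf, size_t size, const char *func,
                 const char *tlsarg)
{
  assert (buf != NULL && func != NULL);
  if (tlsarg != NULL)
    return snprintf (buf, size, "bl %s(%s)", func, tlsarg);
  return snprintf (buf, size, "bl %s", func);
}
```

Passing "thread_var@tlsgd" as TLSARG yields the decorated form quoted
above, while a NULL TLSARG gives a plain "bl" to the function.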

I've also introduced rs6000_call_sysv to generate rtl for sysv calls,
as rs6000_call_aix does for aix and elfv2 calls.  This allows
rs6000_longcall_ref to be local to rs6000.c since the calls in the
expanders never did anything for darwin.

* config/rs6000/predicates.md (unspec_tls): New.
* config/rs6000/rs6000-protos.h (rs6000_output_call): Update proto.
(rs6000_longcall_ref): Delete.
(rs6000_call_sysv): Declare.
* config/rs6000/rs6000.c (edit_tls_call_insn): New function.
(global_tlsarg): New variable.
(rs6000_legitimize_tls_address): Rewrite __tls_get_addr call
handling.
(print_operand): Extract UNSPEC_TLSGD address operand.
(rs6000_output_call): Remove arg parameter, extract from second
call operand instead.
(rs6000_longcall_ref): Make static, localize vars.
(rs6000_call_aix): Rename parameter to reflect new usage.  Take
tlsarg from global_tlsarg.  Don't create unused rtl or nop insns.
(rs6000_sibcall_aix): Rename parameter to reflect new usage.  Take
tlsarg from global_tlsarg.
(rs6000_call_sysv): New function.
* config/rs6000/rs6000.md: Adjust rs6000_output_call throughout.
(tls_sysv_suffix): Delete.
(tls_gd_aix, tls_gd_sysv, tls_gd_call_aix, tls_gd_call_sysv): Delete.
(tls_ld_aix, tls_ld_sysv, tls_ld_call_aix, tls_ld_call_sysv): Delete.
(tls_gdld_aix, tls_gdld_sysv): New insns, replacing above.
(tls_gd): Swap operand order.  Simplify mode selection.
(tls_gd_high, tls_gd_low): Swap operand order.
(tls_ld): Remove const_int 0 vector element from UNSPEC_TLSLD.
Simplify mode selection.
(tls_ld_high, tls_ld_low): Similarly adjust UNSPEC_TLSLD.
(call, call_value): Don't assert for second call operand.
Use rs6000_call_sysv.

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 40b0114a64f..8f13c1457e4 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -1039,6 +1039,13 @@ (define_predicate "rs6000_tls_symbol_ref"
   (and (match_code "symbol_ref")
(match_test "RS6000_SYMBOL_REF_TLS_P (op)")))
 
+;; Return 1 for the UNSPEC used in TLS call operands
+(define_predicate "unspec_tls"
+  (match_code "unspec")
+{
+  return XINT (op, 1) == UNSPEC_TLSGD || XINT (op, 1) == UNSPEC_TLSLD;
+})
+
 ;; Return 1 if the operand, used inside a MEM, is a valid first argument
 ;; to CALL.  This is a SYMBOL_REF, a pseudo-register, LR or CTR.
 (define_predicate "call_operand"
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 493cfe6ba2b..9e84c692a9b 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -111,7 +111,7 @@ extern int ccr_bit (rtx, int);
 extern void rs6000_output_function_entry (FILE *, const char *);
 extern void print_operand (FILE *, rtx, int);
 extern void print_operand_address (FILE *, rtx);
-extern const char *rs6000_output_call (rtx *, unsigned int, bool, const char *);
+extern const char *rs6000_output_call (rtx *, unsigned int, bool);
 extern const char *rs6000_output_indirect_call (rtx *, unsigned int, bool);
 extern enum rtx_code rs6000_reverse_condition (machine_mode,
   enum rtx_code);
@@ -134,7 +134,6 @@ extern void rs6000_expand_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx);
 extern void rs6000_emit_swdiv (rtx, rtx, rtx, bool);
 extern void rs6000_emit_swsqrt (rtx, rtx, bool);
 extern void output_toc (FILE *, rtx, int, machine_mode);
-extern rtx rs6000_longcall_ref (rtx);
 extern void rs6000_fatal_bad_address (rtx);
 extern rtx create_TOC_reference (rtx, 

[PATCH 4/6] [RS6000] Remove constraints on call rounded_stack_size_rtx arg

2018-11-06 Thread Alan Modra
This call arg is unused on rs6000.

* config/rs6000/darwin.md (call_indirect_nonlocal_darwin64,
call_nonlocal_darwin64, call_value_indirect_nonlocal_darwin64,
call_value_nonlocal_darwin64): Remove constraints from second call
arg, the rounded_stack_size_rtx arg.
* config/rs6000/rs6000.md (tls_gd_aix, tls_gd_sysv,
tls_gd_call_aix, tls_gd_call_sysv, tls_ld_aix, tls_ld_sysv,
tls_ld_call_aix, tls_ld_call_sysv, call_local32, call_local64,
call_value_local32, call_value_local64, call_indirect_nonlocal_sysv,
call_nonlocal_sysv, call_nonlocal_sysv_secure,
call_value_indirect_nonlocal_sysv, call_value_nonlocal_sysv,
call_value_nonlocal_sysv_secure, call_local_aix,
call_value_local_aix, call_nonlocal_aix, call_value_nonlocal_aix,
call_indirect_aix, call_value_indirect_aix, call_indirect_elfv2,
call_value_indirect_elfv2, sibcall_local32, sibcall_local64,
sibcall_value_local32, sibcall_value_local64, sibcall_aix,
sibcall_value_aix): Likewise.

diff --git a/gcc/config/rs6000/darwin.md b/gcc/config/rs6000/darwin.md
index 2d6d1ca57dd..a1c07702d6f 100644
--- a/gcc/config/rs6000/darwin.md
+++ b/gcc/config/rs6000/darwin.md
@@ -302,7 +302,7 @@ (define_insn "macho_correct_pic_di"
 
 (define_insn "*call_indirect_nonlocal_darwin64"
   [(call (mem:SI (match_operand:DI 0 "register_operand" "c,*l,c,*l"))
-(match_operand 1 "" "g,g,g,g"))
+(match_operand 1))
(use (match_operand:SI 2 "immediate_operand" "O,O,n,n"))
(clobber (reg:SI LR_REGNO))]
   "DEFAULT_ABI == ABI_DARWIN && TARGET_64BIT"
@@ -314,7 +314,7 @@ (define_insn "*call_indirect_nonlocal_darwin64"
 
 (define_insn "*call_nonlocal_darwin64"
   [(call (mem:SI (match_operand:DI 0 "symbol_ref_operand" "s,s"))
-(match_operand 1 "" "g,g"))
+(match_operand 1))
(use (match_operand:SI 2 "immediate_operand" "O,n"))
(clobber (reg:SI LR_REGNO))]
   "(DEFAULT_ABI == ABI_DARWIN)
@@ -332,7 +332,7 @@ (define_insn "*call_nonlocal_darwin64"
 (define_insn "*call_value_indirect_nonlocal_darwin64"
   [(set (match_operand 0 "" "")
(call (mem:SI (match_operand:DI 1 "register_operand" "c,*l,c,*l"))
- (match_operand 2 "" "g,g,g,g")))
+ (match_operand 2)))
(use (match_operand:SI 3 "immediate_operand" "O,O,n,n"))
(clobber (reg:SI LR_REGNO))]
   "DEFAULT_ABI == ABI_DARWIN"
@@ -345,7 +345,7 @@ (define_insn "*call_value_indirect_nonlocal_darwin64"
 (define_insn "*call_value_nonlocal_darwin64"
   [(set (match_operand 0 "" "")
(call (mem:SI (match_operand:DI 1 "symbol_ref_operand" "s,s"))
- (match_operand 2 "" "g,g")))
+ (match_operand 2)))
(use (match_operand:SI 3 "immediate_operand" "O,n"))
(clobber (reg:SI LR_REGNO))]
   "(DEFAULT_ABI == ABI_DARWIN)
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 3f9830bc743..bed4c6c48fa 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -9445,7 +9445,7 @@ (define_mode_attr tls_insn_suffix [(SI "wz") (DI "d")])
 (define_insn_and_split "tls_gd_aix"
   [(set (match_operand:P 0 "gpc_reg_operand" "=b")
 (call (mem:SI (match_operand:P 3 "symbol_ref_operand" "s"))
- (match_operand 4 "" "g")))
+ (match_operand 4)))
(unspec:P [(match_operand:P 1 "gpc_reg_operand" "b")
  (match_operand:P 2 "rs6000_tls_symbol_ref" "")]
 UNSPEC_TLSGD)
@@ -9479,7 +9479,7 @@ (define_insn_and_split "tls_gd_aix"
 (define_insn_and_split "tls_gd_sysv"
   [(set (match_operand:P 0 "gpc_reg_operand" "=b")
 (call (mem:SI (match_operand:P 3 "symbol_ref_operand" "s"))
- (match_operand 4 "" "g")))
+ (match_operand 4)))
(unspec:P [(match_operand:P 1 "gpc_reg_operand" "b")
  (match_operand:P 2 "rs6000_tls_symbol_ref" "")]
 UNSPEC_TLSGD)
@@ -9546,7 +9546,7 @@ (define_insn "*tls_gd_low"
 (define_insn "*tls_gd_call_aix"
   [(set (match_operand:P 0 "gpc_reg_operand" "=b")
 (call (mem:SI (match_operand:P 1 "symbol_ref_operand" "s"))
- (match_operand 2 "" "g")))
+ (match_operand 2)))
(unspec:P [(match_operand:P 3 "rs6000_tls_symbol_ref" "")]
 UNSPEC_TLSGD)
(clobber (reg:SI LR_REGNO))]
@@ -9561,7 +9561,7 @@ (define_insn "*tls_gd_call_aix"
 (define_insn "*tls_gd_call_sysv"
   [(set (match_operand:P 0 "gpc_reg_operand" "=b")
 (call (mem:SI (match_operand:P 1 "symbol_ref_operand" "s"))
- (match_operand 2 "" "g")))
+ (match_operand 2)))
(unspec:P [(match_operand:P 3 "rs6000_tls_symbol_ref" "")]
 UNSPEC_TLSGD)
(clobber (reg:SI LR_REGNO))]
@@ -9574,7 +9574,7 @@ (define_insn "*tls_gd_call_sysv"
 (define_insn_and_split "tls_ld_aix"
   [(set (match_operand:P 0 "gpc_reg_operand" "=b")
 (call (mem:SI (match_operand:P 2 "symbol_ref_operand" "s"))
- (match_operand 3 "" "g")))
+  

[PATCH 3/6] [RS6000] Replace TLSmode with P, and correct tls call mems

2018-11-06 Thread Alan Modra
There is really no need to define a TLSmode mode iterator that is
identical (since !TARGET_64BIT == TARGET_32BIT) to the much used P
mode iterator.  It's nonsense to think we might ever want to support
32-bit TLS on 64-bit or vice versa!  The patch also fixes a minor
error in the call mems.  All other direct calls use (call (mem:SI ..)).

* config/rs6000/rs6000.md (TLSmode): Delete mode iterator.  Replace
with P throughout except for call mems which should use SI.

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 9d9e29d12eb..3f9830bc743 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -9438,18 +9438,17 @@ (define_peephole2
 ;; TLS support.
 
 ;; Mode attributes for different ABIs.
-(define_mode_iterator TLSmode [(SI "! TARGET_64BIT") (DI "TARGET_64BIT")])
 (define_mode_attr tls_abi_suffix [(SI "32") (DI "64")])
 (define_mode_attr tls_sysv_suffix [(SI "si") (DI "di")])
 (define_mode_attr tls_insn_suffix [(SI "wz") (DI "d")])
 
-(define_insn_and_split "tls_gd_aix"
-  [(set (match_operand:TLSmode 0 "gpc_reg_operand" "=b")
-(call (mem:TLSmode (match_operand:TLSmode 3 "symbol_ref_operand" "s"))
+(define_insn_and_split "tls_gd_aix"
+  [(set (match_operand:P 0 "gpc_reg_operand" "=b")
+(call (mem:SI (match_operand:P 3 "symbol_ref_operand" "s"))
  (match_operand 4 "" "g")))
-   (unspec:TLSmode [(match_operand:TLSmode 1 "gpc_reg_operand" "b")
-   (match_operand:TLSmode 2 "rs6000_tls_symbol_ref" "")]
-  UNSPEC_TLSGD)
+   (unspec:P [(match_operand:P 1 "gpc_reg_operand" "b")
+ (match_operand:P 2 "rs6000_tls_symbol_ref" "")]
+UNSPEC_TLSGD)
(clobber (reg:SI LR_REGNO))]
   "HAVE_AS_TLS && (DEFAULT_ABI == ABI_AIX || DEFAULT_ABI == ABI_ELFv2)"
 {
@@ -9462,28 +9461,28 @@ (define_insn_and_split "tls_gd_aix"
 }
   "&& TARGET_TLS_MARKERS"
   [(set (match_dup 0)
-   (unspec:TLSmode [(match_dup 1)
-(match_dup 2)]
-   UNSPEC_TLSGD))
+   (unspec:P [(match_dup 1)
+  (match_dup 2)]
+ UNSPEC_TLSGD))
(parallel [(set (match_dup 0)
-  (call (mem:TLSmode (match_dup 3))
-(match_dup 4)))
- (unspec:TLSmode [(match_dup 2)] UNSPEC_TLSGD)
+  (call (mem:SI (match_dup 3))
+(match_dup 4)))
+ (unspec:P [(match_dup 2)] UNSPEC_TLSGD)
  (clobber (reg:SI LR_REGNO))])]
   ""
   [(set_attr "type" "two")
(set (attr "length")
  (if_then_else (ne (symbol_ref "TARGET_CMODEL") (symbol_ref "CMODEL_SMALL"))
-  (const_int 16)
-  (const_int 12)))])
+  (const_int 16)
+  (const_int 12)))])
 
-(define_insn_and_split "tls_gd_sysv"
-  [(set (match_operand:TLSmode 0 "gpc_reg_operand" "=b")
-(call (mem:TLSmode (match_operand:TLSmode 3 "symbol_ref_operand" "s"))
+(define_insn_and_split "tls_gd_sysv"
+  [(set (match_operand:P 0 "gpc_reg_operand" "=b")
+(call (mem:SI (match_operand:P 3 "symbol_ref_operand" "s"))
  (match_operand 4 "" "g")))
-   (unspec:TLSmode [(match_operand:TLSmode 1 "gpc_reg_operand" "b")
-   (match_operand:TLSmode 2 "rs6000_tls_symbol_ref" "")]
-  UNSPEC_TLSGD)
+   (unspec:P [(match_operand:P 1 "gpc_reg_operand" "b")
+ (match_operand:P 2 "rs6000_tls_symbol_ref" "")]
+UNSPEC_TLSGD)
(clobber (reg:SI LR_REGNO))]
   "HAVE_AS_TLS && DEFAULT_ABI == ABI_V4"
 {
@@ -9492,64 +9491,64 @@ (define_insn_and_split "tls_gd_sysv"
 }
   "&& TARGET_TLS_MARKERS"
   [(set (match_dup 0)
-   (unspec:TLSmode [(match_dup 1)
-(match_dup 2)]
-   UNSPEC_TLSGD))
+   (unspec:P [(match_dup 1)
+  (match_dup 2)]
+ UNSPEC_TLSGD))
(parallel [(set (match_dup 0)
-  (call (mem:TLSmode (match_dup 3))
-(match_dup 4)))
- (unspec:TLSmode [(match_dup 2)] UNSPEC_TLSGD)
+  (call (mem:SI (match_dup 3))
+(match_dup 4)))
+ (unspec:P [(match_dup 2)] UNSPEC_TLSGD)
  (clobber (reg:SI LR_REGNO))])]
   ""
   [(set_attr "type" "two")
(set_attr "length" "8")])
 
-(define_insn_and_split "*tls_gd"
-  [(set (match_operand:TLSmode 0 "gpc_reg_operand" "=b")
-   (unspec:TLSmode [(match_operand:TLSmode 1 "gpc_reg_operand" "b")
-(match_operand:TLSmode 2 "rs6000_tls_symbol_ref" "")]
-   UNSPEC_TLSGD))]
+(define_insn_and_split "*tls_gd"
+  [(set (match_operand:P 0 "gpc_reg_operand" "=b")
+   (unspec:P [(match_operand:P 1 "gpc_reg_operand" "b")
+  (match_operand:P 2 "rs6000_tls_symbol_ref" "")]
+ UNSPEC_TLSGD))]
   "HAVE_AS_TLS && TARGET_TLS_MARKERS"
   "addi %0,%1,%2@got@tlsgd"
   "&& TARGET_CMODEL != 

[PATCH 2/6] [RS6000] rs6000_output_indirect_call

2018-11-06 Thread Alan Modra
Like the last patch for external calls, now handle most assembly code
for indirect calls in one place.  The patch also merges some insns,
correcting some !rs6000_speculate_indirect_jumps cases branching to
LR, which don't require a speculation barrier.
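
As a simplified sketch of the sysv case (toy_output_indirect_call is a
made-up name, and the AIX and ELFv2 TOC handling is left out), the
choice between a plain branch and the barrier form can be written as:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Build an output template for an indirect call through operand FUN.
   A branch via LR never needs the speculation barrier, so it is
   treated like the case where speculation is allowed.  A sibcall
   drops the "l" (link) suffix; the patch spells that as
   "l" + sibcall, which is written out as a conditional here.  */
static const char *
toy_output_indirect_call (unsigned int fun, bool speculate_enabled,
                          bool target_is_lr, bool sibcall)
{
  static char str[64];
  const char *link = sibcall ? "" : "l";
  bool speculate = speculate_enabled || target_is_lr;

  assert (fun <= 6);
  if (speculate)
    snprintf (str, sizeof str, "b%%T%u%s", fun, link);
  else
    snprintf (str, sizeof str, "crset 2\n\tbeq%%T%u%s-%s",
              fun, link, sibcall ? "\n\tb $" : "");
  return str;
}
```

The returned string lives in a static buffer, as in the patch, so it
must be consumed before the next call.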

* config/rs6000/rs6000-protos.h (rs6000_output_indirect_call): Declare.
* config/rs6000/rs6000.c (rs6000_output_indirect_call): New function.
* config/rs6000/rs6000.md (call_indirect_nonlocal_sysv): Use
rs6000_output_indirect_call.
(call_value_indirect_nonlocal_sysv, sibcall_nonlocal_sysv): Likewise.
(call_indirect_aix, call_value_indirect_aix,
call_indirect_elfv2, call_value_indirect_elfv2): Likewise, and
handle both speculation and non-speculation cases.
(call_indirect_aix_nospec, call_value_indirect_aix_nospec): Delete.
(call_indirect_elfv2_nospec, call_value_indirect_elfv2_nospec): Delete.

diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index f1a421dde16..493cfe6ba2b 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -112,6 +112,7 @@ extern void rs6000_output_function_entry (FILE *, const char *);
 extern void print_operand (FILE *, rtx, int);
 extern void print_operand_address (FILE *, rtx);
 extern const char *rs6000_output_call (rtx *, unsigned int, bool, const char *);
+extern const char *rs6000_output_indirect_call (rtx *, unsigned int, bool);
 extern enum rtx_code rs6000_reverse_condition (machine_mode,
   enum rtx_code);
 extern rtx rs6000_emit_eqne (machine_mode, rtx, rtx, rtx);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index b22cae55a0d..bf1551746d5 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21411,6 +21411,69 @@ rs6000_output_call (rtx *operands, unsigned int fun, bool sibcall,
   return str;
 }
 
+/* As above, for indirect calls.  */
+
+const char *
+rs6000_output_indirect_call (rtx *operands, unsigned int fun, bool sibcall)
+{
+  /* -Wformat-overflow workaround, without which gcc thinks that %u
+  might produce 10 digits.  FUN is 0 or 1 as of 2018-03.  */
+  gcc_assert (fun <= 6);
+
+  static char str[144];
+  const char *ptrload = TARGET_64BIT ? "d" : "wz";
+
+  bool speculate = (rs6000_speculate_indirect_jumps
+   || (REG_P (operands[fun])
+   && REGNO (operands[fun]) == LR_REGNO));
+
+  if (DEFAULT_ABI == ABI_AIX)
+{
+  if (speculate)
+   sprintf (str,
+"l%s 2,%%%u\n\t"
+"b%%T%ul\n\t"
+"l%s 2,%%%u(1)",
+ptrload, fun + 2, fun, ptrload, fun + 3);
+  else
+   sprintf (str,
+"crset 2\n\t"
+"l%s 2,%%%u\n\t"
+"beq%%T%ul-\n\t"
+"l%s 2,%%%u(1)",
+ptrload, fun + 2, fun, ptrload, fun + 3);
+}
+  else if (DEFAULT_ABI == ABI_ELFv2)
+{
+  if (speculate)
+   sprintf (str,
+"b%%T%ul\n\t"
+"l%s 2,%%%u(1)",
+fun, ptrload, fun + 2);
+  else
+   sprintf (str,
+"crset 2\n\t"
+"beq%%T%ul-\n\t"
+"l%s 2,%%%u(1)",
+fun, ptrload, fun + 2);
+}
+  else if (DEFAULT_ABI == ABI_V4)
+{
+  if (speculate)
+   sprintf (str,
+"b%%T%u%s",
+fun, "l" + sibcall);
+  else
+   sprintf (str,
+"crset 2\n\t"
+"beq%%T%u%s-%s",
+fun, "l" + sibcall, sibcall ? "\n\tb $" : "");
+}
+  else
+gcc_unreachable ();
+  return str;
+}
+
 #if defined (HAVE_GAS_HIDDEN) && !TARGET_MACHO
 /* Emit an assembler directive to set symbol visibility for DECL to
VISIBILITY_TYPE.  */
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 52088fdfbdb..9d9e29d12eb 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -10540,11 +10540,7 @@ (define_insn "*call_indirect_nonlocal_sysv"
   else if (INTVAL (operands[2]) & CALL_V4_CLEAR_FP_ARGS)
 output_asm_insn ("creqv 6,6,6", operands);
 
-  if (rs6000_speculate_indirect_jumps
-  || which_alternative == 1 || which_alternative == 3)
-return "b%T0l";
-  else
-return "crset 2\;beq%T0l-";
+  return rs6000_output_indirect_call (operands, 0, false);
 }
   [(set_attr "type" "jmpreg,jmpreg,jmpreg,jmpreg")
(set_attr_alternative "length"
@@ -10630,11 +10626,7 @@ (define_insn "*call_value_indirect_nonlocal_sysv"
   else if (INTVAL (operands[3]) & CALL_V4_CLEAR_FP_ARGS)
 output_asm_insn ("creqv 6,6,6", operands);
 
-  if (rs6000_speculate_indirect_jumps
-  || which_alternative == 1 || which_alternative == 3)
-return "b%T1l";
-  else
-return "crset 2\;beq%T1l-";
+  return rs6000_output_indirect_call (operands, 1, false);
 }
   [(set_attr "type" "jmpreg,jmpreg,jmpreg,jmpreg")

[PATCH 1/6] [RS6000] rs6000_output_call for external call insn assembly output

2018-11-06 Thread Alan Modra
This is a first step in tidying rs6000 call patterns, in preparation
to support inline plt calls.

* config/rs6000/rs6000-protos.h (rs6000_output_call): Declare.
(macho_output_call): Rename from output_call.
* config/rs6000/rs6000.c (rs6000_output_call): New function.
(macho_output_call): Rename from output_call.
* config/rs6000/rs6000.md (tls_gd_aix): Use rs6000_output_call
to emit call.
(tls_gd_sysv, tls_gd_call_aix, tls_gd_call_sysv): Likewise.
(tls_ld_aix, tls_ld_sysv, tls_ld_call_aix): Likewise.
(tls_ld_call_sysv, call_nonlocal_sysv): Likewise.
(call_nonlocal_sysv_secure, call_value_nonlocal_sysv): Likewise.
(call_value_nonlocal_sysv_secure, call_nonlocal_aix): Likewise.
(call_value_nonlocal_aix, sibcall_nonlocal_sysv): Likewise.
(sibcall_value_nonlocal_sysv, sibcall_aix): Likewise.
(sibcall_value_aix): Likewise.
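
For readers who want to see the shape of the template strings this produces, here is a small standalone sketch.  It is an approximation only: toy_output_call and enum toy_abi are hypothetical, and the ABI, flag_pic and TARGET_SECURE_PLT state that the real function reads from globals is passed in as parameters.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical ABI selector for this sketch; TOY_ABI_AIX stands in
   for both the AIX and ELFv2 cases.  */
enum toy_abi { TOY_ABI_AIX, TOY_ABI_V4 };

/* Build a call template.  FUN is the operand number used with %z,
   ARG is an optional suffix such as a TLS argument specifier (pass ""
   for none).  PIC and SECURE_PLT are assumptions replacing the
   compiler's global state.  */
const char *
toy_output_call (enum toy_abi abi, unsigned int fun, bool sibcall,
		 const char *arg, int pic, bool secure_plt)
{
  char z[16];
  static char str[48];

  /* "%%z%u" leaves a literal "%z<fun>" in the template; the +32768
     addend is only used for secure-plt -fPIC code on SysV.  */
  snprintf (z, sizeof z, "%%z%u%s", fun,
	    abi == TOY_ABI_V4 && secure_plt && pic == 2 ? "+32768" : "");

  if (abi == TOY_ABI_AIX)
    /* A normal call is followed by a nop; a sibcall is not.  */
    snprintf (str, sizeof str, "b%s %s%s%s", "l" + sibcall, z, arg,
	      sibcall ? "" : "\n\tnop");
  else
    /* SysV: PIC calls go through the PLT.  */
    snprintf (str, sizeof str, "b%s %s%s%s", "l" + sibcall, z, arg,
	      pic ? "@plt" : "");
  return str;
}
```

For example, the AIX case with fun == 3 and an empty arg yields "bl %z3" followed by a nop, the same tail the old tls_gd_aix pattern spelled out by hand.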

diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index fb69019c47c..f1a421dde16 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -111,6 +111,7 @@ extern int ccr_bit (rtx, int);
 extern void rs6000_output_function_entry (FILE *, const char *);
 extern void print_operand (FILE *, rtx, int);
 extern void print_operand_address (FILE *, rtx);
+extern const char *rs6000_output_call (rtx *, unsigned int, bool, const char *);
 extern enum rtx_code rs6000_reverse_condition (machine_mode,
   enum rtx_code);
 extern rtx rs6000_emit_eqne (machine_mode, rtx, rtx, rtx);
@@ -228,7 +229,7 @@ extern void (*rs6000_target_modify_macros_ptr) (bool, HOST_WIDE_INT,
 extern void rs6000_d_target_versions (void);
 
 #if TARGET_MACHO
-char *output_call (rtx_insn *, rtx *, int, int);
+char *macho_output_call (rtx_insn *, rtx *, int, int);
 #endif
 
 #ifdef NO_DOLLAR_IN_LABEL
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 75b197f458c..b22cae55a0d 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21380,6 +21380,37 @@ rs6000_assemble_integer (rtx x, unsigned int size, int aligned_p)
   return default_assemble_integer (x, size, aligned_p);
 }
 
+/* Return a template string for assembly to emit when making an
+   external call.  FUN is the %z argument, ARG is either NULL or
+   a @TLSGD or @TLSLD __tls_get_addr argument specifier.  */
+
+const char *
+rs6000_output_call (rtx *operands, unsigned int fun, bool sibcall,
+   const char *arg)
+{
+  /* -Wformat-overflow workaround, without which gcc thinks that %u
+  might produce 10 digits.  FUN is 0 or 1 as of 2018-03.  */
+  gcc_assert (fun <= 6);
+
+  /* The magic 32768 offset here corresponds to the offset of
+ r30 in .got2, as given by LCTOC1.  See sysv4.h:toc_section.  */
+  char z[10];
+  sprintf (z, "%%z%u%s", fun,
+  (DEFAULT_ABI == ABI_V4 && TARGET_SECURE_PLT && flag_pic == 2
+   ? "+32768" : ""));
+
+  static char str[32];  /* 5 spare */
+  if (DEFAULT_ABI == ABI_AIX || DEFAULT_ABI == ABI_ELFv2)
+sprintf (str, "b%s %s%s%s", "l" + sibcall, z, arg,
+sibcall ? "" : "\n\tnop");
+  else if (DEFAULT_ABI == ABI_V4)
+sprintf (str, "b%s %s%s%s", "l" + sibcall, z, arg,
+flag_pic ? "@plt" : "");
+  else
+gcc_unreachable ();
+  return str;
+}
+
 #if defined (HAVE_GAS_HIDDEN) && !TARGET_MACHO
 /* Emit an assembler directive to set symbol visibility for DECL to
VISIBILITY_TYPE.  */
@@ -32818,8 +32849,8 @@ get_prev_label (tree function_name)
CALL_DEST is the routine we are calling.  */
 
 char *
-output_call (rtx_insn *insn, rtx *operands, int dest_operand_number,
-int cookie_operand_number)
+macho_output_call (rtx_insn *insn, rtx *operands, int dest_operand_number,
+ int cookie_operand_number)
 {
   static char buf[256];
   if (darwin_emit_branch_islands
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 66742f66a89..52088fdfbdb 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -9454,10 +9454,11 @@ (define_insn_and_split "tls_gd_aix"
   "HAVE_AS_TLS && (DEFAULT_ABI == ABI_AIX || DEFAULT_ABI == ABI_ELFv2)"
 {
   if (TARGET_CMODEL != CMODEL_SMALL)
-return "addis %0,%1,%2@got@tlsgd@ha\;addi %0,%0,%2@got@tlsgd@l\;"
-  "bl %z3\;nop";
+output_asm_insn ("addis %0,%1,%2@got@tlsgd@ha\;"
+"addi %0,%0,%2@got@tlsgd@l", operands);
   else
-return "addi %0,%1,%2@got@tlsgd\;bl %z3\;nop";
+output_asm_insn ("addi %0,%1,%2@got@tlsgd", operands);
+  return rs6000_output_call (operands, 3, false, "");
 }
   "&& TARGET_TLS_MARKERS"
   [(set (match_dup 0)
@@ -9486,15 +9487,8 @@ (define_insn_and_split "tls_gd_sysv"
(clobber (reg:SI LR_REGNO))]
   "HAVE_AS_TLS && DEFAULT_ABI == ABI_V4"
 {
-  if (flag_pic)
-{
-  if (TARGET_SECURE_PLT && flag_pic == 2)
-   return "addi %0,%1,%2@got@tlsgd\;bl 

[PATCH 0/6] [RS6000] inline plt call support

2018-11-06 Thread Alan Modra
Hi Segher,
This is the patch series you already saw earlier this year, rebased
to recent gcc, and with a comment or two fixed.  The first five
patches tidy and rearrange the function call code in order to support
inline plt calls without a huge increase in rs6000.md.  As is, inline
plt calls are supported for powerpc-linux and powerpc64le-linux
ELFv2.  I don't support them for powerpc64-linux ELFv1 due to the
extra read barriers needed there, but it wouldn't be too difficult to
support if there was demand.

I've regression tested again on powerpc64le-linux.  Earlier testing
went to some lengths with old and new binutils on powerpc-linux,
powerpc64le-linux and powerpc64-linux.  I also tested using -fno-plt
in bootstrap and regression tests, which unsurprisingly showed
numerous fails due to wrong counts of symbols (inline plt references a
function symbol multiple times to make a call), wrong "bl" counts
(none with inline call) or similar.  I didn't see anything
frightening, and I expect that people generally won't regression test
with -fno-plt, so I haven't modified any tests.

One benefit of the inline plt support is that gcc will now use the new
sequences and relocs to support -mlongcall.  This allows lazy dynamic
resolution of the plt entries so it is now possible to dlopen
libraries and have -mlongcall code call functions in those libraries.
That wasn't possible before.  See
https://bugzilla.redhat.com/show_bug.cgi?id=1633721

Alan Modra (6):
  [RS6000] rs6000_output_call for external call insn assembly output
  [RS6000] rs6000_output_indirect_call
  [RS6000] Replace TLSmode with P, and correct tls call mems
  [RS6000] Remove constraints on call rounded_stack_size_rtx arg
  [RS6000] Use standard call patterns for __tls_get_addr calls
  [RS6000] inline plt call sequences

 gcc/config.in |6 +
 gcc/config/rs6000/darwin.md   |8 +-
 gcc/config/rs6000/predicates.md   |   25 +
 gcc/config/rs6000/rs6000-protos.h |8 +-
 gcc/config/rs6000/rs6000.c|  617 ++---
 gcc/config/rs6000/rs6000.h|4 +
 gcc/config/rs6000/rs6000.md   | 1023 +
 gcc/configure |   36 +
 gcc/configure.ac  |6 +
 9 files changed, 1065 insertions(+), 668 deletions(-)

-- 
Alan Modra
Australia Development Lab, IBM


[PATCH] doc: Use @: where needed

2018-11-06 Thread Segher Boessenkool
When an abbreviation ends with a dot followed by whitespace, Texinfo
thinks the dot ends a sentence, and applies spacing rules etc. based
on that.  To prevent this, there is the @: macro.

This patch puts @: after every vs., e.g., and i.e. where it is needed.
In a few cases there was "@ " already, or "@\n", but @: is slightly
better, and more consistent.

I only spot checked the output.

Is this okay for trunk?


Segher


2018-11-06  Segher Boessenkool  

* target.def: Put @: after every vs., e.g., and i.e. if it is followed
by whitespace.
* doc/extend.texi: Ditto.
* doc/fragments.texi: Ditto.
* doc/gimple.texi: Ditto.
* doc/implement-c.texi: Ditto.
* doc/install.texi: Ditto.
* doc/invoke.texi: Ditto.
* doc/md.texi: Ditto.
* doc/plugins.texi: Ditto.
* doc/rtl.texi: Ditto.
* doc/sourcebuild.texi: Ditto.
* doc/tm.texi.in: Ditto.
* doc/ux.texi: Ditto.
* doc/tm.texi: Regenerate.

---
 gcc/doc/extend.texi  | 12 ++--
 gcc/doc/fragments.texi   |  2 +-
 gcc/doc/gimple.texi  |  4 ++--
 gcc/doc/implement-c.texi |  2 +-
 gcc/doc/install.texi | 10 +-
 gcc/doc/invoke.texi  | 50 
 gcc/doc/md.texi  | 23 +++---
 gcc/doc/plugins.texi |  4 ++--
 gcc/doc/rtl.texi |  4 ++--
 gcc/doc/sourcebuild.texi | 10 +-
 gcc/doc/tm.texi  | 12 ++--
 gcc/doc/tm.texi.in   |  6 +++---
 gcc/doc/ux.texi  |  4 ++--
 gcc/target.def   |  6 +++---
 14 files changed, 75 insertions(+), 74 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 8d6a8d8..a1f79dc 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -3232,7 +3232,7 @@ change the number of NOPs to any desired value.  The two-value syntax
 is the same as for the command-line switch
 @option{-fpatchable-function-entry=N,M}, generating @var{N} NOPs, with
 the function entry point before the @var{M}th NOP instruction.
-@var{M} defaults to 0 if omitted e.g. function entry point is before
+@var{M} defaults to 0 if omitted e.g.@: function entry point is before
 the first NOP.
 
 If patchable function entries are enabled globally using the command-line
@@ -5322,7 +5322,7 @@ depended upon to work reliably and are not supported.
 @cindex @code{vector} function attribute, RX
 This RX attribute is similar to the @code{interrupt} attribute, including its
 parameters, but does not make the function an interrupt-handler type
-function (i.e. it retains the normal C function calling ABI).  See the
+function (i.e.@: it retains the normal C function calling ABI).  See the
 @code{interrupt} attribute for a description of its arguments.
 @end table
 
@@ -7253,7 +7253,7 @@ possible for these fields to have a different scalar storage order than the
 enclosing type.
 
 This attribute is supported only for targets that use a uniform default
-scalar storage order (fortunately, most of them), i.e. targets that store
+scalar storage order (fortunately, most of them), i.e.@: targets that store
 the scalars either all in big-endian or all in little-endian.
 
 Additional restrictions are enforced for types with the reverse scalar
@@ -8485,7 +8485,7 @@ This code copies @code{src} to @code{dst} and add 1 to @code{dst}.
 GCC's optimizers sometimes discard @code{asm} statements if they determine 
 there is no need for the output variables. Also, the optimizers may move 
 code out of loops if they believe that the code will always return the same 
-result (i.e. none of its input values change between calls). Using the 
+result (i.e.@: none of its input values change between calls). Using the 
 @code{volatile} qualifier disables these optimizations. @code{asm} statements 
 that have no output operands, including @code{asm goto} statements, 
 are implicitly volatile.
@@ -8750,7 +8750,7 @@ Operands are separated by commas.  Each operand has this format:
 Specifies a symbolic name for the operand.
 Reference the name in the assembler template 
 by enclosing it in square brackets 
-(i.e. @samp{%[Value]}). The scope of the name is the @code{asm} statement 
+(i.e.@: @samp{%[Value]}). The scope of the name is the @code{asm} statement 
 that contains the definition. Any valid C variable name is acceptable, 
 including names already defined in the surrounding code. No two operands 
 within the same @code{asm} statement can use the same symbolic name.
@@ -8990,7 +8990,7 @@ Operands are separated by commas.  Each operand has this format:
 Specifies a symbolic name for the operand.
 Reference the name in the assembler template 
 by enclosing it in square brackets 
-(i.e. @samp{%[Value]}). The scope of the name is the @code{asm} statement 
+(i.e.@: @samp{%[Value]}). The scope of the name is the @code{asm} statement 
 that contains the definition. Any valid C variable name is acceptable, 
 including names already defined in the 

Update libquadmath fmaq from glibc, fix nanq issues

2018-11-06 Thread Joseph Myers
This patch extends update-quadmath.py to update fmaq from glibc.

The issue in that function was that quadmath-imp.h had a struct in a
union with mant_high and mant_low fields (up to 64-bit) whereas glibc
has mantissa0, mantissa1, mantissa2 and mantissa3 (up to 32-bit).  The
patch changes those fields to be the same as in glibc, moving printf /
strtod code that also uses those fields back to closer to the glibc
form.  This allows fmaq to be updated automatically from glibc (which
brings in at least one bug fix from glibc from 2015).

nanq was also using the mant_high field name, and had other issues: it
only partly initialized the union from which a value was returned, and
setting mant_high to 1 meant a signaling NaN would be returned rather
than a quiet NaN.  This patch fixes those issues as part of updating
it to use the changed interfaces (but does not fix the issue of not
using the argument).
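
To make the signaling-versus-quiet point concrete, here is a small sketch that uses the bit layout of IEEE binary64 (double) rather than binary128, purely so that it runs anywhere; the names are hypothetical and nothing here is libquadmath code.  It assumes the usual convention (x86, Arm, Power) that the most significant mantissa bit is the quiet bit.

```c
#include <stdint.h>

/* binary64 layout: 1 sign bit, 11 exponent bits, 52 mantissa bits.  */
#define TOY_EXP_ALL_ONES UINT64_C(0x7ff0000000000000)
#define TOY_MANT_MASK    UINT64_C(0x000fffffffffffff)
#define TOY_QUIET_BIT    UINT64_C(0x0008000000000000)

/* A NaN built by setting only a low-order mantissa bit, which is the
   kind of pattern the old nanq produced: the quiet bit stays clear,
   so the value is a signaling NaN.  */
uint64_t
toy_nan_bits_low_mantissa (void)
{
  return TOY_EXP_ALL_ONES | UINT64_C(1);
}

/* A NaN built by setting the quiet bit explicitly.  */
uint64_t
toy_nan_bits_quiet (void)
{
  return TOY_EXP_ALL_ONES | TOY_QUIET_BIT;
}

/* Nonzero if BITS encodes a NaN (exponent all ones, mantissa != 0).  */
int
toy_is_nan_bits (uint64_t bits)
{
  return (bits & TOY_EXP_ALL_ONES) == TOY_EXP_ALL_ONES
	 && (bits & TOY_MANT_MASK) != 0;
}

/* Nonzero if BITS encodes a quiet NaN.  */
int
toy_is_quiet_bits (uint64_t bits)
{
  return toy_is_nan_bits (bits) && (bits & TOY_QUIET_BIT) != 0;
}
```

Both patterns are NaNs, but only the second has the quiet bit set, which is why the patch sets the quiet_nan field explicitly.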

Bootstrapped with no regressions on x86_64-pc-linux-gnu.

2018-11-06  Joseph Myers  

* quadmath-imp.h (ieee854_float128): Use mantissa0, mantissa1,
mantissa2 and mantissa3 fields instead of mant_high and mant_low.
Change nan field to ieee_nan.
* update-quadmath.py (update_sources): Also update fmaq.c.
* math/nanq.c (nanq): Use ieee_nan field of union.
Zero-initialize f.  Set quiet_nan field.
* printf/flt1282mpn.c, printf/printf_fphex.c, strtod/mpn2flt128.c,
strtod/strtoflt128.c: Use mantissa0, mantissa1, mantissa2 and
mantissa3 fields.  Use ieee_nan and quiet_nan field.
* math/fmaq.c: Regenerate from glibc sources with
update-quadmath.py.

Index: libquadmath/math/fmaq.c
===
--- libquadmath/math/fmaq.c (revision 265822)
+++ libquadmath/math/fmaq.c (working copy)
@@ -1,5 +1,5 @@
 /* Compute x * y + z as ternary operation.
-   Copyright (C) 2010-2017 Free Software Foundation, Inc.
+   Copyright (C) 2010-2018 Free Software Foundation, Inc.
This file is part of the GNU C Library.
Contributed by Jakub Jelinek , 2010.
 
@@ -18,16 +18,6 @@
.  */
 
 #include "quadmath-imp.h"
-#include 
-#include 
-#ifdef HAVE_FENV_H
-# include 
-# if defined HAVE_FEHOLDEXCEPT && defined HAVE_FESETROUND \
- && defined HAVE_FEUPDATEENV && defined HAVE_FETESTEXCEPT \
- && defined FE_TOWARDZERO && defined FE_INEXACT
-#  define USE_FENV_H
-# endif
-#endif
 
 /* This implementation uses rounding to odd to avoid problems with
double rounding.  See a paper by Boldo and Melquiond:
@@ -73,7 +63,7 @@
   if (u.ieee.exponent + v.ieee.exponent
  > 0x7fff + IEEE854_FLOAT128_BIAS)
return x * y;
-  /* If x * y is less than 1/4 of FLT128_DENORM_MIN, neither the
+  /* If x * y is less than 1/4 of FLT128_TRUE_MIN, neither the
 result nor whether there is underflow depends on its exact
 value, only on its sign.  */
   if (u.ieee.exponent + v.ieee.exponent
@@ -94,8 +84,10 @@
  : (w.ieee.exponent == 0
 || (w.ieee.exponent == 1
 && w.ieee.negative != neg
-&& w.ieee.mant_low == 0
-&& w.ieee.mant_high == 0)))
+&& w.ieee.mantissa3 == 0
+&& w.ieee.mantissa2 == 0
+&& w.ieee.mantissa1 == 0
+&& w.ieee.mantissa0 == 0)))
{
  __float128 force_underflow = x * y;
  math_force_eval (force_underflow);
@@ -124,7 +116,7 @@
 very small, adjust them up to avoid spurious underflows,
 rather than down.  */
  if (u.ieee.exponent + v.ieee.exponent
- <= IEEE854_FLOAT128_BIAS + FLT128_MANT_DIG)
+ <= IEEE854_FLOAT128_BIAS + 2 * FLT128_MANT_DIG)
{
  if (u.ieee.exponent > v.ieee.exponent)
u.ieee.exponent += 2 * FLT128_MANT_DIG + 2;
@@ -181,17 +173,15 @@
 }
 
   /* Ensure correct sign of exact 0 + 0.  */
-  if (__builtin_expect ((x == 0 || y == 0) && z == 0, 0))
+  if (__glibc_unlikely ((x == 0 || y == 0) && z == 0))
 {
   x = math_opt_barrier (x);
   return x * y + z;
 }
 
-#ifdef USE_FENV_H
   fenv_t env;
   feholdexcept ();
   fesetround (FE_TONEAREST);
-#endif
 
   /* Multiplication m1 + m2 = x * y using Dekker's algorithm.  */
 #define C ((1LL << (FLT128_MANT_DIG + 1) / 2) + 1)
@@ -214,62 +204,46 @@
   /* Ensure the arithmetic is not scheduled after feclearexcept call.  */
   math_force_eval (m2);
   math_force_eval (a2);
-#ifdef USE_FENV_H
   feclearexcept (FE_INEXACT);
-#endif
 
   /* If the result is an exact zero, ensure it has the correct sign.  */
   if (a1 == 0 && m2 == 0)
 {
-#ifdef USE_FENV_H
   feupdateenv ();
-#endif
   /* Ensure that round-to-nearest value of z + m1 is not reused.  */
   z = math_opt_barrier (z);
   return z + m1;
 }
 
-#ifdef USE_FENV_H
   fesetround 

Re: [RFC][PATCH LRA] WIP patch to fix one part of PR87507

2018-11-06 Thread Segher Boessenkool
On Tue, Nov 06, 2018 at 01:27:58PM -0700, Jeff Law wrote:
> So the one worry I have/had in this code is nested subregs.  My
> recollection is they do happen in rare cases.  But I can also find a
> reference  where Jim W. has indicated they're invalid (and I absolutely
> trust Jim on this kind of historical RTL stuff).
> 
> https://gcc.gnu.org/ml/gcc/2005-04/msg01173.html

rtl.texi says

@code{subreg}s of @code{subreg}s are not supported.  Using
@code{simplify_gen_subreg} is the recommended way to avoid this problem.

(since r133982, from 2008).

> So, after all that, I think we're OK.  It might make sense to verify we
> don't have nested subregs in the IL verifiers.  Bonus points if you add
> that checking.

Or more generally, that what is inside the subreg is a reg, because the
code does rely on that.
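
The shape of such a check, on a toy data structure, could look like the sketch below.  None of this is GCC's rtx API; struct toy_rtx and its codes are made up for illustration, and a real verifier would of course work on actual RTL.

```c
#include <stddef.h>

/* Hypothetical, much simplified expression codes.  */
enum toy_code { TOY_REG, TOY_SUBREG, TOY_MEM };

/* Hypothetical expression node: INNER is the operand of a SUBREG
   and is unused for the other codes.  */
struct toy_rtx
{
  enum toy_code code;
  const struct toy_rtx *inner;
};

/* Return nonzero if X passes the check: anything that is not a
   SUBREG is accepted, and a SUBREG is accepted only when its operand
   is a REG.  This rejects nested subregs as a special case.  */
int
toy_subreg_operand_ok (const struct toy_rtx *x)
{
  if (x->code != TOY_SUBREG)
    return 1;
  return x->inner != NULL && x->inner->code == TOY_REG;
}
```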


Segher


PING [PATCH] use MAX_OFILE_ALIGNMENT to validate attribute aligned (PR 87795)

2018-11-06 Thread Martin Sebor

Ping: https://gcc.gnu.org/ml/gcc-patches/2018-10/msg02081.html

On 10/31/2018 02:52 PM, Martin Sebor wrote:

On 10/30/2018 04:34 PM, Joseph Myers wrote:

On Tue, 30 Oct 2018, Martin Sebor wrote:


So it seems that the attribute handler should be using this macro
instead.  I also took the liberty to add more detail to the error


Note that it should only be used for alignments relevant to the object
file - *not* for alignments of variables with automatic storage duration
(and thus not for alignments of types / struct fields, because such types
might only be used on the stack) since GCC supports arbitrary alignments
on the stack via dynamically realigning it.

So you need testcases that verify that large alignments are still allowed
for types / fields / on the stack, even when the object file only
supports
smaller alignments.


Good catch, thanks!  Attached is an updated patch that relaxes
the restriction to allow auto variables to be aligned on a more
restrictive boundary than MAX_OFILE_ALIGNMENT would imply.
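
As a sketch of the case that has to keep working, the function below over-aligns an automatic variable and checks the address at run time.  The name and the value 64 are illustrative assumptions only; a real test would pick an alignment above the target's MAX_OFILE_ALIGNMENT, which this sketch has no way to query.

```c
#include <stdint.h>

/* Illustrative alignment; assumed to be larger than the default
   stack alignment on the host, not tied to MAX_OFILE_ALIGNMENT.  */
#define TOY_BIG_ALIGN 64

/* Return nonzero if an over-aligned automatic variable really ends
   up on a TOY_BIG_ALIGN boundary.  The object lives on the stack, so
   the object-file alignment limit does not apply to it.  */
int
toy_auto_var_is_aligned (void)
{
  volatile char buf[1] __attribute__ ((aligned (TOY_BIG_ALIGN)));
  buf[0] = 0;
  return (uintptr_t) buf % TOY_BIG_ALIGN == 0;
}
```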

I spent far more time building and testing various cross-toolchains
than I did on the GCC change, mainly because I couldn't find a way
to programmatically detect the value of MAX_OFILE_ALIGNMENT (or
the maximum alignment supported by GCC).  In the end I stuck with
the hardcoding.  If there isn't one, how about adding a couple
predefined macros for these?

The test is also pretty hacky (and I wouldn't be surprised if it
failed on some system I didn't exercise).  Having GCC expose
these parameters in some way would make the test cleaner (and
more robust, though one might make the argument that relying
on GCC-generated values to verify those same values would
actually make it less robust).

In this revision I also updated the MAX_OFILE_ALIGNMENT description
in the internals manual.

Retested on x86_64-linux, plus using cross-compilers for hppa64,
pdp11, and powerpc-darwin.

Martin





Re: [PATCH] detect missing nuls in address of const char (PR 87756)

2018-11-06 Thread Martin Sebor

Jeff, I'd like to go ahead and commit the patch as is.  I believe
the use of the default argument is appropriate and in line with
GCC practice.  Please let me know if you have strong objections.
If I don't hear any I will proceed later this week.

Thanks
Martin

On 10/30/2018 10:38 AM, Martin Sebor wrote:

On 10/30/2018 09:54 AM, Jeff Law wrote:

On 10/30/18 9:44 AM, Martin Sebor wrote:

On 10/30/2018 09:27 AM, Jeff Law wrote:

On 10/29/18 5:51 PM, Martin Sebor wrote:

The missing nul detection fails when the argument of the %s or
similar sprintf directive is the address of a non-nul character
constant such as in:

  const char c = 'a';
  int f (void)
  {
    return snprintf (0, 0, "%s", &c);
  }

This is because the string_constant function only succeeds for
arguments that refer to STRING_CSTs, not to individual characters.

For the same reason, calls to memchr() such as the one below aren't
folded into constants:

  const char d = '\0';
  void* g (void)
  {
    return memchr (&d, 0, 1);
  }

To detect and diagnose the missing nul in the first example and
to fold the second, the attached patch modifies string_constant
to return a synthesized STRING_CST object for such references
(while also indicating whether such an object is properly
nul-terminated).
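
For reference, a standalone sketch of the two situations follows.  The helper names are hypothetical.  The first function shows the value the memchr call would fold to; the second shows a bounded form of the directive, where a precision of 1 means a terminating nul is not required, so no warning is expected for it.

```c
#include <stdio.h>
#include <string.h>

const char c = 'a';
const char d = '\0';

/* The one-byte object D is itself the nul, so the search succeeds at
   its address; this is the constant the call could be folded to.  */
const void *
toy_find_nul_in_d (void)
{
  return memchr (&d, 0, 1);
}

/* With a precision of 1, at most one byte is read from &c, so the
   missing terminator does not matter.  Returns the number of
   characters that would have been written.  */
int
toy_bounded_length_of_c (void)
{
  return snprintf (0, 0, "%.1s", &c);
}
```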

Tested on x86_64-linux.

Martin

gcc-87756.diff

PR tree-optimization/87756 - missing unterminated argument warning
using address of a constant character

gcc/ChangeLog:

PR tree-optimization/87756
* expr.c (string_constant): Synthesize a string literal from
the address of a constant character.
* tree.c (build_string_literal): Add an argument.
* tree.h (build_string_literal): Same.

gcc/testsuite/ChangeLog:

PR tree-optimization/87756
* gcc.dg/builtin-memchr-2.c: New test.
* gcc.dg/builtin-memchr-3.c: Same.
* gcc.dg/warn-sprintf-no-nul-2.c: Same.

Index: gcc/expr.c
===
--- gcc/expr.c(revision 265496)
+++ gcc/expr.c(working copy)
@@ -11484,18 +11484,40 @@ string_constant (tree arg, tree *ptr_offset, tree
 offset = off;
 }

-  if (!init || TREE_CODE (init) != STRING_CST)
+  if (!init)
 return NULL_TREE;

+  *ptr_offset = offset;
+
+  tree eltype = TREE_TYPE (init);
+  tree initsize = TYPE_SIZE_UNIT (eltype);
   if (mem_size)
-*mem_size = TYPE_SIZE_UNIT (TREE_TYPE (init));
+*mem_size = initsize;
+
   if (decl)
 *decl = array;

-  gcc_checking_assert (tree_to_shwi (TYPE_SIZE_UNIT (TREE_TYPE (init)))
-   >= TREE_STRING_LENGTH (init));
+  if (TREE_CODE (init) == INTEGER_CST)
+{
+  /* For a reference to (address of) a single constant character,
+ store the native representation of the character in CHARBUF.   */
+  unsigned char charbuf[MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT];
+  int len = native_encode_expr (init, charbuf, sizeof charbuf, 0);
+  if (len > 0)
+{
+  /* Construct a string literal with elements of ELTYPE and
+ the representation above.  Then strip
+ the ADDR_EXPR (ARRAY_REF (...)) around the STRING_CST.  */
+  init = build_string_literal (len, (char *)charbuf, eltype);
+  init = TREE_OPERAND (TREE_OPERAND (init, 0), 0);
+}
+}

-  *ptr_offset = offset;
+  if (TREE_CODE (init) != STRING_CST)
+return NULL_TREE;
+
+  gcc_checking_assert (tree_to_shwi (initsize) >= TREE_STRING_LENGTH (init));
+
   return init;
 }

Index: gcc/tree.c
===
--- gcc/tree.c(revision 265496)
+++ gcc/tree.c(working copy)



Index: gcc/tree.h
===
--- gcc/tree.h(revision 265496)
+++ gcc/tree.h(working copy)
@@ -4194,7 +4194,7 @@ extern tree build_call_expr_internal_loc_array (lo
 extern tree maybe_build_call_expr_loc (location_t, combined_fn, tree, int, ...);
 extern tree build_alloca_call_expr (tree, unsigned int, HOST_WIDE_INT);
-extern tree build_string_literal (int, const char *);
+extern tree build_string_literal (int, const char *, tree = char_type_node);

 /* Construct various nodes representing data types.  */

There's only about a dozen calls to build_string_literal.  Instead of
defaulting the argument, just fix them.  OK with that change.  Make
sure to catch those in config/{rs6000,i386}/ and cp/


Why?  Default arguments (and overloading) exist in C++ to deal
with just this case: to avoid having to provide the common
argument value while letting callers provide a different value
when they need to.  What purpose will it serve to make these
unnecessary changes and to force new callers to provide
the default argument value?  It will only make using
the function more error-prone and its callers harder
to read.

I find them much harder to read especially once you get more
than one.

In cases where we have a small number of call sites we should just fix
them.  I think that we're well in that range here.  

Re: [PATCH] diagnose built-in declarations without prototype (PR 83656)

2018-11-06 Thread Martin Sebor

In response to Joseph's comment I've removed the interaction
with -Wpedantic from the updated patch.

In addition, to help detect bugs like the one in the test case
for pr87886, I have also enhanced the detection of incompatible
calls to include integer/real type mismatches so that calls like
the one below are diagnosed:

  extern double sqrt ();
  int f (int x)
  {
    return sqrt (x);   // passing int where double is expected
  }
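
For contrast, here is a sketch of the well-defined form, where a prototype is in scope and the int argument is converted to double before the call.  To keep the sketch free of any libm dependency it uses toy_twice, a made-up local function standing in for a double-taking built-in such as sqrt; the names are hypothetical.

```c
/* Hypothetical stand-in for a library function that takes a double.
   The definition doubles as a prototype.  */
static double
toy_twice (double v)
{
  return v * 2.0;
}

/* Because a prototype is visible, X is implicitly converted from int
   to double at the call, so no type mismatch occurs.  */
int
toy_f (int x)
{
  return (int) toy_twice (x);
}
```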

With the removal of the -Wpedantic interaction declaring abort()
without a prototype is no longer diagnosed and so the test suite
changes to add the prototype are not necessary.  I decided not
to back them out because Jeff indicated a preference for making
these kinds of improvements in general in an unrelated
discussion.

On 11/02/2018 05:40 PM, Martin Sebor wrote:

On 11/02/2018 04:52 PM, Joseph Myers wrote:

On Fri, 2 Nov 2018, Martin Sebor wrote:


I have reworked the patch to resolve any lingering concerns about
warnings in configure tests.  The attached revision only warns
with -Wextra and only for incompatible declarations of built-ins
that take arguments.  For void built-ins like abort() it only
warns with -Wpedantic (this required adjustments to several
tests that are being compiled with -pedantic-errors).


I don't think this use of -Wpedantic is appropriate.  -Wpedantic is not a
catch-all for warnings we don't want to enable with some other option;
it's specifically for programs doing something that is disallowed by
ISO C
(such warnings may or may not also be enabled by other relevant options).

Since this declaration is not disallowed by ISO C, -Wpedantic should not
result in a warning for it.

(I do consider declarations with () for built-in functions without
arguments to be more dubious than for user-defined functions without
arguments, simply because good practice would be to include the standard
header to get declarations of those functions, whereas for user-defined
functions the code might simply be using C++ style for declaring
functions
without arguments.)


-Wpedantic alone doesn't cause a warning, only in conjunction
with -Wno-builtin-declaration-mismatch.

But I have no preference for what option to put it under, nor do I
necessarily think that using -Wpedantic (or any other "group"
option) like this is a great idea (it doesn't work with #pragma
GCC diagnostic the way I think it should).  In fact, with
the latest approach of diagnosing unsafe calls to these functions
regardless of the declaration form it doesn't seem that important
that declarations of built-ins with no arguments be diagnosed at
all.  Either way, there aren't enough of them for it to matter
much.  I think there's just one: abort.  I'm fine with removing
this part of the patch.

Is there anything else?

Martin


PR c/83656 - missing -Wbuiltin-declaration-mismatch on declaration without prototype

gcc/c/ChangeLog:

	PR c/83656
	* c-decl.c (header_for_builtin_fn): Declare.
	(diagnose_mismatched_decls): Diagnose declarations of built-in
	functions without a prototype.
	* c-typeck.c (maybe_warn_builtin_no_proto_arg): New function.
	(convert_argument): Same.
	(convert_arguments): Factor code out into convert_argument.
	Detect mismatches between built-in formal arguments in calls
	to built-in without prototype.
	(build_conditional_expr): Same.
	(type_or_builtin_type): New function.
	(convert_for_assignment): Add argument.  Conditionally issue
	warnings instead of errors for mismatches.

gcc/testsuite/ChangeLog:

	PR c/83656
	* gcc.dg/20021006-1.c
	* gcc.dg/Wbuiltin-declaration-mismatch.c: New test.
	* gcc.dg/Wbuiltin-declaration-mismatch-2.c: New test.
	* gcc.dg/Wbuiltin-declaration-mismatch-3.c: New test.
	* gcc.dg/Wbuiltin-declaration-mismatch-4.c: New test.
	* gcc.dg/Walloca-16.c: Adjust.
	* gcc.dg/Wrestrict-4.c: Adjust.
	* gcc.dg/Wrestrict-5.c: Adjust.
	* gcc.dg/atomic/stdatomic-generic.c: Adjust.
	* gcc.dg/atomic/stdatomic-lockfree.c: Adjust.
	* gcc.dg/initpri1.c: Adjust.
	* gcc.dg/pr15698-1.c: Adjust.
	* gcc.dg/pr69156.c: Adjust.
	* gcc.dg/pr83463.c: Adjust.
	* gcc.dg/redecl-4.c: Adjust.
	* gcc.dg/tls/thr-init-2.c: Adjust.
	* gcc.dg/torture/pr55890-2.c: Adjust.
	* gcc.dg/torture/pr55890-3.c: Adjust.
	* gcc.dg/torture/pr67741.c: Adjust.
	* gcc.dg/torture/stackalign/sibcall-1.c: Adjust.
	* gcc.dg/torture/tls/thr-init-1.c: Adjust.
	* gcc.dg/tree-ssa/builtins-folding-gimple-ub.c: Adjust.

diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index cbbf7eb..524ac83 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -604,6 +604,7 @@ static tree grokparms (struct c_arg_info *, bool);
 static void layout_array_type (tree);
 static void warn_defaults_to (location_t, int, const char *, ...)
 ATTRIBUTE_GCC_DIAG(3,4);
+static const char *header_for_builtin_fn (enum built_in_function);
 
 /* T is a statement.  Add it to the statement-tree.  This is the
C/ObjC version--C++ has a slightly different version of this
@@ -1887,12 +1888,25 @@ diagnose_mismatched_decls (tree newdecl, tree olddecl,
 	*oldtypep 

RE: Notes on -mloongson-ext2

2018-11-06 Thread mfortune
Hi Paul,

Just a few additional comments, mostly typos and a little cleanup.

>gcc/
>   * config/mips/mips-protos.h
>   (mips_loongson_ext2_prefetch_cookie): New prototype.
>   * config/mips/mips.c (mips_loongson_ext2_prefetch_cookie): New.
>   (mips_option_override): Enable TARGET_LOONGSON_EXT when
>   TARGET_LOONGSON_EXT2 is true.
>   * config/mips/mips.h (TARGET_CPU_CPP_BUILTINS): Define
>   __mips_loongson_ext2, __mips_loongson_ext_rev=2.
>   (ISA_HAS_CTZ_CTO): New, ture if TARGET_LOONGSON_EXT2.

typo: true

>   (ISA_HAS_PREFETCH): Include TARGET_LOONGSON_EXT and
>   TARGET_LOONGSON_EXT2.
>   (ASM_SPEC): Add mloongson-ext2 and mno-loongson-ext2.
>   (define_insn "ctz2"): New insn pattern.
>   (define_insn "prefetch"): Include TARGET_LOONGSON_EXT2.
>   (define_insn "prefetch_indexed_"): Include
>   TARGET_LOONGSON_EXT and TARGET_LOONGSON_EXT2.
>   * config/mips/mips.opt (-mloongson-ext2): Add option.
>   * gcc/doc/invoke.texi (-mloongson-ext2): Document.
>
>gcc/testsuite/
>   * gcc.target/mips/loongson-ctz.c: New test.
>   * gcc.target/mips/loongson-dctz.c: Likewise.
>   * gcc.target/mips/mips.exp (mips_option_groups): Add
>   -mloongson-ext2 option.
>---
> gcc/config/mips/mips-protos.h |1 +
> gcc/config/mips/mips.c|   28 +++
> gcc/config/mips/mips.h|   15 +++-
> gcc/config/mips/mips.md   |   47 ++--
> gcc/config/mips/mips.opt  |4 ++
> gcc/doc/invoke.texi   |7 
> gcc/testsuite/gcc.target/mips/loongson-ctz.c  |   11 ++
> gcc/testsuite/gcc.target/mips/loongson-dctz.c |   11 ++
> gcc/testsuite/gcc.target/mips/mips.exp|1 +
> 9 files changed, 120 insertions(+), 5 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/mips/loongson-ctz.c
> create mode 100644 gcc/testsuite/gcc.target/mips/loongson-dctz.c
>
>diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
>index b579c3c..1c20750 100644
>--- a/gcc/config/mips/mips.c
>+++ b/gcc/config/mips/mips.c
>@@ -20171,6 +20187,18 @@ mips_option_override (void)
>   if (TARGET_LOONGSON_MMI &&  !TARGET_HARD_FLOAT_ABI)
> error ("%<-mloongson-mmi%> must be used with %<-mhard-float%>");
> 
>+  /* If TARGET_LOONGSON_EXT2, enable TARGET_LOONGSON_EXT.  */
>+  if (TARGET_LOONGSON_EXT2)
>+{
>+  /* Make sure that when TARGET_LOONGSON_EXT2 is true, TARGET_LOONGSON_EXT
>+   is true.  If a user explicitly says -mloongson-ext2 -mno-loongson-ext
>+   then that is an error.  */
>+  if (!TARGET_LOONGSON_EXT
>+&& !((target_flags_explicit & MASK_LOONGSON_EXT) == 0))

Bit of a brain twister, how about:

(target_flags_explicit & MASK_LOONGSON_EXT) != 0
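
The two spellings are equivalent, which a tiny standalone check can
confirm.  The sketch below is illustrative only: kMaskLoongsonExt is a
made-up stand-in for MASK_LOONGSON_EXT (the real value comes from GCC's
option machinery), and both function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical stand-in for MASK_LOONGSON_EXT; the real bit differs.
constexpr std::uint32_t kMaskLoongsonExt = 1u << 5;

// Form used in the patch under review: "not (bit is clear)".
constexpr bool explicit_original(std::uint32_t flags_explicit) {
  return !((flags_explicit & kMaskLoongsonExt) == 0);
}

// Form suggested in the review: "bit is set".
constexpr bool explicit_simplified(std::uint32_t flags_explicit) {
  return (flags_explicit & kMaskLoongsonExt) != 0;
}

// Both forms agree whether the bit is set, clear, or mixed with others.
static_assert(explicit_original(0) == explicit_simplified(0));
static_assert(explicit_original(kMaskLoongsonExt)
              == explicit_simplified(kMaskLoongsonExt));
static_assert(explicit_original(~kMaskLoongsonExt)
              == explicit_simplified(~kMaskLoongsonExt));
```

Since the behaviour is identical, the choice is purely about readability.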

>diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
>index 7237c8d..beeb4bc 100644
>--- a/gcc/config/mips/mips.h
>+++ b/gcc/config/mips/mips.h
>@@ -1134,6 +1141,9 @@ struct mips_cpu_info {
> /* ISA has count leading zeroes/ones instruction (not implemented).  */
> #define ISA_HAS_CLZ_CLO   (mips_isa_rev >= 1 && !TARGET_MIPS16)
> 
>+/* ISA has count tailing zeroes/ones instruction.  */

typo: trailing

>diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
>index 4b7a627..8358218 100644
>--- a/gcc/config/mips/mips.md
>+++ b/gcc/config/mips/mips.md
>@@ -7136,13 +7153,20 @@
>(match_operand 2 "const_int_operand" "n"))]
>   "ISA_HAS_PREFETCH && TARGET_EXPLICIT_RELOCS"
> {
>-  if (TARGET_LOONGSON_2EF || TARGET_LOONGSON_EXT)
>+  if (TARGET_LOONGSON_2EF || TARGET_LOONGSON_EXT || TARGET_LOONGSON_EXT2)
> {
>-  /* Loongson 2[ef] and Loongson 3a use load to $0 for prefetching.  */
>+  /* Loongson ext2 implementation pref insnstructions.  */

typo: instructions.

>+  if (TARGET_LOONGSON_EXT2)
>+  {
>+operands[1] = mips_loongson_ext2_prefetch_cookie (operands[1],
>+  operands[2]);
>+return "pref\t%1, %a0";
>+  }

It's not a big deal but it would be cleaner to hoist this above the
2EF/EXT block as it is totally independent. Same for the prefx case.

>@@ -7156,6 +7180,21 @@
>(match_operand 3 "const_int_operand" "n"))]
>   "ISA_HAS_PREFETCHX && TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT"
> {
>+  if (TARGET_LOONGSON_EXT || TARGET_LOONGSON_EXT2)
>+{
>+  /* Loongson ext2 implementation pref insnstructions.  */

typo: instructions.

>+  if (TARGET_LOONGSON_EXT2)
>+  {
>+operands[2] = mips_loongson_ext2_prefetch_cookie (operands[2],
>+  operands[3]);
>+return "prefx\t%2,%1(%0)";
>+  }

>diff --git a/gcc/testsuite/gcc.target/mips/loongson-ctz.c 
>b/gcc/testsuite/gcc.target/mips/loongson-ctz.c
>new file mode 100644
>index 000..8df66a0
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/mips/loongson-ctz.c

Re: [aarch64] disable shrink wrapping when tracking speculative execution

2018-11-06 Thread Richard Earnshaw (lists)
On 06/11/2018 20:18, Segher Boessenkool wrote:
> On Tue, Nov 06, 2018 at 07:43:36PM +, Richard Earnshaw (lists) wrote:
>> On 06/11/2018 18:18, Segher Boessenkool wrote:
>>> On Tue, Nov 06, 2018 at 11:46:53AM +, Richard Earnshaw (lists) wrote:
>>>> Well it generates new 'light-weight' prologue and epilogue sequences for
>>>> the 'shrunk' code path that lack the establishment of the tracker
>>>> register and doesn't know how to move the existing sequence to the new
>>>> entry sequence.
>>>
>>> Ah, so the shrink-wrapping code is not deleting anything at all (just
>>> not adding it).  Gotcha :-)
>>
>> Well you could argue that it deleted the tracker update for the case
>> where the branch was not taken, and it also deleted the part of the
>> prologue where the tracker state was restored into SP before the return.
>>  But I'm being picky... :-)
> 
> When I say "deleted" I mean "deleted RTL code that was actually there".
> You seem to mean "prevented it from being created later"?
> 
> What I'm after is, if the shrink-wrapping code is deleting RTL it has
> no business touching, that sounds like a serious bug.

Well it has 'deleted' the update of the tracker register after the
conditional branch leading directly to the return insn.  But it's
possible that what has happened is that the use of the tracker variable
has been deleted (not re-emitted for the shrunk-wrap return sequence)
and thus another optimization has deleted the update as being dead.  I
haven't checked the rtl output directly to see how this is happening.

R.

> 
>>> [ snip example code; thanks, that helped ]
>>>
>>>> I'm not asking that shrink wrapping be updated to handle all this; in
>>>> fact, I'm not sure it's that easy to do as the branch patterns and
>>>> simple-return patterns aren't set up to handle this.
>>>
>>> One thing you could do is make shrink-wrap aware what part of the code
>>> needs the speculation tracking parts of the prologue.  You could do this
>>> by making a separate shrink-wrapping component for it, or you can do it
>>> by marking the places needing it as needing the full prologue, e.g. by
>>> emitting a fake call into it (and not outputting any code for that call).
>>> The latter does cause a stack frame to be emitted even when it wouldn't
>>> otherwise, unfortunately.  The separate shrink-wrapping approach should
>>> work beautifully as far as I see.
>>
>> There are a number of optimizations that are worth investigating with the
>> tracking support; but whether they'll notably improve performance I'm
>> not sure.  Tracking is just expensive and the main problem is the
>> serialization of the state, which limits the core's ability to reorder
>> stuff internally.
> 
> Yeah, it will be seriously expensive always.  If people still use this
> in production code you really _do_ want to optimise it.  If that helps
> measurably, anyway.
> 
> 
> Segher
> 



[PATCH] Implement std::pmr::unsynchronized_pool_resource

2018-11-06 Thread Jonathan Wakely

Implement std::pmr::unsynchronized_pool_resource
* config/abi/pre/gnu.ver: Add new symbols.
* include/std/memory_resource (std::pmr::__pool_resource): New class.
(std::pmr::unsynchronized_pool_resource): New class.
* src/c++17/Makefile.am: Add -fimplicit-templates to flags for
memory_resource.cc
* src/c++17/Makefile.in: Regenerate.
* src/c++17/memory_resource.cc (bitset, chunk, big_block): New
internal classes.
(__pool_resource::_Pool): Define new class.
(munge_options, pool_index, select_num_pools): New internal functions.
(__pool_resource::__pool_resource, __pool_resource::~__pool_resource)
(__pool_resource::allocate, __pool_resource::deallocate)
(__pool_resource::_M_alloc_pools): Define member functions.
(unsynchronized_pool_resource::unsynchronized_pool_resource)
(unsynchronized_pool_resource::~unsynchronized_pool_resource)
(unsynchronized_pool_resource::release)
(unsynchronized_pool_resource::_M_find_pool)
(unsynchronized_pool_resource::do_allocate)
(unsynchronized_pool_resource::do_deallocate): Define member
functions.
* testsuite/20_util/unsynchronized_pool_resource/allocate.cc: New
test.
* testsuite/20_util/unsynchronized_pool_resource/is_equal.cc: New
test.
* testsuite/20_util/unsynchronized_pool_resource/options.cc: New
test.
* testsuite/20_util/unsynchronized_pool_resource/release.cc: New
test.

The new tests being added here are pretty minimal, because we can't
assume machines running the testsuite will be able to allocate large
amounts of memory. I've tested it more thoroughly with much larger
tests though, and will try to get some of them in shape for the
testsuite/performance/20_util directory.

Tested powerpc64le-linux. Committed to trunk.
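
For readers who have not used the new class yet, here is a minimal usage
sketch.  It relies only on the standard C++17 interface
(std::pmr::pool_options, std::pmr::unsynchronized_pool_resource,
std::pmr::vector); the helper name sum_with_pool and the option values
are arbitrary choices for illustration, not taken from the patch or its
tests.

```cpp
#include <cstddef>
#include <memory_resource>
#include <vector>

// Sum 0..n-1 using a vector whose storage comes from a single-threaded
// pool resource.  The function name is hypothetical.
inline long sum_with_pool(std::size_t n) {
  std::pmr::pool_options opts;
  opts.max_blocks_per_chunk = 32;          // a hint; the library may adjust it
  opts.largest_required_pool_block = 256;  // larger requests go upstream

  // Upstream resource is plain new/delete; the pool sits in front of it.
  std::pmr::unsynchronized_pool_resource pool(
      opts, std::pmr::new_delete_resource());

  // Declared after the pool, so it is destroyed first and hands its
  // memory back before the pool itself goes away.
  std::pmr::vector<int> v(&pool);
  for (std::size_t i = 0; i < n; ++i)
    v.push_back(static_cast<int>(i));

  long total = 0;
  for (int x : v)
    total += x;
  return total;
}
```

As the name says, the resource does no locking, so a given instance
should only be used from one thread at a time.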

commit 52d8ce5431c191d8249415eff5c8b942a597efa0
Author: Jonathan Wakely 
Date:   Wed Oct 31 22:22:45 2018 +

Implement std::pmr::unsynchronized_pool_resource

Implement std::pmr::unsynchronized_pool_resource
* config/abi/pre/gnu.ver: Add new symbols.
* include/std/memory_resource (std::pmr::__pool_resource): New 
class.
(std::pmr::unsynchronized_pool_resource): New class.
* src/c++17/Makefile.am: Add -fimplicit-templates to flags for
memory_resource.cc
* src/c++17/Makefile.in: Regenerate.
* src/c++17/memory_resource.cc (bitset, chunk, big_block): New
internal classes.
(__pool_resource::_Pool): Define new class.
(munge_options, pool_index, select_num_pools): New internal 
functions.
(__pool_resource::__pool_resource, 
__pool_resource::~__pool_resource)
(__pool_resource::allocate, __pool_resource::deallocate)
(__pool_resource::_M_alloc_pools): Define member functions.
(unsynchronized_pool_resource::unsynchronized_pool_resource)
(unsynchronized_pool_resource::~unsynchronized_pool_resource)
(unsynchronized_pool_resource::release)
(unsynchronized_pool_resource::_M_find_pool)
(unsynchronized_pool_resource::do_allocate)
(unsynchronized_pool_resource::do_deallocate): Define member
functions.
* testsuite/20_util/unsynchronized_pool_resource/allocate.cc: New
test.
* testsuite/20_util/unsynchronized_pool_resource/is_equal.cc: New
test.
* testsuite/20_util/unsynchronized_pool_resource/options.cc: New
test.
* testsuite/20_util/unsynchronized_pool_resource/release.cc: New
test.

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver 
b/libstdc++-v3/config/abi/pre/gnu.ver
index e8cd286ef0c..b55038b8845 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -2055,6 +2055,15 @@ GLIBCXX_3.4.26 {
 _ZNSt13basic_filebufI[cw]St11char_traitsI[cw]EE4openEPKwSt13_Ios_Openmode;
 
 
_ZN11__gnu_debug25_Safe_local_iterator_base16_M_attach_singleEPNS_19_Safe_sequence_baseEb;
+
+# <memory_resource> members
+_ZTINSt3pmr28unsynchronized_pool_resourceE;
+
_ZNSt3pmr28unsynchronized_pool_resourceC[12]ERKNS_12pool_optionsEPNS_15memory_resourceE;
+_ZNSt3pmr28unsynchronized_pool_resourceD[12]Ev;
+_ZNSt3pmr28unsynchronized_pool_resource7releaseEv;
+_ZNSt3pmr28unsynchronized_pool_resource11do_allocateEmm;
+_ZNSt3pmr28unsynchronized_pool_resource13do_deallocateEPvmm;
+
 } GLIBCXX_3.4.25;
 
 # Symbols in the support library (libsupc++) have their own tag.
diff --git a/libstdc++-v3/include/std/memory_resource 
b/libstdc++-v3/include/std/memory_resource
index 7dc35ae723d..40486af82fe 100644
--- a/libstdc++-v3/include/std/memory_resource
+++ b/libstdc++-v3/include/std/memory_resource
@@ -33,9 +33,9 @@
 
 #if __cplusplus >= 201703L
 
-#include 

[PATCH v4 2/3] or1k: testsuite: initial support for openrisc

2018-11-06 Thread Stafford Horne
-mm-dd  Stafford Horne  
Richard Henderson  

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/20101011-1.c: Adjust for OpenRISC.
* gcc.dg/20020312-2.c: Likewise.
* gcc.dg/attr-alloc_size-11.c: Likewise.
* gcc.dg/builtin-apply2.c: Likewise.
* gcc.dg/nop.h: Likewise.
* gcc.dg/torture/stackalign/builtin-apply-2.c: Likewise.
* gcc.dg/tree-ssa/20040204-1.c: Likewise.
* gcc.dg/tree-ssa/reassoc-33.c: Likewise.
* gcc.dg/tree-ssa/reassoc-34.c: Likewise.
* gcc.dg/tree-ssa/reassoc-35.c: Likewise.
* gcc.dg/tree-ssa/reassoc-36.c: Likewise.
* lib/target-supports.exp
(check_effective_target_logical_op_short_circuit): Add or1k*-*-*.
* gcc.target/or1k/*: New.
---
 .../gcc.c-torture/execute/20101011-1.c|  3 ++
 gcc/testsuite/gcc.dg/20020312-2.c |  2 +
 gcc/testsuite/gcc.dg/attr-alloc_size-11.c |  4 +-
 gcc/testsuite/gcc.dg/builtin-apply2.c |  2 +-
 gcc/testsuite/gcc.dg/nop.h|  2 +
 .../torture/stackalign/builtin-apply-2.c  |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/20040204-1.c|  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-33.c|  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-34.c|  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-35.c|  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-36.c|  2 +-
 gcc/testsuite/gcc.target/or1k/args-1.c| 19 +
 gcc/testsuite/gcc.target/or1k/args-2.c| 15 +++
 gcc/testsuite/gcc.target/or1k/cmov-1.c|  8 
 gcc/testsuite/gcc.target/or1k/cmov-2.c|  9 
 gcc/testsuite/gcc.target/or1k/div-mul-1.c |  9 
 gcc/testsuite/gcc.target/or1k/div-mul-2.c |  9 
 gcc/testsuite/gcc.target/or1k/or1k.exp| 41 +++
 gcc/testsuite/gcc.target/or1k/return-1.c  | 10 +
 gcc/testsuite/gcc.target/or1k/return-2.c  | 19 +
 gcc/testsuite/gcc.target/or1k/return-3.c  | 19 +
 gcc/testsuite/gcc.target/or1k/return-4.c  | 19 +
 gcc/testsuite/gcc.target/or1k/ror-1.c |  8 
 gcc/testsuite/gcc.target/or1k/ror-2.c |  9 
 gcc/testsuite/gcc.target/or1k/ror-3.c |  8 
 gcc/testsuite/gcc.target/or1k/shftimm-1.c |  8 
 gcc/testsuite/gcc.target/or1k/shftimm-2.c |  8 
 gcc/testsuite/gcc.target/or1k/sibcall-1.c | 18 
 gcc/testsuite/lib/target-supports.exp |  1 +
 29 files changed, 253 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/or1k/args-1.c
 create mode 100644 gcc/testsuite/gcc.target/or1k/args-2.c
 create mode 100644 gcc/testsuite/gcc.target/or1k/cmov-1.c
 create mode 100644 gcc/testsuite/gcc.target/or1k/cmov-2.c
 create mode 100644 gcc/testsuite/gcc.target/or1k/div-mul-1.c
 create mode 100644 gcc/testsuite/gcc.target/or1k/div-mul-2.c
 create mode 100644 gcc/testsuite/gcc.target/or1k/or1k.exp
 create mode 100644 gcc/testsuite/gcc.target/or1k/return-1.c
 create mode 100644 gcc/testsuite/gcc.target/or1k/return-2.c
 create mode 100644 gcc/testsuite/gcc.target/or1k/return-3.c
 create mode 100644 gcc/testsuite/gcc.target/or1k/return-4.c
 create mode 100644 gcc/testsuite/gcc.target/or1k/ror-1.c
 create mode 100644 gcc/testsuite/gcc.target/or1k/ror-2.c
 create mode 100644 gcc/testsuite/gcc.target/or1k/ror-3.c
 create mode 100644 gcc/testsuite/gcc.target/or1k/shftimm-1.c
 create mode 100644 gcc/testsuite/gcc.target/or1k/shftimm-2.c
 create mode 100644 gcc/testsuite/gcc.target/or1k/sibcall-1.c

diff --git a/gcc/testsuite/gcc.c-torture/execute/20101011-1.c 
b/gcc/testsuite/gcc.c-torture/execute/20101011-1.c
index 8261b796a47..d2beeb52a0e 100644
--- a/gcc/testsuite/gcc.c-torture/execute/20101011-1.c
+++ b/gcc/testsuite/gcc.c-torture/execute/20101011-1.c
@@ -100,6 +100,9 @@ __aeabi_idiv0 (int return_value)
 #elif defined (__moxie__)
   /* Not all moxie configurations may raise exceptions.  */
 # define DO_TEST 0
+#elif defined (__or1k__)
+  /* On OpenRISC division by zero does not trap.  */
+# define DO_TEST 0
 #else
 # define DO_TEST 1
 #endif
diff --git a/gcc/testsuite/gcc.dg/20020312-2.c 
b/gcc/testsuite/gcc.dg/20020312-2.c
index 1a8afd81506..e72a5b261ae 100644
--- a/gcc/testsuite/gcc.dg/20020312-2.c
+++ b/gcc/testsuite/gcc.dg/20020312-2.c
@@ -117,6 +117,8 @@ extern void abort (void);
 # if defined (__CK807__) || defined (__CK810__)
 #   define PIC_REG  "r28"
 # endif
+#elif defined (__or1k__)
+/* No pic register.  */
 #else
 # error "Modify the test for your target."
 #endif
diff --git a/gcc/testsuite/gcc.dg/attr-alloc_size-11.c 
b/gcc/testsuite/gcc.dg/attr-alloc_size-11.c
index 301a06fd464..e19f81a7624 100644
--- a/gcc/testsuite/gcc.dg/attr-alloc_size-11.c
+++ b/gcc/testsuite/gcc.dg/attr-alloc_size-11.c
@@ -47,8 +47,8 @@ typedef __SIZE_TYPE__size_t;
 
 /* The following tests fail because of missing range information.  The xfail
exclusions are PR79356.  */
-TEST (signed char, SCHAR_MIN + 2, ALLOC_MAX);   /* { dg-warning 

[PATCH v4 3/3] or1k: gcc: initial support for openrisc

2018-11-06 Thread Stafford Horne
-mm-dd  Stafford Horne  
Richard Henderson  
Joel Sherrill  

gcc/ChangeLog:

* common/config/or1k/or1k-common.c: New file.
* config/or1k/*: New.
* config.gcc (or1k*-*-*): New.
* configure.ac (or1k*-*-*): New test for openrisc tls.
* configure: Regenerated.
* doc/install.texi: Document OpenRISC triplets.
* doc/invoke.texi: Document OpenRISC arguments.
* doc/md.texi: Document OpenRISC.
---
 gcc/common/config/or1k/or1k-common.c |   41 +
 gcc/config.gcc   |   45 +
 gcc/config/or1k/constraints.md   |   55 +
 gcc/config/or1k/elf.h|   42 +
 gcc/config/or1k/elf.opt  |   33 +
 gcc/config/or1k/linux.h  |   45 +
 gcc/config/or1k/or1k-protos.h|   38 +
 gcc/config/or1k/or1k.c   | 2183 ++
 gcc/config/or1k/or1k.h   |  392 +
 gcc/config/or1k/or1k.md  |  897 +++
 gcc/config/or1k/or1k.opt |   67 +
 gcc/config/or1k/predicates.md|   84 +
 gcc/config/or1k/rtems.h  |   30 +
 gcc/config/or1k/t-or1k   |   22 +
 gcc/config/or1k/t-rtems  |3 +
 gcc/configure|   12 +
 gcc/configure.ac |   12 +
 gcc/doc/install.texi |   19 +
 gcc/doc/invoke.texi  |   68 +
 gcc/doc/md.texi  |   25 +
 20 files changed, 4113 insertions(+)
 create mode 100644 gcc/common/config/or1k/or1k-common.c
 create mode 100644 gcc/config/or1k/constraints.md
 create mode 100644 gcc/config/or1k/elf.h
 create mode 100644 gcc/config/or1k/elf.opt
 create mode 100644 gcc/config/or1k/linux.h
 create mode 100644 gcc/config/or1k/or1k-protos.h
 create mode 100644 gcc/config/or1k/or1k.c
 create mode 100644 gcc/config/or1k/or1k.h
 create mode 100644 gcc/config/or1k/or1k.md
 create mode 100644 gcc/config/or1k/or1k.opt
 create mode 100644 gcc/config/or1k/predicates.md
 create mode 100644 gcc/config/or1k/rtems.h
 create mode 100644 gcc/config/or1k/t-or1k
 create mode 100644 gcc/config/or1k/t-rtems

diff --git a/gcc/common/config/or1k/or1k-common.c 
b/gcc/common/config/or1k/or1k-common.c
new file mode 100644
index 000..044e843fd19
--- /dev/null
+++ b/gcc/common/config/or1k/or1k-common.c
@@ -0,0 +1,41 @@
+/* Common hooks for OpenRISC
+   Copyright (C) 2018 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "diagnostic-core.h"
+#include "tm.h"
+#include "common/common-target.h"
+#include "common/common-target-def.h"
+#include "opts.h"
+#include "flags.h"
+
+/* Implement TARGET_OPTION_OPTIMIZATION_TABLE.  */
+static const struct default_options or1k_option_optimization_table[] =
+  {
+/* Enable section anchors by default at -O1 or higher.  */
+{ OPT_LEVELS_1_PLUS, OPT_fsection_anchors, NULL, 1 },
+{ OPT_LEVELS_NONE, 0, NULL, 0 }
+  };
+
+#undef TARGET_OPTION_OPTIMIZATION_TABLE
+#define TARGET_OPTION_OPTIMIZATION_TABLE or1k_option_optimization_table
+
+struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER;
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 284f7d178de..7ef8d27f091 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -484,6 +484,9 @@ nios2-*-*)
 nvptx-*-*)
cpu_type=nvptx
;;
+or1k*-*-*)
+   cpu_type=or1k
+   ;;
 powerpc*-*-*spe*)
cpu_type=powerpcspe
extra_headers="ppc-asm.h altivec.h spe.h ppu_intrinsics.h paired.h 
spu2vmx.h vec_types.h si2vmx.h htmintrin.h htmxlintrin.h"
@@ -2490,6 +2493,48 @@ nvptx-*)
tm_file="${tm_file} nvptx/offload.h"
fi
;;
+or1k*-*-*)
+   tm_file="elfos.h ${tm_file}"
+   tmake_file="${tmake_file} or1k/t-or1k"
+   # Force .init_array support.  The configure script cannot always
+   # automatically detect that GAS supports it, yet we require it.
+   gcc_cv_initfini_array=yes
+
+   # Handle --with-multilib-list=...
+   or1k_multilibs="${with_multilib_list}"
+   if test "$or1k_multilibs" = "default"; then
+   or1k_multilibs="mcmov,msoft-mul,msoft-div"
+   fi
+   or1k_multilibs=`echo $or1k_multilibs | sed -e 's/,/ /g'`
+   for or1k_multilib in ${or1k_multilibs}; do
+   case 

[PATCH v4 1/3] or1k: libgcc: initial support for openrisc

2018-11-06 Thread Stafford Horne
-mm-dd  Stafford Horne  
Richard Henderson  

libgcc/ChangeLog:

* config.host: Add OpenRISC support.
* config/or1k/*: New.
---
 libgcc/config.host|  12 ++
 libgcc/config/or1k/lib1funcs.S| 222 ++
 libgcc/config/or1k/linux-unwind.h |  87 
 libgcc/config/or1k/sfp-machine.h  |  54 
 libgcc/config/or1k/t-or1k |  22 +++
 5 files changed, 397 insertions(+)
 create mode 100644 libgcc/config/or1k/lib1funcs.S
 create mode 100644 libgcc/config/or1k/linux-unwind.h
 create mode 100644 libgcc/config/or1k/sfp-machine.h
 create mode 100644 libgcc/config/or1k/t-or1k

diff --git a/libgcc/config.host b/libgcc/config.host
index 029f6569caf..1cbc8aca1cb 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -165,6 +165,9 @@ nds32*-*)
 nios2*-*-*)
cpu_type=nios2
;;
+or1k*-*-*)
+   cpu_type=or1k
+   ;;
 powerpc*-*-*)
cpu_type=rs6000
;;
@@ -1039,6 +1042,15 @@ nios2-*-*)
tmake_file="$tmake_file nios2/t-nios2 t-softfp-sfdf t-softfp-excl 
t-softfp"
extra_parts="$extra_parts crti.o crtn.o"
;;
+or1k-*-linux*)
+   tmake_file="$tmake_file or1k/t-or1k"
+   tmake_file="$tmake_file t-softfp-sfdf t-softfp"
+   md_unwind_header=or1k/linux-unwind.h
+   ;;
+or1k-*-*)
+   tmake_file="$tmake_file or1k/t-or1k"
+   tmake_file="$tmake_file t-softfp-sfdf t-softfp"
+   ;;
 pdp11-*-*)
tmake_file="pdp11/t-pdp11 t-fdpbit"
;;
diff --git a/libgcc/config/or1k/lib1funcs.S b/libgcc/config/or1k/lib1funcs.S
new file mode 100644
index 000..0ec41c3eba1
--- /dev/null
+++ b/libgcc/config/or1k/lib1funcs.S
@@ -0,0 +1,222 @@
+/* Copyright (C) 2018 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+
+#ifdef L__mulsi3
+   .balign 4
+   .globl  __mulsi3
+   .type   __mulsi3, @function
+__mulsi3:
+   l.movhi r11, 0  /* initial r */
+
+   /* Given R = X * Y ... */
+1: l.sfeq  r4, r0  /* while (y != 0) */
+   l.bf    2f
+l.andi r5, r4, 1   /* if (y & 1) ... */
+   l.add   r12, r11, r3
+   l.sfne  r5, r0
+#if defined(__or1k_cmov__)
+   l.cmov  r11, r12, r11   /* ... r += x. */
+   l.srli  r4, r4, 1   /* y >>= 1 */
+#else
+   l.bnf   3f
+l.srli r4, r4, 1   /* y >>= 1 */
+   l.ori   r11, r12, 0
+3:
+#endif
+   l.j 1b
+l.add  r3, r3, r3  /* x <<= 1 */
+
+2: l.jr    r9
+l.nop
+
+   .size   __mulsi3, . - __mulsi3
+#endif
+
+#if defined(L__udivsi3) || defined(L__umodsi3) \
+|| defined(L__divsi3) || defined(L__modsi3)
+   .global __udivmodsi3_internal
+   .hidden __udivmodsi3_internal
+   .type   __udivmodsi3_internal, @function
+#endif
+
+#ifdef L__udivsi3
+   .balign 4
+   .global __udivsi3
+   .type   __udivsi3, @function
+__udivsi3:
+__udivmodsi3_internal:
+   /* Note that the other division routines assume that r13
+  is not clobbered by this routine, and use that as to
+  save a return address without creating a stack frame.  */
+
+   l.sfeqi r4, 0   /* division by zero; return 0.  */
+   l.ori   r11, r0, 0  /* initial quotient */
+   l.bf    9f
+l.ori  r12, r3, 0  /* initial remainder */
+
+   /* Given X/Y, shift Y left until Y >= X.  */
+   l.ori   r6, r0, 1   /* mask = 1 */
+1: l.sfltsi r4, 0   /* y has msb set */
+   l.bf    2f
+l.sfltu r4, r12 /* y < x */
+   l.add   r4, r4, r4  /* y <<= 1 */
+   l.bnf   1b
+l.add  r6, r6, r6  /* mask <<= 1 */
+
+   /* Shift Y back to the right again, subtracting from X.  */
+2: l.add   r7, r11, r6 /* tmp1 = quot + mask */
+3: l.srli  r6, r6, 1   /* mask >>= 1 */
+   l.sub 
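
As a reading aid for the __mulsi3 routine in the patch above, the
shift-and-add loop described by its comments can be modelled in a few
lines of C++.  This is only a sketch of the algorithm as the comments
describe it (r, x and y follow the comment names); the function name
mulsi3_model is made up and nothing here is part of libgcc.

```cpp
#include <cstdint>

// Model of the loop: for each set bit of y, add the correspondingly
// shifted x into the result.  Arithmetic wraps modulo 2^32, as it does
// in the 32-bit registers.
constexpr std::uint32_t mulsi3_model(std::uint32_t x, std::uint32_t y) {
  std::uint32_t r = 0;        // initial r
  while (y != 0) {            // while (y != 0)
    if (y & 1)                // if (y & 1) ...
      r += x;                 // ... r += x
    y >>= 1;                  // y >>= 1
    x <<= 1;                  // x <<= 1
  }
  return r;
}

static_assert(mulsi3_model(6, 7) == 42);
static_assert(mulsi3_model(0, 12345) == 0);
static_assert(mulsi3_model(0xFFFFFFFFu, 2) == 0xFFFFFFFEu);  // wraps
```

The loop runs once per significant bit of y, so the cost depends on the
magnitude of the second operand.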

[PATCH v4 0/3] OpenRISC port

2018-11-06 Thread Stafford Horne
Hello,

As you can see, this is v4 of the OpenRISC port patch series.  I just want to
mention that there are a few things pointed out during the v3 review that I have
not fixed and do not plan to fix before pushing upstream.  These are either
things I didn't feel made the code easier to read or things that could wait
until after upstreaming.  These include:

(not changed)
 - libgcc !cmov 1cyc improvements suggested by Richard
 - gcc eliminations refactorings suggested by Segher
 - leaving out empty constraint strings suggested by Segher
 - implementing TARGET_INSN_COST suggested by Segher

Please let me know if you have concerns; now onto the patches:


Changes Since v3:
 - Fix tabs formatting pointed out by Segher
 - Fix comment formatting and typos pointed out by Segher
 - Fix for sign/zero extension logic in md file from Richard
 - Remove usages of ATTRIBUTE_UNUSED suggested by Segher
 - Remove need for init/fini, removing crti/n.S files
 - Add support for -static-pie in LINK_SPEC suggested by Szabolcs

Changes Since v2:
 - Add RTEMS patches from Joel Sherrill
 - Disable t-softfp-excl as suggested by Joseph Myers
 - Add new architecture flags needed to run on real FPGAs found in testing
   * -mror - enable l.ror (rotate right)
   * -mshftimm - enable shift/rotate by immediate instructions
 - Binutils requirements are now in upstream git

Changes Since v1:
 - Document options in invoke.texi suggested by Joseph Myers
 - Remove obsolete/incorrect macros suggested by Joseph Myers
 - Documented or1k.c functions as requested by Jeff Law
 - Add epilogue barriers suggested by Jeff Law
 - Define SPECULATION_SAFE_VALUE suggested by Jeff Law
 - Switch to init/fini array suggested by Richard Henderson
 - Define and document multilib flags to enable/disable instructions only
   available on some CPU cores as requested on the OpenRISC mailing list.

Since February this year I have been working on an OpenRISC clean room rewrite.

  
http://stffrdhrn.github.io/software/embedded/openrisc/2018/02/03/openrisc_gcc_rewrite.html

As per the article, the old port had issues with some of the owners signing over
FSF copyright.  To get around this I discussed options with the group and in the
end I opted for a clean room rewrite.

The new code base has been written by me with lots of help from Richard
Henderson.  I trust that both of us have our FSF GCC copyrights in place.

# Testing

We have been running the GCC testsuite with newlib and musl libc.  The results
are good.  See results published in a test build/release here:

 - https://github.com/stffrdhrn/gcc/releases/tag/or1k-9.0.0-20181106

# Building

Building this requires the latest binutils upstream master, i.e. 2.31.52.  Also,
because the need for `init` and `fini` was removed, it requires the latest
changes on newlib master.

-Stafford

Stafford Horne (3):
  or1k: libgcc: initial support for openrisc
  or1k: testsuite: initial support for openrisc
  or1k: gcc: initial support for openrisc

 gcc/common/config/or1k/or1k-common.c  |   41 +
 gcc/config.gcc|   45 +
 gcc/config/or1k/constraints.md|   55 +
 gcc/config/or1k/elf.h |   42 +
 gcc/config/or1k/elf.opt   |   33 +
 gcc/config/or1k/linux.h   |   45 +
 gcc/config/or1k/or1k-protos.h |   38 +
 gcc/config/or1k/or1k.c| 2183 +
 gcc/config/or1k/or1k.h|  392 +++
 gcc/config/or1k/or1k.md   |  897 +++
 gcc/config/or1k/or1k.opt  |   67 +
 gcc/config/or1k/predicates.md |   84 +
 gcc/config/or1k/rtems.h   |   30 +
 gcc/config/or1k/t-or1k|   22 +
 gcc/config/or1k/t-rtems   |3 +
 gcc/configure |   12 +
 gcc/configure.ac  |   12 +
 gcc/doc/install.texi  |   19 +
 gcc/doc/invoke.texi   |   68 +
 gcc/doc/md.texi   |   25 +
 .../gcc.c-torture/execute/20101011-1.c|3 +
 gcc/testsuite/gcc.dg/20020312-2.c |2 +
 gcc/testsuite/gcc.dg/attr-alloc_size-11.c |4 +-
 gcc/testsuite/gcc.dg/builtin-apply2.c |2 +-
 gcc/testsuite/gcc.dg/nop.h|2 +
 .../torture/stackalign/builtin-apply-2.c  |2 +-
 gcc/testsuite/gcc.dg/tree-ssa/20040204-1.c|2 +-
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-33.c|2 +-
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-34.c|2 +-
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-35.c|2 +-
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-36.c|2 +-
 gcc/testsuite/gcc.target/or1k/args-1.c|   19 +
 gcc/testsuite/gcc.target/or1k/args-2.c|   15 +
 gcc/testsuite/gcc.target/or1k/cmov-1.c|8 +
 gcc/testsuite/gcc.target/or1k/cmov-2.c|9 +
 gcc/testsuite/gcc.target/or1k

Re: [RFC][PATCH LRA] WIP patch to fix one part of PR87507

2018-11-06 Thread Jeff Law
On 10/27/18 10:24 AM, Peter Bergner wrote:
> On 10/22/18 6:45 PM, Peter Bergner wrote:
>> Bah, my bootstrap failed and I need to make a small change.  Let me do that
>> and verify my bootstraps get all the way thru before I give you the updated
>> patch.  Sorry.
> Ok, the following updated patch survives bootstrap and regtesting on
> powerpc64le-linux, x86_64-linux and s390x-linux with no regressions.
> The changes from the previous patch are that checking for illegal hard register
> usage in inline asm statements has been moved to expand time.  Secondly, the
> lra constraints code now checks for both HARD_REGISTER_P and REG_USERVAR_P.
> This was because I was seeing combine forcing hard regs into a pattern
> (not from inline asm) which lra needed to spill to match constraints,
> which it should be able to do.
> 
> Jeff, can you give this a try on your testers to see how it behaves on
> the other arches that were having problems?
> 
> Peter
> 
>   PR rtl-optimization/87600
>   * cfgexpand.c (expand_asm_stmt): Catch illegal asm constraint usage.
>   * lra-constraints.c (process_alt_operands): Skip illegal hard
>   register usage.  Prefer reloading non hard register operands.



> 

> Index: gcc/lra-constraints.c
> ===
> --- gcc/lra-constraints.c (revision 265402)
> +++ gcc/lra-constraints.c (working copy)
> @@ -2146,9 +2146,30 @@ process_alt_operands (int only_alternati
> }
>   else
> {
> - /* Operands don't match.  Both operands must
> -allow a reload register, otherwise we
> -cannot make them match.  */
> + /* Operands don't match.  If the operands are
> +different user defined explicit hard registers,
> +then we cannot make them match.  */
> + if ((REG_P (*curr_id->operand_loc[nop])
> +  || SUBREG_P (*curr_id->operand_loc[nop]))
> + && (REG_P (*curr_id->operand_loc[m])
> + || SUBREG_P (*curr_id->operand_loc[m])))
> +   {
> + rtx nop_reg = *curr_id->operand_loc[nop];
> + if (SUBREG_P (nop_reg))
> +   nop_reg = SUBREG_REG (nop_reg);
> + rtx m_reg = *curr_id->operand_loc[m];
> + if (SUBREG_P (m_reg))
> +   m_reg = SUBREG_REG (m_reg);
So the one worry I have/had in this code is nested subregs.  My
recollection is they do happen in rare cases.  But I can also find a
reference  where Jim W. has indicated they're invalid (and I absolutely
trust Jim on this kind of historical RTL stuff).

https://gcc.gnu.org/ml/gcc/2005-04/msg01173.html

Reviewing cases where we've written the stripping as a loop, several are
stripping subregs as well as extractions.  And I can certainly believe
that we could have an RTX with some combination of subregs and
extractions.  The exception to that pattern would be reload which has
subreg stripping loops that only strip the subregs.

So, after all that, I think we're OK.  It might make sense to verify we
don't have nested subregs in the IL verifiers.  Bonus points if you add
that checking.
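
To make the distinction concrete, here is a toy model of the two
stripping styles and of the kind of check an IL verifier could do.
Everything in it is a stand-in: Expr is not GCC's rtx, and strip_one,
strip_all and has_nested_subreg are hypothetical names, so treat it as a
sketch of the idea rather than code for lra-constraints.c.

```cpp
// Toy stand-in for an RTL expression; NOT GCC's rtx.
enum class Code { Reg, Subreg };

struct Expr {
  Code code;
  const Expr* inner;  // operand of a Subreg; nullptr for a Reg
};

// Strip exactly one level, as the patch does with SUBREG_REG.
inline const Expr* strip_one(const Expr* e) {
  return e->code == Code::Subreg ? e->inner : e;
}

// Strip every level.  This only differs from strip_one when subregs
// are nested.
inline const Expr* strip_all(const Expr* e) {
  while (e->code == Code::Subreg)
    e = e->inner;
  return e;
}

// Verifier-style check: true if a Subreg directly wraps another Subreg.
inline bool has_nested_subreg(const Expr* e) {
  return e->code == Code::Subreg && e->inner->code == Code::Subreg;
}
```

If nested subregs really are invalid, the single strip and the loop give
the same answer on every valid input, and a check along the lines of
has_nested_subreg would catch any that slip through.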

So ideally we'd have some tests which tickle this code, even if they're
target specific.

So I'm OK with this patch and also OK with adding tests independently as
a follow-up.

jeff


Re: [aarch64] disable shrink wrapping when tracking speculative execution

2018-11-06 Thread Segher Boessenkool
On Tue, Nov 06, 2018 at 07:43:36PM +, Richard Earnshaw (lists) wrote:
> On 06/11/2018 18:18, Segher Boessenkool wrote:
> > On Tue, Nov 06, 2018 at 11:46:53AM +, Richard Earnshaw (lists) wrote:
> >> Well it generates new 'light-weight' prologue and epilogue sequences for
> >> the 'shrunk' code path that lack the establishment of the tracker
> >> register and doesn't know how to move the existing sequence to the new
> >> entry sequence.
> > 
> > Ah, so the shrink-wrapping code is not deleting anything at all (just
> > not adding it).  Gotcha :-)
> 
> Well you could argue that it deleted the tracker update for the case
> where the branch was not taken, and it also deleted the part of the
> prologue where the tracker state was restored into SP before the return.
>  But I'm being picky... :-)

When I say "deleted" I mean "deleted RTL code that was actually there".
You seem to mean "prevented it from being created later"?

What I'm after is, if the shrink-wrapping code is deleting RTL it has
no business touching, that sounds like a serious bug.

> > [ snip example code; thanks, that helped ]
> > 
> >> I'm not asking that shrink wrapping be updated to handle all this; in
> >> fact, I'm not sure it's that easy to do as the branch patterns and
> >> simple-return patterns aren't set up to handle this.
> > 
> > One thing you could do is make shrink-wrap aware what part of the code
> > needs the speculation tracking parts of the prologue.  You could do this
> > by making a separate shrink-wrapping component for it, or you can do it
> > by marking the places needing it as needing the full prologue, e.g. by
> > emitting a fake call into it (and not outputting any code for that call).
> > The latter does cause a stack frame to be emitted even when it wouldn't
> > otherwise, unfortunately.  The separate shrink-wrapping approach should
> > work beautifully as far as I see.
> 
> There are a number of optimizations that are worth investigating with the
> tracking support, but whether they'll notably improve performance I'm
> not sure.  Tracking is just expensive, and the main problem is the
> serialization of the state, which limits the core's ability to reorder
> stuff internally.

Yeah, it will be seriously expensive always.  If people still use this
in production code you really _do_ want to optimise it.  If that helps
measurably, anyway.


Segher


Re: [PR87874] avoid const-wide-int subreg in LRA

2018-11-06 Thread Vladimir Makarov

On 11/06/2018 04:38 AM, Alexandre Oliva wrote:

Just like CONST_INT, CONST_WIDE_INT is VOIDmode, so LRA might be
tempted to build a SUBREG to "convert" it to the wanted mode.  That's
no use.  Test for CONST_SCALAR_INT_P instead of CONST_INT_P so that we
skip the subreg creation for both.

Regstrapped on x86_64- and i686-linux-gnu.  Ok to install?

Yes.  Thank you, Alex.

for  gcc/ChangeLog

PR rtl-optimization/87874
* lra.c (lra_substitute_pseudo): Do not create a subreg for
const wide ints.

for  gcc/testsuite/ChangeLog

PR rtl-optimization/87874
* gcc.dg/pr87874.c: New.




Re: [aarch64] disable shrink wrapping when tracking speculative execution

2018-11-06 Thread Richard Earnshaw (lists)
On 06/11/2018 19:43, Richard Earnshaw (lists) wrote:
> On 06/11/2018 18:18, Segher Boessenkool wrote:
>> Hi Richard,
>>
>> On Tue, Nov 06, 2018 at 11:46:53AM +, Richard Earnshaw (lists) wrote:
>>> On 06/11/2018 01:40, Segher Boessenkool wrote:
 Hi Richard,

 On Mon, Nov 05, 2018 at 10:09:30AM +, Richard Earnshaw (lists) wrote:
 Shouldn't you be able to do this per function at least?
>>>
>>> do what per function?  track speculation?
>>
>> disable shrink-wrapping only when any speculation was there
>> (this is about __builtin_speculation_safe_value, no?)
>
> Only indirectly.  This is about the tracking code that tracks
> conditional branches and propagates that information through call/return
> sequences.  Shrink wrapping messes with the prologue/epilogue sequences
> after the speculation tracking pass has run and unknowingly deletes some
> of the additional code that was previously inserted by the tracking pass.

 Do you have an example of this?  Shrink-wrapping does not generally
 delete any code.

>>>
>>> Well it generates new 'light-weight' prologue and epilogue sequences for
>>> the 'shrunk' code path that lack the establishment of the tracker
>>> register and doesn't know how to move the existing sequence to the new
>>> entry sequence.
>>
>> Ah, so the shrink-wrapping code is not deleting anything at all (just
>> not adding it).  Gotcha :-)
> 
> Well you could argue that it deleted the tracker update for the case
> where the branch was not taken, and it also deleted the part of the
> prologue where the tracker state was restored into SP before the return.

Duh! epilogue, of course.

R.

>  But I'm being picky... :-)
> 
>>
>> [ snip example code; thanks, that helped ]
>>
>>> I'm not asking that shrink wrapping be updated to handle all this; in
>>> fact, I'm not sure it's that easy to do as the branch patterns and
>>> simple-return patterns aren't set up to handle this.
>>
>> One thing you could do is make shrink-wrap aware what part of the code
>> needs the speculation tracking parts of the prologue.  You could do this
>> by making a separate shrink-wrapping component for it, or you can do it
>> by marking the places needing it as needing the full prologue, e.g. by
>> emitting a fake call into it (and not outputting any code for that call).
>> The latter does cause a stack frame to be emitted even when it wouldn't
>> otherwise, unfortunately.  The separate shrink-wrapping approach should
>> work beautifully as far as I see.
>>
>>
> 
> There are a number of optimizations that are worth investigating with the
> tracking support, but whether they'll notably improve performance I'm
> not sure.  Tracking is just expensive, and the main problem is the
> serialization of the state, which limits the core's ability to reorder
> stuff internally.
> 
> R.
> 
> 



Re: [aarch64] disable shrink wrapping when tracking speculative execution

2018-11-06 Thread Richard Earnshaw (lists)
On 06/11/2018 18:18, Segher Boessenkool wrote:
> Hi Richard,
> 
> On Tue, Nov 06, 2018 at 11:46:53AM +, Richard Earnshaw (lists) wrote:
>> On 06/11/2018 01:40, Segher Boessenkool wrote:
>>> Hi Richard,
>>>
>>> On Mon, Nov 05, 2018 at 10:09:30AM +, Richard Earnshaw (lists) wrote:
>>> Shouldn't you be able to do this per function at least?
>>
>> do what per function?  track speculation?
>
> disable shrink-wrapping only when any speculation was there
> (this is about __builtin_speculation_safe_value, no?)

 Only indirectly.  This is about the tracking code that tracks
 conditional branches and propagates that information through call/return
 sequences.  Shrink wrapping messes with the prologue/epilogue sequences
 after the speculation tracking pass has run and unknowingly deletes some
 of the additional code that was previously inserted by the tracking pass.
>>>
>>> Do you have an example of this?  Shrink-wrapping does not generally
>>> delete any code.
>>>
>>
>> Well it generates new 'light-weight' prologue and epilogue sequences for
>> the 'shrunk' code path that lack the establishment of the tracker
>> register and doesn't know how to move the existing sequence to the new
>> entry sequence.
> 
> Ah, so the shrink-wrapping code is not deleting anything at all (just
> not adding it).  Gotcha :-)

Well you could argue that it deleted the tracker update for the case
where the branch was not taken, and it also deleted the part of the
prologue where the tracker state was restored into SP before the return.
 But I'm being picky... :-)

> 
> [ snip example code; thanks, that helped ]
> 
>> I'm not asking that shrink wrapping be updated to handle all this; in
>> fact, I'm not sure it's that easy to do as the branch patterns and
>> simple-return patterns aren't set up to handle this.
> 
> One thing you could do is make shrink-wrap aware what part of the code
> needs the speculation tracking parts of the prologue.  You could do this
> by making a separate shrink-wrapping component for it, or you can do it
> by marking the places needing it as needing the full prologue, e.g. by
> emitting a fake call into it (and not outputting any code for that call).
> The latter does cause a stack frame to be emitted even when it wouldn't
> otherwise, unfortunately.  The separate shrink-wrapping approach should
> work beautifully as far as I see.
> 
> 

There are a number of optimizations that are worth investigating with the
tracking support, but whether they'll notably improve performance I'm
not sure.  Tracking is just expensive, and the main problem is the
serialization of the state, which limits the core's ability to reorder
stuff internally.

R.




Re: [PATCH 2/2 v3][IRA,LRA] Fix PR86939, IRA incorrectly creates an interference between a pseudo register and a hard register

2018-11-06 Thread Jeff Law
On 11/6/18 3:52 AM, Renlin Li wrote:
> Hi Jeff & Peter,
> 
> On 11/05/2018 07:41 PM, Jeff Law wrote:
>> On 11/5/18 12:36 PM, Peter Bergner wrote:
>>> On 11/5/18 1:20 PM, Jeff Law wrote:
 On 11/1/18 4:07 PM, Peter Bergner wrote:
> On 11/1/18 1:50 PM, Renlin Li wrote:
>> Is there any update on this issues?
>> arm-none-linux-gnueabihf native toolchain has been mis-compiled
>> for a while.
>
>  From the analysis I've done, my commit is just exposing latent issues
> in LRA.  Can you try the patch I submitted here to see if it helps?
>
>    https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01757.html
>
> It survives on powerpc64le-linux, x86_64-linux and s390x-linux.
> Jeff threw it on his testers and said he saw an arm issue and was
> trying to come up with a test case for me to debug.
 So I don't think the ARM issues are related to your patch, they may
 have
 been related to the combiner changes that went in around the same time.
> Yes, there are issues related to the combiner changes.
> 
> But the IRA/LRA change does cause the arm-none-linux-gnueabihf native
> bootstrap toolchain to be mis-compiled.
> And the new patch does not seem to fix this problem.
That's strange.  I'm bootstrapping arm-linux-gnueabihf daily with qemu +
a suitable root filesystem using Peter's most recent testing patch.


> 
> I am trying to extract a test case, but it is a little bit hard as the
> toolchain itself is mis-compiled.
> And it ICEs when compiling the test case with it.
What I would suggest doing is to first start with running the testsuite
against the stage1 compiler before/after Peter's changes.  Sometimes
that'll turn up something useful and you can avoid debugging things
through stage2/stage3.


jeff


Re: [PATCH][RFC] Fix UBSAN in postreload-gcse.c (PR rtl-optimization/87868).

2018-11-06 Thread Jeff Law
On 11/6/18 7:05 AM, Martin Liška wrote:
> Hi.
> 
> The patch adds an overflow check in eliminate_partially_redundant_load.
> The question is whether conditionally compiling the use of
> __builtin_mul_overflow is fine.
> 
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
> 2018-11-06  Martin Liska  
> 
>   PR rtl-optimization/87868
>   * postreload-gcse.c (eliminate_partially_redundant_load): Set
>   threshold to max_count if we would overflow.
>   * profile-count.h: Make max_count a public constant.
OK.  Though I do worry about how many of these things we'll have to
sprinkle over the sources over time.  I suspect there's all kinds of
overflows just waiting to happen, some are obviously more important than
others.
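
For readers wondering what such a conditionally compiled check can look like, here is a minimal sketch. It is not the code from the patch: `mul_or_saturate` and its `saturated` parameter are hypothetical names, and the preprocessor test is only one possible way to detect the builtin (GCC 5 and later, or clang).

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: multiply two counts and return `saturated`
// instead of overflowing.
inline std::int64_t mul_or_saturate(std::int64_t a, std::int64_t b,
                                    std::int64_t saturated) {
#if defined(__GNUC__) && (__GNUC__ >= 5 || defined(__clang__))
  // The builtin reports overflow instead of invoking undefined behavior.
  std::int64_t result;
  if (__builtin_mul_overflow(a, b, &result))
    return saturated;
  return result;
#else
  // Portable fallback; only valid when a >= 0 and b >= 0.
  if (b != 0 && a > std::numeric_limits<std::int64_t>::max() / b)
    return saturated;
  return a * b;
#endif
}
```

The fallback branch is the price of the conditional compilation: it has to be kept correct for hosts without the builtin, which is part of the maintenance worry raised above.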

jeff


Re: [PATCH 2/2 v3][IRA,LRA] Fix PR86939, IRA incorrectly creates an interference between a pseudo register and a hard register

2018-11-06 Thread Peter Bergner
On 11/6/18 6:23 AM, Renlin Li wrote:
> I just did a bootstrap again with everything up to r264897, which is from Oct 6.
> It produces the ICE I mentioned in PR87899.
> 
> The first combiner patch is from Oct 22.

Do the testsuite results (for disable-bootstrap builds) differ between
r264896 and r264897?  If so, that would be much easier to track down.

If not, maybe the following patch could help to narrow down which gcc
source file(s) are being miscompiled by allowing you to disable the
special handling of copy conflicts with an option?  The option default
(ie, not using the option or -fno-ira-copies-conflict) is the same behavior
as now and -fira-copies-conflict would make things behave like they did
before my patch.

Peter


Index: gcc/common.opt
===
--- gcc/common.opt  (revision 265402)
+++ gcc/common.opt  (working copy)
@@ -1761,6 +1761,10 @@ Enum(ira_region) String(all) Value(IRA_R
 EnumValue
 Enum(ira_region) String(mixed) Value(IRA_REGION_MIXED)
 
+fira-copies-conflict
+Common Report Var(flag_ira_copies_conflict) Init(0) Optimization
+Make pseudos connected by a copy conflict
+
 fira-hoist-pressure
 Common Report Var(flag_ira_hoist_pressure) Init(1) Optimization
 Use IRA based register pressure calculation
Index: gcc/ira-lives.c
===
--- gcc/ira-lives.c (revision 265402)
+++ gcc/ira-lives.c (working copy)
@@ -1066,7 +1066,7 @@ non_conflicting_reg_copy_p (rtx_insn *in
 {
   /* Reload has issues with overlapping pseudos being assigned to the
  same hard register, so don't allow it.  See PR87600 for details.  */
-  if (!targetm.lra_p ())
+  if (flag_ira_copies_conflict || !targetm.lra_p ())
 return NULL_RTX;
 
   rtx set = single_set (insn);



Re: [aarch64] disable shrink wrapping when tracking speculative execution

2018-11-06 Thread Segher Boessenkool
Hi Richard,

On Tue, Nov 06, 2018 at 11:46:53AM +, Richard Earnshaw (lists) wrote:
> On 06/11/2018 01:40, Segher Boessenkool wrote:
> > Hi Richard,
> > 
> > On Mon, Nov 05, 2018 at 10:09:30AM +, Richard Earnshaw (lists) wrote:
> > Shouldn't you be able to do this per function at least?
> 
>  do what per function?  track speculation?
> >>>
> >>> disable shrink-wrapping only when any speculation was there
> >>> (this is about __builtin_speculation_safe_value, no?)
> >>
> >> Only indirectly.  This is about the tracking code that tracks
> >> conditional branches and propagates that information through call/return
> >> sequences.  Shrink wrapping messes with the prologue/epilogue sequences
> >> after the speculation tracking pass has run and unknowingly deletes some
> >> of the additional code that was previously inserted by the tracking pass.
> > 
> > Do you have an example of this?  Shrink-wrapping does not generally
> > delete any code.
> > 
> 
> Well it generates new 'light-weight' prologue and epilogue sequences for
> the 'shrunk' code path that lack the establishment of the tracker
> register and doesn't know how to move the existing sequence to the new
> entry sequence.

Ah, so the shrink-wrapping code is not deleting anything at all (just
not adding it).  Gotcha :-)

[ snip example code; thanks, that helped ]

> I'm not asking that shrink wrapping be updated to handle all this; in
> fact, I'm not sure it's that easy to do as the branch patterns and
> simple-return patterns aren't set up to handle this.

One thing you could do is make shrink-wrap aware what part of the code
needs the speculation tracking parts of the prologue.  You could do this
by making a separate shrink-wrapping component for it, or you can do it
by marking the places needing it as needing the full prologue, e.g. by
emitting a fake call into it (and not outputting any code for that call).
The latter does cause a stack frame to be emitted even when it wouldn't
otherwise, unfortunately.  The separate shrink-wrapping approach should
work beautifully as far as I see.


Segher


Re: PR libstdc++/87872 Avoids iterator transfer on self splice

2018-11-06 Thread Jonathan Wakely

On 06/11/18 10:49 +0100, François Dumont wrote:

Here is the patch submitted by John and now fully tested.

    PR libstdc++/87872
    * include/debug/safe_sequence.tcc
    (_Safe_sequence<>::_M_transfer_from_if): Skip transfer to self.

Is it fine to commit it now ?


OK, thanks.



François




diff --git a/libstdc++-v3/include/debug/safe_sequence.tcc b/libstdc++-v3/include/debug/safe_sequence.tcc
index 12de48cf322..ce9a807e79f 100644
--- a/libstdc++-v3/include/debug/safe_sequence.tcc
+++ b/libstdc++-v3/include/debug/safe_sequence.tcc
@@ -68,6 +68,9 @@ namespace __gnu_debug
  _Safe_sequence<_Sequence>::
  _M_transfer_from_if(_Safe_sequence& __from, _Predicate __pred)
  {
+   if (this == std::__addressof(__from))
+ return;
+
typedef typename _Sequence::iterator iterator;
typedef typename _Sequence::const_iterator const_iterator;
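
The reason for the early return can be illustrated with a toy model. This is not libstdc++ code: `TrackedSequence`, `attached` and `transfer_from_if` are hypothetical stand-ins for _Safe_sequence and its list of attached iterators, and plain `&from` is used where the patch uses std::__addressof.

```cpp
#include <vector>

// Toy stand-in for a debug-mode sequence that tracks attached iterators.
struct TrackedSequence {
  std::vector<int> attached;  // ids of iterators attached to this sequence

  // Move every iterator id matching `pred` from `from` to this sequence.
  template <typename Pred>
  void transfer_from_if(TrackedSequence &from, Pred pred) {
    if (this == &from)
      return;  // self-transfer: there is nothing to move
    std::vector<int> kept;
    for (int id : from.attached) {
      if (pred(id))
        attached.push_back(id);  // would grow the vector being iterated
                                 // if `from` were this same object
      else
        kept.push_back(id);
    }
    from.attached.swap(kept);
  }
};
```

In this toy version, dropping the guard would make a self-transfer append to the very container the loop is walking; the real debug-mode code has its own bookkeeping, but the guard serves the same purpose of making a self splice a no-op.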





aarch64 - Set the mode for the unspec in speculation_tracker insn.

2018-11-06 Thread Richard Earnshaw (lists)
The speculation tracker insn in my recent patch set for CVE-2017-5753
was missing a mode on the UNSPEC.  Although this didn't break the
build, it did cause an unnecessary warning from the MD parsing
mechanism that I missed at the time.  It's a trivial fix, as follows:

* config/aarch64/aarch64.md (speculation_tracker): Set the mode for
the UNSPEC.


Committed to trunk.
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index ada623bb6f1..82af4d47f78 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -6687,7 +6687,7 @@ (define_expand "doloop_end"
 ;; SPECULATION_TRACKER_REGNUM is reserved for this purpose when necessary.
 (define_insn "speculation_tracker"
   [(set (reg:DI SPECULATION_TRACKER_REGNUM)
-	(unspec [(reg:DI SPECULATION_TRACKER_REGNUM) (match_operand 0)]
+	(unspec:DI [(reg:DI SPECULATION_TRACKER_REGNUM) (match_operand 0)]
 	 UNSPEC_SPECULATION_TRACKER))]
   ""
   {


Small typo in iconv.m4

2018-11-06 Thread Hafiz Abid Qadeer
Hi All,
I was investigating a character set related problem with windows hosted
GDB and I tracked it down to a typo in iconv.m4. This typo caused
libiconv detection to fail and related support was not built into gdb.

The problem is with the following line.
CPPFLAGS="$LIBS $INCICONV"
which should have been
CPPFLAGS="$CPPFLAGS $INCICONV"

OK to commit the attached patch?

2018-11-06  Hafiz Abid Qadeer  

* config/iconv.m4 (AM_ICONV_LINK): Don't overwrite CPPFLAGS.
Append $INCICONV to it.
* gcc/configure: Regenerate.
* libcpp/configure: Likewise.
* libstdc++-v3/configure: Likewise.
* intl/configure: Likewise.

Thanks,
-- 
Hafiz Abid Qadeer
Mentor Embedded/CodeSourcery
diff --git a/config/iconv.m4 b/config/iconv.m4
index 5f9304a6ba..f1e54c5aed 100644
--- a/config/iconv.m4
+++ b/config/iconv.m4
@@ -73,7 +73,7 @@ AC_DEFUN([AM_ICONV_LINK],
 if test "$am_cv_func_iconv" != yes; then
   am_save_CPPFLAGS="$CPPFLAGS"
   am_save_LIBS="$LIBS"
-  CPPFLAGS="$LIBS $INCICONV"
+  CPPFLAGS="$CPPFLAGS $INCICONV"
   LIBS="$LIBS $LIBICONV"
   AC_TRY_LINK([#include <stdlib.h>
 #include <iconv.h>],


Re: [Driver] Add support for -fuse-ld=lld

2018-11-06 Thread H.J. Lu
On Sat, Oct 20, 2018 at 3:19 AM Romain Geissler
 wrote:
>
> Hi,
>
> I would like to raise again the question of supporting -fuse-ld=lld. A
> patch implementing it was already submitted in
> https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01722.html by Davide
> Italiano. This patch still applies correctly to current trunk. I am CC-ing
> the original author and re-posting it here unchanged for reference.
>
> I think we can consider this patch as relevant despite the goals and
> licence difference of LLVM vs GNU, based on what was written by Mike Stump
> in https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00157.html
>
> Back then, the technical problem raised by lld was reported as
> https://bugs.llvm.org/show_bug.cgi?id=28414, now closed. In this bug, every
> reported problem has been fixed except the last one. H.J. Lu mentions
> this last problem (lld does not produce the symbol version predecessor
> relationship while ld.bfd and ld.gold do, which seems to be a decision
> taken on purpose and advertised as a harmless change) as being one reason
> against supporting -fuse-ld=lld in gcc. Is it still the case today, and
> if yes, why?
>
> Is there any other reason why -fuse-ld=lld should not be supported by gcc?
>
> Cheers,
> Romain
>
> From 323c23d79c91d7dcee2f29b9ced8c1c00703d346 Mon Sep 17 00:00:00 2001
> From: Davide Italiano 
> Date: Thu, 23 Jun 2016 20:51:53 -0700
> Subject: [PATCH] Driver: Add support for -fuse-ld=lld.
>
> * collect2.c  (main): Support -fuse-ld=lld.
>
> * common.opt: Add fuse-ld=lld
>
> * doc/invoke.texi:  Document -fuse-ld=lld
>
> * opts.c: Ignore -fuse-ld=lld
> ---
>  gcc/collect2.c  | 11 ---
>  gcc/common.opt  |  4 
>  gcc/doc/invoke.texi |  4 
>  gcc/opts.c  |  1 +
>  4 files changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/collect2.c b/gcc/collect2.c
> index bffac80..6a8387c 100644
> --- a/gcc/collect2.c
> +++ b/gcc/collect2.c
> @@ -831,6 +831,7 @@ main (int argc, char **argv)
>USE_PLUGIN_LD,
>USE_GOLD_LD,
>USE_BFD_LD,
> +  USE_LLD_LD,
>USE_LD_MAX
>  } selected_linker = USE_DEFAULT_LD;
>static const char *const ld_suffixes[USE_LD_MAX] =
> @@ -838,7 +839,8 @@ main (int argc, char **argv)
>"ld",
>PLUGIN_LD_SUFFIX,
>"ld.gold",
> -  "ld.bfd"
> +  "ld.bfd",
> +  "ld.lld"
>  };
>static const char *const real_ld_suffix = "real-ld";
>static const char *const collect_ld_suffix = "collect-ld";
> @@ -1004,6 +1006,8 @@ main (int argc, char **argv)
>   selected_linker = USE_BFD_LD;
> else if (strcmp (argv[i], "-fuse-ld=gold") == 0)
>   selected_linker = USE_GOLD_LD;
> +  else if (strcmp (argv[i], "-fuse-ld=lld") == 0)
> +selected_linker = USE_LLD_LD;
>
>  #ifdef COLLECT_EXPORT_LIST
> /* These flags are position independent, although their order
> @@ -1093,7 +1097,8 @@ main (int argc, char **argv)
>/* Maybe we know the right file to use (if not cross).  */
>ld_file_name = 0;
>  #ifdef DEFAULT_LINKER
> -  if (selected_linker == USE_BFD_LD || selected_linker == USE_GOLD_LD)
> +  if (selected_linker == USE_BFD_LD || selected_linker == USE_GOLD_LD ||
> +  selected_linker == USE_LLD_LD)
>  {
>char *linker_name;
>  # ifdef HOST_EXECUTABLE_SUFFIX
> @@ -1307,7 +1312,7 @@ main (int argc, char **argv)
>   else if (!use_collect_ld
>&& strncmp (arg, "-fuse-ld=", 9) == 0)
> {
> - /* Do not pass -fuse-ld={bfd|gold} to the linker. */
> + /* Do not pass -fuse-ld={bfd|gold|lld} to the linker. */
>   ld1--;
>   ld2--;
> }
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 5d90385..2a95a1f 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2536,6 +2536,10 @@ fuse-ld=gold
>  Common Driver Negative(fuse-ld=bfd)
>  Use the gold linker instead of the default linker.
>
> +fuse-ld=lld
> +Common Driver Negative(fuse-ld=lld)
> +Use the lld LLVM linker instead of the default linker.
> +
>  fuse-linker-plugin
>  Common Undocumented Var(flag_use_linker_plugin)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 2c87c53..4b8acff 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -10651,6 +10651,10 @@ Use the @command{bfd} linker instead of the default linker.
>  @opindex fuse-ld=gold
>  Use the @command{gold} linker instead of the default linker.
>
> +@item -fuse-ld=lld
> +@opindex fuse-ld=lld
> +Use the LLVM @command{lld} linker instead of the default linker.
> +
>  @cindex Libraries
>  @item -l@var{library}
>  @itemx -l @var{library}
> diff --git a/gcc/opts.c b/gcc/opts.c
> index 7406210..f2c86f7 100644
> --- a/gcc/opts.c
> +++ b/gcc/opts.c
> @@ -2178,6 +2178,7 @@ common_handle_option (struct gcc_options *opts,
>
>  case OPT_fuse_ld_bfd:
>  case OPT_fuse_ld_gold:
> +case OPT_fuse_ld_lld:
>  case OPT_fuse_linker_plugin:
>/* No-op. 
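
For reference, the selection logic the collect2.c hunks extend can be sketched in isolation. This is a simplified toy, not the real collect2 code: the plugin linker entry is omitted, and `select_linker` and `linker_suffix` are hypothetical helper names.

```cpp
#include <cstring>

// Simplified version of the enum in collect2.c (plugin entry left out).
enum LinkerKind {
  USE_DEFAULT_LD,
  USE_GOLD_LD,
  USE_BFD_LD,
  USE_LLD_LD,
  USE_LD_MAX
};

// Suffix table indexed by LinkerKind; the order must match the enum.
static const char *const ld_suffixes[USE_LD_MAX] = {
  "ld",
  "ld.gold",
  "ld.bfd",
  "ld.lld"
};

// Scan the arguments; in this sketch the last -fuse-ld= option wins.
inline LinkerKind select_linker(int argc, const char *const *argv) {
  LinkerKind selected = USE_DEFAULT_LD;
  for (int i = 1; i < argc; i++) {
    if (std::strcmp(argv[i], "-fuse-ld=bfd") == 0)
      selected = USE_BFD_LD;
    else if (std::strcmp(argv[i], "-fuse-ld=gold") == 0)
      selected = USE_GOLD_LD;
    else if (std::strcmp(argv[i], "-fuse-ld=lld") == 0)
      selected = USE_LLD_LD;
  }
  return selected;
}

inline const char *linker_suffix(LinkerKind kind) {
  return ld_suffixes[kind];
}
```

Because the enum and the suffix table are parallel, adding a linker means touching both in the same position, which is exactly what the USE_LLD_LD and "ld.lld" hunks do.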

Re: [Driver] Add support for -fuse-ld=lld

2018-11-06 Thread Romain GEISSLER
Ping^2

> On 27 Oct 2018, at 11:33, Romain GEISSLER wrote:
> 
> Ping
> 
>> On 20 Oct 2018, at 12:18, Romain Geissler wrote:
>> 
>> Hi,
>> 
>> I would like to raise again the question of supporting -fuse-ld=lld. A
>> patch implementing it was already submitted in
>> https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01722.html by Davide
>> Italiano. This patch still applies correctly to current trunk. I am CC-ing
>> the original author and re-posting it here unchanged for reference.
>> 
>> I think we can consider this patch as relevant despite the goals and
>> licence difference of LLVM vs GNU, based on what was written by Mike Stump
>> in https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00157.html
>> 
>> Back then, the technical problem raised by lld was reported as
>> https://bugs.llvm.org/show_bug.cgi?id=28414, now closed. In this bug, every
>> reported problem has been fixed except the last one. H.J. Lu mentions
>> this last problem (lld does not produce the symbol version predecessor
>> relationship while ld.bfd and ld.gold do, which seems to be a decision
>> taken on purpose and advertised as a harmless change) as being one reason
>> against supporting -fuse-ld=lld in gcc. Is it still the case today, and
>> if yes, why?
>> 
>> Is there any other reason why -fuse-ld=lld should not be supported by gcc?
>> 
>> Cheers,
>> Romain
>> 
>> From 323c23d79c91d7dcee2f29b9ced8c1c00703d346 Mon Sep 17 00:00:00 2001
>> From: Davide Italiano 
>> Date: Thu, 23 Jun 2016 20:51:53 -0700
>> Subject: [PATCH] Driver: Add support for -fuse-ld=lld.
>> 
>> * collect2.c  (main): Support -fuse-ld=lld.
>> 
>> * common.opt: Add fuse-ld=lld
>> 
>> * doc/invoke.texi:  Document -fuse-ld=lld
>> 
>> * opts.c: Ignore -fuse-ld=lld
>> ---
>> gcc/collect2.c  | 11 ---
>> gcc/common.opt  |  4 
>> gcc/doc/invoke.texi |  4 
>> gcc/opts.c  |  1 +
>> 4 files changed, 17 insertions(+), 3 deletions(-)
>> 
>> diff --git a/gcc/collect2.c b/gcc/collect2.c
>> index bffac80..6a8387c 100644
>> --- a/gcc/collect2.c
>> +++ b/gcc/collect2.c
>> @@ -831,6 +831,7 @@ main (int argc, char **argv)
>>  USE_PLUGIN_LD,
>>  USE_GOLD_LD,
>>  USE_BFD_LD,
>> +  USE_LLD_LD,
>>  USE_LD_MAX
>>} selected_linker = USE_DEFAULT_LD;
>>  static const char *const ld_suffixes[USE_LD_MAX] =
>> @@ -838,7 +839,8 @@ main (int argc, char **argv)
>>  "ld",
>>  PLUGIN_LD_SUFFIX,
>>  "ld.gold",
>> -  "ld.bfd"
>> +  "ld.bfd",
>> +  "ld.lld"
>>};
>>  static const char *const real_ld_suffix = "real-ld";
>>  static const char *const collect_ld_suffix = "collect-ld";
>> @@ -1004,6 +1006,8 @@ main (int argc, char **argv)
>>selected_linker = USE_BFD_LD;
>>  else if (strcmp (argv[i], "-fuse-ld=gold") == 0)
>>selected_linker = USE_GOLD_LD;
>> +  else if (strcmp (argv[i], "-fuse-ld=lld") == 0)
>> +selected_linker = USE_LLD_LD;
>> 
>> #ifdef COLLECT_EXPORT_LIST
>>  /* These flags are position independent, although their order
>> @@ -1093,7 +1097,8 @@ main (int argc, char **argv)
>>  /* Maybe we know the right file to use (if not cross).  */
>>  ld_file_name = 0;
>> #ifdef DEFAULT_LINKER
>> -  if (selected_linker == USE_BFD_LD || selected_linker == USE_GOLD_LD)
>> +  if (selected_linker == USE_BFD_LD || selected_linker == USE_GOLD_LD ||
>> +  selected_linker == USE_LLD_LD)
>>{
>>  char *linker_name;
>> # ifdef HOST_EXECUTABLE_SUFFIX
>> @@ -1307,7 +1312,7 @@ main (int argc, char **argv)
>>else if (!use_collect_ld
>> && strncmp (arg, "-fuse-ld=", 9) == 0)
>>  {
>> -  /* Do not pass -fuse-ld={bfd|gold} to the linker. */
>> +  /* Do not pass -fuse-ld={bfd|gold|lld} to the linker. */
>>ld1--;
>>ld2--;
>>  }
>> diff --git a/gcc/common.opt b/gcc/common.opt
>> index 5d90385..2a95a1f 100644
>> --- a/gcc/common.opt
>> +++ b/gcc/common.opt
>> @@ -2536,6 +2536,10 @@ fuse-ld=gold
>> Common Driver Negative(fuse-ld=bfd)
>> Use the gold linker instead of the default linker.
>> 
>> +fuse-ld=lld
>> +Common Driver Negative(fuse-ld=lld)
>> +Use the lld LLVM linker instead of the default linker.
>> +
>> fuse-linker-plugin
>> Common Undocumented Var(flag_use_linker_plugin)
>> 
>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index 2c87c53..4b8acff 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -10651,6 +10651,10 @@ Use the @command{bfd} linker instead of the default linker.
>> @opindex fuse-ld=gold
>> Use the @command{gold} linker instead of the default linker.
>> 
>> +@item -fuse-ld=lld
>> +@opindex fuse-ld=lld
>> +Use the LLVM @command{lld} linker instead of the default linker.
>> +
>> @cindex Libraries
>> @item -l@var{library}
>> @itemx -l @var{library}
>> diff --git a/gcc/opts.c b/gcc/opts.c
>> index 7406210..f2c86f7 100644
>> --- a/gcc/opts.c
>> +++ b/gcc/opts.c
>> @@ -2178,6 +2178,7 @@ common_handle_option (struct gcc_options *opts,
>> 

Simplify function types

2018-11-06 Thread Jan Hubicka
Hi,
this patch simplifies function types.  For GCC it cuts the number of type
duplicates in half (to about 500 duplicates).  I need to analyze the
remaining ones, but I think they are mostly caused by mixing up
complete/incomplete enums and arrays of pointers; fixing those should be
the last changes necessary to avoid duplicated ODR types in the GCC bootstrap.

We are now down from 650MB to 450MB of ltrans files since my last
report https://gcc.gnu.org/ml/gcc-patches/2018-10/msg02034.html

 phase opt and generate  :  39.37 ( 78%)   0.87 ( 13%)  40.27 ( 70%)      40 kB ( 29%)
 phase stream in         :  10.34 ( 20%)   0.40 (  6%)  10.74 ( 19%)  980729 kB ( 70%)
 ipa function summary    :   0.20 (  0%)   0.05 (  1%)   0.24 (  0%)   67974 kB (  5%)
 ipa cp                  :   0.85 (  2%)   0.05 (  1%)   0.93 (  2%)  126839 kB (  9%)
 ipa inlining heuristics :  30.71 ( 61%)   0.08 (  1%)  30.81 ( 53%)  119761 kB (  9%)
 lto stream inflate      :   2.41 (  5%)   0.20 (  3%)   2.51 (  4%)       0 kB (  0%)
 ipa lto gimple in       :   1.21 (  2%)   0.50 (  8%)   1.65 (  3%)  201610 kB ( 14%)
 ipa lto gimple out      :   0.04 (  0%)   0.03 (  0%)   0.07 (  0%)       0 kB (  0%)
 whopr partitioning      :   1.31 (  3%)   0.01 (  0%)   1.32 (  2%)    5338 kB (  0%)
 ipa icf                 :   2.93 (  6%)   0.07 (  1%)   3.04 (  5%)   12830 kB (  1%)
 TOTAL                   :  50.54          6.60         57.72        1393898 kB

Thus even relatively small improvements in type merging still translate
to large improvements in overall ltrans stream size, because types keep
duplicating everything else.

lto-bootstrapped/regtested x86_64-linux before some last minute change.
Re-testing, OK if it passes?

Honza

* tree.c (free_lang_data_in_type): Add fld parameter; simplify
return and parameter types of function and method types.
(free_lang_data_in_cgraph): Update.
Index: tree.c
===
--- tree.c  (revision 265848)
+++ tree.c  (working copy)
@@ -5261,7 +5261,7 @@ free_lang_data_in_binfo (tree binfo)
 /* Reset all language specific information still present in TYPE.  */
 
 static void
-free_lang_data_in_type (tree type)
+free_lang_data_in_type (tree type, struct free_lang_data_d *fld)
 {
   gcc_assert (TYPE_P (type));
 
@@ -5280,6 +5280,7 @@ free_lang_data_in_type (tree type)
 
   if (TREE_CODE (type) == FUNCTION_TYPE)
 {
+  TREE_TYPE (type) = fld_simplified_type (TREE_TYPE (type), fld);
   /* Remove the const and volatile qualifiers from arguments.  The
 C++ front end removes them, but the C front end does not,
 leading to false ODR violation errors when merging two
@@ -5287,6 +5288,7 @@ free_lang_data_in_type (tree type)
 different front ends.  */
   for (tree p = TYPE_ARG_TYPES (type); p; p = TREE_CHAIN (p))
{
+  TREE_VALUE (p) = fld_simplified_type (TREE_VALUE (p), fld);
  tree arg_type = TREE_VALUE (p);
 
  if (TYPE_READONLY (arg_type) || TYPE_VOLATILE (arg_type))
@@ -5295,16 +5297,22 @@ free_lang_data_in_type (tree type)
  & ~TYPE_QUAL_CONST
  & ~TYPE_QUAL_VOLATILE;
  TREE_VALUE (p) = build_qualified_type (arg_type, quals);
- free_lang_data_in_type (TREE_VALUE (p));
+ free_lang_data_in_type (TREE_VALUE (p), fld);
}
  /* C++ FE uses TREE_PURPOSE to store initial values.  */
  TREE_PURPOSE (p) = NULL;
}
 }
   else if (TREE_CODE (type) == METHOD_TYPE)
-for (tree p = TYPE_ARG_TYPES (type); p; p = TREE_CHAIN (p))
-  /* C++ FE uses TREE_PURPOSE to store initial values.  */
-  TREE_PURPOSE (p) = NULL;
+{
+  TREE_TYPE (type) = fld_simplified_type (TREE_TYPE (type), fld);
+  for (tree p = TYPE_ARG_TYPES (type); p; p = TREE_CHAIN (p))
+   {
+ /* C++ FE uses TREE_PURPOSE to store initial values.  */
+ TREE_VALUE (p) = fld_simplified_type (TREE_VALUE (p), fld);
+ TREE_PURPOSE (p) = NULL;
+   }
+}
   else if (RECORD_OR_UNION_TYPE_P (type))
 {
   /* Remove members that are not FIELD_DECLs from the field list
@@ -5985,7 +5994,7 @@ free_lang_data_in_cgraph (void)
 
   /* Traverse every type found freeing its language data.  */
   FOR_EACH_VEC_ELT (fld.types, i, t)
-free_lang_data_in_type (t);
+free_lang_data_in_type (t, &fld);
   if (flag_checking)
 {
   FOR_EACH_VEC_ELT (fld.types, i, t)


Avoid duplicate variants in free_lang_data

2018-11-06 Thread Jan Hubicka
Hi,
this patch fixes another case of bogus variants being created, which I
noticed with extra checking code.  fld_type_variant_equal_p compares type
names, which are modified by free_lang_data_in_type.  If we first introduce
a variant and then free lang data in the original type, the next time we
look for the same variant the lookup will fail.

This patch breaks out the logic into fld_simplified_type_name and
makes fld_type_variant_equal_p anticipate the cleanup which will be done
later.  Bootstrapped/regtested x86_64-linux, committed as obvious.
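
To illustrate the lookup failure described above, here is a minimal,
self-contained C++ sketch.  It is a toy model and not GCC code: ToyType,
simplified_name, cleanup, raw_equal and anticipating_equal are made-up names,
and the two booleans merely stand in for the TYPE_DECL versus identifier
distinction.  It only shows why comparing names as they will look after
cleanup keeps the lookup stable.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for a type node; this is NOT GCC's tree representation.
struct ToyType {
  std::string name;   // identifier text
  bool name_is_decl;  // true while the name is still in "declaration" form
  bool keeps_decl;    // true if cleanup leaves the declaration form alone
};

// What a name will look like once cleanup has run.
struct SimplifiedName {
  std::string text;
  bool is_decl;
  bool operator== (const SimplifiedName &o) const
  { return text == o.text && is_decl == o.is_decl; }
};

SimplifiedName simplified_name (const ToyType &t)
{
  return { t.name, t.name_is_decl && t.keeps_decl };
}

// The cleanup reuses the same helper, so the two cannot drift apart.
void cleanup (ToyType &t)
{
  t.name_is_decl = simplified_name (t).is_decl;
  // Cleanup is idempotent: simplifying again changes nothing.
  assert (simplified_name (t).is_decl == t.name_is_decl);
}

// Naive comparison: looks at the raw, possibly not-yet-cleaned name.
bool raw_equal (const ToyType &a, const ToyType &b)
{ return a.name == b.name && a.name_is_decl == b.name_is_decl; }

// Anticipating comparison: compares what both names will become.
bool anticipating_equal (const ToyType &a, const ToyType &b)
{ return simplified_name (a) == simplified_name (b); }
```

With a variant copied before cleanup runs on the original, raw_equal reports
a mismatch while anticipating_equal still matches, so a second lookup would
find the existing variant instead of creating a duplicate.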

Honza

* tree.c (fld_simplified_type_name): Break out from ...
(free_lang_data_in_type): ... here.
(fld_type_variant_equal_p): Use it.
Index: tree.c
===
--- tree.c  (revision 265845)
+++ tree.c  (working copy)
@@ -5083,6 +5083,21 @@ fld_worklist_push (tree t, struct free_l
 
 
 
+/* Return simplified TYPE_NAME of TYPE.  */
+
+static tree
+fld_simplified_type_name (tree type)
+{
+  if (!TYPE_NAME (type) || TREE_CODE (TYPE_NAME (type)) != TYPE_DECL)
+return TYPE_NAME (type);
+  /* Drop TYPE_DECLs in TYPE_NAME in favor of the identifier in the
+ TYPE_DECL if the type doesn't have linkage.
+ this must match fld_  */
+  if (type != TYPE_MAIN_VARIANT (type) || ! type_with_linkage_p (type))
+return DECL_NAME (TYPE_NAME (type));
+  return TYPE_NAME (type);
+}
+
 /* Do same comparsion as check_qualified_type skipping lang part of type
and be more permissive about type names: we only care that names are
same (for diagnostics) and that ODR names are the same.  */
@@ -5091,8 +5106,8 @@ static bool
 fld_type_variant_equal_p (tree t, tree v)
 {
   if (TYPE_QUALS (t) != TYPE_QUALS (v)
-  || TYPE_NAME (t) != TYPE_NAME (v)
   || TYPE_ALIGN (t) != TYPE_ALIGN (v)
+  || fld_simplified_type_name (t) != fld_simplified_type_name (v)
   || !attribute_list_equal (TYPE_ATTRIBUTES (t),
TYPE_ATTRIBUTES (v)))
 return false;
@@ -5338,12 +5353,11 @@ free_lang_data_in_type (tree type)
 }
 
   /* Drop TYPE_DECLs in TYPE_NAME in favor of the identifier in the
- TYPE_DECL if the type doesn't have linkage.  */
+ TYPE_DECL if the type doesn't have linkage.
+ this must match fld_  */
   if (type != TYPE_MAIN_VARIANT (type) || ! type_with_linkage_p (type))
-{
-  TYPE_NAME (type) = TYPE_IDENTIFIER (type);
-  TYPE_STUB_DECL (type) = NULL;
-}
+TYPE_STUB_DECL (type) = NULL;
+  TYPE_NAME (type) = fld_simplified_type_name (type);
 }
 
 


Re: Restore -fopt-info-vec optimized locations for SLP vectorization

2018-11-06 Thread Richard Biener
On Tue, 6 Nov 2018, David Malcolm wrote:

> On Tue, 2018-11-06 at 13:12 +0100, Richard Biener wrote:
> > The following patch pushes a DUMP_VECT_SCOPE down one level because
> > it otherwise hides a MSG_OPTIMIZED_LOCATION print.
> > 
> > David - was this an intended effect of the scoping?
> 
> No, an accident, sorry.  The scope depth thing controlling
> MSG_PRIORITY_{USER_FACING|INTERNALS} is something of a blunt hammer.
> 
> Thanks for fixing it.
> 
> Am I right in thinking that we don't yet have any test coverage of
> -fopt-info-vec-optimized for SLP?  (otherwise presumably my testing
> would have caught this)

Yes, looks like we only have a testcase for loop vectorization.  BB
vectorization also doesn't yet use the opt-problem thing it seems.

> (FWIW, I'm working on -fopt-info-inline; I hope to post patches for
> that in the next day or so)

Great!

Richard.

> Dave
> 
> > Applied to trunk.
> > 
> > Richard.
> > 
> > 2018-11-06  Richard Biener  
> > 
> > * tree-vect-slp.c (vect_slp_bb): Move opening of
> > vect_slp_analyze_bb
> > dump-scope ...
> > (vect_slp_analyze_bb_1): ... here to avoid hiding optimized
> > locations.
> > 
> > diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> > index e7e5d252c00..f802b004bef 100644
> > --- a/gcc/tree-vect-slp.c
> > +++ b/gcc/tree-vect-slp.c
> > @@ -2779,6 +2779,8 @@ vect_slp_analyze_bb_1 (gimple_stmt_iterator
> > region_begin,
> >vec datarefs, int n_stmts,
> >bool , vec_info_shared *shared)
> >  {
> > +  DUMP_VECT_SCOPE ("vect_slp_analyze_bb");
> > +
> >bb_vec_info bb_vinfo;
> >slp_instance instance;
> >int i;
> > @@ -2949,8 +2951,6 @@ vect_slp_bb (basic_block bb)
> >bool any_vectorized = false;
> >auto_vector_sizes vector_sizes;
> >  
> > -  DUMP_VECT_SCOPE ("vect_slp_analyze_bb");
> > -
> >/* Autodetect first vector size we try.  */
> >current_vector_size = 0;
> >targetm.vectorize.autovectorize_vector_sizes (&vector_sizes);
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


[PATCH v3] S/390: Allow relative addressing of literal pool entries

2018-11-06 Thread Ilya Leoshkevich
Bootstrapped and regtested on s390x-redhat-linux.

r265490 allowed the compiler to choose in a more flexible way whether to
use load or load-address-relative-long (LARL) instruction.  When it
chose LARL for literal pool references, the latter ones were rewritten
by pass_s390_early_mach to use UNSPEC_LTREF, which assumes base register
usage, which in turn is not compatible with LARL.  The end result was an
ICE because of unrecognizable insn.

UNSPEC_LTREF and friends are necessary in order to communicate the
dependency on the base register to pass_sched2.  When relative
addressing is used, no base register is necessary, so in such cases the
rewrite must be avoided.

gcc/ChangeLog:

2018-11-05  Ilya Leoshkevich  

PR target/87762
* config/s390/predicates.md (larl_operand): Use
s390_symbol_relative_long_p () to reduce code duplication.
* config/s390/s390-protos.h (s390_symbol_relative_long_p): New
function.
* config/s390/s390.c (s390_safe_relative_long_p): Likewise.
(annotate_constant_pool_refs): Skip insns which support relative
addressing.
(annotate_constant_pool_refs_1): New helper function.
(s390_symbol_relative_long_p): New function.
(find_constant_pool_ref): Return unannotated constant pool
references.
(replace_constant_pool_ref): Skip insns which support relative
addressing.
(replace_constant_pool_ref_1): New helper function.
(s390_mainpool_finish): Adjust to the new signature of
replace_constant_pool_ref ().
(s390_chunkify_finish): Likewise.
(pass_s390_early_mach::execute): Likewise.
(s390_prologue_plus_offset): Likewise.
(s390_emit_prologue): Likewise.
(s390_emit_epilogue): Likewise.
---
 gcc/config/s390/predicates.md |   8 +--
 gcc/config/s390/s390-protos.h |   1 +
 gcc/config/s390/s390.c| 107 ++
 3 files changed, 85 insertions(+), 31 deletions(-)

diff --git a/gcc/config/s390/predicates.md b/gcc/config/s390/predicates.md
index 97f717c558d..bb66c8a6bcb 100644
--- a/gcc/config/s390/predicates.md
+++ b/gcc/config/s390/predicates.md
@@ -151,9 +151,7 @@
   if (GET_CODE (op) == LABEL_REF)
 return true;
   if (SYMBOL_REF_P (op))
-return (!SYMBOL_FLAG_NOTALIGN2_P (op)
-   && SYMBOL_REF_TLS_MODEL (op) == 0
-   && s390_rel_address_ok_p (op));
+return s390_symbol_relative_long_p (op);
 
   /* Everything else must have a CONST, so strip it.  */
   if (GET_CODE (op) != CONST)
@@ -176,9 +174,7 @@
   if (GET_CODE (op) == LABEL_REF)
 return true;
   if (SYMBOL_REF_P (op))
-return (!SYMBOL_FLAG_NOTALIGN2_P (op)
-   && SYMBOL_REF_TLS_MODEL (op) == 0
-   && s390_rel_address_ok_p (op));
+return s390_symbol_relative_long_p (op);
 
 
   /* Now we must have a @GOTENT offset or @PLT stub
diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
index 96fa705f879..d3c2cc55e28 100644
--- a/gcc/config/s390/s390-protos.h
+++ b/gcc/config/s390/s390-protos.h
@@ -157,6 +157,7 @@ extern void s390_indirect_branch_via_thunk (unsigned int 
regno,
rtx comparison_operator,
enum s390_indirect_branch_type 
type);
 extern void s390_indirect_branch_via_inline_thunk (rtx execute_target);
+extern bool s390_symbol_relative_long_p (rtx);
 #endif /* RTX_CODE */
 
 /* s390-c.c routines */
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 1be85727b73..c1318c25004 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -2731,6 +2731,17 @@ s390_safe_attr_type (rtx_insn *insn)
 return TYPE_NONE;
 }
 
+/* Return attribute relative_long of insn.  */
+
+static bool
+s390_safe_relative_long_p (rtx_insn *insn)
+{
+  if (recog_memoized (insn) >= 0)
+return get_attr_relative_long (insn) == RELATIVE_LONG_YES;
+  else
+return false;
+}
+
 /* Return true if DISP is a valid short displacement.  */
 
 static bool
@@ -8116,11 +8127,8 @@ s390_first_cycle_multipass_dfa_lookahead (void)
   return 4;
 }
 
-/* Annotate every literal pool reference in X by an UNSPEC_LTREF expression.
-   Fix up MEMs as required.  */
-
 static void
-annotate_constant_pool_refs (rtx *x)
+annotate_constant_pool_refs_1 (rtx *x)
 {
   int i, j;
   const char *fmt;
@@ -8199,18 +8207,41 @@ annotate_constant_pool_refs (rtx *x)
 {
   if (fmt[i] == 'e')
{
- annotate_constant_pool_refs (&XEXP (*x, i));
+ annotate_constant_pool_refs_1 (&XEXP (*x, i));
}
   else if (fmt[i] == 'E')
{
  for (j = 0; j < XVECLEN (*x, i); j++)
-   annotate_constant_pool_refs (&XVECEXP (*x, i, j));
+   annotate_constant_pool_refs_1 (&XVECEXP (*x, i, j));
}
 }
 }
 
-/* Find an annotated literal pool symbol referenced in RTX X,
-   and store it at REF.  Will abort if X contains references to
+/* Annotate every literal pool reference in INSN by an 

[PATCH][RFC] Fix UBSAN in postreload-gcse.c (PR rtl-optimization/87868).

2018-11-06 Thread Martin Liška
Hi.

The patch adds an overflow check to eliminate_partially_redundant_load.
The question is whether the conditional compilation around
__builtin_mul_overflow is fine?
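
For reference, a minimal standalone sketch of the saturating multiply under
discussion.  This is not the patch itself: saturating_threshold and
saturation_cap are hypothetical names, the cap value simply mirrors the
max_count definition shown in the diff below, and the portable branch (which
assumes non-negative operands) is only one possible alternative to the
conditional compilation.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical cap, mirroring ((uint64_t) 1 << 61) - 2 from the diff;
// the name saturation_cap is made up for this sketch.
static const int64_t saturation_cap = (int64_t (1) << 61) - 2;

// Multiply FRACTION by COUNT; on signed overflow return the cap instead
// of invoking undefined behaviour.
int64_t saturating_threshold (int64_t fraction, int64_t count)
{
#if defined (__GNUC__) && (__GNUC__ >= 5)
  int64_t result;
  if (__builtin_mul_overflow (fraction, count, &result))
    return saturation_cap;
  return result;
#else
  // Portable fallback for non-negative operands: check before multiplying.
  assert (fraction >= 0 && count >= 0);
  if (fraction != 0
      && count > std::numeric_limits<int64_t>::max () / fraction)
    return saturation_cap;
  return fraction * count;
#endif
}
```

Both branches return the same value for non-negative inputs, so a pre-check
of this kind would avoid depending on the builtin at all, at the cost of one
division.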

Thanks,
Martin

gcc/ChangeLog:

2018-11-06  Martin Liska  

PR rtl-optimization/87868
* postreload-gcse.c (eliminate_partially_redundant_load): Set
threshold to max_count if we would overflow.
* profile-count.h: Make max_count a public constant.
---
 gcc/postreload-gcse.c | 14 --
 gcc/profile-count.h   |  2 +-
 2 files changed, 13 insertions(+), 3 deletions(-)


diff --git a/gcc/postreload-gcse.c b/gcc/postreload-gcse.c
index b56993183d0..399970c368a 100644
--- a/gcc/postreload-gcse.c
+++ b/gcc/postreload-gcse.c
@@ -1170,8 +1170,18 @@ eliminate_partially_redundant_load (basic_block bb, rtx_insn *insn,
   if (ok_count.to_gcov_type ()
   < GCSE_AFTER_RELOAD_PARTIAL_FRACTION * not_ok_count.to_gcov_type ())
 goto cleanup;
-  if (ok_count.to_gcov_type ()
-  < GCSE_AFTER_RELOAD_CRITICAL_FRACTION * critical_count.to_gcov_type ())
+
+  gcov_type threshold;
+#if (GCC_VERSION >= 5000)
+  if (__builtin_mul_overflow (GCSE_AFTER_RELOAD_CRITICAL_FRACTION,
+			  critical_count.to_gcov_type (), &threshold))
+threshold = profile_count::max_count;
+#else
+  threshold
+= GCSE_AFTER_RELOAD_CRITICAL_FRACTION * critical_count.to_gcov_type ();
+#endif
+
+  if (ok_count.to_gcov_type () < threshold)
 goto cleanup;
 
   /* Generate moves to the loaded register from where
diff --git a/gcc/profile-count.h b/gcc/profile-count.h
index f4d0c340a0a..5d3bcc75f6d 100644
--- a/gcc/profile-count.h
+++ b/gcc/profile-count.h
@@ -641,8 +641,8 @@ public:
  type to hold various extra stages.  */
 
   static const int n_bits = 61;
-private:
   static const uint64_t max_count = ((uint64_t) 1 << n_bits) - 2;
+private:
   static const uint64_t uninitialized_count = ((uint64_t) 1 << n_bits) - 1;
 
   uint64_t m_val : n_bits;



Re: Restore -fopt-info-vec optimized locations for SLP vectorization

2018-11-06 Thread David Malcolm
On Tue, 2018-11-06 at 13:12 +0100, Richard Biener wrote:
> The following patch pushes a DUMP_VECT_SCOPE down one level because
> it otherwise hides a MSG_OPTIMIZED_LOCATION print.
> 
> David - was this an intended effect of the scoping?

No, an accident, sorry.  The scope depth thing controlling
MSG_PRIORITY_{USER_FACING|INTERNALS} is something of a blunt hammer.
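
A toy model of that effect, for readers not familiar with the dump scopes.
This is a sketch under simplifying assumptions and not the dumpfile
implementation: scope_depth, dump_scope, note_optimized and user_log are
invented names, and the only rule modelled is "depth 0 means user-facing,
anything nested means internals".

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class priority { user_facing, internals };

static int scope_depth = 0;                // hypothetical nesting counter
static std::vector<std::string> user_log;  // what a user would get to see

// RAII guard: every message emitted while it is alive is one level deeper.
struct dump_scope {
  dump_scope () { ++scope_depth; }
  ~dump_scope () { --scope_depth; assert (scope_depth >= 0); }
};

// Messages at depth 0 are user-facing; anything nested is "internals".
priority current_priority ()
{ return scope_depth == 0 ? priority::user_facing : priority::internals; }

// Record MSG only if it would pass a user-facing-only filter.
void note_optimized (const std::string &msg)
{
  if (current_priority () == priority::user_facing)
    user_log.push_back (msg);
}

// Scope opened in the caller: the note is swallowed.
void caller_opens_scope ()
{
  dump_scope s;
  note_optimized ("hidden");
}

// Scope opened only inside the analysis helper: the note survives.
void analysis_helper () { dump_scope s; /* internal chatter goes here */ }

void callee_opens_scope ()
{
  analysis_helper ();
  note_optimized ("visible");
}
```

In this model, moving the scope from the caller into the helper is what lets
the later note reach the user-facing log, which is the shape of the fix
quoted below.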

Thanks for fixing it.

Am I right in thinking that we don't yet have any test coverage of
-fopt-info-vec-optimized for SLP?  (otherwise presumably my testing
would have caught this)

(FWIW, I'm working on -fopt-info-inline; I hope to post patches for
that in the next day or so)

Dave

> Applied to trunk.
> 
> Richard.
> 
> 2018-11-06  Richard Biener  
> 
>   * tree-vect-slp.c (vect_slp_bb): Move opening of
> vect_slp_analyze_bb
>   dump-scope ...
>   (vect_slp_analyze_bb_1): ... here to avoid hiding optimized
> locations.
> 
> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> index e7e5d252c00..f802b004bef 100644
> --- a/gcc/tree-vect-slp.c
> +++ b/gcc/tree-vect-slp.c
> @@ -2779,6 +2779,8 @@ vect_slp_analyze_bb_1 (gimple_stmt_iterator
> region_begin,
>  vec datarefs, int n_stmts,
>  bool , vec_info_shared *shared)
>  {
> +  DUMP_VECT_SCOPE ("vect_slp_analyze_bb");
> +
>bb_vec_info bb_vinfo;
>slp_instance instance;
>int i;
> @@ -2949,8 +2951,6 @@ vect_slp_bb (basic_block bb)
>bool any_vectorized = false;
>auto_vector_sizes vector_sizes;
>  
> -  DUMP_VECT_SCOPE ("vect_slp_analyze_bb");
> -
>/* Autodetect first vector size we try.  */
>current_vector_size = 0;
>targetm.vectorize.autovectorize_vector_sizes (&vector_sizes);


[PATCH] Improve -fprofile-report.

2018-11-06 Thread Martin Liška
Hi.

The patch is based on what was discussed on IRC and in the PR.
Apart from that, the report layout is improved.

Patch survives regression tests on x86_64-linux-gnu.

Ready for trunk?
Martin

gcc/ChangeLog:

2018-11-06  Martin Liska  

PR tree-optimization/87885
* cfghooks.c (account_profile_record): Rename
to ...
(profile_record_check_consistency): ... this.
Calculate missing num_mismatched_freq_in.
(profile_record_account_profile): New function
that calculates time and size of a function.
* cfghooks.h (struct profile_record): Remove
all tuples.
(struct cfg_hooks): Remove after_pass flag.
(account_profile_record): Rename to ...
(profile_record_check_consistency): ... this.
(profile_record_account_profile): New.
* cfgrtl.c (rtl_account_profile_record): Remove
after_pass flag.
* passes.c (check_profile_consistency): Do only
checking.
(account_profile): Calculate size and time of
function only.
(pass_manager::dump_profile_report): Reformat
output.
(execute_one_ipa_transform_pass): Call
consistency check before clean up and call account_profile
after a clean up is done.
(execute_one_pass): Call check_profile_consistency and
account_profile instead of using after_pass flag.
* tree-cfg.c (gimple_account_profile_record): Likewise.
---
 gcc/cfghooks.c |  38 +++--
 gcc/cfghooks.h |  17 ++--
 gcc/cfgrtl.c   |  12 ++-
 gcc/passes.c   | 207 ++---
 gcc/tree-cfg.c |  11 ++-
 5 files changed, 161 insertions(+), 124 deletions(-)


diff --git a/gcc/cfghooks.c b/gcc/cfghooks.c
index ea106e0cb32..824ab25929a 100644
--- a/gcc/cfghooks.c
+++ b/gcc/cfghooks.c
@@ -1425,11 +1425,10 @@ split_block_before_cond_jump (basic_block bb)
 
 /* Work-horse for passes.c:check_profile_consistency.
Do book-keeping of the CFG for the profile consistency checker.
-   If AFTER_PASS is 0, do pre-pass accounting, or if AFTER_PASS is 1
-   then do post-pass accounting.  Store the counting in RECORD.  */
+   Store the counting in RECORD.  */
 
 void
-account_profile_record (struct profile_record *record, int after_pass)
+profile_record_check_consistency (profile_record *record)
 {
   basic_block bb;
   edge_iterator ei;
@@ -1445,26 +1444,49 @@ account_profile_record (struct profile_record *record, int after_pass)
 	sum += e->probability;
 	  if (EDGE_COUNT (bb->succs)
 	  && sum.differs_from_p (profile_probability::always ()))
-	record->num_mismatched_freq_out[after_pass]++;
+	record->num_mismatched_freq_out++;
 	  profile_count lsum = profile_count::zero ();
 	  FOR_EACH_EDGE (e, ei, bb->succs)
 	lsum += e->count ();
 	  if (EDGE_COUNT (bb->succs) && (lsum.differs_from_p (bb->count)))
-	record->num_mismatched_count_out[after_pass]++;
+	record->num_mismatched_count_out++;
 	}
   if (bb != ENTRY_BLOCK_PTR_FOR_FN (cfun)
 	  && profile_status_for_fn (cfun) != PROFILE_ABSENT)
 	{
+	  profile_probability sum = profile_probability::never ();
 	  profile_count lsum = profile_count::zero ();
 	  FOR_EACH_EDGE (e, ei, bb->preds)
-	lsum += e->count ();
+	{
+	  sum += e->probability;
+	  lsum += e->count ();
+	}
+	  if (EDGE_COUNT (bb->preds)
+	  && sum.differs_from_p (profile_probability::always ()))
+	record->num_mismatched_freq_in++;
 	  if (lsum.differs_from_p (bb->count))
-	record->num_mismatched_count_in[after_pass]++;
+	record->num_mismatched_count_in++;
 	}
   if (bb == ENTRY_BLOCK_PTR_FOR_FN (cfun)
 	  || bb == EXIT_BLOCK_PTR_FOR_FN (cfun))
 	continue;
   gcc_assert (cfg_hooks->account_profile_record);
-  cfg_hooks->account_profile_record (bb, after_pass, record);
+  cfg_hooks->account_profile_record (bb, record);
+   }
+}
+
+/* Work-horse for passes.c:acount_profile.
+   Do book-keeping of the CFG for the profile accounting.
+   Store the counting in RECORD.  */
+
+void
+profile_record_account_profile (profile_record *record)
+{
+  basic_block bb;
+
+  FOR_ALL_BB_FN (bb, cfun)
+   {
+  gcc_assert (cfg_hooks->account_profile_record);
+  cfg_hooks->account_profile_record (bb, record);
}
 }
diff --git a/gcc/cfghooks.h b/gcc/cfghooks.h
index b5981da4a05..d1d2e70c3d4 100644
--- a/gcc/cfghooks.h
+++ b/gcc/cfghooks.h
@@ -38,18 +38,18 @@ struct profile_record
 {
   /* The number of basic blocks where sum(freq) of the block's predecessors
  doesn't match reasonably well with the incoming frequency.  */
-  int num_mismatched_freq_in[2];
+  int num_mismatched_freq_in;
   /* Likewise for a basic block's successors.  */
-  int num_mismatched_freq_out[2];
+  int num_mismatched_freq_out;
   /* The number of basic blocks where sum(count) of the block's predecessors
  doesn't match reasonably well with the incoming frequency.  */
-  int num_mismatched_count_in[2];
+  int num_mismatched_count_in;

Fix bug in fld_type_variant

2018-11-06 Thread Jan Hubicka
Hi,
in fld_type_variant I lost the code copying alignment.  This patch restores
it and also checks that the newly constructed variant is indeed compatible.

Bootstrapped/regtested x86_64-linux, committed as obvious.
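
The copy-then-verify pattern can be sketched in isolation as follows.  This
is a toy illustration with invented names (ToyVariant, variant_equal_p,
make_variant), not the tree.c code; it only shows how asserting the equality
predicate right after construction catches a forgotten field.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for the properties a variant must share with its origin.
struct ToyVariant {
  unsigned quals;
  unsigned align;
  std::string name;
};

// The predicate later used to look variants up.
bool variant_equal_p (const ToyVariant &a, const ToyVariant &b)
{ return a.quals == b.quals && a.align == b.align && a.name == b.name; }

// Build a fresh variant of T field by field, then check the result
// against the same predicate the lookup will use.
ToyVariant make_variant (const ToyVariant &t)
{
  ToyVariant v{};
  v.quals = t.quals;
  v.name = t.name;
  v.align = t.align;  // dropping this line makes the assert below fire
  assert (variant_equal_p (t, v));
  return v;
}
```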

Honza
* tree.c (fld_type_variant): Also copy alignment; be sure that
new variant is equal.
Index: tree.c
===
--- tree.c  (revision 265841)
+++ tree.c  (working copy)
@@ -5119,6 +5119,8 @@ fld_type_variant (tree first, tree t, st
   TYPE_NAME (v) = TYPE_NAME (t);
   TYPE_ATTRIBUTES (v) = TYPE_ATTRIBUTES (t);
   TYPE_CANONICAL (v) = TYPE_CANONICAL (t);
+  SET_TYPE_ALIGN (v, TYPE_ALIGN (t));
+  gcc_checking_assert (fld_type_variant_equal_p (t,v));
   add_tree_to_fld_list (v, fld);
   return v;
 }


Re: [PATCH] S/390: Introduce relative_long attribute

2018-11-06 Thread Andreas Krebbel
On 05.11.18 15:18, Ilya Leoshkevich wrote:
> In order to properly fix PR87762, we need to distinguish between
> instructions which support relative addressing and instructions which
> don't.  We could check whether the existing "type" attribute is equal to
> "larl", but there are notable exceptions (lrl, for example), and
> changing them makes scheduling worse on z10.  We could also check
> whether the existing "op_type" attribute is equal to "RIL-b" or "RIL-c".
> However, adding a new attribute provides more flexibility, since we
> don't depend on idiosyncrasies which might be introduced into PoP in the
> future.
> 
> gcc/ChangeLog:
> 
> 2018-11-05  Ilya Leoshkevich  
> 
>   PR target/87762
>   * config/s390/s390.md: Add relative_long attribute.

Ok. Thanks!

Andreas



Re: [PATCH] S/390: Accept cdb in load-and-test-fp-1 testcase

2018-11-06 Thread Andreas Krebbel
On 06.11.18 13:12, Ilya Leoshkevich wrote:
> The compiler now generates cdb instead of cdbr for comparison with 0.0,
> which looks like an improvement to me.
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-11-06  Ilya Leoshkevich  
> 
>   * gcc.target/s390/load-and-test-fp-1.c: Accept cdb.
> ---
>  gcc/testsuite/gcc.target/s390/load-and-test-fp-1.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.target/s390/load-and-test-fp-1.c 
> b/gcc/testsuite/gcc.target/s390/load-and-test-fp-1.c
> index b9d59122242..2a7e88c0f1b 100644
> --- a/gcc/testsuite/gcc.target/s390/load-and-test-fp-1.c
> +++ b/gcc/testsuite/gcc.target/s390/load-and-test-fp-1.c
> @@ -14,4 +14,4 @@ foo (double dummy, double a)
>return a;
>  }
>  
> -/* { dg-final { scan-assembler "cdbr\t" } } */
> +/* { dg-final { scan-assembler {\tcdbr?\t} } } */
> 

Ok. Thanks!

Andreas



[PATCH] Fix PR86850 (hopefully)

2018-11-06 Thread Richard Biener


The following makes vec::splice of an empty vector into an empty
vector safe, without falling into the trap of invoking a method
on a NULL m_vec.
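
A standalone sketch of the hazard, assuming a simplified handle type.
toy_vec below is invented for illustration and is not vec.h: it only models
a handle whose storage pointer may be null, and a source that owns storage
but holds zero elements.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy handle: an empty handle may or may not own storage.
struct toy_vec {
  std::unique_ptr<std::vector<int>> m_vec;

  std::size_t length () const { return m_vec ? m_vec->size () : 0; }

  void reserve_storage ()
  {
    if (!m_vec)
      m_vec.reset (new std::vector<int> ());
  }

  void push (int x) { reserve_storage (); m_vec->push_back (x); }

  // Append SRC's elements.  Testing src.length () rather than src.m_vec
  // means a source with storage but no elements never touches our own,
  // possibly null, m_vec.  SRC must be a different object.
  void splice (const toy_vec &src)
  {
    if (src.length ())
      {
        assert (m_vec && "caller must have reserved space");
        m_vec->insert (m_vec->end (),
                       src.m_vec->begin (), src.m_vec->end ());
      }
  }
};
```

With the pointer test instead of the length test, splicing an allocated but
empty source into a handle without storage would dereference a null pointer.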

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Richard.

2018-11-06  Richard Biener  

PR tree-optimization/86850
* vec.h (vec::splice): Check src.length ()
instead of src.m_vec.

diff --git a/gcc/vec.h b/gcc/vec.h
index f8c039754d2..407269c5ad3 100644
--- a/gcc/vec.h
+++ b/gcc/vec.h
@@ -1688,7 +1688,7 @@ template
 inline void
 vec::splice (const vec )
 {
-  if (src.m_vec)
+  if (src.length ())
 m_vec->splice (*(src.m_vec));
 }
 


Re: Notes on -mloongson-ext2

2018-11-06 Thread Paul Hua
Hi, Matthew:

Thanks for your review and suggestion.

> >}   \
> >\
> >+   \
>
> Remove excess new line.

Done.

> Also you have the EXT2 condition nested inside a check for EXT but you do
> not have any logic to ensure that use of EXT2 automatically enables EXT.
> Also you need logic to say that if a user explicitly says -mloongson-ext2
> -mno-loongson-ext then that is an error.

Done.
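
The requested option logic can be sketched roughly as below.  This is only an
illustration of the rule from the review (EXT2 implies EXT, and an explicit
EXT2 on with EXT off is an error); opt, resolved and resolve_loongson are
hypothetical names and do not correspond to the actual mips.c code.

```cpp
#include <cassert>

// Tri-state for a command-line switch; names are made up for this sketch.
enum class opt { unset, on, off };

struct resolved {
  bool ok;    // false if the combination must be rejected
  bool ext;
  bool ext2;
};

resolved resolve_loongson (opt ext, opt ext2)
{
  // -mloongson-ext2 together with -mno-loongson-ext is contradictory.
  if (ext2 == opt::on && ext == opt::off)
    return { false, false, false };

  bool ext2_on = (ext2 == opt::on);
  // EXT2 switches EXT on unless the user already decided otherwise
  // (the contradictory case was rejected above).
  bool ext_on = (ext == opt::on) || ext2_on;
  assert (!ext2_on || ext_on);
  return { true, ext_on, ext2_on };
}
```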

> >   /* Historical Octeon macro.  */  \
> >   if (TARGET_OCTEON)   \
> >builtin_define ("__OCTEON__");  \
>
> >diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
> >index 4b7a627b7a6..c8128d4d530 100644
> >--- a/gcc/config/mips/mips.md
> >+++ b/gcc/config/mips/mips.md
> >@@ -335,6 +335,7 @@
> > ;; slt set less than instructions
> > ;; signext  sign extend instructions
> > ;; clz the clz and clo instructions
> >+;; ctz the ctz and cto instructions
>
> There is no need to add a new type here. These are for distinguishing
> between
> scheduling rules and I see no distinction being made in the schedulers. Just
> mark the new instructions as clz type.
> > ;; pop the pop instruction
> > ;; traptrap if instructions
> > ;; imulinteger multiply 2 operands
> >@@ -375,7 +376,7 @@
> > (define_attr "type"
> >
> "unknown,branch,jump,call,load,fpload,fpidxload,store,fpstore,fpidxstore,
> >
> prefetch,prefetchx,condmove,mtc,mfc,mthi,mtlo,mfhi,mflo,const,arith,logical,
> >-
> shift,slt,signext,clz,pop,trap,imul,imul3,imul3nc,imadd,idiv,idiv3,move,
> >+
> shift,slt,signext,clz,ctz,pop,trap,imul,imul3,imul3nc,imadd,idiv,idiv3,move,
>
> As above no need for this.

Done.

>
> >@@ -7136,13 +7154,16 @@
> > (match_operand 2 "const_int_operand" "n"))]
> >   "ISA_HAS_PREFETCH && TARGET_EXPLICIT_RELOCS"
> > {
> >-  if (TARGET_LOONGSON_2EF || TARGET_LOONGSON_EXT)
> >+  if (TARGET_LOONGSON_2EF || TARGET_LOONGSON_EXT || TARGET_LOONGSON_EXT2)
>
> This does not look correct. Is operand 1 really in the format you want for
> EXT2? Does it not need the appropriate MIPS prefetch value calculating like
> below for the non-loongson cores?

I made a mistake.  Loongson only implements pref hint=0 and hint=1, so
the other cases are just mapped to load or store.  The patch is updated.

> I suggest changing this to:
>
> if ((TARGET_LOONGSON_2EF || TARGET_LOONGSON_EXT)
> && !TARGET_LOONGSON_EXT2)
>
> And let the standard non-loongson code take care of the rest.
>
> > {
> >-  /* Loongson 2[ef] and Loongson 3a use load to $0 for prefetching.
> */
> >+  /* Loongson ext2 implementation pref insnstructions.  */
> >+  if (TARGET_LOONGSON_EXT2)
> >+   return "pref\t%1, %a0";
> >+  /* Loongson 2[ef] and Loongson ext use load to $0 for prefetching.
> */
> >   if (TARGET_64BIT)
> >-return "ld\t$0,%a0";
> >+   return "ld\t$0,%a0";
> >   else
> >-return "lw\t$0,%a0";
> >+   return "lw\t$0,%a0";
> > }
> >   operands[1] = mips_prefetch_cookie (operands[1], operands[2]);
> >   return "pref\t%1,%a0";
>
> This one needs resubmitting to check the updated logic.
> Thanks,
> Matthew
>

Thanks again.

Paul Hua
From 3bedc3c580e1cf570b5ad0717ffac985a84fbc40 Mon Sep 17 00:00:00 2001
From: Chenghua Xu 
Date: Fri, 31 Aug 2018 11:55:48 +0800
Subject: [PATCH] Add support for Loongson EXT2 instructions.

gcc/
	* config/mips/mips-protos.h
	(mips_loongson_ext2_prefetch_cookie): New prototype.
	* config/mips/mips.c (mips_loongson_ext2_prefetch_cookie): New.
	(mips_option_override): Enable TARGET_LOONGSON_EXT when
	TARGET_LOONGSON_EXT2 is true.
	* config/mips/mips.h (TARGET_CPU_CPP_BUILTINS): Define
	__mips_loongson_ext2, __mips_loongson_ext_rev=2.
	(ISA_HAS_CTZ_CTO): New, true if TARGET_LOONGSON_EXT2.
	(ISA_HAS_PREFETCH): Include TARGET_LOONGSON_EXT and
	TARGET_LOONGSON_EXT2.
	(ASM_SPEC): Add mloongson-ext2 and mno-loongson-ext2.
	(define_insn "ctz2"): New insn pattern.
	(define_insn "prefetch"): Include TARGET_LOONGSON_EXT2.
	(define_insn "prefetch_indexed_"): Include
	TARGET_LOONGSON_EXT and TARGET_LOONGSON_EXT2.
	* config/mips/mips.opt (-mloongson-ext2): Add option.
	* gcc/doc/invoke.texi (-mloongson-ext2): Document.

gcc/testsuite/
	* gcc.target/mips/loongson-ctz.c: New test.
	* gcc.target/mips/loongson-dctz.c: Likewise.
	* gcc.target/mips/mips.exp (mips_option_groups): Add
	-mloongson-ext2 option.
---
 gcc/config/mips/mips-protos.h |1 +
 gcc/config/mips/mips.c|   28 +++
 gcc/config/mips/mips.h|   15 +++-
 gcc/config/mips/mips.md   |   47 ++--
 gcc/config/mips/mips.opt  |4 ++
 

[PATCH, OpenACC] Enable 0-length array data mappings for implicit data clauses

2018-11-06 Thread Chung-Lin Tang

Hi Thomas, this patch allows the gimplifier to create 0-length array mappings
for certain pointer- and reference-typed variables.  Without this, array usage
of certain pointer variables does not work as usually intended.  This is the
original description by Cesar from when it was applied to og7:
https://gcc.gnu.org/ml/gcc-patches/2017-10/msg00673.html
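
As a rough, standalone illustration of the decision order used for the
parallel construct in the hunk below (a sketch, not the gimplify.c code): the
flag constants and type_kind are hypothetical stand-ins for the GOVD_* bits
and the tree type checks, and cases outside the quoted hunk are simply left
as 0.

```cpp
#include <cassert>

// Hypothetical flag bits standing in for GOVD_MAP, GOVD_MAP_0LEN_ARRAY
// and GOVD_FIRSTPRIVATE; the values are arbitrary.
enum : unsigned {
  MAP = 1u << 0,
  MAP_0LEN_ARRAY = 1u << 1,
  FIRSTPRIVATE = 1u << 2
};

// Hypothetical classification of the variable's type.
enum class type_kind { scalar, pointer, ref_to_pointer, ref_to_other };

unsigned implicit_parallel_flags (type_kind k, bool is_fortran,
                                  bool is_private,
                                  bool on_device_or_declared)
{
  unsigned flags = 0;
  if (k == type_kind::ref_to_pointer)
    flags |= MAP | MAP_0LEN_ARRAY;
  else if (!is_fortran && k == type_kind::pointer)
    flags |= MAP | MAP_0LEN_ARRAY;
  else if (is_private)
    flags |= FIRSTPRIVATE;
  else if (on_device_or_declared)
    flags |= MAP;
  // Remaining cases fall outside the quoted hunk and are left as 0 here.

  // A 0-length array mapping is always a mapping.
  assert (!(flags & MAP_0LEN_ARRAY) || (flags & MAP));
  return flags;
}
```

Note that the pointer checks come before the is_private test, so a private
pointer in C or C++ gets a 0-length array mapping rather than firstprivate,
while a Fortran pointer keeps the previous behaviour.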

Note that this patch requires the following one in order to apply cleanly
(still awaiting approval):
https://gcc.gnu.org/ml/gcc-patches/2018-06/msg01911.html

Thomas, since this only touches OpenACC, I suppose you have the power to
approve.

Thanks,
Chung-Lin

2018-11-06  Cesar Philippidis  

gcc/
* gimplify.c (oacc_default_clause): Create implicit 0-length
array data clauses for pointers and reference types.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/fp-dyn-arrays.c: New test.
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 9e2b0aa..58ef3de 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -7081,7 +7081,12 @@ oacc_default_clause (struct gimplify_omp_ctx *ctx, tree 
decl, unsigned flags)
 case ORT_ACC_PARALLEL:
   rkind = "parallel";
 
-  if (is_private)
+  if (TREE_CODE (type) == REFERENCE_TYPE
+ && TREE_CODE (TREE_TYPE (type)) == POINTER_TYPE)
+   flags |= GOVD_MAP | GOVD_MAP_0LEN_ARRAY;
+  else if (!lang_GNU_Fortran () && TREE_CODE (type) == POINTER_TYPE)
+   flags |= GOVD_MAP | GOVD_MAP_0LEN_ARRAY;
+  else if (is_private)
flags |= GOVD_FIRSTPRIVATE;
   else if (on_device || declared)
flags |= GOVD_MAP;
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/fp-dyn-arrays.c 
b/libgomp/testsuite/libgomp.oacc-c-c++-common/fp-dyn-arrays.c
new file mode 100644
index 000..c57261f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/fp-dyn-arrays.c
@@ -0,0 +1,42 @@
+/* Expect dynamic arrays to be mapped on the accelerator via
+   GOMP_MAP_0LEN_ARRAY.  */
+
+#include 
+#include 
+#include 
+
+int
+main ()
+{
+  const int n = 1000;
+  int *a, *b, *c, *d, i;
+
+  d = (int *) 12345;
+  a = (int *) malloc (sizeof (int) * n);
+  b = (int *) malloc (sizeof (int) * n);
+  c = (int *) malloc (sizeof (int) * n);
+
+  for (i = 0; i < n; i++)
+{
+  a[i] = -1;
+  b[i] = i+1;
+  c[i] = 2*(i+1);
+}
+
+#pragma acc enter data create(a[0:n]) copyin(b[:n], c[:n])
+#pragma acc parallel loop
+  for (i = 0; i < n; ++i)
+{
+  a[i] = b[i] + c[i] + *((int *));
+}
+#pragma acc exit data copyout(a[0:n]) delete(b[0:n], c[0:n])
+
+  for (i = 0; i < n; i++)
+assert (a[i] == 3*(i+1) + 12345);
+
+  free (a);
+  free (b);
+  free (c);
+
+  return 0;
+}


Re: [PATCH 2/2 v3][IRA,LRA] Fix PR86939, IRA incorrectly creates an interference between a pseudo register and a hard register

2018-11-06 Thread Renlin Li

Hi Ramana,

On 11/06/2018 10:57 AM, Ramana Radhakrishnan wrote:

On Tue, Nov 6, 2018 at 10:52 AM Renlin Li  wrote:


Hi Jeff & Peter,

On 11/05/2018 07:41 PM, Jeff Law wrote:

On 11/5/18 12:36 PM, Peter Bergner wrote:

On 11/5/18 1:20 PM, Jeff Law wrote:

On 11/1/18 4:07 PM, Peter Bergner wrote:

On 11/1/18 1:50 PM, Renlin Li wrote:

Is there any update on this issue?
arm-none-linux-gnueabihf native toolchain has been mis-compiled for a while.


  From the analysis I've done, my commit is just exposing latent issues
in LRA.  Can you try the patch I submitted here to see if it helps?

https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01757.html

It survives on powerpc64le-linux, x86_64-linux and s390x-linux.
Jeff threw it on his testers and said he saw an arm issue and was
trying to come up with a test case for me to debug.

So I don't think the ARM issues are related to your patch; they may have
been related to the combiner changes that went in around the same time.

Yes, there are issues related to the combiner changes.


But didn't the combiner changes come *after* these patches?  So IIUC,
Renlin has been trying to get these fixed *without* the combine
patches but just with your patch applied on top of the revision where
the problem started showing up.

Can you confirm that, Renlin?


I just did a bootstrap again with everything up to r264897, which is from
Oct 6.  It produces the ICE I mentioned in PR87899.

The first combiner patch went in on Oct 22.

Regards,
Renlin




Ramana


But the IRA/LRA change does cause the arm-none-linux-gnueabihf native
bootstrap toolchain to be mis-compiled.
And the new patch does not seem to fix this problem.

I am trying to extract a test case, but it is a little bit hard as the
toolchain itself is mis-compiled.
And it ICEs when compiling a test case with it.

I created a bugzilla ticket for this, PR87899.

./gcc/cc1 ~/gcc/./gcc/testsuite/gcc.c-torture/execute/pr36034-1.c  -O3
   test main
Analyzing compilation unit
Performing interprocedural optimizations
   <*free_lang_data> 
 Streaming LTO
   
  
Assembling functions:
testduring GIMPLE pass: ldist

gcc/./gcc/testsuite/gcc.c-torture/execute/pr36034-1.c: In function ‘test’:
gcc/./gcc/testsuite/gcc.c-torture/execute/pr36034-1.c:9:1: internal compiler 
error: Segmentation fault
  9 | test (void)
| ^~~~
0x5c3a37 crash_signal
 ../../gcc/gcc/toplev.c:325
0x63ef6b inchash::hash::add(void const*, unsigned int)
 ../../gcc/gcc/inchash.h:100
0x63ef6b inchash::hash::add_ptr(void const*)
 ../../gcc/gcc/inchash.h:94
0x63ef6b ddr_hasher::hash(data_dependence_relation const*)
 ../../gcc/gcc/tree-loop-distribution.c:143
0x63ef6b hash_table::find_slot(data_dependence_relation* 
const&, insert_option)
 ../../gcc/gcc/hash-table.h:414
0x63ef6b get_data_dependence
 ../../gcc/gcc/tree-loop-distribution.c:1184
0x63f2bd data_dep_in_cycle_p
 ../../gcc/gcc/tree-loop-distribution.c:1210
0x63f2bd update_type_for_merge
 ../../gcc/gcc/tree-loop-distribution.c:1255
0x64064b build_rdg_partition_for_vertex
 ../../gcc/gcc/tree-loop-distribution.c:1302
0x64064b rdg_build_partitions
 ../../gcc/gcc/tree-loop-distribution.c:1754
0x64064b distribute_loop
 ../../gcc/gcc/tree-loop-distribution.c:2795
0x642299 execute
 ../../gcc/gcc/tree-loop-distribution.c:3133
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.



Regards
Renlin





At this point your patch appears to be DTRT across the board.  The only
fallout is the bogus s390 asm it caught in the kernel.


Cool.  I will note that I contacted the s390 kernel guys and gave them a
fix to their broken constraints in that asm and they are going to fix it.

Sounds good.  I've got a hack in my tester to "fix" that bogus asm until
the kernel folks do it right.




Is the above an approval to commit the patch mentioned above or do you
still want to wait until the ARM issues are fully resolved?

I think knowing the patch addresses all the known issues related to the
earlier IRA/LRA change unblocks the review step.  I don't think we need
to wait for the other ARM issues to be resolved -- they seem to be
unrelated to the IRA/LRA changes.

jeff



Restore -fopt-info-vec optimized locations for SLP vectorization

2018-11-06 Thread Richard Biener


The following patch pushes a DUMP_VECT_SCOPE down one level because
it otherwise hides a MSG_OPTIMIZED_LOCATION print.

David - was this an intended effect of the scoping?

Applied to trunk.

Richard.

2018-11-06  Richard Biener  

* tree-vect-slp.c (vect_slp_bb): Move opening of vect_slp_analyze_bb
dump-scope ...
(vect_slp_analyze_bb_1): ... here to avoid hiding optimized locations.

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index e7e5d252c00..f802b004bef 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -2779,6 +2779,8 @@ vect_slp_analyze_bb_1 (gimple_stmt_iterator region_begin,
   vec datarefs, int n_stmts,
   bool , vec_info_shared *shared)
 {
+  DUMP_VECT_SCOPE ("vect_slp_analyze_bb");
+
   bb_vec_info bb_vinfo;
   slp_instance instance;
   int i;
@@ -2949,8 +2951,6 @@ vect_slp_bb (basic_block bb)
   bool any_vectorized = false;
   auto_vector_sizes vector_sizes;
 
-  DUMP_VECT_SCOPE ("vect_slp_analyze_bb");
-
   /* Autodetect first vector size we try.  */
   current_vector_size = 0;
   targetm.vectorize.autovectorize_vector_sizes (&vector_sizes);


[PATCH] S/390: Accept cdb in load-and-test-fp-1 testcase

2018-11-06 Thread Ilya Leoshkevich
The compiler now generates cdb instead of cdbr for comparison with 0.0,
which looks like an improvement to me.

gcc/testsuite/ChangeLog:

2018-11-06  Ilya Leoshkevich  

* gcc.target/s390/load-and-test-fp-1.c: Accept cdb.
---
 gcc/testsuite/gcc.target/s390/load-and-test-fp-1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/s390/load-and-test-fp-1.c 
b/gcc/testsuite/gcc.target/s390/load-and-test-fp-1.c
index b9d59122242..2a7e88c0f1b 100644
--- a/gcc/testsuite/gcc.target/s390/load-and-test-fp-1.c
+++ b/gcc/testsuite/gcc.target/s390/load-and-test-fp-1.c
@@ -14,4 +14,4 @@ foo (double dummy, double a)
   return a;
 }
 
-/* { dg-final { scan-assembler "cdbr\t" } } */
+/* { dg-final { scan-assembler {\tcdbr?\t} } } */
-- 
2.19.1
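
To see why the relaxed pattern accepts both mnemonics, here is a minimal sketch using POSIX extended regular expressions from C. DejaGnu's scan-assembler uses Tcl regular expressions rather than regex.h, so this is only an analogue of what the testsuite does; the helper name asm_line_matches and the sample assembler lines in the usage note are made up for illustration.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if the POSIX extended regex `pattern` matches anywhere in
   `line`, 0 otherwise (including when the pattern fails to compile).
   Hypothetical helper, not part of the GCC testsuite.  */
static int
asm_line_matches (const char *pattern, const char *line)
{
  regex_t re;
  int matched;

  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return 0;
  matched = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return matched;
}
```

With the pattern "\tcdbr?\t", a line such as "\tcdb\t%f0,0(%r1)" and a line such as "\tcdbr\t%f0,%f2" both match, while the old pattern "cdbr\t" only matches the second one. Anchoring on the leading tab also keeps longer mnemonics that merely contain "cdb" from matching by accident.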



[committed][MSP430] Fix classification of PC, CG1 and CG2 registers

2018-11-06 Thread Jozef Lawrynowicz

An ICE in gcc.dg/tree-ssa/asm-3.c for msp430-elf exposed some inconsistencies
in the classification of some of the "special" registers, R0 (PC), R2 (SR/CG1)
and R3 (CG2).

REG_CLASS_CONTENTS[GEN_REGS] does not have the bit for any of these registers
set, yet REGNO_REG_CLASS returns GEN_REGS for them.

Fixed by setting bit 0 for R0 in REG_CLASS_CONTENTS[GEN_REGS], and for R2 and
R3, REGNO_REG_CLASS will now return NO_REGS.

It is appropriate for R0 (PC) to be in the GEN_REGS class, as it can be used as
an operand in any instruction and using any addressing mode.

Successfully regtested the attached patch on trunk, and committed.
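
The inconsistency can be illustrated with a small standalone model, which is only a sketch and not the msp430 backend code. It assumes one mask bit per hardware register R0..R15 and uses the low 16 bits of the GEN_REGS mask before and after the change; the function names and the toy enum are hypothetical.

```c
#include <stddef.h>

enum toy_reg_class { TOY_NO_REGS, TOY_GEN_REGS };

/* Low 16 bits of the GEN_REGS mask before and after the fix.  */
#define OLD_GEN_MASK 0xfff2u
#define NEW_GEN_MASK 0xfff3u

/* Model of REGNO_REG_CLASS before the fix.  */
static enum toy_reg_class
old_regno_reg_class (unsigned regno)
{
  return regno < 17 ? TOY_GEN_REGS : TOY_NO_REGS;
}

/* Model of REGNO_REG_CLASS after the fix: R2 and R3 are excluded.  */
static enum toy_reg_class
new_regno_reg_class (unsigned regno)
{
  return (regno != 2 && regno != 3 && regno < 17)
	 ? TOY_GEN_REGS : TOY_NO_REGS;
}

/* Count the registers in R0..R15 for which the mask and the
   classification function disagree about GEN_REGS membership.  */
static int
count_mismatches (unsigned mask,
		  enum toy_reg_class (*classify) (unsigned))
{
  int mismatches = 0;
  for (unsigned regno = 0; regno < 16; regno++)
    {
      int in_mask = (mask >> regno) & 1u;
      int in_class = classify (regno) == TOY_GEN_REGS;
      if (in_mask != in_class)
	mismatches++;
    }
  return mismatches;
}
```

In this model the old pair disagrees on R0, R2 and R3, and the new pair agrees on all sixteen hardware registers.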

>From 9e26066f2a0f979a6bea538d27524e03b81618f3 Mon Sep 17 00:00:00 2001
From: Jozef Lawrynowicz 
Date: Fri, 2 Nov 2018 20:59:10 +
Subject: [PATCH] [MSP430] Fix register classification of PC, CG1 and CG2

2018-11-06  Jozef Lawrynowicz  

	* gcc/config/msp430/msp430.h (REG_CLASS_CONTENTS): Add R0 to
	REG_CLASS_CONTENTS[GEN_REGS].
	(REGNO_REG_CLASS): Return NO_REGS for R2 and R3.

	* gcc/testsuite/gcc.target/msp430/special-regs.c: New test.

---
 gcc/config/msp430/msp430.h | 11 +--
 gcc/testsuite/gcc.target/msp430/special-regs.c | 16 
 2 files changed, 25 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/msp430/special-regs.c

diff --git a/gcc/config/msp430/msp430.h b/gcc/config/msp430/msp430.h
index 6bfe28c..380e63e 100644
--- a/gcc/config/msp430/msp430.h
+++ b/gcc/config/msp430/msp430.h
@@ -241,10 +241,15 @@ enum reg_class
   0x,		   \
   0x1000,		   \
   0x2000,		   \
-  0xfff2,		   \
+  0xfff3,		   \
   0x0001		   \
 }
 
+/* GENERAL_REGS just means that the "g" and "r" constraints can use these
+   registers.
+   Even though R0 (PC) and R1 (SP) are not "general" in that they can be used
+   for any purpose by the register allocator, they are general in that they can
+   be used by any instruction in any addressing mode.  */
 #define GENERAL_REGS			GEN_REGS
 #define BASE_REG_CLASS  		GEN_REGS
 #define INDEX_REG_CLASS			GEN_REGS
@@ -259,7 +264,9 @@ enum reg_class
 
 #define FIRST_PSEUDO_REGISTER 		17
 
-#define REGNO_REG_CLASS(REGNO)  ((REGNO) < 17 \
+#define REGNO_REG_CLASS(REGNO)		(REGNO != 2 \
+	 && REGNO != 3 \
+	 && REGNO < 17 \
 	 ? GEN_REGS : NO_REGS)
 
 #define TRAMPOLINE_SIZE			4 /* FIXME */
diff --git a/gcc/testsuite/gcc.target/msp430/special-regs.c b/gcc/testsuite/gcc.target/msp430/special-regs.c
new file mode 100644
index 000..c9121e6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/msp430/special-regs.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+
+int foo (void)
+{
+  register int pc __asm__("R0");
+  register int sp __asm__("R1");
+  register int cg1 __asm__("R2"); /* { dg-error "the register specified for 'cg1' is not general enough" } */
+  register int cg2 __asm__("R3"); /* { dg-error "the register specified for 'cg2' is not general enough" } */
+
+  asm("" : "=r"(pc));
+  asm("" : "=r"(sp));
+  asm("" : "=r"(cg1));
+  asm("" : "=r"(cg2));
+
+  return pc + sp + cg1 + cg2;
+}
-- 
2.7.4



Re: [aarch64] disable shrink wrapping when tracking speculative execution

2018-11-06 Thread Richard Earnshaw (lists)
On 06/11/2018 01:40, Segher Boessenkool wrote:
> Hi Richard,
> 
> On Mon, Nov 05, 2018 at 10:09:30AM +, Richard Earnshaw (lists) wrote:
>>>>> Shouldn't you be able to do this per function at least?
>>>>
>>>> do what per function?  track speculation?
>>>
>>> disable shrink-wrapping only when any speculation was there
>>> (this is about __builtin_speculation_safe_value, no?)
>>
>> Only indirectly.  This is about the tracking code that tracks
>> conditional branches and propagates that information through call/return
>> sequences.  Shrink wrapping messes with the prologue/epilogue sequences
>> after the speculation tracking pass has run and unknowingly deletes some
>> of the additional code that was previously inserted by the tracking pass.
> 
> Do you have an example of this?  Shrink-wrapping does not generally
> delete any code.
> 

Well it generates new 'light-weight' prologue and epilogue sequences for
the 'shrunk' code path that lack the establishment of the tracker
register and doesn't know how to move the existing sequence to the new
entry sequence.

Consider this (trivial shrink-wrap case).

int f ();

int g (int a)
{
  if (a > 3)
    return f() + 4;
  return a;
}

Without speculation tracking we get (sorry, aarch64, hope you can follow
this):

g:
        cmp     w0, 3
        bgt     .L8             // a > 3
        ret
        .p2align 2
.L8:
        stp     x29, x30, [sp, -16]!
        mov     x29, sp
        bl      f               // f()
        add     w0, w0, 4       // + 4
        ldp     x29, x30, [sp], 16
        ret

With shrink-wrapping and speculation both enabled, we get the following
code:

g:
        cmp     sp, 0
        csetm   x15, ne         // Establish tracker - OK
        cmp     w0, 3
        bgt     .L8
        // tracker not updated, speculation state not re-encoded in SP
        ret
        .p2align 2
.L8:
        stp     x29, x30, [sp, -16]!
        csel    x15, x15, xzr, gt       // tracker updated for branch
        mov     x14, sp
        mov     x29, sp
        and     x14, x14, x15   // Encode speculation status in SP
        mov     sp, x14
        bl      f
        add     w0, w0, 4
        cmp     sp, 0           // extract speculation status from call
        ldp     x29, x30, [sp], 16
        csetm   x15, ne
        mov     x14, sp
        and     x14, x14, x15   // And re-encode it in return sp
        mov     sp, x14
        ret

So although this code executes correctly from an architectural
perspective, if the early return path is taken speculatively, the caller
receives incomplete information about the speculation that has taken place.
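
As a rough illustration of the difference on the early-return path, here is a small C model of the tracker arithmetic. It is only a sketch: it models cmp/csetm/csel/and on plain integers, treats "a > 3 but we are on the fall-through path" as the mis-speculated case, and the function names are invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* csel dst, a, b, cond: pick A if COND holds, else B.  */
static uint64_t
csel (uint64_t a, uint64_t b, bool cond)
{
  return cond ? a : b;
}

/* Early-return path with the tracker update in place (no shrink
   wrapping).  Returns the SP value the caller sees.  */
static uint64_t
early_return_tracked (uint64_t sp, int a)
{
  /* cmp sp, 0; csetm x15, ne  */
  uint64_t tracker = sp != 0 ? ~UINT64_C (0) : 0;
  /* csel x15, x15, xzr, le: keep the tracker only if a <= 3 really
     holds; on a mis-speculated fall-through it becomes zero.  */
  tracker = csel (tracker, 0, a <= 3);
  /* mov x14, sp; and x14, x14, x15; mov sp, x14  */
  return sp & tracker;
}

/* Early-return path as emitted with shrink wrapping: the tracker is
   established but never updated or folded back into SP.  */
static uint64_t
early_return_shrink_wrapped (uint64_t sp, int a)
{
  uint64_t tracker = sp != 0 ? ~UINT64_C (0) : 0;
  (void) tracker;
  (void) a;
  return sp;
}
```

For a correctly predicted call (say a == 1) both variants hand back the original SP. For a mis-speculated one (say a == 5 reaching the early return) only the tracked variant returns zero, which is the signal the caller relies on.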

And if we disable shrink wrapping, then obviously things work as
originally intended:

g:
        cmp     sp, 0
        stp     x29, x30, [sp, -16]!
        csetm   x15, ne         // establish tracker
        mov     x29, sp
        cmp     w0, 3
        bgt     .L5
        ldp     x29, x30, [sp], 16
        csel    x15, x15, xzr, le       // Update tracker for branch
        mov     x14, sp
        and     x14, x14, x15
        mov     sp, x14         // tracking encoded into SP for return
        ret
        .p2align 2
.L5:
        csel    x15, x15, xzr, gt       // As above.
        mov     x14, sp
        and     x14, x14, x15
        mov     sp, x14
        bl      f
        add     w0, w0, 4
        cmp     sp, 0
        ldp     x29, x30, [sp], 16
        csetm   x15, ne
        mov     x14, sp
        and     x14, x14, x15
        mov     sp, x14
        ret

I'm not asking that shrink wrapping be updated to handle all this; in
fact, I'm not sure it's that easy to do as the branch patterns and
simple-return patterns aren't set up to handle this.

R.

> 
> Segher
> 



Re: [C++ Patch] Improve compute_array_index_type locations

2018-11-06 Thread Paolo Carlini
.. by the way, we could pass the location to abstract_virtuals_error 
too, thus fixing the locations of tests like other/abstract1.C and 
abstract2.C, where we currently hit the case:


  else if (identifier_p (decl))
    /* Here we do not have location information.  */
    error ("invalid abstract type %qT for %qE", type, decl);

not sure if it's worth it. What do you think?

Thanks, Paolo.



Re: [PATCH 2/2 v3][IRA,LRA] Fix PR86939, IRA incorrectly creates an interference between a pseudo register and a hard register

2018-11-06 Thread Renlin Li

Hi Jeff & Peter,

On 11/05/2018 07:41 PM, Jeff Law wrote:

On 11/5/18 12:36 PM, Peter Bergner wrote:

On 11/5/18 1:20 PM, Jeff Law wrote:

On 11/1/18 4:07 PM, Peter Bergner wrote:

On 11/1/18 1:50 PM, Renlin Li wrote:

Is there any update on this issues?
arm-none-linux-gnueabihf native toolchain has been mis-compiled for a while.


 From the analysis I've done, my commit is just exposing latent issues
in LRA.  Can you try the patch I submitted here to see if it helps?

   https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01757.html

It survives on powerpc64le-linux, x86_64-linux and s390x-linux.
Jeff threw it on his testers and said he saw an arm issue and was
trying to come up with a test case for me to debug.

So I don't think the ARM issues are related to your patch, they may have
been related the combiner changes that went in around the same time.

Yes, there are issues related to the combiner changes.

But the IRA/LRA change does cause the arm-none-linux-gnueabihf native bootstrap
toolchain to be mis-compiled.
And the new patch does not seem to fix this problem.

I am trying to extract a test case, but it is a little bit hard as the
toolchain itself is mis-compiled.
And it ICEs when compiling a test case with it.

I created a bugzilla ticket for this, PR87899.

./gcc/cc1 ~/gcc/./gcc/testsuite/gcc.c-torture/execute/pr36034-1.c  -O3
 test main
Analyzing compilation unit
Performing interprocedural optimizations
 <*free_lang_data> 
 Streaming LTO

Assembling functions:

  testduring GIMPLE pass: ldist

gcc/./gcc/testsuite/gcc.c-torture/execute/pr36034-1.c: In function ‘test’:
gcc/./gcc/testsuite/gcc.c-torture/execute/pr36034-1.c:9:1: internal compiler 
error: Segmentation fault
9 | test (void)
  | ^~~~
0x5c3a37 crash_signal
../../gcc/gcc/toplev.c:325
0x63ef6b inchash::hash::add(void const*, unsigned int)
../../gcc/gcc/inchash.h:100
0x63ef6b inchash::hash::add_ptr(void const*)
../../gcc/gcc/inchash.h:94
0x63ef6b ddr_hasher::hash(data_dependence_relation const*)
../../gcc/gcc/tree-loop-distribution.c:143
0x63ef6b hash_table::find_slot(data_dependence_relation* 
const&, insert_option)
../../gcc/gcc/hash-table.h:414
0x63ef6b get_data_dependence
../../gcc/gcc/tree-loop-distribution.c:1184
0x63f2bd data_dep_in_cycle_p
../../gcc/gcc/tree-loop-distribution.c:1210
0x63f2bd update_type_for_merge
../../gcc/gcc/tree-loop-distribution.c:1255
0x64064b build_rdg_partition_for_vertex
../../gcc/gcc/tree-loop-distribution.c:1302
0x64064b rdg_build_partitions
../../gcc/gcc/tree-loop-distribution.c:1754
0x64064b distribute_loop
../../gcc/gcc/tree-loop-distribution.c:2795
0x642299 execute
../../gcc/gcc/tree-loop-distribution.c:3133
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.



Regards
Renlin





At this point your patch appears to be DTRT across the board.  The only
fallout is the bogus s390 asm it caught in the kernel.


Cool.  I will note that I contacted the s390 kernel guys and gave them a
fix to their broken constraints in that asm and they are going to fix it.

Sounds good.  I've got a hack in my tester to "fix" that bogus asm until
the kernel folks do it right.




Is the above an approval to commit the patch mentioned above or do you
still want to wait until the ARM issues are fully resolved?

I think knowing the patch addresses all the known issues related to the
earlier IRA/LRA change unblocks the review step.  I don't think we need
to wait for the other ARM issues to be resolved -- they seem to be
unrelated to the IRA/LRA changes.

jeff



Re: Clear TYPELESS_STORAGE when turning type to incmplete

2018-11-06 Thread Richard Biener
On Tue, 6 Nov 2018, Jan Hubicka wrote:

> > On Tue, 6 Nov 2018, Jan Hubicka wrote:
> > 
> > > Hi,
> > > this patch adds code to clear typeless storage flag. This is needed to
> > > enable some more merging for C++ types containing char array.
> > >
> > > Bootstrapped/regtested x86_64-linux, OK?
> > 
> > OK.  (I guess we could equally well set it to 1?)
> 
> Then we would need to set it for front-end built incomplete types as
> well. Whatever I construct here had better match the real incomplete
> type built by the front ends; otherwise merging of other types referring to
> this one via pointers won't happen.

Ah, of course.

Richard.


Re: Clear TYPELESS_STORAGE when turning type to incmplete

2018-11-06 Thread Jan Hubicka
> On Tue, 6 Nov 2018, Jan Hubicka wrote:
> 
> > Hi,
> > this patch adds code to clear typeless storage flag. This is needed to
> > enable some more merging for C++ types containing char array.
> >
> > Bootstrapped/regtested x86_64-linux, OK?
> 
> OK.  (I guess we could equally well set it to 1?)

Then we would need to set it for front-end built incomplete types as
well. Whatever I construct here had better match the real incomplete
type built by the front ends; otherwise merging of other types referring to
this one via pointers won't happen.

Honza

> 
> Richard.
>  
> > Honza
> > 
> > * tree.c (fld_incomplete_type_of): Clear TYPE_TYPELESS_STORAGE
> > flag.
> > Index: tree.c
> > ===
> > --- tree.c  (revision 265835)
> > +++ tree.c  (working copy)
> > @@ -5173,6 +5173,7 @@ fld_incomplete_type_of (tree t, struct f
> >   SET_TYPE_ALIGN (copy, BITS_PER_UNIT);
> >   TYPE_SIZE_UNIT (copy) = NULL;
> >   TYPE_CANONICAL (copy) = TYPE_CANONICAL (t);
> > + TYPE_TYPELESS_STORAGE (copy) = 0;
> >   if (AGGREGATE_TYPE_P (t))
> > {
> >   TYPE_FIELDS (copy) = NULL;
> > 
> > 
> 
> -- 
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nuernberg)


Re: Clear TYPELESS_STORAGE when turning type to incmplete

2018-11-06 Thread Richard Biener
On Tue, 6 Nov 2018, Jan Hubicka wrote:

> Hi,
> this patch adds code to clear typeless storage flag. This is needed to
> enable some more merging for C++ types containing char array.
>
> Bootstrapped/regtested x86_64-linux, OK?

OK.  (I guess we could equally well set it to 1?)

Richard.
 
> Honza
> 
>   * tree.c (fld_incomplete_type_of): Clear TYPE_TYPELESS_STORAGE
>   flag.
> Index: tree.c
> ===
> --- tree.c(revision 265835)
> +++ tree.c(working copy)
> @@ -5173,6 +5173,7 @@ fld_incomplete_type_of (tree t, struct f
> SET_TYPE_ALIGN (copy, BITS_PER_UNIT);
> TYPE_SIZE_UNIT (copy) = NULL;
> TYPE_CANONICAL (copy) = TYPE_CANONICAL (t);
> +   TYPE_TYPELESS_STORAGE (copy) = 0;
> if (AGGREGATE_TYPE_P (t))
>   {
> TYPE_FIELDS (copy) = NULL;
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: Do not stream TYPE_NEEDS_CONSTRUCTING

2018-11-06 Thread Richard Biener
On Tue, 6 Nov 2018, Jan Hubicka wrote:

> > On Tue, 6 Nov 2018, Jan Hubicka wrote:
> > 
> > > Hi,
> > > TYPE_NEEDS_CONSTRUCTING is one of the reasons why we get duplicated
> > > complete and incomplete types after my patch because the incomplete
> > > type I construct may have TYPE_NEEDS_CONSTRUCTING set.
> > > 
> > > I think this flag is useless and can be dropped - the only use in
> > > ipa-pure-const seems confused since we should drop the readonly flag
> > > on such variables.
> > 
> > Did you check that with an assert?
> 
> You can't assert on this easily because constructor may end up being
> optimized out and the variable promoted to readonly for valid reasons.

Hmm, I see.


Re: PR83750: CSE erf/erfc pair

2018-11-06 Thread Richard Biener
On Mon, Nov 5, 2018 at 3:11 PM Prathamesh Kulkarni
 wrote:
>
> On Mon, 5 Nov 2018 at 18:14, Richard Biener  
> wrote:
> >
> > On Mon, Nov 5, 2018 at 1:11 PM Prathamesh Kulkarni
> >  wrote:
> > >
> > > On Mon, 5 Nov 2018 at 15:10, Richard Biener  
> > > wrote:
> > > >
> > > > On Fri, Nov 2, 2018 at 10:37 AM Prathamesh Kulkarni
> > > >  wrote:
> > > > >
> > > > > Hi,
> > > > > This patch adds two transforms to match.pd to CSE erf/erfc pair.
> > > > > > erfc(x) is canonicalized to 1 - erf(x), and 1 - erf(x) is then
> > > > > > reversed back to erfc(x) when canonicalization is disabled and the
> > > > > > result of erf(x) has a single use within 1 - erf(x).
> > > > >
> > > > > The patch regressed builtin-nonneg-1.c. The following test-case
> > > > > reproduces the issue with patch:
> > > > >
> > > > > void test(double d1) {
> > > > >   if (signbit(erfc(d1)))
> > > > > link_failure_erfc();
> > > > > }
> > > > >
> > > > > ssa dump:
> > > > >
> > > > >:
> > > > >   _5 = __builtin_erf (d1_4(D));
> > > > >   _1 = 1.0e+0 - _5;
> > > > >   _6 = _1 < 0.0;
> > > > >   _2 = (int) _6;
> > > > >   if (_2 != 0)
> > > > > goto ; [INV]
> > > > >   else
> > > > > goto ; [INV]
> > > > >
> > > > >:
> > > > >   link_failure_erfc ();
> > > > >
> > > > >:
> > > > >   return;
> > > > >
> > > > > As can be seen, erfc(d1) is folded to 1 - erf(d1).
> > > > > forwprop then transforms the if condition from _2 != 0
> > > > > to _5 > 1.0e+0 and that defeats DCE thus resulting in link failure
> > > > > in undefined reference to link_failure_erfc().
> > > > >
> > > > > So, the patch adds another transform erf(x) > 1 -> 0.
> > > >
> > > > Ick.
> > > >
> > > > Why not canonicalize erf (x) to 1-erfc(x) instead?
> > > Sorry I didn't quite follow, won't this cause similar issue with erf ?
> > > I changed the pattern to canonicalize erf(x) -> 1 - erfc(x)
> > > and 1 - erfc(x) -> erf(x) after canonicalization is disabled.
> > >
> > > This caused undefined reference to link_failure_erf() in following 
> > > test-case:
> > >
> > > extern int signbit(double);
> > > extern void link_failure_erf(void);
> > > extern double erf(double);
> > >
> > > void test(double d1) {
> > >   if (signbit(erf(d1)))
> > > link_failure_erf();
> > > }
> >
> > But that's already not optimized without any canonicalization
> > because erf returns sth in range [-1, 1].
> >
> > I suggested the change because we have limited support for FP
> > value-ranges and nonnegative is one thing we can compute
> > (and erfc as opposed to erf is nonnegative).
> Ah right, thanks for the explanation.
> Unfortunately this still regresses builtin-nonneg-1.c, which can be
> reproduced with following test-case:
>
> extern int signbit(double);
> extern void link_failure_erf(void);
> extern double erf(double);
> extern double fabs(double);
>
> void test(double d1) {
>   if (signbit(erf(fabs(d1))))
>     link_failure_erf();
> }
>
> signbit(erf(fabs(d1))) is transformed to 0 without patch but with patch
> it gets canonicalized to signbit(1 - erfc(fabs(d1))) which similarly
> defeats DCE.
>
> forwprop1 shows:
>  :
>   _1 = ABS_EXPR ;
>   _6 = __builtin_erfc (_1);
>   _2 = 1.0e+0 - _6;
>   _7 = _6 > 1.0e+0;
>   _3 = (int) _7;
>   if (_6 > 1.0e+0)
> goto ; [INV]
>   else
> goto ; [INV]
>
>:
>   link_failure_erf ();
>
>:
>   return;
>
> I assume we would need to somehow tell gcc that the canonicalized
> expression 1 - erfc(x) would not exceed 1.0 ?
> Is there a better way to do that apart from defining pattern (1 -
> erfc(x)) > 1.0 -> 0
> which I agree doesn't look ideal to add in match.pd ?

You could handle a MINUS_EXPR of 1 and erfc() in
gimple_assign_nonnegative_warnv_p (I wouldn't bother
to do it for tree_binary_nonnegative_warnv_p).

This is of course similarly "lame" but a bit cleaner than
a match.pd pattern that just works for the comparison.

In reality the proper long-term fix is to add basic range
propagation to floats.
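
For reference, the range facts that such a nonnegativity check (or a future float range propagation) would rely on can be written out as follows. This is just the standard relationship between erf and erfc, not something taken from the patch:

```latex
\begin{align*}
\operatorname{erfc}(x) &= 1 - \operatorname{erf}(x), \\
\operatorname{erf}(x) &\in [-1, 1], \qquad \operatorname{erfc}(x) \in [0, 2], \\
x \ge 0 &\implies \operatorname{erf}(x) \in [0, 1]
         \implies \operatorname{erfc}(x) \in [0, 1] \\
        &\implies 1 - \operatorname{erfc}(x) \ge 0 .
\end{align*}
```

So for a nonnegative argument such as fabs(d1), the comparison erfc(fabs(d1)) > 1.0 left behind by forwprop can never be true, which is exactly what the compiler currently cannot prove.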

Richard.

>
> Thanks
> Prathamesh
> >
> > > forwprop1 shows:
> > >
> > > :
> > >   _5 = __builtin_erfc (d1_4(D));
> > >   _1 = 1.0e+0 - _5;
> > >   _6 = _5 > 1.0e+0;
> > >   _2 = (int) _6;
> > >   if (_5 > 1.0e+0)
> > > goto ; [INV]
> > >   else
> > > goto ; [INV]
> > >
> > >:
> > >   link_failure_erf ();
> > >
> > >:
> > >   return;
> > >
> > > which defeats DCE to remove call to link_failure_erf.
> > >
> > > Thanks,
> > > Prathamesh
> > > >
> > > > > which resolves the regression.
> > > > >
> > > > > Bootstrapped+tested on x86_64-unknown-linux-gnu.
> > > > > Cross-testing on arm and aarch64 variants in progress.
> > > > > OK for trunk if passes ?
> > > > >
> > > > > Thanks,
> > > > > Prathamesh


Clear TYPELESS_STORAGE when turning type to incmplete

2018-11-06 Thread Jan Hubicka
Hi,
this patch adds code to clear the typeless storage flag. This is needed to
enable some more merging for C++ types containing char array.

Bootstrapped/regtested x86_64-linux, OK?

Honza

* tree.c (fld_incomplete_type_of): Clear TYPE_TYPELESS_STORAGE
flag.
Index: tree.c
===
--- tree.c  (revision 265835)
+++ tree.c  (working copy)
@@ -5173,6 +5173,7 @@ fld_incomplete_type_of (tree t, struct f
  SET_TYPE_ALIGN (copy, BITS_PER_UNIT);
  TYPE_SIZE_UNIT (copy) = NULL;
  TYPE_CANONICAL (copy) = TYPE_CANONICAL (t);
+ TYPE_TYPELESS_STORAGE (copy) = 0;
  if (AGGREGATE_TYPE_P (t))
{
  TYPE_FIELDS (copy) = NULL;


Re: Do not stream TYPE_NEEDS_CONSTRUCTING

2018-11-06 Thread Jan Hubicka
> On Tue, 6 Nov 2018, Jan Hubicka wrote:
> 
> > Hi,
> > TYPE_NEEDS_CONSTRUCTING is one of the reasons why we get duplicated
> > complete and incomplete types after my patch because the incomplete
> > type I construct may have TYPE_NEEDS_CONSTRUCTING set.
> > 
> > I think this flag is useless and can be dropped - the only use in
> > ipa-pure-const seems confused since we should drop the readonly flag
> > on such variables.
> 
> Did you check that with an assert?

You can't assert on this easily because the constructor may end up being
optimized out and the variable promoted to readonly for valid reasons.

Honza


Re: [libsanitizer] Enable libsanitizer on Solaris (PR sanitizer/80953)

2018-11-06 Thread Jakub Jelinek
On Tue, Nov 06, 2018 at 11:12:00AM +0100, Rainer Orth wrote:
> The asan port is 32-bit only for now (on sparc because 64-bit
> Solaris/SPARC uses the full address space with a large virtual address
> hole in the middle whose exact location is machine-dependent and not
> easily determined at runtime; on x86 the situation was the same before
> Solaris 11.4).

Doesn't e.g. the initial thread stack live at the end of the address space,
so that you could compute from that the size of the virtual address space
and from there determine the size of the hole?  Or poke the hole with
mmap calls or something similar?  Parse some special filesystem files?
libasan can cope with dynamically determined layouts in some cases, on some
arches it supports varying virtual address space sizes (though with a fixed
shadow offset).

> 2018-10-31  Rainer Orth  
> 
>   gcc:
>   PR sanitizer/80953
>   * config/sol2.h (ASAN_CC1_SPEC): Define.
>   (LD_WHOLE_ARCHIVE_OPTION): Define.
>   (LD_NO_WHOLE_ARCHIVE_OPTION): Define.
>   (ASAN_REJECT_SPEC): Provide default.
>   (LIBASAN_EARLY_SPEC): Define.
>   (LIBTSAN_EARLY_SPEC): Define.
>   (LIBLSAN_EARLY_SPEC): Define.
>   * config/i386/sol2.h (CC1_SPEC): Redefine.
>   (ASAN_REJECT_SPEC): Define.
> 
>   * config/sparc/sparc.c (sparc_asan_shadow_offset): Declare.
>   (TARGET_ASAN_SHADOW_OFFSET): Define.
>   (sparc_asan_shadow_offset): New function.
>   * config/sparc/sol2.h (CC1_SPEC): Append ASAN_CC1_SPEC.
>   (ASAN_REJECT_SPEC): Define.
> 
>   gcc/testsuite:
>   PR sanitizer/80953
>   * c-c++-common/asan/alloca_loop_unpoisoning.c: Require alloca
>   support.
>   (foo): Use __builtin_alloca.
> 
>   libsanitizer:
>   PR sanitizer/80953
>   * configure.tgt (sparc*-*-solaris2.11*): Enable.
>   (x86_64-*-solaris2.11* | i?86-*-solaris2.11*): Enable.

Ok.

Jakub


Re: Do not stream TYPE_NEEDS_CONSTRUCTING

2018-11-06 Thread Richard Biener
On Tue, 6 Nov 2018, Jan Hubicka wrote:

> Hi,
> TYPE_NEEDS_CONSTRUCTING is one of the reasons why we get duplicated
> complete and incomplete types after my patch because the incomplete
> type I construct may have TYPE_NEEDS_CONSTRUCTING set.
> 
> I think this flag is useless and can be dropped - the only use in
> ipa-pure-const seems confused since we should drop the readonly flag
> on such variables.

Did you check that with an assert?

Otherwise OK.

Richard.

> 
> Bootstrapped/regtested x86_64-linux, OK?
> 
> Honza
> 
>   * ipa-pure-const.c (check_decl): Do not test TYPE_NEEDS_CONSTRUCTING.
>   * lto-streamer-out.c (hash_tree): Do not hash TYPE_NEEDS_CONSTRUCTING.
>   * tree-streamer-in.c (unpack_ts_type_common_value_fields): Do not
>   stream TYPE_NEEDS_CONSTRUCTING.
>   * tree-streamer-out.c (pack_ts_type_common_value_fields): Likewise.
>   * tree.c (free_lang_data_in_type): Clear TYPE_NEEDS_CONSTRUCTING.
> Index: ipa-pure-const.c
> ===
> --- ipa-pure-const.c  (revision 265738)
> +++ ipa-pure-const.c  (working copy)
> @@ -339,7 +339,7 @@ check_decl (funct_state local,
>if (DECL_EXTERNAL (t) || TREE_PUBLIC (t))
>  {
>/* Readonly reads are safe.  */
> -  if (TREE_READONLY (t) && !TYPE_NEEDS_CONSTRUCTING (TREE_TYPE (t)))
> +  if (TREE_READONLY (t))
>   return; /* Read of a constant, do not change the function state.  */
>else
>   {
> Index: lto-streamer-out.c
> ===
> --- lto-streamer-out.c(revision 265738)
> +++ lto-streamer-out.c(working copy)
> @@ -1147,7 +1147,6 @@ hash_tree (struct streamer_tree_cache_d
>hstate.add_flag (TYPE_STRING_FLAG (t));
>/* TYPE_NO_FORCE_BLK is private to stor-layout and need
>no streaming.  */
> -  hstate.add_flag (TYPE_NEEDS_CONSTRUCTING (t));
>hstate.add_flag (TYPE_PACKED (t));
>hstate.add_flag (TYPE_RESTRICT (t));
>hstate.add_flag (TYPE_USER_ALIGN (t));
> Index: tree-streamer-in.c
> ===
> --- tree-streamer-in.c(revision 265738)
> +++ tree-streamer-in.c(working copy)
> @@ -367,7 +367,6 @@ unpack_ts_type_common_value_fields (stru
>TYPE_STRING_FLAG (expr) = (unsigned) bp_unpack_value (bp, 1);
>/* TYPE_NO_FORCE_BLK is private to stor-layout and need
>   no streaming.  */
> -  TYPE_NEEDS_CONSTRUCTING (expr) = (unsigned) bp_unpack_value (bp, 1);
>TYPE_PACKED (expr) = (unsigned) bp_unpack_value (bp, 1);
>TYPE_RESTRICT (expr) = (unsigned) bp_unpack_value (bp, 1);
>TYPE_USER_ALIGN (expr) = (unsigned) bp_unpack_value (bp, 1);
> Index: tree-streamer-out.c
> ===
> --- tree-streamer-out.c   (revision 265738)
> +++ tree-streamer-out.c   (working copy)
> @@ -314,7 +314,6 @@ pack_ts_type_common_value_fields (struct
>bp_pack_value (bp, TYPE_STRING_FLAG (expr), 1);
>/* TYPE_NO_FORCE_BLK is private to stor-layout and need
>   no streaming.  */
> -  bp_pack_value (bp, TYPE_NEEDS_CONSTRUCTING (expr), 1);
>bp_pack_value (bp, TYPE_PACKED (expr), 1);
>bp_pack_value (bp, TYPE_RESTRICT (expr), 1);
>bp_pack_value (bp, TYPE_USER_ALIGN (expr), 1);
> Index: tree.c
> ===
> --- tree.c(revision 265738)
> +++ tree.c(working copy)
> @@ -5254,6 +5254,8 @@ free_lang_data_in_type (tree type)
>TREE_LANG_FLAG_5 (type) = 0;
>TREE_LANG_FLAG_6 (type) = 0;
>  
> +  TYPE_NEEDS_CONSTRUCTING (type) = 0;
> +
>if (TREE_CODE (type) == FUNCTION_TYPE)
>  {
>/* Remove the const and volatile qualifiers from arguments.  The
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Do not stream TYPE_NEEDS_CONSTRUCTING

2018-11-06 Thread Jan Hubicka
Hi,
TYPE_NEEDS_CONSTRUCTING is one of the reasons why we get duplicated complete
and incomplete types after my patch because the incomplete type I construct
may have TYPE_NEEDS_CONSTRUCTING set.

I think this flag is useless and can be dropped - the only use in ipa-pure-const
seems confused since we should drop the readonly flag on such variables.
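
The effect on merging can be pictured with a toy model, which is a sketch only and not the LTO streamer: two type records that differ in a single flag hash differently as long as that flag is fed into the hash, and identically once it is left out. The struct, the field names and the hash function below are all hypothetical.

```c
#include <stdbool.h>

/* Heavily reduced stand-in for the per-type flags that get hashed.  */
struct toy_type_flags
{
  bool string_flag;
  bool needs_constructing;
  bool packed;
  bool restrict_flag;
  bool user_align;
};

/* Fold one flag into a running hash value.  */
static unsigned
toy_add_flag (unsigned h, bool flag)
{
  return h * 2u + (flag ? 1u : 0u);
}

/* Hash including the needs_constructing flag (old behaviour).  */
static unsigned
toy_hash_with_flag (const struct toy_type_flags *t)
{
  unsigned h = 1;
  h = toy_add_flag (h, t->string_flag);
  h = toy_add_flag (h, t->needs_constructing);
  h = toy_add_flag (h, t->packed);
  h = toy_add_flag (h, t->restrict_flag);
  h = toy_add_flag (h, t->user_align);
  return h;
}

/* Hash that skips needs_constructing (new behaviour).  */
static unsigned
toy_hash_without_flag (const struct toy_type_flags *t)
{
  unsigned h = 1;
  h = toy_add_flag (h, t->string_flag);
  h = toy_add_flag (h, t->packed);
  h = toy_add_flag (h, t->restrict_flag);
  h = toy_add_flag (h, t->user_align);
  return h;
}
```

Two records that agree on everything except needs_constructing get different values from toy_hash_with_flag and the same value from toy_hash_without_flag, so only the second variant lets them land in the same bucket and be considered for merging.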

Bootstrapped/regtested x86_64-linux, OK?

Honza

* ipa-pure-const.c (check_decl): Do not test TYPE_NEEDS_CONSTRUCTING.
* lto-streamer-out.c (hash_tree): Do not hash TYPE_NEEDS_CONSTRUCTING.
* tree-streamer-in.c (unpack_ts_type_common_value_fields): Do not
stream TYPE_NEEDS_CONSTRUCTING.
* tree-streamer-out.c (pack_ts_type_common_value_fields): Likewise.
* tree.c (free_lang_data_in_type): Clear TYPE_NEEDS_CONSTRUCTING.
Index: ipa-pure-const.c
===
--- ipa-pure-const.c(revision 265738)
+++ ipa-pure-const.c(working copy)
@@ -339,7 +339,7 @@ check_decl (funct_state local,
   if (DECL_EXTERNAL (t) || TREE_PUBLIC (t))
 {
   /* Readonly reads are safe.  */
-  if (TREE_READONLY (t) && !TYPE_NEEDS_CONSTRUCTING (TREE_TYPE (t)))
+  if (TREE_READONLY (t))
return; /* Read of a constant, do not change the function state.  */
   else
{
Index: lto-streamer-out.c
===
--- lto-streamer-out.c  (revision 265738)
+++ lto-streamer-out.c  (working copy)
@@ -1147,7 +1147,6 @@ hash_tree (struct streamer_tree_cache_d
   hstate.add_flag (TYPE_STRING_FLAG (t));
   /* TYPE_NO_FORCE_BLK is private to stor-layout and need
 no streaming.  */
-  hstate.add_flag (TYPE_NEEDS_CONSTRUCTING (t));
   hstate.add_flag (TYPE_PACKED (t));
   hstate.add_flag (TYPE_RESTRICT (t));
   hstate.add_flag (TYPE_USER_ALIGN (t));
Index: tree-streamer-in.c
===
--- tree-streamer-in.c  (revision 265738)
+++ tree-streamer-in.c  (working copy)
@@ -367,7 +367,6 @@ unpack_ts_type_common_value_fields (stru
   TYPE_STRING_FLAG (expr) = (unsigned) bp_unpack_value (bp, 1);
   /* TYPE_NO_FORCE_BLK is private to stor-layout and need
  no streaming.  */
-  TYPE_NEEDS_CONSTRUCTING (expr) = (unsigned) bp_unpack_value (bp, 1);
   TYPE_PACKED (expr) = (unsigned) bp_unpack_value (bp, 1);
   TYPE_RESTRICT (expr) = (unsigned) bp_unpack_value (bp, 1);
   TYPE_USER_ALIGN (expr) = (unsigned) bp_unpack_value (bp, 1);
Index: tree-streamer-out.c
===
--- tree-streamer-out.c (revision 265738)
+++ tree-streamer-out.c (working copy)
@@ -314,7 +314,6 @@ pack_ts_type_common_value_fields (struct
   bp_pack_value (bp, TYPE_STRING_FLAG (expr), 1);
   /* TYPE_NO_FORCE_BLK is private to stor-layout and need
  no streaming.  */
-  bp_pack_value (bp, TYPE_NEEDS_CONSTRUCTING (expr), 1);
   bp_pack_value (bp, TYPE_PACKED (expr), 1);
   bp_pack_value (bp, TYPE_RESTRICT (expr), 1);
   bp_pack_value (bp, TYPE_USER_ALIGN (expr), 1);
Index: tree.c
===
--- tree.c  (revision 265738)
+++ tree.c  (working copy)
@@ -5254,6 +5254,8 @@ free_lang_data_in_type (tree type)
   TREE_LANG_FLAG_5 (type) = 0;
   TREE_LANG_FLAG_6 (type) = 0;
 
+  TYPE_NEEDS_CONSTRUCTING (type) = 0;
+
   if (TREE_CODE (type) == FUNCTION_TYPE)
 {
   /* Remove the const and volatile qualifiers from arguments.  The


[PATCH] Fix PR87889

2018-11-06 Thread Richard Biener


Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2018-11-06  Richard Biener  

PR tree-optimization/87889
* tree-vect-loop-manip.c (slpeel_duplicate_current_defs_from_edges):
Do nothing if old and new arg are the same.

* gcc.dg/pr87894.c: New testcase.

diff --git a/gcc/testsuite/gcc.dg/pr87894.c b/gcc/testsuite/gcc.dg/pr87894.c
new file mode 100644
index 000..921a9cec468
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr87894.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast" } */
+
+int a, b, c, d;
+double e;
+
+void f(double g[][1])
+{
+  for (;;)
+{
+  double h;
+  for (; b < c; b++)
+   {
+ if (b >= 0)
+   ;
+ else if (d)
+   h = 2.0;
+ else
+   h = 0.0;
+ if (e)
+   g[a][b] = 0.0;
+ g[a][b] = h;
+   }
+}
+}
+
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index f1b023b4e4e..d4e71b7195b 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -977,7 +977,8 @@ slpeel_duplicate_current_defs_from_edges (edge from, edge to)
}
   if (TREE_CODE (from_arg) != SSA_NAME)
gcc_assert (operand_equal_p (from_arg, to_arg, 0));
-  else if (TREE_CODE (to_arg) == SSA_NAME)
+  else if (TREE_CODE (to_arg) == SSA_NAME
+  && from_arg != to_arg)
{
  if (get_current_def (to_arg) == NULL_TREE)
{


[Committed] S/390: Fix PR87723

2018-11-06 Thread Andreas Krebbel
Committed to mainline after successful bootstrapping and regression
testing.

gcc/ChangeLog:

2018-11-06  Andreas Krebbel  

PR target/87723
* config/s390/s390.md ("*rsbg_di_rotl"): Remove mode
attributes for operands 3 and 4.

gcc/testsuite/ChangeLog:

2018-11-06  Andreas Krebbel  

PR target/87723
* gcc.target/s390/pr87723.c: New test.
---
 gcc/config/s390/s390.md |  2 +-
 gcc/testsuite/gcc.target/s390/pr87723.c | 29 +
 2 files changed, 30 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/pr87723.c

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 8e7b285..4ffd438 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -4230,7 +4230,7 @@
  (match_operand:DI 4 "nonimmediate_operand" "0")))
(clobber (reg:CC CC_REGNUM))]
   "TARGET_Z10"
-  "rsbg\t%0,%1,%2,%2,%b3"
+  "rsbg\t%0,%1,%s2,%e2,%b3"
   [(set_attr "op_type" "RIE")])
 
 ; rosbg, rxsbg
diff --git a/gcc/testsuite/gcc.target/s390/pr87723.c b/gcc/testsuite/gcc.target/s390/pr87723.c
new file mode 100644
index 000..b0e8a5a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/pr87723.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z196 -m64 -mzarch" } */
+
+unsigned long a;
+int b;
+void c(char* i) {
+  for (;;) {
+char g = 0;
+for (; g < 24; ++g)
+  b = a << g | a >> 64 - g;
+{
+  char *d = i;
+  long h = b;
+  char e = 0;
+  for (; e < 8; ++e)
+   d[e] = h;
+}
+char *d = i;
+signed e;
+unsigned long f = 0;
+e = 7;
+for (; e; --e) {
+  f <<= 8;
+  f |= d[e];
+}
+for (; e < 8; ++e)
+  d[e] = f;
+  }
+}
-- 
2.7.4



Re: Fix SPEC gcc miscompile with LTO

2018-11-06 Thread Jan Hubicka
Hi,
this is an updated version of the patch. As discussed on IRC, C++ has two
different kinds of references (rvalue and normal). They have different
canonical types, but this does not affect the outcome of get_alias_set
because it rebuilds the pointer/reference types from scratch.

The effect of this patch is to turn both into one type when the pointer is
rebuilt, which should be safe. So I simply relaxed the sanity check that
the canonical types need to be the same for pointers.
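
For readers less familiar with the two reference kinds mentioned above, here
is a minimal source-level sketch. It only shows that T& and T&& are distinct
types that refer to the same underlying type; it does not model
TYPE_CANONICAL or get_alias_set, and the helper name same_referent is made up
for illustration.

```cpp
#include <cassert>
#include <type_traits>

// Lvalue and rvalue references to the same type are distinct types...
static_assert(!std::is_same<int&, int&&>::value,
              "int& and int&& are different types");

// ...but stripping the reference yields the same referred-to type.
static_assert(std::is_same<std::remove_reference<int&>::type,
                           std::remove_reference<int&&>::type>::value,
              "both refer to int");

// Hypothetical helper: true if A and B differ at most in reference kind.
template <typename A, typename B>
constexpr bool same_referent()
{
  return std::is_same<typename std::remove_reference<A>::type,
                      typename std::remove_reference<B>::type>::value;
}
```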

lto-bootstrapped/regtested x86_64-linux, plan to commit it shortly.

Honza

* gcc.dg/lto/tbaa-1.c: New testcase.

* tree.c (fld_type_variant): Copy canonical type.
(fld_incomplete_type_of): Check that canonical types look sane;
copy canonical type.
(verify_type): Accept when incomplete type has complete canonical type.
Index: testsuite/gcc.dg/lto/tbaa-1.c
===
--- testsuite/gcc.dg/lto/tbaa-1.c   (nonexistent)
+++ testsuite/gcc.dg/lto/tbaa-1.c   (working copy)
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -flto -fdump-tree-evrp" } */
+typedef struct rtx_def *rtx;
+typedef struct cselib_val_struct
+{
+  union
+  {
+  } u;
+  struct elt_loc_list *locs;
+}
+cselib_val;
+struct elt_loc_list
+{
+  struct elt_loc_list *next;
+  rtx loc;
+};
+static int n_useless_values;
+unchain_one_elt_loc_list (pl)
+ struct elt_loc_list **pl;
+{
+  struct elt_loc_list *l = *pl;
+  *pl = l->next;
+}
+
+discard_useless_locs (x, info)
+ void **x;
+{
+  cselib_val *v = (cselib_val *) * x;
+  struct elt_loc_list **p = &v->locs;
+  int had_locs = v->locs != 0;
+  while (*p)
+{
+  unchain_one_elt_loc_list (p);
+  p = &(*p)->next;
+}
+  if (had_locs && v->locs == 0)
+{
+  n_useless_values++;
+}
+}
+/* { dg-final { scan-tree-dump-times "n_useless_values" 2 "evrp" } } */
 
Index: tree.c
===
--- tree.c  (revision 265807)
+++ tree.c  (working copy)
@@ -5118,6 +5118,7 @@ fld_type_variant (tree first, tree t, st
   TYPE_ADDR_SPACE (v) = TYPE_ADDR_SPACE (t);
   TYPE_NAME (v) = TYPE_NAME (t);
   TYPE_ATTRIBUTES (v) = TYPE_ATTRIBUTES (t);
+  TYPE_CANONICAL (v) = TYPE_CANONICAL (t);
   add_tree_to_fld_list (v, fld);
   return v;
 }
@@ -5146,6 +5147,8 @@ fld_incomplete_type_of (tree t, struct f
  else
first = build_reference_type_for_mode (t2, TYPE_MODE (t),
TYPE_REF_CAN_ALIAS_ALL (t));
+ gcc_assert (TYPE_CANONICAL (t2) != t2
+ && TYPE_CANONICAL (t2) == TYPE_CANONICAL (TREE_TYPE (t)));
  add_tree_to_fld_list (first, fld);
  return fld_type_variant (first, t, fld);
}
@@ -5169,6 +5174,7 @@ fld_incomplete_type_of (tree t, struct f
  SET_TYPE_MODE (copy, VOIDmode);
  SET_TYPE_ALIGN (copy, BITS_PER_UNIT);
  TYPE_SIZE_UNIT (copy) = NULL;
+ TYPE_CANONICAL (copy) = TYPE_CANONICAL (t);
  if (AGGREGATE_TYPE_P (t))
{
  TYPE_FIELDS (copy) = NULL;
@@ -13880,7 +13893,8 @@ verify_type (const_tree t)
  with variably sized arrays because their sizes possibly
  gimplified to different variables.  */
   && !variably_modified_type_p (ct, NULL)
-  && !gimple_canonical_types_compatible_p (t, ct, false))
+  && !gimple_canonical_types_compatible_p (t, ct, false)
+  && COMPLETE_TYPE_P (t))
 {
   error ("TYPE_CANONICAL is not compatible");
   debug_tree (ct);


[PATCH] x86: Optimize VFIXUPIMM* patterns with multiple-alternative constraints

2018-11-06 Thread Wei Xiao
Hi maintainers,

The attached patch optimizes the VFIXUPIMM* patterns by using
multiple-alternative constraints, which combines 4 patterns into 2.
Tested with bootstrap and regression tests on x86_64. No regressions.

Is it OK for trunk?

Thanks,
Wei


opt-vfixupimm-v1.diff
Description: Binary data


[libsanitizer] Enable libsanitizer on Solaris (PR sanitizer/80953)

2018-11-06 Thread Rainer Orth
Now that the Solaris sanitizer port (asan and ubsan only for now) has
been upstream for a while and some last issues fixed, I'd finally like
to enable the sanitizers on both sparc and x86.

The asan port is 32-bit only for now (on sparc because 64-bit
Solaris/SPARC uses the full address space with a large virtual address
hole in the middle whose exact location is machine-dependent and not
easily determined at runtime; on x86 the situation was the same before
Solaris 11.4).
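
For context, the shadow offset matters because ASan locates the shadow byte
for an address with a fixed linear mapping, so the layout of the address
space determines whether a usable offset exists. A minimal sketch of that
mapping follows. The offset value below is an assumption chosen for
illustration only; it is not the value returned by the
TARGET_ASAN_SHADOW_OFFSET implementation in this patch, and mem_to_shadow is
a made-up name.

```cpp
#include <cassert>
#include <cstdint>

// ASan maps 8 bytes of application memory to 1 shadow byte.
constexpr unsigned kShadowScale = 3;

// Assumed offset, for illustration only; the real value is target-specific.
constexpr std::uint64_t kExampleShadowOffset = 0x20000000ULL;

// Shadow = (Mem >> 3) + Offset
constexpr std::uint64_t
mem_to_shadow (std::uint64_t addr,
               std::uint64_t offset = kExampleShadowOffset)
{
  return (addr >> kShadowScale) + offset;
}
```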

The patch itself is pretty trivial, mostly duplicating what Linux does.

The only parts I need approval for are

* The SPARC implementation of TARGET_ASAN_SHADOW_OFFSET, so far only
  tested on 32-bit Solaris/SPARC.

* The change to c-c++-common/asan/alloca_loop_unpoisoning.c is necessary
  since Solaris doesn't have an alloca prototype by default:

FAIL: c-c++-common/asan/alloca_loop_unpoisoning.c   -O0  (test for excess errors)
WARNING: c-c++-common/asan/alloca_loop_unpoisoning.c   -O0  compilation failed to produce executable

Excess errors:
/vol/gcc/src/hg/trunk/solaris-asan/gcc/testsuite/c-c++-common/asan/alloca_loop_unpoisoning.c:19:3: warning: implicit declaration of function 'alloca' [-Wimplicit-function-declaration]
/vol/gcc/src/hg/trunk/solaris-asan/gcc/testsuite/c-c++-common/asan/alloca_loop_unpoisoning.c:19:3: warning: incompatible implicit declaration of built-in function 'alloca'

  This isn't an issue on Linux where glibc <stdlib.h> includes
  <alloca.h> which defines alloca as __builtin_alloca.

* I'm only enabling libsanitizer on Solaris 11 right now: on Solaris 10
  there are a couple of libsanitizer compilation failures and I don't
  know if I have much time or inclination to fix them only to rip the
  changes out again in GCC 10.
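
To illustrate the alloca point above: calling the builtin directly avoids
any dependence on a header providing a prototype. This is only a sketch of
the idea, not the code from alloca_loop_unpoisoning.c; the function name
sum_first_n is made up, and __builtin_alloca is assumed to be available
(GCC and Clang provide it).

```cpp
#include <cassert>

// Hypothetical helper, not taken from the testcase.
// Uses the compiler builtin directly, so no alloca prototype is required.
int
sum_first_n (int n)
{
  int total = 0;
  for (int i = 1; i <= n; ++i)
    {
      // Stack allocation; the memory is released when the function returns.
      int *slot = static_cast<int *> (__builtin_alloca (sizeof (int)));
      *slot = i;
      total += *slot;
    }
  return total;
}
```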

Tested on i386-pc-solaris2.11 and sparc-sun-solaris2.11.  x86 testsuite
results are decent with only c-c++-common/asan/swapcontext-test-1.c
failing and g++.dg/asan/default-options-1.C failing on Solaris 11.4+
(11.3 is fine; I suspect this is just another instance of PR c++/52477).
On sparc there are quite a few more failures, which I intend to investigate
subsequently.

Ok for mainline?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2018-10-31  Rainer Orth  

gcc:
PR sanitizer/80953
* config/sol2.h (ASAN_CC1_SPEC): Define.
(LD_WHOLE_ARCHIVE_OPTION): Define.
(LD_NO_WHOLE_ARCHIVE_OPTION): Define.
(ASAN_REJECT_SPEC): Provide default.
(LIBASAN_EARLY_SPEC): Define.
(LIBTSAN_EARLY_SPEC): Define.
(LIBLSAN_EARLY_SPEC): Define.
* config/i386/sol2.h (CC1_SPEC): Redefine.
(ASAN_REJECT_SPEC): Define.

* config/sparc/sparc.c (sparc_asan_shadow_offset): Declare.
(TARGET_ASAN_SHADOW_OFFSET): Define.
(sparc_asan_shadow_offset): New function.
* config/sparc/sol2.h (CC1_SPEC): Append ASAN_CC1_SPEC.
(ASAN_REJECT_SPEC): Define.

gcc/testsuite:
PR sanitizer/80953
* c-c++-common/asan/alloca_loop_unpoisoning.c: Require alloca
support.
(foo): Use __builtin_alloca.

libsanitizer:
PR sanitizer/80953
* configure.tgt (sparc*-*-solaris2.11*): Enable.
(x86_64-*-solaris2.11* | i?86-*-solaris2.11*): Enable.

# HG changeset patch
# Parent  7a0019c2c3d32fe0c18f4708f0099e2816f05fa6
Enable libsanitizer on Solaris (PR sanitizer/80953)

diff --git a/gcc/config/i386/sol2.h b/gcc/config/i386/sol2.h
--- a/gcc/config/i386/sol2.h
+++ b/gcc/config/i386/sol2.h
@@ -54,6 +54,9 @@ along with GCC; see the file COPYING3.  
 #undef CPP_SPEC
 #define CPP_SPEC "%(cpp_subtarget)"
 
+#undef CC1_SPEC
+#define CC1_SPEC "%(cc1_cpu) " ASAN_CC1_SPEC
+
 /* GNU as understands --32 and --64, but the native Solaris
assembler requires -xarch=generic or -xarch=generic64 instead.  */
 #ifdef USE_GAS
@@ -241,6 +244,10 @@ along with GCC; see the file COPYING3.  
 #define LARGECOMM_SECTION_ASM_OP "\t.lbcomm\t"
 #endif
 
+/* -fsanitize=address is currently only supported for 32-bit.  */
+#define ASAN_REJECT_SPEC \
+  DEF_ARCH64_SPEC("%e:-fsanitize=address is not supported in this configuration")
+
 #define USE_IX86_FRAME_POINTER 1
 #define USE_X86_64_FRAME_POINTER 1
 
diff --git a/gcc/config/sol2.h b/gcc/config/sol2.h
--- a/gcc/config/sol2.h
+++ b/gcc/config/sol2.h
@@ -138,6 +138,9 @@ along with GCC; see the file COPYING3.  
 #define DEF_ARCH64_SPEC(__str) "%{!m32:" __str "}"
 #endif
 
+/* Solaris needs -fasynchronous-unwind-tables to generate unwind info.  */
+#define ASAN_CC1_SPEC "%{%:sanitize(address):-fasynchronous-unwind-tables}"
+
 /* It's safe to pass -s always, even if -g is not used.  Those options are
handled by both Sun as and GNU as.  */
 #define ASM_SPEC_BASE \
@@ -231,6 +234,36 @@ along with GCC; see the file COPYING3.  
 #define ENDFILE_VTV_SPEC ""
 #endif /* !ENABLE_VTABLE_VERIFY */
 
+/* Link -lasan early on the command line.  For -static-libasan, 

Re: [libsanitizer] Cherry-pick Solaris sanitizer fixes (PR sanitizer/80953)

2018-11-06 Thread Jakub Jelinek
On Tue, Nov 06, 2018 at 10:53:12AM +0100, Rainer Orth wrote:
> As a prerequisite for the soon-to-be-submitted Solaris libsanitizer
> patch, I'd like to cherry-pick two fixes recently installed upstream.
> 
> Tested on i386-pc-solaris2.11 and sparc-sun-solaris2.11.  Ok for
> mainline?

Ok.

Jakub


[libsanitizer] Cherry-pick Solaris sanitizer fixes (PR sanitizer/80953)

2018-11-06 Thread Rainer Orth
As a prerequisite for the soon-to-be-submitted Solaris libsanitizer
patch, I'd like to cherry-pick two fixes recently installed upstream.

Tested on i386-pc-solaris2.11 and sparc-sun-solaris2.11.  Ok for
mainline?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2018-10-31  Rainer Orth  

PR sanitizer/80953
* sanitizer_common/sanitizer_internal_defs.h,
sanitizer_common/sanitizer_platform_limits_solaris.h,
sanitizer_common/sanitizer_procmaps_solaris.cc,
sanitizer_common/sanitizer_solaris.cc: Cherry-pick compiler-rt
revision 346153.
* sanitizer_common/sanitizer_stacktrace.h,
sanitizer_common/sanitizer_stacktrace_sparc.cc: Cherry-pick
compiler-rt revision 346155.

# HG changeset patch
# Parent  b62a13c742d5ca0d2bb80184916b2b24cd558d0c
Cherry-pick Solaris sanitizer fixes

diff --git a/libsanitizer/sanitizer_common/sanitizer_internal_defs.h b/libsanitizer/sanitizer_common/sanitizer_internal_defs.h
--- a/libsanitizer/sanitizer_common/sanitizer_internal_defs.h
+++ b/libsanitizer/sanitizer_common/sanitizer_internal_defs.h
@@ -170,6 +170,7 @@ typedef int pid_t;
 
 #if SANITIZER_FREEBSD || SANITIZER_NETBSD || \
 SANITIZER_OPENBSD || SANITIZER_MAC || \
+(SANITIZER_SOLARIS && (defined(_LP64) || _FILE_OFFSET_BITS == 64)) || \
 (SANITIZER_LINUX && defined(__x86_64__))
 typedef u64 OFF_T;
 #else
diff --git a/libsanitizer/sanitizer_common/sanitizer_platform_limits_solaris.h b/libsanitizer/sanitizer_common/sanitizer_platform_limits_solaris.h
--- a/libsanitizer/sanitizer_common/sanitizer_platform_limits_solaris.h
+++ b/libsanitizer/sanitizer_common/sanitizer_platform_limits_solaris.h
@@ -208,8 +208,7 @@ struct __sanitizer_cmsghdr {
   int cmsg_type;
 };
 
-#if SANITIZER_SOLARIS32 && 0
-// FIXME: need to deal with large file and non-large file cases
+#if SANITIZER_SOLARIS && (defined(_LP64) || _FILE_OFFSET_BITS == 64)
 struct __sanitizer_dirent {
   unsigned long long d_ino;
   long long d_off;
diff --git a/libsanitizer/sanitizer_common/sanitizer_procmaps_solaris.cc b/libsanitizer/sanitizer_common/sanitizer_procmaps_solaris.cc
--- a/libsanitizer/sanitizer_common/sanitizer_procmaps_solaris.cc
+++ b/libsanitizer/sanitizer_common/sanitizer_procmaps_solaris.cc
@@ -13,6 +13,8 @@
 #include "sanitizer_common.h"
 #include "sanitizer_procmaps.h"
 
+// Before Solaris 11.4, <procfs.h> doesn't work in a largefile environment.
+#undef _FILE_OFFSET_BITS
 #include <procfs.h>
 #include <limits.h>
 
diff --git a/libsanitizer/sanitizer_common/sanitizer_solaris.cc b/libsanitizer/sanitizer_common/sanitizer_solaris.cc
--- a/libsanitizer/sanitizer_common/sanitizer_solaris.cc
+++ b/libsanitizer/sanitizer_common/sanitizer_solaris.cc
@@ -48,10 +48,21 @@ namespace __sanitizer {
   DECLARE__REAL(ret_type, func, __VA_ARGS__); \
   ret_type internal_ ## func(__VA_ARGS__)
 
+#if !defined(_LP64) && _FILE_OFFSET_BITS == 64
+#define _REAL64(func) _ ## func ## 64
+#else
+#define _REAL64(func) _REAL(func)
+#endif
+#define DECLARE__REAL64(ret_type, func, ...) \
+  extern "C" ret_type _REAL64(func)(__VA_ARGS__)
+#define DECLARE__REAL_AND_INTERNAL64(ret_type, func, ...) \
+  DECLARE__REAL64(ret_type, func, __VA_ARGS__); \
+  ret_type internal_ ## func(__VA_ARGS__)
+
 // -- sanitizer_libc.h
-DECLARE__REAL_AND_INTERNAL(uptr, mmap, void *addr, uptr /*size_t*/ length,
-   int prot, int flags, int fd, OFF_T offset) {
-  return (uptr)_REAL(mmap)(addr, length, prot, flags, fd, offset);
+DECLARE__REAL_AND_INTERNAL64(uptr, mmap, void *addr, uptr /*size_t*/ length,
+ int prot, int flags, int fd, OFF_T offset) {
+  return (uptr)_REAL64(mmap)(addr, length, prot, flags, fd, offset);
 }
 
 DECLARE__REAL_AND_INTERNAL(uptr, munmap, void *addr, uptr length) {
@@ -66,14 +77,14 @@ DECLARE__REAL_AND_INTERNAL(uptr, close, 
   return _REAL(close)(fd);
 }
 
-extern "C" int _REAL(open)(const char *, int, ...);
+extern "C" int _REAL64(open)(const char *, int, ...);
 
 uptr internal_open(const char *filename, int flags) {
-  return _REAL(open)(filename, flags);
+  return _REAL64(open)(filename, flags);
 }
 
 uptr internal_open(const char *filename, int flags, u32 mode) {
-  return _REAL(open)(filename, flags, mode);
+  return _REAL64(open)(filename, flags, mode);
 }
 
 uptr OpenFile(const char *filename, bool write) {
@@ -94,16 +105,16 @@ DECLARE__REAL_AND_INTERNAL(uptr, ftrunca
   return ftruncate(fd, size);
 }
 
-DECLARE__REAL_AND_INTERNAL(uptr, stat, const char *path, void *buf) {
-  return _REAL(stat)(path, (struct stat *)buf);
+DECLARE__REAL_AND_INTERNAL64(uptr, stat, const char *path, void *buf) {
+  return _REAL64(stat)(path, (struct stat *)buf);
 }
 
-DECLARE__REAL_AND_INTERNAL(uptr, lstat, const char *path, void *buf) {
-  return _REAL(lstat)(path, (struct stat *)buf);
+DECLARE__REAL_AND_INTERNAL64(uptr, lstat, const char *path, void *buf) {
+  

PR libstdc++/87872 Avoids iterator transfer on self splice

2018-11-06 Thread François Dumont

Here is the patch submitted by John and now fully tested.

    PR libstdc++/87872
    * include/debug/safe_sequence.tcc
    (_Safe_sequence<>::_M_transfer_from_if): Skip transfer to self.

Is it fine to commit it now?
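
For reference, this is the kind of self splice I understand the PR to be
about: the source and destination of the splice are the same list, so the
iterators involved never leave their container. The helper name
rotate_last_to_front is made up for this sketch, and it assumes a non-empty
list.

```cpp
#include <cassert>
#include <list>

// Hypothetical helper: move the last element of a non-empty list to the
// front by splicing the list into itself.
int
rotate_last_to_front (std::list<int> &l)
{
  std::list<int>::iterator last = --l.end ();
  // Source and destination are the same list (a self splice).
  l.splice (l.begin (), l, last);
  // 'last' stays valid and now refers to the first element.
  return *last;
}
```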

François

diff --git a/libstdc++-v3/include/debug/safe_sequence.tcc b/libstdc++-v3/include/debug/safe_sequence.tcc
index 12de48cf322..ce9a807e79f 100644
--- a/libstdc++-v3/include/debug/safe_sequence.tcc
+++ b/libstdc++-v3/include/debug/safe_sequence.tcc
@@ -68,6 +68,9 @@ namespace __gnu_debug
   _Safe_sequence<_Sequence>::
   _M_transfer_from_if(_Safe_sequence& __from, _Predicate __pred)
   {
+	if (this == std::__addressof(__from))
+	  return;
+
 	typedef typename _Sequence::iterator iterator;
 	typedef typename _Sequence::const_iterator const_iterator;
 


Optimization of std::deque implementation

2018-11-06 Thread François Dumont
Here is a patch similar to what has been done to other containers like 
std::vector.


It optimizes the swap operation, the allocator extended move constructor 
and some other methods where I just bypass intermediate calls.


I reproduced the noexcept qualifications on _Deque_impl_data/_Deque_impl 
even though they are not necessary as long as std::deque always allocates 
on instantiation.


I also removed some _M_map checks as, once again, it can't be null for 
the moment. I just kept the one in the destructor as I'd like to commit 
a patch after this one to avoid allocations on move which will make this 
check necessary.
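
A rough sketch of the layout idea, using toy types rather than the real
libstdc++ ones: the plain data members live in their own struct so they can
be swapped as a whole, while the allocator stays a separate base class. All
names below (impl_data, toy_alloc, impl, swap_data) are illustrative
assumptions, not the names used in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Toy stand-in for the data part of the container implementation.
struct impl_data
{
  int *map = nullptr;
  std::size_t map_size = 0;

  void swap_data (impl_data &other) noexcept
  {
    // Swap the whole struct at once rather than member by member.
    std::swap (*this, other);
  }
};

// Toy stand-in for an empty allocator type.
struct toy_alloc {};

// The allocator is a base class so that an empty allocator typically takes
// no extra space; the data is a second base that can be swapped on its own.
struct impl : toy_alloc, impl_data {};
```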


Tested under Linux x86_64 Default and C++98 modes, w/o version namespace.

Is there still time to commit?

François

diff --git a/libstdc++-v3/include/bits/stl_deque.h b/libstdc++-v3/include/bits/stl_deque.h
index 8e4defbcf26..c49b6852324 100644
--- a/libstdc++-v3/include/bits/stl_deque.h
+++ b/libstdc++-v3/include/bits/stl_deque.h
@@ -398,7 +398,6 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 	_Map_alloc_type;
   typedef __gnu_cxx::__alloc_traits<_Map_alloc_type> _Map_alloc_traits;
 
-public:
   typedef _Alloc		  allocator_type;
 
   allocator_type
@@ -409,11 +408,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   typedef _Deque_iterator<_Tp, const _Tp&, _Ptr_const>   const_iterator;
 
   _Deque_base()
-  : _M_impl()
   { _M_initialize_map(0); }
 
   _Deque_base(size_t __num_elements)
-  : _M_impl()
   { _M_initialize_map(__num_elements); }
 
   _Deque_base(const allocator_type& __a, size_t __num_elements)
@@ -425,6 +422,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   { /* Caller must initialize map. */ }
 
 #if __cplusplus >= 201103L
+private:
   _Deque_base(_Deque_base&& __x, false_type)
   : _M_impl(__x._M_move_impl())
   { }
@@ -433,84 +431,100 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   : _M_impl(std::move(__x._M_get_Tp_allocator()))
   {
 	_M_initialize_map(0);
-	if (__x._M_impl._M_map)
 	this->_M_impl._M_swap_data(__x._M_impl);
   }
 
+protected:
   _Deque_base(_Deque_base&& __x)
   : _Deque_base(std::move(__x), typename _Alloc_traits::is_always_equal{})
   { }
 
+  _Deque_base(_Deque_base&& __x, const allocator_type& __a)
+  : _M_impl(std::move(__x._M_impl), _Tp_alloc_type(__a))
+  { __x._M_initialize_map(0); }
+
   _Deque_base(_Deque_base&& __x, const allocator_type& __a, size_t __n)
   : _M_impl(__a)
   {
 	if (__x.get_allocator() == __a)
-	  {
-	if (__x._M_impl._M_map)
 	  {
 	_M_initialize_map(0);
 	this->_M_impl._M_swap_data(__x._M_impl);
 	  }
-	  }
 	else
-	  {
 	  _M_initialize_map(__n);
   }
-  }
 #endif
 
   ~_Deque_base() _GLIBCXX_NOEXCEPT;
 
-protected:
   typedef typename iterator::_Map_pointer _Map_pointer;
 
-  //This struct encapsulates the implementation of the std::deque
-  //standard container and at the same time makes use of the EBO
-  //for empty allocators.
-  struct _Deque_impl
-  : public _Tp_alloc_type
+  struct _Deque_impl_data
   {
 	_Map_pointer _M_map;
 	size_t _M_map_size;
 	iterator _M_start;
 	iterator _M_finish;
 
-	_Deque_impl()
-	: _Tp_alloc_type(), _M_map(), _M_map_size(0),
-	  _M_start(), _M_finish()
+	_Deque_impl_data() _GLIBCXX_NOEXCEPT
+	: _M_map(), _M_map_size(), _M_start(), _M_finish()
+	{ }
+
+#if __cplusplus >= 201103L
+	_Deque_impl_data(const _Deque_impl_data&) = default;
+	_Deque_impl_data&
+	operator=(const _Deque_impl_data&) = default;
+
+	_Deque_impl_data(_Deque_impl_data&& __x) noexcept
+	: _Deque_impl_data(__x)
+	{ __x = _Deque_impl_data(); }
+#endif
+
+	void
+	_M_swap_data(_Deque_impl_data& __x) _GLIBCXX_NOEXCEPT
+	{
+	  // Do not use std::swap(_M_start, __x._M_start), etc as it loses
+	  // information used by TBAA.
+	  std::swap(*this, __x);
+	}
+  };
+
+  // This struct encapsulates the implementation of the std::deque
+  // standard container and at the same time makes use of the EBO
+  // for empty allocators.
+  struct _Deque_impl
+  : public _Tp_alloc_type, public _Deque_impl_data
+  {
+	_Deque_impl() _GLIBCXX_NOEXCEPT_IF(
+	  is_nothrow_default_constructible<_Tp_alloc_type>::value)
+	: _Tp_alloc_type()
 	{ }
 
 	_Deque_impl(const _Tp_alloc_type& __a) _GLIBCXX_NOEXCEPT
-	: _Tp_alloc_type(__a), _M_map(), _M_map_size(0),
-	  _M_start(), _M_finish()
+	: _Tp_alloc_type(__a)
 	{ }
 
 #if __cplusplus >= 201103L
 	_Deque_impl(_Deque_impl&&) = default;
 
 	_Deque_impl(_Tp_alloc_type&& __a) noexcept
-	: _Tp_alloc_type(std::move(__a)), _M_map(), _M_map_size(0),
-	  _M_start(), _M_finish()
+	: _Tp_alloc_type(std::move(__a))
 	{ }
-#endif
 
-	void _M_swap_data(_Deque_impl& __x) _GLIBCXX_NOEXCEPT
-	{
-	  using std::swap;
-	  swap(this->_M_start, __x._M_start);
-	  swap(this->_M_finish, __x._M_finish);
-	  swap(this->_M_map, __x._M_map);
-	  swap(this->_M_map_size, __x._M_map_size);
-	}
+	_Deque_impl(_Deque_impl&& __d, _Tp_alloc_type&& __a)
+	: 

[PR87874] avoid const-wide-int subreg in LRA

2018-11-06 Thread Alexandre Oliva
Just like CONST_INT, CONST_WIDE_INT is VOIDmode, so LRA might be
tempted to build a SUBREG to "convert" it to the wanted mode.  That's
no use.  Test for CONST_SCALAR_INT_P instead of CONST_INT_P so that we
skip the subreg creation for both.
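
To make the predicate change concrete, here is a toy model of the check. It
is not GCC's RTL: the types and names (toy_rtx, toy_code, toy_mode,
wants_subreg_old, wants_subreg_new) are invented for this sketch, and the
SCALAR_INT_MODE_P part of the real condition is left out because every toy
mode here is a scalar integer mode.

```cpp
#include <cassert>

// Toy model, not GCC's RTL: just enough to show the predicate change.
enum toy_code { TOY_REG, TOY_CONST_INT, TOY_CONST_WIDE_INT };
enum toy_mode { TOY_VOIDmode, TOY_DImode, TOY_TImode };

struct toy_rtx
{
  toy_code code;
  toy_mode mode;
};

static bool
toy_const_int_p (const toy_rtx &x)
{ return x.code == TOY_CONST_INT; }

static bool
toy_const_scalar_int_p (const toy_rtx &x)
{ return x.code == TOY_CONST_INT || x.code == TOY_CONST_WIDE_INT; }

// Old check: only CONST_INT is exempt from subreg wrapping.
bool
wants_subreg_old (toy_mode mode, const toy_rtx &new_reg)
{ return mode != new_reg.mode && !toy_const_int_p (new_reg); }

// New check: both kinds of mode-less integer constant are exempt.
bool
wants_subreg_new (toy_mode mode, const toy_rtx &new_reg)
{ return mode != new_reg.mode && !toy_const_scalar_int_p (new_reg); }
```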

Regstrapped on x86_64- and i686-linux-gnu.  Ok to install?

for  gcc/ChangeLog

PR rtl-optimization/87874
* lra.c (lra_substitute_pseudo): Do not create a subreg for
const wide ints.

for  gcc/testsuite/ChangeLog

PR rtl-optimization/87874
* gcc.dg/pr87874.c: New.
---
 gcc/lra.c  |2 +-
 gcc/testsuite/gcc.dg/pr87874.c |   35 +++
 2 files changed, 36 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr87874.c

diff --git a/gcc/lra.c b/gcc/lra.c
index aa768fb2a231..5d58d90f3a6b 100644
--- a/gcc/lra.c
+++ b/gcc/lra.c
@@ -1961,7 +1961,7 @@ lra_substitute_pseudo (rtx *loc, int old_regno, rtx new_reg, bool subreg_p,
   machine_mode inner_mode = GET_MODE (new_reg);
 
   if (mode != inner_mode
- && ! (CONST_INT_P (new_reg) && SCALAR_INT_MODE_P (mode)))
+ && ! (CONST_SCALAR_INT_P (new_reg) && SCALAR_INT_MODE_P (mode)))
{
  poly_uint64 offset = 0;
  if (partial_subreg_p (mode, inner_mode)
diff --git a/gcc/testsuite/gcc.dg/pr87874.c b/gcc/testsuite/gcc.dg/pr87874.c
new file mode 100644
index ..3ab5dcf68ffb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr87874.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-g -O1 -fgcse -fno-dce -fno-tree-ccp -fno-tree-coalesce-vars -fno-tree-copy-prop -fno-tree-dce -fno-tree-dominator-opts -fno-tree-fre -fno-tree-loop-optimize -fno-tree-sink" } */
+
+int *vk;
+int m2;
+#if __SIZEOF_INT128__
+__int128 nb;
+
+void
+em (int u5, int fo, int s7)
+{
+  for (;;)
+{
+  long int es;
+
+  es = !!u5 ? (!!fo && !!m2) : fo;
+  if (es == 0)
+if (nb == *vk)
+  {
+const unsigned long int uint64_max = 18446744073709551615ul;
+__int128 ks = uint64_max / 2 + 1;
+
+while (s7 < 1)
+  while (nb < 2)
+{
+  for (ks = 0; ks < 3; ++ks)
+{
+}
+
+  ++nb;
+}
+  }
+}
+}
+#endif

-- 
Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain Engineer                Free Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás-GNUChe


[C++ Patch] Improve compute_array_index_type locations

2018-11-06 Thread Paolo Carlini

Hi,

when I improved create_array_type_for_decl I didn't notice that it calls 
compute_array_index_type as helper, which simply needs to have the 
location information propagated. Tested x86_64-linux.
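
The shape of the change is a worker that takes the location explicitly plus
the old entry point forwarding to it. A toy sketch of that pattern follows;
it does not use GCC's location_t or diagnostics machinery, and every name in
it (toy_location, toy_input_location, describe_array_size_loc,
describe_array_size) is made up for illustration.

```cpp
#include <cassert>
#include <string>

// Toy stand-ins; GCC's location_t and diagnostics are not modelled.
typedef unsigned toy_location;
const toy_location toy_input_location = 0;  // fallback "current" location

// New worker: takes the location explicitly.
std::string
describe_array_size_loc (toy_location loc, const std::string &name, long size)
{
  if (size < 0)
    return std::to_string (loc) + ": size of array '" + name
           + "' is negative";
  return std::string ();
}

// Existing entry point keeps its signature and forwards to the worker.
std::string
describe_array_size (const std::string &name, long size)
{
  return describe_array_size_loc (toy_input_location, name, size);
}
```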


Thanks, Paolo.



/cp
2018-11-06  Paolo Carlini  

* decl.c (compute_array_index_type_loc): New, like the current
compute_array_index_type but takes a location_t too.
(compute_array_index_type): Forward to the latter.
(create_array_type_for_decl): Use compute_array_index_type_loc.

/testsuite
2018-11-06  Paolo Carlini  

* g++.dg/cpp0x/constexpr-47969.C: Test locations too.
* g++.dg/cpp0x/constexpr-48324.C: Likewise.
* g++.dg/cpp0x/constexpr-ex2.C: Likewise.
* g++.dg/cpp0x/scoped_enum2.C: Likewise.
* g++.dg/cpp1y/pr63996.C: Likewise.
* g++.dg/ext/constexpr-vla5.C: Likewise.
* g++.dg/ext/stmtexpr15.C: Likewise.
* g++.dg/ext/vla1.C: Likewise.
* g++.dg/other/fold1.C: Likewise.
* g++.dg/parse/array-size2.C: Likewise.
* g++.dg/parse/crash36.C: Likewise.
* g++.dg/ubsan/pr81530.C: Likewise.
* g++.dg/warn/Wvla-1.C: Likewise.
* g++.dg/warn/Wvla-2.C: Likewise.
* g++.old-deja/g++.brendan/array1.C: Likewise.
* g++.old-deja/g++.bugs/900402_02.C: Likewise.
* g++.old-deja/g++.law/init3.C: Likewise.
* g++.old-deja/g++.mike/p6149.C: Likewise.

Index: cp/decl.c
===
--- cp/decl.c   (revision 265826)
+++ cp/decl.c   (working copy)
@@ -9621,8 +9621,9 @@ fold_sizeof_expr (tree t)
an appropriate index type for the array.  If non-NULL, NAME is
the name of the entity being declared.  */
 
-tree
-compute_array_index_type (tree name, tree size, tsubst_flags_t complain)
+static tree
+compute_array_index_type_loc (location_t loc, tree name, tree size,
+ tsubst_flags_t complain)
 {
   tree itype;
   tree osize = size;
@@ -9658,7 +9659,8 @@ fold_sizeof_expr (tree t)
  if (!(complain & tf_error))
return error_mark_node;
  if (name)
-   error ("size of array %qD has non-integral type %qT", name, type);
+   error_at (loc, "size of array %qD has non-integral type %qT",
+ name, type);
  else
error ("size of array has non-integral type %qT", type);
  size = integer_one_node;
@@ -9689,8 +9691,14 @@ fold_sizeof_expr (tree t)
 {
   tree folded = cp_fully_fold (size);
   if (TREE_CODE (folded) == INTEGER_CST)
-   pedwarn (input_location, OPT_Wpedantic,
-"size of array is not an integral constant-expression");
+   {
+ if (name)
+   pedwarn (loc, OPT_Wpedantic, "size of array %qD is not an "
+"integral constant-expression", name);
+ else
+   pedwarn (input_location, OPT_Wpedantic,
+"size of array is not an integral constant-expression");
+   }
   /* Use the folded result for VLAs, too; it will have resolved
 SIZEOF_EXPR.  */
   size = folded;
@@ -9706,7 +9714,7 @@ fold_sizeof_expr (tree t)
return error_mark_node;
 
  if (name)
-   error ("size of array %qD is negative", name);
+   error_at (loc, "size of array %qD is negative", name);
  else
error ("size of array is negative");
  size = integer_one_node;
@@ -9722,9 +9730,11 @@ fold_sizeof_expr (tree t)
  else if (in_system_header_at (input_location))
/* Allow them in system headers because glibc uses them.  */;
  else if (name)
-   pedwarn (input_location, OPT_Wpedantic, "ISO C++ forbids zero-size array %qD", name);
+   pedwarn (loc, OPT_Wpedantic,
+"ISO C++ forbids zero-size array %qD", name);
  else
-   pedwarn (input_location, OPT_Wpedantic, "ISO C++ forbids zero-size array");
+   pedwarn (input_location, OPT_Wpedantic,
+"ISO C++ forbids zero-size array");
}
 }
   else if (TREE_CONSTANT (size)
@@ -9737,8 +9747,9 @@ fold_sizeof_expr (tree t)
return error_mark_node;
   /* `(int) ' is not a valid array bound.  */
   if (name)
-   error ("size of array %qD is not an integral constant-expression",
-  name);
+   error_at (loc,
+ "size of array %qD is not an integral constant-expression",
+ name);
   else
error ("size of array is not an integral constant-expression");
   size = integer_one_node;
@@ -9746,15 +9757,17 @@ fold_sizeof_expr (tree t)
   else if (pedantic && warn_vla != 0)
 {
   if (name)
-   pedwarn (input_location, OPT_Wvla, "ISO C++ forbids variable length array %qD", name);
+   pedwarn (loc, OPT_Wvla,
+"ISO C++ forbids variable length array %qD", name);
   else
-   pedwarn 

Re: Reset more langhooks at free lang data

2018-11-06 Thread Richard Biener
On Mon, 5 Nov 2018, Jan Hubicka wrote:

> Hi,
> this patch resets some of the frontend langhooks that I think should be
> completely handled by the middle-end now.  I will also make a patch to rewrite
> set_assembler_name in some way that is safe; the comment below
> the hunk added is a bit oversimplified because we do add external
> symbols (such as ctors and dtors or profile stuff).
> 
> The patch fixes the ICE in var_mod_type_p with obj-c++.  The other
> hooks were changed more as a precaution. We want to detach the FE
> as much as possible from the middle-end.
> 
> lto-Bootstrapped/regtested x86_64-linux, OK?

OK.

I wonder if we can do some of this resetting unconditionally.
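
As a toy picture of what the resetting amounts to: the hooks are function
pointers in a table, and once the front end is no longer needed they are
pointed back at language-independent defaults. Nothing below is the real
lang_hooks structure; toy_hooks, toy_lang_hooks, default_print_nothing,
frontend_print_type and toy_free_lang_data are names invented for this
sketch.

```cpp
#include <cassert>

// Toy hook table; the real lang_hooks structure is much larger.
struct toy_hooks
{
  int (*print_type) (int);
};

// Language-independent default: does nothing.
static int default_print_nothing (int) { return 0; }

// Pretend front-end specific hook.
static int frontend_print_type (int t) { return t + 1; }

toy_hooks toy_lang_hooks = { frontend_print_type };

// Once front-end data is freed, point the hook back at the default.
void
toy_free_lang_data ()
{
  toy_lang_hooks.print_type = default_print_nothing;
}
```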

Richard.

> Honza
> 
>   * tree.c (free_lang_data): Reset overwite_assembler_name,
>   print_xnode, print_decl, print_type and print_identifier of
>   langhooks.
> Index: tree.c
> ===
> --- tree.c(revision 265807)
> +++ tree.c(working copy)
> @@ -6010,6 +6016,13 @@ free_lang_data (void)
>lang_hooks.dwarf_name = lhd_dwarf_name;
>lang_hooks.decl_printable_name = gimple_decl_printable_name;
>lang_hooks.gimplify_expr = lhd_gimplify_expr;
> +  lang_hooks.overwrite_decl_assembler_name = lhd_overwrite_decl_assembler_name;
> +  lang_hooks.print_xnode = lhd_print_tree_nothing;
> +  lang_hooks.print_decl = lhd_print_tree_nothing;
> +  lang_hooks.print_type = lhd_print_tree_nothing;
> +  lang_hooks.print_identifier = lhd_print_tree_nothing;
> +
> +  lang_hooks.tree_inlining.var_mod_type_p = hook_bool_tree_tree_false;
>  
>/* We do not want the default decl_assembler_name implementation,
>   rather if we have fixed everything we want a wrapper around it
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)