date:20180413

Re: [PATCH] [PR c++/85039] no type definitions in builtin offsetof

2018-04-13 Thread Jason Merrill

On Fri, Apr 13, 2018, 6:09 PM Alexandre Oliva  wrote:

> On Apr 11, 2018, Jason Merrill  wrote:
>
> > I raised this issue with the C++ committee, and it seems that nobody
> > expects defining a type here to work.  So let's go back to your first
> > patch, removing the offending part of semicolon3.C.
>
> All right, here's the adjusted patch.  Retested on i686- and
> x86_64-linux-gnu.  Ok to install?
>

Ok.

Jason

[PR c++/85039] no type definitions in builtin offsetof
>
> Types defined within a __builtin_offsetof argument don't always get
> properly recorded as members of their context types, so if they're
> anonymous, we may fail to assign them an anon type index for mangling
> and ICE.
>
> We shouldn't allow types to be introduced in __builtin_offsetof, I
> think, and Jason says the std committee agrees, so I've arranged for
> us to reject them.
>
> Even then, we still parse the definitions and attempt to assign
> mangled names to its member functions, so the ICE remains.  Since
> we've already reported an error, we might as well complete the name
> assignment with an arbitrary index, thus avoiding the ICE.
>
> We used to have a test that expected to be able to define types in
> __builtin_offsetof; this patch removes that specific test.
>
>
> for  gcc/cp/ChangeLog
>
> PR c++/85039
> * parser.c (cp_parser_builtin_offset): Reject type definitions.
> * mangle.c (nested_anon_class_index): Avoid crash returning -1
> if we've seen errors.
>
> for  gcc/testsuite/ChangeLog
>
> PR c++/85039
> * g++.dg/pr85039-1.C: New.
> * g++.dg/pr85039-2.C: New.
> * g++.dg/parse/semicolon3.C: Remove test_offsetof.
> ---
>  gcc/cp/mangle.c |3 +++
>  gcc/cp/parser.c |8 +++-
>  gcc/testsuite/g++.dg/parse/semicolon3.C |7 ---
>  gcc/testsuite/g++.dg/pr85039-1.C|   17 +
>  gcc/testsuite/g++.dg/pr85039-2.C|   10 ++
>  5 files changed, 37 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/pr85039-1.C
>  create mode 100644 gcc/testsuite/g++.dg/pr85039-2.C
>
> diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c
> index 94c4bed28486..a7f9d686345d 100644
> --- a/gcc/cp/mangle.c
> +++ b/gcc/cp/mangle.c
> @@ -1623,6 +1623,9 @@ nested_anon_class_index (tree type)
>   ++index;
>}
>
> +  if (seen_error ())
> +return -1;
> +
>gcc_unreachable ();
>  }
>
> diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
> index 8b1b271b53d1..bf46165f5ae1 100644
> --- a/gcc/cp/parser.c
> +++ b/gcc/cp/parser.c
> @@ -9823,7 +9823,13 @@ cp_parser_builtin_offsetof (cp_parser *parser)
>parens.require_open (parser);
>/* Parse the type-id.  */
>location_t loc = cp_lexer_peek_token (parser->lexer)->location;
> -  type = cp_parser_type_id (parser);
> +  {
> +const char *saved_message = parser->type_definition_forbidden_message;
> +parser->type_definition_forbidden_message
> +  = G_("types may not be defined within __builtin_offsetof");
> +type = cp_parser_type_id (parser);
> +parser->type_definition_forbidden_message = saved_message;
> +  }
>/* Look for the `,'.  */
>cp_parser_require (parser, CPP_COMMA, RT_COMMA);
>token = cp_lexer_peek_token (parser->lexer);
> diff --git a/gcc/testsuite/g++.dg/parse/semicolon3.C
> b/gcc/testsuite/g++.dg/parse/semicolon3.C
> index 8a2b1ac46301..0d46be9ed654 100644
> --- a/gcc/testsuite/g++.dg/parse/semicolon3.C
> +++ b/gcc/testsuite/g++.dg/parse/semicolon3.C
> @@ -20,13 +20,6 @@ struct OK3
>  } // no complaints
>(s7);
>
> -__SIZE_TYPE__
> -test_offsetof (void)
> -{
> -  // no complaints about a missing semicolon
> -  return __builtin_offsetof (struct OK4 { int a; int b; }, b);
> -}
> -
>  struct OK5
>  {
>int a;
> diff --git a/gcc/testsuite/g++.dg/pr85039-1.C
> b/gcc/testsuite/g++.dg/pr85039-1.C
> new file mode 100644
> index ..f57c8a261dee
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/pr85039-1.C
> @@ -0,0 +1,17 @@
> +// { dg-do compile { target c++14 } }
> +
> +constexpr int a() {
> + return
> +  __builtin_offsetof(struct { // { dg-error "types may not be defined" }
> +int i;
> +short b {
> +  __builtin_offsetof(struct { // { dg-error "types may not be
> defined" }
> +   int j;
> +struct c { // { dg-error "types may not be defined" }
> +  void d() {
> +  }
> +};
> +  }, j)
> +};
> +  }, i);
> +}
> diff --git a/gcc/testsuite/g++.dg/pr85039-2.C
> b/gcc/testsuite/g++.dg/pr85039-2.C
> new file mode 100644
> index ..e6d16325105b
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/pr85039-2.C
> @@ -0,0 +1,10 @@
> +// { dg-do compile }
> +
> +struct d {
> +  static d *b;
> +} * d::b(__builtin_offsetof(struct { // { dg-error "types may not be
> defined" }
> +  int i;
> +  struct a { // { dg-error "types may not be defined" }
> +int c() { return .1f; }
> +  };
> +}, i));
>
>
>

Re: [C++ Patch] PR 85112 ("[8 Regression] ICE with invalid constexpr")

2018-04-13 Thread Jason Merrill

On Fri, Apr 13, 2018, 7:53 PM Paolo Carlini 
wrote:

> Hi again,
>
> On 14/04/2018 00:12, Paolo Carlini wrote:
> > Hi,
> >
> > On 13/04/2018 19:45, Jason Merrill wrote:
> >> Ah, I see.  The problem is that even though convert_to_integer_1 was
> >> called with dofold false, we lose that when it calls convert(). Why
> >> not recurse directly to convert_to_integer_1 with the underlying type?
> >>   That would avoid the undesired folding.
> > Interesting. I had no idea that this seemingly simple error-recovery
> > issue was ultimately due to a substantive if latent issue in
> > convert.c. Anyway, in the meanwhile I tried a few variants of direct
> > recursion without much success, my boostraps all failed pretty quickly
> > and pretty badly. However the below - which isn't far from the spirit
> > of your analysis, I think - is holding up well, at least bootstrap +
> > C++ testsuite are definitely OK. Should I continue testing it over the
> > weekend?
> The below seems much better, also bootstraps fine and I'm now fully
> testing it.
>
> Thanks,
> Paolo.
>
> PS: no idea why earlier today I wasted a lot of time trying to
> completely avoid maybe_fold_build1_loc :(
>

Ok if testing passes.

Jason

>

Re: [C++ Patch] PR 85112 ("[8 Regression] ICE with invalid constexpr")

2018-04-13 Thread Paolo Carlini


Hi again,

On 14/04/2018 00:12, Paolo Carlini wrote:

Hi,

On 13/04/2018 19:45, Jason Merrill wrote:

Ah, I see.  The problem is that even though convert_to_integer_1 was
called with dofold false, we lose that when it calls convert(). Why
not recurse directly to convert_to_integer_1 with the underlying type?
  That would avoid the undesired folding.
Interesting. I had no idea that this seemingly simple error-recovery 
issue was ultimately due to a substantive if latent issue in 
convert.c. Anyway, in the meanwhile I tried a few variants of direct 
recursion without much success, my boostraps all failed pretty quickly 
and pretty badly. However the below - which isn't far from the spirit 
of your analysis, I think - is holding up well, at least bootstrap + 
C++ testsuite are definitely OK. Should I continue testing it over the 
weekend?
The below seems much better, also bootstraps fine and I'm now fully 
testing it.


Thanks,
Paolo.

PS: no idea why earlier today I wasted a lot of time trying to 
completely avoid maybe_fold_build1_loc :(


/
Index: convert.c
===
--- convert.c   (revision 259376)
+++ convert.c   (working copy)
@@ -741,8 +741,10 @@ convert_to_integer_1 (tree type, tree expr, bool d
   else if (TREE_CODE (type) == ENUMERAL_TYPE
   || maybe_ne (outprec, GET_MODE_PRECISION (TYPE_MODE (type
{
- expr = convert (lang_hooks.types.type_for_mode
- (TYPE_MODE (type), TYPE_UNSIGNED (type)), expr);
+ expr
+   = convert_to_integer_1 (lang_hooks.types.type_for_mode
+   (TYPE_MODE (type), TYPE_UNSIGNED (type)),
+   expr, dofold);
  return maybe_fold_build1_loc (dofold, loc, NOP_EXPR, type, expr);
}
 
Index: testsuite/g++.dg/cpp0x/pr85112.C
===
--- testsuite/g++.dg/cpp0x/pr85112.C(nonexistent)
+++ testsuite/g++.dg/cpp0x/pr85112.C(working copy)
@@ -0,0 +1,17 @@
+// PR c++/85112
+// { dg-do compile { target c++11 } }
+
+struct A
+{
+  int m;
+  int n : 4;
+};
+
+int i;  // { dg-message "not const" }
+
+void foo()
+{
+  constexpr int j = i;  // { dg-error "not usable" }
+  A a;
+  a.n = j;
+}

[PATCH] rs6000: Disable -m[no-]direct-move (PR85293)

2018-04-13 Thread Segher Boessenkool

The -mno-direct-move option causes a lot of problems, since it forces
us to be able to generate code for p8 and up with some crucial
instructions missing.  This patch removes the -m[no-]direct-move
options so that the user cannot put us into this unexpected situation
anymore.  Internally we still have all the same flags, and they are
automatically set based on -mcpu; getting rid of that is a lot more
work and will have to wait for GCC 9 (in some places the flag is used
to see if we are compiling for a p8 _at all_).

Tested on powerpc64-linux {-m32,-m64}.  Will also test on a p9 before
committing.


Segher


2018-04-13  Segher Boessenkool  

PR target/85293
* config/rs6000/rs6000.opt (mdirect-move): Make deprecated.
* doc/invoke.texi (RS/6000 and PowerPC Options): Remove -mdirect-move
and -mno-direct-move.

gcc/testsuite/
PR target/85293
* gcc.target/powerpc/pr80098-2.c: Remove -mdirect-move.  Remove the
corresponding dg-error clause.
* gcc.target/powerpc/pr80098-3.c: Ditto.
* gcc.target/powerpc/pr80103-1.c: Delete.

---
 gcc/config/rs6000/rs6000.opt |  3 +--
 gcc/doc/invoke.texi  | 12 ++--
 gcc/testsuite/gcc.target/powerpc/pr80098-2.c |  3 +--
 gcc/testsuite/gcc.target/powerpc/pr80098-3.c |  3 +--
 gcc/testsuite/gcc.target/powerpc/pr80103-1.c | 15 ---
 5 files changed, 5 insertions(+), 31 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/pr80103-1.c

diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 596f3fb..bbe3a64 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -534,8 +534,7 @@ Target Report Mask(CRYPTO) Var(rs6000_isa_flags)
 Use ISA 2.07 Category:Vector.AES and Category:Vector.SHA2 instructions.
 
 mdirect-move
-Target Report Mask(DIRECT_MOVE) Var(rs6000_isa_flags)
-Use ISA 2.07 direct move between GPR & VSX register instructions.
+Target Undocumented Mask(DIRECT_MOVE) Var(rs6000_isa_flags) Ignore Warn(%qs is 
deprecated)
 
 mhtm
 Target Report Mask(HTM) Var(rs6000_isa_flags)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index a8e2672..70896f7 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1095,7 +1095,7 @@ See RS/6000 and PowerPC Options.
 -mpointers-to-nested-functions  -mno-pointers-to-nested-functions @gol
 -msave-toc-indirect  -mno-save-toc-indirect @gol
 -mpower8-fusion  -mno-mpower8-fusion  -mpower8-vector  -mno-power8-vector @gol
--mcrypto  -mno-crypto  -mhtm  -mno-htm  -mdirect-move  -mno-direct-move @gol
+-mcrypto  -mno-crypto  -mhtm  -mno-htm @gol
 -mquad-memory  -mno-quad-memory @gol
 -mquad-memory-atomic  -mno-quad-memory-atomic @gol
 -mcompat-align-parm  -mno-compat-align-parm @gol
@@ -23353,7 +23353,7 @@ following options:
 -mpopcntb -mpopcntd  -mpowerpc64 @gol
 -mpowerpc-gpopt  -mpowerpc-gfxopt  -msingle-float -mdouble-float @gol
 -msimple-fpu  -mmulhw  -mdlmzb  -mmfpgpr -mvsx @gol
--mcrypto -mdirect-move -mhtm -mpower8-fusion -mpower8-vector @gol
+-mcrypto -mhtm -mpower8-fusion -mpower8-vector @gol
 -mquad-memory -mquad-memory-atomic -mfloat128 -mfloat128-hardware}
 
 The particular options set for any particular CPU varies between
@@ -23495,14 +23495,6 @@ Enable the use (disable) of the built-in functions 
that allow direct
 access to the cryptographic instructions that were added in version
 2.07 of the PowerPC ISA.
 
-@item -mdirect-move
-@itemx -mno-direct-move
-@opindex mdirect-move
-@opindex mno-direct-move
-Generate code that uses (does not use) the instructions to move data
-between the general purpose registers and the vector/scalar (VSX)
-registers that were added in version 2.07 of the PowerPC ISA.
-
 @item -mhtm
 @itemx -mno-htm
 @opindex mhtm
diff --git a/gcc/testsuite/gcc.target/powerpc/pr80098-2.c 
b/gcc/testsuite/gcc.target/powerpc/pr80098-2.c
index 88f7ee4..5a6421b 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr80098-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr80098-2.c
@@ -1,9 +1,8 @@
 /* { dg-do compile { target { powerpc64*-*-* } } } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power8" } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
-/* { dg-options "-mcpu=power8 -mno-power8-vector -mdirect-move -mcrypto" } */
+/* { dg-options "-mcpu=power8 -mno-power8-vector -mcrypto" } */
 
 int i;
 
-/* { dg-error "'-mno-power8-vector' turns off '-mdirect-move'" "PR80098" { 
target *-*-* } 0 } */
 /* { dg-error "'-mno-power8-vector' turns off '-mcrypto'"  "PR80098" { 
target *-*-* } 0 } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr80098-3.c 
b/gcc/testsuite/gcc.target/powerpc/pr80098-3.c
index aae8fa1..e62418b 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr80098-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr80098-3.c
@@ -1,9 +1,8 @@
 /* { dg-do compile { target { powerpc64*-*-* } } } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } {

Re: [PATCH] rs6000 PR83660 fix ICE with vec_extract

2018-04-13 Thread Segher Boessenkool

Hi!

On Fri, Apr 13, 2018 at 03:37:08PM -0500, Aaron Sawdey wrote:
> Per the discussion on the 83660, I've come to a minimal patch to
> prevent this. Basically marking the vec_extract tree as having side
> effects later makes sure that it gets all the cleanup points it needs
> so that gimplify_cleanup_point_expr () is happy.  Also because
> vec_insert puts a MODIFY_EXPR in there, it has side effects and this
> problem will not occur.
> 
> Doing bootstrap/regtest on ppc64le with -mcpu=power7 since that is
> where this issue arises. OK for trunk if everything passes?

> --- testsuite/gcc.target/powerpc/pr83660.C(nonexistent)
> +++ testsuite/gcc.target/powerpc/pr83660.C(working copy)
> @@ -0,0 +1,14 @@
> +/* PR target/83660 */
> +/* { dg-do compile } */
> +/* { dg-options "-mcpu=power7" } */

You'll need to skip this test if the user overrides -mcpu=; see the many
other testcases that do this.

With that fixed, okay for trunk.  Thanks!


Segher

[PATCH] [PR c++/80290] recycle tinst garbage sooner

2018-04-13 Thread Alexandre Oliva

tinst_level objects are created during template instantiation, and
they're most often quite short-lived, but since there's no intervening
garbage collection, they accumulate throughout the pass while most by
far could be recycled after a short while.  The original testcase in
PR80290, for example, creates almost 55 million tinst_level objects,
all but 10 thousand of them without intervening GC, but we don't need
more than 284 of them live at a time.

Furthermore, in many cases, TREE_LIST objects are created to stand for
the decl in tinst_level.  In most cases, those can be recycled as soon
as the tinst_level object is recycled; in some relatively common
cases, they are modified and reused in up to 3 tinst_level objects.
In the same testcase, TREE_LISTs are used in all but 3 thousand of the
tinst_level objects, and we don't need more than 89 tinst_level
objects with TREE_LISTs live at a time.  Furthermore, all but 2
thousand of those are created without intervening GC.

This patch arranges for tinst_level objects to be refcounted (I've
seen as many as 20 live references to a single tinst_level object in
my testing), and for pending_template, tinst_level and TREE_LIST
objects that can be recycled to be added to freelists; that's much
faster than ggc_free()ing them and reallocating them shortly
thereafter.  In fact, the introduction of such freelists is what makes
this entire patch lighter-weight, when it comes to memory use, and
faster.  With refcounting alone, the total memory footprint was still
about the same, and we spent more time.

In order to further reduce memory use, I've arranged for us to create
TREE_LISTs lazily, only at points that really need them (when printing
error messages).  This grows tinst_level by an additional pointer, but
since a TREE_LIST holds a lot more than an extra pointer, and
tinst_levels with TREE_LISTs used to be allocated tens of thousands of
times more often than plain decl ones, we still save memory overall.

I was still not quite happy about growing memory use in cases that
used template classes but not template overload resolution, so I
changed the type of the errors field to unsigned short, from int.
With that change, in_system_header_p and refcount move into the same
word or half-word that used to hold errors, releasing a full word,
bringing tinst_level back to its original size, now without any
padding.

The errors field is only used to test whether we've had any errors
since the expansion of some template, to skip the expansion of further
templates.  If we get 2^16 errors or more, it is probably reasonable
to refrain from expanding further templates, even if we would expand
them before this change.

With these changes, compile time for the original testcase at -O0,
with default checking enabled, is cut down by some 3.7%, total GCable
memory allocation is cut down by almost 20%, and total memory use (as
reported by GNU time as maxresident) is cut down by slightly over 15%.

Regstrapped on i686- and x86_64-linux-gnu.  Ok to install?
(See below for an incremental patch that introduces some statistics
collection and reporting, that shows how I collected some of the data
above.  That one is not meant for inclusion)

for  gcc/cp/ChangeLog

PR c++/80290
* cp-tree.h (struct tinst_level): Split decl into tldcl and
targs.  Add private split_list_p, tree_list_p, and not_list_p
inline const predicates; to_list private member function
declaration; list_p, get_node, and get_decl_maybe accessors,
and refcount private data member.  Narrow errors to unsigned
short.
* error.c (print_instantiation_full_context): Use new
accessors.
(print_instantiation_partial_context_line): Likewise.
* mangle.c (mangle_decl_string): Likewise.
* pt.c (freelist): New template class.
(tree_list_freelist_head): New var.
(tree_list_freelist): New fn, along with specializations.
(tinst_level_freelist_head): New var.
(pending_template_freelist_head): Likewise.
(tinst_level_freelist, pending_template_freelist): New fns.
(tinst_level::to_list): Define.
(inc_refcount_use, dec_refcount_use): New fns for tinst_level.
(set_refcount_ptr): New template fn.
(add_pending_template): Adjust for refcounting, freelists and
new accessors.
(neglectable_inst_p): Take a NULL d as a non-DECL.
(limit_bad_template_recursion): Use new accessors.
(push_tinst_level): New overload to create the list.
(push_tinst_level_loc): Make it static, split decl into two
args, adjust tests and initialization to cope with split
lists, use freelist, adjust for refcounting.
(push_tinst_level_loc): New wrapper with the old interface.
(pop_tinst_level): Adjust for refcounting.
(record_last_problematic_instantiation): Likewise.
(reopen_tinst_level): Likewise.  Use new accessors.

Re: RFA[powerpc]: patch to fix PR79916

2018-04-13 Thread Vladimir Makarov




On 04/13/2018 05:26 PM, Segher Boessenkool wrote:

Hi!

On Fri, Apr 13, 2018 at 04:43:02PM -0400, Vladimir Makarov wrote:

On 04/13/2018 03:58 PM, Alexander Monakov wrote:

Here's another compact variant:

  regno = reg_renumber[regno];
  if (regno < 0)
regno = cl == NO_REGS ? -1 : ira_class_hard_regs[cl][1];


Thanks, Alexander.  I like your variant more.

Me too :-)


--- testsuite/gcc.target/powerpc/pr79916.c  (nonexistent)
+++ testsuite/gcc.target/powerpc/pr79916.c  (working copy)
@@ -0,0 +1,556 @@
+/* { dg-do compile  { target { powerpc64le-*-* } } } */

Is the testcase specific to LE?  If not, please use powerpc*-*-*, or
powerpc*-*-* && lp64  if it needs 64-bit.

I've just tried powerpc64 BE and powerpc.  The bug is reproducible 
everywhere.  So I change it to powerpc*-*-*

+/* { dg-options "-Idfp -fno-expensive-optimizations --param 
ira-max-conflict-table-size=0 -mno-popcntd -O3" } */

I think you can drop the -Idfp ?  If that option would do anything we
have bigger problems ;-)

I removed it.

Looks fine to me otherwise.  Thanks for working on this!



Thank you for the quick review.

I committed the patch to trunk (rev. 259379).

The final variant is in the attachment.

Index: ChangeLog
===
--- ChangeLog	(revision 259378)
+++ ChangeLog	(working copy)
@@ -1,3 +1,10 @@
+2018-04-13  Vladimir Makarov  
+
+	PR rtl-optimization/79916
+	* config/rs6000/rs6000.c (rs6000_emit_move): Use assigned hard
+	regs (if any) to define how to gnerate SD moves when LRA is in
+	progress.
+
 2018-04-13  Jakub Jelinek  
 
 	PR rtl-optimization/85393
Index: testsuite/ChangeLog
===
--- testsuite/ChangeLog	(revision 259378)
+++ testsuite/ChangeLog	(working copy)
@@ -1,3 +1,8 @@
+2018-04-13  Vladimir Makarov  
+
+	PR rtl-optimization/79916
+	* gcc.target/powerpc/pr79916.c: New.
+
 2018-04-13  Jakub Jelinek  
 
 	PR rtl-optimization/85393
Index: config/rs6000/rs6000.c
===
--- config/rs6000/rs6000.c	(revision 259330)
+++ config/rs6000/rs6000.c	(working copy)
@@ -10610,7 +10610,9 @@ rs6000_emit_move (rtx dest, rtx source,
   if (regno >= FIRST_PSEUDO_REGISTER)
 	{
 	  cl = reg_preferred_class (regno);
-	  regno = cl == NO_REGS ? -1 : ira_class_hard_regs[cl][1];
+	  regno = reg_renumber[regno];
+	  if (regno < 0)
+	regno = cl == NO_REGS ? -1 : ira_class_hard_regs[cl][1];
 	}
   if (regno >= 0 && ! FP_REGNO_P (regno))
 	{
@@ -10635,7 +10637,9 @@ rs6000_emit_move (rtx dest, rtx source,
 	{
 	  cl = reg_preferred_class (regno);
 	  gcc_assert (cl != NO_REGS);
-	  regno = ira_class_hard_regs[cl][0];
+	  regno = reg_renumber[regno];
+	  if (regno < 0)
+	regno = ira_class_hard_regs[cl][0];
 	}
   if (FP_REGNO_P (regno))
 	{
@@ -10664,7 +10668,9 @@ rs6000_emit_move (rtx dest, rtx source,
   if (regno >= FIRST_PSEUDO_REGISTER)
 	{
 	  cl = reg_preferred_class (regno);
-	  regno = cl == NO_REGS ? -1 : ira_class_hard_regs[cl][0];
+	  regno = reg_renumber[regno];
+	  if (regno < 0)
+	regno = cl == NO_REGS ? -1 : ira_class_hard_regs[cl][0];
 	}
   if (regno >= 0 && ! FP_REGNO_P (regno))
 	{
@@ -10689,7 +10695,9 @@ rs6000_emit_move (rtx dest, rtx source,
 	{
 	  cl = reg_preferred_class (regno);
 	  gcc_assert (cl != NO_REGS);
-	  regno = ira_class_hard_regs[cl][0];
+	  regno = reg_renumber[regno];
+	  if (regno < 0)
+	regno = ira_class_hard_regs[cl][0];
 	}
   if (FP_REGNO_P (regno))
 	{
Index: testsuite/gcc.target/powerpc/pr79916.c
===
--- testsuite/gcc.target/powerpc/pr79916.c	(nonexistent)
+++ testsuite/gcc.target/powerpc/pr79916.c	(working copy)
@@ -0,0 +1,556 @@
+/* { dg-do compile  { target { powerpc*-*-* } } } */
+/* { dg-options "-fno-expensive-optimizations --param ira-max-conflict-table-size=0 -mno-popcntd -O3" } */
+
+#define PASTE2(A,B) A ## B
+#define PASTE(A,B) PASTE2(A,B)
+
+#define TESTVAL_NEG(VAL,SUF,SIZE)	\
+  x = PASTE(PASTE(VAL,.),SUF);		\
+  si = VAL;\
+  sll = PASTE(VAL,LL);			\
+  a = si;\
+  b = sll;\
+  c = VAL;\
+  d = PASTE(VAL,LL);			\
+  if ((__builtin_memcmp ((void *), (void *), SIZE) != 0)		\
+  || (__builtin_memcmp ((void *), (void *),SIZE) != 0)		\
+  || (__builtin_memcmp ((void *), (void *),SIZE) != 0)		\
+  || (__builtin_memcmp ((void *), (void *),SIZE) != 0))		\
+__builtin_abort ();
+
+#define TESTVAL_NEG_BIG(VAL,SUF,SIZE)	\
+  x = PASTE(PASTE(VAL,.),SUF);		\
+  sll = PASTE(VAL,LL);			\
+  a = sll;\
+  b = PASTE(VAL,LL);			\
+  if ((__builtin_memcmp ((void *), (void *), SIZE) != 0)		\
+  || (__builtin_memcmp ((void *), (void *),SIZE) != 0))		\
+__builtin_abort ();
+
+#define TESTVAL_NONNEG(VAL,SUF,SIZE)

Re: [PATCH] PR 83402 Fix ICE for vec_splat_s8, vec_splat_s16, vec_splat_s32 builtins

2018-04-13 Thread Segher Boessenkool

Hi!

On Fri, Apr 13, 2018 at 03:27:40PM -0700, Carl Love wrote:
> On Fri, 2018-04-13 at 16:54 -0500, Segher Boessenkool wrote:
> > On Fri, Apr 13, 2018 at 09:49:25AM -0700, Carl Love wrote:
> > > diff --git a/gcc/config/rs6000/rs6000.c
> > > b/gcc/config/rs6000/rs6000.c
> > > index a0c9b5c..855be43 100644
> > > --- a/gcc/config/rs6000/rs6000.c
> > > +++ b/gcc/config/rs6000/rs6000.c
> > > @@ -16576,8 +16576,9 @@ rs6000_gimple_fold_builtin
> > > (gimple_stmt_iterator *gsi)
> > >    {
> > >    arg0 = gimple_call_arg (stmt, 0);
> > >    lhs = gimple_call_lhs (stmt);
> > > -  /* Only fold the vec_splat_*() if arg0 is constant.  */
> > > -  if (TREE_CODE (arg0) != INTEGER_CST)
> > > +  /* Only fold the vec_splat_*() if arg0 is a 5-bit
> > > constant.  */
> > > +  if (TREE_CODE (arg0) != INTEGER_CST
> > > +  || TREE_INT_CST_LOW (arg0) & ~0x1f)
> > >      return false;
> > 
> > Should this test for *signed* 5-bit constants only?
> > 
> >  if (TREE_CODE (arg0) != INTEGER_CST
> >  || !IN_RANGE (TREE_INT_CST_LOW (arg0), -16, 15))
> 
> The buitins for the unsigned vec_splat_u[8, 16,32]  are actually mapped
> to their corresponding signed version vec_splat_s[8, 16,32] in
> altivec.h.  Both the vec_splat_u[8, 16,32] and vec_splat_s[8, 16,32]
> builtins will get you to the case statement
> 
> /* flavors of vec_splat_[us]{8,16,32}.  */
> case ALTIVEC_BUILTIN_VSPLTISB:
> case ALTIVEC_BUILTIN_VSPLTISH:
> case ALTIVEC_BUILTIN_VSPLTISW:
> 
> under which the change is being made.  So technically arg0 could be a
> signed 5-bit or an unsiged 5-bit value.  Either way the 5-bit value is
> splatted into the vector with sign extension to 8/16/32 bits.  So I
> would argue that the IN_RANGE test for signed values is maybe
> misleading as arg0 could represent a signed or unsigned value.  Both
> tests, the one from the patch or Segher's suggestion, are really just
> looking to see if any of the upper bits are 1.  From a functional
> standpoint, I don't see any difference and feel either one would do the
> job.  Am I missing anything?

But then the & ~0x1f test is not good either, it does not work for values
-16..-1 ?

You cannot handle signed and unsigned exactly the same for the test for
allowed values.


Segher

Re: [PATCH v2, rs6000] (PR84302) Fix _mm_slli_epi{32,64} for shift values 16 through 31 and negative

2018-04-13 Thread Segher Boessenkool

Hi!

On Fri, Apr 13, 2018 at 03:21:14PM -0500, Paul Clarke wrote:
> -  if (__builtin_constant_p(__B))
> - lshift = (__v4su) vec_splat_s32(__B);
> +  if (__builtin_constant_p(__B) && __B < 16)
> +lshift = (__v4su) vec_splat_s32(__B);
>else
> - lshift = vec_splats ((unsigned int) __B);
> +lshift = vec_splats ((unsigned int) __B);

You changed the tab chars to spaces here.  Please don't.  I'll fix it.

Rest looks fine...  Let's see if I manage to commit it :-)

Thanks,


Segher

Re: [PATCH] PR 83402 Fix ICE for vec_splat_s8, vec_splat_s16, vec_splat_s32 builtins

2018-04-13 Thread Carl Love

On Fri, 2018-04-13 at 16:54 -0500, Segher Boessenkool wrote:
> Hi Carl,
> 
> On Fri, Apr 13, 2018 at 09:49:25AM -0700, Carl Love wrote:
> > diff --git a/gcc/config/rs6000/rs6000.c
> > b/gcc/config/rs6000/rs6000.c
> > index a0c9b5c..855be43 100644
> > --- a/gcc/config/rs6000/rs6000.c
> > +++ b/gcc/config/rs6000/rs6000.c
> > @@ -16576,8 +16576,9 @@ rs6000_gimple_fold_builtin
> > (gimple_stmt_iterator *gsi)
> >    {
> >      arg0 = gimple_call_arg (stmt, 0);
> >      lhs = gimple_call_lhs (stmt);
> > -    /* Only fold the vec_splat_*() if arg0 is constant.  */
> > -    if (TREE_CODE (arg0) != INTEGER_CST)
> > +    /* Only fold the vec_splat_*() if arg0 is a 5-bit
> > constant.  */
> > +    if (TREE_CODE (arg0) != INTEGER_CST
> > +    || TREE_INT_CST_LOW (arg0) & ~0x1f)
> >        return false;
> 
> Should this test for *signed* 5-bit constants only?
> 
>if (TREE_CODE (arg0) != INTEGER_CST
>    || !IN_RANGE (TREE_INT_CST_LOW (arg0), -16, 15))
Segher:

The buitins for the unsigned vec_splat_u[8, 16,32]  are actually mapped
to their corresponding signed version vec_splat_s[8, 16,32] in
altivec.h.  Both the vec_splat_u[8, 16,32] and vec_splat_s[8, 16,32]
builtins will get you to the case statement

/* flavors of vec_splat_[us]{8,16,32}.  */
case ALTIVEC_BUILTIN_VSPLTISB:
case ALTIVEC_BUILTIN_VSPLTISH:
case ALTIVEC_BUILTIN_VSPLTISW:

under which the change is being made.  So technically arg0 could be a
signed 5-bit or an unsiged 5-bit value.  Either way the 5-bit value is
splatted into the vector with sign extension to 8/16/32 bits.  So I
would argue that the IN_RANGE test for signed values is maybe
misleading as arg0 could represent a signed or unsigned value.  Both
tests, the one from the patch or Segher's suggestion, are really just
looking to see if any of the upper bits are 1.  From a functional
standpoint, I don't see any difference and feel either one would do the
job.  Am I missing anything?

   Carl Love

Re: [PATCH, rs6000] Fix tests that are failing in gcc.target/powerpc/bfp with -m32

2018-04-13 Thread Segher Boessenkool

Hi!

On Fri, Apr 13, 2018 at 03:15:09PM -0500, Kelvin Nilsen wrote:
> Twelve failures have been occuring in the bfp test directory during -m32
> regression testing.

> This patch:
> 
> 1. Changes the expected error messages in certain test programs.
> 
> 2. Disables certain test programs from being exercised on 32-bit targets.
> 
> 3. Adds a "note" error message to explain the mapping from overloaded
> built-in functions
> to non-overloaded built-in functions.

Ah, very good plan!

> @@ -6932,14 +6933,20 @@
>      && rs6000_builtin_type_compatible (types[1], desc->op2))
>    {
>      if (rs6000_builtin_decls[desc->overloaded_code] != NULL_TREE)
> -      return altivec_build_resolved_builtin (args, n, desc);
> +      {
> +        result = altivec_build_resolved_builtin (args, n, desc);
> +        /* overloaded_code is set above */
> +        if (!rs6000_builtin_is_supported_p (overloaded_code))
> +          unsupported_builtin = true;
> +        else
> +          return result;
> +      }
>      else
>    unsupported_builtin = true;
>    }
>    }

The indentation (here and elsewhere) is off; probably because of the way
you sent this?  Please check.

> +    else
> +      /* on 64-bit, i seem to end up here */
> +      error ("builtin function %qs not supported in this compiler "
> +         "configuration", name);

I think you don't want that comment?

> +    /* If an error-representing  result tree was returned from
> +       altivec_build_resolved_builtin above, use it.  */
> +    return (result != NULL)? result: error_mark_node;

Spaces before ? and :.  x != 0 as a condition is pretty silly btw, but
that's just style ;-)

Okay for trunk with those nits fixed.  Thanks!


Segher

Re: [C++ Patch] PR 85112 ("[8 Regression] ICE with invalid constexpr")

2018-04-13 Thread Paolo Carlini


Hi,

On 13/04/2018 19:45, Jason Merrill wrote:

Ah, I see.  The problem is that even though convert_to_integer_1 was
called with dofold false, we lose that when it calls convert().  Why
not recurse directly to convert_to_integer_1 with the underlying type?
  That would avoid the undesired folding.
Interesting. I had no idea that this seemingly simple error-recovery 
issue was ultimately due to a substantive if latent issue in convert.c. 
Anyway, in the meanwhile I tried a few variants of direct recursion 
without much success, my boostraps all failed pretty quickly and pretty 
badly. However the below - which isn't far from the spirit of your 
analysis, I think - is holding up well, at least bootstrap + C++ 
testsuite are definitely OK. Should I continue testing it over the weekend?


Thanks!
Paolo.

///
Index: convert.c
===
--- convert.c   (revision 259376)
+++ convert.c   (working copy)
@@ -741,8 +741,12 @@ convert_to_integer_1 (tree type, tree expr, bool d
   else if (TREE_CODE (type) == ENUMERAL_TYPE
   || maybe_ne (outprec, GET_MODE_PRECISION (TYPE_MODE (type
{
- expr = convert (lang_hooks.types.type_for_mode
- (TYPE_MODE (type), TYPE_UNSIGNED (type)), expr);
+ tree utype = (lang_hooks.types.type_for_mode
+   (TYPE_MODE (type), TYPE_UNSIGNED (type)));
+ if (dofold)
+   expr = convert (utype, expr);
+ else
+   expr = build1 (CONVERT_EXPR, utype, expr);
  return maybe_fold_build1_loc (dofold, loc, NOP_EXPR, type, expr);
}
 
Index: testsuite/g++.dg/cpp0x/pr85112.C
===
--- testsuite/g++.dg/cpp0x/pr85112.C(nonexistent)
+++ testsuite/g++.dg/cpp0x/pr85112.C(working copy)
@@ -0,0 +1,17 @@
+// PR c++/85112
+// { dg-do compile { target c++11 } }
+
+struct A
+{
+  int m;
+  int n : 4;
+};
+
+int i;  // { dg-message "not const" }
+
+void foo()
+{
+  constexpr int j = i;  // { dg-error "not usable" }
+  A a;
+  a.n = j;
+}

Re: [PATCH] [PR c++/85039] no type definitions in builtin offsetof

2018-04-13 Thread Alexandre Oliva

On Apr 11, 2018, Jason Merrill  wrote:

> I raised this issue with the C++ committee, and it seems that nobody
> expects defining a type here to work.  So let's go back to your first
> patch, removing the offending part of semicolon3.C.

All right, here's the adjusted patch.  Retested on i686- and
x86_64-linux-gnu.  Ok to install?


[PR c++/85039] no type definitions in builtin offsetof

Types defined within a __builtin_offsetof argument don't always get
properly recorded as members of their context types, so if they're
anonymous, we may fail to assign them an anon type index for mangling
and ICE.

We shouldn't allow types to be introduced in __builtin_offsetof, I
think, and Jason says the std committee agrees, so I've arranged for
us to reject them.

Even then, we still parse the definitions and attempt to assign
mangled names to its member functions, so the ICE remains.  Since
we've already reported an error, we might as well complete the name
assignment with an arbitrary index, thus avoiding the ICE.

We used to have a test that expected to be able to define types in
__builtin_offsetof; this patch removes that specific test.


for  gcc/cp/ChangeLog

PR c++/85039
* parser.c (cp_parser_builtin_offset): Reject type definitions.
* mangle.c (nested_anon_class_index): Avoid crash returning -1
if we've seen errors.

for  gcc/testsuite/ChangeLog

PR c++/85039
* g++.dg/pr85039-1.C: New.
* g++.dg/pr85039-2.C: New.
* g++.dg/parse/semicolon3.C: Remove test_offsetof.
---
 gcc/cp/mangle.c |3 +++
 gcc/cp/parser.c |8 +++-
 gcc/testsuite/g++.dg/parse/semicolon3.C |7 ---
 gcc/testsuite/g++.dg/pr85039-1.C|   17 +
 gcc/testsuite/g++.dg/pr85039-2.C|   10 ++
 5 files changed, 37 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/pr85039-1.C
 create mode 100644 gcc/testsuite/g++.dg/pr85039-2.C

diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c
index 94c4bed28486..a7f9d686345d 100644
--- a/gcc/cp/mangle.c
+++ b/gcc/cp/mangle.c
@@ -1623,6 +1623,9 @@ nested_anon_class_index (tree type)
  ++index;
   }
 
+  if (seen_error ())
+return -1;
+
   gcc_unreachable ();
 }
 
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 8b1b271b53d1..bf46165f5ae1 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -9823,7 +9823,13 @@ cp_parser_builtin_offsetof (cp_parser *parser)
   parens.require_open (parser);
   /* Parse the type-id.  */
   location_t loc = cp_lexer_peek_token (parser->lexer)->location;
-  type = cp_parser_type_id (parser);
+  {
+const char *saved_message = parser->type_definition_forbidden_message;
+parser->type_definition_forbidden_message
+  = G_("types may not be defined within __builtin_offsetof");
+type = cp_parser_type_id (parser);
+parser->type_definition_forbidden_message = saved_message;
+  }
   /* Look for the `,'.  */
   cp_parser_require (parser, CPP_COMMA, RT_COMMA);
   token = cp_lexer_peek_token (parser->lexer);
diff --git a/gcc/testsuite/g++.dg/parse/semicolon3.C 
b/gcc/testsuite/g++.dg/parse/semicolon3.C
index 8a2b1ac46301..0d46be9ed654 100644
--- a/gcc/testsuite/g++.dg/parse/semicolon3.C
+++ b/gcc/testsuite/g++.dg/parse/semicolon3.C
@@ -20,13 +20,6 @@ struct OK3
 } // no complaints
   (s7);
 
-__SIZE_TYPE__
-test_offsetof (void)
-{
-  // no complaints about a missing semicolon
-  return __builtin_offsetof (struct OK4 { int a; int b; }, b);
-}
-
 struct OK5
 {
   int a;
diff --git a/gcc/testsuite/g++.dg/pr85039-1.C b/gcc/testsuite/g++.dg/pr85039-1.C
new file mode 100644
index ..f57c8a261dee
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr85039-1.C
@@ -0,0 +1,17 @@
+// { dg-do compile { target c++14 } }
+
+constexpr int a() {
+ return
+  __builtin_offsetof(struct { // { dg-error "types may not be defined" }
+int i;
+short b {
+  __builtin_offsetof(struct { // { dg-error "types may not be defined" }
+   int j;
+struct c { // { dg-error "types may not be defined" }
+  void d() {
+  }
+};
+  }, j)
+};
+  }, i);
+}
diff --git a/gcc/testsuite/g++.dg/pr85039-2.C b/gcc/testsuite/g++.dg/pr85039-2.C
new file mode 100644
index ..e6d16325105b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr85039-2.C
@@ -0,0 +1,10 @@
+// { dg-do compile }
+
+struct d {
+  static d *b;
+} * d::b(__builtin_offsetof(struct { // { dg-error "types may not be defined" }
+  int i;
+  struct a { // { dg-error "types may not be defined" }
+int c() { return .1f; }
+  };
+}, i));


-- 
Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer

Re: [PATCH] PR 83402 Fix ICE for vec_splat_s8, vec_splat_s16, vec_splat_s32 builtins

2018-04-13 Thread Segher Boessenkool

Hi Carl,

On Fri, Apr 13, 2018 at 09:49:25AM -0700, Carl Love wrote:
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index a0c9b5c..855be43 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -16576,8 +16576,9 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>{
>arg0 = gimple_call_arg (stmt, 0);
>lhs = gimple_call_lhs (stmt);
> -  /* Only fold the vec_splat_*() if arg0 is constant.  */
> -  if (TREE_CODE (arg0) != INTEGER_CST)
> +  /* Only fold the vec_splat_*() if arg0 is a 5-bit constant.  */
> +  if (TREE_CODE (arg0) != INTEGER_CST
> +  || TREE_INT_CST_LOW (arg0) & ~0x1f)
>  return false;

Should this test for *signed* 5-bit constants only?

 if (TREE_CODE (arg0) != INTEGER_CST
 || !IN_RANGE (TREE_INT_CST_LOW (arg0), -16, 15))

or similar?

Approved for trunk (with such a change if appropriate, and testing
then of course).  Thanks!


Segher

Re: RFA[powerpc]: patch to fix PR79916

2018-04-13 Thread Segher Boessenkool

Hi!

On Fri, Apr 13, 2018 at 04:43:02PM -0400, Vladimir Makarov wrote:
> On 04/13/2018 03:58 PM, Alexander Monakov wrote:
> >Here's another compact variant:
> >
> >   regno = reg_renumber[regno];
> >   if (regno < 0)
> > regno = cl == NO_REGS ? -1 : ira_class_hard_regs[cl][1];
> >
> Thanks, Alexander.  I like your variant more.

Me too :-)

> --- testsuite/gcc.target/powerpc/pr79916.c(nonexistent)
> +++ testsuite/gcc.target/powerpc/pr79916.c(working copy)
> @@ -0,0 +1,556 @@
> +/* { dg-do compile  { target { powerpc64le-*-* } } } */

Is the testcase specific to LE?  If not, please use powerpc*-*-*, or
powerpc*-*-* && lp64  if it needs 64-bit.

> +/* { dg-options "-Idfp -fno-expensive-optimizations --param 
> ira-max-conflict-table-size=0 -mno-popcntd -O3" } */

I think you can drop the -Idfp ?  If that option would do anything we
have bigger problems ;-)

Looks fine to me otherwise.  Thanks for working on this!


Segher

Re: RFA[powerpc]: patch to fix PR79916

2018-04-13 Thread Vladimir Makarov




On 04/13/2018 03:58 PM, Alexander Monakov wrote:

On Fri, 13 Apr 2018, Jakub Jelinek wrote:

  if (reg_renumber[regno] >= 0)
regno = reg_renumber[regno];
  else
regno = cl == NO_REGS ? -1 : ira_class_hard_regs[cl][1];
or
  regno = (reg_renumber[regno] >= 0
   ? reg_renumber[regno]
   : cl == NO_REGS ? -1 : ira_class_hard_regs[cl][1]);
is better, the latter is perhaps more compact.

Here's another compact variant:

  regno = reg_renumber[regno];
  if (regno < 0)
regno = cl == NO_REGS ? -1 : ira_class_hard_regs[cl][1];


Thanks, Alexander.  I like your variant more.

Sorry for forgetting GNU standard (it might happen when you are working 
on a few different projects simultaneously).


The revised patch is in the attachment.

Index: ChangeLog
===
--- ChangeLog	(revision 259376)
+++ ChangeLog	(working copy)
@@ -1,3 +1,10 @@
+2018-04-13  Vladimir Makarov  
+
+	PR rtl-optimization/79916
+	* config/rs6000/rs6000.c (rs6000_emit_move): Use assigned hard
+	regs (if any) to define how to gnerate SD moves when LRA is in
+	progress.
+
 2018-04-13  Jan Hubicka  
 	Bin Cheng  
 
Index: testsuite/ChangeLog
===
--- testsuite/ChangeLog	(revision 259376)
+++ testsuite/ChangeLog	(working copy)
@@ -1,3 +1,8 @@
+2018-04-13  Vladimir Makarov  
+
+	PR rtl-optimization/79916
+	* gcc.target/powerpc/pr79916.c: New.
+
 2018-04-13  Andrey Belevantsev  
 
 	PR rtl-optimization/83852
Index: config/rs6000/rs6000.c
===
--- config/rs6000/rs6000.c	(revision 259330)
+++ config/rs6000/rs6000.c	(working copy)
@@ -10610,7 +10610,9 @@ rs6000_emit_move (rtx dest, rtx source,
   if (regno >= FIRST_PSEUDO_REGISTER)
 	{
 	  cl = reg_preferred_class (regno);
-	  regno = cl == NO_REGS ? -1 : ira_class_hard_regs[cl][1];
+	  regno = reg_renumber[regno];
+	  if (regno < 0)
+	regno = cl == NO_REGS ? -1 : ira_class_hard_regs[cl][1];
 	}
   if (regno >= 0 && ! FP_REGNO_P (regno))
 	{
@@ -10635,7 +10637,9 @@ rs6000_emit_move (rtx dest, rtx source,
 	{
 	  cl = reg_preferred_class (regno);
 	  gcc_assert (cl != NO_REGS);
-	  regno = ira_class_hard_regs[cl][0];
+	  regno = reg_renumber[regno];
+	  if (regno < 0)
+	regno = ira_class_hard_regs[cl][0];
 	}
   if (FP_REGNO_P (regno))
 	{
@@ -10664,7 +10668,9 @@ rs6000_emit_move (rtx dest, rtx source,
   if (regno >= FIRST_PSEUDO_REGISTER)
 	{
 	  cl = reg_preferred_class (regno);
-	  regno = cl == NO_REGS ? -1 : ira_class_hard_regs[cl][0];
+	  regno = reg_renumber[regno];
+	  if (regno < 0)
+	regno = cl == NO_REGS ? -1 : ira_class_hard_regs[cl][0];
 	}
   if (regno >= 0 && ! FP_REGNO_P (regno))
 	{
@@ -10689,7 +10695,9 @@ rs6000_emit_move (rtx dest, rtx source,
 	{
 	  cl = reg_preferred_class (regno);
 	  gcc_assert (cl != NO_REGS);
-	  regno = ira_class_hard_regs[cl][0];
+	  regno = reg_renumber[regno];
+	  if (regno < 0)
+	regno = ira_class_hard_regs[cl][0];
 	}
   if (FP_REGNO_P (regno))
 	{
Index: testsuite/gcc.target/powerpc/pr79916.c
===
--- testsuite/gcc.target/powerpc/pr79916.c	(nonexistent)
+++ testsuite/gcc.target/powerpc/pr79916.c	(working copy)
@@ -0,0 +1,556 @@
+/* { dg-do compile  { target { powerpc64le-*-* } } } */
+/* { dg-options "-Idfp -fno-expensive-optimizations --param ira-max-conflict-table-size=0 -mno-popcntd -O3" } */
+
+#define PASTE2(A,B) A ## B
+#define PASTE(A,B) PASTE2(A,B)
+
+#define TESTVAL_NEG(VAL,SUF,SIZE)	\
+  x = PASTE(PASTE(VAL,.),SUF);		\
+  si = VAL;\
+  sll = PASTE(VAL,LL);			\
+  a = si;\
+  b = sll;\
+  c = VAL;\
+  d = PASTE(VAL,LL);			\
+  if ((__builtin_memcmp ((void *), (void *), SIZE) != 0)		\
+  || (__builtin_memcmp ((void *), (void *),SIZE) != 0)		\
+  || (__builtin_memcmp ((void *), (void *),SIZE) != 0)		\
+  || (__builtin_memcmp ((void *), (void *),SIZE) != 0))		\
+__builtin_abort ();
+
+#define TESTVAL_NEG_BIG(VAL,SUF,SIZE)	\
+  x = PASTE(PASTE(VAL,.),SUF);		\
+  sll = PASTE(VAL,LL);			\
+  a = sll;\
+  b = PASTE(VAL,LL);			\
+  if ((__builtin_memcmp ((void *), (void *), SIZE) != 0)		\
+  || (__builtin_memcmp ((void *), (void *),SIZE) != 0))		\
+__builtin_abort ();
+
+#define TESTVAL_NONNEG(VAL,SUF,SIZE)	\
+  x = PASTE(PASTE(VAL,.),SUF);		\
+  si = VAL;\
+  ui = VAL;\
+  sll = PASTE(VAL,LL);			\
+  ull = PASTE(VAL,ULL);			\
+  a = si;\
+  b = sll;\
+  c = ui;\
+  d = ull;\
+  e = VAL;\
+  f = VAL;\
+  g = PASTE(VAL,LL);			\
+  h = PASTE(VAL,ULL);			\
+  if ((__builtin_memcmp ((void *),

[PATCH] rs6000 PR83660 fix ICE with vec_extract

2018-04-13 Thread Aaron Sawdey

Per the discussion on the 83660, I've come to a minimal patch to
prevent this. Basically marking the vec_extract tree as having side
effects later makes sure that it gets all the cleanup points it needs
so that gimplify_cleanup_point_expr () is happy.  Also because
vec_insert puts a MODIFY_EXPR in there, it has side effects and this
problem will not occur.

Doing bootstrap/regtest on ppc64le with -mcpu=power7 since that is
where this issue arises. OK for trunk if everything passes?

Thanks,
   Aaron


2018-04-13  Aaron Sawdey  

PR target/83660
* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin): Mark
vec_extract expression as having side effects to make sure it gets
a cleanup point.

2018-04-13  Aaron Sawdey  

PR target/83660
* gcc.target/powerpc/pr83660.C: New test.

-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC ToolchainIndex: config/rs6000/rs6000-c.c
===
--- config/rs6000/rs6000-c.c	(revision 259353)
+++ config/rs6000/rs6000-c.c	(working copy)
@@ -6705,6 +6705,15 @@
   stmt = build_binary_op (loc, PLUS_EXPR, stmt, arg2, 1);
   stmt = build_indirect_ref (loc, stmt, RO_NULL);
 
+  /* PR83660: We mark this as having side effects so that
+	 downstream in fold_build_cleanup_point_expr () it will get a
+	 CLEANUP_POINT_EXPR.  If it does not we can run into an ICE
+	 later in gimplify_cleanup_point_expr ().  Potentially this
+	 causes missed optimization because the actually is no side
+	 effect.  */
+  if (c_dialect_cxx ())
+	TREE_SIDE_EFFECTS (stmt) = 1;
+
   return stmt;
 }
 
Index: testsuite/gcc.target/powerpc/pr83660.C
===
--- testsuite/gcc.target/powerpc/pr83660.C	(nonexistent)
+++ testsuite/gcc.target/powerpc/pr83660.C	(working copy)
@@ -0,0 +1,14 @@
+/* PR target/83660 */
+/* { dg-do compile } */
+/* { dg-options "-mcpu=power7" } */
+
+#include 
+
+typedef __vector unsigned int  uvec32_t  __attribute__((__aligned__(16)));
+
+unsigned get_word(uvec32_t v)
+{
+return ({const unsigned _B1 = 32;
+vec_extract((uvec32_t)v, 2);});
+}
+

Re: [PATCH] Fix bbpart handling of EH pads (PR rtl-optimization/85393)

2018-04-13 Thread Jakub Jelinek

On Fri, Apr 13, 2018 at 09:32:07PM +0200, Eric Botcazou wrote:
> > This works nicely if the landing pad bb hasn't been really modified much,
> > in particular it still needs to contain only the instructions emitted
> > by expand_dw2_landing_pad_for_region function call and then an unconditional
> > jump or at least single successor and nothing else in the bb.
> 
> How is that supposed to work for SJLJ though?

Dunno, does SJLJ emit .gcc_except_table section with this limitation?

> OK, but also make expand_dw2_landing_pad_for_region private so that the same 
> mistake is not made again in the future.

Done, thanks.

Jakub

[PATCH v2, rs6000] (PR84302) Fix _mm_slli_epi{32,64} for shift values 16 through 31 and negative

2018-04-13 Thread Paul Clarke

The powerpc versions of _mm_slli_epi32 and __mm_slli_epi64 in emmintrin.h
do not properly handle shift values between 16 and 31, inclusive.
These are setting up the shift with vec_splat_s32, which only accepts
*5 bit signed* shift values, or a range of -16 to 15.  Values above 15
produce an error:

  error: argument 1 must be a 5-bit signed literal

Fix is to effectively reduce the range for which vec_splat_s32 is used
to < 32 and use vec_splats otherwise.

Also, __mm_slli_epi{16,32,64}, when given a negative shift value,
should always return a vector of {0}.

2018-04-13  Paul A. Clarke  

Changes in v2:
- fixed the "shift by 0" cases, which were being treated as negative
  shifts and returning {0} instead of a no-op. (Segher)
- fixed the test cases to correctly test the shift-by zero cases.

gcc/

PR target/83402
* config/rs6000/emmintrin.h (_mm_slli_epi{16,32,64}):
Ensure that vec_splat_s32 is only called with 0 <= shift < 16.
Ensure negative shifts result in {0}.

gcc/testsuite/

PR target/83402
* gcc.target/powerpc/sse2-psllw-1.c: Refactor and add tests for
several values:  positive, negative, and zero.
* gcc.target/powerpc/sse2-pslld-1.c: Same.
* gcc.target/powerpc/sse2-psllq-1.c: Same.

Index: gcc/config/rs6000/emmintrin.h
===
--- gcc/config/rs6000/emmintrin.h   (revision 259375)
+++ gcc/config/rs6000/emmintrin.h   (working copy)
@@ -1488,7 +1488,7 @@ _mm_slli_epi16 (__m128i __A, int __B)
   __v8hu lshift;
   __v8hi result = { 0, 0, 0, 0, 0, 0, 0, 0 };
 
-  if (__B < 16)
+  if (__B >= 0 && __B < 16)
 {
   if (__builtin_constant_p(__B))
  lshift = (__v8hu) vec_splat_s16(__B);
@@ -1507,12 +1507,12 @@ _mm_slli_epi32 (__m128i __A, int __B)
   __v4su lshift;
   __v4si result = { 0, 0, 0, 0 };
 
-  if (__B < 32)
+  if (__B >= 0 && __B < 32)
 {
-  if (__builtin_constant_p(__B))
-   lshift = (__v4su) vec_splat_s32(__B);
+  if (__builtin_constant_p(__B) && __B < 16)
+lshift = (__v4su) vec_splat_s32(__B);
   else
-   lshift = vec_splats ((unsigned int) __B);
+lshift = vec_splats ((unsigned int) __B);
 
   result = vec_vslw ((__v4si) __A, lshift);
 }
@@ -1527,17 +1527,12 @@ _mm_slli_epi64 (__m128i __A, int __B)
   __v2du lshift;
   __v2di result = { 0, 0 };
 
-  if (__B < 64)
+  if (__B >= 0 && __B < 64)
 {
-  if (__builtin_constant_p(__B))
-   {
- if (__B < 32)
- lshift = (__v2du) vec_splat_s32(__B);
-   else
- lshift = (__v2du) vec_splats((unsigned long long)__B);
-   }
+  if (__builtin_constant_p(__B) && __B < 16)
+   lshift = (__v2du) vec_splat_s32(__B);
   else
- lshift = (__v2du) vec_splats ((unsigned int) __B);
+   lshift = (__v2du) vec_splats ((unsigned int) __B);
 
   result = vec_vsld ((__v2di) __A, lshift);
 }
Index: gcc/testsuite/gcc.target/powerpc/sse2-pslld-1.c
===
--- gcc/testsuite/gcc.target/powerpc/sse2-pslld-1.c (revision 259375)
+++ gcc/testsuite/gcc.target/powerpc/sse2-pslld-1.c (working copy)
@@ -13,32 +13,50 @@
 #define TEST sse2_test_pslld_1
 #endif
 
-#define N 0xf
-
 #include 
 
-static __m128i
-__attribute__((noinline, unused))
-test (__m128i s1)
-{
-  return _mm_slli_epi32 (s1, N); 
-}
+#define TEST_FUNC(id, N) \
+  static __m128i \
+  __attribute__((noinline, unused)) \
+  test##id (__m128i s1) \
+  { \
+return _mm_slli_epi32 (s1, N);  \
+  }
 
+TEST_FUNC(0, 0)
+TEST_FUNC(15, 15)
+TEST_FUNC(16, 16)
+TEST_FUNC(31, 31)
+TEST_FUNC(neg1, -1)
+TEST_FUNC(neg16, -16)
+TEST_FUNC(neg32, -32)
+TEST_FUNC(neg64, -64)
+TEST_FUNC(neg128, -128)
+
+#define TEST_CODE(id, N) \
+  { \
+int e[4] = {0}; \
+union128i_d u, s; \
+int i; \
+s.x = _mm_set_epi32 (1, -2, 3, 4); \
+u.x = test##id (s.x); \
+if (N >= 0 && N < 32) \
+  for (i = 0; i < 4; i++) \
+e[i] = s.a[i] << (N * (N >= 0)); \
+if (check_union128i_d (u, e)) \
+  abort (); \
+  }
+
 static void
 TEST (void)
 {
-  union128i_d u, s;
-  int e[4] = {0};
-  int i;
- 
-  s.x = _mm_set_epi32 (1, -2, 3, 4);
-
-  u.x = test (s.x);
-
-  if (N < 32)
-for (i = 0; i < 4; i++)
-  e[i] = s.a[i] << N; 
-
-  if (check_union128i_d (u, e))
-abort (); 
+  TEST_CODE(0, 0);
+  TEST_CODE(15, 15);
+  TEST_CODE(16, 16);
+  TEST_CODE(31, 31);
+  TEST_CODE(neg1, -1);
+  TEST_CODE(neg16, -16);
+  TEST_CODE(neg32, -32);
+  TEST_CODE(neg64, -64);
+  TEST_CODE(neg128, -128);
 }
Index: gcc/testsuite/gcc.target/powerpc/sse2-psllq-1.c
===
--- gcc/testsuite/gcc.target/powerpc/sse2-psllq-1.c (revision 259375)
+++ gcc/testsuite/gcc.target/powerpc/sse2-psllq-1.c (working copy)
@@ -13,36 +13,56 @@
 #define TEST sse2_test_psllq_1
 #endif
 
-#define N 60
-
 #include 
 
 #ifdef

[PATCH, rs6000] Fix tests that are failing in gcc.target/powerpc/bfp with -m32

2018-04-13 Thread Kelvin Nilsen

Twelve failures have been occuring in the bfp test directory during -m32
regression testing.

The cause of these failures was two-fold:

1. Patches added subsequent to development of the tests caused new error
messages
to be emitted that are different than the error messages expected in the
dejagnu patterns.
These new patches also changed which built-in functions are legal when
compiling with the
-m32 command-line option.

2. The implementation of overloaded built-in functions maps overloaded
function names to
non-overloaded names.  Depending on the stage at which an error is
recognized, error
messages may refer either to the overloaded built-in function name or
the non-overloaded
name.

This patch:

1. Changes the expected error messages in certain test programs.

2. Disables certain test programs from being exercised on 32-bit targets.

3. Adds a "note" error message to explain the mapping from overloaded
built-in functions
to non-overloaded built-in functions.


This patch has bootstrapped and tested without regressions on both
powerpc64le-unknown-linux (P8) and on powerpc-linux (P7 big-endian, with
both -m32
and -m64 target options).

Is this ok for trunk?

gcc/ChangeLog:

2018-04-13  Kelvin Nilsen  

    * config/rs6000/rs6000-protos.h (rs6000_builtin_is_supported_p):
    New prototype.
    * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
    Add note to error message to explain internal mapping of overloaded
    built-in function name to non-overloaded built-in function name.
    * config/rs6000/rs6000.c (rs6000_builtin_is_supported_p): New
    function.

gcc/testsuite/ChangeLog:

2018-04-13  Kelvin Nilsen  

    * gcc.target/powerpc/bfp/scalar-extract-sig-5.c: Simplify to
    prevent cascading of errors and change expected error message.
    * gcc.target/powerpc/bfp/scalar-test-neg-4.c: Restrict this test
    to 64-bit targets.
    * gcc.target/powerpc/bfp/scalar-test-data-class-8.c: Likewise.
    * gcc.target/powerpc/bfp/scalar-test-data-class-9.c: Likewise.
    * gcc.target/powerpc/bfp/scalar-test-data-class-10.c: Likewise.
    * gcc.target/powerpc/bfp/scalar-insert-exp-11.c: Change expected
    error message.
    * gcc.target/powerpc/bfp/scalar-extract-exp-5.c: Likewise.

Index: gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-5.c
===
--- gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-5.c   
(revision 259316)
+++ gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-5.c   
(working copy)
@@ -8,10 +8,10 @@
    error because the builtin requires 64 bits.  */
 #include 
 
-unsigned __int128 /* { dg-error "'__int128' is not supported on this
target" } */
+unsigned long long int
 get_significand (__ieee128 *p)
 {
   __ieee128 source = *p;
 
-  return __builtin_vec_scalar_extract_sig (source); /* { dg-error
"builtin function '__builtin_vec_scalar_extract_sig' not supported in
this compiler configuration" } */
+  return (long long int) __builtin_vec_scalar_extract_sig (source); /*
{ dg-error "requires ISA 3.0 IEEE 128-bit floating point" } */
 }
Index: gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-neg-4.c
===
--- gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-neg-4.c   
(revision 259316)
+++ gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-neg-4.c    (working
copy)
@@ -1,5 +1,6 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" }
{ "-mcpu=power9" } } */
+/* { dg-require-effective-target lp64 } */
 /* { dg-require-effective-target powerpc_p9vector_ok } */
 /* { dg-options "-mcpu=power9" } */
 
@@ -11,6 +12,8 @@
 {
   __ieee128 source = *p;
 
+  /* IEEE 128-bit floating point operations are only supported
+ on 64-bit targets.  */
   return scalar_test_neg (source);
 }
 
Index: gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-data-class-8.c
===
--- gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-data-class-8.c   
(revision 259316)
+++ gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-data-class-8.c   
(working copy)
@@ -1,5 +1,6 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" }
{ "-mcpu=power9" } } */
+/* { dg-require-effective-target lp64 } */
 /* { dg-require-effective-target powerpc_p9vector_ok } */
 /* { dg-options "-mcpu=power9" } */
 
@@ -11,6 +12,8 @@
 {
   __ieee128 source = *p;
 
+  /* IEEE 128-bit floating point operations are only supported
+ on 64-bit targets.  */
   return scalar_test_data_class (source, 3);
 }
 
Index: gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-11.c
===
--- gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-11.c   
(revision 259316)
+++

Re: RFA[powerpc]: patch to fix PR79916

2018-04-13 Thread Alexander Monakov

On Fri, 13 Apr 2018, Jakub Jelinek wrote:
> if (reg_renumber[regno] >= 0)
>   regno = reg_renumber[regno];
> else
>   regno = cl == NO_REGS ? -1 : ira_class_hard_regs[cl][1];
> or
> regno = (reg_renumber[regno] >= 0
>  ? reg_renumber[regno]
>  : cl == NO_REGS ? -1 : ira_class_hard_regs[cl][1]);
> is better, the latter is perhaps more compact.

Here's another compact variant:

  regno = reg_renumber[regno];
  if (regno < 0)
regno = cl == NO_REGS ? -1 : ira_class_hard_regs[cl][1];

Alexander

Re: RFA[powerpc]: patch to fix PR79916

2018-04-13 Thread Jakub Jelinek

On Fri, Apr 13, 2018 at 03:29:47PM -0400, Vladimir Makarov wrote:
>   The attached patch fixes
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79916
> 
>   The PR is about LRA cycling on some tests when SD data are used.  The
> problem was in that actual assigned reg to pseudo was not in the pseudo
> preferred class and this resulted in wrong generated code which LRA tried to
> change again and again.
> 
>   The patch was successfully bootstrapped and tested on ppc64 (gcc110).
> 
>   Is it ok to commit?

I have just formatting nits and will defer the actual review to the PowerPC
maintainers.  The nit is that all the lines are too long.
Not sure if
  if (reg_renumber[regno] >= 0)
regno = reg_renumber[regno];
  else
regno = cl == NO_REGS ? -1 : ira_class_hard_regs[cl][1];
or
  regno = (reg_renumber[regno] >= 0
   ? reg_renumber[regno]
   : cl == NO_REGS ? -1 : ira_class_hard_regs[cl][1]);
is better, the latter is perhaps more compact.

> --- config/rs6000/rs6000.c(revision 259330)
> +++ config/rs6000/rs6000.c(working copy)
> @@ -10610,7 +10610,7 @@ rs6000_emit_move (rtx dest, rtx source,
>if (regno >= FIRST_PSEUDO_REGISTER)
>   {
> cl = reg_preferred_class (regno);
> -   regno = cl == NO_REGS ? -1 : ira_class_hard_regs[cl][1];
> +   regno = reg_renumber[regno] >= 0 ? reg_renumber[regno] : cl == 
> NO_REGS ? -1 : ira_class_hard_regs[cl][1];
>   }
>if (regno >= 0 && ! FP_REGNO_P (regno))
>   {
> @@ -10635,7 +10635,7 @@ rs6000_emit_move (rtx dest, rtx source,
>   {
> cl = reg_preferred_class (regno);
> gcc_assert (cl != NO_REGS);
> -   regno = ira_class_hard_regs[cl][0];
> +   regno = reg_renumber[regno] >= 0 ? reg_renumber[regno] : 
> ira_class_hard_regs[cl][0];
>   }
>if (FP_REGNO_P (regno))
>   {
> @@ -10664,7 +10664,7 @@ rs6000_emit_move (rtx dest, rtx source,
>if (regno >= FIRST_PSEUDO_REGISTER)
>   {
> cl = reg_preferred_class (regno);
> -   regno = cl == NO_REGS ? -1 : ira_class_hard_regs[cl][0];
> +   regno = reg_renumber[regno] >= 0 ? reg_renumber[regno] : cl == 
> NO_REGS ? -1 : ira_class_hard_regs[cl][0];
>   }
>if (regno >= 0 && ! FP_REGNO_P (regno))
>   {
> @@ -10689,7 +10689,7 @@ rs6000_emit_move (rtx dest, rtx source,
>   {
> cl = reg_preferred_class (regno);
> gcc_assert (cl != NO_REGS);
> -   regno = ira_class_hard_regs[cl][0];
> +   regno = reg_renumber[regno] >= 0 ? reg_renumber[regno] : 
> ira_class_hard_regs[cl][0];
>   }
>if (FP_REGNO_P (regno))
>   {

Jakub

Re: [PATCH, rs6000] (PR84302) Fix _mm_slli_epi{32,64} for shift values 16 through 31 and negative

2018-04-13 Thread Paul Clarke

On 04/13/2018 02:37 PM, Segher Boessenkool wrote:
> On Thu, Apr 12, 2018 at 07:07:21PM -0500, Paul Clarke wrote:
>> The powerpc versions of _mm_slli_epi32 and __mm_slli_epi64 in emmintrin.h
>> do not properly handle shift values between 16 and 31, inclusive.
>> These were setting up the shift with vec_splat_s32, which only accepts
>> *5 bit signed* shift values, or a range of -16 to 15.  Values above 15
>> produced an error:
>>
>>   error: argument 1 must be a 5-bit signed literal
>>
>> Fix is to effectively reduce the range for which vec_splat_s32 is used
>> to < 32 and use vec_splats otherwise.
>>
>> Also, __mm_slli_epi{16,32,64}, when given a negative shift value,
>> should always return a vector of {0}.
> 
> Yup.
> 
>> 2018-04-12  Paul A. Clarke  
>>
>> gcc/config
> 
> There is no changelog there; the changelog is in gcc/.  I'll fix it up.

At some point, I'll need to figure out what that means, I guess.  :-)

>> --- gcc/config/rs6000/emmintrin.h(revision 259016)
>> +++ gcc/config/rs6000/emmintrin.h(working copy)
>> @@ -1488,7 +1488,7 @@ _mm_slli_epi16 (__m128i __A, int __B)
>>__v8hu lshift;
>>__v8hi result = { 0, 0, 0, 0, 0, 0, 0, 0 };
>>  
>> -  if (__B < 16)
>> +  if (__B > 0 && __B < 16)
>>  {
>>if (__builtin_constant_p(__B))
>>lshift = (__v8hu) vec_splat_s16(__B);
>> @@ -1507,12 +1507,12 @@ _mm_slli_epi32 (__m128i __A, int __B)
>>__v4su lshift;
>>__v4si result = { 0, 0, 0, 0 };
>>  
>> -  if (__B < 32)
>> +  if (__B > 0 && __B < 32)
> 
> These should >= 0 (I'll fix it).

Oh, you're right, of course.  That also means the test cases have to change.  
Shall I submit a new patch with those changes?

> Thanks for the patch!  I'll commit it; please get your commit access in
> order if you plan any further patches.

I'll work on that.

PC

Re: [PATCH] Fix CSE CLZ/CTZ handling (PR rtl-optimization/85376, variant)

2018-04-13 Thread Richard Biener

On April 13, 2018 9:33:31 PM GMT+02:00, Eric Botcazou  
wrote:
>> 2018-04-13  Jakub Jelinek  
>> 
>>  PR rtl-optimization/85376
>>  * simplify-rtx.c (simplify_const_unary_operation): For CLZ and CTZ
>and
>>  zero op0, if C?Z_DEFINED_VALUE_AT_ZERO is false, return NULL_RTX
>>  instead of a specific value.
>> 
>>  * gcc.dg/pr85376.c: New test.
>
>My preference would be for this one.

Likewise. 

Richard.

Re: [PATCH, rs6000] (PR84302) Fix _mm_slli_epi{32,64} for shift values 16 through 31 and negative

2018-04-13 Thread Segher Boessenkool

Hi!

On Thu, Apr 12, 2018 at 07:07:21PM -0500, Paul Clarke wrote:
> The powerpc versions of _mm_slli_epi32 and __mm_slli_epi64 in emmintrin.h
> do not properly handle shift values between 16 and 31, inclusive.
> These were setting up the shift with vec_splat_s32, which only accepts
> *5 bit signed* shift values, or a range of -16 to 15.  Values above 15
> produced an error:
> 
>   error: argument 1 must be a 5-bit signed literal
> 
> Fix is to effectively reduce the range for which vec_splat_s32 is used
> to < 32 and use vec_splats otherwise.
> 
> Also, __mm_slli_epi{16,32,64}, when given a negative shift value,
> should always return a vector of {0}.

Yup.

> 2018-04-12  Paul A. Clarke  
> 
> gcc/config

There is no changelog there; the changelog is in gcc/.  I'll fix it up.

> --- gcc/config/rs6000/emmintrin.h (revision 259016)
> +++ gcc/config/rs6000/emmintrin.h (working copy)
> @@ -1488,7 +1488,7 @@ _mm_slli_epi16 (__m128i __A, int __B)
>__v8hu lshift;
>__v8hi result = { 0, 0, 0, 0, 0, 0, 0, 0 };
>  
> -  if (__B < 16)
> +  if (__B > 0 && __B < 16)
>  {
>if (__builtin_constant_p(__B))
> lshift = (__v8hu) vec_splat_s16(__B);
> @@ -1507,12 +1507,12 @@ _mm_slli_epi32 (__m128i __A, int __B)
>__v4su lshift;
>__v4si result = { 0, 0, 0, 0 };
>  
> -  if (__B < 32)
> +  if (__B > 0 && __B < 32)

These should >= 0 (I'll fix it).

Thanks for the patch!  I'll commit it; please get your commit access in
order if you plan any further patches.

Cheers,


Segher

[PATCH] Fix CSE CLZ/CTZ handling (PR rtl-optimization/85376, variant)

2018-04-13 Thread Jakub Jelinek

On Fri, Apr 13, 2018 at 12:33:02AM +0200, Jakub Jelinek wrote:
> The following patch let us punt in these cases.  Bootstrapped/regtested on
> x86_64-linux and i686-linux, ok for trunk?
> 
> Another option would be to tweak simplify-rtx.c and instead of doing
>   else if (! CTZ_DEFINED_VALUE_AT_ZERO (imode, int_value))
> int_value = GET_MODE_PRECISION (imode);
> do
>   else if (! CTZ_DEFINED_VALUE_AT_ZERO (imode, int_value))
>   return NULL_RTX;
> and similarly for CLZ, haven't tested what would break if anything;
> we've been doing something like that since r62453 when the
> C?Z_DEFINED_VALUE_AT_ZERO macros have been introduced, and before that
> actually the same, just unconditionally assumed the value is undefined at 0.

And here is the variant patch, also bootstrapped/regtested on x86_64-linux
and i686-linux.  Is this one ok for trunk, or the other one (or something
else)?

2018-04-13  Jakub Jelinek  

PR rtl-optimization/85376
* simplify-rtx.c (simplify_const_unary_operation): For CLZ and CTZ and
zero op0, if C?Z_DEFINED_VALUE_AT_ZERO is false, return NULL_RTX
instead of a specific value.

* gcc.dg/pr85376.c: New test.

--- gcc/simplify-rtx.c.jj   2018-03-21 11:13:00.322234030 +0100
+++ gcc/simplify-rtx.c  2018-04-13 11:47:05.689621937 +0200
@@ -1877,7 +1877,7 @@ simplify_const_unary_operation (enum rtx
  if (wi::ne_p (op0, 0))
int_value = wi::clz (op0);
  else if (! CLZ_DEFINED_VALUE_AT_ZERO (imode, int_value))
-   int_value = GET_MODE_PRECISION (imode);
+   return NULL_RTX;
  result = wi::shwi (int_value, result_mode);
  break;
 
@@ -1889,7 +1889,7 @@ simplify_const_unary_operation (enum rtx
  if (wi::ne_p (op0, 0))
int_value = wi::ctz (op0);
  else if (! CTZ_DEFINED_VALUE_AT_ZERO (imode, int_value))
-   int_value = GET_MODE_PRECISION (imode);
+   return NULL_RTX;
  result = wi::shwi (int_value, result_mode);
  break;
 
--- gcc/testsuite/gcc.dg/pr85376.c.jj   2018-04-12 17:44:41.506370642 +0200
+++ gcc/testsuite/gcc.dg/pr85376.c  2018-04-12 17:45:11.669401115 +0200
@@ -0,0 +1,32 @@
+/* PR rtl-optimization/85376 */
+/* { dg-do run { target int128 } } */
+/* { dg-options "-Og -fno-dce -fgcse -fno-tree-ccp -fno-tree-copy-prop 
-Wno-psabi" } */
+
+typedef unsigned int U __attribute__ ((vector_size (64)));
+typedef unsigned __int128 V __attribute__ ((vector_size (64)));
+unsigned int e, i, l;
+unsigned char f;
+U g, h, k, j;
+
+static inline V
+foo (unsigned char n, unsigned short o, unsigned int p, U q, U r, U s)
+{
+  unsigned int t;
+  o <<= 5;
+  q[7] >>= __builtin_add_overflow (0xfff0, __builtin_ffs (n), [5]);
+  t = __builtin_ffs (g[7]);
+  e *= __builtin_sub_overflow (o, t, );
+  return f + (V) g + (V) h + (V) q + i + (V) j + (V) s + (V) k + l;
+}
+
+int
+main ()
+{
+  if (__SIZEOF_INT128__ != 16 || __SIZEOF_INT__ != 4 || __CHAR_BIT__ != 8)
+return 0;
+  V x = foo (0, 1, 5, (U) { }, (U) { }, (U) { });
+  for (unsigned i = 0; i < 4; i++)
+if ((unsigned int) x[i] != 0x20)
+  __builtin_abort ();
+  return 0;
+}


Jakub

Re: [PATCH] Fix CSE CLZ/CTZ handling (PR rtl-optimization/85376, variant)

2018-04-13 Thread Eric Botcazou

> 2018-04-13  Jakub Jelinek  
> 
>   PR rtl-optimization/85376
>   * simplify-rtx.c (simplify_const_unary_operation): For CLZ and CTZ and
>   zero op0, if C?Z_DEFINED_VALUE_AT_ZERO is false, return NULL_RTX
>   instead of a specific value.
> 
>   * gcc.dg/pr85376.c: New test.

My preference would be for this one.

-- 
Eric Botcazou

Re: [PATCH] Fix bbpart handling of EH pads (PR rtl-optimization/85393)

2018-04-13 Thread Eric Botcazou

> This works nicely if the landing pad bb hasn't been really modified much,
> in particular it still needs to contain only the instructions emitted
> by expand_dw2_landing_pad_for_region function call and then an unconditional
> jump or at least single successor and nothing else in the bb.

How is that supposed to work for SJLJ though?

> 2018-04-13  Jakub Jelinek  
> 
>   PR rtl-optimization/85393
>   * bb-reorder.c (fix_up_crossing_landing_pad): In new_bb emit just
>   a label and unconditional jump to old_bb, rather than
>   expand_dw2_landing_pad_for_region insn(s) and jump to single_succ
>   basic block.

OK, but also make expand_dw2_landing_pad_for_region private so that the same 
mistake is not made again in the future.

-- 
Eric Botcazou

RFA[powerpc]: patch to fix PR79916

2018-04-13 Thread Vladimir Makarov


  The attached patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79916

  The PR is about LRA cycling on some tests when SD data are used.  The 
problem was in that actual assigned reg to pseudo was not in the pseudo 
preferred class and this resulted in wrong generated code which LRA 
tried to change again and again.


  The patch was successfully bootstrapped and tested on ppc64 (gcc110).

  Is it ok to commit?


Index: ChangeLog
===
--- ChangeLog	(revision 259376)
+++ ChangeLog	(working copy)
@@ -1,3 +1,10 @@
+2018-04-13  Vladimir Makarov  
+
+	PR rtl-optimization/79916
+	* config/rs6000/rs6000.c (rs6000_emit_move): Use assigned hard
+	regs (if any) to define how to gnerate SD moves when LRA is in
+	progress.
+
 2018-04-13  Jan Hubicka  
 	Bin Cheng  
 
Index: testsuite/ChangeLog
===
--- testsuite/ChangeLog	(revision 259376)
+++ testsuite/ChangeLog	(working copy)
@@ -1,3 +1,8 @@
+2018-04-13  Vladimir Makarov  
+
+	PR rtl-optimization/79916
+	* gcc.target/powerpc/pr79916.c: New.
+
 2018-04-13  Andrey Belevantsev  
 
 	PR rtl-optimization/83852
Index: config/rs6000/rs6000.c
===
--- config/rs6000/rs6000.c	(revision 259330)
+++ config/rs6000/rs6000.c	(working copy)
@@ -10610,7 +10610,7 @@ rs6000_emit_move (rtx dest, rtx source,
   if (regno >= FIRST_PSEUDO_REGISTER)
 	{
 	  cl = reg_preferred_class (regno);
-	  regno = cl == NO_REGS ? -1 : ira_class_hard_regs[cl][1];
+	  regno = reg_renumber[regno] >= 0 ? reg_renumber[regno] : cl == NO_REGS ? -1 : ira_class_hard_regs[cl][1];
 	}
   if (regno >= 0 && ! FP_REGNO_P (regno))
 	{
@@ -10635,7 +10635,7 @@ rs6000_emit_move (rtx dest, rtx source,
 	{
 	  cl = reg_preferred_class (regno);
 	  gcc_assert (cl != NO_REGS);
-	  regno = ira_class_hard_regs[cl][0];
+	  regno = reg_renumber[regno] >= 0 ? reg_renumber[regno] : ira_class_hard_regs[cl][0];
 	}
   if (FP_REGNO_P (regno))
 	{
@@ -10664,7 +10664,7 @@ rs6000_emit_move (rtx dest, rtx source,
   if (regno >= FIRST_PSEUDO_REGISTER)
 	{
 	  cl = reg_preferred_class (regno);
-	  regno = cl == NO_REGS ? -1 : ira_class_hard_regs[cl][0];
+	  regno = reg_renumber[regno] >= 0 ? reg_renumber[regno] : cl == NO_REGS ? -1 : ira_class_hard_regs[cl][0];
 	}
   if (regno >= 0 && ! FP_REGNO_P (regno))
 	{
@@ -10689,7 +10689,7 @@ rs6000_emit_move (rtx dest, rtx source,
 	{
 	  cl = reg_preferred_class (regno);
 	  gcc_assert (cl != NO_REGS);
-	  regno = ira_class_hard_regs[cl][0];
+	  regno = reg_renumber[regno] >= 0 ? reg_renumber[regno] : ira_class_hard_regs[cl][0];
 	}
   if (FP_REGNO_P (regno))
 	{
Index: testsuite/gcc.target/powerpc/pr79916.c
===
--- testsuite/gcc.target/powerpc/pr79916.c	(nonexistent)
+++ testsuite/gcc.target/powerpc/pr79916.c	(working copy)
@@ -0,0 +1,556 @@
+/* { dg-do compile  { target { powerpc64le-*-* } } } */
+/* { dg-options "-Idfp -fno-expensive-optimizations --param ira-max-conflict-table-size=0 -mno-popcntd -O3" } */
+
+#define PASTE2(A,B) A ## B
+#define PASTE(A,B) PASTE2(A,B)
+
+#define TESTVAL_NEG(VAL,SUF,SIZE)	\
+  x = PASTE(PASTE(VAL,.),SUF);		\
+  si = VAL;\
+  sll = PASTE(VAL,LL);			\
+  a = si;\
+  b = sll;\
+  c = VAL;\
+  d = PASTE(VAL,LL);			\
+  if ((__builtin_memcmp ((void *), (void *), SIZE) != 0)		\
+  || (__builtin_memcmp ((void *), (void *),SIZE) != 0)		\
+  || (__builtin_memcmp ((void *), (void *),SIZE) != 0)		\
+  || (__builtin_memcmp ((void *), (void *),SIZE) != 0))		\
+__builtin_abort ();
+
+#define TESTVAL_NEG_BIG(VAL,SUF,SIZE)	\
+  x = PASTE(PASTE(VAL,.),SUF);		\
+  sll = PASTE(VAL,LL);			\
+  a = sll;\
+  b = PASTE(VAL,LL);			\
+  if ((__builtin_memcmp ((void *), (void *), SIZE) != 0)		\
+  || (__builtin_memcmp ((void *), (void *),SIZE) != 0))		\
+__builtin_abort ();
+
+#define TESTVAL_NONNEG(VAL,SUF,SIZE)	\
+  x = PASTE(PASTE(VAL,.),SUF);		\
+  si = VAL;\
+  ui = VAL;\
+  sll = PASTE(VAL,LL);			\
+  ull = PASTE(VAL,ULL);			\
+  a = si;\
+  b = sll;\
+  c = ui;\
+  d = ull;\
+  e = VAL;\
+  f = VAL;\
+  g = PASTE(VAL,LL);			\
+  h = PASTE(VAL,ULL);			\
+  if ((__builtin_memcmp ((void *), (void *), SIZE) != 0)		\
+  || (__builtin_memcmp ((void *), (void *),SIZE) != 0)		\
+  || (__builtin_memcmp ((void *), (void *),SIZE) != 0)		\
+  || (__builtin_memcmp ((void *), (void *),SIZE) != 0)		\
+  || (__builtin_memcmp ((void *), (void *),SIZE) != 0)		\
+  || (__builtin_memcmp ((void *), (void *),SIZE) != 0)		\
+  || (__builtin_memcmp ((void *), (void *),SIZE) != 0)		\
+  || (__builtin_memcmp ((void

Re: [C++ Patch] PR 85112 ("[8 Regression] ICE with invalid constexpr")

2018-04-13 Thread Jason Merrill

On Fri, Apr 13, 2018 at 1:18 PM, Paolo Carlini  wrote:
> Hi,
>
>
> On 13/04/2018 19:14, Jason Merrill wrote:
>>
>> On Fri, Apr 13, 2018 at 1:05 PM, Paolo Carlini 
>> wrote:
>>>
>>> Hi,
>>>
>>> On 13/04/2018 16:06, Jason Merrill wrote:

 On Fri, Apr 13, 2018 at 5:05 AM, Paolo Carlini
 
 wrote:
>
> Hi,
>
> in this error-recovery regression, after a sensible error message
> issued
> by
> cxx_constant_init, store_init_value stores an error_mark_node as
> DECL_INITIAL of the VAR_DECL for 'j'. This error_mark_node reappears
> much
> later, to cause a crash during gimplification. As far as I know, the
> choice
> of storing error_mark_nodes too is the outcome of a rather delicate
> error-recovery strategy and I don't think we want to revisit it at this
> time, thus the remaining option is catching later the error_mark_node,
> at
> a
> "good" time. I note, in passing, that the do loop in gimplify_expr
> which
> uses the crashing STRIP_USELESS_TYPE_CONVERSION seems a bit lacking
> from
> the
> error-recovery point of view, because at each iteration it *does* cover
> for
> error_operand_p (save_expr) but only immediately after the call, when
> it's
> too late.
>
> All the above said, I believe that at least for the 8.1.0 needs we may
> want
> to catch the error_mark_node in cp_build_modify_expr, when we are
> handling
> the assignment 'a.n = j;': convert_for_assignment produces a NOP_EXPR
> from
> the VAR_DECL for 'j' which then cp_convert_and_check regenerates (deep
> in
> convert_to_integer_1 via maybe_fold_build1_loc) in the final bare-bone
> form,
> with the error_mark_node as the first operand. Passes testing on
> x86_64-linux.

 We should avoid wrapping an error_mark_node in a NOP_EXPR in the first
 place.
>>>
>>> Basing on my analysis, that's easy to do, in convert_to_integer_1.
>>
>> Are we passing an error_mark_node down into convert_to_integer_1?
>> Where are we folding away the VAR_DECL?
>
> That convert gets a NOP_EXPR with the VAR_DECL for 'j' as argument and
> returns the error_mark_node.

Ah, I see.  The problem is that even though convert_to_integer_1 was
called with dofold false, we lose that when it calls convert().  Why
not recurse directly to convert_to_integer_1 with the underlying type?
 That would avoid the undesired folding.

Jason

[PATCH] Fix bbpart handling of EH pads (PR rtl-optimization/85393)

2018-04-13 Thread Jakub Jelinek

Hi!

As mentioned in the comments, the .gcc_except_table format doesn't allow the
throwing region and corresponding landing pad to be in different partitions,
because it uses landing_pad_symbol - the start of the partition in which the
throwing region is and we don't generally have relocations for arbitrary
symbol subtraction.  The fix_up_crossing_landing_pad caller checks if all
the EH incoming edges are from the same partition, in that case makes sure
the landing pad is also in that partition; in case they are from different
partitions, "duplicates" the landing pad into another bb, redirects EH edges
from one of the partitions to the new landing pad and makes sure we jump
from that landing pad to the single_succ of the old landing pad.

This works nicely if the landing pad bb hasn't been really modified much,
in particular it still needs to contain only the instructions emitted
by expand_dw2_landing_pad_for_region function call and then an unconditional
jump or at least single successor and nothing else in the bb.

On the following testcase, GCSE pre pass adds another instruction into the
landing pad bb and because fix_up_crossing_landing_pad doesn't really
duplicate the bb, but instead regenerates it from scratch based from what
the landing pad should contain, that pre inserted insn is lost from the new
landing pad and we segfault trying to dereference uninitialized register.
Similarly, if e.g. the landing pad bb is merged with some other bb, or has
more than one successor etc.

Instead of trying to duplicate the whole landing pad (which also wastes
space), this patch instead just emits into the new landing pad's bb a label
and immediately jumps to the other landing pad (that is a crossing jump of
course).  This way we don't need to duplicate anything.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-04-13  Jakub Jelinek  

PR rtl-optimization/85393
* bb-reorder.c (fix_up_crossing_landing_pad): In new_bb emit just
a label and unconditional jump to old_bb, rather than
expand_dw2_landing_pad_for_region insn(s) and jump to single_succ
basic block.

* g++.dg/opt/pr85393.C: New test.
* g++.dg/opt/pr85393-aux.cc: New file.

--- gcc/bb-reorder.c.jj 2018-01-12 19:20:32.412976124 +0100
+++ gcc/bb-reorder.c2018-04-13 15:59:41.094859051 +0200
@@ -1409,14 +1409,14 @@ get_uncond_jump_length (void)
 }
 
 /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions.
-   Duplicate the landing pad and split the edges so that no EH edge
-   crosses partitions.  */
+   Add a new landing pad that will just jump to the old one and split the
+   edges so that no EH edge crosses partitions.  */
 
 static void
 fix_up_crossing_landing_pad (eh_landing_pad old_lp, basic_block old_bb)
 {
   eh_landing_pad new_lp;
-  basic_block new_bb, last_bb, post_bb;
+  basic_block new_bb, last_bb;
   rtx_insn *jump;
   unsigned new_partition;
   edge_iterator ei;
@@ -1431,24 +1431,20 @@ fix_up_crossing_landing_pad (eh_landing_
   /* Put appropriate instructions in new bb.  */
   rtx_code_label *new_label = emit_label (new_lp->landing_pad);
 
-  expand_dw2_landing_pad_for_region (old_lp->region);
-
-  post_bb = BLOCK_FOR_INSN (old_lp->landing_pad);
-  post_bb = single_succ (post_bb);
-  rtx_code_label *post_label = block_label (post_bb);
-  jump = emit_jump_insn (targetm.gen_jump (post_label));
-  JUMP_LABEL (jump) = post_label;
+  rtx_code_label *old_label = block_label (old_bb);
+  jump = emit_jump_insn (targetm.gen_jump (old_label));
+  JUMP_LABEL (jump) = old_label;
 
   /* Create new basic block to be dest for lp.  */
   last_bb = EXIT_BLOCK_PTR_FOR_FN (cfun)->prev_bb;
   new_bb = create_basic_block (new_label, jump, last_bb);
   new_bb->aux = last_bb->aux;
-  new_bb->count = post_bb->count;
+  new_bb->count = old_bb->count;
   last_bb->aux = new_bb;
 
   emit_barrier_after_bb (new_bb);
 
-  make_single_succ_edge (new_bb, post_bb, 0);
+  make_single_succ_edge (new_bb, old_bb, 0);
 
   /* Make sure new bb is in the other partition.  */
   new_partition = BB_PARTITION (old_bb);
@@ -1457,7 +1453,7 @@ fix_up_crossing_landing_pad (eh_landing_
 
   /* Fix up the edges.  */
   for (ei = ei_start (old_bb->preds); (e = ei_safe_edge (ei)) != NULL; )
-if (BB_PARTITION (e->src) == new_partition)
+if (e->src != new_bb && BB_PARTITION (e->src) == new_partition)
   {
rtx_insn *insn = BB_END (e->src);
rtx note = find_reg_note (insn, REG_EH_REGION, NULL_RTX);
--- gcc/testsuite/g++.dg/opt/pr85393.C.jj   2018-04-13 12:05:46.935137936 
+0200
+++ gcc/testsuite/g++.dg/opt/pr85393.C  2018-04-13 12:10:03.473256722 +0200
@@ -0,0 +1,29 @@
+// PR rtl-optimization/85393
+// { dg-do run { target c++11 } }
+// { dg-options "-O2" }
+// { dg-additional-sources "pr85393-aux.cc" }
+
+#include 
+#include 
+
+void foo (char const *s);
+struct S { ~S () noexcept (false) { throw std::runtime_error ("foo"); } };
+
+int
+main

Re: [PATCH] Fix -fsanitize=address VLA instrumentation (PR sanitizer/85230)

2018-04-13 Thread Jakub Jelinek

On Fri, Apr 13, 2018 at 12:16:06AM +0200, Jakub Jelinek wrote:
> Bootstrapped on
> {x86_64,i686,powerpc64,powerpc64le,aarch64,s390x,armv7hl}-linux, regtested
> on {x86_64,i686,powerpc64,powerpc64le}-linux so far, but on the power* ones
> on virtual address space size that isn't really supported (likely
> https://github.com/google/sanitizers/issues/933#issuecomment-380058705
> issue, so while nothing regresses there, pretty much all asan tests fail
> there before and after the patch); also tested successfully with
> asan.exp=alloca* on gcc110 and gcc112 on compile farm where it doesn't
> suffer from the VA issue.  Ok if testing passes also on aarch64, s390x
> and armv7hl?

Passed regtest also on aarch64, s390x and armv7hl, additionally with the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85394#c2 patch on top of this
patch bootstrapped/regtested on powerpc64{,le}-linux, this time without
any asan related issues.

Ok for trunk?

Jakub

Re: [PATCH] Make redirection only for target_clones: V2 (PR ipa/85329).

2018-04-13 Thread Jakub Jelinek

On Fri, Apr 13, 2018 at 04:33:40PM +0200, Martin Liška wrote:
> > Ah, but we emit the resolver only if we see a use of it.  That sounds quite
> > broken, resolver in each TU that uses it?  Better to have one at each
> > definition...
> > 
> > Jakub
> > 
> 
> So after quite some time I would need some brainstorming as I'm not sure how 
> to
> fix that properly. Let's start how 'target' attribute works and I believe it 
> should
> behave same as target_clones attribute:

I think the target_clones behavior if you put the .resolver into
.text. section in  comdat rather than
.text..resolver and .resolver comdat is reasonable.

The thing is that both the .resolver and  symbols are
defined in the section in which the resolver is defined.
Maybe it would be nice if the resolver was made not exported out of the TU,
and eventually, not necessarily now for GCC8, turn the specializations into
non-exported symbols too and put them into the same section or different
sections of the same comdat group.

For ctors and dtors we need extra care about the aliases, not sure if we can
have an alias to ifunc symbol, or if we need to emit two ifunc symbols with
the same resolver or what exactly.

And yes, it would be nice if the target attribute multiversioning worked
similarly to this.  It would change behavior for it in ABI incompatible way,
the question is how much work it would be as well.

What you can do right now with the target attribute multiversioning and
couldn't do after the change would be e.g.
mv.h:
__attribute__((target ("default"))) void foo ();
__attribute__((target ("avx"))) void foo ();
mv1.C:
#include "mv.h"
__attribute__((target ("default"))) void foo () {}
mv2.C:
#include "mv.h"
__attribute__((target ("avx"))) void foo () {}
mv3.C:
#include "mv.h"
void bar () { foo (); }
You couldn't this in the semantics closer to target_clones, all the
definitions would need to be done in the same file which is where you'd also
get the ifunc symbol.

IMHO it is worth changing the semantics anyway, because the current one
isn't very well thought out, it doesn't really work well with comdats, can
have too many resolvers with the associated costs, etc.

Honza, do you agree?

Jakub

[PATCH] configure.ac: honor --with-gcc-major-version in gcc-driver-name.h (PR jit/85384, variant)

2018-04-13 Thread Jakub Jelinek

On Thu, Apr 12, 2018 at 04:51:21PM -0400, David Malcolm wrote:
> This patch updates gcc/configure.ac to use gcc_base_ver.
> 
> I had to drop the \$\$ from the sed expression to get it to work
> within the configure script; I'm not entirely sure what their purpose
> is.  Without them, it's still matching on the first group of numeric
> characters in BASE-VER.
> 
> Tested with and without --with-gcc-major-version; in each case,
> gcc-driver-name.h is correctly determined.
> 
> Fixes the linker issue reported downstream in
>   https://bugzilla.redhat.com/show_bug.cgi?id=1566178
> and fixes the driver not found issue with:
>   gcc_jit_context_set_bool_use_external_driver (ctxt, 1);
> 
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> 
> OK for trunk?
> 
> config/ChangeLog:
>   PR jit/85384
>   * acx.m4 (GCC_BASE_VER): Remove \$\$ from sed expression.
> 
> gcc/ChangeLog:
>   PR jit/85384
>   * configure.ac (gcc-driver-name.h): Honor --with-gcc-major-version
>   by using gcc_base_ver to generate a gcc_driver_version, and use
>   it when generating GCC_DRIVER_NAME.
>   * configure.ac: Regenerate.

Here is the variant I've talked about in patch form.  Bootstrapped/regtested
on x86_64-linux and i686-linux and tested with
--enable-languages=jit --enable-host-shared --disable-bootstrap
and
--enable-languages=jit --enable-host-shared --disable-bootstrap 
--with-gcc-major-version-only
Ok for trunk?

2018-04-13  Jakub Jelinek  

PR jit/85384
* configure.ac (GCC_DRIVER_NAME): For --with-gcc-major-version-only
use just major version in the driver filename rather than full
version.
* configure: Regenerated.

--- gcc/configure.ac.jj 2018-04-12 10:22:56.179162225 +0200
+++ gcc/configure.ac2018-04-13 16:16:02.712459619 +0200
@@ -6499,8 +6499,14 @@ AC_DEFINE_UNQUOTED(DIAGNOSTICS_COLOR_DEF
 
 # Generate gcc-driver-name.h containing GCC_DRIVER_NAME for the benefit
 # of jit/jit-playback.c.
+changequote(,)dnl
+gcc_driver_version=$gcc_BASEVER
+if test x$with_gcc_major_version_only = xyes ; then
+  gcc_driver_version=`echo $gcc_BASEVER | sed -e 's/^\([0-9]*\).*$/\1/'`
+fi
+changequote([,])dnl
 cat > gcc-driver-name.h < gcc-driver-name.h <

Re: [C++ Patch] PR 85112 ("[8 Regression] ICE with invalid constexpr")

2018-04-13 Thread Paolo Carlini


Hi,

On 13/04/2018 19:14, Jason Merrill wrote:

On Fri, Apr 13, 2018 at 1:05 PM, Paolo Carlini  wrote:

Hi,

On 13/04/2018 16:06, Jason Merrill wrote:

On Fri, Apr 13, 2018 at 5:05 AM, Paolo Carlini 
wrote:

Hi,

in this error-recovery regression, after a sensible error message issued
by
cxx_constant_init, store_init_value stores an error_mark_node as
DECL_INITIAL of the VAR_DECL for 'j'. This error_mark_node reappears much
later, to cause a crash during gimplification. As far as I know, the
choice
of storing error_mark_nodes too is the outcome of a rather delicate
error-recovery strategy and I don't think we want to revisit it at this
time, thus the remaining option is catching later the error_mark_node, at
a
"good" time. I note, in passing, that the do loop in gimplify_expr which
uses the crashing STRIP_USELESS_TYPE_CONVERSION seems a bit lacking from
the
error-recovery point of view, because at each iteration it *does* cover
for
error_operand_p (save_expr) but only immediately after the call, when
it's
too late.

All the above said, I believe that at least for the 8.1.0 needs we may
want
to catch the error_mark_node in cp_build_modify_expr, when we are
handling
the assignment 'a.n = j;': convert_for_assignment produces a NOP_EXPR
from
the VAR_DECL for 'j' which then cp_convert_and_check regenerates (deep in
convert_to_integer_1 via maybe_fold_build1_loc) in the final bare-bone
form,
with the error_mark_node as the first operand. Passes testing on
x86_64-linux.

We should avoid wrapping an error_mark_node in a NOP_EXPR in the first
place.

Basing on my analysis, that's easy to do, in convert_to_integer_1.

Are we passing an error_mark_node down into convert_to_integer_1?
Where are we folding away the VAR_DECL?
That convert gets a NOP_EXPR with the VAR_DECL for 'j' as argument and 
returns the error_mark_node.


Paolo.

Re: [C++ Patch] PR 85112 ("[8 Regression] ICE with invalid constexpr")

2018-04-13 Thread Jason Merrill

On Fri, Apr 13, 2018 at 1:05 PM, Paolo Carlini  wrote:
> Hi,
>
> On 13/04/2018 16:06, Jason Merrill wrote:
>>
>> On Fri, Apr 13, 2018 at 5:05 AM, Paolo Carlini 
>> wrote:
>>>
>>> Hi,
>>>
>>> in this error-recovery regression, after a sensible error message issued
>>> by
>>> cxx_constant_init, store_init_value stores an error_mark_node as
>>> DECL_INITIAL of the VAR_DECL for 'j'. This error_mark_node reappears much
>>> later, to cause a crash during gimplification. As far as I know, the
>>> choice
>>> of storing error_mark_nodes too is the outcome of a rather delicate
>>> error-recovery strategy and I don't think we want to revisit it at this
>>> time, thus the remaining option is catching later the error_mark_node, at
>>> a
>>> "good" time. I note, in passing, that the do loop in gimplify_expr which
>>> uses the crashing STRIP_USELESS_TYPE_CONVERSION seems a bit lacking from
>>> the
>>> error-recovery point of view, because at each iteration it *does* cover
>>> for
>>> error_operand_p (save_expr) but only immediately after the call, when
>>> it's
>>> too late.
>>>
>>> All the above said, I believe that at least for the 8.1.0 needs we may
>>> want
>>> to catch the error_mark_node in cp_build_modify_expr, when we are
>>> handling
>>> the assignment 'a.n = j;': convert_for_assignment produces a NOP_EXPR
>>> from
>>> the VAR_DECL for 'j' which then cp_convert_and_check regenerates (deep in
>>> convert_to_integer_1 via maybe_fold_build1_loc) in the final bare-bone
>>> form,
>>> with the error_mark_node as the first operand. Passes testing on
>>> x86_64-linux.
>>
>> We should avoid wrapping an error_mark_node in a NOP_EXPR in the first
>> place.
>
> Basing on my analysis, that's easy to do, in convert_to_integer_1.

Are we passing an error_mark_node down into convert_to_integer_1?
Where are we folding away the VAR_DECL?

Jason

Re: [C++ Patch] PR 85112 ("[8 Regression] ICE with invalid constexpr")

2018-04-13 Thread Paolo Carlini


Hi,

On 13/04/2018 16:06, Jason Merrill wrote:

On Fri, Apr 13, 2018 at 5:05 AM, Paolo Carlini  wrote:

Hi,

in this error-recovery regression, after a sensible error message issued by
cxx_constant_init, store_init_value stores an error_mark_node as
DECL_INITIAL of the VAR_DECL for 'j'. This error_mark_node reappears much
later, to cause a crash during gimplification. As far as I know, the choice
of storing error_mark_nodes too is the outcome of a rather delicate
error-recovery strategy and I don't think we want to revisit it at this
time, thus the remaining option is catching later the error_mark_node, at a
"good" time. I note, in passing, that the do loop in gimplify_expr which
uses the crashing STRIP_USELESS_TYPE_CONVERSION seems a bit lacking from the
error-recovery point of view, because at each iteration it *does* cover for
error_operand_p (save_expr) but only immediately after the call, when it's
too late.

All the above said, I believe that at least for the 8.1.0 needs we may want
to catch the error_mark_node in cp_build_modify_expr, when we are handling
the assignment 'a.n = j;': convert_for_assignment produces a NOP_EXPR from
the VAR_DECL for 'j' which then cp_convert_and_check regenerates (deep in
convert_to_integer_1 via maybe_fold_build1_loc) in the final bare-bone form,
with the error_mark_node as the first operand. Passes testing on
x86_64-linux.

We should avoid wrapping an error_mark_node in a NOP_EXPR in the first place.
Basing on my analysis, that's easy to do, in convert_to_integer_1. I 
wasn't sure we wanted to touch such basic facilities ;) 
Implementation-wise, I wondered whether we wanted to handle NOP_EXPR and 
error_mark_node specially inside build1 itself, but, again, that seems a 
bit invasive, plus all the build* facilities always allocate a new node 
as the first step. Anyway, fully testing the below will require a bit of 
time, shall I go ahead with that, given that g++.dg passed?


Thanks!
Paolo.


Index: convert.c
===
--- convert.c   (revision 259376)
+++ convert.c   (working copy)
@@ -743,6 +743,8 @@ convert_to_integer_1 (tree type, tree expr, bool d
{
  expr = convert (lang_hooks.types.type_for_mode
  (TYPE_MODE (type), TYPE_UNSIGNED (type)), expr);
+ if (expr == error_mark_node)
+   return error_mark_node;
  return maybe_fold_build1_loc (dofold, loc, NOP_EXPR, type, expr);
}
 
Index: testsuite/g++.dg/cpp0x/pr85112.C
===
--- testsuite/g++.dg/cpp0x/pr85112.C(nonexistent)
+++ testsuite/g++.dg/cpp0x/pr85112.C(working copy)
@@ -0,0 +1,17 @@
+// PR c++/85112
+// { dg-do compile { target c++11 } }
+
+struct A
+{
+  int m;
+  int n : 4;
+};
+
+int i;  // { dg-message "not const" }
+
+void foo()
+{
+  constexpr int j = i;  // { dg-error "not usable" }
+  A a;
+  a.n = j;
+}

[PATCH] avoid -Wrestrict for null pointers (PR 85365)

2018-04-13 Thread Martin Sebor


PR 85365 is another example of a false positive caused by
the interaction between the instrumentation inserted by
sanitizers, jump threading, and a middle-end warning.
In this case, the warning is easy to avoid by suppressing
-Wrestrict for null pointers.  Although undefined, since
they do no point to an object, strictly speaking they do
not indicate an overlap, and so issuing a -Wrestrict
warning is not quite appropriate anyway.

Testing in progress.

Martin
PR c/85365 -  -Wrestrict false positives with -fsanitize=undefined

gcc/ChangeLog:

	PR c/85365
	* gimple-fold.c (gimple_fold_builtin_strcpy): Suppress -Wrestrict
	for null pointers.
	(gimple_fold_builtin_stxcpy_chk): Same.
	* gimple-ssa-warn-restrict.c (check_bounds_or_overlap): Same.

gcc/testsuite/ChangeLog:

	PR c/85365
	* gcc.dg/Wrestrict-15.c: New test.

Index: gcc/gimple-fold.c
===
--- gcc/gimple-fold.c	(revision 259298)
+++ gcc/gimple-fold.c	(working copy)
@@ -1612,7 +1612,11 @@ gimple_fold_builtin_strcpy (gimple_stmt_iterator *
   /* If SRC and DEST are the same (and not volatile), return DEST.  */
   if (operand_equal_p (src, dest, 0))
 {
-  if (!gimple_no_warning_p (stmt))
+  /* Issue -Wrestrict unless the pointers are null (those do
+	 not point to objects and so do not indicate an overlap;
+	 such calls could be the result of sanitization and jump
+	 threading).  */
+  if (!integer_zerop (dest) && !gimple_no_warning_p (stmt))
 	{
 	  tree func = gimple_call_fndecl (stmt);
 
@@ -2593,7 +2597,11 @@ gimple_fold_builtin_stxcpy_chk (gimple_stmt_iterat
   /* If SRC and DEST are the same (and not volatile), return DEST.  */
   if (fcode == BUILT_IN_STRCPY_CHK && operand_equal_p (src, dest, 0))
 {
-  if (!gimple_no_warning_p (stmt))
+  /* Issue -Wrestrict unless the pointers are null (those do
+	 not point to objects and so do not indicate an overlap;
+	 such calls could be the result of sanitization and jump
+	 threading).  */
+  if (!integer_zerop (dest) && !gimple_no_warning_p (stmt))
 	{
 	  tree func = gimple_call_fndecl (stmt);
 
Index: gcc/gimple-ssa-warn-restrict.c
===
--- gcc/gimple-ssa-warn-restrict.c	(revision 259298)
+++ gcc/gimple-ssa-warn-restrict.c	(working copy)
@@ -1880,11 +1880,20 @@ check_bounds_or_overlap (gcall *call, tree dst, tr
 
   if (operand_equal_p (dst, src, 0))
 {
-  warning_at (loc, OPT_Wrestrict,
-		  "%G%qD source argument is the same as destination",
-		  call, func);
-  gimple_set_no_warning (call, true);
-  return false;
+  /* Issue -Wrestrict unless the pointers are null (those do
+	 not point to objects and so do not indicate an overlap;
+	 such calls could be the result of sanitization and jump
+	 threading).  */
+  if (!integer_zerop (dst) && !gimple_no_warning_p (call))
+	{
+	  warning_at (loc, OPT_Wrestrict,
+		  "%G%qD source argument is the same as destination",
+		  call, func);
+	  gimple_set_no_warning (call, true);
+	  return false;
+	}
+
+  return true;
 }
 
   /* Return false when overlap has been detected.  */
Index: gcc/testsuite/gcc.dg/Wrestrict-15.c
===
--- gcc/testsuite/gcc.dg/Wrestrict-15.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/Wrestrict-15.c	(working copy)
@@ -0,0 +1,38 @@
+/* PR 85365 - -Wrestrict false positives with -fsanitize=undefined
+   { dg-do compile }
+   { dg-options "-O2 -Wrestrict -fsanitize=undefined" } */
+
+typedef __SIZE_TYPE__ size_t;
+
+char *strcpy (char *, const char *);
+char *strcat (char *, const char *);
+
+size_t strlen (char *);
+
+extern char a[], b[], d[];
+
+size_t t1 (char *g, int i)
+{
+  /* The following exercises the handling in gimple-fold.c.  */
+  strcpy (g + 4, i ? b : a);/* { dg-bogus "\\\[-Wrestrict]" } */
+  return strlen (g + 4);
+}
+
+void t2 (char *g, int i)
+{
+  strcpy (g + 4, i ? b : a);/* { dg-bogus "\\\[-Wrestrict]" } */
+  strcat (g + 4, d);
+}
+
+void t3 (char *g, int i)
+{
+  /* The following exercises the handling in gimple-ssa-warn-restrict.c.  */
+  strcat (g + 4, i ? b : a);/* { dg-bogus "\\\[-Wrestrict]" } */
+  strcat (g + 4, d);
+}
+
+void t4 (char *p, char *q)
+{
+  strcpy (p, q);/* { dg-bogus "\\\[-Wrestrict]" } */
+  strcat (p, q + 32);
+}

[PATCH] PR 83402 Fix ICE for vec_splat_s8, vec_splat_s16, vec_splat_s32 builtins

2018-04-13 Thread Carl Love


GCC Maintainers:

GCC revision 255549  implemented early gimple folding for the
vec_splat_s[8,16,32] builtins.  However, as a consequence of the
implementation, we lost error checking for out-of-range values for the
expected vspltis[bhw] instructions.  The result of not having the out-
of-range checking is we get and ICE.

This patch adds the missing error checking on argument zero for the
vec_splat_s[8,16,32] builtins.  The argument must be a 5-bit const int
as specified for the vspltis[bhw] instructions.

The regression testing for the patch was done on GCC mainline on 

powerpc64le-unknown-linux-gnu (Power 8 LE)

with no regressions.  Additional hand testing was done as well to test
the various cases such as

vec_splat_s8 (31)
vec_splat_s8 (32)
vec_splat_s8 (value) where "value" is an integer variable

to verify vector result is correct for a 5-bit const int argument (
i.e. 31).  The error message "error: argument 1 must be a 5-bit signed
literal" is generated in the other cases.

Please let me know if the patch looks OK for the GCC 7 branch.

 Carl Love
---


gcc/ChangeLog:

2018-04-13  Carl Love  

PR target/83402
* config/rs6000/rs6000-c.c (rs6000_gimple_fold_builtin): Add
size check for arg0.
---
 gcc/config/rs6000/rs6000.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index a0c9b5c..855be43 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -16576,8 +16576,9 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
   {
 arg0 = gimple_call_arg (stmt, 0);
 lhs = gimple_call_lhs (stmt);
-/* Only fold the vec_splat_*() if arg0 is constant.  */
-if (TREE_CODE (arg0) != INTEGER_CST)
+/* Only fold the vec_splat_*() if arg0 is a 5-bit constant.  */
+if (TREE_CODE (arg0) != INTEGER_CST
+|| TREE_INT_CST_LOW (arg0) & ~0x1f)
   return false;
 gimple_seq stmts = NULL;
 location_t loc = gimple_location (stmt);
-- 
2.7.4

[ARM,testsuite] Restrict armv8_2-fp16-scalar-2 to hf targets

2018-04-13 Thread Christophe Lyon

Hi,

It seems gcc.target/arm/armv8_2-fp16-scalar-2.c needs an armhf target
to pass: it wants 4 fmov.f16, but there are only 2 on
arm-linux-gnueabi for instance.

Since the test includes arm_fp16.h, adding -mfloat-abi=hard implies
the usual problem of missing gnu/stubs-hard.h, so it seems to me that
the easiest way is just to skip this test on non-hf targets.

Is this simple patch OK or do we want something smarter?

Thanks,

Christophe
2018-04-13  Christophe Lyon  

* gcc.target/arm/armv8_2-fp16-scalar-2.c: Require arm_hf_eabi.
diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-scalar-2.c 
b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-scalar-2.c
index fa4828d..fcd7db4 100644
--- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-scalar-2.c
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-scalar-2.c
@@ -1,4 +1,5 @@
 /* { dg-do compile }  */
+/* { dg-require-effective-target arm_hf_eabi }  */
 /* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok }  */
 /* { dg-options "-O2 -std=c11" }  */
 /* { dg-add-options arm_v8_2a_fp16_scalar }  */

Re: [AARCH64] Neon vld1__x3, vst1__x2 and vst1_*_x3 intrinsics

2018-04-13 Thread James Greenhalgh

On Fri, Apr 13, 2018 at 03:39:32PM +0100, Sameera Deshpande wrote:
> On Fri 13 Apr, 2018, 8:04 PM James Greenhalgh, 
> > wrote:
> On Fri, Apr 06, 2018 at 08:55:47PM +0100, Christophe Lyon wrote:
> > Hi,
> >
> > 2018-04-06 12:15 GMT+02:00 Sameera Deshpande 
> > >:
> > > Hi Christophe,
> > >
> > > Please find attached the updated patch with testcases.
> > >
> > > Ok for trunk?
> >
> > Thanks for the update.
> >
> > Since the new intrinsics are only available on aarch64, you want to
> > prevent the tests from running on arm.
> > Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the two 
> > targets.
> > There are several examples on how to do that in that directory.
> >
> > I have also noticed that the tests fail at execution on aarch64_be.
> 
> I think this is important to fix. We don't want the big-endian target to have
> failing implementations of the Neon intrinsics. What is the nature of the
> failure?
> 
> From what I can see, nothing in the patch prevents using these intrinsics
> on big-endian, so either the intrinsics behaviour is wrong (we have a wrong
> code bug), or the testcase expected behaviour is wrong.
> 
> I don't think disabling the test for big-endian is the right fix. We should
> either fix the intrinsics, or fix the testcase.
> 
> Thanks,
> James
> 
> Hi James,
> 
> As the tests assume the little endian order of elements while checking the
> results, the tests are failing for big endian targets. So, the failures are
> not because of intrinsic implementations, but because of the testcase.

The testcase is a little hard to follow through the macros, but why would
this be the case?

ld1 is deterministic on big and little endian for which elements will be
loaded from memory, as is st1.

My expectation would be that:

  int __attribute__ ((noinline))
  test_vld_u16_x3 ()
  {
uint16_t data[3 * 3];
uint16_t temp[3 * 3];
uint16x4x3_t vectors;
int i,j;
for (i = 0; i < 3 * 3; i++)
  data [i] = (uint16_t) 3*i;
asm volatile ("" : : : "memory");
vectors = vld1_u16_x3 (data);
vst1_u16 (temp, vectors.val[0]);
vst1_u16 ([3], vectors.val[1]);
vst1_u16 ([3 * 2], vectors.val[2]);
asm volatile ("" : : : "memory");
for (j = 0; j < 3 * 3; j++)
  if (temp[j] != data[j])
return 1;
return 0;
  }

would work equally well for big- or little-endian.

I think this is more likely to be an intrinsics implementation bug.

Thanks,
James

Re: [AARCH64] Neon vld1__x3, vst1__x2 and vst1_*_x3 intrinsics

2018-04-13 Thread Sameera Deshpande

On Fri 13 Apr, 2018, 8:04 PM James Greenhalgh, 
wrote:

> On Fri, Apr 06, 2018 at 08:55:47PM +0100, Christophe Lyon wrote:
> > Hi,
> >
> > 2018-04-06 12:15 GMT+02:00 Sameera Deshpande <
> sameera.deshpa...@linaro.org>:
> > > Hi Christophe,
> > >
> > > Please find attached the updated patch with testcases.
> > >
> > > Ok for trunk?
> >
> > Thanks for the update.
> >
> > Since the new intrinsics are only available on aarch64, you want to
> > prevent the tests from running on arm.
> > Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the two
> targets.
> > There are several examples on how to do that in that directory.
> >
> > I have also noticed that the tests fail at execution on aarch64_be.
>
> I think this is important to fix. We don't want the big-endian target to
> have
> failing implementations of the Neon intrinsics. What is the nature of the
> failure?
>
> From what I can see, nothing in the patch prevents using these intrinsics
> on big-endian, so either the intrinsics behaviour is wrong (we have a wrong
> code bug), or the testcase expected behaviour is wrong.
>
> I don't think disabling the test for big-endian is the right fix. We should
> either fix the intrinsics, or fix the testcase.
>
> Thanks,
> James
>
> Hi James,


As the tests assume the little endian order of elements while checking the
results, the tests are failing for big endian targets. So, the failures are
not because of intrinsic implementations, but because of the testcase.

- Thanks and regards,
  Sameera D.

Re: [AARCH64] Neon vld1__x3, vst1__x2 and vst1_*_x3 intrinsics

2018-04-13 Thread James Greenhalgh

On Fri, Apr 06, 2018 at 08:55:47PM +0100, Christophe Lyon wrote:
> Hi,
> 
> 2018-04-06 12:15 GMT+02:00 Sameera Deshpande :
> > Hi Christophe,
> >
> > Please find attached the updated patch with testcases.
> >
> > Ok for trunk?
> 
> Thanks for the update.
> 
> Since the new intrinsics are only available on aarch64, you want to
> prevent the tests from running on arm.
> Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the two 
> targets.
> There are several examples on how to do that in that directory.
> 
> I have also noticed that the tests fail at execution on aarch64_be.

I think this is important to fix. We don't want the big-endian target to have
failing implementations of the Neon intrinsics. What is the nature of the
failure?

>From what I can see, nothing in the patch prevents using these intrinsics
on big-endian, so either the intrinsics behaviour is wrong (we have a wrong
code bug), or the testcase expected behaviour is wrong.

I don't think disabling the test for big-endian is the right fix. We should
either fix the intrinsics, or fix the testcase.

Thanks,
James

Re: [PATCH] Make redirection only for target_clones: V2 (PR ipa/85329).

2018-04-13 Thread Martin Liška

On 04/12/2018 07:59 PM, Jakub Jelinek wrote:
> On Thu, Apr 12, 2018 at 07:53:35PM +0200, Jakub Jelinek wrote:
>> On Thu, Apr 12, 2018 at 03:46:26PM +0200, Jan Hubicka wrote:
>>> If you make C++ inline and get the idea to use target cloning attribute on 
>>> this,
>>> this will likely lead to link error if you compile multiple files because 
>>> you
>>> turn comdat to non-comdat.
>>>
>>> For comdats this woudl effectivly need to become C++ abi extension and we 
>>> would
>>> need to define comdat sections for these.  Perhaps easiest way is to simply
>>> reject the attribute on comdats and probaby also extern functions?
>>
>> I'm not really sure we can do that, various packages in the wild are already
>> using this.
>> What is the problem with comdats and multi-versioning?
>> The question is what comdat groups we should use for the comdat resolver and
>> the versioned functions, shall the ifunc symbol be the original mangling of
>> the method (or other comdat) and the other entrypoints just be .local
>> non-weak symbols inside of the same section?
> 
> Ah, but we emit the resolver only if we see a use of it.  That sounds quite
> broken, resolver in each TU that uses it?  Better to have one at each
> definition...
> 
>   Jakub
> 

So after quite some time I would need some brainstorming as I'm not sure how to
fix that properly. Let's start how 'target' attribute works and I believe it 
should
behave same as target_clones attribute:

$ cat mv.h
int foo (void);

void test ();

$ cat mv.cpp 
#include "mv.h"

  __attribute__ ((target ("default")))
int foo ()
{
  return 0;
}

__attribute__ ((target ("sse4.2")))
int foo ()
{
  return 1;
}

int main ()
{
  __builtin_printf ("in main: %d\n", foo ());
  test ();

  return 0;
}

$ cat mv2.cpp 
#include "mv.h"

void test()
{
  int (*f) (void) = 
  __builtin_printf ("value: %d\n", foo ());
}

$ gcc -fdump-ipa-cgraph=/dev/stdout mv.cpp -c

_Z13_Z3foov.ifuncv/2 (int _Z3foov.ifunc()) @0x767182e0
  Type: function definition analyzed alias cpp_implicit_alias
  Visibility: externally_visible asm_written public comdat 
comdat_group:_Z3foov.resolver one_only section:.text._Z3foov.resolver 
(implicit_section) artificial
  Same comdat group as: _Z3foov.resolver/6
  References: _Z3foov.resolver/6 (alias)
  Referring: 
  Availability: overwritable
  First run: 0
  Version info: next: _Z3foov/0 dispatcher: _Z3foov.resolver
  Function flags:
  Called by: 
  Calls: 
_Z3foov.sse4.2/1 (int foo()) @0x76718170
  Type: function definition analyzed
  Visibility: force_output externally_visible asm_written public
  Address is taken.
  References: 
  Referring: 
  Availability: available
  First run: 0
  Version info: prev: _Z3foov/0 dispatcher: int _Z3foov.ifunc()
  Function flags:
  Called by: 
  Calls: 
_Z3foov/0 (int foo()) @0x76718000
  Type: function definition analyzed
  Visibility: force_output externally_visible asm_written public
  Address is taken.
  References: 
  Referring: 
  Availability: available
  First run: 0
  Version info: next: _Z3foov.sse4.2/1 dispatcher: int _Z3foov.ifunc()
  Function flags:
  Called by: 
  Calls: 
_Z3foov.resolver/6 (_Z3foov.resolver) @0x767188a0
  Type: function definition analyzed
  Visibility: externally_visible asm_written public weak comdat 
comdat_group:_Z3foov.resolver one_only section:.text._Z3foov.resolver 
(implicit_section) artificial
  Same comdat group as: _Z13_Z3foov.ifuncv/2
  References: 
  Referring: _Z13_Z3foov.ifuncv/2 (alias)
  Availability: available
  First run: 0
  Function flags:
  Called by: 
  Calls: 

So note that resolver and ifunc live in a unique comdat_group. And as opposed 
to target_clones, default implementation has not appended '.default' suffix. 
That's
problem because an address taken returns default implementation as seen here:

$  gcc mv.cpp mv2.cpp && ./a.out 
in main: 1
value: 0

Which is the same problem is fixed for target_clones in this release cycle. So 
it's broken.

Apart from that, following is broken for target attribute:
cat Crypto-target.ii
struct aaa
{
  static __attribute__((target("avx512f"))) void foo() { __builtin_printf 
("avx512f\n"); }
  static __attribute__((target("sse"))) void foo() {__builtin_printf ("sse\n");}
  static __attribute__((target("default"))) void foo() {__builtin_printf 
("default\n");}
};

void (*initializer) (void) = { ::foo };

int main()
{
  initializer ();
//  aaa::foo ();
}

$ g++ Crypto-target.ii && ./a.out
/tmp/ccJMMaDc.o:(.data+0x0): undefined reference to `_ZN3aaa3fooEv.ifunc()'
collect2: error: ld returned 1 exit status

For target_clones, I have a patch candidate that works for the test-case in 
PR85329:

$ cat ~/Programming/testcases/Crypto.ii
struct a
{
  __attribute__((target_clones("sse", "default"))) void operator^=(a) {}
} * b;

class c {
public:
  a *d();
};

class f {
  void g();
  c e;
};

void f::g() { *e.d() ^= b[0]; }

$ g++ ~/Programming/testcases/Crypto.ii -c -fdump-ipa-cgraph=/dev/stdout

_ZN1aeOES_.resolver/6

Re: [C++ Patch] PR 85112 ("[8 Regression] ICE with invalid constexpr")

2018-04-13 Thread Jason Merrill

On Fri, Apr 13, 2018 at 5:05 AM, Paolo Carlini  wrote:
> Hi,
>
> in this error-recovery regression, after a sensible error message issued by
> cxx_constant_init, store_init_value stores an error_mark_node as
> DECL_INITIAL of the VAR_DECL for 'j'. This error_mark_node reappears much
> later, to cause a crash during gimplification. As far as I know, the choice
> of storing error_mark_nodes too is the outcome of a rather delicate
> error-recovery strategy and I don't think we want to revisit it at this
> time, thus the remaining option is catching later the error_mark_node, at a
> "good" time. I note, in passing, that the do loop in gimplify_expr which
> uses the crashing STRIP_USELESS_TYPE_CONVERSION seems a bit lacking from the
> error-recovery point of view, because at each iteration it *does* cover for
> error_operand_p (save_expr) but only immediately after the call, when it's
> too late.
>
> All the above said, I believe that at least for the 8.1.0 needs we may want
> to catch the error_mark_node in cp_build_modify_expr, when we are handling
> the assignment 'a.n = j;': convert_for_assignment produces a NOP_EXPR from
> the VAR_DECL for 'j' which then cp_convert_and_check regenerates (deep in
> convert_to_integer_1 via maybe_fold_build1_loc) in the final bare-bone form,
> with the error_mark_node as the first operand. Passes testing on
> x86_64-linux.

We should avoid wrapping an error_mark_node in a NOP_EXPR in the first place.

Jason

[PATCH] i386: Insert ENDBR after __morestack call

2018-04-13 Thread H.J. Lu

Since __morestack will jump back to its callee via indirect call, we
need to insert ENDBR after calling __morestack.

OK for trunk?

H.J.

gcc/

PR target/85388
* config/i386/i386.c (ix86_expand_split_stack_prologue): Insert
ENDBR after calling __morestack.

gcc/testsuite/

PR target/85388
* gcc.dg/pr85388-1.c: New test.
* gcc.dg/pr85388-2.c: Likewise.
* gcc.dg/pr85388-3.c: Likewise.
* gcc.dg/pr85388-4.c: Likewise.
* gcc.dg/pr85388-5.c: Likewise.
* gcc.dg/pr85388-6.c: Likewise.
---
 gcc/config/i386/i386.c   | 11 ++-
 gcc/testsuite/gcc.dg/pr85388-1.c | 50 +
 gcc/testsuite/gcc.dg/pr85388-2.c | 56 
 gcc/testsuite/gcc.dg/pr85388-3.c | 65 +
 gcc/testsuite/gcc.dg/pr85388-4.c | 69 
 gcc/testsuite/gcc.dg/pr85388-5.c | 54 +++
 gcc/testsuite/gcc.dg/pr85388-6.c | 56 
 7 files changed, 360 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr85388-1.c
 create mode 100644 gcc/testsuite/gcc.dg/pr85388-2.c
 create mode 100644 gcc/testsuite/gcc.dg/pr85388-3.c
 create mode 100644 gcc/testsuite/gcc.dg/pr85388-4.c
 create mode 100644 gcc/testsuite/gcc.dg/pr85388-5.c
 create mode 100644 gcc/testsuite/gcc.dg/pr85388-6.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 03e5c433574..8b4fd8ae30b 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -15242,7 +15242,16 @@ ix86_expand_split_stack_prologue (void)
  instruction--we need control flow to continue at the subsequent
  label.  Therefore, we use an unspec.  */
   gcc_assert (crtl->args.pops_args < 65536);
-  emit_insn (gen_split_stack_return (GEN_INT (crtl->args.pops_args)));
+  rtx_insn *ret_insn
+= emit_insn (gen_split_stack_return (GEN_INT (crtl->args.pops_args)));
+
+  if ((flag_cf_protection & CF_BRANCH) && TARGET_IBT)
+{
+  /* Insert ENDBR since __morestack will jump back here via indirect
+call.  */
+  rtx cet_eb = gen_nop_endbr ();
+  emit_insn_after (cet_eb, ret_insn);
+}
 
   /* If we are in 64-bit mode and this function uses a static chain,
  we saved %r10 in %rax before calling _morestack.  */
diff --git a/gcc/testsuite/gcc.dg/pr85388-1.c b/gcc/testsuite/gcc.dg/pr85388-1.c
new file mode 100644
index 000..86d4737e32b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr85388-1.c
@@ -0,0 +1,50 @@
+/* This test needs to use setrlimit to set the stack size, so it can
+   only run on Unix.  */
+/* { dg-do run { target { i?86-*-linux* i?86-*-gnu* x86_64-*-linux* } } } */
+/* { dg-require-effective-target cet } */
+/* { dg-require-effective-target split_stack } */
+/* { dg-options "-fsplit-stack -fcf-protection -mcet" } */
+
+#include 
+#include 
+#include 
+
+/* Use a noinline function to ensure that the buffer is not removed
+   from the stack.  */
+static void use_buffer (char *buf) __attribute__ ((noinline));
+static void
+use_buffer (char *buf)
+{
+  buf[0] = '\0';
+}
+
+/* Each recursive call uses 10,000 bytes.  We call it 1000 times,
+   using a total of 10,000,000 bytes.  If -fsplit-stack is not
+   working, that will overflow our stack limit.  */
+
+static void
+down (int i)
+{
+  char buf[1];
+
+  if (i > 0)
+{
+  use_buffer (buf);
+  down (i - 1);
+}
+}
+
+int
+main (void)
+{
+  struct rlimit r;
+
+  /* We set a stack limit because we are usually invoked via make, and
+ make sets the stack limit to be as large as possible.  */
+  r.rlim_cur = 8192 * 1024;
+  r.rlim_max = 8192 * 1024;
+  if (setrlimit (RLIMIT_STACK, ) != 0)
+abort ();
+  down (1000);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/pr85388-2.c b/gcc/testsuite/gcc.dg/pr85388-2.c
new file mode 100644
index 000..fd13d984c50
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr85388-2.c
@@ -0,0 +1,56 @@
+/* { dg-do run { target { i?86-*-linux* i?86-*-gnu* x86_64-*-linux* } } } */
+/* { dg-require-effective-target cet } */
+/* { dg-require-effective-target split_stack } */
+/* { dg-require-effective-target pthread_h } */
+/* { dg-options "-pthread -fsplit-stack -fcf-protection -mcet" } */
+
+#include 
+#include 
+
+/* Use a noinline function to ensure that the buffer is not removed
+   from the stack.  */
+static void use_buffer (char *buf) __attribute__ ((noinline));
+static void
+use_buffer (char *buf)
+{
+  buf[0] = '\0';
+}
+
+/* Each recursive call uses 10,000 bytes.  We call it 1000 times,
+   using a total of 10,000,000 bytes.  If -fsplit-stack is not
+   working, that will overflow our stack limit.  */
+
+static void
+down (int i)
+{
+  char buf[1];
+
+  if (i > 0)
+{
+  use_buffer (buf);
+  down (i - 1);
+}
+}
+
+static void *
+thread_routine (void *arg __attribute__ ((unused)))
+{
+  down (1000);
+  return NULL;
+}
+
+int
+main (void)
+{
+  int i;
+  pthread_t

[x86, patch] Add tuning options to skylake-avx512

2018-04-13 Thread Koval, Julia

Hi,

This patch adds 2 tuning options to -march=skylake-avx512. Ok for trunk?

gcc/
   PR target/84413
   * config/i386/x86-tune.def (X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL,
   X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Add m_SKYLAKE_AVX512.

Thanks,
Julia


0001-pts.patch
Description: 0001-pts.patch

Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657, take 2)

2018-04-13 Thread Jakub Jelinek

On Fri, Apr 13, 2018 at 12:10:33PM +, Wilco Dijkstra wrote:
> > Anyway, here is what I think Richard was asking for, that I'm currently
> > bootstrapping/regtesting.  It can be easily combined with Martin's target
> > hook if needed, or do it only for
> > endp == 1 && target != const0_rtx && CALL_EXPR_TAILCALL (exp)
> 
> This patch causes regressions on AArch64 since it now always calls mempcpy
> again, so yes either it would need to be done only for tailcalls (which fixes 
> the

No, it only uses mempcpy if we'd otherwise call memcpy library function and user
wrote mempcpy.  AFAIK 7.x and earlier behaved that way too, so it isn't a
regression, regression is only from released GCC versions.  And, a fix is
easy, just implement fast mempcpy on aarch64 ;)

Jakub

Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657, take 2)

2018-04-13 Thread H.J. Lu

On Fri, Apr 13, 2018 at 5:10 AM, Wilco Dijkstra  wrote:

> This patch causes regressions on AArch64 since it now always calls mempcpy
> again, so yes either it would need to be done only for tailcalls (which fixes 
> the
> reported regression) or we need Martin's change too so each target can state 
> whether
> they have a fast mempcpy or not.
>

You should open a bug report to track it.

-- 
H.J.

Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657, take 2)

2018-04-13 Thread Wilco Dijkstra

Jakub Jelinek wrote:  
>On Thu, Apr 12, 2018 at 05:29:35PM +, Wilco Dijkstra wrote:
>> > Depending on what you mean old, I see e.g. in 2010 power7 mempcpy got 
>> > added,
>> > in 2013 other power versions, in 2016 s390*, etc.  Doing a decent mempcpy
>> > isn't hard if you have asm version of memcpy and one spare register.
>> 
>> More mempcpy implementations have been added in recent years indeed, but 
>> almost all
>> add an extra copy of the memcpy code rather than using a single combined 
>> implementation.
>> That means it is still better to call memcpy (which is frequently used and 
>> thus likely in L1/L2)
>> rather than mempcpy (which is more likely to be cold and thus not cached).
>
> That really depends, usually when some app uses mempcpy, it uses it very
> heavily.

But it would have to not use memcpy nearby. Do you have examples of apps using
mempcpy significantly? GLIBC is the only case I've seen that uses mempcpy.

> And all the proposed patches do is honor what the user asked, if
> you use memcpy () + n, we aren't transforming that into mempcpy behind the
> user's back.

We transform a lot of calls behind the user's back so that's not a plausible 
argument
for "honoring" the original call. Eg. (void)mempcpy always gets changed to 
memcpy,
bcopy to memmove, bzero to memset, strchr (s, 0) into strlen(s) + s - the list 
is long
and there are plenty cases where these expansions block tailcalls.

> Anyway, here is what I think Richard was asking for, that I'm currently
> bootstrapping/regtesting.  It can be easily combined with Martin's target
> hook if needed, or do it only for
> endp == 1 && target != const0_rtx && CALL_EXPR_TAILCALL (exp)

This patch causes regressions on AArch64 since it now always calls mempcpy
again, so yes either it would need to be done only for tailcalls (which fixes 
the
reported regression) or we need Martin's change too so each target can state 
whether
they have a fast mempcpy or not.

Wilco

Re: [PATCH] PR libstdc++/85222 allow catching iostream errors as gcc4-compatible ios::failure

2018-04-13 Thread Jonathan Wakely

On 13 April 2018 at 11:22, Jakub Jelinek wrote:
> On Fri, Apr 13, 2018 at 11:19:35AM +0100, Jonathan Wakely wrote:
>> --- a/libstdc++-v3/src/c++11/Makefile.am
>> +++ b/libstdc++-v3/src/c++11/Makefile.am
>> @@ -128,10 +128,7 @@ hashtable_c++0x.o: hashtable_c++0x.cc
>>
>>  if ENABLE_DUAL_ABI
>>  # Rewrite the type info for __ios_failure.
>> -rewrite_ios_failure_typeinfo = sed -e '/^_ZTISt13__ios_failure:$$/{' \
>> - -e 'n' \
>> - -e 
>> 's/_ZTVN10__cxxabiv120__si_class_type_infoE/_ZTVSt19__iosfail_type_info/' \
>> - -e '}'
>> +rewrite_ios_failure_typeinfo = sed -e 
>> '/^_*_ZTISt13__ios_failure:/,_ZTVN10__cxxabiv120__si_class_type_infoE/s/_ZTVN10__cxxabiv120__si_class_type_infoE/_ZTVSt19__iosfail_type_info/'
>
> I miss / in between , and _ZTVN10__cxxabiv120__si_class_type_infoE
>
> $ echo | sed -e 
> '/^_*_ZTISt13__ios_failure:/,_ZTVN10__cxxabiv120__si_class_type_infoE/s/_ZTVN10__cxxabiv120__si_class_type_info_ZTVSt19__iosfail_type_info/'
> sed: -e expression #1, char 29: unexpected `,'
> $ echo | sed -e 
> '/^_*_ZTISt13__ios_failure:/,/_ZTVN10__cxxabiv120__si_class_type_infoE/s/_ZTVN10__cxxabiv120__si_class_type_infoE/_ZTVSt19__iosfail_type_info/'
>
> Jakub

Not sure how that got past my testing, so I've made sure to test with
a clean build this time.

Committed to trunk and gcc-7-branch.
commit 6b12637ba78f29a2a697d3fb78e3e23076d8dee4
Author: Jonathan Wakely 
Date:   Fri Apr 13 11:30:48 2018 +0100

Fix broken sed command from previous commit

* src/c++11/Makefile.am: Fix sed command.
* src/c++11/Makefile.in: Regenerate.

diff --git a/libstdc++-v3/src/c++11/Makefile.am 
b/libstdc++-v3/src/c++11/Makefile.am
index 12bc004a2ea..cdc49bb7f9b 100644
--- a/libstdc++-v3/src/c++11/Makefile.am
+++ b/libstdc++-v3/src/c++11/Makefile.am
@@ -128,7 +128,7 @@ hashtable_c++0x.o: hashtable_c++0x.cc
 
 if ENABLE_DUAL_ABI
 # Rewrite the type info for __ios_failure.
-rewrite_ios_failure_typeinfo = sed -e 
'/^_*_ZTISt13__ios_failure:/,_ZTVN10__cxxabiv120__si_class_type_infoE/s/_ZTVN10__cxxabiv120__si_class_type_infoE/_ZTVSt19__iosfail_type_info/'
+rewrite_ios_failure_typeinfo = sed -e 
'/^_*_ZTISt13__ios_failure:/,/_ZTVN10__cxxabiv120__si_class_type_infoE/s/_ZTVN10__cxxabiv120__si_class_type_infoE/_ZTVSt19__iosfail_type_info/'
 
 cxx11-ios_failure-lt.s: cxx11-ios_failure.cc
$(LTCXXCOMPILE) -S $< -o tmp-cxx11-ios_failure-lt.s

Add test from PR 83852 (was Re: Fix PR 83962)

2018-04-13 Thread Andrey Belevantsev

On 09.04.2018 12:16, Andrey Belevantsev wrote:
> On 06.04.2018 18:59, Alexander Monakov wrote:
>> On Tue, 3 Apr 2018, Andrey Belevantsev wrote:
>>
>>> Hello,
>>>
>>> This issues is about the correct order in which we need to call the
>>> routines that fix up the control flow for us.
>>
>> OK with formatting both in the new comment and the Changelog fixed.
> 
> Thanks, fixed that in rev. 259229.

I've found out that this patch also fixes PR 83852, so I've committed the
test from that PR as obvious after verifying that it works on cross-ppc
compiler and on x86-64.

Andrey
Index: gcc.dg/pr83852.c
===
*** gcc.dg/pr83852.c(revision 0)
--- gcc.dg/pr83852.c(revision 259373)
***
*** 0 
--- 1,33 
+ /* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */
+ /* { dg-options "-std=gnu99 -O2 -fselective-scheduling -fno-if-conversion 
-fno-tree-dse -w" } */
+ long long int uo;
+ unsigned int vt;
+ 
+ void
+ r5 (long long int h8, long long int pu)
+ {
+   short int wj;
+   long long int *mh = h8;
+ 
+   for (wj = 0; wj < 3; ++wj)
+ {
+   int oq;
+   long long int ns, xf;
+ 
+   h8 += 2;
+   oq = !!h8 && !!wj;
+   ++uo;
+   vt ^= oq + uo;
+   ns = !!uo && !!vt;
+   xf = (h8 != 0) ? mh : 1;
+   pu += ns < xf;
+ }
+ 
+   for (pu = 0; pu < 1; ++pu)
+ {
+   int *sc;
+ 
+   sc = (int *)
+   *sc = 0;
+ }
+ }
Index: ChangeLog
===
*** ChangeLog   (revision 259372)
--- ChangeLog   (revision 259373)
***
*** 1,3 
--- 1,8 
+ 2018-04-13  Andrey Belevantsev  
+ 
+   PR rtl-optimization/83852
+   * gcc.dg/pr83852.c: New testcase.
+ 
  2018-04-13  Andreas Krebbel  
  
  PR testsuite/85326

Re: [PATCH] PR libstdc++/85222 allow catching iostream errors as gcc4-compatible ios::failure

2018-04-13 Thread Jakub Jelinek

On Fri, Apr 13, 2018 at 11:19:35AM +0100, Jonathan Wakely wrote:
> --- a/libstdc++-v3/src/c++11/Makefile.am
> +++ b/libstdc++-v3/src/c++11/Makefile.am
> @@ -128,10 +128,7 @@ hashtable_c++0x.o: hashtable_c++0x.cc
>  
>  if ENABLE_DUAL_ABI
>  # Rewrite the type info for __ios_failure.
> -rewrite_ios_failure_typeinfo = sed -e '/^_ZTISt13__ios_failure:$$/{' \
> - -e 'n' \
> - -e 
> 's/_ZTVN10__cxxabiv120__si_class_type_infoE/_ZTVSt19__iosfail_type_info/' \
> - -e '}'
> +rewrite_ios_failure_typeinfo = sed -e 
> '/^_*_ZTISt13__ios_failure:/,_ZTVN10__cxxabiv120__si_class_type_infoE/s/_ZTVN10__cxxabiv120__si_class_type_infoE/_ZTVSt19__iosfail_type_info/'

I miss / in between , and _ZTVN10__cxxabiv120__si_class_type_infoE

$ echo | sed -e 
'/^_*_ZTISt13__ios_failure:/,_ZTVN10__cxxabiv120__si_class_type_infoE/s/_ZTVN10__cxxabiv120__si_class_type_info_ZTVSt19__iosfail_type_info/'
sed: -e expression #1, char 29: unexpected `,'
$ echo | sed -e 
'/^_*_ZTISt13__ios_failure:/,/_ZTVN10__cxxabiv120__si_class_type_infoE/s/_ZTVN10__cxxabiv120__si_class_type_infoE/_ZTVSt19__iosfail_type_info/'

Jakub

Re: [PATCH] PR libstdc++/85222 allow catching iostream errors as gcc4-compatible ios::failure

2018-04-13 Thread Jonathan Wakely

Darwin has double underscores at the start of mangled names, so this
fixes the sed command to be more flexible.

Committed to trunk and gcc-7-branch.
commit f80944837b4c21016d826bff5f497ceda85b9894
Author: Jonathan Wakely 
Date:   Fri Apr 13 10:44:08 2018 +0100

Fix __iosfail_type_info hack to work on darwin

* src/c++11/Makefile.am: Rewrite sed rule to be less fragile and to
handle mangled names starting with double underscores on darwin.
* src/c++11/Makefile.in: Regenerate.

diff --git a/libstdc++-v3/src/c++11/Makefile.am 
b/libstdc++-v3/src/c++11/Makefile.am
index 8d524b67232..12bc004a2ea 100644
--- a/libstdc++-v3/src/c++11/Makefile.am
+++ b/libstdc++-v3/src/c++11/Makefile.am
@@ -128,10 +128,7 @@ hashtable_c++0x.o: hashtable_c++0x.cc
 
 if ENABLE_DUAL_ABI
 # Rewrite the type info for __ios_failure.
-rewrite_ios_failure_typeinfo = sed -e '/^_ZTISt13__ios_failure:$$/{' \
-   -e 'n' \
-   -e 
's/_ZTVN10__cxxabiv120__si_class_type_infoE/_ZTVSt19__iosfail_type_info/' \
-   -e '}'
+rewrite_ios_failure_typeinfo = sed -e 
'/^_*_ZTISt13__ios_failure:/,_ZTVN10__cxxabiv120__si_class_type_infoE/s/_ZTVN10__cxxabiv120__si_class_type_infoE/_ZTVSt19__iosfail_type_info/'
 
 cxx11-ios_failure-lt.s: cxx11-ios_failure.cc
$(LTCXXCOMPILE) -S $< -o tmp-cxx11-ios_failure-lt.s

[Committed] IBM Z: Get rid of target specific C++ testcase

2018-04-13 Thread Andreas Krebbel

gcc/testsuite/ChangeLog:

2018-04-13  Andreas Krebbel  

PR testsuite/85326
* gcc.target/s390/pr77822-1.C: Rename to ...
* gcc.target/s390/pr77822-1.c: ... this. Add asm scan check.
* gcc.target/s390/pr77822-2.c: Add asm scan check.
* gcc.target/s390/s390.exp: Remove C from testcase regexps.
---
 .../gcc.target/s390/{pr77822-1.C => pr77822-1.c} | 16 +---
 gcc/testsuite/gcc.target/s390/pr77822-2.c|  2 ++
 gcc/testsuite/gcc.target/s390/s390.exp   | 13 -
 3 files changed, 19 insertions(+), 12 deletions(-)
 rename gcc/testsuite/gcc.target/s390/{pr77822-1.C => pr77822-1.c} (62%)

diff --git a/gcc/testsuite/gcc.target/s390/pr77822-1.C 
b/gcc/testsuite/gcc.target/s390/pr77822-1.c
similarity index 62%
rename from gcc/testsuite/gcc.target/s390/pr77822-1.C
rename to gcc/testsuite/gcc.target/s390/pr77822-1.c
index bd5a9b4..9bf7bf4 100644
--- a/gcc/testsuite/gcc.target/s390/pr77822-1.C
+++ b/gcc/testsuite/gcc.target/s390/pr77822-1.c
@@ -3,15 +3,15 @@
 /* { dg-do compile } */
 /* { dg-options "-O3 -march=zEC12" } */
 
-class A {
-  void m_fn1();
-  char m_datawidth;
-  char m_subunits;
-  int m_subunit_infos[];
-};
+void m_fn1();
+
+char m_datawidth;
+char m_subunits;
+int m_subunit_infos[1];
+
 int a;
 long b;
-void A::m_fn1() {
+void m_fn1() {
   int c = 32, d = m_datawidth / c;
   for (int e = 0; e < d; e++) {
 int f = e * 32;
@@ -19,3 +19,5 @@ void A::m_fn1() {
   m_subunit_infos[m_subunits] = a;
   }
 }
+
+/* { dg-final { scan-assembler-not "risbg.*-\[0-9\]+\\\+1\n" } } */
diff --git a/gcc/testsuite/gcc.target/s390/pr77822-2.c 
b/gcc/testsuite/gcc.target/s390/pr77822-2.c
index 6789152..9a0fad2 100644
--- a/gcc/testsuite/gcc.target/s390/pr77822-2.c
+++ b/gcc/testsuite/gcc.target/s390/pr77822-2.c
@@ -305,3 +305,5 @@ void sizepos_c_13 (signed char b)
   if (b >> 13 & 1)
 g = b;
 }
+
+/* { dg-final { scan-assembler-not "risbg.*-\[0-9\]+\\\+1\n" } } */
diff --git a/gcc/testsuite/gcc.target/s390/s390.exp 
b/gcc/testsuite/gcc.target/s390/s390.exp
index bb13bfd..93c570a 100644
--- a/gcc/testsuite/gcc.target/s390/s390.exp
+++ b/gcc/testsuite/gcc.target/s390/s390.exp
@@ -199,23 +199,26 @@ dg-init
 
 set md_tests $srcdir/$subdir/md/*.c
 
+# C++ tests belong into g++.dg with a target check.  Do NOT add C to
+# these regexps!
+
 # Main loop.
-dg-runtest [lsort [prune [glob -nocomplain $srcdir/$subdir/*.{c,S,C}] \
+dg-runtest [lsort [prune [glob -nocomplain $srcdir/$subdir/*.{c,S}] \
 $md_tests]] "" $DEFAULT_CFLAGS
 
-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*vector*/*.{c,S,C}]] \
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*vector*/*.{c,S}]] \
"" $DEFAULT_CFLAGS
 
-dg-runtest [lsort [glob -nocomplain 
$srcdir/$subdir/target-attribute/*.{c,S,C}]] \
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/target-attribute/*.{c,S}]] 
\
"" $DEFAULT_CFLAGS
 
-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/arch12/*.{c,S,C}]] \
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/arch12/*.{c,S}]] \
"" "-O3 -march=arch12 -mzarch"
 
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/vxe/*.{c,S}]] \
"" "-O3 -march=arch12 -mzarch"
 
-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/md/*.{c,S,C}]] \
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/md/*.{c,S}]] \
"" $DEFAULT_CFLAGS
 
 # Additional hotpatch torture tests.
-- 
2.9.1

Re: [PATCH] Fix -fsanitize=address VLA instrumentation (PR sanitizer/85230)

2018-04-13 Thread Maxim Ostapenko

On 04/13/2018 01:16 AM, Jakub Jelinek wrote:
> Hi!
>
> As mentioned in the PR, we need to unpoison the red zones when leaving a
> scope with VLA variable(s); this is done through __asan_allocas_unpoison
> call, unfortunately it is called after the __builtin_stack_restore which
> restores the stack pointer; now, if an interrupt comes in between the stack
> restore and the __asan_allocas_unpoison call, the interrupt handler might
> have some stack bytes marked as red zones in the shadow memory and might
> diagnose sanitizing error even when there is none in the original program.
>
> The following patch ought to fix this by swapping the two calls, so we first
> unpoison and only after it is unpoisoned in shadow memory release the stack.
> The second argument to the __asan_allocas_unpoison call is meant to
> be virtual_dynamic_stack_rtx after the __builtin_stack_restore, i.e. the new
> stack_pointer_rtx value + STACK_DYNAMIC_OFFSET (current_function_decl).
> As the STACK_DYNAMIC_OFFSET value isn't known until the vregs pass, the code
> used a hack where it ignored the second argument and replaced it by
> virtual_dynamic_stack_rtx.  With the asan.c change below this doesn't work
> anymore, because virtual_dynamic_stack_rtx aka stack_pointer_rtx +
> STACK_DYNAMIC_OFFSET (current_function_decl) before the
> __builtin_stack_restore is a different value.  The patch instead uses the
> argument passed to the __asan_allocas_unpoison at GIMPLE time, which is the
> same as passed to __builtin_stack_restore; this is the new stack_pointer_rtx
> value after __builtin_stack_restore.  And, because we don't want that value,
> but that + STACK_DYNAMIC_OFFSET (current_function_decl), we compute
> arg1 + (virtual_dynamic_stack_rtx - stack_pointer_rtx) and let CSE/combiner
> optimize it into arg1 (on targets like x86_64 where STACK_DYNAMIC_OFFSET can
> be even 0 when not accumulating outgoing args or when that size is 0) or
> arg1 + some_constant.
>
> Bootstrapped on
> {x86_64,i686,powerpc64,powerpc64le,aarch64,s390x,armv7hl}-linux, regtested
> on {x86_64,i686,powerpc64,powerpc64le}-linux so far, but on the power* ones
> on virtual address space size that isn't really supported (likely
> https://github.com/google/sanitizers/issues/933#issuecomment-380058705
> issue, so while nothing regresses there, pretty much all asan tests fail
> there before and after the patch); also tested successfully with
> asan.exp=alloca* on gcc110 and gcc112 on compile farm where it doesn't
> suffer from the VA issue.  Ok if testing passes also on aarch64, s390x
> and armv7hl?

OK with me, thanks.

-Maxim

> 2018-04-12  Jakub Jelinek  
>
>   PR sanitizer/85230
>   * asan.c (handle_builtin_stack_restore): Adjust comment.  Emit
>   __asan_allocas_unpoison call and last_alloca_addr = new_sp before
>   __builtin_stack_restore rather than after it.
>   * builtins.c (expand_asan_emit_allocas_unpoison): Pass
>   arg1 + (virtual_dynamic_stack_rtx - stack_pointer_rtx) as second
>   argument instead of virtual_dynamic_stack_rtx.
>
> --- gcc/asan.c.jj 2018-01-09 21:53:38.821577722 +0100
> +++ gcc/asan.c2018-04-12 13:22:59.166095523 +0200
> @@ -554,14 +554,14 @@ get_last_alloca_addr ()
> return last_alloca_addr;
>   }
>   
> -/* Insert __asan_allocas_unpoison (top, bottom) call after
> +/* Insert __asan_allocas_unpoison (top, bottom) call before
>  __builtin_stack_restore (new_sp) call.
>  The pseudocode of this routine should look like this:
> - __builtin_stack_restore (new_sp);
>top = last_alloca_addr;
>bot = new_sp;
>__asan_allocas_unpoison (top, bot);
>last_alloca_addr = new_sp;
> + __builtin_stack_restore (new_sp);
>  In general, we can't use new_sp as bot parameter because on some
>  architectures SP has non zero offset from dynamic stack area.  Moreover, 
> on
>  some architectures this offset (STACK_DYNAMIC_OFFSET) becomes known for 
> each
> @@ -570,9 +570,8 @@ get_last_alloca_addr ()
>  
> http://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.html#DYNAM-STACK.
>  To overcome the issue we use following trick: pass new_sp as a second
>  parameter to __asan_allocas_unpoison and rewrite it during expansion with
> -   virtual_dynamic_stack_rtx later in expand_asan_emit_allocas_unpoison
> -   function.
> -*/
> +   new_sp + (virtual_dynamic_stack_rtx - sp) later in
> +   expand_asan_emit_allocas_unpoison function.  */
>   
>   static void
>   handle_builtin_stack_restore (gcall *call, gimple_stmt_iterator *iter)
> @@ -584,9 +583,9 @@ handle_builtin_stack_restore (gcall *cal
> tree restored_stack = gimple_call_arg (call, 0);
> tree fn = builtin_decl_implicit (BUILT_IN_ASAN_ALLOCAS_UNPOISON);
> gimple *g = gimple_build_call (fn, 2, last_alloca, restored_stack);
> -  gsi_insert_after (iter, g, GSI_NEW_STMT);
> +  gsi_insert_before (iter, g, GSI_SAME_STMT);
> g = gimple_build_assign (last_alloca,

[C++ Patch] PR 85112 ("[8 Regression] ICE with invalid constexpr")

2018-04-13 Thread Paolo Carlini


Hi,

in this error-recovery regression, after a sensible error message issued 
by cxx_constant_init, store_init_value stores an error_mark_node as 
DECL_INITIAL of the VAR_DECL for 'j'. This error_mark_node reappears 
much later, to cause a crash during gimplification. As far as I know, 
the choice of storing error_mark_nodes too is the outcome of a rather 
delicate error-recovery strategy and I don't think we want to revisit it 
at this time, thus the remaining option is catching later the 
error_mark_node, at a "good" time. I note, in passing, that the do loop 
in gimplify_expr which uses the crashing STRIP_USELESS_TYPE_CONVERSION 
seems a bit lacking from the error-recovery point of view, because at 
each iteration it *does* cover for error_operand_p (save_expr) but only 
immediately after the call, when it's too late.


All the above said, I believe that at least for the 8.1.0 needs we may 
want to catch the error_mark_node in cp_build_modify_expr, when we are 
handling the assignment 'a.n = j;': convert_for_assignment produces a 
NOP_EXPR from the VAR_DECL for 'j' which then cp_convert_and_check 
regenerates (deep in convert_to_integer_1 via maybe_fold_build1_loc) in 
the final bare-bone form, with the error_mark_node as the first operand. 
Passes testing on x86_64-linux.


Thanks, Paolo.

/


/cp
2018-04-13  Paolo Carlini  

PR c++/85112
* typeck.c (cp_build_modify_expr): Return error_mark_node upon
an error_mark_node wrapped in a NOP_EXPR too.

/testsuite
2018-04-13  Paolo Carlini  

PR c++/85112
* g++.dg/cpp0x/pr85112.C: New.
Index: cp/typeck.c
===
--- cp/typeck.c (revision 259359)
+++ cp/typeck.c (working copy)
@@ -8234,7 +8234,9 @@ cp_build_modify_expr (location_t loc, tree lhs, en
 TREE_OPERAND (newrhs, 0));
 }
 
-  if (newrhs == error_mark_node)
+  if (newrhs == error_mark_node
+  || (TREE_CODE (newrhs) == NOP_EXPR
+ && TREE_OPERAND (newrhs, 0) == error_mark_node))
 return error_mark_node;
 
   if (c_dialect_objc () && flag_objc_gc)
Index: testsuite/g++.dg/cpp0x/pr85112.C
===
--- testsuite/g++.dg/cpp0x/pr85112.C(nonexistent)
+++ testsuite/g++.dg/cpp0x/pr85112.C(working copy)
@@ -0,0 +1,17 @@
+// PR c++/85112
+// { dg-do compile { target c++11 } }
+
+struct A
+{
+  int m;
+  int n : 4;
+};
+
+int i;  // { dg-message "not const" }
+
+void foo()
+{
+  constexpr int j = i;  // { dg-error "not usable" }
+  A a;
+  a.n = j;
+}

Be more permissive for always inlining across target attribute changes

2018-04-13 Thread Jan Hubicka

Hi,
this patch makes it possible to always inline across target attribute changes
when doing so will not lead to incorrect code.  We used to be permissive here
with default options and overly restrictie without.

This fixes most common anoyances seen with these, but not all (i.e. zen)

Bootstrapped/regteste x86_64-linux, commited.

Honza

PR lto/71991
* config/i386/i386.c (ix86_can_inline_p): Allow safe transitions for
always inline.
* gcc.target/i386/pr71991.c: New testcase.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 259345)
+++ config/i386/i386.c  (working copy)
@@ -5766,6 +5766,19 @@ ix86_can_inline_p (tree caller, tree cal
 {
   tree caller_tree = DECL_FUNCTION_SPECIFIC_TARGET (caller);
   tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee);
+
+  /* Changes of those flags can be tolerated for always inlines. Lets hope
+ user knows what he is doing.  */
+  const unsigned HOST_WIDE_INT always_inline_safe_mask
+= (MASK_USE_8BIT_IDIV | MASK_ACCUMULATE_OUTGOING_ARGS
+   | MASK_NO_ALIGN_STRINGOPS | MASK_AVX256_SPLIT_UNALIGNED_LOAD
+   | MASK_AVX256_SPLIT_UNALIGNED_STORE | MASK_CLD
+   | MASK_NO_FANCY_MATH_387 | MASK_IEEE_FP | MASK_INLINE_ALL_STRINGOPS
+   | MASK_INLINE_STRINGOPS_DYNAMICALLY | MASK_RECIP | MASK_STACK_PROBE
+   | MASK_STV | MASK_TLS_DIRECT_SEG_REFS | MASK_VZEROUPPER
+   | MASK_NO_PUSH_ARGS | MASK_OMIT_LEAF_FRAME_POINTER);
+
+
   if (!callee_tree)
 callee_tree = target_option_default_node;
   if (!caller_tree)
@@ -5776,6 +5789,10 @@ ix86_can_inline_p (tree caller, tree cal
   struct cl_target_option *caller_opts = TREE_TARGET_OPTION (caller_tree);
   struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree);
   bool ret = false;
+  bool always_inline =
+ (DECL_DISREGARD_INLINE_LIMITS (callee)
+  && lookup_attribute ("always_inline",
+  DECL_ATTRIBUTES (callee)));
 
   /* Callee's isa options should be a subset of the caller's, i.e. a SSE4
  function can inline a SSE2 function but a SSE2 function can't inline
@@ -5787,14 +5804,17 @@ ix86_can_inline_p (tree caller, tree cal
 ret = false;
 
   /* See if we have the same non-isa options.  */
-  else if (caller_opts->x_target_flags != callee_opts->x_target_flags)
+  else if ((!always_inline
+   && caller_opts->x_target_flags != callee_opts->x_target_flags)
+  || (caller_opts->x_target_flags & ~always_inline_safe_mask)
+  != (callee_opts->x_target_flags & ~always_inline_safe_mask))
 ret = false;
 
   /* See if arch, tune, etc. are the same.  */
   else if (caller_opts->arch != callee_opts->arch)
 ret = false;
 
-  else if (caller_opts->tune != callee_opts->tune)
+  else if (!always_inline && caller_opts->tune != callee_opts->tune)
 ret = false;
 
   else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath
@@ -5807,7 +5827,8 @@ ix86_can_inline_p (tree caller, tree cal
   (cgraph_node::get (callee))->fp_expressions))
 ret = false;
 
-  else if (caller_opts->branch_cost != callee_opts->branch_cost)
+  else if (!always_inline
+  && caller_opts->branch_cost != callee_opts->branch_cost)
 ret = false;
 
   else
Index: testsuite/gcc.target/i386/pr71991.c
===
--- testsuite/gcc.target/i386/pr71991.c (revision 0)
+++ testsuite/gcc.target/i386/pr71991.c (working copy)
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+static inline __attribute__ ((__always_inline__)) int fn1 () { return 0; }
+static __attribute__ ((target ("inline-all-stringops"))) int fn2 () { fn1 (); }
+
+int main()
+{
+  fn2();
+}
+

Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657, take 2)

2018-04-13 Thread Richard Biener

On April 13, 2018 12:35:54 AM GMT+02:00, Jakub Jelinek  wrote:
>On Thu, Apr 12, 2018 at 07:37:22PM +0200, Jakub Jelinek wrote:
>> On Thu, Apr 12, 2018 at 05:29:35PM +, Wilco Dijkstra wrote:
>> > > Depending on what you mean old, I see e.g. in 2010 power7 mempcpy
>got added,
>> > > in 2013 other power versions, in 2016 s390*, etc.  Doing a decent
>mempcpy
>> > > isn't hard if you have asm version of memcpy and one spare
>register.
>> > 
>> > More mempcpy implementations have been added in recent years
>indeed, but almost all
>> > add an extra copy of the memcpy code rather than using a single
>combined implementation.
>> > That means it is still better to call memcpy (which is frequently
>used and thus likely in L1/L2)
>> > rather than mempcpy (which is more likely to be cold and thus not
>cached).
>> 
>> That really depends, usually when some app uses mempcpy, it uses it
>very
>> heavily.  And all the proposed patches do is honor what the user
>asked, if
>> you use memcpy () + n, we aren't transforming that into mempcpy
>behind the
>> user's back.
>> 
>> Anyway, here is what I think Richard was asking for, that I'm
>currently
>> bootstrapping/regtesting.  It can be easily combined with Martin's
>target
>> hook if needed, or do it only for
>> endp == 1 && target != const0_rtx && CALL_EXPR_TAILCALL (exp)
>> etc.
>> 
>> 2018-04-12  Martin Liska  
>>  Jakub Jelinek  
>> 
>>  PR middle-end/81657
>>  * expr.h (enum block_op_methods): Add BLOCK_OP_NO_LIBCALL_RET.
>>  * expr.c (emit_block_move_hints): Handle BLOCK_OP_NO_LIBCALL_RET.
>>  * builtins.c (expand_builtin_memory_copy_args): Use
>>  BLOCK_OP_NO_LIBCALL_RET method for mempcpy with non-ignored target,
>>  handle dest_addr == pc_rtx.
>> 
>>  * gcc.dg/string-opt-1.c: Remove bogus comment.  Expect a mempcpy
>>  call.
>
>Successfully bootstrapped/regtested on x86_64-linux and i686-linux.

OK. 

Thanks, 
Richard. 

>   Jakub

Fix gcc.dg/debug/pr41893-1.c with Solaris ld (PR lto/81968)

2018-04-13 Thread Rainer Orth

The last LTO early debug related failure on Solaris 11 is

FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 -g3 (test for excess errors)
FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 -g3 -O (test for excess errors)
FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 -g3 -O3 (test for excess errors)

Excess errors:
ld: fatal: relocation error: R_386_32: file /var/tmp//ccyfLclbdebugobjtem: 
section [6].rel.debug_macro: symbol .debug_macro (section): symbol has been 
discarded with discarded section: [7].debug_macro

As detailed in the PR, the rewritten objects lto-wrapper passes to ld -r
for a partial link violate the ELF gABI access rules for COMDAT
sections.  On closer inspection, however, the input objects do so, too.
Solaris ld has heuristics to relax the rules in objects produced by GCC,
which trigger on the presence of a .comment section containing the
string "GCC: (GNU)".  This same relaxation can be enabled with
-z relaxreloc/-z relax=comdat if need be.

Right now, simple_object_elf_copy_lto_debug_sections doesn't copy
.comment sections, so the heuristic doesn't trigger.  Fixed trivially by
keeping .comment sections in the output.

Bootstrapped without regressions on i386-pc-solaris2.11 and
sparc-sun-solaris2.11.  Approved by Richard in the PR, installed on
mainline.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2018-04-11  Rainer Orth  

PR lto/81968
* simple-object.c (handle_lto_debug_sections): Keep .comment
section.

# HG changeset patch
# Parent  6314816f4181dd47c34310bc29a5996d925c9b8d
Fix gcc.dg/debug/pr41893-1.c with Solaris ld (PR lto/81968)

	libiberty:
	PR lto/81968
	* simple-object.c (handle_lto_debug_sections): Keep .comment
	section.

diff --git a/libiberty/simple-object.c b/libiberty/simple-object.c
--- a/libiberty/simple-object.c
+++ b/libiberty/simple-object.c
@@ -284,6 +284,11 @@ handle_lto_debug_sections (const char *n
   /* Copy over .note.GNU-stack section under the same name if present.  */
   else if (strcmp (name, ".note.GNU-stack") == 0)
 return strcpy (newname, name);
+  /* Copy over .comment section under the same name if present.  Solaris
+ ld uses them to relax its checking of ELF gABI access rules for
+ COMDAT sections in objects produced by GCC.  */
+  else if (strcmp (name, ".comment") == 0)
+return strcpy (newname, name);
   return NULL;
 }

Re: [Ping, Fortran, Patch, PR81773, PR83606, coarray, v1] Fix coarray get to array with vector indexing

2018-04-13 Thread Andre Vehreschild

Ping 

On Sun, 8 Apr 2018 14:25:50 +0200
Andre Vehreschild  wrote:

> Hi all,
> 
> attached patch fixes (to my knowledge) the two PRs 81773 and 83606 where the
> result of a coarray get() operation is assigned to an array using a vector as
> index.  It took me quite long to get it right, because I had to use the
> scalarizer which I haven't use directly before. So reviewers with expertise on
> using the scalarizer are especially welcome.
> 
> Bootstrapped and regtested on x86_64-linux/f27.
> 
> Ok for trunk? Backports?
> 
> Regards,
>   Andre


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 
gcc/fortran/ChangeLog:

2018-04-08  Andre Vehreschild  

PR fortran/81773
PR fortran/83606
* dependency.c (gfc_dep_resolver): Coarray indexes are to be ignored
during dependency computation.  They define no data dependency.
* trans-array.c (conv_array_index_offset): The stride can not be set
here, prevent fail.
* trans-intrinsic.c (conv_caf_send): Add creation of temporary array
for caf_get's result and copying to the array with vectorial
indexing.

gcc/testsuite/ChangeLog:

2018-04-08  Andre Vehreschild  

PR fortran/81773
PR fortran/83606
* gfortran.dg/coarray/get_to_indexed_array_1.f90: New test.
* gfortran.dg/coarray/get_to_indirect_array.f90: New test.

diff --git a/gcc/fortran/dependency.c b/gcc/fortran/dependency.c
index a0bbd584947..3e14ddc25d8 100644
--- a/gcc/fortran/dependency.c
+++ b/gcc/fortran/dependency.c
@@ -2238,8 +2238,9 @@ gfc_dep_resolver (gfc_ref *lref, gfc_ref *rref, gfc_reverse *reverse)
 	break;
 
 	  /* Exactly matching and forward overlapping ranges don't cause a
-	 dependency.  */
-	  if (fin_dep < GFC_DEP_BACKWARD)
+	 dependency, when they are not part of a coarray ref.  */
+	  if (fin_dep < GFC_DEP_BACKWARD
+	  && lref->u.ar.codimen == 0 && rref->u.ar.codimen == 0)
 	return 0;
 
 	  /* Keep checking.  We only have a dependency if
diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index bd731689031..b68e77d5281 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -3215,7 +3215,7 @@ conv_array_index_offset (gfc_se * se, gfc_ss * ss, int dim, int i,
 }
 
   /* Multiply by the stride.  */
-  if (!integer_onep (stride))
+  if (stride != NULL && !integer_onep (stride))
 index = fold_build2_loc (input_location, MULT_EXPR, gfc_array_index_type,
 			 index, stride);
 
diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index a45aec708fb..00edd447bb2 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -1907,34 +1907,124 @@ conv_caf_send (gfc_code *code) {
 }
   else
 {
-  /* If has_vector, pass descriptor for whole array and the
- vector bounds separately.  */
-  gfc_array_ref *ar, ar2;
-  bool has_vector = false;
+  bool has_vector = gfc_has_vector_subscript (lhs_expr);
 
-  if (gfc_is_coindexed (lhs_expr) && gfc_has_vector_subscript (lhs_expr))
+  if (gfc_is_coindexed (lhs_expr) || !has_vector)
 	{
-  has_vector = true;
-  ar = gfc_find_array_ref (lhs_expr);
-	  ar2 = *ar;
-	  memset (ar, '\0', sizeof (*ar));
-	  ar->as = ar2.as;
-	  ar->type = AR_FULL;
+	  /* If has_vector, pass descriptor for whole array and the
+	 vector bounds separately.  */
+	  gfc_array_ref *ar, ar2;
+	  bool has_tmp_lhs_array = false;
+	  if (has_vector)
+	{
+	  has_tmp_lhs_array = true;
+	  ar = gfc_find_array_ref (lhs_expr);
+	  ar2 = *ar;
+	  memset (ar, '\0', sizeof (*ar));
+	  ar->as = ar2.as;
+	  ar->type = AR_FULL;
+	}
+	  lhs_se.want_pointer = 1;
+	  gfc_conv_expr_descriptor (_se, lhs_expr);
+	  /* Using gfc_conv_expr_descriptor, we only get the descriptor, but
+	 that has the wrong type if component references are done.  */
+	  lhs_type = gfc_typenode_for_spec (_expr->ts);
+	  tmp = build_fold_indirect_ref_loc (input_location, lhs_se.expr);
+	  gfc_add_modify (_se.pre, gfc_conv_descriptor_dtype (tmp),
+			  gfc_get_dtype_rank_type (has_vector ? ar2.dimen
+			  : lhs_expr->rank,
+		   lhs_type));
+	  if (has_tmp_lhs_array)
+	{
+	  vec = conv_caf_vector_subscript (, lhs_se.expr, );
+	  *ar = ar2;
+	}
 	}
-  lhs_se.want_pointer = 1;
-  gfc_conv_expr_descriptor (_se, lhs_expr);
-  /* Using gfc_conv_expr_descriptor, we only get the descriptor, but that
- has the wrong type if component references are done.  */
-  lhs_type = gfc_typenode_for_spec (_expr->ts);
-  tmp = build_fold_indirect_ref_loc (input_location, lhs_se.expr);
-  gfc_add_modify (_se.pre, gfc_conv_descriptor_dtype (tmp),
-  gfc_get_dtype_rank_type (has_vector ? ar2.dimen
-			  : lhs_expr->rank,
-		  lhs_type));
-  if (has_vector)
+  else
 	{
-	  vec = conv_caf_vector_subscript (,

[PATCH] PR gcc/84923 - gcc.dg/attr-weakref-1.c failed on aarch64

2018-04-13 Thread vladimir . mezentsev

From: Vladimir Mezentsev 

When weakref_targets is not empty a target cannot be removed from weak_decls.
A small example is below when 'wv12' is removed from the weak list on aarch64:
  static vtype Wv12 __attribute__((weakref ("wv12")));
  extern vtype wv12 __attribute__((weak));

Bootstrapped on aarch64-unknown-linux-gnu including (c,c++ and go).
Tested on aarch64-linux-gnu.
No regression. The attr-weakref-1.c test passed.

ChangeLog:
2018-04-12  Vladimir Mezentsev  

PR gcc/84923
* varasm.c (weak_finish): clean up weak_decls
---
 gcc/varasm.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/varasm.c b/gcc/varasm.c
index d24bac4..2a70234 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -5683,8 +5683,7 @@ weak_finish (void)
   nor multiple .weak directives for the latter.  */
for (p = _decls; (t2 = *p) ; )
  {
-   if (TREE_VALUE (t2) == alias_decl
-   || target == DECL_ASSEMBLER_NAME (TREE_VALUE (t2)))
+   if (TREE_VALUE (t2) == alias_decl)
  *p = TREE_CHAIN (t2);
else
  p = _CHAIN (t2);
-- 
1.8.3.1

62 matches

Mail list logo