date:20191016

[Bug c++/68897] No option to disable just "warning: enumeral and non-enumeral type in conditional expression"

2019-10-16 Thread egallager at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68897

--- Comment #3 from Eric Gallager  ---
(In reply to Eric Gallager from comment #2)
> (In reply to Manuel López-Ibáñez from comment #1)
> > You just need to come up with a good name and implement a patch like the
> > ones shown in PR7651. Finding a good name is probably the hardest part. :)
> > 
> > 
> > See also https://gcc.gnu.org/ml/gcc/2007-01/msg00391.html
> 
> From that thread, it seems like the agreement was to put it under
> -Wconversion:
> https://gcc.gnu.org/ml/gcc/2007-01/msg00437.html

...or maybe -Wenum-conversion would make sense, now that we have that...
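
For reference, a minimal sketch of the kind of code this diagnostic is about. The names Color and pick are made up for illustration. As far as I can tell, g++ reports the mixed-type conditional below only under -Wextra, which is why there is no way to turn off just this one warning.

```cpp
#include <cassert>

// Hypothetical enum, for illustration only.
enum Color { Red = 1, Green = 2 };

// The second operand is an enumerator and the third is a long, so the
// conditional expression mixes an enumeral and a non-enumeral type.
long pick(bool use_enum, long fallback)
{
  return use_enum ? Red : fallback;
}

// Small self-check; the result is well defined, the warning is only
// about the mixed operand types.
void self_check()
{
  assert(pick(true, 7) == 1);
  assert(pick(false, 7) == 7);
}
```

A dedicated option name, whether under -Wconversion or -Wenum-conversion, would let code like this keep -Wextra enabled while silencing only this case.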

Re: [PATCH] Fix -fdebug-types-section ICE, PR91887

2019-10-16 Thread Jason Merrill


On 10/16/19 6:11 AM, Richard Biener wrote:


The following makes sure we correctly identify a parm DIE created
early in a formal parameter pack during late annotation.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

OK?


OK, thanks.

Jason

Re: [RFC, Darwin, PPC] Fix PR 65342.

2019-10-16 Thread Alan Modra

On Sat, Oct 12, 2019 at 05:39:51PM -0500, Segher Boessenkool wrote:
> On Sat, Oct 12, 2019 at 10:13:16PM +0100, Iain Sandoe wrote:
> > For 32bit cases this isn't a problem since we can load/store to unaligned
> > addresses using D-mode insns.
> 
> Can you?  -m32 -mpowerpc64?  We did have a bug with this before, maybe
> six years ago or so...  Alan, do you remember?  It required some assembler
> work IIRC.

Yes, the ppc32 ABI doesn't have the relocs to support DS fields.
Rather than defining a whole series of _DS (and _DQ!) relocs, the
linker inspects the instruction being relocated and complains if the
relocation would modify opcode bits.  See is_insn_ds_form in
bfd/elf32-ppc.c.  We do the same on ppc64 for DQ field insns.
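
A rough sketch of the check described above, not the bfd code itself. The names InsnForm, reserved_low_bits and reloc_is_safe are hypothetical. It assumes the usual encoding in which the low two bits of a DS-form displacement and the low four bits of a DQ-form displacement overlap opcode bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical classification of the instruction being relocated.
enum class InsnForm { D, DS, DQ };

// Low-order displacement bits that overlap opcode bits in each form.
constexpr std::uint64_t reserved_low_bits(InsnForm form)
{
  return form == InsnForm::DS ? 0x3u
       : form == InsnForm::DQ ? 0xfu
       : 0x0u;
}

// True when writing VALUE into the displacement field would leave the
// opcode bits of the instruction untouched.
constexpr bool reloc_is_safe(InsnForm form, std::int64_t value)
{
  return (static_cast<std::uint64_t>(value) & reserved_low_bits(form)) == 0;
}

void self_check()
{
  assert(reloc_is_safe(InsnForm::D, 6));    // D-form takes any offset
  assert(!reloc_is_safe(InsnForm::DS, 6));  // would modify opcode bits
  assert(reloc_is_safe(InsnForm::DS, 8));
}
```

With a check like this the linker can reuse the ordinary 16-bit relocs and only complain when a particular value does not fit the instruction being patched.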

> I'll have another look through this (esp. the generic part) when I'm fresh
> awake (but not before coffee!).  Alan, can you have a look as well please?

It looks reasonable to me.

-- 
Alan Modra
Australia Development Lab, IBM

Re: [C++ Patch] Remove most uses of in_system_header_at

2019-10-16 Thread Jason Merrill


On 10/16/19 11:59 AM, Paolo Carlini wrote:
... the below, slightly extended patch: 1- Makes sure the 
in_system_header_at calls surviving in decl.c get the same location used 
for the corresponding diagnostic


Hmm, we probably want to change permerror to respect warn_system_headers 
like warning and pedwarn.


Jason

Re: [PATCH] Fix -Wshadow=local warnings in genautomata.c

2019-10-16 Thread Jeff Law

On 10/16/19 9:43 AM, Martin Sebor wrote:
> On 10/16/19 9:11 AM, Richard Sandiford wrote:
>> Sorry for the slow reply.
>>
>> Bernd Edlinger  writes:
>>> Hi,
>>>
>>> this is probably on the border to obvious.
>>>
>>> The REGEXP_xxx macros in genautomata are invoked
>>> recursively, and the local values are all named _regexp
>>> and shadow each other.
>>>
>>>
>>> Fixed by using different names _regexp1..6 for each
>>> macro.
>>
>> Sorry to repeat the complaint about numerical suffixes, but I think
>> we'd need better names.  E.g. _regexp_unit or _re_unit for REGEXP_UNIT
>> and similarly for the other macros.  But a similar fix to rtl.h might
>> be better.
> 
> Should the warning trigger when the shadowing name results from
> macro expansion?  The author of a macro can't (in general) know
> what context it's going to be used, and when different macros
> come from two different third party headers, it would seem
> pointless to force their users to jump through hoops just to
> avoid the innocuous shadowing.  Such as in this example:
One could make the argument that if you want to avoid shadowing, then
you should avoid code within macros for this precise reason.  And if
you're getting code within macros from 3rd parties, then well, you're in
for a world of pain if you're going to try to be shadow-free.



jeff

Re: connecting a QEMU VM to dejagnu...

2019-10-16 Thread Rob Savoye

On 10/16/19 5:40 PM, Alan Lehotsky via DejaGnu wrote:

> The one example I found via a web search seems to want to do
> everything in the virtual machine - but I have to believe that’s
> going to be insanely slow…

  Well, qemu is a virtual machine... Here's the ones I used for GNU
toolchain cross testing:
https://git.linaro.org/toolchain/abe.git/tree/config/boards. There's a
few on there. If you're building cross compilers, just use ABE and it's
all built in.

- rob -

connecting a QEMU VM to dejagnu...

2019-10-16 Thread Alan Lehotsky

I’m trying to grapple with connecting dejagnu to a QEMU simulator; not finding 
any obvious examples to work with.

I’ve had a lot of familiarity using CGEN simulators connected to dejagnu, but 
QEMU’s a new breed of cat….

Can anyone point me to a boards/.exp that is based on using QEMU, or 
provide other examples.  

The one example I found via a web search seems to want to do everything in the 
virtual machine - but I have to believe that’s going to be insanely slow…

Re: [PATCH] Fix -Wshadow=local warnings in genautomata.c

2019-10-16 Thread Eric Gallager

On 10/16/19, Jakub Jelinek  wrote:
> On Wed, Oct 16, 2019 at 10:03:51AM -0600, Martin Sebor wrote:
>> > The counter example would be:
>> > #define F(x) \
>> >__extension__ (({ __typeof__ (x) _x = x; _x < 0 ? -_x : _x; }))
>> > #define G(x) \
>> >__extension__ (({ __typeof__ (x) _x = x; F(_x); }))
>> > where a -Wshadow diagnostics could point the author at a serious bug,
>> > because in the expansion it will be __typeof__ (_x) _x = _x; ...
>>
>> True.  I don't suppose there is a way to make it so the warning
>> triggers for the counter example and not for the original, is
>> there?
>
> Maybe look through the macro nesting context and if the shadowing
> declaration comes from the same macro as shadowed declaration
> or macro included directly or indirectly from the macro with shadowed
> declaration, warn, otherwise not?
> This might still not warn in case where the scope of the shadowing
> declaration is created from multiple macros ({ coming from one,
> }) from another one, but otherwise could work.
> Perhaps -Wshadow-local needs multiple modes, the default one that
> will have this macro handling and full one (=2) which would warn
> regardless of macro definitions.
>

I'm worried about the proliferation of the number of '=' signs here...
there's already confusion as to whether the first '=' represents
levels or just a different spelling of names, adding a second would
only compound the confusion.

>   Jakub
>
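
To make the counter-example concrete, here is a small sketch using GNU statement expressions. The macro names ABS_INNER and ABS_OUTER and their local variable names are made up. The buggy pair is shown only in a comment; the live code uses one possible fix, a distinct local name per macro.

```cpp
#include <cassert>

// Buggy pair from the counter-example: both macros call their local _x,
// so G(v) expands F's body to "__typeof__ (_x) _x = _x;", which
// initializes the inner _x from itself.
//   #define F(x) __extension__ ({ __typeof__ (x) _x = x; _x < 0 ? -_x : _x; })
//   #define G(x) __extension__ ({ __typeof__ (x) _x = x; F (_x); })

// Possible fix: each macro uses its own (hypothetical) local name.
#define ABS_INNER(x) \
  __extension__ ({ __typeof__ (x) abs_inner_x = (x); \
                   abs_inner_x < 0 ? -abs_inner_x : abs_inner_x; })
#define ABS_OUTER(x) \
  __extension__ ({ __typeof__ (x) abs_outer_x = (x); \
                   ABS_INNER (abs_outer_x); })

int abs_via_macros(int v)
{
  return ABS_OUTER (v);
}

void self_check()
{
  assert(abs_via_macros(-5) == 5);
  assert(abs_via_macros(3) == 3);
}
```

This is the case where a shadowing warning inside macro expansions is genuinely useful, as opposed to two unrelated macros that merely happen to reuse a local name.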

[Bug libfortran/92100] Formatted stream IO irreproducible read with binary data in file

2019-10-16 Thread sgk at troutmask dot apl.washington.edu

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92100

--- Comment #6 from Steve Kargl  ---
On Wed, Oct 16, 2019 at 10:57:05PM +0000, angus at agibson dot me wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92100
> 
> --- Comment #5 from Angus Gibson  ---
> I agree that it's not ideal... Unfortunately an awkward file format, and
> fortran is usually the wrong language for this kind of IO anyway. I guess the
> note that "A processor may prohibit some control characters from appearing in a
> formatted stream file" is a bit of an escape clause for gfortran here.
> 
> I would argue that because the position is obtained from an inquire statement,
> we should be able to re-set the file position to that (indeed, Intel Fortran,
> not that it defines the standard, does what I expect). Anyway, I'm happy for
> this to go either way.
> 

I agree that gfortran should do better.  I simply don't
know internals of libgfortran well enough to come up with
a quick fix.

[Bug libfortran/92100] Formatted stream IO irreproducible read with binary data in file

2019-10-16 Thread angus at agibson dot me

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92100

--- Comment #5 from Angus Gibson  ---
I agree that it's not ideal... Unfortunately an awkward file format, and
fortran is usually the wrong language for this kind of IO anyway. I guess the
note that "A processor may prohibit some control characters from appearing in a
formatted stream file" is a bit of an escape clause for gfortran here.

I would argue that because the position is obtained from an inquire statement,
we should be able to re-set the file position to that (indeed, Intel Fortran,
not that it defines the standard, does what I expect). Anyway, I'm happy for
this to go either way.

Re: [AArch64][SVE2] Support for EOR3 and variants of BSL

2019-10-16 Thread Richard Sandiford

Segher Boessenkool  writes:
> On Wed, Oct 16, 2019 at 09:04:18PM +0100, Richard Sandiford wrote:
>> Segher Boessenkool  writes:
>> > This isn't canonical RTL.  Does combine not simplify this?
>> >
>> > Or, rather, it should not be what we canonicalise to: nothing is defined
>> > here.
>> 
>> But when nothing is defined, let's match what we get :-)
>
> Of course.
>
>> If someone wants to add a new canonical form then the ports should of
>> course adapt, but until then I think the patch is doing the right thing.
>
> We used to generate this, until GCC 5.  There aren't many ports that have
> adapted yet.

The patch has testcases, so this won't be a silent failure for SVE2
if things change again in future.

>> > If the mask is not a constant, we really shouldn't generate a totally
>> > different form.  The xor-and-xor form is very hard to handle, too.
>> >
>> > Expand currently generates this, because gimple thinks this is simpler.
>> > I think this should be fixed.
>> 
>> But the constant form is effectively folding away the NOT.
>> Without it the equivalent rtl uses 4 operations rather than 3:
>> 
>>   (ior (and A C) (and B (not C)))
>
> RTL canonicalisation rules are not based around number of ops.

Not to the exclusion of all else, sure.  But my point was that there
are reasons why forcing the (ior ...) form for non-constants might not
be a strict improvement.

> For example, we do (and (not A) (not B)) rather than (not (ior (A B)) .

Right, hence my complaint about this the other day on IRC. :-)
I hadn't noticed until then that gimple had a different rule.

> Instead, there are other rules (like here: push "not"s inward,
> which can be applied locally with the wanted result).

Sure.  But I think it's common ground that there's no existing
rtl rule that applies naturally to (xor (and (xor A B) C) B),
where there's no (not ...) to push down.

>> And folding 4 operations gets us into 4-insn combinations, which are
>> obviously more limited (for good reason).
>
> But on most machines it doesn't need to combine more than two or three
> insns to get here.  Reducing the depth of the tree is more useful...  That
> is 3 in both cases here, but "andc" is common on many machines, so that
> makes it only two deep.
>
>> As you say, it's no accident that we get this form, it's something
>> that match.pd specifically chose.  And I think there should be a
>> strong justification for having an RTL canonical form that reverses
>> a gimple decision.  RTL isn't as powerful as gimple and so isn't going
>> to be able to undo the gimple transforms in all cases.
>
> Canonical RTL is different in many ways, already.

Sure, wasn't claiming otherwise.  But most of the rtl canonicalisation
rules predate gimple by some distance, so while the individual choices
are certainly deliberate, the differences weren't necessarily planned
as differences.  Whereas here we're talking about adding a new rtl rule
with the full knowledge that it's the opposite of the equivalent gimple
rule.  If we're going to move in one direction, it seems better to move
towards making the rules more consistent rather than towards deliberately
making them (even) less consistent.

> "Not as powerful", I have no idea what you mean, btw.  RTL is much closer
> to the real machine, so is a lot *more* powerful than Gimple for modelling
> machine instructions (where Gimple is much nicer for higher-level
> optimisations).  We need both.

I meant rtl passes aren't generally as powerful as gimple passes
(which wasn't what I said :-)).  E.g. match.pd sees potential
combinations on gimple stmts that combine wouldn't see for the
corresponding rtl insns.

Richard
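
As a worked check of the two forms under discussion, the sketch below evaluates (xor (and (xor A B) C) B) and (ior (and A C) (and B (not C))) on plain integers and confirms they select the same bits. The function names are made up; this only illustrates the identity, not how combine or match.pd handle it.

```cpp
#include <cassert>
#include <cstdint>

// (xor (and (xor A B) C) B): three operations, no "not".
constexpr std::uint32_t select_xor_form(std::uint32_t a, std::uint32_t b,
                                        std::uint32_t c)
{
  return ((a ^ b) & c) ^ b;
}

// (ior (and A C) (and B (not C))): four operations.
constexpr std::uint32_t select_ior_form(std::uint32_t a, std::uint32_t b,
                                        std::uint32_t c)
{
  return (a & c) | (b & ~c);
}

// The operations are bitwise, so checking all 4-bit operand triples
// covers every per-bit combination many times over.
bool forms_agree()
{
  for (std::uint32_t a = 0; a < 16; ++a)
    for (std::uint32_t b = 0; b < 16; ++b)
      for (std::uint32_t c = 0; c < 16; ++c)
        if (select_xor_form(a, b, c) != select_ior_form(a, b, c))
          return false;
  return true;
}

void self_check()
{
  assert(forms_agree());
}
```

Where a mask bit in C is set, the xor form yields A ^ B ^ B = A; where it is clear, it yields B. That is the same bit select as the ior form.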

[Bug tree-optimization/92131] incorrect assumption that (ao >= 0) is always false

2019-10-16 Thread arigo at tunes dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92131

--- Comment #8 from Armin Rigo  ---
I'd like to point out that the problem only shows up with all the extra lines
of code that appear unrelated: everything before the loop, and the first half
of the loop itself (the switch-with-goto with cases 8 and 4).  For example, we
can't remove any if in the chain "if(i) if(g) if(h) if(e)".  And we can't
replace "goto bh;" with the apparently equivalent "break;".

[Bug c++/91363] Implement P0960R3: Parenthesized initialization of aggregates

2019-10-16 Thread mpolacek at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91363

Marek Polacek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org  |mpolacek at gcc dot gnu.org

--- Comment #1 from Marek Polacek  ---
Working on it.  This test now runs fine with -std=c++2a with my WIP patch:

struct A {
  int i;
  int j;
  int k; // value-init
};

int i;
A a(1, 2);
A a2(1.0, 2);
A a3(++i, ++i); // left-to-right eval

int
main ()
{
  if (a.i != 1 || a.j != 2 || a.k != 0)
    __builtin_abort ();
  if (a2.i != 1 || a2.j != 2 || a2.k != 0)
    __builtin_abort ();
  if (a3.i != 1 || a3.j != 2 || a3.k != 0)
    __builtin_abort ();
}

[Bug fortran/92113] [8 regression] r276673 causes segfault in gfortran.dg/pr51434.f90

2019-10-16 Thread tkoenig at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92113

Thomas Koenig  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-10-16
     Ever confirmed|0                           |1

--- Comment #1 from Thomas Koenig  ---
Probably fixed by some intermediate patch.

I'll try a bisection to see which one it was, and then backport
the patch if I can find it.

[PATCH] [OBVIOUS] Fix old file reference in gcc/cp/cp-gimplify.c

2019-10-16 Thread Luis Machado

I've found this stale reference while looking at cp-gimplify.c. tree-gimplify.c
no longer exists and its contents were merged into gimple.c.

Seems obvious enough.

gcc/cp/ChangeLog:

2019-10-16  Luis Machado  

* cp-gimplify.c: Fix reference to non-existing tree-gimplify.c file.

Signed-off-by: Luis Machado 
---
 gcc/cp/cp-gimplify.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c
index 154fa70ec06..0ab0438f601 100644
--- a/gcc/cp/cp-gimplify.c
+++ b/gcc/cp/cp-gimplify.c
@@ -1,4 +1,4 @@
-/* C++-specific tree lowering bits; see also c-gimplify.c and tree-gimple.c.
+/* C++-specific tree lowering bits; see also c-gimplify.c and gimple.c.
 
Copyright (C) 2002-2019 Free Software Foundation, Inc.
Contributed by Jason Merrill 
-- 
2.17.1

Re: [PATCH] handle local aggregate initialization in strlen, take 2 (PR 83821)

2019-10-16 Thread Jakub Jelinek

On Mon, Oct 14, 2019 at 06:23:22PM -0600, Martin Sebor wrote:
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/83821
>   * tree-ssa-strlen.c (maybe_invalidate): Add argument.  Consider
>   the length of a string when available.

> +   fprintf (dump_file,
> +"  statement may clobber string %zu long\n",
> +tree_to_uhwi (size));
> + else
> +   fprintf (dump_file,
> +"  statement may clobber string\n");

This broke bootstrap on i686-linux and likely many other hosts.

I'll commit the following as obvious if it passes bootstrap.

2019-10-16  Jakub Jelinek  

* tree-ssa-strlen.c (maybe_invalidate): Use
HOST_WIDE_INT_PRINT_UNSIGNED instead of "%zu".

--- gcc/tree-ssa-strlen.c.jj	2019-10-16 23:10:18.641727500 +0200
+++ gcc/tree-ssa-strlen.c   2019-10-16 23:44:43.202636942 +0200
@@ -1130,7 +1130,8 @@ maybe_invalidate (gimple *stmt, bool zer
  {
if (size && tree_fits_uhwi_p (size))
  fprintf (dump_file,
-  "  statement may clobber string %zu long\n",
+  "  statement may clobber string "
+  HOST_WIDE_INT_PRINT_UNSIGNED " long\n",
   tree_to_uhwi (size));
else
  fprintf (dump_file,


Jakub
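
The message does not spell out why "%zu" broke the build, so the following is an assumption: the value passed is a 64-bit unsigned HOST_WIDE_INT, while "%zu" expects a size_t, which is only 32 bits on hosts such as i686. A minimal sketch of the same idea outside GCC, using the standard PRIu64 macro as a stand-in for the GCC-internal HOST_WIDE_INT_PRINT_UNSIGNED (describe_clobber is a made-up name):

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// SIZE is always 64 bits wide here, so the conversion is spelled with
// PRIu64 rather than "%zu", whose argument must be a size_t.
std::string describe_clobber(std::uint64_t size)
{
  char buf[64];
  std::snprintf(buf, sizeof buf,
                "  statement may clobber string %" PRIu64 " long\n", size);
  return buf;
}

void self_check()
{
  assert(describe_clobber(42) == "  statement may clobber string 42 long\n");
}
```

As in the patch, the format string is built by concatenating string literals around a macro that expands to the right conversion for the type.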

[PATCH] [x86] Add detection of Icelake Client and Server

2019-10-16 Thread Thiago Macieira

gcc/ChangeLog:
* config/i386/driver-i386.c (host_detect_local_cpu): Handle
  icelake-client and icelake-server.
* testsuite/gcc.target/i386/builtin_target.c (check_intel_cpu_model):
  Verify icelakes are detected correctly.

libgcc/ChangeLog:
* config/i386/cpuinfo.c (get_intel_cpu): Handle icelake-client
  and icelake-server.
---
 gcc/config/i386/driver-i386.c  | 11 +++
 gcc/testsuite/gcc.target/i386/builtin_target.c | 11 +++
 libgcc/config/i386/cpuinfo.c   | 13 +
 3 files changed, 35 insertions(+)

diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-i386.c
index 8e8b4d21950..996308c7eaa 100644
--- a/gcc/config/i386/driver-i386.c
+++ b/gcc/config/i386/driver-i386.c
@@ -855,6 +855,17 @@ const char *host_detect_local_cpu (int argc, const char **argv)
  /* Cannon Lake.  */
  cpu = "cannonlake";
  break;
+   case 0x6a:
+   case 0x6c:
+ /* Icelake Server. */
+ cpu = "icelake-server";
+ break;
+   case 0x7d:
+   case 0x7e:
+   case 0x9d:
+ /* Icelake Client. */
+ cpu = "icelake-client";
+ break;
case 0x85:
  /* Knights Mill.  */
  cpu = "knm";
diff --git a/gcc/testsuite/gcc.target/i386/builtin_target.c b/gcc/testsuite/gcc.target/i386/builtin_target.c
index 7a8b6e805ed..ef15d48f84d 100644
--- a/gcc/testsuite/gcc.target/i386/builtin_target.c
+++ b/gcc/testsuite/gcc.target/i386/builtin_target.c
@@ -124,6 +124,17 @@ check_intel_cpu_model (unsigned int family, unsigned int model,
  /* Cannon Lake.  */
  assert (__builtin_cpu_is ("cannonlake"));
  break;
+   case 0x7d:
+   case 0x7e:
+   case 0x9d:
+ /* Icelake Client. */
+ assert (__builtin_cpu_is ("icelake-client"));
+ break;
+   case 0x6a:
+   case 0x6c:
+ /* Icelake Server. */
+ assert (__builtin_cpu_is ("icelake-server"));
+ break;
case 0x17:
case 0x1d:
  /* Penryn.  */
diff --git a/libgcc/config/i386/cpuinfo.c b/libgcc/config/i386/cpuinfo.c
index 5659ec89546..cc9c62f053b 100644
--- a/libgcc/config/i386/cpuinfo.c
+++ b/libgcc/config/i386/cpuinfo.c
@@ -232,6 +232,19 @@ get_intel_cpu (unsigned int family, unsigned int model, unsigned int brand_id)
  __cpu_model.__cpu_type = INTEL_COREI7;
  __cpu_model.__cpu_subtype = INTEL_COREI7_CANNONLAKE;
  break;
+   case 0x7d:
+   case 0x7e:
+   case 0x9d:
+ /* Icelake Client. */
+ __cpu_model.__cpu_type = INTEL_COREI7;
+ __cpu_model.__cpu_subtype = INTEL_COREI7_ICELAKE_CLIENT;
+ break;
+   case 0x6a:
+   case 0x6c:
+ /* Icelake Server. */
+ __cpu_model.__cpu_type = INTEL_COREI7;
+ __cpu_model.__cpu_subtype = INTEL_COREI7_ICELAKE_SERVER;
+ break;
case 0x17:
case 0x1d:
  /* Penryn.  */
-- 
2.23.0
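
The three hunks encode the same model-number mapping. A compact standalone sketch of it follows; icelake_cpu_name is a hypothetical helper, and the family == 6 guard is an assumption about the surrounding switch, which the hunks do not show.

```cpp
#include <cassert>
#include <cstring>

// Returns the -march name for the Icelake models the patch adds, or
// nullptr for anything else.
const char *icelake_cpu_name(unsigned int family, unsigned int model)
{
  if (family != 6)  // assumed: the cases live under the family 6 switch
    return nullptr;
  switch (model)
    {
    case 0x6a:
    case 0x6c:
      return "icelake-server";
    case 0x7d:
    case 0x7e:
    case 0x9d:
      return "icelake-client";
    default:
      return nullptr;
    }
}

void self_check()
{
  assert(std::strcmp(icelake_cpu_name(6, 0x7e), "icelake-client") == 0);
  assert(std::strcmp(icelake_cpu_name(6, 0x6a), "icelake-server") == 0);
}
```

Keeping the driver, libgcc and the testsuite on one list of model numbers is what lets the builtin_target.c test verify that the detection agrees in all three places.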

[Bug tree-optimization/92131] incorrect assumption that (ao >= 0) is always false

2019-10-16 Thread arigo at tunes dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92131

--- Comment #7 from Armin Rigo  ---
Created attachment 47056
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47056&action=edit
made the example runnable

Here is a main().  Compare:
* gcc -Og foomin3.c foomin3main.c && a.out
* gcc -O1 foomin3.c foomin3main.c && a.out

[Bug tree-optimization/92131] incorrect assumption that (ao >= 0) is always false

2019-10-16 Thread jakub at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92131

--- Comment #6 from Jakub Jelinek  ---
The reason the intersection gives [-INF, -8] is that compare_values (
-7, e.7_8 + 9223372036854775806 ) returns -1 rather than -2.  And that is
because it thinks exactly 9223372036854775806 is added to e.7_8, at which point
it would be even for e.7_8 equal to LONG_MIN -2 and for any larger value
larger.
Though, when we created that e.7_8 + 9223372036854775806, we didn't mean that,
we meant that the maximum value it can have is e.7_8 + 9223372036854775806 if
e.7_8 is negative or 1, if it is > 1, the maximum value is LONG_MAX, i.e. we
meant
MIN (9223372036854775807, (__int128_t) e.7_8 + 9223372036854775806).
So, I'm afraid we need to define exactly what we mean by symbolic + constant
first and depending on that tweak compare_values, or intersect_ranges, or
extract_range_from_plus_minus_expr.
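
A small sketch of the intended, saturating reading of "e.7_8 + 9223372036854775806", i.e. MIN (LONG_MAX, e + 9223372036854775806) computed without wrapping. The names kOffset and saturated_upper_bound are made up, and a 64-bit long is assumed, so std::int64_t is used instead.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::int64_t kOffset = INT64_MAX - 1;  // 9223372036854775806

// For e > 1 the exact sum would exceed INT64_MAX, so the bound
// saturates.  For e <= 1 the sum is representable and is returned as is.
constexpr std::int64_t saturated_upper_bound(std::int64_t e)
{
  return e > 1 ? INT64_MAX : e + kOffset;
}

void self_check()
{
  assert(saturated_upper_bound(1) == INT64_MAX);
  assert(saturated_upper_bound(2) == INT64_MAX);
  assert(saturated_upper_bound(INT64_MIN) == -2);
}
```

Under the other reading, where exactly that constant is added with undefined overflow, e would be forced to be at most 1. The two readings therefore lead to different conclusions about e itself.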

Re: [ C++ ] [ PATCH ] [ RFC ] p1301 - [[nodiscard("should have a reason")]]

2019-10-16 Thread JeanHeyd Meneide

Thanks, Jason! I fixed those last things and I put the changelog below
in the e-mail. I'll figure out how to write a good changelog in a
commit message on the command line soon. :D



2019-10-16  JeanHeyd Meneide 

gcc/

* escaped_string.h: New. Refactored out of tree.c to make more
broadly available (e.g. to parser.c, cvt.c).
* tree.c: remove escaped_string class

gcc/c-family

* c-lex.c - update attribute value

gcc/cp/

* tree.c: Implement p1301 - nodiscard("should have a reason"))
Added C++2a nodiscard string message handling.
Increase nodiscard argument handling max_length from 0
to 1.
* parser.c: add requirement that nodiscard only be seen
once in attribute-list
* cvt.c: add nodiscard message to output, if applicable
diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c
index e3c602fbb8d..fb05b5f8af0 100644
--- a/gcc/c-family/c-lex.c
+++ b/gcc/c-family/c-lex.c
@@ -353,13 +353,14 @@ c_common_has_attribute (cpp_reader *pfile)
  else if (is_attribute_p ("deprecated", attr_name))
result = 201309;
  else if (is_attribute_p ("maybe_unused", attr_name)
-  || is_attribute_p ("nodiscard", attr_name)
   || is_attribute_p ("fallthrough", attr_name))
result = 201603;
  else if (is_attribute_p ("no_unique_address", attr_name)
   || is_attribute_p ("likely", attr_name)
   || is_attribute_p ("unlikely", attr_name))
result = 201803;
+ else if (is_attribute_p ("nodiscard", attr_name))
+   result = 201907;
  if (result)
attr_name = NULL_TREE;
}
diff --git a/gcc/cp/cvt.c b/gcc/cp/cvt.c
index 364af72e68d..f2d2ba6cafb 100644
--- a/gcc/cp/cvt.c
+++ b/gcc/cp/cvt.c
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "convert.h"
 #include "stringpool.h"
 #include "attribs.h"
+#include "escaped_string.h"
 
 static tree convert_to_pointer_force (tree, tree, tsubst_flags_t);
 static tree build_type_conversion (tree, tree);
@@ -1026,22 +1027,45 @@ maybe_warn_nodiscard (tree expr, impl_conv_void implicit)
 
   tree rettype = TREE_TYPE (type);
   tree fn = cp_get_fndecl_from_callee (callee);
+  tree attr;
   if (implicit != ICV_CAST && fn
-  && lookup_attribute ("nodiscard", DECL_ATTRIBUTES (fn)))
+  && (attr = lookup_attribute ("nodiscard", DECL_ATTRIBUTES (fn))))
 {
+  escaped_string msg;
+  tree args = TREE_VALUE(attr);
+  const bool has_string_arg = args && TREE_CODE (TREE_VALUE (args)) == STRING_CST;
+  if (has_string_arg)
+msg.escape (TREE_STRING_POINTER (TREE_VALUE (args)));
+  const bool has_msg = msg;
+  const char* format = (has_msg ?
+   G_("ignoring return value of %qD, "
+  "declared with attribute %<nodiscard%>: %<%s%>") :
+   G_("ignoring return value of %qD, "
+  "declared with attribute %<nodiscard%>%s"));
+  const char* raw_msg = (has_msg ? static_cast<const char*>(msg) : "");
   auto_diagnostic_group d;
   if (warning_at (loc, OPT_Wunused_result,
- "ignoring return value of %qD, "
- "declared with attribute nodiscard", fn))
+ format, fn, raw_msg))
inform (DECL_SOURCE_LOCATION (fn), "declared here");
 }
   else if (implicit != ICV_CAST
-  && lookup_attribute ("nodiscard", TYPE_ATTRIBUTES (rettype)))
+  && (attr = lookup_attribute ("nodiscard", TYPE_ATTRIBUTES (rettype))))
 {
+  escaped_string msg;
+  tree args = TREE_VALUE(attr);
+  const bool has_string_arg = args && TREE_CODE (TREE_VALUE (args)) == STRING_CST;
+  if (has_string_arg)
+msg.escape (TREE_STRING_POINTER (TREE_VALUE (args)));
+  const bool has_msg = msg;
+  const char* format = has_msg ?
+   G_("ignoring return value of type %qT, "
+  "declared with attribute %<nodiscard%>: %<%s%>") :
+   G_("ignoring return value of type %qT, "
+  "declared with attribute %<nodiscard%>%s");
+  const char* raw_msg = (has_msg ? static_cast<const char*>(msg) : "");
   auto_diagnostic_group d;
   if (warning_at (loc, OPT_Wunused_result,
- "ignoring returned value of type %qT, "
- "declared with attribute nodiscard", rettype))
+ format, rettype, raw_msg))
{
  if (fn)
inform (DECL_SOURCE_LOCATION (fn),
@@ -1180,7 +1204,7 @@ convert_to_void (tree expr, impl_conv_void implicit, tsubst_flags_t complain)
 instantiations be affected by an ABI property that is, or at
 least ought to be transparent to the language.  */
   if (tree fn = cp_get_callee_fndecl_nofold (expr))
-   if (DECL_CONSTRUCTOR_P (fn) || DECL_DESTRUCTOR_P (fn))
+   if (DECL_DESTRUCTOR_P (fn))
  return expr;
 
   maybe_warn_nodiscard (expr, implicit);
diff
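
From the user's side, the feature and the feature-test value touched in c-lex.c could be used roughly as follows. This is a sketch: NODISCARD_MSG and parse_digit are made-up names, and the fallback branches are an assumption about how one might support older compilers.

```cpp
#include <cassert>

// Use the message form only when the compiler advertises it; 201907 is
// the value the c-lex.c hunk reports for __has_cpp_attribute(nodiscard).
#ifdef __has_cpp_attribute
#  if __has_cpp_attribute(nodiscard) >= 201907L
#    define NODISCARD_MSG(msg) [[nodiscard(msg)]]
#  elif __has_cpp_attribute(nodiscard)
#    define NODISCARD_MSG(msg) [[nodiscard]]
#  endif
#endif
#ifndef NODISCARD_MSG
#  define NODISCARD_MSG(msg)
#endif

// Hypothetical function: returns the digit value, or -1 on failure.
NODISCARD_MSG("the result tells you whether parsing failed")
int parse_digit(char c)
{
  if (c < '0' || c > '9')
    return -1;
  return c - '0';
}

void self_check()
{
  assert(parse_digit('7') == 7);
  assert(parse_digit('x') == -1);
}
```

A call such as `parse_digit ('7');` whose result is dropped would then be diagnosed with the attribute's string appended, which is what the cvt.c changes implement.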

Re: [AArch64][SVE2] Support for EOR3 and variants of BSL

2019-10-16 Thread Segher Boessenkool

On Wed, Oct 16, 2019 at 09:04:18PM +0100, Richard Sandiford wrote:
> Segher Boessenkool  writes:
> > This isn't canonical RTL.  Does combine not simplify this?
> >
> > Or, rather, it should not be what we canonicalise to: nothing is defined
> > here.
> 
> But when nothing is defined, let's match what we get :-)

Of course.

> If someone wants to add a new canonical form then the ports should of
> course adapt, but until then I think the patch is doing the right thing.

We used to generate this, until GCC 5.  There aren't many ports that have
adapted yet.

> > If the mask is not a constant, we really shouldn't generate a totally
> > different form.  The xor-and-xor form is very hard to handle, too.
> >
> > Expand currently generates this, because gimple thinks this is simpler.
> > I think this should be fixed.
> 
> But the constant form is effectively folding away the NOT.
> Without it the equivalent rtl uses 4 operations rather than 3:
> 
>   (ior (and A C) (and B (not C)))

RTL canonicalisation rules are not based around number of ops.  For example,
we do  (and (not A) (not B))  rather than  (not (ior (A B))  .  Instead,
there are other rules (like here: push "not"s inward, which can be applied
locally with the wanted result).

> And folding 4 operations gets us into 4-insn combinations, which are
> obviously more limited (for good reason).

But on most machines it doesn't need to combine more than two or three
insns to get here.  Reducing the depth of the tree is more useful...  That
is 3 in both cases here, but "andc" is common on many machines, so that
makes it only two deep.

> As you say, it's no accident that we get this form, it's something
> that match.pd specifically chose.  And I think there should be a
> strong justification for having an RTL canonical form that reverses
> a gimple decision.  RTL isn't as powerful as gimple and so isn't going
> to be able to undo the gimple transforms in all cases.

Canonical RTL is different in many ways, already.

"Not as powerful", I have no idea what you mean, btw.  RTL is much closer
to the real machine, so is a lot *more* powerful than Gimple for modelling
machine instructions (where Gimple is much nicer for higher-level
optimisations).  We need both.

Segher
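
The "push nots inward" rule mentioned above is De Morgan's law applied in one direction. A tiny sketch on integers, with made-up function names, shows that the two spellings compute the same value, so the choice between them is purely about which form is canonical.

```cpp
#include <cassert>
#include <cstdint>

// (and (not A) (not B)): the "not"s are pushed inward.
constexpr std::uint32_t nots_inward(std::uint32_t a, std::uint32_t b)
{
  return ~a & ~b;
}

// (not (ior A B)): the same value with a single outer "not".
constexpr std::uint32_t not_outward(std::uint32_t a, std::uint32_t b)
{
  return ~(a | b);
}

static_assert(nots_inward(0x0fu, 0x33u) == not_outward(0x0fu, 0x33u),
              "De Morgan: ~a & ~b == ~(a | b)");

void self_check()
{
  assert(nots_inward(0u, 0u) == 0xffffffffu);
  assert(nots_inward(0xffff0000u, 0x0000ffffu) == 0u);
}
```

The inward form uses three operations instead of two, which is the point being made: the canonical choice is not driven by operation count.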

[Bug tree-optimization/92131] incorrect assumption that (ao >= 0) is always false

2019-10-16 Thread jakub at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92131

--- Comment #5 from Jakub Jelinek  ---
Or, more likely the intersection into [-INF, -8] is incorrect.

Re: [PATCH][AArch64] PR79262: Adjust vector cost

2019-10-16 Thread Richard Sandiford

Wilco Dijkstra  writes:
> ping
>
> PR79262 has been fixed for almost all AArch64 cpus, however the example is
> still vectorized in a few cases, resulting in lower performance.  Increase
> the cost of vector-to-scalar moves so it is more similar to the other vector
> costs. As a result -mcpu=cortex-a53 no longer vectorizes the testcase -
> libquantum and SPECv6 performance improves.
>
> OK for commit?
>
> ChangeLog:
> 2018-01-22  Wilco Dijkstra  
>
> PR target/79262
> * config/aarch64/aarch64.c (generic_vector_cost): Adjust 
> vec_to_scalar_cost.

OK, thanks, and sorry for the delay.

qdf24xx_vector_cost is the only specific CPU cost table with a
vec_to_scalar_cost as low as 1.  It's not obvious how emphatic
that choice is though.  It looks like qdf24xx_vector_cost might
(very reasonably!) have started out as a copy of the generic costs
with some targeted changes.

But even if 1 is accurate there from a h/w perspective, the problem
is that the vectoriser's costings have a tendency to miss additional
overhead involved in scalarisation.  Although increasing the cost
to avoid that might be a bit of a hack, it's the accepted hack.

So I suspect in practice all CPUs will benefit from a higher cost,
not just those whose CPU tables already have one.  On that basis,
increasing the generic cost by the smallest possible amount should
be a good change across the board.

If anyone finds a counter-example, please let us know or file a bug.

Richard

> --
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index c6a83c881038873d8b68e36f906783be63ddde56..43f5b7162152ca92a916f4febee01f624c375202 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -403,7 +403,7 @@ static const struct cpu_vector_cost generic_vector_cost =
>1, /* vec_int_stmt_cost  */
>1, /* vec_fp_stmt_cost  */
>2, /* vec_permute_cost  */
> -  1, /* vec_to_scalar_cost  */
> +  2, /* vec_to_scalar_cost  */
>1, /* scalar_to_vec_cost  */
>1, /* vec_align_load_cost  */
>1, /* vec_unalign_load_cost  */

[PATCH] RISC-V: Include more registers in SIBCALL_REGS.

2019-10-16 Thread Jim Wilson

This finishes the part 1 of 2 patch submitted by Andrew Burgess on Aug 19.
This adds the argument registers but not t0 (aka x5) to SIBCALL_REGS.  It
also adds the missing riscv_regno_to_class change.

Tested with cross riscv32-elf and riscv64-linux toolchain build and check.
There were no regressions.  I see about a 0.01% code size reduction for the
C and libstdc++ libraries.

Committed.

Jim

gcc/
* config/riscv/riscv.h (REG_CLASS_CONTENTS): Add argument passing
regs to SIBCALL_REGS.
* config/riscv/riscv.c (riscv_regno_to_class): Change argument
passing regs to SIBCALL_REGS.
---
 gcc/config/riscv/riscv.c | 6 +++---
 gcc/config/riscv/riscv.h | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index b8a8778b92c..77a3ad94aa8 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -256,9 +256,9 @@ enum riscv_microarchitecture_type riscv_microarchitecture;
 const enum reg_class riscv_regno_to_class[FIRST_PSEUDO_REGISTER] = {
   GR_REGS, GR_REGS,GR_REGS,GR_REGS,
   GR_REGS, GR_REGS,SIBCALL_REGS,   SIBCALL_REGS,
-  JALR_REGS,   JALR_REGS,  JALR_REGS,  JALR_REGS,
-  JALR_REGS,   JALR_REGS,  JALR_REGS,  JALR_REGS,
-  JALR_REGS,   JALR_REGS,  JALR_REGS,  JALR_REGS,
+  JALR_REGS,   JALR_REGS,  SIBCALL_REGS,   SIBCALL_REGS,
+  SIBCALL_REGS,SIBCALL_REGS,   SIBCALL_REGS,   SIBCALL_REGS,
+  SIBCALL_REGS,SIBCALL_REGS,   JALR_REGS,  JALR_REGS,
   JALR_REGS,   JALR_REGS,  JALR_REGS,  JALR_REGS,
   JALR_REGS,   JALR_REGS,  JALR_REGS,  JALR_REGS,
   SIBCALL_REGS,SIBCALL_REGS,   SIBCALL_REGS,   SIBCALL_REGS,
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 5fc9be8edbf..246494663f6 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -400,7 +400,7 @@ enum reg_class
 #define REG_CLASS_CONTENTS \
 {  \
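   { 0x00000000, 0x00000000, 0x00000000 },  /* NO_REGS */   \
-  { 0xf00000c0, 0x00000000, 0x00000000 },  /* SIBCALL_REGS */  \
+  { 0xf003fcc0, 0x00000000, 0x00000000 },  /* SIBCALL_REGS */  \
   { 0xffffffc0, 0x00000000, 0x00000000 },  /* JALR_REGS */ \
   { 0xffffffff, 0x00000000, 0x00000000 },  /* GR_REGS */   \
   { 0x00000000, 0xffffffff, 0x00000000 },  /* FP_REGS */   \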
-- 
2.17.1
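
The new SIBCALL_REGS mask can be cross-checked against the register table in the riscv.c hunk. The sketch below assumes register xN corresponds to bit N of the first mask word and uses the standard ABI names (t1-t2 = x6-x7, a0-a7 = x10-x17, t3-t6 = x28-x31); reg_range is a made-up helper.

```cpp
#include <cassert>
#include <cstdint>

// Mask with bits FIRST..LAST (inclusive) set; register xN maps to bit N.
constexpr std::uint32_t reg_range(unsigned first, unsigned last)
{
  return (last >= 31 ? 0xffffffffu : ((1u << (last + 1)) - 1u))
         & ~((1u << first) - 1u);
}

// t1-t2 and t3-t6 were already in the class; the patch adds a0-a7.
constexpr std::uint32_t new_sibcall_mask
  = reg_range(6, 7) | reg_range(10, 17) | reg_range(28, 31);

static_assert(new_sibcall_mask == 0xf003fcc0u,
              "matches the SIBCALL_REGS value in the patch");

void self_check()
{
  // t0 (x5) is deliberately left out of the class.
  assert((new_sibcall_mask & (1u << 5)) == 0);
}
```

The bit ranges line up with the SIBCALL_REGS entries in riscv_regno_to_class: indices 6-7, 10-17 and 28-31.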

Re: [ C++ ] [ PATCH ] [ RFC ] p1301 - [[nodiscard("should have a reason")]]

2019-10-16 Thread Jason Merrill


On 10/15/19 8:31 PM, JeanHeyd Meneide wrote:

Attached is a patch for p1301 that improves in the way Jason Merrill
specified earlier
(https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00858.html)


Great, thanks!

This mail is missing ChangeLog entries.  My guess is that you're using 
git diff to create the patch file; git show or (even better) git 
format-patch will also include the commit message.



+/* Check that the attribute ATTRIBU,TE appears at most once in the


Stray added ,


-  else if (TREE_CODE (expr) == TARGET_EXPR
-  && lookup_attribute ("warn_unused_result", TYPE_ATTRIBUTES (type)))
+  else if (TREE_CODE (expr) == TARGET_EXPR)
 {
   /* The TARGET_EXPR confuses do_warn_unused_result into thinking that the
 result is used, so handle that case here.  */
-  if (fn)
+  if (lookup_attribute ("warn_unused_result", TYPE_ATTRIBUTES (type)))

...

+  else if ((attr = lookup_attribute ("nodiscard", TYPE_ATTRIBUTES (type


The first two if/else should have already handled nodiscard; this else 
was only intended to handle cases where warn_unused_result wants a 
warning and nodiscard doesn't, i.e. when there's an explicit cast to 
void.  We shouldn't need to change anything here.
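
To illustrate the user-visible behaviour being discussed, here is a small sketch. The type and function names are hypothetical, and the string argument to nodiscard assumes a compiler that accepts the p1301 syntax; the comments describe the expected diagnostics rather than verified compiler output.

```cpp
#include <cassert>

// Hypothetical type, for illustration only.  The string is the p1301 reason.
struct [[nodiscard("the status must be checked")]] Status {
  int code;
};

Status do_work() { return Status{0}; }

int checked_call() {
  Status s = do_work();  // result is used: no diagnostic expected
  return s.code;
}

void ignored_on_purpose() {
  // Explicit cast to void: nodiscard is expected to stay quiet here,
  // whereas warn_unused_result would still want a warning.
  (void) do_work();
}
```

The cast-to-void case is the one where the two attributes differ, which is why only that branch treats warn_unused_result specially.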


Jason

[Bug tree-optimization/92131] incorrect assumption that (ao >= 0) is always false

2019-10-16 Thread jakub at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92131

--- Comment #4 from Jakub Jelinek  ---
From what I can see, the weird + 0x7ffffffffffffffe is created when
extract_range_from_plus_minus_expr is called with PLUS_EXPR and VARYING and
long int [-INF, e.7_8 + -1] ranges.
  /* Build the symbolic bounds if needed.  */
  adjust_symbolic_bound (min, code, expr_type,
 sym_min_op0, sym_min_op1,
 neg_min_op0, neg_min_op1);
  adjust_symbolic_bound (max, code, expr_type,
 sym_max_op0, sym_max_op1,
 neg_max_op0, neg_max_op1);
in there.  max_op0 is 9223372036854775807 (maximum of VARYING), and max_op1 is
-1 (which is not the upper bound, just an offset against the symbolic).
combine_bound adds this into wmax, so 9223372036854775807 + -1 gives
0x7ffffffffffffffe and that is what is used.  But e + 0x7ffffffffffffffe is
later treated as a huge value added to e with undefined overflow,
and thus e has to be at most 1.  I think VARYING + [-INF, e.7_8 + -1] really
should be VARYING.
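
The arithmetic in that step can be checked in isolation. This is only a sketch of the constant folding described above, using plain 64-bit integers; the variable names mirror the comment and are not the actual wide-int code in VRP.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// max_op0: the upper bound of a VARYING long int.
constexpr std::int64_t max_op0 = std::numeric_limits<std::int64_t>::max();
// max_op1: the constant offset in "e.7_8 + -1", not a real upper bound.
constexpr std::int64_t max_op1 = -1;
// What adding the two as if both were plain bounds produces.
constexpr std::int64_t wmax = max_op0 + max_op1;

static_assert(max_op0 == 9223372036854775807LL, "LONG_MAX on LP64");
static_assert(wmax == 9223372036854775806LL, "decimal form seen in the dump");
static_assert(wmax == 0x7ffffffffffffffeLL, "hex form of the same value");
```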

[Bug demangler/67299] demangler mishandles complex types

2019-10-16 Thread saldivarcher at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67299

Miguel Saldivar  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |saldivarcher at gmail dot com

--- Comment #3 from Miguel Saldivar  ---
Created attachment 47055
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47055&action=edit
Print "float complex" rather than "floatcomplex "

So, `llvm-cxxfilt` and `cpp_demangle` both print: `f(float complex)` for symbol
`_Z1fCf`, it'd probably be best to do the same? Although if `_Complex` is
preferred, I can change it to that as well.
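
For anyone who wants to see what their own runtime prints for this symbol, a minimal sketch follows. `demangle` is a hypothetical wrapper around `abi::__cxa_demangle`; the spelling of the complex type in the result depends on the demangler version in use, so no particular output for `_Z1fCf` is assumed here.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical wrapper: returns the demangled name, or the input unchanged
// if the runtime cannot demangle it.
std::string demangle(const char* mangled) {
  int status = 0;
  char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  if (status != 0 || out == nullptr)
    return mangled;
  std::string result(out);
  std::free(out);  // the buffer is allocated with malloc by the ABI routine
  return result;
}
```

Calling `demangle("_Z1fCf")` and printing the result shows whether the local demangler emits the missing space.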

Re: [PATCH] Fix constexpr-dtor3.C FAIL on arm

2019-10-16 Thread Jason Merrill

On 10/16/19 12:27 PM, Jakub Jelinek wrote:
> On Fri, Oct 11, 2019 at 04:14:16PM -0400, Jason Merrill wrote:
>>> On x86_64 and most other targets, cleanup here (if non-NULL) is the
>>> CALL_EXPR, as destructor return type is void, but on arm, as the dtor return
>>> type is some pointer, the CALL_EXPR is wrapped into a NOP_EXPR to void.
>>> protected_set_expr_location then on x86_64 clears the CALL_EXPR location,
>>> but on arm only NOP_EXPR location.
>>>
>>> The following patch (totally untested) should fix that.
>>>
>>> For the warning location, perhaps we could special case destructor calls
>>> in push_cx_call_context (to offset the intentional clearing of location for
>>> debugging purposes), if they don't have location set, don't use
>>> input_location for them, but try to pick DECL_SOURCE_LOCATION for the
>>> variable being destructed?
>>
>> Expanding the CLEANUP_EXPR of a CLEANUP_STMT could use the EXPR_LOCATION of
>> the CLEANUP_STMT.  Or the EXPR_LOCATION of *jump_target, if suitable.
>
> The already previously posted patch (now attached as first) has now been
> bootstrapped/regtested on x86_64-linux and i686-linux, and regardless if we
> improve the location or not should fix the arm vs. the rest of the world
> difference.  Is that ok for trunk?

OK.

> As for CLEANUP_STMT, I've tried it (the second patch), but it didn't change
> anything, the diagnostics was still
> constexpr-dtor3.C:16:23:   in ‘constexpr’ expansion of ‘f4()’
> constexpr-dtor3.C:16:24:   in ‘constexpr’ expansion of ‘(& w13)->W7::~W7()’
> constexpr-dtor3.C:5:34: error: inline assembly is not a constant expression
>  5 |   constexpr ~W7 () { if (w == 5) asm (""); w = 3; } // { dg-error 
> "inline assembly is not a constant expression" }
>|  ^~~
> constexpr-dtor3.C:5:34: note: only unevaluated inline assembly is allowed in 
> a ‘constexpr’ function in C++2a
> as without that change.

That's because the patch changes EXPR_LOCATION for evaluation of the
CLEANUP_BODY, but it should be for evaluation of CLEANUP_EXPR instead.


Jason

Re: [PATCH] Help compiler detect invalid code

2019-10-16 Thread François Dumont


Here is a version with __detail::__copy and __detail::__copy_backward.

I preferred to introduce the __detail namespace because __copy is quite a 
common name, so putting it in the __detail namespace will, I hope, make it 
clear that it is for internal use only.


I even considered moving more implementation details into this namespace; 
maybe in another patch later.


    * include/bits/stl_algobase.h (__memmove): Replace by...
    (__detail::__copy) ...that. Return void, loop as long as __n != 0.
    (__copy_move<_IsMove, true,
    std::random_access_iterator_tag>::__copy_m): Adapt to use latter.
    (__detail::__copy_backward): New.
    (__copy_move_backward<_IsMove, true,
    std::random_access_iterator_tag>::__copy_m): Adapt to use latter.
    (__copy_move_backward_a): Remove std::is_constant_evaluated block.
    * testsuite/25_algorithms/copy/constexpr.cc (test): Add check on copied
    values.
    * testsuite/25_algorithms/copy_backward/constexpr.cc (test): Likewise
    and rename in test1.
    (test2): New.
    * testsuite/25_algorithms/copy/constexpr_neg.cc: New.
    * testsuite/25_algorithms/copy_backward/constexpr.cc: New.
    * testsuite/25_algorithms/equal/constexpr_neg.cc: New.
    * testsuite/25_algorithms/move/constexpr.cc: New.
    * testsuite/25_algorithms/move/constexpr_neg.cc: New.

François

On 10/10/19 10:03 PM, Jonathan Wakely wrote:

On 01/10/19 22:05 +0200, François Dumont wrote:

On 9/27/19 1:24 PM, Jonathan Wakely wrote:

On 20/09/19 07:08 +0200, François Dumont wrote:
I already realized that previous patch will be too controversial to 
be accepted.


In this new version I just implement a real memmove in __memmove so


A real memmove doesn't just work backwards, it needs to detect any
overlaps and work forwards *or* backwards, as needed.
ok, good to know, I understand now why using __builtin_memcopy didn't 
show any performance enhancement when I tested it !


I think your change will break this case:

#include <algorithm>

constexpr int f(int i, int j, int k)
{
 int arr[5] = { 0, 0, i, j, k };
 std::move(arr+2, arr+5, arr);
 return arr[0] + arr[1] + arr[2];
}

static_assert( f(1, 2, 3) == 6 );

This is valid because std::move only requires that the result iterator
is not in the input range, but it's OK for the two ranges to overlap.

I haven't tested it, but I think with your change the array will end
up containing {3, 2, 3, 2, 3} instead of {1, 2, 3, 2, 3}.
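
The difference between the two copy directions on this overlapping range can be checked at compile time. This is a sketch with hypothetical helpers (`copy_forward`, `copy_in_reverse`, `move_down`); it is not the libstdc++ code, only a model of the two loops under discussion.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of a forward element-wise copy.
constexpr void copy_forward(int* dst, const int* src, std::ptrdiff_t n) {
  for (; n != 0; --n)
    *dst++ = *src++;
}

// Hypothetical model of a backward element-wise copy (last element first).
constexpr void copy_in_reverse(int* dst, const int* src, std::ptrdiff_t n) {
  dst += n;
  src += n;
  for (; n != 0; --n)
    *--dst = *--src;
}

// Copies arr[2..4] onto arr[0..2] and packs the array into decimal digits.
constexpr int move_down(bool forward) {
  int arr[5] = {0, 0, 1, 2, 3};
  if (forward)
    copy_forward(arr, arr + 2, 3);
  else
    copy_in_reverse(arr, arr + 2, 3);
  int packed = 0;
  for (int v : arr)
    packed = packed * 10 + v;
  return packed;
}

static_assert(move_down(true) == 12323, "forward copy gives {1, 2, 3, 2, 3}");
static_assert(move_down(false) == 32323, "backward copy gives {3, 2, 3, 2, 3}");
```

When the destination starts before the source, only the forward loop preserves the input values, which matches the requirement that the result iterator is not inside the input range.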

Indeed, I've added a std::move constexpr test in this new proposal 
which demonstrates that.


C++ Standard clearly states that [copy|move]_backward is done 
backward. So in this new proposal I propose to add a __memcopy used 
in copy/move and keep __memmove for *_backward algos. Both are using 
__builtin_memmove as before.


Then they *really* need better names now (__memmove was already a bad
name, but now it's terrible). If the difference is that one goes
forwards and one goes backwards, the names should reflect that.

I'll review it properly tomorrow.




diff --git a/libstdc++-v3/include/bits/stl_algobase.h b/libstdc++-v3/include/bits/stl_algobase.h
index 98d324827ed..dc6b3d3fc76 100644
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -77,19 +77,22 @@ namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
+namespace __detail
+{
   /*
* A constexpr wrapper for __builtin_memmove.
+   * When constant-evaluated performs a forward copy.
* @param __num The number of elements of type _Tp (not bytes).
*/
   template<bool _IsMove, typename _Tp>
 _GLIBCXX14_CONSTEXPR
-inline void*
-__memmove(_Tp* __dst, const _Tp* __src, size_t __num)
+inline void
+__copy(_Tp* __dst, const _Tp* __src, ptrdiff_t __num)
 {
 #ifdef __cpp_lib_is_constant_evaluated
   if (std::is_constant_evaluated())
 	{
-	  for(; __num > 0; --__num)
+	  for (; __num != 0; --__num)
 	{
 	  if constexpr (_IsMove)
 		*__dst = std::move(*__src);
@@ -98,13 +101,40 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  ++__src;
 	  ++__dst;
 	}
-	  return __dst;
 	}
   else
 #endif
-	return __builtin_memmove(__dst, __src, sizeof(_Tp) * __num);
-  return __dst;
+	__builtin_memmove(__dst, __src, sizeof(_Tp) * __num);
+}
+
+  /*
+   * A constexpr wrapper for __builtin_memmove.
+   * When constant-evaluated performs a backward copy.
+   * @param __num The number of elements of type _Tp (not bytes).
+   */
+  template<bool _IsMove, typename _Tp>
+_GLIBCXX14_CONSTEXPR
+inline void
+__copy_backward(_Tp* __dst, const _Tp* __src, ptrdiff_t __num)
+{
+#ifdef __cpp_lib_is_constant_evaluated
+  if (std::is_constant_evaluated())
+	{
+	  __dst += __num;
+	  __src += __num;
+	  for (; __num != 0; --__num)
+	{
+	  if constexpr (_IsMove)
+		*--__dst = std::move(*--__src);
+	  else
+		*--__dst = *--__src;
+	}
+	}
+  else
+#endif
+	__builtin_memmove(__dst, __src, sizeof(_Tp) * __num);
 }
+} // namespace __detail
 
   /*
* A constexpr wrapper for __builtin_memcmp.
@@ -446,7 +476,7 @@

[Bug tree-optimization/92130] Missed vectorization for iteration dependent loads and simple multiplicative accumulators

2019-10-16 Thread witold.baryluk+gcc at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92130

--- Comment #7 from Witold Baryluk  ---
Online examples: https://gcc.godbolt.org/z/Nyjty3

[Bug testsuite/92127] [10 regression] gcc.dg/vect/costmodel/ppc/costmodel-fast-math-vect-pr29925.c fails after r276645 on power7

2019-10-16 Thread seurer at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92127

--- Comment #1 from seurer at gcc dot gnu.org ---
Also this test:  

FAIL: gcc.dg/vect/costmodel/ppc/costmodel-pr37194.c scan-tree-dump-times vect
"vectorization not profitable" 1

[Bug tree-optimization/92131] incorrect assumption that (ao >= 0) is always false

2019-10-16 Thread jakub at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92131

--- Comment #3 from Jakub Jelinek  ---
The testcase is not very good because it is missing main and a call to the
function with arguments that trigger the miscompilation.
Anyway, at least at -O2 it seems to be a VRP bug to me:
ao_31 = ASSERT_EXPR ;
Intersecting
  long int ~[-7, -1]  EQUIVALENCES: { ao_23 } (1 elements)
and
  long int [-INF, e.7_8 + 9223372036854775806]  EQUIVALENCES: { } (0 elements)
to
  long int [-INF, -8]  EQUIVALENCES: { ao_23 } (1 elements)
where e.7_8 is VARYING and am_13 is too.
  am_28 = ASSERT_EXPR ;
  b.9_11 = b;
  ao_23 = b.9_11 + am_28;
  ao_31 = ASSERT_EXPR ;
ao_31 is the ao value at the see_me_here call.
There is
  if (am >= 0)
b = -am;
  ao = am + b;
  f = ao & 7;
  if (f == 0)
so if am is non-negative, then ao is 0, but if am is negative, then a VARYING
long is added to it and all we know is that am < e (am_13 < e.7_8) for a
VARYING e.7_8.  The 9223372036854775806 is 0x7ffffffffffffffe and that looks
wrong to me.
From the (unsigned long) ao_23 <= 18446744073709551608 assertion we know that
the low 3 bits are 0, e.g. ~[-7, -1], but where the  + 9223372036854775806
comes from is unclear.  I'd think that for ao_23 we should have [-INF, e.7_8-1]
range or so.
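
The link between the unsigned comparison and the ~[-7, -1] anti-range can be spelled out with a small check. This is a sketch, not VRP code: `outside_minus7_to_minus1` is a hypothetical name, and 64-bit long is assumed as in the dump.

```cpp
#include <cassert>
#include <cstdint>

// Models the assertion "(unsigned long) ao <= 18446744073709551608".
// 18446744073709551608 is 2^64 - 8, i.e. the bit pattern of -8.
constexpr bool outside_minus7_to_minus1(std::int64_t ao) {
  return static_cast<std::uint64_t>(ao) <= 18446744073709551608ull;
}

static_assert(outside_minus7_to_minus1(0), "0 passes");
static_assert(outside_minus7_to_minus1(-8), "-8 is the largest passing pattern");
static_assert(!outside_minus7_to_minus1(-7), "-7 is excluded");
static_assert(!outside_minus7_to_minus1(-1), "-1 is excluded");
static_assert(outside_minus7_to_minus1(INT64_MIN), "very negative values pass");
```

So the assertion alone only removes [-7, -1]; it says nothing that would justify an upper bound of -8 for every input.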

Re: [AArch64][SVE2] Support for EOR3 and variants of BSL

2019-10-16 Thread Richard Sandiford

Segher Boessenkool  writes:
> Hi,
>
> [ Please don't use application/octet-stream attachments.  Thanks! ]
>
> On Wed, Oct 16, 2019 at 04:24:29PM +, Yuliang Wang wrote:
>> +;; Unpredicated bitwise select.
>> +(define_insn "*aarch64_sve2_bsl"
>> +  [(set (match_operand:SVE_I 0 "register_operand" "=w, ?")
>> +(xor:SVE_I
>> +  (and:SVE_I
>> +(xor:SVE_I
>> +  (match_operand:SVE_I 1 "register_operand" ", w")
>> +  (match_operand:SVE_I 2 "register_operand" ", w"))
>> +(match_operand:SVE_I 3 "register_operand" "w, w"))
>> +  (match_dup BSL_3RD)))]
>
> This isn't canonical RTL.  Does combine not simplify this?
>
> Or, rather, it should not be what we canonicalise to: nothing is defined
> here.

But when nothing is defined, let's match what we get :-)

If someone wants to add a new canonical form then the ports should of
course adapt, but until then I think the patch is doing the right thing.

> We normally get something like
>
> Trying 7, 8 -> 9:
> 7: r127:SI=r130:DI#4^r125:DI#4
>   REG_DEAD r130:DI
> 8: r128:SI=r127:SI&0x20000000
>   REG_DEAD r127:SI
> 9: r126:SI=r128:SI^r125:DI#4
>   REG_DEAD r128:SI
>   REG_DEAD r125:DI
> Successfully matched this instruction:
> (set (reg:SI 126)
> (ior:SI (and:SI (subreg:SI (reg:DI 130) 4)
> (const_int 536870912 [0x2000]))
> (and:SI (subreg:SI (reg/v:DI 125 [ yD.2902+-4 ]) 4)
> (const_int -536870913 [0xdfff]
>
> If the mask is not a constant, we really shouldn't generate a totally
> different form.  The xor-and-xor form is very hard to handle, too.
>
> Expand currently generates this, because gimple thinks this is simpler.
> I think this should be fixed.

But the constant form is effectively folding away the NOT.
Without it the equivalent rtl uses 4 operations rather than 3:

  (ior (and A C) (and B (not C)))

And folding 4 operations gets us into 4-insn combinations, which are
obviously more limited (for good reason).

As you say, it's no accident that we get this form, it's something
that match.pd specifically chose.  And I think there should be a
strong justification for having an RTL canonical form that reverses
a gimple decision.  RTL isn't as powerful as gimple and so isn't going
to be able to undo the gimple transforms in all cases.
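
For reference, the two forms being compared compute the same bitwise select. The sketch below checks that with hypothetical helper names (`select_ior`, `select_xor`); it models the operations on 64-bit scalars and says nothing about which form RTL should canonicalise to.

```cpp
#include <cassert>
#include <cstdint>

// Four operations: (ior (and A C) (and B (not C))).
constexpr std::uint64_t select_ior(std::uint64_t a, std::uint64_t b,
                                   std::uint64_t c) {
  return (a & c) | (b & ~c);
}

// Three operations: (xor (and (xor A B) C) B).
constexpr std::uint64_t select_xor(std::uint64_t a, std::uint64_t b,
                                   std::uint64_t c) {
  return ((a ^ b) & c) ^ b;
}

// Bitwise operations act on each bit independently, so checking the
// eight all-zeros/all-ones combinations covers every case.
constexpr bool forms_agree() {
  for (unsigned bits = 0; bits < 8; ++bits) {
    const std::uint64_t a = (bits & 1) ? ~std::uint64_t{0} : 0;
    const std::uint64_t b = (bits & 2) ? ~std::uint64_t{0} : 0;
    const std::uint64_t c = (bits & 4) ? ~std::uint64_t{0} : 0;
    if (select_ior(a, b, c) != select_xor(a, b, c))
      return false;
  }
  return true;
}

static_assert(forms_agree(), "both forms select A where C is 1, else B");
```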

Thanks,
Richard

[Bug tree-optimization/92131] incorrect assumption that (ao >= 0) is always false

2019-10-16 Thread arigo at tunes dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92131

--- Comment #2 from Armin Rigo  ---
Created attachment 47054
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47054&action=edit
slightly different version, with comments showing the expected values

[Bug tree-optimization/92130] Missed vectorization for iteration dependent loads and simple multiplicative accumulators

2019-10-16 Thread witold.baryluk+gcc at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92130

--- Comment #6 from Witold Baryluk  ---
I also tested clang with LLVM 10~svn374655 and it does vectorize the loop
properly, even when both frequency and amplitude variables are updated every
loop. 

It still doesn't inline calls to sinf, even if I set -fno-math-errno and other
options implied by -ffast-math. My guess is that this is because there is no
hardware support for a vectorized sinf, and no vectorized software
implementation of sinf either. If I provide my own version of sinf using a
simple Taylor expansion, clang fully vectorizes the code:



  401320:   62 e1 7d 58 fe 3d 56vpaddd 0xd56(%rip){1to16},%zmm0,%zmm23 
  # 402080 <_IO_stdin_used+0x80>
  401327:   0d 00 00 
  40132a:   62 61 7c 48 5b c0   vcvtdq2ps %zmm0,%zmm24
  401330:   62 a1 7c 48 5b ff   vcvtdq2ps %zmm23,%zmm23
  401336:   62 f1 7c 48 10 4c 24vmovups 0x140(%rsp),%zmm1
  40133d:   05 
  40133e:   62 61 3c 40 59 d1   vmulps %zmm1,%zmm24,%zmm26
  401344:   62 61 44 40 59 f9   vmulps %zmm1,%zmm23,%zmm31
  40134a:   62 f1 7c 48 10 4c 24vmovups 0x100(%rsp),%zmm1
  401351:   04 
  401352:   62 61 3c 40 59 d9   vmulps %zmm1,%zmm24,%zmm27
  401358:   62 f1 44 40 59 c9   vmulps %zmm1,%zmm23,%zmm1
  40135e:   62 01 2c 40 59 ca   vmulps %zmm26,%zmm26,%zmm25
  401364:   62 f1 7c 48 10 54 24vmovups 0x80(%rsp),%zmm2
  40136b:   02 
  40136c:   62 61 3c 40 59 e2   vmulps %zmm2,%zmm24,%zmm28
  401372:   62 f1 44 40 59 d2   vmulps %zmm2,%zmm23,%zmm2
  401378:   62 02 25 40 ac ca   vfnmadd213ps %zmm26,%zmm27,%zmm25
  40137e:   62 f1 7c 48 10 5c 24vmovups 0x40(%rsp),%zmm3
  401385:   01 
  401386:   62 61 3c 40 59 eb   vmulps %zmm3,%zmm24,%zmm29
  40138c:   62 f1 44 40 59 db   vmulps %zmm3,%zmm23,%zmm3
  401392:   62 01 1c 40 59 d4   vmulps %zmm28,%zmm28,%zmm26
  401398:   62 01 04 40 59 df   vmulps %zmm31,%zmm31,%zmm27
  40139e:   62 02 15 40 ac d4   vfnmadd213ps %zmm28,%zmm29,%zmm26
  4013a4:   62 f1 7c 48 10 6c 24vmovups -0x40(%rsp),%zmm5
  4013ab:   ff 
  4013ac:   62 f1 3c 40 59 e5   vmulps %zmm5,%zmm24,%zmm4
  4013b2:   62 f1 44 40 59 ed   vmulps %zmm5,%zmm23,%zmm5
  4013b8:   62 61 6c 48 59 e2   vmulps %zmm2,%zmm2,%zmm28
  4013be:   62 f1 7c 48 10 7c 24vmovups -0x80(%rsp),%zmm7
  4013c5:   fe 
  4013c6:   62 f1 3c 40 59 f7   vmulps %zmm7,%zmm24,%zmm6
  4013cc:   62 f1 44 40 59 ff   vmulps %zmm7,%zmm23,%zmm7
  4013d2:   62 61 5c 48 59 ec   vmulps %zmm4,%zmm4,%zmm29
  4013d8:   62 61 54 48 59 f5   vmulps %zmm5,%zmm5,%zmm30
  4013de:   62 62 4d 48 ac ec   vfnmadd213ps %zmm4,%zmm6,%zmm29
  4013e4:   62 d1 3c 40 59 e3   vmulps %zmm11,%zmm24,%zmm4
  4013ea:   62 d1 44 40 59 f3   vmulps %zmm11,%zmm23,%zmm6
  4013f0:   62 02 75 48 ac df   vfnmadd213ps %zmm31,%zmm1,%zmm27
  4013f6:   62 d1 3c 40 59 cc   vmulps %zmm12,%zmm24,%zmm1
  4013fc:   62 41 44 40 59 fc   vmulps %zmm12,%zmm23,%zmm31
  401402:   62 71 5c 48 59 c4   vmulps %zmm4,%zmm4,%zmm8
  401408:   62 62 65 48 ac e2   vfnmadd213ps %zmm2,%zmm3,%zmm28
  40140e:   62 72 75 48 ac c4   vfnmadd213ps %zmm4,%zmm1,%zmm8
  401414:   62 d1 3c 40 59 ce   vmulps %zmm14,%zmm24,%zmm1
  40141a:   62 d1 44 40 59 d6   vmulps %zmm14,%zmm23,%zmm2
  401420:   62 62 45 48 ac f5   vfnmadd213ps %zmm5,%zmm7,%zmm30
  401426:   62 d1 3c 40 59 df   vmulps %zmm15,%zmm24,%zmm3
  40142c:   62 d1 44 40 59 e7   vmulps %zmm15,%zmm23,%zmm4
  401432:   62 f1 74 48 59 e9   vmulps %zmm1,%zmm1,%zmm5
  401438:   62 f1 4c 48 59 fe   vmulps %zmm6,%zmm6,%zmm7
  40143e:   62 71 6c 48 59 ca   vmulps %zmm2,%zmm2,%zmm9
  401444:   62 f2 65 48 ac e9   vfnmadd213ps %zmm1,%zmm3,%zmm5
  40144a:   62 b1 3c 40 59 c9   vmulps %zmm17,%zmm24,%zmm1
  401450:   62 f2 05 40 ac fe   vfnmadd213ps %zmm6,%zmm31,%zmm7
  401456:   62 b1 44 40 59 d9   vmulps %zmm17,%zmm23,%zmm3
  40145c:   62 b1 3c 40 59 f2   vmulps %zmm18,%zmm24,%zmm6
  401462:   62 21 44 40 59 fa   vmulps %zmm18,%zmm23,%zmm31
  401468:   62 72 5d 48 ac ca   vfnmadd213ps %zmm2,%zmm4,%zmm9
  40146e:   62 f1 74 48 59 d1   vmulps %zmm1,%zmm1,%zmm2
  401474:   62 f1 64 48 59 e3   vmulps %zmm3,%zmm3,%zmm4
  40147a:   62 f2 4d 48 ac d1   vfnmadd213ps %zmm1,%zmm6,%zmm2
  401480:   62 f2 05 40 ac e3   vfnmadd213ps %zmm3,%zmm31,%zmm4
  401486:   62 b1 3c 40 59 cc   vmulps %zmm20,%zmm24,%zmm1
  40148c:   62 b1 3c 40 59 dd   vmulps %zmm21,%zmm24,%zmm3
  401492:   62 f1 74 48 59 f1   vmulps %zmm1,%zmm1,%zmm6
  401498:   62 21 44 40 59 fc   vmulps %zmm20,%zmm23,%zmm31
  40149e:   62 f2 65 48 ac f1   vfnmadd213ps
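
The replacement sinf itself is not shown in the comment. A plausible stand-in, purely as an assumption, is a short Taylor polynomial like the one below; `taylor_sinf` is a hypothetical name and the number of terms is a guess, so this is not the reporter's code.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a hand-written sinf:
// x - x^3/3! + x^5/5! - x^7/7!, written in Horner form.
// Only accurate for small |x|; there is no range reduction.
inline float taylor_sinf(float x) {
  const float x2 = x * x;
  return x * (1.0f - x2 / 6.0f * (1.0f - x2 / 20.0f * (1.0f - x2 / 42.0f)));
}
```

A branch-free polynomial like this contains only multiplies, divides and subtractions, which is the kind of body a loop vectorizer can handle without a vector math library.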

Re: [AArch64][SVE2] Support for EOR3 and variants of BSL

2019-10-16 Thread Richard Sandiford

Thanks for the patch, looks really good.

Yuliang Wang  writes:
> +;; Use NBSL for vector NOR.
> +(define_insn_and_rewrite "*aarch64_sve2_nor"
> +  [(set (match_operand:SVE_I 0 "register_operand" "=w, w, ?")
> + (unspec:SVE_I
> +   [(match_operand 3)
> +(and:SVE_I
> +  (not:SVE_I
> +(match_operand:SVE_I 1 "register_operand" "w, 0, w"))
> +  (not:SVE_I
> +(match_operand:SVE_I 2 "register_operand" "0, w, w")))]
> +   UNSPEC_PRED_X))]
> +  "TARGET_SVE2"
> +  "@
> +  nbsl\t%0.d, %0.d, %1.d, %0.d
> +  nbsl\t%0.d, %0.d, %2.d, %0.d
> +  movprfx\t%0, %2\;nbsl\t%0.d, %0.d, %1.d, %0.d"
> +  "&& !CONSTANT_P (operands[3])"
> +  {
> +operands[3] = CONSTM1_RTX (mode);
> +  }
> +  [(set_attr "movprfx" "*,*,yes")]
> +)
> +
> +;; Use NBSL for vector NAND.
> +(define_insn_and_rewrite "*aarch64_sve2_nand"
> +  [(set (match_operand:SVE_I 0 "register_operand" "=w, w, ?")
> + (unspec:SVE_I
> +   [(match_operand 3)
> +(ior:SVE_I
> +  (not:SVE_I
> +(match_operand:SVE_I 1 "register_operand" "w, 0, w"))
> +  (not:SVE_I
> +(match_operand:SVE_I 2 "register_operand" "0, w, w")))]
> +   UNSPEC_PRED_X))]
> +  "TARGET_SVE2"
> +  "@
> +  nbsl\t%0.d, %0.d, %1.d, %1.d
> +  nbsl\t%0.d, %0.d, %2.d, %2.d
> +  movprfx\t%0, %2\;nbsl\t%0.d, %0.d, %1.d, %1.d"
> +  "&& !CONSTANT_P (operands[3])"
> +  {
> +operands[3] = CONSTM1_RTX (mode);
> +  }
> +  [(set_attr "movprfx" "*,*,yes")]
> +)

For these two it should in theory be slightly better to use "%" on
operand 1 to make the operation commutative, rather than provide a
separate matching alternative for operand 2.  E.g.:

(define_insn_and_rewrite "*aarch64_sve2_nand"
  [(set (match_operand:SVE_I 0 "register_operand" "=w, ?")
(unspec:SVE_I
  [(match_operand 3)
   (ior:SVE_I
 (not:SVE_I
   (match_operand:SVE_I 1 "register_operand" "%0, w"))
 (not:SVE_I
   (match_operand:SVE_I 2 "register_operand" "w, w")))]
  UNSPEC_PRED_X))]
  "TARGET_SVE2"
  "@
  nbsl\t%0.d, %0.d, %2.d, %2.d
  movprfx\t%0, %1\;nbsl\t%0.d, %0.d, %2.d, %2.d"
  "&& !CONSTANT_P (operands[3])"
  {
operands[3] = CONSTM1_RTX (mode);
  }
  [(set_attr "movprfx" "*,yes")]
)

(But the EOR3 pattern is fine as-is, since that supports any
permutation of the three inputs.)
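
The operand choices in the NOR and NAND patterns can be checked against a scalar model of NBSL. The model below is an assumption based on my reading of the instruction (inverted bitwise select, taking bits of the first source where the third is 1 and of the second source elsewhere); the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed scalar model of "nbsl zdn, zdn, zm, zk".
constexpr std::uint64_t nbsl(std::uint64_t zdn, std::uint64_t zm,
                             std::uint64_t zk) {
  return ~((zdn & zk) | (zm & ~zk));
}

// "nbsl %0, %0, %1, %0": the tied operand is also the selector.
constexpr std::uint64_t nor_via_nbsl(std::uint64_t a, std::uint64_t b) {
  return nbsl(a, b, a);
}

// "nbsl %0, %0, %1, %1": the other operand is the selector.
constexpr std::uint64_t nand_via_nbsl(std::uint64_t a, std::uint64_t b) {
  return nbsl(a, b, b);
}

// 0xC and 0xA cover all four input bit combinations in their low nibble.
static_assert(nor_via_nbsl(0xC, 0xA) == ~std::uint64_t{0xC | 0xA}, "NOR");
static_assert(nand_via_nbsl(0xC, 0xA) == ~std::uint64_t{0xC & 0xA}, "NAND");
```

Both results are symmetric in a and b, which is what makes the "%" commutative marker applicable.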

> +;; Unpredicated bitwise select.
> +(define_insn "*aarch64_sve2_bsl"
> +  [(set (match_operand:SVE_I 0 "register_operand" "=w, ?")
> + (xor:SVE_I
> +   (and:SVE_I
> + (xor:SVE_I
> +   (match_operand:SVE_I 1 "register_operand" ", w")
> +   (match_operand:SVE_I 2 "register_operand" ", w"))
> + (match_operand:SVE_I 3 "register_operand" "w, w"))
> +   (match_dup BSL_3RD)))]
> +  "TARGET_SVE2"
> +  "@
> +  bsl\t%0.d, %0.d, %.d, %3.d
> +  movprfx\t%0, %\;bsl\t%0.d, %0.d, %.d, %3.d"
> +  [(set_attr "movprfx" "*,yes")]
> +)

This is sufficiently far from the documented logic that it might
be worth a comment.  E.g.:

;; (op3 ? bsl_mov : bsl_3rd) == (((bsl_mov ^ bsl_3rd) & op3) ^ bsl_3rd)

Similarly for the later patterns.

> +;; Unpredicated bitwise inverted select.
> +(define_insn_and_rewrite "*aarch64_sve2_nbsl"
> +  [(set (match_operand:SVE_I 0 "register_operand" "=w, ?")
> + (unspec:SVE_I
> +   [(match_operand 4)
> +(not:SVE_I
> +  (xor:SVE_I
> +(and:SVE_I
> +  (xor:SVE_I
> +(match_operand:SVE_I 1 "register_operand" ", w")
> +(match_operand:SVE_I 2 "register_operand" ", w"))
> +  (match_operand:SVE_I 3 "register_operand" "w, w"))
> +(match_dup BSL_3RD)))]
> +   UNSPEC_PRED_X))]
> +  "TARGET_SVE2"
> +  "@
> +  nbsl\t%0.d, %0.d, %.d, %3.d
> +  movprfx\t%0, %\;nbsl\t%0.d, %0.d, %.d, %3.d"
> +  "&& !CONSTANT_P (operands[4])"
> +  {
> +operands[4] = CONSTM1_RTX (mode);
> +  }
> +  [(set_attr "movprfx" "*,yes")]
> +)
> +
> +;; Unpredicated bitwise select with inverted first operand.
> +(define_insn_and_rewrite "*aarch64_sve2_bsl1n"
> +  [(set (match_operand:SVE_I 0 "register_operand" "=w, ?")
> + (xor:SVE_I
> +   (and:SVE_I
> + (unspec:SVE_I
> +   [(match_operand 4)
> +(not:SVE_I
> +  (xor:SVE_I
> +(match_operand:SVE_I 1 "register_operand" ", w")
> +(match_operand:SVE_I 2 "register_operand" ", w")))]
> +   UNSPEC_PRED_X)
> + (match_operand:SVE_I 3 "register_operand" "w, w"))
> +   (match_dup BSL_3RD)))]
> +  "TARGET_SVE2"
> +  "@
> +  bsl1n\t%0.d, %0.d, %.d, %3.d
> +  movprfx\t%0, %\;bsl1n\t%0.d, %0.d, %.d, %3.d"
> +  "&& !CONSTANT_P (operands[4])"
> +  {
> +operands[4] = CONSTM1_RTX (mode);
> +  }
> +  [(set_attr "movprfx" "*,yes")]
> +)
> +
> +;; Unpredicated bitwise select with inverted second operand.
> +(define_insn_and_rewrite "*aarch64_sve2_bsl2n"
> +  [(set (match_operand:SVE_I 0 "register_operand" "=w, ?")
> +

[Bug testsuite/92132] New: new test case gcc.dg/vect/vect-cond-reduc-4.c fails with its introduction in r277067

2019-10-16 Thread seurer at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92132

Bug ID: 92132
   Summary: new test case gcc.dg/vect/vect-cond-reduc-4.c fails
with its introduction in r277067
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: testsuite
  Assignee: unassigned at gcc dot gnu.org
  Reporter: seurer at gcc dot gnu.org
  Target Milestone: ---

This fails on powerpc64 both BE and LE

spawn -ignore SIGHUP /home/seurer/gcc/build/gcc-test/gcc/xgcc
-B/home/seurer/gcc/build/gcc-test/gcc/
/home/seurer/gcc/gcc-test/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-4.c
-fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers
-fdiagnostics-color=never -fdiagnostics-urls=never -flto -ffat-lto-objects
-maltivec -mvsx -mno-allow-movmisalign -ftree-vectorize
-fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2
-fdump-tree-vect-details -lm -o ./vect-cond-reduc-4.exe
PASS: gcc.dg/vect/vect-cond-reduc-4.c -flto -ffat-lto-objects (test for excess
errors)
Setting LD_LIBRARY_PATH to
:/home/seurer/gcc/build/gcc-test/gcc::/home/seurer/gcc/build/gcc-test/gcc:/home/seurer/gcc/build/gcc-test/./gmp/.libs:/home/seurer/gcc/build/gcc-test/./prev-gmp/.libs:/home/seurer/gcc/build/gcc-test/./mpfr/src/.libs:/home/seurer/gcc/build/gcc-test/./prev-mpfr/src/.libs:/home/seurer/gcc/build/gcc-test/./mpc/src/.libs:/home/seurer/gcc/build/gcc-test/./prev-mpc/src/.libs:/home/seurer/gcc/build/gcc-test/./isl/.libs:/home/seurer/gcc/build/gcc-test/./prev-isl/.libs
Execution timeout is: 300
spawn [open ...]
PASS: gcc.dg/vect/vect-cond-reduc-4.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-cond-reduc-4.c -flto -ffat-lto-objects : pattern found 0 times
FAIL: gcc.dg/vect/vect-cond-reduc-4.c -flto -ffat-lto-objects 
scan-tree-dump-times vect "LOOP VECTORIZED" 2
PASS: gcc.dg/vect/vect-cond-reduc-4.c -flto -ffat-lto-objects 
scan-tree-dump-times vect "condition expression based on integer induction." 2
testcase /home/seurer/gcc/gcc-test/gcc/testsuite/gcc.dg/vect/vect.exp completed
in 2 seconds

=== gcc Summary ===

# of expected passes            6
# of unexpected failures        2

[Bug target/59888] Darwin linker error "illegal text-relocation" with -shared

2019-10-16 Thread iains at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59888

--- Comment #19 from Iain Sandoe  ---
(In reply to Zaak from comment #18)
> (In reply to Iain Sandoe from comment #17)
> > by the way, I haven't been able to find a C reproducer for this issue - if
> > you feel we should have a testcase for it perhaps a link test for the
> > fortran example would work?

The Fortran FE folks might also want to convince themselves that the behaviour
FX describes in comment #8 is correct.

> It would be great to have a test to prevent future regressions here. I have
> no experience contributing to GCC or using dejagnu but if people are having
> trouble cooking up a C source to produce an object/library to link the
> Fortran against, I can try to find something to trigger this.

That's not the problem;
When C generates initialisers for function pointers, these end up in the
correct section already, the problem is triggered by something that the 
Fortran FE does differently.

So that means it's going to be hard to make a test that lives outside the
Fortran testsuite.

A testcase does not have to execute to be useful (a link test using the code at
comment #12 would be OK).  However someone has to package the test and decide
where it should live in the Fortran testsuite :-)

> As I mentioned previously this blocks gtk-fortran from linking on Darwin,
> however both GTK and gtk-fortran are not adequately small reproducers.

I will back port this in due course.

Re: [PATCH] Fix -Wshadow=local warnings in genautomata.c

2019-10-16 Thread Segher Boessenkool

On Wed, Oct 16, 2019 at 10:03:51AM -0600, Martin Sebor wrote:
> PS The counterexample nicely illustrates why -Wself-init should
> be in -Wall like in Clang or MSVC, or at least in -Wextra like in
> ICC.  Let me take it as a reminder to submit a patch for GCC 10.

c-family/c-gimplify.c says:

  /* This is handled mostly by gimplify.c, but we have to deal with
 not warning about int x = x; as it is a GCC extension to turn off
 this warning but only if warn_init_self is zero.  */

A lot of code will start to warn if you turn on -Winit-self by default
(in -Wall or -W), since forever we have had this GCC extension, and
people expect it.  (Or so I fear, feel free to prove me wrong :-) )

Segher

[Bug tree-optimization/92130] Missed vectorization for iteration dependent loads and simple multiplicative accumulators

2019-10-16 Thread witold.baryluk+gcc at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92130

--- Comment #5 from Witold Baryluk  ---
As a bonus:


static float perlin1d(float x) {
  float accum = 0.0f;
  for (int i = 0; i < 8; i++) {
accum += powf(0.781f, i) * sinf(x * powf(2.131f, i));
  }
  return accum;
}


claims to be vectorized, but really isn't: it still contains calls to sinf and
expf_finite that are neither inlined nor lowered.
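
For comparison, the powf form computes the same terms as a version that carries amplitude and frequency as running products. The sketch below puts the two side by side; the function names are hypothetical and the constants are the ones used in this report.

```cpp
#include <cassert>
#include <cmath>

// Running-product form: amplitude and frequency are updated every iteration.
float perlin_accum(float x) {
  float accum = 0.0f;
  float amplitude = 1.0f;
  float frequency = 1.0f;
  for (int i = 0; i < 8; i++) {
    accum += amplitude * std::sin(x * frequency);
    amplitude *= 0.781f;
    frequency *= 2.131f;
  }
  return accum;
}

// Closed form: amplitude and frequency are recomputed from the index.
float perlin_pow(float x) {
  float accum = 0.0f;
  for (int i = 0; i < 8; i++) {
    const float fi = static_cast<float>(i);
    accum += std::pow(0.781f, fi) * std::sin(x * std::pow(2.131f, fi));
  }
  return accum;
}
```

The two agree up to floating-point rounding, so any difference in vectorization comes from how the loop is written, not from what it computes.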

[Bug tree-optimization/83543] strlen of a local array member not optimized on some targets

2019-10-16 Thread msebor at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83543

--- Comment #8 from Martin Sebor  ---
The test committed for the solution for pr83821 is disabled for all targets
except LP64 x86_64 to get around this limitation.  Once this is implemented
across all targets the test should be enabled everywhere.
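
As a rough illustration of the kind of code these two bugs are about, consider strlen applied to an array member of a locally initialized aggregate. The struct and function below are hypothetical and are not the testcase from either bug; whether the call is folded to a constant depends on the target and GCC version.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical aggregate with a character array member.
struct Record {
  char name[8];
  int id;
};

// The length is known at compile time, so an optimizing compiler may be
// able to replace the strlen call with the constant 4.
std::size_t local_member_length() {
  Record r = { "1234", 7 };
  return std::strlen(r.name);
}
```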

[Bug tree-optimization/83819] [meta-bug] missing strlen optimizations

2019-10-16 Thread msebor at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83819
Bug 83819 depends on bug 83821, which changed state.

Bug 83821 Summary: local aggregate initialization defeats strlen optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83821

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

[Bug tree-optimization/92130] Missed vectorization for iteration dependent loads and simple multiplicative accumulators

2019-10-16 Thread witold.baryluk+gcc at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92130

--- Comment #4 from Witold Baryluk  ---
If I reduce the minimized test case even further:

only frequency update: VECTORIZED:

static float perlin1d(float x) {
  float accum = 0.0f;
  float amplitude = 1.0f;
  float frequency = 1.0f;
  for (int i = 0; i < 8; i++) {
    accum += amplitude * sinf(x * frequency);
    frequency *= 2.131f;
  }
  return accum;
}

__attribute__((noinline))
static void fill_data(int width, float * __restrict__ height_data, float scale)
{
  for (int i = 0; i < width; i++) {
    height_data[i] = perlin1d(i);
  }
}


only amplitude update: VECTORIZED:

static float perlin1d(float x) {
  float accum = 0.0f;
  float amplitude = 1.0f;
  float frequency = 1.0f;
  for (int i = 0; i < 8; i++) {
    accum += amplitude * sinf(x * frequency);
    amplitude *= 0.781f;
  }
  return accum;
}

__attribute__((noinline))
static void fill_data(int width, float * __restrict__ height_data, float scale)
{
  for (int i = 0; i < width; i++) {
    height_data[i] = perlin1d(i);
  }
}

both frequency and amplitude update: NOT VECTORIZED:

static float perlin1d(float x) {
  float accum = 0.0f;
  float amplitude = 1.0f;
  float frequency = 1.0f;
  for (int i = 0; i < 8; i++) {
    accum += amplitude * sinf(x * frequency);
    amplitude *= 0.781f;
    frequency *= 2.131f;
  }
  return accum;
}

__attribute__((noinline))
static void fill_data(int width, float * __restrict__ height_data, float scale)
{
  for (int i = 0; i < width; i++) {
    height_data[i] = perlin1d(i);
  }
}

[Bug tree-optimization/83821] local aggregate initialization defeats strlen optimization

2019-10-16 Thread msebor at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83821

Martin Sebor  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
  Known to work||10.0
 Resolution|--- |FIXED
   Target Milestone|--- |10.0
  Known to fail||7.3.0, 8.3.0, 9.2.0

--- Comment #7 from Martin Sebor  ---
Patch committed in r277080.

[Bug tree-optimization/92131] incorrect assumption that (ao >= 0) is always false

2019-10-16 Thread arigo at tunes dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92131

--- Comment #1 from Armin Rigo  ---
Comment on attachment 47053
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47053
creduce'd C source that miscompiles in -O>=1

BTW I just noticed that the reduced code is highly self-recursive, but that's
just an artifact of the reduction: if we rename the recursive calls so that
they call some other declared-but-not-implemented function, then the results
are the same.

[Bug tree-optimization/83821] local aggregate initialization defeats strlen optimization

2019-10-16 Thread msebor at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83821

--- Comment #6 from Martin Sebor  ---
Author: msebor
Date: Wed Oct 16 19:24:36 2019
New Revision: 277080

URL: https://gcc.gnu.org/viewcvs?rev=277080&root=gcc&view=rev
Log:
PR tree-optimization/83821 - local aggregate initialization defeats strlen
optimization

gcc/ChangeLog:

PR tree-optimization/83821
* tree-ssa-strlen.c (maybe_invalidate): Add argument.  Consider
the length of a string when available.
(handle_builtin_memset): Add argument.
(handle_store, strlen_check_and_optimize_call): Same.
(check_and_optimize_stmt): Same.  Pass it to callees.

gcc/testsuite/ChangeLog:

PR tree-optimization/83821
* c-c++-common/Warray-bounds-4.c: Remove XFAIL.
* gcc.dg/strlenopt-82.c: New test.
* gcc.dg/strlenopt-83.c: Same.
* gcc.dg/strlenopt-84.c: Same.
* gcc.dg/strlenopt-85.c: Same.
* gcc.dg/strlenopt-86.c: Same.
* gcc.dg/tree-ssa/calloc-4.c: Same.
* gcc.dg/tree-ssa/calloc-5.c: Same.


Added:
trunk/gcc/testsuite/gcc.dg/strlenopt-82.c
trunk/gcc/testsuite/gcc.dg/strlenopt-83.c
trunk/gcc/testsuite/gcc.dg/strlenopt-84.c
trunk/gcc/testsuite/gcc.dg/strlenopt-85.c
trunk/gcc/testsuite/gcc.dg/strlenopt-86.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/calloc-4.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/calloc-5.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/c-c++-common/Warray-bounds-4.c
trunk/gcc/tree-ssa-strlen.c

[Bug target/87243] FSF GCC needs to do something special (like using xcrun) on darwin18 to find system headers in SDK

2019-10-16 Thread iains at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87243

--- Comment #11 from Iain Sandoe  ---
Author: iains
Date: Wed Oct 16 19:22:17 2019
New Revision: 277079

URL: https://gcc.gnu.org/viewcvs?rev=277079&root=gcc&view=rev
Log:
[Darwin] Pick up SDKROOT as the sysroot fallback.

For compatibility with xcrun and the behaviour of the clang driver, make use
of the setting of the SDKROOT environment variable when it is available.
This applies to both finding headers and libraries (i.e. it is also passed to
ld64).

Priority:
1. User's command-line specified --sysroot= or -isysroot.
2. The SDKROOT variable when set, and validated.
3. Any sysroot provided by --with-sysroot= configuration parameter.

SDKROOT is checked thus:
1. Presence.
2. That it starts with / (i.e. 'absolute').
3. That it is not / only (since that's the default).
4. That it is readable by the process executing the driver.

This is pretty much the same rule set as used by the clang driver.
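
The checks and the priority order above can be sketched in C roughly as
follows.  This is only an illustration under stated assumptions: the
names sdkroot_is_usable and choose_sysroot are hypothetical, and this is
not the code added to darwin-driver.c.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper mirroring the four checks listed above; this is
   not the code from darwin-driver.c.  */
int
sdkroot_is_usable (const char *sdkroot)
{
  if (sdkroot == NULL)            /* 1. Presence.  */
    return 0;
  if (sdkroot[0] != '/')          /* 2. Must be absolute.  */
    return 0;
  if (sdkroot[1] == '\0')         /* 3. Must not be "/" alone.  */
    return 0;
  return access (sdkroot, R_OK) == 0;   /* 4. Readable by this process.  */
}

/* Apply the priority order: command line, then SDKROOT, then the
   configured default.  Any argument may be NULL.  */
const char *
choose_sysroot (const char *from_command_line, const char *sdkroot,
                const char *configured)
{
  if (from_command_line != NULL)
    return from_command_line;
  if (sdkroot_is_usable (sdkroot))
    return sdkroot;
  return configured;
}
```

A driver would presumably call something like choose_sysroot with the
value of getenv ("SDKROOT") as its second argument.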

NOTE: (3) might turn out to be overly restrictive in the case that we
have configured with --with-sysroot= and then we want to run on a system
with an installation of the headers/libraries in /. We can revisit this
if that turns out to be an important use-case.

So one can do:

xcrun --sdk macosx /path/to/gcc 

and that provides the SDK path as the sysroot to GCC as expected.

CAVEAT: gcc (and g++) are executables in the Xcode installation, and an
unfortunate effect of this is that they are found ahead of any executables
of the same name in $PATH.  So:

PATH=/path/to/gcc/install:$PATH xcrun --sdk macosx gcc 

does *not* work; instead, it executes the clang from the Xcode/command
line tools installation.

PATH=/path/to/gcc/install:$PATH xcrun --sdk macosx x86_64-apple-darwinXX-gcc ...

does work as expected, however.

2019-10-16  Iain Sandoe  

Backport from mainline
2019-10-03  Iain Sandoe  

PR target/87243
* config/darwin-driver.c (maybe_get_sysroot_from_sdkroot): New.
(darwin_driver_init): Use the sysroot provided by SDKROOT when that
is available and the user has not set one on the command line.


Modified:
branches/gcc-9-branch/gcc/ChangeLog
branches/gcc-9-branch/gcc/config/darwin-driver.c

[Bug tree-optimization/92130] Missed vectorization for iteration dependent loads and simple multiplicative accumulators

2019-10-16 Thread witold.baryluk+gcc at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92130

--- Comment #3 from Witold Baryluk  ---
If only the frequency is updated in the inner loop:

frequency *= 2.131f;

function fill_data is vectorized:

mesh_minimal.c:34:3: optimized: loop vectorized using 64 byte vectors
mesh_minimal.c:33:13: note: vectorized 1 loops in function.


However if amplitude is updated in the inner loop:

amplitude *= 0.781f;

function fill_data is NOT vectorized.

mesh_minimal.c:34:3: missed: couldn't vectorize loop
mesh_minimal.c:34:3: missed: not vectorized: latch block not empty.
mesh_minimal.c:33:13: note: vectorized 0 loops in function.


Here for reference:


/* line 20 */ static float perlin1d(float x) {
  float accum = 0.0;
  float frequency = 1.0;
  float amplitude = 1.0;
  for (int i = 0; i < 8; i++) {
    accum += amplitude * (sinf(x * frequency + (float)i));
    frequency *= 2.131f;
    amplitude *= 0.781f;
  }
  return accum;
}

__attribute__((noinline))
/* line 33 */ static void fill_data(int width, float * __restrict__
height_data, float scale) {
  /* line 34 */ for (int i = 0; i < width; i++) {
    height_data[i] = perlin1d(i);
  }
}

[Bug tree-optimization/92131] New: incorrect assumption that (ao >= 0) is always false

2019-10-16 Thread arigo at tunes dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92131

Bug ID: 92131
   Summary: incorrect assumption that (ao >= 0) is always false
   Product: gcc
   Version: 9.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: arigo at tunes dot org
  Target Milestone: ---

Created attachment 47053
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47053&action=edit
creduce'd C source that miscompiles in -O>=1

The attached reduced snippet miscompiles with "gcc -O1 foo.c -S", or with any
higher optimization level.  It works fine with "gcc -Og foo.c -S".  It also
works fine with "gcc -O1 -fwrapv foo.c -S".  This may sound suspicious, but I'm
rather convinced that it is a bug anyway: -fwrapv should have no effect here,
which should be easy to check because the code contains only a few arithmetic
operations.  Most values involved are loaded from unknown globals, too.
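
As background on why a dependence on -fwrapv looks suspicious, here is a
small sketch of the kind of code where -fwrapv is known to matter.  It is
not the attached reproducer; both function names are made up for
illustration.

```c
#include <limits.h>

/* With signed overflow undefined (the default), a compiler may fold
   this to "return 1"; with -fwrapv it has to honour wrap-around.  */
int
next_is_greater (int x)
{
  return x + 1 > x;
}

/* Overflow-free formulation whose meaning does not depend on -fwrapv.  */
int
next_is_greater_checked (int x)
{
  return x < INT_MAX;
}
```

If the reduced test case contains no arithmetic that can overflow in
this way, then -fwrapv would not be expected to change the generated
code, which is the argument made above.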

The problem is that the compiled code removes the "if (ao >= 0) {..}"
completely.  This is unexpected, and most likely wrong, because there is no
reason why ao should always be negative here.  In fact, we can check that
during the first iteration of the loop it will be equal to 0.

The various apparently unrelated checks that come before the loop cannot be
removed.  This code contains a lot of "goto" because it comes from generated
code (pypy project); attempting to change them into equivalent if-else logic
usually makes the bug disappear too.

Reproducers:

* gcc (GCC) 9.1.0 (arch linux x86-64)

$ gcc -O1 foomin1.c -S
$ cat foomin1.s
...
        call    see_me_here@PLT
.L11:
        call    error@PLT
...

* gcc (Debian 8.3.0-6) 8.3.0, aarch64

$ gcc -O1 foomin1.c -S
$ cat foomin1.s
...
bl  see_me_here
bl  error
...

[Bug tree-optimization/92130] Missed vectorization for iteration dependent loads and simple multiplicative accumulators

2019-10-16 Thread witold.baryluk+gcc at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92130

--- Comment #2 from Witold Baryluk  ---
Added a minimized test case that has only one outer loop, and f and h are
removed for simple inlined replacement.

Example diagnostic:

$ gcc -std=c17 -march=knm -O3 -ffast-math -fassociative-math
-ftree-vectorizer-verbose=2 -fopt-info-vec-all -ggdb -Wall mesh_minimal.c -o
mesh_minimal_knm -lm

mesh_minimal.c:34:3: missed: couldn't vectorize loop
mesh_minimal.c:34:3: missed: not vectorized: latch block not empty.
mesh_minimal.c:33:13: note: vectorized 0 loops in function.

[Bug tree-optimization/92130] Missed vectorization for iteration dependent loads and simple multiplicative accumulators

2019-10-16 Thread witold.baryluk+gcc at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92130

--- Comment #1 from Witold Baryluk  ---
Created attachment 47052
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47052&action=edit
Minimized test case

[Bug tree-optimization/92129] [10 Regression] ICE in vectorizable_reduction, at tree-vect-loop.c:5869

2019-10-16 Thread jakub at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92129

Jakub Jelinek  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-10-16
 CC||jakub at gcc dot gnu.org
   Target Milestone|--- |10.0
 Ever confirmed|0   |1

--- Comment #1 from Jakub Jelinek  ---
Started with r276700.

[PATCH] [og9] Re-do OpenACC private variable resolution

2019-10-16 Thread Julian Brown

This patch (for the openacc-gcc-9-branch) reworks how the partitioning
level for OpenACC "private" variables is calculated and represented in
the compiler. There have been two previous approaches:

 - The first (by Chung-Lin Tang) recorded which variables should be
   made private per-gang in each front end (i.e. separately in C, C++
   and Fortran) using a new attribute "oacc gangprivate". This was deemed
   too early; the final determination about which loops are assigned
   which parallelism level has not yet been made at parse time.

 - The second, last discussed here:

 https://gcc.gnu.org/ml/gcc-patches/2019-06/msg00726.html

   moved the analysis of OpenACC contexts to determine parallelism levels
   to omp-low.c (but kept the "oacc gangprivate" attribute and the NVPTX
   backend parts). However (as mentioned in that mail), this is still
   too early: in fact the final determination of the parallelism level
   for each loop (especially for loops without explicit gang/worker/vector
   clauses) does not happen until we reach the device compiler, in the
   oaccloops pass.

This patch builds on the second approach, but delays fixing the parallelism
level of each "private" variable (those that are addressable, and declared
private using OpenACC clauses or by defining them in a scope nested
within a compute region or partitioned loop) until the oaccdevlow pass.
This is done by adding a new internal UNIQUE function (OACC_PRIVATE)
that lists (the address of) each private variable as an argument.
These new internal functions fit into the existing scheme for demarking
OpenACC loops, as described in comments in the patch.

Use of the "oacc gangprivate" attribute is now restricted to the NVPTX
backend (and could probably be replaced with some lighter-weight mechanism
as a followup).

Tested with offloading to NVPTX and GCN (separately). I will apply to
the openacc-gcc-9-branch shortly.

Thanks,

Julian

ChangeLog

gcc/
* config/gcn/gcn-protos.h (gcn_goacc_adjust_gangprivate_decl): Rename
to...
(gcn_goacc_adjust_private_decl): ...this.
* config/gcn/gcn-tree.c (diagnostic-core.h): Include.
(gcn_goacc_adjust_gangprivate_decl): Rename to...
(gcn_goacc_adjust_private_decl): ...this. Add LEVEL parameter.
* config/gcn/gcn.c (TARGET_GOACC_ADJUST_GANGPRIVATE_DECL): Rename to...
(TARGET_GOACC_ADJUST_PRIVATE_DECL): ...this.
* config/nvptx/nvptx.c (tree-pretty-print.h): Include.
(nvptx_goacc_adjust_private_decl): New function.
(TARGET_GOACC_ADJUST_PRIVATE_DECL): Define hook using above function.
* doc/tm.texi.in (TARGET_GOACC_ADJUST_GANGPRIVATE_DECL): Rename to...
(TARGET_GOACC_ADJUST_PRIVATE_DECL): ...this.
* doc/tm.texi: Regenerated.
* internal-fn.c (expand_UNIQUE): Handle IFN_UNIQUE_OACC_PRIVATE.
* internal-fn.h (IFN_UNIQUE_CODES): Add OACC_PRIVATE.
* omp-low.c (omp_context): Remove oacc_partitioning_levels field.
(lower_oacc_reductions): Add PRIVATE_MARKER parameter.  Insert before
fork.
(lower_oacc_head_tail): Add PRIVATE_MARKER parameter. Modify its
gimple call arguments as appropriate. Don't set
oacc_partitioning_levels in omp_context. Pass private_marker to
lower_oacc_reductions.
(oacc_record_private_var_clauses): Don't check for NULL ctx.
(make_oacc_private_marker): New function.
(lower_omp_for): Only call oacc_record_vars_in_bind for
OpenACC contexts.  Create private marker and pass to
lower_oacc_head_tail.
(lower_omp_target): Remove unnecessary call to
oacc_record_private_var_clauses. Remove call to mark_oacc_gangprivate.
Create private marker and pass to lower_oacc_reductions.
(process_oacc_gangprivate_1): Remove.
(lower_omp_1): Only call oacc_record_vars_in_bind for OpenACC.  Don't
iterate over contexts calling process_oacc_gangprivate_1.
* omp-offload.c (oacc_loop_xform_head_tail): Treat
private-variable markers like fork/join when transforming head/tail
sequences.
(execute_oacc_device_lower): Use IFN_UNIQUE_OACC_PRIVATE instead of
"oacc gangprivate" attributes to determine partitioning level of
variables.
* omp-sese.c (find_gangprivate_vars): New function.
(find_local_vars_to_propagate): Use GANGPRIVATE_VARS parameter instead
of "oacc gangprivate" attribute to determine which variables are
gang-private.
(oacc_do_neutering): Use find_gangprivate_vars.
* target.def (adjust_gangprivate_decl): Rename to...
(adjust_private_decl): ...this.  Update documentation (briefly).

libgomp/
* testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: Use
oaccdevlow dump and update scanned output.
* testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90: Likewise.
Add missing atomic to force worker

[PATCH] [og9] Fix libgomp serial-dims.c test for AMD GCN

2019-10-16 Thread Julian Brown

This patch adds support for AMD GCN offloading to the
libgomp.oacc-c-c++-common/serial-dims.c test case.

I will apply to the og9 branch shortly.

Thanks,

Julian

ChangeLog

libgomp/
* testsuite/libgomp.oacc-c-c++-common/serial-dims.c: Support AMD GCN.
---
 libgomp/testsuite/libgomp.oacc-c-c++-common/serial-dims.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/serial-dims.c 
b/libgomp/testsuite/libgomp.oacc-c-c++-common/serial-dims.c
index 3895405b2cf..e373ebd37b7 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/serial-dims.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/serial-dims.c
@@ -69,6 +69,13 @@ int main ()
  /* The GCC nvptx back end enforces vector_length (32).  */
  vectors_actual = 32;
}
+  else if (acc_on_device (acc_device_gcn))
+   {
+ /* AMD GCN relies on the autovectorizer for the vector dimension:
+the loop below isn't likely to be vectorized, so vectors_actual
+is effectively 1.  */
+ vectors_actual = 1;
+   }
   else if (!acc_on_device (acc_device_host))
__builtin_abort ();
 #pragma acc loop gang \
-- 
2.23.0

[Bug libstdc++/92124] std::vector copy-assigning when it should move-assign.

2019-10-16 Thread redi at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92124

Jonathan Wakely  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2019-10-16
   Assignee|unassigned at gcc dot gnu.org  |redi at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Jonathan Wakely  ---
I'm testing a patch now.

Re: [AArch64][SVE2] Support for EOR3 and variants of BSL

2019-10-16 Thread Segher Boessenkool

Hi,

[ Please don't use application/octet-stream attachments.  Thanks! ]

On Wed, Oct 16, 2019 at 04:24:29PM +, Yuliang Wang wrote:
> +;; Unpredicated bitwise select.
> +(define_insn "*aarch64_sve2_bsl"
> +  [(set (match_operand:SVE_I 0 "register_operand" "=w, ?")
> + (xor:SVE_I
> +   (and:SVE_I
> + (xor:SVE_I
> +   (match_operand:SVE_I 1 "register_operand" ", w")
> +   (match_operand:SVE_I 2 "register_operand" ", w"))
> + (match_operand:SVE_I 3 "register_operand" "w, w"))
> +   (match_dup BSL_3RD)))]

This isn't canonical RTL.  Does combine not simplify this?

Or, rather, it should not be what we canonicalise to: nothing is defined
here.

We normally get something like

Trying 7, 8 -> 9:
7: r127:SI=r130:DI#4^r125:DI#4
  REG_DEAD r130:DI
8: r128:SI=r127:SI&0x20000000
  REG_DEAD r127:SI
9: r126:SI=r128:SI^r125:DI#4
  REG_DEAD r128:SI
  REG_DEAD r125:DI
Successfully matched this instruction:
(set (reg:SI 126)
(ior:SI (and:SI (subreg:SI (reg:DI 130) 4)
(const_int 536870912 [0x20000000]))
(and:SI (subreg:SI (reg/v:DI 125 [ yD.2902+-4 ]) 4)
(const_int -536870913 [0xffffffffdfffffff]))))

If the mask is not a constant, we really shouldn't generate a totally
different form.  The xor-and-xor form is very hard to handle, too.
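
To make the two forms concrete, here is a scalar C sketch showing that
the xor-and-xor expression from the pattern and the ior-of-ands form
that combine produces compute the same bitwise select.  The function
names are made up for illustration.

```c
#include <stdint.h>

/* The form used in the quoted pattern: ((a ^ b) & mask) ^ b.  */
uint32_t
bsl_xor_form (uint32_t a, uint32_t b, uint32_t mask)
{
  return ((a ^ b) & mask) ^ b;
}

/* The form combine matches above: (a & mask) | (b & ~mask).  */
uint32_t
bsl_ior_form (uint32_t a, uint32_t b, uint32_t mask)
{
  return (a & mask) | (b & ~mask);
}
```

For every bit, both select the bit of a where mask is set and the bit of
b where it is clear, so the two functions agree for all inputs.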

Expand currently generates this, because gimple thinks this is simpler.
I think this should be fixed.

Segher

[Bug tree-optimization/92130] New: Missed vectorization for iteration dependent loads and simple multiplicative accumulators

2019-10-16 Thread witold.baryluk+gcc at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92130

Bug ID: 92130
   Summary: Missed vectorization for iteration dependent loads and
simple multiplicative accumulators
   Product: gcc
   Version: 9.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: witold.baryluk+gcc at gmail dot com
  Target Milestone: ---

Created attachment 47051
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47051&action=edit
Perlin2D noise mesh generation

So,

I have a pretty complex multi-level loop spread across many functions.  All
of it can be vectorized, but under certain scenarios gcc 9.2.1 does not
vectorize it.

I am attaching somewhat simplified code with a few defines inside to play
with.

The configuration enabled by default presents the biggest challenge to gcc,
even though I am able to vectorize it manually.

I tested this on SSE2, AVX2 (cascadelake and znver2), AVX512 (-march=knm and
-march=skylake-avx512) and ARM SVE, all with the same result.  I am using
associative math and the other flags mentioned at the top of the source file.

The high level overview is like this:

input: A, F, W, maxO, sufficiently aligned d.

foreach y:
  foreach x:
float v = 0.0
float a = 1.0
float f = 1.0
foreach o in [0, maxO):
  v += a * g(f * x, f * y, o, h(o, p))
  a *= A
  f *= F
d[y*W + x] = v

where both g and h are pure functions (relatively complex, though) with no
control flow or data-dependent flow.

In some situations if a and f are replaced by a precomputed table of
coefficient for every o, and then used as v += a[o] * g(f[o] * x, f[o] * y,
h(o, p)), it does vectorize, but not always. h(o, p) could also be precomputed,
but I didn't bother as it appears to not have any bad effect on vectorizer.
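
The two formulations can be sketched in C as below, reduced to one
dimension.  This is an illustration only: g here is a hypothetical
stand-in for the report's pure function, OCTAVES plays the role of maxO,
and whether GCC vectorizes either form depends on the version and flags
in use.

```c
#define OCTAVES 8

/* Hypothetical stand-in for the report's pure function g(); it is
   branch-free and bounded.  */
static float
g (float t)
{
  return t / (1.0f + t * t);
}

/* Recurrence form from the overview: a and f are carried from one
   iteration of the inner loop to the next.  */
float
octaves_recurrence (float x, float A, float F)
{
  float v = 0.0f, a = 1.0f, f = 1.0f;
  for (int o = 0; o < OCTAVES; o++)
    {
      v += a * g (f * x);
      a *= A;
      f *= F;
    }
  return v;
}

/* Table form: the coefficients for every o are computed first, so the
   accumulation loop only reads a[o] and f[o].  */
float
octaves_table (float x, float A, float F)
{
  float a[OCTAVES], f[OCTAVES];
  a[0] = 1.0f;
  f[0] = 1.0f;
  for (int o = 1; o < OCTAVES; o++)
    {
      a[o] = a[o - 1] * A;
      f[o] = f[o - 1] * F;
    }

  float v = 0.0f;
  for (int o = 0; o < OCTAVES; o++)
    v += a[o] * g (f[o] * x);
  return v;
}
```

In real code the tables would be filled once, outside the loops over x
and y, rather than on every call as in this sketch.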

The vectorizer should vectorize along the 'foreach x' loop and compute
multiple x values, one per lane, completely independently.  It is true that
the updated a and f need to be duplicated in each lane, but that can be done
by computing them in scalar code and then broadcasting, or by repeating the
same constant updates in each lane.

[Bug tree-optimization/92129] New: [10 Regression] ICE in vectorizable_reduction, at tree-vect-loop.c:5869

2019-10-16 Thread asolokha at gmx dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92129

Bug ID: 92129
   Summary: [10 Regression] ICE in vectorizable_reduction, at
tree-vect-loop.c:5869
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Keywords: ice-on-valid-code
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: asolokha at gmx dot com
  Target Milestone: ---

gcc-10.0.0-alpha20191013 snapshot (r276943) ICEs when compiling the following
testcase w/ -O1 -ftree-loop-vectorize -fno-tree-copy-prop:

unsigned int lq;
int aw;

void
m2 (int o9)
{
  while (o9 < 2)
    {
      for (aw = 0; aw < 2; ++aw)
        lq *= aw ? 2 : lq < 2;

      ++o9;
    }
}

% gcc-10.0.0-alpha20191013 -O1 -ftree-loop-vectorize -fno-tree-copy-prop -c
k2t5fs2w.c
during GIMPLE pass: vect
k2t5fs2w.c: In function 'm2':
k2t5fs2w.c:5:1: internal compiler error: in vectorizable_reduction, at
tree-vect-loop.c:5869
5 | m2 (int o9)
  | ^~
0x6b8023 vectorizable_reduction(_stmt_vec_info*, _slp_tree*, _slp_instance*,
vec*)
   
/var/tmp/portage/sys-devel/gcc-10.0.0_alpha20191013/work/gcc-10-20191013/gcc/tree-vect-loop.c:5869
0xe91b61 vect_analyze_loop_operations
   
/var/tmp/portage/sys-devel/gcc-10.0.0_alpha20191013/work/gcc-10-20191013/gcc/tree-vect-loop.c:1561
0xe91b61 vect_analyze_loop_2
   
/var/tmp/portage/sys-devel/gcc-10.0.0_alpha20191013/work/gcc-10-20191013/gcc/tree-vect-loop.c:2033
0xe91b61 vect_analyze_loop(loop*, _loop_vec_info*, vec_info_shared*)
   
/var/tmp/portage/sys-devel/gcc-10.0.0_alpha20191013/work/gcc-10-20191013/gcc/tree-vect-loop.c:2336
0xeaa308 try_vectorize_loop_1
   
/var/tmp/portage/sys-devel/gcc-10.0.0_alpha20191013/work/gcc-10-20191013/gcc/tree-vectorizer.c:887
0xeaad0a vectorize_loops()
   
/var/tmp/portage/sys-devel/gcc-10.0.0_alpha20191013/work/gcc-10-20191013/gcc/tree-vectorizer.c:1107

[Bug c/92117] Warning from -Wbad-function-cast when casting from bool to int

2019-10-16 Thread egallager at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92117

Eric Gallager  changed:

   What|Removed |Added

   Keywords||diagnostic
 CC||egallager at gcc dot gnu.org

--- Comment #1 from Eric Gallager  ---
-Wbad-function-cast is a strange legacy warning that is more about GCC's
internal implementation details (of TREE_CODEs) than anything in user code:
https://gcc.gnu.org/ml/gcc-patches/2017-10/msg00250.html
IOW I wouldn't worry too much about it. But I guess if someone really wants to
change it, they could...
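
A minimal sketch of the situation in the bug title follows.  The
function names are hypothetical, and the second variant is an assumed
workaround based on the warning applying to a cast of a call expression.

```c
#include <stdbool.h>

bool
is_ready (void)
{
  return true;
}

/* The cast is applied directly to the call; per the bug title,
   -Wbad-function-cast warns here.  */
int
ready_direct (void)
{
  return (int) is_ready ();
}

/* Assumed workaround: store the result first, so the cast no longer
   applies to a call expression.  */
int
ready_via_temporary (void)
{
  bool ready = is_ready ();
  return (int) ready;
}
```

Both functions return the same value; only the form of the cast differs.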

[Bug libfortran/92100] Formatted stream IO irreproducible read with binary data in file

2019-10-16 Thread kargl at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92100

--- Comment #4 from kargl at gcc dot gnu.org ---
Here's a self-contained program.  AFAICT, pos= has an effect only within the
first record, and thereafter it is ignored.

F2018, 12.3.3.4(4) gives a bulleted list of things that can cause problems for
a user.  My unsolicited advice: Don't mix stream access and formatted output.
:(

[Bug libfortran/92100] Formatted stream IO irreproducible read with binary data in file

2019-10-16 Thread kargl at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92100

--- Comment #3 from kargl at gcc dot gnu.org ---
Created attachment 47050
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47050&action=edit
self-contained program

Re: [ C++ ] [ PATCH ] [ RFC ] p1301 - [[nodiscard("should have a reason")]]

2019-10-16 Thread JeanHeyd Meneide

Dear Dave,

 Thanks for sharing all of that! It was very helpful to read it
over again, and it was helpful in IRC yesterday.

 As a bit of a "that was strange" moment, I ran the builds again
and did NOT do --disable-bootstrap with the patch on a different
machine. They built and ran fine, with no segfaults during calls to
error(...). I probably forgot to properly clean/build something in my
debugging environment, which caused most of the problems.

 I already assigned my Copyright to the FSF, so that part should
be fine! Is there anything else I should be doing?

Sincerely,
JeanHeyd

On Wed, Oct 16, 2019 at 8:51 AM David Malcolm  wrote:
>
> On Tue, 2019-10-15 at 20:31 -0400, JeanHeyd Meneide wrote:
> > Attached is a patch for p1301 that improves in the way Jason Merrill
> > specified earlier
> > (https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00858.html), but it
> > keeps segfaulting on my build of GCC. I don't know what changes I've
> > made that cause it to segfault: it does so whenever the error()
> > function is called but the backtraces aren't showing me anything
> > conclusive.
> >
> > The tests test what I expect them to and the output is fine, I just
> > can't get the segfaults to stop, so I'm putting the patch up so
> > someone can critique what I've written, or someone else to test it
> > too. Sorry for taking so long.
> >
> > Thanks,
> > JeanHeyd
>
> Thanks for posting these patches.
>
> I believe you're a new GCC contributor - welcome!
>
> Having said "welcome", for good or ill, GCC has some legalese that we
> have to follow - to get a patch accepted, see the "Legal Prerequisites"
> section of: https://gcc.gnu.org/contribute.html so you'll need to
> follow that at some point  (I hate having to say this).
>
> I got the impression on IRC yesterday that you were having some trouble
> getting your debugging environment set up - it sounded to me like
> you've been "making do" by adding print statements to the parser and
> recompiling each time.
>
> If that's the case, then the best next step is probably to figure out
> how to get you debugging cc1plus using a debugger.
>
> Normally, I run:
>   ./xgcc -B.
> and add  "-wrapper gdb,--args" to the command line so that the driver
> invokes gdb.
>
> Another way is to invoke gdb and have it directly invoke cc1plus:
>gdb --args ./cc1plus -quiet x.C
>
> In both cases, run it from /gcc
>
> See also:
> https://dmalcolm.fedorapeople.org/gcc/newbies-guide/debugging.html
> which has further hints and info on a "nicer" debugging experience.
>
> You mentioned that you normally use gdbserver, which makes me think
> that you're most comfortable using gdb from an IDE.  I'm not very
> familiar with gdbserver, but hopefully someone here knows the recipe
> for this that will let you comfortably debug the frontend - without
> needing to add print statements and recompile each time (ugh).
>
> Note that for hacking on the frontend and debugging, I normally
> configure gcc with --disable-bootstrap which saves a *lot* of time;
> similarly I also only build the "gcc" subdirectory and its dependencies
> (I still do a full bootstrap and regression test in a separate
> directory once I've got a candidate patch that I want to test properly
> before submitting to the mailing list).
>
> [I know that you know some/all of this, but I'm posting it here for the
> benefit of people reading the list archives]
>
> Hope this is helpful, and welcome again.
> Dave
>
>

Re: general_operand not validating "(const_int 65535 [0xffff])"

2019-10-16 Thread Jozef Lawrynowicz

On Wed, 16 Oct 2019 19:02:17 +0200
Jakub Jelinek  wrote:

> On Wed, Oct 16, 2019 at 05:51:11PM +0100, Jozef Lawrynowicz wrote:
> > We call expand_expr_real_2 from expand_mul_overflow (internal-fn.c:1604).
> > When we process the arguments to:
> >   __builtin_umul_overflow ((unsigned int) (-1), y, );
> > at expr.c:8952, they go through a few transformations.
> > 
> > First we generate the rtx for ((unsigned int) -1) in the HImode context 
> > (msp430
> > has 16-bit int), which generates (const_int -1). OK.
> > Then it gets widened in a SImode context, but since it is unsigned, we zero
> > extend and the rtx becomes (const_int 65535). OK.
> > When we call expand_mult_highpart_adjust, we are back in HImode, but using
> > operands which have been widened in a SImode context. This is when we
> > generate our problematic insns using (const_int 65535) with HImode
> > operands.  
> 
> So, what exactly calls expand_mult_highpart_adjust, with what exact
> arguments (I see 3 callers).
> E.g. the one in expr.c already has:
>   if (TREE_CODE (treeop1) == INTEGER_CST)
> op1 = convert_modes (mode, word_mode, op1,
>  TYPE_UNSIGNED (TREE_TYPE (treeop1)));
> and should thus take care of op1.  It doesn't have the same for op0, assumes
> that if only one operand is INTEGER_CST, it must be the (canonical) second
> one.  So perhaps the bug is that something doesn't canonicalize the order of
> arguments?
> 
>   Jakub

That convert_modes call is actually what changes op1 from (const_int -1) to
(const_int 65535). In expand_expr_real_2, mode is SImode and word_mode is HImode
for that call.
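
For context on why (const_int 65535) is a problem in HImode: GCC keeps
CONST_INT values sign-extended from the precision of the mode they are
used in, so the all-ones 16-bit pattern is expected as (const_int -1).
The sketch below shows that canonicalization on plain integers; the
helper name is hypothetical and this is not GCC's own code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: sign-extend the low WIDTH bits of VALUE, which is
   how a CONST_INT is canonically stored for a mode of that precision.  */
int64_t
canonical_const_for_width (int64_t value, unsigned width)
{
  assert (width >= 1 && width < 64);

  uint64_t mask = (UINT64_C (1) << width) - 1;
  uint64_t low = (uint64_t) value & mask;
  uint64_t sign = UINT64_C (1) << (width - 1);

  if (low & sign)
    /* Negative in WIDTH bits: subtract 2**WIDTH without overflowing.  */
    return (int64_t) low - (int64_t) mask - 1;
  return (int64_t) low;
}
```

With a width of 16 the value 65535 maps to -1, while with a width of 32
it stays 65535, which matches the HImode versus SImode difference
described above.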

Info from GDB:
> Breakpoint 2, expand_mult_highpart_adjust ( 
> unsignedp=unsignedp@entry=1) at ../../gcc/expmed.c:3747
> op0 == (reg:HI 23 [ y.0_1 ])
> op1 == (const_int 65535 [0xffff])
> mode == {m_mode = E_HImode}
> (gdb) bt
> #0  expand_mult_highpart_adjust ( unsignedp=unsignedp@entry=1) at 
> ../../gcc/expmed.c:3747
> #1  0x0085ee18 in expand_expr_real_2 ( 
> tmode=tmode@entry=E_SImode, modifier=modifier@entry=EXPAND_NORMAL) at 
> ../../gcc/expr.c:8963
> #2  0x0098d01d in expand_mul_overflow () at 
> ../../gcc/internal-fn.c:1604
> #3  0x0098fe2d in expand_arith_overflow (code=MULT_EXPR, 
> stmt=) at ../../gcc/internal-fn.c:2362

from expr.c:8946:
if (find_widening_optab_handler (other_optab, mode, innermode)
!= CODE_FOR_nothing
&& innermode == word_mode)
  {
rtx htem, hipart;
op0 = expand_normal (treeop0);
* Below generates (const_int -1) ***
op1 = expand_normal (treeop1);
/* op0 and op1 might be constants, despite the above
   != INTEGER_CST check.  Handle it.  */
if (GET_MODE (op0) == VOIDmode && GET_MODE (op1) == VOIDmode)
  goto widen_mult_const;
if (TREE_CODE (treeop1) == INTEGER_CST)
* Below generates (const_int 65535) **
  op1 = convert_modes (mode, word_mode, op1,
   TYPE_UNSIGNED (TREE_TYPE (treeop1)));
temp = expand_binop (mode, other_optab, op0, op1, target,
 unsignedp, OPTAB_LIB_WIDEN);
hipart = gen_highpart (word_mode, temp);
htem = expand_mult_highpart_adjust (word_mode, hipart,
op0, op1, hipart,
zextend_p);

Maybe the constants should be canonicalized before calling
expand_mult_highpart_adjust? I'm not sure at the moment.

The patch below is an alternative I quickly tried that also fixes the issue,
but I haven't tested it and it's not clear if op0 should also be converted.

diff --git a/gcc/expmed.c b/gcc/expmed.c
index f1975fe33fe..25d8edde02e 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -3748,6 +3748,8 @@ expand_mult_highpart_adjust (scalar_int_mode mode, rtx adj_operand, rtx op0,
   rtx tem;
   enum rtx_code adj_code = unsignedp ? PLUS : MINUS;
 
+  op1 = convert_modes (mode, GET_MODE (XEXP (adj_operand, 0)), op1, unsignedp);
+
   tem = expand_shift (RSHIFT_EXPR, mode, op0,
  GET_MODE_BITSIZE (mode) - 1, NULL_RTX, 0);
   tem = expand_and (mode, tem, op1, NULL_RTX);


Thanks,
Jozef

[Bug tree-optimization/63945] Missing vectorization optimization

2019-10-16 Thread witold.baryluk+gcc at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63945

Witold Baryluk  changed:

   What|Removed |Added

 CC||witold.baryluk+gcc at gmail dot com

--- Comment #1 from Witold Baryluk  ---
It does vectorize for me on gcc 9.2.1:

-march=skylake-avx512

aa.cpp:34:29: optimized: loop vectorized using 32 byte vectors
aa.cpp:25:27: optimized: loop vectorized using 32 byte vectors


  if (val<100.)
1279:   c5 fb 10 0b vmovsd (%rbx),%xmm1
127d:   c5 fb 10 05 8b 0d 00vmovsd 0xd8b(%rip),%xmm0# 2010
<_IO_stdin_used+0x10>
1284:   00 
1285:   c5 f9 2f c1 vcomisd %xmm1,%xmm0
1289:   76 2b   jbe12b6 <_ZN4TEST4testEv+0xc6>
128b:   c4 e2 7d 19 c9  vbroadcastsd %xmm1,%ymm1
1290:   31 c0   xor%eax,%eax
1292:   66 0f 1f 44 00 00   nopw   0x0(%rax,%rax,1)
  c[i] = val*a[i]+b[i];
1298:   c4 c1 7d 10 04 04   vmovupd (%r12,%rax,1),%ymm0
129e:   c4 c2 f5 a8 44 05 00vfmadd213pd
0x0(%r13,%rax,1),%ymm1,%ymm0
12a5:   c5 fd 11 04 07  vmovupd %ymm0,(%rdi,%rax,1)
for (unsigned int i=0; i
::operator delete(__p);
12b6:   c5 f8 77vzeroupper 


Similarly:

-march=knm

aa.cpp:34:29: optimized: loop vectorized using 64 byte vectors
aa.cpp:25:27: optimized: loop vectorized using 64 byte vectors

  if (val<100.)
15bc:   31 c0   xor%eax,%eax
15be:   66 90   xchg   %ax,%ax
  c[i] = val*a[i]+b[i];
15c0:   62 f1 fd 48 28 04 01vmovapd (%rcx,%rax,1),%zmm0
15c7:   62 f2 ed 48 a8 04 06vfmadd213pd (%rsi,%rax,1),%zmm2,%zmm0
15ce:   62 d1 fd 48 11 04 01vmovupd %zmm0,(%r9,%rax,1)
for (unsigned int i=0; i

(plus a lot of handling for unaligned stack).

-march=znver2

aa.cpp:34:29: optimized: loop vectorized using 32 byte vectors
aa.cpp:25:27: optimized: loop vectorized using 32 byte vectors

  if (val<100.)
1279:   c5 fb 10 0b vmovsd (%rbx),%xmm1
127d:   c5 fb 10 05 8b 0d 00vmovsd 0xd8b(%rip),%xmm0# 2010
<_IO_stdin_used+0x10>
1284:   00 
1285:   c5 f9 2f c1 vcomisd %xmm1,%xmm0
1289:   76 33   jbe12be <_ZN4TEST4testEv+0xce>
128b:   c4 e2 7d 19 c9  vbroadcastsd %xmm1,%ymm1
1290:   31 c0   xor%eax,%eax
1292:   66 66 2e 0f 1f 84 00data16 nopw %cs:0x0(%rax,%rax,1)
1299:   00 00 00 00 
129d:   0f 1f 00nopl   (%rax)
  c[i] = val*a[i]+b[i];
12a0:   c4 c1 7d 10 04 04   vmovupd (%r12,%rax,1),%ymm0
12a6:   c4 c2 f5 a8 44 05 00vfmadd213pd
0x0(%r13,%rax,1),%ymm1,%ymm0
12ad:   c5 fd 11 04 07  vmovupd %ymm0,(%rdi,%rax,1)
for (unsigned int i=0; i

-march=core2

aa.cpp:34:29: optimized: loop vectorized using 16 byte vectors
aa.cpp:25:27: optimized: loop vectorized using 16 byte vectors

  if (val<100.)
1276:   f2 0f 10 13 movsd  (%rbx),%xmm2
127a:   f2 0f 10 05 8e 0d 00movsd  0xd8e(%rip),%xmm0# 2010
<_IO_stdin_used+0x10>
1281:   00 
1282:   66 0f 2f c2 comisd %xmm2,%xmm0
1286:   76 40   jbe12c8 <_ZN4TEST4testEv+0xd8>
1288:   31 c0   xor%eax,%eax
128a:   66 0f 14 d2 unpcklpd %xmm2,%xmm2
128e:   66 90   xchg   %ax,%ax
  c[i] = val*a[i]+b[i];
1290:   f3 0f 7e 44 05 00   movq   0x0(%rbp,%rax,1),%xmm0
1296:   f3 41 0f 7e 0c 04   movq   (%r12,%rax,1),%xmm1
129c:   66 0f 16 44 05 08   movhpd 0x8(%rbp,%rax,1),%xmm0
12a2:   66 0f 59 c2 mulpd  %xmm2,%xmm0
12a6:   66 41 0f 16 4c 04 08movhpd 0x8(%r12,%rax,1),%xmm1
12ad:   66 0f 58 c1 addpd  %xmm1,%xmm0
12b1:   66 0f 13 04 07  movlpd %xmm0,(%rdi,%rax,1)
12b6:   66 0f 17 44 07 08   movhpd %xmm0,0x8(%rdi,%rax,1)
for (unsigned int i=0; i



Looks all pretty optimally vectorized to me.

The code could be made even better if you ensure proper alignment of the
std::vector arrays, which they might not have at the moment.
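
As a rough illustration of the alignment point, here is a minimal sketch in
plain C rather than the reporter's C++ (which would need a custom allocator
for std::vector).  The names VEC_ALIGN, alloc_aligned_doubles and scaled_add
are made up for this example, and the 64-byte figure is an assumption chosen
to cover the widest vectors shown above.  aligned_alloc is C11 and
__builtin_assume_aligned is a GCC builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define VEC_ALIGN 64  /* assumption: covers 64-byte (zmm) vectors */

/* Allocate N doubles aligned to VEC_ALIGN bytes; release with free ().
   aligned_alloc wants the size to be a multiple of the alignment, so
   the byte count is rounded up.  */
double *
alloc_aligned_doubles (size_t n)
{
  assert (n <= (SIZE_MAX - VEC_ALIGN) / sizeof (double));
  size_t bytes = n * sizeof (double);
  size_t rounded = (bytes + VEC_ALIGN - 1) / VEC_ALIGN * VEC_ALIGN;
  if (rounded == 0)
    rounded = VEC_ALIGN;
  return aligned_alloc (VEC_ALIGN, rounded);
}

/* Same loop body as in the report: c[i] = val * a[i] + b[i].  The
   builtin tells the compiler the pointers are VEC_ALIGN-aligned, which
   is only true if they came from alloc_aligned_doubles.  */
void
scaled_add (double *c, const double *a, const double *b, double val,
            size_t n)
{
  double *pc = __builtin_assume_aligned (c, VEC_ALIGN);
  const double *pa = __builtin_assume_aligned (a, VEC_ALIGN);
  const double *pb = __builtin_assume_aligned (b, VEC_ALIGN);

  for (size_t i = 0; i < n; i++)
    pc[i] = val * pa[i] + pb[i];
}
```

Whether this actually turns the vmovupd loads into aligned ones depends on
the target and GCC version; I have not checked the generated code.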

[PATCH] Communicate lto-wrapper and ld through a file

2019-10-16 Thread Giuliano Belinassi

Hi,

Currently, lto-wrapper communicates with ld through a pipe from
lto-wrapper's stdout to ld's stdin. This patch uses a temporary file for
this communication instead, freeing stdout to be used for debugging.
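
The option handling this relies on can be sketched on its own as below.  This
is an illustrative variation, not the code in the patch: option_value and
take_option are hypothetical names, and unlike the patch (which stores NULL
in the matching argv slot) the sketch shifts the remaining arguments down so
that argv stays dense.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* If ARG starts with PREFIX (e.g. "--to_ld_list="), return a pointer to
   the text following the prefix; otherwise return NULL.  */
const char *
option_value (const char *arg, const char *prefix)
{
  size_t len = strlen (prefix);

  if (arg != NULL && strncmp (arg, prefix, len) == 0)
    return arg + len;
  return NULL;
}

/* Scan ARGV[1..*ARGC-1] for the first argument starting with PREFIX,
   remove it by shifting the later arguments down, and return its value.
   Return NULL and leave ARGV untouched when there is no such argument.  */
const char *
take_option (int *argc, char *argv[], const char *prefix)
{
  assert (argc != NULL && argv != NULL && prefix != NULL);

  for (int i = 1; i < *argc; i++)
    {
      const char *value = option_value (argv[i], prefix);
      if (value != NULL)
        {
          memmove (&argv[i], &argv[i + 1],
                   (size_t) (*argc - i - 1) * sizeof argv[0]);
          (*argc)--;
          argv[*argc] = NULL;
          return value;
        }
    }
  return NULL;
}
```

The returned pointer refers into the original argument string, so it stays
valid for as long as that string does.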

I've run the full testsuite and an LTO bootstrap on x86_64 Linux, and found
no issues so far. Do I need to write a testcase for this feature?

Giuliano.

gcc/ChangeLog
2019-10-16  Giuliano Belinassi  

* lto-wrapper.c (STATIC_LEN): New macro.
(to_ld): New.
(find_crtoffloadtable): Print to file to_ld.
(find_ld_list_file): New.
(main): Check if to_ld is valid or is stdout.

gcc/libiberty
2019-10-16  Giuliano Belinassi  

* pex-unix.c (pex_unix_exec_child): Check for PEX_KEEP_STD_IO flag.
(to_ld): New.

gcc/include
2019-10-16  Giuliano Belinassi  

* libiberty.h (PEX_KEEP_STD_IO): New macro.

gcc/lto-plugin
2019-10-16  Giuliano Belinassi  

* lto-plugin.c (exec_lto_wrapper): Replace pipe from stdout to temporary
file, and pass its name in argv.

diff --git gcc/lto-wrapper.c gcc/lto-wrapper.c
index 9a7bbd0c022..d794b493d5d 100644
--- gcc/lto-wrapper.c
+++ gcc/lto-wrapper.c
@@ -49,6 +49,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "lto-section-names.h"
 #include "collect-utils.h"
 
+#define STATIC_LEN(x) (sizeof (x) / sizeof (*x))
+
 /* Environment variable, used for passing the names of offload targets from GCC
driver to lto-wrapper.  */
 #define OFFLOAD_TARGET_NAMES_ENV	"OFFLOAD_TARGET_NAMES"
@@ -74,6 +76,8 @@ static char *makefile;
 static unsigned int num_deb_objs;
 static const char **early_debug_object_names;
 
+static FILE *to_ld;
+
 const char tool_name[] = "lto-wrapper";
 
 /* Delete tempfiles.  Called from utils_cleanup.  */
@@ -955,7 +959,7 @@ find_crtoffloadtable (void)
 	/* The linker will delete the filename we give it, so make a copy.  */
 	char *crtoffloadtable = make_temp_file (".crtoffloadtable.o");
 	copy_file (crtoffloadtable, paths[i]);
-	printf ("%s\n", crtoffloadtable);
+	fprintf (to_ld, "%s\n", crtoffloadtable);
 	XDELETEVEC (crtoffloadtable);
 	break;
   }
@@ -1556,7 +1560,7 @@ cont1:
 	{
 	  find_crtoffloadtable ();
 	  for (i = 0; offload_names[i]; i++)
-	printf ("%s\n", offload_names[i]);
+	fprintf (to_ld, "%s\n", offload_names[i]);
 	  free_array_of_ptrs ((void **) offload_names, i);
 	}
 }
@@ -1686,12 +1690,12 @@ cont1:
 
   if (lto_mode == LTO_MODE_LTO)
 {
-  printf ("%s\n", flto_out);
+  fprintf (to_ld, "%s\n", flto_out);
   if (!skip_debug)
 	{
 	  for (i = 0; i < ltoobj_argc; ++i)
 	if (early_debug_object_names[i] != NULL)
-	  printf ("%s\n", early_debug_object_names[i]);	  
+	  fprintf (to_ld, "%s\n", early_debug_object_names[i]);
 	}
   /* These now belong to collect2.  */
   free (flto_out);
@@ -1866,15 +1870,15 @@ cont:
 	}
   for (i = 0; i < nr; ++i)
 	{
-	  fputs (output_names[i], stdout);
-	  putc ('\n', stdout);
+	  fputs (output_names[i], to_ld);
+	  putc ('\n', to_ld);
 	  free (input_names[i]);
 	}
   if (!skip_debug)
 	{
 	  for (i = 0; i < ltoobj_argc; ++i)
 	if (early_debug_object_names[i] != NULL)
-	  printf ("%s\n", early_debug_object_names[i]);	  
+	  fprintf (to_ld, "%s\n", early_debug_object_names[i]);
 	}
   nr = 0;
   free (ltrans_priorities);
@@ -1892,13 +1896,43 @@ cont:
   obstack_free (&argv_obstack, NULL);
 }
 
+/* Find out if lto-wrapper was called and output to lto-plugin will be in a file
+   instead of stdout.  */
+
+static const char *find_ld_list_file (int* argc, char *argv[])
+{
+  int i;
+  static const char param[] = "--to_ld_list=";
+
+  for (i = 1; i < *argc; i++)
+{
+  if (argv[i] != NULL && !strncmp (argv[i], param, STATIC_LEN (param) - 1))
+	{
+	  /* Found.  Retrieve the path to the file.  */
+
+	  /* This simple 'automaton' just finds the first '=' of the string
+	 and returns a pointer to the text that follows it.  */
+	  const char *path = argv[i];
+	  argv[i] = NULL;
+	  (*argc)--;
+	  while (*path != '\0')
+	if (*path++ == '=')
+		return path;
+	}
+}
+
+
+  /* Not found.  */
+  return NULL;
+}
+
 
 /* Entry point.  */
 
 int
 main (int argc, char *argv[])
 {
-  const char *p;
+  const char *p, *files_list;
 
   init_opts_obstack ();
 
@@ -1934,11 +1968,22 @@ main (int argc, char *argv[])
   signal (SIGCHLD, SIG_DFL);
 #endif
 
+  files_list = find_ld_list_file (&argc, argv);
+  to_ld = files_list ? fopen (files_list, "a") : stdout;
+
+  if (!to_ld)
+{
+  fprintf (stderr, "%s: failed to open parameter file.\n", argv[0]);
+  abort ();
+}
+
   /* We may be called with all the arguments stored in some file and
  passed with @file.  Expand them into argv before processing.  */
   expandargv (&argc, &argv);
 
   run_gcc (argc, argv);
 
+  fclose (to_ld);
+
   return 0;
 }
diff --git include/libiberty.h include/libiberty.h
index 71192a29377..cc3940f91f3 100644
--- include/libiberty.h
+++ include/libiberty.h
@@ -406,6 +406,9 @@ extern void hex_init (void);
 /* Save files

[Bug tree-optimization/83819] [meta-bug] missing strlen optimizations

2019-10-16 Thread msebor at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83819
Bug 83819 depends on bug 91996, which changed state.

Bug 91996 Summary: fold non-constant strlen relational expressions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91996

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/91996] fold non-constant strlen relational expressions

2019-10-16 Thread msebor at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91996

Martin Sebor  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=92128
 Resolution|--- |FIXED

--- Comment #4 from Martin Sebor  ---
Patch committed in 277076.  Bug 92128 tracks some of the remaining folding
opportunities in this area as well as the limited scope of one of the added
tests (strlenopt-80.c).

[Bug tree-optimization/92128] fold more non-constant strlen relational expressions

2019-10-16 Thread msebor at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92128

Martin Sebor  changed:

   What|Removed |Added

 Blocks||83819
   Severity|normal  |enhancement


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83819
[Bug 83819] [meta-bug] missing strlen optimizations

[Bug tree-optimization/92128] New: fold more non-constant strlen relational expressions

2019-10-16 Thread msebor at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92128

Bug ID: 92128
   Summary: fold more non-constant strlen relational expressions
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: msebor at gcc dot gnu.org
  Target Milestone: ---

With pr91996 resolved, GCC can make use of strlen range information in some
contexts but not in others.  The test for pr91996 is only enabled on a subset
of targets where GCC is known to be able to do this, but even for those it is
specially crafted to exercise only the contexts where the optimization was
implemented.  On other targets (e.g., arm-*-*), or even on others such as
aarch64-*-* where the test passes, other similar test cases fail.  This is
because the solution for pr91996 was only put in place for multi-byte
assignments via MEM_REF but not also for calls to memcpy.
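
The fold is valid because of a simple invariant: if the source has at least
5 leading non-NUL bytes and at least 5 bytes are copied, the destination has
them too, whatever the copy size.  A minimal sketch that checks this at run
time (copy_then_strlen is a hypothetical helper, not part of the test suite):

```c
#include <assert.h>
#include <string.h>

/* Copy N bytes from S to D and return strlen (D).  The caller must make
   sure both buffers hold at least N bytes and that D ends up
   NUL-terminated.  */
size_t
copy_then_strlen (char *d, const char *s, size_t n)
{
  /* Preconditions mirrored from the test case below: S starts with at
     least 5 non-NUL bytes, and at least that many bytes are copied.  */
  assert (strlen (s) >= 5 && n >= 5);
  memcpy (d, s, n);
  /* Under those preconditions the result is always >= 5, for n == 16
     and n == 32 alike.  */
  return strlen (d);
}
```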

$ cat c.c && gcc -O2 -S -Wall -fdump-tree-optimized=/dev/stdout c.c
typedef __SIZE_TYPE__ size_t;

void f (char *d, const char *s)
{
  if (__builtin_strlen (s) < 5) return;

  __builtin_memcpy (d, s, 16); // when size is a small power of two

  size_t n1 = __builtin_strlen (d);
  if (n1 < 5)  // this is folded to false
__builtin_abort ();
}

void g (char *d, const char *s)
{
  if (__builtin_strlen (s) < 5) return;

  __builtin_memcpy (d, s, 32); // with larger power of two or non-power-of-two sizes

  size_t n1 = __builtin_strlen (d);
  if (n1 < 5)  // this is not folded
__builtin_abort ();
}


;; Function f (f, funcdef_no=0, decl_uid=1932, cgraph_uid=1, symbol_order=0)

Removing basic block 5
f (char * d, const char * s)
{
  long unsigned int _1;
  __int128 unsigned _5;

   [local count: 1073741824]:
  _1 = __builtin_strlen (s_4(D));
  if (_1 <= 4)
goto ; [34.00%]
  else
goto ; [66.00%]

   [local count: 708669605]:
  _5 = MEM <__int128 unsigned> [(char * {ref-all})s_4(D)];
  MEM <__int128 unsigned> [(char * {ref-all})d_6(D)] = _5;

   [local count: 1073741829]:
  return;

}



;; Function g (g, funcdef_no=1, decl_uid=1937, cgraph_uid=2, symbol_order=1)

Removing basic block 6
Removing basic block 7
g (char * d, const char * s)
{
  size_t n1;
  long unsigned int _1;

   [local count: 1073741824]:
  _1 = __builtin_strlen (s_4(D));
  if (_1 <= 4)
goto ; [51.12%]
  else
goto ; [48.88%]

   [local count: 524845004]:
  __builtin_memcpy (d_5(D), s_4(D), 32);
  n1_7 = __builtin_strlen (d_5(D));
  if (n1_7 <= 4)
goto ; [0.00%]
  else
goto ; [100.00%]

   [count: 0]:
  __builtin_abort ();

   [local count: 1073741828]:
  return;

}

[Bug tree-optimization/91996] fold non-constant strlen relational expressions

2019-10-16 Thread msebor at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91996

--- Comment #3 from Martin Sebor  ---
Author: msebor
Date: Wed Oct 16 17:18:57 2019
New Revision: 277076

URL: https://gcc.gnu.org/viewcvs?rev=277076&root=gcc&view=rev
Log:
PR tree-optimization/91996 - fold non-constant strlen relational expressions

gcc/testsuite/ChangeLog:

PR tree-optimization/91996
* gcc.dg/strlenopt-80.c: New test.
* gcc.dg/strlenopt-81.c: New test.

gcc/ChangeLog:

PR tree-optimization/91996
* tree-ssa-strlen.c (maybe_warn_pointless_strcmp): Improve location
information.
(compare_nonzero_chars): Add an overload.
(count_nonzero_bytes): Add an argument.  Call overload above.
Handle non-constant lengths in some range.
(handle_store): Add an argument.
(check_and_optimize_stmt): Pass an argument to handle_store.


Added:
trunk/gcc/testsuite/gcc.dg/strlenopt-80.c
trunk/gcc/testsuite/gcc.dg/strlenopt-81.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-ssa-strlen.c

[Bug fortran/92114] equivalence in module causes ICE

2019-10-16 Thread sgk at troutmask dot apl.washington.edu

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92114

--- Comment #2 from Steve Kargl  ---
On Wed, Oct 16, 2019 at 05:02:50PM +0000, kargl at gcc dot gnu.org wrote:
> > 
> >  GNU Fortran (GCC) 7.4.0

This was released in Dec 2018, so ...

> This may have been fixed by
> 
> r242802 | kargl | 2016-11-23 13:44:05 -0800 (Wed, 23 Nov 2016) | 10 lines
> 

..., this is certainly in the version of gfortran you have.

[Bug fortran/92114] equivalence in module causes ICE

2019-10-16 Thread kargl at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92114

kargl at gcc dot gnu.org changed:

   What|Removed |Added

 CC||kargl at gcc dot gnu.org

--- Comment #1 from kargl at gcc dot gnu.org ---
(In reply to urbanjost from comment #0)
> module minimal
> implicit none
> logical,private :: help,h,version,v
> equivalence (help,h),(version,v)
> end module minimal
> 
>  GNU Fortran (GCC) 7.4.0
>  gfortran bug.f90
>  f951: internal compiler error: Segmentation fault
> 
>  might be similar to bug 77381, but error message is different.
>  
>  The only restriction in the standard that seemed to bear on this is
> 
>   C593 (R567) The name of an equivalence-object shall not be a name made
> accessible by use association.
> 
>  but all the variables are PRIVATE. It failed with all variable types I
> tried, not just LOGICAL.

C593 applies to a use-associated variable.  Nothing is use-associated
in your example.  C593 covers

% cat a.f90
module m
  integer num
end module m

program foo
   use m
   integer k
   equivalence(k,num)
   num = 1
   print *, k
end program foo
% gfcx -c a.f90
a.f90:8:20:

8 |equivalence(k,num)
  |1
Error: EQUIVALENCE attribute conflicts with USE ASSOCIATED attribute in 'num'
at (1)


For me, the code you posted compiles with 
GNU Fortran (GCC) 10.0.0 20191014 (experimental)
GNU Fortran (GCC) 9.2.1 20191011
GNU Fortran (GCC) 8.3.1 20191010
GNU Fortran (GCC) 7.4.1 20191010
GNU Fortran (GCC) 6.5.0

This may have been fixed by

r242802 | kargl | 2016-11-23 13:44:05 -0800 (Wed, 23 Nov 2016) | 10 lines

2016-11-23  Steven G. Kargl  

PR fortran/78297
* trans-common.c (finish_equivalences): Do not dereference a NULL 
pointer.

2016-11-23  Steven G. Kargl  

PR fortran/78297
* gfortran.dg/pr78297.f90: New test.

Re: general_operand not validating "(const_int 65535 [0xffff])"

2019-10-16 Thread Jakub Jelinek

On Wed, Oct 16, 2019 at 05:51:11PM +0100, Jozef Lawrynowicz wrote:
> We call expand_expr_real_2 from expand_mul_overflow (internal-fn.c:1604).
> When we process the arguments to:
>   __builtin_umul_overflow ((unsigned int) (-1), y, &r);
> at expr.c:8952, they go through a few transformations.
> 
> First we generate the rtx for ((unsigned int) -1) in the HImode context 
> (msp430
> has 16-bit int), which generates (const_int -1). OK.
> Then it gets widened in a SImode context, but since it is unsigned, we zero
> extend and the rtx becomes (const_int 65535). OK.
> When we call expand_mult_highpart_adjust, we are back in HImode, but using
> operands which have been widened in a SImode context. This is when we
> generate our problematic insns using (const_int 65535) with HImode
> operands.

So, what exactly calls expand_mult_highpart_adjust, with what exact
arguments (I see 3 callers).
E.g. the one in expr.c already has:
  if (TREE_CODE (treeop1) == INTEGER_CST)
op1 = convert_modes (mode, word_mode, op1,
 TYPE_UNSIGNED (TREE_TYPE (treeop1)));
and should thus take care of op1.  It doesn't have the same for op0, assumes
that if only one operand is INTEGER_CST, it must be the (canonical) second
one.  So perhaps the bug is that something doesn't canonicalize the order of
arguments?

Jakub

Re: [gomp4.1] Start of structure element mapping support

2019-10-16 Thread Jakub Jelinek

On Wed, Oct 16, 2019 at 03:22:52PM +0200, Thomas Schwinge wrote:
> Stumbled over this while reviewing Julian's "Factor out duplicate code in
> gimplify_scan_omp_clauses":

> ..., which here gets written to...
> 
> > +   if (base != decl)
> > + break;
> > +   gcc_assert (offset == NULL_TREE
> > +   || TREE_CODE (offset) == INTEGER_CST);
> 
> ..., but here we again check 'offset', not 'offset2'...

Yes, it indeed should be offset2 == NULL_TREE and
TREE_CODE (offset2) == INTEGER_CST, thanks for catching that.

> Should the second highlighted 'gcc_assert' be changed as follows,
> suitably adapted for current GCC trunk, of course?  (Not yet tested.)  If
> approving such a patch, please respond with "Reviewed-by: NAME "
> so that your effort will be recorded in the commit log, see
> .
> 
> - gcc_assert (offset == NULL_TREE
> - || TREE_CODE (offset) == INTEGER_CST);
> + gcc_assert (offset2 == NULL_TREE
> + || TREE_CODE (offset2) == INTEGER_CST);

Preapproved for trunk if it passes bootstrap/regtest.

Jakub

Re: general_operand not validating "(const_int 65535 [0xffff])"

2019-10-16 Thread Jozef Lawrynowicz

On Wed, 9 Oct 2019 16:03:34 +0200
Jakub Jelinek  wrote:

> On Wed, Oct 09, 2019 at 02:40:42PM +0100, Jozef Lawrynowicz wrote:
> > I've added a new define_expand for msp430 to handle "mulhisi", but when 
> > testing
> > the changes, some builtin tests (e.g. builtin-arith-overflow-{1,5,p-1}.c) 
> > fail.
> > 
> > I've narrowed a test case down to:
> > 
> > void
> > foo (unsigned int r, unsigned int y)
> > {
> > >   __builtin_umul_overflow ((unsigned int) (-1), y, &r);
> > }
> >   
> > > msp430-elf-gcc -S tester.c -O0  
> > 
> > tester.c: In function 'foo':
> > tester.c:4:1: error: unrecognizable insn:
> > 4 | }
> >   | ^
> > (insn 16 15 17 2 (set (reg:HI 32)
> > > (const_int 65535 [0xffff])) "tester.c":3:3 -1
> >  (nil))  
> 
> Yes, that is not valid, it needs to be (const_int -1).
> 
> > I guess the bug is wherever the (const_int 65535) is generated, it should 
> > be -1
> > sign extend to a HWI. That is based on this statement from the docs:  
> 
> Yes.  You need to debug where it is created ((const_int 65535) itself is not
> wrong if it is e.g. meant for SImode or DImode etc., but when it is to be
> used in HImode context, it needs to be canonicalized) and fix.
> 
>   Jakub

Thanks for the responses, took me a little while to get back to looking at this.

In the end I tracked this down to some behaviour specific to the mul_overflow
builtins.

We call expand_expr_real_2 from expand_mul_overflow (internal-fn.c:1604).
When we process the arguments to:
  __builtin_umul_overflow ((unsigned int) (-1), y, &r);
at expr.c:8952, they go through a few transformations.

First we generate the rtx for ((unsigned int) -1) in the HImode context (msp430
has 16-bit int), which generates (const_int -1). OK.
Then it gets widened in a SImode context, but since it is unsigned, we zero
extend and the rtx becomes (const_int 65535). OK.
When we call expand_mult_highpart_adjust, we are back in HImode, but using
operands which have been widened in a SImode context. This is when we
generate our problematic insns using (const_int 65535) with HImode
operands.
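
To make the round trip concrete, here is a small standalone sketch in plain
C.  It is not GCC code: zero_extend and canonicalize are hypothetical helpers
that only mimic, on a 64-bit host integer, what widening an unsigned value
and re-canonicalizing a constant for a narrower mode amount to.

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extend the low BITS bits of V, as widening an unsigned value
   does.  */
int64_t
zero_extend (int64_t v, unsigned bits)
{
  assert (bits >= 1 && bits <= 64);
  if (bits == 64)
    return v;
  return (int64_t) ((uint64_t) v & ((UINT64_C (1) << bits) - 1));
}

/* Sign-extend the low BITS bits of V.  This is the canonical form of a
   constant held in a host integer for a BITS-wide integer mode.  */
int64_t
canonicalize (int64_t v, unsigned bits)
{
  assert (bits >= 1 && bits <= 64);
  if (bits == 64)
    return v;

  uint64_t mask = (UINT64_C (1) << bits) - 1;
  uint64_t low = (uint64_t) v & mask;
  uint64_t sign = UINT64_C (1) << (bits - 1);

  /* (low ^ sign) - sign copies the sign bit into the upper bits.  */
  return (int64_t) ((low ^ sign) - sign);
}
```

With these, zero_extend (-1, 16) gives 65535, which is fine for a 32-bit
context (canonicalize (65535, 32) leaves it alone), while
canonicalize (65535, 16) gives back -1, the form a 16-bit context expects.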

I'm currently testing the following patch which fixes the problem:
diff --git a/gcc/expmed.c b/gcc/expmed.c
index f1975fe33fe..5a2516dfc15 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -3748,6 +3748,18 @@ expand_mult_highpart_adjust (scalar_int_mode mode, rtx adj_operand, rtx op0,
   rtx tem;
   enum rtx_code adj_code = unsignedp ? PLUS : MINUS;

+  /* Constants that have been converted from a mode with
+ prec <= HOST_BITS_PER_WIDE_INT to a wider mode and back again may not be
+ canonically represented.  So we check if the high bit is set (which
+ indicates if the constant might be ambiguously represented), and
+ canonicalize the constant if it is.  */
+  if (CONST_INT_P (op0)
+  && (UINTVAL (op0) & (HOST_WIDE_INT_1U << (GET_MODE_BITSIZE (mode) - 1))))
+op0 = gen_int_mode (INTVAL (op0), mode);
+  if (CONST_INT_P (op1)
+  && (UINTVAL (op1) & (HOST_WIDE_INT_1U << (GET_MODE_BITSIZE (mode) - 1))))
+op1 = gen_int_mode (INTVAL (op1), mode);
+
   tem = expand_shift (RSHIFT_EXPR, mode, op0,
  GET_MODE_BITSIZE (mode) - 1, NULL_RTX, 0);
   tem = expand_and (mode, tem, op1, NULL_RTX);

I thought about adding this code to expand_binop instead but this seems like
something that should be handled by the caller. Also, we don't have
this problem when expanding any other RTL.

However, we do already have somewhat similar handling of shifts in expand_binop:
  /* For shifts, constant invalid op1 might be expanded from different
 mode than MODE.  As those are invalid, force them to a register
 to avoid further problems during expansion.  */
  else if (CONST_INT_P (op1)
   && shift_optab_p (binoptab)
   && UINTVAL (op1) >= GET_MODE_BITSIZE (GET_MODE_INNER (mode)))
{
  op1 = gen_int_mode (INTVAL (op1), GET_MODE_INNER (mode));
  op1 = force_reg (GET_MODE_INNER (mode), op1);
}

For now I'll stick with the fix in expand_mult_highpart_adjust and see how the
tests go.

Jozef

Re: [PATCH] Fix -Wshadow=local warnings in genautomata.c

2019-10-16 Thread Jakub Jelinek

On Wed, Oct 16, 2019 at 10:03:51AM -0600, Martin Sebor wrote:
> > The counter example would be:
> > #define F(x) \
> >__extension__ (({ __typeof__ (x) _x = x; _x < 0 ? -_x : _x; }))
> > #define G(x) \
> >__extension__ (({ __typeof__ (x) _x = x; F(_x); }))
> > where a -Wshadow diagnostics could point the author at a serious bug,
> > because in the expansion it will be __typeof__ (_x) _x = _x; ...
> 
> True.  I don't suppose there is a way to make it so the warning
> triggers for the counter example and not for the original, is
> there?

Maybe look through the macro nesting context and if the shadowing
declaration comes from the same macro as shadowed declaration
or macro included directly or indirectly from the macro with shadowed
declaration, warn, otherwise not?
This might still not warn in case where the scope of the shadowing
declaration is created from multiple macros ({ coming from one,
}) from another one, but otherwise could work.
Perhaps -Wshadow=local needs multiple modes, the default one that
will have this macro handling and full one (=2) which would warn
regardless of macro definitions.

Jakub

[arm] fix bootstrap failure due to uninitialized warning

2019-10-16 Thread Richard Earnshaw (lists)

The Arm port is failing bootstrap because GCC is now warning about an 
uninitialized array.


The code is complex enough that I certainly can't be sure the compiler 
is wrong, so perhaps the best fix here is just to memset the entire 
array before use.
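
For illustration only, the idea in plain C: sum_bytes below is a made-up
function, not the code in arm.c.  The patch spells the initializer "= {}",
which is fine for GCC's own sources since they are built as C++; in C before
C23 the equivalent is "{ 0 }" or a memset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fill in only the first USED bytes of a 16-byte scratch buffer, then
   sum all 16.  Zero-initializing up front makes the untouched tail well
   defined, so no path can read an indeterminate byte.  */
unsigned int
sum_bytes (const unsigned char *src, size_t used)
{
  unsigned char bytes[16] = { 0 };  /* or: memset (bytes, 0, sizeof bytes); */
  unsigned int sum = 0;

  assert (used <= sizeof bytes);
  memcpy (bytes, src, used);
  for (size_t i = 0; i < sizeof bytes; i++)
    sum += bytes[i];
  return sum;
}
```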


* config/arm/arm.c (neon_valid_immediate): Clear bytes before use.

Applied to trunk.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 9f0975dc071..75a011029f1 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -12243,7 +12252,7 @@ neon_valid_immediate (rtx op, machine_mode mode, int inverse,
 
   unsigned int i, elsize = 0, idx = 0, n_elts;
   unsigned int innersize;
-  unsigned char bytes[16];
+  unsigned char bytes[16] = {};
   int immtype = -1, matches;
   unsigned int invmask = inverse ? 0xff : 0;
   bool vector = GET_CODE (op) == CONST_VECTOR;

Re: [PATCH] Fix -Wshadow=local warnings in genautomata.c

2019-10-16 Thread Joseph Myers

On Wed, 16 Oct 2019, Jakub Jelinek wrote:

> The counter example would be:
> #define F(x) \
>   __extension__ (({ __typeof__ (x) _x = x; _x < 0 ? -_x : _x; }))
> #define G(x) \
>   __extension__ (({ __typeof__ (x) _x = x; F(_x); }))
> where a -Wshadow diagnostics could point the author at a serious bug,
> because in the expansion it will be __typeof__ (_x) _x = _x; ...

And this is not theoretical, 
 
 was a real 
bug in glibc soft-fp where shadowing of variables called _c1 and _c2 in 
two macros resulted in wrong code.
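
One common way to avoid that class of bug, sketched here under the assumption
that each macro can pick its own temporary name: F and G are the macros from
the example above with the temporaries renamed, and abs_via_g is a
hypothetical wrapper added only so the result can be checked.  Statement
expressions and __typeof__ are GNU extensions.

```c
#include <assert.h>
#include <limits.h>

/* Each macro uses a distinctly named temporary, so expanding F inside G
   no longer yields "__typeof__ (_x) _x = _x".  */
#define F(x) \
  __extension__ ({ __typeof__ (x) _f_x = (x); _f_x < 0 ? -_f_x : _f_x; })
#define G(x) \
  __extension__ ({ __typeof__ (x) _g_x = (x); F (_g_x); })

/* Absolute value computed through the nested macros.  */
int
abs_via_g (int v)
{
  assert (v != INT_MIN);  /* -INT_MIN would overflow */
  return G (v);
}
```

This only moves the problem: a caller that passes a variable named _g_x to G
hits the same self-initialization, which is where a shadowing warning would
still be useful.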

-- 
Joseph S. Myers
jos...@codesourcery.com

[Bug testsuite/92127] New: [10 regression] gcc.dg/vect/costmodel/ppc/costmodel-fast-math-vect-pr29925.c fails after r276645 on power7

2019-10-16 Thread seurer at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92127

Bug ID: 92127
   Summary: [10 regression]
gcc.dg/vect/costmodel/ppc/costmodel-fast-math-vect-pr2
9925.c fails after r276645 on power7
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: testsuite
  Assignee: unassigned at gcc dot gnu.org
  Reporter: seurer at gcc dot gnu.org
  Target Milestone: ---

Only seeing this on power 7.

Executing on host: /home/seurer/gcc/build/gcc-test/gcc/xgcc
-B/home/seurer/gcc/build/gcc-test/gcc/-fno-diagnostics-show-caret
-fno-diagnostics-show-line-numbers -fdiagnostics-color=never  -maltivec -c -o
powerpc_altivec_ok7704.o powerpc_altivec_ok7704.c(timeout = 300)
spawn -ignore SIGHUP /home/seurer/gcc/build/gcc-test/gcc/xgcc
-B/home/seurer/gcc/build/gcc-test/gcc/ -fno-diagnostics-show-caret
-fno-diagnostics-show-line-numbers -fdiagnostics-color=never -maltivec -c -o
powerpc_altivec_ok7704.o powerpc_altivec_ok7704.c
Executing on host: /home/seurer/gcc/build/gcc-test/gcc/xgcc
-B/home/seurer/gcc/build/gcc-test/gcc/ vmx_hw_available7704.c   
-fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers
-fdiagnostics-color=never  -mno-vsx  -lm  -o vmx_hw_available7704.exe   
(timeout = 300)
spawn -ignore SIGHUP /home/seurer/gcc/build/gcc-test/gcc/xgcc
-B/home/seurer/gcc/build/gcc-test/gcc/ vmx_hw_available7704.c
-fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers
-fdiagnostics-color=never -mno-vsx -lm -o vmx_hw_available7704.exe
Setting LD_LIBRARY_PATH to
:/home/seurer/gcc/build/gcc-test/gcc::/home/seurer/gcc/build/gcc-test/gcc:/home/seurer/gcc/build/gcc-test/./gmp/.libs:/home/seurer/gcc/build/gcc-test/./prev-gmp/.libs:/home/seurer/gcc/build/gcc-test/./mpfr/src/.libs:/home/seurer/gcc/build/gcc-test/./prev-mpfr/src/.libs:/home/seurer/gcc/build/gcc-test/./mpc/src/.libs:/home/seurer/gcc/build/gcc-test/./prev-mpc/src/.libs:/home/seurer/gcc/build/gcc-test/./isl/.libs:/home/seurer/gcc/build/gcc-test/./prev-isl/.libs
Execution timeout is: 300
spawn [open ...]
Executing on host: /home/seurer/gcc/build/gcc-test/gcc/xgcc
-B/home/seurer/gcc/build/gcc-test/gcc/
/home/seurer/gcc/gcc-test/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-fast-math-vect-pr29925.c
   -fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers
-fdiagnostics-color=never   -O2 -ftree-vectorize -fvect-cost-model=dynamic
-fno-common -maltivec -fdump-tree-vect-details -ffast-math  -lm  -o
./costmodel-fast-math-vect-pr29925.exe(timeout = 300)
spawn -ignore SIGHUP /home/seurer/gcc/build/gcc-test/gcc/xgcc
-B/home/seurer/gcc/build/gcc-test/gcc/
/home/seurer/gcc/gcc-test/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-fast-math-vect-pr29925.c
-fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers
-fdiagnostics-color=never -O2 -ftree-vectorize -fvect-cost-model=dynamic
-fno-common -maltivec -fdump-tree-vect-details -ffast-math -lm -o
./costmodel-fast-math-vect-pr29925.exe
PASS: gcc.dg/vect/costmodel/ppc/costmodel-fast-math-vect-pr29925.c (test for
excess errors)
Setting LD_LIBRARY_PATH to
:/home/seurer/gcc/build/gcc-test/gcc::/home/seurer/gcc/build/gcc-test/gcc:/home/seurer/gcc/build/gcc-test/./gmp/.libs:/home/seurer/gcc/build/gcc-test/./prev-gmp/.libs:/home/seurer/gcc/build/gcc-test/./mpfr/src/.libs:/home/seurer/gcc/build/gcc-test/./prev-mpfr/src/.libs:/home/seurer/gcc/build/gcc-test/./mpc/src/.libs:/home/seurer/gcc/build/gcc-test/./prev-mpc/src/.libs:/home/seurer/gcc/build/gcc-test/./isl/.libs:/home/seurer/gcc/build/gcc-test/./prev-isl/.libs
Execution timeout is: 300
spawn [open ...]
PASS: gcc.dg/vect/costmodel/ppc/costmodel-fast-math-vect-pr29925.c execution
test
Executing on host: /home/seurer/gcc/build/gcc-test/gcc/xgcc
-B/home/seurer/gcc/build/gcc-test/gcc/ p8vector_hw_available7704.c   
-fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers
-fdiagnostics-color=never  -mpower8-vector  -lm  -o
p8vector_hw_available7704.exe(timeout = 300)
spawn -ignore SIGHUP /home/seurer/gcc/build/gcc-test/gcc/xgcc
-B/home/seurer/gcc/build/gcc-test/gcc/ p8vector_hw_available7704.c
-fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers
-fdiagnostics-color=never -mpower8-vector -lm -o p8vector_hw_available7704.exe
Setting LD_LIBRARY_PATH to
:/home/seurer/gcc/build/gcc-test/gcc::/home/seurer/gcc/build/gcc-test/gcc:/home/seurer/gcc/build/gcc-test/./gmp/.libs:/home/seurer/gcc/build/gcc-test/./prev-gmp/.libs:/home/seurer/gcc/build/gcc-test/./mpfr/src/.libs:/home/seurer/gcc/build/gcc-test/./prev-mpfr/src/.libs:/home/seurer/gcc/build/gcc-test/./mpc/src/.libs:/home/seurer/gcc/build/gcc-test/./prev-mpc/src/.libs:/home/seurer/gcc/build/gcc-test/./isl/.libs:/home/seurer/gcc/build/gcc-test/./prev-isl/.libs
Execution timeout is: 300
spawn [open ...]
gcc.dg/vect/costmodel/ppc/costmodel-fast-math-vect-pr29925.c: pattern found 0
times
FAIL:

Re: [PATCH] Fix constexpr-dtor3.C FAIL on arm

2019-10-16 Thread Jakub Jelinek

On Fri, Oct 11, 2019 at 04:14:16PM -0400, Jason Merrill wrote:
> > On x86_64 and most other targets, cleanup here (if non-NULL) is the
> > CALL_EXPR, as destructor return type is void, but on arm, as the dtor return
> > type is some pointer, the CALL_EXPR is wrapped into a NOP_EXPR to void.
> > protected_set_expr_location then on x86_64 clears the CALL_EXPR location,
> > but on arm only NOP_EXPR location.
> > 
> > The following patch (totally untested) should fix that.
> > 
> > For the warning location, perhaps we could special case destructor calls
> > in push_cx_call_context (to offset the intentional clearing of location for
> > debugging purposes), if they don't have location set, don't use
> > input_location for them, but try to pick DECL_SOURCE_LOCATION for the
> > variable being destructed?
> 
> Expanding the CLEANUP_EXPR of a CLEANUP_STMT could use the EXPR_LOCATION of
> the CLEANUP_STMT.  Or the EXPR_LOCATION of *jump_target, if suitable.

The already previously posted patch (now attached as first) has now been
bootstrapped/regtested on x86_64-linux and i686-linux, and regardless if we
improve the location or not should fix the arm vs. the rest of the world
difference.  Is that ok for trunk?
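
The arm vs. x86_64 difference can be pictured with a toy model. This is only a sketch: `struct node`, the `CALL`/`NOP` kinds and `clear_cleanup_location` are made-up stand-ins, not GCC's tree API, and location 0 merely plays the role of UNKNOWN_LOCATION.

```c
#include <stddef.h>

/* Toy stand-ins for tree codes; NOT the real GCC tree API.  */
enum kind { CALL, NOP };

struct node
{
  enum kind kind;
  int loc;            /* 0 plays the role of UNKNOWN_LOCATION.  */
  struct node *op0;   /* Operand of a NOP wrapper, otherwise NULL.  */
};

/* Clear the location of CLEANUP.  If CLEANUP is a conversion wrapper
   (the arm case), also clear the location of the wrapped call, which
   is the idea behind the first patch.  */
static void
clear_cleanup_location (struct node *cleanup)
{
  if (!cleanup)
    return;
  cleanup->loc = 0;
  if (cleanup->kind == NOP && cleanup->op0)
    cleanup->op0->loc = 0;
}
```

Without the second assignment, only the wrapper would lose its location and the wrapped call would keep it, which is the behaviour described for arm.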

As for CLEANUP_STMT, I've tried it (the second patch), but it didn't change
anything; the diagnostic was still
constexpr-dtor3.C:16:23:   in ‘constexpr’ expansion of ‘f4()’
constexpr-dtor3.C:16:24:   in ‘constexpr’ expansion of ‘(& w13)->W7::~W7()’
constexpr-dtor3.C:5:34: error: inline assembly is not a constant expression
5 |   constexpr ~W7 () { if (w == 5) asm (""); w = 3; } // { dg-error "inline assembly is not a constant expression" }
  |  ^~~
constexpr-dtor3.C:5:34: note: only unevaluated inline assembly is allowed in a ‘constexpr’ function in C++2a
as without that change.

I've also tried the third patch, tested so far with check-c++-all, which
changes that to
constexpr-dtor3.C:16:23:   in ‘constexpr’ expansion of ‘f4()’
constexpr-dtor3.C:12:6:   in ‘constexpr’ expansion of ‘(& w13)->W7::~W7()’
constexpr-dtor3.C:5:34: error: inline assembly is not a constant expression
5 |   constexpr ~W7 () { if (w == 5) asm (""); w = 3; } // { dg-error "inline assembly is not a constant expression" }
  |  ^~~
constexpr-dtor3.C:5:34: note: only unevaluated inline assembly is allowed in a ‘constexpr’ function in C++2a

Jakub
2019-10-16  Jakub Jelinek  

* decl.c (cxx_maybe_build_cleanup): When clearing location of cleanup,
if cleanup is a nop, clear location of its operand too.

--- gcc/cp/decl.c.jj 2019-10-10 01:33:38.154943945 +0200
+++ gcc/cp/decl.c   2019-10-11 10:09:24.321277942 +0200
@@ -16864,6 +16864,8 @@ cxx_maybe_build_cleanup (tree decl, tsub
  the end of the block.  So let's unset the location of the
  destructor call instead.  */
   protected_set_expr_location (cleanup, UNKNOWN_LOCATION);
+  if (cleanup && CONVERT_EXPR_P (cleanup))
+protected_set_expr_location (TREE_OPERAND (cleanup, 0), UNKNOWN_LOCATION);

   if (cleanup
   && DECL_P (decl)
2019-10-16  Jakub Jelinek  

* constexpr.c (cxx_eval_constant_expression) <case CLEANUP_STMT>:
Temporarily change input_location to CLEANUP_STMT location.

--- gcc/cp/constexpr.c.jj   2019-10-10 01:33:38.185943480 +0200
+++ gcc/cp/constexpr.c  2019-10-11 22:54:32.628051700 +0200
@@ -4980,9 +4980,13 @@ cxx_eval_constant_expression (const cons
 case CLEANUP_STMT:
   {
tree initial_jump_target = jump_target ? *jump_target : NULL_TREE;
+   location_t loc = input_location;
+   if (EXPR_HAS_LOCATION (t))
+ input_location = EXPR_LOCATION (t);
r = cxx_eval_constant_expression (ctx, CLEANUP_BODY (t), lval,
  non_constant_p, overflow_p,
  jump_target);
+   input_location = loc;
if (!CLEANUP_EH_ONLY (t) && !*non_constant_p)
  /* Also evaluate the cleanup.  If we weren't skipping at the
 start of the CLEANUP_BODY, change jump_target temporarily
2019-10-16  Jakub Jelinek  

* decl.c (cxx_maybe_build_cleanup): When clearing location of cleanup,
if cleanup is a nop, clear location of its operand too.
* constexpr.c (push_cx_call_context): For calls to destructors, use
DECL_SOURCE_LOCATION of destructed variable in preference to
input_location.

* g++.dg/cpp2a/constexpr-dtor3.C: Expect in 'constexpr' expansion of
message on the line with variable declaration.

--- gcc/cp/decl.c.jj 2019-10-16 09:30:57.490109872 +0200
+++ gcc/cp/decl.c   2019-10-16 17:45:48.647529567 +0200
@@ -16864,6 +16864,8 @@ cxx_maybe_build_cleanup (tree decl, tsub
  the end of the block.  So let's unset the location of the
  destructor call instead.  */
   protected_set_expr_location (cleanup, UNKNOWN_LOCATION);
+  if (cleanup && CONVERT_EXPR_P (cleanup))
+

[AArch64][SVE2] Support for EOR3 and variants of BSL

2019-10-16 Thread Yuliang Wang

Hi,

This patch adds combine pass support for the following SVE2 bitwise logic 
instructions:

- EOR3  (3-way vector exclusive OR)
- BSL   (bitwise select)
- NBSL  (inverted bitwise select)
- BSL1N (bitwise select with first input inverted)
- BSL2N (bitwise select with second input inverted)

Example template snippet:

void foo (TYPE *a, TYPE *b, TYPE *c, TYPE *d, int n)
{
  for (int i = 0; i < n; i++)
    a[i] = OP (b[i], c[i], d[i]);
}

EOR3:

  // #define OP(x,y,z) ((x) ^ (y) ^ (z))

  before  eor     z1.d, z1.d, z2.d
          eor     z0.d, z0.d, z1.d
  ...
  after   eor3    z0.d, z0.d, z1.d, z2.d

BSL:

  // #define OP(x,y,z) (((x) & (z)) | ((y) & ~(z)))

  before  eor     z0.d, z0.d, z1.d
          and     z0.d, z0.d, z2.d
          eor     z0.d, z0.d, z1.d
  ...
  after   bsl     z0.d, z0.d, z1.d, z2.d

NBSL:

  // #define OP(x,y,z) ~(((x) & (z)) | ((y) & ~(z)))

  before  eor     z0.d, z0.d, z1.d
          and     z0.d, z0.d, z2.d
          eor     z0.d, z0.d, z1.d
          not     z0.s, p1/m, z0.s
  ...
  after   nbsl    z0.d, z0.d, z1.d, z2.d

BSL1N:

  // #define OP(x,y,z) ((~(x) & (z)) | ((y) & ~(z)))

  before  eor     z0.d, z0.d, z1.d
          bic     z0.d, z2.d, z0.d
          eor     z0.d, z0.d, z1.d
  ...
  after   bsl1n   z0.d, z0.d, z1.d, z2.d

BSL2N:

  // #define OP(x,y,z) (((x) & (z)) | (~(y) & ~(z)))

  before  orr     z0.d, z1.d, z0.d
          and     z1.d, z1.d, z2.d
          not     z0.s, p1/m, z0.s
          orr     z0.d, z0.d, z1.d
  ...
  after   bsl2n   z0.d, z0.d, z1.d, z2.d

Additionally, vector NOR and NAND operations are now optimized with NBSL:

  NOR   x, y  ->  NBSL  x, y, x
  NAND  x, y  ->  NBSL  x, y, y
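
As a sanity check of the identities above, here is a small scalar sketch in C. It is not part of the patch: the function names are made up for illustration, and 64-bit integers stand in for vector lanes. It checks that the XOR/AND form computed by the "before" sequences equals the select form in the OP macros, and that NOR and NAND do reduce to NBSL as listed.

```c
#include <stdint.h>

/* The OP definitions from above, written as functions.  */
static uint64_t
bsl (uint64_t x, uint64_t y, uint64_t z)
{
  return (x & z) | (y & ~z);
}

static uint64_t
nbsl (uint64_t x, uint64_t y, uint64_t z)
{
  return ~bsl (x, y, z);
}

static uint64_t
bsl1n (uint64_t x, uint64_t y, uint64_t z)
{
  return (~x & z) | (y & ~z);
}

/* Return 1 if every identity holds for all sample triples, else 0.
   The operations are bitwise, so 0 and ~0 already cover every bit
   combination; the other samples are just extra.  */
static int
check_identities (void)
{
  static const uint64_t v[] = {
    UINT64_C (0), ~UINT64_C (0),
    UINT64_C (0x0123456789abcdef), UINT64_C (0xf0f0f0f0f0f0f0f0)
  };
  const int n = (int) (sizeof v / sizeof v[0]);

  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
      for (int k = 0; k < n; k++)
        {
          uint64_t x = v[i], y = v[j], z = v[k];

          /* eor, and, eor  ==  BSL.  */
          if ((((x ^ y) & z) ^ y) != bsl (x, y, z))
            return 0;
          /* eor, bic, eor  ==  BSL1N.  */
          if (((z & ~(x ^ y)) ^ y) != bsl1n (x, y, z))
            return 0;
          /* NOR x, y  ->  NBSL x, y, x.  */
          if (nbsl (x, y, x) != ~(x | y))
            return 0;
          /* NAND x, y  ->  NBSL x, y, y.  */
          if (nbsl (x, y, y) != ~(x & y))
            return 0;
        }
  return 1;
}
```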

Built and tested on aarch64-none-elf.

Best Regards,
Yuliang Wang


gcc/ChangeLog:

2019-10-16  Yuliang Wang  

* config/aarch64/aarch64-sve2.md (aarch64_sve2_eor3)
(aarch64_sve2_nor, aarch64_sve2_nand)
(aarch64_sve2_bsl, aarch64_sve2_nbsl)
(aarch64_sve2_bsl1n, aarch64_sve2_bsl2n):
New combine patterns.
* config/aarch64/iterators.md (BSL_3RD): New int iterator for the above.
(bsl_1st, bsl_2nd, bsl_3rd, bsl_mov): Attributes for the above.
* config/aarch64/aarch64.h (AARCH64_ISA_SVE2_AES, AARCH64_ISA_SVE2_SM4)
(AARCH64_ISA_SVE2_SHA3, AARCH64_ISA_SVE2_BITPERM): New ISA flag macros.
(TARGET_SVE2_AES, TARGET_SVE2_SM4, TARGET_SVE2_SHA3)
(TARGET_SVE2_BITPERM): New CPU targets.

gcc/testsuite/ChangeLog:

2019-10-16  Yuliang Wang  

* gcc.target/aarch64/sve2/eor3_1.c: New test.
* gcc.target/aarch64/sve2/eor3_2.c: As above.
* gcc.target/aarch64/sve2/nlogic_1.c: As above.
* gcc.target/aarch64/sve2/nlogic_2.c: As above.
* gcc.target/aarch64/sve2/bitsel_1.c: As above.
* gcc.target/aarch64/sve2/bitsel_2.c: As above.
* gcc.target/aarch64/sve2/bitsel_3.c: As above.
* gcc.target/aarch64/sve2/bitsel_4.c: As above.


rb11975.patch
Description: rb11975.patch

[Bug c++/91055] alignof () evaluated before layout is complete?

2019-10-16 Thread kamleshbhalui at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91055

Kamlesh Kumar  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kamleshbhalui at gmail dot com

--- Comment #1 from Kamlesh Kumar  ---
This fixes it.

diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index f427c4f4d3e..928dc887956 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -1791,6 +1791,11 @@ cxx_alignof_expr (tree e, tsubst_flags_t complain)
   if (e == error_mark_node)
 return error_mark_node;

+  if (current_class_type && TYPE_BEING_DEFINED (current_class_type))
+{
+  error ("invalid application of % to a field of a class still being defined");
+  return error_mark_node;
+}
   if (processing_template_decl)
 {
   e = build_min (ALIGNOF_EXPR, size_type_node, e);

Re: [PATCH][AArch64] Set SLOW_BYTE_ACCESS

2019-10-16 Thread Richard Earnshaw (lists)


On 11/10/2019 00:08, Ramana Radhakrishnan wrote:

On Thu, Oct 10, 2019 at 7:06 PM Richard Sandiford
 wrote:


Wilco Dijkstra  writes:

ping

Contrary to all documentation, SLOW_BYTE_ACCESS simply means accessing
bitfields by their declared type, which results in better code generation on
practically any target.


The name is confusing, but the documentation looks accurate to me:

 Define this macro as a C expression which is nonzero if accessing less
 than a word of memory (i.e.@: a @code{char} or a @code{short}) is no
 faster than accessing a word of memory, i.e., if such access
 require more than one instruction or if there is no difference in cost
 between byte and (aligned) word loads.

 When this macro is not defined, the compiler will access a field by
 finding the smallest containing object; when it is defined, a fullword
 load will be used if alignment permits.  Unless bytes accesses are
 faster than word accesses, using word accesses is preferable since it
 may eliminate subsequent memory access if subsequent accesses occur to
 other fields in the same word of the structure, but to different bytes.


I'm thinking we should completely remove all trace of SLOW_BYTE_ACCESS
from GCC as it's confusing and useless.


I disagree.  Some targets can optimise single-bit operations when the
container is a byte, for example.


OK for commit until we get rid of it?

ChangeLog:
2017-11-17  Wilco Dijkstra  

 gcc/
 * config/aarch64/aarch64.h (SLOW_BYTE_ACCESS): Set to 1.
--
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 056110afb228fb919e837c04aa5e5552a4868ec3..d8f4d129a02fb89eb00d256aba8c4764d6026078 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -769,14 +769,9 @@ typedef struct
 if given data not on the nominal alignment.  */
  #define STRICT_ALIGNMENT TARGET_STRICT_ALIGN

-/* Define this macro to be non-zero if accessing less than a word of
-   memory is no faster than accessing a word of memory, i.e., if such
-   accesses require more than one instruction or if there is no
-   difference in cost.
-   Although there's no difference in instruction count or cycles,
-   in AArch64 we don't want to expand to a sub-word to a 64-bit access
-   if we don't have to, for power-saving reasons.  */
-#define SLOW_BYTE_ACCESS   0
+/* Contrary to all documentation, this enables wide bitfield accesses,
+   which results in better code when accessing multiple bitfields.  */
+#define SLOW_BYTE_ACCESS   1

  #define NO_FUNCTION_CSE 1


I agree this makes sense from a performance point of view, and I think
the existing comment is admitting that AArch64 has the properties that
would normally cause us to set SLOW_BYTE_ACCESS to 1.  But the comment
is claiming that there's a power-saving benefit to leaving it off.

It seems like a weak argument though.  Bitfields are used when several
values are packed into the same integer, so there's a high likelihood
we'll need the whole integer anyway.  Avoiding the redundancies described
in the documentation should, if anything, help with power usage.
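
To make the point about packed fields concrete, here is a minimal sketch. The struct and function names are hypothetical and not taken from GCC or any testcase. It reads two neighbouring bitfields from the same 32-bit container; whether the compiler emits one word load plus shifts and masks or two narrower loads is the code generation choice this macro influences, and the result is the same either way.

```c
/* Hypothetical record with three fields packed into one 32-bit
   container (assuming a 32-bit unsigned int).  */
struct packed_flags
{
  unsigned int kind  : 8;
  unsigned int level : 8;
  unsigned int count : 16;
};

/* Reads two fields that live in the same word.  With wide bitfield
   accesses the containing word can be loaded once and both fields
   extracted from it, instead of issuing one narrow load per field.  */
static unsigned int
kind_plus_level (const struct packed_flags *p)
{
  return p->kind + p->level;
}
```

Comparing the assembly for such a function with the macro set to 0 and to 1 would be one way to see the effect discussed here.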

Maybe the main concern was using a 64-bit access when a 32-bit one
would do, since 32-bit bitfield containers are the most common.  But the:

  && GET_MODE_ALIGNMENT (mode) <= align

condition in get_best_mode should avoid that unless the 64-bit
access is naturally aligned.  (See the big comment above for the
pros and cons of this.)

So I think we should change the macro value unless anyone can back up the
power-saving claim.  Let's wait a week (more) to see if anyone objects.


IIRC, that power saving comment comes from the original port and
probably from when the port was first written, which is probably more
than 10 years ago now.



Yes.  Don't forget that at that time the INSV and EXTV expanders only 
operated on a single mode, which on AArch64 was 64 bits.  IIRC, at the 
time this was written the compiler would widen everything to that size 
if there was a bitfield op and that led to worse code.


So it's probably not as relevant now as it once was.

R.


regards
Ramana

Ramana



The comment change isn't OK though.  Please keep the first paragraph
and just reword the second to say that's why we set the value to 1.

Thanks,
Richard

[wwwdocs, committed] Fix GCC 8.2 release date (was: wwwdocs/htdocs/gcc-8 index.html)

2019-10-16 Thread Thomas Schwinge

Hi!

On 2018-07-26T11:56:36+, ja...@gcc.gnu.org wrote:
> Files modified in the GCC repository. Log entry:
>
> Fix up a typo in the release year.

..., but the day also needs to be fixed.  ;-)

Pushed to wwwdocs the attached commit
0dd4c6860fe284cef2df33ec98b2754c25d10438 "Fix GCC 8.2 release date".


Grüße
 Thomas


From 0dd4c6860fe284cef2df33ec98b2754c25d10438 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 16 Oct 2019 18:02:00 +0200
Subject: [PATCH] Fix GCC 8.2 release date

---
 htdocs/gcc-8/index.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/gcc-8/index.html b/htdocs/gcc-8/index.html
index fb315d17..54bb3809 100644
--- a/htdocs/gcc-8/index.html
+++ b/htdocs/gcc-8/index.html
@@ -29,7 +29,7 @@ GCC 8.2 relative to previous releases of GCC.
 
 
 GCC 8.2
-Jul 14, 2018
+Jul 26, 2018
 (changes,
  http://gcc.gnu.org/onlinedocs/8.2.0/;>documentation)
 
-- 
2.17.1




Re: [PATCH] Fix -Wshadow=local warnings in genautomata.c

2019-10-16 Thread Martin Sebor


On 10/16/19 9:50 AM, Jakub Jelinek wrote:

On Wed, Oct 16, 2019 at 09:43:49AM -0600, Martin Sebor wrote:

Should the warning trigger when the shadowing name results from
macro expansion?  The author of a macro can't (in general) know
what context it's going to be used, and when different macros
come from two different third party headers, it would seem
pointless to force their users to jump through hoops just to
avoid the innocuous shadowing.  Such as in this example:

#define Abs(x) \
   __extension__ (({ __typeof__ (x) _x = x; _x < 0 ? -_x : _x; }))

#define Min(x, y) \
   __extension__ (({ __typeof__ (x) _x = x; __typeof__ (y) _y = y; _x < _y ? _x : _y; }))

int f (int x, int y)
{
   return Abs (Min (x, y));   // -Wshadow for _x?
}


The counter example would be:
#define F(x) \
   __extension__ (({ __typeof__ (x) _x = x; _x < 0 ? -_x : _x; }))
#define G(x) \
   __extension__ (({ __typeof__ (x) _x = x; F(_x); }))
where a -Wshadow diagnostic could point the author at a serious bug,
because in the expansion it will be __typeof__ (_x) _x = _x; ...


True.  I don't suppose there is a way to make it so the warning
triggers for the counter example and not for the original, is
there?

Martin

PS The counterexample nicely illustrates why -Wself-init should
be in -Wall like in Clang or MSVC, or at least in -Wextra like in
ICC.  Let me take it as a reminder to submit a patch for GCC 10.

Re: [C++ Patch] Remove most uses of in_system_header_at

2019-10-16 Thread Paolo Carlini

... the below, slightly extended patch: 1- Makes sure the
in_system_header_at calls surviving in decl.c get the same location used
for the corresponding diagnostic; 2- Exploits locations[ds_typedef] in an
error_at in check_tag_decl.


Tested, as usual, on x86_64-linux.

Thanks, Paolo.

/

/cp
2019-10-16  Paolo Carlini  

* decl.c (grokfndecl): Remove redundant use of in_system_header_at.
(compute_array_index_type_loc): Likewise.
(grokdeclarator): Likewise.
* error.c (cp_printer): Likewise.
* lambda.c (add_default_capture): Likewise.
* parser.c (cp_parser_primary_expression): Likewise.
(cp_parser_selection_statement): Likewise.
(cp_parser_toplevel_declaration): Likewise.
(cp_parser_enumerator_list): Likewise.
(cp_parser_using_declaration): Likewise.
(cp_parser_member_declaration): Likewise.
(cp_parser_exception_specification_opt): Likewise.
(cp_parser_std_attribute_spec): Likewise.
* pt.c (do_decl_instantiation): Likewise.
(do_type_instantiation): Likewise.
* typeck.c (cp_build_unary_op): Likewise.

* decl.c (check_tag_decl): Pass to in_system_header_at the same
location used for the permerror.
(grokdeclarator): Likewise.

* decl.c (check_tag_decl): Use locations[ds_typedef] in error_at.

/testsuite
2019-10-16  Paolo Carlini  

* g++.old-deja/g++.other/decl9.C: Check locations too.
Index: cp/decl.c
===
--- cp/decl.c   (revision 276984)
+++ cp/decl.c   (working copy)
@@ -4933,9 +4933,9 @@ check_tag_decl (cp_decl_specifier_seq *declspecs,
  "multiple types in one declaration");
   else if (declspecs->redefined_builtin_type)
 {
-  if (!in_system_header_at (input_location))
-   permerror (declspecs->locations[ds_redefined_builtin_type_spec],
-  "redeclaration of C++ built-in type %qT",
+  location_t loc = declspecs->locations[ds_redefined_builtin_type_spec];
+  if (!in_system_header_at (loc))
+   permerror (loc, "redeclaration of C++ built-in type %qT",
   declspecs->redefined_builtin_type);
   return NULL_TREE;
 }
@@ -4984,7 +4984,8 @@ check_tag_decl (cp_decl_specifier_seq *declspecs,
 --end example]  */
   if (saw_typedef)
{
- error ("missing type-name in typedef-declaration");
+ error_at (declspecs->locations[ds_typedef],
+   "missing type-name in typedef-declaration");
  return NULL_TREE;
}
   /* Anonymous unions are objects, so they can have specifiers.  */;
@@ -9328,7 +9329,6 @@ grokfndecl (tree ctype,
}
  /* 17.6.3.3.5  */
  if (suffix[0] != '_'
- && !in_system_header_at (location)
  && !current_function_decl && !(friendp && !funcdef_flag))
warning_at (location, OPT_Wliteral_suffix,
"literal operator suffixes not preceded by %<_%>"
@@ -10036,8 +10036,6 @@ compute_array_index_type_loc (location_t name_loc,
   indicated by the state of complain), so that
   another substitution can be found.  */
return error_mark_node;
- else if (in_system_header_at (input_location))
-   /* Allow them in system headers because glibc uses them.  */;
  else if (name)
pedwarn (loc, OPT_Wpedantic,
 "ISO C++ forbids zero-size array %qD", name);
@@ -11004,7 +11002,7 @@ grokdeclarator (const cp_declarator *declarator,
 
   if (type_was_error_mark_node)
/* We've already issued an error, don't complain more.  */;
-  else if (in_system_header_at (input_location) || flag_ms_extensions)
+  else if (in_system_header_at (id_loc) || flag_ms_extensions)
/* Allow it, sigh.  */;
   else if (! is_main)
permerror (id_loc, "ISO C++ forbids declaration of %qs with no type",
@@ -11037,7 +11035,7 @@ grokdeclarator (const cp_declarator *declarator,
}
   /* Don't pedwarn if the alternate "__intN__" form has been used instead
 of "__intN".  */
-  else if (!int_n_alt && pedantic && ! in_system_header_at (input_location))
+  else if (!int_n_alt && pedantic)
pedwarn (declspecs->locations[ds_type_spec], OPT_Wpedantic,
 "ISO C++ does not support %<__int%d%> for %qs",
 int_n_data[declspecs->int_n_idx].bitsize, name);
@@ -12695,10 +12693,7 @@ grokdeclarator (const cp_declarator *declarator,
else
  {
/* Array is a flexible member.  */
-   if (in_system_header_at (input_location))
- /* Do not warn on flexible array members in system
-headers because glibc uses them.  */;
-   else if (name)
+   if (name)
  pedwarn (id_loc, OPT_Wpedantic,

[Bug testsuite/92126] New: [10 regression] gcc.dg/vect/pr62171.c fails after r276876 on power7

2019-10-16 Thread seurer at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92126

Bug ID: 92126
   Summary: [10 regression] gcc.dg/vect/pr62171.c fails after
r276876 on power7
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: testsuite
  Assignee: unassigned at gcc dot gnu.org
  Reporter: seurer at gcc dot gnu.org
  Target Milestone: ---

This appears to be working OK everywhere else, just not on power 7.

Executing on host: /home/seurer/gcc/build/gcc-test/gcc/xgcc
-B/home/seurer/gcc/build/gcc-test/gcc/
/home/seurer/gcc/gcc-test/gcc/testsuite/gcc.dg/vect/pr62171.c   
-fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers
-fdiagnostics-color=never  -fdiagnostics-urls=never   -maltivec -mvsx
-mno-allow-movmisalign -ftree-vectorize -fno-tree-loop-distribute-patterns
-fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details -S -o pr62171.s  
 (timeout = 300)
spawn -ignore SIGHUP /home/seurer/gcc/build/gcc-test/gcc/xgcc
-B/home/seurer/gcc/build/gcc-test/gcc/
/home/seurer/gcc/gcc-test/gcc/testsuite/gcc.dg/vect/pr62171.c
-fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers
-fdiagnostics-color=never -fdiagnostics-urls=never -maltivec -mvsx
-mno-allow-movmisalign -ftree-vectorize -fno-tree-loop-distribute-patterns
-fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details -S -o pr62171.s
PASS: gcc.dg/vect/pr62171.c (test for excess errors)
PASS: gcc.dg/vect/pr62171.c scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/pr62171.c scan-tree-dump-not vect "versioned"
Executing on host: /home/seurer/gcc/build/gcc-test/gcc/xgcc
-B/home/seurer/gcc/build/gcc-test/gcc/
/home/seurer/gcc/gcc-test/gcc/testsuite/gcc.dg/vect/pr62171.c   
-fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers
-fdiagnostics-color=never  -fdiagnostics-urls=never  -flto -ffat-lto-objects
-maltivec -mvsx -mno-allow-movmisalign -ftree-vectorize
-fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2
-fdump-tree-vect-details -S -o pr62171.s    (timeout = 300)
spawn -ignore SIGHUP /home/seurer/gcc/build/gcc-test/gcc/xgcc
-B/home/seurer/gcc/build/gcc-test/gcc/
/home/seurer/gcc/gcc-test/gcc/testsuite/gcc.dg/vect/pr62171.c
-fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers
-fdiagnostics-color=never -fdiagnostics-urls=never -flto -ffat-lto-objects
-maltivec -mvsx -mno-allow-movmisalign -ftree-vectorize
-fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2
-fdump-tree-vect-details -S -o pr62171.s
PASS: gcc.dg/vect/pr62171.c -flto -ffat-lto-objects (test for excess errors)
PASS: gcc.dg/vect/pr62171.c -flto -ffat-lto-objects  scan-tree-dump-times vect
"vectorized 1 loops" 1
FAIL: gcc.dg/vect/pr62171.c -flto -ffat-lto-objects  scan-tree-dump-not vect
"versioned"
testcase /home/seurer/gcc/gcc-test/gcc/testsuite/gcc.dg/vect/vect.exp completed
in 2 seconds

=== gcc Summary ===

# of expected passes            4
# of unexpected failures        2

Re: [PATCH] Fix -Wshadow=local warnings in genautomata.c

2019-10-16 Thread Jakub Jelinek

On Wed, Oct 16, 2019 at 09:43:49AM -0600, Martin Sebor wrote:
> Should the warning trigger when the shadowing name results from
> macro expansion?  The author of a macro can't (in general) know
> what context it's going to be used, and when different macros
> come from two different third party headers, it would seem
> pointless to force their users to jump through hoops just to
> avoid the innocuous shadowing.  Such as in this example:
> 
> #define Abs(x) \
>   __extension__ (({ __typeof__ (x) _x = x; _x < 0 ? -_x : _x; }))
> 
> #define Min(x, y) \
>   __extension__ (({ __typeof__ (x) _x = x; __typeof__ (y) _y = y; _x < _y ? _x : _y; }))
> 
> int f (int x, int y)
> {
>   return Abs (Min (x, y));   // -Wshadow for _x?
> }

The counter example would be:
#define F(x) \
  __extension__ (({ __typeof__ (x) _x = x; _x < 0 ? -_x : _x; }))
#define G(x) \
  __extension__ (({ __typeof__ (x) _x = x; F(_x); }))
where a -Wshadow diagnostic could point the author at a serious bug,
because in the expansion it will be __typeof__ (_x) _x = _x; ...
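
For illustration only, one way the counterexample can be made safe is to give each macro its own temporary name. This is a sketch in GNU C (statement expressions and `__typeof__`, so GCC or Clang is assumed); the names `_f_x`, `_g_x` and `g_of` are arbitrary choices for this example and do not come from any real header.

```c
/* Each macro uses a temporary named after itself, so the nested
   expansion becomes  __typeof__ (_g_x) _f_x = (_g_x);  and the
   initializer refers to the outer variable as intended.  */
#define F(x) \
  __extension__ (({ __typeof__ (x) _f_x = (x); _f_x < 0 ? -_f_x : _f_x; }))
#define G(x) \
  __extension__ (({ __typeof__ (x) _g_x = (x); F (_g_x); }))

/* Thin wrapper so the macros can be exercised as a function.  */
static int
g_of (int v)
{
  return G (v);
}
```

With the original names, the inner declaration would initialize `_x` from itself, which is the bug the warning would expose.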

Jakub

Re: [PATCH] Fix -Wshadow=local warnings in genautomata.c

2019-10-16 Thread Martin Sebor


On 10/16/19 9:11 AM, Richard Sandiford wrote:

Sorry for the slow reply.

Bernd Edlinger  writes:

Hi,

this is probably on the border to obvious.

The REGEXP_xxx macros in genautomata are invoked
recursively, and the local values are all named _regexp
and shadow each other.


Fixed by using different names _regexp1..6 for each
macro.


Sorry to repeat the complaint about numerical suffixes, but I think
we'd need better names.  E.g. _regexp_unit or _re_unit for REGEXP_UNIT
and similarly for the other macros.  But a similar fix to rtl.h might
be better.


Should the warning trigger when the shadowing name results from
macro expansion?  The author of a macro can't (in general) know
what context it's going to be used, and when different macros
come from two different third party headers, it would seem
pointless to force their users to jump through hoops just to
avoid the innocuous shadowing.  Such as in this example:

#define Abs(x) \
  __extension__ (({ __typeof__ (x) _x = x; _x < 0 ? -_x : _x; }))

#define Min(x, y) \
  __extension__ (({ __typeof__ (x) _x = x; __typeof__ (y) _y = y; _x < _y ? _x : _y; }))


int f (int x, int y)
{
  return Abs (Min (x, y));   // -Wshadow for _x?
}

Martin


Thanks,
Richard


Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
Is it OK for trunk?


Thanks
Bernd.

2019-10-04  Bernd Edlinger  

* genautomata.c (REGEXP_UNIT, REGEXP_RESERV, REGEXP_SEQUENCE,
REGEXP_REPEAT, REGEXP_ALLOF, REGEXP_ONEOF): Rename local vars.

Index: gcc/genautomata.c
===
--- gcc/genautomata.c   (revision 276484)
+++ gcc/genautomata.c   (working copy)
@@ -984,46 +984,46 @@ decl_mode_check_failed (enum decl_mode mode, const
  
  
  #define REGEXP_UNIT(r) __extension__ \
-(({ struct regexp *const _regexp = (r); \
- if (_regexp->mode != rm_unit) \
-   regexp_mode_check_failed (_regexp->mode, "rm_unit", \
+(({ struct regexp *const _regex1 = (r); \
+ if (_regex1->mode != rm_unit) \
+   regexp_mode_check_failed (_regex1->mode, "rm_unit", \
   __FILE__, __LINE__, __FUNCTION__); \
- &(_regexp)->regexp.unit; }))
+ &(_regex1)->regexp.unit; }))
  
  #define REGEXP_RESERV(r) __extension__ \
-(({ struct regexp *const _regexp = (r); \
- if (_regexp->mode != rm_reserv) \
-   regexp_mode_check_failed (_regexp->mode, "rm_reserv", \
+(({ struct regexp *const _regex2 = (r); \
+ if (_regex2->mode != rm_reserv) \
+   regexp_mode_check_failed (_regex2->mode, "rm_reserv", \
   __FILE__, __LINE__, __FUNCTION__); \
- &(_regexp)->regexp.reserv; }))
+ &(_regex2)->regexp.reserv; }))
  
  #define REGEXP_SEQUENCE(r) __extension__ \
-(({ struct regexp *const _regexp = (r); \
- if (_regexp->mode != rm_sequence) \
-   regexp_mode_check_failed (_regexp->mode, "rm_sequence", \
+(({ struct regexp *const _regex3 = (r); \
+ if (_regex3->mode != rm_sequence) \
+   regexp_mode_check_failed (_regex3->mode, "rm_sequence", \
   __FILE__, __LINE__, __FUNCTION__); \
- &(_regexp)->regexp.sequence; }))
+ &(_regex3)->regexp.sequence; }))
  
  #define REGEXP_REPEAT(r) __extension__ \
-(({ struct regexp *const _regexp = (r); \
- if (_regexp->mode != rm_repeat) \
-   regexp_mode_check_failed (_regexp->mode, "rm_repeat", \
+(({ struct regexp *const _regex4 = (r); \
+ if (_regex4->mode != rm_repeat) \
+   regexp_mode_check_failed (_regex4->mode, "rm_repeat", \
   __FILE__, __LINE__, __FUNCTION__); \
- &(_regexp)->regexp.repeat; }))
+ &(_regex4)->regexp.repeat; }))
  
  #define REGEXP_ALLOF(r) __extension__ \
-(({ struct regexp *const _regexp = (r); \
- if (_regexp->mode != rm_allof) \
-   regexp_mode_check_failed (_regexp->mode, "rm_allof", \
+(({ struct regexp *const _regex5 = (r); \
+ if (_regex5->mode != rm_allof) \
+   regexp_mode_check_failed (_regex5->mode, "rm_allof", \
   __FILE__, __LINE__,

Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-10-16 Thread Richard Earnshaw (lists)


On 16/10/2019 13:13, Wilco Dijkstra wrote:

Hi Christophe,


I've noticed that your patch caused a regression:
FAIL: gcc.dg/tree-prof/pr77698.c scan-rtl-dump-times alignments
"internal loop alignment added" 1


That's just a testism - it only tests for loop alignment and doesn't
consider the possibility of the loop being jumped into like this:

.L17:
         adds    r0, r0, #1
         b       .L27
.L6:
         ldr     r4, [r2, #12]
         adds    r0, r0, #4
         ldr     lr, [r1]
         str     lr, [r3, r4, lsl #2]
         ldr     r4, [r2, #12]
         ldr     lr, [r1]
         str     lr, [r3, r4, lsl #2]
         ldr     r4, [r2, #12]
         ldr     lr, [r1]
         str     lr, [r3, r4, lsl #2]
.L27:
         ldr     r4, [r2, #12]
         cmp     ip, r0
         ldr     lr, [r1]
         str     lr, [r3, r4, lsl #2]
         bne     .L6
         pop     {r4, pc}

It seems minor changes in scheduling allow blocks to be commoned or not.
The underlying issue is that commoning like this should not be allowed on blocks
with different profile stats - particularly on loops where it inhibits
scheduling of the loop itself.

Cheers,
Wilco



So what's your proposed solution?  Leaving the test failing is not an 
option.

[Bug fortran/78260] ICE in gimplify_expr, at gimplify.c:11939

2019-10-16 Thread tschwinge at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78260

Thomas Schwinge  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tschwinge at gcc dot gnu.org
         Depends on|                            |85701
           Assignee|unassigned at gcc dot gnu.org      |burnus at gcc dot gnu.org

--- Comment #6 from Thomas Schwinge  ---
(In reply to Tobias Burnus from comment #3)
> This one is fixed with GCC 9:
> 5 | !$acc declare present(s)
>   |1
> Error: Object ‘s’ is not a variable at (1)

(Given 'subroutine s'.)

For the record (in case we ever get to backporting these things to release
branches), that was fixed in PR85701.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85701
[Bug 85701] [openacc] ICE in mark_scope_block_unused, at tree-ssa-live.c:364

[Bug c++/92015] [9/10 Regression] internal compiler error: in cxx_eval_array_reference, at cp/constexpr.c:2568

2019-10-16 Thread jakub at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92015

--- Comment #3 from Jakub Jelinek  ---
Created attachment 47049
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47049&action=edit
gcc10-pr92015.patch

Like this.  Untested.

[Bug c++/92015] [9/10 Regression] internal compiler error: in cxx_eval_array_reference, at cp/constexpr.c:2568

2019-10-16 Thread jakub at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92015

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmalcolm at gcc dot gnu.org,
                   |                            |jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek  ---
So, we try to constexpr evaluate
(TARGET_EXPR ("hello")}>).c[0]
which has a location wrapper in.  Normally location wrappers are stripped early
in cxx_eval_constant_expression: STRIP_ANY_LOCATION_WRAPPER (t);
The problem is that when cxx_eval_constant_expression is called on CONSTRUCTOR,
it does:
      if (TREE_CONSTANT (t) && reduced_constant_expression_p (t))
        {
          /* Don't re-process a constant CONSTRUCTOR, but do fold it to
             VECTOR_CST if applicable.  */
          verify_constructor_flags (t);
          if (TREE_CONSTANT (t))
            return fold (t);
        }
      r = cxx_eval_bare_aggregate (ctx, t, lval,
                                   non_constant_p, overflow_p);
While cxx_eval_bare_aggregate strips them away, as it recurses on each element,
if the CONSTRUCTOR is TREE_CONSTANT and reduced constant expression (which is
happy about STRING_CSTs wrapped in VIEW_CONVERT_EXPRs or about INTEGER_CSTs
wrapped in NON_LVALUE_EXPRs, because initializer_constant_valid_p_1 recurses on
both), nothing is changed.
The easiest fix is IMHO when we are picking something out of a CONSTRUCTOR add
the STRIP_ANY_LOCATION_WRAPPER there.

[Bug middle-end/92110] too many -Warray-bounds warnings for a loop buffer overflow

2019-10-16 Thread msebor at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92110

--- Comment #2 from Martin Sebor  ---
The __builtin_warning patch implements the per-location solution
(https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01015.html) but I think the more
aggregate approach would be more informative and so preferable.

Re: [PATCH] [MIPS] Remove unnecessary moves around dpadd and dpsub

2019-10-16 Thread Jeff Law

On 10/16/19 9:03 AM, Mihailo Stojanovic wrote:
> Unnecessary moves around dpadd and dpsub are caused by different pseudos
> being assigned to the input-output operands which correspond to the same
> register.
> 
> This forces the same pseudo to the input-output operands, which removes
> unnecessary moves.
> 
> Tested on mips-mti-linux-gnu.
> 
> gcc/ChangeLog:
> 
> * gcc/config/mips/mips.c (mips_expand_builtin_insn): Force the
> operands which correspond to the same input-output register to
> have the same pseudo assigned to them.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc/testsuite/gcc.target/mips/msa-dpadd-dpsub.c: New test.
Thanks.  Installed.
jeff

Re: [PATCH] Fix -Wshadow=local warnings in genautomata.c

2019-10-16 Thread Richard Sandiford

Sorry for the slow reply.

Bernd Edlinger  writes:
> Hi,
>
> this is probably on the border to obvious.
>
> The REGEXP_xxx macros in genautomata are invoked
> recursively, and the local values are all named _regexp
> and shadow each other.
>
>
> Fixed by using different names _regexp1..6 for each
> macro.

Sorry to repeat the complaint about numerical suffixes, but I think
we'd need better names.  E.g. _regexp_unit or _re_unit for REGEXP_UNIT
and similarly for the other macros.  But a similar fix to rtl.h might
be better.
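
To make the naming suggestion concrete, here is a cut-down sketch in GNU C. The struct layout and the body of `regexp_mode_check_failed` below are simplified stand-ins rather than the real genautomata.c definitions, and `first_unit_name` is a hypothetical helper. Each macro names its temporary after itself, so a nested use declares two different variables.

```c
#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-ins for the genautomata.c types.  */
enum regexp_mode { rm_unit, rm_sequence };

struct regexp;
struct unit_regexp { const char *name; };
struct sequence_regexp { int regexps_num; struct regexp **regexps; };

struct regexp
{
  enum regexp_mode mode;
  union
  {
    struct unit_regexp unit;
    struct sequence_regexp sequence;
  } regexp;
};

/* Stand-in checker: report the mismatch and abort.  */
static void
regexp_mode_check_failed (enum regexp_mode mode, const char *expected,
                          const char *file, int line, const char *func)
{
  fprintf (stderr, "%s:%d: %s: expected %s, have mode %d\n",
           file, line, func, expected, (int) mode);
  abort ();
}

#define REGEXP_UNIT(r) __extension__ \
(({ struct regexp *const _regexp_unit = (r); \
    if (_regexp_unit->mode != rm_unit) \
      regexp_mode_check_failed (_regexp_unit->mode, "rm_unit", \
                                __FILE__, __LINE__, __FUNCTION__); \
    &(_regexp_unit)->regexp.unit; }))

#define REGEXP_SEQUENCE(r) __extension__ \
(({ struct regexp *const _regexp_sequence = (r); \
    if (_regexp_sequence->mode != rm_sequence) \
      regexp_mode_check_failed (_regexp_sequence->mode, "rm_sequence", \
                                __FILE__, __LINE__, __FUNCTION__); \
    &(_regexp_sequence)->regexp.sequence; }))

/* Nested use: the inner expansion declares _regexp_sequence and the
   outer one _regexp_unit, so neither shadows the other.  */
static const char *
first_unit_name (struct regexp *seq)
{
  return REGEXP_UNIT (REGEXP_SEQUENCE (seq)->regexps[0])->name;
}
```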

Thanks,
Richard

> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> Is it OK for trunk?
>
>
> Thanks
> Bernd.
>
> 2019-10-04  Bernd Edlinger  
>
>   * genautomata.c (REGEXP_UNIT, REGEXP_RESERV, REGEXP_SEQUENCE,
>   REGEXP_REPEAT, REGEXP_ALLOF, REGEXP_ONEOF): Rename local vars.
>
> Index: gcc/genautomata.c
> ===
> --- gcc/genautomata.c (revision 276484)
> +++ gcc/genautomata.c (working copy)
> @@ -984,46 +984,46 @@ decl_mode_check_failed (enum decl_mode mode, const
>  
>  
>  #define REGEXP_UNIT(r) __extension__ \
> -(({ struct regexp *const _regexp = (r);  \
> - if (_regexp->mode != rm_unit)   \
> -   regexp_mode_check_failed (_regexp->mode, "rm_unit",   \
> +(({ struct regexp *const _regex1 = (r);  \
> + if (_regex1->mode != rm_unit)   \
> +   regexp_mode_check_failed (_regex1->mode, "rm_unit",   \
>  __FILE__, __LINE__, __FUNCTION__);   \
> - &(_regexp)->regexp.unit; }))
> + &(_regex1)->regexp.unit; }))
>  
>  #define REGEXP_RESERV(r) __extension__   \
> -(({ struct regexp *const _regexp = (r);  \
> - if (_regexp->mode != rm_reserv) \
> -   regexp_mode_check_failed (_regexp->mode, "rm_reserv", \
> +(({ struct regexp *const _regex2 = (r);  \
> + if (_regex2->mode != rm_reserv) \
> +   regexp_mode_check_failed (_regex2->mode, "rm_reserv", \
>  __FILE__, __LINE__, __FUNCTION__);   \
> - &(_regexp)->regexp.reserv; }))
> + &(_regex2)->regexp.reserv; }))
>  
>  #define REGEXP_SEQUENCE(r) __extension__ \
> -(({ struct regexp *const _regexp = (r);  \
> - if (_regexp->mode != rm_sequence)   \
> -   regexp_mode_check_failed (_regexp->mode, "rm_sequence",   \
> +(({ struct regexp *const _regex3 = (r);  \
> + if (_regex3->mode != rm_sequence)   \
> +   regexp_mode_check_failed (_regex3->mode, "rm_sequence",   \
>  __FILE__, __LINE__, __FUNCTION__);   \
> - &(_regexp)->regexp.sequence; }))
> + &(_regex3)->regexp.sequence; }))
>  
>  #define REGEXP_REPEAT(r) __extension__   \
> -(({ struct regexp *const _regexp = (r);  \
> - if (_regexp->mode != rm_repeat) \
> -   regexp_mode_check_failed (_regexp->mode, "rm_repeat", \
> +(({ struct regexp *const _regex4 = (r);  \
> + if (_regex4->mode != rm_repeat) \
> +   regexp_mode_check_failed (_regex4->mode, "rm_repeat", \
>  __FILE__, __LINE__, __FUNCTION__);   \
> - &(_regexp)->regexp.repeat; }))
> + &(_regex4)->regexp.repeat; }))
>  
>  #define REGEXP_ALLOF(r) __extension__  \
> -(({ struct regexp *const _regexp = (r);  \
> - if (_regexp->mode != rm_allof)  \
> -   regexp_mode_check_failed (_regexp->mode, "rm_allof",  \
> +(({ struct regexp *const _regex5 = (r);  \
> + if (_regex5->mode != rm_allof)  \
> +   regexp_mode_check_failed (_regex5->mode, "rm_allof",  \
>  __FILE__, __LINE__, __FUNCTION__);   \
> - &(_regexp)->regexp.allof; }))
> + &(_regex5)->regexp.allof; }))
>  
>  #define REGEXP_ONEOF(r) __extension__  \
> -(({ struct regexp *const _regexp = (r);  \
> - if (_regexp->mode != rm_oneof)  \
> -   regexp_mode_check_failed (_regexp->mode, "rm_oneof",  \
> +(({ struct regexp *const _regex6 = (r);

Re: [PATCH] handle local aggregate initialization in strlen, take 2 (PR 83821)

2019-10-16 Thread Jeff Law

On 10/14/19 6:23 PM, Martin Sebor wrote:
> When a subsequent element or member of a local aggregate containing
> a prior character array is initialized the strlen pass discards
> the length it computed for the prior element/member.  E.g., here:
> 
>   struct { char a[4], b[4]; } s = { "1", "12" };
> 
> even though strlen (s.b) is folded to 2, strlen (s.a) is not.  (Ditto
> for other stores even to members of other types.)  This causes hundreds
> (over 700 in GCC) to thousands (nearly 3,000 in Binutils/GDB and some
> 36,000 in the kernel) of previously computed string lengths to be
> discarded, which besides producing less than optimal code also defeats
> buffer overflow detection in such cases.
> 
> Attached is a resubmission of a previously approved patch that I never
> committed (the original had a bug that was noted during review that
> I subsequently fixed but I didn't remember to post the corrected patch).
> Tested on x86_64-linux.
> 
> Martin
> 
> gcc-83821.diff
> 
> PR tree-optimization/83821 - local aggregate initialization defeats strlen 
> optimization
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/83821
>   * tree-ssa-strlen.c (maybe_invalidate): Add argument.  Consider
>   the length of a string when available.
>   (handle_builtin_memset): Add argument.
>   (handle_store, strlen_check_and_optimize_call): Same.
>   (check_and_optimize_stmt): Same.  Pass it to callees.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/83821
>   * c-c++-common/Warray-bounds-4.c: Remove XFAIL.
>   * gcc.dg/strlenopt-80.c: New test.
>   * gcc.dg/strlenopt-81.c: Same.
>   * gcc.dg/strlenopt-82.c: Same.
>   * gcc.dg/strlenopt-83.c: Same.
>   * gcc.dg/strlenopt-84.c: Same.
>   * gcc.dg/tree-ssa/calloc-4.c: Same.
>   * gcc.dg/tree-ssa/calloc-5.c: Same.
OK.

Jeff
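
For readers following the PR 83821 description above, a minimal sketch of the case it names. The function names first_len and second_len are hypothetical. The code only shows the run-time values of the two lengths; it makes no claim about which calls a particular GCC version folds at compile time.

```c
#include <assert.h>
#include <string.h>

/* The aggregate from the patch description, given a tag so that it
   can be used in more than one function.  */
struct pair { char a[4], b[4]; };

/* Length of the first member.  Per the description, this is the call
   that was not folded, because the later store to s.b discarded the
   length computed for s.a.  */
size_t
first_len (void)
{
  struct pair s = { "1", "12" };
  return strlen (s.a);
}

/* Length of the second member, which the description says is folded
   to 2.  */
size_t
second_len (void)
{
  struct pair s = { "1", "12" };
  return strlen (s.b);
}
```

One way to observe the folding, assuming a GCC build with the patch applied, is to compile with optimization and check whether a call to strlen remains in the output for first_len.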

Re: [PATCH] handle string copies with non-constant lengths (PR 91996)

2019-10-16 Thread Jeff Law

On 10/15/19 3:24 PM, Martin Sebor wrote:
> The attached patch removes a FIXME added recently to the strlen
> pass as a reminder to extend the handling of multi-byte stores
> of characters copied from non-constant strings with constant
> lengths to strings with non-constant lengths in some known range.
> For the string length range information it relies on the EVRP
> instance introduced into the pass with the sprintf integration
> and so far only used by sprintf.  (This is just a small first
> step in generalizing the strlen pass to take advantage of strlen
> ranges.)
> 
> Tested on x86_64-linux.
> 
> Martin
> 
> gcc-91996.diff
> 
> PR tree-optimization/91996 - fold non-constant strlen relational expressions
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/91996
>   * gcc.dg/strlenopt-80.c: New test.
>   * gcc.dg/strlenopt-81.c: New test.
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/91996
>   * tree-ssa-strlen.c (maybe_warn_pointless_strcmp): Improve location
>   information.
>   (compare_nonzero_chars): Add an overload.
>   (count_nonzero_bytes): Add an argument.  Call overload above.
>   Handle non-constant lengths in some range.
>   (handle_store): Add an argument.
>   (check_and_optimize_stmt): Pass an argument to handle_store.
OK
jeff
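
A rough sketch of the kind of input the PR 91996 description above refers to: a multi-byte store of characters copied from a string whose length is not constant but is known to lie in a range. The function prefix_len and the bounds 2 and 7 are assumptions chosen for illustration and do not come from the patch or its tests. Whether a given GCC version folds the final strlen call is not asserted here.

```c
#include <assert.h>
#include <string.h>

/* Copy the first two characters of S into a local buffer when
   strlen (S) is known to be in [2, 7].  Returns the length of the
   copy, or -1 if the length of S is outside that range.  */
int
prefix_len (const char *s)
{
  char a[8];
  size_t n = strlen (s);

  if (n < 2 || n > 7)
    return -1;

  /* Two-byte store.  Both bytes are nonzero because n >= 2, even
     though the exact value of n is unknown.  */
  memcpy (a, s, 2);
  a[2] = '\0';

  /* Always 2 on this path.  */
  return (int) strlen (a);
}
```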

[PATCH] i386: Add clear_ratio to processor_costs

2019-10-16 Thread H.J. Lu

i386.h has

 #define CLEAR_RATIO(speed) ((speed) ? MIN (6, ix86_cost->move_ratio) : 2)

It is impossible to have CLEAR_RATIO > 6.  This patch adds clear_ratio
to processor_costs, sets it to the minimum of 6 and move_ratio in all
cost models and defines CLEAR_RATIO with clear_ratio.
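
The cap can be seen in a small sketch. The struct below is a hypothetical, reduced version of processor_costs with only the two fields involved, and clear_ratio_old and clear_ratio_new are illustrative functions standing in for the old and new CLEAR_RATIO definitions. The Lakemont values 17 and 6 are the ones in the patch; the "tuned" entry is invented to show what the new field allows.

```c
#include <assert.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical, reduced cost table with only the relevant fields.  */
struct costs
{
  int move_ratio;
  int clear_ratio;
};

/* Values from the lakemont_cost hunk: MOVE_RATIO 17, CLEAR_RATIO 6.  */
const struct costs lakemont_like = { 17, 6 };

/* Invented entry: a cost model that wants to clear more than 6
   words inline.  */
const struct costs tuned = { 17, 10 };

/* Old definition: the result can never exceed 6.  */
int
clear_ratio_old (const struct costs *c, int speed)
{
  return speed ? MIN (6, c->move_ratio) : 2;
}

/* New definition: the value comes straight from the cost table.  */
int
clear_ratio_new (const struct costs *c, int speed)
{
  return speed ? c->clear_ratio : 2;
}
```

Because the patch sets clear_ratio to MIN (6, move_ratio) in every existing cost model, the two definitions agree for all current tables; only a table that later picks a larger clear_ratio would behave differently.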

* config/i386/i386.h (processor_costs): Add clear_ratio.
(CLEAR_RATIO): Remove MIN and use ix86_cost->clear_ratio.
* config/i386/x86-tune-costs.h: Set clear_ratio to the minimum
of 6 and move_ratio in all cost models.

OK for trunk?

Thanks.

-- 
H.J.
From 266be647ebab13a461a91d6cdcb0f9c792a3714a Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Tue, 15 Oct 2019 08:12:47 -0700
Subject: [PATCH] i386: Add clear_ratio to processor_costs

i386.h has

 #define CLEAR_RATIO(speed) ((speed) ? MIN (6, ix86_cost->move_ratio) : 2)

It is impossible to have CLEAR_RATIO > 6.  This patch adds clear_ratio
to processor_costs, sets it to the minimum of 6 and move_ratio in all
cost models and defines CLEAR_RATIO with clear_ratio.

	* config/i386/i386.h (processor_costs): Add clear_ratio.
	(CLEAR_RATIO): Remove MIN and use ix86_cost->clear_ratio.
	* config/i386/x86-tune-costs.h: Set clear_ratio to the minimum
	of 6 and move_ratio in all cost models.
---
 gcc/config/i386/i386.h   |  4 +++-
 gcc/config/i386/x86-tune-costs.h | 24 
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 4e37336c7140..afa0aa83ddf3 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -291,6 +291,8 @@ struct processor_costs {
   const int large_insn;		/* insns larger than this cost more */
   const int move_ratio;		/* The threshold of number of scalar
    memory-to-memory move insns.  */
+  const int clear_ratio;	/* The threshold of number of scalar
+   memory clearing insns.  */
   const int int_load[3];	/* cost of loading integer registers
    in QImode, HImode and SImode relative
    to reg-reg move (2).  */
@@ -1947,7 +1949,7 @@ typedef struct ix86_args {
 /* If a clear memory operation would take CLEAR_RATIO or more simple
move-instruction sequences, we will do a clrmem or libcall instead.  */
 
-#define CLEAR_RATIO(speed) ((speed) ? MIN (6, ix86_cost->move_ratio) : 2)
+#define CLEAR_RATIO(speed) ((speed) ? ix86_cost->clear_ratio : 2)
 
 /* Define if shifts truncate the shift count which implies one can
omit a sign-extension or zero-extension of a shift count.
diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
index 8e6f4b5d3ea5..99816aeaebc1 100644
--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -81,6 +81,7 @@ struct processor_costs ix86_size_cost = {/* costs for tuning for size */
   COSTS_N_BYTES (3),			/* cost of movzx */
   0,	/* "large" insn */
   2,	/* MOVE_RATIO */
+  2,	/* CLEAR_RATIO */
   {2, 2, 2},/* cost of loading integer registers
 	   in QImode, HImode and SImode.
 	   Relative to reg-reg move (2).  */
@@ -185,6 +186,7 @@ struct processor_costs i386_cost = {	/* 386 specific costs */
   COSTS_N_INSNS (2),			/* cost of movzx */
   15,	/* "large" insn */
   3,	/* MOVE_RATIO */
+  3,	/* CLEAR_RATIO */
   {2, 4, 2},/* cost of loading integer registers
 	   in QImode, HImode and SImode.
 	   Relative to reg-reg move (2).  */
@@ -286,6 +288,7 @@ struct processor_costs i486_cost = {	/* 486 specific costs */
   COSTS_N_INSNS (2),			/* cost of movzx */
   15,	/* "large" insn */
   3,	/* MOVE_RATIO */
+  3,	/* CLEAR_RATIO */
   {2, 4, 2},/* cost of loading integer registers
 	   in QImode, HImode and SImode.
 	   Relative to reg-reg move (2).  */
@@ -389,6 +392,7 @@ struct processor_costs pentium_cost = {
   COSTS_N_INSNS (2),			/* cost of movzx */
   8,	/* "large" insn */
   6,	/* MOVE_RATIO */
+  6,	/* CLEAR_RATIO */
   {2, 4, 2},/* cost of loading integer registers
 	   in QImode, HImode and SImode.
 	   Relative to reg-reg move (2).  */
@@ -483,6 +487,7 @@ struct processor_costs lakemont_cost = {
   COSTS_N_INSNS (2),			/* cost of movzx */
   8,	/* "large" insn */
   17,	/* MOVE_RATIO */
+  6,	/* CLEAR_RATIO */
   {2, 4, 2},/* cost of loading integer registers
 	   in QImode, HImode and SImode.
 	   Relative to reg-reg move (2).  */
@@ -592,6 +597,7 @@ struct processor_costs pentiumpro_cost = {
   COSTS_N_INSNS (1),			/* cost of movzx */
   8,	/* "large" insn */
   6,	/* MOVE_RATIO */
+  6,	/* CLEAR_RATIO */
   {4, 4, 4},/* cost of loading integer registers
 	   in QImode, HImode and SImode.
 	   Relative to reg-reg move (2).  */
@@ -692,6 +698,7 @@ struct processor_costs geode_cost = {
   COSTS_N_INSNS (1),			/* cost of movzx */
   8,	/* "large" insn */
   4,	/* MOVE_RATIO */
+  4,	/* CLEAR_RATIO */
   {2, 2, 2},/* cost of loading integer registers
 	   in QImode, HImode and SImode.
 	   Relative to
