Re: [PATCH] Make strlen range computations more conservative

2018-10-23 Thread Maxim Kuvyrkov
Hi Jeff,
Hi Bernd,

This change (git commit d0eb64b248a9e40dfa633c4e4baebc3b238fd6eb / svn rev. 
263793) causes a segfault when building the Linux kernel for AArch64.  The exact 
configuration is:
===
git_repo[linux]=https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
git_branch[linux]=linux-4.14.y
git_repo[gcc]=git://gcc.gnu.org/git/gcc.git
git_branch[gcc]=master
linux_config=allmodconfig
===

The bisection artifacts point at this exact commit:
Parent commit 0584c3707994997f5dc9fa79732d01a53c25db6a can build 18083 objects.
This commit d0eb64b248a9e40dfa633c4e4baebc3b238fd6eb can build 18076 objects 
from the same linux tree.

The relevant error (see [1]):
==
during GIMPLE pass: dse
drivers/md/dm-mpath.c: In function 'multipath_init_per_bio_data':
drivers/md/dm-mpath.c:2032:1: internal compiler error: Segmentation fault
==

Full bisection artifacts are at [2].

Bernd, would you please investigate?

IMO, this should be easy to reproduce from the bisection logs, but let me know 
if it's not straightforward.  It is best to ping me on IRC (I'm maximk) or
follow up here.

FYI, there is another regression (either caused or unmasked by Kugan's gcc 
commit b88c25691cf8b153db44108935db871e1d40db89), but it appears orthogonal to 
this one.

[1] 
https://ci.linaro.org/view/tcwg_kernel-gnu/job/tcwg_kernel-bisect-gnu-master-aarch64-lts-allmodconfig/8/artifact/artifacts/build-d0eb64b248a9e40dfa633c4e4baebc3b238fd6eb/5-count_linux_objs/console.log/*view*/

[2] 
https://ci.linaro.org/view/tcwg_kernel-gnu/job/tcwg_kernel-bisect-gnu-master-aarch64-lts-allmodconfig/8/artifact/artifacts/

Regards,

--
Maxim Kuvyrkov
www.linaro.org



> On Aug 22, 2018, at 2:43 AM, Jeff Law  wrote:
> 
> [ I'm still digesting, but saw something in this that ought to be broken
> out... ]
> 
> On 08/19/2018 09:55 AM, Bernd Edlinger wrote:
>> diff -Npur gcc/tree-ssa-dse.c gcc/tree-ssa-dse.c
>> --- gcc/tree-ssa-dse.c   2018-07-18 21:21:34.0 +0200
>> +++ gcc/tree-ssa-dse.c   2018-08-19 14:29:32.344498771 +0200
>> @@ -248,6 +248,12 @@ compute_trims (ao_ref *ref, sbitmap live
>>   residual handling in mem* and str* functions is usually
>>   reasonably efficient.  */
>>   *trim_tail = last_orig - last_live;
>> +  /* Don't fold away an out of bounds access, as this defeats proper
>> + warnings.  */
>> +  if (*trim_tail
>> +  && compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (ref->base)),
>> +   last_orig) <= 0)
>> +*trim_tail = 0;
>> }
>>   else
>> *trim_tail = 0;
> This seems like a good change in and of itself and should be able to go
> forward without further review work.   Consider this hunk approved,
> along with any testsuite you have which tickles this code (I didn't
> immediately see one attached to this patch.  But I could have missed it).
> 
> Jeff
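
The effect of the approved hunk can be sketched outside the compiler with
plain integers.  This is only an illustration, assuming last_orig and
last_live are byte offsets into the store; tail_trim and object_size are
hypothetical names, not part of tree-ssa-dse.c:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the tail-trim computation.  LAST_ORIG is the
   offset of the last byte written by the original store, LAST_LIVE the
   offset of the last byte that is still live, and OBJECT_SIZE the size in
   bytes of the object being stored to.  */
size_t
tail_trim (size_t last_orig, size_t last_live, size_t object_size)
{
  assert (last_live <= last_orig);

  size_t trim = last_orig - last_live;

  /* Mirror of the new check: when the store reaches past the end of the
     object (object_size <= last_orig), leave it untrimmed so that a later
     out-of-bounds warning still sees the full access.  */
  if (trim != 0 && object_size <= last_orig)
    trim = 0;

  return trim;
}
```

With a 16-byte object, a store ending at offset 7 with the last live byte
at offset 3 is trimmed by 4 bytes, while a store ending at offset 19 is
left alone.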



Re: [PATCH] detect attribute mismatches in alias declarations (PR 81824)

2018-10-23 Thread Martin Sebor

On 10/23/2018 03:53 PM, Joseph Myers wrote:

On Mon, 22 Oct 2018, Martin Sebor wrote:


between aliases and ifunc resolvers.  With -Wattribute-alias=1
that reduced the number of unique instances of the warnings for
a Glibc build to just 27.  Of those, all but one of
the -Wattributes instances are of the form:

  warning: ‘leaf’ attribute has no effect on unit local functions


What do the macro expansions look like there?  All the places where you're
adding "copy" attributes are for extern declarations, not static ones,
whereas your list of warnings seems to indicate this is appearing for
ifunc resolvers (which are static, but should not be copying attributes
from anywhere).


These must have been caused by the bug in the patch (below).
They have cleared up with it fixed.  I'm down to just 18
instances of a -Wmissing-attributes warning, all for string
functions.  The cause of those is described below.




All the -Wmissing-attributes instances are due to a missing
nonnull attribute on the __EI__ kinds of functions, like:

  warning: ‘__EI_vfprintf’ specifies less restrictive attribute than its
target ‘vfprintf’: ‘nonnull’


That looks like a bug in the GCC patch to me; you appear to be adding copy
attributes in the correct place.  Note that __EI_* gets declared twice
(first with __asm__, second with an alias attribute), so anything related
to handling of such duplicate declarations might be a cause for such a
bug (and an indication of what you need to add a test for when fixing such
a bug).


There was a bug in the patch, but there is also an issue in Glibc
that made it tricky to see the problem.

The tests I had in place were too simple to catch the GCC bug:
the problem there was that when the decl didn't have an attribute
that the type of the "template" did, the check would fail without
also considering the decl's type.  Tricky stuff!  I've added tests
to exercise this.

The Glibc issue has to do with the use of __hidden_ver1 macro
to declare string functions.  sysdeps/x86_64/multiarch/strcmp.c
for instance has:

  __hidden_ver1 (strcmp, __GI_strcmp, __redirect_strcmp)
__attribute__ ((visibility ("hidden")));

and __redirect_strcmp is missing the nonnull attribute because
it's #undefined in include/sys/cdefs.h.  An example of one of
these warnings is attached.

Using strcmp instead of __redirect_strcmp would solve this but
__redirect_strcmp should have all the same attributes as strcmp.
But nonnull is removed from the declaration because the __nonnull
macro that controls it is undefined in include/sys/cdefs.h.  There
is a comment above the #undef in the header that reads:

/* The compiler will optimize based on the knowledge the parameter is
   not NULL.  This will omit tests.  A robust implementation cannot allow
   this so when compiling glibc itself we ignore this attribute.  */
# undef __nonnull
# define __nonnull(params)

I don't think this is actually true for recent versions of GCC.
The nonnull optimization is controlled by
-fisolate-erroneous-paths-attribute and according to the manual
and common.opt the option is disabled by default.

But if you do want to avoid the attribute on declarations of
these functions regardless it should be safe to add it after
the declaration in the .c file, like so:

__hidden_ver1 (strcmp, __GI_strcmp, __redirect_strcmp)
  __attribute__ ((visibility ("hidden"), copy (strcmp)));

That should make it straightforward to adopt the enhancement
and experiment with -Wattribute-alias=2 to see if it does what
you had in mind.
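
For readers who want to try the pattern in isolation, here is a minimal
sketch of an alias that copies its target's attributes.  The names
my_strcmp, my_strcmp_alias and COPY_ATTRS are made up for illustration
and are not Glibc macros; the sketch assumes an ELF target (needed for
the alias attribute) and falls back to no copy attribute on compilers
that don't provide it:

```c
#include <assert.h>
#include <string.h>

/* Attribute copy only exists in newer GCC releases; expand to nothing
   elsewhere so the sketch still compiles.  */
#if defined __has_attribute
# if __has_attribute (copy)
#  define COPY_ATTRS(target) __attribute__ ((copy (target)))
# endif
#endif
#ifndef COPY_ATTRS
# define COPY_ATTRS(target)
#endif

/* Hypothetical target function declared with attribute nonnull.  */
__attribute__ ((nonnull (1, 2)))
int
my_strcmp (const char *a, const char *b)
{
  return strcmp (a, b);
}

/* Alias for the target.  Where copy is available it is asked to take
   over the target's attributes instead of repeating them by hand.  */
extern __typeof__ (my_strcmp) my_strcmp_alias
  __attribute__ ((alias ("my_strcmp"))) COPY_ATTRS (my_strcmp);
```

Calling my_strcmp_alias behaves exactly like calling my_strcmp; the
only difference the copy attribute makes is in which attributes the
compiler associates with the alias declaration.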

The latest GCC patch with the fix mentioned above is attached.

Martin
PR middle-end/81824 - Warn for missing attributes with function aliases

gcc/c-family/ChangeLog:

	PR middle-end/81824
	* c-attribs.c (handle_copy_attribute_impl): New function.
	(handle_copy_attribute): Same.

gcc/cp/ChangeLog:

	PR middle-end/81824
	* pt.c (warn_spec_missing_attributes): Move code to attribs.c.
	Call decls_mismatched_attributes.

gcc/ChangeLog:

	PR middle-end/81824
	* attribs.c (has_attribute): New helper function.
	(decls_mismatched_attributes, maybe_diag_alias_attributes): Same.
	* attribs.h (decls_mismatched_attributes): Declare.
	* cgraphunit.c (handle_alias_pairs): Call maybe_diag_alias_attributes.
	(maybe_diag_incompatible_alias): Use OPT_Wattribute_alias_.
	* common.opt (-Wattribute-alias): Take an argument.
	(-Wno-attribute-alias): New option.
	* doc/extend.texi (Common Function Attributes): Document copy.
	(Common Variable Attributes): Same.
	* doc/invoke.texi (-Wmissing-attributes): Document enhancement.
	(-Wattribute-alias): Document new option argument.

libgomp/ChangeLog:

	PR c/81824
	* libgomp.h (strong_alias, ialias, ialias_redirect): Use attribute
	copy.

gcc/testsuite/ChangeLog:

	PR middle-end/81824
	* gcc.dg/Wattribute-alias.c: New test.
	* gcc.dg/Wmissing-attributes.c: New test.
	* gcc.dg/attr-copy.c: New test.
	* gcc.dg/attr-copy-2.c: New test.
	* gcc.dg/attr-copy-3.c: New test.
	* gcc.dg/attr-copy-4.c: New test.

diff --git a/gcc/attribs.c 

[PATCH libquadmath/PR68686]

2018-10-23 Thread Ed Smith-Rowland

Greetings,

This is an almost trivial patch to get the correct sign for tgammaq.

I don't have a testcase as I don't know where to put one.

OK?

Ed Smith-Rowland


2018-10-24  Edward Smith-Rowland  <3dw...@verizon.net>

	PR libquadmath/68686
	* math/tgammaq.c: Correct sign for negative argument.
Index: libquadmath/math/tgammaq.c
===
--- libquadmath/math/tgammaq.c	(revision 265345)
+++ libquadmath/math/tgammaq.c	(working copy)
@@ -47,7 +47,9 @@
 /* x == -Inf.  According to ISO this is NaN.  */
 return x - x;
 
-  /* XXX FIXME.  */
   res = expq (lgammaq (x));
-  return signbitq (x) ? -res : res;
+  if (x > 0.0Q || ((int)(-x) & 1) == 1)
+return res;
+  else
+return -res;
 }
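
The sign rule in the patch can be checked on its own.  The gamma function
is negative on (-1, 0), positive on (-2, -1), and keeps alternating between
consecutive negative integers, which is what the parity test on (int)(-x)
captures.  A minimal sketch in double precision follows; gamma_sign is a
hypothetical helper, not part of libquadmath, and it assumes x is finite
and not a negative integer:

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: return +1 or -1, the sign of gamma(x), using the
   same test as the patch.  */
int
gamma_sign (double x)
{
  /* The cast to int is only defined while -x fits in an int; the sketch
     makes no attempt to handle larger magnitudes.  */
  assert (x > -(double) INT_MAX);

  if (x > 0.0 || ((int) (-x) & 1) == 1)
    return 1;
  return -1;
}
```

For example, gamma_sign (-0.5) is -1 and gamma_sign (-1.5) is +1.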


Re: [PATCH] ux.texi: move "Quoting" and "Fix-it hints" from DiagnosticsGuidelines wiki page

2018-10-23 Thread Martin Sebor

On 10/23/2018 02:42 PM, David Malcolm wrote:

I want to move material from
  https://gcc.gnu.org/wiki/DiagnosticsGuidelines
into the new User Experience Guidelines chapter of our internals
documentation.  I've already updated the link in that wiki page to point
to the pertinent HTML build of the docs:
  https://gcc.gnu.org/onlinedocs/gccint/Guidelines-for-Diagnostics.html

This patch does it for the "Quoting" section, and adds a note about
fix-it hints that would make the wiki page's "Fix-it hints" section
redundant.

Martin and Manu: can you confirm you wrote this wiki material, and that
it's OK to add it to the GCC docs (I don't know what license the wiki
is under).  Are all such changes OK from a licensing perspective, for
material you contributed to the GCC wiki?


I did add some brief text about quoting to the Wiki.  Now that
we have guidelines for these things in the manual I think it makes
perfect sense to move stuff we all agree with there.  Go for it!

Martin


gcc/ChangeLog:
* doc/ux.texi (Quoting): New subsection, adapted from material at
https://gcc.gnu.org/wiki/DiagnosticsGuidelines written by
MartinSebor and ManuelLopezIbanez.
(Fix-it hints): Note that fix-it hints shouldn't be marked for
translation.
---
 gcc/doc/ux.texi | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/gcc/doc/ux.texi b/gcc/doc/ux.texi
index 9185f68..1061aa0 100644
--- a/gcc/doc/ux.texi
+++ b/gcc/doc/ux.texi
@@ -384,6 +384,38 @@ of the @code{auto_diagnostic_group} are related.  (Currently it doesn't
 do anything with this information, but we may implement that in the
 future).

+@subsection Quoting
+Text should be quoted by either using the @samp{q} modifier in a directive
+such as @samp{%qE}, or by enclosing the quoted text in a pair of @samp{%<}
+and @samp{%>} directives, and never by using explicit quote characters.
+The directives handle the appropriate quote characters for each language
+and apply the correct color or highlighting.
+
+The following elements should be quoted in GCC diagnostics:
+
+@itemize @bullet
+@item
+Language keywords.
+@item
+Tokens.
+@item
+Boolean, numerical, character, and string constants that appear in the
+source code.
+@item
+Identifiers, including function, macro, type, and variable names.
+@end itemize
+
+Other elements such as numbers that do not refer to numeric constants that
+appear in the source code should not be quoted. For example, in the message:
+
+@smallexample
+argument %d of %qE must be a pointer type
+@end smallexample
+
+@noindent
+since the argument number does not refer to a numerical constant in the
+source code it should not be quoted.
+
 @subsection Spelling and Terminology

 See the @uref{https://gcc.gnu.org/codingconventions.html#Spelling
@@ -401,6 +433,9 @@ can also be viewed via @option{-fdiagnostics-generate-patch} and
 @option{-fdiagnostics-parseable-fixits}.  With the latter, an IDE
 ought to be able to offer to automatically apply the suggested fix.

+Fix-it hints contain code fragments, and thus they should not be marked
+for translation.
+
 Fix-it hints can be added to a diagnostic by using a @code{rich_location}
 rather than a @code{location_t} - the fix-it hints are added to the
 @code{rich_location} using one of the various @code{add_fixit} member





Re: [PATCH], PowerPC: Use f128 for long double built-ins if we have changed to use IEEE 128-bit floating point

2018-10-23 Thread Michael Meissner
On Tue, Oct 23, 2018 at 10:22:41PM +, Joseph Myers wrote:
> On Tue, 23 Oct 2018, Michael Meissner wrote:
> 
> > 2018-10-23  Michael Meissner  
> > 
> > * config/rs6000/rs6000.c (TARGET_MANGLE_DECL_ASSEMBLER_NAME):
> > Define as rs6000_mangle_decl_assembler_name.
> > (rs6000_mangle_decl_assembler_name): If the user switched from IBM
> > long double to IEEE long double, switch the names of the long
> > double built-in functions to be f128 instead of l.
> 
> [This is the issue discussed at the Cauldron of how to get the 
> redirections correct in any case that does not involve including the 
> standard  to get the function declarations.]
> 
> My understanding was that __ieee128 aliases would be added to glibc 
> (indeed, the relevant names are present in 
> sysdeps/ieee754/ldbl-128ibm-compat/Versions in glibc, albeit without 
> actually being enabled for powerpc64le yet), for two reasons: (a) to be 
> namespace-clean for standard C, and (b) because a few libm functions (and 
> many libc functions) have no _Float128 analogues but are still part of the 
> API for long double (e.g. the nexttoward functions, where nexttowardf, 
> nexttoward also need different versions for IEEE 128-bit long double, or 
> scalbl, which is somewhat obsolescent but still supported).
> 
> Now, you can't use the __ieee128 names with *current* glibc because 
> they aren't exported yet.  So is the plan that GCC would later switch to 
> using the __ieee128 names when available based on TARGET_GLIBC_MAJOR 
> and TARGET_GLIBC_MINOR (as more namespace-correct, and available for some 
> functions without f128 variants), with use of f128 only a 
> fallback for older glibc versions lacking __ieee128?

I suspect the timing may not be right for GCC 9, since I tend to use the
Advance Toolchain to get newer glibc's, and they are on 2.28.  I just tried it,
and as you said, they aren't exported yet.

However, I am somewhat reluctant to do patches if I don't have a glibc that
properly exports them.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797



[committed] Fix minor H8 bug exposed by recent combiner changes

2018-10-23 Thread Jeff Law


This has been latent since H8/SX support went in like 15 years ago...

The recent combiner changes twiddled the set of registers we need to
save sometimes.  No big deal, except for a minor bug in the H8/S and H8/SX
support for stm.

On the H8/S (but not the SX) there are restrictions on the alignment of
the set of consecutive registers to save (or restore in the case of
ldm).  For example, pairs have to start on an even register.

The code had it backwards -- it had the H8/S with no restrictions, but
restrictions on the H8/SX.

This caused builds in my tester to fail to build newlib for the H8/S.

Fixed thusly.  Installing on the trunk.

Jeff
* config/h8300/h8300.c (h8300_expand_prologue): Fix stm generation
for H8/S.

diff --git a/gcc/config/h8300/h8300.c b/gcc/config/h8300/h8300.c
index 596f2fd2cda..24b7485602f 100644
--- a/gcc/config/h8300/h8300.c
+++ b/gcc/config/h8300/h8300.c
@@ -865,15 +865,15 @@ h8300_expand_prologue (void)
  if (TARGET_H8300S)
{
  /* See how many registers we can push at the same time.  */
- if ((!TARGET_H8300SX || (regno & 3) == 0)
+ if ((TARGET_H8300SX || (regno & 3) == 0)
  && ((saved_regs >> regno) & 0x0f) == 0x0f)
n_regs = 4;
 
- else if ((!TARGET_H8300SX || (regno & 3) == 0)
+ else if ((TARGET_H8300SX || (regno & 3) == 0)
   && ((saved_regs >> regno) & 0x07) == 0x07)
n_regs = 3;
 
- else if ((!TARGET_H8300SX || (regno & 1) == 0)
+ else if ((TARGET_H8300SX || (regno & 1) == 0)
   && ((saved_regs >> regno) & 0x03) == 0x03)
n_regs = 2;
}
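
As a standalone illustration of the corrected condition, the register-count
selection can be written as a small function.  This is a sketch only:
stm_reg_count and is_h8sx are hypothetical names, and the fallback of one
register when no group fits is an assumption, since the hunk does not show
the default:

```c
#include <assert.h>

/* Hypothetical stand-in for the selection logic in the hunk.  SAVED_REGS
   is a bit mask of registers to save, REGNO the first register considered.
   Returns how many consecutive registers one stm could push.  */
int
stm_reg_count (int is_h8sx, unsigned int saved_regs, int regno)
{
  /* Keep the shifts well defined.  */
  assert (regno >= 0 && regno < 32);

  /* On the H8/SX any starting register is accepted; on the H8/S groups of
     four and three must start at a multiple of four, pairs on an even
     register.  */
  if ((is_h8sx || (regno & 3) == 0)
      && ((saved_regs >> regno) & 0x0f) == 0x0f)
    return 4;

  if ((is_h8sx || (regno & 3) == 0)
      && ((saved_regs >> regno) & 0x07) == 0x07)
    return 3;

  if ((is_h8sx || (regno & 1) == 0)
      && ((saved_regs >> regno) & 0x03) == 0x03)
    return 2;

  /* Assumed default: push a single register.  */
  return 1;
}
```

With registers 1 and 2 to be saved (mask 0x06) and regno 1, the H8/S case
yields 1 because a pair may not start on an odd register, while the H8/SX
case yields 2.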


Re: [PATCH] combine: Do not combine moves from hard registers

2018-10-23 Thread Segher Boessenkool
Hi Christophe,

On Tue, Oct 23, 2018 at 03:25:55PM +0200, Christophe Lyon wrote:
> On Tue, 23 Oct 2018 at 14:29, Segher Boessenkool
>  wrote:
> > On Tue, Oct 23, 2018 at 12:14:27PM +0200, Christophe Lyon wrote:
> > > I have noticed many regressions on arm and aarch64 between 265366 and
> > > 265408 (this commit is 265398).
> > >
> > > I bisected at least one to this commit on aarch64:
> > > FAIL: gcc.dg/ira-shrinkwrap-prep-1.c scan-rtl-dump ira "Split
> > > live-range of register"
> > > The same test also regresses on arm.
> >
> > Many targets also fail gcc.dg/ira-shrinkwrap-prep-2.c; these tests fail
> > when random things in the RTL change, apparently.

This is PR87708 now.

> > > For a whole picture of all the regressions I noticed during these two
> > > commits, have a look at:
> > > http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/265408/report-build-info.html
> >
> > No thanks.  I am not going to click on 111 links and whatever is behind
> > those.  Please summarise, like, what was the diff in test_summary, and
> > then dig down into individual tests if you want.  Or whatever else works
> > both for you and for me.  This doesn't work for me.
> 
> OK this is not very practical for me either. There were 25 commits between
> the two validations being compared,
> 25-28 gcc tests regressed on aarch64, depending on the exact target
> 177-206 gcc tests regressed on arm*, 7-29 gfortran regressions on arm*
> so I would have to run many bisects to make sure every regression is
> caused by the same commit.

So many, ouch!  I didn't realise.

> Since these are all automated builds with everything discarded after
> computing the regressions, it's quite time consuming to re-run the
> tests manually on my side (probably at least as much as it is for you).

Running arm tests is very painful for me.  But you say this is on aarch64
as well, I didn't realise that either; aarch64 should be easy to test,
we have many reasonable aarch64 machines in the cfarm.

> I know this doesn't answer your question, but I thought you could run aarch64
> tests easily and that it would be more efficient for the project if you did
> it directly, without waiting for me to provide hardly any more information.

Well, I'm not too familiar with aarch64, so if you can say "this Z is a
pretty simple test that should do X but now does Y" that would be a huge
help :-)

> Maybe this will answer your question better:
> List of aarch64-linux-gnu regressions:
> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/265408/aarch64-none-linux-gnu/diff-gcc-rh60-aarch64-none-linux-gnu-default-default-default.txt
> List of arm-none-linux-gnueabihf regressions:
> (gcc) 
> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/265408/arm-none-linux-gnueabihf/diff-gcc-rh60-arm-none-linux-gnueabihf-arm-cortex-a9-neon-fp16.txt
> (gfortran) 
> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/265408/arm-none-linux-gnueabihf/diff-gfortran-rh60-arm-none-linux-gnueabihf-arm-cortex-a9-neon-fp16.txt

That may help yes, thanks!

> To me it just highlights again that we need a validation system easier to
> work with when we break something on a target we are not familiar with.

OTOH a patch like this is likely to break many target-specific tests, and
that should not prevent committing it imnsho.  If it actively breaks things,
then of course it shouldn't go in as-is, or if it breaks bootstrap, etc.

> I run post-commit validations as finely grained as possible with the CPU
> resources I have access to, that's not enough and I think having a
> developer-accessible gerrit+jenkins-like system would be very valuable
> to test patches before commit. We have a prototype in Linaro, not
> production-ready. But I guess that would be worth another
> discussion thread :)

Yeah...  One when people have time for it ;-)


Segher


Re: [PATCH], PowerPC: Use f128 for long double built-ins if we have changed to use IEEE 128-bit floating point

2018-10-23 Thread Joseph Myers
On Tue, 23 Oct 2018, Michael Meissner wrote:

> 2018-10-23  Michael Meissner  
> 
>   * config/rs6000/rs6000.c (TARGET_MANGLE_DECL_ASSEMBLER_NAME):
>   Define as rs6000_mangle_decl_assembler_name.
>   (rs6000_mangle_decl_assembler_name): If the user switched from IBM
>   long double to IEEE long double, switch the names of the long
>   double built-in functions to be f128 instead of l.

[This is the issue discussed at the Cauldron of how to get the 
redirections correct in any case that does not involve including the 
standard  to get the function declarations.]

My understanding was that __ieee128 aliases would be added to glibc 
(indeed, the relevant names are present in 
sysdeps/ieee754/ldbl-128ibm-compat/Versions in glibc, albeit without 
actually being enabled for powerpc64le yet), for two reasons: (a) to be 
namespace-clean for standard C, and (b) because a few libm functions (and 
many libc functions) have no _Float128 analogues but are still part of the 
API for long double (e.g. the nexttoward functions, where nexttowardf, 
nexttoward also need different versions for IEEE 128-bit long double, or 
scalbl, which is somewhat obsolescent but still supported).

Now, you can't use the __ieee128 names with *current* glibc because 
they aren't exported yet.  So is the plan that GCC would later switch to 
using the __ieee128 names when available based on TARGET_GLIBC_MAJOR 
and TARGET_GLIBC_MINOR (as more namespace-correct, and available for some 
functions without f128 variants), with use of f128 only a 
fallback for older glibc versions lacking __ieee128?

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Relocation (= move+destroy)

2018-10-23 Thread Jonathan Wakely

On 23/10/18 23:17 +0200, Marc Glisse wrote:

On Tue, 23 Oct 2018, Jonathan Wakely wrote:

What depends on C++14 here? Just enable_if_t? Because we have
__enable_if_t for use in C++11.

Both GCC and Clang will allow constexpr-if and static_assert with no
message in C++11.


Probably it can be enabled in C++11 if you think that matters. I'll 
admit that I personally don't care at all about C++11, and the main 
motivation would be to enable a cleanup if we stop supporting C++03 (I 
am not very optimistic).


Me neither.

But even if enabling cleanup some day is unlikely, I think that for
this code the only change you'd need is to replace enable_if_t with
__enable_if_t and we get C++11 support for free, so we might as well
do it.


+  template
+inline void
+__relocate_a(_Tp* __dest, _Up* __orig, _Allocator& __alloc)


I find it a little surprising that this overload for single objects
uses the memmove argument ordering (dest, source) but the range overload
below uses the STL ordering (source_begin, source_end, dest).

But I wouldn't be surprised if we're already doing that somewhere that
I've forgotten about.

Would it make sense to either rename this overload, or to use
consistent argument ordering for the two __relocate_a overloads?


The functions were not meant as overloads, it just happened that I 
arrived at the same name for both, but it would make perfect sense to 
give them different names. I started from __relocate(dest, source) for 
one element, and later added an allocator to it. The other one 
corresponds to __uninitialized_move_a, and naming it 
__uninitialized_relocate_a would be silly since "uninitialized" is 
included in the definition of relocate.


Yes, my first thought was to add "uninitialized" and I rejected it for
that same reason.

I think I'd rather rename than change the order. Do you have 
suggestions? __relocate_range_a?


I was thinking the single object one could be __relocate_1_a with the
_1 being like copy_n, fill_n etc. but that's going to be confusing
with __relocate_a_1 instead!

It seems unfortunate to have to put "range" in the name when no other
algos that work on ranges bother to say that in the name.

Maybe the single object one could be __relocate_single_a? Or
__do_relocate_a? __relocate_obj_a? None of them really makes me happy.


+ noexcept(noexcept(__gnu_cxx::__alloc_traits<_Allocator>::construct(__alloc,


Since this is C++14 (or maybe C++11) you could just use
std::allocator_traits directly. __gnu_cxx::__alloc_traits is to
provide equivalent functionality in C++98 code.


Thanks, I was wondering what it was for.


It's just so I could write code in containers that works the same for
C++11 allocators and C++98 allocators, always using __alloc_traits. In
C++98 it just forwards everything straight to the allocator, because
C++98 allocators are required to define all the members themselves.



Re: [PATCH] add simple attribute introspection

2018-10-23 Thread Martin Sebor

On 10/22/2018 04:08 PM, Jason Merrill wrote:

On 10/13/18 8:19 PM, Martin Sebor wrote:

+  oper = cp_parser_type_id (parser);
+  parser->in_type_id_in_expr_p = saved_in_type_id_in_expr_p;
+
+  if (cp_parser_parse_definitely (parser))
+{
+  /* If all went well, set OPER to the type.  */
+  cp_decl_specifier_seq decl_specs;
+
+  /* Build a trivial decl-specifier-seq.  */
+  clear_decl_specs (&decl_specs);
+  decl_specs.type = oper;
+
+  /* Call grokdeclarator to figure out what type this is.  */
+  oper = grokdeclarator (NULL,
+ &decl_specs,
+ TYPENAME,
+ /*initialized=*/0,
+ /*attrlist=*/NULL);
+}


Doesn't grokdeclarator here give you back the same type you already had
from cp_parser_type_id?  The latter already calls grokdeclarator.

I don't know why cp_parser_sizeof_operand does this, either.  Try
removing it from both places?


You're right, the call in cp_parser_has_attribute_expression
was unnecessary.  cp_parser_sizeof_operand still needs it.




+  /* Consume the comma if it's there.  */
+  if (!cp_parser_require (parser, CPP_COMMA, RT_COMMA))
+{
+  parens.require_close (parser);


I think you want cp_parser_skip_to_closing_parenthesis for error
recovery, rather than require_close.


Thanks, the error messages look slightly better that way (there
are fewer of them), although still not as good as in C or other
compilers in some cases.




+  if (tree attr = cp_parser_gnu_attribute_list (parser, /*exactly_one=*/true))
+{
+  if (oper != error_mark_node)
+{
+  /* Fold constant expressions used in attributes first.  */
+  cp_check_const_attributes (attr);
+
+  /* Finally, see if OPER has been declared with ATTR.  */
+  ret = has_attribute (atloc, oper, attr, default_conversion);
+}
+}
+  else
+{
+  error_at (atloc, "expected identifier");
+  cp_parser_skip_to_closing_parenthesis (parser, true, false, false);
+}
+
+  parens.require_close (parser);


I think the require_close should be in the valid case, since *skip*
already consumes a closing paren.


Ah, I need to make it consume the paren by passing true as the last
argument.  With that it works.




+is valuated.  The @var{type-or-expression} argument is subject to the same


evaluated


Thanks for the review.

Attached is an updated patch with the fixes above.

Martin

gcc/c/ChangeLog:

	* c-parser.c (c_parser_has_attribute_expression): New function.
	(c_parser_attribute): New function.
	(c_parser_attributes): Move code into c_parser_attribute.
	(c_parser_unary_expression): Handle RID_HAS_ATTRIBUTE_EXPRESSION.

gcc/c-family/ChangeLog:

	* c-attribs.c (type_for_vector_size): New function.
	(type_valid_for_vector_size): Same.
	(handle_vector_size_attribute): Move code to the functions above
	and call them.
	(validate_attribute, has_attribute): New functions.
	* c-common.h (has_attribute): Declare.
	(rid): Add RID_HAS_ATTRIBUTE_EXPRESSION.
	* c-common.c (c_common_resword): Same.

gcc/cp/ChangeLog:

	* cp-tree.h (cp_check_const_attributes): Declare.
	* decl2.c (cp_check_const_attributes): Declare extern.
	* parser.c (cp_parser_has_attribute_expression): New function.
	(cp_parser_unary_expression): Handle RID_HAS_ATTRIBUTE_EXPRESSION.
	(cp_parser_gnu_attribute_list): Add argument.

gcc/ChangeLog:

	* doc/extend.texi (Other Builtins): Add __builtin_has_attribute.

gcc/testsuite/ChangeLog:

	* c-c++-common/builtin-has-attribute-2.c: New test.
	* c-c++-common/builtin-has-attribute-3.c: New test.
	* c-c++-common/builtin-has-attribute-4.c: New test.
	* c-c++-common/builtin-has-attribute.c: New test.
	* gcc.dg/builtin-has-attribute.c: New test.
	* gcc/testsuite/gcc.target/i386/builtin-has-attribute.c: New test.

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 3a88766..c0a1bb5 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -3128,34 +3128,11 @@ handle_deprecated_attribute (tree *node, tree name,
   return NULL_TREE;
 }
 
-/* Handle a "vector_size" attribute; arguments as in
-   struct attribute_spec.handler.  */
-
+/* Return the "base" type from TYPE that is suitable to apply attribute
+   vector_size to by stripping arrays, function types, etc.  */
 static tree
-handle_vector_size_attribute (tree *node, tree name, tree args,
-			  int ARG_UNUSED (flags),
-			  bool *no_add_attrs)
+type_for_vector_size (tree type)
 {
-  unsigned HOST_WIDE_INT vecsize, nunits;
-  machine_mode orig_mode;
-  tree type = *node, new_type, size;
-
-  *no_add_attrs = true;
-
-  size = TREE_VALUE (args);
-  if (size && TREE_CODE (size) != IDENTIFIER_NODE
-  && TREE_CODE (size) != FUNCTION_DECL)
-size = default_conversion (size);
-
-  if (!tree_fits_uhwi_p (size))
-{
-  warning (OPT_Wattributes, "%qE attribute ignored", name);
-  return NULL_TREE;
-}
-
-  /* Get the vector size (in bytes).  */
-  vecsize = tree_to_uhwi (size);
-
   /* We need to provide for 

Re: [PATCH] detect attribute mismatches in alias declarations (PR 81824)

2018-10-23 Thread Joseph Myers
On Mon, 22 Oct 2018, Martin Sebor wrote:

> between aliases and ifunc resolvers.  With -Wattribute-alias=1
> that reduced the number of unique instances of the warnings for
> a Glibc build to just 27.  Of those, all but one of
> the -Wattributes instances are of the form:
> 
>   warning: ‘leaf’ attribute has no effect on unit local functions

What do the macro expansions look like there?  All the places where you're 
adding "copy" attributes are for extern declarations, not static ones, 
whereas your list of warnings seems to indicate this is appearing for 
ifunc resolvers (which are static, but should not be copying attributes 
from anywhere).

> All the -Wmissing-attributes instances are due to a missing
> nonnull attribute on the __EI__ kinds of functions, like:
> 
>   warning: ‘__EI_vfprintf’ specifies less restrictive attribute than its
> target ‘vfprintf’: ‘nonnull’

That looks like a bug in the GCC patch to me; you appear to be adding copy 
attributes in the correct place.  Note that __EI_* gets declared twice 
(first with __asm__, second with an alias attribute), so anything related 
to handling of such duplicate declarations might be a cause for such a 
bug (and an indication of what you need to add a test for when fixing such 
a bug).

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH], PowerPC: Use f128 for long double built-ins if we have changed to use IEEE 128-bit floating point

2018-10-23 Thread Michael Meissner
On Tue, Oct 23, 2018 at 04:18:55PM -0500, Segher Boessenkool wrote:
> Hi Mike,
> 
> On Tue, Oct 23, 2018 at 04:12:11PM -0400, Michael Meissner wrote:
> > This patch changes the name used by the l built-in functions that return
> > or are passed long double if the long double type is changed from the current
> > IBM long double format to the IEEE 128-bit format.
> > 
> > I have done a bootstrap and make check with no regressions on a little endian
> > power8 system.  Is it ok to check into the trunk?  This will need to be back
> > ported to the GCC 8.x branch.
> 
> Could you test it on the usual assortment of systems instead of just one?
> BE 7 and 8, LE 8 and 9.  Or wait for possible fallout :-)

Ok.

> A backport to 8 is wanted yes, but please wait as usual.  It's fine after
> a week or so of no problems.

Yep.

> 
> > +/* On 64-bit Linux and Freebsd systems, possibly switch the long double library
> > +   function names from l to f128 if the default long double type is
> > +   IEEE 128-bit.  Typically, with the C and C++ languages, the standard math.h
> > +   include file switches the names on systems that support long double as IEEE
> > +   128-bit, but that doesn't work if the user uses __builtin_l directly or
> > +   if they use Fortran.  Use the TARGET_MANGLE_DECL_ASSEMBLER_NAME hook to
> > +   change this name.  We only do this if the default is long double is not IEEE
> > +   128-bit, and the user asked for IEEE 128-bit.  */
> 
> s/default is/default/
> 
> Does this need some synchronisation with the libm headers?  I guess things
> will just work out, but it is desirable if libm stops doing this with
> compilers that have this change?

It should just work out assuming you are using a recent enough GLIBC.  The
patch is more for when you aren't using headers.

> > +static tree
> > +rs6000_mangle_decl_assembler_name (tree decl, tree id)
> > +{
> > +  if (!TARGET_IEEEQUAD_DEFAULT && TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128
> 
> Write this in the opposite order?
>   if (TARGET_LONG_DOUBLE_128 && TARGET_IEEEQUAD && !TARGET_IEEEQUAD_DEFAULT

Because !TARGET_IEEEQUAD_DEFAULT is a constant test.  If you are on a system
that defaults to IEEE 128-bit, the whole code gets deleted.  I would hope the
tests still get deleted if it occurs later in the test, but I tend to put the
things that can be optimized at compile time first.

> > +{
> > +  size_t len = IDENTIFIER_LENGTH (id);
> > +  const char *name = IDENTIFIER_POINTER (id);
> > +
> > +  if (name[len-1] == 'l')
> > +   {
> > + bool has_long_double_p = false;
> > + tree type = TREE_TYPE (decl);
> > + machine_mode ret_mode = TYPE_MODE (type);
> > +
> > + /* See if the function returns long double or long double
> > +complex.  */
> > + if (ret_mode == TFmode || ret_mode == TCmode)
> > +   has_long_double_p = true;
> 
> This comment is a bit misleading I think?  The code checks if it is the
> same mode as would be used for long double, not if that is the actual
> asked-for type.  The code is fine AFAICS, the comment isn't so great
> though.

Well the long double type is 'TFmode'.  Though _Float128 does get mapped to
TFmode instead of KFmode also.  But explicit f128 built-ins won't go through
here, since they don't end in 'l'.  I'm just trying to avoid things like CLZL
that take long arguments and not long double.
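To make the renaming rule concrete, here is a minimal stand-alone sketch in C++.  It is not the GCC hook: mangle_long_double_builtin and its uses_long_double flag are hypothetical stand-ins for the TFmode/TCmode checks on the return type and arguments.  It only shows the suffix rewrite and why a name like clzl is left alone.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of GCC.  `uses_long_double` stands in for
// the check that the return type or some argument has the long double mode.
std::string mangle_long_double_builtin(const std::string& name,
                                       bool uses_long_double)
{
  // Only names ending in 'l' are candidates, and only if long double
  // is actually involved (so clzl, which takes long, is skipped).
  if (name.empty() || name.back() != 'l' || !uses_long_double)
    return name;

  // Drop the trailing 'l' and append "f128".  The result is 3 characters
  // longer than the input, which is why the patch allocates len + 4 bytes
  // (len - 1 copied characters, 4 for "f128", 1 for the terminator).
  return name.substr(0, name.size() - 1) + "f128";
}
```

With this sketch, "sinl" with long double involved becomes "sinf128", while "clzl" and an explicit "sinf128" come back unchanged.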

> > + else
> > +   {
> > + function_args_iterator args_iter;
> > + tree arg;
> > +
> > + /* See if we have a long double or long double complex
> > +argument.  */
> 
> And same here.
> 
> > + FOREACH_FUNCTION_ARGS (type, arg, args_iter)
> > +   {
> > + machine_mode arg_mode = TYPE_MODE (arg);
> > + if (arg_mode == TFmode || arg_mode == TCmode)
> > +   {
> > + has_long_double_p = true;
> > + break;
> > +   }
> > +   }
> > +   }
> > +
> > + /* If we have long double, change the name.  */
> 
> And this.
> 
> > + if (has_long_double_p)
> > +   {
> > + char *name2 = (char *) alloca (len + 4);
> > + memcpy (name2, name, len-1);
> 
> len - 1

Ok.

> Okay for trunk with those things fixed.  Thanks!
> 
> 
> Segher
> 

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797



[rfc rs6000] troubles with gimple folding for vec_sel

2018-10-23 Thread Will Schmidt
Hi all, 
I've been attempting to get early gimple-folding to work with the
vec_sel intrinsic for powerpc, and I've run into a snag or two such that
I'm not sure how to best proceed.  Code snippet is below, followed by a
description of the issues as I interpret them.

Apologies for the ramble, Thanks in advance, ...  :-)

-Will

---8<---
/* vector selects */
/* d = vec_sel (a, b, c)
Each bit of the result vector (d) has the value of
the corresponding bit of (a) if the corresponding bit
of (c) is 0. Otherwise, each bit of the result vector
has the value of the corresponding bit of (b).  */
case ALTIVEC_BUILTIN_VSEL_16QI:
case ALTIVEC_BUILTIN_VSEL_8HI:
case ALTIVEC_BUILTIN_VSEL_4SI:
case ALTIVEC_BUILTIN_VSEL_2DI:
case ALTIVEC_BUILTIN_VSEL_4SF:
case ALTIVEC_BUILTIN_VSEL_2DF:
{
tree cond_tree = gimple_call_arg (stmt, 2);
tree then_tree = gimple_call_arg (stmt, 0);
tree else_tree = gimple_call_arg (stmt, 1);
lhs = gimple_call_lhs (stmt);
location_t loc = gimple_location (stmt);
gimple_seq stmts = NULL;
   tree truth_cond_tree_type
 = build_same_sized_truth_vector_type (TREE_TYPE(cond_tree));
   tree truth_cond_tree
 = gimple_convert (&stmts, loc, truth_cond_tree_type, cond_tree);
   gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
   g = gimple_build_assign (lhs, VEC_COND_EXPR, truth_cond_tree,
then_tree, else_tree);
   gimple_set_location (g, gimple_location (stmt));
   gsi_replace (gsi, g, true);
   return true;
}
---8<---

First issue (easier?) - The above code snippet works for the
existing powerpc/fold-vec-sel-* testcases, except that when comparing
before and after codegen, we end up with an extra pair of instructions
(vspltisw and vcmpgtsh ~~ splat zero, compare).  This appears to be due
to taking the "Fake op0 < 0" path in optabs.c:expand_vec_cond_expr().
I've not fully exhausted my debug on that front, but mention it in case
this is something obvious.  And, this is probably minor with respect to
the second issue.

Second issue - This works for the simple tests, so seems like the
implementation should be close to correct, but triggers an ICE
when trying to build some of the pre-existing tests.  After some
investigation, this appears to be specific to those tests that have
non-variable values for the condition vector.

For instance, our powerpc/altivec-32.c testcase contains: 
  unsigned k = 1;
  a = (vector unsigned) { 0, 0, 0, 1 };
  b = c = (vector unsigned) { 0, 0, 0, 0 };
  a = vec_add (a, vec_splats (k));
  b = vec_add (b, a);
  c = vec_sel (c, a, b);

which ends up as...
c = vec_sel ({0,0,0,0}, {1,1,1,2}, {1,1,1,2});
Our condition vector is the last argument, so this gets rearranged a bit
when we build the vec_cond_expr, and we eventually get (at gimple time)

  _1 = VEC_COND_EXPR <{ 1(OVF), 1(OVF), 1(OVF), 2(OVF) }, { 0, 0, 0, 0 }, { 1, 1, 1, 2 }>;

And we subsequently ICE (at expand time) when we hit a gcc_unreachable()
in expr.c const_vector_mask_from_tree() when we try to map that '2(OVF)'
value to a (boolean) zero or minus_one, and fail.

during RTL pass: expand
dump file: altivec-34.c.230r.expand
/home/willschm/gcc/gcc-mainline-regtest_patches/gcc/testsuite/gcc.target/powerpc/altivec-34.c: In function ‘foo’:
/home/willschm/gcc/gcc-mainline-regtest_patches/gcc/testsuite/gcc.target/powerpc/altivec-34.c:21:6: internal compiler error: in const_vector_mask_from_tree, at expr.c:12247
0x1059fba7 const_vector_mask_from_tree
/home/willschm/gcc/gcc-mainline-regtest_patches/gcc/expr.c:12247

Ultimately, this seems like an impasse between the vec_sel intrinsic
being a bit-wise operation, and the truth_vector being of a boolean
type.  But I'm not certain.
I had at an earlier time tried to implement vec_sel() using a mix of
BIT_NOT_EXPR, BIT_AND_EXPR, BIT_IOR_EXPR, but ended up with some
incredibly horrible codegen (which would be a no-go).  I may need to
revisit that.
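
For illustration, the bit-wise definition from the vec_sel comment can be modelled on a single 32-bit lane as below.  This is a rough sketch, not GCC code: sel_bits and is_boolean_lane are made-up names, and the second helper assumes that a lane is only usable as a boolean mask when it is all zeros or all ones.

```cpp
#include <cassert>
#include <cstdint>

// d = vec_sel (a, b, c) on one lane: take the bit from a where the
// corresponding bit of c is 0, otherwise take the bit from b.
std::uint32_t sel_bits(std::uint32_t a, std::uint32_t b, std::uint32_t c)
{
  return (a & ~c) | (b & c);
}

// Assumption for this sketch: a lane can stand in for a boolean
// true/false only if every bit agrees (all zeros or all ones).
bool is_boolean_lane(std::uint32_t c)
{
  return c == 0u || c == ~std::uint32_t{0};
}
```

A lane value of 2 is a perfectly good bit-wise selector (sel_bits (0, 2, 2) is 2), but it is not a boolean lane in the sense above, which seems to match the mismatch described here.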

Thanks,
-Will





[PATCH, committed] Remove self from write after approval

2018-10-23 Thread Iain Buclaw
As I'm now listed under Language Front Ends Maintainers.

Regards
--
Iain

---
ChangeLog:

2018-10-23  Iain Buclaw  

* MAINTAINERS (Write After Approval): Remove myself.

---
diff --git a/MAINTAINERS b/MAINTAINERS
index 55c4663f4d2..d9ecc9f5580 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -332,7 +332,6 @@ Joel Brobecker	
 Dave Brolley	
 Julian Brown	
 Christian Bruel	
-Iain Buclaw	
 Kevin Buettner	
 Adam Butcher	
 Andrew Cagney	


Re: [PATCH], PowerPC: Use f128 for long double built-ins if we have changed to use IEEE 128-bit floating point

2018-10-23 Thread Segher Boessenkool
Hi Mike,

On Tue, Oct 23, 2018 at 04:12:11PM -0400, Michael Meissner wrote:
> This patch changes the name used by the built-in functions ending in 'l'
> that return or are passed long double if the long double type is changed
> from the current IBM long double format to the IEEE 128-bit format.
> 
> I have done a bootstrap and make check with no regressions on a little endian
> power8 system.  Is it ok to check into the trunk?  This will need to be back
> ported to the GCC 8.x branch.

Could you test it on the usual assortment of systems instead of just one?
BE 7 and 8, LE 8 and 9.  Or wait for possible fallout :-)

A backport to 8 is wanted yes, but please wait as usual.  It's fine after
a week or so of no problems.


> +/* On 64-bit Linux and Freebsd systems, possibly switch the long double library
> +   function names from l to f128 if the default long double type is
> +   IEEE 128-bit.  Typically, with the C and C++ languages, the standard math.h
> +   include file switches the names on systems that support long double as IEEE
> +   128-bit, but that doesn't work if the user uses __builtin_l directly or
> +   if they use Fortran.  Use the TARGET_MANGLE_DECL_ASSEMBLER_NAME hook to
> +   change this name.  We only do this if the default is long double is not IEEE
> +   128-bit, and the user asked for IEEE 128-bit.  */

s/default is/default/

Does this need some synchronisation with the libm headers?  I guess things
will just work out, but it is desirable if libm stops doing this with
compilers that have this change?

> +static tree
> +rs6000_mangle_decl_assembler_name (tree decl, tree id)
> +{
> +  if (!TARGET_IEEEQUAD_DEFAULT && TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128

Write this in the opposite order?
  if (TARGET_LONG_DOUBLE_128 && TARGET_IEEEQUAD && !TARGET_IEEEQUAD_DEFAULT

> +{
> +  size_t len = IDENTIFIER_LENGTH (id);
> +  const char *name = IDENTIFIER_POINTER (id);
> +
> +  if (name[len-1] == 'l')
> + {
> +   bool has_long_double_p = false;
> +   tree type = TREE_TYPE (decl);
> +   machine_mode ret_mode = TYPE_MODE (type);
> +
> +   /* See if the function returns long double or long double
> +  complex.  */
> +   if (ret_mode == TFmode || ret_mode == TCmode)
> + has_long_double_p = true;

This comment is a bit misleading I think?  The code checks if it is the
same mode as would be used for long double, not if that is the actual
asked-for type.  The code is fine AFAICS, the comment isn't so great
though.

> +   else
> + {
> +   function_args_iterator args_iter;
> +   tree arg;
> +
> +   /* See if we have a long double or long double complex
> +  argument.  */

And same here.

> +   FOREACH_FUNCTION_ARGS (type, arg, args_iter)
> + {
> +   machine_mode arg_mode = TYPE_MODE (arg);
> +   if (arg_mode == TFmode || arg_mode == TCmode)
> + {
> +   has_long_double_p = true;
> +   break;
> + }
> + }
> + }
> +
> +   /* If we have long double, change the name.  */

And this.

> +   if (has_long_double_p)
> + {
> +   char *name2 = (char *) alloca (len + 4);
> +   memcpy (name2, name, len-1);

len - 1


Okay for trunk with those things fixed.  Thanks!


Segher


Re: Debug unordered containers code cleanup

2018-10-23 Thread Jonathan Wakely

On 23/10/18 22:35 +0200, François Dumont wrote:

On 10/23/2018 11:52 AM, Jonathan Wakely wrote:

On 22/10/18 22:45 +0200, François Dumont wrote:

I plan to commit the attached patch this week if not told otherwise.


Looks good.


This is to generalize usage of C++11 direct initialization in 
unordered containers.


It also avoids a number of safe iterator instantiations.


Would the following patch also make sense?

--- a/libstdc++-v3/include/debug/safe_unordered_container.h
+++ b/libstdc++-v3/include/debug/safe_unordered_container.h
@@ -66,18 +66,18 @@ namespace __gnu_debug
  void
  _M_invalidate_locals()
  {
-   auto __local_end = _M_cont()._M_base().end(0);
+   auto __local_end = _M_cont()._M_base().cend(0);
   this->_M_invalidate_local_if(
- [__local_end](__decltype(_M_cont()._M_base().cend(0)) __it)
+   [__local_end](__decltype(__local_end) __it)
   { return __it != __local_end; });
  }

  void
  _M_invalidate_all()
  {
-   auto __end = _M_cont()._M_base().end();
+   auto __end = _M_cont()._M_base().cend();
   this->_M_invalidate_if(
-   [__end](__decltype(_M_cont()._M_base().cend()) __it)
+   [__end](__decltype(__end) __it)
   { return __it != __end; });
   _M_invalidate_locals();
  }
@@ -92,7 +92,7 @@ namespace __gnu_debug

  /** Invalidates all local iterators @c x that reference this container,
 are not singular, and for which @c __pred(x) returns @c
- true. @c __pred will be invoked with the normal ilocal iterators
+ true. @c __pred will be invoked with the normal local iterators
 nested in the safe ones. */
  template
   void




Yes, looks like a nice cleanup too.

I'll integrate it in mine and commit all the changes once tests are 
completed.


Thanks.
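
The idiom in the patch above, naming the lambda's parameter type with decltype on the captured variable instead of repeating the expression, can be tried outside the debug containers.  The following is only an illustrative sketch: count_non_end is a hypothetical function and uses a plain container rather than the safe-iterator machinery.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Hypothetical helper: counts the iterators of c that differ from cend(),
// using the same predicate shape as _M_invalidate_all in the patch.
template<typename Container>
std::size_t count_non_end(const Container& c)
{
  auto end = c.cend();
  // The parameter type is taken from the captured variable itself.
  auto pred = [end](decltype(end) it) { return it != end; };

  std::size_t n = 0;
  for (auto it = c.cbegin(); it != end; ++it)
    if (pred(it))
      ++n;
  // The end iterator itself must not satisfy the predicate.
  if (pred(end))
    ++n;
  return n;
}
```

Because the const iterator is used for both the variable and the lambda parameter, only one iterator type is involved.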



Re: Relocation (= move+destroy)

2018-10-23 Thread Marc Glisse

On Tue, 23 Oct 2018, Jonathan Wakely wrote:


CCing gcc-patches


It seems to have disappeared somehow during the discussion, sorry.


The tricky stuff in  all looks right, I only have
some comments on the __relocate_a functions ...



Index: libstdc++-v3/include/bits/stl_uninitialized.h
===
--- libstdc++-v3/include/bits/stl_uninitialized.h   (revision 265289)
+++ libstdc++-v3/include/bits/stl_uninitialized.h   (working copy)
@@ -872,14 +872,75 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
uninitialized_move_n(_InputIterator __first, _Size __count,
 _ForwardIterator __result)
{
  auto __res = std::__uninitialized_copy_n_pair
(_GLIBCXX_MAKE_MOVE_ITERATOR(__first),
 __count, __result);
  return {__res.first.base(), __res.second};
}
#endif

+#if __cplusplus >= 201402L


What depends on C++14 here? Just enable_if_t? Because we have
__enable_if_t for use in C++11.

Both GCC and Clang will allow constexpr-if and static_assert with no
message in C++11.


Probably it can be enabled in C++11 if you think that matters. I'll admit 
that I personally don't care at all about C++11, and the main motivation 
would be to enable a cleanup if we stop supporting C++03 (I am not very 
optimistic).



+  template
+inline void
+__relocate_a(_Tp* __dest, _Up* __orig, _Allocator& __alloc)


I find it a little surprising that this overload for single objects
uses the memmove argument ordering (dest, source) but the range overload
below uses the STL ordering (source_begin, source_end, dest).

But I wouldn't be surprised if we're already doing that somewhere that
I've forgotten about.

Would it make sense to either rename this overload, or to use
consistent argument ordering for the two __relocate_a overloads?


The functions were not meant as overloads, it just happened that I arrived 
at the same name for both, but it would make perfect sense to give them 
different names. I started from __relocate(dest, source) for one element, 
and later added an allocator to it. The other one corresponds to 
__uninitialized_move_a, and naming it __uninitialized_relocate_a would be 
silly since "uninitialized" is included in the definition of relocate.


I think I'd rather rename than change the order. Do you have suggestions? 
__relocate_range_a?


+ noexcept(noexcept(__gnu_cxx::__alloc_traits<_Allocator>::construct(__alloc,


Since this is C++14 (or maybe C++11) you could just use
std::allocator_traits directly. __gnu_cxx::__alloc_traits is to
provide equivalent functionality in C++98 code.


Thanks, I was wondering what it was for.


+__dest, std::move(*__orig)))
+&& noexcept(__gnu_cxx::__alloc_traits<_Allocator>::destroy(
+   __alloc, std::__addressof(*__orig))))
+{
+  typedef __gnu_cxx::__alloc_traits<_Allocator> __traits;
+  __traits::construct(__alloc, __dest, std::move(*__orig));
+  __traits::destroy(__alloc, std::__addressof(*__orig));
+}
+
+  template
+struct __is_trivially_relocatable
+: is_trivial<_Tp> { };


It might be worth adding a comment that this type might be specialized
in future, so that I don't forget and simplify it to an alias template
later :-)


Ok.
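
As a rough illustration of the single-object relocation discussed above (move-construct at the destination, then destroy the source), here is a stand-alone sketch written against std::allocator_traits as suggested.  It is not the libstdc++ code: relocate_object is a hypothetical name, and it keeps the (dest, source) argument order of the patch.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical sketch: relocate *orig into the raw storage at dest.
// After the call, *dest is alive and *orig has been destroyed.
template<typename T, typename Alloc>
void relocate_object(T* dest, T* orig, Alloc& alloc)
  noexcept(noexcept(std::allocator_traits<Alloc>::construct(
                      alloc, dest, std::move(*orig)))
           && noexcept(std::allocator_traits<Alloc>::destroy(
                         alloc, std::addressof(*orig))))
{
  using traits = std::allocator_traits<Alloc>;
  traits::construct(alloc, dest, std::move(*orig));
  traits::destroy(alloc, std::addressof(*orig));
}
```

The caller remains responsible for allocating the destination storage and for deallocating the source storage afterwards.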

--
Marc Glisse


Re: [patch, fortran] Implement FINDLOC

2018-10-23 Thread Thomas Koenig

On 23.10.18 at 18:16, Dominique d'Humières wrote:





Anyway, the attached patch fixes this,


It now gives the error

4 |integer, parameter :: I_FINDLOC_BACK(1) = findloc([1,1],1, &
   |1
Error: transformational intrinsic 'findloc' at (1) is not permitted in an initialization expression


That error message was misleading, the new one now has

Error: Parameter 'x' at (1) has not been declared or is a variable, which does not reduce to a constant expression



The following test

program logtest3
implicit none
! !
! *** Everything depends on this parameter ***!

integer, parameter :: A1 = 2
logical :: L
L = transfer(A1,L)
call sub(L)
end program logtest3

subroutine sub(x)
implicit none
logical x
integer a(1)
character(*), parameter :: strings(2) = ['.TRUE. ','.FALSE.']

a = findloc([1,1],1,mask=[x,.TRUE.])
write(*,'(a)') 'Value by FINDLOC(MASK): '// &
   trim(strings(a(1)))
a = findloc([1,1],1,back=x)
write(*,'(a)') 'Value by FINDLOC(BACK): '// &
   trim(strings(3-a(1)))

end subroutine sub

does not link:

 8 |L = transfer(A1,L)
   |   1
Warning: Assigning value other than 0 or 1 to LOGICAL has undefined result at (1)
Undefined symbols for architecture x86_64:
   "__gfortran_findloc0_i4", referenced from:
   _sub_ in ccnoLKfH.o
   "__gfortran_mfindloc0_i4", referenced from:
   _sub_ in ccnoLKfH.o
ld: symbol(s) not found for architecture x86_64
collect2: error: ld returned 1 exit status


Ah, I didn't include the newly generated files in the previous patch.
Now included.
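
For readers less familiar with the intrinsic, the rank-one behaviour that the MASK and BACK calls in the test above exercise can be modelled roughly as follows.  This is a sketch in C++, not libgfortran code: findloc1 is a made-up name, and it assumes the mask has the same length as the array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical rank-one model of FINDLOC.  Returns the 1-based position
// of the first element equal to value whose mask entry is true, scanning
// from the end when back is set.  Returns 0 if nothing matches.
// Assumes mask.size() == array.size().
std::size_t findloc1(const std::vector<int>& array, int value,
                     const std::vector<bool>& mask, bool back)
{
  const std::size_t n = array.size();
  for (std::size_t k = 0; k < n; ++k)
    {
      const std::size_t i = back ? n - 1 - k : k;
      if (mask[i] && array[i] == value)
        return i + 1;
    }
  return 0;
}
```

For the array [1,1] and value 1, the result is 1 or 2 depending on the first mask entry or on BACK, which is what lets the test index into the two-element strings array.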



Finally the line before the end of findloc_6.f90 should be

   if (findloc(ch,"CC ",dim=1,mask=false) /= 0) stop 23


Changed, also the whitespace fixes that Bernhard mentioned.

So, I think this should be clear for trunk now.  I will supply
the documentation later.

Regards

Thomas
! { dg-do compile }
! Test errors in findloc.
program main
  integer, dimension(4) :: a
  logical, dimension(3) :: msk
  a = [2,4,6,8]
  print *,findloc(a) ! { dg-error "Missing actual argument" }
  print *,findloc(a,value=.true.) ! { dg-error "must be in type conformance to argument" }
  print *,findloc(a,23,dim=6) ! { dg-error "is not a valid dimension index" }
  print *,findloc(a,-42,dim=2.0) ! { dg-error "must be INTEGER" }
  print *,findloc(a,6,msk) ! { dg-error "Different shape for arguments 'array' and 'mask'" }
  print *,findloc(a,6,kind=98) ! { dg-error "Invalid kind for INTEGER" }
end program main
! { dg-do run }
! Various tests with findloc.
program main
  implicit none
  real, dimension(2,2) :: a, b
  integer, dimension(2,3) :: c
  logical, dimension(2,2) :: lo
  integer, dimension(:), allocatable :: e
  a = reshape([1.,2.,3.,4.], shape(a))
  b = reshape([1.,2.,1.,2.], shape(b))

  lo = .true.

  if (any(findloc(a, 5.) /= [0,0])) stop 1
  if (any(findloc(a, 5., back=.true.) /= [0,0])) stop 2
  if (any(findloc(a, 2.) /= [2,1])) stop 2
  if (any(findloc(a, 2. ,back=.true.) /= [2,1])) stop 3

  if (any(findloc(a,3.,mask=lo) /= [1,2])) stop 4
  if (any(findloc(a,3,mask=.true.) /= [1,2])) stop 5
  lo(1,2) = .false.
  if (any(findloc(a,3.,mask=lo) /= [0,0])) stop 6
  if (any(findloc(b,2.) /= [2,1])) stop 7
  if (any(findloc(b,2.,back=.true.) /= [2,2])) stop 8
  if (any(findloc(b,1.,mask=lo,back=.true.) /= [1,1])) stop 9
  if (any(findloc(b,1.,mask=.false.) /= [0,0])) stop 10

  c = reshape([1,2,2,2,-9,6], shape(c))
  if (any(findloc(c,value=2,dim=1) /= [2,1,0])) stop 11
  if (any(findloc(c,value=2,dim=2) /= [2,1])) stop 12
end program main
! { dg-do run }
! Various tests with findloc with character variables.
program main
  character(len=2) :: a(3,3), c(3,3), d(3,4)
  character(len=3) :: b(3,3)
  integer :: ret(2)
  integer :: i,j
  character(len=3) :: s
  logical :: lo
  logical, dimension(3,4) :: msk
  data a /"11", "21", "31", "12", "22", "32", "13", "23", "33" /
  data b /"11 ", "21 ", "31 ", "12 ", "22 ", "32 ", "13 ", "23 ", "33 " /
  if (any(findloc(a,"11 ") /= [1,1])) stop 1
  ret = findloc(b,"31")
  do j=1,3
 do i=1,3
write(unit=s,fmt='(2I1," ")') i,j
ret = findloc(b,s)
if (b(ret(1),ret(2)) /= s) stop 2
 end do
  end do

  if (any(findloc(b(::2,::2),"13") /= [1,2])) stop 3

  do j=1,3
do i=1,3
  write(unit=c(i,j),fmt='(I2)') 2+i-j
end do
  end do

  if (any(findloc(c," 1") /= [1,2])) stop 4
  if (any(findloc(c," 1", back=.true.) /= [2,3])) stop 5
  if (any(findloc(c," 1", back=.true., mask=.false.) /= [0,0])) stop 6

  lo = .true.
  if (any(findloc(c," 2", dim=1) /= [1,2,3])) stop 7
  if (any(findloc(c," 2",dim=1,mask=lo) /= [1,2,3])) stop 8

  if (any(findloc(c," 2", dim=1,back=.true.) /= [1,2,3])) stop 9
  if (any(findloc(c," 2",dim=1,mask=lo,back=.true.) /= [1,2,3])) stop 10
  do j=1,4
 do i=1,3
if (j<= i) then
   d(i,j) = "AA"
else
   d(i,j) 

[PATCH, committed] Add self as maintainer of D front-end and libphobos

2018-10-23 Thread Iain Buclaw
David Edelsohn  wrote:
>
> I am pleased to announce that the GCC Steering Committee has
> accepted the D Language front-end and runtime for inclusion in GCC
> and appointed Iain Buclaw as maintainer.
>
> The patches still require approval by a Global Reviewer.
>
> Please join me in congratulating Iain on his new role.
> Please update your listing in the MAINTAINERS file.
>

Hi,

This adds myself as a maintainer of the D front-end and libphobos
runtime library.

Sending this now as it looks like the patch series has been given the
OK (thanks to Richard finding the time).

Thanks
Iain

---
ChangeLog:

2018-10-23  Iain Buclaw  

* MAINTAINERS: Add myself as D front-end and libphobos maintainer.

---
diff --git a/MAINTAINERS b/MAINTAINERS
index 0d6c81d4af6..55c4663f4d2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -150,6 +150,7 @@ BRIG (HSAIL) front end	Pekka Jääskeläinen	
 BRIG (HSAIL) front end 	Martin Jambor		
 c++			Jason Merrill		
 c++			Nathan Sidwell		
+D front end		Iain Buclaw		
 go			Ian Lance Taylor	
 objective-c/c++		Mike Stump		
 objective-c/c++		Iain Sandoe		
@@ -175,6 +176,7 @@ libquadmath		Jakub Jelinek		
 libvtv			Caroline Tice		
 libhsail-rt		Pekka Jääskeläinen	
 libhsail-rt		Martin Jambor		
+libphobos		Iain Buclaw		
 line map		Dodji Seketeli		
 soft-fp			Joseph Myers		
 scheduler (+ haifa)	Jim Wilson		


Re: Debug unordered containers code cleanup

2018-10-23 Thread François Dumont

On 10/23/2018 11:52 AM, Jonathan Wakely wrote:

On 22/10/18 22:45 +0200, François Dumont wrote:

I plan to commit the attached patch this week if not told otherwise.


Looks good.


This is to generalize usage of C++11 direct initialization in 
unordered containers.


It also avoids a number of safe iterator instantiations.


Would the following patch also make sense?

--- a/libstdc++-v3/include/debug/safe_unordered_container.h
+++ b/libstdc++-v3/include/debug/safe_unordered_container.h
@@ -66,18 +66,18 @@ namespace __gnu_debug
  void
  _M_invalidate_locals()
  {
-   auto __local_end = _M_cont()._M_base().end(0);
+   auto __local_end = _M_cont()._M_base().cend(0);
   this->_M_invalidate_local_if(
- [__local_end](__decltype(_M_cont()._M_base().cend(0)) __it)
+   [__local_end](__decltype(__local_end) __it)
   { return __it != __local_end; });
  }

  void
  _M_invalidate_all()
  {
-   auto __end = _M_cont()._M_base().end();
+   auto __end = _M_cont()._M_base().cend();
   this->_M_invalidate_if(
-   [__end](__decltype(_M_cont()._M_base().cend()) __it)
+   [__end](__decltype(__end) __it)
   { return __it != __end; });
   _M_invalidate_locals();
  }
@@ -92,7 +92,7 @@ namespace __gnu_debug

  /** Invalidates all local iterators @c x that reference this container,
 are not singular, and for which @c __pred(x) returns @c
- true. @c __pred will be invoked with the normal ilocal iterators
+ true. @c __pred will be invoked with the normal local iterators
 nested in the safe ones. */
  template
   void




Yes, looks like a nice cleanup too.

I'll integrate it in mine and commit all the changes once tests are 
completed.


François



Re: [RFC] GCC support for live-patching

2018-10-23 Thread Nicolai Stange
Hi Qing,

Qing Zhao  writes:

> thanks a lot for your detailed explanation of the source based live patch 
> creation procedure.
> really interesting and helpful information. 
>
> more questions and comments below:
>
>>> 
>>> One question here,  what’s the major benefit to prepare the patches 
>>> manually? 
>> 
>> There is none. We here at SUSE prefer the source based approach (as
>> opposed to binary diff) for a number of reasons and the manual live
>> patch creation is simply a consequence of not having any tooling for
>> this yet.
>> 
>> 
>> For reference, source based live patch creation involves the following
>> steps:
>> 
>> 1. Determine the initial set of to be patched functions:
>>   a.) Inspect the upstream diff for the fix in question, add
>>   any touched functions to the initial set.
>>   b.) For each function in the initial set, check whether it has been
>>   inlined/cloned/optimized and if so, add all its callers to the
>>   initial set.  Repeat until the initial set has stabilized.
>> 
>> 2. Copy & paste the initial set over to the new live patch sources.
>> 
>> 3. Make it compile, i.e. recursively copy any needed cpp macro, type, or
>>   function definition and add references to data objects with static
>>   storage duration.
>>   The rules are:
>>   a.) For data objects with static storage duration, a reference to the
>>   original must always be made. (If the symbol is EXPORT()ed, then
>>   fine. Otherwise, for kGraft, this involves a kallsyms lookup at
>>   patch module load time, for upstream kernel live patching, this
>>   has been solved with those '.klp' relocations).
>>   b.) If a called function is available as a symbol from either vmlinux
>>   or some (usually the patched) module, do not copy the definition,
>>   but add a reference to it, just as in a.).
>>   c.) If a type, cpp macro or (usually inlined) function is provided by
>>   some "public" header in /include/, include that
>>   rather than copying the definition.  Counterexample: Non-public
>>   header outside of include/ like
>>   e.g. /fs/btrfs/qgroup.h.
>>   d.) Otherwise copy the definition to the live patch module sources.
>> 
>> Rule 3b is not strictly necessary, but it helps in reducing the live
>> patch code size which is a factor with _manual_ live patch creation.
>> 
>> For 1b.), we need help from GCC. Namely, we want to know when some
>> function has been optimized and we want it to disable any of those IPA
>> optimizations it (currently) isn't capable of reporting properly.
>
> Yes, this is the place where GCC can help, and it’s also the motivation 
> for this current proposal.
>
>> 
>> Step 3.) is a bit tedious sometimes TBH and yes, w/o any tooling in
>> place, patch size would be a valid point. However, I'm currently working
>> on that and I'm optimistic that I'll have a working prototype soon.
>
> So, currently, step 3 is done manually?

Currently yes.
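
As an aside on step 1b of the quoted procedure: it is a fixed-point computation over the call graph, and could be sketched as a simple worklist loop.  The following C++ is only an illustration under assumptions: the CallGraph type, patch_closure, and the optimized set (functions reported as inlined/cloned/optimized into their callers) are all hypothetical, and nothing here reflects an existing tool or GCC interface.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical call graph representation: callee name -> caller names.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Start from the functions touched by the upstream fix and keep adding
// callers of any function in the set that was optimized into its callers,
// until the set no longer grows.
std::set<std::string>
patch_closure(const std::set<std::string>& touched,
              const CallGraph& callers_of,
              const std::set<std::string>& optimized)
{
  std::set<std::string> result = touched;
  std::vector<std::string> worklist(touched.begin(), touched.end());

  while (!worklist.empty())
    {
      const std::string fn = worklist.back();
      worklist.pop_back();

      if (optimized.count(fn) == 0)
        continue;                 // callers still reach fn via a real call

      auto it = callers_of.find(fn);
      if (it == callers_of.end())
        continue;

      for (const std::string& caller : it->second)
        if (result.insert(caller).second)
          worklist.push_back(caller);   // newly added, examine it as well
    }
  return result;
}
```

The size of the resulting set depends directly on how many functions end up in the optimized set, which is why the quality of GCC's reporting matters for the patch size.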


> If the initial set of patched functions is too big, this work is
> really tedious and error-prone.

For the "tedious" part: yes it sometimes is. But we haven't seen any
problems or bugs so far. So it's reliable in practice.


>
> without a good tool for step3, controlling the initial set size of patched 
> function is still meaningful.
>
>> 
>> That tool would be given the GCC command line from the original or "live
>> patch target" kernel compilation for the source file in question, the
>> set of functions as determined in 1.) and a number of user provided
>> filter scripts to make the decisions in 3.). As a result, it would
>> output a self-contained, minimal subset of the original kernel sources.
>> 
>> With that tooling in place, live patch code size would not be a real
>> concern for us.
>
> that’s good to know.
>
> however, my question here is:
>
> can this tool be easily adopted by applications other than the Linux kernel?
> i.e., if there is another application that tries to use GCC’s live patching
> feature with manually created source patches, can your tool for step 3 be 
> readily used by this application?  Or would this application have to develop
> a similar but different tool for itself?

This tool's scope is the C99 language, more specifically the GCC dialect
and I'll try to keep the CLI agnostic to the application or build
environment anyway.

That said, my primary interest is the Linux kernel and I'm going to
make sure that it works there. It might not work out of the box for
random applications, but require some tweaking or even bug fixing.



>> 
>> So in conclusion, what we need from GCC is the information on when we
>> have to live patch callers due to optimizations. If that's not possible
>> for a particular class of optimization, it needs to be disabled.
>> 
>> OTOH, we definitely want to keep the set of these disabled optimizations
>> as small as possible in order to limit the impact of live patching on
>> kernel performance. In particular, disabling any of the "cloning"
>> 

[PATCH], PowerPC: Use f128 for long double built-ins if we have changed to use IEEE 128-bit floating point

2018-10-23 Thread Michael Meissner
This patch changes the name used by the built-in functions ending in 'l'
that return or are passed long double if the long double type is changed
from the current IBM long double format to the IEEE 128-bit format.

I have done a bootstrap and make check with no regressions on a little endian
power8 system.  Is it ok to check into the trunk?  This will need to be back
ported to the GCC 8.x branch.

[gcc]
2018-10-23  Michael Meissner  

* config/rs6000/rs6000.c (TARGET_MANGLE_DECL_ASSEMBLER_NAME):
Define as rs6000_mangle_decl_assembler_name.
(rs6000_mangle_decl_assembler_name): If the user switched from IBM
long double to IEEE long double, switch the names of the long
double built-in functions to be f128 instead of l.

[gcc/testsuite]
2018-10-23  Michael Meissner  

* gcc.target/powerpc/float128-math.c: New test to make sure the
long double built-in function names use the f128 form if the user
switched from IBM long double to IEEE long double.
* gcc.target/powerpc/ppc-fortran/ieee128-math.f90: Likewise.


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 265400)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -1981,6 +1981,9 @@ static const struct attribute_spec rs600
 
 #undef TARGET_SETJMP_PRESERVES_NONVOLATILE_REGS_P
 #define TARGET_SETJMP_PRESERVES_NONVOLATILE_REGS_P hook_bool_void_true
+
+#undef TARGET_MANGLE_DECL_ASSEMBLER_NAME
+#define TARGET_MANGLE_DECL_ASSEMBLER_NAME rs6000_mangle_decl_assembler_name
 
 
 /* Processor table.  */
@@ -38965,6 +38968,67 @@ rs6000_globalize_decl_name (FILE * strea
 #endif
 
 
+/* On 64-bit Linux and Freebsd systems, possibly switch the long double library
+   function names from l to f128 if the default long double type is
+   IEEE 128-bit.  Typically, with the C and C++ languages, the standard math.h
+   include file switches the names on systems that support long double as IEEE
+   128-bit, but that doesn't work if the user uses __builtin_l directly or
+   if they use Fortran.  Use the TARGET_MANGLE_DECL_ASSEMBLER_NAME hook to
+   change this name.  We only do this if the default is long double is not IEEE
+   128-bit, and the user asked for IEEE 128-bit.  */
+
+static tree
+rs6000_mangle_decl_assembler_name (tree decl, tree id)
+{
+  if (!TARGET_IEEEQUAD_DEFAULT && TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128
+  && TREE_CODE (decl) == FUNCTION_DECL && DECL_IS_BUILTIN (decl) )
+{
+  size_t len = IDENTIFIER_LENGTH (id);
+  const char *name = IDENTIFIER_POINTER (id);
+
+  if (name[len-1] == 'l')
+   {
+ bool has_long_double_p = false;
+ tree type = TREE_TYPE (decl);
+ machine_mode ret_mode = TYPE_MODE (type);
+
+ /* See if the function returns long double or long double
+complex.  */
+ if (ret_mode == TFmode || ret_mode == TCmode)
+   has_long_double_p = true;
+ else
+   {
+ function_args_iterator args_iter;
+ tree arg;
+
+ /* See if we have a long double or long double complex
+argument.  */
+ FOREACH_FUNCTION_ARGS (type, arg, args_iter)
+   {
+ machine_mode arg_mode = TYPE_MODE (arg);
+ if (arg_mode == TFmode || arg_mode == TCmode)
+   {
+ has_long_double_p = true;
+ break;
+   }
+   }
+   }
+
+ /* If we have long double, change the name.  */
+ if (has_long_double_p)
+   {
+ char *name2 = (char *) alloca (len + 4);
+ memcpy (name2, name, len-1);
+ strcpy (name2 + len - 1, "f128");
+ id = get_identifier (name2);
+   }
+   }
+}
+
+  return id;
+}
+
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-rs6000.h"
Index: gcc/testsuite/gcc.target/powerpc/float128-math.c
===
--- gcc/testsuite/gcc.target/powerpc/float128-math.c(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/float128-math.c(working copy)
@@ -0,0 +1,20 @@
+/* { dg-do compile { target { powerpc*-*-linux* } } } */
+/* { dg-require-effective-target ppc_float128_sw } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-mvsx -O2 -mfloat128 -mabi=ieeelongdouble -Wno-psabi" } */
+
+/* Test whether we convert __builtin_<func>l to __builtin_<func>f128 if the
+   default long double type is IEEE 128-bit.  Also test that using the explicit
+   __builtin_<func>f128 function does not interfere with the __builtin_<func>l
+   function.  */
+
+extern __float128 sinf128 (__float128);
+
+void foo (__float128 *p, long double *q, long double *r)
+{
+  *p = sinf128 (*p);
+  *q = 
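
The renaming step in rs6000_mangle_decl_assembler_name above can be sketched
outside GCC as follows.  This is only an illustration, assuming std::string in
place of identifier trees; the helper name long_double_to_f128_name and its
uses_long_double parameter are hypothetical and stand in for the TFmode/TCmode
checks on the return type and arguments.  Only the suffix rule (drop the
trailing 'l', append "f128") is taken from the patch.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of GCC: mirrors only the suffix rewrite.
// uses_long_double stands in for the patch's check that the return type
// or one of the arguments has long double (or long double complex) mode.
std::string
long_double_to_f128_name (const std::string &name, bool uses_long_double)
{
  // Names that do not end in 'l', or functions that never touch
  // long double, keep their original assembler name.
  if (name.empty () || name.back () != 'l' || !uses_long_double)
    return name;

  // Drop the trailing 'l' and append "f128", e.g. "sinl" -> "sinf128".
  std::string renamed (name, 0, name.size () - 1);
  renamed += "f128";
  assert (renamed.size () == name.size () + 3);
  return renamed;
}
```

The type check matters because a trailing 'l' alone is ambiguous: "ceil" ends
in 'l' but takes a double, so a caller would pass uses_long_double == false
and the name stays unchanged.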

[PATCH] ux.texi: move "Quoting" and "Fix-it hints" from DiagnosticsGuidelines wiki page

2018-10-23 Thread David Malcolm
I want to move material from
  https://gcc.gnu.org/wiki/DiagnosticsGuidelines
into the new User Experience Guidelines chapter of our internals
documentation.  I've already updated the link in that wiki page to point
to the pertinent HTML build of the docs:
  https://gcc.gnu.org/onlinedocs/gccint/Guidelines-for-Diagnostics.html

This patch does it for the "Quoting" section, and adds a note about
fix-it hints that would make the wiki page's "Fix-it hints" section
redundant.

Martin and Manu: can you confirm you wrote this wiki material, and that
it's OK to add it to the GCC docs?  (I don't know what license the wiki
is under.)  Are all such changes OK from a licensing perspective, for
material you contributed to the GCC wiki?

gcc/ChangeLog:
* doc/ux.texi (Quoting): New subsection, adapted from material at
https://gcc.gnu.org/wiki/DiagnosticsGuidelines written by
MartinSebor and ManuelLopezIbanez.
(Fix-it hints): Note that fix-it hints shouldn't be marked for
translation.
---
 gcc/doc/ux.texi | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/gcc/doc/ux.texi b/gcc/doc/ux.texi
index 9185f68..1061aa0 100644
--- a/gcc/doc/ux.texi
+++ b/gcc/doc/ux.texi
@@ -384,6 +384,38 @@ of the @code{auto_diagnostic_group} are related.  (Currently it doesn't
 do anything with this information, but we may implement that in the
 future).
 
+@subsection Quoting
+Text should be quoted by either using the @samp{q} modifier in a directive
+such as @samp{%qE}, or by enclosing the quoted text in a pair of @samp{%<}
+and @samp{%>} directives, and never by using explicit quote characters.
+The directives handle the appropriate quote characters for each language
+and apply the correct color or highlighting.
+
+The following elements should be quoted in GCC diagnostics:
+
+@itemize @bullet
+@item
+Language keywords.
+@item
+Tokens.
+@item
+Boolean, numerical, character, and string constants that appear in the
+source code.
+@item
+Identifiers, including function, macro, type, and variable names.
+@end itemize
+
+Other elements such as numbers that do not refer to numeric constants that
+appear in the source code should not be quoted. For example, in the message:
+
+@smallexample
+argument %d of %qE must be a pointer type
+@end smallexample
+
+@noindent
+since the argument number does not refer to a numerical constant in the
+source code it should not be quoted.
+
 @subsection Spelling and Terminology
 
 See the @uref{https://gcc.gnu.org/codingconventions.html#Spelling
@@ -401,6 +433,9 @@ can also be viewed via @option{-fdiagnostics-generate-patch} and
 @option{-fdiagnostics-parseable-fixits}.  With the latter, an IDE
 ought to be able to offer to automatically apply the suggested fix.
 
+Fix-it hints contain code fragments, and thus they should not be marked
+for translation.
+
 Fix-it hints can be added to a diagnostic by using a @code{rich_location}
 rather than a @code{location_t} - the fix-it hints are added to the
 @code{rich_location} using one of the various @code{add_fixit} member
-- 
1.8.5.3



[PATCH] Fix PR87691: transparent_union attribute does not work with MODE_PARTIAL_INT

2018-10-23 Thread Jozef Lawrynowicz

msp430-elf uses the partial int type __int20 for pointers in the large memory
model. __int20 has PSImode, with a precision of 20 bits.

A few DejaGNU tests fail when built with -mlarge for msp430-elf, when
transparent unions are used containing pointers.
These are:
- gcc.c-torture/compile/pr34885.c
- gcc.dg/transparent-union-{1,2,3,4,5}.c

The issue is that the union is considered to have a size of 32 bits (the
in-memory size of __int20).  Unless mode_for_size, as called by
compute_record_mode (both in stor-layout.c), is explicitly told to look for a
mode of class MODE_PARTIAL_INT, a size of 32 will always return a MODE_INT mode.
In this case, the union will have a TYPE_MODE of SImode, but its field is
PSImode, so transparent_union has no effect.

The attached patch fixes the issue by allowing the TYPE_MODE of a union to be
set to the DECL_MODE of the widest field, if the mode is of class
MODE_PARTIAL_INT and the union would be passed by reference.

Some target ABIs mandate that unions be passed in integer registers, so to
avoid any potential ABI violations, the mode of the union is only changed if
it would be passed by reference.

Successfully bootstrapped and regtested trunk for x86_64-pc-linux-gnu, and
msp430-elf with -mlarge. For msp430-elf with -mlarge, the above DejaGNU tests
are also fixed.

Ok for trunk?
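
The size-versus-precision distinction described above can be illustrated with
a toy model.  This is a sketch under stated assumptions, not GCC code: ToyMode
and pick_whole_type_mode are hypothetical names, and the two numbers per mode
stand in for what GET_MODE_BITSIZE and GET_MODE_PRECISION report.  It shows
only the "prefer the field with greater precision among fields that span the
whole type" rule.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a machine mode: only the two properties being compared.
struct ToyMode
{
  std::string name;
  unsigned bitsize;    // in-memory size in bits
  unsigned precision;  // number of value bits
};

// Pick the mode of a field that spans the whole aggregate, preferring the
// field with the greatest precision.  Returns a mode named "VOID" with
// zero precision when no field is as large as the type.
ToyMode
pick_whole_type_mode (const std::vector<ToyMode> &fields, unsigned type_bits)
{
  ToyMode best = { "VOID", 0, 0 };
  for (const ToyMode &f : fields)
    {
      assert (f.precision <= f.bitsize);
      if (f.bitsize == type_bits && f.precision > best.precision)
	best = f;
    }
  return best;
}
```

With a 32-bit aggregate holding a partial-int field (32 bits in memory, 20
bits of precision) and a full 32-bit integer field, the model picks the full
integer mode regardless of field order; with only the partial-int field it
picks that one.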

From cc1ccfcc0d8adf7b0e1ca95a47a8a8e7e12fc99c Mon Sep 17 00:00:00 2001
From: Jozef Lawrynowicz 
Date: Mon, 22 Oct 2018 21:02:10 +0100
Subject: [PATCH] Allow union TYPE_MODE to be set to the mode of the widest
 element if the union would be passed by reference

2018-10-23  Jozef Lawrynowicz  

	PR c/87691
	* gcc/stor-layout.c (compute_record_mode): Set TYPE_MODE of UNION_TYPE
	to the mode of the widest field iff the widest field has mode class
	MODE_INT, or MODE_PARTIAL_INT and the union would be passed by
	reference.
	* gcc/testsuite/gcc.target/msp430/pr87691.c: New test.
---
 gcc/stor-layout.c | 21 +---
 gcc/testsuite/gcc.target/msp430/pr87691.c | 41 +++
 2 files changed, 58 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/msp430/pr87691.c

diff --git a/gcc/stor-layout.c b/gcc/stor-layout.c
index 58a3aa3..c4f5f83 100644
--- a/gcc/stor-layout.c
+++ b/gcc/stor-layout.c
@@ -1834,7 +1834,13 @@ compute_record_mode (tree type)
   /* If this field is the whole struct, remember its mode so
 	 that, say, we can put a double in a class into a DF
 	 register instead of forcing it to live in the stack.  */
-  if (simple_cst_equal (TYPE_SIZE (type), DECL_SIZE (field)))
+  if (simple_cst_equal (TYPE_SIZE (type), DECL_SIZE (field))
+	  /* Partial int types (e.g. __int20) may have TYPE_SIZE equal to
+	 wider types (e.g. int32), despite precision being less.  Ensure
+	 that the TYPE_MODE of the struct does not get set to the partial
+	 int mode if there is a wider type also in the struct.  */
+	  && known_gt (GET_MODE_PRECISION (DECL_MODE (field)),
+		   GET_MODE_PRECISION (mode)))
 	mode = DECL_MODE (field);
 
   /* With some targets, it is sub-optimal to access an aligned
@@ -1844,10 +1850,17 @@ compute_record_mode (tree type)
 }
 
   /* If we only have one real field; use its mode if that mode's size
- matches the type's size.  This only applies to RECORD_TYPE.  This
- does not apply to unions.  */
+ matches the type's size.  This generally only applies to RECORD_TYPE.
+ For UNION_TYPE, if the widest field is MODE_INT then use that mode.
+ If the widest field is MODE_PARTIAL_INT, and the union will be passed
+ by reference, then use that mode.  */
   poly_uint64 type_size;
-  if (TREE_CODE (type) == RECORD_TYPE
+  if ((TREE_CODE (type) == RECORD_TYPE
+   || (TREE_CODE (type) == UNION_TYPE
+	   && (GET_MODE_CLASS (mode) == MODE_INT
+	   || (GET_MODE_CLASS (mode) == MODE_PARTIAL_INT
+		   && targetm.calls.pass_by_reference (pack_cumulative_args (0),
+		   mode, type, 0)))))
   && mode != VOIDmode
  && poly_int_tree_p (TYPE_SIZE (type), &type_size)
   && known_eq (GET_MODE_BITSIZE (mode), type_size))
diff --git a/gcc/testsuite/gcc.target/msp430/pr87691.c b/gcc/testsuite/gcc.target/msp430/pr87691.c
new file mode 100644
index 000..c00425d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/msp430/pr87691.c
@@ -0,0 +1,41 @@
+/* PR 87691 - Test that a union containing __int20 and a float is not treated as
+   20-bits in size.  */
+
+/* { dg-do compile } */
+/* { dg-skip-if "no __int20 for mcpu=msp430" { *-*-* } { "-mcpu=msp430" } { "" } } */
+/* { dg-final { scan-assembler-not "MOVX.A" } } */
+
+/* To move a 20-bit value from memory (using indexed or indirect register
+   mode), onto the stack (also addressed using indexed or indirect register
+   mode), MOVX.A must be used. MOVA does not support these addressing modes.
+   Therefore, to check that the union is not manipulated as a 20-bit type,
+   test that no MOVX.A instructions are present in the assembly.

Re: [PATCH] Remove reduntant dumps and make tp_first_run dump more compact.

2018-10-23 Thread Jan Hubicka
> Hi.
> 
> I've noticed a redundancy in cgraph_node dump function and I would like to
> simplify and compact how Function flags are printed. Plus I moved 'First run'
> to the flags as well. One diff example:
> 
> @@ -133,8 +125,7 @@
>Referring: 
>Availability: available
>Profile id: 108032747
> -  First run: 6
> -  Function flags: count: 1 body only_called_at_startup nonfreeing_fn only_called_at_startup
> +  Function flags: count:1 first_run:6 body only_called_at_startup nonfreeing_fn
>Called by: 
>Calls: 
>  g/0 (g) @0x76ad8000
> 
> Patch survives regression tests on x86_64-linux-gnu.
> Ready for trunk?
OK,
thanks!
Honza
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
> 2018-10-23  Martin Liska  
> 
>   * cgraph.c (cgraph_node::dump):
>   Remove reduntant dumps and make tp_first_run dump more compact.
> 
> libgcc/ChangeLog:
> 
> 2018-10-23  Martin Liska  
> 
>   * libgcov-profiler.c: Start from 1 in order to distinguish
>   functions which were seen and these that were not.
> ---
>  gcc/cgraph.c | 15 ++-
>  gcc/testsuite/gcc.dg/tree-prof/time-profiler-1.c |  2 +-
>  gcc/testsuite/gcc.dg/tree-prof/time-profiler-2.c |  4 ++--
>  gcc/testsuite/gcc.dg/tree-prof/time-profiler-3.c |  2 +-
>  libgcc/libgcov-profiler.c|  2 +-
>  5 files changed, 11 insertions(+), 14 deletions(-)
> 
> 

> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
> index 48bab9f2749..b432f7e6500 100644
> --- a/gcc/cgraph.c
> +++ b/gcc/cgraph.c
> @@ -2016,7 +2016,6 @@ cgraph_node::dump (FILE *f)
>if (profile_id)
>  fprintf (f, "  Profile id: %i\n",
>profile_id);
> -  fprintf (f, "  First run: %i\n", tp_first_run);
>cgraph_function_version_info *vi = function_version ();
>if (vi != NULL)
>  {
> @@ -2040,11 +2039,13 @@ cgraph_node::dump (FILE *f)
>fprintf (f, "  Function flags:");
>if (count.initialized_p ())
>  {
> -  fprintf (f, " count: ");
> +  fprintf (f, " count:");
>count.dump (f);
>  }
> +  if (tp_first_run > 0)
> +fprintf (f, " first_run:%i", tp_first_run);
>if (origin)
> -fprintf (f, " nested in: %s", origin->asm_name ());
> +fprintf (f, " nested in:%s", origin->asm_name ());
>if (gimple_has_body_p (decl))
>  fprintf (f, " body");
>if (process)
> @@ -2081,10 +2082,6 @@ cgraph_node::dump (FILE *f)
>  fprintf (f, " unlikely_executed");
>if (frequency == NODE_FREQUENCY_EXECUTED_ONCE)
>  fprintf (f, " executed_once");
> -  if (only_called_at_startup)
> -fprintf (f, " only_called_at_startup");
> -  if (only_called_at_exit)
> -fprintf (f, " only_called_at_exit");
>if (opt_for_fn (decl, optimize_size))
>  fprintf (f, " optimize_size");
>if (parallelized_function)
> @@ -2096,7 +2093,7 @@ cgraph_node::dump (FILE *f)
>  {
>fprintf (f, "  Thunk");
>if (thunk.alias)
> -fprintf (f, "  of %s (asm: %s)",
> + fprintf (f, "  of %s (asm:%s)",
>lang_hooks.decl_printable_name (thunk.alias, 2),
>IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (thunk.alias)));
>fprintf (f, " fixed offset %i virtual value %i indirect_offset %i "
> @@ -2112,7 +2109,7 @@ cgraph_node::dump (FILE *f)
>fprintf (f, "  Alias of %s",
>  lang_hooks.decl_printable_name (thunk.alias, 2));
>if (DECL_ASSEMBLER_NAME_SET_P (thunk.alias))
> -fprintf (f, " (asm: %s)",
> + fprintf (f, " (asm:%s)",
>IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (thunk.alias)));
>fprintf (f, "\n");
>  }
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/time-profiler-1.c b/gcc/testsuite/gcc.dg/tree-prof/time-profiler-1.c
> index 455f923f3f4..a622df23ce6 100644
> --- a/gcc/testsuite/gcc.dg/tree-prof/time-profiler-1.c
> +++ b/gcc/testsuite/gcc.dg/tree-prof/time-profiler-1.c
> @@ -16,6 +16,6 @@ int main ()
>  {
>return foo ();
>  }
> -/* { dg-final-use-not-autofdo { scan-ipa-dump-times "Read tp_first_run: 0" 1 "profile"} } */
>  /* { dg-final-use-not-autofdo { scan-ipa-dump-times "Read tp_first_run: 1" 1 "profile"} } */
>  /* { dg-final-use-not-autofdo { scan-ipa-dump-times "Read tp_first_run: 2" 1 "profile"} } */
> +/* { dg-final-use-not-autofdo { scan-ipa-dump-times "Read tp_first_run: 3" 1 "profile"} } */
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/time-profiler-2.c b/gcc/testsuite/gcc.dg/tree-prof/time-profiler-2.c
> index e6eaeb99810..497b585388e 100644
> --- a/gcc/testsuite/gcc.dg/tree-prof/time-profiler-2.c
> +++ b/gcc/testsuite/gcc.dg/tree-prof/time-profiler-2.c
> @@ -43,7 +43,7 @@ int main ()
>  
>return r;
>  }
> -/* { dg-final-use-not-autofdo { scan-ipa-dump-times "Read tp_first_run: 0" 2 "profile"} } */
> -/* { dg-final-use-not-autofdo { scan-ipa-dump-times "Read tp_first_run: 1" 1 "profile"} } */
> +/* { dg-final-use-not-autofdo { scan-ipa-dump-times "Read tp_first_run: 1" 2 "profile"} } */
>  /* { 
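
The two ideas in the quoted patch (a compact key:value flags line, and a
first-run counter that starts at 1 so that 0 can mean "never seen") can be
sketched as below.  This is a toy model, not the cgraph or libgcov code:
ToyNode, dump_flags and ToyTimeProfiler are hypothetical names, and only a few
of the real flags are modelled.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the few cgraph_node fields used here.
struct ToyNode
{
  long count;        // execution count; negative means "not initialized"
  int tp_first_run;  // 0 means the function was never seen running
  bool has_body;
};

// Build the "Function flags:" line in the compact key:value style.
std::string
dump_flags (const ToyNode &n)
{
  std::string out = "  Function flags:";
  if (n.count >= 0)
    out += " count:" + std::to_string (n.count);
  // With a 1-based counter, 0 is free to mean "never ran", so skip it.
  if (n.tp_first_run > 0)
    out += " first_run:" + std::to_string (n.tp_first_run);
  if (n.has_body)
    out += " body";
  return out;
}

// Hypothetical 1-based time profiler: the first function to run gets 1.
struct ToyTimeProfiler
{
  int next = 1;

  void record (ToyNode &n)
  {
    assert (next > 0);
    // Only the first execution of a function is recorded.
    if (n.tp_first_run == 0)
      n.tp_first_run = next++;
  }
};
```

Starting the counter at 1 is what lets the dump omit first_run for functions
that never executed, instead of printing an ambiguous 0.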

libgo patch committed: Remove unused armArch, hwcap and hardDiv

2018-10-23 Thread Ian Lance Taylor
This patch by Tobias Klauser removes some variables from the runtime
package, which are unused after https://golang.org/cl/140057.  This
should fix GCC PR 87661.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu, not that that proves much.  Committed to
mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 265430)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-6db7e35d3bcd75ab3cb15296a5ddc5178038c9c1
+771668f7137e560b2ef32c8799e5f8b4c4ee14a9
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/runtime/os_linux_arm.go
===
--- libgo/go/runtime/os_linux_arm.go(revision 265430)
+++ libgo/go/runtime/os_linux_arm.go(working copy)
@@ -4,20 +4,7 @@
 
 package runtime
 
-import "unsafe"
-
-const (
-   _AT_PLATFORM = 15 //  introduced in at least 2.6.11
-
-   _HWCAP_VFP   = 1 << 6  // introduced in at least 2.6.11
-   _HWCAP_VFPv3 = 1 << 13 // introduced in 2.6.30
-   _HWCAP_IDIVA = 1 << 17
-)
-
 var randomNumber uint32
-var armArch uint8 = 6 // we default to ARMv6
-var hwcap uint32  // set by archauxv
-var hardDiv bool  // set if a hardware divider is available
 
 func archauxv(tag, val uintptr) {
switch tag {
@@ -27,15 +14,5 @@ func archauxv(tag, val uintptr) {
// it as a byte array.
randomNumber = uint32(startupRandomData[4]) | uint32(startupRandomData[5])<<8 |
uint32(startupRandomData[6])<<16 | uint32(startupRandomData[7])<<24
-
-   case _AT_PLATFORM: // v5l, v6l, v7l
-   t := *(*uint8)(unsafe.Pointer(val + 1))
-   if '5' <= t && t <= '7' {
-   armArch = t - '0'
-   }
-
-   case _AT_HWCAP: // CPU capability bit flags
-   hwcap = uint32(val)
-   hardDiv = (hwcap & _HWCAP_IDIVA) != 0
}
 }


Re: [RFC] GCC support for live-patching

2018-10-23 Thread Qing Zhao


> On Oct 23, 2018, at 4:11 AM, Miroslav Benes  wrote:
>> 
>> One question here,  what’s the major benefit to prepare the patches 
>> manually? 
> 
> I could almost quote what you wrote below. It is a C file, easy to review 
> and maintain. You have everything "under control". It allows to implement 
> tricky hacks easily by hand if needed.

Okay, I see. 

Another question here:

From my understanding of the live patching creation from Nicolai’s email, the
patch includes:

1. initial patched functions;
2. all the callers of any patched function if it’s been
   inlined/cloned/optimized;
3. recursively copy any needed cpp macro, type, or
  function definitions and add references to data objects with static
  storage duration.

During the review and maintenance procedure, do all three of the above need
to be reviewed and maintained?

>>> 
>>> So let me ask, what is your motivation behind this? Is there a real 
>>> problem you're trying to solve? It may have been mentioned somewhere and I 
>>> missed it.
>> 
>> the major functionality we want is:   to Only enable static inlining for 
>> live patching for one 
>> of our internal customers.   the major purpose is to control the patch code 
>> size explosion and
>> debugging complexity due to too much inlining of global functions for the 
>> specific application.
> 
> I hoped for more details, but ok.
At this time, these are the details I have.  I can ask for more if more
details are needed.

> 
>> therefore, I proposed the multiple level of control for -flive-patching to 
>> satisfy multiple request from 
>> different users. 
>> 
>> So far, from the feedback, I see that among the 4 levels of control,   none, 
>> only-inline-static, inline,
>> and inline-clone,   “none” and “inline” are NOT needed at all.
>> 
>> however,  -flive-patching = [only-inline-static | inline-clone] are 
>> necessary.
>> 
>>> 
 
>>>> 3. Details of the proposal:
>>> 
>>> This sounds awfully complicated. Especially when there is a dumping option 
>>> in GCC thanks to Martin. What information do you miss there? We could 
>>> improve the analysis tool. So far, it has given us all the info we need.
>> 
>> Yes, it’s TRUE that the tool Martin wrote should serve the same purpose. 
>> nothing new from this
>> new GCC option -flive-patch-list compared to Martin’s tool.
>> 
>> However,  by simply adding this new GCC’s option, we can simplify the whole 
>> procedure for helping
>> live-patching. by only running  GCC with the new added options once, we can 
>> get the impacted function list
>> at the same time. No need to run another tool anymore.   
> 
> I probably do not understand completely. I thought that using the 
> option you would "preprocess" everything during the kernel build and then 
> you'd need a tool to get the impacted function list for a given function. 
> In that case, Martin's work is more than sufficient.
> 
> Now I think you meant to run GCC with a given function, build everything 
> and the list. Iteratively for every to-be-patched function. It does not 
> sound better to me.

There might be a misunderstanding between us on this part.  Let me explain my
understanding first:

1. With Martin’s tool, there are two steps to get the impacted function list
for patched functions:

Step 1: build the kernel with GCC using -fdump-ipa-clones plus a set of
options that disable a number of IPA optimizations;
Step 2: use the tool kgraft-ipa-analysis.py to analyze the dump files
from step 1 and report the impacted function list.

2. With the new GCC functionality proposed here,
-flive-patching -flive-patch-list, there is a single step:

Step 1: build the kernel with GCC using -flive-patching -flive-patch-list.

GCC will then automatically disable the unsafe IPA optimizations and report
the impacted function list under the safe IPA optimizations.

Comparing 1 and 2, I think that 2 is better and much more convenient than 1.
Another benefit of 2 is:

If we later want more IPA optimizations to be on for live-patching for
runtime performance reasons, we can easily extend it to include those
IPA optimizations and at the same time report the additional impacted
functions under the new IPA optimizations.

However, 1 will not be easy to extend in this way.

Am I missing anything here?

> 
>> this is the major benefit from this new option.
>> 
>> anyway, if most of the people think that this new option is not necessary, I 
>> am fine to delete it. 
>>> 
>>> In the end, I'd be more than happy with what has been proposed in this 
>>> thread by the others. To have a way to guarantee that GCC would not apply 
>>> an optimization that could potentially destroy our effort to livepatch a 
>>> running system.
>> 
>> So, the major functionality you want from GCC is:
>> 
>> -flive-patching=inline-clone
>> 
>> Only enable inlining and all optimizations that internally create clone,
>> for example, cloning, ipa-sra, partial inlining, etc; disable all 
>> other IPA optimizations/analyses.
>> 
>> As a result, 

Re: [PATCH 02/14] Add D frontend (GDC) implementation.

2018-10-23 Thread Iain Buclaw
On Tue, 23 Oct 2018 at 15:48, Richard Sandiford
 wrote:
>
> Iain Buclaw  writes:
> > I'm just going to post the diff since the original here, just to show
> > what's been done since review comments.
> >
> > I think I've covered all that's been addressed, except for the couple
> > of notes about the quadratic parts (though I think one of them is
> > actually O(N^2)).  I've raised bug reports on improving them later.
> >
> > I've also rebased them against trunk, so there's a couple new things
> > present that are just to support build.
>
> Thanks, this is OK when the frontend is accepted in principle
> (can't remember where things stand with that).
>

As discussed, the front-end has already been approved by the SC.

I'm not sure if there's anything else further required, or if any
final review needs to be done.

Thanks.
-- 
Iain


Re: [patch] allow target config to state r18 is fixed on aarch64

2018-10-23 Thread Olivier Hainque
Hi Wilco,

> On 18 Oct 2018, at 19:08, Wilco Dijkstra  wrote:

>> I wondered if we could set it to R11 unconditionally and picked
>> the way ensuring no change for !vxworks ports, especially since I
>> don't have means to test more than what I described above.
> 
> Yes it should always be the same register, there is no gain in switching
> it dynamically. I'd suggest to use X9 since X8 is the last register used for
> arguments (STATIC_CHAIN_REGNUM is passed when calling a nested
> function) and some of the higher registers may be used as temporaries in
> prolog/epilog.

Thanks for your feedback!  I ported the patches
to gcc-8 and was able to get a functional toolchain
for aarch64-wrs-vxworks7 and aarch64-elf, passing
full Acats for a couple of runtime variants on VxWorks
(compiled for RTP or kernel mode) as well as a small
internal testsuite we have, dedicated to cross configurations.

All the patches apply directly on mainline.

As for the original patch, I also sanity checked that
"make all-gcc" passes (self tests included) there for
--target=aarch64-elf --enable-languages=c


There are three changes to the common aarch64 port files.

It turns out that X9 doesn't work for STATIC_CHAIN_REGNUM
because this conflicts with the registers used for -fstack-check:

  /* The pair of scratch registers used for stack probing.  */
  #define PROBE_STACK_FIRST_REG  9
  #define PROBE_STACK_SECOND_REG 10

I didn't find that immediately (read, I first experienced a few
badly crashing test runs) because I searched for R9_REGNUM to check
for other uses, so the first patchlet attached simply adjusts
the two #defines above to use R9/R10_REGNUM.


2018-10-23  Olivier Hainque  

* config/aarch64/aarch64.c (PROBE_STACK_FIRST_REG): Redefine as
R9_REGNUM instead of 9.
(PROBE_STACK_SECOND_REG): Redefine as R10_REGNUM instead of 10.


The second patch is the one which I proposed a few days ago
to allow a subtarget (in my case, the VxWorks port) to state that
R18 is to be considered fixed. Two changes compared to the original
patch: a comment attached to the default definition of FIXED_R18,
and the unconditional use of R11_REGNUM as an alternate STATIC_CHAIN_REGNUM.

I suppose the latter could require more testing than what I was
able to put in (since this also changes things for !vxworks configurations),
which Sam very kindly did on the first instance.

I didn't touch CALL_USED_REGISTERS since this is 1 for r18 already.
I also didn't see a strong reason to move to a more dynamic scheme,
through conditional_register_usage.

2018-03-18  Olivier Hainque  

* config/aarch64/aarch64.h (FIXED_R18): New internal
configuration macro, defaulted to 0.
(FIXED_REGISTERS): Use it.
(STATIC_CHAIN_REGNUM): Use r11 instead of r18.
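
A standalone sketch of the macro arrangement described by the two ChangeLog
entries above, assuming nothing beyond what the text states.  The register
enum, the static_assert and toy_fixed_register_p are hypothetical additions
for illustration; the real aarch64.h uses FIXED_R18 inside the FIXED_REGISTERS
initializer rather than in a function.

```cpp
#include <cassert>

// Register numbers; the names follow the R<n>_REGNUM convention quoted above.
enum { R9_REGNUM = 9, R10_REGNUM = 10, R11_REGNUM = 11, R18_REGNUM = 18 };

// A subtarget header (hypothetical here) may define FIXED_R18 to 1 before
// this point; by default r18 stays allocatable.
#ifndef FIXED_R18
#define FIXED_R18 0
#endif

#define PROBE_STACK_FIRST_REG  R9_REGNUM
#define PROBE_STACK_SECOND_REG R10_REGNUM
#define STATIC_CHAIN_REGNUM    R11_REGNUM

// Illustrative guard: catch a clash between the static chain register and
// the stack-probe scratch registers at compile time.
static_assert (STATIC_CHAIN_REGNUM != PROBE_STACK_FIRST_REG
	       && STATIC_CHAIN_REGNUM != PROBE_STACK_SECOND_REG,
	       "static chain must not overlap the stack-probe scratch registers");

// Toy query covering only r18; every other register is reported as not
// fixed in this sketch, which is not true of the real port.
bool
toy_fixed_register_p (int regno)
{
  assert (regno >= 0 && regno < 32);
  return regno == R18_REGNUM ? FIXED_R18 != 0 : false;
}
```

Spelling the probe registers with the R<n>_REGNUM names means a search for
R9_REGNUM finds every use, which is the point of the first patchlet.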


The third patch proposes the introduction of support for a
conditional SUBTARGET_OVERRIDE_OPTIONS macro, as many other
architectures have, and which is needed by all VxWorks ports.

In the current state, this one could possibly impact only
VxWorks, as no other config file would define the macro.

I'm not 100% clear on the possible existence of rules regarding
the placement of this within the override_options functions. We used
something similar to what other ports do, and it worked just fine
for VxWorks.

2018-10-23  Olivier Hainque  

* config/aarch64/aarch64.c (aarch64_override_options): Once
arch, cpu and tune were validated, insert SUBTARGET_OVERRIDE_OPTIONS
if defined.
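
The hook pattern in this entry can be sketched as below.  It is a toy model
under stated assumptions: ToyOptions, toy_opts, toy_vxworks_override and
toy_override_options are hypothetical, and the macro is defined inline here
only so that the example exercises the hook, whereas in a real port a
subtarget header would define it.

```cpp
#include <cassert>

// Hypothetical option state; stands in for the port's global option data.
struct ToyOptions
{
  bool arch_validated = false;
  bool subtarget_ran = false;
};

ToyOptions toy_opts;

// Hypothetical subtarget hook, such as a VxWorks-specific adjustment.
void
toy_vxworks_override ()
{
  // The hook relies on arch/cpu/tune having been validated already.
  assert (toy_opts.arch_validated);
  toy_opts.subtarget_ran = true;
}

// Normally supplied by a subtarget configuration header.
#define SUBTARGET_OVERRIDE_OPTIONS toy_vxworks_override ()

void
toy_override_options ()
{
  // Stand-in for the arch, cpu and tune validation that runs first.
  toy_opts.arch_validated = true;

#ifdef SUBTARGET_OVERRIDE_OPTIONS
  // Expands to nothing for configurations that do not define the macro.
  SUBTARGET_OVERRIDE_OPTIONS;
#endif
}
```

Because the call is guarded by #ifdef, configurations that never define the
macro compile exactly as before.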

I'm happy to adjust any or all of this if needed, of course.

Thanks in advance for your feedback!

With Kind Regards,

Olivier


From dd32f3611e4cb10f0b48d58d84c96951befbd99f Mon Sep 17 00:00:00 2001
From: Olivier Hainque 
Date: Sun, 21 Oct 2018 10:32:13 +0200
Subject: [PATCH 1/6] Use R9/R10_REGNUM to designate stack checking registers

---
 gcc/config/aarch64/aarch64.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 6810db7..f03e803 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3793,8 +3793,8 @@ aarch64_libgcc_cmp_return_mode (void)
 #endif
 
 /* The pair of scratch registers used for stack probing.  */
-#define PROBE_STACK_FIRST_REG  9
-#define PROBE_STACK_SECOND_REG 10
+#define PROBE_STACK_FIRST_REG  R9_REGNUM
+#define PROBE_STACK_SECOND_REG R10_REGNUM
 
 /* Emit code to probe a range of stack addresses from FIRST to FIRST+POLY_SIZE,
inclusive.  These are offsets from the current stack pointer.  */
-- 
1.7.10.4

From 399f0f4773bc3771f2a03c1e750f3b09d6b4824f Mon Sep 17 00:00:00 2001
From: Olivier Hainque 
Date: Fri, 19 Oct 2018 12:54:26 +0200
Subject: [PATCH 2/6] Introduce FIXED_R18 in aarch64.h

---
 gcc/config/aarch64/aarch64.h |   12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 976f9af..4069e29 100644
--- a/gcc/config/aarch64/aarch64.h
+++ 

Re: [PATCH] powerpc: Optimized conversion of IBM long double to int128/int64

2018-10-23 Thread Segher Boessenkool
On Tue, Oct 23, 2018 at 09:01:26PM +0530, Rajalakshmi Srinivasaraghavan wrote:
> This new implementation of fixunstfdi and fixunstfti
> gives 16X performance improvement.

:-)

>   * libgcc/config/rs6000/t-ppc64-fp (LIB2ADD): Add
>   $(srcdir)/config/rs6000/fixunstfti.c.

And fixunstfdi.c?

>   * libgcc/config/rs6000/ppc64-fp.c (__fixunstfdi): Remove definition.
>   * libgcc/config/rs6000/fixunstfti.c: New file.
>   * libgcc/config/rs6000/fixunstfdi.c: Likewise.
>   * libgcc/config/rs6000/ibm-ldouble.h: Likewise.

libgcc/ has its own changelog; the path names in the changelog should be
relative to that (so should start with config/).

Does -m32 still work after this?  (Did it before?)

Okay for trunk with the changelog fixed up (unless it broke -m32 ;-) )


Segher


Re: [PATCH] Provide extension hint for aarch64 target (PR driver/83193).

2018-10-23 Thread Martin Sebor

On 10/22/2018 07:05 AM, Martin Liška wrote:

On 10/16/18 6:57 PM, James Greenhalgh wrote:

On Mon, Oct 08, 2018 at 05:34:52AM -0500, Martin Liška wrote:

Hi.

I'm attaching updated version of the patch.


Can't say I'm thrilled by the allocation/free (aarch64_parse_extension
allocates, everyone else has to free) responsibilities here.


Agreed.



If you can clean that up I'd be much happier. The overall patch is OK.


I rewrote that to use std::string, hope it's improvement?


If STR below is not nul-terminated the std::string ctor is not
safe.  If it is nul-terminated but LEN is equal to its length
then the nul assignment should be unnecessary.  If LEN is less
than its length and the goal is to truncate the string then
calling resize() would be the right way to do it.  Otherwise,
assigning a nul to an element into the middle won't truncate
(it will leave the remaining elements there).  (This may not
matter if the string isn't appended to after that.)

@@ -274,6 +277,11 @@
 aarch64_parse_extension (const char *str, unsigned long *isa_flags)
   if (opt->name == NULL)
{
  /* Extension not found in list.  */
+ if (invalid_extension)
+   {
+ *invalid_extension = std::string (str);
+ (*invalid_extension)[len] = '\0';
+   }
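
The std::string points above can be checked with a small standalone example.
The three helper names are hypothetical and exist only for this illustration;
the std::string calls themselves (the pointer-and-length constructor,
operator[] and resize) are standard.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Build the string from exactly LEN characters; STR need not be
// nul-terminated for this constructor.
std::string
prefix_by_ctor (const char *str, std::size_t len)
{
  return std::string (str, len);
}

// Writing a nul into the middle does not shorten the string: size() is
// unchanged and the tail is still stored after the embedded nul.
std::string
nul_in_middle (const char *str, std::size_t len)
{
  std::string s (str);   // requires STR to be nul-terminated
  assert (len < s.size ());
  s[len] = '\0';
  return s;
}

// resize() is the call that actually truncates.
std::string
prefix_by_resize (const char *str, std::size_t len)
{
  std::string s (str);   // requires STR to be nul-terminated
  assert (len <= s.size ());
  s.resize (len);
  return s;
}
```

Printing the nul_in_middle result through c_str () would still show only the
prefix, which is why the difference is easy to miss until the string is
compared or appended to.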

I also noticed a minor typo while quickly skimming the rest
of the patch:

@@ -11678,7 +11715,8 @@
 aarch64_handle_attr_isa_flags (char *str)
break;

   case AARCH64_PARSE_INVALID_FEATURE:
-	error ("invalid value (\"%s\") in %<target()%> pragma or attribute", str);
+   error ("invalid feature modified %s of value (\"%s\") in "
+	   "%<target()%> pragma or attribute", invalid_extension.c_str (), str);

break;

   default:

Based on the other messages in the patch, the last word in "invalid
feature modified" should be "modifier".


Martin



Martin



Thanks,
James


From d36974540cda9fb0e159103fdcf92d26ef2f1b94 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 4 Oct 2018 16:31:49 +0200
Subject: [PATCH] Provide extension hint for aarch64 target (PR driver/83193).

gcc/ChangeLog:

2018-10-05  Martin Liska  

PR driver/83193
* common/config/aarch64/aarch64-common.c (aarch64_parse_extension):
Add new argument invalid_extension.
(aarch64_get_all_extension_candidates): New function.
(aarch64_rewrite_selected_cpu): Add NULL to function call.
* config/aarch64/aarch64-protos.h (aarch64_parse_extension): Add
new argument.
(aarch64_get_all_extension_candidates): New function.
* config/aarch64/aarch64.c (aarch64_parse_arch): Add new
argument invalid_extension.
(aarch64_parse_cpu): Likewise.
(aarch64_print_hint_for_extensions): New function.
(aarch64_validate_mcpu): Provide hint about invalid extension.
(aarch64_validate_march): Likewise.
(aarch64_handle_attr_arch): Pass new argument.
(aarch64_handle_attr_cpu): Provide hint about invalid extension.
(aarch64_handle_attr_isa_flags): Likewise.

gcc/testsuite/ChangeLog:

2018-10-05  Martin Liska  

PR driver/83193
* gcc.target/aarch64/spellcheck_7.c: New test.
* gcc.target/aarch64/spellcheck_8.c: New test.
* gcc.target/aarch64/spellcheck_9.c: New test.
---
 gcc/common/config/aarch64/aarch64-common.c| 24 +-
 gcc/config/aarch64/aarch64-protos.h   |  4 +-
 gcc/config/aarch64/aarch64.c  | 75 +++
 .../gcc.target/aarch64/spellcheck_7.c | 12 +++
 .../gcc.target/aarch64/spellcheck_8.c | 13 
 .../gcc.target/aarch64/spellcheck_9.c | 13 
 6 files changed, 121 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/spellcheck_7.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/spellcheck_8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/spellcheck_9.c







Re: [PATCH][c++] Fix DECL_BY_REFERENCE of clone parms

2018-10-23 Thread Jakub Jelinek
On Tue, Oct 23, 2018 at 06:28:27PM +0200, Tom de Vries wrote:
> On 7/31/18 11:22 AM, Richard Biener wrote:
> > Otherwise OK for trunk and also for branches after a while.
> 
> I just backported this fix to gcc-8-branch and gcc-7-branch.
> 
> I noticed that the gcc-6 branch is frozen, and changes require RM
> approval.  Do you want this fix in gcc-6?

This is ok for gcc-6 now.  Thanks.

Jakub


Re: [PATCH][c++] Fix DECL_BY_REFERENCE of clone parms

2018-10-23 Thread Tom de Vries
On 7/31/18 11:22 AM, Richard Biener wrote:
> Otherwise OK for trunk and also for branches after a while.

Jakub,

I just backported this fix to gcc-8-branch and gcc-7-branch.

I noticed that the gcc-6 branch is frozen, and changes require RM
approval.  Do you want this fix in gcc-6?

Thanks,
- Tom


Re: [PATCH, rs6000 2/2] Add compatible implementations of x86 SSSE3 intrinsics

2018-10-23 Thread Segher Boessenkool
On Tue, Oct 23, 2018 at 10:29:35AM -0500, Paul Clarke wrote:
> On 10/22/2018 06:38 PM, Segher Boessenkool wrote:
> > On Mon, Oct 22, 2018 at 01:26:11PM -0500, Paul Clarke wrote:
> >> Target tests for the intrinsics defined in pmmintrin.h, copied from
> >> gcc.target/i386.
> >>
> >> Tested on POWER8 ppc64le and ppc64 (-m64 and -m32, the latter only reporting
> >> 16 new unsupported tests), and also by forcing -mcpu=power7 on ppc64.
> > 
> > Why are they unsupported?  lp64?  Why do many of those tests require
> > lp64 anyway?  It's not obvious to me.
> 
> None of the x86 intrinsics compatibility implementation code has thus far 
> supported -m32, and I'd venture that it's not interesting to anyone.  Or at 
> least not worth the effort.

You can use it with -m32 just fine.  Disabling the tests isn't a good idea.
Either disable the *feature*, or enable the tests.

> > You tested on a >=p8 (with a compiler defaulting to that, too) I hope ;-)
> 
> As stated, "Tested on POWER8 ppc64le" (which defaults to -mcpu=power8), or 
> did you mean something else?

The big-endian part.  It does not default to p8 unless you arrange for
that specially.  The tests require p8 hardware, but that means if you
tested BE without --with-cpu=power8 or similar you really didn't test
anything.


Segher


Re: [patch, fortran] Implement FINDLOC

2018-10-23 Thread Dominique d'Humières



> On 22 Oct 2018, at 23:00, Thomas Koenig wrote:
> 
> Hi Dominique,
> 
>> With your patch, compiling the following test
>> program logtest3
>>implicit none
>>logical :: x = .true.
>>integer, parameter :: I_FINDLOC_BACK(1) = findloc([1,1],1, &
>>   back=x)
>> end program logtest3
>> gives an ICE
> 
> I sometimes wonder where you get all these test cases from…

This is a reduction of one of James van Buskirk's tests at
https://groups.google.com/forum/?fromgroups=#!topic/comp.lang.fortran/GpaACNKn0Ds

> 
> Anyway, the attached patch fixes this,

It now gives the error

   4 |integer, parameter :: I_FINDLOC_BACK(1) = findloc([1,1],1, &
  |1
Error: transformational intrinsic 'findloc' at (1) is not permitted in an 
initialization expression

However a similar test

program logtest3 
   implicit none 
   integer, parameter :: A1 = 2 
   logical, parameter :: L1 = transfer(A1,.FALSE.)
   integer, parameter :: I_FINDLOC_MASK(1) = findloc([1,1],1, & 
  mask=[L1,.TRUE.]) 
   print *, A1, L1, I_FINDLOC_MASK(1)
end program logtest3 

compiles and gives '   2 F   2' at run time. Also, I see several
transformational intrinsics accepted as initialization expressions.

The following test

program logtest3 
   implicit none 
! ! 
! *** Everything depends on this parameter ***! 

   integer, parameter :: A1 = 2
   logical :: L
   L = transfer(A1,L) 
   call sub(L) 
end program logtest3 

subroutine sub(x) 
   implicit none 
   logical x 
   integer a(1) 
   character(*), parameter :: strings(2) = ['.TRUE. ','.FALSE.'] 

   a = findloc([1,1],1,mask=[x,.TRUE.]) 
   write(*,'(a)') 'Value by FINDLOC(MASK): '// & 
  trim(strings(a(1))) 
   a = findloc([1,1],1,back=x) 
   write(*,'(a)') 'Value by FINDLOC(BACK): '// & 
  trim(strings(3-a(1))) 

end subroutine sub 

does not link:

8 |L = transfer(A1,L)
  |   1
Warning: Assigning value other than 0 or 1 to LOGICAL has undefined result at 
(1)
Undefined symbols for architecture x86_64:
  "__gfortran_findloc0_i4", referenced from:
  _sub_ in ccnoLKfH.o
  "__gfortran_mfindloc0_i4", referenced from:
  _sub_ in ccnoLKfH.o
ld: symbol(s) not found for architecture x86_64
collect2: error: ld returned 1 exit status

Finally the line before the end of findloc_6.f90 should be

  if (findloc(ch,"CC ",dim=1,mask=false) /= 0) stop 23

TIA

Dominique

>  plus the print *, instead
> of test for return values, plus the whitespace issues mentioned
> by Bernhard. Patch gzipped this time to let it go through to
> gcc-patches.
> 
> OK for trunk?
> 
> Regards
> 
>   Thomas
> 



[PATCH v3] Avoid unnecessarily numbered clone symbols

2018-10-23 Thread Michael Ploujnikov
On 2018-10-21 09:14 PM, Michael Ploujnikov wrote:
> Continuing from https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01258.html
> 
> Fixed up the code after the change to concat suggested by Bernhard
> Reutner.
> 
> Outstanding question still remains:
> 
> To write an exact replacement for numbered_clone_function_name (apart
> from the numbering) I also need to copy the double underscore
> prefixing behaviour done by ASM_PN_FORMAT (right?)  which is used by
> ASM_FORMAT_PRIVATE_NAME. Does that mean that I can't use my
> suffixed_function_name to replace the very similar looking code in
> cgraph_node::create_virtual_clone? Or is it just missing the double
> underscore prefix by mistake?

I found https://gcc.gnu.org/ml/gcc-patches/2013-08/msg01028.html which
answered my question so now I'm just simplifying
cgraph_node::create_virtual_clone with ACONCAT.
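
For readers following the naming discussion, here is a minimal standalone
sketch of the two schemes this series separates: a plain "name.suffix" form
and a numbered "name.suffix.N" form. It is not the GCC code. The helper names
suffixed_name and numbered_name, the clone_counter variable, and the use of
'.' as the separator are assumptions made for illustration only; GCC itself
builds the numbered form through ASM_FORMAT_PRIVATE_NAME and interns the
result with get_identifier.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for GCC's clone_fn_id_num counter.  */
unsigned int clone_counter;

/* Store "NAME.SUFFIX" in BUF.  Return 0 on success, or -1 if the
   result (including the terminating NUL) does not fit in SIZE bytes.  */
int
suffixed_name (char *buf, size_t size, const char *name, const char *suffix)
{
  assert (buf && name && suffix);
  size_t need = strlen (name) + 1 + strlen (suffix) + 1;
  if (need > size)
    return -1;
  snprintf (buf, size, "%s.%s", name, suffix);
  return 0;
}

/* Store "NAME.SUFFIX.N" in BUF, where N is the next counter value.
   Return 0 on success, or -1 if the output was truncated.  */
int
numbered_name (char *buf, size_t size, const char *name, const char *suffix)
{
  assert (buf && name && suffix);
  int n = snprintf (buf, size, "%s.%s.%u", name, suffix, clone_counter++);
  return (n >= 0 && (size_t) n < size) ? 0 : -1;
}
```

Calling numbered_name twice for the same function and suffix yields two
distinct names, while suffixed_name always yields the same one, so the
unnumbered form only suits clones that exist at most once per function.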


- Michael
From 383c64faa956c8b06e680808ef275acb6a746158 Mon Sep 17 00:00:00 2001
From: Michael Ploujnikov 
Date: Tue, 7 Aug 2018 20:36:53 -0400
Subject: [PATCH 1/4] Rename clone_function_name_1 and clone_function_name to
 clarify usage.

gcc:
2018-10-23  Michael Ploujnikov  

   * gcc/cgraph.h: Rename clone_function_name_1 to
 numbered_clone_function_name_1. Rename clone_function_name to
 numbered_clone_function_name.
   * cgraphclones.c: Ditto.
   * config/rs6000/rs6000.c: Ditto.
   * lto/lto-partition.c: Ditto.
   * multiple_target.c: Ditto.
   * omp-expand.c: Ditto.
   * omp-low.c: Ditto.
   * omp-simd-clone.c: Ditto.
   * symtab.c: Ditto.
---
 gcc/cgraph.h   |  4 ++--
 gcc/cgraphclones.c | 22 +-
 gcc/config/rs6000/rs6000.c |  2 +-
 gcc/lto/lto-partition.c|  4 ++--
 gcc/multiple_target.c  |  8 
 gcc/omp-expand.c   |  2 +-
 gcc/omp-low.c  |  4 ++--
 gcc/omp-simd-clone.c   |  2 +-
 gcc/symtab.c   |  3 ++-
 9 files changed, 28 insertions(+), 23 deletions(-)

diff --git gcc/cgraph.h gcc/cgraph.h
index a8b1b4c..3583f7e 100644
--- gcc/cgraph.h
+++ gcc/cgraph.h
@@ -2368,8 +2368,8 @@ basic_block init_lowered_empty_function (tree, bool, profile_count);
 tree thunk_adjust (gimple_stmt_iterator *, tree, bool, HOST_WIDE_INT, tree);
 /* In cgraphclones.c  */
 
-tree clone_function_name_1 (const char *, const char *);
-tree clone_function_name (tree decl, const char *);
+tree numbered_clone_function_name_1 (const char *, const char *);
+tree numbered_clone_function_name (tree decl, const char *);
 
 void tree_function_versioning (tree, tree, vec *,
 			   bool, bitmap, bool, bitmap, basic_block);
diff --git gcc/cgraphclones.c gcc/cgraphclones.c
index 6e84a31..4395806 100644
--- gcc/cgraphclones.c
+++ gcc/cgraphclones.c
@@ -316,7 +316,8 @@ duplicate_thunk_for_node (cgraph_node *thunk, cgraph_node *node)
   gcc_checking_assert (!DECL_RESULT (new_decl));
   gcc_checking_assert (!DECL_RTL_SET_P (new_decl));
 
-  DECL_NAME (new_decl) = clone_function_name (thunk->decl, "artificial_thunk");
+  DECL_NAME (new_decl) = numbered_clone_function_name (thunk->decl,
+		   "artificial_thunk");
   SET_DECL_ASSEMBLER_NAME (new_decl, DECL_NAME (new_decl));
 
   new_thunk = cgraph_node::create (new_decl);
@@ -514,11 +515,11 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count,
 
 static GTY(()) unsigned int clone_fn_id_num;
 
-/* Return a new assembler name for a clone with SUFFIX of a decl named
-   NAME.  */
+/* Return NAME appended with string SUFFIX and a unique unspecified
+   number.  */
 
 tree
-clone_function_name_1 (const char *name, const char *suffix)
+numbered_clone_function_name_1 (const char *name, const char *suffix)
 {
   size_t len = strlen (name);
   char *tmp_name, *prefix;
@@ -531,13 +532,15 @@ clone_function_name_1 (const char *name, const char *suffix)
   return get_identifier (tmp_name);
 }
 
-/* Return a new assembler name for a clone of DECL with SUFFIX.  */
+/* Return a new assembler name for a clone of DECL.  Apart from the
+   string SUFFIX, the new name will end with a unique unspecified
+   number.  */
 
 tree
-clone_function_name (tree decl, const char *suffix)
+numbered_clone_function_name (tree decl, const char *suffix)
 {
   tree name = DECL_ASSEMBLER_NAME (decl);
-  return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix);
+  return numbered_clone_function_name_1 (IDENTIFIER_POINTER (name), suffix);
 }
 
 
@@ -585,7 +588,8 @@ cgraph_node::create_virtual_clone (vec redirect_callers,
   strcpy (name + len + 1, suffix);
   name[len] = '.';
   DECL_NAME (new_decl) = get_identifier (name);
-  SET_DECL_ASSEMBLER_NAME (new_decl, clone_function_name (old_decl, suffix));
+  SET_DECL_ASSEMBLER_NAME (new_decl,
+			   numbered_clone_function_name (old_decl, suffix));
   SET_DECL_RTL (new_decl, NULL);
 
   new_node = create_clone (new_decl, count, false,
@@ -964,7 +968,7 @@ cgraph_node::create_version_clone_with_body
   = build_function_decl_skip_args (old_decl, args_to_skip, 

[PATCH] powerpc: Optimized conversion of IBM long double to int128/int64

2018-10-23 Thread Rajalakshmi Srinivasaraghavan
This new implementation of fixunstfdi and fixunstfti
gives a 16X performance improvement.
The design focuses on:
- Making sure the end result is a pure leaf function that
  only needs builtins or inline functions.
- Assuming power8 direct register transfer and accessing the IBM
  long double as an int bit-field structure.
- Understanding the quirks of IBM long double and decomposing the
  code into a set of optimized sub-cases.
Tested on powerpc64le.

2018-10-20  Steven Munroe  
Rajalakshmi Srinivasaraghavan  

* libgcc/config/rs6000/t-ppc64-fp (LIB2ADD): Add
$(srcdir)/config/rs6000/fixunstfti.c.
* libgcc/config/rs6000/ppc64-fp.c (__fixunstfdi): Remove definition.
* libgcc/config/rs6000/fixunstfti.c: New file.
* libgcc/config/rs6000/fixunstfdi.c: Likewise.
* libgcc/config/rs6000/ibm-ldouble.h: Likewise.
---
 libgcc/config/rs6000/fixunstfdi.c  | 124 
 libgcc/config/rs6000/fixunstfti.c  | 125 +
 libgcc/config/rs6000/ibm-ldouble.h | 121 
 libgcc/config/rs6000/ppc64-fp.c|  24 --
 libgcc/config/rs6000/t-ppc64-fp|   5 +-
 5 files changed, 374 insertions(+), 25 deletions(-)
 create mode 100755 libgcc/config/rs6000/fixunstfdi.c
 create mode 100755 libgcc/config/rs6000/fixunstfti.c
 create mode 100755 libgcc/config/rs6000/ibm-ldouble.h

diff --git a/libgcc/config/rs6000/fixunstfdi.c 
b/libgcc/config/rs6000/fixunstfdi.c
new file mode 100755
index 000..1b1a4f280bd
--- /dev/null
+++ b/libgcc/config/rs6000/fixunstfdi.c
@@ -0,0 +1,124 @@
+/* Convert IBM long double to 64bit unsigned integer.
+
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#if defined(__powerpc64__) || defined (__64BIT__) || defined(__ppc64__)
+#include 
+#include "ibm-ldouble.h"
+
+typedef unsigned int UDItype __attribute__ ((mode (DI)));
+typedef float TFtype __attribute__ ((mode (TF)));
+extern UDItype __fixunstfdi (TFtype);
+
+#define TWO53 9007199254740992.0L
+#define TWO64 18446744073709551616.0L
+
+UDItype
+__fixunstfdi (TFtype a)
+{
+  unsigned long result;
+  unsigned long qi0, qi1;
+  union ibm_extended_long_double ld;
+  uint64_t l0, l1;
+  long exp0, exp1;
+  const uint64_t two52 = 0x10;
+  if (__builtin_unpack_longdouble (a, 0) < TWO53)
+{
+  /* In this case the integer portion is completely contained
+ within the high double.  So use the hardware convert to
+ integer doubleword, and then extend to int.  */
+  l1 = __builtin_unpack_longdouble (a, 0);
+  result = l1;
+}
+  else
+{
+  if (a < TWO64)
+   {
+ ld.ld = a;
+ l0 = two52 | ((uint64_t)ld.d[0].ieee.mantissa0 << 32)
+ | ld.d[0].ieee.mantissa1;
+ l1 = two52 | ((uint64_t)ld.d[1].ieee.mantissa0 << 32)
+ | ld.d[1].ieee.mantissa1;
+ exp0 = ld.d[0].ieee.exponent - IEEE754_DOUBLE_BIAS;
+ exp1 = ld.d[1].ieee.exponent - IEEE754_DOUBLE_BIAS;
+ /* The high double shift is (non-negative) because in this
+case we know the value it greater than 2^53 -1.  */
+ qi0 = l0;
+ qi0 = qi0 << (exp0 - 52);
+ /* The low double is tricky because it could be
+zero/denormal and have a large negative exponent.  */
+ if ( exp1 > -1022)
+   {
+ /* Need to right justify the integer portion of the
+low double.  This may be a left or right shift.  */
+ exp1 = exp1 - 52;
+ if (exp1 < 0)
+   {
+ /* Negative exponent,  shift right to truncate.  */
+ l1 = l1 >> (-exp1);
+ /* If we 

Re: [PATCH, rs6000 2/2] Add compatible implementations of x86 SSSE3 intrinsics

2018-10-23 Thread Paul Clarke
On 10/22/2018 06:38 PM, Segher Boessenkool wrote:
> On Mon, Oct 22, 2018 at 01:26:11PM -0500, Paul Clarke wrote:
>> Target tests for the intrinsics defined in pmmintrin.h, copied from
>> gcc.target/i386.
>>
>> Tested on POWER8 ppc64le and ppc64 (-m64 and -m32, the latter only reporting
>> 16 new unsupported tests), and also by forcing -mcpu=power7 on ppc64.
> 
> Why are they unsupported?  lp64?  Why do many of those tests require
> lp64 anyway?  It's not obvious to me.

None of the x86 intrinsics compatibility implementation code has thus far 
supported -m32, and I'd venture that it's not interesting to anyone.  Or at 
least not worth the effort.

> You tested on a >=p8 (with a compiler defaulting to that, too) I hope ;-)

As stated, "Tested on POWER8 ppc64le" (which defaults to -mcpu=power8), or did 
you mean something else?

PC



Re: [PATCH] combine: Do not combine moves from hard registers

2018-10-23 Thread Segher Boessenkool
On Tue, Oct 23, 2018 at 05:16:38PM +0200, Andreas Schwab wrote:
> This miscompiles libffi and libgo on ia64.

Ouch.  I cannot read ia64 machine code without a lot of handholding...
Any hints what is wrong?


Segher


Re: [PATCH] combine: Do not combine moves from hard registers

2018-10-23 Thread Andreas Schwab
This miscompiles libffi and libgo on ia64.

The following libffi tests fail:

libffi.call/nested_struct.c -W -Wall -Wno-psabi -O2 -fomit-frame-pointer 
execution test
libffi.call/nested_struct.c -W -Wall -Wno-psabi -O2 execution test
libffi.call/nested_struct.c -W -Wall -Wno-psabi -Os execution test
libffi.call/nested_struct1.c -W -Wall -Wno-psabi -O2 -fomit-frame-pointer 
execution test
libffi.call/nested_struct1.c -W -Wall -Wno-psabi -O2 execution test
libffi.call/nested_struct1.c -W -Wall -Wno-psabi -Os execution test
libffi.call/stret_large.c -W -Wall -Wno-psabi -Os output pattern test
libffi.call/stret_large2.c -W -Wall -Wno-psabi -Os output pattern test
libffi.call/stret_medium2.c -W -Wall -Wno-psabi -Os output pattern test

And a lot of libgo tests fail.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [ARM/FDPIC v3 03/21] [ARM] FDPIC: Force FDPIC related options unless -mno-fdpic is provided

2018-10-23 Thread Segher Boessenkool
On Tue, Oct 23, 2018 at 02:58:21PM +0100, Richard Earnshaw (lists) wrote:
> On 15/10/2018 11:10, Christophe Lyon wrote:
> > Do you mean to also make -mfdpic non-existent/rejected when GCC is not
> > configured for arm-uclinuxfdpiceabi?
> 
> Ideally doesn't exist, so that it doesn't show up in things like --help
> when it doesn't work.
> 
> > How to achieve that?
> 
> Good question, I'm not sure, off hand.  It might be possible to make the
> config machinery add additional opt files, but it's not something I've
> tried.  You might want to try adding an additional opt file to
> extra_options for fdpic targets.

That should work yes.  You could look at how 476.opt is added for powerpc,
it is a comparable situation.


Segher


Re: [PATCH] Fix g++.dg/cpp2a/lambda-this3.C (Re: PATCH to enable testing C++17 by default)

2018-10-23 Thread Jason Merrill
OK.
On Tue, Oct 23, 2018 at 4:52 AM Jakub Jelinek  wrote:
>
> On Wed, Oct 17, 2018 at 03:31:43PM -0400, Marek Polacek wrote:
> > As discussed in  
> > it
> > seems to be a high time we turned on testing C++17 by default.
> >
> > The only interesting part is at the very end, otherwise most of the changes 
> > is
> > just using { target c++17 } instead of explicit dg-options.  Removing
> > dg-options has the effect that DEFAULT_CXXFLAGS comes in play, so I've 
> > removed
> > a bunch of stray semicolons to fix -Wpedantic errors.
> >
> > I wonder if we also want to enable 2a, but the overhead could be too much.  
> > Or
> > use 2a instead of 17?
> >
> > Bootstrapped/regtested on x86_64-linux, ok for trunk?
> >
> > 2018-10-17  Marek Polacek  
> >
> >   * g++.dg/*.C: Use target c++17 instead of explicit dg-options.
> >   * lib/g++-dg.exp: Don't test C++11 by default.  Add C++17 to
> >   the list of default stds to test.
>
> > diff --git gcc/testsuite/g++.dg/cpp2a/lambda-this3.C 
> > gcc/testsuite/g++.dg/cpp2a/lambda-this3.C
> > index 5e5c8b3d50f..d1738ea7d17 100644
> > --- gcc/testsuite/g++.dg/cpp2a/lambda-this3.C
> > +++ gcc/testsuite/g++.dg/cpp2a/lambda-this3.C
> > @@ -1,6 +1,6 @@
> >  // P0806R2
> > -// { dg-do compile }
> > -// { dg-options "-std=c++17" }
> > +// { dg-do compile { target c++17 } }
> > +// { dg-options "" }
> >
> >  struct X {
> >int x;
>
> This test now fails with -std=gnu++2a:
> /.../gcc/gcc/testsuite/g++.dg/cpp2a/lambda-this3.C: In lambda function:
> /.../gcc/gcc/testsuite/g++.dg/cpp2a/lambda-this3.C:8:15: warning: implicit 
> capture of 'this' via '[=]' is deprecated in C++20 [-Wdeprecated]
> /.../gcc/gcc/testsuite/g++.dg/cpp2a/lambda-this3.C:8:15: note: add explicit 
> 'this' or '*this' capture
> /.../gcc/gcc/testsuite/g++.dg/cpp2a/lambda-this3.C: In lambda function:
> /.../gcc/gcc/testsuite/g++.dg/cpp2a/lambda-this3.C:16:15: warning: implicit 
> capture of 'this' via '[=]' is deprecated in C++20 [-Wdeprecated]
> /.../gcc/gcc/testsuite/g++.dg/cpp2a/lambda-this3.C:16:15: note: add explicit 
> 'this' or '*this' capture
> /.../gcc/gcc/testsuite/g++.dg/cpp2a/lambda-this3.C:17:16: warning: implicit 
> capture of 'this' via '[=]' is deprecated in C++20 [-Wdeprecated]
> /.../gcc/gcc/testsuite/g++.dg/cpp2a/lambda-this3.C:17:16: note: add explicit 
> 'this' or '*this' capture
> /.../gcc/gcc/testsuite/g++.dg/cpp2a/lambda-this3.C:18:13: warning: implicit 
> capture of 'this' via '[=]' is deprecated in C++20 [-Wdeprecated]
> /.../gcc/gcc/testsuite/g++.dg/cpp2a/lambda-this3.C:18:13: note: add explicit 
> 'this' or '*this' capture
> FAIL: g++.dg/cpp2a/lambda-this3.C  -std=gnu++2a  (test for bogus messages, 
> line 8)
> FAIL: g++.dg/cpp2a/lambda-this3.C  -std=gnu++2a  (test for bogus messages, 
> line 16)
> FAIL: g++.dg/cpp2a/lambda-this3.C  -std=gnu++2a  (test for bogus messages, 
> line 17)
> FAIL: g++.dg/cpp2a/lambda-this3.C  -std=gnu++2a  (test for bogus messages, 
> line 18)
> PASS: g++.dg/cpp2a/lambda-this3.C  -std=gnu++2a (test for excess errors)
>
> The following patch fixes this, tested on x86_64-linux with
> make check-c++-all RUNTESTFLAGS=dg.exp=lambda-this3.C
> Ok for trunk?
>
> 2018-10-23  Jakub Jelinek  
>
> * g++.dg/cpp2a/lambda-this3.C: Limit dg-bogus directives to 
> c++17_down only.
> Add expected warnings and messages for c++2a.
>
> --- gcc/testsuite/g++.dg/cpp2a/lambda-this3.C.jj2018-10-22 
> 09:28:06.807650016 +0200
> +++ gcc/testsuite/g++.dg/cpp2a/lambda-this3.C   2018-10-23 10:48:13.992577673 
> +0200
> @@ -5,7 +5,9 @@
>  struct X {
>int x;
>void foo (int n) {
> -auto a1 = [=] { x = n; }; // { dg-bogus "implicit capture" }
> +auto a1 = [=] { x = n; }; // { dg-bogus "implicit capture" "" { target 
> c++17_down } }
> + // { dg-warning "implicit capture of 'this' via 
> '\\\[=\\\]' is deprecated" "" { target c++2a } .-1 }
> + // { dg-message "add explicit 'this' or 
> '\\\*this' capture" "" { target c++2a } .-2 }
>  auto a2 = [=, this] { x = n; };
>  // { dg-warning "explicit by-copy capture" "" { target c++17_down } .-1 }
>  auto a3 = [=, *this]() mutable { x = n; };
> @@ -13,9 +15,15 @@ struct X {
>  auto a5 = [&, this] { x = n; };
>  auto a6 = [&, *this]() mutable { x = n; };
>
> -auto a7 = [=] { // { dg-bogus "implicit capture" }
> -  auto a = [=] { // { dg-bogus "implicit capture" }
> -auto a2 = [=] { x = n; }; // { dg-bogus "implicit capture" }
> +auto a7 = [=] { // { dg-bogus "implicit capture" "" { target c++17_down 
> } }
> +   // { dg-warning "implicit capture of 'this' via 
> '\\\[=\\\]' is deprecated" "" { target c++2a } .-1 }
> +   // { dg-message "add explicit 'this' or '\\\*this' 
> capture" "" { target c++2a } .-2 }
> +  auto a = [=] { // { dg-bogus "implicit capture" "" { target c++17_down 
> } }
> +// 

Re: [PATCH] Switch conversion: support any ax + b transformation (PR tree-optimization/84436).

2018-10-23 Thread Martin Liška
On 10/23/18 12:20 PM, Richard Biener wrote:
> On Tue, Oct 23, 2018 at 10:37 AM Martin Liška  wrote:
>>
>> On 10/22/18 4:25 PM, Jakub Jelinek wrote:
>>> On Mon, Oct 22, 2018 at 04:08:53PM +0200, Martin Liška wrote:
>>>> Very valid question. I hope as long as I calculate the linear function
>>>> values in wide_int (get via wi::to_wide (switch_element)), then it should
>>>> overflow in the same way as original tree type arithmetic. I have a
>>>> test-case with overflow: gcc/testsuite/gcc.dg/tree-ssa/pr84436-4.c.
>>>>
>>>> Do you have any {over,under}flowing test-cases that I should add to
>>>> test-suite?
>>>
>>> I'm worried that the calculation you emit into the code could invoke UB at
>>> runtime, even if there was no UB in the original code, and later GCC passes
>>> would optimize with the assumption that UB doesn't occur.
>>> E.g. if the multiplication overflows for one or more of the valid values in
>>> the switch and then the addition adds a negative value so that the end
>>> result is actually representable.
>>
>> In order to address that, I verified that neither (a * x) nor (a * x) + b
>> {over,under}flows when TYPE_OVERFLOW_UNDEFINED (type) is true.
>>
>> I hope that is the proper way to make it safe?
> 
> Hmm, if the default: case is unreachable maybe.  But I guess Jakub was
> suggesting to do the linear function compute in an unsigned type?
> 
> +  /* Let's try to find any linear function a.x + y that can apply to
> 
> a * x?

Yep.

> 
> + given values. 'a' can be calculated as follows:
> 
> +  tree t = TREE_TYPE (m_index_expr);
> 
> so unsigned_type_for (TREE_TYPE ...)
> 
> +  tree tmp = make_ssa_name (t);
> +  tree value = fold_build2_loc (loc, MULT_EXPR, t,
> +   wide_int_to_tree (t, coeff_a),
> +   m_index_expr);
> +
> 
> +  gsi_insert_before (, gimple_build_assign (tmp, value),
> GSI_SAME_STMT);
> +  value = fold_build2_loc (loc, PLUS_EXPR, t,
> +  tmp, wide_int_to_tree (t, coeff_b));
> +  tree tmp2 = make_ssa_name (t);
> +  gsi_insert_before (, gimple_build_assign (tmp2, value),
> +GSI_SAME_STMT);
> +  load = gimple_build_assign (name, NOP_EXPR, fold_convert (t, tmp2));
> 
> before the unsigned_type_for that NOP_EXPR would be always redundant.
> 
> Please also use
> 
>   gimple_seq seq = NULL;
>   tree tmp = gimple_build (, MULT_EXPR, type, ...);
>   tree tmp2 = gimple_build (, PLUS_EXPR, type, ...);
>   tree tmp3 = gimple_convert (, TREE_TYPE (m_index_expr), tmp2);
>   gsi_insert_seq_before (, seq, GSI_SAME_STMT);
>   load = gimple_build_assign (name, tmp3);
> 
> not sure why you need the extra assignment at the end, not enough
> context in the patch.

Thanks for the hint. I did that and tested the patch. It looks fine.

Martin

> 
> Richard.
> 
> 
>> Martin
>>
>>>
>>>   Jakub
>>>
>>
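
As a C-level illustration of the concern raised in this thread, the sketch
below evaluates a * x + b in an unsigned type so that an intermediate
overflow wraps instead of invoking undefined behaviour. It is only an
analogue of doing the computation in unsigned_type_for (...) and converting
back, not the GIMPLE the patch emits. The function names are hypothetical,
and the conversion back to int is implementation-defined when the value
exceeds INT_MAX (GCC documents it as reduction modulo 2^N).

```c
#include <assert.h>
#include <limits.h>

/* Evaluate a * x + b with all arithmetic done in unsigned int, where
   wraparound is well defined, then convert the result back to int.  */
int
linear_in_unsigned (int a, int x, int b)
{
  unsigned int r = (unsigned int) a * (unsigned int) x + (unsigned int) b;
  return (int) r;
}

/* The scenario described above: the multiplication alone would overflow
   a signed int (2 * x equals INT_MAX + 1), but adding the negative
   constant brings the final result back into range.  */
void
check_intermediate_overflow (void)
{
  int x = INT_MAX / 2 + 1;
  assert (linear_in_unsigned (2, x, -1) == INT_MAX);
}
```

Written directly in int, the same expression 2 * x - 1 would be undefined for
this x even though the mathematical result is representable, which is the
case later passes could otherwise exploit.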

From 81b29b5ba12f043d091b81805116793af4a98442 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 11 Oct 2018 12:37:37 +0200
Subject: [PATCH] Switch conversion: support any ax + b transformation (PR
 tree-optimization/84436).

gcc/ChangeLog:

2018-10-11  Martin Liska  

	PR tree-optimization/84436
	* tree-switch-conversion.c (switch_conversion::contains_same_values_p):
	Remove.
	(switch_conversion::contains_linear_function_p): New.
	(switch_conversion::build_one_array): Support linear
	transformation on input.
	* tree-switch-conversion.h (struct switch_conversion): Add
	contains_linear_function_p declaration.

gcc/testsuite/ChangeLog:

2018-10-11  Martin Liska  

	PR tree-optimization/84436
	* gcc.dg/tree-ssa/pr84436-1.c: New test.
	* gcc.dg/tree-ssa/pr84436-2.c: New test.
	* gcc.dg/tree-ssa/pr84436-3.c: New test.
	* gcc.dg/tree-ssa/pr84436-4.c: New test.
	* gcc.dg/tree-ssa/pr84436-5.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr84436-1.c | 36 ++
 gcc/testsuite/gcc.dg/tree-ssa/pr84436-2.c | 67 +
 gcc/testsuite/gcc.dg/tree-ssa/pr84436-3.c | 24 +++
 gcc/testsuite/gcc.dg/tree-ssa/pr84436-4.c | 38 ++
 gcc/testsuite/gcc.dg/tree-ssa/pr84436-5.c | 38 ++
 gcc/tree-switch-conversion.c  | 87 +++
 gcc/tree-switch-conversion.h  | 10 +--
 7 files changed, 281 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr84436-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr84436-2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr84436-3.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr84436-4.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr84436-5.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr84436-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr84436-1.c
new file mode 100644
index 000..a045b44c2b9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr84436-1.c
@@ -0,0 +1,36 @@
+/* PR tree-optimization/84436 */
+/* { dg-options "-O2 -fdump-tree-switchconv -fdump-tree-optimized" } */
+/* { dg-do run } */
+
+int

[PATCH] Remove redundant dumps and make tp_first_run dump more compact.

2018-10-23 Thread Martin Liška
Hi.

I've noticed a redundancy in the cgraph_node dump function and I would like to
simplify and compact how Function flags are printed. Plus I moved 'First run'
to the flags as well. One diff example:

@@ -133,8 +125,7 @@
   Referring: 
   Availability: available
   Profile id: 108032747
-  First run: 6
-  Function flags: count: 1 body only_called_at_startup nonfreeing_fn only_called_at_startup
+  Function flags: count:1 first_run:6 body only_called_at_startup nonfreeing_fn
   Called by: 
   Calls: 
 g/0 (g) @0x76ad8000

Patch survives regression tests on x86_64-linux-gnu.
Ready for trunk?
Thanks,
Martin

gcc/ChangeLog:

2018-10-23  Martin Liska  

* cgraph.c (cgraph_node::dump):
Remove redundant dumps and make tp_first_run dump more compact.

libgcc/ChangeLog:

2018-10-23  Martin Liska  

* libgcov-profiler.c: Start from 1 in order to distinguish
functions which were seen and these that were not.
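
A minimal sketch of the idea behind the libgcov entry above, assuming a
single global counter and one slot per function: if numbering starts at 1,
a slot that still holds 0 can only mean "never executed". The names
time_profiler_counter and record_first_run are hypothetical and do not
reflect the actual libgcov interface.

```c
#include <assert.h>

/* Hypothetical global ordering counter.  Starting at 1 reserves 0 for
   functions that were never run.  */
unsigned int time_profiler_counter = 1;

/* Record the order of the first execution in *SLOT.  Later calls for
   the same slot leave the recorded value unchanged.  */
void
record_first_run (unsigned int *slot)
{
  assert (slot != 0);
  if (*slot == 0)
    *slot = time_profiler_counter++;
}
```

With this scheme the first function executed records 1 rather than 0, which
is consistent with the time-profiler test expectations shifting up by one in
this patch.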
---
 gcc/cgraph.c | 15 ++-
 gcc/testsuite/gcc.dg/tree-prof/time-profiler-1.c |  2 +-
 gcc/testsuite/gcc.dg/tree-prof/time-profiler-2.c |  4 ++--
 gcc/testsuite/gcc.dg/tree-prof/time-profiler-3.c |  2 +-
 libgcc/libgcov-profiler.c|  2 +-
 5 files changed, 11 insertions(+), 14 deletions(-)


diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 48bab9f2749..b432f7e6500 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -2016,7 +2016,6 @@ cgraph_node::dump (FILE *f)
   if (profile_id)
 fprintf (f, "  Profile id: %i\n",
 	 profile_id);
-  fprintf (f, "  First run: %i\n", tp_first_run);
   cgraph_function_version_info *vi = function_version ();
   if (vi != NULL)
 {
@@ -2040,11 +2039,13 @@ cgraph_node::dump (FILE *f)
   fprintf (f, "  Function flags:");
   if (count.initialized_p ())
 {
-  fprintf (f, " count: ");
+  fprintf (f, " count:");
   count.dump (f);
 }
+  if (tp_first_run > 0)
+fprintf (f, " first_run:%i", tp_first_run);
   if (origin)
-fprintf (f, " nested in: %s", origin->asm_name ());
+fprintf (f, " nested in:%s", origin->asm_name ());
   if (gimple_has_body_p (decl))
 fprintf (f, " body");
   if (process)
@@ -2081,10 +2082,6 @@ cgraph_node::dump (FILE *f)
 fprintf (f, " unlikely_executed");
   if (frequency == NODE_FREQUENCY_EXECUTED_ONCE)
 fprintf (f, " executed_once");
-  if (only_called_at_startup)
-fprintf (f, " only_called_at_startup");
-  if (only_called_at_exit)
-fprintf (f, " only_called_at_exit");
   if (opt_for_fn (decl, optimize_size))
 fprintf (f, " optimize_size");
   if (parallelized_function)
@@ -2096,7 +2093,7 @@ cgraph_node::dump (FILE *f)
 {
   fprintf (f, "  Thunk");
   if (thunk.alias)
-fprintf (f, "  of %s (asm: %s)",
+	fprintf (f, "  of %s (asm:%s)",
 		 lang_hooks.decl_printable_name (thunk.alias, 2),
 		 IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (thunk.alias)));
   fprintf (f, " fixed offset %i virtual value %i indirect_offset %i "
@@ -2112,7 +2109,7 @@ cgraph_node::dump (FILE *f)
   fprintf (f, "  Alias of %s",
 	   lang_hooks.decl_printable_name (thunk.alias, 2));
   if (DECL_ASSEMBLER_NAME_SET_P (thunk.alias))
-fprintf (f, " (asm: %s)",
+	fprintf (f, " (asm:%s)",
 		 IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (thunk.alias)));
   fprintf (f, "\n");
 }
diff --git a/gcc/testsuite/gcc.dg/tree-prof/time-profiler-1.c b/gcc/testsuite/gcc.dg/tree-prof/time-profiler-1.c
index 455f923f3f4..a622df23ce6 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/time-profiler-1.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/time-profiler-1.c
@@ -16,6 +16,6 @@ int main ()
 {
   return foo ();
 }
-/* { dg-final-use-not-autofdo { scan-ipa-dump-times "Read tp_first_run: 0" 1 "profile"} } */
 /* { dg-final-use-not-autofdo { scan-ipa-dump-times "Read tp_first_run: 1" 1 "profile"} } */
 /* { dg-final-use-not-autofdo { scan-ipa-dump-times "Read tp_first_run: 2" 1 "profile"} } */
+/* { dg-final-use-not-autofdo { scan-ipa-dump-times "Read tp_first_run: 3" 1 "profile"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-prof/time-profiler-2.c b/gcc/testsuite/gcc.dg/tree-prof/time-profiler-2.c
index e6eaeb99810..497b585388e 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/time-profiler-2.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/time-profiler-2.c
@@ -43,7 +43,7 @@ int main ()
 
   return r;
 }
-/* { dg-final-use-not-autofdo { scan-ipa-dump-times "Read tp_first_run: 0" 2 "profile"} } */
-/* { dg-final-use-not-autofdo { scan-ipa-dump-times "Read tp_first_run: 1" 1 "profile"} } */
+/* { dg-final-use-not-autofdo { scan-ipa-dump-times "Read tp_first_run: 1" 2 "profile"} } */
 /* { dg-final-use-not-autofdo { scan-ipa-dump-times "Read tp_first_run: 2" 1 "profile"} } */
 /* { dg-final-use-not-autofdo { scan-ipa-dump-times "Read tp_first_run: 3" 1 "profile"} } */
+/* { dg-final-use-not-autofdo { scan-ipa-dump-times "Read tp_first_run: 4" 1 "profile"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-prof/time-profiler-3.c 

Re: [PATCH] Fix g++.dg/cpp2a/lambda-this3.C (Re: PATCH to enable testing C++17 by default)

2018-10-23 Thread Marek Polacek
On Tue, Oct 23, 2018 at 10:52:02AM +0200, Jakub Jelinek wrote:
> On Wed, Oct 17, 2018 at 03:31:43PM -0400, Marek Polacek wrote:
> > > As discussed in  it
> > > seems to be high time we turned on testing C++17 by default.
> > > 
> > > The only interesting part is at the very end; otherwise most of the changes
> > > just use { target c++17 } instead of explicit dg-options.  Removing
> > > dg-options has the effect that DEFAULT_CXXFLAGS comes into play, so I've
> > > removed a bunch of stray semicolons to fix -Wpedantic errors.
> > > 
> > > I wonder if we also want to enable 2a, but the overhead could be too much.
> > > Or use 2a instead of 17?
> > 
> > Bootstrapped/regtested on x86_64-linux, ok for trunk?
> > 
> > 2018-10-17  Marek Polacek  
> > 
> > * g++.dg/*.C: Use target c++17 instead of explicit dg-options.
> > * lib/g++-dg.exp: Don't test C++11 by default.  Add C++17 to
> > the list of default stds to test.
> 
> > diff --git gcc/testsuite/g++.dg/cpp2a/lambda-this3.C 
> > gcc/testsuite/g++.dg/cpp2a/lambda-this3.C
> > index 5e5c8b3d50f..d1738ea7d17 100644
> > --- gcc/testsuite/g++.dg/cpp2a/lambda-this3.C
> > +++ gcc/testsuite/g++.dg/cpp2a/lambda-this3.C
> > @@ -1,6 +1,6 @@
> >  // P0806R2
> > -// { dg-do compile }
> > -// { dg-options "-std=c++17" }
> > +// { dg-do compile { target c++17 } }
> > +// { dg-options "" }
> >  
> >  struct X {
> >int x;
> 
> This test now fails with -std=gnu++2a:
> /.../gcc/gcc/testsuite/g++.dg/cpp2a/lambda-this3.C: In lambda function:
> /.../gcc/gcc/testsuite/g++.dg/cpp2a/lambda-this3.C:8:15: warning: implicit 
> capture of 'this' via '[=]' is deprecated in C++20 [-Wdeprecated]
> /.../gcc/gcc/testsuite/g++.dg/cpp2a/lambda-this3.C:8:15: note: add explicit 
> 'this' or '*this' capture
> /.../gcc/gcc/testsuite/g++.dg/cpp2a/lambda-this3.C: In lambda function:
> /.../gcc/gcc/testsuite/g++.dg/cpp2a/lambda-this3.C:16:15: warning: implicit 
> capture of 'this' via '[=]' is deprecated in C++20 [-Wdeprecated]
> /.../gcc/gcc/testsuite/g++.dg/cpp2a/lambda-this3.C:16:15: note: add explicit 
> 'this' or '*this' capture
> /.../gcc/gcc/testsuite/g++.dg/cpp2a/lambda-this3.C:17:16: warning: implicit 
> capture of 'this' via '[=]' is deprecated in C++20 [-Wdeprecated]
> /.../gcc/gcc/testsuite/g++.dg/cpp2a/lambda-this3.C:17:16: note: add explicit 
> 'this' or '*this' capture
> /.../gcc/gcc/testsuite/g++.dg/cpp2a/lambda-this3.C:18:13: warning: implicit 
> capture of 'this' via '[=]' is deprecated in C++20 [-Wdeprecated]
> /.../gcc/gcc/testsuite/g++.dg/cpp2a/lambda-this3.C:18:13: note: add explicit 
> 'this' or '*this' capture
> FAIL: g++.dg/cpp2a/lambda-this3.C  -std=gnu++2a  (test for bogus messages, 
> line 8)
> FAIL: g++.dg/cpp2a/lambda-this3.C  -std=gnu++2a  (test for bogus messages, 
> line 16)
> FAIL: g++.dg/cpp2a/lambda-this3.C  -std=gnu++2a  (test for bogus messages, 
> line 17)
> FAIL: g++.dg/cpp2a/lambda-this3.C  -std=gnu++2a  (test for bogus messages, 
> line 18)
> PASS: g++.dg/cpp2a/lambda-this3.C  -std=gnu++2a (test for excess errors)
> 
> The following patch fixes this, tested on x86_64-linux with
> make check-c++-all RUNTESTFLAGS=dg.exp=lambda-this3.C
> Ok for trunk?

Oops, sorry, I thought I'd limited the test to c++17_only.  I don't know why
it wasn't part of the patch.

Marek


[PATCH] Fix PR87665

2018-10-23 Thread Richard Biener


The following fixes a long-standing issue with SLP vectorization
where the dependence checking didn't really reflect reality... (oops).

I have so far prepared trunk and GCC 8 variants.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2018-10-23  Richard Biener  

PR tree-optimization/87665
* tree-vect-data-refs.c (vect_preserves_scalar_order_p): Adjust
to reflect reality.

* gcc.dg/torture/pr87665.c: New testcase.

Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   (revision 265422)
+++ gcc/tree-vect-data-refs.c   (working copy)
@@ -210,16 +210,26 @@ vect_preserves_scalar_order_p (dr_vec_in
 return true;
 
   /* STMT_A and STMT_B belong to overlapping groups.  All loads in a
- group are emitted at the position of the first scalar load and all
+ group are emitted at the position of the last scalar load and all
  stores in a group are emitted at the position of the last scalar store.
- Thus writes will happen no earlier than their current position
- (but could happen later) while reads will happen no later than their
- current position (but could happen earlier).  Reordering is therefore
- only possible if the first access is a write.  */
-  stmtinfo_a = vect_orig_stmt (stmtinfo_a);
-  stmtinfo_b = vect_orig_stmt (stmtinfo_b);
-  stmt_vec_info earlier_stmt_info = get_earlier_stmt (stmtinfo_a, stmtinfo_b);
-  return !DR_IS_WRITE (STMT_VINFO_DATA_REF (earlier_stmt_info));
+ Compute that position and check whether the resulting order matches
+ the current one.  */
+  stmt_vec_info last_a = DR_GROUP_FIRST_ELEMENT (stmtinfo_a);
+  if (last_a)
+for (stmt_vec_info s = DR_GROUP_NEXT_ELEMENT (last_a); s;
+s = DR_GROUP_NEXT_ELEMENT (s))
+  last_a = get_later_stmt (last_a, s);
+  else
+last_a = stmtinfo_a;
+  stmt_vec_info last_b = DR_GROUP_FIRST_ELEMENT (stmtinfo_b);
+  if (last_b)
+for (stmt_vec_info s = DR_GROUP_NEXT_ELEMENT (last_b); s;
+s = DR_GROUP_NEXT_ELEMENT (s))
+  last_b = get_later_stmt (last_b, s);
+  else
+last_b = stmtinfo_b;
+  return ((get_later_stmt (last_a, last_b) == last_a)
+ == (get_later_stmt (stmtinfo_a, stmtinfo_b) == stmtinfo_a));
 }
 
 /* A subroutine of vect_analyze_data_ref_dependence.  Handle
Index: gcc/testsuite/gcc.dg/torture/pr87665.c
===
--- gcc/testsuite/gcc.dg/torture/pr87665.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr87665.c  (working copy)
@@ -0,0 +1,27 @@
+/* { dg-do run } */
+
+struct X { long x; long y; };
+
+struct X a[1024], b[1024];
+
+void foo ()
+{
+  for (int i = 0; i < 1024; ++i)
+{
+  long tem = a[i].x;
+  a[i].x = 0;
+  b[i].x = tem;
+  b[i].y = a[i].y;
+}
+}
+
+int main()
+{
+  for (int i = 0; i < 1024; ++i)
+a[i].x = i;
+  foo ();
+  for (int i = 0; i < 1024; ++i)
+if (b[i].x != i)
+  __builtin_abort();
+  return 0;
+}


Re: [ARM/FDPIC v3 04/21] [ARM] FDPIC: Add support for FDPIC for arm architecture

2018-10-23 Thread Richard Earnshaw (lists)
On 19/10/2018 14:40, Christophe Lyon wrote:
> On 12/10/2018 12:45, Richard Earnshaw (lists) wrote:
>> On 11/10/18 14:34, Christophe Lyon wrote:
>>> The FDPIC register is hard-coded to r9, as defined in the ABI.
>>>
>>> We have to disable tailcall optimizations if we don't know if the
>>> target function is in the same module. If not, we have to set r9 to
>>> the value associated with the target module.
>>>
>>> When generating a symbol address, we have to take into account whether
>>> it is a pointer to data or to a function, because different
>>> relocations are needed.
>>>
>>> 2018-XX-XX  Christophe Lyon  
>>> Mickaël Guêné 
>>>
>>> * config/arm/arm-c.c (__FDPIC__): Define new pre-processor macro
>>> in FDPIC mode.
>>> * config/arm/arm-protos.h (arm_load_function_descriptor): Declare
>>> new function.
>>> * config/arm/arm.c (arm_option_override): Define pic register to
>>> FDPIC_REGNUM.
>>> (arm_function_ok_for_sibcall) Disable sibcall optimization if we
>>
>> Missing colon.
>>
>>> have no decl or go through PLT.
>>> (arm_load_pic_register): Handle TARGET_FDPIC.
>>> (arm_is_segment_info_known): New function.
>>> (arm_pic_static_addr): Add support for FDPIC.
>>> (arm_load_function_descriptor): New function.
>>> (arm_assemble_integer): Add support for FDPIC.
>>> * config/arm/arm.h (PIC_OFFSET_TABLE_REG_CALL_CLOBBERED):
>>> Define. (FDPIC_REGNUM): New define.
>>> * config/arm/arm.md (call): Add support for FDPIC.
>>> (call_value): Likewise.
>>> (*restore_pic_register_after_call): New pattern.
>>> (untyped_call): Disable if FDPIC.
>>> (untyped_return): Likewise.
>>> * config/arm/unspecs.md (UNSPEC_PIC_RESTORE): New.
>>>
>>
>> Other comments inline.
>>
>>> diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
>>> index 4471f79..90733cc 100644
>>> --- a/gcc/config/arm/arm-c.c
>>> +++ b/gcc/config/arm/arm-c.c
>>> @@ -202,6 +202,8 @@ arm_cpu_builtins (struct cpp_reader* pfile)
>>>     builtin_define ("__ARM_EABI__");
>>>   }
>>>   +  def_or_undef_macro (pfile, "__FDPIC__", TARGET_FDPIC);
>>> +
>>>     def_or_undef_macro (pfile, "__ARM_ARCH_EXT_IDIV__", TARGET_IDIV);
>>>     def_or_undef_macro (pfile, "__ARM_FEATURE_IDIV", TARGET_IDIV);
>>>   diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
>>> index 0dfb3ac..28cafa8 100644
>>> --- a/gcc/config/arm/arm-protos.h
>>> +++ b/gcc/config/arm/arm-protos.h
>>> @@ -136,6 +136,7 @@ extern int arm_max_const_double_inline_cost (void);
>>>   extern int arm_const_double_inline_cost (rtx);
>>>   extern bool arm_const_double_by_parts (rtx);
>>>   extern bool arm_const_double_by_immediates (rtx);
>>> +extern rtx arm_load_function_descriptor (rtx funcdesc);
>>>   extern void arm_emit_call_insn (rtx, rtx, bool);
>>>   bool detect_cmse_nonsecure_call (tree);
>>>   extern const char *output_call (rtx *);
>>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>>> index 8810df5..92ae24b 100644
>>> --- a/gcc/config/arm/arm.c
>>> +++ b/gcc/config/arm/arm.c
>>> @@ -3470,6 +3470,14 @@ arm_option_override (void)
>>>     if (flag_pic && TARGET_VXWORKS_RTP)
>>>   arm_pic_register = 9;
>>>   +  /* If in FDPIC mode then force arm_pic_register to be r9.  */
>>> +  if (TARGET_FDPIC)
>>> +    {
>>> +  arm_pic_register = FDPIC_REGNUM;
>>> +  if (TARGET_ARM_ARCH < 7)
>>> +    error ("FDPIC mode is not supported on architectures older than
>>> Armv7");
>>
>> What properties of FDPIC impose this requirement?  Does it also apply to
>> Armv8-m.baseline?
>>
> In fact, there was miscommunication on my side, resulting in a
> misunderstanding between Kyrill and myself, which I badly translated
> into this condition.
> 
> My initial plan was to submit a patch series tested on v7, and send the
> patches needed to support older architectures as a follow-up. The proper
> restriction is actually "CPUs that do not support ARM or Thumb2". As you
> may have noticed during the iterations of this patch series, I had
> failed to remove partial Thumb1 support hunks.
> 
> So really this should be rephrased, and rewritten as "FDPIC mode is
> supported on architecture versions that support ARM or Thumb-2", if that
> suits you. And the condition should thus be:
> if (! TARGET_ARM && ! TARGET_THUMB2)
>   error ("...")
> 
> This would also exclude Armv8-m.baseline, since it doesn't support Thumb2.

When we get to v8-m.baseline the thumb1/2 distinction starts to become a
lot more blurred.  A lot of thumb2 features needed for stand-alone
systems are then available.  So what feature is it that you require in
order to make fdpic work in (traditional) thumb2 that isn't in
(traditional) thumb1?



> As a side note, I tried to build GCC master (without my patches)
> --with-cpu=cortex-m23, and both targets arm-eabi and arm-linux-gnueabi
> failed to build.
> 
> For arm-eabi, there are problems in newlib:
> newlib/libc/sys/arm/crt0.S:145: Error: lo register required -- `add
> sl,r2,#256'
> 

Re: [ARM/FDPIC v3 03/21] [ARM] FDPIC: Force FDPIC related options unless -mno-fdpic is provided

2018-10-23 Thread Richard Earnshaw (lists)
On 15/10/2018 11:10, Christophe Lyon wrote:
> On Fri, 12 Oct 2018 at 12:01, Richard Earnshaw (lists) <
> richard.earns...@arm.com> wrote:
> 
>> On 11/10/18 14:34, Christophe Lyon wrote:
>>> In FDPIC mode, we set -fPIE unless the user provides -fno-PIE, -fpie,
>>> -fPIC or -fpic: indeed FDPIC code is PIC, but we want to generate code
>>> for executables rather than shared libraries by default.
>>>
>>> We also make sure to use the --fdpic assembler option, and select the
>>> appropriate linker emulation.
>>>
>>> At link time, we also default to -pie, unless we are generating a
>>> shared library or a relocatable file (-r). Note that even for static
>>> link, we must specify the dynamic linker because the executable still
>>> has to relocate itself at startup.
>>>
>>> We also force 'now' binding since lazy binding is not supported.
>>>
>>> We should also apply the same behavior for -Wl,-Ur as for -r, but I
>>> couldn't find how to describe that in the specs fragment.
>>>
>>> 2018-XX-XX  Christophe Lyon  
>>>   Mickaël Guêné 
>>>
>>>   gcc/
>>>   * config.gcc: Handle arm*-*-uclinuxfdpiceabi.
>>>   * config/arm/bpabi.h (TARGET_FDPIC_ASM_SPEC): New.
>>>   (SUBTARGET_EXTRA_ASM_SPEC): Use TARGET_FDPIC_ASM_SPEC.
>>>   * config/arm/linux-eabi.h (FDPIC_CC1_SPEC): New.
>>>   (CC1_SPEC): Use FDPIC_CC1_SPEC.
>>>   * config/arm/uclinuxfdpiceabi.h: New file.
>>>
>>>   libsanitizer/
>>>   * configure.tgt (arm*-*-uclinuxfdpiceabi): Sanitizers are
>>>   unsupported in this configuration.
>>
>> The documentation (in patch 1) seems to imply that -mfdpic is available
>> in all configurations and has certain effects (such as enabling -fPIE),
>> but this patch set suggests that such behaviours are only available when
>> the compiler is configured explicitly for an fdpic target.
>>
>> I think this needs to be resolved.  Either -mfdpic works everywhere, or
>> the option should only be available when configured for -mfdpic.
>>
>>
> You are right, this is not clear. I tried to follow what other fdpic
> targets do, but it's not consistent either, it seems.
> 
> So, at present, -mfdpic alone is in general not sufficient, and the user has
> to use -fpic/-fPIC/-fpie/-fPIE as needed. When configured for
> arm-uclinuxfdpiceabi, this is done implicitly (thanks to this patch).
> 
> One possibility is to rephrase the doc, and say that -fPIE is only implied
> when GCC is configured for arm-uclinuxfdpiceabi.
> 
> Do you mean to also make -mfdpic non-existent/rejected when GCC is not
> configured for arm-uclinuxfdpiceabi? 

Ideally doesn't exist, so that it doesn't show up in things like --help
when it doesn't work.

> How to achieve that?

Good question, I'm not sure, off hand.  It might be possible to make the
config machinery add additional opt files, but it's not something I've
tried.  You might want to try adding an additional opt file to
extra_options for fdpic targets.

R.

> 
> 
> R.
>>
>>>
>>> Change-Id: If369e0a10bb916fd72e38f71498d3c640fa85c4c
>>>
>>> diff --git a/gcc/config.gcc b/gcc/config.gcc
>>> index 793fc69..a4f4331 100644
>>> --- a/gcc/config.gcc
>>> +++ b/gcc/config.gcc
>>> @@ -1144,6 +1144,11 @@ arm*-*-linux-* | arm*-*-uclinuxfdpiceabi)
>>   # ARM GNU/Linux with ELF
>>>   esac
>>>   tmake_file="${tmake_file} arm/t-arm arm/t-arm-elf arm/t-bpabi
>> arm/t-linux-eabi"
>>>   tm_file="$tm_file arm/bpabi.h arm/linux-eabi.h arm/aout.h
>> arm/arm.h"
>>> + case $target in
>>> + arm*-*-uclinuxfdpiceabi)
>>> + tm_file="$tm_file arm/uclinuxfdpiceabi.h"
>>> + ;;
>>> + esac
>>>   # Generation of floating-point instructions requires at least
>> ARMv5te.
>>>   if [ "$with_float" = "hard" -o "$with_float" = "softfp" ] ; then
>>>   target_cpu_cname="arm10e"
>>> diff --git a/gcc/config/arm/bpabi.h b/gcc/config/arm/bpabi.h
>>> index 1e3ecfb..5901154 100644
>>> --- a/gcc/config/arm/bpabi.h
>>> +++ b/gcc/config/arm/bpabi.h
>>> @@ -55,6 +55,8 @@
>>>  #define TARGET_FIX_V4BX_SPEC " %{mcpu=arm8|mcpu=arm810|mcpu=strongarm*"\
>>>"|march=armv4|mcpu=fa526|mcpu=fa626:--fix-v4bx}"
>>>
>>> +#define TARGET_FDPIC_ASM_SPEC  ""
>>> +
>>>  #define BE8_LINK_SPEC
>>   \
>>>"%{!r:%{!mbe32:%:be8_linkopt(%{mlittle-endian:little}" \
>>>" %{mbig-endian:big}"  \
>>> @@ -64,7 +66,7 @@
>>>  /* Tell the assembler to build BPABI binaries.  */
>>>  #undef  SUBTARGET_EXTRA_ASM_SPEC
>>>  #define SUBTARGET_EXTRA_ASM_SPEC \
>>> -  "%{mabi=apcs-gnu|mabi=atpcs:-meabi=gnu;:-meabi=5}"
>> TARGET_FIX_V4BX_SPEC
>>> +  "%{mabi=apcs-gnu|mabi=atpcs:-meabi=gnu;:-meabi=5}"
>> TARGET_FIX_V4BX_SPEC TARGET_FDPIC_ASM_SPEC
>>>
>>>  #ifndef SUBTARGET_EXTRA_LINK_SPEC
>>>  #define SUBTARGET_EXTRA_LINK_SPEC ""
>>> diff --git a/gcc/config/arm/linux-eabi.h b/gcc/config/arm/linux-eabi.h
>>> index 8585fde..4cee958 100644
>>> --- a/gcc/config/arm/linux-eabi.h
>>> +++ b/gcc/config/arm/linux-eabi.h
>>> @@ -98,11 

Re: [GCC][PATCH][Aarch64] Replace umov with cheaper fmov in popcount expansion

2018-10-23 Thread Richard Earnshaw (lists)
On 22/10/2018 10:02, Sam Tebbs wrote:
> Hi all,
> 
> This patch replaces the umov instruction in the aarch64 popcount
> expansion with the less expensive fmov instruction.
> 
> Example:
> 
> int foo (int a) {
>   return __builtin_popcount (a);
> }
> 
> would generate:
> 
> foo:
>   uxtw    x0, w0
>   fmov    d0, x0
>   cnt    v0.8b, v0.8b
>   addv    b0, v0.8b
>   umov    w0, v0.b[0]
>   ret
> 
> but now generates:
> 
> foo:
>   uxtw    x0, w0
>   fmov    d0, x0
>   cnt    v0.8b, v0.8b
>   addv    b0, v0.8b
>   fmov    w0, s0
>   ret
> 
> Using __builtin_popcountl on a long generates
> 
> foo:
>   fmov    d0, x0
>   cnt    v0.8b, v0.8b
>   addv    b0, v0.8b
>   umov    w0, v0.b[0]
>   ret
> 
> but with this patch generates:
> 
> foo:
>   fmov    d0, x0
>   cnt    v0.8b, v0.8b
>   addv    b0, v0.8b
>   fmov    w0, s0
>   ret
> 
> Bootstrapped successfully and tested on aarch64-none-elf and
> aarch64_be-none-elf with no regressions.
> 
> OK for trunk?
> 
> gcc/
> 2018-10-22  Sam Tebbs
> 
>     * config/aarch64/aarch64.md (popcount2): Replaced zero_extend
>     generation with move generation.
> 
> gcc/testsuite
> 2018-10-22  Sam Tebbs
> 
>     * gcc.target/aarch64/popcnt2.c: New file.
> 
> 
> latest.patch
> 
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 
> d7473418a8eb62b2757017cd1675493f86e41ef4..77e6f75cc15f06733df7b47906ee00580bea8d29
>  100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -4489,7 +4489,7 @@
>emit_move_insn (v, gen_lowpart (V8QImode, in));
>emit_insn (gen_popcountv8qi2 (v1, v));
>emit_insn (gen_reduc_plus_scal_v8qi (r, v1));
> -  emit_insn (gen_zero_extendqi2 (out, r));
> +  emit_move_insn (out, gen_lowpart_SUBREG (GET_MODE (out), r));

I don't think this is right.  You're effectively creating a paradoxical
subreg here and relying on an unstated side effect of an earlier
instruction for correct behaviour.

What you really need is a pattern that generates the zero-extend in
combination with the reduction operation.  So something like

(set (reg:DI)
 (zero_extend:DI (unspec:VecMode [(reg:VecMode)] UNSPEC_ADDV)))

Now you can copy all, or part, of that register directly across to the
integer side and the RTL remains mathematically accurate.

R.

>DONE;
>  })
>  
> diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt2.c 
> b/gcc/testsuite/gcc.target/aarch64/popcnt2.c
> new file mode 100644
> index 
> ..9c595f09222c24eefb4b00e8823e4c02f6eaf3b9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/popcnt2.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +int
> +foo0 (int a)
> +{
> +  return __builtin_popcount (a);
> +}
> +
> +int
> +foo1 (long a)
> +{
> +  return __builtin_popcountl (a);
> +}
> +
> +/* { dg-final { scan-assembler-not "umov\\t" } } */
> +/* { dg-final { scan-assembler-times "fmov\\t" 4 } } */
> 



Re: [PATCH 02/14] Add D frontend (GDC) implementation.

2018-10-23 Thread Richard Sandiford
Iain Buclaw  writes:
> I'm just going to post the diff since the original here, just to show
> what's been done since review comments.
>
> I think I've covered all that's been addressed, except for the couple
> of notes about the quadratic parts (though I think one of them is
> actually O(N^2)).  I've raised bug reports on improving them later.
>
> I've also rebased them against trunk, so there's a couple new things
> present that are just to support build.

Thanks, this is OK when the frontend is accepted in principle
(can't remember where things stand with that).

Richard


[gomp5] Add support for reduction clause task modifier on parallel

2018-10-23 Thread Jakub Jelinek
Hi!

This implements task reduction support on parallel.  Such reductions are
registered with a special GOMP_parallel_reductions call, which first
determines the number of threads, then registers the reductions, creates
an artificial taskgroup for them and then spawns threads as usual.
The function returns the number of threads, so the caller then performs the
reductions and finally unregisters the reductions.

Tested on x86_64-linux, committed to gomp-5_0-branch.

2018-10-23  Jakub Jelinek  

* builtin-types.def (BT_FN_UINT_OMPFN_PTR_UINT_UINT): New.
* omp-builtins.def (BUILT_IN_GOMP_PARALLEL_REDUCTIONS): New builtin.
* omp-low.c (scan_sharing_clauses): Handle reduction clause with task
modifier on parallel like reduction on taskloop.
(scan_omp_parallel): Add _reductemp_ clause if there are any reduction
clauses with task modifier.
(finish_taskreg_scan): Move field corresponding to _reductemp_ clause
first.
(lower_rec_input_clauses): Don't set reduction_omp_orig_ref if handling
reduction as task reductions.  Handle reduction clauses with task
modifiers like task_reduction clauses.  For is_parallel_ctx perform
initialization unconditionally.
(lower_reduction_clauses): Ignore reduction clauses with task modifier
on parallel constructs.
(lower_send_clauses): Likewise.
(lower_omp_task_reductions): Handle code == OMP_PARALLEL.  Use ctx
rather than ctx->outer in lower_omp call.
(lower_omp_taskreg): Handle reduction clauses with task modifier on
parallel construct.
* omp-expand.c (workshare_safe_to_combine_p): Return false for
non-worksharing loops.
(determine_parallel_type): Don't combine if there are any
OMP_CLAUSE__REDUCTEMP_ clauses.
(expand_parallel_call): Use GOMP_parallel_reductions call instead
of GOMP_parallel if there are any reductions, store return value into
the _reductemp_ temporary.
gcc/fortran/
* types.def (BT_FN_UINT_OMPFN_PTR_UINT_UINT): New.
libgomp/
* libgomp.h (gomp_parallel_reduction_register): Declare.
(gomp_team_start): Add taskgroup argument.
* libgomp_g.h (GOMP_parallel_reductions): New prototype.
* libgomp.map (GOMP_5.0): Add GOMP_parallel_reductions.  Sort.
* loop.c (gomp_parallel_loop_start): Pass NULL as taskgroup to
gomp_team_start.
* parallel.c (GOMP_parallel_start): Likewise.
(GOMP_parallel): Likewise.  Formatting fix.
(GOMP_parallel_reductions): New function.
* sections.c (GOMP_parallel_sections_start, GOMP_parallel_sections):
Pass NULL as taskgroup to gomp_team_start.
* task.c (gomp_taskgroup_init): New static inline function.
(GOMP_taskgroup_start): Use it.
(gomp_reduction_register): New static inline function.
(GOMP_taskgroup_reduction_register): Use it.
(GOMP_taskgroup_reduction_unregister): Add ialias.
(gomp_parallel_reduction_register): New function.
* team.c (gomp_team_start): Add taskgroup argument, initialize implicit
tasks' taskgroup field to that.
* config/nvptx/team.c (gomp_team_start): Likewise.
* testsuite/libgomp.c-c++-common/task-reduction-6.c: New test.
* testsuite/libgomp.c-c++-common/task-reduction-7.c: New test.
* testsuite/libgomp.c/task-reduction-2.c: New test.
* testsuite/libgomp.c++/task-reduction-8.C: New test.
* testsuite/libgomp.c++/task-reduction-9.C: New test.
* testsuite/libgomp.c++/task-reduction-10.C: New test.
* testsuite/libgomp.c++/task-reduction-11.C: New test.
* testsuite/libgomp.c++/task-reduction-12.C: New test.
* testsuite/libgomp.c++/task-reduction-13.C: New test.

--- gcc/builtin-types.def.jj2018-09-25 14:30:29.038766626 +0200
+++ gcc/builtin-types.def   2018-10-22 13:02:46.904919759 +0200
@@ -644,6 +644,8 @@ DEF_FUNCTION_TYPE_4 (BT_FN_INT_FILEPTR_I
 BT_INT, BT_FILEPTR, BT_INT, BT_CONST_STRING, BT_VALIST_ARG)
 DEF_FUNCTION_TYPE_4 (BT_FN_VOID_OMPFN_PTR_UINT_UINT,
 BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR, BT_UINT, BT_UINT)
+DEF_FUNCTION_TYPE_4 (BT_FN_UINT_OMPFN_PTR_UINT_UINT,
+BT_UINT, BT_PTR_FN_VOID_PTR, BT_PTR, BT_UINT, BT_UINT)
 DEF_FUNCTION_TYPE_4 (BT_FN_VOID_PTR_WORD_WORD_PTR,
 BT_VOID, BT_PTR, BT_WORD, BT_WORD, BT_PTR)
 DEF_FUNCTION_TYPE_4 (BT_FN_VOID_SIZE_VPTR_PTR_INT, BT_VOID, BT_SIZE,
--- gcc/omp-builtins.def.jj 2018-09-25 14:32:54.671315163 +0200
+++ gcc/omp-builtins.def2018-10-22 13:02:15.955438340 +0200
@@ -315,6 +315,9 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_DOACROSS
  BT_FN_VOID_ULL_VAR, ATTR_NOTHROW_LEAF_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_PARALLEL, "GOMP_parallel",
  BT_FN_VOID_OMPFN_PTR_UINT_UINT, ATTR_NOTHROW_LIST)
+DEF_GOMP_BUILTIN 

Re: [PATCH] bring netbsd/arm support up to speed. eabi, etc.

2018-10-23 Thread Richard Earnshaw (lists)
Thanks for posting this.  Before we can commit it, however, we need to
sort out the authorship and ensure that all the appropriate copyright
assignments are in place.  Are you the sole author, or are other NetBSD
developers involved?

Firstly, please provide a ChangeLog description for the patch.

Below are some initial comments, I might have more once I have a more
detailed look.

R.

On 20/10/2018 22:05, Maya Rashish wrote:
> ---
>  gcc/config.gcc  |  33 +-
>  gcc/config.host |   2 +-
>  gcc/config/arm/netbsd-eabi.h| 108 
>  gcc/config/arm/netbsd-elf.h |  10 +++
>  gcc/config/netbsd-elf.h |  15 +
>  libgcc/config.host  |  11 +++-
>  libgcc/config/arm/t-netbsd  |  15 -
>  libgcc/config/arm/t-netbsd-eabi |  18 ++
>  8 files changed, 205 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/config/arm/netbsd-eabi.h
>  create mode 100644 libgcc/config/arm/t-netbsd-eabi
> 
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 8521f7d556e..e749c61e75f 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -1130,10 +1130,37 @@ arm*-*-freebsd*)# ARM FreeBSD EABI
>   with_tls=${with_tls:-gnu}
>   ;;
>  arm*-*-netbsdelf*)
> - tm_file="dbxelf.h elfos.h ${nbsd_tm_file} arm/elf.h arm/aout.h 
> ${tm_file} arm/netbsd-elf.h"
> - extra_options="${extra_options} netbsd.opt netbsd-elf.opt"
>   tmake_file="${tmake_file} arm/t-arm"
> - target_cpu_cname="arm6"

See the patch I posted this morning; a default here is probably still needed.

> + tm_file="dbxelf.h elfos.h ${nbsd_tm_file} arm/elf.h"
> + extra_options="${extra_options} netbsd.opt netbsd-elf.opt"
> + case ${target} in
> + arm*eb-*) tm_defines="${tm_defines} TARGET_BIG_ENDIAN_DEFAULT=1" ;;
> + esac
> + case ${target} in
> + arm*-*-netbsdelf-*eabi*)
> + tm_file="$tm_file arm/bpabi.h arm/netbsd-elf.h arm/netbsd-eabi.h"
> + tmake_file="$tmake_file arm/t-bpabi arm/t-netbsdeabi"
> + # The BPABI long long divmod functions return a 128-bit value in
> + # registers r0-r3.  Correctly modeling that requires the use of
> + # TImode.
> + need_64bit_hwint=yes

need_64bit_hwint isn't needed any more (removed in 2014).

> + ;;
> + *)
> + tm_file="$tm_file arm/netbsd-elf.h"
> + tmake_file="$tmake_file arm/t-netbsd"
> + ;;
> + esac
> + tm_file="${tm_file} arm/aout.h arm/arm.h"
> + case ${target} in
> + arm*-*-netbsdelf-*eabihf*)
> + tm_defines="${tm_defines} 
> TARGET_DEFAULT_FLOAT_ABI=ARM_FLOAT_ABI_HARD"
> + ;;
> + esac
> + case ${target} in
> + armv4*) target_cpu_cname="strongarm";;

You might want to filter out some bogus combinations here, such as armv4
+ eabihf which can't be supported.

> + armv6*) target_cpu_cname="arm1176jzf-s";;
> + armv7*) target_cpu_cname="cortex-a8";;

The list overall seems somewhat incomplete.  What about armv4t, armv5
and armv8?

> + esac
>   ;;
>  arm*-*-linux-*)  # ARM GNU/Linux with ELF
>   tm_file="dbxelf.h elfos.h gnu-user.h linux.h linux-android.h 
> glibc-stdint.h arm/elf.h arm/linux-gas.h arm/linux-elf.h"
> diff --git a/gcc/config.host b/gcc/config.host
> index c65569da2e9..59208d2508f 100644
> --- a/gcc/config.host
> +++ b/gcc/config.host
> @@ -107,7 +107,7 @@ case ${host} in
>   ;;
>  esac
>  ;;
> -  arm*-*-freebsd* | arm*-*-linux* | arm*-*-fuchsia*)
> +  arm*-*-freebsd* | arm*-*-netbsd* | arm*-*-linux* | arm*-*-fuchsia*)
>  case ${target} in
>arm*-*-*)
>   host_extra_gcc_objs="driver-arm.o"
> diff --git a/gcc/config/arm/netbsd-eabi.h b/gcc/config/arm/netbsd-eabi.h
> new file mode 100644
> index 000..92f31b885f0
> --- /dev/null
> +++ b/gcc/config/arm/netbsd-eabi.h
> @@ -0,0 +1,108 @@
> +/* Definitions of target machine for GNU compiler, NetBSD/arm ELF version.
> +   Copyright (C) 2002, 2003, 2004, 2005, 2007 Free Software Foundation, Inc.
> +   Contributed by Wasabi Systems, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published
> +   by the Free Software Foundation; either version 3, or (at your
> +   option) any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but WITHOUT
> +   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> +   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> +   License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   .  */
> +
> +/* Run-time Target Specification.  */
> +#undef MULTILIB_DEFAULTS
> +#define MULTILIB_DEFAULTS { "mabi=aapcs-linux" }
> +
> +#define TARGET_LINKER_EABI_SUFFIX \
> +

[PATCH, testsuite] Fix sibcall-9 and sibcall-10 with -fPIC

2018-10-23 Thread Thomas Preudhomme
Hi,

gcc.dg/sibcall-9.c and gcc.dg/sibcall-10.c give execution failures
on ARM when compiled with -fPIC because the PIC access to the volatile
variable v creates an extra spill, which causes the frame sizes of the
two recursive functions to differ. Making the variable static
solves the issue because the variable can then be accessed in a
PC-relative way, which avoids the spill while still testing sibling calls
as originally intended.

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

* gcc.dg/sibcall-9.c: Make v static.
* gcc.dg/sibcall-10.c: Likewise.

Tested both testcases with and without -fPIC; they now pass in both
cases when targeting arm-none-eabi. They also pass in both cases on
x86_64-linux-gnu.

Is this ok for trunk?

Best regards,

Thomas
From 27286120fe2d6a088d14d7e4f4b5b6fa6cc2bc41 Mon Sep 17 00:00:00 2001
From: Thomas Preud'homme 
Date: Tue, 23 Oct 2018 14:01:31 +0100
Subject: [PATCH] [PATCH, testsuite] Fix sibcall-9 and sibcall-10 with -fPIC

Hi,

gcc.dg/sibcall-9.c and gcc.dg/sibcall-10.c give execution failures
on ARM when compiled with -fPIC because the PIC access to the volatile
variable v creates an extra spill, which causes the frame sizes of the
two recursive functions to differ. Making the variable static
solves the issue because the variable can then be accessed in a
PC-relative way, which avoids the spill while still testing sibling calls
as originally intended.

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

	* gcc.dg/sibcall-9.c: Make v static.
	* gcc.dg/sibcall-10.c: Likewise.

Tested both testcases with and without -fPIC; they now pass in both
cases when targeting arm-none-eabi. They also pass in both cases on
x86_64-linux-gnu.

Is this ok for trunk?

Best regards,

Thomas
---
 gcc/testsuite/gcc.dg/sibcall-10.c | 2 +-
 gcc/testsuite/gcc.dg/sibcall-9.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/sibcall-10.c b/gcc/testsuite/gcc.dg/sibcall-10.c
index 54cc604aecf..4acca50e3e4 100644
--- a/gcc/testsuite/gcc.dg/sibcall-10.c
+++ b/gcc/testsuite/gcc.dg/sibcall-10.c
@@ -31,7 +31,7 @@ extern void exit (int);
 static ATTR void recurser_void1 (void);
 static ATTR void recurser_void2 (void);
 extern void track (void);
-volatile int v;
+static volatile int v;
 
 int n = 0;
 int main ()
diff --git a/gcc/testsuite/gcc.dg/sibcall-9.c b/gcc/testsuite/gcc.dg/sibcall-9.c
index fc3bd9dcf16..32b2e1d5d61 100644
--- a/gcc/testsuite/gcc.dg/sibcall-9.c
+++ b/gcc/testsuite/gcc.dg/sibcall-9.c
@@ -31,7 +31,7 @@ extern void exit (int);
 static ATTR void recurser_void1 (int);
 static ATTR void recurser_void2 (int);
 extern void track (int);
-volatile int v;
+static volatile int v;
 
 int main ()
 {
-- 
2.19.1



Re: [PATCH] combine: Do not combine moves from hard registers

2018-10-23 Thread Christophe Lyon
On Tue, 23 Oct 2018 at 14:29, Segher Boessenkool
 wrote:
>
> On Tue, Oct 23, 2018 at 12:14:27PM +0200, Christophe Lyon wrote:
> > I have noticed many regressions on arm and aarch64 between 265366 and
> > 265408 (this commit is 265398).
> >
> > I bisected at least one to this commit on aarch64:
> > FAIL: gcc.dg/ira-shrinkwrap-prep-1.c scan-rtl-dump ira "Split
> > live-range of register"
> > The same test also regresses on arm.
>
> Many targets also fail gcc.dg/ira-shrinkwrap-prep-2.c; these tests fail
> when random things in the RTL change, apparently.
>
> > For a whole picture of all the regressions I noticed during these two
> > commits, have a look at:
> > http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/265408/report-build-info.html
>
> No thanks.  I am not going to click on 111 links and whatever is behind
> those.  Please summarise, like, what was the diff in test_summary, and
> then dig down into individual tests if you want.  Or whatever else works
> both for you and for me.  This doesn't work for me.
>

OK, this is not very practical for me either. There were 25 commits between
the two validations being compared:
25-28 gcc tests regressed on aarch64, depending on the exact target;
177-206 gcc tests regressed on arm*, plus 7-29 gfortran regressions on arm*.
So I would have to run many bisects to make sure every regression is
caused by the same commit.

Since these are all automated builds with everything discarded after
computing the regressions, it's quite time consuming to re-run the
tests manually on my side (probably at least as much as it is for you).

In this case, the most efficient way would be for me to extract your patch
and have my validation system validate it against the preceding commit;
that would give the regressions caused by your patch only. I'm going
to do that; it should take 3-5h to run.

I know this doesn't answer your question, but I thought you could run aarch64
tests easily, and that it would be more efficient for the project if you
did so directly, without waiting for me to provide hardly any more
information.

Maybe this will answer your question better:
List of aarch64-linux-gnu regressions:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/265408/aarch64-none-linux-gnu/diff-gcc-rh60-aarch64-none-linux-gnu-default-default-default.txt
List of arm-none-linux-gnueabihf regressions:
(gcc) 
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/265408/arm-none-linux-gnueabihf/diff-gcc-rh60-arm-none-linux-gnueabihf-arm-cortex-a9-neon-fp16.txt
(gfortran) 
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/265408/arm-none-linux-gnueabihf/diff-gfortran-rh60-arm-none-linux-gnueabihf-arm-cortex-a9-neon-fp16.txt
(all these are what you get when you click on one of the REGRESSED
links on the main page)

but as I said, these were caused by commits between 265366 and 265408,
so not all of them may be caused by your commit

Don't get me wrong, I'm not angry at you and I don't mean to offend;
I say this humbly.

To me it just highlights again that we need a validation system easier to
work with when we break something on a target we are not familiar with.
I run post-commit validations as finely grained as possible with the CPU
resources I have access to, that's not enough and I think having a
developer-accessible gerrit+jenkins-like system would be very valuable
to test patches before commit. We have a prototype in Linaro, not
production-ready. But I guess that would be worth another
discussion thread :)

Christophe

>
> Segher


[PATCH] Remove extra memory allocation of strings.

2018-10-23 Thread Martin Liška
Hello.

As a follow-up patch, I would like to remove a redundant string allocation
that is not needed, in my opinion.

The patch bootstraps on aarch64-linux.
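
For illustration, here is a minimal standalone sketch of the idea (it is not
the aarch64.c code itself): the name before the first '+' is matched by
length and strncmp directly on the read-only input, so no writable copy is
needed. The table known_names and the function find_name are made up for
this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the table of names searched by the parser.
static const char *const known_names[] = { "armv8-a", "armv8.1-a", nullptr };

// Return the index of the entry matching the part of TO_PARSE before the
// first '+', or -1 if that part is empty or unknown.  TO_PARSE is only
// read, never copied or modified.
int find_name (const char *to_parse)
{
  assert (to_parse != nullptr);

  const char *ext = std::strchr (to_parse, '+');
  std::size_t len = ext ? static_cast<std::size_t> (ext - to_parse)
                        : std::strlen (to_parse);
  if (len == 0)
    return -1;

  for (int i = 0; known_names[i] != nullptr; i++)
    if (std::strlen (known_names[i]) == len
        && std::strncmp (known_names[i], to_parse, len) == 0)
      return i;

  return -1;
}
```

Comparing both the length and the first len characters is what keeps a
prefix such as "armv8" from matching "armv8-a".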

Martin

From a21a626055442635057985323bb42ef29526e182 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 22 Oct 2018 15:18:23 +0200
Subject: [PATCH] Remove extra memory allocation of strings.

gcc/ChangeLog:

2018-10-22  Martin Liska  

	* config/aarch64/aarch64.c (aarch64_parse_arch): Do not copy
	string to a stack buffer.
	(aarch64_parse_cpu): Likewise.
	(aarch64_parse_tune): Likewise.
---
 gcc/config/aarch64/aarch64.c | 32 ++++++++++++--------------------
 1 file changed, 12 insertions(+), 20 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index e3295419154..12c21dd74fb 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -10521,19 +10521,16 @@ static enum aarch64_parse_opt_result
 aarch64_parse_arch (const char *to_parse, const struct processor **res,
 		unsigned long *isa_flags, std::string *invalid_extension)
 {
-  char *ext;
+  const char *ext;
   const struct processor *arch;
-  char *str = (char *) alloca (strlen (to_parse) + 1);
   size_t len;
 
-  strcpy (str, to_parse);
-
-  ext = strchr (str, '+');
+  ext = strchr (to_parse, '+');
 
   if (ext != NULL)
-len = ext - str;
+len = ext - to_parse;
   else
-len = strlen (str);
+len = strlen (to_parse);
 
   if (len == 0)
 return AARCH64_PARSE_MISSING_ARG;
@@ -10542,7 +10539,8 @@ aarch64_parse_arch (const char *to_parse, const struct processor **res,
   /* Loop through the list of supported ARCHes to find a match.  */
   for (arch = all_architectures; arch->name != NULL; arch++)
 {
-  if (strlen (arch->name) == len && strncmp (arch->name, str, len) == 0)
+  if (strlen (arch->name) == len
+	  && strncmp (arch->name, to_parse, len) == 0)
 	{
 	  unsigned long isa_temp = arch->flags;
 
@@ -10578,19 +10576,16 @@ static enum aarch64_parse_opt_result
 aarch64_parse_cpu (const char *to_parse, const struct processor **res,
 		   unsigned long *isa_flags, std::string *invalid_extension)
 {
-  char *ext;
+  const char *ext;
   const struct processor *cpu;
-  char *str = (char *) alloca (strlen (to_parse) + 1);
   size_t len;
 
-  strcpy (str, to_parse);
-
-  ext = strchr (str, '+');
+  ext = strchr (to_parse, '+');
 
   if (ext != NULL)
-len = ext - str;
+len = ext - to_parse;
   else
-len = strlen (str);
+len = strlen (to_parse);
 
   if (len == 0)
 return AARCH64_PARSE_MISSING_ARG;
@@ -10599,7 +10594,7 @@ aarch64_parse_cpu (const char *to_parse, const struct processor **res,
   /* Loop through the list of supported CPUs to find a match.  */
   for (cpu = all_cores; cpu->name != NULL; cpu++)
 {
-  if (strlen (cpu->name) == len && strncmp (cpu->name, str, len) == 0)
+  if (strlen (cpu->name) == len && strncmp (cpu->name, to_parse, len) == 0)
 	{
 	  unsigned long isa_temp = cpu->flags;
 
@@ -10633,14 +10628,11 @@ static enum aarch64_parse_opt_result
 aarch64_parse_tune (const char *to_parse, const struct processor **res)
 {
   const struct processor *cpu;
-  char *str = (char *) alloca (strlen (to_parse) + 1);
-
-  strcpy (str, to_parse);
 
   /* Loop through the list of supported CPUs to find a match.  */
   for (cpu = all_cores; cpu->name != NULL; cpu++)
 {
-  if (strcmp (cpu->name, str) == 0)
+  if (strcmp (cpu->name, to_parse) == 0)
 	{
 	  *res = cpu;
 	  return AARCH64_PARSE_OK;
-- 
2.19.0



[PATCH] PR libstdc++/87704 fix unique_ptr(nullptr_t) constructors

2018-10-23 Thread Jonathan Wakely

Using a delegating constructor to implement these constructors means
that they instantiate the destructor, which requires the element_type to
be complete. In C++11 and C++14 they were specified to be delegating,
but that was changed as part of LWG 2801 so in C++17 they don't require
a complete type (as was intended all along).
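
As a small sketch of what the fix permits (the names Widget and make_empty
are invented for this example, and it assumes a standard library that
already has the non-delegating constructor): an empty unique_ptr can be
constructed from nullptr while the element type is still incomplete, as long
as nothing in the translation unit needs the destructor.

```cpp
#include <cassert>
#include <memory>
#include <new>

struct Widget;  // hypothetical type, deliberately left incomplete here

// Construct an empty unique_ptr<Widget> in caller-provided storage.
// Only the nullptr_t constructor is used; the destructor is never
// needed here, so Widget does not have to be complete.
std::unique_ptr<Widget> *make_empty (void *storage)
{
  assert (storage != nullptr);
  return ::new (storage) std::unique_ptr<Widget> (nullptr);
}
```

Placement new is used, as in the new test, so that no destructor call is
generated for the constructed object.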

PR libstdc++/87704
* include/bits/unique_ptr.h (unique_ptr::unique_ptr(nullptr_t)): Do
not delegate to default constructor.
(unique_ptr::unique_ptr(nullptr_t)): Likewise.
* testsuite/20_util/unique_ptr/cons/incomplete.cc: New test.

Tested powerpc64le-linux, committed to trunk. Backports to follow.


commit 072315d496bd19a7227213b945652c984ddc4162
Author: Jonathan Wakely 
Date:   Tue Oct 23 12:40:03 2018 +0100

PR libstdc++/87704 fix unique_ptr(nullptr_t) constructors

Using a delegating constructor to implement these constructors means
that they instantiate the destructor, which requires the element_type to
be complete. In C++11 and C++14 they were specified to be delegating,
but that was changed as part of LWG 2801 so in C++17 they don't require
a complete type (as was intended all along).

PR libstdc++/87704
* include/bits/unique_ptr.h (unique_ptr::unique_ptr(nullptr_t)): Do
not delegate to default constructor.
(unique_ptr::unique_ptr(nullptr_t)): Likewise.
* testsuite/20_util/unique_ptr/cons/incomplete.cc: New test.

diff --git a/libstdc++-v3/include/bits/unique_ptr.h b/libstdc++-v3/include/bits/unique_ptr.h
index 0717c1e2728..dcb866d37bc 100644
--- a/libstdc++-v3/include/bits/unique_ptr.h
+++ b/libstdc++-v3/include/bits/unique_ptr.h
@@ -195,7 +195,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template>
constexpr unique_ptr() noexcept
: _M_t()
-{ }
+   { }
 
   /** Takes ownership of a pointer.
*
@@ -244,7 +244,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   /// Creates a unique_ptr that owns nothing.
   template>
-   constexpr unique_ptr(nullptr_t) noexcept : unique_ptr() { }
+   constexpr unique_ptr(nullptr_t) noexcept
+   : _M_t()
+   { }
 
   // Move constructors.
 
@@ -472,7 +474,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template>
constexpr unique_ptr() noexcept
: _M_t()
-{ }
+   { }
 
   /** Takes ownership of a pointer.
*
@@ -535,7 +537,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   /// Creates a unique_ptr that owns nothing.
   template>
-   constexpr unique_ptr(nullptr_t) noexcept : unique_ptr() { }
+   constexpr unique_ptr(nullptr_t) noexcept
+   : _M_t()
+{ }
 
   template>>
diff --git a/libstdc++-v3/testsuite/20_util/unique_ptr/cons/incomplete.cc b/libstdc++-v3/testsuite/20_util/unique_ptr/cons/incomplete.cc
new file mode 100644
index 000..1a8f28838a1
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/unique_ptr/cons/incomplete.cc
@@ -0,0 +1,32 @@
+// Copyright (C) 2018 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-do compile { target c++11 } }
+
+#include <memory>
+
+struct Incomplete;
+
+void f(void** p)
+{
+  ::new (p[0]) std::unique_ptr<Incomplete>();
+  ::new (p[1]) std::unique_ptr<Incomplete[]>();
+
+  // PR libstdc++/87704
+  ::new (p[2]) std::unique_ptr<Incomplete>(nullptr);
+  ::new (p[3]) std::unique_ptr<Incomplete[]>(nullptr);
+}


Re: [RFC] GCC support for live-patching

2018-10-23 Thread Nicolai Stange
Hi,

Qing Zhao  writes:

>> 
>> thanks for the proposal. The others have already expressed some of my 
>> worries and remarks, but I think it would be only right to write them 
>> again. Especially since I am part of the team responsible for 
>> implementation and maintenance of live patches here at SUSE, we use kGraft 
>> and we prepare everything manually (compared to kpatch and ksplice).
>
> One question here,  what’s the major benefit to prepare the patches manually? 

There is none. We here at SUSE prefer the source based approach (as
opposed to binary diff) for a number of reasons and the manual live
patch creation is simply a consequence of not having any tooling for
this yet.


For reference, source based live patch creation involves the following
steps:

1. Determine the initial set of to be patched functions:
   a.) Inspect the upstream diff for the fix in question, add
   any touched functions to the initial set.
   b.) For each function in the initial set, check whether it has been
   inlined/cloned/optimized and if so, add all its callers to the
   initial set.  Repeat until the initial set has stabilized.

2. Copy & paste the initial set over to the new live patch sources.

3. Make it compile, i.e. recursively copy any needed cpp macro, type, or
   function definition and add references to data objects with static
   storage duration.
   The rules are:
   a.) For data objects with static storage duration, a reference to the
   original must always be made. (If the symbol is EXPORT()ed, then
   fine. Otherwise, for kGraft, this involves a kallsyms lookup at
   patch module load time, for upstream kernel live patching, this
   has been solved with those '.klp' relocations).
   b.) If a called function is available as a symbol from either vmlinux
   or some (usually the patched) module, do not copy the definition,
   but add a reference to it, just as in a.).
   c.) If a type, cpp macro or (usually inlined) function is provided by
   some "public" header in /include/, include that
   rather than copying the definition.  Counterexample: Non-public
   header outside of include/ like
   e.g. /fs/btrfs/qgroup.h.
   d.) Otherwise copy the definition to the live patch module sources.

Rule 3b is not strictly necessary, but it helps in reducing the live
patch code size which is a factor with _manual_ live patch creation.

For 1b.), we need help from GCC. Namely, we want to know when some
function has been optimized, and we want GCC to disable any IPA
optimization that it (currently) isn't capable of reporting properly.
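
To make step 1b concrete, here is a rough sketch of the "repeat until the
set has stabilized" closure. Everything in it is hypothetical: FuncInfo,
CallInfo and functions_to_patch are made-up names, and the "optimized" flag
stands for whatever information GCC would have to report.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical description of one function in the compiled kernel.
struct FuncInfo
{
  bool optimized;                    // inlined/cloned/otherwise optimized
  std::vector<std::string> callers;  // direct callers of this function
};

using CallInfo = std::map<std::string, FuncInfo>;

// Grow the initial set: whenever a member was optimized, add its callers,
// and repeat until no new function gets added.
std::set<std::string>
functions_to_patch (const CallInfo &info, const std::set<std::string> &touched)
{
  std::set<std::string> result = touched;
  std::vector<std::string> worklist (touched.begin (), touched.end ());

  while (!worklist.empty ())
    {
      std::string fn = worklist.back ();
      worklist.pop_back ();

      auto it = info.find (fn);
      if (it == info.end () || !it->second.optimized)
        continue;  // unknown or not optimized: callers need no patching

      for (const std::string &caller : it->second.callers)
        if (result.insert (caller).second)
          worklist.push_back (caller);  // newly added, examine it as well
    }

  assert (result.size () >= touched.size ());
  return result;
}
```

The worklist terminates because each function is queued at most once; the
result is only as good as the "optimized" information, which is exactly
where the compiler's help is needed.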

Step 3.) is a bit tedious sometimes TBH and yes, w/o any tooling in
place, patch size would be a valid point. However, I'm currently working
on that and I'm optimistic that I'll have a working prototype soon.

That tool would be given the GCC command line from the original or "live
patch target" kernel compilation for the source file in question, the
set of functions as determined in 1.) and a number of user provided
filter scripts to make the decisions in 3.). As a result, it would
output a self-contained, minimal subset of the original kernel sources.

With that tooling in place, live patch code size would not be a real
concern for us.

So in conclusion, what we need from GCC is the information on when we
have to live patch callers due to optimizations. If that's not possible
for a particular class of optimization, it needs to be disabled.

OTOH, we definitely want to keep the set of these disabled optimizations
as small as possible in order to limit the impact of live patching on
kernel performance. In particular, disabling any of the "cloning"
optimizations, which GCC is able to report properly, would be a
no-no IMO.

IIUC, our preferred selection of allowed IPA optimizations would be
provided by what you are referring to as "-flive-patching=inline-clone".



>> 
>>> 1. A study of Kernel live patching schemes.
>>> 
>>> Three major kernel live patching tools:  https://lwn.net/Articles/734765/
>>> 
>>> * ksplice:   http://www.ksplice.com/doc/ksplice.pdf
>>> * kpatch:https://lwn.net/Articles/597123/
>>>https://github.com/dynup/kpatch
>>> * kGraft:
>>> https://pdfs.semanticscholar.org/presentation/af4c/895aa3fef0cc2b501317aaec9d91ba2d704c.pdf
>>> 
>>> In the above, ksplice and kpatch can automatically generate binary patches 
>>> as following:
>>> 
>>>   * a collection of tools which convert a source diff patch to a patch
>>> module. They work by compiling the kernel both with and without the source
>>> patch, comparing the binaries, and generating a binary patch module which 
>>> includes new binary versions of the functions to be replaced.
>>> 
>>> on the other hand, kGraft offers a way to create patches entirely by hand. 
>>> The source of the patch is a single C file, easy to review, easy to
>>> maintain. 
>>> 
>>> In addition to kGraft, there are other live patching tools that prefer
>>> creating patches 

Re: [PATCH] combine: Do not combine moves from hard registers

2018-10-23 Thread Christophe Lyon
On Tue, 23 Oct 2018 at 14:34, Segher Boessenkool
 wrote:
>
> On Tue, Oct 23, 2018 at 02:02:35PM +0200, Christophe Lyon wrote:
> > I also bisected regressions on arm:
> > gcc.c-torture/execute/920428-2.c
> > gfortran.dg/actual_array_substr_2.f90
> > both point to this commit too.
>
> And what are the errors for those?
>
Both are execution failures.

For the fortran test:
*** Error in `./actual_array_substr_2.exe': munmap_chunk(): invalid
pointer: 0x00023057 ***

For the C test (920428-2.c), the .log file isn't very helpful:
qemu: uncaught target signal 11 (Segmentation fault) - core dumped



>
> Segher


Re: [PATCH, contrib] dg-cmp-results: display NA->FAIL by default

2018-10-23 Thread Thomas Preudhomme
And now with the patch. My apologies for the omission.

Best regards,

Thomas
On Tue, 23 Oct 2018 at 12:08, Thomas Preudhomme
 wrote:
>
> Hi,
>
> Currently, dg-cmp-results will not print anything for a test that was
> not run before, even if it is a FAIL now. This means that when
> contributing a code change together with a testcase in the same commit
> one must run dg-cmp-results twice: once to check for regression on a
> full testsuite run and once against the new testcase with -v -v. This
> also prevents using dg-cmp-results on sum files generated with
> test_summary since these would not contain PASS.
>
> This patch changes dg-cmp-results to print NA->FAIL changes by default.
>
> ChangeLog entry is as follows:
>
> *** contrib/ChangeLog ***
>
> 2018-10-23  Thomas Preud'homme  
>
> * dg-cmp-results.sh: Print NA-FAIL changes at default verbosity.
>
> Is this ok for trunk?
>
> Best regards,
>
> Thomas
From ab4272a15bdd8931ef683e234e7dd2e0d038df5f Mon Sep 17 00:00:00 2001
From: Thomas Preud'homme 
Date: Tue, 23 Oct 2018 11:54:51 +0100
Subject: [PATCH] dg-cmp-results: display NA->FAIL by default

Hi,

Currently, dg-cmp-results will not print anything for a test that was
not run before, even if it is a FAIL now. This means that when
contributing a code change together with a testcase in the same commit
one must run dg-cmp-results twice: once to check for regression on a
full testsuite run and once against the new testcase with -v -v. This
also prevents using dg-cmp-results on sum files generated with
test_summary since these would not contain PASS.

This patch changes dg-cmp-results to print NA->FAIL changes by default.

ChangeLog entry is as follows:

*** contrib/ChangeLog ***

2018-10-23  Thomas Preud'homme  

	* dg-cmp-results.sh: Print NA-FAIL changes at default verbosity.

Is this ok for trunk?

Best regards,

Thomas
---
 contrib/dg-cmp-results.sh | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/contrib/dg-cmp-results.sh b/contrib/dg-cmp-results.sh
index 821d557a168..921a9b9ca28 100755
--- a/contrib/dg-cmp-results.sh
+++ b/contrib/dg-cmp-results.sh
@@ -137,8 +137,11 @@ function drop() {
 function compare(st, nm) {
 old = peek()
 if (old == 0) {
-# This new test wasn't run last time.
-if (verbose >= 2) printf("NA->%s:%s\n", st, nm)
+	# This new test wasn't run last time.
+	if(st == "FAIL" || verbose >= 2) {
+	# New test fails or we want all changes
+	printf("NA->%s:%s\n", st, nm)
+	}
 }
 else {
 	# Compare this new test to the first queued old one.
-- 
2.19.1



Re: [PATCH, ARM] PR85434: Prevent spilling of stack protector guard's address on ARM

2018-10-23 Thread Thomas Preudhomme
[Removing Jeff Law since middle end code hasn't changed]

Hi,

Given how memory operands are reloaded even with an X constraint, I've
reworked the patch for the combined set and combined test instructions
to keep the mem out of the match_operand, and used an expander to
generate the right instruction pattern. I've also fixed some
longstanding issues with the patch when flag_pic is true and with
constraints for Thumb-1 that I hadn't noticed before due to using
dg-cmp-results in conjunction with test_summary, which does not show
NA->FAIL (see [1]).

All in all, I think the Arm code would do with a fresh review rather
than looking at the changes since the last posted version. The
(unchanged) ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2018-08-09  Thomas Preud'homme  

* target-insns.def (stack_protect_combined_set): Define new standard
pattern name.
(stack_protect_combined_test): Likewise.
* cfgexpand.c (stack_protect_prologue): Try new
stack_protect_combined_set pattern first.
* function.c (stack_protect_epilogue): Try new
stack_protect_combined_test pattern first.
* config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
parameters to control which register to use as PIC register and force
reloading PIC register respectively.  Insert in the stream of insns if
possible.
(legitimize_pic_address): Expose above new parameters in prototype and
adapt recursive calls accordingly.  Use pic_reg if non null instead of
cached one.
(arm_load_pic_register): Add pic_reg parameter and use it if non null.
(arm_legitimize_address): Adapt to new legitimize_pic_address
prototype.
(thumb_legitimize_address): Likewise.
(arm_emit_call_insn): Adapt to require_pic_register prototype change.
(arm_expand_prologue): Adapt to arm_load_pic_register prototype change.
(thumb1_expand_prologue): Likewise.
* config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
change.
(arm_load_pic_register): Likewise.
* config/arm/predicates.md (guard_addr_operand): New predicate.
(guard_operand): New predicate.
* config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
prototype change.
(builtin_setjmp_receiver expander): Adapt to thumb1_expand_prologue
prototype change.
(stack_protect_combined_set): New expander.
(stack_protect_combined_set_insn): New insn_and_split pattern.
(stack_protect_set_insn): New insn pattern.
(stack_protect_combined_test): New expander.
(stack_protect_combined_test_insn): New insn_and_split pattern.
(stack_protect_test_insn): New insn pattern.
* config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
(UNSPEC_SP_TEST): Likewise.
* doc/md.texi (stack_protect_combined_set): Document new standard
pattern name.
(stack_protect_set): Clarify that the operand for guard's address is
legal.
(stack_protect_combined_test): Document new standard pattern name.
(stack_protect_test): Clarify that the operand for guard's address is
legal.

*** gcc/testsuite/ChangeLog ***

2018-07-05  Thomas Preud'homme  

* gcc.target/arm/pr85434.c: New test.

Testing: Bootstrap and regression testing for Arm, Thumb-1 and Thumb-2
with (i) default flags, (ii) an extra -fstack-protector-all and (iii)
-fPIC -fstack-protector-all. A glibc build and testsuite run was also
performed for Arm and Thumb-2. Default flags show no regression. The
other runs have some expected scan-assembler failures (due to the stack
protector or -fPIC code sequence), some guality failures (due to less
optimized code with the new stack protector code) and some execution
failures in sibcall-9 and sibcall-10 under -fPIC -fstack-protector-all,
due to the PIC sequence for the global variable making the frame
layout different for the 2 functions (these become PASS when the
global variable is made static).

Is this ok for trunk?

Best regards,

Thomas

[1] https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01412.html


On Tue, 25 Sep 2018 at 17:10, Kyrill Tkachov
 wrote:
>
> Hi Thomas,
>
> On 29/08/18 10:51, Thomas Preudhomme wrote:
> > Resend hopefully without HTML this time.
> >
> > On Wed, 29 Aug 2018 at 10:49, Thomas Preudhomme
> >  wrote:
> >> Hi,
> >>
> >> I've reworked the patch fixing PR85434 (spilling of stack protector 
> >> guard's address on ARM) to address the testsuite regression on powerpc and 
> >> x86 as well as glibc testsuite regression on ARM. Issues were due to 
> >> unconditionally attempting to generate the new patterns. The code now 
> >> tests if there is a pattern for them for the target before generating 
> >> them. In the ARM side of the patch, I've also added a more specific 
> >> predicate for the new patterns. The new patch is found below.
> >>
> >>
> >> In case of high register pressure in PIC mode, address of the stack
> >> protector's guard can be spilled on ARM targets as shown in PR85434,
> >> thus allowing an attacker to control what the canary would be 

Re: [PATCH] combine: Do not combine moves from hard registers

2018-10-23 Thread Segher Boessenkool
On Tue, Oct 23, 2018 at 02:02:35PM +0200, Christophe Lyon wrote:
> I also bisected regressions on arm:
> gcc.c-torture/execute/920428-2.c
> gfortran.dg/actual_array_substr_2.f90
> both point to this commit too.

And what are the errors for those?


Segher


Re: [PATCH] combine: Do not combine moves from hard registers

2018-10-23 Thread Segher Boessenkool
On Tue, Oct 23, 2018 at 12:14:27PM +0200, Christophe Lyon wrote:
> I have noticed many regressions on arm and aarch64 between 265366 and
> 265408 (this commit is 265398).
> 
> I bisected at least one to this commit on aarch64:
> FAIL: gcc.dg/ira-shrinkwrap-prep-1.c scan-rtl-dump ira "Split
> live-range of register"
> The same test also regresses on arm.

Many targets also fail gcc.dg/ira-shrinkwrap-prep-2.c; these tests fail
when random things in the RTL change, apparently.

> For a whole picture of all the regressions I noticed during these two
> commits, have a look at:
> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/265408/report-build-info.html

No thanks.  I am not going to click on 111 links and whatever is behind
those.  Please summarise, like, what was the diff in test_summary, and
then dig down into individual tests if you want.  Or whatever else works
both for you and for me.  This doesn't work for me.


Segher


Re: [PATCH] Fix some EVRP stupidness

2018-10-23 Thread Aldy Hernandez
Thanks!

On Tue, Oct 23, 2018, 13:37 Richard Biener  wrote:

> On Tue, 23 Oct 2018, Richard Biener wrote:
>
> > On Tue, 23 Oct 2018, Aldy Hernandez wrote:
> >
> > >
> > > > +   if (tem.kind () == old_vr->kind ()
> > > > +   && tem.min () == old_vr->min ()
> > > > +   && tem.max () == old_vr->max ())
> > > > + continue;
> > >
> > > I think it would be cleaner to use tem.ignore_equivs_equal_p
> (*old_vr). The
> > > goal was to use == when the equivalence bitmap should be taken into
> account,
> > > or ignore_equivs_equal_p() otherwise.
> >
> > Ah, didn't know of that function (and yes, I wanted to ignore equivs).
> >
> > Will try to remember together with the dump thing David noticed.
>
> Like the following.
>
> Bootstrapped / tested on x86_64-unknown-linux-gnu, applied.
>
> Richard.
>
> 2018-10-23  Richard Biener  
>
> * tree-vrp.c (add_assert_info): Guard dump_printf with
> dump_enabled_p.
> * gimple-ssa-evrp-analyze.c
> (evrp_range_analyzer::record_ranges_from_incoming_edge):
> Use value_range::ignore_equivs_equal_p.
>
> Index: gcc/tree-vrp.c
> ===
> --- gcc/tree-vrp.c  (revision 265420)
> +++ gcc/tree-vrp.c  (working copy)
> @@ -2299,9 +2299,10 @@ add_assert_info (vec 
>info.val = val;
>info.expr = expr;
>asserts.safe_push (info);
> -  dump_printf (MSG_NOTE | MSG_PRIORITY_INTERNALS,
> -  "Adding assert for %T from %T %s %T\n",
> -  name, expr, op_symbol_code (comp_code), val);
> +  if (dump_enabled_p ())
> +dump_printf (MSG_NOTE | MSG_PRIORITY_INTERNALS,
> +"Adding assert for %T from %T %s %T\n",
> +name, expr, op_symbol_code (comp_code), val);
>  }
>
>  /* If NAME doesn't have an ASSERT_EXPR registered for asserting
> Index: gcc/gimple-ssa-evrp-analyze.c
> ===
> --- gcc/gimple-ssa-evrp-analyze.c   (revision 265420)
> +++ gcc/gimple-ssa-evrp-analyze.c   (working copy)
> @@ -209,9 +209,7 @@ evrp_range_analyzer::record_ranges_from_
>   value_range *old_vr = get_value_range (vrs[i].first);
>   value_range tem (old_vr->kind (), old_vr->min (),
> old_vr->max ());
>   tem.intersect (vrs[i].second);
> - if (tem.kind () == old_vr->kind ()
> - && tem.min () == old_vr->min ()
> - && tem.max () == old_vr->max ())
> + if (tem.ignore_equivs_equal_p (*old_vr))
> continue;
>   push_value_range (vrs[i].first, vrs[i].second);
>   if (is_fallthru
>


Re: [PATCH] combine: Do not combine moves from hard registers

2018-10-23 Thread Christophe Lyon
On Tue, 23 Oct 2018 at 12:14, Christophe Lyon
 wrote:
>
> On Mon, 22 Oct 2018 at 22:17, Segher Boessenkool
>  wrote:
> >
> > On most targets every function starts with moves from the parameter
> > passing (hard) registers into pseudos.  Similarly, after every call
> > there is a move from the return register into a pseudo.  These moves
> > usually combine with later instructions (leaving pretty much the same
> > instruction, just with a hard reg instead of a pseudo).
> >
> > This isn't a good idea.  Register allocation can get rid of unnecessary
> > moves just fine, and moving the parameter passing registers into many
> > later instructions tends to prevent good register allocation.  This
> > patch disallows combining moves from a hard (non-fixed) register.
> >
> > This also avoid the problem mentioned in PR87600 #c3 (combining hard
> > registers into inline assembler is problematic).
> >
> > Because the register move can often be combined with other instructions
> > *itself*, for example for setting some condition code, this patch adds
> > extra copies via new pseudos after every copy-from-hard-reg.
> >
> > On some targets this reduces average code size.  On others it increases
> > it a bit, 0.1% or 0.2% or so.  (I tested this on all *-linux targets).
> >
> > I'll commit this to trunk now.  If there are problems, please don't
> > hesitate to let me know!  Thanks.
> >
>
> Hi,
>
> I have noticed many regressions on arm and aarch64 between 265366 and
> 265408 (this commit is 265398).
>
> I bisected at least one to this commit on aarch64:
> FAIL: gcc.dg/ira-shrinkwrap-prep-1.c scan-rtl-dump ira "Split
> live-range of register"
> The same test also regresses on arm.
>
> For a whole picture of all the regressions I noticed during these two
> commits, have a look at:
> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/265408/report-build-info.html
>
> Christophe
>

I also bisected regressions on arm:
gcc.c-torture/execute/920428-2.c
gfortran.dg/actual_array_substr_2.f90
both point to this commit too.



>
>
> >
> > Segher
> >
> >
> > 2018-10-22  Segher Boessenkool  
> >
> > PR rtl-optimization/87600
> > * combine.c: Add include of expr.h.
> > (cant_combine_insn_p): Do not combine moves from any hard non-fixed
> > register to a pseudo.
> > (make_more_copies): New function, add a copy to a new pseudo after
> > the moves from hard registers into pseudos.
> > (rest_of_handle_combine): Declare rebuild_jump_labels_after_combine
> > later.  Call make_more_copies.
> >
> > ---
> >  gcc/combine.c | 50 ++
> >  1 file changed, 46 insertions(+), 4 deletions(-)
> >
> > diff --git a/gcc/combine.c b/gcc/combine.c
> > index 256b5a4..3ff1760 100644
> > --- a/gcc/combine.c
> > +++ b/gcc/combine.c
> > @@ -99,6 +99,7 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "explow.h"
> >  #include "insn-attr.h"
> >  #include "rtlhooks-def.h"
> > +#include "expr.h"
> >  #include "params.h"
> >  #include "tree-pass.h"
> >  #include "valtrack.h"
> > @@ -2348,8 +2349,7 @@ cant_combine_insn_p (rtx_insn *insn)
> >  dest = SUBREG_REG (dest);
> >if (REG_P (src) && REG_P (dest)
> >&& ((HARD_REGISTER_P (src)
> > -  && ! TEST_HARD_REG_BIT (fixed_reg_set, REGNO (src))
> > -  && targetm.class_likely_spilled_p (REGNO_REG_CLASS (REGNO 
> > (src
> > +  && ! TEST_HARD_REG_BIT (fixed_reg_set, REGNO (src)))
> >   || (HARD_REGISTER_P (dest)
> >   && ! TEST_HARD_REG_BIT (fixed_reg_set, REGNO (dest))
> >   && targetm.class_likely_spilled_p (REGNO_REG_CLASS (REGNO 
> > (dest))
> > @@ -14969,11 +14969,53 @@ dump_combine_total_stats (FILE *file)
> >   total_attempts, total_merges, total_extras, total_successes);
> >  }
> >
> > +/* Make pseudo-to-pseudo copies after every hard-reg-to-pseudo-copy, 
> > because
> > +   the reg-to-reg copy can usefully combine with later instructions, but we
> > +   do not want to combine the hard reg into later instructions, for that
> > +   restricts register allocation.  */
> > +static void
> > +make_more_copies (void)
> > +{
> > +  basic_block bb;
> > +
> > +  FOR_EACH_BB_FN (bb, cfun)
> > +{
> > +  rtx_insn *insn;
> > +
> > +  FOR_BB_INSNS (bb, insn)
> > +{
> > +  if (!NONDEBUG_INSN_P (insn))
> > +continue;
> > +
> > + rtx set = single_set (insn);
> > + if (!set)
> > +   continue;
> > + rtx src = SET_SRC (set);
> > + rtx dest = SET_DEST (set);
> > + if (GET_CODE (src) == SUBREG)
> > +   src = SUBREG_REG (src);
> > + if (!(REG_P (src) && HARD_REGISTER_P (src)))
> > +   continue;
> > + if (TEST_HARD_REG_BIT (fixed_reg_set, REGNO (src)))
> > +   continue;
> > +
> > + rtx new_reg = gen_reg_rtx (GET_MODE (dest));
> > + rtx_insn *insn1 = gen_move_insn (new_reg, 

Re: [PATCH] Fix some EVRP stupidness

2018-10-23 Thread Richard Biener
On Tue, 23 Oct 2018, Richard Biener wrote:

> On Tue, 23 Oct 2018, Aldy Hernandez wrote:
> 
> > 
> > > +   if (tem.kind () == old_vr->kind ()
> > > +   && tem.min () == old_vr->min ()
> > > +   && tem.max () == old_vr->max ())
> > > + continue;
> > 
> > I think it would be cleaner to use tem.ignore_equivs_equal_p (*old_vr). The
> > goal was to use == when the equivalence bitmap should be taken into account,
> > or ignore_equivs_equal_p() otherwise.
> 
> Ah, didn't know of that function (and yes, I wanted to ignore equivs).
> 
> Will try to remember together with the dump thing David noticed.

Like the following.

Bootstrapped / tested on x86_64-unknown-linux-gnu, applied.

Richard.

2018-10-23  Richard Biener  

* tree-vrp.c (add_assert_info): Guard dump_printf with
dump_enabled_p.
* gimple-ssa-evrp-analyze.c
(evrp_range_analyzer::record_ranges_from_incoming_edge):
Use value_range::ignore_equivs_equal_p.

Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 265420)
+++ gcc/tree-vrp.c  (working copy)
@@ -2299,9 +2299,10 @@ add_assert_info (vec 
   info.val = val;
   info.expr = expr;
   asserts.safe_push (info);
-  dump_printf (MSG_NOTE | MSG_PRIORITY_INTERNALS,
-  "Adding assert for %T from %T %s %T\n",
-  name, expr, op_symbol_code (comp_code), val);
+  if (dump_enabled_p ())
+dump_printf (MSG_NOTE | MSG_PRIORITY_INTERNALS,
+"Adding assert for %T from %T %s %T\n",
+name, expr, op_symbol_code (comp_code), val);
 }
 
 /* If NAME doesn't have an ASSERT_EXPR registered for asserting
Index: gcc/gimple-ssa-evrp-analyze.c
===
--- gcc/gimple-ssa-evrp-analyze.c   (revision 265420)
+++ gcc/gimple-ssa-evrp-analyze.c   (working copy)
@@ -209,9 +209,7 @@ evrp_range_analyzer::record_ranges_from_
  value_range *old_vr = get_value_range (vrs[i].first);
  value_range tem (old_vr->kind (), old_vr->min (), old_vr->max ());
  tem.intersect (vrs[i].second);
- if (tem.kind () == old_vr->kind ()
- && tem.min () == old_vr->min ()
- && tem.max () == old_vr->max ())
+ if (tem.ignore_equivs_equal_p (*old_vr))
continue;
  push_value_range (vrs[i].first, vrs[i].second);
  if (is_fallthru
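
As a rough illustration of the distinction Aldy describes above (== takes the equivalence set into account, ignore_equivs_equal_p does not), here is a toy model. ToyRange and ignore_equivs_equal are names made up for this sketch; they are not GCC's value_range class, which stores equivalences in a bitmap rather than a std::set.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-in for a value range; NOT GCC's value_range.
struct ToyRange {
  int kind;               // toy encoding, e.g. 1 = range, 2 = anti-range
  long min;
  long max;
  std::set<int> equivs;   // toy equivalence set (assumption of this sketch)

  // Full comparison: the equivalence set must match as well.
  bool operator==(const ToyRange &o) const {
    return kind == o.kind && min == o.min && max == o.max
           && equivs == o.equivs;
  }

  // Comparison that deliberately ignores the equivalence set.
  bool ignore_equivs_equal(const ToyRange &o) const {
    return kind == o.kind && min == o.min && max == o.max;
  }
};
```

Two ranges with the same kind and bounds but different equivalence sets compare equal under the second function only, which is the behaviour the three-way kind/min/max check in the removed lines spelled out by hand.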


Performance impact of disabling non-clone IPA optimizations for the Linux kernel (was: "GCC options for kernel live-patching")

2018-10-23 Thread Nicolai Stange
Hi,

let me summarize some results from performance comparisons of Linux
kernels compiled with and without certain IPA optimizations.

It's a slight abuse of this thread, but I think having the numbers might
perhaps give some useful insights on the potential costs associated with
the -flive-patching discussed here.

All kudos go to Giovanni Gherdovich from the SUSE Performance Team who
did all of the work presented below.

For a TL;DR, see the conclusion at the end of this email.

Martin Jambor  writes:

> (this message is a part of the thread originating with
> https://gcc.gnu.org/ml/gcc-patches/2018-09/msg01018.html)
>
> We have just had a quick discussion with two upstream maintainers of
> Linux kernel live-patching about this and the key points were:
>
> 1. SUSE live-patch creators (and I assume all that use the upstream
>live-patching method) use Martin Liska's (somewhat under-documented)
>-fdump-ipa-clones option and a utility he wrote
>(https://github.com/marxin/kgraft-analysis-tool) to deal with all
>kinds of inlining, IPA-CP and generally all IPA optimizations that
>internally create a clone.  The tool tells them what happened and
>also lists all callers that need to be live-patched.
>
> 2. However, there is growing concern about other IPA analyses that do
>not create a clone but still affect code generation in other
>functions.  Kernel developers have identified and disabled IPA-RA but
>there are more of them such as IPA-modref analysis, stack alignment
>propagation and possibly quite a few others which extract information
>from one function and use it in a caller or perhaps even some
>almost-unrelated functions (such as detection of read-only and
>write-only static global variables).
>
>The kernel live-patching community would welcome if GCC had an option
>that could disable all such optimizations/analyses for which it
>cannot provide a list of all affected functions (i.e. which ones need
>to be live-patched if a particular function is).

AFAIU, the currently known IPA optimizations of this category are
(c.f. [1] and [2] from this thread):

 - -fipa-pure-const
 - -fipa-pta
 - -fipa-reference
 - -fipa-ra
 - -fipa-icf
 - -fipa-bit-cp
 - -fipa-vrp
 - and some others which might be problematic but currently can't get
   disabled on the cli:
   - stack alignment requirements
   - duplication of or skipping of alias analysis for
 functions/variables whose address is not taken (I don't know what
 that means, TBH).

Some time ago, Giovanni compared the performance of a kernel compiled with

 -fno-ipa-pure-const
 -fno-ipa-pta
 -fno-ipa-reference
 -fno-ipa-ra
 -fno-ipa-icf
 -fno-ipa-bit-cp
 -fno-ipa-vrp

plus (because I wasn't able to tell whether these are problematic in the
context of live patching)

 -fno-ipa-cp
 -fno-ipa-cp-clone
 -fno-ipa-profile
 -fno-ipa-sra

against a kernel compiled without any of these.

The kernel was a 4.12.14 one with additional patches on top.

The benchmarks had been performed on a smaller and on a bigger machine
each. Specs:
- single socket with a Xeon E3-1240 v5 (Skylake), 4 cores / 8 threads,
  32G of memory (UMA)
- 2 sockets with each one mounting a Xeon E5-2698 v4 (Broadwell) for a
  total of 40 cores / 80 threads and 528G of memory (NUMA)

You can find the results here:

  
https://beta.suse.com/private/nstange/ggherdovich-no-ipa-results/dashboard.html

"laurel2" is the smaller machine, "hardy4" the bigger one.

The numbers presented in the dashboard are a relative measure of how the
no-ipa kernel was performing in comparison to the stock one. "1" means
no change, and, roughly speaking, each deviation by 0.01 from that value
corresponds to an overall performance change of 1%. Depending on the
benchmark, higher means better (e.g. for throughput) or vice versa
(e.g. for latencies). Some of the numbers are highlighted in green or
red. Green means that the no-ipa kernel performs better, red the
contrary.
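
To make the reading of those dashboard numbers concrete, here is a small sketch. It assumes the number is a plain ratio of the no-ipa result to the stock result; that assumption, and the helper names percent_change and no_ipa_is_better, are mine and not taken from the dashboard itself.

```cpp
#include <cassert>
#include <cmath>

// Percentage change implied by a dashboard ratio (1.0 means no change).
// Assumes ratio = no-ipa result / stock result.
double percent_change(double ratio) {
  return (ratio - 1.0) * 100.0;
}

// Whether the no-ipa kernel did better, given the direction of the metric:
// higher is better for throughput, lower is better for latency.
bool no_ipa_is_better(double ratio, bool higher_is_better) {
  return higher_is_better ? ratio > 1.0 : ratio < 1.0;
}
```

Under these assumptions a value of 0.98 on a throughput benchmark reads as roughly 2% worse for the no-ipa kernel, while the same 0.98 on a latency benchmark reads as roughly 2% better.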

The sockperf-{tcp,udp}-under-load results are spoiled due to outliers,
probably because of slow vs. fast paths. Please ignore.

(If you're interested in the detailed results, you can click on any of
 those accumulated numbers in the dashboard. Scroll down and you'll find
 some nice plots.)

For the overall outcome, let me quote Giovanni who summarized it nicely:

  What's left in red:

  * fsmark-threaded on laurel2 (skylake 8 cores), down 2%: if you look at the
histograms of files created per second, there is never a clear winner
between with and without IPA (except for the single-threaded case). Clean
on hardy4.

  * sockperf-udp-throughput, hardy4: yep this one is statistically
significant (in the plot you clearly see that the green dots are all
below the yellow dots). 4% worse on average. Clean on the other machine.

  * tbench: this one is significant too (look at the histogram, no
overlapping between the two distributions) but it's a curious one,
because on the 

Re: [PATCH][RFC] Early phiopt pass

2018-10-23 Thread Richard Biener
On Mon, 22 Oct 2018, Richard Biener wrote:

> On Wed, 29 Aug 2018, Richard Biener wrote:
> 
> > On Wed, 29 Aug 2018, Jeff Law wrote:
> > 
> > > On 08/29/2018 04:56 AM, Richard Biener wrote:
> > > > 
> > > > In response to PR87105 I dusted off an old patch that adds an early
> > > > phiopt pass.  Recognizing we run phiopt twice with not many passes
> > > > in between early in post-IPA optimizations this patch moves the
> > > > first of said to the early pipeline.
> > > > 
> > > > The main motivation is to do things like MIN/MAX_EXPR early to
> > > > avoid jump threading messing up the CFG (the case with PR87105).
> > > > I realize there's early backward threading before the new early
> > > > phiopt pass but that doesn't seem to do anything useful there (yet).
> > > > I think it makes sense to push that later anyways.
> > > > 
> > > > Now, early phiopt is quite confused about predict stmts still
> > > > being present and turning conditional BBs into diamonds which it
> > > > cannot handle.  I've fixed at least stray such stmts in the BBs
> > > > that are interesting.  Note this may hide fallout which would otherwise
> > > > be visible in the testsuite (there's no flag to avoid
> > > > generating the predictors - they are emitted directly by the frontends,
> > > > maybe we could drop them with -fno[-guess]-branch-probabilities at
> > > > gimplification time?).
> > > > 
> > > > There's also an effect on ifcombine which, when preceded by phiopt,
> > > > can miss cases because phiopt may value-replace some condition.
> > > > 
> > > > The patch contains adjustments to testcases where there's no harm done
> > > > in the end and leaves those FAILing where we would need to do sth.
> > > > 
> > > > In the end it's regular pass-ordering issues but our testsuite very
> > > > often ties our hands when re-ordering passes because of them.
> > > > 
> > > > One option would be to distinguish early from late phiopt and for
> > > > example avoid value-replacement - like just do MIN/MAX recognition
> > > > for the vectorizer.
> > > > 
> > > > Any comments?
> > > > 
> > > > Some detailed notes on the remaining FAILs below.
> > > [ ... ]
> > > I didn't see anything in the testsuite fallout that gave me significant
> > > concern.  If your judgment is that we're better off running it earlier,
> > > then let's do it.
> > 
> > I guess so.  I'll add an early variant anyway since we don't want to
> > do adjacent load hoisting early.  I'll see how difficult it is to handle
> > ifcombine merging with straight-line code or catch those cases elsewhere.
> 
> So I'm finally returning to this...
> 
> I failed to find a sequence of VRP (jump threading), ifcombine and phiopt
> that avoids regressions (well, the current order works of course).  So
> instead of moving the first phiopt pass I am now _adding_ a phiopt
> pass early doing _only_ min/max/abs replacement.  Those are wrecked
> by jump-threading if they happen in sequences with some common operands.
> 
> Thus the following patch is now in bootstrap & regtest on 
> x86_64-unknown-linux-gnu.
> 
> If that goes well I plan to install the patch after digging out
> the relevant two-to-three PRs and adding testcases for them.

The following is what I have applied.

Bootstrapped and tested on x86_64-unknown-linux-gnu.
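
For readers less familiar with the min/max replacement mentioned above, a minimal source-level sketch follows. The function names are invented for illustration; the real transformation works on GIMPLE PHI nodes, not on C++ source.

```cpp
#include <algorithm>
#include <cassert>

// Branchy form: a small diamond in the CFG whose arms merge in a PHI.
int min_branchy(int a, int b) {
  int r;
  if (a < b)
    r = a;
  else
    r = b;
  return r;
}

// Straight-line form, corresponding to a single MIN_EXPR.
int min_straight(int a, int b) {
  return std::min(a, b);
}

// The same pattern for the maximum.
int max_branchy(int a, int b) {
  return (a > b) ? a : b;
}
```

Once the diamond has been turned into straight-line code there is no branch left for jump threading to duplicate, which is the motivation given for doing this replacement early.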

Richard.

From 83f89c82650bdfa057bde3ef1a236338a8cdd848 Mon Sep 17 00:00:00 2001
From: Richard Guenther 
Date: Tue, 28 Aug 2018 12:53:53 +0200
Subject: [PATCH] early-phiopt

FAIL: g++.dg/predict-loop-exit-2.C -std=gnu++98  scan-tree-dump-times profile_estimate "loop exit heuristics:" 2

phiopt value-replaces the last "exit" test in

  :
  _5 = foo ();
  if (_5 != 0)
goto ;
  else
goto ;

  :
  g.0_7 = g;
  if (g.0_7 <= 9)
goto ;
  else
goto ;

  :

  :
  # iftmp.3_1 = PHI <1(5), 0(6), 1(4)>
  if (iftmp.3_1 != 0)
goto ;
  else
goto ;

  :
  return;

to look like

  :
  g.0_7 = g;
  _6 = g.0_7 <= 9;

  :
  # iftmp.3_1 = PHI <_6(5), 1(4)>
  if (iftmp.3_1 != 0)
goto ;

which confuses whatever the testcase was supposed to check (there is only
one loop exit).

FAIL: gcc.dg/tree-ssa/ssa-pre-32.c scan-tree-dump pre "# prephitmp_[0-9]+ = PHI 
<[xy]_[0-9]+(D)[^,]*, [xy]_[0-9]+(D)"

phiopt optimizes

:
   if (b_5(D) != 0)
 goto ; [INV]
   else
 goto ; [INV]

:

:
   # iftmp.0_3 = PHI <4294967295(2), 0(3)>
   _1 = iftmp.0_3 & x_8(D);

to

   _9 = (unsigned int) b_5(D);
   _14 = -_9;
   _1 = x_8(D) & _14;

but in the sequence with two same conditions but opposite bits
nothing optimizes this further back to b_5(D) ? x_6(D) : y_7(D).
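
The rewrite above can be checked at source level with the following sketch. It assumes b holds only 0 or 1 (modelled here as bool) and a 32-bit unsigned int, so that 4294967295 is the all-ones mask; the function names are invented for illustration.

```cpp
#include <cassert>

// Branchy form: pick an all-ones or all-zero mask, then AND it with x.
unsigned mask_branchy(bool b, unsigned x) {
  unsigned mask = b ? ~0u : 0u;
  return mask & x;
}

// Straight-line form: negating the 0/1 value yields the same mask.
unsigned mask_straight(bool b, unsigned x) {
  unsigned m = -static_cast<unsigned>(b);
  return x & m;
}

// Two such masks with opposite bits amount to a select: b ? x : y.
unsigned select_via_masks(bool b, unsigned x, unsigned y) {
  unsigned m = -static_cast<unsigned>(b);
  return (x & m) | (y & ~m);
}
```

The last function shows the shape that is left behind when both arms have been rewritten into masks: correct, but no longer recognizable as a simple conditional select.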

FAIL: gcc.dg/tree-ssa/ssa-ifcombine-7.c scan-tree-dump ifcombine " > "
FAIL: gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c scan-tree-dump optimized "&"
FAIL: gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c scan-tree-dump optimized "&"
FAIL: gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c scan-tree-dump-times optimized "&" 2
  \|" 2

early phiopt value-replaces the inner condition which leaves ifcombine
with no work.


Re: [PATCH] Make __PRETTY_FUNCTION__-like functions mergeable string csts (PR c++/64266).

2018-10-23 Thread Richard Biener
On Tue, Oct 23, 2018 at 10:59 AM Martin Liška  wrote:
>
> Hi.
>
> I've returned to this long-lasting issue after quite some time. Thanks to 
> Honza I hope
> I can now address the root cause which caused output of a string constant 
> when debug info
> was emitted. The problematic situation happened with following back-trace:
>
> #0  mergeable_string_section (decl=, align=64, 
> flags=0) at /home/marxin/Programming/gcc/gcc/varasm.c:808
> #1  0x01779bf3 in default_elf_select_section (decl= 0x767be210>, reloc=0, align=64) at 
> /home/marxin/Programming/gcc/gcc/varasm.c:6739
> #2  0x0176efb6 in get_constant_section (exp= 0x767be210>, align=64) at /home/marxin/Programming/gcc/gcc/varasm.c:3302
> #3  0x0176f468 in build_constant_desc (exp= 0x767be210>) at /home/marxin/Programming/gcc/gcc/varasm.c:3371
> #4  0x0176f81c in output_constant_def (exp= 0x767be210>, defer=1) at /home/marxin/Programming/gcc/gcc/varasm.c:3434
> #5  0x0176d406 in decode_addr_const (exp=, 
> value=0x7fffc540) at /home/marxin/Programming/gcc/gcc/varasm.c:2951
> #6  0x0176d93f in const_hash_1 (exp=) at 
> /home/marxin/Programming/gcc/gcc/varasm.c:3054
> #7  0x0176fdc2 in lookup_constant_def (exp= 0x7682f8c0>) at /home/marxin/Programming/gcc/gcc/varasm.c:3557
> #8  0x00dd5778 in cst_pool_loc_descr (loc=) 
> at /home/marxin/Programming/gcc/gcc/dwarf2out.c:17288
>
> That was in situation where we emit debug info of a function that has an 
> inlined __PRETTY_FUNCTION__ from
> a different function. As seen, the constant is output due to const_hash_1 
> function call. Proper fix would
> be to not emit these string constants for purpose of hash function.

possibly sth like the following - that avoids all cases of calling
output_constant_def.  Probably worth testing
separately.

diff --git a/gcc/varasm.c b/gcc/varasm.c
index 91650eea9f7..9121dbd2c84 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -3047,6 +3047,10 @@ const_hash_1 (const tree exp)
   }

 case ADDR_EXPR:
+  if (CONSTANT_CLASS_P (TREE_OPERAND (exp, 0)))
+   return const_hash_1 (TREE_OPERAND (exp, 0));
+
+  /* Fallthru.  */
 case FDESC_EXPR:
   {
struct addr_const value;

> However, I still see some minor ICEs, it's probably related to 
> decay_conversion in cp_fname_init:
>
> 1) ./xg++ -B. 
> /home/marxin/Programming/gcc/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-__func__2.C
>
> /home/marxin/Programming/gcc/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-__func__2.C:6:17:
>  internal compiler error: Segmentation fault
> 6 | [] { return __func__; }();
>   | ^~~~
> 0x1344568 crash_signal
> /home/marxin/Programming/gcc/gcc/toplev.c:325
> 0x76bc310f ???
> 
> /usr/src/debug/glibc-2.27-6.1.x86_64/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0
> 0x9db134 is_capture_proxy(tree_node*)
> /home/marxin/Programming/gcc/gcc/cp/lambda.c:261
> 0xaeecb7 tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool)
> /home/marxin/Programming/gcc/gcc/cp/pt.c:16700
> 0xaee5fd tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool)
> /home/marxin/Programming/gcc/gcc/cp/pt.c:16636
> 0xaf0ffb tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool)
> /home/marxin/Programming/gcc/gcc/cp/pt.c:16942
>
> where
> (gdb) p debug_tree(decl)
>   type  type  type_6 QI
> size 
> unit-size 
> align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 
> 0x769b5498 precision:8 min  max 
> 
> pointer_to_this >
> unsigned DI
> size 
> unit-size 
> align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 
> 0x769b5540>
> readonly used tree_2 unsigned DI 
> /home/marxin/Programming/gcc/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-__func__2.C:6:17
>  size  unit-size 
> align:64 warn_if_not_align:0 context  operator()>
> value-expr 
> constant
> arg:0 
> constant
> arg:0 
> constant "operator()\000"
>
> and
> #0  0x009db134 in is_capture_proxy (decl=) at 
> /home/marxin/Programming/gcc/gcc/cp/lambda.c:261
>
> 2 ) ./xg++ -B. 
> /home/marxin/Programming/gcc/gcc/testsuite/g++.dg/ext/pretty2.C -c
> /home/marxin/Programming/gcc/gcc/testsuite/g++.dg/ext/pretty2.C:16:24: 
> internal compiler error: Segmentation fault
> 16 | __PRETTY_FUNCTION__), 0))
>|^
> /home/marxin/Programming/gcc/gcc/testsuite/g++.dg/ext/pretty2.C:36:10: note: 
> in expansion of macro ‘assert’
> 36 | int a = (assert (foo ()), 1);
>|  ^~
> 0x1344568 crash_signal
> /home/marxin/Programming/gcc/gcc/toplev.c:325
> 0x76bc310f ???
> 
> /usr/src/debug/glibc-2.27-6.1.x86_64/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0
> 0x9db270 is_capture_proxy(tree_node*)
> 

Re: [PATCH] Add sinh(tanh(x)) and cosh(tanh(x)) rules

2018-10-23 Thread Wilco Dijkstra
Hi,

>> Generally the goal is 1ULP in round to nearest
>
> Has that changed recently?  At least in the past for double the goal has
> been always .5ULP in round to nearest.

Yes. 0.5 ULP (perfect rounding) as a goal was insane as it caused ridiculous
slowdowns in the 10x range for no apparent reason. GLIBC was black listed
in the HPC community as a result. So I removed most of the perfect rounding
code - this not only avoids the slowdown but also speeds up the average case
significantly. The goal is to stay below 1 ULP, the math functions Szabolcs and I
rewrote generally do better, eg sinf is 0.56 ULP.
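
For reference, an error figure in ULPs can be estimated along the following lines. This is only a sketch of one common convention, not how libm-test-ulps is generated: it assumes a positive, finite reference value well inside the normal float range, and ulp_error is a made-up helper name.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Error of a float result measured in units of the float spacing (ULP)
// at the correctly rounded value of `exact`.
double ulp_error(float computed, double exact) {
  // Round-to-nearest float closest to the reference value.
  float ref = static_cast<float>(exact);
  // Next representable float above it gives the local spacing.
  float next = std::nextafter(ref, std::numeric_limits<float>::infinity());
  double ulp = static_cast<double>(next) - static_cast<double>(ref);
  return std::fabs(static_cast<double>(computed) - exact) / ulp;
}
```

With this measure a correctly rounded result never exceeds 0.5, which is why 0.5 ULP corresponds to perfect rounding and a figure such as 0.56 is only slightly off.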

Wilco


[PATCH, contrib] dg-cmp-results: display NA->FAIL by default

2018-10-23 Thread Thomas Preudhomme
Hi,

Currently, dg-cmp-results will not print anything for a test that was
not run before, even if it is a FAIL now. This means that when
contributing a code change together with a testcase in the same commit
one must run dg-cmp-results twice: once to check for regression on a
full testsuite run and once against the new testcase with -v -v. This
also prevents using dg-cmp-results on sum files generated with
test_summary since these would not contain PASS.

This patch changes dg-cmp-results to print NA->FAIL changes by default.

ChangeLog entry is as follows:

*** contrib/ChangeLog ***

2018-10-23  Thomas Preud'homme  

* dg-cmp-results.sh: Print NA->FAIL changes at default verbosity.

Is this ok for trunk?

Best regards,

Thomas


Re: [PATCH] Add sinh(tanh(x)) and cosh(tanh(x)) rules

2018-10-23 Thread Jakub Jelinek
On Tue, Oct 23, 2018 at 10:37:54AM +, Wilco Dijkstra wrote:
> >> So I think the runtime math libraries shoot for .5 ULP (yes, they don't
> >> always make it, but that's their goal).  We should probably have the
> >> same goal.  Going from 0 to 2 ULPs would be considered bad.
> 
> Generally the goal is 1ULP in round to nearest

Has that changed recently?  At least in the past for double the goal has
been always .5ULP in round to nearest.

> > But we do that everywhere (with -funsafe-math-optimizations or
> > -fassociative-math).
> 
> Exactly. And 2 ULP is extremely accurate for fast-math transformations - much
> better than eg. reassociating additions.

For -ffast-math yeah.

Jakub


Re: [PATCH] Add sinh(tanh(x)) and cosh(tanh(x)) rules

2018-10-23 Thread Wilco Dijkstra
Hi,

>> So I think the runtime math libraries shoot for .5 ULP (yes, they don't
>> always make it, but that's their goal).  We should probably have the
>> same goal.  Going from 0 to 2 ULPs would be considered bad.

Generally the goal is 1ULP in round to nearest - other rounding modes may have
higher ULP. The current GLIBC float/double/long double sinh and tanh are 2 ULP
in libm-test-ulps (they can be 4 ULP in non-nearest rounding modes). cosh is
1 ULP in round to nearest but up to 3 in other rounding modes.

> But we do that everywhere (with -funsafe-math-optimizations or
> -fassociative-math).

Exactly. And 2 ULP is extremely accurate for fast-math transformations - much
better than eg. reassociating additions.

Wilco

Re: Relocation (= move+destroy)

2018-10-23 Thread Jonathan Wakely

CCing gcc-patches

On 19/10/18 07:33 +0200, Marc Glisse wrote:

On Thu, 18 Oct 2018, Marc Glisse wrote:

Uh, why didn't I notice that the function __relocate is unused? I 
guess I'll resend the same patch without __relocate once retesting 
has finished :-( Sorry for all the corrections, I guess I didn't 
check my patch carefully enough before sending it the first time.


2018-10-19  Marc Glisse  

   PR libstdc++/87106
   * include/bits/alloc_traits.h (_S_construct, _S_destroy, construct,
   destroy): Add noexcept specification.
   * include/bits/allocator.h (construct, destroy): Likewise.
   * include/ext/alloc_traits.h (construct, destroy): Likewise.
   * include/ext/malloc_allocator.h (construct, destroy): Likewise.
   * include/ext/new_allocator.h (construct, destroy): Likewise.
* include/bits/stl_uninitialized.h (__relocate_a, __relocate_a_1):
New functions.
   (__is_trivially_relocatable): New class.
   * include/bits/stl_vector.h (__use_relocate): New static member.
   * include/bits/vector.tcc (reserve, _M_realloc_insert,
   _M_default_append): Use __relocate_a.
   (reserve, _M_assign_aux, _M_realloc_insert, _M_fill_insert,
   _M_default_append, _M_range_insert): Move _GLIBCXX_ASAN_ANNOTATE_REINIT
   after _Destroy.
   * testsuite/23_containers/vector/modifiers/push_back/49836.cc:
   Replace CopyConsOnlyType with DelAnyAssign.

--
Marc Glisse


The tricky stuff in  all looks right, I only have
some comments on the __relocate_a functions ...



Index: libstdc++-v3/include/bits/stl_uninitialized.h
===
--- libstdc++-v3/include/bits/stl_uninitialized.h   (revision 265289)
+++ libstdc++-v3/include/bits/stl_uninitialized.h   (working copy)
@@ -872,14 +872,75 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
uninitialized_move_n(_InputIterator __first, _Size __count,
 _ForwardIterator __result)
{
  auto __res = std::__uninitialized_copy_n_pair
(_GLIBCXX_MAKE_MOVE_ITERATOR(__first),
 __count, __result);
  return {__res.first.base(), __res.second};
}
#endif

+#if __cplusplus >= 201402L


What depends on C++14 here? Just enable_if_t? Because we have
__enable_if_t for use in C++11.

Both GCC and Clang will allow constexpr-if and static_assert with no
message in C++11.


+  template
+inline void
+__relocate_a(_Tp* __dest, _Up* __orig, _Allocator& __alloc)


I find it a little surprising that this overload for single objects
uses the memmove argument ordering (dest, source) but the range overload
below uses the STL ordering (source_begin, source_end, dest).

But I wouldn't be surprised if we're already doing that somewhere that
I've forgotten about.

Would it make sense to either rename this overload, or to use
consistent argument ordering for the two __relocate_a overloads?


+noexcept(noexcept(__gnu_cxx::__alloc_traits<_Allocator>::construct(__alloc,


Since this is C++14 (or maybe C++11) you could just use
std::allocator_traits directly. __gnu_cxx::__alloc_traits is to
provide equivalent functionality in C++98 code.


+__dest, std::move(*__orig)))
+&& noexcept(__gnu_cxx::__alloc_traits<_Allocator>::destroy(
+   __alloc, std::__addressof(*__orig
+{
+  typedef __gnu_cxx::__alloc_traits<_Allocator> __traits;
+  __traits::construct(__alloc, __dest, std::move(*__orig));
+  __traits::destroy(__alloc, std::__addressof(*__orig));
+}
+
+  template
+struct __is_trivially_relocatable
+: is_trivial<_Tp> { };


It might be worth adding a comment that this type might be specialized
in future, so that I don't forget and simplify it to an alias template
later :-)


+  template 
+inline std::enable_if_t::value, _Tp*>
+__relocate_a_1(_Tp* __first, _Tp* __last,
+  _Tp* __result, allocator<_Up>& __alloc)
+{
+  ptrdiff_t __count = __last - __first;
+  __builtin_memmove(__result, __first, __count * sizeof(_Tp));
+  return __result + __count;
+}
+
+  template 
+inline _ForwardIterator
+__relocate_a_1(_InputIterator __first, _InputIterator __last,
+  _ForwardIterator __result, _Allocator& __alloc)
+{
+  typedef typename iterator_traits<_InputIterator>::value_type
+   _ValueType;
+  typedef typename iterator_traits<_ForwardIterator>::value_type
+   _ValueType2;
+  static_assert(std::is_same<_ValueType, _ValueType2>::value);
+  static_assert(noexcept(std::__relocate_a(std::addressof(*__result),
+  std::addressof(*__first),
+  __alloc)));
+  _ForwardIterator __cur = __result;
+  for (; __first != __last; ++__first, (void)++__cur)
+   std::__relocate_a(std::__addressof(*__cur),
+ std::__addressof(*__first), __alloc);
+   

Re: [PATCH] Default to an ARM cpu that exists

2018-10-23 Thread Richard Earnshaw (lists)
On 22/10/2018 19:14, co...@sdf.org wrote:
> On Mon, Oct 22, 2018 at 03:56:24PM +0100, Richard Earnshaw (lists) wrote:
>> I think strongarm would be a better choice.  I'm not aware of anyone
>> running NetBSD on Arm8 cpus.
>>
>> Otherwise, this is fine with a suitable ChangeLog entry.
>>
>> R.
> 
> I hope this is OK. Thanks!
> 
> Maya Rashish  
> 
> PR target/86383
> * config.gcc (arm*-*-*): Change default -mcpu to strongarm.
> 
> 
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 720e6a737..23e2e85c8 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -3987,7 +3987,7 @@ case "${target}" in
>   TM_MULTILIB_CONFIG="$with_multilib_list"
>   fi
>   fi
> - target_cpu_cname=${target_cpu_cname:-arm6}
> + target_cpu_cname=${target_cpu_cname:-strongarm}
>   with_cpu=${with_cpu:-$target_cpu_cname}
>   ;;
>  
> 

Thinking about this overnight and discussing with some colleagues, we've
concluded that the best generic default is probably ARM7TDMI, since
StrongArm is now on the deprecated list (note, not obsolete - yet) as it
lacks support for Thumb.  Because of this I've changed the NetBSD
default to StrongARM as that reflects that this target can still support
ARMv4 devices.

So I've committed this patch:

PR target/86383
* config.gcc (arm*-*-netbsdelf*): Default to StrongARM if no CPU
specified to configure.
(arm*-*-*): Use ARM7TDMI as the target CPU if no default provided.

R.
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 720e6a7373d..a2e89e23706 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1134,7 +1134,7 @@ arm*-*-netbsdelf*)
 	tm_file="dbxelf.h elfos.h ${nbsd_tm_file} arm/elf.h arm/aout.h ${tm_file} arm/netbsd-elf.h"
 	extra_options="${extra_options} netbsd.opt netbsd-elf.opt"
 	tmake_file="${tmake_file} arm/t-arm"
-	target_cpu_cname="arm6"
+	target_cpu_cname="strongarm"
 	;;
 arm*-*-linux-*)			# ARM GNU/Linux with ELF
 	tm_file="dbxelf.h elfos.h gnu-user.h linux.h linux-android.h glibc-stdint.h arm/elf.h arm/linux-gas.h arm/linux-elf.h"
@@ -3987,7 +3987,7 @@ case "${target}" in
 TM_MULTILIB_CONFIG="$with_multilib_list"
 			fi
 		fi
-		target_cpu_cname=${target_cpu_cname:-arm6}
+		target_cpu_cname=${target_cpu_cname:-arm7tdmi}
 		with_cpu=${with_cpu:-$target_cpu_cname}
 		;;
 


Re: [PATCH] Switch conversion: support any ax + b transformation (PR tree-optimization/84436).

2018-10-23 Thread Richard Biener
On Tue, Oct 23, 2018 at 10:37 AM Martin Liška  wrote:
>
> On 10/22/18 4:25 PM, Jakub Jelinek wrote:
> > On Mon, Oct 22, 2018 at 04:08:53PM +0200, Martin Liška wrote:
> >> Very valid question. I hope as long as I calculate the linear function
> >> values in wide_int (get via wi::to_wide (switch_element)), then it should
> >> overflow in the same way as original tree type arithmetic. I have a 
> >> test-case with
> >> overflow: gcc/testsuite/gcc.dg/tree-ssa/pr84436-4.c.
> >>
> >> Do you have any {over,under}flowing test-cases that I should add to 
> >> test-suite?
> >
> > I'm worried that the calculation you emit into the code could invoke UB at
> > runtime, even if there was no UB in the original code, and later GCC passes
> > would optimize with the assumption that UB doesn't occur.
> > E.g. if the multiplication overflows for one or more of the valid values in
> > the switch and then the addition adds a negative value so that the end
> > result is actually representable.
>
> In order to address that I verified that neither (a * x) nor (a * x) + b
> {over,under}flows when TYPE_OVERFLOW_UNDEFINED (type) is true.
>
> I hope that is the way to make it properly safe?

Hmm, if the default: case is unreachable maybe.  But I guess Jakub was
suggesting to do the linear function compute in an unsigned type?
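
To spell out the concern at source level: if a * x overflows in a signed type, that is undefined behaviour even when adding b brings the result back into range. Doing the arithmetic in the corresponding unsigned type avoids this. The sketch below uses an invented helper name and plain int; it only illustrates the idea and is not the code the pass emits.

```cpp
#include <cassert>
#include <climits>

// Evaluate a * x + b with wrap-around semantics by computing in unsigned
// int, where overflow is well defined (arithmetic modulo 2^N).
int linear_wrapping(int a, int x, int b) {
  unsigned ua = static_cast<unsigned>(a);
  unsigned ux = static_cast<unsigned>(x);
  unsigned ub = static_cast<unsigned>(b);
  unsigned r = ua * ux + ub;
  // Converting back is modular in C++20; before that it is
  // implementation-defined, and GCC documents it as modular.
  return static_cast<int>(r);
}
```

For example, with a = 2, x = INT_MAX / 2 + 1 and b = INT_MIN, the product alone does not fit in int, yet the final value 0 does; the unsigned computation produces it without any signed overflow along the way.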

+  /* Let's try to find any linear function a.x + y that can apply to

a * x?

+ given values. 'a' can be calculated as follows:

+  tree t = TREE_TYPE (m_index_expr);

so unsigned_type_for (TREE_TYPE ...)

+  tree tmp = make_ssa_name (t);
+  tree value = fold_build2_loc (loc, MULT_EXPR, t,
+   wide_int_to_tree (t, coeff_a),
+   m_index_expr);
+

+  gsi_insert_before (&gsi, gimple_build_assign (tmp, value),
GSI_SAME_STMT);
+  value = fold_build2_loc (loc, PLUS_EXPR, t,
+  tmp, wide_int_to_tree (t, coeff_b));
+  tree tmp2 = make_ssa_name (t);
+  gsi_insert_before (&gsi, gimple_build_assign (tmp2, value),
+GSI_SAME_STMT);
+  load = gimple_build_assign (name, NOP_EXPR, fold_convert (t, tmp2));

before the unsigned_type_for that NOP_EXPR would be always redundant.

Please also use

  gimple_seq seq = NULL;
  tree tmp = gimple_build (&seq, MULT_EXPR, type, ...);
  tree tmp2 = gimple_build (&seq, PLUS_EXPR, type, ...);
  tree tmp3 = gimple_convert (&seq, TREE_TYPE (m_index_expr), tmp2);
  gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT);
  load = gimple_build_assign (name, tmp3);

not sure why you need the extra assignment at the end, not enough
context in the patch.

Richard.


> Martin
>
> >
> >   Jakub
> >
>


Re: [PATCH] combine: Do not combine moves from hard registers

2018-10-23 Thread Christophe Lyon
On Mon, 22 Oct 2018 at 22:17, Segher Boessenkool
 wrote:
>
> On most targets every function starts with moves from the parameter
> passing (hard) registers into pseudos.  Similarly, after every call
> there is a move from the return register into a pseudo.  These moves
> usually combine with later instructions (leaving pretty much the same
> instruction, just with a hard reg instead of a pseudo).
>
> This isn't a good idea.  Register allocation can get rid of unnecessary
> moves just fine, and moving the parameter passing registers into many
> later instructions tends to prevent good register allocation.  This
> patch disallows combining moves from a hard (non-fixed) register.
>
> This also avoids the problem mentioned in PR87600 #c3 (combining hard
> registers into inline assembler is problematic).
>
> Because the register move can often be combined with other instructions
> *itself*, for example for setting some condition code, this patch adds
> extra copies via new pseudos after every copy-from-hard-reg.
>
> On some targets this reduces average code size.  On others it increases
> it a bit, 0.1% or 0.2% or so.  (I tested this on all *-linux targets).
>
> I'll commit this to trunk now.  If there are problems, please don't
> hesitate to let me know!  Thanks.
>

Hi,

I have noticed many regressions on arm and aarch64 between 265366 and
265408 (this commit is 265398).

I bisected at least one to this commit on aarch64:
FAIL: gcc.dg/ira-shrinkwrap-prep-1.c scan-rtl-dump ira "Split
live-range of register"
The same test also regresses on arm.

For a whole picture of all the regressions I noticed during these two
commits, have a look at:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/265408/report-build-info.html

Christophe



>
> Segher
>
>
> 2018-10-22  Segher Boessenkool  
>
> PR rtl-optimization/87600
> * combine.c: Add include of expr.h.
> (cant_combine_insn_p): Do not combine moves from any hard non-fixed
> register to a pseudo.
> (make_more_copies): New function, add a copy to a new pseudo after
> the moves from hard registers into pseudos.
> (rest_of_handle_combine): Declare rebuild_jump_labels_after_combine
> later.  Call make_more_copies.
>
> ---
>  gcc/combine.c | 50 ++
>  1 file changed, 46 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/combine.c b/gcc/combine.c
> index 256b5a4..3ff1760 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -99,6 +99,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "explow.h"
>  #include "insn-attr.h"
>  #include "rtlhooks-def.h"
> +#include "expr.h"
>  #include "params.h"
>  #include "tree-pass.h"
>  #include "valtrack.h"
> @@ -2348,8 +2349,7 @@ cant_combine_insn_p (rtx_insn *insn)
>  dest = SUBREG_REG (dest);
>if (REG_P (src) && REG_P (dest)
>&& ((HARD_REGISTER_P (src)
> -  && ! TEST_HARD_REG_BIT (fixed_reg_set, REGNO (src))
> -  && targetm.class_likely_spilled_p (REGNO_REG_CLASS (REGNO (src
> +  && ! TEST_HARD_REG_BIT (fixed_reg_set, REGNO (src)))
>   || (HARD_REGISTER_P (dest)
>   && ! TEST_HARD_REG_BIT (fixed_reg_set, REGNO (dest))
>   && targetm.class_likely_spilled_p (REGNO_REG_CLASS (REGNO 
> (dest))
> @@ -14969,11 +14969,53 @@ dump_combine_total_stats (FILE *file)
>   total_attempts, total_merges, total_extras, total_successes);
>  }
>
> +/* Make pseudo-to-pseudo copies after every hard-reg-to-pseudo-copy, because
> +   the reg-to-reg copy can usefully combine with later instructions, but we
> +   do not want to combine the hard reg into later instructions, for that
> +   restricts register allocation.  */
> +static void
> +make_more_copies (void)
> +{
> +  basic_block bb;
> +
> +  FOR_EACH_BB_FN (bb, cfun)
> +{
> +  rtx_insn *insn;
> +
> +  FOR_BB_INSNS (bb, insn)
> +{
> +  if (!NONDEBUG_INSN_P (insn))
> +continue;
> +
> + rtx set = single_set (insn);
> + if (!set)
> +   continue;
> + rtx src = SET_SRC (set);
> + rtx dest = SET_DEST (set);
> + if (GET_CODE (src) == SUBREG)
> +   src = SUBREG_REG (src);
> + if (!(REG_P (src) && HARD_REGISTER_P (src)))
> +   continue;
> + if (TEST_HARD_REG_BIT (fixed_reg_set, REGNO (src)))
> +   continue;
> +
> + rtx new_reg = gen_reg_rtx (GET_MODE (dest));
> + rtx_insn *insn1 = gen_move_insn (new_reg, src);
> + rtx_insn *insn2 = gen_move_insn (dest, new_reg);
> + emit_insn_after (insn1, insn);
> + emit_insn_after (insn2, insn1);
> + delete_insn (insn);
> +
> + insn = insn2;
> +   }
> +}
> +}
> +
>  /* Try combining insns through substitution.  */
>  static unsigned int
>  rest_of_handle_combine (void)
>  {
> -  int rebuild_jump_labels_after_combine;
> +  make_more_copies ();
>
>df_set_flags 

Re: Debug unordered containers code cleanup

2018-10-23 Thread Jonathan Wakely

On 22/10/18 22:45 +0200, François Dumont wrote:

I plan to commit the attached patch this week if not told otherwise.


Looks good.


This is to generalize usage of C++11 direct initialization in 
unordered containers.


It also avoids a number of safe iterator instantiations.


Would the following patch also make sense?

--- a/libstdc++-v3/include/debug/safe_unordered_container.h
+++ b/libstdc++-v3/include/debug/safe_unordered_container.h
@@ -66,18 +66,18 @@ namespace __gnu_debug
  void
  _M_invalidate_locals()
  {
-   auto __local_end = _M_cont()._M_base().end(0);
+   auto __local_end = _M_cont()._M_base().cend(0);
   this->_M_invalidate_local_if(
-   [__local_end](__decltype(_M_cont()._M_base().cend(0)) __it)
+   [__local_end](__decltype(__local_end) __it)
   { return __it != __local_end; });
  }

  void
  _M_invalidate_all()
  {
-   auto __end = _M_cont()._M_base().end();
+   auto __end = _M_cont()._M_base().cend();
   this->_M_invalidate_if(
-   [__end](__decltype(_M_cont()._M_base().cend()) __it)
+   [__end](__decltype(__end) __it)
   { return __it != __end; });
   _M_invalidate_locals();
  }
@@ -92,7 +92,7 @@ namespace __gnu_debug

  /** Invalidates all local iterators @c x that reference this container,
 are not singular, and for which @c __pred(x) returns @c
- true. @c __pred will be invoked with the normal ilocal iterators
+ true. @c __pred will be invoked with the normal local iterators
 nested in the safe ones. */
  template
   void
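
For illustration, a minimal stand-alone sketch of the idiom suggested in the patch above: fetch the end iterator once via cend() and spell the lambda parameter type with decltype of that local. The helper name count_before_end and the use of a plain std::unordered_set are assumptions made for the example; this is not the debug-mode code itself.

```cpp
#include <cstddef>
#include <unordered_set>

// Hypothetical helper, not part of libstdc++: counts the iterators that a
// predicate written in the suggested style accepts.
template<typename Container>
std::size_t
count_before_end(const Container& c)
{
  auto end = c.cend();
  // The parameter type is spelled from the captured local, so the
  // container expression is written only once.
  auto pred = [end](decltype(end) it) { return it != end; };

  std::size_t n = 0;
  for (auto it = c.cbegin(); it != end; ++it)
    if (pred(it))
      ++n;
  return n;
}
```

Since the local and the lambda parameter share one type by construction, the two can no longer drift apart the way end(0) and cend(0) could.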




Re: Fix std::byte namespace declaration

2018-10-23 Thread Jonathan Wakely

On 23/10/18 07:07 +0200, François Dumont wrote:

On 10/18/2018 10:34 PM, Jonathan Wakely wrote:

On 18/10/18 22:12 +0200, François Dumont wrote:
Current build of libstdc++ with 
--enable-symvers=gnu-versioned-namespace fails (at least under 
Linux) because of:


In file included from 
/home/fdt/dev/gcc/build_versioned_ns/x86_64-pc-linux-gnu/libstdc++-v3/include/memory_resource:39,
 from 
../../../../../git/libstdc++-v3/src/c++17/memory_resource.cc:25:
/home/fdt/dev/gcc/build_versioned_ns/x86_64-pc-linux-gnu/libstdc++-v3/include/cstddef:71:59: 
error: reference to 'byte' is ambiguous
   71 |   template<> struct __byte_operand { using __type = 
byte; };

  | ^~~~
In file included from 
/home/fdt/dev/gcc/build_versioned_ns/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/stl_algobase.h:61,
 from 
/home/fdt/dev/gcc/build_versioned_ns/x86_64-pc-linux-gnu/libstdc++-v3/include/memory:62,
 from 
/home/fdt/dev/gcc/build_versioned_ns/x86_64-pc-linux-gnu/libstdc++-v3/include/memory_resource:37,
 from 
../../../../../git/libstdc++-v3/src/c++17/memory_resource.cc:25:
/home/fdt/dev/gcc/build_versioned_ns/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/cpp_type_traits.h:395:30: 
note: candidates are: 'enum class std::__8::byte'

  395 |   enum class byte : unsigned char;
  |  ^~~~
In file included from 
/home/fdt/dev/gcc/build_versioned_ns/x86_64-pc-linux-gnu/libstdc++-v3/include/memory_resource:39,
 from 
../../../../../git/libstdc++-v3/src/c++17/memory_resource.cc:25:
/home/fdt/dev/gcc/build_versioned_ns/x86_64-pc-linux-gnu/libstdc++-v3/include/cstddef:68:14: 
note:                 'enum class std::byte'

   68 |   enum class byte : unsigned char {};
  |  ^~~~

I think the issue is that the std::byte declaration in 
cpp_type_traits.h has been done in the versioned namespace, hence the 
attached patch.


I think the definitions in <cstddef> should use the versioned
namespace macros. Then <bits/cpp_type_traits.h> would be correct.

I thought cstddef was some kind of generated file.

I eventually put all its content in versioned namespace.

    * include/c_global/cstddef: Add versioned namespace.

The build is successful with the patch; tests are still to run. OK if they pass?


Yes, OK thanks.
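
As a rough sketch of why putting the definition into the versioned namespace resolves the ambiguity: when the forward declaration and the definition live in the same inline namespace, an unqualified use of the name finds a single entity. The names lib, v8 and byte_operand below are stand-ins chosen for the example (for std, the versioned namespace and __byte_operand); the real headers use the libstdc++ namespace macros instead.

```cpp
#include <type_traits>

namespace lib {                       // stand-in for std
inline namespace v8 {                 // stand-in for the versioned namespace
  enum class byte : unsigned char;    // forward declaration, as in cpp_type_traits.h
}

inline namespace v8 {                 // definition placed in the same namespace
  enum class byte : unsigned char {};
}

template<typename T> struct byte_operand { };
// Unqualified 'byte' now names exactly one entity.
template<> struct byte_operand<byte> { using type = byte; };
}
```

Had the definition been written directly in lib instead, lib::byte and lib::v8::byte would be two different enumerations and the specialization would be rejected as ambiguous, which is the shape of the error quoted above.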




[PATCH] Fix PR87700

2018-10-23 Thread Richard Biener


This fixes a very old bug in the copy-propagation lattice-update
exposed by my SSA propagator changes which happen to introduce
oscillation between two unshared ADDR_EXPRs.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk
and branch.

Richard.

2018-10-23  Richard Biener  

PR tree-optimization/87700
* tree-ssa-copy.c (set_copy_of_val): Fix change detection logic.

* gcc.dg/torture/pr87700.c: New testcase.

Index: gcc/tree-ssa-copy.c
===
--- gcc/tree-ssa-copy.c (revision 265192)
+++ gcc/tree-ssa-copy.c (working copy)
@@ -155,7 +155,7 @@ set_copy_of_val (tree var, tree val)
   copy_of[ver].value = val;
 
   if (old != val
-  || (val && !operand_equal_p (old, val, 0)))
+  && (!old || !operand_equal_p (old, val, 0)))
 return true;
 
   return false;
Index: gcc/testsuite/gcc.dg/torture/pr87700.c
===
--- gcc/testsuite/gcc.dg/torture/pr87700.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr87700.c  (working copy)
@@ -0,0 +1,49 @@
+/* { dg-do compile } */
+
+void
+wn (int ki)
+{
+  int m8 = 0;
+  int *d6 = 
+
+  if (ki == 0)
+{
+ud:
+  for (ki = 0; ki < 1; ++ki)
+   for (m8 = 0; m8 < 1; ++m8)
+ goto ud;
+
+  d6 = 
+
+y8:
+  ++m8;
+
+xw:
+  if (ki == 0)
+   {
+   }
+  else
+   {
+ for (m8 = 0; m8 < 1; ++m8)
+   {
+gt:
+ if (*d6 == 0)
+   goto y8;
+   }
+
+ for (m8 = 0; m8 < 1; ++m8)
+   {
+ goto gt;
+
+ym:
+ ;
+   }
+   }
+
+  d6 = 
+
+  goto ym;
+}
+
+  goto xw;
+}
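
To make the change-detection fix in set_copy_of_val easier to follow, here is a small stand-alone model of the two predicates. Node and same_operand are hypothetical stand-ins for a tree and for operand_equal_p (old, val, 0); only the boolean structure is taken from the patch.

```cpp
// Toy stand-in for a GCC tree: two distinct nodes may still be
// structurally equal (think of two unshared ADDR_EXPRs of one variable).
struct Node { int sym; };

// Hypothetical stand-in for operand_equal_p (a, b, 0).
static bool same_operand(const Node* a, const Node* b)
{
  return a && b && a->sym == b->sym;
}

// Change test as written before the patch: any pointer difference
// counts as a change, even between structurally equal values.
static bool changed_old(const Node* old, const Node* val)
{
  return old != val || (val && !same_operand(old, val));
}

// Change test after the patch: a different pointer only counts as a
// change if there was no old value or the values really differ.
static bool changed_new(const Node* old, const Node* val)
{
  return old != val && (!old || !same_operand(old, val));
}
```

With two separate but equal nodes, changed_old keeps reporting a change on every update, which matches the oscillation described in the mail, while changed_new reports none.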


Re: [[C++ PATCH]] Implement C++2a P0330R2 - Literal Suffixes for ptrdiff_t and size_t

2018-10-23 Thread Florian Weimer
* Ed Smith-Rowland:

> This patch implements C++2a proposal P0330R2 Literal Suffixes for
> ptrdiff_t and size_t*.  It's not official yet but looks very likely to
> pass.  It is incomplete because I'm looking for some opinions. (We
> also might wait 'till it actually passes).
>
> This paper takes the direction of a language change rather than a
> library change through C++11 literal operators.  This was after
> feedback on that paper after a few iterations.
>
> As coded in this patch, integer suffixes involving 'z' are errors in C
> and warnings for C++ <= 17 (in addition to the usual warning about
> implementation suffixes shadowing user-defined ones).

So a plain z would denote ptrdiff_t, and size_t would be zu?  That is
very confusing.

Why is this not consistent with %td, %zd and %zu?  I would have expected
t for ptrdiff_t, zu for size_t, and z for ssize_t.

Thanks,
Florian
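
For reference, a short sketch of the printf length modifiers the comparison above refers to: t pairs with ptrdiff_t and z with size_t. The function name format_sizes is made up for the example; %zd for ssize_t is left out because ssize_t is a POSIX type rather than a standard C++ one.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Formats a ptrdiff_t with %td and a size_t with %zu.
std::string format_sizes(std::ptrdiff_t d, std::size_t n)
{
  char buf[64];
  std::snprintf(buf, sizeof buf, "%td %zu", d, n);
  return buf;
}
```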


Re: [PATCH] Fix PR86144

2018-10-23 Thread Richard Biener
On Tue, 23 Oct 2018, Jakub Jelinek wrote:

> On Tue, Oct 23, 2018 at 10:57:34AM +0200, Richard Biener wrote:
> > +/* Prefer vectorizable_call over vectorizable_simd_clone_call so
> > +   -mveclibabi= takes preference over ibrary functions with
> 
> s/ibrary/l&/

Fixed.

Richard.

2018-10-23  Richard Biener  

* tree-vect-stmts.c (vect_analyze_stmt): Fix typo in comment.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   (revision 265414)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -9534,7 +9534,7 @@ vect_analyze_stmt (stmt_vec_info stmt_in
   && (STMT_VINFO_RELEVANT_P (stmt_info)
  || STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def))
 /* Prefer vectorizable_call over vectorizable_simd_clone_call so
-   -mveclibabi= takes preference over ibrary functions with
+   -mveclibabi= takes preference over library functions with
the simd attribute.  */
 ok = (vectorizable_call (stmt_info, NULL, NULL, node, cost_vec)
  || vectorizable_simd_clone_call (stmt_info, NULL, NULL, node,


Re: [PATCH] Add sinh(tanh(x)) and cosh(tanh(x)) rules

2018-10-23 Thread Richard Biener
On Mon, Oct 22, 2018 at 10:09 PM Jeff Law  wrote:
>
> On 10/20/18 9:47 AM, Giuliano Augusto Faulin Belinassi wrote:
> > So I did some further investigation comparing the ULP error.
> >
> > With the formula that Wilco Dijkstra provided, there are cases where
> > the substitution is super precise.
> > With floats:
> > with input  :  = 9.9940395355224609375000e-01
> > sinh: before:  = 2.89631005859375e+03
> > sinh: after :  = 2.896309326171875000e+03
> > sinh: mpfr  :  = 2.89630924626497842670468162463283783344599446025119e+03
> > ulp err befr:  = 3
> > ulp err aftr:  = 0
> >
> > With doubles:
> > with input  :  = 9.99888977697537484345957636833190917969e-01
> > sinh: before:  = 6.710886400029802322387695312500e+07
> > sinh: after :  = 6.71088632549419403076171875e+07
> > sinh: mpfr  :  = 6.710886344120645523071287770030292885894208e+07
> > ulp err befr:  = 3
> > ulp err aftr:  = 0
> >
> > *However*, there are cases where some error shows up. The biggest ULP
> > error that I could find was 2.
> >
> > With floats:
> > with input  :  = 9.99968349933624267578125000e-01
> > sinh: before:  = 1.2568613433837890625000e+02
> > sinh: after :  = 1.2568614959716796875000e+02
> > sinh: mpfr  :  = 1.25686137592274042266452526368087062890399889097864e+02
> > ulp err befr:  = 0
> > ulp err aftr:  = 2
> >
> > With doubles:
> > with input  :  = 9.999463651256803586875321343541145324707031e-01
> > sinh: before:  = 9.65520209507428342476487159729003906250e+05
> > sinh: after :  = 9.6552020950742810964584350585937500e+05
> > sinh: mpfr  :  = 9.65520209507428288553227922831618987450806468855883e+05
> > ulp err befr:  = 0
> > ulp err aftr:  = 2
> >
> > And with FMA we have the same results shown above. (super precise
> > cases, and maximum ULP error equal 2).
> >
> > So maybe update the patch with the following rules?
> >* If FMA is available, then compute 1 - x*x with it.
> >* If FMA is not available, then do the dijkstra substitution when |x| > 
> > 0.5.
> So I think the runtime math libraries shoot for .5 ULP (yes, they don't
> always make it, but that's their goal).  We should probably have the
> same goal.  Going from 0 to 2 ULPs would be considered bad.

But we do that everywhere (with -funsafe-math-optimizations or
-fassociative-math).

Richard.

> So ideally we'd have some way to distinguish between the cases where we
> actually improve things (such as in your example).  I don't know if
> that's possible.
>
> jeff
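
One way the "ulp err" figures quoted above can be reproduced for floats is to compare the bit patterns of two results. The sketch below assumes both inputs are finite, positive IEEE single-precision values; ulp_distance is a name made up for the example and is not taken from the patch or its test harness.

```cpp
#include <cstdint>
#include <cstring>

// Distance in units in the last place between two finite, positive
// floats.  For such values the IEEE-754 bit patterns are ordered like
// the values themselves, so the integer difference counts the steps.
static std::uint32_t ulp_distance(float a, float b)
{
  std::uint32_t ia, ib;
  std::memcpy(&ia, &a, sizeof ia);
  std::memcpy(&ib, &b, sizeof ib);
  return ia > ib ? ia - ib : ib - ia;
}
```

For the first float case in the quoted mail, the "before" value 2896.31005859375 and the "after" value 2896.309326171875 are three representable floats apart, which agrees with the reported errors of 3 and 0 against the mpfr reference.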


Re: [RFC] GCC support for live-patching

2018-10-23 Thread Miroslav Benes
On Mon, 22 Oct 2018, Qing Zhao wrote:

> Hi, 
> 
> thanks for the comments.
> 
> > 
> > thanks for the proposal. The others have already expressed some of my 
> > worries and remarks, but I think it would be only right to write them 
> > again. Especially since I am part of the team responsible for 
> > implementation and maintenance of live patches here at SUSE, we use kGraft 
> > and we prepare everything manually (compared to kpatch and ksplice).
> 
> One question here: what’s the major benefit of preparing the patches manually? 

I could almost quote what you wrote below. It is a C file, easy to review 
and maintain. You have everything "under control". It allows implementing 
tricky hacks easily by hand if needed.
 
> >> 1. A study of Kernel live patching schemes.
> >> 
> >> Three major kernel live patching tools:  https://lwn.net/Articles/734765/
> >> 
> >> * ksplice:   http://www.ksplice.com/doc/ksplice.pdf
> >> * kpatch:https://lwn.net/Articles/597123/
> >>https://github.com/dynup/kpatch
> >> * kGraft:
> >> https://pdfs.semanticscholar.org/presentation/af4c/895aa3fef0cc2b501317aaec9d91ba2d704c.pdf
> >> 
> >> In the above, ksplice and kpatch can automatically generate binary patches 
> >> as following:
> >> 
> >>   * a collection of tools which convert a source diff patch to a patch
> >> module. They work by compiling the kernel both with and without the source
> >> patch, comparing the binaries, and generating a binary patch module which 
> >> includes new binary versions of the functions to be replaced.
> >> 
> >> on the other hand, kGraft offers a way to create patches entirely by hand. 
> >> The source of the patch is a single C file, easy to review, easy to
> >> maintain. 
> >> 
> >> In addition to kGraft, there are other live patching tools that prefer
> >> creating patches by hand for the similar reason. 
> >> 
> >> The compiler support is mainly for the above live patching tools that 
> >> create 
> >> patches entirely by hand. the major purpose is:
> >> 
> >> * control patch code size and debug complexity;
> >> * keep good run time performance;
> >> 
> >> 2. the major problems of compiler in live patching:
> >> 
> >> For the live patching schemes that create patches by hand, when patching 
> >> one function, there might be a list of functions that will be impacted by 
> >> this patched function due to compiler optimization/analyses (mainly IPA
> >> optimization/analyses), a complete patch will include the patched function
> >> and all impacted functions. Usually, there are two major factors to be
> >> considered in such live patching schemes:
> >> 
> >> * patch code size, one major factor is the length of the list 
> >> of impacted functions;
> >> * run time performance.
> >> 
> >> If we want to control the patch code size, to make the list of impacted 
> >> functions minimum, we have to disable corresponding compiler optimizations 
> >> as much as possible.
> > 
> > Andi already talked about it and I, too, do not understand your worry 
> > about patch code size. First, it has never been so bad here. Yes, 
> > sometimes the function closure gets bigger due to optimizations and 
> > inlining. I've considered it as nothing else than a lack of better 
> > tooling, because it is indeed something which could be improved a lot. 
> > Nicolai (CCed) works on a potential solution. It is also one of the topics 
> > at LPC miniconf in Vancouver.
> > 
> > Second, the idea to disable inlining would not fly at SUSE. I can't 
> > imagine to even propose it. The kernel heavily relies on the feature. The 
> > optimizations are a different story and some of those certainly could be 
> > disabled with no harm caused.
> > 
> > So let me ask, what is your motivation behind this? Is there a real 
> > problem you're trying to solve? It may have been mentioned somewhere and I 
> > missed it.
> 
> the major functionality we want is to enable only static inlining for live 
> patching for one of our internal customers. The major purpose is to control 
> the patch code size explosion and debugging complexity due to too much 
> inlining of global functions for the specific application.

I hoped for more details, but ok.
 
> therefore, I proposed multiple levels of control for -flive-patching to 
> satisfy multiple requests from different users. 
> 
> So far, from the feedback, I see that among the 4 levels of control,   none, 
> only-inline-static, inline,
> and inline-clone,   “none” and “inline” are NOT needed at all.
> 
> however,  -flive-patching = [only-inline-static | inline-clone] are necessary.
> 
> > 
> >> On the other hand, in order to keep good run time performance, we need to 
> >> keep the compiler optimization as much as possible. 
> >> 
> >> So, there should be some tradeoff between these two factors. 
> >> 
> >> The following are two major categories of compiler optimizations 
> >> we should consider:
> >> 
> >> A. compiler optimizations/analyses that extract ipa info 

Re: [PATCH] Fix PR86144

2018-10-23 Thread Jakub Jelinek
On Tue, Oct 23, 2018 at 10:57:34AM +0200, Richard Biener wrote:
> +/* Prefer vectorizable_call over vectorizable_simd_clone_call so
> +   -mveclibabi= takes preference over ibrary functions with

s/ibrary/l&/

Jakub


Re: [PATCH, GCC/ARM, ping2] Fix PR87374: ICE with -mslow-flash-data and -mword-relocations

2018-10-23 Thread Thomas Preudhomme
Ping?

Best regards,

Thomas

On Mon, 15 Oct 2018 at 16:01, Thomas Preudhomme
 wrote:
>
> Ping?
>
> Best regards,
>
> Thomas
> On Fri, 5 Oct 2018 at 17:50, Thomas Preudhomme
>  wrote:
> >
> > Hi Ramana and Kyrill,
> >
> > I've reworked the patch to add some documentation of the option
> > conflict and reworked the -mword-relocation logic slightly to set the
> > variable explicitly in PIC mode rather than test for PIC and word
> > relocation everywhere.
> >
> > ChangeLog entries are now as follows:
> >
> > *** gcc/ChangeLog ***
> >
> > 2018-10-02  Thomas Preud'homme  
> >
> > PR target/87374
> > * config/arm/arm.c (arm_option_check_internal): Disable the combined
> > use of -mslow-flash-data and -mword-relocations.
> > (arm_option_override): Enable -mword-relocations if -fpic or -fPIC.
> > * config/arm/arm.md (SYMBOL_REF MOVT splitter): Stop checking for
> > flag_pic.
> > * doc/invoke.texi (-mword-relocations): Mention conflict with
> > -mslow-flash-data.
> > (-mslow-flash-data): Reciprocally.
> >
> > *** gcc/testsuite/ChangeLog ***
> >
> > 2018-09-25  Thomas Preud'homme  
> >
> > PR target/87374
> > * gcc.target/arm/movdi_movt.c: Skip if both -mslow-flash-data and
> > -mword-relocations would be passed when compiling the test.
> > * gcc.target/arm/movsi_movt.c: Likewise.
> > * gcc.target/arm/pr81863.c: Likewise.
> > * gcc.target/arm/thumb2-slow-flash-data-1.c: Likewise.
> > * gcc.target/arm/thumb2-slow-flash-data-2.c: Likewise.
> > * gcc.target/arm/thumb2-slow-flash-data-3.c: Likewise.
> > * gcc.target/arm/thumb2-slow-flash-data-4.c: Likewise.
> > * gcc.target/arm/thumb2-slow-flash-data-5.c: Likewise.
> > * gcc.target/arm/tls-disable-literal-pool.c: Likewise.
> >
> > Is this ok for trunk?
> >
> > Best regards,
> >
> > Thomas
> >
> > On Tue, 2 Oct 2018 at 13:39, Ramana Radhakrishnan
> >  wrote:
> > >
> > > On 02/10/2018 11:42, Thomas Preudhomme wrote:
> > > > Hi Ramana,
> > > >
> > > > On Thu, 27 Sep 2018 at 11:14, Ramana Radhakrishnan
> > > >  wrote:
> > > >>
> > > >> On 27/09/2018 09:26, Kyrill Tkachov wrote:
> > > >>> Hi Thomas,
> > > >>>
> > > >>> On 26/09/18 18:39, Thomas Preudhomme wrote:
> > >  Hi,
> > > 
> > >  GCC ICEs under -mslow-flash-data and -mword-relocations because there
> > >  is no way to load an address, both literal pools and MOVW/MOVT being
> > >  forbidden. This patch gives an error message when both options are
> > >  specified by the user and adds the according dg-skip-if directives 
> > >  for
> > >  tests that use either of these options.
> > > 
> > >  ChangeLog entries are as follows:
> > > 
> > >  *** gcc/ChangeLog ***
> > > 
> > >  2018-09-25  Thomas Preud'homme  
> > > 
> > > PR target/87374
> > > * config/arm/arm.c (arm_option_check_internal): Disable the 
> > >  combined
> > > use of -mslow-flash-data and -mword-relocations.
> > > 
> > >  *** gcc/testsuite/ChangeLog ***
> > > 
> > >  2018-09-25  Thomas Preud'homme  
> > > 
> > > PR target/87374
> > > * gcc.target/arm/movdi_movt.c: Skip if both -mslow-flash-data 
> > >  and
> > > -mword-relocations would be passed when compiling the test.
> > > * gcc.target/arm/movsi_movt.c: Likewise.
> > > * gcc.target/arm/pr81863.c: Likewise.
> > > * gcc.target/arm/thumb2-slow-flash-data-1.c: Likewise.
> > > * gcc.target/arm/thumb2-slow-flash-data-2.c: Likewise.
> > > * gcc.target/arm/thumb2-slow-flash-data-3.c: Likewise.
> > > * gcc.target/arm/thumb2-slow-flash-data-4.c: Likewise.
> > > * gcc.target/arm/thumb2-slow-flash-data-5.c: Likewise.
> > > * gcc.target/arm/tls-disable-literal-pool.c: Likewise.
> > > 
> > > 
> > >  Testing: Bootstrapped in Thumb-2 mode. No testsuite regression when
> > >  targeting arm-none-eabi. Modified tests get skipped as expected when
> > >  running the testsuite with -mslow-flash-data (pr81863.c) or
> > >  -mword-relocations (all the others).
> > > 
> > > 
> > >  Is this ok for trunk? I'd also appreciate guidance on whether this is
> > >  worth a backport. It's a simple patch but on the other hand it only
> > >  prevents some option combination, it does not fix anything so I have
> > >  mixed feelings.
> > > >>>
> > > >>> In my opinion -mslow-flash-data is more of a tuning option rather 
> > > >>> than a security/ABI feature
> > > >>> and therefore erroring out on its combination with -mword-relocations 
> > > >>> feels odd.
> > > >>> I'm leaning more towards making -mword-relocations or any other 
> > > >>> option that really requires constant pools
> > > >>> to bypass/disable the effects of -mslow-flash-data instead.
> > > >>
> > > >> -mslow-flash-data and -mword-relocations are contradictory in their
> > > >> expectations. 

Re: [PATCH] Come up with --param asan-stack-small-redzone (PR sanitizer/81715).

2018-10-23 Thread Martin Liška
PING^2

On 10/9/18 10:29 AM, Martin Liška wrote:
> PING^1
> 
> On 9/26/18 11:33 AM, Martin Liška wrote:
>> On 9/25/18 5:53 PM, Jakub Jelinek wrote:
>>> On Tue, Sep 25, 2018 at 05:26:44PM +0200, Martin Liška wrote:
>>>> The only missing piece is how to implement asan_emit_redzone_payload in a
>>>> smarter way. It means doing memory stores with 8,4,2,1 sizes in order to
>>>> reduce the number of insns. Do we have similar code somewhere?
>>>
>>> Yeah, that is a very important optimization.  I wasn't using DImode because
>>> at least on x86_64 64-bit constants are quite expensive and on several other
>>> targets even more so, so SImode was a compromise to get size of the prologue
>>> under control and not very slow.  What I think we want is figure out ranges
>>
>> Ah, I remember you mentioned some time ago that 64-bit constants are
>> expensive (even on x86_64). Btw. it's what clang used for the red zone instrumentation.
>>
>>> of shadow bytes we want to initialize and the values we want to store there,
>>> perhaps take also into account strict alignment vs. non-strict alignment,
>>> and perform kind of store merging for it.  Given that 2 shadow bytes would
>>> be only used for the very small variables (<=4 bytes in size, so <= 0.5
>>> bytes of shadow), we'd just need a way to remember the 2 shadow bytes across
>>> handling adjacent vars and store it together.
>>
>> Agree, it's implemented in next version of patch.
>>
>>>
>>> I think we want to introduce some define for minimum red zone size and use
>>> it instead of the granularity (granularity is 8 bytes, but minimum red zone
>>> size if we count into it also the very small variable size is 16 bytes).
>>>
 --- a/gcc/asan.h
 +++ b/gcc/asan.h
 @@ -102,6 +102,26 @@ asan_red_zone_size (unsigned int size)
return c ? 2 * ASAN_RED_ZONE_SIZE - c : ASAN_RED_ZONE_SIZE;
  }
  
 +/* Return how much a stack variable occupy on a stack
 +   including a space for redzone.  */
 +
 +static inline unsigned int
 +asan_var_and_redzone_size (unsigned int size)
>>>
>>> The argument needs to be UHWI, otherwise you do a wrong thing for
>>> say 4GB + 4 bytes long variable.  Ditto the result.
>>>
 +{
 +  if (size <= 4)
 +return 16;
 +  else if (size <= 16)
 +return 32;
 +  else if (size <= 128)
 +return 32 + size;
 +  else if (size <= 512)
 +return 64 + size;
 +  else if (size <= 4096)
 +return 128 + size;
 +  else
 +return 256 + size;
>>>
>>> I'd prefer size + const instead of const + size operand order.
>>>
 @@ -1125,13 +1125,13 @@ expand_stack_vars (bool (*pred) (size_t), struct 
 stack_vars_data *data)
  && stack_vars[i].size.is_constant ())
{
  prev_offset = align_base (prev_offset,
 -  MAX (alignb, ASAN_RED_ZONE_SIZE),
 +  MAX (alignb, ASAN_SHADOW_GRANULARITY),
>>>
>>> Use that ASAN_MIN_RED_ZONE_SIZE (16) here.
>>>
!FRAME_GROWS_DOWNWARD);
  tree repr_decl = NULL_TREE;
 +poly_uint64 size =  asan_var_and_redzone_size 
 (stack_vars[i].size.to_constant ());
>>>
>>> Too long line.  Two spaces instead of one.  Why poly_uint64?
>>> Plus, perhaps if data->asan_vec is empty (i.e. when assigning the topmost
>>> automatic variable in a frame), we should ensure that size is at least
>>> 2 * ASAN_RED_ZONE_SIZE (or just 1 * ASAN_RED_ZONE_SIZE). 
>>>
  offset
 -  = alloc_stack_frame_space (stack_vars[i].size
 - + ASAN_RED_ZONE_SIZE,
 - MAX (alignb, ASAN_RED_ZONE_SIZE));
 +  = alloc_stack_frame_space (size,
 + MAX (alignb, 
 ASAN_SHADOW_GRANULARITY));
>>>
>>> Again, too long line and we want 16 instead of 8 here too.
  
  data->asan_vec.safe_push (prev_offset);
  /* Allocating a constant amount of space from a constant
 @@ -2254,7 +2254,7 @@ expand_used_vars (void)
 & ~(data.asan_alignb - HOST_WIDE_INT_1)) - sz;
  /* Allocating a constant amount of space from a constant
 starting offset must give a constant result.  */
 -offset = (alloc_stack_frame_space (redzonesz, ASAN_RED_ZONE_SIZE)
 +offset = (alloc_stack_frame_space (redzonesz, ASAN_SHADOW_GRANULARITY)
>>>
>>> and here too.
>>>
>>> Jakub
>>>
>>
>> The rest is also implemented as requested. I'm testing the Linux kernel now
>> and will send stats to the PR created for it.
>>
>> Patch survives testing on x86_64-linux-gnu.
>>
>> Martin
>>
> 
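
As a sketch of what the two review comments on asan_var_and_redzone_size amount to (a wide argument and result, and the size + const operand order), the quoted draft could be reshaped as below. std::uint64_t stands in for GCC's unsigned HOST_WIDE_INT so that the example is self-contained, and the thresholds are copied from the draft in the quoted mail; this is an illustration, not the committed version.

```cpp
#include <cstdint>

// Space a stack variable of SIZE bytes occupies including its red zone.
// A 64-bit argument and result keep a variable of 4GB + 4 bytes from
// being truncated, which a 32-bit unsigned int would do.
static inline std::uint64_t
var_and_redzone_size(std::uint64_t size)
{
  if (size <= 4)
    return 16;
  else if (size <= 16)
    return 32;
  else if (size <= 128)
    return size + 32;
  else if (size <= 512)
    return size + 64;
  else if (size <= 4096)
    return size + 128;
  else
    return size + 256;
}
```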



Re: [PATCH] Fix setting of hotness in non-LTO mode (PR gcov-profile/77698).

2018-10-23 Thread Martin Liška
PING^1

On 10/9/18 2:37 PM, Martin Liška wrote:
> Hi.
> 
> In non-LTO mode, we should not set hotness according to the computed histogram
> in ipa-profile. The following patch does that and fixes the test case isolated
> from the PR.
> 
> Patch survives regression tests on x86_64-linux-gnu.
> Ready for trunk?
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
> 2018-10-09  Martin Liska  
> 
>   PR gcov-profile/77698
>   * ipa-profile.c (ipa_profile): Adjust hotness threshold
>   only in LTO mode.
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-10-09  Martin Liska  
> 
>   PR gcov-profile/77698
>   * gcc.dg/tree-prof/pr77698.c: New test.
> ---
>  gcc/ipa-profile.c|  5 ++---
>  gcc/testsuite/gcc.dg/tree-prof/pr77698.c | 23 +++
>  2 files changed, 25 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-prof/pr77698.c
> 
> 



[PATCH] Make __PRETTY_FUNCTION__-like functions mergeable string csts (PR c++/64266).

2018-10-23 Thread Martin Liška
Hi.

I've returned to this long-lasting issue after quite some time. Thanks to Honza,
I hope I can now address the root cause, which was that a string constant was
output when debug info was emitted. The problematic situation happened with the
following back-trace:

#0  mergeable_string_section (decl=, align=64, 
flags=0) at /home/marxin/Programming/gcc/gcc/varasm.c:808
#1  0x01779bf3 in default_elf_select_section (decl=, reloc=0, align=64) at 
/home/marxin/Programming/gcc/gcc/varasm.c:6739
#2  0x0176efb6 in get_constant_section (exp=, align=64) at /home/marxin/Programming/gcc/gcc/varasm.c:3302
#3  0x0176f468 in build_constant_desc (exp=) 
at /home/marxin/Programming/gcc/gcc/varasm.c:3371
#4  0x0176f81c in output_constant_def (exp=, 
defer=1) at /home/marxin/Programming/gcc/gcc/varasm.c:3434
#5  0x0176d406 in decode_addr_const (exp=, 
value=0x7fffc540) at /home/marxin/Programming/gcc/gcc/varasm.c:2951
#6  0x0176d93f in const_hash_1 (exp=) at 
/home/marxin/Programming/gcc/gcc/varasm.c:3054
#7  0x0176fdc2 in lookup_constant_def (exp=) 
at /home/marxin/Programming/gcc/gcc/varasm.c:3557
#8  0x00dd5778 in cst_pool_loc_descr (loc=) 
at /home/marxin/Programming/gcc/gcc/dwarf2out.c:17288

That was in a situation where we emit debug info of a function that has an
inlined __PRETTY_FUNCTION__ from a different function. As seen, the constant is
output due to the const_hash_1 function call. A proper fix would be to not emit
these string constants for the purpose of the hash function.

However, I still see some minor ICEs; they are probably related to decay_conversion
in cp_fname_init:

1) ./xg++ -B. 
/home/marxin/Programming/gcc/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-__func__2.C

/home/marxin/Programming/gcc/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-__func__2.C:6:17:
 internal compiler error: Segmentation fault
6 | [] { return __func__; }();
  | ^~~~
0x1344568 crash_signal
/home/marxin/Programming/gcc/gcc/toplev.c:325
0x76bc310f ???

/usr/src/debug/glibc-2.27-6.1.x86_64/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0
0x9db134 is_capture_proxy(tree_node*)
/home/marxin/Programming/gcc/gcc/cp/lambda.c:261
0xaeecb7 tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool)
/home/marxin/Programming/gcc/gcc/cp/pt.c:16700
0xaee5fd tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool)
/home/marxin/Programming/gcc/gcc/cp/pt.c:16636
0xaf0ffb tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool)
/home/marxin/Programming/gcc/gcc/cp/pt.c:16942

where
(gdb) p debug_tree(decl)
 
unit-size 
align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 
0x769b5498 precision:8 min  max 
pointer_to_this >
unsigned DI
size 
unit-size 
align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 
0x769b5540>
readonly used tree_2 unsigned DI 
/home/marxin/Programming/gcc/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-__func__2.C:6:17
 size  unit-size 
align:64 warn_if_not_align:0 context 
value-expr 
constant
arg:0 
constant
arg:0 
constant "operator()\000"

and
#0  0x009db134 in is_capture_proxy (decl=) at 
/home/marxin/Programming/gcc/gcc/cp/lambda.c:261

2 ) ./xg++ -B. /home/marxin/Programming/gcc/gcc/testsuite/g++.dg/ext/pretty2.C 
-c
/home/marxin/Programming/gcc/gcc/testsuite/g++.dg/ext/pretty2.C:16:24: internal 
compiler error: Segmentation fault
16 | __PRETTY_FUNCTION__), 0))
   |^
/home/marxin/Programming/gcc/gcc/testsuite/g++.dg/ext/pretty2.C:36:10: note: in 
expansion of macro ‘assert’
36 | int a = (assert (foo ()), 1);
   |  ^~
0x1344568 crash_signal
/home/marxin/Programming/gcc/gcc/toplev.c:325
0x76bc310f ???

/usr/src/debug/glibc-2.27-6.1.x86_64/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0
0x9db270 is_capture_proxy(tree_node*)
/home/marxin/Programming/gcc/gcc/cp/lambda.c:265
0x9dbad9 is_normal_capture_proxy(tree_node*)
/home/marxin/Programming/gcc/gcc/cp/lambda.c:274
0x9c1fe6 mark_use(tree_node*, bool, bool, unsigned int, bool)
/home/marxin/Programming/gcc/gcc/cp/expr.c:114
0x89d9ab convert_like_real
/home/marxin/Programming/gcc/gcc/cp/call.c:6905

where:

(gdb) p debug_tree(decl)
 
unit-size 
align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 
0x769b4498 precision:8 min  max 
pointer_to_this >
unsigned type_6 DI
size 
unit-size 
align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 
0x769b4540>
readonly used tree_1 tree_2 unsigned read decl_6 DI 
/home/marxin/Programming/gcc/gcc/testsuite/g++.dg/ext/pretty2.C:36:10 size 
 unit-size 
align:64 warn_if_not_align:0
value-expr 
constant
arg:0 

[PATCH] Fix PR86144

2018-10-23 Thread Richard Biener


In this PR it was requested that -mveclibabi=svml takes precedence
over simd annotations of  which is a reasonable expectation.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2018-10-23  Richard Biener  

PR tree-optimization/86144
* tree-vect-stmts.c (vect_analyze_stmt): Prefer -mveclibabi
over simd attribute.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   (revision 265411)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -9533,14 +9533,18 @@ vect_analyze_stmt (stmt_vec_info stmt_in
   if (!bb_vinfo
   && (STMT_VINFO_RELEVANT_P (stmt_info)
  || STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def))
-ok = (vectorizable_simd_clone_call (stmt_info, NULL, NULL, node, cost_vec)
+/* Prefer vectorizable_call over vectorizable_simd_clone_call so
+   -mveclibabi= takes preference over library functions with
+   the simd attribute.  */
+ok = (vectorizable_call (stmt_info, NULL, NULL, node, cost_vec)
+ || vectorizable_simd_clone_call (stmt_info, NULL, NULL, node,
+  cost_vec)
  || vectorizable_conversion (stmt_info, NULL, NULL, node, cost_vec)
  || vectorizable_shift (stmt_info, NULL, NULL, node, cost_vec)
  || vectorizable_operation (stmt_info, NULL, NULL, node, cost_vec)
  || vectorizable_assignment (stmt_info, NULL, NULL, node, cost_vec)
  || vectorizable_load (stmt_info, NULL, NULL, node, node_instance,
cost_vec)
- || vectorizable_call (stmt_info, NULL, NULL, node, cost_vec)
  || vectorizable_store (stmt_info, NULL, NULL, node, cost_vec)
  || vectorizable_reduction (stmt_info, NULL, NULL, node,
 node_instance, cost_vec)
@@ -9552,8 +9556,9 @@ vect_analyze_stmt (stmt_vec_info stmt_in
   else
 {
   if (bb_vinfo)
-   ok = (vectorizable_simd_clone_call (stmt_info, NULL, NULL, node,
-   cost_vec)
+   ok = (vectorizable_call (stmt_info, NULL, NULL, node, cost_vec)
+ || vectorizable_simd_clone_call (stmt_info, NULL, NULL, node,
+  cost_vec)
  || vectorizable_conversion (stmt_info, NULL, NULL, node,
  cost_vec)
  || vectorizable_shift (stmt_info, NULL, NULL, node, cost_vec)
@@ -9562,7 +9567,6 @@ vect_analyze_stmt (stmt_vec_info stmt_in
  cost_vec)
  || vectorizable_load (stmt_info, NULL, NULL, node, node_instance,
cost_vec)
- || vectorizable_call (stmt_info, NULL, NULL, node, cost_vec)
  || vectorizable_store (stmt_info, NULL, NULL, node, cost_vec)
  || vectorizable_condition (stmt_info, NULL, NULL, NULL, 0, node,
 cost_vec)
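
The effect of the reordering comes from the short-circuit evaluation of the `||` chain: the first analyzer that accepts the statement wins, and the later ones are never consulted. A minimal sketch of that behaviour follows. The analyzers here are hypothetical stand-ins that take a simple flag; they are not the real vectorizable_* functions or their signatures.

```c
#include <stdbool.h>

enum analyzer { ANALYZER_NONE, ANALYZER_CALL, ANALYZER_SIMD_CLONE };

/* Records which hypothetical analyzer accepted the statement.  */
static enum analyzer claimed_by;

/* Stand-in for the plain call analyzer (the -mveclibabi= path).  */
static bool try_call (bool veclib_match)
{
  if (!veclib_match)
    return false;
  claimed_by = ANALYZER_CALL;
  return true;
}

/* Stand-in for the simd-clone analyzer.  */
static bool try_simd_clone (bool simd_clone_match)
{
  if (!simd_clone_match)
    return false;
  claimed_by = ANALYZER_SIMD_CLONE;
  return true;
}

/* Mirrors the patched order: the call analyzer runs first, so the
   simd-clone analyzer is reached only when the first one declines.  */
enum analyzer analyze_stmt_sketch (bool veclib_match, bool simd_clone_match)
{
  claimed_by = ANALYZER_NONE;
  bool ok = (try_call (veclib_match)
             || try_simd_clone (simd_clone_match));
  return ok ? claimed_by : ANALYZER_NONE;
}
```

When both analyzers would accept the statement, the sketch reports ANALYZER_CALL, which corresponds to the precedence requested in the PR.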


Re: [PATCH] Fix a couple of avx512* intrinsic prototypes (PR target/87674)

2018-10-23 Thread Uros Bizjak
On Tue, Oct 23, 2018 at 10:35 AM Jakub Jelinek  wrote:
>
> Hi!
>
> For all these, the instructions use just 8 bits from the mask register, and
> the ICC prototypes as well as the online Intel intrinsics documentation
> confirm that too.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> Not sure if we need to backport it, this isn't a wrong-code issue, the
> important mask bits aren't lost in any way.
>
> 2018-10-23  Jakub Jelinek  
>
> PR target/87674
> * config/i386/avx512vlintrin.h (_mm_mask_mullo_epi32): Change type of
> second argument from __mmask16 to __mmask8.
> * config/i386/avx512vlbwintrin.h (_mm_mask_packus_epi32,
> _mm_mask_packs_epi32): Likewise.
> * config/i386/avx512pfintrin.h (_mm512_mask_prefetch_i64scatter_ps):
> Likewise.
> (_mm512_mask_prefetch_i64scatter_pd): Likewise.  Formatting fix.

OK.

IMO, the patch is also safe for backports.

Thanks,
Uros.

> --- gcc/config/i386/avx512vlintrin.h.jj 2018-10-22 09:28:21.843398728 +0200
> +++ gcc/config/i386/avx512vlintrin.h2018-10-22 09:52:39.432092006 +0200
> @@ -9095,7 +9095,7 @@ _mm_maskz_mullo_epi32 (__mmask8 __M, __m
>
>  extern __inline __m128i
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> -_mm_mask_mullo_epi32 (__m128i __W, __mmask16 __M, __m128i __A,
> +_mm_mask_mullo_epi32 (__m128i __W, __mmask8 __M, __m128i __A,
>   __m128i __B)
>  {
>return (__m128i) __builtin_ia32_pmulld128_mask ((__v4si) __A,
> --- gcc/config/i386/avx512vlbwintrin.h.jj   2018-07-11 22:55:44.663456512 
> +0200
> +++ gcc/config/i386/avx512vlbwintrin.h  2018-10-22 09:55:24.784333238 +0200
> @@ -4346,7 +4346,7 @@ _mm_maskz_packus_epi32 (__mmask8 __M, __
>
>  extern __inline __m128i
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> -_mm_mask_packus_epi32 (__m128i __W, __mmask16 __M, __m128i __A,
> +_mm_mask_packus_epi32 (__m128i __W, __mmask8 __M, __m128i __A,
>__m128i __B)
>  {
>return (__m128i) __builtin_ia32_packusdw128_mask ((__v4si) __A,
> @@ -4389,7 +4389,7 @@ _mm_maskz_packs_epi32 (__mmask8 __M, __m
>
>  extern __inline __m128i
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> -_mm_mask_packs_epi32 (__m128i __W, __mmask16 __M, __m128i __A,
> +_mm_mask_packs_epi32 (__m128i __W, __mmask8 __M, __m128i __A,
>   __m128i __B)
>  {
>return (__m128i) __builtin_ia32_packssdw128_mask ((__v4si) __A,
> --- gcc/config/i386/avx512pfintrin.h.jj 2018-01-03 10:20:06.095535707 +0100
> +++ gcc/config/i386/avx512pfintrin.h2018-10-22 09:49:52.647874664 +0200
> @@ -174,16 +174,16 @@ _mm512_prefetch_i64scatter_ps (void *__a
>
>  extern __inline void
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> -_mm512_mask_prefetch_i64scatter_pd (void *__addr, __mmask16 __mask,
> +_mm512_mask_prefetch_i64scatter_pd (void *__addr, __mmask8 __mask,
> __m512i __index, int __scale, int __hint)
>  {
>__builtin_ia32_scatterpfqpd (__mask, (__v8di) __index, __addr, __scale,
> - __hint);
> +  __hint);
>  }
>
>  extern __inline void
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> -_mm512_mask_prefetch_i64scatter_ps (void *__addr, __mmask16 __mask,
> +_mm512_mask_prefetch_i64scatter_ps (void *__addr, __mmask8 __mask,
> __m512i __index, int __scale, int __hint)
>  {
>__builtin_ia32_scatterpfqps (__mask, (__v8di) __index, __addr, __scale,
>
> Jakub


[PATCH] Fix g++.dg/cpp2a/lambda-this3.C (Re: PATCH to enable testing C++17 by default)

2018-10-23 Thread Jakub Jelinek
On Wed, Oct 17, 2018 at 03:31:43PM -0400, Marek Polacek wrote:
> As discussed in  it
> seems to be high time we turned on testing C++17 by default.
> 
> The only interesting part is at the very end; otherwise most of the changes are
> just using { target c++17 } instead of explicit dg-options.  Removing
> dg-options has the effect that DEFAULT_CXXFLAGS comes into play, so I've removed
> a bunch of stray semicolons to fix -Wpedantic errors.
> 
> I wonder if we also want to enable 2a, but the overhead could be too much.  Or
> use 2a instead of 17?
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> 2018-10-17  Marek Polacek  
> 
>   * g++.dg/*.C: Use target c++17 instead of explicit dg-options.
>   * lib/g++-dg.exp: Don't test C++11 by default.  Add C++17 to
>   the list of default stds to test.

> diff --git gcc/testsuite/g++.dg/cpp2a/lambda-this3.C 
> gcc/testsuite/g++.dg/cpp2a/lambda-this3.C
> index 5e5c8b3d50f..d1738ea7d17 100644
> --- gcc/testsuite/g++.dg/cpp2a/lambda-this3.C
> +++ gcc/testsuite/g++.dg/cpp2a/lambda-this3.C
> @@ -1,6 +1,6 @@
>  // P0806R2
> -// { dg-do compile }
> -// { dg-options "-std=c++17" }
> +// { dg-do compile { target c++17 } }
> +// { dg-options "" }
>  
>  struct X {
>int x;

This test now fails with -std=gnu++2a:
/.../gcc/gcc/testsuite/g++.dg/cpp2a/lambda-this3.C: In lambda function:
/.../gcc/gcc/testsuite/g++.dg/cpp2a/lambda-this3.C:8:15: warning: implicit 
capture of 'this' via '[=]' is deprecated in C++20 [-Wdeprecated]
/.../gcc/gcc/testsuite/g++.dg/cpp2a/lambda-this3.C:8:15: note: add explicit 
'this' or '*this' capture
/.../gcc/gcc/testsuite/g++.dg/cpp2a/lambda-this3.C: In lambda function:
/.../gcc/gcc/testsuite/g++.dg/cpp2a/lambda-this3.C:16:15: warning: implicit 
capture of 'this' via '[=]' is deprecated in C++20 [-Wdeprecated]
/.../gcc/gcc/testsuite/g++.dg/cpp2a/lambda-this3.C:16:15: note: add explicit 
'this' or '*this' capture
/.../gcc/gcc/testsuite/g++.dg/cpp2a/lambda-this3.C:17:16: warning: implicit 
capture of 'this' via '[=]' is deprecated in C++20 [-Wdeprecated]
/.../gcc/gcc/testsuite/g++.dg/cpp2a/lambda-this3.C:17:16: note: add explicit 
'this' or '*this' capture
/.../gcc/gcc/testsuite/g++.dg/cpp2a/lambda-this3.C:18:13: warning: implicit 
capture of 'this' via '[=]' is deprecated in C++20 [-Wdeprecated]
/.../gcc/gcc/testsuite/g++.dg/cpp2a/lambda-this3.C:18:13: note: add explicit 
'this' or '*this' capture
FAIL: g++.dg/cpp2a/lambda-this3.C  -std=gnu++2a  (test for bogus messages, line 
8)
FAIL: g++.dg/cpp2a/lambda-this3.C  -std=gnu++2a  (test for bogus messages, line 
16)
FAIL: g++.dg/cpp2a/lambda-this3.C  -std=gnu++2a  (test for bogus messages, line 
17)
FAIL: g++.dg/cpp2a/lambda-this3.C  -std=gnu++2a  (test for bogus messages, line 
18)
PASS: g++.dg/cpp2a/lambda-this3.C  -std=gnu++2a (test for excess errors)

The following patch fixes this, tested on x86_64-linux with
make check-c++-all RUNTESTFLAGS=dg.exp=lambda-this3.C
Ok for trunk?

2018-10-23  Jakub Jelinek  

* g++.dg/cpp2a/lambda-this3.C: Limit dg-bogus directives to c++17_down 
only.
Add expected warnings and messages for c++2a.

--- gcc/testsuite/g++.dg/cpp2a/lambda-this3.C.jj2018-10-22 
09:28:06.807650016 +0200
+++ gcc/testsuite/g++.dg/cpp2a/lambda-this3.C   2018-10-23 10:48:13.992577673 
+0200
@@ -5,7 +5,9 @@
 struct X {
   int x;
   void foo (int n) {
-auto a1 = [=] { x = n; }; // { dg-bogus "implicit capture" }
+auto a1 = [=] { x = n; }; // { dg-bogus "implicit capture" "" { target 
c++17_down } }
+ // { dg-warning "implicit capture of 'this' via 
'\\\[=\\\]' is deprecated" "" { target c++2a } .-1 }
+ // { dg-message "add explicit 'this' or 
'\\\*this' capture" "" { target c++2a } .-2 }
 auto a2 = [=, this] { x = n; };
 // { dg-warning "explicit by-copy capture" "" { target c++17_down } .-1 }
 auto a3 = [=, *this]() mutable { x = n; };
@@ -13,9 +15,15 @@ struct X {
 auto a5 = [&, this] { x = n; };
 auto a6 = [&, *this]() mutable { x = n; };
 
-auto a7 = [=] { // { dg-bogus "implicit capture" }
-  auto a = [=] { // { dg-bogus "implicit capture" }
-auto a2 = [=] { x = n; }; // { dg-bogus "implicit capture" }
+auto a7 = [=] { // { dg-bogus "implicit capture" "" { target c++17_down } }
+   // { dg-warning "implicit capture of 'this' via '\\\[=\\\]' 
is deprecated" "" { target c++2a } .-1 }
+   // { dg-message "add explicit 'this' or '\\\*this' capture" 
"" { target c++2a } .-2 }
+  auto a = [=] { // { dg-bogus "implicit capture" "" { target c++17_down } 
}
+// { dg-warning "implicit capture of 'this' via 
'\\\[=\\\]' is deprecated" "" { target c++2a } .-1 }
+// { dg-message "add explicit 'this' or '\\\*this' 
capture" "" { target c++2a } .-2 }
+auto a2 = [=] { x = n; }; // { dg-bogus "implicit capture" "" { 

[PATCH] Fix PR87693

2018-10-23 Thread Richard Biener


Bootstrapped / tested on x86_64-unknown-linux-gnu, applied.

Richard.

2018-10-23  Richard Biener  

PR tree-optimization/87693
* tree-ssa-threadedge.c (thread_around_empty_blocks): Handle
the case we do not find the taken edge.

* gcc.dg/torture/pr87693.c: New testcase.

diff --git a/gcc/testsuite/gcc.dg/torture/pr87693.c 
b/gcc/testsuite/gcc.dg/torture/pr87693.c
new file mode 100644
index 000..802560dd347
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr87693.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+
+void f (void);
+void g (void);
+void h (int a)
+{
+  void *p, **q;
+  if (a)
+p = (void *)f;
+  else
+p = (void *)g;
+  q = (void *)p;
+  if (*q == (void *)0)
+goto *p;
+L0:
+  return;
+}
diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index 0b1f9733fdd..330ba153e37 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -981,7 +981,8 @@ thread_around_empty_blocks (edge taken_edge,
   else
taken_edge = find_taken_edge (bb, cond);
 
-  if ((taken_edge->flags & EDGE_DFS_BACK) != 0)
+  if (!taken_edge
+ || (taken_edge->flags & EDGE_DFS_BACK) != 0)
return false;
 
   if (bitmap_bit_p (visited, taken_edge->dest->index))
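
The shape of the fix is a plain NULL guard placed ahead of the first dereference. A minimal sketch of the pattern is below; the struct, the flag value and the lookup function are hypothetical stand-ins for illustration, not GCC's edge type or find_taken_edge.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_EDGE_DFS_BACK 0x1   /* hypothetical flag value for this sketch */

struct toy_edge
{
  unsigned flags;
  int dest_index;
};

/* Stand-in for a lookup that may fail: it returns NULL when no taken
   edge can be determined.  Here it simply passes the candidate through.  */
static struct toy_edge *toy_find_taken_edge (struct toy_edge *candidate)
{
  return candidate;
}

/* Returns true only when a taken edge exists and is not a back edge.
   Testing the pointer first prevents the NULL dereference.  */
bool toy_can_thread (struct toy_edge *candidate)
{
  struct toy_edge *taken = toy_find_taken_edge (candidate);

  if (!taken || (taken->flags & TOY_EDGE_DFS_BACK) != 0)
    return false;
  return true;
}
```

Because `||` evaluates left to right and stops at the first true operand, the flags test is never reached when the pointer is NULL.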



Re: [PATCH] Switch conversion: support any ax + b transformation (PR tree-optimization/84436).

2018-10-23 Thread Martin Liška
On 10/22/18 4:25 PM, Jakub Jelinek wrote:
> On Mon, Oct 22, 2018 at 04:08:53PM +0200, Martin Liška wrote:
>> Very valid question. I hope as long as I calculate the linear function
>> values in wide_int (get via wi::to_wide (switch_element)), then it should
>> overflow in the same way as original tree type arithmetic. I have a 
>> test-case with
>> overflow: gcc/testsuite/gcc.dg/tree-ssa/pr84436-4.c.
>>
>> Do you have any {over,under}flowing test-cases that I should add to 
>> test-suite?
> 
> I'm worried that the calculation you emit into the code could invoke UB at
> runtime, even if there was no UB in the original code, and later GCC passes
> would optimize with the assumption that UB doesn't occur.
> E.g. if the multiplication overflows for one or more of the valid values in
> the switch and then the addition adds a negative value so that the end
> result is actually representable.

In order to address that, I now check that neither (a * x) nor (a * x) + b
{over,under}flows when TYPE_OVERFLOW_UNDEFINED (type) is true.

I hope that is the proper way to make it safe?
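
To make the concern concrete: with a = 2, x = 2^30 and b = -1 the final value 2^31 - 1 fits in a 32-bit int, but the intermediate product 2^31 does not. A minimal sketch of a two-step check is below. It uses the GCC/Clang overflow builtins on plain int and is only an illustration of the idea; it is not the wide_int code from the patch, and both function names are hypothetical.

```c
#include <stdbool.h>

/* Returns true when both a * x and (a * x) + b are representable in int,
   i.e. evaluating the linear function step by step in signed int
   arithmetic would not overflow at either step.  */
bool linear_fits_int (int a, int x, int b)
{
  int prod, sum;

  if (__builtin_mul_overflow (a, x, &prod))
    return false;
  if (__builtin_add_overflow (prod, b, &sum))
    return false;
  return true;
}

/* Checks every case value in [low, high].  The loop counter is wider
   than int so that high == INT_MAX does not overflow the increment.  */
bool linear_safe_for_range (int a, int b, int low, int high)
{
  for (long long x = low; x <= high; x++)
    if (!linear_fits_int (a, (int) x, b))
      return false;
  return true;
}
```

For the pr84436-1.c values (a = 100, b = 5, cases 2 to 6) the check passes, while it rejects the a = 2, x = 2^30, b = -1 combination described above.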

Martin

> 
>   Jakub
> 

From 19a9315c9defe2416ff3c44d1b1b2206eac56b10 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 11 Oct 2018 12:37:37 +0200
Subject: [PATCH] Switch conversion: support any ax + b transformation (PR
 tree-optimization/84436).

gcc/ChangeLog:

2018-10-11  Martin Liska  

	PR tree-optimization/84436
	* tree-switch-conversion.c (switch_conversion::contains_same_values_p):
	Remove.
	(switch_conversion::contains_linear_function_p): New.
	(switch_conversion::build_one_array): Support linear
	transformation on input.
	* tree-switch-conversion.h (struct switch_conversion): Add
	contains_linear_function_p declaration.

gcc/testsuite/ChangeLog:

2018-10-11  Martin Liska  

	PR tree-optimization/84436
	* gcc.dg/tree-ssa/pr84436-1.c: New test.
	* gcc.dg/tree-ssa/pr84436-2.c: New test.
	* gcc.dg/tree-ssa/pr84436-3.c: New test.
	* gcc.dg/tree-ssa/pr84436-4.c: New test.
	* gcc.dg/tree-ssa/pr84436-5.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr84436-1.c |  36 
 gcc/testsuite/gcc.dg/tree-ssa/pr84436-2.c |  67 ++
 gcc/testsuite/gcc.dg/tree-ssa/pr84436-3.c |  24 +
 gcc/testsuite/gcc.dg/tree-ssa/pr84436-4.c |  38 
 gcc/testsuite/gcc.dg/tree-ssa/pr84436-5.c |  37 
 gcc/testsuite/gcc.dg/tree-ssa/pr84436-6.c |  38 
 gcc/tree-switch-conversion.c  | 101 ++
 gcc/tree-switch-conversion.h  |  10 ++-
 8 files changed, 332 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr84436-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr84436-2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr84436-3.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr84436-4.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr84436-5.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr84436-6.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr84436-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr84436-1.c
new file mode 100644
index 000..6e739704f8a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr84436-1.c
@@ -0,0 +1,36 @@
+/* PR tree-optimization/84436 */
+/* { dg-options "-O2 -fdump-tree-switchconv -fdump-tree-optimized" } */
+/* { dg-do run } */
+
+int
+__attribute__ ((noipa))
+foo (int how)
+{
+  switch (how) {
+case 2: how = 205; break; /* how = 100 * index + 5 */
+case 3: how = 305; break;
+case 4: how = 405; break;
+case 5: how = 505; break;
+case 6: how = 605; break;
+  }
+  return how;
+}
+
+int main()
+{
+  if (foo (2) != 205)
+  __builtin_abort ();
+
+  if (foo (6) != 605)
+  __builtin_abort ();
+
+  if (foo (123) != 123)
+  __builtin_abort ();
+
+  return 0;
+}
+
+
+/* { dg-final { scan-tree-dump-times "how.*\\* 100" 1 "switchconv" } } */
+/* { dg-final { scan-tree-dump-times "how.* = .* \\+ 5" 1 "switchconv" } } */
+/* { dg-final { scan-tree-dump-not "switch" "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr84436-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr84436-2.c
new file mode 100644
index 000..c34027a08b9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr84436-2.c
@@ -0,0 +1,67 @@
+/* PR tree-optimization/84436 */
+/* { dg-options "-O2 -fdump-tree-switchconv -fdump-tree-optimized" } */
+
+char
+lowerit(char a)
+{
+  switch (a)
+{
+default:
+  return a;
+case 'A':
+  return 'a';
+case 'B':
+  return 'b';
+case 'C':
+  return 'c';
+case 'D':
+  return 'd';
+case 'E':
+  return 'e';
+case 'F':
+  return 'f';
+case 'G':
+  return 'g';
+case 'H':
+  return 'h';
+case 'I':
+  return 'i';
+case 'J':
+  return 'j';
+case 'K':
+  return 'k';
+case 'L':
+  return 'l';
+case 'M':
+  return 'm';
+case 'N':
+  return 'n';
+case 'O':
+  return 'o';
+case 'P':
+  return 'p';
+case 'Q':
+  return 'q';
+case 'R':
+  return 'r';
+  

[PATCH] Fix a couple of avx512* intrinsic prototypes (PR target/87674)

2018-10-23 Thread Jakub Jelinek
Hi!

For all these, the instructions use just 8 bits from the mask register, and
the ICC prototypes as well as the online Intel intrinsics documentation
confirm that too.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Not sure if we need to backport it, this isn't a wrong-code issue, the
important mask bits aren't lost in any way.
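
Why no mask bits are lost can be modelled without AVX-512 hardware. A 128-bit vector holds four 32-bit lanes, so a masked operation on it reads only the low four mask bits, and converting a 16-bit mask to 8 bits cannot change the outcome. The sketch below is a scalar model written for illustration; the function names are hypothetical and it does not call the real intrinsic.

```c
#include <stdint.h>

/* Scalar model of a masked 4 x 32-bit multiply: lane i takes a[i] * b[i]
   when bit i of the mask is set and keeps w[i] otherwise.  */
void model_mask_mullo_epi32 (int32_t dst[4], const int32_t w[4],
                             uint8_t mask,
                             const int32_t a[4], const int32_t b[4])
{
  for (int i = 0; i < 4; i++)
    {
      /* Multiply as unsigned to keep the low 32 bits without signed
         overflow.  */
      uint32_t prod = (uint32_t) a[i] * (uint32_t) b[i];
      dst[i] = ((mask >> i) & 1) ? (int32_t) prod : w[i];
    }
}

/* Returns 1 when a 16-bit mask and its 8-bit truncation select the same
   lanes.  Only bits 0..3 are ever read, so this holds for every mask.  */
int narrowing_is_harmless (uint16_t mask16)
{
  const int32_t w[4] = { -1, -2, -3, -4 };
  const int32_t a[4] = { 1, 2, 3, 4 };
  const int32_t b[4] = { 10, 20, 30, 40 };
  int32_t wide[4], narrow[4];

  /* Reference result: keep only the bits a 4-lane operation can see.  */
  model_mask_mullo_epi32 (wide, w, (uint8_t) (mask16 & 0xF), a, b);
  /* Result after the caller's mask is converted to an 8-bit mask.  */
  model_mask_mullo_epi32 (narrow, w, (uint8_t) mask16, a, b);

  for (int i = 0; i < 4; i++)
    if (wide[i] != narrow[i])
      return 0;
  return 1;
}
```

The same argument applies to the operations with eight lanes, where all eight bits of the narrower mask type are used.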

2018-10-23  Jakub Jelinek  

PR target/87674
* config/i386/avx512vlintrin.h (_mm_mask_mullo_epi32): Change type of
second argument from __mmask16 to __mmask8.
* config/i386/avx512vlbwintrin.h (_mm_mask_packus_epi32,
_mm_mask_packs_epi32): Likewise.
* config/i386/avx512pfintrin.h (_mm512_mask_prefetch_i64scatter_ps):
Likewise.
(_mm512_mask_prefetch_i64scatter_pd): Likewise.  Formatting fix.

--- gcc/config/i386/avx512vlintrin.h.jj 2018-10-22 09:28:21.843398728 +0200
+++ gcc/config/i386/avx512vlintrin.h2018-10-22 09:52:39.432092006 +0200
@@ -9095,7 +9095,7 @@ _mm_maskz_mullo_epi32 (__mmask8 __M, __m
 
 extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_mullo_epi32 (__m128i __W, __mmask16 __M, __m128i __A,
+_mm_mask_mullo_epi32 (__m128i __W, __mmask8 __M, __m128i __A,
  __m128i __B)
 {
   return (__m128i) __builtin_ia32_pmulld128_mask ((__v4si) __A,
--- gcc/config/i386/avx512vlbwintrin.h.jj   2018-07-11 22:55:44.663456512 
+0200
+++ gcc/config/i386/avx512vlbwintrin.h  2018-10-22 09:55:24.784333238 +0200
@@ -4346,7 +4346,7 @@ _mm_maskz_packus_epi32 (__mmask8 __M, __
 
 extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_packus_epi32 (__m128i __W, __mmask16 __M, __m128i __A,
+_mm_mask_packus_epi32 (__m128i __W, __mmask8 __M, __m128i __A,
   __m128i __B)
 {
   return (__m128i) __builtin_ia32_packusdw128_mask ((__v4si) __A,
@@ -4389,7 +4389,7 @@ _mm_maskz_packs_epi32 (__mmask8 __M, __m
 
 extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_packs_epi32 (__m128i __W, __mmask16 __M, __m128i __A,
+_mm_mask_packs_epi32 (__m128i __W, __mmask8 __M, __m128i __A,
  __m128i __B)
 {
   return (__m128i) __builtin_ia32_packssdw128_mask ((__v4si) __A,
--- gcc/config/i386/avx512pfintrin.h.jj 2018-01-03 10:20:06.095535707 +0100
+++ gcc/config/i386/avx512pfintrin.h2018-10-22 09:49:52.647874664 +0200
@@ -174,16 +174,16 @@ _mm512_prefetch_i64scatter_ps (void *__a
 
 extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_prefetch_i64scatter_pd (void *__addr, __mmask16 __mask,
+_mm512_mask_prefetch_i64scatter_pd (void *__addr, __mmask8 __mask,
__m512i __index, int __scale, int __hint)
 {
   __builtin_ia32_scatterpfqpd (__mask, (__v8di) __index, __addr, __scale,
- __hint);
+  __hint);
 }
 
 extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_prefetch_i64scatter_ps (void *__addr, __mmask16 __mask,
+_mm512_mask_prefetch_i64scatter_ps (void *__addr, __mmask8 __mask,
__m512i __index, int __scale, int __hint)
 {
   __builtin_ia32_scatterpfqps (__mask, (__v8di) __index, __addr, __scale,

Jakub


Re: [PATCHv2] Handle not explicitly zero terminated strings in merge sections

2018-10-23 Thread Eric Botcazou
> I know: a patch to fix this is almost ready, just needs a final round of
> testing.

OK, thanks for the information.

-- 
Eric Botcazou


Re: [PATCH] Fix some EVRP stupidness

2018-10-23 Thread Richard Biener
On Tue, 23 Oct 2018, Aldy Hernandez wrote:

> 
> > + if (tem.kind () == old_vr->kind ()
> > + && tem.min () == old_vr->min ()
> > + && tem.max () == old_vr->max ())
> > +   continue;
> 
> I think it would be cleaner to use tem.ignore_equivs_equal_p (*old_vr). The
> goal was to use == when the equivalence bitmap should be taken into account,
> or ignore_equivs_equal_p() otherwise.

Ah, didn't know of that function (and yes, I wanted to ignore equivs).

Will try to remember together with the dump thing David noticed.

Richard.

> (Unless you really really don't want to compare the extremes with
> vrp_operand_equal_p.)
> 
> Aldy


Re: [PATCH] Fix some EVRP stupidness

2018-10-23 Thread Aldy Hernandez




+ if (tem.kind () == old_vr->kind ()
+ && tem.min () == old_vr->min ()
+ && tem.max () == old_vr->max ())
+   continue;


I think it would be cleaner to use tem.ignore_equivs_equal_p (*old_vr). 
The goal was to use == when the equivalence bitmap should be taken into 
account, or ignore_equivs_equal_p() otherwise.


(Unless you really really don't want to compare the extremes with 
vrp_operand_equal_p.)
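
The distinction between the two comparisons can be sketched with a toy type. Everything below is hypothetical and written in plain C for illustration: the struct, its fields and both function names are assumptions and do not reflect the real value_range class.

```c
#include <stdbool.h>

/* Toy range: a kind tag, two bounds, and a bitmask standing in for the
   equivalence bitmap.  */
struct toy_range
{
  int kind;
  long min;
  long max;
  unsigned equiv_bits;
};

/* Compares kind and bounds only, leaving the equivalences out.  */
bool toy_equal_ignoring_equivs (const struct toy_range *a,
                                const struct toy_range *b)
{
  return a->kind == b->kind && a->min == b->min && a->max == b->max;
}

/* Full comparison: kind, bounds and equivalences must all match.  */
bool toy_equal (const struct toy_range *a, const struct toy_range *b)
{
  return toy_equal_ignoring_equivs (a, b)
         && a->equiv_bits == b->equiv_bits;
}
```

Two ranges with identical bounds but different equivalences compare equal under the first function and unequal under the second, which is the behaviour wanted when equivalences should not matter.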


Aldy


Re: [PATCHv2] Handle not explicitly zero terminated strings in merge sections

2018-10-23 Thread Rainer Orth
Hi Eric,

>> it's not disabled (I had to disable it when testing a /bin/as version
>> with full SHF_MERGE/SHF_STRINGS support recently), so I suspect the
>> latter.  In S11.4 .rodata and .rodata.str1.8 are merged, with the
>> alignment of the larger of the two on the output section.
>
> OK.  The regressions are still there on Solaris 11.3 for me.

I know: a patch to fix this is almost ready, just needs a final round of
testing.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCHv2] Handle not explicitly zero terminated strings in merge sections

2018-10-23 Thread Eric Botcazou
> it's not disabled (I had to disable it when testing a /bin/as version
> with full SHF_MERGE/SHF_STRINGS support recently), so I suspect the
> latter.  In S11.4 .rodata and .rodata.str1.8 are merged, with the
> alignment of the larger of the two on the output section.

OK.  The regressions are still there on Solaris 11.3 for me.

-- 
Eric Botcazou


Re: [patch, fortran] Implement FINDLOC

2018-10-23 Thread Bernhard Reutner-Fischer
On Mon, 22 Oct 2018 at 23:01, Thomas Koenig  wrote:

> Anyway, the attached patch fixes this, plus the print *, instead
> of test for return values, plus the whitespace issues mentioned
> by Bernhard. Patch gzipped this time to let it go through to
> gcc-patches.

Thanks. The few remaining issues are:

$ ./contrib/check_GNU_style.py /tmp/p15.diff
=== ERROR type #1: blocks of 8 spaces should be replaced with tabs (1
error(s)) ===
gcc/fortran/simplify.c:5667:17:  dim_index -= 1;   /*
zero-base index */

=== ERROR type #2: dot, space, space, end of comment (1 error(s)) ===
gcc/fortran/simplify.c:5667:50:  dim_index -= 1;   /*
zero-base index */

=== ERROR type #3: dot, space, space, new sentence (3 error(s)) ===
gcc/fortran/check.c:3363:30:/* Check function for findloc.█Mostly like
gfc_check_minloc_maxloc
gcc/fortran/simplify.c:5604:32:/* Simplify findloc to an array.█Similar to
gcc/fortran/simplify.c:5627:27: linked-list traversal.█Masked
elements are set to NULL.  */

=== ERROR type #4: lines should not exceed 80 characters (196 error(s)) ===
gcc/fortran/check.c:159:80:  gfc_error ("%qs argument of %qs
intrinsic at %L must be of intrinsic type",
gcc/fortran/intrinsic.c:728:80:add_sym_6fl (const char *name,
gfc_isym_id id, enum klass cl, int actual_ok, bt type,
gcc/fortran/simplify.c:5674:80:  tmpstride[i] = (i == 0) ? 1 :
tmpstride[i-1] * mpz_get_si (array->shape[i-1]);

=== ERROR type #6: trailing operator (1 error(s)) ===
gcc/fortran/iresolve.c:1873:25:  f->value.function.name =

(this wants ...function.name\n= gfc_get_string (... )

=== ERROR type #7: trailing whitespace (2 error(s)) ===
gcc/fortran/check.c:3390:0:███
gcc/fortran/simplify.c:5794:10:  else█

TIA,

