Re: [PATCH] Make strlen range computations more conservative

2018-07-24 Thread Jeff Law
On 07/24/2018 05:18 PM, Bernd Edlinger wrote:
> On 07/24/18 23:46, Jeff Law wrote:
>> I'd like to ask we hold on this until I return from PTO (Aug 1) so that
>> we can discuss the best thing to do here for each class of change.
>>
> 
> Okay.
> 
>> I think you, Martin, Richi and myself should hash through the technical
>> issues raised by the patch.  Obviously others can chime in, but I think
>> the 4 of us probably need to drive the discussion.
>>
> 
> Yes, sure.  I will try to help when I can.
Thanks for understanding.  I'll be back on Aug 1, slogging my way
through the week's worth of patches...


Jeff


Re: [2/5] C-SKY port: Backend implementation

2018-07-24 Thread Jeff Law
On 07/24/2018 06:17 PM, Sandra Loosemore wrote:
> On 07/24/2018 03:24 PM, Jeff Law wrote:
>>>
 Any thoughts on using the newer function descriptor bits rather than
 old
 style stack trampolines?
>>>
>>> Has that been committed?  I vaguely remembered discussion of a new way
>>> to handle nested functions without using the trampoline interface, but I
>>> couldn't find any documentation in the internals manual.
>> It did.  See TARGET_CUSTOM_FUNCTION_DESCRIPTORS and the (relatively few)
>> ports that define it.
> 
> Hmmm, I completely failed to make that connection from the docs -- the
> whole description of that hook is pretty gibberishy and I thought it was
> something for targets where the ABI already specifies some "standard
> calling sequence" using descriptors (C-SKY doesn't), rather than a
> generic alternative to executable trampolines.  Putting on my doc
> maintainer hat briefly, I can see this needs a lot of work.  :-(
Most likely :-)  So many things to do, so little time.


> 
> Anyway, is this required for new ports nowadays?  If so, I at least know
> what to search for now.  At this point I couldn't say whether this would
> do anything to fix the situation on ck801 targets where there simply
> aren't enough spare registers available to the trampoline to both hold
> the static link and do an indirect jump.
It's not required, but preferred, particularly if the part has an MMU
that can provide no-execute protections on pages in memory.  If the
target doesn't have an mmu, then it's of marginal value.

The key advantage it has over the old trampoline implementation is that
stacks can remain non-executable, even for Ada and nested functions.
That's a big win from a security standpoint.

Jeff


[PATCH] treat -Wxxx-larger-than=HWI_MAX special (PR 86631)

2018-07-24 Thread Martin Sebor

The very large option argument enhancement committed last week
inadvertently introduced an assumption about the LP64 data model
that makes the -Wxxx-larger-than options have a different effect
at their default documented setting of PTRDIFF_MAX between ILP32
and LP64.  As a result, the options are treated as suppressed in
ILP32 while triggering the expected warnings for allocations or
sizes in excess of the limit in ILP64.

To remove this I considered making it possible to use symbolic
constants like PTRDIFF_MAX as the option arguments so that
then defaults in the .opt files could be set to that.  Sadly,
options in GCC are processed before the values of constants
like PTRDIFF_MAX for the target are known, and deferring
the handling of just the -Wxxx-larger-than= options until
the constants have been computed would be too involved to
make it worth the effort.

To keep things simple I decided to have the code that handles
each of the affected options treat the HOST_WIDE_INT_MAX default
as a special request to substitute the value of PTRDIFF_MAX at
the time.

The attached patch implements this.

Tested on x86_64-linux with -m32/-m64.

Martin
PR middle-end/86631 - missing -Walloc-size-larger-than on ILP32 hosts

gcc/ChangeLog:

	PR middle-end/86631
	* calls.c (alloc_max_size): Treat HOST_WIDE_INT special.
	* gimple-ssa-warn-alloca.c (adjusted_warn_limit): New function.
	(pass_walloca::gate): Use it.
	(alloca_call_type): Same.
	(pass_walloca::execute): Same.
	* stor-layout.c (layout_decl): Treat HOST_WIDE_INT special.

gcc/testsuite/ChangeLog:

	PR middle-end/86631
	* g++.dg/Walloca1.C: Adjust.

Index: gcc/calls.c
===
--- gcc/calls.c	(revision 262951)
+++ gcc/calls.c	(working copy)
@@ -1222,9 +1222,12 @@ alloc_max_size (void)
   if (alloc_object_size_limit)
 return alloc_object_size_limit;
 
-  alloc_object_size_limit
-= build_int_cst (size_type_node, warn_alloc_size_limit);
+  HOST_WIDE_INT limit = warn_alloc_size_limit;
+  if (limit == HOST_WIDE_INT_MAX)
+limit = tree_to_shwi (TYPE_MAX_VALUE (ptrdiff_type_node));
 
+  alloc_object_size_limit = build_int_cst (size_type_node, limit);
+
   return alloc_object_size_limit;
 }
 
Index: gcc/gimple-ssa-warn-alloca.c
===
--- gcc/gimple-ssa-warn-alloca.c	(revision 262951)
+++ gcc/gimple-ssa-warn-alloca.c	(working copy)
@@ -38,6 +38,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgloop.h"
 #include "intl.h"
 
+static unsigned HOST_WIDE_INT adjusted_warn_limit (bool);
+
 const pass_data pass_data_walloca = {
   GIMPLE_PASS,
   "walloca",
@@ -82,7 +84,9 @@ pass_walloca::gate (function *fun ATTRIBUTE_UNUSED
   // Warning is disabled when its size limit is greater than PTRDIFF_MAX
   // for the target maximum, which makes the limit negative since when
   // represented in signed HOST_WIDE_INT.
-  return warn_alloca_limit >= 0 || warn_vla_limit >= 0;
+  unsigned HOST_WIDE_INT max = tree_to_uhwi (TYPE_MAX_VALUE (ptrdiff_type_node));
+  return (adjusted_warn_limit (false) <= max
+	  || adjusted_warn_limit (true) <= max);
 }
 
 // Possible problematic uses of alloca.
@@ -127,6 +131,30 @@ struct alloca_type_and_limit {
   alloca_type_and_limit (enum alloca_type type) : type(type) { }
 };
 
+/* Return the value of the argument N to -Walloca-larger-than= or
+   -Wvla-larger-than= adjusted for the target data model so that
+   when N == HOST_WIDE_INT_MAX, the adjusted value is set to
+   PTRDIFF_MAX on the target.  This is done to prevent warnings
+   for unknown/unbounded allocations in the "permissive mode"
+   while still diagnosing excessive and necessarily invalid
+   allocations.  */
+
+static unsigned HOST_WIDE_INT
+adjusted_warn_limit (bool idx)
+{
+  static HOST_WIDE_INT limits[2];
+  if (limits[idx])
+return limits[idx];
+
+  limits[idx] = idx ? warn_vla_limit : warn_alloca_limit;
+  if (limits[idx] != HOST_WIDE_INT_MAX)
+return limits[idx];
+
+  limits[idx] = tree_to_shwi (TYPE_MAX_VALUE (ptrdiff_type_node));
+  return limits[idx];
+}
+
+
 // NOTE: When we get better range info, this entire function becomes
 // irrelevant, as it should be possible to get range info for an SSA
 // name at any point in the program.
@@ -309,11 +337,7 @@ alloca_call_type (gimple *stmt, bool is_vla, tree
 
   // Adjust warn_alloca_max_size for VLAs, by taking the underlying
   // type into account.
-  unsigned HOST_WIDE_INT max_size;
-  if (is_vla)
-max_size = warn_vla_limit;
-  else
-max_size = warn_alloca_limit;
+  unsigned HOST_WIDE_INT max_size = adjusted_warn_limit (is_vla);
 
   // Check for the obviously bounded case.
   if (TREE_CODE (len) == INTEGER_CST)
@@ -510,6 +534,8 @@ pass_walloca::execute (function *fun)
 	  struct alloca_type_and_limit t
 	= alloca_call_type (stmt, is_vla, _casted_type);
 
+	  unsigned HOST_WIDE_INT adjusted_alloca_limit
+	= adjusted_warn_limit (false);
 	  // Even if we think 

Re: [2/5] C-SKY port: Backend implementation

2018-07-24 Thread Sandra Loosemore

On 07/24/2018 03:24 PM, Jeff Law wrote:



Any thoughts on using the newer function descriptor bits rather than old
style stack trampolines?


Has that been committed?  I vaguely remembered discussion of a new way
to handle nested functions without using the trampoline interface, but I
couldn't find any documentation in the internals manual.

It did.  See TARGET_CUSTOM_FUNCTION_DESCRIPTORS and the (relatively few)
ports that define it.


Hmmm, I completely failed to make that connection from the docs -- the 
whole description of that hook is pretty gibberishy and I thought it was 
something for targets where the ABI already specifies some "standard 
calling sequence" using descriptors (C-SKY doesn't), rather than a 
generic alternative to executable trampolines.  Putting on my doc 
maintainer hat briefly, I can see this needs a lot of work.  :-(


Anyway, is this required for new ports nowadays?  If so, I at least know 
what to search for now.  At this point I couldn't say whether this would 
do anything to fix the situation on ck801 targets where there simply 
aren't enough spare registers available to the trampoline to both hold 
the static link and do an indirect jump.


-Sandra


Re: [PATCH] Make strlen range computations more conservative

2018-07-24 Thread Bernd Edlinger
On 07/24/18 23:46, Jeff Law wrote:
> On 07/24/2018 01:59 AM, Bernd Edlinger wrote:
>> Hi!
>>
>> This patch makes strlen range computations more conservative.
>>
>> Firstly if there is a visible type cast from type A to B before passing
>> then value to strlen, don't expect the type layout of B to restrict the
>> possible return value range of strlen.
> Why do you think this is the right thing to do?  ie, is there language
> in the standards that makes you think the code as it stands today is
> incorrect from a conformance standpoint?  Is there a significant body of
> code that is affected in an adverse way by the current code?  If so,
> what code?
> 
> 

I think if you have an object, of an effective type A say char[100], then
you can cast the address of A to B, say typedef char (*B)[2] for instance
and then to const char *, say for use in strlen.  I may be wrong, but I think
that we should at least try not to pick up char[2] from B, but instead
use A for strlen ranges, or leave this range open.  Currently the range
info for strlen is [0..1] in this case, even if we see the type cast
in the generic tree.

One other example I have found in one of the test cases:

char c;

if (strlen() != 0) abort();

this is now completely elided, but why?  Is there a code base where
that is used?  I doubt, but why do we care to eliminate something
stupid like that?  If we would emit a warning for that I'm fine with it,
But if we silently remove code like that I don't think that it
will improve anything.  So I ask, where is the code base which
gets an improvement from that optimization?



> 
>>
>> Furthermore use the outermost enclosing array instead of the
>> innermost one, because too aggressive optimization will likely
>> convert harmless errors into security-relevant errors, because
>> as the existing test cases demonstrate, this optimization is actively
>> attacking string length checks in user code, while and not giving
>> any warnings.
> Same questions here.
> 
> I'll also note that Martin is *very* aware of the desire to avoid
> introducing security relevent errors.  In fact his main focus is to help
> identify coding errors that have a security impact.  So please don't
> characterize his work as "actively attacking string length checks in
> user code".
> 

I do fully respect Martin's valuable contributions over the years,
and I did not intend to say anything about the quality of his work,
for GCC, it is just breathtaking!

What I meant is just, what this particular optimization can do.

> Ultimately we want highly accurate string lengths to help improve the
> quality of the warnings we generate for potentially dangerous code.
> These changes seem to take us in the opposite direction.
> 

No, I don't think so, we have full control on the direction, when
I do what Richi requested on his response, we will have one function
where the string length estimation is based upon, instead of several
open coded tree walks.

> So ISTM that you really need a stronger justification using the
> standards compliance and/or real world code that is made less safe by
> keeping string lengths as accurate as possible.
> 
> 

This work concentrates mostly on avoiding to interfere with code that
actually deserves warnings, but which is not being warned about.

>>
>>
>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>> Is it OK for trunk?
> I'd like to ask we hold on this until I return from PTO (Aug 1) so that
> we can discuss the best thing to do here for each class of change.
> 

Okay.

> I think you, Martin, Richi and myself should hash through the technical
> issues raised by the patch.  Obviously others can chime in, but I think
> the 4 of us probably need to drive the discussion.
> 

Yes, sure.  I will try to help when I can.

Currently I thought Martin is working on the string constant folding,
(therefore I thought this range patch would not collide with his patch)
and there are plenty of change requests, plus I think he has some more
patches on hold.  I would like to see the review comments resolved,
and maybe also get to see the follow up patches, maybe as a patch
series, so we can get a clearer picture?


Thanks
Bernd.

> Thanks,
> Jeff
> 


Re: [PATCH] fix a couple of bugs in const string folding (PR 86532)

2018-07-24 Thread Jeff Law
On 07/24/2018 02:16 PM, Martin Sebor wrote:
> On 07/20/2018 04:20 AM, Richard Biener wrote:
>> On Thu, 19 Jul 2018, Martin Sebor wrote:
>>
>>> Here's one more update with tweaks addressing a couple more
>>> of Bernd's comments:
>>>
>>> 1) correct the use of TREE_STRING_LENGTH() where a number of
>>> array elements is expected and not bytes
>>> 2) set CHARTYPE as soon as it's first determined rather than
>>> trying to extract it again later
TREE_STRING_LENGTH is *really* poorly named.  It practically invites misuse.

[ Snip ]


> 
> 
> gcc-86532.diff
> 
> 
> PR tree-optimization/86622 - incorrect strlen of array of array plus variable 
> offset
> PR tree-optimization/86532 - Wrong code due to a wrong strlen folding 
> starting with r262522
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/86622
>   PR tree-optimization/86532
>   * builtins.h (string_length): Declare.
>   * builtins.c (c_strlen): Correct handling of non-constant offsets.  
>   (check_access): Be prepared for non-constant length ranges.
>   (string_length): Make extern.
>   * expr.c (string_constant): Only handle the minor non-constant
>   array index.  Use string_constant to compute the length of
>   a generic string constant.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/86622
>   PR tree-optimization/86532
>   * gcc.c-torture/execute/strlen-2.c: New test.
>   * gcc.c-torture/execute/strlen-3.c: New test.
>   * gcc.c-torture/execute/strlen-4.c: New test.
> 
OK
jeff


Re: [Patch] [Aarch64] PR 86538 - Define __ARM_FEATURE_LSE if LSE is available

2018-07-24 Thread Steve Ellcey
On Tue, 2018-07-24 at 22:04 +0100, James Greenhalgh wrote:
> 
> 
> I'd say this patch isn't desirable for trunk. I'd be interested in use cases
> that need a static decision on presence of LSE that are not better expressed
> using higher level language features.
> 
> Thanks,
> James

How about when building the higher level features?  Right now,
in sysdeps/aarch64/atomic-machine.h, we
hardcode ATOMIC_EXCHANGE_USES_CAS to 0.  If we had __ARM_FEATURE_LSE we
could use that to determine if we wanted to set
ATOMIC_EXCHANGE_USES_CAS to 0 or 1 which would affect the call
generated in nptl/pthread_spin_lock.c.  That would be useful if we
built a lipthread specifically for a platform that had LSE.

Steve Ellcey
sell...@cavium.com



Re: [PATCH] Explain asan parameters in params.def (PR sanitizer/79635).

2018-07-24 Thread Jeff Law
On 07/24/2018 06:18 AM, Martin Liška wrote:
> Hi.
> 
> That's simple patch that improves documentation as requested
> in the PR.
> 
> Ready for trunk?
> Martin
> 
> gcc/ChangeLog:
> 
> 2018-07-24  Martin Liska  
> 
> PR sanitizer/79635
>   * params.def: Explain ASan abbreviation and provide
> a documentation link.
> ---
>  gcc/params.def | 2 ++
>  1 file changed, 2 insertions(+)
> 
> 
OK
jeff


Re: [PATCH] Make strlen range computations more conservative

2018-07-24 Thread Jeff Law
On 07/24/2018 01:59 AM, Bernd Edlinger wrote:
> Hi!
> 
> This patch makes strlen range computations more conservative.
> 
> Firstly if there is a visible type cast from type A to B before passing
> then value to strlen, don't expect the type layout of B to restrict the
> possible return value range of strlen.
Why do you think this is the right thing to do?  ie, is there language
in the standards that makes you think the code as it stands today is
incorrect from a conformance standpoint?  Is there a significant body of
code that is affected in an adverse way by the current code?  If so,
what code?



> 
> Furthermore use the outermost enclosing array instead of the
> innermost one, because too aggressive optimization will likely
> convert harmless errors into security-relevant errors, because
> as the existing test cases demonstrate, this optimization is actively
> attacking string length checks in user code, while and not giving
> any warnings.
Same questions here.

I'll also note that Martin is *very* aware of the desire to avoid
introducing security relevent errors.  In fact his main focus is to help
identify coding errors that have a security impact.  So please don't
characterize his work as "actively attacking string length checks in
user code".

Ultimately we want highly accurate string lengths to help improve the
quality of the warnings we generate for potentially dangerous code.
These changes seem to take us in the opposite direction.

So ISTM that you really need a stronger justification using the
standards compliance and/or real world code that is made less safe by
keeping string lengths as accurate as possible.


> 
> 
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> Is it OK for trunk?
I'd like to ask we hold on this until I return from PTO (Aug 1) so that
we can discuss the best thing to do here for each class of change.

I think you, Martin, Richi and myself should hash through the technical
issues raised by the patch.  Obviously others can chime in, but I think
the 4 of us probably need to drive the discussion.

Thanks,
Jeff


Re: [PATCH 6/7] AArch64 - new pass to add conditional-branch speculation tracking

2018-07-24 Thread Jeff Law
On 07/23/2018 08:33 AM, Richard Earnshaw (lists) wrote:
> [sorry, missed this mail somehow]
> 
> On 11/07/18 22:01, Jeff Law wrote:
>> On 07/09/2018 10:38 AM, Richard Earnshaw wrote:
>>> This patch is the main part of the speculation tracking code.  It adds
>>> a new target-specific pass that is run just before the final branch
>>> reorg pass (so that it can clean up any new edge insertions we make).
>>> The pass is only run with -mtrack-speculation is passed on the command
>>> line.
>>>
>>> One thing that did come to light as part of this was that the stack pointer
>>> register was not being permitted in comparision instructions.  We rely on
>>> that for moving the tracking state between SP and the scratch register at
>>> function call boundaries.
>> Note that the sp in comparison instructions issue came up with the
>> improvements to stack-clash that Tamar, Richard S. and you worked on.
>>
> 
> I can certainly lift that part into a separate patch.
Your call.  It was mostly an observation that the change was clearly
needed elsewhere.  I'm certainly comfortable letting that hunk go in
with whichever kit is approved first :-)

> 
>>
>>>
>>> * config/aarch64/aarch64-speculation.cc: New file.
>>> * config/aarch64/aarch64-passes.def (pass_track_speculation): Add before
>>> pass_reorder_blocks.
>>> * config/aarch64/aarch64-protos.h (make_pass_track_speculation): Add
>>> prototype.
>>> * config/aarch64/aarch64.c (aarch64_conditional_register_usage): Fix
>>> X14 and X15 when tracking speculation.
>>> * config/aarch64/aarch64.md (register name constants): Add
>>> SPECULATION_TRACKER_REGNUM and SPECULATION_SCRATCH_REGNUM.
>>> (unspec): Add UNSPEC_SPECULATION_TRACKER.
>>> (speculation_barrier): New insn attribute.
>>> (cmp): Allow SP in comparisons.
>>> (speculation_tracker): New insn.
>>> (speculation_barrier): Add speculation_barrier attribute.
>>> * config/aarch64/t-aarch64: Add make rule for aarch64-speculation.o.
>>> * config.gcc (aarch64*-*-*): Add aarch64-speculation.o to extra_objs.
>>> * doc/invoke.texi (AArch64 Options): Document -mtrack-speculation.
>>> ---
>>>  gcc/config.gcc|   2 +-
>>>  gcc/config/aarch64/aarch64-passes.def |   1 +
>>>  gcc/config/aarch64/aarch64-protos.h   |   3 +-
>>>  gcc/config/aarch64/aarch64-speculation.cc | 494 
>>> ++
>>>  gcc/config/aarch64/aarch64.c  |  13 +
>>>  gcc/config/aarch64/aarch64.md |  30 +-
>>>  gcc/config/aarch64/t-aarch64  |  10 +
>>>  gcc/doc/invoke.texi   |  10 +-
>>>  8 files changed, 558 insertions(+), 5 deletions(-)
>>>  create mode 100644 gcc/config/aarch64/aarch64-speculation.cc
>> Given the consensus forming about using these kind of masking
>> instructions being the preferred way to mitigate (as opposed to lfence
>> barriers and the like) I have to ask your opinions about making the bulk
>> of this a general pass rather than one specific to the aarch backend.
>> I'd hate to end up duplicating all this stuff across multiple architectures.
>>
>> I think it all looks pretty reasonable though.
>>
>> jeff
>>
> 
> 
> It would be nice to make this more generic, but I'm not sure how easy
> that would be.  Some of the analysis is surely the same, but deployment
> of the mitigation itself is perhaps more complex.  At this point in
> time, I think I'd prefer to go with the target-specific implementation
> and then look to generalize it as a follow-up.  There may be some more
> optimizations to add later as well.
ACK.  I suspect it's mostly the analysis side that we'll want to share.
I don't mind giving you the advantage of going first and letting it live
in the aarch64 backend.  Second implementation can extract the analysis
bits :-)

So IMHO, this can go forward whenever you want to push it.

Jeff



Re: [2/5] C-SKY port: Backend implementation

2018-07-24 Thread Jeff Law
On 07/24/2018 12:18 PM, Sandra Loosemore wrote:
> On 07/24/2018 09:45 AM, Jeff Law wrote:
>> On 07/23/2018 10:21 PM, Sandra Loosemore wrote:
>>> 2018-07-23  Jojo  
>>>  Huibin Wang  
>>>  Sandra Loosemore  
>>>  Chung-Lin Tang  
>>>
>>>  C-SKY port: Backend implementation
>>>
>>>  gcc/
>>>  * config/csky/*: New.
>>>  * common/config/csky/*: New.
>>
>> Let's avoid gratutious whitespace that attempts to line up conditionals.
>>    As an example, look at the predicate csky_load_multiple_operation.  I
>> think just doing a quick pass over the .c, .h and main .md files should
>> be sufficient here.
> 
> OK, will do.
> 
>> I'm not a big fan of more awk code, but I'm not going to object to it :-)
>>
>> Why does the port have its own little pass for condition code
>> optimization (cse_cc)?  What is it doing that can't be done with our
>> generic optimizers?
> 
> This pass was included in the initial patch set we got from C-SKY, and
> as it didn't seem to break anything I left it in.  Perhaps C-SKY can
> provide a testcase that demonstrates why it's still useful in the
> current version of GCC; otherwise we can remove this from the initial
> port submission and restore it later if some performance analysis shows
> it is still worthwhile.
FWIW it looks like we model CC setting on just a few insns, (add,
subtract) so I'd be surprised if this little mini pass found much.  I'd
definitely like to hear from the csky authors here.

Alternately, you could do add some instrumentation to flag when it
triggers, take a test or two that does, reduce it and we can then look
at the key RTL sequences and see what the pass is really doing.

> 
>> Any thoughts on using the newer function descriptor bits rather than old
>> style stack trampolines?
> 
> Has that been committed?  I vaguely remembered discussion of a new way
> to handle nested functions without using the trampoline interface, but I
> couldn't find any documentation in the internals manual.
It did.  See TARGET_CUSTOM_FUNCTION_DESCRIPTORS and the (relatively few)
ports that define it.



> 
>> I don't see anything terribly concerning in the core of the port.  The
>> amount of support code for minipool is huge and I wonder if some sharing
>> across the various ports would be possible, but I don't think that
>> should be a blocking issue for this port.
> 
> Yes, that code was clearly copied almost verbatim from the ARM backend.
> I left it alone as much as possible to simplify any future attempts at
> genericizing it.
Understood -- I'd assumed it was largely copied from ARM, but hadn't
gone back to the ARM bits to verify.

> 
>> Can you update the backends.html web page here appropriately for the
>> c-sky target?
> 
> Sure, I can take care of updating that when the port is committed.  I
> believe the right entry is
> 
> "csky  b   ia"
Yea, that seems right to me.

Jeff


Re: [PATCH] combine: Allow combining two insns to two insns

2018-07-24 Thread Jeff Law
On 07/24/2018 11:18 AM, Segher Boessenkool wrote:
> This patch allows combine to combine two insns into two.  This helps
> in many cases, by reducing instruction path length, and also allowing
> further combinations to happen.  PR85160 is a typical example of code
> that it can improve.
> 
> This patch does not allow such combinations if either of the original
> instructions was a simple move instruction.  In those cases combining
> the two instructions increases register pressure without improving the
> code.  With this move test register pressure does no longer increase
> noticably as far as I can tell.
> 
> (At first I also didn't allow either of the resulting insns to be a
> move instruction.  But that is actually a very good thing to have, as
> should have been obvious).
> 
> Tested for many months; tested on about 30 targets.
> 
> I'll commit this later this week if there are no objections.
> 
> 
> Segher
> 
> 
> 2018-07-24  Segher Boessenkool  
> 
>   PR rtl-optimization/85160
>   * combine.c (is_just_move): New function.
>   (try_combine): Allow combining two instructions into two if neither of
>   the original instructions was a move.
I've had several instances where a 2->2 combination would be useful
through the years.  I didn't save any of those examples though...  Good
to see the limitation being addressed.

jeff


[PATCH] Add initial version of C++17 header

2018-07-24 Thread Jonathan Wakely

This is missing the synchronized_pool_resource and
unsynchronized_pool_resource classes but is otherwise complete.

This is a new implementation, not based on the existing code in
, but memory_resource and
polymorphic_allocator ended up looking almost the same anyway.

The constant_init kluge in src/c++17/memory_resource.cc is apparently
due to Richard Smith and ensures that the objects are constructed during
constant initialiation phase and not destroyed (because the
constant_init destructor doesn't destroy the union member and the
storage is not reused).

* config/abi/pre/gnu.ver: Export new symbols.
* configure: Regenerate.
* include/Makefile.am: Add new  header.
* include/Makefile.in: Regenerate.
* include/precompiled/stdc++.h: Include  for C++17.
* include/std/memory_resource: New header.
(memory_resource, polymorphic_allocator, new_delete_resource)
(null_memory_resource, set_default_resource, get_default_resource)
(pool_options, monotonic_buffer_resource): Define.
* src/Makefile.am: Add c++17 directory.
* src/Makefile.in: Regenerate.
* src/c++11/Makefile.am: Fix comment.
* src/c++17/Makefile.am: Add makefile for new sub-directory.
* src/c++17/Makefile.in: Generate.
* src/c++17/memory_resource.cc: New.
(newdel_res_t, null_res_t, constant_init, newdel_res, null_res)
(default_res, new_delete_resource, null_memory_resource)
(set_default_resource, get_default_resource): Define.
* testsuite/20_util/memory_resource/1.cc: New test.
* testsuite/20_util/memory_resource/2.cc: New test.
* testsuite/20_util/monotonic_buffer_resource/1.cc: New test.
* testsuite/20_util/monotonic_buffer_resource/allocate.cc: New test.
* testsuite/20_util/monotonic_buffer_resource/deallocate.cc: New test.
* testsuite/20_util/monotonic_buffer_resource/release.cc: New test.
* testsuite/20_util/monotonic_buffer_resource/upstream_resource.cc:
New test.
* testsuite/20_util/polymorphic_allocator/1.cc: New test.
* testsuite/20_util/polymorphic_allocator/resource.cc: New test.
* testsuite/20_util/polymorphic_allocator/select.cc: New test.
* testsuite/util/testsuite_allocator.h (__gnu_test::memory_resource):
Define concrete memory resource for testing.
(__gnu_test::default_resource_mgr): Define RAII helper for changing
default resource.

Tested powerpc64le-linux, committed to trunk.



commit 05c7ae80dbd59fcef5d583eac15181afbc07a116
Author: Jonathan Wakely 
Date:   Tue Jul 24 14:19:19 2018 +0100

Add initial version of C++17  header

This is missing the synchronized_pool_resource and
unsynchronized_pool_resource classes but is otherwise complete.

This is a new implementation, not based on the existing code in
, but memory_resource and
polymorphic_allocator ended up looking almost the same anyway.

The constant_init kluge in src/c++17/memory_resource.cc is apparently
due to Richard Smith and ensures that the objects are constructed during
constant initialiation phase and not destroyed (because the
constant_init destructor doesn't destroy the union member and the
storage is not reused).

* config/abi/pre/gnu.ver: Export new symbols.
* configure: Regenerate.
* include/Makefile.am: Add new  header.
* include/Makefile.in: Regenerate.
* include/precompiled/stdc++.h: Include  for C++17.
* include/std/memory_resource: New header.
(memory_resource, polymorphic_allocator, new_delete_resource)
(null_memory_resource, set_default_resource, get_default_resource)
(pool_options, monotonic_buffer_resource): Define.
* src/Makefile.am: Add c++17 directory.
* src/Makefile.in: Regenerate.
* src/c++11/Makefile.am: Fix comment.
* src/c++17/Makefile.am: Add makefile for new sub-directory.
* src/c++17/Makefile.in: Generate.
* src/c++17/memory_resource.cc: New.
(newdel_res_t, null_res_t, constant_init, newdel_res, null_res)
(default_res, new_delete_resource, null_memory_resource)
(set_default_resource, get_default_resource): Define.
* testsuite/20_util/memory_resource/1.cc: New test.
* testsuite/20_util/memory_resource/2.cc: New test.
* testsuite/20_util/monotonic_buffer_resource/1.cc: New test.
* testsuite/20_util/monotonic_buffer_resource/allocate.cc: New test.
* testsuite/20_util/monotonic_buffer_resource/deallocate.cc: New 
test.
* testsuite/20_util/monotonic_buffer_resource/release.cc: New test.
* testsuite/20_util/monotonic_buffer_resource/upstream_resource.cc:
New test.
* testsuite/20_util/polymorphic_allocator/1.cc: New 

Re: Fix ceil_log2(0) (PR 86644)

2018-07-24 Thread Jeff Law
On 07/24/2018 12:11 PM, Richard Sandiford wrote:
> This PR shows a pathological case in which we try SLP vectorisation on
> dead code.  We record that 0 bits of the result are enough to satisfy
> all users (which is true), and that led to precision being 0 in:
> 
> static unsigned int
> vect_element_precision (unsigned int precision)
> {
>   precision = 1 << ceil_log2 (precision);
>   return MAX (precision, BITS_PER_UNIT);
> }
> 
> ceil_log2 (0) returned 64 rather than 0, leading to 1 << 64, which is UB.
> 
> Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu.
> OK to install?
> 
> Richard
> 
> 
> 2018-07-24  Richard Sandiford  
> 
> gcc/
>   * hwint.c (ceil_log2): Fix comment.  Return 0 for 0.
OK
Jeff


Re: [RFC 1/3, debug] Add fdebug-nops

2018-07-24 Thread Tom de Vries
On 07/24/2018 09:06 PM, Alexandre Oliva wrote:
> On Jul 24, 2018, Tom de Vries  wrote:
> 
>> There's a design principle in GCC that code generation and debug generation
>> are independent.  This guarantees that if you're encountering a problem in an
>> application without debug info, you can recompile it with -g and be certain
>> that you can reproduce the same problem, and use the debug info to debug the
>> problem.  This invariant is enforced by bootstrap-debug.  The fdebug-nops
>> breaks this invariant
> 
> I thought of a way to not break it: enable the debug info generation
> machinery, including VTA and SFN, but discard those only at the very end
> if -g is not enabled.  The downside is that it would likely slow -Og
> down significantly, but who uses it without -g anyway?

I thought of the same.  I've submitted a patch here that uses SFN:
https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01391.html . VTA is not
needed AFAIU.

Thanks,
- Tom


Re: [Patch] [Aarch64] PR 86538 - Define __ARM_FEATURE_LSE if LSE is available

2018-07-24 Thread James Greenhalgh
On Tue, Jul 24, 2018 at 03:22:02PM -0500, Steve Ellcey wrote:
> This is a patch for PR 86538, to define an __ARM_FEATURE_LSE macro
> when LSE is available.  Richard Earnshaw closed PR 86538 as WONTFIX
> because the ACLE (Arm C Language Extension) does not require this
> macro and because he is concerned that it might encourage people to
> use inline assembly instead of the __sync and atomic intrinsics.
> (See actual comments in the defect report.)
> 
> While I agree that we want people to use the intrinsics I still think
> there are use cases where people may want to know if LSE is available
> or not and there is currrently no (simple) way to determine if this feature
> is available since it can be turned or and off independently of the
> architecture used.  Also, as a general principle, I  think any feature
> that can be toggled on or off by the compiler should provide a way for
> users to determine what its state is.

Well, we blow that design principle all over the place (find me a macro
which tells you whether AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW is on for
example :-) )

A better design principle would be that if we think language programmers
may want to compile in different C code depending on a compiler option, we
should consider adding a feature macro.

> So what do other ARM maintainers and users think?  Is this a useful
> feature to have in GCC?

I'm with Richard on this one.

Whether LSE is available or not at compile time, the best user strategy is
to use the C11/C++11 atomic extensions. That's where the memory model is
well defined, well reasoned about, and well implemented.

Purely in ACLE we're not keen on providing macros that don't provide choice
to a C level programmer (i.e. change the prescence of intrinsics).

You could well imagine an inline asm programmer wanting to choose between an
LSE path and an Armv8.0-A path; but I can't imagine what they would want to
do on that path that couldn't be expressed better in the C language. You
might say they want to validate presence of the instruction; but that will
need to be a dynamic check outside of ACLE anyway.

All of which is to say, I don't think that this is a neccessary macro. Each
time I've seen it requested by a user, we've told them the same thing; what
do you want to express here that isn't better expressed by C atomic
primitives.

I'd say this patch isn't desirable for trunk. I'd be interested in use cases
that need a static decision on presence of LSE that are not better expressed
using higher level language features.

Thanks,
James



[PATCH] PR libstdc++/86658 fix __niter_wrap to not copy invalid iterators

2018-07-24 Thread Jonathan Wakely

An output iterator passed as the unused first argument to __niter_wrap
might have already been invalidated, so don't copy it.

PR libstdc++/86658
* include/bits/stl_algobase.h (__niter_wrap<_Iterator>): Pass unused
parameter by reference, to avoid copying invalid iterators.
* testsuite/25_algorithms/copy/86658.cc: New test.

Tested powerpc64le-linux, committed to trunk.

commit e2dcaf432a8fa6e8e9d96b03003ece28fa3f1ca6
Author: Jonathan Wakely 
Date:   Tue Jul 24 21:33:54 2018 +0100

PR libstdc++/86658 fix __niter_wrap to not copy invalid iterators

An output iterator passed as the unused first argument to __niter_wrap
might have already been invalidated, so don't copy it.

PR libstdc++/86658
* include/bits/stl_algobase.h (__niter_wrap<_Iterator>): Pass unused
parameter by reference, to avoid copying invalid iterators.
* testsuite/25_algorithms/copy/86658.cc: New test.

diff --git a/libstdc++-v3/include/bits/stl_algobase.h 
b/libstdc++-v3/include/bits/stl_algobase.h
index f0130bc4123..b1ecd83ddb7 100644
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -288,7 +288,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // No need to wrap, iterator already has the right type.
   template
 inline _Iterator
-__niter_wrap(_Iterator, _Iterator __res)
+__niter_wrap(const _Iterator&, _Iterator __res)
 { return __res; }
 
   // All of these auxiliary structs serve two purposes.  (1) Replace
diff --git a/libstdc++-v3/testsuite/25_algorithms/copy/86658.cc 
b/libstdc++-v3/testsuite/25_algorithms/copy/86658.cc
new file mode 100644
index 000..600747a823d
--- /dev/null
+++ b/libstdc++-v3/testsuite/25_algorithms/copy/86658.cc
@@ -0,0 +1,36 @@
+// Copyright (C) 2018 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-do run }
+
+#define _GLIBCXX_DEBUG
+#include 
+#include 
+
+void
+test01()
+{
+  int i[1] = { 1 };
+  std::vector v(1);
+  std::copy(i, i+1, std::inserter(v, v.end()));
+}
+
+int
+main()
+{
+  test01();
+}


[PATCH 11/11] [nvptx] Generalize state propagation and synchronization

2018-07-24 Thread cesar
From: Tom de Vries 

As the title mentions, this patch generalizes the state propagation and
synchronization code. Note that while the patch makes reference to
large_vectors, they are not enabled in nvptx_goacc_validate_dims.
Therefore, only the worker case is exercised in this patch.

2018-XX-YY  Tom de Vries  
Cesar Philippidis  

gcc/
* config/nvptx/nvptx.c (oacc_bcast_partition): Declare.
(nvptx_option_override): Init oacc_bcast_partition.
(nvptx_init_oacc_workers): New function.
(nvptx_declare_function_name): Call nvptx_init_oacc_workers.
(nvptx_needs_shared_bcast): New function.
(nvptx_find_par): Generalize to enable vectors to use shared-memory
to propagate state.
(nvptx_shared_propagate): Initialize vector bcast partition and
synchronization state.
(nvptx_single):  Generalize to enable vectors to use shared-memory
to propagate state.
(nvptx_process_pars): Likewise.
(nvptx_set_current_function): Initialize oacc_broadcast_partition.
* config/nvptx/nvptx.h (struct machine_function): Add
bcast_partition and sync_bar members.

(cherry picked from openacc-gcc-7-branch commit
628f439f33ed6f689656a1ed8ff74db97e7ec3ed, and commit
293e415e04d6b407e59118253e5fdfe539000cfe)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 7d49b4f..abd47ac 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -136,6 +136,7 @@ static GTY((cache)) hash_table 
*needed_fndecls_htab;
memory.  It'd be nice if PTX supported common blocks, because then
this could be shared across TUs (taking the largest size).  */
 static unsigned oacc_bcast_size;
+static unsigned oacc_bcast_partition;
 static unsigned oacc_bcast_align;
 static GTY(()) rtx oacc_bcast_sym;
 
@@ -154,6 +155,8 @@ static bool need_softstack_decl;
 /* True if any function references __nvptx_uni.  */
 static bool need_unisimt_decl;
 
+static int nvptx_mach_max_workers ();
+
 /* Allocate a new, cleared machine_function structure.  */
 
 static struct machine_function *
@@ -213,6 +216,7 @@ nvptx_option_override (void)
   oacc_bcast_sym = gen_rtx_SYMBOL_REF (Pmode, "__oacc_bcast");
   SET_SYMBOL_DATA_AREA (oacc_bcast_sym, DATA_AREA_SHARED);
   oacc_bcast_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
+  oacc_bcast_partition = 0;
 
   worker_red_sym = gen_rtx_SYMBOL_REF (Pmode, "__worker_red");
   SET_SYMBOL_DATA_AREA (worker_red_sym, DATA_AREA_SHARED);
@@ -1101,6 +1105,40 @@ nvptx_init_axis_predicate (FILE *file, int regno, const 
char *name)
   fprintf (file, "\t}\n");
 }
 
+/* Emit code to initialize OpenACC worker broadcast and synchronization
+   registers.  */
+
+static void
+nvptx_init_oacc_workers (FILE *file)
+{
+  fprintf (file, "\t{\n");
+  fprintf (file, "\t\t.reg.u32\t%%tidy;\n");
+  if (cfun->machine->bcast_partition)
+{
+  fprintf (file, "\t\t.reg.u64\t%%t_bcast;\n");
+  fprintf (file, "\t\t.reg.u64\t%%y64;\n");
+}
+  fprintf (file, "\t\tmov.u32\t\t%%tidy, %%tid.y;\n");
+  if (cfun->machine->bcast_partition)
+{
+  fprintf (file, "\t\tcvt.u64.u32\t%%y64, %%tidy;\n");
+  fprintf (file, "\t\tadd.u64\t\t%%y64, %%y64, 1; // vector ID\n");
+  fprintf (file, "\t\tcvta.shared.u64\t%%t_bcast, __oacc_bcast;\n");
+  fprintf (file, "\t\tmad.lo.u64\t%%r%d, %%y64, %d, %%t_bcast; "
+  "// vector broadcast offset\n",
+  REGNO (cfun->machine->bcast_partition),
+  oacc_bcast_partition);
+}
+  /* Verify oacc_bcast_size.  */
+  gcc_assert (oacc_bcast_partition * (nvptx_mach_max_workers () + 1)
+ <= oacc_bcast_size);
+  if (cfun->machine->sync_bar)
+fprintf (file, "\t\tadd.u32\t\t%%r%d, %%tidy, 1; "
+"// vector synchronization barrier\n",
+REGNO (cfun->machine->sync_bar));
+  fprintf (file, "\t}\n");
+}
+
 /* Emit code to initialize predicate and master lane index registers for
-muniform-simt code generation variant.  */
 
@@ -1327,6 +1365,8 @@ nvptx_declare_function_name (FILE *file, const char 
*name, const_tree decl)
   if (cfun->machine->unisimt_predicate
   || (cfun->machine->has_simtreg && !crtl->is_leaf))
 nvptx_init_unisimt_predicate (file);
+  if (cfun->machine->bcast_partition || cfun->machine->sync_bar)
+nvptx_init_oacc_workers (file);
 }
 
 /* Output code for switching uniform-simt state.  ENTERING indicates whether
@@ -3045,6 +3085,19 @@ nvptx_split_blocks (bb_insn_map_t *map)
 }
 }
 
+/* Return true if MASK contains parallelism that requires shared
+   memory to broadcast.  */
+
+static bool
+nvptx_needs_shared_bcast (unsigned mask)
+{
+  bool worker = mask & GOMP_DIM_MASK (GOMP_DIM_WORKER);
+  bool large_vector = (mask & GOMP_DIM_MASK (GOMP_DIM_VECTOR))
+&& nvptx_mach_vector_length () != PTX_WARP_SIZE;
+
+  return worker || large_vector;
+}
+
 /* BLOCK is a basic block containing a head or tail instruction.
Locate the associated prehead 

[PATCH 09/11] [nvptx] Use TARGET_SET_CURRENT_FUNCTION

2018-07-24 Thread cesar
From: Cesar Philippidis 

Chung-Lin had originally defined TARGET_SET_CURRENT_FUNCTION as part
of his gang-local variable patch. But I intend to introduce those
changes at a later time. Eventually the state propagation code will
utilize nvptx_set_current_function to reset the reduction buffer
offset. However, for the time being, this patch only introduces
it as a placeholder.

2018-XX-YY  Chung-Lin Tang  
Cesar Philippidis  

[PATCH 10/11] [nvptx] Use MAX, MIN, ROUND_UP macros

2018-07-24 Thread cesar
From: Tom de Vries 

This patch replaces the confusing, in-lined min, max and rounding code
sprinkled throughout the nvptx BE with calls to MIN, MAX, and ROUND_UP
macros.

2018-XX-YY  Tom de Vries  
Cesar Philippidis  

gcc/
* config/nvptx/nvptx.c (nvptx_gen_shared_bcast, shared_prop_gen)
(nvptx_goacc_expand_accel_var): Use MAX and ROUND_UP.
(nvptx_assemble_value, nvptx_output_skip): Use MIN.
(nvptx_shared_propagate, nvptx_single, nvptx_expand_shared_addr): Use
MAX.

(cherry picked from openacc-gcc-7-branch commit
d3d6411c160071f70f995bbcd92f617aec67ba10)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 09daedd..7d49b4f 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -1812,9 +1812,8 @@ nvptx_gen_shared_bcast (rtx reg, propagate_mask pm, 
unsigned rep,
  {
unsigned align = GET_MODE_ALIGNMENT (mode) / BITS_PER_UNIT;
 
-   if (align > oacc_bcast_align)
- oacc_bcast_align = align;
-   data->offset = (data->offset + align - 1) & ~(align - 1);
+   oacc_bcast_align = MAX (oacc_bcast_align, align);
+   data->offset = ROUND_UP (data->offset, align);
addr = data->base;
gcc_assert (data->base != NULL);
if (data->offset)
@@ -1936,8 +1935,7 @@ nvptx_assemble_value (unsigned HOST_WIDE_INT val, 
unsigned size)
 {
   val >>= part * BITS_PER_UNIT;
   part = init_frag.size - init_frag.offset;
-  if (part > size)
-   part = size;
+  part = MIN (part, size);
 
   unsigned HOST_WIDE_INT partial
= val << (init_frag.offset * BITS_PER_UNIT);
@@ -2000,8 +1998,7 @@ nvptx_output_skip (FILE *, unsigned HOST_WIDE_INT size)
   if (init_frag.offset)
 {
   unsigned part = init_frag.size - init_frag.offset;
-  if (part > size)
-   part = (unsigned) size;
+  part = MIN (part, (unsigned)size);
   size -= part;
   nvptx_assemble_value (0, part);
 }
@@ -3957,9 +3954,8 @@ shared_prop_gen (rtx reg, propagate_mask pm, unsigned 
rep, void *data_,
   /* Starting a loop, initialize pointer.*/
   unsigned align = GET_MODE_ALIGNMENT (GET_MODE (reg)) / BITS_PER_UNIT;
 
-  if (align > oacc_bcast_align)
-   oacc_bcast_align = align;
-  data->offset = (data->offset + align - 1) & ~(align - 1);
+  oacc_bcast_align = MAX (oacc_bcast_align, align);
+  data->offset = ROUND_UP (data->offset, align);
 
   data->ptr = gen_reg_rtx (Pmode);
 
@@ -4000,8 +3996,7 @@ nvptx_shared_propagate (bool pre_p, bool is_call, 
basic_block block,
   rtx init = gen_rtx_SET (data.base, oacc_bcast_sym);
   emit_insn_after (init, insn);
 
-  if (oacc_bcast_size < data.offset)
-   oacc_bcast_size = data.offset;
+  oacc_bcast_size = MAX (oacc_bcast_size, data.offset);
 }
   return empty;
 }
@@ -4379,8 +4374,7 @@ nvptx_single (unsigned mask, basic_block from, 
basic_block to)
  data.base = oacc_bcast_sym;
  data.ptr = 0;
 
- if (oacc_bcast_size < GET_MODE_SIZE (SImode))
-   oacc_bcast_size = GET_MODE_SIZE (SImode);
+ oacc_bcast_size = MAX (oacc_bcast_size, GET_MODE_SIZE (SImode));
 
  data.offset = 0;
  emit_insn_before (nvptx_gen_shared_bcast (pvar, PM_read, 0, ,
@@ -5122,13 +5116,11 @@ nvptx_expand_shared_addr (tree exp, rtx target,
 return target;
 
   unsigned align = TREE_INT_CST_LOW (CALL_EXPR_ARG (exp, 2));
-  if (align > worker_red_align)
-worker_red_align = align;
+  worker_red_align = MAX (worker_red_align, align);
 
   unsigned offset = TREE_INT_CST_LOW (CALL_EXPR_ARG (exp, 0));
   unsigned size = TREE_INT_CST_LOW (CALL_EXPR_ARG (exp, 1));
-  if (size + offset > worker_red_size)
-worker_red_size = size + offset;
+  worker_red_size = MAX (worker_red_size, size + offset);
 
   rtx addr = worker_red_sym;
   if (offset)
-- 
2.7.4



[PATCH 08/11] [nvptx] Add axis_dim

2018-07-24 Thread cesar
From: Tom de Vries 

This patch introduces an axis_dim member to the machine_function
struct. The launch geometry will be queried frequently enough so that
its justified to store that information with each cfun.

2018-XX-YY  Tom de Vries  
Cesar Philippidis  

gcc/
* config/nvptx/nvptx.c (MACH_VECTOR_LENGTH, MACH_MAX_WORKERS): Define.
(nvptx_mach_max_workers, nvptx_mach_vector_length): New function.
(nvptx_reorg): Set function-specific axis_dim's.
* config/nvptx/nvptx.h (struct machine_function): Add axis_dims.

(cherry picked from openacc-gcc-7-branch commit
a36cbbe19af6822abd203d167905b8ca61d95992)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index fa27f71..e3a02d2 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -2884,6 +2884,23 @@ struct offload_attrs
   int max_workers;
 };
 
+/* Define entries for cfun->machine->axis_dim.  */
+
+#define MACH_VECTOR_LENGTH 0
+#define MACH_MAX_WORKERS 1
+
+static int ATTRIBUTE_UNUSED
+nvptx_mach_max_workers ()
+{
+  return cfun->machine->axis_dim[MACH_MAX_WORKERS];
+}
+
+static int ATTRIBUTE_UNUSED
+nvptx_mach_vector_length ()
+{
+  return cfun->machine->axis_dim[MACH_VECTOR_LENGTH];
+}
+
 /* Loop structure of the function.  The entire function is described as
a NULL loop.  */
 
@@ -4831,6 +4848,9 @@ nvptx_reorg (void)
 
   populate_offload_attrs ();
 
+  cfun->machine->axis_dim[MACH_VECTOR_LENGTH] = oa.vector_length;
+  cfun->machine->axis_dim[MACH_MAX_WORKERS] = oa.max_workers;
+
   /* If there is worker neutering, there must be vector
 neutering.  Otherwise the hardware will fail.  */
   gcc_assert (!(oa.mask & GOMP_DIM_MASK (GOMP_DIM_WORKER))
diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index dfa1e9a..90fb2c9 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -210,6 +210,8 @@ struct GTY(()) machine_function
   int return_mode; /* Return mode of current fn.
  (machine_mode not defined yet.) */
   rtx axis_predicate[2]; /* Neutering predicates.  */
+  int axis_dim[2]; /* Maximum number of threads on each axis, dim[0] is
+ vector_length, dim[1] is num_workers.  */
   rtx unisimt_master; /* 'Master lane index' for -muniform-simt.  */
   rtx unisimt_predicate; /* Predicate for -muniform-simt.  */
   rtx unisimt_location; /* Mask location for -muniform-simt.  */
-- 
2.7.4



[PATCH 06/11] [nvptx] only use one bar.sync barriers in OpenACC offloaded code

2018-07-24 Thread cesar
From: Cesar Philippidis 

This patch teaches nvptx_single to always use barrier '0' for CTA
synchronization. This started off as a cosmetic change, but later on
each large vector (i.e. one that larger than a PTX warp) will need to
use its own unique thread barrier to avoid thread divergence.
Consequently, this patch begins the process of teaching the nvptx
state propagator how to use a common thread barrier for each
propagation level.

2018-XX-YY  Cesar Philippidis  

gcc/
* config/nvptx/nvptx.c (nvptx_single): Always pass false to
nvptx_cta_sync.
(nvptx_process_pars): Likewise.

(cherry picked from openacc-gcc-7-branch commit
ac0a55b8e72363a09f7968474744c51c1fa7720a)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 4d46d89..1f954a6 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -4374,7 +4374,7 @@ nvptx_single (unsigned mask, basic_block from, 
basic_block to)
  /* This barrier is needed to avoid worker zero clobbering
 the broadcast buffer before all the other workers have
 had a chance to read this instance of it.  */
- emit_insn_before (nvptx_cta_sync (true), tail);
+ emit_insn_before (nvptx_cta_sync (false), tail);
}
 
   extract_insn (tail);
@@ -4501,7 +4501,7 @@ nvptx_process_pars (parallel *par)
{
  /* Insert begin and end synchronizations.  */
  emit_insn_before (nvptx_cta_sync (false), par->forked_insn);
- emit_insn_before (nvptx_cta_sync (true), par->join_insn);
+ emit_insn_before (nvptx_cta_sync (false), par->join_insn);
}
 }
   else if (par->mask & GOMP_DIM_MASK (GOMP_DIM_VECTOR))
-- 
2.7.4



[PATCH 07/11] [nvptx] Add thread count parm to bar.sync

2018-07-24 Thread cesar
From: Tom de Vries 

This patch extends the nvptx_barsync pattern to accept an integeral
parameter to specify how many threads utilize the requested
barrier. This is necessary variable length vectors, as each large
vector will require own thread barrier.

2018-XX-YY  Tom de Vries  
Cesar Philippidis  

gcc/
* config/nvptx/nvptx.md (nvptx_barsync): Add and handle operand.
* config/nvptx/nvptx.c (nvptx_cta_sync): Change arguments to take in a
lock and thread count.  Update call to gen_nvptx_barsync.
(nvptx_single, nvptx_process_pars): Update calls to nvptx_cta_sync.

(cherry picked from openacc-gcc-7-branch commit
49b69b4002496c9f5759d933e0a6233660434e70)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 1f954a6..fa27f71 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -3989,13 +3989,14 @@ nvptx_shared_propagate (bool pre_p, bool is_call, 
basic_block block,
   return empty;
 }
 
-/* Emit a CTA-level synchronization barrier.  We use different
-   markers for before and after synchronizations.  */
+/* Emit a CTA-level synchronization barrier (bar.sync).  LOCK is the
+   barrier number, which is an integer or a register.  THREADS is the
+   number of threads controlled by the barrier.  */
 
 static rtx
-nvptx_cta_sync (bool after)
+nvptx_cta_sync (rtx lock, int threads)
 {
-  return gen_nvptx_barsync (GEN_INT (after));
+  return gen_nvptx_barsync (lock, GEN_INT (threads));
 }
 
 #if WORKAROUND_PTXJIT_BUG
@@ -4355,6 +4356,8 @@ nvptx_single (unsigned mask, basic_block from, 
basic_block to)
  /* Includes worker mode, do spill & fill.  By construction
 we should never have worker mode only. */
  broadcast_data_t data;
+ rtx barrier = GEN_INT (0);
+ int threads = 0;
 
  data.base = oacc_bcast_sym;
  data.ptr = 0;
@@ -4367,14 +4370,14 @@ nvptx_single (unsigned mask, basic_block from, 
basic_block to)
false),
before);
  /* Barrier so other workers can see the write.  */
- emit_insn_before (nvptx_cta_sync (false), tail);
+ emit_insn_before (nvptx_cta_sync (barrier, threads), tail);
  data.offset = 0;
  emit_insn_before (nvptx_gen_shared_bcast (pvar, PM_write, 0, ,
false), tail);
  /* This barrier is needed to avoid worker zero clobbering
 the broadcast buffer before all the other workers have
 had a chance to read this instance of it.  */
- emit_insn_before (nvptx_cta_sync (false), tail);
+ emit_insn_before (nvptx_cta_sync (barrier, threads), tail);
}
 
   extract_insn (tail);
@@ -4496,12 +4499,15 @@ nvptx_process_pars (parallel *par)
   bool empty = nvptx_shared_propagate (true, is_call,
   par->forked_block, par->fork_insn,
   false);
+  rtx barrier = GEN_INT (0);
+  int threads = 0;
 
   if (!empty || !is_call)
{
  /* Insert begin and end synchronizations.  */
- emit_insn_before (nvptx_cta_sync (false), par->forked_insn);
- emit_insn_before (nvptx_cta_sync (false), par->join_insn);
+ emit_insn_before (nvptx_cta_sync (barrier, threads),
+   par->forked_insn);
+ emit_insn_before (nvptx_cta_sync (barrier, threads), par->join_insn);
}
 }
   else if (par->mask & GOMP_DIM_MASK (GOMP_DIM_VECTOR))
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 2988f5d..9538333 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -1455,10 +1455,16 @@
   [(set_attr "atomic" "true")])
 
 (define_insn "nvptx_barsync"
-  [(unspec_volatile [(match_operand:SI 0 "const_int_operand" "")]
+  [(unspec_volatile [(match_operand:SI 0 "nvptx_nonmemory_operand" "Ri")
+(match_operand:SI 1 "const_int_operand")]
UNSPECV_BARSYNC)]
   ""
-  "\\tbar.sync\\t%0;"
+  {
+if (INTVAL (operands[1]) == 0)
+  return "\\tbar.sync\\t%0;";
+else
+  return "\\tbar.sync\\t%0, %1;";
+  }
   [(set_attr "predicable" "false")])
 
 (define_expand "memory_barrier"
-- 
2.7.4



[PATCH 05/11] [nvptx] Fix whitespace in nvptx_single and nvptx_neuter_pars

2018-07-24 Thread cesar
From: Tom de Vries 

This patch only adjusts white space.

2018-XX-YY  Cesar Philippidis  

gcc/
* config/nvptx/nvptx.c (nvptx_single): Fix whitespace.
(nvptx_neuter_pars): Likewise.

(cherry picked from openacc-gcc-7-branch commit
10f697dfcdaa77b842de6e9a62c68b33a49d3c16)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 447425f..4d46d89 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -4250,7 +4250,7 @@ nvptx_single (unsigned mask, basic_block from, 
basic_block to)
pred = gen_reg_rtx (BImode);
cfun->machine->axis_predicate[mode - GOMP_DIM_WORKER] = pred;
  }
-   
+
rtx br;
if (mode == GOMP_DIM_VECTOR)
  br = gen_br_true (pred, label);
@@ -4584,7 +4584,7 @@ nvptx_neuter_pars (parallel *par, unsigned modes, 
unsigned outer)
 }
 
   if (skip_mask)
-  nvptx_skip_par (skip_mask, par);
+nvptx_skip_par (skip_mask, par);
   
   if (par->next)
 nvptx_neuter_pars (par->next, modes, outer);
-- 
2.7.4



[PATCH 03/11] [nvptx] Consolidate offloaded function attributes into struct offload_attrs

2018-07-24 Thread cesar
From: Cesar Philippidis 

This patch introduces a new struct offload_attrs, which contains the
details regarding the offload function launch geometry. In addition to
its current usage to neuter worker and vector threads, it will
eventually by used to validate the compile-time launch geometry
requested by the user.

2018-XX-YY  Cesar Philippidis  

gcc/
* config/nvptx/nvptx.c (struct offload_attrs): New.
(populate_offload_attrs): New function.
(nvptx_reorg): Use it to extract partitioning mask.

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index fb3e0c7..1b83b3c 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -2871,6 +2871,17 @@ nvptx_reorg_uniform_simt ()
 }
 }
 
+/* Offloading function attributes.  */
+
+struct offload_attrs
+{
+  unsigned mask;
+  int num_gangs;
+  int num_workers;
+  int vector_length;
+  int max_workers;
+};
+
 /* Loop structure of the function.  The entire function is described as
a NULL loop.  */
 
@@ -4568,6 +4579,56 @@ nvptx_neuter_pars (parallel *par, unsigned modes, 
unsigned outer)
 nvptx_neuter_pars (par->next, modes, outer);
 }
 
+static void
+populate_offload_attrs (offload_attrs *oa)
+{
+  tree attr = oacc_get_fn_attrib (current_function_decl);
+  tree dims = TREE_VALUE (attr);
+  unsigned ix;
+
+  oa->mask = 0;
+
+  for (ix = 0; ix != GOMP_DIM_MAX; ix++, dims = TREE_CHAIN (dims))
+{
+  tree t = TREE_VALUE (dims);
+  int size = (t == NULL_TREE) ? 0 : TREE_INT_CST_LOW (t);
+  tree allowed = TREE_PURPOSE (dims);
+
+  if (size != 1 && !(allowed && integer_zerop (allowed)))
+   oa->mask |= GOMP_DIM_MASK (ix);
+
+  switch (ix)
+   {
+   case GOMP_DIM_GANG:
+ oa->num_gangs = size;
+ break;
+
+   case GOMP_DIM_WORKER:
+ oa->num_workers = size;
+ break;
+
+   case GOMP_DIM_VECTOR:
+ oa->vector_length = size;
+ break;
+   }
+}
+
+  if (oa->vector_length == 0)
+{
+  /* FIXME: Need a more graceful way to handle large vector
+lengths in OpenACC routines.  */
+  if (!lookup_attribute ("omp target entrypoint",
+DECL_ATTRIBUTES (current_function_decl)))
+   oa->vector_length = PTX_WARP_SIZE;
+  else
+   oa->vector_length = PTX_VECTOR_LENGTH;
+}
+  if (oa->num_workers == 0)
+oa->max_workers = PTX_CTA_SIZE / oa->vector_length;
+  else
+oa->max_workers = oa->num_workers;
+}
+
 #if WORKAROUND_PTXJIT_BUG_2
 /* Variant of pc_set that only requires JUMP_P (INSN) if STRICT.  This variant
is needed in the nvptx target because the branches generated for
@@ -4749,27 +4810,19 @@ nvptx_reorg (void)
 {
   /* If we determined this mask before RTL expansion, we could
 elide emission of some levels of forks and joins.  */
-  unsigned mask = 0;
-  tree dims = TREE_VALUE (attr);
-  unsigned ix;
+  offload_attrs oa;
 
-  for (ix = 0; ix != GOMP_DIM_MAX; ix++, dims = TREE_CHAIN (dims))
-   {
- int size = TREE_INT_CST_LOW (TREE_VALUE (dims));
- tree allowed = TREE_PURPOSE (dims);
+  populate_offload_attrs ();
 
- if (size != 1 && !(allowed && integer_zerop (allowed)))
-   mask |= GOMP_DIM_MASK (ix);
-   }
   /* If there is worker neutering, there must be vector
 neutering.  Otherwise the hardware will fail.  */
-  gcc_assert (!(mask & GOMP_DIM_MASK (GOMP_DIM_WORKER))
- || (mask & GOMP_DIM_MASK (GOMP_DIM_VECTOR)));
+  gcc_assert (!(oa.mask & GOMP_DIM_MASK (GOMP_DIM_WORKER))
+ || (oa.mask & GOMP_DIM_MASK (GOMP_DIM_VECTOR)));
 
   /* Discover & process partitioned regions.  */
   parallel *pars = nvptx_discover_pars (_insn_map);
   nvptx_process_pars (pars);
-  nvptx_neuter_pars (pars, mask, 0);
+  nvptx_neuter_pars (pars, oa.mask, 0);
   delete pars;
 }
 
-- 
2.7.4



[PATCH 04/11] [nvptx] Make nvptx state propagation function names more generic

2018-07-24 Thread cesar
From: Cesar Philippidis 

This patch renames various state propagation functions into somewhat
that reflects their usage in generic worker and vector contexts. E.g.,
whereas before nvptx_wpropagate used to be used exclusively for worker
state propagation, it will eventually be used for any state
propagation using multiple warps. Because variable length vectors will
be able use both shared memory and warp shuffles for broadcasting, the
old vector-specific functions now contain 'warp' in their name.

2018-XX-YY  Cesar Philippidis  

gcc/
* config/nvptx/nvptx.c (nvptx_gen_wcast): Rename as
nvptx_gen_warp_bcast.
(nvptx_gen_wcast): Rename to nvptx_gen_shared_bcast, add bool
vector argument, and update call to nvptx_gen_shared_bcast.
(propagator_fn): Add bool argument.
(nvptx_propagate): New bool argument, pass bool argument to fn.
(vprop_gen): Rename to warp_prop_gen, update call to
nvptx_gen_warp_bcast.
(nvptx_vpropagate): Rename to nvptx_warp_propagate, update call to
nvptx_propagate.
(wprop_gen): Rename to shared_prop_gen, update call to
nvptx_gen_shared_bcast.
(nvptx_wpropagate): Rename to nvptx_shared_propagate, update call
to nvptx_propagate.
(nvptx_wsync): Rename to nvptx_cta_sync.
(nvptx_single): Update calls to nvptx_gen_warp_bcast,
nvptx_gen_shared_bcast and nvptx_cta_sync.
(nvptx_process_pars): Likewise.
(write_worker_buffer): Rename as write_shared_buffer.
(nvptx_file_end): Update calls to write_shared_buffer.
(nvptx_expand_worker_addr): Rename as nvptx_expand_shared_addr.
(nvptx_expand_builtin): Update call to nvptx_expand_shared_addr.
(nvptx_get_worker_red_addr): Rename as nvptx_get_shared_red_addr.
(nvptx_goacc_reduction_setup): Update call to
nvptx_get_shared_red_addr.
(nvptx_goacc_reduction_fini): Likewise.
(nvptx_goacc_reduction_teardown): Likewise.

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 1b83b3c..447425f 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -1750,7 +1750,7 @@ nvptx_gen_shuffle (rtx dst, rtx src, rtx idx, 
nvptx_shuffle_kind kind)
across the vectors of a single warp.  */
 
 static rtx
-nvptx_gen_vcast (rtx reg)
+nvptx_gen_warp_bcast (rtx reg)
 {
   return nvptx_gen_shuffle (reg, reg, const0_rtx, SHUFFLE_IDX);
 }
@@ -1781,7 +1781,8 @@ enum propagate_mask
how many loop iterations will be executed (0 for not a loop).  */

 static rtx
-nvptx_gen_wcast (rtx reg, propagate_mask pm, unsigned rep, broadcast_data_t 
*data)
+nvptx_gen_shared_bcast (rtx reg, propagate_mask pm, unsigned rep,
+   broadcast_data_t *data, bool vector)
 {
   rtx  res;
   machine_mode mode = GET_MODE (reg);
@@ -1795,7 +1796,7 @@ nvptx_gen_wcast (rtx reg, propagate_mask pm, unsigned 
rep, broadcast_data_t *dat
start_sequence ();
if (pm & PM_read)
  emit_insn (gen_sel_truesi (tmp, reg, GEN_INT (1), const0_rtx));
-   emit_insn (nvptx_gen_wcast (tmp, pm, rep, data));
+   emit_insn (nvptx_gen_shared_bcast (tmp, pm, rep, data, vector));
if (pm & PM_write)
  emit_insn (gen_rtx_SET (reg, gen_rtx_NE (BImode, tmp, const0_rtx)));
res = get_insns ();
@@ -1815,6 +1816,7 @@ nvptx_gen_wcast (rtx reg, propagate_mask pm, unsigned 
rep, broadcast_data_t *dat
  oacc_bcast_align = align;
data->offset = (data->offset + align - 1) & ~(align - 1);
addr = data->base;
+   gcc_assert (data->base != NULL);
if (data->offset)
  addr = gen_rtx_PLUS (Pmode, addr, GEN_INT (data->offset));
  }
@@ -3816,11 +3818,11 @@ nvptx_find_sese (auto_vec , 
bb_pair_vec_t )
regions and (b) only propagating stack entries that are used.  The
latter might be quite hard to determine.  */
 
-typedef rtx (*propagator_fn) (rtx, propagate_mask, unsigned, void *);
+typedef rtx (*propagator_fn) (rtx, propagate_mask, unsigned, void *, bool);
 
 static bool
 nvptx_propagate (bool is_call, basic_block block, rtx_insn *insn,
-propagate_mask rw, propagator_fn fn, void *data)
+propagate_mask rw, propagator_fn fn, void *data, bool vector)
 {
   bitmap live = DF_LIVE_IN (block);
   bitmap_iterator iterator;
@@ -3855,7 +3857,7 @@ nvptx_propagate (bool is_call, basic_block block, 
rtx_insn *insn,
  
  emit_insn (gen_rtx_SET (idx, GEN_INT (fs)));
  /* Allow worker function to initialize anything needed.  */
- rtx init = fn (tmp, PM_loop_begin, fs, data);
+ rtx init = fn (tmp, PM_loop_begin, fs, data, vector);
  if (init)
emit_insn (init);
  emit_label (label);
@@ -3864,7 +3866,7 @@ nvptx_propagate (bool is_call, basic_block block, 
rtx_insn *insn,
}
   if (rw & PM_read)
emit_insn (gen_rtx_SET (tmp, gen_rtx_MEM (DImode, 

[PATCH 02/11] [nvptx] Rename worker_bcast variables oacc_bcast.

2018-07-24 Thread cesar
From: Cesar Philippidis 

Eventually, we want the nvptx BE to use a common shared memory buffer
for both worker and vector state propagation (albeit using different
partitions of shared memory for each logical thread). This patch
renames the worker_bcast variables into a more generic oacc_bcast.

2018-XX-YY  Cesar Philippidis  

gcc/
* config/nvptx/nvptx.c (worker_bcast_size): Rename as
oacc_bcast_size.
(worker_bcast_align): Rename as oacc_bcast_align.
(worker_bcast_sym): Rename as oacc_bcast_sym.
(nvptx_option_override): Update usage of oacc_bcast_*.
(struct wcast_data_t): Rename as broadcast_data_t.
(nvptx_gen_wcast): Update type of data argument and usage of
oacc_bcast_align.
(wprop_gen): Update type of data_ and usage of oacc_bcast_align.
(nvptx_wpropagate): Update type of data and usage of
oacc_bcast_{sym,size}.
(nvptx_single): Update type of data and usage of oacc_bcast_size.
(nvptx_file_end): Update usage of oacc_bcast_{sym,align,size}.

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 521f83e..fb3e0c7 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -129,14 +129,15 @@ struct tree_hasher : ggc_cache_ptr_hash
 static GTY((cache)) hash_table *declared_fndecls_htab;
 static GTY((cache)) hash_table *needed_fndecls_htab;
 
-/* Buffer needed to broadcast across workers.  This is used for both
-   worker-neutering and worker broadcasting.  It is shared by all
-   functions emitted.  The buffer is placed in shared memory.  It'd be
-   nice if PTX supported common blocks, because then this could be
-   shared across TUs (taking the largest size).  */
-static unsigned worker_bcast_size;
-static unsigned worker_bcast_align;
-static GTY(()) rtx worker_bcast_sym;
+/* Buffer needed to broadcast across workers and vectors.  This is
+   used for both worker-neutering and worker broadcasting, and
+   vector-neutering and boardcasting when vector_length > 32.  It is
+   shared by all functions emitted.  The buffer is placed in shared
+   memory.  It'd be nice if PTX supported common blocks, because then
+   this could be shared across TUs (taking the largest size).  */
+static unsigned oacc_bcast_size;
+static unsigned oacc_bcast_align;
+static GTY(()) rtx oacc_bcast_sym;
 
 /* Buffer needed for worker reductions.  This has to be distinct from
the worker broadcast array, as both may be live concurrently.  */
@@ -209,9 +210,9 @@ nvptx_option_override (void)
   declared_libfuncs_htab
 = hash_table::create_ggc (17);
 
-  worker_bcast_sym = gen_rtx_SYMBOL_REF (Pmode, "__worker_bcast");
-  SET_SYMBOL_DATA_AREA (worker_bcast_sym, DATA_AREA_SHARED);
-  worker_bcast_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
+  oacc_bcast_sym = gen_rtx_SYMBOL_REF (Pmode, "__oacc_bcast");
+  SET_SYMBOL_DATA_AREA (oacc_bcast_sym, DATA_AREA_SHARED);
+  oacc_bcast_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
 
   worker_red_sym = gen_rtx_SYMBOL_REF (Pmode, "__worker_red");
   SET_SYMBOL_DATA_AREA (worker_red_sym, DATA_AREA_SHARED);
@@ -1756,7 +1757,7 @@ nvptx_gen_vcast (rtx reg)
 
 /* Structure used when generating a worker-level spill or fill.  */
 
-struct wcast_data_t
+struct broadcast_data_t
 {
   rtx base;  /* Register holding base addr of buffer.  */
   rtx ptr;  /* Iteration var,  if needed.  */
@@ -1780,7 +1781,7 @@ enum propagate_mask
how many loop iterations will be executed (0 for not a loop).  */

 static rtx
-nvptx_gen_wcast (rtx reg, propagate_mask pm, unsigned rep, wcast_data_t *data)
+nvptx_gen_wcast (rtx reg, propagate_mask pm, unsigned rep, broadcast_data_t 
*data)
 {
   rtx  res;
   machine_mode mode = GET_MODE (reg);
@@ -1810,8 +1811,8 @@ nvptx_gen_wcast (rtx reg, propagate_mask pm, unsigned 
rep, wcast_data_t *data)
  {
unsigned align = GET_MODE_ALIGNMENT (mode) / BITS_PER_UNIT;
 
-   if (align > worker_bcast_align)
- worker_bcast_align = align;
+   if (align > oacc_bcast_align)
+ oacc_bcast_align = align;
data->offset = (data->offset + align - 1) & ~(align - 1);
addr = data->base;
if (data->offset)
@@ -3916,15 +3917,15 @@ nvptx_vpropagate (bool is_call, basic_block block, 
rtx_insn *insn)
 static rtx
 wprop_gen (rtx reg, propagate_mask pm, unsigned rep, void *data_)
 {
-  wcast_data_t *data = (wcast_data_t *)data_;
+  broadcast_data_t *data = (broadcast_data_t *)data_;
 
   if (pm & PM_loop_begin)
 {
   /* Starting a loop, initialize pointer.*/
   unsigned align = GET_MODE_ALIGNMENT (GET_MODE (reg)) / BITS_PER_UNIT;
 
-  if (align > worker_bcast_align)
-   worker_bcast_align = align;
+  if (align > oacc_bcast_align)
+   oacc_bcast_align = align;
   data->offset = (data->offset + align - 1) & ~(align - 1);
 
   data->ptr = gen_reg_rtx (Pmode);
@@ -3949,7 +3950,7 @@ wprop_gen (rtx reg, propagate_mask pm, 

[PATCH 00/11] [nvptx] Initial vector length changes

2018-07-24 Thread cesar
From: Cesar Philippidis 

This patch series contains various cleanups and structural
reorganizations to the NVPTX BE in preparation for the forthcoming
variable length vector length enhancements. Tom, in order to make
these changes easier for you to review, I broke these patches into
logical components. If approved for trunk, would you like to see these
patches committed individually, or all together in a single huge
commit?

One notable change in this patch set is the partial inclusion of the
PTX_DEFAULT_RUNTIME_DIM change that I previously placed with the
libgomp default geometry update patch that I posted a couple of weeks
ago. I don't want to block this patch series so I included the nvptx
changes in patch 01.

It this OK for trunk? I regtested both standalone and offloading
compiliers. I'm seeing some inconsistencies in the standalone compiler
results, so I might rerun those just to be safe. But the results using
nvptx as an offloading compiler came back clean.

Thanks,
Cesar


[PATCH 01/11] [nvptx] Update openacc dim macros

2018-07-24 Thread cesar
From: Cesar Philippidis 

Besides for updating the macros for the NVPTX OpenACC dims, this patch
also renames PTX_GANG_DEFAULT to PTX_DEFAULT_RUNTIME_DIM. I had
originally included the PTX_GANG_DEFAULT hunk in an earlier libgomp
patch, but going forward it makes sense to isolate the nvptx and
libgomp changes when possible, which this patch series does.

2018-XX-YY  Cesar Philippidis  

gcc/
config/nvptx/nvptx.c (PTX_GANG_DEFAULT): Rename to
PTX_DEFAULT_RUNTIME_DIM.
* (PTX_VECTOR_LENGTH, PTX_WORKER_LENGTH,
PTX_DEFAULT_RUNTIME_DIM): Move to the top of the file.
(PTX_WARP_SIZE): Define.
(PTX_CTA_SIZE): Define.
(nvptx_simt_vf): Return PTX_WARP_SIZE instead of PTX_VECTOR_LENGTH.
(nvptx_goacc_validate_dims): Use PTX_DEFAULT_RUNTIME_DIM.

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 5608bee..521f83e 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -81,6 +81,13 @@
 #define WORKAROUND_PTXJIT_BUG_2 1
 #define WORKAROUND_PTXJIT_BUG_3 1
 
+/* Define dimension sizes for known hardware.  */
+#define PTX_VECTOR_LENGTH 32
+#define PTX_WORKER_LENGTH 32
+#define PTX_DEFAULT_RUNTIME_DIM 0 /* Defer to runtime.  */
+#define PTX_WARP_SIZE 32
+#define PTX_CTA_SIZE 1024
+
 /* The various PTX memory areas an object might reside in.  */
 enum nvptx_data_area
 {
@@ -5161,18 +5168,13 @@ nvptx_expand_builtin (tree exp, rtx target, rtx 
ARG_UNUSED (subtarget),
 default: gcc_unreachable ();
 }
 }
-
-/* Define dimension sizes for known hardware.  */
-#define PTX_VECTOR_LENGTH 32
-#define PTX_WORKER_LENGTH 32
-#define PTX_GANG_DEFAULT  0 /* Defer to runtime.  */
 
 /* Implement TARGET_SIMT_VF target hook: number of threads in a warp.  */
 
 static int
 nvptx_simt_vf ()
 {
-  return PTX_VECTOR_LENGTH;
+  return PTX_WARP_SIZE;
 }
 
 /* Validate compute dimensions of an OpenACC offload or routine, fill
@@ -5216,7 +5218,7 @@ nvptx_goacc_validate_dims (tree decl, int dims[], int 
fn_level)
   if (dims[GOMP_DIM_WORKER] < 0)
dims[GOMP_DIM_WORKER] = PTX_WORKER_LENGTH;
   if (dims[GOMP_DIM_GANG] < 0)
-   dims[GOMP_DIM_GANG] = PTX_GANG_DEFAULT;
+   dims[GOMP_DIM_GANG] = PTX_DEFAULT_RUNTIME_DIM;
   changed = true;
 }
 
-- 
2.7.4



[Patch] [Aarch64] PR 86538 - Define __ARM_FEATURE_LSE if LSE is available

2018-07-24 Thread Steve Ellcey
This is a patch for PR 86538, to define an __ARM_FEATURE_LSE macro
when LSE is available.  Richard Earnshaw closed PR 86538 as WONTFIX
because the ACLE (Arm C Language Extension) does not require this
macro and because he is concerned that it might encourage people to
use inline assembly instead of the __sync and atomic intrinsics.
(See actual comments in the defect report.)

While I agree that we want people to use the intrinsics I still think
there are use cases where people may want to know if LSE is available
or not and there is currrently no (simple) way to determine if this feature
is available since it can be turned or and off independently of the
architecture used.  Also, as a general principle, I  think any feature
that can be toggled on or off by the compiler should provide a way for
users to determine what its state is.

So what do other ARM maintainers and users think?  Is this a useful
feature to have in GCC?

Steve Ellcey
sell...@cavium.com


2018-07-24  Steve Ellcey  

PR target/86538
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins):
Add define of __ARM_FEATURE_LSE.


diff --git a/gcc/config/aarch64/aarch64-c.c b/gcc/config/aarch64/aarch64-c.c
index 40c738c..e057ba9 100644
--- a/gcc/config/aarch64/aarch64-c.c
+++ b/gcc/config/aarch64/aarch64-c.c
@@ -154,6 +154,9 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
   aarch64_def_or_undef (TARGET_SM4, "__ARM_FEATURE_SM4", pfile);
   aarch64_def_or_undef (TARGET_F16FML, "__ARM_FEATURE_FP16_FML", pfile);
 
+  /* This is not required by ACLE, but it is useful.  */
+  aarch64_def_or_undef (TARGET_LSE, "__ARM_FEATURE_LSE", pfile);
+
   /* Not for ACLE, but required to keep "float.h" correct if we switch
  target between implementations that do or do not support ARMv8.2-A
  16-bit floating-point extensions.  */


Re: committed: remove redundant -Wall from -Warray-bounds (PR 82063)

2018-07-24 Thread Martin Sebor

On 07/24/2018 01:48 PM, Franz Sirl wrote:

Am 2018-07-24 um 17:35 schrieb Martin Sebor:

On 07/24/2018 03:24 AM, Franz Sirl wrote:

Am 2018-07-20 um 23:22 schrieb Martin Sebor:

As the last observation in PR 82063 Jim points out that

   Both -Warray-bounds and -Warray-bounds= are listed in the c.opt
   file as being enabled by -Wall, but they are the same option,
   and it causes this one option to be processed twice in the
   C_handle_option_auto function in the generated options.c file.
   It gets set to the same value twice, so it does work as intended,
   but this is wasteful.

I have removed the redundant -Wall from the first option and
committed the change as obvious in r262912.


Hi Martin,

this looks related to PR 68845 and my patch in there. I never posted it
to gcc-patches because I couldn't find a definitive answer on how
options duplicated between common.opt and c-family/c.opt are supposed to
be handled.
For example, Warray-bounds in common.opt is a separate option (not an
alias to Warray-bounds=), leading to separate enums for them. Is this
intended? Warray-bounds seemed to be the only option with an equal sign
doing it like that at that time. Now Wcast-align is doing the same...

Can you shed some light on this?


-Warray-bounds= (the form that takes an argument) was added in
r219577.  Before then, only the plain form existed.  If I had
to guess, the interplay between the two options (as opposed to
making the latter an alias for the new option) wasn't considered.
I didn't think of it until now either.  Your patch seems like
the right solution to me.  Let me know if you will submit it.
If not, I posted the patch below that touches this area and
that will likely need updating so I can roll your change into
it:

https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01286.html


I'll post a patch tomorrow, since I already have all the changes
available and tested here.

Note that one minor change with this patch is that with
-fdiagnostics-show-option the message will show -Warray-bounds= (equal
sign added) instead of -Warray-bounds.


That should be fine.

Martin



Re: [PATCH] fix a couple of bugs in const string folding (PR 86532)

2018-07-24 Thread Martin Sebor

On 07/20/2018 04:20 AM, Richard Biener wrote:

On Thu, 19 Jul 2018, Martin Sebor wrote:


Here's one more update with tweaks addressing a couple more
of Bernd's comments:

1) correct the use of TREE_STRING_LENGTH() where a number of
array elements is expected and not bytes
2) set CHARTYPE as soon as it's first determined rather than
trying to extract it again later


Please look at Bernds followup comments.  One additional note:

I see you are ultimatively using CHARTYPE to get at the size
of the access.  That is wrong.

   if (TREE_CODE (arg) == ADDR_EXPR)
 {
+  tree argtype = TREE_TYPE (arg);
+  chartype = argtype;
+
   arg = TREE_OPERAND (arg, 0);
   tree ref = arg;
   if (TREE_CODE (arg) == ARRAY_REF)
{

so the "access" is of size array_ref_element_size (arg) here.  You
may not simply use TYPE_SIZE_UNIT of sth.

That is, lookign at current trunk,

  if (TREE_CODE (arg) == ADDR_EXPR)
{
  arg = TREE_OPERAND (arg, 0);
  tree ref = arg;
  if (TREE_CODE (arg) == ARRAY_REF)
{
  tree idx = TREE_OPERAND (arg, 1);
  if (TREE_CODE (idx) != INTEGER_CST)
{
  /* Extract the variable index to prevent
 get_addr_base_and_unit_offset() from failing due to
 it.  Use it later to compute the non-constant offset
 into the string and return it to the caller.  */
  varidx = idx;
  ref = TREE_OPERAND (arg, 0);

you should scale the index here by array_ref_element_size (arg).
Or simply rewrite this to instead of using get_addr_base_and_unit_offset,
use get_inner_reference which does all that magic for you.

That is, you shouldn't need chartype.


I've made use of size array_ref_element_size() here as you suggest
and eliminated the type.  For the purposes of testing though,
I haven't been able to come up with a test case that would have
the function return something other than TYPE_SIZE_UNIT().  IIUC,
the function is used to compute the size of elements of overaligned
types and there is no way that I know of to create an array of
overaligned characters.  (If there is a way to exercise this I'd
appreciate a test case so I can add it to the test suite).

I've also fixed the other bug Bernd pointed with pointers to arrays.
The fix seems small enough that it makes sense to handle at the same
time as this bug.

Attached is an update with these changes.

Martin



Richard.



On 07/19/2018 01:49 PM, Martin Sebor wrote:

On 07/19/2018 01:17 AM, Richard Biener wrote:

On Wed, 18 Jul 2018, Martin Sebor wrote:


+  while (TREE_CODE (chartype) != INTEGER_TYPE)
+chartype = TREE_TYPE (chartype);

This is a bit concerning.  First under what conditions is chartype
not
going to be an INTEGER_TYPE?  And under what conditions will
extracting
its type ultimately lead to something that is an INTEGER_TYPE?


chartype is usually (maybe even always) pointer type here:

  const char a[] = "123";
  extern int i;
  n = strlen ([i]);


But your hunch was correct that the loop isn't safe because
the element type need not be an integer (I didn't know/forgot
that the function is called for non-strings too).  The loop
should be replaced by:

  while (TREE_CODE (chartype) == ARRAY_TYPE
 || TREE_CODE (chartype) == POINTER_TYPE)
chartype = TREE_TYPE (chartype);


As this function may be called "late" you need to cope with
the middle-end ignoring type changes and thus happily
passing int *** directly rather than (char *) of that.

Also doesn't the above yield int for int *[]?


I don't think it ever gets this far for either a pointer to
an array of int, or for an array of pointers to int.  So for
something like the following the function fails earlier:

  const int* const a[2] = { ... };
  const char* (const *p)[2] = 

  int f (void)
  {
return __builtin_memcmp (*p, "12345678", 8);
  }

(Assuming this is what you were asking about.)


I guess you really want

   if (POINTER_TYPE_P (chartype))
 chartype = TREE_TYPE (chartype);
   while (TREE_CODE (chartype) == ARRAY_TYPE)
 chartype = TREE_TYPE (chartype);

?


That seems to work too.  Attached is an update with this tweak.
The update also addresses some of Bernd's comments: it removes
the pointless second test in:

if (TREE_CODE (type) == ARRAY_TYPE
&& TREE_CODE (type) != INTEGER_TYPE)

the unused assignment to chartype in:

   else if (DECL_P (arg))
 {
   array = arg;
   chartype = TREE_TYPE (arg);
 }

and calls string_constant() instead of strnlen() to compute
the length of a generic string.

Other improvements  are possible in this area but they are
orthogonal to the bug I'm trying to fix so I'll post separate
patches for some of those.

Martin







PR tree-optimization/86622 - incorrect strlen of array of array plus variable offset
PR tree-optimization/86532 - Wrong code due to a wrong strlen folding starting with r262522

gcc/ChangeLog:

	PR tree-optimization/86622
	PR 

Re: committed: remove redundant -Wall from -Warray-bounds (PR 82063)

2018-07-24 Thread Franz Sirl

Am 2018-07-24 um 17:35 schrieb Martin Sebor:

On 07/24/2018 03:24 AM, Franz Sirl wrote:

Am 2018-07-20 um 23:22 schrieb Martin Sebor:

As the last observation in PR 82063 Jim points out that

   Both -Warray-bounds and -Warray-bounds= are listed in the c.opt
   file as being enabled by -Wall, but they are the same option,
   and it causes this one option to be processed twice in the
   C_handle_option_auto function in the generated options.c file.
   It gets set to the same value twice, so it does work as intended,
   but this is wasteful.

I have removed the redundant -Wall from the first option and
committed the change as obvious in r262912.


Hi Martin,

this looks related to PR 68845 and my patch in there. I never posted it
to gcc-patches because I couldn't find a definitive answer on how
options duplicated between common.opt and c-family/c.opt are supposed to
be handled.
For example, Warray-bounds in common.opt is a separate option (not an
alias to Warray-bounds=), leading to separate enums for them. Is this
intended? Warray-bounds seemed to be the only option with an equal sign
doing it like that at that time. Now Wcast-align is doing the same...

Can you shed some light on this?


-Warray-bounds= (the form that takes an argument) was added in
r219577.  Before then, only the plain form existed.  If I had
to guess, the interplay between the two options (as opposed to
making the latter an alias for the new option) wasn't considered.
I didn't think of it until now either.  Your patch seems like
the right solution to me.  Let me know if you will submit it.
If not, I posted the patch below that touches this area and
that will likely need updating so I can roll your change into
it:

https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01286.html


I'll post a patch tomorrow, since I already have all the changes 
available and tested here.


Note that one minor change with this patch is that with 
-fdiagnostics-show-option the message will show -Warray-bounds= (equal 
sign added) instead of -Warray-bounds.


Franz


Re: [5/5] C-SKY port: libgcc

2018-07-24 Thread Segher Boessenkool
On Tue, Jul 24, 2018 at 12:19:30PM -0600, Sandra Loosemore wrote:
> On 07/24/2018 12:10 PM, Segher Boessenkool wrote:
> >On Mon, Jul 23, 2018 at 10:26:35PM -0600, Sandra Loosemore wrote:
> >>diff --git a/libgcc/config.host b/libgcc/config.host
> >>index 18cabaf..b2ee0c9 100644
> >>--- a/libgcc/config.host
> >>+++ b/libgcc/config.host
> >>@@ -94,6 +94,9 @@ am33_2.0-*-linux*)
> >>  arc*-*-*)
> >>cpu_type=arc
> >>;;
> >>+csky*-*-*)
> >>+   cpu_type=csky
> >>+   ;;
> >>  arm*-*-*)
> >>cpu_type=arm
> >>;;
> >
> >This long list was alphabetic before (except x86_64 and tic6x, alas);
> >let's not make things worse?
> 
> Oops!  Good catch on that.  I'll take care of it.

Thanks!

Rest looks fine fwiw (I just skimmed it, and I cannot read .gz files
without some effort, so take it for what it's worth: not too much ;-) ).


Segher


Re: [PATCH] include more detail in -Warray-bounds (PR 86650)

2018-07-24 Thread Martin Sebor

On 07/24/2018 11:05 AM, David Malcolm wrote:

On Mon, 2018-07-23 at 20:56 -0600, Martin Sebor wrote:

On 07/23/2018 07:20 PM, David Malcolm wrote:

On Mon, 2018-07-23 at 17:49 -0600, Martin Sebor wrote:

(David, I'm hoping your your help here.  Please see the end.)

While looking into a recent -Warray-bounds instance in Glibc
involving inlining of large functions it became apparent that
GCC could do a better job of pinpointing the source of
the problem.

The attached patch makes a few adjustments to both
the pretty printer infrastructure and to VRP to make this
possible.  The diagnostic pretty printer already has directives
to print the inlining context for both tree and gcall* arguments,
so most of the changes just adjust things to be able to pass in
gimple* argument instead.

The only slightly interesting change is to print the declaration
to which the out-of-bounds array refers if one is known.

Tested on x86_64-linux with one regression.

The regression is in the gcc.dg/Warray-bounds.c test: the column
numbers of the warnings are off.  Adding the %G specifier to
the array bounds warnings in VRP has the unexpected effect of
expanding the extent of the underling. For instance, for a test
case like this:

   int a[10];

   void f (void)
   {
 a[-1] = 0;
   }

from the expected:

a[-1] = 0;
~^~~~

to this:

   a[-1] = 0;
~~^~~

David, do you have any idea how to avoid this?


Are you referring to the the various places in your patch (in e.g.
  vrp_prop::check_array_ref
  vrp_prop::check_mem_ref
  vrp_prop::search_for_addr_array
) where the patch changed things from this form:

  warning_at (location, OPT_Warray_bounds,
  "[...format string...]", ARGS...);

to this form:

  warning_at (location, OPT_Warray_bounds,
  "%G[...format string...]", stmt, ARGS...);


Yes.



If so, there are two location_t values of interest here:
(a) the "location" value, and
(b) gimple_location (stmt)

My recollection is that %G and %K override the "location" value
passed
in as the first param to the diagnostic call, overwriting it within
the
diagnostic_info's text_info with the location value from the %K/%G
(which also set up the pp_ti_abstract_origin of the text_info from
the
block information stashed in the ad-hoc data part of the location,
so
that the pretty-printer prints the inlining chain).


Would having the pretty printer restore the location and
the block after it's done printing the context and before
processing the rest of the format string fix it?  (I have
only a vague idea how this all works so I'm not sure if
this even makes sense.)


Structurally, it looks like this:

Temporaries during the emission of   |  Long-lived stuff:
the diagnostic:  |
 |+-+
++   ||global_dc|
|diagnostic_info |   |+-+
|++  |   |
||text_info:  |  |   |
||  m_richloc-+--+---> rich_location |
||  x_data+--+---+--> block (via pp_ti_abstract_origin)
|++  |   |
++   |
 |

The location_t of the diagnostic is stored in the rich_location.

Calling:
  warning_at (location)
creates a rich_location wrapping "location" and uses it as above.

During formatting, the %K/%G codes set text_info.x_data via
pp_ti_abstract_origin and overwrite the location_t in the
rich_location.

So in theory we could have a format code that sets the block and
doesn't touch the rich_location.  But that seems like overkill to me.


I wasn't thinking of a new format.  Rather, I thought the %K
would save the current block and location (set by the location
argument to warning_at), then after printing the inlining stack
but before printing the rest of the diagnostic the printer would
restore the saved block and location.  I still don't know enough
to tell if it would work.

In any event, if it's easier to always print the inlining stack
and get rid of %K and %G then that would be preferable.  I don't
think they are used for any other purpose (i.e., they are always
used as the first directive in a format string).


[aside, why don't we always just print the inlining chain?  IIRC,
%K
and %G feel too much like having to jump through hoops to me, given
that gimple_block is looking at gimple_location anyway, why not
just
use the location in the location_t's ad-hoc data; I have a feeling
there's a PR open about this, but I don't have it to hand right
now].


That would make sense to me.  I think that's also what we
agreed would be the way forward the last time we discussed
this.


(nods)


So how do we go about making this happen?  Somewhat selfishly
I was sort of waiting for you to take the lead on it since
you're much more familiar with the code than I am :)  But
that doesn't mean I can't try to tackle it myself.  If it seems
like something you can fit 

Re: [PATCH][GCC][front-end][build-machinery][opt-framework] Allow setting of stack-clash via configure options. [Patch (4/6)]

2018-07-24 Thread Alexandre Oliva
Hello, Christina,

On Jul 24, 2018, Tamar Christina  wrote:

> gcc/
> 2018-07-24  Tamar Christina  

>   PR target/86486
>   * configure.ac: Add stack-clash-protection-guard-size.
>   * doc/install.texi: Document it.
>   * config.in (DEFAULT_STK_CLASH_GUARD_SIZE): New.
>   * params.def: Update comment for guard-size.
>   * configure: Regenerate.

The configury bits look almost good to me.

I wish the help message, comments and docs expressed somehow that the
given power of two expresses a size in bytes, rather than in kilobytes,
bits or any other unit that might be reasonably assumed to express stack
sizes.  I'm afraid I don't know the best way to accomplish that in a few
words.

> +stk_clash_default=12

This seems to be left-over from an earlier patch, as it is now unused
AFAICT.

Thanks,

-- 
Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist


Re: [RFC 2/3, debug] Add fkeep-vars-live

2018-07-24 Thread Alexandre Oliva
On Jul 24, 2018, Tom de Vries  wrote:

> This patch adds fake uses of user variables at the point where they go out of
> scope, to keep user variables inspectable throughout the application.

I suggest also adding such uses before sets, so that variables aren't
regarded as dead and get optimized out in ranges between the end of a
live range and a subsequent assignment.

-- 
Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist


Re: [RFC 1/3, debug] Add fdebug-nops

2018-07-24 Thread Alexandre Oliva
On Jul 24, 2018, Tom de Vries  wrote:

> There's a design principle in GCC that code generation and debug generation
> are independent.  This guarantees that if you're encountering a problem in an
> application without debug info, you can recompile it with -g and be certain
> that you can reproduce the same problem, and use the debug info to debug the
> problem.  This invariant is enforced by bootstrap-debug.  The fdebug-nops
> breaks this invariant

I thought of a way to not break it: enable the debug info generation
machinery, including VTA and SFN, but discard those only at the very end
if -g is not enabled.  The downside is that it would likely slow -Og
down significantly, but who uses it without -g anyway?

-- 
Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist


Re: [PATCH] Introduce instance discriminators

2018-07-24 Thread Alexandre Oliva
On Jul 19, 2018, Richard Biener  wrote:

> Oh, that probably wasn't omitted on purpose.  Cary said it was used
> for profiling but I can't see any such use.

> Is the instance discriminator stuff also used for profiling?

Not that I know, but...  I probably wouldn't know yet ;-)

Anyway, it was easy enough to implement this:

>> I suspect there might be a way to assign instance discriminator numbers
>> to individual function DECLs, and then walk up the lexical block tree to
>> identify the DECL containing the block so as to obtain the discriminator
>> number.

and then, in a subsequent patch, I went ahead and added support for LTO,
saving and recovering discriminator info for instances and, while at
that, for basic blocks too.

Besides sucessfully regstrapping the first two patches on
x86_64-linux-gnu, I have tested this patchset with an additional
bootstrap with the third (throw-away) patch, that adds -gnateS to Ada
compilations in gcc/ada, libada and gnattools.  I also tested the saving
and restoring of discriminators for LTO by manually inspecting the line
number tables in LTO-recompiled executables, to check that they retained
the instance or BB discriminator numbers that went into the non-LTO
object files.

Ok to install the first two patches?  (the third is just for reference)


Introduce instance discriminators

From: Alexandre Oliva 

With -gnateS, the Ada compiler sets itself up to output discriminators
for different instantiations of generics, but the middle and back ends
have lacked support for that.  This patch introduces the missing bits,
translating the GNAT-internal representation of the per-file instance
map to an instance_table that maps decls to instance discriminators.


From: Alexandre Oliva  , Olivier Hainque  

for  gcc/ChangeLog

* debug.h (decl_to_instance_map_t): New type.
(decl_to_instance_map): Declare.
(maybe_create_decl_to_instance_map): New inline function.
* final.c (bb_discriminator, last_bb_discriminator): New statics,
to track basic block discriminators.
(final_start_function_1): Initialize them.
(final_scan_insn_1): On NOTE_INSN_BASIC_BLOCK, track
bb_discriminator.
(decl_to_instance_map): New variable.
(map_decl_to_instance, maybe_set_discriminator): New functions.
(notice_source_line): Set discriminator.

for  gcc/ada

* trans.c: Include debug.h.
(file_map): New static variable.
(gigi): Set it.  Create decl_to_instance_map when needed.
(Subprogram_Body_to_gnu): Pass gnu_subprog_decl to...
(Sloc_to_locus): ... this.  Add decl parm, map it to instance.
* gigi.h (Sloc_to_locus): Adjust declaration.

for  gcc/testsuite/ChangeLog

* gnat.dg/dinst.adb: New.
* gnat.dg/dinst_pkg.ads, gnat.dg/dinst_pkg.adb: New.
---
 gcc/ada/gcc-interface/gigi.h|2 +
 gcc/ada/gcc-interface/trans.c   |   29 ---
 gcc/debug.h |   15 
 gcc/final.c |   70 +--
 gcc/testsuite/gnat.dg/dinst.adb |   20 ++
 gcc/testsuite/gnat.dg/dinst_pkg.adb |7 
 gcc/testsuite/gnat.dg/dinst_pkg.ads |4 ++
 7 files changed, 137 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gnat.dg/dinst.adb
 create mode 100644 gcc/testsuite/gnat.dg/dinst_pkg.adb
 create mode 100644 gcc/testsuite/gnat.dg/dinst_pkg.ads

diff --git a/gcc/ada/gcc-interface/gigi.h b/gcc/ada/gcc-interface/gigi.h
index a75cb9094491..b890195cefc3 100644
--- a/gcc/ada/gcc-interface/gigi.h
+++ b/gcc/ada/gcc-interface/gigi.h
@@ -285,7 +285,7 @@ extern void process_type (Entity_Id gnat_entity);
location and false if it doesn't.  If CLEAR_COLUMN is true, set the column
information to 0.  */
 extern bool Sloc_to_locus (Source_Ptr Sloc, location_t *locus,
-  bool clear_column = false);
+  bool clear_column = false, const_tree decl = 0);
 
 /* Post an error message.  MSG is the error message, properly annotated.
NODE is the node at which to post the error and the node to use for the
diff --git a/gcc/ada/gcc-interface/trans.c b/gcc/ada/gcc-interface/trans.c
index 31e098a0c707..0371d00fce18 100644
--- a/gcc/ada/gcc-interface/trans.c
+++ b/gcc/ada/gcc-interface/trans.c
@@ -41,6 +41,7 @@
 #include "stmt.h"
 #include "varasm.h"
 #include "output.h"
+#include "debug.h"
 #include "libfuncs.h"  /* For set_stack_check_libfunc.  */
 #include "tree-iterator.h"
 #include "gimplify.h"
@@ -255,6 +256,12 @@ static tree create_init_temporary (const char *, tree, 
tree *, Node_Id);
 static const char *extract_encoding (const char *) ATTRIBUTE_UNUSED;
 static const char *decode_name (const char *) ATTRIBUTE_UNUSED;
 
+/* This makes gigi's file_info_ptr visible in this translation unit,
+   so that Sloc_to_locus can look it up when deciding whether to map
+   decls to instances.  */
+
+static struct File_Info_Type *file_map;
+

Re: [5/5] C-SKY port: libgcc

2018-07-24 Thread Sandra Loosemore

On 07/24/2018 12:10 PM, Segher Boessenkool wrote:

On Mon, Jul 23, 2018 at 10:26:35PM -0600, Sandra Loosemore wrote:

diff --git a/libgcc/config.host b/libgcc/config.host
index 18cabaf..b2ee0c9 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -94,6 +94,9 @@ am33_2.0-*-linux*)
  arc*-*-*)
cpu_type=arc
;;
+csky*-*-*)
+   cpu_type=csky
+   ;;
  arm*-*-*)
cpu_type=arm
;;


This long list was alphabetic before (except x86_64 and tic6x, alas);
let's not make things worse?


Oops!  Good catch on that.  I'll take care of it.

-Sandra


Re: [2/5] C-SKY port: Backend implementation

2018-07-24 Thread Sandra Loosemore

On 07/24/2018 09:45 AM, Jeff Law wrote:

On 07/23/2018 10:21 PM, Sandra Loosemore wrote:

2018-07-23  Jojo  
     Huibin Wang  
     Sandra Loosemore  
     Chung-Lin Tang  

     C-SKY port: Backend implementation

     gcc/
     * config/csky/*: New.
     * common/config/csky/*: New.


Let's avoid gratutious whitespace that attempts to line up conditionals.
   As an example, look at the predicate csky_load_multiple_operation.  I
think just doing a quick pass over the .c, .h and main .md files should
be sufficient here.


OK, will do.


I'm not a big fan of more awk code, but I'm not going to object to it :-)

Why does the port have its own little pass for condition code
optimization (cse_cc)?  What is it doing that can't be done with our
generic optimizers?


This pass was included in the initial patch set we got from C-SKY, and 
as it didn't seem to break anything I left it in.  Perhaps C-SKY can 
provide a testcase that demonstrates why it's still useful in the 
current version of GCC; otherwise we can remove this from the initial 
port submission and restore it later if some performance analysis shows 
it is still worthwhile.



Any thoughts on using the newer function descriptor bits rather than old
style stack trampolines?


Has that been committed?  I vaguely remembered discussion of a new way 
to handle nested functions without using the trampoline interface, but I 
couldn't find any documentation in the internals manual.



I don't see anything terribly concerning in the core of the port.  The
amount of support code for minipool is huge and I wonder if some sharing
across the various ports would be possible, but I don't think that
should be a blocking issue for this port.


Yes, that code was clearly copied almost verbatim from the ARM backend. 
I left it alone as much as possible to simplify any future attempts at 
genericizing it.



Can you update the backends.html web page here appropriately for the
c-sky target?


Sure, I can take care of updating that when the port is committed.  I 
believe the right entry is


"csky  b   ia"


I'd like to take a closer look, but those are the high level comment's
I've got this morning :-)


Thanks.  I'll wait a bit for more comments to come in before preparing a 
revised patch.


-Sandra


Fix ceil_log2(0) (PR 86644)

2018-07-24 Thread Richard Sandiford
This PR shows a pathological case in which we try SLP vectorisation on
dead code.  We record that 0 bits of the result are enough to satisfy
all users (which is true), and that led to precision being 0 in:

static unsigned int
vect_element_precision (unsigned int precision)
{
  precision = 1 << ceil_log2 (precision);
  return MAX (precision, BITS_PER_UNIT);
}

ceil_log2 (0) returned 64 rather than 0, leading to 1 << 64, which is UB.

Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu.
OK to install?

Richard


2018-07-24  Richard Sandiford  

gcc/
* hwint.c (ceil_log2): Fix comment.  Return 0 for 0.

Index: gcc/hwint.c
===
--- gcc/hwint.c 2018-05-02 08:38:14.433364094 +0100
+++ gcc/hwint.c 2018-07-24 19:09:03.522774662 +0100
@@ -60,12 +60,12 @@ floor_log2 (unsigned HOST_WIDE_INT x)
   return t;
 }
 
-/* Given X, an unsigned number, return the largest Y such that 2**Y >= X.  */
+/* Given X, an unsigned number, return the least Y such that 2**Y >= X.  */
 
 int
 ceil_log2 (unsigned HOST_WIDE_INT x)
 {
-  return floor_log2 (x - 1) + 1;
+  return x == 0 ? 0 : floor_log2 (x - 1) + 1;
 }
 
 /* Return the logarithm of X, base 2, considering X unsigned,


Re: [5/5] C-SKY port: libgcc

2018-07-24 Thread Segher Boessenkool
On Mon, Jul 23, 2018 at 10:26:35PM -0600, Sandra Loosemore wrote:
> diff --git a/libgcc/config.host b/libgcc/config.host
> index 18cabaf..b2ee0c9 100644
> --- a/libgcc/config.host
> +++ b/libgcc/config.host
> @@ -94,6 +94,9 @@ am33_2.0-*-linux*)
>  arc*-*-*)
>   cpu_type=arc
>   ;;
> +csky*-*-*)
> + cpu_type=csky
> + ;;
>  arm*-*-*)
>   cpu_type=arm
>   ;;

This long list was alphabetic before (except x86_64 and tic6x, alas);
let's not make things worse?


Segher


Avoid _VINFO_MASKS for bb vectorisation (PR 86618)

2018-07-24 Thread Richard Sandiford
r262589 introduced another instance of the bug fixed in r258131.

Tested on aarch64-linux-gnu and applied as obvious.

Richard


2018-07-24  Richard Sandiford  

gcc/
PR tree-optimization/86618
* tree-vect-stmts.c (vectorizable_call): Don't take the address
of LOOP_VINFO_MASKS (loop_vinfo) when loop_vinfo is null.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   2018-07-24 19:03:04.0 +0100
+++ gcc/tree-vect-stmts.c   2018-07-24 19:03:23.521825230 +0100
@@ -3337,7 +3337,7 @@ vectorizable_call (gimple *gs, gimple_st
  needs to be generated.  */
   gcc_assert (ncopies >= 1);
 
-  vec_loop_masks *masks = _VINFO_MASKS (loop_vinfo);
+  vec_loop_masks *masks = (loop_vinfo ? _VINFO_MASKS (loop_vinfo) : NULL);
   if (!vec_stmt) /* transformation not required.  */
 {
   STMT_VINFO_TYPE (stmt_info) = call_vec_info_type;


Re: [PATCH 1/7] Add __builtin_speculation_safe_value

2018-07-24 Thread Richard Biener
On Mon, Jul 9, 2018 at 6:40 PM Richard Earnshaw
 wrote:
>
>
> This patch defines a new intrinsic function
> __builtin_speculation_safe_value.  A generic default implementation is
> defined which will attempt to use the backend pattern
> "speculation_safe_barrier".  If this pattern is not defined, or if it
> is not available, then the compiler will emit a warning, but
> compilation will continue.
>
> Note that the test spec-barrier-1.c will currently fail on all
> targets.  This is deliberate, the failure will go away when
> appropriate action is taken for each target backend.

So given this series is supposed to be backported I question

+rtx
+default_speculation_safe_value (machine_mode mode ATTRIBUTE_UNUSED,
+   rtx result, rtx val,
+   rtx failval ATTRIBUTE_UNUSED)
+{
+  emit_move_insn (result, val);
+#ifdef HAVE_speculation_barrier
+  /* Assume the target knows what it is doing: if it defines a
+ speculation barrier, but it is not enabled, then assume that one
+ isn't needed.  */
+  if (HAVE_speculation_barrier)
+emit_insn (gen_speculation_barrier ());
+
+#else
+  warning_at (input_location, 0,
+ "this target does not define a speculation barrier; "
+ "your program will still execute correctly, but speculation "
+ "will not be inhibited");
+#endif
+  return result;

which makes all but aarch64 archs warn on __bultin_speculation_safe_value
uses, even those that do not suffer from Spectre like all those embedded targets
where implementations usually do not speculate at all.

In fact for those targets the builtin stays in the way of optimization on GIMPLE
as well so we should fold it away early if neither the target hook is
implemented
nor there is a speculation_barrier insn.

So, please make resolve_overloaded_builtin return a no-op on such targets
which means you can remove the above warning.  Maybe such targets
shouldn't advertise / initialize the builtins at all?

The builtins also have no attributes which mean they are assumed to be
1) calling back into the CU via exported functions, 2) possibly throwing
exceptions, 3) affecting memory state.  I think you at least want
to use ATTR_NOTHROW_LEAF_LIST.

The builtins are not designed to be optimization or memory barriers as
far as I can see and should thus be CONST as well.

BUILT_IN_SPECULATION_SAFE_VALUE_PTR is declared but
nowhere generated?  Maybe

+case BUILT_IN_SPECULATION_SAFE_VALUE_N:
+  {
+   int n = speculation_safe_value_resolve_size (function, params);
+   tree new_function, first_param, result;
+   enum built_in_function fncode;
+
+   if (n == -1)
+ return error_mark_node;
+   else if (n == 0)
+ fncode = (enum built_in_function)((int)orig_code + 1);
+   else
+ fncode
+   = (enum built_in_function)((int)orig_code + exact_log2 (n) + 2);

resolve_size does that?  Why can that not return the built_in_function
itself or BUILT_IN_NONE on error to make that clearer?

Otherwise it looks reasonable but C FE maintainers should comment.
I miss C++ testcases (or rather testcases should be in c-c++-common).

Richard.

> gcc:
> * builtin-types.def (BT_FN_PTR_PTR_VAR): New function type.
> (BT_FN_I1_I1_VAR, BT_FN_I2_I2_VAR, BT_FN_I4_I4_VAR): Likewise.
> (BT_FN_I8_I8_VAR, BT_FN_I16_I16_VAR): Likewise.
> * builtins.def (BUILT_IN_SPECULATION_SAFE_VALUE_N): New builtin.
> (BUILT_IN_SPECULATION_SAFE_VALUE_PTR): New internal builtin.
> (BUILT_IN_SPECULATION_SAFE_VALUE_1): Likewise.
> (BUILT_IN_SPECULATION_SAFE_VALUE_2): Likewise.
> (BUILT_IN_SPECULATION_SAFE_VALUE_4): Likewise.
> (BUILT_IN_SPECULATION_SAFE_VALUE_8): Likewise.
> (BUILT_IN_SPECULATION_SAFE_VALUE_16): Likewise.
> * builtins.c (expand_speculation_safe_value): New function.
> (expand_builtin): Call it.
> * doc/cpp.texi: Document predefine __HAVE_SPECULATION_SAFE_VALUE.
> * doc/extend.texi: Document __builtin_speculation_safe_value.
> * doc/md.texi: Document "speculation_barrier" pattern.
> * doc/tm.texi.in: Pull in TARGET_SPECULATION_SAFE_VALUE.
> * doc/tm.texi: Regenerated.
> * target.def (speculation_safe_value): New hook.
> * targhooks.c (default_speculation_safe_value): New function.
> * targhooks.h (default_speculation_safe_value): Add prototype.
>
> c-family:
> * c-common.c (speculation_safe_resolve_size): New function.
> (speculation_safe_resolve_params): New function.
> (speculation_safe_resolve_return): New function.
> (resolve_overloaded_builtin): Handle __builtin_speculation_safe_value.
> * c-cppbuiltin.c (c_cpp_builtins): Add pre-define for
> __HAVE_SPECULATION_SAFE_VALUE.
>
> testsuite:
> * gcc.dg/spec-barrier-1.c: New test.
> * gcc.dg/spec-barrier-2.c: New test.
> * gcc.dg/spec-barrier-3.c: 

[PATCH] combine: Allow combining two insns to two insns

2018-07-24 Thread Segher Boessenkool
This patch allows combine to combine two insns into two.  This helps
in many cases, by reducing instruction path length, and also allowing
further combinations to happen.  PR85160 is a typical example of code
that it can improve.

This patch does not allow such combinations if either of the original
instructions was a simple move instruction.  In those cases combining
the two instructions increases register pressure without improving the
code.  With this move test register pressure does no longer increase
noticably as far as I can tell.

(At first I also didn't allow either of the resulting insns to be a
move instruction.  But that is actually a very good thing to have, as
should have been obvious).

Tested for many months; tested on about 30 targets.

I'll commit this later this week if there are no objections.


Segher


2018-07-24  Segher Boessenkool  

PR rtl-optimization/85160
* combine.c (is_just_move): New function.
(try_combine): Allow combining two instructions into two if neither of
the original instructions was a move.

---
 gcc/combine.c | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/gcc/combine.c b/gcc/combine.c
index cfe0f19..d64e84d 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -2604,6 +2604,17 @@ can_split_parallel_of_n_reg_sets (rtx_insn *insn, int n)
   return true;
 }
 
+/* Return whether X is just a single set, with the source
+   a general_operand.  */
+static bool
+is_just_move (rtx x)
+{
+  if (INSN_P (x))
+x = PATTERN (x);
+
+  return (GET_CODE (x) == SET && general_operand (SET_SRC (x), VOIDmode));
+}
+
 /* Try to combine the insns I0, I1 and I2 into I3.
Here I0, I1 and I2 appear earlier than I3.
I0 and I1 can be zero; then we combine just I2 into I3, or I1 and I2 into
@@ -2668,6 +2679,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
   int swap_i2i3 = 0;
   int split_i2i3 = 0;
   int changed_i3_dest = 0;
+  bool i2_was_move = false, i3_was_move = false;
 
   int maxreg;
   rtx_insn *temp_insn;
@@ -3059,6 +3071,10 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
   return 0;
 }
 
+  /* Record whether i2 and i3 are trivial moves.  */
+  i2_was_move = is_just_move (i2);
+  i3_was_move = is_just_move (i3);
+
   /* Record whether I2DEST is used in I2SRC and similarly for the other
  cases.  Knowing this will help in register status updating below.  */
   i2dest_in_i2src = reg_overlap_mentioned_p (i2dest, i2src);
@@ -4014,8 +4030,10 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
   && XVECLEN (newpat, 0) == 2
   && GET_CODE (XVECEXP (newpat, 0, 0)) == SET
   && GET_CODE (XVECEXP (newpat, 0, 1)) == SET
-  && (i1 || set_noop_p (XVECEXP (newpat, 0, 0))
- || set_noop_p (XVECEXP (newpat, 0, 1)))
+  && (i1
+  || set_noop_p (XVECEXP (newpat, 0, 0))
+  || set_noop_p (XVECEXP (newpat, 0, 1))
+  || (!i2_was_move && !i3_was_move))
   && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != ZERO_EXTRACT
   && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != STRICT_LOW_PART
   && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) != ZERO_EXTRACT
-- 
1.8.3.1



Re: [PATCH][debug] Handle references to skipped params in remap_ssa_name

2018-07-24 Thread Richard Biener
On Tue, Jul 24, 2018 at 4:41 PM Jakub Jelinek  wrote:
>
> On Tue, Jul 24, 2018 at 04:33:30PM +0200, Richard Biener wrote:
> > DW_OP_GNU_variable_value you mean.  That's true.  But I was talking about
>
> Sure.
>
> > omitting DW_AT_upper_bound which would correspond to int[] vs.
> > a empty location list DW_AT_upper_bound which would correspond to
> > int[].
>
> I think that is fine to differentiate.
>
> > Not sure if that is how it behaves (well, gdb still doesn't support
> > DW_OP_GNU_variable_value).  Related is probably that I wanted to have
> > debug info of VLAs in the abstract instance so we'd have a
> > DW_OP_GNU_variable_value
> > of a DIE in the abstract instance (with no location) and the concrete 
> > instance
> > DIE of the refered to DIE would only be related to it via its
> > DW_AT_abstract_origin
> > attribute.  Ideally that would just work but given lacking gdb support
> > I couldn't
> > figure that out and thus we re-emit the whole type in the concrete 
> > instances...
>
> The intent was that the debugger would do similar thing as if the user at that
> point asked for the value of that (integral) variable, whatever it would
> print would be just pushed to the DWARF stack.

OK, that is definitely exactly what I was hoping for.  I guess I
should contribute
a example like those in the DWARF specs with source and abstract instance
vs. inline instance for example with intended behavior so you can amend your
proposal.

> Not sure how far are the GDB folks with the DW_OP_GNU_variable_value
> support.

No idea - maybe Tom can help out here as well.

Richard.

> Jakub


Re: [PATCH] include more detail in -Warray-bounds (PR 86650)

2018-07-24 Thread David Malcolm
On Mon, 2018-07-23 at 20:56 -0600, Martin Sebor wrote:
> On 07/23/2018 07:20 PM, David Malcolm wrote:
> > On Mon, 2018-07-23 at 17:49 -0600, Martin Sebor wrote:
> > > (David, I'm hoping your your help here.  Please see the end.)
> > > 
> > > While looking into a recent -Warray-bounds instance in Glibc
> > > involving inlining of large functions it became apparent that
> > > GCC could do a better job of pinpointing the source of
> > > the problem.
> > > 
> > > The attached patch makes a few adjustments to both
> > > the pretty printer infrastructure and to VRP to make this
> > > possible.  The diagnostic pretty printer already has directives
> > > to print the inlining context for both tree and gcall* arguments,
> > > so most of the changes just adjust things to be able to pass in
> > > gimple* argument instead.
> > > 
> > > The only slightly interesting change is to print the declaration
> > > to which the out-of-bounds array refers if one is known.
> > > 
> > > Tested on x86_64-linux with one regression.
> > > 
> > > The regression is in the gcc.dg/Warray-bounds.c test: the column
> > > numbers of the warnings are off.  Adding the %G specifier to
> > > the array bounds warnings in VRP has the unexpected effect of
> > > expanding the extent of the underling. For instance, for a test
> > > case like this:
> > > 
> > >int a[10];
> > > 
> > >void f (void)
> > >{
> > >  a[-1] = 0;
> > >}
> > > 
> > > from the expected:
> > > 
> > > a[-1] = 0;
> > > ~^~~~
> > > 
> > > to this:
> > > 
> > >a[-1] = 0;
> > > ~~^~~
> > > 
> > > David, do you have any idea how to avoid this?
> > 
> > Are you referring to the the various places in your patch (in e.g.
> >   vrp_prop::check_array_ref
> >   vrp_prop::check_mem_ref
> >   vrp_prop::search_for_addr_array
> > ) where the patch changed things from this form:
> > 
> >   warning_at (location, OPT_Warray_bounds,
> >   "[...format string...]", ARGS...);
> > 
> > to this form:
> > 
> >   warning_at (location, OPT_Warray_bounds,
> >   "%G[...format string...]", stmt, ARGS...);
> 
> Yes.
> 
> > 
> > If so, there are two location_t values of interest here:
> > (a) the "location" value, and
> > (b) gimple_location (stmt)
> > 
> > My recollection is that %G and %K override the "location" value
> > passed
> > in as the first param to the diagnostic call, overwriting it within
> > the
> > diagnostic_info's text_info with the location value from the %K/%G
> > (which also set up the pp_ti_abstract_origin of the text_info from
> > the
> > block information stashed in the ad-hoc data part of the location,
> > so
> > that the pretty-printer prints the inlining chain).
> 
> Would having the pretty printer restore the location and
> the block after it's done printing the context and before
> processing the rest of the format string fix it?  (I have
> only a vague idea how this all works so I'm not sure if
> this even makes sense.)

Structurally, it looks like this:

Temporaries during the emission of   |  Long-lived stuff:
the diagnostic:  |
 |+-+
++   ||global_dc|
|diagnostic_info |   |+-+
|++  |   |
||text_info:  |  |   |
||  m_richloc-+--+---> rich_location |
||  x_data+--+---+--> block (via pp_ti_abstract_origin)
|++  |   |
++   |
 |

The location_t of the diagnostic is stored in the rich_location.

Calling:
  warning_at (location)
creates a rich_location wrapping "location" and uses it as above.

During formatting, the %K/%G codes set text_info.x_data via
pp_ti_abstract_origin and overwrite the location_t in the
rich_location.

So in theory we could have a format code that sets the block and
doesn't touch the rich_location.  But that seems like overkill to me.
   

> > 
> > [aside, why don't we always just print the inlining chain?  IIRC,
> > %K
> > and %G feel too much like having to jump through hoops to me, given
> > that gimple_block is looking at gimple_location anyway, why not
> > just
> > use the location in the location_t's ad-hoc data; I have a feeling
> > there's a PR open about this, but I don't have it to hand right
> > now].
> 
> That would make sense to me.  I think that's also what we
> agreed would be the way forward the last time we discussed
> this.

(nods)

> > 
> > It looks like the old location was (a), and now you're seeing (b),
> > since (b) is the location of the statement.
> 
> Yes, I think that's it.
> 
> > 
> > Whether or not this is a problem is difficult to tell; what does
> > the
> > full diagnostic look like? (you only quoted the
> > diagnostic_show_locus
> > part of it).
> 
> It looks like this:
> 
> e.c: In function ‘f’:
> e.c:5:9: warning: array subscript -1 is below array bounds of
> ‘int[10]’ 
> 

Re: [PATCH][GCC][mid-end] Allow larger copies when not slow_unaligned_access and no padding.

2018-07-24 Thread Tamar Christina
Hi Richard,

Thanks for the review!

The 07/23/2018 18:46, Richard Biener wrote:
> On July 23, 2018 7:01:23 PM GMT+02:00, Tamar Christina 
>  wrote:
> >Hi All,
> >
> >This allows copy_blkmode_to_reg to perform larger copies when it is
> >safe to do so by calculating
> >the bitsize per iteration doing the maximum copy allowed that does not
> >read more
> >than the amount of bits left to copy.
> >
> >Strictly speaking, this copying is only done if:
> >
> >  1. the target supports fast unaligned access
> >  2. no padding is being used.
> >
> >This should avoid the issues of the first patch (PR85123) but still
> >work for targets that are safe
> >to do so.
> >
> >Original patch https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01088.html
> >Previous respin
> >https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00239.html
> >
> >
> >This produces for the copying of a 3 byte structure:
> >
> >fun3:
> > adrpx1, .LANCHOR0
> > add x1, x1, :lo12:.LANCHOR0
> > mov x0, 0
> > sub sp, sp, #16
> > ldrhw2, [x1, 16]
> > ldrbw1, [x1, 18]
> > add sp, sp, 16
> > bfi x0, x2, 0, 16
> > bfi x0, x1, 16, 8
> > ret
> >
> >whereas before it was producing
> >
> >fun3:
> > adrpx0, .LANCHOR0
> > add x2, x0, :lo12:.LANCHOR0
> > sub sp, sp, #16
> > ldrhw1, [x0, #:lo12:.LANCHOR0]
> > ldrbw0, [x2, 2]
> > strhw1, [sp, 8]
> > strbw0, [sp, 10]
> > ldr w0, [sp, 8]
> > add sp, sp, 16
> > ret
> >
> >Cross compiled and regtested on
> >  aarch64_be-none-elf
> >  armeb-none-eabi
> >and no issues
> >
> >Boostrapped and regtested
> > aarch64-none-linux-gnu
> > x86_64-pc-linux-gnu
> > powerpc64-unknown-linux-gnu
> > arm-none-linux-gnueabihf
> >
> >and found no issues.
> >
> >OK for trunk?
> 
> How does this affect store-to-load forwarding when the source is initialized 
> piecewise? IMHO we should avoid larger loads but generate larger stores when 
> possible. 
> 
> How do non-x86 architectures behave with respect to STLF? 
>

I should have made it more explicit in my cover letter, but this only covers 
reg to reg copies.
So the store-t-load forwarding shouldn't really come to play here, unless I'm 
missing something

The example in my patch shows that the loads from mem are mostly unaffected.

For x86 the change is also quite significant, e.g for a 5 byte struct load it 
used to generate

fun5:
movlfoo5(%rip), %eax
movl%eax, %edi
movzbl  %al, %edx
movzbl  %ah, %eax
movb%al, %dh
movzbl  foo5+2(%rip), %eax
shrl$24, %edi
salq$16, %rax
movq%rax, %rsi
movzbl  %dil, %eax
salq$24, %rax
movq%rax, %rcx
movq%rdx, %rax
movzbl  foo5+4(%rip), %edx
orq %rsi, %rax
salq$32, %rdx
orq %rcx, %rax
orq %rdx, %rax
ret

instead of

fun5:
movzbl  foo5+4(%rip), %eax
salq$32, %rax
movq%rax, %rdx
movlfoo5(%rip), %eax
orq %rdx, %rax
ret

so the loads themselves are unaffected.

Thanks,
Tamar
 
> Richard. 
> 
> >Thanks,
> >Tamar
> >
> >gcc/
> >2018-07-23  Tamar Christina  
> >
> > * expr.c (copy_blkmode_to_reg): Perform larger copies when safe.
> 

-- 


Re: GCC 8.2 Status Report (2018-07-19), branch frozen for release

2018-07-24 Thread Richard Biener
On July 24, 2018 5:50:33 PM GMT+02:00, Ramana Radhakrishnan 
 wrote:
>On Thu, Jul 19, 2018 at 10:11 AM, Richard Biener 
>wrote:
>>
>> Status
>> ==
>>
>> The GCC 8 branch is frozen for preparation of the GCC 8.2 release.
>> All changes to the branch now require release manager approval.
>>
>>
>> Previous Report
>> ===
>>
>> https://gcc.gnu.org/ml/gcc/2018-07/msg00194.html
>
>Is there any chance we can get some of the spectrev1 mitigation
>patches reviewed and into 8.2 .

It's now too late for that and it has to wait for 8.2.

>It would be quite useful to get these into a release as I see that the
>reviews are kinda petering out and there hasn't been any objection to
>the approach.

It's not that people only use release tarballs.

Richard. 
>
>
>regards
>Ramana



Re: [GCC][PATCH][Aarch64] Exploiting BFXIL when OR-ing two AND-operations with appropriate bitmasks

2018-07-24 Thread Sam Tebbs



On 07/23/2018 02:14 PM, Sam Tebbs wrote:



On 07/23/2018 12:38 PM, Renlin Li wrote:

+(define_insn "*aarch64_bfxil"
+  [(set (match_operand:GPI 0 "register_operand" "=r,r")
+    (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "r,0")
+    (match_operand:GPI 3 "const_int_operand" "n, Ulc"))
+    (and:GPI (match_operand:GPI 2 "register_operand" "0,r")
+    (match_operand:GPI 4 "const_int_operand" "Ulc, n"]
+  "INTVAL (operands[3]) == ~INTVAL (operands[4])"
+  {
+    switch (which_alternative)
+    {
+  case 0:
+    operands[3] = GEN_INT (ceil_log2 (INTVAL (operands[3])));
+    return "bfxil\\t%0, %1, 0, %3";
+  case 1:
+    operands[3] = GEN_INT (ceil_log2 (INTVAL (operands[4])));
+    return "bfxil\\t%0, %2, 0, %3";
+  default:
+    gcc_unreachable ();
+    }
+  }
+  [(set_attr "type" "bfm")]
+)


Hi Sam,

Is it possible that operand[3] or operand[4] is 1?

ceil_log2() could return 0 if the argument is 1.
The width field of the resulting instruction will be 0. Is it still 
correct?


Regard,
Renlin



Hi Renlin,

I think you have found an edge-case that I didn't think of and that 
the code would fail under. I have added a check for this situation and 
will send the updated patch soon.


Thanks,
Sam


Here is the updated patch that fixes the problem outlined by Renlin, and 
adds another testcase to check for the same issue. The changelog and 
example from my earlier email (sent on 07/20/2018 10:31 AM) still apply.


Thanks,
Sam



On 07/20/2018 10:33 AM, Sam Tebbs wrote:

Please disregard the original patch and see this updated version.


On 07/20/2018 10:31 AM, Sam Tebbs wrote:

Hi all,

Here is an updated patch that does the following:

* Adds a new constraint in config/aarch64/constraints.md to check 
for a constant integer that is left consecutive. This addresses 
Richard Henderson's suggestion about combining the 
aarch64_is_left_consecutive call and the const_int match in the 
pattern.


* Merges the two patterns defined into one.

* Changes the pattern's type attribute to bfm.

* Improved the comment above the aarch64_is_left_consecutive 
implementation.


* Makes the pattern use the GPI iterator to accept smaller integer 
sizes (an extra test is added to check for this).


* Improves the tests in combine_bfxil.c to ensure they aren't 
optimised away and that they check for the pattern's correctness.


Below is a new changelog and the example given before.

Is this OK for trunk?

This patch adds an optimisation that exploits the AArch64 BFXIL 
instruction
when or-ing the result of two bitwise and operations with 
non-overlapping

bitmasks (e.g. (a & 0x) | (b & 0x)).

Example:

unsigned long long combine(unsigned long long a, unsigned long long 
b) {

  return (a & 0xll) | (b & 0xll);
}

void read(unsigned long long a, unsigned long long b, unsigned long 
long *c) {

  *c = combine(a, b);
}

When compiled with -O2, read would result in:

read:
  and   x5, x1, #0x
  and   x4, x0, #0x
  orr   x4, x4, x5
  str   x4, [x2]
  ret

But with this patch results in:

read:
  mov    x4, x0
  bfxil    x4, x1, 0, 32
  str    x4, [x2]
  ret



Bootstrapped and regtested on aarch64-none-linux-gnu and 
aarch64-none-elf with no regressions.



gcc/
2018-07-11  Sam Tebbs  

    PR target/85628
    * config/aarch64/aarch64.md (*aarch64_bfxil):
    Define.
    * config/aarch64/constraints.md (Ulc): Define
    * config/aarch64/aarch64-protos.h 
(aarch64_is_left_consecutive):

    Define.
    * config/aarch64/aarch64.c (aarch64_is_left_consecutive): 
New function.


gcc/testsuite
2018-07-11  Sam Tebbs  

    PR target/85628
    * gcc.target/aarch64/combine_bfxil.c: New file.
    * gcc.target/aarch64/combine_bfxil_2.c: New file.


On 07/19/2018 02:02 PM, Sam Tebbs wrote:

Hi Richard,

Thanks for the feedback. I find that using "is_left_consecutive" 
is more descriptive than checking for it being a power of 2 - 1, 
since it describes the requirement (having consecutive ones from 
the MSB) more explicitly. I would be happy to change it though if 
that is the consensus.


I have addressed your point about just returning the string 
instead of using output_asm_insn and have changed it locally. I'll 
send an updated patch soon.



On 07/17/2018 02:33 AM, Richard Henderson wrote:

On 07/16/2018 10:10 AM, Sam Tebbs wrote:

+++ b/gcc/config/aarch64/aarch64.c
@@ -1439,6 +1439,14 @@ aarch64_hard_regno_caller_save_mode 
(unsigned regno, unsigned,

  return SImode;
  }
  +/* Implement IS_LEFT_CONSECUTIVE.  Check if an integer's bits 
are consecutive

+   ones from the MSB.  */
+bool
+aarch64_is_left_consecutive (HOST_WIDE_INT i)
+{
+  return (i | (i - 1)) == HOST_WIDE_INT_M1;
+}
+

...

+(define_insn "*aarch64_bfxil"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+    (ior:DI (and:DI (match_operand:DI 1 "register_operand" "r")
+    (match_operand 3 

Re: [Patch, Fortran] PR 57160: short-circuit IF only with -ffrontend-optimize

2018-07-24 Thread Janus Weil
2018-07-24 17:41 GMT+02:00 Janne Blomqvist :
> Optimization bugs that pop up at different optimization levels are hard
> enough for users to figure out

Right, and they're impossible to detect if there is no way to disable
the optimization, which is what this PR is about.


> without the frontend generating different
> code to begin with depending on the optimization level.

In the end it doesn't make much of a difference whether the
optimizations are done in the front or the middle end. The user knows
nothing about this, and he doesn't need to.

The problematic point here is that short-circuiting is an optimization
that is enabled already at -O0.


> Also, with a
> separate option it would be easy to check how it affects performance at
> different optimization levels.

For the case at hand, the short-circuiting is an absolutely valid
optimization. There is no reason why you wouldn't wanna do it (with -O
flags).


> What about making it a -fcheck=short-circuit-logicals (or however you want
> to spell it?) option, that also would be enabled with -fcheck=all?

What would such a flag even do? The actually invalid operation in the
test case is a null-pointer access, which could be caught by
-fcheck=pointer if we disable the optimization that removes it (i.e.
short-circuiting).

However, it also seems like -fcheck=pointer could use some
enhancement, since it does not even seem to catch simple cases like
this:

integer, pointer :: p => null()
print *,p
end

Cheers,
Janus


Re: [PATCH] Make strlen range computations more conservative

2018-07-24 Thread Martin Sebor

On 07/24/2018 01:59 AM, Bernd Edlinger wrote:

Hi!

This patch makes strlen range computations more conservative.

Firstly if there is a visible type cast from type A to B before passing
then value to strlen, don't expect the type layout of B to restrict the
possible return value range of strlen.

Furthermore use the outermost enclosing array instead of the
innermost one, because too aggressive optimization will likely
convert harmless errors into security-relevant errors, because
as the existing test cases demonstrate, this optimization is actively
attacking string length checks in user code, while and not giving
any warnings.


I strongly object to this change.

As you know, I am actively working in this area -- I asked you
to hold off on submitting patches for it until the review of
bug 86532 has completed.  It's not just unhelpful but
disrespectful of you to ignore my request and to try to make
changes you know I will likely have a strong opinion on in
spite of it, and without as much as involving me in the proposal.

As the author of this code and of many security improvements
in GCC I also find your characterization above of "actively
attacking" user code insulting.  If security is your main
concern then helping detect the invalid code you are trying
to accommodate with this change would be the right thing to
do.  One of the reasons for the tight bound is to build
a better foundation for the detection of buffer overflow
in string functions.  Relaxing the bound could make
the detection more difficult.

So again, I strongly object to both this change and to your
conduct.

Martin


Re: GCC 8.2 Status Report (2018-07-19), branch frozen for release

2018-07-24 Thread Ramana Radhakrishnan
On Thu, Jul 19, 2018 at 10:11 AM, Richard Biener  wrote:
>
> Status
> ==
>
> The GCC 8 branch is frozen for preparation of the GCC 8.2 release.
> All changes to the branch now require release manager approval.
>
>
> Previous Report
> ===
>
> https://gcc.gnu.org/ml/gcc/2018-07/msg00194.html

Is there any chance we can get some of the spectrev1 mitigation
patches reviewed and into 8.2 .

It would be quite useful to get these into a release as I see that the
reviews are kinda petering out and there hasn't been any objection to
the approach.


regards
Ramana


Re: [2/5] C-SKY port: Backend implementation

2018-07-24 Thread Jeff Law
On 07/23/2018 10:21 PM, Sandra Loosemore wrote:
> 2018-07-23  Jojo  
>     Huibin Wang  
>     Sandra Loosemore  
>     Chung-Lin Tang  
> 
>     C-SKY port: Backend implementation
> 
>     gcc/
>     * config/csky/*: New.
>     * common/config/csky/*: New.

Let's avoid gratutious whitespace that attempts to line up conditionals.
  As an example, look at the predicate csky_load_multiple_operation.  I
think just doing a quick pass over the .c, .h and main .md files should
be sufficient here.

I'm not a big fan of more awk code, but I'm not going to object to it :-)

Why does the port have its own little pass for condition code
optimization (cse_cc)?  What is it doing that can't be done with our
generic optimizers?

Any thoughts on using the newer function descriptor bits rather than old
style stack trampolines?

I don't see anything terribly concerning in the core of the port.  The
amount of support code for minipool is huge and I wonder if some sharing
across the various ports would be possible, but I don't think that
should be a blocking issue for this port.


Can you update the backends.html web page here appropriately for the
c-sky target?

I'd like to take a closer look, but those are the high level comment's
I've got this morning :-)


Jeff


Re: [Patch, Fortran] PR 57160: short-circuit IF only with -ffrontend-optimize

2018-07-24 Thread Janne Blomqvist
On Tue, Jul 24, 2018 at 4:46 PM, Janus Weil  wrote:

> 2018-07-24 11:12 GMT+02:00 Dominique d'Humières :
> > If you want non short-circuit evaluation, introduce an option for it.
>
> Your argument could easily be reversed: If you want short-circuiting,
> go introduce an option for it.
>
> I'm sure we'll not get anywhere this way, and I do think that Joost's
> suggestion to avoid short-circuiting with -O0 (and possibly -Og) is
> very reasonable: People who want maximum performance still get that
> with -O3, and at the same time they can still check their codes for
> errors with -O0.
>

I agree with Dominique, that this would be better as a separate option.
Optimization bugs that pop up at different optimization levels are hard
enough for users to figure out, without the frontend generating different
code to begin with depending on the optimization level. Also, with a
separate option it would be easy to check how it affects performance at
different optimization levels.

What about making it a -fcheck=short-circuit-logicals (or however you want
to spell it?) option, that also would be enabled with -fcheck=all?

-- 
Janne Blomqvist


Re: [PATCH][GCC][AARCH64] Canonicalize aarch64 widening simd plus insns

2018-07-24 Thread Kyrill Tkachov



On 24/07/18 16:12, James Greenhalgh wrote:

On Thu, Jul 19, 2018 at 07:35:22AM -0500, Matthew Malcomson wrote:
> Hi again.
>
> Providing an updated patch to include the formatting suggestions.

Please try not to top-post replies, it makes the conversation thread
harder to follow (reply continues below!).

> On 12/07/18 11:39, Sudakshina Das wrote:
> > Hi Matthew
> >
> > On 12/07/18 11:18, Richard Sandiford wrote:
> >> Looks good to me FWIW (not a maintainer), just a minor formatting thing:
> >>
> >> Matthew Malcomson  writes:
> >>> diff --git a/gcc/config/aarch64/aarch64-simd.md
> >>> b/gcc/config/aarch64/aarch64-simd.md
> >>> index
> >>> 
aac5fa146ed8dde4507a0eb4ad6a07ce78d2f0cd..67b29cbe2cad91e031ee23be656ec61a403f2cf9
> >>> 100644
> >>> --- a/gcc/config/aarch64/aarch64-simd.md
> >>> +++ b/gcc/config/aarch64/aarch64-simd.md
> >>> @@ -3302,38 +3302,78 @@
> >>> DONE;
> >>>   })
> >>>   -(define_insn "aarch64_w"
> >>> +(define_insn "aarch64_subw"
> >>> [(set (match_operand: 0 "register_operand" "=w")
> >>> -(ADDSUB: (match_operand: 1 "register_operand"
> >>> "w")
> >>> -(ANY_EXTEND:
> >>> -  (match_operand:VD_BHSI 2 "register_operand" "w"]
> >>> +(minus:
> >>> + (match_operand: 1 "register_operand" "w")
> >>> + (ANY_EXTEND:
> >>> +   (match_operand:VD_BHSI 2 "register_operand" "w"]
> >>
> >> The (minus should be under the "(match_operand":
> >>
> >> (define_insn "aarch64_subw"
> >>[(set (match_operand: 0 "register_operand" "=w")
> >> (minus: (match_operand: 1 "register_operand" "w")
> >>(ANY_EXTEND:
> >>  (match_operand:VD_BHSI 2 "register_operand" "w"]
> >>
> >> Same for the other patterns.
> >>
> >> Thanks,
> >> Richard
> >>
> >
> > You will need a maintainer's approval but this looks good to me.
> > Thanks for doing this. I would only point out one other nit which you
> > can choose to ignore:
> >
> > +/* Ensure
> > +   saddw2 and one saddw for the function add()
> > +   ssubw2 and one ssubw for the function subtract()
> > +   uaddw2 and one uaddw for the function uadd()
> > +   usubw2 and one usubw for the function usubtract() */
> > +
> > +/* { dg-final { scan-assembler-times "\[ \t\]ssubw2\[ \t\]+" 1 } } */
> > +/* { dg-final { scan-assembler-times "\[ \t\]ssubw\[ \t\]+" 1 } } */
> > +/* { dg-final { scan-assembler-times "\[ \t\]saddw2\[ \t\]+" 1 } } */
> > +/* { dg-final { scan-assembler-times "\[ \t\]saddw\[ \t\]+" 1 } } */
> > +/* { dg-final { scan-assembler-times "\[ \t\]usubw2\[ \t\]+" 1 } } */
> > +/* { dg-final { scan-assembler-times "\[ \t\]usubw\[ \t\]+" 1 } } */
> > +/* { dg-final { scan-assembler-times "\[ \t\]uaddw2\[ \t\]+" 1 } } */
> > +/* { dg-final { scan-assembler-times "\[ \t\]uaddw\[ \t\]+" 1 } } */
> >
> > The scan-assembly directives for the different
> > functions can be placed right below each of them and that would
> > make it easier to read the expected results in the test and you
> > can get rid of the comments saying the same.

Thanks for the first-line review Sudi.

OK for trunk.



I've committed this on behalf of Matthew as: https://gcc.gnu.org/r262949

Thanks,
Kyrill


Thanks,
James





Re: committed: remove redundant -Wall from -Warray-bounds (PR 82063)

2018-07-24 Thread Martin Sebor

On 07/24/2018 03:24 AM, Franz Sirl wrote:

Am 2018-07-20 um 23:22 schrieb Martin Sebor:

As the last observation in PR 82063 Jim points out that

   Both -Warray-bounds and -Warray-bounds= are listed in the c.opt
   file as being enabled by -Wall, but they are the same option,
   and it causes this one option to be processed twice in the
   C_handle_option_auto function in the generated options.c file.
   It gets set to the same value twice, so it does work as intended,
   but this is wasteful.

I have removed the redundant -Wall from the first option and
committed the change as obvious in r262912.


Hi Martin,

this looks related to PR 68845 and my patch in there. I never posted it
to gcc-patches because I couldn't find a definitive answer on how
options duplicated between common.opt and c-family/c.opt are supposed to
be handled.
For example, Warray-bounds in common.opt is a separate option (not an
alias to Warray-bounds=), leading to separate enums for them. Is this
intended? Warray-bounds seemed to be the only option with an equal sign
doing it like that at that time. Now Wcast-align is doing the same...

Can you shed some light on this?


-Warray-bounds= (the form that takes an argument) was added in
r219577.  Before then, only the plain form existed.  If I had
to guess, the interplay between the two options (as opposed to
making the latter an alias for the new option) wasn't considered.
I didn't think of it until now either.  Your patch seems like
the right solution to me.  Let me know if you will submit it.
If not, I posted the patch below that touches this area and
that will likely need updating so I can roll your change into
it:

https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01286.html

Martin


Re: [0/5] C-SKY port

2018-07-24 Thread Sandra Loosemore

On 07/23/2018 10:17 PM, Sandra Loosemore wrote:


C-SKY proposes Han Mao  and Yunhai Shang
 as maintainers for this port.  


I apologize, I made a mistake here.  The correct proposed maintainers 
are Xianmiao Qu  and Yunhai Shang

.


We expect that
C-SKY will also be providing a public link to the processor and ABI
documentation at some point.


The ABI manual has been posted, but not the ISA documentation yet.  (I'd 
guess that when it does show up it will be in the same place, though.)


https://github.com/c-sky/csky-doc

-Sandra


Re: [PATCH][GCC][AARCH64] Canonicalize aarch64 widening simd plus insns

2018-07-24 Thread James Greenhalgh
On Thu, Jul 19, 2018 at 07:35:22AM -0500, Matthew Malcomson wrote:
> Hi again.
> 
> Providing an updated patch to include the formatting suggestions.

Please try not to top-post replies, it makes the conversation thread
harder to follow (reply continues below!).
 
> On 12/07/18 11:39, Sudakshina Das wrote:
> > Hi Matthew
> >
> > On 12/07/18 11:18, Richard Sandiford wrote:
> >> Looks good to me FWIW (not a maintainer), just a minor formatting thing:
> >>
> >> Matthew Malcomson  writes:
> >>> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> >>> b/gcc/config/aarch64/aarch64-simd.md
> >>> index 
> >>> aac5fa146ed8dde4507a0eb4ad6a07ce78d2f0cd..67b29cbe2cad91e031ee23be656ec61a403f2cf9
> >>>  
> >>> 100644
> >>> --- a/gcc/config/aarch64/aarch64-simd.md
> >>> +++ b/gcc/config/aarch64/aarch64-simd.md
> >>> @@ -3302,38 +3302,78 @@
> >>>     DONE;
> >>>   })
> >>>   -(define_insn "aarch64_w"
> >>> +(define_insn "aarch64_subw"
> >>>     [(set (match_operand: 0 "register_operand" "=w")
> >>> -    (ADDSUB: (match_operand: 1 "register_operand" 
> >>> "w")
> >>> -    (ANY_EXTEND:
> >>> -  (match_operand:VD_BHSI 2 "register_operand" "w"]
> >>> +    (minus:
> >>> + (match_operand: 1 "register_operand" "w")
> >>> + (ANY_EXTEND:
> >>> +   (match_operand:VD_BHSI 2 "register_operand" "w"]
> >>
> >> The (minus should be under the "(match_operand":
> >>
> >> (define_insn "aarch64_subw"
> >>    [(set (match_operand: 0 "register_operand" "=w")
> >> (minus: (match_operand: 1 "register_operand" "w")
> >>    (ANY_EXTEND:
> >>  (match_operand:VD_BHSI 2 "register_operand" "w"]
> >>
> >> Same for the other patterns.
> >>
> >> Thanks,
> >> Richard
> >>
> >
> > You will need a maintainer's approval but this looks good to me.
> > Thanks for doing this. I would only point out one other nit which you
> > can choose to ignore:
> >
> > +/* Ensure
> > +   saddw2 and one saddw for the function add()
> > +   ssubw2 and one ssubw for the function subtract()
> > +   uaddw2 and one uaddw for the function uadd()
> > +   usubw2 and one usubw for the function usubtract() */
> > +
> > +/* { dg-final { scan-assembler-times "\[ \t\]ssubw2\[ \t\]+" 1 } } */
> > +/* { dg-final { scan-assembler-times "\[ \t\]ssubw\[ \t\]+" 1 } } */
> > +/* { dg-final { scan-assembler-times "\[ \t\]saddw2\[ \t\]+" 1 } } */
> > +/* { dg-final { scan-assembler-times "\[ \t\]saddw\[ \t\]+" 1 } } */
> > +/* { dg-final { scan-assembler-times "\[ \t\]usubw2\[ \t\]+" 1 } } */
> > +/* { dg-final { scan-assembler-times "\[ \t\]usubw\[ \t\]+" 1 } } */
> > +/* { dg-final { scan-assembler-times "\[ \t\]uaddw2\[ \t\]+" 1 } } */
> > +/* { dg-final { scan-assembler-times "\[ \t\]uaddw\[ \t\]+" 1 } } */
> >
> > The scan-assembly directives for the different
> > functions can be placed right below each of them and that would
> > make it easier to read the expected results in the test and you
> > can get rid of the comments saying the same.

Thanks for the first-line review Sudi.

OK for trunk.

Thanks,
James



Re: [5/5] C-SKY port: libgcc

2018-07-24 Thread Jeff Law
On 07/23/2018 10:26 PM, Sandra Loosemore wrote:
> 2018-07-23  Jojo  
>     Huibin Wang  
>     Sandra Loosemore  
>     Chung-Lin Tang  
> 
>     C-SKY port: libgcc
> 
>     libgcc/
>     * config.host: Add C-SKY support.
>     * config/csky/*: New.

Obviously we'll trust you on all that assembly code :-)
OK when the kit as a whole is approved.

Easy stuff is done, onward to the actual port.

jeff


Re: [4/5] C-SKY port: Testsuite

2018-07-24 Thread Jeff Law
On 07/23/2018 10:25 PM, Sandra Loosemore wrote:
> 2018-07-23  Sandra Loosemore  
>     Chung-Lin Tang  
> 
>     C-SKY port: Testsuite
> 
>     gcc/testsuite/
>     * g++.dg/Wno-frame-address.C: Adjust for C-SKY.
>     * g++.dg/torture/type-generic-1.C: Likewise.
>     * gcc.c-torture/compile/2804-1.c: Likewise.
>     * gcc.c-torture/execute/20101011-1.c: Likewise.
>     * gcc.c-torture/execute/ieee/mul-subnormal-single-1.x: Likewise.
>     * gcc.dg/20020312-2.c: Likewise.
>     * gcc.dg/Wno-frame-address.c: Likewise.
>     * gcc.dg/c11-true_min-1.c: Likewise.
>     * gcc.dg/sibcall-10.c: Likewise.
>     * gcc.dg/sibcall-9.c: Likewise.
>     * gcc.dg/stack-usage-1.c: Likewise.
>     * gcc.dg/torture/float32-tg-3.c: Likewise.
>     * gcc.dg/torture/float32x-tg-3.c: Likewise.
>     * gcc.dg/torture/float64-tg-3.c: Likewise.
>     * gcc.dg/torture/float64x-tg-3.c: Likewise.
>     * gcc.dg/torture/type-generic-1.c: Likewise.
>     * gcc.target/csky/*: New.
>     * lib/target-supports.exp (check_profiling_available): Add
>     csky-*-elf.
>     (check_effective_target_hard_float): Handle C-SKY targets with
>     single-precision hard float only.
>     (check_effective_target_logical_op_short_circuit): Handle C-SKY.

No concerns here.  OK when the kit as a whole is approved.

Jeff


Re: [3/5] C-SKY port: Documentation

2018-07-24 Thread Jeff Law
On 07/23/2018 10:23 PM, Sandra Loosemore wrote:
> 2018-07-23  Sandra Loosemore  
> 
>     C-SKY port: Documentation
> 
>     gcc/
>     * doc/extend.texi (C-SKY Function Attributes): New section.
>     * doc/invoke.texi (Option Summary): Add C-SKY options.
>     (C-SKY Options): New section.
>     * doc/md.texi (Machine Constraints): Document C-SKY constraints.

No concerns here.  OK when the kit as a whole is approved.


Re: [1/5] C-SKY port: Configury

2018-07-24 Thread Jeff Law
On 07/23/2018 10:19 PM, Sandra Loosemore wrote:
> 2018-07-23  Jojo  
>     Huibin Wang  
>     Sandra Loosemore  
>     Chung-Lin Tang  
>     Andrew Jenner  
> 
>     C-SKY port: Configury
> 
>     gcc/
>     * config.gcc (csky-*-*): New.
>     * configure.ac: Add csky to targets for dwarf2 debug_line support.
>     * configure: Regenerated.
> 
>     contrib/
>     * config-list.mk (LIST): Add csky-elf and csky-linux-gnu.
No concerns here.  OK when the kit as a whole is approved.

jeff


[PATCH] Fix PR86654

2018-07-24 Thread Richard Biener


I am testing the following patch to avoid forcing DIEs for a type context
for method clones late via limbo processing.  Instead hang them off
comp_unit_die if there is no early DIE for the function.

One question is whether the comment "If we're a nested function"
matches up with the decl_function_context (decl) check or whether
we really wanted to check DECL_CONTEXT (decl) == FUNCTION_DECL
which would have made this particular case not match (DECL_CONTEXT
is a RECORD_TYPE which context is a FUNCTION_DECL).

Another option is of course to make clones not inherit
DECL_CONTEXT from the cloned function (for an artificial
instance the "context" is somewhat arbitrary since it isn't
going to be found by lookup).  In fact I long wanted to
represent clones as artificial containers for an
inline instance of the cloned function...

LTO bootstrap and regtest running on x86_64-unknown-linux-gnu.

If that succeeds I'm going to apply it as a stop-gap measure to
make Firefox build again with LTO but I'm open to revising it.

Richard.

2018-07-24  Richard Biener  

PR debug/86654
* dwarf2out.c (dwarf2out_decl): Do not handle nested functions
special wrt context_die late.

Index: gcc/dwarf2out.c
===
--- gcc/dwarf2out.c (revision 262948)
+++ gcc/dwarf2out.c (working copy)
@@ -26702,8 +26702,11 @@ dwarf2out_decl (tree decl)
 case FUNCTION_DECL:
   /* If we're a nested function, initially use a parent of NULL; if we're
 a plain function, this will be fixed up in decls_for_scope.  If
-we're a method, it will be ignored, since we already have a DIE.  */
-  if (decl_function_context (decl)
+we're a method, it will be ignored, since we already have a DIE.
+Avoid doing this late though since clones of class methods may
+otherwise end up creating type DIEs late.  */
+  if (early_dwarf
+ && decl_function_context (decl)
  /* But if we're in terse mode, we don't care about scope.  */
  && debug_info_level > DINFO_LEVEL_TERSE)
context_die = NULL;


[PATCH, debug] Add fkeep-vars-live

2018-07-24 Thread Tom de Vries
On Tue, Jul 24, 2018 at 02:34:26PM +0200, Tom de Vries wrote:
> On 07/24/2018 01:46 PM, Jakub Jelinek wrote:
> > On Tue, Jul 24, 2018 at 01:37:32PM +0200, Tom de Vries wrote:
> >> Another drawback is that the fake uses confuse the unitialized warning
> >> analysis, so that is switched off for -fkeep-vars-live.
> > 
> > Is that really needed?  I.e. can't you for the purpose of uninitialized
> > warning analysis ignore the clobber = var uses?
> > 
> 
> This seems to work on the test-case that failed during testing
> (g++.dg/uninit-pred-4.C):
> ...
> diff --git a/gcc/tree-ssa-uninit.c b/gcc/tree-ssa-uninit.c
> index 77f090bfa80..953db9ed02d 100644
> --- a/gcc/tree-ssa-uninit.c
> +++ b/gcc/tree-ssa-uninit.c
> @@ -132,6 +132,9 @@ warn_uninit (enum opt_code wc, tree t, tree expr,
> tree var,
>if (is_gimple_assign (context)
>&& gimple_assign_rhs_code (context) == COMPLEX_EXPR)
>  return;
> +  if (gimple_assign_single_p (context)
> +  && TREE_CLOBBER_P (gimple_assign_lhs (context)))
> +return;
>if (!has_undefined_value_p (t))
>  return;
> ...
> But I don't know the pass well enough to know whether this is a
> sufficient fix.
> 

Updated and re-tested patch.

> +Add artificial use for each local variable at the end of the
> declaration scope

Is this a better option description?


OK for trunk?

Thanks,
- Tom

[debug] Add fkeep-vars-live

This patch adds fake uses of user variables at the point where they go out of
scope, to keep user variables inspectable throughout the application.

This approach will generate sub-optimal code: in some cases, the executable
code will go through efforts to keep a var alive, while var-tracking can
easily compute the value of the var from something else.

Also, the compiler treats the fake use as any other use, so it may keep an
expensive resource like a register occupied (if we could mark the use as a
cold use or some such, we could tell optimizers that we need the value, but
it's ok if getting the value is expensive, so it could be spilled instead of
occupying a register).

The current implementation is expected to increase register pressure, and
therefore spilling, but we'd still expect less memory accesses then with O0.

2018-07-24  Tom de Vries  

PR debug/78685
* cfgexpand.c (expand_use): New function.
(expand_gimple_stmt_1): Handle TREE_CLOBBER_P as lhs of assignment.
* common.opt (fkeep-vars-live): New option.
* function.c (instantiate_virtual_regs): Instantiate in USEs as well.
* gimplify.c (gimple_build_uses): New function.
(gimplify_bind_expr): Build clobber uses for variables that don't have
to be in memory.
(gimplify_body): Build clobber uses for arguments.
* tree-cfg.c (verify_gimple_assign_single): Handle TREE_CLOBBER_P as lhs
of assignment.
* tree-sra.c (sra_modify_assign): Same.
* tree-ssa-alias.c (refs_may_alias_p_1): Same.
* tree-ssa-structalias.c (find_func_aliases): Same.
* tree-ssa-uninit.c (warn_uninit): Same.

* gcc.dg/guality/pr54200-2.c: Update.

---
 gcc/cfgexpand.c  | 11 
 gcc/common.opt   |  4 +++
 gcc/function.c   |  5 ++--
 gcc/gimplify.c   | 46 +++-
 gcc/testsuite/gcc.dg/guality/pr54200-2.c |  3 +--
 gcc/tree-cfg.c   |  1 +
 gcc/tree-sra.c   |  7 +
 gcc/tree-ssa-alias.c |  4 +++
 gcc/tree-ssa-structalias.c   |  3 ++-
 gcc/tree-ssa-uninit.c|  3 +++
 10 files changed, 76 insertions(+), 11 deletions(-)

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index d6e3c382085..e28e8ceec75 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -3533,6 +3533,15 @@ expand_clobber (tree lhs)
 }
 }
 
+/* Expand a use of RHS.  */
+
+static void
+expand_use (tree rhs)
+{
+  rtx target = expand_expr (rhs, NULL_RTX, VOIDmode, EXPAND_NORMAL);
+  emit_use (target);
+}
+
 /* A subroutine of expand_gimple_stmt, expanding one gimple statement
STMT that doesn't require special handling for outgoing edges.  That
is no tailcalls and no GIMPLE_COND.  */
@@ -3632,6 +3641,8 @@ expand_gimple_stmt_1 (gimple *stmt)
  /* This is a clobber to mark the going out of scope for
 this LHS.  */
  expand_clobber (lhs);
+   else if (TREE_CLOBBER_P (lhs))
+ expand_use (rhs);
else
  expand_assignment (lhs, rhs,
 gimple_assign_nontemporal_move_p (
diff --git a/gcc/common.opt b/gcc/common.opt
index 984b351cd79..a29e320ba45 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1496,6 +1496,10 @@ fdebug-nops
 Common Report Var(flag_debug_nops) Optimization
 Insert nops if that might improve debugging of optimized code.
 
+fkeep-vars-live
+Common Report Var(flag_keep_vars_live) 

[PATCH, debug] Add fdebug-nops

2018-07-24 Thread Tom de Vries
On Tue, Jul 24, 2018 at 01:35:14PM +0200, Tom de Vries wrote:
> On Tue, Jul 24, 2018 at 01:30:30PM +0200, Tom de Vries wrote:
> > On 07/16/2018 05:10 PM, Tom de Vries wrote:
> > > On 07/16/2018 03:50 PM, Richard Biener wrote:
> > >> On Mon, 16 Jul 2018, Tom de Vries wrote:
> > >>> Any comments?
> > >>
> > >> Interesting idea.  I wonder if that should be generalized
> > >> to other places
> > > 
> > > I kept the option name general, to allow for that.
> > > 
> > > And indeed, this is a point-fix patch. I've been playing around with a
> > > more generic patch that adds nops such that each is_stmt .loc is
> > > associated with a unique insn, but that was embedded in an
> > > fkeep-vars-live branch, so this patch is minimally addressing the first
> > > problem I managed to reproduce on trunk.
> > > 
> > >> and how we can avoid compare-debug failures
> > >> (var-tracking usually doesn't change code-generation).
> > >>
> > > 
> > 
> > I'll post this patch series (the current state of my fkeep-vars-live
> > branch) in reply to this email:
> > 
> >  1  [debug] Add fdebug-nops
> >  2  [debug] Add fkeep-vars-live
> >  3  [debug] Add fdebug-nops and fkeep-vars-live to Og only
> > 
> > Bootstrapped and reg-tested on x86_64. ChangeLog entries and function
> > header comments missing.
> > 
> > Comments welcome.
> > 
> 

And here's a version that is fcompare-debug friendly.

OK for trunk?

Thanks,
- Tom

[debug] Add fdebug-nops

Consider this gdb session in foo of pr54200-2.c (using -Os):
...
(gdb) n
26return a; /* { dg-final { gdb-test . "(int)a" "6" } } */
(gdb) p a
'a' has unknown type; cast it to its declared type
(gdb) n
main () at pr54200-2.c:34
34return 0;
...

The problem is that the scope in which a is declared ends at .LBE7, and the
statement .loc for line 26 ends up attached to the ret insn:
...
.loc 1 24 11
addl%edx, %eax
.LVL1:
# DEBUG a => ax
.loc 1 26 7 is_stmt 1
.LBE7:
.loc 1 28 1 is_stmt 0
ret
.cfi_endproc
...

This patch fixes the problem (for Og and Os, the 'DEBUG a => ax' is missing
for O1 and higher) by adding a nop after the ".loc is_stmt 1" annotation:
...
.loc 1 24 11
addl%edx, %eax
 .LVL1:
# DEBUG a => ax
.loc 1 26 7 is_stmt 1
+   nop
 .LBE7:
.loc 1 28 1 is_stmt 0
ret
.cfi_endproc
...

and instead we have:
...
(gdb) n
26return a; /* { dg-final { gdb-test . "(int)a" "6" } } */
(gdb) p a
$1 = 6
(gdb) n
main () at pr54200-2.c:34
34return 0;
...

The option should have the same effect as using a debugger with location views
capability.

2018-07-24  Tom de Vries  

PR debug/78685
* common.opt (fdebug-nops): New option.
* passes.def (pass_late_compilation): Add pass_debug_nops.
* rtl.h (MAY_HAVE_DEBUG_MARKER_INSNS): Add flag_debug_nops.
* timevar.def (TV_VAR_DEBUG_NOPS): New timevar.
* tree-pass.h (make_pass_debug_nops): Declare.
* tree.h (MAY_HAVE_DEBUG_MARKER_STMTS): Add flag_debug_nops.
* var-tracking.c (insert_debug_nops): New function.
(class pass_debug_nops): New pass.
(make_pass_debug_nops): New function.

* gcc.dg/guality/pr54200-2.c: New test.

---
 gcc/common.opt   |  4 ++
 gcc/passes.def   |  1 +
 gcc/rtl.h|  2 +-
 gcc/testsuite/gcc.dg/guality/pr54200-2.c | 37 +
 gcc/timevar.def  |  1 +
 gcc/tree-pass.h  |  1 +
 gcc/tree.h   |  2 +-
 gcc/var-tracking.c   | 91 
 8 files changed, 137 insertions(+), 2 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 4bf8de90331..984b351cd79 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1492,6 +1492,10 @@ Common Report Var(flag_hoist_adjacent_loads) Optimization
 Enable hoisting adjacent loads to encourage generating conditional move
 instructions.
 
+fdebug-nops
+Common Report Var(flag_debug_nops) Optimization
+Insert nops if that might improve debugging of optimized code.
+
 fkeep-gc-roots-live
 Common Undocumented Report Var(flag_keep_gc_roots_live) Optimization
 ; Always keep a pointer to a live memory block
diff --git a/gcc/passes.def b/gcc/passes.def
index 2a8fbc2efbe..e97c4f9533a 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -487,6 +487,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_late_compilation);
   PUSH_INSERT_PASSES_WITHIN (pass_late_compilation)
  NEXT_PASS (pass_compute_alignments);
+ NEXT_PASS (pass_debug_nops);
  NEXT_PASS (pass_variable_tracking);
  NEXT_PASS (pass_free_cfg);
  NEXT_PASS (pass_machine_reorg);
diff --git a/gcc/rtl.h b/gcc/rtl.h
index 565ce3abbe4..29a2e19cb6e 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -843,7 +843,7 @@ struct GTY(()) rtvec_def {
 

Re: [PATCH] Make strlen range computations more conservative

2018-07-24 Thread Richard Biener
On Tue, 24 Jul 2018, Bernd Edlinger wrote:

> Hi!
> 
> This patch makes strlen range computations more conservative.
> 
> Firstly if there is a visible type cast from type A to B before passing
> then value to strlen, don't expect the type layout of B to restrict the
> possible return value range of strlen.
> 
> Furthermore use the outermost enclosing array instead of the
> innermost one, because too aggressive optimization will likely
> convert harmless errors into security-relevant errors, because
> as the existing test cases demonstrate, this optimization is actively
> attacking string length checks in user code, while and not giving
> any warnings.
> 
> 
> 
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> Is it OK for trunk?

I'd like us to be explicit in what we support, not what we do not
support, thus

+ /* Avoid arrays of pointers.  */
+ if (TREE_CODE (TREE_TYPE (arg)) == POINTER_TYPE)
+   return false;

should become

   /* We handle arrays of integer types.  */
  if (TREE_CODE (TRE_TYPE (arg)) != INTEGER_TYPE)
return false;

+ tree base = arg;
+ while (TREE_CODE (base) == ARRAY_REF
+|| TREE_CODE (base) == ARRAY_RANGE_REF
+|| TREE_CODE (base) == COMPONENT_REF)
+   base = TREE_OPERAND (base, 0);
+
+ /* If this looks like a type cast don't assume anything.  */
+ if ((TREE_CODE (base) == MEM_REF
+  && (! integer_zerop (TREE_OPERAND (base, 1))
+  || TREE_TYPE (TREE_TYPE (TREE_OPERAND (base, 0)))
+ != TREE_TYPE (base)))
+ || TREE_CODE (base) == VIEW_CONVERT_EXPR)
return false;

likewise - you miss to handle BIT_FIELD_REF.  So, instead

  if (!(DECL_P (base)
|| TREE_CODE (base) == STRING_CST
|| (TREE_CODE (base) == MEM_REF
&& ...

you should look at comparing TYPE_MAIN_VARIANT in your type
check, aligned/unaligned or const/non-const accesses shouldn't
be considered a "type cast".  Maybe even use
types_compatible_p.  Not sure why you enforce zero-offset MEMs?
Do we in the end only handle  bases of MEMs?  Given you
strip arbitrary COMPONENT_REFs the offset in a MEM isn't
so much different?

It looks like the component-ref stripping plus type-check part
could be factored out into sth like get_base_address?  I don't
have a good name or suggested semantics for it though.

Richard.


> 
> Thanks
> Bernd.

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH][debug] Handle references to skipped params in remap_ssa_name

2018-07-24 Thread Jakub Jelinek
On Tue, Jul 24, 2018 at 04:33:30PM +0200, Richard Biener wrote:
> DW_OP_GNU_variable_value you mean.  That's true.  But I was talking about

Sure.

> omitting DW_AT_upper_bound which would correspond to int[] vs.
> a empty location list DW_AT_upper_bound which would correspond to
> int[].

I think that is fine to differentiate.

> Not sure if that is how it behaves (well, gdb still doesn't support
> DW_OP_GNU_variable_value).  Related is probably that I wanted to have
> debug info of VLAs in the abstract instance so we'd have a
> DW_OP_GNU_variable_value
> of a DIE in the abstract instance (with no location) and the concrete instance
> DIE of the refered to DIE would only be related to it via its
> DW_AT_abstract_origin
> attribute.  Ideally that would just work but given lacking gdb support
> I couldn't
> figure that out and thus we re-emit the whole type in the concrete 
> instances...

The intent was that the debugger would do similar thing as if the user at that
point asked for the value of that (integral) variable, whatever it would
print would be just pushed to the DWARF stack.

Not sure how far are the GDB folks with the DW_OP_GNU_variable_value
support.

Jakub


Re: [PATCH][debug] Handle references to skipped params in remap_ssa_name

2018-07-24 Thread Richard Biener
On Tue, Jul 24, 2018 at 12:39 PM Jakub Jelinek  wrote:
>
> On Tue, Jul 24, 2018 at 12:29:37PM +0200, Tom de Vries wrote:
> > > I wonder if it makes sense to disambiguate things from the gcc side
> > > by generating an empty location description for known optimized out
> > > values (the standard seems to explicitely call that out as a way to
> > > say "optimized out").
>
> You can't emit the empty location descriptions in the middle of other
> expressions though,

True.  We'd have to "fold" any expression dependend on the optimized out
value to optimized out itself.

> and if you have e.g. an expression that uses
> DW_OP_GNU_variable_size only conditionally, even if you know that
> the corresponding DIE doesn't have location, the containing DWARF expression
> could still be well defined if the condition guarding the
> DW_OP_GNU_variable_size isn't true.

DW_OP_GNU_variable_value you mean.  That's true.  But I was talking about
omitting DW_AT_upper_bound which would correspond to int[] vs.
a empty location list DW_AT_upper_bound which would correspond to
int[].
ISTR we quite aggressively prune DW_OP_GNU_variable_values that we cannot
resolve.

> And, also, for DW_OP_GNU_variable_size DIE lookup should be performed,
> say you have only a declaration in the TU, but definition somewhere else,
> in that case the debugger should be using that other TU's definition
> with hopefully location description.  Or if there is a global variable
> interposed by some other TU, again, DW_OP_GNU_variable_size should follow
> to whatever variable wins.

Not sure if that is how it behaves (well, gdb still doesn't support
DW_OP_GNU_variable_value).  Related is probably that I wanted to have
debug info of VLAs in the abstract instance so we'd have a
DW_OP_GNU_variable_value
of a DIE in the abstract instance (with no location) and the concrete instance
DIE of the refered to DIE would only be related to it via its
DW_AT_abstract_origin
attribute.  Ideally that would just work but given lacking gdb support
I couldn't
figure that out and thus we re-emit the whole type in the concrete instances...

If we sort that out we can drop all of the VLA type remapping during
inlining / cloning I think
given we have debug info for the abstract instance around.

Richard.

>
> Jakub


Re: [PATCH] Fix expand_divmod (PR middle-end/86627)

2018-07-24 Thread Richard Biener
On Tue, 24 Jul 2018, Jakub Jelinek wrote:

> Hi!
> 
> As the following testcase shows, expand_divmod stopped emitting int128
> signed divisions by positive small (fitting into hwi) power of two constants
> in my r242690 aka PR78416 fix, where I've added next to
> EXACT_POWER_OF_2_OR_ZERO_P uses a check that either the bitsize is
> smaller or equal to hwi, or the value is positive (because otherwise
> the value is not a power of two, has say 65 bits set and 63 bits clear).
> In this particular spot I've been changing:
> else if (EXACT_POWER_OF_2_OR_ZERO_P (d)
> +&& (size <= HOST_BITS_PER_WIDE_INT || d >= 0)
>  && (rem_flag
>  ? smod_pow2_cheap (speed, compute_mode)
>  : sdiv_pow2_cheap (speed, compute_mode))
>  /* We assume that cheap metric is true if the
> optab has an expander for this mode.  */
>  && ((optab_handler ((rem_flag ? smod_optab
>   : sdiv_optab),
>  compute_mode)
>   != CODE_FOR_nothing)
>  || (optab_handler (sdivmod_optab,
> compute_mode)
>  != CODE_FOR_nothing)))
>   ;
> -   else if (EXACT_POWER_OF_2_OR_ZERO_P (abs_d))
> +   else if (EXACT_POWER_OF_2_OR_ZERO_P (abs_d)
> +&& (size <= HOST_BITS_PER_WIDE_INT
> +|| abs_d != (unsigned HOST_WIDE_INT) d))
> 
> The first change was correct, but I think I've failed to take into account
> the large additional && there and that the positive power of two values of
> d aren't really handled in the first else if, it is merely about not
> optimizing it if division or modulo is fast, and the actual optimization
> is only done in the second else if, where it handles both the cases of
> d being a positive power of two, and the case where d is negative and is not
> a power of two, but its negation abs_d is a positive power of two (handled
> by doing additional negation afterwards).  The condition I've added allowed
> for the > 64-bit bitsizes only the cases of negative d values where their
> negation is a positive power of two (and disallowed the corner wrapping
> case of abs_d == d).  This means with the above change we keep optimizing
> signed int128 division by e.g. -2 or -0x400 into shifts, but
> actually don't optimize division by 2 or 0x40.
> Although d and abs_d are HOST_WIDE_INT and unsigned HOST_WIDE_INT, the
> d >= cases are always good even for int128, the higher bits are all zeros
> and abs_d is the same as d.  For d < 0 and d != HOST_WIDE_INT_MIN it is also
> ok, d is lots of sign bits followed by 63 arbitrary bits, but the absolute
> value of that is still a number with the msb bit in hwi clear and in wider
> precision all bits above it clear too.  So the only problematic case is
> d equal to HOST_WIDE_INT_MIN, where we are divising or doing modulo
> by (signed __int128) 0x8000, and its negation
> is still the same value when expressed in CONST_INT or HOST_WIDE_INT, but
> the actual negated value should be (signed __int128) 0x8000ULL.
> 
> So, this patch punts for that single special case which we don't handle
> properly (so it will be expanded as __divti3 likely), and allows again
> the positive power of two d values.
> 
> Sorry for introducing this regression.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and
> release branches?

OK.  Please wait until after 8.2 for the 8 branch though.

Thanks,
Richard.

> 2018-07-24  Jakub Jelinek  
> 
>   PR middle-end/86627
>   * expmed.c (expand_divmod): Punt if d == HOST_WIDE_INT_MIN
>   and size > HOST_BITS_PER_WIDE_INT.  For size > HOST_BITS_PER_WIDE_INT
>   and abs_d == d, do the power of two handling if profitable.
> 
>   * gcc.target/i386/pr86627.c: New test.
> 
> --- gcc/expmed.c.jj   2018-07-16 23:24:29.0 +0200
> +++ gcc/expmed.c  2018-07-23 12:22:05.835272680 +0200
> @@ -4480,6 +4480,11 @@ expand_divmod (int rem_flag, enum tree_c
>   HOST_WIDE_INT d = INTVAL (op1);
>   unsigned HOST_WIDE_INT abs_d;
>  
> + /* Not prepared to handle division/remainder by
> +0x8000 etc.  */
> + if (d == HOST_WIDE_INT_MIN && size > HOST_BITS_PER_WIDE_INT)
> +   break;
> +
>   /* Since d might be INT_MIN, we have to cast to
>  unsigned HOST_WIDE_INT before negating to avoid
>  undefined signed overflow.  */
> @@ -4522,9 +4527,7 @@ expand_divmod (int rem_flag, enum tree_c
>|| (optab_handler (sdivmod_optab, int_mode)
>

Re: [PATCH] Limix dump_flag enum values range (PR middle-end/86645).

2018-07-24 Thread Richard Biener
On Tue, Jul 24, 2018 at 9:27 AM Martin Liška  wrote:
>
> Hi.
>
> That fixes many UBSAN issues that are caused by:
>
>   {"all", dump_flags_t (~(TDF_RAW | TDF_SLIM | TDF_LINENO | TDF_GRAPH
> | TDF_STMTADDR | TDF_RHS_ONLY | TDF_NOUID
> | TDF_ENUMERATE_LOCALS | TDF_SCEV | TDF_GIMPLE))},
>
> That goes out of:
>
>   minv = TYPE_MIN_VALUE (TREE_TYPE (type));
>   maxv = TYPE_MAX_VALUE (TREE_TYPE (type));
>
> Thus I would like to limit value of "all".
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> And UBSAN errors are gone.
>
> Ready to be installed?

OK.

Richard.

> Martin
>
>
> gcc/ChangeLog:
>
> 2018-07-23  Martin Liska  
>
> PR middle-end/86645
> * dumpfile.c: And excluded values with TDF_ALL_VALUES.
> * dumpfile.h (enum dump_flag): Defince TDF_ALL_VALUES.
> ---
>  gcc/dumpfile.c | 7 ---
>  gcc/dumpfile.h | 5 -
>  2 files changed, 8 insertions(+), 4 deletions(-)
>
>


Re: [PATCH][GCC][front-end][build-machinery][opt-framework] Allow setting of stack-clash via configure options. [Patch (4/6)]

2018-07-24 Thread Tamar Christina
Hi All,

Here's an updated patch with documentation.


Ok for trunk?

Thanks,
Tamar

gcc/
2018-07-24  Tamar Christina  

PR target/86486
* configure.ac: Add stack-clash-protection-guard-size.
* doc/install.texi: Document it.
* config.in (DEFAULT_STK_CLASH_GUARD_SIZE): New.
* params.def: Update comment for guard-size.
* configure: Regenerate.

The 07/24/2018 11:34, Joseph Myers wrote:
> On Tue, 24 Jul 2018, tamar.christ...@arm.com wrote:
> 
> > This patch defines a configure option to allow the setting of the default
> > guard size via configure flags when building the target.
> 
> If you add a configure option, you must also add documentation for it in 
> install.texi.
> 
> -- 
> Joseph S. Myers
> jos...@codesourcery.com

-- 
diff --git a/gcc/config.in b/gcc/config.in
index 2856e72d627df537a301a6c7ab6b5bbb75f6b43f..e853adc9aed366fe7bbc0f598e9f7137d68d7c02 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -55,6 +55,12 @@
 #endif
 
 
+/* Define to larger than zero set to the default stack clash protector size. */
+#ifndef USED_FOR_TARGET
+#undef DEFAULT_STK_CLASH_GUARD_SIZE
+#endif
+
+
 /* Define if you want to use __cxa_atexit, rather than atexit, to register C++
destructors for local statics and global objects. This is essential for
fully standards-compliant handling of destructors, but requires
diff --git a/gcc/configure b/gcc/configure
index 60d373982fd38fe51c285e2b02941754d1b833d6..513a8eada59af8314732d744b673e807829bad77 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -905,6 +905,7 @@ enable_valgrind_annotations
 with_stabs
 enable_multilib
 enable_multiarch
+with_stack_clash_protection_guard_size
 enable___cxa_atexit
 enable_decimal_float
 enable_fixed_point
@@ -1724,6 +1725,9 @@ Optional Packages:
   --with-gnu-as   arrange to work with GNU as
   --with-as   arrange to use the specified as (full pathname)
   --with-stabsarrange to use stabs instead of host debug format
+  --with-stack-clash-protection-guard-size=size
+  Set the default stack clash protection guard size
+  for specific targets as a power of two.
   --with-dwarf2   force the default debug format to be DWARF 2
   --with-specs=SPECS  add SPECS to driver command-line processing
   --with-pkgversion=PKG   Use PKG in the version string in place of "GCC"
@@ -7436,6 +7440,35 @@ $as_echo "$enable_multiarch$ma_msg_suffix" >&6; }
 
 
 
+# default stack clash protection guard size
+# Please keep these in sync with params.def.
+stk_clash_min=12
+stk_clash_max=30
+stk_clash_default=12
+
+# Keep the default value when the option is not used to 0, this allows us to
+# distinguish between the cases where the user specifially set a value via
+# configure and when the normal default value is used.
+
+# Check whether --with-stack-clash-protection-guard-size was given.
+if test "${with_stack_clash_protection_guard_size+set}" = set; then :
+  withval=$with_stack_clash_protection_guard_size; DEFAULT_STK_CLASH_GUARD_SIZE="$with_stack_clash_protection_guard_size"
+else
+  DEFAULT_STK_CLASH_GUARD_SIZE=0
+fi
+
+if test $DEFAULT_STK_CLASH_GUARD_SIZE -ne 0 \
+ && (test $DEFAULT_STK_CLASH_GUARD_SIZE -lt $stk_clash_min \
+	 || test $DEFAULT_STK_CLASH_GUARD_SIZE -gt $stk_clash_max); then
+  as_fn_error "Invalid value $DEFAULT_STK_CLASH_GUARD_SIZE for --with-stack-clash-protection-guard-size. Must be between $stk_clash_min and $stk_clash_max." "$LINENO" 5
+fi
+
+
+cat >>confdefs.h <<_ACEOF
+#define DEFAULT_STK_CLASH_GUARD_SIZE $DEFAULT_STK_CLASH_GUARD_SIZE
+_ACEOF
+
+
 # Enable __cxa_atexit for C++.
 # Check whether --enable-__cxa_atexit was given.
 if test "${enable___cxa_atexit+set}" = set; then :
@@ -18448,7 +18481,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18451 "configure"
+#line 18484 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -18554,7 +18587,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18557 "configure"
+#line 18590 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 010ecd2ccf609ded1f4d2849a2acc13aba43b55b..e839445c1aea7dc9c5a5ee8dab8191eae5298bc2 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -811,6 +811,30 @@ AC_MSG_RESULT($enable_multiarch$ma_msg_suffix)
 AC_SUBST(with_cpu)
 AC_SUBST(with_float)
 
+# default stack clash protection guard size
+# Please keep these in sync with params.def.
+stk_clash_min=12
+stk_clash_max=30
+stk_clash_default=12
+
+# Keep the default value when the option is not used to 0, this allows us to
+# distinguish between the cases where the user specifially set a value via
+# configure and when the normal default value is used.
+AC_ARG_WITH(stack-clash-protection-guard-size,

Re: [PATCH] Fix segfault in -fsave-optimization-record (PR tree-optimization/86636)

2018-07-24 Thread Richard Biener
On Tue, Jul 24, 2018 at 1:44 AM David Malcolm  wrote:
>
> There are various ways that it's possible for a gimple statement to
> have an UNKNOWN_LOCATION, and for that UNKNOWN_LOCATION to be wrapped
> in an ad-hoc location to capture inlining information.
>
> For such a location, LOCATION_FILE (loc) is NULL.
>
> Various places in -fsave-optimization-record were checking for
>   loc != UNKNOWN_LOCATION
> and were passing LOCATION_FILE (loc) to code that assumed a non-NULL
> filename, thus leading to segfaults for the above cases.
>
> This patch updates the tests to use
>   LOCATION_LOCUS (loc) != UNKNOWN_LOCATION
> instead, to look through ad-hoc location wrappers, fixing the segfaults.
>
> It also adds various assertions to the affected code.
>
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu; adds
> 8 PASS results to gcc.sum.
>
> OK for trunk?

OK.

Richard.

>
> gcc/ChangeLog:
> PR tree-optimization/86636
> * json.cc (json::object::set): Fix comment.  Add assertions.
> (json::array::append): Move here from json.h.  Add comment and an
> assertion.
> (json::string::string): Likewise.
> * json.h (json::array::append): Move to json.cc.
> (json::string::string): Likewise.
> * optinfo-emit-json.cc
> (optrecord_json_writer::impl_location_to_json): Assert that we
> aren't attempting to write out UNKNOWN_LOCATION, or an ad-hoc
> wrapper around it.  Expand the location once, rather than three
> times.
> (optrecord_json_writer::inlining_chain_to_json): Fix the check for
> UNKNOWN_LOCATION, to use LOCATION_LOCUS to look through ad-hoc
> wrappers.
> (optrecord_json_writer::optinfo_to_json): Likewise, in four
> places.  Fix some overlong lines.
>
> gcc/testsuite/ChangeLog:
> PR tree-optimization/86636
> * gcc.c-torture/compile/pr86636.c: New test.
> ---
>  gcc/json.cc   | 24 +++-
>  gcc/json.h|  4 ++--
>  gcc/optinfo-emit-json.cc  | 25 +++--
>  gcc/testsuite/gcc.c-torture/compile/pr86636.c |  8 
>  4 files changed, 48 insertions(+), 13 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr86636.c
>
> diff --git a/gcc/json.cc b/gcc/json.cc
> index 3c2aa77..3ead980 100644
> --- a/gcc/json.cc
> +++ b/gcc/json.cc
> @@ -76,12 +76,15 @@ object::print (pretty_printer *pp) const
>pp_character (pp, '}');
>  }
>
> -/* Set the json::value * for KEY, taking ownership of VALUE
> +/* Set the json::value * for KEY, taking ownership of V
> (and taking a copy of KEY if necessary).  */
>
>  void
>  object::set (const char *key, value *v)
>  {
> +  gcc_assert (key);
> +  gcc_assert (v);
> +
>value **ptr = m_map.get (key);
>if (ptr)
>  {
> @@ -126,6 +129,15 @@ array::print (pretty_printer *pp) const
>pp_character (pp, ']');
>  }
>
> +/* Append non-NULL value V to a json::array, taking ownership of V.  */
> +
> +void
> +array::append (value *v)
> +{
> +  gcc_assert (v);
> +  m_elements.safe_push (v);
> +}
> +
>  /* class json::number, a subclass of json::value, wrapping a double.  */
>
>  /* Implementation of json::value::print for json::number.  */
> @@ -140,6 +152,16 @@ number::print (pretty_printer *pp) const
>
>  /* class json::string, a subclass of json::value.  */
>
> +/* json::string's ctor.  */
> +
> +string::string (const char *utf8)
> +{
> +  gcc_assert (utf8);
> +  m_utf8 = xstrdup (utf8);
> +}
> +
> +/* Implementation of json::value::print for json::string.  */
> +
>  void
>  string::print (pretty_printer *pp) const
>  {
> diff --git a/gcc/json.h b/gcc/json.h
> index 5c3274c..154d9e1 100644
> --- a/gcc/json.h
> +++ b/gcc/json.h
> @@ -107,7 +107,7 @@ class array : public value
>enum kind get_kind () const FINAL OVERRIDE { return JSON_ARRAY; }
>void print (pretty_printer *pp) const FINAL OVERRIDE;
>
> -  void append (value *v) { m_elements.safe_push (v); }
> +  void append (value *v);
>
>   private:
>auto_vec m_elements;
> @@ -134,7 +134,7 @@ class number : public value
>  class string : public value
>  {
>   public:
> -  string (const char *utf8) : m_utf8 (xstrdup (utf8)) {}
> +  string (const char *utf8);
>~string () { free (m_utf8); }
>
>enum kind get_kind () const FINAL OVERRIDE { return JSON_STRING; }
> diff --git a/gcc/optinfo-emit-json.cc b/gcc/optinfo-emit-json.cc
> index bf1172a..6460a81 100644
> --- a/gcc/optinfo-emit-json.cc
> +++ b/gcc/optinfo-emit-json.cc
> @@ -202,10 +202,12 @@ optrecord_json_writer::impl_location_to_json 
> (dump_impl_location_t loc)
>  json::object *
>  optrecord_json_writer::location_to_json (location_t loc)
>  {
> +  gcc_assert (LOCATION_LOCUS (loc) != UNKNOWN_LOCATION);
> +  expanded_location exploc = expand_location (loc);
>json::object *obj = new json::object ();
> -  obj->set ("file", new json::string (LOCATION_FILE (loc)));
> 

Re: [PATCH] -fsave-optimization-record: add contrib/optrecord.py

2018-07-24 Thread Richard Biener
On Mon, Jul 23, 2018 at 9:20 PM David Malcolm  wrote:
>
> On Mon, 2018-07-23 at 11:46 +0200, Richard Biener wrote:
> > On Fri, Jul 20, 2018 at 6:27 PM David Malcolm 
> > wrote:
> > >
> > > This patch adds a Python 3 module to "contrib" for reading the
> > > output of
> > > -fsave-optimization-record.
> > >
> > > It can be imported from other Python code, or run standalone as a
> > > script,
> > > in which case it prints the saved messages in a form resembling GCC
> > > diagnostics.
> > >
> > > OK for trunk?
> >
> > OK, but shouldn't there maybe a user-visible (and thus installed)
> > tool for
> > this kind of stuff?  Which would mean to place it somewhere else.
>
> As well as this support code, I've got code that uses it to generate
> HTML reports.  I'm thinking that all this Python code might be better
> to maintain in an entirely separate repository, as a third-party
> project (maintained by me, under some suitable Free Software license,
> accessible via PyPI), since I suspect that the release cycle ought to
> be different from that of gcc itself.
>
> Would that be a better approach?

Possibly.

Richard.

> Dave
>
> > Richard.
> >
> > > contrib/ChangeLog:
> > > * optrecord.py: New file.
> > > ---
> > >  contrib/optrecord.py | 295
> > > +++
> > >  1 file changed, 295 insertions(+)
> > >  create mode 100755 contrib/optrecord.py
> > >
> > > diff --git a/contrib/optrecord.py b/contrib/optrecord.py
> > > new file mode 100755
> > > index 000..b07488e
> > > --- /dev/null
> > > +++ b/contrib/optrecord.py
> > > @@ -0,0 +1,295 @@
> > > +#!/usr/bin/env python3
> > > +#
> > > +# Python module for working with the result of -fsave-
> > > optimization-record
> > > +# Contributed by David Malcolm .
> > > +#
> > > +# Copyright (C) 2018 Free Software Foundation, Inc.
> > > +# This file is part of GCC.
> > > +#
> > > +# GCC is free software; you can redistribute it and/or modify it
> > > under
> > > +# the terms of the GNU General Public License as published by the
> > > Free
> > > +# Software Foundation; either version 3, or (at your option) any
> > > later
> > > +# version.
> > > +#
> > > +# GCC is distributed in the hope that it will be useful, but
> > > WITHOUT ANY
> > > +# WARRANTY; without even the implied warranty of MERCHANTABILITY
> > > or
> > > +# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> > > License
> > > +# for more details.
> > > +#
> > > +# You should have received a copy of the GNU General Public
> > > License
> > > +# along with GCC; see the file COPYING3.  If not see
> > > +# .  */
> > > +
> > > +import argparse
> > > +import json
> > > +import os
> > > +import sys
> > > +
> > > +class TranslationUnit:
> > > +"""Top-level class for containing optimization records"""
> > > +@staticmethod
> > > +def from_filename(filename):
> > > +with open(filename) as f:
> > > +root_obj = json.load(f)
> > > +return TranslationUnit(filename, root_obj)
> > > +
> > > +def __init__(self, filename, json_obj):
> > > +self.filename = filename
> > > +self.pass_by_id = {}
> > > +
> > > +# Expect a 3-tuple
> > > +metadata, passes, records = json_obj
> > > +
> > > +self.format = metadata['format']
> > > +self.generator = metadata['generator']
> > > +self.passes = [Pass(obj, self) for obj in passes]
> > > +self.records = [Record(obj, self) for obj in records]
> > > +
> > > +def __repr__(self):
> > > +return ('TranslationUnit(%r, %r, %r, %r)'
> > > +% (self.filename, self.generator, self.passes,
> > > self.records))
> > > +
> > > +class Pass:
> > > +"""An optimization pass"""
> > > +def __init__(self, json_obj, tu):
> > > +self.id_ = json_obj['id']
> > > +self.name = json_obj['name']
> > > +self.num = json_obj['num']
> > > +self.optgroups = set(json_obj['optgroups']) # list of
> > > strings
> > > +self.type = json_obj['type']
> > > +tu.pass_by_id[self.id_] = self
> > > +self.children = [Pass(child, tu)
> > > + for child in json_obj.get('children',
> > > [])]
> > > +
> > > +def __repr__(self):
> > > +return ('Pass(%r, %r, %r, %r)'
> > > +% (self.name, self.num, self.optgroups,
> > > self.type))
> > > +
> > > +def from_optional_json_field(cls, jsonobj, field):
> > > +if field not in jsonobj:
> > > +return None
> > > +return cls(jsonobj[field])
> > > +
> > > +class ImplLocation:
> > > +"""An implementation location (within the compiler itself)"""
> > > +def __init__(self, json_obj):
> > > +self.file = json_obj['file']
> > > +self.line = json_obj['line']
> > > +self.function = json_obj['function']
> > > +
> > > +def __repr__(self):
> > > +return ('ImplLocation(%r, %r, %r)'
> > > +% (self.file, 

Re: [PATCH] Make function clone name numbering independent.

2018-07-24 Thread Michael Ploujnikov
On 2018-07-20 06:05 AM, Richard Biener wrote:
> On Fri, Jul 20, 2018 at 4:48 AM Michael Ploujnikov
>  wrote:
>>
>> On 2018-07-17 04:25 PM, Michael Ploujnikov wrote:
>>> On 2018-07-17 06:02 AM, Richard Biener wrote:
 On Tue, Jul 17, 2018 at 8:10 AM Bernhard Reutner-Fischer
  wrote:
>
> On 16 July 2018 21:38:36 CEST, Michael Ploujnikov 
>  wrote:
>> Hi,
>>
>
>> +clone_fn_ids = hash_map::create_ggc
>> (1000);
>
> Isn't 1000 a bit excessive? What about 64 or thereabouts?

 I'm not sure we should throw memory at this "problem" in this way.
 What specific issue
 does this address?
>>>
>>> This goes along with the general theme of preventing changes to one 
>>> function affecting codegen of others. What I'm seeing in this case is when 
>>> a function bar is modified codegen decides to create more clones of it (eg: 
>>> during the constprop pass). These extra clones cause the global counter to 
>>> increment so the clones of the unchanged function foo are named differently 
>>> only because of a source change to bar. I was hoping that the testcase 
>>> would be a good illustration, but perhaps not; is there anything else I can 
>>> do to make this clearer?
>>>
>>>

 Iff then I belive forgoing the automatic counter addidion is the way
 to go and hand
 control of that to the callers (for example the caller in
 lto/lto-partition.c doesn't
 even seem to need that counter.
>>>
>>> How can you tell that privatize_symbol_name_1 doesn't need the counter? I'm 
>>> assuming it has a good reason to call clone_function_name_1 rather than 
>>> appending ".lto_priv" itself.
>>>
 You also assume the string you key on persists - luckily the
 lto-partition.c caller
 leaks it but this makes your approach quite fragile in my eye (put the 
 logic
 into clone_function_name instead, where you at least know you are dealing
 with a string from an indentifier which are never collected).

 Richard.

>>>
>>> Is this what you had in mind?:
>>>
>>> diff --git gcc/cgraphclones.c gcc/cgraphclones.c
>>> index 6e84a31..f000420 100644
>>> --- gcc/cgraphclones.c
>>> +++ gcc/cgraphclones.c
>>> @@ -512,7 +512,7 @@ cgraph_node::create_clone (tree new_decl, profile_count 
>>> prof_count,
>>>return new_node;
>>>  }
>>>
>>> -static GTY(()) unsigned int clone_fn_id_num;
>>> +static GTY(()) hash_map *clone_fn_ids;
>>>
>>>  /* Return a new assembler name for a clone with SUFFIX of a decl named
>>> NAME.  */
>>> @@ -521,14 +521,13 @@ tree
>>>  clone_function_name_1 (const char *name, const char *suffix)
>>>  {
>>>size_t len = strlen (name);
>>> -  char *tmp_name, *prefix;
>>> +  char *prefix;
>>>
>>>prefix = XALLOCAVEC (char, len + strlen (suffix) + 2);
>>>memcpy (prefix, name, len);
>>>strcpy (prefix + len + 1, suffix);
>>>prefix[len] = symbol_table::symbol_suffix_separator ();
>>> -  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++);
>>> -  return get_identifier (tmp_name);
>>> +  return get_identifier (prefix);
>>>  }
>>>
>>>  /* Return a new assembler name for a clone of DECL with SUFFIX.  */
>>> @@ -537,7 +536,17 @@ tree
>>>  clone_function_name (tree decl, const char *suffix)
>>>  {
>>>tree name = DECL_ASSEMBLER_NAME (decl);
>>> -  return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix);
>>> +  char *decl_name = IDENTIFIER_POINTER (name);
>>> +  char *numbered_name;
>>> +  unsigned int *suffix_counter;
>>> +  if (!clone_fn_ids) {
>>> +/* Initialize the per-function counter hash table if this is the first 
>>> call */
>>> +clone_fn_ids = hash_map::create_ggc (64);
>>> +  }
>>> +  suffix_counter = _fn_ids->get_or_insert(name);
>>> +  ASM_FORMAT_PRIVATE_NAME (numbered_name, decl_name, *suffix_counter);
>>> +  *suffix_counter = *suffix_counter + 1;
>>> +  return clone_function_name_1 (numbered_name, suffix);
>>>  }
>>>
>>> - Michael
>>>
>>>
>>>
>>
>> Ping, and below is the updated version of the full patch with changelogs:
>>
>>
>> gcc:
>> 2018-07-16  Michael Ploujnikov  
>>
>>Make function clone name numbering independent.
>>* cgraphclones.c: Replace clone_fn_id_num with clone_fn_ids.
>>(clone_function_name_1): Move suffixing to clone_function_name
>>and change it to use per-function clone_fn_ids.
>>
>> testsuite:
>> 2018-07-16  Michael Ploujnikov  
>>
>> Clone id counters should be completely independent from one another.
>> * gcc/testsuite/gcc.dg/independent-cloneids-1.c: New test.
>>
>> ---
>>  gcc/cgraphclones.c| 19 ++
>>  gcc/testsuite/gcc.dg/independent-cloneids-1.c | 38 
>> +++
>>  2 files changed, 52 insertions(+), 5 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.dg/independent-cloneids-1.c
>>
>> diff --git gcc/cgraphclones.c gcc/cgraphclones.c
>> index 6e84a31..e1a77a2 100644
>> --- gcc/cgraphclones.c
>> +++ gcc/cgraphclones.c
>> @@ -512,7 

[PATCH] Minor refactoring in header

2018-07-24 Thread Jonathan Wakely

* include/std/bit (__countl_zero, __countr_zero, __popcount): Use
local variables for number of digits instead of type aliases.
(__log2p1): Remove redundant branch also checked in __countl_zero.

Tested powerpc64le-linux, committed to trunk.


commit b5cb477ecf9e98d10f4608bd2d94f57ee3bd48e5
Author: Jonathan Wakely 
Date:   Tue Jul 24 14:10:01 2018 +0100

Minor refactoring in  header

* include/std/bit (__countl_zero, __countr_zero, __popcount): Use
local variables for number of digits instead of type aliases.
(__log2p1): Remove redundant branch also checked in __countl_zero.

diff --git a/libstdc++-v3/include/std/bit b/libstdc++-v3/include/std/bit
index a23f2ba60d1..0aebac28758 100644
--- a/libstdc++-v3/include/std/bit
+++ b/libstdc++-v3/include/std/bit
@@ -62,45 +62,44 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 constexpr int
 __countl_zero(_Tp __x) noexcept
 {
-  using __limits = numeric_limits<_Tp>;
+  constexpr auto _Nd = numeric_limits<_Tp>::digits;
 
   if (__x == 0)
-return __limits::digits;
+return _Nd;
 
-  using __limits_ull = numeric_limits;
-  using __limits_ul = numeric_limits;
-  using __limits_u = numeric_limits;
+  constexpr auto _Nd_ull = numeric_limits::digits;
+  constexpr auto _Nd_ul = numeric_limits::digits;
+  constexpr auto _Nd_u = numeric_limits::digits;
 
-  if _GLIBCXX17_CONSTEXPR (__limits::digits <= __limits_u::digits)
+  if _GLIBCXX17_CONSTEXPR (_Nd <= _Nd_u)
{
- constexpr int __diff = __limits_u::digits - __limits::digits;
+ constexpr int __diff = _Nd_u - _Nd;
  return __builtin_clz(__x) - __diff;
}
-  else if _GLIBCXX17_CONSTEXPR (__limits::digits <= __limits_ul::digits)
+  else if _GLIBCXX17_CONSTEXPR (_Nd <= _Nd_ul)
{
- constexpr int __diff = __limits_ul::digits - __limits::digits;
+ constexpr int __diff = _Nd_ul - _Nd;
  return __builtin_clzl(__x) - __diff;
}
-  else if _GLIBCXX17_CONSTEXPR (__limits::digits <= __limits_ull::digits)
+  else if _GLIBCXX17_CONSTEXPR (_Nd <= _Nd_ull)
{
- constexpr int __diff = __limits_ull::digits - __limits::digits;
+ constexpr int __diff = _Nd_ull - _Nd;
  return __builtin_clzll(__x) - __diff;
}
-  else // (__limits::digits > __limits_ull::digits)
+  else // (_Nd > _Nd_ull)
{
- static_assert(__limits::digits <= (2 * __limits_ull::digits),
+ static_assert(_Nd <= (2 * _Nd_ull),
"Maximum supported integer size is 128-bit");
 
- unsigned long long __high = __x >> __limits_ull::digits;
+ unsigned long long __high = __x >> _Nd_ull;
  if (__high != 0)
{
- constexpr int __diff
-   = (2 * __limits_ull::digits) - __limits::digits;
+ constexpr int __diff = (2 * _Nd_ull) - _Nd;
  return __builtin_clzll(__high) - __diff;
}
- unsigned long long __low = __x & __limits_ull::max();
- return (__limits::digits - __limits_ull::digits)
-   + __builtin_clzll(__low);
+ constexpr auto __max_ull = numeric_limits::max();
+ unsigned long long __low = __x & __max_ull;
+ return (_Nd - _Nd_ull) + __builtin_clzll(__low);
}
 }
 
@@ -117,31 +116,32 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 constexpr int
 __countr_zero(_Tp __x) noexcept
 {
-  using __limits = numeric_limits<_Tp>;
+  constexpr auto _Nd = numeric_limits<_Tp>::digits;
 
   if (__x == 0)
-return __limits::digits;
+return _Nd;
 
-  using __limits_ull = numeric_limits;
-  using __limits_ul = numeric_limits;
-  using __limits_u = numeric_limits;
+  constexpr auto _Nd_ull = numeric_limits::digits;
+  constexpr auto _Nd_ul = numeric_limits::digits;
+  constexpr auto _Nd_u = numeric_limits::digits;
 
-  if _GLIBCXX17_CONSTEXPR (__limits::digits <= __limits_u::digits)
+  if _GLIBCXX17_CONSTEXPR (_Nd <= _Nd_u)
return __builtin_ctz(__x);
-  else if _GLIBCXX17_CONSTEXPR (__limits::digits <= __limits_ul::digits)
+  else if _GLIBCXX17_CONSTEXPR (_Nd <= _Nd_ul)
return __builtin_ctzl(__x);
-  else if _GLIBCXX17_CONSTEXPR (__limits::digits <= __limits_ull::digits)
+  else if _GLIBCXX17_CONSTEXPR (_Nd <= _Nd_ull)
return __builtin_ctzll(__x);
-  else // (__limits::digits > __limits_ull::digits)
+  else // (_Nd > _Nd_ull)
{
- static_assert(__limits::digits <= (2 * __limits_ull::digits),
+ static_assert(_Nd <= (2 * _Nd_ull),
"Maximum supported integer size is 128-bit");
 
- unsigned long long __low = __x & __limits_ull::max();
+ constexpr auto __max_ull = numeric_limits::max();
+ unsigned long long __low = __x & __max_ull;
  if (__low != 0)
return 

[PR 80689] Copy small aggregates element-wise

2018-07-24 Thread Martin Jambor
Hi,

I'd like to propose again a new variant of a fix that I sent here in
November (https://gcc.gnu.org/ml/gcc-patches/2017-10/msg00881.html) that
avoids store-to-load forwarding stalls in the ImageMagick benchmark by
expanding copies of very small simple aggregates element-wise rather
than "by pieces."

I have adjusted the patch only a little, most notably there is only one
controlling parameter and that is the maximum number of element-copies
that are necessary to copy an aggregate to use this method, rather than
a size constraint.  On x86_64, that parameter is set 4, on other
architectures I leave it at zero but it could help them too.

I have benchmarked the patch on top of a recent trunk on an AMD Ryzen
and an Intel Skylake machine using SPEC 2006 and SPEC 2017 CPU suites.
The only non-noise difference was 538.imagick_r, which on Ryzen and -O2
improved by 13% (generich march/mtune) and 20% (native march/mtune) and
on Skylake by 7% and 9% with the same switches.

I have bootstrapped the patch on x86_64-linux and (after changing the
parameter default) also on ppc64-linux and aarch64-linux.  I have not
done any benchmarking on non-x86_64 machines.

I'll be grateful for any comments, eventually I'd like to get approval
to commit it to trunk.

Thanks,

Martin



2018-07-10  Martin Jambor  

PR target/80689
* tree-sra.h: New file.
* ipa-prop.h: Moved declaration of build_ref_for_offset to
tree-sra.h.
* expr.c: Include params.h and tree-sra.h.
(emit_move_elementwise): New function.
(store_expr_with_bounds): Optionally use it.
* ipa-cp.c: Include tree-sra.h.
* params.def (PARAM_MAX_ELEMS_FOR_ELEMENTWISE_COPY): New.
* doc/invoke.texi: Document max-elems-for-elementwise-copy.
* config/i386/i386.c (ix86_option_override_internal): Set
PARAM_MAX_ELEMS_FOR_ELEMENTWISE_COPY to 4.
* tree-sra.c: Include tree-sra.h.
(scalarizable_type_p): Renamed to
simple_mix_of_records_and_arrays_p, made public, renamed the
second parameter to allow_char_arrays, added count_p parameter.
(extract_min_max_idx_from_array): New function.
(completely_scalarize): Moved bits of the function to
extract_min_max_idx_from_array.

testsuite/
* gcc.target/i386/pr80689-1.c: New test.
---
 gcc/config/i386/i386.c|   4 +
 gcc/doc/invoke.texi   |   4 +
 gcc/expr.c| 102 +++-
 gcc/ipa-cp.c  |   1 +
 gcc/ipa-prop.h|   4 -
 gcc/params.def|   6 +
 gcc/testsuite/gcc.target/i386/pr80689-1.c |  38 ++
 gcc/tree-sra.c| 134 --
 gcc/tree-sra.h|  34 ++
 9 files changed, 282 insertions(+), 45 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80689-1.c
 create mode 100644 gcc/tree-sra.h

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ee409cfe7e4..d90d1e1ade0 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -4698,6 +4698,10 @@ ix86_option_override_internal (bool main_args_p,
 ix86_tune_cost->l2_cache_size,
 opts->x_param_values,
 opts_set->x_param_values);
+  maybe_set_param_value (PARAM_MAX_ELEMS_FOR_ELEMENTWISE_COPY,
+4,
+opts->x_param_values,
+opts_set->x_param_values);
 
   /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful.  */
   if (opts->x_flag_prefetch_loop_arrays < 0
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 1dcdfb51c47..240934b07d8 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -11309,6 +11309,10 @@ away for the unroll-and-jam transformation to be 
considered profitable.
 @item unroll-jam-max-unroll
 The maximum number of times the outer loop should be unrolled by
 the unroll-and-jam transformation.
+
+@item max-elems-for-elementwise-copy
+The maximum number of elements required to copy a structure and/or
+array element-wise (as opposed to a bulk memory move).
 @end table
 @end table
 
diff --git a/gcc/expr.c b/gcc/expr.c
index f665e187ebb..657043e43fe 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -62,7 +62,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "ccmp.h"
 #include "gimple-fold.h"
 #include "rtx-vector-builder.h"
-
+#include "params.h"
+#include "tree-sra.h"
 
 /* If this is nonzero, we do not bother generating VOLATILE
around volatile memory references, and we are willing to
@@ -5419,6 +5420,77 @@ emit_storent_insn (rtx to, rtx from)
   return maybe_expand_insn (code, 2, ops);
 }
 
+/* Generate code for copying data of type TYPE at SOURCE plus OFFSET to TARGET
+   plus OFFSET, but do so element-wise and/or field-wise for each record and
+   array within TYPE.  TYPE must 

Re: [Patch, Fortran] PR 57160: short-circuit IF only with -ffrontend-optimize

2018-07-24 Thread Janus Weil
2018-07-24 11:12 GMT+02:00 Dominique d'Humières :
> Hi Janus,
>
>> gfortran currently does short-circuiting, and after my patch for PR
>> 85599 warns about cases where this might remove an impure function
>> call (which potentially can change results).
>>
>> Now, this PR (57160) is about code which relies on the
>> short-circuiting behavior. Since short-circuiting is not guaranteed by
>> the standard, such code is invalid. Generating a warning or an error
>> at compile-time is a bit harder here, though, since there are multiple
>> variations of such a situation, e.g.:
>> * ASSOCIATED(p) .AND. p%T
>> * ALLOCATED(a) .AND. a%T
>> * i> * …
>>
>
> Aren’t you confusing portability with validity?

I don' think so.


> The above codes are indeed invalid without short-circuit evaluation,
>  but I did not find anything in the standard saying such codes are
> invalid with short-circuit evaluation.

If they're not valid without short-circuiting, then they're not valid
at all, because the Fortran standard does not require a processor to
do short-circuiting.


>> The suggestion in the PR was to do short-circuiting only with
>> optimization flags, but inhibit it with -O0, so that the faulty code
>> will run into a segfault (or runtime error) at least when
>> optimizations are disabled, and the problem can be identified.
>
> This PR has nothing to do with optimization and I think
>  it is a very bad idea to (ab)use any optimization option.

The PR is definitely related to optimization, because the fact that
the invalid test case works at all is only due to an optional
optimization called short-circuiting.

Don't get me wrong. I think it would be great if the Fortran standard
would *require* short-circuiting for and/or operators (or had separate
operators which do that). That would allow to write code like
"ASSOCIATED(p) .AND. p%T" (which would actually be useful).
Unfortunately that's not the case, therefore such code is plain
invalid.


> Please leave the default behavior (and test) as they are now.

You seriously prefer to keep an invalid test case in the testsuite?


> If you want non short-circuit evaluation, introduce an option for it.

Your argument could easily be reversed: If you want short-circuiting,
go introduce an option for it.

I'm sure we'll not get anywhere this way, and I do think that Joost's
suggestion to avoid short-circuiting with -O0 (and possibly -Og) is
very reasonable: People who want maximum performance still get that
with -O3, and at the same time they can still check their codes for
errors with -O0.

What is your problem?!?


> Note that the warning introduce for pr85599 should be disabled
> for non short-circuit evaluations.

That's a valid point.

Cheers,
Janus


[PATCH] Add BIT_FIELD_REF canonicalization patterns

2018-07-24 Thread Richard Biener


The following adds some simple BIT_FIELD_REF canonicalization patterns
that fire during SCCVN expression simplification.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2018-07-24  Richard Biener  

* match.pd: Add BIT_FIELD_REF canonicalizations.

Index: gcc/match.pd
===
--- gcc/match.pd(revision 262940)
+++ gcc/match.pd(working copy)
@@ -4659,6 +4659,19 @@ (define_operator_list COND_TERNARY
 /* Canonicalizations of BIT_FIELD_REFs.  */
 
 (simplify
+ (BIT_FIELD_REF (BIT_FIELD_REF @0 @1 @2) @3 @4)
+ (BIT_FIELD_REF @0 @3 { const_binop (PLUS_EXPR, bitsizetype, @2, @4); }))
+
+(simplify
+ (BIT_FIELD_REF (view_convert @0) @1 @2)
+ (BIT_FIELD_REF @0 @1 @2))
+
+(simplify
+ (BIT_FIELD_REF @0 @1 integer_zerop)
+ (if (tree_int_cst_equal (@1, TYPE_SIZE (TREE_TYPE (@0
+  (view_convert @0)))
+
+(simplify
  (BIT_FIELD_REF @0 @1 @2)
  (switch
   (if (TREE_CODE (TREE_TYPE (@0)) == COMPLEX_TYPE


[PATCH] Make __resource_adaptor_imp usable with C++17 memory_resource

2018-07-24 Thread Jonathan Wakely

By making the memory_resource base class a template parameter the
__resource_adaptor_imp can be used to adapt an allocator into a
std::pmr::memory_resource instead of experimental::pmr::memory_resource.

No uses for this in the library but somebody might want to do it, and
it costs us nothing to support.

* include/experimental/memory_resource: Adjust comments and
whitespace.
(__resource_adaptor_imp): Add second template parameter for type of
memory resource base class.
(memory_resource): Define default constructor, destructor, copy
constructor and copy assignment operator as defaulted.

Tested powerpc64le-linux, committed to trunk.

commit ce04fa1c00b40a938cc25a264836a2e30149056e
Author: Jonathan Wakely 
Date:   Tue Jul 24 12:24:53 2018 +0100

Make __resource_adaptor_imp usable with C++17 memory_resource

By making the memory_resource base class a template parameter the
__resource_adaptor_imp can be used to adapt an allocator into a
std::pmr::memory_resource instead of experimental::pmr::memory_resource.

* include/experimental/memory_resource: Adjust comments and
whitespace.
(__resource_adaptor_imp): Add second template parameter for type of
memory resource base class.
(memory_resource): Define default constructor, destructor, copy
constructor and copy assignment operator as defaulted.

diff --git a/libstdc++-v3/include/experimental/memory_resource 
b/libstdc++-v3/include/experimental/memory_resource
index 83379d1367a..7ce64457a11 100644
--- a/libstdc++-v3/include/experimental/memory_resource
+++ b/libstdc++-v3/include/experimental/memory_resource
@@ -29,12 +29,12 @@
 #ifndef _GLIBCXX_EXPERIMENTAL_MEMORY_RESOURCE
 #define _GLIBCXX_EXPERIMENTAL_MEMORY_RESOURCE 1
 
-#include 
+#include   // align, uses_allocator, __uses_alloc
+#include // pair, 
experimental::erased_type
+#include   // atomic
 #include 
-#include 
-#include 
 #include 
-#include 
+#include 
 
 namespace __gnu_cxx _GLIBCXX_VISIBILITY(default)
 {
@@ -51,39 +51,41 @@ inline namespace fundamentals_v2 {
 namespace pmr {
 #define __cpp_lib_experimental_memory_resources 201402L
 
-  class memory_resource;
-
-  template 
-class polymorphic_allocator;
-
-  template 
-class __resource_adaptor_imp;
-
-  template 
-using resource_adaptor = __resource_adaptor_imp<
-  typename allocator_traits<_Alloc>::template rebind_alloc>;
-
-  template 
-struct __uses_allocator_construction_helper;
-
-  // Global memory resources
-  memory_resource* new_delete_resource() noexcept;
-  memory_resource* null_memory_resource() noexcept;
-
-  // The default memory resource
-  memory_resource* get_default_resource() noexcept;
-  memory_resource* set_default_resource(memory_resource* __r) noexcept;
-
   // Standard memory resources
 
   // 8.5 Class memory_resource
+  class memory_resource;
+
+  // 8.6 Class template polymorphic_allocator
+  template
+class polymorphic_allocator;
+
+  template
+class __resource_adaptor_imp;
+
+  // 8.7 Alias template resource_adaptor
+  template
+using resource_adaptor = __resource_adaptor_imp<
+  typename allocator_traits<_Alloc>::template rebind_alloc>;
+
+  // 8.8 Global memory resources
+  memory_resource* new_delete_resource() noexcept;
+  memory_resource* null_memory_resource() noexcept;
+  memory_resource* get_default_resource() noexcept;
+  memory_resource* set_default_resource(memory_resource* __r) noexcept;
+
+  // TODO 8.9 Pool resource classes
+
   class memory_resource
   {
-  protected:
 static constexpr size_t _S_max_align = alignof(max_align_t);
 
   public:
-virtual ~memory_resource() { }
+memory_resource() = default;
+memory_resource(const memory_resource&) = default;
+virtual ~memory_resource() = default;
+
+memory_resource& operator=(const memory_resource&) = default;
 
 void*
 allocate(size_t __bytes, size_t __alignment = _S_max_align)
@@ -109,18 +111,15 @@ namespace pmr {
   };
 
   inline bool
-  operator==(const memory_resource& __a,
-const memory_resource& __b) noexcept
+  operator==(const memory_resource& __a, const memory_resource& __b) noexcept
   { return &__a == &__b || __a.is_equal(__b); }
 
   inline bool
-  operator!=(const memory_resource& __a,
-const memory_resource& __b) noexcept
+  operator!=(const memory_resource& __a, const memory_resource& __b) noexcept
   { return !(__a == __b); }
 
 
-  // 8.6 Class template polymorphic_allocator
-  template 
+  template
 class polymorphic_allocator
 {
   using __uses_alloc1_ = __uses_alloc1;
@@ -134,14 +133,15 @@ namespace pmr {
   template
void
_M_construct(__uses_alloc1_, _Tp1* __p, _Args&&...  __args)
-   { ::new(__p) _Tp1(allocator_arg, this->resource(),
- std::forward<_Args>(__args)...); }
+   {
+ 

[PATCH] PR libstdc++/70966 fix lifetime bug for default resource

2018-07-24 Thread Jonathan Wakely

Similar to what I did for the new_delete_resource and
null_memory_resource objects, this makes the atomic
immortal. This ensure it can still be used after static destructors
start running.

PR libstdc++/70966
* include/experimental/memory_resource (__get_default_resource): Use
placement new to create an object with dynamic storage duration.

Tested powerpc64le-linux, committed to trunk.


commit 5e51f3630b506d993737c95c65b251acaa433076
Author: Jonathan Wakely 
Date:   Mon Jul 23 23:30:38 2018 +0100

PR libstdc++/70966 fix lifetime bug for default resource

PR libstdc++/70966
* include/experimental/memory_resource (__get_default_resource): Use
placement new to create an object with dynamic storage duration.

diff --git a/libstdc++-v3/include/experimental/memory_resource 
b/libstdc++-v3/include/experimental/memory_resource
index 61273fc2c85..83379d1367a 100644
--- a/libstdc++-v3/include/experimental/memory_resource
+++ b/libstdc++-v3/include/experimental/memory_resource
@@ -459,12 +459,6 @@ namespace pmr {
 };
 
   // Global memory resources
-  inline std::atomic&
-  __get_default_resource()
-  {
-static atomic _S_default_resource(new_delete_resource());
-return _S_default_resource;
-  }
 
   inline memory_resource*
   new_delete_resource() noexcept
@@ -499,6 +493,16 @@ namespace pmr {
   }
 
   // The default memory resource
+
+  inline std::atomic&
+  __get_default_resource()
+  {
+using type = atomic;
+alignas(type) static unsigned char __buf[sizeof(type)];
+static type* __r = new(__buf) type(new_delete_resource());
+return *__r;
+  }
+
   inline memory_resource*
   get_default_resource() noexcept
   { return __get_default_resource().load(); }


[PATCH] Reorder conditions in uses-allocator construction helper

2018-07-24 Thread Jonathan Wakely

The erased_type condition is only true for code using the Library
Fundamentals TS, so assume it's less common and only check it after
checking for convertibility.

This does mean for types using erased_type the more expensive
convertibility check is done first, but such types are rare.

* include/bits/uses_allocator.h (__is_erased_or_convertible): Reorder
conditions. Add comments.
* testsuite/20_util/uses_allocator/69293_neg.cc: Adjust dg-error line.
* testsuite/20_util/uses_allocator/cons_neg.cc: Likewise.
* testsuite/20_util/scoped_allocator/69293_neg.cc: Likewise.

Tested powerpc64le-linux, committed to trunk.


commit baf0d8cca17bc1223580b416c501f4b74bbca9b3
Author: redi 
Date:   Tue Jul 24 13:03:25 2018 +

Reorder conditions in uses-allocator construction helper

The erased_type condition is only true for code using the Library
Fundamentals TS, so assume it's less common and only check it after
checking for convertibility.

This does mean for types using erased_type the more expensive
convertibility check is done first, but such types are rare.

* include/bits/uses_allocator.h (__is_erased_or_convertible): 
Reorder
conditions. Add comments.
* testsuite/20_util/uses_allocator/69293_neg.cc: Adjust dg-error 
line.
* testsuite/20_util/uses_allocator/cons_neg.cc: Likewise.
* testsuite/20_util/scoped_allocator/69293_neg.cc: Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@262945 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libstdc++-v3/include/bits/uses_allocator.h 
b/libstdc++-v3/include/bits/uses_allocator.h
index 820a2a59894..3ef2830bebc 100644
--- a/libstdc++-v3/include/bits/uses_allocator.h
+++ b/libstdc++-v3/include/bits/uses_allocator.h
@@ -36,11 +36,15 @@ namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
+  // This is used for std::experimental::erased_type from Library Fundamentals.
   struct __erased_type { };
 
+  // This also supports the "type-erased allocator" protocol from the
+  // Library Fundamentals TS, where allocator_type is erased_type.
+  // The second condition will always be false for types not using the TS.
   template
 using __is_erased_or_convertible
-  = __or_, is_convertible<_Alloc, _Tp>>;
+  = __or_, is_same<_Tp, __erased_type>>;
 
   /// [allocator.tag]
   struct allocator_arg_t { explicit allocator_arg_t() = default; };
diff --git a/libstdc++-v3/testsuite/20_util/scoped_allocator/69293_neg.cc 
b/libstdc++-v3/testsuite/20_util/scoped_allocator/69293_neg.cc
index 621ff47c7a3..168079fd5f9 100644
--- a/libstdc++-v3/testsuite/20_util/scoped_allocator/69293_neg.cc
+++ b/libstdc++-v3/testsuite/20_util/scoped_allocator/69293_neg.cc
@@ -46,5 +46,5 @@ test01()
   scoped_alloc sa;
   auto p = sa.allocate(1);
   sa.construct(p);  // this is required to be ill-formed
-  // { dg-error "static assertion failed" "" { target *-*-* } 90 }
+  // { dg-error "static assertion failed" "" { target *-*-* } 94 }
 }
diff --git a/libstdc++-v3/testsuite/20_util/uses_allocator/69293_neg.cc 
b/libstdc++-v3/testsuite/20_util/uses_allocator/69293_neg.cc
index 348ed41a7bc..eaf432491c6 100644
--- a/libstdc++-v3/testsuite/20_util/uses_allocator/69293_neg.cc
+++ b/libstdc++-v3/testsuite/20_util/uses_allocator/69293_neg.cc
@@ -44,5 +44,5 @@ test01()
 {
   alloc_type a;
   std::tuple t(std::allocator_arg, a); // this is required to be ill-formed
-  // { dg-error "static assertion failed" "" { target *-*-* } 90 }
+  // { dg-error "static assertion failed" "" { target *-*-* } 94 }
 }
diff --git a/libstdc++-v3/testsuite/20_util/uses_allocator/cons_neg.cc 
b/libstdc++-v3/testsuite/20_util/uses_allocator/cons_neg.cc
index 8894e389cec..bb8c38d1e49 100644
--- a/libstdc++-v3/testsuite/20_util/uses_allocator/cons_neg.cc
+++ b/libstdc++-v3/testsuite/20_util/uses_allocator/cons_neg.cc
@@ -43,4 +43,4 @@ void test01()
 
   tuple t(allocator_arg, a, 1);
 }
-// { dg-error "static assertion failed" "" { target *-*-* } 90 }
+// { dg-error "static assertion failed" "" { target *-*-* } 94 }


Re: [RFC 2/3, debug] Add fkeep-vars-live

2018-07-24 Thread Tom de Vries
On 07/24/2018 01:46 PM, Jakub Jelinek wrote:
> On Tue, Jul 24, 2018 at 01:37:32PM +0200, Tom de Vries wrote:
>> Another drawback is that the fake uses confuse the unitialized warning
>> analysis, so that is switched off for -fkeep-vars-live.
> 
> Is that really needed?  I.e. can't you for the purpose of uninitialized
> warning analysis ignore the clobber = var uses?
> 

This seems to work on the test-case that failed during testing
(g++.dg/uninit-pred-4.C):
...
diff --git a/gcc/tree-ssa-uninit.c b/gcc/tree-ssa-uninit.c
index 77f090bfa80..953db9ed02d 100644
--- a/gcc/tree-ssa-uninit.c
+++ b/gcc/tree-ssa-uninit.c
@@ -132,6 +132,9 @@ warn_uninit (enum opt_code wc, tree t, tree expr,
tree var,
   if (is_gimple_assign (context)
   && gimple_assign_rhs_code (context) == COMPLEX_EXPR)
 return;
+  if (gimple_assign_single_p (context)
+  && TREE_CLOBBER_P (gimple_assign_lhs (context)))
+return;
   if (!has_undefined_value_p (t))
 return;
...
But I don't know the pass well enough to know whether this is a
sufficient fix.


> Is the -fkeep-vars-live option -fcompare-debug friendly?
> 

I think so, there's no reference to debug flags or instructions.

>> --- a/gcc/cfgexpand.c
>> +++ b/gcc/cfgexpand.c
>> @@ -3533,6 +3533,13 @@ expand_clobber (tree lhs)
>>  }
>>  }
>>  
>> +static void
>> +expand_use (tree rhs)
>> +{
>> +  rtx target = expand_expr (rhs, NULL_RTX, VOIDmode, EXPAND_NORMAL);
>> +  emit_use (target);
>> +}
> 
> Missing function comment.
> 
>> +fkeep-vars-live
>> +Common Report Var(flag_keep_vars_live) Optimization
>> +Add artificial uses of local vars at end of scope.
> 
> at the end of scope?

Is this better?
+Add artificial use for each local variable at the end of the
declaration scope

Thanks,
- Tom


[PATCH] Explain asan parameters in params.def (PR sanitizer/79635).

2018-07-24 Thread Martin Liška
Hi.

That's simple patch that improves documentation as requested
in the PR.

Ready for trunk?
Martin

gcc/ChangeLog:

2018-07-24  Martin Liska  

PR sanitizer/79635
* params.def: Explain ASan abbreviation and provide
a documentation link.
---
 gcc/params.def | 2 ++
 1 file changed, 2 insertions(+)


diff --git a/gcc/params.def b/gcc/params.def
index df3a06b5d9d..6bedeb5aa2b 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -1189,6 +1189,8 @@ DEFPARAM (PARAM_MAX_SLSR_CANDIDATE_SCAN,
 	  "strength reduction.",
 	  50, 1, 99)
 
+/* ASan stands for AddressSanitizer: https://github.com/google/sanitizers.  */
+
 DEFPARAM (PARAM_ASAN_STACK,
  "asan-stack",
  "Enable asan stack protection.",



[PATCH] Come up with TARGET_GET_VALID_OPTION_VALUES option hook (PR driver/83193).

2018-07-24 Thread Martin Liška
Hi.

I'm sending updated version of the patch. It comes up with a new target common 
hook
that provide option completion list. It's used both in --help=target and with 
--completion
option. I implemented support for -match and -mtune for i386 target.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Martin
>From 28171804c0aaa5c992eb38d2193ce36746b27231 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 20 Jul 2018 09:58:16 +0200
Subject: [PATCH] Come up with TARGET_GET_VALID_OPTION_VALUES option hook (PR
 driver/83193).

gcc/ChangeLog:

2018-07-24  Martin Liska  

PR driver/83193
	* common/common-target.def: Add TARGET_GET_VALID_OPTION_VALUES.
	* common/common-targhooks.c (default_get_valid_option_values):
New function.
	* common/common-targhooks.h (default_get_valid_option_values):
Likewise.
	* common/config/i386/i386-common.c: Move processor_target_table
from i386.c.
	(ix86_get_valid_option_values): New function.
	(TARGET_GET_VALID_OPTION_VALUES): New macro.
	* config/i386/i386.c (struct ptt): Move to i386-common.c.
	(PTA_*): Move all defined masks into i386-common.c.
	(ix86_function_specific_restore): Use new processor_cost_table.
	* config/i386/i386.h (struct ptt): Moved from i386.c.
	(struct pta): Likewise.
	* doc/tm.texi: Document new TARGET_GET_VALID_OPTION_VALUES.
	* doc/tm.texi.in: Likewise.
	* opt-suggestions.c (option_proposer::suggest_option):
Pass prefix to build_option_suggestions.
	(option_proposer::get_completions): Likewise.
	(option_proposer::build_option_suggestions): Use the new target
hook.
	* opts.c (struct option_help_tuple): New struct.
	(print_filtered_help): Use the new target hook.

gcc/testsuite/ChangeLog:

2018-07-24  Martin Liska  

	* gcc.dg/completion-4.c: New test.
---
 gcc/common/common-target.def |  10 +
 gcc/common/common-targhooks.c|   9 +
 gcc/common/common-targhooks.h|   1 +
 gcc/common/config/i386/i386-common.c | 266 +
 gcc/config/i386/i386.c   | 413 +++
 gcc/config/i386/i386.h   | 146 ++
 gcc/doc/tm.texi  |   4 +
 gcc/doc/tm.texi.in   |   2 +
 gcc/opt-suggestions.c|  27 +-
 gcc/opt-suggestions.h|   6 +-
 gcc/opts.c   |  33 +++
 gcc/testsuite/gcc.dg/completion-4.c  |   6 +
 12 files changed, 546 insertions(+), 377 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/completion-4.c

diff --git a/gcc/common/common-target.def b/gcc/common/common-target.def
index e0afbc6af29..99d5264448a 100644
--- a/gcc/common/common-target.def
+++ b/gcc/common/common-target.def
@@ -80,6 +80,16 @@ DEFHOOK
  bool, (bool report, struct gcc_options *opts),
  hook_bool_bool_gcc_optionsp_false)
 
+DEFHOOK
+(get_valid_option_values,
+"The hook is used for options that have a non-trivial list of\
+ possible option values.  OPTION_CODE is option code of opt_code\
+ enum type.  PREFIX is used for bash completion and allows an implementation\
+ to return more specific completion based on the prefix.  All string values\
+ should be allocated from heap memory and consumers should release them.",
+ vec, (int option_code, const char *prefix),
+ default_get_valid_option_values)
+
 /* Leave the boolean fields at the end.  */
 
 /* True if unwinding tables should be generated by default.  */
diff --git a/gcc/common/common-targhooks.c b/gcc/common/common-targhooks.c
index b1090190664..747c7da55ca 100644
--- a/gcc/common/common-targhooks.c
+++ b/gcc/common/common-targhooks.c
@@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tm.h"
 #include "common/common-target.h"
 #include "common/common-targhooks.h"
+#include "opts.h"
 
 /* Determine the exception handling mechanism for the target.  */
 
@@ -77,6 +78,14 @@ default_target_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED,
   return true;
 }
 
+/* Default version of TARGET_GET_VALID_OPTION_VALUES.  */
+
+vec
+default_get_valid_option_values (int option_code, const char *prefix)
+{
+  return vec ();
+}
+
 const struct default_options empty_optimization_table[] =
   {
 { OPT_LEVELS_NONE, 0, NULL, 0 }
diff --git a/gcc/common/common-targhooks.h b/gcc/common/common-targhooks.h
index d290d7f3e21..4bdf8efdbe6 100644
--- a/gcc/common/common-targhooks.h
+++ b/gcc/common/common-targhooks.h
@@ -28,6 +28,7 @@ extern bool default_target_handle_option (struct gcc_options *,
 	  struct gcc_options *,
 	  const struct cl_decoded_option *,
 	  location_t);
+extern vec default_get_valid_option_values (int, const char *);
 
 extern const struct default_options empty_optimization_table[];
 
diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
index 70b3c3f2fc3..35ac71be5e5 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -1459,4 +1459,270 @@ i386_except_unwind_info (struct gcc_options 

Re: [RFC 2/3, debug] Add fkeep-vars-live

2018-07-24 Thread Jakub Jelinek
On Tue, Jul 24, 2018 at 01:37:32PM +0200, Tom de Vries wrote:
> Another drawback is that the fake uses confuse the unitialized warning
> analysis, so that is switched off for -fkeep-vars-live.

Is that really needed?  I.e. can't you for the purpose of uninitialized
warning analysis ignore the clobber = var uses?

Is the -fkeep-vars-live option -fcompare-debug friendly?

> --- a/gcc/cfgexpand.c
> +++ b/gcc/cfgexpand.c
> @@ -3533,6 +3533,13 @@ expand_clobber (tree lhs)
>  }
>  }
>  
> +static void
> +expand_use (tree rhs)
> +{
> +  rtx target = expand_expr (rhs, NULL_RTX, VOIDmode, EXPAND_NORMAL);
> +  emit_use (target);
> +}

Missing function comment.

> +fkeep-vars-live
> +Common Report Var(flag_keep_vars_live) Optimization
> +Add artificial uses of local vars at end of scope.

at the end of scope?
> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c
> @@ -1264,6 +1264,23 @@ asan_poison_variables (hash_set *variables, bool 
> poison, gimple_seq *seq_p
>  }
>  }
>  
> +static gimple_seq
> +gimple_build_uses (tree vars)

Missing function comment.

> --- a/gcc/tree-ssa-alias.c
> +++ b/gcc/tree-ssa-alias.c
> @@ -1368,6 +1368,10 @@ refs_may_alias_p_1 (ao_ref *ref1, ao_ref *ref2, bool 
> tbaa_p)
>poly_int64 max_size1 = -1, max_size2 = -1;
>bool var1_p, var2_p, ind1_p, ind2_p;
>  
> +  if ((ref1->ref && TREE_CLOBBER_P (ref1->ref)) ||

|| should go on the next line.

> +  (ref2->ref && TREE_CLOBBER_P (ref2->ref)))
> +return false;
> +
>gcc_checking_assert ((!ref1->ref
>   || TREE_CODE (ref1->ref) == SSA_NAME
>   || DECL_P (ref1->ref)

> --- a/gcc/tree-ssa-uninit.c
> +++ b/gcc/tree-ssa-uninit.c
> @@ -2628,7 +2628,7 @@ warn_uninitialized_phi (gphi *phi, vec 
> *worklist,
>  static bool
>  gate_warn_uninitialized (void)
>  {
> -  return warn_uninitialized || warn_maybe_uninitialized;
> +  return (warn_uninitialized || warn_maybe_uninitialized) && 
> !flag_keep_vars_live;
>  }

See above.

Jakub


[RFC 3/3, debug] Add fdebug-nops and fkeep-vars-live to Og only

2018-07-24 Thread Tom de Vries
On Tue, Jul 24, 2018 at 01:30:30PM +0200, Tom de Vries wrote:
> On 07/16/2018 05:10 PM, Tom de Vries wrote:
> > On 07/16/2018 03:50 PM, Richard Biener wrote:
> >> On Mon, 16 Jul 2018, Tom de Vries wrote:
> >>> Any comments?
> >>
> >> Interesting idea.  I wonder if that should be generalized
> >> to other places
> > 
> > I kept the option name general, to allow for that.
> > 
> > And indeed, this is a point-fix patch. I've been playing around with a
> > more generic patch that adds nops such that each is_stmt .loc is
> > associated with a unique insn, but that was embedded in an
> > fkeep-vars-live branch, so this patch is minimally addressing the first
> > problem I managed to reproduce on trunk.
> > 
> >> and how we can avoid compare-debug failures
> >> (var-tracking usually doesn't change code-generation).
> >>
> > 
> 
> I'll post this patch series (the current state of my fkeep-vars-live
> branch) in reply to this email:
> 
>  1  [debug] Add fdebug-nops
>  2  [debug] Add fkeep-vars-live
>  3  [debug] Add fdebug-nops and fkeep-vars-live to Og only
> 
> Bootstrapped and reg-tested on x86_64. ChangeLog entries and function
> header comments missing.
> 
> Comments welcome.
> 

This switches on fdebug-nops and fkeep-vars-live by default for Og only, in
order to excercise the new switches in combination with the optimization
option they're intended to be used with.

Resulting fixes:
...
FAIL: gcc.dg/guality/pr54200.c  -Og -DPREVENT_OPTIMIZATION  line . z == 3
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+4 a[0] == 1
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+3 a[1] == 2
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+2 a[2] == 3
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+1 *p == 3
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line . *q == 2
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+4 a[0] == 1
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+3 a[1] == 2
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+2 a[2] == 13
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+1 *p == 13
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line . *q == 2
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+4 a[0] == 1
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+3 a[1] == 12
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+2 a[2] == 13
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+1 *p == 13
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line . *q == 12
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+3 a[1] == 5
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+2 a[2] == 6
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+1 *p == 6
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line . *q == 5
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+3 a[1] == 5
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+2 a[2] == 26
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+1 *p == 26
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line . *q == 5
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+7 a[1] == 25
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+6 a[2] == 26
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+5 *p == 26
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+4 p[-1] == 25
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+1 q[1] == 26
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line . *q == 25
...

The intended nature of Og is: offering a reasonable level of optimization
while maintaining fast compilation and a good debugging experience.
Todo: assess compile time and runtime cost of adding adding fdebug-nops and
fkeep-vars-live to Og, to see if nature of Og has been changed.

PR debug/78685

[debug] Add fdebug-nops and fkeep-vars-live to Og only

---
 gcc/common/common-target.h | 1 +
 gcc/opts.c | 6 ++
 gcc/testsuite/gcc.dg/pr84614.c | 2 +-
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/gcc/common/common-target.h b/gcc/common/common-target.h
index 9b796c8f085..17e8950a027 100644
--- a/gcc/common/common-target.h
+++ b/gcc/common/common-target.h
@@ -34,6 +34,7 @@ enum opt_levels
   OPT_LEVELS_1_PLUS, /* -O1 and above, including -Os and -Og.  */
   OPT_LEVELS_1_PLUS_SPEED_ONLY, /* -O1 and above, but not -Os or -Og.  */
   OPT_LEVELS_1_PLUS_NOT_DEBUG, /* -O1 and above, but not -Og.  */
+  OPT_LEVELS_1_DEBUG, /* -Og.  */
   OPT_LEVELS_2_PLUS, /* -O2 and above, including -Os.  */
   OPT_LEVELS_2_PLUS_SPEED_ONLY, /* -O2 and above, but not -Os or -Og.  */
   OPT_LEVELS_3_PLUS, /* -O3 and above.  */
diff --git a/gcc/opts.c b/gcc/opts.c
index 17d91988ada..e8142eb6920 

[RFC 2/3, debug] Add fkeep-vars-live

2018-07-24 Thread Tom de Vries
On Tue, Jul 24, 2018 at 01:30:30PM +0200, Tom de Vries wrote:
> On 07/16/2018 05:10 PM, Tom de Vries wrote:
> > On 07/16/2018 03:50 PM, Richard Biener wrote:
> >> On Mon, 16 Jul 2018, Tom de Vries wrote:
> >>> Any comments?
> >>
> >> Interesting idea.  I wonder if that should be generalized
> >> to other places
> > 
> > I kept the option name general, to allow for that.
> > 
> > And indeed, this is a point-fix patch. I've been playing around with a
> > more generic patch that adds nops such that each is_stmt .loc is
> > associated with a unique insn, but that was embedded in an
> > fkeep-vars-live branch, so this patch is minimally addressing the first
> > problem I managed to reproduce on trunk.
> > 
> >> and how we can avoid compare-debug failures
> >> (var-tracking usually doesn't change code-generation).
> >>
> > 
> 
> I'll post this patch series (the current state of my fkeep-vars-live
> branch) in reply to this email:
> 
>  1  [debug] Add fdebug-nops
>  2  [debug] Add fkeep-vars-live
>  3  [debug] Add fdebug-nops and fkeep-vars-live to Og only
> 
> Bootstrapped and reg-tested on x86_64. ChangeLog entries and function
> header comments missing.
> 
> Comments welcome.
> 

This patch adds fake uses of user variables at the point where they go out of
scope, to keep user variables inspectable throughout the application.

This approach will generate sub-optimal code: in some cases, the executable
code will go through efforts to keep a var alive, while var-tracking can
easily compute the value of the var from something else.

Also, the compiler treats the fake use as any other use, so it may keep an
expensive resource like a register occupied (if we could mark the use as a
cold use or some such, we could tell optimizers that we need the value, but
it's ok if getting the value is expensive, so it could be spilled instead of
occupying a register).

The current implementation is expected to increase register pressure, and
therefore spilling, but we'd still expect less memory accesses then with O0.

Another drawback is that the fake uses confuse the unitialized warning
analysis, so that is switched off for -fkeep-vars-live.

PR debug/78685

[debug] Add fkeep-vars-live

---
 gcc/cfgexpand.c  |  9 +++
 gcc/common.opt   |  4 +++
 gcc/function.c   |  5 ++--
 gcc/gimplify.c   | 44 +++-
 gcc/testsuite/gcc.dg/guality/pr54200-2.c |  3 +--
 gcc/tree-cfg.c   |  1 +
 gcc/tree-sra.c   |  7 +
 gcc/tree-ssa-alias.c |  4 +++
 gcc/tree-ssa-structalias.c   |  3 ++-
 gcc/tree-ssa-uninit.c|  2 +-
 10 files changed, 70 insertions(+), 12 deletions(-)

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index d6e3c382085..eb9e36a8c5b 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -3533,6 +3533,13 @@ expand_clobber (tree lhs)
 }
 }
 
+static void
+expand_use (tree rhs)
+{
+  rtx target = expand_expr (rhs, NULL_RTX, VOIDmode, EXPAND_NORMAL);
+  emit_use (target);
+}
+
 /* A subroutine of expand_gimple_stmt, expanding one gimple statement
STMT that doesn't require special handling for outgoing edges.  That
is no tailcalls and no GIMPLE_COND.  */
@@ -3632,6 +3639,8 @@ expand_gimple_stmt_1 (gimple *stmt)
  /* This is a clobber to mark the going out of scope for
 this LHS.  */
  expand_clobber (lhs);
+   else if (TREE_CLOBBER_P (lhs))
+ expand_use (rhs);
else
  expand_assignment (lhs, rhs,
 gimple_assign_nontemporal_move_p (
diff --git a/gcc/common.opt b/gcc/common.opt
index 984b351cd79..a29e320ba45 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1496,6 +1496,10 @@ fdebug-nops
 Common Report Var(flag_debug_nops) Optimization
 Insert nops if that might improve debugging of optimized code.
 
+fkeep-vars-live
+Common Report Var(flag_keep_vars_live) Optimization
+Add artificial uses of local vars at end of scope.
+
 fkeep-gc-roots-live
 Common Undocumented Report Var(flag_keep_gc_roots_live) Optimization
 ; Always keep a pointer to a live memory block
diff --git a/gcc/function.c b/gcc/function.c
index dee303cdbdd..6367d282db3 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -1964,11 +1964,12 @@ instantiate_virtual_regs (void)
   {
/* These patterns in the instruction stream can never be recognized.
   Fortunately, they shouldn't contain virtual registers either.  */
-if (GET_CODE (PATTERN (insn)) == USE
-   || GET_CODE (PATTERN (insn)) == CLOBBER
+if (GET_CODE (PATTERN (insn)) == CLOBBER
|| GET_CODE (PATTERN (insn)) == ASM_INPUT
|| DEBUG_MARKER_INSN_P (insn))
  continue;
+else if (GET_CODE (PATTERN (insn)) == USE)
+ instantiate_virtual_regs_in_rtx ( (insn));
else if 

[RFC 1/3, debug] Add fdebug-nops

2018-07-24 Thread Tom de Vries
On Tue, Jul 24, 2018 at 01:30:30PM +0200, Tom de Vries wrote:
> On 07/16/2018 05:10 PM, Tom de Vries wrote:
> > On 07/16/2018 03:50 PM, Richard Biener wrote:
> >> On Mon, 16 Jul 2018, Tom de Vries wrote:
> >>> Any comments?
> >>
> >> Interesting idea.  I wonder if that should be generalized
> >> to other places
> > 
> > I kept the option name general, to allow for that.
> > 
> > And indeed, this is a point-fix patch. I've been playing around with a
> > more generic patch that adds nops such that each is_stmt .loc is
> > associated with a unique insn, but that was embedded in an
> > fkeep-vars-live branch, so this patch is minimally addressing the first
> > problem I managed to reproduce on trunk.
> > 
> >> and how we can avoid compare-debug failures
> >> (var-tracking usually doesn't change code-generation).
> >>
> > 
> 
> I'll post this patch series (the current state of my fkeep-vars-live
> branch) in reply to this email:
> 
>  1  [debug] Add fdebug-nops
>  2  [debug] Add fkeep-vars-live
>  3  [debug] Add fdebug-nops and fkeep-vars-live to Og only
> 
> Bootstrapped and reg-tested on x86_64. ChangeLog entries and function
> header comments missing.
> 
> Comments welcome.
> 

Consider this gdb session in foo of pr54200-2.c (using -Os):
...
(gdb) n
26return a; /* { dg-final { gdb-test . "(int)a" "6" } } */
(gdb) p a
'a' has unknown type; cast it to its declared type
(gdb) n
main () at pr54200-2.c:34
34return 0;
...

The problem is that the scope in which a is declared ends at .LBE7, and the
statement .loc for line 26 ends up attached to the ret insn:
...
.loc 1 24 11
addl%edx, %eax
.LVL1:
# DEBUG a => ax
.loc 1 26 7 is_stmt 1
.LBE7:
.loc 1 28 1 is_stmt 0
ret
.cfi_endproc
...

This patch fixes the problem (for Og and Os, the 'DEBUG a => ax' is missing
for O1 and higher) by adding a nop after the ".loc is_stmt 1" annotation:
...
.loc 1 24 11
addl%edx, %eax
 .LVL1:
# DEBUG a => ax
.loc 1 26 7 is_stmt 1
+   nop
 .LBE7:
.loc 1 28 1 is_stmt 0
ret
.cfi_endproc
...

and instead we have:
...
(gdb) n
26return a; /* { dg-final { gdb-test . "(int)a" "6" } } */
(gdb) p a
$1 = 6
(gdb) n
main () at pr54200-2.c:34
34return 0;
...

The option should have the same effect as using a debugger with location views
capability.

There's a design principle in GCC that code generation and debug generation
are independent.  This guarantees that if you're encountering a problem in an
application without debug info, you can recompile it with -g and be certain
that you can reproduce the same problem, and use the debug info to debug the
problem.  This invariant is enforced by bootstrap-debug.  The fdebug-nops
breaks this invariant, but we consider this acceptable since it is intended
for use in combination with -Og -g in the standard edit-compile-debug cycle,
where -g is assumed to be already present at the point of encountering a
problem.

PR debug/78685

[debug] Add fdebug-nops

---
 gcc/common.opt   |  4 +++
 gcc/testsuite/gcc.dg/guality/pr54200-2.c | 37 +
 gcc/var-tracking.c   | 46 
 3 files changed, 87 insertions(+)

diff --git a/gcc/common.opt b/gcc/common.opt
index 4bf8de90331..984b351cd79 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1492,6 +1492,10 @@ Common Report Var(flag_hoist_adjacent_loads) Optimization
 Enable hoisting adjacent loads to encourage generating conditional move
 instructions.
 
+fdebug-nops
+Common Report Var(flag_debug_nops) Optimization
+Insert nops if that might improve debugging of optimized code.
+
 fkeep-gc-roots-live
 Common Undocumented Report Var(flag_keep_gc_roots_live) Optimization
 ; Always keep a pointer to a live memory block
diff --git a/gcc/testsuite/gcc.dg/guality/pr54200-2.c 
b/gcc/testsuite/gcc.dg/guality/pr54200-2.c
new file mode 100644
index 000..e30e3c92b94
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/guality/pr54200-2.c
@@ -0,0 +1,37 @@
+/* { dg-do run } */
+/* { dg-skip-if "" { *-*-* }  { "*" } { "-Og" "-Os" "-O0" } } */
+/* { dg-options "-g -fdebug-nops -DMAIN" } */
+
+#include "prevent-optimization.h"
+
+int o ATTRIBUTE_USED;
+
+void
+bar (void)
+{
+  o = 2;
+}
+
+int __attribute__((noinline,noclone))
+foo (int z, int x, int b)
+{
+  if (x == 1)
+{
+  bar ();
+  return z;
+}
+  else
+{
+  int a = (x + z) + b;
+  /* Add cast to prevent unsupported.  */
+  return a; /* { dg-final { gdb-test . "(int)a" "6" } } */
+}
+}
+
+#ifdef MAIN
+int main ()
+{
+  foo (3, 2, 1);
+  return 0;
+}
+#endif
diff --git a/gcc/var-tracking.c b/gcc/var-tracking.c
index 5537fa64085..1349454d5f2 100644
--- a/gcc/var-tracking.c
+++ b/gcc/var-tracking.c
@@ -10427,6 +10427,49 @@ vt_finalize (void)
   vui_allocated = 0;
 }
 
+static void
+insert_debug_nops (void)
+{
+  rtx_insn *debug_marker = 

Re: [RFC][debug] Add -fadd-debug-nops

2018-07-24 Thread Tom de Vries
On 07/16/2018 05:10 PM, Tom de Vries wrote:
> On 07/16/2018 03:50 PM, Richard Biener wrote:
>> On Mon, 16 Jul 2018, Tom de Vries wrote:
>>> Any comments?
>>
>> Interesting idea.  I wonder if that should be generalized
>> to other places
> 
> I kept the option name general, to allow for that.
> 
> And indeed, this is a point-fix patch. I've been playing around with a
> more generic patch that adds nops such that each is_stmt .loc is
> associated with a unique insn, but that was embedded in an
> fkeep-vars-live branch, so this patch is minimally addressing the first
> problem I managed to reproduce on trunk.
> 
>> and how we can avoid compare-debug failures
>> (var-tracking usually doesn't change code-generation).
>>
> 

I'll post this patch series (the current state of my fkeep-vars-live
branch) in reply to this email:

 1  [debug] Add fdebug-nops
 2  [debug] Add fkeep-vars-live
 3  [debug] Add fdebug-nops and fkeep-vars-live to Og only

Bootstrapped and reg-tested on x86_64. ChangeLog entries and function
header comments missing.

Comments welcome.

Thanks,
- Tom


Re: [PATCH][debug] Handle references to skipped params in remap_ssa_name

2018-07-24 Thread Jakub Jelinek
On Tue, Jul 24, 2018 at 12:29:37PM +0200, Tom de Vries wrote:
> > I wonder if it makes sense to disambiguate things from the gcc side
> > by generating an empty location description for known optimized out
> > values (the standard seems to explicitely call that out as a way to
> > say "optimized out").

You can't emit the empty location descriptions in the middle of other
expressions though, and if you have e.g. an expression that uses
DW_OP_GNU_variable_size only conditionally, even if you know that
the corresponding DIE doesn't have location, the containing DWARF expression
could still be well defined if the condition guarding the
DW_OP_GNU_variable_size isn't true.
And, also, for DW_OP_GNU_variable_size DIE lookup should be performed,
say you have only a declaration in the TU, but definition somewhere else,
in that case the debugger should be using that other TU's definition
with hopefully location description.  Or if there is a global variable
interposed by some other TU, again, DW_OP_GNU_variable_size should follow
to whatever variable wins.

Jakub


RE: [PATCH][GCC][front-end][build-machinery][opt-framework] Allow setting of stack-clash via configure options. [Patch (4/6)]

2018-07-24 Thread Joseph Myers
On Tue, 24 Jul 2018, tamar.christ...@arm.com wrote:

> This patch defines a configure option to allow the setting of the default
> guard size via configure flags when building the target.

If you add a configure option, you must also add documentation for it in 
install.texi.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH][debug] Handle references to skipped params in remap_ssa_name

2018-07-24 Thread Tom de Vries
On 07/19/2018 10:30 AM, Richard Biener wrote:
> On Wed, Jul 18, 2018 at 3:42 PM Tom de Vries  wrote:
>>
>> On 07/06/2018 12:28 PM, Richard Biener wrote:
>>> On Thu, Jul 5, 2018 at 4:12 PM Tom de Vries  wrote:

 On 07/05/2018 01:39 PM, Richard Biener wrote:
> On Thu, Jul 5, 2018 at 1:25 PM Tom de Vries  wrote:
>>
>> [ was: Re: [testsuite/guality, committed] Prevent optimization of local 
>> in
>> vla-1.c ]
>>
>> On Wed, Jul 04, 2018 at 02:32:27PM +0200, Tom de Vries wrote:
>>> On 07/03/2018 11:05 AM, Tom de Vries wrote:
 On 07/02/2018 10:16 AM, Jakub Jelinek wrote:
> On Mon, Jul 02, 2018 at 09:44:04AM +0200, Richard Biener wrote:
>> Given the array has size i + 1 it's upper bound should be 'i' and 'i'
>> should be available via DW_OP_[GNU_]entry_value.
>>
>> I see it is
>>
>> <175>   DW_AT_upper_bound : 10 byte block: 75 1 8 20 24 8 20 26 
>> 31
>> 1c   (DW_OP_breg5 (rdi): 1; DW_OP_const1u: 32; DW_OP_shl;
>> DW_OP_const1u: 32; DW_OP_shra; DW_OP_lit1; DW_OP_minus)
>>
>> and %rdi is 1.  Not sure why gdb fails to print it's length.  Yes, 
>> the
>> storage itself doesn't have a location but the
>> type specifies the size.
>>
>> (gdb) ptype a
>> type = char [variable length]
>> (gdb) p sizeof(a)
>> $3 = 0
>>
>> this looks like a gdb bug to me?
>>

 With gdb patch:
 ...
 diff --git a/gdb/findvar.c b/gdb/findvar.c
 index 8ad5e25cb2..ebaff923a1 100644
 --- a/gdb/findvar.c
 +++ b/gdb/findvar.c
 @@ -789,6 +789,8 @@ default_read_var_value
break;

  case LOC_OPTIMIZED_OUT:
 +  if (is_dynamic_type (type))
 +   type = resolve_dynamic_type (type, NULL,
 +/* Unused address.  */ 0);
return allocate_optimized_out_value (type);

  default:
 ...

 I get:
 ...
 $ ./gdb -batch -ex "b f1" -ex "r" -ex "p sizeof (a)" vla-1.exe
 Breakpoint 1 at 0x4004a8: file vla-1.c, line 17.

 Breakpoint 1, f1 (i=i@entry=5) at vla-1.c:17
 17return a[0];
 $1 = 6
 ...

>>>
>>> Well, for -O1 and -O2.
>>>
>>> For O3, I get instead:
>>> ...
>>> $ ./gdb vla-1.exe -q -batch -ex "b f1" -ex "run" -ex "p sizeof (a)"
>>> Breakpoint 1 at 0x4004b0: f1. (2 locations)
>>>
>>> Breakpoint 1, f1 (i=5) at vla-1.c:17
>>> 17return a[0];
>>> $1 = 0
>>> ...
>>>
>>
>> Hi,
>>
>> When compiling guality/vla-1.c with -O3 -g, vla 'a[i + 1]' in f1 is 
>> optimized
>> away, but f1 still contains a debug expression describing the upper 
>> bound of the
>> vla (D.1914):
>> ...
>>  __attribute__((noinline))
>>  f1 (intD.6 iD.1900)
>>  {
>>
>>saved_stack.1_2 = __builtin_stack_save ();
>># DEBUG BEGIN_STMT
>># DEBUG D#3 => i_1(D) + 1
>># DEBUG D#2 => (long intD.8) D#3
>># DEBUG D#1 => D#2 + -1
>># DEBUG D.1914 => (sizetype) D#1
>> ...
>>
>> Then f1 is cloned to a version f1.constprop with no parameters, 
>> eliminating
>> parameter i, and 'DEBUG D#3 => i_1(D) + 1' turns into 'D#3 => NULL'.
>> Consequently, 'print sizeof (a)' yields '0' in gdb.
>
> So does gdb correctly recognize there isn't any size available or do we 
> somehow
> generate invalid debug info, not recognizing that D#3 => NULL means
> "optimized out" and thus all dependent expressions are "optimized out" as 
> well?
>
> That is, shouldn't gdb do
>
> (gdb) print sizeof (a)
> 
>
> ?

 The type for the vla gcc is emitting is an DW_TAG_array_type with
 DW_TAG_subrange_type without DW_AT_upper_bound or DW_AT_count, which
 makes the upper bound value 'unknown'. So I'd say the debug info is valid.
>>>
>>> OK, that sounds reasonable.  I wonder if languages like Ada have a way
>>> to declare an array type with unknown upper bound but known lower bound.
>>> For
>>>
>>> typedef int arr[];
>>> arr *x;
>>>
>>> we generate just
>>>
>>>  <1><2d>: Abbrev Number: 2 (DW_TAG_typedef)
>>> <2e>   DW_AT_name: arr
>>> <32>   DW_AT_decl_file   : 1
>>> <33>   DW_AT_decl_line   : 1
>>> <34>   DW_AT_decl_column : 13
>>> <35>   DW_AT_type: <0x39>
>>>  <1><39>: Abbrev Number: 3 (DW_TAG_array_type)
>>> <3a>   DW_AT_type: <0x44>
>>> <3e>   DW_AT_sibling : <0x44>
>>>  <2><42>: Abbrev Number: 4 (DW_TAG_subrange_type)
>>>  <2><43>: Abbrev Number: 0
>>>
>>> which does
>>>
>>> (gdb) ptype arr
>>> type = int []
>>> (gdb) ptype x
>>> type = int (*)[]
>>> (gdb) p sizeof 

RE: [PATCH][GCC][front-end][opt-framework] Update options framework for parameters to properly handle and validate configure time params. [Patch (2/3)]

2018-07-24 Thread tamar . christina
Hi All,

This patch is re-spun to handle the configure changes in patch 4 / 6 of the 
previous series.

This patch now changes it so that default parameters are validated during
initialization. This change is needed to ensure parameters set via by the
target specific common initialization routines still keep the parameters within
the valid range.

Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and no 
issues.
Both targets were tested with stack clash on and off by default.

Ok for trunk?

Thanks,
Tamar

gcc/
2018-07-24  Tamar Christina  

* params.c (validate_param): New.
(add_params): Use it.
(set_param_value): Refactor param validation into validate_param.
(diagnostic.h): Include.
* diagnostic.h (diagnostic_ready_p): New.

> -Original Message-
> From: Jeff Law 
> Sent: Wednesday, July 11, 2018 20:24
> To: Tamar Christina ; gcc-patches@gcc.gnu.org
> Cc: nd ; jos...@codesourcery.com
> Subject: Re: [PATCH][GCC][front-end][opt-framework] Update options
> framework for parameters to properly handle and validate configure time
> params. [Patch (2/3)]
> 
> On 07/11/2018 05:24 AM, Tamar Christina wrote:
> > Hi All,
> >
> > This patch builds on a previous patch to pass param options down from
> > configure by adding more expansive validation and correctness checks.
> >
> > These are set very early on and allow the target to validate or reject
> > the values as they see fit.
> >
> > To do this compiler_param has been extended to hold a value set at
> > configure time, this value is used to be able to distinguish between
> >
> > 1) default value
> > 2) configure value
> > 3) back-end default
> > 4) user specific value.
> >
> > The priority of the values should be 4 > 2 > 3 > 1.  The compiler will
> > now also validate the values in params.def after setting them.  This
> > means invalid values will no longer be accepted.
> >
> > This also changes it so that default parameters are validated during
> > initialization. This change is needed to ensure parameters set via
> > configure or by the target specific common initialization routines
> > still keep the parameters within the valid range.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> and no issues.
> > Both targets were tested with stack clash on and off by default.
> >
> > Ok for trunk?
> >
> > Thanks,
> > Tamar
> >
> > gcc/
> > 2018-07-11  Tamar Christina  
> >
> > * params.h (struct param_info): Add configure_value.
> > * params.c (DEFPARAMCONF): New.
> > (DEFPARAM, DEFPARAMENUM5): Set configure_value.
> > (validate_param): New.
> > (add_params): Use it.
> > (set_param_value): Refactor param validation into validate_param.
> > (maybe_set_param_value): Don't override value from configure.
> > (diagnostic.h): Include.
> > * params-enum.h (DEFPARAMCONF): New.
> > * params-list.h: Likewise.
> > * params-options.h: Likewise.
> > * params.def (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE):
> Use it.
> > * diagnostic.h (diagnostic_ready_p): New.
> Generally OK, though probably should depend on what we decide WRT
> configurability.  ie, I'm not convinced we need to be able to set the default
> via a configure time option.  And if we don't support that this patch gets
> somewhat simpler.
> 
> jeff
> >

diff --git a/gcc/diagnostic.h b/gcc/diagnostic.h
index cf3a610f3d945f2dbbfde7d9cf7a66f46ad6f0b1..584b5877b489d3cce5c18da2db5f73b7b41a72a4 100644
--- a/gcc/diagnostic.h
+++ b/gcc/diagnostic.h
@@ -250,6 +250,10 @@ diagnostic_inhibit_notes (diagnostic_context * context)
and similar functions.  */
 extern diagnostic_context *global_dc;
 
+/* Returns whether the diagnostic framework has been intialized already and is
+   ready for use.  */
+#define diagnostic_ready_p() (global_dc->printer != NULL)
+
 /* The total count of a KIND of diagnostics emitted so far.  */
 #define diagnostic_kind_count(DC, DK) (DC)->diagnostic_count[(int) (DK)]
 
diff --git a/gcc/params.c b/gcc/params.c
index eb663be880a91dc0adce2a84c6bad7e06b4c72c3..b6a33dfd6bf8c4df43fdac91e30ac6d082f39071 100644
--- a/gcc/params.c
+++ b/gcc/params.c
@@ -25,6 +25,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "params-enum.h"
 #include "diagnostic-core.h"
+#include "diagnostic.h"
 #include "spellcheck.h"
 
 /* An array containing the compiler parameters and their current
@@ -58,6 +59,10 @@ static const param_info lang_independent_params[] = {
   { NULL, 0, 0, 0, NULL, NULL }
 };
 
+static bool
+validate_param (const int value, const param_info param, const int index);
+
+
 /* Add the N PARAMS to the current list of compiler parameters.  */
 
 void
@@ -68,12 +73,26 @@ add_params (const param_info params[], size_t n)
   /* Allocate enough space for the new parameters.  */
   compiler_params = XRESIZEVEC (param_info, compiler_params,
 num_compiler_params + n);
+  param_info *dst_params = compiler_params + num_compiler_params;

RE: [PATCH][GCC][AArch64] Cleanup the AArch64 testsuite when stack-clash is on [Patch (6/6)]

2018-07-24 Thread tamar . christina
Hi All,

This patch cleans up the testsuite when a run is done with stack clash
protection turned on.

Concretely this switches off -fstack-clash-protection for a couple of tests:

* sve: We don't yet support stack-clash-protection and sve, so for now turn 
these off.
* assembler scan: some tests are quite fragile in that they check for exact
   assembly output, e.g. check for exact amount of sub etc.  These won't
   match now.
* vla: Some of the ubsan tests negative array indices. Because the arrays 
weren't
   used before the incorrect $sp wouldn't have been used. The correct value 
is
   restored on ret.  Now however we probe the $sp which causes a segfault.
* params: When testing the parameters we have to skip these on AArch64 because 
of our
  custom constraints on them.  We already test them separately so this 
isn't a
  loss.

Note that the testsuite is not entire clean due to gdb failure caused by alloca 
with
stack clash. On AArch64 we output an incorrect .loc directive, but this is 
already the
case with the current implementation in GCC and is a bug unrelated to this 
patch series.

Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and no 
issues.
Both targets were tested with stack clash on and off by default.

Ok for trunk?

Thanks,
Tamar

gcc/testsuite/
2018-07-24  Tamar Christina  

PR target/86486
* gcc.dg/pr82788.c: Skip for AArch64.
* gcc.dg/guality/vla-1.c: Turn off stack-clash.
* gcc.target/aarch64/subsp.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_3.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_3.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_4.c: Likewise.
* gcc.dg/params/blocksort-part.c: Skip stack-clash checks
on AArch64.
* gcc.dg/stack-check-10.c: Add AArch64 specific checks.
* gcc.dg/stack-check-5.c: Add AArch64 specific checks.
* gcc.dg/stack-check-6a.c: Skip on AArch64, we don't support this.
* testsuite/lib/target-supports.exp
(check_effective_target_frame_pointer_for_non_leaf): AArch64 does not
require frame pointer for non-leaf functions.

> -Original Message-
> From: Tamar Christina 
> Sent: Wednesday, July 11, 2018 12:23
> To: gcc-patches@gcc.gnu.org
> Cc: nd ; James Greenhalgh ;
> Richard Earnshaw ; Marcus Shawcroft
> 
> Subject: [PATCH][GCC][AArch64] Cleanup the AArch64 testsuite when stack-
> clash is on [Patch (6/6)]
> 
> Hi All,
> 
> This patch cleans up the testsuite when a run is done with stack clash
> protection turned on.
> 
> Concretely this switches off -fstack-clash-protection for a couple of tests:
> 
> * sve: We don't yet support stack-clash-protection and sve, so for now turn
> these off.
> * assembler scan: some tests are quite fragile in that they check for exact
>assembly output, e.g. check for exact amount of sub etc.  These won't
>match now.
> * vla: Some of the ubsan tests negative array indices. Because the arrays
> weren't
>used before the incorrect $sp wouldn't have been used. The correct
> value is
>restored on ret.  Now however we probe the $sp which causes a segfault.
> * params: When testing the parameters we have to skip these on AArch64
> because of our
>   custom constraints on them.  We already test them separately so this
> isn't a
>   loss.
> 
> Note that the testsuite is not entire clean due to gdb failure caused by 
> alloca
> with stack clash. On AArch64 we output an incorrect .loc directive, but this 
> is
> already the case with the current implementation in GCC and is a bug
> unrelated to this patch series.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> and no issues.
> Both targets were tested with stack clash on and off by default.
> 
> Ok for trunk?
> 
> Thanks,
> Tamar
> 
> gcc/testsuite/
> 2018-07-11  Tamar Christina  
> 
>   PR target/86486
>   gcc.dg/pr82788.c: Skip for AArch64.
>   gcc.dg/guality/vla-1.c: Turn off stack-clash.
>   gcc.target/aarch64/subsp.c: Likewise.
>   gcc.target/aarch64/sve/mask_struct_load_3.c: Likewise.
>   gcc.target/aarch64/sve/mask_struct_store_3.c: Likewise.
>   gcc.target/aarch64/sve/mask_struct_store_4.c: Likewise.
>   gcc.dg/params/blocksort-part.c: Skip stack-clash checks
>   on AArch64.
> 
> --
diff --git a/gcc/testsuite/c-c++-common/ubsan/vla-1.c b/gcc/testsuite/c-c++-common/ubsan/vla-1.c
index 52ade3aab7566dce3ca7ef931ac65895005d5e13..c97465edae195442a71ee66ab25015a2ac4fc8fc 100644
--- a/gcc/testsuite/c-c++-common/ubsan/vla-1.c
+++ b/gcc/testsuite/c-c++-common/ubsan/vla-1.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-fsanitize=vla-bound -Wall -Wno-unused-variable" } */
+/* { dg-options "-fsanitize=vla-bound -Wall -Wno-unused-variable -fno-stack-clash-protection" } */
 
 typedef long int V;
 int x = -1;
diff --git a/gcc/testsuite/gcc.dg/params/blocksort-part.c 

RE: [PATCH][GCC][AArch64] Set default values for stack-clash and do basic validation in back-end. [Patch (5/6)]

2018-07-24 Thread tamar . christina
Hi All,

This patch is a cascade update from having to re-spin the configure patch (no# 
4 in the series).

This patch enforces that the default guard size for stack-clash protection for
AArch64 be 64KB unless the user has overriden it via configure in which case
the user value is used as long as that value is within the valid range.

It also does some basic validation to ensure that the guard size is only 4KB or
64KB and also enforces that for aarch64 the stack-clash probing interval is
equal to the guard size.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
Target was tested with stack clash on and off by default.

Ok for trunk?

Thanks,
Tamar

gcc/
2018-07-24  Tamar Christina  

PR target/86486
* config/aarch64/aarch64.c (aarch64_override_options_internal):
Add validation for stack-clash parameters and set defaults.

> -Original Message-
> From: Tamar Christina 
> Sent: Wednesday, July 11, 2018 12:23
> To: gcc-patches@gcc.gnu.org
> Cc: nd ; James Greenhalgh ;
> Richard Earnshaw ; Marcus Shawcroft
> 
> Subject: [PATCH][GCC][AArch64] Set default values for stack-clash and do
> basic validation in back-end. [Patch (5/6)]
> 
> Hi All,
> 
> This patch enforces that the default guard size for stack-clash protection for
> AArch64 be 64KB unless the user has overriden it via configure in which case
> the user value is used as long as that value is within the valid range.
> 
> It also does some basic validation to ensure that the guard size is only 4KB 
> or
> 64KB and also enforces that for aarch64 the stack-clash probing interval is
> equal to the guard size.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> Target was tested with stack clash on and off by default.
> 
> Ok for trunk?
> 
> Thanks,
> Tamar
> 
> gcc/
> 2018-07-11  Tamar Christina  
> 
>   PR target/86486
>   * config/aarch64/aarch64.c (aarch64_override_options_internal):
>   Add validation for stack-clash parameters.
> 
> --
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index e2c34cdfc96a1d3f99f7e4834c66a7551464a518..30c62c406e10793fe041d54c73316a6c8d7c229f 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -10916,6 +10916,37 @@ aarch64_override_options_internal (struct gcc_options *opts)
 			 opts->x_param_values,
 			 global_options_set.x_param_values);
 
+  /* If the user hasn't change it via configure then set the default to 64 KB
+ for the backend.  */
+  maybe_set_param_value (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE,
+			 DEFAULT_STK_CLASH_GUARD_SIZE == 0
+			   ? 16 : DEFAULT_STK_CLASH_GUARD_SIZE,
+			 opts->x_param_values,
+			 global_options_set.x_param_values);
+
+  /* Validate the guard size.  */
+  int guard_size = PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE);
+  if (guard_size != 12 && guard_size != 16)
+  error ("only values 12 (4 KB) and 16 (64 KB) are supported for guard "
+	 "size.  Given value %d (%llu KB) is out of range.\n",
+	 guard_size, (1ULL << guard_size) / 1024ULL);
+
+  /* Enforce that interval is the same size as size so the mid-end does the
+ right thing.  */
+  maybe_set_param_value (PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL,
+			 guard_size,
+			 opts->x_param_values,
+			 global_options_set.x_param_values);
+
+  /* The maybe_set calls won't update the value if the user has explicitly set
+ one.  Which means we need to validate that probing interval and guard size
+ are equal.  */
+  int probe_interval
+= PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL);
+  if (guard_size != probe_interval)
+error ("stack clash guard size '%d' must be equal to probing interval "
+	   "'%d'\n", guard_size, probe_interval);
+
   /* Enable sw prefetching at specified optimization level for
  CPUS that have prefetch.  Lower optimization level threshold by 1
  when profiling is enabled.  */



RE: [PATCH][GCC][front-end][build-machinery][opt-framework] Allow setting of stack-clash via configure options. [Patch (4/6)]

2018-07-24 Thread tamar . christina
Hi Jeff,

This patch defines a configure option to allow the setting of the default
guard size via configure flags when building the target.

The new flag is:

 * --with-stack-clash-protection-guard-size=

The patch defines a new macro DEFAULT_STK_CLASH_GUARD_SIZE which targets need
to use explicitly is they want to support this configure flag and values that
users may have set.

Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and no 
issues.
Both targets were tested with stack clash on and off by default.

Ok for trunk?

Thanks,
Tamar

gcc/
2018-07-24  Tamar Christina  

PR target/86486
* configure.ac: Add stack-clash-protection-guard-size.
* config.in (DEFAULT_STK_CLASH_GUARD_SIZE): New.
* params.def: Update comment for guard-size.
* configure: Regenerate.

> -Original Message-
> From: Jeff Law 
> Sent: Monday, July 23, 2018 23:19
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ;
> jos...@codesourcery.com; bonz...@gnu.org; d...@redhat.com;
> nero...@gcc.gnu.org; aol...@redhat.com; ralf.wildenh...@gmx.de
> Subject: Re: [PATCH][GCC][front-end][build-machinery][opt-framework]
> Allow setting of stack-clash via configure options. [Patch (4/6)]
> 
> On 07/20/2018 09:39 AM, Tamar Christina wrote:
> >>
> >> On 07/20/2018 05:03 AM, Tamar Christina wrote:
>  Understood.  Thanks for verifying.  I wonder if we could just bury
>  this entirely in the aarch64 config files and not expose the
>  default into
> >> params.def?
> 
> >>>
> >>> Burying it in config.gcc isn't ideal because if your C runtime is
> >>> configurable (like uClibc) it means you have to go and modify this
> >>> file every time you change something. If the argument is against
> >>> having these defines in the params and not the configure flag itself
> >>> then I
> >> can just have an aarch64 specific configure flag and just use the
> >> created define directly in the AArch64 back-end.
> >> Not config.gcc, but in a .h/.c file for the target.
> >>
> >> If we leave the generic option, but override the default in the target 
> >> files.
> >> Would that work?
> >
> > So leaving the generic configure option? Yes that would work too. The
> > only downside is that if we have want to do any validation on the
> > value at configure time it would need to be manually kept in sync with
> > those in params.def. Or we'd just have to not do any checking at
> > configure time.  This would mean you can get to the end of your build and
> only when you try to use the compiler would it complain.
> >
> > Both aren't a real deal breaker to me.
> >
> > Shall I then just leave the configure flag but remove the params plumbing?
> Yea, I think any sanity check essentially has to move to when the compiler
> runs.  We can always return to param removal at a later point.
> 
> Can you post an updated patch?  Note I'm on PTO starting Wed for a week.
>  If you post it tomorrow I'll try to take a look before I disappear.
> 
> jeff
diff --git a/gcc/config.in b/gcc/config.in
index 2856e72d627df537a301a6c7ab6b5bbb75f6b43f..f3b301ef5afdaf0db8865e11601980f19ea0b3dd 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -55,6 +55,12 @@
 #endif
 
 
+/* Define to larger than zero set the default stack clash protector size. */
+#ifndef USED_FOR_TARGET
+#undef DEFAULT_STK_CLASH_GUARD_SIZE
+#endif
+
+
 /* Define if you want to use __cxa_atexit, rather than atexit, to register C++
destructors for local statics and global objects. This is essential for
fully standards-compliant handling of destructors, but requires
diff --git a/gcc/configure b/gcc/configure
index 60d373982fd38fe51c285e2b02941754d1b833d6..42ec5b536bee90adb319d172eb7cca1a363a87b6 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -905,6 +905,7 @@ enable_valgrind_annotations
 with_stabs
 enable_multilib
 enable_multiarch
+with_stack_clash_protection_guard_size
 enable___cxa_atexit
 enable_decimal_float
 enable_fixed_point
@@ -1724,6 +1725,9 @@ Optional Packages:
   --with-gnu-as   arrange to work with GNU as
   --with-as   arrange to use the specified as (full pathname)
   --with-stabsarrange to use stabs instead of host debug format
+  --with-stack-clash-protection-guard-size=size
+  Set the default stack clash protection guard size
+  for specific targets.
   --with-dwarf2   force the default debug format to be DWARF 2
   --with-specs=SPECS  add SPECS to driver command-line processing
   --with-pkgversion=PKG   Use PKG in the version string in place of "GCC"
@@ -7436,6 +7440,35 @@ $as_echo "$enable_multiarch$ma_msg_suffix" >&6; }
 
 
 
+# default stack clash protection guard size
+# Please keep these in sync with params.def.
+stk_clash_min=12
+stk_clash_max=30
+stk_clash_default=12
+
+# Keep the default value when the option is not used to 0, this allows us to
+# distinguish between the cases where the user specifially set a value via
+# 

Re: [PATCH] Fix up pr19476-{1,5}.C (PR testsuite/86649)

2018-07-24 Thread Jakub Jelinek
On Tue, Jul 24, 2018 at 12:08:35PM +0200, Richard Biener wrote:
> OK - can you add a variant with -O2 that tests it at EVRP time then?

Here is what I've committed to trunk then:

2018-07-24  Jakub Jelinek  

PR testsuite/86649
* g++.dg/tree-ssa-/pr19476-1.C: Check dom2 dump instead of ccp1.
* g++.dg/tree-ssa-/pr19476-5.C: Likewise.
* g++.dg/tree-ssa-/pr19476-6.C: New test.
* g++.dg/tree-ssa-/pr19476-7.C: New test.

--- gcc/testsuite/g++.dg/tree-ssa/pr19476-1.C.jj2015-05-29 
15:04:33.037803445 +0200
+++ gcc/testsuite/g++.dg/tree-ssa/pr19476-1.C   2018-07-24 11:39:10.108897097 
+0200
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-ccp1 -fdelete-null-pointer-checks" } */
+/* { dg-options "-O -fdump-tree-dom2 -fdelete-null-pointer-checks" } */
 /* { dg-skip-if "" keeps_null_pointer_checks } */
 
 // See pr19476-5.C for a version without including .
@@ -12,5 +12,5 @@ int g(){
   return 42 + (0 == new int[50]);
 }
 
-/* { dg-final { scan-tree-dump "return 42" "ccp1" } } */
-/* { dg-final { scan-tree-dump-not "return 33" "ccp1" } } */
+/* { dg-final { scan-tree-dump "return 42" "dom2" } } */
+/* { dg-final { scan-tree-dump-not "return 33" "dom2" } } */
--- gcc/testsuite/g++.dg/tree-ssa/pr19476-5.C.jj2015-05-29 
15:04:33.038803430 +0200
+++ gcc/testsuite/g++.dg/tree-ssa/pr19476-5.C   2018-07-24 11:39:26.190913802 
+0200
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-ccp1 -fdelete-null-pointer-checks" } */
+/* { dg-options "-O -fdump-tree-dom2 -fdelete-null-pointer-checks" } */
 /* { dg-skip-if "" keeps_null_pointer_checks } */
 
 // See pr19476-1.C for a version that includes .
@@ -8,4 +8,4 @@ int g(){
   return 42 + (0 == new int[50]);
 }
 
-/* { dg-final { scan-tree-dump "return 42" "ccp1" } } */
+/* { dg-final { scan-tree-dump "return 42" "dom2" } } */
--- gcc/testsuite/g++.dg/tree-ssa/pr19476-6.C.jj2018-07-24 
12:09:30.321890628 +0200
+++ gcc/testsuite/g++.dg/tree-ssa/pr19476-6.C   2018-07-24 12:10:49.812987922 
+0200
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp -fdelete-null-pointer-checks" } */
+/* { dg-skip-if "" keeps_null_pointer_checks } */
+
+// See pr19476-7.C for a version without including .
+#include 
+
+int f(){
+  return 33 + (0 == new(std::nothrow) int);
+}
+int g(){
+  return 42 + (0 == new int[50]);
+}
+
+/* { dg-final { scan-tree-dump "return 42" "evrp" } } */
+/* { dg-final { scan-tree-dump-not "return 33" "evrp" } } */
--- gcc/testsuite/g++.dg/tree-ssa/pr19476-7.C.jj2018-07-24 
12:09:33.034893945 +0200
+++ gcc/testsuite/g++.dg/tree-ssa/pr19476-7.C   2018-07-24 12:11:03.657004866 
+0200
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp -fdelete-null-pointer-checks" } */
+/* { dg-skip-if "" keeps_null_pointer_checks } */
+
+// See pr19476-6.C for a version that includes .
+
+int g(){
+  return 42 + (0 == new int[50]);
+}
+
+/* { dg-final { scan-tree-dump "return 42" "evrp" } } */


Jakub


Re: Improve std::rotate usages

2018-07-24 Thread François Dumont

Ping.

On 08/06/2018 07:54, François Dumont wrote:

Gentle reminder.

On 27/05/2018 19:25, François Dumont wrote:

Still no chance to review it ?

I'd like this one to go in before submitting other algo related patches.

    * include/bits/stl_algo.h
    (__rotate(_Ite, _Ite, _Ite, forward_iterator_tag))
    (__rotate(_Ite, _Ite, _Ite, bidirectional_iterator_tag))
    (__rotate(_Ite, _Ite, _Ite, random_access_iterator_tag)): Move 
code duplication...

    (rotate(_Ite, _Ite, _Ite)): ...here.
    (__stable_partition_adaptive(_FIt, _FIt, _Pred, _Dist, _Pointer, 
_Dist)):

    Simplify rotate call.
    (__rotate_adaptive(_BIt1, _BIt1, _BIt1, _Dist, _Dist, _Bit2, 
_Dist)):

    Likewise.
    (__merge_without_buffer(_BIt, _BIt, _BIt, _Dist, _Dist, _Comp)):
    Likewise.

François

On 14/05/2018 22:14, François Dumont wrote:

Any feedback regarding this patch ?


On 02/05/2018 07:26, François Dumont wrote:

Hi

    std::rotate already returns the expected iterator so there is 
no need for calls to std::advance/std::distance.


Tested under Linux x86_64, ok to commit ?

François











  1   2   >