Re: [v2 of PATCH 06/14] Strip location wrappers in operand_equal_p

2017-12-18 Thread Jakub Jelinek
On Sun, Dec 17, 2017 at 09:07:54PM -0500, David Malcolm wrote:
> What do you think?
> 
> Successfully bootstrapped on x86_64-pc-linux-gnu, as
> part of the kit.
> 
> gcc/ChangeLog:
>   * fold-const.c (operand_equal_p): Strip any location wrappers,
>   before computing hashes.
> ---
>  gcc/fold-const.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/gcc/fold-const.c b/gcc/fold-const.c
> index 0f11076..2b938900 100644
> --- a/gcc/fold-const.c
> +++ b/gcc/fold-const.c
> @@ -2804,6 +2804,9 @@ combine_comparisons (location_t loc,
>  int
>  operand_equal_p (const_tree arg0, const_tree arg1, unsigned int flags)
>  {
> +  STRIP_ANY_LOCATION_WRAPPER (arg0);
> +  STRIP_ANY_LOCATION_WRAPPER (arg1);
> +

Certainly not at this spot.
The checking that hashing ignores the location wrappers needs to be done
first.  And inchash::add_expr needs to ignore those.
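
The consistency requirement Jakub describes — the stripping must happen in the hash as well as in the equality test, or equal operands could hash differently — can be sketched with a toy analogue. Names like `Node` and `strip_wrapper` are illustrative stand-ins, not GCC's tree API:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Toy analogue of a GCC tree: a location wrapper adds source-location
// info around an operand but must not change its semantic identity.
struct Node {
  bool is_wrapper;     // true for a location wrapper
  int value;           // meaningful when !is_wrapper
  const Node *inner;   // meaningful when is_wrapper
};

// Analogue of STRIP_ANY_LOCATION_WRAPPER.
static const Node *strip_wrapper(const Node *n) {
  while (n->is_wrapper)
    n = n->inner;
  return n;
}

// Analogue of operand_equal_p: wrappers are semantically transparent.
static bool operand_equal(const Node *a, const Node *b) {
  return strip_wrapper(a)->value == strip_wrapper(b)->value;
}

// The invariant: if equality strips wrappers, the companion hash must
// strip them too, so that equal operands always hash equal.
static std::size_t operand_hash(const Node *n) {
  return std::hash<int>{}(strip_wrapper(n)->value);
}
```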

Jakub


Re: [libsanitizer] Fix Mac OS X 10.7 bootstrap (PR sanitizer/82824)

2017-12-18 Thread Jakub Jelinek
On Tue, Nov 14, 2017 at 01:57:21PM +0100, Rainer Orth wrote:
> > And the preexisting:
> > case "$host" in
> >   *-*-darwin*) MAC_INTERPOSE=true ; enable_static=no ;;
> >   *) MAC_INTERPOSE=false ;;
> > esac
> > AM_CONDITIONAL(USING_MAC_INTERPOSE, $MAC_INTERPOSE)
> > looks wrong too.
> 
> Wrong in which way?  Just the location?
> 
> >> --- a/libsanitizer/lsan/lsan_common.h
> >> +++ b/libsanitizer/lsan/lsan_common.h
> >> @@ -20,6 +20,7 @@
> >>  #include "sanitizer_common/sanitizer_stoptheworld.h"
> >>  #include "sanitizer_common/sanitizer_symbolizer.h"
> >>  
> >> +#ifndef CAN_SANITIZE_LEAKS
> >
> > Obviously better would be to move the Darwin version checks here, but if
> > upstream refuses to do that, we can do it this way too.
> 
> Kostya suggested the same.  I see now that
> sanitizer_common/sanitizer_platform_interceptors.h already uses
> __ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__, so I'll give it a try.

Any progress with this?

Jakub


Re: [PATCH] Add -fsanitize=pointer-{compare,subtract}.

2017-12-18 Thread Martin Liška
On 12/05/2017 10:27 AM, Jakub Jelinek wrote:
> On Tue, Nov 21, 2017 at 12:59:46PM +0100, Martin Liška wrote:
>> On 10/16/2017 10:39 PM, Martin Liška wrote:
>>> Hi.
>>>
>>> All nits included in mainline review request I've just done:
>>> https://reviews.llvm.org/D38971
>>>
>>> Martin
>>
>> Hi.
>>
>> There's an updated version of the patch where I added new test-cases and it's
>> rebased on the latest version of the libsanitizer changes.  This is subject
>> to the libsanitizer review process.
> 
> A slightly modified version has been finally accepted upstream, here is the
> updated patch with that final version cherry-picked, plus small changes to
> make it apply after the POINTER_DIFF_EXPR changes, and a lot of testsuite
> tweaks, so that we don't run it 7 times with -O0, but with different
> optimization levels, cleanups etc.

Hi Jakub.

Thanks for finalizing the patch review and for the clean up you've done.

> The most important change I've done in the testsuite was pointer-subtract-2.c
> used -fsanitize=address,pointer-subtract, but the function was actually
> doing pointer comparison.  Guess that needs to be propagated upstream at
> some point.  Another thing is that in pointer-compare-1.c where you test
> p - 1, p and p, p - 1 on the global variables, we need to ensure there is
> some other array before it, otherwise we run into the issue that there is no
red zone before the first global (and when optimizing, global objects seem
to be sorted by decreasing size).

I'll add these minor issues to my TODO list and I'll create a mainline patch
review.

Martin

> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.
> 
> 2017-12-05  Martin Liska  
>   Jakub Jelinek  
> 
> gcc/
>   * doc/invoke.texi: Document the options.
>   * flag-types.h (enum sanitize_code): Add
>   SANITIZE_POINTER_COMPARE and SANITIZE_POINTER_SUBTRACT.
>   * ipa-inline.c (sanitize_attrs_match_for_inline_p): Add handling
>   of SANITIZE_POINTER_COMPARE and SANITIZE_POINTER_SUBTRACT.
>   * opts.c: Define new sanitizer options.
>   * sanitizer.def (BUILT_IN_ASAN_POINTER_COMPARE): Likewise.
>   (BUILT_IN_ASAN_POINTER_SUBTRACT): Likewise.
> gcc/c/
>   * c-typeck.c (pointer_diff): Add new argument and instrument
>   pointer subtraction.
>   (build_binary_op): Similar for pointer comparison.
> gcc/cp/
>   * typeck.c (pointer_diff): Add new argument and instrument
>   pointer subtraction.
>   (cp_build_binary_op): Create compound expression if doing an
>   instrumentation.
> gcc/testsuite/
>   * c-c++-common/asan/pointer-compare-1.c: New test.
>   * c-c++-common/asan/pointer-compare-2.c: New test.
>   * c-c++-common/asan/pointer-subtract-1.c: New test.
>   * c-c++-common/asan/pointer-subtract-2.c: New test.
>   * c-c++-common/asan/pointer-subtract-3.c: New test.
>   * c-c++-common/asan/pointer-subtract-4.c: New test.
> libsanitizer/
>   * asan/asan_descriptions.cc: Cherry-pick upstream r319668.
>   * asan/asan_descriptions.h: Likewise.
>   * asan/asan_report.cc: Likewise.
>   * asan/asan_thread.cc: Likewise.
>   * asan/asan_thread.h: Likewise.
> 
> --- gcc/c/c-typeck.c.jj   2017-12-01 19:42:09.504222186 +0100
> +++ gcc/c/c-typeck.c  2017-12-04 22:41:50.680290866 +0100
> @@ -95,7 +95,7 @@ static tree lookup_field (tree, tree);
>  static int convert_arguments (location_t, vec<location_t>, tree,
>  			      vec<tree, va_gc> *, vec<tree, va_gc> *, tree,
>  			      tree);
> -static tree pointer_diff (location_t, tree, tree);
> +static tree pointer_diff (location_t, tree, tree, tree *);
>  static tree convert_for_assignment (location_t, location_t, tree, tree, tree,
>   enum impl_conv, bool, tree, tree, int);
>  static tree valid_compound_expr_initializer (tree, tree);
> @@ -3768,10 +3768,11 @@ parser_build_binary_op (location_t locat
>  }
>  
>  /* Return a tree for the difference of pointers OP0 and OP1.
> -   The resulting tree has type ptrdiff_t.  */
> +   The resulting tree has type ptrdiff_t.  If POINTER_SUBTRACT sanitization is
> +   enabled, assign to INSTRUMENT_EXPR call to libsanitizer.  */
>  
>  static tree
> -pointer_diff (location_t loc, tree op0, tree op1)
> +pointer_diff (location_t loc, tree op0, tree op1, tree *instrument_expr)
>  {
>tree restype = ptrdiff_type_node;
>tree result, inttype;
> @@ -3815,6 +3816,17 @@ pointer_diff (location_t loc, tree op0,
>  pedwarn (loc, OPT_Wpointer_arith,
>"pointer to a function used in subtraction");
>  
> +  if (sanitize_flags_p (SANITIZE_POINTER_SUBTRACT))
> +{
> +  gcc_assert (current_function_decl != NULL_TREE);
> +
> +  op0 = save_expr (op0);
> +  op1 = save_expr (op1);
> +
> +  tree tt = builtin_decl_explicit (BUILT_IN_ASAN_POINTER_SUBTRACT);
> +  *instrument_expr = build_call_expr_loc (loc, tt, 2, op0, op1);
> +}
> 
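
Conceptually, the instrumentation arranged by the hunk above behaves like the following stand-alone sketch. Here `check_ptr_sub` is a hypothetical stub standing in for the `__asan_pointer_subtract` callback (`BUILT_IN_ASAN_POINTER_SUBTRACT`); the real libsanitizer callback reports pointers that do not point into the same object:

```cpp
#include <cassert>
#include <cstddef>

static char arr[16];

// Hypothetical stand-in for __asan_pointer_subtract: in this toy
// version, "valid" simply means both pointers fall inside arr.
static void check_ptr_sub(const char *a, const char *b) {
  assert(a >= arr && a <= arr + sizeof arr);
  assert(b >= arr && b <= arr + sizeof arr);
}

// What the instrumented pointer_diff amounts to: both operands are
// saved, the sanitizer callback runs first (*instrument_expr), then
// the original subtraction produces the result.
static std::ptrdiff_t instrumented_diff(const char *op0, const char *op1) {
  check_ptr_sub(op0, op1);
  return op0 - op1;
}
```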

Re: [libsanitizer] Fix Mac OS X 10.7 bootstrap (PR sanitizer/82824)

2017-12-18 Thread Jakub Jelinek
On Mon, Dec 18, 2017 at 11:21:35AM +0100, Rainer Orth wrote:
> I've been using the following in my tree.  Still need to try and get
> this upstream.

Please.  If that doesn't work, I think we need to do it in configure,
sanitizer_internal_defs.h seems to be included very early, so there
is no easy header to override say in include/system/ tree.

> Speaking of which, would it be acceptable to do another full
> libsanitizer merge at this point?  My initial Solaris port has landed
> now, and the necessary changes outside of libsanitizer are almost
exclusively in Solaris-specific files.  I've already given it a quick try
> and found no regressions on Linux/x86_64.

I'm afraid it is too late, we have way too many breakages on the trunk and
are seriously behind schedule, and sanitizer merges pretty much always
create numerous regressions.

Jakub


Re: [committed][PR tree-optimization/83410] Avoid some jump threads when parallelizing loops

2017-12-18 Thread Richard Biener
On Fri, Dec 15, 2017 at 6:00 PM, Richard Biener
 wrote:
> On December 15, 2017 5:19:14 PM GMT+01:00, Jeff Law  wrote:
>>I hate this patch.
>>
>>The fundamental problem we have is that there are times when we very
>>much want to thread jumps and yet there are other times when we do not.
>>
>>To date we have been able to largely select between those two by
>>looking
>>at the shape of the CFG and the jump thread to see how threading a
>>particular jump would impact the shape of the CFG (particularly loops
>>in
>>the CFG).
>>
>>In this BZ we have a loop like this:
>>
>>2
>>|
>>3<---+
>>||
>>4<---+   |
>>   / \   |   |
>>  5   6  |   |
>>   \  /  |   |
>> 7   |   |
>>/ \  |   |
>>   8  11-+   |
>>  / \|
>> 9  10---+
>> |
>> E
>>
>>We want to thread the path (6, 7) (7, 11).  ie, there's a path through
>>that inner loop where we don't need to do the loop test.  Here's an
>>example (from libgomp testsuite)
>>
>>(N is 1500)
>>
>> for (j = 0; j < N; j++)
>>{
>>  if (j > 500)
>>{
>>  x[i][j] = i + j + 3;
>>  y[j] = i*j + 10;
>>}
>>  else
>>x[i][j] = x[i][j]*3;
>>}
>>
>>
>>
>>Threading (in effect) puts the loop test into both arms of the
>>conditional, then removes it from the ELSE arm because the condition is
>>always "keep looping" in that arm.
>>
>>This plays poorly with loop parallelization -- I tried to follow what's
>>going on in there and just simply got lost.  I got as far as seeing
>>that
>>the parallelization code thought there were loop carried dependencies.
>>At least that's how it looked to me.  But I don't see any loop carried
>>dependencies in the code.
>
> Hmm. I'll double check if I remember on Monday.

So I did and the ultimate reason GRAPHITE FAILs is because for
the loop with the threading applied we can no longer compute
number of iterations.  That's of course bad for _all_ loop optimizers,
not just GRAPHITE (we just don't have any other test coverage).
The main issue here is that after the threading is applied the
block with the loop exit no longer dominates the loop latch.
This means the loop should really be split but it also means we should
definitely avoid performing such threading before loop optimizers run,
and not just if parallelizing loops.

Can you adjust your patch this way?  The condition you check seems
to more or less match the case where we thread through the exit
test _into_ the loop?  The bad thing is really if in the resulting CFG
the exit test isn't dominating the latch (thus we have an exit test
that is not always executed):

bool
number_of_iterations_exit_assumptions (struct loop *loop, edge exit,
   struct tree_niter_desc *niter,
   gcond **at_stmt, bool every_iteration)
{
...
  safe = dominated_by_p (CDI_DOMINATORS, loop->latch, exit->src);

  if (every_iteration && !safe)
return false;

and later passing it to

  if (!number_of_iterations_cond (loop, type, &iv0, code, &iv1, niter,
				  loop_only_exit_p (loop, exit), safe))

which has

  if (!every_iteration
  && (!iv0->no_overflow || !iv1->no_overflow
  || code == NE_EXPR || code == EQ_EXPR))
return false;

and in our case code is NE_EXPR (_16 != 1).

I'm not sure if we eventually can lift this restriction if 'exit' is the only
exit from 'loop'.

The particular case can of course be handled given the exit test
and the test making it not dominating the latch are related
(operating on a related IV).
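
To see concretely why the threaded form loses the property that the exit test dominates the latch, the libgomp loop quoted earlier can be hand-threaded at the source level (the `i` dimension is dropped for brevity; this is an illustrative reconstruction, and both versions must compute the same values):

```cpp
#include <cassert>

const int N = 1500;

// The original loop: the exit test j < N runs on every iteration, so
// the block holding it dominates the latch.
static void plain(int *x, int *y) {
  for (int j = 0; j < N; j++) {
    if (j > 500) { x[j] = j + 3; y[j] = j + 10; }
    else x[j] = x[j] * 3;
  }
}

// The effect of threading (6, 7) (7, 11): the exit test is duplicated
// into the arms, and the else arm drops it entirely because j <= 500
// implies j + 1 < N there.  The exit test now sits on only one path to
// the latch, which is what defeats number-of-iterations analysis.
static void threaded(int *x, int *y) {
  int j = 0;
  if (j >= N) return;
  for (;;) {
    if (j > 500) {
      x[j] = j + 3; y[j] = j + 10;
      j++;
      if (j >= N) return;   // exit test only on this arm
    } else {
      x[j] = x[j] * 3;
      j++;                  // here j <= 501 < N, no test needed
    }
  }
}
```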

Richard.

>>
>>You might naturally ask if threading this is actually wise.  It seems
>>to
>>broadly fit into the cases where we throttle threading so as not to
>>muck
>>around too much with the loop structure.  It's not terrible to detect a
>>CFG of this shape and avoid threading.
>>
>>BUT
>>
>>
>>You can have essentially the same shape CFG (not 100% the same, but the
>>same key characteristics), but where jump threading simplifies things
>>in
>>ways that are good for the SLP vectorizer (vect/bb-slp-16.c) or where
>>jump threading avoids spurious warnings (graphite/scop-4.c)
>>
>>Initially I thought I'd seen a key difference in the contents of the
>>latch block, but I think I was just up too late and mis-read the dump.
>>
>>So I don't see anything in the CFG shape or the contents of the blocks
>>that can be reasonably analyzed at jump threading time.  Given we have
>>two opposite needs and no reasonable way I can find to select between
>>them, I'm resorting to a disgusting hack.  Namely to avoid threading
>>through the latch to another point in the loop *when loop
>>parallelization is enabled*.
>>
>>Let me be clear.  This is a hack.  I don't like it, not one little bit.
>>But I don't see a way to resolve the regression without introducing
>>other regressions 

Re: [libsanitizer] Fix Mac OS X 10.7 bootstrap (PR sanitizer/82824)

2017-12-18 Thread Rainer Orth
Hi Jakub,

> On Tue, Nov 14, 2017 at 01:57:21PM +0100, Rainer Orth wrote:
>> > And the preexisting:
>> > case "$host" in
>> >   *-*-darwin*) MAC_INTERPOSE=true ; enable_static=no ;;
>> >   *) MAC_INTERPOSE=false ;;
>> > esac
>> > AM_CONDITIONAL(USING_MAC_INTERPOSE, $MAC_INTERPOSE)
>> > looks wrong too.
>> 
>> Wrong in which way?  Just the location?
>> 
>> >> --- a/libsanitizer/lsan/lsan_common.h
>> >> +++ b/libsanitizer/lsan/lsan_common.h
>> >> @@ -20,6 +20,7 @@
>> >>  #include "sanitizer_common/sanitizer_stoptheworld.h"
>> >>  #include "sanitizer_common/sanitizer_symbolizer.h"
>> >>  
>> >> +#ifndef CAN_SANITIZE_LEAKS
>> >
>> > Obviously better would be to move the Darwin version checks here, but if
>> > upstream refuses to do that, we can do it this way too.
>> 
>> Kostya suggested the same.  I see now that
>> sanitizer_common/sanitizer_platform_interceptors.h already uses
>> __ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__, so I'll give it a try.
>
> Any progress with this?

I've been using the following in my tree.  Still need to try and get
this upstream.

Speaking of which, would it be acceptable to do another full
libsanitizer merge at this point?  My initial Solaris port has landed
now, and the necessary changes outside of libsanitizer are almost
exclusively in Solaris-specific files.  I've already given it a quick try
and found no regressions on Linux/x86_64.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


diff --git a/libsanitizer/sanitizer_common/sanitizer_internal_defs.h b/libsanitizer/sanitizer_common/sanitizer_internal_defs.h
--- a/libsanitizer/sanitizer_common/sanitizer_internal_defs.h
+++ b/libsanitizer/sanitizer_common/sanitizer_internal_defs.h
@@ -63,7 +63,13 @@
 // SANITIZER_SUPPORTS_WEAK_HOOKS means that we support real weak functions that
 // will evaluate to a null pointer when not defined.
 #ifndef SANITIZER_SUPPORTS_WEAK_HOOKS
-#if (SANITIZER_LINUX || SANITIZER_MAC) && !SANITIZER_GO
+#if SANITIZER_LINUX && !SANITIZER_GO
+# define SANITIZER_SUPPORTS_WEAK_HOOKS 1
+// Before Xcode 4.5, the Darwin linker doesn't reliably support undefined
+// weak symbols.  Mac OS X 10.9/Darwin 13 is the first release only supported
+// by Xcode >= 4.5.
+#elif SANITIZER_MAC && \
+__ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ >= 1090 && !SANITIZER_GO
 # define SANITIZER_SUPPORTS_WEAK_HOOKS 1
 #else
 # define SANITIZER_SUPPORTS_WEAK_HOOKS 0
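
The weak-hook mechanism this macro guards can be sketched as follows. `my_hook` is a hypothetical hook name; the null-check-before-call pattern is exactly what breaks when the linker cannot resolve an undefined weak symbol to a null address (the pre-Xcode 4.5 Darwin case the comment describes):

```cpp
#include <cassert>

// A weak declaration with no definition anywhere in the program.  On
// linkers with proper undefined-weak support (Linux ELF; Darwin only
// from Xcode 4.5 on), the symbol resolves to address 0, so the hook's
// presence can be tested at run time.
extern "C" __attribute__((weak)) int my_hook(int x);

// Call the hook if one was linked in, otherwise fall back.
int call_hook_or_default(int x) {
  if (my_hook)          // null when no strong definition exists
    return my_hook(x);
  return -1;            // default behaviour: no hook installed
}
```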


Re: [PATCH][arm] Add -mverbose-cost-dump and de-verbosify cost dumps

2017-12-18 Thread Kyrill Tkachov


On 18/12/17 04:09, Sandra Loosemore wrote:

On 12/14/2017 10:50 PM, Sandra Loosemore wrote:

On 12/14/2017 11:48 AM, Kyrill Tkachov wrote:


[snip]

Thanks, done. I haven't created a new target-specific developer
options table but instead listed the targets the options apply to in
parentheses.

Attached is the latest iteration.

Thank you for the review,
Kyrill

2017-12-14  Kyrylo Tkachov  

 * doc/invoke.texi (GCC Developer options): Add Target-specific
 developer options subsection.  Populate it with AArch64 and ARM
 options.
 (Options Summary): List the above.


Hmmm, there are still various problems here, and now I notice that 
there's already an entry for -moverride= in the AArch64 Options 
section.   I need to go through the docs for all targets and make 
sure everything is treated uniformly, so I think it would be more 
efficient for me to take this over than to continue to iterate with 
you to get the patch right.  I'll try to get to it this weekend.


OK, I've checked in the attached patch.

I apologize for having given you a bum steer on this previously. :-( 
When I dug around in the existing documentation lists, I found a dozen 
or more existing target-specific developer options already listed in 
the sections for the respective targets (FRV, in particular, has a 
whole pile of them documented as "for debugging the compiler").  I 
thought it made more sense in terms of the highly target-specific-ness 
of what many of these options do to leave them all where they are now, 
as well as less work than moving them all to another section.  So I 
similarly added the docs for the missing ARM and AArch64 developer 
options to the existing sections for those respective targets as well.




Thanks Sandra, this does look better now.
Kyrill


-Sandra




c-family PATCH for warn_logical_operator

2017-12-18 Thread Marek Polacek
While looking into something else I noticed that because of the missing
warn_logical_op check we do unnecessary work in warn_logical_operator, like
calling fold_for_warn which can put stuff into cv/fold_cache etc.  It also
causes more noise when debugging.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2017-12-18  Marek Polacek  

* c-warn.c (warn_logical_operator): Return early if -Wlogical-op is
not in effect.

--- gcc/c-family/c-warn.c
+++ gcc/c-family/c-warn.c
@@ -180,6 +180,9 @@ warn_logical_operator (location_t location, enum tree_code code, tree type,
   tree low0, low1, low, high0, high1, high, lhs, rhs, tem;
   bool strict_overflow_p = false;
 
+  if (!warn_logical_op)
+return;
+
   if (code != TRUTH_ANDIF_EXPR
   && code != TRUTH_AND_EXPR
   && code != TRUTH_ORIF_EXPR

Marek


Re: [wwwdocs] mention AVR additions

2017-12-18 Thread Georg-Johann Lay

On 14.12.2017 21:44, Gerald Pfeifer wrote:

Hi Johann,

On Wed, 13 Dec 2017, Georg-Johann Lay wrote:

This adds AVR improvements to v8 Release Notes.


that's quite impressive a set of improvements!

Index: changes.html
===
    The new devices are filed under <a href="https://gcc.gnu.org/onlinedocs/gcc/AVR-Options.html">-mmcu=avrxmega3</a>.



I'm not sure whether "filed under" is the best choice of wording (and
when I say that, I really am not).  So perhaps use "listed" under or
similar?


Ok.


features like PROGMEM and __flash
are no more needed (as opposed to other AVR families for which


"are not needed any more"


Ok.


that the compiler is used with Binutils 2.29 or newer so that
read-only data will be located in flash memory, see feature
<a href="https://sourceware.org/PR21472">PR21472</a>.


How about using "read-only data will be located in flash memory" the
text for that link, and omit the "see feature PR..." part?  That'll
make it easier to read.


Ok.

  A new command line option -mshort-calls is 
supported.


"command-line"


Fixed.


This option is used internally for multilib selection of the
avrxmega3 variants.
It is not an optimization option, and you don't need to set
it by hand.


"do not" to emphasize this more strongly.


Ok.

    The compiler now implements feature <a href="http://gcc.gnu.org/PR20296">PR20296</a>

    and will generate more efficient interrupt service routine (ISR)
    prologues and epilogues.


Here I suggest "The compiler now generates..." (and perhaps make this a
link to the PR, but use https here).


Ok.


  A new command line option -mno-gas-isr-prologues


"command-line"


Fixed.


has been added.  It disables the generation of the


How about "A new command line option -mno-gas-isr-prologues
disables the generation..."?  Shorter and easier to read.


Ok.


Any non-naked ISR will save and restore SREG, tmp_reg and zero_reg,


... around SREG, tmp_reg and zero_reg.


Fixed.

  The feature is turned on per default for all optimization 
levels

except for -O0 and -Og. It can still be
  enabled by means of option -mgas-isr-prologues.


"It is explicitly enabled by..."


Ok.


  Support has been added for a new
<a href="https://gcc.gnu.org/onlinedocs/gcc/AVR-Function-Attributes.html">AVR
function attribute</a>
no_gccisr.


Is there something missing after "a new"?  "A new" what?


...a new "AVR function attribute" ... where the quoted text is
hyper-linked to the documentation of AVR function attributes.


Please go ahead and commit (taking the above into account) and just
share the final patch here in response.

Thank you for taking the time to document all this!

Gerald


Thank you for the friendly review.  Applied as attached.

Johann

Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-8/changes.html,v
retrieving revision 1.22
diff -r1.22 changes.html
179c179,238
< 
---
> AVR
> 
>   
> The AVR port now supports the following XMEGA-like devices:
> 
>   ATtiny212, ATtiny214, ATtiny412, ATtiny414, ATtiny416, ATtiny417,
>   ATtiny814, ATtiny816, ATtiny817, ATtiny1614, ATtiny1616, ATtiny1617,
>   ATtiny3214, ATtiny3216, ATtiny3217
> 
> The new devices are listed under 
> <a href="https://gcc.gnu.org/onlinedocs/gcc/AVR-Options.html">-mmcu=avrxmega3</a>.
> 
>   These devices see flash memory in the RAM address space, so that
> 	features like PROGMEM and __flash
> 	are not needed any more (as opposed to other AVR families for which
> 	read-only data will be located in RAM except special, non-standard
> 	features are used to locate and access such data). This requires
> 	that the compiler is used with Binutils 2.29 or newer so that
> 	<a href="https://sourceware.org/PR21472">read-only data will be
> 	  located in flash memory</a>.
>   A new command-line option -mshort-calls is supported.
> 	This option is used internally for multilib selection of the
> 	avrxmega3 variants. It is
> 	not an optimization option. Do not set it by hand.
> 
>   
>   
> The compiler now generates
> <a href="https://gcc.gnu.org/PR20296">efficient interrupt service routine
>   (ISR) prologues and epilogues</a>.  This is achieved by using the new
> <a href="https://sourceware.org/binutils/docs-2.29/as/AVR-Pseudo-Instructions.html">
>   AVR pseudo instruction __gcc_isr</a> which is supported
> and resolved by the GNU assembler.
> 
>   As the __gcc_isr pseudo-instruction will be resolved by
> 	the assembler, inline assembly is transparent to the process.
> 	This means that when inline assembly uses an instruction like
> 	INC that clobbers the condition code,
> 	then the assembler will detect this and generate an appropriate
> 	ISR prologue / epilogue chunk to save / restore SREG as needed.
>   A new command-line option -mno-gas-isr-prologues
> 	disables the generation of the __gcc_isr 

Re: [libsanitizer] Fix Mac OS X 10.7 bootstrap (PR sanitizer/82824)

2017-12-18 Thread Rainer Orth
Hi Jakub,

> On Mon, Dec 18, 2017 at 11:21:35AM +0100, Rainer Orth wrote:
>> I've been using the following in my tree.  Still need to try and get
>> this upstream.
>
> Please.  If that doesn't work, I think we need to do it in configure,
> sanitizer_internal_defs.h seems to be included very early, so there
> is no easy header to override say in include/system/ tree.

done now as https://reviews.llvm.org/D41346.  Together with the revised
version of https://reviews.llvm.org/D39888 this has restored Darwin 11
build and sanitizer testing for me for quite some time.

>> Speaking of which, would it be acceptable to do another full
>> libsanitizer merge at this point?  My initial Solaris port has landed
>> now, and the necessary changes outside of libsanitizer are almost
> >> exclusively in Solaris-specific files.  I've already given it a quick try
>> and found no regressions on Linux/x86_64.
>
> I'm afraid it is too late, we have way too many breakages on the trunk and
> are seriously behind schedule, and sanitizer merges pretty much always
> create numerous regressions.

Understood, just wanted to ask.  This gives me more time to work out
64-bit support and a couple of warts.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [01/13] Add a qimode_for_vec_perm helper function

2017-12-18 Thread Richard Biener
On Sun, Dec 10, 2017 at 12:08 AM, Richard Sandiford
 wrote:
> The vec_perm code falls back to doing byte-level permutes if
> element-level permutes aren't supported.  There were two copies
> of the code to calculate the mode, and later patches add another,
> so this patch splits it out into a helper function.
>

Ok.

> 2017-12-09  Richard Sandiford  
>
> gcc/
> * optabs-query.h (qimode_for_vec_perm): Declare.
> * optabs-query.c (can_vec_perm_p): Split out qimode search to...
> (qimode_for_vec_perm): ...this new function.
> * optabs.c (expand_vec_perm): Use qimode_for_vec_perm.
>
> Index: gcc/optabs-query.h
> ===
> --- gcc/optabs-query.h  2017-12-09 22:47:12.476364764 +
> +++ gcc/optabs-query.h  2017-12-09 22:47:14.730310076 +
> @@ -174,6 +174,7 @@ enum insn_code can_extend_p (machine_mod
>  enum insn_code can_float_p (machine_mode, machine_mode, int);
>  enum insn_code can_fix_p (machine_mode, machine_mode, int, bool *);
>  bool can_conditionally_move_p (machine_mode mode);
> +opt_machine_mode qimode_for_vec_perm (machine_mode);
>  bool can_vec_perm_p (machine_mode, bool, vec_perm_indices *);
>  /* Find a widening optab even if it doesn't widen as much as we want.  */
>  #define find_widening_optab_handler(A, B, C) \
> Index: gcc/optabs-query.c
> ===
> --- gcc/optabs-query.c  2017-12-09 22:47:12.476364764 +
> +++ gcc/optabs-query.c  2017-12-09 22:47:14.729310075 +
> @@ -345,6 +345,22 @@ can_conditionally_move_p (machine_mode m
>return direct_optab_handler (movcc_optab, mode) != CODE_FOR_nothing;
>  }
>
> +/* If a target doesn't implement a permute on a vector with multibyte
> +   elements, we can try to do the same permute on byte elements.
> +   If this makes sense for vector mode MODE then return the appropriate
> +   byte vector mode.  */
> +
> +opt_machine_mode
> +qimode_for_vec_perm (machine_mode mode)
> +{
> +  machine_mode qimode;
> +  if (GET_MODE_INNER (mode) != QImode
> +      && mode_for_vector (QImode, GET_MODE_SIZE (mode)).exists (&qimode)
> +      && VECTOR_MODE_P (qimode))
> +return qimode;
> +  return opt_machine_mode ();
> +}
> +
>  /* Return true if VEC_PERM_EXPR of arbitrary input vectors can be
> expanded using SIMD extensions of the CPU.  SEL may be NULL, which
> stands for an unknown constant.  Note that additional permutations
> @@ -375,9 +391,7 @@ can_vec_perm_p (machine_mode mode, bool
>  return true;
>
>/* We allow fallback to a QI vector mode, and adjust the mask.  */
> -  if (GET_MODE_INNER (mode) == QImode
> -      || !mode_for_vector (QImode, GET_MODE_SIZE (mode)).exists (&qimode)
> -      || !VECTOR_MODE_P (qimode))
> +  if (!qimode_for_vec_perm (mode).exists ())
>  return false;
>
>/* ??? For completeness, we ought to check the QImode version of
> Index: gcc/optabs.c
> ===
> --- gcc/optabs.c2017-12-09 22:47:12.476364764 +
> +++ gcc/optabs.c2017-12-09 22:47:14.731310077 +
> @@ -5452,9 +5452,7 @@ expand_vec_perm (machine_mode mode, rtx
>
>/* Set QIMODE to a different vector mode with byte elements.
>   If no such mode, or if MODE already has byte elements, use VOIDmode.  */
> -  if (GET_MODE_INNER (mode) == QImode
> -      || !mode_for_vector (QImode, w).exists (&qimode)
> -      || !VECTOR_MODE_P (qimode))
> +  if (!qimode_for_vec_perm (mode).exists ())
>  qimode = VOIDmode;
>
>/* If the input is a constant, expand it specially.  */
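
The byte-level fallback that `qimode_for_vec_perm` enables can be illustrated with a stand-alone sketch: a permute of 32-bit lanes is rewritten as a permute of bytes by widening each lane index into four byte indices (the "adjust the mask" step mentioned in `can_vec_perm_p`). The names below are illustrative, not GCC's:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Reinterpret a vector of 32-bit lanes as raw bytes.
static std::vector<uint8_t> bytes_of(const std::vector<uint32_t> &v) {
  std::vector<uint8_t> b(v.size() * 4);
  std::memcpy(b.data(), v.data(), b.size());
  return b;
}

// Element-level permute performed as a byte-level permute: each lane
// index sel[i] is widened to byte indices 4*sel[i] .. 4*sel[i]+3.
// Whole lanes are copied, so the result is endianness-independent.
static std::vector<uint32_t>
permute_via_bytes(const std::vector<uint32_t> &v,
                  const std::vector<size_t> &sel) {
  std::vector<uint8_t> in = bytes_of(v);
  std::vector<uint8_t> out(in.size());
  for (size_t i = 0; i < sel.size(); ++i)   // widen the mask
    for (size_t j = 0; j < 4; ++j)
      out[4 * i + j] = in[4 * sel[i] + j];
  std::vector<uint32_t> res(v.size());
  std::memcpy(res.data(), out.data(), out.size());
  return res;
}
```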


Re: [08/13] Add a vec_perm_indices_to_tree helper function

2017-12-18 Thread Richard Biener
On Sun, Dec 10, 2017 at 12:20 AM, Richard Sandiford
 wrote:
> This patch adds a function for creating a VECTOR_CST from a
> vec_perm_indices, operating directly on the encoding.

Ok.

>
> 2017-12-09  Richard Sandiford  
>
> gcc/
> * vec-perm-indices.h (vec_perm_indices_to_tree): Declare.
> * vec-perm-indices.c (vec_perm_indices_to_tree): New function.
> * tree-ssa-forwprop.c (simplify_vector_constructor): Use it.
> * tree-vect-slp.c (vect_transform_slp_perm_load): Likewise.
> * tree-vect-stmts.c (vectorizable_bswap): Likewise.
> (vect_gen_perm_mask_any): Likewise.
>
> Index: gcc/vec-perm-indices.h
> ===
> --- gcc/vec-perm-indices.h  2017-12-09 22:48:47.548825399 +
> +++ gcc/vec-perm-indices.h  2017-12-09 22:48:50.361942571 +
> @@ -73,6 +73,7 @@ typedef int_vector_builder  };
>
>  bool tree_to_vec_perm_builder (vec_perm_builder *, tree);
> +tree vec_perm_indices_to_tree (tree, const vec_perm_indices &);
>  rtx vec_perm_indices_to_rtx (machine_mode, const vec_perm_indices &);
>
>  inline
> Index: gcc/vec-perm-indices.c
> ===
> --- gcc/vec-perm-indices.c  2017-12-09 22:48:47.548825399 +
> +++ gcc/vec-perm-indices.c  2017-12-09 22:48:50.360942531 +
> @@ -152,6 +152,20 @@ tree_to_vec_perm_builder (vec_perm_build
>return true;
>  }
>
> +/* Return a VECTOR_CST of type TYPE for the permutation vector in INDICES.  */
> +
> +tree
> +vec_perm_indices_to_tree (tree type, const vec_perm_indices &indices)
> +{
> +  gcc_assert (TYPE_VECTOR_SUBPARTS (type) == indices.length ());
> +  tree_vector_builder sel (type, indices.encoding ().npatterns (),
> +  indices.encoding ().nelts_per_pattern ());
> +  unsigned int encoded_nelts = sel.encoded_nelts ();
> +  for (unsigned int i = 0; i < encoded_nelts; i++)
> +sel.quick_push (build_int_cst (TREE_TYPE (type), indices[i]));
> +  return sel.build ();
> +}
> +
>  /* Return a CONST_VECTOR of mode MODE that contains the elements of
> INDICES.  */
>
> Index: gcc/tree-ssa-forwprop.c
> ===
> --- gcc/tree-ssa-forwprop.c 2017-12-09 22:48:47.546825312 +
> +++ gcc/tree-ssa-forwprop.c 2017-12-09 22:48:50.359942492 +
> @@ -2119,10 +2119,7 @@ simplify_vector_constructor (gimple_stmt
>   || GET_MODE_SIZE (TYPE_MODE (mask_type))
>  != GET_MODE_SIZE (TYPE_MODE (type)))
> return false;
> -  tree_vector_builder mask_elts (mask_type, nelts, 1);
> -  for (i = 0; i < nelts; i++)
> -   mask_elts.quick_push (build_int_cst (TREE_TYPE (mask_type), sel[i]));
> -  op2 = mask_elts.build ();
> +  op2 = vec_perm_indices_to_tree (mask_type, indices);
>if (conv_code == ERROR_MARK)
> gimple_assign_set_rhs_with_ops (gsi, VEC_PERM_EXPR, orig, orig, op2);
>else
> Index: gcc/tree-vect-slp.c
> ===
> --- gcc/tree-vect-slp.c 2017-12-09 22:48:47.547825355 +
> +++ gcc/tree-vect-slp.c 2017-12-09 22:48:50.359942492 +
> @@ -3675,13 +3675,7 @@ vect_transform_slp_perm_load (slp_tree n
>   tree mask_vec = NULL_TREE;
>
>   if (! noop_p)
> -   {
> - tree_vector_builder mask_elts (mask_type, nunits, 1);
> - for (int l = 0; l < nunits; ++l)
> -   mask_elts.quick_push (build_int_cst (mask_element_type,
> -					mask[l]));
> - mask_vec = mask_elts.build ();
> -   }
> +   mask_vec = vec_perm_indices_to_tree (mask_type, indices);
>
>   if (second_vec_index == -1)
> second_vec_index = first_vec_index;
> Index: gcc/tree-vect-stmts.c
> ===
> --- gcc/tree-vect-stmts.c   2017-12-09 22:48:47.548825399 +
> +++ gcc/tree-vect-stmts.c   2017-12-09 22:48:50.360942531 +
> @@ -2529,10 +2529,7 @@ vectorizable_bswap (gimple *stmt, gimple
>return true;
>  }
>
> -  tree_vector_builder telts (char_vectype, num_bytes, 1);
> -  for (unsigned i = 0; i < num_bytes; ++i)
> -telts.quick_push (build_int_cst (char_type_node, elts[i]));
> -  tree bswap_vconst = telts.build ();
> +  tree bswap_vconst = vec_perm_indices_to_tree (char_vectype, indices);
>
>/* Transform.  */
>  vec<tree> vec_oprnds = vNULL;
> @@ -6521,17 +6518,10 @@ vect_gen_perm_mask_any (tree vectype, co
>  {
>tree mask_elt_type, mask_type;
>
> -  unsigned int nunits = sel.length ();
> -  gcc_checking_assert (nunits == TYPE_VECTOR_SUBPARTS (vectype));
> -
>mask_elt_type = lang_hooks.types.type_for_mode
>  (int_mode_for_mode 
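
The encoding that `vec_perm_indices_to_tree` iterates over — `npatterns` interleaved patterns with `nelts_per_pattern` leading elements each, later elements of a pattern extrapolated from its last step — can be sketched as a decoder. This is an illustrative model of the builder's behaviour, not GCC code, and it assumes `nelts_per_pattern >= 2` whenever extrapolation is needed:

```cpp
#include <cassert>
#include <vector>

// Expand an encoded permutation vector to its full length.  Element i
// belongs to pattern i % npatterns; within that pattern, positions
// beyond the encoded prefix continue the step between the last two
// encoded elements of the pattern.
static std::vector<long> decode(const std::vector<long> &encoded,
                                unsigned npatterns,
                                unsigned nelts_per_pattern,
                                unsigned full_nelts) {
  std::vector<long> out(full_nelts);
  for (unsigned i = 0; i < full_nelts; ++i) {
    unsigned pat = i % npatterns;
    unsigned idx = i / npatterns;   // position within the pattern
    if (idx < nelts_per_pattern) {
      out[i] = encoded[idx * npatterns + pat];
    } else {
      long a = encoded[(nelts_per_pattern - 2) * npatterns + pat];
      long b = encoded[(nelts_per_pattern - 1) * npatterns + pat];
      out[i] = b + (long)(idx - (nelts_per_pattern - 1)) * (b - a);
    }
  }
  return out;
}
```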

Re: [PATCH] Fix -fcompare-debug due to DEBUG_BEGIN_STMTs (PR debug/83419)

2017-12-18 Thread Jakub Jelinek
On Mon, Dec 18, 2017 at 02:17:18PM +0100, Jakub Jelinek wrote:
> One option would be to deal with that in pop_stmt_list and the C++ caller
> thereof, where we right now have:
> 
>   /* If the statement list contained exactly one statement, then
>  extract it immediately.  */
>   if (tsi_one_before_end_p (i))
> {
>   u = tsi_stmt (i);
>   tsi_delink (&i);
>   free_stmt_list (t);
>   t = u;
> }
> 
> This part would need a MAY_HAVE_DEBUG_MARKER_STMTS variant that would
> check for the case where there are just DEBUG_BEGIN_STMT markers + one other
> stmt (+ maybe some further DEBUG_BEGIN_STMT markers) and nothing else.
> If that one other stmt doesn't have TREE_SIDE_EFFECTS, clear
> TREE_SIDE_EFFECTS on the whole statement list to get the same
> TREE_SIDE_EFFECTS from the returned expression as without -g.
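The check Jakub sketches -- a statement list that is just DEBUG_BEGIN_STMT markers plus one real statement, whose TREE_SIDE_EFFECTS decides the list's -- can be modeled outside GCC roughly as follows (a sketch only; `Stmt`, `single_real_stmt` and `can_clear_side_effects` are made-up names, not GCC API):

```cpp
#include <cassert>
#include <vector>

// Toy model of one statement-list entry: either a DEBUG_BEGIN_STMT
// marker or a "real" statement that may have TREE_SIDE_EFFECTS set.
struct Stmt { bool is_debug_marker; bool side_effects; };

// Return the single non-marker statement if the list consists of debug
// markers plus exactly one other statement; otherwise return nullptr.
const Stmt *single_real_stmt(const std::vector<Stmt> &list) {
  const Stmt *found = nullptr;
  for (const Stmt &s : list) {
    if (s.is_debug_marker)
      continue;
    if (found)
      return nullptr;  // more than one real statement
    found = &s;
  }
  return found;
}

// TREE_SIDE_EFFECTS may be cleared on the whole list only when the one
// real statement exists and itself has no side effects.
bool can_clear_side_effects(const std::vector<Stmt> &list) {
  const Stmt *s = single_real_stmt(list);
  return s != nullptr && !s->side_effects;
}
```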

Like this (the cp_parser_omp_for_loop_init function would need similar
changes).  Except while it works for this new testcase and some of the
testcases I've listed, it still doesn't work for others.

Alex, I'm giving up here.

Jakub


Re: PATCH to fix -Wrestrict ICE (PR middle-end/83463)

2017-12-18 Thread Jakub Jelinek
On Mon, Dec 18, 2017 at 09:36:46AM -0700, Jeff Law wrote:
> On 12/18/2017 08:10 AM, Marek Polacek wrote:
> > I'm not entirely up to speed with this code, but this one seemed 
> > sufficiently
> > obvious: check INTEGRAL_TYPE_P before looking at a tree's min/max value.
> > Otherwise, go with maxobjsize.
> > 
> > Bootstrapped/regtested on x86_64-linux, ok for trunk?
> > 
> > 2017-12-18  Marek Polacek  
> > 
> > PR middle-end/83463
> > * gimple-ssa-warn-restrict.c (builtin_memref::builtin_memref):
> > Check if TYPE is INTEGRAL_TYPE_P before accessing its min/max
> > values.
> > 
> > * gcc.dg/pr83463.c: New test.
> OK.

What about INTEGRAL_TYPE_Ps that have NULL TYPE_MIN_VALUE and/or
TYPE_MAX_VALUE?  Doesn't Ada create those?

Jakub


PATCH to fix -Wrestrict ICE (PR middle-end/83463)

2017-12-18 Thread Marek Polacek
I'm not entirely up to speed with this code, but this one seemed sufficiently
obvious: check INTEGRAL_TYPE_P before looking at a tree's min/max value.
Otherwise, go with maxobjsize.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2017-12-18  Marek Polacek  

PR middle-end/83463
* gimple-ssa-warn-restrict.c (builtin_memref::builtin_memref):
Check if TYPE is INTEGRAL_TYPE_P before accessing its min/max
values.

* gcc.dg/pr83463.c: New test.

diff --git gcc/gimple-ssa-warn-restrict.c gcc/gimple-ssa-warn-restrict.c
index 4d424735d2a..d1a376637a2 100644
--- gcc/gimple-ssa-warn-restrict.c
+++ gcc/gimple-ssa-warn-restrict.c
@@ -287,13 +287,15 @@ builtin_memref::builtin_memref (tree expr, tree size)
  else
{
  gimple *stmt = SSA_NAME_DEF_STMT (offset);
+ tree type;
  if (is_gimple_assign (stmt)
- && gimple_assign_rhs_code (stmt) == NOP_EXPR)
+ && gimple_assign_rhs_code (stmt) == NOP_EXPR
+ && (type = TREE_TYPE (gimple_assign_rhs1 (stmt)))
+ && INTEGRAL_TYPE_P (type))
{
  /* Use the bounds of the type of the NOP_EXPR operand
 even if it's signed.  The result doesn't trigger
 warnings but makes their output more readable.  */
- tree type = TREE_TYPE (gimple_assign_rhs1 (stmt));
  offrange[0] = wi::to_offset (TYPE_MIN_VALUE (type));
  offrange[1] = wi::to_offset (TYPE_MAX_VALUE (type));
}
diff --git gcc/testsuite/gcc.dg/pr83463.c gcc/testsuite/gcc.dg/pr83463.c
index e69de29bb2d..735ef3c6dc8 100644
--- gcc/testsuite/gcc.dg/pr83463.c
+++ gcc/testsuite/gcc.dg/pr83463.c
@@ -0,0 +1,17 @@
+/* PR middle-end/83463 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -Wrestrict" } */
+
+int *a;
+void *memcpy ();
+void
+m (void *p1)
+{
+  memcpy (0, p1, 0);
+}
+
+void
+p ()
+{
+  m (p + (long) a);
+}

Marek


Re: [PATCH, PR83327] Fix liveness analysis in lra for spilled-into hard regs

2017-12-18 Thread Vladimir Makarov



On 12/15/2017 06:25 AM, Tom de Vries wrote:


Proposed Solution:

The patch addresses the problem, by:
- marking the hard regs that have been used in lra_spill in
  hard_regs_spilled_into
- using hard_regs_spilled_into in lra_create_live_ranges to
  make sure those registers are marked in the conflict_hard_regs
  of pseudos that overlap with the spill register usage

[ I've also tried an approach where I didn't use 
hard_regs_spilled_into, but tried to propagate all hard regs. I 
figured out that I needed to mask out eliminable_regset.  Also I 
needed to mask out lra_no_alloc_regs, but that could be due to 
gcn-specific problems (pointers take 2 hard regs), I'm not yet sure. 
Anyway, in the submitted patch I tried to avoid these problems and 
went for the more minimal approach. ]
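A toy model of the two-step idea above -- record the hard regs that spilling used, then add them to the conflict sets of pseudos whose live ranges overlap the spill usage -- might look like this (illustrative types and names, not actual LRA code; live ranges are simplified to single [start, end] intervals):

```cpp
#include <cassert>
#include <bitset>
#include <vector>

constexpr int NUM_HARD_REGS = 16;

// A pseudo with a [start, end] live range and its conflict set.
struct Pseudo {
  int start, end;
  std::bitset<NUM_HARD_REGS> conflict_hard_regs;
};

// One use of a spilled-into hard reg, live over [start, end].
struct SpillUse { int hard_reg; int start, end; };

// Step 1: remember which hard regs spilling used; step 2: make every
// pseudo whose live range overlaps such a use conflict with that reg.
std::bitset<NUM_HARD_REGS>
mark_spill_conflicts(std::vector<Pseudo> &pseudos,
                     const std::vector<SpillUse> &spills) {
  std::bitset<NUM_HARD_REGS> hard_regs_spilled_into;
  for (const SpillUse &s : spills)
    hard_regs_spilled_into.set(s.hard_reg);
  for (Pseudo &p : pseudos)
    for (const SpillUse &s : spills)
      if (p.start <= s.end && s.start <= p.end)  // live ranges overlap
        p.conflict_hard_regs.set(s.hard_reg);
  return hard_regs_spilled_into;
}
```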


Tom, thank you for the detailed explanation of the problem and the 
solutions you considered.  It helped me a lot.  Your simple solution is 
adequate, as most transformations and allocations are done on the 1st 
iteration of the LRA subpasses.

In order to get the patch accepted for trunk, I think we need:
- bootstrap and reg-test on x86_64
- build and reg-test on mips (the only primary platform that has the
  spill_class hook enabled)

Any comments?


The patch looks ok to me.  You can commit it after successful testing on 
x86-64 and mips, but I am sure there will be no problems with x86-64 as 
it does not currently use spill_class (actually your patch might help to 
switch it on again for x86-64; spill_class was quite useful for x86-64 
performance on Intel processors).




C++ PATCH to fix wrong-code with constexpr call table cache (PR c++/83116)

2017-12-18 Thread Marek Polacek
Here the problem was that cxx_eval_call_expression can cache the result of a
constexpr call in constexpr_call_table, but we have to be careful: after
store_init_value the cached result might be invalid.  So I believe we also
have to clear the constexpr call table.  I've lumped that together with
clearing the cv_cache.
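The bug pattern here is ordinary memoization with a missed invalidation; stripped of the GCC specifics it can be reproduced in a few lines (a toy cache with hypothetical names, not the constexpr machinery itself):

```cpp
#include <cassert>
#include <unordered_map>

// Toy "constexpr call table": memoizes eval(key), whose value depends
// on mutable state.  If the state changes (the analogue of
// store_init_value modifying the object) without the cache being
// cleared, eval() keeps returning a stale result -- the bug the patch
// fixes by clearing constexpr_call_table too.
int env = 0;
std::unordered_map<int, int> call_table;

int eval(int key) {
  auto it = call_table.find(key);
  if (it != call_table.end())
    return it->second;        // possibly stale
  int result = key + env;
  call_table.emplace(key, result);
  return result;
}

// Analogue of clearing constexpr_call_table in clear_constexpr_cache.
void clear_caches() { call_table.clear(); }
```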

Bootstrapped/regtested on x86_64-linux, ok for trunk and 7?

2017-12-18  Marek Polacek  

PR c++/83116
* constexpr.c (clear_cv_cache): Renamed to ...
(clear_constexpr_cache): ... this.  Also clear constexpr_call_table.
(clear_cv_and_fold_caches): Renamed to ...
(clear_constexpr_and_fold_caches): ... this.
* cp-tree.h (clear_constexpr_and_fold_caches): Update declaration.
* decl.c (finish_enum_value_list): Call clear_constexpr_and_fold_caches
instead of clear_cv_and_fold_caches.
* typeck2.c (store_init_value): Likewise.

* g++.dg/cpp1y/constexpr-83116.C: New test.

diff --git gcc/cp/constexpr.c gcc/cp/constexpr.c
index 0455be1d6da..e0299e731f0 100644
--- gcc/cp/constexpr.c
+++ gcc/cp/constexpr.c
@@ -4992,21 +4992,23 @@ maybe_constant_value (tree t, tree decl)
   return r;
 }
 
-/* Dispose of the whole CV_CACHE.  */
+/* Dispose of the whole CV_CACHE and CONSTEXPR_CALL_TABLE.  */
 
 static void
-clear_cv_cache (void)
+clear_constexpr_cache (void)
 {
   if (cv_cache != NULL)
 cv_cache->empty ();
+  if (constexpr_call_table != NULL)
+constexpr_call_table->empty ();
 }
 
-/* Dispose of the whole CV_CACHE and FOLD_CACHE.  */
+/* Clear the constexpr caches and FOLD_CACHE.  */
 
 void
-clear_cv_and_fold_caches (void)
+clear_constexpr_and_fold_caches (void)
 {
-  clear_cv_cache ();
+  clear_constexpr_cache ();
   clear_fold_cache ();
 }
 
diff --git gcc/cp/cp-tree.h gcc/cp/cp-tree.h
index f5f974da728..6e553c4f62d 100644
--- gcc/cp/cp-tree.h
+++ gcc/cp/cp-tree.h
@@ -7409,7 +7409,7 @@ extern bool var_in_maybe_constexpr_fn   (tree);
 extern void explain_invalid_constexpr_fn(tree);
 extern vec<tree> cx_error_context   (void);
 extern tree fold_sizeof_expr   (tree);
-extern void clear_cv_and_fold_caches   (void);
+extern void clear_constexpr_and_fold_caches(void);
 
 /* In cp-ubsan.c */
 extern void cp_ubsan_maybe_instrument_member_call (tree);
diff --git gcc/cp/decl.c gcc/cp/decl.c
index ca9e0c7b205..bed2280b9e4 100644
--- gcc/cp/decl.c
+++ gcc/cp/decl.c
@@ -14341,7 +14341,7 @@ finish_enum_value_list (tree enumtype)
 
   /* Each enumerator now has the type of its enumeration.  Clear the cache
  so that this change in types doesn't confuse us later on.  */
-  clear_cv_and_fold_caches ();
+  clear_constexpr_and_fold_caches ();
 }
 
 /* Finishes the enum type. This is called only the first time an
diff --git gcc/cp/typeck2.c gcc/cp/typeck2.c
index e5bb249b2be..787e59b70d8 100644
--- gcc/cp/typeck2.c
+++ gcc/cp/typeck2.c
@@ -845,7 +845,7 @@ store_init_value (tree decl, tree init, vec<tree, va_gc>** cleanups, int flags)
   value = replace_placeholders (value, decl);
 
   /* DECL may change value; purge caches.  */
-  clear_cv_and_fold_caches ();
+  clear_constexpr_and_fold_caches ();
 
   /* If the initializer is not a constant, fill in DECL_INITIAL with
  the bits that are constant, and then return an expression that
diff --git gcc/testsuite/g++.dg/cpp1y/constexpr-83116.C 
gcc/testsuite/g++.dg/cpp1y/constexpr-83116.C
index e69de29bb2d..18d79e2e1cc 100644
--- gcc/testsuite/g++.dg/cpp1y/constexpr-83116.C
+++ gcc/testsuite/g++.dg/cpp1y/constexpr-83116.C
@@ -0,0 +1,18 @@
+// PR c++/83116
+// { dg-do run { target c++14 } }
+// { dg-options "-O2" }
+
+struct S {
+  constexpr S () : s(0) { foo (); }
+  constexpr int foo () { return s; }
+  int s;
+};
+
+int
+main ()
+{
+  static S var;
+  var.s = 5;
+  if (var.s != 5 || var.foo () != 5)
+__builtin_abort ();
+}

Marek


Re: PATCH to fix -Wrestrict ICE (PR middle-end/83463)

2017-12-18 Thread Jeff Law
On 12/18/2017 08:10 AM, Marek Polacek wrote:
> I'm not entirely up to speed with this code, but this one seemed sufficiently
> obvious: check INTEGRAL_TYPE_P before looking at a tree's min/max value.
> Otherwise, go with maxobjsize.
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> 2017-12-18  Marek Polacek  
> 
>   PR middle-end/83463
>   * gimple-ssa-warn-restrict.c (builtin_memref::builtin_memref):
>   Check if TYPE is INTEGRAL_TYPE_P before accessing its min/max
>   values.
> 
>   * gcc.dg/pr83463.c: New test.
OK.
jeff


RE: [patch][x86] -march=icelake

2017-12-18 Thread Koval, Julia
Hi, I tried to replace the two flags variables with a C++ bitset (patch
attached).  What do you think?

> Please add these options first.
Two options are left (they are currently under Kirill's review); I'll add PTAs 
for them to the patch as soon as they are committed.

Thanks,
Julia
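As a sketch of what the bitset approach could look like (the feature names, layout, and helper are illustrative only, not the attached patch):

```cpp
#include <cassert>
#include <bitset>

// One way past the 64-bit wall: a single bitset indexed by an enum,
// instead of paired "flags"/"flags2" HOST_WIDE_INT words.  Membership
// tests no longer care in which machine word a given flag lives.
enum Isa {
  ISA_SGX, ISA_GFNI, ISA_AVX512VBMI2, ISA_VAES, ISA_AVX512VNNI,
  ISA_VPCLMULQDQ, ISA_RDPID, ISA_AVX512BITALG,
  ISA_COUNT  // keep last
};

using IsaFlags = std::bitset<ISA_COUNT>;

// Hypothetical per-arch flag set.
IsaFlags icelake_flags() {
  IsaFlags f;
  f.set(ISA_GFNI).set(ISA_AVX512VBMI2).set(ISA_VAES).set(ISA_AVX512VNNI)
   .set(ISA_VPCLMULQDQ).set(ISA_RDPID).set(ISA_AVX512BITALG);
  return f;
}
```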


> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Uros Bizjak
> Sent: Sunday, November 12, 2017 5:30 PM
> To: Koval, Julia 
> Cc: GCC Patches ; Kirill Yukhin
> 
> Subject: Re: [patch][x86] -march=icelake
> 
> On Sun, Nov 12, 2017 at 1:04 AM, Koval, Julia  wrote:
> > Hi, this patch adds new option -march=icelake. Isasets defined in:
> https://software.intel.com/sites/default/files/managed/c5/15/architecture-
> instruction-set-extensions-programming-reference.pdf
> > I didn't add arch code to driver-i386.c, because there is no code available 
> > in
> SDM yet, only for cannonlake
> (https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-
> vol-1-2abcd-3abcd.pdf Chapter 2).
> 
> This means the driver will go through generic detection for
> -march=native. Perhaps a comment should be added, so we won't forget
> to add the model number when one is available.
> 
> > gcc/
> > * config.gcc: Add -march=icelake.
> > * config/i386/driver-i386.c (host_detect_local_cpu): Detect icelake.
> > * config/i386/i386-c.c (ix86_target_macros_internal): Handle 
> > icelake.
> > * config/i386/i386.c (processor_costs): Add m_ICELAKE.
> > (PTA_ICELAKE, PTA2_ICELAKE, PTA2_GFNI, PTA2_AVX512VBMI2,
> PTA2_VAES,
> > PTA2_AVX512VNNI, PTA2_VPCLMULQDQ, PTA2_RDPID,
> PTA2_AVX512BITALG): New.
> > (processor_target_table): Add icelake.
> > (ix86_option_override_internal): Add flags2 for new PTA, handle 
> > GFNI,
> RDPID.
> > (get_builtin_code_for_version): Handle icelake.
> > (M_INTEL_COREI7_ICELAKE): New.
> > * config/i386/i386.h (TARGET_ICELAKE, PROCESSOR_ICELAKE): New.
> > * doc/invoke.texi: Add -march=icelake.
> > gcc/testsuite/
> > * gcc.target/i386/funcspec-56.inc: Handle new march.
> > * g++.dg/ext/mv16.C: Ditto.
> > libgcc/
> > * config/i386/cpuinfo.h (processor_subtypes): Add
> INTEL_COREI7_ICELAKE.
> 
> @@ -3425,6 +3427,13 @@ ix86_option_override_internal (bool main_args_p,
>  #define PTA_AVX5124FMAPS (HOST_WIDE_INT_1 << 61)
>  #define PTA_AVX512VPOPCNTDQ (HOST_WIDE_INT_1 << 62)
>  #define PTA_SGX (HOST_WIDE_INT_1 << 63)
> +#define PTA2_GFNI (HOST_WIDE_INT_1 << 0)
> +#define PTA2_AVX512VBMI2 (HOST_WIDE_INT_1 << 1)
> +#define PTA2_VAES (HOST_WIDE_INT_1 << 2)
> +#define PTA2_AVX512VNNI (HOST_WIDE_INT_1 << 3)
> +#define PTA2_VPCLMULQDQ (HOST_WIDE_INT_1 << 4)
> +#define PTA2_RDPID (HOST_WIDE_INT_1 << 5)
> +#define PTA2_AVX512BITALG (HOST_WIDE_INT_1 << 6)
> 
> Please add these options first.
> 
> On a related note, there should probably be a better way to extend
> various bitmapped flag variables beyond 64-bit words. We are constantly
> going over 64-bit sizes in target option masks, and now the number of
> processor flags doesn't fit in a word anymore. There are several
> places where one has to keep in mind in which word some specific flag
> lives, and this approach opens several ways to make a hard-to-detect
> mistake. Does C++ offer a more elegant way?
> 
> Below, please find suggestions for a couple of cosmetic changes.
> 
> Thanks,
> Uros.
> 
> @@ -3425,6 +3427,13 @@ ix86_option_override_internal (bool main_args_p,
>  #define PTA_AVX5124FMAPS (HOST_WIDE_INT_1 << 61)
>  #define PTA_AVX512VPOPCNTDQ (HOST_WIDE_INT_1 << 62)
>  #define PTA_SGX (HOST_WIDE_INT_1 << 63)
> 
> Please add a comment here, that the folowing belongs to flags2.
> 
> +#define PTA2_GFNI (HOST_WIDE_INT_1 << 0)
> +#define PTA2_AVX512VBMI2 (HOST_WIDE_INT_1 << 1)
> +#define PTA2_VAES (HOST_WIDE_INT_1 << 2)
> 
> 
> @@ -4105,6 +4124,12 @@ ix86_option_override_internal (bool main_args_p,
>  if (processor_alias_table[i].flags & PTA_SGX
>  && !(opts->x_ix86_isa_flags2_explicit & OPTION_MASK_ISA_SGX))
>opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA_SGX;
> 
> Please add vertical space here to visually separate flags and flags2 
> processing.
> 
> +if (processor_alias_table[i].flags2 & PTA2_RDPID
> +&& !(opts->x_ix86_isa_flags2_explicit & OPTION_MASK_ISA_RDPID))
> +  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA_RDPID;


0001-icelake.patch


Re: c-family PATCH for warn_logical_operator

2017-12-18 Thread Jeff Law
On 12/18/2017 03:35 AM, Marek Polacek wrote:
> While looking into something else I noticed that because of the missing
> warn_logical_op check we do unnecessary work in warn_logical_operator, like
> calling fold_for_warn which can put stuff into cv/fold_cache etc.  It also
> causes more noise when debugging.
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> 2017-12-18  Marek Polacek  
> 
>   * c-warn.c (warn_logical_operator): Return early if -Wlogical-op is
>   not in effect.
OK.
jeff


[PR c++/59930] template friend injection

2017-12-18 Thread Nathan Sidwell
This patch fixes the handling of template friend classes of template 
classes.  Our current handling was somewhat messed up, possibly because 
of now-clarified ambiguities in earlier versions of the standard.  In 
particular, we'd search too many namespaces looking for an existing 
declaration and, if none was found, inject the name into the global 
namespace as a non-hidden name.


This patch:
a) removes some now-unneeded handling for self friendliness
b) looks only in the innermost containing namespace for an unqualified 
friend [*]

c) injects into that namespace as a hidden name.

[*] fixing this may permit removal of some checks in other friend cases 
too.  I have not investigated.
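The [namespace.memdef]/3 behavior the patch implements can be seen in plain C++: a friend first declared inside a non-local class is a member of the innermost enclosing namespace, but stays hidden, so only argument-dependent lookup finds it until it is redeclared there.  A minimal illustration:

```cpp
#include <cassert>

// The hidden friend: injected into N (the innermost enclosing
// namespace), not into the global namespace.
namespace N {
  struct X {
    friend int f(X) { return 42; }  // injected into N as a hidden name
  };
}

int call_via_adl() {
  N::X x;
  return f(x);  // OK: ADL looks in N because x's type lives there
  // Writing N::f(x) here would not compile until f is declared
  // at namespace scope inside N.
}
```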


applying to trunk

nathan
--
Nathan Sidwell
2017-12-18  Nathan Sidwell  

	PR c++/59930
	* name-lookup.c (name_lookup::search_unqualified): Don't search
	parent namespace when looking for hidden things.
	* pt.c (tsubst_friend_class): Always push to friend scope, drop
	unneeded self-friend check. Inject new hidden friend into correct
	scope.

	PR c++/59930
	* g++.dg/parse/pr81247-c.C: Adjust.
	* g++.dg/template/pr59930-[123].C: New.

Index: cp/name-lookup.c
===
--- cp/name-lookup.c	(revision 255777)
+++ cp/name-lookup.c	(working copy)
@@ -711,6 +711,15 @@ name_lookup::search_unqualified (tree sc
 done:;
   if (scope == global_namespace)
 	break;
+
+  /* If looking for hidden names, we only look in the innermost
+	 namespace scope.  [namespace.memdef]/3 If a friend
+	 declaration in a non-local class first declares a class,
+	 function, class template or function template the friend is a
+	 member of the innermost enclosing namespace.  See also
+	 [basic.lookup.unqual]/7 */
+  if (flags & LOOKUP_HIDDEN)
+	break;
 }
 
   vec_safe_truncate (queue, length);
Index: cp/pt.c
===
--- cp/pt.c	(revision 255777)
+++ cp/pt.c	(working copy)
@@ -10005,57 +10005,23 @@ tsubst_friend_function (tree decl, tree
 static tree
 tsubst_friend_class (tree friend_tmpl, tree args)
 {
-  tree friend_type;
   tree tmpl;
-  tree context;
 
   if (DECL_TEMPLATE_TEMPLATE_PARM_P (friend_tmpl))
 {
-  tree t = tsubst (TREE_TYPE (friend_tmpl), args, tf_none, NULL_TREE);
-  return TREE_TYPE (t);
+  tmpl = tsubst (TREE_TYPE (friend_tmpl), args, tf_none, NULL_TREE);
+  return TREE_TYPE (tmpl);
 }
 
-  context = CP_DECL_CONTEXT (friend_tmpl);
+  tree context = CP_DECL_CONTEXT (friend_tmpl);
+  if (TREE_CODE (context) == NAMESPACE_DECL)
+push_nested_namespace (context);
+  else
+push_nested_class (context);
 
-  if (context != global_namespace)
-{
-  if (TREE_CODE (context) == NAMESPACE_DECL)
-	push_nested_namespace (context);
-  else
-	push_nested_class (tsubst (context, args, tf_none, NULL_TREE));
-}
-
-  /* Look for a class template declaration.  We look for hidden names
- because two friend declarations of the same template are the
- same.  For example, in:
-
-   struct A { 
- template  friend class F;
-   };
-   template  struct B { 
- template  friend class F;
-   };
-
- both F templates are the same.  */
-  tmpl = lookup_name_real (DECL_NAME (friend_tmpl), 0, 0,
-			   /*block_p=*/true, 0, LOOKUP_HIDDEN);
-
-  /* But, if we don't find one, it might be because we're in a
- situation like this:
-
-   template 
-   struct S {
-	 template 
-	 friend struct S;
-   };
-
- Here, in the scope of (say) S, `S' is bound to a TYPE_DECL
- for `S', not the TEMPLATE_DECL.  */
-  if (!tmpl || !DECL_CLASS_TEMPLATE_P (tmpl))
-{
-  tmpl = lookup_name_prefer_type (DECL_NAME (friend_tmpl), 1);
-  tmpl = maybe_get_template_decl_from_type_decl (tmpl);
-}
+  tmpl = lookup_name_real (DECL_NAME (friend_tmpl), /*prefer_type=*/false,
+			   /*non_class=*/false, /*block_p=*/false,
+			   /*namespaces_only=*/false, LOOKUP_HIDDEN);
 
   if (tmpl && DECL_CLASS_TEMPLATE_P (tmpl))
 {
@@ -10068,53 +10034,50 @@ tsubst_friend_class (tree friend_tmpl, t
   if (TMPL_PARMS_DEPTH (DECL_TEMPLATE_PARMS (friend_tmpl))
 	  > TMPL_ARGS_DEPTH (args))
 	{
-	  tree parms;
-  location_t saved_input_location;
-	  parms = tsubst_template_parms (DECL_TEMPLATE_PARMS (friend_tmpl),
-	 args, tf_warning_or_error);
-
-  saved_input_location = input_location;
+	  tree parms = tsubst_template_parms (DECL_TEMPLATE_PARMS (friend_tmpl),
+	  args, tf_warning_or_error);
+  location_t saved_input_location = input_location;
   input_location = DECL_SOURCE_LOCATION (friend_tmpl);
   tree cons = get_constraints (tmpl);
   redeclare_class_template (TREE_TYPE (tmpl), parms, cons);
   input_location = saved_input_location;
-  
 	}
-
-  friend_type = TREE_TYPE (tmpl);
 }
   else
 {
   /* The friend template has not already been declared.  In this
 	 case, 

Re: [PATCH PR81740]Enforce dependence check for outer loop vectorization

2017-12-18 Thread Richard Biener
On Mon, Dec 18, 2017 at 1:37 PM, Richard Biener
 wrote:
> On Fri, Dec 15, 2017 at 7:39 PM, Bin.Cheng  wrote:
>> On Fri, Dec 15, 2017 at 1:19 PM, Richard Biener
>>  wrote:
>>> On Fri, Dec 15, 2017 at 1:35 PM, Bin.Cheng  wrote:
 On Fri, Dec 15, 2017 at 12:09 PM, Bin.Cheng  wrote:
> On Fri, Dec 15, 2017 at 11:55 AM, Richard Biener
>  wrote:
>> On Fri, Dec 15, 2017 at 12:30 PM, Bin Cheng  wrote:
>>> Hi,
>>> As explained in the PR, given below test case:
>>> int a[8][10] = { [2][5] = 4 }, c;
>>>
>>> int
>>> main ()
>>> {
>>>   short b;
>>>   int i, d;
>>>   for (b = 4; b >= 0; b--)
>>> for (c = 0; c <= 6; c++)
>>>   a[c + 1][b + 2] = a[c][b + 1];
>>>   for (i = 0; i < 8; i++)
>>> for (d = 0; d < 10; d++)
>>>   if (a[i][d] != (i == 3 && d == 6) * 4)
>>> __builtin_abort ();
>>>   return 0;
>>>
>>> the loop nest is illegal for vectorization without reversing inner 
>>> loop.  The issue
>>> is in data dependence checking of vectorizer, I believe the mentioned 
>>> revision just
>>> exposed this.  Previously the vectorization is skipped because of 
>>> unsupported memory
>>> operation.  The outer loop vectorization unrolls the outer loop into:
>>>
>>>   for (b = 4; b > 0; b -= 4)
>>>   {
>>> for (c = 0; c <= 6; c++)
>>>   a[c + 1][6] = a[c][5];
>>> for (c = 0; c <= 6; c++)
>>>   a[c + 1][5] = a[c][4];
>>> for (c = 0; c <= 6; c++)
>>>   a[c + 1][4] = a[c][3];
>>> for (c = 0; c <= 6; c++)
>>>   a[c + 1][3] = a[c][2];
>>>   }
>>> Then four inner loops are fused into:
>>>   for (b = 4; b > 0; b -= 4)
>>>   {
>>> for (c = 0; c <= 6; c++)
>>> {
>>>   a[c + 1][6] = a[c][5];  // S1
>>>   a[c + 1][5] = a[c][4];  // S2
>>>   a[c + 1][4] = a[c][3];
>>>   a[c + 1][3] = a[c][2];
>>> }
>>>   }
>>
>> Note that they are not really "fused" but they are interleaved.  With
>> GIMPLE in mind
>> that makes a difference, you should get the equivalent of
>>
>>for (c = 0; c <= 6; c++)
>>  {
>>tem1 = a[c][5];
>>tem2 = a[c][4];
>>tem3 = a[c][3];
>>tem4 = a[c][2];
>>a[c+1][6] = tem1;
>>a[c +1][5] = tem2;
>> a[c+1][4] = tem3;
>> a[c+1][3] = tem4;
>>  }
> Yeah, I will double check if this abstraction breaks the patch and how.
 Hmm, I think this doesn't break it, well at least for part of the
 analysis, because it is loop carried (backward) dependence goes wrong,
 interleaving or not with the same iteration doesn't matter here.
>>>
>>> I think the idea is that forward dependences are always fine (negative 
>>> distance)
>>> to vectorize.  But with backward dependences we have to adhere to max_vf.
>>>
>>> It looks like for outer loop vectorization we only look at the distances in 
>>> the
>>> outer loop but never at inner ones.  But here the same applies but isn't 
>>> that
>>> independend on the distances with respect to the outer loop?
>>>
>>> But maybe I'm misunderstanding how "distances" work here.
>> Hmm, I am not sure I understand "distance" correctly.  With the
>> description as in books like "Optimizing Compilers for Modern
>> Architectures", distance is "# of iteration of sink ref - # of
>> iteration of source ref".  Given the example below:
>>   for (i = 0; i < N; ++i)
>> {
>>   x = arr[idx_1];  // S1
>>   arr[idx_2] = x;  // S2
>> }
>> if S1 is source ref, distance = idx_2 - idx_1, and distance > 0.  Also
>> this is forward dependence.  For example, idx_1 is i + 1 and idx_2 is
>> i;
>> If S2 is source ref, distance = idx_1 - idx_2, and distance < 0.  Also
>> this is backward dependence.  For example idx_1 is i and idx_2 is i +
>> 1;
>>
>> In GCC, we always try to subtract idx_2 from idx_1 first in computing
>> classic distance, we could result in negative distance in case of
>> backward dependence.  When this happens at dependence carried level,
>> we manually reverse it.  When this happens at inner level loop, we
>> simply keep the negative distance.  And it's meaningless to talk about
>> forward/backward given dependence is carried at outer level loop.
>>
>> Outer loop vectorization is interesting.  The problematic case has
>> backward dependence carried by outer loop.  Because we don't check
>> dist vs. max_vf for such dep, the unrolled references could have outer
>> loop index equal to each other, as in a[c][5] and a[c+1][5].  So it's
>> like we have outer loop index resolved as equal.  Now it has
>> dependence only if c == c' + 1.  I think previous comment on fusion
>> still hold for this and we now need to check 

Re: [PATCH PR81740]Enforce dependence check for outer loop vectorization

2017-12-18 Thread Michael Matz
Hi,

On Mon, 18 Dec 2017, Richard Biener wrote:

> where *unroll is similar to *max_vf I think.  dist_v[0] is the innermost loop.

[0] is always outermost loop.

> The vectorizer does way more complicated things and only looks at the 
> distance with respect to the outer loop as far as I can see which can be 
> negative.
> 
> Not sure if fusion and vectorizer "interleaving" makes a difference 
> here. I think the idea was that when interleaving stmt-by-stmt then 
> forward dependences would be preserved and thus we don't need to check 
> the inner loop dependences.  speaking with "forward vs. backward" 
> dependences again, not distances...
> 
> This also means that unroll-and-jam could be enhanced to "interleave" 
> stmts and thus cover more cases?
> 
> Again, I hope Micha can have a look here...

Haven't yet looked at the patch, but some comments anyway:

fusion and interleaving interact in the following way in outer loop 
vectorization, conceptually:
* (1) the outer loop is unrolled
* (2) the inner loops are fused
* (3) the (now single) inner body is rescheduled/shuffled/interleaved.

(1) is always okay.  But (2) and (3) as individual transformations must 
both be checked for validity.  If fusion isn't possible the whole 
transformation is invalid, and if interleaving isn't possible the same is 
true.  In the specific example:

  for (b = 4; b >= 0; b--)
    for (c = 0; c <= 6; c++)
      {
        t = a[c][b + 1];  // S1
        a[c + 1][b + 2] = t;  // S2
      }

it's already the fusion step that's invalid.  There's a 
dependence between S1 and S2, e.g. for (b,c) = (4,1) comes-before (3,0) 
with S1(4,1) reading a[1][5] and S2(3,0) writing a[1][5].  So a 
write-after-read.  After fusing:

   for (c = 0; c <= 6; c++)
 {
   t = a[c][5];  // S1
   a[c + 1][6] = t;
   t = a[c][4];
   a[c + 1][5] = t;  // S2
   a[c + 1][4] = a[c][3];
   a[c + 1][3] = a[c][2];
 }

here we have at iterations (c) = (0) comes-before (1), at S2(0) writing 
a[1][5] and S1(1) reading a[1][5].  I.e. now it's a read-after-write (the 
write in iteration 0 overwrites the value that is going to be read at 
iteration 1, which wasn't the case in the original loop).  The dependence 
switched direction --> invalid.

The simple interleaving of statements can't rectify this.  
Interleaving is an inner-body reordering but the brokenness comes from a 
cross-iteration ordering.

This example can be unroll-jammed or outer-loop vectorized if one of the 
two loops is reversed.  Let's say we reverse the inner loop, so that it 
runs in the same direction as the outer loop (reversal is possible here).

It'd then be something like:

   for (c = 6; c >= 0; c--)
 {
   t = a[c][5];  // S1
   a[c + 1][6] = t;
   t = a[c][4];
   a[c + 1][5] = t;  // S2
   a[c + 1][4] = a[c][3];
   a[c + 1][3] = a[c][2];
 }

The dependence between S1/S2 would still be a write-after-read, and all 
would be well.  This reversal of the inner loop can partly be simulated by 
not only interleaving the inner insns, but by also _reordering_ them.  But 
AFAIK the vectorizer doesn't do this?


Ciao,
Michael.
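Michael's argument can be checked empirically: running the original loop nest and the unrolled-and-fused version from earlier in the thread (plus the b = 0 remainder iteration, added here for completeness) on the initializer from the PR testcase gives different final arrays, confirming that the fusion step is invalid:

```cpp
#include <cassert>
#include <cstring>

int a_orig[8][10], a_fused[8][10];

// The original loop nest from the PR 81740 testcase.
void run_original(int a[8][10]) {
  for (int b = 4; b >= 0; b--)
    for (int c = 0; c <= 6; c++)
      a[c + 1][b + 2] = a[c][b + 1];
}

// Outer loop unrolled by 4 and the inner loops fused, as in the thread,
// plus the b = 0 remainder iteration.
void run_fused(int a[8][10]) {
  for (int c = 0; c <= 6; c++) {     // b = 4..1 unrolled and fused
    a[c + 1][6] = a[c][5];
    a[c + 1][5] = a[c][4];
    a[c + 1][4] = a[c][3];
    a[c + 1][3] = a[c][2];
  }
  for (int c = 0; c <= 6; c++)       // remainder iteration b = 0
    a[c + 1][2] = a[c][1];
}

void init() {
  std::memset(a_orig, 0, sizeof a_orig);
  std::memset(a_fused, 0, sizeof a_fused);
  a_orig[2][5] = a_fused[2][5] = 4;  // the { [2][5] = 4 } initializer
}
```

In the fused version the store a[2][5] = a[1][4] (iteration c = 1) clobbers the value before iteration c = 2 reads it, exactly the direction flip described above.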


Re: [committed][PR tree-optimization/83410] Avoid some jump threads when parallelizing loops

2017-12-18 Thread Jeff Law
On 12/18/2017 03:15 AM, Richard Biener wrote:
> On Fri, Dec 15, 2017 at 6:00 PM, Richard Biener
>  wrote:
>> On December 15, 2017 5:19:14 PM GMT+01:00, Jeff Law  wrote:
>>> I hate this patch.
>>>
>>> The fundamental problem we have is that there are times when we very
>>> much want to thread jumps and yet there are other times when we do not.
>>>
>>> To date we have been able to largely select between those two by
>>> looking
>>> at the shape of the CFG and the jump thread to see how threading a
>>> particular jump would impact the shape of the CFG (particularly loops
>>> in
>>> the CFG).
>>>
>>> In this BZ we have a loop like this:
>>>
>>>2
>>>|
>>>3<---+
>>>||
>>>4<---+   |
>>>   / \   |   |
>>>  5   6  |   |
>>>   \  /  |   |
>>> 7   |   |
>>>/ \  |   |
>>>   8  11-+   |
>>>  / \|
>>> 9  10---+
>>> |
>>> E
>>>
>>> We want to thread the path (6, 7) (7, 11).  ie, there's a path through
>>> that inner loop where we don't need to do the loop test.  Here's an
>>> example (from libgomp testsuite)
>>>
>>> (N is 1500)
>>>
>>> for (j = 0; j < N; j++)
>>>{
>>>  if (j > 500)
>>>{
>>>  x[i][j] = i + j + 3;
>>>  y[j] = i*j + 10;
>>>}
>>>  else
>>>x[i][j] = x[i][j]*3;
>>>}
>>>
>>>
>>>
>>> Threading (in effect) puts the loop test into both arms of the
>>> conditional, then removes it from the ELSE arm because the condition is
>>> always "keep looping" in that arm.
>>>
>>> This plays poorly with loop parallelization -- I tried to follow what's
>>> going on in there and just simply got lost.  I got as far as seeing
>>> that
>>> the parallelization code thought there were loop carried dependencies.
>>> At least that's how it looked to me.  But I don't see any loop carried
>>> dependencies in the code.
>>
>> Hmm. I'll double check if I remember on Monday.
> 
> So I did and the ultimate reason GRAPHITE FAILs is because for
> the loop with the threading applied we can no longer compute
> number of iterations.  That's of course bad for _all_ loop optimizers,
> not just GRAPHITE (we just don't have any other test coverage).
> The main issue here is that after the threading is applied the
> block with the loop exit no longer dominates the loop latch.
> This means the loop should really be split but it also means we should
> definitely avoid performing such threading before loop optimizers run,
> and not just if parallelizing loops.
> 
> Can you adjust your patch this way?  The condition you check seems
> to more or less match the case where we thread through the exit
> test _into_ the loop?  The bad thing is really if in the resulting CFG
> the exit test isn't dominating the latch (thus we have an exit test
> that is not always executed):
The problem is we'll end up regressing other vectorization cases.  I
looked at a variety of things WRT the shape of the CFG and ultimately
found that there wasn't anything I could use to distinguish the two
cases.   I'll look again though.

Jeff
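Richard's criterion -- after threading, the block with the loop exit test no longer dominates the latch -- can be checked concretely on the CFG from the BZ with a textbook iterative dominator computation (a standalone sketch, not GCC's dominance code; the effect of threading (6, 7), (7, 11) is approximated by adding an edge 6 -> 11):

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Iterative dominators over bitmasks: Dom(entry) = {entry};
// Dom(v) = {v} | intersection of Dom(p) over all predecessors p.
std::vector<uint32_t> dominators(int n,
                                 const std::vector<std::pair<int, int>> &edges,
                                 int entry) {
  std::vector<std::vector<int>> preds(n);
  for (const auto &e : edges) preds[e.second].push_back(e.first);
  uint32_t all = (1u << n) - 1;
  std::vector<uint32_t> dom(n, all);
  dom[entry] = 1u << entry;
  for (bool changed = true; changed;) {
    changed = false;
    for (int v = 0; v < n; ++v) {
      if (v == entry) continue;
      uint32_t d = all;
      for (int p : preds[v]) d &= dom[p];
      d |= 1u << v;
      if (d != dom[v]) { dom[v] = d; changed = true; }
    }
  }
  return dom;
}

// Blocks 2..11 plus exit E from the ASCII CFG, mapped to indices 0..10.
enum { B2, B3, B4, B5, B6, B7, B8, B9, B10, B11, BE };
std::vector<std::pair<int, int>> cfg_edges() {
  return {{B2,B3},{B3,B4},{B4,B5},{B4,B6},{B5,B7},{B6,B7},
          {B7,B8},{B7,B11},{B11,B4},{B8,B9},{B8,B10},{B10,B3},{B9,BE}};
}

bool dominates(const std::vector<uint32_t> &dom, int a, int b) {
  return (dom[b] >> a) & 1;
}
```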


Re: [PR 81616] Deferring FMA transformations in tight loops

2017-12-18 Thread Richard Biener
On Fri, Dec 15, 2017 at 3:19 PM, Martin Jambor  wrote:
>
> Hello,
>
> the patch below prevents creation if fused-multiply-and-add instructions
> in the widening_mul gimple pass on the Zen-based AMD CPUs and as a
> result fixes regressions of native znver1 tuning when compared to
> generic tuning in:
>
>   - the matrix.c testcase of PR 81616 (straightforward matrix
> multiplication) at -O2 and -O3 which is currently 60% (!),
>
>   - SPEC 2006 454.calculix at -O2, which is currently over 20%, and
>
>   - SPEC 2017 510.parest at -O2 and -Ofast, which is currently also
> about 20% in both cases.
>
> The basic idea is to detect loops in the following form:
>
> 
> # accumulator_111 = PHI <0.0(5), accumulator_66(6)>
> ...
> _65 = _14 * _16;
> accumulator_66 = _65 + accumulator_111;
>
> and prevents creating FMAs for it.  Because at least in the parest
> and calculix cases it has to, it also deals with more than one chain of
> FMA candidates that feed the next one's addend:
>
>
> 
> # accumulator_111 = PHI <0.0(5), accumulator_66(6)>
> ...
> _65 = _14 * _16;
> accumulator_55 = _65 + accumulator_111;
> _65 = _24 * _36;
> accumulator_66 = _65 + accumulator_55;
>
> Unfortunately, to really get rid of the calculix regression, the
> algorithm cannot just look at one BB at a time but also has to work for
> cases like the following:
>
>  1  void mult(void)
>  2  {
>  3  int i, j, k, l;
>  4
>  5 for(i=0; i<...; i++)
>  6 {
>  7    for(j=0; j<...; j++)
>  8    {
>  9       for(l=0; l<...; l++)
> 10       {
> 11   c[i][j] += a[i][l] * b[k][l];
> 12   for(k=1; k<10; ++k)
> 13   {
> 14   c[i][j] += a[i][l+k] * b[k][l+k];
> 15   }
> 16
> 17   }
> 18}
> 19 }
> 20  }
>
> where the FMA on line 14 feeds into the one on line 11 in an
> encompassing loop.  Therefore I have changed the structure of the pass
> to work in reverse dominance order and it keeps a hash set of results of
> rejected FMAs candidates which it checks when looking at PHI nodes of
> the current BB.  Without this reorganization, calculix was still 8%
> slower with native tuning than with generic one.
>
> When the deferring mechanism realizes that in the current BB, the FMA
> candidates do not all form a one chain tight-loop like in the examples
> above, it goes back to all the previously deferred candidates (in the
> current BB only) and performs the transformation.
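A toy version of the chain detection described above -- each add must consume the preceding mul's result plus the running accumulator, starting from the loop-header PHI -- could look like this (illustrative structures, not the widening_mul pass; SSA names taken from the example above, with the duplicated _65 renamed _65b):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy SSA-ish statements: result = op1 * op2 ('*') or op1 + op2 ('+').
struct Stmt { std::string lhs, op1, op2; char code; };

// Detect the pattern the patch defers: the mul/add pairs form one
// serial accumulation chain rooted at the loop-header PHI result.
bool is_accumulator_chain(const std::vector<Stmt> &block,
                          const std::string &phi_result) {
  std::string acc = phi_result;
  for (std::size_t i = 0; i + 1 < block.size(); i += 2) {
    const Stmt &mul = block[i], &add = block[i + 1];
    if (mul.code != '*' || add.code != '+') return false;
    bool feeds = (add.op1 == mul.lhs && add.op2 == acc) ||
                 (add.op2 == mul.lhs && add.op1 == acc);
    if (!feeds) return false;  // not part of the tight chain
    acc = add.lhs;             // the next add must consume this result
  }
  return true;
}
```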
>
> The main reason is to keep the patch conservative (and also simple), but
> it also means that the following function is not affected and is still
> 20% slower when compiled with native march and tuning compared to the
> generic one:
>
>  1  void mult(struct s *p1, struct s *p2)
>  2  {
>  3 int i, j, k;
>  4
>  5 for(i=0; i<...; ++i)
>  6 {
>  7    for(j=0; j<...; ++j)
>  8    {
>  9       for(k=0; k<...; ++k)
> 10       {
> 11  p1->c[i][j] += p1->a[i][k] * p1->b[k][j];
> 12  p2->c[i][j] += p2->a[i][k] * p2->b[k][j];
> 13   }
> 14}
> 15 }
> 16  }

So here we apply store-motion to p[12]->c[i][j] so we see a
reduction in SSA?  That is, you are really looking for
SSA SCCs that involve only FMA candidates?  There's a SCC
finding code in tree-ssa-sccvn.c, DFS (), but a straight-forward
use->def walk over the adds should find a SCC just involving the
adds?  That is, rather than doing all the deferring stuff I wonder
if it is easier to simply gather the SCC if there is any and add
"disqualifications", like to each other member?  So when seeing

  x += a1 * b1;
  x += a2 * b2;
  x += a3 * b3;
...

we'd still generate a FMA for each 2nd statement?  Basically
do some scheduling of as many mults as needed to cover
the latency of the first FMA.
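
Richard's interleaving idea can be sketched at the C level: split the reduction into two accumulator chains so that every second multiply-add is independent of the previous one.  The function below is illustrative only, not taken from the patch:

```c
#include <stddef.h>

/* Two independent accumulator chains: the adds into acc0 and acc1 do not
   depend on each other, so the second chain's multiply/FMA can issue
   while the first chain's add is still in flight.  */
float dot_two_chains(const float *a, const float *b, size_t n)
{
  float acc0 = 0.0f, acc1 = 0.0f;
  size_t i = 0;
  for (; i + 1 < n; i += 2) {
    acc0 += a[i] * b[i];         /* chain 0 */
    acc1 += a[i + 1] * b[i + 1]; /* chain 1 */
  }
  if (i < n)                     /* odd tail element */
    acc0 += a[i] * b[i];
  return acc0 + acc1;
}
```

This is exactly the "FMA for each 2nd statement" schedule: each chain sees only every other product.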

To go back to the Zen specific issue - so FMA has a higher overall
latency than mul + add (?) and (or?) if there's more than one mul being
accumulated then the 2nd mul can execute in parallel with the first
add (in case dependences allow).  I wonder if there's an easy way to
query the DFA for the latency of FMA vs. ADD and/or if the --param
(or target hook?) should rather tell us sth about this latency difference?

Zen IIRC can execute two FMA128 in parallel (thus one FMA256 per cycle).
That makes it an odd observation that avx256 can "fix" the issue.  WRT
the above suggestion the situation needs to be clarified.

> I suppose that the best optimization for the above would be to split the
> loops, but one could probably construct at least an artificial testcase
> where the FMAs would keep enough locality that it is not the case.  The
> mechanism can be easily extended to keep track of not just one chain but
> a few, preferably as a followup, if people think it 

[Committed] S/390: PR83420: Improve hotpatch option parsing.

2017-12-18 Thread Andreas Krebbel
With the attached patch we get rid of the following build failure:

/home/andreas/build/../gcc/gcc/config/s390/s390.c: In function ‘void
s390_option_override()’:
/home/andreas/build/../gcc/gcc/config/s390/s390.c:15361:16: error: ‘char*
strncpy(char*, const char*, size_t)’ specified bound 256 equals destination
size [-Werror=stringop-truncation]
strncpy (s, opt->arg, 256);
^~~~~~~~~~~~~~~~~~~~~~~~~~

gcc/ChangeLog:

2017-12-18  Andreas Krebbel  

PR target/83420
* config/s390/s390.c (s390_option_override): Avoid strncpy.
---
 gcc/config/s390/s390.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 0b04ac0..6f2a189 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -15357,16 +15357,11 @@ s390_option_override (void)
{
  int val1;
  int val2;
- char s[256];
- char *t;
+ char *s = strtok (ASTRDUP (opt->arg), ",");
+ char *t = strtok (NULL, "\0");
 
- strncpy (s, opt->arg, 256);
- s[255] = 0;
- t = strchr (s, ',');
  if (t != NULL)
{
- *t = 0;
- t++;
  val1 = integral_argument (s);
  val2 = integral_argument (t);
}
-- 
2.9.1



Re: [PATCH PR81228][AARCH64] Fix ICE by adding LTGT in vec_cmp

2017-12-18 Thread Sudakshina Das

On 14/12/17 10:38, Sudakshina Das wrote:

Hi

On 13/12/17 16:56, James Greenhalgh wrote:

On Wed, Dec 13, 2017 at 04:45:33PM +, Sudi Das wrote:

On 13/12/17 16:42, Sudakshina Das wrote:

Hi

This patch is a follow up to the existing discussions on
https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01904.html
Bin had earlier submitted a patch to fix the ICE that occurs because of
the missing LTGT in aarch64-simd.md.
That discussion opened up a new bug report PR81647 for an inconsistent
behavior.

As discussed earlier on the gcc-patches discussion and on the bug
report, PR81647 was occurring because of how UNEQ was handled in
aarch64-simd.md rather than LTGT. Since __builtin_islessgreater is
guaranteed to not give an FP exception but LTGT might,
__builtin_islessgreater gets converted to ~UNEQ very early on in
fold_builtin_unordered_cmp. Thus I will post a separate patch for
correcting how UNEQ and other unordered comparisons are handled in
aarch64-simd.md.
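
The quiet-vs-signaling distinction driving this is visible from standard C: `islessgreater` (the standard macro behind __builtin_islessgreater, hence ~UNEQ) must not raise FE_INVALID on a NaN operand, whereas a raw `<`/`>` comparison (LTGT) may.  A small illustrative check:

```c
#include <math.h>

/* islessgreater(x, y) is true iff x < y || x > y, evaluated quietly:
   on a NaN operand it returns 0 instead of raising FE_INVALID.  */
static int quiet_ltgt(double x, double y)
{
  return islessgreater(x, y) != 0;
}
```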

This patch is only adding the missing LTGT to plug the ICE.

Testing done: Checked for regressions on bootstrapped
aarch64-none-linux-gnu and added a new compile time test case that
gives out LTGT to make sure it doesn't ICE.

Is this ok for trunk?


OK.

Thanks,
James



Thanks for the review.
Committed as r255625.
I think this needs a back-port as well to gcc-7-branch. Is that ok?

Sudi


Backport Ping!

Sudi


Re: [PATCH] -fdump-tree, -save-temps=obj & subdirs

2017-12-18 Thread Jeff Law
On 12/07/2017 05:10 AM, Nathan Sidwell wrote:
> There's an unfortunate interaction between -save-temps=obj and
> -fdump-tree- when subdirectories are in play.
> 
> Consider:
>   g++ -fdump-tree-all -save-temps=obj -c -o tgt/x.o sub/x.cc
> we end up with a bunch of errors of the form:
>  sub/x.cc:1:0: error: could not open dump file
> ‘tgt/tgt/x.046t.profile_estimate’: No such file or directory
> 
> you'll see it's added the 'tgt' sub directory twice.  The reason is that
> cc1plus is invoked as:
> 
>  /usr/libexec/gcc/x86_64-redhat-linux/7/cc1plus -E -quiet -v
> -D_GNU_SOURCE sub/x.cc -mtune=generic -march=x86-64 -ansi
> -Woverloaded-virtual -Wall -Wpointer-arith -Wwrite-strings
> -felide-constructors -fdump-tree-all -fpch-preprocess -o tgt/x.ii
> 
> to generate the preprocessed output, and then as:
> 
>  /usr/libexec/gcc/x86_64-redhat-linux/7/cc1plus -fpreprocessed tgt/x.ii
> -quiet -dumpbase tgt/x -mtune=generic -march=x86-64 -auxbase-strip
> tgt/x.o -Woverloaded-virtual -Wall -Wpointer-arith -Wwrite-strings -ansi
> -version -felide-constructors -fdump-tree-all -o tgt/x.s
> 
> for the compilation itself.  That has '-dumpbase tgt/x' and
> '-auxbase-strip tgt/x.o' passed in.
> 
> The options processing checks if dump-base is absolute, and if not
> prefixes dump_dir_name or aux_base_name's directory components.
> 
> It seems to me that the absolute path check is incomplete.  We should
> check if dump-base has /any/ directory components.  If it does, we
> shouldn't touch it.  That's what this patch does:
> 
> 1) remove the absolute dir check.  That's subsumed into ...
> 2) look for any DIR_SEPARATOR in the dump-base.  If so, no prefixing
> 3 & 4) existing prefix code
> 5) always set dump_base_name_prefixed, so we don't repeat #2 on a
> subsequent invocation.
> 
> With this patch we now get a successful compilation:
> 
> nathans@lyta:7>egcs/trunk/obj/x86_64/gcc/xg++ -B
> egcs/trunk/obj/x86_64/gcc/ -fdump-tree-all -save-temps=obj -c -o tgt/x.o
> sub/x.cc
> nathans@lyta:8>ls tgt
> total 100
> 4 x.003t.original     4 x.010t.eh           4 x.020t.ssa
> 4 x.049t.release_ssa  4 x.220t.switchlower  4 x.o
> 4 x.004t.gimple       4 x.011t.cfg          4 x.027t.fixup_cfg3
> 4 x.050t.local-fnsummary2   4 x.226t.resx   4 x.s
> 4 x.006t.omplower     4 x.012t.ompexp       4 x.028t.local-fnsummary1
> 4 x.087t.fixup_cfg4   4 x.228t.optimized    4 x.007t.lower
> 4 x.013t.printf-return-value1  4 x.029t.einline  4 x.218t.veclower
> 0 x.313t.statistics   4 x.009t.ehopt        4 x.019t.fixup_cfg1
> 0 x.046t.profile_estimate   4 x.219t.cplxlower0  4 x.ii
> 
> ok?
> 
> nathan
> -- 
> Nathan Sidwell
> 
> pfx.diff
> 
> 
> 2017-12-06  Nathan Sidwell  
> 
>   * opts.c (finish_options): Don't prefix dump_base_name if it
>   already contains directories.
OK.
jeff
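
The rule the patch implements — prefix dump_dir only when the dump base carries no directory component of its own — can be sketched as follows.  `make_dump_name` is a hypothetical helper (and checks '/' only, for brevity, rather than every DIR_SEPARATOR):

```c
#include <stdio.h>
#include <string.h>

/* Prefix dir onto base only if base has no directory component of its
   own; otherwise -save-temps=obj style prefixing would double it,
   producing paths like "tgt/tgt/x".  */
static void make_dump_name(char *out, size_t outsz,
                           const char *dir, const char *base)
{
  if (strchr(base, '/') != NULL)
    snprintf(out, outsz, "%s", base);       /* already has a directory */
  else
    snprintf(out, outsz, "%s%s", dir, base);
}
```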


Re: PATCH to fix -Wrestrict ICE (PR middle-end/83463)

2017-12-18 Thread Martin Sebor

On 12/18/2017 05:27 PM, Jakub Jelinek wrote:

On Mon, Dec 18, 2017 at 05:03:22PM -0700, Martin Sebor wrote:

Your warning is about restrict and argument overlap, what does it have to do
with unprototyped calls?  Nothing.  There is no restrict in that case, and
it isn't handled as builtin if it doesn't match the builtin's prototype.


$ cat a.c && gcc -O2 -S -Wrestrict a.c
void* memcpy ();

char a[5];

void f (void)
{
  memcpy (a, a + 1, 3);
}
a.c: In function ‘f’:
a.c:7:3: warning: ‘memcpy’ accessing 3 bytes at offsets 0 and 1 overlaps 2
bytes at offset 1 [-Wrestrict]
   memcpy (a, a + 1, 3);
   ^~~~

By insisting on using gimple_call_builtin_p() you're effectively
arguing to disable the warning above, for no good reason that I
can see.


Because it is wrong.


I disagree.  Memcpy is a reserved external identifier and its
semantics are specified by the standard.  A program that defines
its own memcpy is undefined,  Even with -ffreestanding, GCC makes
assumptions about environments providing conforming memcpy and
memset.  Most of the other built-ins handled by the pass are also
reserved by C, same as memcpy.  But you know this at least as well
as I do.


Consider e.g.
void *mempcpy ();

void
foo (int *a)
{
  mempcpy (a, (float *) a + 1, ' ');
}

mempcpy isn't defined in ISO C99, so it is fine if you define it yourself
with void *mempcpy (int *, float *, char); prototype and do whatever you
want in there.  Warning about restrict doesn't make sense in that case.


Programs that define their own versions of GCC built-ins, whether
standard C or specified elsewhere, should disable the built-ins
from being recognized.  That's well understood, especially thanks
to -Wbuiltin-declaration-mismatch.  Otherwise, as the manual says,
they "may be handled as built-in functions."

As I already explained, the purpose of this pass is to help detect
bugs due to "honest" mistakes.  I see no value in trying to disable
such detection on the basis hypothetical use cases.  Doing so would
defeat the purpose of the pass.  I'm not at all concerned about
programs that don't disable built-ins but define functions with
the same names yet conflicting semantics, and on top of it declare
them without a prototype.  If those get diagnosed, so much
the better.  It will be a reminder to their authors to either
use -fno-builtin-xxx to disable the conflicting built-ins, at
least until -Wbuiltin-declaration-mismatch is enhanced to diagnose
such declarations.

With that said, while I value your input, I do not see the point
in continuing to debate whether the pass should diagnose these
cases.  In my mind the benefits are obvious and the downsides
are none.

Martin


Re: [committed][PATCH] Fix bogus propagation in DOM

2017-12-18 Thread Jeff Law
On 11/21/2017 07:56 AM, Richard Biener wrote:
>>
>>
>> I considered trying to key behavior on EDGE_DFS_BACK (6->8).  It'd be
>> something like don't record equivalences from a degenerate PHI where the
>> remaining edge(s) are EDGE_DFS_BACK.  But I just couldn't convince
>> myself that was actually reasonable or sufficient in general.
> 
> The key point is that you can never record "symbolic" equivalences on
> backedges.  Because in SSA you rely on PHIs to fence SSA names from
> leaking cross iterations.  That is, _1 in iteration 1 is obviously not equal
> to _1 in iteration 2.
> 
> Other passes like VRP and value-numbering need to take care of this
> as well.
> 
> You can of course record that _1 is 2 (constant) even across a backedge.
> 
> So I think your consideration was exactly correct.
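
The cross-iteration point is easy to see in source form: the "same" variable names a different value in each iteration, so an equivalence recorded through a backedge is not valid.  An illustrative function:

```c
/* last captures x from the previous iteration; if x in iteration k were
   assumed equal to x in iteration k+1 -- the bogus backedge equivalence
   -- last would fold to the final x, which is wrong.  */
static int last_before_final(int iters)
{
  int x = 0, last = -1;
  for (int i = 0; i < iters; i++) {
    last = x;   /* previous iteration's x */
    x = x + 1;  /* this iteration's x */
  }
  return last;
}
```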
So finally getting back to this.  Thankfully I was able to find an old
branch, revert the tree-ssa-dom.c bits and via some on-the-fly hackery
trigger the issue again.

This also enabled me to verify that triggering on EDGE_DFS_BACK would in
fact resolve this issue.

So attached are two patches.  The first is the reversion.  The second is
the change to avoid recording the equivalence for a backedge in the CFG.

Bootstrapped and regression tested on x86_64.  Also verified by hand
(using that old branch) that we wouldn't get into the infinite loop if
we reverted and used EDGE_DFS_BACK to trigger ignoring the equivalence.

Installing on the trunk.

Jeff
commit 46a2536a74d7f6981a02da05041885d097d9e23c
Author: Jeff Law 
Date:   Mon Dec 18 19:18:47 2017 -0500

Revert
2017-11-19  Jeff Law  

* tree-ssa-dom.c (record_equivalences_from_phis): Fix handling
of degenerates resulting from ignoring an edge.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index d165b5ccf41..b5854f7a67c 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,11 @@
+2017-12-18  Jeff Law  
+
+   Revert
+   2017-11-19  Jeff Law  
+
+   * tree-ssa-dom.c (record_equivalences_from_phis): Fix handling
+   of degenerates resulting from ignoring an edge.
+
 2017-12-18  Martin Sebor  
 
PR middle-end/83373
diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index 93c992a9215..05bc807071f 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -1109,7 +1109,6 @@ record_equivalences_from_phis (basic_block bb)
   tree rhs = NULL;
   size_t i;
 
-  bool ignored_phi_arg = false;
   for (i = 0; i < gimple_phi_num_args (phi); i++)
{
  tree t = gimple_phi_arg_def (phi, i);
@@ -1120,14 +1119,10 @@ record_equivalences_from_phis (basic_block bb)
  if (lhs == t)
continue;
 
- /* We want to track if we ignored any PHI arguments because
-their associated edges were not executable.  This impacts
-whether or not we can use any equivalence we might discover.  */
+ /* If the associated edge is not marked as executable, then it
+can be ignored.  */
  if ((gimple_phi_arg_edge (phi, i)->flags & EDGE_EXECUTABLE) == 0)
-   {
- ignored_phi_arg = true;
- continue;
-   }
+   continue;
 
  t = dom_valueize (t);
 
@@ -1152,15 +1147,9 @@ record_equivalences_from_phis (basic_block bb)
 a useful equivalence.  We do not need to record unwind data for
 this, since this is a true assignment and not an equivalence
 inferred from a comparison.  All uses of this ssa name are dominated
-by this assignment, so unwinding just costs time and space.
-
-Note that if we ignored a PHI argument and the resulting equivalence
-is SSA_NAME = SSA_NAME.  Then we can not use the equivalence as the
-uses of the LHS SSA_NAME are not necessarily dominated by the
-assignment of the RHS SSA_NAME.  */
+by this assignment, so unwinding just costs time and space.  */
   if (i == gimple_phi_num_args (phi)
- && may_propagate_copy (lhs, rhs)
- && (!ignored_phi_arg || TREE_CODE (rhs) != SSA_NAME))
+ && may_propagate_copy (lhs, rhs))
set_ssa_name_value (lhs, rhs);
 }
 }
commit 61046656ff4e832bd1019e90c251b1c4c2c43883
Author: Jeff Law 
Date:   Mon Dec 18 23:28:37 2017 -0500

* tree-ssa-dom.c (record_equivalences_from_phis): Do not
record symbolic equivalences from backedges in the CFG.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index b5854f7a67c..96146e8934f 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,8 @@
 2017-12-18  Jeff Law  
 
+   * tree-ssa-dom.c (record_equivalences_from_phis): Do not
+   record symbolic equivalences from backedges in the CFG.
+
Revert
2017-11-19  Jeff Law  
 
diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index 

Re: [PATCH] Fix IPA flattening ICE (PR ipa/82801)

2017-12-18 Thread Jeff Law
On 12/12/2017 11:57 AM, Jakub Jelinek wrote:
> Hi!
> 
> We ICE on the following testcase, because we first gather all cgraph nodes
> in an array and then we walk it backwards and flatten_function anything
> that has "flatten" attribute.  But flatten_function can result in cgraph
> node removal and so we then try to dereference removed cgraph nodes.
> 
> From the array we are only interested in nodes with flatten attribute, so
> the patch ignores all other nodes (thus the common case of no flatten
> attributes in the TU is fast) by keeping them at the end of the order array
> only, and if we have at least 2 nodes to flatten, we additionally register
> a removal hook just in case a flatten node would be removed (the testcase
> has a non-flatten node removed only).  
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2017-12-12  Jakub Jelinek  
> 
>   PR ipa/82801
>   * ipa-inline.c (flatten_remove_node_hook): New function.
>   (ipa_inline): Keep only nodes with flatten attribute at the end of
>   the array in the order from ipa_reverse_postorder, only walk that
>   portion of array for flattening, if there is more than one such
>   node, temporarily register a removal hook and ignore removed nodes.
> 
>   * g++.dg/ipa/pr82801.C: New test.
OK.  Make sure to reference 83346 in the ChangeLog entry.

jeff
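
The shape of the fix — walk only the interesting tail of the order array while a removal hook invalidates entries that die mid-walk — can be sketched generically.  The types and the NULL-out scheme below are hypothetical; GCC registers a cgraph node-removal hook instead:

```c
#include <stddef.h>

#define MAX_NODES 8
static void *order[MAX_NODES];
static int norder;

/* Removal hook: NULL out the dying node so the walk skips it instead of
   dereferencing freed memory.  */
static void on_node_removed(void *node)
{
  for (int i = 0; i < norder; i++)
    if (order[i] == node)
      order[i] = NULL;
}

/* Walk surviving entries backwards, as ipa_inline does.  */
static int count_live(void)
{
  int live = 0;
  for (int i = norder - 1; i >= 0; i--)
    if (order[i] != NULL)
      live++;
  return live;
}
```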


Re: [005/nnn] poly_int: rtx constants

2017-12-18 Thread Jeff Law
On 12/14/2017 06:25 PM, Richard Sandiford wrote:
> Jeff Law  writes:
>> On 10/23/2017 11:00 AM, Richard Sandiford wrote:
>>> This patch adds an rtl representation of poly_int values.
>>> There were three possible ways of doing this:
>>>
>>> (1) Add a new rtl code for the poly_ints themselves and store the
>>> coefficients as trailing wide_ints.  This would give constants like:
>>>
>>>   (const_poly_int [c0 c1 ... cn])
>>>
>>> The runtime value would be:
>>>
>>>   c0 + c1 * x1 + ... + cn * xn
>>>
>>> (2) Like (1), but use rtxes for the coefficients.  This would give
>>> constants like:
>>>
>>>   (const_poly_int [(const_int c0)
>>>(const_int c1)
>>>...
>>>(const_int cn)])
>>>
>>> although the coefficients could be const_wide_ints instead
>>> of const_ints where appropriate.
>>>
>>> (3) Add a new rtl code for the polynomial indeterminates,
>>> then use them in const wrappers.  A constant like c0 + c1 * x1
>>> would then look like:
>>>
>>>   (const:M (plus:M (mult:M (const_param:M x1)
>>>(const_int c1))
>>>(const_int c0)))
>>>
>>> There didn't seem to be that much to choose between them.  The main
>>> advantage of (1) is that it's a more efficient representation and
>>> that we can refer to the cofficients directly as wide_int_storage.
>> Well, and #1 feels more like how we handle CONST_INT :-)
>>>
>>>
>>> 2017-10-23  Richard Sandiford  
>>> Alan Hayward  
>>> David Sherwood  
>>>
>>> gcc/
>>> * doc/rtl.texi (const_poly_int): Document.
>>> * gengenrtl.c (excluded_rtx): Return true for CONST_POLY_INT.
>>> * rtl.h (const_poly_int_def): New struct.
>>> (rtx_def::u): Add a cpi field.
>>> (CASE_CONST_UNIQUE, CASE_CONST_ANY): Add CONST_POLY_INT.
>>> (CONST_POLY_INT_P, CONST_POLY_INT_COEFFS): New macros.
>>> (wi::rtx_to_poly_wide_ref): New typedef.
>>> (const_poly_int_value, wi::to_poly_wide, rtx_to_poly_int64)
>>> (poly_int_rtx_p): New functions.
>>> (trunc_int_for_mode): Declare a poly_int64 version.
>>> (plus_constant): Take a poly_int64 instead of a HOST_WIDE_INT.
>>> (immed_wide_int_const): Take a poly_wide_int_ref rather than
>>> a wide_int_ref.
>>> (strip_offset): Declare.
>>> (strip_offset_and_add): New function.
>>> * rtl.def (CONST_POLY_INT): New rtx code.
>>> * rtl.c (rtx_size): Handle CONST_POLY_INT.
>>> (shared_const_p): Use poly_int_rtx_p.
>>> * emit-rtl.h (gen_int_mode): Take a poly_int64 instead of a
>>> HOST_WIDE_INT.
>>> (gen_int_shift_amount): Likewise.
>>> * emit-rtl.c (const_poly_int_hasher): New class.
>>> (const_poly_int_htab): New variable.
>>> (init_emit_once): Initialize it when NUM_POLY_INT_COEFFS > 1.
>>> (const_poly_int_hasher::hash): New function.
>>> (const_poly_int_hasher::equal): Likewise.
>>> (gen_int_mode): Take a poly_int64 instead of a HOST_WIDE_INT.
>>> (immed_wide_int_const): Rename to...
>>> (immed_wide_int_const_1): ...this and make static.
>>> (immed_wide_int_const): New function, taking a poly_wide_int_ref
>>> instead of a wide_int_ref.
>>> (gen_int_shift_amount): Take a poly_int64 instead of a HOST_WIDE_INT.
>>> (gen_lowpart_common): Handle CONST_POLY_INT.
>>> * cse.c (hash_rtx_cb, equiv_constant): Likewise.
>>> * cselib.c (cselib_hash_rtx): Likewise.
>>> * dwarf2out.c (const_ok_for_output_1): Likewise.
>>> * expr.c (convert_modes): Likewise.
>>> * print-rtl.c (rtx_writer::print_rtx, print_value): Likewise.
>>> * rtlhash.c (add_rtx): Likewise.
>>> * explow.c (trunc_int_for_mode): Add a poly_int64 version.
>>> (plus_constant): Take a poly_int64 instead of a HOST_WIDE_INT.
>>> Handle existing CONST_POLY_INT rtxes.
>>> * expmed.h (expand_shift): Take a poly_int64 instead of a
>>> HOST_WIDE_INT.
>>> * expmed.c (expand_shift): Likewise.
>>> * rtlanal.c (strip_offset): New function.
>>> (commutative_operand_precedence): Give CONST_POLY_INT the same
>>> precedence as CONST_DOUBLE and put CONST_WIDE_INT between that
>>> and CONST_INT.
>>> * rtl-tests.c (const_poly_int_tests): New struct.
>>> (rtl_tests_c_tests): Use it.
>>> * simplify-rtx.c (simplify_const_unary_operation): Handle
>>> CONST_POLY_INT.
>>> (simplify_const_binary_operation): Likewise.
>>> (simplify_binary_operation_1): Fold additions of symbolic constants
>>> and CONST_POLY_INTs.
>>> (simplify_subreg): Handle extensions and truncations of
>>> CONST_POLY_INTs.
>>> (simplify_const_poly_int_tests): New struct.
>>> (simplify_rtx_c_tests): Use it.
>>> * wide-int.h (storage_ref): Add default constructor.
>>> (wide_int_ref_storage): Likewise.
>>> (trailing_wide_ints): Use GTY((user)).
>>> 

Re: [PATCH PR81740]Enforce dependence check for outer loop vectorization

2017-12-18 Thread Richard Biener
On Fri, Dec 15, 2017 at 7:39 PM, Bin.Cheng  wrote:
> On Fri, Dec 15, 2017 at 1:19 PM, Richard Biener
>  wrote:
>> On Fri, Dec 15, 2017 at 1:35 PM, Bin.Cheng  wrote:
>>> On Fri, Dec 15, 2017 at 12:09 PM, Bin.Cheng  wrote:
 On Fri, Dec 15, 2017 at 11:55 AM, Richard Biener
  wrote:
> On Fri, Dec 15, 2017 at 12:30 PM, Bin Cheng  wrote:
>> Hi,
>> As explained in the PR, given below test case:
>> int a[8][10] = { [2][5] = 4 }, c;
>>
>> int
>> main ()
>> {
>>   short b;
>>   int i, d;
>>   for (b = 4; b >= 0; b--)
>> for (c = 0; c <= 6; c++)
>>   a[c + 1][b + 2] = a[c][b + 1];
>>   for (i = 0; i < 8; i++)
>> for (d = 0; d < 10; d++)
>>   if (a[i][d] != (i == 3 && d == 6) * 4)
>> __builtin_abort ();
>>   return 0;
>>
>> the loop nest is illegal for vectorization without reversing inner loop. 
>>  The issue
>> is in data dependence checking of vectorizer, I believe the mentioned 
>> revision just
>> exposed this.  Previously the vectorization is skipped because of 
>> unsupported memory
>> operation.  The outer loop vectorization unrolls the outer loop into:
>>
>>   for (b = 4; b > 0; b -= 4)
>>   {
>> for (c = 0; c <= 6; c++)
>>   a[c + 1][6] = a[c][5];
>> for (c = 0; c <= 6; c++)
>>   a[c + 1][5] = a[c][4];
>> for (c = 0; c <= 6; c++)
>>   a[c + 1][4] = a[c][3];
>> for (c = 0; c <= 6; c++)
>>   a[c + 1][3] = a[c][2];
>>   }
>> Then four inner loops are fused into:
>>   for (b = 4; b > 0; b -= 4)
>>   {
>> for (c = 0; c <= 6; c++)
>> {
>>   a[c + 1][6] = a[c][5];  // S1
>>   a[c + 1][5] = a[c][4];  // S2
>>   a[c + 1][4] = a[c][3];
>>   a[c + 1][3] = a[c][2];
>> }
>>   }
>
> Note that they are not really "fused" but they are interleaved.  With
> GIMPLE in mind
> that makes a difference, you should get the equivalent of
>
>for (c = 0; c <= 6; c++)
>  {
>tem1 = a[c][5];
>tem2 = a[c][4];
>tem3 = a[c][3];
>tem4 = a[c][2];
>a[c+1][6] = tem1;
>a[c +1][5] = tem2;
> a[c+1][4] = tem3;
> a[c+1][3] = tem4;
>  }
 Yeah, I will double check if this abstraction breaks the patch and how.
>>> Hmm, I think this doesn't break it, well at least for part of the
>>> analysis, because it is loop carried (backward) dependence goes wrong,
>>> interleaving or not with the same iteration doesn't matter here.
>>
>> I think the idea is that forward dependences are always fine (negative 
>> distance)
>> to vectorize.  But with backward dependences we have to adhere to max_vf.
>>
>> It looks like for outer loop vectorization we only look at the distances in 
>> the
>> outer loop but never at inner ones.  But here the same applies but isn't that
>> independend on the distances with respect to the outer loop?
>>
>> But maybe I'm misunderstanding how "distances" work here.
> Hmm, I am not sure I understand "distance" correctly.  With
> description as in book like "Optimizing compilers for Modern
> Architectures", distance is "# of iteration of sink ref - # of
> iteration of source ref".  Given below example:
>   for (i = 0; i < N; ++i)
> {
>   x = arr[idx_1];  // S1
>   arr[idx_2] = x;  // S2
> }
> if S1 is source ref, distance = idx_2 - idx_1, and distance > 0.  Also
> this is forward dependence.  For example, idx_1 is i + 1 and idx_2 is
> i;
> If S2 is source ref, distance = idx_1 - idx_2, and distance < 0.  Also
> this is backward dependence.  For example idx_1 is i and idx_2 is i +
> 1;
>
> In GCC, we always try to subtract idx_2 from idx_1 first in computing
> classic distance, we could result in negative distance in case of
> backward dependence.  When this happens at dependence carried level,
> we manually reverse it.  When this happens at inner level loop, we
> simply keep the negative distance.  And it's meaningless to talk about
> forward/backward given dependence is carried at outer level loop.
>
> Outer loop vectorization is interesting.  The problematic case has
> backward dependence carried by outer loop.  Because we don't check
> dist vs. max_vf for such dep, the unrolled references could have outer
> loop index equal to each other, as in a[c][5] and a[c+1][5].  So it's
> like we have outer loop index resolved as equal.  Now it has
> dependence only if c == c' + 1.  I think previous comment on fusion
> still hold for this and we now need to check backward dependence
> between the two reference at inner level loop.  I guess it's a bit
> trick to model dependence between a[c][5] and a[c+1][5] using the
> original references and dependence.  I think it's 
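
The two dependence directions discussed above can be checked concretely: with `a[i] = a[i+1]` each location is read before it is ever overwritten (forward dependence, safe at any vectorization factor), while `a[i+1] = a[i]` carries the written value into the next iteration (backward dependence, distance 1), which is why the two loops compute different results.  Illustrative functions:

```c
/* Forward dependence: each element is read before it is written, so no
   value crosses iterations -- any vectorization factor is safe.  */
static void shift_down(int *a, int n)
{
  for (int i = 0; i < n - 1; i++)
    a[i] = a[i + 1];
}

/* Backward dependence with distance 1: the value written in iteration i
   is read in iteration i+1, so a[0] propagates through the whole array;
   the usable vectorization factor is constrained by this distance.  */
static void propagate_up(int *a, int n)
{
  for (int i = 0; i < n - 1; i++)
    a[i + 1] = a[i];
}
```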

Re: [PATCH] Fix -fcompare-debug due to DEBUG_BEGIN_STMTs (PR debug/83419)

2017-12-18 Thread Jakub Jelinek
On Fri, Dec 15, 2017 at 09:34:44AM +0100, Richard Biener wrote:
> I don't like this too much.  Iff then we should do "real" lazy
> computation, like adding a TREE_SIDE_EFFECTS_VALID flag on STATEMENT_LIST,
> keeping TREE_SIDE_EFFECTS up-to-date when easily possible and when doing
> the expensive thing cache the result.  That said, I'm not convinced
> this will fix -fcompare-debug issues for good.  Is it really necessary
> to introduce this IL difference so early and in such an intrusive way?
> 
> Can't we avoid adding # DEBUG BEGIN_STMT when there's not already
> a STATEMENT_LIST around for example?

The STATEMENT_LIST handling is a complete mess.

E.g.
tree
alloc_stmt_list (void)
{
  tree list;
  if (!vec_safe_is_empty (stmt_list_cache))
{
  list = stmt_list_cache->pop ();
  memset (list, 0, sizeof (struct tree_base));
  TREE_SET_CODE (list, STATEMENT_LIST);
}
  else
list = make_node (STATEMENT_LIST);

Because make_node (STATEMENT_LIST) sets TREE_SIDE_EFFECTS, but the above
memset clears it, we start with inconsistent TREE_SIDE_EFFECTS from the
beginning.
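
The inconsistency has the classic object-pool shape: the fresh-allocation path establishes a default flag, while the recycling path memsets the header and silently drops it.  A minimal sketch with hypothetical types (not GCC's):

```c
#include <string.h>

struct node { unsigned side_effects : 1; int code; };

#define CACHE_MAX 8
static struct node pool[CACHE_MAX];
static struct node cache_slots[CACHE_MAX];
static int npool, ncached;

static struct node *node_fresh(void)
{
  struct node *n = &pool[npool++];
  n->side_effects = 1;          /* default set on the fresh path */
  return n;
}

static struct node *node_alloc(int reuse)
{
  if (reuse && ncached > 0) {
    struct node *n = &cache_slots[--ncached];
    memset(n, 0, sizeof *n);    /* bug: wipes the default flag too */
    return n;
  }
  return node_fresh();
}
```

Two nodes of the same kind thus disagree on the flag purely depending on which allocation path produced them.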

I've tried to track TREE_SIDE_EFFECTS precisely, as below, but that seems to
break completely the statement expression handling, in particular
make check-gcc check-g++ RUNTESTFLAGS="compile.exp=20030530-3.c 
debug.exp='20020224-1.c 20030605-1.c' dg.exp='Wduplicated-branches-1.c 
compare2.c gnu89-const-expr-1.c pr64637.c pr68412-2.c stmt-expr-*.c pr56419.C' 
lto.exp=pr83388* noncompile.exp=971104-1.c dg-torture.exp=pr51071.c 
tree-ssa.exp='20030711-2.c 20030714-1.c 20030731-1.c' vect.exp=pr33833.c 
tm.exp=pr56419.C"
all FAILs because of that, e.g.
gcc/testsuite/gcc.c-torture/compile/20030530-3.c:14:8: error: void value not 
ignored as it ought to be
I've also tried to track instead of tracking TREE_SIDE_EFFECTS precisely
track just ignore DEBUG_BEGIN_STMTs in the processing, but that
unfortunately doesn't help for the testcase included in the patch (attached
patch).

2017-12-18  Jakub Jelinek  

PR debug/83419
* tree-iterator.c (alloc_stmt_list): Clear TREE_SIDE_EFFECTS for a new
node.
(tsi_link_after): Set TREE_SIDE_EFFECTS when adding t with
TREE_SIDE_EFFECTS.
(tsi_link_before): Likewise.  Formatting fix.
(tsi_delink): Recompute TREE_SIDE_EFFECTS on removal.
* gimplify.c (gimplify_statement_list): Clear TREE_SIDE_EFFECTS.

* gcc.dg/pr83419.c: New test.

--- gcc/tree-iterator.c.jj  2017-12-12 09:48:26.0 +0100
+++ gcc/tree-iterator.c 2017-12-18 10:13:58.776212769 +0100
@@ -41,7 +41,10 @@ alloc_stmt_list (void)
   TREE_SET_CODE (list, STATEMENT_LIST);
 }
   else
-list = make_node (STATEMENT_LIST);
+{
+  list = make_node (STATEMENT_LIST);
+  TREE_SIDE_EFFECTS (list) = 0;
+}
   TREE_TYPE (list) = void_type_node;
   return list;
 }
@@ -137,7 +140,7 @@ tsi_link_before (tree_stmt_iterator *i,
   tail = head;
 }
 
-  if (TREE_CODE (t) != DEBUG_BEGIN_STMT)
+  if (TREE_SIDE_EFFECTS (t))
 TREE_SIDE_EFFECTS (i->container) = 1;
 
   cur = i->ptr;
@@ -157,9 +160,9 @@ tsi_link_before (tree_stmt_iterator *i,
 {
   head->prev = STATEMENT_LIST_TAIL (i->container);
   if (head->prev)
-   head->prev->next = head;
+   head->prev->next = head;
   else
-   STATEMENT_LIST_HEAD (i->container) = head;
+   STATEMENT_LIST_HEAD (i->container) = head;
   STATEMENT_LIST_TAIL (i->container) = tail;
 }
 
@@ -214,7 +217,7 @@ tsi_link_after (tree_stmt_iterator *i, t
   tail = head;
 }
 
-  if (TREE_CODE (t) != DEBUG_BEGIN_STMT)
+  if (TREE_SIDE_EFFECTS (t))
 TREE_SIDE_EFFECTS (i->container) = 1;
 
   cur = i->ptr;
@@ -275,10 +278,28 @@ tsi_delink (tree_stmt_iterator *i)
   else
 STATEMENT_LIST_TAIL (i->container) = prev;
 
-  if (!next && !prev)
-TREE_SIDE_EFFECTS (i->container) = 0;
-
   i->ptr = next;
+
+  if (TREE_SIDE_EFFECTS (i->container))
+{
+  while (next || prev)
+   {
+ if (next)
+   {
+ if (TREE_SIDE_EFFECTS (next->stmt))
+   break;
+ next = next->next;
+   }
+ if (prev)
+   {
+ if (TREE_SIDE_EFFECTS (prev->stmt))
+   break;
+ prev = prev->prev;
+   }
+   }
+  if (next == NULL && prev == NULL)
+   TREE_SIDE_EFFECTS (i->container) = 0;
+}
 }
 
 /* Return the first expression in a sequence of COMPOUND_EXPRs, or in
--- gcc/gimplify.c.jj   2017-12-14 21:11:40.0 +0100
+++ gcc/gimplify.c  2017-12-18 10:17:07.922556324 +0100
@@ -1763,6 +1763,9 @@ gimplify_statement_list (tree *expr_p, g
 
   tree_stmt_iterator i = tsi_start (*expr_p);
 
+  /* No need to recompute TREE_SIDE_EFFECTS on removal, we are going to
+ delink everything.  */
+  TREE_SIDE_EFFECTS (*expr_p) = 0;
   while (!tsi_end_p (i))
 {
   gimplify_stmt (tsi_stmt_ptr (i), pre_p);
--- gcc/testsuite/gcc.dg/pr83419.c.jj   2017-12-18 

Re: [PATCH] Fix -fcompare-debug due to DEBUG_BEGIN_STMTs (PR debug/83419)

2017-12-18 Thread Jakub Jelinek
On Mon, Dec 18, 2017 at 02:02:52PM +0100, Jakub Jelinek wrote:
> Because make_node (STATEMENT_LIST) sets TREE_SIDE_EFFECTS, but the above
> memset clears it, we start with inconsistent TREE_SIDE_EFFECTS from the
> beginning.
> 
> I've tried to track TREE_SIDE_EFFECTS precisely, as below, but that seems to
> break completely the statement expression handling, in particular
> make check-gcc check-g++ RUNTESTFLAGS="compile.exp=20030530-3.c 
> debug.exp='20020224-1.c 20030605-1.c' dg.exp='Wduplicated-branches-1.c 
> compare2.c gnu89-const-expr-1.c pr64637.c pr68412-2.c stmt-expr-*.c 
> pr56419.C' lto.exp=pr83388* noncompile.exp=971104-1.c 
> dg-torture.exp=pr51071.c tree-ssa.exp='20030711-2.c 20030714-1.c 
> 20030731-1.c' vect.exp=pr33833.c tm.exp=pr56419.C"
> all FAILs because of that, e.g.
> gcc/testsuite/gcc.c-torture/compile/20030530-3.c:14:8: error: void value not 
> ignored as it ought to be
> I've also tried to track instead of tracking TREE_SIDE_EFFECTS precisely
> track just ignore DEBUG_BEGIN_STMTs in the processing, but that
> unfortunately doesn't help for the testcase included in the patch (attached
> patch).

One option would be to deal with that in pop_stmt_list and the C++ caller
thereof, where we right now have:

  /* If the statement list contained exactly one statement, then
 extract it immediately.  */
  if (tsi_one_before_end_p (i))
{
  u = tsi_stmt (i);
  tsi_delink (&i);
  free_stmt_list (t);
  t = u;
}

This part would need a MAY_HAVE_DEBUG_MARKER_STMTS variant that would
check for the case where there are just DEBUG_BEGIN_STMT markers + one other
stmt (+ maybe some further DEBUG_BEGIN_STMT markers) and nothing else.
If that one other stmt doesn't have TREE_SIDE_EFFECTS, clear
TREE_SIDE_EFFECTS on the whole statement list to get the same
TREE_SIDE_EFFECTS from the returned expression as without -g.

  /* If the statement list contained a debug begin stmt and a
 statement list, move the debug begin stmt into the statement
 list and return it.  */
  else if (!tsi_end_p (i)
   && TREE_CODE (tsi_stmt (i)) == DEBUG_BEGIN_STMT)
{
  u = tsi_stmt (i);
  tsi_next (&i);
  if (tsi_one_before_end_p (i)
  && TREE_CODE (tsi_stmt (i)) == STATEMENT_LIST)
{
  tree l = tsi_stmt (i);
  tsi_prev (&i);
  tsi_delink (&i);
  tsi_delink (&i);
  i = tsi_start (l);
  free_stmt_list (t);
  t = l;
  tsi_link_before (&i, u, TSI_SAME_STMT);
}
}

I don't understand how the above hack can work reliably, first of all,
it handles only a single DEBUG_BEGIN_STMT before, none after, and doesn't
recurse.  Perhaps it wouldn't be needed at all if we do the above, or would
be just an optimization for a common case, not a correctness fix?

Jakub


[ja...@redhat.com: Re: [PATCH] Fix -fcompare-debug due to DEBUG_BEGIN_STMTs (PR debug/83419)]

2017-12-18 Thread Jakub Jelinek

--- Begin Message ---
On Mon, Dec 18, 2017 at 02:35:07PM +0100, Jakub Jelinek wrote:
> On Mon, Dec 18, 2017 at 02:17:18PM +0100, Jakub Jelinek wrote:
> > One option would be to deal with that in pop_stmt_list and the C++ caller
> > thereof, where we right now have:
> > 
> >   /* If the statement list contained exactly one statement, then
> >  extract it immediately.  */
> >   if (tsi_one_before_end_p (i))
> > {
> >   u = tsi_stmt (i);
> >   tsi_delink (&i);
> >   free_stmt_list (t);
> >   t = u;
> > }
> > 
> > This part would need a MAY_HAVE_DEBUG_MARKER_STMTS variant that would
> > check for the case where there are just DEBUG_BEGIN_STMT markers + one other
> > stmt (+ maybe some further DEBUG_BEGIN_STMT markers) and nothing else.
> > If that one other stmt doesn't have TREE_SIDE_EFFECTS, clear
> > TREE_SIDE_EFFECTS on the whole statement list to get the same
> > TREE_SIDE_EFFECTS from the returned expression as without -g.
> 
> Like this (the cp_parser_omp_for_loop_init function would need similar
> changes).  Except while it works for this new testcase and some of the
> testcases I've listed, it still doesn't work for others.
> 
> Alex, I'm giving up here.

Forgot to attach patch for reference:

--- gcc/testsuite/gcc.dg/pr83419.c.jj   2017-12-18 10:08:24.214911480 +0100
+++ gcc/testsuite/gcc.dg/pr83419.c  2017-12-18 10:08:24.214911480 +0100
@@ -0,0 +1,16 @@
+/* PR debug/83419 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fcompare-debug" } */
+
+int a, b;
+void foo (int, ...);
+
+void
+bar (void)
+{
+  if (a || 1 == b)
+    foo (1);
+  else
+    0;
+  foo (1, 0);
+}
--- gcc/c-family/c-semantics.c.jj   2017-12-12 09:48:11.0 +0100
+++ gcc/c-family/c-semantics.c  2017-12-18 14:30:06.389129165 +0100
@@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.
 #include "coretypes.h"
 #include "c-common.h"
 #include "tree-iterator.h"
+#include "options.h"
 
 /* Create an empty statement tree rooted at T.  */
 
@@ -76,25 +77,49 @@ pop_stmt_list (tree t)
  free_stmt_list (t);
  t = u;
}
-  /* If the statement list contained a debug begin stmt and a
-statement list, move the debug begin stmt into the statement
-list and return it.  */
-  else if (!tsi_end_p (i)
-  && TREE_CODE (tsi_stmt (i)) == DEBUG_BEGIN_STMT)
+  else if (MAY_HAVE_DEBUG_MARKER_STMTS)
{
- u = tsi_stmt (i);
- tsi_next (&i);
- if (tsi_one_before_end_p (i)
- && TREE_CODE (tsi_stmt (i)) == STATEMENT_LIST)
+ while (!tsi_end_p (i)
+&& TREE_CODE (tsi_stmt (i)) == DEBUG_BEGIN_STMT)
+   tsi_next (&i);
+ if (!tsi_end_p (i) && !TREE_SIDE_EFFECTS (tsi_stmt (i)))
+   {
+ tsi_next (&i);
+ while (!tsi_end_p (i)
+&& TREE_CODE (tsi_stmt (i)) == DEBUG_BEGIN_STMT)
+   tsi_next (&i);
+ /* With -g0, if the above tsi_one_before_end_p case would
+hit, we'd return the single stmt without TREE_SIDE_EFFECTS
+set.  We need to return a STATEMENT_LIST instead, ensure it
+has a TREE_SIDE_EFFECTS flag clear.  */
+ if (tsi_end_p (i))
+   {
+ TREE_SIDE_EFFECTS (t) = 0;
+ return t;
+   }
+   }
+
+ i = tsi_start (t);
+ /* As an optimization, if the statement list contained a debug begin
+stmt and a statement list, move the debug begin stmt into the
+statement list and return it.  */
+ if (!tsi_end_p (i)
+ && TREE_CODE (tsi_stmt (i)) == DEBUG_BEGIN_STMT)
{
- tree l = tsi_stmt (i);
- tsi_prev (&i);
- tsi_delink (&i);
- tsi_delink (&i);
- i = tsi_start (l);
- free_stmt_list (t);
- t = l;
- tsi_link_before (&i, u, TSI_SAME_STMT);
+ u = tsi_stmt (i);
+ tsi_next (&i);
+ if (tsi_one_before_end_p (i)
+ && TREE_CODE (tsi_stmt (i)) == STATEMENT_LIST)
+   {
+ tree l = tsi_stmt (i);
+ tsi_prev (&i);
+ tsi_delink (&i);
+ tsi_delink (&i);
+ i = tsi_start (l);
+ free_stmt_list (t);
+ t = l;
+ tsi_link_before (&i, u, TSI_SAME_STMT);
+   }
}
}
 }
--- gcc/gimplify.c.jj   2017-12-14 21:11:40.0 +0100
+++ gcc/gimplify.c  2017-12-18 10:17:07.922556324 +0100
@@ -1763,6 +1763,9 @@ gimplify_statement_list (tree *expr_p, gimple_seq *pre_p)
 
   tree_stmt_iterator i = tsi_start (*expr_p);
 
+  /* No need to recompute TREE_SIDE_EFFECTS on removal, we are going to
+ delink everything.  */
+  TREE_SIDE_EFFECTS (*expr_p) = 0;
   while (!tsi_end_p (i))
 {
   gimplify_stmt (tsi_stmt_ptr (i), pre_p);
--- 

Re: [committed][PR tree-optimization/83410] Avoid some jump threads when parallelizing loops

2017-12-18 Thread Richard Biener
On December 18, 2017 5:06:25 PM GMT+01:00, Jeff Law  wrote:
>On 12/18/2017 03:15 AM, Richard Biener wrote:
>> On Fri, Dec 15, 2017 at 6:00 PM, Richard Biener
>>  wrote:
>>> On December 15, 2017 5:19:14 PM GMT+01:00, Jeff Law 
>wrote:
 I hate this patch.

 The fundamental problem we have is that there are times when we
>very
 much want to thread jumps and yet there are other times when we do
>not.

 To date we have been able to largely select between those two by
 looking
 at the shape of the CFG and the jump thread to see how threading a
 particular jump would impact the shape of the CFG (particularly
>loops
 in
 the CFG).

 In this BZ we have a loop like this:

      2
      |
      3<---------+
      |          |
      4<-----+   |
     / \     |   |
    5   6    |   |
     \ /     |   |
      7      |   |
     / \     |   |
    8   11---+   |
   / \           |
  9   10---------+
  |
  E

 We want to thread the path (6, 7) (7, 11).  ie, there's a path
>through
 that inner loop where we don't need to do the loop test.  Here's an
 example (from libgomp testsuite)

 (N is 1500)

  for (j = 0; j < N; j++)
    {
      if (j > 500)
        {
          x[i][j] = i + j + 3;
          y[j] = i*j + 10;
        }
      else
        x[i][j] = x[i][j]*3;
    }



 Threading (in effect) puts the loop test into both arms of the
 conditional, then removes it from the ELSE arm because the
>condition is
 always "keep looping" in that arm.

 This plays poorly with loop parallelization -- I tried to follow
>what's
 going on in there and just simply got lost.  I got as far as seeing
 that
 the parallelization code thought there were loop carried
>dependencies.
 At least that's how it looked to me.  But I don't see any loop
>carried
 dependencies in the code.
>>>
>>> Hmm. I'll double check if I remember on Monday.
>> 
>> So I did and the ultimate reason GRAPHITE FAILs is because for
>> the loop with the threading applied we can no longer compute
>> number of iterations.  That's of course bad for _all_ loop
>optimizers,
>> not just GRAPHITE (we just don't have any other test coverage).
>> The main issue here is that after the threading is applied the
>> block with the loop exit no longer dominates the loop latch.
>> This means the loop should really be split but it also means we
>should
>> definitely avoid performing such threading before loop optimizers
>run,
>> and not just if parallelizing loops.
>> 
>> Can you adjust your patch this way?  The condition you check seems
>> to more or less match the case where we thread through the exit
>> test _into_ the loop?  The bad thing is really if in the resulting
>CFG
>> the exit test isn't dominating the latch (thus we have an exit test
>> that is not always executed):
>The problem is we'll end up regressing other vectorization cases.  I
>looked at a variety of things WRT the shape of the CFG and ultimately
>found that there wasn't anything I could use to distinguish the two
>cases.   I'll look again though.

I think a good sanity check would be to assert (or dump) whenever we change the 
return value of number_of_iterations_exit from before to after threading in 
case the block with the exit test is involved in the threading path. 

I guess we can't undo after the fact but it might help getting more testcases 
and thus an idea what to look for exactly. 

Maybe sth to train the very first AI inside GCC ;) 

Richard. 

>Jeff



Re: [PATCH] rtlanal: dead_or_set_regno_p should handle CLOBBER (PR83424)

2017-12-18 Thread Jeff Law
On 12/16/2017 02:03 PM, Segher Boessenkool wrote:
> In PR83424 combine's move_deaths puts a REG_DEAD note in the wrong place
> because dead_or_set_regno_p does not account for CLOBBER insns.  This
> fixes it.
> 
> Bootstrapped and tested on powerpc64-linux {-m32,-m64} and on x86_64-linux.
> Is this okay for trunk?
> 
> 
> Segher
> 
> 
> 2017-12-16  Segher Boessenkool  
> 
>   PR rtl-optimization/83424
>   * rtlanal.c (dead_or_set_regno_p): Handle CLOBBER just like SET.
> 
> gcc/testsuite/
>   PR rtl-optimization/83424
>   * gcc.dg/pr83424.c: New test.
OK.
jeff


Re: Add support for fully-predicated loops

2017-12-18 Thread Jeff Law
On 11/17/2017 07:56 AM, Richard Sandiford wrote:
> This patch adds support for using a single fully-predicated loop instead
> of a vector loop and a scalar tail.  An SVE WHILELO instruction generates
> the predicate for each iteration of the loop, given the current scalar
> iv value and the loop bound.  This operation is wrapped up in a new internal
> function called WHILE_ULT.  E.g.:
> 
>WHILE_ULT (0, 3, { 0, 0, 0, 0}) -> { 1, 1, 1, 0 }
>WHILE_ULT (UINT_MAX - 1, UINT_MAX, { 0, 0, 0, 0 }) -> { 1, 0, 0, 0 }
> 
> The third WHILE_ULT argument is needed to make the operation
> unambiguous: without it, WHILE_ULT (0, 3) for one vector type would
> seem equivalent to WHILE_ULT (0, 3) for another, even if the types have
> different numbers of elements.
> 
> Note that the patch uses "mask" and "fully-masked" instead of
> "predicate" and "fully-predicated", to follow existing GCC terminology.
> 
> This patch just handles the simple cases, punting for things like
> reductions and live-out values.  Later patches remove most of these
> restrictions.
> 
> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
> and powerpc64le-linux-gnu.  OK to install?
> 
> Richard
> 
> 
> 2017-11-17  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * optabs.def (while_ult_optab): New optab.
>   * doc/md.texi (while_ult@var{m}@var{n}): Document.
>   * internal-fn.def (WHILE_ULT): New internal function.
>   * internal-fn.h (direct_internal_fn_supported_p): New override
>   that takes two types as argument.
>   * internal-fn.c (while_direct): New macro.
>   (expand_while_optab_fn): New function.
>   (convert_optab_supported_p): Likewise.
>   (direct_while_optab_supported_p): New macro.
>   * wide-int.h (wi::udiv_ceil): New function.
>   * tree-vectorizer.h (rgroup_masks): New structure.
>   (vec_loop_masks): New typedef.
>   (_loop_vec_info): Add masks, mask_compare_type, can_fully_mask_p
>   and fully_masked_p.
>   (LOOP_VINFO_CAN_FULLY_MASK_P, LOOP_VINFO_FULLY_MASKED_P)
>   (LOOP_VINFO_MASKS, LOOP_VINFO_MASK_COMPARE_TYPE): New macros.
>   (vect_max_vf): New function.
>   (slpeel_make_loop_iterate_ntimes): Delete.
>   (vect_set_loop_condition, vect_get_loop_mask_type, vect_gen_while)
>   (vect_halve_mask_nunits, vect_double_mask_nunits): Declare.
>   (vect_record_loop_mask, vect_get_loop_mask): Likewise.
>   * tree-vect-loop-manip.c: Include tree-ssa-loop-niter.h,
>   internal-fn.h, stor-layout.h and optabs-query.h.
>   (vect_set_loop_mask): New function.
>   (add_preheader_seq): Likewise.
>   (add_header_seq): Likewise.
>   (vect_maybe_permute_loop_masks): Likewise.
>   (vect_set_loop_masks_directly): Likewise.
>   (vect_set_loop_condition_masked): Likewise.
>   (vect_set_loop_condition_unmasked): New function, split out from
>   slpeel_make_loop_iterate_ntimes.
>   (slpeel_make_loop_iterate_ntimes): Rename to...
>   (vect_set_loop_condition): ...this.  Use vect_set_loop_condition_masked
>   for fully-masked loops and vect_set_loop_condition_unmasked otherwise.
>   (vect_do_peeling): Update call accordingly.
>   (vect_gen_vector_loop_niters): Use VF as the step for fully-masked
>   loops.
>   * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
>   mask_compare_type, can_fully_mask_p and fully_masked_p.
>   (release_vec_loop_masks): New function.
>   (_loop_vec_info): Use it to free the loop masks.
>   (can_produce_all_loop_masks_p): New function.
>   (vect_get_max_nscalars_per_iter): Likewise.
>   (vect_verify_full_masking): Likewise.
>   (vect_analyze_loop_2): Save LOOP_VINFO_CAN_FULLY_MASK_P around
>   retries, and free the mask rgroups before retrying.  Check loop-wide
>   reasons for disallowing fully-masked loops.  Make the final decision
>   about whether use a fully-masked loop or not.
>   (vect_estimate_min_profitable_iters): Do not assume that peeling
>   for the number of iterations will be needed for fully-masked loops.
>   (vectorizable_reduction): Disable fully-masked loops.
>   (vectorizable_live_operation): Likewise.
>   (vect_halve_mask_nunits): New function.
>   (vect_double_mask_nunits): Likewise.
>   (vect_record_loop_mask): Likewise.
>   (vect_get_loop_mask): Likewise.
>   (vect_transform_loop): Handle the case in which the final loop
>   iteration might handle a partial vector.  Call vect_set_loop_condition
>   instead of slpeel_make_loop_iterate_ntimes.
>   * tree-vect-stmts.c: Include tree-ssa-loop-niter.h and gimple-fold.h.
>   (check_load_store_masking): New function.
>   (prepare_load_store_mask): Likewise.
>   (vectorizable_store): Handle fully-masked loops.
>   (vectorizable_load): Likewise.
>   

Re: [i386] PR81842 [CET] -fcf-protection -mcet is incompatible with makecontext family functions

2017-12-18 Thread Sandra Loosemore

On 12/17/2017 05:05 PM, Tsimbalist, Igor V wrote:

-fcf-protection -mcet is incompatible with makecontext family functions
since they can't properly set up and destroy the shadow stack pointer.  This
change provides a mechanism to help detect shadow stack compatibility.
The current proposal is to add -mcheck-shstk-compat option which will
predefine __CHECK_SHSTK_COMPAT__ macro. The option will be
set on by default.  Then we can add a code

#if defined __SHSTK__ && defined __CHECK_SHSTK_COMPAT__
# error This source is incompatible with -mshstk
#endif

to <ucontext.h>.


The functional change here is out of my maintainership domain, but
why does this need a new macro and a new option to control it?  If the 
code being protected doesn't work properly with -mshstk, it seems like 
it would be more robust to do just


#if defined __SHSTK__
# error This source is incompatible with -mshstk
#endif

I don't see any discussion in the bugzilla issue to explain this.

Re the proposed documentation for the new option:


diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 1413095..7b4223a 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -26225,6 +26225,15 @@ The option has effect only if the 
@option{-fcf-protection=full} or
 @option{-mshstk} is on by default when the @option{-mcet} option is
 specified.
 
+@item -mcheck-shstk-compat

+@opindex mcheck-shstk-compat
+This option predefines __CHECK_SHSTK_COMPAT__ macro, which can be used


You need to add @code markup on all the macro names here.


+to add a guard to the C/C++ sources which are incompatible with Intel


s/which/that/


+shadow stack technology.  A typical case would be issuing an error when
+both __SHSTK__ and __CHECK_SHSTK_COMPAT__ macro are defined.  The option
+@option{-mcheck-shstk-compat} is on by default when the @code{-mshstk}
+option is specified.
+
 @item -mcrc32
 @opindex mcrc32
 This option enables built-in functions @code{__builtin_ia32_crc32qi},


-Sandra


Re: C++ PATCH to fix wrong-code with constexpr call table cache (PR c++/83116)

2017-12-18 Thread Marek Polacek
On Mon, Dec 18, 2017 at 01:07:18PM -0500, Jason Merrill wrote:
> On Mon, Dec 18, 2017 at 10:09 AM, Marek Polacek  wrote:
> > Here the problem was that cxx_eval_call_expression can cache the result of a
> > constexpr call in constexpr_call_table, but we have to be careful, after
> > store_init_value the result might be invalid.  So I believe we also have to
> > clear the constexpr call table.  I've lumped it together with clearing
> > cv_cache.
> 
> Hmm, that seems like a big hammer; the problem isn't that
> store_init_value makes the result invalid, it's that the result
> calculated during store_init_value (when we can treat the object as
> constant) isn't relevant later (when the object is no longer
> constant).  So we want to avoid caching when we're called for the
> initial value.  Maybe by changing
> 
> if (depth_ok && !non_constant_args)
> 
> to
> 
> if (depth_ok && !non_constant_args && ctx->strict)
> 
> ?  Does that work?

Yes!  I wish I'd thought of ctx->strict before.  Well, something to remember.

Bootstrapped/regtested on x86_64-linux, ok for trunk/7?

2017-12-18  Marek Polacek  

PR c++/83116
* constexpr.c (cxx_eval_call_expression): Only look into
constexpr_call_table if ctx->strict.

* g++.dg/cpp1y/constexpr-83116.C: New test.

diff --git gcc/cp/constexpr.c gcc/cp/constexpr.c
index 0455be1d6da..518798e0c43 100644
--- gcc/cp/constexpr.c
+++ gcc/cp/constexpr.c
@@ -1588,7 +1588,7 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
   tree result = NULL_TREE;
 
   constexpr_call *entry = NULL;
-  if (depth_ok && !non_constant_args)
+  if (depth_ok && !non_constant_args && ctx->strict)
 {
   new_call.hash = iterative_hash_template_arg
(new_call.bindings, constexpr_fundef_hasher::hash (new_call.fundef));
diff --git gcc/testsuite/g++.dg/cpp1y/constexpr-83116.C gcc/testsuite/g++.dg/cpp1y/constexpr-83116.C
index e69de29bb2d..18d79e2e1cc 100644
--- gcc/testsuite/g++.dg/cpp1y/constexpr-83116.C
+++ gcc/testsuite/g++.dg/cpp1y/constexpr-83116.C
@@ -0,0 +1,18 @@
+// PR c++/83116
+// { dg-do run { target c++14 } }
+// { dg-options "-O2" }
+
+struct S {
+  constexpr S () : s(0) { foo (); }
+  constexpr int foo () { return s; }
+  int s;
+};
+
+int
+main ()
+{
+  static S var;
+  var.s = 5;
+  if (var.s != 5 || var.foo () != 5)
+    __builtin_abort ();
+}

Marek


Re: PATCH to fix -Wrestrict ICE (PR middle-end/83463)

2017-12-18 Thread Jakub Jelinek
On Mon, Dec 18, 2017 at 10:00:36AM -0700, Martin Sebor wrote:
> On 12/18/2017 08:10 AM, Marek Polacek wrote:
> > I'm not entirely up to speed with this code, but this one seemed 
> > sufficiently
> > obvious: check INTEGRAL_TYPE_P before looking at a tree's min/max value.
> > Otherwise, go with maxobjsize.
> > 
> > Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> Thanks for the fix.  I noticed the problem only comes up when
> memcpy is declared without a prototype.  Otherwise, the memcpy
> call is eliminated early on (during gimplification?)  So while
> avoiding the ICE is obviously a necessary change, I wonder if
> this bug also suggests a missing optimization that might still
> be relevant (i.e., eliminating memcpy() with a zero size for
> a memcpy without a prototype):

No, we shouldn't try to optimize if people mess up stuff.
What happens in this case is that
gimple_builtin_call_types_compatible_p
will fail, thus gimple_call_builtin_p and we won't be treating it specially.

That just shows a problem in the new pass:
  tree func = gimple_call_fndecl (call);
  if (!func || DECL_BUILT_IN_CLASS (func) != BUILT_IN_NORMAL)
return;
is just wrong, it should have been:
  if (!gimple_call_builtin_p (call, BUILT_IN_NORMAL))
return;

Then you don't need to worry about float or _Complex int third argument,
or just 2 or 27 arguments instead of 3 for memcpy, etc.

Jakub


Re: PATCH to fix -Wrestrict ICE (PR middle-end/83463)

2017-12-18 Thread Martin Sebor

On 12/18/2017 10:07 AM, Jakub Jelinek wrote:

On Mon, Dec 18, 2017 at 10:00:36AM -0700, Martin Sebor wrote:

On 12/18/2017 08:10 AM, Marek Polacek wrote:

I'm not entirely up to speed with this code, but this one seemed sufficiently
obvious: check INTEGRAL_TYPE_P before looking at a tree's min/max value.
Otherwise, go with maxobjsize.

Bootstrapped/regtested on x86_64-linux, ok for trunk?


Thanks for the fix.  I noticed the problem only comes up when
memcpy is declared without a prototype.  Otherwise, the memcpy
call is eliminated early on (during gimplification?)  So while
avoiding the ICE is obviously a necessary change, I wonder if
this bug also suggests a missing optimization that might still
be relevant (i.e., eliminating memcpy() with a zero size for
a memcpy without a prototype):


No, we shouldn't try to optimize if people mess up stuff.


Declaring memcpy() without a prototype isn't messing up.  It may
be poor practice but there's nothing inherently wrong with it.
GCC accepts it, treats it as a built-in (i.e., doesn't complain
about the declaration being incompatible with the built-in), but
doesn't optimize it.  On the other hand, as I said, when someone
really does mess up and declares memcpy like so:

  void* memcpy (long, long, long);

GCC optimizes calls to it as if the function were declared with
the right prototype.


What happens in this case is that
gimple_builtin_call_types_compatible_p
will fail, thus gimple_call_builtin_p and we won't be treating it specially.
That just shows a problem in the new pass:
  tree func = gimple_call_fndecl (call);
  if (!func || DECL_BUILT_IN_CLASS (func) != BUILT_IN_NORMAL)
return;
is just wrong, it should have been:
  if (!gimple_call_builtin_p (call, BUILT_IN_NORMAL))
return;

Then you don't need to worry about float or _Complex int third argument,
or just 2 or 27 arguments instead of 3 for memcpy, etc.


Yes, but only at the cost of not checking calls to memcpy() and
other built ins declared without a prototype.  I didn't consider
those when I wrote the code but now that I have, since GCC allows
built-ins to be declared without a prototype and doesn't warn even
with -Wpedantic, it doesn't seem like a good approach to me.

Martin


[PATCH,WIP] Use functional parameters for data mappings in OpenACC child functions

2017-12-18 Thread Cesar Philippidis
Jakub,

I'd like your thoughts on the following problem.

One of the offloading bottlenecks with GPU acceleration in OpenACC is
the nontrivial offloaded function invocation overhead. At present, GCC
generates code to pass a struct containing one field for each of the
data mappings used in the OMP child function. I'm guessing a struct is
used because pthread_create only accepts a single for new threads. What
I'd like to do is to create the child function with one argument per
data mapping. This has a number of advantages:

  1. No device memory needs to be managed for the child function data
 mapping struct.

  2. On PTX targets, the .param address space is cached. Using
 individual parameters for function arguments will allow the nvptx
 back end to generate a more relaxed "execution model" because the
 thread initialization code will be accessing cache memory instead
 of global memory.

  3. It was my hope that this would set a path to eliminate the
 GOMP_MAP_FIRSTPRIVATE_INT optimization, by replacing those mappings
 with the actual value directly.

1) is huge for programs, such as cloverleaf, which launch a lot of small
parallel regions a lot of times.

For the execution model in 2), OpenACC begins each parallel region in a
gang-redundant, worker-single and vector-single state. To transition
from a single-threaded (or single vector lane) state to a multi-threaded
partitioned state, GCC needs to emit code to propagate live variables,
both on the stack and registers to the spawned threads. A lot of loops,
including DGEMV from BLAS, can be executed in a fully-redundant state.
Executing code redundantly has the advantage of not requiring any state
transition code. The problem here is that a) the struct is in
global memory, and b) not all of the GPU threads are executing the same
instruction at the same time. Consequently, initializing each thread in
a fully redundant manner actually hurts performance. When I rewrote the
same test case passing the data mappings via individual parameters, that
optimization improved performance compared to GCC trunk's baseline.

Lastly, 3) is more of a simplification than anything else. I'm not too
concerned about this because those variables only get initialized once.
So long as they don't require a separate COPYIN data mapping, the
performance hit should be negligible.

In this first attempt at using parameters I taught lower_omp_target how
to create child functions for OpenACC parallel regions with individual
parameters for the data mappings instead of using a large struct. This
works for the most part, but I realized too late that pthread_create
only passes one argument to each thread it creates. It should be noted
that I left the kernels implementation as-is, using the global struct
argument because kernels in GCC is largely ineffective and it usually
falls back to executing code on the host CPU. Eventually, we want to
redo kernels, but not until we get the parallel code running efficiently.

For fallback host targets, libgomp is using libffi to pass arguments to
the offloaded functions. This works OK at the moment because the host
code is always single-threaded. Ideally, that would change in the
future, but I'm not aware of any immediate plans to do so.

Question: is this approach acceptable for Stage 1 in May, or should I
make the offloaded function parameter expansion target-specific? I can
think a couple of ways to make this target-specific:

  a. Create two child functions during lowering, one with individual
 parameters for the data mappings, and another which takes in a
 single struct. The latter then calls the former immediately on
 on entry.

  b. Teach oaccdevlow to expand the incoming struct into individual
 parameters.

I'm concerned that b) is going to be a large pass. The SRA pass is
somewhat large at 5k. While this should be simpler, I'm not sure by how
much (probably a lot because it won't need to perform as much analysis).

While this patch is functional, it's not complete. I still need to tweak
a couple of things in the runtime. But I don't want to spend too much
time on it if we decide to go with a different approach.

Any thoughts are welcome.

By the way, next we'll be working on increasing vector_length on nvptx
targets. In conjunction with that, we'll simplifying the OpenACC
execution model in the nvptx BE, along with adding a new reduction
finalizer.

Cesar
2017-12-18  Cesar Philippidis  

	* Makefile.def: Make libgomp depend on libffi.
	* configure.ac: Likewise.
	* Makefile.in: Regenerate.
	* configure: Regenerate.

	gcc/fortran/
	* types.def (BT_FN_VOID_INT_INT_OMPFN_SIZE_PTR_PTR_PTR_VAR):
	Define.

	gcc/
	* builtin-types.def (BT_FN_VOID_INT_INT_OMPFN_SIZE_PTR_PTR_PTR_VAR):
	Define.
	* config/nvptx/nvptx.c (nvptx_expand_cmp_swap): Handle PARM_DECLs.
	* omp-builtins.def (BUILT_IN_GOACC_PARALLEL): Call
	GOACC_parallel_keyed_v2.
	* omp-expand.c (expand_omp_target): Update call to
	

[PATCH] Improve strlen pass handling of loads from memory (PR tree-optimization/83444)

2017-12-18 Thread Jakub Jelinek
Hi!

As the testcase shows, we fold strlen (str) == 0 into *str == 0 before
the strlen pass has a chance to optimize.  The following patch optimizes
loads from the known termination '\0' into lhs = '\0' and loads from the
part of the string known to be non-zero results in ~[0, 0] range being
recorded if we don't have anything better.

Note that unfortunately vrp2 doesn't consider any memory loads as
interesting, even when they have non-varying ranges from other passes,
but guess that is something to improve later.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-12-18  Jakub Jelinek  

PR tree-optimization/83444
* tree-ssa-strlen.c (strlen_check_and_optimize_stmt): Optimize
character loads.

* gcc.dg/strlenopt-36.c: New test.

--- gcc/tree-ssa-strlen.c.jj	2017-12-18 14:57:20.0 +0100
+++ gcc/tree-ssa-strlen.c   2017-12-18 16:45:04.689862506 +0100
@@ -3104,6 +3104,64 @@ strlen_check_and_optimize_stmt (gimple_stmt_iterator *gsi)
else if (code == EQ_EXPR || code == NE_EXPR)
  fold_strstr_to_strncmp (gimple_assign_rhs1 (stmt),
  gimple_assign_rhs2 (stmt), stmt);
+   else if (gimple_assign_load_p (stmt)
+&& TREE_CODE (TREE_TYPE (lhs)) == INTEGER_TYPE
+&& TYPE_MODE (TREE_TYPE (lhs)) == TYPE_MODE (char_type_node)
+&& (TYPE_PRECISION (TREE_TYPE (lhs))
+== TYPE_PRECISION (char_type_node))
+&& !gimple_has_volatile_ops (stmt))
+ {
+   tree off = integer_zero_node;
+   unsigned HOST_WIDE_INT coff = 0;
+   int idx = -1;
+   tree rhs1 = gimple_assign_rhs1 (stmt);
+   if (code == MEM_REF)
+ {
+   idx = get_stridx (TREE_OPERAND (rhs1, 0));
+   off = TREE_OPERAND (rhs1, 1);
+ }
+   else
+ idx = get_addr_stridx (rhs1, NULL_TREE, &coff);
+   if (idx > 0)
+ {
+   strinfo *si = get_strinfo (idx);
+   if (si
+   && si->nonzero_chars
+   && TREE_CODE (si->nonzero_chars) == INTEGER_CST)
+ {
+   widest_int w1 = wi::to_widest (si->nonzero_chars);
+   widest_int w2 = wi::to_widest (off) + coff;
+   if (w1 == w2
+   && si->full_string_p)
+ {
+   /* Reading the final '\0' character.  */
+   tree zero = build_int_cst (TREE_TYPE (lhs), 0);
+   gimple_set_vuse (stmt, NULL_TREE);
+   gimple_assign_set_rhs_from_tree (gsi, zero);
+   update_stmt (gsi_stmt (*gsi));
+ }
+   else if (w1 > w2)
+ {
+   /* Reading a character before the final '\0'
+  character.  Just set the value range to ~[0, 0]
+  if we don't have anything better.  */
+   wide_int min, max;
+   tree type = TREE_TYPE (lhs);
+   enum value_range_type vr
+ = get_range_info (lhs, &min, &max);
+   if (vr == VR_VARYING
+   || (vr == VR_RANGE
+   && min == wi::min_value (TYPE_PRECISION (type),
+TYPE_SIGN (type))
+   && max == wi::max_value (TYPE_PRECISION (type),
+TYPE_SIGN (type))))
+ set_range_info (lhs, VR_ANTI_RANGE,
+ wi::zero (TYPE_PRECISION (type)),
+ wi::zero (TYPE_PRECISION (type)));
+ }
+ }
+ }
+ }
 
if (strlen_to_stridx)
  {
--- gcc/testsuite/gcc.dg/strlenopt-36.c.jj  2017-12-18 16:49:48.357210798 +0100
+++ gcc/testsuite/gcc.dg/strlenopt-36.c 2017-12-18 17:02:41.804261717 +0100
@@ -0,0 +1,38 @@
+/* PR tree-optimization/83444 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-not "abort \\(\\)" "optimized" } } */
+
+#include "strlenopt.h"
+
+void
+foo (void)
+{
+  char a[5] = "012";
+  strcpy (a, "");
+  if (strlen (a) != 0)
+    abort ();
+}
+
+void
+bar (void)
+{
+  char a[5] = "012";
+  char b[7] = "";
+  strcpy (a, b);
+  if (strlen (a) != 0)
+    abort ();
+}
+
+struct S { char a[4]; char b[5]; char c[7]; };
+
+void
+baz (void)
+{
+  struct S s;
+  strcpy (s.b, "012");
+  strcpy (s.c, "");
+  strcpy (s.b, s.c);
+  if (s.b[0] != 0)
+    abort ();
+}

Jakub


Re: [PATCH] diagnose attribute conflicts on conversion operators (PR 83394)

2017-12-18 Thread Martin Sebor

On 12/13/2017 02:38 PM, Jason Merrill wrote:

On Wed, Dec 13, 2017 at 12:54 PM, Martin Sebor  wrote:

The attached update also fixes both instances of the ICE
reported in bug 83322 and supersedes Jakub's patch for that
bug (https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00765.html).
This passes bootstrap on x86_64 with no new regressions (there
are an increasing number of failures on trunk at the moment
but, AFAICS, none caused by this patch).

Jason, I'm still trying to come up with a test case for templates
that would illustrate the issue you're concerned about.  If you
have one that would be great (preferably one showing a regression).


I looked at the case I was concerned about, and found that it isn't an
issue because in that case we call duplicate_decls before applying
attributes.

But it looks like we'll still get this testcase wrong, because the
code assumes that if the old decl is a single _DECL, it must match.

[[gnu::noinline]] void f() { }
[[gnu::always_inline]] void f(int) { }  // OK, not the same function

I think the answer is to use Nathan's new iterators unconditionally,
probably lkp_iterator.


Thanks for the test case.  You're right that this problem still
exists.  I thought a complete fix for it would be simple enough
to include in this patch but after running into issues with
assumptions about how inline/noinline conflicts are resolved
I think it's best to fix the ICE alone in this patch and deal
with the pre-existing bug above in a follow up.  Apparently
(according to comment #6 on pr83322) the ICE is causing some
anxiety about the timely availability of a fix, so I'd like
to avoid spending more time on it than is necessary.

Attached is an updated patch.  It handles the overloads above
correctly but it doesn't fix the latent problem and so they
are still diagnosed, same as in GCC 7.

Martin

PR c++/83394 - always_inline vs. noinline no longer diagnosed
PR c++/83322 - ICE: tree check: expected class ‘type’, have ‘exceptional’

gcc/cp/ChangeLog:

	PR c++/83394
	PR c++/83322
	* decl2.c (cplus_decl_attributes): Look up member functions
	in the scope of their class.

gcc/testsuite/ChangeLog:

	PR c++/83394
	* g++.dg/Wattributes-3.C: New test.
	* g++.dg/Wattributes-4.C: New test.
	* g++.dg/Wattributes-5.C: New test.

Index: gcc/cp/decl2.c
===
--- gcc/cp/decl2.c	(revision 255779)
+++ gcc/cp/decl2.c	(working copy)
@@ -1432,6 +1432,67 @@ cp_omp_mappable_type (tree type)
   return true;
 }
 
+/* Return the last pushed declaration for the symbol DECL or NULL
+   when no such declaration exists.  */
+
+static tree
+find_last_decl (tree decl)
+{
+  tree last_decl = NULL_TREE;
+
+  if (tree name = DECL_P (decl) ? DECL_NAME (decl) : NULL_TREE)
+{
+  /* Look up the declaration in its scope.  */
+  tree pushed_scope = NULL_TREE;
+  if (tree ctype = DECL_CONTEXT (decl))
+	pushed_scope = push_scope (ctype);
+
+  last_decl = lookup_name (name);
+
+  if (pushed_scope)
+	pop_scope (pushed_scope);
+
+  /* The declaration may be a member conversion operator
+	 or a bunch of overloads (handle the latter below).  */
+  if (last_decl && BASELINK_P (last_decl))
+	last_decl = BASELINK_FUNCTIONS (last_decl);
+}
+
+  if (!last_decl)
+return NULL_TREE;
+
+  if (DECL_P (last_decl) || TREE_CODE (last_decl) == OVERLOAD)
+{
+  /* A set of overloads of the same function.  */
+  for (lkp_iterator iter (last_decl); iter; ++iter)
+	{
+	  if (TREE_CODE (*iter) == OVERLOAD)
+	continue;
+
+	  if (decls_match (decl, *iter, /*record_decls=*/false))
+	return *iter;
+	}
+  return NULL_TREE;
+}
+
+  if (TREE_CODE (last_decl) == TREE_LIST)
+{
+  /* The list contains a mix of symbols with the same name
+	 (e.g., functions and data members defined in different
+	 base classes).  */
+  do
+	{
+	  if (decls_match (decl, TREE_VALUE (last_decl)))
+	return TREE_VALUE (last_decl);
+
+	  last_decl = TREE_CHAIN (last_decl);
+	}
+  while (last_decl);
+}
+
+  return NULL_TREE;
+}
+
 /* Like decl_attributes, but handle C++ complexity.  */
 
 void
@@ -1483,28 +1544,7 @@ cplus_decl_attributes (tree *decl, tree attributes
 }
   else
 {
-  tree last_decl = (DECL_P (*decl) && DECL_NAME (*decl)
-			? lookup_name (DECL_NAME (*decl)) : NULL_TREE);
-
-  if (last_decl && TREE_CODE (last_decl) == OVERLOAD)
-	for (ovl_iterator iter (last_decl, true); ; ++iter)
-	  {
-	if (!iter)
-	  {
-		last_decl = NULL_TREE;
-		break;
-	  }
-
-	if (TREE_CODE (*iter) == OVERLOAD)
-	  continue;
-
-	if (decls_match (*decl, *iter, /*record_decls=*/false))
-	  {
-		last_decl = *iter;
-		break;
-	  }
-	  }
-
+  tree last_decl = find_last_decl (*decl);
   decl_attributes (decl, attributes, flags, last_decl);
 }
 
Index: gcc/testsuite/g++.dg/Wattributes-3.C
===
--- 

Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)

2017-12-18 Thread Martin Sebor

On 12/14/2017 12:04 PM, Jeff Law wrote:

On 12/14/2017 11:55 AM, Jakub Jelinek wrote:

On Thu, Dec 14, 2017 at 11:51:26AM -0700, Martin Sebor wrote:

Well, it would be nice to get sanitizers diagnose this at runtime.  If we
know the array length at compile time, simply compare after the strlen
call the result and fail if it returns something above it.  Or replace
the strlen call with strnlen for the compile time known size and add
instrumentation if strnlen returns the second argument.


Sure, that sounds like a useful enhancement.  I'll look into
adding it as a follow-on patch unless you feel that it needs
to be part of the same package.


The problem is if we'll need changes to libubsan for that (which we'll
likely do), then those need to be upstreamed, and e.g. my attempts
to upstream simple patch to diagnose noreturn function returns is suspended
upstream because clang doesn't have that support (and I have no interest
in adding it to clang).

In theory we could have some GCC only file in there, but then we'd be ABI
incompatible with them.

So defer the sanitization side until Clang catches up?


I've committed the patch as is in r255790.  If I have some spare
cycles I'll see if the instrumentation is possible without changing
libubsan.

Martin
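The bounded check Jakub sketches can be modeled in plain code; this is
only an illustration of the proposed instrumentation, not GCC's or
libubsan's actual code, and `checked_strlen`/`unterminated` are made-up
names:

```cpp
#include <cstddef>

// Illustrative model: when the object's size is known at compile time,
// scan at most `bound` bytes.  Reaching `bound` without finding a NUL
// means the string is unterminated -- the case the sanitizer would
// report at runtime.
static bool unterminated = false;

static std::size_t checked_strlen (const char *s, std::size_t bound)
{
  std::size_t n = 0;
  while (n < bound && s[n] != '\0')
    ++n;
  if (n == bound)
    unterminated = true;   // would invoke the sanitizer's report hook
  return n;
}
```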



Re: C++ PATCH to fix wrong-code with constexpr call table cache (PR c++/83116)

2017-12-18 Thread Jason Merrill
OK.

On Mon, Dec 18, 2017 at 2:56 PM, Marek Polacek  wrote:
> On Mon, Dec 18, 2017 at 01:07:18PM -0500, Jason Merrill wrote:
>> On Mon, Dec 18, 2017 at 10:09 AM, Marek Polacek  wrote:
>> > Here the problem was that cxx_eval_call_expression can cache the result of 
>> > a
>> > constexpr call in constexpr_call_table, but we have to be careful, after
>> > store_init_value the result might be invalid.  So I believe we also have to
>> > clear the constexpr call table.  I've lumped it together with clearing
>> > cv_cache.
>>
>> Hmm, that seems like a big hammer; the problem isn't that
>> store_init_value makes the result invalid, it's that the result
>> calculated during store_init_value (when we can treat the object as
>> constant) isn't relevant later (when the object is no longer
>> constant).  So we want to avoid caching when we're called for the
>> initial value.  Maybe by changing
>>
>> if (depth_ok && !non_constant_args)
>>
>> to
>>
>> if (depth_ok && !non_constant_args && ctx->strict)
>>
>> ?  Does that work?
>
> Yes!  I wish I'd thought of ctx->strict before.  Well, something to remember.
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk/7?
>
> 2017-12-18  Marek Polacek  
>
> PR c++/83116
> * constexpr.c (cxx_eval_call_expression): Only look into
> constexpr_call_table if ctx->strict.
>
> * g++.dg/cpp1y/constexpr-83116.C: New test.
>
> diff --git gcc/cp/constexpr.c gcc/cp/constexpr.c
> index 0455be1d6da..518798e0c43 100644
> --- gcc/cp/constexpr.c
> +++ gcc/cp/constexpr.c
> @@ -1588,7 +1588,7 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, 
> tree t,
>tree result = NULL_TREE;
>
>constexpr_call *entry = NULL;
> -  if (depth_ok && !non_constant_args)
> +  if (depth_ok && !non_constant_args && ctx->strict)
>  {
>new_call.hash = iterative_hash_template_arg
> (new_call.bindings, constexpr_fundef_hasher::hash (new_call.fundef));
> diff --git gcc/testsuite/g++.dg/cpp1y/constexpr-83116.C 
> gcc/testsuite/g++.dg/cpp1y/constexpr-83116.C
> index e69de29bb2d..18d79e2e1cc 100644
> --- gcc/testsuite/g++.dg/cpp1y/constexpr-83116.C
> +++ gcc/testsuite/g++.dg/cpp1y/constexpr-83116.C
> @@ -0,0 +1,18 @@
> +// PR c++/83116
> +// { dg-do run { target c++14 } }
> +// { dg-options "-O2" }
> +
> +struct S {
> +  constexpr S () : s(0) { foo (); }
> +  constexpr int foo () { return s; }
> +  int s;
> +};
> +
> +int
> +main ()
> +{
> +  static S var;
> +  var.s = 5;
> +  if (var.s != 5 || var.foo () != 5)
> +__builtin_abort ();
> +}
>
> Marek


Re: [PATCH] Improve strlen pass handling of loads from memory (PR tree-optimization/83444)

2017-12-18 Thread Jeff Law
On 12/18/2017 02:45 PM, Jakub Jelinek wrote:
> Hi!
> 
> As the testcase shows, we fold strlen (str) == 0 into *str == 0 before
> the strlen pass has a chance to optimize.  The following patch optimizes
> loads from the known termination '\0' into lhs = '\0' and loads from the
> part of the string known to be non-zero results in ~[0, 0] range being
> recorded if we don't have anything better.
> 
> Note that unfortunately vrp2 doesn't consider any memory loads as
> interesting, even when they have non-varying ranges from other passes,
> but guess that is something to improve later.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2017-12-18  Jakub Jelinek  
> 
>   PR tree-optimization/83444
>   * tree-ssa-strlen.c (strlen_check_and_optimize_stmt): Optimize
>   character loads.
> 
>   * gcc.dg/strlenopt-36.c: New test.
OK.
jeff
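The byte-level facts the patch records can be stated as a small
self-check (not GCC code): once strlen (s) == n is known, the load of
s[n] must yield '\0', and every earlier byte is non-zero -- the
~[0, 0] anti-range mentioned above.

```cpp
#include <cstring>
#include <cstddef>

// The two facts the strlen pass can now exploit for character loads:
//   s[strlen (s)] == '\0'            (the load folds to zero)
//   s[i] != '\0' for i < strlen (s)  (the load gets a ~[0, 0] range)
static bool byte_facts_hold (const char *s)
{
  std::size_t n = std::strlen (s);
  if (s[n] != '\0')
    return false;
  for (std::size_t i = 0; i < n; ++i)
    if (s[i] == '\0')
      return false;
  return true;
}
```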


Re: [PATCH] avoid false negatives in attr-nonstring-3.c (PR 83131)

2017-12-18 Thread Martin Sebor

Ping: https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00665.html

On 12/11/2017 03:54 PM, Martin Sebor wrote:

The attr-nonstring-3.c test fails on targets that expand
the calls to some of the tested string functions in builtins.c,
before they reach the checker in calls.c.  The failures were
reported on powerpc64le but tests can be constructed that fail
even on other targets (including x86_64).

To fix these failures the checker needs to be invoked both
in builtins.c when the expansion takes place and in calls.c
otherwise.

The attached patch does that.  Since it also adjusts
the indentation in the changed functions, I used diff -w
to leave the whitespace changes out of it.

Bootstrapped and tested on x86_64-linux.  I verified the tests
pass using a powerpc64le-linux cross-compiler.

Martin




[PATCH] Improve vect_create_epilog_for_reduction (PR tree-optimization/80631)

2017-12-18 Thread Jakub Jelinek
Hi!

When backporting the wrong-code bugfix parts of PR80631 to 7.3, I've noticed
that we perform the optimization to use the induc_val only when reduc_fn is
IFN_REDUC_{MAX,MIN}.  That is true e.g. for AVX2, but not plain SSE2, so if
we have a loop like:
int foo (int *v)
{
  int found_index = -17;
  for (int k = 0; k < 64; k++)
if (v[k] == 77)
  found_index = k;
  return found_index;
}
we start with { -17, -17, -17... } only for avx* and can optimize away
the scalar select (reduc_result == -17 ? -17 : reduc_result), but for sse*
we start instead with { -1, -1, -1... } and can't optimize away
(reduc_result == -1 ? -17 : reduc_result).

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?
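A scalar picture of the epilogue in question may help (this is only a
sketch of the semantics, not the vectorizer's actual output):

```cpp
// Scalar model of the reduction epilogue described above.  On the
// generic code path the lanes start from the sentinel -1, so the
// epilogue must translate -1 back to the user's initial value -17 --
// exactly the select that can be dropped when the lanes are allowed
// to start at -17 directly.
static int found_index (const int *v, int len)
{
  int idx = -1;                    // internal sentinel
  for (int k = 0; k < len; k++)
    if (v[k] == 77)
      idx = k;
  return idx == -1 ? -17 : idx;   // epilogue select
}
```

When the start value is -17 itself, the final conditional folds to a plain
copy of the reduction result.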

2017-12-18  Jakub Jelinek  

PR tree-optimization/80631
* tree-vect-loop.c (vect_create_epilog_for_reduction): Compare
induc_code against MAX_EXPR or MIN_EXPR instead of reduc_fn against
IFN_REDUC_MAX or IFN_REDUC_MIN.

--- gcc/tree-vect-loop.c.jj 2017-12-12 09:54:28.0 +0100
+++ gcc/tree-vect-loop.c2017-12-15 18:56:15.426591727 +0100
@@ -4432,9 +4432,9 @@ vect_create_epilog_for_reduction (vec

Re: Fix Debug insert_return_type

2017-12-18 Thread Jonathan Wakely
On 15 December 2017 at 21:19, François Dumont wrote:
> Here is a patch to fix those failures of the latest report:
>
> UNRESOLVED: 23_containers/map/modifiers/extract.cc compilation failed to
> produce executable
> FAIL: 23_containers/set/modifiers/extract.cc (test for excess errors)
> UNRESOLVED: 23_containers/set/modifiers/extract.cc compilation failed to
> produce executable
> FAIL: 23_containers/unordered_map/modifiers/extract.cc (test for excess
> errors)
> UNRESOLVED: 23_containers/unordered_map/modifiers/extract.cc compilation
> failed to produce executable
> FAIL: 23_containers/unordered_set/modifiers/extract.cc (test for excess
> errors)
> UNRESOLVED: 23_containers/unordered_set/modifiers/extract.cc compilation
> failed to produce executable
>
>
> Tested under Linux x86_64 Debug mode.
>
> Ok to commit ?

OK, thanks for fixing it.


> Note that I don't understand this in _Rb_tree:
>
>   using insert_return_type = _Node_insert_return<
> conditional_t<is_same_v<_Key, _Val>, const_iterator, iterator>,
> node_type>;
>
> Why the conditional_t part ? In Debug mode it is always using iterator and I
> don't understand what represent this is_same_v<_Key, _Val> condition.

is_same_v<_Key, _Val> is true for sets, false for maps.

The condition means that the type uses _Rb_tree<>::const_iterator for
sets, and _Rb_tree::iterator for maps. IIRC this is necessary because
set::iterator is a typedef for set::const_iterator and that can only
constructed from _Rb_tree<>::const_iterator, not from
_Rb_tree<>::iterator. So for sets the member of _Rb_tree needs to
return a const_iterator.

There is no _Rb_tree::insert_return_type in Debug Mode, so you're
comparing apples and oranges. The iterator in Debug Mode is either
set::iterator (which is the same as set::const_iterator) or
map::iterator, not _Rb_tree::iterator.
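The conditional can be illustrated with toy iterator types (hypothetical
stand-ins, not the actual libstdc++ ones):

```cpp
#include <type_traits>
#include <utility>

// Stand-ins for the two _Rb_tree iterator types (names invented here).
struct tree_iterator {};
struct tree_const_iterator {};

// For sets, _Key and _Val are the same type, so the node-insert return
// type must carry const_iterator; for maps, _Val is a pair, so the
// plain iterator is chosen.
template <typename Key, typename Val>
using insert_iter
  = std::conditional_t<std::is_same_v<Key, Val>,
                       tree_const_iterator, tree_iterator>;

static_assert (std::is_same_v<insert_iter<int, int>,
                              tree_const_iterator>, "set-like");
static_assert (std::is_same_v<insert_iter<int, std::pair<const int, long>>,
                              tree_iterator>, "map-like");
```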


Re: PATCH to fix -Wrestrict ICE (PR middle-end/83463)

2017-12-18 Thread Jakub Jelinek
On Mon, Dec 18, 2017 at 10:37:19AM -0700, Martin Sebor wrote:
> On 12/18/2017 10:07 AM, Jakub Jelinek wrote:
> > On Mon, Dec 18, 2017 at 10:00:36AM -0700, Martin Sebor wrote:
> > > On 12/18/2017 08:10 AM, Marek Polacek wrote:
> > > > I'm not entirely up to speed with this code, but this one seemed 
> > > > sufficiently
> > > > obvious: check INTEGRAL_TYPE_P before looking at a tree's min/max value.
> > > > Otherwise, go with maxobjsize.
> > > > 
> > > > Bootstrapped/regtested on x86_64-linux, ok for trunk?
> > > 
> > > Thanks for the fix.  I noticed the problem only comes up when
> > > memcpy is declared without a prototype.  Otherwise, the memcpy
> > > call is eliminated early on (during gimplification?)  So while
> > > avoiding the ICE is obviously a necessary change, I wonder if
> > > this bug also suggests a missing optimization that might still
> > > be relevant (i.e., eliminating memcpy() with a zero size for
> > > a memcpy without a prototype):
> > 
> > No, we shouldn't try to optimize if people mess up stuff.
> 
> Declaring memcpy() without a prototype isn't messing up.  It may
> be poor practice but there's nothing inherently wrong with it.

There is.  Because the last argument is size_t, so calling it with 0
is UB (on LP64, on ILP32 I think it will be optimized normally).

Jakub


[committed][PR middle-end/83460] Remove compromised test

2017-12-18 Thread Jeff Law

As noted in the BZ, this test has been compromised by Martin's changes
in .

I contemplated turning the test into pure C code, but that turns out not
to be a terribly useful thing to do.

I also contemplated gathering the CPP output prior to Martin's changes
and using that for the test.  But that runs the high probability of bit
rotting over time.

I could have also created a private  without Martin's changes,
but it just doesn't seem to be worth the effort.

I considered re-purposing the test to verify Martin's changes were
effective, but his commit has reasonable tests for this already.

So I'm declaring the test compromised and removing it.

Jeff
commit 2ab3b98c5cc2a63d25198d5a78e037b1465bd415
Author: law 
Date:   Mon Dec 18 18:13:20 2017 +

PR middle-end/83460
* g++.dg/pr79095-4.C: Remove compromised test.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@255784 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 0963f4ec961..6e64c42cd0c 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2017-12-18  Jeff Law  
+
+   PR middle-end/83460
+   * g++.dg/pr79095-4.C: Remove compromised test.
+
 2017-12-18  Jakub Jelinek  
 
PR c++/83300
diff --git a/gcc/testsuite/g++.dg/pr79095-4.C b/gcc/testsuite/g++.dg/pr79095-4.C
deleted file mode 100644
index df550257465..000
--- a/gcc/testsuite/g++.dg/pr79095-4.C
+++ /dev/null
@@ -1,26 +0,0 @@
-/* { dg-do compile } */
-/* { dg-options "-Wall -O3 -fdump-tree-vrp2" } */
-
-#include 
-
-void foo(std::vector );
-
-void vtest()
-{
-  std::vector v;
-  foo (v);
-  {
-v.resize (v.size()-1);
-  }
-}
-
-/* As written this testcase should trigger a warning.  We overflow to -1U
-   if v.size() == 0 in foo().  This results in bogus calls to memset.
-
-   The number of clearing loops in the IL can vary depending on the C++
-   mode used for the test.  But by the end of VRP2, there should be a single
-   clearing loop left and it should be using memset.  */
-/* { dg-final { scan-tree-dump-times  "__builtin_memset \\(_\[0-9\]+, 0, 
\[0-9\]+\\)" 1 "vrp2" } } */
-
-/* And that call should trigger a warning.  */
-/* { dg-warning "exceeds maximum object size" "" { target *-*-* } 0 } */ 


Re: PATCH to fix -Wrestrict ICE (PR middle-end/83463)

2017-12-18 Thread Bernd Edlinger
Hi Martin,

> In all cases all the information necessary to detect and diagnose
> or even avoid the problem is available.  In fact, one might argue
> that optimizing such calls (expanding them inline) would be
> preferable to doing nothing and allowing the undefined behavior
> to cause a bug at runtime.

I think the right thing to do here would emit a warning for
declaring the builtin with the wrong prototype.

If you try to do the equivalent in C++ you get this:

cat test.cc
extern "C" void* memcpy (...);

void p (void *d, const void *s)
{
  memcpy (d, s, 0);
}

 g++ -S  test.cc
test.cc:1:18: warning: declaration of ‘void* memcpy(...)’ conflicts with 
built-in declaration ‘void* memcpy(void*, const void*, long unsigned int)’ 
[-Wbuiltin-declaration-mismatch]
 extern "C" void* memcpy (...);
  ^~

You get the same warning in C when the Prototype does not match,
but unfortunately not when the prototype is missing.

I would prefer the same warning in C whenever the declaration does
not match.


Bernd.




Re: PATCH to fix -Wrestrict ICE (PR middle-end/83463)

2017-12-18 Thread Martin Sebor

On 12/18/2017 08:10 AM, Marek Polacek wrote:

I'm not entirely up to speed with this code, but this one seemed sufficiently
obvious: check INTEGRAL_TYPE_P before looking at a tree's min/max value.
Otherwise, go with maxobjsize.

Bootstrapped/regtested on x86_64-linux, ok for trunk?


Thanks for the fix.  I noticed the problem only comes up when
memcpy is declared without a prototype.  Otherwise, the memcpy
call is eliminated early on (during gimplification?)  So while
avoiding the ICE is obviously a necessary change, I wonder if
this bug also suggests a missing optimization that might still
be relevant (i.e., eliminating memcpy() with a zero size for
a memcpy without a prototype):

  void* memcpy ();

  void p (void *d, const void *s)
  {
memcpy (d, s, 0);
  }

GCC doesn't warn for this code which means it views the memcpy
as a built-in but it doesn't eliminate the call.  If I declare
memcpy for example like this, it complains about the mismatch
but it does eliminates the call:

  void* memcpy (long, long, long);

Martin



2017-12-18  Marek Polacek  

PR middle-end/83463
* gimple-ssa-warn-restrict.c (builtin_memref::builtin_memref):
Check if TYPE is INTEGRAL_TYPE_P before accessing its min/max
values.

* gcc.dg/pr83463.c: New test.

diff --git gcc/gimple-ssa-warn-restrict.c gcc/gimple-ssa-warn-restrict.c
index 4d424735d2a..d1a376637a2 100644
--- gcc/gimple-ssa-warn-restrict.c
+++ gcc/gimple-ssa-warn-restrict.c
@@ -287,13 +287,15 @@ builtin_memref::builtin_memref (tree expr, tree size)
  else
{
  gimple *stmt = SSA_NAME_DEF_STMT (offset);
+ tree type;
  if (is_gimple_assign (stmt)
- && gimple_assign_rhs_code (stmt) == NOP_EXPR)
+ && gimple_assign_rhs_code (stmt) == NOP_EXPR
+ && (type = TREE_TYPE (gimple_assign_rhs1 (stmt)))
+ && INTEGRAL_TYPE_P (type))
{
  /* Use the bounds of the type of the NOP_EXPR operand
 even if it's signed.  The result doesn't trigger
 warnings but makes their output more readable.  */
- tree type = TREE_TYPE (gimple_assign_rhs1 (stmt));
  offrange[0] = wi::to_offset (TYPE_MIN_VALUE (type));
  offrange[1] = wi::to_offset (TYPE_MAX_VALUE (type));
}
diff --git gcc/testsuite/gcc.dg/pr83463.c gcc/testsuite/gcc.dg/pr83463.c
index e69de29bb2d..735ef3c6dc8 100644
--- gcc/testsuite/gcc.dg/pr83463.c
+++ gcc/testsuite/gcc.dg/pr83463.c
@@ -0,0 +1,17 @@
+/* PR middle-end/83463 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -Wrestrict" } */
+
+int *a;
+void *memcpy ();
+void
+m (void *p1)
+{
+  memcpy (0, p1, 0);
+}
+
+void
+p ()
+{
+  m (p + (long) a);
+}

Marek





Re: C++ PATCH to fix wrong-code with constexpr call table cache (PR c++/83116)

2017-12-18 Thread Jason Merrill
On Mon, Dec 18, 2017 at 10:09 AM, Marek Polacek  wrote:
> Here the problem was that cxx_eval_call_expression can cache the result of a
> constexpr call in constexpr_call_table, but we have to be careful, after
> store_init_value the result might be invalid.  So I believe we also have to
> clear the constexpr call table.  I've lumped it together with clearing
> cv_cache.

Hmm, that seems like a big hammer; the problem isn't that
store_init_value makes the result invalid, it's that the result
calculated during store_init_value (when we can treat the object as
constant) isn't relevant later (when the object is no longer
constant).  So we want to avoid caching when we're called for the
initial value.  Maybe by changing

if (depth_ok && !non_constant_args)

to

if (depth_ok && !non_constant_args && ctx->strict)

?  Does that work?


Re: PATCH to fix -Wrestrict ICE (PR middle-end/83463)

2017-12-18 Thread Martin Sebor

On 12/18/2017 10:45 AM, Jakub Jelinek wrote:

On Mon, Dec 18, 2017 at 10:37:19AM -0700, Martin Sebor wrote:

On 12/18/2017 10:07 AM, Jakub Jelinek wrote:

On Mon, Dec 18, 2017 at 10:00:36AM -0700, Martin Sebor wrote:

On 12/18/2017 08:10 AM, Marek Polacek wrote:

I'm not entirely up to speed with this code, but this one seemed sufficiently
obvious: check INTEGRAL_TYPE_P before looking at a tree's min/max value.
Otherwise, go with maxobjsize.

Bootstrapped/regtested on x86_64-linux, ok for trunk?


Thanks for the fix.  I noticed the problem only comes up when
memcpy is declared without a prototype.  Otherwise, the memcpy
call is eliminated early on (during gimplification?)  So while
avoiding the ICE is obviously a necessary change, I wonder if
this bug also suggests a missing optimization that might still
be relevant (i.e., eliminating memcpy() with a zero size for
a memcpy without a prototype):


No, we shouldn't try to optimize if people mess up stuff.


Declaring memcpy() without a prototype isn't messing up.  It may
be poor practice but there's nothing inherently wrong with it.


There is.  Because the last argument is size_t, so calling it with 0
is UB (on LP64, on ILP32 I think it will be optimized normally).


It isn't optimized either way.  In fact, the only indication
of a problem in the code below is the new -Wrestrict warning:

  void* memcpy ();

  void p ()
  {
memcpy (0, 0, 0);
  }

  warning: ‘memcpy’ source argument is the same as destination [-Wrestrict]
 memcpy (0, 0, 0);
 ^~~~

  ;; Function p (p, funcdef_no=0, decl_uid=1892, cgraph_uid=0, 
symbol_order=0)


  p ()
  {
 [local count: 1073741825]:
memcpy (0, 0, 0); [tail call]
return;
  }

But you are conflating the ability to call such a built-in with
arguments of the wrong types with declaring it.  There really is
nothing wrong with declaring it as long as GCC checks the calls.

To ameliorate the undefined behavior you point out, GCC could do
one of three things: a) complain about declaring a built-in with
no prototype, or b) complain about calls to such built-ins that
pass arguments of the wrong type (or the wrong number of
arguments), or c) convert the arguments to the expected type
and still warn on mismatches).

In all cases all the information necessary to detect and diagnose
or even avoid the problem is available.  In fact, one might argue
that optimizing such calls (expanding them inline) would be
preferable to doing nothing and allowing the undefined behavior
to cause a bug at runtime.

Martin


Re: PATCH to fix -Wrestrict ICE (PR middle-end/83463)

2017-12-18 Thread Jakub Jelinek
On Mon, Dec 18, 2017 at 11:08:19AM -0700, Martin Sebor wrote:
> It isn't optimized either way.  In fact, the only indication
> of a problem in the code below is the new -Wrestrict warning:

So just call it as memcpy (0, 0, (size_t) 0); or memcpy (0, 0, 0UL);
on targets where size_t is unsigned long int, and it will be optimized away,
because it will be compatible with the builtin's prototype then.

gimple_call_builtin_p is the API that has been agreed upon to be used for
checking for builtins several years ago, there is absolutely no reason why
this pass needs to treat them differently.

Jakub
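Jakub's point can be checked directly: a zero-size copy is a no-op, and
once the argument types match the builtin's prototype GCC treats the
call as __builtin_memcpy and can fold it away (the fold itself is only
visible in the optimized dump).  A minimal example:

```cpp
#include <cstring>
#include <cstddef>

// With arguments matching the builtin prototype (void *, const void *,
// size_t), GCC recognizes the call as the memcpy builtin and removes
// the zero-size copy at -O2; the call is a no-op either way.
void copy_nothing (void *d, const void *s)
{
  std::memcpy (d, s, (std::size_t) 0);
}
```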


[PING] [PATCH] testsuite: add coverage for diagnostics relating to inlining (PR tree-optimization/83336)

2017-12-18 Thread David Malcolm
Is the tweak to prune.exp at the end of this patch OK?  (I can self-
approve the rest of this).


On Mon, 2017-12-11 at 15:44 -0500, David Malcolm wrote:
> In theory, the diagnostics subsystem can print context information on
> code inlining when diagnostics are emitted by the middle-end,
> describing
> the chain of inlined callsites that led to a particular warning,
> but PR tree-optimization/83336 describes various issues with this.
> 
> An underlying issue is that we have very little automated testing for
> this code: gcc.dg/tm/pr52141.c has a test, but in general, prune.exp
> filters out the various "inlined from" lines.
> 
> The following patch adds test coverage for it for C and C++ via a new
> testsuite plugin, which emits a warning from the middle-end; the test
> cases use dg-regexp to verify that the "inlined from" lines are
> emitted correctly, with the correct function names and source
> locations.
> 
> Doing so requires a change to prune.exp: the dg-regexp lines have to
> be handled *before* the "inlined from" lines are stripped.
> 
> (I have various followups, but establishing automated test coverage
> seems like an important first step)
> 
> Successfully bootstrapped on x86_64-pc-linux-gnu; adds
> 7 PASS results to g++.sum and 31 PASS results to gcc.sum.
> 
> OK for trunk?
> 
> gcc/testsuite/ChangeLog:
>   PR tree-optimization/83336
>   * g++.dg/cpp0x/missing-initializer_list-include.C: Update for
>   changes to prune.exp's handling of dg-regexp.
>   * g++.dg/plugin/diagnostic-test-inlining-1.C: New test case.
>   * g++.dg/plugin/plugin.exp (plugin_test_list): Add it, via
>   gcc.dg's plugin/diagnostic_plugin_test_inlining.c.
>   * gcc.dg/plugin/diagnostic-test-inlining-1.c: New test case.
>   * gcc.dg/plugin/diagnostic-test-inlining-2.c: Likewise.
>   * gcc.dg/plugin/diagnostic-test-inlining-3.c: Likewise.
>   * gcc.dg/plugin/diagnostic-test-inlining-4.c: Likewise.
>   * gcc.dg/plugin/diagnostic_plugin_test_inlining.c: New test
>   plugin.
>   * gcc.dg/plugin/plugin.exp (plugin_test_list): Add them.
>   * lib/prune.exp (prune_gcc_output): Move call to handle-dg-
> regexps
>   to before the various text stripping regsub invocations,
>   in particular, to before the stripping of "inlined from".
> ---
>  .../cpp0x/missing-initializer_list-include.C   |   1 +
>  .../g++.dg/plugin/diagnostic-test-inlining-1.C |  34 
>  gcc/testsuite/g++.dg/plugin/plugin.exp |   2 +
>  .../gcc.dg/plugin/diagnostic-test-inlining-1.c |  34 
>  .../gcc.dg/plugin/diagnostic-test-inlining-2.c |  48 ++
>  .../gcc.dg/plugin/diagnostic-test-inlining-3.c |  43 +
>  .../gcc.dg/plugin/diagnostic-test-inlining-4.c |  56 +++
>  .../plugin/diagnostic_plugin_test_inlining.c   | 180
> +
>  gcc/testsuite/gcc.dg/plugin/plugin.exp |   5 +
>  gcc/testsuite/lib/prune.exp|   6 +-
>  10 files changed, 406 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/plugin/diagnostic-test-
> inlining-1.C
>  create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-
> inlining-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-
> inlining-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-
> inlining-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-
> inlining-4.c
>  create mode 100644
> gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_inlining.c
> 
> diff --git a/gcc/testsuite/g++.dg/cpp0x/missing-initializer_list-
> include.C b/gcc/testsuite/g++.dg/cpp0x/missing-initializer_list-
> include.C
> index 7d72ec4..1010b0a 100644
> --- a/gcc/testsuite/g++.dg/cpp0x/missing-initializer_list-include.C
> +++ b/gcc/testsuite/g++.dg/cpp0x/missing-initializer_list-include.C
> @@ -24,5 +24,6 @@ void test (int i)
>  +#include 
>   /* This is padding (to avoid the generated patch containing DejaGnu
>  directives).  */
> + 
>  { dg-end-multiline-output "" }
>  #endif
> diff --git a/gcc/testsuite/g++.dg/plugin/diagnostic-test-inlining-1.C 
> b/gcc/testsuite/g++.dg/plugin/diagnostic-test-inlining-1.C
> new file mode 100644
> index 000..df7bb1f
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/plugin/diagnostic-test-inlining-1.C
> @@ -0,0 +1,34 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Wno-attributes -fdiagnostics-show-caret" } */
> +
> +extern void __emit_warning (const char *message);
> +
> +/* Verify that the diagnostic subsytem describes the chain of
> inlining
> +   when reporting the warning.  */
> +
> +__attribute__((always_inline))
> +static void foo (void)
> +{
> +  __emit_warning ("message");
> +}
> +
> +__attribute__((always_inline))
> +static void bar (void)
> +{
> +  foo ();
> +}
> +
> +int main()
> +{
> +  bar ();
> +  return 0;
> +}
> +
> +/* { dg-regexp "In function 'void foo\\(\\)'," "" } */
> +/* { dg-regexp "inlined from 'void bar\\(\\)' at .+/diagnostic-
> 

Re: [v3 PATCH] PR libstdc++/68430

2017-12-18 Thread Jonathan Wakely
On 16 December 2017 at 12:30, Ville Voutilainen  wrote:
> The compiler-powered is_constructible that we have in gcc 8 is powerful
> enough to give the right answer to an is_constructible question
> that would be hard for a pure-library implementation to get right
> in a well-formed fashion. This is just adding a test for it. Tested
> on Linux-PPC64, OK for trunk?

OK, thanks.


Re: [PING] [PATCH] testsuite: add coverage for diagnostics relating to inlining (PR tree-optimization/83336)

2017-12-18 Thread Jeff Law
On 12/18/2017 10:14 AM, David Malcolm wrote:
> Is the tweak to prune.exp at the end of this patch OK?  (I can self-
> approve the rest of this).
Yes.
jeff


Re: PATCH to fix -Wrestrict ICE (PR middle-end/83463)

2017-12-18 Thread Martin Sebor

On 12/18/2017 11:32 AM, Bernd Edlinger wrote:

Hi Martin,


In all cases all the information necessary to detect and diagnose
or even avoid the problem is available.  In fact, one might argue
that optimizing such calls (expanding them inline) would be
preferable to doing nothing and allowing the undefined behavior
to cause a bug at runtime.


I think the right thing to do here would emit a warning for
declaring the builtin with the wrong prototype.

If you try to do the equivalent in C++ you get this:

cat test.cc
extern "C" void* memcpy (...);

void p (void *d, const void *s)
{
  memcpy (d, s, 0);
}

 g++ -S  test.cc
test.cc:1:18: warning: declaration of ‘void* memcpy(...)’ conflicts with 
built-in declaration ‘void* memcpy(void*, const void*, long unsigned int)’ 
[-Wbuiltin-declaration-mismatch]
 extern "C" void* memcpy (...);
  ^~

You get the same warning in C when the Prototype does not match,
but unfortunately not when the prototype is missing.

I would prefer the same warning in C whenever the declaration does
not match.


That sounds good to me.

I do think that, in addition, issuing another warning for calls
with arguments of the wrong type (or wrong number) would be useful
as well, or in fact even more so.  It will make it possible to
distinguish safe calls to such functions from unsafe ones.  These
two could be part of the same enhancement or independent of one
another.

Martin


Re: [patch AArch64] Do not perform a vector splat for vector initialisation if it is not useful

2017-12-18 Thread Jeff Law
On 12/11/2017 08:44 AM, James Greenhalgh wrote:
> Hi,
> 
> In the testcase in this patch we create an SLP vector with only two
> elements. Our current vector initialisation code will first duplicate
> the first element to both lanes, then overwrite the top lane with a new
> value.
> 
> This duplication can be clunky and wasteful.
> 
> Better would be to simply use the fact that we will always be overwriting
> the remaining bits, and simply move the first element to the correct place
> (implicitly zeroing all other bits).
> 
> This reduces the code generation for this case, and can allow more
> efficient addressing modes, and other second order benefits for AArch64
> code which has been vectorized to V2DI mode.
> 
> Note that the change is generic enough to catch the case for any vector
> mode, but is expected to be most useful for 2x64-bit vectorization.
> 
> Unfortunately, on its own, this would cause failures in
> gcc.target/aarch64/load_v2vec_lanes_1.c and
> gcc.target/aarch64/store_v2vec_lanes.c , which expect to see many more
> vec_merge and vec_duplicate for their simplifications to apply. To fix this,
> add a special case to the AArch64 code if we are loading from two memory
> addresses, and use the load_pair_lanes patterns directly.
> 
> We also need a new pattern in simplify-rtx.c:simplify_ternary_operation , to
> catch:
> 
>   (vec_merge:OUTER
>  (vec_duplicate:OUTER x:INNER)
>  (subreg:OUTER y:INNER 0)
>  (const_int N))
> 
> And simplify it to:
> 
>   (vec_concat:OUTER x:INNER y:INNER) or (vec_concat y x)
> 
> This is similar to the existing patterns which are tested in this function,
> without requiring the second operand to also be a vec_duplicate.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu and tested on
> aarch64-none-elf.
> 
> Note that this requires 
> https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00614.html
> if we don't want to ICE creating broken vector zero extends.
> 
> Are the non-AArch64 parts OK?
> 
> Thanks,
> James
> 
> ---
> 2017-12-11  James Greenhalgh  
> 
>   * config/aarch64/aarch64.c (aarch64_expand_vector_init): Modify code
>   generation for cases where splatting a value is not useful.
>   * simplify-rtx.c (simplify_ternary_operation): Simplify vec_merge
>   across a vec_duplicate and a paradoxical subreg forming a vector
>   mode to a vec_concat.
> 
> 2017-12-11  James Greenhalgh  
> 
>   * gcc.target/aarch64/vect-slp-dup.c: New.
> 
> 
> 0001-patch-AArch64-Do-not-perform-a-vector-splat-for-vect.patch
> 
> 

> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> index 806c309..ed16f70 100644
> --- a/gcc/simplify-rtx.c
> +++ b/gcc/simplify-rtx.c
> @@ -5785,6 +5785,36 @@ simplify_ternary_operation (enum rtx_code code, 
> machine_mode mode,
>   return simplify_gen_binary (VEC_CONCAT, mode, newop0, newop1);
>   }
>  
> +   /* Replace:
> +
> +   (vec_merge:outer (vec_duplicate:outer x:inner)
> +(subreg:outer y:inner 0)
> +(const_int N))
> +
> +  with (vec_concat:outer x:inner y:inner) if N == 1,
> +  or (vec_concat:outer y:inner x:inner) if N == 2.
> +
> +  Implicitly, this means we have a paradoxical subreg, but such
> +  a check is cheap, so make it anyway.
I'm going to assume that N == 0 and N == 3 are handled elsewhere and do
not show up here in practice.


So is it advisable to handle the case where the VEC_DUPLICATE and SUBREG
show up in the opposite order?  Or is there some canonicalization that
prevents that?

simplify-rtx bits are OK as-is if we're certain we're not going to get
the alternate ordering of the VEC_MERGE operands.  Also OK if you either
generalize this chunk of code or duplicate & twiddle it to handle the
alternate order.

I didn't look at the aarch64 specific bits.

jeff



Re: PATCH to fix -Wrestrict ICE (PR middle-end/83463)

2017-12-18 Thread Jakub Jelinek
On Mon, Dec 18, 2017 at 04:04:11PM -0700, Martin Sebor wrote:
> On 12/18/2017 12:04 PM, Jakub Jelinek wrote:
> > On Mon, Dec 18, 2017 at 11:08:19AM -0700, Martin Sebor wrote:
> > > It isn't optimized either way.  In fact, the only indication
> > > of a problem in the code below is the new -Wrestrict warning:
> > 
> > So just call it as memcpy (0, 0, (size_t) 0); or memcpy (0, 0, 0UL);
> > on targets where size_t is unsigned long int, and it will be optimized away,
> > because it will be compatible with the builtin's prototype then.
> 
> A call to memcpy(0, 0, (size_t)0) is not eliminated on any target,
> presumably because the type of 0 is int and not void*.  "Don't
> write bad code" is also not an answer to the question of why
> shouldn't GCC eliminate the bad code (null pointer) when there
> is no prototype given that it eliminates it when the function
> is declared with one.  But I'm not insisting on it.  What
> I think will much more helpful is diagnosing such bad calls,
> similarly to how attribute sentinel diagnoses the analogous
> subset of the problem involving variadic functions.
> 
> > gimple_call_builtin_p is the API that has been agreed upon to be used for
> > checking for builtins several years ago, there is absolutely no reason why
> > this pass needs to treat them differently.
> 
> I already explained the reason: to help detect bugs.  That's

Your warning is about restrict and argument overlap, what does it have to do
with unprototyped calls?  Nothing.  There is no restrict in that case, and
it isn't handled as builtin if it doesn't match the builtin's prototype.

The user can use -Wstrict-prototypes and -Wbuiltin-declaration-mismatch already,
and if needed, we can resurrect the "Add new warning -Wunprototyped-calls" thread
for the rest.  That is actually something that can help (the remaining few users
that actually still use unprototyped calls).

Jakub


Re: PATCH to fix -Wrestrict ICE (PR middle-end/83463)

2017-12-18 Thread Jakub Jelinek
On Mon, Dec 18, 2017 at 05:03:22PM -0700, Martin Sebor wrote:
> > Your warning is about restrict and argument overlap, what does it have to do
> > with unprototyped calls?  Nothing.  There is no restrict in that case, and
> > it isn't handled as builtin if it doesn't match the builtin's prototype.
> 
> $ cat a.c && gcc -O2 -S -Wrestrict a.c
> void* memcpy ();
> 
> char a[5];
> 
> void f (void)
> {
>   memcpy (a, a + 1, 3);
> }
> a.c: In function ‘f’:
> a.c:7:3: warning: ‘memcpy’ accessing 3 bytes at offsets 0 and 1 overlaps 2
> bytes at offset 1 [-Wrestrict]
>memcpy (a, a + 1, 3);
>^~~~
> 
> By insisting on using gimple_call_builtin_p() you're effectively
> arguing to disable the warning above, for no good reason that I
> can see.

Because it is wrong.

Consider e.g.
void *mempcpy ();

void
foo (int *a)
{
  mempcpy (a, (float *) a + 1, ' ');
}

mempcpy isn't defined in ISO C99, so it is fine if you define it yourself
with void *mempcpy (int *, float *, char); prototype and do whatever you
want in there.  Warning about restrict doesn't make sense in that case.

Jakub


[PATCH] Fix off by one error in loop-unroll.c (PR rtl-optimization/82675).

2017-12-18 Thread Martin Liška
Hello.

Following patch fixes an off-by-one error in the loop unroller, where wont_exit
does not have enough elements for the bits that are later queried.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin

gcc/ChangeLog:

2017-11-08  Martin Liska  

PR rtl-optimization/82675
* loop-unroll.c (unroll_loop_constant_iterations): Allocate one
more element in sbitmap.
---
 gcc/loop-unroll.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/gcc/loop-unroll.c b/gcc/loop-unroll.c
index 91bf5dddeed..2fab92bcf74 100644
--- a/gcc/loop-unroll.c
+++ b/gcc/loop-unroll.c
@@ -477,7 +477,7 @@ unroll_loop_constant_iterations (struct loop *loop)
 
   exit_mod = niter % (max_unroll + 1);
 
-  auto_sbitmap wont_exit (max_unroll + 1);
+  auto_sbitmap wont_exit (max_unroll + 2);
   bitmap_ones (wont_exit);
 
   auto_vec<edge> remove_edges;



Re: [RFA][PATCH] Stack clash protection 07/08 -- V4 (aarch64 bits)

2017-12-18 Thread Jeff Law
On 11/27/2017 11:44 AM, James Greenhalgh wrote:
> On Wed, Nov 22, 2017 at 06:28:24PM +, Jeff Law wrote:
>> On 11/21/2017 04:57 AM, James Greenhalgh wrote:
>>> I've finally built up enough courage to start getting my head around this...
>> Can't blame you for avoiding :-) This stuff isn't my idea of fun either.
> 
> Right, here's where I'm up to... I think that I have a reasonable grasp of
> the overall intentions for this patch and what we're trying to do.
Good.  I realize it's somewhat painful.  Wilco's feedback WRT what is
and what is not possible in aarch64 prologues has been invaluable in
simplifying things.  But in the end it's just painful.



> 
> We want to put out regular probes (writes of zero) to the stack, at a regular
> interval (given by PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL ), starting
> at a known offset from the stack
> base ( PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE ). There's some subtetly
> around when and where we need probes, what the efficient sequences to
> achieve that would be, etc. and handling that complexity, and the trouble
> of having two possible stack adjustments to protect (initial adjustment and
> final adjustment), is where most of the body of the patch comes from.
Right.

> 
> So, as far as I can read there are a couple of design points that I need
> clarified. If we get through these it might be possible for Wilco to pick
> up the patch and optimise the cases he's marked as a concern earlier.
> 
> Let me try to expand on these...
> 
> One point of contention in the review so far has been whether the default
> values for PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL and
> PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE are correct for AArch64. As this
> has some implications for the algorithm we would like to use for probing,
> where exactly we'd like to emit the probes, and might need a glibc patch
> it will be helpful to narrow this down to an answer.
So the guard size came out of a discussion with Richard Earnshaw.  I
think there's a general agreement on that.  It was explicitly decided to
punt ILP32 on aarch64 and not worry about it.  So that's really the one
thing that could throw a wrench into the guard size.


> 
> I see two positions.
> 
> Wilco would like both parameters set to 16 - that means (ignoring the
> caller-protected stuff for now) a guard page size of 64k, and a probe
> every 64k.
> 
> Jeff's current patch has a 64k guard page size, and a probe every 4k.
> 
> What concerns me is this from Jeff from couple of emails back:
> 
>> Given the ISA I wouldn't expect an interval > 12 to be useful or necessarily
>> even work correctly.
> 
> There's a clear difference of opinion here!
So the issue is that once you get an interval > 12 you're unable to
adjust the stack pointer in a single instruction without using a scratch
register.

*Most* of the code tries to handle this.  The notable exception is
aarch64_output_probe_stack_range which is used by both stack clash
protection and Ada style -fstack-check.

For stack clash prologues we get there via

aarch64_expand_prologue -> aarch64_allocate_and_probe_stack_space

Which implies that we have to have another scratch register in the
prologue code.  We already use IP0_REGNUM and IP1_REGNUM depending on
context.  So neither of those is available.  I'm not familiar enough
with aarch64 to know if there's another register we can safely use.

Without another register we can safely use, probing at intervals > 4k is
a non-starter AFAICT.

> I'd guess the difference of opinion is whether you need to probe every page
> (4k) or whether you need to probe on each multiple of the minimum guard page
> size (64k), but maybe it is just a difficulty in juggling registers? For
> Wilco's position to work, we'd need to be happy emitting a probe once for
> every 64k - from the sounds of the other comments on this thread, there's no
> problem with that - we could have a minimum size of 64k for
> PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL (though the current code would
> probably need to change to accommodate that). Clearly we can reduce the
> number of probes required to 1/16th if this were feasible.
It's primarily a matter of getting a register.

> 
> So, first design question - other than potential troubles grabbing scratch
> registers (we'll revisit this), is there a reason that would stop us deciding
> to default to 64k for both parameters on AArch64?
Florian might have comments here.


> 
> Where this gets messy is handling the stack write. The current design 
> will drop the stack pointer, then emit a probe back near the top where we
> were previously sitting. i.e.
> 
>   sp = sp - 4096
>   *(sp + 4088) = 0
> 
> For larger probe intervals, we end up juggling bigger numbers, which makes
> the calculation slightly more unwiedly.
Right.

> 
> One suggestion from Wilco would be to change this sequence to:
> 
>   sp = sp - 4096
>   *(sp) = 0
> 
> In other words, drop the stack and then emit a probe where we are now
> sitting. That would 

[PATCH, rs6000] Add vec_mergee, vec_mergeo, vec_float2 builtin support

2017-12-18 Thread Carl Love
GCC maintainers:

The following patch adds support for some missing instances of builtins,
vec_mergee, vec_mergeo, vec_float2.  This patch adds the missing GCC
functionality and test cases for the builtins.  The patch has been run
on:

  powerpc64le-unknown-linux-gnu (Power 8 LE)
  powerpc64le-unknown-linux-gnu (Power 8 BE)
  powerpc64le-unknown-linux-gnu (Power 9 LE)

without regressions.  

Please let me know if the following patch is acceptable.  Thanks.

  Carl Love




gcc/ChangeLog:

2017-12-18  Carl Love  

* config/rs6000/altivec.md (p8_vmrgow): Add support for V2DI, V2DF,
V4SI, V4SF types.
(p8_vmrgew): Add support for V2DI, V2DF, V4SF types.
* config/rs6000/rs6000-builtin.def: Add definitions for FLOAT2_V2DF,
VMRGEW_V2DI, VMRGEW_V2DF, VMRGEW_V4SF, VMRGOW_V4SI, VMRGOW_V4SF,
VMRGOW_V2DI, VMRGOW_V2DF.  Remove definition for VMRGOW.
* config/rs6000/rs6000-c.c (VSX_BUILTIN_VEC_FLOAT2,
P8V_BUILTIN_VEC_VMRGEW, P8V_BUILTIN_VEC_VMRGOW):  Add definitions.
* config/rs6000/rs6000-protos.h: Add extern definition for
rs6000_generate_float2_double_code.
* config/rs6000/rs6000.c (rs6000_generate_float2_double_code): Add
function.
* config/rs6000/vsx.md (vsx_xvcdpsp): Add define_insn.
(float2_v2df): Add define_expand.

gcc/testsuite/ChangeLog:

2017-12-18 Carl Love  

* gcc.target/powerpc/builtins-1.c (main): Add tests for vec_mergee and
vec_mergeo builtins with float, double, long long, unsigned long long,
bool long long  arguments.
* gcc.target/powerpc/builtins-3-runnable.c (main): Add test for
vec_float2 with double arguments.
* gcc.target/powerpc/builtins-mergew-mergow.c: New runnable test for the
vec_mergew and vec_mergow builtins.
---
 gcc/config/rs6000/altivec.md   |  48 +++-
 gcc/config/rs6000/rs6000-builtin.def   |   9 +-
 gcc/config/rs6000/rs6000-c.c   |  32 ++-
 gcc/config/rs6000/rs6000-protos.h  |   1 +
 gcc/config/rs6000/rs6000.c |  37 +++
 gcc/config/rs6000/vsx.md   |  27 +++
 gcc/testsuite/gcc.target/powerpc/builtins-1.c  |  12 +
 .../gcc.target/powerpc/builtins-3-runnable.c   |   4 +
 .../gcc.target/powerpc/builtins-mergew-mergow.c| 263 +
 9 files changed, 423 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-mergew-mergow.c

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 7122f99..8e0dfcf 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1,3 +1,4 @@
+
 ;; AltiVec patterns.
 ;; Copyright (C) 2002-2017 Free Software Foundation, Inc.
 ;; Contributed by Aldy Hernandez (a...@quesejoda.com)
@@ -1318,6 +1319,24 @@
 }
   [(set_attr "type" "vecperm")])
 
+;; Power8 vector merge two V2DF/V2DI even words to V2DF
+(define_expand "p8_vmrgew_<mode>"
+  [(use (match_operand:VSX_D 0 "vsx_register_operand" ""))
+   (use (match_operand:VSX_D 1 "vsx_register_operand" ""))
+   (use (match_operand:VSX_D 2 "vsx_register_operand" ""))]
+  "VECTOR_MEM_VSX_P (<MODE>mode)"
+{
+  rtvec v;
+  rtx x;
+
+  v = gen_rtvec (2, GEN_INT (0), GEN_INT (2));
+  x = gen_rtx_VEC_CONCAT (<VS_double>mode, operands[1], operands[2]);
+
+  x = gen_rtx_VEC_SELECT (<MODE>mode, x, gen_rtx_PARALLEL (VOIDmode, v));
+  emit_insn (gen_rtx_SET (operands[0], x));
+  DONE;
+})
+
 ;; Power8 vector merge two V4SF/V4SI even words to V4SF
(define_insn "p8_vmrgew_<mode>"
   [(set (match_operand:VSX_W 0 "register_operand" "=v")
@@ -1336,12 +1355,12 @@
 }
   [(set_attr "type" "vecperm")])
 
-(define_insn "p8_vmrgow"
-  [(set (match_operand:V4SI 0 "register_operand" "=v")
-   (vec_select:V4SI
- (vec_concat:V8SI
-   (match_operand:V4SI 1 "register_operand" "v")
-   (match_operand:V4SI 2 "register_operand" "v"))
+(define_insn "p8_vmrgow_<mode>"
+  [(set (match_operand:VSX_W 0 "register_operand" "=v")
+	(vec_select:VSX_W
+	  (vec_concat:<VS_double>
+	    (match_operand:VSX_W 1 "register_operand" "v")
+	    (match_operand:VSX_W 2 "register_operand" "v"))
  (parallel [(const_int 1) (const_int 5)
 (const_int 3) (const_int 7)])))]
   "TARGET_P8_VECTOR"
@@ -1353,6 +1372,23 @@
 }
   [(set_attr "type" "vecperm")])
 
+(define_expand "p8_vmrgow_<mode>"
+  [(use (match_operand:VSX_D 0 "vsx_register_operand" ""))
+   (use (match_operand:VSX_D 1 "vsx_register_operand" ""))
+   (use (match_operand:VSX_D 2 "vsx_register_operand" ""))]
+  "VECTOR_MEM_VSX_P (<MODE>mode)"
+{
+  rtvec v;
+  rtx x;
+
+  v = gen_rtvec (2, GEN_INT (1), GEN_INT (3));
+  x = gen_rtx_VEC_CONCAT (<VS_double>mode, operands[1], operands[2]);
+
+  x = gen_rtx_VEC_SELECT (<MODE>mode, x, gen_rtx_PARALLEL (VOIDmode, v));
+  emit_insn (gen_rtx_SET (operands[0], 

Re: PATCH to fix -Wrestrict ICE (PR middle-end/83463)

2017-12-18 Thread Martin Sebor

On 12/18/2017 12:04 PM, Jakub Jelinek wrote:

On Mon, Dec 18, 2017 at 11:08:19AM -0700, Martin Sebor wrote:

It isn't optimized either way.  In fact, the only indication
of a problem in the code below is the new -Wrestrict warning:


So just call it as memcpy (0, 0, (size_t) 0); or memcpy (0, 0, 0UL);
on targets where size_t is unsigned long int, and it will be optimized away,
because it will be compatible with the builtin's prototype then.


A call to memcpy(0, 0, (size_t)0) is not eliminated on any target,
presumably because the type of 0 is int and not void*.  "Don't
write bad code" is also not an answer to the question of why
shouldn't GCC eliminate the bad code (null pointer) when there
is no prototype given that it eliminates it when the function
is declared with one.  But I'm not insisting on it.  What
I think will much more helpful is diagnosing such bad calls,
similarly to how attribute sentinel diagnoses the analogous
subset of the problem involving variadic functions.


gimple_call_builtin_p is the API that has been agreed upon to be used for
checking for builtins several years ago, there is absolutely no reason why
this pass needs to treat them differently.


I already explained the reason: to help detect bugs.  That's
the essential purpose of the pass (although this particular
bug wasn't necessarily its original focus).  Using an API that
would result in compromising this goal without offering other
benefits would be self-defeating.

Martin


Re: [PATCH] avoid false negatives in attr-nonstring-3.c (PR 83131)

2017-12-18 Thread Jeff Law
On 12/11/2017 03:54 PM, Martin Sebor wrote:
> The attr-nonstring-3.c test fails on targets that expand
> the calls to some of the tested string functions in builtins.c,
> before they reach the checker in calls.c.  The failures were
> reported on powerpc64le but tests can be constructed that fail
> even on other targets (including x86_64).
> 
> To fix these failures the checker needs to be invoked both
> in builtins.c when the expansion takes place and in calls.c
> otherwise.
> 
> The attached patch does that.  Since it also adjusts
> the indentation in the changed functions, I used diff -w
> to leave the whitespace changes out of it.
> 
> Bootstrapped and tested on x86_64-linux.  I verified the tests
> pass using a powerpc64le-linux cross-compiler.
> 
> Martin
> 
> gcc-83131.diff-w
> 
> 
> PR testsuite/83131 - c-c++/common/attr-nonstring-3 failure for strcmp tests 
> on PowerPC
> 
> gcc/ChangeLog:
> 
>   PR testsuite/83131
>   * builtins.c (expand_builtin_strlen): Use get_callee_fndecl.
>   (expand_builtin_strcmp): Call maybe_warn_nonstring_arg. 
>   (expand_builtin_strncmp): Same.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR testsuite/83131
>   * c-c++-common/attr-nonstring-4.c: New test.
OK.  Thanks for the -w output, it certainly makes this one easier to read.



jeff


Re: PATCH to fix -Wrestrict ICE (PR middle-end/83463)

2017-12-18 Thread Martin Sebor

On 12/18/2017 04:41 PM, Jakub Jelinek wrote:

On Mon, Dec 18, 2017 at 04:04:11PM -0700, Martin Sebor wrote:

On 12/18/2017 12:04 PM, Jakub Jelinek wrote:

On Mon, Dec 18, 2017 at 11:08:19AM -0700, Martin Sebor wrote:

It isn't optimized either way.  In fact, the only indication
of a problem in the code below is the new -Wrestrict warning:


So just call it as memcpy (0, 0, (size_t) 0); or memcpy (0, 0, 0UL);
on targets where size_t is unsigned long int, and it will be optimized away,
because it will be compatible with the builtin's prototype then.


A call to memcpy(0, 0, (size_t)0) is not eliminated on any target,
presumably because the type of 0 is int and not void*.  "Don't
write bad code" is also not an answer to the question of why
shouldn't GCC eliminate the bad code (null pointer) when there
is no prototype given that it eliminates it when the function
is declared with one.  But I'm not insisting on it.  What
I think will much more helpful is diagnosing such bad calls,
similarly to how attribute sentinel diagnoses the analogous
subset of the problem involving variadic functions.


gimple_call_builtin_p is the API that has been agreed upon to be used for
checking for builtins several years ago, there is absolutely no reason why
this pass needs to treat them differently.


I already explained the reason: to help detect bugs.  That's


Your warning is about restrict and argument overlap, what does it have to do
with unprototyped calls?  Nothing.  There is no restrict in that case, and
it isn't handled as builtin if it doesn't match the builtin's prototype.


$ cat a.c && gcc -O2 -S -Wrestrict a.c
void* memcpy ();

char a[5];

void f (void)
{
  memcpy (a, a + 1, 3);
}
a.c: In function ‘f’:
a.c:7:3: warning: ‘memcpy’ accessing 3 bytes at offsets 0 and 1 overlaps 
2 bytes at offset 1 [-Wrestrict]
   memcpy (a, a + 1, 3);
   ^~~~

By insisting on using gimple_call_builtin_p() you're effectively
arguing to disable the warning above, for no good reason that I
can see.

I'm happy to take suggestions to improve the pass and fix bugs
in it.  But I see no point in debating whether it should be
changed in ways that would prevent it from diagnosing bugs like
the one above.  That would be counter-productive.

Martin


Re: [PATCH] Improve vect_create_epilog_for_reduction (PR tree-optimization/80631)

2017-12-18 Thread Jeff Law
On 12/18/2017 02:50 PM, Jakub Jelinek wrote:
> Hi!
> 
> When backporting the wrong-code bugfix parts of PR80631 to 7.3, I've noticed
> that we perform the optimization to use the induc_val only when reduc_fn is
> IFN_REDUC_{MAX,MIN}.  That is true e.g. for AVX2, but not plain SSE2, so if
> we have a loop like:
> void foo (int *v)
> {
>   int found_index = -17;
>   for (int k = 0; k < 64; k++)
> if (v[k] == 77)
>   found_index = k;
>   return found_index;
> }
> we start with { -17, -17, -17... } only for avx* and can optimize away
> the scalar if (reduc_result == -17 ? -17 : reduc_result, but for sse*
> we start instead with { -1, -1, -1... } and can't optimize
> if (reduc_result == -1 ? -17 : reduc_result.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?
> 
> 2017-12-18  Jakub Jelinek  
> 
>   PR tree-optimization/80631
>   * tree-vect-loop.c (vect_create_epilog_for_reduction): Compare
>   induc_code against MAX_EXPR or MIN_EXPR instead of reduc_fn against
>   IFN_REDUC_MAX or IFN_REDUC_MIN.
OK.
jeff


Re: [RFC PATCH] Avoid PRED_NEGATIVE_RETURN prediction on likely -1/0/1 comparison functions (PR middle-end/81914)

2017-12-18 Thread Jeff Law
On 12/13/2017 07:39 AM, Jakub Jelinek wrote:
> Hi!
> 
> While the PRED_NEGATIVE_RETURN heuristic generally works quite well, for
> qsort comparison functions and similar, including the planned C++
> spaceship operator <=> where typically negative and positive are
> approximately even it doesn't work that well.  This patch is an attempt
> to at least detect some of these cases.  It won't catch functions where
> also other values are returned (e.g. a - b or similar), but then it would be
> even harder to make a distinction.
> 
> Bootstrapped/regtested on {x86_64,i686,powerpc64le}-linux, regtest on
> powerpc64-linux pending.  Honza, if it doesn't look completely bogus to you,
> could you give it a spin on SPEC (which I don't have easy access to)?
Note you can have access to SPEC.  Red Hat is appropriately licensed.
Contact Vlad to get suitable bits.

> 
> 2017-12-13  Jakub Jelinek  
> 
>   PR middle-end/81914
>   * predict.c (zero_one_minusone): New function.
>   (apply_return_prediction): Avoid return prediction for functions
>   returning only -1, 0 and 1 values, unless they only return -1 and 0
>   or 0 and 1.
Seems reasonable to me.  You call on how long to wait for Honza.

Jeff


Re: [i386] PR81842 [CET] -fcf-protection -mcet is incompatible with makecontext family functions

2017-12-18 Thread Jeff Law
On 12/18/2017 12:39 PM, Sandra Loosemore wrote:
> On 12/17/2017 05:05 PM, Tsimbalist, Igor V wrote:
>> -fcf-protection -mcet is incompatible with makecontext family functions
>> since they can't properly set up and destroy the shadow stack pointer.  This
>> change provides a mechanism to help detect shadow stack compatibility.
>> The current proposal is to add -mcheck-shstk-compat option which will
>> predefine __CHECK_SHSTK_COMPAT__ macro. The option will be
>> set on by default.  Then we can add a code
>>
>> #if defined __SHSTK__ && defined __CHECK_SHSTK_COMPAT__
>> # error This source is incompatible with -mshstk
>> #endif
>>
>> to .
> 
> The functional change here is out of my maintainership domain, but
> Why does this need a new macro and a new option to control it?  If the
> code being protected doesn't work properly with -mshstk, it seems like
> it would be more robust to do just
> 
> #if defined __SHSTK__
> # error This source is incompatible with -mshstk
> #endif
> 
> I don't see any discussion in the bugzilla issue to explain this.
I'd tend to agree.  Making another option to handle this seems excessive.

jeff