Re: [PATCH] i386: Roundeven expansion for SSE4.1+

2019-07-30 Thread Tejas Joshi
Hi.

> > * gcc.target/i386/avx-vround-roundeven-1.c: New test.
> > * gcc.target/i386/avx-vround-roundeven-2.c: New test.
>
> roundss and roundsd are sse4_1 instructions, also please change tests
> to use -O2:

I have made the changes you suggested and changed the file names to:

* gcc.target/i386/sse4_1-round-roundeven-1.c: New test.
* gcc.target/i386/sse4_1-round-roundeven-2.c: New test.
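
For reference, a runtime counterpart built on the sse4_1-check.h harness you
suggested could look roughly like the sketch below (hypothetical, not
necessarily the committed sse4_1-round-roundeven-2.c):

/* { dg-do run } */
/* { dg-require-effective-target sse4 } */
/* { dg-options "-O2 -msse4.1" } */

#include "sse4_1-check.h"

extern void abort (void);

static void
sse4_1_test (void)
{
  /* roundeven rounds halfway cases to the nearest even value.  */
  if (__builtin_roundeven (0.5) != 0.0
      || __builtin_roundeven (1.5) != 2.0
      || __builtin_roundevenf (-2.5f) != -2.0f)
    abort ();
}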

Thanks,
Tejas

On Wed, 31 Jul 2019 at 11:26, Tejas Joshi  wrote:
>
> Previous mail got sent to your mail only. Will send on mailing list.
> Sorry for the trouble.
>
> On Wed, 31 Jul 2019 at 10:52, Tejas Joshi  wrote:
> >
> > Hi.
> >
> > > > * gcc.target/i386/avx-vround-roundeven-1.c: New test.
> > > > * gcc.target/i386/avx-vround-roundeven-2.c: New test.
> > >
> > > roundss and roundsd are sse4_1 instructions, also please change tests
> > > to use -O2:
> >
> > I have made the following changes you suggested and changed the file names 
> > to:
> >
> > * gcc.target/i386/sse4_1-round-roundeven-1.c: New test.
> > * gcc.target/i386/sse4_1-round-roundeven-2.c: New test.
> >
> > Thanks,
> > Tejas
> >
> >
> > On Sat, 27 Jul 2019 at 14:34, Uros Bizjak  wrote:
> > >
> > > On Wed, Jul 24, 2019 at 12:43 PM Tejas Joshi  
> > > wrote:
> > > >
> > > > Hi.
> > > > This is a patch that Uros suggested for roundeven expansion, here.
> > > > Thanks for the heads up.
> > > > 
> > > > I have rerun the testsuite on the patch, it survives the regression
> > > > tests and bootstraps on x86_64-linux-gnu. Note, patch to be applied on
> > > > top of
> > > > 
> > > >
> > > > Thanks,
> > > > Tejas
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > 2019-07-24  Tejas Joshi  
> > > >
> > > > * builtins.c (mathfn_built_in_2): Change CASE_MATHFN to
> > > > CASE_MATHFN_FLOATN for roundeven.
> > > > * config/i386/i386.c (ix86_i387_mode_needed): Add case 
> > > > I387_ROUNDEVEN.
> > > > (ix86_mode_needed): Likewise.
> > > > (ix86_mode_after): Likewise.
> > > > (ix86_mode_entry): Likewise.
> > > > (ix86_mode_exit): Likewise.
> > > > (ix86_emit_mode_set): Likewise.
> > > > (emit_i387_cw_initialization): Add case I387_CW_ROUNDEVEN.
* config/i386/i386.h (ix86_stack_slot): Add SLOT_CW_ROUNDEVEN.
> > > > (ix86_entry): Add I387_ROUNDEVEN.
> > > > (avx_u128_state): Add I387_CW_ANY.
> > > > * config/i386/i386.md: Define UNSPEC_FRNDINT_ROUNDEVEN.
> > > > (define_int_iterator): Likewise.
> > > > (define_int_attr): Likewise for rounding_insn, rounding and 
> > > > ROUNDING.
> > > > (define_constant): Define ROUND_ROUNDEVEN mode.
> > > > (define_attr): Add roundeven mode for i387_cw.
> > > > (2): Add condition for ROUND_ROUNDEVEN.
> > > > * internal-fn.def (ROUNDEVEN): New builtin function.
> > > > * optabs.def (roundeven_optab): New optab.
> > > >
> > >
> > > LGTM for the x86 part, but you are mixing middle-end changes in the
> > > patch, so you also need an OK from a middle-end maintainer.
> > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > 2019-07-24  Tejas Joshi  
> > > >
> > > > * gcc.target/i386/avx-vround-roundeven-1.c: New test.
> > > > * gcc.target/i386/avx-vround-roundeven-2.c: New test.
> > >
> > > roundss and roundsd are sse4_1 instructions, also please change tests
> > > to use -O2:
> > >
> > > /* { dg-require-effective-target sse4 } */
> > > /* { dg-options "-O2 -msse4.1" } */
> > >
> > > and
> > >
> > > #include "sse4_1-check.h"
> > >
> > > with
> > >
> > > static void
> > > sse4_1_test (void)
> > > {
> > > ... test code ...
> > > }
> > >
> > > No need to use that much #defines.
> > >
> > > Uros.
diff --git a/gcc/testsuite/gcc.target/i386/avx-vround-roundeven-1.c b/gcc/testsuite/gcc.target/i386/sse4_1-round-roundeven-1.c
similarity index 55%
rename from gcc/testsuite/gcc.target/i386/avx-vround-roundeven-1.c
rename to gcc/testsuite/gcc.target/i386/sse4_1-round-roundeven-1.c
index 072d0f0e73a..e673dfd16b3 100644
--- a/gcc/testsuite/gcc.target/i386/avx-vround-roundeven-1.c
+++ b/gcc/testsuite/gcc.target/i386/sse4_1-round-roundeven-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-mavx" } */
+/* { dg-options "-msse4.1" } */
 
 __attribute__((noinline, noclone)) double
 f1 (double x)
@@ -13,5 +13,5 @@ f2 (float x)
   return __builtin_roundevenf (x);
 }
 
-/* { dg-final { scan-assembler-times "vroundsd\[^\n\r\]*xmm" 1 } } */
-/* { dg-final { scan-assembler-times "vroundss\[^\n\r\]*xmm" 1 } } */
+/* { dg-final { scan-assembler-times "roundsd\[^\n\r\]*xmm" 1 } } */
+/* { dg-final { scan-assembler-times "roundss\[^\n\r\]*xmm" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx-vround-roundeven-2.c b/gcc/testsuite/gcc.target/i386/sse4_1-round-roundeven-2.c
similarity index 51%
rename from gcc/testsuite/gcc.target/i386/avx-vround-roundeven-2.c
rename to 

[PATCH v3] Missed function specialization + partial devirtualization

2019-07-30 Thread luoxhu
This patch aims to fix PR69678 caused by PGO indirect call profiling
performance issues.
The bug that profiling data never worked was fixed by Martin's pull
back of the topN patches; performance got a GEOMEAN ~1% improvement.
Still, the default profile currently generates only a SINGLE indirect target
that is called more than 75% of the time.  This patch enables the use of
MULTIPLE indirect targets in the LTO-WPA and LTO-LTRANS stages; as a result,
function specialization, profiling, partial devirtualization, inlining and
cloning can be done successfully based on them.
Performance improves from 0.70 sec to 0.38 sec on simple tests.
Details are:
  1.  PGO with topn is enabled by default now, but only one indirect target
  edge will be generated in the ipa-profile pass, so add variables to enable
  multiple speculative edges through the passes: speculative_id records the
  direct edge index bound to the indirect edge, num_of_ics records how many
  direct edges are owned by the indirect edge, and gimple_ic is postponed to
  ipa-profile as in the default path, since the inline pass will decide
  whether it is beneficial to transform the indirect call (see the sketch
  after this list).
  2.  Enable multiple indirect call target analysis in the LTO WPA/LTRANS
  stages for full profile support in the IPA passes and cgraph_edge functions.
  speculative_id can be set by make_speculative when multiple targets are
  bound to one indirect edge, and is cloned if a new edge is cloned.
  speculative_id is streamed out and streamed in by LTO like lto_stmt_uid.
  3.  Add 1 in-module testcase and 2 cross-module testcases.
  4.  Bootstrap and regression test passed on Power8-LE.
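
As a rough illustration of the kind of code this targets (a hypothetical
sketch, not one of the new testcases): an indirect call whose profile records
several hot targets, each of which can become a guarded speculative direct
call:

typedef int (*fn_t) (int);

static int add1 (int x) { return x + 1; }
static int add2 (int x) { return x + 2; }

int __attribute__ ((noinline))
dispatch (fn_t fn, int x)
{
  /* With topn value profiling both add1 and add2 are recorded as targets of
     this indirect call; each promoted direct edge would get its own
     speculative_id and num_of_ics would count how many were created.  */
  return fn (x);
}

int
main (void)
{
  long sum = 0;
  for (int i = 0; i < 1000000; i++)
    sum += dispatch ((i & 1) ? add1 : add2, i);
  __builtin_printf ("%ld\n", sum);
  return 0;
}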

v3 Changes:
 1. Rebase to trunk.
 2. Use speculative_id to track and search the reference node matched
 with the direct edge's callee for multiple targets.  This eliminates
 the earlier strstr workaround.  Actually, it is the caller's
 responsibility to handle the direct edges mapped to the same indirect
 edge.  speculative_call_info will still return one of the specified
 direct edges; this mostly leverages the current IPA edge processing framework.

gcc/ChangeLog

2019-07-31  Xiong Hu Luo  

PR ipa/69678
* cgraph.c (symbol_table::create_edge): Init speculative_id.
(cgraph_edge::make_speculative): Add param for setting speculative_id.
(cgraph_edge::speculative_call_info): Find reference by
speculative_id for multiple indirect targets.
(cgraph_edge::resolve_speculation): Decrease the speculations
for the indirect edge, drop its speculative flag if no direct
target is left.
(cgraph_edge::redirect_call_stmt_to_callee): Likewise.
(cgraph_node::verify_node): Don't report an error if the
speculative edge does not include a statement.
* cgraph.h (struct indirect_target_info): New struct.
(indirect_call_targets): New vector variable.
(num_of_ics): New variable.
(make_speculative): Add param for setting speculative_id.
(speculative_id): New variable.
* cgraphclones.c (cgraph_node::create_clone): Clone speculative_id.
* ipa-inline.c (inline_small_functions): Add iterator update.
* ipa-profile.c (ipa_profile_generate_summary): Add indirect
multiple targets logic.
(ipa_profile): Likewise.
* ipa-ref.h (speculative_id): New variable.
* ipa.c (process_references): Fix typo.
* lto-cgraph.c (lto_output_edge): Add indirect multiple targets
logic.  Stream out speculative_id.
(input_edge): Likewise.
* predict.c (dump_prediction): Remove edge count assert to be
precise.
* symtab.c (symtab_node::create_reference): Init speculative_id.
(symtab_node::clone_references): Clone speculative_id.
(symtab_node::clone_referring): Clone speculative_id.
(symtab_node::clone_reference): Clone speculative_id.
(symtab_node::clear_stmts_in_references): Clear speculative_id.
* tree-inline.c (copy_bb): Duplicate all the speculative edges
if indirect call contains multiple speculative targets.
* tree-profile.c (gimple_gen_ic_profiler): Use the new variable
__gcov_indirect_call.counters and __gcov_indirect_call.callee.
(gimple_gen_ic_func_profiler): Likewise.
(pass_ipa_tree_profile::gate): Fix comment typos.
* value-prof.c  (gimple_ic_transform): Handle topn case.
Fix comment typos.

gcc/testsuite/ChangeLog

2019-07-31  Xiong Hu Luo  

PR ipa/69678
* gcc.dg/tree-prof/indir-call-prof-topn.c: New testcase.
* gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c: New testcase.
* gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c: New testcase.
* gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c: New testcase.
---
 gcc/cgraph.c  |  70 +-
 gcc/cgraph.h  |  28 ++-
 gcc/cgraphclones.c|   1 +
 gcc/ipa-inline.c  |  15 +-
 gcc/ipa-profile.c

rework Ada EH Machine_Occurrence deallocation

2019-07-30 Thread Alexandre Oliva
Introduce exception handler ABI #1 to ensure single release, no access
after release of reraised Machine_Occurrences, and no failure to
re-reraise a Machine_Occurrence.

Unlike Ada exceptions, foreign exceptions do not get a new
Machine_Occurrence upon reraise, but each handler would delete the
exception upon completion, normal or exceptional, save for the case of
a 'raise;' statement within the handler, that avoided the delete by
clearing the exception pointer that the cleanup would use to release
it.  The cleared exception pointer might then be used by a subsequent
reraise within the same handler.  Get_Current_Excep.all would also
expose the Machine_Occurrence to reuse by Reraise_Occurrence, even for
native exceptions.

Under ABI #1, Begin_Handler_v1 claims responsibility for releasing an
exception by saving its cleanup and setting it to Claimed_Cleanup.
End_Handler_v1 restores the cleanup and runs it, as long as it isn't
still Claimed_Cleanup (which indicates an enclosing handler has
already claimed responsibility for releasing it), and as long as the
same exception is not being propagated up (the next handler of the
propagating exception will then claim responsibility for releasing
it), so reraise no longer needs to clear the exception pointer, and it
can just propagate the exception, just like Reraise_Occurrence.
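
A plain-C thought model of the protocol just described (hypothetical names;
the real primitives live in libgnat/a-exexpr.adb and operate on the Ada
propagation machinery):

typedef void (*cleanup_fn) (void *);

struct occurrence
{
  cleanup_fn cleanup;   /* how to release the machine occurrence */
};

static cleanup_fn const CLAIMED_CLEANUP = (cleanup_fn) 1;

/* Begin_Handler_v1: claim responsibility by saving the cleanup and
   replacing it with the Claimed_Cleanup marker; return the saved value.  */
static cleanup_fn
begin_handler_v1 (struct occurrence *o)
{
  cleanup_fn saved = o->cleanup;
  o->cleanup = CLAIMED_CLEANUP;
  return saved;
}

/* End_Handler_v1: restore the cleanup and run it, unless an enclosing
   handler already claimed the release or the same occurrence is still
   propagating up (its next handler will then claim it instead).  */
static void
end_handler_v1 (struct occurrence *o, cleanup_fn saved,
                struct occurrence *propagating)
{
  o->cleanup = saved;
  if (saved != CLAIMED_CLEANUP && o != propagating && saved)
    saved (o);
}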

ABI #1 is fully interoperable with ABI #0, i.e., exception handlers
that call the #0 primitives can be linked together with ones that call
the #1 primitives, and they will not misbehave.  When a #1 handler
claims responsibility for releasing an exception, even #0 reraises
dynamically nested within it will refrain from releasing it.  However,
when a #0 handler is a handler of a foreign exception that would have
been responsible for releasing it with #1, a Reraise_Occurrence of
that foreign or other Machine_Occurrence-carrying exception may still
cause the exception to be released multiple times, and to be used
after it is first released, even if other handlers of the foreign
exception use #1.

This was regstrapped on x86_64-linux-gnu, and tested internally on
various other platforms.  I intend to install it after the corresponding
GDB changes, that I'm about to post, are in.


for  gcc/ada/ChangeLog

* libgnat/a-exexpr.adb (Begin_Handler_v1, End_Handler_v1): New.
(Claimed_Cleanup): New.
(Begin_Handler, End_Handler): Document.
* gcc-interface/trans.c (gigi): Switch to exception handler
ABI #1.
(Exception_Handler_to_gnu_gcc): Save the original cleanup
returned by begin handler, pass it to end handler, and use
EH_ELSE_EXPR to pass a propagating exception to end handler.
(gnat_to_gnu): Leave the exception pointer alone for reraise.
(add_cleanup): Handle EH_ELSE_EXPR, require it by itself.
---
 gcc/ada/gcc-interface/trans.c |  162 ++-
 gcc/ada/libgnat/a-exexpr.adb  |  188 -
 2 files changed, 303 insertions(+), 47 deletions(-)

diff --git a/gcc/ada/gcc-interface/trans.c b/gcc/ada/gcc-interface/trans.c
index 6cd37598d3962..b484bc78532a6 100644
--- a/gcc/ada/gcc-interface/trans.c
+++ b/gcc/ada/gcc-interface/trans.c
@@ -524,22 +524,27 @@ gigi (Node_Id gnat_root,
NULL_TREE, is_default, true, true, true, false, false, NULL, Empty);
 
   /* Hooks to call when entering/leaving an exception handler.  */
-  ftype = build_function_type_list (void_type_node, ptr_type_node, NULL_TREE);
-
+  ftype = build_function_type_list (ptr_type_node,
+   ptr_type_node, NULL_TREE);
   begin_handler_decl
-= create_subprog_decl (get_identifier ("__gnat_begin_handler"), NULL_TREE,
-  ftype, NULL_TREE,
+= create_subprog_decl (get_identifier ("__gnat_begin_handler_v1"),
+  NULL_TREE, ftype, NULL_TREE,
   is_default, true, true, true, false, false, NULL,
   Empty);
-  /* __gnat_begin_handler is a dummy procedure.  */
+  /* __gnat_begin_handler_v1 is not a dummy procedure, but we arrange
+ for it not to throw.  */
   TREE_NOTHROW (begin_handler_decl) = 1;
 
+  ftype = build_function_type_list (ptr_type_node,
+   ptr_type_node, ptr_type_node,
+   ptr_type_node, NULL_TREE);
   end_handler_decl
-= create_subprog_decl (get_identifier ("__gnat_end_handler"), NULL_TREE,
+= create_subprog_decl (get_identifier ("__gnat_end_handler_v1"), NULL_TREE,
   ftype, NULL_TREE,
   is_default, true, true, true, false, false, NULL,
   Empty);
 
+  ftype = build_function_type_list (void_type_node, ptr_type_node, NULL_TREE);
   unhandled_except_decl
 = create_subprog_decl (get_identifier ("__gnat_unhandled_except_handler"),
   NULL_TREE, ftype, NULL_TREE,
@@ -6201,37 +6206,55 @@ 

Re: [PATCH] Deduce automatically number of cores for -flto option.

2019-07-30 Thread Jakub Jelinek
On Mon, Jul 29, 2019 at 03:35:08PM +0200, Martin Liška wrote:
> I'm sending v2 of the patch that can newly auto-detect make
> jobserver and use it.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin

> >From df747a46241dcdb8ad055071f9e0605c9f469268 Mon Sep 17 00:00:00 2001
> From: Martin Liska 
> Date: Tue, 23 Jul 2019 10:14:31 +0200
> Subject: [PATCH 1/2] Deduce automatically number of cores for -flto option.
> 
> gcc/ChangeLog:
> 
> 2019-07-23  Martin Liska  
> 
>   * doc/invoke.texi: Document new behavior.
>   * lto-wrapper.c (cpuset_popcount): New function
>   is a copy of libgomp/config/linux/proc.c.
>   (init_num_threads): Likewise.
>   (run_gcc): Automatically detect core count for -flto.
>   (jobserver_active_p): New function.

This broke a lot of tests.
The logs show
spawn -ignore SIGHUP /home/jakub/src/gcc/obj31/gcc/xgcc 
-B/home/jakub/src/gcc/obj31/gcc/ c_lto_pr83954_0.o c_lto_pr83954_1.o 
-fno-diagnostics-show-
caret -fno-diagnostics-show-line-numbers -fdiagnostics-color=never -O2 -flto 
-flto-partition=1to1 -o gcc-dg-lto-pr83954-31.exe
make[4]: *** write jobserver: Bad file descriptor.  Stop.
make[4]: *** Waiting for unfinished jobs
make[4]: *** write jobserver: Bad file descriptor.  Stop.
lto-wrapper: fatal error: make returned 2 exit status
compilation terminated.
collect2: fatal error: lto-wrapper returned 1 exit status
compilation terminated.
compiler exited with status 1
FAIL: gcc.dg/lto/pr83954 c_lto_pr83954_0.o-c_lto_pr83954_1.o link, -O2 -flto 
-flto-partition=1to1 
and similar, e.g. for x86_64-linux it was following regressions, on
i686-linux also some tests in libgomp etc.
Is -flto now really using all available CPUs for each compilation?  Without
a jobserver that would be like a fork-bomb; say I have a CPU with 32 threads
and do make check -j32, does that mean there are 1024 lto1s?
Judging from http://gcc.gnu.org/ml/gcc-testresults/2019-07/msg03610.html
I'm not alone.

+FAIL: gcc.dg/lto/2008 c_lto_2008_0.o-c_lto_2008_1.o link, -O0 
-flto -flto-partition=1to1 -fno-use-linker-plugin 
+FAIL: gcc.dg/lto/2008 c_lto_2008_0.o-c_lto_2008_1.o link, -O2 
-flto -flto-partition=1to1 -fno-use-linker-plugin 
+FAIL: gcc.dg/lto/20081112 c_lto_20081112_0.o-c_lto_20081112_1.o link, -O0 
-flto -flto-partition=1to1 -fno-use-linker-plugin 
+FAIL: gcc.dg/lto/20081112 c_lto_20081112_0.o-c_lto_20081112_1.o link, -O2 
-flto -flto-partition=1to1 -fno-use-linker-plugin 
+FAIL: gcc.dg/lto/20081125 c_lto_20081125_0.o-c_lto_20081125_1.o link, -O0 
-flto -flto-partition=1to1 -fno-use-linker-plugin 
+FAIL: gcc.dg/lto/20081125 c_lto_20081125_0.o-c_lto_20081125_1.o link, -O2 
-flto -flto-partition=1to1 -fno-use-linker-plugin 
+FAIL: gcc.dg/lto/20081222 c_lto_20081222_0.o-c_lto_20081222_1.o link, -O0 
-flto -flto-partition=1to1 -fno-use-linker-plugin 
+FAIL: gcc.dg/lto/20081222 c_lto_20081222_0.o-c_lto_20081222_1.o link, -O2 
-flto -flto-partition=1to1 -fno-use-linker-plugin 
+FAIL: gcc.dg/lto/20090210 c_lto_20090210_0.o-c_lto_20090210_1.o link, -O0 
-flto -flto-partition=1to1 -fno-use-linker-plugin 
+FAIL: gcc.dg/lto/20090210 c_lto_20090210_0.o-c_lto_20090210_1.o link, -O2 
-flto -flto-partition=1to1 -fno-use-linker-plugin 
+FAIL: gcc.dg/lto/20090213 c_lto_20090213_0.o-c_lto_20090213_1.o link, -O0 
-flto -flto-partition=1to1 -fno-use-linker-plugin 
+FAIL: gcc.dg/lto/20090213 c_lto_20090213_0.o-c_lto_20090213_1.o link, -O2 
-flto -flto-partition=1to1 -fno-use-linker-plugin 
+FAIL: gcc.dg/lto/20090218 c_lto_20090218_0.o-c_lto_20090218_3.o link, -O0 
-flto -flto-partition=1to1 -fno-use-linker-plugin 
+FAIL: gcc.dg/lto/20090218 c_lto_20090218_0.o-c_lto_20090218_3.o link, -O2 
-flto -flto-partition=1to1 -fno-use-linker-plugin 
+FAIL: gcc.dg/lto/20090218-1 c_lto_20090218-1_0.o-c_lto_20090218-1_1.o link, 
-O0 -flto -flto-partition=1to1 -fno-use-linker-plugin 
+FAIL: gcc.dg/lto/20090218-1 c_lto_20090218-1_0.o-c_lto_20090218-1_1.o link, 
-O2 -flto -flto-partition=1to1 -fno-use-linker-plugin 
+FAIL: gcc.dg/lto/20090218-2 c_lto_20090218-2_0.o-c_lto_20090218-2_1.o link, 
-O0 -flto -flto-partition=1to1 -fno-use-linker-plugin 
+FAIL: gcc.dg/lto/20090218-2 c_lto_20090218-2_0.o-c_lto_20090218-2_1.o link, 
-O2 -flto -flto-partition=1to1 -fno-use-linker-plugin 
+FAIL: gcc.dg/lto/20090312 c_lto_20090312_0.o-c_lto_20090312_1.o link, -O0 
-flto -flto-partition=1to1 -fno-use-linker-plugin 
+FAIL: gcc.dg/lto/20090312 c_lto_20090312_0.o-c_lto_20090312_1.o link, -O2 
-flto -flto-partition=1to1 -fno-use-linker-plugin 
+FAIL: gcc.dg/lto/20090717 c_lto_20090717_0.o-c_lto_20090717_1.o link, -O0 
-flto -flto-partition=1to1 -fno-use-linker-plugin 
+FAIL: gcc.dg/lto/20090717 c_lto_20090717_0.o-c_lto_20090717_1.o link, -O2 
-flto -flto-partition=1to1 -fno-use-linker-plugin 
+FAIL: gcc.dg/lto/20090812 c_lto_20090812_0.o-c_lto_20090812_1.o link, -O0 
-flto -flto-partition=1to1 -fno-use-linker-plugin 
+FAIL: gcc.dg/lto/20090812 
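
For context, GNU make jobserver detection (what jobserver_active_p has to do)
essentially amounts to parsing MAKEFLAGS and checking that the advertised
descriptors are actually usable; a rough sketch, not the lto-wrapper code
itself:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if MAKEFLAGS advertises a usable GNU make jobserver.  The
   descriptors are only inherited when make runs the command as a recursive
   sub-make ('+' prefix or $(MAKE) in the recipe); otherwise writes to them
   fail, which matches the "write jobserver: Bad file descriptor" error.  */
static int
jobserver_active (void)
{
  const char *flags = getenv ("MAKEFLAGS");
  int rfd = -1, wfd = -1;

  if (flags == NULL)
    return 0;
  const char *p = strstr (flags, "--jobserver-auth=");
  if (p == NULL)
    p = strstr (flags, "--jobserver-fds=");
  if (p == NULL || sscanf (strchr (p, '=') + 1, "%d,%d", &rfd, &wfd) != 2)
    return 0;
  return rfd >= 0 && wfd >= 0
         && fcntl (rfd, F_GETFD) >= 0 && fcntl (wfd, F_GETFD) >= 0;
}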

Re: Make lra use per-alternative earlyclobber info

2019-07-30 Thread Vladimir Makarov



On 2019-07-30 5:58 a.m., Richard Sandiford wrote:

lra_insn_reg and lra_operand_data have both a bitmask of earlyclobber
alternatives and an overall boolean.  The danger is that we then test
the overall boolean when really we should be testing for a particular
alternative.  This patch gets rid of the boolean and tests the mask
against zero when we really do need to test "any alternative might
be earlyclobber".  (I think the only instance of that is the
LRA_UNKNOWN_ALT handling in lra-lives.c:reg_early_clobber_p.)

This is needed (and tested) by an upcoming SVE patch.

Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu.
OK to install?


Yes.  It looks good. Thank you for making the code more straightforward 
and accurate, Richard.
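
For readers following along, the change essentially replaces tests of the
per-insn boolean with tests of the per-alternative mask, along these lines
(a sketch with illustrative types, not the exact LRA code):

#include <stdbool.h>

#define LRA_UNKNOWN_ALT -1

struct insn_reg
{
  /* Bit N set: the operand is earlyclobber in alternative N.  */
  unsigned int early_clobber_alts;
};

/* "Is this reg earlyclobber?" now asks about a specific alternative, or
   about any alternative when the chosen one is not known.  */
static bool
reg_early_clobber_p (const struct insn_reg *reg, int n_alt)
{
  return (n_alt == LRA_UNKNOWN_ALT
          ? reg->early_clobber_alts != 0
          : ((reg->early_clobber_alts >> n_alt) & 1) != 0);
}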





[wwwdocs] readings.html - move www.adaic.org reference to https

2019-07-30 Thread Gerald Pfeifer
...like the others.

Committed.

Gerald

Index: readings.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/readings.html,v
retrieving revision 1.316
diff -u -r1.316 readings.html
--- readings.html   28 Jul 2019 08:54:12 -  1.316
+++ readings.html   30 Jul 2019 22:13:35 -
@@ -568,7 +568,7 @@
<a href="https://www.adacore.com/community">AdaCore Community 
Site
<a href="https://www2.adacore.com/gap-static/GNAT_Book/html/">GNAT:
The GNU Ada Compiler
-   <a href="http://www.adaic.org/resources/add_content/docs/95style/html/cover.html">Ada
+   <a href="https://www.adaic.org/resources/add_content/docs/95style/html/cover.html">Ada
Quality &amp; Style Guide
<a href="http://www.sigada.org/ada_letters/jun2004/ravenscar_article.pdf">Guide
for the use of the Ada Ravenscar Profile in high integrity


Re: [PATCH] Fix simd attribute handling on aarch64 (version 2)

2019-07-30 Thread Steve Ellcey
On Tue, 2019-07-30 at 14:31 +0100, Richard Sandiford wrote:
> 
> > -
> > -  tree new_type = build_distinct_type_copy (TREE_TYPE (node-
> > >decl));
> > -  TYPE_ARG_TYPES (new_type) = new_reversed;
> 
> I think you still need this line, just applied to the existing type
> rather than a new one.
> 
> > -  TREE_TYPE (node->decl) = new_type;
> > -
> >adjustments.release ();
> >  }

OK, I fixed that and retested with no regressions.

Steve Ellcey
sell...@marvell.com


2019-07-30  Steve Ellcey  

* omp-simd-clone.c (simd_clone_adjust_return_type): Remove call to
build_distinct_type_copy.
(simd_clone_adjust_argument_types): Ditto.
(simd_clone_adjust): Call build_distinct_type_copy here.
(expand_simd_clones): Ditto.

2019-07-30  Steve Ellcey  

* gcc.target/aarch64/simd_pcs_attribute.c: New test.
* gcc.target/aarch64/simd_pcs_attribute-2.c: Ditto.
* gcc.target/aarch64/simd_pcs_attribute-3.c: Ditto.
diff --git a/gcc/omp-simd-clone.c b/gcc/omp-simd-clone.c
index caa8da3cba5..c8c528471a3 100644
--- a/gcc/omp-simd-clone.c
+++ b/gcc/omp-simd-clone.c
@@ -498,7 +498,6 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
   /* Adjust the function return type.  */
   if (orig_rettype == void_type_node)
 return NULL_TREE;
-  TREE_TYPE (fndecl) = build_distinct_type_copy (TREE_TYPE (fndecl));
   t = TREE_TYPE (TREE_TYPE (fndecl));
   if (INTEGRAL_TYPE_P (t) || POINTER_TYPE_P (t))
 veclen = node->simdclone->vecsize_int;
@@ -724,11 +723,7 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 	  else
 	new_reversed = void_list_node;
 	}
-
-  tree new_type = build_distinct_type_copy (TREE_TYPE (node->decl));
-  TYPE_ARG_TYPES (new_type) = new_reversed;
-  TREE_TYPE (node->decl) = new_type;
-
+  TYPE_ARG_TYPES (TREE_TYPE (node->decl)) = new_reversed;
   adjustments.release ();
 }
   args.release ();
@@ -1164,6 +1159,7 @@ simd_clone_adjust (struct cgraph_node *node)
 {
   push_cfun (DECL_STRUCT_FUNCTION (node->decl));
 
+  TREE_TYPE (node->decl) = build_distinct_type_copy (TREE_TYPE (node->decl));
   targetm.simd_clone.adjust (node);
 
   tree retval = simd_clone_adjust_return_type (node);
@@ -1737,6 +1733,8 @@ expand_simd_clones (struct cgraph_node *node)
 	simd_clone_adjust (n);
 	  else
 	{
+	  TREE_TYPE (n->decl)
+		= build_distinct_type_copy (TREE_TYPE (n->decl));
 	  targetm.simd_clone.adjust (n);
 	  simd_clone_adjust_return_type (n);
 	  simd_clone_adjust_argument_types (n);
diff --git a/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-2.c b/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-2.c
index e69de29bb2d..8eec6824f37 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-2.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-2.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast" } */
+
+__attribute__ ((__simd__ ("notinbranch")))
+__attribute__ ((__nothrow__ , __leaf__ , __const__))
+extern double foo (double x);
+
+void bar(double * f, int n)
+{
+	int i;
+	for (i = 0; i < n; i++)
+		f[i] = foo(f[i]);
+}
+
+/* { dg-final { scan-assembler-not {\.variant_pcs\tfoo} } } */
+/* { dg-final { scan-assembler-times {\.variant_pcs\t_ZGVnN2v_foo} 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-3.c b/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-3.c
index e69de29bb2d..95f6a6803e8 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-3.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-3.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast" } */
+
+__attribute__ ((__simd__))
+__attribute__ ((__nothrow__ , __leaf__ , __const__))
+double foo (double x);
+
+void bar(double *f, int n)
+{
+	int i;
+	for (i = 0; i < n; i++)
+		f[i] = foo(f[i]);
+}
+
+double foo(double x)
+{
+	return x * x / 3.0;
+}
+
+/* { dg-final { scan-assembler-not {\.variant_pcs\tfoo} } } */
+/* { dg-final { scan-assembler-times {\.variant_pcs\t_ZGVnM1v_foo} 1 } } */
+/* { dg-final { scan-assembler-times {\.variant_pcs\t_ZGVnM2v_foo} 1 } } */
+/* { dg-final { scan-assembler-times {\.variant_pcs\t_ZGVnN1v_foo} 1 } } */
+/* { dg-final { scan-assembler-times {\.variant_pcs\t_ZGVnN2v_foo} 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute.c b/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute.c
index e69de29bb2d..eddcef3597c 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast" } */
+
+__attribute__ ((__simd__ ("notinbranch")))
+__attribute__ ((__nothrow__ , __leaf__ , __const__))
+extern double log (double __x);
+
+void foo(double *f, int n)
+{
+	int i;
+	for (i = 0; i < n; i++)
+		f[i] = log(f[i]);
+}
+
+/* { dg-final { scan-assembler-not {\.variant_pcs\tlog} } } */
+/* { dg-final { scan-assembler-times {\.variant_pcs\t_ZGVnN2v_log} 1 

[PATCHv3] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)

2019-07-30 Thread Bernd Edlinger
Hi Richard,

it is already a while ago, but I had not found time to continue
with this patch until now.

I think I have now a better solution, which properly addresses your
comments below.

On 3/25/19 9:41 AM, Richard Biener wrote:
> On Fri, 22 Mar 2019, Bernd Edlinger wrote:
> 
>> On 3/21/19 12:15 PM, Richard Biener wrote:
>>> On Sun, 10 Mar 2019, Bernd Edlinger wrote:
>>> Finally...
>>>
>>> Index: gcc/function.c
>>> ===
>>> --- gcc/function.c  (revision 269264)
>>> +++ gcc/function.c  (working copy)
>>> @@ -2210,6 +2210,12 @@ use_register_for_decl (const_tree decl)
>>>if (DECL_MODE (decl) == BLKmode)
>>>  return false;
>>>
>>> +  if (STRICT_ALIGNMENT && TREE_CODE (decl) == PARM_DECL
>>> +  && DECL_INCOMING_RTL (decl) && MEM_P (DECL_INCOMING_RTL (decl))
>>> +  && GET_MODE_ALIGNMENT (DECL_MODE (decl))
>>> +> MEM_ALIGN (DECL_INCOMING_RTL (decl)))
>>> +return false;
>>> +
>>>/* If -ffloat-store specified, don't put explicit float variables
>>>   into registers.  */
>>>/* ??? This should be checked after DECL_ARTIFICIAL, but tree-ssa
>>>
>>> I wonder if it is necessary to look at DECL_INCOMING_RTL here
>>> and why such RTL may not exist?  That is, iff DECL_INCOMING_RTL
>>> doesn't exist then shouldn't we return false for safety reasons?
>>>

You are right, it is not possible to return different results from
use_register_for_decl before vs. after incoming RTL is assigned.
That hits an assertion in set_rtl.

This hunk is gone now, instead I changed assign_parm_setup_reg
to use movmisalign optab and/or extract_bit_field if misaligned
entry_parm is to be assigned in a register.

I have no test coverage for the movmisalign optab though, so I
rely on your code review for that part.

>>> Similarly the very same issue should exist on x86_64 which is
>>> !STRICT_ALIGNMENT, it's just the ABI seems to provide the appropriate
>>> alignment on the caller side.  So the STRICT_ALIGNMENT check is
>>> a wrong one.
>>>
>>
>> I may be plain wrong here, but I thought that !STRICT_ALIGNMENT targets
>> just use MEM_ALIGN to select the right instructions.  MEM_ALIGN
>> is always 32-bit align on the DImode memory.  The x86_64 vector instructions
>> would look at MEM_ALIGN and do the right thing, yes?
> 
> No, they need to use the movmisalign optab and end up with UNSPECs
> for example.
Ah, thanks, now I see.

>> It seems to be the definition of STRICT_ALIGNMENT targets that all RTL
>> instructions need to have MEM_ALIGN >= GET_MODE_ALIGNMENT, so the target
>> does not even have to look at MEM_ALIGN except in the mov_misalign_optab,
>> right?
> 
> Yes, I think we never losened that.  Note that RTL expansion has to
> fix this up for them.  Note that strictly speaking SLOW_UNALIGNED_ACCESS
> specifies that x86 is strict-align wrt vector modes.
> 

Yes, I agree, the code would be incorrect for x86 as well when the
movmisalign_optab is not used.  So I invoke the movmisalign optab if
available and, if not, fall back to extract_bit_field.  As in
assign_parm_setup_stack, assign_parm_setup_reg assumes that
data->promoted_mode != data->nominal_mode does not happen with
misaligned stack slots.
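
For illustration only, a hypothetical reduced example (not the new tests
themselves) of the situation this path has to handle on ARM:

/* An over-aligned aggregate whose mode wants 8-byte alignment.  If the
   incoming stack slot only provides word alignment, copying the argument
   with ldrd/strd would be invalid on strict-alignment targets, so the
   parameter setup code must use the movmisalign/extract_bit_field path.  */
struct s { int a, b; } __attribute__ ((aligned (8)));

struct s g;

void
f (int a, int b, int c, int d, int e, struct s x)
{
  g = x;
}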


Attached is v3 of my patch.

Boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf.

Is it OK for trunk?


Thanks
Bernd.
2019-07-30  Bernd Edlinger  

	PR middle-end/89544
	* function.c (assign_parm_data_one): Remove unused data members.
	(assign_parm_find_stack_rtl): Use larger alignment when possible.
	(assign_parm_adjust_stack_rtl): Revise STRICT_ALIGNMENT check.
	(assign_parm_setup_reg): Handle misaligned stack arguments.

testsuite:
2019-07-30  Bernd Edlinger  

	PR middle-end/89544
	* gcc.target/arm/unaligned-argument-1.c: New test.
	* gcc.target/arm/unaligned-argument-2.c: New test.

Index: gcc/function.c
===
--- gcc/function.c	(revision 273767)
+++ gcc/function.c	(working copy)
@@ -2274,8 +2274,6 @@ struct assign_parm_data_one
   int partial;
   BOOL_BITFIELD named_arg : 1;
   BOOL_BITFIELD passed_pointer : 1;
-  BOOL_BITFIELD on_stack : 1;
-  BOOL_BITFIELD loaded_in_reg : 1;
 };
 
 /* A subroutine of assign_parms.  Initialize ALL.  */
@@ -2699,8 +2697,23 @@ assign_parm_find_stack_rtl (tree parm, struct assi
  intentionally forcing upward padding.  Otherwise we have to come
  up with a guess at the alignment based on OFFSET_RTX.  */
   poly_int64 offset;
-  if (data->locate.where_pad != PAD_DOWNWARD || data->entry_parm)
+  if (data->locate.where_pad == PAD_NONE || data->entry_parm)
 align = boundary;
+  else if (data->locate.where_pad == PAD_UPWARD)
+{
+  align = boundary;
+  /* If the argument offset is actually more aligned than the nominal
+	 stack slot boundary, take advantage of that excess alignment.
+	 Don't make any assumptions if STACK_POINTER_OFFSET is in use.  */
+  if (poly_int_rtx_p (offset_rtx, &offset)
+	  

Re: [PATCH] PR fortran/91296 -- Prevent ICE in aliasing check

2019-07-30 Thread Steve Kargl
On Tue, Jul 30, 2019 at 08:17:39PM +0200, Thomas Koenig wrote:
> Hi Steve,
> 
> > The attached patch has been regression tested on x86_64-*-freebsd.
> > There were no regressions.
> > 
> > The patch prevents an ICE when checking for aliasing and the
> > actual arguments are the real and imaginary parts of a complex
> > entity.  See the testcase for more information.
> > 
> > OK to commit to both trunk and 9-branch?
> 
> OK.
> 
> Thanks for the patch!
> 

Thanks for the quick review.  Committed to trunk as
revision 273914.

-- 
Steve


Re: [PATCH] Adding _Dependent_ptr type qualifier in C part 1/3

2019-07-30 Thread Martin Sebor

On 7/30/19 1:13 AM, Akshat Garg wrote:

Hi,
This patch includes C front-end code for a type qualifier _Dependent_ptr.


Just some very high-level comments/questions.  I only followed
the _Dependent_ptr discussion from a distance and I'm likely
missing some context so the first thing I looked for in this
patch is documentation of the new qualifier.  Unless it's
a proposed C2x feature that I missed I would expect to find it
in section 6 - Extensions to the C Language Family of the manual.
I saw the references to WG21's p0190r4 in the tests but the paper
doesn't mention _Dependent_ptr, and I found no references to a C
paper that does.  If it has been proposed for C2X as well can
you point to it?  (In that case, or if a proposal is planned,
the feature should probably either only be available with
-std=c2x and -std=gnu2x or a pedantic warning should be issued
for its use in earlier modes similarly to how uses of _Atomic
are diagnosed in pre-C11 modes.)

Martin
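
For context, the intended use is roughly the following; a hypothetical sketch
in the spirit of the P0190R4-style examples the tests reference, not code
taken from the patch:

#include <stdatomic.h>

struct node { int value; };
static _Atomic (struct node *) head;

int
reader (void)
{
  /* _Dependent_ptr marks a pointer whose value dependency, from the
     memory_order_consume load down to the dereference, must be preserved,
     so no acquire barrier is needed on weakly ordered targets.  */
  struct node * _Dependent_ptr p
    = atomic_load_explicit (&head, memory_order_consume);
  return p != NULL ? p->value : 0;
}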


The patch has been tested using the following
make bootstrap -j 4
make -k check -j 4

on x86_64-linux-gnu.

Thanks,
Akshat

gcc/ChangeLog:

2019-07-30  Akshat Garg 

 * c-family/c-common.c (struct c_common_resword c_common_reswords):
Add "_Dependent_ptr".
 (c_apply_type_quals_to_decl): Set the flag for _Dependent_ptr if
qualified.
 (keyword_is_type_qualifier): Add RID_DEPENDENT_PTR.
 * c-family/c-common.h (enum rid): Add RID_DEPENDENT_PTR.
 * c-family/c-format.c (check_format_types): Add dependent pointer
as a qualifier check.
 * c-family/c-pretty-print.c (pp_c_cv_qualifiers): Handle dependent
pointer qualifier.
 * c/c-aux-info.c (gen_type): Handle dependent pointer qualifier.
 (gen_decl): Handle dependent pointer qualifier.
 * c/c-decl.c (merge_decls): Set old declaration as having dependent
pointer qualification if new declaration has one.
 (shadow_tag_warned): Add dependent_ptr_p to declspecs check.
 (quals_from_declspecs): Add dependent_ptr_p to declspecs check.
 (grokdeclarator): Add checks for dependent pointer qualifier and
warn of duplicate or errors. Allow dependent pointer for pointer types only.
 * c/c-parser.c (c_keyword_starts_typename, c_token_is_qualifier,
c_token_starts_declspecs): Add RID_DEPENDENT_PTR.
 (c_parser_static_assert_declaration_no_semi): Add _Dependent_ptr
qualifier in comments.
 (c_parser_declspecs, c_parser_attribute_any_word): Add
RID_DEPENDENT_PTR.
 * c/c-tree.h (C_TYPE_FIELDS_DEPENDENT_PTR): New macro to mark
dependent pointer.
 (enum c_declspec_word): Add cdw_dependent_ptr.
 (struct c_declspecs): Add dependent_ptr_p field.
 * print-tree.c (print_node): Print dependent_ptr qualifier.
 * tree-core.h (enum cv_qualifier): Add TYPE_QUAL_DEPENDENT_PTR.
 (enum tree_index): Add TI_DEPENDENT_PTR_TYPE.
 (struct tree_base): Add dependent_ptr_flag field.
 * tree-pretty-print.c (dump_generic_node): Print dependent pointer
type qualifier.
 * tree.c (fld_type_variant, set_type_quals): Set TYPE_DEPENDENT_PTR.
 * tree.h (TREE_THIS_DEPENDENT_PTR): New macro. To access
dependent_ptr_flag field in tree_base.
 (TYPE_DEPENDENT_PTR): New accessor macro.
 (TYPE_QUALS, TYPE_QUALS_NO_ADDR_SPACE): Add TYPE_QUAL_DEPENDENT_PTR.
 (dependent_ptrTI_type_node): Add new type node.

gcc/testsuite/ChangeLog:

2019-07-30  Akshat Garg 

 * gcc.dg/c11-dependent_ptr-test-1.c: New test for _Dependent_ptr
qualifier.
 * gcc.dg/{p0190r4_fig8, p0190r4_fig9, p0190r4_fig10, p0190r4_fig11,
p0190r4_fig12, p0190r4_fig13}.c: New tests from document P0190R4 for
_Dependent_ptr qualifier.

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index cb92710..4f09037 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -345,6 +345,7 @@ const struct c_common_resword c_common_reswords[] =
{ "_Alignas", RID_ALIGNAS,   D_CONLY },
{ "_Alignof", RID_ALIGNOF,   D_CONLY },
{ "_Atomic", RID_ATOMIC,D_CONLY },
+ { "_Dependent_ptr",   RID_DEPENDENT_PTR,  0 },
{ "_Bool", RID_BOOL,  D_CONLY },
{ "_Complex", RID_COMPLEX, 0 },
{ "_Imaginary", RID_IMAGINARY, D_CONLY },
@@ -3546,6 +3547,11 @@ c_apply_type_quals_to_decl (int type_quals, tree
decl)
TREE_SIDE_EFFECTS (decl) = 1;
TREE_THIS_VOLATILE (decl) = 1;
  }
+  if (type_quals & TYPE_QUAL_DEPENDENT_PTR)
+{
+  TREE_SIDE_EFFECTS (decl) = 1;
+  TREE_THIS_DEPENDENT_PTR (decl) = 1;
+}
if (type_quals & TYPE_QUAL_RESTRICT)
  {
while (type && TREE_CODE (type) == ARRAY_TYPE)
@@ -7851,6 +7857,7 @@ keyword_is_type_qualifier (enum rid keyword)
  case RID_VOLATILE:
  case RID_RESTRICT:
  case RID_ATOMIC:
+case RID_DEPENDENT_PTR:
return true;
  default:
return false;
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 117d729..ab55882 100644
--- 

Re: [C++ PATCH] PR c++/90590 Suppress warning for enumeration value not handled in switch warning

2019-07-30 Thread Matthew Beliveau
ping

On Tue, Jul 23, 2019 at 10:27 AM Matthew Beliveau  wrote:
>
> ping
>
> On Tue, Jul 16, 2019 at 8:40 AM Marek Polacek  wrote:
> >
> > On Mon, Jul 15, 2019 at 09:47:26AM -0400, Matthew Beliveau wrote:
> > > Okay I kept the TYPE_MAIN_VARIANT and dropped the accidental new line!
> > > Hopefully this should be fine!
> >
> > CCing Joseph as this is c-family/ stuff.
> >
> > > Bootstrapped/regtested on x86_64-linux, ok for trunk?
> > >
> > > 2019-07-12  Matthew Beliveau  
> > >
> > >   PR c++/90590
> > >   * c-warn.c (c_do_switch_warnings): Suppress warning for enumerators
> > >   with reserved names that are in a system header.
> > >
> > >   * c-c++-common/pr90590-1.c: New test.
> > >   * c-c++-common/pr90590-1.h: New test.
> > >   * c-c++-common/pr90590-2.c: New test.
> > >   * c-c++-common/pr90590-2.h: New test.
> > >
> > > diff --git gcc/c-family/c-warn.c gcc/c-family/c-warn.c
> > > index b5d09e761d7..51c54a283e5 100644
> > > --- gcc/c-family/c-warn.c
> > > +++ gcc/c-family/c-warn.c
> > > @@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
> > >  #include "gcc-rich-location.h"
> > >  #include "gimplify.h"
> > >  #include "c-family/c-indentation.h"
> > > +#include "c-family/c-spellcheck.h"
> > >  #include "calls.h"
> > >  #include "stor-layout.h"
> > >
> > > @@ -1628,6 +1629,15 @@ c_do_switch_warnings (splay_tree cases, location_t 
> > > switch_location,
> > >if (cond && tree_int_cst_compare (cond, value))
> > >   continue;
> > >
> > > +  /* If the enumerator is defined in a system header and uses a 
> > > reserved
> > > +  name, then we continue to avoid throwing a warning.  */
> > > +  location_t loc = DECL_SOURCE_LOCATION
> > > + (TYPE_STUB_DECL (TYPE_MAIN_VARIANT (type)));
> > > +  if (in_system_header_at (loc)
> > > +   && name_reserved_for_implementation_p
> > > +   (IDENTIFIER_POINTER (TREE_PURPOSE (chain))))
> > > + continue;
> > > +
> > >/* If there is a default_node, the only relevant option is
> > >Wswitch-enum.  Otherwise, if both are enabled then we prefer
> > >to warn using -Wswitch because -Wswitch is enabled by -Wall
> > > diff --git gcc/testsuite/c-c++-common/pr90590-1.c 
> > > gcc/testsuite/c-c++-common/pr90590-1.c
> > > new file mode 100644
> > > index 000..4e11debb7fa
> > > --- /dev/null
> > > +++ gcc/testsuite/c-c++-common/pr90590-1.c
> > > @@ -0,0 +1,15 @@
> > > +// PR c++/90590
> > > +// { dg-options -Wswitch }
> > > +#include "pr90590-1.h"
> > > +
> > > +void
> > > +g ()
> > > +{
> > > +  enum E e = _A;
> > > +  switch (e) // { dg-bogus "enumeration value '_C' not handled in 
> > > switch" }
> > > +{
> > > +case _A:
> > > +case _B:
> > > +  break;
> > > +}
> > > +}
> > > diff --git gcc/testsuite/c-c++-common/pr90590-1.h 
> > > gcc/testsuite/c-c++-common/pr90590-1.h
> > > new file mode 100644
> > > index 000..22f1a7d5d52
> > > --- /dev/null
> > > +++ gcc/testsuite/c-c++-common/pr90590-1.h
> > > @@ -0,0 +1,2 @@
> > > +#pragma GCC system_header
> > > +enum E { _A, _B, _C };
> > > diff --git gcc/testsuite/c-c++-common/pr90590-2.c 
> > > gcc/testsuite/c-c++-common/pr90590-2.c
> > > new file mode 100644
> > > index 000..23da97f9d74
> > > --- /dev/null
> > > +++ gcc/testsuite/c-c++-common/pr90590-2.c
> > > @@ -0,0 +1,11 @@
> > > +// PR c++/90590
> > > +// { dg-options -Wswitch }
> > > +
> > > +#include "pr90590-2.h"
> > > +
> > > +void
> > > +fn ()
> > > +{
> > > +  switch (c.b) // { dg-bogus "enumeration value" }
> > > +;
> > > +}
> > > diff --git gcc/testsuite/c-c++-common/pr90590-2.h 
> > > gcc/testsuite/c-c++-common/pr90590-2.h
> > > new file mode 100644
> > > index 000..e4f8635576f
> > > --- /dev/null
> > > +++ gcc/testsuite/c-c++-common/pr90590-2.h
> > > @@ -0,0 +1,4 @@
> > > +#pragma GCC system_header
> > > +struct {
> > > +  enum { _A } b;
> > > +} c;
> >
> >
> > Marek


Re: [PATCH] PR fortran/91296 -- Prevent ICE in aliasing check

2019-07-30 Thread Thomas Koenig

Hi Steve,


The attached patch has been regression tested on x86_64-*-freebsd.
There were no regressions.

The patch prevents an ICE when checking for aliasing and the
actual arguments are the real and imaginary parts of a complex
entity.  See the testcase for more information.

OK to commit to both trunk and 9-branch?


OK.

Thanks for the patch!

Regards

Thomas


Re: [PATCH, rs6000] Fix PR91050 by adding a DRIVER_SELF_SPECS spec

2019-07-30 Thread Uros Bizjak
On Tue, Jul 30, 2019 at 5:33 PM Peter Bergner  wrote:
>
> On 7/30/19 7:52 AM, Uros Bizjak wrote:
> > On Thu, Jul 25, 2019 at 7:34 PM Peter Bergner  wrote:
> >>> +#define DRIVER_SELF_SPECS SUBTARGET_DRIVER_SELF_SPECS
> >>> +
> >>> +#ifndef SUBTARGET_DRIVER_SELF_SPECS
> >>> +# define SUBTARGET_DRIVER_SELF_SPECS
> >>> +#endif
> >
> > Shouldn't we swap these two defines, so we always get SUBTARGET_D_S_S
> > defined first? Like:
> >
> > #ifndef SUBTARGET_DRIVER_SELF_SPECS
> > # define SUBTARGET_DRIVER_SELF_SPECS ""
> > #endif
> >
> > #define DRIVER_SELF_SPECS SUBTARGET_DRIVER_SELF_SPECS
>
> That's the way I would have coded it myself, but when I looked at the
> other arches handling of spec defines, they all seemed to code it this
> way.  Consider it swapped.

Yes, this looks much more readable to me; consider the x86 part OK.

Uros.


[PATCH][aarch64] Use neoversen1 tuning struct for -mcpu=cortex-a76

2019-07-30 Thread Kyrill Tkachov

Hi all,

The neoversen1 tuning struct gives better performance on the Cortex-A76, 
so use that.
The only difference from the current tuning is the function and label 
alignment settings.


This gives about 1.3% improvement on SPEC2006 int and 0.3% on SPEC2006 fp.

Tested on aarch64-none-elf.

Ok for trunk?
Thanks,
Kyrill

2019-07-31  Kyrylo Tkachov  

    * config/aarch64/aarch64-cores.def (cortex-a76): Use neoversen1 tuning
    struct.

diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index 82d91d62519f0476e059718d8c2f02e3ea5a0613..bd257aeb34e7948caf729e5a684ddd9ceba457a4 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -99,7 +99,7 @@ AARCH64_CORE("thunderx2t99",  thunderx2t99,  thunderx2t99, 8_1A,  AARCH64_FL_FOR
 /* ARM ('A') cores. */
 AARCH64_CORE("cortex-a55",  cortexa55, cortexa53, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, cortexa53, 0x41, 0xd05, -1)
 AARCH64_CORE("cortex-a75",  cortexa75, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, cortexa73, 0x41, 0xd0a, -1)
-AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, cortexa72, 0x41, 0xd0b, -1)
+AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, neoversen1, 0x41, 0xd0b, -1)
 AARCH64_CORE("ares",  ares, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_PROFILE, neoversen1, 0x41, 0xd0c, -1)
 AARCH64_CORE("neoverse-n1",  neoversen1, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_PROFILE, neoversen1, 0x41, 0xd0c, -1)
 AARCH64_CORE("neoverse-e1",  neoversee1, cortexa53, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, cortexa53, 0x41, 0xd4a, -1)


[PATCH] PR fortran/91296 -- Prevent ICE in aliasing check

2019-07-30 Thread Steve Kargl
The attached patch has been regression tested on x86_64-*-freebsd.
There were no regressions.

The patch prevents an ICE when checking for aliasing and the
actual arguments are the real and imaginary parts of a complex
entity.  See the testcase for more information.

OK to commit to both trunk and 9-branch?

2019-07-30  Steven G. Kargl  

PR fortran/91296
* interface.c (compare_actual_expr): When checking for aliasing, add
a case to handle REF_INQUIRY (e.g., foo(x%re, x%i) do not alias).

2019-07-30  Steven G. Kargl  

PR fortran/91296
* gfortran.dg/pr91296.f90: New test.

-- 
Steve
Index: gcc/fortran/interface.c
===
--- gcc/fortran/interface.c	(revision 273846)
+++ gcc/fortran/interface.c	(working copy)
@@ -3489,6 +3489,13 @@ compare_actual_expr (gfc_expr *e1, gfc_expr *e2)
 	case REF_SUBSTRING:
 	  return false;
 
+	case REF_INQUIRY:
+	  if (e1->symtree->n.sym->ts.type == BT_COMPLEX
+	  && e1->ts.type == BT_REAL && e2->ts.type == BT_REAL
+	  && r1->u.i != r2->u.i)
+	return false;
+	  break;
+
 	default:
 	  gfc_internal_error ("compare_actual_expr(): Bad component code");
 	}
Index: gcc/testsuite/gfortran.dg/pr91296.f90
===
--- gcc/testsuite/gfortran.dg/pr91296.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr91296.f90	(working copy)
@@ -0,0 +1,27 @@
+! { dg-do compile }
+! { dg-options "-Waliasing" }
+! PR fortran/91296
+! Code contributed by Chinoune Mehdi  
+module m
+  implicit none
+  integer, parameter :: sp = selected_real_kind(6)
+
+contains
+  pure subroutine s(a,b,c)
+real(sp), intent(in) :: a, b
+real(sp), intent(out) :: c
+c = a + b
+  end subroutine s
+end module m
+
+program test
+  use m
+  implicit none
+  real(sp) :: a
+  complex(sp) :: c
+
+  c = (1._sp,1._sp)
+  call s(c%re,c%im,a)   ! *** This use to cause an ICE. ***
+  print*,a
+
+end program test


[COMMITTED][ARM] Adjust literal pool offset in Thumb-2 movsi patterns

2019-07-30 Thread Wilco Dijkstra
My previous change to the Thumb-2 movsi patterns [1] caused a codesize 
regression
with -Os in large functions.  Fix this by using the literal pool offset of the
16-bit literal load so that the literal pool is dumped earlier, reducing the
number of 32-bit literal loads.

Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57,
committed as obvious.

[1] https://gcc.gnu.org/ml/gcc-patches/2019-07/msg01579.html

ChangeLog:
2019-07-30  Wilco Dijkstra  

* config/arm/thumb2.md (thumb2_movsi_insn): Adjust literal offset.
* config/arm/vfp.md (thumb2_movsi_vfp): Likewise.

---
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 
c124838c3a536244909223bd18735427cef7d969..6ccc875e2b4e7b8ce256e52da966dfe220c6f5d6
 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -274,7 +274,7 @@ (define_insn "*thumb2_movsi_insn"
(set_attr "length" "2,4,2,4,4,4,4")
(set_attr "predicable" "yes")
(set_attr "predicable_short_it" "yes,no,yes,no,no,no,no")
-   (set_attr "pool_range" "*,*,*,*,*,4094,*")
+   (set_attr "pool_range" "*,*,*,*,*,1018,*")
(set_attr "neg_pool_range" "*,*,*,*,*,0,*")]
 )
 
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index 
9cc5946c79e027c0132d2b5e4fd6b97f13bca72e..661919e2357d352d12ff1020dc061f0c8d052841
 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -297,7 +297,7 @@ (define_insn "*thumb2_movsi_vfp"
(set_attr "predicable_short_it" "yes,no,yes,no,no,no,no,no,no,no,no,no")
(set_attr "type" 
"mov_reg,mov_reg,mov_reg,mvn_reg,mov_imm,load_4,store_4,f_mcr,f_mrc,fmov,f_loads,f_stores")
(set_attr "length" "2,4,2,4,4,4,4,4,4,4,4,4")
-   (set_attr "pool_range" "*,*,*,*,*,4094,*,*,*,*,1018,*")
+   (set_attr "pool_range" "*,*,*,*,*,1018,*,*,*,*,1018,*")
(set_attr "neg_pool_range" "*,*,*,*,*,   0,*,*,*,*,1008,*")]
 )
 


Re: [PATCH, rs6000] Fix PR91050 by adding a DRIVER_SELF_SPECS spec

2019-07-30 Thread Peter Bergner
On 7/30/19 7:52 AM, Uros Bizjak wrote:
> On Thu, Jul 25, 2019 at 7:34 PM Peter Bergner  wrote:
>>> +#define DRIVER_SELF_SPECS SUBTARGET_DRIVER_SELF_SPECS
>>> +
>>> +#ifndef SUBTARGET_DRIVER_SELF_SPECS
>>> +# define SUBTARGET_DRIVER_SELF_SPECS
>>> +#endif
> 
> Shouldn't we swap these two defines, so we always get SUBTARGET_D_S_S
> defined first? Like:
> 
> #ifndef SUBTARGET_DRIVER_SELF_SPECS
> # define SUBTARGET_DRIVER_SELF_SPECS ""
> #endif
> 
> #define DRIVER_SELF_SPECS SUBTARGET_DRIVER_SELF_SPECS

That's the way I would have coded it myself, but when I looked at the
other arches handling of spec defines, they all seemed to code it this
way.  Consider it swapped.

Peter




Re: [PATCH] Make BITMAP_WORD more easily configurable

2019-07-30 Thread Martin Sebor

On 7/30/19 9:19 AM, Jakub Jelinek wrote:

On Tue, Jul 30, 2019 at 09:16:45AM -0600, Martin Sebor wrote:

Say something like this POC:


We have those already, see ctz_hwi, clz_hwi etc.
The thing is that BITMAP_WORD isn't always the same type as unsigned
HOST_WIDE_INT, and so we need ones with the BITMAP_WORD type.


That sounds like a problem overloading would solve to the benefit
of all clients.  I.e., overload count_trailing_zeros (or whatever
other name) to take integer arguments of all widths (signed and
unsigned) and call the appropriate built-in from each.  That way
we just have to remember one function name and know that it will
be the right choice regardless of the type of the argument.

Martin


Re: [PATCH] Make BITMAP_WORD more easily configurable

2019-07-30 Thread Jakub Jelinek
On Tue, Jul 30, 2019 at 09:16:45AM -0600, Martin Sebor wrote:
> Say something like this POC:

We have those already, see ctz_hwi, clz_hwi etc.
The thing is that BITMAP_WORD isn't always the same type as unsigned
HOST_WIDE_INT, and so we need ones with the BITMAP_WORD type.

Jakub


Re: [PATCH] Make BITMAP_WORD more easily configurable

2019-07-30 Thread Martin Sebor

On 7/30/19 8:37 AM, Richard Biener wrote:


I've played with different BITMAP_WORD and BITMAP_ELEMENT_WORDS
values and found the code using CTZ and friends a bit unwieldy
to adjust.  So I came up with a set of macros wrapping them
(_maybe_ inline overloads would be a cleaner solution, but ...).

Specifically it looks like for PR91257 making bitmap_element
cache-line aligned and 32 bytes in size by using an unsigned
int BITMAP_WORD and BITMAP_ELEMENT_WORDS 3 (thus reducing
the load factor from 0.4 to 0.375) is a slight win (in both
memory use and compile-time).  Enlarging to 64 bytes and keeping
unsigned long (and the padding) is worst; even using only
a single unsigned long (but then aligning to 32 bytes again)
is faster than the current state.  Surprisingly the division/modulo
by 3 or the larger number of loads/stores didn't matter or was
at least offset by reliably touching only a single cache line
per bitmap_element.
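
For concreteness, the layout experiment described above corresponds roughly
to the following (illustrative only, assuming a 64-bit host; the real struct
is bitmap_element in bitmap.h):

/* Today: two unsigned long words of payload, 40 bytes with padding,
   i.e. 128/320 = 0.4 payload bits per struct bit.  */
struct element_long
{
  struct element_long *next, *prev;  /* 16 bytes */
  unsigned int indx;                 /*  4 bytes + 4 bytes padding */
  unsigned long bits[2];             /* 16 bytes -> 40 bytes total */
};

/* Experiment: three unsigned int words, exactly 32 bytes (half a cache
   line), 96/256 = 0.375 payload bits per struct bit.  */
struct element_int
{
  struct element_int *next, *prev;   /* 16 bytes */
  unsigned int indx;                 /*  4 bytes */
  unsigned int bits[3];              /* 12 bytes -> 32 bytes total */
};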

I would guess the optimal setting should depend on the host.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Any objection to the macro-ification below?


The built-ins are general purpose (but poorly named) and could be
handy elsewhere (I just recently looked for one of them) so rather
than adding BITMAP_WORD_CTZ et al. to bitmap (and continuing to
support the GCC 3 conditional in the bitmap code) I'd suggest
introducing a general purpose wrapper function with a good name
as the portable interface to __builtin_ctz et al.

Say something like this POC:

  static inline unsigned
  count_trailing_zeros (unsigned x)
  {
  #if (GCC_VERSION >= 3004)
return __builtin_ctz (x);
  #else
unsigned n = 0;
while ((n < sizeof n * CHAR_BIT) && !(x & 1))
  {
x >>= 1;
++n;
   }
return n;
  #endif
  }

Martin


Richard.

2019-07-30  Richard Biener  

* bitmap.h (BITMAP_WORD_CTZ): New define.
(BITMAP_WORD_CLZ): Likewise.
(BITMAP_WORD_POPCOUNT): Likewise.
(bmp_iter_next_bit): Use BITMAP_WORD_CTZ, remove assert.
* bitmap.c (bitmap_count_bits_in_word): Use BITMAP_WORD_POPCOUNT.
(bitmap_single_bit_set_p): Likewise.
(bitmap_first_set_bit): Use BITMAP_WORD_CTZ, remove assert.
(bitmap_last_set_bit): Use BITMAP_WORD_CLZ, remove assert.

Index: gcc/bitmap.c
===
--- gcc/bitmap.c(revision 273795)
+++ gcc/bitmap.c(working copy)
@@ -1016,7 +1020,7 @@ bitmap_count_bits_in_word (const BITMAP_
  #if GCC_VERSION >= 3400
/* Note that popcountl matches BITMAP_WORD in type, so the actual size
 of BITMAP_WORD is not material.  */
-  count += __builtin_popcountl (bits[ix]);
+  count += BITMAP_WORD_POPCOUNT (bits[ix]);
  #else
count += bitmap_popcount (bits[ix]);
  #endif
@@ -1101,7 +1105,7 @@ bitmap_single_bit_set_p (const_bitmap a)
  #if GCC_VERSION >= 3400
/* Note that popcountl matches BITMAP_WORD in type, so the actual size
 of BITMAP_WORD is not material.  */
-  count += __builtin_popcountl (elt->bits[ix]);
+  count += BITMAP_WORD_POPCOUNT (elt->bits[ix]);
  #else
count += bitmap_popcount (elt->bits[ix]);
  #endif
@@ -1142,8 +1146,7 @@ bitmap_first_set_bit (const_bitmap a)
bit_no += ix * BITMAP_WORD_BITS;
  
  #if GCC_VERSION >= 3004

-  gcc_assert (sizeof (long) == sizeof (word));
-  bit_no += __builtin_ctzl (word);
+  bit_no += BITMAP_WORD_CTZ (word);
  #else
/* Binary search for the first set bit.  */
  #if BITMAP_WORD_BITS > 64
@@ -1200,8 +1203,7 @@ bitmap_last_set_bit (const_bitmap a)
   found_bit:
bit_no += ix * BITMAP_WORD_BITS;
  #if GCC_VERSION >= 3004
-  gcc_assert (sizeof (long) == sizeof (word));
-  bit_no += BITMAP_WORD_BITS - __builtin_clzl (word) - 1;
+  bit_no += BITMAP_WORD_BITS - BITMAP_WORD_CLZ (word) - 1;
  #else
/* Hopefully this is a twos-complement host...  */
BITMAP_WORD x = word;
Index: gcc/bitmap.h
===
--- gcc/bitmap.h(revision 273795)
+++ gcc/bitmap.h(working copy)
@@ -272,6 +272,10 @@ extern mem_alloc_description  
  typedef unsigned long BITMAP_WORD;

+/* Word primitives.  */
+#define BITMAP_WORD_CTZ __builtin_ctzl
+#define BITMAP_WORD_CLZ __builtin_clzl
+#define BITMAP_WORD_POPCOUNT __builtin_popcountl
  /* BITMAP_WORD_BITS needs to be unsigned, but cannot contain casts as
 it is used in preprocessor directives -- hence the 1u.  */
  #define BITMAP_WORD_BITS (CHAR_BIT * SIZEOF_LONG * 1u)
@@ -683,8 +688,7 @@ bmp_iter_next_bit (bitmap_iterator * bi,
  {
  #if (GCC_VERSION >= 3004)
{
-unsigned int n = __builtin_ctzl (bi->bits);
-gcc_assert (sizeof (unsigned long) == sizeof (BITMAP_WORD));
+unsigned int n = BITMAP_WORD_CTZ (bi->bits);
  bi->bits >>= n;
  *bit_no += n;
}





Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-07-30 Thread Wilco Dijkstra
Hi all,
 
 >On 30/07/2019 10:31, Ramana Radhakrishnan wrote:
 >> On 30/07/2019 10:08, Christophe Lyon wrote:

 >>> Hi Wilco,
 >>>
 >>> Do you know which benchmarks were used when this was checked-in?
 >>> It isn't clear from 
 >>> https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html
 >> 
 >> It was from my time in Linaro and thus would have been a famous embedded 
 >> benchmark, coremark , spec2000 - all tested probably on cortex-a9 and 
 >> Cortex-A15. In addition to this I would like to see what the impact of 
 >> this is on something like Cortex-A53 as the issue rates are likely to be 
 >> different on the schedulers causing different behaviour.

Obviously there are differences between various schedulers, but the general
issue is that register pressure is increased many times beyond the spilling 
limit
(a few cases I looked at had a pressure well over 120 when there are only 14
integer registers - this causes panic spilling in the register allocator).

In fact the spilling overhead between the 2 algorithms is almost identical on
Cortex-A53 and Cortex-A57, so the issue isn't directly related to the pipeline
model used. It seems more related to the scheduler being too aggressive
and not caring about register pressure at all (for example lifting a load 100
instructions before its use so it must be spilled).

 >> I don't have all the notes today for that - maybe you can look into the 
 >> linaro wiki.
 >> 
 >> I am concerned about taking this patch in without some more data across 
 >> a variety of cores.
 >> 
 >
 > My concern is the original patch 
 > (https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html) is lacking in 
 > any real detail as to the reasons for the choice of the second algorithm 
 > over the first.
 > 
 > - It's not clear what the win was
 > - It's not clear what outliers there were and whether they were significant.
 >
 > And finally, it's not clear if, 7 years later, this is still the best 
 > choice.
 > 
 > If the second algorithm really is better, why is no other target using 
 > it by default?
 >
 > I think we need a bit more information (both ways).  In particular I'm 
 > concerned not just by the overall benchmark average, but also the amount 
 > of variance across the benchmarks.  I think the default needs to avoid 
 > significant outliers if at all possible, even if it is marginally less 
 > good on the average.
 
The results clearly show that algorithm 1 works best on Arm today - I haven't
seen a single benchmark where algorithm 2 results in less spilling. We could
tune algorithm 2 so it switches back to algorithm 1 when register pressure is
high or a basic block is large. However until it is fixed, the evidence is that
algorithm 1 is the best choice for current cores.

Wilco

Re: [range-ops] patch 01/04: types for VR_UNDEFINED and VR_VARYING

2019-07-30 Thread Andrew MacLeod

On 7/30/19 4:55 AM, Richard Biener wrote:

On Fri, Jul 26, 2019 at 4:32 PM Andrew MacLeod  wrote:

On 7/25/19 11:37 PM, Jeff Law wrote:

On 7/24/19 12:33 PM, Richard Biener wrote:

On July 24, 2019 8:18:57 PM GMT+02:00, Jeff Law 
wrote:

On 7/24/19 11:00 AM, Richard Biener wrote: [ Big snip, ignore
missing reply attributions... ]


it. But I'd claim that if callers are required not to change
these ranges, then the callers are fundamentally broken.  I'm
not sure what the "sanitization" is really buying you here.
Can you point to something specific?


But you lose the sanitizing that ensures nobody can change it, and the
changed info leaks to other SSA vars.

As said, fix all callers to deal with NULL.

But I argue the current code is exactly optimal and safe.

And I'd argue that it's just plain broken and that the
sanitization you're referring to points to something broken
elsewhere, higher up in the callers.

Another option is to make get_value_range return by value and
the only way to change the lattice to call an appropriate set
function. I think we already do the latter in all cases (but we
use get_value_range in the setter) and returning by reference is
just eliding the copy.

OK, so what I think you're getting at (and please correct me if
I'm wrong) is that once the lattice values are set, you don't want
something changing the recorded ranges underneath?

ISTM the way to enforce that is to embed the concept in the class
and enforce it by not allowing direct manipulation of range by the
clients. So a client that wants this behavior somehow tells the
class that ranges are "set in stone" and from that point the
setters don't allow changing the underlying ranges.

Yes. You'll see that nearly all callers do

Value_range vr = *get_value_range (name);

Modify

Update_value_range (name, &vr);

And returning by reference was mostly an optimization. We _did_ have
callers changing the range in place and the const varying caught
those.

When returning by value we can return individual VARYINGs not in the
lattice if we decide that's what we want.


I just want to make sure we're on the same page WRT why you think
the constant varying range object is useful.

As said it's an optimization. We do not want to reallocate the
lattice. And we want lattice updating to happen in a controlled
manner, so returning a pointer into the lattice is bad design at this
point.

But I would claim that the current state is dreadful.  Consider that
when gimple-fold asks for a new SSA_NAME, it could get a recycled one,
in which case we get a real range.  Or if it doesn't get a recycled
name, then we get the const varying node.  The inconsistency is silly
when we can just reallocate the underlying object.

Between recycling of SSA_NAMEs and allocating a bit of additional space
(say rounding up to some reasonable boundary) I'd bet you'd never be
able to measure the reallocation in practice.


I annotated the patch yesterday to actually log reallocations and ran a
couple of bootstraps...

If we don't add any extra space in the vector initially (as it is
allocated today) , it is re-sized a total of 201 times.  Of those, 93
get back the same pointer so no resize actually happens.

If we use the patch as I initially proposed, where we add 10% to the
vector to start, we re-size 10 times, all from either 18 to 25, or 8 to
14, so only functions with a very small number of SSA names, where the
10% doesn't really do much.  Those were all re-allocations but one.  The
initial resize does seem to help prevent any larger reallocations.

That doesn't seem like that bad of a thing overall, and we could tweak
the initial size a bit more if it would help.  To deal with the small
cases, we could make it num_names + 10% + 10 or something even.
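Concretely, as a sketch only (not actual tree-vrp code), that growth
policy would be along the lines of:

  /* Head-room when (re)allocating the lattice: current SSA names plus 10%
     plus a small constant so tiny functions still get a few spare slots.
     XCNEWVEC and num_ssa_names are existing GCC helpers; the policy itself
     is what is being discussed here, not settled code.  */
  unsigned num_vr_values = num_ssa_names + num_ssa_names / 10 + 10;
  vr_value = XCNEWVEC (value_range *, num_vr_values);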

I feel it would be safer.  It seems to me the CONST solution cannot be
disabled, i.e. even a non-checking production compiler would go boom if
it triggered.

As for addressing the issue that the CONST approach is trying to
resolve, Could we change all the set/update routines to do something like

gcc_checking_assert (new_range->varying_p ()  || !values_propagated);

that way we'll trigger an assert if we try to change any value to
something other than varying when values_propagated is set?

I think the constness is appropriately addressed by my recent API update
to split the get-value from get-lattice-entry and properly forcing lattice
update to go through the few setters.

I'm not totally against making the lattice expandable, as the followup
patch to the original re-org notices we do run into "new" stmts during
the combined analysis/propagation stages the DOM-based users use.
And yes, allocating the lattice with some initial head-room is of course
the way to go (I'd even just do 2 * num_ssa_names...).  Avoiding
"useless" allocations of VR_VARYING lattice entries (where a NULL
does it just fine) is another optimization.  But yeah, we do not
"free" lattice entries when they become VR_VARYING and further
return the shared const entry (but 

Re: [PATCH v2] Use edge->indirect_unknown_callee in cgraph_edge::make_direct (PR ipa/89330).

2019-07-30 Thread Martin Liška
On 7/30/19 3:37 PM, Martin Liška wrote:
> Hi.
> 
> Thanks to Martin I was able to prepare a proper fix. The issue is that
> cgraph_edge::resolve_speculation can delete this pointer (yes, it's
> super nasty) and so that the caller can't use this->something
> right after the function returns.
> 
> For the long term, I'll rework the ::resolve_speculation function.
> 
> The patch survives --enable-checking bootstrap on x86_64-linux-gnu.
> 
> Ready to be installed after proper testing?
> Thanks,
> Martin
> 

Honza approved me the patch offline.

Martin


Re: [PATCH] PR91195: fix -Wmaybe-uninitialized warning for conditional store optimization

2019-07-30 Thread Martin Sebor

On 7/30/19 2:44 AM, Richard Biener wrote:

On Tue, Jul 30, 2019 at 10:35 AM Jakub Jelinek  wrote:


On Tue, Jul 30, 2019 at 10:21:15AM +0200, Richard Biener wrote:

Would you need to LTO stream & merge the bitmaps / maps somehow?


Yes.  And if we do not throw unneeded warnings from the sets normally, LTO
streaming might be a good time to do that, so that we merge in only warnings
that will be tested during/post IPA.


(maybe "unmap" to sth else when streaming individual stmts)  Not sure if
a bitmap is really necessary, we could simply have a tree/gimple * -> vec<>
mapping with records about emitted (or queued) diagnostics, like
OPT_, message pairs or so.


Right now we use it both for the case we've already emitted some diagnostics
and don't want to emit it again later, or for the case where we insert
something in the IL that a warning could be diagnosed on but we know we
don't want that.  The latter is the case e.g. for when we add
TREE_NO_WARNING on the last value in a statement expression so that we don't
diagnose it as unused, or the case we are discussing here.
If we need queued diagnostics, sure, we could add it in, but I don't see a
need for that right now.  vecs compared to bitmap might be larger and harder
to avoid duplicates.  I guess we'd need to do some analysis though, and e.g.
if 99% of cases we need to disable just one warning and not multiple,
perhaps optimize for that case.


I was thinking we may want to use the same facility to record warnings we
want to not emit if we later figure a stmt was in an unreachable region.  For
this we need to record the actual warning, not only the warning kind.


I was thinking of introducing a __builtin_warning() for this and
issuing it if it's not eliminated just before RTL expansion.  That
would let programs like libstdc++ use it too.

The downside of these solutions is that warnings may be issued out
of order as code moves around.  To have the issued in the order of
increasing source lines they would need to be queued.
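
As a purely hypothetical illustration of the idea (no such builtin exists
today; the name, arguments and placement below are made up for the sketch):

/* Hypothetical: the compiler would keep this call through the middle end
   and, if it survives until just before RTL expansion, emit the warning;
   if optimization proves the branch dead, the call and the warning vanish.  */
void
copy_checked (char *dst, unsigned long dstsize, const char *src,
	      unsigned long n)
{
  if (n > dstsize)
    __builtin_warning ("writing %lu bytes into a region of size %lu",
		       n, dstsize);
  __builtin_memcpy (dst, src, n);
}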

Martin


[PATCH] Make VRP union faster

2019-07-30 Thread Richard Biener


The following addresses value_range::union_ranges which ends up
doing quite some redundant work for the PR91257 testcase via
PHI argument processing.  compare_values isn't really much
slower once you may end up needing more than a single
operand_less_p - in fact for non-constant ranges it often ends
up dispatching there.
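
For reference, my reading of how the compare_values results map onto the
old tests (an illustration, not part of the patch):

/* compare_values (a, b) returns:
     0   a == b              (replaces the old vrp_operand_equal_p test)
    -1   a <  b              (replaces operand_less_p (a, b) == 1)
    +1   a >  b              (replaces operand_less_p (b, a) == 1)
    -2   the two values cannot be compared (e.g. symbolic operands)
   (+2 is also possible: known unequal, but the ordering is unknown).
   So a single call answers both the equality and the ordering questions
   the old code asked separately.  */
int cmpmin = compare_values (*vr0min, vr1min);
bool mineq = cmpmin == 0;           /* former vrp_operand_equal_p test */
bool vr0min_less = cmpmin == -1;    /* former operand_less_p (...) == 1 */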

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

Richard.

2019-07-30  Richard Biener  

PR tree-optimization/91257
* tree-vrp.c (union_ranges): Unify equality and less tests
by using compare_values.  Re-order cheap tests first.

Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 273795)
+++ gcc/tree-vrp.c  (working copy)
@@ -5468,8 +5474,10 @@ union_ranges (enum value_range_kind *vr0
  enum value_range_kind vr1type,
  tree vr1min, tree vr1max)
 {
-  bool mineq = vrp_operand_equal_p (*vr0min, vr1min);
-  bool maxeq = vrp_operand_equal_p (*vr0max, vr1max);
+  int cmpmin = compare_values (*vr0min, vr1min);
+  int cmpmax = compare_values (*vr0max, vr1max);
+  bool mineq = cmpmin == 0;
+  bool maxeq = cmpmax == 0;
 
   /* [] is vr0, () is vr1 in the following classification comments.  */
   if (mineq && maxeq)
@@ -5569,8 +5577,8 @@ union_ranges (enum value_range_kind *vr0
   else
gcc_unreachable ();
 }
-  else if ((maxeq || operand_less_p (vr1max, *vr0max) == 1)
-  && (mineq || operand_less_p (*vr0min, vr1min) == 1))
+  else if ((maxeq || cmpmax == 1)
+  && (mineq || cmpmin == -1))
 {
   /* [ (  ) ] or [(  ) ] or [ (  )] */
   if (*vr0type == VR_RANGE
@@ -5603,8 +5611,8 @@ union_ranges (enum value_range_kind *vr0
   else
gcc_unreachable ();
 }
-  else if ((maxeq || operand_less_p (*vr0max, vr1max) == 1)
-  && (mineq || operand_less_p (vr1min, *vr0min) == 1))
+  else if ((maxeq || cmpmax == -1)
+  && (mineq || cmpmin == 1))
 {
   /* ( [  ] ) or ([  ] ) or ( [  ]) */
   if (*vr0type == VR_RANGE
@@ -5643,10 +5651,10 @@ union_ranges (enum value_range_kind *vr0
   else
gcc_unreachable ();
 }
-  else if ((operand_less_p (vr1min, *vr0max) == 1
-   || operand_equal_p (vr1min, *vr0max, 0))
-  && operand_less_p (*vr0min, vr1min) == 1
-  && operand_less_p (*vr0max, vr1max) == 1)
+  else if (cmpmin == -1
+  && cmpmax == -1
+  && (operand_less_p (vr1min, *vr0max) == 1
+  || operand_equal_p (vr1min, *vr0max, 0)))
 {
   /* [  (  ]  ) or [   ](   ) */
   if (*vr0type == VR_RANGE
@@ -5680,10 +5688,10 @@ union_ranges (enum value_range_kind *vr0
   else
gcc_unreachable ();
 }
-  else if ((operand_less_p (*vr0min, vr1max) == 1
-   || operand_equal_p (*vr0min, vr1max, 0))
-  && operand_less_p (vr1min, *vr0min) == 1
-  && operand_less_p (vr1max, *vr0max) == 1)
+  else if (cmpmin == 1
+  && cmpmax == 1
+  && (operand_less_p (*vr0min, vr1max) == 1
+  || operand_equal_p (*vr0min, vr1max, 0)))
 {
   /* (  [  )  ] or (   )[   ] */
   if (*vr0type == VR_RANGE


[PATCH] Make BITMAP_WORD more easily configurable

2019-07-30 Thread Richard Biener


I've played with different BITMAP_WORD and BITMAP_ELEMENT_WORDS
values and found the code using CTZ and friends a bit unwieldy
to adjust.  So I came up with a set of macros wrapping them
(_maybe_ inline overloads would be a cleaner solution, but ...).

Specifically it looks like for PR91257 making bitmap_element
cache-line aligned and 32 bytes in size by using an unsigned
int BITMAP_WORD and BITMAP_ELEMENT_WORDS 3 (thus reducing
the load factor from 0.4 to 0.375) is a slight win (in both
memory use and compile-time).  Enlarging to 64 bytes and keeping
unsigned long (and the padding) is worst; even using only
a single unsigned long (but then aligning to 32 bytes again)
is faster than the current state.  Surprisingly the division/modulo
by 3 or the larger number of loads/stores didn't matter, or were
at least offset by reliably touching only a single cache line
per bitmap_element.

I would guess the optimal setting should depend on the host.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Any objection to the macro-ification below?

Richard.

2019-07-30  Richard Biener  

* bitmap.h (BITMAP_WORD_CTZ): New define.
(BITMAP_WORD_CLZ): Likewise.
(BITMAP_WORD_POPCOUNT): Likewise.
(bmp_iter_next_bit): Use BITMAP_WORD_CTZ, remove assert.
* bitmap.c (bitmap_count_bits_in_word): use BITMAP_WORD_POPCOUNT.
(bitmap_single_bit_set_p): likewise.
(bitmap_first_set_bit): Use BITMAP_WORD_CTZ, remove assert.
(bitmap_last_set_bit): Use BITMAP_WORD_CLZ, remove assert.

Index: gcc/bitmap.c
===
--- gcc/bitmap.c(revision 273795)
+++ gcc/bitmap.c(working copy)
@@ -1016,7 +1020,7 @@ bitmap_count_bits_in_word (const BITMAP_
 #if GCC_VERSION >= 3400
   /* Note that popcountl matches BITMAP_WORD in type, so the actual size
 of BITMAP_WORD is not material.  */
-  count += __builtin_popcountl (bits[ix]);
+  count += BITMAP_WORD_POPCOUNT (bits[ix]);
 #else
   count += bitmap_popcount (bits[ix]);
 #endif
@@ -1101,7 +1105,7 @@ bitmap_single_bit_set_p (const_bitmap a)
 #if GCC_VERSION >= 3400
   /* Note that popcountl matches BITMAP_WORD in type, so the actual size
 of BITMAP_WORD is not material.  */
-  count += __builtin_popcountl (elt->bits[ix]);
+  count += BITMAP_WORD_POPCOUNT (elt->bits[ix]);
 #else
   count += bitmap_popcount (elt->bits[ix]);
 #endif
@@ -1142,8 +1146,7 @@ bitmap_first_set_bit (const_bitmap a)
   bit_no += ix * BITMAP_WORD_BITS;
 
 #if GCC_VERSION >= 3004
-  gcc_assert (sizeof (long) == sizeof (word));
-  bit_no += __builtin_ctzl (word);
+  bit_no += BITMAP_WORD_CTZ (word);
 #else
   /* Binary search for the first set bit.  */
 #if BITMAP_WORD_BITS > 64
@@ -1200,8 +1203,7 @@ bitmap_last_set_bit (const_bitmap a)
  found_bit:
   bit_no += ix * BITMAP_WORD_BITS;
 #if GCC_VERSION >= 3004
-  gcc_assert (sizeof (long) == sizeof (word));
-  bit_no += BITMAP_WORD_BITS - __builtin_clzl (word) - 1;
+  bit_no += BITMAP_WORD_BITS - BITMAP_WORD_CLZ (word) - 1;
 #else
   /* Hopefully this is a twos-complement host...  */
   BITMAP_WORD x = word;
Index: gcc/bitmap.h
===
--- gcc/bitmap.h(revision 273795)
+++ gcc/bitmap.h(working copy)
@@ -272,6 +272,10 @@ extern mem_alloc_description
...
 #if (GCC_VERSION >= 3004)
   {
-unsigned int n = __builtin_ctzl (bi->bits);
-gcc_assert (sizeof (unsigned long) == sizeof (BITMAP_WORD));
+unsigned int n = BITMAP_WORD_CTZ (bi->bits);
 bi->bits >>= n;
 *bit_no += n;
   }
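
For reference, wrappers along the lines the ChangeLog describes might look
like this (a sketch only; SIZEOF_BITMAP_WORD and the exact conditions are
assumptions, not the posted patch):

/* Sketch: pick the builtin that matches the width of BITMAP_WORD.
   SIZEOF_BITMAP_WORD is an assumed configure-provided constant.  */
#if SIZEOF_BITMAP_WORD == SIZEOF_INT
#define BITMAP_WORD_CTZ(x)      __builtin_ctz (x)
#define BITMAP_WORD_CLZ(x)      __builtin_clz (x)
#define BITMAP_WORD_POPCOUNT(x) __builtin_popcount (x)
#elif SIZEOF_BITMAP_WORD == SIZEOF_LONG
#define BITMAP_WORD_CTZ(x)      __builtin_ctzl (x)
#define BITMAP_WORD_CLZ(x)      __builtin_clzl (x)
#define BITMAP_WORD_POPCOUNT(x) __builtin_popcountl (x)
#else
#define BITMAP_WORD_CTZ(x)      __builtin_ctzll (x)
#define BITMAP_WORD_CLZ(x)      __builtin_clzll (x)
#define BITMAP_WORD_POPCOUNT(x) __builtin_popcountll (x)
#endif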


Re: [PATCH] Remove also 2nd argument for unused delete operator (PR tree-optimization/91270).

2019-07-30 Thread Marc Glisse

On Tue, 30 Jul 2019, Martin Liška wrote:


Ah, that's bad, both of them need a care:


Yes, that makes more sense to me, thanks.


I tried to experiment to understand, but it is complicated because including 
<new> disables the optimization:

#include <new>
void fn1() {
    char*p=new char;
    delete p;
}

This ICEs with -O -std=c++17:

int a = 64;
std::align_val_t b{64};
void fn1() {
  void *s = operator new(a,b);
  operator delete(s,8+*(unsigned long*)s,b);
}




I can't see it on current master. Can you?


Yes. I just did a clean build to make sure.

--
Marc Glisse


Re: [ARM][PATCH 1/2] Support HFmode for standard names implemented with VRINT instructions.

2019-07-30 Thread Richard Earnshaw (lists)




On 29/05/2019 15:47, Srinath Parvathaneni wrote:

Hello,

The initial implementation of the FP16 extension added HFmode support to
a limited number of the standard names.  Following
https://gcc.gnu.org/ml/gcc-patches/2016-08/msg00168.html, this patch
extends the HFmode support to the names implemented by the ARM
expanders for btrunc, ceil, round, floor, nearbyint and rint.  This
patch also changes the patterns supporting the neon_vrnd* and
neon_vcvt* Adv.SIMD intrinsics to use the standard names, where
appropriate.

No new tests are added. The ARM tests for the SF and DF mode variants of
these names are based on GCC builtin functions and there doesn't seem to
be an obvious alternative to trigger the new patterns through the
standard names. The pattern definitions though are tested through the
Adv.SIMD intrinsics.

The following patch reworks the implementation for HFmode VRINT to
remove a number of redundant constructs that were added in the initial
implementation.

The two patches have been tested for arm-none-linux-gnueabihf with
native bootstrap and make check and for arm-none-eabi with
cross-compiled check-gcc on an ARMv8.4-A emulator.

Ok for trunk? If ok, could someone please commit the patch on my behalf,
I don't have commit rights.

2019-05-29 Srinath Parvathaneni 
   Matthew Wahab  

* config/arm/iterators.md (fp16_rnd_str): Replace UNSPEC_VRND
values with equivalent UNSPEC_VRINT values.  Add UNSPEC_NVRINTZ,
UNSPEC_NVRINTA, UNSPEC_NVRINTM, UNSPEC_NVRINTN, UNSPEC_NVRINTP,
UNSPEC_NVRINTX.
(vrint_variant): Fix some white-space.
(vrint_predicable): Fix some white-space.
* config/arm/neon.md (neon_v): Replace
FP16_RND iterator with NEON_VRINT and update the rest of the
pattern accordingly.
(neon_vcvt): Replace with
neon_vcvt.
(neon_vcvt): New.
(neon_vcvtn): New.
* config/arm/unspecs.md: Add UNSPEC_VRINTN.
* config/arm/vfp.md (neon_vhf): Convert to an
expander invoking hf2.
(neon_vrndihf): Remove.
(neon_vrndnhf): New.
(neon_vcvthsi): Remove.
(hf2): New.
(lhfsi2): New.
(neon_vcvthsi): New.
(neon_vcvtnhsi): New.



Picking just one pattern as an example here, why does:

+(define_insn "lhfsi2"
+  [(set (match_operand:SI 0 "register_operand" "=t")
+   (FIXUORS:SI
+(unspec:HF [(match_operand:HF 1 "register_operand" "t")] VCVT)))]
+  "TARGET_VFP_FP16INST"
+  "vcvt.32.f16\\t%0, %1"
+  [(set_attr "predicable" "no")
+   (set_attr "conds" "unconditional")
+   (set_attr "type" "f_cvtf2i")]
+)

still need the unspec inside the FIXUORS?  If the pattern does what the 
FIX says, then it should be safe to do this directly on the register, 
without wrapping it first in an UNSPEC.


The other thing I noticed was:

   UNSPEC_VRINTA ; Represent a float to integral float rounding
 ; towards nearest, ties away from zero.
+  UNSPEC_VRINTN; Represent a float to integral float rounding
+   ; towards nearest.

So we have VRINTA which rounds to nearest, but rounds ties away from 
zero, but VRINTN rounds to nearest but doesn't say what happens when 
there is a tie (ie a value exactly half-way between two results.  The 
architecture treats this as ties-to-even, so I think we should say that 
here.


R.


Re: [PATCH] Deduce automatically number of cores for -flto option.

2019-07-30 Thread Martin Liška
On 7/29/19 3:35 PM, Martin Liška wrote:
> Hi.
> 
> I'm sending v2 of the patch that can newly auto-detect make
> jobserver and use it.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin
> 

Ok, I've discussed the latest version also with Honza and he's fine.
So I'm going to install the patch.

Thanks,
Martin


Re: [PATCH] Improve PR91257, specialize bitmap_ior_and_compl_into

2019-07-30 Thread Richard Biener
On Tue, 30 Jul 2019, Jakub Jelinek wrote:

> On Tue, Jul 30, 2019 at 12:20:00PM +0200, Richard Biener wrote:
> > +  if (compl_p)
> > +   for (ix = 0; ix < BITMAP_ELEMENT_WORDS; ix++)
> > + {
> > +   and_elt.bits[ix] = b_elt->bits[ix] & ~c_elt->bits[ix];
> > +   overall |= and_elt.bits[ix];
> > + }
> > +  else
> > +   for (ix = 0; ix < BITMAP_ELEMENT_WORDS; ix++)
> > + {
> > +   and_elt.bits[ix] = b_elt->bits[ix] & c_elt->bits[ix];
> > +   overall |= and_elt.bits[ix];
> > + }
> 
> Might be more readable by moving the if (compl_p) into the loop,
> just guarding the single statement that is different or even use a
> BITMAP_WORD temporary to load c_elt->bits[ix] into it, then conditionally
> complement and then use unconditionally in &.
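
For reference, that single-loop form would be roughly (a sketch, not the
committed code):

/* Load the C word once, complement it when needed, and use it
   unconditionally, so only one loop over the element is required.  */
for (ix = 0; ix < BITMAP_ELEMENT_WORDS; ix++)
  {
    BITMAP_WORD w = c_elt->bits[ix];
    if (compl_p)
      w = ~w;
    and_elt.bits[ix] = b_elt->bits[ix] & w;
    overall |= and_elt.bits[ix];
  }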

As said in the followup the patch didn't actually work.  So here's
the open-coded variant.

Bootstrapped and tested on x86_64-unknown-linux-gnu, will commit shortly.

Richard.

2019-07-30  Richard Biener  

PR tree-optimization/91257
* bitmap.c (bitmap_ior_and_compl_into): Open-code.

Index: gcc/bitmap.c
===
--- gcc/bitmap.c(revision 273906)
+++ gcc/bitmap.c(working copy)
@@ -2367,16 +2367,75 @@ bitmap_ior_and_compl (bitmap dst, const_
 bool
 bitmap_ior_and_compl_into (bitmap a, const_bitmap b, const_bitmap c)
 {
-  bitmap_head tmp;
-  bool changed;
+  bitmap_element *a_elt = a->first;
+  const bitmap_element *b_elt = b->first;
+  const bitmap_element *c_elt = c->first;
+  bitmap_element and_elt;
+  bitmap_element *a_prev = NULL;
+  bitmap_element **a_prev_pnext = &a->first;
+  bool changed = false;
+  unsigned ix;
 
   gcc_checking_assert (!a->tree_form && !b->tree_form && !c->tree_form);
 
-  bitmap_initialize (&tmp, &bitmap_default_obstack);
-  bitmap_and_compl (&tmp, b, c);
-  changed = bitmap_ior_into (a, &tmp);
-  bitmap_clear (&tmp);
+  if (a == b)
+return false;
+  if (bitmap_empty_p (c))
+return bitmap_ior_into (a, b);
+  else if (bitmap_empty_p (a))
+return bitmap_and_compl (a, b, c);
 
+  and_elt.indx = -1;
+  while (b_elt)
+{
+  /* Advance C.  */
+  while (c_elt && c_elt->indx < b_elt->indx)
+   c_elt = c_elt->next;
+
+  const bitmap_element *and_elt_ptr;
+  if (c_elt && c_elt->indx == b_elt->indx)
+   {
+ BITMAP_WORD overall = 0;
+ and_elt_ptr = &and_elt;
+ and_elt.indx = b_elt->indx;
+ for (ix = 0; ix < BITMAP_ELEMENT_WORDS; ix++)
+   {
+ and_elt.bits[ix] = b_elt->bits[ix] & ~c_elt->bits[ix];
+ overall |= and_elt.bits[ix];
+   }
+ if (!overall)
+   {
+ b_elt = b_elt->next;
+ continue;
+   }
+   }
+  else
+   and_elt_ptr = b_elt;
+
+  b_elt = b_elt->next;
+
+  /* Now find a place to insert AND_ELT.  */
+  do
+   {
+ ix = a_elt ? a_elt->indx : and_elt_ptr->indx;
+  if (ix == and_elt_ptr->indx)
+   changed = bitmap_elt_ior (a, a_elt, a_prev, a_elt,
+ and_elt_ptr, changed);
+  else if (ix > and_elt_ptr->indx)
+   changed = bitmap_elt_copy (a, NULL, a_prev, and_elt_ptr, changed);
+
+  a_prev = *a_prev_pnext;
+  a_prev_pnext = &a_prev->next;
+  a_elt = *a_prev_pnext;
+
+  /* If A lagged behind B/C, we advanced it so loop once more.  */
+   }
+  while (ix < and_elt_ptr->indx);
+}
+
+  gcc_checking_assert (!a->current == !a->first);
+  if (a->current)
+a->indx = a->current->indx;
   return changed;
 }
 


Re: [PATCH] Remove also 2nd argument for unused delete operator (PR tree-optimization/91270).

2019-07-30 Thread Martin Liška
On 7/30/19 3:09 PM, Marc Glisse wrote:
> On Tue, 30 Jul 2019, Martin Liška wrote:
> 
>> On 7/30/19 1:35 PM, Marc Glisse wrote:
>>> +  /* Some delete operators have size as 2nd argument.  */
>>>
>>> Some delete operators have 3 arguments. From cp/decl.c:
>>>
>>>     /* operator delete (void *, size_t, align_val_t); */
>>>
>>
>> Yep, I know. The patch I installed expects at least 2 arguments:
>>
>> + /* Some delete operators have size as 2nd argument.  */
>> + if (is_delete_operator && gimple_call_num_args (stmt) >= 2)
> 
> True, I guess I am a bit confused why the second argument (which could be 
> either size or alignment) needs special handling (mark_operand_necessary) 
> while the third one does not (it is usually a constant).

Ah, that's bad, both of them need a care:

diff --git a/gcc/tree-ssa-dce.c b/gcc/tree-ssa-dce.c
index bec13cd5930..80d5f5c30f7 100644
--- a/gcc/tree-ssa-dce.c
+++ b/gcc/tree-ssa-dce.c
@@ -824,13 +824,16 @@ propagate_necessity (bool aggressive)
   || DECL_FUNCTION_CODE (def_callee) == 
BUILT_IN_CALLOC))
  || DECL_IS_REPLACEABLE_OPERATOR_NEW_P (def_callee)))
{
- /* Some delete operators have size as 2nd argument.  */
+ /* Delete operators can have alignment and (or) size as next
+arguments.  When being a SSA_NAME, they must be marked
+as necessary.  */
  if (is_delete_operator && gimple_call_num_args (stmt) >= 2)
-   {
- tree size_argument = gimple_call_arg (stmt, 1);
- if (TREE_CODE (size_argument) == SSA_NAME)
-   mark_operand_necessary (size_argument);
-   }
+   for (unsigned i = 1; i < gimple_call_num_args (stmt); i++)
+ {
+   tree arg = gimple_call_arg (stmt, i);
+   if (TREE_CODE (arg) == SSA_NAME)
+ mark_operand_necessary (arg);
+ }
 
  continue;
}
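
For reference, the declarations involved (standard C++ signatures, listed
here only for illustration) are:

#include <cstddef>   // std::size_t
#include <new>       // std::align_val_t

/* The pointer is always argument 0; any of the trailing arguments below
   may be an SSA_NAME that has to be kept live even though the call itself
   is being removed as dead.  */
void operator delete (void *) noexcept;
void operator delete (void *, std::size_t) noexcept;
void operator delete (void *, std::align_val_t) noexcept;
void operator delete (void *, std::size_t, std::align_val_t) noexcept;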

> 
> I tried to experiment to understand, but it is complicated because including 
> <new> disables the optimization:
> 
> #include <new>
> void fn1() {
>     char*p=new char;
>     delete p;
> }
> 
> This ICEs with -O -std=c++17:
> 
> int a = 64;
> std::align_val_t b{64};
> void fn1() {
>   void *s = operator new(a,b);
>   operator delete(s,8+*(unsigned long*)s,b);
> }
> 
> 

I can't see it on current master. Can you?

Martin



[PATCH v2] Use edge->indirect_unknown_callee in cgraph_edge::make_direct (PR ipa/89330).

2019-07-30 Thread Martin Liška
Hi.

Thanks to Martin I was able to prepare a proper fix. The issue is that
cgraph_edge::resolve_speculation can delete this pointer (yes, it's
super nasty) and so that the caller can't use this->something
right after the function returns.
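
To illustrate the hazard in isolation (a made-up miniature, not the cgraph
code itself):

struct edge_like
{
  bool direct;
  edge_like *resolve ();	/* may delete "this" and return another edge */

  edge_like *make_direct ()
  {
    edge_like *e = resolve ();
    /* return direct ? ... : ...;      WRONG: implicit this-> after delete */
    return e->direct ? e : nullptr; /* OK: only touch the returned edge */
  }
};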

For the long term, I'll rework the ::resolve_speculation function.

The patch survives --enable-checking bootstrap on x86_64-linux-gnu.

Ready to be installed after proper testing?
Thanks,
Martin
>From 15873b0ebfca4cf4ce06b49f6ebb798a14414eb2 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 30 Jul 2019 15:12:52 +0200
Subject: [PATCH] Use edge->indirect_unknown_callee in cgraph_edge::make_direct
 (PR ipa/89330).

gcc/ChangeLog:

2019-07-30  Martin Liska  

	PR ipa/89330
	* cgraph.c (cgraph_edge::make_direct): Use
	edge->indirect_unknown_callee as edge->resolve_speculation can
	deallocate edge which is this pointer.
---
 gcc/cgraph.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 81250acb70c..8dbe705af68 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -1215,7 +1215,7 @@ cgraph_edge::make_direct (cgraph_node *callee)
   edge = edge->resolve_speculation (callee->decl);
 
   /* On successful speculation just return the pre existing direct edge.  */
-  if (!indirect_unknown_callee)
+  if (!edge->indirect_unknown_callee)
 return edge;
 }
 
-- 
2.22.0



Re: [PATCH] Add generic support for "noinit" attribute

2019-07-30 Thread Christophe Lyon
Hi,

Thanks for the useful feedback.


On Tue, 16 Jul 2019 at 11:54, Richard Sandiford
 wrote:
>
> Thanks for doing this in a generic way.
>
> Christophe Lyon  writes:
> > @@ -2224,6 +2234,50 @@ handle_weak_attribute (tree *node, tree name,
> >return NULL_TREE;
> >  }
> >
> > +/* Handle a "noinit" attribute; arguments as in struct
> > +   attribute_spec.handler.  Check whether the attribute is allowed
> > +   here and add the attribute to the variable decl tree or otherwise
> > +   issue a diagnostic.  This function checks NODE is of the expected
> > +   type and issues diagnostics otherwise using NAME.  If it is not of
> > +   the expected type *NO_ADD_ATTRS will be set to true.  */
> > +
> > +static tree
> > +handle_noinit_attribute (tree * node,
> > +   tree   name,
> > +   tree   args,
> > +   int    flags ATTRIBUTE_UNUSED,
> > +   bool *no_add_attrs)
> > +{
> > +  const char *message = NULL;
> > +
> > +  gcc_assert (DECL_P (*node));
> > +  gcc_assert (args == NULL);
> > +
> > +  if (TREE_CODE (*node) != VAR_DECL)
> > +message = G_("%qE attribute only applies to variables");
> > +
> > +  /* Check that it's possible for the variable to have a section.  */
> > +  if ((TREE_STATIC (*node) || DECL_EXTERNAL (*node) || in_lto_p)
> > +  && DECL_SECTION_NAME (*node))
> > +message = G_("%qE attribute cannot be applied to variables "
> > +  "with specific sections");
> > +
> > +  /* If this var is thought to be common, then change this.  Common
> > + variables are assigned to sections before the backend has a
> > + chance to process them.  */
> > +  if (DECL_COMMON (*node))
> > +DECL_COMMON (*node) = 0;
> > +
> > +  if (message)
> > +{
> > +  warning (OPT_Wattributes, message, name);
> > +  *no_add_attrs = true;
> > +}
> > +
> > +  return NULL_TREE;
> > +}
>
> This might cause us to clear DECL_COMMON even when rejecting the
> attribute.  Also, the first message should win over the others
> (e.g. for a function in a particular section).
>
> So I think the "/* Check that it's possible ..." should be an else-if
> and the DECL_COMMON stuff should be specific to !message.

You are right, thanks.

Jozef, this is true for msp430_data_attr() too. I'll let you update it.
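
For the record, the reordering would look roughly like this (a sketch of
the suggestion, not the final patch):

  if (TREE_CODE (*node) != VAR_DECL)
    message = G_("%qE attribute only applies to variables");
  /* Check that it's possible for the variable to have a section.  */
  else if ((TREE_STATIC (*node) || DECL_EXTERNAL (*node) || in_lto_p)
	   && DECL_SECTION_NAME (*node))
    message = G_("%qE attribute cannot be applied to variables "
		 "with specific sections");

  if (message)
    {
      warning (OPT_Wattributes, message, name);
      *no_add_attrs = true;
    }
  else if (DECL_COMMON (*node))
    /* Common variables are assigned to sections before the backend has a
       chance to process them.  */
    DECL_COMMON (*node) = 0;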

>
> Since this is specific to ELF targets, I think we should also
> warn if !targetm.have_switchable_bss_sections.
>
OK

> > @@ -2338,6 +2336,8 @@ msp430_select_section (tree decl, int reloc, unsigned 
> > HOST_WIDE_INT align)
> >  {
> >if (TREE_CODE (decl) == FUNCTION_DECL)
> >   return text_section;
> > +  /* FIXME: ATTR_NOINIT is handled generically in
> > +  default_elf_select_section.  */
> >else if (has_attr (ATTR_NOINIT, decl))
> >   return noinit_section;
> >else if (has_attr (ATTR_PERSIST, decl))
>
> I guess adding a FIXME is OK.  It's very tempting to remove
> the noinit stuff and use default_elf_select_section instead of
> default_select_section, but I realise that'd be difficult to test.

I added that as suggested by Jozef, it's best if he handles the
change you propose, I guess he can do proper testing.


> > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> > index f2619e1..850153e 100644
> > --- a/gcc/doc/extend.texi
> > +++ b/gcc/doc/extend.texi
> > @@ -7129,6 +7129,12 @@ The @code{visibility} attribute is described in
> >  The @code{weak} attribute is described in
> >  @ref{Common Function Attributes}.
> >
> > +@item noinit
> > +@cindex @code{noinit} variable attribute
> > +Any data with the @code{noinit} attribute will not be initialized by
> > +the C runtime startup code, or the program loader.  Not initializing
> > +data in this way can reduce program startup times.
> > +
> >  @end table
> >
> >  @node ARC Variable Attributes
>
> Would be good to mention that the attribute is specific to ELF targets.
> (Yeah, we don't seem to do that consistently for other attributes.)
> It might also be worth saying that it requires specific linker support.
OK, done.

>
> > diff --git a/gcc/testsuite/gcc.c-torture/execute/noinit-attribute.c 
> > b/gcc/testsuite/gcc.c-torture/execute/noinit-attribute.c
> > new file mode 100644
> > index 000..f33c0d0
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.c-torture/execute/noinit-attribute.c
> > @@ -0,0 +1,59 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target noinit } */
> > +/* { dg-options "-O2" } */
> > +
> > +/* This test checks that noinit data is handled correctly.  */
> > +
> > +extern void _start (void) __attribute__ ((noreturn));
> > +extern void abort (void) __attribute__ ((noreturn));
> > +extern void exit (int) __attribute__ ((noreturn));
> > +
> > +int var_common;
> > +int var_zero = 0;
> > +int var_one = 1;
> > +int __attribute__((noinit)) var_noinit;
> > +int var_init = 2;
> > +
> > +int __attribute__((noinit)) func(); /* { dg-warning "attribute only applies to variables" } */
> > +int __attribute__((section 

Re: [PATCH] Fix simd attribute handling on aarch64 (version 2)

2019-07-30 Thread Richard Sandiford
Steve Ellcey  writes:
> On Fri, 2019-07-19 at 19:24 +0100, Richard Sandiford wrote:
>> 
>> You can probably also remove:
>> 
>>   tree new_type = build_distinct_type_copy (TREE_TYPE (node->decl));
>>   ...
>>   TREE_TYPE (node->decl) = new_type;
>> 
>> in simd_clone_adjust_argument_types.
>> 
>> I'm happy doing it this way or doing the copy in the AArch64 hook.
>> It's really Jakub's call.
>
> You are right, that is no longer needed with the current patch.  I
> removed it and retested with no regressions.  Jakub, do you have
> any preference?  I have attached a new version of the patch to this
> email.
>
>> I don't think the tests need:
>> 
>> /* { dg-require-effective-target aarch64_variant_pcs } */
>> 
>> since they're only dg-do compile.  Leaving the line out would get more
>> coverage for people using older binutils.
>> 
>> The tests are OK with that change, thanks.
>
> OK, I made that change to the tests.
>
>
> Latest version of the patch:
>
> 2019-07-22  Steve Ellcey  
>
>   * omp-simd-clone.c (simd_clone_adjust_return_type): Remove call to
>   build_distinct_type_copy.
>   (simd_clone_adjust_argument_types): Ditto.
>   (simd_clone_adjust): Call build_distinct_type_copy here.
>   (expand_simd_clones): Ditto.
>
>
> 2019-07-22  Steve Ellcey  
>
>   * gcc.target/aarch64/simd_pcs_attribute.c: New test.
>   * gcc.target/aarch64/simd_pcs_attribute-2.c: Ditto.
>   * gcc.target/aarch64/simd_pcs_attribute-3.c: Ditto.
>
>
> diff --git a/gcc/omp-simd-clone.c b/gcc/omp-simd-clone.c
> index caa8da3cba5..da01d9b8e6c 100644
> --- a/gcc/omp-simd-clone.c
> +++ b/gcc/omp-simd-clone.c
> @@ -498,7 +498,6 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
>/* Adjust the function return type.  */
>if (orig_rettype == void_type_node)
>  return NULL_TREE;
> -  TREE_TYPE (fndecl) = build_distinct_type_copy (TREE_TYPE (fndecl));
>t = TREE_TYPE (TREE_TYPE (fndecl));
>if (INTEGRAL_TYPE_P (t) || POINTER_TYPE_P (t))
>  veclen = node->simdclone->vecsize_int;
> @@ -724,11 +723,6 @@ simd_clone_adjust_argument_types (struct cgraph_node 
> *node)
> else
>   new_reversed = void_list_node;
>   }
> -
> -  tree new_type = build_distinct_type_copy (TREE_TYPE (node->decl));
> -  TYPE_ARG_TYPES (new_type) = new_reversed;

I think you still need this line, just applied to the existing type
rather than a new one.
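
I.e. something like the following (a sketch of the suggestion):

  /* The distinct copy of the function type is now made in the callers
     (simd_clone_adjust / expand_simd_clones), so install the new argument
     list on that type rather than making another copy here.  */
  TYPE_ARG_TYPES (TREE_TYPE (node->decl)) = new_reversed;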

> -  TREE_TYPE (node->decl) = new_type;
> -
>adjustments.release ();
>  }
>args.release ();
> @@ -1164,6 +1158,7 @@ simd_clone_adjust (struct cgraph_node *node)
>  {
>push_cfun (DECL_STRUCT_FUNCTION (node->decl));
>  
> +  TREE_TYPE (node->decl) = build_distinct_type_copy (TREE_TYPE (node->decl));
>targetm.simd_clone.adjust (node);
>  
>tree retval = simd_clone_adjust_return_type (node);
> @@ -1737,6 +1732,8 @@ expand_simd_clones (struct cgraph_node *node)
>   simd_clone_adjust (n);
> else
>   {
> +   TREE_TYPE (n->decl)
> + = build_distinct_type_copy (TREE_TYPE (n->decl));
> targetm.simd_clone.adjust (n);
> simd_clone_adjust_return_type (n);
> simd_clone_adjust_argument_types (n);
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-2.c 
> b/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-2.c
> index e69de29bb2d..8eec6824f37 100644
> --- a/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-2.c
> +++ b/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-2.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast" } */
> +
> +__attribute__ ((__simd__ ("notinbranch")))
> +__attribute__ ((__nothrow__ , __leaf__ , __const__))
> +extern double foo (double x);
> +
> +void bar(double * f, int n)
> +{
> + int i;
> + for (i = 0; i < n; i++)
> + f[i] = foo(f[i]);
> +}
> +
> +/* { dg-final { scan-assembler-not {\.variant_pcs\tfoo} } } */
> +/* { dg-final { scan-assembler-times {\.variant_pcs\t_ZGVnN2v_foo} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-3.c 
> b/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-3.c
> index e69de29bb2d..95f6a6803e8 100644
> --- a/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-3.c
> +++ b/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-3.c
> @@ -0,0 +1,24 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast" } */
> +
> +__attribute__ ((__simd__))
> +__attribute__ ((__nothrow__ , __leaf__ , __const__))
> +double foo (double x);
> +
> +void bar(double *f, int n)
> +{
> + int i;
> + for (i = 0; i < n; i++)
> + f[i] = foo(f[i]);
> +}
> +
> +double foo(double x)
> +{
> + return x * x / 3.0;
> +}
> +
> +/* { dg-final { scan-assembler-not {\.variant_pcs\tfoo} } } */
> +/* { dg-final { scan-assembler-times {\.variant_pcs\t_ZGVnM1v_foo} 1 } } */
> +/* { dg-final { scan-assembler-times {\.variant_pcs\t_ZGVnM2v_foo} 1 } } */
> +/* { dg-final { scan-assembler-times 

Re: [RFC][tree-vect]PR 88915: Further vectorize second loop when versioning

2019-07-30 Thread Andre Vieira (lists)




On 30/07/2019 13:16, Andre Vieira (lists) wrote:

Hi Richard,

I believe this is in line with what you were explaining to me earlier. 
The one thing I haven't quite done here is the jump around for if there 
is no prolog peeling. Though I think skip_vectors introduces the jumping 
we need.


The main gist of it is:
1) if '--param vect-epilogues-nomask=1' is passed we refrain from adding 
the versioning threshold check when versioning a loop at first,
2) later we allow the vectorizer to use skip_vectors to basically check 
niters to be able to skip the vectorized loop on to the right epilogue,
3) after vectorizing the epilogue, we check whether this was a versioned 
loop and whether we skipped adding the versioning threshold, if so we 
add a threshold based on the last VF


There is a caveat here, if we don't vectorize the epilogue, because say 
our architecture doesn't know how, we will end up with a regression.
Let's say we have a loop for which our target can vectorize for 32x but 
not 16x, we get:


if (alias && threshold_for_16x ())
{
   if (niters > 31)
     vectorized_31 ();
   else
     scalar ();
}
else
  scalar ();

Imagine our iteration count is 18, all we hit is the scalar loop, but 
now we go through 1x alias and 2x niters. Whereas potentially we could 
have done with just 1x niters.


A mitigation for this could be to only add the threshold check if we 
actually vectorized the last loop, I believe a:
'if (LOOP_VINFO_VECTORIZABLE_P (loop_vinfo)) return NULL;' inside that 
new "else" in vect_transform_loop would do the trick. Though then we 
will still have that extra alias check...
I am going to hack around in the back-end to "fake" a situation like 
this and see what happens.


Right, I should have quickly tried this before sending the email.  What 
happens is actually that vect_transform_loop never gets called, because 
try_vectorize_loop_1 will recognize it can't vectorize the epilogue.  So 
we end up with the "mitigation" result I suggested, where we simply 
don't have a versioning threshold, which might still not be ideal.




Another potential issue arises when the profitability threshold obtained 
from the cost model isn't guaranteed to be either LE or EQ to the 
versioning threshold. In that case there are two separate checks, which 
now we no longer will attempt to fold.


And I just realized I don't take the cost model threshold into account 
when creating the new threshold check, I guess if it is ordered_p we 
should again take the max there between the new threshold check and the 
cost threshold for the new check.
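
In other words, something along these lines when emitting the late check
(a plain C stand-in with illustrative names, not vectorizer code):

/* When both thresholds guard the same versioned region, a single check
   against the larger value is enough.  */
unsigned long
final_threshold (unsigned long cost_model_th, unsigned long versioning_th)
{
  return cost_model_th > versioning_th ? cost_model_th : versioning_th;
}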




I am trying to find a way to test and benchmark these changes. 
Unfortunately I am having trouble doing this for AVX512 as I find that 
the old '--param vect_epilogues_nomask=1' already causes wrong codegen 
in SPEC2017 for the gcc and perlbench benchmarks.



Cheers,
Andre

On 19/07/2019 13:38, Andre Vieira (lists) wrote:



On 19/07/2019 12:35, Richard Biener wrote:

On Fri, 19 Jul 2019, Andre Vieira (lists) wrote:




On 15/07/2019 11:54, Richard Biener wrote:

On Mon, 15 Jul 2019, Andre Vieira (lists) wrote:




On 12/07/2019 11:19, Richard Biener wrote:

On Thu, 11 Jul 2019, Andre Vieira (lists) wrote:



I have code that can split the condition and alias checks in
'vect_loop_versioning'. For this approach I am considering keeping 
that

bit of
code and seeing if I can patch up the checks after vectorizing the
epilogue
further. I think initially I will just go with a "hacked up" way of
passing
down the bb with the iteration check and split the false edge 
every time

we
vectorize it further. Will keep you posted on progress. If you 
have any

pointers/tips they are most welcome!


I thought to somehow force the idea that we have a prologue loop
to the vectorizer so it creates the number-of-vectorized iterations
check and branch around the main (highest VF) vectorized loop.



Hmm I think I may have skimmed over this earlier. I am reading it 
now and am

not entirely sure what you mean by this. Specifically the "number of
vectorized iterations" or how a prologue loop plays a role in this.


When there is no prologue then the versioning condition currently
ensures we always enter the vector loop.  Only if there is a prologue
is there a check and branch eventually skipping right to the epilogue.
If the versioning condition (now using a lower VF) doesn't guarantee
this we have to add this jump-around.


Right, I haven't looked at the prologue path yet. I had a quick look 
and can't find where this branch skipping to the epilogue is 
constructed.  I will take a better look after I got my current example 
to work.




I guess this sheds some light on the comment above. And it 
definitely implies
we would need to know the lowest VF when creating this condition. 
Which is

tricky.


We can simply use the smallest vector size supported by the target to
derive it from the actual VF, no?


So I could wait to introduce this check after all epilogue 
vectorization is done, back 

Re: [PATCH] Remove also 2nd argument for unused delete operator (PR tree-optimization/91270).

2019-07-30 Thread Marc Glisse

On Tue, 30 Jul 2019, Martin Liška wrote:


On 7/30/19 1:35 PM, Marc Glisse wrote:

+  /* Some delete operators have size as 2nd argument.  */

Some delete operators have 3 arguments. From cp/decl.c:

    /* operator delete (void *, size_t, align_val_t); */



Yep, I know. The patch I installed expects at least 2 arguments:

+ /* Some delete operators have size as 2nd argument.  */
+ if (is_delete_operator && gimple_call_num_args (stmt) >= 2)


True, I guess I am a bit confused why the second argument (which could be 
either size or alignment) needs special handling (mark_operand_necessary) 
while the third one does not (it is usually a constant).


I tried to experiment to understand, but it is complicated because 
including <new> disables the optimization:


#include <new>
void fn1() {
char*p=new char;
delete p;
}

This ICEs with -O -std=c++17:

int a = 64;
std::align_val_t b{64};
void fn1() {
  void *s = operator new(a,b);
  operator delete(s,8+*(unsigned long*)s,b);
}


--
Marc Glisse


Re: [PATCH, rs6000] Fix PR91050 by adding a DRIVER_SELF_SPECS spec

2019-07-30 Thread Uros Bizjak
On Thu, Jul 25, 2019 at 7:34 PM Peter Bergner  wrote:
>
> Uros and Jan,
>
> I should have CC'd you for the i386 portion of the patch.  Are you ok with
> the i386.h change if Iain's x86 Darwin testing comes out clean?
>
> Peter
>
>
> On 7/25/19 9:41 AM, Peter Bergner wrote:
> > On 7/25/19 2:50 AM, Iain Sandoe wrote:
> >> This will break Darwin which has used DRIVER_SELF_SPECS in config/darwin.h
> >> since they were introduced (and the equivalent before).
> >>
> >> This is because Darwin has driver self-specs common to the PPC and X86 
> >> ports.
> >>
> >> If this change is seen the way to go, then I guess one solution would be 
> >> to rename the
> >> driver specs in config/darwin.h => SUBTARGET_DRIVER_SELF_SPECS*** and then
> >> put a dummy DRIVER_SELF_SPECS in i386/i386.h
> >> and append SUBTARGET_DRIVER_SELF_SPECS  to both the i386.h and rs6000.h
> >> cases.  (or something along those lines)
> >
> > Hi Iain,
> >
> > The patch below uses your suggestion.  Does this work for you on x86 and 
> > ppc?
> >
> >
> >
> >>> Segher, I tried your suggestion of writing the spec more generically
> >>> (ie, %{mdejagnu-*: ... -m*}), which worked in that it did the correct
> >>> replacement.  However, the % >>> option(s) would need to be written like % >>> at all.
> >>
> >> not sure what the objective is here - if you want to remove all 
> >> pre-existing -mcpu from
> >> the command line won’t % >> -mcpunewthing
> >> if it gets invented) ..
> >
> > We only want to remove all pre-existing -mcpu= options IF the user also used
> > -mdejagnu-cpu=.  You do not want to remove -mcpu= options if the user used
> > -mdejagnu-tune=.  That's why I tried:
> >
> > %{mdejagnu-*: % >
> > so -mdejagnu-cpu= would only remove -mcpu options and -mdejagnu-tune= would
> > only remove -mtune= options.  The problem is that it doesn't look like you
> > can you use %* with %<.
> >
> > Peter
> >
> >
> > gcc/
> >   PR target/91050
> >   * config/rs6000/rs6000.opt (mdejagnu-cpu=): Delete option.
> >   * config/rs6000/rs6000.c (rs6000_option_override_internal): Remove
> >   use of deleted rs6000_dejagnu_cpu_index variable.
> >   * config/rs6000/rs6000.h (DRIVER_SELF_SPECS): Define.
> >   (SUBTARGET_DRIVER_SELF_SPECS): Likewise.
> >   * config/darwin.h (DRIVER_SELF_SPECS): Rename from this ...
> >   (SUBTARGET_DRIVER_SELF_SPECS): ...to this.
> >   * config/i386/i386.h (DRIVER_SELF_SPECS): Define.
> >   (SUBTARGET_DRIVER_SELF_SPECS): Likewise.
> >
> > Index: gcc/config/rs6000/rs6000.opt
> > ===
> > --- gcc/config/rs6000/rs6000.opt  (revision 273707)
> > +++ gcc/config/rs6000/rs6000.opt  (working copy)
> > @@ -388,13 +388,6 @@ mtune=
> >  Target RejectNegative Joined Var(rs6000_tune_index) Init(-1) 
> > Enum(rs6000_cpu_opt_value) Save
> >  -mtune=  Schedule code for given CPU.
> >
> > -; Only for use in the testsuite.  This simply overrides -mcpu=.  With older
> > -; versions of Dejagnu the command line arguments you set in RUNTESTFLAGS
> > -; override those set in the testcases; with this option, the testcase will
> > -; always win.
> > -mdejagnu-cpu=
> > -Target Undocumented RejectNegative Joined Var(rs6000_dejagnu_cpu_index) 
> > Init(-1) Enum(rs6000_cpu_opt_value) Save
> > -
> >  mtraceback=
> >  Target RejectNegative Joined Enum(rs6000_traceback_type) 
> > Var(rs6000_traceback)
> >  -mtraceback=[full,part,no]   Select type of traceback table.
> > Index: gcc/config/rs6000/rs6000.c
> > ===
> > --- gcc/config/rs6000/rs6000.c(revision 273707)
> > +++ gcc/config/rs6000/rs6000.c(working copy)
> > @@ -3489,9 +3489,6 @@ rs6000_option_override_internal (bool gl
> >/* Don't override by the processor default if given explicitly.  */
> >set_masks &= ~rs6000_isa_flags_explicit;
> >
> > -  if (global_init_p && rs6000_dejagnu_cpu_index >= 0)
> > -rs6000_cpu_index = rs6000_dejagnu_cpu_index;
> > -
> >/* Process the -mcpu= and -mtune= argument.  If the user 
> > changed
> >   the cpu in a target attribute or pragma, but did not specify a tuning
> >   option, use the cpu for the tuning option rather than the option 
> > specified
> > Index: gcc/config/rs6000/rs6000.h
> > ===
> > --- gcc/config/rs6000/rs6000.h(revision 273707)
> > +++ gcc/config/rs6000/rs6000.h(working copy)
> > @@ -77,6 +77,20 @@
> >  #define PPC405_ERRATUM77 0
> >  #endif
> >
> > +/* Only for use in the testsuite: -mdejagnu-cpu= simply overrides -mcpu=.
> > +   With older versions of Dejagnu the command line arguments you set in
> > +   RUNTESTFLAGS override those set in the testcases; with this option,
> > +   the testcase will always win.  Ditto for -mdejagnu-tune=.  */
> > +#define DRIVER_SELF_SPECS \
> > +  "%{mdejagnu-cpu=*: % > +   %{mdejagnu-tune=*: % > +   %{mdejagnu-*: % > +   

Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-07-30 Thread Richard Earnshaw (lists)




On 30/07/2019 10:31, Ramana Radhakrishnan wrote:

On 30/07/2019 10:08, Christophe Lyon wrote:
On Mon, 29 Jul 2019 at 18:49, Wilco Dijkstra  
wrote:


Currently the Arm backend selects the alternative sched pressure 
algorithm.
The issue is that this doesn't take register pressure into account, 
and so

it causes significant additional spilling on Arm where there are only 14
allocatable registers.  SPEC2006 shows significant codesize reduction
with the default pressure algorithm, so switch back to that.  PR77308 
shows

~800 fewer instructions.

SPECINT2006 is ~0.6% faster on Cortex-A57 together with the other DImode
patches. Overall SPEC codesize is 1.1% smaller.



Hi Wilco,

Do you know which benchmarks were used when this was checked-in?
It isn't clear from 
https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html


It was from my time in Linaro and thus would have been a famous embedded 
benchmark, coremark , spec2000 - all tested probably on cortex-a9 and 
Cortex-A15. In addition to this I would like to see what the impact of 
this is on something like Cortex-A53 as the issue rates are likely to be 
different on the schedulers causing different behaviour.



I don't have all the notes today for that - maybe you can look into the 
linaro wiki.


I am concerned about taking this patch in without some more data across 
a variety of cores.




My concern is the original patch 
(https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html) is lacking in 
any real detail as to the reasons for the choice of the second algorithm 
over the first.


- It's not clear what the win was
- It's not clear what outliers there were and whether they were significant.

And finally, it's not clear if, 7 years later, this is still the best 
choice.


If the second algorithm really is better, why is no other target using 
it by default?


I think we need a bit more information (both ways).  In particular I'm 
concerned not just by the overall benchmark average, but also the amount 
of variance across the benchmarks.  I think the default needs to avoid 
significant outliers if at all possible, even if it is marginally less 
good on the average.


R.


Thanks,
Ramana




Thanks,

Christophe


Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57

ChangeLog:
2019-07-29  Wilco Dijkstra  

 * config/arm/arm.c (arm_option_override): Don't override sched
 pressure algorithm.

--

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515 
100644

--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3575,11 +3575,6 @@ arm_option_override (void)
    if (use_neon_for_64bits == 1)
   prefer_neon_for_64bits = true;

-  /* Use the alternative scheduling-pressure algorithm by default.  */
-  maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, SCHED_PRESSURE_MODEL,
-			 global_options.x_param_values,
-			 global_options_set.x_param_values);
-
    /* Look through ready list and all of queue for instructions
   relevant for L2 auto-prefetcher.  */
    int param_sched_autopref_queue_depth;





Re: wrap math.h for M_PI et al in target/i386 tests

2019-07-30 Thread Uros Bizjak
> Most but not all of the tests that expect M_PI, M_PI_2 and/or M_PI_4
> to be defined in math.h explicitly exclude one target system that does
> not satisfy this non-standard assumption.
>
> This patch introduces a wrapper header that includes math.h and then
> conditionally supplies the missing non-standard macro definitions.
> With that, we can drop the dg-skip-if "no M_PI" exclusions.
>
> Tested on x86_64-linux-gnu, with a regular math.h, and with a "manually
> fixincluded" math.h so as to not define M_PI, M_PI_2 and M_PI_4.  Ok to
> install?
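
For reference, a minimal sketch of what such a wrapper header could contain
(the actual math_m_pi.h is not quoted here; the values are the usual POSIX
definitions):

/* Include the system header first, then fill in the non-standard macros
   only if it did not provide them.  */
#include <math.h>

#ifndef M_PI
#define M_PI   3.14159265358979323846
#endif
#ifndef M_PI_2
#define M_PI_2 1.57079632679489661923
#endif
#ifndef M_PI_4
#define M_PI_4 0.78539816339744830962
#endif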
>
>
> for  gcc/testsuite/ChangeLog
>
> * gcc.target/i386/math_m_pi.h: New.
> * gcc.target/i386/sse4_1-round.h: Use it.
> * gcc.target/i386/pr73350.c: Likewise.
> * gcc.target/i386/avx512f-vfixupimmpd-2.c: Likewise.
> * gcc.target/i386/avx512f-vfixupimmps-2.c: Likewise.
> * gcc.target/i386/avx512f-vfixupimmsd-2.c: Likewise.
> * gcc.target/i386/avx512f-vfixupimmss-2.c: Likewise.
> * gcc.target/i386/avx512f-vfixupimmss-2.c: Likewise.
> * gcc.target/i386/avx-ceil-sfix-2-vec.c: Likewise.  Drop
> dg-skip-if "no M_PI".
> * gcc.target/i386/avx-cvt-2-vec.c: Likewise.
> * gcc.target/i386/avx-floor-sfix-2-vec.c: Likewise.
> * gcc.target/i386/avx-rint-sfix-2-vec.c: Likewise.
> * gcc.target/i386/avx-round-sfix-2-vec.c: Likewise.
> * gcc.target/i386/avx512f-ceil-sfix-vec-1.c: Likewise.
> * gcc.target/i386/avx512f-ceil-vec-1.c: Likewise.
> * gcc.target/i386/avx512f-ceilf-sfix-vec-1.c: Likewise.
> * gcc.target/i386/avx512f-ceilf-vec-1.c: Likewise.
> * gcc.target/i386/avx512f-floor-sfix-vec-1.c: Likewise.
> * gcc.target/i386/avx512f-floor-vec-1.c: Likewise.
> * gcc.target/i386/avx512f-floorf-sfix-vec-1.c: Likewise.
> * gcc.target/i386/avx512f-floorf-vec-1.c: Likewise.
> * gcc.target/i386/avx512f-rint-sfix-vec-1.c: Likewise.
> * gcc.target/i386/avx512f-rintf-sfix-vec-1.c: Likewise.
> * gcc.target/i386/avx512f-round-sfix-vec-1.c: Likewise.
> * gcc.target/i386/avx512f-roundf-sfix-vec-1.c: Likewise.
> * gcc.target/i386/avx512f-trunc-vec-1.c: Likewise.
> * gcc.target/i386/avx512f-truncf-vec-1.c: Likewise.
> * gcc.target/i386/sse2-cvt-vec.c: Likewise.
> * gcc.target/i386/sse4_1-ceil-sfix-vec.c: Likewise.
> * gcc.target/i386/sse4_1-ceil-vec.c: Likewise.
> * gcc.target/i386/sse4_1-ceilf-sfix-vec.c: Likewise.
> * gcc.target/i386/sse4_1-ceilf-vec.c: Likewise.
> * gcc.target/i386/sse4_1-floor-sfix-vec.c: Likewise.
> * gcc.target/i386/sse4_1-floor-vec.c: Likewise.
> * gcc.target/i386/sse4_1-floorf-sfix-vec.c: Likewise.
> * gcc.target/i386/sse4_1-floorf-vec.c: Likewise.
> * gcc.target/i386/sse4_1-rint-sfix-vec.c: Likewise.
> * gcc.target/i386/sse4_1-rint-vec.c: Likewise.
> * gcc.target/i386/sse4_1-rintf-sfix-vec.c: Likewise.
> * gcc.target/i386/sse4_1-rintf-vec.c: Likewise.
> * gcc.target/i386/sse4_1-round-sfix-vec.c: Likewise.
> * gcc.target/i386/sse4_1-round-vec.c: Likewise.
> * gcc.target/i386/sse4_1-roundf-sfix-vec.c: Likewise.
> * gcc.target/i386/sse4_1-roundf-vec.c: Likewise.
> * gcc.target/i386/sse4_1-roundsd-4.c: Likewise.
> * gcc.target/i386/sse4_1-roundss-4.c: Likewise.
> * gcc.target/i386/sse4_1-trunc-vec.c: Likewise.
> * gcc.target/i386/sse4_1-truncf-vec.c: Likewise.

LGTM, patch also needs approval from testsuite maintainer (CC'd).

Thanks,
Uros.


Re: Don't use integer "FMA" for shifts

2019-07-30 Thread Richard Earnshaw (lists)




On 30/07/2019 11:50, Richard Sandiford wrote:

Richard Biener  writes:

On Tue, Jul 30, 2019 at 12:04 PM Richard Sandiford
 wrote:


tree-ssa-math-opts supports FMA optabs for integers as well as
floating-point types, even though there's no distinction between
fused and unfused there.  It turns out to be pretty handy for the
IFN_COND_* handling though, so I don't want to remove it, however
weird it might seem.

Instead this patch makes sure that we don't apply it to integer
multiplications that are actually shifts (but that are represented
in gimple as multiplications because that's the canonical form).

This is a preemptive strike.  The test doesn't fail for SVE as-is,
but would after a later patch.

Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu.
OK to install?

Richard


2019-07-30  Richard Sandiford  

gcc/
 * tree-ssa-math-opts.c (convert_mult_to_fma): Reject integer
 multiplications by a power of 2 or a negative power of 2.

gcc/testsuite/
 * gcc.dg/vect/vect-cond-arith-8.c: New test.

Index: gcc/tree-ssa-math-opts.c
===
--- gcc/tree-ssa-math-opts.c2019-07-30 10:51:51.827405171 +0100
+++ gcc/tree-ssa-math-opts.c2019-07-30 10:52:27.327139149 +0100
@@ -3074,10 +3074,20 @@ convert_mult_to_fma (gimple *mul_stmt, t
&& flag_fp_contract_mode == FP_CONTRACT_OFF)
  return false;

-  /* We don't want to do bitfield reduction ops.  */
-  if (INTEGRAL_TYPE_P (type)
-  && (!type_has_mode_precision_p (type) || TYPE_OVERFLOW_TRAPS (type)))
-return false;
+  if (ANY_INTEGRAL_TYPE_P (type))
+{
+  /* We don't want to do bitfield reduction ops.  */
+  tree itype = INTEGRAL_TYPE_P (type) ? type : TREE_TYPE (type);


you can use element_type () for this.  But I think it's easier to
leave the existing
test in-place since vector or complex types cannot have non-mode precision
components.


Ah, yeah.  Guess I was over-generalising here :-)


+  if (!type_has_mode_precision_p (itype) || TYPE_OVERFLOW_TRAPS (itype))
+   return false;
+
+  /* Don't use FMA for multiplications that are actually shifts.  */


So I question this - if the FMA can do the shift "for free" and it eventually is
same cost/latency as an add, why do we want to not allow this (for all targets)?
Esp. for vector types I would guess an add plus a shift may be more costly
(not all ISAs implement shifts anyhow).


The shift doesn't really come for free.  By using FMA we're converting
the shift by a constant into a general multiplication.
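
A minimal example of the kind of multiplication in question (illustrative
only):

/* b * 8 is canonicalized in gimple as a multiplication, but targets can
   usually do it as a shifted add (e.g. a single shift-accumulate
   instruction); routing it through an integer "FMA" instead forces the
   constant 8 into a register and uses a general multiply-add.  */
long f (long b, long c) { return b * 8 + c; }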


Isn't this just the problem that the madd pattern for aarch64 
doesn't accept constants that are a power of 2?  So it ends up trying to 
force the constant into a register unnecessarily.


(define_insn "madd"
  [(set (match_operand:GPI 0 "register_operand" "=r")
(plus:GPI (mult:GPI (match_operand:GPI 1 "register_operand" "r")
(match_operand:GPI 2 "register_operand" "r"))
--- This should be reg_or_imm_power2, then a second alternative for the 
immediate case.


  (match_operand:GPI 3 "register_operand" "r")))]
  ""

R.




Can this be fixed up in the target instead?  (by a splitter or appropriate
expander?)


OK, I'll try to do it that way instead.

Thanks,
Richard




+  tree val = VECTOR_TYPE_P (type) ? uniform_vector_p (op2) : op2;
+  if (val
+ && TREE_CODE (val) == INTEGER_CST
+ && wi::popcount (wi::abs (wi::to_wide (val))) == 1)
+   return false;
+}

/* If the target doesn't support it, don't generate it.  We assume that
   if fma isn't available then fms, fnma or fnms are not either.  */
Index: gcc/testsuite/gcc.dg/vect/vect-cond-arith-8.c
===
--- /dev/null   2019-07-30 08:53:31.317691683 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-cond-arith-8.c   2019-07-30 
10:52:27.327139149 +0100
@@ -0,0 +1,57 @@
+/* { dg-require-effective-target scalar_all_fma } */
+/* { dg-additional-options "-fdump-tree-optimized -ffp-contract=fast" } */
+
+#include "tree-vect.h"
+
+#define N (VECTOR_BITS * 11 / 64 + 3)
+
+#define DEF(INV)   \
+  void __attribute__ ((noipa)) \
+  f_##INV (int *restrict a, int *restrict b,   \
+  int *restrict c) \
+  {\
+for (int i = 0; i < N; ++i)\
+  {\
+   int mb = (INV & 1 ? -b[i] : b[i]);  \
+   int mc = (INV & 2 ? -c[i] : c[i]);  \
+   a[i] = b[i] < 10 ? mb * 8 + mc : 10;\
+  }\
+  }
+
+#define TEST(INV)  \
+  {\
+f_##INV (a, b, c);

Re: [RFC][tree-vect]PR 88915: Further vectorize second loop when versioning

2019-07-30 Thread Andre Vieira (lists)

Hi Richard,

I believe this is in line with what you were explaining to me earlier.
The one thing I haven't quite done here is the jump around the vectorized
loop for the case where there is no prologue peeling. Though I think
skip_vectors introduces the jumping we need.


The main gist of it is:
1) if '--param vect-epilogues-nomask=1' is passed we refrain from adding 
the versioning threshold check when versioning a loop at first,
2) later we allow the vectorizer to use skip_vectors to basically check 
niters so it can skip the vectorized loop and go on to the right epilogue,
3) after vectorizing the epilogue, we check whether this was a versioned 
loop and whether we skipped adding the versioning threshold; if so we 
add a threshold based on the last VF (rough sketch below).
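
Put together, the shape this aims for is roughly the following (illustrative
pseudocode only, names made up, not actual vectorizer code):

   if (alias_ok && niters >= threshold_from_last_vf)   /* (1) plus (3) */
     {
       if (niters >= VF_main)                           /* (2) skip_vectors */
         main_loop_vectorized ();
       else
         epilogue_vectorized_at_lower_vf ();
       /* ... possibly a scalar epilogue after that ... */
     }
   else
     scalar_loop ();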


There is a caveat here: if we don't vectorize the epilogue, because say 
our architecture doesn't know how, we will end up with a regression.
Say we have a loop for which our target can vectorize at 32x but 
not 16x; we get:


if (alias & threshold_for_16x ())
{
  if (niters > 31)
vectorized_31 ();
  else
scalar ();
}
else
 scalar ();

Imagine our iteration count is 18: all we hit is the scalar loop, but 
now we go through one alias check and two niters checks, whereas potentially 
we could have done with just one niters check.


A mitigation for this could be to only add the threshold check if we 
actually vectorized the last loop; I believe an
'if (LOOP_VINFO_VECTORIZABLE_P (loop_vinfo)) return NULL;' inside that 
new "else" in vect_transform_loop would do the trick. Though then we 
will still have that extra alias check...


I am going to hack around in the back-end to "fake" a situation like 
this and see what happens.


Another potential issue arises when the profitability threshold obtained 
from the cost model isn't guaranteed to be either LE or EQ to the 
versioning threshold. In that case there are two separate checks, which 
now we no longer will attempt to fold.


I am trying to find a way to test and benchmark these changes. 
Unfortunately I am having trouble doing this for AVX512 as I find that 
the old '--param vect_epilogues_nomask=1' already causes wrong codegen 
in SPEC2017 for the gcc and perlbench benchmarks.



Cheers,
Andre

On 19/07/2019 13:38, Andre Vieira (lists) wrote:



On 19/07/2019 12:35, Richard Biener wrote:

On Fri, 19 Jul 2019, Andre Vieira (lists) wrote:




On 15/07/2019 11:54, Richard Biener wrote:

On Mon, 15 Jul 2019, Andre Vieira (lists) wrote:




On 12/07/2019 11:19, Richard Biener wrote:

On Thu, 11 Jul 2019, Andre Vieira (lists) wrote:



I have code that can split the condition and alias checks in
'vect_loop_versioning'. For this approach I am considering keeping that
bit of code and seeing if I can patch up the checks after vectorizing the
epilogue further. I think initially I will just go with a "hacked up" way
of passing down the bb with the iteration check and split the false edge
every time we vectorize it further. Will keep you posted on progress. If
you have any pointers/tips they are most welcome!


I thought to somehow force the idea that we have a prologue loop
to the vectorizer so it creates the number-of-vectorized iterations
check and branch around the main (highest VF) vectorized loop.



Hmm I think I may have skimmed over this earlier. I am reading it now and am
not entirely sure what you mean by this. Specifically the "number of
vectorized iterations" or how a prologue loop plays a role in this.


When there is no prologue then the versioning condition currently
ensures we always enter the vector loop.  Only if there is a prologue
is there a check and branch eventually skipping right to the epilogue.
If the versioning condition (now using a lower VF) doesn't guarantee
this we have to add this jump-around.


Right, I haven't looked at the prologue path yet. I had a quick look and 
can't find where this branch skipping to the epilogue is constructed.  I 
will take a better look once I've got my current example to work.




I guess this sheds some light on the comment above. And it definitely implies
we would need to know the lowest VF when creating this condition.
Which is tricky.


We can simply use the smallest vector size supported by the target to
derive it from the actual VF, no?


So I could wait to introduce this check after all epilogue vectorization 
is done, back track to the last niter check and duplicate that in the 
outer loop.


What I didn't want to do was use the smallest possible vector size for 
the target, because I was under the impression that it does not necessarily 
correspond to the smallest VF we may have for a loop, as the vectorizer 
may have decided not to vectorize for that vector size because of costs.
If I can assume this never happens, i.e. that once it starts to vectorize 
epilogues it will keep vectorizing them for any vector size it knows of, 
then yeah, I can use that.




I'm not sure I understand - why would you have any check not inside
the outer loop?  Yes, we now eventually hoist 

Re: [PATCH] Improve PTA for PR91257, new bitmap_ior_into_and_free

2019-07-30 Thread Richard Biener
On Mon, 29 Jul 2019, Richard Biener wrote:

> 
> The following patch addresses apparent quadraticness in PTA's
> SCC merging algorithm which does (simplified)
> 
>  for (node N in SCC)
>bitmap_ior_into (pred(leader), pred(N));
> 
> with the linked list implementation this leads to increasingly large
> amount of walking for pred(leader).
> 
> The patch changes this to
> 
>  while (split /= 2)
>for (node N in lower-half-of-SCC)
>  bitmap_ior_into_and_free (pred (N), pred (N in upper half of SCC));
> 
> so in each outer iteration reducing the number of bitmaps by a factor
> of two and thus hopefully (and in practice for the testcase) reducing
> the intermediate bitmap sizes we work on.
> 
> This improves
> 
>  tree PTA   :   8.38 ( 23%)   0.22 ( 33%)   8.62 ( 24%)   7679 kB (  1%)
> 
> to
> 
>  tree PTA   :   5.30 ( 16%)   0.22 ( 32%)   5.54 ( 16%)   28081 kB (  4%)
> 
> for the reduced testcase.
> 
> I somehow didn't manage a 1:1 conversion so I need to re-check details
> here.  Still, I'd like to hear comments about the new bitmap API,
> which is a source-destructive variant of bitmap_ior_into: instead of
> copying elements not in A, it moves them from B and in the end releases
> the rest.  Note that using this API while keeping the merging as-is doesn't
> help PTA time.
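
For reference, the pairwise merging sketched above boils down to something
like this (pseudocode only; preds[] stands for the per-node predecessor
bitmap and scc_nodes for the SCC members, names made up; the actual change
is in condense_visit):

  unsigned n = scc_nodes.length ();
  while (n > 1)
    {
      unsigned half = n / 2;
      for (unsigned i = 0; i < half; ++i)
	/* Merge the upper-half bitmap into the lower-half one, freeing
	   the source as we go.  */
	bitmap_ior_into_and_free (preds[scc_nodes[i]],
				  &preds[scc_nodes[i + (n + 1) / 2]]);
      n -= half;
    }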

This is what I applied.

Bootstrapped / tested on x86_64-unknown-linux-gnu.

Richard.

2019-07-30  Richard Biener  

PR tree-optimization/91257
* bitmap.h (bitmap_ior_into_and_free): Declare.
* bitmap.c (bitmap_list_unlink_element): Add defaulted param
whether to add the unlinked element to the freelist.
(bitmap_list_insert_element_after): Add defaulted param for
an already allocated element.
(bitmap_ior_into_and_free): New function.
* tree-ssa-structalias.c (condense_visit): Reduce the
points-to and edge bitmaps of the SCC members in a
logarithmic fashion rather than all to one.

Index: gcc/bitmap.c
===
--- gcc/bitmap.c(revision 273795)
+++ gcc/bitmap.c(working copy)
@@ -252,7 +252,8 @@ bitmap_list_link_element (bitmap head, b
and return it to the freelist.  */
 
 static inline void
-bitmap_list_unlink_element (bitmap head, bitmap_element *element)
+bitmap_list_unlink_element (bitmap head, bitmap_element *element,
+   bool to_freelist = true)
 {
   bitmap_element *next = element->next;
   bitmap_element *prev = element->prev;
@@ -279,18 +280,21 @@ bitmap_list_unlink_element (bitmap head,
head->indx = 0;
 }
 
-  bitmap_elem_to_freelist (head, element);
+  if (to_freelist)
+bitmap_elem_to_freelist (head, element);
 }
 
-/* Insert a new uninitialized element into bitmap HEAD after element
-   ELT.  If ELT is NULL, insert the element at the start.  Return the
-   new element.  */
+/* Insert a new uninitialized element (or NODE if not NULL) into bitmap
+   HEAD after element ELT.  If ELT is NULL, insert the element at the start.
+   Return the new element.  */
 
 static bitmap_element *
 bitmap_list_insert_element_after (bitmap head,
- bitmap_element *elt, unsigned int indx)
+ bitmap_element *elt, unsigned int indx,
+ bitmap_element *node = NULL)
 {
-  bitmap_element *node = bitmap_element_allocate (head);
+  if (!node)
+node = bitmap_element_allocate (head);
   node->indx = indx;
 
   gcc_checking_assert (!head->tree_form);
@@ -2009,6 +2013,56 @@ bitmap_ior_into (bitmap a, const_bitmap
   return changed;
 }
 
+/* A |= B.  Return true if A changes.  Free B (re-using its storage
+   for the result).  */
+
+bool
+bitmap_ior_into_and_free (bitmap a, bitmap *b_)
+{
+  bitmap b = *b_;
+  bitmap_element *a_elt = a->first;
+  bitmap_element *b_elt = b->first;
+  bitmap_element *a_prev = NULL;
+  bitmap_element **a_prev_pnext = &a->first;
+  bool changed = false;
+
+  gcc_checking_assert (!a->tree_form && !b->tree_form);
+  gcc_assert (a->obstack == b->obstack);
+  if (a == b)
+return false;
+
+  while (b_elt)
+{
+  /* If A lags behind B, just advance it.  */
+  if (!a_elt || a_elt->indx == b_elt->indx)
+   {
+ changed = bitmap_elt_ior (a, a_elt, a_prev, a_elt, b_elt, changed);
+ b_elt = b_elt->next;
+   }
+  else if (a_elt->indx > b_elt->indx)
+   {
+ bitmap_element *b_elt_next = b_elt->next;
+ bitmap_list_unlink_element (b, b_elt, false);
+ bitmap_list_insert_element_after (a, a_prev, b_elt->indx, b_elt);
+ b_elt = b_elt_next;
+   }
+
+  a_prev = *a_prev_pnext;
+  a_prev_pnext = &a_prev->next;
+  a_elt = *a_prev_pnext;
+}
+
+  gcc_checking_assert (!a->current == !a->first);
+  if (a->current)
+a->indx = a->current->indx;
+
+  if (b->obstack)
+BITMAP_FREE (*b_);
+  else
+bitmap_clear (b);
+  

Re: [PATCH] Remove also 2nd argument for unused delete operator (PR tree-optimization/91270).

2019-07-30 Thread Martin Liška
On 7/30/19 1:35 PM, Marc Glisse wrote:
> +  /* Some delete operators have size as 2nd argument.  */
> 
> Some delete operators have 3 arguments. From cp/decl.c:
> 
>     /* operator delete (void *, size_t, align_val_t); */
> 

Yep, I know. The patch I installed expects at least 2 arguments:

+ /* Some delete operators have size as 2nd argument.  */
+ if (is_delete_operator && gimple_call_num_args (stmt) >= 2)

Martin


Re: [PATCH] Remove also 2nd argument for unused delete operator (PR tree-optimization/91270).

2019-07-30 Thread Marc Glisse

+ /* Some delete operators have size as 2nd argument.  */

Some delete operators have 3 arguments. From cp/decl.c:

/* operator delete (void *, size_t, align_val_t); */

--
Marc Glisse


Re: Don't use integer "FMA" for shifts

2019-07-30 Thread Richard Biener
On Tue, Jul 30, 2019 at 12:50 PM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Tue, Jul 30, 2019 at 12:04 PM Richard Sandiford
> >  wrote:
> >>
> >> tree-ssa-math-opts supports FMA optabs for integers as well as
> >> floating-point types, even though there's no distinction between
> >> fused and unfused there.  It turns out to be pretty handy for the
> >> IFN_COND_* handling though, so I don't want to remove it, however
> >> weird it might seem.
> >>
> >> Instead this patch makes sure that we don't apply it to integer
> >> multiplications that are actually shifts (but that are represented
> >> in gimple as multiplications because that's the canonical form).
> >>
> >> This is a preemptive strike.  The test doesn't fail for SVE as-is,
> >> but would after a later patch.
> >>
> >> Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu.
> >> OK to install?
> >>
> >> Richard
> >>
> >>
> >> 2019-07-30  Richard Sandiford  
> >>
> >> gcc/
> >> * tree-ssa-math-opts.c (convert_mult_to_fma): Reject integer
> >> multiplications by a power of 2 or a negative power of 2.
> >>
> >> gcc/testsuite/
> >> * gcc.dg/vect/vect-cond-arith-8.c: New test.
> >>
> >> Index: gcc/tree-ssa-math-opts.c
> >> ===
> >> --- gcc/tree-ssa-math-opts.c2019-07-30 10:51:51.827405171 +0100
> >> +++ gcc/tree-ssa-math-opts.c2019-07-30 10:52:27.327139149 +0100
> >> @@ -3074,10 +3074,20 @@ convert_mult_to_fma (gimple *mul_stmt, t
> >>&& flag_fp_contract_mode == FP_CONTRACT_OFF)
> >>  return false;
> >>
> >> -  /* We don't want to do bitfield reduction ops.  */
> >> -  if (INTEGRAL_TYPE_P (type)
> >> -  && (!type_has_mode_precision_p (type) || TYPE_OVERFLOW_TRAPS 
> >> (type)))
> >> -return false;
> >> +  if (ANY_INTEGRAL_TYPE_P (type))
> >> +{
> >> +  /* We don't want to do bitfield reduction ops.  */
> >> +  tree itype = INTEGRAL_TYPE_P (type) ? type : TREE_TYPE (type);
> >
> > you can use element_type () for this.  But I think it's easier to
> > leave the existing
> > test in-place since vector or complex types cannot have non-mode precision
> > components.
>
> Ah, yeah.  Guess I was over-generalising here :-)
>
> >> +  if (!type_has_mode_precision_p (itype) || TYPE_OVERFLOW_TRAPS 
> >> (itype))
> >> +   return false;
> >> +
> >> +  /* Don't use FMA for multiplications that are actually shifts.  */
> >
> > So I question this - if the FMA can do the shift "for free" and it 
> > eventually is
> > same cost/latency as an add why do we want to not allow this (for all 
> > targets)?
> > Esp. for vector types I would guess a add plus a shift may be more costly
> > (not all ISAs implement shifts anyhow).
>
> The shift doesn't really come for free.  By using FMA we're converting
> the shift by a constant into a general multiplication.

Yeah, it probably "depends".  OTOH the shift may end up being a (vector)
multiply anyway due to target constraints.

I wonder how we select between (vector) shift and multiply when expanding
without FMA.  Ideally the scheduler would have a say here based on
port utilization ;)

> > Can this be fixed up in the target instead?  (by a splitter or appropriate
> > expander?)
>
> OK, I'll try to do it that way instead.
>
> Thanks,
> Richard
>
> >
> >> +  tree val = VECTOR_TYPE_P (type) ? uniform_vector_p (op2) : op2;
> >> +  if (val
> >> + && TREE_CODE (val) == INTEGER_CST
> >> + && wi::popcount (wi::abs (wi::to_wide (val))) == 1)
> >> +   return false;
> >> +}
> >>
> >>/* If the target doesn't support it, don't generate it.  We assume that
> >>   if fma isn't available then fms, fnma or fnms are not either.  */
> >> Index: gcc/testsuite/gcc.dg/vect/vect-cond-arith-8.c
> >> ===
> >> --- /dev/null   2019-07-30 08:53:31.317691683 +0100
> >> +++ gcc/testsuite/gcc.dg/vect/vect-cond-arith-8.c   2019-07-30 
> >> 10:52:27.327139149 +0100
> >> @@ -0,0 +1,57 @@
> >> +/* { dg-require-effective-target scalar_all_fma } */
> >> +/* { dg-additional-options "-fdump-tree-optimized -ffp-contract=fast" } */
> >> +
> >> +#include "tree-vect.h"
> >> +
> >> +#define N (VECTOR_BITS * 11 / 64 + 3)
> >> +
> >> +#define DEF(INV)   \
> >> +  void __attribute__ ((noipa)) \
> >> +  f_##INV (int *restrict a, int *restrict b,   \
> >> +  int *restrict c) \
> >> +  {\
> >> +for (int i = 0; i < N; ++i)\
> >> +  {\
> >> +   int mb = (INV & 1 ? -b[i] : b[i]);  \
> >> +   int mc = (INV & 2 ? -c[i] : c[i]);  \
> >> +   a[i] = b[i] < 10 ? mb * 8 + mc : 10;\
> >> +  }

Re: Don't use integer "FMA" for shifts

2019-07-30 Thread Richard Sandiford
Richard Biener  writes:
> On Tue, Jul 30, 2019 at 12:04 PM Richard Sandiford
>  wrote:
>>
>> tree-ssa-math-opts supports FMA optabs for integers as well as
>> floating-point types, even though there's no distinction between
>> fused and unfused there.  It turns out to be pretty handy for the
>> IFN_COND_* handling though, so I don't want to remove it, however
>> weird it might seem.
>>
>> Instead this patch makes sure that we don't apply it to integer
>> multiplications that are actually shifts (but that are represented
>> in gimple as multiplications because that's the canonical form).
>>
>> This is a preemptive strike.  The test doesn't fail for SVE as-is,
>> but would after a later patch.
>>
>> Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu.
>> OK to install?
>>
>> Richard
>>
>>
>> 2019-07-30  Richard Sandiford  
>>
>> gcc/
>> * tree-ssa-math-opts.c (convert_mult_to_fma): Reject integer
>> multiplications by a power of 2 or a negative power of 2.
>>
>> gcc/testsuite/
>> * gcc.dg/vect/vect-cond-arith-8.c: New test.
>>
>> Index: gcc/tree-ssa-math-opts.c
>> ===
>> --- gcc/tree-ssa-math-opts.c2019-07-30 10:51:51.827405171 +0100
>> +++ gcc/tree-ssa-math-opts.c2019-07-30 10:52:27.327139149 +0100
>> @@ -3074,10 +3074,20 @@ convert_mult_to_fma (gimple *mul_stmt, t
>>&& flag_fp_contract_mode == FP_CONTRACT_OFF)
>>  return false;
>>
>> -  /* We don't want to do bitfield reduction ops.  */
>> -  if (INTEGRAL_TYPE_P (type)
>> -  && (!type_has_mode_precision_p (type) || TYPE_OVERFLOW_TRAPS (type)))
>> -return false;
>> +  if (ANY_INTEGRAL_TYPE_P (type))
>> +{
>> +  /* We don't want to do bitfield reduction ops.  */
>> +  tree itype = INTEGRAL_TYPE_P (type) ? type : TREE_TYPE (type);
>
> you can use element_type () for this.  But I think it's easier to
> leave the existing
> test in-place since vector or complex types cannot have non-mode precision
> components.

Ah, yeah.  Guess I was over-generalising here :-)

>> +  if (!type_has_mode_precision_p (itype) || TYPE_OVERFLOW_TRAPS (itype))
>> +   return false;
>> +
>> +  /* Don't use FMA for multiplications that are actually shifts.  */
>
> So I question this - if the FMA can do the shift "for free" and it eventually 
> is
> same cost/latency as an add why do we want to not allow this (for all 
> targets)?
> Esp. for vector types I would guess a add plus a shift may be more costly
> (not all ISAs implement shifts anyhow).

The shift doesn't really come for free.  By using FMA we're converting
the shift by a constant into a general multiplication.

> Can this be fixed up in the target instead?  (by a splitter or appropriate
> expander?)

OK, I'll try to do it that way instead.

Thanks,
Richard

>
>> +  tree val = VECTOR_TYPE_P (type) ? uniform_vector_p (op2) : op2;
>> +  if (val
>> + && TREE_CODE (val) == INTEGER_CST
>> + && wi::popcount (wi::abs (wi::to_wide (val))) == 1)
>> +   return false;
>> +}
>>
>>/* If the target doesn't support it, don't generate it.  We assume that
>>   if fma isn't available then fms, fnma or fnms are not either.  */
>> Index: gcc/testsuite/gcc.dg/vect/vect-cond-arith-8.c
>> ===
>> --- /dev/null   2019-07-30 08:53:31.317691683 +0100
>> +++ gcc/testsuite/gcc.dg/vect/vect-cond-arith-8.c   2019-07-30 
>> 10:52:27.327139149 +0100
>> @@ -0,0 +1,57 @@
>> +/* { dg-require-effective-target scalar_all_fma } */
>> +/* { dg-additional-options "-fdump-tree-optimized -ffp-contract=fast" } */
>> +
>> +#include "tree-vect.h"
>> +
>> +#define N (VECTOR_BITS * 11 / 64 + 3)
>> +
>> +#define DEF(INV)   \
>> +  void __attribute__ ((noipa)) \
>> +  f_##INV (int *restrict a, int *restrict b,   \
>> +  int *restrict c) \
>> +  {\
>> +for (int i = 0; i < N; ++i)\
>> +  {\
>> +   int mb = (INV & 1 ? -b[i] : b[i]);  \
>> +   int mc = (INV & 2 ? -c[i] : c[i]);  \
>> +   a[i] = b[i] < 10 ? mb * 8 + mc : 10;\
>> +  }\
>> +  }
>> +
>> +#define TEST(INV)  \
>> +  {\
>> +f_##INV (a, b, c); \
>> +for (int i = 0; i < N; ++i)\
>> +  {\
>> +   int mb = (INV & 1 ? -b[i] : b[i]);  \
>> +   int mc = (INV & 2 ? -c[i] : c[i]);  \
>> +   int truev = mb * 8 + mc;\
>> +   if (a[i] != (i % 17 

Re: [PATCH] Improve PR91257, specialize bitmap_ior_and_compl_into

2019-07-30 Thread Jakub Jelinek
On Tue, Jul 30, 2019 at 12:20:00PM +0200, Richard Biener wrote:
> +  if (compl_p)
> + for (ix = 0; ix < BITMAP_ELEMENT_WORDS; ix++)
> +   {
> + and_elt.bits[ix] = b_elt->bits[ix] & ~c_elt->bits[ix];
> + overall |= and_elt.bits[ix];
> +   }
> +  else
> + for (ix = 0; ix < BITMAP_ELEMENT_WORDS; ix++)
> +   {
> + and_elt.bits[ix] = b_elt->bits[ix] & c_elt->bits[ix];
> + overall |= and_elt.bits[ix];
> +   }

Might be more readable by moving the if (compl_p) into the loop,
just guarding the single statement that is different or even use a
BITMAP_WORD temporary to load c_elt->bits[ix] into it, then conditionally
complement and then use unconditionally in &.
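
Something like this, say (untested sketch of the second variant):

      for (ix = 0; ix < BITMAP_ELEMENT_WORDS; ix++)
	{
	  BITMAP_WORD c_word = c_elt->bits[ix];
	  if (compl_p)
	    c_word = ~c_word;
	  and_elt.bits[ix] = b_elt->bits[ix] & c_word;
	  overall |= and_elt.bits[ix];
	}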

Jakub


Re: Don't use integer "FMA" for shifts

2019-07-30 Thread Richard Biener
On Tue, Jul 30, 2019 at 12:04 PM Richard Sandiford
 wrote:
>
> tree-ssa-math-opts supports FMA optabs for integers as well as
> floating-point types, even though there's no distinction between
> fused and unfused there.  It turns out to be pretty handy for the
> IFN_COND_* handling though, so I don't want to remove it, however
> weird it might seem.
>
> Instead this patch makes sure that we don't apply it to integer
> multiplications that are actually shifts (but that are represented
> in gimple as multiplications because that's the canonical form).
>
> This is a preemptive strike.  The test doesn't fail for SVE as-is,
> but would after a later patch.
>
> Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu.
> OK to install?
>
> Richard
>
>
> 2019-07-30  Richard Sandiford  
>
> gcc/
> * tree-ssa-math-opts.c (convert_mult_to_fma): Reject integer
> multiplications by a power of 2 or a negative power of 2.
>
> gcc/testsuite/
> * gcc.dg/vect/vect-cond-arith-8.c: New test.
>
> Index: gcc/tree-ssa-math-opts.c
> ===
> --- gcc/tree-ssa-math-opts.c2019-07-30 10:51:51.827405171 +0100
> +++ gcc/tree-ssa-math-opts.c2019-07-30 10:52:27.327139149 +0100
> @@ -3074,10 +3074,20 @@ convert_mult_to_fma (gimple *mul_stmt, t
>&& flag_fp_contract_mode == FP_CONTRACT_OFF)
>  return false;
>
> -  /* We don't want to do bitfield reduction ops.  */
> -  if (INTEGRAL_TYPE_P (type)
> -  && (!type_has_mode_precision_p (type) || TYPE_OVERFLOW_TRAPS (type)))
> -return false;
> +  if (ANY_INTEGRAL_TYPE_P (type))
> +{
> +  /* We don't want to do bitfield reduction ops.  */
> +  tree itype = INTEGRAL_TYPE_P (type) ? type : TREE_TYPE (type);

you can use element_type () for this.  But I think it's easier to
leave the existing
test in-place since vector or complex types cannot have non-mode precision
components.

> +  if (!type_has_mode_precision_p (itype) || TYPE_OVERFLOW_TRAPS (itype))
> +   return false;
> +
> +  /* Don't use FMA for multiplications that are actually shifts.  */

So I question this - if the FMA can do the shift "for free" and it eventually has
the same cost/latency as an add, why do we want to disallow this (for all targets)?
Esp. for vector types I would guess an add plus a shift may be more costly
(not all ISAs implement shifts anyhow).

Can this be fixed up in the target instead?  (by a splitter or appropriate
expander?)

> +  tree val = VECTOR_TYPE_P (type) ? uniform_vector_p (op2) : op2;
> +  if (val
> + && TREE_CODE (val) == INTEGER_CST
> + && wi::popcount (wi::abs (wi::to_wide (val))) == 1)
> +   return false;
> +}
>
>/* If the target doesn't support it, don't generate it.  We assume that
>   if fma isn't available then fms, fnma or fnms are not either.  */
> Index: gcc/testsuite/gcc.dg/vect/vect-cond-arith-8.c
> ===
> --- /dev/null   2019-07-30 08:53:31.317691683 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-cond-arith-8.c   2019-07-30 
> 10:52:27.327139149 +0100
> @@ -0,0 +1,57 @@
> +/* { dg-require-effective-target scalar_all_fma } */
> +/* { dg-additional-options "-fdump-tree-optimized -ffp-contract=fast" } */
> +
> +#include "tree-vect.h"
> +
> +#define N (VECTOR_BITS * 11 / 64 + 3)
> +
> +#define DEF(INV)   \
> +  void __attribute__ ((noipa)) \
> +  f_##INV (int *restrict a, int *restrict b,   \
> +  int *restrict c) \
> +  {\
> +for (int i = 0; i < N; ++i)\
> +  {\
> +   int mb = (INV & 1 ? -b[i] : b[i]);  \
> +   int mc = (INV & 2 ? -c[i] : c[i]);  \
> +   a[i] = b[i] < 10 ? mb * 8 + mc : 10;\
> +  }\
> +  }
> +
> +#define TEST(INV)  \
> +  {\
> +f_##INV (a, b, c); \
> +for (int i = 0; i < N; ++i)\
> +  {\
> +   int mb = (INV & 1 ? -b[i] : b[i]);  \
> +   int mc = (INV & 2 ? -c[i] : c[i]);  \
> +   int truev = mb * 8 + mc;\
> +   if (a[i] != (i % 17 < 10 ? truev : 10)) \
> + __builtin_abort ();   \
> +   asm volatile ("" ::: "memory"); \
> +  }\
> +  }
> +
> +#define FOR_EACH_INV(T) \
> +  T (0) T (1) T (2) T (3)
> +
> +FOR_EACH_INV (DEF)
> +
> +int
> +main (void)
> +{
> +  int a[N], b[N], c[N];
> +  for (int i 

Re: Handle IFN_COND_MUL in tree-ssa-math-opts.c

2019-07-30 Thread Richard Biener
On Tue, Jul 30, 2019 at 12:01 PM Richard Sandiford
 wrote:
>
> This patch extends the FMA handling in tree-ssa-math-opts.c so
> that it can cope with conditional multiplications as well as
> unconditional multiplications.  The addition or subtraction must then
> have the same condition as the multiplication (at least for now).
>
> E.g. we can currently fold:
>
>   (IFN_COND_ADD cond (mul x y) z fallback)
> -> (IFN_COND_FMA cond x y z fallback)
>
> This patch also allows:
>
>   (IFN_COND_ADD cond (IFN_COND_MUL cond x y ) z fallback)
> -> (IFN_COND_FMA cond x y z fallback)
>
> Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu.
> OK to install?

OK.

> Richard
>
>
> 2019-07-30  Richard Sandiford  
>
> gcc/
> * tree-ssa-math-opts.c (convert_mult_to_fma): Add a mul_cond
> parameter.  When nonnull, make sure that the addition or subtraction
> has the same condition.
> (math_opts_dom_walker::after_dom_children): Try convert_mult_to_fma
> for CFN_COND_MUL too.
>
> gcc/testsuite/
> * gcc.dg/vect/vect-cond-arith-7.c: New test.
>
> Index: gcc/tree-ssa-math-opts.c
> ===
> --- gcc/tree-ssa-math-opts.c2019-07-30 10:51:22.0 +0100
> +++ gcc/tree-ssa-math-opts.c2019-07-30 10:51:51.827405171 +0100
> @@ -3044,6 +3044,8 @@ last_fma_candidate_feeds_initial_phi (fm
>  /* Combine the multiplication at MUL_STMT with operands MULOP1 and MULOP2
> with uses in additions and subtractions to form fused multiply-add
> operations.  Returns true if successful and MUL_STMT should be removed.
> +   If MUL_COND is nonnull, the multiplication in MUL_STMT is conditional
> +   on MUL_COND, otherwise it is unconditional.
>
> If STATE indicates that we are deferring FMA transformation, that means
> that we do not produce FMAs for basic blocks which look like:
> @@ -3060,7 +3062,7 @@ last_fma_candidate_feeds_initial_phi (fm
>
>  static bool
>  convert_mult_to_fma (gimple *mul_stmt, tree op1, tree op2,
> -fma_deferring_state *state)
> +fma_deferring_state *state, tree mul_cond = NULL_TREE)
>  {
>tree mul_result = gimple_get_lhs (mul_stmt);
>tree type = TREE_TYPE (mul_result);
> @@ -3174,6 +3176,9 @@ convert_mult_to_fma (gimple *mul_stmt, t
>   return false;
> }
>
> +  if (mul_cond && cond != mul_cond)
> +   return false;
> +
>if (cond)
> {
>   if (cond == result || else_value == result)
> @@ -3785,38 +3790,48 @@ math_opts_dom_walker::after_dom_children
> }
>else if (is_gimple_call (stmt))
> {
> - tree fndecl = gimple_call_fndecl (stmt);
> - if (fndecl && gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
> + switch (gimple_call_combined_fn (stmt))
> {
> - switch (DECL_FUNCTION_CODE (fndecl))
> +   CASE_CFN_POW:
> + if (gimple_call_lhs (stmt)
> + && TREE_CODE (gimple_call_arg (stmt, 1)) == REAL_CST
> + && real_equal (&TREE_REAL_CST (gimple_call_arg (stmt, 1)),
> + &dconst2)
> + && convert_mult_to_fma (stmt,
> + gimple_call_arg (stmt, 0),
> + gimple_call_arg (stmt, 0),
> + _state))
> {
> -   case BUILT_IN_POWF:
> -   case BUILT_IN_POW:
> -   case BUILT_IN_POWL:
> - if (gimple_call_lhs (stmt)
> - && TREE_CODE (gimple_call_arg (stmt, 1)) == REAL_CST
> - && real_equal
> - (&TREE_REAL_CST (gimple_call_arg (stmt, 1)),
> -  &dconst2)
> - && convert_mult_to_fma (stmt,
> - gimple_call_arg (stmt, 0),
> - gimple_call_arg (stmt, 0),
> - _state))
> -   {
> - unlink_stmt_vdef (stmt);
> - if (gsi_remove (&gsi, true)
> - && gimple_purge_dead_eh_edges (bb))
> -   *m_cfg_changed_p = true;
> - release_defs (stmt);
> - continue;
> -   }
> - break;
> + unlink_stmt_vdef (stmt);
> + if (gsi_remove (&gsi, true)
> + && gimple_purge_dead_eh_edges (bb))
> +   *m_cfg_changed_p = true;
> + release_defs (stmt);
> + continue;
> +   }
> + break;
>
> -   default:;
> +   case CFN_COND_MUL:
> + if (convert_mult_to_fma (stmt,
> +  gimple_call_arg (stmt, 1),
> +  gimple_call_arg (stmt, 2),
> 

Re: [PATCH] Remove also 2nd argument for unused delete operator (PR tree-optimization/91270).

2019-07-30 Thread Richard Biener
On Tue, Jul 30, 2019 at 12:11 PM Martin Liška  wrote:
>
> On 7/30/19 10:40 AM, Richard Biener wrote:
> > On Tue, Jul 30, 2019 at 10:07 AM Martin Liška  wrote:
> >>
> >> On 7/30/19 9:46 AM, Martin Liška wrote:
> >>> Anyway that's not a candidate for DCE. I'm testing following patch.
> >>
> >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> >>
> >> One alternative approach can be to drop DECL_SET_IS_OPERATOR_DELETE in:
> >> cat -n gcc/cp/decl.c | less
> >> ...
> >>   4410  deltype = cp_build_type_attribute_variant (deltype, 
> >> extvisattr);
> >>   4411  deltype = build_exception_variant (deltype, 
> >> empty_except_spec);
> >>   4412  opdel = push_cp_library_fn (DELETE_EXPR, deltype, 
> >> ECF_NOTHROW);
> >>   4413  DECL_SET_IS_OPERATOR_DELETE (opdel, true);
> >>   4414  opdel = push_cp_library_fn (VEC_DELETE_EXPR, deltype, 
> >> ECF_NOTHROW);
> >>   4415  DECL_SET_IS_OPERATOR_DELETE (opdel, true);
> >>   4416
> >>   4417  if (flag_sized_deallocation)
> >>   4418{
> >>   4419  /* operator delete (void *, size_t, align_val_t); */
> >>   4420  deltype = build_function_type_list (void_type_node, 
> >> ptr_type_node,
> >>   4421  size_type_node, 
> >> align_type_node,
> >>   4422  NULL_TREE);
> >>   4423  deltype = cp_build_type_attribute_variant (deltype, 
> >> extvisattr);
> >>   4424  deltype = build_exception_variant (deltype, 
> >> empty_except_spec);
> >>   4425  opdel = push_cp_library_fn (DELETE_EXPR, deltype, 
> >> ECF_NOTHROW);
> >>   4426  DECL_SET_IS_OPERATOR_DELETE (opdel, true);
> >>   4427  opdel = push_cp_library_fn (VEC_DELETE_EXPR, deltype, 
> >> ECF_NOTHROW);
> >>   4428  DECL_SET_IS_OPERATOR_DELETE (opdel, true);
> >>   4429}
> >>   4430}
> >>
> >> at lines 4426 and 4428.
> >>
> >> Richi what do you prefer?
> >
> > I don't understand why a "not simple" delete operator isn't fine to be
> > DCEd?  Does C++
> > somehow allow mismatching size specifications here?
>
> No, they are the same.
>
> >  And what's the semantics
> > then?
> >
> > Thus I'd rather go with your earlier patch to mark the op necessary.
>
> Ok, I'm sending tested patch.
>
> Ready for trunk?

OK with the tests in

  if (gimple_call_builtin_p (stmt, BUILT_IN_FREE)
- || (is_gimple_call (stmt)
- && gimple_call_operator_delete_p (as_a <gcall *> (stmt))))
-
+ || is_delete_operator)

exchanged (you already computed is_delete_operator, no need to check for
BUILT_IN_FREE if it is true).
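
I.e. something like (sketch):

	  if (is_delete_operator
	      || gimple_call_builtin_p (stmt, BUILT_IN_FREE))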

Richard.

> Thanks,
> Martin
>
> >
> > Richard.
> >
> >> Martin
>


Re: [PATCH] Improve PR91257, specialize bitmap_ior_and_compl_into

2019-07-30 Thread Richard Biener
On Tue, 30 Jul 2019, Richard Biener wrote:

> 
> bitmap_ior_and_compl_into is currently doing bitmap_and_compl into
> a temporary bitmap and then bitmap_ior_into.  The following uses
> the existing implementation of bitmap_ior_and_into and exchanges
> the core worker based on a template parameter.
> 
> This results in a slight savings in out-of-SSA time (it also avoids
> using the default obstack for the temporary while all operands for
> out-of-SSA and the result live in different obstacks).
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Hmm, I realize I cannot reuse the implementations this way since
for a non-existing C element I have to assume all 1s...
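
Concretely: for A |= (B & C) a B element with no matching C element
contributes nothing and can simply be skipped, while for A |= (B & ~C) the
missing C element acts as all zeros, so ~C is all ones there and the whole
B element has to be ORed into A.  Very roughly (illustrative fragment only,
c_elt_for_this_index is a made-up placeholder, not the structure of the
final fix):

      if (!c_elt_for_this_index)
	{
	  if (!compl_p)
	    {
	      /* B & C is empty here, nothing to do for this element.  */
	      b_elt = b_elt->next;
	      continue;
	    }
	  /* B & ~C is just B here, so the whole B element must be ORed
	     into A.  */
	}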

Richard.

> Richard.
> 
> 2019-07-30  Richard Biener  
> 
>   PR tree-optimization/91257
>   * bitmap.c (bitmap_ior_and_maybe_compl): New templated function
>   based on bitmap_ior_and_into.
>   (bitmap_ior_and_into): Wrap around bitmap_ior_and_maybe_compl.
>   (bitmap_ior_and_compl_into): Likewise.
> 
> Index: gcc/bitmap.c
> ===
> --- gcc/bitmap.c  (revision 273795)
> +++ gcc/bitmap.c  (working copy)
> @@ -2345,28 +2399,11 @@ bitmap_ior_and_compl (bitmap dst, const_
>return changed;
>  }
>  
> -/* A |= (B & ~C).  Return true if A changes.  */
> +/* Helper for A |= (B & ~C) and A |= (B & C).  Return true if A changes.  */
>  
> -bool
> -bitmap_ior_and_compl_into (bitmap a, const_bitmap b, const_bitmap c)
> -{
> -  bitmap_head tmp;
> -  bool changed;
> -
> -  gcc_checking_assert (!a->tree_form && !b->tree_form && !c->tree_form);
> -
> -  bitmap_initialize (&tmp, &bitmap_default_obstack);
> -  bitmap_and_compl (&tmp, b, c);
> -  changed = bitmap_ior_into (a, &tmp);
> -  bitmap_clear (&tmp);
> -
> -  return changed;
> -}
> -
> -/* A |= (B & C).  Return true if A changes.  */
> -
> -bool
> -bitmap_ior_and_into (bitmap a, const_bitmap b, const_bitmap c)
> +template <bool compl_p>
> +static bool
> +bitmap_ior_and_maybe_compl_into (bitmap a, const_bitmap b, const_bitmap c)
>  {
>bitmap_element *a_elt = a->first;
>const bitmap_element *b_elt = b->first;
> @@ -2408,11 +2445,18 @@ bitmap_ior_and_into (bitmap a, const_bit
>  
>overall = 0;
>and_elt.indx = b_elt->indx;
> -  for (ix = 0; ix < BITMAP_ELEMENT_WORDS; ix++)
> - {
> -   and_elt.bits[ix] = b_elt->bits[ix] & c_elt->bits[ix];
> -   overall |= and_elt.bits[ix];
> - }
> +  if (compl_p)
> + for (ix = 0; ix < BITMAP_ELEMENT_WORDS; ix++)
> +   {
> + and_elt.bits[ix] = b_elt->bits[ix] & ~c_elt->bits[ix];
> + overall |= and_elt.bits[ix];
> +   }
> +  else
> + for (ix = 0; ix < BITMAP_ELEMENT_WORDS; ix++)
> +   {
> + and_elt.bits[ix] = b_elt->bits[ix] & c_elt->bits[ix];
> + overall |= and_elt.bits[ix];
> +   }
>  
>b_elt = b_elt->next;
>c_elt = c_elt->next;
> @@ -2444,6 +2488,22 @@ bitmap_ior_and_into (bitmap a, const_bit
>return changed;
>  }
>  
> +/* A |= (B & ~C).  Return true if A changes.  */
> +
> +bool
> +bitmap_ior_and_compl_into (bitmap a, const_bitmap b, const_bitmap c)
> +{
> +  return bitmap_ior_and_maybe_compl_into <true> (a, b, c);
> +}
> +
> +/* A |= (B & C).  Return true if A changes.  */
> +
> +bool
> +bitmap_ior_and_into (bitmap a, const_bitmap b, const_bitmap c)
> +{
> +  return bitmap_ior_and_maybe_compl_into <false> (a, b, c);
> +}
> +
>  /* Compute hash of bitmap (for purposes of hashing).  */
>  
>  hashval_t
> 

-- 
Richard Biener 
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)

[PATCH] Improve PR91257, specialize bitmap_ior_and_compl_into

2019-07-30 Thread Richard Biener


bitmap_ior_and_compl_into is currently doing bitmap_and_compl into
a temporary bitmap and then bitmap_ior_into.  The following uses
the existing implementation of bitmap_ior_and_into and exchanges
the core worker based on a template parameter.

This results in a slight savings in out-of-SSA time (it also avoids
using the default obstack for the temporary while all operands for
out-of-SSA and the result live in different obstacks).

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2019-07-30  Richard Biener  

PR tree-optimization/91257
* bitmap.c (bitmap_ior_and_maybe_compl): New templated function
based on bitmap_ior_and_into.
(bitmap_ior_and_into): Wrap around bitmap_ior_and_maybe_compl.
(bitmap_ior_and_compl_into): Likewise.

Index: gcc/bitmap.c
===
--- gcc/bitmap.c(revision 273795)
+++ gcc/bitmap.c(working copy)
@@ -2345,28 +2399,11 @@ bitmap_ior_and_compl (bitmap dst, const_
   return changed;
 }
 
-/* A |= (B & ~C).  Return true if A changes.  */
+/* Helper for A |= (B & ~C) and A |= (B & C).  Return true if A changes.  */
 
-bool
-bitmap_ior_and_compl_into (bitmap a, const_bitmap b, const_bitmap c)
-{
-  bitmap_head tmp;
-  bool changed;
-
-  gcc_checking_assert (!a->tree_form && !b->tree_form && !c->tree_form);
-
-  bitmap_initialize (&tmp, &bitmap_default_obstack);
-  bitmap_and_compl (&tmp, b, c);
-  changed = bitmap_ior_into (a, &tmp);
-  bitmap_clear (&tmp);
-
-  return changed;
-}
-
-/* A |= (B & C).  Return true if A changes.  */
-
-bool
-bitmap_ior_and_into (bitmap a, const_bitmap b, const_bitmap c)
+template <bool compl_p>
+static bool
+bitmap_ior_and_maybe_compl_into (bitmap a, const_bitmap b, const_bitmap c)
 {
   bitmap_element *a_elt = a->first;
   const bitmap_element *b_elt = b->first;
@@ -2408,11 +2445,18 @@ bitmap_ior_and_into (bitmap a, const_bit
 
   overall = 0;
   and_elt.indx = b_elt->indx;
-  for (ix = 0; ix < BITMAP_ELEMENT_WORDS; ix++)
-   {
- and_elt.bits[ix] = b_elt->bits[ix] & c_elt->bits[ix];
- overall |= and_elt.bits[ix];
-   }
+  if (compl_p)
+   for (ix = 0; ix < BITMAP_ELEMENT_WORDS; ix++)
+ {
+   and_elt.bits[ix] = b_elt->bits[ix] & ~c_elt->bits[ix];
+   overall |= and_elt.bits[ix];
+ }
+  else
+   for (ix = 0; ix < BITMAP_ELEMENT_WORDS; ix++)
+ {
+   and_elt.bits[ix] = b_elt->bits[ix] & c_elt->bits[ix];
+   overall |= and_elt.bits[ix];
+ }
 
   b_elt = b_elt->next;
   c_elt = c_elt->next;
@@ -2444,6 +2488,22 @@ bitmap_ior_and_into (bitmap a, const_bit
   return changed;
 }
 
+/* A |= (B & ~C).  Return true if A changes.  */
+
+bool
+bitmap_ior_and_compl_into (bitmap a, const_bitmap b, const_bitmap c)
+{
+  return bitmap_ior_and_maybe_compl_into <true> (a, b, c);
+}
+
+/* A |= (B & C).  Return true if A changes.  */
+
+bool
+bitmap_ior_and_into (bitmap a, const_bitmap b, const_bitmap c)
+{
+  return bitmap_ior_and_maybe_compl_into <false> (a, b, c);
+}
+
 /* Compute hash of bitmap (for purposes of hashing).  */
 
 hashval_t


Re: [PATCH] Remove also 2nd argument for unused delete operator (PR tree-optimization/91270).

2019-07-30 Thread Martin Liška
On 7/30/19 10:40 AM, Richard Biener wrote:
> On Tue, Jul 30, 2019 at 10:07 AM Martin Liška  wrote:
>>
>> On 7/30/19 9:46 AM, Martin Liška wrote:
>>> Anyway that's not a candidate for DCE. I'm testing following patch.
>>
>> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>>
>> One alternative approach can be to drop DECL_SET_IS_OPERATOR_DELETE in:
>> cat -n gcc/cp/decl.c | less
>> ...
>>   4410  deltype = cp_build_type_attribute_variant (deltype, 
>> extvisattr);
>>   4411  deltype = build_exception_variant (deltype, 
>> empty_except_spec);
>>   4412  opdel = push_cp_library_fn (DELETE_EXPR, deltype, 
>> ECF_NOTHROW);
>>   4413  DECL_SET_IS_OPERATOR_DELETE (opdel, true);
>>   4414  opdel = push_cp_library_fn (VEC_DELETE_EXPR, deltype, 
>> ECF_NOTHROW);
>>   4415  DECL_SET_IS_OPERATOR_DELETE (opdel, true);
>>   4416
>>   4417  if (flag_sized_deallocation)
>>   4418{
>>   4419  /* operator delete (void *, size_t, align_val_t); */
>>   4420  deltype = build_function_type_list (void_type_node, 
>> ptr_type_node,
>>   4421  size_type_node, 
>> align_type_node,
>>   4422  NULL_TREE);
>>   4423  deltype = cp_build_type_attribute_variant (deltype, 
>> extvisattr);
>>   4424  deltype = build_exception_variant (deltype, 
>> empty_except_spec);
>>   4425  opdel = push_cp_library_fn (DELETE_EXPR, deltype, 
>> ECF_NOTHROW);
>>   4426  DECL_SET_IS_OPERATOR_DELETE (opdel, true);
>>   4427  opdel = push_cp_library_fn (VEC_DELETE_EXPR, deltype, 
>> ECF_NOTHROW);
>>   4428  DECL_SET_IS_OPERATOR_DELETE (opdel, true);
>>   4429}
>>   4430}
>>
>> at lines 4426 and 4428.
>>
>> Richi what do you prefer?
> 
> I don't understand why a "not simple" delete operator isn't fine to be
> DCEd?  Does C++
> somehow allow mismatching size specifications here?

No, they are the same.

>  And what's the semantics
> then?
> 
> Thus I'd rather go with your earlier patch to mark the op necessary.

Ok, I'm sending tested patch.

Ready for trunk?
Thanks,
Martin

> 
> Richard.
> 
>> Martin

>From b4645189743b2670f77028933086fff56c46eb75 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Sun, 28 Jul 2019 13:04:28 +0200
Subject: [PATCH] Mark 2nd argument of delete operator as needed (PR
 tree-optimization/91270).

gcc/ChangeLog:

2019-07-30  Martin Liska  

	PR tree-optimization/91270
	* tree-ssa-dce.c (propagate_necessity): Mark 2nd argument
	of delete operator as needed.

gcc/testsuite/ChangeLog:

2019-07-30  Martin Liska  

	PR tree-optimization/91270
	* g++.dg/torture/pr91270.C: New test.
---
 gcc/testsuite/g++.dg/torture/pr91270.C | 10 ++
 gcc/tree-ssa-dce.c | 19 +++
 2 files changed, 25 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/pr91270.C

diff --git a/gcc/testsuite/g++.dg/torture/pr91270.C b/gcc/testsuite/g++.dg/torture/pr91270.C
new file mode 100644
index 000..60d766e9e9f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr91270.C
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+
+struct S {
+  ~S();
+};
+int a = 123;
+void fn1() {
+  S *s = new S[a];
+  delete[] s;
+}
diff --git a/gcc/tree-ssa-dce.c b/gcc/tree-ssa-dce.c
index 763b76f0e53..c816fccceb4 100644
--- a/gcc/tree-ssa-dce.c
+++ b/gcc/tree-ssa-dce.c
@@ -804,10 +804,11 @@ propagate_necessity (bool aggressive)
 	  /* If this is a call to free which is directly fed by an
 	 allocation function do not mark that necessary through
 	 processing the argument.  */
+	  bool is_delete_operator
+	= (is_gimple_call (stmt)
+	       && gimple_call_operator_delete_p (as_a <gcall *> (stmt)));
 	  if (gimple_call_builtin_p (stmt, BUILT_IN_FREE)
-	  || (is_gimple_call (stmt)
-		  && gimple_call_operator_delete_p (as_a <gcall *> (stmt))))
-
+	  || is_delete_operator)
 	{
 	  tree ptr = gimple_call_arg (stmt, 0);
 	  gimple *def_stmt;
@@ -822,7 +823,17 @@ propagate_necessity (bool aggressive)
 			   || DECL_FUNCTION_CODE (def_callee) == BUILT_IN_MALLOC
 			   || DECL_FUNCTION_CODE (def_callee) == BUILT_IN_CALLOC))
 		  || DECL_IS_REPLACEABLE_OPERATOR_NEW_P (def_callee)))
-		continue;
+		{
+		  /* Some delete operators have size as 2nd argument.  */
+		  if (is_delete_operator && gimple_call_num_args (stmt) >= 2)
+		{
+		  tree size_argument = gimple_call_arg (stmt, 1);
+		  if (TREE_CODE (size_argument) == SSA_NAME)
+			mark_operand_necessary (size_argument);
+		}
+
+		  continue;
+		}
 	}
 
 	  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
-- 
2.22.0



Don't use integer "FMA" for shifts

2019-07-30 Thread Richard Sandiford
tree-ssa-math-opts supports FMA optabs for integers as well as
floating-point types, even though there's no distinction between
fused and unfused there.  It turns out to be pretty handy for the
IFN_COND_* handling though, so I don't want to remove it, however
weird it might seem.

Instead this patch makes sure that we don't apply it to integer
multiplications that are actually shifts (but that are represented
in gimple as multiplications because that's the canonical form).
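
For a concrete picture, this is the kind of multiplication the patch now
leaves alone (illustrative example only, not one of the new tests):

/* The "* 8" below is a mult_expr in gimple (the canonical form), but it
   is really a shift, so we no longer consider turning the expression
   into an integer "FMA".  */
void
f (int *restrict a, int *restrict b, int *restrict c, int n)
{
  for (int i = 0; i < n; ++i)
    a[i] = b[i] * 8 + c[i];
}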

This is a preemptive strike.  The test doesn't fail for SVE as-is,
but would after a later patch.

Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu.
OK to install?

Richard


2019-07-30  Richard Sandiford  

gcc/
* tree-ssa-math-opts.c (convert_mult_to_fma): Reject integer
multiplications by a power of 2 or a negative power of 2.

gcc/testsuite/
* gcc.dg/vect/vect-cond-arith-8.c: New test.

Index: gcc/tree-ssa-math-opts.c
===
--- gcc/tree-ssa-math-opts.c2019-07-30 10:51:51.827405171 +0100
+++ gcc/tree-ssa-math-opts.c2019-07-30 10:52:27.327139149 +0100
@@ -3074,10 +3074,20 @@ convert_mult_to_fma (gimple *mul_stmt, t
   && flag_fp_contract_mode == FP_CONTRACT_OFF)
 return false;
 
-  /* We don't want to do bitfield reduction ops.  */
-  if (INTEGRAL_TYPE_P (type)
-  && (!type_has_mode_precision_p (type) || TYPE_OVERFLOW_TRAPS (type)))
-return false;
+  if (ANY_INTEGRAL_TYPE_P (type))
+{
+  /* We don't want to do bitfield reduction ops.  */
+  tree itype = INTEGRAL_TYPE_P (type) ? type : TREE_TYPE (type);
+  if (!type_has_mode_precision_p (itype) || TYPE_OVERFLOW_TRAPS (itype))
+   return false;
+
+  /* Don't use FMA for multiplications that are actually shifts.  */
+  tree val = VECTOR_TYPE_P (type) ? uniform_vector_p (op2) : op2;
+  if (val
+ && TREE_CODE (val) == INTEGER_CST
+ && wi::popcount (wi::abs (wi::to_wide (val))) == 1)
+   return false;
+}
 
   /* If the target doesn't support it, don't generate it.  We assume that
  if fma isn't available then fms, fnma or fnms are not either.  */
Index: gcc/testsuite/gcc.dg/vect/vect-cond-arith-8.c
===
--- /dev/null   2019-07-30 08:53:31.317691683 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-cond-arith-8.c   2019-07-30 
10:52:27.327139149 +0100
@@ -0,0 +1,57 @@
+/* { dg-require-effective-target scalar_all_fma } */
+/* { dg-additional-options "-fdump-tree-optimized -ffp-contract=fast" } */
+
+#include "tree-vect.h"
+
+#define N (VECTOR_BITS * 11 / 64 + 3)
+
+#define DEF(INV)   \
+  void __attribute__ ((noipa)) \
+  f_##INV (int *restrict a, int *restrict b,   \
+  int *restrict c) \
+  {\
+for (int i = 0; i < N; ++i)\
+  {\
+   int mb = (INV & 1 ? -b[i] : b[i]);  \
+   int mc = (INV & 2 ? -c[i] : c[i]);  \
+   a[i] = b[i] < 10 ? mb * 8 + mc : 10;\
+  }\
+  }
+
+#define TEST(INV)  \
+  {\
+f_##INV (a, b, c); \
+for (int i = 0; i < N; ++i)\
+  {\
+   int mb = (INV & 1 ? -b[i] : b[i]);  \
+   int mc = (INV & 2 ? -c[i] : c[i]);  \
+   int truev = mb * 8 + mc;\
+   if (a[i] != (i % 17 < 10 ? truev : 10)) \
+ __builtin_abort ();   \
+   asm volatile ("" ::: "memory"); \
+  }\
+  }
+
+#define FOR_EACH_INV(T) \
+  T (0) T (1) T (2) T (3)
+
+FOR_EACH_INV (DEF)
+
+int
+main (void)
+{
+  int a[N], b[N], c[N];
+  for (int i = 0; i < N; ++i)
+{
+  b[i] = i % 17;
+  c[i] = i % 13 + 14;
+  asm volatile ("" ::: "memory");
+}
+  FOR_EACH_INV (TEST)
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-not { = \.COND_FMA } "optimized" } } */
+/* { dg-final { scan-tree-dump-not { = \.COND_FMS } "optimized" } } */
+/* { dg-final { scan-tree-dump-not { = \.COND_FNMA } "optimized" } } */
+/* { dg-final { scan-tree-dump-not { = \.COND_FNMS } "optimized" } } */


Handle IFN_COND_MUL in tree-ssa-math-opts.c

2019-07-30 Thread Richard Sandiford
This patch extends the FMA handling in tree-ssa-math-opts.c so
that it can cope with conditional multiplications as well as
unconditional multiplications.  The addition or subtraction must then
have the same condition as the multiplication (at least for now).

E.g. we can currently fold:

  (IFN_COND_ADD cond (mul x y) z fallback)
-> (IFN_COND_FMA cond x y z fallback)

This patch also allows:

  (IFN_COND_ADD cond (IFN_COND_MUL cond x y ) z fallback)
-> (IFN_COND_FMA cond x y z fallback)
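
For illustration, roughly the kind of source this is aimed at (hypothetical
example, assuming if-conversion turns both statements into conditional
internal functions with the same mask):

void
f (double *restrict a, double *restrict b, double *restrict c,
   double *restrict d, int n)
{
  for (int i = 0; i < n; ++i)
    if (d[i] > 0.0)
      {
	double t = b[i] * c[i];	/* may become IFN_COND_MUL ...  */
	a[i] = t + a[i];	/* ... feeding IFN_COND_ADD, same mask.  */
      }
}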

Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu.
OK to install?

Richard


2019-07-30  Richard Sandiford  

gcc/
* tree-ssa-math-opts.c (convert_mult_to_fma): Add a mul_cond
parameter.  When nonnull, make sure that the addition or subtraction
has the same condition.
(math_opts_dom_walker::after_dom_children): Try convert_mult_to_fma
for CFN_COND_MUL too.

gcc/testsuite/
* gcc.dg/vect/vect-cond-arith-7.c: New test.

Index: gcc/tree-ssa-math-opts.c
===
--- gcc/tree-ssa-math-opts.c2019-07-30 10:51:22.0 +0100
+++ gcc/tree-ssa-math-opts.c2019-07-30 10:51:51.827405171 +0100
@@ -3044,6 +3044,8 @@ last_fma_candidate_feeds_initial_phi (fm
 /* Combine the multiplication at MUL_STMT with operands MULOP1 and MULOP2
with uses in additions and subtractions to form fused multiply-add
operations.  Returns true if successful and MUL_STMT should be removed.
+   If MUL_COND is nonnull, the multiplication in MUL_STMT is conditional
+   on MUL_COND, otherwise it is unconditional.
 
If STATE indicates that we are deferring FMA transformation, that means
that we do not produce FMAs for basic blocks which look like:
@@ -3060,7 +3062,7 @@ last_fma_candidate_feeds_initial_phi (fm
 
 static bool
 convert_mult_to_fma (gimple *mul_stmt, tree op1, tree op2,
-fma_deferring_state *state)
+fma_deferring_state *state, tree mul_cond = NULL_TREE)
 {
   tree mul_result = gimple_get_lhs (mul_stmt);
   tree type = TREE_TYPE (mul_result);
@@ -3174,6 +3176,9 @@ convert_mult_to_fma (gimple *mul_stmt, t
  return false;
}
 
+  if (mul_cond && cond != mul_cond)
+   return false;
+
   if (cond)
{
  if (cond == result || else_value == result)
@@ -3785,38 +3790,48 @@ math_opts_dom_walker::after_dom_children
}
   else if (is_gimple_call (stmt))
{
- tree fndecl = gimple_call_fndecl (stmt);
- if (fndecl && gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
+ switch (gimple_call_combined_fn (stmt))
{
- switch (DECL_FUNCTION_CODE (fndecl))
+   CASE_CFN_POW:
+ if (gimple_call_lhs (stmt)
+ && TREE_CODE (gimple_call_arg (stmt, 1)) == REAL_CST
+ && real_equal (&TREE_REAL_CST (gimple_call_arg (stmt, 1)),
+ &dconst2)
+ && convert_mult_to_fma (stmt,
+ gimple_call_arg (stmt, 0),
+ gimple_call_arg (stmt, 0),
+ _state))
{
-   case BUILT_IN_POWF:
-   case BUILT_IN_POW:
-   case BUILT_IN_POWL:
- if (gimple_call_lhs (stmt)
- && TREE_CODE (gimple_call_arg (stmt, 1)) == REAL_CST
- && real_equal
- (&TREE_REAL_CST (gimple_call_arg (stmt, 1)),
-  &dconst2)
- && convert_mult_to_fma (stmt,
- gimple_call_arg (stmt, 0),
- gimple_call_arg (stmt, 0),
- _state))
-   {
- unlink_stmt_vdef (stmt);
- if (gsi_remove (&gsi, true)
- && gimple_purge_dead_eh_edges (bb))
-   *m_cfg_changed_p = true;
- release_defs (stmt);
- continue;
-   }
- break;
+ unlink_stmt_vdef (stmt);
+ if (gsi_remove (&gsi, true)
+ && gimple_purge_dead_eh_edges (bb))
+   *m_cfg_changed_p = true;
+ release_defs (stmt);
+ continue;
+   }
+ break;
 
-   default:;
+   case CFN_COND_MUL:
+ if (convert_mult_to_fma (stmt,
+  gimple_call_arg (stmt, 1),
+  gimple_call_arg (stmt, 2),
+  _state,
+  gimple_call_arg (stmt, 0)))
+
+   {
+ gsi_remove (&gsi, true);
+ release_defs (stmt);
+ continue;
}
+ break;
+
+

Make lra use per-alternative earlyclobber info

2019-07-30 Thread Richard Sandiford
lra_insn_reg and lra_operand_data have both a bitmask of earlyclobber
alternatives and an overall boolean.  The danger is that we then test
the overall boolean when really we should be testing for a particular
alternative.  This patch gets rid of the boolean and tests the mask
against zero when we really do need to test "any alternative might
be earlyclobber".  (I think the only instance of that is the
LRA_UNKNOWN_ALT handling in lra-lives.c:reg_early_clobber_p.)
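
For illustration, with only the mask that check becomes a per-alternative
bit test, with the "any alternative" semantics reserved for LRA_UNKNOWN_ALT
(sketch only, not the exact code in the patch):

  return (n_alt == LRA_UNKNOWN_ALT
	  ? reg->early_clobber_alts != 0
	  : (reg->early_clobber_alts & ((alternative_mask) 1 << n_alt)) != 0);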

This is needed (and tested) by an upcoming SVE patch.

Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu.
OK to install?

Thanks,
Richard


2019-07-30  Richard Sandiford  

gcc/
* lra-int.h (lra_operand_data): Remove early_clobber field.
(lra_insn_reg): Likewise.
* lra.c (debug_operand_data): Update accordingly.
(setup_operand_alternative): Likewise.
(new_insn_reg): Likewise.  Remove early_clobber parameter.
(collect_non_operand_hard_regs): Update call accordingly.
Don't assign to lra_insn_reg::early_clobber.
(add_regs_to_insn_regno_info): Remove early_clobber parameter
and update calls to new_insn_reg.
(lra_update_insn_regno_info): Update calls accordingly.
* lra-constraints.c (update_and_check_small_class_inputs): Take the
alternative number as a parameter and test whether the operand
is earlyclobbered in that particular alternative.
(process_alt_operands): Update call accordingly.  Use per-alternative
checks for earlyclobber here too.
* lra-lives.c (reg_early_clobber_p): Check early_clobber_alts
against zero for LRA_UNKNOWN_ALT.

Index: gcc/lra-int.h
===
--- gcc/lra-int.h   2019-07-10 19:41:21.607936374 +0100
+++ gcc/lra-int.h   2019-07-30 10:51:22.999621161 +0100
@@ -142,10 +142,6 @@ struct lra_operand_data
   unsigned int strict_low : 1;
   /* True if the operand is an operator.  */
   unsigned int is_operator : 1;
-  /* True if there is an early clobber alternative for this operand.
- This field is set up every time when corresponding
- operand_alternative in lra_static_insn_data is set up.  */
-  unsigned int early_clobber : 1;
   /* True if the operand is an address.  */
   unsigned int is_address : 1;
 };
@@ -164,9 +160,6 @@ struct lra_insn_reg
   /* True if the reg is accessed through a subreg and the subreg is
  just a part of the register.  */
   unsigned int subreg_p : 1;
-  /* True if there is an early clobber alternative for this
- operand.  */
-  unsigned int early_clobber : 1;
   /* True if the reg is clobber highed by the operand.  */
   unsigned int clobber_high : 1;
   /* The corresponding regno of the register.  */
Index: gcc/lra.c
===
--- gcc/lra.c   2019-07-10 19:41:21.611936343 +0100
+++ gcc/lra.c   2019-07-30 10:51:22.999621161 +0100
@@ -536,16 +536,14 @@ object_allocator lra_insn_
 
 /* Create LRA insn related info about a reference to REGNO in INSN
with TYPE (in/out/inout), biggest reference mode MODE, flag that it
-   is reference through subreg (SUBREG_P), flag that is early
-   clobbered in the insn (EARLY_CLOBBER), and reference to the next
+   is reference through subreg (SUBREG_P), and reference to the next
insn reg info (NEXT).  If REGNO can be early clobbered,
alternatives in which it can be early clobbered are given by
EARLY_CLOBBER_ALTS.  CLOBBER_HIGH marks if reference is a clobber
high.  */
 static struct lra_insn_reg *
 new_insn_reg (rtx_insn *insn, int regno, enum op_type type,
- machine_mode mode,
- bool subreg_p, bool early_clobber,
+ machine_mode mode, bool subreg_p,
  alternative_mask early_clobber_alts,
  struct lra_insn_reg *next, bool clobber_high)
 {
@@ -556,7 +554,6 @@ new_insn_reg (rtx_insn *insn, int regno,
   && partial_subreg_p (lra_reg_info[regno].biggest_mode, mode))
 lra_reg_info[regno].biggest_mode = mode;
   ir->subreg_p = subreg_p;
-  ir->early_clobber = early_clobber;
   ir->early_clobber_alts = early_clobber_alts;
   ir->clobber_high = clobber_high;
   ir->regno = regno;
@@ -605,7 +602,7 @@ static struct lra_operand_data debug_ope
 0, /* early_clobber_alts */
 E_VOIDmode, /* We are not interesting in the operand mode.  */
 OP_IN,
-0, 0, 0, 0
+0, 0, 0
   };
 
 /* The following data are used as static insn data for all debug
@@ -801,7 +798,6 @@ setup_operand_alternative (lra_insn_reco
   for (i = 0; i < nop; i++)
 {
   static_data->operand[i].early_clobber_alts = 0;
-  static_data->operand[i].early_clobber = false;
   static_data->operand[i].is_address = false;
   if (static_data->operand[i].constraint[0] == '%')
{
@@ -817,7 +813,6 @@ setup_operand_alternative (lra_insn_reco
   for (j = 0; j < nalt; j++)
 for (i = 0; i < nop; i++, op_alt++)
   {
-   

Re: Update x86-tune-costs.h for znver2

2019-07-30 Thread Richard Biener
On Tue, Jul 30, 2019 at 11:33 AM Jan Hubicka  wrote:
>
> > On Tue, Jul 30, 2019 at 10:09 AM Jan Hubicka  wrote:
> > >
> > > > Hi,
> > > > this patch updates znver2 costs to match reality.  In particular we
> > > > re-benchmarked memcpy strategies and it looks that glibc now wins even
> > > > for relatively small blocks.
> > > > Moreover I updated costs of moves to reflect that znver2 has 256 vector
> > > > paths and faster multiplication.
> > > >
> > > > Bootstrapped/regtested x86_64-linux, committed.
> > > >
> > > > Honza
> > > >
> > > >   * x86-tune-costs.h (znver2_memcpy): Update.
> > > >   (znver2_costs): Update 256 bit SSE costs and multiplication.
> > >
> > > Hello,
> > > I have now backported the patch to gcc 9 branch.
> >
> > Thanks - can you please update changes.html for it in the 9.2 section?
>
> There seems to be no GCC 9.2 section yet.

Yes.  Looks good to me btw.

Richard.

> Index: gcc-9/changes.html
> ===
> RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-9/changes.html,v
> retrieving revision 1.72
> diff -c -3 -p -r1.72 changes.html
> *** gcc-9/changes.html  12 Jul 2019 15:55:50 -  1.72
> --- gcc-9/changes.html  30 Jul 2019 09:32:17 -
> *** complete (that is, it is possible that s
> *** 1095,1099 
> --- 1095,1105 
>   are not listed here).
>
>   
> + GCC 9.2
> + 
> +   IA-32/x86-64 backend tuning for znver2 was improved 
> based on benchmarks on real hardware.
> + 
> +
> + 
>   
>   


Re: Update x86-tune-costs.h for znver2

2019-07-30 Thread Jan Hubicka
> On Tue, Jul 30, 2019 at 10:09 AM Jan Hubicka  wrote:
> >
> > > Hi,
> > > this patch updates znver2 costs to match reality.  In particular we
> > > re-benchmarked memcpy strategies and it looks that glibc now wins even
> > > for relatively small blocks.
> > > Moreover I updated costs of moves to reflect that znver2 has 256 vector
> > > paths and faster multiplication.
> > >
> > > Bootstrapped/regtested x86_64-linux, committed.
> > >
> > > Honza
> > >
> > >   * x86-tune-costs.h (znver2_memcpy): Update.
> > >   (znver2_costs): Update 256 bit SSE costs and multiplication.
> >
> > Hello,
> > I have now backported the patch to gcc 9 branch.
> 
> Thanks - can you please update changes.html for it in the 9.2 section?

There seems to be no GCC 9.2 section yet.

Index: gcc-9/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-9/changes.html,v
retrieving revision 1.72
diff -c -3 -p -r1.72 changes.html
*** gcc-9/changes.html  12 Jul 2019 15:55:50 -  1.72
--- gcc-9/changes.html  30 Jul 2019 09:32:17 -
*** complete (that is, it is possible that s
*** 1095,1099 
--- 1095,1105 
  are not listed here).
  
  
+ GCC 9.2
+ 
+   IA-32/x86-64 backend tuning for znver2 was improved based 
on benchmarks on real hardware.
+ 
+ 
+ 
  
  


Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-07-30 Thread Ramana Radhakrishnan

On 30/07/2019 10:08, Christophe Lyon wrote:

On Mon, 29 Jul 2019 at 18:49, Wilco Dijkstra  wrote:


Currently the Arm backend selects the alternative sched pressure algorithm.
The issue is that this doesn't take register pressure into account, and so
it causes significant additional spilling on Arm where there are only 14
allocatable registers.  SPEC2006 shows significant codesize reduction
with the default pressure algorithm, so switch back to that.  PR77308 shows
~800 fewer instructions.

SPECINT2006 is ~0.6% faster on Cortex-A57 together with the other DImode
patches. Overall SPEC codesize is 1.1% smaller.



Hi Wilco,

Do you know which benchmarks were used when this was checked-in?
It isn't clear from https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html


It was from my time in Linaro and thus would have been a famous embedded
benchmark, coremark, spec2000 - all tested probably on Cortex-A9 and
Cortex-A15. In addition to this, I would like to see what the impact of
this is on something like Cortex-A53, as the issue rates are likely to be
different on the schedulers, causing different behaviour.



I don't have all the notes today for that - maybe you can look into the 
linaro wiki.


I am concerned about taking this patch in without some more data across 
a variety of cores.


Thanks,
Ramana




Thanks,

Christophe


Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57

ChangeLog:
2019-07-29  Wilco Dijkstra  

 * config/arm/arm.c (arm_option_override): Don't override sched
 pressure algorithm.

--

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3575,11 +3575,6 @@ arm_option_override (void)
if (use_neon_for_64bits == 1)
   prefer_neon_for_64bits = true;

-  /* Use the alternative scheduling-pressure algorithm by default.  */
-  maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, SCHED_PRESSURE_MODEL,
-global_options.x_param_values,
-global_options_set.x_param_values);
-
/* Look through ready list and all of queue for instructions
   relevant for L2 auto-prefetcher.  */
int param_sched_autopref_queue_depth;





Re: Add znver2 scheduler model

2019-07-30 Thread Jan Hubicka
> Hi,
> this patch adds a znver2 scheduler model. Znver2 is close enough to znver1
> that I have decided to implement it in one automaton.  The main
> difference is an extra AGU unit that seems to be used for stores only
> (according to the CPU diagram), the fact that 256-bit vector operations are
> no longer split (and thus behave like 128-bit ones), and the reduced latency
> of fp multiply and conversion operations.
> 
> The patch seems to have very little effect on overall performance, but
> that is because we do not model the out-of-order core and thus assume the
> CPU is mostly starved by not having enough parallelism to execute.
> Still, it is better to be precise.
> 
> Bootstrapped/regtested x86_64-linux, committed.
> 
>   * common/config/i386/i386-common.c: Use PROCESSOR_ZNVER2 scheduler for
>   znver2.
>   * config/i386/znver1.md: Enable patterns for znver2 and add store
>   variants which use extra AGU unit.

Hello,
I have backported this patch to gcc 9 branch now too.

Honza


Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-07-30 Thread Christophe Lyon
On Mon, 29 Jul 2019 at 18:49, Wilco Dijkstra  wrote:
>
> Currently the Arm backend selects the alternative sched pressure algorithm.
> The issue is that this doesn't take register pressure into account, and so
> it causes significant additional spilling on Arm where there are only 14
> allocatable registers.  SPEC2006 shows significant codesize reduction
> with the default pressure algorithm, so switch back to that.  PR77308 shows
> ~800 fewer instructions.
>
> SPECINT2006 is ~0.6% faster on Cortex-A57 together with the other DImode
> patches. Overall SPEC codesize is 1.1% smaller.
>

Hi Wilco,

Do you know which benchmarks were used when this was checked-in?
It isn't clear from https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html

Thanks,

Christophe

> Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57
>
> ChangeLog:
> 2019-07-29  Wilco Dijkstra  
>
> * config/arm/arm.c (arm_option_override): Don't override sched
> pressure algorithm.
>
> --
>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 
> 81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515
>  100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -3575,11 +3575,6 @@ arm_option_override (void)
>if (use_neon_for_64bits == 1)
>   prefer_neon_for_64bits = true;
>
> -  /* Use the alternative scheduling-pressure algorithm by default.  */
> -  maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, 
> SCHED_PRESSURE_MODEL,
> -global_options.x_param_values,
> -global_options_set.x_param_values);
> -
>/* Look through ready list and all of queue for instructions
>   relevant for L2 auto-prefetcher.  */
>int param_sched_autopref_queue_depth;
>


Re: [wwwdocs] nds32 documentation - remove broken reference

2019-07-30 Thread Shiva Chen
Hi Gerald,

The updated link will be
http://www.andestech.com/en/products-solutions/product-documentation/
Thanks for your kind reminder,
Shiva

Gerald Pfeifer  wrote on Sunday, 28 July 2019 at 4:54 PM:

> I could not find an updated link on www.andestech.com, in fact the
> reference I could find there was broken as well.
>
> If anyone has an updated link, happy to add that again!
>
> Applied for now.
>
> Gerald
>
> Index: readings.html
> ===
> RCS file: /cvs/gcc/wwwdocs/htdocs/readings.html,v
> retrieving revision 1.315
> diff -u -r1.315 readings.html
> --- readings.html   19 Jun 2019 20:27:02 -  1.315
> +++ readings.html   28 Jul 2019 08:52:46 -
> @@ -70,7 +70,6 @@
>   andes (nds32)
>Manufacturer: Various licenses of Andes Technology Corporation.
>CPUs include: AndesCore families N7, N8, SN8, N9, N10, N12 and
> N13.
> -  http://www.andestech.com/product.php?cls=9;>Andes
> Documentation
>GDB includes a simulator for all CPUs.
>   
>
>


Re: [range-ops] patch 01/04: types for VR_UNDEFINED and VR_VARYING

2019-07-30 Thread Richard Biener
On Fri, Jul 26, 2019 at 4:32 PM Andrew MacLeod  wrote:
>
> On 7/25/19 11:37 PM, Jeff Law wrote:
> > On 7/24/19 12:33 PM, Richard Biener wrote:
> >> On July 24, 2019 8:18:57 PM GMT+02:00, Jeff Law 
> >> wrote:
> >>> On 7/24/19 11:00 AM, Richard Biener wrote: [ Big snip, ignore
> >>> missing reply attributions... ]
> >>>
> > it. But I'd claim that if callers are required not to change
> > these ranges, then the callers are fundamentally broken.  I'm
> > not sure what the "sanitization" is really buying you here.
> > Can you point to something specific?
> >
> >> But you lose the sanitizing that nobody can change it and the
> >>   changed info leaks to other SSA vars.
> >>
> >> As said, fix all callers to deal with NULL.
> >>
> >> But I argue the current code is exactly optimal and safe.
> > And I'd argue that it's just plain broken and that the
> > sanitization you're referring to points to something broken
> > elsewhere,  higher up in the callers.
>  Another option is to make get_value_range return by value and
>  the only way to change the lattice to call an appropriate set
>  function. I think we already do the latter in all cases (but we
>  use get_value_range in the setter) and returning by reference is
>  just eliding the copy.
> >>> OK, so what I think you're getting at (and please correct me if
> >>> I'm wrong) is that once the lattice values are set, you don't want
> >>> something changing the recorded ranges underneath?
> >>>
> >>> ISTM the way to enforce that is to embed the concept in the class
> >>> and enforce it by not allowing direct manipulation of range by the
> >>> clients. So a client that wants this behavior somehow tells the
> >>> class that ranges are "set in stone" and from that point the
> >>> setters don't allow changing the underlying ranges.
> >> Yes. You'll see that nearly all callers do
> >>
> >> Value_range vr = *get_value_range (name);
> >>
> >> Modify
> >>
> >> Update_value_range (name, &vr);
> >>
> >> And returning by reference was mostly an optimization. We _did_ have
> >> callers changing the range in place and the const varying caught
> >> those.
> >>
> >> When returning by value we can return individual VARYINGs not in the
> >> lattice if we decide that's what we want.
> >>
> >>> I just want to make sure we're on the same page WRT why you think
> >>> the constant varying range object is useful.
> >> As said it's an optimization. We do not want to reallocate the
> >> lattice. And we want lattice updating to happen in a controlled
> >> manner, so returning a pointer into the lattice is bad design at this
> >> point.
> > But I would claim that the current state is dreadful.  Consider that
> > when gimple-fold asks for a new SSA_NAME, it could get a recycled one,
> > in which case we get a real range.  Or if it doesn't get a recycled
> > name, then we get the const varying node.  The inconsistency is silly
> > when we can just reallocate the underlying object.
> >
> > Between recycling of SSA_NAMEs and allocating a bit of additional space
> > (say rounding up to some reasonable boundary) I'd bet you'd never be
> > able to measure the reallocation in practice.
> >
> I annotated the patch yesterday to actually log reallocations and ran a
> couple of bootstraps...
>
> If we don't add any extra space in the vector initially (as it is
> allocated today), it is re-sized a total of 201 times.  Of those, 93
> get back the same pointer so no resize actually happens.
>
> If we use the patch as I initially proposed, where we add 10% to the
> vector to start, we re-size 10 times, all from either 18 to 25, or 8 to
> 14, so very small numbers of ssa-name functions, where the 10% doesn't
> really do much.  Those were all re-allocations but one.  The initial
> resize does seem to help prevent any larger reallocations.
>
> That doesn't seem like that bad of a thing overall, and we could tweak
> the initial size a bit more if it would help; to deal with the small
> cases, we could make it num_names + 10% + 10 or something even.
>
> I feel it would be safer.  It seems to me the CONST solution cannot be
> disabled, i.e. even a non-checking production compiler would go boom if
> it triggered.
>
> As for addressing the issue that the CONST approach is trying to
> resolve, Could we change all the set/update routines to do something like
>
> gcc_checking_assert (new_range->varying_p ()  || !values_propagated);
>
> that way we'll trigger an assert if we try to change any value to
> something other than varying when values_propagated is set?

I think the constness is appropriately addressed by my recent API update
to split the get-value from get-lattice-entry and properly forcing lattice
update to go through the few setters.

I'm not totally against making the lattice expandable, as the followup
patch to the original re-org notices we do run into "new" stmts during
the combined analysis/propagation stages the DOM-based users use.

Re: [Mingw-w64-public] Fwd: [patch] Reimplement GNU threads library on native Windows

2019-07-30 Thread Eric Botcazou
> 0) complexifies comparison of thread IDs without obvious benefits, and

The reverse argument is also true: using IDs would complexify everything else 
with the only benefit of simplifying the equal primitive.

> 1) does not work reliably because handles can be duplicated, and

That's pure FUD.

> 2) makes `__gthread_self()` return invalid handles in detached threads.

Admittedly, but this can be fixed if this is deemed necessary by clearing the 
thread descriptor when detaching the thread.
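
For illustration, a minimal sketch of how a HANDLE-based __gthread_t can
still give a reliable equality primitive even when handles have been
duplicated, assuming GetThreadId (Vista and later) is available; the names
below are invented and this is not the actual libgcc implementation:

/* Illustrative sketch only, not the patch under discussion.  Two distinct
   HANDLE values may denote the same thread after DuplicateHandle, so the
   equality primitive compares the stable per-thread identifier instead of
   the raw handle values.  */
#include <windows.h>

typedef HANDLE sketch_gthread_t;        /* invented name */

static int
sketch_gthread_equal (sketch_gthread_t t1, sketch_gthread_t t2)
{
  return GetThreadId (t1) == GetThreadId (t2);
}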

-- 
Eric Botcazou


Re: wrap math.h for M_PI et al in target/i386 tests

2019-07-30 Thread Alexandre Oliva
On Jul 17, 2019, Alexandre Oliva  wrote:

> Most but not all of the tests that expect M_PI, M_PI_2 and/or M_PI_4
> to be defined in math.h explicitly exclude one target system that does
> not satisfy this non-standard assumption.

> This patch introduces a wrapper header that includes math.h and then
> conditionally supplies the missing non-standard macro definitions.
> With that, we can drop the dg-skip-if "no M_PI" exclusions.

> Tested on x86_64-linux-gnu, with a regular math.h, and with a "manually
> fixincluded" math.h so as to not define M_PI, M_PI_2 and M_PI_4.  Ok to
> install?

Ping?

https://gcc.gnu.org/ml/gcc-patches/2019-07/msg01211.html


-- 
Alexandre Oliva, freedom fighter  he/him   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara


Re: [PATCH] Clean up dangling pointers in cgraph_edge (PR ipa/89330).

2019-07-30 Thread Martin Liška
On 7/30/19 10:36 AM, Richard Biener wrote:
> On Tue, Jul 30, 2019 at 9:27 AM Martin Liška  wrote:
>>
>> Hi.
>>
>> We have to clean up dangling pointers before we call ggc_free for a 
>> cgraph_edge.
>>
>> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>> And it survives --enable-checking=release bootstrap on x86_64-linux-gnu.
>>
>> Ready to be installed?
> 
> Eh?  The only "real" effect I see is that e->indirect_info test is now
> never true.

Yep, you are right.

> 
> I think it rather means the edge we ggc_free is still referenced
> from somewhere
> and _that_ needs to be fixed or we ggc_free the edge wrongly.

Yes, that's another situation where IPA CP is touching a dead cgraph_edge.
Martin will help me later.
I'm reducing a test-case now...

Martin

> 
> Richard.
> 
>> Thanks,
>> Martin
>>
>> gcc/ChangeLog:
>>
>> 2019-07-30  Martin Liska  
>>
>> PR ipa/89330
>> * cgraph.c (symbol_table::free_edge): Memset 0 to cgraph_edge
>> before we call ggc_free.
>> ---
>>  gcc/cgraph.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>>



Re: [PATCH] PR91195: fix -Wmaybe-uninitialized warning for conditional store optimization

2019-07-30 Thread Richard Biener
On Tue, Jul 30, 2019 at 10:35 AM Jakub Jelinek  wrote:
>
> On Tue, Jul 30, 2019 at 10:21:15AM +0200, Richard Biener wrote:
> > Would you need to LTO stream & merge the bitmaps / maps somehow?
>
> Yes.  And if we do not throw unneeded warnings from the sets normally, LTO
> streaming might be a good time to do that, so that we merge in only warnings
> that will be tested during/post IPA.
>
> > (maybe "unmap" to sth else when streaming individual stmts)  Not sure if
> > a bitmap is really necessary, we could simply have a tree/gimple * -> vec<>
> > mapping with records about emitted (or queued) diagnostics, like
> > OPT_, message pairs or so.
>
> Right now we use it both for the case we've already emitted some diagnostics
> and don't want to emit it again later, or for the case where we insert
> something in the IL that a warning could be diagnosed on but we know we
> don't want that.  The latter is the case e.g. for when we add
> TREE_NO_WARNING on the last value in a statement expression so that we don't
> diagnose it as unused, or the case we are discussing here.
> If we need queued diagnostics, sure, we could add it in, but I don't see a
> need for that right now.  vecs compared to bitmap might be larger and harder
> to avoid duplicates.  I guess we'd need to do some analysis though, and e.g.
> if 99% of cases we need to disable just one warning and not multiple,
> perhaps optimize for that case.

I was thinking we may want to use the same facility to record warnings we
want to not emit if we later figure out a stmt was in an unreachable region.  For
this we need to record the actual warning, not only the warning kind.

Richard.

>
> Jakub


Re: Update x86-tune-costs.h for znver2

2019-07-30 Thread Richard Biener
On Tue, Jul 30, 2019 at 10:09 AM Jan Hubicka  wrote:
>
> > Hi,
> > this patch updates znver2 costs to match reality.  In particular we
> > re-benchmarked memcpy strategies and it looks like glibc now wins even
> > for relatively small blocks.
> > Moreover I updated costs of moves to reflect that znver2 has 256 vector
> > paths and faster multiplication.
> >
> > Bootstrapped/regtested x86_64-linux, committed.
> >
> > Honza
> >
> >   * x86-tune-costs.h (znver2_memcpy): Update.
> >   (znver2_costs): Update 256 bit SSE costs and multiplication.
>
> Hello,
> I have now backported the patch to gcc 9 branch.

Thanks - can you please update changes.html for it in the 9.2 section?

Richard.

> Honza
> > Index: config/i386/x86-tune-costs.h
> > ===
> > --- config/i386/x86-tune-costs.h  (revision 273727)
> > +++ config/i386/x86-tune-costs.h  (working copy)
> > @@ -1279,12 +1279,12 @@ struct processor_costs znver1_cost = {
> >  static stringop_algs znver2_memcpy[2] = {
> >{libcall, {{6, loop, false}, {14, unrolled_loop, false},
> >{-1, rep_prefix_4_byte, false}}},
> > -  {libcall, {{16, loop, false}, {8192, rep_prefix_8_byte, false},
> > +  {libcall, {{16, loop, false}, {64, rep_prefix_4_byte, false},
> >{-1, libcall, false;
> >  static stringop_algs znver2_memset[2] = {
> >{libcall, {{8, loop, false}, {24, unrolled_loop, false},
> >{2048, rep_prefix_4_byte, false}, {-1, libcall, false}}},
> > -  {libcall, {{48, unrolled_loop, false}, {8192, rep_prefix_8_byte, false},
> > +  {libcall, {{24, rep_prefix_4_byte, false}, {128, rep_prefix_8_byte, 
> > false},
> >{-1, libcall, false;
> >
> >  struct processor_costs znver2_cost = {
> > @@ -1335,11 +1335,11 @@ struct processor_costs znver2_cost = {
> >  in SImode and DImode.  */
> >{8, 8},/* cost of storing MMX registers
> >  in SImode and DImode.  */
> > -  2, 3, 6,   /* cost of moving XMM,YMM,ZMM
> > +  2, 2, 3,   /* cost of moving XMM,YMM,ZMM
> >  register.  */
> > -  {6, 6, 6, 10, 20}, /* cost of loading SSE registers
> > +  {6, 6, 6, 6, 12},  /* cost of loading SSE registers
> >  in 32,64,128,256 and 512-bit.  */
> > -  {6, 6, 6, 10, 20}, /* cost of unaligned loads.  */
> > +  {6, 6, 6, 6, 12},  /* cost of unaligned loads.  */
> >{8, 8, 8, 8, 16},  /* cost of storing SSE registers
> >  in 32,64,128,256 and 512-bit.  */
> >{8, 8, 8, 8, 16},  /* cost of unaligned stores.  */
> > @@ -1372,7 +1372,7 @@ struct processor_costs znver2_cost = {
> >COSTS_N_INSNS (1), /* cost of cheap SSE instruction.  */
> >COSTS_N_INSNS (3), /* cost of ADDSS/SD SUBSS/SD insns.  
> > */
> >COSTS_N_INSNS (3), /* cost of MULSS instruction.  */
> > -  COSTS_N_INSNS (4), /* cost of MULSD instruction.  */
> > +  COSTS_N_INSNS (3), /* cost of MULSD instruction.  */
> >COSTS_N_INSNS (5), /* cost of FMA SS instruction.  */
> >COSTS_N_INSNS (5), /* cost of FMA SD instruction.  */
> >COSTS_N_INSNS (10),/* cost of DIVSS instruction. 
> >  */


Re: [PATCH] Remove also 2nd argument for unused delete operator (PR tree-optimization/91270).

2019-07-30 Thread Richard Biener
On Tue, Jul 30, 2019 at 10:07 AM Martin Liška  wrote:
>
> On 7/30/19 9:46 AM, Martin Liška wrote:
> > Anyway that's not a candidate for DCE. I'm testing following patch.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> One alternative approach can be to drop DECL_SET_IS_OPERATOR_DELETE in:
> cat -n gcc/cp/decl.c | less
> ...
>   4410  deltype = cp_build_type_attribute_variant (deltype, 
> extvisattr);
>   4411  deltype = build_exception_variant (deltype, 
> empty_except_spec);
>   4412  opdel = push_cp_library_fn (DELETE_EXPR, deltype, 
> ECF_NOTHROW);
>   4413  DECL_SET_IS_OPERATOR_DELETE (opdel, true);
>   4414  opdel = push_cp_library_fn (VEC_DELETE_EXPR, deltype, 
> ECF_NOTHROW);
>   4415  DECL_SET_IS_OPERATOR_DELETE (opdel, true);
>   4416
>   4417  if (flag_sized_deallocation)
>   4418{
>   4419  /* operator delete (void *, size_t, align_val_t); */
>   4420  deltype = build_function_type_list (void_type_node, 
> ptr_type_node,
>   4421  size_type_node, 
> align_type_node,
>   4422  NULL_TREE);
>   4423  deltype = cp_build_type_attribute_variant (deltype, 
> extvisattr);
>   4424  deltype = build_exception_variant (deltype, 
> empty_except_spec);
>   4425  opdel = push_cp_library_fn (DELETE_EXPR, deltype, 
> ECF_NOTHROW);
>   4426  DECL_SET_IS_OPERATOR_DELETE (opdel, true);
>   4427  opdel = push_cp_library_fn (VEC_DELETE_EXPR, deltype, 
> ECF_NOTHROW);
>   4428  DECL_SET_IS_OPERATOR_DELETE (opdel, true);
>   4429}
>   4430}
>
> at lines 4426 and 4428.
>
> Richi what do you prefer?

I don't understand why a "not simple" delete operator isn't fine to be
DCEd?  Does C++
somehow allow mismatching size specifications here?  And what are the semantics
then?

Thus I'd rather go with your earlier patch to mark the op necessary.

Richard.

> Martin


Re: [PATCH] Clean up dangling pointers in cgraph_edge (PR ipa/89330).

2019-07-30 Thread Richard Biener
On Tue, Jul 30, 2019 at 9:27 AM Martin Liška  wrote:
>
> Hi.
>
> We have to clean up dangling pointers before we call ggc_free for a 
> cgraph_edge.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> And it survives --enable-checking=release bootstrap on x86_64-linux-gnu.
>
> Ready to be installed?

Eh?  The only "real" effect I see is that e->indirect_info test is now
never true.

I think it rather means the edge we ggc_free is still referenced
from somewhere
and _that_ needs to be fixed or we ggc_free the edge wrongly.

Richard.

> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> 2019-07-30  Martin Liska  
>
> PR ipa/89330
> * cgraph.c (symbol_table::free_edge): Memset 0 to cgraph_edge
> before we call ggc_free.
> ---
>  gcc/cgraph.c | 2 ++
>  1 file changed, 2 insertions(+)
>
>


Re: [PATCH] PR91195: fix -Wmaybe-uninitialized warning for conditional store optimization

2019-07-30 Thread Jakub Jelinek
On Tue, Jul 30, 2019 at 10:21:15AM +0200, Richard Biener wrote:
> Would you need to LTO stream & merge the bitmaps / maps somehow?

Yes.  And if we do not throw unneeded warnings from the sets normally, LTO
streaming might be a good time to do that, so that we merge in only warnings
that will be tested during/post IPA.

> (maybe "unmap" to sth else when streaming individual stmts)  Not sure if
> a bitmap is really necessary, we could simply have a tree/gimple * -> vec<>
> mapping with records about emitted (or queued) diagnostics, like
> OPT_, message pairs or so.

Right now we use it both for the case we've already emitted some diagnostics
and don't want to emit it again later, or for the case where we insert
something in the IL that a warning could be diagnosed on but we know we
don't want that.  The latter is the case e.g. for when we add
TREE_NO_WARNING on the last value in a statement expression so that we don't
diagnose it as unused, or the case we are discussing here.
If we need queued diagnostics, sure, we could add it in, but I don't see a
need for that right now.  vecs compared to bitmap might be larger and harder
to avoid duplicates.  I guess we'd need to do some analysis though, and e.g.
if 99% of cases we need to disable just one warning and not multiple,
perhaps optimize for that case.

Jakub


Re: [PATCH] PR91195: fix -Wmaybe-uninitialized warning for conditional store optimization

2019-07-30 Thread Richard Biener
On Mon, Jul 29, 2019 at 6:03 PM Jakub Jelinek  wrote:
>
> On Wed, Jul 24, 2019 at 12:07:36PM -0600, Martin Sebor wrote:
> > > > There are a number of existing instances of setting TREE_NO_WARNING
> > > > to suppress -Wuninitialized, and some are the cause of known problems.
> > > > Bugs 51545, 58950, 74762, 74765 or 89697 are examples.  They all boil
> > > > down to the fact that there's just one bit for all warnings.  Jakub
> > > > mentioned adding a hash-map for this.  That seems like a simple and
> > > > good solution.
> > > I'm not sure how that really helps here.  We marking the MEM on the LHS
> > > and that's not a shared object.  I don't see how it's going to be
> > > significantly different using a hash map vs the bit in this circumstance.
> >
> > I don't know what Jakub had in mind but the mapping I envision is
> > one like hash_map that would make it possible to set
> > a bit for each distinct warning for every tree node.  It would let
> > us set a bit for -Wuninitialized while leaving the bit for
> > -Warray-bounds (and all other warnings) clear.
>
> I had in mind something like a hash_map / hash_map *,
> const_bitmap> (or just make it a ) where possibly the
> bitmaps would be shared, so have a hash table in which one would look up 
> existing
> bitmaps containing particular sets of warnings and if not create a new one 
> (and
> where the bitmaps would be const after creation).
> The TREE_NO_WARNING or gimple_no_warning bits would be a flag that one
> should look up the hash_map if needed, those bits clear would imply empty
> bitmap.  Routines that copy trees or gimple stmts, like copy_node or
> gimple_copy would take care of adding the new tree/gimple * into the
> hash_map if needed too.
> And, because we have a lot of warnings that are only emitted in the FEs, or
> say before IPA and not afterwards, we could also have spots where we'd
> replace the bitmaps with ones that don't have any of the no longer
> interesting warning bits.
> Say during gimplification we could drop TREE_NO_WARNING bit or not set
> gimple_no_warning if the bitmap only contains FE warning bits, or (perhaps
> more questionable whether worth it) replace with a const_bitmap that doesn't
> have those).

Would you need to LTO stream & merge the bitmaps / maps somehow?
(maybe "unmap" to sth else when streaming individual stmts)  Not sure if
a bitmap is really necessary, we could simply have a tree/gimple * -> vec<>
mapping with records about emitted (or queued) diagnostics, like
OPT_, message pairs or so.

Richard.

>
> Jakub
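
A rough sketch of the side-table idea discussed above, written in plain C
with invented names (real GCC would use hash_map and bitmap and the OPT_*
codes from the options machinery): the single no-warning bit on the
statement only means "consult the table", and the table records which
specific warnings are suppressed, so suppressing one warning no longer
hides all the others.

/* Sketch only; invented names, plain C standing in for hash_map <gimple *,
   bitmap>.  The one no-warning bit just says "look in the side table".  */
#include <stddef.h>

enum { OPT_Wuninitialized = 1u << 0, OPT_Warray_bounds = 1u << 1 };

struct stmt { unsigned no_warning : 1; };   /* stands in for gimple_no_warning */

#define MAX_RECORDS 1024
static struct { const struct stmt *key; unsigned mask; } records[MAX_RECORDS];
static size_t n_records;

static void
suppress_warning (struct stmt *s, unsigned opt_mask)
{
  s->no_warning = 1;                        /* cheap "consult the table" hint */
  for (size_t i = 0; i < n_records; i++)
    if (records[i].key == s)
      {
        records[i].mask |= opt_mask;
        return;
      }
  if (n_records < MAX_RECORDS)
    {
      records[n_records].key = s;
      records[n_records].mask = opt_mask;
      n_records++;
    }
}

static int
warning_suppressed_p (const struct stmt *s, unsigned opt_mask)
{
  if (!s->no_warning)
    return 0;                               /* bit clear implies empty set */
  for (size_t i = 0; i < n_records; i++)
    if (records[i].key == s)
      return (records[i].mask & opt_mask) != 0;
  return 0;
}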


Re: Update X86_TUNE_AVOID_256FMA_CHAINS for znver2

2019-07-30 Thread Jan Hubicka
> Hi,
> this patch enables logic which avoids FMA for the matrix multiplication loop
> for 256-bit vectors. The underlying issue is the same as with znver1. While
> the combined latency of multiply and add operations is higher than FMA's, the
> dependency chain in matrix multiplication depends only on additions
> that are faster.
> 
> Bootstrapped/regtested x86_64-linux, committed.
> 
>   * config/i386/i386-options.c (ix86_option_override_internal): Default
>   PARAM_AVOID_FMA_MAX_BITS to 256 for znver2.
>   * config/i386/x86-tune.def (X86_TUNE_AVOID_256FMA_CHAINS): Set for
>   ZNVER2.

Hi,
this patch is now also backported to gcc9 branch (r273901)

Honza
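
For reference, a minimal example of the loop shape this tuning targets
(illustrative only, not from the patch): contracting the multiply and add
into one FMA puts the whole FMA latency on the loop-carried accumulator
dependency, while keeping them separate leaves only the faster add on that
chain, since the multiplies of different iterations are independent.

/* Illustrative only: a dot-product style inner loop as found in matrix
   multiplication.  The loop-carried dependency is acc -> acc; with FMA it
   costs the FMA latency per iteration, with separate mul + add only the
   add latency, the multiplies being independent across iterations.  */
double
dot (const double *a, const double *b, int n)
{
  double acc = 0.0;
  for (int i = 0; i < n; i++)
    acc += a[i] * b[i];
  return acc;
}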


Re: Update x86-tune-costs.h for znver2

2019-07-30 Thread Jan Hubicka
> Hi,
> this patch updates znver2 costs to match reality.  In particular we
> re-benchmarked memcpy strategies and it looks like glibc now wins even
> for relatively small blocks. 
> Moreover I updated costs of moves to reflect that znver2 has 256 vector
> paths and faster multiplication.
> 
> Bootstrapped/regtested x86_64-linux, committed.
> 
> Honza
> 
>   * x86-tune-costs.h (znver2_memcpy): Update.
>   (znver2_costs): Update 256 bit SSE costs and multiplication.

Hello,
I have now backported the patch to gcc 9 branch.

Honza
> Index: config/i386/x86-tune-costs.h
> ===
> --- config/i386/x86-tune-costs.h  (revision 273727)
> +++ config/i386/x86-tune-costs.h  (working copy)
> @@ -1279,12 +1279,12 @@ struct processor_costs znver1_cost = {
>  static stringop_algs znver2_memcpy[2] = {
>{libcall, {{6, loop, false}, {14, unrolled_loop, false},
>{-1, rep_prefix_4_byte, false}}},
> -  {libcall, {{16, loop, false}, {8192, rep_prefix_8_byte, false},
> +  {libcall, {{16, loop, false}, {64, rep_prefix_4_byte, false},
>{-1, libcall, false;
>  static stringop_algs znver2_memset[2] = {
>{libcall, {{8, loop, false}, {24, unrolled_loop, false},
>{2048, rep_prefix_4_byte, false}, {-1, libcall, false}}},
> -  {libcall, {{48, unrolled_loop, false}, {8192, rep_prefix_8_byte, false},
> +  {libcall, {{24, rep_prefix_4_byte, false}, {128, rep_prefix_8_byte, false},
>{-1, libcall, false;
>  
>  struct processor_costs znver2_cost = {
> @@ -1335,11 +1335,11 @@ struct processor_costs znver2_cost = {
>  in SImode and DImode.  */
>{8, 8},/* cost of storing MMX registers
>  in SImode and DImode.  */
> -  2, 3, 6,   /* cost of moving XMM,YMM,ZMM
> +  2, 2, 3,   /* cost of moving XMM,YMM,ZMM
>  register.  */
> -  {6, 6, 6, 10, 20}, /* cost of loading SSE registers
> +  {6, 6, 6, 6, 12},  /* cost of loading SSE registers
>  in 32,64,128,256 and 512-bit.  */
> -  {6, 6, 6, 10, 20}, /* cost of unaligned loads.  */
> +  {6, 6, 6, 6, 12},  /* cost of unaligned loads.  */
>{8, 8, 8, 8, 16},  /* cost of storing SSE registers
>  in 32,64,128,256 and 512-bit.  */
>{8, 8, 8, 8, 16},  /* cost of unaligned stores.  */
> @@ -1372,7 +1372,7 @@ struct processor_costs znver2_cost = {
>COSTS_N_INSNS (1), /* cost of cheap SSE instruction.  */
>COSTS_N_INSNS (3), /* cost of ADDSS/SD SUBSS/SD insns.  */
>COSTS_N_INSNS (3), /* cost of MULSS instruction.  */
> -  COSTS_N_INSNS (4), /* cost of MULSD instruction.  */
> +  COSTS_N_INSNS (3), /* cost of MULSD instruction.  */
>COSTS_N_INSNS (5), /* cost of FMA SS instruction.  */
>COSTS_N_INSNS (5), /* cost of FMA SD instruction.  */
>COSTS_N_INSNS (10),/* cost of DIVSS instruction.  
> */


Re: [PATCH] Remove also 2nd argument for unused delete operator (PR tree-optimization/91270).

2019-07-30 Thread Martin Liška
On 7/30/19 9:46 AM, Martin Liška wrote:
> Anyway that's not a candidate for DCE. I'm testing following patch.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

One alternative approach can be to drop DECL_SET_IS_OPERATOR_DELETE in:
cat -n gcc/cp/decl.c | less
...
  4410  deltype = cp_build_type_attribute_variant (deltype, extvisattr);
  4411  deltype = build_exception_variant (deltype, empty_except_spec);
  4412  opdel = push_cp_library_fn (DELETE_EXPR, deltype, ECF_NOTHROW);
  4413  DECL_SET_IS_OPERATOR_DELETE (opdel, true);
  4414  opdel = push_cp_library_fn (VEC_DELETE_EXPR, deltype, 
ECF_NOTHROW);
  4415  DECL_SET_IS_OPERATOR_DELETE (opdel, true);
  4416  
  4417  if (flag_sized_deallocation)
  4418{
  4419  /* operator delete (void *, size_t, align_val_t); */
  4420  deltype = build_function_type_list (void_type_node, 
ptr_type_node,
  4421  size_type_node, 
align_type_node,
  4422  NULL_TREE);
  4423  deltype = cp_build_type_attribute_variant (deltype, 
extvisattr);
  4424  deltype = build_exception_variant (deltype, 
empty_except_spec);
  4425  opdel = push_cp_library_fn (DELETE_EXPR, deltype, 
ECF_NOTHROW);
  4426  DECL_SET_IS_OPERATOR_DELETE (opdel, true);
  4427  opdel = push_cp_library_fn (VEC_DELETE_EXPR, deltype, 
ECF_NOTHROW);
  4428  DECL_SET_IS_OPERATOR_DELETE (opdel, true);
  4429}
  4430}

at lines 4426 and 4428.

Richi what do you prefer?
Martin


Re: [PATCH] Do not emit __gnu_lto_v1 symbol.

2019-07-30 Thread Martin Liška
On 7/29/19 3:46 PM, Georg-Johann Lay wrote:
> Hi Martin.
> 
> In SVN r273662 you changed a comment in avr.c from __gnu_lto_v1 to 
> __gnu_lto_slim without changing the relevant code:

Yes.

> 
>   /* __gnu_lto_slim is just a marker for the linker injected by toplev.c.
>  There is no need to trigger __do_clear_bss code for them.  */
> 
>   if (!STR_PREFIX_P (name, "__gnu_lto"))
>     avr_need_clear_bss_p = true;

Because now, the only symbol that starts with __gnu_lto is __gnu_lto_slim. 
That's why I adjusted
the comment only to reflect reality.

> 
> This means that just defining symbol __gnu_lto_slim will trigger code from 
> libgcc (code to clear .bss, which is supposed to be conditional depending on 
> whether .bss actually contains anything).
> 
> Would you also adjust that test?
> 
> Thanks,
> 
> Johann
> 



Re: [PATCH] Remove also 2nd argument for unused delete operator (PR tree-optimization/91270).

2019-07-30 Thread Martin Liška
Hi.

After playing with the test-case, I noticed that we don't want
to handle new/delete operators with a varying (SSA_NAME) as a second argument.

Considering following test-case:

$ cat /tmp/new.C
struct S {
  S ();
  ~S();
};
int a = 123;
void fn1() {
  S *s = new S[a];
  delete[] s;
}

$ ./xg++ -B. /tmp/new.C -O2 -fdump-tree-gimple=/dev/stdout:


  a.1_1 = a;
  D.2341 = (sizetype) a.1_1;
  if (D.2341 <= 9223372036854775800) goto ; else goto ;
  :
  iftmp.0 = D.2341 + 8;
  goto ;
  :
  iftmp.0 = 18446744073709551615;
  :
  D.2323 = operator new [] (iftmp.0);
  MEM[(sizetype *)D.2323] = D.2341;
  try
{
  D.2324 = D.2323 + 8;
  D.2325 = D.2324;
  _2 = D.2341 + 18446744073709551615;
  D.2326 = (long int) _2;
  try
{
  :
  if (D.2326 < 0) goto ; else goto ;
  :
  S::S (D.2325);
  D.2325 = D.2325 + 1;
  D.2326 = D.2326 + -1;
  goto ;
  :
}
  catch
{
  {
struct S * D.2336;

if (D.2324 != 0B) goto ; else goto ;
:
_3 = (sizetype) D.2326;
_4 = D.2341 - _3;
_5 = _4 + 18446744073709551615;
D.2336 = D.2324 + _5;
:
if (D.2336 == D.2324) goto ; else goto ;
:
D.2336 = D.2336 + 18446744073709551615;
S::~S (D.2336);
goto ;
:
goto ;
:
:
  }
}
  retval.2 = D.2324;
}
  catch
{
  if (D.2341 <= 9223372036854775800) goto ; else goto ;
  :
  iftmp.3 = D.2341 + 8;
  goto ;
  :
  iftmp.3 = 18446744073709551615;
  :
  operator delete [] (D.2323, iftmp.3);
}
  s = D.2323 + 8;
  {
struct S * D.2337;

if (s != 0B) goto ; else goto ;
:
try
  {
_6 = s + 18446744073709551608;
_7 = *_6;
D.2337 = s + _7;
:
if (D.2337 == s) goto ; else goto ;
:
D.2337 = D.2337 + 18446744073709551615;
S::~S (D.2337);
goto ;
:
  }
finally
  {
_8 = s + 18446744073709551608;
_9 = *_8;
_10 = _9 + 8;
_11 = s + 18446744073709551608;
operator delete [] (_11, _10);
  }
goto ;
:
:
  }
}


As seen from the dump, we first call operator new[] for a capped value
based on variable 'a'. Later we have a loop that calls S::S, and the catch
block contains another loop calling S::~S. Similarly, the last part
contains a delete[] whose size argument is based on the amount of memory
actually allocated (which lives in *D.2323).

Anyway, that's not a candidate for DCE. I'm testing the following patch.

Martin
>From 312626229bfd4162550891bd8947b0fe310da6f5 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 30 Jul 2019 09:24:56 +0200
Subject: [PATCH] DCE: do not handle delete operators with a SSA_NAME as a size
 argument (PR tree-optimization/91270).

gcc/ChangeLog:

2019-07-30  Martin Liska  

	PR tree-optimization/91270
	* tree-ssa-dce.c (simple_delete_operator_p): New.
	(propagate_necessity): Use it to filter delete operators
	that we want to delete.
	(eliminate_unnecessary_stmts): Likewise.

gcc/testsuite/ChangeLog:

2019-07-30  Martin Liska  

	PR tree-optimization/91270
	* g++.dg/torture/pr91270.C: New test.
---
 gcc/testsuite/g++.dg/torture/pr91270.C | 10 ++
 gcc/tree-ssa-dce.c | 19 ++-
 2 files changed, 24 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/pr91270.C

diff --git a/gcc/testsuite/g++.dg/torture/pr91270.C b/gcc/testsuite/g++.dg/torture/pr91270.C
new file mode 100644
index 000..60d766e9e9f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr91270.C
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+
+struct S {
+  ~S();
+};
+int a = 123;
+void fn1() {
+  S *s = new S[a];
+  delete[] s;
+}
diff --git a/gcc/tree-ssa-dce.c b/gcc/tree-ssa-dce.c
index 763b76f0e53..e844824dc12 100644
--- a/gcc/tree-ssa-dce.c
+++ b/gcc/tree-ssa-dce.c
@@ -646,6 +646,18 @@ degenerate_phi_p (gimple *phi)
   return true;
 }
 
+/* Return true when a GIMPLE STMT is a delete call operator that
+   has either one argument or second argument is an integer constant.  */
+
+static bool
+simple_delete_operator_p (gimple *stmt)
+{
+  return (is_gimple_call (stmt)
+	  && (gimple_call_operator_delete_p (as_a <gcall *> (stmt))
+	  && (gimple_call_num_args (stmt) == 1
+		  || TREE_CODE (gimple_call_arg (stmt, 1)) == INTEGER_CST)));
+}
+
 /* Propagate necessity using the operands of necessary statements.
Process the uses on each statement in the worklist, and add all
feeding statements which contribute to the calculation of this
@@ -805,9 +817,7 @@ propagate_necessity (bool aggressive)
 	 allocation function do not mark that necessary through
 	 processing the argument.  */
 	  if (gimple_call_builtin_p (stmt, BUILT_IN_FREE)
-	  || (is_gimple_call (stmt)
-		  && gimple_call_operator_delete_p (as_a 

Re: Fix up -fexcess-precision handling in LTO (was Re: [GCC][middle-end] Add rules to strip away unneeded type casts in expressions (2nd patch))

2019-07-30 Thread Jakub Jelinek
On Tue, Jul 30, 2019 at 09:35:04AM +0200, Richard Biener wrote:
> Looks OK to me but I'd like Joseph to chime in here.  I think for

Ok, will wait for Joseph.

> FE support it means adding actual _widening_ casts to the effective
> compute precision type (thus long double).  But once that's done
> the rest should be indeed FE independent (but I doubt D and go get
> it correct).

The FE must add casts to whatever excess_precision_type returns (if
non-NULL).  Seems all of C, D and Go do something:
c/c-typeck.c: && (eptype = excess_precision_type (type2)) != NULL_TREE)
c/c-typeck.c:  && (eptype = excess_precision_type (type1)) != 
NULL_TREE)
c/c-typeck.c:  && (eptype = excess_precision_type (type0)) != NULL_TREE)
c/c-typeck.c:  && (eptype = excess_precision_type (type1)) != NULL_TREE)
c-family/c-lex.c:  const_type = excess_precision_type (type);
d/expr.cc:  tree eptype = excess_precision_type (type);
d/expr.cc:tree eptype = excess_precision_type (type);
go/go-gcc.cc:tree computed_type = excess_precision_type(type_tree);
go/go-gcc.cc:  tree computed_type = excess_precision_type(type_tree);
go/go-gcc.cc:  excess_type = excess_precision_type(TREE_TYPE(args[0]));
match.pd:  && !excess_precision_type (newtype)))
and in the middle-end (since Tamar's changes) we only care if it returns
NULL or not.

Jakub
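
As a concrete illustration (not part of the patch) of what those casts mean
on i686/x87: with -fexcess-precision=standard the FE widens the operands to
the excess-precision type, so the multiplication below is carried out in
long double and only the store to d rounds to double, whereas
-fexcess-precision=fast may evaluate it directly in double.

/* Illustrative only: the FE-inserted widening under
   -fexcess-precision=standard on i686/x87 conceptually turns the product
   below into (double) ((long double) a * (long double) b).  */
double
mul (float a, float b)
{
  double d = (double) a * (double) b;
  return d;
}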


Re: Fix up -fexcess-precision handling in LTO (was Re: [GCC][middle-end] Add rules to strip away unneeded type casts in expressions (2nd patch))

2019-07-30 Thread Richard Biener
On Tue, 30 Jul 2019, Jakub Jelinek wrote:

> On Tue, Jul 02, 2019 at 04:43:54PM +, Tamar Christina wrote:
> > Here's an updated patch with the changes processed from the previous review.
> > 
> > I've bootstrapped and regtested on aarch64-none-linux-gnu and 
> > x86_64-pc-linux-gnu and no issues.
> 
> These changes also broke gcc.dg/torture/c99-contract-1.c with -flto
> on i686-linux.
> 
> The problem is that after moving the folding from convert.c to match.pd,
> it is now performed not only during FE folding, but also much later on,
> including post-IPA optimizations in lto1.  The C FE arranges
> flag_excess_precision_cmdline and flag_excess_precision to be
> EXCESS_PRECISION_STANDARD and thus on i686-linux floating point arithmetics
> is performed in long double, but the lto1 FE has both set to
> EXCESS_PRECISION_FAST and undoes that widening.
> 
> There seems to be quite a complicated distinction between
> flag_excess_precision_cmdline and flag_excess_precision, but it seems
> that these days it is unnecessary: flag_excess_precision is only ever set
> from flag_excess_precision_cmdline; perhaps in the past targets used to
> modify flag_excess_precision, but they don't do that anymore.
>
> Furthermore, some comments claimed that the proper EXCESS_PRECISION_STANDARD
> handling requires FE support, but that also doesn't seem to be the case
> these days; some FEs even just use EXCESS_PRECISION_STANDARD by default
> (go, D).
> 
> So, the following patch gets rid of flag_excess_precision and renames
> flag_excess_precision_cmdline to flag_excess_precision, plus adds
> Optimization flag to that command line option, so that we remember it during
> compilation and e.g. during LTO can then have some functions with standard
> excess precision and others with fast excess precision.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Looks OK to me but I'd like Joseph to chime in here.  I think for
FE support it means adding actual _widening_ casts to the effective
compute precision type (thus long double).  But once that's done
the rest should be indeed FE independent (but I doubt D and go get
it correct).

Thanks,
Richard.

> 2019-07-30  Jakub Jelinek  
> 
>   PR middle-end/91283
>   * common.opt (fexcess-precision=): Add Optimization flag.  Use
>   flag_excess_precision variable instead of
>   flag_excess_precision_cmdline.
>   * flags.h (class target_flag_state): Remove x_flag_excess_precision
>   member.
>   (flag_excess_precision): Don't define.
>   * langhooks.c (lhd_post_options): Set flag_excess_precision instead of
>   flag_excess_precision_cmdline.  Remove comment.
>   * opts.c (set_fast_math_flags): Use frontend_set_flag_excess_precision
>   and x_flag_excess_precision instead of
>   frontend_set_flag_excess_precision_cmdline and
>   x_flag_excess_precision_cmdline.
>   (fast_math_flags_set_p): Use x_flag_excess_precision instead of
>   x_flag_excess_precision_cmdline.
>   * toplev.c (init_excess_precision): Remove.
>   (lang_dependent_init_target): Don't call it.
> ada/
>   * gcc-interface/misc.c (gnat_post_options): Set flag_excess_precision
>   instead of flag_excess_precision_cmdline.
> brig/
>   * brig-lang.c (brig_langhook_post_options): Set flag_excess_precision
>   instead of flag_excess_precision_cmdline.
> c-family/
>   * c-common.c (c_ts18661_flt_eval_method): Use flag_excess_precision
>   instead of flag_excess_precision_cmdline.
>   * c-cppbuiltin.c (c_cpp_flt_eval_method_iec_559): Likewise.
>   * c-opts.c (c_common_post_options): Likewise.
> d/
>   * d-lang.cc (d_post_options): Set flag_excess_precision instead of
>   flag_excess_precision_cmdline.
> fortran/
>   * options.c (gfc_post_options): Set flag_excess_precision instead of
>   flag_excess_precision_cmdline.  Remove comment.
> go/
>   * go-lang.c (go_langhook_post_options): Set flag_excess_precision
>   instead of flag_excess_precision_cmdline.
> lto/
>   * lto-lang.c (lto_post_options): Set flag_excess_precision instead of
>   flag_excess_precision_cmdline.  Remove comment.
> 
> --- gcc/common.opt.jj 2019-07-29 12:56:38.968248060 +0200
> +++ gcc/common.opt2019-07-29 13:01:24.067067583 +0200
> @@ -1399,7 +1399,7 @@ Common Report Var(flag_expensive_optimiz
>  Perform a number of minor, expensive optimizations.
>  
>  fexcess-precision=
> -Common Joined RejectNegative Enum(excess_precision) 
> Var(flag_excess_precision_cmdline) Init(EXCESS_PRECISION_DEFAULT) 
> SetByCombined
> +Common Joined RejectNegative Enum(excess_precision) 
> Var(flag_excess_precision) Init(EXCESS_PRECISION_DEFAULT) Optimization 
> SetByCombined
>  -fexcess-precision=[fast|standard]   Specify handling of excess 
> floating-point precision.
>  
>  Enum
> --- gcc/flags.h.jj2019-07-10 15:52:20.362155642 +0200
> +++ gcc/flags.h   2019-07-29 13:02:05.488460207 +0200
> @@ -51,9 +51,6 @@ public:
>

[committed] Fix omp lowering when global vars turn to be addressable while lowering (PR middle-end/91216)

2019-07-30 Thread Jakub Jelinek
Hi!

We ICE on the following testcase because the reduction is expanded as an
__atomic_* operation on the address of the global variable, which makes the
global var, previously not TREE_ADDRESSABLE, suddenly TREE_ADDRESSABLE
during the lowering, but only after scanning has already decided how to pass
certain vars.  If the variable is then privatized again, the scanning and
lowering disagree on how to pass it.  I believe this can happen only for
global vars, and the following patch fixes it by remembering the global vars
we have once seen as non-addressable and keeping that answer even in cases
where they later become addressable.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk,
queued for backporting.

2019-07-30  Jakub Jelinek  

PR middle-end/91216
* omp-low.c (global_nonaddressable_vars): New variable.
(use_pointer_for_field): For global decls, if they are non-addressable,
remember it in the global_nonaddressable_vars bitmap, if they are
addressable and in the global_nonaddressable_vars bitmap, ignore their
TREE_ADDRESSABLE bit.
(omp_copy_decl_2): Clear TREE_ADDRESSABLE also on private copies of
vars in global_nonaddressable_vars bitmap.
(execute_lower_omp): Free global_nonaddressable_vars bitmap.

* gcc.dg/gomp/pr91216.c: New test.

--- gcc/omp-low.c.jj2019-07-20 13:18:54.477980721 +0200
+++ gcc/omp-low.c   2019-07-29 18:49:04.838065123 +0200
@@ -162,6 +162,7 @@ static splay_tree all_contexts;
 static int taskreg_nesting_level;
 static int target_nesting_level;
 static bitmap task_shared_vars;
+static bitmap global_nonaddressable_vars;
 static vec taskreg_contexts;
 
 static void scan_omp (gimple_seq *, omp_context *);
@@ -426,7 +427,26 @@ use_pointer_for_field (tree decl, omp_co
 
   /* Do not use copy-in/copy-out for variables that have their
 address taken.  */
-  if (TREE_ADDRESSABLE (decl))
+  if (is_global_var (decl))
+   {
+ /* For file scope vars, track whether we've seen them as
+non-addressable initially and in that case, keep the same
+answer for the duration of the pass, even when they are made
+addressable later on e.g. through reduction expansion.  Global
+variables which weren't addressable before the pass will not
+have their privatized copies address taken.  See PR91216.  */
+ if (!TREE_ADDRESSABLE (decl))
+   {
+ if (!global_nonaddressable_vars)
+   global_nonaddressable_vars = BITMAP_ALLOC (NULL);
+ bitmap_set_bit (global_nonaddressable_vars, DECL_UID (decl));
+   }
+ else if (!global_nonaddressable_vars
+  || !bitmap_bit_p (global_nonaddressable_vars,
+DECL_UID (decl)))
+   return true;
+   }
+  else if (TREE_ADDRESSABLE (decl))
return true;
 
   /* lower_send_shared_vars only uses copy-in, but not copy-out
@@ -504,8 +524,10 @@ omp_copy_decl_2 (tree var, tree name, tr
  it's address.  But we don't need to take address of privatizations
  from that var.  */
   if (TREE_ADDRESSABLE (var)
-  && task_shared_vars
-  && bitmap_bit_p (task_shared_vars, DECL_UID (var)))
+  && ((task_shared_vars
+  && bitmap_bit_p (task_shared_vars, DECL_UID (var)))
+ || (global_nonaddressable_vars
+ && bitmap_bit_p (global_nonaddressable_vars, DECL_UID (var)
 TREE_ADDRESSABLE (copy) = 0;
   ctx->block_vars = copy;
 
@@ -12730,6 +12752,7 @@ execute_lower_omp (void)
   all_contexts = NULL;
 }
   BITMAP_FREE (task_shared_vars);
+  BITMAP_FREE (global_nonaddressable_vars);
 
   /* If current function is a method, remove artificial dummy VAR_DECL created
  for non-static data member privatization, they aren't needed for
--- gcc/testsuite/gcc.dg/gomp/pr91216.c.jj  2019-07-29 18:57:02.915209678 
+0200
+++ gcc/testsuite/gcc.dg/gomp/pr91216.c 2019-07-29 18:56:53.930338628 +0200
@@ -0,0 +1,20 @@
+/* PR middle-end/91216 */
+
+int r;
+
+void
+foo (int *a)
+{
+  int i;
+  #pragma omp for reduction(+:r)
+  for (i = 0; i < 64; i++)
+a[i] = i;
+  #pragma omp for private (r)
+  for (i = 0; i < 64; i++)
+{
+  r = 0;
+  #pragma omp parallel shared(r)
+  #pragma omp master
+  r = r + 1;
+}
+}

Jakub


Re: [PATCH] Fix up gcc.dg/type-convert-var.c testcase (was Re: [GCC][middle-end] Add rules to strip away unneeded type casts in expressions (2nd patch))

2019-07-30 Thread Richard Biener
On Tue, 30 Jul 2019, Jakub Jelinek wrote:

> On Tue, Jul 30, 2019 at 09:05:35AM +0200, Rainer Orth wrote:
> > > This new testcase FAILs e.g. on i686-linux.  The problem is that
> > 
> > this is PR middle-end/91282.
> 
> Indeed.
> 
> > > with no dg-options, the testcase options default to -ansi, which
> > > implies -fexcess-precision=standard.  On i686-linux, that is conversion to
> > > long double which must (and does) survive until expansion.
> > >
> > > Fixed by using -fexcess-precision=fast, tested on x86_64-linux and
> > > i686-linux, ok for trunk?
> > >
> > > 2019-07-30  Jakub Jelinek  
> > >
> > >   * gcc.dg/type-convert-var.c: Add -0fexcess-precision=fast to
> >   ^ typo
> 
> Oops, fixed:

OK.

> 2019-07-30  Jakub Jelinek  
> 
>   PR middle-end/91282
>   * gcc.dg/type-convert-var.c: Add -fexcess-precision=fast to
>   dg-additional-options.
> 
> --- gcc/testsuite/gcc.dg/type-convert-var.c.jj2019-07-28 
> 17:29:27.156351325 +0200
> +++ gcc/testsuite/gcc.dg/type-convert-var.c   2019-07-30 08:51:33.349558035 
> +0200
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-additional-options "-O1 -fdump-tree-optimized" } */
> +/* { dg-additional-options "-fexcess-precision=fast -O1 
> -fdump-tree-optimized" } */
>  void foo (float a, float b, float *c)
>  {
>double e = (double)a * (double)b;
> 
> 
>   Jakub
> 

-- 
Richard Biener 
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)

[PATCH] Clean up dangling pointers in cgraph_edge (PR ipa/89330).

2019-07-30 Thread Martin Liška
Hi.

We have to clean up dangling pointers before we call ggc_free for a cgraph_edge.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
And it survives --enable-checking=release bootstrap on x86_64-linux-gnu.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

2019-07-30  Martin Liska  

PR ipa/89330
* cgraph.c (symbol_table::free_edge): Memset 0 to cgraph_edge
before we call ggc_free.
---
 gcc/cgraph.c | 2 ++
 1 file changed, 2 insertions(+)


diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 81250acb70c..372974f12df 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -1008,6 +1008,8 @@ symbol_table::free_edge (cgraph_edge *e)
   if (e->m_summary_id != -1)
 edge_released_summary_ids.safe_push (e->m_summary_id);
 
+  /* Clear out the edge so we do not dangle pointers.  */
+  memset (e, 0, sizeof (*e));
   if (e->indirect_info)
 ggc_free (e->indirect_info);
   ggc_free (e);



[PATCH] Fix PR91291

2019-07-30 Thread Richard Biener


Testing the following.

Richard.

2019-07-30  Richard Biener  

PR tree-optimization/91291
* tree-ssa-sccvn.c (rpo_elim::eliminate_push_avail): Ignore
constant values.

Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c(revision 273896)
+++ gcc/tree-ssa-sccvn.c(working copy)
@@ -6253,7 +6345,8 @@ void
 rpo_elim::eliminate_push_avail (basic_block bb, tree leader)
 {
   tree valnum = VN_INFO (leader)->valnum;
-  if (valnum == VN_TOP)
+  if (valnum == VN_TOP
+  || is_gimple_min_invariant (valnum))
 return;
   if (dump_file && (dump_flags & TDF_DETAILS))
 {


[C PATCH] Fix C error-recovery (PR c/91192)

2019-07-30 Thread Jakub Jelinek
Hi!

Neither c_expr_sizeof_expr nor c_expr_sizeof_type bothers with filling in
the locations in the c_expr struct they return.  Normally, this isn't a problem,
as the sole caller of those calls set_c_expr_source_range.  It doesn't call
it though if we reach CPP_EOF while parsing the sizeof expression.
Later on when the callers access the location info, it can randomly segfault
during error-recovery.  The testcase is too obscure with too many errors to
include IMHO though, and as it only ICEs randomly, I'm not including it.

The fix is simple: just initialize the locations to something; it doesn't
matter much exactly to what, so this patch uses a range from start to start.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-07-30  Jakub Jelinek  

PR c/91192
* c-parser.c (c_parser_sizeof_expression): Call set_c_expr_source_range
even if finish is UNKNOWN_LOCATION, just use start as finish in that
case.

--- gcc/c/c-parser.c.jj 2019-07-19 20:53:42.121228422 +0200
+++ gcc/c/c-parser.c2019-07-29 16:54:43.046562282 +0200
@@ -7477,8 +7477,9 @@ c_parser_sizeof_expression (c_parser *pa
error_at (expr_loc, "%<sizeof%> applied to a bit-field");
   result = c_expr_sizeof_expr (expr_loc, expr);
 }
-  if (finish != UNKNOWN_LOCATION)
-    set_c_expr_source_range (&result, start, finish);
+  if (finish == UNKNOWN_LOCATION)
+    finish = start;
+  set_c_expr_source_range (&result, start, finish);
   return result;
 }
 

Jakub


[committed] Fix up expand_vec_perm_blend for V64QImode (PR target/91150)

2019-07-30 Thread Jakub Jelinek
Hi!

As the following testcase shows, for V64QImode expand_vec_perm_blend
needs to build a 64-bit mask, so using an unsigned type for it results in
wrong-code.
While the testcase in the PR, which also requires AVX512VBMI, regressed
recently, the actual bug as shown by this different testcase has been there
since the introduction of AVX512BW support.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
committed as obvious to trunk, queued for backporting.
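
To make the failure mode concrete (illustrative only, not the patch): with
64 lanes the blend mask needs bit positions up to 63, which neither fit in
a 32-bit unsigned nor may be produced by shifting an int that far, so the
high-half lanes of the mask were lost.

/* Illustrative only: why the blend mask must be built in a 64-bit type for
   V64QImode.  Shift counts >= the promoted operand width are undefined, and
   a 32-bit mask cannot hold lanes 32..63 anyway.  */
int
high_lane_bit_set_p (void)
{
  unsigned long long mask64 = 0;
  mask64 |= 1ULL << 40;                 /* fine: 64-bit mask, lane 40 */
  /* unsigned mask32 = 0; mask32 |= 1 << 40;   <- undefined, lane lost */
  return (mask64 >> 40) & 1;            /* 1 */
}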

2019-07-30  Jakub Jelinek  

PR target/91150
* config/i386/i386-expand.c (expand_vec_perm_blend): Change mask type
from unsigned to unsigned HOST_WIDE_INT.  For E_V64QImode cast
comparison to unsigned HOST_WIDE_INT before shifting it left.

* gcc.target/i386/avx512bw-pr91150.c: New test.

--- gcc/config/i386/i386-expand.c.jj2019-07-16 13:35:01.213083117 +0200
+++ gcc/config/i386/i386-expand.c   2019-07-29 15:11:09.553633265 +0200
@@ -16385,7 +16385,8 @@ static bool
 expand_vec_perm_blend (struct expand_vec_perm_d *d)
 {
   machine_mode mmode, vmode = d->vmode;
-  unsigned i, mask, nelt = d->nelt;
+  unsigned i, nelt = d->nelt;
+  unsigned HOST_WIDE_INT mask;
   rtx target, op0, op1, maskop, x;
   rtx rperm[32], vperm;
 
@@ -16439,7 +16440,7 @@ expand_vec_perm_blend (struct expand_vec
 case E_V16SImode:
 case E_V8DImode:
   for (i = 0; i < nelt; ++i)
-   mask |= (d->perm[i] >= nelt) << i;
+   mask |= ((unsigned HOST_WIDE_INT) (d->perm[i] >= nelt)) << i;
   break;
 
 case E_V2DImode:
--- gcc/testsuite/gcc.target/i386/avx512bw-pr91150.c.jj 2019-07-29 
15:47:09.750324258 +0200
+++ gcc/testsuite/gcc.target/i386/avx512bw-pr91150.c2019-07-29 
15:47:15.369242808 +0200
@@ -0,0 +1,37 @@
+/* PR target/91150 */
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512bw" } */
+/* { dg-require-effective-target avx512bw } */
+
+#include "avx512bw-check.h"
+
+typedef unsigned char V __attribute__((vector_size (64)));
+
+__attribute__((noipa)) void
+foo (V *x, V *y, V *z)
+{
+  *x = __builtin_shuffle (*y, *z, (V) { 0, 1, 2, 3, 4, 5, 6, 7, 8,
+   9, 10, 11, 12, 13, 14, 15,
+   80, 81, 82, 83, 84, 85, 86, 87,
+   88, 89, 90, 91, 92, 93, 94, 95,
+   96, 97, 98, 99, 100, 101, 102, 103,
+   104, 105, 106, 107, 108, 109, 110, 111,
+   112, 113, 114, 115, 116, 117, 118, 119,
+   120, 121, 122, 123, 124, 125, 126, 127 
});
+}
+
+static void
+avx512bw_test (void)
+{
+  union U { unsigned char a[64]; V v; } a, b, c;
+  int i;
+  for (i = 0; i < 64; i++)
+{
+  b.a[i] = i + 1;
+  c.a[i] = i + 65;
+}
+  foo (&a, &b, &c);
+  for (i = 0; i < 64; i++)
+if (a.a[i] != (i < 16 ? i + 1 : i + 65))
+  __builtin_abort ();
+}

Jakub


[PATCH] Adding _Dependent_ptr type qualifier in C part 1/3

2019-07-30 Thread Akshat Garg
Hi,
This patch includes C front-end code for a type qualifier _Dependent_ptr.
The patch has been tested using the following
make bootstrap -j 4
make -k check -j 4

on x86_64-linux-gnu.

Thanks,
Akshat
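
For context, a minimal usage sketch (illustrative only): the qualifier
placement below follows the restrict-style pointer-qualifier convention,
and the dependency-ordering semantics are those described in the P0190R4
material the new tests are taken from, not something defined by this
snippet.

/* Illustrative sketch of the surface syntax only: _Dependent_ptr qualifies
   the pointer itself, like restrict, marking a pointer whose load and
   subsequent dereference form an RCU-style dependency chain.  */
struct node { int data; };

struct node *head;

int
reader (void)
{
  struct node * _Dependent_ptr p = head;  /* dependency-carrying load */
  if (p)
    return p->data;                       /* dereference preserves the dependency */
  return -1;
}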

gcc/ChangeLog:

2019-07-30  Akshat Garg 

* c-family/c-common.c (struct c_common_resword c_common_reswords):
Add "_Dependent_ptr".
(c_apply_type_quals_to_decl): Set the flag for _Dependent_ptr if
qualified.
(keyword_is_type_qualifier): Add RID_DEPENDENT_PTR.
* c-family/c-common.h (enum rid): Add RID_DEPENDENT_PTR.
* c-family/c-format.c (check_format_types): Add dependent pointer
as a qualifier check.
* c-family/c-pretty-print.c (pp_c_cv_qualifiers): Handle dependent
pointer qualifier.
* c/c-aux-info.c (gen_type): Handle dependent pointer qualifier.
(gen_decl): Handle dependent pointer qualifier.
* c/c-decl.c (merge_decls): Set old declaration as having dependent
pointer qualification if new declaration has one.
(shadow_tag_warned): Add dependent_ptr_p to declspecs check.
(quals_from_declspecs): Add dependent_ptr_p to declspecs check.
(grokdeclarator): Add checks for dependent pointer qualifier and
warn of duplicate or errors. Allow dependent pointer for pointer types only.
* c/c-parser.c (c_keyword_starts_typename, c_token_is_qualifier,
c_token_starts_declspecs): Add RID_DEPENDENT_PTR.
(c_parser_static_assert_declaration_no_semi): Add _Dependent_ptr
qualifier in comments.
(c_parser_declspecs, c_parser_attribute_any_word): Add
RID_DEPENDENT_PTR.
* c/c-tree.h (C_TYPE_FIELDS_DEPENDENT_PTR): New macro to mark
dependent pointer.
(enum c_declspec_word): Add cdw_dependent_ptr.
(struct c_declspecs): Add dependent_ptr_p field.
* print-tree.c (print_node): Print dependent_ptr qualifier.
* tree-core.h (enum cv_qualifier): Add TYPE_QUAL_DEPENDENT_PTR.
(enum tree_index): Add TI_DEPENDENT_PTR_TYPE.
(struct tree_base): Add dependent_ptr_flag field.
* tree-pretty-print.c (dump_generic_node): Print dependent pointer
type qualifier.
* tree.c (fld_type_variant, set_type_quals): Set TYPE_DEPENDENT_PTR.
* tree.h (TREE_THIS_DEPENDENT_PTR): New macro. To access
dependent_ptr_flag field in tree_base.
(TYPE_DEPENDENT_PTR): New accessor macro.
(TYPE_QUALS, TYPE_QUALS_NO_ADDR_SPACE): Add TYPE_QUAL_DEPENDENT_PTR.
(dependent_ptrTI_type_node): Add new type node.

gcc/testsuite/ChangeLog:

2019-07-30  Akshat Garg 

* gcc.dg/c11-dependent_ptr-test-1.c: New test for _Dependent_ptr
qualifier.
* gcc.dg/{p0190r4_fig8, p0190r4_fig9, p0190r4_fig10, p0190r4_fig11,
p0190r4_fig12, p0190r4_fig13}.c: New tests from document P0190R4 for
_Dependent_ptr qualifier.

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index cb92710..4f09037 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -345,6 +345,7 @@ const struct c_common_resword c_common_reswords[] =
   { "_Alignas", RID_ALIGNAS,   D_CONLY },
   { "_Alignof", RID_ALIGNOF,   D_CONLY },
   { "_Atomic", RID_ATOMIC,D_CONLY },
+ { "_Dependent_ptr",   RID_DEPENDENT_PTR,  0 },
   { "_Bool", RID_BOOL,  D_CONLY },
   { "_Complex", RID_COMPLEX, 0 },
   { "_Imaginary", RID_IMAGINARY, D_CONLY },
@@ -3546,6 +3547,11 @@ c_apply_type_quals_to_decl (int type_quals, tree decl)
   TREE_SIDE_EFFECTS (decl) = 1;
   TREE_THIS_VOLATILE (decl) = 1;
 }
+  if (type_quals & TYPE_QUAL_DEPENDENT_PTR)
+{
+  TREE_SIDE_EFFECTS (decl) = 1;
+  TREE_THIS_DEPENDENT_PTR (decl) = 1;
+}
   if (type_quals & TYPE_QUAL_RESTRICT)
 {
   while (type && TREE_CODE (type) == ARRAY_TYPE)
@@ -7851,6 +7857,7 @@ keyword_is_type_qualifier (enum rid keyword)
 case RID_VOLATILE:
 case RID_RESTRICT:
 case RID_ATOMIC:
+case RID_DEPENDENT_PTR:
   return true;
 default:
   return false;
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 117d729..ab55882 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -68,7 +68,7 @@ enum rid
   RID_UNSIGNED, RID_LONG, RID_CONST, RID_EXTERN,
   RID_REGISTER, RID_TYPEDEF, RID_SHORT, RID_INLINE,
   RID_VOLATILE, RID_SIGNED,  RID_AUTO,  RID_RESTRICT,
-  RID_NORETURN, RID_ATOMIC,
+  RID_NORETURN, RID_ATOMIC, RID_DEPENDENT_PTR,

   /* C extensions */
   RID_COMPLEX, RID_THREAD, RID_SAT,
diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
index d134116..00769bb 100644
--- a/gcc/c-family/c-format.c
+++ b/gcc/c-family/c-format.c
@@ -4172,6 +4172,7 @@ check_format_types (const substring_loc &fmt_loc,
   && (TYPE_READONLY (cur_type)
   || TYPE_VOLATILE (cur_type)
   || TYPE_ATOMIC (cur_type)
+  || TYPE_DEPENDENT_PTR (cur_type)
   || TYPE_RESTRICT (cur_type)))
  warning (OPT_Wformat_, "extra type qualifiers in format "
  "argument (argument %d)",
diff --git a/gcc/c-family/c-pretty-print.c b/gcc/c-family/c-pretty-print.c
index 

Re: [PATCH] Fix up gcc.dg/type-convert-var.c testcase (was Re: [GCC][middle-end] Add rules to strip away unneeded type casts in expressions (2nd patch))

2019-07-30 Thread Jakub Jelinek
On Tue, Jul 30, 2019 at 09:05:35AM +0200, Rainer Orth wrote:
> > This new testcase FAILs e.g. on i686-linux.  The problem is that
> 
> this is PR middle-end/91282.

Indeed.

> > with no dg-options, the testcase options default to -ansi, which
> > implies -fexcess-precision=standard.  On i686-linux, that is conversion to
> > long double which must (and does) survive until expansion.
> >
> > Fixed by using -fexcess-precision=fast, tested on x86_64-linux and
> > i686-linux, ok for trunk?
> >
> > 2019-07-30  Jakub Jelinek  
> >
> > * gcc.dg/type-convert-var.c: Add -0fexcess-precision=fast to
>   ^ typo

Oops, fixed:

2019-07-30  Jakub Jelinek  

PR middle-end/91282
* gcc.dg/type-convert-var.c: Add -fexcess-precision=fast to
dg-additional-options.

--- gcc/testsuite/gcc.dg/type-convert-var.c.jj  2019-07-28 17:29:27.156351325 +0200
+++ gcc/testsuite/gcc.dg/type-convert-var.c 2019-07-30 08:51:33.349558035 +0200
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-additional-options "-O1 -fdump-tree-optimized" } */
+/* { dg-additional-options "-fexcess-precision=fast -O1 -fdump-tree-optimized" } */
 void foo (float a, float b, float *c)
 {
   double e = (double)a * (double)b;


Jakub


Fix up -fexcess-precision handling in LTO (was Re: [GCC][middle-end] Add rules to strip away unneeded type casts in expressions (2nd patch))

2019-07-30 Thread Jakub Jelinek
On Tue, Jul 02, 2019 at 04:43:54PM +, Tamar Christina wrote:
> Here's an updated patch with the changes processed from the previous review.
> 
> I've bootstrapped and regtested on aarch64-none-linux-gnu and 
> x86_64-pc-linux-gnu and no issues.

These changes also broke gcc.dg/torture/c99-contract-1.c with -flto
on i686-linux.

The problem is that after moving the folding from convert.c to match.pd,
it is now performed not only during FE folding, but also much later on,
including post-IPA optimizations in lto1.  The C FE arranges for
flag_excess_precision_cmdline and flag_excess_precision to be
EXCESS_PRECISION_STANDARD, and thus on i686-linux floating-point arithmetic
is performed in long double, but the lto1 FE has both set to
EXCESS_PRECISION_FAST and undoes that widening.
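
To make the interaction concrete, here is a minimal sketch (illustrative
only, not taken from the patch or the broken testcase) of the kind of
expression involved:

/* With -fexcess-precision=standard on i686, the C FE evaluates the
   multiplication below with excess precision (long double) and emits the
   corresponding conversions, which must survive until expansion.  A
   match.pd fold running later in lto1 under -fexcess-precision=fast
   would strip that widening again, contrary to the semantics the FE
   committed to.  */
float
mul (float a, float b)
{
  return (float) ((double) a * (double) b);
}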

There seems to be a quite complicated distinction between
flag_excess_precision_cmdline and flag_excess_precision, but these days it
looks unnecessary: flag_excess_precision is only ever set from
flag_excess_precision_cmdline; perhaps targets used to modify
flag_excess_precision in the past, but they don't do that anymore.

Furthermore, some comments claimed that proper EXCESS_PRECISION_STANDARD
handling requires FE support, but that also doesn't seem to be the case
these days; some FEs even just use EXCESS_PRECISION_STANDARD by default
(Go, D).

So, the following patch gets rid of flag_excess_precision and renames
flag_excess_precision_cmdline to flag_excess_precision, plus adds the
Optimization flag to that command line option, so that the setting is
remembered during compilation and, e.g. during LTO, some functions can then
have standard excess precision and others fast excess precision.
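
As an illustration of what the per-function Optimization flag buys (a
hypothetical scenario; the file names and command lines are mine, not from
the patch), identical source can now keep different excess-precision
semantics within one LTO link:

/* standard.c, compiled with: gcc -c -flto -fexcess-precision=standard standard.c  */
float
scale_standard (float a, float b)
{
  /* Recorded as standard excess precision; on i686 the widening the FE
     emitted is preserved through LTO.  */
  return (float) ((double) a * (double) b);
}

/* fast.c, compiled with: gcc -c -flto -fexcess-precision=fast fast.c,
   then linked with: gcc -flto standard.o fast.o  */
float
scale_fast (float a, float b)
{
  /* Recorded as fast excess precision; lto1 may fold the casts away.  */
  return (float) ((double) a * (double) b);
}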

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-07-30  Jakub Jelinek  

PR middle-end/91283
* common.opt (fexcess-precision=): Add Optimization flag.  Use
flag_excess_precision variable instead of
flag_excess_precision_cmdline.
* flags.h (class target_flag_state): Remove x_flag_excess_precision
member.
(flag_excess_precision): Don't define.
* langhooks.c (lhd_post_options): Set flag_excess_precision instead of
flag_excess_precision_cmdline.  Remove comment.
* opts.c (set_fast_math_flags): Use frontend_set_flag_excess_precision
and x_flag_excess_precision instead of
frontend_set_flag_excess_precision_cmdline and
x_flag_excess_precision_cmdline.
(fast_math_flags_set_p): Use x_flag_excess_precision instead of
x_flag_excess_precision_cmdline.
* toplev.c (init_excess_precision): Remove.
(lang_dependent_init_target): Don't call it.
ada/
* gcc-interface/misc.c (gnat_post_options): Set flag_excess_precision
instead of flag_excess_precision_cmdline.
brig/
* brig-lang.c (brig_langhook_post_options): Set flag_excess_precision
instead of flag_excess_precision_cmdline.
c-family/
* c-common.c (c_ts18661_flt_eval_method): Use flag_excess_precision
instead of flag_excess_precision_cmdline.
* c-cppbuiltin.c (c_cpp_flt_eval_method_iec_559): Likewise.
* c-opts.c (c_common_post_options): Likewise.
d/
* d-lang.cc (d_post_options): Set flag_excess_precision instead of
flag_excess_precision_cmdline.
fortran/
* options.c (gfc_post_options): Set flag_excess_precision instead of
flag_excess_precision_cmdline.  Remove comment.
go/
* go-lang.c (go_langhook_post_options): Set flag_excess_precision
instead of flag_excess_precision_cmdline.
lto/
* lto-lang.c (lto_post_options): Set flag_excess_precision instead of
flag_excess_precision_cmdline.  Remove comment.

--- gcc/common.opt.jj   2019-07-29 12:56:38.968248060 +0200
+++ gcc/common.opt  2019-07-29 13:01:24.067067583 +0200
@@ -1399,7 +1399,7 @@ Common Report Var(flag_expensive_optimiz
 Perform a number of minor, expensive optimizations.
 
 fexcess-precision=
-Common Joined RejectNegative Enum(excess_precision) Var(flag_excess_precision_cmdline) Init(EXCESS_PRECISION_DEFAULT) SetByCombined
+Common Joined RejectNegative Enum(excess_precision) Var(flag_excess_precision) Init(EXCESS_PRECISION_DEFAULT) Optimization SetByCombined
 -fexcess-precision=[fast|standard] Specify handling of excess floating-point precision.
 
 Enum
--- gcc/flags.h.jj  2019-07-10 15:52:20.362155642 +0200
+++ gcc/flags.h 2019-07-29 13:02:05.488460207 +0200
@@ -51,9 +51,6 @@ public:
   align_flags x_align_jumps;
   align_flags x_align_labels;
   align_flags x_align_functions;
-
-  /* The excess precision currently in effect.  */
-  enum excess_precision x_flag_excess_precision;
 };
 
 extern class target_flag_state default_target_flag_state;
@@ -68,12 +65,6 @@ extern class target_flag_state *this_target_flag_state;
 #define align_labels (this_target_flag_state->x_align_labels)
 #define align_functions (this_target_flag_state->x_align_functions)
 
-/* 

Re: [PATCH] Fix up gcc.dg/type-convert-var.c testcase (was Re: [GCC][middle-end] Add rules to strip away unneeded type casts in expressions (2nd patch))

2019-07-30 Thread Rainer Orth
Hi Jakub,

> On Tue, Jul 02, 2019 at 04:43:54PM +, Tamar Christina wrote:
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/type-convert-var.c
>> @@ -0,0 +1,9 @@
>> +/* { dg-do compile } */
>> +/* { dg-additional-options "-O1 -fdump-tree-optimized" } */
>> +void foo (float a, float b, float *c)
>> +{
>> +  double e = (double)a * (double)b;
>> +  *c = (float)e;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump-not {double} "optimized" } } */
>> 
>
> This new testcase FAILs e.g. on i686-linux.  The problem is that

this is PR middle-end/91282.

> with no dg-options, the testcase options default to -ansi, which
> implies -fexcess-precision=standard.  On i686-linux, that is conversion to
> long double which must (and does) survive until expansion.
>
> Fixed by using -fexcess-precision=fast, tested on x86_64-linux and
> i686-linux, ok for trunk?
>
> 2019-07-30  Jakub Jelinek  
>
>   * gcc.dg/type-convert-var.c: Add -0fexcess-precision=fast to
  ^ typo

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH] Fix up gcc.dg/type-convert-var.c testcase (was Re: [GCC][middle-end] Add rules to strip away unneeded type casts in expressions (2nd patch))

2019-07-30 Thread Jakub Jelinek
On Tue, Jul 02, 2019 at 04:43:54PM +, Tamar Christina wrote:
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/type-convert-var.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O1 -fdump-tree-optimized" } */
> +void foo (float a, float b, float *c)
> +{
> +  double e = (double)a * (double)b;
> +  *c = (float)e;
> +}
> +
> +/* { dg-final { scan-tree-dump-not {double} "optimized" } } */
> 

This new testcase FAILs e.g. on i686-linux.  The problem is that
with no dg-options, the testcase options default to -ansi, which
implies -fexcess-precision=standard.  On i686-linux, that is conversion to
long double which must (and does) survive until expansion.

Fixed by using -fexcess-precision=fast, tested on x86_64-linux and
i686-linux, ok for trunk?

2019-07-30  Jakub Jelinek  

* gcc.dg/type-convert-var.c: Add -0fexcess-precision=fast to
dg-additional-options.

--- gcc/testsuite/gcc.dg/type-convert-var.c.jj  2019-07-28 17:29:27.156351325 +0200
+++ gcc/testsuite/gcc.dg/type-convert-var.c 2019-07-30 08:51:33.349558035 +0200
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-additional-options "-O1 -fdump-tree-optimized" } */
+/* { dg-additional-options "-fexcess-precision=fast -O1 -fdump-tree-optimized" } */
 void foo (float a, float b, float *c)
 {
   double e = (double)a * (double)b;


Jakub