Re: [PATCH 1/2] gcc/riscv: Include more registers in SIBCALL_REGS

2019-08-22 Thread Jim Wilson
On Mon, Aug 19, 2019 at 12:15 PM Andrew Burgess
 wrote:
> I don't see any reason why we couldn't add t1, and a0 to a7 into this
> set, and this is what this patch does.

SIBCALL_REGS already includes t1 and t2.  It is t0 aka x5 that is
missing.  I think this is wrong.  As Andrew mentioned, this will
penalize any target that has a call-stack aware branch predictor.  We
could add a tune flag for that, but it doesn't seem worth the effort.
Adding the other regs a0 to a7 is OK.  They won't be used unless they
are available.  This is OK without the t0/x5 change.

Jim


[PATCH resend 2/2] PR c/65403 - Add tests for -Wno-error=

2019-08-22 Thread Alex Henrie
* c-c++-common/pr65403-1.c: New test.
* c-c++-common/pr65403-2.c: New test.
---
 gcc/testsuite/c-c++-common/pr65403-1.c | 10 ++
 gcc/testsuite/c-c++-common/pr65403-2.c | 15 +++
 2 files changed, 25 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/pr65403-1.c
 create mode 100644 gcc/testsuite/c-c++-common/pr65403-2.c

diff --git a/gcc/testsuite/c-c++-common/pr65403-1.c 
b/gcc/testsuite/c-c++-common/pr65403-1.c
new file mode 100644
index 000..fbe004a1f78
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr65403-1.c
@@ -0,0 +1,10 @@
+/* PR c/65403 */
+/* Test an unrecognized -Wno-error option in the absence of any other
+   diagnostics. The -Wno-error option should be ignored. */
+
+/* { dg-options "-Werror -Wno-error=some-future-warning" } */
+
+int main(int argc, char **argv)
+{
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/pr65403-2.c 
b/gcc/testsuite/c-c++-common/pr65403-2.c
new file mode 100644
index 000..8b5faa7270e
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr65403-2.c
@@ -0,0 +1,15 @@
+/* PR c/65403 */
+/* Test a warning, treated as an error, that some future -Wno-error option
+   might downgrade back to a warning. The -Wno-error option should produce a
+   warning in this case. */
+
+/* { dg-options "-Wunused-variable -Werror -Wno-error=some-future-warning" } */
+
+int main(int argc, char **argv)
+{
+  int foo; /* { dg-error "unused variable 'foo'" } */
+  return 0;
+}
+
+/* { dg-error "no option '-Wsome-future-warning'" "" { target *-*-* } 0 } */
+/* { dg-message "all warnings being treated as errors" "" { target *-*-* } 0 } 
*/
-- 
2.23.0



[PATCH resend 1/2] PR c/65403 - Ignore -Wno-error=

2019-08-22 Thread Alex Henrie
From: Manuel López-Ibáñez 

* opts-common.c (ignored_wnoerror_options): New global variable.
* opts-global.c (print_ignored_options): Ignore
-Wno-error= except if there are other
diagnostics.
* opts.c (enable_warning_as_error): Record ignored -Wno-error
options.
* opts.h (ignored_wnoerror_options): Declare.
* gcc.dg/Werror-13.c: Don't expect hints for
-Wno-error=.
---
 gcc/opts-common.c|  2 ++
 gcc/opts-global.c| 10 +++---
 gcc/opts.c   | 21 +
 gcc/opts.h   |  2 ++
 gcc/testsuite/gcc.dg/Werror-13.c |  2 +-
 5 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/gcc/opts-common.c b/gcc/opts-common.c
index e2a315ba229..86e518bdd2b 100644
--- a/gcc/opts-common.c
+++ b/gcc/opts-common.c
@@ -26,6 +26,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic.h"
 #include "spellcheck.h"
 
+vec ignored_wnoerror_options;
+
 static void prune_options (struct cl_decoded_option **, unsigned int *);
 
 /* An option that is undocumented, that takes a joined argument, and
diff --git a/gcc/opts-global.c b/gcc/opts-global.c
index 7c5bd16c7ea..b4b4576e450 100644
--- a/gcc/opts-global.c
+++ b/gcc/opts-global.c
@@ -136,12 +136,16 @@ print_ignored_options (void)
 {
   while (!ignored_options.is_empty ())
 {
-  const char *opt;
-
-  opt = ignored_options.pop ();
+  const char * opt = ignored_options.pop ();
   warning_at (UNKNOWN_LOCATION, 0,
  "unrecognized command-line option %qs", opt);
 }
+  while (!ignored_wnoerror_options.is_empty ())
+{
+  const char * opt = ignored_wnoerror_options.pop ();
+  warning_at (UNKNOWN_LOCATION, 0,
+ "%<-Wno-error=%s%>: no option %<-W%s%>", opt, opt);
+}
 }
 
 /* Handle an unknown option DECODED, returning true if an error should
diff --git a/gcc/opts.c b/gcc/opts.c
index bb0d8b5e7db..a78e5cf1949 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -3175,15 +3175,20 @@ enable_warning_as_error (const char *arg, int value, 
unsigned int lang_mask,
   option_index = find_opt (new_option, lang_mask);
   if (option_index == OPT_SPECIAL_unknown)
 {
-  option_proposer op;
-  const char *hint = op.suggest_option (new_option);
-  if (hint)
-   error_at (loc, "%<-W%serror=%s%>: no option %<-%s%>;"
- " did you mean %<-%s%>?", value ? "" : "no-",
- arg, new_option, hint);
+  if (value)
+   {
+ option_proposer op;
+ const char *hint = op.suggest_option (new_option);
+ if (hint)
+   error_at (loc, "%<-W%serror=%s%>: no option %<-%s%>;"
+ " did you mean %<-%s%>?", value ? "" : "no-",
+ arg, new_option, hint);
+ else
+   error_at (loc, "%<-W%serror=%s%>: no option %<-%s%>",
+ value ? "" : "no-", arg, new_option);
+   }
   else
-   error_at (loc, "%<-W%serror=%s%>: no option %<-%s%>",
- value ? "" : "no-", arg, new_option);
+   ignored_wnoerror_options.safe_push (arg);
 }
   else if (!(cl_options[option_index].flags & CL_WARNING))
 error_at (loc, "%<-Werror=%s%>: %<-%s%> is not an option that "
diff --git a/gcc/opts.h b/gcc/opts.h
index 47223229388..ac281aef540 100644
--- a/gcc/opts.h
+++ b/gcc/opts.h
@@ -461,4 +461,6 @@ extern bool parse_and_check_align_values (const char *flag,
  bool report_error,
  location_t loc);
 
+extern vec ignored_wnoerror_options;
+
 #endif
diff --git a/gcc/testsuite/gcc.dg/Werror-13.c b/gcc/testsuite/gcc.dg/Werror-13.c
index 3a02b7ea2b5..7c2bf6836ed 100644
--- a/gcc/testsuite/gcc.dg/Werror-13.c
+++ b/gcc/testsuite/gcc.dg/Werror-13.c
@@ -5,6 +5,6 @@
 /* { dg-error "'-Werror' is not an option that controls warnings" "" { target 
*-*-* } 0 } */
 /* { dg-error "'-Wfatal-errors' is not an option that controls warnings" "" { 
target *-*-* } 0 } */
 /* { dg-error "'-Werror=vla2': no option '-Wvla2'; did you mean '-Wvla." "" { 
target *-*-* } 0 } */
-/* { dg-error "'-Wno-error=misleading-indentation2': no option 
'-Wmisleading-indentation2'; did you mean '-Wmisleading-indentation'" "" { 
target *-*-* } 0 } */
+/* { dg-warning "'-Wno-error=misleading-indentation2': no option 
'-Wmisleading-indentation2'" "" { target *-*-* } 0 } */
 
 int i;
-- 
2.23.0



Re: Rework constant subreg folds and handle more variable-length cases

2019-08-22 Thread Jeff Law
On 7/12/19 1:44 AM, Richard Sandiford wrote:
> Richard Sandiford  writes:
>> This patch rewrites the way simplify_subreg handles constants.
>> It uses similar native_encode/native_decode routines to the
>> tree-level handling of VIEW_CONVERT_EXPR, meaning that we can
>> move between rtx constants and the target memory image of them.
>>
>> The main point of this patch is to support subregs of constant-length
>> vectors for VLA vectors, beyond the very simple cases that were already
>> handled.  Many of the new tests failed before the patch for variable-
>> length vectors.
>>
>> The boolean side is tested more by the upcoming SVE ACLE work.
>>
>> Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu.
>> OK to install?
> I made a last-minute change after testing, to use uintNN_t types
> for target_unit rather than the original unsigned char/short/int.
> Of course, that doesn't survive a libgcc build since 
> isn't included there.
> 
> Fixed below, and posted as tested this time.
> 
> Richard
> 
> 
> 2019-07-12  Richard Sandiford  
> 
> gcc/
>   * defaults.h (TARGET_UNIT): New macro.
>   (target_unit): New type.
>   * rtl.h (native_encode_rtx, native_decode_rtx)
>   (native_decode_vector_rtx, subreg_size_lsb): Declare.
>   (subreg_lsb_1): Turn into an inline wrapper around subreg_size_lsb.
>   * rtlanal.c (subreg_lsb_1): Delete.
>   (subreg_size_lsb): New function.
>   * simplify-rtx.c: Include rtx-vector-builder.h
>   (simplify_immed_subreg): Delete.
>   (native_encode_rtx, native_decode_vector_rtx, native_decode_rtx)
>   (simplify_const_vector_byte_offset, simplify_const_vector_subreg): New
>   functions.
>   (simplify_subreg): Use them.
>   (test_vector_subregs_modes, test_vector_subregs_repeating)
>   (test_vector_subregs_fore_back, test_vector_subregs_stepped)
>   (test_vector_subregs): New functions.
>   (test_vector_ops): Call test_vector_subregs for integer vector
>   modes with at least 2 elements.
This just turns out to be amazingly painful to work through and I don't
particularly see any good breakdown which would make it obvious where
the behavioral changes are vs just refactoring.

Given your long history with GCC and your expertise in RTL as well as
the SVE space I'm inclined to say go for it and we'll cope with any fallout.

Jeff


Re: [PATCH v6 3/3] PR80791 Consider doloop cmp use in ivopts

2019-08-22 Thread Bin.Cheng
On Thu, Aug 22, 2019 at 3:09 PM Kewen.Lin  wrote:
>
> Hi Bin,
>
> on 2019/8/22 下午1:46, Bin.Cheng wrote:
> > On Thu, Aug 22, 2019 at 11:18 AM Kewen.Lin  wrote:
> >>
> >> Hi Bin,
> >>
> >> Thanks for your time!
> >>
> >> on 2019/8/21 下午8:32, Bin.Cheng wrote:
> >>> On Wed, Aug 14, 2019 at 3:23 PM Kewen.Lin  wrote:
> 
>  Hi!
> 
>  Comparing to the previous versions of implementation mainly based on the
>  existing IV cands but zeroing the related group/use cost, this new one 
>  is based
>  on Richard and Segher's suggestion introducing one doloop dedicated IV 
>  cand.
> 
>  Some key points are listed below:
>    1) New field doloop_p in struct iv_cand to indicate doloop dedicated 
>  IV cand.
>    2) Special name "doloop" assigned.
>    3) Doloop IV cand with form (niter+1, +, -1)
>    4) For doloop IV cand, no extra one cost like BIV, assign zero cost 
>  for step.
>    5) Support may_be_zero (regressed PR is in this case), the base of 
>  doloop IV
>   can be COND_EXPR, add handlings in cand_value_at and 
>  may_eliminate_iv.
>    6) Add more expr support in force_expr_to_var_cost for reasonable cost
>   calculation on the IV base with may_be_zero (like COND_EXPR).
>    7) Set zero cost when using doloop IV cand for doloop use.
>    8) Add three hooks (should we merge _generic and _address?).
>  *) have_count_reg_decr_p, is to indicate the target has special 
>  hardware
> count register, we shouldn't consider the impact of doloop IV when
> calculating register pressures.
>  *) doloop_cost_for_generic, is the extra cost when using doloop IV 
>  cand for
> generic type IV use.
>  *) doloop_cost_for_address, is the extra cost when using doloop IV 
>  cand for
> address type IV use.
> >>> What will happen if doloop IV cand be used for generic/address type iv
> >>> use?  Can RTL doloop can still perform doloop optimization in this
> >>> case?
> >>>
> >>
> >> On Power, we put the iteration count into hardware count register, it 
> >> takes very
> >> high cost to move the count to GPR, so the cost is set as INF to make it 
> >> impossible
> >> to use it for generic/address type iv use.  But as some discussion before, 
> >> on some
> >> targets using GPR instead of hardware count register, they probably want 
> >> to use this
> >> doloop iv used for other uses if profitable.  These two hooks offer the 
> >> possibility.
> >> In that case, I think RTL doloop can still perform since it can still get 
> >> the
> >> pattern and transform.  The generic/address uses can still use it.
> 
>  Bootstrapped on powerpc64le-linux-gnu and regression testing passed 
>  excepting
>  for one failure on gcc/testsuite/gcc.dg/guality/loop-1.c at -O3 which is 
>  tracked
>  by PR89983.
> 
>  Any comments and suggestions are highly appreciated.  Thanks!
> >>> Not sure if I understand the patch correctly, some comments embedded.
> >>>
> >>> +  /* The number of doloop candidate in the set.  */
> >>> +  unsigned n_doloop_cands;
> >>> +
> >>> This is unnecessary.  See below comments.
> >>>
> >>> -add_candidate_1 (data, base, step, important,
> >>> -IP_NORMAL, use, NULL, orig_iv);
> >>> +add_candidate_1 (data, base, step, important, IP_NORMAL, use, NULL, 
> >>> doloop,
> >>> +orig_iv);
> >>>if (ip_end_pos (data->current_loop)
> >>>&& allow_ip_end_pos_p (data->current_loop))
> >>> -add_candidate_1 (data, base, step, important, IP_END, use, NULL, 
> >>> orig_iv);
> >>> +add_candidate_1 (data, base, step, important, IP_END, use, NULL, 
> >>> doloop,
> >>> +orig_iv);
> >>> Do we need to skip ip_end_pos case for doloop candidate?  Because the
> >>> candidate increment will be inserted in latch, i.e, increment position
> >>> is after exit condition.
> >>>
> >>
> >> Yes, we should skip it.  Currently function find_doloop_use has the check 
> >> on an
> >> empty latch and gimple_cond to latch, partially excluding it.  But it's 
> >> still good
> >> to guard it directly here.
> >>
> >>> -  tree_to_aff_combination (iv->base, type, val);
> >>> +  tree base = iv->base;
> >>> +  /* See add_iv_candidate_for_doloop, if may_be_zero is set, we want to 
> >>> extract
> >>> + the value under !may_be_zero to get the compact bound which also 
> >>> well fits
> >>> + for may_be_zero since we ensure the value for it is const one.  */
> >>> +  if (cand->doloop_p && desc->may_be_zero && !integer_zerop
> >>> (desc->may_be_zero))
> >>> +base = fold_build2 (PLUS_EXPR, TREE_TYPE (niter),
> >>> +   unshare_expr (rewrite_to_non_trapping_overflow 
> >>> (niter)),
> >>> +   build_int_cst (TREE_TYPE (niter), 1));
> >>> +  tree_to_aff_combination (base, type, val);
> >>> I don't quite follow here.  The iv->base is 

Re: [PATCH] integrate sprintf pass into strlen (PR 83431)

2019-08-22 Thread Jeff Law
On 8/12/19 4:09 PM, Martin Sebor wrote:

> 
> gcc-83431.diff
> 
> PR tree-optimization/83431 - -Wformat-truncation may incorrectly report 
> truncation
> 
> gcc/ChangeLog:
> 
>   PR c++/83431
>   * gimple-ssa-sprintf.c (pass_data_sprintf_length): Remove object.
>   (sprintf_dom_walker): Remove class.
>   (get_int_range): Make argument const.
>   (directive::fmtfunc, directive::set_precision): Same.
>   (format_none): Same.
>   (build_intmax_type_nodes): Same.
>   (adjust_range_for_overflow): Same.
>   (format_floating): Same.
>   (format_character): Same.
>   (format_string): Same.
>   (format_plain): Same.
>   (get_int_range): Cast away constness.
>   (format_integer): Same.
>   (get_string_length): Call get_range_strlen_dynamic.  Handle
>   null lendata.maxbound.
>   (should_warn_p): Adjust argument scope qualifier.
>   (maybe_warn): Same.
>   (format_directive): Same.
>   (parse_directive): Same.
>   (is_call_safe): Same.
>   (try_substitute_return_value): Same.
>   (sprintf_dom_walker::handle_printf_call): Rename...
>   (handle_printf_call): ...to this.  Initialize target to host charmap
>   here instead of in pass_sprintf_length::execute.
>   (struct call_info): Make global.
>   (sprintf_dom_walker::compute_format_length): Make global.
>   (sprintf_dom_walker::handle_gimple_call): Same.
>   * passes.def (pass_sprintf_length): Replace with pass_strlen.
>   * print-rtl.c (print_pattern): Reduce the number of spaces to
>   avoid -Wformat-truncation.
>   * tree-pass.h (make_pass_warn_printf): New function.
>   * tree-ssa-strlen.c (strlen_optimize): New variable.
>   (get_string_length): Add comments.
>   (get_range_strlen_dynamic): New function.
>   (check_and_optimize_call): New function.
>   (handle_integral_assign): New function.
>   (strlen_check_and_optimize_stmt): Factor code out into
>   strlen_check_and_optimize_call and handle_integral_assign.
>   (strlen_dom_walker::evrp): New member.
>   (strlen_dom_walker::before_dom_children): Use evrp member.
>   (strlen_dom_walker::after_dom_children): Use evrp member.
>   (printf_strlen_execute): New function.
>   (pass_strlen::gate): Update to handle printf calls.
>   (dump_strlen_info): New function.
>   (pass_data_warn_printf): New variable.
>   (pass_warn_printf): New class.
>   * tree-ssa-strlen.h (get_range_strlen_dynamic): Declare.
>   (handle_printf_call): Same.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR c++/83431
>   * gcc.dg/strlenopt-63.c: New test.
>   * gcc.dg/pr79538.c: Adjust text of expected warning.
>   * gcc.dg/pr81292-1.c: Adjust pass name.
>   * gcc.dg/pr81292-2.c: Same.
>   * gcc.dg/pr81703.c: Same.
>   * gcc.dg/strcmpopt_2.c: Same.
>   * gcc.dg/strcmpopt_3.c: Same.
>   * gcc.dg/strcmpopt_4.c: Same.
>   * gcc.dg/strlenopt-1.c: Same.
>   * gcc.dg/strlenopt-10.c: Same.
>   * gcc.dg/strlenopt-11.c: Same.
>   * gcc.dg/strlenopt-13.c: Same.
>   * gcc.dg/strlenopt-14g.c: Same.
>   * gcc.dg/strlenopt-14gf.c: Same.
>   * gcc.dg/strlenopt-15.c: Same.
>   * gcc.dg/strlenopt-16g.c: Same.
>   * gcc.dg/strlenopt-17g.c: Same.
>   * gcc.dg/strlenopt-18g.c: Same.
>   * gcc.dg/strlenopt-19.c: Same.
>   * gcc.dg/strlenopt-1f.c: Same.
>   * gcc.dg/strlenopt-2.c: Same.
>   * gcc.dg/strlenopt-20.c: Same.
>   * gcc.dg/strlenopt-21.c: Same.
>   * gcc.dg/strlenopt-22.c: Same.
>   * gcc.dg/strlenopt-22g.c: Same.
>   * gcc.dg/strlenopt-24.c: Same.
>   * gcc.dg/strlenopt-25.c: Same.
>   * gcc.dg/strlenopt-26.c: Same.
>   * gcc.dg/strlenopt-27.c: Same.
>   * gcc.dg/strlenopt-28.c: Same.
>   * gcc.dg/strlenopt-29.c: Same.
>   * gcc.dg/strlenopt-2f.c: Same.
>   * gcc.dg/strlenopt-3.c: Same.
>   * gcc.dg/strlenopt-30.c: Same.
>   * gcc.dg/strlenopt-31g.c: Same.
>   * gcc.dg/strlenopt-32.c: Same.
>   * gcc.dg/strlenopt-33.c: Same.
>   * gcc.dg/strlenopt-33g.c: Same.
>   * gcc.dg/strlenopt-34.c: Same.
>   * gcc.dg/strlenopt-35.c: Same.
>   * gcc.dg/strlenopt-4.c: Same.
>   * gcc.dg/strlenopt-48.c: Same.
>   * gcc.dg/strlenopt-49.c: Same.
>   * gcc.dg/strlenopt-4g.c: Same.
>   * gcc.dg/strlenopt-4gf.c: Same.
>   * gcc.dg/strlenopt-5.c: Same.
>   * gcc.dg/strlenopt-50.c: Same.
>   * gcc.dg/strlenopt-51.c: Same.
>   * gcc.dg/strlenopt-52.c: Same.
>   * gcc.dg/strlenopt-53.c: Same.
>   * gcc.dg/strlenopt-54.c: Same.
>   * gcc.dg/strlenopt-55.c: Same.
>   * gcc.dg/strlenopt-56.c: Same.
>   * gcc.dg/strlenopt-6.c: Same.
>   * gcc.dg/strlenopt-61.c: Same.
>   * gcc.dg/strlenopt-7.c: Same.
>   * gcc.dg/strlenopt-8.c: Same.
>   * gcc.dg/strlenopt-9.c: Same.
>   * gcc.dg/strlenopt.h (snprintf, snprintf): Declare.
>   * 

Re: [Patch v2] Enable math functions linking with static library for LTO

2019-08-22 Thread luoxhu

Hi Richard,

On 2019/8/13 17:10, Richard Biener wrote:

On Tue, Aug 13, 2019 at 4:22 AM luoxhu  wrote:


Hi Richard,

On 2019/8/12 16:51, Richard Biener wrote:

On Mon, Aug 12, 2019 at 8:50 AM luoxhu  wrote:


Hi Richard,
Thanks for your comments, updated the v2 patch as below:
1. Define and use builtin_with_linkage_p.
2. Add comments.
3. Add a testcase.

In LTO mode, if static library and dynamic library contains same
function and both libraries are passed as arguments, linker will link
the function in dynamic library no matter the sequence.  This patch
will output LTO symbol node as UNDEF if BUILT_IN_NORMAL function FNDECL
is a math function, then the function in static library will be linked
first if its sequence is ahead of the dynamic library.


Comments below


gcc/ChangeLog

  2019-08-12  Xiong Hu Luo  

  PR lto/91287
  * builtins.c (builtin_with_linkage_p): New function.
  * builtins.h (builtin_with_linkage_p): New function.
  * symtab.c (write_symbol): Use builtin_with_linkage_p.
  * lto-streamer-out.c (symtab_node::output_to_lto_symbol_table_p):
  Likewise.

gcc/testsuite/ChangeLog

  2019-08-12  Xiong Hu Luo  

  PR lto/91287
  * gcc.dg/pr91287.c: New testcase.
---
   gcc/builtins.c | 89 ++
   gcc/builtins.h |  2 +
   gcc/lto-streamer-out.c |  4 +-
   gcc/symtab.c   | 13 -
   gcc/testsuite/gcc.dg/pr91287.c | 40 +++
   5 files changed, 145 insertions(+), 3 deletions(-)
   create mode 100644 gcc/testsuite/gcc.dg/pr91287.c

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 695a9d191af..f4dea941a27 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -11244,3 +11244,92 @@ target_char_cst_p (tree t, char *p)
 *p = (char)tree_to_uhwi (t);
 return true;
   }
+
+/* Return true if DECL is a specified builtin math function.  These functions
+   should have symbol in symbol table to provide linkage with faster version of
+   libraries.  */


The comment should read like

/* Return true if the builtin DECL is implemented in a standard
library.  Otherwise
 returns false which doesn't guarantee it is not (thus the list of
handled builtins
 below may be incomplete).  */


+bool
+builtin_with_linkage_p (tree decl)
+{
+  if (!decl)
+return false;


Omit this check please.


+  if (DECL_BUILT_IN_CLASS (decl) == BUILT_IN_NORMAL)
+switch (DECL_FUNCTION_CODE (decl))
+{
+  CASE_FLT_FN (BUILT_IN_ACOS):
+  CASE_FLT_FN (BUILT_IN_ACOSH):
+  CASE_FLT_FN (BUILT_IN_ASIN):
+  CASE_FLT_FN (BUILT_IN_ASINH):
+  CASE_FLT_FN (BUILT_IN_ATAN):
+  CASE_FLT_FN (BUILT_IN_ATANH):
+  CASE_FLT_FN (BUILT_IN_ATAN2):
+  CASE_FLT_FN (BUILT_IN_CBRT):
+  CASE_FLT_FN (BUILT_IN_CEIL):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_CEIL):
+  CASE_FLT_FN (BUILT_IN_COPYSIGN):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_COPYSIGN):
+  CASE_FLT_FN (BUILT_IN_COS):
+  CASE_FLT_FN (BUILT_IN_COSH):
+  CASE_FLT_FN (BUILT_IN_ERF):
+  CASE_FLT_FN (BUILT_IN_ERFC):
+  CASE_FLT_FN (BUILT_IN_EXP):
+  CASE_FLT_FN (BUILT_IN_EXP2):
+  CASE_FLT_FN (BUILT_IN_EXPM1):
+  CASE_FLT_FN (BUILT_IN_FABS):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_FABS):
+  CASE_FLT_FN (BUILT_IN_FDIM):
+  CASE_FLT_FN (BUILT_IN_FLOOR):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_FLOOR):
+  CASE_FLT_FN (BUILT_IN_FMA):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_FMA):
+  CASE_FLT_FN (BUILT_IN_FMAX):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_FMAX):
+  CASE_FLT_FN (BUILT_IN_FMIN):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_FMIN):
+  CASE_FLT_FN (BUILT_IN_FMOD):
+  CASE_FLT_FN (BUILT_IN_FREXP):
+  CASE_FLT_FN (BUILT_IN_HYPOT):
+  CASE_FLT_FN (BUILT_IN_ILOGB):
+  CASE_FLT_FN (BUILT_IN_LDEXP):
+  CASE_FLT_FN (BUILT_IN_LGAMMA):
+  CASE_FLT_FN (BUILT_IN_LLRINT):
+  CASE_FLT_FN (BUILT_IN_LLROUND):
+  CASE_FLT_FN (BUILT_IN_LOG):
+  CASE_FLT_FN (BUILT_IN_LOG10):
+  CASE_FLT_FN (BUILT_IN_LOG1P):
+  CASE_FLT_FN (BUILT_IN_LOG2):
+  CASE_FLT_FN (BUILT_IN_LOGB):
+  CASE_FLT_FN (BUILT_IN_LRINT):
+  CASE_FLT_FN (BUILT_IN_LROUND):
+  CASE_FLT_FN (BUILT_IN_MODF):
+  CASE_FLT_FN (BUILT_IN_NAN):
+  CASE_FLT_FN (BUILT_IN_NEARBYINT):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_NEARBYINT):
+  CASE_FLT_FN (BUILT_IN_NEXTAFTER):
+  CASE_FLT_FN (BUILT_IN_NEXTTOWARD):
+  CASE_FLT_FN (BUILT_IN_POW):
+  CASE_FLT_FN (BUILT_IN_REMAINDER):
+  CASE_FLT_FN (BUILT_IN_REMQUO):
+  CASE_FLT_FN (BUILT_IN_RINT):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_RINT):
+  CASE_FLT_FN (BUILT_IN_ROUND):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_ROUND):
+  CASE_FLT_FN (BUILT_IN_SCALBLN):
+  CASE_FLT_FN (BUILT_IN_SCALBN):
+  CASE_FLT_FN (BUILT_IN_SIN):
+  CASE_FLT_FN (BUILT_IN_SINH):
+  CASE_FLT_FN (BUILT_IN_SINCOS):
+  CASE_FLT_FN (BUILT_IN_SQRT):
+  

Re: [PATCH, c-family] Fix a PCH thinko (and thus PR61250).

2019-08-22 Thread Jason Merrill
On Thu, Aug 22, 2019 at 2:39 PM Jeff Law  wrote:
>
> On 8/22/19 1:59 PM, Iain Sandoe wrote:
> > Hi,
> >
> > When we are parsing a source file, the very first token might
> > be a PRAGMA_GCC_PCH_PREPROCESS.  This indicates that we are going
> > read in a PCH file (named as the value of the pragma).  If we don't
> > see this pragma, then we know that it's OK to release any resources
> > that the host might have set aside for the PCH file.
> >
> > There is a thinko in the current implementation, in that the decision
> > to release resources is happening unconditionally right after the first
> > token is extracted but before it's been checked or acted upon.
> >
> > This leads to the pch bug on Darwin, because we actually do release
> > resources - which are subsequently (reasonably) assumed to be available
> > when reading a PCH file.  We then get random crashes or hangs depending
> > on the interaction between unmmap and malloc.
> >
> > The bug is present everywhere but doesn't show on (say) Linux, since
> > the release of PCH resources is a NOP there.
> >
> > This effects all the c-family front ends, because they all use 
> > c_lex_with_flags ()
> > to implement this.
> >
> > The solution is to check for the PRAGMA_GCC_PCH_PREPROCESS and only call
> > c_common_no_more_pch () when that is not the first token.
> >
> > A secondary effect of the collection is that the name of the PCH file
> > can be collected during the ggc_pch_read() reset of state.  Therefore
> > we should issue any diagnostic that might name the file before the
> > collections are triggered.
> >
> > Tested on x86-64-darwin16/18 (where the pch.exp tests fail roughly half the
> > time for any parallel testing) and pass reliably without it.
> > Tested on x86_64,powerpc-linux-gnu with no regressions (and no expectation
> > of any progression either).
> >
> > Since the fix is in common code, it needs the ack of both C and C++ to apply
> > (I’m obviously OK with applying it from the Objective-C/C++ PoV)
> >
> > OK for trunk?
> >
> > given that this is a  show-stopper for PCH + -save-temps I would also like
> > to fix it on the open branches?
> >
> > thanks
> > Iain
> >
> > gcc/c-family/
> >
> > 2019-08-22  Iain Sandoe  
> >
> >   PR pch/61250
> >   * c-lex.c (c_lex_with_flags):  Don't call
> >   c_common_no_more_pch () from here.
> >
> > gcc/c/
> >
> > 2019-08-22  Iain Sandoe  
> >
> >   PR pch/61250
> >   * c-parser.c (c_parse_file): Call c_common_no_more_pch ()
> >   after determining that the first token is not
> >   PRAGMA_GCC_PCH_PREPROCESS.
> >
> > gcc/cp/
> >
> > 2019-08-22  Iain Sandoe  
> >
> >   PR pch/61250
> >   * parser.c (cp_parser_initial_pragma): Call c_common_no_more_pch ()
> >   after determining that the first token is not
> >   PRAGMA_GCC_PCH_PREPROCESS.
> >
> > gcc/
> >
> > 2019-08-22  Iain Sandoe  
> >
> >   PR pch/61250
> >   * ggc-page.c (ggc_pch_read): Read the ggc_pch_ondisk structure
> >   and issue any diagnostics needed before collecting the pre-PCH
> >   state.
> OK

OK with me, too.  Joseph recently mentioned being swamped with
reviews, so I wouldn't worry about waiting for his review.

Jason


Re: C++ PATCH for c++/91304 - prefix attributes ignored in condition

2019-08-22 Thread Jason Merrill
On Wed, Aug 21, 2019 at 7:24 AM Marek Polacek  wrote:
>
> Currently, we disregard prefix attributes in conditions, e.g.:
>
>   if ([[maybe_unused]] int i = f()) { }
>
> The problem here is that although we've parsed the attribute, it
> was never passed down to start_decl, so the effects were lost.
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
>
> 2019-08-21  Marek Polacek  
>
> PR c++/91304 - prefix attributes ignored in condition.
> * parser.c (cp_parser_condition): Handle prefix attributes.
>
> * g++.dg/cpp0x/gen-attrs-70.C: New test.
>
> diff --git gcc/cp/parser.c gcc/cp/parser.c
> index dbbfe1dbc2f..b410a6c030f 100644
> --- gcc/cp/parser.c
> +++ gcc/cp/parser.c
> @@ -12066,6 +12066,10 @@ cp_parser_condition (cp_parser* parser)
>/* Restore the saved message.  */
>parser->type_definition_forbidden_message = saved_message;
>
> +  /* Gather the attributes that were provided with the
> + decl-specifiers.  */
> +  tree prefix_attributes = type_specifiers.attributes;

The patch is OK, since it follows the existing pattern, but it's weird
that we depend on various places in the parser to extract the
attributes from the specifiers rather than deal with that in
grokdeclarator.

> +
>cp_parser_maybe_commit_to_declaration (parser,
>  type_specifiers.any_specifiers_p);
>
> @@ -12116,7 +12120,7 @@ cp_parser_condition (cp_parser* parser)
>   /* Create the declaration.  */
>   decl = start_decl (declarator, _specifiers,
>  /*initialized_p=*/true,
> -attributes, /*prefix_attributes=*/NULL_TREE,
> +attributes, prefix_attributes,
>  _scope);
>
>   /* Parse the initializer.  */
> diff --git gcc/testsuite/g++.dg/cpp0x/gen-attrs-70.C 
> gcc/testsuite/g++.dg/cpp0x/gen-attrs-70.C
> new file mode 100644
> index 000..90a2e97a3f6
> --- /dev/null
> +++ gcc/testsuite/g++.dg/cpp0x/gen-attrs-70.C
> @@ -0,0 +1,13 @@
> +// PR c++/91304 - prefix attributes ignored in condition.
> +// { dg-do compile { target c++11 } }
> +// { dg-additional-options "-Wall -Wextra" }
> +
> +int f();
> +
> +void g()
> +{
> +  if ([[maybe_unused]] int i = f()) { }
> +  if ([[deprecated]] int i = f()) { i = 10; } // { dg-warning ".i. is 
> deprecated" }
> +  if (int i [[maybe_unused]] = f()) { }
> +  if (int i [[deprecated]] = f()) { i = 10; } // { dg-warning ".i. is 
> deprecated" }
> +}


[Committed,Fortran] errmsg is intent(inout)

2019-08-22 Thread Steve Kargl
F2018 states that ERRMSG in co_broadcast, co_max, co_min, co_reduce,
and co_sum is INTENT(INOUT).  The committed patch makes this so.

2019-08-22  Steven G. Kargl  

* intrinsic.c (add_subroutines): ERRMSG is INTENT(INOUT) in
co_broadcast, co_max, co_min, co_reduce, and  co_sum.

-- 
Steve
Index: gcc/fortran/intrinsic.c
===
--- gcc/fortran/intrinsic.c	(revision 274827)
+++ gcc/fortran/intrinsic.c	(working copy)
@@ -3691,7 +3691,7 @@ add_subroutines (void)
 	  a, BT_REAL, dr, REQUIRED, INTENT_INOUT,
 	  "source_image", BT_INTEGER, di, REQUIRED, INTENT_IN,
 	  stat, BT_INTEGER, di, OPTIONAL, INTENT_OUT,
-	  errmsg, BT_CHARACTER, dc, OPTIONAL, INTENT_OUT);
+	  errmsg, BT_CHARACTER, dc, OPTIONAL, INTENT_INOUT);
 
   add_sym_4s ("co_max", GFC_ISYM_CO_MAX, CLASS_IMPURE,
 	  BT_UNKNOWN, 0, GFC_STD_F2018,
@@ -3699,7 +3699,7 @@ add_subroutines (void)
 	  a, BT_REAL, dr, REQUIRED, INTENT_INOUT,
 	  result_image, BT_INTEGER, di, OPTIONAL, INTENT_IN,
 	  stat, BT_INTEGER, di, OPTIONAL, INTENT_OUT,
-	  errmsg, BT_CHARACTER, dc, OPTIONAL, INTENT_OUT);
+	  errmsg, BT_CHARACTER, dc, OPTIONAL, INTENT_INOUT);
 
   add_sym_4s ("co_min", GFC_ISYM_CO_MIN, CLASS_IMPURE,
 	  BT_UNKNOWN, 0, GFC_STD_F2018,
@@ -3707,7 +3707,7 @@ add_subroutines (void)
 	  a, BT_REAL, dr, REQUIRED, INTENT_INOUT,
 	  result_image, BT_INTEGER, di, OPTIONAL, INTENT_IN,
 	  stat, BT_INTEGER, di, OPTIONAL, INTENT_OUT,
-	  errmsg, BT_CHARACTER, dc, OPTIONAL, INTENT_OUT);
+	  errmsg, BT_CHARACTER, dc, OPTIONAL, INTENT_INOUT);
 
   add_sym_4s ("co_sum", GFC_ISYM_CO_SUM, CLASS_IMPURE,
 	  BT_UNKNOWN, 0, GFC_STD_F2018,
@@ -3715,7 +3715,7 @@ add_subroutines (void)
 	  a, BT_REAL, dr, REQUIRED, INTENT_INOUT,
 	  result_image, BT_INTEGER, di, OPTIONAL, INTENT_IN,
 	  stat, BT_INTEGER, di, OPTIONAL, INTENT_OUT,
-	  errmsg, BT_CHARACTER, dc, OPTIONAL, INTENT_OUT);
+	  errmsg, BT_CHARACTER, dc, OPTIONAL, INTENT_INOUT);
 
   add_sym_5s ("co_reduce", GFC_ISYM_CO_REDUCE, CLASS_IMPURE,
 	  BT_UNKNOWN, 0, GFC_STD_F2018,
@@ -3724,7 +3724,7 @@ add_subroutines (void)
 	  "operator", BT_INTEGER, di, REQUIRED, INTENT_IN,
 	  result_image, BT_INTEGER, di, OPTIONAL, INTENT_IN,
 	  stat, BT_INTEGER, di, OPTIONAL, INTENT_OUT,
-	  errmsg, BT_CHARACTER, dc, OPTIONAL, INTENT_OUT);
+	  errmsg, BT_CHARACTER, dc, OPTIONAL, INTENT_INOUT);
 
 
   /* The following subroutine is internally used for coarray libray functions.


Re: [PATCH] fold constant flexarrays to strings (PR 91490)

2019-08-22 Thread Martin Sebor

On 8/22/19 4:37 PM, Jeff Law wrote:

On 8/22/19 4:23 PM, Martin Sebor wrote:

On 8/22/19 3:27 PM, Jeff Law wrote:

On 8/21/19 2:50 PM, Martin Sebor wrote:

This patch is a subset of the solution for PR 91457 whose main
goal is to eliminate inconsistencies in warnings issued for
out-of-bounds accesses to the various flavors of flexible array
members of constant objects.  That patch was posted here:
    https://gcc.gnu.org/ml/gcc-patches/2019-08/msg01202.html

Like PR 91457, this patch also relies on improving optimization
to issue better quality warnings.  (I.e., with the latter being
what motivated it.)

The optimization enhances string_constant to recognize empty
CONSTRUCTORs returned by fold_ctor_reference for references
to members of constant objects with either "insufficient"
initializers (e.g., given struct S { char n, a[]; } s = { 0 };)
or with braced-list initializers (e.g., given
struct S s = { 3 { 1, 2, 3, 0 } };  The patch lets string_constant
convert the CONSTRUCTOR for s.a into a STRING_CST, which in turn
enables the folding of calls to built-ins like strlen, strchr, or
strcmp with such arguments.

Exposing the strings to the folder then also lets it detect and
issue warnings for out-of-bounds offsets in more instances of
such references than before.

The remaining changes in the patch mostly enhance the places
that use the no-warning bit to prevent redundant diagnostics.

Tested on x86_64-linux.

Martin

gcc-91490.diff

PR middle-end/91490 - bogus argument missing terminating nul warning
on strlen of a flexible array member

gcc/c-family/ChangeLog:

 PR middle-end/91490
 * c-common.c (braced_list_to_string): Add argument and overload.
 Handle flexible length arrays.

gcc/testsuite/ChangeLog:

 PR middle-end/91490
 * c-c++-common/Warray-bounds-7.c: New test.
 * gcc.dg/Warray-bounds-39.c: Expect either -Warray-bounds or
 -Wstringop-overflow.
 * gcc.dg/strlenopt-78.c: New test.

gcc/ChangeLog:

 PR middle-end/91490
 * builtins.c (c_strlen): Rename argument and introduce new local.
 Set no-warning bit on original argument.
 * expr.c (string_constant): Pass argument type to
fold_ctor_reference.
 * gimple-fold.c (fold_nonarray_ctor_reference): Return a STRING_CST
 for missing initializers.
 * tree.c (build_string_literal): Handle optional argument.
 * tree.h (build_string_literal): Add defaulted argument.
 * gimple-ssa-warn-restrict.c (maybe_diag_access_bounds): Check
 no-warning bit on original expression.

diff --git a/gcc/builtins.c b/gcc/builtins.c
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -11402,6 +11402,16 @@ is_aligning_offset (const_tree offset,
const_tree exp)
   tree
   string_constant (tree arg, tree *ptr_offset, tree *mem_size, tree
*decl)
   {
+  tree dummy = NULL_TREE;;
+  if (!mem_size)
+    mem_size = 
+
+  tree chartype = TREE_TYPE (arg);
+  if (POINTER_TYPE_P (chartype))
+    chartype = TREE_TYPE (chartype);
+  while (TREE_CODE (chartype) == ARRAY_TYPE)
+    chartype = TREE_TYPE (chartype);
+
     tree array;
     STRIP_NOPS (arg);

So per our conversation today, I took a closer look at this.  As you
noted CHARTYPE is only used for the empty constructor code you're adding
as a part of this patch.

Rather than stripping away types like this to compute chartype, couldn't
we just use char_type_node instead of chartype in this code below?


We can't.  string_constant is also called for wide strings (e.g.,
by the sprintf pass).  Returning a narrow string when the caller
asks for a wide one breaks the sprintf stuff.

Sigh.  And presumably we can't just  move the block down because other
bits in string_constant modify ARG.

So I think a quick comment before that fragment about its purpose and
we're good to go for the patch as a whole based on the one you posted an
hour or so ago.


I did move it down into the block with the STRING_CST transform.
A comment is also there so I committed the patch in r274837.

Thanks
Martin


Re: [PATCH] Sanitizing the middle-end interface to the back-end for strict alignment

2019-08-22 Thread Jeff Law
On 8/15/19 1:47 PM, Bernd Edlinger wrote:
> Hi,
> 
> this is the split out part from the "Fix not 8-byte aligned ldrd/strd on 
> ARMv5 (PR 89544)"
> which is sanitizing the middle-end interface to the back-end for strict 
> alignment,
> and a couple of bug-fixes that are necessary to survive boot-strap.
> It is intended to be applied after the PR 89544 fix.
> 
> I think it would be possible to change the default implementation of 
> STACK_SLOT_ALIGNMENT
> to make all stack variables always naturally aligned instead of doing that 
> only
> in assign_parm_setup_stack, but would still like to avoid changing too many 
> things
> that do not seem to have a problem.  Since this would affect many targets, 
> and more
> kinds of variables that may probably not have a strict alignment problem.
> But I am ready to take your advice though.
> 
> 
> Boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf
> Is it OK for trunk?
> 
> 
> Thanks
> Bernd.
> 
> 
> patch-strict-align.diff
> 
> 2019-08-15  Bernd Edlinger  
>   Richard Biener  
> 
>   * expr.c (expand_assignment): Handle misaligned DECLs.
>   (expand_expr_real_1): Handle FUNCTION_DECL as unaligned.
>   * function.c (assign_parm_adjust_stack_rtl): Check movmisalign optab
>   too.
>   (assign_parm_setup_stack): Allocate properly aligned stack slots.
>   * varasm.c (build_constant_desc): Align constants of misaligned types.
>   * config/arm/arm.md (movdi, movsi, movhi, movhf, movsf, movdf): Check
>   strict alignment restrictions on memory addresses.
>   * config/arm/neon.md (movti, mov, mov): Likewise.
>   * config/arm/vec-common.md (mov): Likewise.
I'll ack the generic bits.  I have no clue if the ARM maintainers want
the asserts or not.

jeff


Re: [PATCH] Sanitizing the middle-end interface to the back-end for strict alignment

2019-08-22 Thread Jeff Law
On 8/17/19 1:44 AM, Bernd Edlinger wrote:
> On 8/15/19 9:47 PM, Bernd Edlinger wrote:
>> Hi,
>>
>> this is the split out part from the "Fix not 8-byte aligned ldrd/strd on 
>> ARMv5 (PR 89544)"
>> which is sanitizing the middle-end interface to the back-end for strict 
>> alignment,
>> and a couple of bug-fixes that are necessary to survive boot-strap.
>> It is intended to be applied after the PR 89544 fix.
>>
>> I think it would be possible to change the default implementation of 
>> STACK_SLOT_ALIGNMENT
>> to make all stack variables always naturally aligned instead of doing that 
>> only
>> in assign_parm_setup_stack, but would still like to avoid changing too many 
>> things
>> that do not seem to have a problem.  Since this would affect many targets, 
>> and more
>> kinds of variables that may probably not have a strict alignment problem.
>> But I am ready to take your advice though.
>>
>>
>> Boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf
>> Is it OK for trunk?
>>
>>
> 
> Hmm, actually the hunk in assign_parm_setup_stack is not only failing
> an assertion, but rather a wrong code bug:
> 
> I found now a test case that generates silently wrong code and is fixed
> by this patch.
> 
> $ cat unaligned-argument-3.c 
> /* { dg-do compile } */
> /* { dg-require-effective-target arm_arm_ok } */
> /* { dg-options "-marm -mno-unaligned-access -O3" } */
> 
> typedef int __attribute__((aligned(1))) s;
> 
> void x(char*, s*);
> void f(char a, s f)
> {
>   x(, );
> }
> 
> /* { dg-final { scan-assembler-times "str\t\[^\\n\]*\\\[sp\\\]" 1 } } */
> /* { dg-final { scan-assembler-times "str\t\[^\\n\]*\\\[sp, #3\\\]" 0 } } */
> 
> currently with -marm -mno-unaligned-access -O3 we generate:
> 
> f:
>   @ args = 0, pretend = 0, frame = 8
>   @ frame_needed = 0, uses_anonymous_args = 0
>   str lr, [sp, #-4]!
>   sub sp, sp, #12
>   mov r3, r0
>   str r1, [sp, #3]  <- may trap
>   add r0, sp, #7
>   add r1, sp, #3
>   strbr3, [sp, #7]
>   bl  x
>   add sp, sp, #12
>   @ sp needed
>   ldr pc, [sp], #4
> 
> 
> So I would like to add a test case to the patch as attached.
> 
> Tested with a cross, that both dg-final fail currently and are fixed
> with the other patches applied.
> 
> Is it OK for trunk?
OK when the patch that fixes this is ACK'd.

jeff


Re: [PATCH] fold constant flexarrays to strings (PR 91490)

2019-08-22 Thread Jeff Law
On 8/22/19 4:23 PM, Martin Sebor wrote:
> On 8/22/19 3:27 PM, Jeff Law wrote:
>> On 8/21/19 2:50 PM, Martin Sebor wrote:
>>> This patch is a subset of the solution for PR 91457 whose main
>>> goal is to eliminate inconsistencies in warnings issued for
>>> out-of-bounds accesses to the various flavors of flexible array
>>> members of constant objects.  That patch was posted here:
>>>    https://gcc.gnu.org/ml/gcc-patches/2019-08/msg01202.html
>>>
>>> Like PR 91457, this patch also relies on improving optimization
>>> to issue better quality warnings.  (I.e., with the latter being
>>> what motivated it.)
>>>
>>> The optimization enhances string_constant to recognize empty
>>> CONSTRUCTORs returned by fold_ctor_reference for references
>>> to members of constant objects with either "insufficient"
>>> initializers (e.g., given struct S { char n, a[]; } s = { 0 };)
>>> or with braced-list initializers (e.g., given
>>> struct S s = { 3 { 1, 2, 3, 0 } };  The patch lets string_constant
>>> convert the CONSTRUCTOR for s.a into a STRING_CST, which in turn
>>> enables the folding of calls to built-ins like strlen, strchr, or
>>> strcmp with such arguments.
>>>
>>> Exposing the strings to the folder then also lets it detect and
>>> issue warnings for out-of-bounds offsets in more instances of
>>> such references than before.
>>>
>>> The remaining changes in the patch mostly enhance the places
>>> that use the no-warning bit to prevent redundant diagnostics.
>>>
>>> Tested on x86_64-linux.
>>>
>>> Martin
>>>
>>> gcc-91490.diff
>>>
>>> PR middle-end/91490 - bogus argument missing terminating nul warning
>>> on strlen of a flexible array member
>>>
>>> gcc/c-family/ChangeLog:
>>>
>>> PR middle-end/91490
>>> * c-common.c (braced_list_to_string): Add argument and overload.
>>> Handle flexible length arrays.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> PR middle-end/91490
>>> * c-c++-common/Warray-bounds-7.c: New test.
>>> * gcc.dg/Warray-bounds-39.c: Expect either -Warray-bounds or
>>> -Wstringop-overflow.
>>> * gcc.dg/strlenopt-78.c: New test.
>>>
>>> gcc/ChangeLog:
>>>
>>> PR middle-end/91490
>>> * builtins.c (c_strlen): Rename argument and introduce new local.
>>> Set no-warning bit on original argument.
>>> * expr.c (string_constant): Pass argument type to
>>> fold_ctor_reference.
>>> * gimple-fold.c (fold_nonarray_ctor_reference): Return a STRING_CST
>>> for missing initializers.
>>> * tree.c (build_string_literal): Handle optional argument.
>>> * tree.h (build_string_literal): Add defaulted argument.
>>> * gimple-ssa-warn-restrict.c (maybe_diag_access_bounds): Check
>>> no-warning bit on original expression.
>>>
>>> diff --git a/gcc/builtins.c b/gcc/builtins.c
>>> --- a/gcc/expr.c
>>> +++ b/gcc/expr.c
>>> @@ -11402,6 +11402,16 @@ is_aligning_offset (const_tree offset,
>>> const_tree exp)
>>>   tree
>>>   string_constant (tree arg, tree *ptr_offset, tree *mem_size, tree
>>> *decl)
>>>   {
>>> +  tree dummy = NULL_TREE;;
>>> +  if (!mem_size)
>>> +    mem_size = 
>>> +
>>> +  tree chartype = TREE_TYPE (arg);
>>> +  if (POINTER_TYPE_P (chartype))
>>> +    chartype = TREE_TYPE (chartype);
>>> +  while (TREE_CODE (chartype) == ARRAY_TYPE)
>>> +    chartype = TREE_TYPE (chartype);
>>> +
>>>     tree array;
>>>     STRIP_NOPS (arg);
>> So per our conversation today, I took a closer look at this.  As you
>> noted CHARTYPE is only used for the empty constructor code you're adding
>> as a part of this patch.
>>
>> Rather than stripping away types like this to compute chartype, couldn't
>> we just use char_type_node instead of chartype in this code below?
> 
> We can't.  string_constant is also called for wide strings (e.g.,
> by the sprintf pass).  Returning a narrow string when the caller
> asks for a wide one breaks the sprintf stuff.
Sigh.  And presumably we can't just  move the block down because other
bits in string_constant modify ARG.

So I think a quick comment before that fragment about its purpose and
we're good to go for the patch as a whole based on the one you posted an
hour or so ago.

Thanks,
jeff


Re: [PATCH] fold constant flexarrays to strings (PR 91490)

2019-08-22 Thread Martin Sebor

On 8/22/19 3:27 PM, Jeff Law wrote:

On 8/21/19 2:50 PM, Martin Sebor wrote:

This patch is a subset of the solution for PR 91457 whose main
goal is to eliminate inconsistencies in warnings issued for
out-of-bounds accesses to the various flavors of flexible array
members of constant objects.  That patch was posted here:
   https://gcc.gnu.org/ml/gcc-patches/2019-08/msg01202.html

Like PR 91457, this patch also relies on improving optimization
to issue better quality warnings.  (I.e., with the latter being
what motivated it.)

The optimization enhances string_constant to recognize empty
CONSTRUCTORs returned by fold_ctor_reference for references
to members of constant objects with either "insufficient"
initializers (e.g., given struct S { char n, a[]; } s = { 0 };)
or with braced-list initializers (e.g., given
struct S s = { 3 { 1, 2, 3, 0 } };  The patch lets string_constant
convert the CONSTRUCTOR for s.a into a STRING_CST, which in turn
enables the folding of calls to built-ins like strlen, strchr, or
strcmp with such arguments.

Exposing the strings to the folder then also lets it detect and
issue warnings for out-of-bounds offsets in more instances of
such references than before.

The remaining changes in the patch mostly enhance the places
that use the no-warning bit to prevent redundant diagnostics.

Tested on x86_64-linux.

Martin

gcc-91490.diff

PR middle-end/91490 - bogus argument missing terminating nul warning on strlen 
of a flexible array member

gcc/c-family/ChangeLog:

PR middle-end/91490
* c-common.c (braced_list_to_string): Add argument and overload.
Handle flexible length arrays.

gcc/testsuite/ChangeLog:

PR middle-end/91490
* c-c++-common/Warray-bounds-7.c: New test.
* gcc.dg/Warray-bounds-39.c: Expect either -Warray-bounds or
-Wstringop-overflow.
* gcc.dg/strlenopt-78.c: New test.

gcc/ChangeLog:

PR middle-end/91490
* builtins.c (c_strlen): Rename argument and introduce new local.
Set no-warning bit on original argument.
* expr.c (string_constant): Pass argument type to fold_ctor_reference.
* gimple-fold.c (fold_nonarray_ctor_reference): Return a STRING_CST
for missing initializers.
* tree.c (build_string_literal): Handle optional argument.
* tree.h (build_string_literal): Add defaulted argument.
* gimple-ssa-warn-restrict.c (maybe_diag_access_bounds): Check
no-warning bit on original expression.

diff --git a/gcc/builtins.c b/gcc/builtins.c
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -11402,6 +11402,16 @@ is_aligning_offset (const_tree offset, const_tree exp)
  tree
  string_constant (tree arg, tree *ptr_offset, tree *mem_size, tree *decl)
  {
+  tree dummy = NULL_TREE;;
+  if (!mem_size)
+mem_size = 
+
+  tree chartype = TREE_TYPE (arg);
+  if (POINTER_TYPE_P (chartype))
+chartype = TREE_TYPE (chartype);
+  while (TREE_CODE (chartype) == ARRAY_TYPE)
+chartype = TREE_TYPE (chartype);
+
tree array;
STRIP_NOPS (arg);

So per our conversation today, I took a closer look at this.  As you
noted CHARTYPE is only used for the empty constructor code you're adding
as a part of this patch.

Rather than stripping away types like this to compute chartype, couldn't
we just use char_type_node instead of chartype in this code below?


We can't.  string_constant is also called for wide strings (e.g.,
by the sprintf pass).  Returning a narrow string when the caller
asks for a wide one breaks the sprintf stuff.



+  tree initsize = TYPE_SIZE_UNIT (inittype);
+
+  if (TREE_CODE (init) == CONSTRUCTOR && !CONSTRUCTOR_ELTS (init))
+{
+  /* Convert a char array to an empty STRING_CST having an array
+of the expected type.  */
+  if (!initsize)
+ initsize = integer_zero_node;
+
+  unsigned HOST_WIDE_INT size = tree_to_uhwi (initsize);
+  init = build_string_literal (size ? 1 : 0, "", chartype, size);
+  init = TREE_OPERAND (init, 0);
+  init = TREE_OPERAND (init, 0);
+
+  *ptr_offset = integer_zero_node;
+}

Per my prior message, we do need to update that test to be something like

if (TREE_CODE (init) == CONSTRUCTOR
 && !CONSTRUCTOR_ELTS
 && !TREE_CLOBBER_P (init))

While I don't think we're likely to see a TREE_CLOBBER_P here, it's best
to be safe since those have totally different meanings,


I went with the other alternative you mentioned and used
initializer_zerop as I explained in my reply.  Let me know if
that's not what you meant.




diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 2810867235d..ef942ffce57 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -8868,7 +8876,7 @@ braced_lists_to_strings (tree type, tree ctor)
unsigned HOST_WIDE_INT idx;
FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (ctor), idx, val)
{
- val = braced_lists_to_strings (ttp, val);
+ val = braced_lists_to_strings (ttp, val, code 

Re: [PATCH] fold constant flexarrays to strings (PR 91490)

2019-08-22 Thread Jeff Law
On 8/22/19 3:55 PM, Martin Sebor wrote:

>>> diff --git a/gcc/expr.c b/gcc/expr.c
>>> index 92979289e83..d16e0982dcf 100644
>>> --- a/gcc/expr.c
>>> +++ b/gcc/expr.c
>>> @@ -11402,6 +11402,16 @@ is_aligning_offset (const_tree offset,
>>> const_tree exp)
>>>   tree
>>>   string_constant (tree arg, tree *ptr_offset, tree *mem_size, tree
>>> *decl)
>>>   {
>>> +  tree dummy = NULL_TREE;;
>>> +  if (!mem_size)
>>> +    mem_size = 
>>> +
>>> +  tree chartype = TREE_TYPE (arg);
>>> +  if (POINTER_TYPE_P (chartype))
>>> +    chartype = TREE_TYPE (chartype);
>>> +  while (TREE_CODE (chartype) == ARRAY_TYPE)
>>> +    chartype = TREE_TYPE (chartype);
>> I'm hesitant to ACK this bit.  I can understand that if we're given an
>> ADDR_EXPR, which should be a POINTER_TYPE_P, that we'll want to look at
>> what it's pointing to.  But shouldn't we verify that it's an ADDR_EXPR
>> before just blindly stripping away the outer type?
> 
> The character type is that of the expression, before any casts
> are stripped.  For example, given:
> 
>   struct S { char n, a[4]; } s[1] = { 0 };
> 
> and strlen (s->a), ARG is the POINTER_PLUS_EXPR
> '(const char *)  + 1'  The type of the operand is a pointer to
> the struct S.
Given "chartype" is only used in the CONSTRUCTOR handling you're adding,
couldn't we just use "char_type_node" in there instead and drop the bits
noted above?


>>> +  if (TREE_CODE (init) == CONSTRUCTOR && !CONSTRUCTOR_ELTS (init))
>> Would initializer_zerop be better here?  That would catch
>> zero-initialized constructors as well.
> 
> I originally had initializer_zerop there and removed it at the last
> minute it because none of my tests needed it.  But my tests only
> exercised the flexible array members (which have to be initialized
> in constant objects) and not the case where initializer_zerop does
> make a difference such as:
> 
>   struct A { char a[4]; };
>   struct B { struct A a; };
> 
>   const struct B b[2] = { 0 };
> 
>   void f (void)
>   {
>     if (__builtin_strlen (b->a.a) == 3)
>   __builtin_abort ();
>   }
> 
> Here, fold_ctor_reference returns a non-empty CONSTRUCTOR that
> the code doesn't handle.  Adding initializer_zerop lets it create
> the empty string and the caller fold the call.  But it's just
> a special case of the more general problem where the CONSTRUCTOR
> is both non-empty and non-zero for the parts of the object before
> the array we're interested in, such as in
> 
>   const struct B b[1] = { 1 };
> 
> Here, strlen (b->a.a) again isn't folded.  Ironically, the equivalent
> strlen (b[0].a.a) is folded.  The difference between the two is that
> b[0].a.a is represented as itself (i.e., NOP (char*, ADDR (b.a.a))
> while b->a.a as (const char *)  + 1.  In the former case
> fold_ctor_reference returns an empty CONSTRUCTOR for the char array.
> In the latter, it returns the non-empty, non-zero CTOR for all of
> b the we don't know how to extract the empty string from.
> 
> Incidentally, it's only the C front-end that "bastardizes"
> the expression like this.  The C++ FE preserves the original form
> of the expression and so it's able to fold the call.
> 
> (Uncovering and trying to fix these problems, by the way, is what
> I meant the other day when I said how patches in this area have
> a tendency to snownball into "projects."  Folding empty ctors of
> constant objects probably isn't terribly important but the lack
> of consistency is what makes writing tests that behave the same
> way across different front-ends and back-ends challenging.)
Yup.  I understand.  When presented with these kinds of snowballing
issues, the best advice I can give is to break things down into logical
units and make patches which are a series rather than a single unified
patch to address multiple issues.  git rebase can really help.

> 
>> If not, then you want to make sure to check !TREE_CLOBBER_P (init) since
>> that's something completely different than zero initialization.
> 
> I went with the initializer_zerop since it improves things, if
> negligibly, and added tests for it.  If you or Richard have a test
> case that I could add to the patch showing when TREE_CLOBBER_P is
> set on a CTOR and that not checking would cause trouble.
Note that initializer_zerop checks TREE_CLOBBER internally.  So if
you're using initializer_zerop, then you're good.


So all that's left is the "chartype" stuff in string_constant.  If we
can use char_type_node instead of trying to extract a character type,
then the problematical code would just go away.

jeff



Re: [PATCH] fold constant flexarrays to strings (PR 91490)

2019-08-22 Thread Martin Sebor

On 8/21/19 4:28 PM, Jeff Law wrote:

On 8/21/19 2:50 PM, Martin Sebor wrote:

This patch is a subset of the solution for PR 91457 whose main
goal is to eliminate inconsistencies in warnings issued for
out-of-bounds accesses to the various flavors of flexible array
members of constant objects.  That patch was posted here:
   https://gcc.gnu.org/ml/gcc-patches/2019-08/msg01202.html

Like PR 91457, this patch also relies on improving optimization
to issue better quality warnings.  (I.e., with the latter being
what motivated it.)

The optimization enhances string_constant to recognize empty
CONSTRUCTORs returned by fold_ctor_reference for references
to members of constant objects with either "insufficient"
initializers (e.g., given struct S { char n, a[]; } s = { 0 };)
or with braced-list initializers (e.g., given
struct S s = { 3 { 1, 2, 3, 0 } };  The patch lets string_constant
convert the CONSTRUCTOR for s.a into a STRING_CST, which in turn
enables the folding of calls to built-ins like strlen, strchr, or
strcmp with such arguments.

Exposing the strings to the folder then also lets it detect and
issue warnings for out-of-bounds offsets in more instances of
such references than before.

The remaining changes in the patch mostly enhance the places
that use the no-warning bit to prevent redundant diagnostics.

Tested on x86_64-linux.

Martin

gcc-91490.diff

PR middle-end/91490 - bogus argument missing terminating nul warning on strlen 
of a flexible array member

gcc/c-family/ChangeLog:

PR middle-end/91490
* c-common.c (braced_list_to_string): Add argument and overload.
Handle flexible length arrays.

gcc/testsuite/ChangeLog:

PR middle-end/91490
* c-c++-common/Warray-bounds-7.c: New test.
* gcc.dg/Warray-bounds-39.c: Expect either -Warray-bounds or
-Wstringop-overflow.
* gcc.dg/strlenopt-78.c: New test.

gcc/ChangeLog:

PR middle-end/91490
* builtins.c (c_strlen): Rename argument and introduce new local.
Set no-warning bit on original argument.
* expr.c (string_constant): Pass argument type to fold_ctor_reference.
* gimple-fold.c (fold_nonarray_ctor_reference): Return a STRING_CST
for missing initializers.
* tree.c (build_string_literal): Handle optional argument.
* tree.h (build_string_literal): Add defaulted argument.
* gimple-ssa-warn-restrict.c (maybe_diag_access_bounds): Check
no-warning bit on original expression.




diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 2810867235d..ef942ffce57 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -8868,7 +8876,7 @@ braced_lists_to_strings (tree type, tree ctor)
unsigned HOST_WIDE_INT idx;
FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (ctor), idx, val)
{
- val = braced_lists_to_strings (ttp, val);
+ val = braced_lists_to_strings (ttp, val, code == RECORD_TYPE);

Do you need to handle UNION_TYPE or QUAL_UNION_TYPE here too?  If so,
RECORD_OR_UNION_TYPE_P is a better test.


I confess I didn't even think about unions.  I wouldn't expect
to see the definition of a const union but I also can't think
of a reason to exclude them from here so I've made the change.




diff --git a/gcc/expr.c b/gcc/expr.c
index 92979289e83..d16e0982dcf 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -11402,6 +11402,16 @@ is_aligning_offset (const_tree offset, const_tree exp)
  tree
  string_constant (tree arg, tree *ptr_offset, tree *mem_size, tree *decl)
  {
+  tree dummy = NULL_TREE;;
+  if (!mem_size)
+mem_size = 
+
+  tree chartype = TREE_TYPE (arg);
+  if (POINTER_TYPE_P (chartype))
+chartype = TREE_TYPE (chartype);
+  while (TREE_CODE (chartype) == ARRAY_TYPE)
+chartype = TREE_TYPE (chartype);

I'm hesitant to ACK this bit.  I can understand that if we're given an
ADDR_EXPR, which should be a POINTER_TYPE_P, that we'll want to look at
what it's pointing to.  But shouldn't we verify that it's an ADDR_EXPR
before just blindly stripping away the outer type?


The character type is that of the expression, before any casts
are stripped.  For example, given:

  struct S { char n, a[4]; } s[1] = { 0 };

and strlen (s->a), ARG is the POINTER_PLUS_EXPR
'(const char *)  + 1'  The type of the operand is a pointer to
the struct S.



I also wonder if this (assuming we keep it in some form) belongs after
the STRIP_NOPs.


In other expressions, after STRIP_NOPS the type also may not reflect
the actual character type.  I don't know how else  get at the character
type when the type of the CONSTRUCTOR is that of an enclosing object.

I've moved the code closer to where it's needed but otherwise left it
unchanged.


@@ -11602,17 +11605,39 @@ string_constant (tree arg, tree *ptr_offset, tree 
*mem_size, tree *decl)
int len = native_encode_expr (init, charbuf, sizeof charbuf, 0);
if (len > 0)
{
- /* Construct a string literal with 

Re: [PATCH, c-family] Fix a PCH thinko (and thus PR61250).

2019-08-22 Thread Jeff Law
On 8/22/19 1:59 PM, Iain Sandoe wrote:
> Hi,
> 
> When we are parsing a source file, the very first token might
> be a PRAGMA_GCC_PCH_PREPROCESS.  This indicates that we are going
> read in a PCH file (named as the value of the pragma).  If we don't
> see this pragma, then we know that it's OK to release any resources
> that the host might have set aside for the PCH file.
> 
> There is a thinko in the current implementation, in that the decision
> to release resources is happening unconditionally right after the first
> token is extracted but before it's been checked or acted upon.
> 
> This leads to the pch bug on Darwin, because we actually do release
> resources - which are subsequently (reasonably) assumed to be available
> when reading a PCH file.  We then get random crashes or hangs depending
> on the interaction between unmmap and malloc.
> 
> The bug is present everywhere but doesn't show on (say) Linux, since
> the release of PCH resources is a NOP there.
> 
> This effects all the c-family front ends, because they all use 
> c_lex_with_flags ()
> to implement this.
> 
> The solution is to check for the PRAGMA_GCC_PCH_PREPROCESS and only call
> c_common_no_more_pch () when that is not the first token.
> 
> A secondary effect of the collection is that the name of the PCH file
> can be collected during the ggc_pch_read() reset of state.  Therefore
> we should issue any diagnostic that might name the file before the
> collections are triggered.
> 
> Tested on x86-64-darwin16/18 (where the pch.exp tests fail roughly half the
> time for any parallel testing) and pass reliably without it.
> Tested on x86_64,powerpc-linux-gnu with no regressions (and no expectation
> of any progression either).
> 
> Since the fix is in common code, it needs the ack of both C and C++ to apply
> (I’m obviously OK with applying it from the Objective-C/C++ PoV)
> 
> OK for trunk?
> 
> given that this is a  show-stopper for PCH + -save-temps I would also like
> to fix it on the open branches?
> 
> thanks
> Iain
> 
> gcc/c-family/
> 
> 2019-08-22  Iain Sandoe  
> 
>   PR pch/61250
>   * c-lex.c (c_lex_with_flags):  Don't call
>   c_common_no_more_pch () from here.
> 
> gcc/c/
> 
> 2019-08-22  Iain Sandoe  
> 
>   PR pch/61250
>   * c-parser.c (c_parse_file): Call c_common_no_more_pch ()
>   after determining that the first token is not
>   PRAGMA_GCC_PCH_PREPROCESS.
> 
> gcc/cp/
> 
> 2019-08-22  Iain Sandoe  
> 
>   PR pch/61250
>   * parser.c (cp_parser_initial_pragma): Call c_common_no_more_pch ()
>   after determining that the first token is not
>   PRAGMA_GCC_PCH_PREPROCESS.
> 
> gcc/
> 
> 2019-08-22  Iain Sandoe  
> 
>   PR pch/61250
>   * ggc-page.c (ggc_pch_read): Read the ggc_pch_ondisk structure
>   and issue any diagnostics needed before collecting the pre-PCH
>   state.
OK
jeff


Re: [PATCH] fold more string comparison with known result (PR 90879)

2019-08-22 Thread Jeff Law
On 8/20/19 8:10 PM, Martin Sebor wrote:
> Jeff,
> 
> Please let me know if you agree/disagree and what I need to
> do to advance this work:
> 
>   https://gcc.gnu.org/ml/gcc-patches/2019-08/msg00643.html
For the official record, I agree :-)

jeff


Re: [PATCH] libgcc/crtstuff.c: Fix incorrect alignment of entries in CRT data structures (PR target/91306)

2019-08-22 Thread Jeff Law
On 8/21/19 7:34 AM, Jozef Lawrynowicz wrote:
> As described in PR target/91306, the alignment of entries in CRT data 
> structures
> as set in libgcc/crtstuff.c can be too large for msp430-elf.
> 
> crtstuff.c assumes the correct alignment of entries with a type of (void *) or
> int32 is the "sizeof" that type.
> 
> However, for msp430-elf in the large memory model, pointer size is 4 bytes, 
> but
> they only need to be 2-byte aligned. Likewise for int32.
> 
> This is causing problems for __frame_dummy_init_array_entry, which gets 
> inserted into .init_array.
> It is being forced into a 4 byte alignment, which can result in a gap between
> __frame_dummy_init_array_entry and the previous entry (or the start of the
> table). So when the startup code comes to run through the entries
> in .init_array, it interprets 2-bytes of padding 0s as part of a function
> address, resulting in a function call to the incorrect location.
> 
> This issue has only recently appeared because msp430-elf was just transitioned
> to use init/fini_array by default, as mandated by the mspabi.
> 
> The other CRT array entries in crtstuff.c are not used for msp430-elf, so 
> aren't
> causing problems.
> 
> As suggested in the PR, I tried changing the alignment of the array entry from
> sizeof(func_ptr) to __alignof__(func_ptr), and this fixed the issue. In the
> attached patch I also updated the other uses of sizeof in "aligned" attributes
> to use __alignof__ instead.
> 
> I understand the alignment to sizeof(func_ptr) was originally added to fix an
> issue on mips64 (r182066
> https://gcc.gnu.org/ml/gcc-patches/2011-12/msg00393.html), however I do
> not have access to a mips64 platform for which to bootstrap on. I tried 
> building
> a cross-compiler with the "aligned" attributes removed to see if I could
> reproduce the failure, but the build completed successfully.
> I built the mips64 cross-compiler with my patch applied and it
> also completed successfully.
> 
> I successfully regtested the patch for msp430-elf. It fixes many execution
> failures for msp430-elf/-mlarge.
> I also successfully bootstrapped and regtested x86_64-pc-linux-gnu (g++ gcc
> gfortran libatomic libgomp libitm libstdc++ objc).
> 
> According to the mips eabi (is this the right one?
> http://www.cygwin.com/ml/binutils/2003-06/msg00436.html), pointers have 64-bit
> size and alignment in 64-bit mode, so I am assuming using __alignof__ instead 
> of
> sizeof isn't going to cause problems.
> 
> Ok for trunk?
> 
> 
> 0001-libgcc-crtstuff.c-Fix-incorrect-alignment-of-entries.patch
> 
> From 636ffa0c4f15047dc27c829d9a3c8fea9ad5d635 Mon Sep 17 00:00:00 2001
> From: Jozef Lawrynowicz 
> Date: Thu, 15 Aug 2019 14:17:25 +0100
> Subject: [PATCH] libgcc/crtstuff.c: Fix incorrect alignment of entries in CRT
>  data structures
> 
> libgcc/ChangeLog:
> 
> 2019-08-21  Jozef Lawrynowicz  
> 
>   * crtstuff.c (__CTOR_LIST__): Align to the "__alignof__" the array
>   element type, instead of "sizeof" the element type.
>   (__DTOR_LIST__): Likewise.
>   (__TMC_LIST__): Likewise.
>   (__do_global_dtors_aux_fini_array_entry): Likewise.
>   (__frame_dummy_init_array_entry): Likewise.
>   (__CTOR_END__): Likewise.
>   (__DTOR_END__): Likweise.
>   (__FRAME_END__): Likewise.
>   (__TMC_END__): Likewise.
OK
jeff


Re: [PATCH] fold constant flexarrays to strings (PR 91490)

2019-08-22 Thread Jeff Law
On 8/21/19 2:50 PM, Martin Sebor wrote:
> This patch is a subset of the solution for PR 91457 whose main
> goal is to eliminate inconsistencies in warnings issued for
> out-of-bounds accesses to the various flavors of flexible array
> members of constant objects.  That patch was posted here:
>   https://gcc.gnu.org/ml/gcc-patches/2019-08/msg01202.html
> 
> Like PR 91457, this patch also relies on improving optimization
> to issue better quality warnings.  (I.e., with the latter being
> what motivated it.)
> 
> The optimization enhances string_constant to recognize empty
> CONSTRUCTORs returned by fold_ctor_reference for references
> to members of constant objects with either "insufficient"
> initializers (e.g., given struct S { char n, a[]; } s = { 0 };)
> or with braced-list initializers (e.g., given
> struct S s = { 3 { 1, 2, 3, 0 } };  The patch lets string_constant
> convert the CONSTRUCTOR for s.a into a STRING_CST, which in turn
> enables the folding of calls to built-ins like strlen, strchr, or
> strcmp with such arguments.
> 
> Exposing the strings to the folder then also lets it detect and
> issue warnings for out-of-bounds offsets in more instances of
> such references than before.
> 
> The remaining changes in the patch mostly enhance the places
> that use the no-warning bit to prevent redundant diagnostics.
> 
> Tested on x86_64-linux.
> 
> Martin
> 
> gcc-91490.diff
> 
> PR middle-end/91490 - bogus argument missing terminating nul warning on 
> strlen of a flexible array member
> 
> gcc/c-family/ChangeLog:
> 
>   PR middle-end/91490
>   * c-common.c (braced_list_to_string): Add argument and overload.
>   Handle flexible length arrays.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR middle-end/91490
>   * c-c++-common/Warray-bounds-7.c: New test.
>   * gcc.dg/Warray-bounds-39.c: Expect either -Warray-bounds or
>   -Wstringop-overflow.
>   * gcc.dg/strlenopt-78.c: New test.
> 
> gcc/ChangeLog:
> 
>   PR middle-end/91490
>   * builtins.c (c_strlen): Rename argument and introduce new local.
>   Set no-warning bit on original argument.
>   * expr.c (string_constant): Pass argument type to fold_ctor_reference.
>   * gimple-fold.c (fold_nonarray_ctor_reference): Return a STRING_CST
>   for missing initializers.
>   * tree.c (build_string_literal): Handle optional argument.
>   * tree.h (build_string_literal): Add defaulted argument.
>   * gimple-ssa-warn-restrict.c (maybe_diag_access_bounds): Check
>   no-warning bit on original expression.
> 
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -11402,6 +11402,16 @@ is_aligning_offset (const_tree offset, const_tree 
> exp)
>  tree
>  string_constant (tree arg, tree *ptr_offset, tree *mem_size, tree *decl)
>  {
> +  tree dummy = NULL_TREE;;
> +  if (!mem_size)
> +mem_size = 
> +
> +  tree chartype = TREE_TYPE (arg);
> +  if (POINTER_TYPE_P (chartype))
> +chartype = TREE_TYPE (chartype);
> +  while (TREE_CODE (chartype) == ARRAY_TYPE)
> +chartype = TREE_TYPE (chartype);
> +
>tree array;
>STRIP_NOPS (arg);
So per our conversation today, I took a closer look at this.  As you
noted CHARTYPE is only used for the empty constructor code you're adding
as a part of this patch.

Rather than stripping away types like this to compute chartype, couldn't
we just use char_type_node instead of chartype in this code below?


> +  tree initsize = TYPE_SIZE_UNIT (inittype);
> +
> +  if (TREE_CODE (init) == CONSTRUCTOR && !CONSTRUCTOR_ELTS (init))
> +{
> +  /* Convert a char array to an empty STRING_CST having an array
> +  of the expected type.  */
> +  if (!initsize)
> +   initsize = integer_zero_node;
> +
> +  unsigned HOST_WIDE_INT size = tree_to_uhwi (initsize);
> +  init = build_string_literal (size ? 1 : 0, "", chartype, size);
> +  init = TREE_OPERAND (init, 0);
> +  init = TREE_OPERAND (init, 0);
> +
> +  *ptr_offset = integer_zero_node;
> +}
Per my prior message, we do need to update that test to be something like

if (TREE_CODE (init) == CONSTRUCTOR
&& !CONSTRUCTOR_ELTS
&& !TREE_CLOBBER_P (init))

While I don't think we're likely to see a TREE_CLOBBER_P here, it's best
to be safe since those have totally different meanings,

> diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
> index 2810867235d..ef942ffce57 100644
> --- a/gcc/c-family/c-common.c
> +++ b/gcc/c-family/c-common.c
> @@ -8868,7 +8876,7 @@ braced_lists_to_strings (tree type, tree ctor)
>unsigned HOST_WIDE_INT idx;
>FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (ctor), idx, val)
>   {
> -   val = braced_lists_to_strings (ttp, val);
> +   val = braced_lists_to_strings (ttp, val, code == RECORD_TYPE);
Also per our discussion, I think a comment that we might want to test
RECORD_OR_UNION_TYPE_P here instead of just testing for RECORD_TYPE is
all we need.

With those changes and a 

Re: [PATCH] Builtin function roundeven folding implementation

2019-08-22 Thread Tejas Joshi
Hi,
This is a full patch for the roundeven variants along with
documentation and additional testcases. The following code also
conforms to GNU's coding standards.

Thanks,
Tejas

2019-08-22  Tejas Joshi  
Martin Jambor  

* builtins.c (mathfn_built_in_2): Added CASE_MATHFN for ROUNDEVEN.
* builtins.def: Added function definitions for roundeven function
variants.
* fold-const-call.c (fold_const_call_ss): Added case for roundeven
function call.
* fold-const.c (negate_mathfn_p): Added case for roundeven function.
(tree_call_nonnegative_warnv_p): Added case for roundeven function.
(integer_valued_real_call_p): Added case for roundeven function.
* real.c (is_even): New function. Returns true if real number is even,
otherwise returns false.
(is_halfway_below): New function. Returns true if real number is
halfway between two integers, else return false.
(real_roundeven): New function. Round real number to nearest integer,
rounding halfway cases towards even.
* real.h (real_value): Added descriptive comments.  Added function
declaration for roundeven function.
* doc/extend.texi (Other Builtins): List roundeven variants among
functions which can be handled as builtins.

gcc/testsuite/ChangeLog:

2019-08-21  Tejas Joshi  

* gcc.dg/torture/builtin-round-roundeven.c: New test.
* gcc.dg/torture/builtin-round-roundevenf128.c: New test.

On Thu, 22 Aug 2019 at 20:03, Joseph Myers  wrote:
>
> On Thu, 22 Aug 2019, Martin Jambor wrote:
>
> > +/* Round X to nearest integer, rounding halfway cases towards even.  */
> > +
> > +void
> > +real_roundeven (REAL_VALUE_TYPE *r, format_helper fmt,
> > + const REAL_VALUE_TYPE *x)
> > +{
> > +  if (is_halfway_below (x))
> > +  {
> > +do_add (r, x, , x->sign);
> > +if (!is_even (r))
> > +  do_add (r, r, , x->sign);
>
> I'm concerned that this would produce +0.0 for an argument of -0.5 (via
> -0.5 - 0.5 - -1.0 producing +0.0) when it needs to produce -0.0.
>
> Note that testcases for the sign of zero results need to check e.g.
> !!__builtin_signbit on the result, or the result of calling
> __builtin_copysign* to extract the sign of the result, since 0.0 == -0.0
> so checking with ==, while necessary, is not sufficient in that case.
>
> --
> Joseph S. Myers
> jos...@codesourcery.com
diff --git a/gcc/builtins.c b/gcc/builtins.c
index 9a766e4ad63..5149d901a96 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -2056,6 +2056,7 @@ mathfn_built_in_2 (tree type, combined_fn fn)
 CASE_MATHFN (REMQUO)
 CASE_MATHFN_FLOATN (RINT)
 CASE_MATHFN_FLOATN (ROUND)
+CASE_MATHFN (ROUNDEVEN)
 CASE_MATHFN (SCALB)
 CASE_MATHFN (SCALBLN)
 CASE_MATHFN (SCALBN)
diff --git a/gcc/builtins.def b/gcc/builtins.def
index 6d41bdb4f44..8bb7027aac7 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -542,12 +542,18 @@ DEF_C99_BUILTIN(BUILT_IN_RINTL, "rintl", BT_FN_LONGDOUBLE_LONGDOUBLE, AT
 #define RINT_TYPE(F) BT_FN_##F##_##F
 DEF_EXT_LIB_FLOATN_NX_BUILTINS (BUILT_IN_RINT, "rint", RINT_TYPE, ATTR_CONST_NOTHROW_LEAF_LIST)
 #undef RINT_TYPE
+DEF_EXT_LIB_BUILTIN(BUILT_IN_ROUNDEVEN, "roundeven", BT_FN_DOUBLE_DOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_EXT_LIB_BUILTIN(BUILT_IN_ROUNDEVENF, "roundevenf", BT_FN_FLOAT_FLOAT, ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_EXT_LIB_BUILTIN(BUILT_IN_ROUNDEVENL, "roundevenl", BT_FN_LONGDOUBLE_LONGDOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_C99_BUILTIN(BUILT_IN_ROUND, "round", BT_FN_DOUBLE_DOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_C99_BUILTIN(BUILT_IN_ROUNDF, "roundf", BT_FN_FLOAT_FLOAT, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_C99_BUILTIN(BUILT_IN_ROUNDL, "roundl", BT_FN_LONGDOUBLE_LONGDOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST)
 #define ROUND_TYPE(F) BT_FN_##F##_##F
 DEF_EXT_LIB_FLOATN_NX_BUILTINS (BUILT_IN_ROUND, "round", ROUND_TYPE, ATTR_CONST_NOTHROW_LEAF_LIST)
 #undef ROUND_TYPE
+#define ROUNDEVEN_TYPE(F) BT_FN_##F##_##F
+DEF_EXT_LIB_FLOATN_NX_BUILTINS (BUILT_IN_ROUNDEVEN, "roundeven", ROUNDEVEN_TYPE, ATTR_CONST_NOTHROW_LEAF_LIST)
+#undef ROUNDEVEN_TYPE
 DEF_EXT_LIB_BUILTIN(BUILT_IN_SCALB, "scalb", BT_FN_DOUBLE_DOUBLE_DOUBLE, ATTR_MATHFN_FPROUNDING_ERRNO)
 DEF_EXT_LIB_BUILTIN(BUILT_IN_SCALBF, "scalbf", BT_FN_FLOAT_FLOAT_FLOAT, ATTR_MATHFN_FPROUNDING_ERRNO)
 DEF_EXT_LIB_BUILTIN(BUILT_IN_SCALBL, "scalbl", BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE, ATTR_MATHFN_FPROUNDING_ERRNO)
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 235be99abcb..4aea4d31761 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -12448,7 +12448,8 @@ Outside strict ISO C mode (@option{-ansi}, @option{-std=c90},
 @code{j1f}, @code{j1l}, @code{j1}, @code{jnf}, @code{jnl}, @code{jn},
 @code{lgammaf_r}, @code{lgammal_r}, @code{lgamma_r}, @code{mempcpy},
 @code{pow10f}, @code{pow10l}, @code{pow10}, @code{printf_unlocked},
-@code{rindex}, 

[PATCH, c-family] Fix a PCH thinko (and thus PR61250).

2019-08-22 Thread Iain Sandoe
Hi,

When we are parsing a source file, the very first token might
be a PRAGMA_GCC_PCH_PREPROCESS.  This indicates that we are going
read in a PCH file (named as the value of the pragma).  If we don't
see this pragma, then we know that it's OK to release any resources
that the host might have set aside for the PCH file.

There is a thinko in the current implementation, in that the decision
to release resources is happening unconditionally right after the first
token is extracted but before it's been checked or acted upon.

This leads to the pch bug on Darwin, because we actually do release
resources - which are subsequently (reasonably) assumed to be available
when reading a PCH file.  We then get random crashes or hangs depending
on the interaction between unmmap and malloc.

The bug is present everywhere but doesn't show on (say) Linux, since
the release of PCH resources is a NOP there.

This effects all the c-family front ends, because they all use c_lex_with_flags 
()
to implement this.

The solution is to check for the PRAGMA_GCC_PCH_PREPROCESS and only call
c_common_no_more_pch () when that is not the first token.

A secondary effect of the collection is that the name of the PCH file
can be collected during the ggc_pch_read() reset of state.  Therefore
we should issue any diagnostic that might name the file before the
collections are triggered.

Tested on x86-64-darwin16/18 (where the pch.exp tests fail roughly half the
time for any parallel testing) and pass reliably without it.
Tested on x86_64,powerpc-linux-gnu with no regressions (and no expectation
of any progression either).

Since the fix is in common code, it needs the ack of both C and C++ to apply
(I’m obviously OK with applying it from the Objective-C/C++ PoV)

OK for trunk?

given that this is a  show-stopper for PCH + -save-temps I would also like
to fix it on the open branches?

thanks
Iain

gcc/c-family/

2019-08-22  Iain Sandoe  

PR pch/61250
* c-lex.c (c_lex_with_flags):  Don't call
c_common_no_more_pch () from here.

gcc/c/

2019-08-22  Iain Sandoe  

PR pch/61250
* c-parser.c (c_parse_file): Call c_common_no_more_pch ()
after determining that the first token is not
PRAGMA_GCC_PCH_PREPROCESS.

gcc/cp/

2019-08-22  Iain Sandoe  

PR pch/61250
* parser.c (cp_parser_initial_pragma): Call c_common_no_more_pch ()
after determining that the first token is not
PRAGMA_GCC_PCH_PREPROCESS.

gcc/

2019-08-22  Iain Sandoe  

PR pch/61250
* ggc-page.c (ggc_pch_read): Read the ggc_pch_ondisk structure
and issue any diagnostics needed before collecting the pre-PCH
state.

---
 gcc/c-family/c-lex.c | 7 ---
 gcc/c/c-parser.c | 2 ++
 gcc/cp/parser.c  | 5 -
 gcc/ggc-page.c   | 5 +++--
 4 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c
index 851fd704e5..e3c602fbb8 100644
--- a/gcc/c-family/c-lex.c
+++ b/gcc/c-family/c-lex.c
@@ -394,7 +394,6 @@ enum cpp_ttype
 c_lex_with_flags (tree *value, location_t *loc, unsigned char *cpp_flags,
  int lex_flags)
 {
-  static bool no_more_pch;
   const cpp_token *tok;
   enum cpp_ttype type;
   unsigned char add_flags = 0;
@@ -628,12 +627,6 @@ c_lex_with_flags (tree *value, location_t *loc, unsigned 
char *cpp_flags,
   if (cpp_flags)
 *cpp_flags = tok->flags | add_flags;
 
-  if (!no_more_pch)
-{
-  no_more_pch = true;
-  c_common_no_more_pch ();
-}
-
   timevar_pop (TV_CPP);
 
   return type;
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 81919a89cc..cf973b3c8d 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -20232,6 +20232,8 @@ c_parse_file (void)
 
   if (c_parser_peek_token ()->pragma_kind == PRAGMA_GCC_PCH_PREPROCESS)
 c_parser_pragma_pch_preprocess ();
+  else
+c_common_no_more_pch ();
 
   the_parser = ggc_alloc ();
   *the_parser = tparser;
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index dbbfe1dbc2..ab8a0cabb4 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -41353,7 +41353,10 @@ cp_parser_initial_pragma (cp_token *first_token)
 
   cp_lexer_get_preprocessor_token (NULL, first_token);
   if (cp_parser_pragma_kind (first_token) != PRAGMA_GCC_PCH_PREPROCESS)
-return;
+{
+  c_common_no_more_pch ();
+  return;
+}
 
   cp_lexer_get_preprocessor_token (NULL, first_token);
   if (first_token->type == CPP_STRING)
diff --git a/gcc/ggc-page.c b/gcc/ggc-page.c
index a2736bc1df..220f20c5cf 100644
--- a/gcc/ggc-page.c
+++ b/gcc/ggc-page.c
@@ -2556,6 +2556,9 @@ ggc_pch_read (FILE *f, void *addr)
 
   count_old_page_tables = G.by_depth_in_use;
 
+  if (fread (, sizeof (d), 1, f) != 1)
+fatal_error (input_location, "cannot read PCH file: %m");
+
   /* We've just read in a PCH file.  So, every object that used to be
  allocated is now free.  */
   clear_marks ();
@@ -2584,8 +2587,6 @@ ggc_pch_read (FILE *f, void *addr)
 
   /* 

[PATCH 1/2] rs6000: Move various non-vector things out of altivec.md

2019-08-22 Thread Segher Boessenkool
Tested on powerpc64-linux {-m32,-m64}; committing to trunk.


Segher


2019-08-22  Segher Boessenkool  

* config/rs6000/altivec.md (unspec): Delete UNSPEC_DARN, UNSPEC_DARN_32,
UNSPEC_DARN_RAW, UNSPEC_CMPRB, UNSPEC_CMPRB2, UNSPEC_CMPEQB; move to...
* config/rs6000/rs6000.md (unspec): ... here.
* config/rs6000/altivec.md (darn_32, darn_raw, darn, cmprb,
*cmprb_internal, setb_signed, setb_unsigned, cmprb2, *cmprb2_internal,
cmpeqb, *cmpeqb_internal): Delete, move to...
* config/rs6000/rs6000.md (darn_32, darn_raw, darn, cmprb,
*cmprb_internal, setb_signed, setb_unsigned, cmprb2, *cmprb2_internal,
cmpeqb, *cmpeqb_internal): ... here.

---
 gcc/config/rs6000/altivec.md | 223 --
 gcc/config/rs6000/rs6000.md  | 224 +++
 2 files changed, 224 insertions(+), 223 deletions(-)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 3a8cd76..6fa4d80 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -80,9 +80,6 @@ (define_c_enum "unspec"
UNSPEC_VUPKHPX
UNSPEC_VUPKLPX
UNSPEC_CONVERT_4F32_8I16
-   UNSPEC_DARN
-   UNSPEC_DARN_32
-   UNSPEC_DARN_RAW
UNSPEC_DST
UNSPEC_DSTT
UNSPEC_DSTST
@@ -161,9 +158,6 @@ (define_c_enum "unspec"
UNSPEC_BCDADD
UNSPEC_BCDSUB
UNSPEC_BCD_OVERFLOW
-   UNSPEC_CMPRB
-   UNSPEC_CMPRB2
-   UNSPEC_CMPEQB
UNSPEC_VRLMI
UNSPEC_VRLNM
 ])
@@ -4107,223 +4101,6 @@ (define_insn "*bcd_test2"
   "bcd. %0,%1,%2,%3"
   [(set_attr "type" "vecsimple")])
 
-(define_insn "darn_32"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-(unspec:SI [(const_int 0)] UNSPEC_DARN_32))]
-  "TARGET_P9_MISC"
-  "darn %0,0"
-  [(set_attr "type" "integer")])
-
-(define_insn "darn_raw"
-  [(set (match_operand:DI 0 "register_operand" "=r")
-(unspec:DI [(const_int 0)] UNSPEC_DARN_RAW))]
-  "TARGET_P9_MISC && TARGET_64BIT"
-  "darn %0,2"
-  [(set_attr "type" "integer")])
-
-(define_insn "darn"
-  [(set (match_operand:DI 0 "register_operand" "=r")
-(unspec:DI [(const_int 0)] UNSPEC_DARN))]
-  "TARGET_P9_MISC && TARGET_64BIT"
-  "darn %0,1"
-  [(set_attr "type" "integer")])
-
-;; Test byte within range.
-;;
-;; The bytes of operand 1 are organized as xx:xx:xx:vv, where xx
-;; represents a byte whose value is ignored in this context and
-;; vv, the least significant byte, holds the byte value that is to
-;; be tested for membership within the range specified by operand 2.
-;; The bytes of operand 2 are organized as xx:xx:hi:lo.
-;;
-;; Return in target register operand 0 a value of 1 if lo <= vv and
-;; vv <= hi.  Otherwise, set register operand 0 to 0.
-;;
-;; Though the instructions to which this expansion maps operate on
-;; 64-bit registers, the current implementation only operates on
-;; SI-mode operands as the high-order bits provide no information
-;; that is not already available in the low-order bits.  To avoid the
-;; costs of data widening operations, future enhancements might allow
-;; DI mode for operand 0 and/or might allow operand 1 to be QI mode.
-(define_expand "cmprb"
-  [(set (match_dup 3)
-   (unspec:CC [(match_operand:SI 1 "gpc_reg_operand" "r")
-   (match_operand:SI 2 "gpc_reg_operand" "r")]
-UNSPEC_CMPRB))
-   (set (match_operand:SI 0 "gpc_reg_operand" "=r")
-   (if_then_else:SI (lt (match_dup 3)
-(const_int 0))
-(const_int -1)
-(if_then_else (gt (match_dup 3)
-  (const_int 0))
-  (const_int 1)
-  (const_int 0]
-  "TARGET_P9_MISC"
-{
-  operands[3] = gen_reg_rtx (CCmode);
-})
-
-;; The bytes of operand 1 are organized as xx:xx:xx:vv, where xx
-;; represents a byte whose value is ignored in this context and
-;; vv, the least significant byte, holds the byte value that is to
-;; be tested for membership within the range specified by operand 2.
-;; The bytes of operand 2 are organized as xx:xx:hi:lo.
-;;
-;; Set bit 1 (the GT bit, 0x4) of CR register operand 0 to 1 if
-;; lo <= vv and vv <= hi.  Otherwise, set the GT bit to 0.  The other
-;; 3 bits of the target CR register are all set to 0.
-(define_insn "*cmprb_internal"
-  [(set (match_operand:CC 0 "cc_reg_operand" "=y")
-   (unspec:CC [(match_operand:SI 1 "gpc_reg_operand" "r")
-   (match_operand:SI 2 "gpc_reg_operand" "r")]
-UNSPEC_CMPRB))]
-  "TARGET_P9_MISC"
-  "cmprb %0,0,%1,%2"
-  [(set_attr "type" "logical")])
-
-;; Set operand 0 register to -1 if the LT bit (0x8) of condition
-;; register operand 1 is on.  Otherwise, set operand 0 register to 1
-;; if the GT bit (0x4) of condition register operand 1 is on.
-;; Otherwise, set operand 0 to 0.  Note that the result stored into
-;; register operand 0 is non-zero iff either the LT or GT bits 

[PATCH 2/2] rs6000: Use unspec_volatile for darn (PR91481)

2019-08-22 Thread Segher Boessenkool
Every call to darn should deliver a *new* random number; such calls
shouldnot be CSEd together.  So they should be unspec_volatile, not
plain unspec.

Tested on powerpc64-linux {-m32,-m64}.  (New testcase coming soon).
Committing to trunk, and will backport to 9, 8, 7 as well.


Segher


2019-08-22  Segher Boessenkool  

PR target/91481
* config/rs6000/rs6000.md (unspec): Delete UNSPEC_DARN, UNSPEC_DARN_32,
and UNSPEC_DARN_RAW.
(unspecv): New enumerator values UNSPECV_DARN, UNSPECV_DARN_32, and
UNSPECV_DARN_RAW.
(darn_32): Use an unspec_volatile, and UNSPECV_DARN_32.
(darn_raw): Use an unspec_volatile, and UNSPECV_DARN_RAW.
(darn): Use an unspec_volatile, and UNSPECV_DARN.

---
 gcc/config/rs6000/rs6000.md | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 68cdf68..b0aea23 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -134,9 +134,6 @@ (define_c_enum "unspec"
UNSPEC_LSQ
UNSPEC_FUSION_GPR
UNSPEC_STACK_CHECK
-   UNSPEC_DARN
-   UNSPEC_DARN_32
-   UNSPEC_DARN_RAW
UNSPEC_CMPRB
UNSPEC_CMPRB2
UNSPEC_CMPEQB
@@ -168,6 +165,9 @@ (define_c_enum "unspecv"
UNSPECV_EH_RR   ; eh_reg_restore
UNSPECV_ISYNC   ; isync instruction
UNSPECV_MFTB; move from time base
+   UNSPECV_DARN; darn 1 (deliver a random number)
+   UNSPECV_DARN_32 ; darn 2
+   UNSPECV_DARN_RAW; darn 0
UNSPECV_NLGR; non-local goto receiver
UNSPECV_MFFS; Move from FPSCR
UNSPECV_MFFSL   ; Move from FPSCR light instruction version
@@ -14387,21 +14387,21 @@ (define_insn "*cmp_hw"
 
 (define_insn "darn_32"
   [(set (match_operand:SI 0 "register_operand" "=r")
-(unspec:SI [(const_int 0)] UNSPEC_DARN_32))]
+(unspec_volatile:SI [(const_int 0)] UNSPECV_DARN_32))]
   "TARGET_P9_MISC"
   "darn %0,0"
   [(set_attr "type" "integer")])
 
 (define_insn "darn_raw"
   [(set (match_operand:DI 0 "register_operand" "=r")
-(unspec:DI [(const_int 0)] UNSPEC_DARN_RAW))]
+(unspec_volatile:DI [(const_int 0)] UNSPECV_DARN_RAW))]
   "TARGET_P9_MISC && TARGET_64BIT"
   "darn %0,2"
   [(set_attr "type" "integer")])
 
 (define_insn "darn"
   [(set (match_operand:DI 0 "register_operand" "=r")
-(unspec:DI [(const_int 0)] UNSPEC_DARN))]
+(unspec_volatile:DI [(const_int 0)] UNSPECV_DARN))]
   "TARGET_P9_MISC && TARGET_64BIT"
   "darn %0,1"
   [(set_attr "type" "integer")])
-- 
1.8.3.1



dwarf2out suspicious code

2019-08-22 Thread Nathan Sidwell

Honza, Ricard,

I fell over this assert in dwarf2out.c, but the later comment says it 
can happen??  does a checking build fail on LTO?


static inline void
add_AT_die_ref (dw_die_ref die, enum dwarf_attribute attr_kind, 
dw_die_ref targ_die)

{
  dw_attr_node attr;
  gcc_checking_assert (targ_die != NULL);

  /* With LTO we can end up trying to reference something we didn't create
 a DIE for.  Avoid crashing later on a NULL referenced DIE.  */
  if (targ_die == NULL)
return;

nathan
--
Nathan Sidwell


Re: [PATCH] issue consistent warning for past-the-end array stores (PR 91457)

2019-08-22 Thread Jason Merrill
On Thu, Aug 22, 2019 at 3:43 AM Richard Biener
 wrote:
> On Wed, Aug 21, 2019 at 10:10 PM Martin Sebor  wrote:
> > On 8/20/19 1:26 AM, Richard Biener wrote:
> > > On Tue, Aug 20, 2019 at 4:32 AM Martin Sebor  wrote:
> > >> On 8/19/19 8:10 AM, Richard Biener wrote:
> > >>> On Sat, Aug 17, 2019 at 12:43 AM Martin Sebor  wrote:
> > 
> >  With the recent enhancement to the strlen handling of multibyte
> >  stores the g++.dg/warn/Warray-bounds-4.C for zero-length arrays
> >  started failing on hppa (and probably elsewhere as well).  This
> >  is partly the result of the added detection of past-the-end
> >  writes into the strlen pass which detects more instances of
> >  the problem than -Warray-bounds.  Since the IL each warning
> >  works with varies between targets, the same invalid code can
> >  be diagnosed by one warning one target and different warning
> >  on another.
> > 
> >  The attached patch does three things:
> > 
> >  1) It enhances compute_objsize to also determine the size of
> >  a flexible array member (and its various variants), including
> >  from its initializer if necessary.  (This resolves 91457 but
> >  introduces another warning where was previously just one.)
> >  2) It guards the new instance of -Wstringop-overflow with
> >  the no-warning bit on the assignment to avoid warning on code
> >  that's already been diagnosed.
> >  3) It arranges for -Warray-bounds to set the no-warning bit on
> >  the enclosing expression to keep -Wstringop-overflow from issuing
> >  another warning for the same problem.
> > 
> >  Testing the compute_objsize enhancement to bring it up to par
> >  with -Warray-bounds in turn exposed a weakness in the latter
> >  warning for flexible array members.  Rather than snowballing
> >  additional improvements into this one I decided to put that
> >  off until later, so the new -Warray-bounds test has a bunch
> >  of XFAILs.  I'll see if I can find the time to deal with those
> >  either still in stage 1 or in stage 3 (one of them is actually
> >  an ancient regression).
> > >>>
> > >>> +static tree
> > >>> +get_initializer_for (tree init, tree decl)
> > >>> +{
> > >>>
> > >>> can't you use fold_ctor_reference here?
> > >>
> > >> Yes, but only with an additional enhancement.  Char initializers
> > >> for flexible array members aren't transformed to STRING_CSTs yet,
> > >> so without the size of the initializer specified, the function
> > >> returns the initializer for the smallest subobject, or char in
> > >> this case.  I've enhanced the function to handle them.
> > >
> > > So at the moment it returns an empty array constructor, correct?
> > > Isn't that the more canonical representation?  The STRING_CST
> > > index type doesn't really handle "empty" strings and I expect code
> > > more confused about that than about an empty CTOR?
> >
> > Yes.  Returning an empty CTOR is more general than an empty
> > string and enables more optimizations.  It requires changing
> > the caller(s) that look for a string but I think that's fine.
> > Thanks for the hint!
> >
> > >>> +/* Determine the size of the flexible array FLD from the initializer
> > >>> +   expression for the struct object DECL in which the meber is declared
> > >>> +   (possibly recursively).  Return the size or zero constant if it 
> > >>> isn't
> > >>> +   initialized.  */
> > >>> +
> > >>> +static tree
> > >>> +get_flexarray_size (tree decl, tree fld)
> > >>> +{
> > >>> +  if (tree init = DECL_INITIAL (decl))
> > >>> +{
> > >>> +  init = get_initializer_for (init, fld);
> > >>> +  if (init)
> > >>> +   return TYPE_SIZE_UNIT (TREE_TYPE (init));
> > >>> +}
> > >>> +
> > >>> +  return integer_zero_node;
> > >>>
> > >>> so you're hoping that the (sub-)CONSTRUCTOR get_initializer_for
> > >>> returns has a complete type but the initialized object didn't get it
> > >>> completed.  Isnt that wishful thinking?
> > >>
> > >> I don't know what you mean.  When might a CONSTRUCTOR not have
> > >> a complete type, and if/when it doesn't, why would that be
> > >> a problem here?  TYPE_SIZE_UNIT will evaluate to null meaning
> > >> "don't know" and that's fine.  Could you try to be more specific
> > >> about the problem you're pointing out?
> > >>
> > >>> And why return integer_zero_node
> > >>> rather than NULL_TREE here?
> > >>
> > >> Because the size of a flexible array member with no initializer
> > >> is zero.
> > >>
> > >>>
> > >>> +  if (TREE_CODE (dest) == COMPONENT_REF)
> > >>> +{
> > >>> +  *pdecl = TREE_OPERAND (dest, 1);
> > >>> +
> > >>> +  /* If the member has a size return it.  Otherwise it's a flexible
> > >>> +array member.  */
> > >>> +  if (tree size = DECL_SIZE_UNIT (*pdecl))
> > >>> +   return size;
> > >>>
> > >>> because here you do.
> > >>
> > >> Not sure what you mean here either.  (This code was also a bit
> > >
> > > You 

Re: [patch, fortran] Fix PR 91390 - treatment of extra parameter in a subroutine call

2019-08-22 Thread Steve Kargl
On Thu, Aug 22, 2019 at 08:50:20PM +0200, Thomas Koenig wrote:
> Am 20.08.19 um 22:32 schrieb Thomas König:
> 
> > here is the next installment of checking for mismatched calls,
> > this time for mismatching CALLs.
> 
> The reorganization of the code also means that
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91519 (a rejects-valid
> regression) is fixed by this patch.
> 
> So, OK for trunk?
> 

I'll have time to look over the patch on Saturday.
If someone can give the patch a review, then go
for it.

-- 
Steve


Re: [patch, fortran] Fix PR 91390 - treatment of extra parameter in a subroutine call

2019-08-22 Thread Thomas Koenig

Am 20.08.19 um 22:32 schrieb Thomas König:


here is the next installment of checking for mismatched calls,
this time for mismatching CALLs.


The reorganization of the code also means that
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91519 (a rejects-valid
regression) is fixed by this patch.

So, OK for trunk?

Regards

Thomas


Re: C++ PATCH for c++/79817 - attribute deprecated on namespace

2019-08-22 Thread Jason Merrill
On Thu, Aug 22, 2019 at 11:01 AM Nathan Sidwell  wrote:
>
> On 8/20/19 9:03 PM, Marek Polacek wrote:
>
> > and in cp_parser_nested_name_specifier_opt we simply don't know if we're
> > dealing with a function decl.  Calling cp_warn_deprecated_use_scopes from
> > cp_parser_type_specifier resulted int duplicated diagnostics so that one
> > is out too.  So I did the following which doesn't seem too bad.
>
> >
> > diff --git gcc/cp/decl.c gcc/cp/decl.c
> > index 08b7baa40e0..46ad0271f7b 100644
> > --- gcc/cp/decl.c
> > +++ gcc/cp/decl.c
> > @@ -10791,6 +10791,7 @@ grokdeclarator (const cp_declarator *declarator,
> > cp_warn_deprecated_use (type);
> > if (type && TREE_CODE (type) == TYPE_DECL)
> >   {
> > +  cp_warn_deprecated_use_scopes (DECL_CONTEXT (type));
>
> CP_DECL_CONTEXT would be clearer, here and elsewhere.
>
> > /* Do warn about using typedefs to a deprecated class.  */
> > diff --git gcc/cp/decl2.c gcc/cp/decl2.c
> > index a32108f9d16..d6f407d7aef 100644
> > --- gcc/cp/decl2.c
> > +++ gcc/cp/decl2.c
> > @@ -5407,6 +5407,23 @@ cp_warn_deprecated_use (tree decl, tsubst_flags_t 
> > complain)
> > return warned;
> >   }
> >
> > +/* Like above, but takes into account outer scopes.  */
> > +
> > +void
> > +cp_warn_deprecated_use_scopes (tree ns)
>
> Do we need to walk non-namespace scopes here?  can we just bail if NS is
> not a namespace?  if can legitimately not be a namespace, calling it NS
> is confusing :)

It seems like it can be a class in some cases, and I would want to
warn about deprecated classes named in a nested-name-specifier.  Did
we not already warn about that?

I agree that the parameter name is confusing.

> > +{
> > +  while (ns
> > +  && ns != error_mark_node
> > +  && ns != global_namespace)
> > +{
> > +  cp_warn_deprecated_use (ns);
> > +  if (TYPE_P (ns))
> ... and does this ever trigger?
>
> > + ns = CP_TYPE_CONTEXT (ns);
> > +  else
> > + ns = CP_DECL_CONTEXT (ns);
> > +}
> > +}
>
> I always worry about such recursive lookups.  NAMESPACE_DECL has so many
> spare flags, could we take one to say 'is, or contained in, deprecated',
> and thus know whether we can bail early.  And stop at the first
> deprecated one -- though not sure why someone would deprecate more than
> one namespace in a nest.  thoughts?

I can imagine deprecating an inner namespace and later deprecating an
outer namespace, but I don't think it's important to warn about more
than one in that case.

Jason


Re: C++ PATCH for c++/79817 - attribute deprecated on namespace

2019-08-22 Thread Nathan Sidwell

On 8/20/19 9:03 PM, Marek Polacek wrote:


and in cp_parser_nested_name_specifier_opt we simply don't know if we're
dealing with a function decl.  Calling cp_warn_deprecated_use_scopes from
cp_parser_type_specifier resulted int duplicated diagnostics so that one
is out too.  So I did the following which doesn't seem too bad.




diff --git gcc/cp/decl.c gcc/cp/decl.c
index 08b7baa40e0..46ad0271f7b 100644
--- gcc/cp/decl.c
+++ gcc/cp/decl.c
@@ -10791,6 +10791,7 @@ grokdeclarator (const cp_declarator *declarator,
cp_warn_deprecated_use (type);
if (type && TREE_CODE (type) == TYPE_DECL)
  {
+  cp_warn_deprecated_use_scopes (DECL_CONTEXT (type));


CP_DECL_CONTEXT would be clearer, here and elsewhere.


/* Do warn about using typedefs to a deprecated class.  */
diff --git gcc/cp/decl2.c gcc/cp/decl2.c
index a32108f9d16..d6f407d7aef 100644
--- gcc/cp/decl2.c
+++ gcc/cp/decl2.c
@@ -5407,6 +5407,23 @@ cp_warn_deprecated_use (tree decl, tsubst_flags_t 
complain)
return warned;
  }
  
+/* Like above, but takes into account outer scopes.  */

+
+void
+cp_warn_deprecated_use_scopes (tree ns)


Do we need to walk non-namespace scopes here?  can we just bail if NS is 
not a namespace?  if can legitimately not be a namespace, calling it NS 
is confusing :)



+{
+  while (ns
+&& ns != error_mark_node
+&& ns != global_namespace)
+{
+  cp_warn_deprecated_use (ns);
+  if (TYPE_P (ns))

... and does this ever trigger?


+   ns = CP_TYPE_CONTEXT (ns);
+  else
+   ns = CP_DECL_CONTEXT (ns);
+}
+}


I always worry about such recursive lookups.  NAMESPACE_DECL has so many 
spare flags, could we take one to say 'is, or contained in, deprecated', 
and thus know whether we can bail early.  And stop at the first 
deprecated one -- though not sure why someone would deprecate more than 
one namespace in a nest.  thoughts?


otherwise looks good, with a good set of tests.

nathan

--
Nathan Sidwell


[Patch, Fortran] CO_BROADCAST for derived types with allocatable components

2019-08-22 Thread Alessandro Fanfarillo
Dear all,
please find in attachment a preliminary patch that adds support to
co_broadcast for allocatable components of derived types.
The patch is currently ignoring the stat and errmsg arguments, mostly
because I am not sure how to handle them properly. I have created a
new data structure called used to pass those argument to the
preexisting structure_alloc_comps.
Suggestions on how to handle them are more than welcome :-)

The patch builds correctly on x86_64 and it has been tested with
OpenCoarrays and the following test cases:

https://github.com/sourceryinstitute/OpenCoarrays/blob/co_broadcast-derived-type/src/tests/unit/collectives/co_broadcast_allocatable_components.f90

https://github.com/sourceryinstitute/OpenCoarrays/blob/co_broadcast-derived-type/src/tests/unit/collectives/co_broadcast_allocatable_components_array.f90

Regards,
commit b9458ff4414615263ed92d8965c93fd0a953f4a9
Author: Alessandro Fanfarillo 
Date:   Thu Aug 22 10:50:17 2019 -0600

Co_broadcast derived types with allocatable components

diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index c8d74e588dd..005646f1359 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -8571,13 +8571,15 @@ gfc_caf_is_dealloc_only (int caf_mode)
 
 enum {DEALLOCATE_ALLOC_COMP = 1, NULLIFY_ALLOC_COMP,
   COPY_ALLOC_COMP, COPY_ONLY_ALLOC_COMP, REASSIGN_CAF_COMP,
-  ALLOCATE_PDT_COMP, DEALLOCATE_PDT_COMP, CHECK_PDT_DUMMY};
+  ALLOCATE_PDT_COMP, DEALLOCATE_PDT_COMP, CHECK_PDT_DUMMY,
+  BCAST_ALLOC_COMP};
 
 static gfc_actual_arglist *pdt_param_list;
 
 static tree
 structure_alloc_comps (gfc_symbol * der_type, tree decl,
-		   tree dest, int rank, int purpose, int caf_mode)
+		   tree dest, int rank, int purpose, int caf_mode,
+		   gfc_co_subroutines_args *args)
 {
   gfc_component *c;
   gfc_loopinfo loop;
@@ -8663,14 +8665,14 @@ structure_alloc_comps (gfc_symbol * der_type, tree decl,
 	  && !caf_enabled (caf_mode))
 	{
 	  tmp = build_fold_indirect_ref_loc (input_location,
-	 gfc_conv_array_data (dest));
+	 gfc_conv_array_data (dest));
 	  dref = gfc_build_array_ref (tmp, index, NULL);
 	  tmp = structure_alloc_comps (der_type, vref, dref, rank,
-   COPY_ALLOC_COMP, 0);
+   COPY_ALLOC_COMP, 0, args);
 	}
   else
 	tmp = structure_alloc_comps (der_type, vref, NULL_TREE, rank, purpose,
- caf_mode);
+ caf_mode, args);
 
   gfc_add_expr_to_block (, tmp);
 
@@ -8704,13 +8706,13 @@ structure_alloc_comps (gfc_symbol * der_type, tree decl,
   if (purpose == DEALLOCATE_ALLOC_COMP && der_type->attr.pdt_type)
 {
   tmp = structure_alloc_comps (der_type, decl, NULL_TREE, rank,
-   DEALLOCATE_PDT_COMP, 0);
+   DEALLOCATE_PDT_COMP, 0, args);
   gfc_add_expr_to_block (, tmp);
 }
   else if (purpose == ALLOCATE_PDT_COMP && der_type->attr.alloc_comp)
 {
   tmp = structure_alloc_comps (der_type, decl, NULL_TREE, rank,
-   NULLIFY_ALLOC_COMP, 0);
+   NULLIFY_ALLOC_COMP, 0, args);
   gfc_add_expr_to_block (, tmp);
 }
 
@@ -8732,6 +8734,128 @@ structure_alloc_comps (gfc_symbol * der_type, tree decl,
 
   switch (purpose)
 	{
+
+	case BCAST_ALLOC_COMP:
+
+	  tree ubound;
+	  tree cdesc;
+	  stmtblock_t derived_type_block;
+
+	  gfc_init_block ();
+
+	  comp = fold_build3_loc (input_location, COMPONENT_REF, ctype,
+  decl, cdecl, NULL_TREE);
+
+	  /* Shortcut to get the attributes of the component.  */
+	  if (c->ts.type == BT_CLASS)
+	{
+	  attr = _DATA (c)->attr;
+	  if (attr->class_pointer)
+		continue;
+	}
+	  else
+	{
+	  attr = >attr;
+	  if (attr->pointer)
+		continue;
+	}
+
+	  add_when_allocated = NULL_TREE;
+	  if (cmp_has_alloc_comps
+	  && !c->attr.pointer && !c->attr.proc_pointer)
+	{
+	  /* Add checked deallocation of the components.  This code is
+		 obviously added because the finalizer is not trusted to free
+		 all memory.  */
+	  if (c->ts.type == BT_CLASS)
+		{
+		  rank = CLASS_DATA (c)->as ? CLASS_DATA (c)->as->rank : 0;
+		  add_when_allocated
+		  = structure_alloc_comps (CLASS_DATA (c)->ts.u.derived,
+	   comp, NULL_TREE, rank, purpose,
+	   caf_mode, args);
+		}
+	  else
+		{
+		  rank = c->as ? c->as->rank : 0;
+		  add_when_allocated = structure_alloc_comps (c->ts.u.derived,
+			  comp, NULL_TREE,
+			  rank, purpose,
+			  caf_mode, args);
+		}
+	}
+
+	  gfc_init_block (_type_block);
+	  if (add_when_allocated)
+	gfc_add_expr_to_block (_type_block, add_when_allocated);
+	  tmp = gfc_finish_block (_type_block);
+	  gfc_add_expr_to_block (, tmp);
+
+	  /* Convert the component into a rank 1 descriptor type.  */
+	  if (attr->dimension)
+	{
+	  tmp = gfc_get_element_type (TREE_TYPE (comp));
+	  ubound = gfc_full_array_size (, comp,
+	c->ts.type == BT_CLASS
+	? CLASS_DATA (c)->as->rank
+	: c->as->rank);
+	}
+	  else
+	{
+	  

Re: [RFA] [tree-optimization/80576] Handle non-constant sizes in DSE

2019-08-22 Thread Martin Sebor

On 8/21/19 4:34 PM, Jeff Law wrote:

On 8/16/19 4:21 PM, Martin Sebor wrote:

On 8/16/19 12:15 PM, Jeff Law wrote:

On 8/16/19 12:09 PM, Marc Glisse wrote:

On Fri, 16 Aug 2019, Jeff Law wrote:


This patch improves our ability to detect dead stores by handling cases
where the size memcpy, memset, strncpy, etc call is not constant.  This
addresses some, but not all, of the issues in 80576.

The key here is when the size is not constant we can make conservative
decisions that still give us a chance to analyze the code for dead
stores.

Remember that for dead store elimination, we're trying to prove that
given two stores, the second store overwrites (partially or fully) the
same memory locations as the first store.  That makes the first store
either partially or fully dead.

When we encounter the first store, we set up a bitmap of bytes written
by that store (live_bytes).  We then look at subsequent stores and
clear
the appropriate entries in the bitmap.

If the first store has a nonconstant length argument we can use the
range of the length argument (max) and the size of the destination
object to make a conservative estimation of how many bytes are written.

For the second store the conservative thing to do for a non-constant
length is to use the minimum of the range of the length argument.


So I guess it won't handle things like

void f(char*p,int n){
    __builtin_memset(p,3,n);
    __builtin_memset(p,7,n);
}

where we know nothing about the length, except that it is the same? Or
do you look at symbolic ranges?

Nope.  I think ao_ref can represent that, so it'd just be a matter of
recording "n" as the length, then verifying that the second call's
length is "n" as well.  That makes the first call dead.  We'd have to
bypass the byte tracking in that case, but I think that's trivial
because we already have a means to do that when the sizes are too large.




This doesn't come up a lot in practice.  But it also happens to put
some
of the infrastructure in place to handle strcpy and strcpy_chk which
are
needed to fully resolve 80576.

Bootstrapped and regression tested on x86, x86_64, ppc64le, ppc64,
ppc32, aarch64, sparc, s390x and probably others.  Also verified that
the tests work on the various *-elf targets in my tester.

OK for the trunk?


ENOPATCH

Opps.    Attached.


It's an improvement and I realize you said it doesn't handle
everything and that you don't think it comes up a lot, but...
I would actually expect the following example (from the bug)
not to be that uncommon:

   void g (char *s)
   {
     char a[8];
     __builtin_strcpy (a, s);
     __builtin_memset (a, 0, sizeof a);
     f (a);
   }

or at least to be more common than the equivalent alternative
the improvement does optimize:

   void g (char *s)
   {
     char a[8];
     __builtin_memcpy (a, s,  __builtin_strlen (s));
     __builtin_memset (a, 0, 8);
     f (a);
   }

It seems that making the first one work should be just a matter
of handling strcpy analogously (the string length doesn't matter).

As an aside, the new tests make me realize that -Wstringop-overflow
should be enhanced to detect this problem (i.e., a consider
the largest size in a PHI).

Certainly not seeing much of these in the code I've looked at.  It may
be a case that aliasing gets in the way often.

Also note that we've got the same issues in this space that we did with
the strlen optimization improvements for last year.  I totally spaced
that and the net result may be we have to avoid using the type size for
anything in DSE which may make it impossible to really do a good job.


The issue with the strlen optimization was that it relied on
the type of subobjects of multidimensional arrays and structs
to determine the maximum valid size of the access to them.
This ran afoul of assumptions in code that doesn't respect
subobject boundaries.

This is not a concern for outermost objects like in the example
above.  strlen and other parts of GCC still make use of their
size for optimization.  The middle-end diagnostics I've been
adding still expect code to respect subobject boundaries.  I've
been doing that for this very reason: to let us do a better job
not just optimizing code, but also diagnosing bugs: find more
real problems with fewer false positives.

Martin


Re: [PATCH] issue consistent warning for past-the-end array stores (PR 91457)

2019-08-22 Thread Jeff Law
On 8/20/19 1:26 AM, Richard Biener wrote:
> On Tue, Aug 20, 2019 at 4:32 AM Martin Sebor  wrote:
>>
>> On 8/19/19 8:10 AM, Richard Biener wrote:
>>> On Sat, Aug 17, 2019 at 12:43 AM Martin Sebor  wrote:

 With the recent enhancement to the strlen handling of multibyte
 stores the g++.dg/warn/Warray-bounds-4.C for zero-length arrays
 started failing on hppa (and probably elsewhere as well).  This
 is partly the result of the added detection of past-the-end
 writes into the strlen pass which detects more instances of
 the problem than -Warray-bounds.  Since the IL each warning
 works with varies between targets, the same invalid code can
 be diagnosed by one warning one target and different warning
 on another.

 The attached patch does three things:

 1) It enhances compute_objsize to also determine the size of
 a flexible array member (and its various variants), including
 from its initializer if necessary.  (This resolves 91457 but
 introduces another warning where was previously just one.)
 2) It guards the new instance of -Wstringop-overflow with
 the no-warning bit on the assignment to avoid warning on code
 that's already been diagnosed.
 3) It arranges for -Warray-bounds to set the no-warning bit on
 the enclosing expression to keep -Wstringop-overflow from issuing
 another warning for the same problem.

 Testing the compute_objsize enhancement to bring it up to par
 with -Warray-bounds in turn exposed a weakness in the latter
 warning for flexible array members.  Rather than snowballing
 additional improvements into this one I decided to put that
 off until later, so the new -Warray-bounds test has a bunch
 of XFAILs.  I'll see if I can find the time to deal with those
 either still in stage 1 or in stage 3 (one of them is actually
 an ancient regression).
>>>
>>> +static tree
>>> +get_initializer_for (tree init, tree decl)
>>> +{
>>>
>>> can't you use fold_ctor_reference here?
>>
>> Yes, but only with an additional enhancement.  Char initializers
>> for flexible array members aren't transformed to STRING_CSTs yet,
>> so without the size of the initializer specified, the function
>> returns the initializer for the smallest subobject, or char in
>> this case.  I've enhanced the function to handle them.
> 
> So at the moment it returns an empty array constructor, correct?
> Isn't that the more canonical representation?  The STRING_CST
> index type doesn't really handle "empty" strings and I expect code
> more confused about that than about an empty CTOR?
> 
>>>
>>> +/* Determine the size of the flexible array FLD from the initializer
>>> +   expression for the struct object DECL in which the meber is declared
>>> +   (possibly recursively).  Return the size or zero constant if it isn't
>>> +   initialized.  */
>>> +
>>> +static tree
>>> +get_flexarray_size (tree decl, tree fld)
>>> +{
>>> +  if (tree init = DECL_INITIAL (decl))
>>> +{
>>> +  init = get_initializer_for (init, fld);
>>> +  if (init)
>>> +   return TYPE_SIZE_UNIT (TREE_TYPE (init));
>>> +}
>>> +
>>> +  return integer_zero_node;
>>>
>>> so you're hoping that the (sub-)CONSTRUCTOR get_initializer_for
>>> returns has a complete type but the initialized object didn't get it
>>> completed.  Isnt that wishful thinking?
>>
>> I don't know what you mean.  When might a CONSTRUCTOR not have
>> a complete type, and if/when it doesn't, why would that be
>> a problem here?  TYPE_SIZE_UNIT will evaluate to null meaning
>> "don't know" and that's fine.  Could you try to be more specific
>> about the problem you're pointing out?
>>
>>> And why return integer_zero_node
>>> rather than NULL_TREE here?
>>
>> Because the size of a flexible array member with no initializer
>> is zero.
>>
>>>
>>> +  if (TREE_CODE (dest) == COMPONENT_REF)
>>> +{
>>> +  *pdecl = TREE_OPERAND (dest, 1);
>>> +
>>> +  /* If the member has a size return it.  Otherwise it's a flexible
>>> +array member.  */
>>> +  if (tree size = DECL_SIZE_UNIT (*pdecl))
>>> +   return size;
>>>
>>> because here you do.
>>
>> Not sure what you mean here either.  (This code was also a bit
> 
> You return NULL_TREE.
> 
>> out of date WRT to the patch I had tested.  Not sure how that
>> happened.  The attached patch is up to date.)
>>
>>>
>>> Also once you have an underlying VAR_DECL you can compute
>>> the flexarray size by DECL_SIZE (var) - offset-of flexarray member.
>>> Isn't that way cheaper than walking the initializer (possibly many
>>> times?)
>>
>> It would be nice if it were this easy.  Is the value of DECL_SIZE
>> (var) supposed to include the size of the flexible array member?
> 
> Yes, DECL_SIZE of the VAR_DECL is the size we use for assembling.
> It is usually only the types that remain incomplete (or too small) since
> the FE doesn't create many variants of a struct S { int n; char x[]; }
> when "instantiating" it 

Re: [testsuite, i386] Fix gcc.target/i386/minmax-4.c etc. on 32-bit Solaris/x86

2019-08-22 Thread Jeff Law
On 8/22/19 9:18 AM, Rainer Orth wrote:
> The new gcc.target/i386/minmax-4.c etc. testcases currently FAIL on
> 32-bit Solaris/x86:
> 
> FAIL: gcc.target/i386/minmax-4.c scan-assembler-times pmaxsd 1
> FAIL: gcc.target/i386/minmax-4.c scan-assembler-times pmaxud 1
> FAIL: gcc.target/i386/minmax-4.c scan-assembler-times pminsd 1
> FAIL: gcc.target/i386/minmax-4.c scan-assembler-times pminud 1
> FAIL: gcc.target/i386/minmax-5.c scan-assembler-times vpmaxsd 1
> FAIL: gcc.target/i386/minmax-5.c scan-assembler-times vpmaxud 1
> FAIL: gcc.target/i386/minmax-5.c scan-assembler-times vpminsd 1
> FAIL: gcc.target/i386/minmax-5.c scan-assembler-times vpminud 1
> FAIL: gcc.target/i386/minmax-6.c scan-assembler pmaxsd
> FAIL: gcc.target/i386/minmax-7.c scan-assembler pminsd
> FAIL: gcc.target/i386/pr91154.c scan-assembler-not cmov
> FAIL: gcc.target/i386/pr91154.c scan-assembler-times paddd 2
> FAIL: gcc.target/i386/pr91154.c scan-assembler-times pmaxsd 2
> 
> I'd mentioned this in PRs target/91154 and target/91498 where Uros
> pointed out that 32-bit Solaris/x86 only guarantees 4-byte stack
> alignment, thus STV is disabled.
> 
> In line with several other STV tests, adding -mno-stackrealign to the
> options fixes the failures.
> 
> Tested with the appropriate runtest invocation on i386-pc-solaris2.11
> and x86_64-pc-linux-gnu.
> 
> Ok for mainline?
OK.
Jeff


[PATCH][arm][committed] Fix use of CRC32 intrinsics with Armv8-a and hard-float

2019-08-22 Thread Kyrill Tkachov

Hi all,

We currently have a nasty error when trying to use the __crc* intrinsics 
with an -mfloat-abi=hard.
That is because the target pragma guarding them uses armv8-a+crc that 
does not include fp by default.

So we get errors like:
error: '-mfloat-abi=hard': selected processor lacks an FPU

This patch fixes that by using an FP-enabled arch target pragma to guard 
these intrinsics when floating-point is available.

That way both the softfloat and hardfloat variants work.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Committing to trunk. Will backport to branches later.

Thanks,
Kyrill

2019-08-22  Kyrylo Tkachov 

    * config/arm/arm_acle.h: Use arch=armv8-a+crc+simd pragma for CRC32
    intrinsics if __ARM_FP.
    Use __ARM_FEATURE_CRC32 ifdef guard.

2019-08-22  Kyrylo Tkachov 

    * gcc.target/arm/acle/crc_hf_1.c: New test.

diff --git a/gcc/config/arm/arm_acle.h b/gcc/config/arm/arm_acle.h
index 2c7acc698ea50e0cd539a7d9647498746cd1536f..6857ab1787df0ffa672e5078e5a0b9c9cc52e695 100644
--- a/gcc/config/arm/arm_acle.h
+++ b/gcc/config/arm/arm_acle.h
@@ -174,8 +174,12 @@ __arm_mrrc2 (const unsigned int __coproc, const unsigned int __opc1,
 #endif /* (!__thumb__ || __thumb2__) &&  __ARM_ARCH >= 4.  */
 
 #pragma GCC push_options
-#if __ARM_ARCH >= 8
+#ifdef __ARM_FEATURE_CRC32
+#ifdef __ARM_FP
+#pragma GCC target ("arch=armv8-a+crc+simd")
+#else
 #pragma GCC target ("arch=armv8-a+crc")
+#endif
 
 __extension__ static __inline uint32_t __attribute__ ((__always_inline__))
 __crc32b (uint32_t __a, uint8_t __b)
@@ -235,7 +239,7 @@ __crc32cd (uint32_t __a, uint64_t __b)
 }
 #endif
 
-#endif /* __ARM_ARCH >= 8.  */
+#endif /* __ARM_FEATURE_CRC32  */
 #pragma GCC pop_options
 
 #ifdef __cplusplus
diff --git a/gcc/testsuite/gcc.target/arm/acle/crc_hf_1.c b/gcc/testsuite/gcc.target/arm/acle/crc_hf_1.c
new file mode 100644
index ..e6cbfc0b33e56e4275b96978ca1823d7682792fb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/crc_hf_1.c
@@ -0,0 +1,14 @@
+/* Test that using an Armv8-a hard-float target doesn't
+   break CRC intrinsics.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_hard_vfp_ok }  */
+/* { dg-options "-mfloat-abi=hard -march=armv8-a+simd+crc" } */
+
+#include 
+
+uint32_t
+foo (uint32_t a, uint32_t b)
+{
+  return __crc32cw (a, b);
+}


[PATCH][Arm] Add support for missing CPUs

2019-08-22 Thread Dennis Zhang
Hi all,

This patch adds '-mcpu' options for following CPUs:
Cortex-M35P, Cortex-A77, Cortex-A76AE.

Related specifications are as following:
https://developer.arm.com/ip-products/processors/cortex-m
https://developer.arm.com/ip-products/processors/cortex-a

Bootstraped/Regtested for arm-none-linux-gnueabihf.

Please help to check.

Cheers
Dennis

gcc/ChangeLog:

2019-07-29  Dennis Zhang  

* config/arm/arm-cpus.in: New entries for Cortex-M35P,
Cortex-A76AE and Cortex-A77.
* config/arm/arm-tables.opt: Regenerated.
* config/arm/arm-tune.md: Likewise.
* doc/invoke.texi: Document the added processors.
diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 3a55f6ac6d2..f8a3b3db67a 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -1331,6 +1331,28 @@ begin cpu cortex-a76
  part d0b
 end cpu cortex-a76
 
+begin cpu cortex-a76ae
+ cname cortexa76ae
+ tune for cortex-a57
+ tune flags LDSCHED
+ architecture armv8.2-a+fp16+dotprod+simd
+ option crypto add FP_ARMv8 CRYPTO
+ costs cortex_a57
+ vendor 41
+ part d0e
+end cpu cortex-a76ae
+
+begin cpu cortex-a77
+ cname cortexa77
+ tune for cortex-a57
+ tune flags LDSCHED
+ architecture armv8.2-a+fp16+dotprod+simd
+ option crypto add FP_ARMv8 CRYPTO
+ costs cortex_a57
+ vendor 41
+ part d0d
+end cpu cortex-a77
+
 begin cpu neoverse-n1
  cname neoversen1
  alias !ares
@@ -1379,6 +1401,15 @@ begin cpu cortex-m33
  costs v7m
 end cpu cortex-m33
 
+begin cpu cortex-m35p
+ cname cortexm35p
+ tune flags LDSCHED
+ architecture armv8-m.main+dsp+fp
+ option nofp remove ALL_FP
+ option nodsp remove armv7em
+ costs v7m
+end cpu cortex-m35p
+
 # V8 R-profile implementations.
 begin cpu cortex-r52
  cname cortexr52
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index bba54aea3d6..aeb5b3fbf62 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -234,6 +234,12 @@ Enum(processor_type) String(cortex-a75) Value( TARGET_CPU_cortexa75)
 EnumValue
 Enum(processor_type) String(cortex-a76) Value( TARGET_CPU_cortexa76)
 
+EnumValue
+Enum(processor_type) String(cortex-a76ae) Value( TARGET_CPU_cortexa76ae)
+
+EnumValue
+Enum(processor_type) String(cortex-a77) Value( TARGET_CPU_cortexa77)
+
 EnumValue
 Enum(processor_type) String(neoverse-n1) Value( TARGET_CPU_neoversen1)
 
@@ -249,6 +255,9 @@ Enum(processor_type) String(cortex-m23) Value( TARGET_CPU_cortexm23)
 EnumValue
 Enum(processor_type) String(cortex-m33) Value( TARGET_CPU_cortexm33)
 
+EnumValue
+Enum(processor_type) String(cortex-m35p) Value( TARGET_CPU_cortexm35p)
+
 EnumValue
 Enum(processor_type) String(cortex-r52) Value( TARGET_CPU_cortexr52)
 
diff --git a/gcc/config/arm/arm-tune.md b/gcc/config/arm/arm-tune.md
index b9dfb66ec84..6fa5bb27750 100644
--- a/gcc/config/arm/arm-tune.md
+++ b/gcc/config/arm/arm-tune.md
@@ -44,7 +44,8 @@
 	cortexa73,exynosm1,xgene1,
 	cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,
 	cortexa73cortexa53,cortexa55,cortexa75,
-	cortexa76,neoversen1,cortexa75cortexa55,
-	cortexa76cortexa55,cortexm23,cortexm33,
+	cortexa76,cortexa76ae,cortexa77,
+	neoversen1,cortexa75cortexa55,cortexa76cortexa55,
+	cortexm23,cortexm33,cortexm35p,
 	cortexr52"
 	(const (symbol_ref "((enum attr_tune) arm_tune)")))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 29585cf15aa..42a9d7f81ed 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -17510,10 +17510,12 @@ Permissible names are: @samp{arm7tdmi}, @samp{arm7tdmi-s}, @samp{arm710t},
 @samp{cortex-a9}, @samp{cortex-a12}, @samp{cortex-a15}, @samp{cortex-a17},
 @samp{cortex-a32}, @samp{cortex-a35}, @samp{cortex-a53}, @samp{cortex-a55},
 @samp{cortex-a57}, @samp{cortex-a72}, @samp{cortex-a73}, @samp{cortex-a75},
-@samp{cortex-a76}, @samp{ares}, @samp{cortex-r4}, @samp{cortex-r4f},
+@samp{cortex-a76}, @samp{cortex-a76ae}, @samp{cortex-a77},
+@samp{ares}, @samp{cortex-r4}, @samp{cortex-r4f},
 @samp{cortex-r5}, @samp{cortex-r7}, @samp{cortex-r8}, @samp{cortex-r52},
 @samp{cortex-m0}, @samp{cortex-m0plus}, @samp{cortex-m1}, @samp{cortex-m3},
 @samp{cortex-m4}, @samp{cortex-m7}, @samp{cortex-m23}, @samp{cortex-m33},
+@samp{cortex-m35p},
 @samp{cortex-m1.small-multiply}, @samp{cortex-m0.small-multiply},
 @samp{cortex-m0plus.small-multiply}, @samp{exynos-m1}, @samp{marvell-pj4},
 @samp{neoverse-n1}, @samp{xscale}, @samp{iwmmxt}, @samp{iwmmxt2},
@@ -17577,14 +17579,14 @@ The following extension options are common to the listed CPUs:
 
 @table @samp
 @item +nodsp
-Disable the DSP instructions on @samp{cortex-m33}.
+Disable the DSP instructions on @samp{cortex-m33}, @samp{cortex-m35p}.
 
 @item  +nofp
 Disables the floating-point instructions on @samp{arm9e},
 @samp{arm946e-s}, @samp{arm966e-s}, @samp{arm968e-s}, @samp{arm10e},
 @samp{arm1020e}, @samp{arm1022e}, @samp{arm926ej-s},
 @samp{arm1026ej-s}, @samp{cortex-r5}, @samp{cortex-r7}, @samp{cortex-r8},
-@samp{cortex-m4}, @samp{cortex-m7} and @samp{cortex-m33}.

[testsuite, i386] Fix gcc.target/i386/minmax-4.c etc. on 32-bit Solaris/x86

2019-08-22 Thread Rainer Orth
The new gcc.target/i386/minmax-4.c etc. testcases currently FAIL on
32-bit Solaris/x86:

FAIL: gcc.target/i386/minmax-4.c scan-assembler-times pmaxsd 1
FAIL: gcc.target/i386/minmax-4.c scan-assembler-times pmaxud 1
FAIL: gcc.target/i386/minmax-4.c scan-assembler-times pminsd 1
FAIL: gcc.target/i386/minmax-4.c scan-assembler-times pminud 1
FAIL: gcc.target/i386/minmax-5.c scan-assembler-times vpmaxsd 1
FAIL: gcc.target/i386/minmax-5.c scan-assembler-times vpmaxud 1
FAIL: gcc.target/i386/minmax-5.c scan-assembler-times vpminsd 1
FAIL: gcc.target/i386/minmax-5.c scan-assembler-times vpminud 1
FAIL: gcc.target/i386/minmax-6.c scan-assembler pmaxsd
FAIL: gcc.target/i386/minmax-7.c scan-assembler pminsd
FAIL: gcc.target/i386/pr91154.c scan-assembler-not cmov
FAIL: gcc.target/i386/pr91154.c scan-assembler-times paddd 2
FAIL: gcc.target/i386/pr91154.c scan-assembler-times pmaxsd 2

I'd mentioned this in PRs target/91154 and target/91498 where Uros
pointed out that 32-bit Solaris/x86 only guarantees 4-byte stack
alignment, thus STV is disabled.

In line with several other STV tests, adding -mno-stackrealign to the
options fixes the failures.

Tested with the appropriate runtest invocation on i386-pc-solaris2.11
and x86_64-pc-linux-gnu.

Ok for mainline?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2019-08-21  Rainer Orth  

* gcc.target/i386/minmax-4.c: Add -mno-stackrealign to dg-options.
* gcc.target/i386/minmax-5.c: Likewise.
* gcc.target/i386/minmax-6.c: Likewise.
* gcc.target/i386/minmax-7.c: Likewise.
* gcc.target/i386/pr91154.c: Likewise.

# HG changeset patch
# Parent  82b696ad52afcccff42b7b86c591cff9732e667e
Fix gcc.target/i386/minmax-4.c etc. on 32-bit Solaris/x86

diff --git a/gcc/testsuite/gcc.target/i386/minmax-4.c b/gcc/testsuite/gcc.target/i386/minmax-4.c
--- a/gcc/testsuite/gcc.target/i386/minmax-4.c
+++ b/gcc/testsuite/gcc.target/i386/minmax-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mstv -msse4.1" } */
+/* { dg-options "-O2 -mstv -mno-stackrealign -msse4.1" } */
 
 #include "minmax-3.c"
 
diff --git a/gcc/testsuite/gcc.target/i386/minmax-5.c b/gcc/testsuite/gcc.target/i386/minmax-5.c
--- a/gcc/testsuite/gcc.target/i386/minmax-5.c
+++ b/gcc/testsuite/gcc.target/i386/minmax-5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mstv -mavx512vl" } */
+/* { dg-options "-O2 -mstv -mno-stackrealign -mavx512vl" } */
 
 #include "minmax-3.c"
 
diff --git a/gcc/testsuite/gcc.target/i386/minmax-6.c b/gcc/testsuite/gcc.target/i386/minmax-6.c
--- a/gcc/testsuite/gcc.target/i386/minmax-6.c
+++ b/gcc/testsuite/gcc.target/i386/minmax-6.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -march=haswell" } */
+/* { dg-options "-O2 -march=haswell -mno-stackrealign" } */
 
 unsigned short
 UMVLine16Y_11 (short unsigned int * Pic, int y, int width)
diff --git a/gcc/testsuite/gcc.target/i386/minmax-7.c b/gcc/testsuite/gcc.target/i386/minmax-7.c
--- a/gcc/testsuite/gcc.target/i386/minmax-7.c
+++ b/gcc/testsuite/gcc.target/i386/minmax-7.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -march=haswell" } */
+/* { dg-options "-O2 -march=haswell -mno-stackrealign" } */
 
 extern int numBins;
 extern int binOffst;
diff --git a/gcc/testsuite/gcc.target/i386/pr91154.c b/gcc/testsuite/gcc.target/i386/pr91154.c
--- a/gcc/testsuite/gcc.target/i386/pr91154.c
+++ b/gcc/testsuite/gcc.target/i386/pr91154.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -msse4.1 -mstv" } */
+/* { dg-options "-O2 -msse4.1 -mstv -mno-stackrealign" } */
 
 void foo (int *dc, int *mc, int *tpdd, int *tpmd, int M)
 {


[Arm] Add 16-bit thumb alternatives to iorsi3_compare0[_scratch]

2019-08-22 Thread Richard Earnshaw (lists)
The iorsi3_compare0 and iorsi3_compare0_scratch patterns can make use of 
the 16-bit thumb orrs instruction if suitable registers are allocated. 
This patch adds the alternative to allow this to happen.


* config/arm/arm.md (iorsi3_compare0): Add alternative for 16-bit thumb
insn.
(iorsi3_compare0_scratch): Likewise.

Committed to trunk.

R.
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 50e1b908f59..4ba246c 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -3339,27 +3339,33 @@ (define_peephole2
 
 (define_insn "*iorsi3_compare0"
   [(set (reg:CC_NOOV CC_REGNUM)
-	(compare:CC_NOOV (ior:SI (match_operand:SI 1 "s_register_operand" "%r,r")
- (match_operand:SI 2 "arm_rhs_operand" "I,r"))
-			 (const_int 0)))
-   (set (match_operand:SI 0 "s_register_operand" "=r,r")
+	(compare:CC_NOOV
+	 (ior:SI (match_operand:SI 1 "s_register_operand" "%r,0,r")
+		 (match_operand:SI 2 "arm_rhs_operand" "I,l,r"))
+	 (const_int 0)))
+   (set (match_operand:SI 0 "s_register_operand" "=r,l,r")
 	(ior:SI (match_dup 1) (match_dup 2)))]
   "TARGET_32BIT"
   "orrs%?\\t%0, %1, %2"
   [(set_attr "conds" "set")
-   (set_attr "type" "logics_imm,logics_reg")]
+   (set_attr "arch" "*,t2,*")
+   (set_attr "length" "4,2,4")
+   (set_attr "type" "logics_imm,logics_reg,logics_reg")]
 )
 
 (define_insn "*iorsi3_compare0_scratch"
   [(set (reg:CC_NOOV CC_REGNUM)
-	(compare:CC_NOOV (ior:SI (match_operand:SI 1 "s_register_operand" "%r,r")
- (match_operand:SI 2 "arm_rhs_operand" "I,r"))
-			 (const_int 0)))
-   (clobber (match_scratch:SI 0 "=r,r"))]
+	(compare:CC_NOOV
+	 (ior:SI (match_operand:SI 1 "s_register_operand" "%r,0,r")
+		 (match_operand:SI 2 "arm_rhs_operand" "I,l,r"))
+	 (const_int 0)))
+   (clobber (match_scratch:SI 0 "=r,l,r"))]
   "TARGET_32BIT"
   "orrs%?\\t%0, %1, %2"
   [(set_attr "conds" "set")
-   (set_attr "type" "logics_imm,logics_reg")]
+   (set_attr "arch" "*,t2,*")
+   (set_attr "length" "4,2,4")
+   (set_attr "type" "logics_imm,logics_reg,logics_reg")]
 )
 
 (define_expand "xordi3"


Re: [RFA] [tree-optimization/80576] Handle non-constant sizes in DSE

2019-08-22 Thread Martin Sebor

On 8/21/19 4:47 PM, Jeff Law wrote:

On 8/19/19 7:57 AM, Richard Biener wrote:



+static tree
+objsize_from_type (tree object)
+{
+  if (TREE_CODE (object) != ADDR_EXPR)
+return NULL_TREE;
+
+  tree type = TREE_TYPE (object);
+  if (POINTER_TYPE_P (type))

that if looks suspicious...  I'd say
   if (!POINTER_TYPE_P (type))
 return NULL_TREE;

is better

Sure.  But we may not be able to use this anyway as you noted later,
depending on the type is not safe.



+type = TREE_TYPE (type);

+  if (TREE_CODE (type) == ARRAY_TYPE && !array_at_struct_end_p (object))
+{

array_at_struct_end_p will never return true here since you pass it
an ADDR_EXPR...  you wanted to pass TREE_OPERAND (object, 0) here?

I think you're right.  Given this was cribbed from the tail of another
function elsewhere, I suspect that other function has the same issue.

I suspect ultimately we want to fix that other copy and drop
objsize_from_type.


Is the other copy compute_objsize in builtins.c?  The function
has a comment that says it "is intended for diagnostics a should
not be used to influence code generation or optimization."
I added that comment on your request, not because the function
relies the type of objects but because it conservatively uses
the lower bound of ranges of offsets into arrays to determine
the minimum size.

Martin


Re: [PATCH] Builtin function roundeven folding implementation

2019-08-22 Thread Joseph Myers
On Thu, 22 Aug 2019, Martin Jambor wrote:

> +/* Round X to nearest integer, rounding halfway cases towards even.  */
> +
> +void
> +real_roundeven (REAL_VALUE_TYPE *r, format_helper fmt,
> + const REAL_VALUE_TYPE *x)
> +{
> +  if (is_halfway_below (x))
> +  {
> +do_add (r, x, , x->sign);
> +if (!is_even (r))
> +  do_add (r, r, , x->sign);

I'm concerned that this would produce +0.0 for an argument of -0.5 (via 
-0.5 - 0.5 - -1.0 producing +0.0) when it needs to produce -0.0.

Note that testcases for the sign of zero results need to check e.g. 
!!__builtin_signbit on the result, or the result of calling 
__builtin_copysign* to extract the sign of the result, since 0.0 == -0.0 
so checking with ==, while necessary, is not sufficient in that case.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [RFC] Enable math functions linking with static library for LTO

2019-08-22 Thread Richard Biener
On August 22, 2019 2:36:47 PM GMT+02:00, Joseph Myers  
wrote:
>On Thu, 22 Aug 2019, Richard Biener wrote:
>
>> Not sure if that helps answering your question.  In essence,
>> this function should tell whether we would emit an UNDEF
>> in the end and we want to know that early, at LTO IL generation
>> time to communicate this to the linker plugin.  Given previous
>
>To me this sounds like e.g. roundeven and fadd (and, generally, any new
>
>libm functions we add built-in functions for) *should* be included in
>the 
>list like all the other libm functions, given that on many systems they
>
>can't be expanded inline for all arguments and so will generate UNDEFs 
>(and some people might want to get them from a static library) - even
>if 
>some systems' libm might in fact not have those functions.

Yes.

Richard. 



Re: [PATCH] i386: Roundeven expansion for SSE4.1+

2019-08-22 Thread Martin Jambor
Hi,

On Wed, Jul 31 2019, Uros Bizjak wrote:
> On Wed, Jul 31, 2019 at 7:51 AM Tejas Joshi  wrote:
>>
>> Hi.
>>
>> > > * gcc.target/i386/avx-vround-roundeven-1.c: New test.
>> > > * gcc.target/i386/avx-vround-roundeven-2.c: New test.
>> >
>> > roundss and roundsd are sse4_1 instructions, also please change tests
>> > to use -O2:
>>
>> I have made the following changes you suggested and changed the file names 
>> to:
>>
>> * gcc.target/i386/sse4_1-round-roundeven-1.c: New test.
>> * gcc.target/i386/sse4_1-round-roundeven-2.c: New test.
>
> +++ b/gcc/testsuite/gcc.target/i386/sse4_1-round-roundeven-1.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-mavx" } */
> +/* { dg-options "-msse4.1" } */
>
> -O2 -msse4.1
>
> OK with the above change.
>

Given that Jeff approved the middle-end bits, I intend to commit the
following on Tejas's behalf tomorrow after I commit
https://gcc.gnu.org/ml/gcc-patches/2019-08/msg01567.html (assuming there
won't be any final objections to it).

It is the combination of the patches in this thread with the final minor
testcase issue fixed.  I have bootstrapped and tested it on x86_64-linux
and for good measure also on aarch64-linux.

Thanks,

Martin


gcc/ChangeLog:

2019-08-22  Tejas Joshi  
Uros Bizjak  

* builtins.c (mathfn_built_in_2): Change CASE_MATHFN to
CASE_MATHFN_FLOATN for roundeven.
* config/i386/i386.c (ix86_i387_mode_needed): Add case
I387_ROUNDEVEN.
(ix86_mode_needed): Likewise.
(ix86_mode_after): Likewise.
(ix86_mode_entry): Likewise.
(ix86_mode_exit): Likewise.
(ix86_emit_mode_set): Likewise.
(emit_i387_cw_initialization): Add case I387_CW_ROUNDEVEN.
* config/i386/i386.h (ix86_stack_slot) : Add SLOT_CW_ROUNDEVEN.
(ix86_entry): Add I387_ROUNDEVEN.
(avx_u128_state): Add I387_CW_ANY.
* config/i386/i386.md: Define UNSPEC_FRNDINT_ROUNDEVEN.
(define_int_iterator): Likewise.
(define_int_attr): Likewise for rounding_insn, rounding and ROUNDING.
(define_constant): Define ROUND_ROUNDEVEN mode.
(define_attr): Add roundeven mode for i387_cw.
(2): Add condition for ROUND_ROUNDEVEN.
* internal-fn.def (ROUNDEVEN): New builtin function.
* optabs.def (roundeven_optab): New optab.

gcc/testsuite/ChangeLog:

2019-08-22  Tejas Joshi  

* gcc.target/i386/sse4_1-round-roundeven-1.c: New test.
* gcc.target/i386/sse4_1-round-roundeven-2.c: New test.
---
 gcc/builtins.c|  2 +-
 gcc/config/i386/i386.c| 16 +
 gcc/config/i386/i386.h|  4 +++-
 gcc/config/i386/i386.md   | 23 ---
 gcc/internal-fn.def   |  1 +
 gcc/optabs.def|  1 +
 gcc/reg-stack.c   |  1 +
 .../i386/sse4_1-round-roundeven-1.c   | 17 ++
 .../i386/sse4_1-round-roundeven-2.c   | 15 
 9 files changed, 70 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/sse4_1-round-roundeven-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse4_1-round-roundeven-2.c

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 5149d901a96..e44866e2a60 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -2056,7 +2056,7 @@ mathfn_built_in_2 (tree type, combined_fn fn)
 CASE_MATHFN (REMQUO)
 CASE_MATHFN_FLOATN (RINT)
 CASE_MATHFN_FLOATN (ROUND)
-CASE_MATHFN (ROUNDEVEN)
+CASE_MATHFN_FLOATN (ROUNDEVEN)
 CASE_MATHFN (SCALB)
 CASE_MATHFN (SCALBLN)
 CASE_MATHFN (SCALBN)
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 647bcbef050..46d19c88c1f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -13565,6 +13565,11 @@ ix86_i387_mode_needed (int entity, rtx_insn *insn)
 
   switch (entity)
 {
+case I387_ROUNDEVEN:
+  if (mode == I387_CW_ROUNDEVEN)
+   return mode;
+  break;
+
 case I387_TRUNC:
   if (mode == I387_CW_TRUNC)
return mode;
@@ -13599,6 +13604,7 @@ ix86_mode_needed (int entity, rtx_insn *insn)
   return ix86_dirflag_mode_needed (insn);
 case AVX_U128:
   return ix86_avx_u128_mode_needed (insn);
+case I387_ROUNDEVEN:
 case I387_TRUNC:
 case I387_FLOOR:
 case I387_CEIL:
@@ -13659,6 +13665,7 @@ ix86_mode_after (int entity, int mode, rtx_insn *insn)
   return mode;
 case AVX_U128:
   return ix86_avx_u128_mode_after (mode, insn);
+case I387_ROUNDEVEN:
 case I387_TRUNC:
 case I387_FLOOR:
 case I387_CEIL:
@@ -13711,6 +13718,7 @@ ix86_mode_entry (int entity)
   return ix86_dirflag_mode_entry ();
 case AVX_U128:
   return ix86_avx_u128_mode_entry ();
+case I387_ROUNDEVEN:
 case I387_TRUNC:
 case I387_FLOOR:
 case I387_CEIL:
@@ -13748,6 +13756,7 @@ ix86_mode_exit (int entity)
  

Re: [PATCH] Builtin function roundeven folding implementation

2019-08-22 Thread Martin Jambor
Hi,

On Wed, Aug 21 2019, Joseph Myers wrote:
> On Wed, 21 Aug 2019, Martin Jambor wrote:
>
>>   - I have listed roundeven variants in extend.texi.  If I did not find
>> the right spot, I will gladly move to a more appropriate one.
>
> I don't think they should be documented with the __builtin_* that are 
> always expanded inline.  They should be documented in the list of 
> functions that *might* be expanded inline.  That is, where extend.texi 
> says:
>
>   [...]  Many of these
>   functions are only optimized in certain cases; if they are not optimized in
>   a particular case, a call to the library function is emitted.
>
>   @opindex ansi
>   @opindex std
>   Outside strict ISO C mode (@option{-ansi}, @option{-std=c90},
>   @option{-std=c99} or @option{-std=c11}), the functions
>   @code{_exit}, [...]
>   may be handled as built-in functions.
>   All these functions have corresponding versions
>   prefixed with @code{__builtin_}, which may be used even in strict C90
>   mode.
>
> (Until we enable these by default for C2X, at which point they'd be listed 
> as C2X functions after the C99 ones.)
>

that indeed makes more sense.  I have changed that in the patch below.
I hope the patch is good to be committed now and would like to do that
tomorrow.

I have bootstrapped and tested it on x86_64-linux and aarch64-linux and
I also checked the documentation changes with make info and make pdf.

Thanks,

Martin



gcc/ChangeLog:

2019-08-22  Tejas Joshi  
Martin Jambor  

* builtins.c (mathfn_built_in_2): Added CASE_MATHFN for ROUNDEVEN.
* builtins.def: Added function definitions for roundeven function
variants.
* fold-const-call.c (fold_const_call_ss): Added case for roundeven
function call.
* fold-const.c (negate_mathfn_p): Added case for roundeven function.
(tree_call_nonnegative_warnv_p): Added case for roundeven function.
(integer_valued_real_call_p): Added case for roundeven function.
* real.c (is_even): New function. Returns true if real number is even,
otherwise returns false.
(is_halfway_below): New function. Returns true if real number is
halfway between two integers, else return false.
(real_roundeven): New function. Round real number to nearest integer,
rounding halfway cases towards even.
* real.h (real_value): Added descriptive comments.  Added function
declaration for roundeven function.
* doc/extend.texi (Other Builtins): List roundeven variants among
functions which can be handled as builtins.

gcc/testsuite/ChangeLog:

2019-08-21  Tejas Joshi  

* gcc.dg/torture/builtin-round-roundeven.c: New test.
* gcc.dg/torture/builtin-round-roundevenf128.c: New test.
---
 gcc/builtins.c|  1 +
 gcc/builtins.def  |  6 ++
 gcc/doc/extend.texi   |  3 +-
 gcc/fold-const-call.c | 23 +++--
 gcc/fold-const.c  |  6 ++
 gcc/real.c| 83 +++
 gcc/real.h|  9 ++
 .../gcc.dg/torture/builtin-round-roundeven.c  | 23 +
 .../torture/builtin-round-roundevenf128.c | 20 +
 9 files changed, 168 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/builtin-round-roundeven.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/builtin-round-roundevenf128.c

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 9a766e4ad63..5149d901a96 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -2056,6 +2056,7 @@ mathfn_built_in_2 (tree type, combined_fn fn)
 CASE_MATHFN (REMQUO)
 CASE_MATHFN_FLOATN (RINT)
 CASE_MATHFN_FLOATN (ROUND)
+CASE_MATHFN (ROUNDEVEN)
 CASE_MATHFN (SCALB)
 CASE_MATHFN (SCALBLN)
 CASE_MATHFN (SCALBN)
diff --git a/gcc/builtins.def b/gcc/builtins.def
index 6d41bdb4f44..8bb7027aac7 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -542,12 +542,18 @@ DEF_C99_BUILTIN(BUILT_IN_RINTL, "rintl", 
BT_FN_LONGDOUBLE_LONGDOUBLE, AT
 #define RINT_TYPE(F) BT_FN_##F##_##F
 DEF_EXT_LIB_FLOATN_NX_BUILTINS (BUILT_IN_RINT, "rint", RINT_TYPE, 
ATTR_CONST_NOTHROW_LEAF_LIST)
 #undef RINT_TYPE
+DEF_EXT_LIB_BUILTIN(BUILT_IN_ROUNDEVEN, "roundeven", BT_FN_DOUBLE_DOUBLE, 
ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_EXT_LIB_BUILTIN(BUILT_IN_ROUNDEVENF, "roundevenf", BT_FN_FLOAT_FLOAT, 
ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_EXT_LIB_BUILTIN(BUILT_IN_ROUNDEVENL, "roundevenl", 
BT_FN_LONGDOUBLE_LONGDOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_C99_BUILTIN(BUILT_IN_ROUND, "round", BT_FN_DOUBLE_DOUBLE, 
ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_C99_BUILTIN(BUILT_IN_ROUNDF, "roundf", BT_FN_FLOAT_FLOAT, 
ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_C99_BUILTIN(BUILT_IN_ROUNDL, "roundl", 
BT_FN_LONGDOUBLE_LONGDOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST)
 #define ROUND_TYPE(F) BT_FN_##F##_##F
 

[RFC] [AARCH64] Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC

2019-08-22 Thread Shaokun Zhang
The DCache clean & ICache invalidation requirements for instructions
to be data coherence are discoverable through new fields in CTR_EL0.
Let's support the two bits if they are enabled, then we can get some
performance benefit from this feature.

2019-08-22  Shaokun Zhang  

* config/aarch64/sync-cache.c: Support CTR_EL0.IDC and CTR_EL0.DIC

---
 libgcc/config/aarch64/sync-cache.c | 56 --
 1 file changed, 35 insertions(+), 21 deletions(-)

diff --git a/libgcc/config/aarch64/sync-cache.c 
b/libgcc/config/aarch64/sync-cache.c
index 791f5e42ff44..0b057efbdcab 100644
--- a/libgcc/config/aarch64/sync-cache.c
+++ b/libgcc/config/aarch64/sync-cache.c
@@ -23,6 +23,9 @@ a copy of the GCC Runtime Library Exception along with this 
program;
 see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 .  */
 
+#define CTR_IDC_SHIFT   28
+#define CTR_DIC_SHIFT   29
+
 void __aarch64_sync_cache_range (const void *, const void *);
 
 void
@@ -41,32 +44,43 @@ __aarch64_sync_cache_range (const void *base, const void 
*end)
   icache_lsize = 4 << (cache_info & 0xF);
   dcache_lsize = 4 << ((cache_info >> 16) & 0xF);
 
-  /* Loop over the address range, clearing one cache line at once.
- Data cache must be flushed to unification first to make sure the
- instruction cache fetches the updated data.  'end' is exclusive,
- as per the GNU definition of __clear_cache.  */
+  /* If CTR_EL0.IDC is enabled, Data cache clean to the Point of Unification is
+ not required for instruction to data coherence.  */
+
+  if ((cache_info >> CTR_IDC_SHIFT) & 0x1 == 0x0) {
+/* Loop over the address range, clearing one cache line at once.
+   Data cache must be flushed to unification first to make sure the
+   instruction cache fetches the updated data.  'end' is exclusive,
+   as per the GNU definition of __clear_cache.  */
 
-  /* Make the start address of the loop cache aligned.  */
-  address = (const char*) ((__UINTPTR_TYPE__) base
-  & ~ (__UINTPTR_TYPE__) (dcache_lsize - 1));
+/* Make the start address of the loop cache aligned.  */
+address = (const char*) ((__UINTPTR_TYPE__) base
+& ~ (__UINTPTR_TYPE__) (dcache_lsize - 1));
 
-  for (; address < (const char *) end; address += dcache_lsize)
-asm volatile ("dc\tcvau, %0"
- :
- : "r" (address)
- : "memory");
+for (; address < (const char *) end; address += dcache_lsize)
+  asm volatile ("dc\tcvau, %0"
+   :
+   : "r" (address)
+   : "memory");
+  }
 
   asm volatile ("dsb\tish" : : : "memory");
 
-  /* Make the start address of the loop cache aligned.  */
-  address = (const char*) ((__UINTPTR_TYPE__) base
-  & ~ (__UINTPTR_TYPE__) (icache_lsize - 1));
+  /* If CTR_EL0.DIC is enabled, Instruction cache cleaning to the Point of
+ Unification is not required for instruction to data coherence.  */
+
+  if ((cache_info >> CTR_DIC_SHIFT) & 0x1 == 0x0) {
+/* Make the start address of the loop cache aligned.  */
+address = (const char*) ((__UINTPTR_TYPE__) base
+& ~ (__UINTPTR_TYPE__) (icache_lsize - 1));
 
-  for (; address < (const char *) end; address += icache_lsize)
-asm volatile ("ic\tivau, %0"
- :
- : "r" (address)
- : "memory");
+for (; address < (const char *) end; address += icache_lsize)
+  asm volatile ("ic\tivau, %0"
+   :
+   : "r" (address)
+   : "memory");
 
-  asm volatile ("dsb\tish; isb" : : : "memory");
+asm volatile ("dsb\tish" : : : "memory");
+  }
+  asm volatile("isb" : : : "memory")
 }
-- 
2.7.4



PING^2 [PATCH v2] S/390: Improve storing asan frame_pc

2019-08-22 Thread Ilya Leoshkevich


>> Am 02.07.2019 um 17:34 schrieb Ilya Leoshkevich :
>> 
>> Bootstrap and regtest running on x86_64-redhat-linux, s390x-redhat-linux
>> and ppc64le-redhat-linux.
>> 
>> Currently s390 emits the following sequence to store a frame_pc:
>> 
>>  a:
>>  .LASANPC0:
>> 
>>  lg  %r1,.L5-.L4(%r13)
>>  la  %r1,0(%r1,%r12)
>>  stg %r1,176(%r11)
>> 
>>  .L5:
>>  .quad   .LASANPC0@GOTOFF
>> 
>> The reason GOT indirection is used instead of larl is that gcc does not
>> know that .LASANPC0, being a code label, is aligned on a 2-byte
>> boundary, and larl can load only even addresses.
>> 
>> This patch provides such an alignment hint.  Since targets don't provide
>> their instruction alignments yet, the new macro is introduced for that
>> purpose.  It returns 1-byte alignment by default, so this change is a
>> no-op for targets other than s390.
>> 
>> As a result, we get the desired:
>> 
>>  larl%r1,.LASANPC0
>>  stg %r1,176(%r11)
>> 
>> gcc/ChangeLog:
>> 
>> 2019-06-28  Ilya Leoshkevich  
>> 
>>  * asan.c (asan_emit_stack_protection): Provide an alignment
>>  hint.
>>  * config/s390/s390.h (CODE_LABEL_BOUNDARY): Specify that s390
>>  requires code labels to be aligned on a 2-byte boundary.
>>  * defaults.h (CODE_LABEL_BOUNDARY): New macro.
>>  * doc/tm.texi: Document CODE_LABEL_BOUNDARY.
>>  * doc/tm.texi.in: Likewise.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>> 2019-06-28  Ilya Leoshkevich  
>> 
>>  * gcc.target/s390/asan-no-gotoff.c: New test.
>> ---
>> gcc/asan.c |  1 +
>> gcc/config/s390/s390.h |  3 +++
>> gcc/defaults.h |  5 +
>> gcc/doc/tm.texi|  4 
>> gcc/doc/tm.texi.in |  4 
>> gcc/testsuite/gcc.target/s390/asan-no-gotoff.c | 15 +++
>> 6 files changed, 32 insertions(+)
>> create mode 100644 gcc/testsuite/gcc.target/s390/asan-no-gotoff.c
>> 
>> diff --git a/gcc/asan.c b/gcc/asan.c
>> index 605d04f87f7..2db69f476bc 100644
>> --- a/gcc/asan.c
>> +++ b/gcc/asan.c
>> @@ -1523,6 +1523,7 @@ asan_emit_stack_protection (rtx base, rtx pbase, 
>> unsigned int alignb,
>>  DECL_INITIAL (decl) = decl;
>>  TREE_ASM_WRITTEN (decl) = 1;
>>  TREE_ASM_WRITTEN (id) = 1;
>> +  SET_DECL_ALIGN (decl, CODE_LABEL_BOUNDARY);
>>  emit_move_insn (mem, expand_normal (build_fold_addr_expr (decl)));
>>  shadow_base = expand_binop (Pmode, lshr_optab, base,
>>gen_int_shift_amount (Pmode, ASAN_SHADOW_SHIFT),
>> diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h
>> index 969f58a2ba0..3d0266c9dff 100644
>> --- a/gcc/config/s390/s390.h
>> +++ b/gcc/config/s390/s390.h
>> @@ -334,6 +334,9 @@ extern const char *s390_host_detect_local_cpu (int argc, 
>> const char **argv);
>> /* Allocation boundary (in *bits*) for the code of a function.  */
>> #define FUNCTION_BOUNDARY 64
>> 
>> +/* Alignment required for a code label, in bits.  */
>> +#define CODE_LABEL_BOUNDARY 16
>> +
>> /* There is no point aligning anything to a rounder boundary than this.  */
>> #define BIGGEST_ALIGNMENT 64
>> 
>> diff --git a/gcc/defaults.h b/gcc/defaults.h
>> index af7ea185f1e..97c4c17537d 100644
>> --- a/gcc/defaults.h
>> +++ b/gcc/defaults.h
>> @@ -1459,4 +1459,9 @@ see the files COPYING3 and COPYING.RUNTIME 
>> respectively.  If not, see
>> #define DWARF_GNAT_ENCODINGS_DEFAULT DWARF_GNAT_ENCODINGS_GDB
>> #endif
>> 
>> +/* Alignment required for a code label, in bits.  */
>> +#ifndef CODE_LABEL_BOUNDARY
>> +#define CODE_LABEL_BOUNDARY BITS_PER_UNIT
>> +#endif
>> +
>> #endif  /* ! GCC_DEFAULTS_H */
>> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
>> index 14c1ea6a323..3b50fc0c0a7 100644
>> --- a/gcc/doc/tm.texi
>> +++ b/gcc/doc/tm.texi
>> @@ -1019,6 +1019,10 @@ to a value equal to or larger than 
>> @code{STACK_BOUNDARY}.
>> Alignment required for a function entry point, in bits.
>> @end defmac
>> 
>> +@defmac CODE_LABEL_BOUNDARY
>> +Alignment required for a code label, in bits.
>> +@end defmac
>> +
>> @defmac BIGGEST_ALIGNMENT
>> Biggest alignment that any data type can require on this machine, in
>> bits.  Note that this is not the biggest alignment that is supported,
>> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
>> index b4d57b86e2f..ab038b7462c 100644
>> --- a/gcc/doc/tm.texi.in
>> +++ b/gcc/doc/tm.texi.in
>> @@ -969,6 +969,10 @@ to a value equal to or larger than 
>> @code{STACK_BOUNDARY}.
>> Alignment required for a function entry point, in bits.
>> @end defmac
>> 
>> +@defmac CODE_LABEL_BOUNDARY
>> +Alignment required for a code label, in bits.
>> +@end defmac
>> +
>> @defmac BIGGEST_ALIGNMENT
>> Biggest alignment that any data type can require on this machine, in
>> bits.  Note that this is not the biggest alignment that is supported,
>> diff --git a/gcc/testsuite/gcc.target/s390/asan-no-gotoff.c 
>> 

[PATCH v2 9/9] S/390: Test signaling FP comparison instructions

2019-08-22 Thread Ilya Leoshkevich
gcc/testsuite/ChangeLog:

2019-08-09  Ilya Leoshkevich  

* gcc.target/s390/s390.exp: Enable Fortran tests.
* gcc.target/s390/zvector/autovec-double-quiet-eq.c: New test.
* gcc.target/s390/zvector/autovec-double-quiet-ge.c: New test.
* gcc.target/s390/zvector/autovec-double-quiet-gt.c: New test.
* gcc.target/s390/zvector/autovec-double-quiet-le.c: New test.
* gcc.target/s390/zvector/autovec-double-quiet-lt.c: New test.
* gcc.target/s390/zvector/autovec-double-quiet-ordered.c: New test.
* gcc.target/s390/zvector/autovec-double-quiet-uneq.c: New test.
* gcc.target/s390/zvector/autovec-double-quiet-unordered.c: New test.
* gcc.target/s390/zvector/autovec-double-signaling-eq-z13-finite.c: New 
test.
* gcc.target/s390/zvector/autovec-double-signaling-eq-z13.c: New test.
* gcc.target/s390/zvector/autovec-double-signaling-eq.c: New test.
* gcc.target/s390/zvector/autovec-double-signaling-ge-z13-finite.c: New 
test.
* gcc.target/s390/zvector/autovec-double-signaling-ge-z13.c: New test.
* gcc.target/s390/zvector/autovec-double-signaling-ge.c: New test.
* gcc.target/s390/zvector/autovec-double-signaling-gt-z13-finite.c: New 
test.
* gcc.target/s390/zvector/autovec-double-signaling-gt-z13.c: New test.
* gcc.target/s390/zvector/autovec-double-signaling-gt.c: New test.
* gcc.target/s390/zvector/autovec-double-signaling-le-z13-finite.c: New 
test.
* gcc.target/s390/zvector/autovec-double-signaling-le-z13.c: New test.
* gcc.target/s390/zvector/autovec-double-signaling-le.c: New test.
* gcc.target/s390/zvector/autovec-double-signaling-lt-z13-finite.c: New 
test.
* gcc.target/s390/zvector/autovec-double-signaling-lt-z13.c: New test.
* gcc.target/s390/zvector/autovec-double-signaling-lt.c: New test.
* gcc.target/s390/zvector/autovec-double-signaling-ltgt-z13-finite.c: 
New test.
* gcc.target/s390/zvector/autovec-double-signaling-ltgt-z13.c: New test.
* gcc.target/s390/zvector/autovec-double-signaling-ltgt.c: New test.
* gcc.target/s390/zvector/autovec-double-smax-z13.F90: New test.
* gcc.target/s390/zvector/autovec-double-smax.F90: New test.
* gcc.target/s390/zvector/autovec-double-smin-z13.F90: New test.
* gcc.target/s390/zvector/autovec-double-smin.F90: New test.
* gcc.target/s390/zvector/autovec-float-quiet-eq.c: New test.
* gcc.target/s390/zvector/autovec-float-quiet-ge.c: New test.
* gcc.target/s390/zvector/autovec-float-quiet-gt.c: New test.
* gcc.target/s390/zvector/autovec-float-quiet-le.c: New test.
* gcc.target/s390/zvector/autovec-float-quiet-lt.c: New test.
* gcc.target/s390/zvector/autovec-float-quiet-ordered.c: New test.
* gcc.target/s390/zvector/autovec-float-quiet-uneq.c: New test.
* gcc.target/s390/zvector/autovec-float-quiet-unordered.c: New test.
* gcc.target/s390/zvector/autovec-float-signaling-eq.c: New test.
* gcc.target/s390/zvector/autovec-float-signaling-ge.c: New test.
* gcc.target/s390/zvector/autovec-float-signaling-gt.c: New test.
* gcc.target/s390/zvector/autovec-float-signaling-le.c: New test.
* gcc.target/s390/zvector/autovec-float-signaling-lt.c: New test.
* gcc.target/s390/zvector/autovec-float-signaling-ltgt.c: New test.
* gcc.target/s390/zvector/autovec-fortran.h: New test.
* gcc.target/s390/zvector/autovec-long-double-signaling-ge.c: New test.
* gcc.target/s390/zvector/autovec-long-double-signaling-gt.c: New test.
* gcc.target/s390/zvector/autovec-long-double-signaling-le.c: New test.
* gcc.target/s390/zvector/autovec-long-double-signaling-lt.c: New test.
* gcc.target/s390/zvector/autovec.h: New test.
---
 gcc/testsuite/gcc.target/s390/s390.exp|  8 
 .../s390/zvector/autovec-double-quiet-eq.c|  8 
 .../s390/zvector/autovec-double-quiet-ge.c|  8 
 .../s390/zvector/autovec-double-quiet-gt.c|  8 
 .../s390/zvector/autovec-double-quiet-le.c|  8 
 .../s390/zvector/autovec-double-quiet-lt.c|  8 
 .../zvector/autovec-double-quiet-ordered.c| 10 +
 .../s390/zvector/autovec-double-quiet-uneq.c  | 10 +
 .../zvector/autovec-double-quiet-unordered.c  | 11 +
 .../autovec-double-signaling-eq-z13-finite.c  | 10 +
 .../zvector/autovec-double-signaling-eq-z13.c |  9 
 .../zvector/autovec-double-signaling-eq.c | 11 +
 .../autovec-double-signaling-ge-z13-finite.c  | 10 +
 .../zvector/autovec-double-signaling-ge-z13.c |  9 
 .../zvector/autovec-double-signaling-ge.c |  8 
 .../autovec-double-signaling-gt-z13-finite.c  | 10 +
 .../zvector/autovec-double-signaling-gt-z13.c |  9 
 .../zvector/autovec-double-signaling-gt.c |  8 
 .../autovec-double-signaling-le-z13-finite.c  | 10 +
 

[PATCH v2 8/9] S/390: Use signaling FP comparison instructions

2019-08-22 Thread Ilya Leoshkevich
dg-torture.exp=inf-compare-1.c is failing, because (qNaN > +Inf)
comparison is compiled to CDB instruction, which does not signal an
invalid operation exception. KDB should have been used instead.

This patch introduces a new CCmode and a new pattern in order to
generate signaling instructions in this and similar cases.

gcc/ChangeLog:

2019-08-09  Ilya Leoshkevich  

* config/s390/2827.md: Add new opcodes.
* config/s390/2964.md: Likewise.
* config/s390/3906.md: Likewise.
* config/s390/8561.md: Likewise.
* config/s390/s390-builtins.def (s390_vfchesb): Use
the new vec_cmpgev4sf_quiet_nocc.
(s390_vfchedb): Use the new vec_cmpgev2df_quiet_nocc.
(s390_vfchsb): Use the new vec_cmpgtv4sf_quiet_nocc.
(s390_vfchdb): Use the new vec_cmpgtv2df_quiet_nocc.
(vec_cmplev4sf): Use the new vec_cmplev4sf_quiet_nocc.
(vec_cmplev2df): Use the new vec_cmplev2df_quiet_nocc.
(vec_cmpltv4sf): Use the new vec_cmpltv4sf_quiet_nocc.
(vec_cmpltv2df): Use the new vec_cmpltv2df_quiet_nocc.
* config/s390/s390-modes.def (CCSFPS): New mode.
* config/s390/s390.c (s390_match_ccmode_set): Support CCSFPS.
(s390_select_ccmode): Return CCSFPS for LT, LE, GT, GE and LTGT.
(s390_branch_condition_mask): Reuse CCS for CCSFPS.
(s390_expand_vec_compare): Use non-signaling patterns where
necessary.
(s390_reverse_condition): Support CCSFPS.
* config/s390/s390.md (*cmp_ccsfps): New pattern.
* config/s390/vector.md: (VFCMP_HW_OP): Remove.
(asm_fcmp_op): Likewise.
(*smaxv2df3_vx): Use pattern for quiet comparison.
(*sminv2df3_vx): Likewise.
(*vec_cmp_nocc): Remove.
(*vec_cmpeq_quiet_nocc): New pattern.
(vec_cmpgt_quiet_nocc): Likewise.
(vec_cmplt_quiet_nocc): New expander.
(vec_cmpge_quiet_nocc): New pattern.
(vec_cmple_quiet_nocc): New expander.
(*vec_cmpeq_signaling_nocc): New pattern.
(*vec_cmpgt_signaling_nocc): Likewise.
(*vec_cmpgt_signaling_finite_nocc): Likewise.
(*vec_cmpge_signaling_nocc): Likewise.
(*vec_cmpge_signaling_finite_nocc): Likewise.
(vec_cmpungt): New expander.
(vec_cmpunge): Likewise.
(vec_cmpuneq): Use quiet patterns.
(vec_cmpltgt): Allow only on z14+.
(vec_cmpordered): Use quiet patterns.
(vec_cmpunordered): Likewise.
(VEC_CODE_WITH_COMPLEX_EXPAND): Add ungt and unge.

gcc/testsuite/ChangeLog:

2019-08-09  Ilya Leoshkevich  

* gcc.target/s390/vector/vec-scalar-cmp-1.c: Adjust
expectations.
---
 gcc/config/s390/2827.md   |  14 +-
 gcc/config/s390/2964.md   |  13 +-
 gcc/config/s390/3906.md   |  17 +-
 gcc/config/s390/8561.md   |  19 +-
 gcc/config/s390/s390-builtins.def |  16 +-
 gcc/config/s390/s390-modes.def|   8 +
 gcc/config/s390/s390.c|  34 ++--
 gcc/config/s390/s390.md   |  14 ++
 gcc/config/s390/vector.md | 171 +++---
 .../gcc.target/s390/vector/vec-scalar-cmp-1.c |   8 +-
 10 files changed, 240 insertions(+), 74 deletions(-)

diff --git a/gcc/config/s390/2827.md b/gcc/config/s390/2827.md
index 3f63f82284d..aafe8e27339 100644
--- a/gcc/config/s390/2827.md
+++ b/gcc/config/s390/2827.md
@@ -44,7 +44,7 @@
 
 (define_insn_reservation "zEC12_normal_fp" 8
   (and (eq_attr "cpu" "zEC12")
-   (eq_attr "mnemonic" 
"lnebr,sdbr,sebr,clfxtr,adbr,aebr,celfbr,clfebr,lpebr,msebr,lndbr,clfdbr,cebr,maebr,ltebr,clfdtr,cdlgbr,cxlftr,lpdbr,cdfbr,lcebr,clfxbr,msdbr,cdbr,madbr,meebr,clgxbr,clgdtr,ledbr,cegbr,cdlftr,cdlgtr,mdbr,clgebr,ltdbr,cdlfbr,cdgbr,clgxtr,lcdbr,celgbr,clgdbr,ldebr,cefbr,fidtr,fixtr,madb,msdb,mseb,fiebra,fidbra,aeb,mdb,seb,cdb,tcdb,sdb,adb,tceb,maeb,ceb,meeb,ldeb"))
 "nothing")
+   (eq_attr "mnemonic" 
"lnebr,sdbr,sebr,clfxtr,adbr,aebr,celfbr,clfebr,lpebr,msebr,lndbr,clfdbr,cebr,maebr,ltebr,clfdtr,cdlgbr,cxlftr,lpdbr,cdfbr,lcebr,clfxbr,msdbr,cdbr,madbr,meebr,clgxbr,clgdtr,ledbr,cegbr,cdlftr,cdlgtr,mdbr,clgebr,ltdbr,cdlfbr,cdgbr,clgxtr,lcdbr,celgbr,clgdbr,ldebr,cefbr,fidtr,fixtr,madb,msdb,mseb,fiebra,fidbra,aeb,mdb,seb,cdb,tcdb,sdb,adb,tceb,maeb,ceb,meeb,ldeb,keb,kebr,kdb,kdbr"))
 "nothing")
 
 (define_insn_reservation "zEC12_cgdbr" 2
   (and (eq_attr "cpu" "zEC12")
@@ -426,6 +426,10 @@
   (and (eq_attr "cpu" "zEC12")
(eq_attr "mnemonic" "cxbr")) "nothing")
 
+(define_insn_reservation "zEC12_kxbr" 18
+  (and (eq_attr "cpu" "zEC12")
+   (eq_attr "mnemonic" "kxbr")) "nothing")
+
 (define_insn_reservation "zEC12_ddbr" 36
   (and (eq_attr "cpu" "zEC12")
(eq_attr "mnemonic" "ddbr")) "nothing")
@@ -578,10 +582,18 @@
   (and (eq_attr "cpu" "zEC12")
(eq_attr "mnemonic" "cdtr")) "nothing")
 
+(define_insn_reservation "zEC12_kdtr" 11
+  (and (eq_attr "cpu" "zEC12")

[PATCH v2 7/9] S/390: Remove code duplication in vec_* comparison expanders

2019-08-22 Thread Ilya Leoshkevich
s390.md uses a lot of near-identical expanders that perform dispatching
to other expanders based on operand types. Since the following patch
would require even more of these, avoid copy-pasting the code by
generating these expanders using an iterator.

gcc/ChangeLog:

2019-08-09  Ilya Leoshkevich  

* config/s390/s390.c (s390_expand_vec_compare): Use
gen_vec_cmpordered and gen_vec_cmpunordered.
* config/s390/vector.md (vec_cmpuneq, vec_cmpltgt, vec_ordered,
vec_unordered): Delete.
(vec_ordered): Rename to vec_cmpordered.
(vec_unordered): Rename to vec_cmpunordered.
(vec_cmp): Generic dispatcher.
---
 gcc/config/s390/s390.c|  4 +--
 gcc/config/s390/vector.md | 67 +++
 2 files changed, 13 insertions(+), 58 deletions(-)

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index fa17d7d5d08..f9817f6edaf 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -6523,10 +6523,10 @@ s390_expand_vec_compare (rtx target, enum rtx_code cond,
  emit_insn (gen_vec_cmpltgt (target, cmp_op1, cmp_op2));
  return;
case ORDERED:
- emit_insn (gen_vec_ordered (target, cmp_op1, cmp_op2));
+ emit_insn (gen_vec_cmpordered (target, cmp_op1, cmp_op2));
  return;
case UNORDERED:
- emit_insn (gen_vec_unordered (target, cmp_op1, cmp_op2));
+ emit_insn (gen_vec_cmpunordered (target, cmp_op1, cmp_op2));
  return;
default: break;
}
diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 1b66b8be61f..a093ae5c565 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -1507,22 +1507,6 @@
   operands[3] = gen_reg_rtx (mode);
 })
 
-(define_expand "vec_cmpuneq"
-  [(match_operand 0 "register_operand" "")
-   (match_operand 1 "register_operand" "")
-   (match_operand 2 "register_operand" "")]
-  "TARGET_VX"
-{
-  if (GET_MODE (operands[1]) == V4SFmode)
-emit_insn (gen_vec_cmpuneqv4sf (operands[0], operands[1], operands[2]));
-  else if (GET_MODE (operands[1]) == V2DFmode)
-emit_insn (gen_vec_cmpuneqv2df (operands[0], operands[1], operands[2]));
-  else
-gcc_unreachable ();
-
-  DONE;
-})
-
 ; LTGT a <> b -> a > b | b > a
 (define_expand "vec_cmpltgt"
   [(set (match_operand: 0 "register_operand" "=v")
@@ -1535,24 +1519,8 @@
   operands[3] = gen_reg_rtx (mode);
 })
 
-(define_expand "vec_cmpltgt"
-  [(match_operand 0 "register_operand" "")
-   (match_operand 1 "register_operand" "")
-   (match_operand 2 "register_operand" "")]
-  "TARGET_VX"
-{
-  if (GET_MODE (operands[1]) == V4SFmode)
-emit_insn (gen_vec_cmpltgtv4sf (operands[0], operands[1], operands[2]));
-  else if (GET_MODE (operands[1]) == V2DFmode)
-emit_insn (gen_vec_cmpltgtv2df (operands[0], operands[1], operands[2]));
-  else
-gcc_unreachable ();
-
-  DONE;
-})
-
 ; ORDERED (a, b): a >= b | b > a
-(define_expand "vec_ordered"
+(define_expand "vec_cmpordered"
   [(set (match_operand:  0 "register_operand" "=v")
(ge: (match_operand:VFT 1 "register_operand"  "v")
 (match_operand:VFT 2 "register_operand"  "v")))
@@ -1563,45 +1531,32 @@
   operands[3] = gen_reg_rtx (mode);
 })
 
-(define_expand "vec_ordered"
-  [(match_operand 0 "register_operand" "")
-   (match_operand 1 "register_operand" "")
-   (match_operand 2 "register_operand" "")]
-  "TARGET_VX"
-{
-  if (GET_MODE (operands[1]) == V4SFmode)
-emit_insn (gen_vec_orderedv4sf (operands[0], operands[1], operands[2]));
-  else if (GET_MODE (operands[1]) == V2DFmode)
-emit_insn (gen_vec_orderedv2df (operands[0], operands[1], operands[2]));
-  else
-gcc_unreachable ();
-
-  DONE;
-})
-
 ; UNORDERED (a, b): !ORDERED (a, b)
-(define_expand "vec_unordered"
+(define_expand "vec_cmpunordered"
   [(match_operand: 0 "register_operand" "=v")
(match_operand:VFT1 "register_operand" "v")
(match_operand:VFT2 "register_operand" "v")]
   "TARGET_VX"
 {
-  emit_insn (gen_vec_ordered (operands[0], operands[1], operands[2]));
+  emit_insn (gen_vec_cmpordered (operands[0], operands[1], operands[2]));
   emit_insn (gen_rtx_SET (operands[0],
 gen_rtx_NOT (mode, operands[0])));
   DONE;
 })
 
-(define_expand "vec_unordered"
+(define_code_iterator VEC_CODE_WITH_COMPLEX_EXPAND
+  [uneq ltgt ordered unordered])
+
+(define_expand "vec_cmp"
   [(match_operand 0 "register_operand" "")
-   (match_operand 1 "register_operand" "")
-   (match_operand 2 "register_operand" "")]
+   (VEC_CODE_WITH_COMPLEX_EXPAND (match_operand 1 "register_operand" "")
+(match_operand 2 "register_operand" ""))]
   "TARGET_VX"
 {
   if (GET_MODE (operands[1]) == V4SFmode)
-emit_insn (gen_vec_unorderedv4sf (operands[0], operands[1], operands[2]));
+emit_insn (gen_vec_cmpv4sf (operands[0], operands[1], operands[2]));
   else if (GET_MODE (operands[1]) == V2DFmode)
-emit_insn 

[PATCH v2 6/9] S/390: Remove code duplication in vec_unordered

2019-08-22 Thread Ilya Leoshkevich
vec_unordered is vec_ordered plus a negation at the end.
Reuse vec_unordered logic.

gcc/ChangeLog:

2019-08-13  Ilya Leoshkevich  

* config/s390/vector.md (vec_unordered): Call
gen_vec_ordered.
---
 gcc/config/s390/vector.md | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index ca5ec0dd3b0..1b66b8be61f 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -1581,15 +1581,15 @@
 
 ; UNORDERED (a, b): !ORDERED (a, b)
 (define_expand "vec_unordered"
-  [(set (match_operand:  0 "register_operand" "=v")
-   (ge: (match_operand:VFT 1 "register_operand"  "v")
-(match_operand:VFT 2 "register_operand"  "v")))
-   (set (match_dup 3) (gt: (match_dup 2) (match_dup 1)))
-   (set (match_dup 0) (ior: (match_dup 0) (match_dup 3)))
-   (set (match_dup 0) (not: (match_dup 0)))]
+  [(match_operand: 0 "register_operand" "=v")
+   (match_operand:VFT1 "register_operand" "v")
+   (match_operand:VFT2 "register_operand" "v")]
   "TARGET_VX"
 {
-  operands[3] = gen_reg_rtx (mode);
+  emit_insn (gen_vec_ordered (operands[0], operands[1], operands[2]));
+  emit_insn (gen_rtx_SET (operands[0],
+gen_rtx_NOT (mode, operands[0])));
+  DONE;
 })
 
 (define_expand "vec_unordered"
-- 
2.21.0



[PATCH v2 4/9] S/390: Do not use signaling vector comparisons on z13

2019-08-22 Thread Ilya Leoshkevich
z13 supports only non-signaling vector comparisons.  This means we
cannot vectorize LT, LE, GT, GE and LTGT when compiling for z13.  Notify
middle-end about this using more restrictive operator predicate in
vcond.

gcc/ChangeLog:

2019-08-21  Ilya Leoshkevich  

* config/s390/vector.md (vcond_comparison_operator): New
predicate.
(vcond): Use vcond_comparison_operator.
---
 gcc/config/s390/vector.md | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 0702e1de835..d7a266c5605 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -614,10 +614,30 @@
   operands[2] = GEN_INT (GET_MODE_NUNITS (mode) - 1);
 })
 
+(define_predicate "vcond_comparison_operator"
+  (match_operand 0 "comparison_operator")
+{
+  if (!HONOR_NANS (GET_MODE (XEXP (op, 0)))
+  && !HONOR_NANS (GET_MODE (XEXP (op, 1
+return true;
+  switch (GET_CODE (op))
+{
+case LE:
+case LT:
+case GE:
+case GT:
+case LTGT:
+  /* Signaling vector comparisons are supported only on z14+.  */
+  return TARGET_Z14;
+default:
+  return true;
+}
+})
+
 (define_expand "vcond"
   [(set (match_operand:V_HW 0 "register_operand" "")
(if_then_else:V_HW
-(match_operator 3 "comparison_operator"
+(match_operator 3 "vcond_comparison_operator"
 [(match_operand:V_HW2 4 "register_operand" "")
  (match_operand:V_HW2 5 "nonmemory_operand" "")])
 (match_operand:V_HW 1 "nonmemory_operand" "")
-- 
2.21.0



[PATCH v2 5/9] S/390: Implement vcond expander for V1TI,V1TF

2019-08-22 Thread Ilya Leoshkevich
Currently gcc does not emit wf{c,k}* instructions when comparing long
double values.  Middle-end actually adds them in the first place, but
then veclower pass replaces them with floating point register pair
operations, because the corresponding expander is missing.

gcc/ChangeLog:

2019-08-09  Ilya Leoshkevich  

* config/s390/vector.md (vcondv1tiv1tf): New variant of
vcond$a$b expander.
---
 gcc/config/s390/vector.md | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index d7a266c5605..ca5ec0dd3b0 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -649,6 +649,21 @@
   DONE;
 })
 
+(define_expand "vcondv1tiv1tf"
+  [(set (match_operand:V1TI 0 "register_operand" "")
+   (if_then_else:V1TI
+(match_operator 3 "vcond_comparison_operator"
+[(match_operand:V1TF 4 "register_operand" "")
+ (match_operand:V1TF 5 "nonmemory_operand" "")])
+(match_operand:V1TI 1 "nonmemory_operand" "")
+(match_operand:V1TI 2 "nonmemory_operand" "")))]
+  "TARGET_VXE"
+{
+  s390_expand_vcond (operands[0], operands[1], operands[2],
+GET_CODE (operands[3]), operands[4], operands[5]);
+  DONE;
+})
+
 (define_expand "vcondu"
   [(set (match_operand:V_HW 0 "register_operand" "")
(if_then_else:V_HW
-- 
2.21.0



[PATCH v2 3/9] Introduce can_vector_compare_p function

2019-08-22 Thread Ilya Leoshkevich
z13 supports only non-signaling vector comparisons.  This means we
cannot vectorize LT, LE, GT, GE and LTGT when compiling for z13.
However, we cannot express this restriction today: the code only checks
whether vcond$a$b optab exists, which does not contain information about
the operation.

Introduce a function that checks whether back-end supports vector
comparisons with individual rtx codes by matching vcond expander's third
argument with a fake comparison with the corresponding rtx code.

gcc/ChangeLog:

2019-08-21  Ilya Leoshkevich  

* Makefile.in (GTFILES): Add optabs.c.
* optabs-tree.c (expand_vec_cond_expr_p): Use
can_vector_compare_p.
* optabs.c (binop_key): Binary operation cache key.
(binop_hasher): Binary operation cache hasher.
(cached_binops): Binary operation cache.
(get_cached_binop): New function that returns a cached binary
operation or creates a new one.
(can_vector_compare_p): New function.
* optabs.h (enum can_vector_compare_purpose): New enum. Not
really needed today, but can be used to extend the support to
e.g. vec_cmp if need arises.
(can_vector_compare_p): New function.
---
 gcc/Makefile.in   |  2 +-
 gcc/optabs-tree.c | 11 +--
 gcc/optabs.c  | 79 +++
 gcc/optabs.h  | 15 +
 4 files changed, 104 insertions(+), 3 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 597dc01328b..d2207da5657 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -2541,7 +2541,7 @@ GTFILES = $(CPPLIB_H) $(srcdir)/input.h 
$(srcdir)/coretypes.h \
   $(srcdir)/function.c $(srcdir)/except.c \
   $(srcdir)/ggc-tests.c \
   $(srcdir)/gcse.c $(srcdir)/godump.c \
-  $(srcdir)/lists.c $(srcdir)/optabs-libfuncs.c \
+  $(srcdir)/lists.c $(srcdir)/optabs.c $(srcdir)/optabs-libfuncs.c \
   $(srcdir)/profile.c $(srcdir)/mcf.c \
   $(srcdir)/reg-stack.c $(srcdir)/cfgrtl.c \
   $(srcdir)/stor-layout.c \
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index 8157798cc71..e68bb39c021 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -23,7 +23,10 @@ along with GCC; see the file COPYING3.  If not see
 #include "coretypes.h"
 #include "target.h"
 #include "insn-codes.h"
+#include "rtl.h"
 #include "tree.h"
+#include "memmodel.h"
+#include "optabs.h"
 #include "optabs-tree.h"
 #include "stor-layout.h"
 
@@ -347,8 +350,12 @@ expand_vec_cond_expr_p (tree value_type, tree cmp_op_type, 
enum tree_code code)
   || maybe_ne (GET_MODE_NUNITS (value_mode), GET_MODE_NUNITS 
(cmp_op_mode)))
 return false;
 
-  if (get_vcond_icode (TYPE_MODE (value_type), TYPE_MODE (cmp_op_type),
-  TYPE_UNSIGNED (cmp_op_type)) == CODE_FOR_nothing
+  bool unsigned_p = TYPE_UNSIGNED (cmp_op_type);
+  if (((get_vcond_icode (TYPE_MODE (value_type), TYPE_MODE (cmp_op_type),
+unsigned_p) == CODE_FOR_nothing)
+   || !can_vector_compare_p (get_rtx_code (code, unsigned_p),
+TYPE_MODE (value_type),
+TYPE_MODE (cmp_op_type), cvcp_vcond))
   && ((code != EQ_EXPR && code != NE_EXPR)
  || get_vcond_eq_icode (TYPE_MODE (value_type),
 TYPE_MODE (cmp_op_type)) == CODE_FOR_nothing))
diff --git a/gcc/optabs.c b/gcc/optabs.c
index 9e54dda6e7f..07b4d824822 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "recog.h"
 #include "diagnostic-core.h"
 #include "rtx-vector-builder.h"
+#include "hash-table.h"
 
 /* Include insn-config.h before expr.h so that HAVE_conditional_move
is properly defined.  */
@@ -3819,6 +3820,82 @@ can_compare_p (enum rtx_code code, machine_mode mode,
   return 0;
 }
 
+/* can_vector_compare_p presents fake rtx binary operations to the the back-end
+   in order to determine its capabilities.  In order to avoid creating fake
+   operations on each call, values from previous calls are cached in a global
+   cached_binops hash_table.  It contains rtxes, which can be looked up using
+   binop_keys.  */
+
+struct binop_key {
+  enum rtx_code code;/* Operation code.  */
+  machine_mode value_mode;   /* Result mode. */
+  machine_mode cmp_op_mode;  /* Operand mode.*/
+};
+
+struct binop_hasher : pointer_hash_mark, ggc_cache_remove {
+  typedef rtx value_type;
+  typedef binop_key compare_type;
+
+  static hashval_t
+  hash (enum rtx_code code, machine_mode value_mode, machine_mode cmp_op_mode)
+  {
+inchash::hash hstate (0);
+hstate.add_int (code);
+hstate.add_int (value_mode);
+hstate.add_int (cmp_op_mode);
+return hstate.end ();
+  }
+
+  static hashval_t
+  hash (const rtx )
+  {
+return hash (GET_CODE (ref), GET_MODE (ref), GET_MODE (XEXP (ref, 0)));
+  }
+
+  static bool
+  equal (const rtx , const binop_key )
+  {
+return (GET_CODE (ref1) == ref2.code)
+  && (GET_MODE 

[PATCH v2 0/9] S/390: Use signaling FP comparison instructions

2019-08-22 Thread Ilya Leoshkevich
Bootstrap and regtest running on x86_64-redhat-linux and
s390x-redhat-linux.

This patch series adds signaling FP comparison support (both scalar and
vector) to s390 backend.

Patch 1 documents the current behavior of MIN, MAX and LTGT operations
with respect to signaling.

Patches 2-4 make it possible to query supported vcond rtxes and make
use of that for z13.

Patches 5-7 are preparation cleanups.

Patch 8 is an actual implementation.

Path 9 contains new tests, that make sure autovectorized comparisons use
proper instructions.

Ilya Leoshkevich (9):
  Document signaling for min, max and ltgt operations
  hash_traits: split pointer_hash_mark from pointer_hash
  Introduce can_vector_compare_p function
  S/390: Do not use signaling vector comparisons on z13
  S/390: Implement vcond expander for V1TI,V1TF
  S/390: Remove code duplication in vec_unordered
  S/390: Remove code duplication in vec_* comparison expanders
  S/390: Use signaling FP comparison instructions
  S/390: Test signaling FP comparison instructions

v1->v2:
Improve wording in documentation commit message.
Replace hook with optabs query.
Add signaling eq test.

 gcc/Makefile.in   |   2 +-
 gcc/config/s390/2827.md   |  14 +-
 gcc/config/s390/2964.md   |  13 +-
 gcc/config/s390/3906.md   |  17 +-
 gcc/config/s390/8561.md   |  19 +-
 gcc/config/s390/s390-builtins.def |  16 +-
 gcc/config/s390/s390-modes.def|   8 +
 gcc/config/s390/s390.c|  38 ++-
 gcc/config/s390/s390.md   |  14 +
 gcc/config/s390/vector.md | 283 --
 gcc/cp/decl2.c|  14 +-
 gcc/doc/generic.texi  |  16 +-
 gcc/doc/md.texi   |   3 +-
 gcc/doc/rtl.texi  |   3 +-
 gcc/hash-traits.h |  74 +++--
 gcc/ipa-prop.c|  47 +--
 gcc/optabs-tree.c |  11 +-
 gcc/optabs.c  |  79 +
 gcc/optabs.h  |  15 +
 gcc/testsuite/gcc.target/s390/s390.exp|   8 +
 .../gcc.target/s390/vector/vec-scalar-cmp-1.c |   8 +-
 .../s390/zvector/autovec-double-quiet-eq.c|   8 +
 .../s390/zvector/autovec-double-quiet-ge.c|   8 +
 .../s390/zvector/autovec-double-quiet-gt.c|   8 +
 .../s390/zvector/autovec-double-quiet-le.c|   8 +
 .../s390/zvector/autovec-double-quiet-lt.c|   8 +
 .../zvector/autovec-double-quiet-ordered.c|  10 +
 .../s390/zvector/autovec-double-quiet-uneq.c  |  10 +
 .../zvector/autovec-double-quiet-unordered.c  |  11 +
 .../autovec-double-signaling-eq-z13-finite.c  |  10 +
 .../zvector/autovec-double-signaling-eq-z13.c |   9 +
 .../zvector/autovec-double-signaling-eq.c |  11 +
 .../autovec-double-signaling-ge-z13-finite.c  |  10 +
 .../zvector/autovec-double-signaling-ge-z13.c |   9 +
 .../zvector/autovec-double-signaling-ge.c |   8 +
 .../autovec-double-signaling-gt-z13-finite.c  |  10 +
 .../zvector/autovec-double-signaling-gt-z13.c |   9 +
 .../zvector/autovec-double-signaling-gt.c |   8 +
 .../autovec-double-signaling-le-z13-finite.c  |  10 +
 .../zvector/autovec-double-signaling-le-z13.c |   9 +
 .../zvector/autovec-double-signaling-le.c |   8 +
 .../autovec-double-signaling-lt-z13-finite.c  |  10 +
 .../zvector/autovec-double-signaling-lt-z13.c |   9 +
 .../zvector/autovec-double-signaling-lt.c |   8 +
 ...autovec-double-signaling-ltgt-z13-finite.c |   9 +
 .../autovec-double-signaling-ltgt-z13.c   |   9 +
 .../zvector/autovec-double-signaling-ltgt.c   |   9 +
 .../s390/zvector/autovec-double-smax-z13.F90  |  11 +
 .../s390/zvector/autovec-double-smax.F90  |   8 +
 .../s390/zvector/autovec-double-smin-z13.F90  |  11 +
 .../s390/zvector/autovec-double-smin.F90  |   8 +
 .../s390/zvector/autovec-float-quiet-eq.c |   8 +
 .../s390/zvector/autovec-float-quiet-ge.c |   8 +
 .../s390/zvector/autovec-float-quiet-gt.c |   8 +
 .../s390/zvector/autovec-float-quiet-le.c |   8 +
 .../s390/zvector/autovec-float-quiet-lt.c |   8 +
 .../zvector/autovec-float-quiet-ordered.c |  10 +
 .../s390/zvector/autovec-float-quiet-uneq.c   |  10 +
 .../zvector/autovec-float-quiet-unordered.c   |  11 +
 .../s390/zvector/autovec-float-signaling-eq.c |  11 +
 .../s390/zvector/autovec-float-signaling-ge.c |   8 +
 .../s390/zvector/autovec-float-signaling-gt.c |   8 +
 .../s390/zvector/autovec-float-signaling-le.c |   8 +
 .../s390/zvector/autovec-float-signaling-lt.c |   8 +
 .../zvector/autovec-float-signaling-ltgt.c|   9 +
 .../gcc.target/s390/zvector/autovec-fortran.h |   7 +
 .../autovec-long-double-signaling-ge.c|   8 +
 .../autovec-long-double-signaling-gt.c|   8 +
 .../autovec-long-double-signaling-le.c|   8 +
 

[PATCH v2 1/9] Document signaling for min, max and ltgt operations

2019-08-22 Thread Ilya Leoshkevich
Currently it's not clear whether min, max and ltgt should raise floating
point exceptions when dealing with qNaNs.

Right now a lot of code assumes that LTGT is signaling: in particular,
with -fno-finite-math-only, which is the default, it's generated for
the signaling ((x < y) || (x > y)).

The behavior of MIN/MAX is (intentionally?) left unspecified, according
to https://gcc.gnu.org/viewcvs/gcc?view=revision=263751
("Unconditionally use MAX_EXPR/MIN_EXPR for MAX/MIN intrinsics").

So document the status quo.

gcc/ChangeLog:

2019-08-09  Ilya Leoshkevich  

PR target/91323
* doc/generic.texi (LTGT_EXPR): Restore the original wording
regarding floating point exceptions.
(MIN_EXPR, MAX_EXPR): Document.
* doc/md.texi (smin, smax): Add a clause regarding floating
point exceptions.
* doc/rtl.texi (smin, smax): Add a clause regarding floating
point exceptions.
---
 gcc/doc/generic.texi | 16 +---
 gcc/doc/md.texi  |  3 ++-
 gcc/doc/rtl.texi |  3 ++-
 3 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi
index 8901d5f357e..d5ae20bd461 100644
--- a/gcc/doc/generic.texi
+++ b/gcc/doc/generic.texi
@@ -1331,6 +1331,8 @@ the byte offset of the field, but should not be used 
directly; call
 @tindex UNGE_EXPR
 @tindex UNEQ_EXPR
 @tindex LTGT_EXPR
+@tindex MIN_EXPR
+@tindex MAX_EXPR
 @tindex MODIFY_EXPR
 @tindex INIT_EXPR
 @tindex COMPOUND_EXPR
@@ -1602,13 +1604,21 @@ These operations take two floating point operands and 
determine whether
 the operands are unordered or are less than, less than or equal to,
 greater than, greater than or equal to, or equal respectively.  For
 example, @code{UNLT_EXPR} returns true if either operand is an IEEE
-NaN or the first operand is less than the second.  With the possible
-exception of @code{LTGT_EXPR}, all of these operations are guaranteed
-not to generate a floating point exception.  The result
+NaN or the first operand is less than the second.  Only @code{LTGT_EXPR}
+is expected to raise an invalid floating-point-operation trap when the
+outcome is unordered.  All other operations are guaranteed not to raise
+a floating point exception.  The result
 type of these expressions will always be of integral or boolean type.
 These operations return the result type's zero value for false,
 and the result type's one value for true.
 
+@item MIN_EXPR
+@itemx MAX_EXPR
+These nodes represent minimum and maximum operations.  When used with
+floating point, if both operands are zeros, or if either operand is
+@code{NaN}, then it is unspecified which of the two operands is returned
+as the result and whether or not a floating point exception is raised.
+
 @item MODIFY_EXPR
 These nodes represent assignment.  The left-hand side is the first
 operand; the right-hand side is the second operand.  The left-hand side
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 7751984bf5f..74f8ec84974 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5353,7 +5353,8 @@ in the rtl as
 @item @samp{smin@var{m}3}, @samp{smax@var{m}3}
 Signed minimum and maximum operations.  When used with floating point,
 if both operands are zeros, or if either operand is @code{NaN}, then
-it is unspecified which of the two operands is returned as the result.
+it is unspecified which of the two operands is returned as the result
+and whether or not a floating point exception is raised.
 
 @cindex @code{fmin@var{m}3} instruction pattern
 @cindex @code{fmax@var{m}3} instruction pattern
diff --git a/gcc/doc/rtl.texi b/gcc/doc/rtl.texi
index 0814b66a486..e0628da893d 100644
--- a/gcc/doc/rtl.texi
+++ b/gcc/doc/rtl.texi
@@ -2596,7 +2596,8 @@ Represents the smaller (for @code{smin}) or larger (for 
@code{smax}) of
 @var{x} and @var{y}, interpreted as signed values in mode @var{m}.
 When used with floating point, if both operands are zeros, or if either
 operand is @code{NaN}, then it is unspecified which of the two operands
-is returned as the result.
+is returned as the result and whether or not a floating point exception
+is raised.
 
 @findex umin
 @findex umax
-- 
2.21.0



[PATCH v2 2/9] hash_traits: split pointer_hash_mark from pointer_hash

2019-08-22 Thread Ilya Leoshkevich
The next patch introducing can_vector_compare_p function needs to store
rtxes in a hash table and look them up using a special key type.
Currently pointer_hash requires value_type to be the same as
compare_type, so it would not be usable and one would have to implement
mark_deleted, mark_empty, is_deleted and is_empty manually.

Split pointer_hash_mark out of pointer_hash in order to support such use
cases.  Also make use of it in the existing code where possible.

gcc/ChangeLog:

2019-08-22  Ilya Leoshkevich  

* hash-traits.h (struct pointer_hash_mark): New trait.
(Pointer>::mark_deleted): Move from pointer_hash.
(Pointer>::mark_empty): Likewise.
(Pointer>::is_deleted): Likewise.
(Pointer>::is_empty): Likewise.
(struct pointer_hash): Inherit from pointer_hash_mark.
(Type>::mark_deleted): Move to pointer_hash_mark.
(Type>::mark_empty): Likewise.
(Type>::is_deleted): Likewise.
(Type>::is_empty): Likewise.
* ipa-prop.c (struct ipa_bit_ggc_hash_traits): Use
pointer_hash_mark.
(struct ipa_vr_ggc_hash_traits): Likewise.

gcc/cp/ChangeLog:

2019-08-22  Ilya Leoshkevich  

* decl2.c (struct mangled_decl_hash): Use pointer_hash_mark.
---
 gcc/cp/decl2.c| 14 +
 gcc/hash-traits.h | 74 ++-
 gcc/ipa-prop.c| 47 --
 3 files changed, 47 insertions(+), 88 deletions(-)

diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index a32108f9d16..36a10f491fa 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -105,7 +105,7 @@ static GTY(()) vec *mangling_aliases;
 /* hash traits for declarations.  Hashes single decls via
DECL_ASSEMBLER_NAME_RAW.  */
 
-struct mangled_decl_hash : ggc_remove 
+struct mangled_decl_hash : pointer_hash_mark , ggc_remove 
 {
   typedef tree value_type; /* A DECL.  */
   typedef tree compare_type; /* An identifier.  */
@@ -119,18 +119,6 @@ struct mangled_decl_hash : ggc_remove 
 tree name = DECL_ASSEMBLER_NAME_RAW (existing);
 return candidate == name;
   }
-
-  static inline void mark_empty (value_type ) {p = NULL_TREE;}
-  static inline bool is_empty (value_type p) {return !p;}
-
-  static bool is_deleted (value_type e)
-  {
-return e == reinterpret_cast  (1);
-  }
-  static void mark_deleted (value_type )
-  {
-e = reinterpret_cast  (1);
-  }
 };
 
 /* A hash table of decls keyed by mangled name.  Used to figure out if
diff --git a/gcc/hash-traits.h b/gcc/hash-traits.h
index 2d17e2c982a..e5c9e88d99f 100644
--- a/gcc/hash-traits.h
+++ b/gcc/hash-traits.h
@@ -136,12 +136,52 @@ int_hash ::is_empty (Type x)
   return x == Empty;
 }
 
+/* Base class for pointer hashers that want to implement marking in a generic
+   way.  */
+
+template 
+struct pointer_hash_mark
+{
+  static inline void mark_deleted (Pointer &);
+  static inline void mark_empty (Pointer &);
+  static inline bool is_deleted (Pointer);
+  static inline bool is_empty (Pointer);
+};
+
+template 
+inline void
+pointer_hash_mark ::mark_deleted (Pointer )
+{
+  e = reinterpret_cast (1);
+}
+
+template 
+inline void
+pointer_hash_mark ::mark_empty (Pointer )
+{
+  e = NULL;
+}
+
+template 
+inline bool
+pointer_hash_mark ::is_deleted (Pointer e)
+{
+  return e == reinterpret_cast (1);
+}
+
+template 
+inline bool
+pointer_hash_mark ::is_empty (Pointer e)
+{
+  return e == NULL;
+}
+
 /* Pointer hasher based on pointer equality.  Other types of pointer hash
can inherit this and override the hash and equal functions with some
other form of equality (such as string equality).  */
 
 template 
-struct pointer_hash
+struct pointer_hash : pointer_hash_mark
 {
   typedef Type *value_type;
   typedef Type *compare_type;
@@ -149,10 +189,6 @@ struct pointer_hash
   static inline hashval_t hash (const value_type &);
   static inline bool equal (const value_type ,
const compare_type );
-  static inline void mark_deleted (Type *&);
-  static inline void mark_empty (Type *&);
-  static inline bool is_deleted (Type *);
-  static inline bool is_empty (Type *);
 };
 
 template 
@@ -172,34 +208,6 @@ pointer_hash ::equal (const value_type ,
   return existing == candidate;
 }
 
-template 
-inline void
-pointer_hash ::mark_deleted (Type *)
-{
-  e = reinterpret_cast (1);
-}
-
-template 
-inline void
-pointer_hash ::mark_empty (Type *)
-{
-  e = NULL;
-}
-
-template 
-inline bool
-pointer_hash ::is_deleted (Type *e)
-{
-  return e == reinterpret_cast (1);
-}
-
-template 
-inline bool
-pointer_hash ::is_empty (Type *e)
-{
-  return e == NULL;
-}
-
 /* Hasher for "const char *" strings, using string rather than pointer
equality.  */
 
diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index 1a0e12e6c0c..6fde7500cf4 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -64,7 +64,8 @@ ipa_edge_args_sum_t *ipa_edge_args_sum;
 
 /* Traits for a hash table for reusing already existing ipa_bits. */
 
-struct ipa_bit_ggc_hash_traits 

[PATCH] Fix PR91522

2019-08-22 Thread Richard Biener


This fixes quadraticness in STV and makes

 machine dep reorg  :  89.07 ( 95%)   0.02 ( 18%)  89.10 ( 
95%)  54 kB (  0%)

drop to zero.  Anybody remembers why it is the way it is now?

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

OK?

Thanks,
Richard.

2019-08-22  Richard Biener  

PR target/91522
* config/i386/i386-features.c (scalar_chain::add_insn): Do not
iterate over all defs of a reg.

Index: gcc/config/i386/i386-features.c
===
--- gcc/config/i386/i386-features.c (revision 274764)
+++ gcc/config/i386/i386-features.c (working copy)
@@ -419,10 +419,7 @@ scalar_chain::add_insn (bitmap candidate
   df_ref def;
   for (ref = DF_INSN_UID_DEFS (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
 if (!HARD_REGISTER_P (DF_REF_REG (ref)))
-  for (def = DF_REG_DEF_CHAIN (DF_REF_REGNO (ref));
-  def;
-  def = DF_REF_NEXT_REG (def))
-   analyze_register_chain (candidates, def);
+  analyze_register_chain (candidates, ref);
   for (ref = DF_INSN_UID_USES (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
 if (!DF_REF_REG_MEM_P (ref))
   analyze_register_chain (candidates, ref);



Re: [RFC] Enable math functions linking with static library for LTO

2019-08-22 Thread Joseph Myers
On Thu, 22 Aug 2019, Richard Biener wrote:

> Not sure if that helps answering your question.  In essence,
> this function should tell whether we would emit an UNDEF
> in the end and we want to know that early, at LTO IL generation
> time to communicate this to the linker plugin.  Given previous

To me this sounds like e.g. roundeven and fadd (and, generally, any new 
libm functions we add built-in functions for) *should* be included in the 
list like all the other libm functions, given that on many systems they 
can't be expanded inline for all arguments and so will generate UNDEFs 
(and some people might want to get them from a static library) - even if 
some systems' libm might in fact not have those functions.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH][ARM] Remove remaining Neon DImode support

2019-08-22 Thread Kyrill Tkachov

Hi Wilco,

On 7/22/19 5:18 PM, Wilco Dijkstra wrote:

Remove the remaining Neon adddi3, subdi3 and negdi2 patterns.  As a result
adddi3, subdi3 and negdi2 can now always be expanded early 
irrespectively of

whether Neon is available.  Also expand the extenddi patterns at the same
time.  Several Neon arch attributes are no longer used and removed.

Code generation is improved in all cases, saving another 400-500 
instructions
from the PR77308 testcase (total improvement is over 1700 instructions 
with

-mcpu=cortex-a57 -O2).

Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57


Ok.

Thanks,

Kyrill


ChangeLog:
2019-07-19  Wilco Dijkstra  

* config/arm/arm.md (neon_for_64bits): Remove.
(avoid_neon_for_64bits): Remove.
(arm_adddi3): Always split early.
(arm_subdi3): Always split early.
(negdi2): Remove Neon expansion.
(split zero_extend): Split before reload.
(split sign_extend): Split before reload.
---

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 
10ed70dac4384354c0a2453c5e51a29108c6c062..6d8a5a54997caee0e6956f01018cb5300a9a07e1 
100644

--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -125,7 +125,7 @@ (define_attr "length" ""
 ; arm_arch6.  "v6t2" for Thumb-2 with arm_arch6 and "v8mb" for ARMv8-M
 ; Baseline.  This attribute is used to compute attribute "enabled",
 ; use type "any" to enable an alternative in all cases.
-(define_attr "arch" 
"any,a,t,32,t1,t2,v6,nov6,v6t2,v8mb,neon_for_64bits,avoid_neon_for_64bits,iwmmxt,iwmmxt2,armv6_or_vfpv3,neon"
+(define_attr "arch" 
"any,a,t,32,t1,t2,v6,nov6,v6t2,v8mb,iwmmxt,iwmmxt2,armv6_or_vfpv3,neon"

   (const_string "any"))

 (define_attr "arch_enabled" "no,yes"
@@ -168,16 +168,6 @@ (define_attr "arch_enabled" "no,yes"
   (match_test "TARGET_THUMB1 && arm_arch8"))
  (const_string "yes")

- (and (eq_attr "arch" "avoid_neon_for_64bits")
-  (match_test "TARGET_NEON")
-  (not (match_test "TARGET_PREFER_NEON_64BITS")))
- (const_string "yes")
-
- (and (eq_attr "arch" "neon_for_64bits")
-  (match_test "TARGET_NEON")
-  (match_test "TARGET_PREFER_NEON_64BITS"))
- (const_string "yes")
-
  (and (eq_attr "arch" "iwmmxt2")
   (match_test "TARGET_REALLY_IWMMXT2"))
  (const_string "yes")
@@ -450,13 +440,8 @@ (define_expand "adddi3"
 (clobber (reg:CC CC_REGNUM))])]
   "TARGET_EITHER"
   "
-  if (TARGET_THUMB1)
-    {
-  if (!REG_P (operands[1]))
-    operands[1] = force_reg (DImode, operands[1]);
-  if (!REG_P (operands[2]))
-    operands[2] = force_reg (DImode, operands[2]);
- }
+  if (TARGET_THUMB1 && !REG_P (operands[2]))
+    operands[2] = force_reg (DImode, operands[2]);
   "
 )

@@ -465,9 +450,9 @@ (define_insn_and_split "*arm_adddi3"
 (plus:DI (match_operand:DI 1 "arm_general_register_operand" "%0, 0, 
r, 0, r")

  (match_operand:DI 2 "arm_general_adddi_operand" "r,  0, r, Dd, Dd")))
    (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT && !TARGET_NEON"
+  "TARGET_32BIT"
   "#"
-  "TARGET_32BIT && ((!TARGET_NEON && !TARGET_IWMMXT) || 
reload_completed)"

+  "TARGET_32BIT"
   [(parallel [(set (reg:CC_C CC_REGNUM)
    (compare:CC_C (plus:SI (match_dup 1) (match_dup 2))
  (match_dup 1)))
@@ -1290,24 +1275,16 @@ (define_expand "subdi3"
 (clobber (reg:CC CC_REGNUM))])]
   "TARGET_EITHER"
   "
-  if (TARGET_THUMB1)
-    {
-  if (!REG_P (operands[1]))
-    operands[1] = force_reg (DImode, operands[1]);
-  if (!REG_P (operands[2]))
-    operands[2] = force_reg (DImode, operands[2]);
- }
-  "
-)
+")

 (define_insn_and_split "*arm_subdi3"
   [(set (match_operand:DI   0 "arm_general_register_operand" 
"=,,")

 (minus:DI (match_operand:DI 1 "arm_general_register_operand" "0,r,0")
   (match_operand:DI 2 "arm_general_register_operand" "r,0,0")))
    (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT && !TARGET_NEON"
+  "TARGET_32BIT"
   "#"  ; "subs\\t%Q0, %Q1, %Q2\;sbc\\t%R0, %R1, %R2"
-  "&& (!TARGET_IWMMXT || reload_completed)"
+  "TARGET_32BIT"
   [(parallel [(set (reg:CC CC_REGNUM)
    (compare:CC (match_dup 1) (match_dup 2)))
   (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
@@ -4164,13 +4141,6 @@ (define_expand "negdi2"
  (neg:DI (match_operand:DI 1 "s_register_operand")))
 (clobber (reg:CC CC_REGNUM))])]
   "TARGET_EITHER"
-  {
-    if (TARGET_NEON)
-  {
-    emit_insn (gen_negdi2_neon (operands[0], operands[1]));
-DONE;
-  }
-  }
 )

 ;; The constraints here are to prevent a *partial* overlap (where %Q0 
== %R1).

@@ -4182,7 +4152,7 @@ (define_insn_and_split "*negdi2_insn"
   "TARGET_32BIT"
   "#"; rsbs %Q0, %Q1, #0; rsc %R0, %R1, #0   (ARM)
 ; negs %Q0, %Q1    ; sbc %R0, %R1, %R1, lsl #1 (Thumb-2)
-  "&& reload_completed"
+  "TARGET_32BIT"
   [(parallel [(set (reg:CC CC_REGNUM)
    (compare:CC (const_int 0) (match_dup 1)))
   (set (match_dup 0) (minus:SI (const_int 0) (match_dup 1)))])
@@ -4714,25 +4684,17 @@ (define_insn "extenddi2"
 (define_split
   [(set (match_operand:DI 0 "s_register_operand" "")

Re: [PATCH][ARM] Cleanup DImode shifts

2019-08-22 Thread Kyrill Tkachov

Hi Wilco,

On 7/31/19 5:25 PM, Wilco Dijkstra wrote:

ping


Like the logical operations, expand all shifts early rather than only
 sometimes.  The Neon shift expansions are never emitted (not even with
 -fneon-for-64bits), so they are not useful.  So all the late expansions
 and Neon shift patterns can be removed, and shifts are more optimized
 as a result.  Since some extend patterns use Neon DImode shifts, remove
 the Neon extend variants and related splits.

 A simple example (relying on [1]) generates the same efficient code after
 this patch with -mfpu=neon and -mfpu=vfp (previously just the fact of
 having Neon enabled resulted inefficient code for no reason).

 unsigned long long f(unsigned long long x, unsigned long long y)
 { return x & (y >> 33); }

 Before:
 strd    r4, r5, [sp, #-8]!
 lsr r4, r3, #1
 mov r5, #0
 and r1, r1, r5
 and r0, r0, r4
 ldrd    r4, r5, [sp]
 add sp, sp, #8
 bx  lr

 After:
 and r0, r0, r3, lsr #1
 mov r1, #0
 bx  lr

 Bootstrap and regress OK on arm-none-linux-gnueabihf 
--with-cpu=cortex-a57


Seems to me we should deprecate -mneon-for-64bits for GCC 10 and look to 
remove (or make it a no-op at least) in future releases...


Ok for trunk.

Please keep an eye for regressions as with the other patches in this series.

Thanks,

Kyrill



 [1] https://gcc.gnu.org/ml/gcc-patches/2019-07/msg01301.html

 ChangeLog:
 2019-07-19  Wilco Dijkstra  

 * config/arm/iterators.md (qhs_extenddi_cstr): Update.
 (qhs_extenddi_cstr): Likewise.
 * config/arm/arm.md (ashldi3): Always expand early.
 (ashlsi3): Likewise.
 (ashrsi3): Likewise.
 (zero_extenddi2): Remove Neon variants.
 (extenddi2): Likewise.
 * config/arm/neon.md (ashldi3_neon_noclobber): Remove.
 (signed_shift_di3_neon): Likewise.
 (unsigned_shift_di3_neon): Likewise.
 (ashrdi3_neon_imm_noclobber): Likewise.
 (lshrdi3_neon_imm_noclobber): Likewise.
 (di3_neon): Likewise.
 (split extend): Remove DI extend split patterns.

 testsuite/
 * gcc.target/arm/neon-extend-1.c: Remove test.
 * gcc.target/arm/neon-extend-2.c: Remove test.
 ---

 diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
 index 
0dba97a4ebeed0c2133936ca662f1c9e86ffc6ba..10ed70dac4384354c0a2453c5e51a29108c6c062 
100644

 --- a/gcc/config/arm/arm.md
 +++ b/gcc/config/arm/arm.md
 @@ -3601,44 +3601,14 @@ (define_insn "*satsi__shift"
  (define_expand "ashldi3"
    [(set (match_operand:DI    0 "s_register_operand")
  (ashift:DI (match_operand:DI 1 "s_register_operand")
 -   (match_operand:SI 2 "general_operand")))]
 +   (match_operand:SI 2 "reg_or_int_operand")))]
    "TARGET_32BIT"
    "
 -  if (TARGET_NEON)
 -    {
 -  /* Delay the decision whether to use NEON or core-regs until
 -    register allocation.  */
 -  emit_insn (gen_ashldi3_neon (operands[0], operands[1], 
operands[2]));

 -  DONE;
 -    }
 -  else
 -    {
 -  /* Only the NEON case can handle in-memory shift counts.  */
 -  if (!reg_or_int_operand (operands[2], SImode))
 -    operands[2] = force_reg (SImode, operands[2]);
 -    }
 -
 -  if (!CONST_INT_P (operands[2]) && TARGET_REALLY_IWMMXT)
 -    ; /* No special preparation statements; expand pattern as above.  */
 -  else
 -    {
 -  rtx scratch1, scratch2;
 -
 -  /* Ideally we should use iwmmxt here if we could know that 
operands[1]

 - ends up already living in an iwmmxt register. Otherwise it's
 - cheaper to have the alternate code being generated than moving
 - values to iwmmxt regs and back.  */
 -
 -  /* Expand operation using core-registers.
 -    'FAIL' would achieve the same thing, but this is a bit 
smarter.  */

 -  scratch1 = gen_reg_rtx (SImode);
 -  scratch2 = gen_reg_rtx (SImode);
 -  arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1],
 -    operands[2], scratch1, scratch2);
 -  DONE;
 -    }
 -  "
 -)
 +  arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1],
 +    operands[2], gen_reg_rtx (SImode),
 +    gen_reg_rtx (SImode));
 +  DONE;
 +")

  (define_expand "ashlsi3"
    [(set (match_operand:SI    0 "s_register_operand")
 @@ -3661,35 +3631,11 @@ (define_expand "ashrdi3"
   (match_operand:SI 2 "reg_or_int_operand")))]
    "TARGET_32BIT"
    "
 -  if (TARGET_NEON)
 -    {
 -  /* Delay the decision whether to use NEON or core-regs until
 -    register allocation.  */
 -  emit_insn (gen_ashrdi3_neon (operands[0], operands[1], 
operands[2]));

 -  DONE;
 -    }
 -
 -  if (!CONST_INT_P (operands[2]) && TARGET_REALLY_IWMMXT)
 -    ; /* No special preparation statements; expand pattern as above.  

Re: [PATCH][ARM] Cleanup logical DImode operations

2019-08-22 Thread Kyrill Tkachov

Hi Wilco,

On 7/19/19 12:30 PM, Wilco Dijkstra wrote:


Cleanup the logical DImode operations since the current implementation 
is way
too complicated.  Thumb-1, Thumb-2, VFP/Neon and iwMMXt all work 
differently,
resulting in a bewildering number of expansions, patterns and splits 
across

several md files.  All this complexity is counterproductive and results in
inefficient code.

A much simpler approach is to split these operations early in the expander
so that optimizations and register allocation are applied on the 
32-bit halves.
Code generation is unchanged on Thumb-1 and Arm/Thumb-2 without Neon 
or iwMMXt
(which already expand these instructions early). With Neon these 
changes save
~1000 instructions from the PR77308 testcase, mostly by significantly 
reducing

register pressure and spilling.

Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57

OK for commit?


Ok. Thanks for doing this.

Please keep an eye out for fallout after committing.

Kyrill




ChangeLog:
2019-07-18  Wilco Dijkstra  

* config/arm/arm.md (split and/eor/ior): Remove Neon check.
(split not): Add DImode not splitter.
(anddi3): Remove pattern.
(anddi3_insn): Likewise.
(anddi_zesidi_di): Likewise.
(anddi_sesdi_di): Likewise.
(anddi_notdi_di): Likewise.
(anddi_notzesidi_di): Likewise.
(anddi_notsesidi_di): Likewise.
(iordi3): Likewise.
(iordi3_insn): Likewise.
(iordi_zesidi_di): Likewise.
(iordi_sesidi_di): Likewise.
(xordi3): Likewise.
(xordi3_insn): Likewise.
(xordi_sesidi_di): Likewise.
(xordi_zesidi_di): Likewise.
(one_cmpldi2): Likewise.
(one_cmpldi2_insn): Likewise.
* config/arm/constraints.md: Remove De, Df, Dg constraints.
* config/arm/iwmmxt.md (iwmmxt_iordi3): Remove general register
alternative.
(iwmmxt_xordi3): Likewise.
(iwmmxt_anddi3): Likewise.
* config/arm/neon.md (orndi3_neon): Remove pattern.
(anddi_notdi_di): Likewise.
* config/arm/predicates.md (arm_anddi_operand_neon): Remove.
(arm_iordi_operand_neon): Likewise.
(arm_xordi_operand_neon): Likewise.
* config/arm/thumb2.md(iordi_notdi_di): Remove pattern.
(iordi_notzesidi_di): Likewise.
(iordi_notdi_zesidi): Likewise.
(iordi_notsesidi_di): Likewise.


---
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 
8f4a4c26ea849a023f2e63d2efbf327423512dfc..cab59c403b777c37c1e412ab9a69db2c2ec533a2 
100644

--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -2183,19 +2183,16 @@ (define_expand "divdf3"
   "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP_DOUBLE"
   "")

-;; Boolean and,ior,xor insns

-;; Split up double word logical operations
-
-;; Split up simple DImode logical operations. Simply perform the logical
+;; Split DImode and, ior, xor operations.  Simply perform the logical
 ;; operation on the upper and lower halves of the registers.
+;; This is needed for atomic operations in arm_split_atomic_op.
 (define_split
   [(set (match_operand:DI 0 "s_register_operand" "")
 (match_operator:DI 6 "logical_binary_operator"
   [(match_operand:DI 1 "s_register_operand" "")
    (match_operand:DI 2 "s_register_operand" "")]))]
   "TARGET_32BIT && reload_completed
-   && ! (TARGET_NEON && IS_VFP_REGNUM (REGNO (operands[0])))
    && ! IS_IWMMXT_REGNUM (REGNO (operands[0]))"
   [(set (match_dup 0) (match_op_dup:SI 6 [(match_dup 1) (match_dup 2)]))
    (set (match_dup 3) (match_op_dup:SI 6 [(match_dup 4) (match_dup 5)]))]
@@ -2210,167 +2207,20 @@ (define_split
   }"
 )

+;; Split DImode not (needed for atomic operations in 
arm_split_atomic_op).

 (define_split
-  [(set (match_operand:DI 0 "s_register_operand" "")
-(match_operator:DI 6 "logical_binary_operator"
-  [(sign_extend:DI (match_operand:SI 2 "s_register_operand" ""))
-   (match_operand:DI 1 "s_register_operand" "")]))]
-  "TARGET_32BIT && reload_completed"
-  [(set (match_dup 0) (match_op_dup:SI 6 [(match_dup 1) (match_dup 2)]))
-   (set (match_dup 3) (match_op_dup:SI 6
-[(ashiftrt:SI (match_dup 2) (const_int 31))
- (match_dup 4)]))]
-  "
-  {
-    operands[3] = gen_highpart (SImode, operands[0]);
-    operands[0] = gen_lowpart (SImode, operands[0]);
-    operands[4] = gen_highpart (SImode, operands[1]);
-    operands[1] = gen_lowpart (SImode, operands[1]);
-    operands[5] = gen_highpart (SImode, operands[2]);
-    operands[2] = gen_lowpart (SImode, operands[2]);
-  }"
-)
-
-;; The zero extend of operand 2 means we can just copy the high part of
-;; operand1 into operand0.
-(define_split
-  [(set (match_operand:DI 0 "s_register_operand" "")
-(ior:DI
-  (zero_extend:DI (match_operand:SI 2 "s_register_operand" ""))
-  (match_operand:DI 1 "s_register_operand" "")))]
-  "TARGET_32BIT && operands[0] != operands[1] && reload_completed"
-  [(set (match_dup 0) (ior:SI (match_dup 1) (match_dup 2)))
-   (set (match_dup 3) (match_dup 4))]
-  "
-  {
-    operands[4] = gen_highpart (SImode, operands[1]);
-    operands[3] = gen_highpart (SImode, operands[0]);
-    operands[0] = gen_lowpart (SImode, operands[0]);
-    operands[1] = gen_lowpart (SImode, operands[1]);
-  }"
-)
-
-;; The 

[PATCH] or1k: Fix issue with set_got clobbering r9

2019-08-22 Thread Stafford Horne
When compiling glibc we found that the GOT register was being allocated
r9 when the instruction was still set_got_tmp.  That caused set_got to
clobber r9.  We cannot simply say set_got_tmp clobbers r9 as this is the
reason for having the temporary set_got_tmp.

Fix by using a register class constraint that does not allow r9 during
register allocation.

gcc/ChangeLog:

* config/or1k/constraints.md (t): New constraint.
* config/or1k/or1k.h (GOT_REGS): New register class.
* config/or1k/or1k.md (set_got_tmp, set_got): Use t contraint.
---
 gcc/config/or1k/constraints.md | 4 
 gcc/config/or1k/or1k.h | 3 +++
 gcc/config/or1k/or1k.md| 4 ++--
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/gcc/config/or1k/constraints.md b/gcc/config/or1k/constraints.md
index 8cac7eb5329..ba330c6b8c2 100644
--- a/gcc/config/or1k/constraints.md
+++ b/gcc/config/or1k/constraints.md
@@ -25,6 +25,7 @@
 ; We use:
 ;  c - sibcall registers
 ;  d - double pair base registers (excludes r0, r30 and r31 which overflow)
+;  t - got address registers (excludes r9 is clobbered by set_got)
 ;  I - constant signed 16-bit
 ;  K - constant unsigned 16-bit
 ;  M - constant signed 16-bit shifted left 16-bits (l.movhi)
@@ -36,6 +37,9 @@
 (define_register_constraint "d" "DOUBLE_REGS"
   "Registers which can be used for double reg pairs.")
 
+(define_register_constraint "t" "GOT_REGS"
+  "Registers which can be used to store the Global Offset Table (GOT) 
address.")
+
 ;; Immediates
 (define_constraint "I"
   "A signed 16-bit immediate in the range -32768 to 32767."
diff --git a/gcc/config/or1k/or1k.h b/gcc/config/or1k/or1k.h
index 2b29e62fdd3..4c32607bac1 100644
--- a/gcc/config/or1k/or1k.h
+++ b/gcc/config/or1k/or1k.h
@@ -190,6 +190,7 @@ enum reg_class
   NO_REGS,
   SIBCALL_REGS,
   DOUBLE_REGS,
+  GOT_REGS,
   GENERAL_REGS,
   FLAG_REGS,
   ALL_REGS,
@@ -202,6 +203,7 @@ enum reg_class
   "NO_REGS",   \
   "SIBCALL_REGS",  \
   "DOUBLE_REGS",   \
+  "GOT_REGS",  \
   "GENERAL_REGS",  \
   "FLAG_REGS", \
   "ALL_REGS" }
@@ -215,6 +217,7 @@ enum reg_class
 { { 0x, 0x },  \
   { SIBCALL_REGS_MASK,   0 },  \
   { 0x7f7e, 0x },  \
+  { 0xfdff, 0x },  \
   { 0x, 0x0003 },  \
   { 0x, 0x0004 },  \
   { 0x, 0x0007 }   \
diff --git a/gcc/config/or1k/or1k.md b/gcc/config/or1k/or1k.md
index cee11d078cc..36bcee336ab 100644
--- a/gcc/config/or1k/or1k.md
+++ b/gcc/config/or1k/or1k.md
@@ -706,7 +706,7 @@
 ;; set_got pattern below.  This works because the set_got_tmp insn is the
 ;; first insn in the stream and that it isn't moved during RA.
 (define_insn "set_got_tmp"
-  [(set (match_operand:SI 0 "register_operand" "=r")
+  [(set (match_operand:SI 0 "register_operand" "=t")
(unspec_volatile:SI [(const_int 0)] UNSPECV_SET_GOT))]
   ""
 {
@@ -715,7 +715,7 @@
 
 ;; The insn to initialize the GOT.
 (define_insn "set_got"
-  [(set (match_operand:SI 0 "register_operand" "=r")
+  [(set (match_operand:SI 0 "register_operand" "=t")
(unspec:SI [(const_int 0)] UNSPEC_SET_GOT))
(clobber (reg:SI LR_REGNUM))]
   ""
-- 
2.21.0



Re: [patch][aarch64]: add intrinsics for vld1(q)_x4 and vst1(q)_x4

2019-08-22 Thread Kyrill Tkachov



On 8/19/19 5:18 PM, James Greenhalgh wrote:

On Thu, Aug 15, 2019 at 12:28:27PM +0100, Kyrill Tkachov wrote:
> Hi all,
>
> On 8/6/19 10:51 AM, Richard Earnshaw (lists) wrote:
> On 18/07/2019 18:18, James Greenhalgh wrote:
> > On Mon, Jun 10, 2019 at 06:21:05PM +0100, Sylvia Taylor wrote:
> >> Greetings,
> >>
> >> This patch adds the intrinsic functions for:
> >> - vld1__x4
> >> - vst1__x4
> >> - vld1q__x4
> >> - vst1q__x4
> >>
> >> Bootstrapped and tested on aarch64-none-linux-gnu.
> >>
> >> Ok for trunk? If yes, I don't have any commit rights, so can someone
> >> please commit it on my behalf.
> >
> > Hi,
> >
> > I'm concerned by this strategy for implementing the arm_neon.h 
builtins:

> >
> >> +__extension__ extern __inline int8x8x4_t
> >> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> >> +vld1_s8_x4 (const int8_t *__a)
> >> +{
> >> +  union { int8x8x4_t __i; __builtin_aarch64_simd_xi __o; } __au;
> >> +  __au.__o
> >> +    = __builtin_aarch64_ld1x4v8qi ((const 
__builtin_aarch64_simd_qi *) __a);

> >> +  return __au.__i;
> >> +}
> >
> > As far as I know this is undefined behaviour in C++11. This was 
the best

> > resource I could find pointing to the relevant standards paragraphs.
> >
> > 
https://stackoverflow.com/questions/11373203/accessing-inactive-union-member-and-undefined-behavior

> >
> > That said, GCC explicitly allows it, so maybe this is fine?
> >
> > 
https://gcc.gnu.org/onlinedocs/gcc-9.1.0/gcc/Optimize-Options.html#Type-punning

> >
> > Can anyone from the languages side chime in on whether we're exposing
> > undefined behaviour (in either C or C++) here?
>
> Yes, this is a GNU extension.  My only question is whether or not this
> can be disabled within GCC if you're trying to check for strict
> standards conformance of your code?  And if so, is there a way of making
> sure that this header still works in that case?  A number of GNU
> extensions can be protected with __extension__ but it's not clear how
> that could be applied in this case.  Perhaps the outer __extension__ on
> the function will already do that.
>
>
> It should still work. The only relevant flag is -fstrict-aliasing 
and it is

> documented to preserve this case:
>
> 
https://gcc.gnu.org/onlinedocs/gcc-9.2.0/gcc/Optimize-Options.html#Optimize-Options

>
> Note that we've already been using this idiom in arm_neon.h since 
2014 [1]

> and it's worked fine.

Based on that input, this is OK for trunk.



I've committed this to trunk on Sylvia's behalf as r274820.

Thanks,

Kyrill



Thanks,
James

>
> Thanks,
>
> Kyrill
>
> [1] http://gcc.gnu.org/r209880
>
>
>
> R.


Re: [SVE] PR86753

2019-08-22 Thread Richard Biener
On Wed, Aug 21, 2019 at 8:24 PM Prathamesh Kulkarni
 wrote:
>
> On Thu, 15 Aug 2019 at 01:50, Richard Sandiford
>  wrote:
> >
> > Richard Biener  writes:
> > > On Wed, Aug 14, 2019 at 6:49 PM Richard Biener
> > >  wrote:
> > >>
> > >> On Wed, Aug 14, 2019 at 5:06 PM Prathamesh Kulkarni
> > >>  wrote:
> > >> >
> > >> > Hi,
> > >> > The attached patch tries to fix PR86753.
> > >> >
> > >> > For following test:
> > >> > void
> > >> > f1 (int *restrict x, int *restrict y, int *restrict z)
> > >> > {
> > >> >   for (int i = 0; i < 100; ++i)
> > >> > x[i] = y[i] ? z[i] : 10;
> > >> > }
> > >> >
> > >> > vect dump shows:
> > >> >   vect_cst__42 = { 0, ... };
> > >> >   vect_cst__48 = { 0, ... };
> > >> >
> > >> >   vect__4.7_41 = .MASK_LOAD (vectp_y.5_38, 4B, loop_mask_40);
> > >> >   _4 = *_3;
> > >> >   _5 = z_12(D) + _2;
> > >> >   mask__35.8_43 = vect__4.7_41 != vect_cst__42;
> > >> >   _35 = _4 != 0;
> > >> >   vec_mask_and_46 = mask__35.8_43 & loop_mask_40;
> > >> >   vect_iftmp.11_47 = .MASK_LOAD (vectp_z.9_44, 4B, vec_mask_and_46);
> > >> >   iftmp.0_13 = 0;
> > >> >   vect_iftmp.12_50 = VEC_COND_EXPR  > >> > vect_iftmp.11_47, vect_cst__49>;
> > >> >
> > >> > and following code-gen:
> > >> > L2:
> > >> > ld1wz0.s, p2/z, [x1, x3, lsl 2]
> > >> > cmpne   p1.s, p3/z, z0.s, #0
> > >> > cmpne   p0.s, p2/z, z0.s, #0
> > >> > ld1wz0.s, p0/z, [x2, x3, lsl 2]
> > >> > sel z0.s, p1, z0.s, z1.s
> > >> >
> > >> > We could reuse vec_mask_and_46 in vec_cond_expr since the conditions
> > >> > vect__4.7_41 != vect_cst__48 and vect__4.7_41 != vect_cst__42
> > >> > are equivalent, and vect_iftmp.11_47 depends on vect__4.7_41 != 
> > >> > vect_cst__48.
> > >> >
> > >> > I suppose in general for vec_cond_expr  if T comes from 
> > >> > masked load,
> > >> > which is conditional on C, then we could reuse the mask used in load,
> > >> > in vec_cond_expr ?
> > >> >
> > >> > The patch maintains a hash_map cond_to_vec_mask
> > >> > from  vec_mask (with loop predicate applied).
> > >> > In prepare_load_store_mask, we record  -> vec_mask & 
> > >> > loop_mask,
> > >> > and in vectorizable_condition, we check if  exists in
> > >> > cond_to_vec_mask
> > >> > and if found, the corresponding vec_mask is used as 1st operand of
> > >> > vec_cond_expr.
> > >> >
> > >> >  is represented with cond_vmask_key, and the patch
> > >> > adds tree_cond_ops to represent condition operator and operands coming
> > >> > either from cond_expr
> > >> > or a gimple comparison stmt. If the stmt is not comparison, it returns
> > >> >  and inserts that into cond_to_vec_mask.
> > >> >
> > >> > With patch, the redundant p1 is eliminated and sel uses p0 for above 
> > >> > test.
> > >> >
> > >> > For following test:
> > >> > void
> > >> > f2 (int *restrict x, int *restrict y, int *restrict z, int fallback)
> > >> > {
> > >> >   for (int i = 0; i < 100; ++i)
> > >> > x[i] = y[i] ? z[i] : fallback;
> > >> > }
> > >> >
> > >> > input to vectorizer has operands swapped in cond_expr:
> > >> >   _36 = _4 != 0;
> > >> >   iftmp.0_14 = .MASK_LOAD (_5, 32B, _36);
> > >> >   iftmp.0_8 = _4 == 0 ? fallback_12(D) : iftmp.0_14;
> > >> >
> > >> > So we need to check for inverted condition in cond_to_vec_mask,
> > >> > and swap the operands.
> > >> > Does the patch look OK so far ?
> > >> >
> > >> > One major issue remaining with the patch is value  numbering.
> > >> > Currently, it does value numbering for entire function using sccvn
> > >> > during start of vect pass, which is too expensive since we only need
> > >> > block based VN. I am looking into that.
> > >>
> > >> Why do you need it at all?  We run VN on the if-converted loop bodies 
> > >> btw.
> >
> > This was my suggestion, but with the idea being to do the numbering
> > per-statement as we vectorise.  We'll then see pattern statements too.
> >
> > That's important because we use pattern statements to set the right
> > vector boolean type (e.g. vect_recog_mask_conversion_pattern).
> > So some of the masks we care about don't exist after if converison.
> >
> > > Also I can't trivially see the equality of the masks and probably so
> > > can't VN.  Is it that we just don't bother to apply loop_mask to
> > > VEC_COND but there's no harm if we do?
> >
> > Yeah.  The idea of the optimisation is to decide when it's more profitable
> > to apply the loop mask, even though doing so isn't necessary.  It would
> > be hard to do after vectorisation because the masks aren't equivalent.
> > We're relying on knowledge of how the vectoriser uses the result.
> Hi,
> Sorry for late response. This is an updated patch, that integrates
> block-based VN into vect pass.
> The patch
> (a) Exports visit_stmt (renamed to vn_visit_stmt), vn_bb_init to
> initialize VN state, and vn_bb_free to free it.
> (b) Calls vn_visit_stmt in vect_transform_stmt for value numbering
> stmts. We're only interested in obtaining
> value numbers, not eliminating redundancies.
> Does it look in the 

Re: [PATCH][AArch64] Add support for missing CPUs

2019-08-22 Thread Kyrill Tkachov

Hi Dennis,

On 8/21/19 10:27 AM, Dennis Zhang wrote:

Hi all,

This patch adds '-mcpu' options for following CPUs:
Cortex-A77, Cortex-A76AE, Cortex-A65, Cortex-A65AE, and Cortex-A34.

Related specifications are as following:
https://developer.arm.com/ip-products/processors/cortex-a

Bootstraped/regtested for aarch64-none-linux-gnu.

Please help to check if it's ready.


This looks ok to me but you'll need maintainer approval.

Thanks,

Kyrill



Many thanks!
Dennis

gcc/ChangeLog:

2019-08-21  Dennis Zhang  

    * config/aarch64/aarch64-cores.def (AARCH64_CORE): New entries
    for Cortex-A77, Cortex-A76AE, Cortex-A65, Cortex-A65AE, and
    Cortex-A34.
    * config/aarch64/aarch64-tune.md: Regenerated.
    * doc/invoke.texi: Document the new processors.


Re: [PATCH 2/8] bpf: new GCC port

2019-08-22 Thread Segher Boessenkool
On Thu, Aug 22, 2019 at 06:26:57AM -0400, Hans-Peter Nilsson wrote:
> On Wed, 21 Aug 2019, Segher Boessenkool wrote:
> > On Tue, Aug 20, 2019 at 04:05:40PM -0400, Hans-Peter Nilsson wrote:
> > > On Tue, 20 Aug 2019, Jose E. Marchesi wrote:
> > > > > On Thu, Aug 15, 2019 at 12:22:46AM +0200, Jose E. Marchesi wrote:
> > > > > > --- a/configure
> > > > > > +++ b/configure
> > > > Yeah by mistake I used a Debian patched autoconf 2.96.  Will regenerate
> > > > using vanilla autoconf for subsequent versions of the patch.
> > >
> > > It's nice that this is identified and hopefully resolved, but
> > > since nobody mentioned it I'll just point out that
> > > it's preferable to *not at all* include generated files like
> > > configure in patches.  See
> > > .
> >
> > "Do not include generated files as part of the patch, just mention them
> > in the ChangeLog (e.g., "* configure: Regenerate.")."
> >
> > That's not common practice nowadays I think?
> 
> I don't know about that.  But, if that's a valid observation
> then I'd say people just tend to generally drift to the path of
> least resistance for the immediate task at hand.  It's rarely an
> indication of *good* practice by itself, but perhaps that it's
> time to bring it up.

(The changelog should just say "Regenerate.", I'm refering only to
posting the actual diff to the configure script).

> > It's also not good advice
> > for people who might get it wrong.
> 
> You mean the autoconf version, I guess.

For people who get the regeneration wrong in any way.  Wrong version of
auto is frequent, sure.

> That's about the only
> value of including generated files; for newcomers that miss
> using matched versions.  Including bulk in patches can also
> cause grief by messages being lost when they exceed the
> size-limit.

Again, I only meant the configuration scripts.

(What autogenerated files are much bigger than their source files, btw?
As diffs even?)

> >  Also, the patches on the mailing
> > list should preferably be exactly what is committed.  For sanity.
> 
> IMHO sanity for gcc-patches is maximum review-value and
> readability for others, avoiding redundancy.  I'd just think
> sending generated files for review is as redundant now as it was
> when that advice was added.

For many things you are right afaics.  But I see people getting auto
wrong often.

> Though, if you as a frequent reviewer believe that the valued
> majority of reviewers and readers thinks differently, I suggest
> taking steps to update the advice.  Either way, keep the advice
> up-to-date, for sanity.

:-)


Segher


Re: [PATCH] Add missing popcount simplifications (PR90693)

2019-08-22 Thread Richard Biener
On Wed, Aug 21, 2019 at 5:53 PM Jeff Law  wrote:
>
> On 8/21/19 7:57 AM, Wilco Dijkstra wrote:
> > Hi Richard,
> >
> >>>
> >>> I think this should be in expand stage where there could be comparison
> >>> of the cost of the RTLs.
> >>
> >> I tend to agree here, if not then for the reason the "simplified" variants
> >> have more GIMPLE stmts which means they are not "simpler".  In
> >> fact I'd argue for canonicalization we'd want to have the reverse
> >> "simplifications" on GIMPLE and expansion based on target cost.
> >
> > So how would this work? Expand works on one statement at a time, but
> > we are dealing with more complex expressions here. When we get a
> > popcount (x) > 1 in expand_gimple_cond, the popcount has already been
> > expanded. And the code in builtins.c that emits popcount doesn't see or
> > consider the comparison, so it would be difficult to change it at that 
> > point.
> > None of the infrastructure in expand seems to be set up to do complex
> > pattern matches and replacements at expand time...
> >
> > Costing would be difficult too since rtx_cost doesn't support builtins or
> > calls, so each backend would need to be modified to add costs for these.
> >
> > So what is the best place to do pattern matches? I thought it was all
> > moving to match.pd.
> I believe the expanders have access to more than one statement via the
> use-def chains and TER's transformations.

Either that or as repeatedly suggested elsewhere more complex
"expand time instruction selection" can happen on GIMPLE right
before RTL expansion (pass_optimize_widening_mul is a pass
doing something like that).  We probably want to have a more
formalized thing at some point as part of RTL expansion itself
to also get rid of TER.

The issue with using TER for this is that TER doesn't "handle"
internal FN calls I think (which is simply an oversight):

  /* Increment counter if this is a non BUILT_IN call. We allow
 replacement over BUILT_IN calls since many will expand to inline
 insns instead of a true call.  */
  if (is_gimple_call (stmt)
  && !((fndecl = gimple_call_fndecl (stmt))
   && fndecl_built_in_p (fndecl)))
cur_call_cnt++;

Richard.

> jeff


Re: [RFA] [tree-optimization/80576] Handle non-constant sizes in DSE

2019-08-22 Thread Richard Biener
On Thu, Aug 22, 2019 at 12:48 AM Jeff Law  wrote:
>
> On 8/19/19 7:57 AM, Richard Biener wrote:
>
> >
> > +static tree
> > +objsize_from_type (tree object)
> > +{
> > +  if (TREE_CODE (object) != ADDR_EXPR)
> > +return NULL_TREE;
> > +
> > +  tree type = TREE_TYPE (object);
> > +  if (POINTER_TYPE_P (type))
> >
> > that if looks suspicious...  I'd say
> >   if (!POINTER_TYPE_P (type))
> > return NULL_TREE;
> >
> > is better
> Sure.  But we may not be able to use this anyway as you noted later,
> depending on the type is not safe.
>
> >
> > +type = TREE_TYPE (type);
> >
> > +  if (TREE_CODE (type) == ARRAY_TYPE && !array_at_struct_end_p (object))
> > +{
> >
> > array_at_struct_end_p will never return true here since you pass it
> > an ADDR_EXPR...  you wanted to pass TREE_OPERAND (object, 0) here?
> I think you're right.  Given this was cribbed from the tail of another
> function elsewhere, I suspect that other function has the same issue.
>
> I suspect ultimately we want to fix that other copy and drop
> objsize_from_type.
>
>
> >
> > Also you seem to use this info to constrain optimization when you
> > might remember that types of addresses do not carry such information...
> > Thus it should be "trivially" possible to write a testcase that is 
> > miscompiled
> > after your patch.  I also don't see this really exercised in the
> > testcases you add?
> Arggh.  You're absolutely correct.  I must be blocking out that entire
> discussion from last summer due to the trama :-)
>
> If the destination is the address of a _DECL node, can we use the size
> of the _DECL?

Yes, but this should already happen for both invariant ones like 
and variant ones like [i].c in ao_ref_init_from_ptr_and_size.

Richard.

>
>
> > All of them reference a static array a ...
> >
> > +   if (TREE_CODE (size) != INTEGER_CST)
> >
> > TREE_CODE (size) == SSA_NAME you want
> Agreed.  Will fix.
>
> jeff


Re: [PATCH] issue consistent warning for past-the-end array stores (PR 91457)

2019-08-22 Thread Richard Biener
On Wed, Aug 21, 2019 at 10:10 PM Martin Sebor  wrote:
>
> On 8/20/19 1:26 AM, Richard Biener wrote:
> > On Tue, Aug 20, 2019 at 4:32 AM Martin Sebor  wrote:
> >>
> >> On 8/19/19 8:10 AM, Richard Biener wrote:
> >>> On Sat, Aug 17, 2019 at 12:43 AM Martin Sebor  wrote:
> 
>  With the recent enhancement to the strlen handling of multibyte
>  stores the g++.dg/warn/Warray-bounds-4.C for zero-length arrays
>  started failing on hppa (and probably elsewhere as well).  This
>  is partly the result of the added detection of past-the-end
>  writes into the strlen pass which detects more instances of
>  the problem than -Warray-bounds.  Since the IL each warning
>  works with varies between targets, the same invalid code can
>  be diagnosed by one warning one target and different warning
>  on another.
> 
>  The attached patch does three things:
> 
>  1) It enhances compute_objsize to also determine the size of
>  a flexible array member (and its various variants), including
>  from its initializer if necessary.  (This resolves 91457 but
>  introduces another warning where was previously just one.)
>  2) It guards the new instance of -Wstringop-overflow with
>  the no-warning bit on the assignment to avoid warning on code
>  that's already been diagnosed.
>  3) It arranges for -Warray-bounds to set the no-warning bit on
>  the enclosing expression to keep -Wstringop-overflow from issuing
>  another warning for the same problem.
> 
>  Testing the compute_objsize enhancement to bring it up to par
>  with -Warray-bounds in turn exposed a weakness in the latter
>  warning for flexible array members.  Rather than snowballing
>  additional improvements into this one I decided to put that
>  off until later, so the new -Warray-bounds test has a bunch
>  of XFAILs.  I'll see if I can find the time to deal with those
>  either still in stage 1 or in stage 3 (one of them is actually
>  an ancient regression).
> >>>
> >>> +static tree
> >>> +get_initializer_for (tree init, tree decl)
> >>> +{
> >>>
> >>> can't you use fold_ctor_reference here?
> >>
> >> Yes, but only with an additional enhancement.  Char initializers
> >> for flexible array members aren't transformed to STRING_CSTs yet,
> >> so without the size of the initializer specified, the function
> >> returns the initializer for the smallest subobject, or char in
> >> this case.  I've enhanced the function to handle them.
> >
> > So at the moment it returns an empty array constructor, correct?
> > Isn't that the more canonical representation?  The STRING_CST
> > index type doesn't really handle "empty" strings and I expect code
> > more confused about that than about an empty CTOR?
>
> Yes.  Returning an empty CTOR is more general than an empty
> string and enables more optimizations.  It requires changing
> the caller(s) that look for a string but I think that's fine.
> Thanks for the hint!
>
> >>> +/* Determine the size of the flexible array FLD from the initializer
> >>> +   expression for the struct object DECL in which the meber is declared
> >>> +   (possibly recursively).  Return the size or zero constant if it isn't
> >>> +   initialized.  */
> >>> +
> >>> +static tree
> >>> +get_flexarray_size (tree decl, tree fld)
> >>> +{
> >>> +  if (tree init = DECL_INITIAL (decl))
> >>> +{
> >>> +  init = get_initializer_for (init, fld);
> >>> +  if (init)
> >>> +   return TYPE_SIZE_UNIT (TREE_TYPE (init));
> >>> +}
> >>> +
> >>> +  return integer_zero_node;
> >>>
> >>> so you're hoping that the (sub-)CONSTRUCTOR get_initializer_for
> >>> returns has a complete type but the initialized object didn't get it
> >>> completed.  Isnt that wishful thinking?
> >>
> >> I don't know what you mean.  When might a CONSTRUCTOR not have
> >> a complete type, and if/when it doesn't, why would that be
> >> a problem here?  TYPE_SIZE_UNIT will evaluate to null meaning
> >> "don't know" and that's fine.  Could you try to be more specific
> >> about the problem you're pointing out?
> >>
> >>> And why return integer_zero_node
> >>> rather than NULL_TREE here?
> >>
> >> Because the size of a flexible array member with no initializer
> >> is zero.
> >>
> >>>
> >>> +  if (TREE_CODE (dest) == COMPONENT_REF)
> >>> +{
> >>> +  *pdecl = TREE_OPERAND (dest, 1);
> >>> +
> >>> +  /* If the member has a size return it.  Otherwise it's a flexible
> >>> +array member.  */
> >>> +  if (tree size = DECL_SIZE_UNIT (*pdecl))
> >>> +   return size;
> >>>
> >>> because here you do.
> >>
> >> Not sure what you mean here either.  (This code was also a bit
> >
> > You return NULL_TREE.
> >
> >> out of date WRT to the patch I had tested.  Not sure how that
> >> happened.  The attached patch is up to date.)
> >>
> >>>
> >>> Also once you have an underlying VAR_DECL you can compute
> >>> the flexarray size by DECL_SIZE (var) - offset-of 

Re: [PATCH 2/8] bpf: new GCC port

2019-08-22 Thread Hans-Peter Nilsson
I didn't expect this to be contested and not by a frequent
reviewer, but since you took the time to express yourself, I'll
do the same.

On Wed, 21 Aug 2019, Segher Boessenkool wrote:

> On Tue, Aug 20, 2019 at 04:05:40PM -0400, Hans-Peter Nilsson wrote:
> > On Tue, 20 Aug 2019, Jose E. Marchesi wrote:
> > > > On Thu, Aug 15, 2019 at 12:22:46AM +0200, Jose E. Marchesi wrote:
> > > > > --- a/configure
> > > > > +++ b/configure
> > > Yeah by mistake I used a Debian patched autoconf 2.96.  Will regenerate
> > > using vanilla autoconf for subsequent versions of the patch.
> >
> > It's nice that this is identified and hopefully resolved, but
> > since nobody mentioned it I'll just point out that
> > it's preferable to *not at all* include generated files like
> > configure in patches.  See
> > .
>
> "Do not include generated files as part of the patch, just mention them
> in the ChangeLog (e.g., "* configure: Regenerate.")."
>
> That's not common practice nowadays I think?

I don't know about that.  But, if that's a valid observation
then I'd say people just tend to generally drift to the path of
least resistance for the immediate task at hand.  It's rarely an
indication of *good* practice by itself, but perhaps that it's
time to bring it up.

> It's also not good advice
> for people who might get it wrong.

You mean the autoconf version, I guess.  That's about the only
value of including generated files; for newcomers that miss
using matched versions.  Including bulk in patches can also
cause grief by messages being lost when they exceed the
size-limit.

>  Also, the patches on the mailing
> list should preferably be exactly what is committed.  For sanity.

IMHO sanity for gcc-patches is maximum review-value and
readability for others, avoiding redundancy.  I'd just think
sending generated files for review is as redundant now as it was
when that advice was added.

Though, if you as a frequent reviewer believe that the valued
majority of reviewers and readers thinks differently, I suggest
taking steps to update the advice.  Either way, keep the advice
up-to-date, for sanity.

brgds, H-P


Re: [PATCH][AArch64] Don't split 64-bit constant stores to volatile location

2019-08-22 Thread Andrew Pinski
On Thu, Aug 22, 2019 at 2:16 AM Kyrill Tkachov
 wrote:
>
> Hi all,
>
> The optimisation to optimise:
> typedef unsigned long long u64;
>
> void bar(u64 *x)
> {
>   *x = 0xabcdef10abcdef10;
> }
>
> from:
>  mov x1, 61200
>  movkx1, 0xabcd, lsl 16
>  movkx1, 0xef10, lsl 32
>  movkx1, 0xabcd, lsl 48
>  str x1, [x0]
>
> into:
>  mov w1, 61200
>  movkw1, 0xabcd, lsl 16
>  stp w1, w1, [x0]
>
> ends up producing two distinct stores if the destination is volatile:
>void bar(u64 *x)
>{
>  *(volatile u64 *)x = 0xabcdef10abcdef10;
>}
>  mov w1, 61200
>  movkw1, 0xabcd, lsl 16
>  str w1, [x0]
>  str w1, [x0, 4]
>
> because we end up not merging the strs into an stp. It's questionable
> whether the use of STP is valid for volatile in the first place.

It is not valid as the architecture does not require stp to be atomic
so then the order of the stores can change

Thanks,
Andrew Pinski

> To avoid unnecessary pain in a context where it's unlikely to be
> performance critical [1] (use of volatile), this patch avoids this
> transformation for volatile destinations, so we produce the original
> single STR-X.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> Ok for trunk (and eventual backports)?
>
> Thanks,
> Kyrill
>
> [1]
> https://lore.kernel.org/lkml/20190821103200.kpufwtviqhpbuv2n@willie-the-truck/
>
>
> gcc/
> 2019-08-22  Kyrylo Tkachov 
>
>  * config/aarch64/aarch64.md (mov): Don't call
>  aarch64_split_dimode_const_store on volatile MEM.
>
> gcc/testsuite/
> 2019-08-22  Kyrylo Tkachov 
>
>  * gcc.target/aarch64/nosplit-di-const-volatile_1.c: New test.
>


Re: [PATCH, libphobos] Committed merge with upstream druntime 5bb8ce19

2019-08-22 Thread Matthias Klose
On 21.08.19 10:02, Iain Buclaw wrote:
> Hi,
> 
> This patch merges the libdruntime library with upstream druntime 5bb8ce19.
> 
> Synchronizes extern(C) bindings with the latest release, mostly this
> is just Musl target support.
> 
> Bootstrapped and regression tested on x86_64-linux-gnu and x86_64-linux-musl.
> 
> Committed to trunk as r274773.

Does the change the ABI?  Asking, because apparently the out-of-tree libgphobos
for GCC 8, and the one in GCC 9 don't seem to be ABI compatible.

Matthias


[PATCH][AArch64] Don't split 64-bit constant stores to volatile location

2019-08-22 Thread Kyrill Tkachov

Hi all,

The optimisation to optimise:
   typedef unsigned long long u64;

   void bar(u64 *x)
   {
 *x = 0xabcdef10abcdef10;
   }

from:
    mov x1, 61200
    movk    x1, 0xabcd, lsl 16
    movk    x1, 0xef10, lsl 32
    movk    x1, 0xabcd, lsl 48
    str x1, [x0]

into:
    mov w1, 61200
    movk    w1, 0xabcd, lsl 16
    stp w1, w1, [x0]

ends up producing two distinct stores if the destination is volatile:
  void bar(u64 *x)
  {
    *(volatile u64 *)x = 0xabcdef10abcdef10;
  }
    mov w1, 61200
    movk    w1, 0xabcd, lsl 16
    str w1, [x0]
    str w1, [x0, 4]

because we end up not merging the strs into an stp. It's questionable 
whether the use of STP is valid for volatile in the first place.
To avoid unnecessary pain in a context where it's unlikely to be 
performance critical [1] (use of volatile), this patch avoids this
transformation for volatile destinations, so we produce the original 
single STR-X.


Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk (and eventual backports)?

Thanks,
Kyrill

[1] 
https://lore.kernel.org/lkml/20190821103200.kpufwtviqhpbuv2n@willie-the-truck/



gcc/
2019-08-22  Kyrylo Tkachov 

    * config/aarch64/aarch64.md (mov): Don't call
    aarch64_split_dimode_const_store on volatile MEM.

gcc/testsuite/
2019-08-22  Kyrylo Tkachov 

    * gcc.target/aarch64/nosplit-di-const-volatile_1.c: New test.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 753c7978700d441c71cf97d5ae1e11f160b270af..cb91da796b3e9bfd6f19e714ccee7791b9cf3714 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1108,8 +1108,8 @@
 	(match_operand:GPI 1 "general_operand"))]
   ""
   "
-if (MEM_P (operands[0]) && CONST_INT_P (operands[1])
-	&& mode == DImode
+if (MEM_P (operands[0]) && !MEM_VOLATILE_P (operands[0])
+	&& CONST_INT_P (operands[1]) && mode == DImode
 	&& aarch64_split_dimode_const_store (operands[0], operands[1]))
   DONE;
 
diff --git a/gcc/testsuite/gcc.target/aarch64/nosplit-di-const-volatile_1.c b/gcc/testsuite/gcc.target/aarch64/nosplit-di-const-volatile_1.c
new file mode 100644
index ..da5975ad1652c00a3b69b3dc19e5c09c77526de7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/nosplit-di-const-volatile_1.c
@@ -0,0 +1,15 @@
+/* Check that storing the 64-bit immediate to a volatile location is done
+   with a single store.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef unsigned long long u64;
+
+void bar (u64 *x)
+{
+  *(volatile u64 *)x = 0xabcdef10abcdef10ULL;
+}
+
+/* { dg-final { scan-assembler-times "str\tx..?, .*" 1 } } */
+/* { dg-final { scan-assembler-not "str\tw..?, .*" } } */


Re: [RFC] Enable math functions linking with static library for LTO

2019-08-22 Thread Richard Biener
On Wed, Aug 21, 2019 at 9:56 PM Joseph Myers  wrote:
>
> I'm afraid I'm not clear on the semantics of builtin_with_linkage_p (as in
> the current checked-in version).  It says:
>
> /* Return true if the builtin DECL is implemented in a standard library.
>Otherwise returns false which doesn't guarantee it is not (thus the list of
>handled builtins below may be incomplete).  */
>
> But what does "is implemented in a standard library" mean?  Certainly
> plenty of GCC ports are for systems that e.g. don't have fmaf64 which is
> included in that switch statement, and calls to fmaf64 on such systems
> will result in an undefined reference unless it's on an architecture where
> GCC can expand that inline.
>
> Specifically: does Tejas's roundeven patch (and likewise the fadd patch)
> need updating to add those functions to the switch statement in
> builtin_with_linkage_p?  What are the correct semantics for a built-in
> function with the following properties: it may not be in the system libc /
> libm on all systems, but GCC can't always expand it inline either, so it
> may result in an undefined reference on systems where it's not in system
> libc / libm?
>
> If it's only meant to be for functions you *know* are in a standard
> library on the system used, I'd expect checks of the libc_has_function
> target hook.  If, however, it's meant to be for functions that might
> result in an undefined reference (where GCC expects there to be a library
> function to satisfy that reference, but some systems don't have that
> function), I'd expect it to be based on whether a library name was passed
> when add_builtin_function was called, if that's something you can
> determine by examining the DECL - and in that case I wouldn't expect
> there to be a long switch statement listing particular functions, because
> the relevant information is specified when the built-in functions are
> created in the first place.

So before the patch we did not tell the linker that we have UNDEFs
to any of those symbol in the IL which means its resolution computation
could be off if for example static libraries were involved.  Usually at
final link time it fixed things for us as those UNDEFs then eventually
appeared out of nothing (as if GCC late generated them).

With the patch I wanted to make sure we never end up link-failing
(due to unresolved symbols) on the IL output of UNDEFs, so we
now have the explicit list of builtins I believe always come from
some library that the GCC link or the user eventually appends to
the linker command line.

Not sure if that helps answering your question.  In essence,
this function should tell whether we would emit an UNDEF
in the end and we want to know that early, at LTO IL generation
time to communicate this to the linker plugin.  Given previous
behavior I want to error on the side of rather omitting the UNDEF.

Richard.

>
> --
> Joseph S. Myers
> jos...@codesourcery.com


Re: [SVE] [aarch64] Add "@" in vcond_mask pattern

2019-08-22 Thread Richard Sandiford
Prathamesh Kulkarni  writes:
> Hi Richard,
> I screwed up while committing pr88839 fix, by not including the change
> to aarch64-sve.md
> that adds "@" to vcond_mask pattern, which resulted in trunk failing to build
> with: "gen_vcond_mask" was not declared in this scope -:/
> The attached patch resolves the failure.
> OK to commit ?

OK, thanks.  (And would have been OK as obvious FWIW.)

Richard

>
> Thanks,
> Prathamesh
>
> 2019-08-22  Prathamesh Kulkarni  
>
>   * aarch64-sve.md (vcond_mask): Add "@".
>
> diff --git a/gcc/config/aarch64/aarch64-sve.md 
> b/gcc/config/aarch64/aarch64-sve.md
> index ac65e691d73..f58353e9c6d 100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -3927,7 +3927,7 @@
>  ;; vcond_mask operand order: true, false, mask
>  ;; UNSPEC_SEL operand order: mask, true, false (as for VEC_COND_EXPR)
>  ;; SEL operand order:mask, true, false
> -(define_expand "vcond_mask_"
> +(define_expand "@vcond_mask_"
>[(set (match_operand:SVE_ALL 0 "register_operand")
>   (unspec:SVE_ALL
> [(match_operand: 3 "register_operand")


Re: [PATCH V2 2/8] bpf: new GCC port

2019-08-22 Thread Segher Boessenkool
Hi!

On Thu, Aug 22, 2019 at 04:11:46AM +0200, Jose E. Marchesi wrote:
> A colleague (who actually _uses_ eBPF extensively, ahem) tells me that
> the kernel verifier allows to pass addresses of the caller's stack
> frame, tracking that it is a ptr to a stack location, and it knows which
> stack it came from.  So it is indeed possible for the callee to access
> the caller's frame, and therefore to pass arguments by reference.

Good news for testability of the GCC port, and also good news for users,
who will have one less (HUGE) arbitrary restriction to deal with :-)

> On the downside, it is not possible for a callee to access the caller's
> frame applying an offset to its frame pointer,

That is true for many targets.

> because the stacks are disjoint.

And even that sometimes.

> This means that most probably I will have to dedicate a real,
> not eliminable register to act as the arg pointer, if I want to get rid
> of the annoying limitation on the number of arguments...  and in order
> to keep ABI compatibility with llvm built objects, this register is
> gonna have to be %r5, i.e. the last register usable to pass arguments,
> but it should be only used for that purpose if the function gets more
> than 5 arguments...  sounds messy, but there is hope, yay!

At *function entry* it is in %r5, but you can immediately copy that
elsewhere, at function start; there is no need to dedicate a hard
register to it.


Segher


Re: [PATCH v6 3/3] PR80791 Consider doloop cmp use in ivopts

2019-08-22 Thread Kewen.Lin
Hi Bin,

on 2019/8/22 下午1:46, Bin.Cheng wrote:
> On Thu, Aug 22, 2019 at 11:18 AM Kewen.Lin  wrote:
>>
>> Hi Bin,
>>
>> Thanks for your time!
>>
>> on 2019/8/21 下午8:32, Bin.Cheng wrote:
>>> On Wed, Aug 14, 2019 at 3:23 PM Kewen.Lin  wrote:

 Hi!

 Comparing to the previous versions of implementation mainly based on the
 existing IV cands but zeroing the related group/use cost, this new one is 
 based
 on Richard and Segher's suggestion introducing one doloop dedicated IV 
 cand.

 Some key points are listed below:
   1) New field doloop_p in struct iv_cand to indicate doloop dedicated IV 
 cand.
   2) Special name "doloop" assigned.
   3) Doloop IV cand with form (niter+1, +, -1)
   4) For doloop IV cand, no extra one cost like BIV, assign zero cost for 
 step.
   5) Support may_be_zero (regressed PR is in this case), the base of 
 doloop IV
  can be COND_EXPR, add handlings in cand_value_at and may_eliminate_iv.
   6) Add more expr support in force_expr_to_var_cost for reasonable cost
  calculation on the IV base with may_be_zero (like COND_EXPR).
   7) Set zero cost when using doloop IV cand for doloop use.
   8) Add three hooks (should we merge _generic and _address?).
 *) have_count_reg_decr_p, is to indicate the target has special 
 hardware
count register, we shouldn't consider the impact of doloop IV when
calculating register pressures.
 *) doloop_cost_for_generic, is the extra cost when using doloop IV 
 cand for
generic type IV use.
 *) doloop_cost_for_address, is the extra cost when using doloop IV 
 cand for
address type IV use.
>>> What will happen if doloop IV cand be used for generic/address type iv
>>> use?  Can RTL doloop can still perform doloop optimization in this
>>> case?
>>>
>>
>> On Power, we put the iteration count into hardware count register, it takes 
>> very
>> high cost to move the count to GPR, so the cost is set as INF to make it 
>> impossible
>> to use it for generic/address type iv use.  But as some discussion before, 
>> on some
>> targets using GPR instead of hardware count register, they probably want to 
>> use this
>> doloop iv used for other uses if profitable.  These two hooks offer the 
>> possibility.
>> In that case, I think RTL doloop can still perform since it can still get the
>> pattern and transform.  The generic/address uses can still use it.

 Bootstrapped on powerpc64le-linux-gnu and regression testing passed 
 excepting
 for one failure on gcc/testsuite/gcc.dg/guality/loop-1.c at -O3 which is 
 tracked
 by PR89983.

 Any comments and suggestions are highly appreciated.  Thanks!
>>> Not sure if I understand the patch correctly, some comments embedded.
>>>
>>> +  /* The number of doloop candidate in the set.  */
>>> +  unsigned n_doloop_cands;
>>> +
>>> This is unnecessary.  See below comments.
>>>
>>> -add_candidate_1 (data, base, step, important,
>>> -IP_NORMAL, use, NULL, orig_iv);
>>> +add_candidate_1 (data, base, step, important, IP_NORMAL, use, NULL, 
>>> doloop,
>>> +orig_iv);
>>>if (ip_end_pos (data->current_loop)
>>>&& allow_ip_end_pos_p (data->current_loop))
>>> -add_candidate_1 (data, base, step, important, IP_END, use, NULL, 
>>> orig_iv);
>>> +add_candidate_1 (data, base, step, important, IP_END, use, NULL, 
>>> doloop,
>>> +orig_iv);
>>> Do we need to skip ip_end_pos case for doloop candidate?  Because the
>>> candidate increment will be inserted in latch, i.e, increment position
>>> is after exit condition.
>>>
>>
>> Yes, we should skip it.  Currently function find_doloop_use has the check on 
>> an
>> empty latch and gimple_cond to latch, partially excluding it.  But it's 
>> still good
>> to guard it directly here.
>>
>>> -  tree_to_aff_combination (iv->base, type, val);
>>> +  tree base = iv->base;
>>> +  /* See add_iv_candidate_for_doloop, if may_be_zero is set, we want to 
>>> extract
>>> + the value under !may_be_zero to get the compact bound which also well 
>>> fits
>>> + for may_be_zero since we ensure the value for it is const one.  */
>>> +  if (cand->doloop_p && desc->may_be_zero && !integer_zerop
>>> (desc->may_be_zero))
>>> +base = fold_build2 (PLUS_EXPR, TREE_TYPE (niter),
>>> +   unshare_expr (rewrite_to_non_trapping_overflow 
>>> (niter)),
>>> +   build_int_cst (TREE_TYPE (niter), 1));
>>> +  tree_to_aff_combination (base, type, val);
>>> I don't quite follow here.  The iv->base is computed from niter, I
>>> suppose compact bound is for cheaper candidate initialization?  Why
>>> it's possible to extract !may_be_zero niter for may_be_zero here?  The
>>> niter under !may_be_zero has no indication about the real niter under
>>> may_be_zero.
>>>
>>
>> As you note below, the