Re: [PATCH] [PR libcpp/71681] Fix handling header.gcc in subdirectories

2016-09-08 Thread Andris Pavenis

On 09/08/2016 12:09 PM, Thomas Schwinge wrote:

Hi!

A few review comments:

On Wed, 7 Sep 2016 20:19:20 +0300, Andris Pavenis  wrote:

This patch fixes handling header.gcc in subdirectories when command line option 
-remap has been
used.

(I have not yet reviewed the logic of your change itself.)  Wouldn't it
be simpler to just unconditionally add a directory separator here?

Is it OK to assume that a directory separator always is "/"?  (Think DOS,
Windows etc.  But maybe there's some "translation layer" beneath this --
I don't know.)

No.

DJGPP supports both '/' and '\'. '\' is OK except in some cases (special handling of paths 
beginning with /dev/). Blind conversion off all '/' to '\' do not work for DJGPP due to this reason

(had related problems in directory gcc/ada).

Windows targets (WINGW, Cygwin): at least in Ada gcc/ada/s-os_lib.adb converts all '/' to '\' for 
Windows targets and without submitted but not yet committed patch also for DJGPP (that broke 
bootstrapping gcc for DJGPP due to gnatmake not working). I have not done however real testing for 
Windows targets myself.

Can you provide some test cases?  (Ah, I now see you got some "Test
script to reproduce problem" attached to  --
this should be turned into a regression test for the GCC testsuite.)

Which could more appropriate place for test-case?
- gcc/testsuite/c-c++-common/cpp
- gcc/testsuite/gcc.dg/cpp

Also should this test be in a separate subdirectory under either of them?

It is good practice to assign a GCC PR to yourself if you're working on
it, and it also helps to post to the PR a comment with a link to the
mailing list archive for patch submissions, etc.

Done

Andris



[PATCH] avoid non-printable characters in diagnostics (c/77620, c/77521)

2016-09-08 Thread Martin Sebor

While working on the -Wformat-length pass I noticed that in some
diagnostics that make use of the %qc and %qs directives GCC prints
non-printable characters raw.  For example, it might print a newline,
corrupting the diagnostic stream (bug 77521).

Some other diagnostics that try to avoid this problem by using
a directive such as %x when the character is not printable might
use the sign-extended value of the character, printing a very
large hexadecimal value (bug 77520).

The attached patch changes the pretty printer to detect non-printable
characters in %qc and %qs directives (but not %c or %s) and print
those in hexadecimal (via "\\x%02x").

Martin

PS I used hexadecimal based on what c-format.c does but now that
I checked more carefully how %qE formats string literals I see it
uses octal.  I think hexadecimal is preferable because it avoids
ambiguity but I'm open to changing it to octal if there's a strong
preference for it.  Incidentally, %qE too suffers from bug 77520
(see below).  The patch doesn't try to fix that.

$ cat z.C && gcc z.C
constexpr int i = "ABC\x7f_\x80XYZ";
z.C:1:19: error: invalid conversion from ‘const char*’ to ‘int’ 
[-fpermissive]

 constexpr int i = "ABC\x7f_\x80XYZ";
   ^
z.C:1:19: error: ‘(int)((const char*)"ABC\177_\3777600XYZ")’ is not 
a constant expression
PR c/77520 - wrong value for extended ASCII characters in -Wformat message
PR c/77521 - %qc format directive should quote non-printable characters

gcc/c-family/ChangeLog:
2016-09-08  Martin Sebor  

	PR c/77520
	PR c/77521
	* c-format.c (argument_parser::find_format_char_info): Use %qc
	format directive unconditionally.

gcc/ChangeLog:
2016-09-08  Martin Sebor  

	PR c/77520
	PR c/77521
	* pretty-print.c (pp_quoted_string): New function.
	(pp_format): Call it for %c and %s directives.

gcc/testsuite/ChangeLog:
2016-09-08  Martin Sebor  

	PR c/77520
	PR c/77521
	* gcc.dg/pr77520.c: New test.
	* gcc.dg/pr77521.c: New test.

diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
index 09d514e..0c17340 100644
--- a/gcc/c-family/c-format.c
+++ b/gcc/c-family/c-format.c
@@ -2355,20 +2355,12 @@ argument_parser::find_format_char_info (char format_char)
 ++fci;
   if (fci->format_chars == 0)
 {
-  if (ISGRAPH (format_char))
-	format_warning_at_char (format_string_loc, format_string_cst,
-format_chars - orig_format_chars,
-OPT_Wformat_,
-"unknown conversion type character"
-" %qc in format",
-format_char);
-  else
-	format_warning_at_char (format_string_loc, format_string_cst,
-format_chars - orig_format_chars,
-OPT_Wformat_,
-"unknown conversion type character"
-" 0x%x in format",
-format_char);
+  format_warning_at_char (format_string_loc, format_string_cst,
+			  format_chars - orig_format_chars,
+			  OPT_Wformat_,
+			  "unknown conversion type character"
+			  " %qc in format",
+			  format_char);
   return NULL;
 }
 
diff --git a/gcc/pretty-print.c b/gcc/pretty-print.c
index 325263e..a39815e 100644
--- a/gcc/pretty-print.c
+++ b/gcc/pretty-print.c
@@ -30,6 +30,8 @@ along with GCC; see the file COPYING3.  If not see
 #include 
 #endif
 
+static void pp_quoted_string (pretty_printer *, const char *, size_t = -1);
+
 /* Overwrite the given location/range within this text_info's rich_location.
For use e.g. when implementing "+" in client format decoders.  */
 
@@ -555,8 +557,20 @@ pp_format (pretty_printer *pp, text_info *text)
 	  break;
 
 	case 'c':
-	  pp_character (pp, va_arg (*text->args_ptr, int));
-	  break;
+	  {
+	/* When quoting, print alphanumeric, punctuation, and the space
+	   character unchanged, and all others in hexadecimal with the
+	   "\x" prefix.  Otherwise print them all unchanged.  */
+	int chr = va_arg (*text->args_ptr, int);
+	if (ISPRINT (chr) || !quote)
+	  pp_character (pp, chr);
+	else
+	  {
+		const char str [2] = { chr, '\0' };
+		pp_quoted_string (pp, str, 1);
+	  }
+	break;
+	  }
 
 	case 'd':
 	case 'i':
@@ -577,7 +591,10 @@ pp_format (pretty_printer *pp, text_info *text)
 	  break;
 
 	case 's':
-	  pp_string (pp, va_arg (*text->args_ptr, const char *));
+	  if (quote)
+	pp_quoted_string (pp, va_arg (*text->args_ptr, const char *));
+	  else
+	pp_string (pp, va_arg (*text->args_ptr, const char *));
 	  break;
 
 	case 'p':
@@ -939,6 +956,41 @@ pp_string (pretty_printer *pp, const char *str)
   pp_maybe_wrap_text (pp, str, str + strlen (str));
 }
 
+/* Append the leading N characters of STRING to the output area of
+   PRETTY-PRINTER, quoting in hexadecimal non-printable characters.
+   Setting N = -1 is as if N were set to strlen (STRING).  The STRING
+   may be line-wrapped if in appropriate mode.  */
+static void
+pp_quoted_string (pretty_printer *pp, const char *str, size_t n /* = -1 */)
+{
+  gcc_checking_assert (str);
+
+  const char *last = str;

[PATCH] PR fortran/77506

2016-09-08 Thread Steve Kargl
Regression tested on x86_64-*-freebsd.  OK to commit?

2016-09-08  Steven G. Kargl  

PR fortran/77506
* array.c (gfc_match_array_constructor): CHARACTER(len=*) cannot
appear in an array constructor.

2016-09-08  Steven G. Kargl  

PR fortran/77506
* gfortran.dg/pr77506.f90: New test.

Index: gcc/fortran/array.c
===
--- gcc/fortran/array.c (revision 240039)
+++ gcc/fortran/array.c (working copy)
@@ -1142,6 +1142,15 @@ gfc_match_array_constructor (gfc_expr **
  gfc_restore_last_undo_checkpoint ();
  goto cleanup;
}
+
+ if (ts.type == BT_CHARACTER
+ && ts.u.cl && !ts.u.cl->length && !ts.u.cl->length_from_typespec)
+   {
+ gfc_error ("Type-spec at %L cannot contain an asterisk for a "
+"type parameter", );
+ gfc_restore_last_undo_checkpoint ();
+ goto cleanup;
+   }
}
 }
   else if (m == MATCH_ERROR)
Index: gcc/testsuite/gfortran.dg/pr77506.f90
===
--- gcc/testsuite/gfortran.dg/pr77506.f90   (nonexistent)
+++ gcc/testsuite/gfortran.dg/pr77506.f90   (working copy)
@@ -0,0 +1,4 @@
+! { dg-do compile }
+program foo
+   print *, [character(len=*)::'ab','cd'] ! { dg-error "contain an asterisk" }
+end program foo

-- 
Steve


[PATCH 6/9] df selftests

2016-09-08 Thread David Malcolm
gcc/ChangeLog:
* df-core.c: Include selftest.h and selftest-rtl.h.
(selftest::dataflow_test::dataflow_test): New ctor.
(selftest::dataflow_test::~dataflow_test): New dtor.
(selftest::test_df_single_set): New function.
(selftest::df_core_c_tests): New function.
* selftest-run-tests.c (selftest::run_tests): Call it.
* selftest.h (selftest::df_core_c_tests): New decl.
---
 gcc/df-core.c| 77 
 gcc/selftest-run-tests.c |  1 +
 gcc/selftest.h   |  1 +
 3 files changed, 79 insertions(+)

diff --git a/gcc/df-core.c b/gcc/df-core.c
index e531d58..cb8e2f9 100644
--- a/gcc/df-core.c
+++ b/gcc/df-core.c
@@ -384,6 +384,8 @@ are write-only operations.
 #include "cfganal.h"
 #include "tree-pass.h"
 #include "cfgloop.h"
+#include "selftest.h"
+#include "selftest-rtl.h"
 
 static void *df_get_bb_info (struct dataflow *, unsigned int);
 static void df_set_bb_info (struct dataflow *, unsigned int, void *);
@@ -2482,3 +2484,78 @@ debug_df_chain (struct df_link *link)
   df_chain_dump (link, stderr);
   fputc ('\n', stderr);
 }
+
+#if CHECKING_P
+
+namespace selftest {
+
+/* dataflow_test's constructor.  Initialize df.  */
+
+dataflow_test::dataflow_test ()
+{
+  rest_of_handle_df_initialize ();
+}
+
+/* dataflow_test's destructor.  Clean up df.  */
+
+dataflow_test::~dataflow_test ()
+{
+  rest_of_handle_df_finish ();
+}
+
+/* Verify df_note on a trivial function.  */
+
+void
+test_df_single_set ()
+{
+  const char *input_dump
+= "(insn 1 0 0 2 (set (reg:SI 100) (reg:SI 101)) -1 (nil))\n";
+  rtl_dump_test t (input_dump, 100);
+  //print_rtl_with_bb (stdout, get_insns (), 1024);
+
+  dataflow_test dftest;
+
+  df_note_add_problem ();
+  df_analyze ();
+  //df_dump (stderr);
+  df_finish_pass (false);
+
+  rtx_insn *insn = get_insn_by_uid (1);
+
+  ASSERT_NE (NULL, REG_NOTES (insn));
+  rtx_expr_list *note0 = as_a  (REG_NOTES (insn));
+
+  rtx_expr_list *note1 = note0->next ();
+  ASSERT_NE (NULL, note1);
+
+  ASSERT_EQ (NULL, note1->next ());
+
+  ASSERT_EQ (SET_SRC (PATTERN (insn)), note0->element ());
+  ASSERT_EQ (REG_DEAD, REG_NOTE_KIND (note0));
+
+  ASSERT_EQ (SET_DEST (PATTERN (insn)), note1->element ());
+  ASSERT_EQ (REG_UNUSED, REG_NOTE_KIND (note1));
+
+  struct df_lr_bb_info *bb_info = df_lr_get_bb_info (2);
+  ASSERT_NE (NULL, bb_info);
+
+  /* "r100 = r101;" so we should have a use of r101.  */
+  ASSERT_FALSE (bitmap_bit_p (_info->use, t.effective_regno (100)));
+  ASSERT_TRUE (bitmap_bit_p (_info->use, t.effective_regno (101)));
+
+  /* ...and a def of r100.  */
+  ASSERT_TRUE (bitmap_bit_p (_info->def, t.effective_regno (100)));
+  ASSERT_FALSE (bitmap_bit_p (_info->def, t.effective_regno (101)));
+}
+
+/* Run all of the selftests within this file.  */
+
+void
+df_core_c_tests ()
+{
+  test_df_single_set ();
+}
+
+} // namespace selftest
+
+#endif /* #if CHECKING_P */
diff --git a/gcc/selftest-run-tests.c b/gcc/selftest-run-tests.c
index c90037c..14e5828 100644
--- a/gcc/selftest-run-tests.c
+++ b/gcc/selftest-run-tests.c
@@ -64,6 +64,7 @@ selftest::run_tests ()
   gimple_c_tests ();
   rtl_tests_c_tests ();
   read_rtl_function_c_tests ();
+  df_core_c_tests ();
 
   /* Higher-level tests, or for components that other selftests don't
  rely on.  */
diff --git a/gcc/selftest.h b/gcc/selftest.h
index 75fea6f..037a5ee 100644
--- a/gcc/selftest.h
+++ b/gcc/selftest.h
@@ -190,6 +190,7 @@ extern void forcibly_ggc_collect ();
 /* Declarations for specific families of tests (by source file), in
alphabetical order.  */
 extern void bitmap_c_tests ();
+extern void df_core_c_tests ();
 extern void diagnostic_c_tests ();
 extern void diagnostic_show_locus_c_tests ();
 extern void edit_context_c_tests ();
-- 
1.8.5.3



[PATCH 5/9] Introduce class function_reader

2016-09-08 Thread David Malcolm
This patch generalizes the RTL-reading capabilities so that they
can be run on the host as well as the build machine.
The available rtx in rtl.def changes dramatically between these
two configurations, so a fair amount of #ifdef GENERATOR_FILE is
required to express this.

This patch introduces a function_reader subclass of rtx_reader,
capable of reading an RTL function dump (or part of one),
reconstructing a cfun with a CFG and basic blocks containing insns.

gcc/ChangeLog:
* Makefile.in (OBJS): Add errors.o, read-md.o, read-rtl.o,
read-rtl-function.o, and selftest-rtl.o.
* cfgexpand.c (pass_expand::execute): Move stack initializations
to rtl_data::init_stack_alignment and call it.  Pass "true"
for new "emit_insns" param of expand_function_start.
* emit-rtl.c (gen_reg_rtx): Move regno_pointer_align and
regno_reg_rtx resizing logic to...
(emit_status::ensure_regno_capacity): ...this new method.
(init_emit): Allocate regno_reg_rtx using ggc_cleared_vec_alloc
rather than ggc_vec_alloc.
(rtl_data::init_stack_alignment): New method.
(get_insn_by_uid): New function.
* emit-rtl.h (rtl_data::init_stack_alignment): New method.
* errors.c: Use consistent pattern for bconfig.h vs config.h
includes.
(progname): Wrap with #ifdef GENERATOR_FILE.
(error): Likewise.  Add "error: " to message.
(fatal): Likewise.
(internal_error): Likewise.
(trim_filename): Likewise.
(fancy_abort): Likewise.
* errors.h (struct file_location): Move here from read-md.h.
(file_location::file_location): Likewise.
(error_at): New decl.
* function-tests.c (selftest::verify_three_block_rtl_cfg): Remove
"static".
* function.c (instantiate_decls): Guard call to
instantiate_decls_1 with if (DECL_INITIAL (fndecl)).
(expand_function_start): Add param "emit_insns", and use it to
guard the various gen/emit calls.
* function.h (emit_status::ensure_regno_capacity): New method.
(expand_function_start): Add bool param to decl.
* gensupport.c (gen_reader::gen_reader): Add NULL for new policy
param of rtx_reader ctor.
* print-rtl.c (print_rtx): Print "(nil)" rather than an empty
string for NULL strings.  Print "(nil)" for NULL basic blocks.
* read-md.c (read_skip_construct): Provide forward decl.
(read_skip_spaces): Support '/'.
(require_char): New function.
(require_word_ws): New function.
(peek_char): New function.
(read_name): Rename to...
(read_name_1): ...this new static function, adding "out_loc" param,
and converting "missing name or number" to returning false, rather
than failing.
(read_name): Reimplement in terms of read_name_1.
(read_name_or_nil): New function.
(read_string): Handle "(nil)" by returning NULL.  */
(rtx_reader::rtx_reader): Add rtl_reader_policy * param, using
it to initialize m_policy.
(rtx_reader::~rtx_reader): Free m_base_dir.  Clean up global data.
* read-md.h (struct file_location): Move to errors.h.
(file_location::file_location): Likewise.
Include errors.h.
(class regno_remapper): New class.
(struct rtl_reader_policy): New struct.
(rtx_reader::rtx_reader): Add rtl_reader_policy * param.
(rtx_reader::add_fixup_insn_uid): New vfunc.
(rtx_reader::add_fixup_bb): New vfunc.
(rtx_reader::add_fixup_note_insn_basic_block): New vfunc.
(rtx_reader::add_fixup_source_location): New vfunc.
(rtx_reader::add_fixup_jump_label): New vfunc.
(rtx_reader::add_fixup_expr): New vfunc.
(rtx_reader::remap_regno): New method.
(rtx_reader::m_policy): New field.
(noop_reader::noop_reader): Add NULL for new policy param of
rtx_reader ctor.
(peek_char): New decl.
(require_char): New decl.
(require_word_ws): New decl.
(read_name): Convert return type from void to file_location.
(read_name_or_nil): New decl.
* read-rtl-function.c: New file.
* read-rtl-function.h: New file.
* read-rtl.c: Potentially include config.h rather than bconfig.h.
For host, include function.h and emit-rtl.h.
(apply_subst_iterator): Wrap with #ifdef GENERATOR_FILE.
(bind_subst_iter_and_attr): Likewise.
(add_condition_to_string): Likewise.
(add_condition_to_rtx): Likewise.
(apply_attribute_uses): Likewise.
(add_current_iterators): Likewise.
(apply_iterators): Likewise.
(initialize_iterators): Guard usage of apply_subst_iterator with
#ifdef GENERATOR_FILE.
(read_conditions): Wrap with #ifdef GENERATOR_FILE.
(read_mapping): Likewise.
(add_define_attr_for_define_subst): Likewise.

[PATCH 4/9] Expose forcibly_ggc_collect and run it after all selftests

2016-09-08 Thread David Malcolm
Force a GC at the end of the selftests, to shake out GC-related
issues.  For example, if any GC-managed items have buggy (or missing)
finalizers, this last collection will ensure that things that were
failed to be finalized can be detected by valgrind.

gcc/ChangeLog:
* ggc-tests.c (forcibly_ggc_collect): Rename to...
(selftest::forcibly_ggc_collect): ...this, and remove "static".
(test_basic_struct): Update for above renaming.
(test_length): Likewise.
(test_union): Likewise.
(test_finalization): Likewise.
(test_deletable_global): Likewise.
(test_inheritance): Likewise.
(test_chain_next): Likewise.
(test_user_struct): Likewise.
(test_tree_marking): Likewise.
* selftest-run-tests.c (selftest::run_tests): Call
selftest::forcibly_ggc_collect at the end of the selftests.
* selftest.h (selftest::forcibly_ggc_collect): New decl.
---
 gcc/ggc-tests.c  | 28 ++--
 gcc/selftest-run-tests.c |  6 ++
 gcc/selftest.h   |  5 +
 3 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/gcc/ggc-tests.c b/gcc/ggc-tests.c
index 7f97231..b9cd276 100644
--- a/gcc/ggc-tests.c
+++ b/gcc/ggc-tests.c
@@ -27,19 +27,19 @@ along with GCC; see the file COPYING3.  If not see
 
 #if CHECKING_P
 
-/* The various GTY markers must be outside of a namespace to be seen by
-   gengtype, so we don't put this file within the selftest namespace.  */
-
 /* A helper function for writing ggc tests.  */
 
-static void
-forcibly_ggc_collect ()
+void
+selftest::forcibly_ggc_collect ()
 {
   ggc_force_collect = true;
   ggc_collect ();
   ggc_force_collect = false;
 }
 
+/* The various GTY markers must be outside of a namespace to be seen by
+   gengtype, so we don't put this file within the selftest namespace.  */
+
 
 
 /* Verify that a simple struct works, and that it can
@@ -58,7 +58,7 @@ test_basic_struct ()
   root_test_struct = ggc_cleared_alloc  ();
   root_test_struct->other = ggc_cleared_alloc  ();
 
-  forcibly_ggc_collect ();
+  selftest::forcibly_ggc_collect ();
 
   ASSERT_TRUE (ggc_marked_p (root_test_struct));
   ASSERT_TRUE (ggc_marked_p (root_test_struct->other));
@@ -88,7 +88,7 @@ test_length ()
   for (int i = 0; i < count; i++)
 root_test_of_length->elem[i] = ggc_cleared_alloc  ();
 
-  forcibly_ggc_collect ();
+  selftest::forcibly_ggc_collect ();
 
   ASSERT_TRUE (ggc_marked_p (root_test_of_length));
   for (int i = 0; i < count; i++)
@@ -162,7 +162,7 @@ test_union ()
   test_struct *referenced_by_other = ggc_cleared_alloc  ();
   other->m_ptr = referenced_by_other;
 
-  forcibly_ggc_collect ();
+  selftest::forcibly_ggc_collect ();
 
   ASSERT_TRUE (ggc_marked_p (root_test_of_union_1));
   ASSERT_TRUE (ggc_marked_p (ts));
@@ -202,7 +202,7 @@ test_finalization ()
 
   test_struct_with_dtor::dtor_call_count = 0;
 
-  forcibly_ggc_collect ();
+  selftest::forcibly_ggc_collect ();
 
   /* Verify that the destructor was run for each instance.  */
   ASSERT_EQ (count, test_struct_with_dtor::dtor_call_count);
@@ -220,7 +220,7 @@ test_deletable_global ()
   test_of_deletable = ggc_cleared_alloc  ();
   ASSERT_TRUE (test_of_deletable != NULL);
 
-  forcibly_ggc_collect ();
+  selftest::forcibly_ggc_collect ();
 
   ASSERT_EQ (NULL, test_of_deletable);
 }
@@ -293,7 +293,7 @@ test_inheritance ()
   test_some_subclass_as_base_ptr = new some_subclass ();
   test_some_other_subclass_as_base_ptr = new some_other_subclass ();
 
-  forcibly_ggc_collect ();
+  selftest::forcibly_ggc_collect ();
 
   /* Verify that the roots and everything referenced by them got marked
  (both for fields in the base class and those in subclasses).  */
@@ -372,7 +372,7 @@ test_chain_next ()
   tail_node = new_node;
 }
 
-  forcibly_ggc_collect ();
+  selftest::forcibly_ggc_collect ();
 
   /* If we got here, we survived.  */
 
@@ -439,7 +439,7 @@ test_user_struct ()
 
   num_calls_to_user_gt_ggc_mx = 0;
 
-  forcibly_ggc_collect ();
+  selftest::forcibly_ggc_collect ();
 
   ASSERT_TRUE (ggc_marked_p (root_user_struct_ptr));
   ASSERT_TRUE (ggc_marked_p (referenced));
@@ -457,7 +457,7 @@ test_tree_marking ()
 {
   dummy_unittesting_tree = build_int_cst (integer_type_node, 1066);
 
-  forcibly_ggc_collect ();
+  selftest::forcibly_ggc_collect ();
 
   ASSERT_TRUE (ggc_marked_p (dummy_unittesting_tree));
 }
diff --git a/gcc/selftest-run-tests.c b/gcc/selftest-run-tests.c
index d9d3ea1..54a9b0f 100644
--- a/gcc/selftest-run-tests.c
+++ b/gcc/selftest-run-tests.c
@@ -80,6 +80,12 @@ selftest::run_tests ()
   /* Run any lang-specific selftests.  */
   lang_hooks.run_lang_selftests ();
 
+  /* Force a GC at the end of the selftests, to shake out GC-related
+ issues.  For example, if any GC-managed items have buggy (or missing)
+ finalizers, this last collection will ensure that things that were
+ failed to be finalized can be detected by valgrind.  */
+  forcibly_ggc_collect ();

[PATCH 0/9] RFC: selftests based on RTL dumps

2016-09-08 Thread David Malcolm
The current selftest code is adequate for testing individual
instructions, but most interesting passes have logic involving the
interaction of multiple instructions, or require a CFG and function
to be runnable.  In theory we could write selftests by programatically
constructing a function and CFG "by hand" via API calls, but this is
awkward, tedious, and the resulting code is unnatural.  Examples can
be seen in function-tests.c; that file constructs trivial 3-block
functions, and is pushing the limits of what I'd want to write
"by hand".

This patch kit provides a more natural way to write RTL selftests,
by providing a parser for RTL dumps (or fragments of RTL dumps).  You
can copy and paste fragments of RTL dump into the source for a pass
and then have selftests that run part of the pass on that dump,
asserting that desired outcomes occur.

This is an updated and rewritten version of the RTL frontend work I
posted in May (c.f. "[PATCH 0/4] RFC: RTL frontend"
https://gcc.gnu.org/ml/gcc-patches/2016-05/msg00352.html).

In that patch kit, I introduced an rtl1 frontend, capable of parsing
RTL dumps, and running just one RTL pass on them, in the hope of
better testing individual RTL passes.

This patch kit takes a somewhat different approach: it provides
the infrastructure for parsing RTL dumps, but doesn't wire it up as
a frontend.  Instead, it just provides a set of classes for use when
writing selftests.  It would be possible to build out an "rtl1"
frontend as a followup to this kit.

The advantage of this approach is that it's possible to run and test
passes at finer granularity: for example, rather than being limited
to testing all of, say, pass "final", we can also write tests that
run just final_scan_insn on individual rtx_insn *, and verify that
the expected output is emitted.  Tests can be written at various
different levels, testing how the components of a pass handle fragments
of an insn, how they handle entire insns, basic blocks, and so on up
to running a whole pass on a whole function.

The disadvantage is that changing a selftest requires recompilation
of cc1 (though that drawback affects selftests in general).

An overview of the kit follows; patches 6-9 are probably the most
interesting, as they show example of the kinds of selftest that can
be written using this approach:

  * Patch 1 tidies up some global state in .md parsing, wrapping it up in
  an rtx_reader class.

  * Patches 2-4 add some selftest support code.

  * Patch 5 introduces a function_reader subclass of patch 1's rtx_reader,
  capable of parsing RTL function dumps (or fragments thereof), and
  generalizes things so that rtx parsing can be run from the host, rather
  than just at build time.

   * Patch 6 uses the infrastructure to write a dataflow selftest.  It's
   a trivial example, but hopefully shows how more interesting selftests
   could be written.  (it's much easier to write a selftest if a similar
   one already exists)

   * Patch 7 does the same, but for combine.c.

   * Patch 8 does the same, but for final.c, showing examples of both
   assembling an entire function, and of assembling individual rtx_insn *
   (to exercise the .md files and asm output code)

   * Patch 9 does the same, but for cse.c.  I attempted to create a
   selftest that directly reproduces PR 71779, though I haven't yet
   been able to to reproduce the issue (just load the insns and run cse
   on them).

These tests are very target-specific and were developed mostly for
target==x86_64, and a little for target==aarch64.
I put clauses like this in the various test functions, which is a kludge:

/* Only run these tests for i386.  */
 #ifndef I386_OPTS_H
return;
 #endif

Is there a better way to express this condition?  (I guess I could
add a selftest::target_is_x86_p () predicate).

Posting for comment (doesn't fully bootstrap yet due to a few stray
warnings).  Patches 1-4 could probably be applicable even without
the rest of the kit.

Thoughts?

David Malcolm (9):
  Introduce class rtx_reader
  Add selftest::read_file
  selftest.h: add temp_override fixture
  Expose forcibly_ggc_collect and run it after all selftests
  Introduce class function_reader
  df selftests
  combine.c selftests
  final.c selftests
  cse.c selftests

 gcc/Makefile.in  |5 +
 gcc/cfgexpand.c  |7 +-
 gcc/combine.c|  155 ++
 gcc/cse.c|  109 +
 gcc/df-core.c|   77 +++
 gcc/emit-rtl.c   |   70 ++-
 gcc/emit-rtl.h   |2 +
 gcc/errors.c |   23 +-
 gcc/errors.h |   13 +
 gcc/final.c  |  271 +++
 gcc/function-tests.c |2 +-
 gcc/function.c   |   41 +-
 gcc/function.h   |4 +-
 gcc/genconstants.c   |3 +-
 gcc/genenums.c   |3 +-
 gcc/genmddeps.c  |3 +-
 gcc/genpreds.c   |9 +-
 gcc/gensupport.c |   29 +-
 gcc/gensupport.h |6 +-
 gcc/ggc-tests.c  

[PATCH 1/9] Introduce class rtx_reader

2016-09-08 Thread David Malcolm
Bundle up various global variables within gensupport.c into a
class rtx_reader, with a view towards making it easier to run the
code more than once in-process.

gcc/ChangeLog:
* genconstants.c (main): Introduce noop_reader and convert call
to read_md_files to a method call.
* genenums.c (main): Likewise.
* genmddeps.c (main): Likewise.
* genpreds.c (write_tm_constrs_h): Replace use of "in_fname" with
rtx_reader_ptr->get_top_level_filename ().
(write_tm_preds_h): Likewise.
(write_insn_preds_c): Likewise.
* gensupport.c (class gen_reader): New subclass of rtx_reader.
(rtx_handle_directive): Convert to...
(gen_reader::handle_unknown_directive): ...this.
(init_rtx_reader_args_cb): Convert return type from bool to
rtx_reader *.  Create a gen_reader instance, using it for the
call to read_md_files.  Return it if no errors occur.
(init_rtx_reader_args): Convert return type from bool to
rtx_reader *.
* gensupport.h (init_rtx_reader_args_cb): Likewise.
(init_rtx_reader_args_cb): Likewise.
* read-md.c (struct file_name_list): Move to class rtx_reader.
(read_md_file): Delete in favor of rtx_reader::m_read_md_file.
(read_md_filename): Delete in favor of
rtx_reader::m_read_md_filename.
(read_md_lineno): Delete in favor of rtx_reader::m_read_md_lineno.
(in_fname): Delete in favor of rtx_reader::m_toplevel_fname.
(base_dir): Delete in favor of rtx_reader::m_base_dir.
(first_dir_md_include): Delete in favor of
rtx_reader::m_first_dir_md_include.
(last_dir_md_include_ptr): Delete in favor of
rtx_reader::m_last_dir_md_include_ptr.
(max_include_len): Delete.
(rtx_reader_ptr): New.
(fatal_with_file_and_line): Use get_filename and get_lineno
accessors of rtx_reader_ptr.
(require_char_ws): Likewise.
(rtx_reader::read_char): New method, based on ::read_char.
(rtx_reader::unread_char): New method, based on ::unread_char.
(read_escape): Use get_filename and get_lineno accessors of
rtx_reader_ptr.
(read_braced_string): Use get_lineno accessor of rtx_reader_ptr.
(read_string): Use get_filename and get_lineno accessors of
rtx_reader_ptr.
(rtx_reader::rtx_reader): New ctor.
(rtx_reader::~rtx_reader): New dtor.
(handle_include): Convert from a function to...
(rtx_reader::handle_include): ...this method, converting
handle_directive from a callback to a virtual function.
(handle_file): Likewise, converting to...
(rtx_reader::handle_file): ...this method.
(handle_toplevel_file): Likewise, converting to...
(rtx_reader::handle_toplevel_file): ...this method.
(rtx_reader::get_current_location): New method.
(parse_include): Convert from a function to...
(rtx_reader::add_include_path): ...this method, dropping redundant
update to unused max_include_len.
(read_md_files): Convert from a function to...
(rtx_reader::read_md_files): ...this method, converting
handle_directive from a callback to a virtual function.
(noop_reader::handle_unknown_directive): New method.
* read-md.h (directive_handler_t): Delete this typedef.
(in_fname): Delete.
(read_md_file): Delete.
(read_md_lineno): Delete.
(read_md_filename): Delete.
(class rtx_reader): New class.
(rtx_reader_ptr): New decl.
(class noop_reader): New subclass of rtx_reader.
(read_char): Reimplement in terms of rtx_reader::read_char.
(unread_char): Reimplement in terms of rtx_reader::unread_char.
(read_md_files): Delete.
* read-rtl.c (read_rtx_code): Update for deletion of globals
read_md_filename and read_md_lineno.
---
 gcc/genconstants.c |   3 +-
 gcc/genenums.c |   3 +-
 gcc/genmddeps.c|   3 +-
 gcc/genpreds.c |   9 ++-
 gcc/gensupport.c   |  29 +--
 gcc/gensupport.h   |   6 +-
 gcc/read-md.c  | 225 ++---
 gcc/read-md.h  |  98 ++-
 gcc/read-rtl.c |   3 +-
 9 files changed, 242 insertions(+), 137 deletions(-)

diff --git a/gcc/genconstants.c b/gcc/genconstants.c
index c10e3e3..e8be5b6 100644
--- a/gcc/genconstants.c
+++ b/gcc/genconstants.c
@@ -79,7 +79,8 @@ main (int argc, const char **argv)
 {
   progname = "genconstants";
 
-  if (!read_md_files (argc, argv, NULL, NULL))
+  noop_reader reader;
+  if (!reader.read_md_files (argc, argv, NULL))
 return (FATAL_EXIT_CODE);
 
   /* Initializing the MD reader has the side effect of loading up
diff --git a/gcc/genenums.c b/gcc/genenums.c
index db46a67..8af8d9a 100644
--- a/gcc/genenums.c
+++ b/gcc/genenums.c
@@ -49,7 +49,8 @@ main (int argc, const char **argv)
 {
   progname = "genenums";
 
-  if 

[PATCH 8/9] final.c selftests

2016-09-08 Thread David Malcolm
gcc/ChangeLog:
* final.c: Include selftest.h and selftest-rtl.h.
(class selftest::temp_asm_out): New subclass of
selftest::named_temp_file.
(selftest::temp_asm_out::temp_asm_out): New ctor.
(selftest::temp_asm_out::~temp_asm_out): New dtor.
(class selftest::asm_out_test): New subclass of
selftest::rtl_dump_test.
(selftest::asm_out_test::asm_out_test): New ctor.
(selftest::test_jump_insn): New function.
(selftest::test_empty_function): New function.
(selftest::test_asm_for_insn): New function.
(TEST_ASM_FOR_INSN): New macro.
(selftest::test_x86_64_leal): New function.
(selftest::test_x86_64_negl): New function.
(selftest::test_x86_64_cmpl): New function.
(selftest::test_x86_64_cmovge): New function.
(selftest::test_x86_64_ret): New function.
(selftest::final_c_tests): New function.
* selftest-run-tests.c (selftest::run_tests): Call
selftest::final_c_tests.
* selftest.h (selftest::final_c_tests): New decl.
---
 gcc/final.c  | 271 +++
 gcc/selftest-run-tests.c |   1 +
 gcc/selftest.h   |   1 +
 3 files changed, 273 insertions(+)

diff --git a/gcc/final.c b/gcc/final.c
index eccc3d8..990f898 100644
--- a/gcc/final.c
+++ b/gcc/final.c
@@ -78,6 +78,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "asan.h"
 #include "rtl-iter.h"
 #include "print-rtl.h"
+#include "selftest.h"
+#include "selftest-rtl.h"
 
 #ifdef XCOFF_DEBUGGING_INFO
 #include "xcoffout.h"  /* Needed for external data declarations.  */
@@ -4894,3 +4896,272 @@ get_call_reg_set_usage (rtx_insn *insn, HARD_REG_SET 
*reg_set,
   COPY_HARD_REG_SET (*reg_set, default_set);
   return false;
 }
+
+#if CHECKING_P
+
+namespace selftest {
+
+/* Fixture for temporarily setting the global "asm_out_file"
+   to a named temporary file.  */
+
+class temp_asm_out : public named_temp_file
+{
+ public:
+  temp_asm_out (const location );
+  ~temp_asm_out ();
+};
+
+/* Constructor.  Open a tempfile for writing and set "asm_out_file" to it.  */
+
+temp_asm_out::temp_asm_out (const location )
+: named_temp_file (".s")
+{
+  gcc_assert (asm_out_file == NULL);
+  asm_out_file = fopen (get_filename (), "w");
+  if (!asm_out_file)
+::selftest::fail_formatted (loc, "unable to open tempfile: %s",
+   get_filename ());
+}
+
+/* Destructor.  Close the tempfile and unset "asm_out_file".
+   The tempfile is unlinked by the named_temp_file dtor.  */
+
+temp_asm_out::~temp_asm_out ()
+{
+  fclose (asm_out_file);
+  asm_out_file = NULL;
+}
+
+/* A subclass of rtl_dump_test for testing asm output.
+   Overrides asm_out_file to a tempfile, and temporarily
+   sets "reload_completed = 1;".  */
+
+class asm_out_test : public rtl_dump_test
+{
+ public:
+  asm_out_test (const location , const char *dump_content);
+
+  /* Get the asm output written so far.  The caller should free
+ the returned ptr.  */
+  char *
+  get_output (const location ) const
+  {
+fflush (asm_out_file);
+return read_file (loc, m_asm_out.get_filename ());
+  }
+
+ private:
+  temp_asm_out m_asm_out;
+  temp_override  m_override_reload_completed;
+  temp_override  m_override_in_section;
+};
+
+/* asm_out_test's constructor.  Write DUMP_CONTENT to a tempfile,
+   overrides "asm_out_file" to a tempfile, and temporarily
+   sets "reload_completed = 1;".
+   Assume that no pseudo regs are present in DUMP_CONTENT (by
+   using the default arg to rtl_dump_test's ctor).  */
+
+asm_out_test::asm_out_test (const location , const char *dump_content)
+: rtl_dump_test (dump_content),
+  m_asm_out (loc),
+  /* Temporarily set reload_completed = true.
+ Needed e.g. by ix86_can_use_return_insn_p.  */
+  m_override_reload_completed (reload_completed, true),
+  /* Temporarily set "in_section = NULL;".  */
+  m_override_in_section (in_section, NULL)
+{
+}
+
+/* Test writing asm for a jump_insn.  */
+
+static void
+test_jump_insn ()
+{
+  const char *input_dump
+= ("(jump_insn 1 0 2 4 (set (pc)\n"
+   "(label_ref 3)) ../../src/gcc/testsuite/rtl.dg/test.c:4 -1\n"
+   " (nil)\n"
+   " -> 3)\n"
+   "(barrier 2 1 3)\n"
+   "(code_label 3 2 0 5 2 (nil) [1 uses])\n");
+  asm_out_test t (SELFTEST_LOCATION, input_dump);
+
+  rtx_insn *jump_insn = get_insn_by_uid (1);
+  ASSERT_EQ (JUMP_INSN, GET_CODE (jump_insn));
+
+  rtx_insn *barrier = get_insn_by_uid (2);
+  ASSERT_EQ (BARRIER, GET_CODE (barrier));
+
+  rtx_insn *code_label = get_insn_by_uid (3);
+  ASSERT_EQ (CODE_LABEL, GET_CODE (code_label));
+
+  int seen;
+  final_scan_insn (jump_insn, stderr, 0, 0, );
+
+  char *c = t.get_output (SELFTEST_LOCATION);
+  ASSERT_STREQ ("\tjmp\t.L2\n", c);
+  free (c);
+}
+
+/* Test writing asm for an empty function.  */
+
+static void
+test_empty_function ()
+{
+  /* Dump of the dump from cc1 in "test.c.289r.dwarf2" 

[PATCH 9/9] cse.c selftests

2016-09-08 Thread David Malcolm
This patch uses rtl_dump_test to start building out a test suite
for cse.

I attempted to create a reproducer for PR 71779; however I'm not yet
able to replicate the bogus cse reported there via the test case.

gcc/ChangeLog:
* cse.c: Include selftest.h and selftest-rtl.h.
(selftest::test_simple_cse): New function.
(selftest::test_pr71779): New function.
(selftest::cse_c_tests): New function.
* selftest-run-tests.c (selftest::run_tests): Call
selftest::cse_c_tests.
* selftest.h (selftest::cse_c_tests): New decl.
---
 gcc/cse.c| 109 +++
 gcc/selftest-run-tests.c |   1 +
 gcc/selftest.h   |   1 +
 3 files changed, 111 insertions(+)

diff --git a/gcc/cse.c b/gcc/cse.c
index 0bfd7ff..f4f06fe 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -41,6 +41,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "dbgcnt.h"
 #include "rtl-iter.h"
+#include "selftest.h"
+#include "selftest-rtl.h"
 
 #ifndef LOAD_EXTEND_OP
 #define LOAD_EXTEND_OP(M) UNKNOWN
@@ -7773,3 +7775,110 @@ make_pass_cse_after_global_opts (gcc::context *ctxt)
 {
   return new pass_cse_after_global_opts (ctxt);
 }
+
+#if CHECKING_P
+
+namespace selftest {
+
+/* Selftests for CSE.  */
+
+/* Simple test of eliminating a redundant (reg + 1) computation
+   i.e. that:
+ r101 = r100 + 1;
+ r102 = r100 + 1; <<< common subexpression
+ *r103 = r101 * r102;
+   can be CSE-ed to:
+ r101 = r100 + 1;
+ r102 = r101; <<< replaced
+ *r103 = r101 * r102;
+   by cse_main.  */
+
+static void
+test_simple_cse ()
+{
+  /* Only run this tests for i386.  */
+#ifndef I386_OPTS_H
+  return;
+#endif
+
+  const char *input_dump
+= (/* "r101 = r100 + 1;" */
+   "(insn 1 0 2 2 (set (reg:SI 101)\n"
+   "   (plus:SI (reg:SI 100)\n"
+   "(const_int 1 [0x1]))) -1 (nil))\n"
+   /* "r102 = r100 + 1;" */
+   "(insn 2 1 3 2 (set (reg:SI 102)\n"
+   "   (plus:SI (reg:SI 100)\n"
+   "(const_int 1 [0x1]))) -1 (nil))\n"
+   /* "*r103 = r101 * r102;" */
+   "(insn 3 2 0 2 (set (mem:SI (reg:SI 103) [1 i+0 S4 A32])\n"
+   "   (mult:SI (reg:SI 101) (reg:SI 102))) -1 (nil))\n"
+   );
+  rtl_dump_test t (input_dump, 100);
+  dataflow_test df_test;
+
+  int tem;
+  tem = cse_main (get_insns (), max_reg_num ());
+  ASSERT_EQ (0, tem);
+
+  /* Verify that insn 2's SET_SRC has been replaced with
+ the SET_DEST of insn 1.  */
+  ASSERT_EQ (SET_DEST (PATTERN (get_insn_by_uid (1))),
+SET_SRC (PATTERN (get_insn_by_uid (2;
+}
+
+/* Towards a regression test for PR 71779.  */
+
+static void
+test_pr71779 ()
+{
+  /* Only run this tests for target==aarch64.  */
+#ifndef GCC_AARCH64_H
+  return;
+#endif
+
+  /* Dump taken from comment 2 of PR 71779, of
+ "...the relevant memory access coming out of expand"
+ with basic block IDs added, and prev/next insns set to
+ 0 at ends.  */
+  const char *input_dump
+= (";; MEM[(struct isl_obj *)] = _obj_map_vtable;\n"
+   "(insn 1045 0 1046 2 (set (reg:SI 480)\n"
+   "(high:SI (symbol_ref:SI (\"isl_obj_map_vtable\") [flags 0xc0] 
))) y.c:12702 -1\n"
+   " (nil))\n"
+   "(insn 1046 1045 1047 2 (set (reg/f:SI 479)\n"
+   "(lo_sum:SI (reg:SI 480)\n"
+   "(symbol_ref:SI (\"isl_obj_map_vtable\") [flags 0xc0] 
))) y.c:12702 -1\n"
+   " (expr_list:REG_EQUAL (symbol_ref:SI (\"isl_obj_map_vtable\") 
[flags 0xc0] )\n"
+   "(nil)))\n"
+   "(insn 1047 1046 1048 2 (set (reg:DI 481)\n"
+   "(subreg:DI (reg/f:SI 479) 0)) y.c:12702 -1\n"
+   " (nil))\n"
+   "(insn 1048 1047 1049 2 (set (zero_extract:DI (reg/v:DI 191 [ 
obj1D.17368 ])\n"
+   "(const_int 32 [0x20])\n"
+   "(const_int 0 [0]))\n"
+   "(reg:DI 481)) y.c:12702 -1\n"
+   " (nil))\n"
+   /* Extra insn, to avoid all of the above from being deleted by DCE.  */
+   "(insn 1049 1048 0 2 (set (mem:DI (reg:DI 191) [1 i+0 S4 A32])\n"
+   " (const_int 1 [0x1])) -1 (nil))\n");
+
+  rtl_dump_test t (input_dump);
+  dataflow_test df_test;
+
+  int tem;
+  tem = cse_main (get_insns (), max_reg_num ());
+  ASSERT_EQ (0, tem);
+}
+
+/* Run all of the selftests within this file.  */
+
+void
+cse_c_tests ()
+{
+  test_simple_cse ();
+  test_pr71779 ();
+}
+
+} // namespace selftest
+#endif /* CHECKING_P */
diff --git a/gcc/selftest-run-tests.c b/gcc/selftest-run-tests.c
index 015572c..5fdfb42 100644
--- a/gcc/selftest-run-tests.c
+++ b/gcc/selftest-run-tests.c
@@ -65,6 +65,7 @@ selftest::run_tests ()
   rtl_tests_c_tests ();
   read_rtl_function_c_tests ();
   df_core_c_tests ();
+  cse_c_tests ();
 
   /* Higher-level tests, or for components that other selftests don't
  rely on.  */
diff 

[PATCH 7/9] combine.c selftests

2016-09-08 Thread David Malcolm
gcc/ChangeLog:
* combine.c: Include selftest.h and selftest-rtl.h.
(try_combine): Add assertion on this_basic_block.
(class selftest::combine_test): New subclass of
selftest::tl_dump_test.
(selftest::combine_test::combine_test): New ctor.
(selftest::test_combining_shifts): New function.
(selftest::test_non_combinable_shifts): New function.
(selftest::combine_c_tests): New function.
* selftest-run-tests.c (selftest::run_tests): Run
selftest::combine_c_tests.
* selftest.h (selftest::combine_c_tests): New decl.
---
 gcc/combine.c| 155 +++
 gcc/selftest-run-tests.c |   1 +
 gcc/selftest.h   |   1 +
 3 files changed, 157 insertions(+)

diff --git a/gcc/combine.c b/gcc/combine.c
index 1b262f9..9c148bb 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -102,6 +102,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "valtrack.h"
 #include "rtl-iter.h"
 #include "print-rtl.h"
+#include "selftest.h"
+#include "selftest-rtl.h"
 
 #ifndef LOAD_EXTEND_OP
 #define LOAD_EXTEND_OP(M) UNKNOWN
@@ -2625,6 +2627,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
   rtx new_other_notes;
   int i;
 
+  gcc_assert (this_basic_block);
+
   /* Immediately return if any of I0,I1,I2 are the same insn (I3 can
  never be).  */
   if (i1 == i2 || i0 == i2 || (i0 && i0 == i1))
@@ -14441,3 +14445,154 @@ make_pass_combine (gcc::context *ctxt)
 {
   return new pass_combine (ctxt);
 }
+
+#if CHECKING_P
+
+namespace selftest {
+
+/* A subclass of rtl_dump_test for testing combine.c.  */
+
+class combine_test : public rtl_dump_test
+{
+ public:
+   combine_test (const char *dump_content, int dumped_first_pseudo_regno);
+
+ private:
+   dataflow_test m_df_test;
+};
+
+/* combine_test's constructor.  Write DUMP_CONTENT to a tempfile and load
+   it.  Initialize df and perform dataflow analysis.  */
+
+combine_test::combine_test (const char *dump_content,
+   int dumped_first_pseudo_regno)
+: rtl_dump_test (dump_content, dumped_first_pseudo_regno),
+  m_df_test ()
+{
+  /* The dataflow instance should have been created by m_df_test's ctor.  */
+  gcc_assert (df);
+
+  /* From rest_of_handle_combine.  */
+  df_set_flags (/*DF_LR_RUN_DCE + */ DF_DEFER_INSN_RESCAN);
+  df_note_add_problem ();
+  df_analyze ();
+}
+
+/* Verify that combine_instructions works, for combining a pair of shifts.
+   Ideally we'd test try_combine by itself, but a fair amount of
+   refactoring would be needed to do so.  */
+
+static void
+test_combining_shifts ()
+{
+  /* Taken from
+   gcc/testsuite/gcc.dg/asr_div1.c -O2 -fdump-rtl-all -mtune=cortex-a53
+ for aarch64, hand editing the prev/next insns to 0 as needed, and
+ editing whitespace to avoid over-long lines.  */
+  const char *input_dump
+= ("(insn 8 0 9 2 (set (reg:DI 78)\n"
+   "(lshiftrt:DI (reg:DI 76)\n"
+   "(const_int 32 [0x20])))\n"
+   "../../src/gcc/testsuite/gcc.dg/asr_div1.c:14\n"
+   "641 {*aarch64_lshr_sisd_or_int_di3}\n"
+   " (expr_list:REG_DEAD (reg:DI 76)\n"
+   "(nil)))\n"
+   "(insn 9 8 0 2 (set (reg:SI 79)\n"
+   "(ashiftrt:SI (subreg:SI (reg:DI 78) 0)\n"
+   "(const_int 3 [0x3])))\n"
+   "../../src/gcc/testsuite/gcc.dg/asr_div1.c:14\n"
+   "642 {*aarch64_ashr_sisd_or_int_si3}\n"
+   " (expr_list:REG_DEAD (reg:DI 78)\n"
+   "(nil)))\n");
+  combine_test t (input_dump, 76);
+
+  rtx_insn *insn_8 = get_insn_by_uid (8);
+  ASSERT_TRUE (insn_8);
+
+  rtx_insn *insn_9 = get_insn_by_uid (9);
+  ASSERT_TRUE (insn_9);
+
+  int rebuild_jump_labels_after_combine
+= combine_instructions (get_insns (), max_reg_num ());
+  ASSERT_FALSE (rebuild_jump_labels_after_combine);
+
+  /* Verify that insns 8 and 9 were combined.  */
+  ASSERT_EQ (1, combine_merges);
+
+  /* insn 8 should now be deleted.  */
+  ASSERT_EQ (NOTE, GET_CODE (insn_8));
+  ASSERT_EQ (NOTE_INSN_DELETED, NOTE_KIND (insn_8));
+
+  /* insn 9 should now be a shift of 35.
+ On aarch64 it's a set; on x86_64 it's a parallel of a set and a clobber
+ of CC.  */
+  rtx set_in_9 = single_set (insn_9);
+  ASSERT_TRUE (set_in_9);
+  rtx src_of_9 = SET_SRC (set_in_9);
+  ASSERT_EQ (ASHIFTRT, GET_CODE (src_of_9));
+  rtx amt = XEXP (src_of_9, 1);
+  ASSERT_TRUE (CONST_INT_P (amt));
+  ASSERT_EQ (35, INTVAL (amt));
+}
+
+/* Test of failing to combine instructions.
+
+   Similar to test_combining_shifts, but with the input register
+   for the 2nd shift hand-edited (from 78 to 80) so that it doesn't come
+   from the output of the 1st shift, so that the shifts should *not*
+   be combinable.  */
+
+static void
+test_non_combinable_shifts ()
+{
+  const char *input_dump
+= ("(insn 8 0 9 2 (set (reg:DI 78)\n"
+   "(lshiftrt:DI (reg:DI 76)\n"
+   "  

[PATCH 2/9] Add selftest::read_file

2016-09-08 Thread David Malcolm
This is used later in the kit by the selftests for final.c

gcc/ChangeLog:
* selftest.c (selftest::read_file): New function.
(selftest::test_read_file): New function.
(selftest::selftest_c_tests): Call test_read_file.
* selftest.h (selftest::read_file): New decl.
---
 gcc/selftest.c | 60 ++
 gcc/selftest.h |  7 +++
 2 files changed, 67 insertions(+)

diff --git a/gcc/selftest.c b/gcc/selftest.c
index 0db69d2..cf7031f 100644
--- a/gcc/selftest.c
+++ b/gcc/selftest.c
@@ -151,6 +151,53 @@ temp_source_file::temp_source_file (const location ,
   fclose (out);
 }
 
+/* Read the contents of PATH into memory, returning a 0-terminated buffer
+   that must be freed by the caller.
+   Fail (and abort) if there are any problems, with LOC as the reported
+   location of the failure.  */
+
+char *
+read_file (const location , const char *path)
+{
+  FILE *f_in = fopen (path, "r");
+  if (!f_in)
+fail_formatted (loc, "unable to open file: %s", path);
+
+  /* Read content, allocating FIXME.  */
+  char *result = NULL;
+  size_t total_sz = 0;
+  size_t alloc_sz = 0;
+  char buf[4096];
+  size_t iter_sz_in;
+
+  while ( (iter_sz_in = fread (buf, 1, sizeof (buf), f_in)) )
+{
+  gcc_assert (alloc_sz >= total_sz);
+  size_t old_total_sz = total_sz;
+  total_sz += iter_sz_in;
+  /* Allow 1 extra byte for 0-termination.  */
+  if (alloc_sz < (total_sz + 1))
+   {
+ size_t new_alloc_sz = alloc_sz ? alloc_sz * 2: total_sz + 1;
+ result = (char *)xrealloc (result, new_alloc_sz);
+ alloc_sz = new_alloc_sz;
+   }
+  memcpy (result + old_total_sz, buf, iter_sz_in);
+}
+
+  if (!feof (f_in))
+fail_formatted (loc, "error reading from %s: %s", path,
+   xstrerror (errno));
+
+  fclose (f_in);
+
+  /* 0-terminate the buffer.  */
+  gcc_assert (total_sz < alloc_sz);
+  result[total_sz] = '\0';
+
+  return result;
+}
+
 /* Selftests for the selftest system itself.  */
 
 /* Sanity-check the ASSERT_ macros with various passing cases.  */
@@ -181,6 +228,18 @@ test_named_temp_file ()
   fclose (f);
 }
 
+/* Verify read_file (and also temp_source_file).  */
+
+static void
+test_read_file ()
+{
+  temp_source_file t (SELFTEST_LOCATION, "test1.s",
+ "\tjmp\t.L2\n");
+  char *buf = read_file (SELFTEST_LOCATION, t.get_filename ());
+  ASSERT_STREQ ("\tjmp\t.L2\n", buf);
+  free (buf);
+}
+
 /* Run all of the selftests within this file.  */
 
 void
@@ -188,6 +247,7 @@ selftest_c_tests ()
 {
   test_assertions ();
   test_named_temp_file ();
+  test_read_file ();
 }
 
 } // namespace selftest
diff --git a/gcc/selftest.h b/gcc/selftest.h
index 3938560..e5f5c60 100644
--- a/gcc/selftest.h
+++ b/gcc/selftest.h
@@ -146,6 +146,13 @@ class line_table_test
 extern void
 for_each_line_table_case (void (*testcase) (const line_table_case &));
 
+/* Read the contents of PATH into memory, returning a 0-terminated buffer
+   that must be freed by the caller.
+   Fail (and abort) if there are any problems, with LOC as the reported
+   location of the failure.  */
+
+extern char *read_file (const location , const char *path);
+
 /* Declarations for specific families of tests (by source file), in
alphabetical order.  */
 extern void bitmap_c_tests ();
-- 
1.8.5.3



[PATCH 3/9] selftest.h: add temp_override fixture

2016-09-08 Thread David Malcolm
We have a lot of global state in our code.  Ideally we'd reduce the
amount of such global state, but a prerequisite for sane refactoring
is having automated testing in place to ensure that the refactoring
doesn't break anything.

However, the global state itself makes it hard to write such automated
testing.

To break this Catch-22, this patch introduces a class temp_override,
for temporarily assigning a value to a global variable, saving the old
value, and then restoring that old value in a dtor.

gcc/ChangeLog:
* selftest.h (selftest::temp_override): New class.
---
 gcc/selftest.h | 29 +
 1 file changed, 29 insertions(+)

diff --git a/gcc/selftest.h b/gcc/selftest.h
index e5f5c60..4c50217 100644
--- a/gcc/selftest.h
+++ b/gcc/selftest.h
@@ -153,6 +153,35 @@ for_each_line_table_case (void (*testcase) (const 
line_table_case &));
 
 extern char *read_file (const location , const char *path);
 
+/* A fixture for temporarily overriding a global variable with a new
+   value.  The original value of the variable is captured in the ctor,
+   and restored in the dtor.  */
+
+template 
+class temp_override
+{
+ public:
+  temp_override (T& var, T new_value)
+  : m_var (var),
+/* Record the current value of VAR.  */
+m_old_value (var)
+  {
+/* Set the var to the new value.  */
+m_var = new_value;
+  }
+
+  ~temp_override ()
+  {
+/* Restore the value of the variable to that stored in the
+   ctor.  */
+m_var = m_old_value;
+  }
+
+ private:
+  T& m_var;
+  T m_old_value;
+};
+
 /* Declarations for specific families of tests (by source file), in
alphabetical order.  */
 extern void bitmap_c_tests ();
-- 
1.8.5.3



[PATCH] PR fortran/77507

2016-09-08 Thread Steve Kargl
The attached patch fixes issues with using keywords with
the IEEE_VALUE and C_ASSOCIATED intrinsic routines. 
Regression tested on x86_64-*-freebsd.  OK to commit?

2016-09-08  Steven G. Kargl  

PR fortran/77507
* intrinsic.c (add_functions):  Use correct keyword.

2016-09-08  Steven G. Kargl  

PR fortran/77507
* ieee/ieee_arithmetic.F90 (IEEE_VALUE_4,IEEE_VALUE_8,IEEE_VALULE_10,
IEEE_VALUE_16):  Use correct keyword.

2016-09-08  Steven G. Kargl  

PR fortran/77507
* gfortran.dg/pr77507.f90: New test.

-- 
Steve
Index: gcc/fortran/intrinsic.c
===
--- gcc/fortran/intrinsic.c	(revision 240039)
+++ gcc/fortran/intrinsic.c	(working copy)
@@ -1239,7 +1239,8 @@ add_functions (void)
 *z = "z", *ln = "len", *ut = "unit", *han = "handler",
 *num = "number", *tm = "time", *nm = "name", *md = "mode",
 *vl = "values", *p1 = "path1", *p2 = "path2", *com = "command",
-*ca = "coarray", *sub = "sub", *dist = "distance", *failed="failed";
+*ca = "coarray", *sub = "sub", *dist = "distance", *failed="failed",
+*c_ptr_1 = "c_ptr_1", *c_ptr_2 = "c_ptr_2";
 
   int di, dr, dd, dl, dc, dz, ii;
 
@@ -2914,8 +2915,8 @@ add_functions (void)
   /* The following functions are part of ISO_C_BINDING.  */
   add_sym_2 ("c_associated", GFC_ISYM_C_ASSOCIATED, CLASS_INQUIRY, ACTUAL_NO,
 	 BT_LOGICAL, dl, GFC_STD_F2003, gfc_check_c_associated, NULL, NULL,
-	 "C_PTR_1", BT_VOID, 0, REQUIRED,
-	 "C_PTR_2", BT_VOID, 0, OPTIONAL);
+	 c_ptr_1, BT_VOID, 0, REQUIRED,
+	 c_ptr_2, BT_VOID, 0, OPTIONAL);
   make_from_module();
 
   add_sym_1 ("c_loc", GFC_ISYM_C_LOC, CLASS_INQUIRY, ACTUAL_NO,
Index: libgfortran/ieee/ieee_arithmetic.F90
===
--- libgfortran/ieee/ieee_arithmetic.F90	(revision 240039)
+++ libgfortran/ieee/ieee_arithmetic.F90	(working copy)
@@ -857,12 +857,12 @@ contains
 
   ! IEEE_VALUE
 
-  elemental real(kind=4) function IEEE_VALUE_4(X, C) result(res)
-implicit none
+  elemental real(kind=4) function IEEE_VALUE_4(X, CLASS) result(res)
+
 real(kind=4), intent(in) :: X
-type(IEEE_CLASS_TYPE), intent(in) :: C
+type(IEEE_CLASS_TYPE), intent(in) :: CLASS
 
-select case (C%hidden)
+select case (CLASS%hidden)
   case (1) ! IEEE_SIGNALING_NAN
 res = -1
 res = sqrt(res)
@@ -895,12 +895,12 @@ contains
  end select
   end function
 
-  elemental real(kind=8) function IEEE_VALUE_8(X, C) result(res)
-implicit none
+  elemental real(kind=8) function IEEE_VALUE_8(X, CLASS) result(res)
+
 real(kind=8), intent(in) :: X
-type(IEEE_CLASS_TYPE), intent(in) :: C
+type(IEEE_CLASS_TYPE), intent(in) :: CLASS
 
-select case (C%hidden)
+select case (CLASS%hidden)
   case (1) ! IEEE_SIGNALING_NAN
 res = -1
 res = sqrt(res)
@@ -934,12 +934,12 @@ contains
   end function
 
 #ifdef HAVE_GFC_REAL_10
-  elemental real(kind=10) function IEEE_VALUE_10(X, C) result(res)
-implicit none
+  elemental real(kind=10) function IEEE_VALUE_10(X, CLASS) result(res)
+
 real(kind=10), intent(in) :: X
-type(IEEE_CLASS_TYPE), intent(in) :: C
+type(IEEE_CLASS_TYPE), intent(in) :: CLASS
 
-select case (C%hidden)
+select case (CLASS%hidden)
   case (1) ! IEEE_SIGNALING_NAN
 res = -1
 res = sqrt(res)
@@ -971,15 +971,16 @@ contains
 res = 0
  end select
   end function
+
 #endif
 
 #ifdef HAVE_GFC_REAL_16
-  elemental real(kind=16) function IEEE_VALUE_16(X, C) result(res)
-implicit none
+  elemental real(kind=16) function IEEE_VALUE_16(X, CLASS) result(res)
+
 real(kind=16), intent(in) :: X
-type(IEEE_CLASS_TYPE), intent(in) :: C
+type(IEEE_CLASS_TYPE), intent(in) :: CLASS
 
-select case (C%hidden)
+select case (CLASS%hidden)
   case (1) ! IEEE_SIGNALING_NAN
 res = -1
 res = sqrt(res)
Index: gcc/testsuite/gfortran.dg/pr77507.f90
===
--- gcc/testsuite/gfortran.dg/pr77507.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr77507.f90	(working copy)
@@ -0,0 +1,7 @@
+! { dg-do compile }
+Program p
+  Use ieee_arithmetic
+  Use iso_c_binding
+  Print *, ieee_value(x=1.0, class=ieee_negative_inf)
+  Print *, c_associated(c_ptr_1=c_null_ptr)
+End Program


[Committed] PR fortran/69514

2016-09-08 Thread Steve Kargl
I've committed the attached patch to trunk after
completing regression testing on x86_64-*-freebsd.

2016-09-08  Steven G. Kargl  

PR fortran/69514
* array.c (gfc_match_array_constructor):  If type-spec is present,
walk the array constructor performing possible conversions for 
numeric types.

2016-09-08  Steven G. Kargl  
Louis Krupp  

PR fortran/69514
* gfortran.dg/pr69514_1.f90: New test.
* gfortran.dg/pr69514_2.f90: New test.

-- 
Steve
Index: gcc/fortran/array.c
===
--- gcc/fortran/array.c	(revision 240038)
+++ gcc/fortran/array.c	(working copy)
@@ -1089,6 +1089,7 @@ match_array_cons_element (gfc_constructo
 match
 gfc_match_array_constructor (gfc_expr **result)
 {
+  gfc_constructor *c;
   gfc_constructor_base head, new_cons;
   gfc_undo_change_set changed_syms;
   gfc_expr *expr;
@@ -1194,8 +1195,6 @@ done:
 	 be converted.  See PR fortran/67803.  */
   if (ts.type == BT_CHARACTER)
 	{
-	  gfc_constructor *c;
-
 	  c = gfc_constructor_first (head);
 	  for (; c; c = gfc_constructor_next (c))
 	{
@@ -1218,6 +1217,14 @@ done:
 		}
 	}
 	}
+
+  /* Walk the constructor and ensure type conversion for numeric types.  */
+  if (gfc_numeric_ts ())
+	{
+	  c = gfc_constructor_first (head);
+	  for (; c; c = gfc_constructor_next (c))
+	gfc_convert_type (c->expr, , 1);
+	}
 }
   else
 expr = gfc_get_array_expr (BT_UNKNOWN, 0, );
Index: gcc/testsuite/gfortran.dg/pr69514_1.f90
===
--- gcc/testsuite/gfortran.dg/pr69514_1.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr69514_1.f90	(working copy)
@@ -0,0 +1,5 @@
+! { dg-do run }
+program foo
+   real, parameter :: x(3) = 2.0 * [real :: 1, 2, 3 ]
+   if (any(x /= [2., 4., 6.])) call abort
+end program foo
Index: gcc/testsuite/gfortran.dg/pr69514_2.f90
===
--- gcc/testsuite/gfortran.dg/pr69514_2.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr69514_2.f90	(working copy)
@@ -0,0 +1,49 @@
+! { dg-do run }
+program p
+ implicit none
+
+ real   , parameter :: arr(3) = [ real:: 2, 2.5, (1.5, 2.5) ]
+ real   , parameter :: ari(3) = [ integer :: 2, 2.5, (1.5, 2.5) ]
+ real   , parameter :: arc(3) = [ complex :: 2, 2.5, (1.5, 2.5) ]
+ integer, parameter :: air(3) = [ real:: 2, 2.5, (1.5, 2.5) ]
+ integer, parameter :: aii(3) = [ integer :: 2, 2.5, (1.5, 2.5) ]
+ integer, parameter :: aic(3) = [ complex :: 2, 2.5, (1.5, 2.5) ]
+ complex, parameter :: acr(3) = [ real:: 2, 2.5, (1.5, 2.5) ]
+ complex, parameter :: aci(3) = [ integer :: 2, 2.5, (1.5, 2.5) ]
+ complex, parameter :: acc(3) = [ complex :: 2, 2.5, (1.5, 2.5) ]
+
+ real   , parameter :: mrr(3) =  4.5   * [ real:: 2, 2.5, (3.5, 4.0) ]
+ real   , parameter :: mri(3) =  4.5   * [ integer :: 2, 2.5, (3.5, 4.0) ]
+ real   , parameter :: mrc(3) =  4.5   * [ complex :: 2, 2.5, (3.5, 4.0) ]
+ integer, parameter :: mir(3) =  4 * [ real:: 2, 2.5, (3.5, 4.0) ]
+ integer, parameter :: mii(3) =  4 * [ integer :: 2, 2.5, (3.5, 4.0) ]
+ integer, parameter :: mic(3) =  4 * [ complex :: 2, 2.5, (3.5, 4.0) ]
+ complex, parameter :: mcr(3) = (4.5, 5.5) * [ real:: 2, 2.5, (3.5, 4.0) ]
+ complex, parameter :: mci(3) = (4.5, 5.5) * [ integer :: 2, 2.5, (3.5, 4.0) ]
+ complex, parameter :: mcc(3) = (4.5, 5.5) * [ complex :: 2, 2.5, (3.5, 4.0) ]
+
+ if (any(arr /= [2.00, 2.50, 1.50])) call abort
+ if (any(ari /= [2.00, 2.00, 1.00])) call abort
+ if (any(arc /= [2.00, 2.50, 1.50])) call abort
+
+ if (any(air /= [2, 2, 1])) call abort
+ if (any(aii /= [2, 2, 1])) call abort
+ if (any(aic /= [2, 2, 1])) call abort
+
+ if (any(acr /= [(2.00, 0.00), (2.50, 0.00), (1.50, 0.00)])) call abort
+ if (any(aci /= [(2.00, 0.00), (2.00, 0.00), (1.00, 0.00)])) call abort
+ if (any(acc /= [(2.00, 0.00), (2.50, 0.00), (1.50, 2.50)])) call abort
+
+ if (any(mrr /= [9.00, 11.25, 15.75])) call abort
+ if (any(mri /= [9.00,  9.00, 13.50])) call abort
+ if (any(mrc /= [9.00, 11.25, 15.75])) call abort
+
+ if (any(mir /= [8, 10, 14])) call abort
+ if (any(mii /= [8,  8, 12])) call abort
+ if (any(mic /= [8, 10, 14])) call abort
+
+ if (any(mcr /= [(9.00, 11.00), (11.25, 13.75), (15.75, 19.25)])) call abort
+ if (any(mci /= [(9.00, 11.00), ( 9.00, 11.00), (13.50, 16.50)])) call abort
+ if (any(mcc /= [(9.00, 11.00), (11.25, 13.75), (-6.25, 37.25)])) call abort
+
+end program p


Re: [PATCH] - improve sprintf buffer overflow detection (middle-end/49905)

2016-09-08 Thread Joseph Myers
On Thu, 8 Sep 2016, Martin Sebor wrote:

> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> index da133a4..4607495 100644
> --- a/gcc/doc/tm.texi.in
> +++ b/gcc/doc/tm.texi.in
> @@ -4081,6 +4081,13 @@ In either case, it remains possible to select 
> code-generation for the alternate
>  scheme, by means of compiler command line switches.
>  @end defmac
>  
> +@deftypefn {Target Hook} {const char *} TARGET_LIBC_PRINTF_POINTER_FORMAT 
> (tree, const char **@var{flags})
> +A hook to determine the target @code{printf} implementation format string
> +that the most closely corresponds to the @code{%p} format directive.
> +The object pointed to by the @var{flags} is set to a string consisting
> +of recognized format flags such as the @code{'#'} character.
> +@end deftypefn

No, the substance of hook documentation should go in target.def with just 
an @hook line in tm.texi.in leading to the documentation going in tm.texi 
automatically.

You appear to be defining a target macro masquerading as a hook.  Please 
don't (new target macros should be avoided where possible); use a proper 
hook.  (Maybe the settings depending on OS rather than architecture means 
it needs to be one of those whose default is a manual setting in 
target-def.h rather than automatically generated, but that should be the 
limit of deviation from the normal workings of hooks.)

> +  const char *pfmt = TARGET_LIBC_PRINTF_POINTER_FORMAT (arg, );

With a proper hook them you'd call targetm.libc_printf_pointer_format.

> + inform (callloc,
> + (nbytes + exact == 1
> +  ? "format output %wu byte into a destination of size %wu"
> +  : "format output %wu bytes into a destination of size %wu"),
> + nbytes + exact, info.objsize);

You need to use G_() around both format strings in such a case; xgettext 
doesn't know how to extract them both.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] - improve sprintf buffer overflow detection (middle-end/49905)

2016-09-08 Thread Martin Sebor

On 09/08/2016 01:45 PM, David Malcolm wrote:

On Thu, 2016-09-08 at 13:03 -0600, Martin Sebor wrote:

Attached is another update to the patch to address the last round
of comments and suggestions, most notably to:

   *  implement the checking of the implementation limit of 4,095 on
  the output of a single directive to allow for the Glibc failure
  due to ENOMEM (the patch issues a warning and disables the
  optimization when this happens)
   *  implement checking for exceeding INT_MAX bytes (warn and disable
  optimization)
   *  call set_range_info when the return value optimization is not
  possible
   *  remove code to work around tree-optimization/71831 (now on
  trunk)

The -fprintf-return value optimization is still disabled.  GCC
successfully bootstraps with it and most tests pass but there's
a failure in the Fortran libgomp tests that I am yet to figure
out.

I'm hoping to get the patch reviewed and hopefully approved while
I debug the libgomp failure.

Martin


I see that you also integrated the substring_loc and format_warning API
into this revision - thanks.


Yes, I forgot to mention it among the highlights.  Thanks for making
the API available to the middle-end!



The patch has a lot of macro-based testcases, presumably for exercising
all of the format codes and boundary conditions, but it seems to be
lacking what I'd call a "usability test case" - a test case that shows
a representative example of idiomatic but buggy code, along with the
full output of the warning, with carets and underlining, so that we can
easily see what the user experience is.  (sorry if there is one and I
didn't see it).


There is a simple test (in the test_sprintf_note() function in
builtin-sprintf-warn-1.c) that exercises the notes but there's always
room for more and more robust test cases :)  I'll see about adding
a few.

FWIW, the challenge here is not in adding them but rather in knowing
when to stop and what level of detail to go to.  With too many test
cases (exercising this level of detail) it can get very tedious and
time consuming to then change the diagnostics or add more detail.
This is not an excuse for not having enough tests.  But striking
the right balance between the level of detail in them is something
I had to grapple with on this project.


 From a marketing point-of-view, I think any new diagnostics like this
deserve a screenshot on the website's gcc-7/changes.html page, showing
a simple example of the above that makes a casual reader think "gcc 7
looks neat; I've definitely made that mistake; I wonder if that's going
to find bugs in my code; I'd better try it at some point".

So please can you add a test case that demonstrates such a screenshot
-worthy example, using:

   /* { dg-options "-fdiagnostics-show-caret" } */

and you can use:

   /* { dg-begin-multiline-output "" }
copy the source, underlines and carets here, omitting trailing dg
directives.
  { dg-end-multiline-output "" } */

(we'd strip away all the dg- directives when making the screenshots for
the website).

The act of creating such an example sometimes suggests tweaks e.g. to
the exact wording of the warning.

Sorry if this seems like I'm picking on you Martin; I just wanted to
share some thoughts that I'm trying to crystallize into general
guidelines on writing diagnostics.


Not at all!  Thanks for the review and for the suggestion to make
use of the multiline DejaGnu directives.  I'll add more tests in
the next revision of the patch (as I have been in each iteration).

Martin


Re: [PATCH, PING] DWARF: process all TYPE_DECL nodes when iterating on scopes

2016-09-08 Thread Pierre-Marie de Rodat

On 09/07/2016 11:30 AM, Richard Biener wrote:

Ok, had time to look at this issue again.  I see the patch works like dwarf2out
works currently with respect to DIE creation order and re-location.


Thank you very much for helping me with this again!

So yes, that was the intent of the patch.


this might be incomplete though for the case where it's say

 typedef const T X;

thus the type of decl is a qualified type?  In this case the qualification might
be at the correct scope but the actual type not or you just relocate the
qualification but not the type DIE it refers to?


I haven’t tested this yet but I guess you are right. A complete patch 
should also probably see if the unqualifier type should be relocated.



That said, with the idea of early debug in place and thus giving
more responsibility to the frontends I wonder in what order the Ada
FE calls debug_hooks.early_global_decl ()? Maybe it's the middle-end
and it should arrange for a more natural order on function nests.  So
I'd appreciate if you can investigate this side of the issue a bit,
that is, simply avoid the bad ordering.


Reordering compilation of function nests was the first idea I had in 
mind when I first worked on this issue, maybe two years ago. I thought 
it would make sense debug info generation-wise, but I wondered if this 
specific order was motivated by code generation concerns.


Anyway I agree it would be a more elegant way out. As the GNU cauldron 
is going to keep me busy, I think I’ll investigate this way next week. 
Thanks again! :-)


--
Pierre-Marie de Rodat


Re: RFA: Small PATCH to add pow2p_hwi to hwint.h

2016-09-08 Thread Jason Merrill
On Thu, Sep 8, 2016 at 11:55 AM, Joseph Myers  wrote:
> On Thu, 8 Sep 2016, Jason Merrill wrote:
>
>> Various places in GCC use negate, bit-and and compare to test whether
>> an integer is a power of 2, but I think it would be clearer for this
>> test to be wrapped in a function.
>
> (x & -x) == x is also true for 0.  Whatever the correct function semantics
> for 0, the comment needs to reflect them.

Yep, I was just realizing that, too.  This much larger patch introduces:

 least_bit_hwi: (x & -x)
 pow2_or_zerop: (x & -x) == x
 pow2p_hwi: x && x == (x & -x)
 ctz_or_zero: floor_log2 (x & -x)

and replaces these patterns accordingly.  I'm not at all attached to the names.

Tested x86_64-pc-linux-gnu.
commit 4b76501b72f3953ac57cb077b4e25a90afb6d9a9
Author: Jason Merrill 
Date:   Thu Sep 8 13:10:12 2016 -0400

Add inline functions for various bitwise operations.

* hwint.h (least_bit_hwi, pow2_or_zerop, pow2p_hwi, ctz_or_zero):
New.
* hwint.c (exact_log2): Use pow2p_hwi.
(ctz_hwi, ffs_hwi): Use least_bit_hwi.
* alias.c (memrefs_conflict_p): Use pow2_or_zerop.
* builtins.c (get_object_alignment_2, get_object_alignment)
(get_pointer_alignment, fold_builtin_atomic_always_lock_free): Use
least_bit_hwi.
* calls.c (compute_argument_addresses, store_one_arg): Use
least_bit_hwi.
* cfgexpand.c (expand_one_stack_var_at): Use least_bit_hwi.
* combine.c (force_to_mode): Use least_bit_hwi.
* emit-rtl.c (set_mem_attributes_minus_bitpos, adjust_address_1):
Use least_bit_hwi.
* expmed.c (synth_mult, expand_divmod): Use ctz_or_zero, ctz_hwi.
(init_expmed_one_conv): Use pow2p_hwi.
* fold-const.c (round_up_loc, round_down_loc): Use pow2_or_zerop.
(fold_binary_loc): Use pow2p_hwi.
* function.c (assign_parm_find_stack_rtl): Use least_bit_hwi.
* gimple-fold.c (gimple_fold_builtin_memory_op): Use pow2p_hwi.
* gimple-ssa-strength-reduction.c (replace_ref): Use least_bit_hwi.
* hsa-gen.c (gen_hsa_addr_with_align, hsa_bitmemref_alignment):
Use least_bit_hwi.
* ipa-cp.c (ipcp_alignment_lattice::meet_with_1): Use least_bit_hwi.
* ipa-prop.c (ipa_modify_call_arguments): Use least_bit_hwi.
* omp-low.c (oacc_loop_fixed_partitions)
(oacc_loop_auto_partitions): Use least_bit_hwi.
* rtlanal.c (nonzero_bits1): Use ctz_or_zero.
* stor-layout.c (place_field): Use least_bit_hwi.
* tree-pretty-print.c (dump_generic_node): Use pow2p_hwi.
* tree-sra.c (build_ref_for_offset): Use least_bit_hwi.
* tree-ssa-ccp.c (ccp_finalize): Use least_bit_hwi.
* tree-ssa-math-opts.c (bswap_replace): Use least_bit_hwi.
* tree-ssa-strlen.c (handle_builtin_memcmp): Use pow2p_hwi.
* tree-vect-data-refs.c (vect_analyze_group_access_1)
(vect_grouped_store_supported, vect_grouped_load_supported)
(vect_permute_load_chain, vect_shift_permute_load_chain)
(vect_transform_grouped_load): Use pow2p_hwi.
* tree-vect-generic.c (expand_vector_divmod): Use ctz_or_zero.
* tree-vect-patterns.c (vect_recog_divmod_pattern): Use ctz_or_zero.
* tree-vect-stmts.c (vectorizable_mask_load_store): Use
least_bit_hwi.
* tsan.c (instrument_expr): Use least_bit_hwi.
* var-tracking.c (negative_power_of_two_p): Use pow2_or_zerop.

diff --git a/gcc/alias.c b/gcc/alias.c
index f4b5a92..277125e 100644
--- a/gcc/alias.c
+++ b/gcc/alias.c
@@ -2534,7 +2534,7 @@ memrefs_conflict_p (int xsize, rtx x, int ysize, rtx y, 
HOST_WIDE_INT c)
 {
   HOST_WIDE_INT sc = INTVAL (XEXP (x, 1));
   unsigned HOST_WIDE_INT uc = sc;
-  if (sc < 0 && -uc == (uc & -uc))
+  if (sc < 0 && pow2_or_zerop (-uc))
{
  if (xsize > 0)
xsize = -xsize;
@@ -2549,7 +2549,7 @@ memrefs_conflict_p (int xsize, rtx x, int ysize, rtx y, 
HOST_WIDE_INT c)
 {
   HOST_WIDE_INT sc = INTVAL (XEXP (y, 1));
   unsigned HOST_WIDE_INT uc = sc;
-  if (sc < 0 && -uc == (uc & -uc))
+  if (sc < 0 && pow2_or_zerop (-uc))
{
  if (ysize > 0)
ysize = -ysize;
diff --git a/gcc/builtins.c b/gcc/builtins.c
index 1073e35..0ccba15 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -305,7 +305,7 @@ get_object_alignment_2 (tree exp, unsigned int *alignp,
{
  ptr_bitmask = TREE_INT_CST_LOW (TREE_OPERAND (addr, 1));
  ptr_bitmask *= BITS_PER_UNIT;
- align = ptr_bitmask & -ptr_bitmask;
+ align = least_bit_hwi (ptr_bitmask);
  addr = TREE_OPERAND (addr, 0);
}
 
@@ -325,7 +325,7 @@ get_object_alignment_2 (tree exp, unsigned int *alignp,
  

Re: [PATCH] - improve sprintf buffer overflow detection (middle-end/49905)

2016-09-08 Thread David Malcolm
On Thu, 2016-09-08 at 13:03 -0600, Martin Sebor wrote:
> Attached is another update to the patch to address the last round
> of comments and suggestions, most notably to:
> 
>   *  implement the checking of the implementation limit of 4,095 on
>  the output of a single directive to allow for the Glibc failure
>  due to ENOMEM (the patch issues a warning and disables the
>  optimization when this happens)
>   *  implement checking for exceeding INT_MAX bytes (warn and disable
>  optimization)
>   *  call set_range_info when the return value optimization is not
>  possible
>   *  remove code to work around tree-optimization/71831 (now on
>  trunk)
> 
> The -fprintf-return value optimization is still disabled.  GCC
> successfully bootstraps with it and most tests pass but there's
> a failure in the Fortran libgomp tests that I am yet to figure
> out.
> 
> I'm hoping to get the patch reviewed and hopefully approved while
> I debug the libgomp failure.
> 
> Martin

I see that you also integrated the substring_loc and format_warning API
into this revision - thanks.

The patch has a lot of macro-based testcases, presumably for exercising
all of the format codes and boundary conditions, but it seems to be
lacking what I'd call a "usability test case" - a test case that shows
a representative example of idiomatic but buggy code, along with the
full output of the warning, with carets and underlining, so that we can
easily see what the user experience is.  (sorry if there is one and I
didn't see it).

>From a marketing point-of-view, I think any new diagnostics like this
deserve a screenshot on the website's gcc-7/changes.html page, showing
a simple example of the above that makes a casual reader think "gcc 7
looks neat; I've definitely made that mistake; I wonder if that's going
to find bugs in my code; I'd better try it at some point".

So please can you add a test case that demonstrates such a screenshot
-worthy example, using:

  /* { dg-options "-fdiagnostics-show-caret" } */

and you can use:

  /* { dg-begin-multiline-output "" }
copy the source, underlines and carets here, omitting trailing dg
directives.
 { dg-end-multiline-output "" } */

(we'd strip away all the dg- directives when making the screenshots for
the website).

The act of creating such an example sometimes suggests tweaks e.g. to
the exact wording of the warning.

Sorry if this seems like I'm picking on you Martin; I just wanted to
share some thoughts that I'm trying to crystallize into general
guidelines on writing diagnostics.

Hope this is constructive
Dave


[PATCH] ada/77535: GNAT.Perfect_Hash_Generators for non-1-based strings

2016-09-08 Thread Florian Weimer
This patch fixes GNAT.Perfect_Hash_Generators for strings which are
not 1-based.  It does this by introducing its own storage type which
fixes the first index as 1.  This is also a minor optimization because
it avoids the need to store the index.

Okay for trunk?

Should I try to construct a new test case for this?  I don't see any
existing tests for this package.

2016-09-08  Florian Weimer  

PR ada/77535
Make all word strings start with 1.
* g-pehage.adb (Word_Storage): New type.
(Word_Type): Use Word_Storage.
(Free_Word): Instantiate Unchecked_Deallocation.
(Apply_Position_Selection, Put_Initial_Keys, Put_Reduced_Keys)
(Resize_Word, Select_Char_Position, Select_Character_Set): Adjust
indirection through Word_Type.
(New_Word): Allocate Word_Storage instead of String.

Index: gcc/ada/g-pehage.adb
===
--- gcc/ada/g-pehage.adb	(revision 240038)
+++ gcc/ada/g-pehage.adb	(working copy)
@@ -32,6 +32,7 @@
 with Ada.IO_Exceptions;   use Ada.IO_Exceptions;
 with Ada.Characters.Handling; use Ada.Characters.Handling;
 with Ada.Directories;
+with Ada.Unchecked_Deallocation;
 
 with GNAT.Heap_Sort_G;
 with GNAT.OS_Lib;  use GNAT.OS_Lib;
@@ -102,8 +103,12 @@
No_Edge   : constant Edge_Id   := -1;
No_Table  : constant Table_Id  := -1;
 
-   type Word_Type is new String_Access;
-   procedure Free_Word (W : in out Word_Type) renames Free;
+   type Word_Storage (Length : Natural) is record
+  Word : String (1 .. Length);
+   end record;
+   type Word_Type is access Word_Storage;
+   procedure Free_Word is
+  new Ada.Unchecked_Deallocation (Word_Storage, Word_Type);
function New_Word (S : String) return Word_Type;
 
procedure Resize_Word (W : in out Word_Type; Len : Natural);
@@ -574,7 +579,7 @@
begin
   for J in 0 .. NK - 1 loop
  declare
-IW : constant String := WT.Table (Initial (J)).all;
+IW : constant String := WT.Table (Initial (J)).Word;
 RW : String (1 .. IW'Length) := (others => ASCII.NUL);
 N  : Natural := IW'First - 1;
 
@@ -1312,7 +1317,8 @@
 
function New_Word (S : String) return Word_Type is
begin
-  return new String'(S);
+  return new Word_Storage'(Length => S'Length,
+   Word => S);
end New_Word;
 
--
@@ -1913,7 +1919,7 @@
  K := Get_Key (J);
  Put (File, Image (J, M),   F1, L1, J, 1, 3, 1);
  Put (File, Image (K.Edge, M),  F1, L1, J, 1, 3, 2);
- Put (File, Trim_Trailing_Nuls (WT.Table (Initial (J)).all),
+ Put (File, Trim_Trailing_Nuls (WT.Table (Initial (J)).Word),
 F1, L1, J, 1, 3, 3);
   end loop;
end Put_Initial_Keys;
@@ -1995,7 +2001,7 @@
  K := Get_Key (J);
  Put (File, Image (J, M),   F1, L1, J, 1, 3, 1);
  Put (File, Image (K.Edge, M),  F1, L1, J, 1, 3, 2);
- Put (File, Trim_Trailing_Nuls (WT.Table (Reduced (J)).all),
+ Put (File, Trim_Trailing_Nuls (WT.Table (Reduced (J)).Word),
 F1, L1, J, 1, 3, 3);
   end loop;
end Put_Reduced_Keys;
@@ -2075,7 +2081,7 @@
-
 
procedure Resize_Word (W : in out Word_Type; Len : Natural) is
-  S1 : constant String := W.all;
+  S1 : constant String := W.Word;
   S2 : String (1 .. Len) := (others => ASCII.NUL);
   L  : constant Natural := S1'Length;
begin
@@ -2161,7 +2167,7 @@
Right := Offset + R;
 end if;
 
-return WT.Table (Left)(C) < WT.Table (Right)(C);
+return WT.Table (Left).Word (C) < WT.Table (Right).Word (C);
  end Lt;
 
  --
@@ -2221,8 +2227,8 @@
   --  Two contiguous words are identical when they have the
   --  same Cth character.
 
-  elsif WT.Table (Reduced (N))(C) =
-WT.Table (Reduced (N + 1))(C)
+  elsif WT.Table (Reduced (N)).Word (C) =
+WT.Table (Reduced (N + 1)).Word (C)
   then
  L := N + 1;
 
@@ -2265,7 +2271,7 @@
 
 N := (others => 0);
 for K in Table (S).First .. Table (S).Last loop
-   C := WT.Table (Reduced (K))(Pos);
+   C := WT.Table (Reduced (K)).Word (Pos);
N (C) := N (C) + 1;
 end loop;
 
@@ -2288,7 +2294,7 @@
   --  Initialize the reduced words set
 
   for K in 0 .. NK - 1 loop
- WT.Table (Reduced (K)) := New_Word (WT.Table (Initial (K)).all);
+ WT.Table (Reduced (K)) := New_Word (WT.Table (Initial (K)).Word);
   end loop;
 
   declare
@@ -2384,7 +2390,7 @@
 Same_Keys_Sets_Table (J).Last
   loop
  Put (Output,
-  

Re: [PATCH] - improve sprintf buffer overflow detection (middle-end/49905)

2016-09-08 Thread Martin Sebor

On 09/08/2016 01:21 PM, Jeff Law wrote:

On 08/24/2016 10:40 AM, Martin Sebor wrote:

On 08/23/2016 05:00 PM, Joseph Myers wrote:

Some observations:

* Does -fprintf-return-value allow for the possibility of snprintf
failing
because of a memory allocation failure and so returning -1 when GCC
computed bounds on what it could return if successful?


No.  I recall having seen Glibc fail with ENOMEM years ago when
formatting a floating point number to a very large precision but
I haven't seen any implementation fail.  I haven't yet looked to
see if the Glibc failure can still happen.  My reading of C and
POSIX is that snprintf is only allowed to fail due to an encoding
error, not because it runs out of memory, so such a failure would
seem like a bug.

At the same time, the cause of the failure doesn't really matter.
If the function could fail, unless GCC could determine the failure
at compile time and avoid the optimization, the transformation
wouldn't be guaranteed to be safe.  It might be something to
consider and possibly accommodate in the implementation.  It's
one of the reasons why I want to expose the optimization to
more code before enabling it.

I also haven't yet thought about how to deal with it but if it
is a possibility we want to allow for then maybe a target hook
for libc implementers to set to indicate whether sprintf can fail
and when would work.  Libc implementations that can fail under
any conditions (whether allowed by the standard or not) would
need to disable (or not enable, depending on the default) the
optimization.  I'm certainly open to other ideas.

So what are the implications for the optimization part of this patch?


By coincidence I've just posted an updated patch that handles this
situation by avoiding the optimization (and issuing a warning pointing
out that a directive produced more that 4,095 bytes worth of output).
Ditto for INT_MAX.

Martin


Re: [PATCH] - improve sprintf buffer overflow detection (middle-end/49905)

2016-09-08 Thread Jeff Law

On 08/24/2016 10:40 AM, Martin Sebor wrote:

On 08/23/2016 05:00 PM, Joseph Myers wrote:

Some observations:

* Does -fprintf-return-value allow for the possibility of snprintf
failing
because of a memory allocation failure and so returning -1 when GCC
computed bounds on what it could return if successful?


No.  I recall having seen Glibc fail with ENOMEM years ago when
formatting a floating point number to a very large precision but
I haven't seen any implementation fail.  I haven't yet looked to
see if the Glibc failure can still happen.  My reading of C and
POSIX is that snprintf is only allowed to fail due to an encoding
error, not because it runs out of memory, so such a failure would
seem like a bug.

At the same time, the cause of the failure doesn't really matter.
If the function could fail, unless GCC could determine the failure
at compile time and avoid the optimization, the transformation
wouldn't be guaranteed to be safe.  It might be something to
consider and possibly accommodate in the implementation.  It's
one of the reasons why I want to expose the optimization to
more code before enabling it.

I also haven't yet thought about how to deal with it but if it
is a possibility we want to allow for then maybe a target hook
for libc implementers to set to indicate whether sprintf can fail
and when would work.  Libc implementations that can fail under
any conditions (whether allowed by the standard or not) would
need to disable (or not enable, depending on the default) the
optimization.  I'm certainly open to other ideas.

So what are the implications for the optimization part of this patch?

Jeff


Re: [PATCH] - improve sprintf buffer overflow detection (middle-end/49905)

2016-09-08 Thread Jeff Law

On 08/24/2016 05:14 PM, Manuel López-Ibáñez wrote:



I agree.  The challenge is that not all the bits this depends on
(the g_string_concat_db and parse_in globals defined in the front
end) are available in the middle end.  I've been talking to David
Malcolm about how best to factor things out of c-format.c and make
it available in both parts of the compiler under a convenient API.


Perhaps diagnostics_context could have pointers to those, forward
defined in the .h file and include the relevant libcpp headers in
diagnostics.c (or input.c). FEs that make use of those features could
initialize them (via some API) to some existing object. Otherwise,
they will work like in your patch (but within diagnostic.c). Similar
to how we initialize the FE-specific pretty-printers.

We already depend on libcpp for line-map.c, so internally depending on
other libcpp features is not so bad. The important thing is to hide
this from the clients, so that the clients do not need to be aware of
what diagnostics.c requires. That is, the middle-end and Ada should
not include headers that include libcpp headers, but diagnostics.c can
include whatever it needs.

Otherwise, the future will be again a mess and we get further away
from ever separating the FEs from the ME.
Warnings which rely on things like const/copy propagation, range 
analysis, inherently belong in the the middle end.  They're next to 
useless in the front-end.


This implies that a goodly amount of what's in c-format needs to move 
and likely bits of libcpp/line-map as well.





BTW, it would be nice to explain in comments why each header needs to
be included, besides obvious ones such as tree.h and gimple.h (it
would be great if we had guidelines on how to order included headers,
why not group together all gimple*, tree*, backend-stuff, diagnostics
stuff?). On the other hand, it is unfair to nitpick your patch
regarding this when other commits do the same.

I see this as busy work and work that easily gets out of date.

I'd rather just use Andrew's header file refactoring work to be able to 
answer these kinds of questions, canonicalize include ordering and to 
eliminate unnecessary/redundant includes.  In fact, ISTM it ought to be 
run before we hit stage1 close :-)


Jeff



Re: [PATCH 8/9] shrink-wrap: shrink-wrapping for separate components

2016-09-08 Thread Jeff Law

On 07/31/2016 07:42 PM, Segher Boessenkool wrote:


Deciding what blocks should run with a certain component active so that
the total cost of executing the prologues (and epilogues) is optimal, is
not a computationally feasible problem.
Really?  It's just a dataflow problem is it not and one that ought to 
converge very quickly I'd think.  Or is it more a function of having to 
run it on so many independent components?  I'm still pondering the value 
of having every GPR be an independent component :-)


ISTM this adds a fair amount of complexity to the implementation in that 
prologues and epilogues for a particular component can run more than 
once.  Can you give a concrete example where this happens so that we can 
all understand it better?


If we keep this aspect of the implementation it seems like a note in the 
developer section would be in order.  It's certainly non-intuitive.


I only glanced over the code that seems related to this aspect of the 
implementation.  If we decide to go forward, I'd like to look at it 
again more closely.




Now all that is left is inserting prologues and epilogues on all edges
that jump into resp. out of the "active" set of blocks.  Often we need
to insert some components' prologues (or epilogues) on all edges into
(or out of) a block.  In theory cross-jumping can unify all such, but
in practice that often fails; besides, that is a lot of work.  So in
this case we insert the prologue and epilogue components at the "head"
or "tail" of a block, instead.
Cross jumping is rather simplistic, so I'm not surprised that it doesn't 
catch all this stuff.




As a final optimisation, if a block needs a prologue and its immediate
dominator has the block as a post-dominator, the dominator gets the
prologue as well.
So why not just put it in the idom and not in the dominated block? 
Doesn't this just end up running that component's prologue twice?




2016-06-07  Segher Boessenkool  

* function.c (thread_prologue_and_epilogue_insns): Recompute the
live info.  Call try_shrink_wrapping_separate.  Compute the
prologue_seq afterwards, if it has possibly changed.  Compute the
split_prologue_seq and epilogue_seq later, too.
* shrink-wrap.c: #include cfgbuild.h.
(dump_components): New function.
(struct sw): New struct.
(SW): New function.
(init_separate_shrink_wrap): New function.
(fini_separate_shrink_wrap): New function.
(place_prologue_for_one_component): New function.
(spread_components): New function.
(disqualify_problematic_components): New function.
(emit_common_heads_for_components): New function.
(emit_common_tails_for_components): New function.
(insert_prologue_epilogue_for_components): New function.
(try_shrink_wrapping_separate): New function.
* shrink-wrap.h: Declare try_shrink_wrapping_separate.
---
 gcc/function.c|  15 +-
 gcc/shrink-wrap.c | 715 ++
 gcc/shrink-wrap.h |   1 +
 3 files changed, 729 insertions(+), 2 deletions(-)

diff --git a/gcc/function.c b/gcc/function.c
index bba0705..390e9a6 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -5912,6 +5912,12 @@ make_epilogue_seq (void)
 void
 thread_prologue_and_epilogue_insns (void)
 {
+  if (optimize > 1)
+{
+  df_live_add_problem ();
+  df_live_set_all_dirty ();
+}

Perhaps conditional on separate shrink wrapping?


@@ -5922,9 +5928,7 @@ thread_prologue_and_epilogue_insns (void)
   edge entry_edge = single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun));
   edge orig_entry_edge = entry_edge;

-  rtx_insn *split_prologue_seq = make_split_prologue_seq ();
   rtx_insn *prologue_seq = make_prologue_seq ();
-  rtx_insn *epilogue_seq = make_epilogue_seq ();

   /* Try to perform a kind of shrink-wrapping, making sure the
  prologue/epilogue is emitted only around those parts of the
@@ -5932,6 +5936,13 @@ thread_prologue_and_epilogue_insns (void)

   try_shrink_wrapping (_edge, prologue_seq);

+  df_analyze ();
+  try_shrink_wrapping_separate (entry_edge->dest);
+  if (crtl->shrink_wrapped_separate)
+prologue_seq = make_prologue_seq ();

Perhaps push the df_analyze call into try_shrink_wrapping_separate?

ANd if it was successful, do you need to update the DF information?  Is 
that perhaps the cause of some of the issues we're seeing with DCE, 
regrename, the scheduler?





diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index b85b1c3..643e375 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "output.h"
 #include "tree-pass.h"
 #include "cfgrtl.h"
+#include "cfgbuild.h"
 #include "params.h"
 #include "bb-reorder.h"
 #include "shrink-wrap.h"
@@ -1006,3 +1007,717 @@ try_shrink_wrapping (edge *entry_edge, rtx_insn 
*prologue_seq)
   BITMAP_FREE (bb_with);
   free_dominance_info 

Re: [PATCH 7/9] cprop: Leave RTX_FRAME_RELATED_P instructions alone

2016-09-08 Thread Jeff Law

On 07/31/2016 07:42 PM, Segher Boessenkool wrote:

Doing cprop on frame-related instructions blows up spectacularly.
So don't.

2016-06-07  Segher Boessenkool  

* regcprop.c (copyprop_hardreg_forward_1): Don't change
RTX_FRAME_RELATED_P instructions.

Example or testcase?

jeff



Re: [PATCH 4/9] regrename: Don't rename restores

2016-09-08 Thread Jeff Law

On 07/31/2016 07:42 PM, Segher Boessenkool wrote:

A restore is supposed to restore some certain register.  Restoring it
into some other register will not work.  Don't.

2016-06-07  Segher Boessenkool  

* regrename.c (build_def_use): Invalidate chains that have a
REG_CFA_RESTORE on some instruction.
Again, how is this different from a normal epilogue that restores 
registers which regrename seems to not muck with.


Jeff



Re: [PATCH 6/9] sel-sched: Don't mess with register restores

2016-09-08 Thread Jeff Law

On 07/31/2016 07:42 PM, Segher Boessenkool wrote:

If selective scheduling copies register restores it confuses dwarf2cfi.

2016-06-07  Segher Boessenkool  

* sel-sched-ir.c (init_global_and_expr_for_insn): Don't copy
instructions with a REG_CFA_RESTORE note.
Similarly, I think you're papering over a lifetime problem of some kind 
here.


jeff



Re: [PATCH 5/9] regrename: Don't run if function was separately shrink-wrapped

2016-09-08 Thread Jeff Law

On 07/31/2016 07:42 PM, Segher Boessenkool wrote:

Unfortunately even after the previous patch there are still a few cases
where regrename creates invalid code when used together with separate
shrink-wrapping.  At noreturn exits regrename thinks it can use all
callee-saved registers, but that is not true.  This patch disables
regrename for functions that were separately shrink-wrapped.

2016-06-07  Segher Boessenkool  

* regrename.c (gate): Disable regrename if shrink_wrapped_separate
is set in crtl.
I think this (and the prior patches) are masking a deeper issue.  It's 
almost as if we don't have correct lifetime information and the various 
return points.  I'd really like to see a deeper analysis of these issues.


jeff



Re: [PATCH 3/9] dce: Don't dead-code delete separately wrapped restores

2016-09-08 Thread Jeff Law

On 07/31/2016 07:42 PM, Segher Boessenkool wrote:

Deleting restores (before a noreturn) that are dead confuses dwarf2cfi.

2016-06-07  Segher Boessenkool  

* dce.c (delete_unmarked_insns): Don't delete instructions with
a REG_CFA_RESTORE note.
I don't really understand this one.  Why is the restore marked dead and 
why doesn't that happen for normal epilogues?  Something wonky seems to 
be going on here.


jeff



Re: [PATCH 2/9] cfgcleanup: Don't confuse CFI when -fshrink-wrap-separate

2016-09-08 Thread Jeff Law

On 07/31/2016 07:42 PM, Segher Boessenkool wrote:

cfgcleanup would try to join noreturn paths with different components
handled.  This then fails in dwarf2cfi.

2016-06-07  Segher Boessenkool  

* cfgcleanup.c (outgoing_edges_match): Don't join noreturn calls
if shrink_wrapped_separate.
So if you only have fake edges, then you've got a non-returning call. If 
you could twiddle the comment to make that clear it'd be appreciated.


Obviously you could look at the components and allow joining if the 
components are the same.  But I don't know if that happens enough in 
practice to be worth the effort.


OK with the comment update if/when the kit as a whole is approved.

jeff





Re: [PATCH 1/9] separate shrink-wrap: New command-line flag, status flag, hooks, and doc

2016-09-08 Thread Jeff Law

On 07/31/2016 07:42 PM, Segher Boessenkool wrote:

This patch adds a new command-line flag "-fshrink-wrap-separate", a status
flag "shrink_wrapped_separate", hooks for abstracting the target components,
and documentation for all those.

2016-06-07  Segher Boessenkool  

* common.opt (-fshrink-wrap-separate): New flag.
* doc/invoke.texi: Document it.
* doc/tm.texi.in (Shrink-wrapping separate components): New subsection.
* doc/tm.texi: Regenerate.
* emit-rtl.h (struct rtl_data): New field shrink_wrapped_separate.
* target.def (shrink_wrap): New hook vector.
(get_separate_components, components_for_bb, disqualify_components,
emit_prologue_components, emit_epilogue_components,
set_handled_components): New hooks.
---
 gcc/common.opt  |  4 
 gcc/doc/invoke.texi | 11 ++-
 gcc/doc/tm.texi | 54 ++
 gcc/doc/tm.texi.in  | 29 +++
 gcc/emit-rtl.h  |  4 
 gcc/target.def  | 57 +
 6 files changed, 158 insertions(+), 1 deletion(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 8a292ed..97d305f 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 9edb006..5a5c5cab 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -4852,6 +4853,59 @@ This hook should add additional registers that are 
computed by the prologue to t
 True if a function's return statements should be checked for matching the 
function's return type.  This includes checking for falling off the end of a 
non-void function.  Return false if no such check should be made.
 @end deftypefn

+@node Shrink-wrapping separate components
+@subsection Shrink-wrapping separate components
+@cindex shrink-wrapping separate components
+
+The prologue does a lot of separate things: save callee-saved registers,
+do whatever needs to be done to be able to call things (save the return
+address, align the stack, whatever; different for each target), set up a
+stack frame, do whatever needs to be done for the static chain (if anything),
+set up registers for PIC, etc.
The prologue may perform a variety of target dependent tasks such as 
saving callee saved registers, saving the return address, aligning the 
stack, create a local stack frame, initialize the PIC register, etc.


On some targets some of these tasks may be independent of others and 
thus may be shrink-wrapped separately.  These independent tasks are 
referred to as components and are handled generically by the target 
independent parts of GCC.


Each component has a slot in a sbitmap that is generated and maintained 
for each basic block.





+@deftypefn {Target Hook} sbitmap TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS 
(void)
+This hook should return an @code{sbitmap} with the bits set for those
+components that can be separately shrink-wrapped in the current function.
+Return @code{NULL} if the current function should not get any separate
+shrink-wrapping.
+Don't define this hook if it would always return @code{NULL}.
+If it is defined, the other hooks in this group have to be defined as well.
+@end deftypefn
+
+@deftypefn {Target Hook} sbitmap TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB 
(basic_block)
+This hook should return an @code{sbitmap} with the bits set for those
+components where either the prologue component has to be executed before
+the @code{basic_block}, or the epilogue component after it, or both.
+@end deftypefn

Who is responsible for allocating and releasing the sbitmaps?


I don't have major concerns with this patch -- I'd like to see 
clarification done on the ownership of the sbitmaps (ie, who allocates 
and releases those objects).  I'd like to see if we can get a better 
introduction as well.


Jeff



Re: [PATCH 1/9] separate shrink-wrap: New command-line flag, status flag, hooks, and doc

2016-09-08 Thread Jeff Law

On 08/29/2016 03:31 AM, Bernd Schmidt wrote:

On 08/01/2016 03:42 AM, Segher Boessenkool wrote:

+@deftypefn {Target Hook} void
TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS (sbitmap)
+Emit prologue insns for the components indicated by the parameter.
+@end deftypefn
+
+@deftypefn {Target Hook} void
TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS (sbitmap)
+Emit epilogue insns for the components indicated by the parameter.
+@end deftypefn


How do these actually know where to save/restore registers? The frame
pointer may have been eliminated, and SP isn't necessarily constant
during the function. Seems like you'd have to calculate CFA reg/offset
much like dwarf2out does and pass it to this hook.
So I think the confusion here is these hooks are independent of 
placement. ie, the target independent code does something like:


FOR_EACH_BB
  Build the component bitmap using the incoming edge components
  Emit the prologue components at the start of the block
  Emit the epilogue components at the end of the block


The components handled by a particular block start are set/cleared by 
the other hooks.


jeff


Re: [PATCH v2 0/9] Separate shrink-wrapping

2016-09-08 Thread Jeff Law

On 08/26/2016 09:03 AM, Bernd Schmidt wrote:

On 08/26/2016 04:50 PM, Segher Boessenkool wrote:

The head comment starts with

+/* Separate shrink-wrapping
+
+   Instead of putting all of the prologue and epilogue in one spot, we
+   can put parts of it in places where those components are executed
less
+   frequently.

and that is the long and short of it.


And that comment puzzles me. Surely prologue and epilogue are executed
only once currently, so how does frequency come into it? Again - please
provide an example.
Right, they're executed once currently.  But the prologue could be sunk 
into locations where they are not executed every time the function is 
called.  That's the basis behind shrink wrapping.


Segher's code essentially allows individual components of the prologue 
to sink to different points within the function rather than forcing the 
prologue to be sunk as an atomic unit.


Conceptually you could run the standard algorithm on each independent 
component.






The full-prologue algorithm makes as many blocks run without prologue as
possible, by duplicating blocks where that helps.  If you do this for
every component you can and up with 2**40 blocks for just 40 components,


Ok, so why wouldn't we use the existing code with the duplication part
disabled? That's a later addition anyway and isn't necessary to do
shrink-wrapping in the first place.
I think the concern here is the balance between code explosion and the 
runtime gains.


jeff


Re: [PATCH v2 0/9] Separate shrink-wrapping

2016-09-08 Thread Jeff Law

On 08/26/2016 10:27 AM, Segher Boessenkool wrote:

On Fri, Aug 26, 2016 at 05:03:34PM +0200, Bernd Schmidt wrote:

On 08/26/2016 04:50 PM, Segher Boessenkool wrote:

The head comment starts with

+/* Separate shrink-wrapping
+
+   Instead of putting all of the prologue and epilogue in one spot, we
+   can put parts of it in places where those components are executed less
+   frequently.

and that is the long and short of it.


And that comment puzzles me. Surely prologue and epilogue are executed
only once currently, so how does frequency come into it? Again - please
provide an example.


If some component is only needed for 0.01% of executions of a function,
running it once for every execution is 1 times too much.

The trivial example is a function that does an early exit, but uses one
or a few non-volatile registers before that exit.  This happens in e.g.
glibc's malloc, if you want an easily accessed example.  With the current
code, *all* components will be saved and then restored shortly afterwards.
So can you expand on the malloc example a bit -- I'm pretty sure I 
understand what you're trying to do, but a concrete example may help 
Bernd and be useful for archival purposes.


I also know that Carlos is interested in the malloc example -- so I'd 
like to be able to pass that along to him.


Given the multiple early exit and fast paths through the allocator, I'm 
not at all surprised that sinking different components of the prologue 
to different locations is useful.


Also if there's a case where sinking into a loop occurs, definitely 
point that out.





The full-prologue algorithm makes as many blocks run without prologue as
possible, by duplicating blocks where that helps.  If you do this for
every component you can and up with 2**40 blocks for just 40 components,


Ok, so why wouldn't we use the existing code with the duplication part
disabled?


That would not perform nearly as well.


That's a later addition anyway and isn't necessary to do
shrink-wrapping in the first place.


No, it always did that, just not as often (it only duplicated straight-line
code before).
Presumably (I haven't looked yet), the duplication is so that we can 
isolate one or more paths which in turn allows sinking the prologue 
further on some of those paths.


This is something I'll definitely want to look at -- block duplication 
to facilitate code elimination (or in this case avoid code insertion) 
hits several areas of interest to me -- and how we balance duplication 
vs runtime savings is always interesting.


Jeff



Re: [PATCH v2 0/9] Separate shrink-wrapping

2016-09-08 Thread Jeff Law

On 08/30/2016 06:31 AM, Michael Matz wrote:

Hi,

On Fri, 26 Aug 2016, Bernd Schmidt wrote:


And that comment puzzles me. Surely prologue and epilogue are executed only
once currently, so how does frequency come into it? Again - please provide an
example.


int some_global;
int foo (void) {
  if (!some_global) {
call_this();
call_that();
x = some + large / computation;
while (x--) { much_more_code(); }
some_global = 1;
  }
  return some_global;
}

Now you need the full pro/epilogue (saving/restoring all call clobbered
regs presumably used by the large computation and loop) for only one call
of that function (the first), and then never again.  But you do need a
part of it always assuming the architecture is right, and this is a shared
library, namely the setup of the PIC pointer for the access to
some_global.




A prologue does many things, and in some cases many of them aren't
necessary for all calls (indeed that's often the case for functions
containing early-outs), so much so that the full prologue including
unnecessary parts needs more time than the functional body of a functions
particular call.  It's obvious that it would be better to move those parts
of the prologue to places where they actually are needed:

[ ... ]
Right.  Essentially Segher's patch introduces the concept of prologue 
components that are independent of each other and which can be 
shrink-wrapped separately.  The degree of independence is highly target 
specific, of course.


I think one of the questions (and I haven't looked through the whole 
thread yet to see if it's answered) is why the basic shrink-wrapping 
algorithm can't be applied to each of the prologue components -- though 
you may have answered it with your loop example below.





int f2 (void) {
  setup_pic_reg();
  if (!some_global) {
save_call_clobbers();
code_from_above();
restore_call_clobbers();
  }
  retreg = some_global;
  restore_pic_reg();
}

That includes moving parts of the prologue into loops:

int g() {
  int sum = 0;
  for (int i = 0; i < NUM; i++) {
sum += i;
if (sum >= CUTOFF) {
  some_long_winded_expression();
  that_eventually_calls_abort();
}
  }
  return sum;
}

Here all parts of the prologue that somehow make it possible to call other
functions are necessary only when the program will abort eventually: hence
is necessary only at one call of g() at most.  Again it's sensible to move
those parts inside the unlikely condition, even though it's inside a loop.
Thanks.  I'd been wondering about when it'd be useful to push prologue 
code into a loop nest when I saw the patches fly by, but didn't think 
about it too much.  I haven't looked at the shrink-wrapping literature 
in years, but I don't recall it having any concept that there were cases 
where sinking into a loop nest was profitable.


Jeff


Re: [x86] Disable STV pass if -mstackrealign is enabled.

2016-09-08 Thread Eric Botcazou
> What should I look for with  'svn annotate' ?

"Disable STV" line 5959 of config/i386/i386.c.

-- 
Eric Botcazou


Re: RFA: Small PATCH to add pow2p_hwi to hwint.h

2016-09-08 Thread Joseph Myers
On Thu, 8 Sep 2016, Jason Merrill wrote:

> Various places in GCC use negate, bit-and and compare to test whether
> an integer is a power of 2, but I think it would be clearer for this
> test to be wrapped in a function.

(x & -x) == x is also true for 0.  Whatever the correct function semantics 
for 0, the comment needs to reflect them.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH][expmed.c] PR middle-end/77426 Delete duplicate condition in synth_mult

2016-09-08 Thread Jeff Law

On 09/07/2016 06:59 AM, Kyrill Tkachov wrote:

Hi all,

The duplicate mode check in synth can just be deleted IMO. It was
introduced as part of r139821 that was
a much larger change introducing size/speed differentiation to the RTL
midend. So I think it's just a typo/copy-pasto.

Tested on aarch64-none-elf.
Ok?

Thanks,
Kyrill

2016-09-07  Kyrylo Tkachov  

PR middle-end/77426
* expmed.c (synth_mult): Delete duplicate mode check.

OK.
jeff


Re: [PATCH][v3] GIMPLE store merging pass

2016-09-08 Thread Bernhard Reutner-Fischer
On 8 September 2016 at 10:31, Kyrill Tkachov
 wrote:
>
> On 07/09/16 20:03, Bernhard Reutner-Fischer wrote:
>>
>> On September 6, 2016 5:14:47 PM GMT+02:00, Kyrill Tkachov
>>  wrote:

> Thanks, fixed all the above in my tree (will be retesting).
>

>> What about debug statements? ISTM you should skip those.
>> (Isn't visited reset before entry of a pass?)
>
>
> Yes, I'll skip debug statements. Comments in gimple.h say that the visited
> property is undefined at pass boundaries, so I'd rather initialize it here.

Right.
>
>
>> Maybe I missed the bikeshedding about the name but I'd have used
>> -fmerge-stores instead.
>
>
> Wouldn't be hard to change. I can change it any point if there's a
> consensus.

Did you consider any relation to any of
https://gcc.gnu.org/PR22141
https://gcc.gnu.org/PR23684
https://gcc.gnu.org/PR47059
https://gcc.gnu.org/PR54422
and their dups
or https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg77311.html
(the -fmerge-bitfields suggestion from imgtec; maybe the testcases are
of interest)

thanks,


Re: RFA: Small PATCH to add pow2p_hwi to hwint.h

2016-09-08 Thread Jakub Jelinek
On Thu, Sep 08, 2016 at 08:53:26AM -0600, Jeff Law wrote:
> On 09/07/2016 11:56 PM, Jason Merrill wrote:
> > Various places in GCC use negate, bit-and and compare to test whether
> > an integer is a power of 2, but I think it would be clearer for this
> > test to be wrapped in a function.
> > 
> > OK for trunk?
> > 
> I think the canonical way we've written that is
> 
> exact_log2 (x) != -1

We have integer_pow2p which works on trees (and is actually implemented by
testing if popcnt is 1 on the wide-int).
Calling exact_log2 for such a test if you don't need the value is a waste of
compile time, ctz_hwi doesn't need to be computed.  And, exact_log2 is -1
even on 0, which is not pow2p_hwi.

I think having pow2p_hwi would be nice, if we really start using it where
appropriate.

Jakub


Re: Make max_align_t respect _Float128 [version 2]

2016-09-08 Thread Paul Eggert

On 09/08/2016 04:56 AM, Mark Wielaard wrote:

I don't think there is anything valgrind can do to detect that
compw really only depends on d[0] if the result is false.

valgrind (with the default --partial-loads-ok=yes) could and should do 
the same thing with cmpw that it already does with cmpl. The attached 
program compiles and runs OK with gcc -O2 and valgrind on x86-64, even 
though the machine code uses cmpl to load bytes that were not allocated.



Do gnulib and glibc syncronize?


They do at times, though not as often as we'd like. fts.c in particular 
has strayed quite a bit, to cater GNU utilities' robustness needs over 
what glibc provides. (GNU coreutils, for example, uses Gnulib fts.c even 
on glibc platforms.) At some point somebody should merge Gnulib fts.c 
back into glibc.


#include 
#include 

struct s { struct s *next; char d[]; };

int
main (void)
{
  struct s *p = malloc (offsetof (struct s, d) + 1);
  p->d[0] = 1;
  return p->d[0] == 2 && p->d[1] == 3 && p->d[2] == 4 && p->d[3] == 5;
}


Re: RFA: Small PATCH to add pow2p_hwi to hwint.h

2016-09-08 Thread Jeff Law

On 09/07/2016 11:56 PM, Jason Merrill wrote:

Various places in GCC use negate, bit-and and compare to test whether
an integer is a power of 2, but I think it would be clearer for this
test to be wrapped in a function.

OK for trunk?


I think the canonical way we've written that is

exact_log2 (x) != -1


Though you could argue that's far from obvious.

Presumably you have a follow-up to start using pow2p_hwi?

jeff


Re: [x86] Disable STV pass if -mstackrealign is enabled.

2016-09-08 Thread H.J. Lu
On Wed, Sep 7, 2016 at 11:00 PM, Eric Botcazou  wrote:
>> Is there a testcase to show the problem with -mincoming-stack-boundary=
>> on Linux?
>
> I don't know, 'svn annotate' will probably tell you.

What should I look for with  'svn annotate' ?

-- 
H.J.


[PATCH][AArch64] Align FP callee-saves

2016-09-08 Thread Wilco Dijkstra
If the number of integer callee-saves is odd, the FP callee-saves use 8-byte 
aligned
LDP/STP.  Since 16-byte alignment may be faster on some CPUs, align the FP
callee-saves to 16 bytes and use the alignment gap for the last FP callee-save 
when
possible. Besides slightly different offsets for FP callee-saves, the generated 
code
doesn't change.

Bootstrap and regression pass, OK for commit?


ChangeLog:
2016-09-08  Wilco Dijkstra  

* config/aarch64/aarch64.c (aarch64_layout_frame):
Align FP callee-saves.
--
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
fed3b6e803821392194dc34a6c3df5f653d2e33e..075b3802c72a68f63b47574e19186e7ce3440b28
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2735,7 +2735,7 @@ static void
 aarch64_layout_frame (void)
 {
   HOST_WIDE_INT offset = 0;
-  int regno;
+  int regno, last_fp_reg = INVALID_REGNUM;
 
   if (reload_completed && cfun->machine->frame.laid_out)
 return;
@@ -2781,7 +2781,10 @@ aarch64_layout_frame (void)
   for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++)
 if (df_regs_ever_live_p (regno)
&& !call_used_regs[regno])
-  cfun->machine->frame.reg_offset[regno] = SLOT_REQUIRED;
+  {
+   cfun->machine->frame.reg_offset[regno] = SLOT_REQUIRED;
+   last_fp_reg = regno;
+  }
 
   if (cfun->machine->frame.emit_frame_chain)
 {
@@ -2805,9 +2808,21 @@ aarch64_layout_frame (void)
offset += UNITS_PER_WORD;
   }
 
+  HOST_WIDE_INT max_int_offset = offset;
+  offset = ROUND_UP (offset, STACK_BOUNDARY / BITS_PER_UNIT);
+  bool has_align_gap = offset != max_int_offset;
+
   for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++)
 if (cfun->machine->frame.reg_offset[regno] == SLOT_REQUIRED)
   {
+   /* If there is an alignment gap between integer and fp callee-saves,
+  allocate the last fp register to it if possible.  */
+   if (regno == last_fp_reg && has_align_gap && (offset & 8) == 0)
+ {
+   cfun->machine->frame.reg_offset[regno] = max_int_offset;
+   break;
+ }
+
cfun->machine->frame.reg_offset[regno] = offset;
if (cfun->machine->frame.wb_candidate1 == INVALID_REGNUM)
  cfun->machine->frame.wb_candidate1 = regno;



Re: [patch, libgomp, OpenACC] Additional enter/exit data map handling

2016-09-08 Thread Thomas Schwinge
Hi!

On Thu, 8 Sep 2016 19:18:30 +0800, Chung-Lin Tang  
wrote:
> On 2016/9/6 8:11 PM, Thomas Schwinge wrote:
> > On Mon, 29 Aug 2016 15:46:47 +0800, Chung-Lin Tang 
> >  wrote:
> >> this patch is a port of some changes from gomp-4_0-branch,
> >> including adding additional map type handling in OpenACC enter/exit data
> >> directives, and some pointer set handling changes. Updated
> >> testsuite case are also included.
> >>
> >> Tested on trunk to ensure no regressions, is this okay for trunk?
> > 
> >> 2016-08-29  Cesar Philippidis  
> >> Thomas Schwinge  
> >> Chung-Lin Tang  
> > 
> > Maybe I'm misremembering, but I can't remember having been involved in
> > this.  ;-)
> 
> A part of this was picked from r223178, which you committed to 
> gomp-4_0-branch.

Heh, right, though that was a commit containing "Assorted OpenACC
changes", so merging various changes from our internal development
branch, done by several people.  Anyway, nothing to waste much time on.
;-)


> >> +/* Returns the number of mappings associated with the pointer or pset. 
> >> PSET
> >> +   have three mappings, whereas pointer have two.  */
> >> +
> >>  static int
> >> -find_pset (int pos, size_t mapnum, unsigned short *kinds)
> >> +find_pointer (int pos, size_t mapnum, unsigned short *kinds)
> >>  {
> >>if (pos + 1 >= mapnum)
> >>  return 0;
> >>  
> >>unsigned char kind = kinds[pos+1] & 0xff;
> >>  
> >> -  return kind == GOMP_MAP_TO_PSET;
> >> +  if (kind == GOMP_MAP_TO_PSET)
> >> +return 3;
> >> +  else if (kind == GOMP_MAP_POINTER)
> >> +return 2;
> >> +
> >> +  return 0;
> >>  }
> > 
> > I'm still confused about that find_pset/find_pointer handling.  Why is
> > that required?  Essentially, that means that GOACC_enter_exit_data is
> > skipping over some mappings, right?  If yes, why do the front ends
> > (Fortran only?) then emit these mappings to begin with, if we're then
> > ignoring them in the runtime?
> 
> It's not skipping mappings. GOMP_MAP_PSET uses 3 continuous entries while
> GOMP_MAP_POINTER uses 2, see how these are eventually processed together
> in gomp_map_vars().

I now see how for the "pointer != 0" case, *the address of*
"hostaddrs[i]" etc. is passed to gomp_acc_insert_pointer, which then
calls gomp_map_vars.  So, you're (or more precisely, those who once
committed these changes to our internal development branch) indeed just
extend the existing GOMP_MAP_TO_PSET handling to also cover
GOMP_MAP_POINTER.  This code still doesn't look very pretty generally,
but that's not your task to fix, right now.


Thus, your patch is back in the queue, waiting for approval.


Grüße
 Thomas


[PATCH] -fsanitize=thread fixes (PR sanitizer/68260, take 2)

2016-09-08 Thread Jakub Jelinek
On Thu, Sep 08, 2016 at 02:34:18PM +0200, Jakub Jelinek wrote:
> I can split the patch into two, one dealing just with the
> __atomic_clear/__atomic_test_and_set instrumentation and another for the
> preexisting -fnon-call-exceptions ICEs.  For the latter, one possibility
> would be to error out on the -fsanitize=thread -fnon-call-exceptions
> combination.

Here is just the hopefully non-controversial first split part.
Ok for trunk?

2016-09-08  Jakub Jelinek  

PR sanitizer/68260
* tsan.c: Include target.h.
(enum tsan_atomic_action): Add bool_clear and bool_test_and_set.
(BOOL_CLEAR, BOOL_TEST_AND_SET): Define.
(tsan_atomic_table): Add BUILT_IN_ATOMIC_CLEAR and
BUILT_IN_ATOMIC_TEST_AND_SET entries.
(instrument_builtin_call): Handle bool_clear and bool_test_and_set.

* c-c++-common/tsan/pr68260.c: New test.

--- gcc/tsan.c.jj   2016-07-22 15:55:31.591287885 +0200
+++ gcc/tsan.c  2016-09-08 14:46:08.645027959 +0200
@@ -40,6 +40,7 @@ along with GCC; see the file COPYING3.
 #include "tsan.h"
 #include "asan.h"
 #include "builtins.h"
+#include "target.h"
 
 /* Number of instrumented memory accesses in the current function.  */
 
@@ -240,7 +241,8 @@ instrument_expr (gimple_stmt_iterator gs
 enum tsan_atomic_action
 {
   check_last, add_seq_cst, add_acquire, weak_cas, strong_cas,
-  bool_cas, val_cas, lock_release, fetch_op, fetch_op_seq_cst
+  bool_cas, val_cas, lock_release, fetch_op, fetch_op_seq_cst,
+  bool_clear, bool_test_and_set
 };
 
 /* Table how to map sync/atomic builtins to their corresponding
@@ -274,6 +276,10 @@ static const struct tsan_map_atomic
   TRANSFORM (fcode, tsan_fcode, fetch_op, code)
 #define FETCH_OPS(fcode, tsan_fcode, code) \
   TRANSFORM (fcode, tsan_fcode, fetch_op_seq_cst, code)
+#define BOOL_CLEAR(fcode, tsan_fcode) \
+  TRANSFORM (fcode, tsan_fcode, bool_clear, ERROR_MARK)
+#define BOOL_TEST_AND_SET(fcode, tsan_fcode) \
+  TRANSFORM (fcode, tsan_fcode, bool_test_and_set, ERROR_MARK)
 
   CHECK_LAST (ATOMIC_LOAD_1, TSAN_ATOMIC8_LOAD),
   CHECK_LAST (ATOMIC_LOAD_2, TSAN_ATOMIC16_LOAD),
@@ -463,7 +469,11 @@ static const struct tsan_map_atomic
   LOCK_RELEASE (SYNC_LOCK_RELEASE_2, TSAN_ATOMIC16_STORE),
   LOCK_RELEASE (SYNC_LOCK_RELEASE_4, TSAN_ATOMIC32_STORE),
   LOCK_RELEASE (SYNC_LOCK_RELEASE_8, TSAN_ATOMIC64_STORE),
-  LOCK_RELEASE (SYNC_LOCK_RELEASE_16, TSAN_ATOMIC128_STORE)
+  LOCK_RELEASE (SYNC_LOCK_RELEASE_16, TSAN_ATOMIC128_STORE),
+
+  BOOL_CLEAR (ATOMIC_CLEAR, TSAN_ATOMIC8_STORE),
+
+  BOOL_TEST_AND_SET (ATOMIC_TEST_AND_SET, TSAN_ATOMIC8_EXCHANGE)
 };
 
 /* Instrument an atomic builtin.  */
@@ -615,6 +625,57 @@ instrument_builtin_call (gimple_stmt_ite
build_int_cst (NULL_TREE,
   MEMMODEL_RELEASE));
return;
+ case bool_clear:
+ case bool_test_and_set:
+   if (BOOL_TYPE_SIZE != 8)
+ {
+   decl = NULL_TREE;
+   for (j = 1; j < 5; j++)
+ if (BOOL_TYPE_SIZE == (8 << j))
+   {
+ enum built_in_function tsan_fcode
+   = (enum built_in_function)
+ (tsan_atomic_table[i].tsan_fcode + j);
+ decl = builtin_decl_implicit (tsan_fcode);
+ break;
+   }
+   if (decl == NULL_TREE)
+ return;
+ }
+   last_arg = gimple_call_arg (stmt, num - 1);
+   if (!tree_fits_uhwi_p (last_arg)
+   || memmodel_base (tree_to_uhwi (last_arg)) >= MEMMODEL_LAST)
+ return;
+   t = TYPE_ARG_TYPES (TREE_TYPE (decl));
+   t = TREE_VALUE (TREE_CHAIN (t));
+   if (tsan_atomic_table[i].action == bool_clear)
+ {
+   update_gimple_call (gsi, decl, 3, gimple_call_arg (stmt, 0),
+   build_int_cst (t, 0), last_arg);
+   return;
+ }
+   t = build_int_cst (t, targetm.atomic_test_and_set_trueval);
+   update_gimple_call (gsi, decl, 3, gimple_call_arg (stmt, 0),
+   t, last_arg);
+   stmt = gsi_stmt (*gsi);
+   lhs = gimple_call_lhs (stmt);
+   if (lhs == NULL_TREE)
+ return;
+   if (targetm.atomic_test_and_set_trueval != 1
+   || !useless_type_conversion_p (TREE_TYPE (lhs),
+  TREE_TYPE (t)))
+ {
+   tree new_lhs = make_ssa_name (TREE_TYPE (t));
+   gimple_call_set_lhs (stmt, new_lhs);
+   if (targetm.atomic_test_and_set_trueval != 1)
+ g = gimple_build_assign (lhs, NE_EXPR, new_lhs,
+  build_int_cst (TREE_TYPE (t), 0));
+   else
+ g = gimple_build_assign (lhs, NOP_EXPR, new_lhs);
+

Re: [PATCH] -fsanitize=thread fixes (PR sanitizer/68260)

2016-09-08 Thread Jakub Jelinek
On Wed, Sep 07, 2016 at 10:40:20PM +0200, Jakub Jelinek wrote:
> On Wed, Sep 07, 2016 at 09:07:45AM +0200, Richard Biener wrote:
> > > @@ -493,6 +504,8 @@ instrument_builtin_call (gimple_stmt_ite
> > >   if (!tree_fits_uhwi_p (last_arg)
> > >   || memmodel_base (tree_to_uhwi (last_arg)) >= MEMMODEL_LAST)
> > > return;
> > > + if (lookup_stmt_eh_lp (stmt))
> > > +   remove_stmt_from_eh_lp (stmt);
> > 
> > These changes look bogus to me -- how can the tsan instrumentation
> > function _not_ throw when the original function we replace it can?
> 
> The __tsan*atomic* functions are right now declared to be nothrow, so the
> patch just matches how they are declared.
> While the sync-builtins.def use
> #define ATTR_NOTHROWCALL_LEAF_LIST (flag_non_call_exceptions ? \
> ATTR_LEAF_LIST : ATTR_NOTHROW_LEAF_LIST)
> I guess I could use the same for the tsan atomics, but wonder if it will work
> properly when the libtsan is built with exceptions disabled and without
> -fnon-call-exceptions.  Perhaps it would, at least if it is built with
> -fasynchronous-unwind-tables (which is the case for x86_64 and aarch64 and
> tsan isn't supported elsewhere).

Actually, looking at the libsanitizer/tsan/ implementation, I have doubts
the atomics handling code would behave reasonably if an exception is thrown
during the atomic memory access, I'm afraid some classes wouldn't be
destructed.

I can split the patch into two, one dealing just with the
__atomic_clear/__atomic_test_and_set instrumentation and another for the
preexisting -fnon-call-exceptions ICEs.  For the latter, one possibility
would be to error out on the -fsanitize=thread -fnon-call-exceptions
combination.

Jakub


Re: Make max_align_t respect _Float128 [version 2]

2016-09-08 Thread Florian Weimer

On 09/08/2016 02:26 PM, Bernd Schmidt wrote:

On 09/08/2016 01:21 AM, Paul Eggert wrote:


Sure, attached. On Fedora 24 x86-64 (GCC 6.1.1 20160621, valgrind
3.11.0), when I compile with "gcc -O2 flexouch.c" and run with "valgrind
./a.out", valgrind complains "Invalid read of size 2". This is because
GCC compiles "p->d[0] == 2 && p->d[1] == 3" into "cmpw $770, 8(%rax);
sete %al", which loads the uninitialized byte p->d[1] simultaneously
with the initialized byte p->d[0].


Interesting. That optimization doesn't really depend on d being a
flexible array, so you can also reproduce a (different) valgrind warning
with the following:

#include 
#include 

struct s { int x; char d[2]; };

__attribute__((noinline,noclone)) void foo (struct s *p)
{
  p->d[0] = 1;
}

int
main (void)
{
  struct s *p = malloc (sizeof *p);
  foo (p);
  return p->d[0] == 2 && p->d[1] == 3;
}


Very interesting.  So the ASan failure reported for gnulib fts and this 
valgrind issue have separate causes (ASan does not care about undefined 
memory).


Thanks,
Florian



Re: Make max_align_t respect _Float128 [version 2]

2016-09-08 Thread Bernd Schmidt

On 09/08/2016 01:21 AM, Paul Eggert wrote:


Sure, attached. On Fedora 24 x86-64 (GCC 6.1.1 20160621, valgrind
3.11.0), when I compile with "gcc -O2 flexouch.c" and run with "valgrind
./a.out", valgrind complains "Invalid read of size 2". This is because
GCC compiles "p->d[0] == 2 && p->d[1] == 3" into "cmpw $770, 8(%rax);
sete %al", which loads the uninitialized byte p->d[1] simultaneously
with the initialized byte p->d[0].


Interesting. That optimization doesn't really depend on d being a 
flexible array, so you can also reproduce a (different) valgrind warning 
with the following:


#include 
#include 

struct s { int x; char d[2]; };

__attribute__((noinline,noclone)) void foo (struct s *p)
{
  p->d[0] = 1;
}

int
main (void)
{
  struct s *p = malloc (sizeof *p);
  foo (p);
  return p->d[0] == 2 && p->d[1] == 3;
}


Bernd


[committed] Fix ICE during fortran !$omp atomic write expansion (PR fortran/77500)

2016-09-08 Thread Jakub Jelinek
Hi!

expr2 for atomic write or swap is not something we want to take appart, we
just want to expand it as an rvalue as is, so we shouldn't be skipping its
conversions.  And, for the other cases, the patch adds function.isym check
so that we don't ICE if there is a call to an external function rather than
internal.

Tested on x86_64-linux, committed to trunk so far.

2016-09-08  Jakub Jelinek  

PR fortran/77500
* trans-openmp.c (gfc_trans_omp_atomic): For atomic write or
swap, don't try to look through GFC_ISYM_CONVERSION.  In other cases,
check that value.function.isym is non-NULL before dereferencing it.

* gfortran.dg/gomp/pr77500.f90: New test.

--- gcc/fortran/trans-openmp.c.jj   2016-09-06 20:00:37.0 +0200
+++ gcc/fortran/trans-openmp.c  2016-09-08 12:24:13.112540373 +0200
@@ -2818,7 +2818,11 @@ gfc_trans_omp_atomic (gfc_code *code)
   gfc_start_block ();
 
   expr2 = code->expr2;
-  if (expr2->expr_type == EXPR_FUNCTION
+  if (((atomic_code->ext.omp_atomic & GFC_OMP_ATOMIC_MASK)
+   != GFC_OMP_ATOMIC_WRITE)
+  && (atomic_code->ext.omp_atomic & GFC_OMP_ATOMIC_SWAP) == 0
+  && expr2->expr_type == EXPR_FUNCTION
+  && expr2->value.function.isym
   && expr2->value.function.isym->id == GFC_ISYM_CONVERSION)
 expr2 = expr2->value.function.actual->expr;
 
@@ -2857,6 +2861,7 @@ gfc_trans_omp_atomic (gfc_code *code)
  var = code->expr1->symtree->n.sym;
  expr2 = code->expr2;
  if (expr2->expr_type == EXPR_FUNCTION
+ && expr2->value.function.isym
  && expr2->value.function.isym->id == GFC_ISYM_CONVERSION)
expr2 = expr2->value.function.actual->expr;
}
@@ -2914,6 +2919,7 @@ gfc_trans_omp_atomic (gfc_code *code)
}
   e = expr2->value.op.op1;
   if (e->expr_type == EXPR_FUNCTION
+ && e->value.function.isym
  && e->value.function.isym->id == GFC_ISYM_CONVERSION)
e = e->value.function.actual->expr;
   if (e->expr_type == EXPR_VARIABLE
@@ -2927,6 +2933,7 @@ gfc_trans_omp_atomic (gfc_code *code)
{
  e = expr2->value.op.op2;
  if (e->expr_type == EXPR_FUNCTION
+ && e->value.function.isym
  && e->value.function.isym->id == GFC_ISYM_CONVERSION)
e = e->value.function.actual->expr;
  gcc_assert (e->expr_type == EXPR_VARIABLE
@@ -3041,6 +3048,7 @@ gfc_trans_omp_atomic (gfc_code *code)
  code = code->next;
  expr2 = code->expr2;
  if (expr2->expr_type == EXPR_FUNCTION
+ && expr2->value.function.isym
  && expr2->value.function.isym->id == GFC_ISYM_CONVERSION)
expr2 = expr2->value.function.actual->expr;
 
--- gcc/testsuite/gfortran.dg/gomp/pr77500.f90.jj   2016-09-08 
12:36:11.252253375 +0200
+++ gcc/testsuite/gfortran.dg/gomp/pr77500.f90  2016-09-08 12:30:50.0 
+0200
@@ -0,0 +1,9 @@
+! PR fortran/77500
+! { dg-do compile }
+
+program pr77500
+   real :: x
+!$omp atomic write
+   x = f()
+!$omp end atomic
+end

Jakub


[committed] Fix ICE with safelen(0) in fortran (PR fortran/77516)

2016-09-08 Thread Jakub Jelinek
Hi!

This patch fixes ICE on safelen(0).  Not adding a warning for this in the
FE, as it has been added already on gomp-4_5-branch, so when it is merged,
the testcase will just need to be adjusted for the added warning.

Tested on x86_64-linux, committed to trunk so far.

2016-09-08  Jakub Jelinek  

PR fortran/77516
* omp-low.c (lower_rec_simd_input_clauses): Use max_vf for non-positive
OMP_CLAUSE_SAFELEN_EXPR.

* gfortran.dg/gomp/pr77516.f90: New test.

--- gcc/omp-low.c.jj2016-09-06 20:00:33.0 +0200
+++ gcc/omp-low.c   2016-09-08 11:35:42.960174529 +0200
@@ -4302,7 +4302,9 @@ lower_rec_simd_input_clauses (tree new_v
{
  tree c = find_omp_clause (gimple_omp_for_clauses (ctx->stmt),
OMP_CLAUSE_SAFELEN);
- if (c && TREE_CODE (OMP_CLAUSE_SAFELEN_EXPR (c)) != INTEGER_CST)
+ if (c
+ && (TREE_CODE (OMP_CLAUSE_SAFELEN_EXPR (c)) != INTEGER_CST
+ || tree_int_cst_sgn (OMP_CLAUSE_SAFELEN_EXPR (c)) != 1))
max_vf = 1;
  else if (c && compare_tree_int (OMP_CLAUSE_SAFELEN_EXPR (c),
  max_vf) == -1)
--- gcc/testsuite/gfortran.dg/gomp/pr77516.f90.jj   2016-09-08 
12:06:07.363581303 +0200
+++ gcc/testsuite/gfortran.dg/gomp/pr77516.f90  2016-09-08 11:59:33.0 
+0200
@@ -0,0 +1,12 @@
+! PR fortran/77516
+! { dg-do compile }
+
+program pr77516
+   integer :: i, x
+   x = 0
+!$omp simd safelen(0) reduction(+:x)
+   do i = 1, 8
+  x = x + 1
+   end do
+   print *, x
+end

Jakub


Re: Make max_align_t respect _Float128 [version 2]

2016-09-08 Thread Florian Weimer

On 09/08/2016 01:56 PM, Mark Wielaard wrote:


On Wed, Sep 07, 2016 at 04:21:29PM -0700, Paul Eggert wrote:

On 09/07/2016 04:52 AM, Mark Wielaard wrote:

If valgrind believes the
memory isn't in valid memory then it will complain about an invalid access.
But if the memory is accessible but uninitialised then it will just track
the undefinedness complain later if such a value is used.


I think the former is what happened in Gnulib fts.c before Gnulib was fixed.


BTW. Do gnulib and glibc syncronize? I had no idea gnulib also contained
fts. I recently fixed some issue (adding LFS support) in glibc.


They are expected to synchronize.  It does not always happen.

Florian


Re: Make max_align_t respect _Float128 [version 2]

2016-09-08 Thread Mark Wielaard
Hi,

I added Julian to the CC who will probably correct any mistakes
I make. Or even might know a simple way to make valgrind detect
this idiom.

On Wed, Sep 07, 2016 at 04:21:29PM -0700, Paul Eggert wrote:
> On 09/07/2016 04:52 AM, Mark Wielaard wrote:
> > If valgrind believes the
> > memory isn't in valid memory then it will complain about an invalid access.
> > But if the memory is accessible but uninitialised then it will just track
> > the undefinedness complain later if such a value is used.
> 
> I think the former is what happened in Gnulib fts.c before Gnulib was fixed.

BTW. Do gnulib and glibc syncronize? I had no idea gnulib also contained
fts. I recently fixed some issue (adding LFS support) in glibc.

> > valgrind also has --partial-loads-ok (which in newer versions defaults
> > to =yes):
> > 
> > Controls how Memcheck handles 32-, 64-, 128- and 256-bit naturally
> > aligned loads from addresses for which some bytes are addressable
> > and others are not.
> 
> Although this helps in some cases, it does not suffice in general since the
> problem can occur with 16-bit aligned loads. I think that is what happened
> with fts.c.
> 
> > Does anybody have an example program of the above issue compiled with
> > gcc that produces false positives with valgrind?
> > 
> 
> Sure, attached. On Fedora 24 x86-64 (GCC 6.1.1 20160621, valgrind 3.11.0),
> when I compile with "gcc -O2 flexouch.c" and run with "valgrind ./a.out",
> valgrind complains "Invalid read of size 2". This is because GCC compiles
> "p->d[0] == 2 && p->d[1] == 3" into "cmpw $770, 8(%rax); sete %al", which
> loads the uninitialized byte p->d[1] simultaneously with the initialized
> byte p->d[0].
> 
> As mentioned previously, although flexouch.c does not conform to C11, this
> is arguably a defect in C11.
> 

> #include 
> #include 
> 
> struct s { struct s *next; char d[]; };
> 
> int
> main (void)
> {
>   struct s *p = malloc (offsetof (struct s, d) + 1);
>   p->d[0] = 1;
>   return p->d[0] == 2 && p->d[1] == 3;
> }

OK, I can replicate that with the same GCC when using -O2.

==25520== Invalid read of size 2
==25520==at 0x400442: main (in /home/mark/src/tests/flexouch)
==25520==  Address 0x51fa048 is 8 bytes inside a block of size 9 alloc'd
==25520==at 0x4C2BBAD: malloc (vg_replace_malloc.c:299)
==25520==by 0x40043D: main (in /home/mark/src/tests/flexouch)

Without optimization the p->d[1] == 3 is never executed.

Dump of assembler code for function main:
=> 0x00400430 <+0>: sub$0x8,%rsp
   0x00400434 <+4>: mov$0x9,%edi
   0x00400439 <+9>: callq  0x400410 
   0x0040043e <+14>:movb   $0x1,0x8(%rax)
   0x00400442 <+18>:cmpw   $0x302,0x8(%rax)
   0x00400448 <+24>:sete   %al
   0x0040044b <+27>:add$0x8,%rsp
   0x0040044f <+31>:movzbl %al,%eax
   0x00400452 <+34>:retq   
End of assembler dump.

Note that valgrind will also get this "wrong" if you do
allocate enough space, but don't initialize d[1]:

==25906== Syscall param exit_group(status) contains uninitialised byte(s)
==25906==at 0x4F00CC8: _Exit (in /usr/lib64/libc-2.23.so)
==25906==by 0x4E7119A: __run_exit_handlers (in /usr/lib64/libc-2.23.so)
==25906==by 0x4E71234: exit (in /usr/lib64/libc-2.23.so)
==25906==by 0x4E58737: (below main) (in /usr/lib64/libc-2.23.so)
==25906==  Uninitialised value was created by a heap allocation
==25906==at 0x4C2BBAD: malloc (vg_replace_malloc.c:299)
==25906==by 0x40043D: main (in /home/mark/src/tests/flexouch)

I don't think there is anything valgrind can do to detect that
compw really only depends on d[0] if the result is false.

Cheers,

Mark


Re: [PATCH][AArch64] Improve legitimize_address

2016-09-08 Thread Wilco Dijkstra
Christophe Lyon wrote:
> After this patch, I've noticed a regression:
> FAIL: gcc.target/aarch64/ldp_vec_64_1.c scan-assembler ldp\td[0-9]+, d[0-9]
> You probably need to adjust the scan pattern.

The original code was better and what we should generate. IVOpt chooses to use
indexing eventhough it is more expensive, and that is a latent bug.

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65068 which shows the same
cost calculation issue.

Wilco




Re: Advice sought for debugging a lto1 ICE (was: Implement C _FloatN, _FloatNx types [version 6])

2016-09-08 Thread Thomas Schwinge
Hi!

On Wed, 7 Sep 2016 14:23:18 +0200, Richard Biener  
wrote:
> On Wed, Sep 7, 2016 at 1:52 PM, Thomas Schwinge  
> wrote:
> > I trimmed the CC list -- I'm looking for advice about debugging a lto1
> > ICE.
> >
> > On Fri, 19 Aug 2016 11:05:59 +, Joseph Myers  
> > wrote:
> >> On Fri, 19 Aug 2016, Richard Biener wrote:
> >> > Can you quickly verify if LTO works with the new types?  I don't see 
> >> > anything
> >> > that would prevent it but having new global trees and backends 
> >> > initializing them
> >> > might come up with surprises (see tree-streamer.c:preload_common_nodes)
> >>
> >> Well, the execution tests are in gcc.dg/torture, which is run with various
> >> options including -flto (and I've checked the testsuite logs to confirm
> >> these tests are indeed run with such options).  Is there something else
> >> you think should be tested?
> >
> > As I noted in :
> >
> > As of the PR32187 commit r239625 "Implement C _FloatN, _FloatNx types", 
> > nvptx
> > offloading is broken, ICEs in LTO stream-in.  Probably some kind of 
> > data-type
> > mismatch that is not visible with Intel MIC offloading (using the same 
> > data
> > types) but explodes with nvptx.  I'm having a look.
> >
> > I know how to use "-save-temps -v" to re-run the ICEing lto1 in GDB; a
> > backtrace of the ICE looks as follows:
> >
> > #0  fancy_abort (file=file@entry=0x10d61d0 
> > "[...]/source-gcc/gcc/vec.h", line=line@entry=727, 
> > function=function@entry=0x10d6e3a 
> > <_ZZN3vecIP9tree_node7va_heap8vl_embedEixEjE12__FUNCTION__> "operator[]") 
> > at [...]/source-gcc/gcc/diagnostic.c:1414
> > #1  0x0058c9ef in vec > vl_embed>::operator[] (this=0x16919c0, ix=ix@entry=185) at 
> > [...]/source-gcc/gcc/vec.h:727
> > #2  0x0058ca33 in vec::operator[] 
> > (this=this@entry=0x1691998, ix=ix@entry=185) at 
> > [...]/source-gcc/gcc/vec.h:1211
> 
> so it wants tree 185 which is (given the low number) likely one streamed by
> preload_common_nodes.  This is carefully crafted to _not_ diverge by
> frontend (!) it wasn't even designed to cope with global trees being present
> or not dependent on target (well, because the target is always the
> same! mind you!)

Scary.  ;-/

> Now -- in theory it should deal with NULLs just fine (resulting in
> error_mark_node), but it can diverge when there are additional
> compount types (like vectors, complex
> or array or record types) whose element types are not in the set of
> global trees.
> The complex _FloatN types would be such a case given they appear before their
> components.  That mixes up the ordering at least.

ACK, but that's also an issue for "regular" float/complex float, which
also is in "reverse" order.  But that's "fixed" by the recursion in
gcc/tree-streamer.c:record_common_node for "TREE_CODE (node) ==
COMPLEX_TYPE".  This likewise seems to work for the _FloatN types.  (I've
put "fixed" in quotes -- doesn't that recursion mean that we're thus
putting "complex float", "float", [...], "float" (again) into the cache?
Anyway that's for later...)

> So I suggest to add a print_tree to where it does the 
> streamer_tree_cache_append
> and compare cc1 and lto1 outcome.

Thanks for all your suggestions!

As far as I can tell, tree 185 is in fact among the first trees just
after the preloaded ones...  (See record_common_node followed by
streamer_tree_cache_append with "ix == hash" vs., from "ix=180" onwards,
streamer_tree_cache_append called from elsewhere with "ix != hash".)
(I'm also noticing that this cache first is built from "ix=0" through
"ix=179", then apparently discarded, and rebuilt again, which seems
surprising but which I've not yet looked into; hopefully unrelated
issue.)  I'll continue to poke at this, but wanted to given an update
here at least.

[...]
PID=12052 [...]/source-gcc/gcc/tree-streamer.c:272:record_common_node
  constant 64>
unit size  constant 8>
align 64 symtab 0 alias set -1 canonical type 0x768e23f0 precision 
64>
PID=12052 
[...]/source-gcc/gcc/tree-streamer.c:214:streamer_tree_cache_append
  ix=178 hash=178 tree=0x768e23f0
  constant 64>
unit size  constant 8>
align 64 symtab 0 alias set -1 canonical type 0x768e23f0 precision 
64>
PID=12052 [...]/source-gcc/gcc/tree-streamer.c:272:record_common_node
  constant 128>
unit size  constant 16>
align 64 symtab 0 alias set -1 canonical type 0x768e2690 precision 
128>
PID=12052 
[...]/source-gcc/gcc/tree-streamer.c:214:streamer_tree_cache_append
  ix=179 hash=179 tree=0x768e2690
  constant 128>
unit size  constant 16>
align 64 symtab 0 alias set -1 canonical type 0x768e2690 precision 
128>
PID=12052 
[...]/source-gcc/gcc/tree-streamer.c:214:streamer_tree_cache_append
  

Re: [patch, libgomp, OpenACC] Additional enter/exit data map handling

2016-09-08 Thread Chung-Lin Tang
On 2016/9/6 8:11 PM, Thomas Schwinge wrote:
> Hi!
> 
> On Mon, 29 Aug 2016 15:46:47 +0800, Chung-Lin Tang  
> wrote:
>> this patch is a port of some changes from gomp-4_0-branch,
>> including adding additional map type handling in OpenACC enter/exit data
>> directives, and some pointer set handling changes. Updated
>> testsuite case are also included.
>>
>> Tested on trunk to ensure no regressions, is this okay for trunk?
> 
>> 2016-08-29  Cesar Philippidis  
>> Thomas Schwinge  
>> Chung-Lin Tang  
> 
> Maybe I'm misremembering, but I can't remember having been involved in
> this.  ;-)

A part of this was picked from r223178, which you committed to gomp-4_0-branch.

>> +/* Returns the number of mappings associated with the pointer or pset. PSET
>> +   have three mappings, whereas pointer have two.  */
>> +
>>  static int
>> -find_pset (int pos, size_t mapnum, unsigned short *kinds)
>> +find_pointer (int pos, size_t mapnum, unsigned short *kinds)
>>  {
>>if (pos + 1 >= mapnum)
>>  return 0;
>>  
>>unsigned char kind = kinds[pos+1] & 0xff;
>>  
>> -  return kind == GOMP_MAP_TO_PSET;
>> +  if (kind == GOMP_MAP_TO_PSET)
>> +return 3;
>> +  else if (kind == GOMP_MAP_POINTER)
>> +return 2;
>> +
>> +  return 0;
>>  }
> 
> I'm still confused about that find_pset/find_pointer handling.  Why is
> that required?  Essentially, that means that GOACC_enter_exit_data is
> skipping over some mappings, right?  If yes, why do the front ends
> (Fortran only?) then emit these mappings to begin with, if we're then
> ignoring them in the runtime?

It's not skipping mappings. GOMP_MAP_PSET uses 3 continuous entries while
GOMP_MAP_POINTER uses 2, see how these are eventually processed together
in gomp_map_vars().

Chung-Lin



Re: RFA (libstdc++): PATCH to implement C++17 over-aligned new

2016-09-08 Thread Jonathan Wakely

On 08/09/16 09:10 +0200, Marc Glisse wrote:
Do we want a generic fallback implementation (similar to 
gcc/config/i386/gmm_malloc.h)? A windows version with _aligned_malloc 
/ _aligned_free would also be possible.


Making it work for MinGW would be nice. If there are other targets
that don't support any of C11, POSIX-2001, or memalign then we can add
a generic fallback later, but I'm not sure it's worth doing now. It
might never be needed.



[PATCH][ARM][v2] Use snprintf rather than sprintf

2016-09-08 Thread Kyrill Tkachov

Hi all,

This is a rebase and slight respin of [1] that I sent out last November to 
change all uses of
sprintf to snprintf in the arm backend.

Bootstrapped and tested on arm-none-linux-gnueabihf.
Ok for trunk?

Thanks,
Kyrill

[1] https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00937.html

2016-09-08  Kyrylo Tkachov  

* config/arm/arm.c (arm_set_fixed_optab_libfunc): Use snprintf
rather than sprintf.
(arm_set_fixed_conv_libfunc): Likewise.
(arm_option_override): Likewise.
(neon_output_logic_immediate): Likewise.
(neon_output_shift_immediate): Likewise.
(arm_output_multireg_pop): Likewise.
(vfp_output_vstmd): Likewise.
(output_move_vfp): Likewise.
(output_move_neon): Likewise.
(output_return_instruction): Likewise.
(arm_elf_asm_cdtor): Likewise.
(arm_output_shift): Likewise.
(arm_output_iwmmxt_shift_immediate): Likewise.
(arm_output_iwmmxt_tinsr): Likewise.
* config/arm/neon.md (*neon_mov, VDX): Likewise.
(*neon_mov, VQXMOV): Likewise.
(neon_vc_insn): Likewise.
(neon_vc_insn_unspec): Likewise.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 946f308ca84e232af8af6eca4813464914cbd59c..8ff6c0e18f7c2a2eed72e56402ed582c9f692d2d 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2388,9 +2388,10 @@ arm_set_fixed_optab_libfunc (optab optable, machine_mode mode,
   char buffer[50];
 
   if (num_suffix == 0)
-sprintf (buffer, "__gnu_%s%s", funcname, modename);
+snprintf (buffer, sizeof (buffer), "__gnu_%s%s", funcname, modename);
   else
-sprintf (buffer, "__gnu_%s%s%d", funcname, modename, num_suffix);
+snprintf (buffer, sizeof (buffer), "__gnu_%s%s%d", funcname,
+	  modename, num_suffix);
 
   set_optab_libfunc (optable, mode, buffer);
 }
@@ -2409,8 +2410,8 @@ arm_set_fixed_conv_libfunc (convert_optab optable, machine_mode to,
   && ALL_FRACT_MODE_P (from) == ALL_FRACT_MODE_P (to))
 maybe_suffix_2 = "2";
 
-  sprintf (buffer, "__gnu_%s%s%s%s", funcname, fromname, toname,
-	   maybe_suffix_2);
+  snprintf (buffer, sizeof (buffer), "__gnu_%s%s%s%s", funcname,
+	fromname, toname, maybe_suffix_2);
 
   set_conv_libfunc (optable, to, from, buffer);
 }
@@ -3163,7 +3164,8 @@ arm_option_override (void)
   if (!arm_selected_tune)
 arm_selected_tune = _cores[arm_selected_cpu->core];
 
-  sprintf (arm_arch_name, "__ARM_ARCH_%s__", arm_selected_cpu->arch);
+  snprintf (arm_arch_name, sizeof (arm_arch_name),
+	"__ARM_ARCH_%s__", arm_selected_cpu->arch);
   insn_flags = arm_selected_cpu->flags;
   arm_base_arch = arm_selected_cpu->base_arch;
 
@@ -12735,9 +12737,11 @@ neon_output_logic_immediate (const char *mnem, rtx *op2, machine_mode mode,
   gcc_assert (is_valid != 0);
 
   if (quad)
-sprintf (templ, "%s.i%d\t%%q0, %%2", mnem, width);
+snprintf (templ, sizeof (templ), "%s.i%d\t%%q0, %%2",
+	  mnem, width);
   else
-sprintf (templ, "%s.i%d\t%%P0, %%2", mnem, width);
+snprintf (templ, sizeof (templ), "%s.i%d\t%%P0, %%2",
+	  mnem, width);
 
   return templ;
 }
@@ -12757,9 +12761,11 @@ neon_output_shift_immediate (const char *mnem, char sign, rtx *op2,
   gcc_assert (is_valid != 0);
 
   if (quad)
-sprintf (templ, "%s.%c%d\t%%q0, %%q1, %%2", mnem, sign, width);
+snprintf (templ, sizeof (templ),
+	  "%s.%c%d\t%%q0, %%q1, %%2", mnem, sign, width);
   else
-sprintf (templ, "%s.%c%d\t%%P0, %%P1, %%2", mnem, sign, width);
+snprintf (templ, sizeof (templ),
+	  "%s.%c%d\t%%P0, %%P1, %%2", mnem, sign, width);
 
   return templ;
 }
@@ -17858,17 +17864,17 @@ arm_output_multireg_pop (rtx *operands, bool return_pc, rtx cond, bool reverse,
   conditional = reverse ? "%?%D0" : "%?%d0";
   /* Can't use POP if returning from an interrupt.  */
   if ((regno_base == SP_REGNUM) && update && !(interrupt_p && return_pc))
-sprintf (pattern, "pop%s\t{", conditional);
+snprintf (pattern, sizeof (pattern), "pop%s\t{", conditional);
   else
 {
   /* Output ldmfd when the base register is SP, otherwise output ldmia.
  It's just a convention, their semantics are identical.  */
   if (regno_base == SP_REGNUM)
-	sprintf (pattern, "ldmfd%s\t", conditional);
+	snprintf (pattern, sizeof (pattern), "ldmfd%s\t", conditional);
   else if (update)
-	sprintf (pattern, "ldmia%s\t", conditional);
+	snprintf (pattern, sizeof (pattern), "ldmia%s\t", conditional);
   else
-	sprintf (pattern, "ldm%s\t", conditional);
+	snprintf (pattern, sizeof (pattern), "ldm%s\t", conditional);
 
   strcat (pattern, reg_names[regno_base]);
   if (update)
@@ -17924,7 +17930,8 @@ vfp_output_vstmd (rtx * operands)
   base = (REGNO (operands[1]) - FIRST_VFP_REGNUM) / 2;
   for (i = 1; i < XVECLEN (operands[2], 0); i++)
 {
-  p += sprintf ([p], ", d%d", base + i);
+  p += snprintf ([p], sizeof (pattern),
+		 ", d%d", base + i);
 }
   strcpy ([p], "}");
 
@@ -18719,7 +18726,7 @@ output_move_vfp (rtx *operands)

Re: RFA (libstdc++): PATCH to implement C++17 over-aligned new

2016-09-08 Thread Jonathan Wakely

On 08/09/16 02:06 -0400, Jason Merrill wrote:

This patch adds support for C++17 allocation of types with alignment
greater than max_align_t using 'new'.  This is on by default in C++17
and can also be enabled for other -std= with -falign-new.


Nice.


If a user wants to use a different boundary than alignof(max_align_t),
perhaps because their malloc provides more or less alignment than
glibc's, they can specify -falign-new=.

The patch also adds a warning about allocating an over-aligned type
without using an aligned new-operator, which is enabled by -Wall.

libstdc++ folk: Does my configury handling of different C library
functions that might be usable for aligned allocation make sense?


The AC_CHECK_FUNCS is OK but our configure munges all the autoconf
macros to add _GLIBCXX_ as a prefix, so you need to check
_GLIBCXX_HAVE_ALIGNED_ALLOC not HAVE_ALIGNED_ALLOC, and similarly for
the other two macros. Otherwise you always get this case:

+// The C library doesn't provide any aligned allocation functions, declare
+// aligned_alloc and get a link failure if aligned new is used.
+extern "C" void *aligned_alloc(std::size_t, std::size_t);

So it will fail for a pre-C11 libc even if it provides posix_memalign.


Is
the (standard-conforming) implementation of the nothrow allocation
function OK despite Jonathan's comment in bug 68210?


Yes. If anyone (maybe Taller Technologies?) cares about building
libstdc++ with -fno-exceptions then they can patch the new_op*.cc
files to do something different when !defined(__cpp_exceptions).
Or maybe I'll look at doing that as part of fixing 68210.

I'm still a little bothered about the try-catch overhead being
incurred for the std::nothrow_t allocation functions, but assuming
that allocation failure is rare that won't usually be a problem.


OK for trunk?


The libstdc++ parts are OK with the s/HAVE/_GLIBCXX_HAVE/ change.



Re: Make max_align_t respect _Float128 [version 2]

2016-09-08 Thread Florian Weimer

On 09/07/2016 08:32 PM, Bernd Edlinger wrote:

interesting.  I just tried the test case from PR 77330 with _Decimal128.
result: _Decimal128 did *not* trap with gcc4.8.4, but it does trap with
gcc-7.0.0.


Recent GCC versions rely on struct pointer alignment for struct member 
access, older versions did less so.


For example, this tcsh bug flew under the radar for about two years:

  

(glibc added 16-byte alignment to struct __dirstream as a side effect of 
a change to help sparc in 2013, and this did not cause any trouble with 
tcsh before.)


Florian


Re: Make max_align_t respect _Float128 [version 2]

2016-09-08 Thread Florian Weimer

On 09/07/2016 07:45 PM, Joseph Myers wrote:

On Wed, 7 Sep 2016, Florian Weimer wrote:


The existence of such a cut-off constant appears useful, but it's not clear if
it should be tied to the fundamental alignment (especially, as this discussion
shows, the fundamental alignment will be somewhat in flux).


I don't think it's in flux.  I think it's very rare for a new basic type
to be added that has increased alignment requirements compared to the
existing basic types.  _Decimal128 was added in GCC 4.3, which increased
the requirement on 32-bit x86 (only) (32-bit x86 malloc having been buggy
in that regard ever since then); __float128 / _Float128 did not increase
the requirement relative to that introduced with _Decimal128.


I said “somewhat”. :)  I don't expect many changes.  Although with the 
gnulib emulation, you could easily get different results depending on 
whether your toolchain has C11 support or not, on the same architecture.


If max_align_t denotes the cut-off point between automatic and manual 
alignment (as suggested by Richard), it really has to be set in stone, IMHO.


Florian



Re: [PATCH] [PR libcpp/71681] Fix handling header.gcc in subdirectories

2016-09-08 Thread Thomas Schwinge
Hi!

A few review comments:

On Wed, 7 Sep 2016 20:19:20 +0300, Andris Pavenis  wrote:
> This patch fixes handling header.gcc in subdirectories when command line 
> option -remap has been 
> used.

(I have not yet looked up what that mechanism actually does.)  ;-)

> Current version finds header.gcc in directories from GCC include directory 
> search path but 
> fails to find them in subdirectories due to missing directory separator.

> --- a/libcpp/files.c
> +++ b/libcpp/files.c
> @@ -1672,7 +1672,7 @@ static char *
>  remap_filename (cpp_reader *pfile, _cpp_file *file)
>  {
>const char *fname, *p;
> -  char *new_dir;
> +  char *new_dir, *p3;
>cpp_dir *dir;
>size_t index, len;
>  
> @@ -1701,9 +1701,15 @@ remap_filename (cpp_reader *pfile, _cpp_file *file)
>   return NULL;
>  
>len = dir->len + (p - fname + 1);
> -  new_dir = XNEWVEC (char, len + 1);
> +  new_dir = XNEWVEC (char, len + 2);
> +  p3 = new_dir + dir->len;
>memcpy (new_dir, dir->name, dir->len);
> -  memcpy (new_dir + dir->len, fname, p - fname + 1);
> +  if (dir->len && !IS_DIR_SEPARATOR(dir->name[dir->len - 1]))
> + {
> +   *p3++ = '/';
> +   len++;
> + }
> +  memcpy (p3, fname, p - fname + 1);
>new_dir[len] = '\0';
>  
>dir = make_cpp_dir (pfile, new_dir, dir->sysp);

(I have not yet reviewed the logic of your change itself.)  Wouldn't it
be simpler to just unconditionally add a directory separator here?

Is it OK to assume that a directory separator always is "/"?  (Think DOS,
Windows etc.  But maybe there's some "translation layer" beneath this --
I don't know.)


Can you provide some test cases?  (Ah, I now see you got some "Test
script to reproduce problem" attached to  --
this should be turned into a regression test for the GCC testsuite.)


It is good practice to assign a GCC PR to yourself if you're working on
it, and it also helps to post to the PR a comment with a link to the
mailing list archive for patch submissions, etc.


Grüße
 Thomas


signature.asc
Description: PGP signature


[PATCH, testsuite]: Test compat _Complex varargs passing

2016-09-08 Thread Uros Bizjak
On Mon, Sep 5, 2016 at 1:45 PM, Joseph Myers  wrote:
> On Sun, 4 Sep 2016, Uros Bizjak wrote:
>
>> It looks that different handling of _Complex char, _Complex short and
>> _Complex float is there on purpose. Is (was?) there a limitation in a
>> c language standard that prevents passing of these arguments as
>> varargs?
>
> Well, ISO C doesn't define complex integers at all.  But it's deliberate
> (see DR#206) that _Complex float doesn't promote to _Complex double in
> variable arguments.  And there is nothing in ISO C to stop _Complex float
> being passed in variable arguments.
>
> For all these types including the complex integer ones: given that the
> front end doesn't promote them, they should be usable in variable
> arguments.

Attached patch adds various _Complex variable arguments tests to
scalar-by-value-4 and scalar-return-4 tests. These tests previously
erroneously claimed that these argument types are unsupported as
variable arguments.

2016-09-08  Uros Bizjak  

* gcc.dg/compat/scalar-by-value-4_x.c: Also test passing of
variable arguments.
* gcc.dg/compat/scalar-by-value-4_y.c (testva##NAME): New.
* gcc.dg/compat/scalar-by-value-4_main.c: Update description comment.
* gcc.dg/compat/scalar-return-4_x.c: Also test returning of
variable argument.
* gcc.dg/compat/scalar-return-4_y.c (testva##NAME): New.
* gcc.dg/compat/scalar-return-4_main.c: Update description comment.

Tested on x86_64-linux-gnu {,-m32}.

OK for mainline?

Uros.
diff --git a/gcc/testsuite/gcc.dg/compat/scalar-by-value-4_main.c 
b/gcc/testsuite/gcc.dg/compat/scalar-by-value-4_main.c
index 8164b44..bd024c0 100644
--- a/gcc/testsuite/gcc.dg/compat/scalar-by-value-4_main.c
+++ b/gcc/testsuite/gcc.dg/compat/scalar-by-value-4_main.c
@@ -1,5 +1,5 @@
 /* Test passing scalars by value.  This test includes _Complex types
-   whose real and imaginary parts cannot be used in variable-length
+   whose real and imaginary parts can be used in variable-length
argument lists.  */
 
 extern void scalar_by_value_4_x (void);
diff --git a/gcc/testsuite/gcc.dg/compat/scalar-by-value-4_x.c 
b/gcc/testsuite/gcc.dg/compat/scalar-by-value-4_x.c
index a4e73c9..a36a060 100644
--- a/gcc/testsuite/gcc.dg/compat/scalar-by-value-4_x.c
+++ b/gcc/testsuite/gcc.dg/compat/scalar-by-value-4_x.c
@@ -13,6 +13,7 @@ test##NAME (TYPE x01, TYPE x02, TYPE x03, TYPE x04,   
\
 TYPE x05, TYPE x06, TYPE x07, TYPE x08,\
 TYPE x09, TYPE x10, TYPE x11, TYPE x12,\
 TYPE x13, TYPE x14, TYPE x15, TYPE x16);   \
+extern void testva##NAME (int n, ...); \
\
 void   \
 check##NAME (TYPE x, TYPE v)   \
@@ -62,6 +63,81 @@ testit##NAME (void)  
\
  g13##NAME, g14##NAME, g15##NAME, g16##NAME);  \
   DEBUG_NL;\
   DEBUG_FPUTS (#NAME); \
+  DEBUG_FPUTS (" testva:");\
+  DEBUG_NL;\
+  testva##NAME (1, \
+   g01##NAME); \
+  DEBUG_NL;\
+  testva##NAME (2, \
+   g01##NAME, g02##NAME);  \
+  DEBUG_NL;\
+  testva##NAME (3, \
+   g01##NAME, g02##NAME, g03##NAME);   \
+  DEBUG_NL;\
+  testva##NAME (4, \
+   g01##NAME, g02##NAME, g03##NAME, g04##NAME);\
+  DEBUG_NL;\
+  testva##NAME (5, \
+   g01##NAME, g02##NAME, g03##NAME, g04##NAME, \
+   g05##NAME); \
+  DEBUG_NL;\
+  testva##NAME (6, \
+   g01##NAME, g02##NAME, g03##NAME, g04##NAME, \
+   g05##NAME, g06##NAME);  \
+  DEBUG_NL;\
+  testva##NAME (7, \
+   g01##NAME, g02##NAME, g03##NAME, g04##NAME, \
+   g05##NAME, g06##NAME, g07##NAME);   \
+  DEBUG_NL;\
+  testva##NAME (8, \
+   g01##NAME, 

Re: [PATCH][v3] GIMPLE store merging pass

2016-09-08 Thread Kyrill Tkachov


On 07/09/16 20:03, Bernhard Reutner-Fischer wrote:

On September 6, 2016 5:14:47 PM GMT+02:00, Kyrill Tkachov 
 wrote:

Hi all,

s/contigous/contiguous/
s/ where where/ where/

+struct merged_store_group
+{
+  HOST_WIDE_INT start;
+  HOST_WIDE_INT width;
+  unsigned char *val;
+  unsigned int align;
+  auto_vec stores;
+  /* We record the first and last original statements in the sequence because
+ because we'll need their vuse/vdef and replacement position.  */
+  gimple *last_stmt;

s/ because because/ because/

Why aren't these two HWIs unsigned, likewise in store_immediate_info and in 
most other spots in the patch?

+ fprintf (dump_file, "Afer writing ");
s/Afer /After/

/access if prohibitively slow/s/ if /is /

I'd get rid of successful_p in imm_store_chain_info::output_merged_stores.


Thanks, fixed all the above in my tree (will be retesting).



+unsigned int
+pass_store_merging::execute (function *fun)
+{
+  basic_block bb;
+  hash_set orig_stmts;
+
+  FOR_EACH_BB_FN (bb, fun)
+{
+  gimple_stmt_iterator gsi;
+  HOST_WIDE_INT num_statements = 0;
+  /* Record the original statements so that we can keep track of
+statements emitted in this pass and not re-process new
+statements.  */
+  for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi); gsi_next ())
+   {
+ gimple_set_visited (gsi_stmt (gsi), false);
+ num_statements++;
+   }
+
+  if (num_statements < 2)
+   continue;

What about debug statements? ISTM you should skip those.
(Isn't visited reset before entry of a pass?)


Yes, I'll skip debug statements. Comments in gimple.h say that the visited
property is undefined at pass boundaries, so I'd rather initialize it here.



Maybe I missed the bikeshedding about the name but I'd have used -fmerge-stores 
instead.


Wouldn't be hard to change. I can change it any point if there's a consensus.

Thanks for the feedback.
Kyrill


Thanks,

The v3 of this patch addresses feedback I received on the version
posted at [1].
The merged store buffer is now represented as a char array that we
splat values onto with
native_encode_expr and native_interpret_expr. This allows us to merge
anything that native_encode_expr
accepts, including floating point values and short vectors. So this
version extends the functionality
of the previous one in that it handles floating point values as well.

The first phase of the algorithm that detects the contiguous stores is
also slightly refactored according
to feedback to read more fluently.

Richi, I experimented with merging up to MOVE_MAX bytes rather than
word size but I got worse results on aarch64.
MOVE_MAX there is 16 (because it has load/store register pair
instructions) but the 128-bit immediates that we ended
synthesising were too complex. Perhaps the TImode immediate store RTL
expansions could be improved, but for now
I've left the maximum merge size to be BITS_PER_WORD.

I've disabled the pass for PDP-endian targets as the merging code
proved to be quite fiddly to get right for different
endiannesses and I didn't feel comfortable writing logic for
BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN targets without serious
testing capabilities. I hope that's ok (I note the bswap pass also
doesn't try to do anything on such targets).

Tested on arm, aarch64, x86_64 and on big-endian arm and aarch64.

How does this version look?
Thanks,
Kyrill

[1] https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01512.html

2016-09-06  Kyrylo Tkachov  

 PR middle-end/22141
 * Makefile.in (OBJS): Add gimple-ssa-store-merging.o.
 * common.opt (fstore-merging): New Optimization option.
 * opts.c (default_options_table): Add entry for
 OPT_ftree_store_merging.
 * params.def (PARAM_STORE_MERGING_ALLOW_UNALIGNED): Define.
 * passes.def: Insert pass_tree_store_merging.
 * tree-pass.h (make_pass_store_merging): Declare extern
 prototype.
 * gimple-ssa-store-merging.c: New file.
 * doc/invoke.texi (Optimization Options): Document
 -fstore-merging.

2016-09-06  Kyrylo Tkachov  
 Jakub Jelinek  

 PR middle-end/22141
 * gcc.c-torture/execute/pr22141-1.c: New test.
 * gcc.c-torture/execute/pr22141-2.c: Likewise.
 * gcc.target/aarch64/ldp_stp_1.c: Adjust for -fstore-merging.
 * gcc.target/aarch64/ldp_stp_4.c: Likewise.
 * gcc.dg/store_merging_1.c: New test.
 * gcc.dg/store_merging_2.c: Likewise.
 * gcc.dg/store_merging_3.c: Likewise.
 * gcc.dg/store_merging_4.c: Likewise.
 * gcc.dg/store_merging_5.c: Likewise.
 * gcc.dg/store_merging_6.c: Likewise.
 * gcc.target/i386/pr22141.c: Likewise.
 * gcc.target/i386/pr34012.c: Add -fno-store-merging to dg-options.






Re: RFA (libstdc++): PATCH to implement C++17 over-aligned new

2016-09-08 Thread Marc Glisse

On Thu, 8 Sep 2016, Jason Merrill wrote:


This patch adds support for C++17 allocation of types with alignment
greater than max_align_t using 'new'.  This is on by default in C++17
and can also be enabled for other -std= with -falign-new.


Great :-)


If a user wants to use a different boundary than alignof(max_align_t),
perhaps because their malloc provides more or less alignment than
glibc's, they can specify -falign-new=.

The patch also adds a warning about allocating an over-aligned type
without using an aligned new-operator, which is enabled by -Wall.

libstdc++ folk: Does my configury handling of different C library
functions that might be usable for aligned allocation make sense?  Is
the (standard-conforming) implementation of the nothrow allocation
function OK despite Jonathan's comment in bug 68210?  OK for trunk?


Do we want a generic fallback implementation (similar to 
gcc/config/i386/gmm_malloc.h)? A windows version with _aligned_malloc / 
_aligned_free would also be possible.


--
Marc Glisse


RFA (libstdc++): PATCH to implement C++17 over-aligned new

2016-09-08 Thread Jason Merrill
This patch adds support for C++17 allocation of types with alignment
greater than max_align_t using 'new'.  This is on by default in C++17
and can also be enabled for other -std= with -falign-new.

If a user wants to use a different boundary than alignof(max_align_t),
perhaps because their malloc provides more or less alignment than
glibc's, they can specify -falign-new=.

The patch also adds a warning about allocating an over-aligned type
without using an aligned new-operator, which is enabled by -Wall.

libstdc++ folk: Does my configury handling of different C library
functions that might be usable for aligned allocation make sense?  Is
the (standard-conforming) implementation of the nothrow allocation
function OK despite Jonathan's comment in bug 68210?  OK for trunk?

Joseph: My introduction of max_align_t_align will need to be merged
with your change to cxx_fundamental_alignment_p.

Tested x86_64-pc-linux-gnu.
commit bea7a322a3b2badc3e80bddee7d5e6b20f4c2fe0
Author: Jason Merrill 
Date:   Wed Sep 7 16:22:54 2016 -0400

Implement P0035R4, C++17 new of over-aligned types.

gcc/cp/
* cp-tree.h (enum cp_tree_index): Add CPTI_ALIGN_TYPE.
(align_type_node): New macro.
* call.c (build_operator_new_call): Handle C++17 aligned new.
(second_parm_is_size_t, build_op_delete_call): Likewise.
(non_placement_deallocation_fn_p): Likewise. Rename to
usual_deallocation_fn_p.
(aligned_allocation_fn_p, aligned_deallocation_fn_p): New.
* decl.c (cxx_init_decl_processing): Add aligned new support.
* init.c (type_has_new_extended_alignment): New.
(build_new_1): Handle aligned new.
* tree.c (vec_copy_and_insert): New.
gcc/c-family/
* c.opt: Add -faligned-new and -Waligned-new.
* c-common.c (max_align_t_align): Split out from...
(cxx_fundamental_alignment_p): ...here.
* c-common.h: Declare it.
* c-cppbuiltin.c (c_cpp_builtins): Handle aligned new.
libstdc++-v3/
* libsupc++/new: Declare aligned new/delete operators.
* config/abi/pre/gnu.ver: Export them.
* configure.ac: Check for aligned_alloc, posix_memalign, memalign.
* libsupc++/new_opa.cc: New.
* libsupc++/new_opant.cc: New.
* libsupc++/new_opva.cc: New.
* libsupc++/new_opva.cc: New.
* libsupc++/del_opa.cc: New.
* libsupc++/del_opant.cc: New.
* libsupc++/del_opsa.cc: New.
* libsupc++/del_opva.cc: New.
* libsupc++/del_opvant.cc: New.
* libsupc++/del_opvsa.cc: New.
* libsupc++/Makefile.am: Build them.

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index de9f881..12d0e18 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -12875,6 +12875,19 @@ scalar_to_vector (location_t loc, enum tree_code code, 
tree op0, tree op1,
   return stv_nothing;
 }
 
+/* Return the alignment of std::max_align_t.
+
+   [support.types.layout] The type max_align_t is a POD type whose alignment
+   requirement is at least as great as that of every scalar type, and whose
+   alignment requirement is supported in every context.  */
+
+unsigned
+max_align_t_align ()
+{
+  return MAX (TYPE_ALIGN (long_long_integer_type_node),
+ TYPE_ALIGN (long_double_type_node));
+}
+
 /* Return true iff ALIGN is an integral constant that is a fundamental
alignment, as defined by [basic.align] in the c++-11
specifications.
@@ -12883,14 +12896,12 @@ scalar_to_vector (location_t loc, enum tree_code 
code, tree op0, tree op1,
 
[A fundamental alignment is represented by an alignment less than or
 equal to the greatest alignment supported by the implementation
-in all contexts, which is equal to
-alignof(max_align_t)].  */
+in all contexts, which is equal to alignof(max_align_t)].  */
 
 bool
-cxx_fundamental_alignment_p  (unsigned align)
+cxx_fundamental_alignment_p (unsigned align)
 {
-  return (align <=  MAX (TYPE_ALIGN (long_long_integer_type_node),
-TYPE_ALIGN (long_double_type_node)));
+  return (align <= max_align_t_align ());
 }
 
 /* Return true if T is a pointer to a zero-sized aggregate.  */
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 42ce969..baacb6c 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -863,6 +863,7 @@ extern bool keyword_begins_type_specifier (enum rid);
 extern bool keyword_is_storage_class_specifier (enum rid);
 extern bool keyword_is_type_qualifier (enum rid);
 extern bool keyword_is_decl_specifier (enum rid);
+extern unsigned max_align_t_align (void);
 extern bool cxx_fundamental_alignment_p (unsigned);
 extern bool pointer_to_zero_sized_aggr_p (tree);
 extern bool diagnose_mismatched_attributes (tree, tree);
diff --git 

Re: [x86] Disable STV pass if -mstackrealign is enabled.

2016-09-08 Thread Eric Botcazou
> Is there a testcase to show the problem with -mincoming-stack-boundary=
> on Linux?

I don't know, 'svn annotate' will probably tell you.

-- 
Eric Botcazou