Re: [PATCH] have __builtin_object_size handle POINTER_PLUS with non-const offset

2016-09-16 Thread Martin Sebor

On 09/16/2016 04:29 AM, Richard Biener wrote:

On Fri, Sep 16, 2016 at 5:29 AM, Martin Sebor  wrote:

__builtin_object_size fails for POINTER_PLUS expressions involving
non-constant offsets into objects of known size, causing GCC to fail
to detect (and add instrumentation to prevent) buffer overflow in
some basic cases such as the following:

  void f (unsigned i)
  {
char d [3];
memcpy (d + i, "abcdef", 5);
  }

Since the size of the destination object is known, the call to memcpy
is guaranteed to write past the end of it regardless of the value of
the offset.

The attached patch enhances __builtin_object_size to handle this case
by returning the size of the whole object as the maximum and the size
of the object minus T_MAX for the type of the offset T as the minimum.

The patch also adds handling of ranges even though only very few cases
benefit from it because the VRP pass runs after the object size pass.
The one case that does appear to profit is when the value of the offset
is constrained by its type, as in

  char a [1000];

  unsigned g (unsigned char i)
  {
char *p = a + i;
return __builtin_object_size (p, 2);
  }

Here get_range_info () lets __builtin_object_size determine that
the minimum number of bytes between (a + i) and (a + sizeof s) is
sizeof a - UCHAR_MAX.

The patch results in 64 more checking calls in a Binutils build than
before.


I believe that you can't simply use the min/max of the ranges the way you
do nor can you assume zero for the minimum size as op1 might be
negative (for POINTER_PLUS_EXPR you need to interpret the offset
as signed).

Unless I completely misremember how tree-object-size.c works, of course.

Say,

char a[1000];

unsigned g (signed char i)
{
  char *p = [10] + i;
  return __builtin_object_size (p, 1);
}

range for i will be [-128, 127] but we'd like to return 1000 here I think.


Thanks for the example.  I agree that 1000 is the correct result
in type < 2.  Curiously, the range reported for this case is
actually the anti-range ~[128, -129], and because the patch
didn't handle those the result was 990 (i.e., the same as
__builtin_object_size([10], type) for type < 2).

I think the way to handle cases like this in (type < 2) might be
by computing the size of the whole object first (i.e., ignoring
the array subscript) and then adjusting the result down according
to the bounds of the range.  Let me work on that.

Thanks
Martin



[PATCH] Fix typo in Libstdc++ Profile Mode docs

2016-09-16 Thread Jonathan Wakely

* doc/xml/manual/profile_mode.xml: Fix typo.
* doc/html/manual/profile_mode_devel.html: Regenerate.

Committed to trunk.

commit 70bd84b48945ab73c35f0c841614dc51f72697ca
Author: redi 
Date:   Fri Sep 16 22:09:15 2016 +

Fix typo in Libstdc++ Profile Mode docs

* doc/xml/manual/profile_mode.xml: Fix typo.
* doc/html/manual/profile_mode_devel.html: Regenerate.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@240204 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libstdc++-v3/doc/xml/manual/profile_mode.xml 
b/libstdc++-v3/doc/xml/manual/profile_mode.xml
index 0125f55..2f5eb6bf 100644
--- a/libstdc++-v3/doc/xml/manual/profile_mode.xml
+++ b/libstdc++-v3/doc/xml/manual/profile_mode.xml
@@ -619,7 +619,7 @@ it helps the user focus on the key problems and ignore the 
uninteresting ones.
include/profile/impl/profiler.h.
Hook names must start with __profcxx_.
Make sure they transform
-   in no code with -D_NO_GLBICXX_PROFILE_MAGIC.
+   in no code with -D_NO_GLIBCXX_PROFILE_MAGIC.
Make sure all calls to any method in namespace __gnu_profile
is protected against reentrance using macro
_GLIBCXX_PROFILE_REENTRANCE_GUARD.


[patch] Fix ICE on ACATS test for Aarch64 at -O

2016-09-16 Thread Eric Botcazou
Hi,

for the attached reduced testcase, the ICE is:

eric@polaris:~/gnat/bugs/P901-028> ~/build/gcc/aarch64-linux/gcc/gnat1 -quiet 
p.adb -O -I ~/build/gcc/aarch64-linux/gcc/ada/rts
+===GNAT BUG DETECTED==+
| 7.0.0 20160914 (experimental) [trunk revision 240142] (aarch64-linux) GCC 
error:|
| in expand_shift_1, at expmed.c:2451  |
| Error detected around p.adb:21:29

#0  internal_error (gmsgid=0x22327b7 "in %s, at %s:%d")
at /home/eric/svn/gcc/gcc/diagnostic.c:1347
#1  0x01e2373b in fancy_abort (
file=0x1f965f8 "/home/eric/svn/gcc/gcc/expmed.c", line=2451, 
function=0x1f96ce7  "expand_shift_1")
at /home/eric/svn/gcc/gcc/diagnostic.c:1415
#2  0x00eb435c in expand_shift_1 (code=RSHIFT_EXPR, mode=OImode, 
shifted=0x768e02e8, amount=0x768bcef0, target=0x0, unsignedp=1)
at /home/eric/svn/gcc/gcc/expmed.c:2451
#3  0x00eb43bd in expand_shift (code=RSHIFT_EXPR, mode=OImode, 
shifted=0x768e02e8, amount=255, target=0x0, unsignedp=1)
at /home/eric/svn/gcc/gcc/expmed.c:2467
#4  0x00ebefe3 in emit_store_flag (target=0x768de780, code=NE, 
op0=0x768de798, op1=0x76c3d400, mode=OImode, unsignedp=1, 
normalizep=1) at /home/eric/svn/gcc/gcc/expmed.c:5826
#5  0x00ebe979 in emit_store_flag (target=0x768de780, code=NE, 
op0=0x768dc138, op1=0x768dc030, mode=OImode, unsignedp=1, 
normalizep=1) at /home/eric/svn/gcc/gcc/expmed.c:5670
#6  0x00ebf0ab in emit_store_flag_force (target=0x768de780, 
code=NE, op0=0x768dc138, op1=0x768dc030, mode=OImode, unsignedp=1, 
normalizep=1) at /home/eric/svn/gcc/gcc/expmed.c:5860
#7  0x00ef0aba in do_store_flag (ops=0x7fffd4d0, 
target=0x768de780, mode=QImode) at /home/eric/svn/gcc/gcc/expr.c:11408
#8  0x00ee6873 in expand_expr_real_2 (ops=0x7fffd4d0, target=0x0, 
tmode=QImode, modifier=EXPAND_NORMAL) at 
/home/eric/svn/gcc/gcc/expr.c:9196

It's an attempt at generating a store-flag sequence with OImode and it fails 
because there are no shift operations supported in that mode.  It turns out 
that emit_store_flag_force knows how to fall back on a branchy sequence in 
that case so the attached patch simply removes the assertion.

Tested on x86-64/Linux, OK for the mainline?


2016-09-16  Eric Botcazou  

* expmed.c (expand_shift_1): Remove assertion and adjust comment.
(expand_shift): Adjust comment.

-- 
Eric BotcazouIndex: expmed.c
===
--- expmed.c	(revision 240142)
+++ expmed.c	(working copy)
@@ -2247,7 +2247,7 @@ expand_dec (rtx target, rtx dec)
and AMOUNT the rtx for the amount to shift by.
Store the result in the rtx TARGET, if that is convenient.
If UNSIGNEDP is nonzero, do a logical shift; otherwise, arithmetic.
-   Return the rtx for where the value is.  */
+   Return the rtx for where the value is or 0 if that cannot be done.  */
 
 static rtx
 expand_shift_1 (enum tree_code code, machine_mode mode, rtx shifted,
@@ -2448,7 +2448,6 @@ expand_shift_1 (enum tree_code code, mac
 	 define_expand for lshrsi3 was added to vax.md.  */
 }
 
-  gcc_assert (temp);
   return temp;
 }
 
@@ -2457,7 +2456,7 @@ expand_shift_1 (enum tree_code code, mac
and AMOUNT the amount to shift by.
Store the result in the rtx TARGET, if that is convenient.
If UNSIGNEDP is nonzero, do a logical shift; otherwise, arithmetic.
-   Return the rtx for where the value is.  */
+   Return the rtx for where the value is, or 0 if that cannot be done.  */
 
 rtx
 expand_shift (enum tree_code code, machine_mode mode, rtx shifted,


Re: [PATCH 1/9] Introduce class rtx_reader

2016-09-16 Thread Jeff Law

On 09/08/2016 06:30 PM, David Malcolm wrote:

Bundle up various global variables within gensupport.c into a
class rtx_reader, with a view towards making it easier to run the
code more than once in-process.

gcc/ChangeLog:
* genconstants.c (main): Introduce noop_reader and convert call
to read_md_files to a method call.
* genenums.c (main): Likewise.
* genmddeps.c (main): Likewise.
* genpreds.c (write_tm_constrs_h): Replace use of "in_fname" with
rtx_reader_ptr->get_top_level_filename ().
(write_tm_preds_h): Likewise.
(write_insn_preds_c): Likewise.
* gensupport.c (class gen_reader): New subclass of rtx_reader.
(rtx_handle_directive): Convert to...
(gen_reader::handle_unknown_directive): ...this.
(init_rtx_reader_args_cb): Convert return type from bool to
rtx_reader *.  Create a gen_reader instance, using it for the
call to read_md_files.  Return it if no errors occur.
(init_rtx_reader_args): Convert return type from bool to
rtx_reader *.
* gensupport.h (init_rtx_reader_args_cb): Likewise.
(init_rtx_reader_args_cb): Likewise.
* read-md.c (struct file_name_list): Move to class rtx_reader.
(read_md_file): Delete in favor of rtx_reader::m_read_md_file.
(read_md_filename): Delete in favor of
rtx_reader::m_read_md_filename.
(read_md_lineno): Delete in favor of rtx_reader::m_read_md_lineno.
(in_fname): Delete in favor of rtx_reader::m_toplevel_fname.
(base_dir): Delete in favor of rtx_reader::m_base_dir.
(first_dir_md_include): Delete in favor of
rtx_reader::m_first_dir_md_include.
(last_dir_md_include_ptr): Delete in favor of
rtx_reader::m_last_dir_md_include_ptr.
(max_include_len): Delete.
(rtx_reader_ptr): New.
(fatal_with_file_and_line): Use get_filename and get_lineno
accessors of rtx_reader_ptr.
(require_char_ws): Likewise.
(rtx_reader::read_char): New method, based on ::read_char.
(rtx_reader::unread_char): New method, based on ::unread_char.
(read_escape): Use get_filename and get_lineno accessors of
rtx_reader_ptr.
(read_braced_string): Use get_lineno accessor of rtx_reader_ptr.
(read_string): Use get_filename and get_lineno accessors of
rtx_reader_ptr.
(rtx_reader::rtx_reader): New ctor.
(rtx_reader::~rtx_reader): New dtor.
(handle_include): Convert from a function to...
(rtx_reader::handle_include): ...this method, converting
handle_directive from a callback to a virtual function.
(handle_file): Likewise, converting to...
(rtx_reader::handle_file): ...this method.
(handle_toplevel_file): Likewise, converting to...
(rtx_reader::handle_toplevel_file): ...this method.
(rtx_reader::get_current_location): New method.
(parse_include): Convert from a function to...
(rtx_reader::add_include_path): ...this method, dropping redundant
update to unused max_include_len.
(read_md_files): Convert from a function to...
(rtx_reader::read_md_files): ...this method, converting
handle_directive from a callback to a virtual function.
(noop_reader::handle_unknown_directive): New method.
* read-md.h (directive_handler_t): Delete this typedef.
(in_fname): Delete.
(read_md_file): Delete.
(read_md_lineno): Delete.
(read_md_filename): Delete.
(class rtx_reader): New class.
(rtx_reader_ptr): New decl.
(class noop_reader): New subclass of rtx_reader.
(read_char): Reimplement in terms of rtx_reader::read_char.
(unread_char): Reimplement in terms of rtx_reader::unread_char.
(read_md_files): Delete.
* read-rtl.c (read_rtx_code): Update for deletion of globals
read_md_filename and read_md_lineno.

I don't see anything terribly objectionable here.

It looks like you use a singleton to avoid passing the object around 
everywhere, but given the state of all this prior to this patch, I can 
live with the cleanup as a whole.


I'll note Richard Sandiford. hasn't chimed in here.  You might ping him 
directly to see if he's got any feedback.  If he doesn't prior to say 
Wed, this is OK for the trunk.


jeff


[build] Fix race condition during libgcc build

2016-09-16 Thread Eric Botcazou
Hi,

we have some new machines which seem to be very good at stumbling upon race 
conditions during bootstrap.  Another example with libgcc:

/azun.a/gnatmail/sandbox/gnatcross-cont/x86-linux/gcc/build/./gcc/xgcc -
B/azun.a/gnatmail/sandbox/gnatcross-cont/x86-linux/gcc/build/./gcc/ -
B/azun.a/gnatmail/sandbox/gnatcross-cont/x86-linux/gcc/pkg/i686-pc-linux-
gnu/bin/ -B/azun.a/gnatmail/sandbox/gnatcross-cont/x86-linux/gcc/pkg/i686-pc-
linux-gnu/lib/ -isystem /azun.a/gnatmail/sandbox/gnatcross-cont/x86-
linux/gcc/pkg/i686-pc-linux-gnu/include -isystem 
/azun.a/gnatmail/sandbox/gnatcross-cont/x86-linux/gcc/pkg/i686-pc-linux-
gnu/sys-include-g -O2 -O2  -g -O2 -DIN_GCC-W -Wall -Wno-narrowing -
Wwrite-strings -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes -Wold-
style-definition  -isystem ./include   -fpic -mlong-double-80 -DUSE_ELF_SYMVER 
-g -DIN_LIBGCC2 -fbuilding-libgcc -fno-stack-protector   -fpic -mlong-
double-80 -DUSE_ELF_SYMVER -I. -I. -I../.././gcc -I../../../src/libgcc -
I../../../src/libgcc/. -I../../../src/libgcc/../gcc -
I../../../src/libgcc/../include -I../../../src/libgcc/config/libbid -
DENABLE_DECIMAL_BID_FORMAT -DHAVE_CC_TLS  -DUSE_TLS -o _negdi2_s.o -MT 
_negdi2_s.o -MD -MP -MF _negdi2_s.dep -DSHARED -DL_negdi2 -c 
../../../src/libgcc/libgcc2.c
config.status: linking ../../../src/libgcc/unwind-generic.h to unwind.h
config.status: linking ../../../src/libgcc/config/i386/linux-unwind.h to md-
unwind-support.h
config.status: linking ../../../src/libgcc/config/i386/sfp-machine.h to sfp-
machine.h
config.status: linking ../../../src/libgcc/gthr-posix.h to gthr-default.h
In file included from ../../../src/libgcc/libgcov-interface.c:27:0:
../../../src/libgcc/gthr.h:148:26: fatal error: gthr-default.h: No such file 
or directory
 #include "gthr-default.h"
  ^
compilation terminated.
Makefile:894: recipe for target '_gcov_reset.o' failed
make[3]: *** [_gcov_reset.o] Error 1
make[3]: *** Waiting for unfinished jobs

So there is a race condition between the creation of (soft) links by the 
libgcc configure script and the use of these links by the compilation of files 
launched by the Makefile.

Tentative fix attached, tested on a bunch of machines for several native 
platforms, OK for the mainline?


2016-09-16  Eric Botcazou  

* configure.ac: Do not create links, only substitute the filenames.
* configure: Regenerate.
* Makefile.in: Assign the substitution results to variables.
(LIBGCC_LINKS): Define.
(enable-execute-stack.c): New rule.
(unwind.h): Likewise.
(md-unwind-support.h): Likewise.
(sfp-machine.h): Likewise.
(gthr-default.h): Likewise.
Add $(LIBGCC_LINKS) to the prerequisites of all object files and
unwind.h as prerequisite of install-unwind_h-forbuild.

-- 
Eric BotcazouIndex: Makefile.in
===
--- Makefile.in	(revision 240142)
+++ Makefile.in	(working copy)
@@ -43,6 +43,11 @@ enable_vtable_verify = @enable_vtable_ve
 enable_decimal_float = @enable_decimal_float@
 fixed_point = @fixed_point@
 with_aix_soname = @with_aix_soname@
+enable_execute_stack = @enable_execute_stack@
+unwind_header = @unwind_header@
+md_unwind_header = @md_unwind_header@
+sfp_machine_header = @sfp_machine_header@
+thread_header = @thread_header@
 
 host_noncanonical = @host_noncanonical@
 real_host_noncanonical = @real_host_noncanonical@
@@ -345,6 +350,21 @@ SHLIBUNWIND_INSTALL =
 tmake_file = @tmake_file@
 include $(srcdir)/empty.mk $(tmake_file)
 
+# Create links to files specified in config.host.
+LIBGCC_LINKS = enable-execute-stack.c unwind.h md-unwind-support.h \
+   sfp-machine.h gthr-default.h
+
+enable-execute-stack.c: $(srcdir)/$(enable_execute_stack)
+	-$(LN_S) $< $@
+unwind.h: $(srcdir)/$(unwind_header)
+	-$(LN_S) $< $@
+md-unwind-support.h: $(srcdir)/config/$(md_unwind_header)
+	-$(LN_S) $< $@
+sfp-machine.h: $(srcdir)/config/$(sfp_machine_header)
+	-$(LN_S) $< $@
+gthr-default.h: $(srcdir)/$(thread_header)
+	-$(LN_S) $< $@
+
 # Collect target defines and headers from config.host.
 libgcc_tm_defines = @tm_defines@
 libgcc_tm_file = @tm_file@
@@ -1069,10 +1089,10 @@ all: $(extra-parts)
 $(libgcc-objects) $(libgcc-s-objects) $(libgcc-eh-objects) \
 	$(libgcov-objects) \
 	$(libunwind-objects) $(libunwind-s-objects) \
-	$(EXTRA_PARTS): libgcc_tm.h
+	$(EXTRA_PARTS): $(LIBGCC_LINKS) libgcc_tm.h
 
 # Copy unwind.h to the place where gcc will look for it at build-time
-install-unwind_h-forbuild:
+install-unwind_h-forbuild: unwind.h
 	dest=$(gcc_objdir)/include/tmp-unwind.h; \
 	cp unwind.h $$dest; \
 	chmod a+r $$dest; \
Index: configure.ac
===
--- configure.ac	(revision 240142)
+++ configure.ac	(working copy)
@@ -548,11 +548,11 @@ GCC_AC_THREAD_HEADER([$target_thread_fil
 AC_SUBST(cpu_type)
 AC_SUBST(extra_parts)
 

Re: [PATCH 5/9] Introduce class function_reader

2016-09-16 Thread David Malcolm
On Fri, 2016-09-16 at 15:28 -0600, Jeff Law wrote:
> On 09/08/2016 06:30 PM, David Malcolm wrote:
> > This patch generalizes the RTL-reading capabilities so that they
> > can be run on the host as well as the build machine.
> > The available rtx in rtl.def changes dramatically between these
> > two configurations, so a fair amount of #ifdef GENERATOR_FILE is
> > required to express this.
> > 
> > This patch introduces a function_reader subclass of rtx_reader,
> > capable of reading an RTL function dump (or part of one),
> > reconstructing a cfun with a CFG and basic blocks containing insns.
> > 
> > gcc/ChangeLog:
> > * Makefile.in (OBJS): Add errors.o, read-md.o, read-rtl.o,
> > read-rtl-function.o, and selftest-rtl.o.
> 
> > * cfgexpand.c (pass_expand::execute): Move stack
> > initializations
> > to rtl_data::init_stack_alignment and call it.  Pass "true"
> > for new "emit_insns" param of expand_function_start.
> > * emit-rtl.c (gen_reg_rtx): Move regno_pointer_align and
> > regno_reg_rtx resizing logic to...
> > (emit_status::ensure_regno_capacity): ...this new method.
> > (init_emit): Allocate regno_reg_rtx using ggc_cleared_vec_alloc
> > rather than ggc_vec_alloc.
> > (rtl_data::init_stack_alignment): New method.
> > (get_insn_by_uid): New function.
> > * emit-rtl.h (rtl_data::init_stack_alignment): New method.
> > * errors.c: Use consistent pattern for bconfig.h vs config.h
> > includes.
> > (progname): Wrap with #ifdef GENERATOR_FILE.
> > (error): Likewise.  Add "error: " to message.
> > (fatal): Likewise.
> > (internal_error): Likewise.
> > (trim_filename): Likewise.
> > (fancy_abort): Likewise.
> > * errors.h (struct file_location): Move here from read-md.h.
> > (file_location::file_location): Likewise.
> > (error_at): New decl.
> > * function-tests.c (selftest::verify_three_block_rtl_cfg):
> > Remove
> > "static".
> > * function.c (instantiate_decls): Guard call to
> > instantiate_decls_1 with if (DECL_INITIAL (fndecl)).
> > (expand_function_start): Add param "emit_insns", and use it to
> > guard the various gen/emit calls.
> > * function.h (emit_status::ensure_regno_capacity): New method.
> > (expand_function_start): Add bool param to decl.
> > * gensupport.c (gen_reader::gen_reader): Add NULL for new
> > policy
> > param of rtx_reader ctor.
> > * print-rtl.c (print_rtx): Print "(nil)" rather than an empty
> > string for NULL strings.  Print "(nil)" for NULL basic blocks.
> > * read-md.c (read_skip_construct): Provide forward decl.
> > (read_skip_spaces): Support '/'.
> > (require_char): New function.
> > (require_word_ws): New function.
> > (peek_char): New function.
> > (read_name): Rename to...
> > (read_name_1): ...this new static function, adding "out_loc"
> > param,
> > and converting "missing name or number" to returning false,
> > rather
> > than failing.
> > (read_name): Reimplement in terms of read_name_1.
> > (read_name_or_nil): New function.
> > (read_string): Handle "(nil)" by returning NULL.  */
> > (rtx_reader::rtx_reader): Add rtl_reader_policy * param, using
> > it to initialize m_policy.
> > (rtx_reader::~rtx_reader): Free m_base_dir.  Clean up global
> > data.
> > * read-md.h (struct file_location): Move to errors.h.
> > (file_location::file_location): Likewise.
> > Include errors.h.
> > (class regno_remapper): New class.
> > (struct rtl_reader_policy): New struct.
> > (rtx_reader::rtx_reader): Add rtl_reader_policy * param.
> > (rtx_reader::add_fixup_insn_uid): New vfunc.
> > (rtx_reader::add_fixup_bb): New vfunc.
> > (rtx_reader::add_fixup_note_insn_basic_block): New vfunc.
> > (rtx_reader::add_fixup_source_location): New vfunc.
> > (rtx_reader::add_fixup_jump_label): New vfunc.
> > (rtx_reader::add_fixup_expr): New vfunc.
> > (rtx_reader::remap_regno): New method.
> > (rtx_reader::m_policy): New field.
> > (noop_reader::noop_reader): Add NULL for new policy param of
> > rtx_reader ctor.
> > (peek_char): New decl.
> > (require_char): New decl.
> > (require_word_ws): New decl.
> > (read_name): Convert return type from void to file_location.
> > (read_name_or_nil): New decl.
> > * read-rtl-function.c: New file.
> > * read-rtl-function.h: New file.
> > * read-rtl.c: Potentially include config.h rather than
> > bconfig.h.
> > For host, include function.h and emit-rtl.h.
> > (apply_subst_iterator): Wrap with #ifdef GENERATOR_FILE.
> > (bind_subst_iter_and_attr): Likewise.
> > (add_condition_to_string): Likewise.
> > (add_condition_to_rtx): Likewise.
> > (apply_attribute_uses): Likewise.
> > (add_current_iterators): Likewise.
> > (apply_iterators): Likewise.
> > (initialize_iterators): Guard usage of apply_subst_iterator
> > with
> > #ifdef 

Delete useless .split2 dump file at -O0 on non-x86 targets

2016-09-16 Thread Eric Botcazou
Doing -fdump-rtl-all at -O0 on a non-x86 target yields a .split2 file which is 
mightily confusing since nothing is split, so you can easily burn a dozen of 
minutes trying to pinpoint the first split point after reload...

Tested on x86-64/Linux and SPARC/Solaris, applied on the mainline.


2016-09-16  Eric Botcazou  

* recog.c (rest_of_handle_split_after_reload): Delete.
(pass_split_after_reload::gate): New method.
(pass_split_after_reload::execute): Call split_all_insns directly.

-- 
Eric BotcazouIndex: recog.c
===
--- recog.c	(revision 240142)
+++ recog.c	(working copy)
@@ -3901,17 +3901,6 @@ make_pass_split_all_insns (gcc::context
   return new pass_split_all_insns (ctxt);
 }
 
-static unsigned int
-rest_of_handle_split_after_reload (void)
-{
-  /* If optimizing, then go ahead and split insns now.  */
-#ifndef STACK_REGS
-  if (optimize > 0)
-#endif
-split_all_insns ();
-  return 0;
-}
-
 namespace {
 
 const pass_data pass_data_split_after_reload =
@@ -3935,9 +3924,23 @@ public:
   {}
 
   /* opt_pass methods: */
+  virtual bool gate (function *)
+{
+  /* If optimizing, then go ahead and split insns now.  */
+  if (optimize > 0)
+	return true;
+
+#ifdef STACK_REGS
+  return true;
+#else
+  return false;
+#endif
+}
+
   virtual unsigned int execute (function *)
 {
-  return rest_of_handle_split_after_reload ();
+  split_all_insns ();
+  return 0;
 }
 
 }; // class pass_split_after_reload


[PATCH] Fix documentation for overflow-checking builtins

2016-09-16 Thread Jonathan Wakely

This fixes a pasto in the manual. Approved by Jakub on IRC.

Committed to trunk and gcc-6-branch and gcc-5-branch.

commit 7bb31ae6738252bcb7839699cab5e50e8702b51a
Author: Jonathan Wakely 
Date:   Fri Sep 16 22:34:24 2016 +0100

Fix documentation for overflow-checking builtins

	* doc/extend.texi (Integer Overflow Builtins): Fix type of out
	parameters for functions taking long long arguments.

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 8df9d62..299986d8 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -9881,10 +9881,10 @@ together with checking whether the operations overflowed.
 @deftypefn {Built-in Function} bool __builtin_add_overflow (@var{type1} a, @var{type2} b, @var{type3} *res)
 @deftypefnx {Built-in Function} bool __builtin_sadd_overflow (int a, int b, int *res)
 @deftypefnx {Built-in Function} bool __builtin_saddl_overflow (long int a, long int b, long int *res)
-@deftypefnx {Built-in Function} bool __builtin_saddll_overflow (long long int a, long long int b, long int *res)
+@deftypefnx {Built-in Function} bool __builtin_saddll_overflow (long long int a, long long int b, long long int *res)
 @deftypefnx {Built-in Function} bool __builtin_uadd_overflow (unsigned int a, unsigned int b, unsigned int *res)
 @deftypefnx {Built-in Function} bool __builtin_uaddl_overflow (unsigned long int a, unsigned long int b, unsigned long int *res)
-@deftypefnx {Built-in Function} bool __builtin_uaddll_overflow (unsigned long long int a, unsigned long long int b, unsigned long int *res)
+@deftypefnx {Built-in Function} bool __builtin_uaddll_overflow (unsigned long long int a, unsigned long long int b, unsigned long long int *res)
 
 These built-in functions promote the first two operands into infinite precision signed
 type and perform addition on those promoted operands.  The result is then
@@ -9907,10 +9907,10 @@ after addition, conditional jump on carry etc.
 @deftypefn {Built-in Function} bool __builtin_sub_overflow (@var{type1} a, @var{type2} b, @var{type3} *res)
 @deftypefnx {Built-in Function} bool __builtin_ssub_overflow (int a, int b, int *res)
 @deftypefnx {Built-in Function} bool __builtin_ssubl_overflow (long int a, long int b, long int *res)
-@deftypefnx {Built-in Function} bool __builtin_ssubll_overflow (long long int a, long long int b, long int *res)
+@deftypefnx {Built-in Function} bool __builtin_ssubll_overflow (long long int a, long long int b, long long int *res)
 @deftypefnx {Built-in Function} bool __builtin_usub_overflow (unsigned int a, unsigned int b, unsigned int *res)
 @deftypefnx {Built-in Function} bool __builtin_usubl_overflow (unsigned long int a, unsigned long int b, unsigned long int *res)
-@deftypefnx {Built-in Function} bool __builtin_usubll_overflow (unsigned long long int a, unsigned long long int b, unsigned long int *res)
+@deftypefnx {Built-in Function} bool __builtin_usubll_overflow (unsigned long long int a, unsigned long long int b, unsigned long long int *res)
 
 These built-in functions are similar to the add overflow checking built-in
 functions above, except they perform subtraction, subtract the second argument
@@ -9921,10 +9921,10 @@ from the first one, instead of addition.
 @deftypefn {Built-in Function} bool __builtin_mul_overflow (@var{type1} a, @var{type2} b, @var{type3} *res)
 @deftypefnx {Built-in Function} bool __builtin_smul_overflow (int a, int b, int *res)
 @deftypefnx {Built-in Function} bool __builtin_smull_overflow (long int a, long int b, long int *res)
-@deftypefnx {Built-in Function} bool __builtin_smulll_overflow (long long int a, long long int b, long int *res)
+@deftypefnx {Built-in Function} bool __builtin_smulll_overflow (long long int a, long long int b, long long int *res)
 @deftypefnx {Built-in Function} bool __builtin_umul_overflow (unsigned int a, unsigned int b, unsigned int *res)
 @deftypefnx {Built-in Function} bool __builtin_umull_overflow (unsigned long int a, unsigned long int b, unsigned long int *res)
-@deftypefnx {Built-in Function} bool __builtin_umulll_overflow (unsigned long long int a, unsigned long long int b, unsigned long int *res)
+@deftypefnx {Built-in Function} bool __builtin_umulll_overflow (unsigned long long int a, unsigned long long int b, unsigned long long int *res)
 
 These built-in functions are similar to the add overflow checking built-in
 functions above, except they perform multiplication, instead of addition.


Re: [PATCH 8/9] final.c selftests

2016-09-16 Thread David Malcolm
On Fri, 2016-09-16 at 14:45 -0600, Jeff Law wrote:
> On 09/08/2016 06:30 PM, David Malcolm wrote:
> > gcc/ChangeLog:
> > * final.c: Include selftest.h and selftest-rtl.h.
> > (class selftest::temp_asm_out): New subclass of
> > selftest::named_temp_file.
> > (selftest::temp_asm_out::temp_asm_out): New ctor.
> > (selftest::temp_asm_out::~temp_asm_out): New dtor.
> > (class selftest::asm_out_test): New subclass of
> > selftest::rtl_dump_test.
> > (selftest::asm_out_test::asm_out_test): New ctor.
> > (selftest::test_jump_insn): New function.
> > (selftest::test_empty_function): New function.
> > (selftest::test_asm_for_insn): New function.
> > (TEST_ASM_FOR_INSN): New macro.
> > (selftest::test_x86_64_leal): New function.
> > (selftest::test_x86_64_negl): New function.
> > (selftest::test_x86_64_cmpl): New function.
> > (selftest::test_x86_64_cmovge): New function.
> > (selftest::test_x86_64_ret): New function.
> > (selftest::final_c_tests): New function.
> > * selftest-run-tests.c (selftest::run_tests): Call
> > selftest::final_c_tests.
> > * selftest.h (selftest::final_c_tests): New decl.
> I'm really not sure how useful these tests are going to be and would 
> question the long term maintenance costs of keeping them up-to-date.
> 
> I could see perhaps verifying that when there are multiple
> alternatives 
> that the correct one is selected or somesuch, but these tests really 
> don't seem to be covering anything particularly useful.

My thinking here was that it might be useful to verify insn recognition
and output when someone is bringing up a new target, or adding new
insns to a .md file; the selftest::test_x86_64_cmpl show the beginnings
of how one might write that in selftest form.

But I'm happy to drop it.
Dave


Re: [PATCH 7/9] combine.c selftests

2016-09-16 Thread David Malcolm
On Fri, 2016-09-16 at 14:40 -0600, Jeff Law wrote:
> On 09/08/2016 06:30 PM, David Malcolm wrote:
> > gcc/ChangeLog:
> > * combine.c: Include selftest.h and selftest-rtl.h.
> > (try_combine): Add assertion on this_basic_block.
> > (class selftest::combine_test): New subclass of
> > selftest::tl_dump_test.
> > (selftest::combine_test::combine_test): New ctor.
> > (selftest::test_combining_shifts): New function.
> > (selftest::test_non_combinable_shifts): New function.
> > (selftest::combine_c_tests): New function.
> > * selftest-run-tests.c (selftest::run_tests): Run
> > selftest::combine_c_tests.
> > * selftest.h (selftest::combine_c_tests): New decl.
> > diff --git a/gcc/combine.c b/gcc/combine.c
> > index 1b262f9..9c148bb 100644
> > --- a/gcc/combine.c
> > +++ b/gcc/combine.c
> > @@ -2625,6 +2627,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
> > rtx_insn *i1, rtx_insn *i0,
> >rtx new_other_notes;
> >int i;
> > 
> > +  gcc_assert (this_basic_block);
> Presumably when you set up the self test the first time this was NULL
> :-)

Indeed :)

> > +
> > +/* combine_test's constructor.  Write DUMP_CONTENT to a tempfile
> > and load
> > +   it.  Initialize df and perform dataflow analysis.  */
> > +
> > +combine_test::combine_test (const char *dump_content,
> > +   int dumped_first_pseudo_regno)
> > +: rtl_dump_test (dump_content, dumped_first_pseudo_regno),
> > +  m_df_test ()
> > +{
> > +  /* The dataflow instance should have been created by m_df_test's
> > ctor.  */
> > +  gcc_assert (df);
> > +
> > +  /* From rest_of_handle_combine.  */
> > +  df_set_flags (/*DF_LR_RUN_DCE + */ DF_DEFER_INSN_RESCAN);
> > +  df_note_add_problem ();
> > +  df_analyze ();
> > +}
> So rather than taking a string (which is a pain to construct), then 
> writing it out to a file, then reading it in, what's wrong with
> having 
> RTL fragments in a file?
> 

I plan to rework these fixtures so that they can accept both string
fragments and filenames.


Re: [PATCH 6/9] df selftests

2016-09-16 Thread David Malcolm
On Fri, 2016-09-16 at 14:34 -0600, Jeff Law wrote:
> On 09/08/2016 06:30 PM, David Malcolm wrote:
> > gcc/ChangeLog:
> > * df-core.c: Include selftest.h and selftest-rtl.h.
> > (selftest::dataflow_test::dataflow_test): New ctor.
> > (selftest::dataflow_test::~dataflow_test): New dtor.
> > (selftest::test_df_single_set): New function.
> > (selftest::df_core_c_tests): New function.
> > * selftest-run-tests.c (selftest::run_tests): Call it.
> > * selftest.h (selftest::df_core_c_tests): New decl.
> > ---
> >  gcc/df-core.c| 77
> > 
> >  gcc/selftest-run-tests.c |  1 +
> >  gcc/selftest.h   |  1 +
> >  3 files changed, 79 insertions(+)
> > 
> > diff --git a/gcc/df-core.c b/gcc/df-core.c
> > index e531d58..cb8e2f9 100644
> > --- a/gcc/df-core.c
> > +++ b/gcc/df-core.c
> > @@ -384,6 +384,8 @@ are write-only operations.
> >  #include "cfganal.h"
> >  #include "tree-pass.h"
> >  #include "cfgloop.h"
> > +#include "selftest.h"
> > +#include "selftest-rtl.h"
> > 
> >  static void *df_get_bb_info (struct dataflow *, unsigned int);
> >  static void df_set_bb_info (struct dataflow *, unsigned int, void
> > *);
> > @@ -2482,3 +2484,78 @@ debug_df_chain (struct df_link *link)
> >df_chain_dump (link, stderr);
> >fputc ('\n', stderr);
> >  }
> > +
> > +#if CHECKING_P
> > +
> > +namespace selftest {
> > +
> > +/* dataflow_test's constructor.  Initialize df.  */
> > +
> > +dataflow_test::dataflow_test ()
> > +{
> > +  rest_of_handle_df_initialize ();
> > +}
> > +
> > +/* dataflow_test's destructor.  Clean up df.  */
> > +
> > +dataflow_test::~dataflow_test ()
> > +{
> > +  rest_of_handle_df_finish ();
> > +}
> > +
> > +/* Verify df_note on a trivial function.  */
> > +
> > +void
> > +test_df_single_set ()
> > +{
> > +  const char *input_dump
> > += "(insn 1 0 0 2 (set (reg:SI 100) (reg:SI 101)) -1 (nil))\n";
> > +  rtl_dump_test t (input_dump, 100);
> > +  //print_rtl_with_bb (stdout, get_insns (), 1024);
> Obviously this call to pritn_rtl_with_bb should disappear...
> 
> 
> > +
> > +  dataflow_test dftest;
> > +
> > +  df_note_add_problem ();
> > +  df_analyze ();
> > +  //df_dump (stderr);
> ANd this call to df_dump.
> 
> > +
> > +  ASSERT_EQ (SET_SRC (PATTERN (insn)), note0->element ());
> > +  ASSERT_EQ (REG_DEAD, REG_NOTE_KIND (note0));
> > +
> > +  ASSERT_EQ (SET_DEST (PATTERN (insn)), note1->element ());
> > +  ASSERT_EQ (REG_UNUSED, REG_NOTE_KIND (note1));
> I don't think the ordering of the notes is guaranteed.
> 
> Like the other RTL test I've looked at, I'd be real curious to know
> if 
> you get any uninitialized memory reads if you run this test under
> valgrind.

FWIW there don't seem to be any.


Re: [PATCH 5/9] Introduce class function_reader

2016-09-16 Thread Jeff Law

On 09/08/2016 06:30 PM, David Malcolm wrote:

This patch generalizes the RTL-reading capabilities so that they
can be run on the host as well as the build machine.
The available rtx in rtl.def changes dramatically between these
two configurations, so a fair amount of #ifdef GENERATOR_FILE is
required to express this.

This patch introduces a function_reader subclass of rtx_reader,
capable of reading an RTL function dump (or part of one),
reconstructing a cfun with a CFG and basic blocks containing insns.

gcc/ChangeLog:
* Makefile.in (OBJS): Add errors.o, read-md.o, read-rtl.o,
read-rtl-function.o, and selftest-rtl.o.



* cfgexpand.c (pass_expand::execute): Move stack initializations
to rtl_data::init_stack_alignment and call it.  Pass "true"
for new "emit_insns" param of expand_function_start.
* emit-rtl.c (gen_reg_rtx): Move regno_pointer_align and
regno_reg_rtx resizing logic to...
(emit_status::ensure_regno_capacity): ...this new method.
(init_emit): Allocate regno_reg_rtx using ggc_cleared_vec_alloc
rather than ggc_vec_alloc.
(rtl_data::init_stack_alignment): New method.
(get_insn_by_uid): New function.
* emit-rtl.h (rtl_data::init_stack_alignment): New method.
* errors.c: Use consistent pattern for bconfig.h vs config.h
includes.
(progname): Wrap with #ifdef GENERATOR_FILE.
(error): Likewise.  Add "error: " to message.
(fatal): Likewise.
(internal_error): Likewise.
(trim_filename): Likewise.
(fancy_abort): Likewise.
* errors.h (struct file_location): Move here from read-md.h.
(file_location::file_location): Likewise.
(error_at): New decl.
* function-tests.c (selftest::verify_three_block_rtl_cfg): Remove
"static".
* function.c (instantiate_decls): Guard call to
instantiate_decls_1 with if (DECL_INITIAL (fndecl)).
(expand_function_start): Add param "emit_insns", and use it to
guard the various gen/emit calls.
* function.h (emit_status::ensure_regno_capacity): New method.
(expand_function_start): Add bool param to decl.
* gensupport.c (gen_reader::gen_reader): Add NULL for new policy
param of rtx_reader ctor.
* print-rtl.c (print_rtx): Print "(nil)" rather than an empty
string for NULL strings.  Print "(nil)" for NULL basic blocks.
* read-md.c (read_skip_construct): Provide forward decl.
(read_skip_spaces): Support '/'.
(require_char): New function.
(require_word_ws): New function.
(peek_char): New function.
(read_name): Rename to...
(read_name_1): ...this new static function, adding "out_loc" param,
and converting "missing name or number" to returning false, rather
than failing.
(read_name): Reimplement in terms of read_name_1.
(read_name_or_nil): New function.
(read_string): Handle "(nil)" by returning NULL.  */
(rtx_reader::rtx_reader): Add rtl_reader_policy * param, using
it to initialize m_policy.
(rtx_reader::~rtx_reader): Free m_base_dir.  Clean up global data.
* read-md.h (struct file_location): Move to errors.h.
(file_location::file_location): Likewise.
Include errors.h.
(class regno_remapper): New class.
(struct rtl_reader_policy): New struct.
(rtx_reader::rtx_reader): Add rtl_reader_policy * param.
(rtx_reader::add_fixup_insn_uid): New vfunc.
(rtx_reader::add_fixup_bb): New vfunc.
(rtx_reader::add_fixup_note_insn_basic_block): New vfunc.
(rtx_reader::add_fixup_source_location): New vfunc.
(rtx_reader::add_fixup_jump_label): New vfunc.
(rtx_reader::add_fixup_expr): New vfunc.
(rtx_reader::remap_regno): New method.
(rtx_reader::m_policy): New field.
(noop_reader::noop_reader): Add NULL for new policy param of
rtx_reader ctor.
(peek_char): New decl.
(require_char): New decl.
(require_word_ws): New decl.
(read_name): Convert return type from void to file_location.
(read_name_or_nil): New decl.
* read-rtl-function.c: New file.
* read-rtl-function.h: New file.
* read-rtl.c: Potentially include config.h rather than bconfig.h.
For host, include function.h and emit-rtl.h.
(apply_subst_iterator): Wrap with #ifdef GENERATOR_FILE.
(bind_subst_iter_and_attr): Likewise.
(add_condition_to_string): Likewise.
(add_condition_to_rtx): Likewise.
(apply_attribute_uses): Likewise.
(add_current_iterators): Likewise.
(apply_iterators): Likewise.
(initialize_iterators): Guard usage of apply_subst_iterator with
#ifdef GENERATOR_FILE.
(read_conditions): Wrap with #ifdef GENERATOR_FILE.
(read_mapping): Likewise.

Re: [PATCH 0/9] RFC: selftests based on RTL dumps

2016-09-16 Thread David Malcolm
On Fri, 2016-09-16 at 14:05 -0600, Jeff Law wrote:
> On 09/13/2016 05:15 AM, Bernd Schmidt wrote:
> > > 
> > > Note that the loader now resets INSN_CODE to -1, regardless of
> > > the
> > > actual code passed in, to force re-recognition, and to isolate
> > > the
> > > dumps somewhat from changes to the .md files.  So although the
> > > above
> > > says insn 641 and 642 (for some snapshot of the aarch64 md file),
> > > it
> > > gets reset to -1.
> > 
> > Best to find out a way to avoid including it in the strings then,
> > to
> > avoid confusion.
> We should also twiddle how we represent registers in the dumps. 
> Identifying hard regs by name (so we can map back to a hard reg if
> the 
> hard regs change), identifying pseudos by number that isn't affected
> if 
> the hard register set changes ie, p0, p1, p2, p3 where the number is 
> REGNO (x) - FIRST_PSEUDO_REGISTER. identifying the virtual registers, 
> etc.

Good idea.

I don't know if you saw this yet, but the patch has logic from
renumbering pseudos on load (see class regno_remapper), but dumping
them as p0, p1 etc and reloading them as such seems much easier for
everyone.

> The key being rather than put a ton of smarts/hacks in a reader, we 
> should work to have the RTL writer give us something more useful. 
>  That 
> may mean simple changes to the output, or some conditional changes
> (like 
> not emitting the INSN_CODE or its name).

That would make the reader a lot less grim; it has a lot of warts for
dealing with all the special cases in the current output format.

But maybe it is useful to be able to deal with the current output
format.  For example, I was able to look at PR 71779 and find some
fragmentary RTL dumps there (comment #2) and use them.  I *did* need to
edit them a little, so maybe it's OK to require a little editing
(especially with older dumps... to what extent has the format changed
over the years?)


Re: [PATCH 9/9] cse.c selftests

2016-09-16 Thread David Malcolm
On Fri, 2016-09-16 at 14:26 -0600, Jeff Law wrote:
> On 09/08/2016 06:30 PM, David Malcolm wrote:
> > This patch uses rtl_dump_test to start building out a test suite
> > for cse.
> > 
> > I attempted to create a reproducer for PR 71779; however I'm not
> > yet
> > able to replicate the bogus cse reported there via the test case.
> > 
> > gcc/ChangeLog:
> > * cse.c: Include selftest.h and selftest-rtl.h.
> > (selftest::test_simple_cse): New function.
> > (selftest::test_pr71779): New function.
> > (selftest::cse_c_tests): New function.
> > * selftest-run-tests.c (selftest::run_tests): Call
> > selftest::cse_c_tests.
> > * selftest.h (selftest::cse_c_tests): New decl.
> So as I look at this, the first thing that comes to mind is whether
> or 
> not to refine the dump output.
> 
> There's a lot of useless crap in there that really only helps folks
> that 
> sitting inside a debugger dumping hunks of RTL (I'm thinking 
> specifically about the pointers back to tree nodes for DECLs).  Those
> addresses are useless in other contexts.
> 
> When possible I don't think we want the tests to be target specific. 
> Hmm, I'm probably about to argue for Bernd's work...  The 71779
> testcase 
> is a great example -- it uses high/lo_sum.  Not all targets support
> that 
> -- as long as we don't try to recognize those insns we're likely OK,
> but 
> that seems fragile long term.  Having an idealized target means we
> can 
> ignore much of these issues.

An alternative would be to pick a specific target for each test.


> > ---
> >  gcc/cse.c| 109
> > +++
> >  gcc/selftest-run-tests.c |   1 +
> >  gcc/selftest.h   |   1 +
> >  3 files changed, 111 insertions(+)
> > 
> > diff --git a/gcc/cse.c b/gcc/cse.c
> > index 0bfd7ff..f4f06fe 100644
> > --- a/gcc/cse.c
> > +++ b/gcc/cse.c
> > @@ -41,6 +41,8 @@ along with GCC; see the file COPYING3.  If not
> > see
> >  #include "tree-pass.h"
> >  #include "dbgcnt.h"
> >  #include "rtl-iter.h"
> > +#include "selftest.h"
> > +#include "selftest-rtl.h"
> > 
> >  #ifndef LOAD_EXTEND_OP
> >  #define LOAD_EXTEND_OP(M) UNKNOWN
> > @@ -7773,3 +7775,110 @@ make_pass_cse_after_global_opts
> > (gcc::context *ctxt)
> >  {
> >return new pass_cse_after_global_opts (ctxt);
> >  }
> > +
> > +#if CHECKING_P
> > +
> > +namespace selftest {
> > +
> > +/* Selftests for CSE.  */
> > +
> > +/* Simple test of eliminating a redundant (reg + 1) computation
> > +   i.e. that:
> > + r101 = r100 + 1;
> > + r102 = r100 + 1; <<< common subexpression
> > + *r103 = r101 * r102;
> > +   can be CSE-ed to:
> > + r101 = r100 + 1;
> > + r102 = r101; <<< replaced
> > + *r103 = r101 * r102;
> > +   by cse_main.  */
> So I'm real curious, what happens if you run this RTL selftest under 
> valgrind?  I have the sneaking suspicion that we'll start doing some 
> uninitialized memory reads.

I'm seeing various leaks (an htab within linemap_init for all of the
line_table fixtures), but no uninitialized reads.

> > +
> > +static void
> > +test_simple_cse ()
> > +{
> > +  /* Only run this tests for i386.  */
> > +#ifndef I386_OPTS_H
> > +  return;
> > +#endif
> > +
> > +  const char *input_dump
> > += (/* "r101 = r100 + 1;" */
> > +   "(insn 1 0 2 2 (set (reg:SI 101)\n"
> > +   "   (plus:SI (reg:SI 100)\n"
> > +   "(const_int 1 [0x1]))) -1
> > (nil))\n"
> > +   /* "r102 = r100 + 1;" */
> > +   "(insn 2 1 3 2 (set (reg:SI 102)\n"
> > +   "   (plus:SI (reg:SI 100)\n"
> > +   "(const_int 1 [0x1]))) -1
> > (nil))\n"
> > +   /* "*r103 = r101 * r102;" */
> > +   "(insn 3 2 0 2 (set (mem:SI (reg:SI 103) [1 i+0 S4 A32])\n"
> > +   "   (mult:SI (reg:SI 101) (reg:SI 102))) -1
> > (nil))\n"
> > +   );
> I don't think we need to comment each RTL insn for those which are so
> obvious.  It just adds to the visual clutter.

This may have been more for my benefit; I'm still relatively new to RTL
:)

>   +
> > +  /* Dump taken from comment 2 of PR 71779, of
> > + "...the relevant memory access coming out of expand"
> > + with basic block IDs added, and prev/next insns set to
> > + 0 at ends.  */
> > +  const char *input_dump
> > += (";; MEM[(struct isl_obj *)] = _obj_map_vtable;\n"
> > +   "(insn 1045 0 1046 2 (set (reg:SI 480)\n"
> > +   "(high:SI (symbol_ref:SI (\"isl_obj_map_vtable\")
> > [flags 0xc0] )))
> > y.c:12702 -1\n"
> > +   " (nil))\n"
> > +   "(insn 1046 1045 1047 2 (set (reg/f:SI 479)\n"
> > +   "(lo_sum:SI (reg:SI 480)\n"
> > +   "(symbol_ref:SI (\"isl_obj_map_vtable\") [flags
> > 0xc0] ))) y.c:12702 
> > -1\n"
> > +   " (expr_list:REG_EQUAL (symbol_ref:SI
> > (\"isl_obj_map_vtable\") [flags 0xc0]  > isl_obj_map_vtable>)\n"
> > +   "(nil)))\n"
> > +   "(insn 1047 

Re: [PATCH 0/9] RFC: selftests based on RTL dumps

2016-09-16 Thread David Malcolm
On Fri, 2016-09-16 at 14:00 -0600, Jeff Law wrote:
> On 09/14/2016 02:37 AM, Richard Biener wrote:
> > 
> > Note that while the "snippets" may largely work (not sure how many
> > you tried to come up with) I see the issue that a lot of RTL "unit
> > tests" would need some trees set up, like to properly form
> > MEM_EXPRs
> > or REG_DECL or even SYMBOL_REFs.  So I fear that the scope of
> > unit-tests we can implement with the proposed scheme is very
> > limited
> > (you may also need other stuff setup, like alias analysis or parts
> > of
> > IRA or cost analysis parts).  So I agree a separate testing backend
> > is a good way to make unit-testing more stable on the target side
> > we
> > also need a way to provide input on some of the global state that
> > is
> > currently set up by frontends.
> Agreed across the board.  I think we're still early in the learning 
> phase on this stuff.   I shudder when I think about the amount of
> state 
> that we depend on, but which is not represented in the RTL dumps.
> 
> But I do think there are some things we can test for in an RTL self 
> testing framework and that having one would be a significant step
> forward.
> 
> So I think we have two big questions.
> 
> First, does David's work have value as a way to directly test pieces
> of 
> the RTL pipeline without having to generate all the RTL bits by hand.
>   I 
> think the answer is yes.
> 
> Second, will David's work help identify internal state that is not 
> expressed in the RTL dumps or poor modularity (ie, cases like trees 
> embedded within the RTL structures).  I think the answer to this is
> yes 
> as well.
> 
> Third, is a framework like Bernd's useful as well and can it be mated
> with David's work.  And I think the answer is yes & yes as well with
> the 
> result being more useful than either Bernd or David's work in
> isolation.

As far as I understand Bernd's suggestion there seem to be two parts to
it:
(a) a kind of virtual target, which can act in different ways depending
on the needs of a test case.  e.g. dynamically  select 32 bit vs 64
bit, does it have a CCmode, how many stages is the pipeline etc etc.
(b) a separate build of the "gcc" subdir, configured to use (a).

These seem like laudable goal, but I see it as orthogonal to my patch
kit.  It'd also be a massive expansion of scope.

Also, re (a) any given test is likely to be tested against a specific
target. That could be a real target, or a particular setting of a
virtual target.  The tests in my patch kit have largely gated on the
specific targets I was testing with.

So is Bernd's suggestion a prerequisite for my work, or can my work
stand alone from it?

> > 
> > But my biggest worry is with putting unit-tests into cc1 itself --
> > even more so with RTL unit tests of this kind than with all the
> > other
> > ones we have.  We'll quickly have 99% of a source file comprised of
> > RTL unit tests rather than source (and cc1 object size as well).
> > Hardly something we want to have (not even mentioning bootstrap
> > time
> > issues).
> I can live revisiting this -- I always expected we would after we 
> started building out the framework.

There are some interrelated questions here:
(a) where do the dumps live? (string fragments embedded in the source
file vs external files)
(b) -fself-test vs DejaGnu tests that use a real frontend.  In the
latter case, is the frontend "rtl1", or an extension of "cc1" with an
"__RTL" marker?

For (a), I'd like to do support both (in that it's clear we need
support for external files, but it seems trivial to support embedding).
 I've worked with both approaches in the past, and both are useful (a
two-liner may make sense to live "inline", as they get larger you'd
want them in a separate file).

For (b), I'd like to do both: if nothing else, the loader itself needs
selftests.  Plus selftests allow for tests that are more fine-grained
than just the level of one optimization pass.  But I'd anticipate most
of them being external.

For selftests that load external files, there's a slight wart - how are
they accessed?  There needs to be some way to tell it which directory
to look in.  Also, what happens when build != host?

I have some followup patches that both build out an actual frontend on
top of the loader, and extend cc1, but there's some ugly diagnostic and
linemap issues to resolve.


> > 
> > Yes, putting the unit-tests in source files makes us not require
> > exporting an interface to the parts we are testing.  But that's
> > about
> > the only advantage I can see.  You didn't show that it isn't
> > possible
> > to put the small test you were writing into a RTL-frontendish test
> > which starts compiling the function with the test with the pass you
> > are about to unit-test.
> The other advantage is tests which use the internal APIs are easy to 
> identify/fix when an internal API is changed.
> 
> Jeff
> 


Re: [PATCH 2/9] Add selftest::read_file

2016-09-16 Thread Jeff Law

On 09/08/2016 06:30 PM, David Malcolm wrote:

This is used later in the kit by the selftests for final.c

gcc/ChangeLog:
* selftest.c (selftest::read_file): New function.
(selftest::test_read_file): New function.
(selftest::selftest_c_tests): Call test_read_file.
* selftest.h (selftest::read_file): New decl.

Seems reasonable.  Install when you have a need.

jeff



Re: [PATCH 8/9] final.c selftests

2016-09-16 Thread Jeff Law

On 09/08/2016 06:30 PM, David Malcolm wrote:

gcc/ChangeLog:
* final.c: Include selftest.h and selftest-rtl.h.
(class selftest::temp_asm_out): New subclass of
selftest::named_temp_file.
(selftest::temp_asm_out::temp_asm_out): New ctor.
(selftest::temp_asm_out::~temp_asm_out): New dtor.
(class selftest::asm_out_test): New subclass of
selftest::rtl_dump_test.
(selftest::asm_out_test::asm_out_test): New ctor.
(selftest::test_jump_insn): New function.
(selftest::test_empty_function): New function.
(selftest::test_asm_for_insn): New function.
(TEST_ASM_FOR_INSN): New macro.
(selftest::test_x86_64_leal): New function.
(selftest::test_x86_64_negl): New function.
(selftest::test_x86_64_cmpl): New function.
(selftest::test_x86_64_cmovge): New function.
(selftest::test_x86_64_ret): New function.
(selftest::final_c_tests): New function.
* selftest-run-tests.c (selftest::run_tests): Call
selftest::final_c_tests.
* selftest.h (selftest::final_c_tests): New decl.
I'm really not sure how useful these tests are going to be and would 
question the long term maintenance costs of keeping them up-to-date.


I could see perhaps verifying that when there are multiple alternatives 
that the correct one is selected or somesuch, but these tests really 
don't seem to be covering anything particularly useful.


Jeff



Re: [C++ PATCH] Fix constexpr switch handling (PR c++/77467)

2016-09-16 Thread Jakub Jelinek
On Fri, Sep 16, 2016 at 03:51:11PM -0400, Jason Merrill wrote:
> On Mon, Sep 5, 2016 at 1:11 PM, Jakub Jelinek  wrote:
> > +  /* If body is a statement other than STATEMENT_LIST or BIND_EXPR,
> > + it should be skipped.  E.g. switch (a) b = a;  */
> > +  if (TREE_CODE (body) == STATEMENT_LIST
> > +  || TREE_CODE (body) == BIND_EXPR)
> 
> I'm nervous about this optimization for useless code breaking other
> things that might (one day) wrap a case label; I think I'd prefer to
> drop the condition.

By droping the condition you mean unconditionally call
  cxx_eval_constant_expression (ctx, body, false,
non_constant_p, overflow_p, jump_target);
?  That is known not to work, that breaks the
+constexpr int
+bar (int x)
+{
+  int a = x;
+  switch (x)
+a = x + 1;
+  return a;
+}
handling in the testcase, where body is the MODIFY_EXPR which doesn't have
the label and thus needs to be skipped.  The problem is that all the logic for
skipping statements until the label is found is in cxx_eval_statement_list
only.  For STATEMENT_LIST that is called by cxx_eval_constant_expression,
for BIND_EXPR if we are lucky enough that BIND_EXPR_BODY is a STATEMENT_LIST
too (otherwise I assume even my patch doesn't fix it, it would need to
verify that).  If body is some other statement, then it really should be
skipped, but it isn't, because cxx_eval_constant_expression ignores it.

I wonder if we e.g. cxx_eval_constant_expression couldn't early in the
function for if (*jump_target) return immediately unless code is something
like STATEMENT_LIST or BIND_EXPR with BIND_EXPR_BODY being STATEMENT_LIST,
or perhaps in the future other construct containing other stmts.
I've beeing thinking about TRY block, but at least on the testcases I've
tried it has been rejected in constexpr functions, I think one can't branch
into statement expressions, so that should be fine, OpenMP/OpenACC
constructs are hopefully also rejected in constexpr, what else?

Jakub


Re: [PATCH 7/9] combine.c selftests

2016-09-16 Thread Jeff Law

On 09/08/2016 06:30 PM, David Malcolm wrote:

gcc/ChangeLog:
* combine.c: Include selftest.h and selftest-rtl.h.
(try_combine): Add assertion on this_basic_block.
(class selftest::combine_test): New subclass of
selftest::tl_dump_test.
(selftest::combine_test::combine_test): New ctor.
(selftest::test_combining_shifts): New function.
(selftest::test_non_combinable_shifts): New function.
(selftest::combine_c_tests): New function.
* selftest-run-tests.c (selftest::run_tests): Run
selftest::combine_c_tests.
* selftest.h (selftest::combine_c_tests): New decl.
diff --git a/gcc/combine.c b/gcc/combine.c
index 1b262f9..9c148bb 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -2625,6 +2627,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
   rtx new_other_notes;
   int i;

+  gcc_assert (this_basic_block);

Presumably when you set up the self test the first time this was NULL :-)


+
+/* combine_test's constructor.  Write DUMP_CONTENT to a tempfile and load
+   it.  Initialize df and perform dataflow analysis.  */
+
+combine_test::combine_test (const char *dump_content,
+   int dumped_first_pseudo_regno)
+: rtl_dump_test (dump_content, dumped_first_pseudo_regno),
+  m_df_test ()
+{
+  /* The dataflow instance should have been created by m_df_test's ctor.  */
+  gcc_assert (df);
+
+  /* From rest_of_handle_combine.  */
+  df_set_flags (/*DF_LR_RUN_DCE + */ DF_DEFER_INSN_RESCAN);
+  df_note_add_problem ();
+  df_analyze ();
+}
So rather than taking a string (which is a pain to construct), then 
writing it out to a file, then reading it in, what's wrong with having 
RTL fragments in a file?



Jeff



Re: [PATCH] Fix warning breaking profiled bootstrap

2016-09-16 Thread Andi Kleen
On Fri, Sep 16, 2016 at 10:44:34AM -0600, Jeff Law wrote:
> On 08/21/2016 02:30 PM, Eric Botcazou wrote:
> > > Consider the case where sym1 results in a non-null return value (and
> > > initializes neg1/inv1), but sym2 results in a null return value, leaving
> > > neg2/inv2 undefined, but cst2 is can still be true (ADDR_EXPR with an
> > > invariant address comes to mind).
> > > 
> > > Thus we can get into these statements:
> > > 
> > > 
> > >tree cst = cst1 ? val1 : val2;
> > >tree inv = cst1 ? inv2 : inv1;
> > > 
> > > 
> > > Note carefully how they test cst1 and depending on its value, they may
> > > read val2 or inv2.
> > 
> > The key here is that cst1 cannot be true if sym1 is non-null, same for cst2
> > and sym2.
> > 
> > The code is guarded with:
> > 
> >   /* If one is of the form '[-]NAME + CST' and the other is constant, then
> >  it might be possible to say something depending on the constants.  */
> >   if ((sym1 && inv1 && cst2) || (sym2 && inv2 && cst1))
> > 
> > If this is the first case, then cst1 is false and val2 and inv1 are read.
> > If this is the second case, then cst1 is true and val1 and inv2 are read.
> > 
> > So inv2 is read only in the second case, and is initialized.
> THanks.  Sometimes this stuff gets painful to follow :(
> 
> It looks like Andi already checked in this change.  Andi, please don't do
> that.  THe patch wasn't approved at the time of your commit and there's no
> indication the change was regression tested.

I don't post any patches that are not regression tested.
In fact the patch was needed to run proper regression tests with
profiling on because it was a bootstrap fix.

I had interpreted your earlier comments as approval, and imho
the patch is obvious anyways.

Also I would like to point out that such a long turn around time for
bootstrap fixes (and only out of secondary considerations which had
nothing to do with the actual bootstrap problem) is extremely disruptive for
anyone who actually relies on working bootstraps.

If there's no intention to keep --enable-werror working for all build
options the default should probably change to --disable-werror.

-Andi


Re: [PATCH 6/9] df selftests

2016-09-16 Thread Jeff Law

On 09/08/2016 06:30 PM, David Malcolm wrote:

gcc/ChangeLog:
* df-core.c: Include selftest.h and selftest-rtl.h.
(selftest::dataflow_test::dataflow_test): New ctor.
(selftest::dataflow_test::~dataflow_test): New dtor.
(selftest::test_df_single_set): New function.
(selftest::df_core_c_tests): New function.
* selftest-run-tests.c (selftest::run_tests): Call it.
* selftest.h (selftest::df_core_c_tests): New decl.
---
 gcc/df-core.c| 77 
 gcc/selftest-run-tests.c |  1 +
 gcc/selftest.h   |  1 +
 3 files changed, 79 insertions(+)

diff --git a/gcc/df-core.c b/gcc/df-core.c
index e531d58..cb8e2f9 100644
--- a/gcc/df-core.c
+++ b/gcc/df-core.c
@@ -384,6 +384,8 @@ are write-only operations.
 #include "cfganal.h"
 #include "tree-pass.h"
 #include "cfgloop.h"
+#include "selftest.h"
+#include "selftest-rtl.h"

 static void *df_get_bb_info (struct dataflow *, unsigned int);
 static void df_set_bb_info (struct dataflow *, unsigned int, void *);
@@ -2482,3 +2484,78 @@ debug_df_chain (struct df_link *link)
   df_chain_dump (link, stderr);
   fputc ('\n', stderr);
 }
+
+#if CHECKING_P
+
+namespace selftest {
+
+/* dataflow_test's constructor.  Initialize df.  */
+
+dataflow_test::dataflow_test ()
+{
+  rest_of_handle_df_initialize ();
+}
+
+/* dataflow_test's destructor.  Clean up df.  */
+
+dataflow_test::~dataflow_test ()
+{
+  rest_of_handle_df_finish ();
+}
+
+/* Verify df_note on a trivial function.  */
+
+void
+test_df_single_set ()
+{
+  const char *input_dump
+= "(insn 1 0 0 2 (set (reg:SI 100) (reg:SI 101)) -1 (nil))\n";
+  rtl_dump_test t (input_dump, 100);
+  //print_rtl_with_bb (stdout, get_insns (), 1024);

Obviously this call to pritn_rtl_with_bb should disappear...



+
+  dataflow_test dftest;
+
+  df_note_add_problem ();
+  df_analyze ();
+  //df_dump (stderr);

ANd this call to df_dump.


+
+  ASSERT_EQ (SET_SRC (PATTERN (insn)), note0->element ());
+  ASSERT_EQ (REG_DEAD, REG_NOTE_KIND (note0));
+
+  ASSERT_EQ (SET_DEST (PATTERN (insn)), note1->element ());
+  ASSERT_EQ (REG_UNUSED, REG_NOTE_KIND (note1));

I don't think the ordering of the notes is guaranteed.

Like the other RTL test I've looked at, I'd be real curious to know if 
you get any uninitialized memory reads if you run this test under valgrind.


jeff



Re: [PATCH 3/9] selftest.h: add temp_override fixture

2016-09-16 Thread Jeff Law

On 09/08/2016 06:30 PM, David Malcolm wrote:

We have a lot of global state in our code.  Ideally we'd reduce the
amount of such global state, but a prerequisite for sane refactoring
is having automated testing in place to ensure that the refactoring
doesn't break anything.

However, the global state itself makes it hard to write such automated
testing.

To break this Catch-22, this patch introduces a class temp_override,
for temporarily assigning a value to a global variable, saving the old
value, and then restoring that old value in a dtor.

gcc/ChangeLog:
* selftest.h (selftest::temp_override): New class.

When you need it, this is fine.

jeff



Re: [PATCH 9/9] cse.c selftests

2016-09-16 Thread Jeff Law

On 09/08/2016 06:30 PM, David Malcolm wrote:

This patch uses rtl_dump_test to start building out a test suite
for cse.

I attempted to create a reproducer for PR 71779; however I'm not yet
able to replicate the bogus cse reported there via the test case.

gcc/ChangeLog:
* cse.c: Include selftest.h and selftest-rtl.h.
(selftest::test_simple_cse): New function.
(selftest::test_pr71779): New function.
(selftest::cse_c_tests): New function.
* selftest-run-tests.c (selftest::run_tests): Call
selftest::cse_c_tests.
* selftest.h (selftest::cse_c_tests): New decl.
So as I look at this, the first thing that comes to mind is whether or 
not to refine the dump output.


There's a lot of useless crap in there that really only helps folks that 
sitting inside a debugger dumping hunks of RTL (I'm thinking 
specifically about the pointers back to tree nodes for DECLs).  Those 
addresses are useless in other contexts.


When possible I don't think we want the tests to be target specific. 
Hmm, I'm probably about to argue for Bernd's work...  The 71779 testcase 
is a great example -- it uses high/lo_sum.  Not all targets support that 
-- as long as we don't try to recognize those insns we're likely OK, but 
that seems fragile long term.  Having an idealized target means we can 
ignore much of these issues.





---
 gcc/cse.c| 109 +++
 gcc/selftest-run-tests.c |   1 +
 gcc/selftest.h   |   1 +
 3 files changed, 111 insertions(+)

diff --git a/gcc/cse.c b/gcc/cse.c
index 0bfd7ff..f4f06fe 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -41,6 +41,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "dbgcnt.h"
 #include "rtl-iter.h"
+#include "selftest.h"
+#include "selftest-rtl.h"

 #ifndef LOAD_EXTEND_OP
 #define LOAD_EXTEND_OP(M) UNKNOWN
@@ -7773,3 +7775,110 @@ make_pass_cse_after_global_opts (gcc::context *ctxt)
 {
   return new pass_cse_after_global_opts (ctxt);
 }
+
+#if CHECKING_P
+
+namespace selftest {
+
+/* Selftests for CSE.  */
+
+/* Simple test of eliminating a redundant (reg + 1) computation
+   i.e. that:
+ r101 = r100 + 1;
+ r102 = r100 + 1; <<< common subexpression
+ *r103 = r101 * r102;
+   can be CSE-ed to:
+ r101 = r100 + 1;
+ r102 = r101; <<< replaced
+ *r103 = r101 * r102;
+   by cse_main.  */
So I'm real curious, what happens if you run this RTL selftest under 
valgrind?  I have the sneaking suspicion that we'll start doing some 
uninitialized memory reads.




+
+static void
+test_simple_cse ()
+{
+  /* Only run this tests for i386.  */
+#ifndef I386_OPTS_H
+  return;
+#endif
+
+  const char *input_dump
+= (/* "r101 = r100 + 1;" */
+   "(insn 1 0 2 2 (set (reg:SI 101)\n"
+   "   (plus:SI (reg:SI 100)\n"
+   "(const_int 1 [0x1]))) -1 (nil))\n"
+   /* "r102 = r100 + 1;" */
+   "(insn 2 1 3 2 (set (reg:SI 102)\n"
+   "   (plus:SI (reg:SI 100)\n"
+   "(const_int 1 [0x1]))) -1 (nil))\n"
+   /* "*r103 = r101 * r102;" */
+   "(insn 3 2 0 2 (set (mem:SI (reg:SI 103) [1 i+0 S4 A32])\n"
+   "   (mult:SI (reg:SI 101) (reg:SI 102))) -1 (nil))\n"
+   );
I don't think we need to comment each RTL insn for those which are so 
obvious.  It just adds to the visual clutter.



 +

+  /* Dump taken from comment 2 of PR 71779, of
+ "...the relevant memory access coming out of expand"
+ with basic block IDs added, and prev/next insns set to
+ 0 at ends.  */
+  const char *input_dump
+= (";; MEM[(struct isl_obj *)] = _obj_map_vtable;\n"
+   "(insn 1045 0 1046 2 (set (reg:SI 480)\n"
+   "(high:SI (symbol_ref:SI (\"isl_obj_map_vtable\") [flags 0xc0] 
))) y.c:12702 -1\n"
+   " (nil))\n"
+   "(insn 1046 1045 1047 2 (set (reg/f:SI 479)\n"
+   "(lo_sum:SI (reg:SI 480)\n"
+   "(symbol_ref:SI (\"isl_obj_map_vtable\") [flags 0xc0] ))) y.c:12702 -1\n"
+   " (expr_list:REG_EQUAL (symbol_ref:SI (\"isl_obj_map_vtable\") [flags 0xc0] 
)\n"
+   "(nil)))\n"
+   "(insn 1047 1046 1048 2 (set (reg:DI 481)\n"
+   "(subreg:DI (reg/f:SI 479) 0)) y.c:12702 -1\n"
+   " (nil))\n"
+   "(insn 1048 1047 1049 2 (set (zero_extract:DI (reg/v:DI 191 [ obj1D.17368 
])\n"
+   "(const_int 32 [0x20])\n"
+   "(const_int 0 [0]))\n"
+   "(reg:DI 481)) y.c:12702 -1\n"
+   " (nil))\n"
So looking at this just makes my head hurt.  Escaping, lots of quotes, 
unnecessary things in the dump, etc.  The question I would have is why 
not have a file with the dump and read the file?





Re: [PATCH 4/9] Expose forcibly_ggc_collect and run it after all selftests

2016-09-16 Thread Jeff Law

On 09/08/2016 06:30 PM, David Malcolm wrote:

Force a GC at the end of the selftests, to shake out GC-related
issues.  For example, if any GC-managed items have buggy (or missing)
finalizers, this last collection will ensure that things that were
failed to be finalized can be detected by valgrind.

gcc/ChangeLog:
* ggc-tests.c (forcibly_ggc_collect): Rename to...
(selftest::forcibly_ggc_collect): ...this, and remove "static".
(test_basic_struct): Update for above renaming.
(test_length): Likewise.
(test_union): Likewise.
(test_finalization): Likewise.
(test_deletable_global): Likewise.
(test_inheritance): Likewise.
(test_chain_next): Likewise.
(test_user_struct): Likewise.
(test_tree_marking): Likewise.
* selftest-run-tests.c (selftest::run_tests): Call
selftest::forcibly_ggc_collect at the end of the selftests.
* selftest.h (selftest::forcibly_ggc_collect): New decl.

Seems reasonable and doesn't depend on earlier patches, right?

Assuming that's correct it seems fine for the trunk whenever you want to 
install it.


jeff



Re: [PATCH 0/9] RFC: selftests based on RTL dumps

2016-09-16 Thread Jeff Law

On 09/13/2016 05:15 AM, Bernd Schmidt wrote:


Note that the loader now resets INSN_CODE to -1, regardless of the
actual code passed in, to force re-recognition, and to isolate the
dumps somewhat from changes to the .md files.  So although the above
says insn 641 and 642 (for some snapshot of the aarch64 md file), it
gets reset to -1.


Best to find out a way to avoid including it in the strings then, to
avoid confusion.
We should also twiddle how we represent registers in the dumps. 
Identifying hard regs by name (so we can map back to a hard reg if the 
hard regs change), identifying pseudos by number that isn't affected if 
the hard register set changes ie, p0, p1, p2, p3 where the number is 
REGNO (x) - FIRST_PSEUDO_REGISTER. identifying the virtual registers, etc.


The key being rather than put a ton of smarts/hacks in a reader, we 
should work to have the RTL writer give us something more useful.  That 
may mean simple changes to the output, or some conditional changes (like 
not emitting the INSN_CODE or its name).


jeff


Re: [PATCH 0/9] RFC: selftests based on RTL dumps

2016-09-16 Thread Jeff Law

On 09/14/2016 02:37 AM, Richard Biener wrote:


Note that while the "snippets" may largely work (not sure how many
you tried to come up with) I see the issue that a lot of RTL "unit
tests" would need some trees set up, like to properly form MEM_EXPRs
or REG_DECL or even SYMBOL_REFs.  So I fear that the scope of
unit-tests we can implement with the proposed scheme is very limited
(you may also need other stuff setup, like alias analysis or parts of
IRA or cost analysis parts).  So I agree a separate testing backend
is a good way to make unit-testing more stable on the target side we
also need a way to provide input on some of the global state that is
currently set up by frontends.
Agreed across the board.  I think we're still early in the learning 
phase on this stuff.   I shudder when I think about the amount of state 
that we depend on, but which is not represented in the RTL dumps.


But I do think there are some things we can test for in an RTL self 
testing framework and that having one would be a significant step forward.


So I think we have two big questions.

First, does David's work have value as a way to directly test pieces of 
the RTL pipeline without having to generate all the RTL bits by hand.  I 
think the answer is yes.


Second, will David's work help identify internal state that is not 
expressed in the RTL dumps or poor modularity (ie, cases like trees 
embedded within the RTL structures).  I think the answer to this is yes 
as well.


Third, is a framework like Bernd's useful as well and can it be mated 
with David's work.  And I think the answer is yes & yes as well with the 
result being more useful than either Bernd or David's work in isolation.




But my biggest worry is with putting unit-tests into cc1 itself --
even more so with RTL unit tests of this kind than with all the other
ones we have.  We'll quickly have 99% of a source file comprised of
RTL unit tests rather than source (and cc1 object size as well).
Hardly something we want to have (not even mentioning bootstrap time
issues).
I can live revisiting this -- I always expected we would after we 
started building out the framework.




Yes, putting the unit-tests in source files makes us not require
exporting an interface to the parts we are testing.  But that's about
the only advantage I can see.  You didn't show that it isn't possible
to put the small test you were writing into a RTL-frontendish test
which starts compiling the function with the test with the pass you
are about to unit-test.
The other advantage is tests which use the internal APIs are easy to 
identify/fix when an internal API is changed.


Jeff



Re: [C++ PATCH] Fix ICE in cp/error.c (PR c++/77482)

2016-09-16 Thread Jason Merrill
OK.

On Mon, Sep 5, 2016 at 1:17 PM, Jakub Jelinek  wrote:
> Hi!
>
> The recent concept changes that were also backported to 6.2 break
> diagnostics e.g. on the following testcase, sometimes it ICEs during
> reporting of the first error, so isn't just a normal low prio error
> recovery.  The thing is that on the testcase the VAR_DECL has no
> DECL_LANG_SPECIFIC, DECL_DECLARED_CONSTEXPR_P is a lang flag rather than
> lang_specific field.  I believe in various places in cp/error.c we check
> for *_LANG_SPECIFIC similarly.  In addition, the hunk had formatting issues.
>
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk/6.3?
>
> 2016-09-05  Jakub Jelinek  
>
> PR c++/77482
> * error.c (dump_simple_decl): Only check DECL_DECLARED_CONCEPT_P
> if DECL_LANG_SPECIFIC is non-NULL.  Fix up formatting.
>
> * g++.dg/cpp0x/constexpr-77482.C: New test.
>
> --- gcc/cp/error.c.jj   2016-09-02 18:17:32.0 +0200
> +++ gcc/cp/error.c  2016-09-05 14:01:43.091770870 +0200
> @@ -959,14 +959,13 @@ dump_simple_decl (cxx_pretty_printer *pp
>  {
>if (flags & TFF_DECL_SPECIFIERS)
>  {
> -  if (VAR_P (t)
> - && DECL_DECLARED_CONSTEXPR_P (t))
> -{
> -  if (DECL_DECLARED_CONCEPT_P (t))
> -pp_cxx_ws_string (pp, "concept");
> -  else
> -   pp_cxx_ws_string (pp, "constexpr");
> -}
> +  if (VAR_P (t) && DECL_DECLARED_CONSTEXPR_P (t))
> +{
> + if (DECL_LANG_SPECIFIC (t) && DECL_DECLARED_CONCEPT_P (t))
> +   pp_cxx_ws_string (pp, "concept");
> + else
> +   pp_cxx_ws_string (pp, "constexpr");
> +   }
>dump_type_prefix (pp, type, flags & ~TFF_UNQUALIFIED_NAME);
>pp_maybe_space (pp);
>  }
> --- gcc/testsuite/g++.dg/cpp0x/constexpr-77482.C.jj 2016-09-05 
> 13:58:59.609821176 +0200
> +++ gcc/testsuite/g++.dg/cpp0x/constexpr-77482.C2016-09-05 
> 13:58:18.0 +0200
> @@ -0,0 +1,6 @@
> +// PR c++/77482
> +// { dg-do compile { target c++11 } }
> +
> +constexpr auto x;  // { dg-error "declaration\[^\n\r]*has no 
> initializer" }
> +extern struct S s;
> +constexpr auto y = s;  // { dg-error "has incomplete type" }
>
> Jakub


Re: [C++ PATCH] Fix constexpr switch handling (PR c++/77467)

2016-09-16 Thread Jason Merrill
On Mon, Sep 5, 2016 at 1:11 PM, Jakub Jelinek  wrote:
> +  /* If body is a statement other than STATEMENT_LIST or BIND_EXPR,
> + it should be skipped.  E.g. switch (a) b = a;  */
> +  if (TREE_CODE (body) == STATEMENT_LIST
> +  || TREE_CODE (body) == BIND_EXPR)

I'm nervous about this optimization for useless code breaking other
things that might (one day) wrap a case label; I think I'd prefer to
drop the condition.

OK with that change, for trunk and 6.

Jason


Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-16 Thread Jeff Law

On 09/12/2016 10:19 AM, Tamar Christina wrote:

Hi All,

This patch adds an optimized route to the fpclassify builtin
for floating point numbers which are similar to IEEE-754 in format.

The goal is to make it faster by:
1. Trying to determine the most common case first
   (e.g. the float is a Normal number) and then the
   rest. The amount of code generated at -O2 are
   about the same ± 1 instruction, but the code
   is much better.
2. Using integer operation in the optimized path.

At a high level, the optimized path uses integer operations
to perform the following:

  if (exponent bits aren't all set or unset)
 return Normal;
  else if (no bits are set on the number after masking out
   sign bits then)
 return Zero;
  else if (exponent has no bits set)
 return Subnormal;
  else if (mantissa has no bits set)
 return Infinite;
  else
 return NaN;

In case the optimization can't be applied the old
implementation is used as a fall-back.

A limitation with this new approach is that the exponent
of the floating point has to fit in 31 bits and the floating
point has to have an IEEE like format and values for NaN and INF
(e.g. for NaN and INF all bits of the exp must be set).

To determine this IEEE likeness a new boolean was added to real_format.

Regression tests ran on aarch64-none-linux and arm-none-linux-gnueabi
and no regression. x86 uses it's own implementation other than
the fpclassify builtin.

As an example, Aarch64 now generates for classification of doubles:

f:
fmovx1, d0
mov w0, 7
sbfxx2, x1, 52, 11
add w3, w2, 1
tst w3, 0x07FE
bne .L1
mov w0, 13
tst x1, 0x7fff
beq .L1
mov w0, 11
tbz x2, 0, .L1
tst x1, 0xf
mov w0, 3
mov w1, 5
cselw0, w0, w1, ne

.L1:
ret

No new tests as there are existing tests to test functionality.
glibc benchmarks ran against the builtin and this shows a 31.3%
performance gain.

Ok for trunk?

Thanks,
Tamar

PS. I don't have commit rights so if OK can someone apply the patch for me.

gcc/
2016-08-25  Tamar Christina  
Wilco Dijkstra  

* gcc/builtins.c (fold_builtin_fpclassify): Added optimized version.
* gcc/real.h (real_format): Added is_ieee_compatible field.
* gcc/real.c (ieee_single_format): Set is_ieee_compatible flag.
(mips_single_format): Likewise.
(motorola_single_format): Likewise.
(spu_single_format): Likewise.
(ieee_double_format): Likewise.
(mips_double_format): Likewise.
(motorola_double_format): Likewise.
(ieee_extended_motorola_format): Likewise.
(ieee_extended_intel_128_format): Likewise.
(ieee_extended_intel_96_round_53_format): Likewise.
(ibm_extended_format): Likewise.
(mips_extended_format): Likewise.
(ieee_quad_format): Likewise.
(mips_quad_format): Likewise.
(vax_f_format): Likewise.
(vax_d_format): Likewise.
(vax_g_format): Likewise.
(decimal_single_format): Likewise.
(decimal_quad_format): Likewise.
(iee_half_format): Likewise.
(mips_single_format): Likewise.
(arm_half_format): Likewise.
(real_internal_format): Likewise.


gcc-public.patch


diff --git a/gcc/builtins.c b/gcc/builtins.c
index 
1073e35b17b1bc1f6974c71c940bd9d82bbbfc0f..58bf129f9a0228659fd3b976d38d021d1d5bd6bb
 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -7947,10 +7947,8 @@ static tree
 fold_builtin_fpclassify (location_t loc, tree *args, int nargs)
 {
   tree fp_nan, fp_infinite, fp_normal, fp_subnormal, fp_zero,
-arg, type, res, tmp;
+arg, type, res;
   machine_mode mode;
-  REAL_VALUE_TYPE r;
-  char buf[128];

   /* Verify the required arguments in the original call.  */
   if (nargs != 6
@@ -7970,14 +7968,143 @@ fold_builtin_fpclassify (location_t loc, tree *args, 
int nargs)
   arg = args[5];
   type = TREE_TYPE (arg);
   mode = TYPE_MODE (type);
-  arg = builtin_save_expr (fold_build1_loc (loc, ABS_EXPR, type, arg));
+  const real_format *format = REAL_MODE_FORMAT (mode);
+
+  /*
+  For IEEE 754 types:
+
+  fpclassify (x) ->
+   !((exp + 1) & (exp_mask & ~1)) // exponent bits not all set or unset
+? (x & sign_mask == 0 ? FP_ZERO :
+  (exp & exp_mask == exp_mask
+ ? (mantisa == 0 ? FP_INFINITE : FP_NAN) :
+ FP_SUBNORMAL)):
+   FP_NORMAL.
+
+  Otherwise
+
+  fpclassify (x) ->
+   isnan (x) ? FP_NAN :
+   (fabs (x) == Inf ? FP_INFINITE :
+  (fabs (x) >= DBL_MIN ? FP_NORMAL :
+(x == 0 ? FP_ZERO : FP_SUBNORMAL))).
+  */
+
+  /* Check if the number that is being classified is close enough to IEEE 754
+ format to be able to go in the early exit code.  */
+  if (format->is_binary_ieee_compatible)
+

Re: [PATCH] Fix abi_tag23* test failure (PR c++/77379)

2016-09-16 Thread Jason Merrill
OK.

On Mon, Aug 29, 2016 at 4:10 PM, Jakub Jelinek  wrote:
> On Mon, Aug 29, 2016 at 12:42:28PM -0400, Jason Merrill wrote:
>> Another missing ABI tag, sigh.
>>
>> Tested x86_64-pc-linux-gnu, applying to trunk.
>
>> commit 1337a943a2d3926537b63d6e1f0d7f46ef10a06d
>> Author: Jason Merrill 
>> Date:   Fri Aug 26 15:12:52 2016 -0400
>>
>>   PR c++/77379 - ABI tag on thunk
>>
>>   * mangle.c (maybe_check_abi_tags): Add version parm, handle thunks.
>>   (mangle_thunk): Add thunk parameter.
>>   * method.c (finish_thunk): Pass it.
>>   * cp-tree.h: Declare it.
>
>> --- /dev/null
>> +++ b/gcc/testsuite/g++.dg/abi/abi-tag23.C
>> +// { dg-final { scan-assembler "_ZThn16_N7Derived7get_fooB3barEv" } }
>
> This unfortunately fails e.g. on i686-linux, because the symbol is
> _ZThn8_N7Derived7get_fooB3barEv instead of
> _ZThn16_N7Derived7get_fooB3barEv
>
> The following patch accepts any negative offsets.  Tested on x86_64-linux
> and i686-linux, ok for trunk?
>
> 2016-08-29  Jakub Jelinek  
>
> PR c++/77379
> * g++.dg/abi/abi-tag23.C: Adjust scan-assembler regex for differing
> thunk offsets.
> * g++.dg/abi/abi-tag23a.C: Likewise.
>
> --- gcc/g++.dg/abi/abi-tag23.C.jj   2016-08-29 19:34:12.0 +0200
> +++ gcc/g++.dg/abi/abi-tag23.C  2016-08-29 22:04:16.328873328 +0200
> @@ -32,4 +32,4 @@ int main()
>Final().get_foo();
>  }
>
> -// { dg-final { scan-assembler "_ZThn16_N7Derived7get_fooB3barEv" } }
> +// { dg-final { scan-assembler "_ZThn\[0-9]+_N7Derived7get_fooB3barEv" } }
> --- gcc/g++.dg/abi/abi-tag23a.C.jj  2016-08-29 19:34:12.0 +0200
> +++ gcc/g++.dg/abi/abi-tag23a.C 2016-08-29 22:04:55.053398520 +0200
> @@ -32,4 +32,4 @@ int main()
>Final().get_foo();
>  }
>
> -// { dg-final { scan-assembler "_ZThn16_N7Derived7get_fooEv" } }
> +// { dg-final { scan-assembler "_ZThn\[0-9]+_N7Derived7get_fooEv" } }
>
>
> Jakub


Re: [C++ PATCH] Fix ICE with PARM_DECL with incomplete type (PR c++/77338)

2016-09-16 Thread Jason Merrill
OK.

On Mon, Aug 29, 2016 at 3:46 PM, Jakub Jelinek  wrote:
> Hi!
>
> In r239289 you've done something similar for the VAR_DECL etc. case, but
> for PARM_DECL we can still call is_really_empty_class on incomplete types
> and ICE because TYPE_BINFO is NULL.
>
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?
>
> 2016-08-29  Jakub Jelinek  
>
> PR c++/77338
> * constexpr.c (cxx_eval_constant_expression) : Only
> call is_really_empty_class on complete types.
>
> * g++.dg/cpp0x/decltype-77338.C: New test.
>
> --- gcc/cp/constexpr.c.jj   2016-08-12 17:33:42.0 +0200
> +++ gcc/cp/constexpr.c  2016-08-29 14:26:50.342319322 +0200
> @@ -3747,7 +3747,8 @@ cxx_eval_constant_expression (const cons
> /* Defer in case this is only used for its type.  */;
>else if (TREE_CODE (TREE_TYPE (t)) == REFERENCE_TYPE)
> /* Defer, there's no lvalue->rvalue conversion.  */;
> -  else if (is_really_empty_class (TREE_TYPE (t)))
> +  else if (COMPLETE_TYPE_P (TREE_TYPE (t))
> +  && is_really_empty_class (TREE_TYPE (t)))
> {
>   /* If the class is empty, we aren't actually loading anything.  */
>   r = build_constructor (TREE_TYPE (t), NULL);
> --- gcc/testsuite/g++.dg/cpp0x/decltype-77338.C.jj  2016-08-29 
> 14:42:31.974306247 +0200
> +++ gcc/testsuite/g++.dg/cpp0x/decltype-77338.C 2016-08-29 14:41:12.0 
> +0200
> @@ -0,0 +1,7 @@
> +// PR c++/77338
> +// { dg-do compile { target c++11 } }
> +
> +struct S;
> +
> +template 
> +auto f (S s) -> decltype (s (s));  // { dg-error "no match for call to" }
>
> Jakub


Re: [C++ PATCH] Propagate CLASSTYPE_HAS_MUTABLE from bases to derived classes (PR c++/77375)

2016-09-16 Thread Jason Merrill
OK.

On Mon, Aug 29, 2016 at 3:37 PM, Jakub Jelinek  wrote:
> Hi!
>
> The following testcase fails (foo is allocated in .rodata and modified)
> because while we clear TREE_READONLY for classes with mutable members, we
> don't do that if they only have bases with mutable members.
>
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux.  Ok for
> trunk?  What about release branches?
>
> 2016-08-29  Jakub Jelinek  
>
> PR c++/77375
> * class.c (check_bases): Set CLASSTYPE_HAS_MUTABLE if any 
> TYPE_HAS_MUTABLE_P
> for any bases.
>
> * g++.dg/cpp0x/mutable1.C: New test.
>
> --- gcc/cp/class.c.jj   2016-08-12 17:33:42.0 +0200
> +++ gcc/cp/class.c  2016-08-29 11:54:10.387061502 +0200
> @@ -1796,6 +1796,8 @@ check_bases (tree t,
>SET_CLASSTYPE_REF_FIELDS_NEED_INIT
> (t, CLASSTYPE_REF_FIELDS_NEED_INIT (t)
>  | CLASSTYPE_REF_FIELDS_NEED_INIT (basetype));
> +  if (TYPE_HAS_MUTABLE_P (basetype))
> +   CLASSTYPE_HAS_MUTABLE (t) = 1;
>
>/*  A standard-layout class is a class that:
>   ...
> --- gcc/testsuite/g++.dg/cpp0x/mutable1.C.jj2016-08-29 12:05:13.700212552 
> +0200
> +++ gcc/testsuite/g++.dg/cpp0x/mutable1.C   2016-08-29 12:03:51.0 
> +0200
> @@ -0,0 +1,12 @@
> +// PR c++/77375
> +// { dg-do run { target c++11 } }
> +
> +struct Base { mutable int i; };
> +struct Derived : Base {};
> +const Derived foo{};
> +
> +int
> +main ()
> +{
> +  foo.i = 42;
> +}
>
> Jakub


Re: [PATCH 1/3] Put a TARGET_LRA_P into every target

2016-09-16 Thread Mike Stump
On Sep 16, 2016, at 6:40 AM, Segher Boessenkool  
wrote:
> 
> Does just "The default version of this target hook returns true." sound
> better?  I.e. delete "always".

That is fine.



[Patch, reload, tentative, PR 71627] Tweak conditions in find_valid_class_1

2016-09-16 Thread Senthil Kumar Selvaraj
Hi,

  The below patch fixes what I think are a couple of problems with
  reload.c:find_valid_class_1.

  First, even if no regno is in_hard_reg_set_p, it goes ahead and
  considers rclass as valid. bad is set only if a regno is in the reg
  class *and* HARD_REGNO_MODE_OK is false - if both are false, bad isn't
  set and the reload gets a wrong rclass. If that happens to be the best
  one, this eventually leads to find_reg running out of registers to
  spill, because the chosen rclass won't have enough regs.

  Second, it expects every regno in rclass to be valid for mode i.e., if
  any regno fails HARD_REGNO_MODE_OK, it rejects the rclass. The
  comments in the original commit for find_valid_class_1 say atleast one
  regno is ok. This was updated to say "class which contains only
  registers" when in_hard_reg_set_p was introduced in place of just
  TEST_HARD_REG_BIT.

  Is it meant to really test all regs? For the avr target, all regs are
  8 bits wide, and HARD_REGNO_MODE_OK returns false for odd regnos if
  mode != QImode. With the current code, it ignores a bunch of otherwise
  perfectly legal reg classes.

  To fix the first problem, the patch adds regno_in_rclass_mode to track
  whether there's atleast one regno in hard_reg_set_p. To fix the second
  problem, the patch sets bad initially, and resets it if
  HARD_REGNO_MODE_OK succeeded for a regno, thus breaking the loop.

  Of course, if both my points above are valid, we can do away with
  regno_in_rclass_mode, just bad would do.

  Does this make sense? I ran a reg test for the avr target with a
  slightly older version of this patch, it did not show any regressions.
  If this is the right fix, I'll make sure to run reg tests on x86_64
  after backporting to a gcc version where that target used reload.

Regards
Senthil


Index: reload.c
===
--- reload.c(revision 240185)
+++ reload.c(working copy)
@@ -714,17 +714,22 @@
 
   for (rclass = 1; rclass < N_REG_CLASSES; rclass++)
 {
-  int bad = 0;
-  for (regno = 0; regno < FIRST_PSEUDO_REGISTER && !bad; regno++)
-   {
- if (in_hard_reg_set_p (reg_class_contents[rclass], mode, regno)
- && !HARD_REGNO_MODE_OK (regno, mode))
-   bad = 1;
-   }
-  
-  if (bad)
-   continue;
+  int bad = 1;
+  int regno_in_rclass_mode = 0;
 
+  for (regno = 0; regno < FIRST_PSEUDO_REGISTER && bad; regno++)
+{
+  if (in_hard_reg_set_p (reg_class_contents[rclass], mode, regno))
+{
+  regno_in_rclass_mode = 1;
+  if (HARD_REGNO_MODE_OK (regno, mode))
+bad = 0;
+}
+}
+
+  if (bad || !regno_in_rclass_mode)
+continue;
+
   cost = register_move_cost (outer, (enum reg_class) rclass, dest_class);
 
   if ((reg_class_size[rclass] > best_size


[PATCH] Fix cgraph_node::rtl_info (PR target/77587)

2016-09-16 Thread Jakub Jelinek
Hi!

On the following testcase the weak alias can be overridden, so we shouldn't
assume because we see bar to be alias to foo and have foo definition that
call to bar will bind to the current TU's foo.
cgraph_node::rtl_info calls ultimate_alias_target without checking if it can
be overridden or not.
The following patch fixes that, the decl != current_function_decl in there
is there so that for the current function it always returns non-NULL,
otherwise we ICE during bootstrap, and the node->decl != current_function_decl
is there e.g. for the case where we'd call an alias ultimately pointing to
the current function (TREE_ASM_WRITTEN would still not be set in that case).

The patch also removes extra calls to ultimate_alias_target, the function
called it up to 4 times, while only one is enough.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-09-16  Jakub Jelinek  
Jan Hubicka  

PR target/77587
* cgraph.c (cgraph_node::rtl_info): Pass  to
ultimate_alias_target call, return NULL if avail < AVAIL_AVAILABLE.
Call ultimate_alias_target just once, not up to 4 times.

* gcc.dg/pr77587.c: New test.
* gcc.dg/pr77587a.c: New file.

--- gcc/cgraph.c.jj 2016-08-06 12:11:49.0 +0200
+++ gcc/cgraph.c2016-09-16 14:19:31.553999007 +0200
@@ -1955,14 +1955,17 @@ cgraph_node::rtl_info (tree decl)
   cgraph_node *node = get (decl);
   if (!node)
 return NULL;
-  node = node->ultimate_alias_target ();
-  if (node->decl != current_function_decl
-  && !TREE_ASM_WRITTEN (node->decl))
+  enum availability avail;
+  node = node->ultimate_alias_target ();
+  if (decl != current_function_decl
+  && (avail < AVAIL_AVAILABLE
+ || (node->decl != current_function_decl
+ && !TREE_ASM_WRITTEN (node->decl
 return NULL;
-  /* Allocate if it doesnt exist.  */
-  if (node->ultimate_alias_target ()->rtl == NULL)
-node->ultimate_alias_target ()->rtl = ggc_cleared_alloc 
();
-  return node->ultimate_alias_target ()->rtl;
+  /* Allocate if it doesn't exist.  */
+  if (node->rtl == NULL)
+node->rtl = ggc_cleared_alloc ();
+  return node->rtl;
 }
 
 /* Return a string describing the failure REASON.  */
--- gcc/testsuite/gcc.dg/pr77587.c.jj   2016-09-15 14:06:09.449901026 +0200
+++ gcc/testsuite/gcc.dg/pr77587.c  2016-09-15 14:18:53.615467361 +0200
@@ -0,0 +1,14 @@
+/* PR target/77587 */
+/* { dg-do run } */
+/* { dg-require-weak-override "" } */
+/* { dg-additional-sources "pr77587a.c" } */
+
+void
+bar (long x, long y, long z)
+{
+  struct __attribute__((aligned (16))) S { long a, b, c, d; } s;
+  char *p = (char *) 
+  __asm ("" : "+r" (p));
+  if (((__UINTPTR_TYPE__) p) & 15)
+__builtin_abort ();
+}
--- gcc/testsuite/gcc.dg/pr77587a.c.jj  2016-09-15 14:05:58.873031952 +0200
+++ gcc/testsuite/gcc.dg/pr77587a.c 2016-09-15 14:16:57.242903986 +0200
@@ -0,0 +1,23 @@
+/* PR target/77587 */
+/* { dg-do compile } */
+/* { dg-require-weak-override "" } */
+
+void
+foo (long x, long y, long z)
+{
+}
+
+void bar (long x, long y, long z) __attribute__ ((weak, alias ("foo")));
+
+void
+baz (long x, long y, long z)
+{
+  bar (0, 0, 0);
+}
+
+int
+main ()
+{
+  baz (0, 0, 0);
+  return 0;
+}

Jakub


Re: [PATCH] Partially improve scalability of the unwinder (PR libgcc/71744)

2016-09-16 Thread Torvald Riegel
On Thu, 2016-09-15 at 06:05 -0700, Ian Lance Taylor wrote:
> Jakub Jelinek  writes:
> 
> > 2016-09-15  Jakub Jelinek  
> >
> > PR libgcc/71744
> > * unwind-dw2-fde.c (ATOMIC_FDE_FAST_PATH): Define if __register_frame*
> > is not the primary registry and atomics are available.
> > (any_objects_registered): New variable.
> > (__register_frame_info_bases, __register_frame_info_table_bases):
> > Atomically store 1 to any_objects_registered after registering first
> > unwind info.
> > (_Unwind_Find_FDE): Return early if any_objects_registered is 0.
> 
> This is OK.

In glibc, we have a rule that we add sufficient code comments for glibc.
I'm not in the position to set rules for GCC, but I think this would
help here too.  Trying to explain in comments why a certain memory order
is used would have at least brought up the issue you mention below.

> > +#ifdef ATOMIC_FDE_FAST_PATH
> > +  /* For targets where unwind info is usually not registered through these
> > + APIs anymore, avoid taking a global lock.  */
> > +  if (__builtin_expect (!__atomic_load_n (_objects_registered,
> > + __ATOMIC_ACQUIRE), 1))
> > +return NULL;
> > +#endif
> > +
> >init_object_mutex_once ();
> >__gthread_mutex_lock (_mutex);
> 
> I doubt it matters, but I don't think you need to use __ATOMIC_ACQUIRE
> in the atomic_load_n.  You could use __ATOMIC_RELAXED.  Acquiring the
> mutex is going to enforce cross-thread sequential consistency anyhow.

But then the release MO wouldn't be required either.  The
__gthread_mutex_lock call has no synchronizes-with relation with the
release MO stores the patch adds, if that was what you were assuming.

Note that in the C11 / C++11 memory model, lock acquisitions are not
considered to automatically also be acquire MO fences.  This may be the
case on many archs, but it's not something the memory model guarantees.




Re: [PATCH] - improve sprintf buffer overflow detection (middle-end/49905)

2016-09-16 Thread David Malcolm
On Sun, 2016-09-11 at 20:03 -0600, Martin Sebor wrote:
> On 09/08/2016 04:10 PM, Joseph Myers wrote:
> > On Thu, 8 Sep 2016, Martin Sebor wrote:
> > 
> > > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> > > index da133a4..4607495 100644
> > > --- a/gcc/doc/tm.texi.in
> > > +++ b/gcc/doc/tm.texi.in
> > > @@ -4081,6 +4081,13 @@ In either case, it remains possible to
> > > select code-generation for the alternate
> > >   scheme, by means of compiler command line switches.
> > >   @end defmac
> > > 
> > > +@deftypefn {Target Hook} {const char *}
> > > TARGET_LIBC_PRINTF_POINTER_FORMAT (tree, const char **@var{flags}
> > > )
> > > +A hook to determine the target @code{printf} implementation
> > > format string
> > > +that the most closely corresponds to the @code{%p} format
> > > directive.
> > > +The object pointed to by the @var{flags} is set to a string
> > > consisting
> > > +of recognized format flags such as the @code{'#'} character.
> > > +@end deftypefn
> > 
> > No, the substance of hook documentation should go in target.def
> > with just
> > an @hook line in tm.texi.in leading to the documentation going in
> > tm.texi
> > automatically.
> > 
> > You appear to be defining a target macro masquerading as a hook. 
> >  Please
> > don't (new target macros should be avoided where possible); use a
> > proper
> > hook.  (Maybe the settings depending on OS rather than architecture
> > means
> > it needs to be one of those whose default is a manual setting in
> > target-def.h rather than automatically generated, but that should
> > be the
> > limit of deviation from the normal workings of hooks.)
> > 
> > > +  const char *pfmt = TARGET_LIBC_PRINTF_POINTER_FORMAT (arg,
> > > );
> > 
> > With a proper hook them you'd call
> > targetm.libc_printf_pointer_format.
> > 
> > > + inform (callloc,
> > > + (nbytes + exact == 1
> > > +  ? "format output %wu byte into a destination of
> > > size %wu"
> > > +  : "format output %wu bytes into a destination
> > > of size %wu"),
> > > + nbytes + exact, info.objsize);
> > 
> > You need to use G_() around both format strings in such a case;
> > xgettext
> > doesn't know how to extract them both.
> 
> Attached is revision 8 of the patch (hopefully) adjusted as
> requested above, and with a simple test with of the multiline
> diagnostic directives suggested by David.  This revision also
> enables the -fprintf-return-value option by default.  The libgomp
> test failures I was seeing in my earlier testing must have been
> caused by an older version of GMP or MPFR that I had inadvertently
> use (normally I use in-tree versions downloaded and expanded there
> by the download_prerequisites script but that time I forgot that
> step).
> 
> David, in the way of feedback, I spent hours debugging the simple
> multiline test I added.  It seems that DejaGnu is exquisitely
> sensitive to whitespaces in the multiline output.  I appreciate
> that spacing is essential to lining up the caret and the tildes
> with the text of the program but tests fail even when all the
> expected output is lined up right in the multiline directive but
> off by a space (or tab) with respect to the actual output.  That
> DejaGnu output doesn't make this very clear at all makes debugging
> these types of issues a pain.  I wonder if there's a better to do
> this.

Gah; I'm sorry this was such a pain to do.

I tend to just copy the stuff I want to quote directly from
Emacs's compilation buffer.

However a nit, which I think is why you had problems...


diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-4.c 
b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-4.c
new file mode 100644
index 000..a601ba4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-4.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-Wformat -Wformat-length=1 -fdiagnostics-show-caret 
-ftrack-macro-expansion=0" } */
+
+extern int sprintf (char*, const char*, ...);
+
+char dst [8];
+
+void test (void)
+{
+  sprintf (dst + 7, "%-s", "1");
+  /* { dg-warning "writing a terminating nul past the end of the destination" 
"" { target *-*-*-* } 10 }
+ { dg-message "format output 2 bytes into a destination of size 1" "" { 
target *-*-*-* } 10 }
+{ dg-begin-multiline-output "" }
+   sprintf (dst + 7, "%-s", "1");
+ ^
+   sprintf (dst + 7, "%-s", "1");
+   ^
+{ dg-end-multiline-output "" } */
+
+  sprintf (dst + 7, "%-s", "abcd");
+  /* { dg-warning ".%-s. directive writing 4 bytes into a region of size 1" "" 
{ target *-*-*-* } 20 }
+ { dg-message "format output 5 bytes into a destination of size 1" "" { 
target *-*-*-* } 20 }
+{ dg-begin-multiline-output "" }
+   sprintf (dst + 7, "%-s", "abcd");
+  ^~~   ~~
+   sprintf (dst + 7, "%-s", "abcd");
+   ^~~~
+{ dg-end-multiline-output "" } */
+}

You have two pairs of dg-begin/end-multiline-output, 

Re: [PATCH] accept flexible arrays in struct in unions (c++/71912 - [6/7 regression])

2016-09-16 Thread Jason Merrill

On 09/14/2016 01:03 PM, Martin Sebor wrote:

Attached is an updated patch that does away with the "overlapping"
warning and instead issues the same warnings as the C front end
(though with more context).

In struct flexmems_t I've retained the NEXT array even though only
the first element is used to diagnose problems.  The second element
is used by the find_flexarrays function to differentiate structs
with flexible array members in unions (which are valid) from those
in structs (which are not).

FWIW, I think this C restriction on structs is unnecessary and will
propose to remove it so that's valid to define a struct that contains
another struct (possibly recursively) with a flexible array member.
I.e., I think this should be made valid in C (and accepted without
the pedantic warning that GCC, and with this patch also G++, issues):


Agreed.


+  /* Is FLD a typedef for an anonymous struct?  */
+  bool anon_type_p
+   = (TREE_CODE (fld) == TYPE_DECL
+  && DECL_IMPLICIT_TYPEDEF_P (fld)
+  && anon_aggrname_p (DECL_NAME (fld)));


We talked earlier about handling typedefs in 
cp_parser_{simple,single}_declaration, so that we catch


typedef struct { int a[]; } B;

or at least adding a FIXME comment here explaining that looking at 
typedefs is a temporary hack.



+  /* Type of the member.  */
+  tree fldtype = TREE_CODE (fld) == FIELD_DECL ? TREE_TYPE (fld) : fld;


Why set "fldtype" to be a TYPE_DECL rather than its type?


+  /* Determine the type of the array element or object referenced
+by the member so that it can be checked for flexible array
+members if it hasn't been yet.  */
+  tree eltype = TREE_CODE (fld) == FIELD_DECL ? fldtype : TREE_TYPE (fld);


Given the above, this seems equivalent to

tree eltype = TREE_TYPE (fld);


+  if (RECORD_OR_UNION_TYPE_P (eltype))
+   {

...

+ if (fmem->array && !fmem->after[bool (pun)])
+   {
+ /* Once the member after the flexible array has been found
+we're done.  */
+ fmem->after[bool (pun)] = fld;
+ break;
+   }

...

  if (field_nonempty_p (fld))
{

...

  /* Remember the first non-static data member after the flexible
 array member, if one has been found, or the zero-length array
 if it has been found.  */
  if (fmem->array && !fmem->after[bool (pun)])
fmem->after[bool (pun)] = fld;
}


Can we do this in one place rather than two?


+ if (eltype == fldtype || TYPE_UNNAMED_P (eltype))


This is excluding arrays, so we don't diagnose, say,


struct A
{
  int i;
  int ar[];
};

struct B {
  A a[2];
};


Should we complain elsewhere about an array of a class with a flexible 
array member?



+ /* Does the field represent an anonymous struct?  */
+ bool anon_p = !DECL_NAME (fld) && ANON_AGGR_TYPE_P (eltype);


You don't need to check DECL_NAME; ANON_AGGR_TYPE_P by itself indicates 
that we're dealing with an anonymous struct/union.



   Similarly, PSTR is set to the outermost struct of which T is a member
   if one such struct exists, otherwise to NULL.  */

...

  find_flexarrays (eltype, fmem, false, pun,
   !pstr && TREE_CODE (t) == RECORD_TYPE ? fld : 
pstr);

...

+diagnose_invalid_flexarray (const flexmems_t *fmem)
+{
+  if (fmem->array && fmem->enclosing
+  && pedwarn (location_of (fmem->enclosing), OPT_Wpedantic,
+ TYPE_DOMAIN (TREE_TYPE (fmem->array))
+ ? G_("invalid use of %q#T with a zero-size array "
+  "in %q#D")
+ : G_("invalid use of %q#T with a flexible array member "
+  "in %q#T"),
+ DECL_CONTEXT (fmem->array),
+ DECL_CONTEXT (fmem->enclosing)))


PSTR is documented to be the outermost struct, but it (and thus 
fmem->enclosing) end up being a data member of that outermost struct, 
which is why you need to take its DECL_CONTEXT.  What's the advantage of 
doing that over passing t itself to the recursive call?



+  /* The context of the flexible array member.  Either the struct
+ in which it's declared or, for anonymous structs and unions,
+ the struct/union of which the array is effectively a member.  */
+  tree fmemctx = anon_context (fmem->array);
+  bool anon_p = fmemctx != DECL_CONTEXT (fmem->array);
+  if (!anon_p)
+fmemctx = t;


Why do you need to do something different here based on anon_p?  I don't 
see why that would affect whether we want to look through intermediate 
non-anonymous classes.



+  check_flexarrays (basetype, fmem, true);


Please decorate boolean literal arguments like this with the name of the 
parameter, e.g. /*base_p*/true.



+  bool maybe_anon_p = anon_aggrname_p (TYPE_IDENTIFIER (t));


Now we can use TYPE_UNNAMED_P.

Jason



Re: [PATCH][Libiberty] Support empty arguments in pex-win32

2016-09-16 Thread DJ Delorie

This is OK.  Thanks!


Re: RFA (libstdc++): PATCH to implement C++17 over-aligned new

2016-09-16 Thread Jonathan Wakely

On 16/09/16 13:12 +0100, Jonathan Wakely wrote:

On 16/09/16 11:56 +0100, Jonathan Wakely wrote:

On 16/09/16 11:37 +0200, Marc Glisse wrote:

Is the division (by a non-constant denominator) really necessary?


Probably not, but I've asked the committee for clarification what this
function should do when called with an invalid alignment.

Since align has to be a power of 2, x % align should be the same 
as x & (align - 1), for instance.


Thanks, if it's UB to call it with alignments that aren't a power of
two then we can do that.


I've committed the patch now, to fix the failures for Solaris. I'll
revisit it when I get clarification from the committee about invalid
alignment arguments.


I missed 18.6.2/1 which is clear that we don't have to support invalid
alignments passed to operator new, so I'm committing this.

Tested x86_64-linux.



commit 0a70fa595953ff1e6e46266d7f86bc6d7e3400a4
Author: Jonathan Wakely 
Date:   Fri Sep 16 17:36:44 2016 +0100

Replace modulus with mask operation in over-aligned new

2016-09-16  Jonathan Wakely  
	Marc Glisse  

	* libsupc++/new_opa.cc [_GLIBCXX_HAVE_ALIGNED_ALLOC]
	(operator new(size_t, align_val_t)): Replace modulus operator with
	mask.

diff --git a/libstdc++-v3/libsupc++/new_opa.cc b/libstdc++-v3/libsupc++/new_opa.cc
index 9c859c1..91e53a8 100644
--- a/libstdc++-v3/libsupc++/new_opa.cc
+++ b/libstdc++-v3/libsupc++/new_opa.cc
@@ -69,7 +69,7 @@ operator new (std::size_t sz, std::align_val_t al)
 
 #if _GLIBCXX_HAVE_ALIGNED_ALLOC
   /* C11: the value of size shall be an integral multiple of alignment.  */
-  if (std::size_t rem = sz % align)
+  if (std::size_t rem = sz & (align - 1))
 sz += align - rem;
 #endif
 


Re: [PATCH] - improve sprintf buffer overflow detection (middle-end/49905)

2016-09-16 Thread Jeff Law

On 09/11/2016 08:03 PM, Martin Sebor wrote:


Attached is revision 8 of the patch (hopefully) adjusted as
requested above, and with a simple test with of the multiline
diagnostic directives suggested by David.  This revision also
enables the -fprintf-return-value option by default.  The libgomp
test failures I was seeing in my earlier testing must have been
caused by an older version of GMP or MPFR that I had inadvertently
use (normally I use in-tree versions downloaded and expanded there
by the download_prerequisites script but that time I forgot that
step).
Ah, my question answered.  I started looking at the multiline stuff and 
didn't read the last sentence WRT libgomp.  Sorry for the noise in my 
prior message.





David, in the way of feedback, I spent hours debugging the simple
multiline test I added.  It seems that DejaGnu is exquisitely
sensitive to whitespaces in the multiline output.  I appreciate
that spacing is essential to lining up the caret and the tildes
with the text of the program but tests fail even when all the
expected output is lined up right in the multiline directive but
off by a space (or tab) with respect to the actual output.  That
DejaGnu output doesn't make this very clear at all makes debugging
these types of issues a pain.  I wonder if there's a better to do
this.
dejagnu as a whole is a pain and trying to get the right whitespace, 
regexps, escaping is an exercise in pure torture.  How you and David 
have managed to do any multiline tests is amazing.





Thanks
Martin

gcc-49905.diff


PR middle-end/49905 - Better sanity checking on sprintf src & dest to
produce warning for dodgy code

gcc/ChangeLog:
PR middle-end/49905
* Makefile.in (OBJS): Add gimple-ssa-sprintf.o.
* config/linux.h (TARGET_PRINTF_POINTER_FORMAT): Redefine.
* config/linux.c (gnu_libc_printf_pointer_format): New function.
* config/sol2.h (TARGET_PRINTF_POINTER_FORMAT): Same.
* config/sol2.c (solaris_printf_pointer_format): New function.
* doc/invoke.texi (-Wformat-length, -fprintf-return-value): New
options.
* doc/tm.texi.in (TARGET_PRINTF_POINTER_FORMAT): Document.
* doc/tm.texi: Regenerate.
* gimple-fold.h (get_range_strlen): New function.
(get_maxval_strlen): Declare existing function.
* gimple-fold.c (get_range_strlen): Add arguments and compute both
maximum and minimum.
 (get_range_strlen): Define overload.
(get_maxval_strlen): Adjust.
* gimple-ssa-sprintf.c: New file and pass.
* passes.def (pass_sprintf_length): Add new pass.
* targhooks.h (default_printf_pointer_format): Declare new function.
(gnu_libc_printf_pointer_format): Same.
(solaris_libc_printf_pointer_format): Same.
* targhooks.c (default_printf_pointer_format): Define new function.
* tree-pass.h (make_pass_sprintf_length): Declare new function.
* print-tree.c: Increase buffer size.

gcc/c-family/ChangeLog:
PR middle-end/49905
* c.opt: Add -Wformat-length and -fprintf-return-value.

gcc/testsuite/ChangeLog:
PR middle-end/49905
* gcc.dg/builtin-stringop-chk-1.c: Adjust.
* gcc.dg/tree-ssa/builtin-sprintf-warn-1.c: New test.
* gcc.dg/tree-ssa/builtin-sprintf-warn-2.c: New test.
* gcc.dg/tree-ssa/builtin-sprintf-warn-3.c: New test.
* gcc.dg/tree-ssa/builtin-sprintf-warn-4.c: New test.
* gcc.dg/tree-ssa/builtin-sprintf.c: New test.
* gcc.dg/tree-ssa/builtin-sprintf-2.c: New test.





 static bool
-get_maxval_strlen (tree arg, tree *length, bitmap *visited, int type)
+get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
+ bool fuzzy)
 {
   tree var, val;
   gimple *def_stmt;

+  /* The minimum and maximum length.  The MAXLEN pointer stays unchanged
+ but MINLEN may be cleared during the execution of the function.  */
+  tree *minlen = length;
+  tree* const maxlen = length + 1;
Check formatting here.  I'm not sure if the formatting standards 
explicitly state how to handle the "*" when there's a qualifier between 
the type and the object.  But please check.



+
+ if (warned && fmtres.argmin)
+   {
+ if (fmtres.argmin == fmtres.argmax)
+   inform (info.fmtloc, "directive argument %qE", fmtres.argmin);
+ else if (fmtres.bounded)
+   inform (info.fmtloc, "directive argument in the range [%E, %E]",
+   fmtres.argmin, fmtres.argmax);
+ else
+   inform (info.fmtloc,
+   "using the range [%qE, %qE] for directive argument",
+   fmtres.argmin, fmtres.argmax);

Don't you need G_ for these messages?

Can you please check the calls to fmtwarn which pass in the string as an 
argument and add G_ as needed?  I think the cases where the string is 
computed into a variable are all handled correctly, 

[Committed] Fix PR fortran/77612

2016-09-16 Thread Steve Kargl
I've committed the following patch.

2016-09-16  Steven G. Kargl  

PR fortran/77612
* decl.c (char_len_param_value): Check parent namespace for 
seen_implicit_none.


2016-09-16  Steven G. Kargl  

PR fortran/77612
* gfortran.dg/pr77612.f90: New test.


Index: gcc/fortran/decl.c
===
--- gcc/fortran/decl.c  (revision 240140)
+++ gcc/fortran/decl.c  (working copy)
@@ -920,9 +920,10 @@ char_len_param_value (gfc_expr **expr, b
 
   t = gfc_reduce_init_expr (e);
 
-  if (!t && (e->ts.type == BT_UNKNOWN
-&& e->symtree->n.sym->attr.untyped == 1
-&& e->symtree->n.sym->ns->seen_implicit_none == 1))
+  if (!t && e->ts.type == BT_UNKNOWN
+ && e->symtree->n.sym->attr.untyped == 1
+ && (e->symtree->n.sym->ns->seen_implicit_none == 1
+ || e->symtree->n.sym->ns->parent->seen_implicit_none == 1))
{
  gfc_free_expr (e);
  goto syntax;
Index: gcc/testsuite/gfortran.dg/pr77612.f90
===
--- gcc/testsuite/gfortran.dg/pr77612.f90   (nonexistent)
+++ gcc/testsuite/gfortran.dg/pr77612.f90   (working copy)
@@ -0,0 +1,13 @@
+! { dg-do compile }
+
+program bad_len
+
+  implicit none
+
+contains
+
+  subroutine sub
+character(len = ICE) :: line ! { dg-error "INTEGER expression expected" }
+  end subroutine
+
+end program
-- 
Steve


Re: [PR lto/77458] Avoid ICE in offloading with differing _FloatN, _FloatNx types (was: Advice sought for debugging a lto1 ICE (was: Implement C _FloatN, _FloatNx types [version 6]))

2016-09-16 Thread Joseph Myers
On Fri, 16 Sep 2016, Thomas Schwinge wrote:

> That's what I was afraid of: for example, I can't tell if it holds for
> all GCC configurations (back ends), that complex types' component types
> will always match one of the already existing global trees?  (I can

Well, a component type could certainly match a target-specific type 
instead (e.g. __ibm128 on powerpc, which if it's not long double won't be 
any of the other types either).  That's a type registered with 
lang_hooks.types.register_builtin_type, not one of the global trees.  
(You can't write _Complex __ibm128, but can get such a type with _Complex 
float __attribute__ ((__mode__ (__IC__))).  Or similarly, with ARM __fp16, 
the complex type _Complex float __attribute__ ((__mode__ (__HC__))).)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PR lto/77458] Avoid ICE in offloading with differing _FloatN, _FloatNx types (was: Advice sought for debugging a lto1 ICE (was: Implement C _FloatN, _FloatNx types [version 6]))

2016-09-16 Thread Joseph Myers
On Fri, 16 Sep 2016, Richard Biener wrote:

> Humm ... do we anywhere compare to those global trees by pointer equivalence?
> If so then it breaks LTO support for those types.

The C front end compares main variants to those types for handling usual 
arithmetic conversions (and more generally for type compatibility), but 
that's not relevant to LTO.  I don't think there should be such 
comparisons outside the front ends.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] - improve sprintf buffer overflow detection (middle-end/49905)

2016-09-16 Thread Jeff Law

On 09/08/2016 01:03 PM, Martin Sebor wrote:

Attached is another update to the patch to address the last round
of comments and suggestions, most notably to:

 *  implement the checking of the implementation limit of 4,095 on
the output of a single directive to allow for the Glibc failure
due to ENOMEM (the patch issues a warning and disables the
optimization when this happens)
 *  implement checking for exceeding INT_MAX bytes (warn and disable
optimization)
 *  call set_range_info when the return value optimization is not
possible
 *  remove code to work around tree-optimization/71831 (now on
trunk)

The -fprintf-return value optimization is still disabled.  GCC
successfully bootstraps with it and most tests pass but there's
a failure in the Fortran libgomp tests that I am yet to figure
out.
I'm just now getting back to this (sorry for the long delay).  I see 
there's another version after this one.  Did the version from 9/11/2016 
address the Fortran libgomp test issue?


jeff



Re: [PATCH] unbreak libsanitizer build on sparc-linux (PR 67899)

2016-09-16 Thread Jeff Law

On 09/12/2016 11:04 AM, Mikael Pettersson wrote:

PR sanitizer/67899 is a bootstrap failure on sparc-linux, caused by a
compilation error in libsanitizer.

The root cause is that `struct sigaction' has changed layout in glibc
twice recently, first an unintended ABI change in glibc-2.20, and then
the correction in glibc-2.22 (backported to the .20 and .21 branches).
(See glibc bz#18694).  Around the time of the first change, libsanitizer
was changed to match, but it wasn't updated for the correction.  The end
result is an ABI mismatch and assertion errors during compilation of
libsanitizer.  (The sa_flags field is of the wrong size and at the wrong
offset.)

Fixed by adjusting the SPARC part of libsanitizer's sigaction struct
to match glibc (except for the broken .20 and .21 initial releases).

Tested w/o regressions on sparc-linux-gnu (post-2.20 glibc), x86_64-linux-gnu,
powerpc-linux-gnu, armv7l-linux-gnueabi, and m68k-linux-gnu.

Is this Ok for trunk and 5/6 branches?

(Note: I don't have commit rights so if this is approved I would need
help to get it applied.)

Thanks,

/Mikael


libsanitizer/

2016-09-12  Mikael Pettersson  

PR sanitizer/67899
* sanitizer_common/sanitizer_platform_limits_posix.h
(__sanitizer_sigaction): Adjust for sparc targets.
I believe this is part of the upstream sanitizer projects at Google.  It 
should be submitted there first and pulled in via a merge.



Jeff


Re: [PATCH] avoid non-printable characters in diagnostics (c/77620, c/77521)

2016-09-16 Thread Jeff Law

On 09/08/2016 08:19 PM, Martin Sebor wrote:

While working on the -Wformat-length pass I noticed that in some
diagnostics that make use of the %qc and %qs directives GCC prints
non-printable characters raw.  For example, it might print a newline,
corrupting the diagnostic stream (bug 77521).

Some other diagnostics that try to avoid this problem by using
a directive such as %x when the character is not printable might
use the sign-extended value of the character, printing a very
large hexadecimal value (bug 77520).

The attached patch changes the pretty printer to detect non-printable
characters in %qc and %qs directives (but not %c or %s) and print
those in hexadecimal (via "\\x%02x").

Martin

PS I used hexadecimal based on what c-format.c does but now that
I checked more carefully how %qE formats string literals I see it
uses octal.  I think hexadecimal is preferable because it avoids
ambiguity but I'm open to changing it to octal if there's a strong
preference for it.  Incidentally, %qE too suffers from bug 77520
(see below).  The patch doesn't try to fix that.

$ cat z.C && gcc z.C
constexpr int i = "ABC\x7f_\x80XYZ";
z.C:1:19: error: invalid conversion from ‘const char*’ to ‘int’
[-fpermissive]
 constexpr int i = "ABC\x7f_\x80XYZ";
   ^
z.C:1:19: error: ‘(int)((const char*)"ABC\177_\3777600XYZ")’ is not
a constant expression

gcc-77520.diff


PR c/77520 - wrong value for extended ASCII characters in -Wformat message
PR c/77521 - %qc format directive should quote non-printable characters

gcc/c-family/ChangeLog:
2016-09-08  Martin Sebor  

PR c/77520
PR c/77521
* c-format.c (argument_parser::find_format_char_info): Use %qc
format directive unconditionally.

gcc/ChangeLog:
2016-09-08  Martin Sebor  

PR c/77520
PR c/77521
* pretty-print.c (pp_quoted_string): New function.
(pp_format): Call it for %c and %s directives.

gcc/testsuite/ChangeLog:
2016-09-08  Martin Sebor  

PR c/77520
PR c/77521
* gcc.dg/pr77520.c: New test.
* gcc.dg/pr77521.c: New test.
I was about to ack, the realized Joseph already did and you'd already 
installed the change :-)


Jeff



Re: [PATCH] avoid non-printable characters in diagnostics (c/77620, c/77521)

2016-09-16 Thread Jeff Law

On 09/09/2016 05:17 PM, Martin Sebor wrote:

On 09/09/2016 07:59 AM, Joseph Myers wrote:

On Thu, 8 Sep 2016, Martin Sebor wrote:


PS I used hexadecimal based on what c-format.c does but now that
I checked more carefully how %qE formats string literals I see it
uses octal.  I think hexadecimal is preferable because it avoids
ambiguity but I'm open to changing it to octal if there's a strong


I'm not clear what you mean about ambiguity.  In C strings, an octal
escape sequence has up to three characters, so if it has three characters
it's unambiguous, whereas a hex escape sequence can have any number of
characters, so if the unprintable character is followed by a valid hex
digit then in C you need to represent that as an escape (or use string
constant concatenation, etc.).  The patch doesn't try to do that as
far as
I can see.

Now, presumably the output isn't intended to be interpreted as C strings
anyway (if it was, you'd need to escape " and \ as well), so the patch is
OK, but I don't think it avoids ambiguity (and there's a clear case that
it shouldn't - that if the string passed to %qs is printable, it
should be
printed as-is even if it contains escape sequences that could also result
from a non-printable string passed to %qs).


Thank you.

I tried to be clear about it in the description of the changes
but I see the PS caused some confusion.  Let me clarify that
the patch has nothing to do with with ambiguity (perceived or
real) in the representation of the escape sequences.  The only
purpose of the change is to avoid printing non-printable
characters or excessively large escape sequences in GCC
diagnostics.

I mentioned the hex vs octal notation to invite input into which
of the two of them people would prefer to see used by the %qc and
qs directives, and whether it's worth considering changing the %qE
directive to use the same notation as well, for consistency (and
to help with readability if there is consensus that one is clearer
than the other).

What I meant by ambiguity is for example a string like "\1234"
where it's not obvious where the octal sequence ends.  Is it '\1'
followed  by "234" or '\12' followed by "34" or '\123' followed
by "4"?  (It's only possible to tell if one knows that GCC always
uses three digits for the octal character, but not everyone knows
that.)  To be clear: I'm talking about the GCC output and not
necessarily about what the standard has to say about it.

In contrast to the octal notation, I find the string "\x1234"
clearer.  It can only mean '\x1' followed by "234" or '\x12'
followed by "34" and I think more people will expect it to be
the latter because representing characters using two hex digits
is more common.  But this is just my own perception and YMMV.
Both styles are ambiguous, but isn't that an inherent problem once we 
try to avoid non-printable characters by rendering them as octal or hex 
sequences?


I can't make a strong argument for either style over the other.

Jeff



Re: [PATCH] Fix warning breaking profiled bootstrap

2016-09-16 Thread Jeff Law

On 08/21/2016 02:30 PM, Eric Botcazou wrote:

Consider the case where sym1 results in a non-null return value (and
initializes neg1/inv1), but sym2 results in a null return value, leaving
neg2/inv2 undefined, but cst2 is can still be true (ADDR_EXPR with an
invariant address comes to mind).

Thus we can get into these statements:


   tree cst = cst1 ? val1 : val2;
   tree inv = cst1 ? inv2 : inv1;


Note carefully how they test cst1 and depending on its value, they may
read val2 or inv2.


The key here is that cst1 cannot be true if sym1 is non-null, same for cst2
and sym2.

The code is guarded with:

  /* If one is of the form '[-]NAME + CST' and the other is constant, then
 it might be possible to say something depending on the constants.  */
  if ((sym1 && inv1 && cst2) || (sym2 && inv2 && cst1))

If this is the first case, then cst1 is false and val2 and inv1 are read.
If this is the second case, then cst1 is true and val1 and inv2 are read.

So inv2 is read only in the second case, and is initialized.

THanks.  Sometimes this stuff gets painful to follow :(

It looks like Andi already checked in this change.  Andi, please don't 
do that.  THe patch wasn't approved at the time of your commit and 
there's no indication the change was regression tested.


Jeff



Re: RFA: Small PATCH to add pow2p_hwi to hwint.h

2016-09-16 Thread Jeff Law

On 09/08/2016 02:59 PM, Jason Merrill wrote:

On Thu, Sep 8, 2016 at 11:55 AM, Joseph Myers  wrote:

> On Thu, 8 Sep 2016, Jason Merrill wrote:
>

>> Various places in GCC use negate, bit-and and compare to test whether
>> an integer is a power of 2, but I think it would be clearer for this
>> test to be wrapped in a function.

>
> (x & -x) == x is also true for 0.  Whatever the correct function semantics
> for 0, the comment needs to reflect them.

Yep, I was just realizing that, too.  This much larger patch introduces:

 least_bit_hwi: (x & -x)
 pow2_or_zerop: (x & -x) == x
 pow2p_hwi: x && x == (x & -x)
 ctz_or_zero: floor_log2 (x & -x)

and replaces these patterns accordingly.  I'm not at all attached to the names.

Tested x86_64-pc-linux-gnu.


hwint.diff


commit 4b76501b72f3953ac57cb077b4e25a90afb6d9a9
Author: Jason Merrill 
Date:   Thu Sep 8 13:10:12 2016 -0400

Add inline functions for various bitwise operations.

* hwint.h (least_bit_hwi, pow2_or_zerop, pow2p_hwi, ctz_or_zero):
New.
* hwint.c (exact_log2): Use pow2p_hwi.
(ctz_hwi, ffs_hwi): Use least_bit_hwi.
* alias.c (memrefs_conflict_p): Use pow2_or_zerop.
* builtins.c (get_object_alignment_2, get_object_alignment)
(get_pointer_alignment, fold_builtin_atomic_always_lock_free): Use
least_bit_hwi.
* calls.c (compute_argument_addresses, store_one_arg): Use
least_bit_hwi.
* cfgexpand.c (expand_one_stack_var_at): Use least_bit_hwi.
* combine.c (force_to_mode): Use least_bit_hwi.
* emit-rtl.c (set_mem_attributes_minus_bitpos, adjust_address_1):
Use least_bit_hwi.
* expmed.c (synth_mult, expand_divmod): Use ctz_or_zero, ctz_hwi.
(init_expmed_one_conv): Use pow2p_hwi.
* fold-const.c (round_up_loc, round_down_loc): Use pow2_or_zerop.
(fold_binary_loc): Use pow2p_hwi.
* function.c (assign_parm_find_stack_rtl): Use least_bit_hwi.
* gimple-fold.c (gimple_fold_builtin_memory_op): Use pow2p_hwi.
* gimple-ssa-strength-reduction.c (replace_ref): Use least_bit_hwi.
* hsa-gen.c (gen_hsa_addr_with_align, hsa_bitmemref_alignment):
Use least_bit_hwi.
* ipa-cp.c (ipcp_alignment_lattice::meet_with_1): Use least_bit_hwi.
* ipa-prop.c (ipa_modify_call_arguments): Use least_bit_hwi.
* omp-low.c (oacc_loop_fixed_partitions)
(oacc_loop_auto_partitions): Use least_bit_hwi.
* rtlanal.c (nonzero_bits1): Use ctz_or_zero.
* stor-layout.c (place_field): Use least_bit_hwi.
* tree-pretty-print.c (dump_generic_node): Use pow2p_hwi.
* tree-sra.c (build_ref_for_offset): Use least_bit_hwi.
* tree-ssa-ccp.c (ccp_finalize): Use least_bit_hwi.
* tree-ssa-math-opts.c (bswap_replace): Use least_bit_hwi.
* tree-ssa-strlen.c (handle_builtin_memcmp): Use pow2p_hwi.
* tree-vect-data-refs.c (vect_analyze_group_access_1)
(vect_grouped_store_supported, vect_grouped_load_supported)
(vect_permute_load_chain, vect_shift_permute_load_chain)
(vect_transform_grouped_load): Use pow2p_hwi.
* tree-vect-generic.c (expand_vector_divmod): Use ctz_or_zero.
* tree-vect-patterns.c (vect_recog_divmod_pattern): Use ctz_or_zero.
* tree-vect-stmts.c (vectorizable_mask_load_store): Use
least_bit_hwi.
* tsan.c (instrument_expr): Use least_bit_hwi.
* var-tracking.c (negative_power_of_two_p): Use pow2_or_zerop.
I was briefly worried about some of the expmed changes, but managed to 
convince myself they were correct.


Ok for the trunk.

Thanks,
jeff



[PATCH][Libiberty] Support empty arguments in pex-win32

2016-09-16 Thread Andrew Stubbs

Hi,

This patch fixes a libiberty bug in which zero-length arguments to 
"pex_run" subprocesses were silently dropped on MinGW.


Basically, the code does not quote parameters unless it has to, but this 
corner-case was forgotten.


OK?

Andrew

2016-09-16  Andrew Stubbs  

	libiberty/
	* pex-win32.c (argv_to_cmdline): Quote zero-length parameters.

	libiberty/testsuite/
	* test-pexecute.c (main): Insert check for zero-length parameters.

Index: libiberty/pex-win32.c
===
--- libiberty/pex-win32.c	(revision 240189)
+++ libiberty/pex-win32.c	(working copy)
@@ -370,6 +370,8 @@
 	  cmdline_len++;
 	}
 	}
+  if (j == 0)
+	needs_quotes = 1;
   /* Trailing backslashes also need to be escaped because they will be
  followed by the terminating quote.  */
   if (needs_quotes)
@@ -394,6 +396,8 @@
   break;
 }
 }
+  if (j == 0)
+	needs_quotes = 1;
 
   if (needs_quotes)
 {
Index: libiberty/testsuite/test-pexecute.c
===
--- libiberty/testsuite/test-pexecute.c	(revision 240189)
+++ libiberty/testsuite/test-pexecute.c	(working copy)
@@ -285,8 +285,22 @@
 ERROR ("echo exit status failed");
   pex_free (pex1);
 
+  /* Check empty parameters don't get lost.  */
   pex1 = TEST_PEX_INIT (PEX_USE_PIPES, "temp");
   subargv[1] = "echo";
+  subargv[2] = "foo";
+  subargv[3] = "";
+  subargv[4] = "bar";
+  subargv[5] = NULL;
+  TEST_PEX_RUN (pex1, 0, "./test-pexecute", subargv, NULL, NULL);
+  e = TEST_PEX_READ_OUTPUT (pex1);
+  CHECK_LINE (e, "foo  bar");  /* Two spaces!  */
+  if (TEST_PEX_GET_STATUS_1 (pex1) != 0)
+ERROR ("echo exit status failed");
+  pex_free (pex1);
+
+  pex1 = TEST_PEX_INIT (PEX_USE_PIPES, "temp");
+  subargv[1] = "echo";
   subargv[2] = "bar";
   subargv[3] = NULL;
   TEST_PEX_RUN (pex1, PEX_SUFFIX, "./test-pexecute", subargv, ".x", NULL);


RE: [RFC,PATCH] Using equivalences to help eliminate_regs_in_insn

2016-09-16 Thread Matthew Fortune
Vladimir N Makarov  writes:
> On 09/06/2016 11:22 AM, Matthew Fortune wrote:
> > There is an implementation that optimises a single set but not one for
> > a REG_EQUAL. Do you have any recollection of this code and do you
> > think there was a reason you didn't implement the REG_EQUAL case?
> > Either way any advice on my approach is welcome.
> Matt, sorry for delay with the answer.  I had a long vacation.
> 
> I did not write the code.  I mostly took it from the old register
> allocator practically without any changes.
> Looking at your prototype patch, I think you are doing an optimization
> which can be important for many targets.  Moreover, the optimization
> probably will be frequently applied for these targets.
> 
> I believe you should produce a final patch, test it well and submit it.
> 
> Thank you for finding the optimization opportunity and working on
> implementing it.

Thanks, I'll keep working on it and get a patch submitted. It may take a
little while but it will get done.

Matthew



Re: [PATCH, rs6000] Fix PR77613 (swap optimization for splat-with-truncate)

2016-09-16 Thread Segher Boessenkool
On Fri, Sep 16, 2016 at 10:13:09AM -0500, Bill Schmidt wrote:
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> regressions.  Ok for trunk, and eventual backport to 6 and 5 branches?

Okay and okay.  One nit...

> --- gcc/config/rs6000/rs6000.c(revision 240187)
> +++ gcc/config/rs6000/rs6000.c(working copy)
> @@ -39105,6 +39105,11 @@ rtx_is_swappable_p (rtx op, unsigned int *special)
>  && GET_MODE_INNER (GET_MODE (op)) == GET_MODE (XEXP (op, 0)))
>   /* This catches V2DF and V2DI splat, at a minimum.  */
>   return 1;
> +  else if (GET_CODE (XEXP (op, 0)) == TRUNCATE
> +&& GET_CODE (XEXP (XEXP (op, 0), 0)) == REG

Please use REG_P here.

> +&& GET_MODE_INNER (GET_MODE (op)) == GET_MODE (XEXP (op, 0)))
> + /* This catches splat of a truncated value.  */
> + return 1;
>else if (GET_CODE (XEXP (op, 0)) == VEC_SELECT)
>   /* If the duplicated item is from a select, defer to the select
>  processing to see if we can change the lane for the splat.  */

Thanks,


Segher


Re: [RFC,PATCH] Using equivalences to help eliminate_regs_in_insn

2016-09-16 Thread Vladimir N Makarov



On 09/06/2016 11:22 AM, Matthew Fortune wrote:

There is an implementation that optimises a single set but not one for
a REG_EQUAL. Do you have any recollection of this code and do you think
there was a reason you didn't implement the REG_EQUAL case? Either way
any advice on my approach is welcome.

Matt, sorry for delay with the answer.  I had a long vacation.

I did not write the code.  I mostly took it from the old register 
allocator practically without any changes.
Looking at your prototype patch, I think you are doing an optimization 
which can be important for many targets.  Moreover, the optimization 
probably will be frequently applied for these targets.


I believe you should produce a final patch, test it well and submit it.

Thank you for finding the optimization opportunity and working on 
implementing it.




Re: [PATCH] Optimize strchr (s, 0) to strlen

2016-09-16 Thread Jeff Law

On 09/16/2016 07:28 AM, Wilco Dijkstra wrote:


I noticed rawmemchr taking non-trivial amounts of time in various profiles
despite no use of rawmemchr in any of the source code. It's apparently a
common idiom to use strchr (s, 0) to find the end of a string.

A bit of a surprise, but it is what it is I guess.

jeff




Re: [PATCH 0/8] make next_*_insn take rtx_insn * arguments

2016-09-16 Thread Jeff Law

On 09/16/2016 04:10 AM, Bernd Schmidt wrote:

On 09/16/2016 12:10 PM, Trevor Saunders wrote:

ok, going through all the casts added here I see the following reasons
for them.

- in md files operands is a array of rtx, we should probably have a
  different way to pass insns to the C code here.  That seems worth
  investigating incrementally.

- JUMP_LABEL can be a return which is not an insn.  That also seems
  like something that should be improved at some point.


These just show that fundamentally, rtl is just dynamically typed, and
used as such, which is why I was never massively enthusiastic about the
rtx -> rtx_insn conversion to begin with.
RTL is dynamically typed, but I think we can start carving off things 
like the insn chain objects -- they don't need to be dynamically typed. 
Essentially in my mind the insn chain doesn't need to be RTL.  It's 
really an RTL container.


It does mean there are things we have to fix up, but so far the things 
we've fixed up, I've considered clear improvements.


Jeff


[PATCH, rs6000] Fix PR77613 (swap optimization for splat-with-truncate)

2016-09-16 Thread Bill Schmidt
Hi,

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77613 identifies a case
where we fail to remove swaps from a region because it contains a form
of splat that we don't yet recognize.  This patch adds support for splat
insns that have an embedded truncate, such as the vsx_vsplth_di pattern.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions.  Ok for trunk, and eventual backport to 6 and 5 branches?

Thanks,
Bill


[gcc]

2016-09-16  Bill Schmidt  

PR target/77613
* config/rs6000/rs6000.c (rtx_is_swappable_p): Add support for
splat with truncate.

[gcc/testsuite]

2016-09-16  Bill Schmidt  

PR target/77613
* gcc.target/powerpc/swaps-p8-25.c: New.


Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 240187)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -39105,6 +39105,11 @@ rtx_is_swappable_p (rtx op, unsigned int *special)
   && GET_MODE_INNER (GET_MODE (op)) == GET_MODE (XEXP (op, 0)))
/* This catches V2DF and V2DI splat, at a minimum.  */
return 1;
+  else if (GET_CODE (XEXP (op, 0)) == TRUNCATE
+  && GET_CODE (XEXP (XEXP (op, 0), 0)) == REG
+  && GET_MODE_INNER (GET_MODE (op)) == GET_MODE (XEXP (op, 0)))
+   /* This catches splat of a truncated value.  */
+   return 1;
   else if (GET_CODE (XEXP (op, 0)) == VEC_SELECT)
/* If the duplicated item is from a select, defer to the select
   processing to see if we can change the lane for the splat.  */
Index: gcc/testsuite/gcc.target/powerpc/swaps-p8-25.c
===
--- gcc/testsuite/gcc.target/powerpc/swaps-p8-25.c  (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/swaps-p8-25.c  (working copy)
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { powerpc64le-*-* } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power8" } } */
+/* { dg-options "-mcpu=power8 -O3 " } */
+/* { dg-final { scan-assembler "lxvd2x" } } */
+/* { dg-final { scan-assembler "stxvd2x" } } */
+/* { dg-final { scan-assembler-not "xxpermdi" } } */
+
+/* Verify that swap optimization works correctly for a truncating splat.  */
+
+/* Test case to resolve PR77613.  */
+
+void pr77613 (signed short a, signed short *x, signed short *y)
+{
+  unsigned long i;
+
+  for (i = 0; i < 1024; i++)
+y[i] = a * x[i] + y[i];
+}



Re: [PATCH 2/8] use rtx_insn * more

2016-09-16 Thread Alexander Monakov
On Wed, 14 Sep 2016, tbsaunde+...@tbsaunde.org wrote:
> @@ -3790,7 +3790,8 @@ static void
>  rl78_calculate_death_notes (void)
>  {
>char dead[FIRST_PSEUDO_REGISTER];
> -  rtx insn, p, s, d;
> +  rtx p, s, d;
> +rtx_insn *insn;

Minor nit: this hunk seems to be missing indentation preceding 'rtx_insn'.

Alexander


Re: [PATCH 0/8] make next_*_insn take rtx_insn * arguments

2016-09-16 Thread Trevor Saunders
On Fri, Sep 16, 2016 at 12:10:51PM +0200, Bernd Schmidt wrote:
> On 09/16/2016 12:10 PM, Trevor Saunders wrote:
> > ok, going through all the casts added here I see the following reasons
> > for them.
> > 
> > - in md files operands is a array of rtx, we should probably have a
> >   different way to pass insns to the C code here.  That seems worth
> >   investigating incrementally.
> > 
> > - JUMP_LABEL can be a return which is not an insn.  That also seems
> >   like something that should be improved at some point.
> 
> These just show that fundamentally, rtl is just dynamically typed, and used
> as such, which is why I was never massively enthusiastic about the rtx ->
> rtx_insn conversion to begin with.

I would agree that rtl is dynamically typed at the moment, and viewing
rtx_insn as an attempt to change that is certainly reasonable.  I don't
know that any of these things mean rtl has to be dynamically typed.  I
think that effort has already helped some things, I wouldn't want to
think about trying to get rid of rtx_insn_list without it.  Further I
expect it will make it possible to change the data structures here more,
and I suspect there is room for cleverness there that isn't possible
with dynamic typing.  Changing away from dynamic typing ccertainly isn't
easy, but I think there's a good amount of evidence it isn't well suited
to large complicated projects.

Alternatively if you take the view that dynamic typing is just a special
case of static typing with one type then adding more types allows you to
reduce the number of places you need to check the "type" of an object.
If we can enable some of what rtl checking gets us with out the compile
time penalty that certainly seems valuable.

Trev

> 
> 
> Bernd


Re: [PATCH 1/3] Put a TARGET_LRA_P into every target

2016-09-16 Thread Segher Boessenkool
On Fri, Sep 16, 2016 at 12:26:42PM +0200, Gerald Pfeifer wrote:
> On Wed, 14 Sep 2016, Segher Boessenkool wrote:
> > 2016-09-14  Segher Boessenkool  
> > 
> > * target.def (lra_p): Change commentary (for the manual) for the
> > new default.
> > * doc/tm.texi: Regenerate.
> 
> "returns always true" -> "always returns true" ?
> 
> (The former is how we'd say it in German, and hence might be common in 
> Dutch as well?  In English, both probably are fine, the latter feeling 
> more natural to me.  But then, I'm not a native speaker. ;-)

In Dutch we would do this altogether differently.  I did not change this
text at all though (except s/false/true/) ;-)

Does just "The default version of this target hook returns true." sound
better?  I.e. delete "always".


Segher


Re: [PR lto/77458] Avoid ICE in offloading with differing _FloatN, _FloatNx types (was: Advice sought for debugging a lto1 ICE (was: Implement C _FloatN, _FloatNx types [version 6]))

2016-09-16 Thread Thomas Schwinge
Hi!

On Fri, 16 Sep 2016 10:59:16 +0200, Richard Biener  
wrote:
> On Fri, Sep 16, 2016 at 9:05 AM, Thomas Schwinge
>  wrote:
> > On Thu, 08 Sep 2016 13:43:30 +0200, I wrote:
> >> On Wed, 7 Sep 2016 14:23:18 +0200, Richard Biener 
> >>  wrote:
> >> > On Wed, Sep 7, 2016 at 1:52 PM, Thomas Schwinge 
> >> >  wrote:
> >> > > As I noted in :
> >> > >
> >> > > As of the PR32187 commit r239625 "Implement C _FloatN, _FloatNx 
> >> > > types", nvptx
> >> > > offloading is broken, ICEs in LTO stream-in.  Probably some kind 
> >> > > of data-type
> >> > > mismatch that is not visible with Intel MIC offloading (using the 
> >> > > same data
> >> > > types) but explodes with nvptx.  I'm having a look.
> >
> >> > [...] preload_common_nodes.  This is carefully crafted to _not_ diverge 
> >> > by
> >> > frontend (!) it wasn't even designed to cope with global trees being 
> >> > present
> >> > or not dependent on target (well, because the target is always the
> >> > same! mind you!)
> >>
> >> Scary.  ;-/
> >>
> >> > Now -- in theory it should deal with NULLs just fine (resulting in
> >> > error_mark_node), but it can diverge when there are additional
> >> > compount types (like vectors, complex
> >> > or array or record types) whose element types are not in the set of
> >> > global trees.
> >> > The complex _FloatN types would be such a case given they appear before 
> >> > their
> >> > components.  That mixes up the ordering at least.
> >>
> >> ACK, but that's also an issue for "regular" float/complex float, which
> >> also is in "reverse" order.  But that's "fixed" by the recursion in
> >> gcc/tree-streamer.c:record_common_node for "TREE_CODE (node) ==
> >> COMPLEX_TYPE".  This likewise seems to work for the _FloatN types.
> >
> > So, that mechanism does work, but what's going wrong is the following:
> > with differing target vs. offload target, we potentially (and actually
> > do, in the case of x86_64 vs. nvptx) have different sets of _FloatN and
> > _FloatNx types.  That is, for nvptx, a few of these don't exist (so,
> > NULL_TREE), and so it follows their complex variants also don't exist,
> > and thus the recursion that I just mentioned for complex types is no
> > longer done in lockstep in the x86_64 cc1 vs. the nvptx lto1, hence we
> > get an offset (of two, in this specific case), and consequently streaming
> > explodes, for example, as soon as it hits a forward-reference (due to
> > looking for tree 185 (x86_64 cc1 view; as encoded in the stream) when it
> > should be looking for tree 183 (nvptx lto1 view).
> >
> >> (I've
> >> put "fixed" in quotes -- doesn't that recursion mean that we're thus
> >> putting "complex float", "float", [...], "float" (again) into the cache?
> >> Anyway that's for later...)
> >
> > Maybe it would make sense to do this tree streaming in two passes: first
> > build a set of what we actually need, and then stream that, without
> > duplicates.  (Or, is also for these "pickled" trees their order relevant,
> > so that one tree may only refer to other earlier but not later ones?
> > Anyway, we could still remember the set of trees already streamed, and
> > avoid the double streaming I described?)
> >
> > So I now understand that due to the stream format, the integer tree IDs
> > (cache->next_id) have to match in all cc1/lto1s (etc.), so we'll have to
> > make that work for x86_64 target vs. nvptx offload target being
> > different.  (I'm pondering some ideas about how to rework that integer
> > tree ID generation.)
> >
> > (I have not digested completely yet what the implications are for the
> > skipping we're doing for some trees in preload_common_nodes), but here is
> > a first patch that at least gets us rid of the immediate problem.  (I'll
> > fix the TODO by adding a "#define TI_COMPLEX_FLOATN_NX_TYPE_LAST" to
> > gcc/tree-core.h, OK?)  Will such a patch be OK for trunk, at least for
> > now?
> 
> Humm ... do we anywhere compare to those global trees by pointer equivalence?

I have not verified that.  Does GCC permit/forbid that on a case-by-case
basis -- which seems very fragile to me?

> If so then it breaks LTO support for those types.

OK, I think I understand that -- but I do have a "lto_stream_offload_p"
conditional in my code changes, so these changes should only affect the
offloading stream, in my understanding?

> I think forcing proper ordering so that we can assert that at the
> point we'd like
> to recurse the nodes we recurse for are already in the cache would be a better
> fix.

ACK.  That's what I'd planned to look into as a next step.

> This might need some additional global trees in case components do not
> explicitely exist

That's what I was afraid of: for example, I can't tell if it holds for
all GCC configurations (back ends), that complex types' component types
will always match one of the already existing 

Re: [PATCH] Optimize strchr (s, 0) to strlen

2016-09-16 Thread Wilco Dijkstra
Bernd Schmidt wrote:
> On 09/15/2016 03:38 PM, Wilco Dijkstra wrote:
> > __rawmemchr is not the fastest on any target I tried, including x86,
>
> Interesting. Care to share your test program? I just looked at the libc 
> sources and strlen/rawmemchr are practically identical code so I'd 
> expect any difference to be lost in the noise. Of course there might be 
> inlines interfering with the comparison.

It's glibc/benchtests/bench-strlen.c slightly modified to compare strlen,
rawmemchr and strchr. Even if they appear identical the inner loop of strlen
is much faster than strchr and rawmemchr at larger sizes:

  strchr  
rawmemchr  strlen
Length 4096, alignment 12:  3.35132e+06 2.39842e+06 1.88962e+06

> > So the only reasonable optimization is to always emit a + strlen (a).
>
> Not sure about "only reasonable" but on the whole I'd agree that it's 
> reasonable and we shouldn't let the perfect be the enemy of the good 
> here. I'm sure we can come up with lots of different ways to do this but 
> let's just pick one and if the one Wilco submitted looks decent let's 
> just put it in.
> 
> Out of curiousity, is there real-world code that this is intended to 
> optimize?

I noticed rawmemchr taking non-trivial amounts of time in various profiles
despite no use of rawmemchr in any of the source code. It's apparently a
common idiom to use strchr (s, 0) to find the end of a string. Given strchr is
slower than strlen, it is changed to rawmemchr by GLIBC headers. However
this makes things even slower since few targets have an optimized rawmemchr,
and for targets that do, strlen is faster.

So this is one of many improvements to ensure GCC/GLIBC by default do 
optimizations in a way that is best for most targets. If a particular target
wants to do something different that is always possible of course.

Wilco



Re: [PATCH 1/3] Put a TARGET_LRA_P into every target

2016-09-16 Thread Richard Kenner
> "returns always true" -> "always returns true" ?
> 
> (The former is how we'd say it in German, and hence might be common in 
> Dutch as well?  In English, both probably are fine, the latter feeling 
> more natural to me.  But then, I'm not a native speaker. ;-)

The former is unusual in English and borders on being "wrong".  It
would be parsed as saying that it returns something called "always
true", as if that were a construct.


[PATCH] Fix PR77605

2016-09-16 Thread Richard Biener

The following patch fixes PR77605.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2016-09-16  Richard Biener  

PR middle-end/77605
* tree-data-ref.c (analyze_subscript_affine_affine): Use the
proper niter to bound the loops.

* gcc.dg/torture/pr77605.c: New testcase.

Index: gcc/tree-data-ref.c
===
*** gcc/tree-data-ref.c (revision 240176)
--- gcc/tree-data-ref.c (working copy)
*** analyze_subscript_affine_affine (tree ch
*** 2686,2698 
  
  if (niter > 0)
{
! HOST_WIDE_INT tau2 = MIN (FLOOR_DIV (niter - i0, i1),
!   FLOOR_DIV (niter - j0, j1));
  HOST_WIDE_INT last_conflict = tau2 - (x1 - i0)/i1;
  
  /* If the overlap occurs outside of the bounds of the
 loop, there is no dependence.  */
! if (x1 >= niter || y1 >= niter)
{
  *overlaps_a = conflict_fn_no_dependence ();
  *overlaps_b = conflict_fn_no_dependence ();
--- 2682,2694 
  
  if (niter > 0)
{
! HOST_WIDE_INT tau2 = MIN (FLOOR_DIV (niter_a - i0, i1),
!   FLOOR_DIV (niter_b - j0, j1));
  HOST_WIDE_INT last_conflict = tau2 - (x1 - i0)/i1;
  
  /* If the overlap occurs outside of the bounds of the
 loop, there is no dependence.  */
! if (x1 >= niter_a || y1 >= niter_b)
{
  *overlaps_a = conflict_fn_no_dependence ();
  *overlaps_b = conflict_fn_no_dependence ();
Index: gcc/testsuite/gcc.dg/torture/pr77605.c
===
*** gcc/testsuite/gcc.dg/torture/pr77605.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr77605.c  (working copy)
***
*** 0 
--- 1,15 
+ /* { dg-do run } */
+ 
+ int a, b, c[2][8];
+ 
+ int main ()
+ {
+   for (a = 0; a < 8; a++)
+ for (b = 0; b < 2; b++)
+   c[b][a] = c[b][b + 6] ^ 1;
+ 
+   if (c[0][7] != 0) 
+ __builtin_abort ();
+ 
+   return 0; 
+ }


Re: RFA (libstdc++): PATCH to implement C++17 over-aligned new

2016-09-16 Thread Rainer Orth
Hi Jonathan,

> I've committed the patch now, to fix the failures for Solaris. I'll
> revisit it when I get clarification from the committee about invalid
> alignment arguments.

Solaris bootstraps with the revised patch included just completed
successfully.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: RFA (libstdc++): PATCH to implement C++17 over-aligned new

2016-09-16 Thread Jonathan Wakely

On 16/09/16 11:56 +0100, Jonathan Wakely wrote:

On 16/09/16 11:37 +0200, Marc Glisse wrote:

On Fri, 16 Sep 2016, Jonathan Wakely wrote:


On 16/09/16 09:04 +0200, Rainer Orth wrote:

Hi Jason,


OK, one more:


this works just fine on both sparc-sun-solaris2.12 and
i386-pc-solaris2.12.

Once Jonathan's patch to heed aligned_alloc's requirement on size being
a multiple of alignment is in, all is fine on Solaris.


I've got a slightly different fix now.

We only need to make the size a multiple of alignment for
aligned_alloc, however for posix_memalign we need to ensure the
alignment is a multiple of sizeof(void*).

I'm testing this now (but only on x86_64 GNU/Linux where it wasn't
failing anyway).


+  // The value of alignment shall be a power of two multiple of sizeof(void *).
+  if (al < sizeof(void*))
+al = sizeof(void*);

The code doesn't exactly match the comment. I can't find the 
precondition in the standard that says operator new can only be 
called on a power of 2... (maybe we can add it if it is really 
missing?)


[basic.align] says "Every alignment value shall be a non-negative
integral power of two." So asking operator new for any other value
doesn't make sense, but I can't find a restriction on doing so.

I was assuming we only need to ensure it's possible to use valid
alignments such as align_val_t(2) which are not valid arguments to
posix_memalign. For other values such as align_val_t(15) I was
assuming it's OK for posix_memalign to fail, so we throw bad_alloc.

If that's not the case then we need to round up all alignments that
aren't power of two multiples of sizeof(void*). I'd like to avoid
that.


Would using __builtin_expect (sz == 0, false) make sense?  Surely it's
rare to try to allocate zero bytes.


https://gcc.gnu.org/ml/libstdc++/2014-03/msg1.html

gcc already guesses that a test like sz == 0 is usually false (not 
with as large a probability as if you use __builtin_expect, but 
enough that the generated code is unlikely to differ). But adding 
__builtin_expect cannot hurt...


Is the division (by a non-constant denominator) really necessary?


Probably not, but I've asked the committee for clarification what this
function should do when called with an invalid alignment.

Since align has to be a power of 2, x % align should be the same as 
x & (align - 1), for instance.


Thanks, if it's UB to call it with alignments that aren't a power of
two then we can do that.


I've committed the patch now, to fix the failures for Solaris. I'll
revisit it when I get clarification from the committee about invalid
alignment arguments.




Re: [PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-16 Thread Jan Hubicka
> > > I also like a new param better as it avoids a new magic constant and
> > > makes playing with
> > > it easier (your patch removes the ability to do statistics like you did 
> > > via the
> > > early-inlining-insns parameter by forcing it to two).
> > 
> > Hmm, you are right that you do not know if this particular function will get
> > profile (forgot about that).  Still, please use two params - it is more
> > consistent with what we have now and we may make it profile specific in
> > future..
> > 
> > Honza
> > > 
> > > Thanks,
> > > Richard.
> 
> A new patch for trunk is attached.
> 
> Regards,
> Yuan, Pengfei
> 
> 
> 2016-09-16  Yuan Pengfei  
> 
>   * doc/invoke.texi (--param early-inlining-insns-feedback): New.
>   * ipa-inline.c (want_early_inline_function_p): Use
>   PARAM_EARLY_INLINING_INSNS_FEEDBACK when FDO is enabled.
>   * params.def (PARAM_EARLY_INLINING_INSNS_FEEDBACK): Define.
>   (PARAM_EARLY_INLINING_INSNS): Change help string accordingly.

OK,
thanks

Honza


[Patch, testsuite] Require int32plus for pr70421.c

2016-09-16 Thread Senthil Kumar Selvaraj
Hi,

  This patch fixes a bogus testsuite failure for the avr target. The
  test has integer literals that only fit on targets with int size >= 32.

  Committed to trunk.

Regards
Senthil

gcc/testsuite/ChangeLog

2016-09-16  Senthil Kumar Selvaraj  

* gcc.dg/torture/pr70421.c: Require int32plus.

Index: gcc.dg/torture/pr70421.c
===
--- gcc.dg/torture/pr70421.c(revision 240004)
+++ gcc.dg/torture/pr70421.c(working copy)
@@ -1,5 +1,6 @@
 /* PR target/70421 */
 /* { dg-do run } */
+/* { dg-require-effective-target int32plus } */
 /* { dg-additional-options "-Wno-psabi -w" } */
 
 typedef unsigned V __attribute__ ((vector_size (64)));


Re: [PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-16 Thread Yuan, Pengfei
> > I also like a new param better as it avoids a new magic constant and
> > makes playing with
> > it easier (your patch removes the ability to do statistics like you did via 
> > the
> > early-inlining-insns parameter by forcing it to two).
> 
> Hmm, you are right that you do not know if this particular function will get
> profile (forgot about that).  Still, please use two params - it is more
> consistent with what we have now and we may make it profile specific in
> future..
> 
> Honza
> > 
> > Thanks,
> > Richard.

A new patch for trunk is attached.

Regards,
Yuan, Pengfei


2016-09-16  Yuan Pengfei  

* doc/invoke.texi (--param early-inlining-insns-feedback): New.
* ipa-inline.c (want_early_inline_function_p): Use
PARAM_EARLY_INLINING_INSNS_FEEDBACK when FDO is enabled.
* params.def (PARAM_EARLY_INLINING_INSNS_FEEDBACK): Define.
(PARAM_EARLY_INLINING_INSNS): Change help string accordingly.


diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 8eb5eff..6e7659a 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -9124,12 +9124,18 @@ given call expression.  This parameter limits inlining 
only to call expressions
 whose probability exceeds the given threshold (in percents).
 The default value is 10.
 
 @item early-inlining-insns
+@itemx early-inlining-insns-feedback
 Specify growth that the early inliner can make.  In effect it increases
 the amount of inlining for code having a large abstraction penalty.
 The default value is 14.
 
+The @option{early-inlining-insns-feedback} parameter is used only when
+profile feedback-directed optimizations are enabled (by
+@option{-fprofile-generate} or @option{-fprofile-use}).
+The default value is 2.
+
 @item max-early-inliner-iterations
 Limit of iterations of the early inliner.  This basically bounds
 the number of nested indirect calls the early inliner can resolve.
 Deeper chains are still handled by late inlining.
diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index 5c9366a..e028c08 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -594,10 +594,17 @@ want_early_inline_function_p (struct cgraph_edge *e)
 }
   else
 {
   int growth = estimate_edge_growth (e);
+  int growth_limit;
   int n;
 
+  if ((profile_arc_flag && !flag_test_coverage)
+ || (flag_branch_probabilities && !flag_auto_profile))
+   growth_limit = PARAM_VALUE (PARAM_EARLY_INLINING_INSNS_FEEDBACK);
+  else
+   growth_limit = PARAM_VALUE (PARAM_EARLY_INLINING_INSNS);
+
   if (growth <= 0)
;
   else if (!e->maybe_hot_p ()
   && growth > 0)
@@ -610,9 +617,9 @@ want_early_inline_function_p (struct cgraph_edge *e)
 xstrdup_for_dump (callee->name ()), callee->order,
 growth);
  want_inline = false;
}
-  else if (growth > PARAM_VALUE (PARAM_EARLY_INLINING_INSNS))
+  else if (growth > growth_limit)
{
  if (dump_file)
fprintf (dump_file, "  will not early inline: %s/%i->%s/%i, "
 "growth %i exceeds --param early-inlining-insns\n",
@@ -622,9 +629,9 @@ want_early_inline_function_p (struct cgraph_edge *e)
 growth);
  want_inline = false;
}
   else if ((n = num_calls (callee)) != 0
-  && growth * (n + 1) > PARAM_VALUE (PARAM_EARLY_INLINING_INSNS))
+  && growth * (n + 1) > growth_limit)
{
  if (dump_file)
fprintf (dump_file, "  will not early inline: %s/%i->%s/%i, "
 "growth %i exceeds --param early-inlining-insns "
diff --git a/gcc/params.def b/gcc/params.def
index 79b7dd4..91ea513 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -199,12 +199,20 @@ DEFPARAM(PARAM_INLINE_UNIT_GROWTH,
 DEFPARAM(PARAM_IPCP_UNIT_GROWTH,
 "ipcp-unit-growth",
 "How much can given compilation unit grow because of the 
interprocedural constant propagation (in percent).",
 10, 0, 0)
-DEFPARAM(PARAM_EARLY_INLINING_INSNS,
-"early-inlining-insns",
-"Maximal estimated growth of function body caused by early inlining of 
single call.",
-14, 0, 0)
+DEFPARAM (PARAM_EARLY_INLINING_INSNS_FEEDBACK,
+ "early-inlining-insns-feedback",
+ "Maximal estimated growth of function body caused by early "
+ "inlining of single call.  Used when profile feedback-directed "
+ "optimizations are enabled.",
+ 2, 0, 0)
+DEFPARAM (PARAM_EARLY_INLINING_INSNS,
+ "early-inlining-insns",
+ "Maximal estimated growth of function body caused by early "
+ "inlining of single call.  Used when profile feedback-directed "
+ "optimizations are not enabled.",
+ 14, 0, 0)
 DEFPARAM(PARAM_LARGE_STACK_FRAME,
 "large-stack-frame",
 "The size of stack frame to be considered large.",
 256, 0, 0)



Re: [PATCH] Fix late dwarf generated early from optimized out globals

2016-09-16 Thread Richard Biener
On Thu, 15 Sep 2016, Richard Biener wrote:

> 
> This addresses sth I needed to address with the early LTO debug patches
> (you might now figure I'm piecemail merging stuff from that patch).
> 
> When the cgraph code optimizes out a global we call the late_global_decl
> debug hook to eventually add a DW_AT_const_value to its DIE (we don't
> really expect a location as that will be invalid after optimizing out
> and will be pruned).
> 
> With the early LTO debug patches I have introduced a early_dwarf_finished
> flag (mainly for consistency checking) and I figured I can use that to
> detect the call to the late hook during the early phase and provide
> the following cleaned up variant of avoiding to create locations that
> require later pruning (which doesn't work with emitting the early DIEs).
> 
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> 
> I verified it does the correct thing for a unit like
> 
> static const int i = 2;
> 
> (but ISTR we do have at least one testcase in the testsuite as well).
> 
> Will commit if testing finishes successfully.

Ok, so it showed issues when merging that back to early-LTO-debug.
Turns out in LTO we never call early_finish and thus early_dwarf_finished
was never set.  Also dwarf2out_late_global_decl itself is a better
place to constrain generating locations.

The following variant is in very late stage of testing.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
LTO bootstrap on x86_64-unknown-linux-gnu in stage3.  LTO bootstrap
with early-LTO-debug in stage3, bootstraped with early-LTO-debug,
testing in progress.

Richard.

2016-09-16  Richard Biener  

* dwarf2out.c (early_dwarf_finished): New global.
(set_early_dwarf::set_early_dwarf): Assert early_dwarf_finished
is false.
(dwarf2out_early_finish): Set early_dwarf_finished at the end,
if called from LTO exit early.
(dwarf2out_late_global_decl): When being during the early
debug phase do not add locations but only const value attributes.
Adjust the way we generate early DIEs for LTO.

lto/
* lto.c (lto_main): Invoke early_finish debug hook.

Index: gcc/dwarf2out.c
===
--- gcc/dwarf2out.c (revision 240164)
+++ gcc/dwarf2out.c (working copy)
@@ -2711,9 +2711,14 @@ die_node;
 
 /* Set to TRUE while dwarf2out_early_global_decl is running.  */
 static bool early_dwarf;
+static bool early_dwarf_finished;
 struct set_early_dwarf {
   bool saved;
-  set_early_dwarf () : saved(early_dwarf) { early_dwarf = true; }
+  set_early_dwarf () : saved(early_dwarf)
+{
+  gcc_assert (! early_dwarf_finished);
+  early_dwarf = true;
+}
   ~set_early_dwarf () { early_dwarf = saved; }
 };
 
@@ -23878,18 +23883,31 @@ dwarf2out_early_global_decl (tree decl)
 static void
 dwarf2out_late_global_decl (tree decl)
 {
-  /* We have to generate early debug late for LTO.  */
-  if (in_lto_p)
-dwarf2out_early_global_decl (decl);
-
-/* Fill-in any location information we were unable to determine
-   on the first pass.  */
+  /* Fill-in any location information we were unable to determine
+ on the first pass.  */
   if (TREE_CODE (decl) == VAR_DECL
   && !POINTER_BOUNDS_P (decl))
 {
   dw_die_ref die = lookup_decl_die (decl);
+
+  /* We have to generate early debug late for LTO.  */
+  if (! die && in_lto_p)
+   {
+ dwarf2out_decl (decl);
+ die = lookup_decl_die (decl);
+   }
+
   if (die)
-   add_location_or_const_value_attribute (die, decl, false);
+   {
+ /* We get called during the early debug phase via the symtab
+code invoking late_global_decl for symbols that are optimized
+out.  When the early phase is not finished, do not add
+locations.  */
+ if (! early_dwarf_finished)
+   tree_add_const_value_attribute_for_decl (die, decl);
+ else
+   add_location_or_const_value_attribute (die, decl, false);
+   }
 }
 }
 
@@ -28137,6 +28155,14 @@ dwarf2out_early_finish (void)
 {
   set_early_dwarf s;
 
+  /* With LTO early dwarf was really finished at compile-time, so make
+ sure to adjust the phase after annotating the LTRANS CU DIE.  */
+  if (in_lto_p)
+{
+  early_dwarf_finished = true;
+  return;
+}
+
   /* Walk through the list of incomplete types again, trying once more to
  emit full debugging info for them.  */
   retry_incomplete_types ();
@@ -28163,6 +28189,9 @@ dwarf2out_early_finish (void)
}
 }
   deferred_asm_name = NULL;
+
+  /* The early debug phase is now finished.  */
+  early_dwarf_finished = true;
 }
 
 /* Reset all state within dwarf2out.c so that we can rerun the compiler
Index: gcc/lto/lto.c
===
--- gcc/lto/lto.c   (revision 240164)
+++ gcc/lto/lto.c   

Re: [PATCH][simplify-rtx] (GTU (PLUS a C) (C - 1)) --> (LTU a -C)

2016-09-16 Thread Kyrill Tkachov


On 16/09/16 11:45, Bernd Schmidt wrote:

On 09/16/2016 10:40 AM, Kyrill Tkachov wrote:


2016-09-16  Kyrylo Tkachov  

* simplify-rtx.c (simplify_relational_operation_1): Add transformation
(GTU (PLUS a C) (C - 1)) --> (LTU a -C).

2016-09-16  Kyrylo Tkachov  

* gcc.target/aarch64/gtu_to_ltu_cmp_1.c: New test.


Ok. Don't know if you want to add more variants of the input code to the 
testcase to make sure they're all covered.



Thanks.
I'm having trouble writing testcases for variations of the original testcase as 
GCC really really wants to convert
everything to a comparison against 1 at RTL level, so only the x == -2 || x == 
-1 condition seems to trigger this.
However, testcases of the form:
unsigned int
foo (unsigned int a, unsigned int b)
{
  return (a + 10) > 9;
}

seem to trigger it, so I can add some of this form. However, these will be 
optimised by a match.pd version
of this transformation that I'm working on.

Kyrill



Bernd




Re: [PING] set libfunc entry for sdivmod_optab to NULL in optabs.def

2016-09-16 Thread Prathamesh Kulkarni
On 15 September 2016 at 17:53, Richard Biener
 wrote:
> On Thu, Sep 15, 2016 at 1:21 PM, Prathamesh Kulkarni
>  wrote:
>> On 15 September 2016 at 16:31, Richard Sandiford
>>  wrote:
>>> Prathamesh Kulkarni  writes:
 On 15 September 2016 at 04:21, Richard Sandiford
  wrote:
> Richard Sandiford  writes:
>> Prathamesh Kulkarni  writes:
>>> Hi,
>>> I would like to ping the following patch:
>>> https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01015.html
>>>
>>> While implementing divmod transform:
>>> https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01757.html
>>>
>>> I ran into an  issue with optab_libfunc().
>>> It appears optab_libfunc (sdivmod_optab, mode) returns
>>> bogus libfunc for unsupported modes, for instance
>>> on x86_64, optab_libfunc (sdivmod_optab, DImode) returns
>>> a libfunc with name "__divmoddi4", even though such a libfunc
>>> does not exist in libgcc. This happens because in optabs.def
>>> the libfunc entry for sdivmod_optab has gen_int_libfunc,
>>> and call to optab_libfunc (sdivmo_optab, DImode) lazily
>>> creates a bogus libfunc "__divmoddi4" by calling gen_int_libfunc().
>>>
>>> To work around this issue I set libfunc entry for sdivmod_optab to NULL
>>> and verified that optab_libfunc (sdivmod_optab, DImode) returns NULL_RTX
>>> instead of a bogus libfunc if it's not overriden by the target.
>>>
>>> Bootstrapped and tested on ppc64le-linux-gnu, x86_64-linux-gnu.
>>> Cross tested on arm*-*-*, aarch64*-*-*.
>>> OK for trunk ?
>>
>> I'm not a maintainer for this area, but:
>
> ...in https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01757.html
> you said that c6x follows the return-by-pointer convention.
> I'm no c6x expert, but from a quick look, I think its divrem
> function returns a div/mod pair in A4/A5, which matches the
> ARM convention of returning both results by value.
>
> Does anyone know if the optab function registered by the SPU
> backend is ever called directly?
>
> You mention that this is latent as far as expand_twoval_binop_libfunc
> is concerned.  AIUI expand_twoval_binop_libfunc implements the ARM/c6x
> convention and expects the two values to be returned as a pair.
> It then extracts one half of the pair and discards the other.
> So my worry is that we're leaving the udivmod entry intact even though
> the standard __udivmoddi4 doesn't do what expand_twoval_binop_libfunc
> expects.
>
> Would it make sense to set both entries to null and treat __udivmoddi4
> as a non-optab function?  ARM and c6x could then continue to register
> their current optab functions and a non-null optab function would
> indicate a return value pair.
 AFAIU, there are only three targets (c6x, spu, arm) that override
 optab_libfunc for udivmod_optab for following modes:
 ./c6x/c6x.c:  set_optab_libfunc (udivmod_optab, SImode, 
 "__c6xabi_divremu");
 ./c6x/c6x.c:  set_optab_libfunc (udivmod_optab, DImode, 
 "__c6xabi_divremull");
 ./arm/arm.c:  set_optab_libfunc (udivmod_optab, DImode, 
 "__aeabi_uldivmod");
 ./arm/arm.c:  set_optab_libfunc (udivmod_optab, SImode, 
 "__aeabi_uidivmod");
 ./spu/spu.c:  set_optab_libfunc (udivmod_optab, DImode, "__udivmoddi4");
 ./spu/spu.c:  set_optab_libfunc (udivmod_optab, TImode, "__udivmodti4");

 Out of these only the arm, and c6x have target-specific divmod libfuncs 
 which
 return  pair, while spu merely makes it point to the
 standard functions.

 So we could set libfunc entry for udivmod_optab to NULL, thus dropping
 support for generic
 divmod functions (__udivmoddi4, __udivmodti4). For targets that
 require standard divmod libfuncs like __udivmoddi4,
 they could explicitly override  optab_libfunc and set it to
 __udivmoddi4, just as spu does.

 However this implies that non-null optab function doesn't necessarily
 follow arm/c6x convention.
 (i686-gcc for instance generates call to libgcc routines
 __udivdi3/__umoddi3 for DImode division/mod operations
 and could profit from divmod transform by calling __udivmoddi4).
>>>
>>> What I meant was that we shouldn't treat udivmoddi4 as an optab function
>>> at all, but handle it with some on-the-side mechanism.  That seems like
>>> quite a natural fit if we handle the fused div/mod operation as an
>>> internal function during gimple.
>> Ah right, thanks for pointing out. So if optab function for [us]divmod_optab
>> is defined, then it must follow the arm/c6x convention ?
>>>
>>> I think the current SPU code is wrong, but it looks like a latent bug.
>>> (Like I say, does the udivmodti4 

Re: RFA (libstdc++): PATCH to implement C++17 over-aligned new

2016-09-16 Thread Jonathan Wakely

On 16/09/16 11:37 +0200, Marc Glisse wrote:

On Fri, 16 Sep 2016, Jonathan Wakely wrote:


On 16/09/16 09:04 +0200, Rainer Orth wrote:

Hi Jason,


OK, one more:


this works just fine on both sparc-sun-solaris2.12 and
i386-pc-solaris2.12.

Once Jonathan's patch to heed aligned_alloc's requirement on size being
a multiple of alignment is in, all is fine on Solaris.


I've got a slightly different fix now.

We only need to make the size a multiple of alignment for
aligned_alloc, however for posix_memalign we need to ensure the
alignment is a multiple of sizeof(void*).

I'm testing this now (but only on x86_64 GNU/Linux where it wasn't
failing anyway).


+  // The value of alignment shall be a power of two multiple of sizeof(void *).
+  if (al < sizeof(void*))
+al = sizeof(void*);

The code doesn't exactly match the comment. I can't find the 
precondition in the standard that says operator new can only be called 
on a power of 2... (maybe we can add it if it is really missing?)


[basic.align] says "Every alignment value shall be a non-negative
integral power of two." So asking operator new for any other value
doesn't make sense, but I can't find a restriction on doing so.

I was assuming we only need to ensure it's possible to use valid
alignments such as align_val_t(2) which are not valid arguments to
posix_memalign. For other values such as align_val_t(15) I was
assuming it's OK for posix_memalign to fail, so we throw bad_alloc.

If that's not the case then we need to round up all alignments that
aren't power of two multiples of sizeof(void*). I'd like to avoid
that.


Would using __builtin_expect (sz == 0, false) make sense?  Surely it's
rare to try to allocate zero bytes.


https://gcc.gnu.org/ml/libstdc++/2014-03/msg1.html

gcc already guesses that a test like sz == 0 is usually false (not 
with as large a probability as if you use __builtin_expect, but enough 
that the generated code is unlikely to differ). But adding 
__builtin_expect cannot hurt...


Is the division (by a non-constant denominator) really necessary? 


Probably not, but I've asked the committee for clarification what this
function should do when called with an invalid alignment.

Since align has to be a power of 2, x % align should be the same as x 
& (align - 1), for instance.


Thanks, if it's UB to call it with alignments that aren't a power of
two then we can do that.

I guess people interested in performance will do for aligned new the 
same as for the old new: provide an inline version that skips all the 
overhead to forward directly to malloc/aligned_alloc (and avoid 
questionable calls in their code).


--
Marc Glisse


Re: [PATCH][simplify-rtx] (GTU (PLUS a C) (C - 1)) --> (LTU a -C)

2016-09-16 Thread Bernd Schmidt

On 09/16/2016 10:40 AM, Kyrill Tkachov wrote:


2016-09-16  Kyrylo Tkachov  

* simplify-rtx.c (simplify_relational_operation_1): Add transformation
(GTU (PLUS a C) (C - 1)) --> (LTU a -C).

2016-09-16  Kyrylo Tkachov  

* gcc.target/aarch64/gtu_to_ltu_cmp_1.c: New test.


Ok. Don't know if you want to add more variants of the input code to the 
testcase to make sure they're all covered.



Bernd


Re: [PATCH] have __builtin_object_size handle POINTER_PLUS with non-const offset

2016-09-16 Thread Jakub Jelinek
On Fri, Sep 16, 2016 at 12:29:49PM +0200, Richard Biener wrote:
> > PS What would be a good way to arrange for the VRP pass to run before
> > the object size pass so that the latter can benefit more from range
> > information?  As an experiment I added another instance of the VRP
> > pass before the object size pass in passes.def and that worked, but
> > I suspect that running VRP a third time isn't optimal.

As I said in PR77606, I have strong doubts about desirability of using VRP
info for __builtin_object_size computation.  VRP is an optimization that
assumes undefined behavior does not happen, __builtin_object_size is
typically used to catch undefined behavior, so pretty much assumes undefined
behavior can happen.  So, the __builtin_object_size computations should
prove no other value can appear in any program invocation, rather than
just any valid program invocation.  Sure, it is a best effort, with
possibility to return "unknown" or conservatively correct answers, but using
VRP info is IMHO already too much.

Jakub


Re: [PATCH] have __builtin_object_size handle POINTER_PLUS with non-const offset

2016-09-16 Thread Richard Biener
On Fri, Sep 16, 2016 at 5:29 AM, Martin Sebor  wrote:
> __builtin_object_size fails for POINTER_PLUS expressions involving
> non-constant offsets into objects of known size, causing GCC to fail
> to detect (and add instrumentation to prevent) buffer overflow in
> some basic cases such as the following:
>
>   void f (unsigned i)
>   {
> char d [3];
> memcpy (d + i, "abcdef", 5);
>   }
>
> Since the size of the destination object is known, the call to memcpy
> is guaranteed to write past the end of it regardless of the value of
> the offset.
>
> The attached patch enhances __builtin_object_size to handle this case
> by returning the size of the whole object as the maximum and the size
> of the object minus T_MAX for the type of the offset T as the minimum.
>
> The patch also adds handling of ranges even though only very few cases
> benefit from it because the VRP pass runs after the object size pass.
> The one case that does appear to profit is when the value of the offset
> is constrained by its type, as in
>
>   char a [1000];
>
>   unsigned g (unsigned char i)
>   {
> char *p = a + i;
> return __builtin_object_size (p, 2);
>   }
>
> Here get_range_info () lets __builtin_object_size determine that
> the minimum number of bytes between (a + i) and (a + sizeof s) is
> sizeof a - UCHAR_MAX.
>
> The patch results in 64 more checking calls in a Binutils build than
> before.

I believe that you can't simply use the min/max of the ranges the way you
do nor can you assume zero for the minimum size as op1 might be
negative (for POINTER_PLUS_EXPR you need to interpret the offset
as signed).

Unless I completely misremember how tree-object-size.c works, of course.

Say,

char a[1000];

unsigned g (signed char i)
{
  char *p = [10] + i;
  return __builtin_object_size (p, 1);
}

range for i will be [-128, 127] but we'd like to return 1000 here I think.

Richard.

> Martin
>
> PS What would be a good way to arrange for the VRP pass to run before
> the object size pass so that the latter can benefit more from range
> information?  As an experiment I added another instance of the VRP
> pass before the object size pass in passes.def and that worked, but
> I suspect that running VRP a third time isn't optimal.


Re: [PATCH 1/3] Put a TARGET_LRA_P into every target

2016-09-16 Thread Gerald Pfeifer
On Wed, 14 Sep 2016, Segher Boessenkool wrote:
> 2016-09-14  Segher Boessenkool  
> 
>   * target.def (lra_p): Change commentary (for the manual) for the
>   new default.
>   * doc/tm.texi: Regenerate.

"returns always true" -> "always returns true" ?

(The former is how we'd say it in German, and hence might be common in 
Dutch as well?  In English, both probably are fine, the latter feeling 
more natural to me.  But then, I'm not a native speaker. ;-)

Gerald


Re: [PATCH][simplify-rtx] (GTU (PLUS a C) (C - 1)) --> (LTU a -C)

2016-09-16 Thread Bin.Cheng
On Fri, Sep 16, 2016 at 11:07 AM, Kyrill Tkachov
 wrote:
>
> On 16/09/16 11:05, Bin.Cheng wrote:
>>
>> On Fri, Sep 16, 2016 at 10:53 AM, Kyrill Tkachov
>>  wrote:
>>>
>>> On 16/09/16 10:50, Bin.Cheng wrote:

 On Fri, Sep 16, 2016 at 10:20 AM, Kyrill Tkachov
  wrote:
>
> On 16/09/16 10:02, Richard Biener wrote:
>>
>> On Fri, Sep 16, 2016 at 10:40 AM, Kyrill Tkachov
>>  wrote:
>>>
>>> Hi all,
>>>
>>> Currently the functions:
>>> int f1(int x, int t)
>>> {
>>>  if (x == -1 || x == -2)
>>>t = 1;
>>>  return t;
>>> }
>>>
>>> int f2(int x, int t)
>>> {
>>>  if (x == -1 || x == -2)
>>>return 1;
>>>  return t;
>>> }
>>>
>>> generate different code on AArch64 even though they have identical
>>> functionality:
>>> f1:
>>>add w0, w0, 2
>>>cmp w0, 1
>>>csinc   w0, w1, wzr, hi
>>>ret
>>>
>>> f2:
>>>cmn w0, #2
>>>csinc   w0, w1, wzr, cc
>>>ret
>>>
>>> The problem is that f2 performs the comparison (LTU w0 -2)
>>> whereas f1 performs (GTU (PLUS w0 2) 1). I think it is possible to
>>> simplify
>>> the f1 form
>>> to the f2 form with the simplify-rtx.c rule added in this patch. With
>>> this
>>> patch the
>>> codegen for both f1 and f2 on aarch64 at -O2 is identical (CMN,
>>> CSINC).
>>>
>>> Bootstrapped and tested on arm-none-linux-gnueabihf,
>>> aarch64-none-linux-gnu,
>>> x86_64.
>>> What do you think? Is this a correct generalisation of this issue?
>>> If so, ok for trunk?
>>
>> Do you see a difference on the GIMPLE level?  If so, this kind of
>> transform looks
>> appropriate there, too.
>
>
> The GIMPLE for the two functions looks almost identical:
> f1 (intD.7 xD.3078, intD.7 tD.3079)
> {
> intD.7 x_4(D) = xD.3078;
> intD.7 t_5(D) = tD.3079;
> unsigned int x.0_1;
> unsigned int _2;
> x.0_1 = (unsigned int) x_4(D);
>
> _2 = x.0_1 + 2;
> if (_2 <= 1)
>   goto ;
> else
>   goto ;
> ;;   basic block 3, loop depth 0, count 0, freq 3977, maybe hot
> ;;   basic block 4, loop depth 0, count 0, freq 1, maybe hot
>
> # t_3 = PHI 
> return t_3;
> }
>
> f2 (intD.7 xD.3082, intD.7 tD.3083)
> {
> intD.7 x_4(D) = xD.3082;
> intD.7 t_5(D) = tD.3083;
> unsigned int x.1_1;
> unsigned int _2;
> intD.7 _3;
>
> x.1_1 = (unsigned int) x_4(D);
>
> _2 = x.1_1 + 2;
> if (_2 <= 1)
>   goto ;
> else
>   goto ;
>
> ;;   basic block 3, loop depth 0, count 0, freq 6761, maybe hot
> ;;   basic block 4, loop depth 0, count 0, freq 1, maybe hot
> # _3 = PHI <1(2), t_5(D)(3)>
> return _3;
>
> }
>
> So at GIMPLE level we see a (x + 2 <=u 1) in both cases but with
> slightly
> different CFG.  RTL-level transformations (ce1) bring it to the
> pre-combine
> RTL
> where one does (LTU w0 -2) and the other does (GTU (PLUS w0 2) 1).
>
> So the differences start at RTL level, so I think we need this
> transformation there.
> However, for the testcase:
> unsigned int
> foo (unsigned int a, unsigned int b)
> {
> return (a + 2) > 1;
> }
>
> The differences do appear at GIMPLE level, so I think a match.pd
> pattern
> would help here.

 Hi, may I ask what the function looks like to which this one is
 different
 to?
>>>
>>>
>>> Hi Bin,
>>> I meant to say that the unsigned greater than comparison is retained at
>>> the
>>> GIMPLE level
>>> so could be optimised there.
>>
>> In this case, the resulting gimple code refers to a huge unsigned
>> constant.  It's target dependent if that constant can be encoded.
>> AArch64 has CMN to do that, not sure what other targets' case.  And
>> AArch64 only supports small range of such constants.  May be better to
>> leave it for RTL where we know better if result code is optimal.
>
>
> Well, we are saving a PLUS operation, so the resulting GIMPLE is simpler
Ah, yes, right.

Thanks,
bin


Re: [RFC][IPA-VRP] Early VRP Implementation

2016-09-16 Thread Richard Biener
On Fri, Sep 16, 2016 at 7:59 AM, kugan
 wrote:
> Hi Richard,
>
> Thanks for the review.
>
> On 14/09/16 22:04, Richard Biener wrote:
>>
>> On Tue, Aug 23, 2016 at 4:11 AM, Kugan Vivekanandarajah
>>  wrote:
>>>
>>> Hi,
>>>
>>> On 19 August 2016 at 21:41, Richard Biener 
>>> wrote:

 On Tue, Aug 16, 2016 at 9:45 AM, kugan
  wrote:
>
> Hi Richard,
>
>
> I am now having -ftree-evrp which is enabled all the time. But This
> will
> only be used for disabling the early-vrp. That is, early-vrp will be
> run
> when ftree-vrp is enabled and ftree-evrp is not explicitly disabled. Is
> this
> OK?


 Why would one want to disable early-vrp?  I see you do this in the
 testsuite
 for non-early VRP unit-tests but using -fdisable-tree-evrp1 there
 would be ok as well.
>>>
>>>
>>> Removed it altogether. I though that you wanted a way to disable
>>> early-vrp for testing purposes.
>>
>>
>> But there is via the generic -fdisable-tree-DUMPFILE way.
>
>
> OK. I didnt know about that.
>
>
 Note that you want to have a custom valueize function instead of just
 follow_single_use_edges as you want to valueize all SSA names according
 to their lattice value (if it has a single value).  You can use
 vrp_valueize
 for this though that gets you non-single-use edge following as well.
 Eventually it's going to be cleaner to do what the SSA propagator does
 and
 before folding do

did_replace = replace_uses_in (stmt, vrp_valueize);
if (fold_stmt (, follow_single_use_edges)
|| did_replace)
  update_stmt (gsi_stmt (gsi));

 exporting replace_uses_in for this is ok.  I guess I prefer this for
 now.
>>>
>>>
>>> I also added the above.  I noticed that I need
>>> recompute_tree_invariant_for_addr_expr as in ssa_propagate. My initial
>>> implementation also had gimple_purge_all_dead_eh_edges and
>>> fixup_noreturn_call as in ssa_propagat but I thinj that is not needed
>>> as it would be done at the end of the pass.
>>
>>
>> I don't see this being done at the end of the pass.  So please
>> re-instantiate
>> that parts.
>
>
> I have copied these part as well.
>
>>> With this I noticed more stmts are folded before vrp1. This required
>>> me to adjust some testcases.
>>>

 Overall this now looks good apart from the folding and the
 VR_INITIALIZER thing.

 You can commit the already approved refactoring changes and combine this
 patch with the struct value_range move, this way I can more easily look
 into
 issues with the UNDEFINED thing if you can come up with a testcase that
 doesn't work.

>>>
>>> I have now committed all the dependent patches.
>>>
>>> Attached patch passes regression and bootstrap except pr33738.C. This
>>> is an unrelated issue as discussed in
>>> https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01386.html
>>>
>>> Is this OK?
>>
>>
>> +/* Initialize local data structures for VRP.  If DOM_P is true,
>> +   we will be calling this from early_vrp where value range propagation
>> +   is done by visiting stmts in dominator tree.  ssa_propagate engine
>> +   is not used in this case and that part of the ininitialization will
>> +   be skipped.  */
>> +
>> +static void
>> +vrp_initialize ()
>>
>> comment needs updating now.
>>
> Done.
>
>>
>>  static void
>> -extract_range_from_phi_node (gphi *phi, value_range *vr_result)
>> +extract_range_from_phi_node (gphi *phi, value_range *vr_result,
>> +bool early_vrp_p)
>>  {
>>
>>
>> I don't think you need this changes now that you have
>> stmt_visit_phi_node_in_dom_p
>> guarding its call.
>
>
> OK removed it. That also mean I had to put scev_* in the early_vrp.
>
>
>
>> +static bool
>> +stmt_visit_phi_node_in_dom_p (gphi *phi)
>> +{
>> +  ssa_op_iter iter;
>> +  use_operand_p oprnd;
>> +  tree op;
>> +  value_range *vr;
>> +  FOR_EACH_PHI_ARG (oprnd, phi, iter, SSA_OP_USE)
>> +{
>> +  op = USE_FROM_PTR (oprnd);
>> +  if (TREE_CODE (op) == SSA_NAME)
>> +   {
>> + vr = get_value_range (op);
>> + if (vr->type == VR_UNDEFINED)
>> +   return false;
>> +   }
>> +}
>>
>> I think this is overly conservative in never allowing UNDEFINED on PHI
>> node args (even if the def was actually visited).  I think that the most
>> easy way to improve this bit would be to actually track visited blocks.
>> You already set the EDGE_EXECUTABLE flag on edges so you could
>> clear BB_VISITED on all blocks and set it in the before_dom_children
>> hook (at the end).  Then the above can be folded into the PHI visiting:
>>
>> bool has_unvisited_pred = false;
>> FOR_EACH_EDGE (e, ei, bb->preds)
>>if (!(e->src->flags & BB_VISITED))
>>  {
>> has_unvisited_preds = true;
>> break;
>>   

[committed] Use true/false instead of 1/0 in lvalue_p

2016-09-16 Thread Marek Polacek
lvalue_p returns bool but was using 0/1 as return values.

Bootstrapped/regtested on x86_64-linux, applying to trunk.

2016-09-16  Marek Polacek  

* c-typeck.c (lvalue_p): Use true and false instead of 1 and 0.

diff --git gcc/c/c-typeck.c gcc/c/c-typeck.c
index 4dec397..059ad1f 100644
--- gcc/c/c-typeck.c
+++ gcc/c/c-typeck.c
@@ -4631,7 +4631,7 @@ lvalue_p (const_tree ref)
 
 case COMPOUND_LITERAL_EXPR:
 case STRING_CST:
-  return 1;
+  return true;
 
 case INDIRECT_REF:
 case ARRAY_REF:
@@ -4647,7 +4647,7 @@ lvalue_p (const_tree ref)
   return TREE_CODE (TREE_TYPE (ref)) == ARRAY_TYPE;
 
 default:
-  return 0;
+  return false;
 }
 }
 

Marek


Re: [PATCH 0/8] make next_*_insn take rtx_insn * arguments

2016-09-16 Thread Bernd Schmidt

On 09/16/2016 12:10 PM, Trevor Saunders wrote:

ok, going through all the casts added here I see the following reasons
for them.

- in md files operands is a array of rtx, we should probably have a
  different way to pass insns to the C code here.  That seems worth
  investigating incrementally.

- JUMP_LABEL can be a return which is not an insn.  That also seems
  like something that should be improved at some point.


These just show that fundamentally, rtl is just dynamically typed, and 
used as such, which is why I was never massively enthusiastic about the 
rtx -> rtx_insn conversion to begin with.



Bernd


Re: [PATCH][simplify-rtx] (GTU (PLUS a C) (C - 1)) --> (LTU a -C)

2016-09-16 Thread Kyrill Tkachov


On 16/09/16 11:05, Bin.Cheng wrote:

On Fri, Sep 16, 2016 at 10:53 AM, Kyrill Tkachov
 wrote:

On 16/09/16 10:50, Bin.Cheng wrote:

On Fri, Sep 16, 2016 at 10:20 AM, Kyrill Tkachov
 wrote:

On 16/09/16 10:02, Richard Biener wrote:

On Fri, Sep 16, 2016 at 10:40 AM, Kyrill Tkachov
 wrote:

Hi all,

Currently the functions:
int f1(int x, int t)
{
 if (x == -1 || x == -2)
   t = 1;
 return t;
}

int f2(int x, int t)
{
 if (x == -1 || x == -2)
   return 1;
 return t;
}

generate different code on AArch64 even though they have identical
functionality:
f1:
   add w0, w0, 2
   cmp w0, 1
   csinc   w0, w1, wzr, hi
   ret

f2:
   cmn w0, #2
   csinc   w0, w1, wzr, cc
   ret

The problem is that f2 performs the comparison (LTU w0 -2)
whereas f1 performs (GTU (PLUS w0 2) 1). I think it is possible to
simplify
the f1 form
to the f2 form with the simplify-rtx.c rule added in this patch. With
this
patch the
codegen for both f1 and f2 on aarch64 at -O2 is identical (CMN, CSINC).

Bootstrapped and tested on arm-none-linux-gnueabihf,
aarch64-none-linux-gnu,
x86_64.
What do you think? Is this a correct generalisation of this issue?
If so, ok for trunk?

Do you see a difference on the GIMPLE level?  If so, this kind of
transform looks
appropriate there, too.


The GIMPLE for the two functions looks almost identical:
f1 (intD.7 xD.3078, intD.7 tD.3079)
{
intD.7 x_4(D) = xD.3078;
intD.7 t_5(D) = tD.3079;
unsigned int x.0_1;
unsigned int _2;
x.0_1 = (unsigned int) x_4(D);

_2 = x.0_1 + 2;
if (_2 <= 1)
  goto ;
else
  goto ;
;;   basic block 3, loop depth 0, count 0, freq 3977, maybe hot
;;   basic block 4, loop depth 0, count 0, freq 1, maybe hot

# t_3 = PHI 
return t_3;
}

f2 (intD.7 xD.3082, intD.7 tD.3083)
{
intD.7 x_4(D) = xD.3082;
intD.7 t_5(D) = tD.3083;
unsigned int x.1_1;
unsigned int _2;
intD.7 _3;

x.1_1 = (unsigned int) x_4(D);

_2 = x.1_1 + 2;
if (_2 <= 1)
  goto ;
else
  goto ;

;;   basic block 3, loop depth 0, count 0, freq 6761, maybe hot
;;   basic block 4, loop depth 0, count 0, freq 1, maybe hot
# _3 = PHI <1(2), t_5(D)(3)>
return _3;

}

So at GIMPLE level we see a (x + 2 <=u 1) in both cases but with slightly
different CFG.  RTL-level transformations (ce1) bring it to the
pre-combine
RTL
where one does (LTU w0 -2) and the other does (GTU (PLUS w0 2) 1).

So the differences start at RTL level, so I think we need this
transformation there.
However, for the testcase:
unsigned int
foo (unsigned int a, unsigned int b)
{
return (a + 2) > 1;
}

The differences do appear at GIMPLE level, so I think a match.pd pattern
would help here.

Hi, may I ask what the function looks like to which this one is different
to?


Hi Bin,
I meant to say that the unsigned greater than comparison is retained at the
GIMPLE level
so could be optimised there.

In this case, the resulting gimple code refers to a huge unsigned
constant.  It's target dependent if that constant can be encoded.
AArch64 has CMN to do that, not sure what other targets' case.  And
AArch64 only supports small range of such constants.  May be better to
leave it for RTL where we know better if result code is optimal.


Well, we are saving a PLUS operation, so the resulting GIMPLE is simpler IMO,
which is match.pd's goal.

Thanks,
Kyrill


Thanks,
bin




Re: [PATCH][simplify-rtx] (GTU (PLUS a C) (C - 1)) --> (LTU a -C)

2016-09-16 Thread Bin.Cheng
On Fri, Sep 16, 2016 at 10:53 AM, Kyrill Tkachov
 wrote:
>
> On 16/09/16 10:50, Bin.Cheng wrote:
>>
>> On Fri, Sep 16, 2016 at 10:20 AM, Kyrill Tkachov
>>  wrote:
>>>
>>> On 16/09/16 10:02, Richard Biener wrote:

 On Fri, Sep 16, 2016 at 10:40 AM, Kyrill Tkachov
  wrote:
>
> Hi all,
>
> Currently the functions:
> int f1(int x, int t)
> {
> if (x == -1 || x == -2)
>   t = 1;
> return t;
> }
>
> int f2(int x, int t)
> {
> if (x == -1 || x == -2)
>   return 1;
> return t;
> }
>
> generate different code on AArch64 even though they have identical
> functionality:
> f1:
>   add w0, w0, 2
>   cmp w0, 1
>   csinc   w0, w1, wzr, hi
>   ret
>
> f2:
>   cmn w0, #2
>   csinc   w0, w1, wzr, cc
>   ret
>
> The problem is that f2 performs the comparison (LTU w0 -2)
> whereas f1 performs (GTU (PLUS w0 2) 1). I think it is possible to
> simplify
> the f1 form
> to the f2 form with the simplify-rtx.c rule added in this patch. With
> this
> patch the
> codegen for both f1 and f2 on aarch64 at -O2 is identical (CMN, CSINC).
>
> Bootstrapped and tested on arm-none-linux-gnueabihf,
> aarch64-none-linux-gnu,
> x86_64.
> What do you think? Is this a correct generalisation of this issue?
> If so, ok for trunk?

 Do you see a difference on the GIMPLE level?  If so, this kind of
 transform looks
 appropriate there, too.
>>>
>>>
>>> The GIMPLE for the two functions looks almost identical:
>>> f1 (intD.7 xD.3078, intD.7 tD.3079)
>>> {
>>>intD.7 x_4(D) = xD.3078;
>>>intD.7 t_5(D) = tD.3079;
>>>unsigned int x.0_1;
>>>unsigned int _2;
>>>x.0_1 = (unsigned int) x_4(D);
>>>
>>>_2 = x.0_1 + 2;
>>>if (_2 <= 1)
>>>  goto ;
>>>else
>>>  goto ;
>>> ;;   basic block 3, loop depth 0, count 0, freq 3977, maybe hot
>>> ;;   basic block 4, loop depth 0, count 0, freq 1, maybe hot
>>>
>>># t_3 = PHI 
>>>return t_3;
>>> }
>>>
>>> f2 (intD.7 xD.3082, intD.7 tD.3083)
>>> {
>>>intD.7 x_4(D) = xD.3082;
>>>intD.7 t_5(D) = tD.3083;
>>>unsigned int x.1_1;
>>>unsigned int _2;
>>>intD.7 _3;
>>>
>>>x.1_1 = (unsigned int) x_4(D);
>>>
>>>_2 = x.1_1 + 2;
>>>if (_2 <= 1)
>>>  goto ;
>>>else
>>>  goto ;
>>>
>>> ;;   basic block 3, loop depth 0, count 0, freq 6761, maybe hot
>>> ;;   basic block 4, loop depth 0, count 0, freq 1, maybe hot
>>># _3 = PHI <1(2), t_5(D)(3)>
>>>return _3;
>>>
>>> }
>>>
>>> So at GIMPLE level we see a (x + 2 <=u 1) in both cases but with slightly
>>> different CFG.  RTL-level transformations (ce1) bring it to the
>>> pre-combine
>>> RTL
>>> where one does (LTU w0 -2) and the other does (GTU (PLUS w0 2) 1).
>>>
>>> So the differences start at RTL level, so I think we need this
>>> transformation there.
>>> However, for the testcase:
>>> unsigned int
>>> foo (unsigned int a, unsigned int b)
>>> {
>>>return (a + 2) > 1;
>>> }
>>>
>>> The differences do appear at GIMPLE level, so I think a match.pd pattern
>>> would help here.
>>
>> Hi, may I ask what the function looks like to which this one is different
>> to?
>
>
> Hi Bin,
> I meant to say that the unsigned greater than comparison is retained at the
> GIMPLE level
> so could be optimised there.
In this case, the resulting gimple code refers to a huge unsigned
constant.  It's target dependent if that constant can be encoded.
AArch64 has CMN to do that, not sure what other targets' case.  And
AArch64 only supports small range of such constants.  May be better to
leave it for RTL where we know better if result code is optimal.

Thanks,
bin


Re: [PATCH 0/8] make next_*_insn take rtx_insn * arguments

2016-09-16 Thread Trevor Saunders
On Thu, Sep 15, 2016 at 10:27:56AM -0600, Jeff Law wrote:
> On 09/15/2016 10:10 AM, Trevor Saunders wrote:
> > On Thu, Sep 15, 2016 at 12:52:59PM +0200, Bernd Schmidt wrote:
> > > On 09/14/2016 09:21 PM, tbsaunde+...@tbsaunde.org wrote:
> > > 
> > > > Basically $subject.  First change variable's type to rtx_insn * where 
> > > > possible.
> > > > Then change the functions and fixup callers where it is still necessary 
> > > > to
> > > > cast.
> > > 
> > > #2, #4 and #8 look good and can be applied if they work independently of 
> > > the
> > > others.
> > 
> > at most #2 should depend on #1 so it should be fine and I can check on
> > the others.
> > 
> > > Less certain about some of the others which introduce additional casts.
> > 
> > yeah, its somewhat unfortunate, though one way or another we will need
> > to add casts I think the question is just how many we will accept and
> > where.
> In my mind the casts represent the "bounds" of how far we've taken this
> work.  They occur at the boundaries where we haven't converted something
> from an "rtx" to an "rtx_insn *" and allow us to do the work piecemeal
> rather than all-at-once.
> 
> What I don't have a sense of is when we'll be able to push rtx_insn * far
> enough that we're able to start removing casts.
> 
> And that might be a good way to prioritize the next batch of work.  Pick a
> cast, remove it and deal with the fallout.  :-)
> 
> 
> > 
> > > Maybe LABEL_REF_LABEL needs converting first, and reorg.c has a few
> > > variables that might have to be made rtx_insn * in patch #7 to avoid 
> > > casts.
> > 
> > LABEL_REF_LABEL might be doable, its a good idea I'll look into.  The
> > reorg.c stuff around target_label is rather complicated unfortunately.
> > In the end I of course agree the variables should be rtx_insn *.
> > However currently things are assigned to that variable that are not
> > insns.  So we need to break the variable up, but its involved in a lot
> > of code I don't think I know well enough to really refactor.  For
> > example it looks like target_label can hold a value between iterations
> > of the loop, I suspect that would be a bug, but I'm not really sure.
> I can probably help with reorg.  Hell, you might even be referring to my
> code!

ok, going through all the casts added here I see the following reasons
for them.

- in md files operands is a array of rtx, we should probably have a
  different way to pass insns to the C code here.  That seems worth
  investigating incrementally.

- JUMP_LABEL can be a return which is not an insn.  That also seems
  like something that should be improved at some point.

- tablejump_p returns a label through a rtx * out argument.  I expect
  that isn't hard to fix at this point.

- sh.c uses XEXP (SET_SRC (PATTERN (jump)) 0) with a comment JUMP_LABEL
  might be undefined when not optimizing.  Its not clear to me if that
  is still true.

- var_loc_node::loc is either an expr list or a note, off hand I'm not
  sure what to do with this.

- in reorg.c variables refer to both a insn and other things I think
  this is more results of JUMP_LABEL some times being a return.

I've locally converted LABEL_REF_LABEL, but that only avoids 2 casts.
However it does seem like an improvement so I'll send that shortly.

Trev

> jeff


Re: [PATCH][simplify-rtx] (GTU (PLUS a C) (C - 1)) --> (LTU a -C)

2016-09-16 Thread Kyrill Tkachov


On 16/09/16 10:50, Bin.Cheng wrote:

On Fri, Sep 16, 2016 at 10:20 AM, Kyrill Tkachov
 wrote:

On 16/09/16 10:02, Richard Biener wrote:

On Fri, Sep 16, 2016 at 10:40 AM, Kyrill Tkachov
 wrote:

Hi all,

Currently the functions:
int f1(int x, int t)
{
if (x == -1 || x == -2)
  t = 1;
return t;
}

int f2(int x, int t)
{
if (x == -1 || x == -2)
  return 1;
return t;
}

generate different code on AArch64 even though they have identical
functionality:
f1:
  add w0, w0, 2
  cmp w0, 1
  csinc   w0, w1, wzr, hi
  ret

f2:
  cmn w0, #2
  csinc   w0, w1, wzr, cc
  ret

The problem is that f2 performs the comparison (LTU w0 -2)
whereas f1 performs (GTU (PLUS w0 2) 1). I think it is possible to
simplify
the f1 form
to the f2 form with the simplify-rtx.c rule added in this patch. With
this
patch the
codegen for both f1 and f2 on aarch64 at -O2 is identical (CMN, CSINC).

Bootstrapped and tested on arm-none-linux-gnueabihf,
aarch64-none-linux-gnu,
x86_64.
What do you think? Is this a correct generalisation of this issue?
If so, ok for trunk?

Do you see a difference on the GIMPLE level?  If so, this kind of
transform looks
appropriate there, too.


The GIMPLE for the two functions looks almost identical:
f1 (intD.7 xD.3078, intD.7 tD.3079)
{
   intD.7 x_4(D) = xD.3078;
   intD.7 t_5(D) = tD.3079;
   unsigned int x.0_1;
   unsigned int _2;
   x.0_1 = (unsigned int) x_4(D);

   _2 = x.0_1 + 2;
   if (_2 <= 1)
 goto ;
   else
 goto ;
;;   basic block 3, loop depth 0, count 0, freq 3977, maybe hot
;;   basic block 4, loop depth 0, count 0, freq 1, maybe hot

   # t_3 = PHI 
   return t_3;
}

f2 (intD.7 xD.3082, intD.7 tD.3083)
{
   intD.7 x_4(D) = xD.3082;
   intD.7 t_5(D) = tD.3083;
   unsigned int x.1_1;
   unsigned int _2;
   intD.7 _3;

   x.1_1 = (unsigned int) x_4(D);

   _2 = x.1_1 + 2;
   if (_2 <= 1)
 goto ;
   else
 goto ;

;;   basic block 3, loop depth 0, count 0, freq 6761, maybe hot
;;   basic block 4, loop depth 0, count 0, freq 1, maybe hot
   # _3 = PHI <1(2), t_5(D)(3)>
   return _3;

}

So at GIMPLE level we see a (x + 2 <=u 1) in both cases but with slightly
different CFG.  RTL-level transformations (ce1) bring it to the pre-combine
RTL
where one does (LTU w0 -2) and the other does (GTU (PLUS w0 2) 1).

So the differences start at RTL level, so I think we need this
transformation there.
However, for the testcase:
unsigned int
foo (unsigned int a, unsigned int b)
{
   return (a + 2) > 1;
}

The differences do appear at GIMPLE level, so I think a match.pd pattern
would help here.

Hi, may I ask what the function looks like to which this one is different to?


Hi Bin,
I meant to say that the unsigned greater than comparison is retained at the 
GIMPLE level
so could be optimised there.

Kyrill


Thanks,
bin




Re: [PATCH][simplify-rtx] (GTU (PLUS a C) (C - 1)) --> (LTU a -C)

2016-09-16 Thread Bin.Cheng
On Fri, Sep 16, 2016 at 10:20 AM, Kyrill Tkachov
 wrote:
>
> On 16/09/16 10:02, Richard Biener wrote:
>>
>> On Fri, Sep 16, 2016 at 10:40 AM, Kyrill Tkachov
>>  wrote:
>>>
>>> Hi all,
>>>
>>> Currently the functions:
>>> int f1(int x, int t)
>>> {
>>>if (x == -1 || x == -2)
>>>  t = 1;
>>>return t;
>>> }
>>>
>>> int f2(int x, int t)
>>> {
>>>if (x == -1 || x == -2)
>>>  return 1;
>>>return t;
>>> }
>>>
>>> generate different code on AArch64 even though they have identical
>>> functionality:
>>> f1:
>>>  add w0, w0, 2
>>>  cmp w0, 1
>>>  csinc   w0, w1, wzr, hi
>>>  ret
>>>
>>> f2:
>>>  cmn w0, #2
>>>  csinc   w0, w1, wzr, cc
>>>  ret
>>>
>>> The problem is that f2 performs the comparison (LTU w0 -2)
>>> whereas f1 performs (GTU (PLUS w0 2) 1). I think it is possible to
>>> simplify
>>> the f1 form
>>> to the f2 form with the simplify-rtx.c rule added in this patch. With
>>> this
>>> patch the
>>> codegen for both f1 and f2 on aarch64 at -O2 is identical (CMN, CSINC).
>>>
>>> Bootstrapped and tested on arm-none-linux-gnueabihf,
>>> aarch64-none-linux-gnu,
>>> x86_64.
>>> What do you think? Is this a correct generalisation of this issue?
>>> If so, ok for trunk?
>>
>> Do you see a difference on the GIMPLE level?  If so, this kind of
>> transform looks
>> appropriate there, too.
>
>
> The GIMPLE for the two functions looks almost identical:
> f1 (intD.7 xD.3078, intD.7 tD.3079)
> {
>   intD.7 x_4(D) = xD.3078;
>   intD.7 t_5(D) = tD.3079;
>   unsigned int x.0_1;
>   unsigned int _2;
>   x.0_1 = (unsigned int) x_4(D);
>
>   _2 = x.0_1 + 2;
>   if (_2 <= 1)
> goto ;
>   else
> goto ;
> ;;   basic block 3, loop depth 0, count 0, freq 3977, maybe hot
> ;;   basic block 4, loop depth 0, count 0, freq 1, maybe hot
>
>   # t_3 = PHI 
>   return t_3;
> }
>
> f2 (intD.7 xD.3082, intD.7 tD.3083)
> {
>   intD.7 x_4(D) = xD.3082;
>   intD.7 t_5(D) = tD.3083;
>   unsigned int x.1_1;
>   unsigned int _2;
>   intD.7 _3;
>
>   x.1_1 = (unsigned int) x_4(D);
>
>   _2 = x.1_1 + 2;
>   if (_2 <= 1)
> goto ;
>   else
> goto ;
>
> ;;   basic block 3, loop depth 0, count 0, freq 6761, maybe hot
> ;;   basic block 4, loop depth 0, count 0, freq 1, maybe hot
>   # _3 = PHI <1(2), t_5(D)(3)>
>   return _3;
>
> }
>
> So at GIMPLE level we see a (x + 2 <=u 1) in both cases but with slightly
> different CFG.  RTL-level transformations (ce1) bring it to the pre-combine
> RTL
> where one does (LTU w0 -2) and the other does (GTU (PLUS w0 2) 1).
>
> So the differences start at RTL level, so I think we need this
> transformation there.
> However, for the testcase:
> unsigned int
> foo (unsigned int a, unsigned int b)
> {
>   return (a + 2) > 1;
> }
>
> The differences do appear at GIMPLE level, so I think a match.pd pattern
> would help here.
Hi, may I ask what the function looks like to which this one is different to?

Thanks,
bin


Backports to 6.x

2016-09-16 Thread Jakub Jelinek
Hi!

I've backported a bunch of patches to 6.x, after bootstrapping/regtesting
them on x86_64-linux and i686-linux:

Jakub
2016-09-16  Jakub Jelinek  

Backported from mainline
2016-09-05  Jakub Jelinek  

PR sanitizer/77396
* asan/asan_globals.cc: Cherry-pick upstream r280657.

* g++.dg/asan/pr77396-2.C: New test.

2016-09-02  Jakub Jelinek  

PR sanitizer/77396
* g++.dg/asan/pr77396.C: New test.

--- libsanitizer/asan/asan_globals.cc   (revision 239997)
+++ libsanitizer/asan/asan_globals.cc   (revision 239998)
@@ -248,10 +248,10 @@ void __asan_unregister_globals(__asan_gl
 // initializer can only touch global variables in the same TU.
 void __asan_before_dynamic_init(const char *module_name) {
   if (!flags()->check_initialization_order ||
-  !CanPoisonMemory())
+  !CanPoisonMemory() ||
+  !dynamic_init_globals)
 return;
   bool strict_init_order = flags()->strict_init_order;
-  CHECK(dynamic_init_globals);
   CHECK(module_name);
   CHECK(asan_inited);
   BlockingMutexLock lock(_for_globals);
@@ -274,7 +274,8 @@ void __asan_before_dynamic_init(const ch
 // TU are poisoned.  It simply unpoisons all dynamically initialized globals.
 void __asan_after_dynamic_init() {
   if (!flags()->check_initialization_order ||
-  !CanPoisonMemory())
+  !CanPoisonMemory() ||
+  !dynamic_init_globals)
 return;
   CHECK(asan_inited);
   BlockingMutexLock lock(_for_globals);
--- gcc/testsuite/g++.dg/asan/pr77396.C (revision 0)
+++ gcc/testsuite/g++.dg/asan/pr77396.C (revision 239961)
@@ -0,0 +1,12 @@
+// PR sanitizer/77396
+// { dg-do run }
+// { dg-set-target-env-var ASAN_OPTIONS "check_initialization_order=true" }
+
+static int a = 0; 
+static int b = a; 
+
+int
+main ()
+{
+  return 0;
+}
--- gcc/testsuite/g++.dg/asan/pr77396-2.C   (revision 0)
+++ gcc/testsuite/g++.dg/asan/pr77396-2.C   (revision 239998)
@@ -0,0 +1,12 @@
+// PR sanitizer/77396
+// { dg-do run }
+// { dg-set-target-env-var ASAN_OPTIONS "check_initialization_order=true" }
+
+struct S { S () { asm volatile ("" : : : "memory"); } };
+static S c;
+
+int
+main ()
+{
+  return 0;
+}
2016-09-16  Jakub Jelinek  

Backported from mainline
2016-09-06  Jakub Jelinek  

PR target/69255
* config/i386/i386.c (ix86_expand_builtin): For builtin with
unsupported or unknown ISA, use expand_call.

* gcc.target/i386/pr69255-1.c: New test.
* gcc.target/i386/pr69255-2.c: New test.
* gcc.target/i386/pr69255-3.c: New test.

--- gcc/config/i386/i386.c  (revision 240013)
+++ gcc/config/i386/i386.c  (revision 240014)
@@ -36107,7 +36107,7 @@ ix86_expand_builtin (tree exp, rtx targe
  error ("%qE needs isa option %s", fndecl, opts);
  free (opts);
}
-  return const0_rtx;
+  return expand_call (exp, target, ignore);
 }
 
   switch (fcode)
--- gcc/testsuite/gcc.target/i386/pr69255-1.c   (revision 0)
+++ gcc/testsuite/gcc.target/i386/pr69255-1.c   (revision 240014)
@@ -0,0 +1,17 @@
+/* PR target/69255 */
+/* { dg-do compile } */
+/* { dg-options "-msse4 -mno-avx" } */
+
+#pragma GCC target "avx512vl"
+#pragma GCC target "no-avx512vl"
+__attribute__ ((__vector_size__ (32))) long long a;
+__attribute__ ((__vector_size__ (16))) int b;
+
+void
+foo (const long long *p)
+{
+  a = __builtin_ia32_gather3siv4di (a, p, b, 1, 1);/* { dg-error "needs 
isa option -m32 -mavx512vl" } */
+}
+
+/* { dg-warning "AVX vector return without AVX enabled changes the ABI" "" { 
target *-*-* } 13 } */
+/* { dg-warning "AVX vector argument without AVX enabled changes the ABI" "" { 
target *-*-* } 13 } */
--- gcc/testsuite/gcc.target/i386/pr69255-2.c   (revision 0)
+++ gcc/testsuite/gcc.target/i386/pr69255-2.c   (revision 240014)
@@ -0,0 +1,17 @@
+/* PR target/69255 */
+/* { dg-do compile } */
+/* { dg-options "-msse4 -mno-avx" } */
+
+#pragma GCC target "avx512vl"
+#pragma GCC target ""
+__attribute__ ((__vector_size__ (32))) long long a;
+__attribute__ ((__vector_size__ (16))) int b;
+
+void
+foo (const long long *p)
+{
+  __builtin_ia32_gather3siv4di (a, p, b, 1, 1);/* { dg-error 
"needs isa option -m32 -mavx512vl" } */
+}
+
+/* { dg-warning "AVX vector return without AVX enabled changes the ABI" "" { 
target *-*-* } 13 } */
+/* { dg-warning "AVX vector argument without AVX enabled changes the ABI" "" { 
target *-*-* } 13 } */
--- gcc/testsuite/gcc.target/i386/pr69255-3.c   (revision 0)
+++ gcc/testsuite/gcc.target/i386/pr69255-3.c   (revision 240014)
@@ -0,0 +1,17 @@
+/* PR target/69255 */
+/* { dg-do compile } */
+/* { dg-options "-msse4 -mno-avx" } */
+
+#pragma GCC target "avx512vl"
+#pragma GCC target ""
+__attribute__ ((__vector_size__ (32))) long long a;
+__attribute__ ((__vector_size__ (16))) int b;
+
+void
+foo (const long long *p, __attribute__ ((__vector_size__ (32))) long 

Re: RFA (libstdc++): PATCH to implement C++17 over-aligned new

2016-09-16 Thread Marc Glisse

On Fri, 16 Sep 2016, Jonathan Wakely wrote:


On 16/09/16 09:04 +0200, Rainer Orth wrote:

Hi Jason,


OK, one more:


this works just fine on both sparc-sun-solaris2.12 and
i386-pc-solaris2.12.

Once Jonathan's patch to heed aligned_alloc's requirement on size being
a multiple of alignment is in, all is fine on Solaris.


I've got a slightly different fix now.

We only need to make the size a multiple of alignment for
aligned_alloc, however for posix_memalign we need to ensure the
alignment is a multiple of sizeof(void*).

I'm testing this now (but only on x86_64 GNU/Linux where it wasn't
failing anyway).


+  // The value of alignment shall be a power of two multiple of sizeof(void *).
+  if (al < sizeof(void*))
+al = sizeof(void*);

The code doesn't exactly match the comment. I can't find the precondition 
in the standard that says operator new can only be called on a power of 
2... (maybe we can add it if it is really missing?)



Would using __builtin_expect (sz == 0, false) make sense?  Surely it's
rare to try to allocate zero bytes.


https://gcc.gnu.org/ml/libstdc++/2014-03/msg1.html

gcc already guesses that a test like sz == 0 is usually false (not with as 
large a probability as if you use __builtin_expect, but enough that the 
generated code is unlikely to differ). But adding __builtin_expect cannot 
hurt...


Is the division (by a non-constant denominator) really necessary? Since 
align has to be a power of 2, x % align should be the same as x & (align - 
1), for instance.


I guess people interested in performance will do for aligned new the same 
as for the old new: provide an inline version that skips all the overhead 
to forward directly to malloc/aligned_alloc (and avoid questionable calls 
in their code).


--
Marc Glisse


Re: [PATCH][simplify-rtx] (GTU (PLUS a C) (C - 1)) --> (LTU a -C)

2016-09-16 Thread Kyrill Tkachov


On 16/09/16 10:02, Richard Biener wrote:

On Fri, Sep 16, 2016 at 10:40 AM, Kyrill Tkachov
 wrote:

Hi all,

Currently the functions:
int f1(int x, int t)
{
   if (x == -1 || x == -2)
 t = 1;
   return t;
}

int f2(int x, int t)
{
   if (x == -1 || x == -2)
 return 1;
   return t;
}

generate different code on AArch64 even though they have identical
functionality:
f1:
 add w0, w0, 2
 cmp w0, 1
 csinc   w0, w1, wzr, hi
 ret

f2:
 cmn w0, #2
 csinc   w0, w1, wzr, cc
 ret

The problem is that f2 performs the comparison (LTU w0 -2)
whereas f1 performs (GTU (PLUS w0 2) 1). I think it is possible to simplify
the f1 form
to the f2 form with the simplify-rtx.c rule added in this patch. With this
patch the
codegen for both f1 and f2 on aarch64 at -O2 is identical (CMN, CSINC).

Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu,
x86_64.
What do you think? Is this a correct generalisation of this issue?
If so, ok for trunk?

Do you see a difference on the GIMPLE level?  If so, this kind of
transform looks
appropriate there, too.


The GIMPLE for the two functions looks almost identical:
f1 (intD.7 xD.3078, intD.7 tD.3079)
{
  intD.7 x_4(D) = xD.3078;
  intD.7 t_5(D) = tD.3079;
  unsigned int x.0_1;
  unsigned int _2;
  x.0_1 = (unsigned int) x_4(D);

  _2 = x.0_1 + 2;
  if (_2 <= 1)
goto ;
  else
goto ;
;;   basic block 3, loop depth 0, count 0, freq 3977, maybe hot
;;   basic block 4, loop depth 0, count 0, freq 1, maybe hot

  # t_3 = PHI 
  return t_3;
}

f2 (intD.7 xD.3082, intD.7 tD.3083)
{
  intD.7 x_4(D) = xD.3082;
  intD.7 t_5(D) = tD.3083;
  unsigned int x.1_1;
  unsigned int _2;
  intD.7 _3;

  x.1_1 = (unsigned int) x_4(D);

  _2 = x.1_1 + 2;
  if (_2 <= 1)
goto ;
  else
goto ;

;;   basic block 3, loop depth 0, count 0, freq 6761, maybe hot
;;   basic block 4, loop depth 0, count 0, freq 1, maybe hot
  # _3 = PHI <1(2), t_5(D)(3)>
  return _3;

}

So at GIMPLE level we see a (x + 2 <=u 1) in both cases but with slightly
different CFG.  RTL-level transformations (ce1) bring it to the pre-combine RTL
where one does (LTU w0 -2) and the other does (GTU (PLUS w0 2) 1).

So the differences start at RTL level, so I think we need this transformation 
there.
However, for the testcase:
unsigned int
foo (unsigned int a, unsigned int b)
{
  return (a + 2) > 1;
}

The differences do appear at GIMPLE level, so I think a match.pd pattern would 
help here.
I'll look into adding one there as well, but that would be independent of this 
patch.

Thanks,
Kyrill


Richard.


Thanks,
Kyrill

2016-09-16  Kyrylo Tkachov  

 * simplify-rtx.c (simplify_relational_operation_1): Add transformation
 (GTU (PLUS a C) (C - 1)) --> (LTU a -C).

2016-09-16  Kyrylo Tkachov  

 * gcc.target/aarch64/gtu_to_ltu_cmp_1.c: New test.




Re: [PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-16 Thread Jan Hubicka
> On Fri, Sep 16, 2016 at 10:50 AM, Yuan, Pengfei  wrote:
> >> > Here are the results:
> >> >
> >> > Param  Size (GCC5)Time (GCC5)  Time (GCC7)
> >> > 0  44686265 (-8.26%)  58.772s  66.332s
> >> > 1  45692793 (-6.19%)  40.684s  39.220s
> >> > 2  45556185 (-6.47%)  35.292s  34.328s
> >> > 3  46251049 (-5.05%)  28.820s  27.136s
> >> > 4  47028873 (-3.45%)  24.616s  22.200s
> >> > 5  47495641 (-2.49%)  20.160s  17.800s
> >> > 6  47520153 (-2.44%)  16.444s  15.656s
> >> > 14 48708873   5.620s   5.556s
> >>
> >> Thanks for data! I meant to run the benchmark myself, but had little time 
> >> to do
> >> it over past week becuase of traveling and was also wondering what to do 
> >> given
> >> that spec is rather poor benchmark in this area. Tramp3d is biassed but we 
> >> are
> >> in stage1 and can fine tune latter. I am debugging the libxul crashes in 
> >> FDO
> >> binary now, so we can re-run talos.
> >
> > It seems to be memory corruption. The content of a pointer becomes 
> > 0xe5e5...e5.
> > I have tried Firefox 48.0.2, 49b10 and mozilla-central with GCC 6. Same 
> > error.
> >
> >> > Param: value of PARAM_EARLY_INLINING_INSNS
> >> > Size:  code size (.text) of optimized libxul.so
> >> > Time:  execution time of instrumented tramp3d (-n 25)
> >> >
> >> > To balance between size reduction of optimized binary and speed penalty
> >> > of instrumented binary, I set param=6 as baseline and compare:
> >> >
> >> > Param  Size score  Time score  Total
> >> > 0  3.39-3.57   -0.18
> >> > 1  2.54-2.470.07
> >> > 2  2.65-2.150.50
> >> > 3  2.07-1.750.32
> >> > 4  1.41-1.50   -0.09
> >> > 5  1.02-1.23   -0.21
> >> > 6  1.00-1.000.00
> >> > 14 0.00-0.34   -0.34
> >> >
> >> > Therefore, I think param=2 is the best choice.
> >> >
> >> > Is the attached patch OK?
> >>
> >> Setting param to 2 looks fine
> >>
> >> > gcc/ChangeLog
> >> > * opts.c (finish_options): Adjust PARAM_EARLY_INLINING_INSNS
> >> > when FDO is enabled.
> >> >
> >> >
> >> > diff --git a/gcc/opts.c b/gcc/opts.c
> >> > index 39c190d..b59c700 100644
> >> > --- a/gcc/opts.c
> >> > +++ b/gcc/opts.c
> >> > @@ -826,8 +826,14 @@ finish_options (struct gcc_options *opts, struct 
> >> > gcc_options *opts_set,
> >> >maybe_set_param_value (PARAM_STACK_FRAME_GROWTH, 40,
> >> >  opts->x_param_values, 
> >> > opts_set->x_param_values);
> >> >  }
> >> >
> >> > +  /* Adjust PARAM_EARLY_INLINING_INSNS when FDO is enabled.  */
> >> > +  if ((opts->x_profile_arc_flag && !opts->x_flag_test_coverage)
> >> > +  || (opts->x_flag_branch_probabilities && 
> >> > !opts->x_flag_auto_profile))
> >> > +maybe_set_param_value (PARAM_EARLY_INLINING_INSNS, 2,
> >> > +  opts->x_param_values, 
> >> > opts_set->x_param_values);
> >> > +
> >>
> >> I would actually preffer to have PARAM_EARLY_ININING_INSNS_FEEDBACK.  We
> >> already have TRACER_DYNAMIC_COVERAGE_FEEDBACK and other params.  The 
> >> reason is
> >> that profile is not a global property of program. It may or may not be
> >> available for given function, while params are global.
> >
> > Whether profile is available for specific functions is not important here 
> > because
> > profile is loaded after pass_early_inline. Therefore I think setting the 
> > param
> > globally is fine.
> 
> I also like a new param better as it avoids a new magic constant and
> makes playing with
> it easier (your patch removes the ability to do statistics like you did via 
> the
> early-inlining-insns parameter by forcing it to two).

Hmm, you are right that you do not know if this particular function will get
profile (forgot about that).  Still, please use two params - it is more
consistent with what we have now and we may make it profile specific in
future..

Honza
> 
> Thanks,
> Richard.


Re: [PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-16 Thread Richard Biener
On Fri, Sep 16, 2016 at 10:50 AM, Yuan, Pengfei  wrote:
>> > Here are the results:
>> >
>> > Param  Size (GCC5)Time (GCC5)  Time (GCC7)
>> > 0  44686265 (-8.26%)  58.772s  66.332s
>> > 1  45692793 (-6.19%)  40.684s  39.220s
>> > 2  45556185 (-6.47%)  35.292s  34.328s
>> > 3  46251049 (-5.05%)  28.820s  27.136s
>> > 4  47028873 (-3.45%)  24.616s  22.200s
>> > 5  47495641 (-2.49%)  20.160s  17.800s
>> > 6  47520153 (-2.44%)  16.444s  15.656s
>> > 14 48708873   5.620s   5.556s
>>
>> Thanks for data! I meant to run the benchmark myself, but had little time to 
>> do
>> it over past week becuase of traveling and was also wondering what to do 
>> given
>> that spec is rather poor benchmark in this area. Tramp3d is biassed but we 
>> are
>> in stage1 and can fine tune latter. I am debugging the libxul crashes in FDO
>> binary now, so we can re-run talos.
>
> It seems to be memory corruption. The content of a pointer becomes 
> 0xe5e5...e5.
> I have tried Firefox 48.0.2, 49b10 and mozilla-central with GCC 6. Same error.
>
>> > Param: value of PARAM_EARLY_INLINING_INSNS
>> > Size:  code size (.text) of optimized libxul.so
>> > Time:  execution time of instrumented tramp3d (-n 25)
>> >
>> > To balance between size reduction of optimized binary and speed penalty
>> > of instrumented binary, I set param=6 as baseline and compare:
>> >
>> > Param  Size score  Time score  Total
>> > 0  3.39-3.57   -0.18
>> > 1  2.54-2.470.07
>> > 2  2.65-2.150.50
>> > 3  2.07-1.750.32
>> > 4  1.41-1.50   -0.09
>> > 5  1.02-1.23   -0.21
>> > 6  1.00-1.000.00
>> > 14 0.00-0.34   -0.34
>> >
>> > Therefore, I think param=2 is the best choice.
>> >
>> > Is the attached patch OK?
>>
>> Setting param to 2 looks fine
>>
>> > gcc/ChangeLog
>> > * opts.c (finish_options): Adjust PARAM_EARLY_INLINING_INSNS
>> > when FDO is enabled.
>> >
>> >
>> > diff --git a/gcc/opts.c b/gcc/opts.c
>> > index 39c190d..b59c700 100644
>> > --- a/gcc/opts.c
>> > +++ b/gcc/opts.c
>> > @@ -826,8 +826,14 @@ finish_options (struct gcc_options *opts, struct 
>> > gcc_options *opts_set,
>> >maybe_set_param_value (PARAM_STACK_FRAME_GROWTH, 40,
>> >  opts->x_param_values, 
>> > opts_set->x_param_values);
>> >  }
>> >
>> > +  /* Adjust PARAM_EARLY_INLINING_INSNS when FDO is enabled.  */
>> > +  if ((opts->x_profile_arc_flag && !opts->x_flag_test_coverage)
>> > +  || (opts->x_flag_branch_probabilities && 
>> > !opts->x_flag_auto_profile))
>> > +maybe_set_param_value (PARAM_EARLY_INLINING_INSNS, 2,
>> > +  opts->x_param_values, opts_set->x_param_values);
>> > +
>>
>> I would actually preffer to have PARAM_EARLY_ININING_INSNS_FEEDBACK.  We
>> already have TRACER_DYNAMIC_COVERAGE_FEEDBACK and other params.  The reason 
>> is
>> that profile is not a global property of program. It may or may not be
>> available for given function, while params are global.
>
> Whether profile is available for specific functions is not important here 
> because
> profile is loaded after pass_early_inline. Therefore I think setting the param
> globally is fine.

I also like a new param better as it avoids a new magic constant and
makes playing with
it easier (your patch removes the ability to do statistics like you did via the
early-inlining-insns parameter by forcing it to two).

Thanks,
Richard.

>> Even at compile time profile may be selectively missing for example for
>> COMDATs that did not win in the linking process.
>>
>> There is also need to update the documentation.
>> Thanks for the work!
>> Honza
>
> Here is the new patch.
>
> Regards,
> Yuan, Pengfei
>
>
> gcc/ChangeLog
> * doc/invoke.texi (early-inlining-insns): Value is adjusted
> when FDO is enabled.
> * opts.c (finish_options): Adjust PARAM_EARLY_INLINING_INSNS
> when FDO is enabled.
>
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 1ca4dcc..880a28c 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -10272,9 +10272,11 @@ The default value is 10.
>
>  @item early-inlining-insns
>  Specify growth that the early inliner can make.  In effect it increases
>  the amount of inlining for code having a large abstraction penalty.
> -The default value is 14.
> +The default value is 14.  When feedback-directed optimizations is enabled (by
> +@option{-fprofile-generate} or @option{-fprofile-use}), the value is adjusted
> +to 2.
>
>  @item max-early-inliner-iterations
>  Limit of iterations of the early inliner.  This basically bounds
>  the number of nested indirect calls the early inliner can resolve.
> diff --git a/gcc/opts.c b/gcc/opts.c
> index 39c190d..b59c700 100644
> --- a/gcc/opts.c
> +++ b/gcc/opts.c
> @@ -826,8 +826,14 @@ finish_options (struct 

Re: RFA (libstdc++): PATCH to implement C++17 over-aligned new

2016-09-16 Thread Jonathan Wakely

On 16/09/16 09:04 +0200, Rainer Orth wrote:

Hi Jason,


OK, one more:


this works just fine on both sparc-sun-solaris2.12 and
i386-pc-solaris2.12.

Once Jonathan's patch to heed aligned_alloc's requirement on size being
a multiple of alignment is in, all is fine on Solaris.


I've got a slightly different fix now.

We only need to make the size a multiple of alignment for
aligned_alloc, however for posix_memalign we need to ensure the
alignment is a multiple of sizeof(void*).

I'm testing this now (but only on x86_64 GNU/Linux where it wasn't
failing anyway).

Would using __builtin_expect (sz == 0, false) make sense?  Surely it's
rare to try to allocate zero bytes.

commit 216b9547230295e615bab86aaede65554f63e57d
Author: Jonathan Wakely 
Date:   Fri Sep 16 09:54:51 2016 +0100

Adjust arguments to aligned_alloc or posix_memalign

	* libsupc++/new_opa.cc [_GLIBCXX_HAVE_POSIX_MEMALIGN] (aligned_alloc):
	Increase alignment if less than sizeof(void*).
	[_GLIBCXX_HAVE_ALIGNED_ALLOC] (operator new(size_t, align_val_t)):
	Increase size if not a multiple of alignment.

diff --git a/libstdc++-v3/libsupc++/new_opa.cc b/libstdc++-v3/libsupc++/new_opa.cc
index 6ff5421..9c859c1 100644
--- a/libstdc++-v3/libsupc++/new_opa.cc
+++ b/libstdc++-v3/libsupc++/new_opa.cc
@@ -39,6 +39,9 @@ static inline void*
 aligned_alloc (std::size_t al, std::size_t sz)
 {
   void *ptr;
+  // The value of alignment shall be a power of two multiple of sizeof(void *).
+  if (al < sizeof(void*))
+al = sizeof(void*);
   int ret = posix_memalign (, al, sz);
   if (ret == 0)
 return ptr;
@@ -58,13 +61,19 @@ _GLIBCXX_WEAK_DEFINITION void *
 operator new (std::size_t sz, std::align_val_t al)
 {
   void *p;
+  std::size_t align = (std::size_t)al;
 
   /* malloc (0) is unpredictable; avoid it.  */
   if (sz == 0)
 sz = 1;
 
-  while (__builtin_expect ((p = aligned_alloc ((std::size_t)al, sz)) == 0,
-			   false))
+#if _GLIBCXX_HAVE_ALIGNED_ALLOC
+  /* C11: the value of size shall be an integral multiple of alignment.  */
+  if (std::size_t rem = sz % align)
+sz += align - rem;
+#endif
+
+  while (__builtin_expect ((p = aligned_alloc (align, sz)) == 0, false))
 {
   new_handler handler = std::get_new_handler ();
   if (! handler)


Re: C/C++ PATCH for c/77423 (bogus warning with -Wlogical-not-parentheses)

2016-09-16 Thread Marek Polacek
On Thu, Sep 15, 2016 at 01:21:21PM -0600, Jeff Law wrote:
> On 09/05/2016 08:28 AM, Marek Polacek wrote:
> > > test.c:10:8: note: add parentheses around left hand side expression to
> > > silence this warning
> > > r += !a == ~b;
> > >  ^~
> > >  ( )
> > > 
> > > this will not fix it, but make  it worse.
> > > I think a better warning would be
> > > warning: ~ on boolean value, did you mean ! ?
> > 
> > Could you please open a PR?  I'll take care of it.
> > 
> > Still not sure about other operations.  I guess no one would
> > object to warning on bool1 % bool2, but should we warn for
> > bool1 + bool2?
> Wouldn't the desire for a warning largely depend on the type of the result?
> So I'll assume you're referring to a boolean result :-)
> 
> bool1 + bool2 does have meaning though, even when the result is a bool. You
> have to be leery of both having a true value as that causes an overflow.

Yea.  The version of the patch I posted doesn't warn for addition of booleans.

Marek


Re: [PATCH][simplify-rtx] (GTU (PLUS a C) (C - 1)) --> (LTU a -C)

2016-09-16 Thread Richard Biener
On Fri, Sep 16, 2016 at 10:40 AM, Kyrill Tkachov
 wrote:
> Hi all,
>
> Currently the functions:
> int f1(int x, int t)
> {
>   if (x == -1 || x == -2)
> t = 1;
>   return t;
> }
>
> int f2(int x, int t)
> {
>   if (x == -1 || x == -2)
> return 1;
>   return t;
> }
>
> generate different code on AArch64 even though they have identical
> functionality:
> f1:
> add w0, w0, 2
> cmp w0, 1
> csinc   w0, w1, wzr, hi
> ret
>
> f2:
> cmn w0, #2
> csinc   w0, w1, wzr, cc
> ret
>
> The problem is that f2 performs the comparison (LTU w0 -2)
> whereas f1 performs (GTU (PLUS w0 2) 1). I think it is possible to simplify
> the f1 form
> to the f2 form with the simplify-rtx.c rule added in this patch. With this
> patch the
> codegen for both f1 and f2 on aarch64 at -O2 is identical (CMN, CSINC).
>
> Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu,
> x86_64.
> What do you think? Is this a correct generalisation of this issue?
> If so, ok for trunk?

Do you see a difference on the GIMPLE level?  If so, this kind of
transform looks
appropriate there, too.

Richard.

> Thanks,
> Kyrill
>
> 2016-09-16  Kyrylo Tkachov  
>
> * simplify-rtx.c (simplify_relational_operation_1): Add transformation
> (GTU (PLUS a C) (C - 1)) --> (LTU a -C).
>
> 2016-09-16  Kyrylo Tkachov  
>
> * gcc.target/aarch64/gtu_to_ltu_cmp_1.c: New test.


Re: C/C++ PATCH to implement -Wbool-operation (PR c/77490)

2016-09-16 Thread Marek Polacek
On Thu, Sep 15, 2016 at 04:45:03PM -0400, Eric Gallager wrote:
> > +@item -Wbool-operation
> > +@opindex Wno-bool-operation
> > +@opindex Wbool-operation
> > +Warn about suspicious operations on expressions of a boolean type.  For
> > +instance, bitwise negation of a boolean is very likely a bug in the 
> > program.
> > +For C, this warning also warns about incrementing or decrementing a 
> > boolean,
> > +which rarely makes sense.
> > +
> 
> 
> I'd also mention that C++ now warns about incrementing and
> decrementing anyways, even without this option.

Ok, I could have added that.  Done now.

> > --- gcc/testsuite/gcc.dg/Wall.c
> > +++ gcc/testsuite/gcc.dg/Wall.c
> > @@ -1,7 +1,7 @@
> >  /* PR 30437: Test -Wall
> > Don't change this without changing Wno-all.c as well.  */
> >  /* { dg-do compile } */
> > -/* { dg-options "-Wall" } */
> > +/* { dg-options "-Wall -Wno-bool-operation" } */
> >
> 
> 
> Why did -Wno-bool-operation need to be added here?

Oops, that is a mistake.  This is a remnant from the previous version.
Removed.

> Anyways, thanks for your work on this; I hope it goes in soon!

Thanks.

Bootstrapped/regtested on x86_64-linux and ppc64-linux, ok for trunk?

2016-09-16  Marek Polacek  

PR c/77490
* c.opt (Wbool-operation): New.

* c-typeck.c (build_unary_op): Warn about bit not on expressions that
have boolean value.  Warn about ++/-- on booleans.

* typeck.c (cp_build_unary_op): Warn about bit not on expressions that
have boolean value.

* doc/invoke.texi: Document -Wbool-operation.

* c-c++-common/Wbool-operation-1.c: New test.
* gcc.dg/Wbool-operation-1.c: New test.

diff --git gcc/c-family/c.opt gcc/c-family/c.opt
index c55c7c3..fb6f2d1 100644
--- gcc/c-family/c.opt
+++ gcc/c-family/c.opt
@@ -315,6 +315,10 @@ Wbool-compare
 C ObjC C++ ObjC++ Var(warn_bool_compare) Warning LangEnabledBy(C ObjC C++ 
ObjC++,Wall)
 Warn about boolean expression compared with an integer value different from 
true/false.
 
+Wbool-operation
+C ObjC C++ ObjC++ Var(warn_bool_op) Warning LangEnabledBy(C ObjC C++ 
ObjC++,Wall)
+Warn about certain operations on boolean expressions.
+
 Wframe-address
 C ObjC C++ ObjC++ Var(warn_frame_address) Warning LangEnabledBy(C ObjC C++ 
ObjC++,Wall)
 Warn when __builtin_frame_address or __builtin_return_address is used unsafely.
diff --git gcc/c/c-typeck.c gcc/c/c-typeck.c
index 4dec397..c455d22 100644
--- gcc/c/c-typeck.c
+++ gcc/c/c-typeck.c
@@ -4196,6 +4196,22 @@ build_unary_op (location_t location, enum tree_code 
code, tree xarg,
  || (typecode == VECTOR_TYPE
  && !VECTOR_FLOAT_TYPE_P (TREE_TYPE (arg
{
+ tree e = arg;
+
+ /* Warn if the expression has boolean value.  */
+ while (TREE_CODE (e) == COMPOUND_EXPR)
+   e = TREE_OPERAND (e, 1);
+
+ if ((TREE_CODE (TREE_TYPE (arg)) == BOOLEAN_TYPE
+  || truth_value_p (TREE_CODE (e)))
+ && warning_at (location, OPT_Wbool_operation,
+"%<~%> on a boolean expression"))
+   {
+ gcc_rich_location richloc (location);
+ richloc.add_fixit_insert_before (location, "!");
+ inform_at_rich_loc (, "did you mean to use logical "
+ "not?");
+   }
  if (!noconvert)
arg = default_conversion (arg);
}
@@ -4306,6 +4322,16 @@ build_unary_op (location_t location, enum tree_code 
code, tree xarg,
"decrement of enumeration value is invalid in C++");
}
 
+  if (TREE_CODE (TREE_TYPE (arg)) == BOOLEAN_TYPE)
+   {
+ if (code == PREINCREMENT_EXPR || code == POSTINCREMENT_EXPR)
+   warning_at (location, OPT_Wbool_operation,
+   "increment of a boolean expression");
+ else
+   warning_at (location, OPT_Wbool_operation,
+   "decrement of a boolean expression");
+   }
+
   /* Ensure the argument is fully folded inside any SAVE_EXPR.  */
   arg = c_fully_fold (arg, false, NULL);
 
diff --git gcc/cp/typeck.c gcc/cp/typeck.c
index c53a85a..9fc1a4b 100644
--- gcc/cp/typeck.c
+++ gcc/cp/typeck.c
@@ -5853,7 +5853,16 @@ cp_build_unary_op (enum tree_code code, tree xarg, bool 
noconvert,
   arg, true)))
errstring = _("wrong type argument to bit-complement");
   else if (!noconvert && CP_INTEGRAL_TYPE_P (TREE_TYPE (arg)))
-   arg = cp_perform_integral_promotions (arg, complain);
+   {
+ /* Warn if the expression has boolean value.  */
+ location_t location = EXPR_LOC_OR_LOC (arg, input_location);
+ if ((TREE_CODE (TREE_TYPE (arg)) == BOOLEAN_TYPE
+  || truth_value_p (TREE_CODE (arg)))
+ && warning_at (location, OPT_Wbool_operation,
+"%<~%> on a boolean expression"))
+   

Re: [PR lto/77458] Avoid ICE in offloading with differing _FloatN, _FloatNx types (was: Advice sought for debugging a lto1 ICE (was: Implement C _FloatN, _FloatNx types [version 6]))

2016-09-16 Thread Richard Biener
On Fri, Sep 16, 2016 at 9:05 AM, Thomas Schwinge
 wrote:
> Hi!
>
> (CCing Bernd and Jakub -- for your information, or: "amusement" -- as
> you've discussed offloading preload_common_nodes issues before...)
>
> Got to look into this some more, yesterday:
>
> On Thu, 08 Sep 2016 13:43:30 +0200, I wrote:
>> On Wed, 7 Sep 2016 14:23:18 +0200, Richard Biener 
>>  wrote:
>> > On Wed, Sep 7, 2016 at 1:52 PM, Thomas Schwinge  
>> > wrote:
>> > > On Fri, 19 Aug 2016 11:05:59 +, Joseph Myers 
>> > >  wrote:
>> > >> On Fri, 19 Aug 2016, Richard Biener wrote:
>> > >> > Can you quickly verify if LTO works with the new types?  I don't see 
>> > >> > anything
>> > >> > that would prevent it but having new global trees and backends 
>> > >> > initializing them
>> > >> > might come up with surprises (see 
>> > >> > tree-streamer.c:preload_common_nodes)
>> > >>
>> > >> Well, the execution tests are in gcc.dg/torture, which is run with 
>> > >> various
>> > >> options including -flto (and I've checked the testsuite logs to confirm
>> > >> these tests are indeed run with such options).  Is there something else
>> > >> you think should be tested?
>> > >
>> > > As I noted in :
>> > >
>> > > As of the PR32187 commit r239625 "Implement C _FloatN, _FloatNx 
>> > > types", nvptx
>> > > offloading is broken, ICEs in LTO stream-in.  Probably some kind of 
>> > > data-type
>> > > mismatch that is not visible with Intel MIC offloading (using the 
>> > > same data
>> > > types) but explodes with nvptx.  I'm having a look.
>
>> > [...] preload_common_nodes.  This is carefully crafted to _not_ diverge by
>> > frontend (!) it wasn't even designed to cope with global trees being 
>> > present
>> > or not dependent on target (well, because the target is always the
>> > same! mind you!)
>>
>> Scary.  ;-/
>>
>> > Now -- in theory it should deal with NULLs just fine (resulting in
>> > error_mark_node), but it can diverge when there are additional
>> > compount types (like vectors, complex
>> > or array or record types) whose element types are not in the set of
>> > global trees.
>> > The complex _FloatN types would be such a case given they appear before 
>> > their
>> > components.  That mixes up the ordering at least.
>>
>> ACK, but that's also an issue for "regular" float/complex float, which
>> also is in "reverse" order.  But that's "fixed" by the recursion in
>> gcc/tree-streamer.c:record_common_node for "TREE_CODE (node) ==
>> COMPLEX_TYPE".  This likewise seems to work for the _FloatN types.
>
> So, that mechanism does work, but what's going wrong is the following:
> with differing target vs. offload target, we potentially (and actually
> do, in the case of x86_64 vs. nvptx) have different sets of _FloatN and
> _FloatNx types.  That is, for nvptx, a few of these don't exist (so,
> NULL_TREE), and so it follows their complex variants also don't exist,
> and thus the recursion that I just mentioned for complex types is no
> longer done in lockstep in the x86_64 cc1 vs. the nvptx lto1, hence we
> get an offset (of two, in this specific case), and consequently streaming
> explodes, for example, as soon as it hits a forward-reference (due to
> looking for tree 185 (x86_64 cc1 view; as encoded in the stream) when it
> should be looking for tree 183 (nvptx lto1 view).
>
>> (I've
>> put "fixed" in quotes -- doesn't that recursion mean that we're thus
>> putting "complex float", "float", [...], "float" (again) into the cache?
>> Anyway that's for later...)
>
> Maybe it would make sense to do this tree streaming in two passes: first
> build a set of what we actually need, and then stream that, without
> duplicates.  (Or, is also for these "pickled" trees their order relevant,
> so that one tree may only refer to other earlier but not later ones?
> Anyway, we could still remember the set of trees already streamed, and
> avoid the double streaming I described?)
>
> So I now understand that due to the stream format, the integer tree IDs
> (cache->next_id) have to match in all cc1/lto1s (etc.), so we'll have to
> make that work for x86_64 target vs. nvptx offload target being
> different.  (I'm pondering some ideas about how to rework that integer
> tree ID generation.)
>
> (I have not digested completely yet what the implications are for the
> skipping we're doing for some trees in preload_common_nodes), but here is
> a first patch that at least gets us rid of the immediate problem.  (I'll
> fix the TODO by adding a "#define TI_COMPLEX_FLOATN_NX_TYPE_LAST" to
> gcc/tree-core.h, OK?)  Will such a patch be OK for trunk, at least for
> now?

Humm ... do we anywhere compare to those global trees by pointer equivalence?
If so then it breaks LTO support for those types.

I think forcing proper ordering so that we can assert that at the
point we'd like
to recurse the nodes we recurse for are already in the 

Re: [PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-16 Thread Yuan, Pengfei
> > Here are the results:
> > 
> > Param  Size (GCC5)Time (GCC5)  Time (GCC7)
> > 0  44686265 (-8.26%)  58.772s  66.332s
> > 1  45692793 (-6.19%)  40.684s  39.220s
> > 2  45556185 (-6.47%)  35.292s  34.328s
> > 3  46251049 (-5.05%)  28.820s  27.136s
> > 4  47028873 (-3.45%)  24.616s  22.200s
> > 5  47495641 (-2.49%)  20.160s  17.800s
> > 6  47520153 (-2.44%)  16.444s  15.656s
> > 14 48708873   5.620s   5.556s
> 
> Thanks for data! I meant to run the benchmark myself, but had little time to 
> do
> it over past week becuase of traveling and was also wondering what to do given
> that spec is rather poor benchmark in this area. Tramp3d is biassed but we are
> in stage1 and can fine tune latter. I am debugging the libxul crashes in FDO
> binary now, so we can re-run talos.

It seems to be memory corruption. The content of a pointer becomes 0xe5e5...e5.
I have tried Firefox 48.0.2, 49b10 and mozilla-central with GCC 6. Same error.

> > Param: value of PARAM_EARLY_INLINING_INSNS
> > Size:  code size (.text) of optimized libxul.so
> > Time:  execution time of instrumented tramp3d (-n 25)
> > 
> > To balance between size reduction of optimized binary and speed penalty
> > of instrumented binary, I set param=6 as baseline and compare:
> > 
> > Param  Size score  Time score  Total
> > 0  3.39-3.57   -0.18
> > 1  2.54-2.470.07
> > 2  2.65-2.150.50
> > 3  2.07-1.750.32
> > 4  1.41-1.50   -0.09
> > 5  1.02-1.23   -0.21
> > 6  1.00-1.000.00
> > 14 0.00-0.34   -0.34
> > 
> > Therefore, I think param=2 is the best choice.
> > 
> > Is the attached patch OK?
> 
> Setting param to 2 looks fine
> 
> > gcc/ChangeLog
> > * opts.c (finish_options): Adjust PARAM_EARLY_INLINING_INSNS
> > when FDO is enabled.
> > 
> > 
> > diff --git a/gcc/opts.c b/gcc/opts.c
> > index 39c190d..b59c700 100644
> > --- a/gcc/opts.c
> > +++ b/gcc/opts.c
> > @@ -826,8 +826,14 @@ finish_options (struct gcc_options *opts, struct 
> > gcc_options *opts_set,
> >maybe_set_param_value (PARAM_STACK_FRAME_GROWTH, 40,
> >  opts->x_param_values, 
> > opts_set->x_param_values);
> >  }
> > 
> > +  /* Adjust PARAM_EARLY_INLINING_INSNS when FDO is enabled.  */
> > +  if ((opts->x_profile_arc_flag && !opts->x_flag_test_coverage)
> > +  || (opts->x_flag_branch_probabilities && !opts->x_flag_auto_profile))
> > +maybe_set_param_value (PARAM_EARLY_INLINING_INSNS, 2,
> > +  opts->x_param_values, opts_set->x_param_values);
> > +
> 
> I would actually preffer to have PARAM_EARLY_ININING_INSNS_FEEDBACK.  We
> already have TRACER_DYNAMIC_COVERAGE_FEEDBACK and other params.  The reason is
> that profile is not a global property of program. It may or may not be
> available for given function, while params are global.

Whether profile is available for specific functions is not important here 
because
profile is loaded after pass_early_inline. Therefore I think setting the param
globally is fine.

> Even at compile time profile may be selectively missing for example for
> COMDATs that did not win in the linking process.
> 
> There is also need to update the documentation.
> Thanks for the work!
> Honza

Here is the new patch.

Regards,
Yuan, Pengfei


gcc/ChangeLog
* doc/invoke.texi (early-inlining-insns): Value is adjusted
when FDO is enabled.
* opts.c (finish_options): Adjust PARAM_EARLY_INLINING_INSNS
when FDO is enabled.


diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 1ca4dcc..880a28c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10272,9 +10272,11 @@ The default value is 10.
 
 @item early-inlining-insns
 Specify growth that the early inliner can make.  In effect it increases
 the amount of inlining for code having a large abstraction penalty.
-The default value is 14.
+The default value is 14.  When feedback-directed optimizations is enabled (by
+@option{-fprofile-generate} or @option{-fprofile-use}), the value is adjusted
+to 2.
 
 @item max-early-inliner-iterations
 Limit of iterations of the early inliner.  This basically bounds
 the number of nested indirect calls the early inliner can resolve.
diff --git a/gcc/opts.c b/gcc/opts.c
index 39c190d..b59c700 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -826,8 +826,14 @@ finish_options (struct gcc_options *opts, struct 
gcc_options *opts_set,
   maybe_set_param_value (PARAM_STACK_FRAME_GROWTH, 40,
 opts->x_param_values, opts_set->x_param_values);
 }
 
+  /* Adjust PARAM_EARLY_INLINING_INSNS when FDO is enabled.  */
+  if ((opts->x_profile_arc_flag && !opts->x_flag_test_coverage)
+  || (opts->x_flag_branch_probabilities && !opts->x_flag_auto_profile))
+maybe_set_param_value (PARAM_EARLY_INLINING_INSNS, 2,
+  

Re: [RFC][IPA-VRP] Early VRP Implementation

2016-09-16 Thread Richard Biener
On Thu, Sep 15, 2016 at 4:45 PM, Jeff Law  wrote:
> On 09/14/2016 11:55 PM, Richard Biener wrote:
>>
>> On September 14, 2016 11:36:16 PM GMT+02:00, Jan Hubicka 
>> wrote:

 +  /* Visit PHI stmts and discover any new VRs possible.  */
 +  gimple_stmt_iterator gsi;
 +  for (gphi_iterator gpi = gsi_start_phis (bb);
 +   !gsi_end_p (gpi); gsi_next ())
 +{
 +  gphi *phi = gpi.phi ();
 +  tree lhs = PHI_RESULT (phi);
 +  value_range vr_result = VR_INITIALIZER;
 +  if (! has_unvisived_preds
&& stmt_interesting_for_vrp (phi)
 + && stmt_visit_phi_node_in_dom_p (phi))
 +   extract_range_from_phi_node (phi, _result, true);
 +  else
 +   set_value_range_to_varying (_result);
 +  update_value_range (lhs, _result);
 +}

 due to a bug in IRA you need to make sure to un-set BB_VISITED after
 early-vrp is finished again.
>>>
>>> How IRA bugs affects early passes?
>>
>>
>> IRA bogously relies on BB_VISITED being cleared at pass start.
>
> Seems like IRA ought to be fixed to clear BB_VISITED on every block as part
> of its initialization.

Agreed but I didn't find the time to track down an appropriate place to do that.

Richard.

> Jeff
>


[PATCH][simplify-rtx] (GTU (PLUS a C) (C - 1)) --> (LTU a -C)

2016-09-16 Thread Kyrill Tkachov

Hi all,

Currently the functions:
int f1(int x, int t)
{
  if (x == -1 || x == -2)
t = 1;
  return t;
}

int f2(int x, int t)
{
  if (x == -1 || x == -2)
return 1;
  return t;
}

generate different code on AArch64 even though they have identical 
functionality:
f1:
add w0, w0, 2
cmp w0, 1
csinc   w0, w1, wzr, hi
ret

f2:
cmn w0, #2
csinc   w0, w1, wzr, cc
ret

The problem is that f2 performs the comparison (LTU w0 -2)
whereas f1 performs (GTU (PLUS w0 2) 1). I think it is possible to simplify the 
f1 form
to the f2 form with the simplify-rtx.c rule added in this patch. With this 
patch the
codegen for both f1 and f2 on aarch64 at -O2 is identical (CMN, CSINC).

Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu, 
x86_64.
What do you think? Is this a correct generalisation of this issue?
If so, ok for trunk?

Thanks,
Kyrill

2016-09-16  Kyrylo Tkachov  

* simplify-rtx.c (simplify_relational_operation_1): Add transformation
(GTU (PLUS a C) (C - 1)) --> (LTU a -C).

2016-09-16  Kyrylo Tkachov  

* gcc.target/aarch64/gtu_to_ltu_cmp_1.c: New test.
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 14302ea06eccc099ef356ab6c63ac020dd083b0c..4153c7335680068ed3ce08410400ac6abaf30c89 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -4663,6 +4663,19 @@ simplify_relational_operation_1 (enum rtx_code code, machine_mode mode,
   cmp_mode, XEXP (op0, 0), new_cmp);
 }
 
+  /* (GTU (PLUS a C) (C - 1)) where C is a non-zero constant can be
+ transformed into (LTU a -C).  */
+  if (code == GTU && GET_CODE (op0) == PLUS && CONST_INT_P (op1)
+  && CONST_INT_P (XEXP (op0, 1))
+  && (UINTVAL (op1) == UINTVAL (XEXP (op0, 1)) - 1)
+  && XEXP (op0, 1) != const0_rtx)
+{
+  rtx new_cmp
+	= simplify_gen_unary (NEG, cmp_mode, XEXP (op0, 1), cmp_mode);
+  return simplify_gen_relational (LTU, mode, cmp_mode,
+   XEXP (op0, 0), new_cmp);
+}
+
   /* Canonicalize (LTU/GEU (PLUS a b) b) as (LTU/GEU (PLUS a b) a).  */
   if ((code == LTU || code == GEU)
   && GET_CODE (op0) == PLUS
diff --git a/gcc/testsuite/gcc.target/aarch64/gtu_to_ltu_cmp_1.c b/gcc/testsuite/gcc.target/aarch64/gtu_to_ltu_cmp_1.c
new file mode 100644
index ..81c536c90afe38932c48ed0af24f55e73eeff80e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/gtu_to_ltu_cmp_1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int
+f1 (int x, int t)
+{
+  if (x == -1 || x == -2)
+t = 1;
+
+  return t;
+}
+
+/* { dg-final { scan-assembler-times "cmn\\tw\[0-9\]+, #2" 1 } } */


Re: [PATCH][2/n] Change dw2_asm_output_offset to allow assembling extra offset

2016-09-16 Thread Richard Biener
On Fri, 16 Sep 2016, Andreas Schwab wrote:

>   * config/ia64/ia64.h (ASM_OUTPUT_DWARF_OFFSET): Use parameter
>   OFFSET, not offset.
>   * config/i386/cygming.h (ASM_OUTPUT_DWARF_OFFSET): Likewise.

Oops.

Ok (counts as obvious).

Thanks,
Richard.

> diff --git a/gcc/config/i386/cygming.h b/gcc/config/i386/cygming.h
> index 228d6a2..faf8fa4 100644
> --- a/gcc/config/i386/cygming.h
> +++ b/gcc/config/i386/cygming.h
> @@ -109,8 +109,8 @@ along with GCC; see the file COPYING3.  If not see
>case 4:\
>   fputs ("\t.secrel32\t", FILE);  \
>   assemble_name (FILE, LABEL);\
> - if (offset != 0)\
> -   fprintf (FILE, "+" HOST_WIDE_INT_PRINT_DEC, offset)   \
> + if ((OFFSET) != 0)  \
> +   fprintf (FILE, "+" HOST_WIDE_INT_PRINT_DEC, OFFSET)   \
>   break;  \
>case 8:\
>   /* This is a hack.  There is no 64-bit section relative \
> @@ -120,8 +120,8 @@ along with GCC; see the file COPYING3.  If not see
>  Fake the 64-bit offset by zero-extending it.  */ \
>   fputs ("\t.secrel32\t", FILE);  \
>   assemble_name (FILE, LABEL);\
> - if (offset != 0)\
> -   fprintf (FILE, "+" HOST_WIDE_INT_PRINT_DEC, offset)   \
> + if ((OFFSET) != 0)  \
> +   fprintf (FILE, "+" HOST_WIDE_INT_PRINT_DEC, OFFSET)   \
>   fputs ("\n\t.long\t0", FILE);   \
>   break;  \
>default:   \
> diff --git a/gcc/config/ia64/ia64.h b/gcc/config/ia64/ia64.h
> index aab2d7a..00516bb 100644
> --- a/gcc/config/ia64/ia64.h
> +++ b/gcc/config/ia64/ia64.h
> @@ -1586,7 +1586,7 @@ do {
> \
>  fputs (integer_asm_op (SIZE, FALSE), FILE);  \
>  fputs ("@secrel(", FILE);\
>  assemble_name (FILE, LABEL); \
> -if (offset != 0) \
> +if ((OFFSET) != 0)   \
>fprintf (FILE, "+" HOST_WIDE_INT_PRINT_DEC, OFFSET);   \
>  fputc (')', FILE);   \
>} while (0)
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


  1   2   >