Re: [C PATCH] Warn if switch has boolean value (PR c/60439)

2014-04-18 Thread Marek Polacek
On Fri, Apr 18, 2014 at 07:49:22AM +0200, Marc Glisse wrote:
 On Fri, 18 Apr 2014, Marek Polacek wrote:
 
 This patch implements a new warning that warns when the controlling
 expression of a switch has a boolean value.  (Intentionally I don't
 warn if the controlling expression is an (un)signed:1 bit-field.)
 I guess the question is whether this should be enabled by default or
 deserves some new warning option.  Since clang does the former,
 I did it too, and currently this warning is enabled by default.
 
 It can be enabled by -Wsome-name which is itself enabled by default but
 at least gives the possibility to use -Wno-some-name, -Werror=some-name,
 etc. No? I believe Manuel insists regularly that no new warning should
 use 0 (and old ones should progressively lose it).

Yes, that's the other possibility and exactly what I wanted to
discuss.  I think I'll prepare another version with -Wswitch-bool (and
documentation).
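
For illustration, a minimal sketch of what the warning covers and what it
deliberately skips (hypothetical example, not part of the patch):

struct S { unsigned bf : 1; };

void
g (struct S s, int a)
{
  switch (a == 0)	/* warning: switch condition has boolean value */
    case 1:
      break;
  switch (s.bf)		/* (un)signed:1 bit-field: intentionally not warned */
    case 1:
      break;
}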

Marek


RE: [PATCH v8] PR middle-end/60281

2014-04-18 Thread Bernd Edlinger
Hi Jakub,

I can take over that task and will bootstrap with all languages and run the
test suite on my armv7-linux-gnueabihf system.
But that will take until next week, as it is currently occupied with other tests.


Can you please review Lin's latest patch and give your OK for check-in on trunk
and the 4.9.1 branch?


Thanks
Bernd. 

On Fri, 18 Apr 2014 12:26:36, Lin Zuojian wrote:

 Hi Bernd,
 a) On which target(s) did you bootstrap your patch?
 I just ran it on x86; I can't run it on ARM, because Android is not a
 POSIX system, nor a System V compatible system.  And my code does not
 affect x86.

 b) Did you run the testsuite?
 Yes, but again my code does not affect x86.

 c) When you compare the test results with and without the patch, were there
 any regressions?
 Only the bug is gone.  My app can now run on my Android ARM system.

 On Fri, Apr 18, 2014 at 12:21:50PM +0800, lin zuojian wrote:
 Hi,
 Here is the patch after Jakub's review; Jakub helped with the
 coding style.

 --

 * asan.c (asan_emit_stack_protection):
 Force the base to align to appropriate bits if STRICT_ALIGNMENT. Set
 shadow_mem align to appropriate bits if STRICT_ALIGNMENT.
 * cfgexpand.c
 (expand_stack_vars): Set base_align appropriately when asan is on.
 (expand_used_vars): Leave a space in the stack frame for alignment if
 STRICT_ALIGNMENT.

 ---
 gcc/ChangeLog | 9 +
 gcc/asan.c | 15 +++
 gcc/cfgexpand.c | 18 --
 3 files changed, 40 insertions(+), 2 deletions(-)
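
(A quick aside on the arithmetic the patch relies on; a sketch assuming
ASAN_SHADOW_SHIFT is 3 and GET_MODE_ALIGNMENT (SImode) is 32 bits, which
is typical but target-dependent:

#include <stdio.h>

int
main (void)
{
  const int simode_align = 32;	/* GET_MODE_ALIGNMENT (SImode), in bits */
  const int shadow_shift = 3;	/* ASAN_SHADOW_SHIFT */
  /* Alignment forced on the base so the shadow of the frame can be
     written with naturally aligned SImode stores:  */
  int align_bytes = (simode_align << shadow_shift) / 8;   /* BITS_PER_UNIT */
  printf ("base aligned to %d bytes\n", align_bytes);	  /* prints 32 */
  return 0;
}

That is the value whose negation is used as the and_optab mask below, and
also the amount of extra frame space expand_used_vars reserves on
STRICT_ALIGNMENT targets.)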

 diff --git a/gcc/ChangeLog b/gcc/ChangeLog
 index da35be8..30a2b33 100644
 --- a/gcc/ChangeLog
 +++ b/gcc/ChangeLog
 @@ -1,3 +1,12 @@
 +2014-04-18 Lin Zuojian manjian2...@gmail.com
 + PR middle-end/60281
 + * asan.c (asan_emit_stack_protection): Force the base to align to
 + appropriate bits if STRICT_ALIGNMENT. Set shadow_mem align to
 + appropriate bits if STRICT_ALIGNMENT.
 + * cfgexpand.c (expand_stack_vars): Set base_align appropriately
 + when asan is on.
 + (expand_used_vars): Leave a space in the stack frame for alignment
 + if STRICT_ALIGNMENT.
 2014-04-17 Jakub Jelinek ja...@redhat.com

 PR target/60847
 diff --git a/gcc/asan.c b/gcc/asan.c
 index 53992a8..28a476f 100644
 --- a/gcc/asan.c
 +++ b/gcc/asan.c
 @@ -1017,8 +1017,17 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
 base_align_bias = ((asan_frame_size + alignb - 1)
 	 & ~(alignb - HOST_WIDE_INT_1)) - asan_frame_size;
 }
 + /* Align base if target is STRICT_ALIGNMENT.  */
 + if (STRICT_ALIGNMENT)
 + base = expand_binop (Pmode, and_optab, base,
 + gen_int_mode (-((GET_MODE_ALIGNMENT (SImode)
 +  << ASAN_SHADOW_SHIFT)
 + / BITS_PER_UNIT), Pmode), NULL_RTX,
 + 1, OPTAB_DIRECT);
 +
 if (use_after_return_class == -1 && pbase)
 emit_move_insn (pbase, base);
 +
 base = expand_binop (Pmode, add_optab, base,
 gen_int_mode (base_offset - base_align_bias, Pmode),
 NULL_RTX, 1, OPTAB_DIRECT);
 @@ -1097,6 +1106,8 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
 	      && (ASAN_RED_ZONE_SIZE >> ASAN_SHADOW_SHIFT) == 4);
 shadow_mem = gen_rtx_MEM (SImode, shadow_base);
 set_mem_alias_set (shadow_mem, asan_shadow_set);
 + if (STRICT_ALIGNMENT)
 + set_mem_align (shadow_mem, (GET_MODE_ALIGNMENT (SImode)));
 prev_offset = base_offset;
 for (l = length; l; l -= 2)
 {
 @@ -1186,6 +1197,10 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,

 shadow_mem = gen_rtx_MEM (BLKmode, shadow_base);
 set_mem_alias_set (shadow_mem, asan_shadow_set);
 +
 + if (STRICT_ALIGNMENT)
 + set_mem_align (shadow_mem, (GET_MODE_ALIGNMENT (SImode)));
 +
 prev_offset = base_offset;
 last_offset = base_offset;
 last_size = 0;
 diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
 index b7f6360..14511e1 100644
 --- a/gcc/cfgexpand.c
 +++ b/gcc/cfgexpand.c
 @@ -1013,10 +1013,19 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
 if (data->asan_base == NULL)
 data->asan_base = gen_reg_rtx (Pmode);
 base = data->asan_base;
 +
 + if (!STRICT_ALIGNMENT)
 + base_align = crtl->max_used_stack_slot_alignment;
 + else
 + base_align = MAX (crtl->max_used_stack_slot_alignment,
 + GET_MODE_ALIGNMENT (SImode)
 + << ASAN_SHADOW_SHIFT);
 }
 else
 - offset = alloc_stack_frame_space (stack_vars[i].size, alignb);
 - base_align = crtl->max_used_stack_slot_alignment;
 + {
 + offset = alloc_stack_frame_space (stack_vars[i].size, alignb);
 + base_align = crtl->max_used_stack_slot_alignment;
 + }
 }
 else
 {
 @@ -1845,6 +1854,11 @@ expand_used_vars (void)
 = alloc_stack_frame_space (redzonesz, ASAN_RED_ZONE_SIZE);
 data.asan_vec.safe_push (prev_offset);
 data.asan_vec.safe_push (offset);
 + /* Leave space for alignment if STRICT_ALIGNMENT.  */
 + if (STRICT_ALIGNMENT)
 + alloc_stack_frame_space ((GET_MODE_ALIGNMENT (SImode)
 +  << ASAN_SHADOW_SHIFT)
 + / BITS_PER_UNIT, 1);

 var_end_seq
 = asan_emit_stack_protection (virtual_stack_vars_rtx,
 --
 1.8.3.2

 --
 Regards
 lin zuojian

Re: [C PATCH] Warn if switch has boolean value (PR c/60439)

2014-04-18 Thread Marek Polacek
On Fri, Apr 18, 2014 at 08:00:45AM +0200, Marek Polacek wrote:
 On Fri, Apr 18, 2014 at 07:49:22AM +0200, Marc Glisse wrote:
  On Fri, 18 Apr 2014, Marek Polacek wrote:
  
  This patch implements a new warning that warns when the controlling
  expression of a switch has a boolean value.  (Intentionally I don't
  warn if the controlling expression is an (un)signed:1 bit-field.)
  I guess the question is whether this should be enabled by default or
  deserves some new warning option.  Since clang does the former,
  I did it too, and currently this warning is enabled by default.
  
  It can be enabled by -Wsome-name which is itself enabled by default but
  at least gives the possibility to use -Wno-some-name, -Werror=some-name,
  etc. No? I believe Manuel insists regularly that no new warning should
  use 0 (and old ones should progressively lose it).
 
 Yes, that's the other possibility and exactly what I wanted to
 discuss.  I think I'll prepare another version with -Wswitch-bool (and
 documentation).

Here.

2014-04-18  Marek Polacek  pola...@redhat.com

PR c/60439
* doc/invoke.texi: Document -Wswitch-bool.
c/
* c-typeck.c (c_start_case): Warn if switch condition has boolean
value.
c-family/
* c.opt (Wswitch-bool): New option.
testsuite/
* gcc.dg/pr60439.c: New test.

diff --git gcc/c-family/c.opt gcc/c-family/c.opt
index 390c056..9089496 100644
--- gcc/c-family/c.opt
+++ gcc/c-family/c.opt
@@ -529,6 +529,10 @@ Wswitch-enum
 C ObjC C++ ObjC++ Var(warn_switch_enum) Warning
 Warn about all enumerated switches missing a specific case
 
+Wswitch-bool
+C ObjC Warning Init(1)
+Warn about switches with boolean controlling expression
+
 Wmissing-format-attribute
 C ObjC C++ ObjC++ Alias(Wsuggest-attribute=format)
 ;
diff --git gcc/c/c-typeck.c gcc/c/c-typeck.c
index 65aad45..44982d3 100644
--- gcc/c/c-typeck.c
+++ gcc/c/c-typeck.c
@@ -9344,6 +9344,28 @@ c_start_case (location_t switch_loc,
   else
{
  tree type = TYPE_MAIN_VARIANT (orig_type);
+ tree e = exp;
+ enum tree_code exp_code;
+
+ while (TREE_CODE (e) == COMPOUND_EXPR)
+   e = TREE_OPERAND (e, 1);
+ exp_code = TREE_CODE (e);
+
+ if (TREE_CODE (type) == BOOLEAN_TYPE
+ || exp_code == TRUTH_ANDIF_EXPR
+ || exp_code == TRUTH_AND_EXPR
+ || exp_code == TRUTH_ORIF_EXPR
+ || exp_code == TRUTH_OR_EXPR
+ || exp_code == TRUTH_XOR_EXPR
+ || exp_code == TRUTH_NOT_EXPR
+ || exp_code == EQ_EXPR
+ || exp_code == NE_EXPR
+ || exp_code == LE_EXPR
+ || exp_code == GE_EXPR
+ || exp_code == LT_EXPR
+ || exp_code == GT_EXPR)
+   warning_at (switch_cond_loc, OPT_Wswitch_bool,
+   "switch condition has boolean value");
 
  if (!in_system_header_at (input_location)
    && (type == long_integer_type_node
diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi
index 8004da8..04e1c41 100644
--- gcc/doc/invoke.texi
+++ gcc/doc/invoke.texi
@@ -268,7 +268,7 @@ Objective-C and Objective-C++ Dialects}.
 -Wstrict-aliasing=n @gol -Wstrict-overflow -Wstrict-overflow=@var{n} @gol
 -Wsuggest-attribute=@r{[}pure@r{|}const@r{|}noreturn@r{|}format@r{]} @gol
 -Wmissing-format-attribute @gol
--Wswitch  -Wswitch-default  -Wswitch-enum -Wsync-nand @gol
+-Wswitch  -Wswitch-default  -Wswitch-enum -Wswitch-bool -Wsync-nand @gol
 -Wsystem-headers  -Wtrampolines  -Wtrigraphs  -Wtype-limits  -Wundef @gol
 -Wuninitialized  -Wunknown-pragmas  -Wno-pragmas @gol
 -Wunsuffixed-float-constants  -Wunused  -Wunused-function @gol
@@ -3822,6 +3822,12 @@ between @option{-Wswitch} and this option is that this option gives a
 warning about an omitted enumeration code even if there is a
 @code{default} label.
 
+@item -Wswitch-bool
+@opindex Wswitch-bool
+@opindex Wno-switch-bool
+Warn whenever a @code{switch} statement has an index of boolean type.
+This warning is enabled by default for C programs.
+
 @item -Wsync-nand @r{(C and C++ only)}
 @opindex Wsync-nand
 @opindex Wno-sync-nand
diff --git gcc/testsuite/gcc.dg/pr60439.c gcc/testsuite/gcc.dg/pr60439.c
index e69de29..26e7c25 100644
--- gcc/testsuite/gcc.dg/pr60439.c
+++ gcc/testsuite/gcc.dg/pr60439.c
@@ -0,0 +1,112 @@
+/* PR c/60439 */
+/* { dg-do compile } */
+
+typedef _Bool bool;
+extern _Bool foo (void);
+
+void
+f1 (const _Bool b)
+{
+  switch (b) /* { dg-warning "switch condition has boolean value" } */
+case 1:
+  break;
+}
+
+void
+f2 (int a, int b)
+{
+  switch (a && b) /* { dg-warning "switch condition has boolean value" } */
+case 1:
+  break;
+  switch ((bool) (a && b)) /* { dg-warning "switch condition has boolean value" } */
+case 1:
+  break;
+  switch ((a && b) || a) /* { dg-warning "switch condition has boolean value" } */
+case 1:
+  break;
+}
+
+void
+f3 (int a)
+{
+  switch (!!a) /* { dg-warning "switch condition has boolean value" } */
+case 

Re: [PATCH v8] PR middle-end/60281

2014-04-18 Thread Jakub Jelinek
On Fri, Apr 18, 2014 at 12:21:50PM +0800, lin zuojian wrote:
 Here is the patch after Jakub's review; Jakub helped with the
 coding style.

 --- a/gcc/ChangeLog
 +++ b/gcc/ChangeLog
 @@ -1,3 +1,12 @@
 +2014-04-18  Lin Zuojian  manjian2...@gmail.com
 +   PR middle-end/60281

An empty line is missing before the PR line.
Otherwise this is ok for trunk and for 4.9.1 after a while on the trunk (a
week or two).

Thanks.

Jakub


Re: [C PATCH] Warn if switch has boolean value (PR c/60439)

2014-04-18 Thread Jakub Jelinek
On Fri, Apr 18, 2014 at 07:49:22AM +0200, Marc Glisse wrote:
 On Fri, 18 Apr 2014, Marek Polacek wrote:
 
 This patch implements a new warning that warns when controlling
 expression of a switch has boolean value.  (Intentionally I don't
 warn if the controlling expression is (un)signed:1 bit-field.)
 I guess the question is if this should be enabled by default or
 deserves some new warning option.  Since clang does the former,
 I did it too and currently this warning is enabled by default.
 
 It can be enabled by -Wsome-name which is itself enabled by default but
 at least gives the possibility to use -Wno-some-name, -Werror=some-name,
 etc. No? I believe Manuel insists regularly that no new warning should
 use 0 (and old ones should progressively lose it).

Yeah, completely agreed.  It can be enabled by default, but it should still be
controlled by a warning switch.

Jakub


Re: Add testcase for PR lto/60820

2014-04-18 Thread Dominique Dhumieres
 this is a stand-alone testcase for that PR.
 Committed to mainline.
 ...

The test fails on darwin with

/opt/gcc/work/gcc/testsuite/gcc.dg/lto/pr60820_0.c:13:1: warning: alias definitions not supported in Mach-O; ignored

TIA

Dominique


Re: GCC's -fsplit-stack disturbing Mach's vm_allocate

2014-04-18 Thread Samuel Thibault
Samuel Thibault, Thu 17 Apr 2014 00:03:45 +0200, wrote:
 Thomas Schwinge, Wed 09 Apr 2014 09:36:42 +0200, wrote:
  Well, the first step is to verify that TARGET_THREAD_SPLIT_STACK_OFFSET
  and similar configury is correct for the Hurd,
 
 I have added the corresponding field, so we can just use the same offset
 as on Linux.

I have uploaded packages on http://people.debian.org/~sthibault/tmp/ so
Svante can try setting TARGET_THREAD_SPLIT_STACK_OFFSET to 0x30 with
them.

Samuel


Re: [build] PR 43538: Don't overwrite CXXFLAGS_FOR_TARGET in config/mt-gnu

2014-04-18 Thread Marc Glisse

Ping
http://gcc.gnu.org/ml/gcc-patches/2014-01/msg01480.html

On Thu, 23 Jan 2014, Marc Glisse wrote:


Hello,

although setting CFLAGS_FOR_TARGET before compiling gcc works fine, 
CXXFLAGS_FOR_TARGET is ignored. I don't see any good reason for that.


I tested the patch by doing a regular bootstrap+testsuite on 
x86_64-unknown-linux-gnu. I also did a non-bootstrap build where I set 
CXXFLAGS_FOR_TARGET and checked that it now propagates to libstdc++ and 
others.


config/ChangeLog:

2014-01-23  Marc Glisse  marc.gli...@inria.fr

PR target/43538
* mt-gnu: Don't reset CXXFLAGS_FOR_TARGET.


--
Marc Glisse

Index: config/mt-gnu
===================================================================
--- config/mt-gnu   (revision 209514)
+++ config/mt-gnu   (working copy)
@@ -1 +1 @@
-CXXFLAGS_FOR_TARGET = $(CXXFLAGS) -D_GNU_SOURCE
+CXXFLAGS_FOR_TARGET += -D_GNU_SOURCE


Re: [Ping] [Patch C++] RFC Fix PR57958

2014-04-18 Thread Dinar Temirbulatov
Hello,
Here is another version of fix. This time, I added
complete_type_or_else call just before aggregate_value_p. Bootstraped
and tested on x86_64-pc-linux-gnu with no new regressions.  Ok to
apply
to trunk?
 Thanks, Dinar.


On Mon, Apr 7, 2014 at 11:47 PM, Jason Merrill ja...@redhat.com wrote:
 On 04/07/2014 03:46 PM, Jason Merrill wrote:

 I guess we need to call complete_type before aggregate_value_p.


 complete_type_or_else, actually.

 Jason


2014-04-18  Dinar Temirbulatov  dtemirbula...@gmail.com

PR c++/57958
* semantics.c (apply_deduced_return_type): Complete non-void type
before estimating whether the type is aggregate.


fix.patch
Description: Binary data


Re: [Patch, avr] Propagate -mrelax gcc driver flag to assembler

2014-04-18 Thread Senthil Kumar Selvaraj

On Sat, Apr 12, 2014 at 06:36:01PM +0200, Georg-Johann Lay wrote:
 Senthil Kumar Selvaraj schrieb:
 This patch modifies AVR target's ASM spec to pass -mlink-relax to the
 assembler if -mrelax is passed to the compiler driver. This was already
 being passed on to the linker, this patch merely makes the assembler
 also aware of it.
 
 The corresponding patch in binutils to handle the -mlink-relax option is
 already committed in the binutils repo.  I'm not sure how to manage
 running a newer gcc with an older version of binutils though - how is this
 generally handled?
 
 The right place is gcc/configure.ac and have a macro defined depending on
 whether gas supports -mlink-relax.
 
 
 Same should be done for -mrmw, IMO, for similar reasons, e.g. something like
 
 case $target in
   ...
   avr-*-*)
   ...
 gcc_GAS_CHECK_FEATURE([-mrmw option], gcc_cv_as_avr_mrmw,,
   [-mrmw], [.text],,
   [AC_DEFINE(HAVE_AS_AVR_MRMW_OPTION, 1,
   [Define if your assembler supports -mrmw option.])])
 
 or
 
 gcc_GAS_CHECK_FEATURE([-mrmw option], gcc_cv_as_avr_mrmw,,
   [-mrmw], [.text],,,)
 if test x$gcc_cv_as_avr_mrmw = xyes; then
   AC_DEFINE(HAVE_AS_AVR_MRMW_OPTION, 1,
 [Define if your assembler supports the -mrmw option.])
 

Thanks Johann. The below patch adds the configury check for -mlink-relax,
along with the change to ASM_SPEC to propagate the -mrelax flag. I
modified the original patch a bit to make it easier to handle
conditional additions to SPECs (inspired by the sparc target).
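
(With both pieces in place, a command line like 'avr-gcc -mmcu=atmega128
-mrelax foo.c' should, assuming the installed gas understands -mlink-relax,
pass -mlink-relax to the assembler in addition to the existing --relax for
the linker; with an older gas the configure check leaves
HAVE_AS_RELAX_OPTION undefined and ASM_RELAX_SPEC stays empty, so nothing
extra is passed.  The -mmcu value here is just a placeholder.)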

 
 However, the gcc-4_9-branch has already been created...

Yes, but I figured it would be useful anyway - if this eventually gets
backported to 4_9, for example.

If the below patch looks ok, could someone commit please? I don't have
commit access.

Regards
Senthil

gcc/ChangeLog

2014-04-18  Senthil Kumar Selvaraj  senthil_kumar.selva...@atmel.com

* config/avr/avr.h: Pass on mlink-relax to assembler.
* configure.ac: Test for mlink-relax support in assembler.
* configure: Regenerate.

diff --git gcc/config/avr/avr.h gcc/config/avr/avr.h
index 78434ec..b4e3eb1 100644
--- gcc/config/avr/avr.h
+++ gcc/config/avr/avr.h
@@ -512,7 +512,28 @@ extern const char *avr_device_to_sp8 (int argc, const char **argv);
 %{!fenforce-eh-specs:-fno-enforce-eh-specs} \
 %{!fexceptions:-fno-exceptions}"
 
-#define ASM_SPEC "%:device_to_as(%{mmcu=*:%*}) "
+#ifdef HAVE_AS_RELAX_OPTION
+#define ASM_RELAX_SPEC "%{mrelax:-mlink-relax}"
+#else
+#define ASM_RELAX_SPEC ""
+#endif
+
+#define ASM_SPEC "%:device_to_as(%{mmcu=*:%*})\
+%(asm_relax)"
+
+/* This macro defines names of additional specifications to put in the specs
+   that can be used in various specifications like CC1_SPEC.  Its definition
+   is an initializer with a subgrouping for each command option.
+
+   Each subgrouping contains a string constant, that defines the
+   specification name, and a string constant that used by the GCC driver
+   program.
+
+   Do not define this macro if it does not need to do anything.  */
+
+#define EXTRA_SPECS \
+  { "asm_relax",   ASM_RELAX_SPEC }
+
   
 #define LINK_SPEC "\
 %{mrelax:--relax\
diff --git gcc/configure gcc/configure
index bfb1525..7815038 100755
--- gcc/configure
+++ gcc/configure
@@ -24142,6 +24142,39 @@ $as_echo "#define HAVE_AS_JSRDIRECT_RELOCS 1" >>confdefs.h
 fi
 ;;
 
+  avr-*-*)
+    { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for -mlink-relax option" >&5
+$as_echo_n "checking assembler for -mlink-relax option... " >&6; }
+if test "${gcc_cv_as_avr_relax+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+  gcc_cv_as_avr_relax=no
+  if test x$gcc_cv_as != x; then
+    $as_echo '.text' > conftest.s
+    if { ac_try='$gcc_cv_as $gcc_cv_as_flags -mlink-relax -o conftest.o conftest.s >&5'
+  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+  (eval $ac_try) 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; }
+then
+   gcc_cv_as_avr_relax=yes
+else
+  echo "configure: failed program was" >&5
+  cat conftest.s >&5
+fi
+rm -f conftest.o conftest.s
+  fi
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_as_avr_relax" >&5
+$as_echo "$gcc_cv_as_avr_relax" >&6; }
+if test $gcc_cv_as_avr_relax = yes; then
+
+$as_echo "#define HAVE_AS_RELAX_OPTION 1" >>confdefs.h
+
+fi
+  ;;
+
   cris-*-*)
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for -no-mul-bug-abort option" >&5
 $as_echo_n "checking assembler for -no-mul-bug-abort option... " >&6; }
diff --git gcc/configure.ac gcc/configure.ac
index d7cae6c..cfa862d 100644
--- gcc/configure.ac
+++ gcc/configure.ac
@@ -3579,6 +3579,13 @@ case $target in
   [Define if your assembler supports the lituse_jsrdirect relocation.])])
 ;;
 
+  avr-*-*)
+gcc_GAS_CHECK_FEATURE([-mlink-relax option], gcc_cv_as_avr_relax,,
+  [-mlink-relax], [.text],,
+  [AC_DEFINE(HAVE_AS_RELAX_OPTION, 1,
+   

Re: [PATCH, PR52252] Vectorization for load/store groups of size 3.

2014-04-18 Thread Evgeny Stupachenko
Hi,

Merged with current master, the patch passes bootstrap and gives the
expected gains.
Patch and new tests are attached.

ChangeLog:

2014-04-18  Evgeny Stupachenko  evstu...@gmail.com

* tree-vect-data-refs.c (vect_grouped_store_supported): New
check for stores group of length 3.
(vect_permute_store_chain): New permutations for stores group of
length 3.
(vect_grouped_load_supported): New check for loads group of length 3.
(vect_permute_load_chain): New permutations for loads group of length 3.
* tree-vect-stmts.c (vect_model_store_cost): Change cost
of vec_perm_shuffle for the new permutations.
(vect_model_load_cost): Ditto.

ChangeLog for testsuite:

2014-04-18  Evgeny Stupachenko  evstu...@gmail.com

   PR tree-optimization/52252
   * gcc.dg/vect/pr52252-ld.c: Test on loads group of size 3.
   * gcc.dg/vect/pr52252-st.c: Test on stores group of size 3.

Evgeny

On Thu, Mar 6, 2014 at 6:44 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
 Missed attachment.

 On Thu, Mar 6, 2014 at 6:42 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
 I've separated the patch into 2: cost model tuning and load/store
 groups parallelism.
 SLM tuning was partially introduced in the patch:
 http://gcc.gnu.org/ml/gcc-patches/2014-03/msg00226.html
 The patch introducing vectorization for load/store groups of size 3 attached.

 Is it ok for stage1?

 ChangeLog:

 2014-03-06  Evgeny Stupachenko  evstu...@gmail.com

* tree-vect-data-refs.c (vect_grouped_store_supported): New
check for stores group of length 3.
(vect_permute_store_chain): New permutations for stores group of
length 3.
(vect_grouped_load_supported): New check for loads group of length 3.
(vect_permute_load_chain): New permutations for loads group of length 3.
* tree-vect-stmts.c (vect_model_store_cost): Change cost
of vec_perm_shuffle for the new permutations.
(vect_model_load_cost): Ditto.



 On Tue, Feb 11, 2014 at 7:19 PM, Richard Biener rguent...@suse.de wrote:
 On Tue, 11 Feb 2014, Evgeny Stupachenko wrote:

 Missed patch attached in plain-text.

 I have copyright assignment on file with the FSF covering work on GCC.

 Load/store groups of length 3 are the most frequent non-power-of-2
 case.  They are used in RGB image processing (like the test case in PR52252).
 For sure we can extend the patch to length 5 and more.  However, this
 potentially affects performance on some other architectures and
 requires larger testing.  So length 3 is just the first step.  The
 algorithm in the patch could be modified for a general case in several
 steps.
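
For reference, a minimal sketch of the kind of loop this targets (a
hypothetical RGB example in the spirit of PR52252, not one of the new
testcases):

void
scale_rgb (unsigned char *p, int n)
{
  int i;
  for (i = 0; i < n; i++)
    {
      /* Each iteration loads and stores an interleaved group of 3.  */
      p[3 * i + 0] >>= 1;	/* R */
      p[3 * i + 1] >>= 1;	/* G */
      p[3 * i + 2] >>= 1;	/* B */
    }
}

With the new permutations the vectorizer should be able to turn such a
group of 3 loads/stores into full-width vector operations.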

 I understand that the patch should wait for stage 1; however, since
 it's ready, we can discuss it right now and make some changes (like a
 general size of group).

 Other than that I'd like to see a vectorizer hook querying the cost of a
 vec_perm_const expansion instead of adding vec_perm_shuffle
 (thus requires the constant shuffle mask to be passed as well
 as the vector type).  That's more useful for other uses that
 would require (arbitrary) shuffles.

 Didn't look at the rest of the patch yet - queued in my review
 pipeline.

 Thanks,
 Richard.

 Thanks,
 Evgeny

 On Tue, Feb 11, 2014 at 5:00 PM, Richard Biener rguent...@suse.de wrote:
 
  On Tue, 11 Feb 2014, Evgeny Stupachenko wrote:
 
   Hi,
  
   The patch gives an expected 3 times gain for the test case in PR52252
   (and even 6 times for AVX2).
   It passes make check and bootstrap on x86.
   spec2000/spec2006 got no regressions/gains on x86.
  
   Is this patch ok?
 
  I've worked on generalizing the permutation support in the light
  of the availability of the generic shuffle support in the IL
  but hit some road-blocks in the way code-generation works for
  group loads with permutations (I don't remember if I posted all patches).
 
  This patch seems to be to a slightly different place but it again
  special-cases a specific permutation.  Why's that?  Why can't we
  support groups of size 7 for example?  So - can this be generalized
  to support arbitrary non-power-of-two load/store groups?
 
  Other than that the patch has to wait for stage1 to open again,
  of course.  And it misses a testcase.
 
  Btw, do you have a copyright assignment on file with the FSF covering
  work on GCC?
 
  Thanks,
  Richard.
 
   ChangeLog:
  
   2014-02-11  Evgeny Stupachenko  evstu...@gmail.com
  
   * target.h (vect_cost_for_stmt): Defining new cost vec_perm_shuffle.
   * tree-vect-data-refs.c (vect_grouped_store_supported): New
   check for stores group of length 3.
   (vect_permute_store_chain): New permutations for stores group of
   length 3.
   (vect_grouped_load_supported): New check for loads group of length 3.
   (vect_permute_load_chain): New permutations for loads group of length 3.
   * tree-vect-stmts.c 

Re: [PATCH v8] PR middle-end/60281

2014-04-18 Thread lin zuojian
Hi Jakub,

On Fri, Apr 18, 2014 at 09:09:17AM +0200, Jakub Jelinek wrote:
 Extra line missing before the PR line.
Should I post a PATCH v9, or could someone add the missing line when committing
the patch?

--
Regards
lin zuojian


Re: Remove obsolete Solaris 9 support

2014-04-18 Thread Eric Botcazou
 But for the Solaris 9 stuff, it's crystal clear that this cannot occur on
 Solaris 10 and up (no single-threaded case anymore since libthread.so.1
 has been folded into libc.so.1).  Ok to remove this part?

OK for the Solaris 9 - single-threaded part.

-- 
Eric Botcazou


[PATCH] RTEMS thread model configuration

2014-04-18 Thread Sebastian Huber
From: Sebastian Huber sebastian-hu...@web.de

The command line to build a GCC for RTEMS virtually always contained
'--enable-threads'.  This patch helps to avoid this extra configuration
command line parameter and makes the GCC build a bit more user-friendly
for RTEMS.
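
(In other words, a plain configure with no --enable-threads should now
default to thread_file='rtems'; --enable-threads=posix selects the posix
model; --enable-threads=no leaves thread_file unset; and any other value
is rejected with an error.)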

This patch should be applied to GCC 4.9 branch and master.

2014-04-18  Sebastian Huber  sebastian.hu...@embedded-brains.de

* config.gcc (*-*-rtems*): Default to 'rtems' thread model.
Enable selection of 'posix' or no thread model.
---
 gcc/config.gcc | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 3c55c88..93d5994 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -791,7 +791,13 @@ case ${target} in
   ;;
 *-*-rtems*)
   case ${enable_threads} in
-yes) thread_file='rtems' ;;
+"" | yes | rtems) thread_file='rtems' ;;
+posix) thread_file='posix' ;;
+no) ;;
+*)
+  echo 'Unknown thread configuration for RTEMS'
+  exit 1
+  ;;
   esac
   tmake_file="${tmake_file} t-rtems"
   extra_options="${extra_options} rtems.opt"
-- 
1.8.1.4



Re: [PATCH] Do not run IPA transform phases multiple times

2014-04-18 Thread Jakub Jelinek
On Wed, Apr 16, 2014 at 03:28:59PM +0000, Zamyatin, Igor wrote:
 Likely after this was checked in, the following appeared on x86:
 
 FAIL: gcc.dg/vect/vect-simd-clone-11.c -flto -ffat-lto-objects (internal compiler error)
 FAIL: gcc.dg/vect/vect-simd-clone-11.c -flto -ffat-lto-objects (test for excess errors)
 FAIL: gcc.dg/vect/vect-simd-clone-12.c -flto -ffat-lto-objects (internal compiler error)
 FAIL: gcc.dg/vect/vect-simd-clone-12.c -flto -ffat-lto-objects (test for excess errors)
 FAIL: gcc.dg/vect/vect-simd-clone-1.c -flto -ffat-lto-objects (internal compiler error)
 FAIL: gcc.dg/vect/vect-simd-clone-1.c -flto -ffat-lto-objects (test for excess errors)
 FAIL: gcc.dg/vect/vect-simd-clone-2.c -flto -ffat-lto-objects (internal compiler error)
 FAIL: gcc.dg/vect/vect-simd-clone-2.c -flto -ffat-lto-objects (test for excess errors)
 FAIL: gcc.dg/vect/vect-simd-clone-3.c -flto -ffat-lto-objects (internal compiler error)
 FAIL: gcc.dg/vect/vect-simd-clone-3.c -flto -ffat-lto-objects (test for excess errors)
 FAIL: gcc.dg/vect/vect-simd-clone-4.c -flto -ffat-lto-objects (internal compiler error)
 FAIL: gcc.dg/vect/vect-simd-clone-4.c -flto -ffat-lto-objects (test for excess errors)
 FAIL: gcc.dg/vect/vect-simd-clone-5.c -flto -ffat-lto-objects (internal compiler error)
 FAIL: gcc.dg/vect/vect-simd-clone-5.c -flto -ffat-lto-objects (test for excess errors)
 FAIL: gcc.dg/vect/vect-simd-clone-6.c -flto -ffat-lto-objects (internal compiler error)
 FAIL: gcc.dg/vect/vect-simd-clone-6.c -flto -ffat-lto-objects (test for excess errors)
 FAIL: gcc.dg/vect/vect-simd-clone-7.c -flto -ffat-lto-objects (internal compiler error)
 FAIL: gcc.dg/vect/vect-simd-clone-7.c -flto -ffat-lto-objects (test for excess errors)
 FAIL: gcc.dg/vect/vect-simd-clone-8.c -flto -ffat-lto-objects (internal compiler error)
 FAIL: gcc.dg/vect/vect-simd-clone-8.c -flto -ffat-lto-objects (test for excess errors)
 FAIL: gcc.dg/vect/vect-simd-clone-9.c -flto -ffat-lto-objects (internal compiler error)
 FAIL: gcc.dg/vect/vect-simd-clone-9.c -flto -ffat-lto-objects (test for excess errors)

Yeah, it is in the assert added in this patch:
977   gcc_assert (!old_version_node->ipa_transforms_to_apply.exists ());
in cgraph_function_versioning.
pass_omp_simd_clone is a late IPA pass which needs to perform
cgraph_function_versioning, and the ICE is in lto1 when the old_version_node
has been read from the LTO IL from the object file, and
ipa_transforms_to_apply contains tons of various transforms, but I suppose
that during late IPA passes they are no longer performed.

Martin, can you please fix this up?

Jakub


Re: [PATCH] Try to coalesce for unary and binary ops

2014-04-18 Thread Eric Botcazou
 Now the question is what does this tell us?  Not re-using
 the same pseudo as op and target is always better?

I think that "generally better" is the most appropriate wording; there is even
a specific pass (web.c) to that effect.

-- 
Eric Botcazou


Re: debug container patch

2014-04-18 Thread Jonathan Wakely
On 17 April 2014 21:43, François Dumont wrote:
 Hi

 Here is a patch to globally enhance debug containers implementation.

 I have isolated all the code for the special functions in a base class so that in
 C++11 we can use default implementations for the debug containers.  This way
 the implementation is simpler and inherits the noexcept qualifications.

Excellent.

 I had to add a _IsCpp11AllocatorAware template parameter to this new
 type for types that are not yet C++11 allocator aware.  We will be able to
 simplify it later.

Minor: we switch from using "cpp" to "cxx", meaning "cpp" can
unambiguously refer to the preprocessor, so I would use _IsCxx11...
for that.

 I noticed also that in std/c++11/debug.cc we have some methods qualified
 with noexcept while in C++03 user code those methods will have a throw()
 qualification.  Is that fine?

Yes, an empty throw() is compatible with noexcept(true).

I'll review the rest of the patch over the weekend, thanks!


Re: [PATCH] Fix up rotate expansion (take 2)

2014-04-18 Thread Jakub Jelinek
On Wed, Apr 16, 2014 at 10:22:52PM -0400, DJ Delorie wrote:
  +   {
  + other_amount
  +   = simplify_gen_unary (NEG, GET_MODE (op1),
  + op1, GET_MODE (op1));
  + other_amount
  +   = simplify_gen_binary (AND, GET_MODE (op1),
  +  other_amount,
  +  GEN_INT (GET_MODE_PRECISION (mode)
  +   - 1));
  +   }
   
shifted = force_reg (mode, shifted);
   
 
 causes an ICE in gcc.c-torture/execute/20020226-1.c, which we tried to
 avoid by adding an addneghi pattern (which itself has a bug, hence me
 investigating).

I don't see why you would need such a pattern.  Here is what happens
on x86_64 if I forcefully ignore the rotate patterns to exercise this code:

#8  0x00a20bd7 in expand_simple_unop (mode=SImode, code=NEG, 
op0=0x719f7600, target=0x719fc2c0, unsignedp=0)
at ../../gcc/optabs.c:2511
#9  0x007f180c in force_operand (value=0x719fd040, 
target=0x719fc2c0) at ../../gcc/expr.c:7212
#10 0x007f1457 in force_operand (value=0x719ddd68, target=0x0) at 
../../gcc/expr.c:7154
#11 0x007c8509 in force_reg (mode=SImode, x=0x719ddd68) at 
../../gcc/explow.c:683
#12 0x007dd502 in convert_move (to=0x719fc2a0, from=0x719ddd68, 
unsignedp=1) at ../../gcc/expr.c:607
#13 0x007ddde0 in convert_modes (mode=QImode, oldmode=SImode, 
x=0x719ddd68, unsignedp=1) at ../../gcc/expr.c:798
#14 0x00a1ddba in expand_binop_directly (mode=SImode, 
binoptab=ashl_optab, op0=0x719f78c0, op1=0x719ddd68, target=0x0, 
unsignedp=1, methods=OPTAB_LIB_WIDEN, last=0x719f4f78) at 
../../gcc/optabs.c:1437
#15 0x00a1e18e in expand_binop (mode=SImode, binoptab=ashl_optab, 
op0=0x719f78c0, op1=0x719ddd68, target=0x0, unsignedp=1, 
methods=OPTAB_LIB_WIDEN) at ../../gcc/optabs.c:1546
#16 0x007d12c9 in expand_shift_1 (code=LSHIFT_EXPR, mode=SImode, 
shifted=0x719f78c0, amount=0x719ddd68, target=0x0, unsignedp=1)
at ../../gcc/expmed.c:2287
#17 0x007d1209 in expand_shift_1 (code=RROTATE_EXPR, mode=SImode, 
shifted=0x719f78c0, amount=0x719f7600, target=0x0, unsignedp=1)
at ../../gcc/expmed.c:2275

i.e. the other_amount expression (and (neg ()) (const_int)), unless it is a
general_operand (which shouldn't usually be the case), is first forced
into a register with whatever mode it has (GET_MODE (op1)); forcing it
into a register forces even the operands of the AND using force_operand,
thus a separate NEG and a separate AND insn, and then finally it is mode
converted (zero extended or truncated) into whatever mode the shift pattern
requires.
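
For reference, the identity being open-coded here is the usual two-shift
rotate; a C-level sketch for 32-bit values (illustration only, not the
patch itself):

unsigned int
rotl32 (unsigned int x, unsigned int n)
{
  /* (-n) & 31 is the (and (neg ...) (const_int 31)) computed as
     other_amount above; the mask keeps the shift amount in range
     even when n is 0.  */
  return (x << (n & 31)) | (x >> ((-n) & 31));
}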

Jakub


Re: [C PATCH] Warn if switch has boolean value (PR c/60439)

2014-04-18 Thread Steven Bosscher
On Fri, Apr 18, 2014 at 7:30 AM, Marek Polacek wrote:
 + if (TREE_CODE (type) == BOOLEAN_TYPE
 + || exp_code == TRUTH_ANDIF_EXPR
 + || exp_code == TRUTH_AND_EXPR
 + || exp_code == TRUTH_ORIF_EXPR
 + || exp_code == TRUTH_OR_EXPR
 + || exp_code == TRUTH_XOR_EXPR
 + || exp_code == TRUTH_NOT_EXPR
 + || exp_code == EQ_EXPR
 + || exp_code == NE_EXPR
 + || exp_code == LE_EXPR
 + || exp_code == GE_EXPR
 + || exp_code == LT_EXPR
 + || exp_code == GT_EXPR)

Is there a TREE_CODE_CLASS or a #define for this?

Ciao!
Steven


Re: [PATCH] Try to coalesce for unary and binary ops

2014-04-18 Thread Steven Bosscher
On Thu, Apr 17, 2014 at 4:00 PM, Michael Matz wrote:
 And to have sth that TER not immediately un-does we have
 to disable TER which conveniently happens for coalesced
 SSA names.

 So, instead TER should be improved to not disturb the incoming instruction
 order (except where secondary effects of expanding larger trees can be
 had).  Changing the coalescing set to disable some bad parts in a later
 pass doesn't sound very convincing :)

IMHO TER should be improved to *do* disturb the order of the incoming
instructions, to reduce register pressure. There are test cases I've
looked at (pathological cases, I'll admit) where TER forwarded loads
to stores and blew up register pressure.

Alternatively: Do what has to be done to enable sched1 for ix86/x86_64...

Ciao!
Steven


Re: [PATCH] Do not run IPA transform phases multiple times

2014-04-18 Thread Martin Jambor
On Fri, Apr 18, 2014 at 12:11:45PM +0200, Jakub Jelinek wrote:
 On Wed, Apr 16, 2014 at 03:28:59PM +0000, Zamyatin, Igor wrote:
  Likely after this was checked in, the following appeared on x86:
  
  FAIL: gcc.dg/vect/vect-simd-clone-11.c -flto -ffat-lto-objects (internal compiler error)
  FAIL: gcc.dg/vect/vect-simd-clone-11.c -flto -ffat-lto-objects (test for excess errors)
  FAIL: gcc.dg/vect/vect-simd-clone-12.c -flto -ffat-lto-objects (internal compiler error)
  FAIL: gcc.dg/vect/vect-simd-clone-12.c -flto -ffat-lto-objects (test for excess errors)
  FAIL: gcc.dg/vect/vect-simd-clone-1.c -flto -ffat-lto-objects (internal compiler error)
  FAIL: gcc.dg/vect/vect-simd-clone-1.c -flto -ffat-lto-objects (test for excess errors)
  FAIL: gcc.dg/vect/vect-simd-clone-2.c -flto -ffat-lto-objects (internal compiler error)
  FAIL: gcc.dg/vect/vect-simd-clone-2.c -flto -ffat-lto-objects (test for excess errors)
  FAIL: gcc.dg/vect/vect-simd-clone-3.c -flto -ffat-lto-objects (internal compiler error)
  FAIL: gcc.dg/vect/vect-simd-clone-3.c -flto -ffat-lto-objects (test for excess errors)
  FAIL: gcc.dg/vect/vect-simd-clone-4.c -flto -ffat-lto-objects (internal compiler error)
  FAIL: gcc.dg/vect/vect-simd-clone-4.c -flto -ffat-lto-objects (test for excess errors)
  FAIL: gcc.dg/vect/vect-simd-clone-5.c -flto -ffat-lto-objects (internal compiler error)
  FAIL: gcc.dg/vect/vect-simd-clone-5.c -flto -ffat-lto-objects (test for excess errors)
  FAIL: gcc.dg/vect/vect-simd-clone-6.c -flto -ffat-lto-objects (internal compiler error)
  FAIL: gcc.dg/vect/vect-simd-clone-6.c -flto -ffat-lto-objects (test for excess errors)
  FAIL: gcc.dg/vect/vect-simd-clone-7.c -flto -ffat-lto-objects (internal compiler error)
  FAIL: gcc.dg/vect/vect-simd-clone-7.c -flto -ffat-lto-objects (test for excess errors)
  FAIL: gcc.dg/vect/vect-simd-clone-8.c -flto -ffat-lto-objects (internal compiler error)
  FAIL: gcc.dg/vect/vect-simd-clone-8.c -flto -ffat-lto-objects (test for excess errors)
  FAIL: gcc.dg/vect/vect-simd-clone-9.c -flto -ffat-lto-objects (internal compiler error)
  FAIL: gcc.dg/vect/vect-simd-clone-9.c -flto -ffat-lto-objects (test for excess errors)
 
 Yeah, it is in the assert added in this patch:
  977 gcc_assert (!old_version_node->ipa_transforms_to_apply.exists ());
 in cgraph_function_versioning.
 pass_omp_simd_clone is a late IPA pass which needs to perform
 cgraph_function_versioning, and the ICE is in lto1 when the old_version_node
 has been read from the LTO IL from the object file, and
 ipa_transforms_to_apply contains tons of various transforms, but I suppose
 that during late IPA passes they are no longer performed.
 
 Martin, can you please fix this up?
 
   Jakub

I am aware this problem has been reported, but my problem is that I
cannot reproduce it anywhere.  The tests pass on an x86_64 host, and the
only i686 host I use is gcc45 on the GCC compile farm, where these tests are
skipped.  It is very likely that the patch below is the proper fix, but
it's cumbersome to propose patches without having the testcase.  And
of course I cannot test it, because the vector is empty for me in all
cases.

I'll try to look for some new i686 machine somewhere, although they
are very scarce recently.

Martin


2014-04-18  Martin Jambor  mjam...@suse.cz

* cgraphclones.c (cgraph_function_versioning): Copy
ipa_transforms_to_apply instead of asserting it is empty.

Index: src/gcc/cgraphclones.c
===
--- src.orig/gcc/cgraphclones.c
+++ src/gcc/cgraphclones.c
@@ -974,7 +974,9 @@ cgraph_function_versioning (struct cgrap
 cgraph_copy_node_for_versioning (old_version_node, new_decl,
 redirect_callers, bbs_to_copy);
 
-  gcc_assert (!old_version_node->ipa_transforms_to_apply.exists ());
+  if (old_version_node->ipa_transforms_to_apply.exists ())
+new_version_node->ipa_transforms_to_apply
+  = old_version_node->ipa_transforms_to_apply.copy ();
   /* Copy the OLD_VERSION_NODE function tree to the new version.  */
   tree_function_versioning (old_decl, new_decl, tree_map, false, args_to_skip,
skip_return, bbs_to_copy, new_entry_block);


Re: [PATCH] Do not run IPA transform phases multiple times

2014-04-18 Thread Jakub Jelinek
On Fri, Apr 18, 2014 at 01:33:54PM +0200, Martin Jambor wrote:
 I am aware this problem has been reported but my problem is that I
 cannot reproduce it anywhere.  The tests pass on x86_64 host and the
 only i686 host I use is gcc45 on gcc farm where these tests are
 skipped.  It is very likely that the patch below is the proper fix but

It reproduces on x86_64 too, I guess the reason why you aren't seeing this
is that you might have too old assembler that doesn't support
avx2 instructions (you actually don't need avx2 hw to reproduce, any
x86_64 or i686 just with gas that supports avx2 should be enough).

Jakub


Re: [C PATCH] Warn if switch has boolean value (PR c/60439)

2014-04-18 Thread Marc Glisse

On Fri, 18 Apr 2014, Steven Bosscher wrote:


On Fri, Apr 18, 2014 at 7:30 AM, Marek Polacek wrote:

+ if (TREE_CODE (type) == BOOLEAN_TYPE
+ || exp_code == TRUTH_ANDIF_EXPR
+ || exp_code == TRUTH_AND_EXPR
+ || exp_code == TRUTH_ORIF_EXPR
+ || exp_code == TRUTH_OR_EXPR
+ || exp_code == TRUTH_XOR_EXPR
+ || exp_code == TRUTH_NOT_EXPR
+ || exp_code == EQ_EXPR
+ || exp_code == NE_EXPR
+ || exp_code == LE_EXPR
+ || exp_code == GE_EXPR
+ || exp_code == LT_EXPR
+ || exp_code == GT_EXPR)


Is there a TREE_CODE_CLASS or a #define for this?


truth_value_p

--
Marc Glisse


Re: [PATCH] Try to coalesce for unary and binary ops

2014-04-18 Thread Richard Biener
On April 18, 2014 1:30:59 PM CEST, Steven Bosscher stevenb@gmail.com wrote:
On Thu, Apr 17, 2014 at 4:00 PM, Michael Matz wrote:
 And to have sth that TER not immediately un-does we have
 to disable TER which conveniently happens for coalesced
 SSA names.

 So, instead TER should be improved to not disturb the incoming
instruction
 order (except where secondary effects of expanding larger trees can
be
 had).  Changing the coalescing set to disable some bad parts in a
later
 pass doesn't sound very convincing :)

IMHO TER should be improved to *do* disturb the order of the incoming
instructions, to reduce register pressure. There are test cases I've
looked at (pathological cases, I'll admit) where TER forwarded loads
to stores and blew up register pressure.

I am looking at doing sth like that by scheduling gimple stmts to that effect
before RTL expansion.  But TER undoes most of the effort, unfortunately.  TER
itself won't be able to perform scheduling due to what it is designed to do
(simulate RTL expansion from GENERIC).

Richard.

Alternatively: Do what has to be done to enable sched1 for
ix86/x86_64...

Ciao!
Steven




Re: [C PATCH] Warn if switch has boolean value (PR c/60439)

2014-04-18 Thread Marek Polacek
On Fri, Apr 18, 2014 at 01:20:59PM +0200, Steven Bosscher wrote:
 On Fri, Apr 18, 2014 at 7:30 AM, Marek Polacek wrote:
  + if (TREE_CODE (type) == BOOLEAN_TYPE
  + || exp_code == TRUTH_ANDIF_EXPR
  + || exp_code == TRUTH_AND_EXPR
  + || exp_code == TRUTH_ORIF_EXPR
  + || exp_code == TRUTH_OR_EXPR
  + || exp_code == TRUTH_XOR_EXPR
  + || exp_code == TRUTH_NOT_EXPR
  + || exp_code == EQ_EXPR
  + || exp_code == NE_EXPR
  + || exp_code == LE_EXPR
  + || exp_code == GE_EXPR
  + || exp_code == LT_EXPR
  + || exp_code == GT_EXPR)
 
 Is there a TREE_CODE_CLASS or a #define for this?

Good question.  I was looking for something nicer and found nothing,
no TREE_CODE_CLASS or #define.
But now, looking again, I found truth_value_p...  Thanks.

2014-04-18  Marek Polacek  pola...@redhat.com

PR c/60439
* doc/invoke.texi: Document -Wswitch-bool.
c/
* c-typeck.c (c_start_case): Warn if switch condition has boolean
value.
c-family/
* c.opt (Wswitch-bool): New option.
testsuite/
* gcc.dg/pr60439.c: New test.

diff --git gcc/c-family/c.opt gcc/c-family/c.opt
index 390c056..9089496 100644
--- gcc/c-family/c.opt
+++ gcc/c-family/c.opt
@@ -529,6 +529,10 @@ Wswitch-enum
 C ObjC C++ ObjC++ Var(warn_switch_enum) Warning
 Warn about all enumerated switches missing a specific case
 
+Wswitch-bool
+C ObjC Warning Init(1)
+Warn about switches with boolean controlling expression
+
 Wmissing-format-attribute
 C ObjC C++ ObjC++ Alias(Wsuggest-attribute=format)
 ;
diff --git gcc/c/c-typeck.c gcc/c/c-typeck.c
index 65aad45..1b37f83 100644
--- gcc/c/c-typeck.c
+++ gcc/c/c-typeck.c
@@ -9344,6 +9344,15 @@ c_start_case (location_t switch_loc,
   else
{
  tree type = TYPE_MAIN_VARIANT (orig_type);
+ tree e = exp;
+
+ while (TREE_CODE (e) == COMPOUND_EXPR)
+   e = TREE_OPERAND (e, 1);
+
+ if (TREE_CODE (type) == BOOLEAN_TYPE
+ || truth_value_p (TREE_CODE (e)))
+   warning_at (switch_cond_loc, OPT_Wswitch_bool,
+   "switch condition has boolean value");
 
  if (!in_system_header_at (input_location)
    && (type == long_integer_type_node
diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi
index 8004da8..04e1c41 100644
--- gcc/doc/invoke.texi
+++ gcc/doc/invoke.texi
@@ -268,7 +268,7 @@ Objective-C and Objective-C++ Dialects}.
 -Wstrict-aliasing=n @gol -Wstrict-overflow -Wstrict-overflow=@var{n} @gol
 -Wsuggest-attribute=@r{[}pure@r{|}const@r{|}noreturn@r{|}format@r{]} @gol
 -Wmissing-format-attribute @gol
--Wswitch  -Wswitch-default  -Wswitch-enum -Wsync-nand @gol
+-Wswitch  -Wswitch-default  -Wswitch-enum -Wswitch-bool -Wsync-nand @gol
 -Wsystem-headers  -Wtrampolines  -Wtrigraphs  -Wtype-limits  -Wundef @gol
 -Wuninitialized  -Wunknown-pragmas  -Wno-pragmas @gol
 -Wunsuffixed-float-constants  -Wunused  -Wunused-function @gol
@@ -3822,6 +3822,12 @@ between @option{-Wswitch} and this option is that this option gives a
 warning about an omitted enumeration code even if there is a
 @code{default} label.
 
+@item -Wswitch-bool
+@opindex Wswitch-bool
+@opindex Wno-switch-bool
+Warn whenever a @code{switch} statement has an index of boolean type.
+This warning is enabled by default for C programs.
+
 @item -Wsync-nand @r{(C and C++ only)}
 @opindex Wsync-nand
 @opindex Wno-sync-nand
diff --git gcc/testsuite/gcc.dg/pr60439.c gcc/testsuite/gcc.dg/pr60439.c
index e69de29..26e7c25 100644
--- gcc/testsuite/gcc.dg/pr60439.c
+++ gcc/testsuite/gcc.dg/pr60439.c
@@ -0,0 +1,112 @@
+/* PR c/60439 */
+/* { dg-do compile } */
+
+typedef _Bool bool;
+extern _Bool foo (void);
+
+void
+f1 (const _Bool b)
+{
+  switch (b) /* { dg-warning "switch condition has boolean value" } */
+case 1:
+  break;
+}
+
+void
+f2 (int a, int b)
+{
+  switch (a && b) /* { dg-warning "switch condition has boolean value" } */
+case 1:
+  break;
+  switch ((bool) (a && b)) /* { dg-warning "switch condition has boolean value" } */
+case 1:
+  break;
+  switch ((a && b) || a) /* { dg-warning "switch condition has boolean value" } */
+case 1:
+  break;
+}
+
+void
+f3 (int a)
+{
+  switch (!!a) /* { dg-warning "switch condition has boolean value" } */
+case 1:
+  break;
+  switch (!a) /* { dg-warning "switch condition has boolean value" } */
+case 1:
+  break;
+}
+
+void
+f4 (void)
+{
+  switch (foo ()) /* { dg-warning "switch condition has boolean value" } */
+case 1:
+  break;
+}
+
+void
+f5 (int a)
+{
+  switch (a == 3) /* { dg-warning "switch condition has boolean value" } */
+case 1:
+  break;
+  switch (a != 3) /* { dg-warning "switch condition has boolean value" } */
+case 1:
+  break;
+  switch (a < 3) /* { dg-warning "switch condition has boolean value" } */
+case 1:
+  break;
+  switch (a > 3) /* { dg-warning "switch condition has

Re: [PATH, SH] Small builtin_strlen improvement

2014-04-18 Thread Oleg Endo
Sorry for the delayed reply.

On Mon, 2014-03-31 at 09:44 +0200, Christian Bruel wrote:
 On 03/30/2014 11:02 PM, Oleg Endo wrote:
  Hi,
 
  On Wed, 2014-03-26 at 08:58 +0100, Christian Bruel wrote:
 
  This patch adds a few instructions to the inlined builtin_strlen to
  unroll the remaining bytes of the word-at-a-time loop.  This makes it
  possible to have 2 distinct execution paths (no fall-thru in the
  byte-at-a-time loop), allowing block alignment assignment.  This
  partially improves the problem reported by Oleg in [Bug target/60539]
  New: [SH] builtin string functions ignore loop and label alignment.
  Actually, my original concern was the (mis)alignment of the 4 byte inner
  loop.  AFAIR it's better for the SH pipeline if the first insn of a loop
  is 4 byte aligned.
 
 yes, this is why I haven't closed the PR.  IMHO the problem with the
 non-aligned loop stems from the generic alignment code in final.c.
 Changing branch frequencies is quite impacting for BB reordering as well.
 Further tuning of static branch estimations, or tuning of the LOOP_ALIGN
 macro, is needed.

OK, I've updated PR 60539 accordingly.

 Note that my branch estimations in this code are very
 empirical; dynamic profiling benchmarking would be nice as well.
 My point was just that forcing a local .align in this code is a
 workaround, as we should be able to rely on the generic reordering/align
 code for this.  So the tuning of loop alignment is more global (and well
 exhibited here indeed).

I think that those two are separate issues.  I've opened a new PR 60884
for this.  Let's continue the discussions and experiments there.

Cheers,
Oleg




Re: [Ping] [Patch C++] RFC Fix PR57958

2014-04-18 Thread Dinar Temirbulatov
I found a typo in the testcase header and fixed it.  Ok to apply to trunk?
thanks, Dinar.

On Fri, Apr 18, 2014 at 1:13 PM, Dinar Temirbulatov
dtemirbula...@gmail.com wrote:
 Hello,
 Here is another version of the fix.  This time, I added a
 complete_type_or_else call just before aggregate_value_p.  Bootstrapped
 and tested on x86_64-pc-linux-gnu with no new regressions.  Ok to apply
 to trunk?
  Thanks, Dinar.


 On Mon, Apr 7, 2014 at 11:47 PM, Jason Merrill ja...@redhat.com wrote:
 On 04/07/2014 03:46 PM, Jason Merrill wrote:

 I guess we need to call complete_type before aggregate_value_p.


 complete_type_or_else, actually.

 Jason


2014-04-18  Dinar Temirbulatov  dtemirbula...@gmail.com

PR c++/57958
* semantics.c (apply_deduced_return_type): Complete non-void type
before estimating whether the type is aggregate.


fix.patch
Description: Binary data


Re: [PATCH] Do not run IPA transform phases multiple times

2014-04-18 Thread Martin Jambor
On Fri, Apr 18, 2014 at 01:49:36PM +0200, Jakub Jelinek wrote:
 It reproduces on x86_64 too, I guess the reason why you aren't seeing this
 is that you might have too old assembler that doesn't support
 avx2 instructions (you actually don't need avx2 hw to reproduce, any
 x86_64 or i686 just with gas that supports avx2 should be enough).
 

I see; with that information I have managed to reproduce the failures
and am now convinced the patch (re-posted below) is indeed the correct
thing to do.  I am going to bootstrap it over the weekend (I can't do it
earlier to test these testcases).  Honza, is it OK for trunk if it
passes?

Thanks,

Martin


2014-04-18  Martin Jambor  mjam...@suse.cz

* cgraphclones.c (cgraph_function_versioning): Copy
ipa_transforms_to_apply instead of asserting it is empty.

Index: src/gcc/cgraphclones.c
===
--- src.orig/gcc/cgraphclones.c
+++ src/gcc/cgraphclones.c
@@ -974,7 +974,9 @@ cgraph_function_versioning (struct cgrap
 cgraph_copy_node_for_versioning (old_version_node, new_decl,
 redirect_callers, bbs_to_copy);
 
-  gcc_assert (!old_version_node->ipa_transforms_to_apply.exists ());
+  if (old_version_node->ipa_transforms_to_apply.exists ())
+new_version_node->ipa_transforms_to_apply
+  = old_version_node->ipa_transforms_to_apply.copy ();
   /* Copy the OLD_VERSION_NODE function tree to the new version.  */
   tree_function_versioning (old_decl, new_decl, tree_map, false, args_to_skip,
skip_return, bbs_to_copy, new_entry_block);



Re: [PATCH, rs6000, 4.8, 4.9, trunk] Fix little endian behavior of vec_merge[hl] for V4SI/V4SF with VSX

2014-04-18 Thread David Edelsohn
On Thu, Apr 17, 2014 at 11:06 PM, Bill Schmidt
wschm...@linux.vnet.ibm.com wrote:
 Hi,

 I missed a case in the vector API work for little endian.  When VSX is
 enabled, the vec_mergeh and vec_mergel interfaces for 4x32 vectors are
 translated into xxmrghw and xxmrglw.  The patterns for these were not
 adjusted for little endian.  This patch fixes this and adds tests for
 V4SI and V4SF modes when VSX is available.

 Bootstrapped and tested on 4.8, 4.9, and trunk for
 powerpc64le-unknown-linux-gnu with no regressions.  Tests are still
 ongoing for powerpc64-unknown-linux-gnu.  Provided those complete
 without regressions, is this fix ok for trunk, 4.9, and 4.8?

 Thanks,
 Bill


 [gcc]

 2014-04-17  Bill Schmidt  wschm...@linux.vnet.ibm.com

  * config/rs6000/vsx.md (vsx_xxmrghw_<mode>): Adjust for
  little-endian.
  (vsx_xxmrglw_<mode>): Likewise.

 [gcc/testsuite]

 2014-04-17  Bill Schmidt  wschm...@linux.vnet.ibm.com

 * gcc.dg/vmx/merge-vsx.c: Add V4SI and V4SF tests.
 * gcc.dg/vmx/merge-vsx-be-order.c: Likewise.

This is okay for trunk, 4.9.1 and 4.8.3.  Note that the 4.9 branch is
frozen at the moment.

Thanks, David


[PATCH] Fix omp declare simd cloning (PR tree-optimization/60823)

2014-04-18 Thread Jakub Jelinek
Hi!

This patch fixes the adjustments performed by ipa_simd_modify_function_body
on omp declare simd clones.
Previously we've been trying to replace typically SSA_NAMEs with underlying
PARM_DECLs of the to be replaced arguments with loads/stores from/to
array refs (that will be hopefully vectorized) right around the referencing
stmt, but:
1) this can't really work well if there is any life range overlap in SSA_NAMEs
   with the same underlying PARM_DECL
2) PHIs weren't handled at all (neither PHI arguments, nor lhs of the PHIs)
3) for addressable PARM_DECLs the code pretty much assumed the same thing
   can be done too

This patch instead adjusts all SSA_NAMEs with SSA_NAME_VAR equal to the to
be replaced PARM_DECLs to a new underlying VAR_DECL, only changes the
(D) SSA_NAME to a load done at the start of the entry block, and for
addressable PARM_DECLs adjusts them such that they don't have to be
regimplified (as we replace say address of a PARM_DECL which is a
gimple_min_invariant with array ref with variable index which is not
gimple_min_invariant, we need to force the addresses into SSA_NAMEs).
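
For a concrete picture, a hypothetical declare simd function of the kind
being cloned (illustration only; the new testcases below are the real
coverage):

#pragma omp declare simd
int
f (int x)
{
  int *p = &x;		/* addressable PARM_DECL, case 3) above */
  return *p + 1;
}

In the simd clone the scalar x is replaced by an element of a vector
argument array, so every SSA_NAME based on the old PARM_DECL, and every
address taken of it, has to be rewritten as described.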

The tree-dfa.c fix is what I've discovered while writing the patch,
if htab_find_slot_with_hash (..., NO_INSERT) fails to find something
in the hash table (most likely not actually needed by the patch, discovered
that just because the patch was buggy initially), it returns NULL rather
than address of some slot which will contain NULL.
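
A short sketch of the hashtab pitfall just described (using only the
documented libiberty interface):

#include "hashtab.h"

static void *
lookup_only (htab_t tab, void *key, hashval_t hash)
{
  void **slot = htab_find_slot_with_hash (tab, key, hash, NO_INSERT);
  /* On a failed lookup the return value itself is NULL; it is not a
     valid slot that happens to hold NULL, so check before dereferencing.  */
  if (slot == NULL)
    return NULL;
  return *slot;
}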

Bootstrapped/regtested on x86_64-linux and i686-linux.

Richard, does this look reasonable?

2014-04-18  Jakub Jelinek  ja...@redhat.com

PR tree-optimization/60823
* omp-low.c (ipa_simd_modify_function_body): Go through
all SSA_NAMEs and for those referring to vector arguments
which are going to be replaced adjust SSA_NAME_VAR and,
if it is a default definition, change it into a non-default
definition assigned at the beginning of function from new_decl.
(ipa_simd_modify_stmt_ops): Rewritten.
* tree-dfa.c (set_ssa_default_def): When removing default def,
check for NULL loc instead of NULL *loc.

* c-c++-common/gomp/pr60823-1.c: New test.
* c-c++-common/gomp/pr60823-2.c: New test.
* c-c++-common/gomp/pr60823-3.c: New test.

--- gcc/omp-low.c.jj2014-04-17 14:48:59.076025713 +0200
+++ gcc/omp-low.c   2014-04-18 12:00:16.666701773 +0200
@@ -11281,45 +11281,53 @@ static tree
 ipa_simd_modify_stmt_ops (tree *tp, int *walk_subtrees, void *data)
 {
   struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
-  if (!SSA_VAR_P (*tp))
+  struct modify_stmt_info *info = (struct modify_stmt_info *) wi->info;
+  tree *orig_tp = tp;
+  if (TREE_CODE (*tp) == ADDR_EXPR)
+tp = &TREE_OPERAND (*tp, 0);
+  struct ipa_parm_adjustment *cand = NULL;
+  if (TREE_CODE (*tp) == PARM_DECL)
+cand = ipa_get_adjustment_candidate (&tp, NULL, info->adjustments, true);
+  else
 {
-  /* Make sure we treat subtrees as a RHS.  This makes sure that
-when examining the `*foo' in *foo=x, the `foo' get treated as
-a use properly.  */
-  wi->is_lhs = false;
-  wi->val_only = true;
   if (TYPE_P (*tp))
*walk_subtrees = 0;
-  return NULL_TREE;
-}
-  struct modify_stmt_info *info = (struct modify_stmt_info *) wi->info;
-  struct ipa_parm_adjustment *cand
-= ipa_get_adjustment_candidate (&tp, NULL, info->adjustments, true);
-  if (!cand)
-return NULL_TREE;
-
-  tree t = *tp;
-  tree repl = make_ssa_name (TREE_TYPE (t), NULL);
-
-  gimple stmt;
-  gimple_stmt_iterator gsi = gsi_for_stmt (info->stmt);
-  if (wi->is_lhs)
-{
-  stmt = gimple_build_assign (unshare_expr (cand->new_decl), repl);
-  gsi_insert_after (gsi, stmt, GSI_SAME_STMT);
-  SSA_NAME_DEF_STMT (repl) = info->stmt;
 }
+
+  tree repl = NULL_TREE;
+  if (cand)
+repl = unshare_expr (cand->new_decl);
   else
 {
-  /* You'd think we could skip the extra SSA variable when
-wi->val_only=true, but we may have `*var' which will get
-replaced into `*var_array[iter]' and will likely be something
-not gimple.  */
-  stmt = gimple_build_assign (repl, unshare_expr (cand->new_decl));
-  gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
+  if (tp != orig_tp)
+   {
+ *walk_subtrees = 0;
+ bool modified = info->modified;
+ info->modified = false;
+ walk_tree (tp, ipa_simd_modify_stmt_ops, wi, wi->pset);
+ if (!info->modified)
+   {
+ info->modified = modified;
+ return NULL_TREE;
+   }
+ info->modified = modified;
+ repl = *tp;
+   }
+  else
+   return NULL_TREE;
 }
 
-  if (!useless_type_conversion_p (TREE_TYPE (*tp), TREE_TYPE (repl)))
+  if (tp != orig_tp)
+{
+  repl = build_fold_addr_expr (repl);
+  gimple stmt
+   = gimple_build_assign (make_ssa_name (TREE_TYPE (repl), NULL), repl);
+  repl = gimple_assign_lhs (stmt);
+  

Re: [PATCH] Introduce MODE_SIZE mode attribute

2014-04-18 Thread H.J. Lu
On Mon, Jan 6, 2014 at 7:20 AM, Jakub Jelinek ja...@redhat.com wrote:
 On Sat, Jan 04, 2014 at 12:37:57AM +0100, Jakub Jelinek wrote:
 That is certainly doable (as attached), but strangely if the patch (that I've
 already committed) is reverted and this one applied, the .text savings are
 much smaller.

 Here are .text and .rodata readelf -WS lines from x86_64 (first 4 pairs) and
 i686 (last 4 pairs) builds, always vanilla trunk before r206312, that plus
 r206312 patch, without r206312 but with attached patch, with both r206312
 and attached patch.  So, for .text size the best is both patches, but
 for .rodata patches just r206312.  I'll try to look at details why this is so
 next week.

 The difference is, I think, caused by the way genconditions.c works.
 As the array with the conditions is a toplevel array, __builtin_constant_p
 is folded there already during parsing, after folding the conditions.
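
To illustrate the folding described above with a minimal sketch (the
array below is only a stand-in for what genconditions.c emits, not the
actual generated code):

/* At file scope the initializer must be constant, so
   __builtin_constant_p is folded while the array is parsed and cannot
   survive to later passes.  */
static const int conds[] = {
  __builtin_constant_p (16 * 1024) ? 1 : 0,  /* folds to 1 at parse time */
};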


For some reason, it triggered:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60887

-- 
H.J.


Re: [Ping] [Patch C++] RFC Fix PR57958

2014-04-18 Thread Jason Merrill

On 04/18/2014 08:53 AM, Dinar Temirbulatov wrote:

+  if (TREE_CODE (TREE_TYPE(result)) != VOID_TYPE)


Space after TREE_TYPE.  OK with that change.

Jason



Re: [Ping] [Patch C++] RFC Fix PR57958

2014-04-18 Thread Paolo Carlini


Hi,

On 18 aprile 2014 17:20:40 CEST, Jason Merrill ja...@redhat.com wrote:
On 04/18/2014 08:53 AM, Dinar Temirbulatov wrote:
 +  if (TREE_CODE (TREE_TYPE(result)) != VOID_TYPE)

Space after TREE_TYPE.  OK with that change.

I'm traveling and I can't really check, but I seem to remember that we do have 
a VOID_TYPE_P.
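
For reference, a sketch of the suggested spelling (VOID_TYPE_P (NODE) is
simply TREE_CODE (NODE) == VOID_TYPE in tree.h):

  if (!VOID_TYPE_P (TREE_TYPE (result)))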

Paolo



Re: [Ping] [Patch C++] RFC Fix PR57958

2014-04-18 Thread Jason Merrill

On 04/18/2014 11:39 AM, Paolo Carlini wrote:

I'm traveling and I can't really check, but I seem to remember that we do have 
a VOID_TYPE_P.


True, I suppose that would be preferable.

Jason




[COMMITTED] Fix silly error in aarch64_register_move_cost

2014-04-18 Thread Richard Henderson
Building mainline I got

 .../aarch64.c:4879:134: error: invalid conversion from ‘reg_class_t {aka int}’ to ‘machine_mode’ [-fpermissive]
 if (! TARGET_SIMD && GET_MODE_SIZE (from) == 128 && GET_MODE_SIZE (to) == 128)

Sure enough, TO and FROM are not modes.  Did mainline just change away from
permissive or something?  It surely seems like we ought to have seen this
earlier...

Anyway, applied as obvious to all active branches.


r~
* config/aarch64/aarch64.c (aarch64_register_move_cost): Pass a mode
to GET_MODE_SIZE, not a reg_class_t.



diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a3147ee..7b6c2b3 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4847,9 +4847,11 @@ aarch64_address_cost (rtx x ATTRIBUTE_UNUSED,
 }
 
 static int
-aarch64_register_move_cost (enum machine_mode mode ATTRIBUTE_UNUSED,
-   reg_class_t from, reg_class_t to)
+aarch64_register_move_cost (enum machine_mode mode,
+   reg_class_t from_i, reg_class_t to_i)
 {
+  enum reg_class from = (enum reg_class) from_i;
+  enum reg_class to = (enum reg_class) to_i;
   const struct cpu_regmove_cost *regmove_cost
     = aarch64_tune_params->regmove_cost;
 
@@ -4875,8 +4877,7 @@ aarch64_register_move_cost (enum machine_mode mode ATTRIBUTE_UNUSED,
   secondary reload.  A general register is used as a scratch to move
   the upper DI value and the lower DI value is moved directly,
   hence the cost is the sum of three moves. */
-
-  if (! TARGET_SIMD && GET_MODE_SIZE (from) == 128 && GET_MODE_SIZE (to) == 128)
+  if (! TARGET_SIMD && GET_MODE_SIZE (mode) == 128)
     return regmove_cost->GP2FP + regmove_cost->FP2GP + regmove_cost->FP2FP;
 
   return regmove_cost->FP2FP;


Re: [RFC] Add aarch64 support for ada

2014-04-18 Thread Richard Henderson
On 04/16/2014 12:55 AM, Eric Botcazou wrote:
 Similarly with the HAVE_GNAT_ALTERNATE_STACK stuff.  There aren't any
 linux hosts that don't support sigaltstack, so why is this
 conditionalized?
 
 Hum, I didn't know that Android also used the alternate stack...  OK, let's 
 use it unconditionally on Linux then, except for IA-64 which is a totally 
 different beast.  Can you change the patch accordingly?
 

How about this?  I added a check vs MINSIGSTKSZ just in case, and updated the
commentary a bit.  While 16K is 2*SIGSTKSZ for i686, it certainly isn't for
powerpc64.  But since things are working as-is I thought the revision is 
clearer.

Ok?


r~
* init.c [__linux__] (HAVE_GNAT_ALTERNATE_STACK): New define.
(__gnat_alternate_stack): Enable for all linux except ia64.


diff --git a/gcc/ada/init.c b/gcc/ada/init.c
index c3824ab..48319d6 100644
--- a/gcc/ada/init.c
+++ b/gcc/ada/init.c
@@ -556,9 +556,14 @@ __gnat_error_handler (int sig, siginfo_t *si ATTRIBUTE_UNUSED, void *ucontext)
   Raise_From_Signal_Handler (exception, msg);
 }
 
-#if defined (i386) || defined (__x86_64__) || defined (__powerpc__)
-/* This must be in keeping with System.OS_Interface.Alternate_Stack_Size.  */
-char __gnat_alternate_stack[16 * 1024]; /* 2 * SIGSTKSZ */
+#ifndef __ia64__
+#define HAVE_GNAT_ALTERNATE_STACK 1
+/* This must be in keeping with System.OS_Interface.Alternate_Stack_Size.
+   It must be larger than MINSIGSTKSZ and hopefully near 2 * SIGSTKSZ.  */
+# if 16 * 1024 < MINSIGSTKSZ
+#  error __gnat_alternate_stack too small
+# endif
+char __gnat_alternate_stack[16 * 1024];
 #endif
 
 #ifdef __XENO__
@@ -612,7 +617,7 @@ __gnat_install_handler (void)
     sigaction (SIGBUS, &act, NULL);
   if (__gnat_get_interrupt_state (SIGSEGV) != 's')
 {
-#if defined (i386) || defined (__x86_64__) || defined (__powerpc__)
+#ifdef HAVE_GNAT_ALTERNATE_STACK
   /* Setup an alternate stack region for the handler execution so that
 stack overflows can be handled properly, avoiding a SEGV generation
 from stack usage by the handler itself.  */


LTO inliner -- sensitivity to increasing register pressure

2014-04-18 Thread Aaron Sawdey

Honza,
  Seeing your recent patches relating to inliner heuristics for LTO, I 
thought I should mention some related work I'm doing.


By way of introduction, I've recently joined the IBM LTC's PPC Toolchain 
team, working on gcc performance.


We have not generally seen good results using LTO on IBM power processors 
and one of the problems seems to be excessive inlining that results in the 
generation of excessive spill code. So, I have set out to tackle this by 
doing some analysis at the time of the inliner pass to compute something 
analogous to register pressure, which is then used to shut down inlining of 
routines that have a lot of pressure.


The analysis is basically a liveness analysis on the SSA names per basic 
block, looking for the maximum number live in any block. I've been using 
"liveness pressure" as a shorthand name for this.
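
To make that concrete, here is a minimal, self-contained sketch of the
idea (not the actual patch: the three-block chain CFG and the def/use
bitsets below are invented for illustration, and the real analysis would
of course walk GIMPLE SSA names):

#include <stdio.h>

#define NBLOCKS 3

typedef unsigned int bits;	/* one bit per SSA name */

static int
popcount (bits b)
{
  int n = 0;
  for (; b; b &= b - 1)
    n++;
  return n;
}

int
main (void)
{
  /* use[b]: names read in block b before any local def;
     def[b]: names written in block b.  */
  bits use[NBLOCKS] = { 0x03, 0x0c, 0x30 };
  bits def[NBLOCKS] = { 0x0c, 0x30, 0xc0 };
  /* succ[b]: single successor, -1 = exit (a chain keeps this short).  */
  int succ[NBLOCKS] = { 1, 2, -1 };

  bits live_in[NBLOCKS] = { 0 }, live_out[NBLOCKS] = { 0 };
  int changed = 1;
  while (changed)		/* iterate backward to a fixed point */
    {
      changed = 0;
      for (int b = NBLOCKS - 1; b >= 0; b--)
	{
	  bits out = succ[b] >= 0 ? live_in[succ[b]] : 0;
	  bits in = use[b] | (out & ~def[b]);
	  if (in != live_in[b] || out != live_out[b])
	    changed = 1;
	  live_in[b] = in;
	  live_out[b] = out;
	}
    }

  int pressure = 0;
  for (int b = 0; b < NBLOCKS; b++)
    {
      /* Coarse per-block bound: names live across the block plus the
	 names it defines may all be live at once.  */
      int live = popcount (live_in[b] | def[b] | live_out[b]);
      if (live > pressure)
	pressure = live;
    }
  printf ("max liveness pressure: %d\n", pressure);
  return 0;
}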


This can then be used in two ways.
1) want_inline_function_to_all_callers_p at present always says to inline 
things that have only one call site without regard to size or what this may 
do to the register allocator downstream. In particular, BZ2_decompress in 
bzip2 gets inlined and this causes the pressure reported downstream for the 
int register class to increase 10x. Looking at some combination of pressure 
in caller/callee may help avoid this kind of situation.
2) I also want to experiment with adding the liveness pressure in the 
callee into the badness calculation in edge_badness used by 
inline_small_functions. The idea here is to try to inline functions that 
are less likely to cause register allocator difficulty downstream first.


I am just at the point of getting a prototype working, I will get a patch 
you could take a look at posted next week. In the meantime, do you have any 
comments or feedback?


Thanks,
   Aaron

--
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain



Re: LTO inliner -- sensitivity to increasing register pressure

2014-04-18 Thread Jan Hubicka
Hello,
 Honza,
   Seeing your recent patches relating to inliner heuristics for LTO,
 I thought I should mention some related work I'm doing.
 
 By way of introduction, I've recently joined the IBM LTC's PPC
 Toolchain team, working on gcc performance.
 
 We have not generally seen good results using LTO on IBM power
 processors and one of the problems seems to be excessive inlining
 that results in the generation of excessive spill code. So, I have
 set out to tackle this by doing some analysis at the time of the
 inliner pass to compute something analogous to register pressure,
 which is then used to shut down inlining of routines that have a lot
 of pressure.

This is interesting.  I sort of planned to add register pressure logic
but always thought it is somewhat hard to do at the GIMPLE level in a way
that would work for all CPUs.
 
 The analysis is basically a liveness analysis on the SSA names per
 basic block and looking for the maximum number live in any block.
 I've been using liveness pressure as a shorthand name for this.

I believe this is usually called "width".
 
 This can then be used in two ways.
 1) want_inline_function_to_all_callers_p at present always says to
 inline things that have only one call site without regard to size or
 what this may do to the register allocator downstream. In
 particular, BZ2_decompress in bzip2 gets inlined and this causes the
 pressure reported downstream for the int register class to increase
 10x. Looking at some combination of pressure in caller/callee may
 help avoid this kind of situation.
 2) I also want to experiment with adding the liveness pressure in
 the callee into the badness calculation in edge_badness used by
 inline_small_functions. The idea here is to try to inline functions
 that are less likely to cause register allocator difficulty
 downstream first.

Sounds interesting.  I am very curious if you can get consistent improvements
with this.  I only implemented logic for large stack frames, but in C++ code
it seems often to do more harm than good.

If you find examples of bad inlining, can you also fill it into bugzilla?
Perhaps the individual cases could be handled better by improving IRA.

Honza
 
 I am just at the point of getting a prototype working, I will get a
 patch you could take a look at posted next week. In the meantime, do
 you have any comments or feedback?
 
 Thanks,
Aaron
 
 -- 
 Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
 050-2/C113  (507) 253-7520 home: 507/263-0782
 IBM Linux Technology Center - PPC Toolchain


Re: LTO inliner -- sensitivity to increasing register pressure

2014-04-18 Thread Xinliang David Li
Do you witness similar problems with LTO +FDO?

My concern is it can be tricky to get the register pressure estimate
right. The register pressure problem is created by downstream
components (code motions etc) but only exposed by the inliner.  If you
want to get it 'right' (i.e., not exposing the problems), you will
need to bake the knowledge of the downstream components (possibly
bugs) into the analysis which might not be a good thing to do longer
term.

David

On Fri, Apr 18, 2014 at 9:43 AM, Aaron Sawdey
acsaw...@linux.vnet.ibm.com wrote:
 Honza,
   Seeing your recent patches relating to inliner heuristics for LTO, I
 thought I should mention some related work I'm doing.

 By way of introduction, I've recently joined the IBM LTC's PPC Toolchain
 team, working on gcc performance.

 We have not generally seen good results using LTO on IBM power processors
 and one of the problems seems to be excessive inlining that results in the
 generation of excessive spill code. So, I have set out to tackle this by
 doing some analysis at the time of the inliner pass to compute something
 analogous to register pressure, which is then used to shut down inlining of
 routines that have a lot of pressure.

 The analysis is basically a liveness analysis on the SSA names per basic
 block and looking for the maximum number live in any block. I've been using
 liveness pressure as a shorthand name for this.

 This can then be used in two ways.
 1) want_inline_function_to_all_callers_p at present always says to inline
 things that have only one call site without regard to size or what this may
 do to the register allocator downstream. In particular, BZ2_decompress in
 bzip2 gets inlined and this causes the pressure reported downstream for the
 int register class to increase 10x. Looking at some combination of pressure
 in caller/callee may help avoid this kind of situation.
 2) I also want to experiment with adding the liveness pressure in the callee
 into the badness calculation in edge_badness used by inline_small_functions.
 The idea here is to try to inline functions that are less likely to cause
 register allocator difficulty downstream first.

 I am just at the point of getting a prototype working, I will get a patch
 you could take a look at posted next week. In the meantime, do you have any
 comments or feedback?

 Thanks,
Aaron

 --
 Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
 050-2/C113  (507) 253-7520 home: 507/263-0782
 IBM Linux Technology Center - PPC Toolchain



Re: LTO inliner -- sensitivity to increasing register pressure

2014-04-18 Thread Xinliang David Li
On Fri, Apr 18, 2014 at 10:26 AM, Jan Hubicka hubi...@ucw.cz wrote:
 Hello,
 Honza,
   Seeing your recent patches relating to inliner heuristics for LTO,
 I thought I should mention some related work I'm doing.

 By way of introduction, I've recently joined the IBM LTC's PPC
 Toolchain team, working on gcc performance.

 We have not generally seen good results using LTO on IBM power
 processors and one of the problems seems to be excessive inlining
 that results in the generation of excessive spill code. So, I have
 set out to tackle this by doing some analysis at the time of the
 inliner pass to compute something analogous to register pressure,
 which is then used to shut down inlining of routines that have a lot
 of pressure.

 This is intresting.  I sort of planned to add register pressure logic
 but always tought it is somewhat hard to do at GIMPLE level in a way
 that would work for all CPUs.

 The analysis is basically a liveness analysis on the SSA names per
 basic block and looking for the maximum number live in any block.
 I've been using liveness pressure as a shorthand name for this.

 I believe this is usually called width

 This can then be used in two ways.
 1) want_inline_function_to_all_callers_p at present always says to
 inline things that have only one call site without regard to size or
 what this may do to the register allocator downstream. In
 particular, BZ2_decompress in bzip2 gets inlined and this causes the
 pressure reported downstream for the int register class to increase
 10x. Looking at some combination of pressure in caller/callee may
 help avoid this kind of situation.
 2) I also want to experiment with adding the liveness pressure in
 the callee into the badness calculation in edge_badness used by
 inline_small_functions. The idea here is to try to inline functions
 that are less likely to cause register allocator difficulty
 downstream first.

 Sounds interesting.  I am very curious if you can get consistent improvements
 with this.  I only implemented logic for large stack frames, but in C++ code
 it seems often to do more harm than good.

 If you find examples of bad inlining, can you also fill it into bugzilla?
 Perhaps the individual cases could be handled better by improving IRA.

yes -- I think this is the right time to do regardless.

David



 Honza

 I am just at the point of getting a prototype working, I will get a
 patch you could take a look at posted next week. In the meantime, do
 you have any comments or feedback?

 Thanks,
Aaron

 --
 Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
 050-2/C113  (507) 253-7520 home: 507/263-0782
 IBM Linux Technology Center - PPC Toolchain


C++ PATCH for c++/60872 (bogus error converting pointer to restricted pointer)

2014-04-18 Thread Jason Merrill
When building up the internal representation of a conversion, G++ was 
trying to apply 'restrict' to void, which doesn't make sense.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 53c6cb04ee081b959788da69d36b8d4db1e3d442
Author: Jason Merrill ja...@redhat.com
Date:   Thu Apr 17 09:22:15 2014 -0400

	PR c++/60872
	* call.c (standard_conversion): Don't try to apply restrict to void.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 7dbe935..fbd2f83 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -1196,9 +1196,10 @@ standard_conversion (tree to, tree from, tree expr, bool c_cast_p,
 	   && TREE_CODE (TREE_TYPE (from)) != FUNCTION_TYPE)
 	{
 	  tree nfrom = TREE_TYPE (from);
+	  /* Don't try to apply restrict to void.  */
+	  int quals = cp_type_quals (nfrom) & ~TYPE_QUAL_RESTRICT;
 	  from = build_pointer_type
-	    (cp_build_qualified_type (void_type_node,
-				      cp_type_quals (nfrom)));
+	    (cp_build_qualified_type (void_type_node, quals));
 	  conv = build_conv (ck_ptr, from, conv);
 	}
   else if (TYPE_PTRDATAMEM_P (from))
diff --git a/gcc/testsuite/g++.dg/ext/restrict2.C b/gcc/testsuite/g++.dg/ext/restrict2.C
new file mode 100644
index 000..f053210
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/restrict2.C
@@ -0,0 +1,8 @@
+// PR c++/60872
+// { dg-options "" }
+
+typedef double *__restrict T;
+void f(T* p)
+{
+  void *p2 = p;
+}


Re: C++ PATCH for DR 1571 (reference binding)

2014-04-18 Thread Jason Merrill

On 02/25/2014 04:27 PM, Jason Merrill wrote:

Getting the reference binding rules for C++11 right (in the standard)
has taken quite a few iterations.  I'm pretty happy with the latest
wording, which deals with user-defined conversions by recursing on the
result of the conversion.  This patch implements those rules.


The earlier patch broke Firefox and was reverted; this patch avoids that 
regression.


Tested x86_64-pc-linux-gnu, applying to trunk.

commit 17d24bce33f71316e2b1f0a4a96deb29d09de0be
Author: jason jason@138bc75d-0d04-0410-961f-82ee72b054a4
Date:   Tue Feb 25 21:27:51 2014 +

	DR 1571
	* call.c (reference_binding): Recurse on user-defined conversion.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index fbd2f83..8c55c32 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -1684,20 +1684,30 @@ reference_binding (tree rto, tree rfrom, tree expr, bool c_cast_p, int flags,
   if (!conv)
 return NULL;
 
+  if (conv->user_conv_p)
+    {
+      /* If initializing the temporary used a conversion function,
+	 recalculate the second conversion sequence.  */
+      for (conversion *t = conv; t; t = next_conversion (t))
+	if (t->kind == ck_user
+	    && DECL_CONV_FN_P (t->cand->fn))
+	  {
+	    tree ftype = TREE_TYPE (TREE_TYPE (t->cand->fn));
+	    int sflags = (flags|LOOKUP_NO_CONVERSION)&~LOOKUP_NO_TEMP_BIND;
+	    conversion *new_second
+	      = reference_binding (rto, ftype, NULL_TREE, c_cast_p,
+				   sflags, complain);
+	    if (!new_second)
+	      return NULL;
+	    return merge_conversion_sequences (t, new_second);
+	  }
+    }
+
   conv = build_conv (ck_ref_bind, rto, conv);
   /* This reference binding, unlike those above, requires the
      creation of a temporary.  */
   conv->need_temporary_p = true;
-  if (TYPE_REF_IS_RVALUE (rto))
-    {
-      conv->rvaluedness_matches_p = 1;
-      /* In the second case, if the reference is an rvalue reference and
-	 the second standard conversion sequence of the user-defined
-	 conversion sequence includes an lvalue-to-rvalue conversion, the
-	 program is ill-formed.  */
-      if (conv->user_conv_p && next_conversion (conv)->kind == ck_rvalue)
-	conv->bad_p = 1;
-    }
+  conv->rvaluedness_matches_p = TYPE_REF_IS_RVALUE (rto);
 
   return conv;
 }
diff --git a/gcc/testsuite/g++.dg/cpp0x/overload3.C b/gcc/testsuite/g++.dg/cpp0x/overload3.C
index 2d95783..0eecabd 100644
--- a/gcc/testsuite/g++.dg/cpp0x/overload3.C
+++ b/gcc/testsuite/g++.dg/cpp0x/overload3.C
@@ -13,5 +13,5 @@ struct wrap
 int main()
 {
   wrap w;
-  f(w);	// { dg-error "lvalue" }
+  f(w);	// { dg-error "" }
 }
diff --git a/gcc/testsuite/g++.dg/cpp0x/rv-init1.C b/gcc/testsuite/g++.dg/cpp0x/rv-init1.C
new file mode 100644
index 000..2e8d4f7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/rv-init1.C
@@ -0,0 +1,26 @@
+// Core DR 1604/1571/1572
+// { dg-require-effective-target c++11 }
+
+struct Banana { };
+struct Enigma { operator const Banana&(); };
+struct Doof { operator Banana&&(); };
+void enigmatic() {
+  typedef const Banana ConstBanana;
+  Banana&& banana1 = ConstBanana(); // { dg-error "" }
+  Banana&& banana2 = Enigma();  // { dg-error "" }
+  Banana&& banana3 = Doof();    // { dg-error "" }
+}
+
+class A {
+public:
+  operator volatile int &();
+};
+A a;
+
+const int & ir1a = a.operator volatile int&(); // { dg-error "" }
+const int & ir2a = a;			   // { dg-error "" }
+
+struct X {
+  operator int&();
+} x;
+int&& rri2 = X();		// { dg-error "" }


Re: calloc = malloc + memset

2014-04-18 Thread Marc Glisse

Thanks for the comments!

On Fri, 18 Apr 2014, Jakub Jelinek wrote:


The passes.def change makes me a little bit nervous, but if it works,
perhaps.


Would you prefer running the pass twice? I thought there would be less 
resistance to moving the pass than duplicating it. By the way, I think 
even passes we run only once should have the required functions 
implemented so they can be run several times (at least most of them), in 
case users want to do that in plugins. I was surprised when I tried adding 
a second strlen pass and the compiler refused.



--- gcc/testsuite/g++.dg/tree-ssa/calloc.C  (revision 0)
+++ gcc/testsuite/g++.dg/tree-ssa/calloc.C  (working copy)
@@ -0,0 +1,35 @@
+/* { dg-do compile { target c++11 } } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+#include <new>
+#include <vector>
+#include <cstdlib>
+
+void g(void*);
+inline void* operator new(std::size_t sz)
+{
+  void *p;
+
+  if (sz == 0)
+    sz = 1;
+
+  // Slightly modified from the libsupc++ version, that one has 2 calls
+  // to malloc which makes it too hard to optimize.
+  while ((p = std::malloc (sz)) == 0)
+    {
+      std::new_handler handler = std::get_new_handler ();
+      if (! handler)
+	throw std::bad_alloc();
+      handler ();
+    }
+  return p;
+}
+
+void f(void*p,int n){
+  new(p)std::vector<int>(n);
+}
+
+/* { dg-final { scan-tree-dump-times "calloc" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "malloc" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "memset" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */


This looks to me way too fragile; any time the libstdc++
or glibc headers change a little bit, you might need to adjust the
dg-final directives.  Much better would be if you just provided
the prototypes yourself and a subset of the std::vector you really need for
the testcase.  You can throw some class or int, it doesn't have to be
std::bad_alloc, etc.


I don't understand what seems so fragile to you. There is a single 
function in the .optimized dump, which just calls calloc in a loop. It 
doesn't seem that likely that a change in glibc/libstdc++ would make an 
extra memset pop up. A change in libstdc++ could easily prevent the 
optimization completely (I'd like to hope we can avoid that, half of the 
purpose of the testcase was making sure libstdc++ didn't change in a bad 
way), but I don't really see how it could keep it in a way that requires 
tweaking dg-final.


While trying to write a standalone version, I hit again many missed 
optimizations, getting such nice things in the .optimized dump as:


  _12 = p_13 + sz_7;
  if (_12 != p_13)

or:

  _12 = p_13 + sz_7;
  _30 = (unsigned long) _12;
  _9 = p_13 + 4;
  _10 = (unsigned long) _9;
  _11 = _30 - _10;
  _22 = _11 /[ex] 4;
  _21 = _22;
  _40 = _21 + 1;
  _34 = _40 * 4;

It is embarrassing... I hope the combiner GSoC will work well and we can 
just add a dozen patterns to handle this before 4.10.
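
For the record, under the usual assumption that this pointer arithmetic
cannot wrap, both fragments fold to something trivial (SSA names as in
the dumps above):

  /* first fragment:  p_13 + sz_7 != p_13   simplifies to   sz_7 != 0  */
  /* second fragment: the exact division marks _11 as a multiple of 4, so
     ((((p + sz) - (p + 4)) /[ex] 4) + 1) * 4 collapses back to sz_7.  */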



--- gcc/testsuite/gcc.dg/strlenopt-9.c  (revision 208772)
+++ gcc/testsuite/gcc.dg/strlenopt-9.c  (working copy)
@@ -11,21 +11,21 @@ fn1 (int r)
  optimized away.  */
   return strchr (p, '\0');
 }

 __attribute__((noinline, noclone)) size_t
 fn2 (int r)
 {
   char *p, q[10];
   strcpy (q, "abc");
   p = r ? "a" : q;
-  /* String length for p varies, therefore strlen below isn't
+  /* String length is constant for both alternatives, and strlen is
  optimized away.  */
   return strlen (p);


Is this because of jump threading?


It is PRE that turns:

  if (r_4(D) == 0)
    goto <bb 5>;
  else
    goto <bb 3>;

  <bb 5>:
  goto <bb 4>;

  <bb 3>:

  <bb 4>:
  # p_1 = PHI <&q(5), "a"(3)>
  _5 = __builtin_strlen (p_1);

into:

  if (r_4(D) == 0)
    goto <bb 5>;
  else
    goto <bb 3>;

  <bb 5>:
  _7 = __builtin_strlen (&q);
  pretmp_8 = _7;
  goto <bb 4>;

  <bb 3>:

  <bb 4>:
  # p_1 = PHI <&q(5), "a"(3)>
  # prephitmp_9 = PHI <pretmp_8(5), 1(3)>
  _5 = prephitmp_9;

It says:

Found partial redundancy for expression 
{call_expr__builtin_strlen,p_1}@.MEM_3 (0005)



--- gcc/testsuite/gcc.dg/tree-ssa/calloc-1.c(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/calloc-1.c(working copy)
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options -O2 -fdump-tree-optimized } */
+
+#include <stdlib.h>
+#include <string.h>


Even this I find unsafe.  The strlenopt*.c tests use their custom
strlenopt.h header for a reason; you might just add a calloc
prototype in there and use that header.


Might as well use __builtin_* then.


+/* Handle a call to malloc or calloc.  */
+
+static void
+handle_builtin_malloc (enum built_in_function bcode, gimple_stmt_iterator *gsi)
+{
+  gimple stmt = gsi_stmt (*gsi);
+  tree lhs = gimple_call_lhs (stmt);
+  gcc_assert (get_stridx (lhs) == 0);
+  int idx = new_stridx (lhs);
+  tree length = NULL_TREE;
+  if (bcode == BUILT_IN_CALLOC)
+    length = build_int_cst (size_type_node, 0);


Is this safe?  I mean, if you call int a = 0; ptr = calloc (a, n);
or ptr = calloc (n, a); or ptr = calloc (0, 0); etc., then there 

Inliner heuristics TLC 2/n - dump stats about inliner behaviour

2014-04-18 Thread Jan Hubicka
Hi,
this patch adds dumping of some summary stats that I used to generate
data for 
http://hubicka.blogspot.ca/2014/04/devirtualization-in-c-part-5-feedback.html
and found it relatively enlightening, so I think they are useful for mainline, 
too.

Bootstrapped/regtested x86_64-linux, committed.

* ipa-inline.c (spec_rem): New static variable.
(dump_overall_stats): New function.
(dump_inline_stats): New function.
Index: ipa-inline.c
===
--- ipa-inline.c(revision 209490)
+++ ipa-inline.c(working copy)
@@ -127,6 +127,7 @@ along with GCC; see the file COPYING3.
 static int overall_size;
 static gcov_type max_count;
 static sreal max_count_real, max_relbenefit_real, half_int_min_real;
+static gcov_type spec_rem;
 
 /* Return false when inlining edge E would lead to violating
limits on function unit growth or stack usage growth.  
@@ -1533,6 +1534,7 @@ resolve_noninline_speculation (fibheap_t
			      ? node->global.inlined_to : node;
   bitmap updated_nodes = BITMAP_ALLOC (NULL);
 
+  spec_rem += edge->count;
   cgraph_resolve_speculation (edge, NULL);
   reset_edge_caches (where);
   inline_update_overall_summary (where);
@@ -1996,6 +1998,130 @@ inline_to_all_callers (struct cgraph_nod
   return false;
 }
 
+/* Output overall time estimate.  */
+static void
+dump_overall_stats (void)
+{
+  HOST_WIDEST_INT sum_weighted = 0, sum = 0;
+  struct cgraph_node *node;
+
+  FOR_EACH_DEFINED_FUNCTION (node)
+    if (!node->global.inlined_to
+	&& !node->alias)
+      {
+	int time = inline_summary (node)->time;
+	sum += time;
+	sum_weighted += time * node->count;
+      }
+  fprintf (dump_file, "Overall time estimate: "
+	   HOST_WIDEST_INT_PRINT_DEC" weighted by profile: "
+	   HOST_WIDEST_INT_PRINT_DEC"\n", sum, sum_weighted);
+}
+
+/* Output some useful stats about inlining.  */
+
+static void
+dump_inline_stats (void)
+{
+  HOST_WIDEST_INT inlined_cnt = 0, inlined_indir_cnt = 0;
+  HOST_WIDEST_INT inlined_virt_cnt = 0, inlined_virt_indir_cnt = 0;
+  HOST_WIDEST_INT noninlined_cnt = 0, noninlined_indir_cnt = 0;
+  HOST_WIDEST_INT noninlined_virt_cnt = 0, noninlined_virt_indir_cnt = 0;
+  HOST_WIDEST_INT  inlined_speculative = 0, inlined_speculative_ply = 0;
+  HOST_WIDEST_INT indirect_poly_cnt = 0, indirect_cnt = 0;
+  HOST_WIDEST_INT reason[CIF_N_REASONS][3];
+  int i;
+  struct cgraph_node *node;
+
+  memset (reason, 0, sizeof (reason));
+  FOR_EACH_DEFINED_FUNCTION (node)
+  {
+    struct cgraph_edge *e;
+    for (e = node->callees; e; e = e->next_callee)
+      {
+	if (e->inline_failed)
+	  {
+	    reason[(int) e->inline_failed][0] += e->count;
+	    reason[(int) e->inline_failed][1] += e->frequency;
+	    reason[(int) e->inline_failed][2] ++;
+	    if (DECL_VIRTUAL_P (e->callee->decl))
+	      {
+		if (e->indirect_inlining_edge)
+		  noninlined_virt_indir_cnt += e->count;
+		else
+		  noninlined_virt_cnt += e->count;
+	      }
+	    else
+	      {
+		if (e->indirect_inlining_edge)
+		  noninlined_indir_cnt += e->count;
+		else
+		  noninlined_cnt += e->count;
+	      }
+	  }
+	else
+	  {
+	    if (e->speculative)
+	      {
+		if (DECL_VIRTUAL_P (e->callee->decl))
+		  inlined_speculative_ply += e->count;
+		else
+		  inlined_speculative += e->count;
+	      }
+	    else if (DECL_VIRTUAL_P (e->callee->decl))
+	      {
+		if (e->indirect_inlining_edge)
+		  inlined_virt_indir_cnt += e->count;
+		else
+		  inlined_virt_cnt += e->count;
+	      }
+	    else
+	      {
+		if (e->indirect_inlining_edge)
+		  inlined_indir_cnt += e->count;
+		else
+		  inlined_cnt += e->count;
+	      }
+	  }
+      }
+    for (e = node->indirect_calls; e; e = e->next_callee)
+      if (e->indirect_info->polymorphic)
+	indirect_poly_cnt += e->count;
+      else
+	indirect_cnt += e->count;
+  }
+  if (max_count)
+    {
+      fprintf (dump_file,
+	       "Inlined " HOST_WIDEST_INT_PRINT_DEC " + speculative "
+	       HOST_WIDEST_INT_PRINT_DEC " + speculative polymorphic "
+	       HOST_WIDEST_INT_PRINT_DEC " + previously indirect "
+	       HOST_WIDEST_INT_PRINT_DEC " + virtual "
+	       HOST_WIDEST_INT_PRINT_DEC " + virtual and previously indirect "
+	       HOST_WIDEST_INT_PRINT_DEC "\n" " Not inlined "
+	       HOST_WIDEST_INT_PRINT_DEC " + previously indirect "
+	       HOST_WIDEST_INT_PRINT_DEC " + virtual "
+	       HOST_WIDEST_INT_PRINT_DEC " + virtual and previously indirect "
+	       HOST_WIDEST_INT_PRINT_DEC " + stil indirect "
+  

Re: LTO inliner -- sensitivity to increasing register pressure

2014-04-18 Thread Aaron Sawdey
On Fri, 2014-04-18 at 19:26 +0200, Jan Hubicka wrote:
 This is intresting.  I sort of planned to add register pressure logic
 but always tought it is somewhat hard to do at GIMPLE level in a way
 that would work for all CPUs.

Yes, this is just meant to try to measure something that is
representative of register pressure. Different architectures would
probably want different thresholds for this.

 If you find examples of bad inlining, can you also fill it into bugzilla?
 Perhaps the individual cases could be handled better by improving IRA.

Yes, I can do that for the case that's happening in bzip2.

  Aaron

-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain



Re: LTO inliner -- sensitivity to increasing register pressure

2014-04-18 Thread Aaron Sawdey
What I've observed on power is that LTO alone reduces performance and
LTO+FDO is not significantly different than FDO alone.

I agree that an exact estimate of the register pressure would be a
difficult problem. I'm hoping that something that approximates potential
register pressure downstream will be sufficient to help inlining
decisions. 

  Aaron

On Fri, 2014-04-18 at 10:36 -0700, Xinliang David Li wrote:
 Do you witness similar problems with LTO +FDO?
 
 My concern is it can be tricky to get the register pressure estimate
 right. The register pressure problem is created by downstream
 components (code motions etc) but only exposed by the inliner.  If you
 want to get it 'right' (i.e., not exposing the problems), you will
 need to bake the knowledge of the downstream components (possibly
 bugs) into the analysis which might not be a good thing to do longer
 term.
 
 David
 
 On Fri, Apr 18, 2014 at 9:43 AM, Aaron Sawdey
 acsaw...@linux.vnet.ibm.com wrote:
  Honza,
Seeing your recent patches relating to inliner heuristics for LTO, I
  thought I should mention some related work I'm doing.
 
  By way of introduction, I've recently joined the IBM LTC's PPC Toolchain
  team, working on gcc performance.
 
  We have not generally seen good results using LTO on IBM power processors
  and one of the problems seems to be excessive inlining that results in the
  generation of excessive spill code. So, I have set out to tackle this by
  doing some analysis at the time of the inliner pass to compute something
  analogous to register pressure, which is then used to shut down inlining of
  routines that have a lot of pressure.
 
  The analysis is basically a liveness analysis on the SSA names per basic
  block and looking for the maximum number live in any block. I've been using
  liveness pressure as a shorthand name for this.
 
  This can then be used in two ways.
  1) want_inline_function_to_all_callers_p at present always says to inline
  things that have only one call site without regard to size or what this may
  do to the register allocator downstream. In particular, BZ2_decompress in
  bzip2 gets inlined and this causes the pressure reported downstream for the
  int register class to increase 10x. Looking at some combination of pressure
  in caller/callee may help avoid this kind of situation.
  2) I also want to experiment with adding the liveness pressure in the callee
  into the badness calculation in edge_badness used by inline_small_functions.
  The idea here is to try to inline functions that are less likely to cause
  register allocator difficulty downstream first.
 
  I am just at the point of getting a prototype working, I will get a patch
  you could take a look at posted next week. In the meantime, do you have any
  comments or feedback?
 
  Thanks,
 Aaron
 
  --
  Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
  050-2/C113  (507) 253-7520 home: 507/263-0782
  IBM Linux Technology Center - PPC Toolchain
 
 

-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain



[RFC][PATCH] RL78 - Add predicates to reduce code bloat when accessing volatile memory.

2014-04-18 Thread Richard Hulme

Hi,

This patch adds predicates (taken and renamed from the MSP430 backend) 
to allow more efficient code generation when accessing volatile memory, 
turning this (for example):


movw r10, #240
movw ax, r10
movw hl, ax
mov a, [hl]
or  a, #32
mov [hl], a

into this:

mov a, !240
or  a, #32
mov !240, a

Regards,

Richard.

2014-04-18  Richard Hulme  pepe...@yahoo.com

* config/rl78/predicates.md (rl78_volatile_memory_operand): New
  (rl78_general_operand): New
  (rl78_nonimmediate_operand): New
  (rl78_any_operand): Now includes volatile memory
  (rl78_nonfar_operand): Likewise
  (rl78_nonfar_nonimm_operand): Likewise

---
 gcc/config/rl78/predicates.md |   26 +++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/gcc/config/rl78/predicates.md b/gcc/config/rl78/predicates.md
index e564f43..29e3922 100644
--- a/gcc/config/rl78/predicates.md
+++ b/gcc/config/rl78/predicates.md
@@ -18,18 +18,38 @@
 ;; along with GCC; see the file COPYING3.  If not see
 ;; http://www.gnu.org/licenses/.
 
-(define_predicate "rl78_any_operand"
+(define_predicate "rl78_volatile_memory_operand"
+  (and (match_code "mem")
+       (match_test "(memory_address_addr_space_p (GET_MODE (op), XEXP (op, 0), MEM_ADDR_SPACE (op)))"))
+)
+
+; TRUE for any valid general operand.  We do this because
+; general_operand refuses to match volatile memory refs.
+
+(define_predicate "rl78_general_operand"
   (ior (match_operand 0 "general_operand")
+       (match_operand 0 "rl78_volatile_memory_operand"))
+)
+
+; Likewise for nonimmediate_operand.
+
+(define_predicate "rl78_nonimmediate_operand"
+  (ior (match_operand 0 "nonimmediate_operand")
+       (match_operand 0 "rl78_volatile_memory_operand"))
+)
+
+(define_predicate "rl78_any_operand"
+  (ior (match_operand 0 "rl78_general_operand")
        (match_code "mem,const_int,const_double,reg"))
 )
 
 (define_predicate "rl78_nonfar_operand"
-  (and (match_operand 0 "general_operand")
+  (and (match_operand 0 "rl78_general_operand")
       (not (match_test "rl78_far_p (op)")))
 )
 
 (define_predicate "rl78_nonfar_nonimm_operand"
-  (and (match_operand 0 "nonimmediate_operand")
+  (and (match_operand 0 "rl78_nonimmediate_operand")
       (not (match_test "rl78_far_p (op)")))
 )

--
1.7.9.5



Re: LTO inliner -- sensitivity to increasing register pressure

2014-04-18 Thread Jan Hubicka
 What I've observed on power is that LTO alone reduces performance and
 LTO+FDO is not significantly different than FDO alone.
On SPEC2k6?

This is quite surprising; for our (well, SUSE's) SPEC testers (AMD64) LTO seems an
off-noise win on SPEC2k6:
http://gcc.opensuse.org/SPEC/CINT/sb-megrez-head-64-2006/recent.html
http://gcc.opensuse.org/SPEC/CFP/sb-megrez-head-64-2006/recent.html

I do not see why PPC should be significantly more constrained by register
pressure.  

I do not have a head-to-head comparison of FDO and FDO+LTO for SPEC
http://gcc.opensuse.org/SPEC/CFP/sb-megrez-head-64-2006-patched-FDO/index.html
shows noticeable drop in calculix and gamess.
Martin profiled calculix and tracked it down to a loop that is not trained
but hot in the reference run.  That makes it optimized for size.  

http://dromaeo.com/?id=219677,219672,219965,219877
compares Firefox's dromaeo runs with default build, LTO, FDO and LTO+FDO
Here the benefits of LTO and FDO seems to add up nicely.
 
 I agree that an exact estimate of the register pressure would be a
 difficult problem. I'm hoping that something that approximates potential
 register pressure downstream will be sufficient to help inlining
 decisions. 

Yep, register pressure and I-cache overhead estimates are used for inline
decisions by some compilers.

I am mostly concerned about the metric suffering from the GIGO principle if we mix
together too many estimates that are somewhat wrong by their nature. This is
why I mostly tried to focus on size/time estimates and not add too many other
metrics. But perhaps it is time to experiment with these, since obviously we have
pushed the current infrastructure mostly to its limits.

Honza
 
   Aaron
 
 On Fri, 2014-04-18 at 10:36 -0700, Xinliang David Li wrote:
  Do you witness similar problems with LTO +FDO?
  
  My concern is it can be tricky to get the register pressure estimate
  right. The register pressure problem is created by downstream
  components (code motions etc) but only exposed by the inliner.  If you
  want to get it 'right' (i.e., not exposing the problems), you will
  need to bake the knowledge of the downstream components (possibly
  bugs) into the analysis which might not be a good thing to do longer
  term.
  
  David
  
  On Fri, Apr 18, 2014 at 9:43 AM, Aaron Sawdey
  acsaw...@linux.vnet.ibm.com wrote:
   Honza,
 Seeing your recent patches relating to inliner heuristics for LTO, I
   thought I should mention some related work I'm doing.
  
   By way of introduction, I've recently joined the IBM LTC's PPC Toolchain
   team, working on gcc performance.
  
   We have not generally seen good results using LTO on IBM power processors
   and one of the problems seems to be excessive inlining that results in the
   generation of excessive spill code. So, I have set out to tackle this by
   doing some analysis at the time of the inliner pass to compute something
   analogous to register pressure, which is then used to shut down inlining 
   of
   routines that have a lot of pressure.
  
   The analysis is basically a liveness analysis on the SSA names per basic
   block and looking for the maximum number live in any block. I've been 
   using
   liveness pressure as a shorthand name for this.
  
   This can then be used in two ways.
   1) want_inline_function_to_all_callers_p at present always says to inline
   things that have only one call site without regard to size or what this 
   may
   do to the register allocator downstream. In particular, BZ2_decompress in
   bzip2 gets inlined and this causes the pressure reported downstream for 
   the
   int register class to increase 10x. Looking at some combination of 
   pressure
   in caller/callee may help avoid this kind of situation.
   2) I also want to experiment with adding the liveness pressure in the 
   callee
   into the badness calculation in edge_badness used by 
   inline_small_functions.
   The idea here is to try to inline functions that are less likely to cause
   register allocator difficulty downstream first.
  
   I am just at the point of getting a prototype working, I will get a patch
   you could take a look at posted next week. In the meantime, do you have 
   any
   comments or feedback?
  
   Thanks,
  Aaron
  
   --
   Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
   050-2/C113  (507) 253-7520 home: 507/263-0782
   IBM Linux Technology Center - PPC Toolchain
  
  
 
 -- 
 Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
 050-2/C113  (507) 253-7520 home: 507/263-0782
 IBM Linux Technology Center - PPC Toolchain


Inliner heuristics TLC 3/n

2014-04-18 Thread Jan Hubicka
Hi,
this patch makes the FDO inliner more aggressive about inlining function
calls that are considered hot.  This is based on the observation that
INLINE_INSNS_AUTO is the most common reason for inlining not happening
(20.5% for Firefox, where 63.2% of calls are not inlinable because the
body is not available, and 66% for GCC).

With this patch the INLINE_HINT_known_hot hint is added to edges that were
determined to be hot by the profile and moreover have at least a 50%
chance that the caller will invoke the call during its execution.

With this hint we now ignore both limits - this is because the greedy algorithm
driven by the speed/size_cost metric should work pretty well here, but we may want
to revisit it (i.e. add INLINE_INSNS_FDO or so).  I am on the aggressive side so
that we collect some data on when the profile is a win or a loss.
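
Concretely, with made-up numbers: if the caller's profiled count is 1000
and the call edge's count is 600, then 600 * 2 > 1000 holds and the edge
gets INLINE_HINT_known_hot; an edge with count 400 would not qualify.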

Bootstrapped/regtested x86_64-linux, committed.

Honza

* ipa-inline.h (INLINE_HINT_known_hot): New hint.
* ipa-inline-analysis.c (dump_inline_hints): Dump it.
(do_estimate_edge_time): Compute it.
* ipa-inline.c (want_inline_small_function_p): Bypass
INLINE_INSNS_AUTO/SINGLE limits for calls that are known
to be hot.
Index: ipa-inline.h
===
--- ipa-inline.h(revision 209489)
+++ ipa-inline.h(working copy)
@@ -68,7 +68,9 @@ enum inline_hints_vals {
   INLINE_HINT_cross_module = 64,
   /* If array indexes of loads/stores become known there may be room for
  further optimization.  */
-  INLINE_HINT_array_index = 128
+  INLINE_HINT_array_index = 128,
+  /* We know that the callee is hot by profile.  */
+  INLINE_HINT_known_hot = 256
 };
 typedef int inline_hints;
 
Index: ipa-inline-analysis.c
===
--- ipa-inline-analysis.c   (revision 209489)
+++ ipa-inline-analysis.c   (working copy)
@@ -671,6 +671,11 @@ dump_inline_hints (FILE *f, inline_hints
   hints &= ~INLINE_HINT_array_index;
   fprintf (f, " array_index");
 }
+  if (hints & INLINE_HINT_known_hot)
+    {
+      hints &= ~INLINE_HINT_known_hot;
+      fprintf (f, " known_hot");
+    }
   gcc_assert (!hints);
 }
 
@@ -3666,6 +3671,17 @@ do_estimate_edge_time (struct cgraph_edg
known_aggs);
   estimate_node_size_and_time (callee, clause, known_vals, known_binfos,
			       known_aggs, &size, &min_size, &time, &hints,
			       es->param);
+
+  /* When we have profile feedback, we can quite safely identify hot
+ edges and for those we disable size limits.  Don't do that when
+ probability that caller will call the callee is low however, since it
+ may hurt optimization of the caller's hot path.  */
+  if (edge->count && cgraph_maybe_hot_edge_p (edge)
+      && (edge->count * 2
+	  > (edge->caller->global.inlined_to
+	     ? edge->caller->global.inlined_to->count : edge->caller->count)))
+    hints |= INLINE_HINT_known_hot;
+
   known_vals.release ();
   known_binfos.release ();
   known_aggs.release ();
Index: ipa-inline.c
===
--- ipa-inline.c(revision 209522)
+++ ipa-inline.c(working copy)
@@ -578,18 +578,21 @@ want_inline_small_function_p (struct cgr
  inline cnadidate.  At themoment we allow inline hints to
  promote non-inline function to inline and we increase
  MAX_INLINE_INSNS_SINGLE 16fold for inline functions.  */
-  else if (!DECL_DECLARED_INLINE_P (callee->decl)
+  else if ((!DECL_DECLARED_INLINE_P (callee->decl)
+	    && (!e->count || !cgraph_maybe_hot_edge_p (e)))
	   && inline_summary (callee)->min_size - inline_edge_summary (e)->call_stmt_size
	      > MAX (MAX_INLINE_INSNS_SINGLE, MAX_INLINE_INSNS_AUTO))
     {
       e->inline_failed = CIF_MAX_INLINE_INSNS_AUTO_LIMIT;
       want_inline = false;
     }
-  else if (DECL_DECLARED_INLINE_P (callee->decl)
+  else if ((DECL_DECLARED_INLINE_P (callee->decl) || e->count)
	   && inline_summary (callee)->min_size - inline_edge_summary (e)->call_stmt_size
	      > 16 * MAX_INLINE_INSNS_SINGLE)
     {
-      e->inline_failed = CIF_MAX_INLINE_INSNS_AUTO_LIMIT;
+      e->inline_failed = (DECL_DECLARED_INLINE_P (callee->decl)
+			  ? CIF_MAX_INLINE_INSNS_SINGLE_LIMIT
+			  : CIF_MAX_INLINE_INSNS_AUTO_LIMIT);
   want_inline = false;
 }
   else
@@ -606,6 +609,7 @@ want_inline_small_function_p (struct cgr
	   && growth >= MAX_INLINE_INSNS_SINGLE
	   && ((!big_speedup
		&& !(hints & (INLINE_HINT_indirect_call
+			      | INLINE_HINT_known_hot
			      | INLINE_HINT_loop_iterations
			      | INLINE_HINT_array_index
			      | INLINE_HINT_loop_stride)))
@@ -630,6 +634,7 @@ want_inline_small_function_p (struct cgr
 inlining given function is very profitable.  */
 

Re: LTO inliner -- sensitivity to increasing register pressure

2014-04-18 Thread Jan Hubicka
 What I've observed on power is that LTO alone reduces performance and
 LTO+FDO is not significantly different than FDO alone.
 
 I agree that an exact estimate of the register pressure would be a
 difficult problem. I'm hoping that something that approximates potential
 register pressure downstream will be sufficient to help inlining
 decisions. 

One (orthogonal) way to deal with this problem would be also to disable
inlining of functions called once when the edge frequency is low,
i.e. adding to check_callers something like

  edge->frequency < CGRAPH_FREQ_BASE / 2

if you want to disqualify all calls that have only a 50% chance that they
will be called during function invocation.
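
A sketch of how that could look (hypothetical placement and return
convention; the real check_callers walks the node's caller edges):

  /* Hypothetical: disqualify inline-to-all-callers when the call site
     executes on at most half of the caller's invocations.  */
  for (e = node->callers; e; e = e->next_caller)
    if (e->frequency < CGRAPH_FREQ_BASE / 2)
      return true;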

Does something like that help in your cases?

It would help in the case Linus complained about
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49194

The difficulty here is that disabling inlines on not-so-important paths
may prevent SRA and other optimizations, so it may in turn also penalize
the hot path. I saw this in some cases where EH cleanup code was optimized
for size.

Perhaps SRA can also be extended to handle cases where non-SRAable code is
on a cold path?

Honza


Re: [COMMITTED] Fix silly error in aarch64_register_move_cost

2014-04-18 Thread Jeff Law

On 04/18/14 10:02, Richard Henderson wrote:

Building mainline I got


 .../aarch64.c:4879:134: error: invalid conversion from ‘reg_class_t {aka int}’ to ‘machine_mode’ [-fpermissive]
 if (! TARGET_SIMD && GET_MODE_SIZE (from) == 128 && GET_MODE_SIZE (to) == 128)


Sure enough, TO and FROM are not modes.  Did mainline just change away from
permissive or something?  It surely seems like we ought to have seen this
earlier...
I haven't looked at GET_MODE_XXX, but most, if not all, of the RTL checking 
is disabled because it's too expensive.


I'm hoping that David's work will take us to a place where we can do 
more static checking on the RTL bits too.


jeff


[C PATCH] Fix up diagnostics (PR c/25801)

2014-04-18 Thread Marek Polacek
Here's a patch that tidies up the diagnostics given when there's
arithmetic on a pointer to an incomplete type.  Firstly, we ought not
to say "unknown structure" when the type is not a structure; secondly,
we shouldn't report an identical error twice.
(It might make sense to, instead of "increment"/"decrement", just say
"arithmetic on", but I'm not changing that now.)

Regtested on x86_64-unknown-linux-gnu, bootstrapped on
powerpc64-unknown-linux-gnu, ok for trunk?

2014-04-18  Marek Polacek  pola...@redhat.com

PR c/25801
* c-typeck.c (c_size_in_bytes): Don't call error.
(pointer_diff): Remove comment.
(build_unary_op): Improve error messages.

* gcc.dg/pr25801.c: New test.

diff --git gcc/c/c-typeck.c gcc/c/c-typeck.c
index 65aad45..bc977a7 100644
--- gcc/c/c-typeck.c
+++ gcc/c/c-typeck.c
@@ -1760,15 +1760,10 @@ c_size_in_bytes (const_tree type)
 {
   enum tree_code code = TREE_CODE (type);
 
-  if (code == FUNCTION_TYPE || code == VOID_TYPE || code == ERROR_MARK)
+  if (code == FUNCTION_TYPE || code == VOID_TYPE || code == ERROR_MARK
+  || !COMPLETE_OR_VOID_TYPE_P (type))
 return size_one_node;
 
-  if (!COMPLETE_OR_VOID_TYPE_P (type))
-    {
-      error ("arithmetic on pointer to an incomplete type");
-      return size_one_node;
-    }
-
   /* Convert in case a char is more than one unit.  */
   return size_binop_loc (input_location, CEIL_DIV_EXPR, TYPE_SIZE_UNIT (type),
 size_int (TYPE_PRECISION (char_type_node)
@@ -3528,7 +3523,6 @@ pointer_diff (location_t loc, tree op0, tree op1)
   if (!COMPLETE_OR_VOID_TYPE_P (TREE_TYPE (TREE_TYPE (orig_op1))))
     error_at (loc, "arithmetic on pointer to an incomplete type");
 
-  /* This generates an error if op0 is pointer to incomplete type.  */
   op1 = c_size_in_bytes (target_type);
 
   if (pointer_to_zero_sized_aggr_p (TREE_TYPE (orig_op1)))
@@ -4002,16 +3996,18 @@ build_unary_op (location_t location,
 
if (typecode == POINTER_TYPE)
  {
-   /* If pointer target is an undefined struct,
+   /* If pointer target is an incomplete type,
   we just cannot know how to do the arithmetic.  */
if (!COMPLETE_OR_VOID_TYPE_P (TREE_TYPE (argtype)))
  {
if (code == PREINCREMENT_EXPR || code == POSTINCREMENT_EXPR)
  error_at (location,
-		  "increment of pointer to unknown structure");
+		  "increment of pointer to an incomplete type %qT",
+		  TREE_TYPE (argtype));
 	    else
 	      error_at (location,
-		  "decrement of pointer to unknown structure");
+		  "decrement of pointer to an incomplete type %qT",
+		  TREE_TYPE (argtype));
  }
else if (TREE_CODE (TREE_TYPE (argtype)) == FUNCTION_TYPE
 || TREE_CODE (TREE_TYPE (argtype)) == VOID_TYPE)
diff --git gcc/testsuite/gcc.dg/pr25801.c gcc/testsuite/gcc.dg/pr25801.c
index e69de29..70b4ef8 100644
--- gcc/testsuite/gcc.dg/pr25801.c
+++ gcc/testsuite/gcc.dg/pr25801.c
@@ -0,0 +1,19 @@
+/* PR c/25801 */
+/* { dg-do compile } */
+
+int (*a)[];
+struct S *s;
+
+void
+f (void)
+{
+  a++; /* { dg-error "increment of pointer to an incomplete type" } */
+  ++a; /* { dg-error "increment of pointer to an incomplete type" } */
+  a--; /* { dg-error "decrement of pointer to an incomplete type" } */
+  --a; /* { dg-error "decrement of pointer to an incomplete type" } */
+
+  s++; /* { dg-error "increment of pointer to an incomplete type" } */
+  ++s; /* { dg-error "increment of pointer to an incomplete type" } */
+  s--; /* { dg-error "decrement of pointer to an incomplete type" } */
+  --s; /* { dg-error "decrement of pointer to an incomplete type" } */
+}

Marek


Re: LTO inliner -- sensitivity to increasing register pressure

2014-04-18 Thread Xinliang David Li
On Fri, Apr 18, 2014 at 12:27 PM, Jan Hubicka hubi...@ucw.cz wrote:
 What I've observed on power is that LTO alone reduces performance and
 LTO+FDO is not significantly different than FDO alone.
 On SPEC2k6?

 This is quite surprising, for our (well SUSE's) spec testers (AMD64) LTO seems
 off-noise win on SPEC2k6
 http://gcc.opensuse.org/SPEC/CINT/sb-megrez-head-64-2006/recent.html
 http://gcc.opensuse.org/SPEC/CFP/sb-megrez-head-64-2006/recent.html

 I do not see why PPC should be significantly more constrained by register
 pressure.

 I do not have head to head comparsion of FDO and FDO+LTO for SPEC
 http://gcc.opensuse.org/SPEC/CFP/sb-megrez-head-64-2006-patched-FDO/index.html
 shows noticeable drop in calculix and gamess.
 Martin profiled calculix and tracked it down to a loop that is not trained
 but hot in the reference run.  That makes it optimized for size.

 http://dromaeo.com/?id=219677,219672,219965,219877
 compares Firefox's dromaeo runs with default build, LTO, FDO and LTO+FDO
 Here the benefits of LTO and FDO seems to add up nicely.

 I agree that an exact estimate of the register pressure would be a
 difficult problem. I'm hoping that something that approximates potential
 register pressure downstream will be sufficient to help inlining
 decisions.

 Yep, register pressure and I-cache overhead estimates are used for inline
 decisions by some compilers.

 I am mostly concerned about the metric suffering from GIGO principe if we mix
 together too many estimates that are somehwat wrong by their nature. This is
 why I mostly tried to focus on size/time estimates and not add too many other
 metrics. But perhaps it is a time to experiment wit these, since obviously we
 pushed current infrastructure to mostly to its limits.


I like the word GIGO here. Getting inline signals right requires deep
analysis (including interprocedural analysis). Different signals/hints
may also come with different quality, and thus different weights.

Another challenge is how to quantify cycle savings/overhead more
precisely. With that, we can abandon the threshold-based scheme -- any
callsite with a net saving will be considered.
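
One hypothetical way to phrase such a criterion (weights made up for
illustration): inline a call site when

  cycles_saved_per_call * edge_count > size_growth * icache_penalty

i.e. when the profile-weighted cycle savings outweigh the modeled
code-size cost, with no fixed growth threshold at all.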

David


 Honza

   Aaron

 On Fri, 2014-04-18 at 10:36 -0700, Xinliang David Li wrote:
  Do you witness similar problems with LTO +FDO?
 
  My concern is it can be tricky to get the register pressure estimate
  right. The register pressure problem is created by downstream
  components (code motions etc) but only exposed by the inliner.  If you
  want to get it 'right' (i.e., not exposing the problems), you will
  need to bake the knowledge of the downstream components (possibly
  bugs) into the analysis which might not be a good thing to do longer
  term.
 
  David
 
  On Fri, Apr 18, 2014 at 9:43 AM, Aaron Sawdey
  acsaw...@linux.vnet.ibm.com wrote:
   Honza,
 Seeing your recent patches relating to inliner heuristics for LTO, I
   thought I should mention some related work I'm doing.
  
   By way of introduction, I've recently joined the IBM LTC's PPC Toolchain
   team, working on gcc performance.
  
   We have not generally seen good results using LTO on IBM power processors
   and one of the problems seems to be excessive inlining that results in 
   the
   generation of excessive spill code. So, I have set out to tackle this by
   doing some analysis at the time of the inliner pass to compute something
   analogous to register pressure, which is then used to shut down inlining 
   of
   routines that have a lot of pressure.
  
   The analysis is basically a liveness analysis on the SSA names per basic
   block and looking for the maximum number live in any block. I've been 
   using
   liveness pressure as a shorthand name for this.
  
   This can then be used in two ways.
   1) want_inline_function_to_all_callers_p at present always says to inline
   things that have only one call site without regard to size or what this 
   may
   do to the register allocator downstream. In particular, BZ2_decompress in
   bzip2 gets inlined and this causes the pressure reported downstream for 
   the
   int register class to increase 10x. Looking at some combination of 
   pressure
   in caller/callee may help avoid this kind of situation.
   2) I also want to experiment with adding the liveness pressure in the 
   callee
   into the badness calculation in edge_badness used by 
   inline_small_functions.
   The idea here is to try to inline functions that are less likely to cause
   register allocator difficulty downstream first.
  
   I am just at the point of getting a prototype working, I will get a patch
   you could take a look at posted next week. In the meantime, do you have 
   any
   comments or feedback?
  
   Thanks,
  Aaron
  
   --
   Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
   050-2/C113  (507) 253-7520 home: 507/263-0782
   IBM Linux Technology Center - PPC Toolchain
  
 

 --
 Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
 050-2/C113  (507) 253-7520 home: 507/263-0782
 

Re: LTO inliner -- sensitivity to increasing register pressure

2014-04-18 Thread Xinliang David Li
On Fri, Apr 18, 2014 at 12:51 PM, Jan Hubicka hubi...@ucw.cz wrote:
 What I've observed on power is that LTO alone reduces performance and
 LTO+FDO is not significantly different than FDO alone.

 I agree that an exact estimate of the register pressure would be a
 difficult problem. I'm hoping that something that approximates potential
 register pressure downstream will be sufficient to help inlining
 decisions.

 One (ortoghonal) way to deal with this problem would be also to disable
 inlining of functions called once when the edge frequency is low.
 I.e. adding to check_callers something like
 edge-frequency  CGRAPH_FREQ_BASE / 2
 if you want to disqualify all calls that have only 50% chance that they
 will be called during function invocation.

 Does something like that help in your cases?

 It would help in the case Linus complained about
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49194

 The difficulty here is that disabling inlies on not so important paths
 may prevent SRA and other optimizations so it may in turn also penalize
 the hot path. I saw this in some cases where EH cleanup code was optimized
 for size.


yes. The callsite may be cold, but the profile-scaled callee body may
still be hot. Inlining the callee allows more context to be passed.
Similarly for hot callers, cold callees may also expose more
information (e.g., better alias info) to the caller.

David


 Perhaps SRA canalso be extended to handle cases where non-SRAable code is
 on a cold path?

 Honza


Re: LTO inliner -- sensitivity to increasing register pressure

2014-04-18 Thread Jan Hubicka
 On Fri, Apr 18, 2014 at 12:27 PM, Jan Hubicka hubi...@ucw.cz wrote:
  What I've observed on power is that LTO alone reduces performance and
  LTO+FDO is not significantly different than FDO alone.
  On SPEC2k6?
 
  This is quite surprising; for our (well, SUSE's) SPEC testers (AMD64)
  LTO seems an off-noise win on SPEC2k6:
  http://gcc.opensuse.org/SPEC/CINT/sb-megrez-head-64-2006/recent.html
  http://gcc.opensuse.org/SPEC/CFP/sb-megrez-head-64-2006/recent.html
 
  I do not see why PPC should be significantly more constrained by register
  pressure.
 
  I do not have a head-to-head comparison of FDO and FDO+LTO for SPEC;
  http://gcc.opensuse.org/SPEC/CFP/sb-megrez-head-64-2006-patched-FDO/index.html
  shows a noticeable drop in calculix and gamess.
  Martin profiled calculix and tracked it down to a loop that is not trained
  but hot in the reference run.  That makes it optimized for size.
 
  http://dromaeo.com/?id=219677,219672,219965,219877
  compares Firefox's dromaeo runs with default build, LTO, FDO and LTO+FDO
  Here the benefits of LTO and FDO seem to add up nicely.
 
  I agree that an exact estimate of the register pressure would be a
  difficult problem. I'm hoping that something that approximates potential
  register pressure downstream will be sufficient to help inlining
  decisions.
 
  Yep, register pressure and I-cache overhead estimates are used for inline
  decisions by some compilers.
 
  I am mostly concerned about the metric suffering from the GIGO principle
  if we mix together too many estimates that are somewhat wrong by their
  nature. This is why I mostly tried to focus on size/time estimates and
  not add too many other metrics. But perhaps it is time to experiment
  with these, since obviously we have pushed the current infrastructure
  mostly to its limits.
 
 
 I like the word GIGO here. Getting inline signals right requires deep
 analysis (including interprocedural analysis). Different signals/hints
 may also come with different quality and thus different weights.
 
 Another challenge is how to quantify cycle savings/overhead more
 precisely. With that, we can abandon the threshold-based scheme -- any
 callsite with a net saving will be considered.

Inline hints are intended to do this - at the moment we bump the limits
up when we estimate big speedups for the inlining, and with today's patch
and FDO we bypass the thresholds when we know from FDO that the call
matters.

Concerning your other email, indeed we should consider heavy callees (in
Open64 terminology) that consume a lot of time, and not skip those call
sites.  An easy way would be to replace the maybe_hot_edge predicate by a
maybe_hot_call predicate that simply multiplies the count and the
estimated time.  (We probably ought to get rid of the time capping and
use wider arithmetic too.)
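
A minimal sketch of such a predicate, assuming the callee time estimate
comes from inline_summary and hot_call_threshold is a hypothetical
tunable that does not exist today:

  static bool
  maybe_hot_call_p (struct cgraph_edge *e)
  {
    /* Widen before multiplying so count * time does not overflow.  */
    HOST_WIDEST_INT work
      = (HOST_WIDEST_INT) e->count * inline_summary (e->callee)->time;
    return work > hot_call_threshold;  /* Assumed threshold.  */
  }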

I wonder if that is not too local, and if we should not try to estimate
the cumulative time of the function and get more aggressive on inlining
over the whole path leading to hot code.

Honza


Re: LTO inliner -- sensitivity to increasing register pressure

2014-04-18 Thread Xinliang David Li
On Fri, Apr 18, 2014 at 2:16 PM, Jan Hubicka hubi...@ucw.cz wrote:
 On Fri, Apr 18, 2014 at 12:27 PM, Jan Hubicka hubi...@ucw.cz wrote:
  What I've observed on power is that LTO alone reduces performance and
  LTO+FDO is not significantly different than FDO alone.
  On SPEC2k6?
 
   This is quite surprising; for our (well, SUSE's) SPEC testers (AMD64)
   LTO seems an off-noise win on SPEC2k6:
  http://gcc.opensuse.org/SPEC/CINT/sb-megrez-head-64-2006/recent.html
  http://gcc.opensuse.org/SPEC/CFP/sb-megrez-head-64-2006/recent.html
 
  I do not see why PPC should be significantly more constrained by register
  pressure.
 
   I do not have a head-to-head comparison of FDO and FDO+LTO for SPEC;
   http://gcc.opensuse.org/SPEC/CFP/sb-megrez-head-64-2006-patched-FDO/index.html
   shows a noticeable drop in calculix and gamess.
  Martin profiled calculix and tracked it down to a loop that is not trained
  but hot in the reference run.  That makes it optimized for size.
 
  http://dromaeo.com/?id=219677,219672,219965,219877
  compares Firefox's dromaeo runs with default build, LTO, FDO and LTO+FDO
   Here the benefits of LTO and FDO seem to add up nicely.
 
  I agree that an exact estimate of the register pressure would be a
  difficult problem. I'm hoping that something that approximates potential
  register pressure downstream will be sufficient to help inlining
  decisions.
 
  Yep, register pressure and I-cache overhead estimates are used for inline
  decisions by some compilers.
 
   I am mostly concerned about the metric suffering from the GIGO
   principle if we mix together too many estimates that are somewhat wrong
   by their nature. This is why I mostly tried to focus on size/time
   estimates and not add too many other metrics. But perhaps it is time to
   experiment with these, since obviously we have pushed the current
   infrastructure mostly to its limits.
 

 I like the word GIGO here. Getting inline signals right requires deep
 analysis (including interprocedural analysis). Different signals/hints
 may also come with different quality and thus different weights.
 
 Another challenge is how to quantify cycle savings/overhead more
 precisely. With that, we can abandon the threshold-based scheme -- any
 callsite with a net saving will be considered.

 Inline hints are intended to do this - at the moment we bump the limits
 up when we estimate big speedups for the inlining, and with today's patch
 and FDO we bypass the thresholds when we know from FDO that the call
 matters.

 Concerning your other email, indeed we should consider heavy callees (in
 Open64 terminology) that consume a lot of time, and not skip those call
 sites.  An easy way would be to replace the maybe_hot_edge predicate by a
 maybe_hot_call predicate that simply multiplies the count and the
 estimated time.  (We probably ought to get rid of the time capping and
 use wider arithmetic too.)

That's what we did in the Google branches. We had two heuristics -- a hot
caller heuristic and a hot callee heuristic.

1) For the hot caller heuristic, other simple analyses are checked: a)
global working set size; b) a callsite argument check -- a very simple
check to guess whether inlining this callsite would sharpen analysis (see
the sketch after this list).

2) We have not tuned the hot callee heuristic by doing more analysis --
simply turning it on based on hotness does not make a noticeable
difference. Other hints are needed.
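
A sketch of what such a callsite-argument check could look like (names
are hypothetical; the Google-branch implementation is not shown here):

  /* Guess whether inlining CALL would sharpen analysis in the callee:
     constant and address arguments tend to enable constant propagation
     and better alias information after inlining.  */
  static bool
  callsite_args_sharpen_analysis_p (gimple call)
  {
    unsigned i;
    for (i = 0; i < gimple_call_num_args (call); i++)
      {
        tree arg = gimple_call_arg (call, i);
        if (TREE_CODE (arg) == INTEGER_CST || TREE_CODE (arg) == ADDR_EXPR)
          return true;
      }
    return false;
  }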

David




 I wonder if that is not too local, and if we should not try to estimate
 the cumulative time of the function and get more aggressive on inlining
 over the whole path leading to hot code.

 Honza


Re: [VRP][PATCH] Improve value range for loop index

2014-04-18 Thread Kugan
Ping?

On 10/04/14 06:07, Kugan wrote:
 Value range propagation simplifies convergence in vrp_visit_phi_node by
 setting the minimum to TYPE_MIN when the computed minimum is smaller
 than the previous minimum. This can, however, result in pessimistic
 value ranges in some cases.
 
 for example,
 
   unsigned int i;
   for (i = 0; i < 8; i++)
   {
 
   }
 
   # ivtmp_19 = PHI <ivtmp_17(5), 8(2)>
   ...
   <bb 5>:
   ivtmp_17 = ivtmp_19 - 1;
   if (ivtmp_17 != 0)
     goto <bb 5>;
 
 The min value of ivtmp_19 is simplified to 0 (in tree-vrp.c:8465),
 whereas it should have been 1. This prevents a correct value range from
 being calculated for ivtmp_17 in the example.
 
 We should be able to look at the step (the difference from the previous
 minimum to the computed minimum) and, if there is scope for more
 iterations (the computed minimum is greater than the step), set the
 minimum so that one more iteration is allowed, converging to the right
 minimum value.
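 
 As rough pseudocode (illustrative only; the actual patch works on trees
 inside vrp_visit_phi_node):
 
   /* Decrease observed between the previous and the newly computed
      minimum of the PHI result.  */
   step = prev_min - new_min;
   if (new_min > step)
     /* Room for at least one more iteration: step down once more
        instead of giving up.  */
     min = new_min - step;
   else
     /* Fall back to the old behaviour.  */
     min = TYPE_MIN;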
 
 The attached patch fixes this. Is this OK for stage 1?
 
 Bootstrapped and regression tested on x86_64-unknown-linux-gnu with no
 new regressions.
 
 Thanks,
 Kugan
 
 gcc/
 
 +2014-04-09  Kugan Vivekanandarajah  kug...@linaro.org
 +
 + * tree-vrp.c (vrp_visit_phi_node): Improve value ranges of loop
 + index when simplifying convergence towards minimum.
 +
 
 gcc/testsuite
 
 +2014-04-09  Kugan Vivekanandarajah  kug...@linaro.org
 +
 + * gcc.dg/tree-ssa/vrp91.c: New test.
 +