Re: RFA: patch to fix PR55116

2012-10-30 Thread H.J. Lu
On Mon, Oct 29, 2012 at 5:11 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, Oct 29, 2012 at 4:41 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, Oct 29, 2012 at 9:38 AM, Vladimir Makarov vmaka...@redhat.com 
 wrote:
 On 12-10-29 12:21 PM, Richard Sandiford wrote:

 Vladimir Makarov vmaka...@redhat.com writes:

 H.J. in

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55116

 reported an interesting address

 (and:DI (subreg:DI (plus:SI (ashift:SI (reg:SI 96 [ glob_vol_int.22 ])
   (const_int 2 [0x2]))
   (symbol_ref:SI ("glob_vol_int_arr") <var_decl
 0x703c2720 glob_vol_int_arr>)) 0)
   (const_int 4294967295 [0xffffffff]))

 which can not be correctly extracted.  Here `and' with `subreg'
 behaves as an address mutation.

 The following patch fixes the problem.

 Ok to commit, Richard?

 Heh, I wondered if subregs might still be used like that, and was almost
 tempted to add them just in case.

 I think this particular case is really a failed canonicalisation and that:

 (and:DI (subreg:DI (foo:SI ...) 0) (const_int 0xffffffff))

 ought to be:

 (zero_extend:DI (foo:SI ...))

 Yes, that was my thought too.

 But I know I've approved MIPS patches to accept
 (and:DI ... (const_int 0xffffffff)) as an alternative.

 Index: rtlanal.c
 ===
 --- rtlanal.c   (revision 192942)
 +++ rtlanal.c   (working copy)
 @@ -5459,6 +5459,11 @@ strip_address_mutations (rtx *loc, enum
  else if (code == AND && CONST_INT_P (XEXP (*loc, 1)))
   /* (and ... (const_int -X)) is used to align to X bytes.  */
   loc = &XEXP (*loc, 0);
 +  else if (code == SUBREG
 +	   && ! REG_P (XEXP (*loc, 0)) && ! MEM_P (XEXP (*loc, 0)))
 +   /* (subreg (operator ...) ...) usually inside and is used for
 +      mode conversion too.  */
 +   loc = &XEXP (*loc, 0);

 I think the condition should be:

   else if (code == SUBREG
	    && !OBJECT_P (SUBREG_REG (*loc))
	    && subreg_lowpart_p (*loc))

 OK with that change, if it works.

 Yes, it works.
 I've submitted the following patch.


 It doesn't work right.  I will create a new testcase.




This patch limits SUBREG to Pmode.  Tested on Linux/x86-64.
OK to install?

Thanks.

-- 
H.J.
gcc/

2012-10-29  H.J. Lu  hongjiu...@intel.com

PR middle-end/55116
* rtlanal.c (strip_address_mutations): Handle SUBREG only for
Pmode.

gcc/testsuite/

2012-10-29  H.J. Lu  hongjiu...@intel.com

PR middle-end/55116
* gcc.target/i386/pr55116-2.c: New file.

diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 43d4cb8..d076ad6 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -5460,6 +5460,7 @@ strip_address_mutations (rtx *loc, enum rtx_code
*outer_code)
/* (and ... (const_int -X)) is used to align to X bytes.  */
	loc = &XEXP (*loc, 0);
   else if (code == SUBREG
+	   && GET_MODE (*loc) == Pmode
	   && !OBJECT_P (SUBREG_REG (*loc))
	   && subreg_lowpart_p (*loc))
/* (subreg (operator ...) ...) inside and is used for mode
diff --git a/gcc/testsuite/gcc.target/i386/pr55116-2.c
b/gcc/testsuite/gcc.target/i386/pr55116-2.c
new file mode 100644
index 000..7ef8ead
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr55116-2.c
@@ -0,0 +1,86 @@
+/* { dg-do compile { target { ! { ia32 } } } } */
+/* { dg-options "-O2 -mx32 -maddress-mode=long" } */
+
+typedef struct rtx_def *rtx;
+enum rtx_code { MINUS };
+union rtunion_def {
+  rtx rt_rtx;
+};
+typedef union rtunion_def rtunion;
+struct rtx_def {
+  enum rtx_code code: 16;
+  union u {
+rtunion fld[1];
+  }
+  u;
+};
+rtx simplify_binary_operation (enum rtx_code code, int mode,
+  rtx op0, rtx op1);
+struct simplify_plus_minus_op_data {
+  rtx op;
+  short neg;
+};
+void simplify_plus_minus (enum rtx_code code, int mode, rtx op0, rtx op1)
+{
+  struct simplify_plus_minus_op_data ops[8];
+  rtx tem = (rtx) 0;
+  int n_ops = 2, input_ops = 2;
+  int changed, canonicalized = 0;
+  int i, j;
+  __builtin_memset (ops, 0, sizeof (ops));
+  do
+{
+  changed = 0;
+  for (i = 0; i < n_ops; i++)
+   {
+ rtx this_op = ops[i].op;
+ int this_neg = ops[i].neg;
+ enum rtx_code this_code = ((enum rtx_code) (this_op)->code);
+ switch (this_code)
+   {
+   case MINUS:
+ if (n_ops == 7)
+   return;
+ n_ops++;
+ input_ops++;
+ changed = 1;
+ canonicalized |= this_neg;
+ break;
+   }
+   }
+}
+  while (changed);
+  do
+{
+  j = n_ops - 1;
+  for (i = n_ops - 1; j >= 0; j--)
+   {
+ rtx lhs = ops[j].op, rhs = ops[i].op;
+ int lneg = ops[j].neg, rneg = ops[i].neg;
+ if (lhs != 0 && rhs != 0)
+   {
+ enum rtx_code ncode = MINUS;
+ if (((enum rtx_code) (lhs)->code) == 

Re: PATCH: PR rtl-optimization/55093: [4.8 Regression] [x32] -maddress-mode=long failed

2012-10-30 Thread H.J. Lu
On Mon, Oct 29, 2012 at 3:49 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, Oct 29, 2012 at 3:44 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, Oct 29, 2012 at 8:15 AM, Richard Sandiford
 rdsandif...@googlemail.com wrote:
 H.J. Lu hjl.to...@gmail.com writes:
 Hi,

 This patch changes get_elimination to check register number instead of
 RTX.  Tested on Linux/x32 with -maddress-mode=long.  OK to install?

 FWIW, this doesn't sound right to me, at least not without more 
 justification.
 The idea is that things like frame_pointer_rtx are supposed to be unique,
 so the original code:

    if ((ep = elimination_map[hard_regno]) != NULL)
 -    return ep->from_rtx != reg ? NULL : ep;
 +    return ep->from != hard_regno ? NULL : ep;

 ought to be correct in itself.  reload did the same thing:

   for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS];
	ep++)
     if (ep->from_rtx == x && ep->can_eliminate)
       return plus_constant (Pmode, ep->to_rtx, ep->previous_offset);

 It sounds on the face of it like the bug is elsewhere.


 LRA has

   if (REG_P (reg) && (ep = get_elimination (reg)) != NULL)
 {
   rtx to_rtx = replace_p ? ep->to_rtx : ep->from_rtx;

   if (! replace_p)
 {
   offset += (ep->offset - ep->previous_offset);
   offset = trunc_int_for_mode (offset, GET_MODE (plus_cst_src));
 }

   if (GET_CODE (XEXP (plus_cst_src, 0)) == SUBREG)
 to_rtx = gen_lowpart (GET_MODE (XEXP (plus_cst_src, 0)), to_rtx);

 Reload has

 rtx to_rtx = ep->to_rtx;
 offset += ep->offset;
 offset = trunc_int_for_mode (offset, GET_MODE (plus_cst_src));

 if (GET_CODE (XEXP (plus_cst_src, 0)) == SUBREG)
   to_rtx = gen_lowpart (GET_MODE (XEXP (plus_cst_src, 0)),
 to_rtx);

 (gdb) call debug_rtx (ep->to_rtx)
 (reg/f:DI 7 sp)
 (gdb) call debug_rtx (ep->from_rtx)
 (reg/f:DI 16 argp)
 (gdb)

 gen_lowpart returns (reg/f:DI 7 sp) for reload and (reg:SI 16 argp)
 for LRA.   They are caused by

   if (FRAME_POINTER_REGNUM != ARG_POINTER_REGNUM
   /* We should convert arg register in LRA after the elimination
  if it is possible.  */
      && xregno == ARG_POINTER_REGNUM
      && ! lra_in_progress)
 return -1;

 It doesn't work in this case.


 This testcase shows that LRA can't convert arg register after
 the elimination.


Here is a patch to remove the lra_in_progress check for
ARG_POINTER_REGNUM.  Tested on Linux/x86-64.
OK to install?

Thanks.

-- 
H.J.
gcc/

2012-10-29  H.J. Lu  hongjiu...@intel.com

PR rtl-optimization/55093
* rtlanal.c (simplify_subreg_regno): Remove lra_in_progress
check for ARG_POINTER_REGNUM.

gcc/testsuite/

2012-10-29  H.J. Lu  hongjiu...@intel.com

PR rtl-optimization/55093
* gcc.target/i386/pr55093.c: New file.

diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 43d4cb8..c1a7580 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -3494,10 +3494,7 @@ simplify_subreg_regno (unsigned int xregno,
enum machine_mode xmode,
 return -1;

   if (FRAME_POINTER_REGNUM != ARG_POINTER_REGNUM
-      /* We should convert arg register in LRA after the elimination
-	 if it is possible.  */
-      && xregno == ARG_POINTER_REGNUM
-      && ! lra_in_progress)
+      && xregno == ARG_POINTER_REGNUM)
 return -1;

   if (xregno == STACK_POINTER_REGNUM
diff --git a/gcc/testsuite/gcc.target/i386/pr55093.c
b/gcc/testsuite/gcc.target/i386/pr55093.c
new file mode 100644
index 000..76b4042
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr55093.c
@@ -0,0 +1,80 @@
+/* { dg-do compile { target { ! { ia32 } } } } */
+/* { dg-options "-O2 -mx32 -maddress-mode=long" } */
+
+typedef union tree_node *tree;
+typedef const union tree_node *const_tree;
+typedef struct {
+  unsigned long long low;
+  long long high;
+} double_int;
+struct real_value {
+};
+struct real_format {
+  int has_signed_zero;
+};
+extern const struct real_format *   real_format_for_mode[];
+extern int real_isnegzero (const struct real_value *);
+enum tree_code { REAL_CST, SSA_NAME };
+struct tree_base {
+  enum tree_code code : 16;
+  union {
+unsigned int version;
+  }
+  u;
+};
+extern void tree_check_failed (const_tree, const char *, int, const
char *,   ...) __attribute__ ((__noreturn__));
+union tree_node {
+  struct tree_base base;
+};
+inline tree tree_check (tree __t, const char *__f, int __l, const
char *__g, enum tree_code __c) {
+  if (((enum tree_code) (__t)->base.code) != __c)
+tree_check_failed (__t, __f, __l, __g, __c, 0);
+  return __t;
+}
+struct prop_value_d {
+  int lattice_val;
+  tree value;
+  double_int mask;
+};
+typedef struct prop_value_d prop_value_t;
+static prop_value_t *const_val;
+static void canonicalize_float_value (prop_value_t *);
+typedef void (*ssa_prop_visit_stmt_fn) (prop_value_t);
+typedef void (*ssa_prop_visit_phi_fn) (void);
+typedef void 

RE: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

2012-10-30 Thread Gopalasubramanian, Ganesh
Hi Jakub,

We are working on the following. 
1. bdver3 enablement. Review completed. Changes to be incorporated and 
checked-in.
http://gcc.gnu.org/ml/gcc-patches/2012-10/msg01131.html

2. btver2 basic enablement is done 
(http://gcc.gnu.org/ml/gcc-patches/2012-07/msg01018.html).
Scheduler descriptions are being updated. This is architecture specific and we 
consider it not to be stage-1 material.

Regards
Ganesh

-Original Message-
From: Jakub Jelinek [mailto:ja...@redhat.com] 
Sent: Monday, October 29, 2012 11:27 PM
To: g...@gcc.gnu.org
Cc: gcc-patches@gcc.gnu.org
Subject: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

Status
==

I'd like to close the stage 1 phase of GCC 4.8 development on Monday, November 
5th.  If you have still patches for new features you'd like to see in GCC 4.8, 
please post them for review soon.  Patches posted before the freeze, but 
reviewed shortly after the freeze, may still go in, further changes should be 
just bugfixes and documentation fixes.


Quality Data


Priority          #   Change from Last Report
--------        ---   -----------------------
P1               23   + 23
P2               77   +  8
P3               85   + 84
--------        ---   -----------------------
Total           185   +115


Previous Report
===

http://gcc.gnu.org/ml/gcc/2012-03/msg00011.html

The next report will be sent by me again, announcing end of stage 1.



Re: [patch] Unify bitmap interface.

2012-10-30 Thread Bin.Cheng
On Tue, Oct 30, 2012 at 8:23 AM, Lawrence Crowl cr...@googlers.com wrote:
 On 10/29/12, Diego Novillo dnovi...@google.com wrote:
 On Oct 29, 2012 Diego Novillo dnovi...@google.com wrote:
  Just to make sure.  Testing on ppc should be fast, for example.

 And useless.  Your patch does not touch ppc.

 I've fixed the #if 0 and the remaining suggestions will happen in
 another patch.  I've committed this one.

 ===

 This patch implements the unification of the *bitmap interfaces as discussed.
 Essentially, we rename ebitmap and sbitmap functions to use the same names
 as the bitmap functions.  This rename works because we can now overload
 on the bitmap type.  Some macros now become inline functions to enable
 that overloading.
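
[Editorial note: a toy model of that overloading, for readers skimming
the digest.  The type names loosely mirror GCC's bitmap_head and
simple_bitmap_def, but the bodies are placeholders, not the real
implementations:

  #include <cstdio>

  struct bitmap_head { };        // stand-in for the dynamic bitmap
  struct simple_bitmap_def { };  // stand-in for sbitmap

  static inline void
  bitmap_clear (bitmap_head *) { std::puts ("clear a bitmap"); }

  static inline void
  bitmap_clear (simple_bitmap_def *) { std::puts ("clear an sbitmap"); }

  int main ()
  {
    bitmap_head b;
    simple_bitmap_def s;
    bitmap_clear (&b);   // overload resolution picks the bitmap variant
    bitmap_clear (&s);   // ... and the sbitmap variant here
    return 0;
  }

With macros there is nothing to overload on, which is why the rename had
to wait for the macros to become inline functions.]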

 The sbitmap non-bool returning bitwise operations have been merged with
 the bool versions.  Sometimes this merge involved modifying the non-bool
 version to compute the bool value, and sometimes modifying bool version to
 add additional work from the non-bool version.  The redundant routines have
 been removed.

 The allocation functions have not been renamed, because we often do not
 have an argument on which to overload.  The cardinality functions have not
 been renamed, because they have different parameters, and are thus not
 interchangable.  The iteration functions have not been renamed, because
 they are functionally different.

 Tested on x86_64, contrib/config-list.mk testing passed.
 Index: gcc/ChangeLog


Just one question: Should we change the name of functions
sbitmap_intersection_of_succs/sbitmap_intersection_of_preds/sbitmap_union_of_succs/sbitmap_union_of_preds
too? It might be a little confusing that sbitmap_* is used among
bitmap_*.

-- 
Best Regards.


Adapt one fold-const optimization for vectors

2012-10-30 Thread Marc Glisse

Hello,

one more optimization that needed help for vectors; it crashed on (x<y)<0.
Because of PR 55001, testcases are awkward to add (I could do an x86-only
one if needed).


bootstrap+testsuite.

2012-10-30  Marc Glisse  marc.gli...@inria.fr

* fold-const.c (fold_binary_op_with_conditional_arg): Handle vectors.
(fold_binary_loc): Call it for VEC_COND_EXPR.

--
Marc Glisse


[patch,libgcc] m32r*rtems* add crtinit.o and crtfinit.o

2012-10-30 Thread Ralf Corsepius

Hi,

I would like to apply the patch below to trunk and gcc-4.7-branch.

This patch was originally submitted by Joel Sherrill back in May 
(http://gcc.gnu.org/ml/gcc-patches/2012-05/msg01180.html),

but had never received any feedback.

It has been part of the rtems-gcc patches since then.

Ralf
2012-05-16  Joel Sherrill joel.sherr...@oarcorp.com

	* config.host (m32r-*-rtems*): Include crtinit.o and crtfini.o
 	as extra_parts.

diff --git a/libgcc/config.host b/libgcc/config.host
index 051d6b0..bbf21a9 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -693,6 +693,7 @@ m32r-*-elf*)
  	;;
 m32r-*-rtems*)
 	tmake_file="$tmake_file m32r/t-m32r t-fdpbit"
+	extra_parts="$extra_parts crtinit.o crtfini.o"
 	;;
 m32rle-*-elf*)
 	tmake_file=t-fdpbit


Re: [PATCH] Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2012-10-30 Thread Matthew Gretton-Dann
On 30 October 2012 05:20, Teresa Johnson tejohn...@google.com wrote:
 Index: cfgrtl.c
 ===
 --- cfgrtl.c(revision 192692)
 +++ cfgrtl.c(working copy)
 @@ -912,7 +912,8 @@ rtl_can_merge_blocks (basic_block a, basic_block b
   partition boundaries).  See  the comments at the top of
   bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */

 -  if (BB_PARTITION (a) != BB_PARTITION (b))
 +  if (find_reg_note (BB_END (a), REG_CROSSING_JUMP, NULL_RTX)
 +  || BB_PARTITION (a) != BB_PARTITION (b))
  return false;

/* Protect the loop latches.  */
 @@ -3978,7 +3979,8 @@ cfg_layout_can_merge_blocks_p (basic_block a, basi
   partition boundaries).  See  the comments at the top of
   bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */

 -  if (BB_PARTITION (a) != BB_PARTITION (b))
 +  if (find_reg_note (BB_END (a), REG_CROSSING_JUMP, NULL_RTX)
 +  || BB_PARTITION (a) != BB_PARTITION (b))
  return false;

/* Protect the loop latches.  */

As this if() condition seems to be the canonical way to detect being
in a different partition, should it be moved out into a query function,
and all of cfgrtl.c updated to use it?
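
[Editorial note: a minimal sketch of what such a query function could
look like, built directly from the condition quoted above; the name
crosses_partition_p and its placement are made up for illustration and
are not an actual GCC API:

  /* True if block A may not be merged with block B because they sit on
     opposite sides of a hot/cold partition boundary.  */
  static bool
  crosses_partition_p (basic_block a, basic_block b)
  {
    return (find_reg_note (BB_END (a), REG_CROSSING_JUMP, NULL_RTX) != NULL_RTX
	    || BB_PARTITION (a) != BB_PARTITION (b));
  }

Both call sites in the patch would then reduce to
"if (crosses_partition_p (a, b)) return false;".]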

[Note I am not a maintainer and so can't approve/reject your patch].

Thanks,

Matt

-- 
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-d...@linaro.org


Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

2012-10-30 Thread Jakub Jelinek
On Mon, Oct 29, 2012 at 02:07:55PM -0400, David Miller wrote:
  I'd like to close the stage 1 phase of GCC 4.8 development
  on Monday, November 5th.  If you have still patches for new features you'd
  like to see in GCC 4.8, please post them for review soon.  Patches
  posted before the freeze, but reviewed shortly after the freeze, may
  still go in, further changes should be just bugfixes and documentation
  fixes.
 
 I'd like to get the Sparc cbcond stuff in (3 revisions posted) which
 is waiting for Eric B. to do some Solaris specific work.

That has been posted in stage 1, so it is certainly ok to commit it even
during early stage 3.  And, on a case by case basis exceptions are always
possible.  This hasn't changed in the last few years.  By the "reviewed
shortly after the freeze" I just want to say that e.g. having large intrusive
patches posted now, but reviewed late December is already too late.

As for postponing end of stage 1 by a few weeks because of the storm, I'm
afraid if we want to keep roughly timely releases we don't have that luxury.
If you look at http://gcc.gnu.org/develop.html, ending stage 1 around end of
October happened already for 4.6 and 4.7, for 4.5 it was a month earlier and
for 4.4 even two months earlier.  The 4.7 bugfixing went IMHO smoothly, but
we certainly have to expect lots of bugfixing.

 I'd also like to enable LRA for at least 32-bit sparc, even if I can't
 find the time to work on auditing 64-bit completely.

I agree with Eric that it is better to enable it for the whole target
together, rather than based on some options.  Enabling LRA in early stage 3
for some targets should be ok, if it doesn't require too large and intrusive
changes to the generic code that could destabilize other targets.

Jakub


[PATCH] PR 54472

2012-10-30 Thread Andrey Belevantsev

Hello,

This PR is due to the selective scheduling missing the dependencies with 
implicit_sets.  From the sched-deps.c code it looks like implicit sets 
generate anti dependencies with either sets, uses or clobbers, so that's 
what I've done with the below patch.  Vlad, as it looks you've added 
implicit sets, does the above conclusion look correct?  I will commit the 
patch after bootstrapping and testing complete.


Yours,
Andrey

2012-10-30  Andrey Belevantsev  a...@ispras.ru

PR rtl-optimization/54472

* sel-sched-ir.c (has_dependence_note_reg_set): Handle
implicit sets.
(has_dependence_note_reg_clobber,
has_dependence_note_reg_use): Likewise.

2012-10-30  Andrey Belevantsev  a...@ispras.ru

PR rtl-optimization/54472

* gcc.dg/pr54472.c: New test.

diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c
index 2a7a170..220568a 100644
--- a/gcc/sel-sched-ir.c
+++ b/gcc/sel-sched-ir.c
@@ -3185,7 +3185,7 @@ has_dependence_note_reg_set (int regno)
  || reg_last->clobbers != NULL)
*dsp = (*dsp & ~SPECULATIVE) | DEP_OUTPUT;

-  if (reg_last->uses)
+  if (reg_last->uses || reg_last->implicit_sets)
*dsp = (*dsp & ~SPECULATIVE) | DEP_ANTI;
 }
 }
@@ -3205,7 +3205,7 @@ has_dependence_note_reg_clobber (int regno)
   if (reg_last->sets)
*dsp = (*dsp & ~SPECULATIVE) | DEP_OUTPUT;

-  if (reg_last->uses)
+  if (reg_last->uses || reg_last->implicit_sets)
*dsp = (*dsp & ~SPECULATIVE) | DEP_ANTI;
 }
 }
@@ -3225,7 +3225,7 @@ has_dependence_note_reg_use (int regno)
   if (reg_last->sets)
*dsp = (*dsp & ~SPECULATIVE) | DEP_TRUE;

-  if (reg_last->clobbers)
+  if (reg_last->clobbers || reg_last->implicit_sets)
*dsp = (*dsp & ~SPECULATIVE) | DEP_ANTI;

   /* Merge BE_IN_SPEC bits into *DSP when the dependency producer
diff --git a/gcc/testsuite/gcc.dg/pr54472.c b/gcc/testsuite/gcc.dg/pr54472.c
new file mode 100644
index 000..9395203
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr54472.c
@@ -0,0 +1,9 @@
+/* { dg-do compile { target powerpc*-*-* ia64-*-* x86_64-*-* } } */
+/* { dg-options "-O -fschedule-insns -fselective-scheduling" } */
+
+int main ()
+{
+  int a[3][3][3];
+  __builtin_memset (a, 0, sizeof a);
+  return 0;
+}


Add myself to MAINTAINERS

2012-10-30 Thread Gopalasubramanian, Ganesh
Adding myself to the list of members in write after approval.

Index: ChangeLog
===
--- ChangeLog   (revision 192977)
+++ ChangeLog   (working copy)
@@ -1,3 +1,7 @@
+2012-10-30 Ganesh Gopalasubramanian  ganesh.gopalasubraman...@amd.com
+
+   * MAINTAINERS (Write After Approval): Add myself.
+
 2012-10-26  James Greenhalgh  james.greenha...@arm.com

* MAINTAINERS (Write After Approval): Add myself.
Index: MAINTAINERS
===
--- MAINTAINERS (revision 192977)
+++ MAINTAINERS (working copy)
@@ -372,6 +372,7 @@
 Chao-ying Fu   f...@mips.com
 Gary Funck g...@intrepid.com
 Pompapathi V Gadad pompapathi.v.ga...@nsc.com
+Gopalasubramanian Ganesh   ganesh.gopalasubraman...@amd.com
 Kaveh Ghazigh...@gcc.gnu.org
 Matthew Gingellging...@gnat.com
 Tristan Gingoldging...@adacore.com

Regards
Ganesh



Re: [PATCH, libstdc++] Make empty std::string storage readonly

2012-10-30 Thread Michael Haubenwallner

On 08/30/2012 11:45 AM, Jonathan Wakely wrote:
 On 29 August 2012 13:25, Michael Haubenwallner wrote:
 On 08/28/2012 08:12 PM, Jonathan Wakely wrote:
 On 28 August 2012 18:27, Michael Haubenwallner wrote:

 Does it actually produce a segfault? I suppose it might on some
 platforms, but not all, so I'm not sure it's worth changing.

 Using this patch on my x86_64 Gentoo Linux Desktop with gcc-4.7.1 does 
 segfault
 as expected - when I make sure the correct libstdc++ is used at runtime,
 having the '_S_empty_rep_storage' symbol in the .rodata section rather than 
 .bss.
 
 If it works reliably on x86_64 then I think the patch is worth considering.
 
 I'm on holiday for a week, so maybe one of the other maintainers will
 deal with it first.

Any chance to get this in for 4.8?

Thank you!
/haubi/


Re: Adapt one fold-const optimization for vectors

2012-10-30 Thread Marek Polacek
On Tue, Oct 30, 2012 at 08:05:13AM +0100, Marc Glisse wrote:
 Hello,
 
 one more optimization that needed help for vectors, it crashed on
 (xy)0. Because of PR 55001, testcases are awkward to add (I could
 do a x86-only one if needed).
 
 bootstrap+testsuite.
 
 2012-10-30  Marc Glisse  marc.gli...@inria.fr
 
   * fold-const.c (fold_binary_op_with_conditional_arg): Handle vectors.
   (fold_binary_loc): call it for VEC_COND_EXPR.

Patch missing?



[SH, committed] PR 54963

2012-10-30 Thread Oleg Endo
Hello,

This is the latest proposed patch from the PR.
Tested on rev 192482 with
make -k check RUNTESTFLAGS="--target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

and no new failures.
Pre-approved by Kaz in the PR.
Committed as rev 192983.

Cheers,
Oleg

gcc/ChangeLog:

PR target/54963
* config/sh/iterators.md (SIDI): New mode iterator.
* config/sh/sh.md (negdi2): Use parallel around operation and
T_REG clobber in expander.
(*negdi2): Mark output operand as early clobbered.
Add T_REG clobber.  Split after reload.  Simplify split code.
(abssi2, absdi2): Fold expanders into abs<mode>2.
(*abssi2, *absdi2): Fold into *abs<mode>2 insn_and_split.
Split insns before reload.
(*negabssi2, *negabsdi2): Fold into *negabs<mode>2.
Add T_REG clobber.  Split insns before reload.
(negsi_cond): Reformat.  Use emit_move_insn instead of
gen_movsi.
(negdi_cond): Reformat.  Use emit_move_insn instead of a pair
of gen_movsi.  Split insn before reload.

Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 192482)
+++ gcc/config/sh/sh.md	(working copy)
@@ -5177,28 +5177,25 @@
 ;; Don't expand immediately because otherwise neg:DI (abs:DI) will not be
 ;; combined.
 (define_expand "negdi2"
-  [(set (match_operand:DI 0 "arith_reg_dest" "")
-	(neg:DI (match_operand:DI 1 "arith_reg_operand" "")))
-   (clobber (reg:SI T_REG))]
-  "TARGET_SH1"
-  "")
+  [(parallel [(set (match_operand:DI 0 "arith_reg_dest")
+		   (neg:DI (match_operand:DI 1 "arith_reg_operand")))
+	      (clobber (reg:SI T_REG))])]
+  "TARGET_SH1")
 
 (define_insn_and_split "*negdi2"
-  [(set (match_operand:DI 0 "arith_reg_dest" "=r")
-	(neg:DI (match_operand:DI 1 "arith_reg_operand" "r")))]
+  [(set (match_operand:DI 0 "arith_reg_dest" "=r")
+	(neg:DI (match_operand:DI 1 "arith_reg_operand" "r")))
+   (clobber (reg:SI T_REG))]
   "TARGET_SH1"
   "#"
-  "TARGET_SH1"
+  "&& reload_completed"
   [(const_int 0)]
 {
-  rtx low_src = gen_lowpart (SImode, operands[1]);
-  rtx high_src = gen_highpart (SImode, operands[1]);
-  rtx low_dst = gen_lowpart (SImode, operands[0]);
-  rtx high_dst = gen_highpart (SImode, operands[0]);
-
   emit_insn (gen_clrt ());
-  emit_insn (gen_negc (low_dst, low_src));
-  emit_insn (gen_negc (high_dst, high_src));
+  emit_insn (gen_negc (gen_lowpart (SImode, operands[0]),
+		   gen_lowpart (SImode, operands[1])));
+  emit_insn (gen_negc (gen_highpart (SImode, operands[0]),
+		   gen_highpart (SImode, operands[1])));
   DONE;
 })
 
@@ -5272,38 +5269,53 @@
 		(const_int -1)))]
   "TARGET_SHMEDIA" "")
 
-(define_expand "abssi2"
-  [(set (match_operand:SI 0 "arith_reg_dest" "")
-  	(abs:SI (match_operand:SI 1 "arith_reg_operand" "")))
+(define_expand "abs<mode>2"
+  [(parallel [(set (match_operand:SIDI 0 "arith_reg_dest")
+		   (abs:SIDI (match_operand:SIDI 1 "arith_reg_operand")))
+	      (clobber (reg:SI T_REG))])]
+  "TARGET_SH1")
+
+(define_insn_and_split "*abs<mode>2"
+  [(set (match_operand:SIDI 0 "arith_reg_dest")
+  	(abs:SIDI (match_operand:SIDI 1 "arith_reg_operand")))
    (clobber (reg:SI T_REG))]
   "TARGET_SH1"
-  "")
-
-(define_insn_and_split "*abssi2"
-  [(set (match_operand:SI 0 "arith_reg_dest" "=r")
-  	(abs:SI (match_operand:SI 1 "arith_reg_operand" "r")))]
-  "TARGET_SH1"
   "#"
-  "TARGET_SH1"
+  "&& can_create_pseudo_p ()"
   [(const_int 0)]
 {
-  emit_insn (gen_cmpgesi_t (operands[1], const0_rtx));
-  emit_insn (gen_negsi_cond (operands[0], operands[1], operands[1],
-			     const1_rtx));
+  if (<MODE>mode == SImode)
+    emit_insn (gen_cmpgesi_t (operands[1], const0_rtx));
+  else
+    {
+      rtx high_src = gen_highpart (SImode, operands[1]);
+      emit_insn (gen_cmpgesi_t (high_src, const0_rtx));
+    }
+
+  emit_insn (gen_neg<mode>_cond (operands[0], operands[1], operands[1],
+				 const1_rtx));
   DONE;
 })
 
-(define_insn_and_split "*negabssi2"
-  [(set (match_operand:SI 0 "arith_reg_dest" "=r")
-  	(neg:SI (abs:SI (match_operand:SI 1 "arith_reg_operand" "r"))))]
+(define_insn_and_split "*negabs<mode>2"
+  [(set (match_operand:SIDI 0 "arith_reg_dest")
+	(neg:SIDI (abs:SIDI (match_operand:SIDI 1 "arith_reg_operand"))))
+   (clobber (reg:SI T_REG))]
   "TARGET_SH1"
   "#"
-  "TARGET_SH1"
+  "&& can_create_pseudo_p ()"
   [(const_int 0)]
 {
-  emit_insn (gen_cmpgesi_t (operands[1], const0_rtx));
-  emit_insn (gen_negsi_cond (operands[0], operands[1], operands[1],
-			     const0_rtx));
+  if (<MODE>mode == SImode)
+    emit_insn (gen_cmpgesi_t (operands[1], const0_rtx));
+  else
+    {
+      rtx high_src = gen_highpart (SImode, operands[1]);
+      emit_insn (gen_cmpgesi_t (high_src, const0_rtx));
+    }
+
+  emit_insn (gen_neg<mode>_cond (operands[0], operands[1], operands[1],
+				 const0_rtx));
   DONE;
 })
 
@@ -5316,10 +5328,10 @@
 
 (define_insn_and_split "negsi_cond"
   [(set (match_operand:SI 0 "arith_reg_dest" "=r,r")
-	(if_then_else:SI (eq:SI (reg:SI T_REG)
-				(match_operand:SI 3 "const_int_operand" "M,N"))
-	 (match_operand:SI 1 

Re: [PATCH, libstdc++] Make empty std::string storage readonly

2012-10-30 Thread Jonathan Wakely
On 30 October 2012 09:05, Michael Haubenwallner wrote:
 Any chance to get this in for 4.8?

I'm looking into it today.


Re: [PATCH] Fix debug info for expr and jump stmt

2012-10-30 Thread Jakub Jelinek
On Mon, Oct 29, 2012 at 05:10:10PM +0100, Richard Biener wrote:
 On Mon, Oct 29, 2012 at 4:25 PM, Dehao Chen de...@google.com wrote:
  On Mon, Oct 29, 2012 at 7:17 AM, Michael Matz m...@suse.de wrote:
  Hi,
 
  On Mon, 29 Oct 2012, Richard Biener wrote:
 
   Well, you merely moved the bogus code to gimple-low.c.  It is bogus
   because you unconditionally overwrite TREE_BLOCK of all operands (and 
   all
 
  Emm, then in gimple-low.c, we should probably not unconditionally
  overwrite gimple_block for stmts too?
 
 gimple stmts have no block before gimple-low.

And tree expressions don't have TREE_BLOCK before gimple-low either.
So IMNSHO it is gimple-low.c that should set TREE_BLOCK of all the gimple
stmts as well as all expressions in the operands.  It is not overwriting
anything, no frontend sets TREE_BLOCK for any expression, the way frontends
associate IL with BLOCKs is by putting them inside of BIND_EXPR/GIMPLE_BIND
after gimplification, and it is gimple-low responsibility to set it.

In 4.3 before tuples, it was solely gimple-low that set TREE_BLOCK
initially.  Before the location_t changes, again it was gimple-low that
was the first setter of TREE_BLOCK, which was valid for all
IS_EXPR_CODE_CLASS.  So, IMNSHO gimple-low should merge location_t
with block for all gimple stmts and all tree expressions used in its
operands.  It shouldn't be set on trees that can be shared, so
say decls etc. should keep using just location_t's without associated block.
So perhaps the right test for gimple-low setting of block is
CAN_HAVE_LOCATION_P (exp) && !tree_node_can_be_shared (exp).

Jakub


Re: [Patch] Remove _GLIBCXX_HAVE_BROKEN_VSWPRINTF from mingw32-w64/os_defines.h

2012-10-30 Thread JonY
On 10/29/2012 21:05, JonY wrote:
 ChangeLog
 2012-10-29  Jonathan Yong  jo...@users.sourceforge.net

  * config/os/mingw32-w64/os_defines.h: Remove 
 _GLIBCXX_HAVE_BROKEN_VSWPRINTF
  as no longer required.



 Index: libstdc++-v3/config/os/mingw32-w64/os_defines.h
 ===
 --- libstdc++-v3/config/os/mingw32-w64/os_defines.h (revision 192802)
 +++ libstdc++-v3/config/os/mingw32-w64/os_defines.h (working copy)
 @@ -63,8 +63,9 @@
  // See libstdc++/20806.
  #define _GLIBCXX_HAVE_DOS_BASED_FILESYSTEM 1

 -// See  libstdc++/37522.
 -#define _GLIBCXX_HAVE_BROKEN_VSWPRINTF 1
 +// See  libstdc++/37522. mingw-w64 stdio redirect for C++
 +// #define _GLIBCXX_HAVE_BROKEN_VSWPRINTF 1
 +// Workaround added for mingw-w64 trunk headers r5437

  // See libstdc++/43738
  // On native windows targets there is no ioctl function. And the existing

 
 

Hi,

Can I have this in before 4.8 branches? The maintainer is away for a few
weeks, but OK'ed it on IRC.







Re: [PATCH, libstdc++] Make empty std::string storage readonly

2012-10-30 Thread Jonathan Wakely
On 30 October 2012 09:28, Jonathan Wakely wrote:
 On 30 October 2012 09:05, Michael Haubenwallner wrote:
 Any chance to get this in for 4.8?

 I'm looking into it today.

Consider the case where one object file containing
std::string().erase() is built with an older GCC without the fix for
PR 40518, then it's linked to a new libstdc++.so where the empty rep
is read-only.  The program will attempt to write to the empty rep, but
now it's read-only and will crash.  I don't think we can apply it
unless we change the library ABI so that no pre-PR40518 objects can
link to a libstdc++.so containing a read-only empty rep.
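
[Editorial note: a minimal illustration of the mixed-ABI hazard described
above, assuming the old copy-on-write basic_string; whether erase()
actually writes depends on the exact pre-PR40518 code paths:

  // If this translation unit is compiled by a GCC without the PR 40518
  // fix and then linked against a libstdc++.so whose _S_empty_rep_storage
  // lives in .rodata, the write below faults at runtime.
  #include <string>

  int main ()
  {
    std::string s;   // shares the single static empty representation
    s.erase ();      // pre-fix code could write the terminating NUL into it
    return 0;
  }
]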


Re: PATCH: PR rtl-optimization/55093: [4.8 Regression] [x32] -maddress-mode=long failed

2012-10-30 Thread Richard Sandiford
H.J. Lu hjl.to...@gmail.com writes:
 LRA has

   if (REG_P (reg) && (ep = get_elimination (reg)) != NULL)
 {
   rtx to_rtx = replace_p ? ep->to_rtx : ep->from_rtx;

   if (! replace_p)
 {
   offset += (ep->offset - ep->previous_offset);
   offset = trunc_int_for_mode (offset, GET_MODE (plus_cst_src));
 }

   if (GET_CODE (XEXP (plus_cst_src, 0)) == SUBREG)
 to_rtx = gen_lowpart (GET_MODE (XEXP (plus_cst_src, 0)), 
 to_rtx);

 Reload has

 rtx to_rtx = ep->to_rtx;
 offset += ep->offset;
 offset = trunc_int_for_mode (offset, GET_MODE (plus_cst_src));

 if (GET_CODE (XEXP (plus_cst_src, 0)) == SUBREG)
   to_rtx = gen_lowpart (GET_MODE (XEXP (plus_cst_src, 0)),
 to_rtx);

 (gdb) call debug_rtx (ep->to_rtx)
 (reg/f:DI 7 sp)
 (gdb) call debug_rtx (ep->from_rtx)
 (reg/f:DI 16 argp)
 (gdb)

 gen_lowpart returns (reg/f:DI 7 sp) for reload and (reg:SI 16 argp)
 for LRA.   They are caused by

   if (FRAME_POINTER_REGNUM != ARG_POINTER_REGNUM
   /* We should convert arg register in LRA after the elimination
  if it is possible.  */
      && xregno == ARG_POINTER_REGNUM
      && ! lra_in_progress)
 return -1;

 It doesn't work in this case.


 This testcase shows that LRA can't convert arg register after
 the elimination.

 Here is a patch to remove ra_in_progress check for
 ARG_POINTER_REGNUM.  Tested on Linux.x86-64.
 OK to install?

Thanks HJ.  This looks good to me.  As well as your testcase, I think
it would be dangerous to reduce this kind of subreg during non-final
elimination in cases where the argument pointer occupies more than one
hard register (like avr IIRC).  We could end up with something like
ARG_POINTER_REGNUM+1, which wouldn't show up as an elimination register
during the rest of LRA.

It's important that we do get rid of the subreg during the final
elimination stage, but I think alter_subreg already handles that case.

Since this code is outside the LRA files: patch is OK if Vlad agrees.

Richard


Re: [PATCH, libstdc++] Make empty std::string storage readonly

2012-10-30 Thread Jonathan Wakely
On 30 October 2012 10:11, Jonathan Wakely wrote:
 On 30 October 2012 09:28, Jonathan Wakely wrote:
 On 30 October 2012 09:05, Michael Haubenwallner wrote:
 Any chance to get this in for 4.8?

 I'm looking into it today.

 Consider the case where one object file containing
 std::string().erase() is built with an older GCC without the fix for
 PR 40518, then it's linked to a new libstdc++.so where the empty rep
 is read-only.  The program will attempt to write to the empty rep, but
 now it's read-only and will crash.  I don't think we can apply it
 unless we change the library ABI so that no pre-PR40518 objects can
 link to a libstdc++.so containing a read-only empty rep.

If I can convince myself this can't happen then I'll commit it, but I
need to look into it further this evening.


Re: RFA: patch to fix PR55116

2012-10-30 Thread Richard Sandiford
H.J. Lu hjl.to...@gmail.com writes:
 On Mon, Oct 29, 2012 at 5:11 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, Oct 29, 2012 at 4:41 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, Oct 29, 2012 at 9:38 AM, Vladimir Makarov
 vmaka...@redhat.com wrote:
 On 12-10-29 12:21 PM, Richard Sandiford wrote:

 Vladimir Makarov vmaka...@redhat.com writes:

 H.J. in

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55116

 reported an interesting address

 (and:DI (subreg:DI (plus:SI (ashift:SI (reg:SI 96 [ glob_vol_int.22 ])
   (const_int 2 [0x2]))
   (symbol_ref:SI ("glob_vol_int_arr") <var_decl
 0x703c2720 glob_vol_int_arr>)) 0)
   (const_int 4294967295 [0xffffffff]))

 which can not be correctly extracted.  Here `and' with `subreg'
 behaves as an address mutation.

 The following patch fixes the problem.

 Ok to commit, Richard?

 Heh, I wondered if subregs might still be used like that, and was almost
 tempted to add them just in case.

 I think this particular case is really a failed canonicalisation and that:

 (and:DI (subreg:DI (foo:SI ...) 0) (const_int 0xffffffff))

 ought to be:

 (zero_extend:DI (foo:SI ...))

 Yes, that was my thought too.

 But I know I've approved MIPS patches to accept
 (and:DI ... (const_int 0xffffffff)) as an alternative.

 Index: rtlanal.c
 ===
 --- rtlanal.c   (revision 192942)
 +++ rtlanal.c   (working copy)
 @@ -5459,6 +5459,11 @@ strip_address_mutations (rtx *loc, enum
  else if (code == AND && CONST_INT_P (XEXP (*loc, 1)))
   /* (and ... (const_int -X)) is used to align to X bytes.  */
   loc = &XEXP (*loc, 0);
 +  else if (code == SUBREG
 +	   && ! REG_P (XEXP (*loc, 0)) && ! MEM_P (XEXP (*loc, 0)))
 +   /* (subreg (operator ...) ...) usually inside and is used for
 +      mode conversion too.  */
 +   loc = &XEXP (*loc, 0);

 I think the condition should be:

   else if (code == SUBREG
	    && !OBJECT_P (SUBREG_REG (*loc))
	    && subreg_lowpart_p (*loc))

 OK with that change, if it works.

 Yes, it works.
 I've submitted the following patch.


 It doesn't work right.  I will create a new testcase.




 This patch limits SUBREG to Pmode.  Tested on Linux/x86-64.
 OK to install?

 Thanks.

The address in this case is:

(plus:SI (mult:SI (reg/v:SI 223 [orig:154 j ] [154])
		  (const_int 8 [0x8]))
	 (subreg:SI (plus:DI (reg/f:DI 20 frame)
			     (const_int 32 [0x20])) 0))

which after Uros's subreg simplification patch shouldn't be allowed:
the subreg ought to be on the frame register rather than the plus.

The attached patch seems to work for the testcase.  Does it work
more generally?

Richard


gcc/
* lra-eliminations.c (lra_eliminate_regs_1): Use simplify_gen_subreg
rather than gen_rtx_SUBREG.

Index: gcc/lra-eliminations.c
===
--- gcc/lra-eliminations.c  (revision 192983)
+++ gcc/lra-eliminations.c  (working copy)
@@ -550,7 +550,8 @@
  return x;
}
  else
-   return gen_rtx_SUBREG (GET_MODE (x), new_rtx, SUBREG_BYTE (x));
+   return simplify_gen_subreg (GET_MODE (x), new_rtx,
+   GET_MODE (new_rtx), SUBREG_BYTE (x));
}
 
   return x;


[RFC] Heuristics to throttle the complette unrolling

2012-10-30 Thread Jan Hubicka
Hi,
for the past week or two I was playing with ways to throttle down the complete
unrolling heuristics.  I made the complete unroller use the tree-ssa-loop-niter
upper bound and unroll even in non-trivial cases, and this has turned out to
increase the number of completely unrolled loops by a great amount, so one can
see it as considerable code size growth at -O3 SPEC build.

http://gcc.opensuse.org/SPEC/CFP/sb-vangelis-head-64/Total-size_big.png
it is the largest jump on the right hand side in both peak and base runs.
There are also performance improvements, most importantly 11% on applu.

The intuition is that complete unrolling is most profitable when IV tests
are eliminated and a single basic block is created.  When conditionals stay
in the code it is not that good an idea, and functions containing calls
are also less interesting for unrolling since the calls are slow and the
optimization opportunities are not so great.
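
[Editorial note: an illustration of the distinction being drawn,
hypothetical user code rather than anything from the patch:

  /* Best case for complete unrolling: known trip count and no control
     flow in the body -- the IV tests vanish and the function collapses
     into a single straight-line basic block.  */
  int
  sum4 (const int *a)
  {
    int i, s = 0;
    for (i = 0; i < 4; i++)
      s += a[i];
    return s;
  }

  /* Much less attractive: the branch and the call survive in every
     unrolled copy, so code grows while little gets simplified.  */
  extern int f (int);

  int
  sum4_cond (const int *a)
  {
    int i, s = 0;
    for (i = 0; i < 4; i++)
      if (a[i] > 0)
	s += f (a[i]);
    return s;
  }
]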

This patch reduces unrolling on loops having many branches or calls on the
hot path.  I found that for the applu speedup the number of branches needs
to be pretty high - about 32.

The patch saves about half of the growth introduced (but on different
benchmarks), and I think I can move all peeling to trees and reduce peeling
limits a bit, too.

Does this sound sane? Any ideas?

Honza

Index: tree-ssa-loop-ivcanon.c
===
--- tree-ssa-loop-ivcanon.c (revision 192892)
+++ tree-ssa-loop-ivcanon.c (working copy)
@@ -140,6 +140,20 @@ struct loop_size
  instructions after exit are not executed.  */
   int last_iteration;
   int last_iteration_eliminated_by_peeling;
+  
+  /* If some IV computation will become constant.  */
+  bool constant_iv;
+
+  /* Number of call stmts that are not a builtin and are pure or const
+ present on the hot path.  */
+  int num_pure_calls_on_hot_path;
+  /* Number of call stmts that are not a builtin and are not pure nor const
+ present on the hot path.  */
+  int num_non_pure_calls_on_hot_path;
+  /* Number of statements other than calls in the loop.  */
+  int non_call_stmts_on_hot_path;
+  /* Number of branches seen on the hot path.  */
+  int num_branches_on_hot_path;
 };
 
 /* Return true if OP in STMT will be constant after peeling LOOP.  */
@@ -188,7 +202,11 @@ constant_after_peeling (tree op, gimple
   return true;
 }
 
-/* Computes an estimated number of insns in LOOP, weighted by WEIGHTS.
+/* Computes an estimated number of insns in LOOP.
+   EXIT (if non-NULL) is an exit edge that will be eliminated in all but last
+   iteration of the loop.
+   EDGE_TO_CANCEL (if non-NULL) is a non-exit edge eliminated in the last
+   iteration of loop.
Return results in SIZE, estimate benefits for complete unrolling exiting by 
EXIT.  */
 
 static void
@@ -198,11 +216,17 @@ tree_estimate_loop_size (struct loop *lo
   gimple_stmt_iterator gsi;
   unsigned int i;
   bool after_exit;
+  VEC (basic_block, heap) *path = get_loop_hot_path (loop);
 
   size->overall = 0;
   size->eliminated_by_peeling = 0;
   size->last_iteration = 0;
   size->last_iteration_eliminated_by_peeling = 0;
+  size->num_pure_calls_on_hot_path = 0;
+  size->num_non_pure_calls_on_hot_path = 0;
+  size->non_call_stmts_on_hot_path = 0;
+  size->num_branches_on_hot_path = 0;
+  size->constant_iv = 0;
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     fprintf (dump_file, "Estimating sizes for loop %i\n", loop->num);
@@ -221,6 +245,8 @@ tree_estimate_loop_size (struct loop *lo
  gimple stmt = gsi_stmt (gsi);
  int num = estimate_num_insns (stmt, eni_size_weights);
  bool likely_eliminated = false;
+ bool likely_eliminated_last = false;
+ bool likely_eliminated_peeled = false;
 
 	  if (dump_file && (dump_flags & TDF_DETAILS))
{
@@ -231,11 +257,21 @@ tree_estimate_loop_size (struct loop *lo
  /* Look for reasons why we might optimize this stmt away. */
 
  /* Exit conditional.  */
-	  if (exit && body[i] == exit->src && stmt == last_stmt (exit->src))
+	  if (exit && body[i] == exit->src
+	      && stmt == last_stmt (exit->src))
 	    {
 	      if (dump_file && (dump_flags & TDF_DETAILS))
-		fprintf (dump_file, "Exit condition will be eliminated.\n");
-	      likely_eliminated = true;
+		fprintf (dump_file, "Exit condition will be eliminated "
+			 "in peeled copies.\n");
+	      likely_eliminated_peeled = true;
+	    }
+	  else if (edge_to_cancel && body[i] == edge_to_cancel->src
+		   && stmt == last_stmt (edge_to_cancel->src))
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "Exit condition will be eliminated "
+			 "in last copy.\n");
+	      likely_eliminated_last = true;
 	    }
 	  /* Sets of IV variables  */
 	  else if (gimple_code (stmt) == GIMPLE_ASSIGN
@@ -249,19 +285,22 @@ 

Re: [RFC] Heuristics to throttle the complette unrolling

2012-10-30 Thread Richard Biener
On Tue, 30 Oct 2012, Jan Hubicka wrote:

 Hi,
 for the past week or two I was playing with ways to throttle down the complete
 unrolling heuristics.  I made the complete unroller use the
 tree-ssa-loop-niter
 upper bound and unroll even in non-trivial cases, and this has turned out to
 increase the number of completely unrolled loops by a great amount, so one can
 see it as considerable code size growth at -O3 SPEC build.
 
 http://gcc.opensuse.org/SPEC/CFP/sb-vangelis-head-64/Total-size_big.png
 it is the largest jump on the right hand side in both peak and base runs.
 There are also performance improvements, most importantly 11% on applu.
 
 The intuition is that complete unrolling is most profitable when IV tests
 are eliminated and a single basic block is created.  When conditionals stay
 in the code it is not that good an idea, and functions containing calls
 are also less interesting for unrolling since the calls are slow and the
 optimization opportunities are not so great.
 
 This patch reduces unrolling on loops having many branches or calls on the
 hot path.  I found that for the applu speedup the number of branches needs
 to be pretty high - about 32.
 
 The patch saves about half of the growth introduced (but on different
 benchmarks), and I think I can move all peeling to trees and reduce peeling
 limits a bit, too.
 
 Does this sound sane? Any ideas?

Yes, this sounds ok (beware of the unrelated PARAM_MAX_ONCE_PEELED_INSNS
removal in the patch below).

Richard.

 Honza
 
 Index: tree-ssa-loop-ivcanon.c
 ===
 --- tree-ssa-loop-ivcanon.c   (revision 192892)
 +++ tree-ssa-loop-ivcanon.c   (working copy)
 @@ -140,6 +140,20 @@ struct loop_size
   instructions after exit are not executed.  */
int last_iteration;
int last_iteration_eliminated_by_peeling;
 +  
 +  /* If some IV computation will become constant.  */
 +  bool constant_iv;
 +
 +  /* Number of call stmts that are not a builtin and are pure or const
 + present on the hot path.  */
 +  int num_pure_calls_on_hot_path;
 +  /* Number of call stmts that are not a builtin and are not pure nor const
 + present on the hot path.  */
 +  int num_non_pure_calls_on_hot_path;
 +  /* Number of statements other than calls in the loop.  */
 +  int non_call_stmts_on_hot_path;
 +  /* Number of branches seen on the hot path.  */
 +  int num_branches_on_hot_path;
  };
  
  /* Return true if OP in STMT will be constant after peeling LOOP.  */
 @@ -188,7 +202,11 @@ constant_after_peeling (tree op, gimple
return true;
  }
  
 -/* Computes an estimated number of insns in LOOP, weighted by WEIGHTS.
 +/* Computes an estimated number of insns in LOOP.
 +   EXIT (if non-NULL) is an exite edge that will be eliminated in all but 
 last
 +   iteration of the loop.
 +   EDGE_TO_CANCEL (if non-NULL) is an non-exit edge eliminated in the last 
 iteration
 +   of loop.
 Return results in SIZE, estimate benefits for complete unrolling exiting 
 by EXIT.  */
  
  static void
 @@ -198,11 +216,17 @@ tree_estimate_loop_size (struct loop *lo
gimple_stmt_iterator gsi;
unsigned int i;
bool after_exit;
 +  VEC (basic_block, heap) *path = get_loop_hot_path (loop);
  
 size->overall = 0;
 size->eliminated_by_peeling = 0;
 size->last_iteration = 0;
 size->last_iteration_eliminated_by_peeling = 0;
 +  size->num_pure_calls_on_hot_path = 0;
 +  size->num_non_pure_calls_on_hot_path = 0;
 +  size->non_call_stmts_on_hot_path = 0;
 +  size->num_branches_on_hot_path = 0;
 +  size->constant_iv = 0;
  
 if (dump_file && (dump_flags & TDF_DETAILS))
   fprintf (dump_file, "Estimating sizes for loop %i\n", loop->num);
 @@ -221,6 +245,8 @@ tree_estimate_loop_size (struct loop *lo
 gimple stmt = gsi_stmt (gsi);
 int num = estimate_num_insns (stmt, eni_size_weights);
 bool likely_eliminated = false;
 +   bool likely_eliminated_last = false;
 +   bool likely_eliminated_peeled = false;
  
 	  if (dump_file && (dump_flags & TDF_DETAILS))
   {
 @@ -231,11 +257,21 @@ tree_estimate_loop_size (struct loop *lo
 /* Look for reasons why we might optimize this stmt away. */
  
 /* Exit conditional.  */
 -	  if (exit && body[i] == exit->src && stmt == last_stmt (exit->src))
 +	  if (exit && body[i] == exit->src
 +	      && stmt == last_stmt (exit->src))
 	    {
 	      if (dump_file && (dump_flags & TDF_DETAILS))
 -		fprintf (dump_file, "Exit condition will be eliminated.\n");
 -	      likely_eliminated = true;
 +		fprintf (dump_file, "Exit condition will be eliminated "
 +			 "in peeled copies.\n");
 +	      likely_eliminated_peeled = true;
 +	    }
 +	  else if (edge_to_cancel && body[i] == edge_to_cancel->src
 +		   && stmt == last_stmt (edge_to_cancel->src))
 +	    {
 +	      if (dump_file && (dump_flags & TDF_DETAILS))
 +		fprintf (dump_file, "Exit condition will be eliminated "
 +			 

Re: RFA: patch to fix PR55116

2012-10-30 Thread H.J. Lu
On Tue, Oct 30, 2012 at 4:09 AM, Richard Sandiford
rdsandif...@googlemail.com wrote:
 H.J. Lu hjl.to...@gmail.com writes:
 On Mon, Oct 29, 2012 at 5:11 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, Oct 29, 2012 at 4:41 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, Oct 29, 2012 at 9:38 AM, Vladimir Makarov
 vmaka...@redhat.com wrote:
 On 12-10-29 12:21 PM, Richard Sandiford wrote:

 Vladimir Makarov vmaka...@redhat.com writes:

 H.J. in

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55116

 reported an interesting address

 (and:DI (subreg:DI (plus:SI (ashift:SI (reg:SI 96 [ glob_vol_int.22 ])
   (const_int 2 [0x2]))
   (symbol_ref:SI ("glob_vol_int_arr") <var_decl
 0x703c2720 glob_vol_int_arr>)) 0)
   (const_int 4294967295 [0xffffffff]))

 which can not be correctly extracted.  Here `and' with `subreg'
 behaves as an address mutation.

 The following patch fixes the problem.

 Ok to commit, Richard?

 Heh, I wondered if subregs might still be used like that, and was almost
 tempted to add them just in case.

 I think this particular case is really a failed canonicalisation and 
 that:

 (and:DI (subreg:DI (foo:SI ...) 0) (const_int 0xffffffff))

 ought to be:

 (zero_extend:DI (foo:SI ...))

 Yes, that was my thought too.

 But I know I've approved MIPS patches to accept
 (and:DI ... (const_int 0xffffffff)) as an alternative.

 Index: rtlanal.c
 ===
 --- rtlanal.c   (revision 192942)
 +++ rtlanal.c   (working copy)
 @@ -5459,6 +5459,11 @@ strip_address_mutations (rtx *loc, enum
  else if (code == AND && CONST_INT_P (XEXP (*loc, 1)))
   /* (and ... (const_int -X)) is used to align to X bytes.  */
   loc = &XEXP (*loc, 0);
 +  else if (code == SUBREG
 +	   && ! REG_P (XEXP (*loc, 0)) && ! MEM_P (XEXP (*loc, 0)))
 +   /* (subreg (operator ...) ...) usually inside and is used for
 +      mode conversion too.  */
 +   loc = &XEXP (*loc, 0);

 I think the condition should be:

   else if (code == SUBREG
	    && !OBJECT_P (SUBREG_REG (*loc))
	    && subreg_lowpart_p (*loc))

 OK with that change, if it works.

 Yes, it works.
 I've submitted the following patch.


 It doesn't work right.  I will create a new testcase.




 This patch limits SUBREG to Pmode.  Tested on Linux/x86-64.
 OK to install?

 Thanks.

 The address in this case is:

 (plus:SI (mult:SI (reg/v:SI 223 [orig:154 j ] [154])
		   (const_int 8 [0x8]))
	  (subreg:SI (plus:DI (reg/f:DI 20 frame)
			      (const_int 32 [0x20])) 0))

 which after Uros's subreg simplification patch shouldn't be allowed:
 the subreg ought to be on the frame register rather than the plus.

 The attached patch seems to work for the testcase.  Does it work
 more generally?

 Richard


 gcc/
 * lra-eliminations.c (lra_eliminate_regs_1): Use simplify_gen_subreg
 rather than gen_rtx_SUBREG.

 Index: gcc/lra-eliminations.c
 ===
 --- gcc/lra-eliminations.c  (revision 192983)
 +++ gcc/lra-eliminations.c  (working copy)
 @@ -550,7 +550,8 @@
   return x;
 }
   else
 -   return gen_rtx_SUBREG (GET_MODE (x), new_rtx, SUBREG_BYTE (x));
 +   return simplify_gen_subreg (GET_MODE (x), new_rtx,
 +   GET_MODE (new_rtx), SUBREG_BYTE (x));
 }

return x;

I am running the full test.

Thanks.


-- 
H.J.


Re: Fix estimation of array accesses

2012-10-30 Thread Richard Biener
On Mon, 29 Oct 2012, Jan Hubicka wrote:

   ICK ...
   
   Why not sth as simple as
   
return num_ssa_operands (stmt, SSA_OP_USE);
   
   ?  a[1][2] and b[2] really have the same cost, variable length
   objects have extra SSA operands in ARRAY_REF/COMPONENT_REF for
   the size.  Thus, stmt cost somehow should reflect the number
   of dependent stmts.
   
   So in estimate_num_insns I'd try
   
   int
   estimate_num_insns (gimple stmt, eni_weights *weights)
   {
 unsigned cost, i;
 enum gimple_code code = gimple_code (stmt);
 tree lhs;
 tree rhs;
   
 switch (code)
 {
  case GIMPLE_ASSIGN:
   /* Initialize with the number of SSA uses, one is free.  */
   cost = num_ssa_operands (stmt, SSA_OP_USE);
  if (cost > 1)
 --cost;
  
  Hi,
  this is the updated patch I am testing after today's discussion.  I decided to
  drop the ipa-inline-analysis changes and do that incrementally.  So the patch
  now trashes tramp3d performance by increasing the need for early-inlining-insns,
  but it is not unexpected.
  
  The patch also fixes accounting of addr_expr that got previously confused
  with a load.
  
  Does this seem sane?
  
  * tree-inline.c (estimate_operator_cost): Return 1 for non-trivial
  ADDR_EXPR operations.
  (estimate_num_insns): Do not confuse general single rhs with
  loads; account cost of non-trivial addr_expr for ASSIGN_EXPR,
  GIMPLE_RETURN and GIMPLE_ASM
 
 Hi,
 this patch actually does not cause that much tramp3d fuzz and no differences
 in the testsuite due to unrolling changes.
 
 The changes are the constants when accounting loads and stores.
 A typical store has two SSA uses (one for the address and one for the value
 to store).  Of course we lose the difference between array offset and pointer
 dereference.
 
 A typical load/address expression has one SSA use (for the address).
 
 Bootstrapped/regtested x86_64-linux, OK?

ChangeLog?

 Honza
 
 Index: tree-inline.c
 ===
 --- tree-inline.c (revision 192945)
 +++ tree-inline.c (working copy)
 @@ -3447,6 +3447,19 @@ estimate_operator_cost (enum tree_code c
if (TREE_CODE (op2) != INTEGER_CST)
  return weights-div_mod_cost;
return 1;
 +case ADDR_EXPR:
 +  { 
 +tree addr_base;
 +HOST_WIDE_INT addr_offset;
 +
 + addr_base = get_addr_base_and_unit_offset (TREE_OPERAND (op1, 0),
 +						&addr_offset);
 + /* If the offset is variable or with non-zero offset, return 2.  */
 + if (!addr_base || addr_offset != 0 || TREE_CODE (addr_base) != MEM_REF
 + || !integer_zerop (TREE_OPERAND (addr_base, 1)))
 +   return 1;

The comment doesn't match the code.  Why the TREE_CODE (addr_base) != 
MEM_REF check?  If it isn't a MEM_REF then it is a plain decl, thus
no dereference.  So it's not clear what you want here ...?

It looks like you want to pessimize variable addresses here,
like a.a[i]?  Before all ADDR_EXPR cost was zero.

I'd say you want simply

if (!addr_base || addr_offset != 0)
  return 1;

no?  Or even

if (!addr_base
    || (addr_offset != 0 && TREE_CODE (addr_base) == MEM_REF))
  return 1;

that keeps decl + CST as cost 0.  Or even

if (!addr_base)
  return 1;

that even keeps ptr + CST as cost 0.  Both because they are likely
combined with some complex addressing mode later.

 +  }
 +  return 0;

Inside the } please.

  
  default:
/* We expect a copy assignment with no operator.  */
 @@ -3512,14 +3525,24 @@ estimate_num_insns (gimple stmt, eni_wei
lhs = gimple_assign_lhs (stmt);
rhs = gimple_assign_rhs1 (stmt);
  
 -  if (is_gimple_reg (lhs))
 - cost = 0;
 -  else
 +  /* Store.  */
 +  if (gimple_vdef (stmt))
   cost = estimate_move_cost (TREE_TYPE (lhs));
 +  else
 + cost = 0;

That change seems bogus.  A !is_gimple_reg lhs is always a store.

  
 -  if (!is_gimple_reg (rhs) && !is_gimple_min_invariant (rhs))
 +  /* Load.  */
 +  if (gimple_vuse (stmt))
   cost += estimate_move_cost (TREE_TYPE (rhs));

Likewise.  If rhs1 is not a register or a constant this is a load.
All stores also have a VUSE so you are always accounting a store
as aggregate copy this way.
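
[Editorial note: to make the point concrete -- in GIMPLE's
virtual-operand form every store carries a VUSE of the incoming memory
state in addition to its VDEF, so gimple_vuse() is non-NULL for stores
as well as loads.  Annotated user code, illustrative only:

  void
  example (int *p, const int *q, int x)
  {
    int t;
    *p = x;   /* store: has a VDEF and also a VUSE of the previous
		 memory state, so gimple_vuse (stmt) is non-NULL  */
    t = *q;   /* load: has only a VUSE  */
    (void) t; /* silence unused-variable warnings  */
  }

Hence testing gimple_vuse() alone classifies every store as a load too.]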

  
 +  /* Stores, loads and address expressions may have variable array
 +  references in them. Account these.  */
 +  if (gimple_vdef (stmt))
 + cost += MAX (0, (int)num_ssa_operands (stmt, SSA_OP_USE) - 2);
 +  else if (gimple_vuse (stmt)
 +   || gimple_assign_rhs_code (stmt) == ADDR_EXPR)
 + cost += MAX (0, (int)num_ssa_operands (stmt, SSA_OP_USE) - 1);
 +

Rather than doing this here in this awkward and bogus way (see above)
why not do it in estimate_operator_cost?  You added ADDR_EXPR already,
so simply add DECLs and handled-components.  Then you can do
better than estimating from 

Re: [patch] Unify bitmap interface.

2012-10-30 Thread Diego Novillo
On Tue, Oct 30, 2012 at 2:17 AM, Bin.Cheng amker.ch...@gmail.com wrote:
 On Tue, Oct 30, 2012 at 8:23 AM, Lawrence Crowl cr...@googlers.com wrote:
 On 10/29/12, Diego Novillo dnovi...@google.com wrote:
 On Oct 29, 2012 Diego Novillo dnovi...@google.com wrote:
  Just to make sure.  Testing on ppc should be fast, for example.

 And useless.  Your patch does not touch ppc.

 I've fixed the #if 0 and the remaining suggestions will happen in
 another patch.  I've committed this one.

 ===

 This patch implements the unification of the *bitmap interfaces as discussed.
 Essentially, we rename ebitmap and sbitmap functions to use the same names
 as the bitmap functions.  This rename works because we can now overload
 on the bitmap type.  Some macros now become inline functions to enable
 that overloading.

 The sbitmap non-bool returning bitwise operations have been merged with
 the bool versions.  Sometimes this merge involved modifying the non-bool
 version to compute the bool value, and sometimes modifying bool version to
 add additional work from the non-bool version.  The redundant routines have
 been removed.

 The allocation functions have not been renamed, because we often do not
 have an argument on which to overload.  The cardinality functions have not
 been renamed, because they have different parameters, and are thus not
 interchangable.  The iteration functions have not been renamed, because
 they are functionally different.

 Tested on x86_64, contrib/config-list.mk testing passed.
 Index: gcc/ChangeLog


 Just one question: Should we change the name of functions
 sbitmap_intersection_of_succs/sbitmap_intersection_of_preds/sbitmap_union_of_succs/sbitmap_union_of_preds
 too? It might be a little confusing that sbitmap_* is used among
 bitmap_*.

Yes.  Lawrence is proceeding with this unification in stages.  The
next few patches should rename these.  The only two sets of functions
that will remain separate for now are the iterators and the bitmap
creation routines, I think.  Lawrence?


Diego.


Re: [patch] Unify bitmap interface.

2012-10-30 Thread Diego Novillo
On Mon, Oct 29, 2012 at 4:25 PM, Steven Bosscher stevenb@gmail.com wrote:
 On Mon, Oct 29, 2012 at 7:54 PM, Diego Novillo wrote:

 Sure.  But the point is not to add more.  We should mechanically strip
 all the #if 0 code from the tree, btw.  No point keeping all that
 garbage around.

 Please no. A lot (if not most) of the #if 0 code serves as good
 documentation for why something is *not* done, other pieces are there
 to indicate possible enhancement, and some are useful for debugging.

I never really bought into that line of reasoning.  Documenting why an
approach was not taken is better done in words than in code that
will grow stale wrt the surrounding code.  Similarly for possible
enhancements.  Prose is better than code in those cases.  If debugging
code is useful, then it can remain predicated on some debugging
switch.


Diego.


[PATCH] Fix PR55111

2012-10-30 Thread Richard Biener

This fixes PR55111.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2012-10-30  Richard Biener  rguent...@suse.de

PR tree-optimization/55111
* tree-ssa-pre.c (eliminate_insert): Properly fold the built
stmt.

* gcc.dg/torture/pr55111.c: New testcase.

Index: gcc/tree-ssa-pre.c
===
*** gcc/tree-ssa-pre.c  (revision 192983)
--- gcc/tree-ssa-pre.c  (working copy)
*************** eliminate_insert (gimple_stmt_iterator *
*** 3996,4003 ****
  
    tree res = make_temp_ssa_name (TREE_TYPE (val), NULL, "pretmp");
gimple tem = gimple_build_assign (res,
!   build1 (TREE_CODE (expr),
!   TREE_TYPE (expr), leader));
gsi_insert_before (gsi, tem, GSI_SAME_STMT);
VN_INFO_GET (res)-valnum = val;
  
--- 3996,4003 ----
  
tree res = make_temp_ssa_name (TREE_TYPE (val), NULL, pretmp);
gimple tem = gimple_build_assign (res,
!   fold_build1 (TREE_CODE (expr),
!TREE_TYPE (expr), leader));
gsi_insert_before (gsi, tem, GSI_SAME_STMT);
VN_INFO_GET (res)->valnum = val;
  
Index: gcc/testsuite/gcc.dg/torture/pr55111.c
===
*** gcc/testsuite/gcc.dg/torture/pr55111.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr55111.c  (working copy)
***
*** 0 
--- 1,24 
+ /* { dg-do compile } */
+ 
+ int a, b, c;
+ long d;
+ unsigned long *e;
+ 
+ int f(void)
+ {
+   for(;; a++)
+ {
+   if(c)
+   {
+ for(b = d = 0; b < 1; b++)
+   e = &d;
+ 
+ --*e;
+ 
+ if(d < 0)
+   a = 0;
+ 
+ return d;
+   }
+ }
+ }


Re: Adapt one fold-const optimization for vectors

2012-10-30 Thread Marc Glisse

On Tue, 30 Oct 2012, Marek Polacek wrote:


On Tue, Oct 30, 2012 at 08:05:13AM +0100, Marc Glisse wrote:

Hello,

one more optimization that needed help for vectors, it crashed on
(x<y)<0.  Because of PR 55001, testcases are awkward to add (I could
do an x86-only one if needed).

bootstrap+testsuite.

2012-10-30  Marc Glisse  marc.gli...@inria.fr

* fold-const.c (fold_binary_op_with_conditional_arg): Handle vectors.
(fold_binary_loc): Call it for VEC_COND_EXPR.


Patch missing?


Indeed, thanks for the notice.

Here it is.

--
Marc Glisse

Index: fold-const.c
===
--- fold-const.c(revision 192976)
+++ fold-const.c(working copy)
@@ -5952,42 +5952,47 @@ static tree
 fold_binary_op_with_conditional_arg (location_t loc,
 enum tree_code code,
 tree type, tree op0, tree op1,
 tree cond, tree arg, int cond_first_p)
 {
   tree cond_type = cond_first_p ? TREE_TYPE (op0) : TREE_TYPE (op1);
   tree arg_type = cond_first_p ? TREE_TYPE (op1) : TREE_TYPE (op0);
   tree test, true_value, false_value;
   tree lhs = NULL_TREE;
   tree rhs = NULL_TREE;
+  enum tree_code cond_code = COND_EXPR;
 
-  if (TREE_CODE (cond) == COND_EXPR)
+  if (TREE_CODE (cond) == COND_EXPR
+  || TREE_CODE (cond) == VEC_COND_EXPR)
 {
   test = TREE_OPERAND (cond, 0);
   true_value = TREE_OPERAND (cond, 1);
   false_value = TREE_OPERAND (cond, 2);
   /* If this operand throws an expression, then it does not make
 sense to try to perform a logical or arithmetic operation
 involving it.  */
   if (VOID_TYPE_P (TREE_TYPE (true_value)))
lhs = true_value;
   if (VOID_TYPE_P (TREE_TYPE (false_value)))
rhs = false_value;
 }
   else
 {
   tree testtype = TREE_TYPE (cond);
   test = cond;
   true_value = constant_boolean_node (true, testtype);
   false_value = constant_boolean_node (false, testtype);
 }
 
+  if (TREE_CODE (TREE_TYPE (test)) == VECTOR_TYPE)
+    cond_code = VEC_COND_EXPR;
+
   /* This transformation is only worthwhile if we don't have to wrap ARG
  in a SAVE_EXPR and the operation can be simplified on at least one
  of the branches once its pushed inside the COND_EXPR.  */
   if (!TREE_CONSTANT (arg)
       && (TREE_SIDE_EFFECTS (arg)
  || TREE_CONSTANT (true_value) || TREE_CONSTANT (false_value)))
 return NULL_TREE;
 
   arg = fold_convert_loc (loc, arg_type, arg);
   if (lhs == 0)
@@ -6004,21 +6009,21 @@ fold_binary_op_with_conditional_arg (loc
   if (cond_first_p)
rhs = fold_build2_loc (loc, code, type, false_value, arg);
   else
rhs = fold_build2_loc (loc, code, type, arg, false_value);
 }
 
   /* Check that we have simplified at least one of the branches.  */
   if (!TREE_CONSTANT (arg) && !TREE_CONSTANT (lhs) && !TREE_CONSTANT (rhs))
 return NULL_TREE;
 
-  return fold_build3_loc (loc, COND_EXPR, type, test, lhs, rhs);
+  return fold_build3_loc (loc, cond_code, type, test, lhs, rhs);
 }
 
 
 /* Subroutine of fold() that checks for the addition of +/- 0.0.
 
If !NEGATE, return true if ADDEND is +/-0.0 and, for all X of type
TYPE, X + ADDEND is the same as X.  If NEGATE, return true if X -
ADDEND is the same as X.
 
X + 0 and X - 0 both give X when X is NaN, infinite, or nonzero
@@ -9864,30 +9869,34 @@ fold_binary_loc (location_t loc,
   if (TREE_CODE (arg1) == COMPOUND_EXPR
       && reorder_operands_p (arg0, TREE_OPERAND (arg1, 0)))
{
  tem = fold_build2_loc (loc, code, type, op0,
 fold_convert_loc (loc, TREE_TYPE (op1),
   TREE_OPERAND (arg1, 1)));
  return build2_loc (loc, COMPOUND_EXPR, type, TREE_OPERAND (arg1, 0),
 tem);
}
 
-  if (TREE_CODE (arg0) == COND_EXPR || COMPARISON_CLASS_P (arg0))
+  if (TREE_CODE (arg0) == COND_EXPR
+ || TREE_CODE (arg0) == VEC_COND_EXPR
+ || COMPARISON_CLASS_P (arg0))
{
  tem = fold_binary_op_with_conditional_arg (loc, code, type, op0, op1,
 arg0, arg1,
 /*cond_first_p=*/1);
  if (tem != NULL_TREE)
return tem;
}
 
-  if (TREE_CODE (arg1) == COND_EXPR || COMPARISON_CLASS_P (arg1))
+  if (TREE_CODE (arg1) == COND_EXPR
+ || TREE_CODE (arg1) == VEC_COND_EXPR
+ || COMPARISON_CLASS_P (arg1))
{
  tem = fold_binary_op_with_conditional_arg (loc, code, type, op0, op1,
 arg1, arg0,
 /*cond_first_p=*/0);
  if (tem != NULL_TREE)
return tem;
}
 }
 
   switch (code)


Re: [patch] Apply conditional down cast to cgraph.h et.al.

2012-10-30 Thread Diego Novillo

On 2012-10-29 15:01 , Lawrence Crowl wrote:

On 10/27/12, Marc Glisse marc.gli...@inria.fr wrote:

On Fri, 26 Oct 2012, Lawrence Crowl wrote:

2012-10-26  Lawrence Crowl  cr...@google.com


missing ''


Fixed.


* is-a.h: New.
(is_a <T> (U*)): New.  Test for is-a relationship.
(as_a <T> (U*)): New.  Treat as a derived type.
(dyn_cast <T> (U*)): New.  Conditionally cast based on is_a.


I can't find this file in the patch...


I forgot to svn add.  Updated patch here.


This patch implements generic type query and conversion functions,
and applies them to the use of cgraph_node, varpool_node, and symtab_node.

The functions are:

bool is_a <TYPE> (pointer)
   Tests whether the pointer actually points to a more derived TYPE.

TYPE *as_a <TYPE> (pointer)
   Converts pointer to a TYPE*.

TYPE *dyn_cast <TYPE> (pointer)
   Converts pointer to TYPE* if and only if is_a <TYPE> pointer.
   Otherwise, returns NULL.
   This function is essentially a checked down cast.

These functions reduce compile time and increase type safety when treating a
generic item as a more specific item.  In essence, the code change is from

   if (symtab_function_p (node))
 {
   struct cgraph_node *cnode = cgraph (node);
   
 }

to

   if (cgraph_node *cnode = dyn_cast <cgraph_node> (node))
 {
   
 }

The necessary conditional test defines a variable that holds a known good
pointer to the specific item and avoids subsequent conversion calls and
the assertion checks that may come with them.

When the property test is embedded within a larger condition, the variable
declaration gets pulled out of the condition.  (This leaves some room for
using the variable inappropriately.)

   if (symtab_variable_p (node)
       && varpool (node)->finalized)
 varpool_analyze_node (varpool (node));

becomes

   varpool_node *vnode = dyn_cast <varpool_node> (node);
   if (vnode && vnode->finalized)
 varpool_analyze_node (vnode);

Note that we have converted two sets of assertions in the calls to varpool
into safe and efficient use of a variable.
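
The machinery behind is_a/as_a/dyn_cast can be sketched as follows (a
minimal C++ illustration; the symbol_kind field and its test are
hypothetical stand-ins, not the actual is-a.h or symtab representation):

#include <cassert>
#include <cstddef>

struct symtab_node_def { int symbol_kind; };	/* hypothetical field */
struct cgraph_node : symtab_node_def { /* function symbol */ };

/* Each target type supplies a helper with the actual test.  */
template <typename T> struct is_a_helper;

template <>
struct is_a_helper <cgraph_node>
{
  static bool test (symtab_node_def *p) { return p->symbol_kind == 0; }
};

/* The generic entry points are thin wrappers over the helper.  */
template <typename T, typename U>
bool is_a (U *p) { return is_a_helper <T>::test (p); }

template <typename T, typename U>
T *as_a (U *p) { assert (is_a <T> (p)); return static_cast <T *> (p); }

template <typename T, typename U>
T *dyn_cast (U *p) { return is_a <T> (p) ? static_cast <T *> (p) : NULL; }

With this, dyn_cast <cgraph_node> (node) performs exactly one test and
one cast, instead of a predicate call followed by a checked conversion.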


There are remaining calls to symtab_function_p and symtab_variable_p that
do not involve a pointer to a more specific type.  These have been converted
to calls to the functions is_a <cgraph_node> and is_a <varpool_node>.  The
original predicate functions have been removed.

The cgraph.h header defined both a struct and a function with the name
varpool_node.  This name overloading can cause some unintuitive error messages
when, as is common in C++, one omits the struct keyword when using the type.
I have renamed the function to varpool_node_for_decl.

Tested on x86_64.


Okay for trunk?
Index: gcc/ChangeLog

2012-10-29  Lawrence Crowl  cr...@google.com

* is-a.h: New.
(is_a <T> (U*)): New.  Test for is-a relationship.
(as_a <T> (U*)): New.  Treat as a derived type.
(dyn_cast <T> (U*)): New.  Conditionally cast based on is_a.
* cgraph.h (varpool_node): Rename to varpool_node_for_decl.
Adjust callers to match.
(is_a_helper <cgraph_node>::test (symtab_node_def *)): New.
(is_a_helper <varpool_node>::test (symtab_node_def *)): New.
(symtab_node_def::try_function): New.  Change most calls to
symtab_function_p with calls to dyn_cast <cgraph_node> (p).
(symtab_node_def::try_variable): New.  Change most calls to
symtab_variable_p with calls to dyn_cast <varpool_node> (p).
(symtab_function_p): Remove.  Change callers to use
 is_a <cgraph_node> (p) instead.
(symtab_variable_p): Remove.  Change callers to use
 is_a <varpool_node> (p) instead.
* cgraph.c (cgraph_node_for_asm): Remove redundant call to
symtab_node_for_asm.
* cgraphunit.c (symbol_finalized_and_needed): New.
(symbol_finalized): New.
(cgraph_analyze_functions): Split complicated conditionals out into
above new functions.
* Makefile.in (CGRAPH_H): Add is-a.h as used by cgraph.h.


Thanks.  I really like this cleanup.  I have a few questions and 
comments below.  Honza, the patch looks OK to me but it touches a bunch 
of cgraph code, could you please go over it?




Index: gcc/is-a.h
===
--- gcc/is-a.h  (revision 0)
+++ gcc/is-a.h  (revision 0)
@@ -0,0 +1,118 @@
+/* Dynamic testing for abstract is-a relationships.
+   Copyright (C) 2012 Free Software Foundation, Inc.
+   Contributed by Lawrence Crowl.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have 

Re: [patch] Unify bitmap interface.

2012-10-30 Thread Richard Biener
On Tue, Oct 30, 2012 at 12:53 PM, Diego Novillo dnovi...@google.com wrote:
 On Mon, Oct 29, 2012 at 4:25 PM, Steven Bosscher stevenb@gmail.com 
 wrote:
 On Mon, Oct 29, 2012 at 7:54 PM, Diego Novillo wrote:

 Sure.  But the point is not to add more.  We should mechanically strip
 all the #if 0 code from the tree, btw.  No point keeping all that
 garbage around.

  Please no. A lot (if not most) of the #if 0 code serves as good
  documentation for why something is *not* done, other pieces are there
  to indicate possible enhancements, and some are useful for debugging.

  I never really bought into that line of reasoning.  Documenting why an
  approach was not taken is better done in words than in code that
  will grow stale wrt the surrounding code.  Similarly for possible
  enhancements.  Prose is better than code in those cases.  If debugging
  code is useful, then it can remain predicated on some debugging
  switch.

I agree with both of you - #if 0 code, if it is useful as a comment, deserves
to be rewritten as a comment.

Richard.


 Diego.


Re: Adapt one fold-const optimization for vectors

2012-10-30 Thread Richard Biener
On Tue, Oct 30, 2012 at 1:14 PM, Marc Glisse marc.gli...@inria.fr wrote:
 On Tue, 30 Oct 2012, Marek Polacek wrote:

 On Tue, Oct 30, 2012 at 08:05:13AM +0100, Marc Glisse wrote:

 Hello,

 one more optimization that needed help for vectors, it crashed on
 (x<y)<0.  Because of PR 55001, testcases are awkward to add (I could
 do an x86-only one if needed).

 bootstrap+testsuite.

 2012-10-30  Marc Glisse  marc.gli...@inria.fr

 * fold-const.c (fold_binary_op_with_conditional_arg): Handle
 vectors.
 (fold_binary_loc): Call it for VEC_COND_EXPR.


 Patch missing?


 Indeed, thanks for the notice.

Ok.

Thanks,
Richard.

 Here it is.

 --
 Marc Glisse
 Index: fold-const.c
 ===
 --- fold-const.c(revision 192976)
 +++ fold-const.c(working copy)
 @@ -5952,42 +5952,47 @@ static tree
  fold_binary_op_with_conditional_arg (location_t loc,
  enum tree_code code,
  tree type, tree op0, tree op1,
  tree cond, tree arg, int cond_first_p)
  {
tree cond_type = cond_first_p ? TREE_TYPE (op0) : TREE_TYPE (op1);
tree arg_type = cond_first_p ? TREE_TYPE (op1) : TREE_TYPE (op0);
tree test, true_value, false_value;
tree lhs = NULL_TREE;
tree rhs = NULL_TREE;
 +  enum tree_code cond_code = COND_EXPR;

 -  if (TREE_CODE (cond) == COND_EXPR)
 +  if (TREE_CODE (cond) == COND_EXPR
 +  || TREE_CODE (cond) == VEC_COND_EXPR)
  {
test = TREE_OPERAND (cond, 0);
true_value = TREE_OPERAND (cond, 1);
false_value = TREE_OPERAND (cond, 2);
/* If this operand throws an expression, then it does not make
  sense to try to perform a logical or arithmetic operation
  involving it.  */
if (VOID_TYPE_P (TREE_TYPE (true_value)))
 lhs = true_value;
if (VOID_TYPE_P (TREE_TYPE (false_value)))
 rhs = false_value;
  }
else
  {
tree testtype = TREE_TYPE (cond);
test = cond;
true_value = constant_boolean_node (true, testtype);
false_value = constant_boolean_node (false, testtype);
  }

 +  if (TREE_CODE (TREE_TYPE (test)) == VECTOR_TYPE)
 +    cond_code = VEC_COND_EXPR;
 +
/* This transformation is only worthwhile if we don't have to wrap ARG
   in a SAVE_EXPR and the operation can be simplified on at least one
   of the branches once its pushed inside the COND_EXPR.  */
if (!TREE_CONSTANT (arg)
      && (TREE_SIDE_EFFECTS (arg)
   || TREE_CONSTANT (true_value) || TREE_CONSTANT (false_value)))
  return NULL_TREE;

arg = fold_convert_loc (loc, arg_type, arg);
if (lhs == 0)
 @@ -6004,21 +6009,21 @@ fold_binary_op_with_conditional_arg (loc
if (cond_first_p)
 rhs = fold_build2_loc (loc, code, type, false_value, arg);
else
 rhs = fold_build2_loc (loc, code, type, arg, false_value);
  }

/* Check that we have simplified at least one of the branches.  */
    if (!TREE_CONSTANT (arg) && !TREE_CONSTANT (lhs) && !TREE_CONSTANT (rhs))
  return NULL_TREE;

 -  return fold_build3_loc (loc, COND_EXPR, type, test, lhs, rhs);
 +  return fold_build3_loc (loc, cond_code, type, test, lhs, rhs);
  }


  /* Subroutine of fold() that checks for the addition of +/- 0.0.

 If !NEGATE, return true if ADDEND is +/-0.0 and, for all X of type
 TYPE, X + ADDEND is the same as X.  If NEGATE, return true if X -
 ADDEND is the same as X.

 X + 0 and X - 0 both give X when X is NaN, infinite, or nonzero
 @@ -9864,30 +9869,34 @@ fold_binary_loc (location_t loc,
if (TREE_CODE (arg1) == COMPOUND_EXPR
    && reorder_operands_p (arg0, TREE_OPERAND (arg1, 0)))
 {
   tem = fold_build2_loc (loc, code, type, op0,
  fold_convert_loc (loc, TREE_TYPE (op1),
TREE_OPERAND (arg1, 1)));
   return build2_loc (loc, COMPOUND_EXPR, type, TREE_OPERAND (arg1,
 0),
  tem);
 }

 -  if (TREE_CODE (arg0) == COND_EXPR || COMPARISON_CLASS_P (arg0))
 +  if (TREE_CODE (arg0) == COND_EXPR
 + || TREE_CODE (arg0) == VEC_COND_EXPR
 + || COMPARISON_CLASS_P (arg0))
 {
   tem = fold_binary_op_with_conditional_arg (loc, code, type, op0,
 op1,
  arg0, arg1,
  /*cond_first_p=*/1);
   if (tem != NULL_TREE)
 return tem;
 }

 -  if (TREE_CODE (arg1) == COND_EXPR || COMPARISON_CLASS_P (arg1))
 +  if (TREE_CODE (arg1) == COND_EXPR
 + || TREE_CODE (arg1) == VEC_COND_EXPR
 + || COMPARISON_CLASS_P (arg1))
 {
   tem = fold_binary_op_with_conditional_arg (loc, code, type, op0,
 op1,
  arg1, 

Re: [patch] Apply conditional down cast to cgraph.h et.al.

2012-10-30 Thread Richard Biener
On Tue, Oct 30, 2012 at 1:20 PM, Diego Novillo dnovi...@google.com wrote:
 On 2012-10-29 15:01 , Lawrence Crowl wrote:

 On 10/27/12, Marc Glisse marc.gli...@inria.fr wrote:

 On Fri, 26 Oct 2012, Lawrence Crowl wrote:

 2012-10-26  Lawrence Crowl  cr...@google.com


 missing ''


 Fixed.

 * is-a.h: New.
 (is_a <T> (U*)): New.  Test for is-a relationship.
 (as_a <T> (U*)): New.  Treat as a derived type.
 (dyn_cast <T> (U*)): New.  Conditionally cast based on is_a.


 I can't find this file in the patch...


 I forgot to svn add.  Updated patch here.


 This patch implements generic type query and conversion functions,
 and applies them to the use of cgraph_node, varpool_node, and symtab_node.

 The functions are:

 bool is_a <TYPE> (pointer)
Tests whether the pointer actually points to a more derived TYPE.

 TYPE *as_a <TYPE> (pointer)
Converts pointer to a TYPE*.

 TYPE *dyn_cast <TYPE> (pointer)
Converts pointer to TYPE* if and only if is_a <TYPE> pointer.
Otherwise, returns NULL.
This function is essentially a checked down cast.

 These functions reduce compile time and increase type safety when treating
 a
 generic item as a more specific item.  In essence, the code change is from

if (symtab_function_p (node))
  {
struct cgraph_node *cnode = cgraph (node);

  }

 to

    if (cgraph_node *cnode = dyn_cast <cgraph_node> (node))
  {

  }

 The necessary conditional test defines a variable that holds a known good
 pointer to the specific item and avoids subsequent conversion calls and
 the assertion checks that may come with them.

 When the property test is embedded within a larger condition, the
 variable
 declaration gets pulled out of the condition.  (This leaves some room for
 using the variable inappropriately.)

    if (symtab_variable_p (node)
        && varpool (node)->finalized)
  varpool_analyze_node (varpool (node));

 becomes

    varpool_node *vnode = dyn_cast <varpool_node> (node);
    if (vnode && vnode->finalized)
  varpool_analyze_node (vnode);

 Note that we have converted two sets of assertions in the calls to varpool
 into safe and efficient use of a variable.


 There are remaining calls to symtab_function_p and symtab_variable_p that
 do not involve a pointer to a more specific type.  These have been
 converted
 to calls to the functions is_a <cgraph_node> and is_a <varpool_node>.  The
 original predicate functions have been removed.

 The cgraph.h header defined both a struct and a function with the name
 varpool_node.  This name overloading can cause some unintuitive error
 messages
 when, as is common in C++, one omits the struct keyword when using the
 type.
 I have renamed the function to varpool_node_for_decl.

 Tested on x86_64.


 Okay for trunk?
 Index: gcc/ChangeLog

 2012-10-29  Lawrence Crowl  cr...@google.com

 * is-a.h: New.
 (is_a <T> (U*)): New.  Test for is-a relationship.
 (as_a <T> (U*)): New.  Treat as a derived type.
 (dyn_cast <T> (U*)): New.  Conditionally cast based on is_a.
 * cgraph.h (varpool_node): Rename to varpool_node_for_decl.
 Adjust callers to match.
 (is_a_helper <cgraph_node>::test (symtab_node_def *)): New.
 (is_a_helper <varpool_node>::test (symtab_node_def *)): New.
 (symtab_node_def::try_function): New.  Change most calls to
 symtab_function_p with calls to dyn_cast <cgraph_node> (p).
 (symtab_node_def::try_variable): New.  Change most calls to
 symtab_variable_p with calls to dyn_cast <varpool_node> (p).
 (symtab_function_p): Remove.  Change callers to use
  is_a <cgraph_node> (p) instead.
 (symtab_variable_p): Remove.  Change callers to use
  is_a <varpool_node> (p) instead.
 * cgraph.c (cgraph_node_for_asm): Remove redundant call to
 symtab_node_for_asm.
 * cgraphunit.c (symbol_finalized_and_needed): New.
 (symbol_finalized): New.
 (cgraph_analyze_functions): Split complicated conditionals out
 into
 above new functions.
 * Makefile.in (CGRAPH_H): Add is-a.h as used by cgraph.h.


 Thanks.  I really like this cleanup.  I have a few questions and comments
 below.  Honza, the patch looks OK to me but it touches a bunch of cgraph
 code, could you please go over it?



 Index: gcc/is-a.h
 ===
 --- gcc/is-a.h  (revision 0)
 +++ gcc/is-a.h  (revision 0)
 @@ -0,0 +1,118 @@
 +/* Dynamic testing for abstract is-a relationships.
 +   Copyright (C) 2012 Free Software Foundation, Inc.
 +   Contributed by Lawrence Crowl.
 +
 +This file is part of GCC.
 +
 +GCC is free software; you can redistribute it and/or modify it under
 +the terms of the GNU General Public License as published by the Free
 +Software Foundation; either version 3, or (at your option) any later
 +version.
 +
 +GCC is distributed in the hope that it will be useful, but WITHOUT 

[PATCH] Add gimple load/store predicates, use them from stmt estimates

2012-10-30 Thread Richard Biener

As requested, this adds predicates to check whether the lhs of
an assign or call is a store and whether rhs1 of an assignment
is a load.  It uses this in place of the existing, slightly
bogus, check in the stmt estimate code.
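
As a hedged illustration of what the two predicates classify (C
statements with the would-be GIMPLE classification in comments; this
is not actual gimple.h code):

int g;			/* file-scope, so accesses to it are memory ops */

void
f (int *p)
{
  int t = *p;	/* t_1 = *p_2(D);  gimple_assign_load_p: rhs1 is a load */
  g = t;	/* g = t_1;        gimple_store_p: lhs is not a register */
  t = t + 1;	/* t_3 = t_1 + 1;  neither: register-only operation */
  (void) t;
}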

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2012-10-30  Richard Biener  rguent...@suse.de

* gimple.h (gimple_store_p): New predicate.
(gimple_assign_load_p): Likewise.
* tree-inline.c (estimate_num_insns): Use it.

Index: gcc/gimple.h
===
*** gcc/gimple.h(revision 192984)
--- gcc/gimple.h(working copy)
*** gimple_assign_single_p (gimple gs)
*** 2041,2046 
--- 2041,2071 
	  && gimple_assign_rhs_class (gs) == GIMPLE_SINGLE_RHS);
  }
  
+ /* Return true if GS performs a store to its lhs.  */
+ 
+ static inline bool
+ gimple_store_p (gimple gs)
+ {
+   tree lhs = gimple_get_lhs (gs);
+   return lhs && !is_gimple_reg (lhs);
+ }
+ 
+ /* Return true if GS is an assignment that loads from its rhs1.  */
+ 
+ static inline bool
+ gimple_assign_load_p (gimple gs)
+ {
+   tree rhs;
+   if (!gimple_assign_single_p (gs))
+ return false;
+   rhs = gimple_assign_rhs1 (gs);
+   if (TREE_CODE (rhs) == WITH_SIZE_EXPR)
+ return true;
+   rhs = get_base_address (rhs);
+   return (DECL_P (rhs)
+ || TREE_CODE (rhs) == MEM_REF || TREE_CODE (rhs) == TARGET_MEM_REF);
+ }
+ 
  
  /* Return true if S is a type-cast assignment.  */
  
Index: gcc/tree-inline.c
===
*** gcc/tree-inline.c   (revision 192984)
--- gcc/tree-inline.c   (working copy)
*** estimate_num_insns (gimple stmt, eni_wei
*** 3512,3523 
lhs = gimple_assign_lhs (stmt);
rhs = gimple_assign_rhs1 (stmt);
  
!   if (is_gimple_reg (lhs))
!   cost = 0;
!   else
!   cost = estimate_move_cost (TREE_TYPE (lhs));
  
!   if (!is_gimple_reg (rhs) && !is_gimple_min_invariant (rhs))
cost += estimate_move_cost (TREE_TYPE (rhs));
  
cost += estimate_operator_cost (gimple_assign_rhs_code (stmt), weights,
--- 3512,3523 
lhs = gimple_assign_lhs (stmt);
rhs = gimple_assign_rhs1 (stmt);
  
!   cost = 0;
  
!   /* Account for the cost of moving to / from memory.  */
!   if (gimple_store_p (stmt))
!   cost += estimate_move_cost (TREE_TYPE (lhs));
!   if (gimple_assign_load_p (stmt))
cost += estimate_move_cost (TREE_TYPE (rhs));
  
cost += estimate_operator_cost (gimple_assign_rhs_code (stmt), weights,


RE: [PATCH] [AArch64] Add vcond, vcondu support.

2012-10-30 Thread James Greenhalgh
 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Marcus Shawcroft
 Sent: 15 October 2012 12:37
 To: gcc-patches@gcc.gnu.org
 Subject: Re: [PATCH] [AArch64] Add vcond, vcondu support.
 
 On 09/10/12 12:08, James Greenhalgh wrote:
 
  Hi,
 
  This patch adds support for vcond and vcondu to the AArch64
  backend.
 
  Tested with no regressions on aarch64-none-elf.
 
  OK for aarch64-branch?
 
  (If so, someone will have to commit for me, as I do not
  have commit rights.)
 
  Thanks
  James Greenhalgh
 
  ---
  2012-09-11  James Greenhalgh  james.greenha...@arm.com
  Tejas Belagod  tejas.bela...@arm.com
 
  * config/aarch64/aarch64-simd.md
  (aarch64_simd_bsl<mode>_internal): New pattern.
  (aarch64_simd_bsl<mode>): Likewise.
  (aarch64_vcond_internal<mode>): Likewise.
  (vcondu<mode><mode>): Likewise.
  (vcond<mode><mode>): Likewise.
  * config/aarch64/iterators.md (UNSPEC_BSL): Add to
 define_constants.
 
 OK
 /Marcus
 

Hi,

Committed as revision 192985.

Thanks,
James Greenhalgh


Re: [patch] Apply conditional down cast to cgraph.h et.al.

2012-10-30 Thread Jan Hubicka
 2012-10-29  Lawrence Crowl  cr...@google.com
 
   * is-a.h: New.
   (is_a <T> (U*)): New.  Test for is-a relationship.
   (as_a <T> (U*)): New.  Treat as a derived type.
   (dyn_cast <T> (U*)): New.  Conditionally cast based on is_a.
   * cgraph.h (varpool_node): Rename to varpool_node_for_decl.
   Adjust callers to match.
   (is_a_helper <cgraph_node>::test (symtab_node_def *)): New.
   (is_a_helper <varpool_node>::test (symtab_node_def *)): New.
   (symtab_node_def::try_function): New.  Change most calls to
   symtab_function_p with calls to dyn_cast <cgraph_node> (p).
   (symtab_node_def::try_variable): New.  Change most calls to
   symtab_variable_p with calls to dyn_cast <varpool_node> (p).
   (symtab_function_p): Remove.  Change callers to use
 is_a <cgraph_node> (p) instead.
   (symtab_variable_p): Remove.  Change callers to use
 is_a <varpool_node> (p) instead.
   * cgraph.c (cgraph_node_for_asm): Remove redundant call to
   symtab_node_for_asm.
   * cgraphunit.c (symbol_finalized_and_needed): New.
   (symbol_finalized): New.
   (cgraph_analyze_functions): Split complicated conditionals out into
   above new functions.
   * Makefile.in (CGRAPH_H): Add is-a.h as used by cgraph.h.
 

the patch is OK,
thanks!
Honza


Re: RFA: patch to fix PR55116

2012-10-30 Thread H.J. Lu
On Tue, Oct 30, 2012 at 4:38 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Tue, Oct 30, 2012 at 4:09 AM, Richard Sandiford
 rdsandif...@googlemail.com wrote:
 H.J. Lu hjl.to...@gmail.com writes:
 On Mon, Oct 29, 2012 at 5:11 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, Oct 29, 2012 at 4:41 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, Oct 29, 2012 at 9:38 AM, Vladimir Makarov
 vmaka...@redhat.com wrote:
 On 12-10-29 12:21 PM, Richard Sandiford wrote:

 Vladimir Makarov vmaka...@redhat.com writes:

 H.J. in

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55116

 reported an interesting address

 (and:DI (subreg:DI (plus:SI (ashift:SI (reg:SI 96 [ glob_vol_int.22 ])
   (const_int 2 [0x2]))
   (symbol_ref:SI (glob_vol_int_arr) var_decl
 0x703c2720 glob_vol_int_arr)) 0)
   (const_int 4294967295 [0xffffffff]))

 which can not be correctly extracted.  Here `and' with `subreg'
 behaves as an address mutation.

 The following patch fixes the problem.

 Ok to commit, Richard?

 Heh, I wondered if subregs might still be used like that, and was almost
 tempted to add them just in case.

 I think this particular case is really a failed canonicalisation and 
 that:

 (and:DI (subreg:DI (foo:SI ...) 0) (const_int 0xffffffff))

 ought to be:

 (zero_extend:DI (foo:SI ...))

 Yes, that was my thought too.

 But I know I've approved MIPS patches to accept
 (and:DI ... (const_int 0xffffffff)) as an alternative.

 Index: rtlanal.c
 ===
 --- rtlanal.c   (revision 192942)
 +++ rtlanal.c   (working copy)
 @@ -5459,6 +5459,11 @@ strip_address_mutations (rtx *loc, enum
 else if (code == AND && CONST_INT_P (XEXP (*loc, 1)))
   /* (and ... (const_int -X)) is used to align to X bytes.  */
   loc = XEXP (*loc, 0);
 +  else if (code == SUBREG
 +   && ! REG_P (XEXP (*loc, 0)) && ! MEM_P (XEXP (*loc, 0)))
 +   /* (subreg (operator ...) ...) usually inside and is used for
 +  mode conversion too.  */
 +   loc = XEXP (*loc, 0);

 I think the condition should be:

else if (code == SUBREG
  && !OBJECT_P (SUBREG_REG (*loc))
  && subreg_lowpart (*loc))

 OK with that change, if it works.

 Yes, it works.
 I've submitted the following patch.


 It doesn't work right.  I will create a new testcase.




 This patch limits SUBREG to Pmode.  Tested on Linux/x86-64.
 OK to install?

 Thanks.

 The address in this case is:

 (plus:SI (mult:SI (reg/v:SI 223 [orig:154 j ] [154])
 (const_int 8 [0x8]))
 (subreg:SI (plus:DI (reg/f:DI 20 frame)
 (const_int 32 [0x20])) 0))

 which after Uros's subreg simplification patch shouldn't be allowed:
 the subreg ought to be on the frame register rather than the plus.

 The attached patch seems to work for the testcase.  Does it work
 more generally?

 Richard


 gcc/
 * lra-eliminations.c (lra_eliminate_regs_1): Use simplify_gen_subreg
 rather than gen_rtx_SUBREG.

 Index: gcc/lra-eliminations.c
 ===
 --- gcc/lra-eliminations.c  (revision 192983)
 +++ gcc/lra-eliminations.c  (working copy)
 @@ -550,7 +550,8 @@
   return x;
 }
   else
 -   return gen_rtx_SUBREG (GET_MODE (x), new_rtx, SUBREG_BYTE (x));
 +   return simplify_gen_subreg (GET_MODE (x), new_rtx,
 +   GET_MODE (new_rtx), SUBREG_BYTE (x));
 }

return x;

 I am running the full test.


It works.  Can you check in your patch?  I will check in
my testcase.

Thanks.


-- 
H.J.


Re: RFA: patch to fix PR55116

2012-10-30 Thread Richard Sandiford
H.J. Lu hjl.to...@gmail.com writes:
 On Tue, Oct 30, 2012 at 4:38 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Tue, Oct 30, 2012 at 4:09 AM, Richard Sandiford
 rdsandif...@googlemail.com wrote:
 The address in this case is:

 (plus:SI (mult:SI (reg/v:SI 223 [orig:154 j ] [154])
 (const_int 8 [0x8]))
 (subreg:SI (plus:DI (reg/f:DI 20 frame)
 (const_int 32 [0x20])) 0))

 which after Uros's subreg simplification patch shouldn't be allowed:
 the subreg ought to be on the frame register rather than the plus.

 The attached patch seems to work for the testcase.  Does it work
 more generally?

 Richard


 gcc/
 * lra-eliminations.c (lra_eliminate_regs_1): Use simplify_gen_subreg
 rather than gen_rtx_SUBREG.

 Index: gcc/lra-eliminations.c
 ===
 --- gcc/lra-eliminations.c  (revision 192983)
 +++ gcc/lra-eliminations.c  (working copy)
 @@ -550,7 +550,8 @@
   return x;
 }
   else
 -   return gen_rtx_SUBREG (GET_MODE (x), new_rtx, SUBREG_BYTE (x));
 +   return simplify_gen_subreg (GET_MODE (x), new_rtx,
 +   GET_MODE (new_rtx), SUBREG_BYTE 
 (x));
 }

return x;

 I am running the full test.


 It works.  Can you check in your patch?  I will check in
 my testcase.

Thanks.  Vlad, is the patch OK?

Richard


Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

2012-10-30 Thread Diego Novillo
On Mon, Oct 29, 2012 at 1:56 PM, Jakub Jelinek ja...@redhat.com wrote:
 Status
 ==

 I'd like to close the stage 1 phase of GCC 4.8 development
 on Monday, November 5th.  If you have still patches for new features you'd
 like to see in GCC 4.8, please post them for review soon.  Patches
 posted before the freeze, but reviewed shortly after the freeze, may
 still go in, further changes should be just bugfixes and documentation
 fixes.

I will be committing the VEC overhaul soon.  With any luck this week,
but PCH and gengtype are giving me a lot of grief.


Diego.


Re: [Patch] Potential fix for PR55033

2012-10-30 Thread Sebastian Huber

On 10/26/2012 02:22 PM, Sebastian Huber wrote:

Hello,

here is a test case for PR55033.



Is there something wrong with this test case?  It compiles well with Alan's 
patch.

--
Sebastian Huber, embedded brains GmbH

Address : Obere Lagerstr. 30, D-82178 Puchheim, Germany
Phone   : +49 89 18 90 80 79-6
Fax : +49 89 18 90 80 79-9
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

This message is not a business communication within the meaning of the EHUG.




Re: [Patch] Potential fix for PR55033

2012-10-30 Thread Sebastian Huber

Hello,

what needs to be done to get this committed?

--
Sebastian Huber, embedded brains GmbH

Address : Obere Lagerstr. 30, D-82178 Puchheim, Germany
Phone   : +49 89 18 90 80 79-6
Fax : +49 89 18 90 80 79-9
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

This message is not a business communication within the meaning of the EHUG.




Re: [PATCH] Add gimple load/store predicates, use them from stmt estimates

2012-10-30 Thread Richard Biener
On Tue, 30 Oct 2012, Richard Biener wrote:

 
 As requested, this adds predicates to check whether the lhs of
 an assign or call is a store and whether rhs1 of an assignment
 is a load.  It uses this in place of the existing, slightly
 bogus, check in the stmt estimate code.
 
 Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Committed with the following adjustment, since we would now inline foo
as we no longer account memory move costs for stmts like

  pInput_9 = MEM[(void *)pInput_1 + 8B];

adding noinline looks reasonable anyway (we'll fix the above
to cost something again with a followup).

Richard.

Index: gcc/testsuite/gcc.dg/vect/slp-perm-2.c
===
*** gcc/testsuite/gcc.dg/vect/slp-perm-2.c  (revision 192984)
--- gcc/testsuite/gcc.dg/vect/slp-perm-2.c  (working copy)
***
*** 12,18 

  #define N 16

! void foo (unsigned int *__restrict__ pInput, unsigned int *__restrict__ 
pOutput)
  {
unsigned int i, a, b;

--- 12,19 

  #define N 16

! void __attribute__((noinline))
! foo (unsigned int *__restrict__ pInput, unsigned int *__restrict__ 
pOutput)
  {
unsigned int i, a, b;




Re: [PATCH] Fix debug info for expr and jump stmt

2012-10-30 Thread Dehao Chen
 And tree expressions don't have TREE_BLOCK before gimple-low either.
 So IMNSHO it is gimple-low.c that should set TREE_BLOCK of all the gimple
 stmts as well as all expression in the operands.  It is not overwriting
 anything, no frontend sets TREE_BLOCK for any expression, the way frontends
 associate IL with BLOCKs is by putting them inside of BIND_EXPR/GIMPLE_BIND
 after gimplification, and it is gimple-low responsibility to set it.

 In 4.3 before tuples, it was solely gimple-low that set TREE_BLOCK
 initially.  Before the location_t changes, again it was gimple-low that
 was the first setter of TREE_BLOCK, which was valid for all
 IS_EXPR_CODE_CLASS.  So, IMNSHO gimple-low should merge location_t
 with block for all gimple stmts and all tree expressions used in its
 operands.  It shouldn't be set on trees that can be shared, so
 say decls etc. should keep using just location_t's without associated block.
 So perhaps the right test for gimple-low setting of block is
 CAN_HAVE_LOCATION_P (exp) && !tree_node_can_be_shared (exp).

 Jakub

I kind of like this idea.  What do you guys think?

Thanks,
Dehao


Re: RFA: patch to fix PR55116

2012-10-30 Thread Vladimir Makarov

On 10/30/2012 09:39 AM, Richard Sandiford wrote:

H.J. Lu hjl.to...@gmail.com writes:

On Tue, Oct 30, 2012 at 4:38 AM, H.J. Lu hjl.to...@gmail.com wrote:

On Tue, Oct 30, 2012 at 4:09 AM, Richard Sandiford
rdsandif...@googlemail.com wrote:

The address in this case is:

(plus:SI (mult:SI (reg/v:SI 223 [orig:154 j ] [154])
 (const_int 8 [0x8]))
 (subreg:SI (plus:DI (reg/f:DI 20 frame)
 (const_int 32 [0x20])) 0))

which after Uros's subreg simplification patch shouldn't be allowed:
the subreg ought to be on the frame register rather than the plus.

The attached patch seems to work for the testcase.  Does it work
more generally?

Richard


gcc/
 * lra-eliminations.c (lra_eliminate_regs_1): Use simplify_gen_subreg
 rather than gen_rtx_SUBREG.

Index: gcc/lra-eliminations.c
===
--- gcc/lra-eliminations.c  (revision 192983)
+++ gcc/lra-eliminations.c  (working copy)
@@ -550,7 +550,8 @@
   return x;
 }
   else
-   return gen_rtx_SUBREG (GET_MODE (x), new_rtx, SUBREG_BYTE (x));
+   return simplify_gen_subreg (GET_MODE (x), new_rtx,
+   GET_MODE (new_rtx), SUBREG_BYTE (x));
 }

return x;

I am running the full test.


It works.  Can you check in your patch?  I will check in
my testcase.

Thanks.  Vlad, is the patch OK?


Sure.  Thanks.



Re: [PATCH] Fix debug info for expr and jump stmt

2012-10-30 Thread Richard Biener
On Tue, Oct 30, 2012 at 3:17 PM, Dehao Chen de...@google.com wrote:
 And tree expressions don't have TREE_BLOCK before gimple-low either.
 So IMNSHO it is gimple-low.c that should set TREE_BLOCK of all the gimple
 stmts as well as all expression in the operands.  It is not overwriting
 anything, no frontend sets TREE_BLOCK for any expression, the way frontends
 associate IL with BLOCKs is by putting them inside of BIND_EXPR/GIMPLE_BIND
 after gimplification, and it is gimple-low responsibility to set it.

 In 4.3 before tuples, it was solely gimple-low that set TREE_BLOCK
 initially.  Before the location_t changes, again it was gimple-low that
 was the first setter of TREE_BLOCK, which was valid for all
 IS_EXPR_CODE_CLASS.  So, IMNSHO gimple-low should merge location_t
 with block for all gimple stmts and all tree expressions used in its
 operands.  It shouldn't be set on trees that can be shared, so
 say decls etc. should keep using just location_t's without associated block.
 So perhaps the right test for gimple-low setting of block is
  CAN_HAVE_LOCATION_P (exp) && !tree_node_can_be_shared (exp).

 Jakub

 I Kind of like this idea. What do you guys think?

I question the need of BLOCK info on expression trees.  If BLOCKs are
relevant then the tree ends up referencing a declaration with a BLOCK as
context, no?  Thus, the case

  int tem, a;
  {
int a;
...
tem = a;
  }
  int b = tem + 5;

where we may end up with gimple like

  b = a + 5;

thus mixing two BLOCKs inside a stmt (and no expression trees to
attach different BLOCKs) is no different from the case where we end
up with expression trees.

Thus my original question - why isn't a NULL BLOCK treated the same
as UNKNOWN_LOCATION?  Or rather, _why_ does Micha's patch not work?
Did you analyze the guality fails?

Thanks,
Richard.

 Thanks,
 Dehao


Re: [PATCH] Replace const_vector with match_operand in sse.md

2012-10-30 Thread Andrey Turetskiy
I changed the patch according to Uros' remarks. Please have a look.

Changelog:

2012-10-30  Andrey Turetskiy  andrey.turets...@gmail.com

   * config/i386/i386.c (bdesc_args): Rename CODE_FOR_avx2_umulhrswv16hi3 to
   CODE_FOR_avx2_pmulhrswv16hi3.
   * config/i386/predicates.md (const1_operand): Extend for vectors.
   * config/i386/sse.md (ssse3_avx2): Extend.
   (ssedoublemode): Ditto.
   (<sse2_avx2>_uavg<mode>3): Merge avx2_uavgv32qi3, sse2_uavgv16qi3,
   avx2_uavgv16hi3 and sse2_uavgv8hi3 into one.
   (*<sse2_avx2>_uavg<mode>3): Merge *avx2_uavgv32qi3, *sse2_uavgv16qi3,
   *avx2_uavgv16hi3 and *sse2_uavgv8hi3 into one.
   (PMULHRSW): New.
   (<ssse3_avx2>_pmulhrsw<mode>3): Merge avx2_umulhrswv16hi3,
   ssse3_pmulhrswv8hi3 and ssse3_pmulhrswv4hi3 into one.
   (*avx2_pmulhrswv16hi3): Replace const_vector with match_operand.
   (*ssse3_pmulhrswv8hi3): Ditto.
   (*ssse3_pmulhrswv4hi3): Ditto.

---
Best regards,
Andrey Turetskiy

On Wed, Oct 24, 2012 at 5:36 PM, Uros Bizjak ubiz...@gmail.com wrote:
 On Wed, Oct 24, 2012 at 3:01 PM, Andrey Turetskiy
 andrey.turets...@gmail.com wrote:

 On Tue, Oct 23, 2012 at 2:45 PM, Andrey Turetskiy
 andrey.turets...@gmail.com wrote:
 Hi,

 This patch replaces large const_vector constructions with
 match_operand in sse.md to decrease its size.
 Is it ok?

 No, you don't have to touch generic expand machinery.

 In the expander, use (match_dup X), and declare operands[X] =
 CONST1_RTX (..some_mode..) in the preparation statement. In unnamed
 patterns, use const1_operand operand predicate. You should extend
 existing const1_operand in the same way as const0_operand.

 This approach is not compatible with named insn patterns, which
 duplicate its functionality as pattern matcher and as an expander.

 Uros.


const_vector_replace.patch
Description: Binary data


Re: [PATCH] Fix debug info for expr and jump stmt

2012-10-30 Thread Michael Matz
Hi,

On Tue, 30 Oct 2012, Richard Biener wrote:

 On Tue, Oct 30, 2012 at 3:17 PM, Dehao Chen de...@google.com wrote:
  And tree expressions don't have TREE_BLOCK before gimple-low either.
  So IMNSHO it is gimple-low.c that should set TREE_BLOCK of all the gimple
  stmts as well as all expression in the operands.

That would be a step away from the ideal situation, so we should rather
first analyze why the testcase fails with my patch.  I expected some
fallout and am actually surprised it's only one testcase :)

What we should end up with in the ideal world is that we simply have no 
expressions in gimple (and hence no place to store any locations for 
them), except via gimple statements.

 I question the need of BLOCK info on expression trees.  If BLOCKs are 
 relevant then the tree ends up referencing a declaration with a BLOCK as 
 context, no?  Thus, the case
 
   int tem, a;
   {
 int a;
 ...
 tem = a;
   }
   int b = tem + 5;
 
 where we may end up with gimple like
 
   b = a + 5;
 
 thus mixing two BLOCKs inside a stmt (and no expression trees to
 attach different BLOCKs) is no different from the case where we end
 up with expression trees.
 
 Thus my original question - why isn't a NULL BLOCK treated the same
 as UNKNOWN_LOCATION?

Since the merging of location and block, a null BLOCK doesn't imply no
location.  It can very well have a location without a block.  What we might
want to imply is that a null BLOCK implies the BLOCK from the statement.
But as you say, first we should find out why my patch breaks the one testcase.

 Or rather, _why_ does Michas patch not work? Did you analyze the guality 
 fails?


Ciao,
Michael.


Re: [PR54693] loss of debug info in jump threading and loop ivopts

2012-10-30 Thread Richard Biener
On Fri, Oct 26, 2012 at 8:30 AM, Alexandre Oliva aol...@redhat.com wrote:
 Both jump threading and loop induction variable optimizations were
 dropping useful debug information, and it took improvements in both for
 debug info about relevant variables in the enclosed testcase to survive
 all the way to the end.

 The first problem was that jump threading could bypass blocks containing
 debug stmts, losing the required bindings.  The solution was to
 propagate bypassed debug insns into the target of the bypass.  Even if
 the new confluence ends up introducing new PHI nodes, the propagated
 debug binds will resolve to them, just as intended.  This is implemented
 in the 4th patch below: vta-jump-thread-prop-debug-pr54693.patch

 The second part of the problem was that, when performing loop ivopts,
 we'd often drop PHI nodes and other SSA names for unused ivs.  Although
 we had code to propagate SSA DEFs into debug uses upon their removal,
 this couldn't take care of PHI nodes (no debug phis or conditional debug
 binds), and it was precisely at the removal of a PHI node that we
 dropped debug info for the loop in the provided testcase.  Once Jakub
 figured out how to compute an unused iv out of available ivs (thanks!),
 it was easy to introduce debug temps with the expressions to compute
 them, so that debug binds would remain correct as long as the unused iv
 can still be computed out of the others.  (If it can't, we'll still try
 propagation, but may end up losing at the PHI nodes).  I had thought
 that replacing only the PHI nodes would suffice, but it turned out that
 replacing all unused iv defs with their equivalent used-IV expressions
 got us better coverage, so this is what the 5th patch below does:
 vta-ivopts-replace-dropped-phis-pr54693.patch

+ if (count > 1)
+   {
+ tree vexpr = make_node (DEBUG_EXPR_DECL);
+ DECL_ARTIFICIAL (vexpr) = 1;
+ TREE_TYPE (vexpr) = TREE_TYPE (comp);
+ if (SSA_NAME_VAR (def))
+   DECL_MODE (vexpr) = DECL_MODE (SSA_NAME_VAR (def));
+ else
+   DECL_MODE (vexpr) = TYPE_MODE (TREE_TYPE (vexpr));

simply always use the TREE_TYPE path.  TYPE_MODE is always valid
for SSA_NAMEs.

+ FOR_EACH_IMM_USE_STMT (stmt, imm_iter, def)
+   {
+ if (!gimple_debug_bind_p (stmt))
+   continue;
+
+ FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
+   SET_USE (use_p, comp);
+
+ if (!comp)
+   BREAK_FROM_IMM_USE_STMT (imm_iter);

how does comp magically become NULL_TREE here?

Btw, what's all the fuss with IV candidates, etc?  At least for non-PHIs
I don't see why the regular release_ssa_name way of doing things does
not work.  IVOPTs is slow enough ...

Richard.



 Regression testing revealed -fcompare-debug regressions exposed by these
 patches.

 x86-specific code introduces pre-reload scheduling dependencies between
 calls and likely-spilled parameter registers, but it does so in a way
 that's AFAICT buggy, and fragile to the presence of debug insns at the
 top of a block: we won't introduce a dependency for the first insn of
 the block, even if we'd rather have such a dependency.  This fails to
 achieve the intended effect, and it also causes codegen differences when
 the first insn in the block happens to be a debug insn, for then we will
 add the intended dependency.  The first patch below,
 vta-stabilize-i386-call-args-sched-pr54693.patch, skips leading debug
 insns, so as to remove the difference, and moves the end of the backward
 scan to the insn before the first actual insn, so that we don't refrain
 from considering it for dependencies.  This in turn required an
 additional test to make sure we wouldn't go past the nondebug head if
 first_arg happened to be the head.

 The introduction of debug insns at new spots also exposed a bug in loop
 unrolling: we'd unroll a loop a different number of times depending on
 whether or not its latch is an empty block.  The propagation or
 introduction of debug insns in previously-empty latch blocks caused
 loops to be unrolled a different number of times depending on the
 presence of the debug insns, which triggered -fcompare-debug errors.
 The fix was to count only nondebug insns towards the decision on whether
 the latch block was empty.  This is implemented in the second patch
 below: vta-stabilize-loop-unroll-empty-latch-check-pr54693.patch

 Guality testsuite regressions given the patches above revealed that the
 fast DCE global dead debug substitution introduced for PR54551 was not
 correct: it was possible for us to visit, for the last time, a block
 with a REG used in debug stmts after its death before we visited the
 block with the debug use.  As a result, we'd fail to emit a debug temp
 at the not-revisited block, and the debug temp bindings introduced at
 other blocks might be insufficient 

Non-dominating loop bounds in tree-ssa-loop-niter 1/4

2012-10-30 Thread Jan Hubicka
Hi,
this is the first patch of a change to tree-ssa-loop-niter to consider bounds
that are not in a block dominating the latch.  This patch makes such bounds
be recorded, but they are not yet used.  I plan to follow up with:

1) patch to add a simple shortest path walk at the end of 
estimate_numbers_of_iterations_loop
   determining the bound of e.g.
   int a[10];
   int b[10];
   for (i=0;i<n;i++)
 if (t())
   q(a[i]);
 else
   q(a[i]);
2) make complete loop unrolling kill all statements known to not be executed
   in the last iteration by __builtin_unreachable to silence part of 
-Warray-bounds
   warnings currently breaking bootstrap with -O3
3) make duplicate_loop_to_header_edge in peeling mode do the same, silencing
   the rest of the warnings
4) make branch prediction code drop the prediction on exits not dominating
   the latch.

Bootstrapped/regtested x86_64-linux, OK?

* tree-ssa-loop-niter.c (number_of_iterations_exit): New parameter
EVERY_ITERATION with implicit value of true.
(record_estimate): Check dominance relationship of the basic block
we are estimating on instead of relying on UPPER to be false.
(struct ilb_data): Drop RELIABLE.
(idx_infer_loop_bounds): Update.
(infer_loop_bounds_from_ref): Drop parameter RELIABLE.
(infer_loop_bounds_from_array): Drop parameter RELIABLE.
(infer_loop_bounds_from_undefined): Update comments and handling
of RELIABLE.
(estimate_numbers_of_iterations_loop): Record all bounds.
Index: tree-ssa-loop-niter.c
===
--- tree-ssa-loop-niter.c   (revision 192986)
+++ tree-ssa-loop-niter.c   (working copy)
@@ -1793,12 +1793,15 @@ loop_only_exit_p (const struct loop *loo
meaning described in comments at struct tree_niter_desc
declaration), false otherwise.  If WARN is true and
-Wunsafe-loop-optimizations was given, warn if the optimizer is going to use
-   potentially unsafe assumptions.  */
+   potentially unsafe assumptions.  
+   When EVERY_ITERATION is true, only tests that are known to be executed
+   every iteration are considered (i.e. only test that alone bounds the loop). 
+ */
 
 bool
 number_of_iterations_exit (struct loop *loop, edge exit,
   struct tree_niter_desc *niter,
-  bool warn)
+  bool warn, bool every_iteration)
 {
   gimple stmt;
   tree type;
@@ -1806,7 +1809,8 @@ number_of_iterations_exit (struct loop *
   enum tree_code code;
   affine_iv iv0, iv1;
 
-  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, exit->src))
+  if (every_iteration
+      && !dominated_by_p (CDI_DOMINATORS, loop->latch, exit->src))
 return false;
 
   niter->assumptions = boolean_false_node;
@@ -2568,6 +2572,11 @@ record_estimate (struct loop *loop, tree
   loop->bounds = elt;
 }
 
+  /* If statement is executed on every path to the loop latch, we can directly
+ infer the upper bound on the # of iterations of the loop.  */
+  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (at_stmt)))
+return;
+
   /* Update the number of iteration estimates according to the bound.
  If at_stmt is an exit then the loop latch is executed at most BOUND times,
  otherwise it can be executed BOUND + 1 times.  We will lower the estimate
@@ -2651,7 +2660,6 @@ struct ilb_data
 {
   struct loop *loop;
   gimple stmt;
-  bool reliable;
 };
 
 static bool
@@ -2660,7 +2668,7 @@ idx_infer_loop_bounds (tree base, tree *
   struct ilb_data *data = (struct ilb_data *) dta;
   tree ev, init, step;
   tree low, high, type, next;
-  bool sign, upper = data->reliable, at_end = false;
+  bool sign, upper = true, at_end = false;
   struct loop *loop = data->loop;
 
   if (TREE_CODE (base) != ARRAY_REF)
@@ -2737,14 +2745,12 @@ idx_infer_loop_bounds (tree base, tree *
STMT is guaranteed to be executed in every iteration of LOOP.*/
 
 static void
-infer_loop_bounds_from_ref (struct loop *loop, gimple stmt, tree ref,
-   bool reliable)
+infer_loop_bounds_from_ref (struct loop *loop, gimple stmt, tree ref)
 {
   struct ilb_data data;
 
   data.loop = loop;
   data.stmt = stmt;
-  data.reliable = reliable;
   for_each_index (ref, idx_infer_loop_bounds, data);
 }
 
@@ -2753,7 +2759,7 @@ infer_loop_bounds_from_ref (struct loop 
executed in every iteration of LOOP.  */
 
 static void
-infer_loop_bounds_from_array (struct loop *loop, gimple stmt, bool reliable)
+infer_loop_bounds_from_array (struct loop *loop, gimple stmt)
 {
   if (is_gimple_assign (stmt))
 {
@@ -2763,10 +2769,10 @@ infer_loop_bounds_from_array (struct loo
   /* For each memory access, analyze its access function
 and record a bound on the loop iteration domain.  */
   if (REFERENCE_CLASS_P (op0))
-   infer_loop_bounds_from_ref (loop, stmt, op0, reliable);
+   infer_loop_bounds_from_ref (loop, stmt, op0);
 
   if (REFERENCE_CLASS_P 

Re: [PATCH] Fix debug info for expr and jump stmt

2012-10-30 Thread Richard Biener
On Tue, Oct 30, 2012 at 3:49 PM, Michael Matz m...@suse.de wrote:
 Hi,

 On Tue, 30 Oct 2012, Richard Biener wrote:

 On Tue, Oct 30, 2012 at 3:17 PM, Dehao Chen de...@google.com wrote:
  And tree expressions don't have TREE_BLOCK before gimple-low either.
  So IMNSHO it is gimple-low.c that should set TREE_BLOCK of all the gimple
  stmts as well as all expression in the operands.

 That would be a step away from the ideal situation, so we should rather
 first analyze why the testcase fails with my patch.  I expected some
 fallout and am actually surprised it's only one testcase :)

 What we should end up with in the ideal world is that we simply have no
 expressions in gimple (and hence no place to store any locations for
 them), except via gimple statements.

 I question the need of BLOCK info on expression trees.  If BLOCKs are
 relevant then the tree ends up referencing a declaration with a BLOCK as
 context, no?  Thus, the case

   int tem, a;
   {
 int a;
 ...
 tem = a;
   }
   int b = tem + 5;

 where we may end up with gimple like

   b = a + 5;

 thus mixing two BLOCKs inside a stmt (and no expression trees to
 attach different BLOCKs) is no different from the case where we end
 up with expression trees.

 Thus my original question - why isn't a NULL BLOCK treated the same
 as UNKNOWN_LOCATION?

 Since the merging of location and block, a null BLOCK doesn't imply no location.
 It can very well have a location without a block.  What we might want to
 imply is that a null BLOCK implies the BLOCK from the statement.  But as
 you say, first we should find out why my patch breaks the one testcase.

Yes, I mean we happily leave the stmt line location the same if we have
a stmt with UNKNOWN_LOCATION (thus it inherits that of the previous stmt);
we should do the same with BLOCKs - if a stmt has a location with a NULL
BLOCK, then it should inherit the block info from the previous stmt.

Richard.

 Or rather, _why_ does Michas patch not work? Did you analyze the guality
 fails?


 Ciao,
 Michael.


Re: [PATCH] Inter-bb range test optimization (PRs tree-optimization/19105, tree-optimization/21643, tree-optimization/46309)

2012-10-30 Thread Richard Biener
On Fri, Oct 26, 2012 at 7:27 PM, Jakub Jelinek ja...@redhat.com wrote:
 Hi!

 This patch extends optimize_range_tests optimization, so that it
 handles also the cases where the truth  or || has been gimplifed
 as a series of GIMPLE_CONDs or mixture thereof and BIT_{AND,IOR}_EXPR
 stmts.

 Example of code it handles is e.g.:
   bb 2:
   v1_3 = a_2(D) != 3;
   v2_4 = a_2(D) != 1;
   v3_5 = a_2(D) != 4;
   v4_6 = a_2(D) != 2;
   v5_7 = a_2(D) != 7;
   v6_8 = a_2(D) != 5;
   v7_9 = a_2(D) != 8;
   v8_10 = a_2(D) != 6;
   _11 = v1_3 & v2_4;
   if (_11 != 0)
 goto bb 3;
   else
 goto bb 7;

   bb 3:
   _13 = v3_5 & v4_6;
   if (_13 != 0)
 goto bb 4;
   else
 goto bb 7;

   bb 4:
   _14 = v5_7 & v6_8;
   if (_14 != 0)
 goto bb 5;
   else
 goto bb 7;

   bb 5:
   _15 = v7_9 & v8_10;
   if (_15 != 0)
 goto bb 6;
   else
 goto bb 7;

 or:

   bb 2:
   _3 = c_2(D) == 34;
   _4 = c_2(D) == 32;
   _5 = _3 | _4;
   if (_5 != 0)
 goto bb 4;
   else
 goto bb 3;

   bb 3:
   _8 = c_2(D) <= 31;
   _7 = (int) _8;

   bb 4:
   # _1 = PHI _7(3), 1(2)

 It is implemented in reassociate_bb, as that is where all the
 infrastructure already is, but it isn't done as part of normal
 reassociation; rather, it runs before reassociating the first bb that
 represents a series of && or || operands.  As post-dominator sons aren't
 particularly ordered, it can happen that the optimization is performed
 on the first, last or middle bb of the series of bbs, so it searches
 both forward and backward to find suitable basic blocks.
 The last bb in the series (last_bb) can be of the form of bb 3 in the
 second example, which is how certain sequences are gimplified when
 assigning a && or || result to a variable or returning it.
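
For reference, the intended end result on the first example can be
sketched in source terms (illustrative only; the actual transformation
works on the GIMPLE shown above): the chain a != 1 && ... && a != 8
collapses into a single unsigned range check.

int
outside_1_to_8 (int a)
{
  /* Equivalent of the eight chained inequalities in bb 2..5 above:
     true iff a is not in [1, 8].  */
  return (unsigned) (a - 1) > 7;
}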

 Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok, but the code could really use some more comments - functions
not fitting in my 80x24 terminal without seeing any comment on what happens
here are a maintenance nightmare.

Thanks,
Richard.

 2012-10-26  Jakub Jelinek  ja...@redhat.com

 PR tree-optimization/19105
 PR tree-optimization/21643
 PR tree-optimization/46309
 * tree-ssa-reassoc.c (init_range_entry): Add STMT argument
 and use it if EXP is NULL.
 (update_range_test): Handle OPCODE equal to ERROR_MARK
 and oe-op NULL.
 (optimize_range_tests): Likewise.
 (final_range_test_p, suitable_cond_bb, no_side_effect_bb, get_ops,
 maybe_optimize_range_tests): New functions.
 (reassociate_bb): Call maybe_optimize_range_tests if last
 stmt of bb is GIMPLE_COND that hasn't been visited yet.

 * gcc.dg/pr19105.c: New test.
 * gcc.dg/pr21643.c: New test.
 * gcc.dg/pr46309-2.c: New test.
 * gcc.c-torture/execute/pr46309.c: New test.

 --- gcc/tree-ssa-reassoc.c.jj   2012-10-25 09:21:08.657049321 +0200
 +++ gcc/tree-ssa-reassoc.c  2012-10-26 15:51:13.398025229 +0200
 @@ -1,5 +1,5 @@
  /* Reassociation for trees.
 -   Copyright (C) 2005, 2007, 2008, 2009, 2010, 2011
 +   Copyright (C) 2005, 2007, 2008, 2009, 2010, 2011, 2012
 Free Software Foundation, Inc.
 Contributed by Daniel Berlin d...@dberlin.org

 @@ -1713,10 +1713,12 @@ struct range_entry
  };

  /* This is similar to make_range in fold-const.c, but on top of
 -   GIMPLE instead of trees.  */
 +   GIMPLE instead of trees.  If EXP is non-NULL, it should be
 +   an SSA_NAME and STMT argument is ignored, otherwise STMT
 +   argument should be a GIMPLE_COND.  */

  static void
 -init_range_entry (struct range_entry *r, tree exp)
 +init_range_entry (struct range_entry *r, tree exp, gimple stmt)
  {
int in_p;
tree low, high;
 @@ -1727,7 +1729,8 @@ init_range_entry (struct range_entry *r,
r-strict_overflow_p = false;
r-low = NULL_TREE;
r-high = NULL_TREE;
 -  if (TREE_CODE (exp) != SSA_NAME || !INTEGRAL_TYPE_P (TREE_TYPE (exp)))
 +  if (exp != NULL_TREE
 +      && (TREE_CODE (exp) != SSA_NAME || !INTEGRAL_TYPE_P (TREE_TYPE (exp))))
  return;

/* Start with simply saying EXP != 0 and then look at the code of EXP
 @@ -1735,12 +1738,14 @@ init_range_entry (struct range_entry *r,
   happen, but it doesn't seem worth worrying about this.  We continue
   the outer loop when we've changed something; otherwise we break
   the switch, which will break the while.  */
 -  low = build_int_cst (TREE_TYPE (exp), 0);
 +  low = exp ? build_int_cst (TREE_TYPE (exp), 0) : boolean_false_node;
high = low;
in_p = 0;
strict_overflow_p = false;
is_bool = false;
 -  if (TYPE_PRECISION (TREE_TYPE (exp)) == 1)
 +  if (exp == NULL_TREE)
 +is_bool = true;
 +  else if (TYPE_PRECISION (TREE_TYPE (exp)) == 1)
  {
if (TYPE_UNSIGNED (TREE_TYPE (exp)))
 is_bool = true;
 @@ -1752,25 +1757,35 @@ init_range_entry (struct range_entry *r,

while (1)
  {
 -  gimple stmt;
enum tree_code code;
tree arg0, arg1, exp_type;
tree nexp;
 

Re: RFA: hookize ADJUST_INSN_LENGTH (Was: RFA: Add lock_lenth attribute to support the ARC port)

2012-10-30 Thread Richard Biener
On Sun, Oct 28, 2012 at 6:13 PM, Joern Rennecke
joern.renne...@embecosm.com wrote:
 Quoting Richard Biener richard.guent...@gmail.com:

 Thus, you can allow the length to vary downwards as well as upwards
 across iterations with suitable definitions of the @code{length}
 attribute
 and/or @code{ADJUST_INSN_LENGTH}.  Care has to be taken that this does
 not
 lead to infinite loops.


 I don't see that you can shrink length with just suitable lock_length and
 length attributes.


 I disagree there (for certain values of 'just'), but we can just agree
 to disagree on this point because...

  What seems to be the crucial difference is that you
 apply ADJUST_INSN_LENGTH after combining lock-length-max and length.
 But then you _decrease_ length with ADJUST_INSN_LENGTH ...

 Maybe what you really want is ADJUST_INSN_LENGTH_AFTER which is
 applied afterwards?  Thus,


 Well, actually, I found a number of problems with ADJUST_INSN_LENGTH:
 - it is not applied to the inner insns in a delay slot sequence.  You can
   sort of work around this by stitching recursive calls together, but
   then you are faced with another problem:
 - You don't know what the length prior to ADJUST_INSN_LENGTH was.  That
   was even worse with the non-unique uids where you didn't know squat
   about the instructions in the delay slot.
 - As you said, I want the adjustment to happen after the maximum calculation.
   Well, usually.  There are actually special cases where it would be useful
   to have some sort of maximum calculation in place, to break up
   alignment-driven cycles, but only applicable with a lesser priority than
   for the range-driven branch offset calculations.

 But adding yet another macro neither does solve all these problems, nor
 would it align with our goal to move away from target macros.

 I found now an alternate way to make the ARC port terminate building its
 insn-attrtab.c - it involves using match_test "get_attr_length (insn) == 2"
 instead of eq_attr "[lock_]length" (const_int 2) - where I really want to
 know if the instruction was considered short in the previous iteration.

 So, I have made a patch to replace the ADJUST_INSN_LENGTH macro in final.c
 with an adjust_insn_length hook, for which a default implementation
 using ADJUST_INSN_LENGTH is provided in targhooks.c to allow for an
 easy transition.
 I've looked at the existing ports that use ADJUST_INSN_LENGTH, and some
 seem to prefer to return an adjustment to be added to the length, while
 others prefer to return the new length.  The latter seemed to be slightly
 more numerous, so I went with that.

 The hook has two new parameters:
 - a flag that tells it if the insn in question is a delay sequence.
   The default hook implementation skips the invocation of ADJUST_INSN_LENGTH
   in this case for the sake of compatibility.
 - a pointer to an int, used to set the number of iterations until the
   length locking feature applies to this instruction's length when using
   the increasing size calculations.  The pointed-to value is initialized
   to zero, which means that length locking is always applied (assuming
   final.c uses the increasing length algorithm).  Setting this to a higher
   number effectively gives the new instruction length a lower priority to
   be put into uid_lock_length.
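 
 To make the intended default behaviour concrete, here is a sketch of its
 shape (my paraphrase of the description above; the exact signature in the
 patch may differ):
 
 static int
 default_adjust_insn_length (rtx insn, int length, bool in_delay_sequence,
                             int *iter_threshold)
 {
   if (in_delay_sequence)
     {
       /* For compatibility, skip ADJUST_INSN_LENGTH inside a delay slot
          sequence and effectively disable length locking for it.  */
       *iter_threshold = INT_MAX;
       return length;
     }
 #ifdef ADJUST_INSN_LENGTH
   ADJUST_INSN_LENGTH (insn, length);
 #endif
   return length;
 }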

 Note that

 Care has to be taken that this does not
 lead to infinite loops.


 doesn't convince me that it is properly designed ;)


 With the hook mechanism, it is much harder to create an infinite loop
 inside shorten_branches.  (It would involve something like setting
  iteration_threshold to MAX_INT and making the length decrease when niter
  is at MAX_INT, then letting integer overflow of niter take its course.
  Making it impossible for a port maintainer to get things wrong is not a
  meaningful goal for GCC, but making it straightforward to get it right is.)

 This patch builds on:
 http://gcc.gnu.org/ml/gcc-patches/2012-10/msg02527.html

 Bootstrapped in revision 192879 on i686-pc-linux-gnu.
 Tested with config-list.mk on x86_64-unknown-linux-gnu.

Apart from the iteration_threshold the hookization would be straightforward.
Now I cannot decipher from the patch what functional change it introduces ;)

There are also only 10 users of ADJUST_INSN_LENGTH, not enough to
warrant introducing a hook without removing the macro IMHO.

Thanks,
Richard.

 2012-10-28  Joern Rennecke  joern.renne...@embecosm.com

 * doc/tm.texi.in (@hook TARGET_ADJUST_INSN_LENGTH): Add.
 * doc/tm.texi: Regenerate.
 * final.c (get_attr_length_1): Use targetm.adjust_insn_length
 instead of ADJUST_INSN_LENGTH.
 (adjust_length): New function.
 (shorten_branches): Use adjust_length instead of ADJUST_INSN_LENGTH.
 * target.def (adjust_insn_length): New hook.
 * targhooks.c (default_adjust_insn_length): New function.
 * targhooks.h (default_adjust_insn_length): Declare.

 diff -drup gcc-192879-haveattr/doc/tm.texi 

Re: Non-dominating loop bounds in tree-ssa-loop-niter 1/4

2012-10-30 Thread Richard Biener
On Tue, 30 Oct 2012, Jan Hubicka wrote:

 Hi,
 this is the first patch of a series changing tree-ssa-loop-niter to consider
 bounds that are not in a block dominating the latch.  This patch makes them
 be recorded, but they are not used yet.  I plan to follow up with:
 
 1) patch to add simple shortest path walk at the end of 
 estimate_numbers_of_iterations_loop
    determining the bound of, e.g.:
int a[10];
int b[10];
    for (i=0;i<n;i++)
  if (t())
q(a[i]);
  else
q(a[i]);
 2) make complete loop unrolling kill all statements known to not be executed
    in the last iteration by __builtin_unreachable to silence part of the
    -Warray-bounds warnings currently breaking bootstrap with -O3
 3) make duplicate_loop_to_header_edge in peeling mode do the same, silencing
    the rest of the warnings
 4) make branch prediction code to drop the prediction on exits not dominating
latch.
 
 Bootstrapped/regtested x86_64-linux, OK?

Ok.

Thanks,
Richard.

   * tree-ssa-loop-niter.c (number_of_iterations_exit): New parameter
   EVERY_ITERATION with implicit value of true.
   (record_estimate): Check dominance relationship of the basic block
   we are estimating on instead of relying on UPPER to be false.
   (struct ilb_data): Drop RELIABLE.
   (idx_infer_loop_bounds): Update.
   (infer_loop_bounds_from_ref): Drop parameter RELIABLE.
   (infer_loop_bounds_from_array): Drop parameter RELIABLE.
   (infer_loop_bounds_from_undefined): Update comments and handling
   of RELIABLE.
   (estimate_numbers_of_iterations_loop): Record all bounds.
 Index: tree-ssa-loop-niter.c
 ===
 --- tree-ssa-loop-niter.c (revision 192986)
 +++ tree-ssa-loop-niter.c (working copy)
 @@ -1793,12 +1793,15 @@ loop_only_exit_p (const struct loop *loo
 meaning described in comments at struct tree_niter_desc
 declaration), false otherwise.  If WARN is true and
 -Wunsafe-loop-optimizations was given, warn if the optimizer is going to 
 use
 -   potentially unsafe assumptions.  */
 +   potentially unsafe assumptions.  
 +   When EVERY_ITERATION is true, only tests that are known to be executed
 +   every iteration are considered (i.e. only test that alone bounds the 
 loop). 
 + */
  
  bool
  number_of_iterations_exit (struct loop *loop, edge exit,
  struct tree_niter_desc *niter,
 -bool warn)
 +bool warn, bool every_iteration)
  {
gimple stmt;
tree type;
 @@ -1806,7 +1809,8 @@ number_of_iterations_exit (struct loop *
enum tree_code code;
affine_iv iv0, iv1;
  
 -  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, exit->src))
 +  if (every_iteration
 +      && !dominated_by_p (CDI_DOMINATORS, loop->latch, exit->src))
  return false;
  
niter-assumptions = boolean_false_node;
 @@ -2568,6 +2572,11 @@ record_estimate (struct loop *loop, tree
loop-bounds = elt;
  }
  
 +  /* If statement is executed on every path to the loop latch, we can 
 directly
 + infer the upper bound on the # of iterations of the loop.  */
 +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (at_stmt)))
 +return;
 +
/* Update the number of iteration estimates according to the bound.
   If at_stmt is an exit then the loop latch is executed at most BOUND 
 times,
   otherwise it can be executed BOUND + 1 times.  We will lower the 
 estimate
 @@ -2651,7 +2660,6 @@ struct ilb_data
  {
struct loop *loop;
gimple stmt;
 -  bool reliable;
  };
  
  static bool
 @@ -2660,7 +2668,7 @@ idx_infer_loop_bounds (tree base, tree *
struct ilb_data *data = (struct ilb_data *) dta;
tree ev, init, step;
tree low, high, type, next;
 -  bool sign, upper = data->reliable, at_end = false;
 +  bool sign, upper = true, at_end = false;
 struct loop *loop = data->loop;
  
if (TREE_CODE (base) != ARRAY_REF)
 @@ -2737,14 +2745,12 @@ idx_infer_loop_bounds (tree base, tree *
 STMT is guaranteed to be executed in every iteration of LOOP.*/
  
  static void
 -infer_loop_bounds_from_ref (struct loop *loop, gimple stmt, tree ref,
 - bool reliable)
 +infer_loop_bounds_from_ref (struct loop *loop, gimple stmt, tree ref)
  {
struct ilb_data data;
  
data.loop = loop;
data.stmt = stmt;
 -  data.reliable = reliable;
for_each_index (ref, idx_infer_loop_bounds, data);
  }
  
 @@ -2753,7 +2759,7 @@ infer_loop_bounds_from_ref (struct loop 
 executed in every iteration of LOOP.  */
  
  static void
 -infer_loop_bounds_from_array (struct loop *loop, gimple stmt, bool reliable)
 +infer_loop_bounds_from_array (struct loop *loop, gimple stmt)
  {
if (is_gimple_assign (stmt))
  {
 @@ -2763,10 +2769,10 @@ infer_loop_bounds_from_array (struct loo
/* For each memory access, analyze its access function
and record a bound on the loop iteration domain.  */
if 

Re: [PATCH] Fix debug info for expr and jump stmt

2012-10-30 Thread Jakub Jelinek
On Tue, Oct 30, 2012 at 03:38:11PM +0100, Richard Biener wrote:
 I question the need of BLOCK info on expression trees.  If BLOCKs are
 relevant then the tree ends up referencing a declaration with a BLOCK as
 context, no?  Thus, the case
 
   int tem, a;
   {
 int a;
 ...
 tem = a;
   }
   int b = tem + 5;
 
 where we may end up with gimple like
 
   b = a + 5;
 
 thus mixing two BLOCKs inside a stmt (and no expression trees to
 attach different BLOCKs) is no different from the case where we end
 up with expression trees.

IMHO either we don't use locations at all on tree expressions (thus
no source location nor block), or both.  A source location has always
an associated block it is present in.  Of course for shared trees we
can't put there any block, as it can appear anywhere.

 Thus my original question - why isn't a NULL BLOCK treated the same
 as UNKNOWN_LOCATION?  Or rather, _why_ does Micha's patch not work?
 Did you analyze the guality fails?

Micha's patch is degrading debug info quality.  Whenever some expression
has a different source location from that of the gimple stmt,
then assuming that the other source location is from the same block is wrong;
it could very well be from a different one.  On the testcase that fails
with Micha's patch, we have:
  [pr43479.c : 8:4] l_2 = l_1(D) + 1;
  [pr43479.c : 8:4] # DEBUG l = l_2
  [pr43479.c : 10:9] # DEBUG h = n_3(D)
  [pr43479.c : 12:11] # DEBUG i = k_4(D)
  [pr43479.c : 13:8] k_5 = k_4(D) + 1;
  [pr43479.c : 13:8] # DEBUG k = k_5
  [pr43479.c : 17:11] # DEBUG j = m_6(D)
  [pr43479.c : 18:8] m_7 = m_6(D) + 1;
  [pr43479.c : 18:8] # DEBUG m = m_7
  [pr43479.c : 22:3] __asm__ __volatile__("" :  : "r" k_5, "r" l_2);
  [pr43479.c : 23:3] __asm__ __volatile__("" :  : "r" m_7, "r" n_3(D));
where line 8 is from the outer block in the source, line 10 from the middle
block, line 12/13 from first innermost block, line 17/18 from second
innermost block.  But all of the l_2, k_5 and m_7 setters are TERed,
so everything is emitted when expanding the two asm
statements and with Micha's patch the block used is the block of the asm
statement, while previously each TERed statement got its own block.

I'd say either we should do the TREE_BLOCK setting on all non-shareable
trees during gimple-low and clear the block (but then likely whole
location?; it doesn't make sense to say in the debugger that something
has a certain source location when you can't print variables declared in that
location) if copying expressions for use elsewhere, outside of the
containing function.  Or say during gimplification or gimple-low.c
simply set t->exp.locus of all non-shareable expressions to UNKNOWN_LOCATION
to make it clear we don't use it (wonder if that could affect debug info
quality, perhaps not that much), but during expansion if creating trees
from TERed stmts they need to be set back, or the current location/block
adjusted accordingly.
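
As I read the dump above, the source is roughly of this shape (my
reconstruction, not the actual pr43479.c):

void
foo (int k, int l, int m, int n)
{
  l++;                                    /* line 8, outer block */
  {
    int h = n;                            /* line 10, middle block */
    {
      int i = k;                          /* lines 12-13, first inner block */
      k++;
    }
    {
      int j = m;                          /* lines 17-18, second inner block */
      m++;
    }
    __asm__ __volatile__ ("" : : "r" (k), "r" (l));   /* line 22 */
    __asm__ __volatile__ ("" : : "r" (m), "r" (n));   /* line 23 */
  }
}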

Jakub


Re: [PATCH] PR c++/54955 - Fail to parse alignas expr at the beginning of a declaration

2012-10-30 Thread Jason Merrill

OK.

Jason


Re: [PATCH] Fix debug info for expr and jump stmt

2012-10-30 Thread Richard Biener
On Tue, Oct 30, 2012 at 4:03 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Tue, Oct 30, 2012 at 03:38:11PM +0100, Richard Biener wrote:
 I question the need of BLOCK info on expression trees.  If BLOCKs are
 relevant then the tree ends up referencing a declaration with a BLOCK as
 context, no?  Thus, the case

   int tem, a;
   {
 int a;
 ...
 tem = a;
   }
   int b = tem + 5;

 where we may end up with gimple like

   b = a + 5;

 thus mixing two BLOCKs inside a stmt (and no expression trees to
 attach different BLOCKs) is no different from the case where we end
 up with expression trees.

 IMHO either we don't use locations at all on tree expressions (thus
 no source location nor block), or both.  A source location has always
 an associated block it is present in.  Of course for shared trees we
 can't put there any block, as it can appear anywhere.

 Thus my original question - why isn't a NULL BLOCK treated the same
 as UNKNOWN_LOCATION?  Or rather, _why_ does Micha's patch not work?
 Did you analyze the guality fails?

 Micha's patch is degrading debug info quality.  Whenever some expression
 has a different source location from that of the gimple stmt,
 then assuming that the other source location is from the same block is wrong;
 it could very well be from a different one.  On the testcase that fails
 with Micha's patch, we have:
   [pr43479.c : 8:4] l_2 = l_1(D) + 1;
   [pr43479.c : 8:4] # DEBUG l = l_2
   [pr43479.c : 10:9] # DEBUG h = n_3(D)
   [pr43479.c : 12:11] # DEBUG i = k_4(D)
   [pr43479.c : 13:8] k_5 = k_4(D) + 1;
   [pr43479.c : 13:8] # DEBUG k = k_5
   [pr43479.c : 17:11] # DEBUG j = m_6(D)
   [pr43479.c : 18:8] m_7 = m_6(D) + 1;
   [pr43479.c : 18:8] # DEBUG m = m_7
   [pr43479.c : 22:3] __asm__ __volatile__("" :  : "r" k_5, "r" l_2);
   [pr43479.c : 23:3] __asm__ __volatile__("" :  : "r" m_7, "r" n_3(D));
 where line 8 is from the outer block in the source, line 10 from the middle
 block, line 12/13 from first innermost block, line 17/18 from second
 innermost block.  But all of the l_2, k_5 and m_7 setters are TERed,
 so everything is emitted when expanding the two asm
 statements and with Micha's patch the block used is the block of the asm
 statement, while previously each TERed statement got its own block.

 I'd say either we should do the TREE_BLOCK setting on all non-shareable
 trees during gimple-low and clear the block (but then likely whole
 location?; it doesn't make sense to say in the debugger that something
 has a certain source location when you can't print variables declared in that
 location) if copying expressions for use elsewhere, outside of the
 containing function.  Or say during gimplification or gimple-low.c
 simply set t->exp.locus of all non-shareable expressions to UNKNOWN_LOCATION
 to make it clear we don't use it (wonder if that could affect debug info
 quality, perhaps not that much), but during expansion if creating trees
 from TERed stmts they need to be set back, or the current location/block
 adjusted accordingly.

So maybe TER (well, those looking up the stmt) should pick the location
from the TERed statement properly then?

Richard.

 Jakub


Re: [PR54693] loss of debug info in jump threading and loop ivopts

2012-10-30 Thread Jakub Jelinek
On Tue, Oct 30, 2012 at 03:51:19PM +0100, Richard Biener wrote:
 + FOR_EACH_IMM_USE_STMT (stmt, imm_iter, def)
 +   {
 + if (!gimple_debug_bind_p (stmt))
 +   continue;
 +
 + FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
 +   SET_USE (use_p, comp);
 +
 + if (!comp)
 +   BREAK_FROM_IMM_USE_STMT (imm_iter);
 
 how does comp magically become NULL_TREE here?

Looks like pasto to me, from the first loop.

BTW, I'd also think that the first loop should set count = 2 if
the debug stmt already has a non-trivial expression (i.e. rhs isn't
just the SSA_NAME), to force a debug temp in that case to avoid creating too
large debug stmts.

 Btw, what's all the fuss with IV candidates, etc?  At least for non-PHIs
 I don't see why the regular release_ssa_name way of doing things does
 not work.  IVOPTs is slow enough ...

Even if it is non-PHI, release_ssa_name will replace it with the definition,
and then on another release_ssa_name again and again, and finally either
be lucky enough that some SSA_NAME stays (is an IV that has been kept), but
more often you'll just reach the PHI node and end up with a long list of:
  DEBUG D#7 = NULL
  DEBUG D#6 = D#7
  DEBUG D#5 = D#6
  DEBUG D#4 = D#5
  DEBUG D#3 = D#4
  DEBUG D#2 = D#3
  DEBUG D#1 = D#2
(the NULL because of PHI).  We don't need to find optimal IV replacement,
so the code tries just a couple of them (perhaps the 64 should be a param,
and perhaps could be lower by default), it just helps if the expression is
smaller (smaller debug info), and if it contains as few SSA_NAMEs as
possible (then it is more likely it will actually be useful).
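
As an illustration (my example, not taken from the PR): in

extern void use (int);

void
f (int *p, int n)
{
  int i;
  for (i = 0; i < n; i++)
    use (p[i]);
}

ivopts may keep only a pointer IV for &p[i]; the debug value of i then
has to be re-expressed in terms of that surviving IV instead of through
a chain of debug temps that dead-ends in the PHI.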

Jakub


Re: [PATCH] Fix debug info for expr and jump stmt

2012-10-30 Thread Jakub Jelinek
On Tue, Oct 30, 2012 at 04:15:38PM +0100, Richard Biener wrote:
 So maybe TER (well, those looking up the stmt) should pick the location
 from the TERed statement properly then?

Perhaps, but Micha's patch doesn't do that.
But in that case IMHO it still would help to set all expr locations to
UNKNOWN_LOCATION during gimple lowering, to make it clear we ignore
the locations.

Jakub


Re: [PATCH] pass filtering for -fopt-info

2012-10-30 Thread Richard Biener
On Tue, Oct 30, 2012 at 9:21 AM, Sharad Singhai sing...@google.com wrote:
 As per discussion in http://gcc.gnu.org/ml/gcc/2012-10/msg00225.html,
 I have added the -fopt-info pass filtering in the attached patch.

 The basic idea is that there are optimization pass groups and a user
 can selectively enable dumps for these group(s) via command-line
 syntax. Currently, I define the following optimization groups: 'loop',
 'lto', 'inline', 'vec', and 'omp'. A pass can be in multiple groups.

 If a pass doesn't explicitly specify an optimization group (denoted
 by OPTGROUP_NONE), then a group is assigned based on the pass type.
 These three are the obvious implicit groups: 'tree', 'ipa', and 'rtl'.

 Also there is a catch-all group, called 'optall'.

 The options for -fopt-info dumps verbosity remain 'optimized',
 'missed', 'note', and 'all'. Since these two types of options,
 verbosity and optimization groups are non-overlapping, I have decided
 to freely mix them. Something like this works as expected, i.e., dump
 missed vectorizer info into vec.missed.

 gcc ... -fopt-info-vec-missed=vec.missed

 which is equivalent to

 gcc ... -fopt-info-missed-vec=vec.missed

 However, the order still matters, and it can be somewhat confusing. For 
 example,

 gcc -fopt-info-vec-missed=vec.miss -fopt-info-vec-optimized=vec.opt

 will dump missed and optimized vectorizer info into vec.opt, while no
 vec.miss is produced. This is due to the fact that the latter group
 specifier, 'vec' overrides the first one. However, the 'missed' and
 'optimized' are both honored as there is no conflict there. This is
 somewhat confusing. Hopefully, this type of usage would not be common.

What I'd expect from that would be both vec.miss and vec.opt being
populated ... (thus go away from the duality of dump files to
primary dump file plus a set of alternate dump files).

 I have updated the documentation to include -fopt-info examples, and
 added some details about -fopt-info command line conflicts.

I like it overall, not sure if we want to pre-populate the OPTGROUP
set too much at this point.  Like what is 'tree' or 'rtl' to users?
Nothing, I think.
'ipa' yes.  'lto'?  sounds redundant with 'ipa' to me.  'omp'?  we don't have
any optimizations here.

Thus please drop TREE, RTL, LTO and OMP for now.

Otherwise I'm leaving it for comments from other folks.

Thanks,
Richard.

 Thanks,
 Sharad


Re: [PATCH] Fix debug info for expr and jump stmt

2012-10-30 Thread Richard Biener
On Tue, Oct 30, 2012 at 4:21 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Tue, Oct 30, 2012 at 04:15:38PM +0100, Richard Biener wrote:
 So maybe TER (well, those looking up the stmt) should pick the location
 from the TERed statement properly then?

 Perhaps, but Micha's patch doesn't do that.
 But in that case IMHO it still would help to set all expr locations to
 UNKNOWN_LOCATION during gimple lowering, to make it clear we ignore
 the locations.

Yes indeed.

Richard.

 Jakub


Re: PATCH: PR rtl-optimization/55093: [4.8 Regression] [x32] -maddress-mode=long failed

2012-10-30 Thread Vladimir Makarov

On 10/30/2012 06:34 AM, Richard Sandiford wrote:

H.J. Lu hjl.to...@gmail.com writes:

LRA has

   if (REG_P (reg) && (ep = get_elimination (reg)) != NULL)
     {
       rtx to_rtx = replace_p ? ep->to_rtx : ep->from_rtx;

       if (! replace_p)
         {
           offset += (ep->offset - ep->previous_offset);
           offset = trunc_int_for_mode (offset, GET_MODE (plus_cst_src));
         }

       if (GET_CODE (XEXP (plus_cst_src, 0)) == SUBREG)
         to_rtx = gen_lowpart (GET_MODE (XEXP (plus_cst_src, 0)), to_rtx);

Reload has

      rtx to_rtx = ep->to_rtx;
      offset += ep->offset;
      offset = trunc_int_for_mode (offset, GET_MODE (plus_cst_src));

      if (GET_CODE (XEXP (plus_cst_src, 0)) == SUBREG)
        to_rtx = gen_lowpart (GET_MODE (XEXP (plus_cst_src, 0)),
                              to_rtx);

(gdb) call debug_rtx (ep->to_rtx)
(reg/f:DI 7 sp)
(gdb) call debug_rtx (ep->from_rtx)
(reg/f:DI 16 argp)
(gdb)

gen_lowpart returns (reg/f:DI 7 sp) for reload and (reg:SI 16 argp)
for LRA.  The difference is caused by

   if (FRAME_POINTER_REGNUM != ARG_POINTER_REGNUM
       /* We should convert arg register in LRA after the elimination
          if it is possible.  */
       && xregno == ARG_POINTER_REGNUM
       && ! lra_in_progress)
     return -1;

It doesn't work in this case.


This testcase shows that LRA can't convert arg register after
the elimination.


Here is a patch to remove the lra_in_progress check for
ARG_POINTER_REGNUM.  Tested on Linux/x86-64.
OK to install?

Thanks HJ.  This looks good to me.  As well as your testcase, I think
it would be dangerous to reduce this kind of subreg during non-final
elimination in cases where the argument pointer occupies more than one
hard register (like avr IIRC).  We could end up with something like
ARG_POINTER_REGNUM+1, which wouldn't show up as an elimination register
during the rest of LRA.

It's important that we do get rid of the subreg during the final
elimination stage, but I think alter_subreg already handles that case.

Since this code is outside the LRA files: patch is OK if Vlad agrees.


I added this code for a reason, probably to solve some target problems.

So I am not sure, but let us try.

It is ok for me to commit the patch if there are no regressions on 
x86/x86-64.





Re: [PATCH] Replace const_vector with match_operand in sse.md

2012-10-30 Thread Uros Bizjak
On Tue, Oct 30, 2012 at 3:47 PM, Andrey Turetskiy
andrey.turets...@gmail.com wrote:
 I changed the patch according to Uros' remarks. Please, have a look.

 Changelog:

 2012-10-30  Andrey Turetskiy  andrey.turets...@gmail.com

* config/i386/i386.c (bdesc_args): Rename CODE_FOR_avx2_umulhrswv16hi3 
 to
CODE_FOR_avx2_pmulhrswv16hi3.
* config/i386/predicates.md (const1_operand): Extend for vectors.
* config/i386/sse.md (ssse3_avx2): Extend.
(ssedoublemode): Ditto.
 (<sse2_avx2>_uavg<mode>3): Merge avx2_uavgv32qi3, sse2_uavgv16qi3,
 avx2_uavgv16hi3 and sse2_uavgv8hi3 into one.
 (*<sse2_avx2>_uavg<mode>3): Merge *avx2_uavgv32qi3, *sse2_uavgv16qi3,
 *avx2_uavgv16hi3 and *sse2_uavgv8hi3 into one.
 (PMULHRSW): New.
 (<ssse3_avx2>_pmulhrsw<mode>3): Merge avx2_umulhrswv16hi3,
 ssse3_pmulhrswv8hi3 and ssse3_pmulhrswv4hi3 into one.
(*avx2_pmulhrswv16hi3): Replace const_vector with match_operand.
(*ssse3_pmulhrswv8hi3): Ditto.
(*ssse3_pmulhrswv4hi3): Ditto.

+{
+  ix86_fixup_binary_operands_no_copy (PLUS, <MODE>mode, operands);
+  operands[3] = CONST1_RTX (<MODE>mode);
+})

Please put operands[3] initialization before the call to
ix86_f_b_o_n_c. We don't want to pass uninitialized operands around.

+{
+  if (which_alternative == 0
+      && (<MODE>mode == V16QImode
+          || <MODE>mode == V8HImode))
+    return "pavg<ssemodesuffix>\t{%2, %0|%0, %2}";
+  return "vpavg<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}";
+}
+  [(set (attr "isa")
+     (if_then_else
+       (match_test "which_alternative == 0
+                    && (<MODE>mode == V16QImode || <MODE>mode == V8HImode)")
+       (const_string "noavx")
+       (const_string "avx")))
+   (set (attr "prefix_data16")
+     (if_then_else (eq_attr "isa" "noavx")
+       (const_string "1")
+       (const_string "*")))
+   (set (attr "prefix")
+     (if_then_else (eq_attr "isa" "noavx")
+       (const_string "orig")
+       (const_string "vex")))

Uh, oh.

Please note that isa attribute enables or disables the alternative
through enabled attribute. Just change the mode attribute to
sseinsnmode and everything will magically work:
- AVX2 implies AVX, so it enables alternative 1, while disabling
alternative 0 (and vice versa when AVX is disabled through noavx isa
attribute).
- Correct modes are conditionally enabled via VI12_AVX2 iterator
- Base ISA level is enabled via insn predicate (TARGET_SSE2).

You have to track three dependent conditions to calculate how the insn
pattern/mode/operand predicate are enabled ;)

Uros,


Re: [PATCH] pass filtering for -fopt-info

2012-10-30 Thread Xinliang David Li
On Tue, Oct 30, 2012 at 1:21 AM, Sharad Singhai sing...@google.com wrote:
 As per discussion in http://gcc.gnu.org/ml/gcc/2012-10/msg00225.html,
 I have added the -fopt-info pass filtering in the attached patch.

 The basic idea is that there are optimization pass groups and a user
 can selectively enable dumps for these group(s) via command-line
 syntax. Currently, I define the following optimization groups: 'loop',
 'lto', 'inline', 'vec', and 'omp'. A pass can be in multiple groups.

 If a pass doesn't explicitly specify an optimization group (denoted
 by OPTGROUP_NONE), then a group is assigned based on the pass type.
 These three are the obvious implicit groups: 'tree', 'ipa', and 'rtl'.


I agree with Richard -- we don't need these implicit groups.


 Also there is a catch-all group, called 'optall'.


Why is this needed? Isn't that the default?



 The options for -fopt-info dumps verbosity remain 'optimized',
 'missed', 'note', and 'all'. Since these two types of options,
 verbosity and optimization groups are non-overlapping, I have decided
 to freely mix them. Something like this works as expected, i.e., dump
 missed vectorizer info into vec.missed.

 gcc ... -fopt-info-vec-missed=vec.missed

 which is equivalent to

 gcc ... -fopt-info-missed-vec=vec.missed

 However, the order still matters, and it can be somewhat confusing. For 
 example,

 gcc -fopt-info-vec-missed=vec.miss -fopt-info-vec-optimized=vec.opt

 will dump missed and optimized vectorizer info into vec.opt, while no
 vec.miss is produced. This is due to the fact that the latter group
 specifier, 'vec' overrides the first one. However, the 'missed' and
 'optimized' are both honored as there is no conflict there. This is
 somewhat confusing. Hopefully, this type of usage would not be common.

Please document that only one alt dump file per optimization-group is supported.


 I have updated the documentation to include -fopt-info examples, and
 added some details about -fopt-info command line conflicts.

thanks,

David


 Thanks,
 Sharad


Re: [PATCH] Fix debug info for expr and jump stmt

2012-10-30 Thread Eric Botcazou
 I'd say either we should do the TREE_BLOCK setting on all non-shareable
 trees during gimple-low and clear the block (but then likely whole
 location?; it doesn't make sense to say in the debugger that something
 has certain source location when you can't print variables declared in that
 location) if copying expressions for use elsewhere, outside of the
 containing function.  Or say during gimplification or gimple-low.c
 simply set t-exp.locus of all non-shareable expressions to UNKNOWN_LOCATION
 to make it clear we don't use it (wonder if that could affect debug info
 quality, perhaps not that much), but during expansion if creating trees
 from TERed stmts they need to be set back, or the current location/block
 adjusted accordingly.

The debugger isn't the only consumer of debug info, and other tools might need 
a finer granularity than a GIMPLE location, so clearing EXPR_LOCATION to work 
around a debug info size issue seems very short-sighted (to say the least).

-- 
Eric Botcazou


[PING^2] [C++ PATCH] Add overflow checking to __cxa_vec_new[23]

2012-10-30 Thread Florian Weimer

On 09/17/2012 12:54 PM, Florian Weimer wrote:

On 09/17/2012 12:15 PM, Paolo Carlini wrote:

Hi,

On 09/17/2012 11:51 AM, Florian Weimer wrote:

On 08/21/2012 12:37 PM, Florian Weimer wrote:

I don't think there are any callers out there, but let's fix this for
completeness.

A compiler emitting code to call this function would still have to
perform overflow checks for the new T[n][m] case, so this interface is
not as helpful as it looks at first glance.
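
For instance (my sketch, not part of the patch itself), the element
count for new T[n][m] can already overflow before __cxa_vec_new is
called:

#include <stddef.h>
#include <stdint.h>

/* Return 1 and set *count to n * m if the multiplication is safe,
   0 if it would overflow size_t.  */
static int
array_new_count_ok (size_t n, size_t m, size_t *count)
{
  if (m != 0 && n > SIZE_MAX / m)
    return 0;
  *count = n * m;
  return 1;
}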

Tested on x86_64-redhat-linux-gnu.


Ping?

This function is apparently used by compilers based on the EDG
front-end, so it's not actually dead.



Being code that touches the library, the patch should go to the
libstdc++ mailing list too. Likewise, the testcase should be in the
libstdc++ testsuite, I guess.


Oh, I thought that this wouldn't apply to internal C++ support code.
Sorry.


That said, I didn't really follow the details of your recent work. Who
did? Jason? I would gently ping the same maintainer.


Indeed, Jason reviewed that.  Cc:ing.


Ping?

Patch is at: http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01416.html

Thanks,
Florian
--
Florian Weimer / Red Hat Product Security Team


Re: [PATCH] pass filtering for -fopt-info

2012-10-30 Thread Xinliang David Li
On Tue, Oct 30, 2012 at 8:28 AM, Richard Biener
richard.guent...@gmail.com wrote:
 On Tue, Oct 30, 2012 at 9:21 AM, Sharad Singhai sing...@google.com wrote:
 As per discussion in http://gcc.gnu.org/ml/gcc/2012-10/msg00225.html,
 I have added the -fopt-info pass filtering in the attached patch.

 The basic idea is that there are optimization pass groups and a user
 can selectively enable dumps for these group(s) via command-line
 syntax. Currently, I define the following optimization groups: 'loop',
 'lto', 'inline', 'vec', and 'omp'. A pass can be in multiple groups.

 If a pass doesn't explicitly specify an optimization group (denoted
 by OPTGROUP_NONE), then a group is assigned based on the pass type.
 These three are the obvious implicit groups: 'tree', 'ipa', and 'rtl'.

 Also there is a catch-all group, called 'optall'.

 The options for -fopt-info dumps verbosity remain 'optimized',
 'missed', 'note', and 'all'. Since these two types of options,
 verbosity and optimization groups are non-overlapping, I have decided
 to freely mix them. Something like this works as expected, i.e., dump
 missed vectorizer info into vec.missed.

 gcc ... -fopt-info-vec-missed=vec.missed

 which is equivalent to

 gcc ... -fopt-info-missed-vec=vec.missed

 However, the order still matters, and it can be somewhat confusing. For 
 example,

 gcc -fopt-info-vec-missed=vec.miss -fopt-info-vec-optimized=vec.opt

 will dump missed and optimized vectorizer info into vec.opt, while no
 vec.miss is produced. This is due to the fact that the latter group
 specifier, 'vec' overrides the first one. However, the 'missed' and
 'optimized' are both honored as there is no conflict there. This is
 somewhat confusing. Hopefully, this type of usage would not be common.

 What I'd expect from that would be both vec.miss and vec.opt being
 populated ... (thus go away from the duality of dump files to
 primary dump file plus a set of alternate dump files).

 I have updated the documentation to include -fopt-info examples, and
 added some details about -fopt-info command line conflicts.

 I like it overall, not sure if we want to pre-populate the OPTGROUP
 set too much at this point.  Like what is 'tree' or 'rtl' to users?
 Nothing, I think.
 'ipa' yes.  'lto'?  sounds redundant with 'ipa' to me.  'omp'?  we don't have
 any optimizations here.


OMP is a high level transformation, and it seems to be a good
candidate group, but this part does not need to be designed now. For
instance, there are a bunch of FDO related transformations (indirect
call promotion, memcpy transformation, etc.) and coverage mismatch
notes that are good candidates to be filtered.

thanks,

David



 Thus please drop TREE, RTL, LTO and OMP for now.

 Otherwise I'm leaving it for comments from other folks.

 Thanks,
 Richard.

 Thanks,
 Sharad


Re: RFA: hookize ADJUST_INSN_LENGTH (Was: RFA: Add lock_length attribute to support the ARC port)

2012-10-30 Thread Joern Rennecke

Quoting Richard Biener richard.guent...@gmail.com:


Apart from the iteration_threshold the hookization would be straightforward.
Now I cannot decipher from the patch what functional change it introduces ;)


The only change occurs if we reach an iteration count of MAX_INT iterations -
which should already be indicative of a problem.

At the MAX_INTth iteration, we will apply the length locking logic to
instructions inside a delay slot sequence as well.

If we disregard this exceptional case, there should be no functional changes
unless someone defines TARGET_ADJUST_INSN_LENGTH.

uid_lock_length gets allocated with XCNEWVEC, so in particular the
elements corresponding to instructions inside delay slot sequences are
initialized to zero.

As the default hook sets *iter_threshold to MAX_INT when inside a delay
sequence, and doesn't change the length, the max operation with
uid_lock_length is a no-op, and *niter < iter_threshold is true, hence a
length change results in updating the length immediately, without
changing uid_lock_length.

In the case that we are not inside a delay slot sequence, the default hook
leaves iter_threshold as 0, and applies ADJUST_INSN_LENGTH.  Thus, when niter
is 0 or larger, as is the case for the ordinary looping operation, we
always find *niter >= iter_threshold, and thus apply the length
locking mechanism.


If we are in the preliminary pass, or doing the single !increasing iteration,
niter is set to -1, so *niter < iter_threshold is always true, so again we
update the length immediately.
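
In other words, the update logic can be modelled like this (my sketch of
the behaviour described above, not the patch text):

static int
apply_length_update (int niter, int iter_threshold,
                     int new_length, int *locked_length)
{
  if (niter < iter_threshold)
    return new_length;             /* immediate update, no locking */
  if (new_length > *locked_length)
    *locked_length = new_length;   /* locked lengths may only grow */
  return *locked_length;
}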


There are also only 10 users of ADJUST_INSN_LENGTH, not enough to
warrant introducing a hook without removing the macro IMHO.


Ok, I can prepare a patch for that, even though it makes it even a bit
less obvious that there's no functional change.

What about the preparatory patch
http://gcc.gnu.org/ml/gcc-patches/2012-10/msg02527.html ?
Can I check that in now?


Re: [PING^2] [C++ PATCH] Add overflow checking to __cxa_vec_new[23]

2012-10-30 Thread Paolo Carlini
Hi,

Florian Weimer fwei...@redhat.com ha scritto:

Ping?

Patch is at: http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01416.html

Sorry, I don't know the code well enough to review your patch, but since I'm in 
CC, I still don't understand why, instead of adding a full libstdc++ testcase 
you are extending a C++ testcase, in old-deja even, normally considered legacy.

Paolo




[PATCH][RFC] Sanity checking for -freorder-blocks-and-partition failures

2012-10-30 Thread Steven Bosscher
Hello,

Hot/cold partitioning is apparently a hot topic all of a sudden, which
is a good thing of course, because it's in need of some TLC.

The attached patch adds another check to the RTL cfg checking
(verify_flow_info) for the partitioning: A hot block can never be
dominated by a cold block (because the dominated block must then also be
cold). This trips in PR55121.

I haven't tested this with any profiling tests, but it's bound to
break things. From my POV, whatever gets broken by this patch was
already broken to begin with :-)   If you're in CC, it's because I
hope you can help test this patch.

Downside of this patch is that I need dominance info. If it's not
available, I compute and free it. I'm not sure if this works if
dominance info status is DOM_NO_FAST_QUERY, and I don't want to
recompute in this case because IMHO a verifier should be a no-op from
the POV of the rest of the compiler, and updating dominators would
make this patch a not-a-no-op :-)

Thoughts/comments?

Ciao!
Steven

* cfgrtl.c (rtl_verify_flow_info_1): Verify that blocks in the
hot partition are not dominated by blocks in the cold partition.

Index: cfgrtl.c
===
--- cfgrtl.c(revision 191819)
+++ cfgrtl.c(working copy)
@@ -2033,6 +2033,7 @@ rtl_verify_flow_info_1 (void)
   rtx x;
   int err = 0;
   basic_block bb;
+  bool have_partitions = false;

   /* Check the general integrity of the basic blocks.  */
   FOR_EACH_BB_REVERSE (bb)
@@ -2145,6 +2146,8 @@ rtl_verify_flow_info_1 (void)
n_eh++;
  else if (e->flags & EDGE_ABNORMAL)
n_abnormal++;
+
+ have_partitions |= is_crossing;
}

   if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX))
@@ -2263,6 +2268,40 @@ rtl_verify_flow_info_1 (void)
  }
 }

+  /* If there are partitions, do a sanity check on them: A basic block in
+ a cold partition cannot dominate a basic block in a hot partition.  */
+  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
+  if (have_partitions && !err)
+FOR_EACH_BB (bb)
+  if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
+   VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
+  if (! VEC_empty (basic_block, bbs_in_cold_partition))
+{
+  bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
+  basic_block son;
+
+  if (dom_calculated_here)
+   calculate_dominance_info (CDI_DOMINATORS);
+
+  while (! VEC_empty (basic_block, bbs_in_cold_partition))
+   {
+ bb = VEC_pop (basic_block, bbs_in_cold_partition);
+ if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
+   {
+ error ("non-cold basic block %d dominated "
+"by a block in the cold partition", bb->index);
+ err = 1;
+   }
+ for (son = first_dom_son (CDI_DOMINATORS, bb);
+  son;
+  son = next_dom_son (CDI_DOMINATORS, son))
+   VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
+   }
+
+  if (dom_calculated_here)
+   free_dominance_info (CDI_DOMINATORS);
+}
+
   /* Clean up.  */
   return err;
 }


Re: [PATCH] Fix debug info for expr and jump stmt

2012-10-30 Thread Dehao Chen
On Tue, Oct 30, 2012 at 8:29 AM, Richard Biener
richard.guent...@gmail.com wrote:
 On Tue, Oct 30, 2012 at 4:21 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Tue, Oct 30, 2012 at 04:15:38PM +0100, Richard Biener wrote:
 So maybe TER (well, those looking up the stmt) should pick the location
 from the TERed statement properly then?

 Perhaps, but Micha's patch doesn't do that.
 But in that case IMHO it still would help to set all expr locations to
 UNKNOWN_LOCATION during gimple lowering, to make it clear we ignore
 the locations.

 Yes indeed.

I agree, but this looks like too bold a move at this point. Shall we
do that in 4.8?

BTW, I updated the patch to ensure pr43479.c works fine. The testing
is still on-going.

Dehao

gcc/ChangeLog:
2012-10-25  Dehao Chen  de...@google.com

* tree-eh.c (do_return_redirection): Set location for jump statement.
(do_goto_redirection): Likewise.
(frob_into_branch_around): Likewise.
(lower_try_finally_nofallthru): Likewise.
(lower_try_finally_copy): Likewise.
(lower_try_finally_switch): Likewise.
* expr.c (store_expr): Use current insn location instead of expr
location.
(expand_expr_real): Likewise.
(expand_expr_real_1): Likewise.

gcc/testsuite/ChangeLog:
2012-10-25  Dehao Chen  de...@google.com

* g++.dg/debug/dwarf2/block.C: New testcase.
Index: testsuite/g++.dg/debug/dwarf2/block.C
===
--- testsuite/g++.dg/debug/dwarf2/block.C   (revision 0)
+++ testsuite/g++.dg/debug/dwarf2/block.C   (revision 0)
@@ -0,0 +1,29 @@
+// Compiler should not generate too many lexical blocks for this function.
+// { dg-do compile { target { i?86-*-* x86_64-*-* } } }
+// { dg-options -O0 -fno-exceptions -g -dA }
+
+union UElement {
+void* pointer;
+int integer;
+};
+struct UColToken {
+  unsigned source;
+  unsigned char **rulesToParseHdl;
+};
+
+int uhash_hashTokens(const union UElement k)
+{
+  int hash = 0;
+  struct UColToken *key = (struct UColToken *)k.pointer;
+  if (key != 0) {
+    int len = (key->source & 0xFF000000)>>24;
+    int inc = ((len - 32) / 32) + 1;
+    const unsigned char *p = (key->source & 0x00FFFFFF)
+        + *(key->rulesToParseHdl);
+    const unsigned char *limit = p + len;
+    hash = *p + *limit;
+  }
+  return hash;
+}
+
+// { dg-final { scan-assembler-not LBB10 } }
Index: tree-eh.c
===
--- tree-eh.c   (revision 192809)
+++ tree-eh.c   (working copy)
@@ -739,6 +739,7 @@ do_return_redirection (struct goto_queue_node *q,
 gimple_seq_add_seq (&q->repl_stmt, mod);
 
   x = gimple_build_goto (finlab);
+  gimple_set_location (x, q->location);
   gimple_seq_add_stmt (&q->repl_stmt, x);
 }
 
@@ -758,6 +759,7 @@ do_goto_redirection (struct goto_queue_node *q, tr
 gimple_seq_add_seq (&q->repl_stmt, mod);
 
   x = gimple_build_goto (finlab);
+  gimple_set_location (x, q->location);
   gimple_seq_add_stmt (&q->repl_stmt, x);
 }
 
@@ -857,6 +859,7 @@ frob_into_branch_around (gimple tp, eh_region regi
   if (!over)
over = create_artificial_label (loc);
   x = gimple_build_goto (over);
+  gimple_set_location (x, loc);
   gimple_seq_add_stmt (&cleanup, x);
 }
   gimple_seq_add_seq (&eh_seq, cleanup);
@@ -1085,6 +1088,7 @@ lower_try_finally_nofallthru (struct leh_state *st
  emit_post_landing_pad (&eh_seq, tf->region);
 
  x = gimple_build_goto (lab);
+ gimple_set_location (x, gimple_location (tf->try_finally_expr));
  gimple_seq_add_stmt (&eh_seq, x);
}
 }
@@ -1223,6 +1227,7 @@ lower_try_finally_copy (struct leh_state *state, s
 
   tmp = lower_try_finally_fallthru_label (tf);
   x = gimple_build_goto (tmp);
+  gimple_set_location (x, tf_loc);
   gimple_seq_add_stmt (&new_stmt, x);
 }
 
@@ -1395,6 +1400,7 @@ lower_try_finally_switch (struct leh_state *state,
 
   tmp = lower_try_finally_fallthru_label (tf);
   x = gimple_build_goto (tmp);
+  gimple_set_location (x, tf_loc);
   gimple_seq_add_stmt (&switch_body, x);
 }
 
@@ -1423,6 +1429,7 @@ lower_try_finally_switch (struct leh_state *state,
   gimple_seq_add_stmt (&eh_seq, x);
 
   x = gimple_build_goto (finally_label);
+  gimple_set_location (x, tf_loc);
   gimple_seq_add_stmt (&eh_seq, x);
 
   tmp = build_int_cst (integer_type_node, eh_index);
Index: expr.c
===
--- expr.c  (revision 192809)
+++ expr.c  (working copy)
@@ -5030,7 +5030,7 @@ store_expr (tree exp, rtx target, int call_param_p
 {
   rtx temp;
   rtx alt_rtl = NULL_RTX;
-  location_t loc = EXPR_LOCATION (exp);
+  location_t loc = curr_insn_location ();
 
   if (VOID_TYPE_P (TREE_TYPE (exp)))
 {
@@ -7869,31 +7869,7 @@ expand_expr_real (tree exp, rtx target, enum machi
   return ret ? ret : const0_rtx;
 }
 
-  /* If this 

Re: [PATCH] Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2012-10-30 Thread Steven Bosscher
On Tue, Oct 30, 2012 at 8:49 AM, Matthew Gretton-Dann wrote:
 On 30 October 2012 05:20, Teresa Johnson wrote:
 Index: cfgrtl.c
 ===
 --- cfgrtl.c(revision 192692)
 +++ cfgrtl.c(working copy)
 @@ -912,7 +912,8 @@ rtl_can_merge_blocks (basic_block a, basic_block b
   partition boundaries).  See  the comments at the top of
   bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */

 -  if (BB_PARTITION (a) != BB_PARTITION (b))
 +  if (find_reg_note (BB_END (a), REG_CROSSING_JUMP, NULL_RTX)
 +  || BB_PARTITION (a) != BB_PARTITION (b))
  return false;

/* Protect the loop latches.  */
 @@ -3978,7 +3979,8 @@ cfg_layout_can_merge_blocks_p (basic_block a, basi
   partition boundaries).  See  the comments at the top of
   bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */

 -  if (BB_PARTITION (a) != BB_PARTITION (b))
 +  if (find_reg_note (BB_END (a), REG_CROSSING_JUMP, NULL_RTX)
 +  || BB_PARTITION (a) != BB_PARTITION (b))
  return false;

/* Protect the loop latches.  */

 As this if() condition seems to be the canonical way to detect being
 in a different partition, should it be moved out into a query function,
 and all of cfgrtl.c updated to use it?

Not just in cfgrtl.c but for example also in ifcvt.c (which currently
only tests for notes, that's broken).

Ciao!
Steven


Re: [PATCH] Fix debug info for expr and jump stmt

2012-10-30 Thread Dehao Chen
BTW, one thing I found confusing is that in expr.c, some code is for
the front end, while some is for RTL. Shall we separate them into two
files? And we don't expect to see EXPR_LOCATION on the RTL side.

Thanks,
Dehao


Re: [PING^2] [C++ PATCH] Add overflow checking to __cxa_vec_new[23]

2012-10-30 Thread Florian Weimer

On 10/30/2012 05:17 PM, Paolo Carlini wrote:


Sorry, I don't know the code well enough to review your patch, but since I'm in 
CC, I still don't understand why, instead of adding a full libstdc++ testcase 
you are extending a C++ testcase, in old-deja even, normally considered legacy.


AFAIK, this is the only place we have such a test.  I suppose I could
put it into testsuite/18_support, but I would have to duplicate a bit of
the machinery of the original test case because I can't just write a
class and take the address of its constructor and destructor (whose
addresses are passed to the tested functions).  I really didn't want to
do that because there are some platform dependencies (the __ARM_EABI__
#ifdef).


Not sure if this makes sense, but those were my reasons.

--
Florian Weimer / Red Hat Product Security Team


Re: [PATCH] Fix debug info for expr and jump stmt

2012-10-30 Thread Dehao Chen
 The debugger isn't the only consumer of debug info, and other tools might need
 a finer granularity than a GIMPLE location, so clearing EXPR_LOCATION to work
 around a debug info size issue seems very short-sighted (to say the least).

Hi, Eric,

There might be some misunderstanding here. Clearing EXPR_LOCATION is
not a workaround to reduce debug info size. It is aimed at making the
GCC implementation cleaner. And before we reset it, we need to truly
make sure nothing after gimple-low depends on it. Please let me
know if you have other concerns.

Thanks,
Dehao


 --
 Eric Botcazou


[PATCH, GCC 4.7] Backport fix for PR tree-optimization/53708

2012-10-30 Thread Peter Bergner
I'm hitting the same bug as in PR53708 when compiling GLIBC's dlfcn.c when
vectorization is enabled on powerpc64-linux.  A reduced test case is:

bergner@bns:~/gcc/BUGS cat foo.i 
static void (*const init_array []) (void)
  __attribute__ ((section (".init_array"), aligned (sizeof (void *)), used))
= { 0 };

bergner@bns:~/gcc/BUGS /home/bergner/gcc/build/gcc-fsf-4_7-base/gcc/xgcc
-B/home/bergner/gcc/build/gcc-fsf-4_7-base/gcc -S -m64 -O3 -maltivec foo.i -o
bad.s

bergner@bns:~/gcc/BUGS /home/bergner/gcc/build/gcc-fsf-4_7-pr53708/gcc/xgcc
-B/home/bergner/gcc/build/gcc-fsf-4_7-pr53708/gcc -S -m64 -O3 -maltivec foo.i
-o good.s

bergner@bns:~/gcc/BUGS diff -u bad.s good.s 
--- bad.s2012-10-30 10:41:15.0 -0500
+++ good.s2012-10-30 10:41:23.0 -0500
@@ -2,7 +2,7 @@
 	.section	.toc,"aw"
 	.section	.text
 	.section	.init_array,"a"
-	.align 4
+	.align 3
 	.type	init_array, @object
 	.size	init_array, 8
 init_array:

The above is bad, because the extra alignment causes the linker to add some
null padding to the init_array and the loader isn't expecting that and ends
up segv'ing.  I'd like to backport Richard's patch below to the 4.7 branch.
The patch bootstrapped and regtested on powerpc64-linux with no regressions.

Is it ok for the 4.7 branch?

Peter


Backport from mainline
2012-06-19  Richard Guenther  rguent...@suse.de

PR tree-optimization/53708
* tree-vect-data-refs.c (vect_can_force_dr_alignment_p): Preserve
user-supplied alignment and alignment of decls with the used
attribute.
 
Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   (revision 192988)
+++ gcc/tree-vect-data-refs.c   (working copy)
@@ -4574,6 +4574,12 @@
   if (TREE_ASM_WRITTEN (decl))
 return false;
 
+  /* Do not override explicit alignment set by the user or the alignment
+ as specified by the ABI when the used attribute is set.  */
+  if (DECL_USER_ALIGN (decl)
+  || DECL_PRESERVE_P (decl))
+return false;
+
   if (TREE_STATIC (decl))
  return (alignment <= MAX_OFILE_ALIGNMENT);
   else




Re: [PATCH][RFC] Sanity checking for -freorder-blocks-and-partition failures

2012-10-30 Thread Teresa Johnson
On Tue, Oct 30, 2012 at 9:26 AM, Steven Bosscher stevenb@gmail.com wrote:
 Hello,

 Hot/cold partitioning is apparently a hot topic all of a sudden, which
 is a good thing of course, because it's in need of some TLC.

 The attached patch adds another check to the RTL cfg checking
 (verify_flow_info) for the partitioning: A hot block can never be
 dominated by a cold block (because the dominated block must then also be
 cold). This trips in PR55121.

 I haven't tested this with any profiling tests, but it's bound to
 break things. From my POV, whatever gets broken by this patch was
 already broken to begin with :-)   If you're in CC, it's because I
 hope you can help test this patch.

I will try testing your patch on top of mine with our fdo benchmarks.
For the others on the cc list, you may need to include my patch as
well for testing. Without it, -freorder-blocks-and-partition was DOA
for me. For my patch, see
http://gcc.gnu.org/ml/gcc-patches/2012-10/msg02692.html

Teresa


 Downside of this patch is that I need dominance info. If it's not
 available, I compute and free it. I'm not sure if this works if
 dominance info status is DOM_NO_FAST_QUERY, and I don't want to
 recompute in this case because IMHO a verifier should be a no-op from
 the POV of the rest of the compiler, and updating dominators would
 make this patch a not-a-no-op :-)

 Thoughts/comments?

 Ciao!
 Steven

 * cfgrtl.c (rtl_verify_flow_info_1): Verify that blocks in the
 hot partition are not dominated by blocks in the cold partition.

 Index: cfgrtl.c
 ===
 --- cfgrtl.c(revision 191819)
 +++ cfgrtl.c(working copy)
 @@ -2033,6 +2033,7 @@ rtl_verify_flow_info_1 (void)
rtx x;
int err = 0;
basic_block bb;
 +  bool have_partitions = false;

/* Check the general integrity of the basic blocks.  */
FOR_EACH_BB_REVERSE (bb)
 @@ -2145,6 +2146,8 @@ rtl_verify_flow_info_1 (void)
 n_eh++;
   else if (e->flags & EDGE_ABNORMAL)
 n_abnormal++;
 +
 + have_partitions |= is_crossing;
 }

 if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX))
 @@ -2263,6 +2268,40 @@ rtl_verify_flow_info_1 (void)
   }
  }

 +  /* If there are partitions, do a sanity check on them: A basic block in
 + a cold partition cannot dominate a basic block in a hot partition.  */
 +  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
 +if (have_partitions && !err)
 +FOR_EACH_BB (bb)
 +  if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
 +   VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
 +  if (! VEC_empty (basic_block, bbs_in_cold_partition))
 +{
 +  bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
 +  basic_block son;
 +
 +  if (dom_calculated_here)
 +   calculate_dominance_info (CDI_DOMINATORS);
 +
 +  while (! VEC_empty (basic_block, bbs_in_cold_partition))
 +   {
 + bb = VEC_pop (basic_block, bbs_in_cold_partition);
 + if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
 +   {
 + error ("non-cold basic block %d dominated "
 +"by a block in the cold partition", bb->index);
 + err = 1;
 +   }
 + for (son = first_dom_son (CDI_DOMINATORS, bb);
 +  son;
 +  son = next_dom_son (CDI_DOMINATORS, son))
 +   VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
 +   }
 +
 +  if (dom_calculated_here)
 +   free_dominance_info (CDI_DOMINATORS);
 +}
 +
/* Clean up.  */
return err;
  }



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH] Fix debug info for expr and jump stmt

2012-10-30 Thread Dehao Chen
 gcc/ChangeLog:
 2012-10-25  Dehao Chen  de...@google.com

 * tree-eh.c (do_return_redirection): Set location for jump statement.
 (do_goto_redirection): Likewise.
 (frob_into_branch_around): Likewise.
 (lower_try_finally_nofallthru): Likewise.
 (lower_try_finally_copy): Likewise.
 (lower_try_finally_switch): Likewise.
 * expr.c (store_expr): Use current insn location instead of expr
 location.
 (expand_expr_real): Likewise.
 (expand_expr_real_1): Likewise.

 gcc/testsuite/ChangeLog:
 2012-10-25  Dehao Chen  de...@google.com

 * g++.dg/debug/dwarf2/block.C: New testcase.

This patch bootstrapped and passed gcc regression tests.

Okay for trunk?

Thanks,
Dehao


Re: [PATCH] Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2012-10-30 Thread Steven Bosscher
On Tue, Oct 30, 2012 at 6:20 AM, Teresa Johnson wrote:
 Index: bb-reorder.c
 ===
 --- bb-reorder.c(revision 192692)
 +++ bb-reorder.c(working copy)
 @@ -2188,6 +2188,8 @@ insert_section_boundary_note (void)
 first_partition = BB_PARTITION (bb);
if (BB_PARTITION (bb) != first_partition)
 {
 +  /* There should be a barrier between text sections. */
 +  emit_barrier_after (BB_END (bb-prev_bb));

So why isn't there one? There can't be a fall-through edge from one
section to the other, so cfgrtl.c:fixup_reorder_chain should have
added a barrier here already in the code under the comment:

  /* Now add jumps and labels as needed to match the blocks new
 outgoing edges.  */

Why isn't it doing that for you?

BTW, something else I noted in cfgrtl.c:
NOTE_INSN_SWITCH_TEXT_SECTIONS shouldn't be copied in
duplicate_insn_chain, so the following is necessary for robustness:

Index: cfgrtl.c
===
--- cfgrtl.c(revision 191819)
+++ cfgrtl.c(working copy)
@@ -3615,7 +3615,6 @@
  break;

case NOTE_INSN_EPILOGUE_BEG:
-   case NOTE_INSN_SWITCH_TEXT_SECTIONS:
  emit_note_copy (insn);
  break;


There can be only one! One note to rule them all! etc.


 Index: cfgrtl.c
 ===
 --- cfgrtl.c(revision 192692)
 +++ cfgrtl.c(working copy)
 @@ -912,7 +912,8 @@ rtl_can_merge_blocks (basic_block a, basic_block b
   partition boundaries).  See  the comments at the top of
   bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */

 -  if (BB_PARTITION (a) != BB_PARTITION (b))
 +  if (find_reg_note (BB_END (a), REG_CROSSING_JUMP, NULL_RTX)
 +  || BB_PARTITION (a) != BB_PARTITION (b))
  return false;

My dislike for this whole scheme just continues to grow... How can
there be a REG_CROSSING_JUMP note if BB_PARTITION(a)==BB_PARTITION(b)?
That is a bug. We should not need the notes here.

As long as we have the CFG, BB_PARTITION(a)==BB_PARTITION(b) should be
the canonical way to check whether two blocks are in the same
partition, and the EDGE_CROSSING flag should be set iff an edge
crosses from one section to another. The REG_CROSSING_JUMP note should
only be used to see if a JUMP_INSN may jump to another section,
without having to check all successor edges.
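
Stated as code (my sketch of the invariants, not an actual patch):

/* The canonical partition test.  */
static inline bool
same_partition_p (const_basic_block a, const_basic_block b)
{
  return BB_PARTITION (a) == BB_PARTITION (b);
}

/* An edge must be marked crossing iff its endpoints disagree.  */
static inline void
verify_crossing_edge (edge e)
{
  gcc_assert (((e->flags & EDGE_CROSSING) != 0)
              == !same_partition_p (e->src, e->dest));
}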

Any place where we have to check the BB_PARTITION or
edge->flags & EDGE_CROSSING *and* REG_CROSSING_JUMP indicates a bug in
the partitioning updating.

Another BTW: sched-vis.c doesn't handle REG_CROSSING_JUMP notes so
that slim RTL dumping breaks. I need this patchlet to make things
work:
Index: sched-vis.c
===
--- sched-vis.c (revision 191819)
+++ sched-vis.c (working copy)
@@ -553,6 +553,11 @@
 {
   char t1[BUF_LEN], t2[BUF_LEN], t3[BUF_LEN];

+  if (! x)
+{
+  sprintf (buf, "(nil)");
+  return;
+}
   switch (GET_CODE (x))
 {
 case SET:

Ciao!
Steven


Re: [PATCH] Replace const_vector with match_operand in sse.md

2012-10-30 Thread Andrey Turetskiy
Thanks for the explanation, I understand it now.
I fixed the issue which you marked. The changelog is unchanged.

---
Best regards,
Andrey Turetskiy

On Tue, Oct 30, 2012 at 7:40 PM, Uros Bizjak ubiz...@gmail.com wrote:
 On Tue, Oct 30, 2012 at 3:47 PM, Andrey Turetskiy
 andrey.turets...@gmail.com wrote:
 I changed the patch according Uros' remarks. Please, have a look.

 Changelog:

 2012-10-30  Andrey Turetskiy  andrey.turets...@gmail.com

* config/i386/i386.c (bdesc_args): Rename 
 CODE_FOR_avx2_umulhrswv16hi3 to
CODE_FOR_avx2_pmulhrswv16hi3.
* config/i386/predicates.md (const1_operand): Extend for vectors.
* config/i386/sse.md (ssse3_avx2): Extend.
(ssedoublemode): Ditto.
 (<sse2_avx2>_uavg<mode>3): Merge avx2_uavgv32qi3, sse2_uavgv16qi3,
 avx2_uavgv16hi3 and sse2_uavgv8hi3 into one.
 (*<sse2_avx2>_uavg<mode>3): Merge *avx2_uavgv32qi3, *sse2_uavgv16qi3,
 *avx2_uavgv16hi3 and *sse2_uavgv8hi3 into one.
 (PMULHRSW): New.
 (<ssse3_avx2>_pmulhrsw<mode>3): Merge avx2_umulhrswv16hi3,
 ssse3_pmulhrswv8hi3 and ssse3_pmulhrswv4hi3 into one.
(*avx2_pmulhrswv16hi3): Replace const_vector with match_operand.
(*ssse3_pmulhrswv8hi3): Ditto.
(*ssse3_pmulhrswv4hi3): Ditto.

 +{
 +  ix86_fixup_binary_operands_no_copy (PLUS, <MODE>mode, operands);
 +  operands[3] = CONST1_RTX (<MODE>mode);
 +})

 Please put operands[3] initialization before the call to
 ix86_f_b_o_n_c. We don't want to pass uninitialized operands around.

 +{
 +  if (which_alternative == 0
 +      && (<MODE>mode == V16QImode
 +	  || <MODE>mode == V8HImode))
 +    return "pavg<ssemodesuffix>\t{%2, %0|%0, %2}";
 +  return "vpavg<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}";
 +}
 +  [(set (attr "isa")
 +	(if_then_else
 +	  (match_test "which_alternative == 0
 +		       && (<MODE>mode == V16QImode || <MODE>mode == V8HImode)")
 +	  (const_string "noavx")
 +	  (const_string "avx")))
 +   (set (attr "prefix_data16")
 +	(if_then_else (eq_attr "isa" "noavx")
 +		      (const_string "1")
 +		      (const_string "*")))
 +   (set (attr "prefix")
 +	(if_then_else (eq_attr "isa" "noavx")
 +		      (const_string "orig")
 +		      (const_string "vex")))

 Uh, oh.

 Please note that the isa attribute enables or disables the alternative
 through the "enabled" attribute. Just change the mode attribute to
 "<sseinsnmode>" and everything will magically work:
 - AVX2 implies AVX, so it enables alternative 1, while disabling
 alternative 0 (and vice versa when AVX is disabled through noavx isa
 attribute).
 - Correct modes are conditionally enabled via VI12_AVX2 iterator
 - Base ISA level is enabled via insn predicate (TARGET_SSE2).

 You have to track three dependent conditions to calculate how insn
 pattern/mode/operand predicate are enabled ;)

 Uros,


const_vector_replace.patch
Description: Binary data


Re: [patch,libgcc] m32r*rtems* add crtinit.o and crtfinit.o

2012-10-30 Thread Ian Lance Taylor
On Tue, Oct 30, 2012 at 12:10 AM, Ralf Corsepius
ralf.corsep...@rtems.org wrote:

 I would like to apply the patch below to trunk and gcc-4.7-branch.

 This patch was originally submitted by Joel Sherrill back in May
 (http://gcc.gnu.org/ml/gcc-patches/2012-05/msg01180.html),
 but had never received any feedback.

 It has been part of the rtems-gcc patches, since then.

You are an RTEMS maintainer; you don't need any additional approval to
commit this patch to trunk.

Ian


Re: [PATCH] Replace const_vector with match_operand in sse.md

2012-10-30 Thread Uros Bizjak
On Tue, Oct 30, 2012 at 6:53 PM, Andrey Turetskiy
andrey.turets...@gmail.com wrote:
 Thanks for explanation, I understand it.
 I fixed issue which you marked. Changelog is unchanged.

 I changed the patch according Uros' remarks. Please, have a look.

 Changelog:

 2012-10-30  Andrey Turetskiy  andrey.turets...@gmail.com

* config/i386/i386.c (bdesc_args): Rename 
 CODE_FOR_avx2_umulhrswv16hi3 to
CODE_FOR_avx2_pmulhrswv16hi3.
* config/i386/predicates.md (const1_operand): Extend for vectors.
* config/i386/sse.md (ssse3_avx2): Extend.
(ssedoublemode): Ditto.
(<sse2_avx2>_uavg<mode>3): Merge avx2_uavgv32qi3, sse2_uavgv16qi3,
avx2_uavgv16hi3 and sse2_uavgv8hi3 into one.
(*<sse2_avx2>_uavg<mode>3): Merge *avx2_uavgv32qi3, *sse2_uavgv16qi3,
*avx2_uavgv16hi3 and *sse2_uavgv8hi3 into one.
(PMULHRSW): New.
(<ssse3_avx2>_pmulhrsw<mode>3): Merge avx2_umulhrswv16hi3,
ssse3_pmulhrswv8hi3 and ssse3_pmulhrswv4hi3 into one.
(*avx2_pmulhrswv16hi3): Replace const_vector with match_operand.

Replace const_vector with const1_operand predicate.

(*ssse3_pmulhrswv8hi3): Ditto.
(*ssse3_pmulhrswv4hi3): Ditto.

Yes, the patch is OK for mainline SVN.

BTW: Probably, pmulhrsw insn patterns can be merged, too, but this can
be a follow-up patch.

Thanks,
Uros.


Re: [patch] Unify bitmap interface.

2012-10-30 Thread Lawrence Crowl
On 10/30/12, Diego Novillo dnovi...@google.com wrote:
 On Oct 30, 2012 Bin.Cheng amker.ch...@gmail.com wrote:
  Just one question: Should we change the name of functions
  sbitmap_intersection_of_succs/sbitmap_intersection_of_preds/
  sbitmap_union_of_succs/sbitmap_union_of_preds too? It might
  be a little confusing that sbitmap_* is used among bitmap_*.

 Yes.  Lawrence is proceeding with this unification in stages.
 The next few patches should rename these.

Actually, I didn't know about them because they are auxiliary
routines declared outside of the bitmap files.  I've gone ahead
and changed them as well.

 The only two sets of functions that will remain separate for
 now are the iterators and the bitmap creation routines, I think.
 Lawrence?

The iterator functions have been unified, but not the iterator
type names.

-- 
Lawrence Crowl


Non-dominating loop bounds in tree-ssa-loop-niter 2/4

2012-10-30 Thread Jan Hubicka
Hi,
this patch implements the second part of the planned change - determining
loop bounds by shortest path discovery.  This allows us to bound the number
of iterations of loops whose bounds come from statements that do not
dominate the latch.

I originally planned to implement this as part of maybe_lower_iteration_bound,
but I found doing it separately is more readable.  While both perform a walk
of the loop body, they do so for different reasons.

discover_iteration_bound_by_body_walk walks from header to latch, while
maybe_lower_iteration_bound walks from header to the first statement with
side effects, looking for an exit on the way.

Both walks can be skipped for many loops, but each for different reasons.
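
For illustration, a loop of the kind the new walk handles (in the spirit of
the new testcase, not a copy of it): neither store dominates the latch, but
every path from header to latch executes one of them, so the bound i <= 99
holds for every iteration:

int a[100], b[100];

void
f (int n, int flag)
{
  int i;
  for (i = 0; i < n; i++)
    {
      if (flag)
	a[i] = i;	/* implies i <= 99 when executed */
      else
	b[i] = -i;	/* likewise */
    }
}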

Bootstrapped/regtested x86_64-linux, OK?

* tree-ssa-loop-niter.c (double_int_cmp, bound_index,
discover_iteration_bound_by_body_walk): New functions.
(discover_iteration_bound_by_body_walk): Use it.

* gcc.dg/tree-ssa/loop-38.c: New testcase.

Index: tree-ssa-loop-niter.c
===
--- tree-ssa-loop-niter.c   (revision 192989)
+++ tree-ssa-loop-niter.c   (working copy)
@@ -2955,6 +2955,234 @@ gcov_type_to_double_int (gcov_type val)
   return ret;
 }
 
+/* Compare double ints, callback for qsort.  */
+
+int
+double_int_cmp (const void *p1, const void *p2)
+{
+  const double_int *d1 = (const double_int *)p1;
+  const double_int *d2 = (const double_int *)p2;
+  if (*d1 == *d2)
+return 0;
+  if (d1->ult (*d2))
+return -1;
+  return 1;
+}
+
+/* Return index of BOUND in BOUNDS array sorted in increasing order.
+   Lookup by binary search.  */
+
+int
+bound_index (VEC (double_int, heap) *bounds, double_int bound)
+{
+  unsigned int end = VEC_length (double_int, bounds);
+  unsigned int begin = 0;
+
+  /* Find a matching index by means of a binary search.  */
+  while (begin != end)
+{
+  unsigned int middle = (begin + end) / 2;
+  double_int index = VEC_index (double_int, bounds, middle);
+
+  if (index == bound)
+   return middle;
+  else if (index.ult (bound))
+   begin = middle + 1;
+  else
+   end = middle;
+}
+  gcc_unreachable ();
+}
+
+/* Used to hold vector of queues of basic blocks below.  */
+typedef VEC (basic_block, heap) *bb_queue;
+DEF_VEC_P(bb_queue);
+DEF_VEC_ALLOC_P(bb_queue,heap);
+
+/* We recorded loop bounds only for statements dominating loop latch (and thus
+   executed each loop iteration).  If there are any bounds on statements not
+   dominating the loop latch we can improve the estimate by walking the loop
+   body and seeing if every path from loop header to loop latch contains
+   some bounded statement.  */
+
+static void
+discover_iteration_bound_by_body_walk (struct loop *loop)
+{
+  pointer_map_t *bb_bounds;
+  struct nb_iter_bound *elt;
+  VEC (double_int, heap) *bounds = NULL;
+  VEC (bb_queue, heap) *queues = NULL;
+  bb_queue queue = NULL;
+  ptrdiff_t queue_index;
+  ptrdiff_t latch_index = 0;
+  pointer_map_t *visited;
+
+  /* Discover what bounds may interest us.  */
+  for (elt = loop->bounds; elt; elt = elt->next)
+{
+  double_int bound = elt-bound;
+
+      /* Exit terminates loop at given iteration, while non-exits produce
+	 undefined effect on the next iteration.  */
+      if (!elt->is_exit)
+	{
+	  bound += double_int_one;
+	  /* Overflow, give up on this bound.  */
+	  if (bound == double_int_zero)
+	    continue;
+	}
+
+      if (!loop->any_upper_bound
+	  || bound.ult (loop->nb_iterations_upper_bound))
+	VEC_safe_push (double_int, heap, bounds, bound);
+}
+
+  /* Exit early if there is nothing to do.  */
+  if (!bounds)
+return;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "  Trying to walk loop body to reduce the bound.\n");
+
+  /* Sort the bounds in increasing order.  */
+  qsort (VEC_address (double_int, bounds), VEC_length (double_int, bounds),
+sizeof (double_int), double_int_cmp);
+
+  /* For every basic block record the lowest bound that is guaranteed to
+ terminate the loop.  */
+
+  bb_bounds = pointer_map_create ();
+  for (elt = loop->bounds; elt; elt = elt->next)
+    {
+      double_int bound = elt->bound;
+      if (!elt->is_exit)
+	{
+	  bound += double_int_one;
+	  /* Overflow, give up on this bound.  */
+	  if (bound == double_int_zero)
+	    continue;
+	}
+
+      if (!loop->any_upper_bound
+	  || bound.ult (loop->nb_iterations_upper_bound))
+	{
+	  ptrdiff_t index = bound_index (bounds, bound);
+	  void **entry = pointer_map_contains (bb_bounds,
+					       gimple_bb (elt->stmt));
+	  if (!entry)
+	    *pointer_map_insert (bb_bounds,
+				 gimple_bb (elt->stmt)) = (void *)index;
+	  else if ((ptrdiff_t)*entry > index)
+	    *entry = (void *)index;
+	}
+    }
+

Re: [PATCH, GCC 4.7] Backport fix for PR tree-optimization/53708

2012-10-30 Thread Peter Bergner
On Tue, 2012-10-30 at 11:58 -0500, Peter Bergner wrote:
 I'm hitting the same bug as in PR53708 when compiling GLIBC's dlfcn.c when
 vectorization is enabled on powerpc64-linux.  A reduced test case is:
 
 bergner@bns:~/gcc/BUGS cat foo.i 
 static void (*const init_array []) (void)
   __attribute__ ((section (".init_array"), aligned (sizeof (void *)), used))
 = { 0 };
 
 bergner@bns:~/gcc/BUGS /home/bergner/gcc/build/gcc-fsf-4_7-base/gcc/xgcc
 -B/home/bergner/gcc/build/gcc-fsf-4_7-base/gcc -S -m64 -O3 -maltivec foo.i -o
 bad.s
 
 bergner@bns:~/gcc/BUGS /home/bergner/gcc/build/gcc-fsf-4_7-pr53708/gcc/xgcc
 -B/home/bergner/gcc/build/gcc-fsf-4_7-pr53708/gcc -S -m64 -O3 -maltivec foo.i
 -o good.s
 
 bergner@bns:~/gcc/BUGS diff -u bad.s good.s 
 --- bad.s2012-10-30 10:41:15.0 -0500
 +++ good.s2012-10-30 10:41:23.0 -0500
 @@ -2,7 +2,7 @@
  .section	.toc,"aw"
  .section	.text
  .section	.init_array,"a"
 -.align 4
 +.align 3
  .type	init_array, @object
  .size	init_array, 8
  init_array:
 
 The above is bad, because the extra alignment causes the linker to add some
 null padding to the init_array and the loader isn't expecting that and ends
 up segv'ing.  I'd like to backport Richard's patch below to the 4.7 branch.
 The patch bootstrapped and regtested on powerpc64-linux with no regressions.

Commenting on Richard's question from the bugzilla:

  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53708#c10

I suppose if attribute((__aligned__)) truly does just set a minimum alignment
value (and the documentation seems to say that) and the compiler is free to
arbitrarily increase it, then the GLIBC code to scan the init_array needs to
be tolerant of null values in init_array.
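
For concreteness, a minimal sketch (not GLIBC's actual code) of a scanner
that would tolerate such padding; __init_array_start/__init_array_end are
the usual linker-provided symbols:

typedef void (*init_fn) (void);
extern init_fn __init_array_start[], __init_array_end[];

static void
run_init_array (void)
{
  init_fn *p;
  for (p = __init_array_start; p < __init_array_end; p++)
    if (*p)	/* skip null padding introduced for alignment */
      (*p) ();
}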

Does everyone agree with that assessment?

Peter





Re: [PATCH] Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2012-10-30 Thread Steven Bosscher
On Tue, Oct 30, 2012 at 6:48 PM, Steven Bosscher stevenb@gmail.com wrote:
 On Tue, Oct 30, 2012 at 6:20 AM, Teresa Johnson wrote:
 Index: bb-reorder.c
 ===
 --- bb-reorder.c(revision 192692)
 +++ bb-reorder.c(working copy)
 @@ -2188,6 +2188,8 @@ insert_section_boundary_note (void)
 first_partition = BB_PARTITION (bb);
if (BB_PARTITION (bb) != first_partition)
 {
 +  /* There should be a barrier between text sections. */
 +  emit_barrier_after (BB_END (bb-prev_bb));

 So why isn't there one? There can't be a fall-through edge from one
 section to the other, so cfgrtl.c:fixup_reorder_chain should have
 added a barrier here already in the code under the comment:

   /* Now add jumps and labels as needed to match the blocks new
  outgoing edges.  */

 Why isn't it doing that for you?

Maybe it's because fix_up_fall_thru_edges calls force_nonfallthru,
which is incorrectly inserting JUMP_INSNs and BARRIERs in cfglayout
mode. I'm going to test this patch:

Index: cfgrtl.c
===
--- cfgrtl.c(revision 192889)
+++ cfgrtl.c(working copy)
@@ -1511,16 +1511,17 @@ force_nonfallthru_and_redirect (edge e,
 #endif
}
   set_return_jump_label (BB_END (jump_block));
+  emit_barrier_after (BB_END (jump_block));
 }
-  else
+  else if (current_ir_type () == IR_RTL_CFGRTL)
 {
   rtx label = block_label (target);
   emit_jump_insn_after_setloc (gen_jump (label), BB_END (jump_block), loc);
   JUMP_LABEL (BB_END (jump_block)) = label;
   LABEL_NUSES (label)++;
+  emit_barrier_after (BB_END (jump_block));
 }

-  emit_barrier_after (BB_END (jump_block));
   redirect_edge_succ_nodup (e, target);

   if (abnormal_edge_flags)


Re: [PATCH, GCC 4.7] Backport fix for PR tree-optimization/53708

2012-10-30 Thread Jakub Jelinek
On Tue, Oct 30, 2012 at 01:43:33PM -0500, Peter Bergner wrote:
 Commenting on Richard's question from the bugzilla:
 
   http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53708#c10
 
 I suppose if attribute((__aligned__)) truly does just set a minimum alignment
 value (and the documentation seems to say that) and the compiler is free to
 arbitrarily increase it, then the GLIBC code to scan the init_array needs to
 be tolerant of null values in init_array.
 
 Does everyone agree with that assessment?

I think the right test is for a user-supplied section name;
this approach is so common that I think we have to support it.
E.g. the Linux kernel, glibc, and prelink all use this, and gcc itself too (e.g.
crtstuff.c).

Jakub


Re: Ping / update: RFA: replace #ifdef with if/#if for HAVE_ATTR_*

2012-10-30 Thread Richard Sandiford
I can't approve the whole thing of course, but I like the idea.
However...

Joern Rennecke joern.renne...@embecosm.com writes:
 +@deftypevr {Target Hook} bool TARGET_HAVE_CC0
 +@deftypevrx {Target Hook} {bool} TARGET_AUTO_INC_DEC
 +@deftypevrx {Target Hook} {bool} TARGET_STACK_REGS
 +@deftypevrx {Target Hook} {bool} TARGET_HAVE_ATTR_ENABLED
 +@deftypevrx {Target Hook} {bool} TARGET_HAVE_ATTR_LENGTH
 +These flags are automatically generated;  you should not override them in 
 @file{tm.c}.
 +@end deftypevr

Unless this goes against something already discussed, I think it'd be
better to leave these out until there's a concerted attempt to use them
somewhere.  On its own this isn't even a partial transition. :-)

 +  /* We make an exception here to provide stub definitions for
 + insn_*_length* / get_attr_enabled functions.  */
 +  puts ("#if !HAVE_ATTR_length\n"
 +	"extern int hook_int_rtx_0 (rtx);\n"
 +	"#define insn_default_length hook_int_rtx_0\n"
 +	"#define insn_min_length hook_int_rtx_0\n"
 +	"#define insn_variable_length_p hook_int_rtx_0\n"
 +	"#define insn_current_length hook_int_rtx_0\n"
 +	"#include \"insn-addr.h\"\n"
 +	"#endif\n"

I'd prefer defaults that call gcc_unreachable, rather than silently
return an arbitrary value.  That said,

 +	"#if !HAVE_ATTR_enabled\n"
 +	"extern int hook_int_rtx_1 (rtx);\n"
 +	"#define get_attr_enabled hook_int_rtx_1\n"
 +	"#endif\n");

I agree that 1 is a safe default here.
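
For the length case, the kind of default meant above could look like this
(purely illustrative; the stub name is made up):

static inline int
insn_length_unreachable (rtx insn ATTRIBUTE_UNUSED)
{
  gcc_unreachable ();
}
#define insn_default_length insn_length_unreachable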

Richard


Re: [PATCH, GCC 4.7] Backport fix for PR tree-optimization/53708

2012-10-30 Thread Peter Bergner
On Tue, 2012-10-30 at 19:55 +0100, Jakub Jelinek wrote:
 On Tue, Oct 30, 2012 at 01:43:33PM -0500, Peter Bergner wrote:
  Commenting on Richard's question from the bugzilla:
  
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53708#c10
  
  I suppose if attribute((__aligned__)) truly does just set a minimum 
  alignment
  value (and the documentation seems to say that) and the compiler is free to
  arbitrarily increase it, then the GLIBC code to scan the init_array needs to
  be tolerant of null values in init_array.
  
  Does everyone agree with that assessment?
 
 I think the right test is for user supplied section name,
 this approach is so common that I think we have to support it.
 E.g. Linux kernel, glibc, prelink all use this, gcc itself too (e.g.
 crtstuff.c).

Ok, then I'll bootstrap and regtest your suggested change while we
wait for richi to comment.  I'm fine with whatever you and richi
decide is best.  The ObjC guys should probably test it though too.

I assume you think we should change the current trunk code as well?

Peter





[committed] A global default for SLOW_UNALIGNED_ACCESS

2012-10-30 Thread Richard Sandiford
This patch replaces three separate default definitions of
SLOW_UNALIGNED_ACCESS with a single global one.  Note that tm.texi
requires SLOW_UNALIGNED_ACCESS to be true if STRICT_ALIGNMENT.
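
(Illustrative only: a port overriding the default must keep the macro true
whenever STRICT_ALIGNMENT is, e.g.

#define SLOW_UNALIGNED_ACCESS(MODE, ALIGN) \
  (STRICT_ALIGNMENT || (ALIGN) < GET_MODE_ALIGNMENT (MODE))

which refines the default while still satisfying that requirement.)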

Tested on x86_64-linux-gnu, powerpc64-linux-gnu and mipsisa64-elf.
Applied as obvious.

Richard


gcc/
* defaults.h (SLOW_UNALIGNED_ACCESS): Provide default definition.
* expmed.c (SLOW_UNALIGNED_ACCESS): Remove default definition.
* expr.c (SLOW_UNALIGNED_ACCESS): Likewise.
* lra-constraints.c (SLOW_UNALIGNED_ACCESS): Likewise.
(simplify_operand_subreg): Don't check STRICT_ALIGNMENT here.

Index: gcc/defaults.h
===
--- gcc/defaults.h  2012-08-02 21:10:06.0 +0100
+++ gcc/defaults.h  2012-10-28 10:30:47.340353996 +
@@ -1218,6 +1218,10 @@ #define MINIMUM_ALIGNMENT(EXP,MODE,ALIGN
 #define ATTRIBUTE_ALIGNED_VALUE BIGGEST_ALIGNMENT
 #endif
 
+#ifndef SLOW_UNALIGNED_ACCESS
+#define SLOW_UNALIGNED_ACCESS(MODE, ALIGN) STRICT_ALIGNMENT
+#endif
+
 /* For most ports anything that evaluates to a constant symbolic
or integer value is acceptable as a constant address.  */
 #ifndef CONSTANT_ADDRESS_P
Index: gcc/expmed.c
===
--- gcc/expmed.c2012-10-28 10:25:12.0 +
+++ gcc/expmed.c2012-10-28 10:30:44.178354004 +
@@ -69,11 +69,6 @@ static rtx expand_sdiv_pow2 (enum machin
 /* Test whether a value is zero of a power of two.  */
 #define EXACT_POWER_OF_2_OR_ZERO_P(x) (((x) & ((x) - 1)) == 0)
 
-#ifndef SLOW_UNALIGNED_ACCESS
-#define SLOW_UNALIGNED_ACCESS(MODE, ALIGN) STRICT_ALIGNMENT
-#endif
-
-
 /* Reduce conditional compilation elsewhere.  */
 #ifndef HAVE_insv
 #define HAVE_insv  0
Index: gcc/expr.c
===
--- gcc/expr.c  2012-10-25 10:08:06.0 +0100
+++ gcc/expr.c  2012-10-28 10:31:44.133353857 +
@@ -189,12 +189,6 @@ #define STORE_BY_PIECES_P(SIZE, ALIGN) \
   (move_by_pieces_ninsns (SIZE, ALIGN, STORE_MAX_PIECES + 1) \
    < (unsigned int) MOVE_RATIO (optimize_insn_for_speed_p ()))
 #endif
-
-/* SLOW_UNALIGNED_ACCESS is nonzero if unaligned accesses are very slow.  */
-
-#ifndef SLOW_UNALIGNED_ACCESS
-#define SLOW_UNALIGNED_ACCESS(MODE, ALIGN) STRICT_ALIGNMENT
-#endif
 
 /* This is run to set up which modes can be used
directly in memory and to initialize the block move optab.  It is run
Index: gcc/lra-constraints.c
===
--- gcc/lra-constraints.c   2012-10-26 11:50:16.0 +0100
+++ gcc/lra-constraints.c   2012-10-28 10:32:02.499353813 +
@@ -1105,10 +1105,6 @@ process_addr_reg (rtx *loc, rtx *before,
   return true;
 }
 
-#ifndef SLOW_UNALIGNED_ACCESS
-#define SLOW_UNALIGNED_ACCESS(mode, align) 0
-#endif
-
 /* Make reloads for subreg in operand NOP with internal subreg mode
REG_MODE, add new reloads for further processing.  Return true if
any reload was generated.  */
@@ -1132,8 +1128,7 @@ simplify_operand_subreg (int nop, enum m
  address might violate the necessary alignment or the access might
  be slow.  So take this into consideration. */
   if ((MEM_P (reg)
 -      && ((! STRICT_ALIGNMENT
 -	    && ! SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (reg)))
 +      && (! SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (reg))
 	   || MEM_ALIGN (reg) >= GET_MODE_ALIGNMENT (mode)))
       || (REG_P (reg) && REGNO (reg) < FIRST_PSEUDO_REGISTER))
 {


Re: User directed Function Multiversioning via Function Overloading (issue5752064)

2012-10-30 Thread Jason Merrill

On 10/27/2012 09:16 PM, Sriraman Tallam wrote:

+ /* See if there's a match.   For functions that are multi-versioned,
+all the versions match.  */
  if (same_type_p (target_fn_type, static_fn_type (fn)))
-   matches = tree_cons (fn, NULL_TREE, matches);
+   {
+ matches = tree_cons (fn, NULL_TREE, matches);
+ /*If versioned, push all possible versions into a vector.  */
+ if (DECL_FUNCTION_VERSIONED (fn))
+   {
+ if (fn_ver_vec == NULL)
+  fn_ver_vec = VEC_alloc (tree, heap, 2);
+ VEC_safe_push (tree, heap, fn_ver_vec, fn);
+   }
+   }


Why do we need to keep both a list and vector of the matches?


+Call decls_match to make sure they are different because they are
+versioned.  */
+  if (DECL_FUNCTION_VERSIONED (fn))
+   {
+  for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
+   if (decls_match (fn, TREE_PURPOSE (match)))
+ break;
+   }


What if you have multiple matches that aren't all versions of the same 
function?


Why would it be a problem to have two separate declarations of the same 
function?



+  dispatcher_decl = targetm.get_function_versions_dispatcher (fn_ver_vec);


Is the idea here that if you have some versions declared, then a call, 
then more versions declared, then another call, you will call two 
different dispatchers, where the first one will only dispatch to the 
versions declared before the first call?  If not, why do we care about 
the set of declarations at this point?



+  /* Mark this functio to be output.  */
+  node-local.finalized = 1;


Missing 'n' in function.


@@ -14227,7 +14260,11 @@ cxx_comdat_group (tree decl)
  else
break;
}
-  name = DECL_ASSEMBLER_NAME (decl);
+  if (TREE_CODE (decl) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (decl))
+    name = DECL_NAME (decl);


This would mean that f in the global namespace and f in namespace foo 
would end up in the same comdat group.  Why do we need special handling 
here at all?



 dump_function_name (tree t, int flags)
 {
-  tree name = DECL_NAME (t);
+  tree name;

+  /* For function versions, use the assembler name as the decl name is
+ the same for all versions.  */
+  if (TREE_CODE (t) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (t))
+    name = DECL_ASSEMBLER_NAME (t);


This shouldn't be necessary; we should print the target attribute when 
printing the function declaration.



+Also, mark this function as needed if it is marked inline but
+is a multi-versioned function.  */
+  if (((flag_keep_inline_functions
+   || DECL_FUNCTION_VERSIONED (fn))


This should be marked as needed by the code that builds the dispatcher.


+  /* For calls to a multi-versioned function, overload resolution
+ returns the function with the highest target priority, that is,
+ the version that will checked for dispatching first.  If this
+ version is inlinable, a direct call to this version can be made
+ otherwise the call should go through the dispatcher.  */


I'm a bit confused why people would want both dispatched calls and 
non-dispatched inlining; I would expect that if a function can be 
compiled differently enough on newer hardware to make versioning 
worthwhile, that would be a larger difference than the call overhead.



+  if (DECL_FUNCTION_VERSIONED (fn)
+      && !targetm.target_option.can_inline_p (current_function_decl, fn))
+{
+  struct cgraph_node *dispatcher_node = NULL;
+  fn = get_function_version_dispatcher (fn);
+  if (fn == NULL)
+   return NULL;
+  dispatcher_node = cgraph_get_create_node (fn);
+  gcc_assert (dispatcher_node != NULL);
+  /* Mark this function to be output.  */
+  dispatcher_node-local.finalized = 1;
+}


Why do you need to mark this here?  If you generate a call to the 
dispatcher, cgraph should mark it to be output automatically.



+  /* For candidates of a multi-versioned function,  make the version with
+ the highest priority win.  This version will be checked for dispatching
+ first.  If this version can be inlined into the caller, the front-end
+ will simply make a direct call to this function.  */


This is still too high in joust.  I believe I said before that this code 
should come just above


   /* If the two function declarations represent the same function 
(this can
  happen with declarations in multiple scopes and arg-dependent 
lookup),
  arbitrarily choose one.  But first make sure the default args 
we're

  using match.  */


+  /* For multiversioned functions, aggregate all the versions here for
+ generating the dispatcher body later if necessary.  Check to see if
+ the dispatcher is already generated to avoid doing this more than
+ once.  */


This caching 

[0/8] Preparing for insv and ext(z)v optabs

2012-10-30 Thread Richard Sandiford
I'm finishing off some patches to allow insv, extv and extzv to be
defined as normal direct optabs (such as insvsi and insvdi rather
than just insv).  This series of patches does some groundwork to make
that possible.

The patches are not supposed to change the generated code.  I checked
that the assembly output for a set of gcc .ii files didn't change
for x86_64-linux-gnu, powerpc64-linux-gnu and mips64-linux-gnu.
Also regression-tested on x86_64-linux-gnu, powerpc64-linux-gnu
and mips64-elf.

Richard


[1/8] Remove redundant BLKmode test

2012-10-30 Thread Richard Sandiford
This patch removes what I believe is a redundant check in store_bit_field_1
for whether the value to insert (i.e. the rhs) has BLKmode.  We shouldn't
see BLKmode values here, and even if we did, the only effect of the test
is to fall through to store_fixed_bit_field, which can't handle BLKmode
either.  Specifically, store_fixed_bit_field would call:

  if (GET_MODE (value) != mode)
value = convert_to_mode (mode, value, 1);

and convert_to_mode ICEs on BLKmode values.

Tested as described in the covering note.  OK to install?

Richard


gcc/
* expmed.c (store_bit_field_1): Remove test for BLKmode values.

Index: gcc/expmed.c
===
--- gcc/expmed.c2012-10-28 10:40:22.533352589 +
+++ gcc/expmed.c2012-10-28 10:40:23.119352588 +
@@ -670,7 +670,6 @@ store_bit_field_1 (rtx str_rtx, unsigned
 
   enum machine_mode op_mode = mode_for_extraction (EP_insv, 3);
   if (HAVE_insv
-      && GET_MODE (value) != BLKmode
       && bitsize > 0
       && GET_MODE_BITSIZE (op_mode) >= bitsize
   /* Do not use insv for volatile bitfields when


[2/8] Remove redundant MAX_MACHINE_MODE tests

2012-10-30 Thread Richard Sandiford
extract_bit_field_1 has a block beginning:

  /* If OP0 is a memory, try copying it to a register and seeing if a
 cheap register alternative is available.  */
  if (ext_mode != MAX_MACHINE_MODE && MEM_P (op0))
{

and within it there are tests for whether ext_mode != MAX_MACHINE_MODE.
This patch removes them.

store_bit_field_1 has a similar block, but it uses:

  /* If OP0 is a memory, try copying it to a register and seeing if a
 cheap register alternative is available.  */
  if (HAVE_insv && MEM_P (op0))
{

However, by definition, HAVE_insv is equivalent to op_mode != MAX_MACHINE_MODE,
so this patch changes it to that form for consistency.  It's then obvious
that the corresponding op_mode != MAX_MACHINE_MODE tests are redundant too.
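
(Purely illustrative, not code from the patch -- the equivalence being
relied on, stated as an assertion:

  enum machine_mode op_mode = mode_for_extraction (EP_insv, 3);
  gcc_assert (HAVE_insv == (op_mode != MAX_MACHINE_MODE));

mode_for_extraction returns MAX_MACHINE_MODE exactly when the target
defines no insv pattern, i.e. when HAVE_insv is 0.)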

Tested as described in the covering note.  OK to install?

Richard


gcc/
* expmed.c (store_bit_field_1): Use OP_MODE to check whether an
insv pattern is available.  Remove redundant checks for OP_MODE
being MAX_MACHINE_MODE.
(extract_bit_field_1): Remove redundant checks for EXT_MODE being
MAX_MACHINE_MODE.

Index: gcc/expmed.c
===
--- gcc/expmed.c2012-10-28 10:54:24.0 +
+++ gcc/expmed.c2012-10-28 10:54:53.715350457 +
@@ -669,7 +669,7 @@ store_bit_field_1 (rtx str_rtx, unsigned
  in a word.  */
 
   enum machine_mode op_mode = mode_for_extraction (EP_insv, 3);
-  if (HAVE_insv
+  if (op_mode != MAX_MACHINE_MODE
       && bitsize > 0
       && GET_MODE_BITSIZE (op_mode) >= bitsize
   /* Do not use insv for volatile bitfields when
@@ -788,7 +788,7 @@ store_bit_field_1 (rtx str_rtx, unsigned
 
   /* If OP0 is a memory, try copying it to a register and seeing if a
  cheap register alternative is available.  */
 -  if (HAVE_insv && MEM_P (op0))
 +  if (op_mode != MAX_MACHINE_MODE && MEM_P (op0))
 {
   enum machine_mode bestmode;
   unsigned HOST_WIDE_INT maxbits = MAX_FIXED_MODE_SIZE;
@@ -803,13 +803,10 @@ store_bit_field_1 (rtx str_rtx, unsigned
 
   if (GET_MODE (op0) == BLKmode
 	  || GET_MODE_BITSIZE (GET_MODE (op0)) > maxbits
 -	  || (op_mode != MAX_MACHINE_MODE
 -	      && GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (op_mode)))
 +	  || GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (op_mode))
bestmode = get_best_mode (bitsize, bitnum,
  bitregion_start, bitregion_end,
- MEM_ALIGN (op0),
- (op_mode == MAX_MACHINE_MODE
-  ? VOIDmode : op_mode),
+ MEM_ALIGN (op0), op_mode,
  MEM_VOLATILE_P (op0));
   else
bestmode = GET_MODE (op0);
@@ -1597,12 +1594,9 @@ extract_bit_field_1 (rtx str_rtx, unsign
 smallest mode containing the field.  */
 
   if (GET_MODE (op0) == BLKmode
 -      || (ext_mode != MAX_MACHINE_MODE
 -	  && GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (ext_mode)))
 +      || GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (ext_mode))
bestmode = get_best_mode (bitsize, bitnum, 0, 0, MEM_ALIGN (op0),
- (ext_mode == MAX_MACHINE_MODE
-  ? VOIDmode : ext_mode),
- MEM_VOLATILE_P (op0));
+ ext_mode, MEM_VOLATILE_P (op0));
   else
bestmode = GET_MODE (op0);
 


[3/8] Split out insv and ext(z)v code

2012-10-30 Thread Richard Sandiford
This patch splits out the code to handle insv and ext(z)v from
store_bit_field_1 and extract_bit_field_1 respectively.  I removed
x prefixes from some of the variables and tried to make the placement
of the REG and SUBREG handling more consistent, but there are no
behavioural changes.

Tested as described in the covering note.  OK to install?

Richard


gcc/
* expmed.c (store_bit_field_using_insv): New function,
split out from...
(store_bit_field_1): ...here.
(extract_bit_field_using_extv): New function, split out from...
(extract_bit_field_1): ...here.

Index: gcc/expmed.c
===
--- gcc/expmed.c2012-10-29 12:56:45.260327155 +
+++ gcc/expmed.c2012-10-29 13:11:08.244325044 +
@@ -404,6 +404,120 @@ lowpart_bit_field_p (unsigned HOST_WIDE_
 return bitnum % BITS_PER_WORD == 0;
 }
 
+/* Try to use an insv pattern to store VALUE into a field of OP0.
+   OP_MODE is the mode of the insertion and BITSIZE and BITNUM are
+   as for store_bit_field.  */
+
+static bool
+store_bit_field_using_insv (rtx op0, unsigned HOST_WIDE_INT bitsize,
+   unsigned HOST_WIDE_INT bitnum, rtx value,
+   enum machine_mode op_mode)
+{
+  struct expand_operand ops[4];
+  rtx value1;
+  rtx xop0 = op0;
+  rtx last = get_last_insn ();
+  bool copy_back = false;
+
+  unsigned int unit = GET_MODE_BITSIZE (op_mode);
+  if (bitsize == 0 || bitsize > unit)
+return false;
+
+  if (MEM_P (xop0))
+{
+  /* Get a reference to the first byte of the field.  */
+  xop0 = adjust_bitfield_address (xop0, byte_mode, bitnum / BITS_PER_UNIT);
+  bitnum %= BITS_PER_UNIT;
+}
+  else
+{
+  /* Convert from counting within OP0 to counting in OP_MODE.  */
+  if (BYTES_BIG_ENDIAN)
+   bitnum += unit - GET_MODE_BITSIZE (GET_MODE (op0));
+
+  /* If xop0 is a register, we need it in OP_MODE
+to make it acceptable to the format of insv.  */
+  if (GET_CODE (xop0) == SUBREG)
+   /* We can't just change the mode, because this might clobber op0,
+  and we will need the original value of op0 if insv fails.  */
+   xop0 = gen_rtx_SUBREG (op_mode, SUBREG_REG (xop0), SUBREG_BYTE (xop0));
+  if (REG_P (xop0) && GET_MODE (xop0) != op_mode)
+   xop0 = gen_lowpart_SUBREG (op_mode, xop0);
+}
+
+  /* If the destination is a paradoxical subreg such that we need a
+ truncate to the inner mode, perform the insertion on a temporary and
+ truncate the result to the original destination.  Note that we can't
+ just truncate the paradoxical subreg as (truncate:N (subreg:W (reg:N
+ X) 0)) is (reg:N X).  */
+  if (GET_CODE (xop0) == SUBREG
+      && REG_P (SUBREG_REG (xop0))
+      && !TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (SUBREG_REG (xop0)),
+					 op_mode))
+{
+  rtx tem = gen_reg_rtx (op_mode);
+  emit_move_insn (tem, xop0);
+  xop0 = tem;
+  copy_back = true;
+}
+
+  /* If BITS_BIG_ENDIAN is zero on a BYTES_BIG_ENDIAN machine, we count
+ backwards from the size of the unit we are inserting into.
+ Otherwise, we count bits from the most significant on a
+ BYTES/BITS_BIG_ENDIAN machine.  */
+
+  if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
+bitnum = unit - bitsize - bitnum;
+
+  /* Convert VALUE to op_mode (which insv insn wants) in VALUE1.  */
+  value1 = value;
+  if (GET_MODE (value) != op_mode)
+{
+      if (GET_MODE_BITSIZE (GET_MODE (value)) >= bitsize)
+   {
+ /* Optimization: Don't bother really extending VALUE
+if it has all the bits we will actually use.  However,
+if we must narrow it, be sure we do it correctly.  */
+
+	  if (GET_MODE_SIZE (GET_MODE (value)) < GET_MODE_SIZE (op_mode))
+   {
+ rtx tmp;
+
+ tmp = simplify_subreg (op_mode, value1, GET_MODE (value), 0);
+ if (! tmp)
+   tmp = simplify_gen_subreg (op_mode,
+  force_reg (GET_MODE (value),
+ value1),
+  GET_MODE (value), 0);
+ value1 = tmp;
+   }
+ else
+   value1 = gen_lowpart (op_mode, value1);
+   }
+  else if (CONST_INT_P (value))
+   value1 = gen_int_mode (INTVAL (value), op_mode);
+  else
+   /* Parse phase is supposed to make VALUE's data type
+  match that of the component reference, which is a type
+  at least as wide as the field; so VALUE should have
+  a mode that corresponds to that type.  */
+   gcc_assert (CONSTANT_P (value));
+}
+
+  create_fixed_operand (ops[0], xop0);
+  create_integer_operand (ops[1], bitsize);
+  create_integer_operand (ops[2], bitnum);
+  create_input_operand (ops[3], value1, op_mode);
+  if (maybe_expand_insn (CODE_FOR_insv, 4, 

[4/8] Separate reg and mem uses of insv and ext(z)v

2012-10-30 Thread Richard Sandiford
This patch simply separates out the MEM and non-MEM insv and ext(z)v cases.
On its own, it's probably a wash whether this is an improvement or not,
but it makes the optabs patches much easier.

Tested as described in the covering note.  OK to install?

Richard


gcc/
* expmed.c (store_bit_field_1): Move generation of MEM insvs
to the MEM_P block.
(extract_bit_field_1): Likewise extvs and extzvs.

Index: gcc/expmed.c
===
--- gcc/expmed.c2012-10-28 10:55:22.754350385 +
+++ gcc/expmed.c2012-10-28 10:55:29.249350370 +
@@ -784,16 +784,7 @@ store_bit_field_1 (rtx str_rtx, unsigned
 
   enum machine_mode op_mode = mode_for_extraction (EP_insv, 3);
   if (op_mode != MAX_MACHINE_MODE
-      /* Do not use insv for volatile bitfields when
-	 -fstrict-volatile-bitfields is in effect.  */
-      && !(MEM_P (op0) && MEM_VOLATILE_P (op0)
-	   && flag_strict_volatile_bitfields > 0)
-      /* Do not use insv if the bit region is restricted and
-	 op_mode integer at offset doesn't fit into the
-	 restricted region.  */
-      && !(MEM_P (op0) && bitregion_end
-	   && bitnum - (bitnum % BITS_PER_UNIT) + GET_MODE_BITSIZE (op_mode)
-	      > bitregion_end + 1)
+      && !MEM_P (op0)
       && store_bit_field_using_insv (op0, bitsize, bitnum, value, op_mode))
 return true;
 
@@ -804,6 +795,18 @@ store_bit_field_1 (rtx str_rtx, unsigned
   enum machine_mode bestmode;
   unsigned HOST_WIDE_INT maxbits = MAX_FIXED_MODE_SIZE;
 
+      /* Do not use insv for volatile bitfields when
+	 -fstrict-volatile-bitfields is in effect.  */
+      if (!(MEM_VOLATILE_P (op0) && flag_strict_volatile_bitfields > 0)
+	  /* Do not use insv if the bit region is restricted and
+	     an op_mode integer doesn't fit into the restricted region.  */
+	  && !(bitregion_end
+	       && (bitnum - (bitnum % BITS_PER_UNIT)
+		   + GET_MODE_BITSIZE (op_mode)
+		   > bitregion_end + 1))
+	  && store_bit_field_using_insv (op0, bitsize, bitnum, value, op_mode))
+   return true;
+
   if (bitregion_end)
maxbits = bitregion_end - bitregion_start + 1;
 
@@ -1594,11 +1597,7 @@ extract_bit_field_1 (rtx str_rtx, unsign
  If OP0 is a register, it too fits within a word.  */
 
   ext_mode = mode_for_extraction (unsignedp ? EP_extzv : EP_extv, 0);
-  if (ext_mode != MAX_MACHINE_MODE
-      /* Do not use extv/extzv for volatile bitfields when
-	 -fstrict-volatile-bitfields is in effect.  */
-      && !(MEM_P (op0) && MEM_VOLATILE_P (op0)
-	   && flag_strict_volatile_bitfields > 0))
+  if (ext_mode != MAX_MACHINE_MODE && !MEM_P (op0))
 {
   rtx result = extract_bit_field_using_extv (op0, bitsize, bitnum,
 unsignedp, target, mode,
@@ -1613,6 +1612,17 @@ extract_bit_field_1 (rtx str_rtx, unsign
 {
   enum machine_mode bestmode;
 
+  /* Do not use extv/extzv for volatile bitfields when
+ -fstrict-volatile-bitfields is in effect.  */
+      if (!(MEM_VOLATILE_P (op0) && flag_strict_volatile_bitfields > 0))
+   {
+ rtx result = extract_bit_field_using_extv (op0, bitsize, bitnum,
+unsignedp, target, mode,
+tmode, ext_mode);
+ if (result)
+   return result;
+   }
+
   /* Get the mode to use for inserting into this field.  If
 OP0 is BLKmode, get the smallest mode consistent with the
 alignment. If OP0 is a non-BLKmode object that is no


Re: [PATCH][RFC] Sanity checking for -freorder-blocks-and-partition failures

2012-10-30 Thread Steven Bosscher
On Tue, Oct 30, 2012 at 5:59 PM, Teresa Johnson wrote:
 I will try testing your patch on top of mine with our fdo benchmarks.

Thanks. But you should expect a lot of errors, hopefully you can make
something out of it for Bugzilla.

 For the others on the cc list, you may need to include my patch as
 well for testing. Without it, -freorder-blocks-and-partition was DOA
 for me. For my patch, see
 http://gcc.gnu.org/ml/gcc-patches/2012-10/msg02692.html

Ah, the fate of a pass that isn't enabled by default. From what I can
tell, looking at this code the past few hours, it's hopelessly broken
at the moment:

* It doesn't work with cfglayout mode, even though it is in between
pass_into_cfg_layout_mode and pass_outof_cfg_layout_mode. Coming out
of cfglayout mode

* Coming out of cfglayout mode, fixup_reorder_chain adds
REG_CROSSING_JUMP -- but so does force_nonfallthru_and_redirect called
just before it. So we end up with more than one such note, or with notes
on jumps
that shouldn't have them.

* The scheduler doesn't respect the partitioning at all. remove_notes
happily removes the section split note.

* etc.


This pass may need some major work to be in good order again.

Ciao!
Steven


[5/8] Add narrow_bit_field_mem

2012-10-30 Thread Richard Sandiford
This patch splits out a fairly common operation: that of narrowing a MEM
to a particular mode and adjusting the bit number accordingly.
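
For example (numbers purely illustrative; MEM is whatever memory reference
the caller has):

  unsigned HOST_WIDE_INT bitnum = 37;
  mem = narrow_bit_field_mem (mem, byte_mode, bitnum, bitnum);
  /* The MEM was advanced by 37 / 8 == 4 bytes; bitnum is now 37 % 8 == 5.  */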

I've kept with bit_field rather than bitfield for consistency with
the callers, although we do have bitfield in adjust_bitfield_address.

Tested as described in the covering note.  OK to install?

Richard


gcc/
* expmed.c (narrow_bit_field_mem): New function.
(store_bit_field_using_insv, store_bit_field_1, store_fixed_bit_field)
(extract_bit_field_1): Use it.

Index: gcc/expmed.c
===
--- gcc/expmed.c2012-10-30 19:25:44.797368678 +
+++ gcc/expmed.c2012-10-30 19:25:47.730368671 +
@@ -387,6 +387,23 @@ mode_for_extraction (enum extraction_pat
   return data-operand[opno].mode;
 }
 
+/* Adjust bitfield memory MEM so that it points to the first unit of
+   mode MODE that contains the bitfield at bit position BITNUM.
+   Set NEW_BITNUM to the bit position of the field within the
+   new memory.  */
+
+static rtx
+narrow_bit_field_mem (rtx mem, enum machine_mode mode,
+ unsigned HOST_WIDE_INT bitnum,
+		      unsigned HOST_WIDE_INT &new_bitnum)
+{
+  unsigned int unit = GET_MODE_BITSIZE (mode);
+  unsigned HOST_WIDE_INT offset = bitnum / unit * GET_MODE_SIZE (mode);
+  mem = adjust_bitfield_address (mem, mode, offset);
+  new_bitnum = bitnum % unit;
+  return mem;
+}
+
 /* Return true if a bitfield of size BITSIZE at bit number BITNUM within
a structure of mode STRUCT_MODE represents a lowpart subreg.   The subreg
offset is then BITNUM / BITS_PER_UNIT.  */
@@ -424,11 +441,8 @@ store_bit_field_using_insv (rtx op0, uns
 return false;
 
   if (MEM_P (xop0))
-{
-  /* Get a reference to the first byte of the field.  */
-  xop0 = adjust_bitfield_address (xop0, byte_mode, bitnum / BITS_PER_UNIT);
-  bitnum %= BITS_PER_UNIT;
-}
+/* Get a reference to the first byte of the field.  */
+xop0 = narrow_bit_field_mem (xop0, byte_mode, bitnum, bitnum);
   else
 {
   /* Convert from counting within OP0 to counting in OP_MODE.  */
@@ -831,18 +845,14 @@ store_bit_field_1 (rtx str_rtx, unsigned
 	       && GET_MODE_BITSIZE (bestmode) > MEM_ALIGN (op0)))
{
  rtx last, tempreg, xop0;
- unsigned int unit;
- unsigned HOST_WIDE_INT offset, bitpos;
+ unsigned HOST_WIDE_INT bitpos;
 
  last = get_last_insn ();
 
  /* Adjust address to point to the containing unit of
 that mode.  Compute the offset as a multiple of this unit,
 counting in bytes.  */
- unit = GET_MODE_BITSIZE (bestmode);
- offset = (bitnum / unit) * GET_MODE_SIZE (bestmode);
- bitpos = bitnum % unit;
- xop0 = adjust_bitfield_address (op0, bestmode, offset);
+ xop0 = narrow_bit_field_mem (op0, bestmode, bitnum, bitpos);
 
  /* Fetch that unit, store the bitfield in it, then store
 the unit.  */
@@ -975,9 +985,7 @@ store_fixed_bit_field (rtx op0, unsigned
  return;
}
 
-  HOST_WIDE_INT bit_offset = bitnum - bitnum % GET_MODE_BITSIZE (mode);
-  op0 = adjust_bitfield_address (op0, mode, bit_offset / BITS_PER_UNIT);
-  bitnum -= bit_offset;
+  op0 = narrow_bit_field_mem (op0, mode, bitnum, bitnum);
 }
 
   mode = GET_MODE (op0);
@@ -1246,11 +1254,8 @@ extract_bit_field_using_extv (rtx op0, u
 return NULL_RTX;
 
   if (MEM_P (op0))
-{
-  /* Get a reference to the first byte of the field.  */
-  op0 = adjust_bitfield_address (op0, byte_mode, bitnum / BITS_PER_UNIT);
-  bitnum %= BITS_PER_UNIT;
-}
+/* Get a reference to the first byte of the field.  */
+op0 = narrow_bit_field_mem (op0, byte_mode, bitnum, bitnum);
   else
 {
   /* Convert from counting within OP0 to counting in EXT_MODE.  */
@@ -1640,23 +1645,19 @@ extract_bit_field_1 (rtx str_rtx, unsign
 	  && !(SLOW_UNALIGNED_ACCESS (bestmode, MEM_ALIGN (op0))
 	       && GET_MODE_BITSIZE (bestmode) > MEM_ALIGN (op0)))
{
- unsigned HOST_WIDE_INT offset, bitpos;
-
- /* Compute the offset as a multiple of this unit,
-counting in bytes.  */
- unsigned int unit = GET_MODE_BITSIZE (bestmode);
- offset = (bitnum / unit) * GET_MODE_SIZE (bestmode);
- bitpos = bitnum % unit;
+ unsigned HOST_WIDE_INT bitpos;
+ rtx xop0 = narrow_bit_field_mem (op0, bestmode, bitnum, bitpos);
 
- /* Make sure the register is big enough for the whole field.  */
-	  if (bitpos + bitsize <= unit)
+ /* Make sure the register is big enough for the whole field.
+(It might not be if bestmode == GET_MODE (op0) and the input
+code was invalid.)  */
+	  if (bitpos + bitsize <= GET_MODE_BITSIZE (bestmode))
{
- rtx last, result, xop0;
+ rtx last, result;
 
 

[6/8] Simplify make_extraction

2012-10-30 Thread Richard Sandiford
On a change of tack, this tackles some redundant code in combine.
It has code to convert a variable bit position to the mode required
by the bit position operand to insv, extv or extzv:

[A]
  else if (pos_rtx != 0
	   && GET_MODE_SIZE (pos_mode) < GET_MODE_SIZE (GET_MODE (pos_rtx)))
    pos_rtx = gen_lowpart (pos_mode, pos_rtx);

However, this is protected by an earlier:

[B]
  if (pos_rtx && GET_MODE (pos_rtx) != VOIDmode
      && GET_MODE_SIZE (pos_mode) < GET_MODE_SIZE (GET_MODE (pos_rtx)))
pos_mode = GET_MODE (pos_rtx);

so [B] makes [A] redundant.  That leaves this block:

  /* Adjust mode of POS_RTX, if needed.  If we want a wider mode, we
 have to zero extend.  Otherwise, we can just use a SUBREG.  */
  if (pos_rtx != 0
      && GET_MODE_SIZE (pos_mode) > GET_MODE_SIZE (GET_MODE (pos_rtx)))
{

as the only use of pos_mode, so [B] can go too.

Similarly, the memory case has:

[C]
  /* If we have to change the mode of memory and cannot, the desired mode
 is EXTRACTION_MODE.  */
  if (inner_mode != wanted_inner_mode
      && (mode_dependent_address_p (XEXP (inner, 0), MEM_ADDR_SPACE (inner))
	  || MEM_VOLATILE_P (inner)
	  || pos_rtx))
wanted_inner_mode = extraction_mode;

but those same conditions are repeated in the code that actually
applies wanted_inner_mode:

  /* If INNER has a wider mode, and this is a constant extraction, try to
 make it smaller and adjust the byte to point to the byte containing
 the value.  */
  if (wanted_inner_mode != VOIDmode
      && inner_mode != wanted_inner_mode
      && ! pos_rtx
      && GET_MODE_SIZE (wanted_inner_mode) < GET_MODE_SIZE (is_mode)
      && MEM_P (inner)
      && ! mode_dependent_address_p (XEXP (inner, 0), MEM_ADDR_SPACE (inner))
      && ! MEM_VOLATILE_P (inner))
{

so [C] too is unnecessary.

Tested as described in the covering note.  OK to install?

Richard


gcc/
* combine.c (make_extraction): Remove dead wanted_inner_mode-
and pos_rtx-related code.

Index: gcc/combine.c
===
--- gcc/combine.c   2012-10-27 22:22:45.0 +0100
+++ gcc/combine.c   2012-10-27 22:32:26.811370582 +0100
@@ -7211,10 +7211,6 @@ make_extraction (enum machine_mode mode,
GET_MODE_SIZE (extraction_mode)  GET_MODE_SIZE (mode))
 extraction_mode = mode;
 
-  if (pos_rtx && GET_MODE (pos_rtx) != VOIDmode
-      && GET_MODE_SIZE (pos_mode) < GET_MODE_SIZE (GET_MODE (pos_rtx)))
-    pos_mode = GET_MODE (pos_rtx);
-
   /* If this is not from memory, the desired mode is the preferred mode
  for an extraction pattern's first input operand, or word_mode if there
  is none.  */
@@ -7231,14 +7227,6 @@ make_extraction (enum machine_mode mode,
  wanted_inner_mode = GET_MODE_WIDER_MODE (wanted_inner_mode);
  gcc_assert (wanted_inner_mode != VOIDmode);
}
-
-  /* If we have to change the mode of memory and cannot, the desired mode
-is EXTRACTION_MODE.  */
-  if (inner_mode != wanted_inner_mode
-      && (mode_dependent_address_p (XEXP (inner, 0), MEM_ADDR_SPACE (inner))
- || MEM_VOLATILE_P (inner)
- || pos_rtx))
-   wanted_inner_mode = extraction_mode;
 }
 
   orig_pos = pos;
@@ -7359,9 +7347,6 @@ make_extraction (enum machine_mode mode,
}
   pos_rtx = temp;
 }
-  else if (pos_rtx != 0
-	   && GET_MODE_SIZE (pos_mode) < GET_MODE_SIZE (GET_MODE (pos_rtx)))
-    pos_rtx = gen_lowpart (pos_mode, pos_rtx);
 
   /* Make POS_RTX unless we already have it and it is correct.  If we don't
  have a POS_RTX but we do have an ORIG_POS_RTX, the latter must


Re: [PATCH][RFC] Sanity checking for -freorder-blocks-and-partition failures

2012-10-30 Thread Xinliang David Li
On Tue, Oct 30, 2012 at 12:25 PM, Steven Bosscher stevenb@gmail.com wrote:
 On Tue, Oct 30, 2012 at 5:59 PM, Teresa Johnson wrote:
 I will try testing your patch on top of mine with our fdo benchmarks.

 Thanks. But you should expect a lot of errors, hopefully you can make
 something out of it for Bugzilla.

 For the others on the cc list, you may need to include my patch as
 well for testing. Without it, -freorder-blocks-and-partition was DOA
 for me. For my patch, see
 http://gcc.gnu.org/ml/gcc-patches/2012-10/msg02692.html

 Ah, the fate of a pass that isn't enabled by default.

I see only 5 test cases under tree-prof for function splitting. More
test cases are needed.

David


 From what I can
 tell, looking at this code the past few hours, it's hopelessly broken
 at the moment:

 * It doesn't work with cfglayout mode, even though it is in between
 pass_into_cfg_layout_mode and pass_outof_cfg_layout_mode. Coming out
 of cfglayout mode

 * Coming out of cfglayout mode, fixup_reorder_chain adds
 REG_CROSSING_JUMP -- but so does force_nonfallthru_and_redirect called
 just before it. So we end up with more than one such note, or with notes
 on jumps
 that shouldn't have them.

 * The scheduler doesn't respect the partitioning at all. remove_notes
 happily removes the section split note.

 * etc.


 This pass may need some major work to be in good order again.

 Ciao!
 Steven


[7/8] BITS_BIG_ENDIAN vs. (zero_extract (const_int ...) ...)

2012-10-30 Thread Richard Sandiford
Combine tries to optimise comparisons involving:

(zero_extract (const_int X)
  (const_int 1)
  (var Y))

and so, on BITS_BIG_ENDIAN targets, it tries gamely to work out what mode
X actually has.  At the moment it tries reading the mode from operand 1
of extzv, but that doesn't feel right, since we never use extzv itself
with this combination of operands.  (We only use it with combinations
in which the first zero_extract operand is variable and the third is
constant.)  And extzv isn't necessarily a good indicator of what a
matched zero_extract does, since targets often have matchable zero_extract
insns for more than one mode.  E.g. powerpc (a BITS_BIG_ENDIAN target)
has both SImode and DImode patterns.

In practice, all BITS_BIG_ENDIAN targets that have an extzv pattern
either have an explicit word_mode operand (e.g. m68k) or leave it void,
which mode_for_extraction treats as word_mode.  Since word_mode is also
the default assumption when no extzv pattern is defined, I think it would
be more robust to assume/require word_mode across the board (as much as
anything can be called robust in this sort of situation).
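
A source-level example of where this arises (illustrative; whether combine
actually forms the zero_extract depends on the target):

int
bit_set_p (int y)
{
  /* On a BITS_BIG_ENDIAN target this can become
     (zero_extract (const_int 4) (const_int 1) (reg y)).  */
  return (4 >> y) & 1;
}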

Tested as described in the covering note.  OK to install?

Richard


gcc/
* combine.c (simplify_comparison): If BITS_BIG_ENDIAN, always assume
that zero_extracts of const_ints are doing word-sized extractions.

Index: gcc/combine.c
===
--- gcc/combine.c   2012-10-29 14:14:36.371315725 +
+++ gcc/combine.c   2012-10-29 14:29:26.800313546 +
@@ -11154,17 +11154,7 @@ simplify_comparison (enum rtx_code code,
	  && (i = exact_log2 (UINTVAL (XEXP (op0, 0)))) >= 0)
{
  if (BITS_BIG_ENDIAN)
-   {
- enum machine_mode new_mode
-   = mode_for_extraction (EP_extzv, 1);
- if (new_mode == MAX_MACHINE_MODE)
-   i = BITS_PER_WORD - 1 - i;
- else
-   {
- mode = new_mode;
- i = (GET_MODE_PRECISION (mode) - 1 - i);
-   }
-   }
+   i = BITS_PER_WORD - 1 - i;
 
  op0 = XEXP (op0, 2);
  op1 = GEN_INT (i);


Re: [PATCH, GCC 4.7] Backport fix for PR tree-optimization/53708

2012-10-30 Thread Jakub Jelinek
On Tue, Oct 30, 2012 at 02:03:44PM -0500, Peter Bergner wrote:
 Ok, then I'll bootstrap and regtest your suggested change while we
 wait for richi to comment.  I'm fine with whatever you and richi
 decide is best.  The ObjC guys should probably test it though too.
 
 I assume you think we should change the current trunk code as well?

Well, I haven't looked at the ObjC failures (guess they are darwin only
anyway), so have no idea whether those set section name or not.
So, if my proposed test instead of the trunk one doesn't work for darwin,
perhaps it could be DECL_PRESERVED_P (decl) || (DECL_SECTION_NAME (decl) &&
!...).  I think DECL_USER_ALIGN test is undesirable for that though, it is
just fine to increase alignment of anything, after all, it is still aligned
properly then.

Jakub

