Re: [Patch,AVR]: Fix PR50910: int/2 leads to libgcc call

2011-11-01 Thread Denis Chertykov
2011/10/31 Georg-Johann Lay a...@gjlay.de:
 This is a fix for an optimization flaw when dividing int by 2.

 There is really no need for a library call. Costs of [U]DIV/[U]MOD are adjusted
 to take into account the costs of CONST_INT operands that must be loaded for
 division by means of libgcc call.

 There are some new combiner patterns suffixed .lt0 that perform an adjustment
 frequently seen when division-by-const is lowered to arithmetic in order to
 avoid a more expensive libcall.
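
For illustration only (not taken from the patch): a minimal C sketch of the kind of lowering the .lt0 patterns are meant to match. Signed division by 2 truncates toward zero, so the "x < 0" bit is added before shifting right. The name div2_lowered is hypothetical, and the sketch assumes that >> on a negative signed value is an arithmetic shift, which GCC documents but ISO C leaves implementation-defined.

```c
#include <stdint.h>

/* Hypothetical helper: x / 2 without a division instruction.
   Adding (x < 0) makes the shift round negative values toward zero. */
static int16_t div2_lowered(int16_t x)
{
    int lt0 = x < 0;                    /* 1 for negative x, else 0 */
    return (int16_t)((x + lt0) >> 1);   /* computed in int, so no overflow */
}
```

The addition is done after integer promotion, so even INT16_MIN and INT16_MAX are handled without overflow in this model.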

 Moreover, there are two patterns for adding sign-extended QI to HI. These
 patterns are shorter, faster and have lower register pressure than explicitly
 sign-extending the QI before adding it.  Example code is:

 int add (int a, char b) { return a + b; }
 int sub (int a, char b) { return a - b; }

 add:
        add r24,r22      ;  13  *addhi3.sign_extend1    [length = 4]
        adc r25,__zero_reg__
        sbrc r22,7
        dec r25
        ret

 sub:
        sub r24,r22      ;  13  *subhi3.sign_extend2    [length = 4]
        sbc r25,__zero_reg__
        sbrc r22,7
        inc r25
        ret

 The reg_overlap_mentioned case is just for pathological code like, e.g.
   a + (char) a
 so that the expected size is 4 instructions.
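
As a cross-check of the two sequences above, here is a small C model of the four instructions (a sketch, not code from the patch). The function names are hypothetical; r24/r25 are modelled as the low/high bytes of a and r22 as b.

```c
#include <stdint.h>

/* Model of: add r24,r22 / adc r25,__zero_reg__ / sbrc r22,7 / dec r25 */
static uint16_t addhi_sign_extend_model(uint16_t a, uint8_t b)
{
    uint8_t lo = (uint8_t)(a & 0xff);
    uint8_t hi = (uint8_t)(a >> 8);
    unsigned sum = (unsigned)lo + b;        /* add: low byte plus b */
    lo = (uint8_t)sum;
    hi = (uint8_t)(hi + (sum >> 8));        /* adc: propagate the carry */
    if (b & 0x80)                           /* sbrc skips the dec when bit 7 is clear */
        hi = (uint8_t)(hi - 1);             /* dec: account for the sign extension */
    return (uint16_t)((hi << 8) | lo);
}

/* Model of: sub r24,r22 / sbc r25,__zero_reg__ / sbrc r22,7 / inc r25 */
static uint16_t subhi_sign_extend_model(uint16_t a, uint8_t b)
{
    uint8_t lo = (uint8_t)(a & 0xff);
    uint8_t hi = (uint8_t)(a >> 8);
    unsigned borrow = lo < b;               /* sub: borrow out of the low byte */
    lo = (uint8_t)(lo - b);
    hi = (uint8_t)(hi - borrow);            /* sbc: propagate the borrow */
    if (b & 0x80)
        hi = (uint8_t)(hi + 1);             /* inc: account for the sign extension */
    return (uint16_t)((hi << 8) | lo);
}
```

Both models agree with a 16-bit a + (signed char) b and a - (signed char) b modulo 2^16 for every input pair.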

 Since beginning of time, BRANCH_COST was set to 0 so that some optimization
 passes make code happily jumping around. The patch introduces a new command
 line option for that; mainly because I don't know the rationale behind setting
 BRANCH_COST to 0.

 Regression-tested.

 Ok for trunk?

 Johann

        * config/avr/avr.opt (-mbranch-cost=): New option.
        * config/avr/avr.h (BRANCH_COST): Define to avr_branch_cost.
        * config/avr/avr.c (avr_rtx_costs_1): Adjust [U]DIV/[U]MOD costs.
        * config/avr/avr.md (*addqi3.lt0, *addhi3.lt0, *addsi3.lt0): New insns.
        (*addhi3_zero_extend1): Remove % in constraint of operand 1.
        (*addhi3.sign_extend1, *subhi3.sign_extend2): New insns.


Approved.

Denis.


[PATCH] Add vcond/vcondu patterns to sparc backend.

2011-11-01 Thread David Miller

I really wanted to make this work using the define_expand rtl to
generate the pattern, but I ran into two problems:

1) In addition to mode GCM, we also need to iterate over P mode
   for the sake of the rtl of fpcmp and cmask.  So we'd get dups in
   the insn output files.

2) I couldn't substitute the mode attribute gcm_name into the
   cmask unspec code.  ie. UNSPEC_CMASK<gcm_name> didn't work.

Anyways, at least there is one expander function shared between the
signed and unsigned cases.
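
For readers unfamiliar with vcond, a scalar C model of the semantics may help. This is only a sketch under assumptions: it models the unsigned V8QI case with a "greater than" comparison, the function names are hypothetical, and it does not model the actual GSR or bshuffle encodings. It only mirrors the three steps of the expander: compare into a mask, then select per lane.

```c
#include <stdint.h>

enum { LANES = 8 };

/* Stands in for the compare step: one mask bit per lane, set when a[i] > b[i]. */
static unsigned cmp_gtu_mask(const uint8_t a[LANES], const uint8_t b[LANES])
{
    unsigned mask = 0;
    for (int i = 0; i < LANES; i++)
        if (a[i] > b[i])
            mask |= 1u << i;
    return mask;
}

/* Stands in for cmask + bshuffle: take lane i from t when its mask bit is set,
   otherwise from f. */
static void select_by_mask(uint8_t dst[LANES], const uint8_t t[LANES],
                           const uint8_t f[LANES], unsigned mask)
{
    for (int i = 0; i < LANES; i++)
        dst[i] = ((mask >> i) & 1u) ? t[i] : f[i];
}

/* dst = (a > b) ? t : f, lane by lane. */
static void vcondu_model(uint8_t dst[LANES], const uint8_t t[LANES],
                         const uint8_t f[LANES], const uint8_t a[LANES],
                         const uint8_t b[LANES])
{
    select_by_mask(dst, t, f, cmp_gtu_mask(a, b));
}
```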

Committed to trunk.

gcc/

* config/sparc/sparc.c (sparc_expand_vcond): New function.
* config/sparc/sparc-protos.h (sparc_expand_vcond): Declare it.
* config/sparc/sparc.md (vcond<mode><mode>): New VIS3 expander.
(vconduv8qiv8qi): Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@180733 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog   |7 +++
 gcc/config/sparc/sparc-protos.h |1 +
 gcc/config/sparc/sparc.c|   37 +
 gcc/config/sparc/sparc.md   |   30 ++
 4 files changed, 75 insertions(+), 0 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index d5f725b..d6a9c4d 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2011-11-01  David S. Miller  da...@davemloft.net
+
+   * config/sparc/sparc.c (sparc_expand_vcond): New function.
+   * config/sparc/sparc-protos.h (sparc_expand_vcond): Declare it.
+   * config/sparc/sparc.md (vcond<mode><mode>): New VIS3 expander.
+   (vconduv8qiv8qi): Likewise.
+
 2011-11-01  Alexandre Oliva  aol...@redhat.com
 
PR debug/50869
diff --git a/gcc/config/sparc/sparc-protos.h b/gcc/config/sparc/sparc-protos.h
index 108e105..b9a094e 100644
--- a/gcc/config/sparc/sparc-protos.h
+++ b/gcc/config/sparc/sparc-protos.h
@@ -108,6 +108,7 @@ extern const char *output_v8plus_mult (rtx, rtx *, const char *);
 extern void sparc_expand_vector_init (rtx, rtx);
 extern void sparc_expand_vec_perm_bmask(enum machine_mode, rtx);
 extern bool sparc_expand_conditional_move (enum machine_mode, rtx *);
+extern void sparc_expand_vcond (enum machine_mode, rtx *, int, int);
 #endif /* RTX_CODE */
 
 #endif /* __SPARC_PROTOS_H__ */
diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index fd1b190..6431405 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -11531,4 +11531,41 @@ sparc_expand_conditional_move (enum machine_mode mode, rtx *operands)
   return true;
 }
 
+void
+sparc_expand_vcond (enum machine_mode mode, rtx *operands, int ccode, int fcode)
+{
+  rtx mask, cop0, cop1, fcmp, cmask, bshuf, gsr;
+  enum rtx_code code = GET_CODE (operands[3]);
+
+  mask = gen_reg_rtx (Pmode);
+  cop0 = operands[4];
+  cop1 = operands[5];
+  if (code == LT || code == GE)
+    {
+      rtx t;
+
+      code = swap_condition (code);
+      t = cop0; cop0 = cop1; cop1 = t;
+    }
+
+  gsr = gen_rtx_REG (DImode, SPARC_GSR_REG);
+
+  fcmp = gen_rtx_UNSPEC (Pmode,
+                         gen_rtvec (1, gen_rtx_fmt_ee (code, mode, cop0, cop1)),
+                         fcode);
+
+  cmask = gen_rtx_UNSPEC (DImode,
+                          gen_rtvec (2, mask, gsr),
+                          ccode);
+
+  bshuf = gen_rtx_UNSPEC (mode,
+                          gen_rtvec (3, operands[1], operands[2], gsr),
+                          UNSPEC_BSHUFFLE);
+
+  emit_insn (gen_rtx_SET (VOIDmode, mask, fcmp));
+  emit_insn (gen_rtx_SET (VOIDmode, gsr, cmask));
+
+  emit_insn (gen_rtx_SET (VOIDmode, operands[0], bshuf));
+}
+
 #include "gt-sparc.h"
diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md
index fbd1a87..5924403 100644
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@@ -8299,6 +8299,36 @@
   [(set_attr "type" "fpmul")
    (set_attr "fptype" "double")])
 
+(define_expand "vcond<mode><mode>"
+  [(match_operand:GCM 0 "register_operand" "")
+   (match_operand:GCM 1 "register_operand" "")
+   (match_operand:GCM 2 "register_operand" "")
+   (match_operator 3 ""
+     [(match_operand:GCM 4 "register_operand" "")
+      (match_operand:GCM 5 "register_operand" "")])]
+  "TARGET_VIS3"
+{
+  sparc_expand_vcond (<MODE>mode, operands,
+                      UNSPEC_CMASK<gcm_name>,
+                      UNSPEC_FCMP);
+  DONE;
+})
+
+(define_expand "vconduv8qiv8qi"
+  [(match_operand:V8QI 0 "register_operand" "")
+   (match_operand:V8QI 1 "register_operand" "")
+   (match_operand:V8QI 2 "register_operand" "")
+   (match_operator 3 ""
+     [(match_operand:V8QI 4 "register_operand" "")
+      (match_operand:V8QI 5 "register_operand" "")])]
+  "TARGET_VIS3"
+{
+  sparc_expand_vcond (V8QImode, operands,
+                      UNSPEC_CMASK8,
+                      UNSPEC_FUCMP);
+  DONE;
+})
+
 (define_insn "array8<P:mode>_vis"
   [(set (match_operand:P 0 "register_operand" "=r")
         (unspec:P [(match_operand:P 1 "register_or_zero_operand" "rJ")
-- 
1.7.6.401.g6a319



RFA: Fix dse / postreload not to bypass add expanders

2011-11-01 Thread Joern Rennecke

This patch makes emit_inc_dec_insn_before use add3_insn / gen_move_insn
so that the appropriate expanders are used to create the new instructions,
and for dse it uses the available register liveness information to check
that no live fixed hard register, like a flags register, is clobbered in the
process.  For postreload, there is no such information available, so we
give up when we see a clobber / set that might be problematic.
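
To make the check concrete, here is a deliberately simplified C model. It is an assumption-laden sketch, not the patch's code: GCC uses regsets and walks the stores of the new insn, whereas this model uses one bit per hard register, and the names regmask_t and clobbers_are_safe are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t regmask_t;   /* hypothetical: one bit per hard register */

/* True when a newly generated insn may be emitted: none of the registers
   it sets or clobbers is a fixed register that is live at this point. */
static bool clobbers_are_safe(regmask_t clobbered, regmask_t fixed,
                              regmask_t live)
{
    return (clobbered & fixed & live) == 0;
}
```

In this model the postreload case, where no liveness is available, corresponds to treating every fixed register as live.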

regtested for epiphany-elf with modified rtx_cost, where it fixes
three ICE-on-valid-code:
FAIL: gcc.c-torture/execute/builtins/memcpy-chk.c compilation,  -O1  (internal compiler error)
FAIL: gcc.c-torture/execute/builtins/memmove-chk.c compilation,  -O1  (internal compiler error)
FAIL: gcc.c-torture/execute/memcpy-bi.c compilation,  -O1  (internal compiler error)


Bootstrapped and regression tested on i686-pc-linux-gnu .
2011-10-31  Joern Rennecke joern.renne...@embecosm.com

* regset.h (fixed_regset): Declare.
* dse.c: Include regset.h .
(struct insn_info): Add member fixed_regs_live.
(note_add_store_info): New typedef.
(note_add_store): New function.
(emit_inc_dec_insn_before): Expect arg to be of type insn_info_t .
Use gen_add3_insn / gen_move_insn.
Check new insn for unwanted clobbers before emitting it.
(check_for_inc_dec): Rename to...
(check_for_inc_dec_1): ... this.  Return bool.  Take insn_info
parameter.  Changed all callers in file.
(check_for_inc_dec, copy_fixed_regs): New functions.
(scan_insn): Set fixed_regs_live field of insn_info.
* rtl.h (check_for_inc_dec): Update prototype.
* postreload.c (reload_cse_simplify): Take new signature of
check_for_inc_dec into account.
* reginfo.c (fixed_regset): New variable.
(init_reg_sets_1): Initialize it.

Index: postreload.c
===
--- postreload.c(revision 180683)
+++ postreload.c(working copy)
@@ -112,8 +112,8 @@ reload_cse_simplify (rtx insn, rtx testr
  if (REG_P (value)
   && ! REG_FUNCTION_VALUE_P (value))
value = 0;
- check_for_inc_dec (insn);
- delete_insn_and_edges (insn);
+ if (check_for_inc_dec (insn, NULL))
+   delete_insn_and_edges (insn);
  return;
}
 
@@ -164,8 +164,8 @@ reload_cse_simplify (rtx insn, rtx testr
 
   if (i < 0)
{
- check_for_inc_dec (insn);
- delete_insn_and_edges (insn);
+ if (check_for_inc_dec (insn, NULL))
+   delete_insn_and_edges (insn);
  /* We're done with this insn.  */
  return;
}
Index: regset.h
===
--- regset.h(revision 180683)
+++ regset.h(working copy)
@@ -1,6 +1,6 @@
 /* Define regsets.
Copyright (C) 1987, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004,
-   2005, 2006, 2007, 2008, 2009, 2010 Free Software Foundation, Inc.
+   2005, 2006, 2007, 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
 
 This file is part of GCC.
 
@@ -115,6 +115,9 @@ #define EXECUTE_IF_AND_IN_REG_SET(REGSET
 
 extern regset regs_invalidated_by_call_regset;
 
+/* Same information as FIXED_REG_SET but in regset form.  */
+extern regset fixed_regset;
+
 /* An obstack for regsets.  */
 extern bitmap_obstack reg_obstack;
 
Index: dse.c
===
--- dse.c   (revision 180683)
+++ dse.c   (working copy)
@@ -33,6 +33,7 @@ Software Foundation; either version 3, o
 #include "tm_p.h"
 #include "regs.h"
 #include "hard-reg-set.h"
+#include "regset.h"
 #include "flags.h"
 #include "df.h"
 #include "cselib.h"
@@ -377,6 +378,13 @@ struct insn_info
  created.  */
   read_info_t read_rec;
 
+  /* The live fixed registers.  We assume only fixed registers can
+ cause trouble by being clobbered from an expanded pattern;
+ storing only the live fixed registers (rather than all registers)
+ means less memory needs to be allocated / copied for the individual
+ stores.  */
+  regset fixed_regs_live;
+
   /* The prev insn in the basic block.  */
   struct insn_info * prev_insn;
 
@@ -448,9 +456,9 @@ struct bb_info
   /* The following bitvector is indexed by the reg number.  It
  contains the set of regs that are live at the current instruction
  being processed.  While it contains info for all of the
- registers, only the pseudos are actually examined.  It is used to
- assure that shift sequences that are inserted do not accidently
- clobber live hard regs.  */
+ registers, only the hard registers are actually examined.  It is used
+ to assure that shift and/or add sequences that are inserted do not
+ accidently clobber live hard regs.  */
   bitmap regs_live;
 };
 
@@ -827,6 +835,51 @@ free_store_info (insn_info_t insn_info)
   insn_info->store_rec = NULL;
 }
 
+typedef struct
+{
+  rtx 

[PATCH] Add vec_pack_ufix_trunc_{v4df,v2df} expanders

2011-11-01 Thread Jakub Jelinek
Hi!

Similarly to the V{4,8}SFmode -> unsigned V{4,8}SImode conversion
support for AVX this one adds V{2,4}DFmode -> unsigned V{4,8}SImode
conversion.
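
A scalar C model of the per-lane arithmetic may make the sequence in the expander easier to follow. This is a sketch under assumptions, not the patch itself: the name ufix_trunc_model is hypothetical, the input is assumed to satisfy 0 <= x < 2^32, and the result is only claimed to match a direct unsigned conversion for integral inputs (a fractional input at or above 2^31 is truncated toward zero after the shift in this model).

```c
#include <stdint.h>

/* One lane of the conversion: values at or above 2^31 are shifted down
   by 2^32 so that a signed truncating conversion can be used, and the
   signed result is then reinterpreted modulo 2^32. */
static uint32_t ufix_trunc_model(double x)
{
    const double two31 = 2147483648.0;     /* 2^31 */
    const double mtwo32 = -4294967296.0;   /* -2^32 */
    double adj = (x >= two31) ? mtwo32 : 0.0;  /* compare, then mask -2^32 */
    int32_t s = (int32_t)(x + adj);            /* signed truncating convert */
    return (uint32_t)s;                        /* same bits, read as unsigned */
}
```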

Ok for trunk?

2011-11-01  Jakub Jelinek  ja...@redhat.com

* config/i386/sse.md (ssepackfltmode): New mode attr.
(vec_pack_ufix_trunc_<mode>): New expander using VF2 iterator.

--- gcc/config/i386/sse.md.jj   2011-11-01 09:04:37.0 +0100
+++ gcc/config/i386/sse.md  2011-11-01 09:37:36.0 +0100
@@ -3127,6 +3127,56 @@ (define_expand "vec_pack_sfix_trunc_v2df"
   DONE;
 })
 
+(define_mode_attr ssepackfltmode
+  [(V4DF "V8SI") (V2DF "V4SI")])
+
+(define_expand "vec_pack_ufix_trunc_<mode>"
+  [(match_operand:<ssepackfltmode> 0 "register_operand" "")
+   (match_operand:VF2 1 "register_operand" "")
+   (match_operand:VF2 2 "register_operand" "")]
+  "TARGET_AVX"
+{
+  REAL_VALUE_TYPE MTWO32r, TWO31r;
+  rtx two31r, mtwo32r, tmp[8];
+  int i;
+
+  for (i = 0; i < 6; i++)
+    tmp[i] = gen_reg_rtx (<MODE>mode);
+  tmp[6] = gen_reg_rtx (<ssepackfltmode>mode);
+  tmp[7] = gen_reg_rtx (<ssepackfltmode>mode);
+  real_ldexp (&TWO31r, &dconst1, 31);
+  two31r = const_double_from_real_value (TWO31r, DFmode);
+  two31r = ix86_build_const_vector (<MODE>mode, 1, two31r);
+  two31r = force_reg (<MODE>mode, two31r);
+  real_ldexp (&MTWO32r, &dconstm1, 32);
+  mtwo32r = const_double_from_real_value (MTWO32r, DFmode);
+  mtwo32r = ix86_build_const_vector (<MODE>mode, 1, mtwo32r);
+  mtwo32r = force_reg (<MODE>mode, mtwo32r);
+  emit_insn (gen_avx_cmp<mode>3 (tmp[0], operands[1], two31r, GEN_INT (29)));
+  emit_insn (gen_avx_cmp<mode>3 (tmp[1], operands[2], two31r, GEN_INT (29)));
+  emit_insn (gen_and<mode>3 (tmp[2], tmp[0], mtwo32r));
+  emit_insn (gen_and<mode>3 (tmp[3], tmp[1], mtwo32r));
+  emit_insn (gen_add<mode>3 (tmp[4], operands[1], tmp[2]));
+  emit_insn (gen_add<mode>3 (tmp[5], operands[2], tmp[3]));
+  if (<MODE>mode == V4DFmode)
+    {
+      emit_insn (gen_avx_cvttpd2dq256_2 (tmp[6], tmp[4]));
+      emit_insn (gen_avx_cvttpd2dq256_2 (tmp[7], tmp[5]));
+      emit_insn (gen_avx_vperm2f128v8si3 (operands[0], tmp[6], tmp[7],
+                                          GEN_INT (0x20)));
+    }
+  else
+    {
+      emit_insn (gen_sse2_cvttpd2dq (tmp[6], tmp[4]));
+      emit_insn (gen_sse2_cvttpd2dq (tmp[7], tmp[5]));
+      emit_insn (gen_vec_interleave_lowv2di (gen_lowpart (V2DImode,
+                                                          operands[0]),
+                                             gen_lowpart (V2DImode, tmp[6]),
+                                             gen_lowpart (V2DImode, tmp[7])));
+    }
+  DONE;
+})
+
 (define_expand "vec_pack_sfix_v4df"
   [(match_operand:V8SI 0 "register_operand" "")
    (match_operand:V4DF 1 "nonimmediate_operand" "")

Jakub


[PATCH 1/1] sparc leon: Use -Aleon assembler switch for -mcpu=leon arch

2011-11-01 Thread Konrad Eisele
Use -Aleon to enable binutils sparc-leon architecture. The leon-arch
binutils GAS has umul/smul and casa enabled.

Signed-off-by: Konrad Eisele kon...@gaisler.com
---
 gcc/config/sparc/sparc.h |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/sparc/sparc.h b/gcc/config/sparc/sparc.h
index 65b4527..bbadeb2 100644
--- a/gcc/config/sparc/sparc.h
+++ b/gcc/config/sparc/sparc.h
@@ -236,7 +236,7 @@ extern enum cmodel sparc_cmodel;
 
 #if TARGET_CPU_DEFAULT == TARGET_CPU_leon
 #define CPP_CPU32_DEFAULT_SPEC "-D__leon__ -D__sparc_v8__"
-#define ASM_CPU32_DEFAULT_SPEC ""
+#define ASM_CPU32_DEFAULT_SPEC "-Aleon"
 #endif
 
 #endif
@@ -324,7 +324,7 @@ extern enum cmodel sparc_cmodel;
 
 /* Override in target specific files.  */
 #define ASM_CPU_SPEC "\
-%{mcpu=sparclet:-Asparclet} %{mcpu=tsc701:-Asparclet} \
+%{mcpu=sparclet:-Asparclet} %{mcpu=leon:-Aleon} %{mcpu=tsc701:-Asparclet} \
 %{mcpu=sparclite:-Asparclite} \
 %{mcpu=sparclite86x:-Asparclite} \
 %{mcpu=f930:-Asparclite} %{mcpu=f934:-Asparclite} \
-- 
1.6.4.1



Re: [Patch,AVR]: Fix PR50910: int/2 leads to libgcc call

2011-11-01 Thread Georg-Johann Lay

Denis Chertykov schrieb:

2011/10/31 Georg-Johann Lay:


Since beginning of time, BRANCH_COST was set to 0 so that some optimization
passes make code happily jumping around. The patch introduces a new command
line option for that; mainly because I don't know the rationale behind setting
BRANCH_COST to 0.

Johann

  * config/avr/avr.opt (-mbranch-cost=): New option.
  * config/avr/avr.h (BRANCH_COST): Define to avr_branch_cost.
  * config/avr/avr.c (avr_rtx_costs_1): Adjust [U]DIV/[U]MOD costs.
  * config/avr/avr.md (*addqi3.lt0, *addhi3.lt0, *addsi3.lt0): New insns.
  (*addhi3_zero_extend1): Remove % in constraint of operand 1.
  (*addhi3.sign_extend1, *subhi3.sign_extend2): New insns.


Approved.

Denis.


Do you know why the branch costs are set to 0 by default?
Maybe it's better to have a default of 1 for the new avr_branch_cost?

Johann




Re: [PATCH, devirtualization] Detect the new type in type change detection

2011-11-01 Thread Richard Guenther
On Mon, Oct 31, 2011 at 5:58 PM, Martin Jambor mjam...@suse.cz wrote:
 On Fri, Oct 28, 2011 at 11:21:23AM +0200, Richard Guenther wrote:
 On Thu, Oct 27, 2011 at 9:54 PM, Martin Jambor mjam...@suse.cz wrote:
  Hi,
 
  On Thu, Oct 27, 2011 at 11:06:02AM +0200, Richard Guenther wrote:
  On Thu, Oct 27, 2011 at 1:22 AM, Martin Jambor mjam...@suse.cz wrote:
   Hi,
  
   I've been asked by Maxim Kuvyrkov to revive the following patch which
   has not made it to 4.6.  Currently, when type based devirtualization
   detects a potential type change, it simply gives up on gathering any
   information on the object in question.  This patch adds an attempt to
   actually detect the new type after the change.
  
   Maxim claimed this (and another patch I'll post tomorrow) noticeably
   improved performance of some real code.  I can only offer a rather
   artificial example in the attachment.  When the constructors are
   inlined but the function multiply_matrices is not, this patch makes
   the produced executable run for only 7 seconds instead of about 20 on
   my 4 year old i686 desktop (with -Ofast).
  
   Anyway, the patch passes bootstrap and testsuite on x86_64-linux.
   What do you think, is it a good idea for trunk now?
  
   Thanks,
  
   Martin
  
  
   2011-10-21  Martin Jambor  mjam...@suse.cz
  
          * ipa-prop.c (type_change_info): New fields object, known_current_type
          and multiple_types_encountered.
          (extr_type_from_vtbl_ptr_store): New function.
          (check_stmt_for_type_change): Use it, set multiple_types_encountered if
          the result is different from the previous one.
          (detect_type_change): Renamed to detect_type_change_1. New parameter
          comp_type.  Set up new fields in tci, build known type jump
          functions if the new type can be identified.
          (detect_type_change): New function.
          * tree.h (DECL_CONTEXT): Comment new use.
  
          * testsuite/g++.dg/ipa/devirt-c-1.C: Add dump scans.
          * testsuite/g++.dg/ipa/devirt-c-2.C: Likewise.
          * testsuite/g++.dg/ipa/devirt-c-7.C: New test.
  
  
   Index: src/gcc/ipa-prop.c
   ===
   --- src.orig/gcc/ipa-prop.c
   +++ src/gcc/ipa-prop.c
   @@ -271,8 +271,17 @@ ipa_print_all_jump_functions (FILE *f)
  
    struct type_change_info
    {
   +  /* The declaration or SSA_NAME pointer of the base that we are checking for
   +     type change.  */
   +  tree object;
   +  /* If we actually can tell the type that the object has changed to, it is
   +     stored in this field.  Otherwise it remains NULL_TREE.  */
   +  tree known_current_type;
     /* Set to true if dynamic type change has been detected.  */
     bool type_maybe_changed;
   +  /* Set to true if multiple types have been encountered.  known_current_type
   +     must be disregarded in that case.  */
   +  bool multiple_types_encountered;
    };
  
    /* Return true if STMT can modify a virtual method table pointer.
   @@ -338,6 +347,49 @@ stmt_may_be_vtbl_ptr_store (gimple stmt)
     return true;
    }
  
   +/* If STMT can be proved to be an assignment to the virtual method table
   +   pointer of ANALYZED_OBJ and the type associated with the new table
   +   identified, return the type.  Otherwise return NULL_TREE.  */
   +
   +static tree
   +extr_type_from_vtbl_ptr_store (gimple stmt, tree analyzed_obj)
   +{
   +  tree lhs, t, obj;
   +
   +  if (!is_gimple_assign (stmt))
 
  gimple_assign_single_p (stmt)
 
  OK.
 
 
   +    return NULL_TREE;
   +
   +  lhs = gimple_assign_lhs (stmt);
   +
   +  if (TREE_CODE (lhs) != COMPONENT_REF)
   +    return NULL_TREE;
   +  obj = lhs;
   +
   +  if (!DECL_VIRTUAL_P (TREE_OPERAND (lhs, 1)))
   +    return NULL_TREE;
   +
   +  do
   +    {
   +      obj = TREE_OPERAND (obj, 0);
   +    }
   +  while (TREE_CODE (obj) == COMPONENT_REF);
 
  You do not allow other components than component-refs (thus, for
  example an ARRAY_REF - that is for a reason?).  Please add
  a comment why.  Otherwise this whole sequence would look like
  it should be replaceable by get_base_address (obj).
 
 
  I guess I might have been overly conservative here, ARRAY_REFs are
  fine.  get_base_address only digs into MEM_REFs if they are based on
  an ADDR_EXPR while I do so always.  But I can check that either both
  obj and analyzed_obj are a MEM_REF of the same SSA_NAME or they are
  the same thing (i.e. the same decl)... which even feels a bit cleaner,
  so I did that.

 Well, as you are looking for a must-change-type pattern I think you cannot
 simply ignore offsets.  Consider

 T a[10];

 new (T') (&a[9]);
 a[8]->foo();

 where the must-type-change on a[9] is _not_ changing the type of a[8]!

 Similar cases might happen with

 class Compound { T a; T b; };

 no?

 Please think about the difference must vs. may-type-change for these
 cases.  I'm not convinced that the must-type-change code is 

Re: [PR50878, PATCH] Fix for verify_dominators in -ftree-tail-merge

2011-11-01 Thread Richard Guenther
On Mon, Oct 31, 2011 at 9:19 PM, Tom de Vries tom_devr...@mentor.com wrote:
 On 10/30/2011 10:54 AM, Richard Guenther wrote:
 On Sun, Oct 30, 2011 at 9:27 AM, Tom de Vries tom_devr...@mentor.com wrote:
 On 10/30/2011 09:20 AM, Tom de Vries wrote:
 Richard,

 I have a fix for PR50878.

 Sorry, with patch this time.

 Ok for now, but see David's mail and the complexity issue with iteratively
 updating dominators.

 I'm not sure which mail you mean.

The one I CCed you on, which complained about iterative dominator fixing
taking 70% of the compile-time in some GCC testsuite test.

 It seems to me that we know exactly what to update
 and how, and we should do that (well, if we need up-to-date dominators,
 re-computing them once in the pass would be ok).


 Indeed, in this example we know exactly what to update and how. However, PR50908
 popped up, and there that's not the case anymore.

 Consider the following cfg, where A is the direct dominator of I:

                A
               / \
              B   \
             / \   \
                C   D
               /|   |\
                E   F
                |\ /|
                | x |
                |/ \|
                G   H
                 \ /
                  I

 Say E and F are duplicates, and F is removed.  The cfg then looks like
 this:

                A
               / \
              B   \
             / \   \
                C   D
               / \ / \
                  E
                 / \
                G   H
                 \ /
                  I

 E is now the new direct dominator of I.

 The patch for PR50878 did not address this example, since it uses the set of bbs
 directly dominated by the (single) predecessor of bb1 and bb2.

 The new patch calculates the updated dominator info by taking the nearest common
 dominator (A) of bb1 (F) and bb2 (E), and getting the set of bbs immediately
 dominated by it.  Part of this set is now directly dominated by bb2.

 Ideally we would have a means to determine which bbs in the set are now
 directly dominated by bb2, and call set_immediate_dominator for those bbs, but
 we don't, so instead we let iterate_fix_dominators figure it out.

 Additionally, the patch makes sure it updates dominator info before updating the
 vuses, this fixes a latent bug.

 The patch fixes both PR50908 and PR50878.

 Bootstrapped and reg-tested on x86_64 and i686, and built and reg-tested on ARM
 and MIPS.

 Ok for trunk?

Ok, but please add testcases for all the bugs you fixed.  This helps adding test
coverage for these cases.

Thanks,
Richard.

 Thanks,
 - Tom

 Richard.

 Thanks,
 - Tom


 A simplified form of the problem from the test-case of the PR is shown in this
 cfg. Block 12 has as direct dominator block 5.

         5
        / \
       /   \
      *     *
      6     7
      |     |
      |     |
      *     *
      8     9
       \   /
        \ /
         *
        12

 tail_merge_optimize finds that blocks 6 and 7 are duplicates. After replacing
 block 7 by block 6, the cfg looks like this:

         5
         |
         |
         *
         6
        / \
       /   \
      *     *
      8     9
       \   /
        \ /
         *
        12

 The new direct dominator of block 12 is block 6, but the current algorithm only
 recalculates dominator info for blocks 6, 8 and 9.

 The patch fixes this by additionally recalculating the dominator info for blocks
 immediately dominated by bb2 (block 6 in the example), if bb2 has a single
 predecessor after replacement.
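
The dominator change in this example can be reproduced with a small stand-alone C model. This is an illustrative sketch only: it recomputes dominators from scratch with the textbook iterative set intersection, which is not how iterate_fix_dominators works internally, and all names (cfg_t, cfg_add_edge, compute_dominators, immediate_dominator) are hypothetical.

```c
#include <stdint.h>

enum { MAX_BB = 16 };

/* Tiny CFG model: preds[v] is a bitmask of the predecessors of block v. */
typedef struct {
    int n;
    uint32_t preds[MAX_BB];
} cfg_t;

static void cfg_add_edge(cfg_t *g, int from, int to)
{
    g->preds[to] |= 1u << from;
}

static int popcount32(uint32_t x)
{
    int c = 0;
    for (; x != 0; x &= x - 1)
        c++;
    return c;
}

/* dom[v] becomes the set of blocks dominating v (including v itself),
   computed by iterating dom[v] = {v} + intersection of dom[p] to a fixpoint. */
static void compute_dominators(const cfg_t *g, int entry, uint32_t dom[MAX_BB])
{
    const uint32_t all = (1u << g->n) - 1u;
    int changed = 1;

    for (int v = 0; v < g->n; v++)
        dom[v] = all;
    dom[entry] = 1u << entry;

    while (changed) {
        changed = 0;
        for (int v = 0; v < g->n; v++) {
            uint32_t meet = all;
            if (v == entry || g->preds[v] == 0)
                continue;               /* entry block, or unreachable block */
            for (int p = 0; p < g->n; p++)
                if (g->preds[v] & (1u << p))
                    meet &= dom[p];
            meet |= 1u << v;
            if (meet != dom[v]) {
                dom[v] = meet;
                changed = 1;
            }
        }
    }
}

/* The strict dominators of a block form a chain, so the immediate
   dominator is the strict dominator with the largest dominator set. */
static int immediate_dominator(const cfg_t *g, const uint32_t dom[MAX_BB], int v)
{
    uint32_t strict = dom[v] & ~(1u << v);
    int best = -1, best_size = -1;

    for (int d = 0; d < g->n; d++) {
        if ((strict & (1u << d)) && popcount32(dom[d]) > best_size) {
            best = d;
            best_size = popcount32(dom[d]);
        }
    }
    return best;
}
```

With blocks 5, 6, 7, 8, 9, 12 numbered 0 to 5, the first cfg yields block 5 as immediate dominator of block 12, and the cfg after merging yields block 6.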

 Bootstrapped and reg-tested on x86_64 and i686. Built and reg-tested on
 MIPS and ARM.

 Ok for trunk?

 Thanks,
 - Tom

 2011-10-30  Tom de Vries  t...@codesourcery.com

       PR tree-optimization/50878
       * tree-ssa-tail-merge.c (replace_block_by): Recalculate dominator info
       for blocks immediately dominated by bb2, if bb2 has a single predecessor
       after replacement.



 2011-10-31  Tom de Vries  t...@codesourcery.com

        PR tree-optimization/50908
        * tree-ssa-tail-merge.c (update_vuses): Now that edges are removed
        before update_vuses, test for 1 predecessor rather than two.
        (delete_block_update_dominator_info): New function, part of it factored
        out of ...
        (replace_block_by): Use delete_block_update_dominator_info.  Call
        update_vuses after deleting bb1 and updating dominator info, instead of
        before.



Re: AVX generic mode tuning discussion.

2011-11-01 Thread Richard Guenther
On Mon, Oct 31, 2011 at 9:36 PM, Jagasia, Harsha harsha.jaga...@amd.com wrote:
   We would like to propose changing AVX generic mode tuning to generate 128-bit
   AVX instead of 256-bit AVX.
 
  You indicate a 3% reduction on bulldozer with avx256.
  How does avx128 compare to -mno-avx -msse4.2?

 We see these % differences going from SSE42 to AVX128 to AVX256 on
 Bulldozer with -mtune=generic -Ofast.
 (Positive is improvement, negative is degradation)

 Bulldozer:
                 AVX128/SSE42    AVX256/AVX128
 410.bwaves          -1.4%           -1.4%
 416.gamess          -1.1%            0.0%
 433.milc             0.5%           -2.4%
 434.zeusmp           9.7%           -2.1%
 435.gromacs          5.1%            0.5%
 436.cactusADM        8.2%          -23.8%
 437.leslie3d         8.1%            0.4%
 444.namd             3.6%            0.0%
 447.dealII          -1.4%           -0.4%
 450.soplex          -0.4%           -0.4%
 453.povray           0.0%           -1.5%
 454.calculix        15.7%           -8.3%
 459.GemsFDTD         4.9%            1.4%
 465.tonto            1.3%           -0.6%
 470.lbm              0.9%            0.3%
 481.wrf              7.3%           -3.6%
 482.sphinx3          5.0%           -9.8%
 SPECFP               3.8%           -3.2%

  Will the next AMD generation have a useable avx256?
  I'm not keen on the idea of generic mode being tune
  for a single processor revision that maybe shouldn't
  actually be using avx at all.

 We see a substantial gain in several SPECFP benchmarks going from SSE42
 to AVX128 on Bulldozer.
 IMHO, accomplishing even a 5% gain in an individual benchmark takes a
 hardware company several man months.
 The loss with AVX256 for Bulldozer is much more significant than the
 gain for SandyBridge.
 While the general trend in the industry is a move toward AVX256, for
 now we would be disadvantaging Bulldozer with this choice.

 We have several customers who use -mtune=generic and it is default,
 unless a user explicitly overrides it with -mtune=native. They are the
 ones who want to experiment with latest ISA using gcc, but want to keep
 their ISA selection and tuning agnostic on x86/64. IMHO, it is with
 these customers in mind that generic was introduced in the first place.

 Since stage 1 closure is around the corner, just wanted to ping to see if the
 maintainers have made up their mind on this one.
 AVX-128 is an improvement over SSE42 for Bulldozer and AVX-256 wipes out
 pretty much all of that gain in generic mode.
 Until there is a convergence on AVX-256 for x86/64, we would like to propose
 having generic generate avx-128 by default and have a user override to
 avx-256 manually when known to benefit performance.

Did somebody spend the time analyzing why CactusADM shows so much of a
difference?  With the recent improvements in vectorizing for AVX, did you
re-do the measurements with a recent trunk?

I don't think disabling avx-256 by default is a good idea until we
understand why these numbers happen and are convinced we cannot fix
this by proper cost modeling.

Richard.

 Thanks,
 Harsha




Re: [google] Enable loop unroll/peel notes under -fopt-info

2011-11-01 Thread Richard Guenther
On Tue, Nov 1, 2011 at 1:46 AM, Teresa Johnson tejohn...@google.com wrote:
 This patch is for google-main only.

 Tested with bootstrap and regression tests.

 Print unroll and peel factors along with loop source position under -fopt-info.

 Teresa

 2011-10-31   Teresa Johnson  tejohn...@google.com

        * common.opt (fopt-info): Disable -fopt-info by default.
        * loop-unroll.c (report_unroll_peel): New function.
        (unroll_and_peel_loops): Call record_loop_exits for later use.
        (peel_loops_completely): Print the loop source position in dump
        info and emit note under -fopt-info.
        (decide_unroll_and_peeling): Ditto.
        (decide_peel_once_rolling): Record peel factor for use in note
        emission.
        (decide_peel_completely): Ditto.
        * cfgloop.c (get_loop_location): New function.
        * cfgloop.h (get_loop_location): Ditto.
        * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Emit note
        under -fopt-info.

 Index: tree-ssa-loop-ivcanon.c
 ===
 --- tree-ssa-loop-ivcanon.c     (revision 180437)
 +++ tree-ssa-loop-ivcanon.c     (working copy)
 @@ -52,6 +52,7 @@
  #include "flags.h"
  #include "tree-inline.h"
  #include "target.h"
 +#include "diagnostic.h"

  /* Specifies types of loops that may be unrolled.  */

 @@ -443,6 +444,17 @@
     fprintf (dump_file, "Unrolled loop %d completely by factor %d.\n",
              loop->num, (int) n_unroll);

 +  if (flag_opt_info >= OPT_INFO_MIN)
 +    {
 +      location_t locus;
 +      locus = gimple_location (cond);
 +
 +      inform (locus, "Completely Unroll loop by %d (execution count %d, const iterations %d)",
 +              (int) n_unroll,
 +              (int) loop->header->count,
 +              (int) TREE_INT_CST_LOW(niter));
 +    }
 +

And this is exactly what I mean with code-duplication.  Two lines above
we already have "Unrolled loop %d completely by factor %d"; not only
do you duplicate some diagnostic printing about this fact, you
put in useless info (complete unroll by N of a loop executing M (?! that's
surely N as well) times, const iterations O (?! that's surely N as well) ...).

Richard.


Re: Go patch committed: Update Go library

2011-11-01 Thread Uros Bizjak
On Thu, Oct 27, 2011 at 6:42 PM, Uros Bizjak ubiz...@gmail.com wrote:

 This patch updates the Go library to the most recent weekly release.  I
 think the only potential portability issues here are the use of the
 ipv6_mreq struct.  I'm not entirely sure the new exp/terminal package is
 portable, but it might be.

 There are still problems with EpollEvent definition on Alpha, please
 see [1] for the analysis.

 [1] http://gcc.gnu.org/ml/gcc-patches/2011-07/msg00457.html

Thanks, the resulting epoll.go on Alpha reads as:

epoll.go
package syscall
type EpollEvent struct {
	Events uint32
	Pad [4]byte
	Fd int32
	Pad2 [4]byte
}
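
To show why the explicit Pad field lines up with the C side, here is a small C sketch of the two layouts. It rests on an assumption that is not stated in this thread: that the C struct is not packed on this target, so its 64-bit data member is 8-byte aligned. That alignment is forced here with _Alignas (C11) so the layout is the same wherever the sketch is compiled. Both struct names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirror of the generated Go struct: explicit padding, 16 bytes in total. */
struct go_epoll_event_model {
    uint32_t events;
    uint8_t  pad[4];
    int32_t  fd;
    uint8_t  pad2[4];
};

/* Assumed unpacked C-side shape: the 8-byte aligned data member forces
   4 bytes of padding after events. */
struct c_epoll_event_model {
    uint32_t events;
    _Alignas(8) uint64_t data;
};
```

In this model the fd member of the Go-style struct and the data member of the C-style struct both start at offset 8, and both structs are 16 bytes long.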

However, I am not able to finish compilation of libgo due to unrelated
problem (reported in [1]) with TC[GS]ETS define:

libtool: compile:  /space/uros/gcc-build-go/./gcc/gccgo
-B/space/uros/gcc-build-go/./gcc/
-B/usr/local/alphaev68-unknown-linux-gnu/bin/
-B/usr/local/alphaev68-unknown-linux-gnu/lib/ -isystem
/usr/local/alphaev68-unknown-linux-gnu/include -isystem
/usr/local/alphaev68-unknown-linux-gnu/sys-include -O2 -g -mieee -I .
-c -fgo-prefix=libgo_bytes
../../../gcc-svn/trunk/libgo/go/bytes/buffer.go
../../../gcc-svn/trunk/libgo/go/bytes/bytes.go
../../../gcc-svn/trunk/libgo/go/bytes/bytes_decl.go -o bytes/bytes.o
>/dev/null 2>&1
../../../gcc-svn/trunk/libgo/go/exp/terminal/terminal.go:31:78: error:
reference to undefined identifier ‘syscall.TCGETS’
../../../gcc-svn/trunk/libgo/go/exp/terminal/terminal.go:40:81: error:
reference to undefined identifier ‘syscall.TCGETS’
../../../gcc-svn/trunk/libgo/go/exp/terminal/terminal.go:47:81: error:
reference to undefined identifier ‘syscall.TCSETS’
../../../gcc-svn/trunk/libgo/go/exp/terminal/terminal.go:57:78: error:
reference to undefined identifier ‘syscall.TCSETS’
../../../gcc-svn/trunk/libgo/go/exp/terminal/terminal.go:66:81: error:
reference to undefined identifier ‘syscall.TCGETS’
../../../gcc-svn/trunk/libgo/go/exp/terminal/terminal.go:72:81: error:
reference to undefined identifier ‘syscall.TCSETS’
../../../gcc-svn/trunk/libgo/go/exp/terminal/terminal.go:77:68: error:
reference to undefined identifier ‘syscall.TCSETS’
make[4]: *** [exp/terminal.lo] Error 1


[1] http://gcc.gnu.org/ml/gcc/2011-10/msg00488.html

Uros.


Re: [Patch,AVR]: Fix PR50910: int/2 leads to libgcc call

2011-11-01 Thread Denis Chertykov
2011/11/1 Georg-Johann Lay a...@gjlay.de:
 Denis Chertykov schrieb:

 2011/10/31 Georg-Johann Lay:

 Since beginning of time, BRANCH_COST was set to 0 so that some
 optimization
 passes make code happily jumping around. The patch introduces a new
 command
 line option for that; mainly because I don't know the rationale behind
 setting
 BRANCH_COST to 0.

 Johann

      * config/avr/avr.opt (-mbranch-cost=): New option.
      * config/avr/avr.h (BRANCH_COST): Define to avr_branch_cost.
      * config/avr/avr.c (avr_rtx_costs_1): Adjust [U]DIV/[U]MOD costs.
      * config/avr/avr.md (*addqi3.lt0, *addhi3.lt0, *addsi3.lt0): New
 insns.
      (*addhi3_zero_extend1): Remove % in constraint of operand 1.
      (*addhi3.sign_extend1, *subhi3.sign_extend2): New insns.

 Approved.

 Denis.

 You know why the branch costs are set to 0 by default?

No.

 Maybe it's better to have a default of 1 for the new avr_branch_cost?

I don't know. (I forgot)

Denis.


Re: [PATCH] Add vec_pack_ufix_trunc_{v4df,v2df} expanders

2011-11-01 Thread Uros Bizjak
On Tue, Nov 1, 2011 at 10:07 AM, Jakub Jelinek ja...@redhat.com wrote:

 Similarly to the V{4,8}SFmode -> unsigned V{4,8}SImode conversion
 support for AVX this one adds V{2,4}DFmode -> unsigned V{4,8}SImode
 conversion.

 Ok for trunk?

Please put expander function into i386.c. IMO, this expander can be
better written using variable mode and indirect functions.

Otherwise, it looks OK.

Thanks,
Uros.


Re: PATCH: Move f16c intrinsics into f16cintrin.h

2011-11-01 Thread Uros Bizjak
Hello!

  On Mon, Oct 31, 2011 at 05:23:58PM -0500, Quentin Neill wrote:
  Interested parties should view these threads from three years ago:
  http://gcc.gnu.org/ml/gcc-patches/2008-11/threads.html#00145
  http://gcc.gnu.org/ml/gcc-patches/2008-12/threads.html#00174
 
  Testing on x86_64, okay to commit if no regressions?
 
  You aren't installing the header, so it will cause regressions.
  config.gcc needs to be adjusted for it.

 Arggh.  Thanks, my tests found that too.

 Reposting, okay to commit after testing on x86_64 if no regressions?

   Piledriver f16cintrin.h fix.
   * config/i386/f16cintrin.h: Contents moved from immintrin.h.
   * config/config.gcc: Add f16cintrin.h.

OK.

Thanks,
Uros.


Re: implementation of std::thread::hardware_concurrency()

2011-11-01 Thread niXman
Rechecked.


diff --git a/libstdc++-v3/src/thread.cc b/libstdc++-v3/src/thread.cc
index 09e7fc5..6feda4d 100644
--- a/libstdc++-v3/src/thread.cc
+++ b/libstdc++-v3/src/thread.cc
@@ -112,10 +112,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   unsigned int
   thread::hardware_concurrency() noexcept
   {
-int __n = _GLIBCXX_NPROCS;
-if (__n < 0)
-  __n = 0;
-return __n;
+int count=0;
+#if defined(PTW32_VERSION) || \
+   (defined(__MINGW64_VERSION_MAJOR) && defined(_POSIX_THREADS)) || \
+   defined(__hpux)
+count=pthread_num_processors_np();
+#elif defined(__APPLE__) || defined(__FreeBSD__)
+size_t size=sizeof(count);
+sysctlbyname("hw.ncpu", &count, &size, NULL, 0);
+#elif defined(_SC_NPROCESSORS_ONLN)
+count=sysconf(_SC_NPROCESSORS_ONLN);
+#elif defined(_GLIBCXX_USE_GET_NPROCS)
+count=_GLIBCXX_NPROCS;
+#endif
+return (count>0)?count:0;
   }

 _GLIBCXX_END_NAMESPACE_VERSION



2011/11/1 Paolo Carlini pcarl...@gmail.com:
 Hi,

 This patch implements std::thread::hardware_concurrency().
 Tested on pthreads-win32/winpthreads on windows OS, and on Linux/FreeBSD.

 Please send library patches to the library mailing list too. Also, always
 patch mainline first: in fact, the function is already implemented there.
 Maybe something is missing for win32; please check, rediff, and
 resend.

 Thanks
 Paolo


[Patch, libfortran, committed] Cleanup NEWUNIT allocation

2011-11-01 Thread Janne Blomqvist
Hi,

attached patch committed to trunk as obvious after regtesting.

2011-11-01  Janne Blomqvist  j...@gcc.gnu.org

* io/io.h (next_available_newunit): Remove prototype.
* io/unit.c (next_available_newunit): Make variable static,
initialize it.
(init_units): Don't initialize next_available_newunit.
(get_unique_unit_number): Use atomic builtin if available.


-- 
Janne Blomqvist
diff --git a/libgfortran/io/io.h b/libgfortran/io/io.h
index 23f07ca..3569c54 100644
--- a/libgfortran/io/io.h
+++ b/libgfortran/io/io.h
@@ -576,10 +576,6 @@ gfc_unit;
 extern gfc_offset max_offset;
 internal_proto(max_offset);
 
-/* Unit number to be assigned when NEWUNIT is used in an OPEN statement.  */
-extern GFC_INTEGER_4 next_available_newunit;
-internal_proto(next_available_newunit);
-
 /* Unit tree root.  */
 extern gfc_unit *unit_root;
 internal_proto(unit_root);
diff --git a/libgfortran/io/unit.c b/libgfortran/io/unit.c
index b4d10cd..33072fe 100644
--- a/libgfortran/io/unit.c
+++ b/libgfortran/io/unit.c
@@ -71,8 +71,9 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 
 /* Subroutines related to units */
 
-GFC_INTEGER_4 next_available_newunit;
+/* Unit number to be assigned when NEWUNIT is used in an OPEN statement.  */
 #define GFC_FIRST_NEWUNIT -10
+static GFC_INTEGER_4 next_available_newunit = GFC_FIRST_NEWUNIT;
 
 #define CACHE_SIZE 3
 static gfc_unit *unit_cache[CACHE_SIZE];
@@ -525,8 +526,6 @@ init_units (void)
   __GTHREAD_MUTEX_INIT_FUNCTION (&unit_lock);
 #endif
 
-  next_available_newunit = GFC_FIRST_NEWUNIT;
-
   if (options.stdin_unit >= 0)
 {/* STDIN */
   u = insert_unit (options.stdin_unit);
@@ -808,16 +807,19 @@ get_unique_unit_number (st_parameter_open *opp)
 {
   GFC_INTEGER_4 num;
 
+#ifdef HAVE_SYNC_FETCH_AND_ADD
+  num = __sync_fetch_and_add (&next_available_newunit, -1);
+#else
   __gthread_mutex_lock (&unit_lock);
   num = next_available_newunit--;
+  __gthread_mutex_unlock (&unit_lock);
+#endif
 
   /* Do not allow NEWUNIT numbers to wrap.  */
-  if (next_available_newunit >=  GFC_FIRST_NEWUNIT )
+  if (num > GFC_FIRST_NEWUNIT )
 {
-  __gthread_mutex_unlock (&unit_lock);
   generate_error (&opp->common, LIBERROR_INTERNAL, "NEWUNIT exhausted");
   return 0;
 }
-  __gthread_mutex_unlock (&unit_lock);
   return num;
 }
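
For readers skimming the diff, the lock-free path can be sketched in portable C++ roughly as follows. This is only an illustration under stated assumptions: it uses std::atomic instead of the __sync builtin, and the names kFirstNewunit and next_newunit are hypothetical stand-ins for GFC_FIRST_NEWUNIT and get_unique_unit_number; it is not the libgfortran code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for GFC_FIRST_NEWUNIT from the patch.
constexpr std::int32_t kFirstNewunit = -10;

// Hand out the current counter value and decrement the counter atomically.
// Returns 0 once the counter has wrapped around to positive values,
// mirroring the "NEWUNIT exhausted" error path in the patch.
std::int32_t next_newunit(std::atomic<std::int32_t>& next_available)
{
  // fetch_sub returns the value held *before* the decrement, just like
  // __sync_fetch_and_add (&x, -1).  Atomic arithmetic wraps on overflow.
  const std::int32_t num = next_available.fetch_sub(1);

  // Valid NEWUNIT numbers are <= kFirstNewunit; anything larger means the
  // counter wrapped past INT32_MIN.
  if (num > kFirstNewunit)
    return 0;

  assert(num < 0);  // NEWUNIT numbers are always negative
  return num;
}
```

The point of testing the fetched value rather than the shared counter is that the check then needs no lock: each caller inspects only its own private copy.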


Re: implementation of std::thread::hardware_concurrency()

2011-11-01 Thread Paolo Carlini

On 11/01/2011 12:33 PM, niXman wrote:

Rechecked.

Stylistically, you are missing a lot of spaces around the operators, e.g.:

return (count > 0) ? count : 0;

Also, patches are always submitted with a ChangeLog entry.

Do you already have a copyright assignment in place? I'm asking in
general, for your future submissions; this specific patch would probably
be small enough not to require it.


Paolo.


[patch] Update gcc.dg/vect/no-scevccp-outer-6-global.c

2011-11-01 Thread Ira Rosen
Hi,

With the recent patches for __restrict__, the outer loop in
gcc.dg/vect/no-scevccp-outer-6-global.c is now vectorizable, because
it doesn't require loop versioning for alias anymore.  The comment in
the test is probably obsolete, and checking for widen-mult doesn't
make much sense, because there is no multiplication here at all.

Tested on powerpc64-suse-linux.
Committed.

Ira

testsuite/ChangeLog:

* gcc.dg/vect/no-scevccp-outer-6-global.c: Expect to vectorize
the outer loop.  Remove comment.  Don't check for widen-mult.

Index: testsuite/gcc.dg/vect/no-scevccp-outer-6-global.c
===
--- testsuite/gcc.dg/vect/no-scevccp-outer-6-global.c   (revision 180733)
+++ testsuite/gcc.dg/vect/no-scevccp-outer-6-global.c   (working copy)
@@ -52,7 +52,5 @@
   return 0;
 }

-/* Too many BBs in loop  */
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail *-*-* } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 1 "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail vect_no_align } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */


Re: building binutils from same directory as gcc

2011-11-01 Thread Andrew Haley
On 10/30/2011 01:51 PM, Gerald Pfeifer wrote:
 Why not just declare
 that building from the same directory is not supported and have one
 simple set of instructions that always works, as opposed to "this
 ought to work with snapshots but not with direct checkouts"?

That's right.  Is there ever any advantage to building in-srcdir?
I'm not aware of one.

Andrew.



Re: implementation of std::thread::hardware_concurrency()

2011-11-01 Thread Marc Glisse

On Tue, 1 Nov 2011, niXman wrote:


diff --git a/libstdc++-v3/src/thread.cc b/libstdc++-v3/src/thread.cc
index 09e7fc5..6feda4d 100644
--- a/libstdc++-v3/src/thread.cc
+++ b/libstdc++-v3/src/thread.cc
@@ -112,10 +112,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  unsigned int
  thread::hardware_concurrency() noexcept
  {
-int __n = _GLIBCXX_NPROCS;
-if (__n < 0)
-  __n = 0;
-return __n;
+int count=0;
+#if defined(PTW32_VERSION) || \
+   (defined(__MINGW64_VERSION_MAJOR) && defined(_POSIX_THREADS)) || \
+   defined(__hpux)
+count=pthread_num_processors_np();
+#elif defined(__APPLE__) || defined(__FreeBSD__)
+size_t size=sizeof(count);
+sysctlbyname("hw.ncpu", &count, &size, NULL, 0);
+#elif defined(_SC_NPROCESSORS_ONLN)
+count=sysconf(_SC_NPROCESSORS_ONLN);
+#elif defined(_GLIBCXX_USE_GET_NPROCS)
+count=_GLIBCXX_NPROCS;
+#endif
+return (count>0)?count:0;


Er, the macro _GLIBCXX_NPROCS already handles the case 
sysconf(_SC_NPROCESSORS_ONLN). It looks like you actually want to remove 
the macro _GLIBCXX_NPROCS completely.


--
Marc Glisse


[PATCH] PR target/50038 fix: redundant zero extensions removal

2011-11-01 Thread Ilya Enkovich
Hi,

Here is a patch which fixes the redundant zero extensions problem. The issue
is resolved by extending the implicit_zee pass to cover zero and sign
extensions of different modes. Could someone please review it?

Bootstrapped and checked on linux-x86_64.

Thanks,
Ilya
---
2011-11-01  Enkovich Ilya  ilya.enkov...@intel.com

PR target/50038
* implicit-zee.c (ext_cand): New.
(ext_cand_pool): Likewise.
(add_ext_candidate): New.
(zee_init): New.
(zee_cleanup): New.
(get_reg_di): Removed.
(combine_set_zero_extend): Get extend candidate as new parameter.
Now handle sign extend cases and other modes.
(transform_ifelse): Likewise.
(merge_def_and_ze): Likewise.
(combine_reaching_defs): Change parameter type.
(zero_extend_info): Changed insn_list type.
(add_removable_zero_extend): Relaxed mode and code filter.
(find_removable_zero_extends): Changed return type.
(find_and_remove_ze): Var type changes.
(rest_of_handle_zee): Init and cleanup added.

* i386.c (ix86_option_override_internal): set flag_zee for
32 bit platform.


PR50038.diff
Description: Binary data


[PATCH] Add vec_pack_ufix_trunc_{v4df,v2df} expanders (take 2)

2011-11-01 Thread Jakub Jelinek
On Tue, Nov 01, 2011 at 11:16:07AM +0100, Uros Bizjak wrote:
 On Tue, Nov 1, 2011 at 10:07 AM, Jakub Jelinek ja...@redhat.com wrote:
 
  Similarly to the V{4,8}SFmode -> unsigned V{4,8}SImode conversion
  support for AVX this one adds V{2,4}DFmode -> unsigned V{4,8}SImode
  conversion.
 
  Ok for trunk?
 
 Please put expander function into i386.c. IMO, this expander can be
 better written using variable mode and indirect functions.

Like this?
The advantage is that the fixuns_trunc<mode><sseintvecmodelower>2 pattern
can use the helper too and shrink; the disadvantage is that the stmts in the
new pattern are now in vcmppd; vandpd; vaddpd; vcmppd; vandpd; vaddpd order
instead of vcmppd; vcmppd; vandpd; vandpd; vaddpd; vaddpd (not sure why
the scheduler didn't change it, but on the other hand it is the scheduler's
job).
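
As a reading aid, the adjustment the helper performs (per its comment in the patch: subtract 0x1p32 from the value if it is greater than or equal to 0x1p31, so that the signed conversion can be used) can be sketched in scalar C++. This is only an illustration under stated assumptions: the name ufix_via_sfix is hypothetical, it handles a single float instead of a vector, and it is not the code from the patch. It sticks to float because every float at or above 0x1p31 is a whole number, so both the subtraction and the truncating conversion are exact.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical scalar illustration; not the code from the patch.
std::uint32_t ufix_via_sfix(float val)
{
  const float two31 = std::ldexp(1.0f, 31);    // 0x1p31
  const float mtwo32 = std::ldexp(-1.0f, 32);  // -0x1p32

  // Precondition for this sketch: val is representable as uint32_t.
  assert(val >= 0.0f && val < -mtwo32);

  // Compare + and: select -0x1p32 for large inputs, 0 otherwise.
  const float addend = (val >= two31) ? mtwo32 : 0.0f;
  // Add: the adjusted value now lies in [-0x1p31, 0x1p31).
  const float adjusted = val + addend;

  // Signed truncating conversion, then reinterpret modulo 2^32.
  const std::int32_t as_signed = static_cast<std::int32_t>(adjusted);
  return static_cast<std::uint32_t>(as_signed);
}
```

The three steps in the middle correspond to the compare, and, add sequence discussed above; the vector helper does the same per lane with a mask instead of a branch.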

2011-11-01  Jakub Jelinek  ja...@redhat.com

* config/i386/i386-protos.h (ix86_expand_adjust_ufix_to_sfix_si): New
prototype.
* config/i386/i386.c (ix86_expand_adjust_ufix_to_sfix_si): New
function.
* config/i386/sse.md (fixuns_trunc<mode><sseintvecmodelower>2): Use
it.
(ssepackfltmode): New mode attr.
(vec_pack_ufix_trunc_<mode>): New expander.

--- gcc/config/i386/i386-protos.h.jj2011-10-25 08:13:31.0 +0200
+++ gcc/config/i386/i386-protos.h   2011-11-01 14:18:59.0 +0100
@@ -109,6 +109,7 @@ extern void ix86_expand_convert_uns_sixf
 extern void ix86_expand_convert_uns_sidf_sse (rtx, rtx);
 extern void ix86_expand_convert_uns_sisf_sse (rtx, rtx);
 extern void ix86_expand_convert_sign_didf_sse (rtx, rtx);
+extern rtx ix86_expand_adjust_ufix_to_sfix_si (rtx);
 extern enum ix86_fpcmp_strategy ix86_fp_comparison_strategy (enum rtx_code);
 extern void ix86_expand_fp_absneg_operator (enum rtx_code, enum machine_mode,
rtx[]);
--- gcc/config/i386/i386.c.jj   2011-10-31 20:44:13.0 +0100
+++ gcc/config/i386/i386.c  2011-11-01 14:26:31.0 +0100
@@ -17016,6 +17016,46 @@ ix86_expand_convert_uns_sisf_sse (rtx ta
 emit_move_insn (target, fp_hi);
 }
 
+/* Adjust a V*SFmode/V*DFmode value VAL so that *sfix_trunc* resp. fix_trunc*
+   pattern can be used on it instead of *ufix_trunc* resp. fixuns_trunc*.
+   This is done by subtracting 0x1p32 from VAL if VAL is greater or equal
+   (non-signalling) than 0x1p31.  */
+
+rtx
+ix86_expand_adjust_ufix_to_sfix_si (rtx val)
+{
+  REAL_VALUE_TYPE MTWO32r, TWO31r;
+  rtx two31r, mtwo32r, tmp[3];
+  enum machine_mode mode = GET_MODE (val);
+  enum machine_mode scalarmode = GET_MODE_INNER (mode);
+  rtx (*cmp) (rtx, rtx, rtx, rtx);
+  int i;
+
+  for (i = 0; i < 3; i++)
+tmp[i] = gen_reg_rtx (mode);
+  real_ldexp (&TWO31r, &dconst1, 31);
+  two31r = const_double_from_real_value (TWO31r, scalarmode);
+  two31r = ix86_build_const_vector (mode, 1, two31r);
+  two31r = force_reg (mode, two31r);
+  real_ldexp (&MTWO32r, &dconstm1, 32);
+  mtwo32r = const_double_from_real_value (MTWO32r, scalarmode);
+  mtwo32r = ix86_build_const_vector (mode, 1, mtwo32r);
+  mtwo32r = force_reg (mode, mtwo32r);
+  switch (mode)
+{
+case V8SFmode: cmp = gen_avx_cmpv8sf3; break;
+case V4SFmode: cmp = gen_avx_cmpv4sf3; break;
+case V4DFmode: cmp = gen_avx_cmpv4df3; break;
+case V2DFmode: cmp = gen_avx_cmpv2df3; break;
+default: gcc_unreachable ();
+}
+  emit_insn (cmp (tmp[0], val, two31r, GEN_INT (29)));
+  tmp[1] = expand_simple_binop (mode, AND, tmp[0], mtwo32r, tmp[1],
+   0, OPTAB_DIRECT);
+  return expand_simple_binop (mode, PLUS, val, tmp[1], tmp[2],
+ 0, OPTAB_DIRECT);
+}
+
 /* A subroutine of ix86_build_signbit_mask.  If VECT is true,
then replicate the value for all elements of the vector
register.  */
--- gcc/config/i386/sse.md.jj   2011-11-01 09:04:37.0 +0100
+++ gcc/config/i386/sse.md  2011-11-01 14:25:52.0 +0100
@@ -2323,32 +2323,13 @@ (define_insn fix_truncv4sfv4si2
(set_attr mode TI)])
 
 (define_expand fixuns_truncmodesseintvecmodelower2
-  [(set (match_dup 4)
-   (unspec:VF1
- [(match_operand:VF1 1 register_operand )
-  (match_dup 2)
-  (const_int 29)] UNSPEC_PCMP))
-   (set (match_dup 5)
-   (and:VF1 (match_dup 4) (match_dup 3)))
-   (set (match_dup 6)
-   (plus:VF1 (match_dup 1) (match_dup 5)))
-   (set (match_operand:sseintvecmode 0 register_operand )
-   (fix:sseintvecmode (match_dup 6)))]
+  [(match_operand:sseintvecmode 0 register_operand )
+   (match_operand:VF1 1 register_operand )]
   TARGET_AVX
 {
-  REAL_VALUE_TYPE MTWO32r, TWO31r;
-  int i;
-
-  real_ldexp (TWO31r, dconst1, 31);
-  operands[2] = const_double_from_real_value (TWO31r, SFmode);
-  operands[2] = ix86_build_const_vector (MODEmode, 1, operands[2]);
-  operands[2] = force_reg (MODEmode, operands[2]);
-  real_ldexp (MTWO32r, dconstm1, 32);
-  operands[3] = const_double_from_real_value (MTWO32r, SFmode);
-  operands[3] = 

Re: [google] Enable loop unroll/peel notes under -fopt-info

2011-11-01 Thread Teresa Johnson
Hi Richard,

Once we have a uniform way to emit notes to either stderr or dump, as
you and David had discussed in the earlier thread, we can merge these
two messages. The advantage with the new messages, besides going to
stderr, is that the source position information is being emitted since
it is a note. I agree that for complete unrolls the constant number of
iterations can be omitted (but it is useful for the other types of
unrolls/peels). But the execution count is something different - it
includes the number of times the loop header executes based on profile
information (i.e. iterations*# times loop is entered).
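
To make the distinction concrete, here is a tiny worked example of that product. It is a sketch only: header_execution_count is a hypothetical helper, not a GCC function, and the numbers in the note below are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: profile-based execution count of a loop header,
// taken as (iterations per entry) * (number of times the loop is entered).
std::int64_t header_execution_count(std::int64_t iterations_per_entry,
                                    std::int64_t times_entered)
{
  assert(iterations_per_entry >= 0 && times_entered >= 0);
  return iterations_per_entry * times_entered;
}
```

For a loop with 4 constant iterations that is entered 1000 times, the unroll factor and the constant iteration count are both 4, while the execution count is 4000.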

Thanks,
Teresa


On Tue, Nov 1, 2011 at 2:53 AM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Tue, Nov 1, 2011 at 1:46 AM, Teresa Johnson tejohn...@google.com wrote:
 This patch is for google-main only.

 Tested with bootstrap and regression tests.

 Print unroll and peel factors along with loop source position under 
 -fopt-info.

 Teresa

 2011-10-31   Teresa Johnson  tejohn...@google.com

        * common.opt (fopt-info): Disable -fopt-info by default.
        * loop-unroll.c (report_unroll_peel): New function.
        (unroll_and_peel_loops): Call record_loop_exits for later use.
        (peel_loops_completely): Print the loop source position in dump
        info and emit note under -fopt-info.
        (decide_unroll_and_peeling): Ditto.
        (decide_peel_once_rolling): Record peel factor for use in note
        emission.
        (decide_peel_completely): Ditto.
        * cfgloop.c (get_loop_location): New function.
        * cfgloop.h (get_loop_location): Ditto.
        * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Emit note
        under -fopt-info.

 Index: tree-ssa-loop-ivcanon.c
 ===
 --- tree-ssa-loop-ivcanon.c     (revision 180437)
 +++ tree-ssa-loop-ivcanon.c     (working copy)
 @@ -52,6 +52,7 @@
  #include "flags.h"
  #include "tree-inline.h"
  #include "target.h"
 +#include "diagnostic.h"

  /* Specifies types of loops that may be unrolled.  */

 @@ -443,6 +444,17 @@
     fprintf (dump_file, "Unrolled loop %d completely by factor %d.\n",
              loop->num, (int) n_unroll);

 +  if (flag_opt_info >= OPT_INFO_MIN)
 +    {
 +      location_t locus;
 +      locus = gimple_location (cond);
 +
 +      inform (locus, "Completely Unroll loop by %d (execution count
 %d, const iterations %d)",
 +              (int) n_unroll,
 +              (int) loop->header->count,
 +              (int) TREE_INT_CST_LOW(niter));
 +    }
 +

 And this is exactly what I mean by code duplication.  Two lines above
 we already have "Unrolled loop %d completely by factor %d"; not only
 do you duplicate some diagnostic printing about this fact, you also
 put in useless info (complete unroll by N of a loop executing M (?! that's
 surely N as well) times, const iterations O (?! that's surely N as well) ...).

 Richard.




-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: C++ PATCH for c++/50500 (DR 1082, implicitly declared copy in class with move)

2011-11-01 Thread Jason Merrill

On 10/29/2011 05:07 PM, Eric Botcazou wrote:

DR 1082 changed the rules for implicitly declared copy constructors and
assignment operators in the presence of move ctor/op= such that if
either move operation is present, instead of being suppressed the copy
operations will still be declared, but as deleted.


We have detected a side effect of this change by means of -fdump-ada-spec:
implicit copy assignment operators are now generated in simple cases where
they were not previously generated, for example:


Oops, thanks.  Fixed thus.

commit 06151eabf195163c8885da36abae67ab60cf1978
Author: Jason Merrill ja...@redhat.com
Date:   Mon Oct 31 16:57:17 2011 -0400

	PR c++/50500
	DR 1082
	* search.c (lookup_fnfields_idx_nolazy): Split out from...
	(lookup_fnfields_1): ...here.
	(lookup_fnfields_slot_nolazy): Use it.
	* cp-tree.h: Declare it.
	* class.c (type_has_move_assign): Use it.
	(type_has_user_declared_move_assign): Likewise.

diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index a014d25..41d182a 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -4485,7 +4485,7 @@ type_has_move_assign (tree t)
   lazily_declare_fn (sfk_move_assignment, t);
 }
 
-  for (fns = lookup_fnfields_slot (t, ansi_assopname (NOP_EXPR));
+  for (fns = lookup_fnfields_slot_nolazy (t, ansi_assopname (NOP_EXPR));
fns; fns = OVL_NEXT (fns))
 if (move_fn_p (OVL_CURRENT (fns)))
   return true;
@@ -4530,7 +4530,7 @@ type_has_user_declared_move_assign (tree t)
   if (CLASSTYPE_LAZY_MOVE_ASSIGN (t))
 return false;
 
-  for (fns = lookup_fnfields_slot (t, ansi_assopname (NOP_EXPR));
+  for (fns = lookup_fnfields_slot_nolazy (t, ansi_assopname (NOP_EXPR));
fns; fns = OVL_NEXT (fns))
 {
   tree fn = OVL_CURRENT (fns);
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 7ff1491..ac42e0e 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5328,6 +5328,7 @@ extern tree lookup_field_1			(tree, tree, bool);
 extern tree lookup_field			(tree, tree, int, bool);
 extern int lookup_fnfields_1			(tree, tree);
 extern tree lookup_fnfields_slot		(tree, tree);
+extern tree lookup_fnfields_slot_nolazy		(tree, tree);
 extern int class_method_index_for_fn		(tree, tree);
 extern tree lookup_fnfields			(tree, tree, int);
 extern tree lookup_member			(tree, tree, int, bool);
diff --git a/gcc/cp/search.c b/gcc/cp/search.c
index 97f593c..5f60eee 100644
--- a/gcc/cp/search.c
+++ b/gcc/cp/search.c
@@ -1335,10 +1335,11 @@ lookup_conversion_operator (tree class_type, tree type)
 }
 
 /* TYPE is a class type. Return the index of the fields within
-   the method vector with name NAME, or -1 if no such field exists.  */
+   the method vector with name NAME, or -1 if no such field exists.
+   Does not lazily declare implicitly-declared member functions.  */
 
-int
-lookup_fnfields_1 (tree type, tree name)
+static int
+lookup_fnfields_idx_nolazy (tree type, tree name)
 {
   VEC(tree,gc) *method_vec;
   tree fn;
@@ -1348,34 +1349,6 @@ lookup_fnfields_1 (tree type, tree name)
   if (!CLASS_TYPE_P (type))
 return -1;
 
-  if (COMPLETE_TYPE_P (type))
-{
-  if ((name == ctor_identifier
-	   || name == base_ctor_identifier
-	   || name == complete_ctor_identifier))
-	{
-	  if (CLASSTYPE_LAZY_DEFAULT_CTOR (type))
-	lazily_declare_fn (sfk_constructor, type);
-	  if (CLASSTYPE_LAZY_COPY_CTOR (type))
-	lazily_declare_fn (sfk_copy_constructor, type);
-	  if (CLASSTYPE_LAZY_MOVE_CTOR (type))
-	lazily_declare_fn (sfk_move_constructor, type);
-	}
-  else if (name == ansi_assopname (NOP_EXPR))
-	{
-	  if (CLASSTYPE_LAZY_COPY_ASSIGN (type))
-	lazily_declare_fn (sfk_copy_assignment, type);
-	  if (CLASSTYPE_LAZY_MOVE_ASSIGN (type))
-	lazily_declare_fn (sfk_move_assignment, type);
-	}
-  else if ((name == dtor_identifier
-		|| name == base_dtor_identifier
-		|| name == complete_dtor_identifier
-		|| name == deleting_dtor_identifier)
-	   && CLASSTYPE_LAZY_DESTRUCTOR (type))
-	lazily_declare_fn (sfk_destructor, type);
-}
-
   method_vec = CLASSTYPE_METHOD_VEC (type);
   if (!method_vec)
 return -1;
@@ -1445,6 +1418,46 @@ lookup_fnfields_1 (tree type, tree name)
   return -1;
 }
 
+/* TYPE is a class type. Return the index of the fields within
+   the method vector with name NAME, or -1 if no such field exists.  */
+
+int
+lookup_fnfields_1 (tree type, tree name)
+{
+  if (!CLASS_TYPE_P (type))
+return -1;
+
+  if (COMPLETE_TYPE_P (type))
+{
+  if ((name == ctor_identifier
+	   || name == base_ctor_identifier
+	   || name == complete_ctor_identifier))
+	{
+	  if (CLASSTYPE_LAZY_DEFAULT_CTOR (type))
+	lazily_declare_fn (sfk_constructor, type);
+	  if (CLASSTYPE_LAZY_COPY_CTOR (type))
+	lazily_declare_fn (sfk_copy_constructor, type);
+	  if (CLASSTYPE_LAZY_MOVE_CTOR (type))
+	lazily_declare_fn (sfk_move_constructor, type);
+	}
+  else if (name == ansi_assopname (NOP_EXPR))
+	{
+	  if (CLASSTYPE_LAZY_COPY_ASSIGN (type))
+	

Re: [C++ Patch] PR 44277

2011-11-01 Thread Jason Merrill

OK.

Jason


[wwwdocs] Use regular markup for java/status.html

2011-11-01 Thread Gerald Pfeifer
That does not fix the fact that the status is not up-to-date, but
makes things more consistent and easier to carry along in case of
future updates.

Applied.

Gerald

2011-11-01  Gerald Pfeifer  ger...@pfeifer.com
 
* status.html: Use h2 instead of fake tables.

Index: status.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/java/status.html,v
retrieving revision 1.30
diff -u -r1.30 status.html
--- status.html 27 Jul 2004 23:59:38 -  1.30
+++ status.html 1 Nov 2011 14:03:45 -
@@ -17,14 +17,7 @@
 pStatus of GCJ as of GCC 3.2.  Improvements that are only
 in current development versions are marked as in CVS./p
 
-<table id="features" border="0" cellpadding="4" width="95%">
-<tr bgcolor="#b0d0ff">
- <th align="left">
- Core Features
- </th>
-</tr>
-</table>
-<br />
+<h2 id="features">Core Features</h2>
 
 ul
 liCompile Java source code (ahead-of-time) to native (machine) code,/li
@@ -43,14 +36,8 @@
 liAn extensive class library - see below./li
 /ul
 
-<table id="packages" border="0" cellpadding="4" width="95%">
-<tr bgcolor="#b0d0ff">
- <th align="left">
- Implemented Packages
- </th>
-</tr>
-</table>
 
+<h2 id="packages">Implemented Packages</h2>
 
 pYou can also see a href=http://www.kaffe.org/~stuart/japi/;a
 comparison of libgcj with the JDK/a.  This is updated nightly.  It
@@ -118,13 +105,8 @@
 a comparison of the GUI branch with Classpath/a.
 /p
 
-<table id="targets" border="0" cellpadding="4" width="95%">
-<tr bgcolor="#b0d0ff">
- <th align="left">
- Supported Targets
- </th>
-</tr>
-</table>
+
+<h2 id="targets">Supported Targets</h2>
 
 dl
 dt class=targetGNU/Linux on the Pentium-compatible PCs


[wwwdocs] Prepare GCC 4.7 release notes for the release

2011-11-01 Thread Gerald Pfeifer
...at least somewhat, and also to then serve as a better template
for the following release.

Sort ARM, MIPS and picochip alphabetically, add an anchor for MIPS.
Comment out empty sections.

Applied.

Gerald

Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.7/changes.html,v
retrieving revision 1.52
diff -u -r1.52 changes.html
--- changes.html1 Nov 2011 03:56:01 -   1.52
+++ changes.html1 Nov 2011 15:10:07 -
@@ -11,6 +11,7 @@
 body
 h1GCC 4.7 Release Seriesbr /Changes, New Features, and Fixes/h1
 
+
 h2Caveats/h2
 
   ul
@@ -51,6 +52,7 @@
 obsoleted in GCC 4.6./li
   /ul
 
+
 h2General Optimizer Improvements/h2
 
   ul
@@ -168,6 +170,7 @@
 /li
   /ul
 
+
 h2New Languages and Language specific improvements/h2
 
   ul
@@ -399,10 +402,23 @@
   /ul/li
   /ul
 
+!--
 h3Java (GCJ)/h3
+--
+
 
 h2 id=targetsNew Targets and Target Specific Improvements/h2
 
+h3 id=armARM/h3
+  ul
+liThe default vector size in auto-vectorization for NEON is now 128 bits.
+  If vectorization fails thusly, the vectorizer tries again with
+  64-bit vectors./li
+liA new option code-mvectorize-with-neon-double/code was added to
+  allow users to change the vector size to 64 bits./li
+
+  /ul
+
 h3C6X/h3
   ul
 liSupport has been added for the Texas Instruments C6X family of
@@ -430,6 +446,14 @@
 li.../li
   /ul
 
+!--
+h3 id=mipsMIPS/h3
+--
+
+!--
+h3 id=picochippicochip/h3
+--
+
 h3PowerPC/PowerPC64/h3
   ul
 liVectors of type ivector long long/i or ivector long/i are
@@ -448,8 +472,6 @@
  /li
   /ul
 
-h3MIPS/h3
-
 h3SPARC/h3
   ul
 liThe option code-mflat/code has been reinstated.  When it is
@@ -490,19 +512,11 @@
 default on UltraSPARC T3 (Niagara 3) and later CPUs./li
   /ul
 
-h3 id=picochippicochip/h3
-
-h3 id=armARM/h3
-ul
-liThe default vector size in auto-vectorization for NEON is now 128 bits.
-  If vectorization fails thusly, the vectorizer tries again with
-  64-bit vectors./li
-liA new option code-mvectorize-with-neon-double/code was added to
-  allow users to change the vector size to 64 bits./li
-
-  /ul
 
+!--
 h2Documentation improvements/h2
+--
+
 
 h2Other significant improvements/h2
 


[libstdc++, patch] Refer to GNU/Linux in acinclude.m4

2011-11-01 Thread Gerald Pfeifer
Applied, based on ongoing exchange with RMS.

Gerald


2011-10-31  Gerald Pfeifer  ger...@pfeifer.com

* acinclude.m4 (GLIBCXX_CONFIGURE): Refer to GNU/Linux.
* configure: Regenerate.

Index: acinclude.m4
===
--- acinclude.m4(revision 180677)
+++ acinclude.m4(working copy)
@@ -94,8 +94,8 @@
   ## (Right now, this only matters for enable_wchar_t, but nothing prevents
   ## other macros from doing the same.  This should be automated.)  -pme
 
-  # Check for C library flavor since Linux platforms use different configuration
-  # directories depending on the C library in use.
+  # Check for C library flavor since GNU/Linux platforms use different
+  # configuration directories depending on the C library in use.
   AC_EGREP_CPP([_using_uclibc], [
   #include <stdio.h>
   #if __UCLIBC__
Index: configure
===
--- configure   (revision 180677)
+++ configure   (working copy)
@@ -5219,8 +5219,8 @@
   ## (Right now, this only matters for enable_wchar_t, but nothing prevents
   ## other macros from doing the same.  This should be automated.)  -pme
 
-  # Check for C library flavor since Linux platforms use different configuration
-  # directories depending on the C library in use.
+  # Check for C library flavor since GNU/Linux platforms use different
+  # configuration directories depending on the C library in use.
   cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 


Re: v2[PATCH] update to libtool-2.4.2 and regenerate

2011-11-01 Thread Gerald Pfeifer
On Mon, 31 Oct 2011, Markus Trippelsdorf wrote:
 This is an updated version of the libtool update patch. It fixes the
 --with-sysroot clash by reverting commit 3334f7ed5851ef1 in libtools.
 I've also included Rainer's 64bit Solaris patch.

For the record, older versions of libtool have references to Linux
(where RMS would like to see GNU/Linux) which this addresses, too.

Doing this update is really beneficial from this side as well.

Gerald


Re: [PATCH] Add vec_pack_ufix_trunc_{v4df,v2df} expanders (take 2)

2011-11-01 Thread Richard Henderson
On 11/01/2011 06:35 AM, Jakub Jelinek wrote:
 ... disadvantage is that the stmts in the new
 pattern are now in vcmppd; vandpd; vaddpd; vcmppd; vandpd; vaddpd order
 instead of vcmppd; vcmppd; vandpd; vandpd; vaddpd; vaddpd; (not sure why
 the scheduler didn't change it, but on the other side it is scheduler's
 job).

I wonder if the scheduling description didn't get updated properly?
If the scheduler believes that each insn takes 1 cycle, and there
is only one pipe for them, it won't reorder anything.

   * config/i386/i386-protos.h (ix86_expand_adjust_ufix_to_sfix_si): New
   prototype.
   * config/i386/i386.c (ix86_expand_adjust_ufix_to_sfix_si): New
   function.
   * config/i386/sse.md (fixuns_trunc<mode><sseintvecmodelower>2): Use
   it.
   (ssepackfltmode): New mode attr.
   (vec_pack_ufix_trunc_<mode>): New expander.

Looks good to me.


r~


Re: implementation of std::thread::hardware_concurrency()

2011-11-01 Thread Jonathan Wakely
On 1 November 2011 11:54, Marc Glisse wrote:
 On Tue, 1 Nov 2011, niXman wrote:

 diff --git a/libstdc++-v3/src/thread.cc b/libstdc++-v3/src/thread.cc
 index 09e7fc5..6feda4d 100644
 --- a/libstdc++-v3/src/thread.cc
 +++ b/libstdc++-v3/src/thread.cc
 @@ -112,10 +112,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  unsigned int
  thread::hardware_concurrency() noexcept
  {
 -    int __n = _GLIBCXX_NPROCS;
 -    if (__n < 0)
 -      __n = 0;
 -    return __n;
 +    int count=0;
 +#if defined(PTW32_VERSION) || \
 +   (defined(__MINGW64_VERSION_MAJOR) && defined(_POSIX_THREADS)) || \
 +   defined(__hpux)
 +    count=pthread_num_processors_np();
 +#elif defined(__APPLE__) || defined(__FreeBSD__)
 +    size_t size=sizeof(count);
 +    sysctlbyname("hw.ncpu", &count, &size, NULL, 0);
 +#elif defined(_SC_NPROCESSORS_ONLN)
 +    count=sysconf(_SC_NPROCESSORS_ONLN);
 +#elif defined(_GLIBCXX_USE_GET_NPROCS)
 +    count=_GLIBCXX_NPROCS;
 +#endif
 +    return (count>0)?count:0;

 Er, the macro _GLIBCXX_NPROCS already handles the case
 sysconf(_SC_NPROCESSORS_ONLN). It looks like you actually want to remove the
 macro _GLIBCXX_NPROCS completely.

Right, I already handled the case of using sysconf.  I'm going to veto
this patch in its current form - please check how it works now before
changing this code.

_GLIBCXX_NPROCS should be made to call pthread_num_processors_np() for
mingw or HPUX.


Re: PowerPC shrink-wrap support 3 of 3

2011-11-01 Thread Alan Modra
On Tue, Nov 01, 2011 at 12:57:22AM +1030, Alan Modra wrote:
 Bits left to do
 - limit size of duplicated tails

Done here.  Also fixes a hole in that I took no notice of
targetm.cannot_copy_insn_p when duplicating tails.

One interesting result is that the tail duplication actually reduces
the text size of libstdc++.so from 1074042 to 1073478 bytes on
powerpc-linux.  The reason being that a shrink-wrapped function that
needs a prologue only on paths ending in a sibling call will lose one
copy of the epilogue.  That must happen enough to more than make up
for duplicated tails.

Bootstrapped and regression tested powerpc-linux.  OK to apply?
(And I won't be posting any more versions of the patch until this is
reviewed.  Please excuse me for spamming the list.)

* function.c (bb_active_p): Delete.
(dup_block_and_redirect, active_insn_between): New functions.
(convert_jumps_to_returns, emit_return_for_exit): New functions,
split out from..
(thread_prologue_and_epilogue_insns): ..here.  Delete
shadowing variables.  Don't do prologue register clobber tests
when shrink wrapping already failed.  Delete all last_bb_active
code.  Instead compute tail block candidates for duplicating
exit path.  Remove these from antic set.  Duplicate tails when
reached from both blocks needing a prologue/epilogue and
blocks not needing such.
* ifcvt.c (dead_or_predicable): Test both flag_shrink_wrap and
HAVE_simple_return.
* bb-reorder.c (get_uncond_jump_length): Make global.
* bb-reorder.h (get_uncond_jump_length): Declare.
* cfgrtl.c (rtl_create_basic_block): Comment typo fix.
(rtl_split_edge): Likewise.  Warning fix.
(rtl_duplicate_bb): New function.
(rtl_cfg_hooks): Enable can_duplicate_block_p and duplicate_block.

Index: gcc/function.c
===
--- gcc/function.c  (revision 180588)
+++ gcc/function.c  (working copy)
@@ -65,6 +65,8 @@ along with GCC; see the file COPYING3.  
 #include "df.h"
 #include "timevar.h"
 #include "vecprim.h"
+#include "params.h"
+#include "bb-reorder.h"
 
 /* So we can assign to cfun in this file.  */
 #undef cfun
@@ -5290,8 +5292,6 @@ requires_stack_frame_p (rtx insn, HARD_R
   HARD_REG_SET hardregs;
   unsigned regno;
 
-  if (!INSN_P (insn) || DEBUG_INSN_P (insn))
-return false;
   if (CALL_P (insn))
 return !SIBLING_CALL_P (insn);
 
@@ -5514,23 +5514,186 @@ set_return_jump_label (rtx returnjump)
 JUMP_LABEL (returnjump) = ret_rtx;
 }
 
-/* Return true if BB has any active insns.  */
+#ifdef HAVE_simple_return
+/* Create a copy of BB instructions and insert at BEFORE.  Redirect
+   preds of BB to COPY_BB if they don't appear in NEED_PROLOGUE.  */
+static void
+dup_block_and_redirect (basic_block bb, basic_block copy_bb, rtx before,
+   bitmap_head *need_prologue)
+{
+  edge_iterator ei;
+  edge e;
+  rtx insn = BB_END (bb);
+
+  /* We know BB has a single successor, so there is no need to copy a
+ simple jump at the end of BB.  */
+  if (simplejump_p (insn))
+insn = PREV_INSN (insn);
+
+  start_sequence ();
+  duplicate_insn_chain (BB_HEAD (bb), insn);
+  if (dump_file)
+{
+  unsigned count = 0;
+  for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
+   if (active_insn_p (insn))
+ ++count;
+  fprintf (dump_file, "Duplicating bb %d to bb %d, %u active insns.\n",
+  bb->index, copy_bb->index, count);
+}
+  insn = get_insns ();
+  end_sequence ();
+  emit_insn_before (insn, before);
+
+  /* Redirect all the paths that need no prologue into copy_bb.  */
+  for (ei = ei_start (bb->preds); (e = ei_safe_edge (ei)); )
+if (!bitmap_bit_p (need_prologue, e->src->index))
+  {
+   redirect_edge_and_branch_force (e, copy_bb);
+   continue;
+  }
+else
+  ei_next (&ei);
+}
+#endif
+
+#if defined (HAVE_return) || defined (HAVE_simple_return)
+/* Return true if there are any active insns between HEAD and TAIL.  */
 static bool
-bb_active_p (basic_block bb)
+active_insn_between (rtx head, rtx tail)
 {
+  while (tail)
+{
+  if (active_insn_p (tail))
+   return true;
+  if (tail == head)
+   return false;
+  tail = PREV_INSN (tail);
+}
+  return false;
+}
+
+/* LAST_BB is a block that exits, and empty of active instructions.
+   Examine its predecessors for jumps that can be converted to
+   (conditional) returns.  */
+static VEC (edge, heap) *
+convert_jumps_to_returns (basic_block last_bb, bool simple_p,
+ VEC (edge, heap) *unconverted ATTRIBUTE_UNUSED)
+{
+  int i;
+  basic_block bb;
   rtx label;
+  edge_iterator ei;
+  edge e;
+  VEC(basic_block,heap) *src_bbs;
+
+  src_bbs = VEC_alloc (basic_block, heap, EDGE_COUNT (last_bb->preds));
+  FOR_EACH_EDGE (e, ei, last_bb->preds)
+if (e->src != ENTRY_BLOCK_PTR)
+  VEC_quick_push (basic_block, 

Re: implementation of std::thread::hardware_concurrency()

2011-11-01 Thread niXman
What exactly is it about this patch that you don't accept?


2011/11/1 Jonathan Wakely jwakely@gmail.com:
 On 1 November 2011 11:54, Marc Glisse wrote:
 On Tue, 1 Nov 2011, niXman wrote:

 diff --git a/libstdc++-v3/src/thread.cc b/libstdc++-v3/src/thread.cc
 index 09e7fc5..6feda4d 100644
 --- a/libstdc++-v3/src/thread.cc
 +++ b/libstdc++-v3/src/thread.cc
 @@ -112,10 +112,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  unsigned int
  thread::hardware_concurrency() noexcept
  {
 -    int __n = _GLIBCXX_NPROCS;
 -    if (__n < 0)
 -      __n = 0;
 -    return __n;
 +    int count=0;
 +#if defined(PTW32_VERSION) || \
 +   (defined(__MINGW64_VERSION_MAJOR) && defined(_POSIX_THREADS)) || \
 +   defined(__hpux)
 +    count=pthread_num_processors_np();
 +#elif defined(__APPLE__) || defined(__FreeBSD__)
 +    size_t size=sizeof(count);
 +    sysctlbyname("hw.ncpu", &count, &size, NULL, 0);
 +#elif defined(_SC_NPROCESSORS_ONLN)
 +    count=sysconf(_SC_NPROCESSORS_ONLN);
 +#elif defined(_GLIBCXX_USE_GET_NPROCS)
 +    count=_GLIBCXX_NPROCS;
 +#endif
 +    return (count>0)?count:0;

 Er, the macro _GLIBCXX_NPROCS already handles the case
 sysconf(_SC_NPROCESSORS_ONLN). It looks like you actually want to remove the
 macro _GLIBCXX_NPROCS completely.

 Right, I already handled the case of using sysconf.  I'm going to veto
 this patch in its current form - please check how it works now before
 changing this code.

 _GLIBCXX_NPROCS should be made to call pthread_num_processors_np() for
 mingw or HPUX.



Re: implementation of std::thread::hardware_concurrency()

2011-11-01 Thread Jonathan Wakely
I've put gcc-patches@ back in the CC list and removed gcc@


On 1 November 2011 15:35, niXman wrote:
 Er, the macro _GLIBCXX_NPROCS already handles
 the case sysconf(_SC_NPROCESSORS_ONLN).
 It looks like you actually want to remove the macro
 _GLIBCXX_NPROCS completely.

 Fixed.

No, this still isn't acceptable.

I do not want to see preprocessor tests like

+#elif defined(__APPLE__) || defined(__FreeBSD__)

in the body of thread::hardware_concurrency(); the configure
script should determine what is available on the platform and set an
appropriate macro.

Look at the definition of _GLIBCXX_NPROCS and adjust that to do

#define _GLIBCXX_NPROCS pthread_num_processors_np()

for the relevant platforms.

For the platforms using sysctlbyname there could be an inline function
that calls it, and _GLIBCXX_NPROCS could be defined to call that, so
that thread::hardware_concurrency() can still be defined as it is
today.

Please read the code you're changing and understand how it works today
before making changes.
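
The arrangement suggested above might look roughly like the following minimal sketch. The macro SKETCH_USE_SYSCTLBYNAME and the names sketch_get_nprocs(), SKETCH_NPROCS and sketch_hardware_concurrency() are hypothetical, invented only for illustration; a real patch would use a configure-defined _GLIBCXX_* macro. Only sysctlbyname(), the "hw.ncpu" name and sys/sysctl.h come from the thread.

```cpp
#include <cstddef>

// Hypothetical macro: assumed to be set by a configure test only on
// platforms where sysctlbyname() is actually available.
#ifdef SKETCH_USE_SYSCTLBYNAME
# include <sys/types.h>
# include <sys/sysctl.h>
static inline int sketch_get_nprocs()
{
  int count = 0;
  std::size_t size = sizeof(count);
  // Query the "hw.ncpu" sysctl; report 0 ("unknown") on failure.
  if (sysctlbyname("hw.ncpu", &count, &size, NULL, 0) != 0)
    return 0;
  return count;
}
# define SKETCH_NPROCS sketch_get_nprocs()
#else
// No supported mechanism was detected: report "unknown".
# define SKETCH_NPROCS 0
#endif

// The function body itself contains no platform tests; it only clamps
// whatever the selected macro expands to.
unsigned int sketch_hardware_concurrency() noexcept
{
  int n = SKETCH_NPROCS;
  if (n < 0)
    n = 0;
  return n;
}
```

With this shape, adding support for another platform would only touch the macro selection, not the function body.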


Re: implementation of std::thread::hardware_concurrency()

2011-11-01 Thread niXman
 What header is required for pthread_num_processors_np?
<pthread.h>

 Also, you should include <sys/sysctl.h> before calling sysctlbyname.
Strictly speaking, yes.
sysctlbyname() is implicitly declared through some other header files.


Re: implementation of std::thread::hardware_concurrency()

2011-11-01 Thread niXman
OK, I will correct it.

2011/11/1 Jonathan Wakely jwakely@gmail.com:
 I've put gcc-patches@ back in the CC list and removed gcc@


 On 1 November 2011 15:35, niXman wrote:
 Er, the macro _GLIBCXX_NPROCS already handles
 the case sysconf(_SC_NPROCESSORS_ONLN).
 It looks like you actually want to remove the macro
 _GLIBCXX_NPROCS completely.

 Fixed.

 No, this still isn't acceptable.

 I do not want to see preprocessor tests like

 +#elif defined(__APPLE__) || defined(__FreeBSD__)

 in the body of thread::hardware_concurrency(); the configure
 script should determine what is available on the platform and set an
 appropriate macro.

 Look at the definition of _GLIBCXX_NPROCS and adjust that to do

 #define _GLIBCXX_NPROCS pthread_num_processors_np()

 for the relevant platforms.

 For the platforms using sysctlbyname there could be an inline function
 that calls it, and _GLIBCXX_NPROCS could be defined to call that, so
 that thread::hardware_concurrency() can still be defined as it is
 today.

 Please read the code you're changing and understand how it works today
 before making changes.



[RFC][cxx-mem-model] mem_signal_fence

2011-11-01 Thread Richard Henderson
Any comments on the expectation, or implementation of signal-fence below?
Should I make the distinction between the memory models here at all?

At minimum there's another typo in the ifdef section; we really need to
minimize those...


r~
diff --git a/gcc/builtins.c b/gcc/builtins.c
index 756070f..34922a8 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -5530,16 +5530,18 @@ expand_builtin_atomic_is_lock_free (tree exp)
 /* This routine will either emit the mem_thread_fence pattern or issue a 
sync_synchronize to generate a fence for memory model MEMMODEL.  */
 
+#ifndef HAVE_mem_thread_fence
+# define HAVE_mem_thread_fence 0
+# define gen_mem_thread_fence(x) (gcc_unreachable (), NULL_RTX)
+#endif
+
 void
 expand_builtin_mem_thread_fence (enum memmodel model)
 {
-  if (model == MEMMODEL_RELAXED)
-return;
-#ifdef HAVE_mem_thread_fence
-  emit_insn (gen_mem_thread_fence (GEN_INT (model)));
-#else
-  expand_builtin_sync_synchronize ();
-#endif
+  if (HAVE_mem_thread_fence)
+emit_insn (gen_mem_thread_fence (GEN_INT (model)));
+  else if (model != MEMMODEL_RELAXED)
+expand_builtin_sync_synchronize ();
 }
 
 /* Expand the __atomic_thread_fence intrinsic:
@@ -5558,15 +5560,38 @@ expand_builtin_atomic_thread_fence (tree exp)
 /* This routine will either emit the mem_signal_fence pattern or issue a 
sync_synchronize to generate a fence for memory model MEMMODEL.  */
 
+#ifndef HAVE_mem_signal_fence
+# define HAVE_mem_signal_fence 0
+# define gen_mem_signal_fence(x) (gcc_unreachable (), NULL_RTX)
+#endif
+
 static void
 expand_builtin_mem_signal_fence (enum memmodel model)
 {
-#ifdef HAVE_mem_signal_fence
-  emit_insn (gen_mem_signal_fence (memmodel));
-#else
-  if (model != MEMMODEL_RELAXED)
-expand_builtin_sync_synchronize ();
-#endif
+  if (HAVE_mem_signal_fence)
+emit_insn (gen_mem_signal_fence (GEN_INT (model)));
+  else
+{
+  rtx x;
+
+  /* By default I expect that targets are coherent between a thread and
+the signal handler running on the same thread.  Thus this really
+becomes a compiler barrier, in that stores must not be sunk past
+(or raised above) a given point.  */
+  switch (model)
+   {
+   case MEMMODEL_RELAXED:
+ break;
+   case MEMMODEL_SEQ_CST:
+ gen_blockage ();
+ break;
+   default:
+ x = gen_rtx_SCRATCH (Pmode);
+ x = gen_rtx_MEM (BLKmode, x);
+ emit_insn (gen_rtx_USE (x));
+ break;
+   }
+}
 }
 
 /* Expand the __atomic_signal_fence intrinsic:
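
For readers less familiar with the intent, the "compiler barrier" behaviour described in the patch comment can be illustrated at the C++ source level with std::atomic_signal_fence. This is only a sketch of the user-visible semantics, not of the expander itself; the names payload, ready, observed, sketch_handler and publish_and_signal are made up for the example.

```cpp
#include <atomic>
#include <csignal>

// Hypothetical shared state between a thread and a signal handler
// running on that same thread.
static std::atomic<int> payload{0};
static std::atomic<int> ready{0};
static std::atomic<int> observed{-1};

extern "C" void sketch_handler(int)
{
  if (ready.load(std::memory_order_relaxed))
    {
      // Acquire side: the load of payload must not be hoisted above the
      // check of ready by the compiler.
      std::atomic_signal_fence(std::memory_order_acquire);
      observed.store(payload.load(std::memory_order_relaxed),
                     std::memory_order_relaxed);
    }
}

int publish_and_signal(int value)
{
  std::signal(SIGINT, sketch_handler);
  payload.store(value, std::memory_order_relaxed);
  // Release side: the store to payload must not be sunk below the store
  // to ready.  No hardware fence is needed for same-thread handlers.
  std::atomic_signal_fence(std::memory_order_release);
  ready.store(1, std::memory_order_relaxed);
  std::raise(SIGINT);            // handler runs synchronously here
  std::signal(SIGINT, SIG_DFL);
  return observed.load(std::memory_order_relaxed);
}
```

Calling publish_and_signal(42) installs the handler, publishes the value and raises the signal on the calling thread; the fences only constrain compiler reordering, which matches the assumption stated in the patch comment that a thread is coherent with its own signal handler.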


Re: implementation of std::thread::hardware_concurrency()

2011-11-01 Thread Jonathan Wakely
On 1 November 2011 15:57, niXman wrote:
 What header is required for pthread_num_processors_np?
 <pthread.h>

OK.

This assumes that Pthreads is the only abstraction available on __hpux
(i.e. that if _GLIBCXX_HAS_GTHREADS is true then we have already
included pthread.h):

+#if defined(PTW32_VERSION) || \
+   (defined(__MINGW64_VERSION_MAJOR) && defined(_POSIX_THREADS)) || \
+   defined(__hpux)

Is that assumption safe?


 Also, you should include <sys/sysctl.h> before calling sysctlbyname.
 Strictly speaking, yes.
 sysctlbyname() is implicitly declared through some other header files.

The manual page says it requires <sys/sysctl.h> so please do that,
otherwise a future version of darwin or freebsd might stop implicitly
including it and break the code.


Re: [libstdc++, patch] Refer to GNU/Linux in acinclude.m4

2011-11-01 Thread Joseph S. Myers
On Tue, 1 Nov 2011, Gerald Pfeifer wrote:

 -  # Check for C library flavor since Linux platforms use different 
 configuration
 -  # directories depending on the C library in use.
 +  # Check for C library flavor since GNU/Linux platforms use different
 +  # configuration directories depending on the C library in use.

I think this is a case that is definitely referring to platforms using the 
Linux kernel and not restricted in any way to GNU/Linux platforms (so 
platforms using the Linux kernel might be a better description in the 
comment).  It's a comment on tests for uClibc and Bionic, and even if you 
account for some GNU code present in uClibc, Bionic is the C library for 
Android which is the canonical example of a Linux system which is not 
GNU/Linux (no GPL code in userspace) - the test is for whether a Linux 
system is GNU/Linux or not.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: implementation of std::thread::hardware_concurrency()

2011-11-01 Thread Jonathan Wakely
On 1 November 2011 16:01, Jonathan Wakely wrote:
 On 1 November 2011 15:57, niXman wrote:
 What header is required for pthread_num_processors_np?
 <pthread.h>

 OK.

 This assumes that Pthreads is the only abstraction available on __hpux
 (i.e. that if _GLIBCXX_HAS_GTHREADS is true then we have already
 included pthread.h):

 +#if defined(PTW32_VERSION) || \
 +   (defined(__MINGW64_VERSION_MAJOR) && defined(_POSIX_THREADS)) || \
 +   defined(__hpux)

 Is that assumption safe?

OK, gthr-dec.h includes pthread.h so I think it is safe.

Do all supported versions of Pthreads-win32, mingw64 and HPUX define
pthread_num_processors_np() in pthread.h?  They might not, which is
why there should be a configure test checking for the availability of
that function, which sets a macro such as
_GLIBCXX_USE_PTHREAD_NUM_PROCESSORS_NP, which is then checked in
src/thread.cc


Re: [libstdc++, patch] Refer to GNU/Linux in acinclude.m4

2011-11-01 Thread Gerald Pfeifer
On Tue, 1 Nov 2011, Joseph S. Myers wrote:
 +  # Check for C library flavor since GNU/Linux platforms use different
 +  # configuration directories depending on the C library in use.
 I think this is a case that is definitely referring to platforms using the 
 Linux kernel and not restricted in any way to GNU/Linux platforms (so 
 platforms using the Linux kernel might be a better description in the 
 comment).  It's a comment on tests for uClibc and Bionic, and even if you 
 account for some GNU code present in uClibc, Bionic is the C library for 
 Android which is the canonical example of a Linux system which is not 
 GNU/Linux (no GPL code in userspace) - the test is for whether a Linux 
 system is GNU/Linux or not.

I was thinking of that, and agree it's a border case.  Given that
significant parts of the GNU toolchain are being used here it's not
just about the Linux kernel, but also at least some parts of GNU
and from that point it gets messy pretty quickly.  (Luckily I have
not seen GNU/Solaris being suggested yet.)

If you feel this is simply a mistake, happy to change to platforms
using the Linux kernel or start a conversation with you and RMS
(though the latter may be overkill).  Let me know.

Gerald


Re: building binutils from same directory as gcc

2011-11-01 Thread Mike Stump
On Nov 1, 2011, at 4:27 AM, Andrew Haley wrote:
 On 10/30/2011 01:51 PM, Gerald Pfeifer wrote:
 Why not just declare
 that building from the same directory is not supported and have one
 simple set of instructions that always works, as opposed to this
 ought to work with snapshots but not with direct checkouts?
 
 That's right.  Is there ever any advantage to building in-srcdir?

Yes.  You can do configure && make && make install.


Re: [google] ThreadSanitizer instrumentation pass (issue 5303083)

2011-11-01 Thread Xinliang David Li
That means some existing bugs get exposed. Your previous version
simply skipped the target mem refs.  You will need to debug the
problem a little more.

David

On Tue, Nov 1, 2011 at 5:26 AM,  dvyu...@google.com wrote:
 On 2011/10/31 06:08:34, davidxl wrote:

 http://codereview.appspot.com/5303083/diff/1/gcc/passes.c#newcode1423
 gcc/passes.c:1423: NEXT_PASS (pass_tsan);
 Move this to the same place as asan. Otherwise TARGET_MEM_REF won't be
 handled.

 After I moved the pass it started crashing:
 Program received signal SIGSEGV, Segmentation fault.
 0x00718f94 in is_gimple_reg_type (t=0x7771efa0)
    at gimple.c:2960
 2960      return !AGGREGATE_TYPE_P (type);
 (gdb) bt
 #0  0x00718f94 in is_gimple_reg_type (t=0x7771efa0)
    at gimple.c:2960
 #1  is_gimple_val (t=0x7771efa0) at gimple.c:3028
 #2  0x008a5d20 in verify_types_in_gimple_reference
 (expr=0x776b74c0, require_lvalue=false)
    at tree-cfg.c:2934
 #3  0x008b2d4f in verify_gimple_in_cfg (fn=0x777c67e0)
    at tree-cfg.c:4382
 #4  0x00a061d6 in verify_ssa (check_modified_stmt=true)
    at tree-ssa.c:924
 #5  0x007f755c in execute_function_todo (data=Unhandled dwarf
 expression opcode 0xf3
 )
    at passes.c:1727
 #6  0x007f7e4d in execute_todo (flags=34854) at passes.c:1758
 #7  0x007fafda in execute_one_pass (pass=0x122a900)
    at passes.c:2104

 The code seems to be (however I am not 100% sure):
  D.3617_33 = MEM[(const uint64_t *)ctx_11(D)].nhkey[D.3612_26]{lb:
 D.3810_279 sz: 8};


 http://codereview.appspot.com/5303083/



Re: implementation of std::thread::hardware_concurrency()

2011-11-01 Thread Mike Stump
On Nov 1, 2011, at 8:55 AM, Jonathan Wakely wrote:
 Is there a reason you used hw.ncpu not the constant HW_NCPU ?

I suspect on some systems, this would be a runtime value  so, no fixed 
constant could ever work.


Re: implementation of std::thread::hardware_concurrency()

2011-11-01 Thread Jonathan Wakely
On 1 November 2011 17:06, Mike Stump wrote:
 On Nov 1, 2011, at 8:55 AM, Jonathan Wakely wrote:
 Is there a reason you used hw.ncpu not the constant HW_NCPU ?

 I suspect on some systems, this would be a runtime value  so, no fixed 
 constant could ever work.

It's a constant for identifying the sysctl, not a constant for the
number of processors e.g. (untested)

  int mib[] = { CTL_HW, HW_NCPU };
  if (!sysctl(mib, 2, &count, &size, NULL, 0))

The Mac OS X man page says the sysctl() function runs in about a
third the time as the same request made via the sysctlbyname()
function.

My preferred solution (which would be consistent with the existing
code, and additionally support NetBSD, OpenBSD and Irix) would be to
add autoconf tests for the required functionality, then:

#if defined(_GLIBCXX_USE_GET_NPROCS)
# include <sys/sysinfo.h>
# define _GLIBCXX_NPROCS get_nprocs()
#elif defined(_GLIBCXX_USE_SC_NPROCESSORS_ONLN)
# include <unistd.h>
# define _GLIBCXX_NPROCS sysconf(_SC_NPROCESSORS_ONLN)
#elif defined(_GLIBCXX_USE_SC_NPROC_ONLN)
# include <unistd.h>
# define _GLIBCXX_NPROCS sysconf(_SC_NPROC_ONLN)
#elif defined(_GLIBCXX_USE_PTHREADS_NUM_PROCESSORS_NP)
# define _GLIBCXX_NPROCS pthread_num_processors_np()
#elif defined(_GLIBCXX_USE_SYSCTLBYNAME_HW_NCPU)
# include <sys/sysctl.h>
static inline int get_nprocs()
{
  int count;
  size_t size = sizeof(count);
  int mib[] = { CTL_HW, HW_NCPU };
  if (!sysctl(mib, 2, &count, &size, NULL, 0))
    return count;
  return 0;
}
# define _GLIBCXX_NPROCS get_nprocs()
#else
# define _GLIBCXX_NPROCS 0
#endif

...

  unsigned int
  thread::hardware_concurrency() noexcept
  {
    int __n = _GLIBCXX_NPROCS;
    if (__n < 0)
      __n = 0;
    return __n;
  }


Re: [google] ThreadSanitizer instrumentation pass (issue 5303083)

2011-11-01 Thread davidxl


http://codereview.appspot.com/5303083/diff/3002/gcc/tree-tsan.c
File gcc/tree-tsan.c (right):

http://codereview.appspot.com/5303083/diff/3002/gcc/tree-tsan.c#newcode1075
gcc/tree-tsan.c:1075: for (eidx = 0; VEC_iterate (edge, exit_bb-preds,
eidx, e); eidx++)
Use FOR_EACH_EDGE macro

http://codereview.appspot.com/5303083/diff/3002/gcc/tree-tsan.c#newcode1082
gcc/tree-tsan.c:1082: gsi_insert_seq_before (gsi, post_func_seq,
GSI_SAME_STMT);
On 2011/11/01 11:39:49, dvyukov wrote:

Do I need to make a copy of POST_FUNC_SEQ here?
I think that I do not need location info for this code at all, so is
it OK to leave the seq w/o location and then insert it into several
basic blocks?

Yes, do not share gimple stmts.

http://codereview.appspot.com/5303083/


Re: implementation of std::thread::hardware_concurrency()

2011-11-01 Thread Mike Stump
On Nov 1, 2011, at 10:13 AM, Jonathan Wakely wrote:
 I suspect on some systems, this would be a runtime value  so, no fixed 
 constant could ever work.
 
 It's a constant for identifying the sysctl, not a constant for the
 number of processors e.g. (untested)

Ah, never mind, ignore me.


[Patch, libfortran] PR 46686 Implement backtrace using libgcc functionality

2011-11-01 Thread Janne Blomqvist
Hi,

the attached patch changes the backtracing functionality, which is
used to print a stack trace before aborting when something goes
belly-up, to use the stack unwinding functionality provided by libgcc
instead of using the glibc backtrace_symbols and backtrace_symbols_fd
functions, or the pstack utility which is available on some systems
(Solaris?). There are some nice benefits of this:

- It should work on all targets, not only those which use glibc or pstack.

- It gets the correct line numbers, whereas the backtrace_symbols_fd
output was usually (but not always) offset by one. This is probably
related to the use of _Unwind_GetIPInfo and in some cases decrementing
the IP.

- Based on some googling, it's a bit unclear whether backtrace()
and/or backtrace_symbols_fd() actually are async-signal-safe due to
usage of dlsym/dladdr and such.

It still uses addr2line if available to print out function and file
names and line numbers. If addr2line is not found on the path during
program startup, it resorts to printing out the addresses only.
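
As a rough illustration of the approach (not the code in the attached patch), walking the stack with libgcc's unwinder can be sketched as below. The names trace_state, trace_callback and print_raw_backtrace are hypothetical, and the sketch uses _Unwind_GetIP for simplicity, whereas the patch description mentions _Unwind_GetIPInfo; it prints raw addresses only, which corresponds to the fallback when addr2line is not available.

```cpp
#include <unwind.h>
#include <cstdint>
#include <cstdio>

// Hypothetical per-walk state handed to the callback.
struct trace_state
{
  int frame;       // number of frames printed so far
  int max_frames;  // stop after this many frames
};

// Called by the unwinder once per stack frame.
static _Unwind_Reason_Code
trace_callback(struct _Unwind_Context *context, void *arg)
{
  trace_state *state = static_cast<trace_state *>(arg);
  if (state->frame >= state->max_frames)
    return _URC_END_OF_STACK;   // any value other than _URC_NO_REASON stops the walk

  std::uintptr_t ip = static_cast<std::uintptr_t>(_Unwind_GetIP(context));
  // GDB-style "#NUM" prefix followed by the raw instruction pointer.
  std::fprintf(stderr, "#%d  0x%llx\n", state->frame,
               static_cast<unsigned long long>(ip));
  ++state->frame;
  return _URC_NO_REASON;        // continue with the next frame
}

// Prints up to max_frames frames and returns how many were printed.
int print_raw_backtrace(int max_frames)
{
  trace_state state = { 0, max_frames };
  _Unwind_Backtrace(trace_callback, &state);
  return state.frame;
}
```

A call such as print_raw_backtrace(16) from an error path would print one line per frame to stderr; mapping the addresses to function names and line numbers is left to an external tool in this sketch.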

Regtested on x86_64-unknown-linux-gnu, Ok for trunk?

2011-11-01  Janne Blomqvist  j...@gcc.gnu.org

PR fortran/46686
* configure.ac: Don't check execinfo.h, backtrace,
backtrace_symbols_fd. Check execve instead of execvp. Call
GCC_CHECK_UNWIND_GETIPINFO.
* runtime/backtrace.c: Don't include unused headers, include
limits.h and unwind.h.
(CAN_FORK): Check execve instead of execvp.
(GLIBC_BACKTRACE): Remove.
(bt_header): Conform to gdb backtrace format.
(struct bt_state): New struct.
(trace_function): New function.
(show_backtrace): Use _Unwind_Backtrace from libgcc instead of
glibc backtrace functions.



-- 
Janne Blomqvist
diff --git a/libgfortran/configure.ac b/libgfortran/configure.ac
index 74cfe44..32431c0 100644
--- a/libgfortran/configure.ac
+++ b/libgfortran/configure.ac
@@ -249,7 +249,7 @@ AC_HEADER_TIME
 AC_HAVE_HEADERS(stdio.h stdlib.h string.h unistd.h signal.h stdarg.h)
 AC_CHECK_HEADERS(time.h sys/time.h sys/times.h sys/resource.h)
 AC_CHECK_HEADERS(sys/types.h sys/stat.h sys/wait.h floatingpoint.h ieeefp.h)
-AC_CHECK_HEADERS(fenv.h fptrap.h float.h execinfo.h pwd.h)
+AC_CHECK_HEADERS(fenv.h fptrap.h float.h pwd.h)
 AC_CHECK_HEADER([complex.h],[AC_DEFINE([HAVE_COMPLEX_H], [1], [complex.h exists])])
 GCC_HEADER_STDINT(gstdint.h)
 
@@ -261,14 +261,11 @@ AC_CHECK_MEMBERS([struct stat.st_rdev])
 AC_CHECK_FUNCS(getrusage times mkstemp strtof strtold snprintf ftruncate chsize)
 AC_CHECK_FUNCS(chdir strerror getlogin gethostname kill link symlink perror)
 AC_CHECK_FUNCS(sleep time ttyname signal alarm clock access fork execl)
-AC_CHECK_FUNCS(wait setmode execvp pipe dup2 close fdopen strcasestr getrlimit)
+AC_CHECK_FUNCS(wait setmode execve pipe dup2 close fdopen strcasestr getrlimit)
 AC_CHECK_FUNCS(gettimeofday stat fstat lstat getpwuid vsnprintf dup getcwd)
 AC_CHECK_FUNCS(localtime_r gmtime_r strerror_r getpwuid_r ttyname_r)
 AC_CHECK_FUNCS(clock_gettime strftime readlink)
 
-# Check for glibc backtrace functions
-AC_CHECK_FUNCS(backtrace backtrace_symbols_fd)
-
 # Check libc for getgid, getpid, getuid
 AC_CHECK_LIB([c],[getgid],[AC_DEFINE([HAVE_GETGID],[1],[libc includes getgid])])
 AC_CHECK_LIB([c],[getpid],[AC_DEFINE([HAVE_GETPID],[1],[libc includes getpid])])
@@ -562,6 +559,9 @@ LIBGFOR_CHECK_UNLINK_OPEN_FILE
 # Check whether line terminator is LF or CRLF
 LIBGFOR_CHECK_CRLF
 
+# Check whether we have _Unwind_GetIPInfo for backtrace
+GCC_CHECK_UNWIND_GETIPINFO
+
 AC_CACHE_SAVE
 
 if test ${multilib} = yes; then
diff --git a/libgfortran/runtime/backtrace.c b/libgfortran/runtime/backtrace.c
index 7d6479f..70aae91 100644
--- a/libgfortran/runtime/backtrace.c
+++ b/libgfortran/runtime/backtrace.c
@@ -26,46 +26,38 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 
 #include <string.h>
 
-#ifdef HAVE_STDLIB_H
-#include <stdlib.h>
-#endif
-
-#ifdef HAVE_INTTYPES_H
-#include <inttypes.h>
-#endif
-
 #ifdef HAVE_UNISTD_H
 #include <unistd.h>
 #endif
 
-#ifdef HAVE_EXECINFO_H
-#include <execinfo.h>
-#endif
-
 #ifdef HAVE_SYS_WAIT_H
 #include <sys/wait.h>
 #endif
 
-#include <ctype.h>
+#include <limits.h>
+
+#include <unwind.h>
 
 
 /* Macros for common sets of capabilities: can we fork and exec, can
we use glibc-style backtrace functions, and can we use pipes.  */
-#define CAN_FORK (defined(HAVE_FORK) && defined(HAVE_EXECVP) \
+#define CAN_FORK (defined(HAVE_FORK) && defined(HAVE_EXECVE) \
 		  && defined(HAVE_WAIT))
-#define GLIBC_BACKTRACE (defined(HAVE_BACKTRACE) \
-			 && defined(HAVE_BACKTRACE_SYMBOLS_FD))
 #define CAN_PIPE (CAN_FORK && defined(HAVE_PIPE) \
 		  && defined(HAVE_DUP2) && defined(HAVE_FDOPEN) \
 		  && defined(HAVE_CLOSE))
 
+#ifndef PATH_MAX
+#define PATH_MAX 4096
+#endif
+
 
 /* GDB style #NUM index for each stack frame.  */
 
 static void 
 bt_header (int num)
 {
-  st_printf (" #%d  ", num);
+  st_printf ("#%d  ", num);
 }
 
 
@@ -106,24 +98,105 @@ 

Re: [PATCH, rs6000] Preserve link stack for 476 cpus

2011-11-01 Thread Peter Bergner
On Mon, 2011-10-31 at 19:05 -0400, David Edelsohn wrote:
 Okay, go ahead with PPC64 support as well.  Hopefully no one ever will
 have to use it.  That implies the option should not explicitly
 reference ppc476.

Ok, for completeness, I attached what I committed below, which includes
the support for 64-bit because it makes the code cleaner and changes
the option name back to -mpreserve-link-stack.  Thanks.

Peter


* config.gcc (powerpc*-*-linux*): Add powerpc*-*-linux*ppc476* variant.
* config/rs6000/476.h: New file.
* config/rs6000/476.opt: Likewise.
* config/rs6000/rs6000.h (TARGET_LINK_STACK): New define.
(SET_TARGET_LINK_STACK): Likewise.
(TARGET_ASM_CODE_END): Define.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Enable
TARGET_LINK_STACK for -mtune=476 and -mtune=476fp.
(rs6000_legitimize_tls_address): Emit the link stack preserving GOT
code if TARGET_LINK_STACK.
(rs6000_emit_load_toc_table): Likewise.
(output_function_profiler): Likewise
(macho_branch_islands): Likewise
(machopic_output_stub): Likewise
(get_ppc476_thunk_name): New function.
(rs6000_code_end): Likewise.
* config/rs6000/rs6000.md (load_toc_v4_PIC_1, load_toc_v4_PIC_1b):
Convert to a define_expand.
(load_toc_v4_PIC_1_normal): New define_insn.
(load_toc_v4_PIC_1_476): Likewise.
(load_toc_v4_PIC_1b_normal): Likewise.
(load_toc_v4_PIC_1b_476): Likewise.

Index: gcc/config.gcc
===
--- gcc/config.gcc  (revision 180740)
+++ gcc/config.gcc  (revision 180741)
@@ -2145,6 +2145,9 @@ powerpc-*-linux* | powerpc64-*-linux*)
esac
tmake_file="${tmake_file} t-slibgcc-libgcc"
case ${target} in
+   powerpc*-*-linux*ppc476*)
+   tm_file="${tm_file} rs6000/476.h"
+   extra_options="${extra_options} rs6000/476.opt" ;;
powerpc*-*-linux*altivec*)
tm_file="${tm_file} rs6000/linuxaltivec.h" ;;
powerpc*-*-linux*spe*)
Index: gcc/config/rs6000/476.h
===
--- gcc/config/rs6000/476.h (revision 0)
+++ gcc/config/rs6000/476.h (revision 180741)
@@ -0,0 +1,32 @@
+/* Enable IBM PowerPC 476 support.
+   Copyright (C) 2011 Free Software Foundation, Inc.
+   Contributed by Peter Bergner (berg...@vnet.ibm.com)
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#undef TARGET_LINK_STACK
+#define TARGET_LINK_STACK (rs6000_link_stack)
+
+#undef SET_TARGET_LINK_STACK
+#define SET_TARGET_LINK_STACK(X) do { TARGET_LINK_STACK = (X); } while (0)
+
+#undef TARGET_ASM_CODE_END
+#define TARGET_ASM_CODE_END rs6000_code_end
Index: gcc/config/rs6000/rs6000-protos.h
===
--- gcc/config/rs6000/rs6000-protos.h   (revision 180740)
+++ gcc/config/rs6000/rs6000-protos.h   (revision 180741)
@@ -173,6 +173,7 @@ extern void rs6000_emit_eh_reg_restore (
 extern const char * output_isel (rtx *);
 extern void rs6000_call_indirect_aix (rtx, rtx, rtx);
 extern void rs6000_aix_asm_output_dwarf_table_ref (char *);
+extern void get_ppc476_thunk_name (char name[32]);
 
 /* Declare functions in rs6000-c.c */
 
Index: gcc/config/rs6000/476.opt
===
--- gcc/config/rs6000/476.opt   (revision 0)
+++ gcc/config/rs6000/476.opt   (revision 180741)
@@ -0,0 +1,24 @@
+; IBM PowerPC 476 options.
+;
+; Copyright (C) 2011 Free Software Foundation, Inc.
+; Contributed by Peter Bergner (berg...@vnet.ibm.com)
+;
+; This file is part of GCC.
+;
+; GCC is free software; you can redistribute it and/or modify it under
+; the terms of the GNU General Public License as published by the Free
+; Software Foundation; either version 3, or (at your option) any later
+; version.
+;
+; GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+; WARRANTY; without even the implied warranty 

Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)

2011-11-01 Thread Diego Novillo

On 11-11-01 15:11 , konstantin.s.serebry...@gmail.com wrote:

Diego mentioned that we can move the asan pass somewhere to the very
end, just before lowering to RTL.
Where would be this blessed place?
Does it still have TARGET_MEM_REF?


Right before pass_expand?  In init_optimization_passes(), look for 
NEXT_PASS (pass_expand).  That's RTL generation.  Somewhere before that.


TARGET_MEM_REFs are converted to RTL mems during RTL expansion.


Diego.



Re: [PATCH] Add vec_pack_ufix_trunc_{v4df,v2df} expanders (take 2)

2011-11-01 Thread Uros Bizjak
On Tue, Nov 1, 2011 at 2:35 PM, Jakub Jelinek ja...@redhat.com wrote:

  Similarly to the V{4,8}SFmode -> unsigned V{4,8}SImode conversion
  support for AVX this one adds V{2,4}DFmode -> unsigned V{4,8}SImode
  conversion.
 
  Ok for trunk?

 Please put expander function into i386.c. IMO, this expander can be
 better written using variable mode and indirect functions.

 Like this?
 Advantage is that fixuns_trunc<mode><sseintvecmodelower>2 pattern can use
 the helper too and shrink, disadvantage is that the stmts in the new
 pattern are now in vcmppd; vandpd; vaddpd; vcmppd; vandpd; vaddpd order
 instead of vcmppd; vcmppd; vandpd; vandpd; vaddpd; vaddpd; (not sure why
 the scheduler didn't change it, but on the other side it is scheduler's
 job).

 2011-11-01  Jakub Jelinek  ja...@redhat.com

        * config/i386/i386-protos.h (ix86_expand_adjust_ufix_to_sfix_si): New
        prototype.
        * config/i386/i386.c (ix86_expand_adjust_ufix_to_sfix_si): New
        function.
         * config/i386/sse.md (fixuns_trunc<mode><sseintvecmodelower>2): Use
         it.
         (ssepackfltmode): New mode attr.
         (vec_pack_ufix_trunc_<mode>): New expander.

OK.

Thanks,
Uros.


Re: [google] ThreadSanitizer instrumentation pass (issue 5303083)

2011-11-01 Thread Martin Jambor
Hi,

sorry that I'm not using the fancy web tool but I do not want to use
my google account and gmail address in particular for work-related
stuff.

On Tue, Nov 01, 2011 at 06:05:46PM +, davi...@google.com wrote:


...

 
 http://codereview.appspot.com/5303083/diff/1/gcc/tree-tsan.c#newcode638
 gcc/tree-tsan.c:638: _vptr., sizeof (_vptr.) - 1) == 0)
 This is a very hacky way of recognizing vptr field. C++ FE provides
 TYPE_VFIELD macro to get the vptr field, but you will need to add a new
 langhook for it -- which is not liked in upstream -- so the hacky way
 may be ok (as it is for error checking purpose).
 

If you have a FIELD_DECL and want to check whether it is a VPTR, you
can simply use DECL_VIRTUAL_P.

Martin



Re: [google] ThreadSanitizer instrumentation pass (issue 5303083)

2011-11-01 Thread Diego Novillo

On 11-11-01 15:26 , Martin Jambor wrote:

Hi,

sorry that I'm not using the fancy web tool but I do not want to use
my google account and gmail address in particular for work-related
stuff.


No worries.  You do not need to use the web tool at all.  You can simply 
reply to these messages.


As long as you keep re...@codereview.appspotmail.com in the CC and do 
not remove the (issue NN) string from the subject, your message 
will be added to the issue log (similarly to how bugzilla works).



Diego.


Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)

2011-11-01 Thread Xinliang David Li
On Tue, Nov 1, 2011 at 12:16 PM, Diego Novillo dnovi...@google.com wrote:
 On 11-11-01 15:11 , konstantin.s.serebry...@gmail.com wrote:

 Diego mentioned that we can move the asan pass somewhere to the very
 end, just before lowering to RTL.
 Where would be this blessed place?
 Does it still have TARGET_MEM_REF?

 Right before pass_expand?  In init_optimization_passes(), look for NEXT_PASS
 (pass_expand).  That's RTL generation.  Somewhere before that.


Why?

 TARGET_MEM_REFs are converted to RTL mems during RTL expansion.


What? they will still be seen by asan which can not be handled (e.g,
creating address expression out of it).

David



 Diego.




Re: [google] ThreadSanitizer instrumentation pass (issue 5303083)

2011-11-01 Thread Xinliang David Li
On Tue, Nov 1, 2011 at 12:26 PM, Martin Jambor mjam...@suse.cz wrote:
 Hi,

 sorry that I'm not using the fancy web tool but I do not want to use
 my google account and gmail address in particular for work-related
 stuff.

 On Tue, Nov 01, 2011 at 06:05:46PM +, davi...@google.com wrote:


 ...


 http://codereview.appspot.com/5303083/diff/1/gcc/tree-tsan.c#newcode638
 gcc/tree-tsan.c:638: _vptr., sizeof (_vptr.) - 1) == 0)
 This is a very hacky way of recognizing vptr field. C++ FE provides
 TYPE_VFIELD macro to get the vptr field, but you will need to add a new
 langhook for it -- which is not liked in upstream -- so the hacky way
 may be ok (as it is for error checking purpose).


 If you have a FIELD_DECL and want to check whether it is a VPTR, you
 can simply use DECL_VIRTUAL_P.

ah yes, that will do.

thanks,

David


 Martin




Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)

2011-11-01 Thread Diego Novillo

On 11-11-01 15:34 , Xinliang David Li wrote:


Right before pass_expand?  In init_optimization_passes(), look for NEXT_PASS
(pass_expand).  That's RTL generation.  Somewhere before that.



Why?


The idea was to experiment where to best place ASAN to avoid 
instrumenting too much.  If we schedule it really late, then we may save 
ourselves some unnecessary instrumentation.


Though, I still think ASAN should never open code the library calls 
directly.  Rather, it should emit straight-code gimple that can be 
better understood and optimized away.




TARGET_MEM_REFs are converted to RTL mems during RTL expansion.



What? they will still be seen by asan which can not be handled (e.g,
creating address expression out of it).


So, it needs to run before TMRs are introduced then.  *shrug*.


Re: [patch] support for multiarch systems

2011-11-01 Thread Marc Glisse

On Sun, 21 Aug 2011, Matthias Klose wrote:


On 08/20/2011 09:51 PM, Matthias Klose wrote:

Multiarch [1] is the term being used to refer to the capability of a system to
install and run applications of multiple different binary targets on the same
system.  The idea and name of multiarch date back to 2004/2005 [2] (not to be
confused with multiarch in glibc).


attached is an updated patch which includes feedback from Jakub and Joseph.


Hello,

what is the status of this patch? Is it waiting for a review? Having gcc 
4.7 work out of the box on 2 of the most popular linux distributions seems 
like an important feature...


--
Marc Glisse


Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)

2011-11-01 Thread Xinliang David Li
On Tue, Nov 1, 2011 at 12:41 PM, Diego Novillo dnovi...@google.com wrote:
 On 11-11-01 15:34 , Xinliang David Li wrote:

 Right before pass_expand?  In init_optimization_passes(), look for
 NEXT_PASS
 (pass_expand).  That's RTL generation.  Somewhere before that.


 Why?

 The idea was to experiment where to best place ASAN to avoid instrumenting
 too much.  If we schedule it really late, then we may save ourselves some
 unnecessary instrumentation.


It needs to be balanced -- on one hand it needs to be as late as
possible so that as few memory references (dynamically executed) as
possible are instrumented. On the other hand, early enough so that the
instrumented code can be optimized sufficiently.

 Though, I still think ASAN should never open code the library calls
 directly.  Rather, it should emit straight-code gimple that can be better
 understood and optimized away.

That depends on the library functions themselves -- if they are
trivial, an inline sequence should be generated.



 TARGET_MEM_REFs are converted to RTL mems during RTL expansion.


 What? they will still be seen by asan which can not be handled (e.g,
 creating address expression out of it).

 So, it needs to run before TMRs are introduced then.  *shrug*.


yes it should be before ivopt as discussed.

David


Re: RFC: PATCH to adjust warning flags for C++

2011-11-01 Thread Gabriel Dos Reis
On Tue, Nov 1, 2011 at 12:54 PM, Jason Merrill ja...@redhat.com wrote:
 Paolo Carlini's patch to add -Wnarrowing to -Wc++0x-compat (and thus -Wall)
 broke bootstrap because of narrowing warnings, so I'd like to add
 -Wno-narrowing to the stage 2+ warning flags.  Is this the best way to do
 that?

why do we want to include -Wc++0x-compat in -Wall?


[PATCH, i386]: Fix PR50940, ICE in extract_insn, at recog.c:2137 during bootstrap

2011-11-01 Thread Uros Bizjak
Hello!

Fix a typo.

2011-10-30  Uros Bizjak  ubiz...@gmail.com

PR target/50940
* config/i386/i386.md (floatsi<mode>2_vector_sse_with_temp splitter):
Compare <ssevecmode>mode with V4SFmode, not V4SImode.

Tested on x86_64-pc-linux-gnu, committed to mainline SVN.

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 180741)
+++ config/i386/i386.md (working copy)
@@ -5053,7 +5053,7 @@
   emit_insn (gen_sse2_loadld (operands[4],
  CONST0_RTX (V4SImode), operands[2]));
 }
-  if (<ssevecmode>mode == V4SImode)
+  if (<ssevecmode>mode == V4SFmode)
 emit_insn (gen_floatv4siv4sf2 (operands[3], operands[4]));
   else
 emit_insn (gen_sse2_cvtdq2pd (operands[3], operands[4]));


Re: [PATCH] Fix errors in expand_atomic_store.

2011-11-01 Thread David Miller
From: Richard Henderson r...@redhat.com
Date: Tue, 01 Nov 2011 08:15:51 -0700

 Given that I believe that essentially all Sparcs still running
 are actually v9 and have native CAS, I think we can ignore this
 problem entirely.

Unfortunately, this is not true.

Otherwise we could change the 32-bit default code generation to
v9 from v7 under Linux.


Re: [PATCH] Add capability to run several iterations of early optimizations

2011-11-01 Thread Martin Jambor
Hi,

On Fri, Oct 28, 2011 at 04:06:20PM -0700, Matt wrote:

...

 
 I agree (of course). Having the knob will be very useful for testing
 and determining the acceptance criteria for the later smartness.
 While terminating early would be a nice optimization, the feature is
 still intrinsically useful and deployable without it. In addition,
 when using LTO on nearly all the projects/modules I tested on, 3+
 passes were always productive. To be fair, when not using LTO,
 beyond 2-3 passes did not often produce improvements unless
 individual compilation units were enormous.

I'm quite surprised you get extra benefit with LTO since early
optimizations are exactly the part of middle-end which should produce
the same results, LTO or not.  So the only way I can imagine this can
happen is that inlining analysis gets somehow a much better input and
then can make much bigger use of it.  If this is because of extra
early inlining, we might try to be able to catch these situations when
doing IPA inlining decisions which would work regardless of any
iteration number cut-off.  If it is because of something else, it's
probably better to (at least try to) tweak the passes and/or inlining
analysis to understand each other straight away.

 
 There was also the question of if some of the improvements seen with
 multiple passes were indicative of deficiencies in early inlining,
 CFG, SRA, 

SRA, because it is not flow-sensitive in any way, unfortunately
sometimes produces useless statements which then need to be cleaned up
by forwprop (and possibly dse and others).  We've already talked with
Richi about this and agreed the early one should be dumbed down a
little to produce much less of these.  I'm afraid I won't be able to
submit a patch doing that during this stage 1, though.

 etc. If the knob is available, I'm happy to continue
 testing on the same projects I've filed recent LTO/graphite bugs
 against (glib, zlib, openssl, scummvm, binutils, etc) and write a
 report on what I observe as suspicious improvements that perhaps
 should be caught/made in a single pass.
 
 It's worth noting again that while this is a useful feature in and
 of itself (especially when combined with LTO), it's *extremely*
 useful when coupled with the de-virtualization improvements
 submitted in other threads. The examples submitted for inclusion in
 the test suite aren't academic -- they are reductions of real-world
 performance issues from a mature (and shipping) C++-based networking
 product. Any C++ codebase that employs physical separation in their
 designs via Factory patterns, Interface Segregation, and/or
 Dependency Inversion will likely see improvements. To me, these
 enhancements combine to form one of the biggest leaps I've seen in
 C++ code optimization -- code that can be clean, OO, *and* fast.

Well, while I'd understand that whenever there is a new direct call
graph edge, trying early inlining again might help or save some work
for the full inlining, I think that we should rather try to enhance
the current IPA infrastructure rather than grow another one from the
early optimizations, especially if we aim at LTO - iterating early
optimizations will not help reduce abstraction if it is spread across
a number of compilation units.

Martin


Re: PING 2 : [Patch Darwin/PR49992 2/2] remove ranlib special-casing from the darwin port.

2011-11-01 Thread Arnaud Charlet

On 28/10/2011 17:41, Iain Sandoe wrote:

This is unreviewed for 2 weeks.

I am sure that this issue will be affecting Ada on Darwin10/11 with 
the latest toolchains.


It's actually under discussion and is pretty subtle, so delicate. Thanks 
for your patience.


Arno


Re: [PATCH] Fix errors in expand_atomic_store.

2011-11-01 Thread Richard Henderson
On 11/01/2011 01:20 PM, David Miller wrote:
 Unfortunately, this is not true.
 
 Otherwise we could change the 32-bit default code generation to
 v9 from v7 under Linux.

For v7, pa-risc, and sh, we originally allowed the test_and_set and
lock_release patterns to do non-obvious things with 0/1 constants.

My proposal is to *not* carry that over to the __atomic patterns.
The PA and SH targets have already switched to use kernel helpers,
and no longer rely on this hack.  The only one left is Sparc v7.

(1) Are there really live v7 still around?

At least with v8 we have SWAP, with which we can implement the full
__atomic_exchange pattern sans hackery.  We can't do that with just LDSTUB.

(2) Can we have the kernel implement some {SWAP,CAS}{4,8} primitives (possibly
via a special trap) that we can export from libgcc, as we do for ARM, PA, &
SH?

I believe that would allow all of the non-embedded linux to support all of
the c++11 atomic operations without having to resort to spinlocks.
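
To illustrate the distinction drawn above between a test-and-set primitive
(in the spirit of LDSTUB) and a full exchange (in the spirit of SWAP), here
is a minimal sketch in portable C11.  It is not GCC's expansion code: the
names lock_flag, slot, try_lock, unlock and swap_slot are invented for this
example, and <stdatomic.h> merely stands in for the target instructions.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Test-and-set style: the only states ever stored are "clear" and "set".
   This is enough for a lock, but not for exchanging arbitrary values.  */
static atomic_flag lock_flag = ATOMIC_FLAG_INIT;

/* Returns true if the lock was free and is now held by the caller.  */
bool try_lock(void)
{
    return !atomic_flag_test_and_set_explicit(&lock_flag,
                                              memory_order_acquire);
}

void unlock(void)
{
    atomic_flag_clear_explicit(&lock_flag, memory_order_release);
}

/* Full exchange: stores an arbitrary value and returns the previous one.
   This needs a SWAP- or CAS-like primitive (or a kernel helper).  */
static atomic_int slot = 0;

int swap_slot(int new_value)
{
    return atomic_exchange_explicit(&slot, new_value,
                                    memory_order_acq_rel);
}
```

With only the first primitive available, swap_slot would have to be built on
top of try_lock/unlock, i.e. a spinlock, which is what the kernel-helper
proposal is meant to avoid.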


r~


Re: [PATCH, devirtualization] Detect the new type in type change detection

2011-11-01 Thread Martin Jambor
Hi,

On Tue, Nov 01, 2011 at 10:37:10AM +0100, Richard Guenther wrote:
 On Mon, Oct 31, 2011 at 5:58 PM, Martin Jambor mjam...@suse.cz wrote:
  On Fri, Oct 28, 2011 at 11:21:23AM +0200, Richard Guenther wrote:
  On Thu, Oct 27, 2011 at 9:54 PM, Martin Jambor mjam...@suse.cz wrote:
   Hi,
  
   On Thu, Oct 27, 2011 at 11:06:02AM +0200, Richard Guenther wrote:
   On Thu, Oct 27, 2011 at 1:22 AM, Martin Jambor mjam...@suse.cz wrote:
Hi,
   
I've been asked by Maxim Kuvyrkov to revive the following patch which
has not made it to 4.6.  Currently, when type based devirtualization
detects a potential type change, it simply gives up on gathering any
information on the object in question.  This patch adds an attempt to
actually detect the new type after the change.
   
Maxim claimed this (and another patch I'll post tomorrow) noticeably
improved performance of some real code.  I can only offer a rather
artificial example in the attachment.  When the constructors are
inlined but the function multiply_matrices is not, this patch makes
the produced executable run for only 7 seconds instead of about 20 on
my 4 year old i686 desktop (with -Ofast).
   
Anyway, the patch passes bootstrap and testsuite on x86_64-linux.
What do you think, is it a good idea for trunk now?
   
Thanks,
   
Martin
   
   
2011-10-21  Martin Jambor  mjam...@suse.cz
   
       * ipa-prop.c (type_change_info): New fields object, 
known_current_type
       and multiple_types_encountered.
       (extr_type_from_vtbl_ptr_store): New function.
       (check_stmt_for_type_change): Use it, set 
multiple_types_encountered if
       the result is different from the previous one.
       (detect_type_change): Renamed to detect_type_change_1. New 
parameter
       comp_type.  Set up new fields in tci, build known type jump
       functions if the new type can be identified.
       (detect_type_change): New function.
       * tree.h (DECL_CONTEXT): Comment new use.
   
       * testsuite/g++.dg/ipa/devirt-c-1.C: Add dump scans.
       * testsuite/g++.dg/ipa/devirt-c-2.C: Likewise.
       * testsuite/g++.dg/ipa/devirt-c-7.C: New test.
   
   
Index: src/gcc/ipa-prop.c
===
--- src.orig/gcc/ipa-prop.c
+++ src/gcc/ipa-prop.c
@@ -271,8 +271,17 @@ ipa_print_all_jump_functions (FILE *f)
   
 struct type_change_info
 {
+  /* The declaration or SSA_NAME pointer of the base that we are 
checking for
+     type change.  */
+  tree object;
+  /* If we actually can tell the type that the object has changed 
to, it is
+     stored in this field.  Otherwise it remains NULL_TREE.  */
+  tree known_current_type;
  /* Set to true if dynamic type change has been detected.  */
  bool type_maybe_changed;
+  /* Set to true if multiple types have been encountered.  
known_current_type
+     must be disregarded in that case.  */
+  bool multiple_types_encountered;
 };
   
 /* Return true if STMT can modify a virtual method table pointer.
@@ -338,6 +347,49 @@ stmt_may_be_vtbl_ptr_store (gimple stmt)
  return true;
 }
   
+/* If STMT can be proved to be an assignment to the virtual method 
table
+   pointer of ANALYZED_OBJ and the type associated with the new table
+   identified, return the type.  Otherwise return NULL_TREE.  */
+
+static tree
+extr_type_from_vtbl_ptr_store (gimple stmt, tree analyzed_obj)
+{
+  tree lhs, t, obj;
+
+  if (!is_gimple_assign (stmt))
  
   gimple_assign_single_p (stmt)
  
   OK.
  
  
+    return NULL_TREE;
+
+  lhs = gimple_assign_lhs (stmt);
+
+  if (TREE_CODE (lhs) != COMPONENT_REF)
+    return NULL_TREE;
+  obj = lhs;
+
+  if (!DECL_VIRTUAL_P (TREE_OPERAND (lhs, 1)))
+    return NULL_TREE;
+
+  do
+    {
+      obj = TREE_OPERAND (obj, 0);
+    }
+  while (TREE_CODE (obj) == COMPONENT_REF);
  
   You do not allow other components than component-refs (thus, for
   example an ARRAY_REF - that is for a reason?).  Please add
   a comment why.  Otherwise this whole sequence would look like
   it should be replaceable by get_base_address (obj).
  
  
   I guess I might have been overly conservative here, ARRAY_REFs are
   fine.  get_base_address only digs into MEM_REFs if they are based on
   an ADDR_EXPR while I do so always.  But I can check that either both
   obj and analyzed_obj are a MEM_REF of the same SSA_NAME or they are
   the same thing (i.e. the same decl)... which even feels a bit cleaner,
   so I did that.
 
  Well, as you are looking for a must-change-type pattern I think you cannot
  simply ignore offsets.  Consider
 
  T a[10];
 
  new (T') (a[9]);
  a[8]->foo();
 
  where the must-type-change on a[9] is _not_ changing the type of a[8]!
 

Re: [PATCH] Fix errors in expand_atomic_store.

2011-11-01 Thread David Miller
From: Richard Henderson r...@redhat.com
Date: Tue, 01 Nov 2011 13:48:26 -0700

 (2) Can we have the kernel implement some {SWAP,CAS}{4,8} primitives (possibly
 via a special trap) that we can export from libgcc, as we do for ARM, PA, 
  SH?
 
 I believe that would allow all of the non-embedded linux to support all of
 the c++11 atomic operations without having to resort to spinlocks.

Yes, I was just looking into this right now.  I didn't realize that PA, SH,
and ARM had added these kernel hooks to solve this problem.


Re: [Patch, fortran] [00/66] PR fortran/43829 Inline sum and product (AKA scalarization of reductions)

2011-11-01 Thread Paul Richard Thomas
Dear Mikael,


 PS: I hereby confess my failure to not split the patch too much. :-(

I hereby confess my failure to find anything to which I could gripe,
let alone object!

The patch can only be described as a tour de force.  Not only is there
a lot of it - 6160 lines with context on - but it is well commented
and well structured.  I cannot see any whitespace out of place or even
minor transgressions in respect of gnu coding style.  Bah humbug!

On top of all that, it even does what is promised!  Also, other
testers have run it through various benchmarks, as recent threads
attest.

The only, slight worry that I have is that it is going to make Richi's
middle end scalarization nearly impossible to use for gfortran.
However, the enhanced capability that this patch brings makes it a
worthy addition to gfortran.

I bootstrapped and regtested on FC9/x86_64, just for the record.

OK for trunk.

Many, many thanks for the patch.

Paul


Re: [PATCH] Add capability to run several iterations of early optimizations

2011-11-01 Thread Richard Guenther
On Sat, Oct 29, 2011 at 1:06 AM, Matt m...@use.net wrote:
 On Sat, 29 Oct 2011, Maxim Kuvyrkov wrote:

 I like this variant a lot better than the last one - still it lacks any
 analysis-based justification for iteration (see my reply to Matt on
 what I discussed with Honza).

 Yes, having a way to tell whether a function has significantly changed
 would be awesome.  My approach here would be to make inline_parameters
 output feedback of how much the size/time metrics have changed for a
 function since the previous run.  If the change is above X%, then queue
 the function's callers for more optimizations.  Similarly, Martin's
 rebuild_cgraph_edges_and_devirt (when that goes into trunk) could queue new
 direct callees and current function for another iteration if new direct
 edges were resolved.

 Figuring out the heuristic will need decent testing on a few projects to
 figure out what the sweet spot is (smallest binary for time/passes spent)
 for that given codebase. With a few data points, a reasonable stab at the
 metrics you mention can be had that would not terminate the iterations
 before the known optimal number of passes. Without those data points, it
 seems like making sure the metrics allow those sweet spots to be attained
 will be difficult.

Well, sure - the same as with inlining heuristics.

  Thus, I don't think we want to
 merge this in its current form or in this stage1.

 What is the benefit of pushing this to a later release?  If anything,
 merging the support for iterative optimizations now will allow us to
 consider adding the wonderful smartness to it later.  In the meantime,
 substituting that smartness with a knob is still a great alternative.

The benefit?  The benefit is to not have another magic knob in there
that doesn't make too much sense and papers over real conceptual/algorithmic
issues.  Brute-force iterating is a hack, not a solution. (sorry)

 I agree (of course). Having the knob will be very useful for testing and
 determining the acceptance criteria for the later smartness. While
 terminating early would be a nice optimization, the feature is still
 intrinsically useful and deployable without it. In addition, when using LTO
 on nearly all the projects/modules I tested on, 3+ passes were always
 productive.

If that is true then I'd really like to see testcases.  Because I am sure you
are just papering over (maybe even easy to fix) issues by the brute-force
iterating approach.  We also do not have a switch to run every pass twice
in succession, just because that would be as stupid as this.

 To be fair, when not using LTO, beyond 2-3 passes did not often
 produce improvements unless individual compilation units were enormous.

 There was also the question of if some of the improvements seen with
 multiple passes were indicative of deficiencies in early inlining, CFG, SRA,
 etc. If the knob is available, I'm happy to continue testing on the same
 projects I've filed recent LTO/graphite bugs against (glib, zlib, openssl,
 scummvm, binutils, etc) and write a report on what I observe as suspicious
 improvements that perhaps should be caught/made in a single pass.

 It's worth noting again that while this is a useful feature in and of itself
 (especially when combined with LTO), it's *extremely* useful when coupled
 with the de-virtualization improvements submitted in other threads. The
 examples submitted for inclusion in the test suite aren't academic -- they
 are reductions of real-world performance issues from a mature (and shipping)
 C++-based networking product. Any C++ codebase that employs physical
 separation in their designs via Factory patterns, Interface Segregation,
 and/or Dependency Inversion will likely see improvements. To me, these
 enhancements combine to form one of the biggest leaps I've seen in C++ code
 optimization -- code that can be clean, OO, *and* fast.

But iterating the whole early optimization pipeline is not a sensible approach
to attacking these.

Richard.

 Richard: If there's any additional testing or information I can reasonably
 provide to help get this in for this stage1, let me know.

 Thanks!


 --
 tangled strands of DNA explain the way that I behave.
 http://www.clock.org/~matt



[PATCH, i386]: Use reg_or_subregno in int->float splitters

2011-11-01 Thread Uros Bizjak
Hello!

We have a nice utility function that can be used in int->float
splitter constraints.

2011-11-01  Uros Bizjak  ubiz...@gmail.com

* config/i386/i386.md (splitters for int->float conversion): Use
reg_or_subregno in splitter constraints.

Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32},
committed to mainline SVN.

Uros.
Index: i386.md
===
--- i386.md (revision 180742)
+++ i386.md (working copy)
@@ -4920,9 +4920,7 @@
 SSE_FLOAT_MODE_P (MODEF:MODEmode)  TARGET_MIX_SSE_I387
 TARGET_INTER_UNIT_CONVERSIONS
 reload_completed
-(SSE_REG_P (operands[0])
-   || (GET_CODE (operands[0]) == SUBREG
-   SSE_REG_P (operands[0])))
+SSE_REGNO_P (reg_or_subregno (operands[0]))
   [(set (match_dup 0) (float:MODEF (match_dup 1)))])
 
 (define_split
@@ -4933,9 +4931,7 @@
 SSE_FLOAT_MODE_P (MODEF:MODEmode)  TARGET_MIX_SSE_I387
 !(TARGET_INTER_UNIT_CONVERSIONS || optimize_function_for_size_p (cfun))
 reload_completed
-(SSE_REG_P (operands[0])
-   || (GET_CODE (operands[0]) == SUBREG
-   SSE_REG_P (operands[0])))
+SSE_REGNO_P (reg_or_subregno (operands[0]))
   [(set (match_dup 2) (match_dup 1))
(set (match_dup 0) (float:MODEF (match_dup 2)))])
 
@@ -5024,9 +5020,7 @@
   TARGET_SSE2  TARGET_SSE_MATH
 TARGET_USE_VECTOR_CONVERTS  optimize_function_for_speed_p (cfun)
 reload_completed
-(SSE_REG_P (operands[0])
-   || (GET_CODE (operands[0]) == SUBREG
-   SSE_REG_P (operands[0])))
+SSE_REGNO_P (reg_or_subregno (operands[0]))
   [(const_int 0)]
 {
   rtx op1 = operands[1];
@@ -5067,9 +5061,7 @@
   TARGET_SSE2  TARGET_SSE_MATH
 TARGET_USE_VECTOR_CONVERTS  optimize_function_for_speed_p (cfun)
 reload_completed
-(SSE_REG_P (operands[0])
-   || (GET_CODE (operands[0]) == SUBREG
-   SSE_REG_P (operands[0])))
+SSE_REGNO_P (reg_or_subregno (operands[0]))
   [(const_int 0)]
 {
   operands[3] = simplify_gen_subreg (ssevecmodemode, operands[0],
@@ -5091,9 +5083,7 @@
   TARGET_SSE2  TARGET_SSE_MATH
 TARGET_USE_VECTOR_CONVERTS  optimize_function_for_speed_p (cfun)
 reload_completed
-(SSE_REG_P (operands[0])
-   || (GET_CODE (operands[0]) == SUBREG
-   SSE_REG_P (operands[0])))
+SSE_REGNO_P (reg_or_subregno (operands[0]))
   [(const_int 0)]
 {
   rtx op1 = operands[1];
@@ -5137,9 +5127,7 @@
   TARGET_SSE2  TARGET_SSE_MATH
 TARGET_USE_VECTOR_CONVERTS  optimize_function_for_speed_p (cfun)
 reload_completed
-(SSE_REG_P (operands[0])
-   || (GET_CODE (operands[0]) == SUBREG
-   SSE_REG_P (operands[0])))
+SSE_REGNO_P (reg_or_subregno (operands[0]))
   [(const_int 0)]
 {
   operands[3] = simplify_gen_subreg (ssevecmodemode, operands[0],
@@ -5200,9 +5188,7 @@
 SSE_FLOAT_MODE_P (MODEF:MODEmode)  TARGET_SSE_MATH
 (TARGET_INTER_UNIT_CONVERSIONS || optimize_function_for_size_p (cfun))
 reload_completed
-(SSE_REG_P (operands[0])
-   || (GET_CODE (operands[0]) == SUBREG
-   SSE_REG_P (operands[0])))
+SSE_REGNO_P (reg_or_subregno (operands[0]))
   [(set (match_dup 0) (float:MODEF (match_dup 1)))])
 
 (define_insn *floatSWI48x:modeMODEF:mode2_sse_nointerunit
@@ -5235,9 +5221,7 @@
 SSE_FLOAT_MODE_P (MODEF:MODEmode)  TARGET_SSE_MATH
 !(TARGET_INTER_UNIT_CONVERSIONS || optimize_function_for_size_p (cfun))
 reload_completed
-(SSE_REG_P (operands[0])
-   || (GET_CODE (operands[0]) == SUBREG
-   SSE_REG_P (operands[0])))
+SSE_REGNO_P (reg_or_subregno (operands[0]))
   [(set (match_dup 2) (match_dup 1))
(set (match_dup 0) (float:MODEF (match_dup 2)))])
 
@@ -5248,9 +5232,7 @@
   (SWI48x:MODEmode != DImode || TARGET_64BIT)
 SSE_FLOAT_MODE_P (MODEF:MODEmode)  TARGET_SSE_MATH
 reload_completed
-(SSE_REG_P (operands[0])
-   || (GET_CODE (operands[0]) == SUBREG
-   SSE_REG_P (operands[0])))
+SSE_REGNO_P (reg_or_subregno (operands[0]))
   [(set (match_dup 0) (float:MODEF (match_dup 1)))])
 
 (define_insn *floatSWI48x:modeX87MODEF:mode2_i387_with_temp


[Patch,Fortran] Fix tree-walking issue (was: gfortran tree walking issue)

2011-11-01 Thread Tobias Burnus

Dear all, dear Paul,

(For gcc-patch@ readers: gfortran has issues with tree walking: During 
traversal it does not touch all tree nodes if the function called during 
traversal adds new nodes to the tree - as this will rebalance the tree. 
This causes a regression with my recently posted RFC patch for 
constructors.)



Paul Richard Thomas wrote:

Maybe we should decide a priority order?  Your patch and those of
Mikael could cause regressions other than in code involving OOP.  I
would suggest, therefore, that we should find a fix for your problem
below and get these patches committed first.  I will still try to get
mine completed before the end of Stage 1 but it will not matter as
much if I am a week or so late.


I think it makes sense to have mine and Mikael's patch first. Actually, 
I just saw that you approved Mikael's patch. For my patch, the 
class_21.f03/tree-walking issue is solved by the attached patch 2. I 
think after that issue is solved, you can continue working on your patch.


Constructor patch: I still have another rejects-valid issue related to 
multiple USE, ONLY for the same module, but I don't think that it makes 
sense that we both simultaneously try to tackle that issue. When I have 
solved the use-only issue, I can start cleaning up the patch, add two 
diagnostic checks, tweak some diagnostics/dg-error checks, write a 
ChangeLog, re-test the patch with real-world codes, and hopefully submit 
it by next weekend.


 * * *

Regarding the tree-walking issue: I think it is a general issue which 
could also affect other things. I really wonder why we haven't been 
bitten by it before. However, it might be that we hit those problems and 
fixed them by re-resolving symbols at later parts. My feeling is that 
the issue occurs either only with vtab/vtree or at least also due to 
those functions. However, I might be wrong as I do not quickly see which 
of the tree-traversal callers can generate new trees.


I made two attempts to fix the issue. The first one fails - hence, I use 
the second one. In particular, I seek comments and approval for the 
second patch.



 PATCH 1 

Ensuring that every tree node gets touched once. This patch works by 
traversing the tree until all nodes are touched at least once. That 
means that one has a couple of light-weight extra walks, which *includes 
the newly added nodes*.
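
The "walk until everything is marked" idea described above can be sketched
as follows.  This is only an illustration under simplifying assumptions, not
the code from the patch: struct sym, sym_insert, visit, walk_unmarked and
traverse_until_marked are hypothetical names, and a plain unbalanced binary
tree stands in for gfortran's balanced symtree.

```c
#include <stdbool.h>
#include <stdlib.h>

struct sym { int key; bool mark; struct sym *left, *right; };

/* Plain binary-search-tree insert; new nodes start out unmarked.  */
struct sym *sym_insert(struct sym *root, int key)
{
    if (root == NULL) {
        struct sym *s = calloc(1, sizeof *s);
        if (s == NULL)
            abort();
        s->key = key;
        return s;
    }
    if (key < root->key)
        root->left = sym_insert(root->left, key);
    else
        root->right = sym_insert(root->right, key);
    return root;
}

static int calls;

/* Hypothetical callback: every symbol with a key below 10 adds one
   new symbol to the tree while the traversal is in progress.  */
void visit(struct sym **root, struct sym *s)
{
    calls++;
    if (s->key < 10)
        *root = sym_insert(*root, s->key + 10);
}

/* One pass: call FUNC on each node that is not yet marked.
   Returns true if at least one unmarked node was found.  */
static bool walk_unmarked(struct sym **root, struct sym *s,
                          void (*func)(struct sym **, struct sym *))
{
    bool found = false;

    if (s == NULL)
        return false;
    if (walk_unmarked(root, s->left, func))
        found = true;
    if (!s->mark) {
        s->mark = true;
        func(root, s);
        found = true;
    }
    if (walk_unmarked(root, s->right, func))
        found = true;
    return found;
}

/* Repeat passes until one of them finds nothing left to do.
   Returns the number of passes that did some work.  */
int traverse_until_marked(struct sym **root,
                          void (*func)(struct sym **, struct sym *))
{
    int passes = 0;

    while (walk_unmarked(root, *root, func))
        passes++;
    return passes;
}
```

Note that a scheme of this shape terminates only if the callback eventually
stops adding unmarked nodes; a callback that adds a fresh node on every call
would keep the outer loop running forever.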


The patch does:
a) Ensure that all trees are walked
b) Mark symbol nodes as already walked when finding a vtab
c) Skip vtab/vtype in resolve symbol

(b) and (c) do not seem to have any effect. The patch regtests*, except 
for gfortran.dg/class_21.f03, which still has an endless loop. (Cf. 
previous email.)



 PATCH 2 

This patch uses a different approach to make sure that *newly added 
nodes* do *not* get visited. It does so by saving the symtree in a 
vector and then one walks the vector. Except for the additional memory 
requirement for the vector, this version should also be quick and avoids 
walking the tree multiple times. It also preserves the order the trees 
are walked.
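
As a rough sketch of this second approach, assuming a plain unbalanced
binary tree in place of gfortran's symtree: the nodes are first collected
into a vector, and the callback is then run over that vector, so nodes added
by the callback are never visited.  The names struct node, insert,
count_nodes, fill, traverse_snapshot and add_shadow are made up for this
example and do not come from the patch.

```c
#include <stdlib.h>

struct node { int key; struct node *left, *right; };

/* Plain binary-search-tree insert; returns the (possibly new) root.  */
struct node *insert(struct node *root, int key)
{
    if (root == NULL) {
        struct node *n = calloc(1, sizeof *n);
        if (n == NULL)
            abort();
        n->key = key;
        return n;
    }
    if (key < root->key)
        root->left = insert(root->left, key);
    else
        root->right = insert(root->right, key);
    return root;
}

size_t count_nodes(const struct node *n)
{
    return n ? 1 + count_nodes(n->left) + count_nodes(n->right) : 0;
}

/* Store the nodes in VEC in in-order sequence, preserving walk order.  */
static void fill(struct node *n, struct node **vec, size_t *pos)
{
    if (n == NULL)
        return;
    fill(n->left, vec, pos);
    vec[(*pos)++] = n;
    fill(n->right, vec, pos);
}

/* Snapshot the tree, then call FUNC once per snapshotted node.
   Returns the number of nodes visited.  */
size_t traverse_snapshot(struct node **root,
                         void (*func)(struct node **, struct node *))
{
    size_t n = count_nodes(*root), pos = 0, i;
    struct node **vec;

    if (n == 0)
        return 0;
    vec = malloc(n * sizeof *vec);
    if (vec == NULL)
        abort();
    fill(*root, vec, &pos);
    for (i = 0; i < n; i++)
        func(root, vec[i]);
    free(vec);
    return n;
}

static int visits;

/* Hypothetical callback that grows the tree during the traversal.  */
void add_shadow(struct node **root, struct node *n)
{
    visits++;
    *root = insert(*root, n->key + 100);
}
```

The sketch relies on nodes keeping their addresses when the tree changes
shape; the extra cost is one pointer per node for the temporary vector.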


This patch builds and regtests* (gfortran + libgomp) on x86-64-linux.
OK for the trunk?

Tobias

* Except for the known and meanwhile old failures for 
gfortran.dg/select_type_12.f03 (P1 regression), 
gfortran.fortran-torture/execute/entry_4.f90 (P1 regression) and 
gfortran.dg/realloc_on_assign_5.f03 (failed since committal).
NOTE: The following patch does not regtest. Hence, I do not seek approval
  for this patch.

2011-11-01  Tobias Burnus  bur...@net-b.de

	* symbol.c (all_marked): New file-global variable.
	(is_all_marked): New function which checks whether a variable
	is marked.
	(gfc_traverse_symtree): Use them to ensure that all tree nodes
	are touched, even if the tree changes during traversal.
	* class.c (gfc_find_derived_vtab): Mark vtab symbol to avoid double
	resolution.

diff --git a/gcc/fortran/class.c b/gcc/fortran/class.c
index f64cc1b..8880d65 100644
--- a/gcc/fortran/class.c
+++ b/gcc/fortran/class.c
@@ -591,6 +591,10 @@ have_vtype:
 
   found_sym = vtab;
 
+  /* Avoid double evaluation.  */
+  if (found_sym)
+    found_sym->mark = 1;
+
 cleanup:
   /* It is unexpected to have some symbols added at resolution or code
  generation time. We commit the changes in order to keep a clean state.  */
diff --git a/gcc/fortran/symbol.c b/gcc/fortran/symbol.c
index 67d65cb..11c83cc 100644
--- a/gcc/fortran/symbol.c
+++ b/gcc/fortran/symbol.c
@@ -102,6 +102,7 @@ static gfc_symbol *changed_syms = NULL;
 
 gfc_dt_list *gfc_derived_types;
 
+static bool all_marked;
 
 /* List of tentative typebound-procedures.  */
 
@@ -3353,6 +3354,14 @@ traverse_ns (gfc_symtree *st, void (*func) (gfc_symbol *))
 }
 
 
+static void
+is_all_marked (gfc_symtree *st)
+{
+  if (!st->n.sym->mark)
+    all_marked = false;
+}
+
+
 /* Call a given function for all symbols in the namespace.  We take
care that each gfc_symbol node is called exactly once.  */
 
@@ -3362,7 +3371,15 @@ gfc_traverse_ns 

Re: [Patch,Fortran] Fix tree-walking issue

2011-11-01 Thread Tobias Schlüter


Dear Tobias,

On 2011-11-01 22:33, Tobias Burnus wrote:

Regarding the tree-walking issue: I think it is a general issue which
could also affect other things. I really wonder why we haven't been
bitten by it before. However, it might be that we hit those problems and
fixed them by re-resolving symbols at later parts. My feeling is that
the issue occurs either only with vtab/vtree or at least also due to
those functions. However, I might be wrong as I do not quickly see which
of the tree-traversal callers can generate new trees.


I don't remember all this very clearly, but I think that the 
gfc_symbol::tlink field is intended for something like this, even though 
this is not very clear (at least to me) from the explanatory comment I 
quoted below.  Anyway, I thought I might point this out, as it might 
help you getting things working since the problem it addresses at least 
appears similar:


  /* Change management fields.  Symbols that might be modified by the
 current statement have the mark member nonzero and are kept in a
 singly linked list through the tlink field.  Of these symbols,
 symbols with old_symbol equal to NULL are symbols created within
 the current statement.  Otherwise, old_symbol points to a copy of
 the old symbol.  */

  struct gfc_symbol *old_symbol, *tlink;

Cheers,
- Tobi


Re: [PATCH, i386]: Use reg_or_subregno in int-float splitters

2011-11-01 Thread Jakub Jelinek
On Tue, Nov 01, 2011 at 10:33:07PM +0100, Uros Bizjak wrote:
 We have a nice utility function that can be used in int-float
 splitter constraints.
 
 2011-11-01  Uros Bizjak  ubiz...@gmail.com
 
   * config/i386/i386.md (splitters for int-float conversion): Use
   reg_or_subregno in splitter constraints.
 
 Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32},
 committed to mainline SVN.

Unfortunately reg_or_subregno is an external non-inline function,
doesn't have pure attribute and SSE_REGNO_P macro evaluates its argument
twice, which means the function is called multiple times.
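The effect can be reproduced in isolation. The following is a minimal sketch, not GCC code: `get_regno` and `IN_RANGE_P` are hypothetical stand-ins for an out-of-line function and a range-check macro that mentions its argument twice, and the bounds 21 and 28 are arbitrary.

```cpp
#include <cassert>

static int call_count = 0;

// Hypothetical stand-in for an external, non-inline, non-pure function.
int get_regno(int x) {
  ++call_count;
  return x;
}

// Hypothetical range-check macro that mentions its argument twice.
#define IN_RANGE_P(n) ((n) >= 21 && (n) <= 28)

// Passing the call directly to the macro: the call is expanded twice.
inline int calls_via_macro(int x) {
  call_count = 0;
  bool r = IN_RANGE_P(get_regno(x));
  (void) r;
  return call_count;
}

// Evaluating once into a local first avoids the repeated call.
inline int calls_via_local(int x) {
  call_count = 0;
  int regno = get_regno(x);
  bool r = IN_RANGE_P(regno);
  (void) r;
  return call_count;
}
```

When the first comparison succeeds, the function is called a second time for the upper bound; with a local variable it is called once regardless.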

Jakub


Re: [PATCH][ARM] Big Endian and Generic tuning

2011-11-01 Thread Andrew Stubbs

On 26/10/11 10:15, Richard Earnshaw wrote:

Here's an updated patch that makes no generalizations.

OK?



Yep


Committed.

Andrew


[PATCH, devirtualization] Intraprocedural devirtualization pass

2011-11-01 Thread Martin Jambor
Hi,

the patch below is the second (and last) revived type-based
devirtualization patch that did not make it to 4.6.  It deals with
virtual calls from the function in which there is also the object
declaration:

void foo()
{
  A a;

  a.foo ();
}

Normally, the front-end would do the devirtualization on its own,
however, early-inlining can create these situations too.  Since there
is nothing interprocedural going on, the current inlining and IPA-CP
devirtualization bits are of no help.  We do not do type-based
devirtualization in OBJ_TYPE_REF folding either, because the necessary
type-changing checks might make it quite expensive.  Thus, this patch
introduces a new pass to do that.

The patch basically piggy-backs on the intraprocedural
devirtualization mechanism, trying to construct a known-type jump
function for all objects in OBJ_TYPE_REF calls and then immediately
using it like we do in IPA-CP.
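A minimal sketch of the situation that arises after early inlining, assuming hypothetical classes `A` and `B` and helper names `middleman` and `caller` (none of this is taken from the pass itself):

```cpp
#include <cassert>

struct A {
  virtual ~A() {}
  virtual int foo(int i) { return i + 1; }
};

struct B : A {
  int foo(int i) override { return i + 2; }
};

// Once this helper is early-inlined into caller(), the virtual call is made
// on an object that is declared in the very same function.
static inline int middleman(A *obj, int i) { return obj->foo(i); }

int caller(int i) {
  B b;                      // the dynamic type is known here: B
  return middleman(&b, i);  // so the call could be resolved to B::foo
}
```

Whether a given compiler actually folds the call is an optimization detail; the sketch only shows the shape of code where the dynamic type is visible within one function.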

The original patch was doing this as a part of
pass_rebuild_cgraph_edges.  Honza did not like this idea so I made it
a separate pass.  First, I scheduled it after
pass_rebuild_cgraph_edges and was only traversing indirect edges,
avoiding a sweep over all of the IL.  Unfortunately, this does not
work in one scenario.  If the newly-known destination of a virtual
call is known not to throw, we may have to purge some EH CFG edges and
potentially basic blocks.  If these basic blocks contain calls
(typically calls to object destructors), we may end up having stale
call edges in the call graph... and our current approach to that
problem is to call pass_rebuild_cgraph_edges.  I think that I was not
running into this problem when the mechanism was a part of that pass
just because of pure luck.  Anyway, this is why I eventually opted for
a sweep over the statements.

My best guess is that it is probably worth it, but only because the
overhead should be still fairly low.  The pass triggers quite a number
of times when building libstdc++ and it can speed up an artificial
testcase which I will attach from over 20 seconds to 7s on my older
desktop - it is very similar to the one I posted with the previous
patch but this time the object constructors must not get early inlined
but the function multiply_matrices has to.  Currently I have problems
compiling Firefox even without LTO so I don't have any numbers from it
either.  IIRC, Honza did not see this patch trigger there when he
tried the ancient version almost a year ago.  On the other hand, Maxim
claimed that the impact can be noticeable on some code base he is
concerned about.

I have successfully bootstrapped and tested the patch on x86_64-linux.
What do you think, should we include this in trunk?

Thanks,

Martin


2011-10-31  Martin Jambor  mjam...@suse.cz

* ipa-cp.c (ipa_value_from_known_type_jfunc): Moved to...
* ipa-prop.c (ipa_binfo_from_known_type_jfunc): ...here, exported and
updated all callers.
(intraprocedural_devirtualization): New function.
(gate_intra_devirtualization): Likewise.
(pass_intra_devirt): New pass.
* ipa-prop.h (ipa_binfo_from_known_type_jfunc): Declared.
* passes.c (init_optimization_passes): Schedule pass_intra_devirt.
* tree-pass.h (pass_intra_devirt): Declare.

* testsuite/g++.dg/ipa/imm-devirt-1.C: New test.
* testsuite/g++.dg/ipa/imm-devirt-2.C: Likewise.


Index: src/gcc/testsuite/g++.dg/ipa/imm-devirt-1.C
===
--- /dev/null
+++ src/gcc/testsuite/g++.dg/ipa/imm-devirt-1.C
@@ -0,0 +1,62 @@
+/* Verify that virtual calls are folded even when a typecast to an
+   ancestor is involved along the way.  */
+/* { dg-do run } */
+/* { dg-options "-O2 -fdump-tree-devirt"  } */
+
+extern "C" void abort (void);
+
+class A
+{
+public:
+  int data;
+  virtual int foo (int i);
+};
+
+
+class B : public A
+{
+public:
+  __attribute__ ((noinline)) B();
+  virtual int foo (int i);
+};
+
+int __attribute__ ((noinline)) A::foo (int i)
+{
+  return i + 1;
+}
+
+int __attribute__ ((noinline)) B::foo (int i)
+{
+  return i + 2;
+}
+
+int __attribute__ ((noinline,noclone)) get_input(void)
+{
+  return 1;
+}
+
+__attribute__ ((noinline)) B::B()
+{
+}
+
+static inline int middleman_1 (class A *obj, int i)
+{
+  return obj->foo (i);
+}
+
+static inline int middleman_2 (class B *obj, int i)
+{
+  return middleman_1 (obj, i);
+}
+
+int main (int argc, char *argv[])
+{
+  class B b;
+
+  if (middleman_2 (&b, get_input ()) != 3)
+    abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "Immediately devirtualizing call.*into.*B::foo" "devirt"  } } */
+/* { dg-final { cleanup-tree-dump devirt } } */
Index: src/gcc/testsuite/g++.dg/ipa/imm-devirt-2.C
===
--- /dev/null
+++ src/gcc/testsuite/g++.dg/ipa/imm-devirt-2.C
@@ -0,0 +1,95 @@
+/* Verify that virtual calls are folded even when a typecast to an
+   ancestor is involved along the way.  */
+/* { dg-do run } 

Re: [PATCH, i386]: Use reg_or_subregno in int-float splitters

2011-11-01 Thread Uros Bizjak
On Tue, Nov 1, 2011 at 11:00 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Tue, Nov 01, 2011 at 10:33:07PM +0100, Uros Bizjak wrote:
 We have a nice utility function that can be used in int-float
 splitter constraints.

 2011-11-01  Uros Bizjak  ubiz...@gmail.com

       * config/i386/i386.md (splitters for int-float conversion): Use
       reg_or_subregno in splitter constraints.

 Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32},
 committed to mainline SVN.

 Unfortunately reg_or_subregno is an external non-inline function,
 doesn't have pure attribute and SSE_REGNO_P macro evaluates its argument
 twice, which means the function is called multiple times.

You are right. :(

On a second look, we are missing SUBREG_REG on subregs, the constraint
should read:

(SSE_REG_P (operands[0])
   || (GET_CODE (operands[0]) == SUBREG
       && SSE_REG_P (SUBREG_REG (operands[0]))))

I will do a partial revert with additional fix.

2011-11-01  Uros Bizjak  ubiz...@gmail.com

* config/i386/i386.md (splitters for int-float conversion): Use
SUBREG_REG on SUBREGs in splitter constraints.


Bootstrap and regression test in progress.

Thanks,
Uros.
Index: i386.md
===
--- i386.md (revision 180745)
+++ i386.md (working copy)
@@ -4920,7 +4920,9 @@
 SSE_FLOAT_MODE_P (MODEF:MODEmode)  TARGET_MIX_SSE_I387
 TARGET_INTER_UNIT_CONVERSIONS
 reload_completed
-SSE_REGNO_P (reg_or_subregno (operands[0]))
+(SSE_REG_P (operands[0])
+   || (GET_CODE (operands[0]) == SUBREG
+   SSE_REG_P (SUBREG_REG (operands[0]
   [(set (match_dup 0) (float:MODEF (match_dup 1)))])
 
 (define_split
@@ -4931,7 +4933,9 @@
 SSE_FLOAT_MODE_P (MODEF:MODEmode)  TARGET_MIX_SSE_I387
 !(TARGET_INTER_UNIT_CONVERSIONS || optimize_function_for_size_p (cfun))
 reload_completed
-SSE_REGNO_P (reg_or_subregno (operands[0]))
+(SSE_REG_P (operands[0])
+   || (GET_CODE (operands[0]) == SUBREG
+   SSE_REG_P (SUBREG_REG (operands[0]
   [(set (match_dup 2) (match_dup 1))
(set (match_dup 0) (float:MODEF (match_dup 2)))])
 
@@ -5020,7 +5024,9 @@
   TARGET_SSE2  TARGET_SSE_MATH
 TARGET_USE_VECTOR_CONVERTS  optimize_function_for_speed_p (cfun)
 reload_completed
-SSE_REGNO_P (reg_or_subregno (operands[0]))
+(SSE_REG_P (operands[0])
+   || (GET_CODE (operands[0]) == SUBREG
+   SSE_REG_P (SUBREG_REG (operands[0]
   [(const_int 0)]
 {
   rtx op1 = operands[1];
@@ -5061,7 +5067,9 @@
   TARGET_SSE2  TARGET_SSE_MATH
 TARGET_USE_VECTOR_CONVERTS  optimize_function_for_speed_p (cfun)
 reload_completed
-SSE_REGNO_P (reg_or_subregno (operands[0]))
+(SSE_REG_P (operands[0])
+   || (GET_CODE (operands[0]) == SUBREG
+   SSE_REG_P (SUBREG_REG (operands[0]
   [(const_int 0)]
 {
   operands[3] = simplify_gen_subreg (ssevecmodemode, operands[0],
@@ -5083,7 +5091,9 @@
   TARGET_SSE2  TARGET_SSE_MATH
 TARGET_USE_VECTOR_CONVERTS  optimize_function_for_speed_p (cfun)
 reload_completed
-SSE_REGNO_P (reg_or_subregno (operands[0]))
+(SSE_REG_P (operands[0])
+   || (GET_CODE (operands[0]) == SUBREG
+   SSE_REG_P (SUBREG_REG (operands[0]
   [(const_int 0)]
 {
   rtx op1 = operands[1];
@@ -5127,7 +5137,9 @@
   TARGET_SSE2  TARGET_SSE_MATH
 TARGET_USE_VECTOR_CONVERTS  optimize_function_for_speed_p (cfun)
 reload_completed
-SSE_REGNO_P (reg_or_subregno (operands[0]))
+(SSE_REG_P (operands[0])
+   || (GET_CODE (operands[0]) == SUBREG
+   SSE_REG_P (SUBREG_REG (operands[0]
   [(const_int 0)]
 {
   operands[3] = simplify_gen_subreg (ssevecmodemode, operands[0],
@@ -5188,7 +5200,9 @@
 SSE_FLOAT_MODE_P (MODEF:MODEmode)  TARGET_SSE_MATH
 (TARGET_INTER_UNIT_CONVERSIONS || optimize_function_for_size_p (cfun))
 reload_completed
-SSE_REGNO_P (reg_or_subregno (operands[0]))
+(SSE_REG_P (operands[0])
+   || (GET_CODE (operands[0]) == SUBREG
+   SSE_REG_P (SUBREG_REG (operands[0]
   [(set (match_dup 0) (float:MODEF (match_dup 1)))])
 
 (define_insn *floatSWI48x:modeMODEF:mode2_sse_nointerunit
@@ -5221,7 +5235,9 @@
 SSE_FLOAT_MODE_P (MODEF:MODEmode)  TARGET_SSE_MATH
 !(TARGET_INTER_UNIT_CONVERSIONS || optimize_function_for_size_p (cfun))
 reload_completed
-SSE_REGNO_P (reg_or_subregno (operands[0]))
+(SSE_REG_P (operands[0])
+   || (GET_CODE (operands[0]) == SUBREG
+   SSE_REG_P (SUBREG_REG (operands[0]
   [(set (match_dup 2) (match_dup 1))
(set (match_dup 0) (float:MODEF (match_dup 2)))])
 
@@ -5232,7 +5248,9 @@
   (SWI48x:MODEmode != DImode || TARGET_64BIT)
 SSE_FLOAT_MODE_P (MODEF:MODEmode)  TARGET_SSE_MATH
 reload_completed
-SSE_REGNO_P (reg_or_subregno (operands[0]))
+(SSE_REG_P (operands[0])
+   || (GET_CODE (operands[0]) == SUBREG
+   SSE_REG_P (SUBREG_REG (operands[0]
   [(set (match_dup 0) 

Re: [PATCH] Fix computed gotos on m68k

2011-11-01 Thread Eric Botcazou
 I've now committed the patch on 4.6 also. I did need to apply the
 following patch from Bernd in order to test the 4.6 branch tip
 successfully, since without it my build blew up in glibc with an error
 in final.c:

   http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00454.html

 Maybe that patch should be applied to 4.6 also?

Fine with me, this is a regression; please backport the testcase as well.  TIA.

-- 
Eric Botcazou


Re: [PATCH] Fix errors in expand_atomic_store.

2011-11-01 Thread Eric Botcazou
 (1) Are there really live v7 still around?

 At least with v8 we have SWAP, with which we can implement the full
 __atomic_exchange pattern sans hackery.  We can't do that with just
 LDSTUB.

I think that we can drop v7 support at this point but not v8 because of Leon.

-- 
Eric Botcazou


Re: [patch] support for multiarch systems

2011-11-01 Thread Joseph S. Myers
On Tue, 1 Nov 2011, Marc Glisse wrote:

 On Sun, 21 Aug 2011, Matthias Klose wrote:
 
  On 08/20/2011 09:51 PM, Matthias Klose wrote:
   Multiarch [1] is the term being used to refer to the capability of a
   system to install and run applications of multiple different binary
   targets on the same system.  The idea and name of multiarch dates back
   to 2004/2005 [2] (not to be confused with multiarch in glibc).
  
  attached is an updated patch which includes feedback from Jakub and Joseph.
 
 Hello,
 
 what is the status of this patch? Is it waiting for a review? Having gcc 4.7
 work out of the box on 2 of the most popular linux distributions seems like an
 important feature...

There were comments of mine that remained unaddressed in the version to 
which you replied and I don't recall a version that addressed them.  So 
there isn't a patch ready for review.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] Handle V4HI vector initialization more efficiently on VIS1.

2011-11-01 Thread David Miller

Committed to trunk.

gcc/

* config/sparc/sparc.c (vector_init_faligndata): New function.
(sparc_expand_vector_init): Use it for V4HImode on VIS1.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@180752 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog|3 +++
 gcc/config/sparc/sparc.c |   24 
 2 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 9c75318..a7b1c09 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -76,6 +76,9 @@
 
 2011-11-01  David S. Miller  da...@davemloft.net
 
+   * config/sparc/sparc.c (vector_init_faligndata): New function.
+   (sparc_expand_vector_init): Use it for V4HImode on VIS1.
+
* config/sparc/sparc.c (sparc_expand_vcond): New function.
* config/sparc/sparc-protos.h (sparc_expand_vcond): Declare it.
* config/sparc/sparc.md (vcondmodemode): New VIS3 expander.
diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index 6431405..649612e 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -11340,6 +11340,25 @@ vector_init_fpmerge (rtx target, rtx elt, enum 
machine_mode inner_mode)
   emit_insn (gen_fpmerge_vis (gen_lowpart (V8QImode, target), t1, t2));
 }
 
+static void
+vector_init_faligndata (rtx target, rtx elt, enum machine_mode inner_mode)
+{
+  rtx t1 = gen_reg_rtx (V4HImode);
+
+  elt = convert_modes (SImode, inner_mode, elt, true);
+
+  emit_move_insn (gen_lowpart (SImode, t1), elt);
+
+  emit_insn (gen_alignaddrsi_vis (gen_reg_rtx (SImode),
+ force_reg (SImode, GEN_INT (6)),
+ CONST0_RTX (SImode)));
+
+  emit_insn (gen_faligndatav4hi_vis (target, t1, target));
+  emit_insn (gen_faligndatav4hi_vis (target, t1, target));
+  emit_insn (gen_faligndatav4hi_vis (target, t1, target));
+  emit_insn (gen_faligndatav4hi_vis (target, t1, target));
+}
+
 void
 sparc_expand_vector_init (rtx target, rtx vals)
 {
@@ -11404,6 +11423,11 @@ sparc_expand_vector_init (rtx target, rtx vals)
  vector_init_fpmerge (target, XVECEXP (vals, 0, 0), inner_mode);
  return;
}
+  if (mode == V4HImode)
+   {
+ vector_init_faligndata (target, XVECEXP (vals, 0, 0), inner_mode);
+ return;
+   }
 }
 
   mem = assign_stack_temp (mode, GET_MODE_SIZE (mode), 0);
-- 
1.7.6.401.g6a319



[v3] implement LWG 2067 and new issues with constructors in future

2011-11-01 Thread Jonathan Wakely
This patch implements
http://lwg.github.com/issues/lwg-active.html#2067 which has Ready
status, as well as fixing two new issues I've reported in the past few
hours.  The first is that packaged_task's template constructors should
be restricted to prevent them from being chosen to copy a
packaged_task object and the second is that promise and packaged_task
should properly support uses-allocator construction so that if
promise(args...) is well-formed then so is promise(allocator_arg,
alloc, args...)
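The first issue can be reproduced with a small stand-alone class. This is a sketch under stated assumptions, not the libstdc++ code: `Unconstrained` and `Constrained` are hypothetical types, the copy constructor is kept (rather than deleted as in packaged_task) so that the selected overload can be observed, and `std::enable_if` is used in place of the helper trait from the patch.

```cpp
#include <cassert>
#include <type_traits>

// Unconstrained: for a non-const lvalue argument the template constructor
// deduces Fn = Unconstrained& and is a better match than the copy constructor.
struct Unconstrained {
  int which;
  Unconstrained() : which(0) {}
  Unconstrained(const Unconstrained &) : which(1) {}
  template <typename Fn>
  Unconstrained(Fn &&) : which(2) {}
};

// Constrained: the template constructor is removed from overload resolution
// when Fn (with the reference stripped) is the class itself.
struct Constrained {
  int which;
  Constrained() : which(0) {}
  Constrained(const Constrained &) : which(1) {}
  template <typename Fn,
            typename = typename std::enable_if<!std::is_same<
                Constrained,
                typename std::remove_reference<Fn>::type>::value>::type>
  Constrained(Fn &&) : which(2) {}
};

inline int copy_unconstrained() {
  Unconstrained a;
  Unconstrained b(a);
  return b.which;
}

inline int copy_constrained() {
  Constrained a;
  Constrained b(a);
  return b.which;
}

inline int wrap_other_type() {
  Constrained c(42);
  return c.which;
}
```

Here `which` records the constructor that ran: 1 for the copy constructor, 2 for the template. Without the constraint, copying a non-const lvalue selects the template.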

* include/std/future (promise): Add constructors for uses-allocator
construction from rvalue promise.
(packaged_task): Implement LWG 2067. Add additional constructors for
uses-allocator construction.
* testsuite/30_threads/packaged_task/cons/3.cc: New.
* testsuite/30_threads/packaged_task/cons/alloc2.cc: New.
* testsuite/30_threads/promise/cons/alloc2.cc: New.

Tested x86_64-linux, committed to trunk.
Index: include/std/future
===
--- include/std/future	(revision 180749)
+++ include/std/future	(working copy)
@@ -955,6 +955,12 @@
 	  _M_storage(__future_base::_S_allocate_result<_Res>(__a))
 { }
 
+  template<typename _Allocator>
+promise(allocator_arg_t, const _Allocator&, promise&& __rhs)
+: _M_future(std::move(__rhs._M_future)),
+	  _M_storage(std::move(__rhs._M_storage))
+{ }
+
   promise(const promise&) = delete;
 
   ~promise()
@@ -1047,6 +1053,12 @@
 	  _M_storage(__future_base::_S_allocate_result<_Res&>(__a))
 { }
 
+  template<typename _Allocator>
+promise(allocator_arg_t, const _Allocator&, promise&& __rhs)
+: _M_future(std::move(__rhs._M_future)),
+	  _M_storage(std::move(__rhs._M_storage))
+{ }
+
   promise(const promise&) = delete;
 
   ~promise()
@@ -1122,6 +1134,12 @@
 	  _M_storage(__future_base::_S_allocate_result<void>(__a))
 { }
 
+  template<typename _Allocator>
+promise(allocator_arg_t, const _Allocator&, promise&& __rhs)
+: _M_future(std::move(__rhs._M_future)),
+	  _M_storage(std::move(__rhs._M_storage))
+{ }
+
   promise(const promise&) = delete;
 
   ~promise()
@@ -1270,6 +1288,15 @@
 { return std::forward<_Tp>(__t); }
 };
 
+  template<typename _Task, typename _Fn, bool
+   = is_same<_Task, typename remove_reference<_Fn>::type>::value>
+struct __is_same_pkgdtask
+{ typedef void __type; };
+
+  template<typename _Task, typename _Fn>
+struct __is_same_pkgdtask<_Task, _Fn, true>
+{ };
+
   /// packaged_task
   template<typename _Res, typename... _ArgTypes>
 class packaged_task<_Res(_ArgTypes...)>
@@ -1281,13 +1308,20 @@
   // Construction and destruction
   packaged_task() noexcept { }
 
-  template<typename _Fn>
+  template<typename _Allocator>
 explicit
+packaged_task(allocator_arg_t, const _Allocator& __a) noexcept
+{ }
+
+  template<typename _Fn, typename = typename
+   __is_same_pkgdtask<packaged_task, _Fn>::__type>
+explicit
 packaged_task(_Fn&& __fn)
 : _M_state(std::make_shared<_State_type>(std::forward<_Fn>(__fn)))
 { }
 
-  template<typename _Fn, typename _Allocator>
+  template<typename _Fn, typename _Allocator, typename = typename
+   __is_same_pkgdtask<packaged_task, _Fn>::__type>
 explicit
 packaged_task(allocator_arg_t, const _Allocator& __a, _Fn&& __fn)
 : _M_state(std::allocate_shared<_State_type>(__a,
@@ -1301,13 +1335,24 @@
   }
 
   // No copy
-  packaged_task(packaged_task&) = delete;
-  packaged_task& operator=(packaged_task&) = delete;
+  packaged_task(const packaged_task&) = delete;
+  packaged_task& operator=(const packaged_task&) = delete;
 
+  template<typename _Allocator>
+explicit
+packaged_task(allocator_arg_t, const _Allocator&,
+  const packaged_task&) = delete;
+
   // Move support
   packaged_task(packaged_task&& __other) noexcept
   { this->swap(__other); }
 
+  template<typename _Allocator>
+explicit
+packaged_task(allocator_arg_t, const _Allocator&,
+  packaged_task&& __other) noexcept
+{ this->swap(__other); }
+
   packaged_task& operator=(packaged_task&& __other) noexcept
   {
 packaged_task(std::move(__other)).swap(*this);
Index: testsuite/30_threads/packaged_task/cons/alloc2.cc
===
--- testsuite/30_threads/packaged_task/cons/alloc2.cc	(revision 0)
+++ testsuite/30_threads/packaged_task/cons/alloc2.cc	(revision 0)
@@ -0,0 +1,40 @@
+// { dg-do compile }
+// { dg-options "-std=gnu++0x" }
+// { dg-require-cstdint "" }
+// { dg-require-gthreads "" }
+// { dg-require-atomic-builtins "" }
+
+// Copyright (C) 2011 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you 

[patch] update config.sub

2011-11-01 Thread DJ Delorie

Committed under the "brought in via a merge" rule.

2011-11-01  DJ Delorie  d...@redhat.com

* config.sub: Update to version 2011-10-29 (added rl78)

Imports this change:

2011-10-29  DJ Delorie  d...@redhat.com

* config.sub (rl78): New.


Re: RFC: PATCH to adjust warning flags for C++

2011-11-01 Thread Jason Merrill

On 11/01/2011 03:48 PM, Gabriel Dos Reis wrote:

On Tue, Nov 1, 2011 at 12:54 PM, Jason Merrillja...@redhat.com  wrote:

Paolo Carlini's patch to add -Wnarrowing to -Wc++0x-compat (and thus -Wall)
broke bootstrap because of narrowing warnings, so I'd like to add
-Wno-narrowing to the stage 2+ warning flags.  Is this the best way to do
that?


why do we want to include -Wc++0x-compat in -Wall?


It's already included.  And I think that "your code won't work in C++11" 
is a warning that most C++ programmers will be interested in if they are 
asking for warnings.


Jason


Re: [v3] implement LWG 2067 and new issues with constructors in future

2011-11-01 Thread Jonathan Wakely
On 2 November 2011 00:53, Jonathan Wakely wrote:
 The first is that packaged_task's template constructors should
 be restricted to prevent them from being chosen to copy a
 packaged_task object

While submitting that issue to the LWG chair I realised the constraint
should use decay<Fn> instead of remove_reference<Fn> so that it also
removes cv qualifiers.  I'll fix that tomorrow.
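The difference between the two traits on a cv-qualified reference can be checked directly. A small sketch, assuming a hypothetical empty type `Task` and hypothetical helper names:

```cpp
#include <cassert>
#include <type_traits>

struct Task {};

// True if Fn names T once only the reference is stripped.
template <typename T, typename Fn>
constexpr bool same_after_remove_reference() {
  return std::is_same<T, typename std::remove_reference<Fn>::type>::value;
}

// True if Fn names T once reference and cv qualifiers are stripped.
template <typename T, typename Fn>
constexpr bool same_after_decay() {
  return std::is_same<T, typename std::decay<Fn>::type>::value;
}

// remove_reference keeps the const, so the types differ ...
static_assert(!same_after_remove_reference<Task, const Task &>(), "");
// ... while decay drops it as well.
static_assert(same_after_decay<Task, const Task &>(), "");
```

For a plain `Task&` both helpers agree; they only diverge once a cv qualifier is involved.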


[PATCH, ARM] Fix stack red zone bug (PR38644)

2011-11-01 Thread Jiangning Liu
Hi,

This patch is to fix PR38644 in ARM back-end. OK for trunk?

For full details, please refer to
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38644.

ChangeLog:

2011-11-2  Jiangning Liu  jiangning@arm.com

PR rtl-optimization/38644
* config/arm/arm.c (thumb1_expand_epilogue): Add memory barrier
for epilogue having stack adjustment.

ChangeLog of testsuite:

2011-11-2  Jiangning Liu  jiangning@arm.com

PR rtl-optimization/38644
* gcc.target/arm/stack-red-zone.c: New.

Thanks,
-Jiangning

Patch:

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index f1ada6f..1f6fc26
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -22215,6 +22215,8 @@ thumb1_expand_epilogue (void)
   gcc_assert (amount >= 0);
   if (amount)
 {
+  emit_insn (gen_blockage ());
+
   if (amount < 512)
emit_insn (gen_addsi3 (stack_pointer_rtx, stack_pointer_rtx,
   GEN_INT (amount)));
diff --git a/gcc/testsuite/gcc.target/arm/stack-red-zone.c
b/gcc/testsuite/gcc.target/arm/stack-red-zone.c
new file mode 100644
index 000..b9f0f99
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/stack-red-zone.c
@@ -0,0 +1,12 @@
+/* No stack red zone.  PR38644.  */
+/* { dg-options "-mthumb -O2" } */
+/* { dg-final { scan-assembler "ldrb\[^\n\]*\\n\[\t \]*add\[\t \]*sp" } } */
+
+extern int doStreamReadBlock (int *, char *, int size, int);
+
+char readStream (int *s)
+{
+   char c = 0;
+   doStreamReadBlock (s, &c, 1, *s);
+   return c;
+}







Re: [PATCH] Fix errors in expand_atomic_store.

2011-11-01 Thread Andrew MacLeod

On 11/01/2011 11:15 AM, Richard Henderson wrote:

On 11/01/2011 04:56 AM, Andrew MacLeod wrote:

well, the reason for it was so that __atomic_store can be used as a
replacement for sync_lock_release on such targets...

And what was your replacement for sync_test_and_set?

If you don't have that pair, you don't have a replacement.


store (m, 0) is release and
t = exchange (m, 1)   is  test_and_set.
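In C++11 atomics terms, the pair can be sketched as below. This is an illustration only: `lock_word`, `test_and_set`, `release` and `try_lock` are hypothetical names, and the memory orders are the conventional acquire/release choice rather than anything stated in this thread.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical lock word: 0 = free, 1 = held.
static std::atomic<int> lock_word(0);

// test_and_set: atomically store 1 and return the previous value.
inline int test_and_set() {
  return lock_word.exchange(1, std::memory_order_acquire);
}

// release: an atomic store of 0 with release ordering.
inline void release() {
  lock_word.store(0, std::memory_order_release);
}

// The lock was acquired only if the previous value was 0.
inline bool try_lock() { return test_and_set() == 0; }
```

A second `try_lock()` without an intervening `release()` reports failure, because the exchange returns the 1 stored by the first call.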




[pph] Merge static_decls. (issue5335042)

2011-11-01 Thread Lawrence Crowl
Add merging of static_decls in bindings.  Due to the current
structure, this change is currently only effective at namespace scope.
Consequently, there are no changes to test status.  We may need to
make all bindings merged by default.


Index: gcc/cp/ChangeLog.pph

2011-11-01   Lawrence Crowl  cr...@google.com

* pph-streamer-out.c (pph_out_binding_level_1): Remove streaming of
static_decls.
(pph_out_binding_level): Add streaming of static_decls.
(pph_out_binding_merge_bodies): Likewise.
* pph-streamer-in.c (pph_is_tree_element_of_vec): New.
(pph_union_two_tree_vecs): New.
(pph_union_into_tree_vec): New.
(pph_in_binding_level_1): Remove streaming of static_decls.
(pph_in_binding_level): Add streaming of static_decls.
(pph_in_binding_merge_bodies): Add merging of static_decls from
streamer into existing binding.  Needs new function parameter.
(pph_in_merge_key_tree): Preallocate namespace cp_binding_level.
(pph_in_global_binding): Update call to pph_in_binding_merge_bodies.
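
The FIXME in pph_union_two_tree_vecs mentions that a hash table may eventually be needed.  A rough sketch of that direction, using standard containers instead of GCC's VEC and with the hypothetical name `union_preserving_order`, could look like this; like the patch, it only filters the right-hand elements against the left-hand vector.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Order-preserving union: all of LEFT, followed by those elements of RIGHT
// that do not occur in LEFT.  Membership is tested with a hash set built
// from LEFT, so the cost is roughly O(left + right) instead of O(left * right).
template <typename T>
std::vector<T> union_preserving_order(const std::vector<T> &left,
                                      const std::vector<T> &right) {
  std::unordered_set<T> in_left(left.begin(), left.end());
  std::vector<T> result(left);
  for (const T &t : right)
    if (in_left.find(t) == in_left.end())
      result.push_back(t);
  return result;
}
```

Both argument vectors are left unmodified, as in the function from the patch.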


Index: gcc/cp/pph-streamer-in.c
===
--- gcc/cp/pph-streamer-in.c(revision 180705)
+++ gcc/cp/pph-streamer-in.c(working copy)
@@ -677,6 +677,66 @@ pph_in_tree_pair_vec (pph_stream *stream
 }
 
 
+/* Test whether tree T is an element of vector V.  */
+
+static bool
+pph_is_tree_element_of_vec (tree t, VEC(tree,gc) *v)
+{
+  unsigned i;
+  tree s;
+  FOR_EACH_VEC_ELT (tree, v, i, s)
+if (s == t)
+  return true;
+  return false;
+}
+
+
+/* Return the union of two tree vecs.  The argument vectors are unmodified.  */
+
+static VEC(tree,gc) *
+pph_union_two_tree_vecs (VEC(tree,gc) *left, VEC(tree,gc) *right)
+{
+  /* FIXME pph: This O(left)+O(left*right) union may become a problem.
+ In the long run, we probably want to copy both into a hash table
+ and then copy the table into the result.  */
+  unsigned i;
+  tree t;
+  VEC(tree,gc) *unioned = VEC_copy (tree, gc, left);
+  FOR_EACH_VEC_ELT (tree, right, i, t)
+{
+  if (!pph_is_tree_element_of_vec (t, left))
+   VEC_safe_push (tree, gc, unioned, t);
+}
+  return unioned;
+}
+
+
+/* Union FROM one tree vec with and INTO a tree vec.  The INTO argument will
+   have an updated value.  The FROM argument is no longer valid.  */
+
+static void
+pph_union_into_tree_vec (VEC(tree,gc) **into, VEC(tree,gc) *from)
+{
+  if (!VEC_empty (tree, from))
+{
+  if (*into == NULL)
+   *into = from;
+  else if (VEC_empty (tree, *into))
+   {
+ VEC_free (tree, gc, *into);
+ *into = from;
+   }
+  else
+   {
+ VEC(tree,gc) *unioned = pph_union_two_tree_vecs (*into, from);
+ VEC_free (tree, gc, *into);
+ VEC_free (tree, gc, from);
+ *into = unioned;
+   }
+}
+}
+
+
 / chains */
 
 
@@ -967,7 +1027,6 @@ pph_in_binding_level_1 (pph_stream *stre
   struct bitpack_d bp;
 
   bl->this_entity = pph_in_tree (stream);
-  bl->static_decls = pph_in_tree_vec (stream);
 
   num = pph_in_uint (stream);
   bl->class_shadowed = NULL;
@@ -1029,6 +1088,7 @@ pph_in_binding_level (pph_stream *stream
   bl->namespaces = pph_in_chain (stream);
   bl->usings = pph_in_chain (stream);
   bl->using_directives = pph_in_chain (stream);
+  bl->static_decls = pph_in_tree_vec (stream);
   pph_in_binding_level_1 (stream, bl);
 
   return bl;
@@ -1051,12 +,13 @@ pph_in_binding_merge_keys (pph_stream *s
 /* Read all the merge bodies from STREAM into the cp_binding_level BL.  */
 
 static void
-pph_in_binding_merge_bodies (pph_stream *stream)
+pph_in_binding_merge_bodies (pph_stream *stream, cp_binding_level *bl)
 {
   pph_in_merge_body_chain (stream);
   pph_in_merge_body_chain (stream);
   pph_in_merge_body_chain (stream);
   pph_in_merge_body_chain (stream);
+  pph_union_into_tree_vec (&bl->static_decls, pph_in_tree_vec (stream));
 }
 
 
@@ -1951,11 +2012,11 @@ pph_in_merge_key_tree (pph_stream *strea
 {
   if (TREE_CODE (expr) == NAMESPACE_DECL)
 {
- /* struct lang_decl *ld; */
-  retrofit_lang_decl (expr);
- /* ld = DECL_LANG_SPECIFIC (expr); */
- /* FIXME NOW: allocate binding.  */
-  pph_in_binding_merge_keys (stream, NAMESPACE_LEVEL (expr));
+ cp_binding_level *bl;
+ retrofit_lang_decl (expr);
+ bl = ggc_alloc_cleared_cp_binding_level ();
+ NAMESPACE_LEVEL (expr) = bl;
+ pph_in_binding_merge_keys (stream, bl);
 }
 #if 0
 /* FIXME pph: Disable type merging for the moment.  */
@@ -2438,7 +2499,7 @@ pph_in_global_binding (pph_stream *strea
  same slot IX that the writer used, the trees read now will be
  bound to scope_chain->bindings.  */
   pph_in_binding_merge_keys (stream, bl);
-  pph_in_binding_merge_bodies (stream);
+  pph_in_binding_merge_bodies (stream, bl);
 
   /* FIXME 

Re: RFC: PATCH to adjust warning flags for C++

2011-11-01 Thread Gabriel Dos Reis
On Tue, Nov 1, 2011 at 8:11 PM, Jason Merrill ja...@redhat.com wrote:
 On 11/01/2011 03:48 PM, Gabriel Dos Reis wrote:

 On Tue, Nov 1, 2011 at 12:54 PM, Jason Merrillja...@redhat.com  wrote:

 Paolo Carlini's patch to add -Wnarrowing to -Wc++0x-compat (and thus
 -Wall)
 broke bootstrap because of narrowing warnings, so I'd like to add
 -Wno-narrowing to the stage 2+ warning flags.  Is this the best way to do
 that?

 why do we want to include -Wc++0x-compat in -Wall?

 It's already included.

yes, that is why I asked.  E.g. it isn't obvious that -Wc++0x-compat
 ought to be in -Wall at this stage or 4.6.x.

  And I think that "your code won't work in C++11" is
 a warning that most C++ programmers will be interested in if they are asking
 for warnings.

Even when -std=c++03 -Wall or -std=c++98 -Wall?

I would suggest we do this:
1. Include -Wc++0x-compat in -W or -Wextra for THIS release.
2. Leave -Wnarrowing in -Wc++0x-compat by default.
3. Make a release note that -Wc++0x-compat will be activated in
   the very next major release.


[v3] tr2 missing bits

2011-11-01 Thread Benjamin Kosnik

Ooops, noticed some minor bits when I was regenerating the docs. Some
of the TR2 man pages needed munging, and the c++config bits for
versioning TR2 needed to go in.
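
For readers unfamiliar with the mechanism, versioning via an inline namespace works roughly as in the following sketch.  The names `lib`, `v7` and `answer` are hypothetical and not taken from c++config.

```cpp
#include <cassert>

namespace lib {
  namespace tr2 {
    // Members of an inline namespace are also visible in the enclosing
    // namespace, so the version tag only shows up in the full name.
    inline namespace v7 {
      inline int answer() { return 7; }
    }
  }
}
```

Callers can write `lib::tr2::answer()` without naming the version, while the fully qualified `lib::tr2::v7::answer()` refers to the same function.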

tested x86/linux

best,
benjamin

2011-11-02  Benjamin Kosnik  b...@redhat.com

	* include/bits/c++config: Add tr2 to versioned namespaces.
	* scripts/run_doxygen: Adjust generated man files as well.
	* testsuite/ext/profile/mutex_extensions_neg.cc: Adjust line numbers.

diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index f77da5e..e76e742 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -148,6 +148,8 @@
   namespace __detail { }
 }
 
+namespace tr2 { }
+
 namespace decimal { }
 
 namespace chrono { }
@@ -197,6 +199,9 @@ namespace std
 namespace __detail { inline namespace __7 { } }
   }
 
+  namespace tr2
+  { inline namespace __7 { } }
+
   namespace decimal { inline namespace __7 { } }
 
   namespace chrono { inline namespace __7 { } }
diff --git a/libstdc++-v3/scripts/run_doxygen b/libstdc++-v3/scripts/run_doxygen
index 48b1724..3fef95f 100644
--- a/libstdc++-v3/scripts/run_doxygen
+++ b/libstdc++-v3/scripts/run_doxygen
@@ -339,6 +339,10 @@ for f in std_tr1_*; do
 newname=`echo $f | sed 's/^std_tr1_/std::tr1::/'`
 mv $f $newname
 done
+for f in std_tr2_*; do
+newname=`echo $f | sed 's/^std_tr2_/std::tr2::/'`
+mv $f $newname
+done
 for f in std_*; do
 newname=`echo $f | sed 's/^std_/std::/'`
 mv $f $newname
diff --git a/libstdc++-v3/testsuite/ext/profile/mutex_extensions_neg.cc b/libstdc++-v3/testsuite/ext/profile/mutex_extensions_neg.cc
index 4e2d071..c6e6fea 100644
--- a/libstdc++-v3/testsuite/ext/profile/mutex_extensions_neg.cc
+++ b/libstdc++-v3/testsuite/ext/profile/mutex_extensions_neg.cc
@@ -25,4 +25,4 @@
 
 #include <vector>
 
-// { dg-error "multiple inlined namespaces" "" { target *-*-* } 258 }
+// { dg-error "multiple inlined namespaces" "" { target *-*-* } 263 }
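
A minimal sketch of the versioning idea behind the c++config hunk, assuming made-up names (`lib`, `v7`, `abi_tag`) in place of libstdc++'s `std` and reserved `__7`. Members of an inline namespace are found through the enclosing namespace, so callers keep writing `lib::tr2::...` while the entity itself lives in the versioned namespace. This illustrates the language feature only; it is not the libstdc++ configuration machinery.

```cpp
#include <cassert>
#include <string>

namespace lib
{
  namespace tr2
  {
    // Hypothetical version tag standing in for libstdc++'s __7.
    inline namespace v7
    {
      inline std::string abi_tag () { return "v7"; }
    }
  }
}

// Both spellings name the same function; only the fully qualified
// form mentions the version explicitly.
inline void usage_example ()
{
  assert (lib::tr2::abi_tag () == "v7");
  assert (lib::tr2::abi_tag () == lib::tr2::v7::abi_tag ());
}
```

Inline namespaces need C++11 or later (or GCC's extension in earlier modes).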


Re: [PATCH, rs6000] Preserve link stack for 476 cpus

2011-11-01 Thread Alan Modra
On Tue, Nov 01, 2011 at 02:00:25PM -0500, Peter Bergner wrote:
   (get_ppc476_thunk_name): New function.
   (rs6000_code_end): Likewise.

rs6000.c:27968:1: error: 'void rs6000_code_end()' defined but not used [-Werror=unused-function]
cc1plus: all warnings being treated as errors

Bootstrapped and committed as obvious, revision 180761.

* config/rs6000/rs6000.c (rs6000_code_end): Declare ATTRIBUTE_UNUSED.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 180754)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -1176,6 +1176,7 @@ static void rs6000_trampoline_init (rtx,
 static bool rs6000_cannot_force_const_mem (enum machine_mode, rtx);
 static bool rs6000_legitimate_constant_p (enum machine_mode, rtx);
 static bool rs6000_save_toc_in_prologue_p (void);
+static void rs6000_code_end (void) ATTRIBUTE_UNUSED;
 
 /* Hash table stuff for keeping track of TOC entries.  */
 
-- 
Alan Modra
Australia Development Lab, IBM
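
A small sketch of the same fix outside GCC's sources, assuming hypothetical names (`target_code_end`, `code_end_calls`). In GCC's own headers ATTRIBUTE_UNUSED expands to the GNU `unused` attribute, which is spelled out directly here. Placing it on the forward declaration keeps -Wunused-function quiet when no caller is compiled in.

```cpp
#include <cassert>

static int code_end_calls = 0;

// The attribute goes on the forward declaration, as in the patch.
// __attribute__ ((unused)) is a GNU extension (GCC, Clang).
static void target_code_end () __attribute__ ((unused));

static void
target_code_end ()
{
  ++code_end_calls;
}

// Calling the function is still allowed; the attribute only silences
// the diagnostic in configurations where no caller exists.
static inline void usage_example ()
{
  target_code_end ();
  assert (code_end_calls == 1);
}
```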


Re: -fdump-go-spec option does not handle redefinitions

2011-11-01 Thread Ian Lance Taylor
Uros Bizjak ubiz...@gmail.com writes:

 The problem with your proposal is that the output would be invalid Go,
 because it would attempt to define the name _aa twice.  However, it does
 seem plausible that in most scenarios of this type it would be more
 useful for -fdump-go-spec to generate

 const _aa = 3

 I agree.

This patch implements this approach.  Bootstrapped and ran Go testsuite
on x86_64-unknown-linux-gnu.  Committed to mainline.

Ian


2011-11-01  Ian Lance Taylor  i...@google.com

* godump.c (struct macro_hash_value): Define.
(macro_hash_hashval): New static function.
(macro_hash_eq, macro_hash_del): New static functions.
(go_define): Use macro_hash_value to store values in macro_hash.
Replace an old value on a redefinition.  Don't print anything to
go_dump_file.
(go_undef): Delete the entry from the hash table.
(go_output_typedef): For an enum, use macro_hash_value, and don't
print anything to go_dump_file.
(go_print_macro): New static function.
(go_finish): Traverse macro_hash with go_print_macro.
(dump_go_spec_init): Update macro_hash creation for
macro_hash_value.
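
The approach in this ChangeLog (store each macro's value, replace it on redefinition, drop it on #undef, and print only at the end) can be sketched as follows. This is an illustration under stated assumptions, not the godump.c code: it swaps libiberty's htab for std::map, and `MacroTable`, `define`, `undef` and `finish` are hypothetical names. std::map also prints in sorted name order, which the hash table traversal in the patch does not promise.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-in for macro_hash: macro name -> Go expression text.
class MacroTable
{
public:
  // Record a definition.  A later definition of the same name replaces
  // the earlier value instead of being ignored.
  void define (const std::string &name, const std::string &value)
  {
    assert (!name.empty ());
    macros_[name] = value;
  }

  // Forget a macro entirely, as after #undef.
  void undef (const std::string &name)
  {
    macros_.erase (name);
  }

  // Emit one Go constant per surviving macro, so no name is defined twice.
  std::string finish () const
  {
    std::ostringstream out;
    for (std::map<std::string, std::string>::const_iterator it
           = macros_.begin (); it != macros_.end (); ++it)
      out << "const _" << it->first << " = " << it->second << "\n";
    return out.str ();
  }

private:
  std::map<std::string, std::string> macros_;
};

inline void usage_example ()
{
  MacroTable table;
  table.define ("aa", "2");
  table.define ("aa", "3");   // redefinition replaces the old value
  table.define ("bb", "1");
  table.undef ("bb");         // bb never reaches the output
  assert (table.finish () == "const _aa = 3\n");
}
```

Deferring the output to the end is what makes the result valid Go: printing at each #define would emit `_aa` twice.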


Index: godump.c
===
--- godump.c	(revision 180342)
+++ godump.c	(working copy)
@@ -62,7 +62,47 @@ static GTY(()) VEC(tree,gc) *queue;
 
 static htab_t macro_hash;
 
-/* For the hash tables.  */
+/* The type of a value in macro_hash.  */
+
+struct macro_hash_value
+{
+  /* The name stored in the hash table.  */
+  char *name;
+  /* The value of the macro.  */
+  char *value;
+};
+
+/* Calculate the hash value for an entry in the macro hash table.  */
+
+static hashval_t
+macro_hash_hashval (const void *val)
+{
+  const struct macro_hash_value *mhval = (const struct macro_hash_value *) val;
+  return htab_hash_string (mhval->name);
+}
+
+/* Compare values in the macro hash table for equality.  */
+
+static int
+macro_hash_eq (const void *v1, const void *v2)
+{
+  const struct macro_hash_value *mhv1 = (const struct macro_hash_value *) v1;
+  const struct macro_hash_value *mhv2 = (const struct macro_hash_value *) v2;
+  return strcmp (mhv1->name, mhv2->name) == 0;
+}
+
+/* Free values deleted from the macro hash table.  */
+
+static void
+macro_hash_del (void *v)
+{
+  struct macro_hash_value *mhv = (struct macro_hash_value *) v;
+  XDELETEVEC (mhv->name);
+  XDELETEVEC (mhv->value);
+  XDELETE (mhv);
+}
+
+/* For the string hash tables.  */
 
 static int
 string_hash_eq (const void *y1, const void *y2)
@@ -77,10 +117,12 @@ go_define (unsigned int lineno, const ch
 {
   const char *p;
   const char *name_end;
+  size_t out_len;
   char *out_buffer;
   char *q;
   bool saw_operand;
   bool need_operand;
+  struct macro_hash_value *mhval;
   char *copy;
   hashval_t hashval;
   void **slot;
@@ -105,17 +147,17 @@ go_define (unsigned int lineno, const ch
   memcpy (copy, buffer, name_end - buffer);
   copy[name_end - buffer] = '\0';
 
+  mhval = XNEW (struct macro_hash_value);
+  mhval->name = copy;
+  mhval->value = NULL;
+
   hashval = htab_hash_string (copy);
-  slot = htab_find_slot_with_hash (macro_hash, copy, hashval, NO_INSERT);
-  if (slot != NULL)
-{
-  XDELETEVEC (copy);
-  return;
-}
+  slot = htab_find_slot_with_hash (macro_hash, mhval, hashval, NO_INSERT);
 
   /* For simplicity, we force all names to be hidden by adding an
  initial underscore, and let the user undo this as needed.  */
-  out_buffer = XNEWVEC (char, strlen (p) * 2 + 1);
+  out_len = strlen (p) * 2 + 1;
+  out_buffer = XNEWVEC (char, out_len);
   q = out_buffer;
   saw_operand = false;
   need_operand = false;
@@ -141,6 +183,7 @@ go_define (unsigned int lineno, const ch
 	   don't worry about them.  */
 	const char *start;
 	char *n;
+	struct macro_hash_value idval;
 
 	if (saw_operand)
 	  goto unknown;
@@ -151,8 +194,9 @@ go_define (unsigned int lineno, const ch
 	n = XALLOCAVEC (char, p - start + 1);
 	memcpy (n, start, p - start);
 	n[p - start] = '\0';
-	slot = htab_find_slot (macro_hash, n, NO_INSERT);
-	if (slot == NULL || *slot == NULL)
+	idval.name = n;
+	idval.value = NULL;
+	if (htab_find (macro_hash, &idval) == NULL)
 	  {
 		/* This is a reference to a name which was not defined
 		   as a macro.  */
@@ -382,18 +426,30 @@ go_define (unsigned int lineno, const ch
   if (need_operand)
 goto unknown;
 
+  gcc_assert ((size_t) (q - out_buffer) < out_len);
   *q = '\0';
 
-  slot = htab_find_slot_with_hash (macro_hash, copy, hashval, INSERT);
-  *slot = copy;
+  mhval->value = out_buffer;
 
-  fprintf (go_dump_file, "const _%s = %s\n", copy, out_buffer);
+  if (slot == NULL)
+{
+  slot = htab_find_slot_with_hash (macro_hash, mhval, hashval, INSERT);
+  gcc_assert (slot != NULL && *slot == NULL);
+}
+  else
+{
+  if (*slot != NULL)
+	macro_hash_del (*slot);
+}
+
+  *slot = mhval;
 
-  

Re: RFC: PATCH to adjust warning flags for C++

2011-11-01 Thread Jason Merrill

On 11/02/2011 12:05 AM, Gabriel Dos Reis wrote:

  And I think that "your code won't work in C++11" is
a warning that most C++ programmers will be interested in if they are asking
for warnings.


Even when -std=c++03 -Wall or -std=c++98 -Wall?


Yes.  -Wc++0x-compat has been part of -Wall for almost 5 years.  If 
people don't want narrowing warnings, they can use -Wno-narrowing, which 
is helpfully mentioned in the warnings themselves.


Jason
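
For readers unfamiliar with the warning under discussion, here is a hedged sketch of the kind of code it concerns. The names (`LowHigh`, `split`) are invented for illustration. C++03 accepts an implicit int to unsigned char conversion inside a braced initializer, while C++11 treats it as a narrowing conversion. An explicit cast is valid under both standards and needs no -Wno-narrowing.

```cpp
#include <cassert>

struct LowHigh
{
  unsigned char low;
  unsigned char high;
};

// C++03 accepts the unadorned form
//   LowHigh r = { value & 0xff, (value >> 8) & 0xff };
// but in C++11 converting a non-constant int to unsigned char inside
// braces is a narrowing conversion.  The casts state the intent and
// compile cleanly under both standards.
inline LowHigh split (int value)
{
  LowHigh r = { static_cast<unsigned char> (value & 0xff),
                static_cast<unsigned char> ((value >> 8) & 0xff) };
  return r;
}

inline void usage_example ()
{
  LowHigh parts = split (0x1234);
  assert (parts.low == 0x34);
  assert (parts.high == 0x12);
}
```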