[Patch ARM] Fix off by one error in neon_evpc_vrev.

2012-05-26 Thread Ramana Radhakrishnan
Hi,

   There is  an off by one error in neon_evpc_vrev which means it
rarely gets triggerred. The problem is that if you are looking at
d-perm [i +j]  and you increment i by just diff you end up starting
looking where you looked at the end of the last place where you
checked. Given this I think this patch should make it to trunk and to
the 4.7 branch after appropriate testing and if the release managers
don't object to such a change. The point is that this helps generate
proper vect_reverse style instructions for arm_neon. There is scope
for a follow up patch that fixes up the intrinsics essentially
identical to all the functions in the test that I've added (except for
a couple of missing v2sf and v4sf type operations which should boil
down to the same implementation) . It's a permute so the type really
doesn't matter. However it requires some ML magic with permutations
for the masks which will prove to be good fun on a plane trip.

Testing currently in progress and if there are no regressions I intend
to commit this to trunk and (after a while I would like to backport
this to the 4.7 branch if there are no objections from the release
managers)

 RichardH , since you did the original patch would you mind having a
quick look to see if I haven't missed anything obvious.



* config/arm/arm.c (arm_evpc_neon_vrev): Fix off by one
error.
* gcc.target/arm/neon-vrev.c: New.


regards,
Ramana



1. For one example from the testcase you can see the effects before
and after for this :  (left is new compiler and right is old compiler)



vrev16q2_u8:vrev16q2_u8:
@ args = 0, pretend = 0, frame = 0  @ args 
= 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0 @ 
frame_needed = 0,
uses_anonymous_args = 0
@ link register save eliminated.@ link 
register save eliminated.
vmovd16, r0, r1  @ v16qi  | vmov
d20, r0, r1  @ v16qi
vmovd17, r2, r3   | vmov
d21, r2, r3
vrev16.8q8, q8| vldr
d16, .L41
vmovr0, r1, d16  @ v16qi  | vldr
d17, .L41+8
vmovr2, r3, d17   | vtbl.8  
d18, {d20, d21}, d16
bx  lr| vtbl.8  
d19, {d20, d21}, d17
   vmov
r0, r1, d18  @ v16qi
   vmov
r2, r3, d19
   bx  
lr
   .L42:
   .align  
3
   .L41:
   .byte   
1
   .byte   0
   .byte   
3
   .byte   
2
   .byte   
5
   .byte   
4
   .byte   
7
   .byte   
6
   .byte   
9
   .byte   
8
   .byte   
11
   .byte   
10
   .byte   
13
   .byte   
12
   .byte   
15
   .byte   
14
.size   vrev16q2_u8, .-vrev16q2_u8  .size   
vrev16q2_u8, .-vrev16q2_u8
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 321e6b5..bcad0b9 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -25668,10 +25668,18 @@ arm_evpc_neon_vrev (struct expand_vec_perm_d *d)
   return false;
 }
 
-  for (i = 0; i  nelt; i += diff)
+  for (i = 0; i  nelt ; i += (diff + 1))
 for (j = 0; j = diff; j += 1)
-  if (d-perm[i + j] != i + diff - j)
-   return false;
+  {
+   /* This is guaranteed to be true as the value of diff
+  

[Ada] Small tweak to type derivation machinery

2012-05-26 Thread Eric Botcazou
To have the name of the types of the variant part and the fields therein be 
unique instead of mere duplicates of those of the base type, which makes it 
easier to debug type merging issues in LTO mode.

Tested on i586-suse-linux, applied on the mainline, 4.7 and 4.6 branches.


2012-05-26  Eric Botcazou  ebotca...@adacore.com

* gcc-interface/decl.c (variant_desc): Rename 'record' to 'new_type'.
(build_variant_list): Adjust to above renaming.
(gnat_to_gnu_entity) E_Record_Subtype: Likewise.  Give a unique name
to the type of the variant containers.
(create_variant_part_from): Likewise.  Give a unique name to the type
of the variant part.


-- 
Eric Botcazou
Index: gcc-interface/decl.c
===
--- gcc-interface/decl.c	(revision 187833)
+++ gcc-interface/decl.c	(working copy)
@@ -119,8 +119,8 @@ typedef struct variant_desc_d {
   /* The value of the qualifier.  */
   tree qual;
 
-  /* The record associated with this variant.  */
-  tree record;
+  /* The type of the variant after transformation.  */
+  tree new_type;
 } variant_desc;
 
 DEF_VEC_O(variant_desc);
@@ -3318,11 +3318,16 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 		  {
 			tree old_variant = v-type;
 			tree new_variant = make_node (RECORD_TYPE);
+			tree suffix
+			  = concat_name (DECL_NAME (gnu_variant_part),
+	 IDENTIFIER_POINTER
+	 (DECL_NAME (v-field)));
 			TYPE_NAME (new_variant)
-			  = DECL_NAME (TYPE_NAME (old_variant));
+			  = concat_name (TYPE_NAME (gnu_type),
+	 IDENTIFIER_POINTER (suffix));
 			copy_and_substitute_in_size (new_variant, old_variant,
 		 gnu_subst_list);
-			v-record = new_variant;
+			v-new_type = new_variant;
 		  }
 		}
 	  else
@@ -3426,7 +3431,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 			if (selected_variant)
 			  gnu_cont_type = gnu_type;
 			else
-			  gnu_cont_type = v-record;
+			  gnu_cont_type = v-new_type;
 			  }
 			else
 			  /* The front-end may pass us ghost components if
@@ -7562,7 +7567,7 @@ build_variant_list (tree qual_union_type
 	  v-type = variant_type;
 	  v-field = gnu_field;
 	  v-qual = qual;
-	  v-record = NULL_TREE;
+	  v-new_type = NULL_TREE;
 
 	  /* Recurse on the variant subpart of the variant, if any.  */
 	  variant_subpart = get_variant_part (variant_type);
@@ -8238,7 +8243,9 @@ create_variant_part_from (tree old_varia
 
   /* First create the type of the variant part from that of the old one.  */
   new_union_type = make_node (QUAL_UNION_TYPE);
-  TYPE_NAME (new_union_type) = DECL_NAME (TYPE_NAME (old_union_type));
+  TYPE_NAME (new_union_type)
+= concat_name (TYPE_NAME (record_type),
+		   IDENTIFIER_POINTER (DECL_NAME (old_variant_part)));
 
   /* If the position of the variant part is constant, subtract it from the
  size of the type of the parent to get the new size.  This manual CSE
@@ -8272,7 +8279,7 @@ create_variant_part_from (tree old_varia
 	continue;
 
   /* Retrieve the list of fields already added to the new variant.  */
-  new_variant = v-record;
+  new_variant = v-new_type;
   field_list = TYPE_FIELDS (new_variant);
 
   /* If the old variant had a variant subpart, we need to create a new


Fix gnat.dg/renaming5.adb regression

2012-05-26 Thread Eric Botcazou
There is one more goto in the .optimized dump because the latch of a loop is 
now preserved.

Tested on i586-suse-linux, applied on the mainline.


2012-05-26  Eric Botcazou  ebotca...@adacore.com

* gnat.dg/renaming5.adb: Adjust dg-final directive.


-- 
Eric Botcazou
Index: gnat.dg/renaming5.adb
===
--- gnat.dg/renaming5.adb	(revision 187833)
+++ gnat.dg/renaming5.adb	(working copy)
@@ -26,5 +26,5 @@ package body Renaming5 is
 
 end Renaming5;
 
--- { dg-final { scan-tree-dump-times goto 2 optimized } }
+-- { dg-final { scan-tree-dump-times goto 3 optimized } }
 -- { dg-final { cleanup-tree-dump optimized } }


[Ada] Fix gnat.dg/return3.adb regression

2012-05-26 Thread Eric Botcazou
The problem is that the new call to cleanup_cfg in gimple_expand_cfg has 
short-circuited the machinery that emits nops to carry goto locus at -O0.
The machinery works in CFGLAYOUT mode, but here we're still in CFGRTL.

The attached patch makes it so that forwarder blocks are not deleted by 
try_optimize_cfg if CLEANUP_NO_INSN_DEL and partially extends the above 
machinery to the CFGRTL mode.

Bootstrapped/regtested on x86_64-suse-linux, applied on the mainline.


2012-05-26  Eric Botcazou  ebotca...@adacore.com

* cfgcleanup.c (try_optimize_cfg): Do not delete forwarder blocks
if CLEANUP_NO_INSN_DEL.
* cfgrtl.c (unique_locus_on_edge_between_p): New function extracted
from cfg_layout_merge_blocks.
(emit_nop_for_unique_locus_between): New function.
(rtl_merge_blocks): Invoke emit_nop_for_unique_locus_between.
(cfg_layout_merge_blocks): Likewise.


-- 
Eric Botcazou
Index: cfgcleanup.c
===
--- cfgcleanup.c	(revision 187833)
+++ cfgcleanup.c	(working copy)
@@ -2644,7 +2644,7 @@ try_optimize_cfg (int mode)
 		}
 
 	  /* If we fall through an empty block, we can remove it.  */
-	  if (!(mode  CLEANUP_CFGLAYOUT)
+	  if (!(mode  (CLEANUP_CFGLAYOUT | CLEANUP_NO_INSN_DEL))
 		   single_pred_p (b)
 		   (single_pred_edge (b)-flags  EDGE_FALLTHRU)
 		   !LABEL_P (BB_HEAD (b))
Index: cfgrtl.c
===
--- cfgrtl.c	(revision 187833)
+++ cfgrtl.c	(working copy)
@@ -603,6 +603,56 @@ rtl_split_block (basic_block bb, void *i
   return new_bb;
 }
 
+/* Return true if the single edge between blocks A and B is the only place
+   in RTL which holds some unique locus.  */
+
+static bool
+unique_locus_on_edge_between_p (basic_block a, basic_block b)
+{
+  const int goto_locus = EDGE_SUCC (a, 0)-goto_locus;
+  rtx insn, end;
+
+  if (!goto_locus)
+return false;
+
+  /* First scan block A backward.  */
+  insn = BB_END (a);
+  end = PREV_INSN (BB_HEAD (a));
+  while (insn != end  (!NONDEBUG_INSN_P (insn) || INSN_LOCATOR (insn) == 0))
+insn = PREV_INSN (insn);
+
+  if (insn != end  locator_eq (INSN_LOCATOR (insn), goto_locus))
+return false;
+
+  /* Then scan block B forward.  */
+  insn = BB_HEAD (b);
+  if (insn)
+{
+  end = NEXT_INSN (BB_END (b));
+  while (insn != end  !NONDEBUG_INSN_P (insn))
+	insn = NEXT_INSN (insn);
+
+  if (insn != end  INSN_LOCATOR (insn) != 0
+	   locator_eq (INSN_LOCATOR (insn), goto_locus))
+	return false;
+}
+
+  return true;
+}
+
+/* If the single edge between blocks A and B is the only place in RTL which
+   holds some unique locus, emit a nop with that locus between the blocks.  */
+
+static void
+emit_nop_for_unique_locus_between (basic_block a, basic_block b)
+{
+  if (!unique_locus_on_edge_between_p (a, b))
+return;
+
+  BB_END (a) = emit_insn_after_noloc (gen_nop (), BB_END (a), a);
+  INSN_LOCATOR (BB_END (a)) = EDGE_SUCC (a, 0)-goto_locus;
+}
+
 /* Blocks A and B are to be merged into a single block A.  The insns
are already contiguous.  */
 
@@ -681,15 +731,25 @@ rtl_merge_blocks (basic_block a, basic_b
 
   /* Delete everything marked above as well as crap that might be
  hanging out between the two blocks.  */
-  BB_HEAD (b) = NULL;
+  BB_END (a) = a_end;
+  BB_HEAD (b) = b_empty ? NULL_RTX : b_head;
   delete_insn_chain (del_first, del_last, true);
 
+  /* When not optimizing CFG and the edge is the only place in RTL which holds
+ some unique locus, emit a nop with that locus in between.  */
+  if (!optimize)
+{
+  emit_nop_for_unique_locus_between (a, b);
+  a_end = BB_END (a);
+}
+
   /* Reassociate the insns of B with A.  */
   if (!b_empty)
 {
   update_bb_for_insn_chain (a_end, b_debug_end, a);
 
-  a_end = b_debug_end;
+  BB_END (a) = b_debug_end;
+  BB_HEAD (b) = NULL_RTX;
 }
   else if (b_end != b_debug_end)
 {
@@ -701,11 +761,10 @@ rtl_merge_blocks (basic_block a, basic_b
 	reorder_insns_nobb (NEXT_INSN (a_end), PREV_INSN (b_debug_start),
 			b_debug_end);
   update_bb_for_insn_chain (b_debug_start, b_debug_end, a);
-  a_end = b_debug_end;
+  BB_END (a) = b_debug_end;
 }
 
   df_bb_delete (b-index);
-  BB_END (a) = a_end;
 
   /* If B was a forwarder block, propagate the locus on the edge.  */
   if (forwarder_p  !EDGE_SUCC (b, 0)-goto_locus)
@@ -2853,33 +2912,10 @@ cfg_layout_merge_blocks (basic_block a,
 try_redirect_by_replacing_jump (EDGE_SUCC (a, 0), b, true);
   gcc_assert (!JUMP_P (BB_END (a)));
 
-  /* When not optimizing and the edge is the only place in RTL which holds
+  /* When not optimizing CFG and the edge is the only place in RTL which holds
  some unique locus, emit a nop with that locus in between.  */
-  if (!optimize  EDGE_SUCC (a, 0)-goto_locus)
-{
-  rtx insn = BB_END (a), end = PREV_INSN (BB_HEAD (a));
-  int goto_locus = EDGE_SUCC (a, 

Re: divide 64-bit by constant for 32-bit target machines

2012-05-26 Thread Paolo Bonzini

 diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
 index 2cecf45..9d6983b 100644
 --- a/gcc/config/arm/arm.c
 +++ b/gcc/config/arm/arm.c
 @@ -7131,6 +7131,8 @@ arm_rtx_costs_1 (rtx x, enum rtx_code outer, int* 
 total, bool speed)
   *total = COSTS_N_INSNS (2);
else if (TARGET_HARD_FLOAT  mode == DFmode  !TARGET_VFP_SINGLE)
   *total = COSTS_N_INSNS (4);
 +  else if (mode == DImode)
 +*total = COSTS_N_INSNS (50);
else
   *total = COSTS_N_INSNS (20);
return false;
 diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
 index d48a465..b5627c2 100644
 --- a/gcc/config/mips/mips.c
 +++ b/gcc/config/mips/mips.c
 @@ -3846,7 +3846,13 @@ mips_rtx_costs (rtx x, int code, int outer_code, int 
 opno ATTRIBUTE_UNUSED,
 *total = COSTS_N_INSNS (mips_idiv_insns ());
   }
else if (mode == DImode)
 -*total = mips_cost-int_div_di;
 +{
 +  if (!TARGET_64BIT)
 +/* Divide double integer library call is expensive.  */
 +*total = COSTS_N_INSNS (200);
 +  else
 +*total = mips_cost-int_div_di;
 +}
else
   *total = mips_cost-int_div_si;
return false;
 diff --git a/gcc/expmed.c b/gcc/expmed.c
 index aa24fbf..5f4c921 100644
 --- a/gcc/expmed.c
 +++ b/gcc/expmed.c
 @@ -3523,6 +3523,105 @@ expand_mult_highpart_optab (enum machine_mode mode, 
 rtx op0, rtx op1,
   }
  }
  
 +  if (unsignedp  (!optimize_size  (optimize1))
 +   (size - 1  BITS_PER_WORD
 +   BITS_PER_WORD == 32  GET_MODE_BITSIZE (mode) == 
 2*BITS_PER_WORD)

These references to 32-bits are still wrong (and unnecessary, just
remove them).

 +   (4 * mul_cost[speed][mode] + 4 * add_cost[speed][mode]
 +  + shift_cost[speed][mode][31]  max_cost))
 +{
 +  unsigned HOST_WIDE_INT d;
 +  rtx x1, x0, y1, y0, z2, z0, tmp, u0, u0tmp, u1, c, c1, ccst, cres, 
 result;
 +
 +  d = (INTVAL (op1)  GET_MODE_MASK (mode));

This could be a CONST_DOUBLE.  But you don't need d, because you can...

 +  /* Extracting the higher part of the 64-bit multiplier.  */
 +  x1 = gen_highpart (word_mode, op0);
 +  x1 = force_reg (word_mode, x1);
 +
 +  /* Extracting the lower part of the 64-bit multiplier.  */
 +  x0 = gen_lowpart (word_mode, op0);
 +  x0 = force_reg (word_mode, x0);
 +
 +  /* Splitting the 64-bit constant for the higher and the lower parts.  
 */
 +  y0 = gen_int_mode(d  UINT32_MAX, word_mode);
 +  y1 = gen_int_mode(d  32, word_mode);

... use gen_lowpart and gen_highpart directly on op1.

 +
 +  z2 = gen_reg_rtx (mode);
 +  u0 = gen_reg_rtx (mode);
 +
 +  /* Unsigned multiplication of the higher multiplier part
 +  and the higher constant part.  */
 +  z2 = expand_widening_mult (mode, x1, y1, z2, 1, umul_widen_optab);
 +  /* Unsigned multiplication of the lower multiplier part
 +  and the higher constant part.  */
 +  u0 = expand_widening_mult (mode, x0, y1, u0, 1, umul_widen_optab);
 +
 +  z0 = gen_reg_rtx (mode);
 +  u1 = gen_reg_rtx (mode);
 +
 +  /* Unsigned multiplication of the lower multiplier part
 +  and the lower constant part.  */
 +  z0 = expand_widening_mult (mode, x0, y0, z0, 1, umul_widen_optab);
 +
 +  /* Unsigned multiplication of the higher multiplier part
 +  the lower constant part.  */
 +  u1 = expand_widening_mult (mode, x1, y0, u1, 1, umul_widen_optab);

Up to here the comments are not necessary.

 +  /* Getting the higher part of multiplication between the lower 
 multiplier
 +  part and the lower constant part, the lower part is not interesting
 +  for the final result.  */
 +  u0tmp = gen_highpart (word_mode, z0);
 +  u0tmp = force_reg (word_mode, u0tmp);
 +  u0tmp = convert_to_mode (mode, u0tmp, 1);
 +
 +  /* Adding the higher part of multiplication between the lower 
 multiplier
 +  part and the lower constant part to the result of multiplication 
 between
 +  the lower multiplier part and the higher constant part. Please note,
 +  that we couldn't get overflow here since in the worst case
 +  (0x*0x)+0x we get 0xL.  */

The command can simply be compute the middle word of the three-word
intermediate result.  Also it's not overflow, it's carry.

 +  expand_inc (u0, u0tmp);
 +  tmp = gen_reg_rtx (mode);
 +
 +  /* Adding multiplication between the lower multiplier part and the 
 higher
 +  constant part with the higher part of multiplication between the lower
 +  multiplier part and the lower constant part to the result of 
 multiplication
 +  between the higher multiplier part and the lower constant part.  */

Here you have to explain:

   /* We have to return

z2 + ((u0 + u1)  GET_MODE_BITSIZE (word_mode)).

  u0 + u1 are the upper two words of the three-word
  intermediate result and they could 

Re: divide 64-bit by constant for 32-bit target machines

2012-05-26 Thread Paolo Bonzini
Il 25/05/2012 12:20, Dinar Temirbulatov ha scritto:
 +  emit_store_flag_force (c, GT, u0, tmp, mode, 1, 1);
 +  emit_store_flag_force (c1, GT, u1, tmp, mode, 1, 1);
 +  result = expand_binop (mode, ior_optab, c, c1, cres, 1, 
 OPTAB_LIB_WIDEN);
 +  if (!result)
 +   return 0;

Ah, you don't need the or.  u0  tmp is already giving the overflow.

Paolo


Re: divide 64-bit by constant for 32-bit target machines

2012-05-26 Thread Paolo Bonzini
Il 26/05/2012 14:35, Paolo Bonzini ha scritto:
/* We have to return
 
 z2 + ((u0 + u1)  GET_MODE_BITSIZE (word_mode)).
 
   u0 + u1 are the upper two words of the three-word
   intermediate result and they could have up to
   2 * GET_MODE_BITSIZE (word_mode) + 1 bits of precision.
   We compute the extra bit by checking for carry, and add
   1  GET_MODE_BITSIZE (word_mode) to z2 if there is carry.  */

Oops, GET_MODE_BITSIZE (word_mode) is more concisely BITS_PER_WORD.

  +  tmp = expand_binop (mode, add_optab, u0, u1, tmp, 1, 
  OPTAB_LIB_WIDEN);
  +  if (!tmp)
  + return 0;
  /* We have to return z2 + (tmp  32).  We need
  +  /* Checking for overflow.  */
 This is not overflow, it's carry (see above).
 
  +  c = gen_reg_rtx (mode);
  +  c1 = gen_reg_rtx (mode);
  +  cres = gen_reg_rtx (mode);
  +
  +  emit_store_flag_force (c, GT, u0, tmp, mode, 1, 1);
  +  emit_store_flag_force (c1, GT, u1, tmp, mode, 1, 1);
  +  result = expand_binop (mode, ior_optab, c, c1, cres, 1, 
  OPTAB_LIB_WIDEN);
  +  if (!result)
  +   return 0;
  +
  +  ccst = gen_reg_rtx (mode);
  +  ccst = expand_shift (LSHIFT_EXPR, mode, cres, 32, ccst, 1);
 This 32 should be GET_MODE_BITSIZE (word_mode).

Here, too.

Paolo



[C++ Patch] PR 25137

2012-05-26 Thread Paolo Carlini

Hi,

I found the time to return to this issue, where -Wmissing-braces is 
often overeager to warn, thus annoying, for example, people using -Wall 
together with std::array:


std::arrayint, 3 s = { 1, 2, 3 };

I handle the issue following the letter of the suggestion given by Ian 
at the time: do not warn when the class type has only one field and that 
field is an aggregate (recursively, of course). Indeed, that seems to me 
quite conservative. I also make sure to change nothing wrt the 
designated initializers extension.


Bootstrapped and tested x86_64-linux.

Thanks,
Paolo.


/cp
2012-05-26  Paolo Carlini  paolo.carl...@oracle.com

PR c++/25137
* decl.c (reshape_init_r): Add bool parameter.
(reshape_init_class): If the struct has only one field and that
field is an aggregate, don't warn if there is only one set of
braces in the initializer.
(reshape_init_array_1, reshape_init_class, reshape_init): Adjust.

/testsuite
2012-05-26  Paolo Carlini  paolo.carl...@oracle.com

PR c++/25137
* g++.dg/warn/Wbraces3.C: New.

Index: testsuite/g++.dg/warn/Wbraces3.C
===
--- testsuite/g++.dg/warn/Wbraces3.C(revision 0)
+++ testsuite/g++.dg/warn/Wbraces3.C(revision 0)
@@ -0,0 +1,34 @@
+// PR c++/25137
+// { dg-options -Wmissing-braces }
+
+struct S { int s[3]; };
+S s1 = { 1, 1, 1 };
+
+struct S1 { int s[3]; };
+struct S2 { struct S1 a; };
+S2 s21 = { 1, 1, 1 }; 
+
+struct S3 { int s[3]; };
+struct S4 { struct S3 a; int b; };
+S4 s41 = { 1, 1, 1, 1 };   // { dg-warning missing braces around initializer 
for 'S3' }
+
+struct S5 { int s[3]; };
+struct S6 { struct S5 a; int b; };
+S6 s61 = { { 1, 1, 1 }, 1 };
+
+struct S7 { int s[3]; };
+struct S8 { int a; struct S7 b; };
+S8 s81 = { 1, { 1, 1, 1 } };
+
+struct S9 { int s[2]; };
+struct S10 { struct S9 a; struct S9 b; };
+S10 s101 = { { 1, 1 }, 1, 1 }; // { dg-warning missing braces around 
initializer for 'S9' }
+
+struct S11 { int s[2]; };
+struct S12 { struct S11 a; struct S11 b; };
+S12 s121 = { { 1, 1 }, { 1, 1 } };
+
+struct S13 { int i; };
+struct S14 { struct S13 a; };
+struct S15 { struct S14 b; };
+S15 s151 = { 1 };
Index: cp/decl.c
===
--- cp/decl.c   (revision 187907)
+++ cp/decl.c   (working copy)
@@ -4954,7 +4954,7 @@ typedef struct reshape_iterator_t
   constructor_elt *end;
 } reshape_iter;
 
-static tree reshape_init_r (tree, reshape_iter *, bool, tsubst_flags_t);
+static tree reshape_init_r (tree, reshape_iter *, bool, bool, tsubst_flags_t);
 
 /* FIELD is a FIELD_DECL or NULL.  In the former case, the value
returned is the next FIELD_DECL (possibly FIELD itself) that can be
@@ -5014,7 +5014,7 @@ reshape_init_array_1 (tree elt_type, tree max_inde
 
   check_array_designated_initializer (d-cur, index);
   elt_init = reshape_init_r (elt_type, d, /*first_initializer_p=*/false,
-complain);
+/*no_warn_missing_braces=*/false, complain);
   if (elt_init == error_mark_node)
return error_mark_node;
   CONSTRUCTOR_APPEND_ELT (CONSTRUCTOR_ELTS (new_init),
@@ -5082,6 +5082,7 @@ reshape_init_class (tree type, reshape_iter *d, bo
 {
   tree field;
   tree new_init;
+  bool no_warn_missing_braces;
 
   gcc_assert (CLASS_TYPE_P (type));
 
@@ -5105,6 +5106,12 @@ reshape_init_class (tree type, reshape_iter *d, bo
   return new_init;
 }
 
+  /* If the struct has only one field and that field is an aggregate,
+ don't warn if there is only one set of braces in the initializer.  */
+  no_warn_missing_braces
+= (next_initializable_field (DECL_CHAIN (field)) == NULL_TREE
+CP_AGGREGATE_TYPE_P (TREE_TYPE (field)));
+
   /* Loop through the initializable fields, gathering initializers.  */
   while (d-cur != d-end)
 {
@@ -5125,7 +5132,10 @@ reshape_init_class (tree type, reshape_iter *d, bo
/* We already reshaped this.  */
gcc_assert (d-cur-index == field);
  else
-   field = lookup_field_1 (type, d-cur-index, /*want_type=*/false);
+   {
+ field = lookup_field_1 (type, d-cur-index, /*want_type=*/false);
+ no_warn_missing_braces = false;
+   }
 
  if (!field || TREE_CODE (field) != FIELD_DECL)
{
@@ -5141,7 +5151,8 @@ reshape_init_class (tree type, reshape_iter *d, bo
break;
 
   field_init = reshape_init_r (TREE_TYPE (field), d,
-  /*first_initializer_p=*/false, complain);
+  /*first_initializer_p=*/false,
+  no_warn_missing_braces, complain);
   if (field_init == error_mark_node)
return error_mark_node;
 
@@ -5183,11 +5194,12 @@ has_designator_problem (reshape_iter *d, tsubst_fl
a CONSTRUCTOR). TYPE is the type of 

[ARM Patch 1/n] PR53447: optimizations of 64bit ALU operation with constant

2012-05-26 Thread Carrot Wei
Hi,

As described in PR53447, many 64bit ALU operations with constant can be
optimized to use corresponding 32bit instructions with immediate operands.

This is the first part of the patches that deals with 64bit add. It directly
extends the patterns adddi3, arm_adddi3 and adddi3_neon to handle constant
operands.

Tested on arm qemu without regression.

OK for trunk?

thanks
Carrot

2012-05-26  Wei Guozhi  car...@google.com

PR target/53447
* gcc.target/arm/pr53447-1.c: New testcase.


2012-05-26  Wei Guozhi  car...@google.com

PR target/53447
* config/arm/arm-protos.h (const_ok_for_adddi): New prototype.
* config/arm/arm.c (const_ok_for_adddi): New function.
* config/arm/constraints.md (Dd): New constraint.
* config/arm/arm.md (adddi3): Extend it to handle constants.
(arm_adddi3): Likewise.
* config/arm/neon.md (adddi3_neon): Likewise.


Index: testsuite/gcc.target/arm/pr53447-1.c
===
--- testsuite/gcc.target/arm/pr53447-1.c(revision 0)
+++ testsuite/gcc.target/arm/pr53447-1.c(revision 0)
@@ -0,0 +1,8 @@
+/* { dg-options -O2 }  */
+/* { dg-require-effective-target arm32 } */
+/* { dg-final { scan-assembler-not mov } } */
+
+void t0p(long long * p)
+{
+  *p += 0x10001;
+}
Index: config/arm/arm.c
===
--- config/arm/arm.c(revision 187751)
+++ config/arm/arm.c(working copy)
@@ -2497,6 +2497,17 @@
 }
 }

+/* Return TRUE if int I is a valid immediate constant used by pattern
+   arm_adddi3.  */
+int
+const_ok_for_adddi (HOST_WIDE_INT i)
+{
+  HOST_WIDE_INT high = (i  32)  0x;
+  HOST_WIDE_INT low = i  0x;
+  return (const_ok_for_arm (high)
+  (const_ok_for_arm (low) || const_ok_for_arm (-low)));
+}
+
 /* Emit a sequence of insns to handle a large constant.
CODE is the code of the operation required, it can be any of SET, PLUS,
IOR, AND, XOR, MINUS;
Index: config/arm/arm-protos.h
===
--- config/arm/arm-protos.h (revision 187751)
+++ config/arm/arm-protos.h (working copy)
@@ -47,6 +47,7 @@
 extern bool arm_small_register_classes_for_mode_p (enum machine_mode);
 extern int arm_hard_regno_mode_ok (unsigned int, enum machine_mode);
 extern bool arm_modes_tieable_p (enum machine_mode, enum machine_mode);
+extern int const_ok_for_adddi (HOST_WIDE_INT);
 extern int const_ok_for_arm (HOST_WIDE_INT);
 extern int const_ok_for_op (HOST_WIDE_INT, enum rtx_code);
 extern int arm_split_constant (RTX_CODE, enum machine_mode, rtx,
Index: config/arm/neon.md
===
--- config/arm/neon.md  (revision 187751)
+++ config/arm/neon.md  (working copy)
@@ -588,9 +588,9 @@
 )

 (define_insn adddi3_neon
-  [(set (match_operand:DI 0 s_register_operand =w,?r,?r,?w)
-(plus:DI (match_operand:DI 1 s_register_operand %w,0,0,w)
- (match_operand:DI 2 s_register_operand w,r,0,w)))
+  [(set (match_operand:DI 0 s_register_operand =w,?r,?r,?w,?r,?r,?r)
+(plus:DI (match_operand:DI 1 s_register_operand %w,0,0,w,r,0,r)
+ (match_operand:DI 2 reg_or_int_operand w,r,0,w,r,Dd,Dd)))
(clobber (reg:CC CC_REGNUM))]
   TARGET_NEON
 {
@@ -600,13 +600,16 @@
 case 3: return vadd.i64\t%P0, %P1, %P2;
 case 1: return #;
 case 2: return #;
+case 4: return #;
+case 5: return #;
+case 6: return #;
 default: gcc_unreachable ();
 }
 }
-  [(set_attr neon_type neon_int_1,*,*,neon_int_1)
-   (set_attr conds *,clob,clob,*)
-   (set_attr length *,8,8,*)
-   (set_attr arch nota8,*,*,onlya8)]
+  [(set_attr neon_type neon_int_1,*,*,neon_int_1,*,*,*)
+   (set_attr conds *,clob,clob,*,clob,clob,clob)
+   (set_attr length *,8,8,*,*,*,*)
+   (set_attr arch nota8,*,*,onlya8,*,*,*)]
 )

 (define_insn *submode3_neon
Index: config/arm/constraints.md
===
--- config/arm/constraints.md   (revision 187751)
+++ config/arm/constraints.md   (working copy)
@@ -29,7 +29,7 @@
 ;; in Thumb-1 state: I, J, K, L, M, N, O

 ;; The following multi-letter normal constraints have been used:
-;; in ARM/Thumb-2 state: Da, Db, Dc, Dn, Dl, DL, Dv, Dy, Di, Dt, Dz
+;; in ARM/Thumb-2 state: Da, Db, Dc, Dd, Dn, Dl, DL, Dv, Dy, Di, Dt, Dz
 ;; in Thumb-1 state: Pa, Pb, Pc, Pd, Pe
 ;; in Thumb-2 state: Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py

@@ -251,6 +251,12 @@
   (match_test TARGET_32BIT  arm_const_double_inline_cost (op) == 4
!(optimize_size || arm_ld_sched

+(define_constraint Dd
+ @internal
+ In ARM/Thumb-2 state a const_int that can be used by insn adddi.
+ (and (match_code const_int)
+  (match_test TARGET_32BIT  const_ok_for_adddi (ival
+
 (define_constraint Di
  @internal
   In ARM/Thumb-2 state a const_int or const_double where both the high

Re: [C++ Patch] PR 53491

2012-05-26 Thread Jason Merrill
I think I would rather fix stabilize_expr to handle void arguments 
properly: basically just stick the whole argument in *initp and return 
void_zero_node.


Jason


Re: RFC: IRA patch to reduce lifetimes

2012-05-26 Thread H.J. Lu
On Mon, May 21, 2012 at 9:33 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Wed, Apr 11, 2012 at 7:35 AM, Bernd Schmidt ber...@codesourcery.com 
 wrote:
 On 12/23/2011 05:31 PM, Vladimir Makarov wrote:
 On 12/21/2011 09:09 AM, Bernd Schmidt wrote:
 This patch was an experiment to see if we can get the same improvement
 with modifications to IRA, making it more tolerant to over-aggressive
 scheduling. THe idea is that if an instruction sets a register A, and
 all its inputs are live and unmodified for the lifetime of A, then
 moving the instruction downwards towards its first use is going to be
 beneficial from a register pressure point of view.

 That alone, however, turns out to be too aggressive, performance drops
 presumably because we undo too many scheduling decisions. So, the patch
 detects such situations, and splits the pseudo; a new pseudo is
 introduced in the original setting instruction, and a copy is added
 before the first use. If the new pseudo does not get a hard register, it
 is removed again and instead the setting instruction is moved to the
 point of the copy.

 This gets up to 6.5% on 456.hmmer on the mips target I was working on;
 an embedded benchmark suite also seems to have a (small) geomean
 improvement. On x86_64, I've tested spec2k, where specint is unchanged
 and specfp has a tiny performance regression. All these tests were done
 with a gcc-4.6 based tree.

 Thoughts? Currently the patch feels somewhat bolted on to the side of
 IRA, maybe there's a nicer way to achieve this?

 I think that is an excellent idea.  I used analogous approach for
 splitting pseudo in IRA on loop bounds even if it gets hard register
 inside and outside loops.  The copies are removed if the live ranges
 were not spilled in reload.

 I have no problem with this patch.  It is just a small change in IRA.

 Sounds like you're happier with the patch than I am, so who am I to argue.

 Here's an updated version against current trunk, with some cc0 bugfixes
 that I've since discovered to be necessary. Bootstrapped and tested (but
 not benchmarked again) on i686-linux. Ok?


 Bernd

 This caused:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53411


This also caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53495


-- 
H.J.


Re: [C++ Patch] PR 53491

2012-05-26 Thread Paolo Carlini

On 05/26/2012 04:21 PM, Jason Merrill wrote:
I think I would rather fix stabilize_expr to handle void arguments 
properly: basically just stick the whole argument in *initp and return 
void_zero_node.

Ok. Like this it works, if I understand your suggestion.

Thanks,
Paolo.


/cp
2012-05-26  Paolo Carlini  paolo.carl...@oracle.com

PR c++/53491
* tree.c (stabilize_expr): Handle exp of void type.

/testsuite
2012-05-26  Paolo Carlini  paolo.carl...@oracle.com

PR c++/53491
* g++.dg/parse/crash60.C: New.

Index: testsuite/g++.dg/parse/crash60.C
===
--- testsuite/g++.dg/parse/crash60.C(revision 0)
+++ testsuite/g++.dg/parse/crash60.C(revision 0)
@@ -0,0 +1,14 @@
+// PR c++/53491
+
+struct M
+{
+  void pop();
+};
+
+void foo()
+{
+  int result = 0;
+  M m;
+
+  result += m.pop();  // { dg-error invalid operands|in evaluation }
+}
Index: cp/tree.c
===
--- cp/tree.c   (revision 187915)
+++ cp/tree.c   (working copy)
@@ -3279,7 +3279,12 @@ stabilize_expr (tree exp, tree* initp)
 {
   tree init_expr;
 
-  if (!TREE_SIDE_EFFECTS (exp))
+  if (VOID_TYPE_P (TREE_TYPE (exp)))
+{
+  *initp = exp;
+  return void_zero_node;
+}
+  else if (!TREE_SIDE_EFFECTS (exp))
 init_expr = NULL_TREE;
   /* There are no expressions with REFERENCE_TYPE, but there can be call
  arguments with such a type; just treat it as a pointer.  */


Re: [C++ Patch] PR 53491

2012-05-26 Thread Jason Merrill

On 05/26/2012 11:31 AM, Paolo Carlini wrote:

Ok. Like this it works, if I understand your suggestion.


Yep, that's what I had in mind. But let's put it after the 
!TREE_SIDE_EFFECTS case.  OK with that change.


Jason


Re: RFC (c): PATCH for c++/53220 (array compound literals and C++)

2012-05-26 Thread Joseph S. Myers
On Sat, 26 May 2012, Jason Merrill wrote:

 In C++, C99 a compound literal creates a temporary object, unlike C, where it
 creates an automatic or static object.  As a result, using an array compound
 literal to initialize a pointer variable leads to undefined behavior, as the
 array's lifetime ends after the declaration statement.  This patch changes the
 C++ front end to reject decay of temporary array compound literals to
 pointers, and the C front end to warn about it with -Wc++-compat.
 
 Tested x86_64-pc-linux-gnu.  Is the C front-end change OK?

The C front-end change is OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


[Ada] Fix some typos in comments

2012-05-26 Thread oliver.kell...@t-online.de
Hello,

I happened to notice these typos but I don't have commit rights.
Thanks to whoever might pick it up.

-- Oliver

2012-05-26  Oliver Kellogg  okell...@users.sourceforge.net

	* alloc.ads, exp_dbug.adb, gcc-interface/misc.c, lib.ads, lib-xref.adb,
	lib-xref.ads, sem_ch10.adb, sem_ch12.adb, sem_res.ads, table.ads:
	Fix typo.

Index: lib.ads
===
--- lib.ads	(revision 187915)
+++ lib.ads	(working copy)
@@ -66,7 +66,7 @@
 
--(a) Corresponding spec for a body
--(b) Parent spec of a child library spec
-   --(d) With'ed specs
+   --(c) With'ed specs
--(d) Parent body of a subunit
--(e) Subunits corresponding to any specified stubs
--(f) Bodies of inlined subprograms that are called
@@ -227,7 +227,7 @@
--  A subprogram body (in an adb file) may stand for both a spec and a body.
--  A simple model (and one that was adopted through version 2.07) is simply
--  to assume that such an adb file acts as its own spec if no ads file is
-   --  is present.
+   --  present.
 
--  However, this is not correct. RM 10.1.4(4) requires that such a body
--  act as a spec unless a subprogram declaration of the same name is
Index: table.ads
===
--- table.ads	(revision 187915)
+++ table.ads	(working copy)
@@ -144,7 +144,7 @@
 
   function Last return Table_Index_Type;
   pragma Inline (Last);
-  --  Returns the current value of the last used entry in the table, which
+  --  Returns the current index of the last used entry in the table, which
   --  can then be used as a subscript for Table. Note that the only way to
   --  modify Last is to call the Set_Last procedure. Last must always be
   --  used to determine the logically last entry.
Index: sem_ch10.adb
===
--- sem_ch10.adb	(revision 187915)
+++ sem_ch10.adb	(working copy)
@@ -767,7 +767,7 @@
 
 --  If the subprogram body is a child unit, we must create a
 --  declaration for it, in order to properly load the parent(s).
---  After this, the original unit does not acts as a spec, because
+--  After this, the original unit does not act as a spec, because
 --  there is an explicit one. If this unit appears in a context
 --  clause, then an implicit with on the parent will be added when
 --  installing the context. If this is the main unit, there is no
Index: sem_ch12.adb
===
--- sem_ch12.adb	(revision 187915)
+++ sem_ch12.adb	(working copy)
@@ -6612,7 +6612,7 @@
  --  If we are not instantiating, then this is where we load and
  --  analyze subunits, i.e. at the point where the stub occurs. A
  --  more permissive system might defer this analysis to the point
- --  of instantiation, but this seems to complicated for now.
+ --  of instantiation, but this seems too complicated for now.
 
  if not Instantiating then
 declare
Index: alloc.ads
===
--- alloc.ads	(revision 187915)
+++ alloc.ads	(working copy)
@@ -30,7 +30,7 @@
 --
 
 --  This package contains definitions for initial sizes and growth increments
---  for the various dynamic arrays used for principle compiler data strcutures.
+--  for the various dynamic arrays used for principal compiler data strcutures.
 --  The indicated initial size is allocated for the start of each file, and
 --  the increment factor is a percentage used to increase the table size when
 --  it needs expanding (e.g. a value of 100 = 100% increase = double)
Index: gcc-interface/misc.c
===
--- gcc-interface/misc.c	(revision 187915)
+++ gcc-interface/misc.c	(working copy)
@@ -348,7 +348,7 @@
 void
 gnat_init_gcc_eh (void)
 {
-  /* We shouldn't do anything if the No_Exceptions_Handler pragma is set,
+  /* We shouldn't do anything if the No_Exception_Handlers pragma is set,
  though. This could for instance lead to the emission of tables with
  references to symbols (such as the Ada eh personality routine) within
  libraries we won't link against.  */
Index: sem_res.ads
===
--- sem_res.ads	(revision 187915)
+++ sem_res.ads	(working copy)
@@ -45,7 +45,7 @@
--  Since in practice a lot of semantic analysis has to be postponed until
--  types are known (e.g. static folding, setting of suppress flags), the
--  Resolve routines also complete the semantic analysis, and call the
-   --  expander for possibly expansion of the completely type resolved node.
+   --  expander for possibly 

Re: User directed Function Multiversioning via Function Overloading (issue5752064)

2012-05-26 Thread Sriraman Tallam
On Fri, May 25, 2012 at 10:05 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, May 25, 2012 at 8:38 PM, Sriraman Tallam tmsri...@google.com wrote:

 On May 25, 2012 7:15 PM, H.J. Lu hjl.to...@gmail.com wrote:


 On May 25, 2012 6:54 PM, Sriraman Tallam tmsri...@google.com wrote:
 
 
  
   On Fri, May 25, 2012 at 5:0   BTW, I noticed:

  
   [hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep __cpu_model
      20: 0010    16 OBJECT  GLOBAL HIDDEN   COM __cpu_model
   [hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so | grep __cpu_model
      82: 00214ff0    16 OBJECT  GLOBAL DEFAULT   24
   __cpu_model@@GCC_4.8.0
     310: 00214ff0    16 OBJECT  GLOBAL DEFAULT   24 __cpu_model
   [hjl@gnu-6 pr14170]$
  
   Why is __cpu_model in both libgcc.a and libgcc_s.o?
 
  How do I disallow this in libgcc_s.so? Looks like t-cpuinfo file is
  wrong but I cannot figure out the fix.
 
 Why don't you want it in libgcc_s.so?

 I thought libgcc.a is always linked in for static and dynamic builds. So
 having it in libgcc_s.so is redundant.


 [hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep _cpu_
    20: 0010    16 OBJECT  GLOBAL HIDDEN   COM __cpu_model
    21: 0110   612 FUNC    GLOBAL HIDDEN     4 __cpu_indicator_init
 [hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so.1 | grep _cpu_
    82: 00214ff0    16 OBJECT  GLOBAL DEFAULT   24
 __cpu_model@@GCC_4.8.0
   223: 2b60   560 FUNC    LOCAL  DEFAULT   11 __cpu_indicator_init
   310: 00214ff0    16 OBJECT  GLOBAL DEFAULT   24 __cpu_model
 [hjl@gnu-6 pr14170]$

 I think there should be only one copy of __cpu_model in the process.
 It should be in libgcc_s.so. Why isn't  __cpu_indicator_init exported
 from libgcc_s.so?

Ok, I am elaborating so that I understand the issue clearly.

The dynamic symbol table of libgcc_s.so:

$ objdump -T libgcc_s.so | grep __cpu

00015fd0 gDO .bss   0010  GCC_4.8.0   __cpu_model

It only has __cpu_model, not __cpu_indicator_init just like you
pointed out. I will fix this by adding a versioned symbol of
__cpu_indicator_init to the *.ver files.

Do you see any other issues here? I dont get the duplicate entries
part you are referring to. The static symbol table also contains
references to __cpu_model and __cpu_indicator_init, but that is
expected right?

In libgcc.a:

readelf -sWt 
/g/tmsriram/GCC_trunk_svn_mv_fe_at_nfs/native_builds/bld1/install/lib/gcc/x86_64-unknown-linux-gnu/libgcc.a
| grep __cpu

   20: 001016 OBJECT  GLOBAL HIDDEN  COM __cpu_model
21: 0110   612 FUNCGLOBAL HIDDEN4 __cpu_indicator_init

libgcc.a has __cpu_model and __cpu_indicator_init as GLOBAL syms with
HIDDEN visibility. Is this an issue? Is this not needed for static
linking?

Further thoughts:

* It looks like libgcc.a is always linked for both static and dynamic
links. It occurred to me when you brought this up. So, I thought why
not exclude the symbols from libgcc_s.so! Is there any problem here?

Example:

file:test.c

int main ()
{
  return (int) __builtin_cpu_is (corei7);
}

Case I : Use gcc to build dynamic

$ gcc test.c -Wl,-y,__cpu_model

libgcc.a(cpuinfo.o): reference to __cpu_model
libgcc_s.so: definition of __cpu_model

Case II: Use g++ to build dynamic

$ g++ test.c -Wl,-y,__cpu_model
fe1.o: reference to __cpu_model
libgcc_s.so: definition of __cpu_model

Case III: Use gcc to link static

$ gcc test.c -Wl,-y,__cpu_model -static
fe1.o: reference to __cpu_model
libgcc.a(cpuinfo.o): reference to __cpu_model


Please note that in all 3 cases, libgcc.a was linked in. Hence,
removing these symbols from the dynamic symbol table of libgcc_s.so
should have no issues.

Thanks,
-Sri.








 --
 H.J.


[C++ Patch] PR 25137 (no -Wmissing-braces in -Wall version)

2012-05-26 Thread Paolo Carlini
... and this is the version of the patch which simply takes 
-Wmissing-braces out of -Wall in C++. Bootstrapped and tested all 
C-family languages x86-64-linux.


Paolo.

///
Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 187916)
+++ doc/invoke.texi (working copy)
@@ -3090,7 +3090,7 @@ Options} and @ref{Objective-C and Objective-C++ Di
 -Wformat   @gol
 -Wmain @r{(only for C/ObjC and unless} @option{-ffreestanding}@r{)}  @gol
 -Wmaybe-uninitialized @gol
--Wmissing-braces  @gol
+-Wmissing-braces @r{(only for C/ObjC)} @gol
 -Wnonnull  @gol
 -Wparentheses  @gol
 -Wpointer-sign  @gol
@@ -3401,7 +3401,8 @@ or @option{-Wpedantic}.
 @opindex Wno-missing-braces
 Warn if an aggregate or union initializer is not fully bracketed.  In
 the following example, the initializer for @samp{a} is not fully
-bracketed, but that for @samp{b} is fully bracketed.
+bracketed, but that for @samp{b} is fully bracketed.  This warning is
+enabled by @option{-Wall} in C.
 
 @smallexample
 int a[2][2] = @{ 0, 1, 2, 3 @};
Index: c-family/c-opts.c
===
--- c-family/c-opts.c   (revision 187916)
+++ c-family/c-opts.c   (working copy)
@@ -370,7 +370,6 @@ c_common_handle_option (size_t scode, const char *
   c_family_lang_mask, kind, loc,
   handlers, global_dc);
   warn_char_subscripts = value;
-  warn_missing_braces = value;
   warn_parentheses = value;
   warn_return_type = value;
   warn_sequence_point = value; /* Was C only.  */
@@ -402,6 +401,8 @@ c_common_handle_option (size_t scode, const char *
 done in c_common_post_options.  */
   if (warn_enum_compare == -1)
 warn_enum_compare = value;
+
+ warn_missing_braces = value;
}
   else
{
Index: testsuite/g++.dg/warn/Wbraces4.C
===
--- testsuite/g++.dg/warn/Wbraces4.C(revision 0)
+++ testsuite/g++.dg/warn/Wbraces4.C(revision 0)
@@ -0,0 +1,34 @@
+// PR c++/25137
+// { dg-options -Wmissing-braces }
+
+struct S { int s[3]; };
+S s1 = { 1, 1, 1 }; // { dg-warning missing braces }
+
+struct S1 { int s[3]; };
+struct S2 { struct S1 a; };
+S2 s21 = { 1, 1, 1 };   // { dg-warning missing braces }
+
+struct S3 { int s[3]; };
+struct S4 { struct S3 a; int b; };
+S4 s41 = { 1, 1, 1, 1 };// { dg-warning missing braces }
+
+struct S5 { int s[3]; };
+struct S6 { struct S5 a; int b; };
+S6 s61 = { { 1, 1, 1 }, 1 };// { dg-warning missing braces }
+
+struct S7 { int s[3]; };
+struct S8 { int a; struct S7 b; };
+S8 s81 = { 1, { 1, 1, 1 } };// { dg-warning missing braces }
+
+struct S9 { int s[2]; };
+struct S10 { struct S9 a; struct S9 b; };
+S10 s101 = { { 1, 1 }, 1, 1 };  // { dg-warning missing braces }
+
+struct S11 { int s[2]; };
+struct S12 { struct S11 a; struct S11 b; };
+S12 s121 = { { 1, 1 }, { 1, 1 } };  // { dg-warning missing braces }
+
+struct S13 { int i; };
+struct S14 { struct S13 a; };
+struct S15 { struct S14 b; };
+S15 s151 = { 1 };   // { dg-warning missing braces }
Index: testsuite/g++.dg/warn/Wbraces3.C
===
--- testsuite/g++.dg/warn/Wbraces3.C(revision 0)
+++ testsuite/g++.dg/warn/Wbraces3.C(revision 0)
@@ -0,0 +1,34 @@
+// PR c++/25137
+// { dg-options -Wall }
+
+struct S { int s[3]; };
+S s1 = { 1, 1, 1 };
+
+struct S1 { int s[3]; };
+struct S2 { struct S1 a; };
+S2 s21 = { 1, 1, 1 }; 
+
+struct S3 { int s[3]; };
+struct S4 { struct S3 a; int b; };
+S4 s41 = { 1, 1, 1, 1 };
+
+struct S5 { int s[3]; };
+struct S6 { struct S5 a; int b; };
+S6 s61 = { { 1, 1, 1 }, 1 };
+
+struct S7 { int s[3]; };
+struct S8 { int a; struct S7 b; };
+S8 s81 = { 1, { 1, 1, 1 } };
+
+struct S9 { int s[2]; };
+struct S10 { struct S9 a; struct S9 b; };
+S10 s101 = { { 1, 1 }, 1, 1 };
+
+struct S11 { int s[2]; };
+struct S12 { struct S11 a; struct S11 b; };
+S12 s121 = { { 1, 1 }, { 1, 1 } };
+
+struct S13 { int i; };
+struct S14 { struct S13 a; };
+struct S15 { struct S14 b; };
+S15 s151 = { 1 };


Re: User directed Function Multiversioning via Function Overloading (issue5752064)

2012-05-26 Thread H.J. Lu
On Sat, May 26, 2012 at 3:34 PM, Sriraman Tallam tmsri...@google.com wrote:
 On Fri, May 25, 2012 at 10:05 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, May 25, 2012 at 8:38 PM, Sriraman Tallam tmsri...@google.com wrote:

 On May 25, 2012 7:15 PM, H.J. Lu hjl.to...@gmail.com wrote:


 On May 25, 2012 6:54 PM, Sriraman Tallam tmsri...@google.com wrote:
 
 
  
   On Fri, May 25, 2012 at 5:0   BTW, I noticed:

  
   [hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep __cpu_model
      20: 0010    16 OBJECT  GLOBAL HIDDEN   COM __cpu_model
   [hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so | grep __cpu_model
      82: 00214ff0    16 OBJECT  GLOBAL DEFAULT   24
   __cpu_model@@GCC_4.8.0
     310: 00214ff0    16 OBJECT  GLOBAL DEFAULT   24 __cpu_model
   [hjl@gnu-6 pr14170]$
  
   Why is __cpu_model in both libgcc.a and libgcc_s.o?
 
  How do I disallow this in libgcc_s.so? Looks like t-cpuinfo file is
  wrong but I cannot figure out the fix.
 
 Why don't you want it in libgcc_s.so?

 I thought libgcc.a is always linked in for static and dynamic builds. So
 having it in libgcc_s.so is redundant.


 [hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep _cpu_
    20: 0010    16 OBJECT  GLOBAL HIDDEN   COM __cpu_model
    21: 0110   612 FUNC    GLOBAL HIDDEN     4 
 __cpu_indicator_init
 [hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so.1 | grep _cpu_
    82: 00214ff0    16 OBJECT  GLOBAL DEFAULT   24
 __cpu_model@@GCC_4.8.0
   223: 2b60   560 FUNC    LOCAL  DEFAULT   11 
 __cpu_indicator_init
   310: 00214ff0    16 OBJECT  GLOBAL DEFAULT   24 __cpu_model
 [hjl@gnu-6 pr14170]$

 I think there should be only one copy of __cpu_model in the process.
 It should be in libgcc_s.so. Why isn't  __cpu_indicator_init exported
 from libgcc_s.so?

 Ok, I am elaborating so that I understand the issue clearly.

 The dynamic symbol table of libgcc_s.so:

 $ objdump -T libgcc_s.so | grep __cpu

 00015fd0 g    DO .bss   0010  GCC_4.8.0   __cpu_model

 It only has __cpu_model, not __cpu_indicator_init just like you
 pointed out. I will fix this by adding a versioned symbol of
 __cpu_indicator_init to the *.ver files.

That will be great.

 Do you see any other issues here? I dont get the duplicate entries
 part you are referring to. The static symbol table also contains
 references to __cpu_model and __cpu_indicator_init, but that is
 expected right?

Duplication comes from static and dynamic symbol tables.

 In libgcc.a:

 readelf -sWt 
 /g/tmsriram/GCC_trunk_svn_mv_fe_at_nfs/native_builds/bld1/install/lib/gcc/x86_64-unknown-linux-gnu/libgcc.a
 | grep __cpu

   20: 0010    16 OBJECT  GLOBAL HIDDEN  COM __cpu_model
    21: 0110   612 FUNC    GLOBAL HIDDEN    4 __cpu_indicator_init

 libgcc.a has __cpu_model and __cpu_indicator_init as GLOBAL syms with
 HIDDEN visibility. Is this an issue? Is this not needed for static
 linking?

 Further thoughts:

 * It looks like libgcc.a is always linked for both static and dynamic
 links. It occurred to me when you brought this up. So, I thought why
 not exclude the symbols from libgcc_s.so! Is there any problem here?


You don't want one copy of those 2 symbols in each DSO where
they are used.

-- 
H.J.


Re: User directed Function Multiversioning via Function Overloading (issue5752064)

2012-05-26 Thread Sriraman Tallam
On Sat, May 26, 2012 at 4:56 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Sat, May 26, 2012 at 3:34 PM, Sriraman Tallam tmsri...@google.com wrote:
 On Fri, May 25, 2012 at 10:05 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, May 25, 2012 at 8:38 PM, Sriraman Tallam tmsri...@google.com 
 wrote:

 On May 25, 2012 7:15 PM, H.J. Lu hjl.to...@gmail.com wrote:


 On May 25, 2012 6:54 PM, Sriraman Tallam tmsri...@google.com wrote:
 
 
  
   On Fri, May 25, 2012 at 5:0   BTW, I noticed:

  
   [hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep __cpu_model
      20: 0010    16 OBJECT  GLOBAL HIDDEN   COM __cpu_model
   [hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so | grep __cpu_model
      82: 00214ff0    16 OBJECT  GLOBAL DEFAULT   24
   __cpu_model@@GCC_4.8.0
     310: 00214ff0    16 OBJECT  GLOBAL DEFAULT   24 __cpu_model
   [hjl@gnu-6 pr14170]$
  
   Why is __cpu_model in both libgcc.a and libgcc_s.o?
 
  How do I disallow this in libgcc_s.so? Looks like t-cpuinfo file is
  wrong but I cannot figure out the fix.
 
 Why don't you want it in libgcc_s.so?

 I thought libgcc.a is always linked in for static and dynamic builds. So
 having it in libgcc_s.so is redundant.


 [hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep _cpu_
    20: 0010    16 OBJECT  GLOBAL HIDDEN   COM __cpu_model
    21: 0110   612 FUNC    GLOBAL HIDDEN     4 
 __cpu_indicator_init
 [hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so.1 | grep _cpu_
    82: 00214ff0    16 OBJECT  GLOBAL DEFAULT   24
 __cpu_model@@GCC_4.8.0
   223: 2b60   560 FUNC    LOCAL  DEFAULT   11 
 __cpu_indicator_init
   310: 00214ff0    16 OBJECT  GLOBAL DEFAULT   24 __cpu_model
 [hjl@gnu-6 pr14170]$

 I think there should be only one copy of __cpu_model in the process.
 It should be in libgcc_s.so. Why isn't  __cpu_indicator_init exported
 from libgcc_s.so?

 Ok, I am elaborating so that I understand the issue clearly.

 The dynamic symbol table of libgcc_s.so:

 $ objdump -T libgcc_s.so | grep __cpu

 00015fd0 g    DO .bss   0010  GCC_4.8.0   __cpu_model

 It only has __cpu_model, not __cpu_indicator_init just like you
 pointed out. I will fix this by adding a versioned symbol of
 __cpu_indicator_init to the *.ver files.

 That will be great.

 Do you see any other issues here? I dont get the duplicate entries
 part you are referring to. The static symbol table also contains
 references to __cpu_model and __cpu_indicator_init, but that is
 expected right?

 Duplication comes from static and dynamic symbol tables.

 In libgcc.a:

 readelf -sWt 
 /g/tmsriram/GCC_trunk_svn_mv_fe_at_nfs/native_builds/bld1/install/lib/gcc/x86_64-unknown-linux-gnu/libgcc.a
 | grep __cpu

   20: 0010    16 OBJECT  GLOBAL HIDDEN  COM __cpu_model
    21: 0110   612 FUNC    GLOBAL HIDDEN    4 __cpu_indicator_init

 libgcc.a has __cpu_model and __cpu_indicator_init as GLOBAL syms with
 HIDDEN visibility. Is this an issue? Is this not needed for static
 linking?

 Further thoughts:

 * It looks like libgcc.a is always linked for both static and dynamic
 links. It occurred to me when you brought this up. So, I thought why
 not exclude the symbols from libgcc_s.so! Is there any problem here?


 You don't want one copy of those 2 symbols in each DSO where
 they are used.

Right, I agree. But this problem exists right now even if libgcc_s.so
is provided with these symbols. Please see example below:

Example:

dso.c
---

int some_func ()
{
   return (int) __builtin_cpu_is (corei7);
}

Build with gcc driver:
$ gcc dso.c -fPIC -shared -o dso.so
$ nm dso.so | grep __cpu
0780 t __cpu_indicator_init
1e00 b __cpu_model

This DSO is getting its own local copy of __cpu_model. This is fine
functionally but this is not the behaviour you have in mind.

whereas, if I build with g++ driver:

$ g++ dso.c -fPIC -shared dso.so
$ nm dso.so | grep __cpu
 U __cpu_model

This is as we would like, __cpu_model is undefined.

The difference is that with the gcc driver, the link line is -lgcc
-lgcc_s, whereas with the g++ driver -lgcc is not even present!

Should I fix the gcc driver instead? This double-standard is not clear to me.

Thanks,
-Sri.











 --
 H.J.


RFA: temp slot TLC [1/3]

2012-05-26 Thread Michael Matz
Hi,

I still had some cleanups for age-old code lying around, and thought to 
bring it up to date.  The whole dealing of slots for temporary stack space 
in function.c was never really updated to the way we're meanwhile 
expanding statements.  It has facilities that aren't useful anymore.  In 
the old days it worked like this: after a statement was expanded temporary 
slots that were allocated during that expansion would be freed.  The 
problem was that statements could be arbitrarily complex, including calls 
as call arguments or statement expressions.  So there had to be a way that 
some of these slots could be marked as to be kept until the next scope 
would be popped (the 'keep' flag to assign_temp), or at least that the 
slot might be used in the containing expression even after a 
free_temp_slots (the addr_taken flag for a slot).

Meanwhile the situation is like so:

1) There are only three places that still allocate a 'kept' slot (i.e. 
   keep==1 in a call to assign_temp, assign_stack_temp and 
   assign_stack_temp_for_type):

a) expand_expr_real_1, when expanding a reference from a constant object, 
   which must be in memory but isn't;
   this is meanwhile useless, the so allocated temporary will not be
   freed until the full statement is done, at which point the expanded 
   reference will either be consumed into the real objects, or will be 
   ignored; it's not necessary to artificially mark it kept
b) expand_asm_operands, for handling mem inputs on non-lvalues;
   this is dead code, we rewrite all such asms at gimple level already.
c) in expand_decl;
   this routine I'm going to remove (with removal of the reference from
   Ada it's all dead code)

2) The single place that still sometimes marks temp slots as addr_taken is 
   in expand_call.  It does so when the callee returns an aggregate and no 
   target is given (or no return slot optimization should happen).  The 
   temp slot will not go away until the next free_temp_slots at which 
   point it must already be copied out into the real LHS of the (gimple) 
   call statement or it'll be ignored, so marking it to be kept 
   isn't necessary.

expand_decl is dead code because it only does things for CONST_DECLs (for 
Ada, but I'm removing that use) and automatic vars.  But automatic vars 
are (supposed to be) allocated since quite some time in cfgexpand.  One 
call to expand_decl that still did something is for expansion of 
COMPOUND_LITERAL_EXPR, which we can have only in global ctors (usual 
problem of not gimplifying those).  That's easy to deal with, though, as 
in the patch.  The other expand_decl call that does something is for 
generating RTL for the nonlocal_goto_save_area in case it doesn't have RTL 
yet.  The latter shouldn't (and after the patch doesn't) happen: the 
underlying variable is supposed to be in local_decls and hence should have 
gotten RTL during cfgexpand.  This is only not true when it was unused, 
removed from local_decls, but the reference from cfun remains.  In that 
case we'd expand (unused) stack space.  Instead I simply remove the 
reference from cfun when the variable is removed.

As the data flow regarding temporaries in the old RTL code is a bit 
complicated I reassured myself regarding the above points with various 
gcc_unreachable and commenting code.  That is this first patch.  The 
second and third patch contain the more or less obvious cleanup followups.

I'm asking for approval for all three patches, but I have bootstrapped the 
first separately, so if approved I'm going to commit them as separate 
revisions so that in case I got something wrong with this touchy area it's 
easy to play with adjusting just parts of the patch, without always having 
to redo all the cleanup adjustments.

So, regstrapped on x86_64-linux all languages (+Ada,+obj-c++), this patch 
alone and in connection with [2/3] and [3/3].  Okay for trunk (I have 
approval for the Ada part already).


Ciao,
Michael.
---
* expr.c (expand_expr_real_1 normal_inner_ref): Don't allocate
a kept temp.
(expand_expr_real_1 COMPOUND_LITERAL_EXPR): Make unreachable.
* gimple-fold.c (canonicalize_constructor_val): Canonicalize 
COMPOUND_LITERAL_EXPR.
* function.c (expand_function_start): Don't call expand_decl,
instead assert that we have RTL assigned.
* tree-ssa-live.c (remove_unused_locals): Clear
nonlocal_goto_save_area if its backing variable is removed.
* stmt.c (expand_asm_operands): Remove handling of non-lvalues
as mem inputs.
(expand_decl): Assert that this does nothing.
* calls.c (expand_call): Don't call mark_temp_addr_taken.

* c-tree.h (c_expand_decl): Remove prototype.

c-family/
* c-common.h (c_expand_decl): Remove prototype.

ada/
* gcc-interface/utils.c (create_var_decl_1): Don't call expand_decl.

Index: expr.c
===
--- 

RFA: temp slot TLC [2/3]

2012-05-26 Thread Michael Matz
Hi,

and this is the large cleanup, removing dead code introduced by [1/3] for 
real, and removing the 'keep' parameter from assign_temp and friends, 
which now is always zero (in the compiler proper and in the backends).
That latter is the largest churn.

If [1/3] is approved this whole patch is large but obvious.

Regstrapped together with [1/3] on x86_64-linux, all languages 
(+Ada,+obj-c++).


Ciao,
Michael.

* rtl.h (assign_stack_temp, assign_stack_temp_for_type,
assign_temp): Remove 'keep' argument.
(mark_temp_addr_taken): Remove prototype.
* tree.h (expand_decl): Remove prototype.
* function.c (struct temp_slot): Remove addr_taken and keep
member.
(assign_stack_temp_for_type) Don't initialize above, remove
keep argument.
(assign_stack_temp, assign_temp): Remove keep argument.
(mark_temp_addr_taken): Remove.
(preserve_temp_slots): Remove handling of addr_taken and keep
members.
(free_temp_slots): Ditto.
* expr.c (expand_expr_real_1 COMPOUND_LITERAL_EXPR): Remove
handling.
* stmt.c (expand_asm_operands): Remove handling of non-lvalues
as mem inputs.
(expand_decl): Remove.
* c-decl.c (finish_struct): Don't call expand_decl.
* builtins.c (expand_builtin_cexpi): Adjust calls to assign_temp
and assign_stack_temp.
* calls.c (save_fixed_argument_area, initialize_argument_information,
expand_call, emit_library_call_value_1, store_one_arg): Ditto.
* expmed.c (extract_bit_field_1): Ditto.
* expr.c (emit_group_load_1, emit_group_store,
copy_blkmode_from_reg, emit_push_insn, expand_assignment,
store_field, expand_constructor, expand_cond_expr_using_cmove,
expand_expr_real_2, expand_expr_real_1): Ditto.
* stmt.c (expand_asm_operands, expand_return): Ditto.

* config/arm/arm.c (neon_expand_vector_init): Ditto.
* config/i386/i386.c (ix86_expand_vector_set): Ditto.
(ix86_expand_vector_extract): Ditto.
* config/ia64/ia64.c (spill_xfmode_rfmode_operand,
ia64_expand_movxf_movrf): Ditto.
* config/mips/mips.c (mips_expand_vi_general): Ditto.
* config/mmix/mmix.md (floatdisf2, floatunsdisf2, truncdfsf2,
extendsfdf2): Ditto.
* config/rs6000/rs6000.c (rs6000_expand_vector_init,
rs6000_expand_vector_set, rs6000_expand_vector_extract,
rs6000_allocate_stack_temp): Ditto.
* config/rs6000/rs6000.md (fix_trunctfsi2_fprs): Ditto.
* config/sparc/sparc.c (emit_soft_tfmode_libcall,
sparc_emit_float_lib_cmp, sparc_emit_float_lib_cmp,
sparc_expand_vector_init): Ditto.

Index: gcc/builtins.c
===
--- gcc.orig/builtins.c 2012-05-26 21:46:58.0 +0200
+++ gcc/builtins.c  2012-05-26 21:47:02.0 +0200
@@ -2650,8 +2650,8 @@ expand_builtin_cexpi (tree exp, rtx targ
   else
gcc_unreachable ();
 
-  op1 = assign_temp (TREE_TYPE (arg), 0, 1, 1);
-  op2 = assign_temp (TREE_TYPE (arg), 0, 1, 1);
+  op1 = assign_temp (TREE_TYPE (arg), 1, 1);
+  op2 = assign_temp (TREE_TYPE (arg), 1, 1);
   op1a = copy_addr_to_reg (XEXP (op1, 0));
   op2a = copy_addr_to_reg (XEXP (op2, 0));
   top1 = make_tree (build_pointer_type (TREE_TYPE (arg)), op1a);
Index: gcc/calls.c
===
--- gcc.orig/calls.c2012-05-26 21:47:02.0 +0200
+++ gcc/calls.c 2012-05-26 21:47:02.0 +0200
@@ -933,7 +933,7 @@ save_fixed_argument_area (int reg_parm_s
set_mem_align (stack_area, PARM_BOUNDARY);
if (save_mode == BLKmode)
  {
-   save_area = assign_stack_temp (BLKmode, num_to_save, 0);
+   save_area = assign_stack_temp (BLKmode, num_to_save);
emit_block_move (validize_mem (save_area), stack_area,
 GEN_INT (num_to_save), BLOCK_OP_CALL_PARM);
  }
@@ -1258,7 +1258,7 @@ initialize_argument_information (int num
  set_mem_attributes (copy, type, 1);
}
  else
-   copy = assign_temp (type, 0, 1, 0);
+   copy = assign_temp (type, 1, 0);
 
  store_expr (args[i].tree_value, copy, 0, false);
 
@@ -2404,7 +2404,7 @@ expand_call (tree exp, rtx target, int i
/* For variable-sized objects, we must be called with a target
   specified.  If we were to allocate space on the stack here,
   we would have no way of knowing when to free it.  */
-   rtx d = assign_temp (rettype, 0, 1, 1);
+   rtx d = assign_temp (rettype, 1, 1);
structure_value_addr = XEXP (d, 0);
target = 0;
  }
@@ -3278,7 +3278,7 @@ expand_call (tree exp, rtx target, int i
  

RFA: temp slot TLC [3/3]

2012-05-26 Thread Michael Matz
Hi,

and this is a further small cleanup.  pop_temp_slots is now the same as 
free_temp_slots (modulo the level-- of course), so there's no need in 
calling both.  (and preserve_temp_slots(NULL) is useless now).

Regstrapped on x86_64-linux together with [1/3] and [2/3], all languages 
(+Ada,+obj-c++).  Okay for trunk?


Ciao,
Michael.
--
* function.c (pop_temp_slots): Call free_temp_slots.
* calls.c (store_one_arg): Don't call preserve_temp_slots or
free_temp_slots.
* expr.c (expand_assignment): Don't call free_temp_slots.

Index: calls.c
===
--- calls.c.orig2012-05-26 21:47:02.0 +0200
+++ calls.c 2012-05-26 23:42:57.0 +0200
@@ -4721,11 +4721,7 @@ store_one_arg (struct arg_data *arg, rtx
  be deferred during the rest of the arguments.  */
   NO_DEFER_POP;
 
-  /* Free any temporary slots made in processing this argument.  Show
- that we might have taken the address of something and pushed that
- as an operand.  */
-  preserve_temp_slots (NULL_RTX);
-  free_temp_slots ();
+  /* Free any temporary slots made in processing this argument.  */
   pop_temp_slots ();
 
   return sibcall_failure;
Index: expr.c
===
--- expr.c.orig 2012-05-26 21:47:02.0 +0200
+++ expr.c  2012-05-26 23:43:23.0 +0200
@@ -4836,7 +4836,6 @@ expand_assignment (tree to, tree from, b
 
   if (result)
preserve_temp_slots (result);
-  free_temp_slots ();
   pop_temp_slots ();
   return;
 }
@@ -4884,7 +4883,6 @@ expand_assignment (tree to, tree from, b
  emit_move_insn (to_rtx, value);
}
   preserve_temp_slots (to_rtx);
-  free_temp_slots ();
   pop_temp_slots ();
   return;
 }
@@ -4911,7 +4909,6 @@ expand_assignment (tree to, tree from, b
emit_move_insn (to_rtx, temp);
 
   preserve_temp_slots (to_rtx);
-  free_temp_slots ();
   pop_temp_slots ();
   return;
 }
@@ -4941,7 +4938,6 @@ expand_assignment (tree to, tree from, b
 TYPE_MODE (sizetype));
 
   preserve_temp_slots (to_rtx);
-  free_temp_slots ();
   pop_temp_slots ();
   return;
 }
@@ -4951,7 +4947,6 @@ expand_assignment (tree to, tree from, b
   push_temp_slots ();
   result = store_expr (from, to_rtx, 0, nontemporal);
   preserve_temp_slots (result);
-  free_temp_slots ();
   pop_temp_slots ();
   return;
 }
Index: function.c
===
--- function.c.orig 2012-05-26 23:41:39.0 +0200
+++ function.c  2012-05-26 23:42:27.0 +0200
@@ -1200,22 +1200,7 @@ push_temp_slots (void)
 void
 pop_temp_slots (void)
 {
-  struct temp_slot *p, *next;
-  bool some_available = false;
-
-  for (p = *temp_slots_at_level (temp_slot_level); p; p = next)
-{
-  next = p-next;
-  make_slot_available (p);
-  some_available = true;
-}
-
-  if (some_available)
-{
-  remove_unused_temp_slot_addresses ();
-  combine_temp_slots ();
-}
-
+  free_temp_slots ();
   temp_slot_level--;
 }
 


Re: User directed Function Multiversioning via Function Overloading (issue5752064)

2012-05-26 Thread H.J. Lu
On Sat, May 26, 2012 at 5:23 PM, Sriraman Tallam tmsri...@google.com wrote:
 On Sat, May 26, 2012 at 4:56 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Sat, May 26, 2012 at 3:34 PM, Sriraman Tallam tmsri...@google.com wrote:
 On Fri, May 25, 2012 at 10:05 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, May 25, 2012 at 8:38 PM, Sriraman Tallam tmsri...@google.com 
 wrote:

 On May 25, 2012 7:15 PM, H.J. Lu hjl.to...@gmail.com wrote:


 On May 25, 2012 6:54 PM, Sriraman Tallam tmsri...@google.com wrote:
 
 
  
   On Fri, May 25, 2012 at 5:0   BTW, I noticed:

  
   [hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep __cpu_model
      20: 0010    16 OBJECT  GLOBAL HIDDEN   COM __cpu_model
   [hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so | grep __cpu_model
      82: 00214ff0    16 OBJECT  GLOBAL DEFAULT   24
   __cpu_model@@GCC_4.8.0
     310: 00214ff0    16 OBJECT  GLOBAL DEFAULT   24 __cpu_model
   [hjl@gnu-6 pr14170]$
  
   Why is __cpu_model in both libgcc.a and libgcc_s.o?
 
  How do I disallow this in libgcc_s.so? Looks like t-cpuinfo file is
  wrong but I cannot figure out the fix.
 
 Why don't you want it in libgcc_s.so?

 I thought libgcc.a is always linked in for static and dynamic builds. So
 having it in libgcc_s.so is redundant.


 [hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep _cpu_
    20: 0010    16 OBJECT  GLOBAL HIDDEN   COM __cpu_model
    21: 0110   612 FUNC    GLOBAL HIDDEN     4 
 __cpu_indicator_init
 [hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so.1 | grep _cpu_
    82: 00214ff0    16 OBJECT  GLOBAL DEFAULT   24
 __cpu_model@@GCC_4.8.0
   223: 2b60   560 FUNC    LOCAL  DEFAULT   11 
 __cpu_indicator_init
   310: 00214ff0    16 OBJECT  GLOBAL DEFAULT   24 __cpu_model
 [hjl@gnu-6 pr14170]$

 I think there should be only one copy of __cpu_model in the process.
 It should be in libgcc_s.so. Why isn't  __cpu_indicator_init exported
 from libgcc_s.so?

 Ok, I am elaborating so that I understand the issue clearly.

 The dynamic symbol table of libgcc_s.so:

 $ objdump -T libgcc_s.so | grep __cpu

 00015fd0 g    DO .bss   0010  GCC_4.8.0   __cpu_model

 It only has __cpu_model, not __cpu_indicator_init just like you
 pointed out. I will fix this by adding a versioned symbol of
 __cpu_indicator_init to the *.ver files.

 That will be great.

 Do you see any other issues here? I dont get the duplicate entries
 part you are referring to. The static symbol table also contains
 references to __cpu_model and __cpu_indicator_init, but that is
 expected right?

 Duplication comes from static and dynamic symbol tables.

 In libgcc.a:

 readelf -sWt 
 /g/tmsriram/GCC_trunk_svn_mv_fe_at_nfs/native_builds/bld1/install/lib/gcc/x86_64-unknown-linux-gnu/libgcc.a
 | grep __cpu

   20: 0010    16 OBJECT  GLOBAL HIDDEN  COM __cpu_model
    21: 0110   612 FUNC    GLOBAL HIDDEN    4 
 __cpu_indicator_init

 libgcc.a has __cpu_model and __cpu_indicator_init as GLOBAL syms with
 HIDDEN visibility. Is this an issue? Is this not needed for static
 linking?

 Further thoughts:

 * It looks like libgcc.a is always linked for both static and dynamic
 links. It occurred to me when you brought this up. So, I thought why
 not exclude the symbols from libgcc_s.so! Is there any problem here?


 You don't want one copy of those 2 symbols in each DSO where
 they are used.

 Right, I agree. But this problem exists right now even if libgcc_s.so
 is provided with these symbols. Please see example below:

 Example:

 dso.c
 ---

 int some_func ()
 {
   return (int) __builtin_cpu_is (corei7);
 }

 Build with gcc driver:
 $ gcc dso.c -fPIC -shared -o dso.so
 $ nm dso.so | grep __cpu
 0780 t __cpu_indicator_init
 1e00 b __cpu_model

 This DSO is getting its own local copy of __cpu_model. This is fine
 functionally but this is not the behaviour you have in mind.

 whereas, if I build with g++ driver:

 $ g++ dso.c -fPIC -shared dso.so
 $ nm dso.so | grep __cpu
                 U __cpu_model

 This is as we would like, __cpu_model is undefined.

 The difference is that with the gcc driver, the link line is -lgcc
 -lgcc_s, whereas with the g++ driver -lgcc is not even present!

 Should I fix the gcc driver instead? This double-standard is not clear to me.


That is because libgcc_s.so is preferred by g++. We can do one
of 3 things:

1. Abuse libgcc_eh.a by moving __cpu_model and __cpu_indicator_init
from libgcc.a to libgcc_eh.a.
2. Rename libgcc_eh.a to libgcc_static.a and move __cpu_model and
__cpu_indicator_init from libgcc.a to libgcc_static.a.
3. Add  libgcc_static.a and move __cpu_model and __cpu_indicator_ini
 from libgcc.a to libgcc_static.a.  We treat libgcc_static.a similar to
libgcc_eh.a.


-- 
H.J.


Re: User directed Function Multiversioning via Function Overloading (issue5752064)

2012-05-26 Thread Sriraman Tallam
On Sat, May 26, 2012 at 7:06 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Sat, May 26, 2012 at 5:23 PM, Sriraman Tallam tmsri...@google.com wrote:
 On Sat, May 26, 2012 at 4:56 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Sat, May 26, 2012 at 3:34 PM, Sriraman Tallam tmsri...@google.com 
 wrote:
 On Fri, May 25, 2012 at 10:05 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, May 25, 2012 at 8:38 PM, Sriraman Tallam tmsri...@google.com 
 wrote:

 On May 25, 2012 7:15 PM, H.J. Lu hjl.to...@gmail.com wrote:


 On May 25, 2012 6:54 PM, Sriraman Tallam tmsri...@google.com wrote:
 
 
  
   On Fri, May 25, 2012 at 5:0   BTW, I noticed:

  
   [hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep __cpu_model
      20: 0010    16 OBJECT  GLOBAL HIDDEN   COM 
   __cpu_model
   [hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so | grep __cpu_model
      82: 00214ff0    16 OBJECT  GLOBAL DEFAULT   24
   __cpu_model@@GCC_4.8.0
     310: 00214ff0    16 OBJECT  GLOBAL DEFAULT   24 
   __cpu_model
   [hjl@gnu-6 pr14170]$
  
   Why is __cpu_model in both libgcc.a and libgcc_s.o?
 
  How do I disallow this in libgcc_s.so? Looks like t-cpuinfo file is
  wrong but I cannot figure out the fix.
 
 Why don't you want it in libgcc_s.so?

 I thought libgcc.a is always linked in for static and dynamic builds. So
 having it in libgcc_s.so is redundant.


 [hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep _cpu_
    20: 0010    16 OBJECT  GLOBAL HIDDEN   COM __cpu_model
    21: 0110   612 FUNC    GLOBAL HIDDEN     4 
 __cpu_indicator_init
 [hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so.1 | grep _cpu_
    82: 00214ff0    16 OBJECT  GLOBAL DEFAULT   24
 __cpu_model@@GCC_4.8.0
   223: 2b60   560 FUNC    LOCAL  DEFAULT   11 
 __cpu_indicator_init
   310: 00214ff0    16 OBJECT  GLOBAL DEFAULT   24 __cpu_model
 [hjl@gnu-6 pr14170]$

 I think there should be only one copy of __cpu_model in the process.
 It should be in libgcc_s.so. Why isn't  __cpu_indicator_init exported
 from libgcc_s.so?

 Ok, I am elaborating so that I understand the issue clearly.

 The dynamic symbol table of libgcc_s.so:

 $ objdump -T libgcc_s.so | grep __cpu

 00015fd0 g    DO .bss   0010  GCC_4.8.0   __cpu_model

 It only has __cpu_model, not __cpu_indicator_init just like you
 pointed out. I will fix this by adding a versioned symbol of
 __cpu_indicator_init to the *.ver files.

 That will be great.

 Do you see any other issues here? I dont get the duplicate entries
 part you are referring to. The static symbol table also contains
 references to __cpu_model and __cpu_indicator_init, but that is
 expected right?

 Duplication comes from static and dynamic symbol tables.

 In libgcc.a:

 readelf -sWt 
 /g/tmsriram/GCC_trunk_svn_mv_fe_at_nfs/native_builds/bld1/install/lib/gcc/x86_64-unknown-linux-gnu/libgcc.a
 | grep __cpu

   20: 0010    16 OBJECT  GLOBAL HIDDEN  COM __cpu_model
    21: 0110   612 FUNC    GLOBAL HIDDEN    4 
 __cpu_indicator_init

 libgcc.a has __cpu_model and __cpu_indicator_init as GLOBAL syms with
 HIDDEN visibility. Is this an issue? Is this not needed for static
 linking?

 Further thoughts:

 * It looks like libgcc.a is always linked for both static and dynamic
 links. It occurred to me when you brought this up. So, I thought why
 not exclude the symbols from libgcc_s.so! Is there any problem here?


 You don't want one copy of those 2 symbols in each DSO where
 they are used.

 Right, I agree. But this problem exists right now even if libgcc_s.so
 is provided with these symbols. Please see example below:

 Example:

 dso.c
 ---

 int some_func ()
 {
   return (int) __builtin_cpu_is (corei7);
 }

 Build with gcc driver:
 $ gcc dso.c -fPIC -shared -o dso.so
 $ nm dso.so | grep __cpu
 0780 t __cpu_indicator_init
 1e00 b __cpu_model

 This DSO is getting its own local copy of __cpu_model. This is fine
 functionally but this is not the behaviour you have in mind.

 whereas, if I build with g++ driver:

 $ g++ dso.c -fPIC -shared dso.so
 $ nm dso.so | grep __cpu
                 U __cpu_model

 This is as we would like, __cpu_model is undefined.

 The difference is that with the gcc driver, the link line is -lgcc
 -lgcc_s, whereas with the g++ driver -lgcc is not even present!

 Should I fix the gcc driver instead? This double-standard is not clear to me.


 That is because libgcc_s.so is preferred by g++. We can do one
 of 3 things:

 1. Abuse libgcc_eh.a by moving __cpu_model and __cpu_indicator_init
 from libgcc.a to libgcc_eh.a.
 2. Rename libgcc_eh.a to libgcc_static.a and move __cpu_model and
 __cpu_indicator_init from libgcc.a to libgcc_static.a.
 3. Add  libgcc_static.a and move __cpu_model and __cpu_indicator_ini
  from libgcc.a to libgcc_static.a.  We treat libgcc_static.a similar to
 libgcc_eh.a.

Any reason why gcc should not be made to prefer libgcc_s.so too like g++?

Thanks for clearing this up. I will 

Re: User directed Function Multiversioning via Function Overloading (issue5752064)

2012-05-26 Thread H.J. Lu
On Sat, May 26, 2012 at 7:23 PM, Sriraman Tallam tmsri...@google.com wrote:

 That is because libgcc_s.so is preferred by g++. We can do one
 of 3 things:

 1. Abuse libgcc_eh.a by moving __cpu_model and __cpu_indicator_init
 from libgcc.a to libgcc_eh.a.
 2. Rename libgcc_eh.a to libgcc_static.a and move __cpu_model and
 __cpu_indicator_init from libgcc.a to libgcc_static.a.
 3. Add  libgcc_static.a and move __cpu_model and __cpu_indicator_ini
  from libgcc.a to libgcc_static.a.  We treat libgcc_static.a similar to
 libgcc_eh.a.

 Any reason why gcc should not be made to prefer libgcc_s.so too like g++?

 Thanks for clearing this up. I will take a stab at it.


This is a long story.  The short answer is people didn't want
to add libgcc_s.so to DT_NEEDED for C programs.  But
it is no longer an issue since we now pass

 -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed
-lgcc_s --no-as-needed

to linker.


-- 
H.J.


RFA: Speedup expand_used_vars by 30 times (PR38474)

2012-05-26 Thread Michael Matz
Hi,

[for certain test cases :-) ]

the temp slot cleanups I just sent where actually motivated by PR38474.  
It exposes many slownesses in the compiler, but at -O0 the only remaining 
one is the expand phase.  expanding variables is the one thing (on my 
machine here, with -O0 -g f951): 30 seconds for the reduced testcase.  
expand itself (the statements) then needs 5.5 seconds and is the other 
thing.  Overall 77 seconds compile time.

The problem with expanding variables in this case is the 
add_alias_set_conflicts routine.  It walks all NxN/2 stack variable pairs, 
looks if pair (i,j) can share the same stack slot from a type perspective 
(canshare(i,j)) and adds a conflict if not so.  Before asking that 
question it doesn't look if pair (i,j) maybe already conflict for other 
reasons.

Now, we could simply add the conflicts(i,j) test in that loop, before 
asking canshare(i,j) and adding a new conflict.  But adding conflicts 
comes with a cost, so we can do better.  The answer to canshare(i,j) 
depends on only all components already merged into [i] and the type of j.  
With carefully choosing a type to represent all components of [i] and 
updating it in union_stack_vars we can simply ask canshare(i,j) right 
before we actually want to merge j into [i].  That's the least amount of 
work possible as long as we want to have the same results.

This change brings the 30 seconds for var expansion down to 0.8 seconds.

The other change in function.c further improves the temp slot machinery.  
While expanding free_temp_slots is called after each statement, and if it 
is able to free a slot it needs to update the RTX-slot mapping (a 
htab_t).  The freeing of slots was done incorrectly, it overwrote the slot 
in question with NULL, but it should have used htab_clear_slot.  This 
meant that the hashtable kept growing and growing even with all elements 
being empty because the size statistics aren't correct.  Additionally it 
breaks hash chains if there is one striding the nullified slot.

And then a microoptimization: if we know there aren't any slots in use at 
all (happens often in normal circumstances) we can as well use htab_empty 
instead of traversing the hash table for the right slots.

This brings the expansion time (i.e. for expanding the statements) down 
from 5.5 to 2.8 seconds.  All changes together reduce the compile time 
for the reduced testcase from ~ 77 seconds to ~ 44.

A variant of this regstrapped on x86_64-linux, all languages+Ada+objc++, 
but after some changes I'm redoing that regstrap.  Okay for trunk if that 
passes?


Ciao,
Michael.
---
PR middle-end/38474
* cfgexpand.c (struct stack_var): Add slot_type member.
(add_stack_var): Initialize it.
(add_alias_set_conflicts): Remove.
(merge_stack_vars_p, more_specific_type): New functions.
(nonconflicting_type): New static data.
(union_stack_vars): Use more_specific_type to update slot_type.
(partition_stack_vars): Call merge_stack_vars_p ...
(expand_used_vars): ... instead of add_alias_set_conflicts.
(fini_vars_expansion): Clear nonconflicting_type.
* function.c (n_temp_slots_in_use): New static data.
(make_slot_available, assign_stack_temp_for_type): Update it.
(init_temp_slots): Zero it.
(remove_unused_temp_slot_addresses): Use it for quicker removal.
(remove_unused_temp_slot_addresses_1): Use htab_clear_slot.

Index: cfgexpand.c
===
--- cfgexpand.c.orig2012-05-27 01:33:55.0 +0200
+++ cfgexpand.c 2012-05-27 04:37:18.0 +0200
@@ -177,6 +177,11 @@ struct stack_var
   /* The next stack variable in the partition, or EOC.  */
   size_t next;
 
+  /* The most specific type of all variables merged with the slot
+ if this is a representative.  Initially this is simply the
+ TREE_TYPE for the variable.  */
+  tree slot_type;
+
   /* The numbers of conflicting stack variables.  */
   bitmap conflicts;
 };
@@ -285,6 +290,7 @@ add_stack_var (tree decl)
   /* All variables are initially in their own partition.  */
   v-representative = stack_vars_num;
   v-next = EOC;
+  v-slot_type = TREE_TYPE (decl);
 
   /* All variables initially conflict with no other.  */
   v-conflicts = NULL;
@@ -353,45 +359,40 @@ aggregate_contains_union_type (tree type
   return false;
 }
 
-/* A subroutine of expand_used_vars.  If two variables X and Y have alias
-   sets that do not conflict, then do add a conflict for these variables
-   in the interference graph.  We also need to make sure to add conflicts
-   for union containing structures.  Else RTL alias analysis comes along
-   and due to type based aliasing rules decides that for two overlapping
-   union temporaries { short s; int i; } accesses to the same mem through
-   different types may not alias and happily reorders stores across
-   life-time boundaries of the temporaries (See 

Re: [cxx-conversion] New Hash Table (issue6244048)

2012-05-26 Thread Michael Matz
Hi,

On Fri, 25 May 2012, NightStrike wrote:

 This point of yours should be stressed.  Using writing standards of
 one language to develop in another language is a fundamentally bad
 idea.  C and C++ kind of look the same, but they are not:
 http://www.research.att.com/~bs/bs_faq.html#C-slash
 
 You wouldn't use the GNU C Coding conventions to write in tcl, and you
 shouldn't use them to write in C++.  You should just create the GNU
 C++ Coding Standards new, and not base them off of the former.

That is all nice and fine.  For starting from scratch.  But we aren't.  
We have an existing compiler written in a certain style.  We have existing 
people actually working on it and used to that style.  We don't want to 
have a mixture of several different styles in the compiler.  I (and I 
expect many others) don't want anyone working around the latter by going 
over the whole source base and reindent everything.  Hence inventing a new 
coding standard for GCC-in-C++ (by reusing existing ones or doing 
something new) that isn't mostly the same as GCC-in-C isn't going to fly.


Ciao,
Michael.