Re: [PATCH, ARM] Don't pull in unwinder for 64-bit division routines

2012-07-30 Thread Sebastian Huber

Hello,

With this move to t-bpabi, other targets like RTEMS also benefit from this 
change.  This is very good, since the unwinder pull-in for 64-bit divisions was 
quite painful for small Cortex-M3 systems with internal flash only.


--
Sebastian Huber, embedded brains GmbH

Address : Obere Lagerstr. 30, D-82178 Puchheim, Germany
Phone   : +49 89 18 90 80 79-6
Fax : +49 89 18 90 80 79-9
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

This message is not a business communication within the meaning of the EHUG.




Re: [PATCH v2] Target-specific limits on vector alignment

2012-07-30 Thread Richard Guenther
On Fri, Jul 27, 2012 at 5:24 PM, Ulrich Weigand uweig...@de.ibm.com wrote:
 Richard Guenther wrote:
 On Mon, Jun 11, 2012 at 5:25 PM, Richard Earnshaw rearn...@arm.com wrote:
  On 11/06/12 15:53, Richard Guenther wrote:
  The type argument or the size argument looks redundant.
 
  Technically, yes, we could get rid of tree_low_cst (TYPE_SIZE (type))
  and calculate it inside the alignment function if it were needed.
  However, it seemed likely that most targets would need that number one
  way or another, so passing it would be helpful.

 Well, you don't need it in stor-layout and targets might think the value
 may be completely unrelated to the type ...

  Note that we can still have such a vector properly aligned, thus the
  vectorizer would need to use build_aligned_type also when it knows the
  type is aligned, not only when it thinks it is misaligned.  You basically
  change the alignment of the default vector type.
 
  I'm not sure I follow...

 I am saying that a large vector may still be aligned, so when creating
 vector memory references the vectorizer has to use a non-default aligned
 vector type when the vector is aligned.  It won't do that at the moment.

 Richard (Earnshaw) has asked me to take over working on this patch now.

 I've now made the change requested above and removed the size argument.
 The target is now simply asked to return the required alignment for the
 given vector type.  I've also added a check for the case where the
 target provides both an alignment and a mode for a vector type, but
 the mode actually requires bigger alignment than the type.  This is
 simply rejected (the target can fix this by reporting a different
 type alignment or changing the mode alignment).

 I've not made any attempts to have the vectorizer register larger
 alignments than the one returned by the target hook.  It's not
 clear to me when this would be useful (at least on ARM) ...

 I've also run the testsuite, and this actually uncovered two bugs in
 the vectorizer, where it made an implicit assumption that vector types
 must always be naturally aligned:

 - In vect_update_misalignment_for_peel, the code used the vector size
   instead of the required alignment in order to bound misalignment
   values -- leading to a misalignment value bigger than the underlying
   alignment requirement of the vector type, causing an ICE later on

 - In vect_do_peeling_for_loop_bound, the code divided the vector type
   alignment by the number of elements in order to arrive at the element
   size ... this returns a wrong value if the alignment is less than the
   vector size, causing incorrect code to be generated

   (This routine also had some confusion between size and alignment in
   comments and variable names, which I've fixed as well.)

 Finally, two test cases still failed spuriously:

 - gcc.dg/align-2.c actually checked that vector types are naturally
   aligned

 - gcc.dg/vect/slp-25.c checked that we needed to perform peeling for
   alignment, which we actually don't need any more if vector types
   have a lesser alignment requirement in the first place

 I've added a new effective target flag to check whether the target
 requires natural alignment for vector types, and disabled those two
 tests if it doesn't.

 With those changes, I've completed testing with no regressions on
 arm-linux-gnueabi.

 OK for mainline?

Ok.  Please add to the documentation that the default vector alignment
has to be a power-of-two multiple of the default vector element alignment.
You probably want to double-check vector_alignment_reachable_p as well
which checks whether vector alignment can be reached by peeling off
scalar iterations.

Thanks,
Richard.

 Bye,
 Ulrich


 ChangeLog:

 * target.def (vector_alignment): New target hook.
 * doc/tm.texi.in (TARGET_VECTOR_ALIGNMENT): Document new hook.
 * doc/tm.texi: Regenerate.
 * targhooks.c (default_vector_alignment): New function.
 * targhooks.h (default_vector_alignment): Add prototype.
 * stor-layout.c (layout_type): Use targetm.vector_alignment.
 * config/arm/arm.c (arm_vector_alignment): New function.
 (TARGET_VECTOR_ALIGNMENT): Define.

 * tree-vect-data-refs.c (vect_update_misalignment_for_peel): Use
 vector type alignment instead of size.
 * tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): Use
 element type size directly instead of computing it from alignment.
 Fix variable naming and comment.

 testsuite/ChangeLog:

 * lib/target-supports.exp
 (check_effective_target_vect_natural_alignment): New function.
 * gcc.dg/align-2.c: Only run on targets with natural alignment
 of vector types.
 * gcc.dg/vect/slp-25.c: Adjust tests for targets without natural
 alignment of vector types.


 Index: gcc/target.def
 ===
 *** gcc/target.def  (revision 189809)
 --- 

Re: [PATCH][4/n] into-SSA TLC

2012-07-30 Thread Richard Guenther
On Fri, 27 Jul 2012, Richard Guenther wrote:

 
 This avoids triggering update-ssa right after into-ssa just because
 we didn't rename virtual operands yet.  Simply do that on-the-fly,
 update_stmt will have added bare symbols as operands already.
 Surprisingly simple ... no idea why I chose the simple route
 when merging alias-improvements (originally the first 'alias' pass
 enabled virtual operands).
 
 Btw, we still have no virtual operands at -O0, it would now become
 a tiny bit cheaper to add them (just to remove some !optimize checks).
 
 Bootstrap and regtest pending on x86_64-unknown-linux-gnu.

The following is what I have applied after bootstrap & regtest.

Richard.

2012-07-30  Richard Guenther  rguent...@suse.de

* tree-into-ssa.c (mark_def_sites): Also process virtual operands.
(rewrite_stmt): Likewise.
(rewrite_enter_block): Likewise.
(pass_build_ssa): Do not update virtual SSA form during TODO.
(mark_symbol_for_renaming): Do nothing if we are not in SSA form.
* lto-streamer-in.c (lto_read_body): Set in_ssa_p earlier.

* gcc.dg/ipa/ipa-pta-3.c: Adjust.
* gcc.dg/ipa/ipa-pta-4.c: Likewise.
* gcc.dg/tm/memopt-3.c: Likewise.

Index: trunk/gcc/tree-into-ssa.c
===
*** trunk.orig/gcc/tree-into-ssa.c  2012-07-30 11:27:06.0 +0200
--- trunk/gcc/tree-into-ssa.c   2012-07-30 11:34:59.588077320 +0200
*** mark_def_sites (basic_block bb, gimple s
*** 675,681 
  
/* If a variable is used before being set, then the variable is live
   across a block boundary, so mark it live-on-entry to BB.  */
!   FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE)
  {
tree sym = USE_FROM_PTR (use_p);
gcc_assert (DECL_P (sym));
--- 675,681 
  
/* If a variable is used before being set, then the variable is live
   across a block boundary, so mark it live-on-entry to BB.  */
!   FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_ALL_USES)
  {
tree sym = USE_FROM_PTR (use_p);
gcc_assert (DECL_P (sym));
*** mark_def_sites (basic_block bb, gimple s
*** 686,692 
  
/* Now process the defs.  Mark BB as the definition block and add
   each def to the set of killed symbols.  */
!   FOR_EACH_SSA_TREE_OPERAND (def, stmt, iter, SSA_OP_DEF)
  {
gcc_assert (DECL_P (def));
set_def_block (def, bb, false);
--- 686,692 
  
/* Now process the defs.  Mark BB as the definition block and add
   each def to the set of killed symbols.  */
!   FOR_EACH_SSA_TREE_OPERAND (def, stmt, iter, SSA_OP_ALL_DEFS)
  {
gcc_assert (DECL_P (def));
set_def_block (def, bb, false);
*** rewrite_stmt (gimple_stmt_iterator si)
*** 1336,1342 
if (is_gimple_debug (stmt))
rewrite_debug_stmt_uses (stmt);
else
!   FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE)
  {
tree var = USE_FROM_PTR (use_p);
gcc_assert (DECL_P (var));
--- 1336,1342 
if (is_gimple_debug (stmt))
rewrite_debug_stmt_uses (stmt);
else
!   FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_ALL_USES)
  {
tree var = USE_FROM_PTR (use_p);
gcc_assert (DECL_P (var));
*** rewrite_stmt (gimple_stmt_iterator si)
*** 1346,1352 
  
/* Step 2.  Register the statement's DEF operands.  */
if (register_defs_p (stmt))
! FOR_EACH_SSA_DEF_OPERAND (def_p, stmt, iter, SSA_OP_DEF)
{
tree var = DEF_FROM_PTR (def_p);
tree name = make_ssa_name (var, stmt);
--- 1346,1352 
  
/* Step 2.  Register the statement's DEF operands.  */
if (register_defs_p (stmt))
! FOR_EACH_SSA_DEF_OPERAND (def_p, stmt, iter, SSA_OP_ALL_DEFS)
{
tree var = DEF_FROM_PTR (def_p);
tree name = make_ssa_name (var, stmt);
*** static void
*** 1404,1410 
  rewrite_enter_block (struct dom_walk_data *walk_data ATTRIBUTE_UNUSED,
 basic_block bb)
  {
-   gimple phi;
gimple_stmt_iterator gsi;
  
if (dump_file && (dump_flags & TDF_DETAILS))
--- 1404,1409 
*** rewrite_enter_block (struct dom_walk_dat
*** 1418,1428 
   node introduces a new version for the associated variable.  */
for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (gsi))
  {
!   tree result;
! 
!   phi = gsi_stmt (gsi);
!   result = gimple_phi_result (phi);
!   gcc_assert (is_gimple_reg (result));
register_new_def (result, SSA_NAME_VAR (result));
  }
  
--- 1417,1423 
   node introduces a new version for the associated variable.  */
for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (gsi))
  {
!   tree result = gimple_phi_result (gsi_stmt (gsi));
register_new_def (result, SSA_NAME_VAR (result));
  }
  
*** 

Re: [RFC C++ / PR51033 ] Handle __builtin_shuffle in constexpr properly in the C++ frontend.

2012-07-30 Thread Ramana Radhakrishnan
On 28 July 2012 10:26, Marc Glisse marc.gli...@inria.fr wrote:
 On Mon, 18 Jun 2012, Ramana Radhakrishnan wrote:

 This patch following on from the fix for turning on __builtin_shuffle
 for c++ , enables folding of vec_perm_exprs in the front-end for
 constexpr and constructor style values.


 Hello,

 I took a look, and the example I gave in
 http://gcc.gnu.org/ml/gcc-patches/2012-06/msg01066.html
 although it doesn't crash the compiler anymore, still fails to compile. I am
 not sure: were you just trying to remove the ICE, or actually support this
 use?

The intent was to actually support this use properly. I'll have a look
but it's unlikely to be today.


Ramana


 #include <x86intrin.h>
 int main(){
   constexpr __m128d x={1.,2.};
   constexpr __m128i y={1,0};
   constexpr __m128d z=__builtin_shuffle(x,y);
 }

 $ g++ -std=gnu++11 m.cc
 m.cc: In function 'int main()':
 m.cc:5:23: error: '#'vec_perm_expr' not supported by dump_expr#expression
 error' is not a constant expression
constexpr __m128d z=__builtin_shuffle(x,y);
^

 --
 Marc Glisse


[Patch 0/6] Improve Neon intrinsics a bit

2012-07-30 Thread Ramana Radhakrishnan
Hi,
 I've been working on a small project to improve the Neon intrinsics,
and I kept getting bothered by random failures in gcc.target/arm/neon.
I got sufficiently irritated that I decided to clean that bit up,
and then found myself in a maze of rabbit holes.  I've always been
somewhat bothered by the Neon intrinsics tests and took the
opportunity to do some proper cleanup work in that space.
It's not as good as having proper execute tests, but this is certainly
better than the tests that are in place today.

Patch 1 fixes up the vaba and vabal patterns to use a canonical RTL
form, with the first operand to the plus being the more complex one.
Patch 2 is a bug fix for the splitters so that they take
into account the right register for the right mode.  For instance, a
register not fit for a TImode value shouldn't be put in one, even if
the larger mode allows a different register.  This is possible for
OImode values, or indeed HFA-style values being passed around as
parameters, and is potentially an issue for folks building hard-float
systems with Neon and using some of the large structures.
Patch 3 fixes up the costs so that lower-subreg doesn't go bonkers
splitting large values before they are visible.  More in the actual
patch description.  It is possibly the most contentious of the lot and
could do with some review.  I think there is still quite a lot more to
be done around costs for some of the vector operations.
Patch 4 improves the testsuite for the Neon intrinsics.  There are
still testisms for a number of these, but it boils down to the regexps
needing to be corrected for a number of these tests.  I thought that
before I spend more time on ML wrangling, I should get this out for
some review.  Again a contentious one that could probably do with some
discussion.
Patch 5 is a bug fix for a set of ICEs caused by always generating a
vec_duplicate of DImode values into other DImode values.  Possibly
needs backporting to older versions.
Patch 6 is similar to #5, but here we prevent a (set (vec_select:DI ())
(reg:DI)) type operation.  I will commit this regardless.  Possibly
needs backporting to older release branches.


regards,
Ramana


[Patch ARM 1/6] Canonicalize neon_vaba and neon_vabal patterns.

2012-07-30 Thread Ramana Radhakrishnan
 Patch 1 fixes up the vaba and vabal patterns to use a canonical RTL
 form with the first operand to the plus being the more complex one.

This patch canonicalizes the instruction patterns for the
vaba and vabal intrinsics so that the more complex operand
to plus is the first operand. This prevents needless
splitting in combine.

For reference, this was found by the new test in gcc.target/neon/vaba*.c
and gcc.target/neon/vabal*.c from patch #4.


Ok ?

regards,
Ramana

2012-07-27  Ramana Radhakrishnan  ramana.radhakrish...@linaro.org

* config/arm/neon.md (neon_vaba<mode>): Change to define_expand.
(neon_vabal<mode>): Likewise.
(neon_vaba_internal<mode>): New internal pattern.
(neon_vabal_internal<mode>): New internal pattern.
---
 gcc/config/arm/neon.md |   61 ---
 1 files changed, 46 insertions(+), 15 deletions(-)

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 7142c98..1ffbb7d 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -2335,29 +2335,60 @@
   [(set_attr neon_type neon_int_5)]
 )

-(define_insn "neon_vaba<mode>"
+(define_expand "neon_vaba<mode>"
+   [(match_operand:VDQIW 0 "s_register_operand" "")
+    (match_operand:VDQIW 1 "s_register_operand" "")
+    (match_operand:VDQIW 2 "s_register_operand" "")
+    (match_operand:VDQIW 3 "s_register_operand" "")
+    (match_operand:SI 4 "immediate_operand" "")]
+  "TARGET_NEON"
+  {
+    emit_insn (gen_neon_vaba_internal<mode> (operands[0], operands[2],
+                                             operands[3], operands[4],
+                                             operands[1]));
+    DONE;
+   }
+)
+
+(define_insn "neon_vaba_internal<mode>"
   [(set (match_operand:VDQIW 0 "s_register_operand" "=w")
-        (plus:VDQIW (match_operand:VDQIW 1 "s_register_operand" "0")
-                    (unspec:VDQIW [(match_operand:VDQIW 2 "s_register_operand" "w")
-                                   (match_operand:VDQIW 3 "s_register_operand" "w")
-                                   (match_operand:SI 4 "immediate_operand" "i")]
-                                  UNSPEC_VABD)))]
+        (plus:VDQIW (unspec:VDQIW
+                      [(match_operand:VDQIW 1 "s_register_operand" "w")
+                       (match_operand:VDQIW 2 "s_register_operand" "w")
+                       (match_operand:SI 3 "immediate_operand" "i")] UNSPEC_VABD)
+                    (match_operand:VDQIW 4 "s_register_operand" "0")))]
   "TARGET_NEON"
-  "vaba.%T4%#<V_sz_elem>\t%<V_reg>0, %<V_reg>2, %<V_reg>3"
+  "vaba.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
   [(set (attr "neon_type")
      (if_then_else (match_test "<Is_d_reg>")
        (const_string "neon_vaba") (const_string "neon_vaba_qqq")))]
 )

-(define_insn "neon_vabal<mode>"
+(define_expand "neon_vabal<mode>"
+  [(match_operand:<V_widen> 0 "s_register_operand" "")
+   (match_operand:<V_widen> 1 "s_register_operand" "")
+   (match_operand:VW 2 "s_register_operand" "")
+   (match_operand:VW 3 "s_register_operand" "")
+   (match_operand:SI 4 "immediate_operand" "")]
+  "TARGET_NEON"
+  {
+    emit_insn (gen_neon_vabal_internal<mode> (operands[0], operands[2],
+                                              operands[3], operands[4],
+                                              operands[1]));
+    DONE;
+   }
+)
+
+(define_insn "neon_vabal_internal<mode>"
   [(set (match_operand:<V_widen> 0 "s_register_operand" "=w")
-        (plus:<V_widen> (match_operand:<V_widen> 1 "s_register_operand" "0")
-                        (unspec:<V_widen> [(match_operand:VW 2 "s_register_operand" "w")
-                                           (match_operand:VW 3 "s_register_operand" "w")
-                                           (match_operand:SI 4 "immediate_operand" "i")]
-                                          UNSPEC_VABDL)))]
-  "TARGET_NEON"
-  "vabal.%T4%#<V_sz_elem>\t%q0, %P2, %P3"
+        (plus:<V_widen> (unspec:<V_widen>
+                          [(match_operand:VW 1 "s_register_operand" "w")
+                           (match_operand:VW 2 "s_register_operand" "w")
+                           (match_operand:SI 3 "immediate_operand" "i")]
+                          UNSPEC_VABDL)
+                        (match_operand:<V_widen> 4 "s_register_operand" "0")))]
+  "TARGET_NEON"
+  "vabal.%T3%#<V_sz_elem>\t%q0, %P1, %P2"
   [(set_attr "neon_type" "neon_vaba")]
 )

-- 
1.7.4.1


[Patch ARM 2/6] Fix Large struct mode splitters for cases where registers are not TImode.

2012-07-30 Thread Ramana Radhakrishnan
 Patch 2 is a bug fix that fixes up the splitters so that they take
 into account the right register for the right mode . For instance a
 register not fit for a TImode value shouldn't be put in one even if
 the larger mode allows a different register . This is possible for
 OImode values or indeed HFA style values being passed around as
 parameters and is potentially an issue for folks building hard-float
 systems with neon and using some of the large structures.

  The large struct mode splitters don't take into account whether
a TImode value can be generated from a value that is in an appropriate
neon register for that value. This is possible in cases where you have
an EImode, OImode, CImode or TImode value in the appropriate registers
as these could be passed in their corresponding neon D registers.

This was exposed by the tests for v{ld/st/tbl/tbx}2/3/4{lane/}* and
friends in the new set of tests that follow at the end of this patch
series.

This is a problem for folks using the new hard float ABI and passing
such values in registers - so it might not show up that much in practice
but it's certainly worth backporting after sitting in trunk for a few
days. It certainly is not a regression since this bug has always been
there but it is a fundamental correctness issue in the backend with respect
to such splits, so I'd like some more consensus on whether this can be
safely backported.

regards,
Ramana

2012-07-27  Ramana Radhakrishnan  ramana.radhakrish...@linaro.org

PR target/
* config/arm/arm-protos.h (arm_split_eimoves): Declare.
(arm_split_tocx_imoves): Declare.
* config/arm/iterators.md (TOCXI): New.
* config/arm/neon.md (EI TI OI CI XI mode splitters): Unify
and use iterator. Simplify EImode splitter. Move logic to ...
* config/arm/arm.c (arm_split_eimoves): here .. Handle
case for EImode values in registers not suitable for splits
into TImode values.
(arm_split_tocx_imoves): Likewise.
---
 gcc/config/arm/arm-protos.h |3 +
 gcc/config/arm/arm.c|   91 +++
 gcc/config/arm/iterators.md |3 +
 gcc/config/arm/neon.md  |   84 +---
 4 files changed, 107 insertions(+), 74 deletions(-)

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index c590ef4..dc93c5d 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -248,6 +248,9 @@ extern int vfp3_const_double_for_fract_bits (rtx);
 extern void arm_emit_coreregs_64bit_shift (enum rtx_code, rtx, rtx, rtx, rtx,
   rtx);
 extern bool arm_validize_comparison (rtx *, rtx *, rtx *);
+extern void arm_split_tocx_imoves (rtx *, enum machine_mode);
+extern void arm_split_eimoves (rtx *);
+
 #endif /* RTX_CODE */

 extern void arm_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 1f3f9b3..b281485 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -26410,4 +26410,95 @@ arm_validize_comparison (rtx *comparison, rtx * op1, rtx * op2)

 }

+/* EImode values are usually in 3 DImode registers. This could be suitably
+   split into TImode moves and DImode moves.  */
+void
+arm_split_eimoves (rtx *operands)
+{
+  int rdest = REGNO (operands[0]);
+  int rsrc = REGNO (operands[1]);
+  int count = 0;
+  int increment = 0;
+  rtx dest[3], src[3];
+  int i, j;
+
+  if (NEON_REGNO_OK_FOR_QUAD (rdest) && NEON_REGNO_OK_FOR_QUAD (rsrc))
+{
+  dest[0] = gen_rtx_REG (TImode, rdest);
+  src[0] = gen_rtx_REG (TImode, rsrc);
+  count = 2;
+  increment = 4;
+}
+  else
+{
+  dest[0] = gen_rtx_REG (DImode, rdest);
+  src[0] = gen_rtx_REG (DImode, rsrc);
+  dest[1] = gen_rtx_REG (DImode, rdest + 2);
+  src[1] = gen_rtx_REG (DImode, rsrc + 2);
+  count = 3;
+  increment = 2;
+}
+
+  dest[count - 1] = gen_rtx_REG (DImode, rdest + 4);
+  src[count - 1] = gen_rtx_REG (DImode, rsrc + 4);
+
+  neon_disambiguate_copy (operands, dest, src, count);
+
+  for (i = 0, j = 0 ; j < count ; i = i + 2, j++)
+    emit_move_insn (operands[i], operands[i + 1]);
+
+  return;
+}
+
+/* Split TI, CI, OI and XImode moves into appropriate smaller
+   forms.  */
+void
+arm_split_tocx_imoves (rtx *operands, enum machine_mode mode)
+{
+  int rdest = REGNO (operands[0]);
+  int rsrc = REGNO (operands[1]);
+  enum machine_mode split_mode;
+  int count = 0;
+  int factor = 0;
+  int j;
+  /* We never should need more than 8 DImode registers in the worst case.  */
+  rtx dest[8], src[8];
+  int i;
+
+  if (NEON_REGNO_OK_FOR_QUAD (rdest) && NEON_REGNO_OK_FOR_QUAD (rsrc))
+{
+  split_mode = TImode;
+  if (dump_file)
+   fprintf (dump_file, "split_mode is TImode\n");
+}
+  else
+{
+  split_mode = 

[Patch ARM 3/6] Adjust costs for Large moves for ARM.

2012-07-30 Thread Ramana Radhakrishnan
Hi,

lower-subreg.c goes completely bonkers at times with code
that uses the large vector modes, especially the vld3 / vst3
type operations.  In these cases these large modes are usually
split into SImode moves, which then cause massive spilling,
and we end up generating really, really bad code.

The problem here appears to be that we report the cost of a
reg-reg move as 0, and the alternative is also 0, which means that
by default we split in any large-register case.  I am a bit unsure
about DImode moves and whether they should be split or not, which is
why there is a FIXME for this particular case.

With the examples that I've tried out, which have been suitably complex
Neon intrinsics code, this appears to prevent the gratuitous splitting.
Of course not splitting has its own problems, as we now have 3
contiguous registers with large values being allocated.  I'm not sure
how this will hold up in practice and in real-life applications,
so if someone could provide some feedback on this it would be
great.

If only smaller portions of those large registers are used, it gets
a bit harder for the register allocator to get this right.

So this is a patch that might need more tweaking and is potentially
the most contentious of the lot.  In addition, the same logic could be
applied to arm_size_cost before I commit this patch.

regards,
Ramana

2012-07-27  Ramana Radhakrishnan  ramana.radhakrish...@linaro.org

* config/arm/arm.c (arm_rtx_costs_1): Adjust cost for register to
register moves.
(arm_reg_reg_move_cost_for_mode): Use it.
---
 gcc/config/arm/arm.c |   46 ++
 1 files changed, 46 insertions(+), 0 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index b281485..c59184f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -268,6 +268,7 @@ static int arm_cortex_a5_branch_cost (bool, bool);

 static bool arm_vectorize_vec_perm_const_ok (enum machine_mode vmode,
 const unsigned char *sel);
+static int arm_reg_reg_move_cost_for_mode (enum machine_mode mode);

 
 /* Table of machine attributes.  */
@@ -7637,6 +7638,13 @@ arm_rtx_costs_1 (rtx x, enum rtx_code outer,
int* total, bool speed)
   return true;

 case SET:
+      if (s_register_operand (SET_DEST (x), GET_MODE (SET_DEST (x)))
+          && s_register_operand (SET_SRC (x), GET_MODE (SET_SRC (x))))
+        {
+          *total = COSTS_N_INSNS (arm_reg_reg_move_cost_for_mode
+                                  (GET_MODE (SET_DEST (x))));
+          return true;
+        }
   return false;

 case UNSPEC:
@@ -26501,4 +26509,42 @@ arm_split_tocx_imoves (rtx *operands, enum
machine_mode mode)

 }

+static int
+arm_reg_reg_move_cost_for_mode (enum machine_mode mode)
+{
+  /* Check if this is a move between 2 pseudos; the case of
+     2 hard registers will fall out from the code
+     below.  */
+  if (TARGET_NEON  TARGET_HARD_FLOAT)
+{
+  /* FIXME - this is currently in only to prevent
+the large register moves. However in practice
+preventing splitting of DImode values requires
+more tuning.  */
+      if (mode != DImode
+          && (VALID_NEON_DREG_MODE (mode)
+              || VALID_NEON_QREG_MODE (mode)))
+   return 1;
+
+      /* The cost of moving a structure mode is the number of 128-bit
+         moves needed, plus the number of 64-bit moves needed, as in
+         the case of EImode values.  */
+  if (VALID_NEON_STRUCT_MODE (mode))
+   {
+ return ((GET_MODE_SIZE (mode) / GET_MODE_SIZE (TImode))
+ + ((GET_MODE_SIZE (mode) / GET_MODE_SIZE (DImode))  1));
+   }
+}
+
+  if (TARGET_HARD_FLOAT  TARGET_VFP)
+{
+      if (mode == DFmode
+          || mode == SFmode)
+   return 1;
+}
+
+  return ARM_NUM_REGS (mode);
+}
+
 #include gt-arm.h
-- 
1.7.4.1


[Patch ARM 4/6] Improve Neon intrinsics testsuite.

2012-07-30 Thread Ramana Radhakrishnan
On 30 July 2012 12:41, Ramana Radhakrishnan
ramana.radhakrish...@linaro.org wrote:
 Hi,
  I've been working on a small project to improve neon intrinsic
 and  I kept getting bothered by random failures in gcc.target/arm/neon
 and I got sufficiently irritated that I decided to clean that bit up
 and then found myself in a maze of rabbit holes. I've always been
 somewhat bothered by the Neon intrinsics tests and took the
 opportunity to actually do some proper cleanup work in that space.
 It's not as good as having proper execute tests but this is certainly
 better than the tests that are in place today.

 Patch 1 fixes up the vaba and vabal patterns to use a canonical RTL
 form with the first operand to the plus being the more complex one.
 Patch 2 is a bug fix that fixes up the splitters so that they take
 into account the right register for the right mode . For instance a
 register not fit for a TImode value shouldn't be put in one even if
 the larger mode allows a different register . This is possible for
 OImode values or indeed HFA style values being passed around as
 parameters and is potentially an issue for folks building hard-float
 systems with neon and using some of the large structures.
 Patch 3 fixes up the costs so that lower-subreg doesn't go bonkers
 with splitting large values before it is visible . More in the actual
 patch description. It is possibly the most contentious of the lot and
 could do with some review. I think there is still quite a lot more to
 be done around costs for some of the vector operations.
 Patch 4 - Improves the testsuite for the Neon intrinsics.  There are
 still testisms for a number of these but it boils down to the regexps
 needing to be corrected for a number of these tests. I thought before
 I spend more time on ML wrangling , I should get this out for some
 review. Again a contentious one and probably could do with some
 discussion.

   This patch converts the testsuite generator to actually produce
something more sensible than the current set of tests.  It changes
these to generate the following form for a test instead of the previous
set of tests.

It's careful to use the hard-fp variant so that we actually
produce an instruction (at least a move of the appropriate form) and
uses a dummy floating-point parameter to ensure this.  This ensures that
most tests are all right.  This does increase test times quite a bit,
and I'm considering a follow-up to the build system that tries to run
some of these tests in parallel.

It's been useful and instructive so far, has found a few issues
in the compiler, and has probably been the twistiest passage in this maze
of twisty little passages.

There are still failures in these tests; some of them are down to
testisms and some of them down to real issues which I'm still looking at.
I'm only attaching the non-autogenerated parts with the patch.


Thoughts ?

Ramana

2012-07-27  Ramana Radhakrishnan  ramana.radhakrish...@linaro.org

* config/arm/neon-testgen.ml: Update copyright years.
(emit_prologue): Do not restrict runs to O0. Remove function start.
(emit_automatics): Rename to emit_test_prologue.
(emit_test_prologue test_name const_valuator): Additional parameters.
(emit_test_prologue print_arg print_args): New routines.
(emit_test_prologue): Update comment. Print test_name. Convert
automatics to function parameters and use above.
(emit_call): Handle printing of return value and close off
test function.
(emit_epilogue): Delete printing of end of function.
(test_intrinsic): Adjust calls to changed functions.

testsuite/
* gcc.target/arm/neon/neon.exp: Update copyright year and
change into a torture test.
* gcc.target/arm/neon/*.c: Regenerate.

/* Test the `vaddf32' ARM Neon intrinsic.  */
/* This file was autogenerated by neon-testgen.  */

/* { dg-do assemble } */
/* { dg-require-effective-target arm_neon_ok } */
/* { dg-options "-save-temps" } */
/* { dg-add-options arm_neon } */

float32x2_t __attribute__ ((pcs ("aapcs-vfp")))
test_vaddf32 (float dummy_param, float32x2_t arg0_float32x2_t,
              float32x2_t arg1_float32x2_t)
{
  float32x2_t out_float32x2_t;

  out_float32x2_t = vadd_f32 (arg0_float32x2_t, arg1_float32x2_t);
  return out_float32x2_t;
}

/* { dg-final { scan-assembler vadd\.f32\[ \]+\[dD\]\[0-9\]+,
\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[\]+@\[a-zA-Z0-9 \]+\)?\n } } */
/* { dg-final { cleanup-saved-temps } } */

diff --git a/gcc/config/arm/neon-testgen.ml b/gcc/config/arm/neon-testgen.ml
index a69a539..7da3489 100644
--- a/gcc/config/arm/neon-testgen.ml
+++ b/gcc/config/arm/neon-testgen.ml
@@ -1,5 +1,6 @@
 (* Auto-generate ARM Neon intrinsics tests.
-   Copyright (C) 2006, 2007, 2008, 2009, 2010 Free Software Foundation, Inc.
+   Copyright (C) 2006, 2007, 2008, 2009, 2010, 2011, 2012 Free Software
+   Foundation, Inc.

Re: [Patch 0/6] Improve Neon intrinsics a bit

2012-07-30 Thread Ramana Radhakrishnan
On 30 July 2012 12:41, Ramana Radhakrishnan
ramana.radhakrish...@linaro.org wrote:

 Patch 5 - Bug fix that fixes up a set of ICEs because we were always
 generating vec_duplicate of DImode values into other DImode values.
 Possibly needs backporting to older versions.


The recent changes to the vld1_dup intrinsics ended up generating
(set (reg:DI) (vec_duplicate:DI (mem:DI))).  Instead of folding these
out, it was simpler just to fix this up in the backend.

Fixes up the failures in vld1_dups/u64 in the new intrinsics tests.
No need for a new test.

Ramana

2012-07-27  Ramana Radhakrishnan  ramana.radhakrish...@linaro.org

* config/arm/neon.md (neon_vld1_dupdi): Split out from the other
vld1_dup<mode> patterns.
(neon_vld1_dup<mode> VDX): Change to iterate on the VD iterator and
simplify.
(neon_vld1_dup<mode> VQ): Cleanup.
---
 gcc/config/arm/neon.md |   32 
 1 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 7434625..843c907 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -4349,29 +4349,29 @@
 )

 (define_insn "neon_vld1_dup<mode>"
-  [(set (match_operand:VDX 0 "s_register_operand" "=w")
-        (vec_duplicate:VDX (match_operand:<V_elem> 1 "neon_struct_operand" "Um")))]
+  [(set (match_operand:VD 0 "s_register_operand" "=w")
+        (vec_duplicate:VD (match_operand:<V_elem> 1 "neon_struct_operand" "Um")))]
   "TARGET_NEON"
-{
-  if (GET_MODE_NUNITS (<MODE>mode) > 1)
-    return "vld1.<V_sz_elem>\t{%P0[]}, %A1";
-  else
-    return "vld1.<V_sz_elem>\t%h0, %A1";
-}
-  [(set (attr "neon_type")
-      (if_then_else (gt (const_string "<V_mode_nunits>") (const_string "1"))
-                    (const_string "neon_vld2_2_regs_vld1_vld2_all_lanes")
-                    (const_string "neon_vld1_1_2_regs")))]
+  "vld1.<V_sz_elem>\t{%P0[]}, %A1"
+  [(set_attr "neon_type" "neon_vld1_1_2_regs")]
+)
+
+;; This has been split from the others because vld1_dupdi is the same
+;; as a DImode move and it is meaningless to vec_duplicate a DImode value into
+;; a DImode value.
+(define_expand neon_vld1_dupdi
+  [(set (match_operand:DI 0 s_register_operand )
+   (match_operand:DI 1 neon_struct_operand ))]
+ TARGET_NEON
+ 
 )

 (define_insn neon_vld1_dupmode
   [(set (match_operand:VQ 0 s_register_operand =w)
 (vec_duplicate:VQ (match_operand:V_elem 1
neon_struct_operand Um)))]
   TARGET_NEON
-{
-  return vld1.V_sz_elem\t{%e0[], %f0[]}, %A1;
-}
-  [(set_attr neon_type neon_vld2_2_regs_vld1_vld2_all_lanes)]
+ vld1.V_sz_elem\t{%e0[], %f0[]}, %A1
+ [(set_attr neon_type neon_vld2_2_regs_vld1_vld2_all_lanes)]
 )

 (define_insn_and_split neon_vld1_dupv2di
-- 
1.7.4.1


[Patch ARM 6/6] Fix ICE with vst1_lanedi type intrinsics.

2012-07-30 Thread Ramana Radhakrishnan
Hi,

This is similar to the previous patch except that it prevents
(vec_select:DI (operand:DI)) type operations.

Exposed by the vst*_lane*.c tests in the new testsuite.

regards,
Ramana

2012-07-27  Ramana Radhakrishnan  ramana.radhakrish...@linaro.org

* config/arm/neon.md (neon_vst1_lanedi): Split from ..
(neon_vst1_lane<mode> VDX): this, iterate over VD and cleanup.
---
 gcc/config/arm/neon.md |   26 +-
 1 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 843c907..ec35d69 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -4405,10 +4405,23 @@
   vst1.V_sz_elem\t%h1, %A0
   [(set_attr neon_type neon_vst1_1_2_regs_vst2_2_regs)])

+(define_expand neon_vst1_lanedi
+  [(match_operand:DI 0 neon_struct_operand)
+   (match_operand:DI 1 s_register_operand)
+   (match_operand:SI 2 neon_lane_number)]
+  TARGET_NEON
+  {
+if (INTVAL (operands[2]) == 1)
+  error (lane out of range for vst1_lanedi intrinsic);
+emit_move_insn (operands[0], operands[1]);
+DONE;
+  }
+)
+
 (define_insn neon_vst1_lanemode
   [(set (match_operand:V_elem 0 neon_struct_operand =Um)
(vec_select:V_elem
- (match_operand:VDX 1 s_register_operand w)
+ (match_operand:VD 1 s_register_operand w)
  (parallel [(match_operand:SI 2 neon_lane_number i)])))]
   TARGET_NEON
 {
@@ -4416,15 +4429,10 @@
   HOST_WIDE_INT max = GET_MODE_NUNITS (MODEmode);
   if (lane  0 || lane = max)
 error (lane out of range);
-  if (max == 1)
-return vst1.V_sz_elem\t{%P1}, %A0;
-  else
-return vst1.V_sz_elem\t{%P1[%c2]}, %A0;
+
+  return vst1.V_sz_elem\t{%P1[%c2]}, %A0;
 }
-  [(set (attr neon_type)
-  (if_then_else (eq (const_string V_mode_nunits) (const_int 1))
-(const_string neon_vst1_1_2_regs_vst2_2_regs)
-(const_string neon_vst1_vst2_lane)))])
+  [(set_attr neon_type neon_vst1_vst2_lane)])

 (define_insn neon_vst1_lanemode
   [(set (match_operand:V_elem 0 neon_struct_operand =Um)
-- 
1.7.4.1


Re: [PATCH] Intrinsics for PREFETCHW

2012-07-30 Thread Kirill Yukhin
 Ehm ...

 * gcc.target/i386/sse-13.c: Ditto.
 * gcc.target/i386/sse-14.c: Ditto.
 * g++.dg/other/i386-2.C: Ditto.
 * g++.dg/other/i386-3.C: Ditto.
Sorry, what's wrong here?

 I suggest you implement handling of this builtin in the same way
 rdrand<mode>_1 is implemented. Please also keep names of builtins and
 enums consistent with rdrand. Also, please put new defines just after
 rdrand stuff, they somehow belongs together.
Sure!

 +DEF_FUNCTION_TYPE (UCHAR, UCHAR, UINT, UINT, PINT)
 +DEF_FUNCTION_TYPE (UCHAR, UCHAR, ULONGLONG, ULONGLONG, PINT)
Whoops, removed!

 +  /* RDSEED instructions. */

 Two spaces after the dot.
Fixed!

 +  IX86_BUILTIN_RDSEED16,
 +  IX86_BUILTIN_RDSEED32,
 +  IX86_BUILTIN_RDSEED64,

 Please name this IX86_BUILTIN_RDSEED{16,32,64}_STEP.
Fixed!

 +  /* RDSEED */
 +  def_builtin (OPTION_MASK_ISA_RDSEED, __builtin_ia32_rdseed_hi,
 +  INT_FTYPE_PUSHORT, IX86_BUILTIN_RDSEED16);
 +  def_builtin (OPTION_MASK_ISA_RDSEED, __builtin_ia32_rdseed_si,
 +  INT_FTYPE_PUNSIGNED, IX86_BUILTIN_RDSEED32);
 +  def_builtin (OPTION_MASK_ISA_RDSEED  OPTION_MASK_ISA_64BIT,
 +  __builtin_ia32_rdseed_di,
 +  INT_FTYPE_PULONGLONG, IX86_BUILTIN_RDSEED64);

 __builtin_ia32_rdseed{16,32,64}_step
Fixed!

 +case IX86_BUILTIN_RDSEED16:
 +case IX86_BUILTIN_RDSEED32:
 +case IX86_BUILTIN_RDSEED64:

 Just copy from rdrand handling everything, up to:

   emit_move_insn (gen_rtx_MEM (mode0, op1), op0);

 +  /* Generate random number and save it in OP0.  */

 +  /* Store the result to sum.  */

 +  /* Return current CF value.  */

 No need for comments. BTW: You are probably returning a seed, not a
 random value.
Thanks! Fixed. We need to return success/failure of rdseed execution.
It is set by CF.

 +  emit_insn (gen_rtx_SET (QImode, target,
 +   gen_rtx_LTU (QImode, gen_rtx_REG (CCCmode, FLAGS_REG), 
 const0_rtx)));

 This is wrong. Try following (untested) code:

   op2 = gen_reg_rtx (QImode);

   pat = gen_rtx_LTU (QImode, gen_rtx_REG (CCCmode, FLAGS_REG),
  const0_rtx);
   emit_insn (gen_rtx_SET (VOIDmode, op2, pat));

   if (target == 0)
 target = gen_reg_rtx (SImode);

   emit_insn (gen_zero_extendqisi2 (target, op0));
   return target;
Thanks! I added this (slightly fixed).


 +
 +  UNSPEC_RDSEED

 Needs to be volatile. Please also add comment.
Done.


 Wrong! Please copy pattern from rdrand<mode>_1 (also, please name it
 in the same way).
Done.


 +#if !defined _X86INTRIN_H_INCLUDED  !defined _IMMINTRIN_H_INCLUDED

 No need for immintrin.h check
Done.

Thanks for the review!
Here is the updated patch. Tests passing:
ChangeLog entry:
2012-07-25  Kirill Yukhin  kirill.yuk...@intel.com
Michael Zolotukhin  michael.v.zolotuk...@intel.com

* common/config/i386/i386-common.c (OPTION_MASK_ISA_RDSEED_SET): New.
(OPTION_MASK_ISA_RDSEED_UNSET): Likewise.
(ix86_handle_option): Handle mrdseed option.
* config.gcc (i[34567]86-*-*): Add rdseedintrin.h.
(x86_64-*-*): Likewise.
* config/i386/prfchwintrin.h: New header.
* config/i386/cpuid.h (bit_RDSEED): New.
* config/i386/driver-i386.c (host_detect_local_cpu): Detect
RDSEED support.
* config/i386/i386-c.c: Define __RDSEED__ if needed.
* config/i386/i386.c (ix86_target_string): Define
-mrdseed option.
(PTA_RDSEED): New.
(ix86_option_override_internal): Handle new option.
(ix86_valid_target_attribute_inner_p): Add OPT_mrdseed.
(ix86_builtins): Add enum entries for RDSEED* builtins.
(ix86_init_mmx_sse_builtins): Define new builtins.
(ix86_expand_builtin): Expand RDSEED* builtins.
* config/i386/i386.h (TARGET_RDSEED): New.
* config/i386/i386.md (rdseed<mode>_1): New.
* config/i386/i386.opt (mrdseed): New.
* config/i386/x86intrin.h: Include rdseedintrin.h.

testsuite/ChangeLog unchanged.

Is it OK?

Thanks, K


bdw-rdseed-2.gcc.patch
Description: Binary data


[PATCH] Follow-up to the last gengtype patch: handle DEF_VEC_A in gengtype

2012-07-30 Thread Laurynas Biveinis
I only remembered to add DEF_VEC_A handling to gengtype.c a second after 
committing the previous patch [1].

Here it is, done as a follow up. With some luck, this will be short-lived code 
because of the C++ conversion.

Bootstrapped and regtested on x86_64 linux. OK for trunk?

2012-07-30  Laurynas Biveinis  laurynas.bivei...@gmail.com

* gengtype.h (enum gc_vec_type_kind): New.
List individual vector kinds in the token codes.
* gengtype-lex.l: Handle DEF_VEC_A, DEF_VEC_O, DEF_VEC_P,
DEF_VEC_I individually.
* gengtype-parse.c (token_names): Replace DEF_VEC_[OP] with
individual vector token names.
(def_vec): handle vector token types separately.
(parse_file): handle the new vector token types.
* gengtype.c (note_def_vec): remove is_scalar argument, introduce
vec_type_argument instead.  Create GTY option and resolve the
vector element type according to vec_type_argument value.

Index: gcc/gcc/gengtype-parse.c
===
--- gcc/gcc/gengtype-parse.c	(revision 189950)
+++ gcc/gcc/gengtype-parse.c	(working copy)
@@ -77,9 +77,11 @@
   struct,
   enum,
   VEC,
-  DEF_VEC_[OP],
+  DEF_VEC_A,
+  DEF_VEC_O,
+  DEF_VEC_P,
   DEF_VEC_I,
-  DEF_VEC_ALLOC_[IOP],
+  DEF_VEC_ALLOC_[AIOP],
   ...,
   ptr_alias,
   nested_ptr,
@@ -893,17 +895,37 @@
 
 /* Definition of a generic VEC structure:
 
-   'DEF_VEC_[IPO]' '(' id ')' ';'
+   'DEF_VEC_[AIOP]' '(' id ')' ';'
 
-   Scalar VECs require slightly different treatment than otherwise -
-   that's handled in note_def_vec, we just pass it along.*/
+*/
 static void
 def_vec (void)
 {
-  bool is_scalar = (token () == DEFVEC_I);
+  enum gc_vec_type_kind vec_type_kind;
   const char *type;
 
-  require2 (DEFVEC_OP, DEFVEC_I);
+  switch (token ())
+{
+case DEFVEC_A:
+  vec_type_kind = VEC_TYPE_ATOMIC;
+  advance ();
+  break;
+case DEFVEC_I:
+  vec_type_kind = VEC_TYPE_INTEGRAL;
+  advance ();
+  break;
+case DEFVEC_O:
+  vec_type_kind = VEC_TYPE_OBJECT;
+  advance ();
+  break;
+case DEFVEC_P:
+  vec_type_kind = VEC_TYPE_POINTER;
+  advance ();
+  break;
+default:
+  gcc_unreachable ();
+}
+
   require ('(');
   type = require2 (ID, SCALAR);
   require (')');
@@ -912,13 +934,13 @@
   if (!type)
 return;
 
-  note_def_vec (type, is_scalar, lexer_line);
+  note_def_vec (type, vec_type_kind, lexer_line);
   note_def_vec_alloc (type, none, lexer_line);
 }
 
 /* Definition of an allocation strategy for a VEC structure:
 
-   'DEF_VEC_ALLOC_[IPO]' '(' id ',' id ')' ';'
+   'DEF_VEC_ALLOC_[AIOP]' '(' id ',' id ')' ';'
 
For purposes of gengtype, this just declares a wrapper structure.  */
 static void
@@ -964,7 +986,9 @@
 	  typedef_decl ();
 	  break;
 
-	case DEFVEC_OP:
+	case DEFVEC_A:
+	case DEFVEC_O:
+	case DEFVEC_P:
 	case DEFVEC_I:
 	  def_vec ();
 	  break;
Index: gcc/gcc/gengtype-lex.l
===
--- gcc/gcc/gengtype-lex.l	(revision 189950)
+++ gcc/gcc/gengtype-lex.l	(working copy)
@@ -92,15 +92,23 @@
   return STATIC;
 }
 
-^{HWS}DEF_VEC_[OP]/{EOID} {
+^{HWS}DEF_VEC_A/{EOID} {
   BEGIN(in_struct);
-  return DEFVEC_OP;
+  return DEFVEC_A;
 }
+^{HWS}DEF_VEC_O/{EOID} {
+  BEGIN(in_struct);
+  return DEFVEC_O;
+}
+^{HWS}DEF_VEC_P/{EOID} {
+  BEGIN(in_struct);
+  return DEFVEC_P;
+}
 ^{HWS}DEF_VEC_I/{EOID} {
   BEGIN(in_struct);
   return DEFVEC_I;
 }
-^{HWS}DEF_VEC_ALLOC_[IOP]/{EOID} {
+^{HWS}DEF_VEC_ALLOC_[AIOP]/{EOID} {
   BEGIN(in_struct);
   return DEFVEC_ALLOC;
 }
Index: gcc/gcc/gengtype.c
===
--- gcc/gcc/gengtype.c	(revision 189951)
+++ gcc/gcc/gengtype.c	(working copy)
@@ -4269,28 +4269,39 @@
 
typedef struct VEC_type_base GTY(()) {
struct vec_prefix prefix;
-   type GTY((length (%h.prefix.num))) vec[1];
+   type GTY option vec[1];
} VEC_type_base
 
-   where the GTY(()) tags are only present if is_scalar is _false_.  */
-
+   where the GTY option depends on VEC_TYPE_KIND: it is GTY((atomic)) for
+   VEC_TYPE_ATOMIC, GTY((length (%h.prefix.num))) for VEC_TYPE_OBJECT and
+   VEC_TYPE_POINTER, and none for DEF_VEC_INTEGRAL.
+*/
 void
-note_def_vec (const char *type_name, bool is_scalar, struct fileloc *pos)
+note_def_vec (const char *type_name, enum gc_vec_type_kind vec_type_kind,
+	  struct fileloc *pos)
 {
   pair_p fields;
   type_p t;
   options_p o;
   const char *name = concat (VEC_, type_name, _base, (char *) 0);
 
-  if (is_scalar)
+  switch (vec_type_kind)
 {
+case VEC_TYPE_ATOMIC:
+  t = resolve_typedef (type_name, pos);
+  o = create_string_option(0, atomic, );
+  break;
+case VEC_TYPE_INTEGRAL:
   t = create_scalar_type (type_name);
-  o = 0;
-}
-  else
-{
+  o = NULL;
+  break;
+case VEC_TYPE_OBJECT:
+case VEC_TYPE_POINTER:
   t = 

Re: [PATCH] Intrinsics for PREFETCHW

2012-07-30 Thread Uros Bizjak
On Mon, Jul 30, 2012 at 2:05 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:

 Ehm ...

 * gcc.target/i386/sse-13.c: Ditto.
 * gcc.target/i386/sse-14.c: Ditto.
 * g++.dg/other/i386-2.C: Ditto.
 * g++.dg/other/i386-3.C: Ditto.
 Sorry, what's wrong here?

Not here, but above the "Ehm ..." line you have:

 * gcc.target/i386/sse-12.c: Add -mprfchw.

You should add something else for rdseed.

Uros.


Re: [PATCH] Intrinsics for RDSEED

2012-07-30 Thread Uros Bizjak
On Mon, Jul 30, 2012 at 2:05 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:

 ChangeLog entry:
 2012-07-25  Kirill Yukhin  kirill.yuk...@intel.com
 Michael Zolotukhin  michael.v.zolotuk...@intel.com

 * common/config/i386/i386-common.c (OPTION_MASK_ISA_RDSEED_SET): New.
 (OPTION_MASK_ISA_RDSEED_UNSET): Likewise.
 (ix86_handle_option): Handle mrdseed option.
 * config.gcc (i[34567]86-*-*): Add rdseedintrin.h.
 (x86_64-*-*): Likewise.
 * config/i386/prfchwintrin.h: New header.
 * config/i386/cpuid.h (bit_RDSEED): New.
 * config/i386/driver-i386.c (host_detect_local_cpu): Detect
 RDSEED support.
 * config/i386/i386-c.c: Define __RDSEED__ if needed.
 * config/i386/i386.c (ix86_target_string): Define
 -mrdseed option.
 (PTA_RDSEED): New.
 (ix86_option_override_internal): Handle new option.
 (ix86_valid_target_attribute_inner_p): Add OPT_mrdseed.
 (ix86_builtins): Add enum entries for RDSEED* builtins.
 (ix86_init_mmx_sse_builtins): Define new builtins.
 (ix86_expand_builtin): Expand RDSEED* builtins.
 * config/i386/i386.h (TARGET_RDSEED): New.
 * config/i386/i386.md (rdseed<mode>_1): New.
 * config/i386/i386.opt (mrdseed): New.
 * config/i386/x86intrin.h: Include rdseedintrin.h.

 testsuite/ChangeLog unchanged.

Please put the new insn pattern just after the rdrnd<mode>_1 pattern in the
i386.md file.

OK with that change.

Thanks,
Uros.


Re: RFA: implement C11 _Generic

2012-07-30 Thread Tom Tromey
 Joseph == Joseph S Myers jos...@codesourcery.com writes:

Tom I wasn't really aware of 6.3.2.1, but after reading it and re-reading
Tom 6.5.1.1, I think I agree with his model 0 interpretation: no promotion
Tom or conversion.

Tom I don't have a standards-based reason for this, though; just my belief
Tom that _Generic was omitted from 6.3.2.1 by mistake.

I re-re-read the sections and came up with a standards-based reason:

6.3 is about conversions, and the first paragraph starts "several
operators convert ...".  Based on this, and other such phrases in the
text, I think the entire section applies to operators.

However, _Generic is not called an operator in the text.  It is a
primary expression.

Therefore, 6.3.2 does not apply.

This seems plausible to me, if a bit legalistic.

Joseph * Conversion of qualified to unqualified types (which normally
Joseph would occur as part of lvalue-to-rvalue conversion, but might
Joseph also be an issue with casts to qualified types).  The given
Joseph example of a cbrt type-generic macro would only work as users
Joseph expect (given that an argument might be a variable of type
Joseph const long double, or a cast to const long double) if the
Joseph expression (whether lvalue or rvalue) is converted from
Joseph qualified to unqualified type.

It seems to me that keeping the qualifiers is obviously more useful.
Qualifiers can be stripped in various ways, but once stripped, can't be
regained.

The cbrt example can be salvaged by adding a few extra generic
associations.  This can even be easily done via a macro.

#define CLAUSE(TYPE, EXPR) \
   const TYPE: EXPR,       \
   TYPE: EXPR

#define cbrt(X) _Generic ((X), CLAUSE (double, ...), ...)

Joseph * Conversion of arrays and function designators to pointers.

If the above interpretation is correct then this conversion is not done
either.  IIUC.

Joseph * GCC doesn't have very well-defined semantics for whether an
Joseph rvalue has a qualified type or not.  This only appears as an
Joseph issue with typeof at present, but does require more care about
Joseph eliminating qualifiers for _Generic.

Joseph * build_unary_op has code
[...]

Thanks.  I'll take a look, but these sound a bit scary at first glance.

Tom


Re: [PATCH] Intrinsics for RDSEED

2012-07-30 Thread Kirill Yukhin

 OK with that change.

Thanks a lot!
Checked into the trunk: http://gcc.gnu.org/ml/gcc-cvs/2012-07/msg00878.html

Thanks, K


Re: [PATCH] Follow-up to the last gengtype patch: handle DEF_VEC_A in gengtype

2012-07-30 Thread Steven Bosscher
On Mon, Jul 30, 2012 at 2:41 PM, Laurynas Biveinis
laurynas.bivei...@gmail.com wrote:
 I only remembered to add DEF_VEC_A handling to gengtype.c a second after 
 committing the previous patch [1].

 Here it is, done as a follow up. With some luck, this will be short-lived 
 code because of the C++ conversion.

Hello Laurynas,

Thanks for taking care of this. Unfortunately there seems to be a
deeper problem with this PCH pointer rewriting stuff: it looks like
gengtype generates code that runs in time quadratic in the
GTY((length)) of a GTY array.

See http://gcc.gnu.org/PR53880#c27

Could you please have a look at that problem, and see if you, with all
your GTY-fu, see an easy way out?

Thanks,

Ciao!
Steven


Re: [CFT] s390: Convert from sync to atomic optabs

2012-07-30 Thread Ulrich Weigand
Richard Henderson wrote:

 Tested only as far as cross-compile.  I had a browse through
 objdump of libatomic for a brief sanity check.
 
 Can you please test on real hw and report back?

I'll run a test, but a couple of things I noticed:


/* Shift the values to the correct bit positions.  */
 -  if (!(ac.aligned  MEM_P (cmp)))
 -cmp = s390_expand_mask_and_shift (cmp, mode, ac.shift);
 -  if (!(ac.aligned  MEM_P (new_rtx)))
 -new_rtx = s390_expand_mask_and_shift (new_rtx, mode, ac.shift);
 +  cmp = s390_expand_mask_and_shift (cmp, mode, ac.shift);
 +  new_rtx = s390_expand_mask_and_shift (new_rtx, mode, ac.shift);

This seems to disable use of ICM / STCM to perform byte or
aligned halfword access.  Why is this necessary?  Those operations
are supposed to provide the required operand consistency ...

 +(define_insn atomic_loaddi_1
 +  [(set (match_operand:DI 0 register_operand =f,f)
 + (unspec:DI [(match_operand:DI 1 memory_operand R,m)]
 +UNSPEC_MOVA))]
 +  !TARGET_ZARCH
 +  @
 +   ld %0,%1
 +   ldy %0,%1
 +  [(set_attr op_type RX,RXY)
 +   (set_attr type floaddf)])

This seems to force DImode accesses through floating-point
registers, which is quite inefficient.  Why not allow LM/STM?
Those are supposed to provide doubleword consistency if the
operand is sufficiently aligned ...


[ From the Principles of Operations, section Block-Concurrent
  References:

  The instructions LOAD MULTIPLE (LM), LOAD MULTIPLE
  DISJOINT, LOAD MULTIPLE HIGH, STORE
  MULTIPLE (STM), and STORE MULTIPLE HIGH,
  when the operand or operands start on a word
  boundary; the instructions LOAD MULTIPLE (LMG)
  and STORE MULTIPLE (STMG), when the operand
  starts on a doubleword boundary; and the instructions
  COMPARE LOGICAL (CLC), COMPARE LOGICAL
  CHARACTERS UNDER MASK, INSERT
  CHARACTERS UNDER MASK, LOAD CONTROL
  (LCTLG), STORE CHARACTERS UNDER MASK,
  and STORE CONTROL (STCTG) access their storage
  operands in a left-to-right direction, and all bytes
  accessed within each doubleword appear to be
  accessed concurrently as observed by other CPUs.  ]


Otherwise the patch looks good to me.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com



[PATCH, ARM] RFC: Backtracing through C++ exception-handling constructs

2012-07-30 Thread Julian Brown
Hi,

I've been investigating a patch which we've been using locally to fix
an issue with backtraces (using, e.g., glibc's backtrace() function)
through C++ exception-handling constructs on ARM. The original author of
the patch was Daniel Jacobowitz (please correct me if my understanding
is wrong!).

There are two issues in play here:

1. Exception-handling is handled in a target-specific way for ARM,
defined in the EHABI document (Exception handling ABI for the ARM
architecture, IHI 0038A). However, no mention of forced unwinding is
made in this document.

2. Backtracing in particular isn't even the normal use case for
forced unwinding: e.g.,

http://www.ucw.cz/~hubicka/papers/abi/node25.html#SECTION00923200

suggests that forced unwinding is a single-phase process (phase 2 of
the normal exception-handling process), whereas for producing a
backtrace, something more like a phase 1 lookup is done (no cleanup
handlers are called -- we're merely observing the state of the stack).

So, to be clear, we're definitely dealing with a corner case here. The
problem is that _Unwind_Backtrace in libgcc will fail to make progress
in some cases if it hits a frame with a cleanup associated with it,
leading to unhelpful behaviour like (for the attached program):

bar calling abort
abort handler invoked, depth 25
./test() [0x8968]
../install/arm-none-linux-gnueabi/libc/lib/libc.so.6(__default_rt_sa_restorer_v2+0)
 [0x401cf860]
../install/arm-none-linux-gnueabi/libc/lib/libc.so.6(gsignal+0x40) [0x401ce5e0]
./test() [0x8a4c]
./test() [0x8a4c]
./test() [0x8a4c]
./test() [0x8a4c]
./test() [0x8a4c]
./test() [0x8a4c]
./test() [0x8a4c]
./test() [0x8a4c]
./test() [0x8a4c]
./test() [0x8a4c]
./test() [0x8a4c]
./test() [0x8a4c]
./test() [0x8a4c]
./test() [0x8a4c]
./test() [0x8a4c]
./test() [0x8a4c]
./test() [0x8a4c]
./test() [0x8a4c]
./test() [0x8a4c]
./test() [0x8a4c]
./test() [0x8a4c]
./test() [0x8a4c]

which is clearly wrong, no matter how you look at it. I'll defer to Dan
for a better description of the problem/fix (on a private branch, circa
February 2010):

This bug was a failure of backtrace() when presented with a C++ abort
- in particular, one which inherited throw() but called cout, so
needed its own call to __cxa_call_unexpected.  We'd get stuck in a
loop in _Unwind_Backtrace because the code was not prepared for the
handler to return _URC_HANDLER_FOUND.

The GCC ARM unwinders already have _US_FORCED_UNWIND passed to the
personality routine.  ISTM that forced unwinding when doing
essentially a 'phase 1' lookup has no other useful meaning, and this
is a useful meaning to assign to it: skip handlers, keep unwinding.

The patch still seems to produce a reasonable improvement in
behaviour, giving:

build$ arm-none-linux-gnueabi-g++ test.cc -o test -g

$ ./test
bar calling abort   
abort handler invoked, depth 5  
./test() [0x8968]   
../install/arm-none-linux-gnueabi/libc/lib/libc.so.6(__default_rt_sa_restorer_v2+0)
 [0x401cf860]
../install/arm-none-linux-gnueabi/libc/lib/libc.so.6(gsignal+0x40) [0x401ce5e0] 
./test() [0x8a4c]   
./test() [0x8a4c]

although tbh I'd hope for backtrace_symbols to produce something a
little more useful than that (unrelated to this patch), and I'd also
expect identical results from the test whether the throw () is
present on the declaration of bar, or not -- which unfortunately
isn't the case. Without throw (), we get:

bar calling abort
abort handler invoked, depth 8
./test() [0x897c]
../install/arm-none-linux-gnueabi/libc/lib/libc.so.6(__default_rt_sa_restorer_v2+0)
 [0x401cf860]
../install/arm-none-linux-gnueabi/libc/lib/libc.so.6(gsignal+0x40) [0x401ce5e0]
./test() [0x8a60]
./test() [0x8a60]
./test() [0x8a7c]
./test() [0x8acc]
../install/arm-none-linux-gnueabi/libc/lib/libc.so.6(__libc_start_main+0x114) 
[0x401b8e44]

I.e., it looks like the backtrace progresses all the way to the
outermost frame -- which IIUC, was the intended resulting behaviour for
the attached patch to start with.

So: does anyone have an opinion about whether the attached is a correct
fix, or if the spinning-during-backtrace problem might have a better
solution? (I'm a little fuzzy on the intricate details of all this
stuff!).

Thanks,

Julian

ChangeLog

Daniel Jacobowitz  d...@false.org

libstdc++-v3/
* libsupc++/eh_personality.cc (PERSONALITY_FUNCTION): For
ARM EABI, skip handlers for _US_VIRTUAL_UNWIND_FRAME
| _US_FORCE_UNWIND.
commit eafec6ead2e5a5a5a1b6504311a6a7ec6f0420af
Author: Julian Brown jbr...@build6-lucid-cs.sje.mentorg.com
Date:   Wed Jul 25 11:43:08 2012 -0700

Backtrace through throw.

diff --git a/libstdc++-v3/libsupc++/eh_personality.cc b/libstdc++-v3/libsupc++/eh_personality.cc

[PATCH][5/n] into-SSA TLC

2012-07-30 Thread Richard Guenther

This makes into-SSA no longer rely on variable annotations and instead
uses on-the-side information local to into/update-SSA.  Lookups can
probably be avoided in some places if we pass around the auxiliar
information instead of looking it up all the time.

Bootstrapped and tested on x86_64-unknown-linux-gnu, queued for now.
The remaining var-ann users are remove_unused_locals (the used flag)
and cfgexpands out-of-SSA.

Richard.

2012-07-30  Richard Guenther  rguent...@suse.de

* tree-flow.h (struct var_ann_d): Remove need_phi_state
and current_def members.
* tree-into-ssa.c (struct def_blocks_d): Remove var member.
(def_blocks): Remove.
(struct var_info_d): New.
(var_infos): New hashtable.
(struct ssa_name_info): Add def_blocks member.
(get_ssa_name_ann): Adjust.
(get_var_info): New function.
(get_phi_state, set_phi_state, get_current_def,
set_current_def, get_def_blocks_for, find_def_blocks_for): Adjust.
(insert_phi_nodes_compare_def_blocks): Rename to ...
(insert_phi_nodes_compare_var_infos): ... this and adjust.
(insert_phi_nodes): Adjust.
(dump_tree_ssa, dump_tree_ssa_stats): Adjust.
(def_blocks_hash, def_blocks_eq, def_blocks_free): Remove.
(debug_def_blocks_r): Rename to ...
(debug_var_infos_r): ... this and adjust.
(var_info_hash): New function.
(var_info_eq): Likewise.
(rewrite_blocks): Adjust.
(init_ssa_renamer): Likewise.
(fini_ssa_renamer): Likewise.
(delete_update_ssa): Likewise.
(update_ssa): Likewise.
* tree-ssanames.c (release_dead_ssa_names): Do not clear
current defs.

Index: trunk/gcc/tree-flow.h
===
*** trunk.orig/gcc/tree-flow.h  2012-07-27 15:56:18.0 +0200
--- trunk/gcc/tree-flow.h   2012-07-30 16:19:27.453486374 +0200
*** struct GTY(()) var_ann_d {
*** 184,200 
   applied.  We set this when translating out of SSA form.  */
unsigned used : 1;
  
-   /* This field indicates whether or not the variable may need PHI nodes.
-  See the enum's definition for more detailed information about the
-  states.  */
-   ENUM_BITFIELD (need_phi_state) need_phi_state : 2;
- 
/* Used by var_map for the base index of ssa base variables.  */
unsigned base_index;
- 
-   /* During into-ssa and the dominator optimizer, this field holds the
-  current version of this variable (an SSA_NAME).  */
-   tree current_def;
  };
  
  
--- 184,191 
Index: trunk/gcc/tree-into-ssa.c
===
*** trunk.orig/gcc/tree-into-ssa.c  2012-07-30 14:14:03.0 +0200
--- trunk/gcc/tree-into-ssa.c   2012-07-30 16:25:04.292474732 +0200
*** along with GCC; see the file COPYING3.
*** 52,60 
 definitions for VAR.  */
  struct def_blocks_d
  {
-   /* The variable.  */
-   tree var;
- 
/* Blocks that contain definitions of VAR.  Bit I will be set if the
   Ith block contains a definition of VAR.  */
bitmap def_blocks;
--- 52,57 
*** struct def_blocks_d
*** 69,86 
  
  typedef struct def_blocks_d *def_blocks_p;
  
- DEF_VEC_P(def_blocks_p);
- DEF_VEC_ALLOC_P(def_blocks_p,heap);
- 
- 
- /* Each entry in DEF_BLOCKS contains an element of type STRUCT
-DEF_BLOCKS_D, mapping a variable VAR to a bitmap describing all the
-basic blocks where VAR is defined (assigned a new value).  It also
-contains a bitmap of all the blocks where VAR is live-on-entry
-(i.e., there is a use of VAR in block B without a preceding
-definition in B).  The live-on-entry information is used when
-computing PHI pruning heuristics.  */
- static htab_t def_blocks;
  
  /* Stack of trees used to restore the global currdefs to its original
 state after completing rewriting of a block and its dominator
--- 66,71 
*** struct mark_def_sites_global_data
*** 142,147 
--- 127,161 
  };
  
  
+ /* Information stored for decls.  */
+ struct var_info_d
+ {
+   /* The variable.  */
+   tree var;
+ 
+   /* This field indicates whether or not the variable may need PHI nodes.
+  See the enum's definition for more detailed information about the
+  states.  */
+   ENUM_BITFIELD (need_phi_state) need_phi_state : 2;
+ 
+   /* The current reaching definition replacing this SSA name.  */
+   tree current_def;
+ 
+   /* Definitions for this VAR.  */
+   struct def_blocks_d def_blocks;
+ };
+ 
+ /* The information associated with decls.  */
+ typedef struct var_info_d *var_info_p;
+ 
+ DEF_VEC_P(var_info_p);
+ DEF_VEC_ALLOC_P(var_info_p,heap);
+ 
+ /* Each entry in VAR_INFOS contains an element of type STRUCT 
+VAR_INFO_D.  */
+ static htab_t var_infos;
+ 
+ 
  /* Information stored for SSA names.  */
  struct ssa_name_info
  {
*** struct ssa_name_info
*** 160,165 

Re: RFA: implement C11 _Generic

2012-07-30 Thread Joseph S. Myers
On Mon, 30 Jul 2012, Tom Tromey wrote:

 6.3 is about conversions, and the first paragraph starts "several
 operators convert ...".  Based on this, and other such phrases in the
 text, I think the entire section applies to operators.

6.3.2.1 paragraphs 2 and 3 are phrased in terms of operators *preventing* 
conversion and certain conversions happening unless there is an operator 
to prevent them.

It seems entirely clear that you can use a function designator inside a 
braced initializer for an array of function pointers, for example, 
although an element of such an initializer is not an operand of an 
operator.

(I cannot find anything in a quick look through the ISO/IEC Directives 
Part 2 to indicate whether subclause titles such as "Other operands" are 
meant to be normative or informative.)

 However, _Generic is not called an operator in the text.  It is a
 primary expression.

A generic-selection is a primary expression.  _Generic is a keyword that 
is part of the syntax for a generic-selection.  C99 and C11 do not have a 
complete syntactic definition for operator anywhere (C90 did have such a 
syntax production).

 The cbrt example can be salvaged by adding a few extra generic
 associations.  This can even be easily done via a macro.
 
  #define CLAUSE(TYPE, EXPR) \
 const TYPE: EXPR,   \
 TYPE: EXPR
 
 #define cbrt(X) _Generic ((X), CLAUSE (double, ...), ...)

That doesn't handle volatile, or _Atomic.  If dealing with pointers you 
have restrict as well.  And in the presence of TR 18037 you have address 
space qualifiers, meaning such a macro cannot be written in a way agnostic 
to the set of address spaces on the system where the code is used.  
(Using +(X) or (X)+0 may work in some cases - if the promotions those 
cause aren't problems and it isn't a case where those operations are 
invalid.)

And to keep qualifiers here you'd need various other parts of the standard 
to be much clearer about whether qualifiers are present in the types of 
rvalues - an issue that previously hasn't been relevant.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH v2] Target-specific limits on vector alignment

2012-07-30 Thread Ulrich Weigand
Richard Guenther wrote:
 On Fri, Jul 27, 2012 at 5:24 PM, Ulrich Weigand uweig...@de.ibm.com wrote:
  OK for mainline?
 
 Ok.  Please add to the documentation that the default vector alignment
 has to be a power-of-two multiple of the default vector element alignment.

Committed, thanks.  The documentation now reads:

+  This hook can be used to define the alignment for a vector of type\n\
+ @var{type}, in order to comply with a platform ABI.  The default is to\n\
+ require natural alignment for vector types.  The alignment returned by\n\
+ this hook must be a power-of-two multiple of the default alignment of\n\
+ the vector element type.,

 You probably want to double-check vector_alignment_reachable_p as well
 which checks whether vector alignment can be reached by peeling off
 scalar iterations.

I've looked at the ARM implementation, and it still seems to be correct
(and efficient) with the vector alignment change (basically, unless the
element type is packed, everything is reachable).

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com



Re: [patch[ Add explanations to sbitmap, bitmap, and sparseset

2012-07-30 Thread Peter Bergner
On Fri, 27 Jul 2012 15:40:35 +0200 Richard Guenther 
richard.guent...@gmail.com wrote:
 Also it looks less efficient than sbitmap in the case when
 your main operation is adding to the set and querying the set randomly.

How so?  Adding a member to or deleting one from a sparseset is an O(1)
operation, as is querying whether something is or isn't a member of a
sparseset.  Or are you talking about being slower by some small constant factor?


 Its space overhead is really huge - for smaller universes a smaller
 SPARSESET_ELT_TYPE would be nice, templates to the rescue!  I
 wonder in which cases a unsigned HOST_WIDEST_FAST_INT sized
 universe is even useful (but a short instead of an int is probably too
 small ...)

Yes, the space overhead is large, but that extra overhead allows sparseset
to have O(1) operations for most set functions and O(N) for iterating over
the members of the set.  Obviously, you don't want to use this for general
set usage, but where speed is critical, it has its uses.

Peter
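For readers unfamiliar with the trick, here is a minimal sketch of the
sparse-set representation under discussion (illustrative names, not GCC's
actual sparseset API): two arrays trade space for O(1) add, delete, and
membership test, and iteration visits only the live members via dense[].

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch of the sparse-set scheme: sparse[] maps an element to its
   slot in dense[], and dense[] holds the members packed at the
   front.  calloc on sparse[] keeps the membership test well defined
   for never-added elements.  */
typedef struct
{
  unsigned *sparse;   /* indexed by element: its slot in dense[] */
  unsigned *dense;    /* the members, packed at the front */
  unsigned n;         /* cardinality */
} sset;

static sset *
sset_new (unsigned universe)
{
  sset *s = malloc (sizeof *s);
  s->sparse = calloc (universe, sizeof *s->sparse);
  s->dense = malloc (universe * sizeof *s->dense);
  s->n = 0;
  return s;
}

/* O(1) membership: e is in the set iff its sparse slot points at a
   live dense entry that points back at e.  */
static int
sset_member (const sset *s, unsigned e)
{
  return s->sparse[e] < s->n && s->dense[s->sparse[e]] == e;
}

static void
sset_add (sset *s, unsigned e)
{
  if (!sset_member (s, e))
    {
      s->dense[s->n] = e;
      s->sparse[e] = s->n++;
    }
}

/* O(1) delete: move the last member into the vacated dense slot.  */
static void
sset_delete (sset *s, unsigned e)
{
  if (sset_member (s, e))
    {
      unsigned last = s->dense[--s->n];
      s->dense[s->sparse[e]] = last;
      s->sparse[last] = s->sparse[e];
    }
}
```

Iterating is then just a walk over dense[0..n), which is the O(N) property
Peter refers to.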



Re: RFA: implement C11 _Generic

2012-07-30 Thread Tom Tromey
 Joseph == Joseph S Myers jos...@codesourcery.com writes:

Joseph On Mon, 30 Jul 2012, Tom Tromey wrote:
 6.3 is about conversions, and the first paragraph starts "several
 operators convert".  Based on this, and other such phrases in the
 text, I think the entire section applies to operators.

Joseph 6.3.2.1 paragraphs 2 and 3 are phrased in terms of operators
Joseph *preventing* conversion and certain conversions happening unless
Joseph there is an operator to prevent them.

Wow, I really don't read it that way at all.  Looking at it yet again,
now, I can't even really make it come out that way.

I think it is now clear that I ought to drop this.

Tom


Re: [patch[ Add explanations to sbitmap, bitmap, and sparseset

2012-07-30 Thread Richard Guenther
On Mon, Jul 30, 2012 at 4:43 PM, Peter Bergner berg...@vnet.ibm.com wrote:
 On Fri, 27 Jul 2012 15:40:35 +0200 Richard Guenther 
 richard.guent...@gmail.com wrote:
 Also it looks less efficient than sbitmap in the case when
 your main operation is adding to the set and querying the set randomly.

 How so?  Adding a member to or deleting one from a sparseset is an O(1)
 operation, as is querying whether something is/isn't a member of a
 sparseset.  Or are you talking about slower by some small constant factor?

No, but less space efficient and of comparable speed as sbitmap which
is also O(1).

 Its space overhead is really huge - for smaller universes a smaller
 SPARSESET_ELT_TYPE would be nice, templates to the rescue!  I
 wonder in which cases an unsigned HOST_WIDEST_FAST_INT sized
 universe is even useful (but a short instead of an int is probably too
 small ...)

 Yes, the space overhead is large, but that extra overhead allows sparseset
 to have O(1) operations for most set functions and O(N) for iterating over
 the members of the set.  Obviously, you don't want to use this for general
 set usage, but where speed is critical, it has its uses.

True.

Richard.

 Peter



Re: [patch[ Add explanations to sbitmap, bitmap, and sparseset

2012-07-30 Thread Steven Bosscher
On Mon, Jul 30, 2012 at 4:53 PM, Richard Guenther
richard.guent...@gmail.com wrote:
 No, but less space efficient and of comparable speed as sbitmap which
 is also O(1).

But iterating an sbitmap has worse complexity than sparseset.

Ciao!
Steven


Re: [CFT] s390: Convert from sync to atomic optabs

2012-07-30 Thread Richard Henderson
On 2012-07-30 07:09, Ulrich Weigand wrote:
 Richard Henderson wrote:
 
 Tested only as far as cross-compile.  I had a browse through
 objdump of libatomic for a brief sanity check.

 Can you please test on real hw and report back?
 
 I'll run a test, but a couple of things I noticed:
 
 
/* Shift the values to the correct bit positions.  */
 -  if (!(ac.aligned && MEM_P (cmp)))
 -cmp = s390_expand_mask_and_shift (cmp, mode, ac.shift);
 -  if (!(ac.aligned && MEM_P (new_rtx)))
 -new_rtx = s390_expand_mask_and_shift (new_rtx, mode, ac.shift);
 +  cmp = s390_expand_mask_and_shift (cmp, mode, ac.shift);
 +  new_rtx = s390_expand_mask_and_shift (new_rtx, mode, ac.shift);
 
 This seems to disable use of ICM / STCM to perform byte or
 aligned halfword access.  Why is this necessary?  Those operations
 are supposed to provide the required operand consistency ...

Because MEM_P for cmp and new_rtx are always false.  The expander
always requests register_operand for those.  I suppose I could back
out merging those cases into the macro.

I presume a good test case to examine for ICM is with such an operand
coming from a global.  What about STCM?  I don't see the output from
sync_compare_and_swap ever being allowed in memory...

 This seems to force DImode accesses through floating-point
 registers, which is quite inefficient.  Why not allow LM/STM?
 Those are supposed to provide doubleword consistency if the
 operand is sufficiently aligned ...

... because I only looked at the definition of LM which itself
doesn't mention consistency, and the definition of LPQ which talks
about LM not being suitable for quadword consistency, and came to
the wrong conclusion.

So now, looking at movdi_31, I see two problems that prevent just
using a normal move for the atomic_load/store_di: the o/d and d/b
alternatives which are split.  Is there some specific goodness that
those alternatives provide that is not had by reloading into the
Q/S memory patterns?


r~


[Ada] New restriction for lock-free implementation

2012-07-30 Thread Arnaud Charlet
This patch implements a new lock-free restriction: implicit dereferences
of access values, like explicit dereferences, now prevent the lock-free
implementation of protected objects.

The test below illustrates the new lock-free restriction:


-- Source --


generic
   type Elmt_Type is private;
   type Elmt_Access is access Elmt_Type;

package Test is
   type Node_Type;
   type Node_Access is access all Node_Type;

   type Node_Type is limited record
  Elmt : Elmt_Access;
  Prev : Node_Access;
   end record;

   protected List with Lock_Free is
  procedure Swap (L, R : Node_Access);
   private
  L : Node_Access := null;
   end List;
end Test;

package body Test is
   protected body List is
  --
  -- Swap --
  --

  procedure Swap (L, R : Node_Access) is
 LP : constant Node_Access := L.Prev;
 RP : constant Node_Access := R.Prev;

  begin
 L.Prev := RP;
 R.Prev := LP;
  end Swap;
   end List;
end Test;

-
-- Compilation --
-

$ gnatmake -q -gnat12 test.adb
test.adb:7:07: illegal body when Lock_Free given
test.adb:8:40: dereference of access value not allowed
test.adb:9:40: dereference of access value not allowed
test.adb:12:11: dereference of access value not allowed
test.adb:13:11: dereference of access value not allowed

Tested on x86_64-pc-linux-gnu, committed on trunk

2012-07-30  Vincent Pucci  pu...@adacore.com

* sem_ch9.adb (Allows_Lock_Free_Implementation): Restrict implicit
dereferences of access values.

Index: sem_ch9.adb
===
--- sem_ch9.adb (revision 189974)
+++ sem_ch9.adb (working copy)
@@ -411,12 +411,15 @@
 
 return Abandon;
 
- --  Explicit dereferences restricted (i.e. dereferences of
- --  access values).
+ --  Dereferences of access values restricted
 
- elsif Kind = N_Explicit_Dereference then
+ elsif Kind = N_Explicit_Dereference
+   or else (Kind = N_Selected_Component
+ and then Is_Access_Type (Etype (Prefix (N))))
+ then
 if Lock_Free_Given then
-   Error_Msg_N ("explicit dereference not allowed", N);
+   Error_Msg_N ("dereference of access value " &
+"not allowed", N);
return Skip;
 end if;
 


Re: [patch[ Add explanations to sbitmap, bitmap, and sparseset

2012-07-30 Thread Richard Guenther
On Mon, Jul 30, 2012 at 5:14 PM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Mon, Jul 30, 2012 at 5:05 PM, Steven Bosscher stevenb@gmail.com 
 wrote:
 On Mon, Jul 30, 2012 at 4:53 PM, Richard Guenther
 richard.guent...@gmail.com wrote:
 No, but less space efficient and of comparable speed as sbitmap which
 is also O(1).

 But iterating an sbitmap has worse complexity than sparseset.

 Which is why I mentioned the common idiom of only random set and query
 operations.  The docs seem to suggest sparseset is appropriate there.

And even if we add iterating, a combination of an sbitmap plus a VEC of elements
is cheaper if you don't remove elements from the set.
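That idiom can be sketched as follows (illustrative names, not GCC's actual
sbitmap/VEC API): a bit array answers membership in O(1), and a companion
vector of inserted elements makes iteration proportional to the cardinality
rather than the universe size.

```c
#include <assert.h>
#include <stdlib.h>

/* Bitmap-plus-vector set: valid only when elements are never
   removed, which is the case Richard describes.  One byte per
   element in bits[] keeps the sketch simple; a real bitmap would
   pack bits.  */
typedef struct
{
  unsigned char *bits;   /* membership flags */
  unsigned *elts;        /* insertion order, for iteration */
  unsigned n;
} bitset_vec;

static bitset_vec *
bv_new (unsigned universe)
{
  bitset_vec *s = malloc (sizeof *s);
  s->bits = calloc (universe, 1);
  s->elts = malloc (universe * sizeof *s->elts);
  s->n = 0;
  return s;
}

static void
bv_add (bitset_vec *s, unsigned e)
{
  if (!s->bits[e])
    {
      s->bits[e] = 1;
      s->elts[s->n++] = e;
    }
}

static int
bv_member (const bitset_vec *s, unsigned e)
{
  return s->bits[e];
}
```

Iteration walks elts[0..n); deletion would leave stale entries in elts[],
which is why the idiom only applies to grow-only sets.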

Richard.

 Richard.

 Ciao!
 Steven


[Ada] Fix handling of -A binder argument by gnatmake

2012-07-30 Thread Arnaud Charlet
This change fixes the circuitry that passes binder flags from gnatmake:
for some switches, relative path arguments are changed to absolute paths.
However, for gnatbind the -A switch must not undergo this transformation.

Tested on x86_64-pc-linux-gnu, committed on trunk

2012-07-30  Thomas Quinot  qui...@adacore.com

* gnatcmd.adb, make.adb, makeutl.adb, makeutl.ads
(Test_If_Relative_Path): Rename to Ensure_Absolute_Path to better
reflect what this subprogram does. Rename argument Including_L_Switch
to For_Gnatbind, and also exempt -A from rewriting.
* bindusg.adb: Document optional =file argument to gnatbind -A.

Index: bindusg.adb
===
--- bindusg.adb (revision 189974)
+++ bindusg.adb (working copy)
@@ -6,7 +6,7 @@
 --  --
 --B o d y   --
 --  --
---  Copyright (C) 1992-2011, Free Software Foundation, Inc. --
+--  Copyright (C) 1992-2012, Free Software Foundation, Inc. --
 --  --
 -- GNAT is free software;  you can  redistribute it  and/or modify it under --
 -- terms of the  GNU General Public License as published  by the Free Soft- --
@@ -78,7 +78,7 @@
 
   --  Line for -A switch
 
-  Write_Line ("  -A Give list of ALI files in partition");
+  Write_Line ("  -A[=file] Give list of ALI files in partition");
 
   --  Line for -b switch
 
Index: gnatcmd.adb
===
--- gnatcmd.adb (revision 189974)
+++ gnatcmd.adb (working copy)
@@ -273,7 +273,7 @@
--  Add the -L and -l switches to the linker for all of the library
--  projects.
 
-   procedure Test_If_Relative_Path
+   procedure Ensure_Absolute_Path
  (Switch : in out String_Access;
   Parent : String);
--  Test if Switch is a relative search path switch. If it is and it
@@ -1303,20 +1303,20 @@
end Set_Library_For;
 
---
-   -- Test_If_Relative_Path --
+   -- Ensure_Absolute_Path --
---
 
-   procedure Test_If_Relative_Path
+   procedure Ensure_Absolute_Path
  (Switch : in out String_Access;
   Parent : String)
is
begin
-  Makeutl.Test_If_Relative_Path
+  Makeutl.Ensure_Absolute_Path
 (Switch, Parent,
 Do_Fail  => Osint.Fail'Access,
 Including_Non_Switch => False,
 Including_RTS=> True);
-   end Test_If_Relative_Path;
+   end Ensure_Absolute_Path;
 
---
-- Non_VMS_Usage --
@@ -2387,7 +2387,7 @@
 --  arguments.
 
 for J in 1 .. Last_Switches.Last loop
-   GNATCmd.Test_If_Relative_Path
+   GNATCmd.Ensure_Absolute_Path
  (Last_Switches.Table (J), Current_Work_Dir);
 end loop;
 
@@ -2397,7 +2397,7 @@
Project_Dir : constant String := Name_Buffer (1 .. Name_Len);
 begin
for J in 1 .. First_Switches.Last loop
-  GNATCmd.Test_If_Relative_Path
+  GNATCmd.Ensure_Absolute_Path
 (First_Switches.Table (J), Project_Dir);
end loop;
 end;
Index: make.adb
===
--- make.adb(revision 189974)
+++ make.adb(working copy)
@@ -2366,7 +2366,7 @@
  Last_New := Last_New + 1;
  New_Args (Last_New) :=
new String'(Name_Buffer (1 .. Name_Len));
- Test_If_Relative_Path
+ Ensure_Absolute_Path
(New_Args (Last_New),
 Do_Fail  => Make_Failed'Access,
 Parent   => Dir_Path,
@@ -2399,7 +2399,7 @@
 Directory.Display_Name);
 
  begin
-Test_If_Relative_Path
+Ensure_Absolute_Path
   (New_Args (1),
Do_Fail  => Make_Failed'Access,
Parent   => Dir_Path,
@@ -5028,36 +5028,36 @@
   Get_Name_String (Main_Project.Directory.Display_Name);
   begin
  for J in 1 .. Binder_Switches.Last loop
-Test_If_Relative_Path
+Ensure_Absolute_Path
   (Binder_Switches.Table (J),
Do_Fail => Make_Failed'Access,
-   Parent => Dir_Path, Including_L_Switch => False);
+   Parent => Dir_Path,

Re: [patch[ Add explanations to sbitmap, bitmap, and sparseset

2012-07-30 Thread Richard Guenther
On Mon, Jul 30, 2012 at 5:05 PM, Steven Bosscher stevenb@gmail.com wrote:
 On Mon, Jul 30, 2012 at 4:53 PM, Richard Guenther
 richard.guent...@gmail.com wrote:
 No, but less space efficient and of comparable speed as sbitmap which
 is also O(1).

 But iterating an sbitmap has worse complexity than sparseset.

Which is why I mentioned the common idiom of only random set and query
operations.  The docs seem to suggest sparseset is appropriate there.

Richard.

 Ciao!
 Steven


[Ada] Fix value conversions for socket timeouts on Windows

2012-07-30 Thread Arnaud Charlet
This change adds a special case to Get_Socket_Option and Set_Socket_Option
to account for a deviation of Windows' behaviour with respect to the
standard sockets API: on that target, SO_RCVTIMEO and SO_SNDTIMEO expect
a DWORD containing a milliseconds count, not a struct timeval, and furthermore
if this milliseconds count is non-zero, then the actual timeout is 500 ms
greater.
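The round trip described above can be sketched in C (hypothetical helper
names; the actual change is the Ada diff below): the setter converts seconds
to a milliseconds DWORD and compensates for the extra 500 ms Windows adds,
and the getter re-adds it.

```c
#include <assert.h>

/* Sketch of the Windows SO_RCVTIMEO/SO_SNDTIMEO conversion: the OS
   takes a milliseconds count, and any non-zero value behaves as if
   it were 500 ms larger.  A non-zero timeout below the compensation
   threshold is clamped to 1 ms so it stays non-zero.  */
static unsigned
timeout_to_win_ms (double seconds)
{
  unsigned ms = (unsigned) (seconds * 1000.0 + 0.5);
  if (ms > 500)
    return ms - 500;
  return ms > 0 ? 1 : 0;
}

static double
win_ms_to_timeout (unsigned ms)
{
  /* Reverse direction: the effective timeout observed by the caller
     is the stored count plus Windows' 500 ms (unless zero).  */
  return ms == 0 ? 0.0 : ms * 0.001 + 0.500;
}
```

Note the asymmetry: values of 500 ms or less cannot be represented exactly,
which is inherent in the Windows behaviour the patch works around.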

No test (timing issue).

Tested on x86_64-pc-linux-gnu, committed on trunk

2012-07-30  Thomas Quinot  qui...@adacore.com

* g-socket.adb (Get_Socket_Option, Set_Socket_Option): On Windows, the
value is a milliseconds count in a DWORD, not a struct timeval.

Index: g-socket.adb
===
--- g-socket.adb(revision 189974)
+++ g-socket.adb(working copy)
@@ -6,7 +6,7 @@
 --  --
 -- B o d y  --
 --  --
--- Copyright (C) 2001-2011, AdaCore --
+-- Copyright (C) 2001-2012, AdaCore --
 --  --
 -- GNAT is free software;  you can  redistribute it  and/or modify it under --
 -- terms of the  GNU General Public License as published  by the Free Soft- --
@@ -1112,6 +1112,7 @@
   Level  : Level_Type := Socket_Level;
   Name   : Option_Name) return Option_Type
is
+  use SOSC;
   use type C.unsigned_char;
 
   V8  : aliased Two_Ints;
@@ -1144,9 +1145,23 @@
 
  when Send_Timeout|
  Receive_Timeout =>
-Len := VT'Size / 8;
-Add := VT'Address;
 
+--  The standard argument for SO_RCVTIMEO and SO_SNDTIMEO is a
+--  struct timeval, but on Windows it is a milliseconds count in
+--  a DWORD.
+
+pragma Warnings (Off);
+if Target_OS = Windows then
+   pragma Warnings (On);
+
+   Len := V4'Size / 8;
+   Add := V4'Address;
+
+else
+   Len := VT'Size / 8;
+   Add := VT'Address;
+end if;
+
  when Linger  |
   Add_Membership  |
   Drop_Membership =>
@@ -1201,7 +1216,23 @@
 
  when Send_Timeout|
   Receive_Timeout =>
-Opt.Timeout := To_Duration (VT);
+
+pragma Warnings (Off);
+if Target_OS = Windows then
+   pragma Warnings (On);
+
+   --  Timeout is in milliseconds, actual value is 500 ms +
+   --  returned value (unless it is 0).
+
+   if V4 = 0 then
+  Opt.Timeout := 0.0;
+   else
+  Opt.Timeout := Natural (V4) * 0.001 + 0.500;
+   end if;
+
+else
+   Opt.Timeout := To_Duration (VT);
+end if;
   end case;
 
   return Opt;
@@ -2176,6 +2207,8 @@
   Level  : Level_Type := Socket_Level;
   Option : Option_Type)
is
+  use SOSC;
+
   V8  : aliased Two_Ints;
   V4  : aliased C.int;
   V1  : aliased C.unsigned_char;
@@ -2236,10 +2269,33 @@
 
  when Send_Timeout|
   Receive_Timeout =>
-VT  := To_Timeval (Option.Timeout);
-Len := VT'Size / 8;
-Add := VT'Address;
 
+pragma Warnings (Off);
+if Target_OS = Windows then
+   pragma Warnings (On);
+
+   --  On Windows, the timeout is a DWORD in milliseconds, and
+   --  the actual timeout is 500 ms + the given value (unless it
+   --  is 0).
+
+   V4 := C.int (Option.Timeout / 0.001);
+
+   if V4 > 500 then
+  V4 := V4 - 500;
+
+   elsif V4 > 0 then
+  V4 := 1;
+   end if;
+
+   Len := V4'Size / 8;
+   Add := V4'Address;
+
+else
+   VT  := To_Timeval (Option.Timeout);
+   Len := VT'Size / 8;
+   Add := VT'Address;
+end if;
+
   end case;
 
   Res := C_Setsockopt


Re: [SH] PR 39423

2012-07-30 Thread Richard Henderson
On 2012-07-29 15:56, Oleg Endo wrote:
 +   can_create_pseudo_p ()
 +  [(set (match_dup 5) (ashift:SI (match_dup 1) (match_dup 2)))
 +   (set (match_dup 6) (plus:SI (match_dup 5) (match_dup 3)))
 +   (set (match_dup 0) (mem:SI (plus:SI (match_dup 6) (match_dup 4]

Don't create new mems like this -- you've lost alias info.
You need to use replace_equiv_address or something on the
original memory.  Which means you have to actually capture
the memory operand somehow.

Better to use a custom predicate to match these memories with
these complex addresses, rather than list them out each time:

  [(set (match_operand:SI 0 "arith_reg_dest" "=r")
(match_operand:SI 1 "mem_index_disp_operand" "m"))]


r~


Re: [PATCH] delete last traces of GO_IF_MODE_DEPENDENT_ADDRESS

2012-07-30 Thread Richard Henderson
On 2012-07-27 16:21, Nathan Froyd wrote:
   * defaults.h (GO_IF_MODE_DEPENDENT_ADDRESS): Delete.
   * targhooks.c (default_mode_dependent_address_p): Delete code
   for GO_IF_MODE_DEPENDENT_ADDRESS.
   * system.h (GO_IF_MODE_DEPENDENT_ADDRESS): Poison.
   * doc/tm.texi.in (GO_IF_MODE_DEPENDENT_ADDRESS): Delete documention.
   * doc/tm.texi: Regenerate.
   * config/alpha.h (GO_IF_MODE_DEPENDENT_ADDRESS): Move code to...
   * config/alpha.c (alpha_mode_dependent_address_p): ...here.  New
   function.
   (TARGET_MODE_DEPENDENT_ADDRESS_P): Define.
   * config/cr16/cr16.h (GO_IF_MODE_DEPENDENT_ADDRESS): Delete.
   * config/mep/mep.h (GO_IF_MODE_DEPENDENT_ADDRESS): Delete.
   * config/vax/vax-protos.h (vax_mode_dependent_address_p): Delete.
   * config/vax/vax.h (GO_IF_MODE_DEPENDENT_ADDRESS): Delete.
   * config/vax/vax.c (vax_mode_dependent_address_p): Make static.
   Take a const_rtx.
   (TARGET_MODE_DEPENDENT_ADDRESS_P): Define.

Ok.


r~


Re: RFA: implement C11 _Generic

2012-07-30 Thread Joseph S. Myers
On Mon, 30 Jul 2012, Tom Tromey wrote:

  Joseph == Joseph S Myers jos...@codesourcery.com writes:
 
 Joseph On Mon, 30 Jul 2012, Tom Tromey wrote:
  6.3 is about conversions, and the first paragraph starts "several
  operators convert".  Based on this, and other such phrases in the
  text, I think the entire section applies to operators.
 
 Joseph 6.3.2.1 paragraphs 2 and 3 are phrased in terms of operators
 Joseph *preventing* conversion and certain conversions happening unless
 Joseph there is an operator to prevent them.
 
 Wow, I really don't read it that way at all.  Looking at it yet again,
 now, I can't even really make it come out that way.

"Except when it is the operand of [...] an lvalue [...] is converted
[...]" seems straightforward enough to me.  Thus, as another example

volatile int *p;
void f(void) { *p; }

dereferences the pointer when f is called (the lvalue *p is converted to 
an rvalue in the expression statement, without *p being an operand of an 
operator; if it were an operand of unary '&', that would stop the 
conversion).
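A small C11 illustration of this reading (behavior as GCC implements it): a
const-qualified lvalue undergoes lvalue conversion, dropping its qualifier,
when it is the controlling expression of _Generic, so the plain int
association is selected.

```c
#include <assert.h>
#include <string.h>

/* The const lvalue 'ci' is converted to an rvalue of type int when
   used as the controlling expression of _Generic, at least under
   the lvalue-conversion reading discussed above and as GCC behaves
   in practice.  */
static const int ci = 42;

static const char *
classify (void)
{
  return _Generic (ci, int: "int", default: "other");
}
```

If qualifiers survived, the default association would be chosen instead,
which is precisely the ambiguity the thread is debating.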

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [CFT] s390: Convert from sync to atomic optabs

2012-07-30 Thread Ulrich Weigand
Richard Henderson wrote:
 On 2012-07-30 07:09, Ulrich Weigand wrote:
  This seems to disable use of ICM / STCM to perform byte or
  aligned halfword access.  Why is this necessary?  Those operations
  are supposed to provide the required operand consistency ...
 
 Because MEM_P for cmp and new_rtx are always false.  The expander
 always requests register_operand for those.  I suppose I could back
 out merging those cases into the macro.

Right, that's one of the reasons why we had two separate macros
for sync_compare_and_swap ...

 I presume a good test case to examine for ICM is with such an operand
 coming from a global.  What about STCM?  I don't see the output from
 sync_compare_and_swap ever being allowed in memory...

Actually, it's only ICM that is of interest here; it should get used when
either the comparison value or the new value come from a memory location,
e.g. a global.  Sorry, I was confused about STCM ...

  This seems to force DImode accesses through floating-point
  registers, which is quite inefficient.  Why not allow LM/STM?
  Those are supposed to provide doubleword consistency if the
  operand is sufficiently aligned ...
 
 ... because I only looked at the definition of LM which itself
 doesn't mention consistency, and the definition of LPQ which talks
 about LM not being suitable for quadword consistency, and came to
 the wrong conclusion.
 
 So now, looking at movdi_31, I see two problems that prevent just
 using a normal move for the atomic_load/store_di: the o/d and d/b
 alternatives which are split.  Is there some specific goodness that
 those alternatives provide that is not had by reloading into the
 Q/S memory patterns?

Well, they are there as splitters because reload assumes all moves
are handled somewhere, either by the mov pattern or else via a
secondary reload.  I've implemented all moves that *can* be
implemented without an extra register via splitters on the
mov pattern, and only those that absolute require the extra
register via secondary reload ...

Given that, it's probably best to use a separate instruction for
the DImode atomic moves after all, but allow GPRs using LM/STM.
(Only for Q/S constraint type addresses.  For those instructions,
we have to reload the address instead of performing two moves.)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com



[committed] Fix handling of constant doubles in expand_mult

2012-07-30 Thread John David Anglin
Committed as obvious.

Tested on hppa2.0w-hp-hpux11.11 and hppa-unknown-linux-gnu.

Dave
-- 
J. David Anglin  dave.ang...@nrc-cnrc.gc.ca
National Research Council of Canada  (613) 990-0752 (FAX: 952-6602)

2012-07-30  John David Anglin  dave.ang...@nrc-cnrc.gc.ca

PR middle-end/53823
* expmed.c (expand_mult): Skip synth_mult for constant double op1 except
for special cases.  Don't initialize coeff and is_neg.

Index: expmed.c
===
--- expmed.c(revision 189920)
+++ expmed.c(working copy)
@@ -3176,8 +3176,8 @@
   if (INTEGRAL_MODE_P (mode))
 {
   rtx fake_reg;
-  HOST_WIDE_INT coeff = 0;
-  bool is_neg = false;
+  HOST_WIDE_INT coeff;
+  bool is_neg;
   int mode_bitsize;
 
   if (op1 == CONST0_RTX (mode))
@@ -3230,6 +3230,8 @@
}
  goto skip_synth;
}
+ else
+   goto skip_synth;
}
   else
goto skip_synth;


Re: [PATCH, ARM] RFC: Backtracing through C++ exception-handling constructs

2012-07-30 Thread Andrew Haley
On 07/30/2012 03:18 PM, Julian Brown wrote:
 There are two issues in play here:
 
 1. Exception-handling is handled in a target-specific way for ARM,
 defined in the EHABI document (Exception handling ABI for the ARM
 architecture, IHI 0038A). However, no mention of forced unwinding is
 made in this document.
 
 2. Backtracing in particular isn't even the normal use case for
 forced unwinding: e.g.,

 http://www.ucw.cz/~hubicka/papers/abi/node25.html#SECTION00923200
 
 suggests that forced unwinding is a single-phase process (phase 2 of
 the normal exception-handling process), whereas for producing a
 backtrace, something more like a phase 1 lookup is done (no cleanup
 handlers are called -- we're merely observing the state of the stack).

That's right.  I wrote that code.  I think I didn't realize that
forced unwinding was a single-phase process.

It looks to me like checking for _US_VIRTUAL_UNWIND_FRAME and
_US_FORCE_UNWIND, as you have done, is right.

Andrew.


P.S.  The ARM unwinder data was never intended for all the things we
do with it, and it's pretty much a matter of luck that it works.  I
don't really know why we didn't adopt DWARF unwinder data for ARM,
given that the ARM unwinder data is really only intended for
exceptions, and we need a lot more.
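The backtrace use case Julian describes, walking the stack in an observe-only,
phase-1 style pass, can be sketched with the generic unwinder entry points
(these are the standard libgcc _Unwind_* calls, not ARM EHABI specifics):

```c
#include <assert.h>
#include <unwind.h>

/* Count stack frames via _Unwind_Backtrace, which invokes the trace
   callback once per frame without running cleanup handlers - the
   "merely observing the state of the stack" mode discussed above.  */
struct trace_state { int frames; };

static _Unwind_Reason_Code
trace_cb (struct _Unwind_Context *ctx, void *arg)
{
  struct trace_state *st = arg;
  if (_Unwind_GetIP (ctx))   /* program counter of this frame */
    st->frames++;
  return _URC_NO_REASON;     /* keep unwinding */
}

static int
count_frames (void)
{
  struct trace_state st = { 0 };
  _Unwind_Backtrace (trace_cb, &st);
  return st.frames;
}
```

On ARM EHABI this is where the _US_VIRTUAL_UNWIND_FRAME | _US_FORCE_UNWIND
handling that Andrew endorses comes into play inside the personality routine.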


Re: [PATCH][5/n] into-SSA TLC

2012-07-30 Thread Michael Matz
Hi,

On Mon, 30 Jul 2012, Richard Guenther wrote:

 
 This makes into-SSA no longer rely on variable annotations and instead
 uses on-the-side information local to into/update-SSA.  Lookups can
 probably be avoided in some places if we pass around the auxiliar
 information instead of looking it up all the time.
 
 Bootstrapped and tested on x86_64-unknown-linux-gnu, queued for now.
 The remaining var-ann users are remove_unused_locals (the used flag)
 and cfgexpands out-of-SSA.

I have both removed locally.  There's one more implicit use of var_ann, 
namely as flag in-referenced-vars, I'm currently working to remove that
too.


Ciao,
Michael.


[PATCH, MIPS] -mno-float odds and ends

2012-07-30 Thread Sandra Loosemore
The MIPS back end has an option -mno-float that is supported by 
bare-metal configs using the SDE library.  However, this option is not 
properly documented in the manual, and MIPS_ARCH_FLOAT_SPEC doesn't know 
about it as one of the explicit floating-point configuration changes 
that should override architecture defaults.  This patch addresses both 
problems.  OK to commit?


-Sandra


2012-07-30  Sandra Loosemore  san...@codesourcery.com
Julian Brown  jul...@codesourcery.com

gcc/
* doc/invoke.texi (MIPS Options): Document -mno-float.
* config/mips/mips.h (MIPS_ARCH_FLOAT_SPEC): Make it know
about -mno-float.

Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi	(revision 189950)
+++ gcc/doc/invoke.texi	(working copy)
@@ -733,7 +733,8 @@ Objective-C and Objective-C++ Dialects}.
 -mabi=@var{abi}  -mabicalls  -mno-abicalls @gol
 -mshared  -mno-shared  -mplt  -mno-plt  -mxgot  -mno-xgot @gol
 -mgp32  -mgp64  -mfp32  -mfp64  -mhard-float  -msoft-float @gol
--msingle-float  -mdouble-float  -mdsp  -mno-dsp  -mdspr2  -mno-dspr2 @gol
+-mno-float -msingle-float  -mdouble-float  @gol
+-mdsp  -mno-dsp  -mdspr2  -mno-dspr2 @gol
 -mmcu -mmno-mcu @gol
 -mfpu=@var{fpu-type} @gol
 -msmartmips  -mno-smartmips @gol
@@ -15633,6 +15634,11 @@ Use floating-point coprocessor instructi
 Do not use floating-point coprocessor instructions.  Implement
 floating-point calculations using library calls instead.
 
+@item -mno-float
+@opindex mno-float
+Prevents the use of all floating-point operations.  This option is presently 
+supported only by some bare-metal MIPS configurations.
+
 @item -msingle-float
 @opindex msingle-float
 Assume that the floating-point coprocessor only supports single-precision
Index: gcc/config/mips/mips.h
===
--- gcc/config/mips/mips.h	(revision 189950)
+++ gcc/config/mips/mips.h	(working copy)
@@ -713,7 +713,7 @@ struct mips_cpu_info {
link-compatible.  */
 
 #define MIPS_ARCH_FLOAT_SPEC \
-  %{mhard-float|msoft-float|march=mips*:; \
+  %{mhard-float|msoft-float|mno-float|march=mips*:; \
  march=vr41*|march=m4k|march=4k*|march=24kc|march=24kec \
  |march=34kc|march=74kc|march=1004kc|march=5kc \
  |march=octeon|march=xlr: -msoft-float;		  \


Re: [CFT] s390: Convert from sync to atomic optabs

2012-07-30 Thread Richard Henderson
On 07/30/2012 08:40 AM, Ulrich Weigand wrote:
  I presume a good test case to examine for ICM is with such an operand
  coming from a global.  What about STCM?  I don't see the output from
  sync_compare_and_swap ever being allowed in memory...
 Actually, it's only ICM that is of interest here; it should get used when
 either the comparison value or the new value come from a memory location,
 e.g. a global.  Sorry, I was confused about STCM ...
 

Well... it turns out to be just about impossible to get this to trigger.

With optimization on, the middle-end decides to promote the parameters
to registers immediately, which means that we never see a MEM in the
expander.  With optimization off, we don't propagate enough alignment
info so we never see ac.aligned = true.

It does look like we could relax the MEM_P requirement for Z10, so that
we use the register-based insv (RISBG).  I'll give that a go...


r~


[PATCH] Fix PR53733

2012-07-30 Thread William J. Schmidt
This fixes the de-canonicalization of commutative GIMPLE operations in
the vectorizer that occurs when processing reductions.  A loop_vec_info
is flagged for cleanup when a de-canonicalization has occurred in that
loop, and the cleanup is done when the loop_vec_info is destroyed.

Bootstrapped on powerpc64-unknown-linux-gnu with no new regressions.  Ok
for trunk?

Thanks,
Bill


gcc:

2012-07-30  Bill Schmidt  wschm...@linux.ibm.com

PR tree-optimization/53773
* tree-vectorizer.h (struct _loop_vec_info): Add operands_swapped.
(LOOP_VINFO_OPERANDS_SWAPPED): New macro.
* tree-vect-loop.c (new_loop_vec_info): Initialize
LOOP_VINFO_OPERANDS_SWAPPED field.
(destroy_loop_vec_info): Restore canonical form.
(vect_is_slp_reduction): Set LOOP_VINFO_OPERANDS_SWAPPED field.
(vect_is_simple_reduction_1): Likewise.

gcc/testsuite:

2012-07-30  Bill Schmidt  wschm...@linux.ibm.com

PR tree-optimization/53773
* testsuite/gcc.dg/vect/pr53773.c: New test.


Index: gcc/testsuite/gcc.dg/vect/pr53773.c
===
--- gcc/testsuite/gcc.dg/vect/pr53773.c (revision 0)
+++ gcc/testsuite/gcc.dg/vect/pr53773.c (revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+
+int
+foo (int integral, int decimal, int power_ten)
+{
+  while (power_ten > 0)
+{
+  integral *= 10;
+  decimal *= 10;
+  power_ten--;
+}
+
+  return integral+decimal;
+}
+
+/* Two occurrences in annotations, two in code.  */
+/* { dg-final { scan-tree-dump-times "\\* 10" 4 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h   (revision 189938)
+++ gcc/tree-vectorizer.h   (working copy)
@@ -296,6 +296,12 @@ typedef struct _loop_vec_info {
  this.  */
   bool peeling_for_gaps;
 
+  /* Reductions are canonicalized so that the last operand is the reduction
+ operand.  If this places a constant into RHS1, this decanonicalizes
+ GIMPLE for other phases, so we must track when this has occurred and
+ fix it up.  */
+  bool operands_swapped;
+
 } *loop_vec_info;
 
 /* Access Functions.  */
@@ -326,6 +332,7 @@ typedef struct _loop_vec_info {
 #define LOOP_VINFO_PEELING_HTAB(L) (L)->peeling_htab
 #define LOOP_VINFO_TARGET_COST_DATA(L) (L)->target_cost_data
 #define LOOP_VINFO_PEELING_FOR_GAPS(L) (L)->peeling_for_gaps
+#define LOOP_VINFO_OPERANDS_SWAPPED(L) (L)->operands_swapped
 
 #define LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT(L) \
 VEC_length (gimple, (L)->may_misalign_stmts) > 0
Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 189938)
+++ gcc/tree-vect-loop.c(working copy)
@@ -853,6 +853,7 @@ new_loop_vec_info (struct loop *loop)
   LOOP_VINFO_PEELING_HTAB (res) = NULL;
   LOOP_VINFO_TARGET_COST_DATA (res) = init_cost (loop);
   LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
+  LOOP_VINFO_OPERANDS_SWAPPED (res) = false;
 
   return res;
 }
@@ -873,6 +874,7 @@ destroy_loop_vec_info (loop_vec_info loop_vinfo, b
   int j;
   VEC (slp_instance, heap) *slp_instances;
   slp_instance instance;
+  bool swapped;
 
   if (!loop_vinfo)
 return;
@@ -881,6 +883,7 @@ destroy_loop_vec_info (loop_vec_info loop_vinfo, b
 
   bbs = LOOP_VINFO_BBS (loop_vinfo);
   nbbs = loop-num_nodes;
+  swapped = LOOP_VINFO_OPERANDS_SWAPPED (loop_vinfo);
 
   if (!clean_stmts)
 {
@@ -905,6 +908,22 @@ destroy_loop_vec_info (loop_vec_info loop_vinfo, b
   for (si = gsi_start_bb (bb); !gsi_end_p (si); )
 {
   gimple stmt = gsi_stmt (si);
+
+ /* We may have broken canonical form by moving a constant
+into RHS1 of a commutative op.  Fix such occurrences.  */
+ if (swapped  is_gimple_assign (stmt))
+   {
+ enum tree_code code = gimple_assign_rhs_code (stmt);
+
+ if ((code == PLUS_EXPR
+  || code == POINTER_PLUS_EXPR
+  || code == MULT_EXPR)
+  && CONSTANT_CLASS_P (gimple_assign_rhs1 (stmt)))
+   swap_tree_operands (stmt,
+   gimple_assign_rhs1_ptr (stmt),
+   gimple_assign_rhs2_ptr (stmt));
+   }
+
  /* Free stmt_vec_info.  */
  free_stmt_vec_info (stmt);
   gsi_next (si);
@@ -1920,6 +1939,9 @@ vect_is_slp_reduction (loop_vec_info loop_info, gi
  gimple_assign_rhs1_ptr (next_stmt),
   gimple_assign_rhs2_ptr (next_stmt));
  update_stmt (next_stmt);
+
+ if (CONSTANT_CLASS_P (gimple_assign_rhs1 (next_stmt)))
+   LOOP_VINFO_OPERANDS_SWAPPED (loop_info) = true;
}
  else
return false;
@@ -2324,6 +2346,9 @@ vect_is_simple_reduction_1 

Re: User directed Function Multiversioning via Function Overloading (issue5752064)

2012-07-30 Thread Sriraman Tallam
On Thu, Jul 19, 2012 at 1:39 PM, Jason Merrill ja...@redhat.com wrote:

 On 07/10/2012 03:14 PM, Sriraman Tallam wrote:

 I am using the questions you asked previously
 to explain how I solved each of them. When working on this patch, these
 are the exact questions I had and tried to address it.

 * Does this attribute affect a function signature?

 The function signature should be changed when there is more than one
 definition/declaration of foo distinguished by unique target attributes.

 [...]

 I agree.  I was trying to suggest that these questions are what the front end 
 needs to care about, not about versioning specifically.  If these questions 
 are turned into target hooks, all of the logic specific to versioning can be 
 contained in the target.

 My only question intended to be answered by humans is, do people think moving 
 the versioning logic behind more generic target hooks is worthwhile?

I have some comments related to this.

For the example below,

// Default version.
int foo ()
{
  .
}

// Version  XXX feature supported by Target ABC.
int foo () __attribute__ ((target (XXX)))
{
   
}

How should the second version of foo be treated for targets where
feature XXX is not supported? Right now, I am working on having my
patch completely ignore such function versions when compiled for
targets that do not understand the attribute. I could move this check
into a generic target hook so that a function definition that does not
make sense for the current target is ignored.

Also, currently the patch uses target hooks to do the following:

- Find if a particular version can be called directly, rather than go
through the dispatcher.
- Determine what the dispatcher body should be.
- Determining the order in which function versions must be dispatched.

I do not have a strong opinion on whether the entire logic should be
based on target hooks.

Thanks,
-Sri.




 Jason


Re: _GLIBCXX_END_NAMESPACE_* invalid closure order

2012-07-30 Thread Jonathan Wakely
On 30 July 2012 20:16, François Dumont wrote:
 Ok for trunk ?

OK, thanks.


Re: [PATCH, MIPS] -mno-float odds and ends

2012-07-30 Thread Richard Sandiford
Sandra Loosemore san...@codesourcery.com writes:
 The MIPS back end has an option -mno-float that is supported by 
 bare-metal configs using the SDE library.  However, this option is not 
 properly documented in the manual, and MIPS_ARCH_FLOAT_SPEC doesn't know 
 about it as one of the explicit floating-point configuration changes 
 that should override architecture defaults.  This patch addresses both 
 problems.  OK to commit?

OK, you're touching a sore spot here, but...

 +@item -mno-float
 +@opindex mno-float
 +Prevents the use of all floating-point operations.  This option is presently 
 +supported only by some bare-metal MIPS configurations.

...unfortunately, it doesn't prevent the use of floating-point operations.
That's why it's such a bad option.  The only difference from the compiler
proper's point of view between -msoft-float and -mno-float is that they
define different preprocessor macros.

The onus is instead on the programmer to avoid writing anything that
might tempt the compiler into using floating-point operations.  If the
user gets it wrong, they get (at best) a link-time error rather than a
compile-time error.

I think we should document it that way.  E.g. something like:

@item -mno-float
@opindex mno-float
Equivalent to @option{-msoft-float}, but asserts that the user is
trying to avoid all floating-point operations.  This option is presently 
supported only by some bare-metal MIPS configurations, where it selects
a special set of libraries that lack all floating-point support
(including, for example, the floating-point @code{printf} formats).
If code compiled with @code{-mno-float} accidentally contains
floating-point operations, it is likely to suffer a link-time
or run-time failure.

but you're better at the wordsmithing than I am.

Perhaps we should document the __mips_no_float preprocessor macro too,
since that's how things like printf() know that they don't need the
floating-point stuff.

The mips.h part is OK though, thanks.  Feel free to apply it separately
if that's more convenient than keeping the patch together.

Richard


Re: [Patch ARM 4/6] Improve Neon intrinsics testsuite.

2012-07-30 Thread Julian Brown
On Mon, 30 Jul 2012 12:51:47 +0100
Ramana Radhakrishnan ramana.radhakrish...@linaro.org wrote:

This patch converts the testsuite generator to actually produce
 something more sensible than the current set of tests. It changes
 these to generate the following form for a test instead of the
 previous set of tests.
 
 It's careful to use the hard-fp variant so that we actually
 produce an instruction (atleast a move of the appropriate form) and
 uses a dummy floating point parameter to ensure this. This ensures
 that most tests are alright. This does increase test times quite a bit
 and I'm considering a follow-up to the build system that tries to do
 some of these tests in parallel.
 
 It's been useful and instructive so far and has found a few issues
 in the compiler and probably been the twistiest passage in this maze
 of twisty little passages.

The Ocaml bits mostly look fine to me, modulo a few formatting nits
(redundant brackets, begin/end pairs) fixed in the attached version,
which works the same as before AFAICT. I also factored out the
kind-of-duplicate print_args functions into a generalised version --
not amazingly useful, but mildly more concise.

HTH,

Julian

Index: neon-testgen.ml
===
--- neon-testgen.ml	(revision 189983)
+++ neon-testgen.ml	(working copy)
@@ -1,5 +1,6 @@
 (* Auto-generate ARM Neon intrinsics tests.
-   Copyright (C) 2006, 2007, 2008, 2009, 2010 Free Software Foundation, Inc.
+   Copyright (C) 2006, 2007, 2008, 2009, 2010, 2011, 2012 Free Software 
+   Foundation, Inc.
Contributed by CodeSourcery.
 
This file is part of GCC.
@@ -51,80 +52,84 @@
   Printf.fprintf chan "/* This file was autogenerated by neon-testgen.  */\n\n";
   Printf.fprintf chan "/* { dg-do assemble } */\n";
   Printf.fprintf chan "/* { dg-require-effective-target arm_neon_ok } */\n";
-  Printf.fprintf chan "/* { dg-options \"-save-temps -O0\" } */\n";
+  Printf.fprintf chan "/* { dg-options \"-save-temps\" } */\n";
   Printf.fprintf chan "/* { dg-add-options arm_neon } */\n";
-  Printf.fprintf chan "\n#include \"arm_neon.h\"\n\n";
-  Printf.fprintf chan "void test_%s (void)\n{\n" test_name
+  Printf.fprintf chan "\n#include \"arm_neon.h\"\n\n"
 
-(* Emit declarations of local variables that are going to be passed
+(* Convert a list ORIGLIST to a string, calling MAP with a
+   monotonically-increasing index and the element for each element of that
+   list, and separating entries with SEP.  *)
+
+let idx_concat sep map origlist =
+  let buf = Buffer.create 30 in
+  let rec scan idx = function
+[] ->
+  Buffer.contents buf
+  | [item] ->
+  Buffer.add_string buf (map idx item);
+  Buffer.contents buf
+  | item::items ->
+  Buffer.add_string buf (map idx item);
+  Buffer.add_string buf sep;
+  scan (succ idx) items in
+  scan 0 origlist
+
+(* Emit function with parameters and local variables that are going to be passed
to an intrinsic, together with one to take a returned value if needed.  *)
-let emit_automatics chan c_types features =
-  let emit () =
-ignore (
-  List.fold_left (fun arg_number -> fun (flags, ty) ->
-let pointer_bit =
-  if List.mem Pointer flags then "*" else ""
-in
-  (* Const arguments to builtins are directly
- written in as constants.  *)
-  if not (List.mem Const flags) then
-Printf.fprintf chan "  %s %sarg%d_%s;\n"
-   ty pointer_bit arg_number ty;
-arg_number + 1)
- 0 (List.tl c_types))
+let emit_test_prologue chan c_types features test_name const_valuator =
+  let print_arg arg_number (flags, ty) =
+(* If the argument is of const type, then directly write in the
+   constant now.  *)
+let pointer_bit = if List.mem Pointer flags then "*" else "" in
+Printf.sprintf "%s %s arg%d_%s" ty pointer_bit arg_number ty
   in
-match c_types with
-  (_, return_ty) :: tys ->
-if return_ty <> "void" then begin
-  (* The intrinsic returns a value.  We need to do explict register
- allocation for vget_low tests or they fail because of copy
- elimination.  *)
-  ((if List.mem Fixed_vector_reg features then
-  Printf.fprintf chan "  register %s out_%s asm (\"d18\");\n"
- return_ty return_ty
-else if List.mem Fixed_core_reg features then
-  Printf.fprintf chan "  register %s out_%s asm (\"r0\");\n"
- return_ty return_ty
-else
-  Printf.fprintf chan "  %s out_%s;\n" return_ty return_ty);
-	   emit ())
-end else
-  (* The intrinsic does not return a value.  *)
-  emit ()
| _ -> assert false
+  match c_types with
+(_, return_ty) :: tys ->
+  Printf.fprintf chan
+%s 

Re: [SH] PR 39423

2012-07-30 Thread Oleg Endo
On Mon, 2012-07-30 at 08:28 -0700, Richard Henderson wrote:
 On 2012-07-29 15:56, Oleg Endo wrote:
  +   && can_create_pseudo_p ()
  +  [(set (match_dup 5) (ashift:SI (match_dup 1) (match_dup 2)))
  +   (set (match_dup 6) (plus:SI (match_dup 5) (match_dup 3)))
  +   (set (match_dup 0) (mem:SI (plus:SI (match_dup 6) (match_dup 4))))]
 
 Don't create new mems like this -- you've lost alias info.
 You need to use replace_equiv_address or something on the
 original memory.  Which means you have to actually capture
 the memory operand somehow.
 
 Better to use a custom predicate to match these memories with
 these complex addresses, rather than list them out each time:
 
    [(set (match_operand:SI 0 "arith_reg_dest" "=r")
    (match_operand:SI 1 "mem_index_disp_operand" "m"))]
 

Thanks!  I'll fix it in another patch.

Cheers,
Oleg 



Re: [C++ Patch] PR 53624

2012-07-30 Thread Jason Merrill

On 07/28/2012 11:28 AM, Paolo Carlini wrote:

as the testcase shows (merge of 53624 & 54104), in case of local types
(possibly synthesized for a lambda) we check for the default template
arguments of the synthesized template parameters according to the rules
for *types* (instead of those for functions) and we spuriously reject.
As far as I can see we should just return early in such cases, because
we already checked upstream, thus I figured out logic that apparently
works, but I'm not sure it's the most precise and concise we can have.


It seems to me that the problem here is that the template parameters in 
question are for an enclosing scope, not the temploid under 
consideration.  We shouldn't be doing the default argument ordering 
check (or the parameter pack order check) at all for non-primary templates.


Jason



[PATCH] shrink storage for target_expmed cost fields

2012-07-30 Thread Nathan Froyd
Now that we can freely change the representation of the cost fields in
struct target_expmed, the patch below does so, by only requiring arrays
to hold enough storage for integer modes and/or vector integer modes,
as appropriate.

default_target_expmed shrinks from ~200KB to ~85KB on
x86_64-unknown-linux-gnu as a result of this patch (20+ (!) vector
integer modes).  As a comparison point, it shrinks from ~120KB to ~45KB
on alpha-linux-gnu (5 vector integer modes).  So it should be helpful no
matter what your target looks like.

Tested on x86_64-unknown-linux-gnu.  OK to commit?

-Nathan

* expmed.h (NUM_MODE_VECTOR_INT): Define.
(struct expmed_op_cheap, struct expmed_op_costs): New structures.
(struct target_expmed): Convert x_mul_highpart_cost and
x_mul_widen_cost fields to be indexed by integer modes.
Convert x_sdiv_pow2_cheap and x_smod_pow2_cheap fields to be
of type struct expmed_op_cheap.  Convert other cost fields to be
of type struct_expmed_op_costs.
(mul_widen_cost_ptr, mul_highpart_cost_ptr): Adjust for new
indexing of respective fields.
(expmed_op_cheap_ptr): New function.
(sdiv_pow2_cheap_ptr, smod_pow2_cheap_ptr): Call it.
(expmed_op_cost_ptr): New function.
(add_cost_ptr, neg_cost_ptr, shift_cost_ptr, shiftadd_cost_ptr,
shiftsub0_cost_ptr, shiftsub1_cost_ptr, mul_cost_ptr,
sdiv_cost_ptr, udiv_cost_ptr): Call it.

---
 gcc/ChangeLog |   18 
 gcc/expmed.h  |  124 +
 2 files changed, 116 insertions(+), 26 deletions(-)

diff --git a/gcc/expmed.h b/gcc/expmed.h
index 97e17f3..bde5cae 100644
--- a/gcc/expmed.h
+++ b/gcc/expmed.h
@@ -125,6 +125,23 @@ struct alg_hash_entry {
 #endif
 
 #define NUM_MODE_INT (MAX_MODE_INT - MIN_MODE_INT + 1)
+#define NUM_MODE_VECTOR_INT (MAX_MODE_VECTOR_INT - MIN_MODE_VECTOR_INT + 1)
+
+struct expmed_op_cheap {
+  /* Whether an operation is cheap in a given integer mode.  */
+  bool cheap_int[2][NUM_MODE_INT];
+
+  /* Whether an operation is cheap in a given vector integer mode.  */
+  bool cheap_vector_int[2][NUM_MODE_VECTOR_INT];
+};
+
+struct expmed_op_costs {
+  /* The cost of an operation in a given integer mode.  */
+  int int_cost[2][NUM_MODE_INT];
+
+  /* The cost of an operation in a given vector integer mode.  */
+  int vector_int_cost[2][NUM_MODE_VECTOR_INT];
+};
 
 /* Target-dependent globals.  */
 struct target_expmed {
@@ -140,23 +157,23 @@ struct target_expmed {
  powers of two, so don't use branches; emit the operation instead.
  Usually, this will mean that the MD file will emit non-branch
  sequences.  */
-  bool x_sdiv_pow2_cheap[2][NUM_MACHINE_MODES];
-  bool x_smod_pow2_cheap[2][NUM_MACHINE_MODES];
+  struct expmed_op_cheap x_sdiv_pow2_cheap;
+  struct expmed_op_cheap x_smod_pow2_cheap;
 
   /* Cost of various pieces of RTL.  Note that some of these are indexed by
  shift count and some by mode.  */
   int x_zero_cost[2];
-  int x_add_cost[2][NUM_MACHINE_MODES];
-  int x_neg_cost[2][NUM_MACHINE_MODES];
-  int x_shift_cost[2][NUM_MACHINE_MODES][MAX_BITS_PER_WORD];
-  int x_shiftadd_cost[2][NUM_MACHINE_MODES][MAX_BITS_PER_WORD];
-  int x_shiftsub0_cost[2][NUM_MACHINE_MODES][MAX_BITS_PER_WORD];
-  int x_shiftsub1_cost[2][NUM_MACHINE_MODES][MAX_BITS_PER_WORD];
-  int x_mul_cost[2][NUM_MACHINE_MODES];
-  int x_sdiv_cost[2][NUM_MACHINE_MODES];
-  int x_udiv_cost[2][NUM_MACHINE_MODES];
-  int x_mul_widen_cost[2][NUM_MACHINE_MODES];
-  int x_mul_highpart_cost[2][NUM_MACHINE_MODES];
+  struct expmed_op_costs x_add_cost;
+  struct expmed_op_costs x_neg_cost;
+  struct expmed_op_costs x_shift_cost[MAX_BITS_PER_WORD];
+  struct expmed_op_costs x_shiftadd_cost[MAX_BITS_PER_WORD];
+  struct expmed_op_costs x_shiftsub0_cost[MAX_BITS_PER_WORD];
+  struct expmed_op_costs x_shiftsub1_cost[MAX_BITS_PER_WORD];
+  struct expmed_op_costs x_mul_cost;
+  struct expmed_op_costs x_sdiv_cost;
+  struct expmed_op_costs x_udiv_cost;
+  int x_mul_widen_cost[2][NUM_MODE_INT];
+  int x_mul_highpart_cost[2][NUM_MODE_INT];
 
   /* Conversion costs are only defined between two scalar integer modes
  of different sizes.  The first machine mode is the destination mode,
@@ -195,12 +212,58 @@ set_alg_hash_used_p (bool usedp)
   this_target_expmed->x_alg_hash_used_p = usedp;
 }
 
+/* Return a pointer to a boolean contained in EOC indicating whether
+   a particular operation performed in MODE is cheap when optimizing
+   for SPEED.  */
+
+static inline bool *
+expmed_op_cheap_ptr (struct expmed_op_cheap *eoc, bool speed,
+enum machine_mode mode)
+{
+  gcc_assert (GET_MODE_CLASS (mode) == MODE_INT
+ || GET_MODE_CLASS (mode) == MODE_VECTOR_INT);
+
+  if (GET_MODE_CLASS (mode) == MODE_INT)
+{
+  int idx = mode - MIN_MODE_INT;
+      return eoc->cheap_int[speed][idx];
+}
+  else
+{
+  int idx = mode - MIN_MODE_VECTOR_INT;
+  return 

Re: [PATCH, MIPS] -mno-float odds and ends

2012-07-30 Thread Sandra Loosemore

On 07/30/2012 01:38 PM, Richard Sandiford wrote:


...unfortunately, it doesn't prevent the use of floating-point operations.
That's why it's such a bad option.  The only difference from the compiler
proper's point of view between -msoft-float and -mno-float is that they
define different preprocessor macros.

The onus is instead on the programmer to avoid writing anything that
might tempt the compiler into using floating-point operations.  If the
user gets it wrong, they get (at best) a link-time error rather than a
compile-time error.

I think we should document it that way.  E.g. something like:

@item -mno-float
@opindex mno-float
Equivalent to @option{-msoft-float}, but asserts that the user is
trying to avoid all floating-point operations.  This option is presently
supported only by some bare-metal MIPS configurations, where it selects
a special set of libraries that lack all floating-point support
(including, for example, the floating-point @code{printf} formats).
If code compiled with @code{-mno-float} accidentally contains
floating-point operations, it is likely to suffer a link-time
or run-time failure.

but you're better at the wordsmithing than I am.


OK, I've gone with a slightly tweaked version of your wording.


Perhaps we should document the __mips_no_float preprocessor macro too,
since that's how things like printf() know that they don't need the
floating-point stuff.


Hmmm, I don't think that's necessary, at least as part of this patch; we 
don't document the related __mips_hard_float or __mips_soft_float 
preprocessor definitions, either.



The mips.h part is OK though, thanks.  Feel free to apply it separately
if that's more convenient than keeping the patch together.


I've checked in the attached version of the patch.  Thanks for the 
speedy review!  :-)


-Sandra


2012-07-30  Sandra Loosemore  san...@codesourcery.com
Julian Brown  jul...@codesourcery.com

gcc/
* doc/invoke.texi (MIPS Options): Document -mno-float.
* config/mips/mips.h (MIPS_ARCH_FLOAT_SPEC): Make it know
about -mno-float.

Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi	(revision 189950)
+++ gcc/doc/invoke.texi	(working copy)
@@ -733,7 +733,8 @@ Objective-C and Objective-C++ Dialects}.
 -mabi=@var{abi}  -mabicalls  -mno-abicalls @gol
 -mshared  -mno-shared  -mplt  -mno-plt  -mxgot  -mno-xgot @gol
 -mgp32  -mgp64  -mfp32  -mfp64  -mhard-float  -msoft-float @gol
--msingle-float  -mdouble-float  -mdsp  -mno-dsp  -mdspr2  -mno-dspr2 @gol
+-mno-float -msingle-float  -mdouble-float  @gol
+-mdsp  -mno-dsp  -mdspr2  -mno-dspr2 @gol
 -mmcu -mmno-mcu @gol
 -mfpu=@var{fpu-type} @gol
 -msmartmips  -mno-smartmips @gol
@@ -15633,6 +15634,18 @@ Use floating-point coprocessor instructi
 Do not use floating-point coprocessor instructions.  Implement
 floating-point calculations using library calls instead.
 
+@item -mno-float
+@opindex mno-float
+Equivalent to @option{-msoft-float}, but additionally asserts that the
+program being compiled does not perform any floating-point operations.
+This option is presently supported only by some bare-metal MIPS
+configurations, where it may select a special set of libraries
+that lack all floating-point support (including, for example, the
+floating-point @code{printf} formats).  
+If code compiled with @code{-mno-float} accidentally contains
+floating-point operations, it is likely to suffer a link-time
+or run-time failure.
+
 @item -msingle-float
 @opindex msingle-float
 Assume that the floating-point coprocessor only supports single-precision
Index: gcc/config/mips/mips.h
===
--- gcc/config/mips/mips.h	(revision 189950)
+++ gcc/config/mips/mips.h	(working copy)
@@ -713,7 +713,7 @@ struct mips_cpu_info {
link-compatible.  */
 
 #define MIPS_ARCH_FLOAT_SPEC \
-  %{mhard-float|msoft-float|march=mips*:; \
+  %{mhard-float|msoft-float|mno-float|march=mips*:; \
  march=vr41*|march=m4k|march=4k*|march=24kc|march=24kec \
  |march=34kc|march=74kc|march=1004kc|march=5kc \
  |march=octeon|march=xlr: -msoft-float;		  \



Re: [C++ Patch] PR 53624

2012-07-30 Thread Paolo Carlini

Hi,

On 07/30/2012 10:10 PM, Jason Merrill wrote:

On 07/28/2012 11:28 AM, Paolo Carlini wrote:

as the testcase shows (merge of 53624 & 54104), in case of local types
(possibly synthesized for a lambda) we check for the default template
arguments of the synthesized template parameters according to the rules
for *types* (instead of those for functions) and we spuriously reject.
As far as I can see we should just return early in such cases, because
we already checked upstream, thus I figured out logic that apparently
works, but I'm not sure it's the most precise and concise we can have.


It seems to me that the problem here is that the template parameters 
in question are for an enclosing scope, not the temploid under 
consideration.

Indeed, this is the problem.
  We shouldn't be doing the default argument ordering check (or the 
parameter pack order check) at all for non-primary templates.
Good, thanks. I didn't realize that we can use is_primary. Thus for 
example the below passes testing: is it OK, or should we implement the 
general idea in a different way?


Thanks!
Paolo.


Index: testsuite/g++.dg/cpp0x/temp_default5.C
===
--- testsuite/g++.dg/cpp0x/temp_default5.C  (revision 0)
+++ testsuite/g++.dg/cpp0x/temp_default5.C  (revision 0)
@@ -0,0 +1,13 @@
+// { dg-options "-std=c++11" }
+
+template <class Z = void, class T>
+void Foo(T)
+{
+  struct X {};
+}
+
+template <class T = int, typename U>
+void f(const U&)
+{
+  auto g = [] () {};
+}
Index: cp/pt.c
===
--- cp/pt.c (revision 189981)
+++ cp/pt.c (working copy)
@@ -4267,7 +4267,8 @@ check_default_tmpl_args (tree decl, tree parms, in
 
   /* Core issue 226 (C++0x only): the following only applies to class
  templates.  */
-  if ((cxx_dialect == cxx98) || TREE_CODE (decl) != FUNCTION_DECL)
+  if ((cxx_dialect == cxx98)
+      || (TREE_CODE (decl) != FUNCTION_DECL && is_primary))
 {
   /* [temp.param]
 


[PATCH 0/2] Convert s390 to atomic optabs, v2

2012-07-30 Thread Richard Henderson
The atomic_load/storedi_1 patterns are fixed to use LM, STM.

I've had a go at generating better code in the HQImode CAS
loop for aligned memory, but I don't know that I'd call it
the most efficient thing ever.  Some of this is due to 
deficiencies in other parts of the compiler (including the
s390 backend):

  (1) MEM_ALIGN can't pass down full align+ofs data that we had
  during cfgexpand.  This means the opportunities for using
  the aligned path are less than they ought to be.

  (2) In get_pointer_alignment (used by get_builtin_sync_mem),
  we don't consider an ADDR_EXPR to return the full alignment
  that the type is due.  I'm sure this is to work around some
  other sort of usage via the string.h builtins, but it's
  less-than-handy in this case.

  I wonder if in get_builtin_sync_mem we ought to be using
  get_object_alignment (build_fold_indirect_ref (addr)) instead?

  Consider

struct S { int x; unsigned short y; } g_s;
unsigned short o, n;
void good() {
   __builtin_compare_exchange (&g_s.y, &o, &n, 0, 0, 0);
}
void bad(S *p_s) {
   __builtin_compare_exchange (&p_s->y, &o, &n, 0, 0, 0);
}

  where GOOD produces the aligned MEM that we need, and BAD doesn't.

  (3) Support for IC, and ICM via the insv pattern is lacking.
  I've added a tiny bit of support here, in the form of using
  the existing strict_low_part patterns, but most definitely we
  could do better.

  (4) The *sethighpartsi and *sethighpartdi_64 patterns ought to be
  more different.  As is, we can't insert into bits 48-56 of a
  DImode quantity, because we don't generate ICM for DImode,
  only ICMH.

  (5) Missing support for RISBGZ in the form of an extv/z expander.
  The existing *extv/z splitters probably ought to be conditionalized
  on !Z10.

  (6) The strict_low_part patterns should allow registers for at
  least Z10.  The SImode strict_low_part can use LR everywhere.

  (7) RISBGZ could be used for a 3-address constant lshrsi3 before
  srlk is available.

For the GOOD function above, and this patch set, for -O3 -march=z10:

larl%r3,s+4
lhrl%r0,o
lhi %r2,1
l   %r1,0(%r3)
nilh%r1,0
.L2:
lr  %r5,%r1
larl%r12,n
lr  %r4,%r1
risbg   %r4,%r0,32,47,16
icm %r5,3,0(%r12)
cs  %r4,%r5,0(%r3)
je  .L3
lr  %r5,%r4
nilh%r5,0
cr  %r5,%r1
lr  %r1,%r5
jne .L2
lhi %r2,0
.L3:
srl %r4,16
sthrl   %r4,o

Odd things:

   * O is forced into a register before reaching the expander, so we
 get the RISBG for that.  N is left in a memory and so we commit
 to using ICM for that.  Further, because of how strict_low_part
 is implemented we're committed to leaving that in memory.

   * We don't optimize the loop and hoist the LARL of N outside the loop.

   * Given that we're having to zap the mask in %r1 for the second
 compare anyway, I wonder if RISBG is really beneficial over OR.
 Is RISBG (or ICM for that matter) any faster (or even smaller)?


r~


Richard Henderson (2):
  s390: Reorg s390_expand_insv
  s390: Convert from sync to atomic optabs

 gcc/config/s390/s390-protos.h |3 +-
 gcc/config/s390/s390.c|  270 ++--
 gcc/config/s390/s390.md   |  401 +
 3 files changed, 465 insertions(+), 209 deletions(-)

-- 
1.7.7.6



[PATCH 1/2] s390: Reorg s390_expand_insv

2012-07-30 Thread Richard Henderson
Try RISBG last, after other mechanisms have failed; don't require
operands in registers for it but force them there instead.  Try a
limited form of ICM.
---
 gcc/config/s390/s390.c |  129 ++-
 1 files changed, 82 insertions(+), 47 deletions(-)

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index f72f49f..240fb7e 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -4538,48 +4538,70 @@ s390_expand_insv (rtx dest, rtx op1, rtx op2, rtx src)
 {
   int bitsize = INTVAL (op1);
   int bitpos = INTVAL (op2);
+  enum machine_mode mode = GET_MODE (dest);
+  enum machine_mode smode = smallest_mode_for_size (bitsize, MODE_INT);
+  rtx op, clobber;
 
-  /* On z10 we can use the risbg instruction to implement insv.  */
-  if (TARGET_Z10
-   && ((GET_MODE (dest) == DImode && GET_MODE (src) == DImode)
- || (GET_MODE (dest) == SImode && GET_MODE (src) == SImode)))
+  /* Generate INSERT IMMEDIATE (IILL et al).  */
+  /* (set (ze (reg)) (const_int)).  */
+  if (TARGET_ZARCH
+   && register_operand (dest, word_mode)
+   && (bitpos % 16) == 0
+   && (bitsize % 16) == 0
+   && const_int_operand (src, VOIDmode))
 {
-  rtx op;
-  rtx clobber;
+  HOST_WIDE_INT val = INTVAL (src);
+  int regpos = bitpos + bitsize;
 
-  op = gen_rtx_SET (GET_MODE(src),
-   gen_rtx_ZERO_EXTRACT (GET_MODE (dest), dest, op1, op2),
-   src);
-  clobber = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, CC_REGNUM));
-  emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, op, clobber)));
+  while (regpos > bitpos)
+   {
+ enum machine_mode putmode;
+ int putsize;
+
+ if (TARGET_EXTIMM && (regpos % 32 == 0) && (regpos >= bitpos + 32))
+   putmode = SImode;
+ else
+   putmode = HImode;
 
+ putsize = GET_MODE_BITSIZE (putmode);
+ regpos -= putsize;
+ emit_move_insn (gen_rtx_ZERO_EXTRACT (word_mode, dest,
+   GEN_INT (putsize),
+   GEN_INT (regpos)),
+ gen_int_mode (val, putmode));
+ val >>= putsize;
+   }
+  gcc_assert (regpos == bitpos);
   return true;
 }
 
-  /* We need byte alignment.  */
-  if (bitsize % BITS_PER_UNIT)
-return false;
-
+  /* Generate STORE CHARACTERS UNDER MASK (STCM et al).  */
   if (bitpos == 0
-   && memory_operand (dest, VOIDmode)
+   && (bitsize % BITS_PER_UNIT) == 0
+   && MEM_P (dest)
    && (register_operand (src, word_mode)
  || const_int_operand (src, VOIDmode)))
 {
   /* Emit standard pattern if possible.  */
-  enum machine_mode mode = smallest_mode_for_size (bitsize, MODE_INT);
-  if (GET_MODE_BITSIZE (mode) == bitsize)
-   emit_move_insn (adjust_address (dest, mode, 0), gen_lowpart (mode, src));
+  if (GET_MODE_BITSIZE (smode) == bitsize)
+   {
+ emit_move_insn (adjust_address (dest, smode, 0),
+ gen_lowpart (smode, src));
+ return true;
+   }
 
   /* (set (ze (mem)) (const_int)).  */
   else if (const_int_operand (src, VOIDmode))
{
  int size = bitsize / BITS_PER_UNIT;
- rtx src_mem = adjust_address (force_const_mem (word_mode, src), BLKmode,
+ rtx src_mem = adjust_address (force_const_mem (word_mode, src),
+   BLKmode,
GET_MODE_SIZE (word_mode) - size);
 
  dest = adjust_address (dest, BLKmode, 0);
  set_mem_size (dest, size);
  s390_expand_movmem (dest, src_mem, GEN_INT (size));
+ return true;
}
 
   /* (set (ze (mem)) (reg)).  */
@@ -4602,42 +4624,55 @@ s390_expand_insv (rtx dest, rtx op1, rtx op2, rtx src)
  gen_rtx_LSHIFTRT (word_mode, src, GEN_INT
 (GET_MODE_BITSIZE (SImode))));
}
+ return true;
}
-  else
-   return false;
+}
 
-  return true;
+  /* Generate INSERT CHARACTERS UNDER MASK (IC, ICM et al).  */
+  if ((bitpos % BITS_PER_UNIT) == 0
+   && (bitsize % BITS_PER_UNIT) == 0
+   && (bitpos & 32) == ((bitpos + bitsize - 1) & 32)
+   && MEM_P (src)
+   && (mode == DImode || mode == SImode)
+   && register_operand (dest, mode))
+{
+  /* Emit a strict_low_part pattern if possible.  */
+  if (bitpos == 0 && GET_MODE_BITSIZE (smode) == bitsize)
+   {
+ op = gen_rtx_STRICT_LOW_PART (VOIDmode, gen_lowpart (smode, dest));
+ op = gen_rtx_SET (VOIDmode, op, gen_lowpart (smode, src));
+ clobber = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, CC_REGNUM));
+ emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, op, clobber)));
+ return true;
+   }
+
+  /* ??? There are more powerful versions of ICM that are not
+     completely

[PATCH 2/2] s390: Convert from sync to atomic optabs

2012-07-30 Thread Richard Henderson
Split out s390_two_part_insv from s390_expand_cs_hqi to try
harder to use bit insertion instructions in the CAS loop.
---
 gcc/config/s390/s390-protos.h |3 +-
 gcc/config/s390/s390.c|  141 ++-
 gcc/config/s390/s390.md   |  401 +
 3 files changed, 383 insertions(+), 162 deletions(-)

diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
index 4f1eb42..79673d6 100644
--- a/gcc/config/s390/s390-protos.h
+++ b/gcc/config/s390/s390-protos.h
@@ -85,7 +85,8 @@ extern void s390_expand_setmem (rtx, rtx, rtx);
 extern bool s390_expand_cmpmem (rtx, rtx, rtx, rtx);
 extern bool s390_expand_addcc (enum rtx_code, rtx, rtx, rtx, rtx, rtx);
 extern bool s390_expand_insv (rtx, rtx, rtx, rtx);
-extern void s390_expand_cs_hqi (enum machine_mode, rtx, rtx, rtx, rtx);
+extern void s390_expand_cs_hqi (enum machine_mode, rtx, rtx, rtx,
+   rtx, rtx, bool);
 extern void s390_expand_atomic (enum machine_mode, enum rtx_code,
rtx, rtx, rtx, bool);
 extern rtx s390_return_addr_rtx (int, rtx);
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 240fb7e..1006281 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -896,10 +896,12 @@ s390_emit_compare (enum rtx_code code, rtx op0, rtx op1)
conditional branch testing the result.  */
 
 static rtx
-s390_emit_compare_and_swap (enum rtx_code code, rtx old, rtx mem, rtx cmp, rtx new_rtx)
+s390_emit_compare_and_swap (enum rtx_code code, rtx old, rtx mem,
+   rtx cmp, rtx new_rtx)
 {
-  emit_insn (gen_sync_compare_and_swapsi (old, mem, cmp, new_rtx));
-  return s390_emit_compare (code, gen_rtx_REG (CCZ1mode, CC_REGNUM), const0_rtx);
+  emit_insn (gen_atomic_compare_and_swapsi_internal (old, mem, cmp, new_rtx));
+  return s390_emit_compare (code, gen_rtx_REG (CCZ1mode, CC_REGNUM),
+   const0_rtx);
 }
 
 /* Emit a jump instruction to TARGET.  If COND is NULL_RTX, emit an
@@ -4754,80 +4756,123 @@ init_alignment_context (struct alignment_context *ac, rtx mem,
   ac->modemaski = expand_simple_unop (SImode, NOT, ac->modemask, NULL_RTX, 1);
 }
 
+/* A subroutine of s390_expand_cs_hqi.  Insert INS into VAL.  If possible,
+   use a single insv insn into SEQ2.  Otherwise, put prep insns in SEQ1 and
+   perform the merge in SEQ2.  */
+
+static rtx
+s390_two_part_insv (struct alignment_context *ac, rtx *seq1, rtx *seq2,
+   enum machine_mode mode, rtx val, rtx ins)
+{
+  rtx tmp;
+
+  if (ac->aligned)
+{
+  start_sequence ();
+  tmp = copy_to_mode_reg (SImode, val);
+  if (s390_expand_insv (tmp, GEN_INT (GET_MODE_BITSIZE (mode)),
+   const0_rtx, ins))
+   {
+ *seq1 = NULL;
+ *seq2 = get_insns ();
+ end_sequence ();
+ return tmp;
+   }
+  end_sequence ();
+}
+
+  /* Failed to use insv.  Generate a two part shift and mask.  */
+  start_sequence ();
+  tmp = s390_expand_mask_and_shift (ins, mode, ac->shift);
+  *seq1 = get_insns ();
+  end_sequence ();
+
+  start_sequence ();
+  tmp = expand_simple_binop (SImode, IOR, tmp, val, NULL_RTX, 1, OPTAB_DIRECT);
+  *seq2 = get_insns ();
+  end_sequence ();
+
+  return tmp;
+}
+
 /* Expand an atomic compare and swap operation for HImode and QImode.  MEM is
-   the memory location, CMP the old value to compare MEM with and NEW_RTX the value
-   to set if CMP == MEM.
-   CMP is never in memory for compare_and_swap_cc because
-   expand_bool_compare_and_swap puts it into a register for later compare.  */
+   the memory location, CMP the old value to compare MEM with and NEW_RTX the
+   value to set if CMP == MEM.  */
 
 void
-s390_expand_cs_hqi (enum machine_mode mode, rtx target, rtx mem, rtx cmp, rtx new_rtx)
+s390_expand_cs_hqi (enum machine_mode mode, rtx btarget, rtx vtarget, rtx mem,
+   rtx cmp, rtx new_rtx, bool is_weak)
 {
   struct alignment_context ac;
-  rtx cmpv, newv, val, resv, cc;
+  rtx cmpv, newv, val, resv, cc, seq0, seq1, seq2, seq3;
   rtx res = gen_reg_rtx (SImode);
-  rtx csloop = gen_label_rtx ();
-  rtx csend = gen_label_rtx ();
+  rtx csloop = NULL, csend = NULL;
 
-  gcc_assert (register_operand (target, VOIDmode));
+  gcc_assert (register_operand (vtarget, VOIDmode));
   gcc_assert (MEM_P (mem));
 
  init_alignment_context (&ac, mem, mode);
 
-  /* Shift the values to the correct bit positions.  */
-  if (!(ac.aligned && MEM_P (cmp)))
-    cmp = s390_expand_mask_and_shift (cmp, mode, ac.shift);
-  if (!(ac.aligned && MEM_P (new_rtx)))
-    new_rtx = s390_expand_mask_and_shift (new_rtx, mode, ac.shift);
-
   /* Load full word.  Subsequent loads are performed by CS.  */
   val = expand_simple_binop (SImode, AND, ac.memsi, ac.modemaski,
 NULL_RTX, 1, OPTAB_DIRECT);
 
+  /* Prepare insertions of cmp and new_rtx into the loaded value.  When
+ possible, we try to use insv 
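[The message is truncated here in the archive. As background on the technique the patch reworks: a sub-word (HImode/QImode) compare-and-swap has no direct s390 instruction, so it is emulated with a full-word CS loop, shifting and masking the narrow value into its byte position within the containing aligned word. A rough single-byte sketch of that idea using C11 atomics — `cas_byte` and the little-endian byte indexing are invented for illustration; this is not the s390 implementation:]

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Sketch: emulate a byte-wide CAS with a full-word CAS loop.
   byte_idx selects the byte little-endian style; s390 itself is
   big-endian and uses bit positions instead.  */
_Bool cas_byte (_Atomic uint32_t *word, unsigned byte_idx,
                uint8_t expected, uint8_t desired)
{
  unsigned shift = byte_idx * 8;
  uint32_t mask = (uint32_t) 0xff << shift;
  uint32_t old = atomic_load (word);
  for (;;)
    {
      /* If the byte we care about already differs, the CAS fails.  */
      if ((uint8_t) ((old & mask) >> shift) != expected)
        return 0;
      /* Merge the desired byte into the surrounding word (the step the
         patch implements with insv/ICM where possible)...  */
      uint32_t new_word = (old & ~mask) | ((uint32_t) desired << shift);
      /* ...and retry the full-word CAS until the neighbouring bytes
         stop changing under us.  On failure OLD is refreshed.  */
      if (atomic_compare_exchange_weak (word, &old, new_word))
        return 1;
    }
}
```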

Re: [PATCH] shrink storage for target_expmed cost fields

2012-07-30 Thread Richard Henderson
On 07/30/2012 02:05 PM, Nathan Froyd wrote:
   * expmed.h (NUM_MODE_VECTOR_INT): Define.
   (struct expmed_op_cheap, struct expmed_op_costs): New structures.
   (struct target_expmed): Convert x_mul_highpart_cost and
   x_mul_widen_cost fields to be indexed by integer modes.
   Convert x_sdiv_pow2_cheap and x_smod_pow2_cheap fields to be
   of type struct expmed_op_cheap.  Convert other cost fields to be
	of type struct expmed_op_costs.
   (mul_widen_cost_ptr, mul_highpart_cost_ptr): Adjust for new
   indexing of respective fields.
   (expmed_op_cheap_ptr): New function.
   (sdiv_pow2_cheap_ptr, smod_pow2_cheap_ptr): Call it.
   (expmed_op_cost_ptr): New function.
   (add_cost_ptr, neg_cost_ptr, shift_cost_ptr, shiftadd_cost_ptr,
   shiftsub0_cost_ptr, shiftsub1_cost_ptr, mul_cost_ptr,
   sdiv_cost_ptr, udiv_cost_ptr): Call it.

Ok.


r~


Re: [C++ Patch] PR 53624

2012-07-30 Thread Jason Merrill

On 07/30/2012 06:26 PM, Paolo Carlini wrote:

+  if ((cxx_dialect == cxx98)
+      || (TREE_CODE (decl) != FUNCTION_DECL && is_primary))


We shouldn't do this check for non-primary templates in C++98 mode, either.

Jason



[patch] PR pch/53880

2012-07-30 Thread Steven Bosscher
Hello,

This PR concerns a huge compile time regression since
-ftrack-macro-expansion=2 became the default. It turns out that
gengtype generates code that is quadratic in the GTY((length)) of
arrays, and in this case (a PCH for a Boost header...) there are 183k
macro expansion line maps in such an array. For comparison:
there are 2732 ordinary line maps...

The solution I've come up with, is to hoist a check that's inside the
loop over the elements in the array into the loop test, so that you
get changes in gtype-desc.c like this:

@@ -8963,7 +8963,7 @@ gt_pch_p_9line_maps (ATTRIBUTE_UNUSED vo
 size_t l0 = (size_t)(((*x).info_ordinary).used);
 if ((*x).info_ordinary.maps != NULL) {
   size_t i0;
-  for (i0 = 0; i0 != (size_t)(l0); i0++) {
+  for (i0 = 0; i0 != (size_t)(l0) && ((void *)(*x).info_ordinary.maps == this_obj); i0++) {
 switch (((*x).info_ordinary.maps[i0]).reason == LC_ENTER_MACRO)
   {
   case 0:

Inside the loop there are more tests against this_obj, but GCC cannot
perform the unswitching with -funswitch-loops because it cannot
determine that the test is loop invariant (everything is a void*
pointer, there are indirect function calls, and all kinds of other
nasty stuff that inhibit good optimization of the gt-* stuff). So my
patch makes gengtype emit the test in the loop test.
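[A minimal stand-alone illustration of why hoisting the invariant `this_obj` comparison into the loop condition helps — hypothetical names, not actual gengtype output. When the test sits only in the body, walking an array that does not belong to the object being written still costs one iteration per element; in the loop condition, such a foreign array is abandoned after zero iterations.]

```c
#include <stddef.h>

/* Before: the invariant test is buried in the body, so a foreign
   array still costs n iterations of bookkeeping.  */
size_t walk_before (const int *maps, size_t n, const void *this_obj)
{
  size_t iterations = 0;
  for (size_t i = 0; i != n; i++)
    {
      iterations++;
      if ((const void *) maps == this_obj)
        {
          /* ... real per-element pointer-rewriting work ... */
        }
    }
  return iterations;
}

/* After: the same invariant test hoisted into the loop condition;
   a foreign array exits immediately.  */
size_t walk_after (const int *maps, size_t n, const void *this_obj)
{
  size_t iterations = 0;
  for (size_t i = 0; i != n && (const void *) maps == this_obj; i++)
    {
      iterations++;
      /* ... real per-element pointer-rewriting work ... */
    }
  return iterations;
}
```

With 183k macro line maps walked once per PCH object, dropping the foreign-array walks from O(n) to O(1) is what turns the quadratic behaviour back into linear.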

The effect is quite dramatic: Compile time for the test case goes from
9 minutes to 12 seconds on powerpc64-unknown-linux-gnu :-)

Bootstrapped & tested on powerpc64-unknown-linux-gnu. OK for trunk?

Ciao!
Steven


PR53880.diff
Description: Binary data


Re: [C++ Patch] PR 53624

2012-07-30 Thread Paolo Carlini

On 07/31/2012 12:42 AM, Jason Merrill wrote:

On 07/30/2012 06:26 PM, Paolo Carlini wrote:

+  if ((cxx_dialect == cxx98)
+      || (TREE_CODE (decl) != FUNCTION_DECL && is_primary))


We shouldn't do this check for non-primary templates in C++98 mode, 
either.

Yes. Thus the below also passes testing.

Thanks,
Paolo.

///
Index: testsuite/g++.dg/cpp0x/temp_default5.C
===
--- testsuite/g++.dg/cpp0x/temp_default5.C  (revision 0)
+++ testsuite/g++.dg/cpp0x/temp_default5.C  (revision 0)
@@ -0,0 +1,13 @@
+// { dg-options "-std=c++11" }
+
+template <class Z = void, class T>
+void Foo(T)
+{
+  struct X {};
+}
+
+template <class T = int, typename U>
+void f(const U&)
+{
+  auto g = [] () {};
+}
Index: cp/pt.c
===
--- cp/pt.c (revision 189981)
+++ cp/pt.c (working copy)
@@ -4267,7 +4267,8 @@ check_default_tmpl_args (tree decl, tree parms, in
 
   /* Core issue 226 (C++0x only): the following only applies to class
  templates.  */
-  if ((cxx_dialect == cxx98) || TREE_CODE (decl) != FUNCTION_DECL)
+  if (is_primary
+      && ((cxx_dialect == cxx98) || TREE_CODE (decl) != FUNCTION_DECL))
 {
   /* [temp.param]
 
@@ -4299,8 +4300,7 @@ check_default_tmpl_args (tree decl, tree parms, in
   TREE_PURPOSE (parm) = error_mark_node;
   no_errors = false;
 }
-	else if (is_primary
-		 && !is_partial
+	else if (!is_partial
		 && !is_friend_decl
   /* Don't complain about an enclosing partial
  specialization.  */


Re: [C++ Patch] PR 53624

2012-07-30 Thread Jason Merrill

OK.

Jason


[PATCH] Fix the LOOP_BRANCH prediction

2012-07-30 Thread Dehao Chen
Hi,

This patch fixed the problem when a LOOP_EXIT edge for the inner loop
happened to target at the LOOP_LATCH of the outer loop. As the outer
loop is processed first, the LOOP_BRANCH heuristic is honored
(first_match), thus the inner loop's trip count is 0. (The attached
unittest demonstrates this).

Bootstrapped and passed gcc regression test.

Is it ok for trunk?

Thanks,
Dehao

gcc/ChangeLog

2012-07-30  Dehao Chen  de...@google.com

* predict.c (predict_loops): Fix the prediction of LOOP_BRANCH.

gcc/testsuite/ChangeLog

2012-07-31  Dehao Chen  de...@google.com

* gcc.dg/predict-7.c: New test.

Index: gcc/testsuite/gcc.dg/predict-7.c
===
--- gcc/testsuite/gcc.dg/predict-7.c(revision 0)
+++ gcc/testsuite/gcc.dg/predict-7.c(revision 0)
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-profile_estimate" } */
+
+extern int global;
+
+int bar (int);
+
+void foo (int base)
+{
+  int i;
+  while (global < 10)
+    for (i = base; i < 10; i++)
+  bar (i);
+}
+
+/* { dg-final { scan-tree-dump-times "loop branch heuristics" 0 "profile_estimate" } } */
+/* { dg-final { cleanup-tree-dump "profile_estimate" } } */
Index: gcc/predict.c
===
--- gcc/predict.c   (revision 189835)
+++ gcc/predict.c   (working copy)
@@ -1404,7 +1404,7 @@

  /* Loop branch heuristics - predict an edge back to a
 loop's head as taken.  */
-	  if (bb == loop->latch)
+	  if (bb == loop->latch && bb->loop_father == loop)
{
	      e = find_edge (loop->latch, loop->header);
  if (e)


[PATCH] Set correct source location for deallocator calls

2012-07-30 Thread Dehao Chen
Hi,

This patch fixes the source location for automatically generated calls
to deallocator. For example:

 19 void foo(int i)
 20 {
 21   for (int j = 0; j  10; j++)
 22 {
 23   t test;
 24   test.foo();
 25   if (i + j)
 26 {
 27   test.bar();
 28   return;
 29 }
 30 }
 31   return;
 32 }

The deallocator for 23  t test is called in two places: Line 28 and
line 30. However, gcc attributes both callsites to line 30.

Bootstrapped and passed gcc regression tests.

Is it ok for trunk?

Thanks,
Dehao

gcc/ChangeLog

2012-07-31  Dehao Chen  de...@google.com

* tree-eh.c (goto_queue_node): New field.
(record_in_goto_queue): New parameter.
(record_in_goto_queue_label): New parameter.
(lower_try_finally_copy): Update source location.

gcc/testsuite/ChangeLog

2012-07-31  Dehao Chen  de...@google.com

* g++.dg/guality/deallocator.C: New test.

Index: gcc/testsuite/g++.dg/guality/deallocator.C
===
--- gcc/testsuite/g++.dg/guality/deallocator.C  (revision 0)
+++ gcc/testsuite/g++.dg/guality/deallocator.C  (revision 0)
@@ -0,0 +1,33 @@
+// Test that debug info generated for auto-inserted deallocator is
+// correctly attributed.
+// This patch scans for the lineno directly from assembly, which may
+// differ between different architectures. Because it mainly tests
+// FE generated debug info, without losing generality, only x86
+// assembly is scanned in this test.
+// { dg-do compile { target { i?86-*-* x86_64-*-* } } }
// { dg-options "-O2 -fno-exceptions -g" }
+
+struct t {
+  t ();
+  ~t ();
+  void foo();
+  void bar();
+};
+
+int bar();
+
+void foo(int i)
+{
+  for (int j = 0; j  10; j++)
+{
+  t test;
+  test.foo();
+  if (i + j)
+   {
+ test.bar();
+ return;
+   }
+}
+  return;
+}
// { dg-final { scan-assembler "1 28 0" } }
Index: gcc/tree-eh.c
===
--- gcc/tree-eh.c   (revision 189835)
+++ gcc/tree-eh.c   (working copy)
@@ -321,6 +321,7 @@
 struct goto_queue_node
 {
   treemple stmt;
+  enum gimple_code code;
   gimple_seq repl_stmt;
   gimple cont_stmt;
   int index;
@@ -560,7 +561,8 @@
 record_in_goto_queue (struct leh_tf_state *tf,
   treemple new_stmt,
   int index,
-  bool is_label)
+  bool is_label,
+ enum gimple_code code)
 {
   size_t active, size;
   struct goto_queue_node *q;
@@ -583,6 +585,7 @@
   memset (q, 0, sizeof (*q));
   q->stmt = new_stmt;
   q->index = index;
+  q->code = code;
   q->is_label = is_label;
 }

@@ -590,7 +593,8 @@
TF is not null.  */

 static void
-record_in_goto_queue_label (struct leh_tf_state *tf, treemple stmt, tree label)
+record_in_goto_queue_label (struct leh_tf_state *tf, treemple stmt, tree label,
+   enum gimple_code code)
 {
   int index;
   treemple temp, new_stmt;
@@ -629,7 +633,7 @@
  since with a GIMPLE_COND we have an easy access to the then/else
  labels. */
   new_stmt = stmt;
-  record_in_goto_queue (tf, new_stmt, index, true);
+  record_in_goto_queue (tf, new_stmt, index, true, code);
 }

 /* For any GIMPLE_GOTO or GIMPLE_RETURN, decide whether it leaves a try_finally
@@ -649,19 +653,22 @@
 {
 case GIMPLE_COND:
   new_stmt.tp = gimple_op_ptr (stmt, 2);
-  record_in_goto_queue_label (tf, new_stmt, gimple_cond_true_label (stmt));
+  record_in_goto_queue_label (tf, new_stmt, gimple_cond_true_label (stmt),
+ gimple_code (stmt));
   new_stmt.tp = gimple_op_ptr (stmt, 3);
-  record_in_goto_queue_label (tf, new_stmt, gimple_cond_false_label (stmt));
+  record_in_goto_queue_label (tf, new_stmt, gimple_cond_false_label (stmt),
+ gimple_code (stmt));
   break;
 case GIMPLE_GOTO:
   new_stmt.g = stmt;
-  record_in_goto_queue_label (tf, new_stmt, gimple_goto_dest (stmt));
+  record_in_goto_queue_label (tf, new_stmt, gimple_goto_dest (stmt),
+ gimple_code (stmt));
   break;

 case GIMPLE_RETURN:
   tf->may_return = true;
   new_stmt.g = stmt;
-  record_in_goto_queue (tf, new_stmt, -1, false);
+  record_in_goto_queue (tf, new_stmt, -1, false, gimple_code (stmt));
   break;

 default:
@@ -1234,6 +1241,7 @@
   for (index = 0; index  return_index + 1; index++)
{
  tree lab;
+ gimple_stmt_iterator gsi;

  q = labels[index].q;
  if (! q)
@@ -1252,6 +1260,11 @@

  seq = lower_try_finally_dup_block (finally, state);
  lower_eh_constructs_1 (state, seq);
+	for (gsi = gsi_start (seq); !gsi_end_p (gsi); gsi_next (&gsi))
+   gimple_set_location (gsi_stmt (gsi),
+			     q->code == GIMPLE_COND ?
+

TPF: disable discriminators

2012-07-30 Thread DJ Delorie

The TPF assembler supports dwarf4 discriminators, but the TPF
debuggers do not.  Ok to apply?

* config/s390/tpf.h (SUPPORTS_DISCRIMINATOR): Define to 0 for TPF.

Index: gcc/config/s390/tpf.h
===
--- gcc/config/s390/tpf.h   (revision 189993)
+++ gcc/config/s390/tpf.h   (working copy)
@@ -116,3 +116,6 @@
 #define MATH_LIBRARY "CLBM"
 #define LIBSTDCXX "CPP2"
 #endif /* ! _TPF_H */
+
+/* GAS supports it, but the debuggers don't, so avoid it.  */
+#define SUPPORTS_DISCRIMINATOR 0


Re: [PATCH] Follow-up to the last gengtype patch: handle DEF_VEC_A in gengtype

2012-07-30 Thread Laurynas Biveinis
Hi -

 See http://gcc.gnu.org/PR53880#c27

 Could you please have a look at that problem, and see if you, with all
 your GTY-fu, see an easy way out?

It looks like you beat me to it :)

-- 
Laurynas


Re: [patch] PR pch/53880

2012-07-30 Thread Laurynas Biveinis
Steven -

 Bootstrapped & tested on powerpc64-unknown-linux-gnu. OK for trunk?

Thanks for working on this. It looks good, couple of minor comments:

Please add an assert that d->have_this_obj == true in
write_types_local_process_field, before the oprintf that outputs
this_obj.

 @@ -3444,6 +3449,7 @@ write_local_func_for_structure (const_type_p orig_
d.prev_val[3] = x;
d.val = (*x);
d.fn_wants_lvalue = true;
 +  d.have_this_obj = false;

	oprintf (d.of, "\n");
	oprintf (d.of, "void\n");
 @@ -3458,6 +3464,7 @@ write_local_func_for_structure (const_type_p orig_
	  s->kind == TYPE_UNION ? "union" : "struct", s->u.s.tag,
	  s->kind == TYPE_UNION ? "union" : "struct", s->u.s.tag);
d.indent = 2;
 +  d.have_this_obj = true;
walk_type (s, d);
oprintf (d.of, }\n);
  }

The first store is dead here.

Thanks!

--
Laurynas