[RS6000] Fix PR53040

2012-04-20 Thread Alan Modra
I'm applying this to mainline as obvious.  If we save fprs inline for
any reason, we must also restore them inline.  (The same goes for
gprs, and that is handled later in this function.)
Bootstrapped and regression tested powerpc-linux.

PR target/53040
* config/rs6000/rs6000.c (rs6000_savres_strategy): When using
static chain, set REST_INLINE_FPRS too.

diff -urp gcc-alan4/gcc/config/rs6000/rs6000.c gcc-alan5a/gcc/config/rs6000/rs6000.c
--- gcc-alan4/gcc/config/rs6000/rs6000.c	2012-04-19 18:33:20.171843404 +0930
+++ gcc-alan5a/gcc/config/rs6000/rs6000.c	2012-04-19 21:24:46.643632761 +0930
@@ -17456,8 +17456,9 @@ rs6000_savres_strategy (rs6000_stack_t *
  static chain is rarely used anyway.  FPRs are saved w.r.t the stack
  pointer on Darwin.  */
   if (using_static_chain_p)
-strategy |= (DEFAULT_ABI == ABI_DARWIN ? 0 : SAVE_INLINE_FPRS)
-   | SAVE_INLINE_GPRS;
+strategy |= ((DEFAULT_ABI == ABI_DARWIN
+ ? 0 : SAVE_INLINE_FPRS | REST_INLINE_FPRS)
+| SAVE_INLINE_GPRS);
 
   /* If we are going to use store multiple, then don't even bother
  with the out-of-line routines, since the store-multiple

-- 
Alan Modra
Australia Development Lab, IBM


[RS6000] Fix PR53038, cfa_restore out of order

2012-04-20 Thread Alan Modra
);
+restore_saved_cr (cr_save_reg, using_mtcr_multiple, exit_func);
 
   /* If this is V.4, unwind the stack pointer after all of the loads
  have been done.  */
@@ -20877,11 +20940,8 @@
  int i;
  rtx sym;
 
- if ((DEFAULT_ABI == ABI_V4 || flag_shrink_wrap)
-     && lr)
-   cfa_restores = alloc_reg_note (REG_CFA_RESTORE,
-  gen_rtx_REG (Pmode, LR_REGNO),
-  cfa_restores);
+ if (flag_shrink_wrap)
+   cfa_restores = add_crlr_cfa_restore (info, cfa_restores);
 
  sym = rs6000_savres_routine_sym (info,
   /*savep=*/false,
@@ -20902,7 +20962,7 @@
 reg = gen_rtx_REG (DFmode, info->first_fp_reg_save + i);
 
  RTVEC_ELT (p, i + 4) = gen_rtx_SET (VOIDmode, reg, mem);
- if (DEFAULT_ABI == ABI_V4 || flag_shrink_wrap)
+ if (flag_shrink_wrap)
cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg,
   cfa_restores);
}

-- 
Alan Modra
Australia Development Lab, IBM


PowerPC prologue and epilogue 6

2012-04-21 Thread Alan Modra
This patch adds out-of-line vector saves and restores.  To do this I
made some infrastructure changes to various functions like
rs6000_emit_savres_rtx that currently take boolean parameters (savep,
gpr, and lr).  Rather than add yet another boolean to specify vector
regs, I chose to lump them all together in a bitmask.  This made the
patch a little larger but overall is a better interface, I think.

I also revert a change I made in
http://gcc.gnu.org/ml/gcc-patches/2012-04/msg01014.html to always use
r11 as a frame reg whenever abiv4 emits out-of-line saves.  Code
quality in functions with small frames is better without that
particular change.  This however meant some changes are required later
when setting up pointer regs for gpr and fpr out-of-line saves.

What else is here?  Improved register selection when saving vrsave in
the prologue and when restoring cr in the epilogue, allowing better
scheduling.  A fix to rs6000_output_function_prologue to output the
correct .extern for ELF, then deciding we don't need such things
anyway.  And various other little code cleanups.  Bootstrapped and
regression tested powerpc-linux.

gcc/
* config/rs6000/rs6000.c (SAVE_INLINE_VRS, REST_INLINE_VRS,
V_SAVE_INLINE, SAVRES_LR, SAVRES_SAVE, SAVRES_REG,
SAVRES_GPR, SAVRES_FPR, SAVRES_VR): Define.
(no_global_regs_above): Delete.
(no_global_regs): New function.
(rs6000_savres_strategy): Handle vector regs.  Use proper lr_save_p
value for load multiple test.
(savres_routine_syms): Increase size.
(rs6000_savres_routine_name, rs6000_savres_routine_sym,
ptr_regno_for_savres, rs6000_emit_savres_rtx): Pass in int selector
rather than a number of boolean flags.  Update all callers.
(rs6000_savres_routine_name): Generate vector save/restore names.
(rs6000_savres_routine_sym): Handle vector regs.  Delete forward decl.
(ptr_regno_for_savres, rs6000_emit_savres_rtx): Likewise.
(rs6000_emit_prologue): Delete saving_FPRs_inline, saving_GPRs_inline
and using_store_multiple.  Expand uses.  Don't always use r11 as
frame reg when needed for out-of-line saves.  Set up initial offset
for out-of-line vector saves when buying stack frame.  Handle pointer
reg setup for out-of-line fp save.  Emit call to out-of-line vector
save function.  Choose r11 or r12 for vrsave reg when available for
better scheduling.
(rs6000_output_function_prologue): Don't emit .extern for ELF.
(rs6000_emit_epilogue): Choose a better frame reg when restoring
from back-chain to suit out-of-line vector restore functions.  Emit
call to out-of-line vector restore function.  Adjust register used
for cr restore.  Tweak pointer register setup for gpr restore.
* config/rs6000/rs6000.h (FIRST_SAVED_GP_REGNO): Take into account
FIXED_R13.
* config/rs6000/sysv4.h (FP_SAVE_INLINE, GP_SAVE_INLINE): Simplify.
(V_SAVE_INLINE): Define.
* config/rs6000/altivec.md (save_vregs_*, restore_vregs_*): New insns.
libgcc/
* config/rs6000/crtsavevr.S: New file.
* config/rs6000/crtrestvr.S: New file.
* config/rs6000/t-savresfgpr: Build the above.
* config/rs6000/t-netbsd: Likewise.

diff -urpN gcc-alan5/gcc/config/rs6000/rs6000.c gcc-alan6/gcc/config/rs6000/rs6000.c
--- gcc-alan5/gcc/config/rs6000/rs6000.c	2012-04-19 20:55:02.214727782 +0930
+++ gcc-alan6/gcc/config/rs6000/rs6000.c	2012-04-21 15:47:44.193462791 +0930
@@ -937,7 +937,6 @@ static bool legitimate_small_data_p (enu
 static bool legitimate_lo_sum_address_p (enum machine_mode, rtx, int);
 static struct machine_function * rs6000_init_machine_status (void);
 static bool rs6000_assemble_integer (rtx, unsigned int, int);
-static bool no_global_regs_above (int, bool);
 #if defined (HAVE_GAS_HIDDEN) && !TARGET_MACHO
 static void rs6000_assemble_visibility (tree, int);
 #endif
@@ -950,7 +949,6 @@ static tree rs6000_handle_struct_attribu
 static void rs6000_eliminate_indexed_memrefs (rtx operands[2]);
 static const char *rs6000_mangle_type (const_tree);
 static void rs6000_set_default_type_attributes (tree);
-static rtx rs6000_savres_routine_sym (rs6000_stack_t *, bool, bool, bool);
 static bool rs6000_reg_live_or_pic_offset_p (int);
 static tree rs6000_builtin_vectorized_libmass (tree, tree, tree);
 static tree rs6000_builtin_vectorized_function (tree, tree, tree);
@@ -17405,6 +17403,21 @@ is_altivec_return_reg (rtx reg, void *xy
 }
 
 
+/* Look for user-defined global regs in the range FIRST to LAST-1.
+   We should not restore these, and so cannot use lmw or out-of-line
+   restore functions if there are any.  We also can't save them
+   (well, emit frame notes for them), because frame unwinding during
+   exception handling will restore saved registers.  */
+
+static bool
+global_regs_p (unsigned first, unsigned last)
+{
+  while (first < last)
+if 

Re: PowerPC prologue and epilogue 6

2012-04-24 Thread Alan Modra
 (void)
+{
+  char b[10];
+  void ns_0 (void)
+  {
+char a[33];
+    __asm __volatile ("#%0 %1" : "=m" (a), "=m" (b));
+  }
+  ns_0 ();
+}
+
+int main (void)
+{
+  INIT_REGS;
+  USE_ALL_CR;
+#ifdef __ALTIVEC__
+  USE_ALL_VR;
+#ifndef __NO_FPRS__
+  USE_ALL_FPR;
+#endif
+#endif
+  USE_ALL_GPR;
+#ifdef __ALTIVEC__
+#ifndef __NO_FPRS__
+  b_all ();
+  VERIFY_REGS;
+  b_cvfr ();
+  VERIFY_REGS;
+  b_vfr ();
+  VERIFY_REGS;
+  b_cvf ();
+  VERIFY_REGS;
+  b_vf ();
+  VERIFY_REGS;
+#endif
+  b_cvr ();
+  VERIFY_REGS;
+  b_vr ();
+  VERIFY_REGS;
+  b_cv ();
+  VERIFY_REGS;
+  b_v ();
+  VERIFY_REGS;
+#endif
+#ifndef __NO_FPRS__
+  b_cfr ();
+  VERIFY_REGS;
+  b_fr ();
+  VERIFY_REGS;
+  b_cf ();
+  VERIFY_REGS;
+  b_f ();
+  VERIFY_REGS;
+#endif
+  b_cr ();
+  VERIFY_REGS;
+  b_r ();
+  VERIFY_REGS;
+  b_c ();
+  VERIFY_REGS;
+  b_0 ();
+  VERIFY_REGS;
+#ifdef __ALTIVEC__
+#ifndef __NO_FPRS__
+  s_all ();
+  VERIFY_REGS;
+  s_cvfr ();
+  VERIFY_REGS;
+  s_vfr ();
+  VERIFY_REGS;
+  s_cvf ();
+  VERIFY_REGS;
+  s_vf ();
+  VERIFY_REGS;
+#endif
+  s_cvr ();
+  VERIFY_REGS;
+  s_vr ();
+  VERIFY_REGS;
+  s_cv ();
+  VERIFY_REGS;
+  s_v ();
+  VERIFY_REGS;
+#endif
+#ifndef __NO_FPRS__
+  s_cfr ();
+  VERIFY_REGS;
+  s_fr ();
+  VERIFY_REGS;
+  s_cf ();
+  VERIFY_REGS;
+  s_f ();
+  VERIFY_REGS;
+#endif
+  s_cr ();
+  VERIFY_REGS;
+  s_r ();
+  VERIFY_REGS;
+  s_c ();
+  VERIFY_REGS;
+  s_0 ();
+  VERIFY_REGS;
+#ifdef __ALTIVEC__
+#ifndef __NO_FPRS__
+  wb_all ();
+  VERIFY_REGS;
+  wb_cvfr ();
+  VERIFY_REGS;
+  wb_vfr ();
+  VERIFY_REGS;
+  wb_cvf ();
+  VERIFY_REGS;
+  wb_vf ();
+  VERIFY_REGS;
+#endif
+  wb_cvr ();
+  VERIFY_REGS;
+  wb_vr ();
+  VERIFY_REGS;
+  wb_cv ();
+  VERIFY_REGS;
+  wb_v ();
+  VERIFY_REGS;
+#endif
+#ifndef __NO_FPRS__
+  wb_cfr ();
+  VERIFY_REGS;
+  wb_fr ();
+  VERIFY_REGS;
+  wb_cf ();
+  VERIFY_REGS;
+  wb_f ();
+  VERIFY_REGS;
+#endif
+  wb_cr ();
+  VERIFY_REGS;
+  wb_r ();
+  VERIFY_REGS;
+  wb_c ();
+  VERIFY_REGS;
+  wb_0 ();
+  VERIFY_REGS;
+#ifdef __ALTIVEC__
+#ifndef __NO_FPRS__
+  ws_all ();
+  VERIFY_REGS;
+  ws_cvfr ();
+  VERIFY_REGS;
+  ws_vfr ();
+  VERIFY_REGS;
+  ws_cvf ();
+  VERIFY_REGS;
+  ws_vf ();
+  VERIFY_REGS;
+#endif
+  ws_cvr ();
+  VERIFY_REGS;
+  ws_vr ();
+  VERIFY_REGS;
+  ws_cv ();
+  VERIFY_REGS;
+  ws_v ();
+  VERIFY_REGS;
+#endif
+#ifndef __NO_FPRS__
+  ws_cfr ();
+  VERIFY_REGS;
+  ws_fr ();
+  VERIFY_REGS;
+  ws_cf ();
+  VERIFY_REGS;
+  ws_f ();
+  VERIFY_REGS;
+#endif
+  ws_cr ();
+  VERIFY_REGS;
+  ws_r ();
+  VERIFY_REGS;
+  ws_c ();
+  VERIFY_REGS;
+  ws_0 ();
+  VERIFY_REGS;
+  return 0;
+}


-- 
Alan Modra
Australia Development Lab, IBM


Re: [RS6000] Fix PR53038, cfa_restore out of order

2012-04-25 Thread Alan Modra
On Wed, Apr 25, 2012 at 02:00:48PM -0700, Richard Henderson wrote:
> On 04/20/12 18:58, Alan Modra wrote:
>> +  add_reg_note (insn, REG_FRAME_RELATED_EXPR, set);
>> +  RTX_FRAME_RELATED_P (insn) = 1;
>
> You're probably better off using REG_CFA_REGISTER here,
> rather than making dwarf2cfi.c guess what you meant.
> The guessing is rather tuned to prologues...

It worked!  I take your point though, and thanks for reviewing the
patch!  The new version is just s/FRAME_RELATED_EXPR/CFA_REGISTER/
but I'll post the entire patch again since my other prologue/epilogue
patches mean the original pr53038 fix no longer applies cleanly.

Bootstrapped and regression tested powerpc-linux.

PR target/53038
* config/rs6000/rs6000.c (load_lr_save, restore_saved_lr,
load_cr_save, add_crlr_cfa_restore): New functions.
(rs6000_restore_saved_cr): Rename to..
(restore_saved_cr): ..this.  Add cfa_restore notes for cr.
(rs6000_emit_epilogue): Use new functions.  Adjust condition
for emitting lr and cr cfa_restore.  Emit cfa_restores for fp
regs when using out-of-line restore only when shrink wrapping.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 186854)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -952,7 +952,6 @@
 static bool rs6000_reg_live_or_pic_offset_p (int);
 static tree rs6000_builtin_vectorized_libmass (tree, tree, tree);
 static tree rs6000_builtin_vectorized_function (tree, tree, tree);
-static void rs6000_restore_saved_cr (rtx, int);
 static bool rs6000_output_addr_const_extra (FILE *, rtx);
 static void rs6000_output_function_prologue (FILE *, HOST_WIDE_INT);
 static void rs6000_output_function_epilogue (FILE *, HOST_WIDE_INT);
@@ -20276,10 +20275,37 @@
we restore after the pop when possible.  */
 #define ALWAYS_RESTORE_ALTIVEC_BEFORE_POP 0
 
+/* Restoring cr is a two step process: loading a reg from the frame
+   save, then moving the reg to cr.  For ABI_V4 we must let the
+   unwinder know that the stack location is no longer valid at or
+   before the stack deallocation, but we can't emit a cfa_restore for
+   cr at the stack deallocation like we do for other registers.
+   The trouble is that it is possible for the move to cr to be
+   scheduled after the stack deallocation.  So say exactly where cr
+   is located on each of the two insns.  */
+
+static rtx
+load_cr_save (int regno, rtx frame_reg_rtx, int offset, bool exit_func)
+{
+  rtx mem = gen_frame_mem_offset (SImode, frame_reg_rtx, offset);
+  rtx reg = gen_rtx_REG (SImode, regno);
+  rtx insn = emit_move_insn (reg, mem);
+
+  if (!exit_func && DEFAULT_ABI == ABI_V4)
+{
+  rtx cr = gen_rtx_REG (SImode, CR2_REGNO);
+  rtx set = gen_rtx_SET (VOIDmode, reg, cr);
+
+  add_reg_note (insn, REG_CFA_REGISTER, set);
+  RTX_FRAME_RELATED_P (insn) = 1;
+}
+  return reg;
+}
+
 /* Reload CR from REG.  */
 
 static void
-rs6000_restore_saved_cr (rtx reg, int using_mfcr_multiple)
+restore_saved_cr (rtx reg, int using_mfcr_multiple, bool exit_func)
 {
   int count = 0;
   int i;
@@ -20317,13 +20343,61 @@
   else
    for (i = 0; i < 8; i++)
      if (df_regs_ever_live_p (CR0_REGNO+i) && ! call_used_regs[CR0_REGNO+i])
-   {
- emit_insn (gen_movsi_to_cr_one (gen_rtx_REG (CCmode,
-  CR0_REGNO+i),
- reg));
-   }
+   emit_insn (gen_movsi_to_cr_one (gen_rtx_REG (CCmode, CR0_REGNO+i),
+   reg));
+
+  if (!exit_func && (DEFAULT_ABI == ABI_V4 || flag_shrink_wrap))
+{
+  rtx insn = get_last_insn ();
+  rtx cr = gen_rtx_REG (SImode, CR2_REGNO);
+
+  add_reg_note (insn, REG_CFA_RESTORE, cr);
+  RTX_FRAME_RELATED_P (insn) = 1;
+}
 }
 
+/* Like cr, the move to lr instruction can be scheduled after the
+   stack deallocation, but unlike cr, its stack frame save is still
+   valid.  So we only need to emit the cfa_restore on the correct
+   instruction.  */
+
+static void
+load_lr_save (int regno, rtx frame_reg_rtx, int offset)
+{
+  rtx mem = gen_frame_mem_offset (Pmode, frame_reg_rtx, offset);
+  rtx reg = gen_rtx_REG (Pmode, regno);
+
+  emit_move_insn (reg, mem);
+}
+
+static void
+restore_saved_lr (int regno, bool exit_func)
+{
+  rtx reg = gen_rtx_REG (Pmode, regno);
+  rtx lr = gen_rtx_REG (Pmode, LR_REGNO);
+  rtx insn = emit_move_insn (lr, reg);
+
+  if (!exit_func && flag_shrink_wrap)
+{
+  add_reg_note (insn, REG_CFA_RESTORE, lr);
+  RTX_FRAME_RELATED_P (insn) = 1;
+}
+}
+
+static rtx
+add_crlr_cfa_restore (const rs6000_stack_t *info, rtx cfa_restores)
+{
+  if (info->cr_save_p)
+cfa_restores = alloc_reg_note (REG_CFA_RESTORE,
+  gen_rtx_REG (SImode, CR2_REGNO),
+  cfa_restores);
+  if (info->lr_save_p)
+cfa_restores = alloc_reg_note

Re: rs6000 toc reference rtl again

2012-04-30 Thread Alan Modra
 (match_dup 0) (lo_sum:P (match_dup 0) (match_dup 1)))])
+
 ;; Elf specific ways of loading addresses for non-PIC code.
 ;; The output of this could be r0, but we make a very strong
 ;; preference for a base register because it will usually
@@ -12251,22 +12274,6 @@
@
 {cal|la} %0,%2@l(%1)
 {ai|addic} %0,%1,%K2)
-
-;; Largetoc support
-(define_insn largetoc_high
-  [(set (match_operand:DI 0 gpc_reg_operand =b)
-   (const:DI
- (plus:DI (match_operand:DI 1 gpc_reg_operand b)
-  (high:DI (match_operand:DI 2  )]
-   TARGET_ELF && TARGET_CMODEL != CMODEL_SMALL
-   {cau|addis} %0,%1,%2@ha)
-
-(define_insn largetoc_low
-  [(set (match_operand:DI 0 gpc_reg_operand =r)
-(lo_sum:DI (match_operand:DI 1 gpc_reg_operand b)
-  (match_operand:DI 2  )))]
-   TARGET_ELF && TARGET_CMODEL != CMODEL_SMALL
-   {cal %0,%2@l(%1)|addi %0,%1,%2@l})
 
 ;; Call and call_value insns
 (define_expand call
Index: gcc/config/rs6000/rs6000-protos.h
===
--- gcc/config/rs6000/rs6000-protos.h   (revision 187010)
+++ gcc/config/rs6000/rs6000-protos.h   (working copy)
@@ -38,7 +38,7 @@ extern bool macho_lo_sum_memory_operand (rtx, enum
 extern int num_insns_constant (rtx, enum machine_mode);
 extern int num_insns_constant_wide (HOST_WIDE_INT);
 extern int small_data_operand (rtx, enum machine_mode);
-extern bool toc_relative_expr_p (rtx);
+extern bool toc_relative_expr_p (const_rtx, bool);
 extern bool invalid_e500_subreg (rtx, enum machine_mode);
 extern void validate_condition_mode (enum rtx_code, enum machine_mode);
 extern bool legitimate_constant_pool_address_p (const_rtx, enum machine_mode,

-- 
Alan Modra
Australia Development Lab, IBM


[RS6000] Fix PR53271 powerpc ports

2012-05-08 Thread Alan Modra
 (p, i + 4) = gen_rtx_SET (VOIDmode, reg, mem);
+ RTVEC_ELT (p, i + 4)
+	   = gen_frame_load (reg, sp_reg_rtx, info->fp_save_offset + 8 * i);
  if (flag_shrink_wrap)
cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg,
   cfa_restores);

-- 
Alan Modra
Australia Development Lab, IBM


Re: rs6000 toc reference rtl again

2012-05-15 Thread Alan Modra
On Tue, May 01, 2012 at 01:47:51PM +0930, Alan Modra wrote:
> This revision splits the medium/large code model toc reference after
> reload.

Ping http://gcc.gnu.org/ml/gcc-patches/2012-05/msg6.html

-- 
Alan Modra
Australia Development Lab, IBM


rs6000.c forward declaration cleanup

2012-05-20 Thread Alan Modra
) ATTRIBUTE_UNUSED;
 static void
 rs6000_output_dwarf_dtprel (FILE *file, int size, rtx x)
 {
@@ -6419,7 +6205,7 @@
32-bit DImode, TImode, TFmode, TDmode), indexed addressing cannot be used
because adjacent memory cells are accessed by adding word-sized offsets
during assembly output.  */
-bool
+static bool
 rs6000_legitimate_address_p (enum machine_mode mode, rtx x, bool reg_ok_strict)
 {
   bool reg_offset_p = reg_offset_addressing_ok_p (mode);
@@ -9207,7 +8993,7 @@
 
 /* Implement va_arg.  */
 
-tree
+static tree
 rs6000_gimplify_va_arg (tree valist, tree type, gimple_seq *pre_p,
gimple_seq *post_p)
 {
@@ -24740,6 +24526,7 @@
function descriptor name.  This saves a lot of overriding code to
read the prefixes.  */
 
+static void rs6000_elf_encode_section_info (tree, rtx, int) ATTRIBUTE_UNUSED;
 static void
 rs6000_elf_encode_section_info (tree decl, rtx rtl, int first)
 {
@@ -25252,6 +25039,7 @@
This differs from default_named_section_asm_out_constructor in
that we have special handling for -mrelocatable.  */
 
+static void rs6000_elf_asm_out_constructor (rtx, int) ATTRIBUTE_UNUSED;
 static void
 rs6000_elf_asm_out_constructor (rtx symbol, int priority)
 {
@@ -25281,6 +25069,7 @@
 assemble_integer (symbol, POINTER_SIZE / BITS_PER_UNIT, POINTER_SIZE, 1);
 }
 
+static void rs6000_elf_asm_out_destructor (rtx, int) ATTRIBUTE_UNUSED;
 static void
 rs6000_elf_asm_out_destructor (rtx symbol, int priority)
 {
@@ -25386,6 +25175,7 @@
   ASM_OUTPUT_LABEL (file, name);
 }
 
+static void rs6000_elf_file_end (void) ATTRIBUTE_UNUSED;
 static void
 rs6000_elf_file_end (void)
 {
@@ -26931,7 +26721,7 @@
On RS/6000 an integer value is in r3 and a floating-point value is in
fp1, unless -msoft-float.  */
 
-rtx
+static rtx
 rs6000_function_value (const_tree valtype,
   const_tree fn_decl_or_type ATTRIBUTE_UNUSED,
   bool outgoing ATTRIBUTE_UNUSED)
@@ -27078,7 +26868,7 @@
We need r30 if -mminimal-toc was specified, and there are constant pool
references.  */
 
-bool
+static bool
 rs6000_can_eliminate (const int from, const int to)
 {
   return (from == ARG_POINTER_REGNUM  to == STACK_POINTER_REGNUM
@@ -28205,6 +27995,7 @@
 /* This function emits the simple thunk routine that is used to preserve
the link stack on the 476 cpu.  */
 
+static void rs6000_code_end (void) ATTRIBUTE_UNUSED;
 static void
 rs6000_code_end (void)
 {

-- 
Alan Modra
Australia Development Lab, IBM


[RS6000] save/restore reg tidy

2012-05-20 Thread Alan Modra
On Tue, May 08, 2012 at 08:02:39PM +0930, Alan Modra wrote:
> I also make use of gen_frame_store and siblings that I invented for
> generating the eh info, elsewhere in rs6000.c where doing so is
> blindingly obvious.  We could probably use them in other places too,
> but I'll leave that for later.

Like so.  The part that isn't completely obvious is removing calls to
emit_move_insn, which can transform the rtl (rs6000_emit_move).
However, we're past reload, the insns emitted are always one set
involving a hard reg and mem, and we don't want any addressing mode
substitutions going on that avoid rs6000_emit_prologue's tracking of r0,
r11 and r12 usage. 

This patch also fixes a couple of places that call df_regs_ever_live_p
without checking call_used_regs to test for global asm regs.  Not
serious bugs as they just result in larger stack frames.

Bootstrapped and regression tested powerpc-linux.  OK to apply?

* config/rs6000/rs6000.c (save_reg_p): New function.
(first_reg_to_save, first_fp_reg_to_save): Use it here.
(first_altivec_reg_to_save, restore_saved_cr): Likewise.
(emit_frame_save): Use gen_frame_store.
(gen_frame_mem_offset): Correct SPE condition requiring reg+reg.
(rs6000_emit_prologue): Use save_reg_p.  Use gen_frame_store for
vrsave and toc.
(rs6000_emit_epilogue): Use save_reg_p.  Use gen_frame_load for
vrsave, toc, gp and fp restores.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 187699)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -17236,6 +17079,12 @@
 /* This page contains routines that are used to determine what the
function prologue and epilogue code will do and write them out.  */
 
+static inline bool
+save_reg_p (int r)
+{
+  return !call_used_regs[r] && df_regs_ever_live_p (r);
+}
+
 /* Return the first fixed-point register that is required to be
saved. 32 if none.  */
 
@@ -17246,14 +17095,16 @@
 
   /* Find lowest numbered live register.  */
   for (first_reg = 13; first_reg <= 31; first_reg++)
-    if (df_regs_ever_live_p (first_reg)
-	&& (! call_used_regs[first_reg]
-	    || (first_reg == RS6000_PIC_OFFSET_TABLE_REGNUM
-		&& ((DEFAULT_ABI == ABI_V4 && flag_pic != 0)
-		    || (DEFAULT_ABI == ABI_DARWIN && flag_pic)
-		    || (TARGET_TOC && TARGET_MINIMAL_TOC)))))
+if (save_reg_p (first_reg))
   break;
 
+  if (first_reg > RS6000_PIC_OFFSET_TABLE_REGNUM
+      && ((DEFAULT_ABI == ABI_V4 && flag_pic != 0)
+	  || (DEFAULT_ABI == ABI_DARWIN && flag_pic)
+	  || (TARGET_TOC && TARGET_MINIMAL_TOC))
+      && df_regs_ever_live_p (RS6000_PIC_OFFSET_TABLE_REGNUM))
+first_reg = RS6000_PIC_OFFSET_TABLE_REGNUM;
+
 #if TARGET_MACHO
   if (flag_pic
       && crtl->uses_pic_offset_table
@@ -17273,7 +17124,7 @@
 
   /* Find lowest numbered live register.  */
   for (first_reg = 14 + 32; first_reg <= 63; first_reg++)
-if (df_regs_ever_live_p (first_reg))
+if (save_reg_p (first_reg))
   break;
 
   return first_reg;
@@ -17299,7 +17150,7 @@
 
   /* Find lowest numbered live register.  */
   for (i = FIRST_ALTIVEC_REGNO + 20; i <= LAST_ALTIVEC_REGNO; ++i)
-if (df_regs_ever_live_p (i))
+if (save_reg_p (i))
   break;
 
   return i;
@@ -18995,7 +18904,7 @@
 emit_frame_save (rtx frame_reg, enum machine_mode mode,
 unsigned int regno, int offset, HOST_WIDE_INT frame_reg_to_sp)
 {
-  rtx reg, insn, mem, addr;
+  rtx reg, insn;
 
   /* Some cases that need register indexed addressing.  */
   gcc_checking_assert (!((TARGET_ALTIVEC_ABI && ALTIVEC_VECTOR_MODE (mode))
@@ -19006,9 +18915,7 @@
			 && !SPE_CONST_OFFSET_OK (offset)));
 
   reg = gen_rtx_REG (mode, regno);
-  addr = gen_rtx_PLUS (Pmode, frame_reg, GEN_INT (offset));
-  mem = gen_frame_mem (mode, addr);
-  insn = emit_move_insn (mem, reg);
+  insn = emit_insn (gen_frame_store (reg, frame_reg, offset));
   return rs6000_frame_related (insn, frame_reg, frame_reg_to_sp,
   NULL_RTX, NULL_RTX);
 }
@@ -19023,7 +18930,7 @@
 
   int_rtx = GEN_INT (offset);
 
-  if ((TARGET_SPE_ABI && SPE_VECTOR_MODE (mode))
+  if ((TARGET_SPE_ABI && SPE_VECTOR_MODE (mode) && !SPE_CONST_OFFSET_OK (offset))
   || (TARGET_E500_DOUBLE  mode == DFmode))
 {
   offset_rtx = gen_rtx_REG (Pmode, FIXED_SCRATCH);
@@ -19652,8 +19559,7 @@
 {
   int i;
    for (i = 0; i < 64 - info->first_fp_reg_save; i++)
-	if (df_regs_ever_live_p (info->first_fp_reg_save + i)
-	    && ! call_used_regs[info->first_fp_reg_save + i])
+	if (save_reg_p (info->first_fp_reg_save + i))
  emit_frame_save (frame_reg_rtx,
		  (TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT
? DFmode : SFmode),
@@ -20103,7 +20009,7 @@
        && TARGET_ALTIVEC_VRSAVE
        && info->vrsave_mask != 0)
 {
-  rtx reg, mem, vrsave;
+  rtx reg

[RS6000] out-of-line save/restore conditions

2012-05-20 Thread Alan Modra
  REST_INLINE_FPRS))
 strategy |= REST_NOINLINE_FPRS_DOESNT_RESTORE_LR;
-#endif
+
   if (TARGET_MACHO && !(strategy & SAVE_INLINE_FPRS))
 strategy |= SAVE_NOINLINE_FPRS_SAVES_LR;
 

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] Add powerpc64-linux configuration options

2012-05-24 Thread Alan Modra
On Thu, May 24, 2012 at 01:45:41PM -0400, Michael Meissner wrote:
> This alternative patch just disables building the 32-bit softfloat
> multilib, and removes the -mstrict-align in the powerpc64-linux case.
> 
> I have bootstrapped it and had no regressions.
> 
> David, which patch do you prefer?
> 
> 2012-05-24  Michael Meissner  meiss...@linux.vnet.ibm.com
> 
> 	* config/rs6000/t-linux64: Delete the 32-bit multilib that uses
> 	software floating point emulation.  No longer build the multilibs
> 	with -mstrict-align.

I like this patch.  I like it because all uses of MULTILIB_EXTRA_OPTS
are suspect.  Why?  The option is not applied to the default multilib!
That means that prior to this patch you get different libraries
depending on the compiler default.  For example, a powerpc-linux gcc
defaulting to -m32 builds the 32-bit libraries without -mstrict-align,
while a powerpc64-linux gcc defaulting to -m64 builds the 32-bit
libraries with -mstrict-align.  This is confusing to say the least,
and I recall seeing libstdc++ test failures due to the difference.

I've had MULTILIB_EXTRA_OPTS = in my t-linux64 for quite some time.

-- 
Alan Modra
Australia Development Lab, IBM


Re: PR53914, rs6000 constraints and reload queries

2012-07-24 Thread Alan Modra
 input_operand r,m,r,I,L,nF,d,m,d,*h,r,0,O))]
+  [(set (match_operand:DI 0 nonimmediate_operand =Y,r,r,r,r,r,?m,?*d,?*d,r,*h,*h,?wa)
+   (match_operand:DI 1 input_operand r,Y,r,I,L,nF,d,m,d,*h,r,0,O))]
   TARGET_POWERPC64 && (!TARGET_MFPGPR || !TARGET_HARD_FLOAT || !TARGET_FPRS)
    && (gpc_reg_operand (operands[0], DImode)
	|| gpc_reg_operand (operands[1], DImode))
   @
+   std%U0%X0 %1,%0
+   ld%U1%X1 %0,%1
mr %0,%1
-   ld%U1%X1 %0,%1
-   std%U0%X0 %1,%0
li %0,%1
lis %0,%v1
#
+   stfd%U0%X0 %1,%0
+   lfd%U1%X1 %0,%1
fmr %0,%1
-   lfd%U1%X1 %0,%1
-   stfd%U0%X0 %1,%0
mf%1 %0
mt%0 %1
{cror 0,0,0|nop}
xxlxor %x0,%x0,%x0
-  [(set_attr type *,load,store,*,*,*,fp,fpload,fpstore,mfjmpr,mtjmpr,*,vecsimple)
+  [(set_attr type store,load,*,*,*,*,fpstore,fpload,fp,mfjmpr,mtjmpr,*,vecsimple)
(set_attr length 4,4,4,4,4,20,4,4,4,4,4,4,4)])
 
 ;; immediate value valid for a single instruction hiding in a const_double
@@ -10313,8 +10313,8 @@
 ;; giving the SCRATCH mq.
 
 (define_insn *movti_power
-  [(set (match_operand:TI 0 reg_or_mem_operand =Q,m,r,r,r,r)
-   (match_operand:TI 1 input_operand r,r,r,Q,m,n))
+  [(set (match_operand:TI 0 reg_or_mem_operand =Q,Y,r,r,r,r)
+   (match_operand:TI 1 input_operand r,r,Q,Y,r,n))
(clobber (match_scratch:SI 2 =q,q#X,X,X,X,X))]
   TARGET_POWER && ! TARGET_POWERPC64
    && (gpc_reg_operand (operands[0], TImode) || gpc_reg_operand (operands[1], TImode))
@@ -10329,25 +10329,25 @@
   if (TARGET_STRING)
 return \{stsi|stswi} %1,%P0,16\;
 case 1:
+  return \#\;
 case 2:
-  return \#\;
-case 3:
   /* If the address is not used in the output, we can use lsi.  Otherwise,
 fall through to generating four loads.  */
       if (TARGET_STRING
	   && ! reg_overlap_mentioned_p (operands[0], operands[1]))
return \{lsi|lswi} %0,%P1,16\;
   /* ... fall through ...  */
+case 3:
 case 4:
 case 5:
   return \#\;
 }
 }
-  [(set_attr type store,store,*,load,load,*)])
+  [(set_attr type store,store,load,load,*,*)])
 
 (define_insn *movti_string
-  [(set (match_operand:TI 0 reg_or_mem_operand =Q,o,r,r,r,r)
-   (match_operand:TI 1 input_operand r,r,r,Q,m,n))]
+  [(set (match_operand:TI 0 reg_or_mem_operand =Q,Y,r,r,r,r)
+   (match_operand:TI 1 input_operand r,r,Q,Y,r,n))]
   ! TARGET_POWER && ! TARGET_POWERPC64
    && (gpc_reg_operand (operands[0], TImode) || gpc_reg_operand (operands[1], TImode))
   *
@@ -10360,33 +10360,33 @@
   if (TARGET_STRING)
 return \{stsi|stswi} %1,%P0,16\;
 case 1:
+  return \#\;
 case 2:
-  return \#\;
-case 3:
   /* If the address is not used in the output, we can use lsi.  Otherwise,
 fall through to generating four loads.  */
       if (TARGET_STRING
	   && ! reg_overlap_mentioned_p (operands[0], operands[1]))
return \{lsi|lswi} %0,%P1,16\;
   /* ... fall through ...  */
+case 3:
 case 4:
 case 5:
   return \#\;
 }
 }
-  [(set_attr type store_ux,store_ux,*,load_ux,load_ux,*)
+  [(set_attr type store_ux,store_ux,load_ux,load_ux,*,*)
(set (attr cell_micro) (if_then_else (match_test TARGET_STRING)
  (const_string always)
  (const_string conditional)))])
 
 (define_insn *movti_ppc64
-  [(set (match_operand:TI 0 nonimmediate_operand =r,o,r)
-   (match_operand:TI 1 input_operand r,r,m))]
+  [(set (match_operand:TI 0 nonimmediate_operand =Y,r,r)
+   (match_operand:TI 1 input_operand r,Y,r))]
   (TARGET_POWERPC64 && (gpc_reg_operand (operands[0], TImode)
			 || gpc_reg_operand (operands[1], TImode)))
    && VECTOR_MEM_NONE_P (TImode)
   #
-  [(set_attr type *,store,load)])
+  [(set_attr type store,load,*)])
 
 (define_split
   [(set (match_operand:TI 0 gpc_reg_operand )
@@ -13215,8 +13215,8 @@
(set_attr length 12)])
 
 (define_insn stack_protect_setdi
-  [(set (match_operand:DI 0 memory_operand =m)
-   (unspec:DI [(match_operand:DI 1 memory_operand m)] UNSPEC_SP_SET))
+  [(set (match_operand:DI 0 memory_operand =Y)
+   (unspec:DI [(match_operand:DI 1 memory_operand Y)] UNSPEC_SP_SET))
(set (match_scratch:DI 2 =r) (const_int 0))]
   TARGET_64BIT
   ld%U1%X1 %2,%1\;std%U0%X0 %2,%0\;{lil|li} %2,0
@@ -13257,8 +13257,8 @@
 
 (define_insn stack_protect_testdi
   [(set (match_operand:CCEQ 0 cc_reg_operand =x,?y)
-(unspec:CCEQ [(match_operand:DI 1 memory_operand m,m)
- (match_operand:DI 2 memory_operand m,m)]
+(unspec:CCEQ [(match_operand:DI 1 memory_operand Y,Y)
+ (match_operand:DI 2 memory_operand Y,Y)]
 UNSPEC_SP_TEST))
(set (match_scratch:DI 4 =r,r) (const_int 0))
(clobber (match_scratch:DI 3 =r,r))]

-- 
Alan Modra
Australia Development Lab, IBM


[RS6000] Fix PR54093, ICE due to 07-24 changes

2012-07-27 Thread Alan Modra
This fixes a thinko and typo in
http://gcc.gnu.org/ml/gcc-patches/2012-07/msg01168.html that shows up
as an ICE on e500.  Two issues really.  One is that the secondary
reload insn emitted by rs6000_secondary_reload_gpr had better be
valid.  The other is that we only need these reloads when the
insn predicate says that the address is good.  Fixing the second
problem avoids the first.

Bootstrapped and regression tested powerpc-linux, and fixes the
testcases in the PR on e500.  OK to apply?

* config/rs6000/rs6000.c (rs6000_secondary_reload): Limit 32-bit
multi-gpr reload to cases where predicate passes.  Do the same for
64-bit multi-gpr reload.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 189801)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -13643,8 +13643,11 @@ rs6000_secondary_reload (bool in_p,
	   && GET_MODE_SIZE (GET_MODE (x)) >= UNITS_PER_WORD)
 {
   rtx off = address_offset (XEXP (x, 0));
+  unsigned int extra = GET_MODE_SIZE (GET_MODE (x)) - UNITS_PER_WORD;
 
-  if (off != NULL_RTX && (INTVAL (off) & 3) != 0)
+  if (off != NULL_RTX
+      && (INTVAL (off) & 3) != 0
+      && (unsigned HOST_WIDE_INT) INTVAL (off) + 0x8000 < 0x10000 - extra)
{
  if (in_p)
sri-icode = CODE_FOR_reload_di_load;
@@ -13662,10 +13665,17 @@ rs6000_secondary_reload (bool in_p,
	   && GET_MODE_SIZE (GET_MODE (x)) > UNITS_PER_WORD)
 {
   rtx off = address_offset (XEXP (x, 0));
+  unsigned int extra = GET_MODE_SIZE (GET_MODE (x)) - UNITS_PER_WORD;
 
+  /* We need a secondary reload only when our legitimate_address_p
+says the address is good (as otherwise the entire address
+will be reloaded).  So for mode sizes of 8 and 16 this will
+be when the offset is in the ranges [0x7ffc,0x7fff] and
+[0x7ff4,0x7ff7] respectively.  Note that the address we see
+here may have been manipulated by legitimize_reload_address.  */
   if (off != NULL_RTX
-      && ((unsigned HOST_WIDE_INT) INTVAL (off) + 0x8000
-	  >= 0x10000u - (GET_MODE_SIZE (GET_MODE (x)) - UNITS_PER_WORD)))
+      && ((unsigned HOST_WIDE_INT) INTVAL (off) - (0x8000 - extra)
+	  < UNITS_PER_WORD))
{
  if (in_p)
sri-icode = CODE_FOR_reload_si_load;

-- 
Alan Modra
Australia Development Lab, IBM


[RS6000] Fix PR54131, ICE building 416.gamess

2012-07-31 Thread Alan Modra
This cures the 'Y' constraint of being overly restrictive with lo_sum
offsets.  I've added a comment that explains why it is wrong to limit
the range of lo_sum offsets.  Bootstrapped and regression tested
powerpc-linux.  OK to apply?

PR target/54131
* config/rs6000/rs6000.c (mem_operand_gpr): Don't limit range
of lo_sum offsets.  Comment.  Assert mode at least word size
rather than bypassing powerpc64 word offset check.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 189996)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -5008,24 +5008,38 @@
 
Offsetting a lo_sum should not be allowed, except where we know by
alignment that a 32k boundary is not crossed, but see the ???
-   comment in rs6000_legitimize_reload_address.  */
+   comment in rs6000_legitimize_reload_address.  Note that by
+   offsetting here we mean a further offset to access parts of the
+   MEM.  It's fine to have a lo_sum where the inner address is offset
+   from a sym, since the same sym+offset will appear in the high part
+   of the address calculation.  */
 
 bool
 mem_operand_gpr (rtx op, enum machine_mode mode)
 {
   unsigned HOST_WIDE_INT offset;
   int extra;
+  rtx addr = XEXP (op, 0);
 
-  op = address_offset (XEXP (op, 0));
+  op = address_offset (addr);
   if (op == NULL_RTX)
 return true;
 
   offset = INTVAL (op);
+  if (TARGET_POWERPC64 && (offset & 3) != 0)
+    return false;
+
+  if (GET_CODE (addr) == LO_SUM)
+/* We know by alignment that ABI_AIX medium/large model toc refs
+   will not cross a 32k boundary, since all entries in the
+   constant pool are naturally aligned and we check alignment for
+   other medium model toc-relative addresses.  For ABI_V4 and
+   ABI_DARWIN lo_sum addresses, we just check that 64-bit
+   offsets are 4-byte aligned.  */
+return true;
+
   extra = GET_MODE_SIZE (mode) - UNITS_PER_WORD;
-  if (extra < 0)
-    extra = 0;
-  else if (TARGET_POWERPC64 && (offset & 3) != 0)
-    return false;
+  gcc_assert (extra >= 0);
   return offset + 0x8000 < 0x10000u - extra;
 }
 


-- 
Alan Modra
Australia Development Lab, IBM


Re: PR53914, rs6000 constraints and reload queries

2012-08-01 Thread Alan Modra
 *vsx_mov<mode>
-  [(set (match_operand:VSX_M 0 "nonimmediate_operand" "=Z,VSr,VSr,?Z,?wa,?wa,*o,*r,*r,VSr,?wa,v,wZ,v")
-	(match_operand:VSX_M 1 "input_operand" "VSr,Z,VSr,wa,Z,wa,r,o,r,j,j,W,v,wZ"))]
+  [(set (match_operand:VSX_M 0 "nonimmediate_operand" "=Z,VSr,VSr,?Z,?wa,?wa,*Y,*r,*r,VSr,?wa,v,wZ,v")
+	(match_operand:VSX_M 1 "input_operand" "VSr,Z,VSr,wa,Z,wa,r,Y,r,j,j,W,v,wZ"))]
   "VECTOR_MEM_VSX_P (<MODE>mode)
    && (register_operand (operands[0], <MODE>mode)
        || register_operand (operands[1], <MODE>mode))"
@@ -272,8 +272,8 @@
 ;; Unlike other VSX moves, allow the GPRs, since a normal use of TImode is for
 ;; unions.  However for plain data movement, slightly favor the vector loads
 (define_insn "*vsx_movti"
-  [(set (match_operand:TI 0 "nonimmediate_operand" "=Z,wa,wa,?o,?r,?r,wa,v,v,wZ")
-	(match_operand:TI 1 "input_operand" "wa,Z,wa,r,o,r,j,W,wZ,v"))]
+  [(set (match_operand:TI 0 "nonimmediate_operand" "=Z,wa,wa,?Y,?r,?r,wa,v,v,wZ")
+	(match_operand:TI 1 "input_operand" "wa,Z,wa,r,Y,r,j,W,wZ,v"))]
   "VECTOR_MEM_VSX_P (TImode)
    && (register_operand (operands[0], TImode)
        || register_operand (operands[1], TImode))"
Index: gcc/config/rs6000/altivec.md
===
--- gcc/config/rs6000/altivec.md(revision 190035)
+++ gcc/config/rs6000/altivec.md(working copy)
@@ -165,8 +165,8 @@
 
 ;; Vector move instructions.
 (define_insn "*altivec_mov<mode>"
-  [(set (match_operand:VM2 0 "nonimmediate_operand" "=Z,v,v,*o,*r,*r,v,v")
-	(match_operand:VM2 1 "input_operand" "v,Z,v,r,o,r,j,W"))]
+  [(set (match_operand:VM2 0 "nonimmediate_operand" "=Z,v,v,*Y,*r,*r,v,v")
+	(match_operand:VM2 1 "input_operand" "v,Z,v,r,Y,r,j,W"))]
   "VECTOR_MEM_ALTIVEC_P (<MODE>mode)
    && (register_operand (operands[0], <MODE>mode)
        || register_operand (operands[1], <MODE>mode))"
@@ -190,8 +190,8 @@
 ;; is for unions.  However for plain data movement, slightly favor the vector
 ;; loads
 (define_insn "*altivec_movti"
-  [(set (match_operand:TI 0 "nonimmediate_operand" "=Z,v,v,?o,?r,?r,v,v")
-	(match_operand:TI 1 "input_operand" "v,Z,v,r,o,r,j,W"))]
+  [(set (match_operand:TI 0 "nonimmediate_operand" "=Z,v,v,?Y,?r,?r,v,v")
+	(match_operand:TI 1 "input_operand" "v,Z,v,r,Y,r,j,W"))]
   "VECTOR_MEM_ALTIVEC_P (TImode)
    && (register_operand (operands[0], TImode)
        || register_operand (operands[1], TImode))"
Index: gcc/config/rs6000/dfp.md
===
--- gcc/config/rs6000/dfp.md(revision 190035)
+++ gcc/config/rs6000/dfp.md(working copy)
@@ -426,12 +426,12 @@
   "TARGET_HARD_FLOAT && TARGET_FPRS"
   "{ rs6000_emit_move (operands[0], operands[1], TDmode); DONE; }")
 
-; It's important to list the o->f and f->o moves before f->f because
-; otherwise reload, given m->f, will try to pick f->f and reload it,
-; which doesn't make progress.  Likewise r->Y must be before r->r.
+; It's important to list the Y->r and r->Y moves before r->r because
+; otherwise reload, given m->r, will try to pick r->r and reload it,
+; which doesn't make progress.
 (define_insn_and_split "*movtd_internal"
-  [(set (match_operand:TD 0 "nonimmediate_operand" "=o,d,d,r,Y,r")
-	(match_operand:TD 1 "input_operand" "d,o,d,YGHF,r,r"))]
+  [(set (match_operand:TD 0 "nonimmediate_operand" "=m,d,d,Y,r,r")
+	(match_operand:TD 1 "input_operand" "d,m,d,r,YGHF,r"))]
   "TARGET_HARD_FLOAT && TARGET_FPRS
    && (gpc_reg_operand (operands[0], TDmode)
        || gpc_reg_operand (operands[1], TDmode))"
Index: gcc/config/rs6000/paired.md
===
--- gcc/config/rs6000/paired.md (revision 190035)
+++ gcc/config/rs6000/paired.md (working copy)
@@ -201,8 +201,8 @@
   [(set_attr type fp)])
 
 (define_insn "*movv2sf_paired"
-  [(set (match_operand:V2SF 0 "nonimmediate_operand" "=Z,f,f,o,r,r,f")
-        (match_operand:V2SF 1 "input_operand" "f,Z,f,r,o,r,W"))]
+  [(set (match_operand:V2SF 0 "nonimmediate_operand" "=Z,f,f,Y,r,r,f")
+        (match_operand:V2SF 1 "input_operand" "f,Z,f,r,Y,r,W"))]
   "TARGET_PAIRED_FLOAT
    && (register_operand (operands[0], V2SFmode)
        || register_operand (operands[1], V2SFmode))"

-- 
Alan Modra
Australia Development Lab, IBM


[RS6000] asynch exceptions and unwind info

2011-07-26 Thread Alan Modra
Hi David,
  I've been looking into what we need to do to support unwinding from
async signal handlers.  I've implemented unwind info generation for
.glink in the linker, but to keep the ppc64 .glink unwind info simple
I've assumed that frob_update_context is still used.

We still have some difficulties related to r2 tracking on
ppc64. frob_update_context doesn't quite do the right thing for
async unwinding.  A typical (no-r11) plt call stub looks like

 addis r12,2,off@ha
 std 2,40(1)
 ld 11,off@l(12)
 mtctr 11
 ld 2,off+8@l(12)
 bctr

or, when the offset from r2 to the function descriptor is small

 std 2,40(1)
 ld 11,off(2)
 mtctr 11
 ld 2,off+8(2)
 bctr

Now if we're stopped before the save of r2 we obviously don't want the
unwinder to restore r2 from 40(1), but that's exactly what the current
unwinder does.

Also, there is a one insn window where frob_update_context may do the
wrong thing for gcc generated calls via function pointer, which
typically looks like

 ld 0,0(r)
 std 2,40(1)
 mtctr 0
 ld 2,8(r)
 bctrl
 ld 2,40(1)

Here, if we are stopped after the ld 2,8(r) then r2 needs to be
restored from 40(1).

The following patch fixes these two issues.  Ideally what I'd like to
do is have ld and gcc emit accurate r2 tracking unwind info and
dispense with hacks like frob_update_context.  If ld did emit accurate
unwind info for .glink, then the justification for frob_update_context
disappears.  The difficulty then is backwards compatibility.  You'd
need a way for the gcc unwinder to handle a mix of old code (that
needs frob_update_context) with new code (that doesn't).  One way to
accomplish this would be to set a dummy reg with initial CIE dwarf
instructions, then test this reg in frob_update_context.

Bootstrapped and regression tested powerpc64-linux.

* config/rs6000/linux-unwind.h (frob_update_context __powerpc64__):
Leave r2 REG_UNSAVED if stopped on the instruction that saves r2
in a plt call stub.  Do restore r2 if stopped on bctrl.

Index: libgcc/config/rs6000/linux-unwind.h
===
--- libgcc/config/rs6000/linux-unwind.h (revision 176780)
+++ libgcc/config/rs6000/linux-unwind.h (working copy)
@@ -346,10 +346,28 @@ frob_update_context (struct _Unwind_Cont
 figure out if it was saved.  The big problem here is that the
 code that does the save/restore is generated by the linker, so
 we have no good way to determine at compile time what to do.  */
-  unsigned int *insn
-	= (unsigned int *) _Unwind_GetGR (context, R_LR);
-  if (insn && *insn == 0xE8410028)
-	_Unwind_SetGRPtr (context, 2, context->cfa + 40);
+  if (pc[0] == 0xF8410028
+	  || ((pc[0] & 0xFFFF0000) == 0x3D820000
+	      && pc[1] == 0xF8410028))
+	{
+	  /* We are in a plt call stub or r2 adjusting long branch stub,
+	     before r2 has been saved.  Keep REG_UNSAVED.  */
+	}
+  else if (pc[0] == 0x4E800421
+	   && pc[1] == 0xE8410028)
+	{
+	  /* We are at the bctrl instruction in a call via function
+	     pointer.  gcc always emits the load of the new r2 just
+	     before the bctrl.  */
+	  _Unwind_SetGRPtr (context, 2, context->cfa + 40);
+	}
+  else
+	{
+	  unsigned int *insn
+	    = (unsigned int *) _Unwind_GetGR (context, R_LR);
+	  if (insn && *insn == 0xE8410028)
+	    _Unwind_SetGRPtr (context, 2, context->cfa + 40);
+	}
 }
 #endif
 }


-- 
Alan Modra
Australia Development Lab, IBM


Re: [RS6000] asynch exceptions and unwind info

2011-07-28 Thread Alan Modra
On Wed, Jul 27, 2011 at 03:00:45PM +0930, Alan Modra wrote:
 Ideally what I'd like to
 do is have ld and gcc emit accurate r2 tracking unwind info and
 dispense with hacks like frob_update_context.  If ld did emit accurate
 unwind info for .glink, then the justification for frob_update_context
 disappears.

For the record, this statement of mine doesn't make sense.  A .glink
stub doesn't make a frame, so a backtrace won't normally pass through a
stub, thus having accurate unwind info for .glink doesn't help at all.

ld would need to insert unwind info for r2 on the call, but that
involves editing .eh_frame and in any case isn't accurate since
the r2 save doesn't happen until one or two instructions after the
call, in the stub.  I think we are stuck with frob_update_context.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [RS6000] asynch exceptions and unwind info

2011-07-28 Thread Alan Modra
On Thu, Jul 28, 2011 at 11:49:16AM -0700, Richard Henderson wrote:
 On 07/28/2011 12:27 AM, Alan Modra wrote:
  On Wed, Jul 27, 2011 at 03:00:45PM +0930, Alan Modra wrote:
  Ideally what I'd like to
  do is have ld and gcc emit accurate r2 tracking unwind info and
  dispense with hacks like frob_update_context.  If ld did emit accurate
  unwind info for .glink, then the justification for frob_update_context
  disappears.
  
  For the record, this statement of mine doesn't make sense.  A .glink
  stub doesn't make a frame, so a backtrace won't normally pass through a
  stub, thus having accurate unwind info for .glink doesn't help at all.
 
 It does, for the duration of the stub.

Right, but I was talking about the normal case, where the unwinder
won't even look at .glink unwind info.

 The whole problem is that toc pointer copy in 40(1) is only valid
 during indirect call sequences, and iff ld inserted a stub?  I.e.
 direct calls between functions that share toc pointers never save
 the copy?

Yes.

 Would it make sense, if a function has any indirect call, to move
 the toc pointer save into the prologue?  You'd get to avoid that
 store all the time.  Of course you'd not be able to sink the load
 after the call, but it might still be a win.  And in that special
 case you can annotate the r2 save slot just once, correctly.

Except that any info about r2 in an indirect call sequence really
belongs to the *called* function frame, not the caller.  I woke up
this morning with the realization that what I'd done in
frob_update_context for indirect call sequences was wrong.  Ditto for
the r2 store that Michael moved into the prologue.  The only time we
want the unwinder to restore from that particular save is if r2 isn't
saved in the current frame.

Untested patch follows.

libgcc/
* config/rs6000/linux-unwind.h (frob_update_context __powerpc64__):
Restore for indirect call bctrl from correct stack slot, and only
if cfa+40 isn't valid.
gcc/
* config/rs6000/rs6000-protos.h (rs6000_save_toc_in_prologue_p): Delete.
* config/rs6000/rs6000.c (rs6000_save_toc_in_prologue_p): Make static.
(rs6000_emit_prologue): Don't emit eh_frame info in
save_toc_in_prologue case.
(rs6000_call_indirect_aix): Formatting.

Index: libgcc/config/rs6000/linux-unwind.h
===
--- libgcc/config/rs6000/linux-unwind.h (revision 176905)
+++ libgcc/config/rs6000/linux-unwind.h (working copy)
@@ -354,20 +354,22 @@ frob_update_context (struct _Unwind_Cont
	  /* We are in a plt call stub or r2 adjusting long branch stub,
	     before r2 has been saved.  Keep REG_UNSAVED.  */
	}
-      else if (pc[0] == 0x4E800421
-	       && pc[1] == 0xE8410028)
-	{
-	  /* We are at the bctrl instruction in a call via function
-	     pointer.  gcc always emits the load of the new r2 just
-	     before the bctrl.  */
-	  _Unwind_SetGRPtr (context, 2, context->cfa + 40);
-	}
       else
	{
	  unsigned int *insn
	    = (unsigned int *) _Unwind_GetGR (context, R_LR);
	  if (insn && *insn == 0xE8410028)
	    _Unwind_SetGRPtr (context, 2, context->cfa + 40);
+	  else if (pc[0] == 0x4E800421
+		   && pc[1] == 0xE8410028)
+	    {
+	      /* We are at the bctrl instruction in a call via function
+		 pointer.  gcc always emits the load of the new R2 just
+		 before the bctrl so this is the first and only place
+		 we need to use the stored R2.  */
+	      _Unwind_Word sp = _Unwind_GetGR (context, 1);
+	      _Unwind_SetGRPtr (context, 2, sp + 40);
+	    }
	}
 }
 #endif
Index: gcc/config/rs6000/rs6000-protos.h
===
--- gcc/config/rs6000/rs6000-protos.h   (revision 176905)
+++ gcc/config/rs6000/rs6000-protos.h   (working copy)
@@ -172,8 +172,6 @@ extern void rs6000_emit_epilogue (int);
 extern void rs6000_emit_eh_reg_restore (rtx, rtx);
 extern const char * output_isel (rtx *);
 extern void rs6000_call_indirect_aix (rtx, rtx, rtx);
-extern bool rs6000_save_toc_in_prologue_p (void);
-
 extern void rs6000_aix_asm_output_dwarf_table_ref (char *);
 
 /* Declare functions in rs6000-c.c */
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 176905)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -1178,6 +1178,7 @@ static void rs6000_conditional_register_
 static void rs6000_trampoline_init (rtx, tree, rtx);
 static bool rs6000_cannot_force_const_mem (enum machine_mode, rtx);
 static bool rs6000_legitimate_constant_p (enum machine_mode, rtx);
+static bool rs6000_save_toc_in_prologue_p (void);
 
 /* Hash table stuff for keeping track of TOC entries.  */
 
@@ -20536,8 +20562,11 @@ rs6000_emit_prologue (void)
 
   /* If we

Re: [RS6000] asynch exceptions and unwind info

2011-07-28 Thread Alan Modra
On Thu, Jul 28, 2011 at 12:09:51PM -0700, Richard Henderson wrote:
 Well, even if we're not able to hoist the R2 store, we may be able
 to simply add REG_CFA_OFFSET and REG_CFA_RESTORE notes to the insns
 in the stream.

You'd need to mark every non-local call with something that says
R2 may be saved, effectively duplicating MD_FROB_UPDATE_CONTEXT in dwarf.
I guess that is possible even without extending our eh encoding, but
each call would have at least 6 bytes added to eh_frame:
   DW_CFA_expression, 2, 3, DW_OP_skip, offset_to_r2_prog
and you'd need to emit multiple copies of r2_prog for functions that
have a lot of calls, since the offset is limited to +/-32k.  I think
that would inflate the size of .eh_frame too much, and slow down
handling of exceptions dramatically.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [RS6000] asynch exceptions and unwind info

2011-07-29 Thread Alan Modra
On Fri, Jul 29, 2011 at 10:57:48AM +0930, Alan Modra wrote:
 Except that any info about r2 in an indirect call sequence really
 belongs to the *called* function frame, not the caller.  I woke up
 this morning with the realization that what I'd done in
 frob_update_context for indirect call sequences was wrong.  Ditto for
 the r2 store that Michael moved into the prologue.  The only time we
 want the unwinder to restore from that particular save is if r2 isn't
 saved in the current frame.
 
 Untested patch follows.

Here's a tested patch that fixes an issue with TARGET_SINGLE_PIC_BASE and
enables Michael's save_toc_in_prologue optimization for all functions
except those that make dynamic stack adjustments.

Incidentally, the rs6000_emit_prologue comment I added below suggests
another solution.  Since all we need is the toc pointer for the frame,
it would be possible to tell the unwinder to simply load r2 from the
.opd entry.  I think..

libgcc/
* config/rs6000/linux-unwind.h (frob_update_context __powerpc64__):
Restore for indirect call bctrl from correct stack slot, and only
if cfa+40 isn't valid.
gcc/
* config/rs6000/rs6000-protos.h (rs6000_save_toc_in_prologue_p): Delete.
* config/rs6000/rs6000.c (rs6000_save_toc_in_prologue_p): Make static.
(rs6000_emit_prologue): Don't prematurely return when
TARGET_SINGLE_PIC_BASE.  Don't emit eh_frame info in
save_toc_in_prologue case.
(rs6000_call_indirect_aix): Only disallow save_toc_in_prologue for
calls_alloca.

Index: libgcc/config/rs6000/linux-unwind.h
===
--- libgcc/config/rs6000/linux-unwind.h (revision 176905)
+++ libgcc/config/rs6000/linux-unwind.h (working copy)
@@ -354,20 +354,22 @@ frob_update_context (struct _Unwind_Cont
	  /* We are in a plt call stub or r2 adjusting long branch stub,
	     before r2 has been saved.  Keep REG_UNSAVED.  */
	}
-      else if (pc[0] == 0x4E800421
-	       && pc[1] == 0xE8410028)
-	{
-	  /* We are at the bctrl instruction in a call via function
-	     pointer.  gcc always emits the load of the new r2 just
-	     before the bctrl.  */
-	  _Unwind_SetGRPtr (context, 2, context->cfa + 40);
-	}
       else
	{
	  unsigned int *insn
	    = (unsigned int *) _Unwind_GetGR (context, R_LR);
	  if (insn && *insn == 0xE8410028)
	    _Unwind_SetGRPtr (context, 2, context->cfa + 40);
+	  else if (pc[0] == 0x4E800421
+		   && pc[1] == 0xE8410028)
+	    {
+	      /* We are at the bctrl instruction in a call via function
+		 pointer.  gcc always emits the load of the new R2 just
+		 before the bctrl so this is the first and only place
+		 we need to use the stored R2.  */
+	      _Unwind_Word sp = _Unwind_GetGR (context, 1);
+	      _Unwind_SetGRPtr (context, 2, sp + 40);
+	    }
	}
 }
 #endif
Index: gcc/config/rs6000/rs6000-protos.h
===
--- gcc/config/rs6000/rs6000-protos.h   (revision 176905)
+++ gcc/config/rs6000/rs6000-protos.h   (working copy)
@@ -172,8 +172,6 @@ extern void rs6000_emit_epilogue (int);
 extern void rs6000_emit_eh_reg_restore (rtx, rtx);
 extern const char * output_isel (rtx *);
 extern void rs6000_call_indirect_aix (rtx, rtx, rtx);
-extern bool rs6000_save_toc_in_prologue_p (void);
-
 extern void rs6000_aix_asm_output_dwarf_table_ref (char *);
 
 /* Declare functions in rs6000-c.c */
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 176905)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -1178,6 +1178,7 @@ static void rs6000_conditional_register_
 static void rs6000_trampoline_init (rtx, tree, rtx);
 static bool rs6000_cannot_force_const_mem (enum machine_mode, rtx);
 static bool rs6000_legitimate_constant_p (enum machine_mode, rtx);
+static bool rs6000_save_toc_in_prologue_p (void);
 
 /* Hash table stuff for keeping track of TOC entries.  */
 
@@ -20478,14 +20504,12 @@ rs6000_emit_prologue (void)
   insn = emit_insn (generate_set_vrsave (reg, info, 0));
 }
 
-  if (TARGET_SINGLE_PIC_BASE)
-    return;	/* Do not set PIC register */
-
   /* If we are using RS6000_PIC_OFFSET_TABLE_REGNUM, we need to set it up.  */
-  if ((TARGET_TOC && TARGET_MINIMAL_TOC && get_pool_size () != 0)
-      || (DEFAULT_ABI == ABI_V4
-	  && (flag_pic == 1 || (flag_pic && TARGET_SECURE_PLT))
-	  && df_regs_ever_live_p (RS6000_PIC_OFFSET_TABLE_REGNUM)))
+  if (!TARGET_SINGLE_PIC_BASE
+      && ((TARGET_TOC && TARGET_MINIMAL_TOC && get_pool_size () != 0)
+	  || (DEFAULT_ABI == ABI_V4
+	      && (flag_pic == 1 || (flag_pic && TARGET_SECURE_PLT))
+	      && df_regs_ever_live_p (RS6000_PIC_OFFSET_TABLE_REGNUM))))

Re: [RS6000] asynch exceptions and unwind info

2011-07-29 Thread Alan Modra
On Fri, Jul 29, 2011 at 09:16:09AM -0400, David Edelsohn wrote:
 Which has the problem?  Which are you trying to solve?  And how is
 your change solving it?

Michael's save_toc_in_prologue emit_frame_save writes unwind info for
the wrong frame.  That r2 save is the current r2.  What we need is
info about the previous r2, so we can restore when unwinding.

I made a similar mistake in frob_update_context in that the value
saved by an indirect function call sequence is the r2 for the current
function.  I also restored from the wrong location.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [RS6000] asynch exceptions and unwind info

2011-08-01 Thread Alan Modra
On Fri, Jul 29, 2011 at 10:28:28PM +0930, Alan Modra wrote:
 libgcc/
   * config/rs6000/linux-unwind.h (frob_update_context __powerpc64__):
   Restore for indirect call bctrl from correct stack slot, and only
   if cfa+40 isn't valid.
 gcc/
   * config/rs6000/rs6000-protos.h (rs6000_save_toc_in_prologue_p): Delete.
   * config/rs6000/rs6000.c (rs6000_save_toc_in_prologue_p): Make static.
   (rs6000_emit_prologue): Don't prematurely return when
   TARGET_SINGLE_PIC_BASE.  Don't emit eh_frame info in
   save_toc_in_prologue case.
   (rs6000_call_indirect_aix): Only disallow save_toc_in_prologue for
   calls_alloca.

Approved offline and applied with a comment change.

-- 
Alan Modra
Australia Development Lab, IBM


[RS6000] Fix ICE in reg_save

2011-08-01 Thread Alan Modra
Seen when running binutils testsuite on powerpc-linux.
.../ld-elfvers/vers24b.c: In function 'foo':
.../ld-elfvers/vers24b.c:5:1: internal compiler error: in reg_save, at 
dwarf2cfi.c:827

Bootstrapped and regression tested powerpc-linux.  Committed as obvious.

* config/rs6000/rs6000.c (rs6000_emit_prologue): Add REG_CFA_RESTORE
note for save_LR_around_toc_setup sequence.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 177041)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -20505,6 +20505,7 @@ rs6000_emit_prologue (void)
  rs6000_emit_load_toc_table (TRUE);
 
  insn = emit_move_insn (lr, frame_ptr_rtx);
+ add_reg_note (insn, REG_CFA_RESTORE, lr);
  RTX_FRAME_RELATED_P (insn) = 1;
}
   else

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH 4/6] Shrink-wrapping

2011-08-03 Thread Alan Modra
On Thu, Jul 28, 2011 at 12:35:46PM +0200, Bernd Schmidt wrote:
[snip]
   * rtl.h (ANY_RETURN_P): New macro.
[snip]

This patch makes rebuild_jump_labels set JUMP_LABEL appropriately
for return jumps, and fixes sharing for RETURN.  Since ANY_RETURN_P(X)
is defined as ((X) == ret_rtx), RETURNs need to stay shared.
Bootstrapped and regression tested powerpc-linux and powerpc64-linux.
OK to apply?

PR rtl-optimization/49941
* jump.c (mark_jump_label): Comment.
(mark_jump_label_1): Set JUMP_LABEL for return jumps.
* emit-rtl.c (copy_rtx_if_shared_1, copy_insn_1): Leave RETURN shared.
(mark_used_flags): Don't mark RETURN.

Index: gcc/jump.c
===
--- gcc/jump.c  (revision 177084)
+++ gcc/jump.c  (working copy)
@@ -1039,6 +1039,7 @@ sets_cc0_p (const_rtx x)
notes.  If INSN is an INSN or a CALL_INSN or non-target operands of
a JUMP_INSN, and there is at least one CODE_LABEL referenced in
INSN, add a REG_LABEL_OPERAND note containing that label to INSN.
+   For returnjumps, the JUMP_LABEL will also be set as appropriate.
 
Note that two labels separated by a loop-beginning note
must be kept distinct if we have not yet done loop-optimization,
@@ -1081,6 +1082,14 @@ mark_jump_label_1 (rtx x, rtx insn, bool
 case CALL:
   return;
 
+case RETURN:
+  if (is_target)
+   {
+ gcc_assert (JUMP_LABEL (insn) == NULL || JUMP_LABEL (insn) == x);
+ JUMP_LABEL (insn) = x;
+   }
+  return;
+
 case MEM:
   in_mem = true;
   break;
Index: gcc/emit-rtl.c
===
--- gcc/emit-rtl.c  (revision 177084)
+++ gcc/emit-rtl.c  (working copy)
@@ -2724,6 +2724,7 @@ repeat:
 case CODE_LABEL:
 case PC:
 case CC0:
+case RETURN:
 case SCRATCH:
   /* SCRATCH must be shared because they represent distinct values.  */
   return;
@@ -2843,6 +2844,7 @@ repeat:
 case CODE_LABEL:
 case PC:
 case CC0:
+case RETURN:
   return;
 
 case DEBUG_INSN:
@@ -5257,6 +5259,7 @@ copy_insn_1 (rtx orig)
 case CODE_LABEL:
 case PC:
 case CC0:
+case RETURN:
   return orig;
 case CLOBBER:
      if (REG_P (XEXP (orig, 0)) && REGNO (XEXP (orig, 0)) < 
FIRST_PSEUDO_REGISTER)

-- 
Alan Modra
Australia Development Lab, IBM


simplify some range tests in rs6000.c

2011-08-03 Thread Alan Modra
This patch came about because I noticed a comment about the second
condition in rs6000_emit_epilogue no longer matched the code.  I was
going to fix the comment, then noticed the code could be simplified.

In rs6000_emit_epilogue, lr_save_offset is a small positive value (4,8
or 16), or zero when !lr_save_p, and total_size a HOST_WIDE_INT.  In
rs6000_legitimate_offset_address_p, offset is an unsigned
HOST_WIDE_INT, extra a value between 0 and 12.

Unless my numerical analysis has totally deserted me, the new code
here is exactly equivalent to the old.  Bootstrapped and regression
tested powerpc-linux.  Committing as obvious.

* config/rs6000/rs6000.c (rs6000_emit_epilogue): Simplify
use_backchain_to_restore_sp initialisation.
(rs6000_legitimate_offset_address_p): Simplify offset test.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 177267)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -5391,9 +5391,9 @@ rs6000_legitimate_offset_address_p (enum
   break;
 }
 
   offset += 0x8000;
-  return (offset < 0x10000) && (offset + extra < 0x10000);
+  return offset < 0x10000 - extra;
 }
 
 bool
 legitimate_indexed_address_p (rtx x, int strict)
@@ -20711,12 +20711,9 @@ rs6000_emit_epilogue (int sibcall)
  is more efficient than an addis, addi pair.  The second condition
  here will not trigger at the moment;  We don't actually need a
  frame pointer for alloca, but the generic parts of the compiler
  give us one anyway.  */
-  use_backchain_to_restore_sp = (info->total_size > 32767
-				 || info->total_size
-				    + (info->lr_save_p ? info->lr_save_offset : 0)
-				    > 32767
+  use_backchain_to_restore_sp = (info->total_size > 32767 - info->lr_save_offset
 				 || (cfun->calls_alloca
 				     && !frame_pointer_needed));
   restore_lr = (info->lr_save_p
		 && (restoring_FPRs_inline

-- 
Alan Modra
Australia Development Lab, IBM


Re: PowerPC prologue and epilogue 6

2012-05-30 Thread Alan Modra
On Wed, May 30, 2012 at 03:21:28PM +0200, Dominique Dhumieres wrote:
 I get an ICE of the form
 
 /opt/gcc/work/gcc/testsuite/gcc.target/powerpc/savres.c: In function 'nb_all':
 /opt/gcc/work/gcc/testsuite/gcc.target/powerpc/savres.c:473:3: internal 
 compiler error: in rs6000_emit_prologue, at config/rs6000/rs6000.c:19850
 
 Is the test intended to work on PIC targets?

No, but see rs6000/darwin.h CC1_SPEC.  -static makes you non-PIC.

I've just built a darwin cc1 to reproduce the problem.  The ICE is on
START_USE (ptr_regno);
when setting up a reg to use for altivec saves.  The reg clashes with
the static chain pointer (nb_all is a nested function), so this is a
real bug that the register checks have uncovered.  I haven't
determined whether this is a new bug introduced with my prologue
changes, or whether it's a long-standing bug.  I suspect the latter.

-- 
Alan Modra
Australia Development Lab, IBM


Re: PowerPC prologue and epilogue 6

2012-05-30 Thread Alan Modra
On Thu, May 31, 2012 at 09:43:09AM +0930, Alan Modra wrote:
 real bug that the register checks have uncovered.  I haven't
 determined whether this is a new bug introduced with my prologue
 changes, or whether it's a long-standing bug.  I suspect the latter.

Looks like it is one I introduced.  gcc-4.6 uses r12 to save altivec
regs, my new code tries to use r11.  Will fix.

-- 
Alan Modra
Australia Development Lab, IBM


Re: PowerPC prologue and epilogue 6

2012-05-31 Thread Alan Modra
On Thu, May 31, 2012 at 10:41:26AM +0930, Alan Modra wrote:
 Looks like it is one I introduced.  gcc-4.6 uses r12 to save altivec
 regs, my new code tries to use r11.  Will fix.

Please try out this patch on Darwin.  Bootstrapped and regression
tested powerpc-linux.

gcc/
* config/rs6000/rs6000.c (ptr_regno_for_savres): Comment.
(rs6000_emit_prologue): Ensure register used for inline saves
of vector regs is not the static chain register.  Revise comment.
gcc/testsuite/
* gcc.target/powerpc/savres.c: Add -static to dg-options.
Check static chain in nested funcs.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 187999)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -19108,6 +19161,9 @@ rs6000_emit_stack_reset (rs6000_stack_t *info,
   return NULL_RTX;
 }
 
+/* Return the register number used as a pointer by out-of-line
+   save/restore functions.  */
+
 static inline unsigned
 ptr_regno_for_savres (int sel)
 {
@@ -19845,6 +19901,9 @@ rs6000_emit_prologue (void)
  int sel = SAVRES_SAVE | SAVRES_VR;
  unsigned ptr_regno = ptr_regno_for_savres (sel);
 
+	 if (using_static_chain_p
+	     && ptr_regno == STATIC_CHAIN_REGNUM)
+	   ptr_regno = 12;
  if (REGNO (frame_reg_rtx) != ptr_regno)
START_USE (ptr_regno);
  ptr_reg = gen_rtx_REG (Pmode, ptr_regno);
@@ -19953,9 +20012,9 @@ rs6000_emit_prologue (void)
   int offset;
   int save_regno;
 
-  /* Get VRSAVE onto a GPR.  Note that ABI_V4 might be using r12
-as frame_reg_rtx and r11 as the static chain pointer for
-nested functions.  */
+  /* Get VRSAVE onto a GPR.  Note that ABI_V4 and ABI_DARWIN might
+be using r12 as frame_reg_rtx and r11 as the static chain
+pointer for nested functions.  */
   save_regno = 12;
   if (DEFAULT_ABI == ABI_AIX && !using_static_chain_p)
save_regno = 11;
Index: gcc/testsuite/gcc.target/powerpc/savres.c
===
--- gcc/testsuite/gcc.target/powerpc/savres.c   (revision 187999)
+++ gcc/testsuite/gcc.target/powerpc/savres.c   (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options -fno-inline -fomit-frame-pointer } */
+/* { dg-options -fno-inline -fomit-frame-pointer -static } */
 
 /* -fno-inline -maltivec -m32/-m64 -mmultiple/no-multiple -Os/-O2.  */
 #ifndef NO_BODY
@@ -73,6 +73,7 @@ __attribute__ ((vector_size (16))) int val31 = {-3
 
 #else /* NO_BODY */
 /* For looking at prologue and epilogue code without distractions.  */
+#define abort()
 #define TRASH_ALL_CR
 #define TRASH_ALL_VR
 #define TRASH_ALL_FPR
@@ -458,7 +459,7 @@ void s_0 (void)
 void wb_all (void)
 {
   char b[10];
-  void nb_all (void)
+  char *nb_all (void)
   {
 char a[33000];
 TRASH_ALL_CR;
@@ -470,14 +471,16 @@ void wb_all (void)
 USE_ALL_FPR;
 USE_ALL_GPR;
     __asm __volatile ("#%0 %1" : "=m" (a), "=m" (b)
		       : : "cr2", "cr3", "cr4",
		       "v20", "v21", "v22", "v23", "v24", "v25", "v26", "v27",
		       "v28", "v29", "v30", "v31",
		       "fr14", "fr15", "fr16", "fr17", "fr18", "fr19", "fr20",
		       "fr21", "fr22", "fr23", "fr24", "fr25", "fr26", "fr27",
		       "fr28", "fr29", "fr30", "fr31",
		       "r14", "r15", "r16", "r17", "r18", "r19", "r20", "r21",
		       "r22", "r23", "r24", "r25", "r26", "r27", "r28", "r29",
		       "r30", "r31");
+    return b;
   }
-  nb_all();
+  if (nb_all () != b)
+    abort ();
 }
 
 void wb_cvfr (void)
 {
   char b[10];
-  void nb_cvfr (void)
+  char *nb_cvfr (void)
   {
 char a[33000];
 TRASH_SOME_CR;
@@ -489,14 +492,16 @@ void wb_cvfr (void)
 USE_SOME_FPR;
 USE_SOME_GPR;
     __asm __volatile ("#%0 %1" : "=m" (a), "=m" (b)
		       : : "cr2", "v26", "v27", "v31", "fr28", "fr31", "r30", "r31");
+    return b;
   }
-  nb_cvfr ();
+  if (nb_cvfr () != b)
+    abort ();
 }
 
 void wb_vfr (void)
 {
   char b[10];
-  void nb_vfr (void)
+  char *nb_vfr (void)
   {
 char a[33000];
 TRASH_SOME_VR;
@@ -506,14 +511,16 @@ void wb_vfr (void)
 USE_SOME_FPR;
 USE_SOME_GPR;
     __asm __volatile ("#%0 %1" : "=m" (a), "=m" (b)
		       : : "v26", "v27", "v31", "fr28", "fr31", "r30", "r31");
+    return b;
   }
-  nb_vfr ();
+  if (nb_vfr () != b)
+    abort ();
 }
 
 void wb_cvf (void)
 {
   char b[10];
-  void nb_cvf (void)
+  char *nb_cvf (void)
   {
 char a[33000];
 TRASH_SOME_CR;
@@ -523,14 +530,16 @@ void wb_cvf (void)
 USE_SOME_VR;
 USE_SOME_FPR;
     __asm __volatile ("#%0 %1" : "=m" (a), "=m" (b)
		       : : "cr2", "v26", "v27", "v31", "fr28", "fr31");
+    return b;
   }
-  nb_cvf ();
+  if (nb_cvf () != b)
+abort ();
 }
 
 void wb_vf (void)
 {
   char b[10];
-  void nb_vf (void)
+  char *nb_vf (void)
   {
 char a[33000];
 TRASH_SOME_VR;
@@ -538,15 +547,17 @@ void wb_vf (void)
 USE_SOME_VR;
 USE_SOME_FPR;
__asm __volatile ("#%0 %1" : "=m" (a), "=m" (b) : : "v26", "v27", "v31",
"fr28", "fr31");
+return b;
   }
-  nb_vf ();
+  if (nb_vf () != b)
+abort ();
 }
 #endif
 
 void wb_cvr (void)
 {
   char b[10];
-  void nb_cvr (void)
+  char *nb_cvr (void)
   {
 char

Re: PowerPC prologue and epilogue 6

2012-05-31 Thread Alan Modra
On Thu, May 31, 2012 at 02:16:32PM +0200, Dominique Dhumieres wrote:
 (the final patch will require the suitable dg directives;-).

This is really stretching my testsuite knowledge.  Maybe add

/* { dg-additional-options "-mdynamic-no-pic" { target *-*-darwin* } } */


-- 
Alan Modra
Australia Development Lab, IBM


Re: [Target maintainers]: Please update libjava/sysdep/*/locks.h with new atomic builtins

2012-06-20 Thread Alan Modra
On Wed, Jun 20, 2012 at 09:10:44AM -0400, David Edelsohn wrote:
  inline static void
  release_set (volatile obj_addr_t *addr, obj_addr_t new_val)
  {
 -  __asm__ __volatile__ ("sync" : : : "memory");
 -  *addr = new_val;
 +  __atomic_store_n(addr, val, __ATOMIC_RELEASE);

A typo seems to have crept in here.  s/val/new_val/

-- 
Alan Modra
Australia Development Lab, IBM


Re: [RS6000] atomic tweaks

2012-06-21 Thread Alan Modra
On Thu, Jun 21, 2012 at 12:08:42PM -0400, David Edelsohn wrote:
 4) Later loads and stores to which a dependency is carried from the
 initial load (see 1.10p9 and 1.10p10 in the standard) cannot be
 reordered to precede the load(memory_order_consume).  Other loads and
 stores following the load(memory_order_consume) may be freely
 reordered to precede the load(memory_order_consume).
 
 A relaxed load is subject to constraints #1-#3 above, but not #4.  In
 contrast, a memory_order_acquire load would be subject to #1-3 above,
 but would be subject to a stronger version of #4 that prohibited -any-
 load or store following the load(memory_order_consume) to be reordered
 prior to that load(memory_order_consume).
 
 
 The isync is not technically required, but removing it requires subtle
 load-store tracking conformance from the compiler and library.  We
 left it in to be safe. I'm curious about Richard and Jakub's current
 thoughts.

Hmm, OK.  So, let me see if I understand.  We know

p = atomic_load_n (addr, __ATOMIC_CONSUME);
i = *p;

needs no barriers on powerpc.  But if we had something like

p = atomic_load_n (addr, __ATOMIC_CONSUME);
if (p == foo)
  {
i = *p;
...
  }

and the compiler optimised i = *p to i = foo, then you'd be in
trouble because the hardware won't see the load from foo having any
dependency on p.

-- 
Alan Modra
Australia Development Lab, IBM


[RS6000] offset addressing

2012-07-10 Thread Alan Modra
Hi David,
  I've been auditing rs6000.c after looking at
https://bugzilla.redhat.com/show_bug.cgi?id=837630
We have problems all over the place.  :-(

For starters, this code in rs6000_legitimate_offset_address_p.

case TFmode:
  ...
case TDmode:
case TImode:
  if (mode == TFmode || mode == TDmode || !TARGET_POWERPC64)
extra = 12;
  else if (offset & 3)
return false;
  else
extra = 8;
  break;

Notice how a TFmode when TARGET_POWERPC64 gets extra==12.  How can
that be correct?  Both general and floating point registers are 8
bytes.  However, this is not the main problem I see here and in many
other places.  The problem is that the mode is only a hint as to the
register used in the insn.  You can easily have a floating point mode
in gprs or an integer mode in fprs, as shown by the following
examples.  These are all quite contrived, but a similar situation may
arise in real code due to register pressure.

double a, b;

void f1 (void)
{
  double tmp = a;
  asm ("#" : "+r" (tmp) : :
       "fr0", "fr1", "fr2", "fr3", "fr4", "fr5", "fr6", "fr7",
       "fr8", "fr9", "fr10", "fr11", "fr12", "fr13", "fr14", "fr15",
       "fr16", "fr17", "fr18", "fr19", "fr20", "fr21", "fr22", "fr23",
       "fr24", "fr25", "fr26", "fr27", "fr28", "fr29", "fr30", "fr31");
  b = tmp;
}

long long x, y;

void f2 (void)
{
  long long tmp = x;
  asm ("#" : "+f" (tmp) : :
       "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
       "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15",
       "r16", "r17", "r18", "r19", "r20", "r21", "r22", "r23",
       "r24", "r25", "r26", "r27", "r28", "r29", "r30", "r31");
  y = tmp;
}

long double la, lb;

void f3 (void)
{
  long double tmp = la;
  asm ("#" : "+r" (tmp) : :
       "fr0", "fr1", "fr2", "fr3", "fr4", "fr5", "fr6", "fr7",
       "fr8", "fr9", "fr10", "fr11", "fr12", "fr13", "fr14", "fr15",
       "fr16", "fr17", "fr18", "fr19", "fr20", "fr21", "fr22", "fr23",
       "fr24", "fr25", "fr26", "fr27", "fr28", "fr29", "fr30", "fr31");
  lb = tmp;
}

So I think that rs6000_legitimate_offset_address_p code snippet ought
to be

case TFmode:
  ...
case TDmode:
case TImode:
  if (!TARGET_POWERPC64)
extra = 12;
  else if (offset & 3)
return false;
  else
extra = 8;
  break;

We can't assume that TFmode or TDmode are in fprs, so for 32-bit that
means a possible addition of 12 to the offset and for 64-bit it means
the offset must be a multiple of 4.  If I go ahead and fix the many
places in rs6000.c we will see some code degradation for oddball
cases.  For example, loading a mis-aligned double (well, not 4-byte
aligned) on ppc64 to a fpr will use direct addressing rather than
offset addressing.  Maybe we could do something about that with a
peephole..

I know the above isn't anything new.  When this sort of problem came
up in the past we didn't want to lose code quality in the common case,
and tried to avoid the correctness issue with hacks that made
incorrect code unlikely to occur.  The thing is that with section
anchors we are much more likely to see offset addresses near 32k
boundaries.  So is it worth trying to fix this properly?  If it is,
here's the first baby step toward that aim.

Pegs the negative limit at -32768 instead of -32780, and increases the
positive limit a little for powerpc64.  This patch is a prerequisite
to making rs6000_legitimate_offset_address_p fixes since this one
affects insn operand constraints, while other places affect
predicates.  You can't have insn constraints not covering all operands
allowed by the predicate, but the converse is fine.

Bootstrapped etc. powerpc64-linux and powerpc-linux.

* config/rs6000/rs6000.c (rs6000_mode_dependent_address): Correct
offset addressing limits.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 189345)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -6409,7 +6409,7 @@ rs6000_mode_dependent_address (const_rtx addr)
   GET_CODE (XEXP (addr, 1)) == CONST_INT)
{
  unsigned HOST_WIDE_INT val = INTVAL (XEXP (addr, 1));
- return val + 12 + 0x8000 >= 0x10000;
+ return val + 0x8000 >= 0x10000 - (TARGET_POWERPC64 ? 8 : 12);
}
   break;
 
-- 
Alan Modra
Australia Development Lab, IBM


Re: PR53914, rs6000 constraints and reload queries

2012-07-18 Thread Alan Modra
 (gpc_reg_operand (operands[0], TImode) || gpc_reg_operand (operands[1], 
TImode))
   *
@@ -10360,33 +10360,33 @@
   if (TARGET_STRING)
  return \"{stsi|stswi} %1,%P0,16\";
 case 1:
+  return \"#\";
 case 2:
-  return \"#\";
-case 3:
   /* If the address is not used in the output, we can use lsi.  Otherwise,
 fall through to generating four loads.  */
   if (TARGET_STRING
  && ! reg_overlap_mentioned_p (operands[0], operands[1]))
 return \"{lsi|lswi} %0,%P1,16\";
   /* ... fall through ...  */
+case 3:
 case 4:
 case 5:
   return \"#\";
 }
 }
-  [(set_attr "type" "store_ux,store_ux,*,load_ux,load_ux,*")
+  [(set_attr "type" "store_ux,store_ux,load_ux,load_ux,*,*")
 (set (attr "cell_micro") (if_then_else (match_test "TARGET_STRING")
  (const_string "always")
  (const_string "conditional")))])
 
 (define_insn "*movti_ppc64"
-  [(set (match_operand:TI 0 "nonimmediate_operand" "=r,o,r")
-   (match_operand:TI 1 "input_operand" "r,r,m"))]
+  [(set (match_operand:TI 0 "nonimmediate_operand" "=Y,r,r")
+   (match_operand:TI 1 "input_operand" "r,Y,r"))]
   "(TARGET_POWERPC64 && (gpc_reg_operand (operands[0], TImode)
 || gpc_reg_operand (operands[1], TImode)))
 && VECTOR_MEM_NONE_P (TImode)"
   "#"
-  [(set_attr "type" "*,store,load")])
+  [(set_attr "type" "store,load,*")])
 
 (define_split
   [(set (match_operand:TI 0 "gpc_reg_operand" "")
@@ -13215,8 +13215,8 @@
(set_attr length 12)])
 
 (define_insn "stack_protect_setdi"
-  [(set (match_operand:DI 0 "memory_operand" "=m")
-   (unspec:DI [(match_operand:DI 1 "memory_operand" "m")] UNSPEC_SP_SET))
+  [(set (match_operand:DI 0 "memory_operand" "=Y")
+   (unspec:DI [(match_operand:DI 1 "memory_operand" "Y")] UNSPEC_SP_SET))
 (set (match_scratch:DI 2 "=r") (const_int 0))]
   "TARGET_64BIT"
   "ld%U1%X1 %2,%1\;std%U0%X0 %2,%0\;{lil|li} %2,0"
@@ -13257,8 +13257,8 @@
 
 (define_insn "stack_protect_testdi"
   [(set (match_operand:CCEQ 0 "cc_reg_operand" "=x,?y")
-(unspec:CCEQ [(match_operand:DI 1 "memory_operand" "m,m")
- (match_operand:DI 2 "memory_operand" "m,m")]
+(unspec:CCEQ [(match_operand:DI 1 "memory_operand" "Y,Y")
+ (match_operand:DI 2 "memory_operand" "Y,Y")]
  UNSPEC_SP_TEST))
 (set (match_scratch:DI 4 "=r,r") (const_int 0))
 (clobber (match_scratch:DI 3 "=r,r"))]

-- 
Alan Modra
Australia Development Lab, IBM


Re: PR53914, rs6000 constraints and reload queries

2012-07-19 Thread Alan Modra
On Thu, Jul 19, 2012 at 12:34:25PM +0930, Alan Modra wrote:
 and fixes pr54009.

David, in looking over this today, I realised that this bug isn't
completely fixed.  I stopped gcc emitting an offset of 32768, but that
isn't enough.  lo_sum addresses can't be offset, except when you know
the alignment of the object being addressed guarantees that no part
crosses a 32k boundary.  For example, given lis 9,x@ha; lwz 3,x@l(9);
lwz 4,x+4@l(9); we run into trouble if x happens to reside at
n*64k + 32764.  The final address matters, not just the offset in the
insn.  So I have some changes to mem_operand_gpr and
rs6000_secondary_reload.  I'll post a patch on top of the previous
one.

Another thing.  I reckon we can do without the 'wY' constraint.  I
implemented it because movtf_internal currently uses an o
constraint, but it seems to me that rs6000_legitimate_address_p
already prevents non-offsettable TFmode addresses for mems accessed by
fprs.  Geoff introduced the o on movtf here:
http://gcc.gnu.org/ml/gcc-patches/2003-12/msg00803.html
Thoughts?

-- 
Alan Modra
Australia Development Lab, IBM


Re: PR53914, rs6000 constraints and reload queries

2012-07-19 Thread Alan Modra
On Fri, Jul 20, 2012 at 12:05:28AM +0930, Alan Modra wrote:
 On Thu, Jul 19, 2012 at 12:34:25PM +0930, Alan Modra wrote:
  and fixes pr54009.
 
 David, in looking over this today, I realised that this bug isn't
 completely fixed.  I stopped gcc emitting an offset of 32768, but that
 isn't enough.  lo_sum addresses can't be offset, except when you know
 the alignment of the object being addressed guarantees that no part
 crosses a 32k boundary.  For example, given lis 9,x@ha; lwz 3,x@l(9);
 lwz 4,x+4@l(9); we run into trouble if x happens to reside at
 n*64k + 32764.  The final address matters, not just the offset in the
 insn.  So I have some changes to mem_operand_gpr and
 rs6000_secondary_reload.  I'll post a patch on top of the previous
 one.

Actually, maybe I won't.  When I disabled offsetting LO_SUMs, except
for tocrefs (which we know are aligned), I ran into an "insn does not
satisfy its constraints" ICE on gcc.dg/pr28795-2.  The cause: code in
rs6000_legitimize_reload_address near the comment starting "/* Don't do
this for TFmode".  There we have a list of modes from which we make
assumptions about the regs used, and get it wrong for ppc32.  Reload
decided to use 2 gprs for a DFmode.

We've been doing this forever, and almost always the mem is aligned.
So fixing an esoteric testcase like the one in pr54009 would come at
the expense of normal code quality.  Unless I can reliably figure out
regs used and alignment of syms in rs6000_legitimize_reload_address,
and as far as I can tell that isn't possible.

 Another thing.  I reckon we can do without the 'wY' constraint.  I
 implemented it because movtf_internal currently uses an o
 constraint, but it seems to me that rs6000_legitimate_address_p
 already prevents non-offsettable TFmode addresses for mems accessed by
 fprs.  Geoff introduced the o on movtf here:
 http://gcc.gnu.org/ml/gcc-patches/2003-12/msg00803.html
 Thoughts?

This on the other hand works.  Please consider the patch amended to
remove mem_operand_fpr, the 'wY' constraint, and uses thereof replaced
with m.

-- 
Alan Modra
Australia Development Lab, IBM


Re: PowerPC shrink-wrap support 3 of 3

2011-10-27 Thread Alan Modra
On Thu, Oct 27, 2011 at 12:24:46AM +1030, Alan Modra wrote:
 more code than duplicating epilogues.  From what I've seen, the
 duplicate tails are generally very small.  I guess I should dump out
 some info so we can get a better idea.

There were 545 occurrences of shrink-wrap in the gcc/ dir for a
-O2 powerpc-linux bootstrap.  I counted active insns in duplicated
blocks.  244 had zero insns (all the ones I looked at were just uses
of r3), 96 had one insn (all the ones I looked at were setting r3),
with the remainder being:

cfgrtl.c.199r.pro_and_epilogue:Duplicating bb 6, 2 insns.
  tail call
function.c.199r.pro_and_epilogue:Duplicating bb 17, 2 insns.
  setting a value via pointer (*poffset in instantiate_new_reg)
insn-automata.c.199r.pro_and_epilogue:Duplicating bb 229, 2 insns.
  tail call
varasm.c.199r.pro_and_epilogue:Duplicating bb 48, 2 insns.
  setting a global var
rs6000.c.199r.pro_and_epilogue:Duplicating bb 300, 3 insns.
  tail call
rs6000.c.199r.pro_and_epilogue:Duplicating bb 8, 3 insns.
  tail call
toplev.c.199r.pro_and_epilogue:Duplicating bb 5, 4 insns.
  loading two word return value from random_seed var.
reginfo.c.199r.pro_and_epilogue:Duplicating bb 4, 7 insns.
  setting a two word global var and r3
dbxout.c.199r.pro_and_epilogue:Duplicating bb 4, 8 insns.
  tail call
dbxout.c.199r.pro_and_epilogue:Duplicating bb 4, 8 insns.
  tail call

Note that having 350 duplicated blocks doesn't mean we had 350 extra
cases of shrink-wrapping, because some tails were two blocks, and I
recall seeing a case when developing my code where one function had
two duplicated tails.  Certainly many more shrink wrapped functions
though.  :-)

-- 
Alan Modra
Australia Development Lab, IBM


Re: PowerPC shrink-wrap support 3 of 3

2011-10-31 Thread Alan Modra
;
!   else
! pending_edge_cold = e;
! }
! 
!   FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e)
{
  basic_block *pdest_bb;
+ edge pending;
  
! if (BB_PARTITION (e->src) == BB_HOT_PARTITION)
!   {
! pdest_bb = &simple_return_block_hot;
! pending = pending_edge_hot;
!   }
  else
!   {
! pdest_bb = &simple_return_block_cold;
! pending = pending_edge_cold;
!   }
! 
! if (*pdest_bb == NULL && pending != NULL)
!   {
! emit_return_into_block (true, pending->src);
! pending->flags &= ~(EDGE_FALLTHRU | EDGE_FAKE);
! *pdest_bb = pending->src;
!   }
! else if (*pdest_bb == NULL)
{
  basic_block bb;
  rtx start;
*** epilogue_done:
*** 6219,6225 
}
  redirect_edge_and_branch_force (e, *pdest_bb);
}
!   VEC_free (rtx, heap, unconverted_simple_returns);
  }
  #endif
  
--- 6414,6432 
}
  redirect_edge_and_branch_force (e, *pdest_bb);
}
!   VEC_free (edge, heap, unconverted_simple_returns);
! }
! 
!   if (entry_edge != orig_entry_edge)
! {
!   FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR->preds)
!   if (EDGE_COUNT (e->src->preds) != 0
!    && (e->flags & EDGE_FAKE) != 0
!    && !bitmap_bit_p (bb_flags, e->src->index))
! {
!   emit_return_into_block (true, e->src);
!   e->flags &= ~(EDGE_FALLTHRU | EDGE_FAKE);
! }
  }
  #endif
  
*** epilogue_done:
*** 6233,6240 
  
if (!CALL_P (insn)
  || ! SIBLING_CALL_P (insn)
  || (entry_edge != orig_entry_edge
!  && !bitmap_bit_p (bb_flags, bb->index)))
{
  ei_next (ei);
  continue;
--- 6440,6450 
  
if (!CALL_P (insn)
  || ! SIBLING_CALL_P (insn)
+ #ifdef HAVE_simple_return
  || (entry_edge != orig_entry_edge
!  && !bitmap_bit_p (bb_flags, bb->index))
! #endif
! )
{
  ei_next (ei);
  continue;
*** epilogue_done:
*** 6281,6287 
--- 6491,6499 
  }
  #endif
  
+ #ifdef HAVE_simple_return
bitmap_clear (bb_flags);
+ #endif
  
/* Threading the prologue and epilogue changes the artificial refs
   in the entry and exit blocks.  */

-- 
Alan Modra
Australia Development Lab, IBM


Re: PowerPC shrink-wrap support 3 of 3

2011-11-01 Thread Alan Modra
On Tue, Nov 01, 2011 at 12:57:22AM +1030, Alan Modra wrote:
 Bits left to do
 - limit size of duplicated tails

Done here.  Also fixes a hole in that I took no notice of
targetm.cannot_copy_insn_p when duplicating tails.

One interesting result is that the tail duplication actually reduces
the text size of libstdc++.so from 1074042 to 1073478 bytes on
powerpc-linux.  The reason being that a shrink-wrapped function that
needs a prologue only on paths ending in a sibling call will lose one
copy of the epilogue.  That must happen enough to more than make up
for duplicated tails.

Bootstrapped and regression tested powerpc-linux.  OK to apply?
(And I won't be posting any more versions of the patch until this is
reviewed.  Please excuse me for spamming the list.)

* function.c (bb_active_p): Delete.
(dup_block_and_redirect, active_insn_between): New functions.
(convert_jumps_to_returns, emit_return_for_exit): New functions,
split out from..
(thread_prologue_and_epilogue_insns): ..here.  Delete
shadowing variables.  Don't do prologue register clobber tests
when shrink wrapping already failed.  Delete all last_bb_active
code.  Instead compute tail block candidates for duplicating
exit path.  Remove these from antic set.  Duplicate tails when
reached from both blocks needing a prologue/epilogue and
blocks not needing such.
* ifcvt.c (dead_or_predicable): Test both flag_shrink_wrap and
HAVE_simple_return.
* bb-reorder.c (get_uncond_jump_length): Make global.
* bb-reorder.h (get_uncond_jump_length): Declare.
* cfgrtl.c (rtl_create_basic_block): Comment typo fix.
(rtl_split_edge): Likewise.  Warning fix.
(rtl_duplicate_bb): New function.
(rtl_cfg_hooks): Enable can_duplicate_block_p and duplicate_block.

Index: gcc/function.c
===
--- gcc/function.c  (revision 180588)
+++ gcc/function.c  (working copy)
@@ -65,6 +65,8 @@ along with GCC; see the file COPYING3.  
 #include "df.h"
 #include "timevar.h"
 #include "vecprim.h"
+#include "params.h"
+#include "bb-reorder.h"
 
 /* So we can assign to cfun in this file.  */
 #undef cfun
@@ -5290,8 +5292,6 @@ requires_stack_frame_p (rtx insn, HARD_R
   HARD_REG_SET hardregs;
   unsigned regno;
 
-  if (!INSN_P (insn) || DEBUG_INSN_P (insn))
-return false;
   if (CALL_P (insn))
 return !SIBLING_CALL_P (insn);
 
@@ -5514,23 +5514,186 @@ set_return_jump_label (rtx returnjump)
 JUMP_LABEL (returnjump) = ret_rtx;
 }
 
-/* Return true if BB has any active insns.  */
+#ifdef HAVE_simple_return
+/* Create a copy of BB instructions and insert at BEFORE.  Redirect
+   preds of BB to COPY_BB if they don't appear in NEED_PROLOGUE.  */
+static void
+dup_block_and_redirect (basic_block bb, basic_block copy_bb, rtx before,
+   bitmap_head *need_prologue)
+{
+  edge_iterator ei;
+  edge e;
+  rtx insn = BB_END (bb);
+
+  /* We know BB has a single successor, so there is no need to copy a
+ simple jump at the end of BB.  */
+  if (simplejump_p (insn))
+insn = PREV_INSN (insn);
+
+  start_sequence ();
+  duplicate_insn_chain (BB_HEAD (bb), insn);
+  if (dump_file)
+{
+  unsigned count = 0;
+  for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
+   if (active_insn_p (insn))
+ ++count;
+  fprintf (dump_file, "Duplicating bb %d to bb %d, %u active insns.\n",
+  bb->index, copy_bb->index, count);
+}
+  insn = get_insns ();
+  end_sequence ();
+  emit_insn_before (insn, before);
+
+  /* Redirect all the paths that need no prologue into copy_bb.  */
+  for (ei = ei_start (bb->preds); (e = ei_safe_edge (ei)); )
+if (!bitmap_bit_p (need_prologue, e->src->index))
+  {
+   redirect_edge_and_branch_force (e, copy_bb);
+   continue;
+  }
+else
+  ei_next (ei);
+}
+#endif
+
+#if defined (HAVE_return) || defined (HAVE_simple_return)
+/* Return true if there are any active insns between HEAD and TAIL.  */
 static bool
-bb_active_p (basic_block bb)
+active_insn_between (rtx head, rtx tail)
 {
+  while (tail)
+{
+  if (active_insn_p (tail))
+   return true;
+  if (tail == head)
+   return false;
+  tail = PREV_INSN (tail);
+}
+  return false;
+}
+
+/* LAST_BB is a block that exits, and empty of active instructions.
+   Examine its predecessors for jumps that can be converted to
+   (conditional) returns.  */
+static VEC (edge, heap) *
+convert_jumps_to_returns (basic_block last_bb, bool simple_p,
+ VEC (edge, heap) *unconverted ATTRIBUTE_UNUSED)
+{
+  int i;
+  basic_block bb;
   rtx label;
+  edge_iterator ei;
+  edge e;
+  VEC(basic_block,heap) *src_bbs;
+
+  src_bbs = VEC_alloc (basic_block, heap, EDGE_COUNT (last_bb->preds));
+  FOR_EACH_EDGE (e, ei, last_bb->preds)
+if (e->src != ENTRY_BLOCK_PTR)
+  VEC_quick_push (basic_block

Re: [PATCH, rs6000] Preserve link stack for 476 cpus

2011-11-01 Thread Alan Modra
On Tue, Nov 01, 2011 at 02:00:25PM -0500, Peter Bergner wrote:
   (get_ppc476_thunk_name): New function.
   (rs6000_code_end): Likewise.

rs6000.c:27968:1: error: 'void rs6000_code_end()' defined but not used 
[-Werror=unused-function]
cc1plus: all warnings being treated as errors

Bootstrapped and committed as obvious, revision 180761.

* config/rs6000/rs6000.c (rs6000_code_end): Declare ATTRIBUTE_UNUSED.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 180754)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -1176,6 +1176,7 @@ static void rs6000_trampoline_init (rtx,
 static bool rs6000_cannot_force_const_mem (enum machine_mode, rtx);
 static bool rs6000_legitimate_constant_p (enum machine_mode, rtx);
 static bool rs6000_save_toc_in_prologue_p (void);
+static void rs6000_code_end (void) ATTRIBUTE_UNUSED;
 
 /* Hash table stuff for keeping track of TOC entries.  */
 
-- 
Alan Modra
Australia Development Lab, IBM


[rs6000] Fix PR 50906, eh_frame and other woes

2011-11-04 Thread Alan Modra
, can_use_exit);
+  sp_offset, can_use_exit);
  if (DEFAULT_ABI == ABI_DARWIN)
/* we only need a copy, no fprs were saved.  */
-   emit_move_insn (gen_rtx_REG (reg_mode, 11), frame_reg_rtx);
+   emit_move_insn (gen_rtx_REG (Pmode, 11), frame_reg_rtx);
 
  if (info-cr_save_p)
rs6000_restore_saved_cr (cr_save_reg, using_mtcr_multiple);
}
   else
{
- emit_insn (gen_add3_insn (gen_rtx_REG (Pmode, DEFAULT_ABI == ABI_AIX
-   ? 12 : 11),
-   frame_reg_rtx,
+ rtx src_reg = gen_rtx_REG (Pmode, DEFAULT_ABI == ABI_AIX ? 12 : 11);
+
+ emit_insn (gen_add3_insn (src_reg, frame_reg_rtx,
GEN_INT (sp_offset - info->fp_size)));
- if (REGNO (frame_reg_rtx) == 11)
-   sp_offset += info->fp_size;
+ if (REGNO (frame_reg_rtx) == REGNO (src_reg))
+   sp_offset += info->fp_size;
}
 
   rs6000_emit_savres_rtx (info, frame_reg_rtx,

-- 
Alan Modra
Australia Development Lab, IBM


Re: Reload related segfaults

2011-11-04 Thread Alan Modra
Ping http://gcc.gnu.org/ml/gcc-patches/2011-10/msg02429.html

Eric fixed the bootstrap breakage by another patch, but there is a
fairly obvious bug in gen_reload fixed by my patch.  I gave enough
context in the diff that you don't even need to look at the file. :)

-- 
Alan Modra
Australia Development Lab, IBM


[rs6000] fix PR 30282, load from stack moved past stack update

2011-11-04 Thread Alan Modra
This patch fixes PR30282, caused by instructions being scheduled over
the stack reset.  Note that only for ABI_V4 do we currently have
frame_reg_rtx != sp_reg_rtx in rs6000_emit_stack_reset, so the patch
doesn't emit *fewer* blockages.  I did benchmark this change and saw
nothing but the usual benchmark noise.

Bootstrapped etc. powerpc-linux.  OK to apply?

* config/rs6000/rs6000.c (rs6000_emit_stack_reset): Always emit
blockage for ABI_V4.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 177309)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -19689,7 +19689,7 @@ rs6000_emit_stack_reset (rs6000_stack_t 
 {
   /* This blockage is needed so that sched doesn't decide to move
  the sp change before the register restores.  */
-  if (frame_reg_rtx != sp_reg_rtx
+  if (DEFAULT_ABI == ABI_V4
   || (TARGET_SPE_ABI
   info-spe_64bit_regs_used != 0
   info-first_gp_reg_save != 32))

-- 
Alan Modra
Australia Development Lab, IBM


Re: [rs6000] Fix PR 50906, eh_frame and other woes

2011-11-04 Thread Alan Modra
On Fri, Nov 04, 2011 at 06:19:07PM +, Iain Sandoe wrote:
 I did wonder, when implementing the out-of-line saves for Darwin,
 whether the setting in sysv4.h is a bit optimistic in firing for r31
 when -Os is on.
 
 whether it saves space, in reality, would presumably depend on
 whether branch islands were inserted to reach the common code.

Yes, but that doesn't happen on powerpc-linux until code size goes
above 33M.  We get a (small) gain otherwise when able to use the
exit versions of the out-of-line restores.

 Bootstrapped this,

Thanks!

-- 
Alan Modra
Australia Development Lab, IBM


Re: [rs6000] Fix PR 50906, eh_frame and other woes

2011-11-07 Thread Alan Modra
On Mon, Nov 07, 2011 at 10:53:51AM +0100, Olivier Hainque wrote:
 Another bug we're running into here is an unwarranted move of the sp
 restore prior to register fetches despite an attempt at preventing that
 with a stack_tie instruction (VxWorks target).
 
   http://gcc.gnu.org/ml/gcc/2011-03/msg00123.html
 
 The failure can still be exposed on mainline with a minor adjustment
 to the C testcase quoted in the msg.

Even after revision 181056?

-- 
Alan Modra
Australia Development Lab, IBM


Re: [rs6000] Fix PR 50906, eh_frame and other woes

2011-11-07 Thread Alan Modra
On Mon, Nov 07, 2011 at 04:10:38PM +, Joseph S. Myers wrote:
 FWIW, when I encountered such a problem in 4.4-based tools I found I 
 needed the following change to stack_tie patterns to fix it (applied to 
 and rediffed against trunk, but not tested there), in addition to a 
 DEFAULT_ABI == ABI_V4 check such as you added.  Given all the confusion 
 about this class of bugs and whether they should be addressed in each back 
 end or in a target-independent way, I didn't try to send this upstream at 
 the time.
 
 2011-04-11  Joseph Myers  jos...@codesourcery.com
 
   * config/rs6000/rs6000.md (stack_tie): Use (mem:BLK (scratch)) as
   output of SET.
   (restore_stack_block, restore_stack_nonlocal): Update UNSPEC_TIE
   patterns.

OK, so you made the stack ties conflict with all memory.  I wondered
about that too, but didn't find a testcase where it mattered.  It
would certainly be safer to use the big hammer, but on the other hand
it can restrict scheduling unnecessarily.  For example

void byref (int *a)
{
  int x;

  asm ("#%0 set here" : "=r" (x) : : "r29", "r30");
  *a = x;
}

with the existing stack ties, the write to *a happens after stack
deallocation.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [rs6000] Fix PR 50906, eh_frame and other woes

2011-11-07 Thread Alan Modra
On Tue, Nov 08, 2011 at 12:30:36AM +, Joseph S. Myers wrote:
 On Tue, 8 Nov 2011, Alan Modra wrote:
 
  OK, so you made the stack ties conflict with all memory.  I wondered
  about that too, but didn't find a testcase where it mattered.  It
 
 The test I had was one of those from the various PRs for this issue.
 
 int find_num(int i)
 {
   const int arr[5] = {0, 1, 2, 3, 4};
   return arr[i];
 }
 
 (-O2 -mcpu=603).  But I don't know if it reproduces with FSF 4.4 or trunk; 
 there were plenty of other local changes in the 4.4-based tree where I 
 found this to be necessary.

The testcase generates good code on current gcc-4.7, gcc-4.6, and
gcc-4.5, but fails on current gcc-4.4.  I thought I had this PR
fixed on all active branches.  :-(

It's been a while since I looked at what was happening with this
testcase, but from memory what stops sheduling over the stack_tie is
alias.c:base_alias_check.  Which relies on tracking registers to see
whether two memory accesses using different regs could alias.  Quite
possibly gcc-4.4 is deficient in this area.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [rs6000] Fix PR 50906, eh_frame and other woes

2011-11-08 Thread Alan Modra
On Tue, Nov 08, 2011 at 11:37:57AM +0100, Olivier Hainque wrote:
  Joseph resorted to mem:scratch to impose a strong barrier. That's certainly
  safe and I don't think the performance impact can be significant, so this
  looks like a good way out.

I agree.  Our fancy powerpc stack deallocation barriers rely too much
on alias analysis, which as you've shown (thanks!) isn't reliable in
this instance.  Other targets just use the mem:scratch barrier.

  I tried an approach in between here: replace the stack_tie insn by a
  frame_tie insn with frame_base_rtx as the second base. In the case above,
  with frame_base_rtx==r11, this is something like
 
   (set (mem/c:BLK (reg:SI 11 11) [0 A8])
 (unspec:BLK [
 (mem/c:BLK (reg:SI 11 11) [0 A8])
 (mem/c:BLK (reg/f:SI 1 1) [0 A8])

Looks similar to
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30282#c15

-- 
Alan Modra
Australia Development Lab, IBM


Re: CFG review needed for fix of PowerPC shrink-wrap support 3 of 3

2011-11-14 Thread Alan Modra
On Mon, Nov 14, 2011 at 07:48:03AM -1000, Richard Henderson wrote:
 On 11/14/2011 04:10 AM, Hans-Peter Nilsson wrote:
  Looks like all we need is a positive review of
  http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01409.html and a
  ChangeLog entry to unbreak three or more targets.
  
  Someone with approval rights: pretty please?
 
 That patch is ok.

Sorry for the bootstrap problems.  I believe the cause is my
extraction of convert_jumps_to_returns and emit_return_for_exit.  The
new HAVE_return code misses a single_succ_p test (covered by the
bitmap_bit_p (bb_flags, ...) test in the HAVE_simple_return case).

I haven't really looked into what Bernd's fix does.  I know this one
fixes what I broke..

* function.c (thread_prologue_and_epilogue_insns): Guard
emitting return with single_succ_p test.

Index: gcc/function.c
===
--- gcc/function.c  (revision 181188)
+++ gcc/function.c  (working copy)
@@ -6230,7 +6230,8 @@ thread_prologue_and_epilogue_insns (void
    && !active_insn_between (BB_HEAD (last_bb), BB_END (last_bb)))
convert_jumps_to_returns (last_bb, false, NULL);
 
- if (EDGE_COUNT (exit_fallthru_edge-src-preds) != 0)
+ if (EDGE_COUNT (last_bb->preds) != 0
+     && single_succ_p (last_bb))
{
  last_bb = emit_return_for_exit (exit_fallthru_edge, false);
  epilogue_end = returnjump = BB_END (last_bb);

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH, testsuite]: Fix timeout in simulate-thread.exp

2011-11-17 Thread Alan Modra
On Thu, Nov 17, 2011 at 10:54:23AM +0100, Uros Bizjak wrote:
 The solution for this problem is to search for simulate_thread_done
 in gdb log.

Thanks!  I like this for another reason too:  I've seen these tests
fail due to gdb never reporting simulate_thread_main on any
instruction in that function.

-- 
Alan Modra
Australia Development Lab, IBM


powerpc lack of memory_barrier

2011-11-17 Thread Alan Modra
Lacking this pattern means the builtin __sync_synchronize() on powerpc
is just an asm with a memory clobber (see builtins.c), which is hardly
a full memory barrier as extend.texi says it should be.  This patch
fixes multiple libgomp testsuite failures.  Bootstrapped and
regression tested powerpc-linux.  OK for mainline?

* config/rs6000/sync.md: Duplicate hwsync as memory_barrier.

Index: gcc/config/rs6000/sync.md
===
--- gcc/config/rs6000/sync.md   (revision 181425)
+++ gcc/config/rs6000/sync.md   (working copy)
@@ -53,6 +53,15 @@ (define_expand mem_thread_fence
   DONE;
 })
 
+(define_expand "memory_barrier"
+  [(set (match_dup 0)
+   (unspec:BLK [(match_dup 0)] UNSPEC_SYNC))]
+  ""
+{
+  operands[0] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
+  MEM_VOLATILE_P (operands[0]) = 1;
+})
+
 (define_expand "hwsync"
   [(set (match_dup 0)
(unspec:BLK [(match_dup 0)] UNSPEC_SYNC))]

-- 
Alan Modra
Australia Development Lab, IBM


RFC: powerpc libgomp locking

2011-11-17 Thread Alan Modra
This is part of work in progress getting locking back into shape on
powerpc.  (If we ever were in shape, which is doubtful.)  Using the
ia64 version means we have a needless sync at the start of
gomp_mutex_lock (courtesy of __sync_val_compare_and_swap) and a
needless isync at the end of gomp_mutex_unlock (slightly odd use
of __sync_lock_test_and_set to release a lock).  It would be nice to
get rid of the __sync_val_compare_and_swap in config/linux/mutex.c
too.

* config/linux/powerpc/mutex.h: Rewrite.

Index: libgomp/config/linux/powerpc/mutex.h
===
--- libgomp/config/linux/powerpc/mutex.h	(revision 181425)
+++ libgomp/config/linux/powerpc/mutex.h	(working copy)
@@ -1,2 +1,75 @@
-/* On PowerPC __sync_lock_test_and_set isn't a full barrier.  */
-#include "config/linux/ia64/mutex.h"
+/* Copyright (C) 2011 Free Software Foundation, Inc.
+   Contributed by Alan Modra amo...@au1.ibm.com
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This is a Linux specific implementation of a mutex synchronization
+   mechanism for libgomp.  This type is private to the library.  This
+   implementation uses atomic instructions and the futex syscall.  */
+
+#ifndef GOMP_MUTEX_H
+#define GOMP_MUTEX_H 1
+
+typedef int gomp_mutex_t;
+extern void gomp_mutex_lock_slow (gomp_mutex_t *mutex, int);
+extern void gomp_mutex_unlock_slow (gomp_mutex_t *mutex);
+
+#define GOMP_MUTEX_INIT_0 1
+
+enum memmodel
+{
+  MEMMODEL_RELAXED = 0,
+  MEMMODEL_CONSUME = 1,
+  MEMMODEL_ACQUIRE = 2,
+  MEMMODEL_RELEASE = 3,
+  MEMMODEL_ACQ_REL = 4,
+  MEMMODEL_SEQ_CST = 5,
+  MEMMODEL_LAST = 6
+};
+
+static inline void gomp_mutex_init (gomp_mutex_t *mutex)
+{
+  *mutex = 0;
+}
+
+static inline void gomp_mutex_destroy (gomp_mutex_t *mutex)
+{
+}
+
+static inline void gomp_mutex_lock (gomp_mutex_t *mutex)
+{
+  int oldval = 0;
+
+  __atomic_compare_exchange_4 (mutex, &oldval, 1, false,
+			       MEMMODEL_ACQUIRE, MEMMODEL_RELAXED);
+  if (__builtin_expect (oldval != 0, 0))
+gomp_mutex_lock_slow (mutex, oldval);
+}
+
+static inline void gomp_mutex_unlock (gomp_mutex_t *mutex)
+{
+  int oldval = __atomic_exchange_4 (mutex, 0, MEMMODEL_RELEASE);
+
+  if (__builtin_expect (oldval > 1, 0))
+gomp_mutex_unlock_slow (mutex);
+}
+#endif /* GOMP_MUTEX_H */

-- 
Alan Modra
Australia Development Lab, IBM


Re: RFC: powerpc libgomp locking

2011-11-17 Thread Alan Modra
On Thu, Nov 17, 2011 at 02:56:31PM -1000, Richard Henderson wrote:
 On 11/17/2011 01:56 AM, Alan Modra wrote:
  This is part of work in progress getting locking back into shape on
  powerpc.  (If we ever were in shape, which is doubtful.)  Using the
  ia64 version means we have a needless sync at the start of
  gomp_mutex_lock (courtesy of __sync_val_compare_and_swap) and a
  needless isync at the end of gomp_mutex_unlock (slightly odd use
  of __sync_lock_test_and_set to release a lock).  It would be nice to
  get rid of the __sync_val_compare_and_swap in config/linux/mutex.c
  too.
 
 We should be able to use this new file as a replacement for all
 targets, not just PPC.
 
 And, yes, updating all of the __sync uses in libgomp would be a
 good thing.
 
  +  __atomic_compare_exchange_4 (mutex, &oldval, 1, false,
  +	      MEMMODEL_ACQUIRE, MEMMODEL_RELAXED);
  +  if (__builtin_expect (oldval != 0, 0))
  +gomp_mutex_lock_slow (mutex, oldval);
 
 __atomic_compare_exchange_4 returns a boolean success:
 
    if (__builtin_expect (!__atomic_compare_exchange_4
			  (mutex, &oldval, 1, false,
			   MEMMODEL_ACQUIRE, MEMMODEL_RELAXED), 0))
 
 In theory that should avoid an extra comparison on some targets.

That was the way I wrote it at first.  This way gives three fewer
insns on powerpc, because insns setting up the boolean are elided as
is the comparison of the boolean against zero.  An oldval != 0
comparison insn is also not needed because __atomic_compare_exchange
already did that for us.

BTW, we're left with dead stores into oldval's stack slot.  Is it
legal for the first two parameters of __atomic_compare_exchange to
alias?  I'm guessing that the stores might disappear if we tell gcc
that they can't alias.

-- 
Alan Modra
Australia Development Lab, IBM


Re: powerpc compare_and_swap fails

2011-11-17 Thread Alan Modra
On Thu, Nov 17, 2011 at 12:24:08PM -1000, Richard Henderson wrote:
 On 11/17/2011 10:25 AM, Richard Henderson wrote:
  so that we don't write back into the subreg.
 
 Like so.  Tested on ppc64-linux and committed.

Thanks!  This, along with my mutex.h rewrite plus adding a
memory_barrier insn gets me down to just one libgomp failure.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [brobec...@adacore.com: Re: [PATCH 18/348] Fix -Wsahdow warnings]

2011-11-23 Thread Alan Modra
On Wed, Nov 23, 2011 at 12:05:56PM -0800, Joel Brobecker wrote:
 We were looking at -Wshadow warnings that we thought were counter
 productive, and found that you had written a patch that would help us.
 The patch was okayed, but never applied. Is there a reason for it?

I must have missed seeing the OK.  Committed revision 181684.

-- 
Alan Modra
Australia Development Lab, IBM


Fix libgomp semaphores

2011-11-24 Thread Alan Modra
This fixes PR51249, a failure caused by insufficient care in waking
threads on sem_post.  It's quite a tricky business to get right, but I
believe this rewrite is now correct.  I've also converted over lock.c,
mutex.h and mutex.c to use the new atomic builtins.  This means no
target should need target-specific versions of mutex.h.  mutex-lock.h
came about because at one stage I was trying out semaphore and lock
implementations that used a lock/count int and an nwaiters int to
track number of threads waiting.  That turned out to be a really bad
idea.  You need barriers all over the place to ensure you don't see an
inconsistent lock/semaphore state, and extra barriers are costly.
It's much better to pay the price of an extra futex_wake system call
when transitioning from contended to uncontended.
Anyway, I left the lock abstraction as a good thing.  I also changed
the mutex implementation to use -1 to mean locked and (possibly) some
thread waiting on the lock rather than using 2.  The very minor
benefit is that some targets may use one less instruction testing < 0
compared to testing > 1.  It's also just that little bit more like the
semaphore implementation.

Bootstrapped and regression tested powerpc64-linux.  I do still see
the occasional testsuite failures on power7 (see pr51298), but the
semaphore failures are gone.

PR libgomp/51249
* config/linux/mutex-lock.h: New.
* config/linux/omp-lock.h: Include above.
(omp_lock_t, omp_nest_lock_t, omp_lock_25_t): Use gomp_mutex_t.
* config/linux/lock.c: Use atomic rather than sync builtins.
* config/linux/mutex.h: Likewise.
* config/linux/mutex.c: Likewise.  Comment.  Use -1 for waiting state.
* config/linux/sem.h: Rewrite.
* config/linux/sem.c: Rewrite.
* config/linux/arm/mutex.h: Delete.
* config/linux/powerpc/mutex.h: Delete.
* config/linux/ia64/mutex.h: Delete.
* config/linux/mips/mutex.h: Delete.

Index: libgomp/config/linux/mutex-lock.h
===
--- libgomp/config/linux/mutex-lock.h   (revision 0)
+++ libgomp/config/linux/mutex-lock.h   (revision 0)
@@ -0,0 +1,35 @@
+/* Copyright (C) 2011 Free Software Foundation, Inc.
+   Contributed by Alan Modra, IBM.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This type is private to the library.  */
+
+#ifndef MUTEX_LOCK_H
+#define MUTEX_LOCK_H 1
+
+typedef struct linux_gomp_mutex
+{
+  /* 0 unlocked, 1 locked, -1 locked and possibly someone waiting.  */
+  int flag;
+} gomp_mutex_t;
+#endif
Index: libgomp/config/linux/omp-lock.h
===
--- libgomp/config/linux/omp-lock.h (revision 181683)
+++ libgomp/config/linux/omp-lock.h (working copy)
@@ -3,10 +3,12 @@
structures without polluting the namespace.
 
When using the Linux futex primitive, non-recursive locks require
-   only one int.  Recursive locks require we identify the owning task
-   and so require one int and a pointer.  */
+   one int.  Recursive locks require we identify the owning task
+   and so require in addition one int and a pointer.  */
 
-typedef int omp_lock_t;
-typedef struct { int lock, count; void *owner; } omp_nest_lock_t;
-typedef int omp_lock_25_t;
+#include "mutex-lock.h"
+
+typedef gomp_mutex_t omp_lock_t;
+typedef struct { omp_lock_t lock; int count; void *owner; } omp_nest_lock_t;
+typedef gomp_mutex_t omp_lock_25_t;
 typedef struct { int owner, count; } omp_nest_lock_25_t;
Index: libgomp/config/linux/mutex.h
===
--- libgomp/config/linux/mutex.h(revision 181683)
+++ libgomp/config/linux/mutex.h(working copy)
@@ -1,4 +1,4 @@
-/* Copyright (C) 2005, 2009, 2011 Free Software Foundation, Inc.
+/* Copyright (C) 2011 Free Software Foundation, Inc.
Contributed by Richard Henderson r...@redhat.com.
 
This file is part of the GNU OpenMP Library (libgomp).
@@ -29,43 +29,39 @@
 #ifndef

Re: Fix libgomp semaphores

2011-11-25 Thread Alan Modra
On Fri, Nov 25, 2011 at 08:38:39AM +0100, Jakub Jelinek wrote:
 My preference would be to avoid the abstraction changes though, both
 because it is additional clutter in the changeset and because omp_lock
 and nested lock are part of public ABIs, so if struct is layed out
 differently on some weird architecture, it would be an ABI change.

OK, fair enough.  I didn't consider that structs may be laid out
differently.

 So, if you could keep gomp_mutex_t, omp_lock_t and gomp_sem_t as integers,
 it would be appreciated.
 
 Furthermore, I'd prefer if the patch could be split into smaller parts,
 e.g. for bisecting purposes.  One patch would do the mutex changes
 to use new atomics, remove extra mutex.h headers and start using 0/1/-1
 instead of 0/1/2.  And another patch would rewrite the semaphores.

OK.  I need to do this anyway as I just discovered a regression when
looping on one of the tests.  I suspect the acquire/release mutex
locking may have exposed bugs elsewhere in libgomp that were covered
by the heavyweight locking used by the __sync builtins.

-- 
Alan Modra
Australia Development Lab, IBM


Re: Fix libgomp semaphores

2011-11-27 Thread Alan Modra
--- libgomp/config/linux/sem.c  (revision 181718)
+++ libgomp/config/linux/sem.c  (working copy)
@@ -28,34 +28,69 @@
 
 #include "wait.h"
 
-
 void
-gomp_sem_wait_slow (gomp_sem_t *sem)
+gomp_sem_wait_slow (gomp_sem_t *sem, int count)
 {
-  while (1)
+  int oldval, newval;
+
+  /* First loop spins a while.  */
+  while (count == 0)
 {
-  int val = __sync_val_compare_and_swap (sem, 0, -1);
-  if (val > 0)
+  if (do_spin (sem, 0))
+   {
+ /* Spin timeout, nothing changed.  Set waiting flag.  */
+ oldval = 0;
+	  __atomic_compare_exchange_4 (sem, &oldval, 0x80000000, false,
+				       MEMMODEL_ACQUIRE, MEMMODEL_RELAXED);
+ count = oldval;
+ if (oldval == 0)
+   {
+	      futex_wait (sem, 0x80000000);
+ count = *sem;
+   }
+ break;
+   }
+  /* Something changed.  If positive, we're good to go.  */
+  else if ((count = *sem) > 0)
{
- if (__sync_bool_compare_and_swap (sem, val, val - 1))
+ oldval = count;
+	  __atomic_compare_exchange_4 (sem, &oldval, count - 1, false,
+				       MEMMODEL_ACQUIRE, MEMMODEL_RELAXED);
+ if (oldval == count)
return;
+ count = oldval;
}
-  do_wait (sem, -1);
+}
+
+  /* Second loop waits until semaphore is posted.  We always exit this
+ loop with wait flag set, so next post will awaken a thread.  */
+  while (1)
+{
+  oldval = count;
+      newval = 0x80000000;
+      if ((count & 0x7fffffff) != 0)
+	newval |= count - 1;
+      __atomic_compare_exchange_4 (sem, &oldval, newval, false,
+				   MEMMODEL_ACQUIRE, MEMMODEL_RELAXED);
+  if (oldval == count)
+   {
+	  if ((count & 0x7fffffff) != 0)
+   {
+ /* If we can wake more threads, do so now.  */
+	    if ((count & 0x7fffffff) > 1)
+   gomp_sem_post_slow (sem);
+ break;
+   }
+	  do_wait (sem, 0x80000000);
+ count = *sem;
+   }
+  else
+   count = oldval;
 }
 }
 
 void
 gomp_sem_post_slow (gomp_sem_t *sem)
 {
-  int old, tmp = *sem, wake;
-
-  do
-{
-  old = tmp;
-  wake = old > 0 ? old + 1 : 1;
-  tmp = __sync_val_compare_and_swap (sem, old, wake);
-}
-  while (old != tmp);
-
-  futex_wake (sem, wake);
+  futex_wake (sem, 1);
 }

-- 
Alan Modra
Australia Development Lab, IBM


Use atomics in libgomp mutex

2011-11-27 Thread Alan Modra
 that it will be useful, but WITHOUT ANY
-   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
-   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
-   more details.
-
-   Under Section 7 of GPL version 3, you are granted additional
-   permissions described in the GCC Runtime Library Exception, version
-   3.1, as published by the Free Software Foundation.
-
-   You should have received a copy of the GNU General Public License and
-   a copy of the GCC Runtime Library Exception along with this program;
-   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
-   <http://www.gnu.org/licenses/>.  */
-
-/* MIPS needs the same correct usage of __sync_synchronize and
-   __sync_lock_test_and_set as ia64.  So we just use its mutex.h.  */
-
-#include "config/linux/ia64/mutex.h"

-- 
Alan Modra
Australia Development Lab, IBM


Fix PR51298, libgomp barrier failure

2011-11-27 Thread Alan Modra
-generation != state + 4);
+  while (gen != state + 4);
 }
 
 void

-- 
Alan Modra
Australia Development Lab, IBM


Re: Use atomics in libgomp mutex

2011-11-27 Thread Alan Modra
On Mon, Nov 28, 2011 at 09:27:20AM +1030, Alan Modra wrote:
 This is the mutex part.  Depends on
 http://gcc.gnu.org/ml/gcc-patches/2011-11/msg02454.html for MEMMODEL_*
 values.

Arrgh, I posted the wrong patch.  I know it needs work elsewhere in
libgomp to comply with the OMP spec, which says of flush:

14 A flush region without a list is implied at the following locations:
15 • During a barrier region.
16 • At entry to and exit from parallel, critical, and ordered regions.
[snip]

Yet GOMP_critical_start simply calls gomp_mutex_lock, relying on the
lock acquisition to do the flush.  This won't necessarily happen with
MEMMODEL_ACQUIRE.

This bit in gomp_mutex_lock
 +  __atomic_compare_exchange_4 (mutex, &oldval, 1, false,
 +	      MEMMODEL_ACQUIRE, MEMMODEL_RELAXED);
should be
  __atomic_compare_exchange_4 (mutex, &oldval, 1, false,
			       MEMMODEL_ACQ_REL, MEMMODEL_ACQ_REL);

and once you do that it's hard to justify the patch for stage3 as
fixing a bug (unnecessary sync instructions emitted by __sync
builtins).

-- 
Alan Modra
Australia Development Lab, IBM


Re: [build] Cleanup rs6000/t-ppccomm configurations (PR other/51022)

2011-11-27 Thread Alan Modra
On Sun, Nov 27, 2011 at 08:36:38PM -0500, David Edelsohn wrote:
 I was hoping that other targets affected by the patch would test it as well.

powerpc-linux and powerpc64-linux look good.

-- 
Alan Modra
Australia Development Lab, IBM


Re: Use atomics in libgomp mutex

2011-11-28 Thread Alan Modra
On Mon, Nov 28, 2011 at 09:15:02AM +0100, Jakub Jelinek wrote:
 On Mon, Nov 28, 2011 at 10:37:09AM +1030, Alan Modra wrote:
  14 A flush region without a list is implied at the following locations:
  15 • During a barrier region.
  16 • At entry to and exit from parallel, critical, and ordered regions.
  [snip]

 If it is just GOMP_critical_start, couldn't it use an extra barrier
 before resp. after it calls mutex_lock?

I believe gomp_ordered_sync needs a barrier, and I'm unsure about
GOMP_parallel_start but I think one is needed there too.  Yes, putting
the barriers where they are actually needed is the best solution, but
I'm far from an OpenMP expert so am unsure whether these are the only
places requiring a barrier.

-- 
Alan Modra
Australia Development Lab, IBM


Re: Fix PR51298, libgomp barrier failure

2011-11-28 Thread Alan Modra
On Mon, Nov 28, 2011 at 09:13:27AM +0100, Jakub Jelinek wrote:
 On Mon, Nov 28, 2011 at 10:12:09AM +1030, Alan Modra wrote:
  --- libgomp/config/linux/bar.c  (revision 181718)
  +++ libgomp/config/linux/bar.c  (working copy)
  @@ -36,18 +36,15 @@ gomp_barrier_wait_end (gomp_barrier_t *b
 if (__builtin_expect ((state & 1) != 0, 0))
   {
 /* Next time we'll be awaiting TOTAL threads again.  */
  -  bar->awaited = bar->total;
  -  atomic_write_barrier ();
  -  bar->generation += 4;
  +  __atomic_store_4 (&bar->awaited, bar->total, MEMMODEL_RELEASE);
  +  __atomic_add_fetch (&bar->generation, 4, MEMMODEL_ACQ_REL);
 futex_wake ((int *) &bar->generation, INT_MAX);
   }
 
The above is unfortunate, bar->generation should be only modified from a
 single thread at this point, but the __atomic_add_fetch will enforce there
 an atomic instruction on it rather than just some kind of barrier.

I will try without the atomic add and see what happens.

-- 
Alan Modra
Australia Development Lab, IBM


Re: Fix PR51298, libgomp barrier failure

2011-11-28 Thread Alan Modra
On Mon, Nov 28, 2011 at 08:09:01PM +1030, Alan Modra wrote:
 On Mon, Nov 28, 2011 at 09:13:27AM +0100, Jakub Jelinek wrote:
  On Mon, Nov 28, 2011 at 10:12:09AM +1030, Alan Modra wrote:
   --- libgomp/config/linux/bar.c(revision 181718)
   +++ libgomp/config/linux/bar.c(working copy)
   @@ -36,18 +36,15 @@ gomp_barrier_wait_end (gomp_barrier_t *b
  if (__builtin_expect ((state & 1) != 0, 0))
    {
  /* Next time we'll be awaiting TOTAL threads again.  */
   -  bar->awaited = bar->total;
   -  atomic_write_barrier ();
   -  bar->generation += 4;
   +  __atomic_store_4 (&bar->awaited, bar->total, MEMMODEL_RELEASE);
   +  __atomic_add_fetch (&bar->generation, 4, MEMMODEL_ACQ_REL);
  futex_wake ((int *) &bar->generation, INT_MAX);
}
  
  The above is unfortunate, bar->generation should be only modified from a
  single thread at this point, but the __atomic_add_fetch will enforce there
  an atomic instruction on it rather than just some kind of barrier.
 
 I will try without the atomic add and see what happens.

Seems to be fine.  I took out the extra write barriers too, so we now
just have two MEMMODEL_RELEASE atomic store barriers replacing the two
old atomic_write_barriers, and a number of new acquire barriers.

PR libgomp/51298
* config/linux/bar.h: Use atomic rather than sync builtins.
* config/linux/bar.c: Likewise.  Add missing acquire
synchronisation on generation field.
* task.c (gomp_barrier_handle_tasks): Regain lock so as to not
double unlock.

Index: libgomp/task.c
===
--- libgomp/task.c  (revision 181718)
+++ libgomp/task.c  (working copy)
@@ -273,6 +273,7 @@ gomp_barrier_handle_tasks (gomp_barrier_
	  gomp_team_barrier_done (team->barrier, state);
	  gomp_mutex_unlock (team->task_lock);
	  gomp_team_barrier_wake (team->barrier, 0);
+	  gomp_mutex_lock (team->task_lock);
}
}
 }
Index: libgomp/config/linux/bar.h
===
--- libgomp/config/linux/bar.h  (revision 181770)
+++ libgomp/config/linux/bar.h  (working copy)
@@ -50,7 +50,7 @@ static inline void gomp_barrier_init (go
 
 static inline void gomp_barrier_reinit (gomp_barrier_t *bar, unsigned count)
 {
-  __sync_fetch_and_add (&bar->awaited, count - bar->total);
+  __atomic_add_fetch (&bar->awaited, count - bar->total, MEMMODEL_ACQ_REL);
   bar->total = count;
 }
 
@@ -69,10 +69,8 @@ extern void gomp_team_barrier_wake (gomp
 static inline gomp_barrier_state_t
 gomp_barrier_wait_start (gomp_barrier_t *bar)
 {
-  unsigned int ret = bar->generation & ~3;
-  /* Do we need any barrier here or is __sync_add_and_fetch acting
-     as the needed LoadLoad barrier already?  */
-  ret += __sync_add_and_fetch (&bar->awaited, -1) == 0;
+  unsigned int ret = __atomic_load_4 (&bar->generation, MEMMODEL_ACQUIRE) & ~3;
+  ret += __atomic_add_fetch (&bar->awaited, -1, MEMMODEL_ACQ_REL) == 0;
   return ret;
 }
 
Index: libgomp/config/linux/bar.c
===
--- libgomp/config/linux/bar.c  (revision 181770)
+++ libgomp/config/linux/bar.c  (working copy)
@@ -37,17 +37,15 @@ gomp_barrier_wait_end (gomp_barrier_t *b
 {
   /* Next time we'll be awaiting TOTAL threads again.  */
   bar->awaited = bar->total;
-  atomic_write_barrier ();
-  bar->generation += 4;
+  __atomic_store_4 (&bar->generation, bar->generation + 4,
+		    MEMMODEL_RELEASE);
   futex_wake ((int *) &bar->generation, INT_MAX);
 }
   else
 {
-  unsigned int generation = state;
-
   do
-	do_wait ((int *) &bar->generation, generation);
-  while (bar->generation == generation);
+	do_wait ((int *) &bar->generation, state);
+  while (__atomic_load_4 (&bar->generation, MEMMODEL_ACQUIRE) == state);
 }
 }
 
@@ -81,15 +79,15 @@ gomp_team_barrier_wake (gomp_barrier_t *
 void
 gomp_team_barrier_wait_end (gomp_barrier_t *bar, gomp_barrier_state_t state)
 {
-  unsigned int generation;
+  unsigned int generation, gen;
 
   if (__builtin_expect ((state & 1) != 0, 0))
 {
   /* Next time we'll be awaiting TOTAL threads again.  */
   struct gomp_thread *thr = gomp_thread ();
   struct gomp_team *team = thr->ts.team;
+
   bar->awaited = bar->total;
-  atomic_write_barrier ();
   if (__builtin_expect (team->task_count, 0))
{
  gomp_barrier_handle_tasks (state);
@@ -97,7 +95,7 @@ gomp_team_barrier_wait_end (gomp_barrier
}
   else
{
-	  bar->generation = state + 3;
+	  __atomic_store_4 (&bar->generation, state + 3, MEMMODEL_RELEASE);
	  futex_wake ((int *) &bar->generation, INT_MAX);
  return;
}
@@ -107,12 +105,16 @@ gomp_team_barrier_wait_end (gomp_barrier
   do
 {
   do_wait ((int *) &bar->generation, generation

Re: Fix libgomp semaphores

2011-11-28 Thread Alan Modra
On Mon, Nov 28, 2011 at 05:23:37PM +0100, Jakub Jelinek wrote:
 On Mon, Nov 28, 2011 at 09:23:43AM +1030, Alan Modra wrote:
  +  int count = *sem;
  +
   +  while ((count & 0x7fffffff) != 0)
   +{
   +  int oldval = count;
   +  __atomic_compare_exchange_4 (sem, &oldval, count - 1,
   +  false, MEMMODEL_ACQUIRE, MEMMODEL_RELAXED);
  +  if (__builtin_expect (oldval == count, 1))
  +   return;
 
 Why aren't you instead testing the return value of
 __atomic_compare_exchange_4 (here and in other cases)?

If you use the return value on powerpc, you find that requires two
load immediate instructions (loading zero and one), and a compare
against zero.  That makes three fewer instructions as written, because
the oldval == count comparison has already been done inside the atomic
sequence.  I'd expect fewer on most other architectures unless they
happen to have a compare and exchange instruction that sets condition
bits (ie. Intel).  Even on Intel the way I've written the code
shouldn't take more instructions with a properly written cmpxchg rtl
description.  Does it?

Hmm, I suppose you could argue that powerpc and others ought to not
generate those three extra instructions when using the return value.
I'll see about fixing powerpc.

-- 
Alan Modra
Australia Development Lab, IBM


Re: Fix libgomp semaphores

2011-11-28 Thread Alan Modra
On Mon, Nov 28, 2011 at 04:00:10PM -0800, Richard Henderson wrote:
 On 11/27/2011 02:53 PM, Alan Modra wrote:
  +enum memmodel
  +{
  +  MEMMODEL_RELAXED = 0,
  +  MEMMODEL_CONSUME = 1,
  +  MEMMODEL_ACQUIRE = 2,
  +  MEMMODEL_RELEASE = 3,
  +  MEMMODEL_ACQ_REL = 4,
  +  MEMMODEL_SEQ_CST = 5,
  +  MEMMODEL_LAST = 6
  +};
 
 This should probably go to libgomp.h.

I wondered about that.

   /* This is a Linux specific implementation of a semaphore synchronization
  mechanism for libgomp.  This type is private to the library.  This 
  +   counting semaphore implementation uses atomic instructions and the
  +   futex syscall, and a single 32-bit int to store semaphore state.
  +   The low 31 bits are the count, the top bit is a flag set when some
  +   threads may be waiting.  */
 
 I think we'll get better code generation on a number of targets if we make 
 the low bit the waiting bit, and the higher bits the count.  Thus we do 
 (count & 1) and (count + 2) instead of larger constants needing to be 
 generated.  Not changed below...

Funny, that's the way I wrote this code at first, then went for the
wait bit as the sign bit because you can test  0 in one place where
you want not waiting and count non-zero.

  +static inline void
  +gomp_sem_wait (gomp_sem_t *sem)
   {
  +  int count = *sem;
  +
  +  while ((count & 0x7fffffff) != 0)
  +{
  +  int oldval = count;
  +  __atomic_compare_exchange_4 (sem, &oldval, count - 1,
  +  false, MEMMODEL_ACQUIRE, MEMMODEL_RELAXED);
  +  if (__builtin_expect (oldval == count, 1))
  +   return;
  +  count = oldval;
  +}
 
 I'd really prefer not to hard-code any sizes at all.  Let's change the 
 explicit _4 to _n everywhere.

OK.

 The loop above doesn't actually make sense to me.  If the compare-and-swap 
 succeeds, then oldval == count - 1.  Which doesn't seem to be what you're 
 testing at all.

Huh?  If it succeeds, oldval == count and we return.

 Perhaps this entire function is better written as
 
 {
   int count = *sem;
   do
 {
    if (__builtin_expect (count & 0x80000000u, 0))
 {
   gomp_sem_wait_slow (sem, count);
   return;
 }
 }
    while (!__atomic_compare_exchange_n (sem, &count, count - 1, true,
					 MEMMODEL_ACQUIRE, MEMMODEL_RELAXED));
 }

No, we don't need the slow path if we have *sem & 0x7fffffff non-zero.

 
  +gomp_sem_post (gomp_sem_t *sem)
   {
  +  int count = *sem;
  +
  +  while (1)
  +{
  +  int oldval = count;
  +      __atomic_compare_exchange_4 (sem, &oldval, (count + 1) & 0x7fffffff,
  +				     false, MEMMODEL_RELEASE, MEMMODEL_RELAXED);
  +  if (__builtin_expect (oldval == count, 1))
  +   break;
 
 Again, this equality doesn't make sense.  Further, you aren't testing for the
 high bit set inside the loop, nor are you preserving the high bit.

See above about the equality.  We don't want to preserve the wait bit
here.  Otherwise we'd never go back to the uncontended state, which
answers

 ... All that said, I don't see how we can ever clear the wait bit
 once its set?

this question.

 Are we better off splitting the 32-bit value into two 16-bit fields for 
 value+waiters?  Or similarly splitting a 64-bit value?  That way we can at 
 least update both fields atomically, and we ought never lose a waiter.

Other splits of the field are certainly possible, but of course
restrict the max number, and you'd need the fancy futex op_wait.
Some targets don't have 64-bit atomics, so we can't really go that
way.  Yes, if I did keep track of number of waiters we'd save one
unnecessary futex_wake call, but I'm quite confident no waiters will
be lost just using a flag.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [rs6000] Streamline compare-and-swap success return value computation

2011-11-28 Thread Alan Modra
On Mon, Nov 28, 2011 at 04:33:58PM -0800, Richard Henderson wrote:
 On 11/28/2011 04:26 PM, Richard Henderson wrote:
  On 11/28/2011 03:05 PM, Richard Henderson wrote:
  On 11/28/2011 02:16 PM, Alan Modra wrote:
  Hmm, I suppose you could argue that powerpc and others ought to not
  generate those three extra instructions when using the return value.
  I'll see about fixing powerpc.
 
  However, we can do better by considering the value to be stored in CR0...
  
  Try this and see if it generates the sort of code you want.  Untested.
 
 ... actually, this one.  There's no reason to differentiate between strong
 and weak compare-and-swap when computing boolval.

Looks very similar to what I wrote, except

 +  else if (reg_overlap_mentioned_p (retval, oldval))
 +oldval = copy_to_reg (oldval);

I missed this bit

 -  x = gen_rtx_NE (VOIDmode, x, oldval);
 -  x = rs6000_generate_compare (x, mode);
 +  cond = gen_reg_rtx (CCmode);
 +  x = gen_rtx_COMPARE (CCmode, x, oldval);
 +  emit_insn (gen_rtx_SET (VOIDmode, cond, x));
 +
 +  x = gen_rtx_NE (VOIDmode, cond, const0_rtx);

and instead pulled cond out of the return from rs6000_generate_compare.

Results look good.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [build] Cleanup rs6000/t-ppccomm configurations (PR other/51022)

2011-11-28 Thread Alan Modra
On Mon, Nov 28, 2011 at 05:15:08PM +0100, Rainer Orth wrote:
 Alan Modra amo...@gmail.com writes:
 
  On Sun, Nov 27, 2011 at 08:36:38PM -0500, David Edelsohn wrote:
  I was hoping that other targets affected by the patch would test it as 
  well.
 
  powerpc-linux and powerpc64-linux look good.
 
 Thanks for the confirmation.  However, Andreas Tobler pointed out in
 private mail that I'd missed to reference rs6000/t-linux64 during the
 libgcc migration.  Strange that this seems to have no ill effect.

Hmm, yeah, we should have that back in.  Lack of t-linux64 means we
get all the soft-fp functions that we never use, bloating libgcc.
Also, -mno-minimal-toc gives better code.  Even when -mminimal-toc
wouldn't be in force anyway the option selects -mcmodel=small.

-- 
Alan Modra
Australia Development Lab, IBM


Re: Fix PR51298, libgomp barrier failure

2011-11-28 Thread Alan Modra
On Mon, Nov 28, 2011 at 05:42:15PM -0800, Richard Henderson wrote:
 On 11/28/2011 06:02 AM, Alan Modra wrote:
  -  unsigned int ret = bar->generation & ~3;
  -  /* Do we need any barrier here or is __sync_add_and_fetch acting
  - as the needed LoadLoad barrier already?  */
  -  ret += __sync_add_and_fetch (&bar->awaited, -1) == 0;
  +  unsigned int ret = __atomic_load_4 (&bar->generation, MEMMODEL_ACQUIRE)
   & ~3;
  +  ret += __atomic_add_fetch (&bar->awaited, -1, MEMMODEL_ACQ_REL) == 0;
 
 Given that the read from bar-generation is ACQ, we don't need a duplicate 
 barrier from the REL on the atomic add.  I believe both can be 
 MEMMODEL_ACQUIRE
 both in order to force the ordering of these two memops, as well as force 
 these
 to happen before anything subsequent.

I tried with MEMMODEL_ACQUIRE and ran force-parallel-6.exe, the test
that seems most sensitive to barrier problems, many times, and it hangs
occasionally in futex_wait called via gomp_team_barrier_wait_end.

I believe that threads can't be allowed to exit from
gomp_{,team_}barrier_wait without hitting a release barrier, and
perhaps from gomp_barrier_wait_last too.  gomp_barrier_wait_start is a
convenient point to insert the barrier, and a minimal change from the
old code using __sync_add_and_fetch.  I can add a comment. ;-)

-- 
Alan Modra
Australia Development Lab, IBM


Re: Use atomics in libgomp mutex

2011-11-29 Thread Alan Modra
On Mon, Nov 28, 2011 at 05:33:17PM -0800, Richard Henderson wrote:
 On 11/27/2011 02:57 PM, Alan Modra wrote:
  This is the mutex part.  Depends on
  http://gcc.gnu.org/ml/gcc-patches/2011-11/msg02454.html for MEMMODEL_*
  values.
  
  * config/linux/mutex.h: Use atomic rather than sync builtins.
  * config/linux/mutex.c: Likewise.  Comment.  Use -1 for waiting state.
  * config/linux/omp-lock.h: Comment fix.
  * config/linux/arm/mutex.h: Delete.
  * config/linux/powerpc/mutex.h: Delete.
  * config/linux/ia64/mutex.h: Delete.
  * config/linux/mips/mutex.h: Delete.
 
 Looks good modulo _4/_n and using the success return of 
 __atomic_compare_exchange.

Did you see my comment about the OpenMP spec requirement that critical,
ordered and parallel regions have a flush on entry and on exit?  So
should I commit a version with

static inline void
gomp_mutex_lock (gomp_mutex_t *mutex)
{
  int oldval = 0;
  /* FIXME: This should just be MEMMODEL_ACQUIRE, MEMMODEL_RELAXED
 but GOMP_critical_start and other functions rely on the lock
 acquisition doing a flush.  See OpenMP API version 3.1 section
 2.8.6 flush Construct.  */
  if (__builtin_expect (!__atomic_compare_exchange_n (mutex, &oldval, 1, false,
  MEMMODEL_ACQ_REL,
  MEMMODEL_ACQUIRE),
0))
gomp_mutex_lock_slow (mutex, oldval);
}

or as posted in
http://gcc.gnu.org/ml/gcc-patches/2011-11/msg02455.html?  Perhaps with
the following as well

* ordered.c (gomp_ordered_sync): Add MEMMODEL_RELEASE fence.
* critical.c (GOMP_critical_start): Likewise.

I think we're OK on parallel regions by virtue of gomp_barrier_wait,
but I may not know what I'm talking about.  This stuff fries my
brains.

Index: ordered.c
===
--- ordered.c   (revision 181770)
+++ ordered.c   (working copy)
@@ -199,6 +199,9 @@ gomp_ordered_sync (void)
   if (team == NULL || team->nthreads == 1)
 return;
 
+  /* There is an implicit flush on entry to an ordered region. */
+  __atomic_thread_fence (MEMMODEL_RELEASE);
+
   /* ??? I believe it to be safe to access this data without taking the
  ws->lock.  The only presumed race condition is with the previous
  thread on the queue incrementing ordered_cur such that it points
Index: critical.c
===
--- critical.c  (revision 181770)
+++ critical.c  (working copy)
@@ -33,6 +33,8 @@ static gomp_mutex_t default_lock;
 void
 GOMP_critical_start (void)
 {
+  /* There is an implicit flush on entry to a critical region. */
+  __atomic_thread_fence (MEMMODEL_RELEASE);
   gomp_mutex_lock (default_lock);
 }
 

-- 
Alan Modra
Australia Development Lab, IBM


Re: Fix libgomp semaphores

2011-11-29 Thread Alan Modra
)
 {
-  int val = __sync_val_compare_and_swap (sem, 0, -1);
-  if (val > 0)
+  unsigned int wake = count & ~SEM_WAIT;
+  int newval = SEM_WAIT;
+
+  if (wake != 0)
+   newval |= wake - SEM_INC;
+  if (likely_compare_exchange (sem, &count, newval,
+  false, MEMMODEL_ACQUIRE, MEMMODEL_RELAXED))
{
- if (__sync_bool_compare_and_swap (sem, val, val - 1))
-   return;
+ if (wake != 0)
+   {
+ /* If we can wake more threads, do so now.  */
+   if (wake > SEM_INC)
+   gomp_sem_post_slow (sem);
+ break;
+   }
+ do_wait (sem, SEM_WAIT);
+ count = *sem;
}
-  do_wait (sem, -1);
 }
 }
 
 void
 gomp_sem_post_slow (gomp_sem_t *sem)
 {
-  int old, tmp = *sem, wake;
-
-  do
-{
-  old = tmp;
-  wake = old > 0 ? old + 1 : 1;
-  tmp = __sync_val_compare_and_swap (sem, old, wake);
-}
-  while (old != tmp);
-
-  futex_wake (sem, wake);
+  futex_wake (sem, 1);
 }

-- 
Alan Modra
Australia Development Lab, IBM


Re: Fix libgomp semaphores

2011-11-29 Thread Alan Modra
On Tue, Nov 29, 2011 at 08:09:31AM -0800, Richard Henderson wrote:
 Tight loops like this should use the weak compare-exchange so that we don't 
 get two nested loops without need.

Done.

 I do think we need more commentary here so that next month I can still follow 
 it.

Added this in gomp_sem_post

  /* Clear SEM_WAIT here so that if there are no more waiting threads
 we transition back to the uncontended state that does not make
 futex syscalls.  If there are waiting threads then when one is
 awoken it will set SEM_WAIT again, so other waiting threads are
 woken on a future gomp_sem_post.  Furthermore, the awoken thread
 will wake other threads in case gomp_sem_post was called again
 before it had time to set SEM_WAIT.  */

and committed revision 181831 without likely_compare_exchange as
discussed.

-- 
Alan Modra
Australia Development Lab, IBM


Re: Use atomics in libgomp mutex

2011-11-29 Thread Alan Modra
On Tue, Nov 29, 2011 at 08:37:07AM -0800, Richard Henderson wrote:
 How about MEMMODEL_ACQ_REL

Right.  Committed with that change revision 181832.

-- 
Alan Modra
Australia Development Lab, IBM


Re: Fix PR51298, libgomp barrier failure

2011-11-29 Thread Alan Modra
On Tue, Nov 29, 2011 at 04:15:36PM +1030, Alan Modra wrote:
 On Mon, Nov 28, 2011 at 05:42:15PM -0800, Richard Henderson wrote:
  On 11/28/2011 06:02 AM, Alan Modra wrote:
   -  unsigned int ret = bar->generation & ~3;
   -  /* Do we need any barrier here or is __sync_add_and_fetch acting
   - as the needed LoadLoad barrier already?  */
   -  ret += __sync_add_and_fetch (&bar->awaited, -1) == 0;
   +  unsigned int ret = __atomic_load_4 (&bar->generation, MEMMODEL_ACQUIRE) & ~3;
   +  ret += __atomic_add_fetch (&bar->awaited, -1, MEMMODEL_ACQ_REL) == 0;
  
  Given that the read from bar->generation is ACQ, we don't need a duplicate 
  barrier from the REL on the atomic add.  I believe both can be 
  MEMMODEL_ACQUIRE
  both in order to force the ordering of these two memops, as well as force 
  these
  to happen before anything subsequent.
 
 I tried with MEMMODEL_ACQUIRE and ran force-parallel-6.exe, the test
 that seems most sensitive to barrier problems, many times, and it hangs
 occasionally in futex_wait called via gomp_team_barrier_wait_end.
 
 I believe that threads can't be allowed to exit from
 gomp_{,team_}barrier_wait without hitting a release barrier, and
 perhaps from gomp_barrier_wait_last too.  gomp_barrier_wait_start is a
 convenient point to insert the barrier, and a minimal change from the
 old code using __sync_add_and_fetch.  I can add a comment. ;-)

Committed rev 181833.

-- 
Alan Modra
Australia Development Lab, IBM


Use atomics in remaining libgomp/config/linux sources

2011-12-01 Thread Alan Modra
 @@ cpu_relax (void)
   __asm volatile ("" : : : "memory");
 #endif
 }
-
-static inline void
-atomic_write_barrier (void)
-{
-#if defined __arch64__ || defined __sparc_v9__
-  __asm volatile ("membar #StoreStore" : : : "memory");
-#else
-  __sync_synchronize ();
-#endif
-}
Index: libgomp/config/linux/x86/futex.h
===
--- libgomp/config/linux/x86/futex.h(revision 181830)
+++ libgomp/config/linux/x86/futex.h(working copy)
@@ -145,9 +145,3 @@ cpu_relax (void)
 {
   __asm volatile ("rep; nop" : : : "memory");
 }
-
-static inline void
-atomic_write_barrier (void)
-{
-  __sync_synchronize ();
-}
Index: libgomp/config/linux/futex.h
===
--- libgomp/config/linux/futex.h(revision 181830)
+++ libgomp/config/linux/futex.h(working copy)
@@ -67,9 +67,3 @@ cpu_relax (void)
 {
   __asm volatile ("" : : : "memory");
 }
-
-static inline void
-atomic_write_barrier (void)
-{
-  __sync_synchronize ();
-}

-- 
Alan Modra
Australia Development Lab, IBM


PR libgomp/51376 fix

2011-12-01 Thread Alan Modra
The simple one-line fix in GOMP_taskwait took many hours to find.
Shared memory problems are a pain to debug, especially when adding
code to dump some state turns a testcase that fails every hundred or
so runs into one that takes thousands of times longer to fail.

What happens here is that GOMP_taskwait is called in the parent thread
some time after gomp_barrier_handle_tasks has run in the child to the
point of clearing the parent's children field.  However, since there
is no acquire barrier in the parent and the child may or may not have
reached the release barrier in the mutex unlock, the memory stores in
the child are not guaranteed to be seen in order in the parent thread.
Thus the parent can see task->children clear but not yet see stores
done as part of the real work of the child, i.e. to a and n in the
testcase.

The GOMP_task change fixes a similar potential problem.  Bootstrapped
and regression tested powerpc-linux.  OK to apply?

PR libgomp/51376
* task.c (GOMP_taskwait): Don't access task->children outside of
task_lock mutex region.
(GOMP_task): Likewise.

Index: libgomp/task.c
===
--- libgomp/task.c  (revision 181833)
+++ libgomp/task.c  (working copy)
@@ -116,12 +116,12 @@ GOMP_task (void (*fn) (void *), void *da
}
   else
fn (data);
-  if (task.children)
-   {
- gomp_mutex_lock (team->task_lock);
- gomp_clear_parent (task.children);
- gomp_mutex_unlock (team->task_lock);
-   }
+  if (team != NULL)
+   gomp_mutex_lock (team->task_lock);
+  if (task.children != NULL)
+   gomp_clear_parent (task.children);
+  if (team != NULL)
+   gomp_mutex_unlock (team->task_lock);
   gomp_end_task ();
 }
   else
@@ -290,8 +290,9 @@ GOMP_taskwait (void)
   struct gomp_task *child_task = NULL;
   struct gomp_task *to_free = NULL;
 
-  if (task == NULL || task->children == NULL)
+  if (task == NULL)
 return;
+
   gomp_mutex_lock (team->task_lock);
   while (1)
 {

-- 
Alan Modra
Australia Development Lab, IBM


Re: PR libgomp/51376 fix

2011-12-02 Thread Alan Modra
On Thu, Dec 01, 2011 at 12:36:08PM +0100, Jakub Jelinek wrote:
 On Thu, Dec 01, 2011 at 09:58:08PM +1030, Alan Modra wrote:
  The GOMP_task change fixes a similar potential problem.  Bootstrapped
  and regression tested powerpc-linux.  OK to apply?
  
  PR libgomp/51376
 * task.c (GOMP_taskwait): Don't access task->children outside of
  task_lock mutex region.
  (GOMP_task): Likewise.
 
 Can't this be solved just by adding a barrier?  The access to the var
 outside of the lock has been quite intentional, to avoid locking in the
 common case where there are no children.

No, I tried that and the task-6.C testcase still failed although not
quite as readily.  I was using

if (task == NULL
|| __atomic_load_n (&task->children, MEMMODEL_ACQUIRE) == 0)

You need a release in the child as well as the acquire to ensure
proper synchronisation, and there's a window for failure between the
child clearing task->children and performing a release as part of the
mutex unlock.

Oops, on looking at this email I see I attached an old patch..  This
one avoids a segfault on trying to lock team->task_lock when there
is no team.  This one really has been bootstrapped and regression
tested successfully.

PR libgomp/51376
* task.c (GOMP_taskwait): Don't access task->children outside of
task_lock mutex region.
(GOMP_task): Likewise.

Index: libgomp/task.c
===
--- libgomp/task.c  (revision 181902)
+++ libgomp/task.c  (working copy)
@@ -116,10 +116,11 @@ GOMP_task (void (*fn) (void *), void *da
}
   else
fn (data);
-  if (task.children)
+  if (team != NULL)
{
  gomp_mutex_lock (team->task_lock);
- gomp_clear_parent (task.children);
+ if (task.children != NULL)
+   gomp_clear_parent (task.children);
  gomp_mutex_unlock (team->task_lock);
}
   gomp_end_task ();
@@ -290,8 +291,9 @@ GOMP_taskwait (void)
   struct gomp_task *child_task = NULL;
   struct gomp_task *to_free = NULL;
 
-  if (task == NULL || task-children == NULL)
+  if (task == NULL || team == NULL)
 return;
+
   gomp_mutex_lock (team->task_lock);
   while (1)
 {

-- 
Alan Modra
Australia Development Lab, IBM


Re: PR libgomp/51376 fix

2011-12-03 Thread Alan Modra
On Fri, Dec 02, 2011 at 08:10:11PM +1030, Alan Modra wrote:
 On Thu, Dec 01, 2011 at 12:36:08PM +0100, Jakub Jelinek wrote:
  On Thu, Dec 01, 2011 at 09:58:08PM +1030, Alan Modra wrote:
   The GOMP_task change fixes a similar potential problem.  Bootstrapped
   and regression tested powerpc-linux.  OK to apply?
   
 PR libgomp/51376
 * task.c (GOMP_taskwait): Don't access task->children outside of
 task_lock mutex region.
 (GOMP_task): Likewise.
  
  Can't this be solved just by adding a barrier?  The access to the var
  outside of the lock has been quite intentional, to avoid locking in the
  common case where there are no children.
 
 No, I tried that and the task-6.C testcase still failed although not
 quite as readily.  I was using
 
 if (task == NULL
 || __atomic_load_n (&task->children, MEMMODEL_ACQUIRE) == 0
 
 You need a release in the child as well as the acquire to ensure
 proper synchronisation, and there's a window for failure between the
 child clearing task->children and performing a release as part of the
 mutex unlock.

Perhaps I misunderstood your question.  I suppose you could solve the
problem by adding a barrier, if by that you meant the acquire barrier
in GOMP_taskwait as I tried before I fully understood the problem,
plus the corresponding release barrier in gomp_barrier_handle_tasks.
ie. replace
  parent->children = NULL;
with
  __atomic_store_n (&parent->children, NULL,
MEMMODEL_RELEASE);

That should work, but of course will slow down what I imagine is a
common multi-thread case where child threads don't complete before the
parent gets around to a taskwait.  Do you really want to do that?

-- 
Alan Modra
Australia Development Lab, IBM


Re: Allow more PowerPC sibling calls

2011-04-21 Thread Alan Modra
On Mon, Apr 18, 2011 at 10:29:17AM -0700, Nathan Froyd wrote:
 Could I request that you use FOREACH_FUNCTION_ARGS in these two cases?
 The conversion is trivial, and avoiding more exposed TYPE_ARG_TYPES
 calls is a good thing.

Committed revision 172855.

-- 
Alan Modra
Australia Development Lab, IBM


Re: Allow more PowerPC sibling calls

2011-04-21 Thread Alan Modra
On Fri, Apr 22, 2011 at 01:29:14PM +0930, Alan Modra wrote:
 On Mon, Apr 18, 2011 at 10:29:17AM -0700, Nathan Froyd wrote:
  Could I request that you use FOREACH_FUNCTION_ARGS in these two cases?
  The conversion is trivial, and avoiding more exposed TYPE_ARG_TYPES
  calls is a good thing.
 
 Committed revision 172855.

Oops, I missed the following hunk, which I talked about here

 I also fixed a minor problem with CALL_LIBCALL in the call
 cookie, which could cause various call insns to match an "n"
 constraint rather than a "0" constraint and so give the wrong insn
 length.

and mentioned in the changelog:

   * config/rs6000/rs6000.c (rs6000_function_arg): Remove CALL_LIBCALL
   when returning call_cookie.

but failed to include in my post.  Committed as obvious, revision 172856.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 172855)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -9007,7 +9007,7 @@ rs6000_function_arg (CUMULATIVE_ARGS *cu
   : CALL_V4_CLEAR_FP_ARGS));
}
 
-  return GEN_INT (cum->call_cookie);
+  return GEN_INT (cum->call_cookie & ~CALL_LIBCALL);
 }
 
   if (TARGET_MACHO && rs6000_darwin64_struct_check_p (mode, type))


-- 
Alan Modra
Australia Development Lab, IBM


PowerPC64 non-delegitimized UNSPEC_TOCREL

2011-04-28 Thread Alan Modra
This patch fixes the following warnings seen during a powerpc64
bootstrap.

libgomp/config/linux/sem.c: In function ‘gomp_sem_wait_slow’:
libgomp/config/linux/sem.c:33:1: note: non-delegitimized UNSPEC UNSPEC_TOCREL 
(44) found in variable location
libgomp/config/linux/bar.c: In function ‘gomp_barrier_wait_end’:
libgomp/config/linux/bar.c:34:1: note: non-delegitimized UNSPEC UNSPEC_TOCREL 
(44) found in variable location

What's happening is that we are getting an address that looked like
  lo_sum ((reg 31)
  (const (plus (unspec [symbol_ref (some_var) tocrel]) 4)))

but only expect to handle something like
  lo_sum ((reg 31)
  (const (unspec [symbol_ref (some_var) tocrel])))

I also tidied the macho code which used a mix of orig_x and x,
making the code fragile wrt. some future change that assigns x
earlier in the function.  (If orig_x is a lo_sum, then x == orig_x
currently.)

Bootstrapped and regression tested powerpc64-linux.  OK to apply?

* config/rs6000/rs6000.c (rs6000_delegitimize_address): Handle
unspec plus offset.  Tidy macho code.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 173064)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -6380,7 +6380,16 @@ rs6000_delegitimize_address (rtx orig_x)
   if (GET_CODE (x) == (TARGET_CMODEL != CMODEL_SMALL ? LO_SUM : PLUS)
       && GET_CODE (XEXP (x, 1)) == CONST)
 {
+  rtx offset = NULL_RTX;
+
   y = XEXP (XEXP (x, 1), 0);
+  if (GET_CODE (y) == PLUS
+      && GET_MODE (y) == Pmode
+      && CONST_INT_P (XEXP (y, 1)))
+   {
+ offset = XEXP (y, 1);
+ y = XEXP (y, 0);
+   }
   if (GET_CODE (y) == UNSPEC
       && XINT (y, 1) == UNSPEC_TOCREL
       && ((GET_CODE (XEXP (x, 0)) == REG
@@ -6396,6 +6405,8 @@ rs6000_delegitimize_address (rtx orig_x)
  XEXP (XEXP (XEXP (x, 0), 1), 0)
{
  y = XVECEXP (y, 0, 0);
+ if (offset != NULL_RTX)
+   y = gen_rtx_PLUS (Pmode, y, offset);
  if (!MEM_P (orig_x))
return y;
  else
@@ -6405,9 +6416,9 @@ rs6000_delegitimize_address (rtx orig_x)
 
   if (TARGET_MACHO
       && GET_CODE (orig_x) == LO_SUM
-      && GET_CODE (XEXP (x, 1)) == CONST)
+      && GET_CODE (XEXP (orig_x, 1)) == CONST)
 {
-  y = XEXP (XEXP (x, 1), 0);
+  y = XEXP (XEXP (orig_x, 1), 0);
   if (GET_CODE (y) == UNSPEC
       && XINT (y, 1) == UNSPEC_MACHOPIC_OFFSET)
return XVECEXP (y, 0, 0);

-- 
Alan Modra
Australia Development Lab, IBM


Re: PowerPC64 non-delegitimized UNSPEC_TOCREL

2011-04-28 Thread Alan Modra
On Thu, Apr 28, 2011 at 01:58:24PM -0400, David Edelsohn wrote:
 On Thu, Apr 28, 2011 at 11:22 AM, Alan Modra amo...@gmail.com wrote:
  * config/rs6000/rs6000.c (rs6000_delegitimize_address): Handle
unspec plus offset.  Tidy macho code.
 
 Looks good.

Committed mainline revision 173141.  I meant to ask about 4.6 as we
have the same problem there.  OK for 4.6 too?

-- 
Alan Modra
Australia Development Lab, IBM


Re: rs6000_handle_option global state avoidance, part 1

2011-05-05 Thread Alan Modra
On Thu, May 05, 2011 at 11:29:13AM -0400, David Edelsohn wrote:
 Alan, is mcmodel supposed to set m64?

No, that was an accident.

-- 
Alan Modra
Australia Development Lab, IBM


Fix PR48900, powerpc duplicate __tls_get_addr calls

2011-05-05 Thread Alan Modra
My fix for PR44266 using the libcall machinery to ensure we had a
proper stack frame allocated for __tls_get_addr calls sloppily used r3
as the arg to the dummy libcall.  This made the call seem to depend on
whatever was in r3 previously, at least until we get to the first
split pass and the real arg is exposed.  So DCE couldn't merge calls.
Even for a simple testcase like
extern __thread int i;
void foo (void) { i++; }
we get two __tls_get_addr calls if using global-dynamic tls model.

Easily fixed by giving the dummy libcall an arg of zero.  An
alternative giving slightly better -O0 code would be to say that the
libcall doesn't have any args.  I chose to leave the libcall with one
arg since this is closest to the real __tls_get_addr call, and the
whole point of faking up a libcall here is to have the generic code do
whatever is necessary when making function calls.  It's not totally
impossible to imagine some future ABI change that treats zero arg
calls differently from other calls.

Bootstrapped and regression tested powerpc64-linux.  OK to apply
mainline, 4.6 and 4.5?

PR target/48900
* config/rs6000/rs6000.c (rs6000_legitimize_tls_address): Use
const0_rtx as the arg to the dummy __tls_get_addr libcall.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 173464)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -6412,10 +6412,11 @@ rs6000_legitimize_tls_address (rtx addr,
 
   if (model == TLS_MODEL_GLOBAL_DYNAMIC)
{
- r3 = gen_rtx_REG (Pmode, 3);
  tga = rs6000_tls_get_addr ();
- emit_library_call_value (tga, dest, LCT_CONST, Pmode, 1, r3, Pmode);
+ emit_library_call_value (tga, dest, LCT_CONST, Pmode,
+  1, const0_rtx, Pmode);
 
+ r3 = gen_rtx_REG (Pmode, 3);
   if (DEFAULT_ABI == ABI_AIX && TARGET_64BIT)
insn = gen_tls_gd_aix64 (r3, got, addr, tga, const0_rtx);
   else if (DEFAULT_ABI == ABI_AIX && !TARGET_64BIT)
@@ -6432,11 +6433,12 @@ rs6000_legitimize_tls_address (rtx addr,
}
   else if (model == TLS_MODEL_LOCAL_DYNAMIC)
{
- r3 = gen_rtx_REG (Pmode, 3);
  tga = rs6000_tls_get_addr ();
  tmp1 = gen_reg_rtx (Pmode);
- emit_library_call_value (tga, tmp1, LCT_CONST, Pmode, 1, r3, Pmode);
+ emit_library_call_value (tga, tmp1, LCT_CONST, Pmode,
+  1, const0_rtx, Pmode);
 
+ r3 = gen_rtx_REG (Pmode, 3);
   if (DEFAULT_ABI == ABI_AIX && TARGET_64BIT)
insn = gen_tls_ld_aix64 (r3, got, tga, const0_rtx);
   else if (DEFAULT_ABI == ABI_AIX && !TARGET_64BIT)

-- 
Alan Modra
Australia Development Lab, IBM


-Wshadow warning

2011-08-24 Thread Alan Modra
Wouldn't -Wshadow be more useful if it obeyed -Wno-system-headers?
For code like

#include <stdlib.h>
int foo (int atof);
int foo (int atof) { return atof; }

we currently do not warn on the prototype, but do on the function
definition, leading to reports such as
http://sourceware.org/bugzilla/show_bug.cgi?id=13121

The following has been bootstrapped and regression tested
powerpc-linux.  OK to apply?

* c-decl.c (warn_if_shadowing): Don't warn if shadowed
identifier is from system header.

Index: gcc/c-decl.c
===
--- gcc/c-decl.c(revision 178035)
+++ gcc/c-decl.c(working copy)
@@ -2516,7 +2516,10 @@ warn_if_shadowing (tree new_decl)
 
   /* Is anything being shadowed?  Invisible decls do not count.  */
   for (b = I_SYMBOL_BINDING (DECL_NAME (new_decl)); b; b = b->shadowed)
-    if (b->decl && b->decl != new_decl && !b->invisible)
+    if (b->decl && b->decl != new_decl && !b->invisible
+	&& (b->decl == error_mark_node
+	    || diagnostic_report_warnings_p (global_dc,
+					     DECL_SOURCE_LOCATION (b->decl))))
   {
tree old_decl = b-decl;
 

-- 
Alan Modra
Australia Development Lab, IBM


rs6000 toc reference rtl

2011-09-05 Thread Alan Modra
 "gpc_reg_operand" "=r")
-(lo_sum:DI (match_operand:DI 1 "gpc_reg_operand" "b")
-	   (match_operand:DI 2 "" "")))]
-   "TARGET_ELF && TARGET_CMODEL != CMODEL_SMALL"
-   "{cal %0,%2@l(%1)|addi %0,%1,%2@l}")
 
 ;; Call and call_value insns
 (define_expand call

-- 
Alan Modra
Australia Development Lab, IBM


Re: rs6000 toc reference rtl

2011-09-07 Thread Alan Modra
On Tue, Sep 06, 2011 at 01:11:26AM +0930, Alan Modra wrote:
 Consequently, Mike's change to split rtl for
 indirect calls sometimes sees the scheduler moving the r2 load in the
 indirect call sequence before a toc reference.

Actually, this isn't correct.  Mike's change adding rs6000.c
rs6000_call_indirect_aix just made it more likely.  Even before this
post-reload scheduling could move the r2 load around, since rs6000.md
call_indirect_aix patterns were (and still are) split post-reload.

Here's an example I was shown today of such damage (qemu compiled
with gcc-4.6-redhat).

.LVL57151:
ld 0,0(31)  # load opd+0, function addr
addis 4,2,.LC4758@toc@ha
ld 11,16(31)
mr 7,3
std 2,40(1) # save r2
mr 5,25
addi 4,4,.LC4758@toc@l
mtctr 0 #
mr 6,26
ld 2,8(31)  # load opd+8, new toc ptr in r2
mr 3,28
.LBB255670:
.LBB255668:
.loc 8 98 0
addis 27,2,.LC4761@toc@ha   # oops, should be using old r2
.LVL57152:
addi 27,27,.LC4761@toc@l
.LBE255668:
.LBE255670:
.loc 3 9212 0
addis 25,2,.LC4762@toc@ha   # oops again
.loc 3 9198 0
bctrl   # make the call
ld 2,40(1)  # restore r2

r27 and r25 set up here for later use now contain bogus values.
The blame rests on my 2011-06-20 change.

-- 
Alan Modra
Australia Development Lab, IBM


PowerPC shrink-wrap support 0 of 3

2011-09-17 Thread Alan Modra
This patch series adds shrink-wrap support for PowerPC.  The patches
are on top of Bernd's "Initial shrink-wrapping patch":
http://gcc.gnu.org/ml/gcc-patches/2011-08/msg02557.html, but with the
tm.texi patch applied to tm.texi.in.  Bootstrapped and regression
tested powerpc64-linux all langs except ada, and spec CPU2006 tested.
The spec results were a little disappointing as I expected to see some
gains, but my baseline was a -O3 run and I suppose most of the
shrink-wrap opportunities were lost to inlining.

I deliberately omitted defining RETURN_ADDR_REGNUM for PowerPC as I
don't see why it is necessary to treat LR any differently to other
regs saved by the prologue.  I believe code in requires_stack_frame_p
  CLEAR_HARD_REG_SET (hardregs);
  note_stores (PATTERN (insn), record_hard_reg_sets, &hardregs);
  if (hard_reg_set_intersect_p (hardregs, prologue_used))
    return true;
will correctly pick up any set of LR.  If code in a function body sets
LR without saving and restoring around the use, then the prologue must
save LR, so LR will be in prologue_used.

-- 
Alan Modra
Australia Development Lab, IBM


PowerPC shrink-wrap support 1 of 3

2011-09-17 Thread Alan Modra
This obvious patch extends my 2011-08-03 fix to simple_return.
Necessary for the same reason as the original patch, namely that
rs6000.md has a peephole2 that matches and recreates a conditional
jump.  With the third patch in this series applied, rs6000 will start
to use simple_return in conditional jumps.  Even though I think this
patch meets the obvious requirement, I won't apply it without the
other two since the assert may trigger on mips without the second
patch in this series.

PR rtl-optimization/49941
* jump.c (mark_jump_label_1): Set JUMP_LABEL for simple_return jumps.

diff -urp -x.svn -x'*~' -x'*.orig' gcc-bernd/gcc/jump.c gcc-current/gcc/jump.c
--- gcc-bernd/gcc/jump.c2011-09-16 11:52:14.0 +0930
+++ gcc-current/gcc/jump.c  2011-09-16 09:59:39.0 +0930
@@ -1086,6 +1086,7 @@ mark_jump_label_1 (rtx x, rtx insn, bool
   return;
 
 case RETURN:
+case SIMPLE_RETURN:
   if (is_target)
{
  gcc_assert (JUMP_LABEL (insn) == NULL || JUMP_LABEL (insn) == x);
@@ -1408,7 +1409,7 @@ redirect_exp_1 (rtx *loc, rtx olabel, rt
   int i;
   const char *fmt;
 
-  if ((code == LABEL_REF  XEXP (x, 0) == olabel)
+  if ((code == LABEL_REF  XEXP (x, 0) == olabel)
   || x == olabel)
 {
   x = redirect_target (nlabel);

-- 
Alan Modra
Australia Development Lab, IBM


PowerPC shrink-wrap support 3 of 3

2011-09-17 Thread Alan Modra
)])]
   
   
 {
@@ -13002,7 +13019,7 @@
  (match_operand 2  )))
(use (match_operand:SI 3 immediate_operand O,n,O,n))
(use (reg:SI LR_REGNO))
-   (return)]
+   (simple_return)]
   "(DEFAULT_ABI == ABI_DARWIN
     || DEFAULT_ABI == ABI_V4)
    && (INTVAL (operands[3]) & CALL_LONG) == 0"
@@ -15328,9 +15345,9 @@
  [(match_operand 1
  cc_reg_operand y)
   (const_int 0)])
- (return)
+ (any_return)
  (pc)))]
-  "direct_return ()"
+  "<return_pred>"
   "*"
 {
   return output_cbranch (operands[0], NULL, 0, insn);
@@ -15360,8 +15377,8 @@
  cc_reg_operand y)
   (const_int 0)])
  (pc)
- (return)))]
-  "direct_return ()"
+ (any_return)))]
+  "<return_pred>"
   "*"
 {
   return output_cbranch (operands[0], NULL, 1, insn);
@@ -15491,9 +15508,9 @@
   b %l0
   [(set_attr type branch)])
 
-(define_insn "return"
-  [(return)]
-  "direct_return ()"
+(define_insn "<return_str>return"
+  [(any_return)]
+  "<return_pred>"
   "{br|blr}"
   [(set_attr "type" "jmpreg")])
 
@@ -16015,7 +16032,7 @@
(set_attr cell_micro always)])
 
 (define_insn "*return_internal_<mode>"
-  [(return)
+  [(simple_return)
(use (match_operand:P 0 register_operand lc))]
   
   b%T0
@@ -16077,7 +16094,7 @@
 
 (define_insn "*return_and_restore_gpregs_<mode>_r11"
  [(match_parallel 0 any_parallel_operand
-  [(return)
+  [(simple_return)
   (clobber (match_operand:P 1 register_operand =l))
   (use (match_operand:P 2 symbol_ref_operand s))
(use (reg:P 11))
@@ -16090,7 +16107,7 @@
 
 (define_insn "*return_and_restore_gpregs_<mode>_r12"
  [(match_parallel 0 any_parallel_operand
-  [(return)
+  [(simple_return)
   (clobber (match_operand:P 1 register_operand =l))
   (use (match_operand:P 2 symbol_ref_operand s))
(use (reg:P 12))
@@ -16103,7 +16120,7 @@
 
 (define_insn "*return_and_restore_gpregs_<mode>_r1"
  [(match_parallel 0 any_parallel_operand
-  [(return)
+  [(simple_return)
   (clobber (match_operand:P 1 register_operand =l))
   (use (match_operand:P 2 symbol_ref_operand s))
(use (reg:P 1))
@@ -16116,7 +16133,7 @@
 
 (define_insn "*return_and_restore_fpregs_<mode>_r11"
  [(match_parallel 0 any_parallel_operand
-  [(return)
+  [(simple_return)
   (clobber (match_operand:P 1 register_operand =l))
   (use (match_operand:P 2 symbol_ref_operand s))
(use (reg:P 11))
@@ -16129,7 +16146,7 @@
 
 (define_insn "*return_and_restore_fpregs_<mode>_r12"
  [(match_parallel 0 any_parallel_operand
-  [(return)
+  [(simple_return)
   (clobber (match_operand:P 1 register_operand =l))
   (use (match_operand:P 2 symbol_ref_operand s))
(use (reg:P 12))
@@ -16142,7 +16159,7 @@
 
 (define_insn "*return_and_restore_fpregs_<mode>_r1"
  [(match_parallel 0 any_parallel_operand
-  [(return)
+  [(simple_return)
   (clobber (match_operand:P 1 register_operand =l))
   (use (match_operand:P 2 symbol_ref_operand s))
(use (reg:P 1))
@@ -16155,7 +16172,7 @@
 
 (define_insn "*return_and_restore_fpregs_aix_<mode>_r11"
  [(match_parallel 0 any_parallel_operand
- [(return)
+ [(simple_return)
   (use (match_operand:P 1 register_operand l))
   (use (match_operand:P 2 symbol_ref_operand s))
   (use (reg:P 11))
@@ -16168,7 +16185,7 @@
 
 (define_insn "*return_and_restore_fpregs_aix_<mode>_r1"
  [(match_parallel 0 any_parallel_operand
- [(return)
+ [(simple_return)
   (use (match_operand:P 1 register_operand l))
   (use (match_operand:P 2 symbol_ref_operand s))
   (use (reg:P 1))
diff -urp -x.svn -x'*~' -x'*.orig' gcc-bernd/gcc/config/rs6000/spe.md 
gcc-current/gcc/config/rs6000/spe.md
--- gcc-bernd/gcc/config/rs6000/spe.md  2011-09-16 11:52:15.0 +0930
+++ gcc-current/gcc/config/rs6000/spe.md2011-09-15 18:28:43.0 
+0930
@@ -3178,7 +3178,7 @@
 
 (define_insn *return_and_restore_gpregs_spe
  [(match_parallel 0 any_parallel_operand
- [(return)
+ [(simple_return)
   (clobber (reg:P 65))
   (use (match_operand:P 1 symbol_ref_operand s))
   (use (reg:P 11))

-- 
Alan Modra
Australia Development Lab, IBM


Re: PowerPC shrink-wrap support 0 of 3

2011-09-18 Thread Alan Modra
On Sat, Sep 17, 2011 at 03:26:21PM +0200, Bernd Schmidt wrote:
 On 09/17/11 09:16, Alan Modra wrote:
  This patch series adds shrink-wrap support for PowerPC.  The patches
  are on top of Bernd's Initial shrink-wrapping patch:
  http://gcc.gnu.org/ml/gcc-patches/2011-08/msg02557.html, but with the
  tm.texi patch applied to tm.texi.in.  Bootstrapped and regression
  tested powerpc64-linux all langs except ada, and spec CPU2006 tested.
  The spec results were a little disappointing as I expected to see some
  gains, but my baseline was a -O3 run and I suppose most of the
  shrink-wrap opportunities were lost to inlining.
 
 The last posted version had a bug that crept in during the review cycle,
 and which made it quite ineffective.

I wasn't complaining!  My disappointment really stemmed from having
unrealistically high expectations.  I still think this optimization is
a great feature.  Thanks for contributing it!

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] Bug fix in store_bit_field_1 for big endian targets (issue 51893)

2012-03-22 Thread Alan Modra
On Wed, Mar 21, 2012 at 10:53:07PM +0100, Eric Botcazou wrote:
  This patch (for 4.6) fixes a wrong subword index computation in
  store_bit_field_1 for big endian targets when value is at least 4 times
  bigger than a word (DI REG value with HI words).
 
  It fixes a regression on gcc.c-torture/execute/bitfld-3.c for my current
  backend port.
 
  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51893
 
  OK to commit?
 
 It is OK for mainline on principle but there is no ChangeLog entry and the 
 patch doesn't follow the GNU Coding Style: TABs instead of spaces, spaces 
 before parentheses, etc.  See the equivalent code in extract_bit_field_1.
 Moreover you need to properly test it on a mainstream big-endian platform.

Passes bootstrap and regression test powerpc64-linux.

-- 
Alan Modra
Australia Development Lab, IBM


rs6000 toc reference rtl again

2012-03-27 Thread Alan Modra
)
-	    (high:TLSmode
-	      (unspec:TLSmode [(match_operand:TLSmode 2 "rs6000_tls_symbol_ref" "")]
-			      UNSPEC_TLSGOTDTPREL)))))]
+	(high:TLSmode
+	  (unspec:TLSmode [(match_operand:TLSmode 1 "gpc_reg_operand" "b")
+			   (match_operand:TLSmode 2 "rs6000_tls_symbol_ref" "")]
+			  UNSPEC_TLSGOTDTPREL)))]
   "HAVE_AS_TLS && TARGET_CMODEL != CMODEL_SMALL"
   "addis %0,%1,%2@got@dtprel@ha"
   [(set_attr "length" "4")])
@@ -11855,10 +11847,8 @@
   "l<TLSmode:tls_insn_suffix> %0,%2@got@tprel(%1)"
   "&& TARGET_CMODEL != CMODEL_SMALL"
   [(set (match_dup 3)
-   (const:TLSmode
- (plus:TLSmode (match_dup 1)
-   (high:TLSmode
- (unspec:TLSmode [(match_dup 2)] UNSPEC_TLSGOTTPREL)
+   (high:TLSmode
+   (unspec:TLSmode [(match_dup 1) (match_dup 2)] UNSPEC_TLSGOTTPREL)))
(set (match_dup 0)
(lo_sum:TLSmode (match_dup 3)
(unspec:TLSmode [(match_dup 2)] UNSPEC_TLSGOTTPREL)))]
@@ -11873,11 +11863,10 @@
 
 (define_insn "*tls_got_tprel_high<TLSmode:tls_abi_suffix>"
   [(set (match_operand:TLSmode 0 "gpc_reg_operand" "=b")
-	(const:TLSmode
-	  (plus:TLSmode (match_operand:TLSmode 1 "gpc_reg_operand" "b")
-	    (high:TLSmode
-	      (unspec:TLSmode [(match_operand:TLSmode 2 "rs6000_tls_symbol_ref" "")]
-			      UNSPEC_TLSGOTTPREL)))))]
+	(high:TLSmode
+	  (unspec:TLSmode [(match_operand:TLSmode 1 "gpc_reg_operand" "b")
+			   (match_operand:TLSmode 2 "rs6000_tls_symbol_ref" "")]
+			  UNSPEC_TLSGOTTPREL)))]
   "HAVE_AS_TLS && TARGET_CMODEL != CMODEL_SMALL"
   "addis %0,%1,%2@got@tprel@ha"
   [(set_attr "length" "4")])
@@ -12223,6 +12212,25 @@
   DONE;
 })
 
+;; Largetoc support
+(define_insn "largetoc_high"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=b*r")
+	(high:DI
+	  (unspec [(match_operand:DI 1 "" "")
+		   (match_operand:DI 2 "gpc_reg_operand" "b")]
+		  UNSPEC_TOCREL)))]
+   "TARGET_ELF && TARGET_CMODEL != CMODEL_SMALL"
+   "{cau|addis} %0,%2,%1@toc@ha")
+
+(define_insn "largetoc_low"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=r,r")
+        (lo_sum:DI (match_operand:DI 1 "gpc_reg_operand" "b,!*r")
+		   (match_operand:DI 2 "" "")))]
+   "TARGET_ELF && TARGET_CMODEL != CMODEL_SMALL"
+   "@
+    {cal %0,%2@l(%1)|addi %0,%1,%2@l}
+    {ai|addic} %0,%1,%2@l")
+
 ;; Elf specific ways of loading addresses for non-PIC code.
 ;; The output of this could be r0, but we make a very strong
 ;; preference for a base register because it will usually
@@ -12241,22 +12249,6 @@
   "@
    {cal|la} %0,%2@l(%1)
    {ai|addic} %0,%1,%K2")
-
-;; Largetoc support
-(define_insn "largetoc_high"
-  [(set (match_operand:DI 0 "gpc_reg_operand" "=b")
-	(const:DI
-	  (plus:DI (match_operand:DI 1 "gpc_reg_operand" "b")
-		   (high:DI (match_operand:DI 2 "" "")))))]
-   "TARGET_ELF && TARGET_CMODEL != CMODEL_SMALL"
-   "{cau|addis} %0,%1,%2@ha")
-
-(define_insn "largetoc_low"
-  [(set (match_operand:DI 0 "gpc_reg_operand" "=r")
-        (lo_sum:DI (match_operand:DI 1 "gpc_reg_operand" "b")
-		   (match_operand:DI 2 "" "")))]
-   "TARGET_ELF && TARGET_CMODEL != CMODEL_SMALL"
-   "{cal %0,%2@l(%1)|addi %0,%1,%2@l}")
 
 ;; Call and call_value insns
 (define_expand call
Index: gcc/config/rs6000/rs6000-protos.h
===
--- gcc/config/rs6000/rs6000-protos.h   (revision 185830)
+++ gcc/config/rs6000/rs6000-protos.h   (working copy)
@@ -38,7 +38,7 @@ extern bool macho_lo_sum_memory_operand (rtx, enum
 extern int num_insns_constant (rtx, enum machine_mode);
 extern int num_insns_constant_wide (HOST_WIDE_INT);
 extern int small_data_operand (rtx, enum machine_mode);
-extern bool toc_relative_expr_p (rtx);
+extern bool toc_relative_expr_p (const_rtx, bool);
 extern bool invalid_e500_subreg (rtx, enum machine_mode);
 extern void validate_condition_mode (enum rtx_code, enum machine_mode);
 extern bool legitimate_constant_pool_address_p (const_rtx, enum machine_mode,

-- 
Alan Modra
Australia Development Lab, IBM


[RS6000] Stack tie fix.

2012-04-03 Thread Alan Modra
;
- emit_insn (gen_frame_tie (mem1, mem2));
-   }
+   rs6000_emit_stack_tie (frame_reg_rtx, true);
 
   insn = emit_insn (gen_add3_insn (frame_reg_rtx, hard_frame_pointer_rtx,
				   GEN_INT (info->total_size)));
@@ -20456,11 +20465,7 @@ rs6000_emit_epilogue (int sibcall)
   /* Prevent reordering memory accesses against stack pointer restore.  */
   if (cfun->calls_alloca
       || offset_below_red_zone_p (-info->total_size))
-	{
-	  rtx mem = gen_rtx_MEM (BLKmode, sp_reg_rtx);
-	  MEM_NOTRAP_P (mem) = 1;
-	  emit_insn (gen_stack_tie (mem));
-	}
+   rs6000_emit_stack_tie (frame_reg_rtx, false);
   insn = emit_insn (gen_add3_insn (sp_reg_rtx, sp_reg_rtx,
				   GEN_INT (info->total_size)));
   sp_offset = 0;
Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md (revision 185830)
+++ gcc/config/rs6000/predicates.md (working copy)
@@ -1451,3 +1451,11 @@
 
   return 1;
 })
+
+;; Return 1 if OP is a stack tie operand.
+(define_predicate tie_operand
+  (match_code mem)
+{
+  return (GET_CODE (XEXP (op, 0)) == UNSPEC
+  XINT (XEXP (op, 0), 1) == UNSPEC_TIE);
+})
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 185830)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -11989,17 +11989,20 @@
 (define_expand "restore_stack_block"
   [(set (match_dup 2) (match_dup 3))
    (set (match_dup 4) (match_dup 2))
-   (set (match_dup 5) (unspec:BLK [(match_dup 5)] UNSPEC_TIE))
+   (set (match_dup 5) (const_int 0))
    (set (match_operand 0 "register_operand" "")
	(match_operand 1 "register_operand" ""))]
   ""
   ""
 {
+  rtx u;
+
   operands[1] = force_reg (Pmode, operands[1]);
   operands[2] = gen_reg_rtx (Pmode);
   operands[3] = gen_frame_mem (Pmode, operands[0]);
   operands[4] = gen_frame_mem (Pmode, operands[1]);
-  operands[5] = gen_frame_mem (BLKmode, operands[0]);
+  u = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, operands[0]), UNSPEC_TIE);
+  operands[5] = gen_frame_mem (BLKmode, u);
 })
 
 (define_expand "save_stack_nonlocal"
@@ -12022,12 +12025,13 @@
   [(set (match_dup 2) (match_operand 1 "memory_operand" ""))
    (set (match_dup 3) (match_dup 4))
    (set (match_dup 5) (match_dup 2))
-   (set (match_dup 6) (unspec:BLK [(match_dup 6)] UNSPEC_TIE))
+   (set (match_dup 6) (const_int 0))
    (set (match_operand 0 "register_operand" "") (match_dup 3))]
   ""
   ""
 {
   int units_per_word = (TARGET_32BIT) ? 4 : 8;
+  rtx u;
 
   /* Restore the backchain from the first word, sp from the second.  */
   operands[2] = gen_reg_rtx (Pmode);
@@ -12035,7 +12039,8 @@
   operands[1] = adjust_address_nv (operands[1], Pmode, 0);
   operands[4] = adjust_address_nv (operands[1], Pmode, units_per_word);
   operands[5] = gen_frame_mem (Pmode, operands[3]);
-  operands[6] = gen_frame_mem (BLKmode, operands[0]);
+  u = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, operands[0]), UNSPEC_TIE);
+  operands[6] = gen_frame_mem (BLKmode, u);
 })
 
 ;; TOC register handling.
@@ -15899,25 +15904,15 @@
   [(set_attr "type" "branch")
    (set_attr "length" "4")])
 
-; These are to explain that changes to the stack pointer should
-; not be moved over stores to stack memory.
+; This is to explain that changes to the stack pointer should
+; not be moved over loads from or stores to stack memory.
 (define_insn "stack_tie"
-  [(set (match_operand:BLK 0 "memory_operand" "+m")
-        (unspec:BLK [(match_dup 0)] UNSPEC_TIE))]
+  [(set (match_operand:BLK 0 "tie_operand" "")
+	(const_int 0))]
   ""
   ""
   [(set_attr "length" "0")])
 
-; Like stack_tie, but depend on both fp and sp based memory.
-(define_insn "frame_tie"
-  [(set (match_operand:BLK 0 "memory_operand" "+m")
-	(unspec:BLK [(match_dup 0)
-		     (match_operand:BLK 1 "memory_operand" "m")] UNSPEC_TIE))]
-  ""
-  ""
-  [(set_attr "length" "0")])
-
-
 (define_expand epilogue
   [(use (const_int 0))]
   ""

-- 
Alan Modra
Australia Development Lab, IBM


Re: [rs6000] Fix PR 50906, eh_frame and other woes

2011-12-05 Thread Alan Modra
Ping http://gcc.gnu.org/ml/gcc-patches/2011-11/msg00543.html

-- 
Alan Modra
Australia Development Lab, IBM


Re: PR libgomp/51376 fix

2011-12-08 Thread Alan Modra
On Fri, Dec 02, 2011 at 08:10:11PM +1030, Alan Modra wrote:
> 	PR libgomp/51376
> 	* task.c (GOMP_taskwait): Don't access task->children outside of
> 	task_lock mutex region.
> 	(GOMP_task): Likewise.

Committed revision 182151 on rth's irc OK.

-- 
Alan Modra
Australia Development Lab, IBM


FIX PR51926, libstdc++ iterator store bigendian bitfield related

2012-01-20 Thread Alan Modra
This patch fixes a wrong-code bug in store_bit_field.  When storing a
multi-word BLKmode value (such as from a RECORD_TYPE), the last word
is trashed on big-endian targets.  See the PR for the gory details.
Bootstrapped and regression tested powerpc64-linux.  Fixes 74
testsuite failures.  OK for mainline?  And earlier versions of gcc?

PR middle-end/51926
* expmed.c (store_bit_field_1): Properly handle last word of
BLKmode value when bigendian.

Index: gcc/expmed.c
===
--- gcc/expmed.c(revision 183228)
+++ gcc/expmed.c(working copy)
@@ -551,19 +551,27 @@ store_bit_field_1 (rtx str_rtx, unsigned
  /* If I is 0, use the low-order word in both field and target;
 if I is 1, use the next to lowest word; and so on.  */
  unsigned int wordnum = (backwards ? nwords - i - 1 : i);
+ unsigned int nbits = (i == nwords - 1
+   ? bitsize - i * BITS_PER_WORD
+   : BITS_PER_WORD);
  unsigned int bit_offset = (backwards
-? MAX ((int) bitsize - ((int) i + 1)
-   * BITS_PER_WORD,
-   0)
-: (int) i * BITS_PER_WORD);
+? bitsize - i * BITS_PER_WORD - nbits
+: i * BITS_PER_WORD);
  rtx value_word = operand_subword_force (value, wordnum, fieldmode);
 
- if (!store_bit_field_1 (op0, MIN (BITS_PER_WORD,
-   bitsize - i * BITS_PER_WORD),
- bitnum + bit_offset,
+	  if (WORDS_BIG_ENDIAN && !backwards && nbits != BITS_PER_WORD)
+   {
+ /* Original fieldmode was BLKmode.  The value in this
+case consists of bits in memory order, so any unused
+bits in the last word are in the least significant
+position.  */
+ value_word = expand_shift (RSHIFT_EXPR, word_mode, value_word,
+BITS_PER_WORD - nbits,
+value_word, 1);
+   }
+ if (!store_bit_field_1 (op0, nbits, bitnum + bit_offset,
  bitregion_start, bitregion_end,
- word_mode,
- value_word, fallback_p))
+ word_mode, value_word, fallback_p))
{
  delete_insns_since (last);
  return false;

-- 
Alan Modra
Australia Development Lab, IBM


Re: Bug store_bit_field_1 + patch

2012-01-20 Thread Alan Modra
On Wed, Jan 18, 2012 at 12:48:45PM +0100, Aurelien Buhrig wrote:
> The problem occurs when value is a REG and bitsize > BITS_PER_WORD. This
> is because wordnum, which is used to get the subword of value, is
> incorrectly computed, in BIG_ENDIAN, wrt the number of words needed by
> bitsize instead of the number of words needed by the integer mode of the
> field, which is the mode used to compute subwords.

That doesn't sound right.  wordnum is passed to operand_subword as its
offset param, which counts in words of word_mode size.  I think you
have a problem elsewhere.

-- 
Alan Modra
Australia Development Lab, IBM


Re: FIX PR51926, libstdc++ iterator store bigendian bitfield related

2012-01-23 Thread Alan Modra
On Sat, Jan 21, 2012 at 11:07:10AM +1030, Alan Modra wrote:
> 	PR middle-end/51926
> 	* expmed.c (store_bit_field_1): Properly handle last word of
> 	BLKmode value when bigendian.

Blah, I was working on an old tree.  The bug is already fixed.

-- 
Alan Modra
Australia Development Lab, IBM

