Re: Fix PR51298, libgomp barrier failure

2011-11-28 Thread Jakub Jelinek
On Mon, Nov 28, 2011 at 10:12:09AM +1030, Alan Modra wrote:
 --- libgomp/config/linux/bar.c(revision 181718)
 +++ libgomp/config/linux/bar.c(working copy)
 @@ -36,18 +36,15 @@ gomp_barrier_wait_end (gomp_barrier_t *b
if (__builtin_expect ((state & 1) != 0, 0))
  {
/* Next time we'll be awaiting TOTAL threads again.  */
 -  bar->awaited = bar->total;
 -  atomic_write_barrier ();
 -  bar->generation += 4;
 +  __atomic_store_4 (&bar->awaited, bar->total, MEMMODEL_RELEASE);
 +  __atomic_add_fetch (&bar->generation, 4, MEMMODEL_ACQ_REL);
futex_wake ((int *) &bar->generation, INT_MAX);
  }

The above is unfortunate: bar->generation should only be modified from a
single thread at this point, but the __atomic_add_fetch will force an
atomic instruction there rather than just some kind of barrier.

Jakub


Re: Use atomics in libgomp mutex

2011-11-28 Thread Jakub Jelinek
On Mon, Nov 28, 2011 at 10:37:09AM +1030, Alan Modra wrote:
 On Mon, Nov 28, 2011 at 09:27:20AM +1030, Alan Modra wrote:
  This is the mutex part.  Depends on
  http://gcc.gnu.org/ml/gcc-patches/2011-11/msg02454.html for MEMMODEL_*
  values.
 
 Arrgh, I posted the wrong patch.  I know it needs work elsewhere in
 libgomp to comply with the OMP spec, which says of flush:
 
 14 A flush region without a list is implied at the following locations:
 15 • During a barrier region.
 16 • At entry to and exit from parallel, critical, and ordered regions.
 [snip]
 
 Yet GOMP_critical_start simply calls gomp_mutex_lock, relying on the
 lock acquisition to do the flush.  This won't necessarily happen with
 MEMMODEL_ACQUIRE.
 
 This bit in gomp_mutex_lock
  +  __atomic_compare_exchange_4 (mutex, &oldval, 1, false,
  +  MEMMODEL_ACQUIRE, MEMMODEL_RELAXED);
 should be
   __atomic_compare_exchange_4 (mutex, &oldval, 1, false,
  MEMMODEL_ACQ_REL, MEMMODEL_ACQ_REL);
 
 and once you do that it's hard to justify the patch for stage3 as
 fixing a bug (unnecessary sync instructions emitted by __sync
 builtins).

If it is just GOMP_critical_start, couldn't it use an extra barrier
before or after its call to mutex_lock, respectively?

Jakub


Re: Use atomics in libgomp mutex

2011-11-28 Thread Alan Modra
On Mon, Nov 28, 2011 at 09:15:02AM +0100, Jakub Jelinek wrote:
 On Mon, Nov 28, 2011 at 10:37:09AM +1030, Alan Modra wrote:
  14 A flush region without a list is implied at the following locations:
  15 • During a barrier region.
  16 • At entry to and exit from parallel, critical, and ordered regions.
  [snip]

 If it is just GOMP_critical_start, couldn't it use an extra barrier
 before resp. after it calls mutex_lock?

I believe gomp_ordered_sync needs a barrier, and I'm unsure about
GOMP_parallel_start but I think one is needed there too.  Yes, putting
the barriers where they are actually needed is the best solution, but
I'm far from an OpenMP expert so am unsure whether these are the only
places requiring a barrier.

-- 
Alan Modra
Australia Development Lab, IBM


Re: Fix PR51298, libgomp barrier failure

2011-11-28 Thread Alan Modra
On Mon, Nov 28, 2011 at 09:13:27AM +0100, Jakub Jelinek wrote:
 On Mon, Nov 28, 2011 at 10:12:09AM +1030, Alan Modra wrote:
  --- libgomp/config/linux/bar.c  (revision 181718)
  +++ libgomp/config/linux/bar.c  (working copy)
  @@ -36,18 +36,15 @@ gomp_barrier_wait_end (gomp_barrier_t *b
 if (__builtin_expect ((state & 1) != 0, 0))
   {
 /* Next time we'll be awaiting TOTAL threads again.  */
  -  bar->awaited = bar->total;
  -  atomic_write_barrier ();
  -  bar->generation += 4;
  +  __atomic_store_4 (&bar->awaited, bar->total, MEMMODEL_RELEASE);
  +  __atomic_add_fetch (&bar->generation, 4, MEMMODEL_ACQ_REL);
 futex_wake ((int *) &bar->generation, INT_MAX);
   }
 
 The above is unfortunate: bar->generation should only be modified from a
 single thread at this point, but the __atomic_add_fetch will force an
 atomic instruction there rather than just some kind of barrier.

I will try without the atomic add and see what happens.

-- 
Alan Modra
Australia Development Lab, IBM


Add __stpncpy_chk builtin support

2011-11-28 Thread Romain Geissler
Hi,

The current trunk has no support for the __stpncpy_chk function. As the latest
OS X (10.7, aka Lion) now defines stpncpy as a call to __builtin___stpncpy_chk
for GNUC-compliant compilers, the following code won't link:

#include <string.h>

int main(){
char string[10];
stpncpy(string, "abcd", 8);

return 0;
}

Undefined symbols for architecture x86_64:
  ___builtin___stpncpy_chk, referenced from:
  _main in ccGFZoZV.o
ld: symbol(s) not found for architecture x86_64
collect2: error: ld returned 1 exit status


This patch adds support for this builtin.
Bootstrapped and tested without regression on darwin x86_64.

Romain Geissler

gcc/

2011-11-28  Romain Geissler  romain.geiss...@gmail.com

* builtins.def (BUILT_IN_STPNCPY_CHK): New definition.
* builtins.c (expand_builtin, fold_builtin_4, maybe_emit_chk_warning):
Add BUILT_IN_STPNCPY_CHK case.
* gimple-fold.c (gimple_fold_builtin): Likewise.
* tree-object-size.c (pass_through_call): Likewise.
* tree-ssa-alias.c (ref_maybe_used_by_call_p_1,
call_may_clobber_ref_p_1): Likewise.
* tree-ssa-structalias.c (find_func_aliases_for_builtin_call,
find_func_clobbers): Likewise.
* tree.h (fold_builtin_strncpy_chk): Rename to fold_builtin_stxncpy_chk.
* builtins.c (fold_builtin_strncpy_chk): Likewise.
Rewrite stpncpy_chk calls to strncpy_chk calls if returned value is
ignored.

gcc/testsuite/

2011-11-28  Romain Geissler  romain.geiss...@gmail.com

* gcc.c-torture/execute/builtins/chk.h (stpncpy, stpncpy_disallowed):
New definitions.
* gcc.c-torture/execute/builtins/lib/chk.c (stpncpy_disallowed):
Likewise.
(stpncpy, __stpncpy_chk): New functions.
* gcc.c-torture/execute/builtins/stpncpy-chk-lib.c: New file.
* gcc.c-torture/execute/builtins/stpncpy-chk.c: Likewise.



lion_build.patch
Description: Binary data


[RFA/ARM] Add an integer pipeline description for Cortex-A15

2011-11-28 Thread Matthew Gretton-Dann

All,

The attached patch adds an integer pipeline description for the Cortex-A15.

Although not dependent on them, my testing has been done on top of Sameera 
Deshpande's A15 prologue/epilogue patches (see: 
http://gcc.gnu.org/ml/gcc-patches/2011-10/msg00856.html and following). 
 Testing on some popular embedded benchmarks shows the pipeline 
description gives a 1.8% (geomean) average performance improvement. 
The testsuite shows no regressions targeting arm-none-eabi under QEMU.


Can someone please review?

Thanks,

Matt

gcc/ChangeLog:
2011-11-28  Matthew Gretton-Dannmatthew.gretton-d...@arm.com

* config/arm/arm.c (arm_issue_rate): Cortex-A15 can triple
issue.
* config/arm/arm.md (mul64): Add new attribute.
(generic_sched): Cortex-A15 is not scheduled generically.
(cortex-a15.md): Include.
* config/arm/cortex-a15.md: New machine description.
* config/arm/t-arm (MD_INCLUDES): Add cortex-a15.md.

--
Matthew Gretton-Dann
Principal Engineer, PD Software - Tools, ARM Ltd

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e3b0b88..d17f2b5 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -24123,6 +24123,9 @@ arm_issue_rate (void)
 {
   switch (arm_tune)
 {
+case cortexa15:
+  return 3;
+
 case cortexr4:
 case cortexr4f:
 case cortexr5:
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index a78ba88..facbf92 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -355,6 +355,13 @@
 (const_string "mult")
 (const_string "alu")))
 
+; Is this an (integer side) multiply with a 64-bit result?
+(define_attr "mul64" "no,yes"
+(if_then_else
+  (eq_attr "insn"
+"smlalxy,umull,umulls,umlal,umlals,smull,smulls,smlal,smlals")
+  (const_string "yes")
+  (const_string "no")))
+
+
 ; Load scheduling, set from the arm_ld_sched variable
 ; initialized by arm_option_override()
 (define_attr "ldsched" "no,yes" (const (symbol_ref "arm_ld_sched")))
@@ -518,7 +525,7 @@
 
 (define_attr "generic_sched" "yes,no"
   (const (if_then_else
-  (ior (eq_attr "tune"
"fa526,fa626,fa606te,fa626te,fmp626,fa726te,arm926ejs,arm1020e,arm1026ejs,arm1136js,arm1136jfs,cortexa5,cortexa8,cortexa9,cortexm4")
+  (ior (eq_attr "tune"
"fa526,fa626,fa606te,fa626te,fmp626,fa726te,arm926ejs,arm1020e,arm1026ejs,arm1136js,arm1136jfs,cortexa5,cortexa8,cortexa9,cortexa15,cortexm4")
   (eq_attr "tune_cortexr4" "yes"))
   (const_string "no")
   (const_string "yes"
@@ -544,6 +551,7 @@
 (include "cortex-a5.md")
 (include "cortex-a8.md")
 (include "cortex-a9.md")
+(include "cortex-a15.md")
 (include "cortex-r4.md")
 (include "cortex-r4f.md")
 (include "cortex-m4.md")
diff --git a/gcc/config/arm/cortex-a15.md b/gcc/config/arm/cortex-a15.md
new file mode 100644
index 000..ccab7cb
--- /dev/null
+++ b/gcc/config/arm/cortex-a15.md
@@ -0,0 +1,186 @@
+;; ARM Cortex-A15 pipeline description
+;; Copyright (C) 2011 Free Software Foundation, Inc.
+;;
+;; Written by Matthew Gretton-Dann matthew.gretton-d...@arm.com
+
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; http://www.gnu.org/licenses/.
+
+(define_automaton "cortex_a15")
+
+;; The Cortex-A15 core is modelled as a triple issue pipeline that has
+;; the following dispatch units.
+;; 1. Two pipelines for simple integer operations: SX1, SX2
+;; 2. Two pipelines for Neon and FP data-processing operations: CX1, CX2
+;; 3. One pipeline for branch operations: BX
+;; 4. One pipeline for integer multiply and divide operations: MX
+;; 5. Two pipelines for load and store operations: LS1, LS2
+;;
+;; We can issue into three pipelines per-cycle.
+;;
+;; We assume that where we have unit pairs xx1 is always filled before xx2.
+
+;; The three issue units
+(define_cpu_unit "ca15_i0, ca15_i1, ca15_i2" "cortex_a15")
+
+(define_reservation "ca15_issue1" "(ca15_i0|ca15_i1|ca15_i2)")
+(define_reservation "ca15_issue2" "((ca15_i0+ca15_i1)|(ca15_i1+ca15_i2))")
+(define_reservation "ca15_issue3" "(ca15_i0+ca15_i1+ca15_i2)")
+(final_presence_set "ca15_i1" "ca15_i0")
+(final_presence_set "ca15_i2" "ca15_i1")
+
+;; The main dispatch units
+(define_cpu_unit "ca15_sx1, ca15_sx2" "cortex_a15")
+(define_cpu_unit "ca15_cx1, ca15_cx2" "cortex_a15")
+(define_cpu_unit "ca15_ls1, ca15_ls2" "cortex_a15")
+(define_cpu_unit "ca15_bx, ca15_mx" "cortex_a15")
+
+(define_reservation "ca15_ls" "(ca15_ls1|ca15_ls2)")
+
+;; The extended load-store 

Re: [Patch, Fortran] MOVE_ALLOC fixes

2011-11-28 Thread Tobias Burnus

Dear all,

attached is an updated patch; I hadn't considered that the declared type 
of polymorphic arguments could be different. The middle end was not 
happy about:

  to = from

where "to" and "from" were different class containers. I was 
considering forcing the assignment (via VIEW_CONVERT_EXPR), but Jakub 
warns that this might confuse tree-sra, while the produced code would 
anyway consist of two pointer assignments. Hence, I now directly 
generate two pointer assignments (for _data and _vptr). (The patch only 
affects scalar polymorphic variables; for arrays, there were already two 
pointer assignments.)


Build and regtested on x86-64-linux (trunk and trunk + Paul's patch).
OK?

Tobias

On 11/26/2011 02:50 PM, Tobias Burnus wrote:

Dear all,

(First, this is *not* for the 4.6/4.7 rejects-valid regression, which 
is related to intent(in) pointers with allocatable components.)


When debugging an issue with polymorphic arrays and MOVE_ALLOC, I 
got lost in the code generation of move_alloc, and didn't like the 
generated code. Thus, I have rewritten the trans*.c part of it. (It 
turned out that the issue we had was unrelated to move_alloc.)


Changes:
* Replace call to libgfortran by inline code (much faster and shorter 
code)

* For arrays: Deallocate from (deep freeing)
* For polymorphic arrays: set _vptr.

Actually, the required code is rather simple: For move_alloc(from, 
to), one just needs to do:


a) Deallocate to, taking allocatable components and the polymorphic 
types into account (the latter is a to-do item, cf. PR 46174).


b) Do a simple assignment:
   to = from
namely: If both are scalar variables, those are pointers and one does 
a pointer assignment. If they are polymorphic and/or an array, one 
does a (nonpointer) assignment to the class container or the array 
descriptor.


c) Setting from = NULL (nonpolymorphic scalars) or from.data = 
NULL (nonpolymorphic arrays) or from._data = NULL (polymorphic 
scalars) or from._data.data = NULL (polymorphic arrays).


For (b), the current expr->ref walking functions for polymorphic arrays 
give access either to class._data or to class._vptr. It is extremely 
difficult to access "class" itself. Thus, I now do two assignments: 
one nonpointer assignment to the array descriptor and one pointer 
assignment to the _vptr.


Build and regtested with the trunk with Paul's polymorphic array patch 
applied. (I will do a bootstrap and regtest with a clean trunk before 
committal.)

OK for the trunk?

Tobias

PS: I'll add _gfortran_move_alloc to the list of functions which can 
be removed after the ABI breakage.
2011-11-28  Tobias Burnus  bur...@net-b.de

	PR fortran/51306
	PR fortran/48700 
	* check.c (gfc_check_move_alloc): Make sure that from/to
	are both polymorphic or neither.
	* trans-intrinsic.c (conv_intrinsic_move_alloc): Cleanup,
	generate inline code.

2011-11-28  Tobias Burnus  bur...@net-b.de

	PR fortran/51306
	PR fortran/48700 
	* gfortran.dg/move_alloc_5.f90: Add dg-error.
	* gfortran.dg/select_type_23.f03: Add dg-error.
	* gfortran.dg/move_alloc_6.f90: New.
	* gfortran.dg/move_alloc_7.f90: New.

diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c
index d9b9a9c..e2b0d66 100644
--- a/gcc/fortran/check.c
+++ b/gcc/fortran/check.c
@@ -2691,6 +2709,14 @@ gfc_check_move_alloc (gfc_expr *from, gfc_expr *to)
   if (same_type_check (to, 1, from, 0) == FAILURE)
 return FAILURE;
 
+  if (to->ts.type != from->ts.type)
+{
+  gfc_error ("The FROM and TO arguments in MOVE_ALLOC call at %L must be "
+		 "either both polymorphic or both nonpolymorphic",
+		 &from->where);
+  return FAILURE;
+}
+
   if (to->rank != from->rank)
 {
   gfc_error ("the '%s' and '%s' arguments of '%s' intrinsic at %L must "
diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index 4244570..77c065e 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -5892,7 +5892,7 @@
 }
 
 
-/* Generate code for SELECTED_REAL_KIND (P, R) intrinsic function.  */
+/* Generate code for SELECTED_REAL_KIND (P, R, RADIX) intrinsic function.  */
 
 static void
 gfc_conv_intrinsic_sr_kind (gfc_se *se, gfc_expr *expr)
@@ -7182,50 +7190,123 @@ conv_intrinsic_atomic_ref (gfc_code *code)
 static tree
 conv_intrinsic_move_alloc (gfc_code *code)
 {
-  if (code->ext.actual->expr->rank == 0)
-{
-  /* Scalar arguments: Generate pointer assignments.  */
-  gfc_expr *from, *to, *deal;
-  stmtblock_t block;
-  tree tmp;
-  gfc_se se;
+  stmtblock_t block;
+  gfc_expr *from_expr, *to_expr;
+  gfc_expr *to_expr2, *from_expr2;
+  gfc_se from_se, to_se;
+  gfc_ss *from_ss, *to_ss;
+  tree tmp;
 
-  from = code->ext.actual->expr;
-  to = code->ext.actual->next->expr;
+  gfc_start_block (block);
 
-  gfc_start_block (block);
+  from_expr = code->ext.actual->expr;
+  to_expr = code->ext.actual->next->expr;
 
-  /* Deallocate 'TO' argument.  */
-  gfc_init_se (&se, NULL);
-  se.want_pointer = 1;
-  deal = gfc_copy_expr 

[v3] libstdc++/51288

2011-11-28 Thread Paolo Carlini

Hi,

when I implemented get/put_money I forgot about the sentry, oops. Tested 
x86_64-linux, committed to mainline.


Paolo.

/
2011-11-28  Paolo Carlini  paolo.carl...@oracle.com

PR libstdc++/51288
* include/std/iomanip (get_money, put_money): Use sentry.
* testsuite/27_io/manipulators/extended/get_money/char/51288.cc: New.
* testsuite/27_io/manipulators/extended/get_money/wchar_t/51288.cc:
Likewise.
* testsuite/27_io/manipulators/extended/put_money/char/51288.cc:
Likewise.
* testsuite/27_io/manipulators/extended/put_money/wchar_t/51288.cc:
Likewise.
Index: include/std/iomanip
===
--- include/std/iomanip (revision 181773)
+++ include/std/iomanip (working copy)
@@ -1,7 +1,7 @@
 // Standard stream manipulators -*- C++ -*-
 
 // Copyright (C) 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005
-// 2006, 2007, 2008, 2009, 2010
+// 2006, 2007, 2008, 2009, 2010, 2011
 // Free Software Foundation, Inc.
 //
 // This file is part of the GNU ISO C++ Library.  This library is free
@@ -262,18 +262,29 @@
     basic_istream<_CharT, _Traits>&
     operator>>(basic_istream<_CharT, _Traits>& __is, _Get_money<_MoneyT> __f)
     {
-  typedef istreambuf_iterator<_CharT, _Traits> _Iter;
-  typedef money_get<_CharT, _Iter> _MoneyGet;
-  
-  ios_base::iostate __err = ios_base::goodbit;
-  const _MoneyGet& __mg = use_facet<_MoneyGet>(__is.getloc());
+  typename basic_istream<_CharT, _Traits>::sentry __cerb(__is, false);
+  if (__cerb)
+   {
+ ios_base::iostate __err = ios_base::goodbit;
+ __try
+   {
+ typedef istreambuf_iterator<_CharT, _Traits>   _Iter;
+ typedef money_get<_CharT, _Iter>   _MoneyGet;
 
-  __mg.get(_Iter(__is.rdbuf()), _Iter(), __f._M_intl,
-  __is, __err, __f._M_mon);
-
-  if (ios_base::goodbit != __err)
-   __is.setstate(__err);
-
+ const _MoneyGet& __mg = use_facet<_MoneyGet>(__is.getloc());
+ __mg.get(_Iter(__is.rdbuf()), _Iter(), __f._M_intl,
+  __is, __err, __f._M_mon);
+   }
+ __catch(__cxxabiv1::__forced_unwind&)
+   {
+ __is._M_setstate(ios_base::badbit);
+ __throw_exception_again;
+   }
+ __catch(...)
+   { __is._M_setstate(ios_base::badbit); }
+ if (ios_base::goodbit != __err)
+   __is.setstate(__err);
+   }
   return __is; 
     }
 
@@ -298,16 +309,27 @@
     basic_ostream<_CharT, _Traits>& 
     operator<<(basic_ostream<_CharT, _Traits>& __os, _Put_money<_MoneyT> __f)
     {
-  typedef ostreambuf_iterator<_CharT, _Traits> _Iter;
-  typedef money_put<_CharT, _Iter> _MoneyPut;
-  
-  const _MoneyPut& __mp = use_facet<_MoneyPut>(__os.getloc());
-  const _Iter __end = __mp.put(_Iter(__os.rdbuf()), __f._M_intl,
-  __os, __os.fill(), __f._M_mon);
-
-  if (__end.failed())
-   __os.setstate(ios_base::badbit);
-
+  typename basic_ostream<_CharT, _Traits>::sentry __cerb(__os);
+  if (__cerb)
+   {
+ __try
+   {
+ typedef ostreambuf_iterator<_CharT, _Traits>   _Iter;
+ typedef money_put<_CharT, _Iter>   _MoneyPut;
+ const _MoneyPut& __mp = use_facet<_MoneyPut>(__os.getloc());
+ const _Iter __end = __mp.put(_Iter(__os.rdbuf()), __f._M_intl,
+  __os, __os.fill(), __f._M_mon);
+ if (__end.failed())
+   __os.setstate(ios_base::badbit);
+   }
+ __catch(__cxxabiv1::__forced_unwind&)
+   {
+ __os._M_setstate(ios_base::badbit);
+ __throw_exception_again;
+   }
+ __catch(...)
+   { __os._M_setstate(ios_base::badbit); }
+   }
   return __os; 
     }
 
Index: testsuite/27_io/manipulators/extended/put_money/wchar_t/51288.cc
===
--- testsuite/27_io/manipulators/extended/put_money/wchar_t/51288.cc
(revision 0)
+++ testsuite/27_io/manipulators/extended/put_money/wchar_t/51288.cc
(revision 0)
@@ -0,0 +1,49 @@
+// { dg-options -std=gnu++0x }
+// { dg-require-namedlocale en_US.UTF-8 }
+
+// Copyright (C) 2011 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy 

[Patch, Fortran] PR51308 - fix bogus c_null_ptr error

2011-11-28 Thread Tobias Burnus
gfortran printed an error that c_null_ptr couldn't be SAVE and 
PARAMETER at the same time; removing the SAVE attribute causes an error 
that a BIND(C) variable may not be a PARAMETER, so that has to be 
fixed as well.

(The patch is based on the one by Steven.)

Build and regtested on x86-64-linux.
OK for the trunk?

Tobias
2011-11-28  Tobias Burnus  bur...@net-b.de
	Steven G. Kargl  ka...@gcc.gnu.org

	PR fortran/51308
	* symbol.c (check_conflict): Ignore BIND(C) + PARAMETER
	conflicts for ISO_C_BINDING variables.
	(gen_special_c_interop_ptr): Don't mark c_ptr_null/c_funptr_null
	as SAVE.

2011-11-28  Tobias Burnus  bur...@net-b.de

	PR fortran/51308
	* gfortran.dg/iso_c_binding_compiler_4.f90: New.

diff --git a/gcc/fortran/symbol.c b/gcc/fortran/symbol.c
index 9bd6ed4..67009c5 100644
--- a/gcc/fortran/symbol.c
+++ b/gcc/fortran/symbol.c
@@ -742,9 +742,10 @@ check_conflict (symbol_attribute *attr, const char *name, locus *where)
   conf2 (asynchronous);
   conf2 (threadprivate);
   conf2 (value);
-  conf2 (is_bind_c);
   conf2 (codimension);
   conf2 (result);
+  if (!attr->is_iso_c)
+	conf2 (is_bind_c);
   break;
 
 default:
@@ -3763,13 +3764,12 @@ gen_special_c_interop_ptr (int ptr_id, const char *ptr_name,
		    "create symbol for %s", ptr_name);
 }
 
-  /* Set up the symbol's important fields.  Save attr required so we can
- initialize the ptr to NULL.  */
-  tmp_sym->attr.save = SAVE_EXPLICIT;
+  /* Set up the symbol's important fields.  */
   tmp_sym->ts.is_c_interop = 1;
   tmp_sym->attr.is_c_interop = 1;
   tmp_sym->ts.is_iso_c = 1;
   tmp_sym->ts.type = BT_DERIVED;
+  tmp_sym->attr.flavor = FL_PARAMETER;
 
   /* The c_ptr and c_funptr derived types will provide the
  definition for c_null_ptr and c_null_funptr, respectively.  */
@@ -3817,9 +3817,6 @@ gen_special_c_interop_ptr (int ptr_id, const char *ptr_name,
   c->expr = gfc_get_expr ();
   c->expr->expr_type = EXPR_NULL;
   c->expr->ts.is_iso_c = 1;
-  /* Must declare c_null_ptr and c_null_funptr as having the
- PARAMETER attribute so they can be used in init expressions.  */
-  tmp_sym->attr.flavor = FL_PARAMETER;
 
   return SUCCESS;
 }
--- /dev/null	2011-11-28 07:33:54.995528670 +0100
+++ gcc/gcc/testsuite/gfortran.dg/iso_c_binding_compiler_4.f90	2011-11-28 14:27:05.0 +0100
@@ -0,0 +1,18 @@
+! { dg-do compile }
+!
+! PR fortran/51308
+!
+! Contributed by Matthias Moeller
+!
+
+module mymod
+  use iso_c_binding
+  implicit none
+
+  private
+  public :: c_ptr
+  public :: c_null_ptr
+
+end module mymod
+
! { dg-final { cleanup-modules "mymod" } }


Re: [ARM] Rewrite ldrex/strex support for atomic optabs

2011-11-28 Thread David Gilbert
On 23 November 2011 23:43, Richard Henderson r...@redhat.com wrote:
 This transformation is quite a bit more dramatic than the other ports because 
 ARM was not splitting the code sequences post-reload.  Indeed, the failure to 
 split resulted in a distinctly odd coding style where fake output routines 
 were used to compute the length of the code sequence.  This all seemed highly 
 sub-optimal, so I rewrote everything from scratch.

 This has passed initial sniff test from a cross-compiler, and is undergoing 
 full testing on an armv7 host.  Hopefully results will be ready tomorrow.

 In the meantime, please review.

Hi Richard,
  Can you explain the code:

+  if (mod_f != MEMMODEL_RELAXED)
+emit_label (label2);
+
+  arm_post_atomic_barrier (mod_s);
+
+  if (mod_f == MEMMODEL_RELAXED)
+emit_label (label2);
+}

in the case of the existing __sync_* builtins, will it always end up
doing the label and the sync as Michael's PR 48126 indicated and as my
patch moved it?

(It's not at all clear to me what the ordering requirements of
ldrex/strex are from the ARM ARM).

Dave


Ping * [Patch] Take 2: Move Objective-C runtime flags to modern options system.

2011-11-28 Thread Iain Sandoe

Hi Joseph,

Mike has OK'd this ...

http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01672.html

.. from the ObjC and Darwin points of view (and it implements your  
suggested changes from the review of version 1).


Is it now OK for trunk?
thanks
Iain

On 14 Nov 2011, at 20:30, Iain Sandoe wrote:


V2 is simpler than

http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01335.html

because (using the approach suggested by Joseph) we can initialize  
flag_next_runtime early enough to avoid having to deal with any  
other flags.


[I'll still get rid of the flag_objc_sjlj_exceptions flag in a   
follow-up patch.]


So, this just shifts flag_next_runtime to the modern system.

OK for trunk?
Iain

gcc:

* doc/tm.texi.in (NEXT_OBJC_RUNTIME): Update macro description.
* doc/tm.texi: Regenerate.
* flags.h (flag_next_runtime): Remove references.
* toplev.c: Likewise.
* defaults.h (NEXT_OBJC_RUNTIME): Provide default.
* gccspec.c (lang_specific_driver): Adjust use of NEXT_OBJC_RUNTIME.
	* config/darwin.c (darwin_override_options): Provide default
	Objective-C ABI settings and target conflict checks.
* config/darwin.h   (SUBTARGET_C_COMMON_OVERRIDE_OPTIONS):
Remove unused code.  Adjust indenting.

gcc/c-family:

* c.opt (fgnu-runtime): Provide full description.
(fnext-runtime): Likewise.
* c-opts.c (OPT_fgnu_runtime, OPT_fnext_runtime) Remove.




Re: Fix PR51298, libgomp barrier failure

2011-11-28 Thread Alan Modra
On Mon, Nov 28, 2011 at 08:09:01PM +1030, Alan Modra wrote:
 On Mon, Nov 28, 2011 at 09:13:27AM +0100, Jakub Jelinek wrote:
  On Mon, Nov 28, 2011 at 10:12:09AM +1030, Alan Modra wrote:
   --- libgomp/config/linux/bar.c(revision 181718)
   +++ libgomp/config/linux/bar.c(working copy)
   @@ -36,18 +36,15 @@ gomp_barrier_wait_end (gomp_barrier_t *b
  if (__builtin_expect ((state & 1) != 0, 0))
{
  /* Next time we'll be awaiting TOTAL threads again.  */
   -  bar->awaited = bar->total;
   -  atomic_write_barrier ();
   -  bar->generation += 4;
   +  __atomic_store_4 (&bar->awaited, bar->total, MEMMODEL_RELEASE);
   +  __atomic_add_fetch (&bar->generation, 4, MEMMODEL_ACQ_REL);
  futex_wake ((int *) &bar->generation, INT_MAX);
}
  
  The above is unfortunate: bar->generation should only be modified from a
  single thread at this point, but the __atomic_add_fetch will force an
  atomic instruction there rather than just some kind of barrier.
 
 I will try without the atomic add and see what happens.

Seems to be fine.  I took out the extra write barriers too, so we now
just have two MEMMODEL_RELEASE atomic store barriers replacing the two
old atomic_write_barriers, and a number of new acquire barriers.

PR libgomp/51298
* config/linux/bar.h: Use atomic rather than sync builtins.
* config/linux/bar.c: Likewise.  Add missing acquire
synchronisation on generation field.
* task.c (gomp_barrier_handle_tasks): Regain lock so as to not
double unlock.

Index: libgomp/task.c
===
--- libgomp/task.c  (revision 181718)
+++ libgomp/task.c  (working copy)
@@ -273,6 +273,7 @@ gomp_barrier_handle_tasks (gomp_barrier_
  gomp_team_barrier_done (&team->barrier, state);
  gomp_mutex_unlock (&team->task_lock);
  gomp_team_barrier_wake (&team->barrier, 0);
+ gomp_mutex_lock (&team->task_lock);
}
}
 }
Index: libgomp/config/linux/bar.h
===
--- libgomp/config/linux/bar.h  (revision 181770)
+++ libgomp/config/linux/bar.h  (working copy)
@@ -50,7 +50,7 @@ static inline void gomp_barrier_init (go
 
 static inline void gomp_barrier_reinit (gomp_barrier_t *bar, unsigned count)
 {
-  __sync_fetch_and_add (&bar->awaited, count - bar->total);
+  __atomic_add_fetch (&bar->awaited, count - bar->total, MEMMODEL_ACQ_REL);
   bar->total = count;
 }
 
@@ -69,10 +69,8 @@ extern void gomp_team_barrier_wake (gomp
 static inline gomp_barrier_state_t
 gomp_barrier_wait_start (gomp_barrier_t *bar)
 {
-  unsigned int ret = bar->generation & ~3;
-  /* Do we need any barrier here or is __sync_add_and_fetch acting
- as the needed LoadLoad barrier already?  */
-  ret += __sync_add_and_fetch (&bar->awaited, -1) == 0;
+  unsigned int ret = __atomic_load_4 (&bar->generation, MEMMODEL_ACQUIRE) & ~3;
+  ret += __atomic_add_fetch (&bar->awaited, -1, MEMMODEL_ACQ_REL) == 0;
   return ret;
 }
 
Index: libgomp/config/linux/bar.c
===
--- libgomp/config/linux/bar.c  (revision 181770)
+++ libgomp/config/linux/bar.c  (working copy)
@@ -37,17 +37,15 @@ gomp_barrier_wait_end (gomp_barrier_t *b
 {
   /* Next time we'll be awaiting TOTAL threads again.  */
   bar->awaited = bar->total;
-  atomic_write_barrier ();
-  bar->generation += 4;
+  __atomic_store_4 (&bar->generation, bar->generation + 4,
+   MEMMODEL_RELEASE);
   futex_wake ((int *) &bar->generation, INT_MAX);
 }
   else
 {
-  unsigned int generation = state;
-
   do
-   do_wait ((int *) &bar->generation, generation);
-  while (bar->generation == generation);
+   do_wait ((int *) &bar->generation, state);
+  while (__atomic_load_4 (&bar->generation, MEMMODEL_ACQUIRE) == state);
 }
 }
 
@@ -81,15 +79,15 @@ gomp_team_barrier_wake (gomp_barrier_t *
 void
 gomp_team_barrier_wait_end (gomp_barrier_t *bar, gomp_barrier_state_t state)
 {
-  unsigned int generation;
+  unsigned int generation, gen;
 
   if (__builtin_expect ((state & 1) != 0, 0))
 {
   /* Next time we'll be awaiting TOTAL threads again.  */
   struct gomp_thread *thr = gomp_thread ();
   struct gomp_team *team = thr->ts.team;
+
   bar->awaited = bar->total;
-  atomic_write_barrier ();
   if (__builtin_expect (team->task_count, 0))
{
  gomp_barrier_handle_tasks (state);
@@ -97,7 +95,7 @@ gomp_team_barrier_wait_end (gomp_barrier
}
   else
{
- bar->generation = state + 3;
+ __atomic_store_4 (&bar->generation, state + 3, MEMMODEL_RELEASE);
  futex_wake ((int *) &bar->generation, INT_MAX);
  return;
}
@@ -107,12 +105,16 @@ gomp_team_barrier_wait_end (gomp_barrier
   do
 {
   do_wait ((int *) &bar->generation, generation);
-  if 

Re: [Patch, Fortran] PR51308 - fix bogus c_null_ptr error

2011-11-28 Thread Tobias Burnus

On 11/28/2011 02:37 PM, Tobias Burnus wrote:

Build and regtested on x86-64-linux.
OK for the trunk?


Steve approved the patch in the PR; thus, I have committed it as Rev. 
181778.


Tobias


[PATCH] Re: [v3] updated atomic configury

2011-11-28 Thread Andrew MacLeod

On 11/22/2011 03:48 PM, Joseph S. Myers wrote:

On Tue, 22 Nov 2011, Andrew MacLeod wrote:



It certainly does have those types, as typedefs defined in <uchar.h> for
the same types as uint_least16_t and uint_least32_t.  And <stdatomic.h> is
also required to define ATOMIC_CHAR16_T_LOCK_FREE and
ATOMIC_CHAR32_T_LOCK_FREE.

but there isn't a stdatomic.h until we support C1x, right? didn't we bail on

It still seems better to predefine the macros for those types, using
char16_type_node and char32_type_node, using them initially for C++ and
then later for C as well.

OK, here's a reworked patch.  This bootstraps but there is a 
regression... one set of PCH tests fails in a very odd way. I've 
investigated it and it turns out that, for some reason, ptr_type_node 
doesn't always have its size set if code isn't being generated (all the 
other nodes do)... in particular, I can reproduce it on a small test case 
on my x86-64 machine, where the size is 8 when generating code and 0 if -E 
is used to output preprocessed source.  I'm looking into that today.


Question: the purpose of the testcase in the patch, macros.c... is it 
simply to test for standard compliance, or to test both compliance PLUS 
what we expect?  I made the change assuming the latter.  We will always 
generate a 1 or a 2 for the macros, and we also want the macros to now 
be satisfied at compile time so they can be used in #if statements like 
bkoz was trying to do.


Andrew





c-family
* c-cppbuiltin.c (cpp_atomic_builtins): New.  Emit all atomic
predefines in one place.  Add LOCK_FREE predefines.
(c_cpp_builtins): Move Legacy HAVE_SYNC predefines to
new func.

libstdc++-v3
* include/bits/atomic_base.h (ATOMIC_*_LOCK_FREE): Use new cpp
predefined macros.
* testsuite/29_atomics/headers/atomic/macros.cc: Add BOOL and POINTER
macro checks.  Check for expected compile time values.

Index: gcc/c-family/c-cppbuiltin.c
===
*** gcc/c-family/c-cppbuiltin.c (revision 181698)
--- gcc/c-family/c-cppbuiltin.c (working copy)
*** c_cpp_builtins_optimize_pragma (cpp_read
*** 568,573 
--- 568,677 
  }
  
  
+ /* This function will emit cpp macros to indicate the presence of various lock
+free atomic operations.  */
+
+ static void
+ cpp_atomic_builtins (cpp_reader *pfile)
+ {
+   /* Set a flag for each size of object that compare and swap exists for up to
+  a 16 byte object.  */
+ #define SWAP_LIMIT  17
+   bool have_swap[SWAP_LIMIT];
+ 
+   /* Clear the map of sizes compare_and_swap exists for.  */
+   memset (have_swap, 0, sizeof (have_swap));
+ 
+   /* Tell source code if the compiler makes sync_compare_and_swap
+  builtins available.  */
+ #ifndef HAVE_sync_compare_and_swapqi
+ #define HAVE_sync_compare_and_swapqi 0
+ #endif
+ #ifndef HAVE_atomic_compare_and_swapqi
+ #define HAVE_atomic_compare_and_swapqi 0
+ #endif
+ 
+   if (HAVE_sync_compare_and_swapqi || HAVE_atomic_compare_and_swapqi)
+ {
+   cpp_define (pfile, "__GCC_HAVE_SYNC_COMPARE_AND_SWAP_1");
+   have_swap[1] = true;
+ }
+ 
+ #ifndef HAVE_sync_compare_and_swaphi
+ #define HAVE_sync_compare_and_swaphi 0
+ #endif
+ #ifndef HAVE_atomic_compare_and_swaphi
+ #define HAVE_atomic_compare_and_swaphi 0
+ #endif
+   if (HAVE_sync_compare_and_swaphi || HAVE_atomic_compare_and_swaphi)
+ {
+   cpp_define (pfile, "__GCC_HAVE_SYNC_COMPARE_AND_SWAP_2");
+   have_swap[2] = true;
+ }
+ 
+ #ifndef HAVE_sync_compare_and_swapsi
+ #define HAVE_sync_compare_and_swapsi 0
+ #endif
+ #ifndef HAVE_atomic_compare_and_swapsi
+ #define HAVE_atomic_compare_and_swapsi 0
+ #endif
+   if (HAVE_sync_compare_and_swapsi || HAVE_atomic_compare_and_swapsi)
+ {
+   cpp_define (pfile, "__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4");
+   have_swap[4] = true;
+ }
+ 
+ #ifndef HAVE_sync_compare_and_swapdi
+ #define HAVE_sync_compare_and_swapdi 0
+ #endif
+ #ifndef HAVE_atomic_compare_and_swapdi
+ #define HAVE_atomic_compare_and_swapdi 0
+ #endif
+   if (HAVE_sync_compare_and_swapdi || HAVE_atomic_compare_and_swapdi)
+ {
+   cpp_define (pfile, "__GCC_HAVE_SYNC_COMPARE_AND_SWAP_8");
+   have_swap[8] = true;
+ }
+ 
+ #ifndef HAVE_sync_compare_and_swapti
+ #define HAVE_sync_compare_and_swapti 0
+ #endif
+ #ifndef HAVE_atomic_compare_and_swapti
+ #define HAVE_atomic_compare_and_swapti 0
+ #endif
+   if (HAVE_sync_compare_and_swapti || HAVE_atomic_compare_and_swapti)
+ {
+   cpp_define (pfile, "__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16");
+   have_swap[16] = true;
+ }
+ 
+   /* Tell the source code about various types.  These map to the C++11 and C1x
+  macros where 2 indicates lock-free always, and 1 indicates sometimes
+  lock free.  */
+ #define SIZEOF_NODE(T) (tree_low_cst (TYPE_SIZE_UNIT (T), 1))
+ #define SWAP_INDEX(T) ((SIZEOF_NODE (T) < SWAP_LIMIT) ? SIZEOF_NODE (T) : 0)
+   builtin_define_with_int_value 

Re: [build] Cleanup rs6000/t-ppccomm configurations (PR other/51022)

2011-11-28 Thread Rainer Orth
Alan Modra amo...@gmail.com writes:

 On Sun, Nov 27, 2011 at 08:36:38PM -0500, David Edelsohn wrote:
 I was hoping that other targets affected by the patch would test it as well.

 powerpc-linux and powerpc64-linux look good.

Thanks for the confirmation.  However, Andreas Tobler pointed out in
private mail that I'd missed referencing rs6000/t-linux64 during the
libgcc migration.  Strange that this seems to have no ill effect.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: Fix libgomp semaphores

2011-11-28 Thread Jakub Jelinek
On Mon, Nov 28, 2011 at 09:23:43AM +1030, Alan Modra wrote:
 +  int count = *sem;
 +
 +  while ((count & 0x7fff) != 0)
 +{
 +  int oldval = count;
 +  __atomic_compare_exchange_4 (sem, &oldval, count - 1,
 +false, MEMMODEL_ACQUIRE, MEMMODEL_RELAXED);
 +  if (__builtin_expect (oldval == count, 1))
 + return;

Why aren't you instead testing the return value of
__atomic_compare_exchange_4 (here and in other cases)?
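Jakub's suggestion, sketched with the generic __atomic_compare_exchange_n builtin; the function name and the futex-less fast path are illustrative, not Alan's actual patch:

```c
#include <stdbool.h>

/* Hypothetical semaphore fast path: use the boolean result of the
   compare-exchange instead of re-comparing the old value by hand.  */
void
sem_wait_fast (int *sem)
{
  int count = __atomic_load_n (sem, __ATOMIC_RELAXED);

  while ((count & 0x7fffffff) != 0)
    {
      /* On failure, COUNT is updated with the value actually seen in
         *SEM, so the loop re-tests without issuing another load.  */
      if (__atomic_compare_exchange_n (sem, &count, count - 1, false,
                                       __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
        return;
    }
  /* Slow path (futex wait) would go here.  */
}
```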


Jakub


Re: [libitm, build] Clear hardware capabilities on libitm.so with Sun ld

2011-11-28 Thread Rainer Orth
Richard Henderson r...@redhat.com writes:

 On 11/21/2011 05:53 AM, Rainer Orth wrote:
 The libitm execution tests are currently failing on Solaris 10 and up
 with Sun as/ld:
 
 ld.so.1: cancel.exe: fatal: 
 /var/gcc/regression/trunk/11-gcc/build/i386-pc-solaris2.11/./libitm/.libs/libitm.so.0:
  hardware capability (CA_SUNW_HW_1) unsupported: 0x2000  [ AVX ]
 FAIL: libitm.c/cancel.c execution test
 
 This is the same issue solved by
 gcc/testsuite/gcc.target/i386/clearcap.map, and the following patch
 addresses it in the same way:
 
 * Detect if the linker used supports -M map file.
 
 * Use it when linking libitm.so.
 
 Right now, it is only possible to clear the hardware capabilities
 completely, while the new v2 mapfile syntax supports selectively adding
 and removing capabilities.  It is only available in Solaris 11 and
 Solaris 10 Update 10, though, so I'm restricting us to the v1 syntax for
 now.

 This is only OK if the compiler and library are built with default options.
 If you use --with-arch=corei7-avx then we may well use AVX insns all through
 the library, not just in the one interface that will only be used if the 
 user of the library is using AVX.

 Can you auto-foo this based on CC+CFLAGS?  E.g.  compile-time tests for the
 various __SSE__ / __AVX__ macros?

My first attempt was to apply the equivalent of ld -r -M clearcap.map to
x86_avx.lo and x86_sse.lo.  While this works fine if calling ld -r
directly, you need to get this through libtool and this is where things
start breaking in all sorts of ways:

I tried

  $ libtool --mode=link --tag=CC gcc -r -Wl,-M,clearcap.map -o x86_nhc.lo 
x86_sse.lo
  libtool: link: collect-ld -r -o x86_nhc.o  .libs/x86_sse.o  
  libtool: link: collect-ld -r -o x86_nhc.lo .libs/x86_sse.o 

This is of no use at all since -Wl,-M,clearcap.map is lost completely
and I've found no way to get this through libtool at all.  This beast
tries to be smart and messes everything up completely.  But even if I
could get that part to work, what happens is still wrong: the .lo file
is a real object file now, not the usual text file, so once I try to
link the shared lib, libtool rightly complains.  While the libtool docs
claim that this relinking (.lo -> .lo) should work, it obviously does
not, even with libtool 2.2.4 ;-(

So unless I'm overlooking something, I'll have to go the route you
outlined above and check for __SSE__/__AVX__ to make the decision.
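A minimal sketch of such a probe (the helper names are invented; the point is only that __SSE__/__AVX__ reflect what the compiler's default flags will generate, so a build-time check can decide whether clearing the hardware-capability tags is safe):

```c
/* Hypothetical configure-time probe: if the whole library is compiled
   with AVX or SSE enabled by default, it may contain such insns
   everywhere, and the hwcap tags must not be cleared.  */
int
default_isa_uses_avx (void)
{
#ifdef __AVX__
  return 1;
#else
  return 0;
#endif
}

int
default_isa_uses_sse (void)
{
#ifdef __SSE__
  return 1;
#else
  return 0;
#endif
}
```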

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: Ping * [Patch] Take 2: Move Objective-C runtime flags to modern options system.

2011-11-28 Thread Joseph S. Myers
On Mon, 28 Nov 2011, Iain Sandoe wrote:

 Hi Joseph,
 
 Mike has OK'd this ...
 
 http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01672.html
 
 .. from the ObjC and Darwin points of view (and it implements your suggested
 changes from the review of version 1).
 
 Is it now OK for trunk?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


RE: Re: [Ping] RE: CR16 Port addition

2011-11-28 Thread Joseph S. Myers
You still have libgcc source files under gcc/config/ (cr16-libgcc.s, 
divmodhi3.c).  You still have libgcc-related settings / makefile targets 
in gcc/config/cr16/t-cr16.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Re: [v3] updated atomic configury

2011-11-28 Thread Joseph S. Myers
On Mon, 28 Nov 2011, Andrew MacLeod wrote:

 OK, here's a reworked patch.  This bootstraps but there is a regression... one
 set of PCH tests fail in a very odd way. I've investigated it and it turns out
 that for some reason, ptr_type_node doesn't always have its size set if code
 isn't being generated (all the other nodes do)... in particular I can
 reproduce it on a small test case on my x86-64 machine where size is 8 when
 generating code, and 0 if -E is used to output pre-processed source.  I'm
 looking into that today.

That would be why cppbuiltin.c, defining __SIZEOF_POINTER__, needs to use 
the POINTER_SIZE macro instead of directly using ptr_type_node.  You may 
need to do something similar.

  /* ptr_type_node can't be used here since ptr_mode is only set when
 toplev calls backend_init which is not done with -E switch.  */
  cpp_define_formatted (pfile, __SIZEOF_POINTER__=%d,
POINTER_SIZE / BITS_PER_UNIT);

 Question, the purpose of the testcase in the patch  macros.c...  is it to
 simply test for standard compliance, or to test both compliance PLUS what we
 expect?  I made the change assuming the latter.  We will always generate a 1
 or a 2 for the macros, and we also want the macros to now be compile time
 satisfied so they can be used in #if statements like bkoz was trying to do.

Yes, test that the macros do what we expect.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Go patch committed: Multiplex goroutines onto OS threads

2011-11-28 Thread Uros Bizjak
Hello!

 This patch changes the Go library to multiplex goroutines onto operating
 system threads.  Previously, each new goroutine ran in a separate
 thread.  That is inefficient for programs with lots of goroutines.  This
 patch changes the library such that it runs a certain numbers of
 threads, and lets each thread switch between goroutines.  This is how
 the master Go library works, and this patch brings in code from the
 master Go library, adjusted for use by gccgo.

For some reason I get this failure on alphaev68-pc-linux-gnu:

--- FAIL: runtime_test.TestGcSys (4.64 seconds)
using 64 MB
using too much memory: 64 MB

Raising the value in runtime/gc_test.go to 10e8 runs the test OK.

Uros.


[PATCH] PR c++/51289 - ICE with alias template for bound template

2011-11-28 Thread Dodji Seketeli
Hello,

Consider this example:

template<typename a, template <typename, typename> class b>
struct foo {
  template <typename t>
  using type = b<a, t>;
  template <typename t>
  b<a, t> funca() {}

  template <typename t>
  type<t> funcb() {} //#1
};

In #1, when lookup_template_class_1 builds the type of type<t> we hit
a kind of plumbing issue; near the end of the 'if' below,

  if (arg_depth == 1 && parm_depth > 1)
    {
      /* We've been given an incomplete set of template arguments.
         For example, given:

           template <class T> struct S1 {
             template <class U> struct S2 {};
             template <class U> struct S2<U*> {};
           };

         we will be called with an ARGLIST of `U*', but the
         TEMPLATE will be `template <class T> template
         <class U> struct S1<T>::S2'.  We must fill in the missing
         arguments.  */
      arglist
        = add_outermost_template_args (TYPE_TI_ARGS (TREE_TYPE (templ)),
                                       arglist);
      arg_depth = TMPL_ARGS_DEPTH (arglist);
    }

we try to get the arguments of the alias template type using
TYPE_TI_ARGS, and that fails because TREE_TYPE (templ) is a
BOUND_TEMPLATE_TEMPLATE_PARM, and TEMPLATE_TYPE_INFO behaves
incorrectly for BOUND_TEMPLATE_TEMPLATE_PARM nodes when they represent
the underlying type of an aliased type.  We wrongly poke at the
arguments of the underlying BOUND_TEMPLATE_TEMPLATE_PARM instead.

Thus the patch below changes the TYPE_TEMPLATE_INFO accessor.

I also had to adjust find_parameter_packs_r to always get the template
info from the underlying type when faced with type aliases.
Otherwise, it can wrongly attempt to poke at arguments of the template
aliases even when these are not yet ready; this can happen e.g when
check_for_bare_parameter_packs is called from push_template_decl_real
before the template information for the alias template specialization
is in place.  Also I think we don't risk missing any parameter pack by
only considering the underlying type in the case of alias template
specializations.

Bootstrapped and tested on x86_64-unknown-linux-gnu against trunk.

gcc/cp/

PR c++/51289
* cp-tree.h (TYPE_TEMPLATE_INFO): Rewrite this accessor macro to
better support aliased types.
* pt.c ():

gcc/testsuite/

PR c++/51289
* g++.dg/cpp0x/alias-decl-17.C: New test.
---
 gcc/cp/cp-tree.h   |   27 +--
 gcc/cp/pt.c|6 --
 gcc/testsuite/g++.dg/cpp0x/alias-decl-17.C |   13 +
 3 files changed, 34 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/alias-decl-17.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 3f4f408..dece84e 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -2605,17 +2605,24 @@ extern void decl_shadowed_for_var_insert (tree, tree);
   (LANG_TYPE_CLASS_CHECK (BOUND_TEMPLATE_TEMPLATE_PARM_TYPE_CHECK (NODE)) \
 ->template_info)
 
-/* Template information for an ENUMERAL_, RECORD_, or UNION_TYPE.  */
+/* Template information for an ENUMERAL_, RECORD_, UNION_TYPE, or
+   BOUND_TEMPLATE_TEMPLATE_PARM type.  Note that if NODE is a
+   specialization of an alias template, this accessor returns the
+   template info for the alias template, not the one (if any) for the
+   template of the underlying type.  */
 #define TYPE_TEMPLATE_INFO(NODE)   \
-  (TREE_CODE (NODE) == ENUMERAL_TYPE   \
-   ? ENUM_TEMPLATE_INFO (NODE) :   \
-   (TREE_CODE (NODE) == BOUND_TEMPLATE_TEMPLATE_PARM   \
-? TEMPLATE_TEMPLATE_PARM_TEMPLATE_INFO (NODE) :\
-((CLASS_TYPE_P (NODE) && !TYPE_ALIAS_P (NODE)) \
- ? CLASSTYPE_TEMPLATE_INFO (NODE)  \
- : ((TYPE_NAME (NODE) && DECL_LANG_SPECIFIC (TYPE_NAME (NODE)))\
-   ? (DECL_TEMPLATE_INFO (TYPE_NAME (NODE)))   \
-   : NULL_TREE
+  (TYPE_ALIAS_P (NODE) \
+   ? ((TYPE_NAME (NODE) && DECL_LANG_SPECIFIC (TYPE_NAME (NODE)))  \
+  ? DECL_TEMPLATE_INFO (TYPE_NAME (NODE))  \
+  : NULL_TREE) \
+   : ((TREE_CODE (NODE) == ENUMERAL_TYPE)  \
+  ? ENUM_TEMPLATE_INFO (NODE)  \
+  : ((TREE_CODE (NODE) == BOUND_TEMPLATE_TEMPLATE_PARM)\
+? TEMPLATE_TEMPLATE_PARM_TEMPLATE_INFO (NODE)  \
+: (CLASS_TYPE_P (NODE) \
+   ? CLASSTYPE_TEMPLATE_INFO (NODE)\
+   : NULL_TREE
+
 
 /* Set the template information for an 

Re: [ARM] Rewrite ldrex/strex support for atomic optabs

2011-11-28 Thread Richard Henderson
On 11/28/2011 06:01 AM, David Gilbert wrote:
 Hi Richard,
   Can you explain the code:
 
 +  if (mod_f != MEMMODEL_RELAXED)
 +emit_label (label2);
 +
 +  arm_post_atomic_barrier (mod_s);
 +
 +  if (mod_f == MEMMODEL_RELAXED)
 +emit_label (label2);
 +}
 
 in the case of the existing __sync_*, will it always end up doing the
 label and the sync as Michael's PR 48126 indicated and my patch moved it?

This is a new feature of the user-level interface: one can specify both the 
memory model to be enforced on success and the memory model to be enforced on 
failure:

  bool
  compare_exchange_strong(_Tp& __e, _Tp __i, memory_order __s, 
  memory_order __f) noexcept
  {
    return __atomic_compare_exchange(&_M_i, &__e, &__i, false, __s, __f); 
  }

with the constraint that the failure model __f cannot be stronger than success 
model __s.

What the quoted code does is skip any final barrier if and only if the failure 
model is RELAXED.
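For illustration, the same success/failure split as seen from the C++11 user level; `try_claim` is an invented name and this is a minimal sketch, not libstdc++ code:

```cpp
#include <atomic>

// Success: acquire-release.  Failure: acquire only -- the failure order
// must be no stronger than the success order, and with a relaxed failure
// order the failed path would need no trailing barrier at all.
bool try_claim (std::atomic<int> &flag)
{
  int expected = 0;
  return flag.compare_exchange_strong (expected, 1,
                                       std::memory_order_acq_rel,
                                       std::memory_order_acquire);
}
```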


r~


[PATCH v2 0/5] Convert Sparc to atomic optabs

2011-11-28 Thread Richard Henderson
Changes v1 -> v2:
  * DImode fixes.
  * Use the new atomic_test_and_set optab for LDSTUB.
  * Change -mmmodel to -mmemory-model and document it.


r~


Richard Henderson (5):
  sparc: Convert to mem_thread_fence.
  sparc: Convert to atomic_load/store.
  sparc: Convert to atomic_compare_and_swap.
  sparc: Convert to atomic_exchange.
  sparc: Add -mmemory-model command-line option.

 gcc/config/sparc/constraints.md |5 +-
 gcc/config/sparc/predicates.md  |   13 ++
 gcc/config/sparc/sparc-opts.h   |   10 ++
 gcc/config/sparc/sparc-protos.h |4 +-
 gcc/config/sparc/sparc.c|  149 ++-
 gcc/config/sparc/sparc.md   |1 +
 gcc/config/sparc/sparc.opt  |   22 
 gcc/config/sparc/sync.md|  258 +--
 gcc/doc/invoke.texi |   74 ---
 9 files changed, 420 insertions(+), 116 deletions(-)

-- 
1.7.6.4



[PATCH 1/5] sparc: Convert to mem_thread_fence.

2011-11-28 Thread Richard Henderson
---
 gcc/config/sparc/predicates.md  |4 ++
 gcc/config/sparc/sparc-protos.h |2 +
 gcc/config/sparc/sparc.c|   47 +++
 gcc/config/sparc/sync.md|   79 ---
 4 files changed, 118 insertions(+), 14 deletions(-)

diff --git a/gcc/config/sparc/predicates.md b/gcc/config/sparc/predicates.md
index 4dd734f..047b217 100644
--- a/gcc/config/sparc/predicates.md
+++ b/gcc/config/sparc/predicates.md
@@ -111,6 +111,10 @@
 (define_predicate "const_double_or_vector_operand"
   (match_code "const_double,const_vector"))
 
+;; Return true if OP is Zero, or if the target is V7.
+(define_predicate "zero_or_v7_operand"
+  (ior (match_test "op == const0_rtx")
+       (match_test "!TARGET_V8 && !TARGET_V9")))
 
 ;; Predicates for symbolic constants.
 
diff --git a/gcc/config/sparc/sparc-protos.h b/gcc/config/sparc/sparc-protos.h
index 10fa5ed..b292024 100644
--- a/gcc/config/sparc/sparc-protos.h
+++ b/gcc/config/sparc/sparc-protos.h
@@ -113,4 +113,6 @@ unsigned int sparc_regmode_natural_size (enum machine_mode);
 bool sparc_modes_tieable_p (enum machine_mode, enum machine_mode);
 #endif /* RTX_CODE */
 
+extern void sparc_emit_membar_for_model (enum memmodel, int, int);
+
 #endif /* __SPARC_PROTOS_H__ */
diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index b315698..c5c4899 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -10849,6 +10849,53 @@ sparc_mangle_type (const_tree type)
 }
 #endif
 
+/* Expand a membar instruction for various use cases.  Both the LOAD_STORE
+   and BEFORE_AFTER arguments are of the form X_Y.  They are two-bit masks where
+   bit 0 indicates that X is true, and bit 1 indicates Y is true.  */
+
+void
+sparc_emit_membar_for_model (enum memmodel model,
+int load_store, int before_after)
+{
+  /* Bits for the MEMBAR mmask field.  */
+  const int LoadLoad = 1;
+  const int StoreLoad = 2;
+  const int LoadStore = 4;
+  const int StoreStore = 8;
+
+  int mm = 0;
+
+  if (before_after & 1)
+{
+  if (model == MEMMODEL_ACQUIRE
+  || model == MEMMODEL_ACQ_REL
+  || model == MEMMODEL_SEQ_CST)
+   {
+ if (load_store & 1)
+   mm |= LoadLoad | LoadStore;
+ if (load_store & 2)
+   mm |= StoreLoad | StoreStore;
+   }
+}
+  if (before_after & 2)
+{
+  if (model == MEMMODEL_RELEASE
+ || model == MEMMODEL_ACQ_REL
+ || model == MEMMODEL_SEQ_CST)
+   {
+ if (load_store & 1)
+   mm |= LoadLoad | StoreLoad;
+ if (load_store & 2)
+   mm |= LoadStore | StoreStore;
+   }
+}
+
+  /* For raw barriers (before+after), always emit a barrier.
+ This will become a compile-time barrier if needed.  */
+  if (mm || before_after == 3)
+emit_insn (gen_membar (GEN_INT (mm)));
+}
+
 /* Expand code to perform a 8 or 16-bit compare and swap by doing 32-bit
compare and swap on the word containing the byte or half-word.  */
 
diff --git a/gcc/config/sparc/sync.md b/gcc/config/sparc/sync.md
index a7380ab..e22f516 100644
--- a/gcc/config/sparc/sync.md
+++ b/gcc/config/sparc/sync.md
@@ -1,5 +1,5 @@
 ;; GCC machine description for SPARC synchronization instructions.
-;; Copyright (C) 2005, 2007, 2009, 2010
+;; Copyright (C) 2005, 2007, 2009, 2010, 2011
 ;; Free Software Foundation, Inc.
 ;;
 ;; This file is part of GCC.
@@ -23,36 +23,87 @@
 (define_mode_iterator I48MODE [SI (DI "TARGET_ARCH64 || TARGET_V8PLUS")])
 (define_mode_attr modesuffix [(SI "") (DI "x")])
 
+(define_expand "mem_thread_fence"
+  [(match_operand:SI 0 "const_int_operand")]
+  "TARGET_V8 || TARGET_V9"
+{
+  enum memmodel model = (enum memmodel) INTVAL (operands[0]);
+  sparc_emit_membar_for_model (model, 3, 3);
+  DONE;
+})
+
 (define_expand "memory_barrier"
-  [(set (match_dup 0)
-	(unspec:BLK [(match_dup 0)] UNSPEC_MEMBAR))]
+  [(const_int 0)]
+  "TARGET_V8 || TARGET_V9"
+{
+  sparc_emit_membar_for_model (MEMMODEL_SEQ_CST, 3, 3);
+  DONE;
+})
+
+(define_expand "membar"
+  [(set (match_dup 1)
+	(unspec:BLK [(match_dup 1)
+		     (match_operand:SI 0 "const_int_operand")]
+		    UNSPEC_MEMBAR))]
   "TARGET_V8 || TARGET_V9"
 {
-  operands[0] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
-  MEM_VOLATILE_P (operands[0]) = 1;
+  operands[1] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
+  MEM_VOLATILE_P (operands[1]) = 1;
 })
 
-;; In V8, loads are blocking and ordered wrt earlier loads, i.e. every load
-;; is virtually followed by a load barrier (membar #LoadStore | #LoadLoad).
-;; In PSO, stbar orders the stores (membar #StoreStore).
-;; In TSO, ldstub orders the stores wrt subsequent loads (membar #StoreLoad).
-;; The combination of the three yields a full memory barrier in all cases.
+;; A compiler-only memory barrier.  Generic code, when checking for the
+;; existence of various named patterns, uses asm ("" ::: "memory") when we
+;; don't need an actual instruction.  Here, it's easiest to pretend 

[PATCH 2/5] sparc: Convert to atomic_load/store.

2011-11-28 Thread Richard Henderson
---
 gcc/config/sparc/predicates.md |5 +++
 gcc/config/sparc/sparc.md  |1 +
 gcc/config/sparc/sync.md   |   59 
 3 files changed, 65 insertions(+), 0 deletions(-)

diff --git a/gcc/config/sparc/predicates.md b/gcc/config/sparc/predicates.md
index 047b217..a9207be 100644
--- a/gcc/config/sparc/predicates.md
+++ b/gcc/config/sparc/predicates.md
@@ -239,6 +239,11 @@
   (ior (match_operand 0 "register_operand")
	(match_operand 0 "const_zero_operand")))
 
+(define_predicate "register_or_v9_zero_operand"
+  (ior (match_operand 0 "register_operand")
+       (and (match_test "TARGET_V9")
+	    (match_operand 0 "const_zero_operand"))))
+
 ;; Return true if OP is either the zero constant, the all-ones
 ;; constant, or a register.
 (define_predicate register_or_zero_or_all_ones_operand
diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md
index c059dc5..37ac170 100644
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@@ -41,6 +41,7 @@
(UNSPEC_MOVE_GOTDATA19)
 
(UNSPEC_MEMBAR  20)
+   (UNSPEC_ATOMIC  21)
 
(UNSPEC_TLSGD   30)
(UNSPEC_TLSLDM  31)
diff --git a/gcc/config/sparc/sync.md b/gcc/config/sparc/sync.md
index e22f516..e2f4596 100644
--- a/gcc/config/sparc/sync.md
+++ b/gcc/config/sparc/sync.md
@@ -102,6 +102,65 @@
   "membar\t%1"
   [(set_attr "type" "multi")])
 
+(define_expand "atomic_load<mode>"
+  [(match_operand:I 0 "register_operand" "")
+   (match_operand:I 1 "memory_operand" "")
+   (match_operand:SI 2 "const_int_operand" "")]
+  ""
+{
+  enum memmodel model = (enum memmodel) INTVAL (operands[2]);
+
+  sparc_emit_membar_for_model (model, 1, 1);
+
+  if (TARGET_ARCH64 || <MODE>mode != DImode)
+    emit_move_insn (operands[0], operands[1]);
+  else
+    emit_insn (gen_atomic_loaddi_1 (operands[0], operands[1]));
+
+  sparc_emit_membar_for_model (model, 1, 2);
+  DONE;
+})
+
+(define_insn "atomic_loaddi_1"
+  [(set (match_operand:DI 0 "register_operand" "=U,?*f")
+	(unspec:DI [(match_operand:DI 1 "memory_operand" "m,m")]
+		   UNSPEC_ATOMIC))]
+  "!TARGET_ARCH64"
+  "ldd\t%1, %0"
+  [(set_attr "type" "load,fpload")])
+
+(define_expand "atomic_store<mode>"
+  [(match_operand:I 0 "register_operand" "")
+   (match_operand:I 1 "memory_operand" "")
+   (match_operand:SI 2 "const_int_operand" "")]
+  ""
+{
+  enum memmodel model = (enum memmodel) INTVAL (operands[2]);
+
+  sparc_emit_membar_for_model (model, 2, 1);
+
+  if (TARGET_ARCH64 || <MODE>mode != DImode)
+    emit_move_insn (operands[0], operands[1]);
+  else
+    emit_insn (gen_atomic_storedi_1 (operands[0], operands[1]));
+
+  sparc_emit_membar_for_model (model, 2, 2);
+  DONE;
+})
+
+(define_insn "atomic_storedi_1"
+  [(set (match_operand:DI 0 "memory_operand" "=m,m,m")
+	(unspec:DI
+	  [(match_operand:DI 1 "register_or_v9_zero_operand" "J,U,?*f")]
+	  UNSPEC_ATOMIC))]
+  "!TARGET_ARCH64"
+  "@
+   stx\t%r1, %0
+   std\t%1, %0
+   std\t%1, %0"
+  [(set_attr "type" "store,store,fpstore")
+   (set_attr "cpu_feature" "v9,*,*")])
+
 
 
 (define_expand "sync_compare_and_swap<mode>"
-- 
1.7.6.4



[PATCH 4/5] sparc: Convert to atomic_exchange.

2011-11-28 Thread Richard Henderson
---
 gcc/config/sparc/sync.md |   90 +++--
 1 files changed, 30 insertions(+), 60 deletions(-)

diff --git a/gcc/config/sparc/sync.md b/gcc/config/sparc/sync.md
index d7066b0..be8c4c4 100644
--- a/gcc/config/sparc/sync.md
+++ b/gcc/config/sparc/sync.md
@@ -19,6 +19,7 @@
 ;; http://www.gnu.org/licenses/.
 
 (define_mode_iterator I12MODE [QI HI])
+(define_mode_iterator I124MODE [QI HI SI])
 (define_mode_iterator I24MODE [HI SI])
 (define_mode_iterator I48MODE [SI (DI "TARGET_ARCH64 || TARGET_V8PLUS")])
 (define_mode_attr modesuffix [(SI "") (DI "x")])
@@ -32,14 +33,6 @@
   DONE;
 })
 
-(define_expand memory_barrier
-  [(const_int 0)]
-  TARGET_V8 || TARGET_V9
-{
-  sparc_emit_membar_for_model (MEMMODEL_SEQ_CST, 3, 3);
-  DONE;
-})
-
 (define_expand membar
   [(set (match_dup 1)
(unspec:BLK [(match_dup 1)
@@ -224,46 +217,22 @@
   [(set_attr type multi)
(set_attr length 8)])
 
-
-
-(define_expand "sync_lock_test_and_set<mode>"
-  [(match_operand:I12MODE 0 "register_operand" "")
-   (match_operand:I12MODE 1 "memory_operand" "")
-   (match_operand:I12MODE 2 "arith_operand" "")]
-  "!TARGET_V9"
+(define_expand "atomic_exchangesi"
+  [(match_operand:SI 0 "register_operand" "")
+   (match_operand:SI 1 "memory_operand" "")
+   (match_operand:SI 2 "register_operand" "")
+   (match_operand:SI 3 "const_int_operand" "")]
+  "TARGET_V8 || TARGET_V9"
 {
-  if (operands[2] != const1_rtx)
-    FAIL;
-  if (TARGET_V8)
-    emit_insn (gen_memory_barrier ());
-  if (<MODE>mode != QImode)
-    operands[1] = adjust_address (operands[1], QImode, 0);
-  emit_insn (gen_ldstub<mode> (operands[0], operands[1]));
-  DONE;
-})
+  enum memmodel model = (enum memmodel) INTVAL (operands[3]);
 
-(define_expand "sync_lock_test_and_setsi"
-  [(parallel
-     [(set (match_operand:SI 0 "register_operand" "")
-	   (unspec_volatile:SI [(match_operand:SI 1 "memory_operand" "")]
-			       UNSPECV_SWAP))
-      (set (match_dup 1)
-	   (match_operand:SI 2 "arith_operand" ""))])]
-  ""
-{
-  if (! TARGET_V8 && ! TARGET_V9)
-    {
-      if (operands[2] != const1_rtx)
-	FAIL;
-      operands[1] = adjust_address (operands[1], QImode, 0);
-      emit_insn (gen_ldstubsi (operands[0], operands[1]));
-      DONE;
-    }
-  emit_insn (gen_memory_barrier ());
-  operands[2] = force_reg (SImode, operands[2]);
+  sparc_emit_membar_for_model (model, 3, 1);
+  emit_insn (gen_swapsi (operands[0], operands[1], operands[2]));
+  sparc_emit_membar_for_model (model, 3, 2);
+  DONE;
 })
 
-(define_insn "*swapsi"
+(define_insn "swapsi"
   [(set (match_operand:SI 0 "register_operand" "=r")
	(unspec_volatile:SI [(match_operand:SI 1 "memory_operand" "+m")]
			    UNSPECV_SWAP))
@@ -273,24 +242,25 @@
   "swap\t%1, %0"
   [(set_attr "type" "multi")])
 
-(define_expand "ldstubqi"
-  [(parallel [(set (match_operand:QI 0 "register_operand" "")
-		   (unspec_volatile:QI [(match_operand:QI 1 "memory_operand" "")]
-				       UNSPECV_LDSTUB))
-	      (set (match_dup 1) (const_int -1))])]
+(define_expand "atomic_test_and_set<mode>"
+  [(match_operand:I124MODE 0 "register_operand" "")
+   (match_operand:I124MODE 1 "memory_operand" "")
+   (match_operand:SI 2 "const_int_operand" "")]
   
-  )
+{
+  enum memmodel model = (enum memmodel) INTVAL (operands[2]);
 
-(define_expand "ldstub<mode>"
-  [(parallel [(set (match_operand:I24MODE 0 "register_operand" "")
-		   (zero_extend:I24MODE
-		     (unspec_volatile:QI [(match_operand:QI 1 "memory_operand" "")]
-					 UNSPECV_LDSTUB)))
-	      (set (match_dup 1) (const_int -1))])]
-  ""
-  "")
+  sparc_emit_membar_for_model (model, 3, 1);
+
+  if (<MODE>mode != QImode)
+    operands[1] = adjust_address (operands[1], QImode, 0);
+  emit_insn (gen_ldstub<mode> (operands[0], operands[1]));
+
+  sparc_emit_membar_for_model (model, 3, 2);
+  DONE;
+})
 
-(define_insn "*ldstubqi"
+(define_insn "ldstubqi"
   [(set (match_operand:QI 0 "register_operand" "=r")
	(unspec_volatile:QI [(match_operand:QI 1 "memory_operand" "+m")]
			    UNSPECV_LDSTUB))
@@ -299,7 +269,7 @@
   "ldstub\t%1, %0"
   [(set_attr "type" "multi")])
 
-(define_insn "*ldstub<mode>"
+(define_insn "ldstub<mode>"
   [(set (match_operand:I24MODE 0 "register_operand" "=r")
	(zero_extend:I24MODE
	  (unspec_volatile:QI [(match_operand:QI 1 "memory_operand" "+m")]
-- 
1.7.6.4



[PATCH 5/5] sparc: Add -mmemory-model command-line option.

2011-11-28 Thread Richard Henderson
---
 gcc/config/sparc/sparc-opts.h |   10 +
 gcc/config/sparc/sparc.c  |   48 ++-
 gcc/config/sparc/sparc.opt|   22 
 gcc/doc/invoke.texi   |   74 ++--
 4 files changed, 134 insertions(+), 20 deletions(-)

diff --git a/gcc/config/sparc/sparc-opts.h b/gcc/config/sparc/sparc-opts.h
index 266cb14..7682eb9 100644
--- a/gcc/config/sparc/sparc-opts.h
+++ b/gcc/config/sparc/sparc-opts.h
@@ -47,4 +47,14 @@ enum processor_type {
   PROCESSOR_NATIVE
 };
 
+/* Sparc system memory model.  See Appendix D in the Sparc V9 manual
+   for formal specification, and Appendix J for more discussion.  */
+enum sparc_memory_model_type {
+  SMM_DEFAULT, /* Uninitialized.  */
+  SMM_RMO, /* Relaxed Memory Order.  */
+  SMM_PSO, /* Partial Store Order.  */
+  SMM_TSO, /* Total Store Order.  */
+  SMM_SC   /* Sequential Consistency.  */
+};
+
 #endif
diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index 7db216a..7785267 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -1160,6 +1160,17 @@ sparc_option_override (void)
   gcc_unreachable ();
 };
 
+  if (sparc_memory_model == SMM_DEFAULT)
+{
+  /* Choose the most relaxed model for the processor.  */
+  if (TARGET_V9)
+   sparc_memory_model = SMM_RMO;
+  else if (TARGET_V8)
+   sparc_memory_model = SMM_PSO;
+  else
+   sparc_memory_model = SMM_SC;
+}
+
 #ifdef TARGET_DEFAULT_LONG_DOUBLE_128
   if (!(target_flags_explicit  MASK_LONG_DOUBLE_128))
 target_flags |= MASK_LONG_DOUBLE_128;
@@ -10863,7 +10874,39 @@ sparc_emit_membar_for_model (enum memmodel model,
   const int LoadStore = 4;
   const int StoreStore = 8;
 
-  int mm = 0;
+  int mm = 0, implied = 0;
+
+  switch (sparc_memory_model)
+{
+case SMM_SC:
+  /* Sequential Consistency.  All memory transactions are immediately
+visible in sequential execution order.  No barriers needed.  */
+  return;
+
+case SMM_TSO:
+  /* Total Store Ordering: all memory transactions with store semantics
+are followed by an implied StoreStore.  */
+  implied |= StoreStore;
+  /* FALLTHRU */
+
+case SMM_PSO:
+  /* Partial Store Ordering: all memory transactions with load semantics
+are followed by an implied LoadLoad | LoadStore.  */
+  implied |= LoadLoad | LoadStore;
+
+  /* If we're not looking for a raw barrier (before+after), then atomic
+	 operations get the benefit of being both load and store.  */
+  if (load_store == 3 && before_after == 2)
+   implied |= StoreLoad | StoreStore;
+  /* FALLTHRU */
+
+case SMM_RMO:
+  /* Relaxed Memory Ordering: no implicit bits.  */
+  break;
+
+default:
+  gcc_unreachable ();
+}
 
   if (before_after  1)
 {
@@ -10890,6 +10933,9 @@ sparc_emit_membar_for_model (enum memmodel model,
}
 }
 
+  /* Remove the bits implied by the system memory model.  */
+  mm &= ~implied;
+
   /* For raw barriers (before+after), always emit a barrier.
  This will become a compile-time barrier if needed.  */
   if (mm || before_after == 3)
diff --git a/gcc/config/sparc/sparc.opt b/gcc/config/sparc/sparc.opt
index cb807fd..01f3d43 100644
--- a/gcc/config/sparc/sparc.opt
+++ b/gcc/config/sparc/sparc.opt
@@ -215,3 +215,25 @@ Mask(V9)
 Mask(DEPRECATED_V8_INSNS)
 ;; Generate code that uses the V8 instructions deprecated
 ;; in the V9 architecture.
+
+mmemory-model=
+Target RejectNegative Joined Var(sparc_memory_model) Enum(sparc_memory_model) 
Init(SMM_DEFAULT)
+Specify the memory model in effect for the program.
+
+Enum
+Name(sparc_memory_model) Type(enum sparc_memory_model_type)
+
+EnumValue
+Enum(sparc_memory_model) String(default) Value(SMM_DEFAULT)
+
+EnumValue
+Enum(sparc_memory_model) String(rmo) Value(SMM_RMO)
+
+EnumValue
+Enum(sparc_memory_model) String(pso) Value(SMM_PSO)
+
+EnumValue
+Enum(sparc_memory_model) String(tso) Value(SMM_TSO)
+
+EnumValue
+Enum(sparc_memory_model) String(sc) Value(SMM_SC)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4e6edb9..eb21bbd6 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -884,6 +884,7 @@ See RS/6000 and PowerPC Options.
 @gccoptlist{-mcpu=@var{cpu-type} @gol
 -mtune=@var{cpu-type} @gol
 -mcmodel=@var{code-model} @gol
+-mmemory-model=@var{mem-model} @gol
 -m32  -m64  -mapp-regs  -mno-app-regs @gol
 -mfaster-structs  -mno-faster-structs  -mflat  -mno-flat @gol
 -mfpu  -mno-fpu  -mhard-float  -msoft-float @gol
@@ -17811,13 +17812,22 @@ an architecture and not an implementation.  These are 
@samp{v7}, @samp{v8},
 Here is a list of each supported architecture and their supported
 implementations.
 
-@smallexample
-v7: cypress
-v8: supersparc, hypersparc, leon
-sparclite:  f930, f934, sparclite86x
-sparclet:   tsc701
-v9: ultrasparc, ultrasparc3, niagara, niagara2, niagara3, niagara4

[PATCH 3/5] sparc: Convert to atomic_compare_and_swap.

2011-11-28 Thread Richard Henderson
Aid the backend with mem_noofs_operand and a constraint to match.
---
 gcc/config/sparc/constraints.md |5 +++-
 gcc/config/sparc/predicates.md  |4 +++
 gcc/config/sparc/sparc-protos.h |2 +-
 gcc/config/sparc/sparc.c|   56 ---
 gcc/config/sparc/sync.md|   52 ---
 5 files changed, 85 insertions(+), 34 deletions(-)

diff --git a/gcc/config/sparc/constraints.md b/gcc/config/sparc/constraints.md
index 317602c..472490f 100644
--- a/gcc/config/sparc/constraints.md
+++ b/gcc/config/sparc/constraints.md
@@ -19,7 +19,7 @@
 
 ;;; Unused letters:
 ;;;AB   
-;;;ajklq  tuvwxyz
+;;;ajklq  tuv xyz
 
 
 ;; Register constraints
@@ -44,6 +44,9 @@
 (define_register_constraint "h" (TARGET_V9 && TARGET_V8PLUS ? I64_REGS : NO_REGS)
  "64-bit global or out register in V8+ mode")
 
+(define_memory_constraint "w"
+  "A memory with only a base register"
+  (match_operand 0 "mem_noofs_operand"))
 
 ;; Floating-point constant constraints
 
diff --git a/gcc/config/sparc/predicates.md b/gcc/config/sparc/predicates.md
index a9207be..81e5320 100644
--- a/gcc/config/sparc/predicates.md
+++ b/gcc/config/sparc/predicates.md
@@ -471,6 +471,10 @@
        (match_test "call_address_operand (XEXP (op, 0), mode)")))
 
 
+(define_predicate "mem_noofs_operand"
+  (and (match_code "mem")
+       (match_code "reg" "0")))
+
 ;; Predicates for operators.
 
 ;; Return true if OP is a comparison operator.  This allows the use of
diff --git a/gcc/config/sparc/sparc-protos.h b/gcc/config/sparc/sparc-protos.h
index b292024..52acc09 100644
--- a/gcc/config/sparc/sparc-protos.h
+++ b/gcc/config/sparc/sparc-protos.h
@@ -104,7 +104,7 @@ extern int v9_regcmp_p (enum rtx_code);
32 bits of REG are 0 before INSN.  */   
 extern int sparc_check_64 (rtx, rtx);
 extern rtx gen_df_reg (rtx, int);
-extern void sparc_expand_compare_and_swap_12 (rtx, rtx, rtx, rtx);
+extern void sparc_expand_compare_and_swap (rtx op[]);
 extern void sparc_expand_vector_init (rtx, rtx);
 extern void sparc_expand_vec_perm_bmask(enum machine_mode, rtx);
 extern bool sparc_expand_conditional_move (enum machine_mode, rtx *);
diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index c5c4899..7db216a 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -10899,8 +10899,9 @@ sparc_emit_membar_for_model (enum memmodel model,
 /* Expand code to perform a 8 or 16-bit compare and swap by doing 32-bit
compare and swap on the word containing the byte or half-word.  */
 
-void
-sparc_expand_compare_and_swap_12 (rtx result, rtx mem, rtx oldval, rtx newval)
+static void
+sparc_expand_compare_and_swap_12 (rtx bool_result, rtx result, rtx mem,
+ rtx oldval, rtx newval)
 {
   rtx addr1 = force_reg (Pmode, XEXP (mem, 0));
   rtx addr = gen_reg_rtx (Pmode);
@@ -10925,7 +10926,7 @@ sparc_expand_compare_and_swap_12 (rtx result, rtx mem, rtx oldval, rtx newval)
   set_mem_alias_set (memsi, ALIAS_SET_MEMORY_BARRIER);
   MEM_VOLATILE_P (memsi) = MEM_VOLATILE_P (mem);
 
-  val = force_reg (SImode, memsi);
+  val = copy_to_reg (memsi);
 
   emit_insn (gen_rtx_SET (VOIDmode, off,
  gen_rtx_XOR (SImode, off,
@@ -10971,7 +10972,9 @@ sparc_expand_compare_and_swap_12 (rtx result, rtx mem, rtx oldval, rtx newval)
   emit_insn (gen_rtx_SET (VOIDmode, newvalue,
  gen_rtx_IOR (SImode, newv, val)));
 
-  emit_insn (gen_sync_compare_and_swapsi (res, memsi, oldvalue, newvalue));
+  emit_move_insn (bool_result, const1_rtx);
+
+  emit_insn (gen_atomic_compare_and_swapsi_1 (res, memsi, oldvalue, newvalue));
 
   emit_cmp_and_jump_insns (res, oldvalue, EQ, NULL, SImode, 0, end_label);
 
@@ -10979,6 +10982,8 @@ sparc_expand_compare_and_swap_12 (rtx result, rtx mem, rtx oldval, rtx newval)
  gen_rtx_AND (SImode, gen_rtx_NOT (SImode, mask),
   res)));
 
+  emit_move_insn (bool_result, const0_rtx);
+
   cc = gen_compare_reg_1 (NE, resv, val);
   emit_insn (gen_rtx_SET (VOIDmode, val, resv));
 
@@ -10997,6 +11002,49 @@ sparc_expand_compare_and_swap_12 (rtx result, rtx mem, rtx oldval, rtx newval)
   emit_move_insn (result, gen_lowpart (GET_MODE (result), res));
 }
 
+/* Expand code to perform a compare-and-swap.  */
+
+void
+sparc_expand_compare_and_swap (rtx operands[])
+{
+  rtx bval, retval, mem, oldval, newval;
+  enum machine_mode mode;
+  enum memmodel model;
+
+  bval = operands[0];
+  retval = operands[1];
+  mem = operands[2];
+  oldval = operands[3];
+  newval = operands[4];
+  model = (enum memmodel) INTVAL (operands[6]);
+  mode = GET_MODE (mem);
+
+  sparc_emit_membar_for_model (model, 3, 1);
+
+  if (reg_overlap_mentioned_p (retval, oldval))
+oldval = copy_to_reg (oldval);
+
+  if (mode == QImode || mode == HImode)
+sparc_expand_compare_and_swap_12 (bval, retval, mem, oldval, newval);
+  else
+{
+  

[PATCH] Copy over TREE_THIS_VOLATILE during forwprop (PR tree-optimization/50078)

2011-11-28 Thread Jakub Jelinek
Hi!

On the following testcase the volatile load/store are optimized away.
IMHO this is a bug in forwprop, which replaces
  tmp_Y = nonvolvar[arg_X];
  MEM[(volatile ...*)tmp_Y] ={v} ...;
with
  MEM[(volatile ...*)nonvolvar][tmp_Y] ={v} ...;
where the LHS is no longer TREE_THIS_VOLATILE like before,
TREE_THIS_VOLATILE is newly only the MEM_REF operand of the non-volatile
ARRAY_REF.  This patch copies the TREE_THIS_VOLATILE bit to the ARRAY_REF.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2011-11-28  Jakub Jelinek  ja...@redhat.com

PR tree-optimization/50078
* tree-ssa-forwprop.c (forward_propagate_addr_expr_1): Copy over
TREE_THIS_VOLATILE also from the old to new lhs resp. rhs.

* gcc.dg/pr50078.c: New test.

--- gcc/tree-ssa-forwprop.c.jj  2011-10-20 14:13:43.0 +0200
+++ gcc/tree-ssa-forwprop.c 2011-11-28 09:47:39.822697552 +0100
@@ -910,7 +910,7 @@ forward_propagate_addr_expr_1 (tree name
 TREE_TYPE (gimple_assign_rhs1 (use_stmt
{
  tree *def_rhs_basep = TREE_OPERAND (def_rhs, 0);
- tree new_offset, new_base, saved;
+ tree new_offset, new_base, saved, new_lhs;
  while (handled_component_p (*def_rhs_basep))
def_rhs_basep = TREE_OPERAND (*def_rhs_basep, 0);
  saved = *def_rhs_basep;
@@ -930,8 +930,9 @@ forward_propagate_addr_expr_1 (tree name
   new_base, new_offset);
  TREE_THIS_VOLATILE (*def_rhs_basep) = TREE_THIS_VOLATILE (lhs);
  TREE_THIS_NOTRAP (*def_rhs_basep) = TREE_THIS_NOTRAP (lhs);
- gimple_assign_set_lhs (use_stmt,
-unshare_expr (TREE_OPERAND (def_rhs, 0)));
+ new_lhs = unshare_expr (TREE_OPERAND (def_rhs, 0));
+ gimple_assign_set_lhs (use_stmt, new_lhs);
+ TREE_THIS_VOLATILE (new_lhs) = TREE_THIS_VOLATILE (lhs);
  *def_rhs_basep = saved;
  tidy_after_forward_propagate_addr (use_stmt);
  /* Continue propagating into the RHS if this was not the
@@ -991,7 +992,7 @@ forward_propagate_addr_expr_1 (tree name
 TREE_TYPE (TREE_OPERAND (def_rhs, 0
{
  tree *def_rhs_basep = TREE_OPERAND (def_rhs, 0);
- tree new_offset, new_base, saved;
+ tree new_offset, new_base, saved, new_rhs;
  while (handled_component_p (*def_rhs_basep))
def_rhs_basep = TREE_OPERAND (*def_rhs_basep, 0);
  saved = *def_rhs_basep;
@@ -1011,8 +1012,9 @@ forward_propagate_addr_expr_1 (tree name
   new_base, new_offset);
  TREE_THIS_VOLATILE (*def_rhs_basep) = TREE_THIS_VOLATILE (rhs);
  TREE_THIS_NOTRAP (*def_rhs_basep) = TREE_THIS_NOTRAP (rhs);
- gimple_assign_set_rhs1 (use_stmt,
- unshare_expr (TREE_OPERAND (def_rhs, 0)));
+ new_rhs = unshare_expr (TREE_OPERAND (def_rhs, 0));
+ gimple_assign_set_rhs1 (use_stmt, new_rhs);
+ TREE_THIS_VOLATILE (new_rhs) = TREE_THIS_VOLATILE (rhs);
  *def_rhs_basep = saved;
  fold_stmt_inplace (use_stmt_gsi);
  tidy_after_forward_propagate_addr (use_stmt);
--- gcc/testsuite/gcc.dg/pr50078.c.jj   2011-11-28 10:03:18.020091603 +0100
+++ gcc/testsuite/gcc.dg/pr50078.c  2011-11-28 10:05:01.533389187 +0100
@@ -0,0 +1,14 @@
+/* PR tree-optimization/50078 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned nonvolvar[2];
+
+void
+test (int arg)
+{
+  unsigned v = *(volatile unsigned *) &(nonvolvar[arg]);
+  *(volatile unsigned *) &(nonvolvar[arg]) = v;
+}
+
+/* { dg-final { scan-assembler-times "movl\[^\n\r\]*nonvolvar" 2 { target { { i?86-*-* x86_64-*-* } && nonpic } } } } */

Jakub


[PATCH] Clear EDGE_CROSSING flag when redirecting edge succ to EXIT_BLOCK_PTR (PR middle-end/50907)

2011-11-28 Thread Jakub Jelinek
Hi!

The checking code insists on edges to EXIT_BLOCK_PTR (and from
ENTRY_BLOCK_PTR) to have no EDGE_CROSSING bit set, so if we have
a bb in second partition jumping to first partition's entry block
and convert_jumps_to_returns changes it to simple_return, we
need to drop the EDGE_CROSSING bit.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2011-11-28  Jakub Jelinek  ja...@redhat.com

PR middle-end/50907
* function.c (convert_jumps_to_returns): When redirecting an edge
succ to EXIT_BLOCK_PTR, clear EDGE_CROSSING flag.

* gcc.dg/tree-prof/pr50907.c: New test.

--- gcc/function.c.jj   2011-11-21 16:22:02.0 +0100
+++ gcc/function.c  2011-11-28 11:09:39.581873481 +0100
@@ -5670,6 +5670,7 @@ convert_jumps_to_returns (basic_block la
 
   /* Fix up the CFG for the successful change we just made.  */
   redirect_edge_succ (e, EXIT_BLOCK_PTR);
+  e->flags &= ~EDGE_CROSSING;
 }
   VEC_free (basic_block, heap, src_bbs);
   return unconverted;
--- gcc/testsuite/gcc.dg/tree-prof/pr50907.c.jj	2011-11-28 11:12:12.769979965 +0100
+++ gcc/testsuite/gcc.dg/tree-prof/pr50907.c	2011-11-28 11:13:40.055475777 +0100
@@ -0,0 +1,5 @@
+/* PR middle-end/50907 */
+/* { dg-require-effective-target freorder } */
+/* { dg-options "-O -freorder-blocks-and-partition -fschedule-insns -fselective-scheduling -fpic" { target { { powerpc*-*-* ia64-*-* x86_64-*-* } && fpic } } } */
+
+#include "pr45354.c"

Jakub


[PATCH] When making var non-addressable, drop var ={v} {CLOBBER} stmts (PR debug/50317)

2011-11-28 Thread Jakub Jelinek
Hi!

As discussed in the PR, SSA_NAME ={v} {CLOBBER} stmts are undesirable
at least for the debug info, we really don't need to mark there the end
of scope in the IL, the debug info records the ranges of the scopes
and if the clobber gets scheduled before some insn we get unnecessarily
pessimistic debug info.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
ok for trunk?

2011-11-28  Jakub Jelinek  ja...@redhat.com

PR debug/50317
* tree-ssa.c (execute_update_addresses_taken): Remove
var ={v} {CLOBBER} stmts instead of rewriting them into
var_N ={v} {CLOBBER}.

--- gcc/tree-ssa.c.jj   2011-10-14 08:21:56.0 +0200
+++ gcc/tree-ssa.c  2011-11-28 13:29:45.297956952 +0100
@@ -2118,7 +2118,7 @@ execute_update_addresses_taken (void)
   if (update_vops)
 {
   FOR_EACH_BB (bb)
-   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (gsi))
+   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi);)
  {
gimple stmt = gsi_stmt (gsi);
 
@@ -2159,6 +2159,18 @@ execute_update_addresses_taken (void)
if (gimple_assign_lhs (stmt) != lhs)
  gimple_assign_set_lhs (stmt, lhs);
 
+   /* For var ={v} {CLOBBER}; where var lost
+  TREE_ADDRESSABLE just remove the stmt.  */
+	   if (DECL_P (lhs)
+	       && TREE_CLOBBER_P (rhs)
+	       && symbol_marked_for_renaming (lhs))
+ {
+   unlink_stmt_vdef (stmt);
+   gsi_remove (gsi, true);
+   release_defs (stmt);
+   continue;
+ }
+
if (gimple_assign_rhs1 (stmt) != rhs)
  {
gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
@@ -2205,6 +2219,8 @@ execute_update_addresses_taken (void)
if (gimple_references_memory_p (stmt)
|| is_gimple_debug (stmt))
  update_stmt (stmt);
+
+   gsi_next (gsi);
  }
 
   /* Update SSA form here, we are called as non-pass as well.  */

Jakub


Re: [PATCH] Clear EDGE_CROSSING flag when redirecting edge succ to EXIT_BLOCK_PTR (PR middle-end/50907)

2011-11-28 Thread Jeff Law

On 11/28/11 11:40, Jakub Jelinek wrote:
 Hi!
 
 The checking code insists on edges to EXIT_BLOCK_PTR (and from 
 ENTRY_BLOCK_PTR) to have no EDGE_CROSSING bit set, so if we have a
 bb in second partition jumping to first partition's entry block and
 convert_jumps_to_returns changes it to simple_return, we need to
 drop the EDGE_CROSSING bit.
 
 Bootstrapped/regtested on x86_64-linux and i686-linux, ok for
 trunk?
 
 2011-11-28  Jakub Jelinek  ja...@redhat.com
 
 PR middle-end/50907 * function.c (convert_jumps_to_returns): When
 redirecting an edge succ to EXIT_BLOCK_PTR, clear EDGE_CROSSING
 flag.
 
 * gcc.dg/tree-prof/pr50907.c: New test.
OK.
jeff


Re: [PATCH] Copy over TREE_THIS_VOLATILE during forwprop (PR tree-optimization/50078)

2011-11-28 Thread Jeff Law

On 11/28/11 11:36, Jakub Jelinek wrote:
 Hi!
 
 On the following testcase the volatile load/store are optimized
 away. IMHO this is a bug in forwprop, which replaces tmp_Y =
 nonvolvar[arg_X]; MEM[(volatile ...*)tmp_Y] ={v} ...; with 
 MEM[(volatile ...*)nonvolvar][tmp_Y] ={v} ...; where the LHS is no
 longer TREE_THIS_VOLATILE like before, TREE_THIS_VOLATILE is newly
 only the MEM_REF operand of the non-volatile ARRAY_REF.  This patch
 copies the TREE_THIS_VOLATILE bit to the ARRAY_REF.
 
 Bootstrapped/regtested on x86_64-linux and i686-linux, ok for
 trunk?
 
 2011-11-28  Jakub Jelinek  ja...@redhat.com
 
 PR tree-optimization/50078 * tree-ssa-forwprop.c
 (forward_propagate_addr_expr_1): Copy over TREE_THIS_VOLATILE also
 from the old to new lhs resp. rhs.
 
 * gcc.dg/pr50078.c: New test.
Do we need to copy NOTRAP or any of the other flags?

OK, assuming we don't need to copy NOTRAP, etc.  If those need to be
copied too, preapproved with those changes.

Jeff


Re: [PATCH] Re: [v3] updated atomic configury

2011-11-28 Thread Andrew MacLeod

On 11/28/2011 11:42 AM, Joseph S. Myers wrote:

On Mon, 28 Nov 2011, Andrew MacLeod wrote:

That would be why cppbuiltin.c, defining __SIZEOF_POINTER__, needs to use
the POINTER_SIZE macro instead of directly using ptr_type_node.  You may
need to do something similar.

   /* ptr_type_node can't be used here since ptr_mode is only set when
  toplev calls backend_init which is not done with -E switch.  */
   cpp_define_formatted (pfile, "__SIZEOF_POINTER__=%d",
			 POINTER_SIZE / BITS_PER_UNIT);

How bizarre...  thanks, you saved me some hair-pulling, I'm sure :-)  That 
does work; here's a patch which does that.  It bootstraps on 
x86_64-unknown-linux-gnu and has no new regressions.  OK for mainline?


Andrew

c-family
* c-cpp-builtin.c (cpp_atomic_builtins): New.  Emit all atomic
predefines in one place.  Add LOCK_FREE predefines.
(c_cpp_builtins): Move Legacy HAVE_SYNC predefines to
new func.

libstdc++-v3
* include/bits/atomic_base.h (ATOMIC_*_LOCK_FREE): Use new cpp
predefined macros.
* testsuite/29_atomics/headers/atomic/macros.cc: Add BOOL and POINTER
macro checks.  Check for expected compile time values.

Index: gcc/c-family/c-cppbuiltin.c
===
*** gcc/c-family/c-cppbuiltin.c (revision 181698)
--- gcc/c-family/c-cppbuiltin.c (working copy)
*** c_cpp_builtins_optimize_pragma (cpp_read
*** 568,573 
--- 568,684 
  }
  
  
+ /* This function will emit cpp macros to indicate the presence of various lock
+free atomic operations.  */
+
+ static void
+ cpp_atomic_builtins (cpp_reader *pfile)
+ {
+   /* Set a flag for each size of object that compare and swap exists for up to
+  a 16 byte object.  */
+ #define SWAP_LIMIT  17
+   bool have_swap[SWAP_LIMIT];
+   unsigned int psize;
+ 
+   /* Clear the map of sizes compare_and_swap exists for.  */
+   memset (have_swap, 0, sizeof (have_swap));
+ 
+   /* Tell source code if the compiler makes sync_compare_and_swap
+  builtins available.  */
+ #ifndef HAVE_sync_compare_and_swapqi
+ #define HAVE_sync_compare_and_swapqi 0
+ #endif
+ #ifndef HAVE_atomic_compare_and_swapqi
+ #define HAVE_atomic_compare_and_swapqi 0
+ #endif
+ 
+   if (HAVE_sync_compare_and_swapqi || HAVE_atomic_compare_and_swapqi)
+ {
+   cpp_define (pfile, "__GCC_HAVE_SYNC_COMPARE_AND_SWAP_1");
+   have_swap[1] = true;
+ }
+ 
+ #ifndef HAVE_sync_compare_and_swaphi
+ #define HAVE_sync_compare_and_swaphi 0
+ #endif
+ #ifndef HAVE_atomic_compare_and_swaphi
+ #define HAVE_atomic_compare_and_swaphi 0
+ #endif
+   if (HAVE_sync_compare_and_swaphi || HAVE_atomic_compare_and_swaphi)
+ {
+   cpp_define (pfile, "__GCC_HAVE_SYNC_COMPARE_AND_SWAP_2");
+   have_swap[2] = true;
+ }
+ 
+ #ifndef HAVE_sync_compare_and_swapsi
+ #define HAVE_sync_compare_and_swapsi 0
+ #endif
+ #ifndef HAVE_atomic_compare_and_swapsi
+ #define HAVE_atomic_compare_and_swapsi 0
+ #endif
+   if (HAVE_sync_compare_and_swapsi || HAVE_atomic_compare_and_swapsi)
+ {
+   cpp_define (pfile, "__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4");
+   have_swap[4] = true;
+ }
+ 
+ #ifndef HAVE_sync_compare_and_swapdi
+ #define HAVE_sync_compare_and_swapdi 0
+ #endif
+ #ifndef HAVE_atomic_compare_and_swapdi
+ #define HAVE_atomic_compare_and_swapdi 0
+ #endif
+   if (HAVE_sync_compare_and_swapdi || HAVE_atomic_compare_and_swapdi)
+ {
+   cpp_define (pfile, "__GCC_HAVE_SYNC_COMPARE_AND_SWAP_8");
+   have_swap[8] = true;
+ }
+ 
+ #ifndef HAVE_sync_compare_and_swapti
+ #define HAVE_sync_compare_and_swapti 0
+ #endif
+ #ifndef HAVE_atomic_compare_and_swapti
+ #define HAVE_atomic_compare_and_swapti 0
+ #endif
+   if (HAVE_sync_compare_and_swapti || HAVE_atomic_compare_and_swapti)
+ {
+   cpp_define (pfile, "__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16");
+   have_swap[16] = true;
+ }
+ 
+   /* Tell the source code about various types.  These map to the C++11 and C1x
+  macros where 2 indicates lock-free always, and 1 indicates sometimes
+  lock free.  */
+ #define SIZEOF_NODE(T) (tree_low_cst (TYPE_SIZE_UNIT (T), 1))
+ #define SWAP_INDEX(T) ((SIZEOF_NODE (T) < SWAP_LIMIT) ? SIZEOF_NODE (T) : 0)
+   builtin_define_with_int_value ("__GCC_ATOMIC_BOOL_LOCK_FREE", 
+   (have_swap[SWAP_INDEX (boolean_type_node)]? 2 : 1));
+   builtin_define_with_int_value ("__GCC_ATOMIC_CHAR_LOCK_FREE", 
+   (have_swap[SWAP_INDEX (signed_char_type_node)]? 2 : 1));
+   builtin_define_with_int_value ("__GCC_ATOMIC_CHAR16_T_LOCK_FREE", 
+   (have_swap[SWAP_INDEX (char16_type_node)]? 2 : 1));
+   builtin_define_with_int_value ("__GCC_ATOMIC_CHAR32_T_LOCK_FREE", 
+   (have_swap[SWAP_INDEX (char32_type_node)]? 2 : 1));
+   builtin_define_with_int_value ("__GCC_ATOMIC_WCHAR_T_LOCK_FREE", 
+   (have_swap[SWAP_INDEX (wchar_type_node)]? 2 : 1));
+   

[patch] libjava/classpath configure update for FreeBSD 10 and up

2011-11-28 Thread Andreas Tobler

All,

I'd like to commit the attached patch to trunk and gcc-4.6.
It is a follow up from this one:

http://gcc.gnu.org/ml/gcc-cvs/2011-11/msg00886.html

I missed this one because in my setup I didn't build any shared 
libraries from classpath. The config.rpath patch is already sent to 
classpath.


Any objections to this patch?

Otherwise I'm going to commit within 24h.

TIA,
Andreas

libjava/classpath/ChangeLog.gcj:

2011-11-28  Andreas Tobler  andre...@fgznet.ch

* config.rpath (ld_shlibs): Fix detection of FreeBSD-10 and up.
(libname_spec): Likewise.
* configure: Regenerate with autoconf -I ../../.
Index: configure
===
--- configure   (revision 181783)
+++ configure   (working copy)
@@ -10025,7 +10025,7 @@
   ;;
 
 # Unfortunately, older versions of FreeBSD 2 do not have this feature.
-freebsd2*)
+freebsd2.*)
   archive_cmds='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags'
   hardcode_direct=yes
   hardcode_minus_L=yes
@@ -10941,7 +10941,7 @@
 objformat=`/usr/bin/objformat`
   else
 case $host_os in
-freebsd[123]*) objformat=aout ;;
+freebsd[23].*) objformat=aout ;;
 *) objformat=elf ;;
 esac
   fi
@@ -10959,7 +10959,7 @@
   esac
   shlibpath_var=LD_LIBRARY_PATH
   case $host_os in
-  freebsd2*)
+  freebsd2.*)
 shlibpath_overrides_runpath=yes
 ;;
   freebsd3.[01]* | freebsdelf3.[01]*)
@@ -14225,7 +14225,7 @@
 esac
 ;;
 
-  freebsd[12]*)
+  freebsd2.*)
 # C++ shared libraries reported to be fairly broken before
# switch to ELF
 ld_shlibs_CXX=no
@@ -16000,7 +16000,7 @@
 objformat=`/usr/bin/objformat`
   else
 case $host_os in
-freebsd[123]*) objformat=aout ;;
+freebsd[23].*) objformat=aout ;;
 *) objformat=elf ;;
 esac
   fi
@@ -16018,7 +16018,7 @@
   esac
   shlibpath_var=LD_LIBRARY_PATH
   case $host_os in
-  freebsd2*)
+  freebsd2.*)
 shlibpath_overrides_runpath=yes
 ;;
   freebsd3.[01]* | freebsdelf3.[01]*)
Index: config.rpath
===
--- config.rpath(revision 181783)
+++ config.rpath(working copy)
@@ -361,7 +361,7 @@
   hardcode_libdir_flag_spec='-R$libdir'
   hardcode_direct=yes
   ;;
-freebsd2*)
+freebsd2.*)
   hardcode_direct=yes
   hardcode_minus_L=yes
   ;;
@@ -533,7 +533,7 @@
 ;;
   freebsd* | dragonfly*)
 case $host_os in
-  freebsd[123]*)
+  freebsd[23].*)
 library_names_spec='$libname$shrext$versuffix' ;;
   *)
 library_names_spec='$libname$shrext' ;;


Re: [PATCH] Re: [v3] updated atomic configury

2011-11-28 Thread Richard Henderson
On 11/28/2011 11:29 AM, Andrew MacLeod wrote:
   c-family
   * c-cpp-builtin.c (cpp_atomic_builtins):New.  Emit all atomic
   predefines in one place.  Add LOCK_FREE predefines.
   (c_cpp_builtins): Move Legacy HAVE_SYNC predefines to
   new func.
 
   libstdc++-v3
   * include/bits/atomic_base.h (ATOMIC_*_LOCK_FREE): Use new cpp
   predefined macros.
   * testsuite/29_atomics/headers/atomic/macros.cc: Add BOOL and POINTER
   macro checks.  Check for expected compile time values.

Ok.


r~


Re: [PATCH] Copy over TREE_THIS_VOLATILE during forwprop (PR tree-optimization/50078)

2011-11-28 Thread Jakub Jelinek
On Mon, Nov 28, 2011 at 12:16:45PM -0700, Jeff Law wrote:
  On the following testcase the volatile load/store are optimized
  away. IMHO this is a bug in forwprop, which replaces tmp_Y =
  nonvolvar[arg_X]; MEM[(volatile ...*)tmp_Y] ={v} ...; with 
  MEM[(volatile ...*)nonvolvar][tmp_Y] ={v} ...; where the LHS is no
  longer TREE_THIS_VOLATILE like before, TREE_THIS_VOLATILE is newly
  only the MEM_REF operand of the non-volatile ARRAY_REF.  This patch
  copies the TREE_THIS_VOLATILE bit to the ARRAY_REF.
  
  Bootstrapped/regtested on x86_64-linux and i686-linux, ok for
  trunk?
  
  2011-11-28  Jakub Jelinek  ja...@redhat.com
  
  PR tree-optimization/50078 * tree-ssa-forwprop.c
  (forward_propagate_addr_expr_1): Copy over TREE_THIS_VOLATILE also
  from the old to new lhs resp. rhs.
  
  * gcc.dg/pr50078.c: New test.
 Do we need to copy NOTRAP or any of the other flags?
 
 OK, assuming we don't need to copy NOTRAP, etc.  If those need to be
 copied too, preapproved with those changes.

I had there TREE_THIS_NOTRAP as well in the first iteration, but
it ICEd, as TREE_THIS_NOTRAP is not valid e.g. on COMPONENT_REF.

Jakub


Re: [PATCH] Run remove_unreachable_handlers before expansion if needed (PR tree-optimization/50682)

2011-11-28 Thread Jeff Law

On 11/23/11 13:42, Jakub Jelinek wrote:
 Hi!
 
 As mentioned in the PR, if gimple_purge_dead_eh_edges is ever
 called after ehcleanup2 pass (or in its after todos as in the
 testcase below), nothing removes the unreachable EH regions and we
 crash either in cleanup_dead_labels_eh or, if we ignore those
 there, later on during emitting of exception info.
 
 We could run remove_unreachable_handlers unconditionally again
 shortly before expansion, but that needs a walk of all stmts, so
 this patch chooses to do it only if it sees some landing pads got
 removed.
 
 Bootstrapped/regtested on x86_64-linux and i686-linux, ok for
 trunk?
 
 2011-11-23  Jakub Jelinek  ja...@redhat.com
 
 PR tree-optimization/50682 * tree-eh.c
 (maybe_remove_unreachable_handlers): New function. * tree-flow.h
 (maybe_remove_unreachable_handlers): New prototype. *
 tree-optimize.c (execute_cleanup_cfg_post_optimizing): Call it.
 
 * g++.dg/opt/pr50682.C: New test.
OK.
jeff


Re: New port^2: Renesas RL78

2011-11-28 Thread DJ Delorie

Ping? Anything else for this?

http://gcc.gnu.org/ml/gcc-patches/2011-11/msg02178.html
http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01467.html
http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01356.html


Re: [Patch, fortran, RFC] PR 40958 Reduce size of module files

2011-11-28 Thread Thomas Koenig

Hello world,

the testing of the test patch I submitted earlier (Thanks Salvatore and
Joost!) has shown a performance increase, so here is a formal
submission.  No test case because this patch is not supposed to
change anything, just make module reading a bit more efficient.

Regression-tested on x86_64-unknown-linux-gnu.  OK for trunk?

Thomas

2011-11-28  Thomas Koenig  tkoe...@gcc.gnu.org

PR fortran/40958
* module.c (prev_module_line):  New variable.
(prev_module_column):  New variable.
(prev_character):  New variable.
(module_char):  Update the new variables.
(module_unget_char):  New function.
(parse_string):  Use module_unget_char.
(parse_integer):  Likewise.
(parse_name):  Likewise.
Index: module.c
===
--- module.c	(Revision 181745)
+++ module.c	(Arbeitskopie)
@@ -194,6 +194,8 @@ static char module_name[GFC_MAX_SYMBOL_LEN + 1];
 static bool specified_nonint, specified_int;
 
 static int module_line, module_column, only_flag;
+static int prev_module_line, prev_module_column, prev_character;
+
 static enum
 { IO_INPUT, IO_OUTPUT }
 iomode;
@@ -1036,6 +1038,10 @@ module_char (void)
   if (c == EOF)
 bad_module (Unexpected EOF);
 
+  prev_module_line = module_line;
+  prev_module_column = module_column;
+  prev_character = c;
+
   if (c == '\n')
 {
   module_line++;
@@ -1046,7 +1052,17 @@ module_char (void)
   return c;
 }
 
+/* Unget a character while remembering the line and column.  Works for
+   a single character only.  */
 
+static void
+module_unget_char (void)
+{
+  module_line = prev_module_line;
+  module_column = prev_module_column;
+  ungetc (prev_character, module_fp);
+}
+
 /* Parse a string constant.  The delimiter is guaranteed to be a
single quote.  */
 
@@ -1106,24 +1122,22 @@ parse_string (void)
 static void
 parse_integer (int c)
 {
-  module_locus m;
-
   atom_int = c - '0';
 
   for (;;)
 {
-  get_module_locus (m);
-
   c = module_char ();
   if (!ISDIGIT (c))
-	break;
+	{
+	  module_unget_char ();
+	  break;
+	}
 
   atom_int = 10 * atom_int + c - '0';
   if (atom_int  )
 	bad_module (Integer overflow);
 }
 
-  set_module_locus (m);
 }
 
 
@@ -1132,7 +1146,6 @@ parse_integer (int c)
 static void
 parse_name (int c)
 {
-  module_locus m;
   char *p;
   int len;
 
@@ -1141,13 +1154,14 @@ parse_name (int c)
   *p++ = c;
   len = 1;
 
-  get_module_locus (m);
-
   for (;;)
 {
   c = module_char ();
   if (!ISALNUM (c) && c != '_' && c != '-')
-	break;
+	{
+	  module_unget_char ();
+	  break;
+	}
 
   *p++ = c;
   if (++len  GFC_MAX_SYMBOL_LEN)
@@ -1156,11 +1170,6 @@ parse_name (int c)
 
   *p = '\0';
 
-  fseek (module_fp, -1, SEEK_CUR);
-  module_column = m.column + len - 1;
-
-  if (c == '\n')
-module_line--;
 }
 
 


Re: Support Solaris 11/SPARC in MD_FALLBACK_FRAME_STATE_FOR (PR ada/41929)

2011-11-28 Thread Eric Botcazou
 The result has been bootstrapped by Eric (thanks for this and debugging
 why a first version caused some regressions on Solaris 8) and myself on
 Solaris 8, 9, 10 and 11, and I've performed quite a bit of software
 archaeology (i.e. stared at old libc.so.1 and libthread.so.1 patches) to
 make sure nothing got missed in the process.

All EH-related tests fail with a recent upgrade on Solaris 10.  Interestingly, 
the old implementation keeps working with the upgrade.  It seems that the 
__sighndlr case has been over-simplified:

  /* Look for the __sighndlr pattern.  */
  else if (*(unsigned int *)(pc-(4*5)) == sighndlr_pattern[0]
	   && *(unsigned int *)(pc-(4*4)) == sighndlr_pattern[1]
	   && *(unsigned int *)(pc-(4*3)) == sighndlr_pattern[2]
	   && *(unsigned int *)(pc-(4*2)) == sighndlr_pattern[3]
	   && *(unsigned int *)(pc-(4*1)) == sighndlr_pattern[4]
	   && *(unsigned int *)(pc-(4*0)) == sighndlr_pattern[5]
	   && *(unsigned int *)(pc+(4*1)) == sighndlr_pattern[6])
{
  /* We have observed different calling frames among different
 versions of the operating system, so that we need to
 discriminate using the upper frame.  We look for the return
 address of the caller frame (there is an offset of 15 words
 between the frame address and the place where this return
 address is stored) in order to do some more pattern matching.  */
  unsigned int cuh_pattern
= *(unsigned int *)(*(unsigned int *)(this_cfa + 15*4) - 4);

  if (cuh_pattern == 0xd407a04c)
{
  /* This matches the call_user_handler pattern for Solaris 10.
 There are 2 cases so we look for the return address of the
 caller's caller frame in order to do more pattern matching.  */
  unsigned int sah_pattern
= *(unsigned int *)(*(unsigned int *)(this_cfa + 96 + 15*4) - 4);

  if (sah_pattern == 0x92100019)
/* This is the same setup as for Solaris 9, see below.  */
regs_off = 96 + 96 + 96 + 160;
  else
/* We need to move up three frames (the kernel frame, the
   call_user_handler frame, the __sighndlr frame).  Two of them
   have the minimum stack frame size (kernel and __sighndlr
   frames) of 96 bytes, and there is another with a stack frame
   of 160 bytes (the call_user_handler frame).  The ucontext_t
   structure is after this offset.  */
regs_off = 96 + 96 + 160;
}
  else if (cuh_pattern == 0x9410001a || cuh_pattern == 0x9410001b)
/* This matches the call_user_handler pattern for Solaris 9 and
   for Solaris 8 running inside Solaris Containers respectively.
   We need to move up four frames (the kernel frame, the signal
   frame, the call_user_handler frame, the __sighndlr frame).
   Three of them have the minimum stack frame size (kernel,
   signal, and __sighndlr frames) of 96 bytes, and there is
   another with a stack frame of 160 bytes (the call_user_handler
   frame).  The ucontext_t structure is after this offset.  */
regs_off = 96 + 96 + 96 + 160;
  else
/* We need to move up three frames (the kernel frame, the
   sigacthandler frame, and the __sighndlr frame).  Two of them
   have the minimum stack frame size (kernel and __sighndlr
   frames) of 96 bytes, and there is another with a stack frame
   of 216 bytes (the sigacthandler frame).  The ucontext_t 
   structure is after this offset.  */
regs_off = 96 + 96 + 216;
}

to

  if (/* Solaris 8+ - multi-threaded
	 ----------------------------
	 __sighndlr:    save  %sp, -96, %sp
	 __sighndlr+4:  mov  %i0, %o0
	 __sighndlr+8:  mov  %i1, %o1
	 __sighndlr+12: call  %i3
	 __sighndlr+16: mov  %i2, %o2
	 __sighndlr+20: ret             <--- PC
	 __sighndlr+24: restore  */
      pc[-5] == 0x9de3bfa0
      && pc[-4] == 0x90100018
      && pc[-3] == 0x92100019
      && pc[-2] == 0x9fc6c000
      && pc[-1] == 0x9410001a
      && pc[ 0] == 0x81c7e008
      && pc[ 1] == 0x81e8)
{
  if (/* Solaris 8 /usr/lib/libthread.so.1
--
sigacthandler+1796: mov  %i0, %o0  */
  savpc[-1] == 0x90100018)
{
  /* We need to move up two frames:

signal handler-- context-cfa
__sighndlr
sigacthandler
kernel
  */
  *nframes = 2;
}
  else /* Solaris 8 /usr/lib/lwp/libthread.so.1, Solaris 9+
 --  */
{
  /* We need to move up three frames:

signal handler-- context-cfa
__sighndlr
call_user_handler
sigacthandler
kernel

Re: [google] Patch to enable efficient function level instrumentation

2011-11-28 Thread Harshit Chopra
Ping!

--
Harshit


Re: [Patch, fortran, RFC] PR 40958 Reduce size of module files

2011-11-28 Thread Janne Blomqvist
On Mon, Nov 28, 2011 at 22:29, Thomas Koenig tkoe...@netcologne.de wrote:
 Hello world,

 the testing of the test patch I submitted earlier (Thanks Salvatore and
 Joost!) has shown a performance increase, so here is a formal
 submission.  No test case because this patch is not supposed to
 change anything, just make module reading a bit more efficient.

 Regression-tested on x86_64-unknown-linux-gnu.  OK for trunk?

        Thomas

 2011-11-28  Thomas Koenig  tkoe...@gcc.gnu.org

        PR fortran/40958
        * module.c (prev_module_line):  New variable.
        (prev_module_column):  New variable.
        (prev_character):  New variable.
        (module_char):  Update the new variables.
        (module_unget_char):  New function.
        (parse_string):  Use module_unget_char.
        (parse_integer):  Likewise.
        (parse_name):  Likewise.
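The mechanism the ChangeLog describes — remember the previous line/column and
the previous character so that exactly one character can be pushed back — is a
classic one-slot pushback reader.  A generic sketch in C (illustrative only;
the struct and all names below are invented, not gfortran's module.c code):

```c
#include <stddef.h>

/* Hypothetical reader state: a cursor into an in-memory buffer plus a
   one-character pushback slot, mirroring the prev_* variables the
   ChangeLog describes.  Names are illustrative, not gfortran's.  */
struct reader
{
  const char *buf;
  size_t pos;
  int pushed;			/* -1 when the pushback slot is empty.  */
  int line, column;
  int prev_line, prev_column;
};

/* Read one character, saving enough state to undo the read.  */
int
reader_getc (struct reader *r)
{
  int c;
  if (r->pushed >= 0)
    {
      c = r->pushed;
      r->pushed = -1;
    }
  else
    c = (unsigned char) r->buf[r->pos++];

  r->prev_line = r->line;
  r->prev_column = r->column;
  if (c == '\n')
    {
      r->line++;
      r->column = 0;
    }
  else
    r->column++;
  return c;
}

/* Undo the last reader_getc: restore the position info and stash the
   character for the next read.  Only one level of pushback works.  */
void
reader_ungetc (struct reader *r, int c)
{
  r->pushed = c;
  r->line = r->prev_line;
  r->column = r->prev_column;
}
```

The point of the patch is exactly this shape: an unget is a couple of stores
rather than re-seeking or re-parsing, so hot parse loops get cheaper.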

Nice! Ok for trunk.


-- 
Janne Blomqvist


Re: [PATCH] PR c++/51289 - ICE with alias template for bound template

2011-11-28 Thread Jason Merrill

On 11/28/2011 11:54 AM, Dodji Seketeli wrote:

@@ -3028,10 +3028,12 @@ find_parameter_packs_r (tree*tp, int *walk_subtrees, 
void*  data)

  case BOUND_TEMPLATE_TEMPLATE_PARM:
/* Check the template itself.  */
-  cp_walk_tree (&TREE_TYPE (TYPE_TI_TEMPLATE (t)),
+  cp_walk_tree (&TREE_TYPE (TI_TEMPLATE
+			     (TEMPLATE_TEMPLATE_PARM_TEMPLATE_INFO (t))),
 		&find_parameter_packs_r, ppd, ppd->visited);
 	/* Check the template arguments.  */
-  cp_walk_tree (&TYPE_TI_ARGS (t), &find_parameter_packs_r, ppd,
+  cp_walk_tree (&TI_ARGS (TEMPLATE_TEMPLATE_PARM_TEMPLATE_INFO (t)),
+		&find_parameter_packs_r, ppd,
 		ppd->visited);
*walk_subtrees = 0;
return NULL_TREE;


Instead of this change, I think we should handle typedefs/aliases at the 
top of the function.  We shouldn't need to look into the underlying type 
for packs.


Jason


Re: [PATCH] Make sibcall argument overlap check less pessimistic (PR middle-end/50074)

2011-11-28 Thread Eric Botcazou
 Here is an attempt to make the check more complete (e.g.
 the change wouldn't see overlap if addr was PLUS of two REGs,
 where one of the REGs was based on internal_arg_pointer, etc.)
 and less pessimistic.  As tree-tailcall.c doesn't allow tail calls
 from functions that have address of any of the caller's parameters
 taken, IMHO it is enough to look for internal_arg_pointer based
 pseudos initialized in the tail call sequence.
 This patch scans the tail call sequence and notes which pseudos
 are based on internal_arg_pointer (and what offset from
 that pointer they have) and uses that in
 mem_overlaps_already_clobbered_arg_p.

This looks reasonable, but the logic is a bit hard to follow, especially the 
double usage of internal_arg_pointer_based_reg depending on SCAN's value.
Would it be possible to split it into 2 functions that recursively call each 
other?

You should also make it clearer that internal_arg_pointer_seq_start is part of 
the caching mechanism (I wondered for some minutes whether it has anything to 
do with RTL sequences), maybe:

/* Last insn that has been scanned by internal_arg_pointer_based_reg, or
   NULL_RTX if none has been scanned yet.  */
static rtx internal_arg_pointer_insn_cache;

/* Vector indexed by REGNO - FIRST_PSEUDO_REGISTER, recording if a pseudo is
   based on crtl->args.internal_arg_pointer.  The element is NULL_RTX if the
   pseudo isn't based on it, a CONST_INT offset if the pseudo is based on it
   with fixed offset, or PC if this is with variable or unknown offset.  */
static VEC(rtx, heap) *internal_arg_pointer_pseudo_cache;


 +/* Helper function for internal_arg_pointer_based_reg, called through
 +   for_each_rtx.  Return 1 if a crtl->args.internal_arg_pointer based
 +   register is seen anywhere.  */
 +
 +static int
 +internal_arg_pointer_based_reg_1 (rtx *loc, void *data ATTRIBUTE_UNUSED)
 +{
 +  if (REG_P (*loc) && internal_arg_pointer_based_reg (*loc, false) != NULL_RTX)
 +    return 1;
 +  if (MEM_P (*loc))
 +    return -1;
 +  return 0;
 +}

Missing comment for the MEM_P case.


 +/* If REG is based on crtl->args.internal_arg_pointer, return either
 +   a CONST_INT offset from crtl->args.internal_arg_pointer if
 +   offset from it is known constant, or PC if the offset is unknown.
 +   Return NULL_RTX if REG isn't based on crtl->args.internal_arg_pointer.  */
 +
 +static rtx
 +internal_arg_pointer_based_reg (rtx reg, bool scan)
 +{
 +  rtx insn;
 +
 +  if (CONSTANT_P (reg))
 +    return NULL_RTX;
 +
 +  if (reg == crtl->args.internal_arg_pointer)
 +    return const0_rtx;
 +
 +  if (REG_P (reg) && REGNO (reg) < FIRST_PSEUDO_REGISTER)
 +    return NULL_RTX;

if (REG_P (reg) && HARD_REGISTER_P (reg))
  return NULL_RTX;

-- 
Eric Botcazou


Re: [google] Patch to enable efficient function level instrumentation (issue 5416043)

2011-11-28 Thread davidxl

Please also explain the need for backpointer section.

David


http://codereview.appspot.com/5416043/diff/1/gcc/config/i386/i386.c
File gcc/config/i386/i386.c (right):

http://codereview.appspot.com/5416043/diff/1/gcc/config/i386/i386.c#newcode10801
gcc/config/i386/i386.c:10801: +static bool
Add an empty line between comment and function.

http://codereview.appspot.com/5416043/diff/1/gcc/config/i386/i386.c#newcode10811
gcc/config/i386/i386.c:10811: +  if (!patch_functions_ignore_loops)
What exactly does this option try to do here?

http://codereview.appspot.com/5416043/diff/1/gcc/config/i386/i386.c#newcode10822
gcc/config/i386/i386.c:10822: +return true;
should it have an else return false here?

http://codereview.appspot.com/5416043/diff/1/gcc/config/i386/i386.c#newcode10841
gcc/config/i386/i386.c:10841: +++num_insns;
NON_DEBUG_INSN_P (insn)

http://codereview.appspot.com/5416043/diff/1/gcc/config/i386/i386.c#newcode10859
gcc/config/i386/i386.c:10859: +  patch_current_function_p =
check_should_patch_current_function();
Why do you need a static variable for this?

http://codereview.appspot.com/5416043/diff/1/gcc/config/i386/i386.c#newcode10874
gcc/config/i386/i386.c:10874: +   '_function_patch_epilogue'.
Explain why a backpointer section is needed.

http://codereview.appspot.com/5416043/diff/1/gcc/config/i386/i386.c#newcode10888
gcc/config/i386/i386.c:10888: +  unsigned int num_actual_nops =
num_remaining_nops - 8;
hard code of 8?

http://codereview.appspot.com/5416043/diff/1/gcc/config/i386/i386.c#newcode10897
gcc/config/i386/i386.c:10897: +return false;
Can this be guarded in the caller of this function?

http://codereview.appspot.com/5416043/diff/1/gcc/config/i386/i386.c#newcode10927
gcc/config/i386/i386.c:10927: + is later renamed to 'section_name'
by ix86_elf_asm_named_section().  */
Explain more on the comdat handling.

http://codereview.appspot.com/5416043/diff/1/gcc/config/i386/i386.opt
File gcc/config/i386/i386.opt (right):

http://codereview.appspot.com/5416043/diff/1/gcc/config/i386/i386.opt#newcode570
gcc/config/i386/i386.opt:570: Minimum number of instructions in the
function without loop before the function is qualified for patching for
instrumentation (for use with -mpatch-functions-for-instrumentation)
It may be better to define PARAM for it.

http://codereview.appspot.com/5416043/diff/1/gcc/config/i386/i386.opt#newcode574
gcc/config/i386/i386.opt:574: Ignore loops when deciding whether to
patch a function for instrumentation (for use with
-mpatch-functions-for-instrumentation).
What is the motivation for this option?

http://codereview.appspot.com/5416043/diff/1/gcc/config/i386/i386.opt#newcode578
gcc/config/i386/i386.opt:578: Treat 'main' as any other function and
only patch it if it meets the criteria for loops and minimum number of
instructions (for use with -mpatch-functions-for-instrumentation).
What is the motivation for this option?

http://codereview.appspot.com/5416043/


Re: Fix libgomp semaphores

2011-11-28 Thread Alan Modra
On Mon, Nov 28, 2011 at 05:23:37PM +0100, Jakub Jelinek wrote:
 On Mon, Nov 28, 2011 at 09:23:43AM +1030, Alan Modra wrote:
  +  int count = *sem;
  +
   +  while ((count & 0x7fffffff) != 0)
  +{
  +  int oldval = count;
   +  __atomic_compare_exchange_4 (sem, &oldval, count - 1,
   +				   false, MEMMODEL_ACQUIRE, MEMMODEL_RELAXED);
  +  if (__builtin_expect (oldval == count, 1))
  +   return;
 
 Why aren't you instead testing the return value of
 __atomic_compare_exchange_4 (here and in other cases)?

If you use the return value on powerpc, you find that requires two
load immediate instructions (loading zero and one), and a compare
against zero.  That makes three fewer instructions as written, because
the oldval == count comparison has already been done inside the atomic
sequence.  I'd expect fewer on most other architectures unless they
happen to have a compare and exchange instruction that sets condition
bits (ie. Intel).  Even on Intel the way I've written the code
shouldn't take more instructions with a properly written cmpxchg rtl
description.  Does it?

Hmm, I suppose you could argue that powerpc and others ought to not
generate those three extra instructions when using the return value.
I'll see about fixing powerpc.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [trunk] RFS: translate built-in include paths for sysroot (issue 5394041)

2011-11-28 Thread shenhan

Hi, Joseph, get a chance to take a look? Thanks!
-Han

On 2011/11/19 00:10:20, shenhan wrote:

Hi, Joseph, thanks!



ChangeLog entries added to the issue description.



ChangeLog
 * Makefile.in (GPLUSPLUS_INCLUDE_DIR_ADD_SYSROOT): add a macro
 definition to compile command.
 * cppdefault.c (GPLUSPLUS_INCLUDE_DIR_ADD_SYSROOT): replace hard
 coded add_sysroot field with the control macro.
 * configure.ac (gcc_gxx_include_dir_add_sysroot): add a flag
 variable to control whether sysroot should be prepended to gxx
 include dir.
 * configure: Regenerate.



Yeah, I should include the URL for the previous discussion in the
mail.  (I previously put it in the issue description together with the
rationale.)


This is a follow up for issue - http://codereview.appspot.com/4641076.
The rationale for this patch is also copied here:
==



The setup:

Configuring a toolchain targeting x86-64 GNU Linux (Ubuntu Lucid), as a
cross-compiler.  Using a sysroot to provide the Lucid headers+libraries,
with the sysroot path being within the GCC install tree.  Want to use the
Lucid system libstdc++ and headers, which means that I'm not
building/installing libstdc++-v3.



So, configuring with:
   --with-sysroot=$SYSROOT
   --disable-libstdc++-v3 \
   --with-gxx-include-dir=$SYSROOT/usr/include/c++/4.4 \
(among other options).



Hoping to support two usage models with this configuration, w.r.t. use of
the sysroot:



(1) somebody installs the sysroot in the normal location relative to the
GCC install, and relocates the whole bundle (sysroot+GCC).  This works
great AFAICT, GCC finds its includes (including the C++ includes) thanks
to the add_standard_paths iprefix handling.



(2) somebody installs the sysroot in a non-standard location, and uses
--sysroot to try to access it.  This works fine for the C headers, but
doesn't work.



For the C headers, add_standard_paths prepends the sysroot location to
the /usr/include path (since that's what's specified in cppdefault.c for
that path).  It doesn't do the same for the C++ include path, though
(again, as specified in cppdefault.c).



add_standard_paths doesn't attempt to relocate built-in include paths that
start with the compiled-in sysroot location (e.g., the g++ include dir, in
this case).  This isn't surprising really: normally you either prepend the
sysroot location or you don't (as specified by cppdefault.c); none of the
built-in paths normally *start* with the sysroot location and need to be
relocated.  However, in this odd-ball case of trying to use the C++ headers
from the sysroot, one of the paths *does* need to be relocated in this way.

=




Chris (cgd@) provided a patch for the issue, but there was a different
opinion; my patch here is just coded as suggested.



Tested before/after on a x86_64-unknown-linux-gnu system, with
--enable-languages=all,
RUNTESTFLAGS='--target_board=unix\{-m32,-m64\}'.
No changes in test results.



-Han




On Fri, Nov 18, 2011 at 8:54 AM, Joseph S. Myers
jos...@codesourcery.com wrote:


 On Tue, 15 Nov 2011, Han Shen wrote:

  2011-11-15  Han Shen  shen...@google.com
 
 	* gcc/Makefile.in:
 	* gcc/configure:
 	* gcc/cppdefault.c:

 You need to include the full ChangeLog entries with your patch.  Note that
 paths in ChangeLogs are relative to the directory with the ChangeLog, so
 no gcc/ in this case.

 Please also include the full rationale with your patch *and URLs for the
 previous discussion* (from June, it seems) so that reviewers don't need to
 track that down themselves.

  diff --git a/gcc/configure b/gcc/configure
  index 99334ce..364d8c2 100755
  --- a/gcc/configure
  +++ b/gcc/configure

 It's not possible to review this patch as is because you've only included
 the changes to the generated file configure, not to its source file
 configure.ac.  Please make sure that your resubmission includes all the
 source file changes.  (There is no need to include the changes to the
 generated file configure at all in the submission; the ChangeLog entry can
 just mention it as * configure: Regenerate..)

 --
 Joseph S. Myers
 jos...@codesourcery.com

http://codereview.appspot.com/5394041/


Re: [PATCH] Make sibcall argument overlap check less pessimistic (PR middle-end/50074)

2011-11-28 Thread Jakub Jelinek
On Mon, Nov 28, 2011 at 11:10:56PM +0100, Eric Botcazou wrote:
 This looks reasonable, but the logic is a bit hard to follow, especially the 
 double usage of internal_arg_pointer_based_reg depending on SCAN's value.
 Would it be possible to split it into 2 functions that recursively call each 
 other?

Well, splitting off the scanning into separate function is quite easy, but
that would still mean SCAN parameter.  The thing is, for the common case
where ADDR is already internal_arg_pointer, or sum of that and CONST_INT, or
SYMBOL_REF, I'd prefer to avoid the scanning of the instructions and only do
that if really needed (i.e. when either looking for a pseudo or arbitrary
expression that needs to be for_each_rtx scanned).  To get rid of the SCAN
argument I'm afraid the first part of the function would need to be
duplicated (for the current SCAN and !SCAN cases), then the scanning phase
in another function and the last part in another one.

 You should also make it clearer that internal_arg_pointer_seq_start is part 
 of 
 the caching mechanism (I wondered for some minutes whether it has anything to 
 do with RTL sequences), maybe:

Ok.

  +static int
  +internal_arg_pointer_based_reg_1 (rtx *loc, void *data ATTRIBUTE_UNUSED)
  +{
  +  if (REG_P (*loc) && internal_arg_pointer_based_reg (*loc, false) != NULL_RTX)
  +    return 1;
  +  if (MEM_P (*loc))
  +return -1;
  +  return 0;
  +}
 
 Missing comment for the MEM_P case.

Will add.
 
 if (REG_P (reg) && HARD_REGISTER_P (reg))
   return NULL_RTX;

Ok.  BTW, it fixes also wrong-code PR51323, so I'll be including new
testcase in the next patch iteration.

Jakub


Re: Fix libgomp semaphores

2011-11-28 Thread Richard Henderson
On 11/28/2011 02:16 PM, Alan Modra wrote:
 Hmm, I suppose you could argue that powerpc and others ought to not
 generate those three extra instructions when using the return value.
 I'll see about fixing powerpc.

True.  For weak, the value *should* always be used (otherwise the user program 
is broken).

However, we can do better by considering the value to be stored in CR0.  We 
perform the comparison in the loop, which sets CR0 == EQ (true) or NE (false); 
CR0 is set again by the STWCX insn in the same fashion.  So the

  /* ??? It's either this or an unlikely jump over (set bool 1).  */
  x = gen_rtx_EQ (SImode, cond, const0_rtx);
  emit_insn (gen_rtx_SET (VOIDmode, boolval, x));

code currently emitted by rs6000_expand_atomic_compare_and_swap is sufficient, 
and merely needs to be adjusted so that it is computed after label2, and other 
settings to boolval removed.

This may be sufficient for better strong compare-and-swap as well.


r~


[PATCH] Don't ICE on label DEBUG_INSN in rtl loop unrolling (PR rtl-optimization/51014)

2011-11-28 Thread Jakub Jelinek
Hi!

DEBUG_INSN with LABEL_DECL var isn't duplicated in bb copies (we want
just one definition of the label), which breaks apply_opt_in_copies
attempt to match insn in bb copy with orig_insn from the orig_bb.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
ok for trunk?

2011-11-28  Jakub Jelinek  ja...@redhat.com

PR rtl-optimization/51014
* loop-unroll.c (apply_opt_in_copies): Ignore label DEBUG_INSNs
both from bb and orig_bb.

* g++.dg/opt/pr51014.C: New test.

--- gcc/loop-unroll.c.jj	2011-02-15 15:42:26.000000000 +0100
+++ gcc/loop-unroll.c	2011-11-28 21:03:58.089497366 +0100
@@ -2262,10 +2262,15 @@ apply_opt_in_copies (struct opt_info *op
   for (insn = BB_HEAD (bb); insn != NEXT_INSN (BB_END (bb)); insn = next)
 {
   next = NEXT_INSN (insn);
-      if (!INSN_P (insn))
+      if (!INSN_P (insn)
+	  || (DEBUG_INSN_P (insn)
+	      && TREE_CODE (INSN_VAR_LOCATION_DECL (insn)) == LABEL_DECL))
 	continue;
 
-      while (!INSN_P (orig_insn))
+      while (!INSN_P (orig_insn)
+	     || (DEBUG_INSN_P (orig_insn)
+		 && (TREE_CODE (INSN_VAR_LOCATION_DECL (orig_insn))
+		     == LABEL_DECL)))
 	orig_insn = NEXT_INSN (orig_insn);
 
   ivts_templ.insn = orig_insn;
--- gcc/testsuite/g++.dg/opt/pr51014.C.jj	2011-11-28 21:08:19.518986308 +0100
+++ gcc/testsuite/g++.dg/opt/pr51014.C	2011-11-28 21:07:24.000000000 +0100
@@ -0,0 +1,16 @@
+// PR rtl-optimization/51014
+// { dg-do compile }
+// { dg-options "-O2 -funroll-loops -fcompare-debug" }
+
+struct S
+{
+  ~S() { delete s; }
+  int *s;
+};
+
+void
+f (S *x, S *y)
+{
+  for (; x != y; ++x)
+    x->~S();
+}

Jakub


Re: PR other/51174: handle architectures with no DECL_COMDAT_GROUP

2011-11-28 Thread David Edelsohn
On Wed, Nov 23, 2011 at 8:57 AM, Jakub Jelinek ja...@redhat.com wrote:
 On Wed, Nov 23, 2011 at 07:47:46AM -0600, Aldy Hernandez wrote:
 @@ -4198,7 +4198,7 @@ ipa_tm_create_version_alias (struct cgra
    TREE_SYMBOL_REFERENCED (tm_name) = 1;
 
    /* Perform the same remapping to the comdat group.  */
 -  if (DECL_COMDAT (new_decl))
  +  if (HAVE_COMDAT_GROUP && DECL_COMDAT (new_decl))
      DECL_COMDAT_GROUP (new_decl) = tm_mangle (DECL_COMDAT_GROUP 
  (old_decl));
 
    new_node = cgraph_same_body_alias (NULL, new_decl, info-new_decl);

 Wouldn't it be better to test if (DECL_ONE_ONLY (new_decl))
 instead?  That is actually test for non-NULL DECL_COMDAT_GROUP
 and is what e.g. cp/ uses as guards on tweaking DECL_COMDAT_GROUP.

 BTW, the formatting is wrong above, no space before && and two spaces after
 it.

That will work as well.

Which solution should be committed?

Thanks, David


Re: Fix libgomp semaphores

2011-11-28 Thread Richard Henderson
On 11/27/2011 02:53 PM, Alan Modra wrote:
 +enum memmodel
 +{
 +  MEMMODEL_RELAXED = 0,
 +  MEMMODEL_CONSUME = 1,
 +  MEMMODEL_ACQUIRE = 2,
 +  MEMMODEL_RELEASE = 3,
 +  MEMMODEL_ACQ_REL = 4,
 +  MEMMODEL_SEQ_CST = 5,
 +  MEMMODEL_LAST = 6
 +};

This should probably go to libgomp.h.

  /* This is a Linux specific implementation of a semaphore synchronization
 mechanism for libgomp.  This type is private to the library.  This 
 +   counting semaphore implementation uses atomic instructions and the
 +   futex syscall, and a single 32-bit int to store semaphore state.
 +   The low 31 bits are the count, the top bit is a flag set when some
 +   threads may be waiting.  */

I think we'll get better code generation on a number of targets if we make the 
low bit the waiting bit, and the higher bits the count.  Then we do (count >> 1) 
and (count + 2) instead of larger constants needing to be generated.  Not 
changed below...

 +static inline void
 +gomp_sem_wait (gomp_sem_t *sem)
  {
 +  int count = *sem;
 +
  +  while ((count & 0x7fffffff) != 0)
 +{
 +  int oldval = count;
  +  __atomic_compare_exchange_n (sem, &oldval, count - 1,
  +				false, MEMMODEL_ACQUIRE, MEMMODEL_RELAXED);
 +  if (__builtin_expect (oldval == count, 1))
 + return;
 +  count = oldval;
 +}

I'd really prefer not to hard-code any sizes at all.  Let's change the explicit 
_4 to _n everywhere.

The loop above doesn't actually make sense to me.  If the compare-and-swap 
succeeds, then oldval == count - 1.  Which doesn't seem to be what you're 
testing at all.

Perhaps this entire function is better written as

{
  int count = *sem;
  do
{
  if (__builtin_expect (count & 0x80000000u, 0))
{
  gomp_sem_wait_slow (sem, count);
  return;
}
}
  while (!__atomic_compare_exchange_n (sem, &count, count - 1, true,
   MEMMODEL_ACQUIRE, MEMMODEL_RELAXED));
}

 +gomp_sem_post (gomp_sem_t *sem)
  {
 +  int count = *sem;
 +
 +  while (1)
 +{
 +  int oldval = count;
 +  __atomic_compare_exchange_4 (sem, &oldval, (count + 1) & 0x7fffffff,
 +				false, MEMMODEL_RELEASE, MEMMODEL_RELAXED);
 +  if (__builtin_expect (oldval == count, 1))
 + break;

Again, this equality doesn't make sense.  Further, you aren't testing for the
high bit set inside the loop, nor are you preserving the high bit.

Perhaps better written as

{
  int count = *sem;
  do
{
  if (__builtin_expect (count & 0x80000000u, 0))
{
  gomp_sem_post_slow (sem, count);
  return;
}
  /* We can't post more than 2**31-1 times.  */
  assert (count < 0x7fffffff);
}
  while (!__atomic_compare_exchange_n (sem, &count, count + 1, true,
   MEMMODEL_RELEASE, MEMMODEL_RELAXED));
}

... All that said, I don't see how we can ever clear the wait bit once it's set?
 Are we better off splitting the 32-bit value into two 16-bit fields for 
value+waiters?  Or similarly splitting a 64-bit value?  That way we can at 
least update both fields atomically, and we ought never lose a waiter.


r~


Re: PR other/51174: handle architectures with no DECL_COMDAT_GROUP

2011-11-28 Thread Richard Henderson
On 11/28/2011 03:54 PM, David Edelsohn wrote:
 On Wed, Nov 23, 2011 at 8:57 AM, Jakub Jelinek ja...@redhat.com wrote:
 On Wed, Nov 23, 2011 at 07:47:46AM -0600, Aldy Hernandez wrote:
 @@ -4198,7 +4198,7 @@ ipa_tm_create_version_alias (struct cgra
   TREE_SYMBOL_REFERENCED (tm_name) = 1;

   /* Perform the same remapping to the comdat group.  */
 -  if (DECL_COMDAT (new_decl))
 +  if (HAVE_COMDAT_GROUP && DECL_COMDAT (new_decl))
 DECL_COMDAT_GROUP (new_decl) = tm_mangle (DECL_COMDAT_GROUP 
 (old_decl));

   new_node = cgraph_same_body_alias (NULL, new_decl, info-new_decl);

 Wouldn't it be better to test if (DECL_ONE_ONLY (new_decl))
 instead?  That is actually test for non-NULL DECL_COMDAT_GROUP
 and is what e.g. cp/ uses as guards on tweaking DECL_COMDAT_GROUP.

 BTW, the formatting is wrong above, no space before && and two spaces after
 it.
 
 That will work as well.
 
 Which solution should be committed?

Please use the DECL_ONE_ONLY test.


Re: Fix libgomp semaphores

2011-11-28 Thread Richard Henderson
On 11/28/2011 03:05 PM, Richard Henderson wrote:
 On 11/28/2011 02:16 PM, Alan Modra wrote:
 Hmm, I suppose you could argue that powerpc and others ought to not
 generate those three extra instructions when using the return value.
 I'll see about fixing powerpc.
 
 However, we can do better by considering the value to be stored in CR0...

Try this and see if it generates the sort of code you want.  Untested.


r~

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index f01353b..d3b990f 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -17352,11 +17352,11 @@ rs6000_expand_atomic_compare_and_swap (rtx operands[])
   retval = gen_reg_rtx (SImode);
   mode = SImode;
 }
+  else if (reg_overlap_mentioned_p (retval, oldval))
+oldval = copy_to_reg (oldval);
 
   rs6000_pre_atomic_barrier (mod_s);
 
-  emit_move_insn (boolval, const0_rtx);
-
   label1 = NULL_RTX;
   if (!is_weak)
 {
@@ -17374,28 +17374,23 @@ rs6000_expand_atomic_compare_and_swap (rtx operands[])
   NULL_RTX, 1, OPTAB_LIB_WIDEN);
 }
 
-  x = gen_rtx_NE (VOIDmode, x, oldval);
-  x = rs6000_generate_compare (x, mode);
+  cond = gen_reg_rtx (CCmode);
+  x = gen_rtx_COMPARE (CCmode, x, oldval);
+  emit_insn (gen_rtx_SET (VOIDmode, cond, x));
+
+  x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
   emit_unlikely_jump (x, label2);
 
   x = newval;
   if (mask)
 x = rs6000_mask_atomic_subword (retval, newval, mask);
 
-  cond = gen_reg_rtx (CCmode);
   emit_store_conditional (mode, cond, mem, x);
 
-  if (is_weak)
-{
-  /* ??? It's either this or an unlikely jump over (set bool 1).  */
-  x = gen_rtx_EQ (SImode, cond, const0_rtx);
-  emit_insn (gen_rtx_SET (VOIDmode, boolval, x));
-}
-  else
+  if (!is_weak)
 {
   x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
   emit_unlikely_jump (x, label1);
-  emit_move_insn (boolval, const1_rtx);
 }
 
   if (mod_f != MEMMODEL_RELAXED)
@@ -17408,6 +17403,18 @@ rs6000_expand_atomic_compare_and_swap (rtx operands[])
 
   if (shift)
 rs6000_finish_atomic_subword (operands[1], retval, shift);
+
+  if (is_weak)
+{
+  x = gen_rtx_EQ (SImode, cond, const0_rtx);
+  emit_insn (gen_rtx_SET (VOIDmode, boolval, x));
+}
+  else
+{
+  x = emit_store_flag_force (boolval, EQ, retval, oldval, mode, 1, 1);
+  if (x != boolval)
+   emit_move_insn (boolval, x);
+}
 }
 
 /* Expand an atomic exchange operation.  */


[rs6000] Streamline compare-and-swap success return value computation

2011-11-28 Thread Richard Henderson
On 11/28/2011 04:26 PM, Richard Henderson wrote:
 On 11/28/2011 03:05 PM, Richard Henderson wrote:
 On 11/28/2011 02:16 PM, Alan Modra wrote:
 Hmm, I suppose you could argue that powerpc and others ought to not
 generate those three extra instructions when using the return value.
 I'll see about fixing powerpc.

 However, we can do better by considering the value to be stored in CR0...
 
 Try this and see if it generates the sort of code you want.  Untested.

... actually, this one.  There's no reason to differentiate between strong
and weak compare-and-swap when computing boolval.


r~
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index f01353b..5a33f91 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -17352,11 +17352,11 @@ rs6000_expand_atomic_compare_and_swap (rtx operands[])
   retval = gen_reg_rtx (SImode);
   mode = SImode;
 }
+  else if (reg_overlap_mentioned_p (retval, oldval))
+oldval = copy_to_reg (oldval);
 
   rs6000_pre_atomic_barrier (mod_s);
 
-  emit_move_insn (boolval, const0_rtx);
-
   label1 = NULL_RTX;
   if (!is_weak)
 {
@@ -17374,28 +17374,23 @@ rs6000_expand_atomic_compare_and_swap (rtx operands[])
   NULL_RTX, 1, OPTAB_LIB_WIDEN);
 }
 
-  x = gen_rtx_NE (VOIDmode, x, oldval);
-  x = rs6000_generate_compare (x, mode);
+  cond = gen_reg_rtx (CCmode);
+  x = gen_rtx_COMPARE (CCmode, x, oldval);
+  emit_insn (gen_rtx_SET (VOIDmode, cond, x));
+
+  x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
   emit_unlikely_jump (x, label2);
 
   x = newval;
   if (mask)
 x = rs6000_mask_atomic_subword (retval, newval, mask);
 
-  cond = gen_reg_rtx (CCmode);
   emit_store_conditional (mode, cond, mem, x);
 
-  if (is_weak)
-{
-  /* ??? It's either this or an unlikely jump over (set bool 1).  */
-  x = gen_rtx_EQ (SImode, cond, const0_rtx);
-  emit_insn (gen_rtx_SET (VOIDmode, boolval, x));
-}
-  else
+  if (!is_weak)
 {
   x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
   emit_unlikely_jump (x, label1);
-  emit_move_insn (boolval, const1_rtx);
 }
 
   if (mod_f != MEMMODEL_RELAXED)
@@ -17408,6 +17403,10 @@ rs6000_expand_atomic_compare_and_swap (rtx operands[])
 
   if (shift)
 rs6000_finish_atomic_subword (operands[1], retval, shift);
+
+  /* In all cases, CR0 contains EQ on success, and NE on failure.  */
+  x = gen_rtx_EQ (SImode, cond, const0_rtx);
+  emit_insn (gen_rtx_SET (VOIDmode, boolval, x));
 }
 
 /* Expand an atomic exchange operation.  */


Re: Fix libgomp semaphores

2011-11-28 Thread Alan Modra
On Mon, Nov 28, 2011 at 04:00:10PM -0800, Richard Henderson wrote:
 On 11/27/2011 02:53 PM, Alan Modra wrote:
  +enum memmodel
  +{
  +  MEMMODEL_RELAXED = 0,
  +  MEMMODEL_CONSUME = 1,
  +  MEMMODEL_ACQUIRE = 2,
  +  MEMMODEL_RELEASE = 3,
  +  MEMMODEL_ACQ_REL = 4,
  +  MEMMODEL_SEQ_CST = 5,
  +  MEMMODEL_LAST = 6
  +};
 
 This should probably go to libgomp.h.

I wondered about that.

   /* This is a Linux specific implementation of a semaphore synchronization
  mechanism for libgomp.  This type is private to the library.  This 
  +   counting semaphore implementation uses atomic instructions and the
  +   futex syscall, and a single 32-bit int to store semaphore state.
  +   The low 31 bits are the count, the top bit is a flag set when some
  +   threads may be waiting.  */
 
 I think we'll get better code generation on a number of targets if we make 
 the low bit the waiting bit, and the higher bits the count.  Then we do 
 (count >> 1) and (count + 2) instead of larger constants needing to be 
 generated.  Not changed below...

Funny, that's the way I wrote this code at first, then went for the
wait bit as the sign bit because you can test > 0 in one place where
you want not waiting and count non-zero.
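Concretely (an illustrative sketch, not libgomp code): with the flag in the
sign bit, "no waiters and count non-zero" collapses into one signed
comparison, whereas a low-bit flag needs a mask plus a shift for the same
question.

```c
#include <stdint.h>

/* Sign-bit layout: bits 0-30 = count, bit 31 = wait flag.
   "Uncontended and count non-zero" is a single signed test.  */
static int
uncontended_nonzero_signbit (int32_t v)
{
  return v > 0;			/* count != 0 and no wait flag.  */
}

/* Low-bit layout: bit 0 = wait flag, bits 1-31 = count.
   The same question takes two operations.  */
static int
uncontended_nonzero_lowbit (uint32_t v)
{
  return (v & 1) == 0 && (v >> 1) != 0;
}
```

On most targets the first form is one compare-and-branch; the second needs the
mask/shift first, which is the trade-off against rth's cheaper (count + 2)
constants.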

  +static inline void
  +gomp_sem_wait (gomp_sem_t *sem)
   {
  +  int count = *sem;
  +
   +  while ((count & 0x7fffffff) != 0)
  +{
  +  int oldval = count;
   +  __atomic_compare_exchange_4 (sem, &oldval, count - 1,
   +				  false, MEMMODEL_ACQUIRE, MEMMODEL_RELAXED);
  +  if (__builtin_expect (oldval == count, 1))
  +   return;
  +  count = oldval;
  +}
 
 I'd really prefer not to hard-code any sizes at all.  Let's change the 
 explicit _4 to _n everywhere.

OK.

 The loop above doesn't actually make sense to me.  If the compare-and-swap 
 succeeds, then oldval == count - 1.  Which doesn't seem to be what you're 
 testing at all.

Huh?  If it succeeds, oldval == count and we return.

 Perhaps this entire function is better written as
 
 {
   int count = *sem;
   do
 {
    if (__builtin_expect (count & 0x80000000u, 0))
 {
   gomp_sem_wait_slow (sem, count);
   return;
 }
 }
    while (!__atomic_compare_exchange_n (sem, &count, count - 1, true,
  MEMMODEL_ACQUIRE, MEMMODEL_RELAXED));
 }

No, we don't need the slow path if we have *sem & 0x7fffffff non-zero.

 
  +gomp_sem_post (gomp_sem_t *sem)
   {
  +  int count = *sem;
  +
  +  while (1)
  +{
  +  int oldval = count;
   +  __atomic_compare_exchange_4 (sem, &oldval, (count + 1) & 0x7fffffff,
   +				  false, MEMMODEL_RELEASE, MEMMODEL_RELAXED);
  +  if (__builtin_expect (oldval == count, 1))
  +   break;
 
 Again, this equality doesn't make sense.  Further, you aren't testing for the
 high bit set inside the loop, nor are you preserving the high bit.

See above about the equality.  We don't want to preserve the wait bit
here.  Otherwise we'd never go back to the uncontended state, which
answers

 ... All that said, I don't see how we can ever clear the wait bit
 once its set?

this question.

 Are we better off splitting the 32-bit value into two 16-bit fields for 
 value+waiters?  Or similarly splitting a 64-bit value?  That way we can at 
 least update both fields atomically, and we ought never lose a waiter.

Other splits of the field are certainly possible, but of course
restrict the max number, and you'd need the fancy futex op_wait.
Some targets don't have 64-bit atomics, so we can't really go that
way.  Yes, if I did keep track of number of waiters we'd save one
unnecessary futex_wake call, but I'm quite confident no waiters will
be lost just using a flag.
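
To make the wait-flag scheme concrete, here is a single-threaded sketch of the post side. SEM_WAIT_FLAG is a hypothetical name for the high "waiters present" bit, and the futex_wake call the real code would make when the flag was set is omitted; this is a model of the idea under discussion, not libgomp's implementation:

```c
#include <stdbool.h>

#define SEM_WAIT_FLAG 0x80000000u

typedef unsigned int gomp_sem_sketch_t;

/* Increment the count, dropping the wait flag so the semaphore returns
   to the uncontended state; report whether a futex_wake would be needed
   (i.e. whether the flag was set when we posted).  */
static bool
sem_post_sketch (gomp_sem_sketch_t *sem)
{
  unsigned int count = __atomic_load_n (sem, __ATOMIC_RELAXED);
  while (true)
    {
      unsigned int oldval = count;
      if (__atomic_compare_exchange_n (sem, &count,
				       (count + 1) & ~SEM_WAIT_FLAG,
				       false, __ATOMIC_RELEASE,
				       __ATOMIC_RELAXED))
	return (oldval & SEM_WAIT_FLAG) != 0;
      /* COUNT was reloaded by the failed exchange; retry.  */
    }
}
```

Clearing the flag on every post is what lets the semaphore return to the uncontended state; at worst it costs one spurious futex_wake, which matches Alan's point that no waiter can be lost.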

-- 
Alan Modra
Australia Development Lab, IBM


Re: [rs6000] Streamline compare-and-swap success return value computation

2011-11-28 Thread Alan Modra
On Mon, Nov 28, 2011 at 04:33:58PM -0800, Richard Henderson wrote:
 On 11/28/2011 04:26 PM, Richard Henderson wrote:
  On 11/28/2011 03:05 PM, Richard Henderson wrote:
  On 11/28/2011 02:16 PM, Alan Modra wrote:
  Hmm, I suppose you could argue that powerpc and others ought to not
  generate those three extra instructions when using the return value.
  I'll see about fixing powerpc.
 
  However, we can do better by considering the value to be stored in CR0...
  
  Try this and see if it generates the sort of code you want.  Untested.
 
 ... actually, this one.  There's no reason to differentiate between strong
 and weak compare-and-swap when computing boolval.

Looks very similar to what I wrote, except

 +  else if (reg_overlap_mentioned_p (retval, oldval))
 +oldval = copy_to_reg (oldval);

I missed this bit

 -  x = gen_rtx_NE (VOIDmode, x, oldval);
 -  x = rs6000_generate_compare (x, mode);
 +  cond = gen_reg_rtx (CCmode);
 +  x = gen_rtx_COMPARE (CCmode, x, oldval);
 +  emit_insn (gen_rtx_SET (VOIDmode, cond, x));
 +
 +  x = gen_rtx_NE (VOIDmode, cond, const0_rtx);

and instead pulled cond out of the return from rs6000_generate_compare.

Results look good.

-- 
Alan Modra
Australia Development Lab, IBM


Re: Use atomics in libgomp mutex

2011-11-28 Thread Richard Henderson
On 11/27/2011 02:57 PM, Alan Modra wrote:
 This is the mutex part.  Depends on
 http://gcc.gnu.org/ml/gcc-patches/2011-11/msg02454.html for MEMMODEL_*
 values.
 
   * config/linux/mutex.h: Use atomic rather than sync builtins.
   * config/linux/mutex.c: Likewise.  Comment.  Use -1 for waiting state.
   * config/linux/omp-lock.h: Comment fix.
   * config/linux/arm/mutex.h: Delete.
   * config/linux/powerpc/mutex.h: Delete.
   * config/linux/ia64/mutex.h: Delete.
   * config/linux/mips/mutex.h: Delete.

Looks good modulo _4/_n and using the success return of 
__atomic_compare_exchange.


r~


Re: Fix PR51298, libgomp barrier failure

2011-11-28 Thread Richard Henderson
On 11/28/2011 06:02 AM, Alan Modra wrote:
 -  unsigned int ret = bar->generation & ~3;
 -  /* Do we need any barrier here or is __sync_add_and_fetch acting
 - as the needed LoadLoad barrier already?  */
 -  ret += __sync_add_and_fetch (&bar->awaited, -1) == 0;
 +  unsigned int ret = __atomic_load_4 (&bar->generation, MEMMODEL_ACQUIRE) & ~3;
 +  ret += __atomic_add_fetch (&bar->awaited, -1, MEMMODEL_ACQ_REL) == 0;

Given that the read from bar->generation is ACQ, we don't need a duplicate
barrier from the REL on the atomic add.  I believe both can be MEMMODEL_ACQUIRE,
both to force the ordering of these two memops and to force them to happen
before anything subsequent.

The s/_4/_n/ change needs doing.

Otherwise ok.


r~


[libgomp] Define memmodel

2011-11-28 Thread Richard Henderson
This fixes the ordering issues with a couple of Alan's pending patches, and
puts the enumeration where the entire library can use it.

Committed.


r~
* libgomp.h (enum memmodel): New.



diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 9cf9d32..2c9c071 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -45,6 +45,17 @@
 # pragma GCC visibility push(hidden)
 #endif
 
+/* If we were a C++ library, we'd get this from std/atomic.  */
+enum memmodel
+{
+  MEMMODEL_RELAXED = 0,
+  MEMMODEL_CONSUME = 1,
+  MEMMODEL_ACQUIRE = 2,
+  MEMMODEL_RELEASE = 3,
+  MEMMODEL_ACQ_REL = 4,
+  MEMMODEL_SEQ_CST = 5
+};
+
 #include "sem.h"
 #include "mutex.h"
 #include "bar.h"
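
The reason this enum can live in a plain C header is that its values are passed directly to the __atomic_* builtins, so they must agree with the compiler's predefined __ATOMIC_* constants. A compile-time sanity check, not part of the patch, could assert that correspondence:

```c
/* Hypothetical sanity check: the libgomp enum must stay in sync with
   GCC's predefined __ATOMIC_* constants, which is what makes passing
   MEMMODEL_* values straight to the __atomic_* builtins valid.  */
enum memmodel
{
  MEMMODEL_RELAXED = 0,
  MEMMODEL_CONSUME = 1,
  MEMMODEL_ACQUIRE = 2,
  MEMMODEL_RELEASE = 3,
  MEMMODEL_ACQ_REL = 4,
  MEMMODEL_SEQ_CST = 5
};

_Static_assert (MEMMODEL_RELAXED == __ATOMIC_RELAXED, "relaxed");
_Static_assert (MEMMODEL_CONSUME == __ATOMIC_CONSUME, "consume");
_Static_assert (MEMMODEL_ACQUIRE == __ATOMIC_ACQUIRE, "acquire");
_Static_assert (MEMMODEL_RELEASE == __ATOMIC_RELEASE, "release");
_Static_assert (MEMMODEL_ACQ_REL == __ATOMIC_ACQ_REL, "acq_rel");
_Static_assert (MEMMODEL_SEQ_CST == __ATOMIC_SEQ_CST, "seq_cst");
```

The numbering also matches C++11's std::memory_order, which is why the comment in the patch refers to std/atomic.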


Re: [PATCH, rs6000] Reorganize powerpc builtins and enable target pragma/attribute to enable/disable builtins

2011-11-28 Thread David Edelsohn
On Mon, Nov 7, 2011 at 5:32 PM, Michael Meissner
meiss...@linux.vnet.ibm.com wrote:

 This patch rewrites the way builtins are handled in the rs6000 port so that,
 like the x86, #pragma GCC target or attribute((target(...))) enables or
 disables the builtins based on the current target attributes.  It also
 defines or undefines the various hardware macros (__ARCH_xxx, __VSX__,
 and __ALTIVEC__), and enables/disables the vector keyword.

 I've done bootstrap builds with make check, and there is one minor regression,
 which I will fix as a bug (gcc.target/powerpc/recip-5.c fails due to not
 vectorizing the recip builtin) after this patch goes in.  I built a linuxpaired
 compiler and verified that the paired builtins are being handled correctly.

 I have some more ideas of cleanup (moving the large table in rs6000-c.c into
 rs6000-builtins.def, and eliminating the various bdesc arrays, but for now, it
 works, and hopefully is a bit cleaner than before).

 Is this ok to check in?

 [gcc]
 2011-11-07  Michael Meissner  meiss...@tlinux.vnet.ibm.com

        * config/rs6000/rs6000-builtins.def: Completely rewrite builtin
        handling to centralize more of the builtins in this file.  Change
        some builtin enumerations to be more consistent.  Use a new mask
        to hold the current builtins, including SPE and PAIRED builtins
        which no longer are set via target_flags masks.  Add
        -mdebug=builtin debug support.  For power machines, define all
        Altivec and VSX builtins when the compiler starts, but don't allow
        the use of a builtin unless the appropriate switch is used, or
        #pragma GCC target is used to change the options.  If the user
        uses #pragma GCC target, update the appropriate hardware macros.
        * common/config/rs6000/rs6000-common.c (rs6000_handle_option):
        Ditto.
        * config/rs6000/rs6000.opt (rs6000_builtin_mask): Ditto.
        * config/rs6000/rs6000-c.c (rs6000_macro_to_expand): Ditto.
        (rs6000_define_or_undefine_macro): Ditto.
        (rs6000_target_modify_macros): Ditto.
        (rs6000_cpu_cpp_builtins): Ditto.
        (altivec_overloaded_builtins): Ditto.
        (altivec_build_resolved_builtin): Ditto.
        * config/rs6000/rs6000.c (rs6000_target_modify_macros_ptr):
        Ditto.
        (rs6000_builtin_info): Ditto.
        (def_builtin): Ditto.
        (enable_mask_for_builtins): Ditto.
        (DEBUG_FMT_X): Ditto.
        (rs6000_debug_reg_global): Ditto.
        (rs6000_builtin_mask_calculate): Ditto.
        (rs6000_option_override_internal): Ditto.
        (rs6000_builtin_conversion): Ditto.
        (rs6000_builtin_vectorized_function): Ditto.
        (bdesc_3arg): Ditto.
        (bdesc_dst): Ditto.
        (bdesc_2arg): Ditto.
        (builtin_description_predicates): Ditto.
        (bdesc_altivec_preds): Ditto.
        (bdesc_spe_predicates): Ditto.
        (bdesc_spe_evsel): Ditto.
        (bdesc_paired_preds): Ditto.
        (bdesc_abs): Ditto.
        (bdesc_1arg): Ditto.
        (rs6000_overloaded_builtin_p): Ditto.
        (rs6000_expand_unop_builtin): Ditto.
        (bdesc_2arg_spe): Ditto.
        (spe_expand_builtin): Ditto.
        (rs6000_invalid_builtin): Ditto.
        (rs6000_expand_builtin): Ditto.
        (rs6000_init_builtins): Ditto.
        (spe_init_builtins): Ditto.
        (paired_init_builtins): Ditto.
        (altivec_init_builtins): Ditto.
        (builtin_function_type): Ditto.
        (rs6000_common_init_builtins): Ditto.
        (rs6000_builtin_reciprocal): Ditto.
        (rs6000_builtin_mask_names): Ditto.
        (rs6000_pragma_target_parse): Ditto.
        (rs6000_function_specific_print): Ditto.
        * config/rs6000/rs6000.h (MASK_DEBUG_BUILTIN): Ditto.
        (MASK_DEBUG_ALL): Ditto.
        (TARGET_DEBUG_BUILTIN): Ditto.
        (TARGET_EXTRA_BUILTINS): Ditto.
        (REGISTER_TARGET_PRAGMAS): Ditto.
        (enum rs6000_btc): Ditto.
        (RS6000_BTC_*): Ditto.
        (RS6000_BTM_*): Ditto.
        (enum rs6000_builtins): Ditto.
        * config/rs6000/rs6000-protos.h (rs6000_overloaded_builtin_p):
        Ditto.
        (rs6000_target_modify_macros): Ditto.
        (rs6000_target_modify_macros_ptr): Ditto.

 [gcc/testsuite]
 2011-11-07  Michael Meissner  meiss...@linux.vnet.ibm.com

        * testsuite/gcc.target/powerpc/ppc-target-4.c: New test for
        builtin functions being defined and undefined based on the #pragma
        GCC target settings.

This is okay.

Thanks, David


Re: [build] Cleanup rs6000/t-ppccomm configurations (PR other/51022)

2011-11-28 Thread Alan Modra
On Mon, Nov 28, 2011 at 05:15:08PM +0100, Rainer Orth wrote:
 Alan Modra amo...@gmail.com writes:
 
  On Sun, Nov 27, 2011 at 08:36:38PM -0500, David Edelsohn wrote:
  I was hoping that other targets affected by the patch would test it as 
  well.
 
  powerpc-linux and powerpc64-linux look good.
 
 Thanks for the confirmation.  However, Andreas Tobler pointed out in
 private mail that I'd missed referencing rs6000/t-linux64 during the
 libgcc migration.  Strange that this seems to have no ill effect.

Hmm, yeah, we should have that back in.  Lack of t-linux64 means we
get all the soft-fp functions that we never use, bloating libgcc.
Also, -mno-minimal-toc gives better code.  Even when -mminimal-toc
wouldn't be in force anyway, the option selects -mcmodel=small.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] Improve EXPAND_SUM handling in expand_expr_addr_expr* (PR middle-end/50074)

2011-11-28 Thread Richard Henderson
On 11/25/2011 12:58 PM, Jakub Jelinek wrote:
   PR middle-end/50074
   * expr.c (expand_expr_addr_expr_1): Don't call force_operand for
   EXPAND_SUM modifier.

Ok.


r~


Re: Adjust omp-low test for alignment

2011-11-28 Thread Hans-Peter Nilsson
On Sat, 26 Nov 2011, Richard Henderson wrote:
 The m68k-linux failure for the various omp atomic tests
 is due to the fact that BIGGEST_ALIGNMENT is 16 bits on
 that platform.  I think it's pretty reasonable to assume
 that if something is aligned to BIGGEST_ALIGNMENT, then
 it can be considered aligned.

BIGGEST_ALIGNMENT means aligned enough for normal access, but
not necessarily for atomic access.

This particular fix wouldn't do it for CRIS, for example, where
BIGGEST_ALIGNMENT is 8 (bits), but an atomic access (inside the
ll/sc sequence) such as for a futex requires not straddling a
page boundary, i.e. natural alignment.

Not that OMP support is imminent or critical for cris-linux or
anything, but can we have a new macro?

brgds, H-P


Re: [committed] Fix failure of Octeon BADDU tests

2011-11-28 Thread Andrew Pinski
On Sun, Nov 27, 2011 at 2:18 AM, Richard Sandiford
rdsandif...@googlemail.com wrote:
 This patch fixes the MIPS octeon-baddu-1.c failures.  As Andrew says
 in the PR, the problem is that the combined instruction seems more
 expensive than the separate ones.

I was going to submit a similar patch which I created internally at
Cavium.  I have a few more insn_cost patches in my queue also.

Thanks,
Andrew Pinski


Re: Fix PR51298, libgomp barrier failure

2011-11-28 Thread Alan Modra
On Mon, Nov 28, 2011 at 05:42:15PM -0800, Richard Henderson wrote:
 On 11/28/2011 06:02 AM, Alan Modra wrote:
  -  unsigned int ret = bar->generation & ~3;
  -  /* Do we need any barrier here or is __sync_add_and_fetch acting
  - as the needed LoadLoad barrier already?  */
  -  ret += __sync_add_and_fetch (&bar->awaited, -1) == 0;
  +  unsigned int ret = __atomic_load_4 (&bar->generation, MEMMODEL_ACQUIRE) & ~3;
  +  ret += __atomic_add_fetch (&bar->awaited, -1, MEMMODEL_ACQ_REL) == 0;
 
  Given that the read from bar->generation is ACQ, we don't need a duplicate
  barrier from the REL on the atomic add.  I believe both can be
  MEMMODEL_ACQUIRE, both to force the ordering of these two memops and to
  force them to happen before anything subsequent.

I tried with MEMMODEL_ACQUIRE and ran force-parallel-6.exe, the test
that seems most sensitive to barrier problems, many times, and it hangs
occasionally in futex_wait called via gomp_team_barrier_wait_end.

I believe that threads can't be allowed to exit from
gomp_{,team_}barrier_wait without hitting a release barrier, and
perhaps from gomp_barrier_wait_last too.  gomp_barrier_wait_start is a
convenient point to insert the barrier, and a minimal change from the
old code using __sync_add_and_fetch.  I can add a comment. ;-)
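
Alan's conclusion can be sketched as follows. This is a simplified model of the wait-start step only (futex slow path and team-barrier state omitted), with the MEMMODEL_* names replaced by the equivalent predefined __ATOMIC_* constants; the struct and function names are illustrative, not libgomp's exact definitions:

```c
typedef struct
{
  unsigned total;
  unsigned awaited;
  unsigned generation;	/* Low two bits carry barrier state flags.  */
} barrier_sketch_t;

/* Return the current generation with the state bits masked off, plus 1
   if the caller is the last thread to arrive.  The ACQ_REL on the
   decrement is the release barrier that every thread passes through
   before it can leave the barrier, which is the property Alan argues
   a plain ACQUIRE would lose.  */
static unsigned
barrier_wait_start_sketch (barrier_sketch_t *bar)
{
  unsigned ret
    = __atomic_load_n (&bar->generation, __ATOMIC_ACQUIRE) & ~3u;
  ret += __atomic_add_fetch (&bar->awaited, -1, __ATOMIC_ACQ_REL) == 0;
  return ret;
}
```

Because every waiting thread executes this decrement on entry, the release ordering it provides covers the flush required by the OMP spec at barrier regions.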

-- 
Alan Modra
Australia Development Lab, IBM


[patch] Fix PR tree-optimization/51301

2011-11-28 Thread Ira Rosen
Hi,

In the vectorizer's over-widening pattern recognition, the last statement
is expected to be a type demotion, but the check for that was
incomplete. We now check that the resulting type is not bigger than
the original type of the computation.

Bootstrapped and tested on powerpc64-suse-linux, tested on
arm-linux-gnueabi (cross).
Committed.

Ira

ChangeLog:

PR tree-optimization/51301
* tree-vect-patterns.c (vect_recog_over_widening_pattern): Check that
the last statement doesn't convert to a bigger type than the original
type of the computation.

testsuite/ChangeLog:

PR tree-optimization/51301
* gcc.dg/vect/pr51301.c: New test.

Index: tree-vect-patterns.c
===
--- tree-vect-patterns.c(revision 181796)
+++ tree-vect-patterns.c(working copy)
@@ -1088,6 +1088,7 @@ vect_recog_over_widening_pattern (VEC (gimple, hea
   tree var = NULL_TREE, new_type = NULL_TREE, tmp, new_oprnd;
   bool first;
   struct loop *loop = (gimple_bb (stmt))->loop_father;
+  tree type = NULL;

   first = true;
   while (1)
@@ -1150,6 +1151,7 @@ vect_recog_over_widening_pattern (VEC (gimple, hea
   print_gimple_stmt (vect_dump, pattern_stmt, 0, TDF_SLIM);
 }

+  type = gimple_expr_type (stmt);
   prev_stmt = stmt;
   stmt = use_stmt;

@@ -1165,9 +1167,11 @@ vect_recog_over_widening_pattern (VEC (gimple, hea
 {
   use_lhs = gimple_assign_lhs (use_stmt);
   use_type = TREE_TYPE (use_lhs);
-  /* Support only type promotion or signedess change.  */
+  /* Support only type promotion or signedess change.  Check that USE_TYPE
+     is not bigger than the original type.  */
   if (!INTEGRAL_TYPE_P (use_type)
-      || TYPE_PRECISION (new_type) > TYPE_PRECISION (use_type))
+      || TYPE_PRECISION (new_type) > TYPE_PRECISION (use_type)
+      || TYPE_PRECISION (type) < TYPE_PRECISION (use_type))
 return NULL;

   if (TYPE_UNSIGNED (new_type) != TYPE_UNSIGNED (use_type)
Index: testsuite/gcc.dg/vect/pr51301.c
===
--- testsuite/gcc.dg/vect/pr51301.c (revision 0)
+++ testsuite/gcc.dg/vect/pr51301.c (revision 0)
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+
+typedef signed char int8_t;
+typedef signed long long int64_t;
+int64_t
+f0a (int8_t * __restrict__ arg1)
+{
+  int idx;
+  int64_t result = 0;
+  for (idx = 0; idx < 416; idx += 1)
+    result += arg1[idx] << (arg1[idx] == arg1[idx]);
+  return result;
+}
+
+/* { dg-final { cleanup-tree-dump "vect" } } */


Go patch committed: Implement predefined error interface

2011-11-28 Thread Ian Lance Taylor
The Go language recently added a new predefined type, named error,
which is an interface type with a single method, Error.  This patch,
from Rémy Oudompheng, implements this type for gccgo.  Bootstrapped and
ran Go testsuite on x86_64-unknown-linux-gnu.  Committed to mainline.

Ian

diff -r 9f8ff21f7771 go/gogo.cc
--- a/go/gogo.cc	Mon Nov 28 22:51:39 2011 -0800
+++ b/go/gogo.cc	Mon Nov 28 22:52:22 2011 -0800
@@ -97,6 +97,18 @@
 
   this->add_named_type(Type::make_named_string_type());
 
+  // error is interface { Error() string }.
+  {
+    Typed_identifier_list *methods = new Typed_identifier_list;
+    Typed_identifier_list *results = new Typed_identifier_list;
+    results->push_back(Typed_identifier("", Type::lookup_string_type(), loc));
+    Type *method_type = Type::make_function_type(NULL, NULL, results, loc);
+    methods->push_back(Typed_identifier("Error", method_type, loc));
+    Type *error_iface = Type::make_interface_type(methods, loc);
+    Named_type *error_type =
+      Named_object::make_type("error", NULL, error_iface, loc)->type_value();
+    this->add_named_type(error_type);
+  }
+
   this->globals_->add_constant(Typed_identifier("true",
 		Type::make_boolean_type(),
 		loc),


Fwd: Re: hashtable cleanup + new testsuite files

2011-11-28 Thread François Dumont

Attached patch applied.

2011-11-29  François Dumont  fdum...@gcc.gnu.org

 * include/bits/hashtable.h (_Hashtable::_M_rehash): Remove code
 useless now that the hashtable implementation put the hash code in
 cache if the hash functor throws.
 * testsuite/23_containers/unordered_set/erase/1.cc: Enhance test by
 checking also distance between begin and end iterators to validate
 underlying data model.
 * testsuite/23_containers/unordered_multiset/erase/1.cc: Likewise.
 * testsuite/23_containers/unordered_map/erase/1.cc: Likewise.
 * testsuite/23_containers/unordered_multimap/erase/1.cc: Likewise.
 * testsuite/23_containers/unordered_multiset/erase/2.cc: New.
 * testsuite/23_containers/unordered_multimap/erase/2.cc: New.

Regards


On 11/28/2011 10:54 PM, Paolo Carlini wrote:

 On 11/28/2011 09:29 PM, François Dumont wrote:

  2011-11-28  François Dumont  fdum...@gcc.gnu.org

  * include/bits/hashtable.h (_Hashtable::_M_rehash): Remove code
  useless now that the hashtable implementation put the hash code in
  cache if the hash functor throws.
  * testsuite/23_containers/unordered_set/erase/1.cc: Enhance test by
  checking also distance between begin and end iterators to validate
  underlying data model.
  * testsuite/23_containers/unordered_multiset/erase/1.cc: Likewise.
  * testsuite/23_containers/unordered_map/erase/1.cc: Likewise.
  * testsuite/23_containers/unordered_multimap/erase/1.cc: Likewise.
  * testsuite/23_containers/unordered_multiset/erase/2.cc: New.
  * testsuite/23_containers/unordered_multimap/erase/2.cc: New.

 Tested under linux x86_64.

 Ok to commit in trunk ?

 Ok, thanks.

 Paolo.




Index: include/bits/hashtable.h
===
--- include/bits/hashtable.h	(revision 181729)
+++ include/bits/hashtable.h	(working copy)
@@ -233,11 +233,6 @@
   void
   _M_deallocate_node(_Node* __n);
 
-  // Deallocate all nodes contained in the bucket array, buckets' nodes
-  // are not linked to each other
-  void
-  _M_deallocate_nodes(_Bucket*, size_type);
-
   // Deallocate the linked list of nodes pointed to by __n
   void
   _M_deallocate_nodes(_Node* __n);
@@ -591,19 +586,6 @@
 void
 _Hashtable<_Key, _Value, _Allocator, _ExtractKey, _Equal,
 	   _H1, _H2, _Hash, _RehashPolicy, __chc, __cit, __uk>::
-_M_deallocate_nodes(_Bucket* __buckets, size_type __n)
-{
-  for (size_type __i = 0; __i != __n; ++__i)
-	_M_deallocate_nodes(__buckets[__i]);
-}
-
-  template<typename _Key, typename _Value,
-	   typename _Allocator, typename _ExtractKey, typename _Equal,
-	   typename _H1, typename _H2, typename _Hash, typename _RehashPolicy,
-	   bool __chc, bool __cit, bool __uk>
-void
-_Hashtable<_Key, _Value, _Allocator, _ExtractKey, _Equal,
-	   _H1, _H2, _Hash, _RehashPolicy, __chc, __cit, __uk>::
 _M_deallocate_nodes(_Node* __n)
 {
   while (__n)
@@ -1542,11 +1524,10 @@
 	   _H1, _H2, _Hash, _RehashPolicy, __chc, __cit, __uk>::
 _M_rehash(size_type __n, const _RehashPolicyState __state)
 {
-  _Bucket* __new_buckets = nullptr;
-  _Node* __p = _M_buckets[_M_begin_bucket_index];
   __try
 	{
-	  __new_buckets = _M_allocate_buckets(__n);
+	  _Bucket* __new_buckets = _M_allocate_buckets(__n);
+	  _Node* __p = _M_buckets[_M_begin_bucket_index];
 	  // First loop to store each node in its new bucket
 	  while (__p)
 	{
@@ -1591,24 +1572,9 @@
 	}
   __catch(...)
 	{
-	  if (__new_buckets)
-	{
-	  // A failure here means that a hash function threw an exception.
-	  // We can't restore the previous state without calling the hash
-	  // function again, so the only sensible recovery is to delete
-	  // everything.
-	  _M_deallocate_nodes(__new_buckets, __n);
-	  _M_deallocate_buckets(__new_buckets, __n);
-	  _M_deallocate_nodes(__p);
-	  __builtin_memset(_M_buckets, 0, sizeof(_Bucket) * _M_bucket_count);
-	  _M_element_count = 0;
-	  _M_begin_bucket_index = _M_bucket_count;
-	  _M_rehash_policy._M_reset(_RehashPolicyState());
-	}
-	  else
-	// A failure here means that buckets allocation failed.  We only
-	// have to restore hash policy previous state.
-	_M_rehash_policy._M_reset(__state);
+	  // A failure here means that buckets allocation failed.  We only
+	  // have to restore hash policy previous state.
+	  _M_rehash_policy._M_reset(__state);
 	  __throw_exception_again;
 	}
 }
Index: testsuite/23_containers/unordered_map/erase/1.cc
===
--- testsuite/23_containers/unordered_map/erase/1.cc	(revision 181729)
+++ testsuite/23_containers/unordered_map/erase/1.cc	(working copy)
@@ -23,6 +23,18 @@
 #include <string>
 #include <testsuite_hooks.h>
 
+namespace
+{
+