Re: Fix PR51298, libgomp barrier failure
On Mon, Nov 28, 2011 at 10:12:09AM +1030, Alan Modra wrote:
> --- libgomp/config/linux/bar.c	(revision 181718)
> +++ libgomp/config/linux/bar.c	(working copy)
> @@ -36,18 +36,15 @@ gomp_barrier_wait_end (gomp_barrier_t *b
>    if (__builtin_expect ((state & 1) != 0, 0))
>      {
>        /* Next time we'll be awaiting TOTAL threads again.  */
> -      bar->awaited = bar->total;
> -      atomic_write_barrier ();
> -      bar->generation += 4;
> +      __atomic_store_4 (&bar->awaited, bar->total, MEMMODEL_RELEASE);
> +      __atomic_add_fetch (&bar->generation, 4, MEMMODEL_ACQ_REL);
>        futex_wake ((int *) &bar->generation, INT_MAX);
>      }

The above is unfortunate: bar->generation should only be modified from a
single thread at this point, but the __atomic_add_fetch will force an
atomic instruction there rather than just some kind of barrier.

	Jakub
Re: Use atomics in libgomp mutex
On Mon, Nov 28, 2011 at 10:37:09AM +1030, Alan Modra wrote:
> On Mon, Nov 28, 2011 at 09:27:20AM +1030, Alan Modra wrote:
> > This is the mutex part.  Depends on
> > http://gcc.gnu.org/ml/gcc-patches/2011-11/msg02454.html for
> > MEMMODEL_* values.
>
> Arrgh, I posted the wrong patch.
>
> I know it needs work elsewhere in libgomp to comply with the OMP
> spec, which says of flush:
>
>   14  A flush region without a list is implied at the following locations:
>   15  * During a barrier region.
>   16  * At entry to and exit from parallel, critical, and ordered regions.
>   [snip]
>
> Yet GOMP_critical_start simply calls gomp_mutex_lock, relying on the
> lock acquisition to do the flush.  This won't necessarily happen
> with MEMMODEL_ACQUIRE.  This bit in gomp_mutex_lock
>
> +      __atomic_compare_exchange_4 (mutex, &oldval, 1, false,
> +				   MEMMODEL_ACQUIRE, MEMMODEL_RELAXED);
>
> should be
>
>        __atomic_compare_exchange_4 (mutex, &oldval, 1, false,
> 				    MEMMODEL_ACQ_REL, MEMMODEL_ACQ_REL);
>
> and once you do that it's hard to justify the patch for stage3 as
> fixing a bug (unnecessary sync instructions emitted by __sync
> builtins).

If it is just GOMP_critical_start, couldn't it use an extra barrier
before resp. after it calls mutex_lock?

	Jakub
Re: Use atomics in libgomp mutex
On Mon, Nov 28, 2011 at 09:15:02AM +0100, Jakub Jelinek wrote:
> On Mon, Nov 28, 2011 at 10:37:09AM +1030, Alan Modra wrote:
> >   14  A flush region without a list is implied at the following locations:
> >   15  * During a barrier region.
> >   16  * At entry to and exit from parallel, critical, and ordered regions.
> >   [snip]
>
> If it is just GOMP_critical_start, couldn't it use an extra barrier
> before resp. after it calls mutex_lock?

I believe gomp_ordered_sync needs a barrier, and I'm unsure about
GOMP_parallel_start but I think one is needed there too.  Yes, putting
the barriers where they are actually needed is the best solution, but
I'm far from an OpenMP expert so am unsure whether these are the only
places requiring a barrier.

-- 
Alan Modra
Australia Development Lab, IBM
Re: Fix PR51298, libgomp barrier failure
On Mon, Nov 28, 2011 at 09:13:27AM +0100, Jakub Jelinek wrote:
> On Mon, Nov 28, 2011 at 10:12:09AM +1030, Alan Modra wrote:
> > --- libgomp/config/linux/bar.c	(revision 181718)
> > +++ libgomp/config/linux/bar.c	(working copy)
> > @@ -36,18 +36,15 @@ gomp_barrier_wait_end (gomp_barrier_t *b
> >    if (__builtin_expect ((state & 1) != 0, 0))
> >      {
> >        /* Next time we'll be awaiting TOTAL threads again.  */
> > -      bar->awaited = bar->total;
> > -      atomic_write_barrier ();
> > -      bar->generation += 4;
> > +      __atomic_store_4 (&bar->awaited, bar->total, MEMMODEL_RELEASE);
> > +      __atomic_add_fetch (&bar->generation, 4, MEMMODEL_ACQ_REL);
> >        futex_wake ((int *) &bar->generation, INT_MAX);
> >      }
>
> The above is unfortunate: bar->generation should only be modified
> from a single thread at this point, but the __atomic_add_fetch will
> force an atomic instruction on it rather than just some kind of
> barrier.

I will try without the atomic add and see what happens.

-- 
Alan Modra
Australia Development Lab, IBM
Add __stpncpy_chk builtin support
Hi,

The current trunk has no support for the __stpncpy_chk function.  As
the latest OS X (10.7, aka Lion) now defines stpncpy as a call to
__builtin___stpncpy_chk for GNUC-compliant compilers, the following
code won't link:

#include <string.h>

int main(){
  char string[10];
  stpncpy(string,"abcd",8);
  return 0;
}

Undefined symbols for architecture x86_64:
  "___builtin___stpncpy_chk", referenced from:
      _main in ccGFZoZV.o
ld: symbol(s) not found for architecture x86_64
collect2: error: ld returned 1 exit status

This patch adds support for this builtin.  Bootstrapped and tested
without regression on darwin x86_64.

Romain Geissler

gcc/
2011-11-28  Romain Geissler  <romain.geiss...@gmail.com>

	* builtins.def (BUILT_IN_STPNCPY_CHK): New definition.
	* builtins.c (expand_builtin, fold_builtin_4,
	maybe_emit_chk_warning): Add BUILT_IN_STPNCPY_CHK case.
	* gimple-fold.c (gimple_fold_builtin): Likewise.
	* tree-object-size.c (pass_through_call): Likewise.
	* tree-ssa-alias.c (ref_maybe_used_by_call_p_1,
	call_may_clobber_ref_p_1): Likewise.
	* tree-ssa-structalias.c (find_func_aliases_for_builtin_call,
	find_func_clobbers): Likewise.
	* tree.h (fold_builtin_strncpy_chk): Rename to
	fold_builtin_stxncpy_chk.
	* builtins.c (fold_builtin_strncpy_chk): Likewise.  Rewrite
	stpncpy_chk calls to strncpy_chk calls if returned value is
	ignored.

gcc/testsuite/
2011-11-28  Romain Geissler  <romain.geiss...@gmail.com>

	* gcc.c-torture/execute/builtins/chk.h (stpncpy,
	stpncpy_disallowed): New definitions.
	* gcc.c-torture/execute/builtins/lib/chk.c (stpncpy_disallowed):
	Likewise.
	(stpncpy, __stpncpy_chk): New functions.
	* gcc.c-torture/execute/builtins/stpncpy-chk-lib.c: New file.
	* gcc.c-torture/execute/builtins/stpncpy-chk.c: Likewise.

lion_build.patch
Description: Binary data
[RFA/ARM] Add an integer pipeline description for Cortex-A15
All,

The attached patch adds an integer pipeline description for
Cortex-A15.  Although not dependent on them, my testing has been done
on top of Sameera Deshpande's A15 prologue/epilogue patches (see:
http://gcc.gnu.org/ml/gcc-patches/2011-10/msg00856.html and
following).

Testing on some popular embedded benchmarks shows the pipeline
description gives a 1.8% performance improvement on (geomean) average.
The testsuite shows no regressions targeting arm-none-eabi and using
QEmu.

Can someone please review?

Thanks,

Matt

gcc/ChangeLog:
2011-11-28  Matthew Gretton-Dann  <matthew.gretton-d...@arm.com>

	* config/arm/arm.c (arm_issue_rate): Cortex-A15 can triple issue.
	* config/arm/arm.md (mul64): Add new attribute.
	(generic_sched): Cortex-A15 is not scheduled generically.
	(cortex-a15.md): Include.
	* config/arm/cortex-a15.md: New machine description.
	* config/arm/t-arm (MD_INCLUDES): Add cortex-a15.md.

-- 
Matthew Gretton-Dann
Principal Engineer, PD Software - Tools, ARM Ltd

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e3b0b88..d17f2b5 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -24123,6 +24123,9 @@ arm_issue_rate (void)
 {
   switch (arm_tune)
     {
+    case cortexa15:
+      return 3;
+
     case cortexr4:
     case cortexr4f:
     case cortexr5:
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index a78ba88..facbf92 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -355,6 +355,13 @@
 	(const_string "mult")
 	(const_string "alu")))
 
+; Is this an (integer side) multiply with a 64-bit result?
+(define_attr "mul64" "no,yes"
+  (if_then_else
+    (eq_attr "insn" "smlalxy,umull,umulls,umlal,umlals,smull,smulls,smlal,smlals")
+    (const_string "yes")
+    (const_string "no")))
+
 ; Load scheduling, set from the arm_ld_sched variable
 ; initialized by arm_option_override()
 (define_attr "ldsched" "no,yes" (const (symbol_ref "arm_ld_sched")))
@@ -518,7 +525,7 @@
 (define_attr "generic_sched" "yes,no"
   (const (if_then_else
-	  (ior (eq_attr "tune" "fa526,fa626,fa606te,fa626te,fmp626,fa726te,arm926ejs,arm1020e,arm1026ejs,arm1136js,arm1136jfs,cortexa5,cortexa8,cortexa9,cortexm4")
+	  (ior (eq_attr "tune" "fa526,fa626,fa606te,fa626te,fmp626,fa726te,arm926ejs,arm1020e,arm1026ejs,arm1136js,arm1136jfs,cortexa5,cortexa8,cortexa9,cortexa15,cortexm4")
	       (eq_attr "tune_cortexr4" "yes"))
	  (const_string "no")
	  (const_string "yes"))))
@@ -544,6 +551,7 @@
 (include "cortex-a5.md")
 (include "cortex-a8.md")
 (include "cortex-a9.md")
+(include "cortex-a15.md")
 (include "cortex-r4.md")
 (include "cortex-r4f.md")
 (include "cortex-m4.md")
diff --git a/gcc/config/arm/cortex-a15.md b/gcc/config/arm/cortex-a15.md
new file mode 100644
index 000..ccab7cb
--- /dev/null
+++ b/gcc/config/arm/cortex-a15.md
@@ -0,0 +1,186 @@
+;; ARM Cortex-A15 pipeline description
+;; Copyright (C) 2011 Free Software Foundation, Inc.
+;;
+;; Written by Matthew Gretton-Dann <matthew.gretton-d...@arm.com>

+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_automaton "cortex_a15")
+
+;; The Cortex-A15 core is modelled as a triple issue pipeline that has
+;; the following dispatch units.
+;; 1. Two pipelines for simple integer operations: SX1, SX2
+;; 2. Two pipelines for Neon and FP data-processing operations: CX1, CX2
+;; 3. One pipeline for branch operations: BX
+;; 4. One pipeline for integer multiply and divide operations: MX
+;; 5. Two pipelines for load and store operations: LS1, LS2
+;;
+;; We can issue into three pipelines per-cycle.
+;;
+;; We assume that where we have unit pairs xx1 is always filled before xx2.
+
+;; The three issue units
+(define_cpu_unit "ca15_i0, ca15_i1, ca15_i2" "cortex_a15")
+
+(define_reservation "ca15_issue1" "(ca15_i0|ca15_i1|ca15_i2)")
+(define_reservation "ca15_issue2" "((ca15_i0+ca15_i1)|(ca15_i1+ca15_i2))")
+(define_reservation "ca15_issue3" "(ca15_i0+ca15_i1+ca15_i2)")
+(final_presence_set "ca15_i1" "ca15_i0")
+(final_presence_set "ca15_i2" "ca15_i1")
+
+;; The main dispatch units
+(define_cpu_unit "ca15_sx1, ca15_sx2" "cortex_a15")
+(define_cpu_unit "ca15_cx1, ca15_cx2" "cortex_a15")
+(define_cpu_unit "ca15_ls1, ca15_ls2" "cortex_a15")
+(define_cpu_unit "ca15_bx, ca15_mx" "cortex_a15")
+
+(define_reservation "ca15_ls" "(ca15_ls1|ca15_ls2)")
+
+;; The extended load-store
Re: [Patch, Fortran] MOVE_ALLOC fixes
Dear all,

Attached is an updated patch; I hadn't considered that the declared
type of polymorphic arguments could be different.  The middle end was
not happy about "to = from" where "to" and "from" were different class
containers.  I considered forcing the assignment (via
VIEW_CONVERT_EXPR), but Jakub warns that this might confuse tree-sra,
while the produced code will anyway consist of two pointer
assignments.  Hence, I now directly generate two pointer assignments
(for _data and _vptr).  (The patch only affects scalar polymorphic
variables; for arrays, there were already two pointer assignments.)

Build and regtested on x86-64-linux (trunk and trunk + Paul's patch).
OK?

Tobias

On 11/26/2011 02:50 PM, Tobias Burnus wrote:
> Dear all,
>
> (First, this is *not* for the 4.6/4.7 rejects-valid regression, which
> is related to intent(in) pointers with allocatable components.)
>
> When debugging an issue with polymorphic arrays and MOVE_ALLOC, I got
> lost in the code generation of move_alloc - and didn't like the
> generated code.  Thus, I have rewritten the trans*.c part of it.  (It
> turned out that the issue we had was unrelated to move_alloc.)
>
> Changes:
> * Replace call to libgfortran by inline code (much faster and shorter
>   code)
> * For arrays: Deallocate "from" (deep freeing)
> * For polymorphic arrays: set _vptr.
>
> Actually, the required code is rather simple.  For
> move_alloc(from, to), one just needs to do:
>
> a) Deallocate "to", taking allocatable components and the polymorphic
> types into account (the latter is a to-do item, cf. PR 46174).
>
> b) Do a simple assignment: to = from
> Namely: if both are scalar variables, those are pointers and one does
> a pointer assignment.  If they are polymorphic and/or an array, one
> does a (nonpointer) assignment to the class container or the array
> descriptor.
>
> c) Setting from = NULL (nonpolymorphic scalars) or from.data = NULL
> (nonpolymorphic arrays) or from._data = NULL (polymorphic scalars) or
> from._data.data = NULL (polymorphic arrays).
> For (b) the current expr->ref walking function for polymorphic arrays
> either gives access to class._data or to class._vptr.  It is
> extremely difficult to access "class" itself.  Thus, I now do two
> assignments: one nonpointer assignment to the array descriptor and
> one pointer assignment to the _vptr.
>
> Build and regtested with the trunk with Paul's polymorphic array
> patch applied.  (I will do a bootstrap and regtest with a clean trunk
> before committal.)
> OK for the trunk?
>
> Tobias
>
> PS: I'll add _gfortran_move_alloc to the list of functions which can
> be removed after the ABI breakage.

2011-11-28  Tobias Burnus  <bur...@net-b.de>

	PR fortran/51306
	PR fortran/48700
	* check.c (gfc_check_move_alloc): Make sure that from/to are
	both polymorphic or neither.
	* trans-intrinsic.c (conv_intrinsic_move_alloc): Cleanup,
	generate inline code.

2011-11-28  Tobias Burnus  <bur...@net-b.de>

	PR fortran/51306
	PR fortran/48700
	* gfortran.dg/move_alloc_5.f90: Add dg-error.
	* gfortran.dg/select_type_23.f03: Add dg-error.
	* gfortran.dg/move_alloc_6.f90: New.
	* gfortran.dg/move_alloc_7.f90: New.

diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c
index d9b9a9c..e2b0d66 100644
--- a/gcc/fortran/check.c
+++ b/gcc/fortran/check.c
@@ -2691,6 +2709,14 @@ gfc_check_move_alloc (gfc_expr *from, gfc_expr *to)
   if (same_type_check (to, 1, from, 0) == FAILURE)
     return FAILURE;
 
+  if (to->ts.type != from->ts.type)
+    {
+      gfc_error ("The FROM and TO arguments in MOVE_ALLOC call at %L must be "
+		 "either both polymorphic or both nonpolymorphic",
+		 &from->where);
+      return FAILURE;
+    }
+
   if (to->rank != from->rank)
     {
       gfc_error ("the '%s' and '%s' arguments of '%s' intrinsic at %L must"
diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index 4244570..77c065e 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -5892,7 +5892,7 @@
 }
 
 
-/* Generate code for SELECTED_REAL_KIND (P, R) intrinsic function.  */
+/* Generate code for SELECTED_REAL_KIND (P, R, RADIX) intrinsic function.
   */
 
 static void
 gfc_conv_intrinsic_sr_kind (gfc_se *se, gfc_expr *expr)
@@ -7182,50 +7190,123 @@ conv_intrinsic_atomic_ref (gfc_code *code)
 static tree
 conv_intrinsic_move_alloc (gfc_code *code)
 {
-  if (code->ext.actual->expr->rank == 0)
-    {
-      /* Scalar arguments: Generate pointer assignments.  */
-      gfc_expr *from, *to, *deal;
-      stmtblock_t block;
-      tree tmp;
-      gfc_se se;
+  stmtblock_t block;
+  gfc_expr *from_expr, *to_expr;
+  gfc_expr *to_expr2, *from_expr2;
+  gfc_se from_se, to_se;
+  gfc_ss *from_ss, *to_ss;
+  tree tmp;
 
-      from = code->ext.actual->expr;
-      to = code->ext.actual->next->expr;
+  gfc_start_block (&block);
 
-      gfc_start_block (&block);
+  from_expr = code->ext.actual->expr;
+  to_expr = code->ext.actual->next->expr;
 
-      /* Deallocate 'TO' argument.  */
-      gfc_init_se (&se, NULL);
-      se.want_pointer = 1;
-      deal = gfc_copy_expr
[v3] libstdc++/51288
Hi,

when I implemented get/put_money I forgot about the sentry, oops.
Tested x86_64-linux, committed to mainline.

Paolo.

/////////////////////////

2011-11-28  Paolo Carlini  <paolo.carl...@oracle.com>

	PR libstdc++/51288
	* include/std/iomanip (get_money, put_money): Use sentry.
	* testsuite/27_io/manipulators/extended/get_money/char/51288.cc: New.
	* testsuite/27_io/manipulators/extended/get_money/wchar_t/51288.cc:
	Likewise.
	* testsuite/27_io/manipulators/extended/put_money/char/51288.cc:
	Likewise.
	* testsuite/27_io/manipulators/extended/put_money/wchar_t/51288.cc:
	Likewise.

Index: include/std/iomanip
===================================================================
--- include/std/iomanip	(revision 181773)
+++ include/std/iomanip	(working copy)
@@ -1,7 +1,7 @@
 // Standard stream manipulators -*- C++ -*-
 
 // Copyright (C) 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005
-// 2006, 2007, 2008, 2009, 2010
+// 2006, 2007, 2008, 2009, 2010, 2011
 // Free Software Foundation, Inc.
 //
 // This file is part of the GNU ISO C++ Library.  This library is free
@@ -262,18 +262,29 @@
     basic_istream<_CharT, _Traits>&
     operator>>(basic_istream<_CharT, _Traits>& __is, _Get_money<_MoneyT> __f)
     {
-      typedef istreambuf_iterator<_CharT, _Traits>   _Iter;
-      typedef money_get<_CharT, _Iter>               _MoneyGet;
-
-      ios_base::iostate __err = ios_base::goodbit;
-      const _MoneyGet& __mg = use_facet<_MoneyGet>(__is.getloc());
+      typename basic_istream<_CharT, _Traits>::sentry __cerb(__is, false);
+      if (__cerb)
+	{
+	  ios_base::iostate __err = ios_base::goodbit;
+	  __try
+	    {
+	      typedef istreambuf_iterator<_CharT, _Traits>   _Iter;
+	      typedef money_get<_CharT, _Iter>               _MoneyGet;
 
-      __mg.get(_Iter(__is.rdbuf()), _Iter(), __f._M_intl,
-	       __is, __err, __f._M_mon);
-
-      if (ios_base::goodbit != __err)
-	__is.setstate(__err);
-
+	      const _MoneyGet& __mg = use_facet<_MoneyGet>(__is.getloc());
+	      __mg.get(_Iter(__is.rdbuf()), _Iter(), __f._M_intl,
+		       __is, __err, __f._M_mon);
+	    }
+	  __catch(__cxxabiv1::__forced_unwind&)
+	    {
+	      __is._M_setstate(ios_base::badbit);
+	      __throw_exception_again;
+	    }
+	  __catch(...)
+	    { __is._M_setstate(ios_base::badbit); }
+	  if (ios_base::goodbit != __err)
+	    __is.setstate(__err);
+	}
       return __is;
     }
 
@@ -298,16 +309,27 @@
     basic_ostream<_CharT, _Traits>&
     operator<<(basic_ostream<_CharT, _Traits>& __os, _Put_money<_MoneyT> __f)
     {
-      typedef ostreambuf_iterator<_CharT, _Traits>   _Iter;
-      typedef money_put<_CharT, _Iter>               _MoneyPut;
-
-      const _MoneyPut& __mp = use_facet<_MoneyPut>(__os.getloc());
-      const _Iter __end = __mp.put(_Iter(__os.rdbuf()), __f._M_intl,
-				   __os, __os.fill(), __f._M_mon);
-
-      if (__end.failed())
-	__os.setstate(ios_base::badbit);
-
+      typename basic_ostream<_CharT, _Traits>::sentry __cerb(__os);
+      if (__cerb)
+	{
+	  __try
+	    {
+	      typedef ostreambuf_iterator<_CharT, _Traits>   _Iter;
+	      typedef money_put<_CharT, _Iter>               _MoneyPut;
+	      const _MoneyPut& __mp = use_facet<_MoneyPut>(__os.getloc());
+	      const _Iter __end = __mp.put(_Iter(__os.rdbuf()), __f._M_intl,
+					   __os, __os.fill(), __f._M_mon);
+	      if (__end.failed())
+		__os.setstate(ios_base::badbit);
+	    }
+	  __catch(__cxxabiv1::__forced_unwind&)
+	    {
+	      __os._M_setstate(ios_base::badbit);
+	      __throw_exception_again;
+	    }
+	  __catch(...)
+	    { __os._M_setstate(ios_base::badbit); }
+	}
       return __os;
     }

Index: testsuite/27_io/manipulators/extended/put_money/wchar_t/51288.cc
===================================================================
--- testsuite/27_io/manipulators/extended/put_money/wchar_t/51288.cc	(revision 0)
+++ testsuite/27_io/manipulators/extended/put_money/wchar_t/51288.cc	(revision 0)
@@ -0,0 +1,49 @@
+// { dg-options "-std=gnu++0x" }
+// { dg-require-namedlocale "en_US.UTF-8" }
+
+// Copyright (C) 2011 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy
[Patch, Fortran] PR51308 - fix bogus c_null_ptr error
gfortran printed an error that c_null_ptr couldn't be SAVE and
PARAMETER at the same time; removing the SAVE attribute causes an
error that a BIND(C) variable may not be a PARAMETER, thus that has to
be fixed as well.  (Patch based on the one by Steven.)

Build and regtested on x86-64-linux.
OK for the trunk?

Tobias

2011-11-28  Tobias Burnus  <bur...@net-b.de>
	    Steven G. Kargl  <ka...@gcc.gnu.org>

	PR fortran/51308
	* symbol.c (check_conflict): Ignore BIND(C) + PARAMETER
	conflicts for ISO_C_BINDING variables.
	(gen_special_c_interop_ptr): Don't mark c_ptr_null/c_funptr_null
	as SAVE.

2011-11-28  Tobias Burnus  <bur...@net-b.de>

	PR fortran/51308
	* gfortran.dg/iso_c_binding_compiler_4.f90: New.

diff --git a/gcc/fortran/symbol.c b/gcc/fortran/symbol.c
index 9bd6ed4..67009c5 100644
--- a/gcc/fortran/symbol.c
+++ b/gcc/fortran/symbol.c
@@ -742,9 +742,10 @@ check_conflict (symbol_attribute *attr, const char *name, locus *where)
       conf2 (asynchronous);
       conf2 (threadprivate);
       conf2 (value);
-      conf2 (is_bind_c);
       conf2 (codimension);
       conf2 (result);
+      if (!attr->is_iso_c)
+	conf2 (is_bind_c);
       break;
 
     default:
@@ -3763,13 +3764,12 @@ gen_special_c_interop_ptr (int ptr_id, const char *ptr_name,
		 "create symbol for %s", ptr_name);
     }
 
-  /* Set up the symbol's important fields.  Save attr required so we can
-     initialize the ptr to NULL.  */
-  tmp_sym->attr.save = SAVE_EXPLICIT;
+  /* Set up the symbol's important fields.  */
   tmp_sym->ts.is_c_interop = 1;
   tmp_sym->attr.is_c_interop = 1;
   tmp_sym->ts.is_iso_c = 1;
   tmp_sym->ts.type = BT_DERIVED;
+  tmp_sym->attr.flavor = FL_PARAMETER;
 
   /* The c_ptr and c_funptr derived types will provide the
      definition for c_null_ptr and c_null_funptr, respectively.  */
@@ -3817,9 +3817,6 @@ gen_special_c_interop_ptr (int ptr_id, const char *ptr_name,
   c->expr = gfc_get_expr ();
   c->expr->expr_type = EXPR_NULL;
   c->expr->ts.is_iso_c = 1;
-  /* Must declare c_null_ptr and c_null_funptr as having the
-     PARAMETER attribute so they can be used in init expressions.
-     */
-  tmp_sym->attr.flavor = FL_PARAMETER;
 
   return SUCCESS;
 }
 
--- /dev/null	2011-11-28 07:33:54.995528670 +0100
+++ gcc/gcc/testsuite/gfortran.dg/iso_c_binding_compiler_4.f90	2011-11-28 14:27:05.0 +0100
@@ -0,0 +1,18 @@
+! { dg-do compile }
+!
+! PR fortran/51308
+!
+! Contributed by Matthias Moeller
+!
+
+module mymod
+  use iso_c_binding
+  implicit none
+
+  private
+  public :: c_ptr
+  public :: c_null_ptr
+
+end module mymod
+
+! { dg-final { cleanup-modules "mymod" } }
Re: [ARM] Rewrite ldrex/strex support for atomic optabs
On 23 November 2011 23:43, Richard Henderson <r...@redhat.com> wrote:
> This transformation is quite a bit more dramatic than the other
> ports because ARM was not splitting the code sequences post-reload.
> Indeed, the failure to split resulted in a distinctly odd coding
> style where fake output routines were used to compute the length of
> the code sequence.  This all seemed highly sub-optimal, so I rewrote
> everything from scratch.
>
> This has passed initial sniff test from a cross-compiler, and is
> undergoing full testing on an armv7 host.  Hopefully results will be
> ready tomorrow.  In the meantime, please review.

Hi Richard,

Can you explain the code:

+  if (mod_f != MEMMODEL_RELAXED)
+    emit_label (label2);
+
+  arm_post_atomic_barrier (mod_s);
+
+  if (mod_f == MEMMODEL_RELAXED)
+    emit_label (label2);
+}

In the case of the existing __sync_* builtins, will it always end up
doing the label and the sync, as Michael's PR 48126 indicated and my
patch moved it?  (It's not at all clear to me what the ordering
requirements of ldrex/strex are from the ARM ARM.)

Dave
Ping * [Patch] Take 2: Move Objective-C runtime flags to modern options system.
Hi Joseph,

Mike has OK'd this
(http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01672.html) from the
ObjC and Darwin points of view (and it implements your suggested
changes from the review of version 1).

Is it now OK for trunk?

thanks,
Iain

On 14 Nov 2011, at 20:30, Iain Sandoe wrote:
> V2 is simpler than
> http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01335.html because
> (using the approach suggested by Joseph) we can initialize
> flag_next_runtime early enough to avoid having to deal with any
> other flags.  [I'll still get rid of the flag_objc_sjlj_exceptions
> flag in a follow-up patch.]
>
> So, this just shifts flag_next_runtime to the modern system.
> OK for trunk?
> Iain
>
> gcc:
>
> 	* doc/tm.texi.in (NEXT_OBJC_RUNTIME): Update macro description.
> 	* doc/tm.texi: Regenerate.
> 	* flags.h (flag_next_runtime): Remove references.
> 	* toplev.c: Likewise.
> 	* defaults.h (NEXT_OBJC_RUNTIME): Provide default.
> 	* gccspec.c (lang_specific_driver): Adjust use of
> 	NEXT_OBJC_RUNTIME.
> 	* config/darwin.c (darwin_override_options): Provide default
> 	Objective-C abi settings and target conflict checks.
> 	* config/darwin.h (SUBTARGET_C_COMMON_OVERRIDE_OPTIONS): Remove
> 	unused code.  Adjust indenting.
>
> gcc/c-family:
>
> 	* c.opt (fgnu-runtime): Provide full description.
> 	(fnext-runtime): Likewise.
> 	* c-opts.c (OPT_fgnu_runtime, OPT_fnext_runtime): Remove.
Re: Fix PR51298, libgomp barrier failure
On Mon, Nov 28, 2011 at 08:09:01PM +1030, Alan Modra wrote:
> On Mon, Nov 28, 2011 at 09:13:27AM +0100, Jakub Jelinek wrote:
> > On Mon, Nov 28, 2011 at 10:12:09AM +1030, Alan Modra wrote:
> > > --- libgomp/config/linux/bar.c	(revision 181718)
> > > +++ libgomp/config/linux/bar.c	(working copy)
> > > @@ -36,18 +36,15 @@ gomp_barrier_wait_end (gomp_barrier_t *b
> > >    if (__builtin_expect ((state & 1) != 0, 0))
> > >      {
> > >        /* Next time we'll be awaiting TOTAL threads again.  */
> > > -      bar->awaited = bar->total;
> > > -      atomic_write_barrier ();
> > > -      bar->generation += 4;
> > > +      __atomic_store_4 (&bar->awaited, bar->total, MEMMODEL_RELEASE);
> > > +      __atomic_add_fetch (&bar->generation, 4, MEMMODEL_ACQ_REL);
> > >        futex_wake ((int *) &bar->generation, INT_MAX);
> > >      }
> >
> > The above is unfortunate: bar->generation should only be modified
> > from a single thread at this point, but the __atomic_add_fetch will
> > force an atomic instruction on it rather than just some kind of
> > barrier.
>
> I will try without the atomic add and see what happens.

Seems to be fine.  I took out the extra write barriers too, so we now
just have two MEMMODEL_RELEASE atomic store barriers replacing the two
old atomic_write_barriers, and a number of new acquire barriers.

	PR libgomp/51298
	* config/linux/bar.h: Use atomic rather than sync builtins.
	* config/linux/bar.c: Likewise.  Add missing acquire
	synchronisation on generation field.
	* task.c (gomp_barrier_handle_tasks): Regain lock so as to not
	double unlock.
Index: libgomp/task.c
===================================================================
--- libgomp/task.c	(revision 181718)
+++ libgomp/task.c	(working copy)
@@ -273,6 +273,7 @@ gomp_barrier_handle_tasks (gomp_barrier_
	      gomp_team_barrier_done (&team->barrier, state);
	      gomp_mutex_unlock (&team->task_lock);
	      gomp_team_barrier_wake (&team->barrier, 0);
+	      gomp_mutex_lock (&team->task_lock);
	    }
	}
     }
Index: libgomp/config/linux/bar.h
===================================================================
--- libgomp/config/linux/bar.h	(revision 181770)
+++ libgomp/config/linux/bar.h	(working copy)
@@ -50,7 +50,7 @@ static inline void gomp_barrier_init (go
 static inline void
 gomp_barrier_reinit (gomp_barrier_t *bar, unsigned count)
 {
-  __sync_fetch_and_add (&bar->awaited, count - bar->total);
+  __atomic_add_fetch (&bar->awaited, count - bar->total, MEMMODEL_ACQ_REL);
   bar->total = count;
 }
 
@@ -69,10 +69,8 @@ extern void gomp_team_barrier_wake (gomp
 static inline gomp_barrier_state_t
 gomp_barrier_wait_start (gomp_barrier_t *bar)
 {
-  unsigned int ret = bar->generation & ~3;
-  /* Do we need any barrier here or is __sync_add_and_fetch acting
-     as the needed LoadLoad barrier already?  */
-  ret += __sync_add_and_fetch (&bar->awaited, -1) == 0;
+  unsigned int ret = __atomic_load_4 (&bar->generation, MEMMODEL_ACQUIRE) & ~3;
+  ret += __atomic_add_fetch (&bar->awaited, -1, MEMMODEL_ACQ_REL) == 0;
   return ret;
 }
 
Index: libgomp/config/linux/bar.c
===================================================================
--- libgomp/config/linux/bar.c	(revision 181770)
+++ libgomp/config/linux/bar.c	(working copy)
@@ -37,17 +37,15 @@ gomp_barrier_wait_end (gomp_barrier_t *b
     {
       /* Next time we'll be awaiting TOTAL threads again.
	 */
       bar->awaited = bar->total;
-      atomic_write_barrier ();
-      bar->generation += 4;
+      __atomic_store_4 (&bar->generation, bar->generation + 4,
+			MEMMODEL_RELEASE);
       futex_wake ((int *) &bar->generation, INT_MAX);
     }
   else
     {
-      unsigned int generation = state;
       do
-	do_wait ((int *) &bar->generation, generation);
-      while (bar->generation == generation);
+	do_wait ((int *) &bar->generation, state);
+      while (__atomic_load_4 (&bar->generation, MEMMODEL_ACQUIRE) == state);
     }
 }
 
@@ -81,15 +79,15 @@ gomp_team_barrier_wake (gomp_barrier_t *
 void
 gomp_team_barrier_wait_end (gomp_barrier_t *bar, gomp_barrier_state_t state)
 {
-  unsigned int generation;
+  unsigned int generation, gen;
 
   if (__builtin_expect ((state & 1) != 0, 0))
     {
       /* Next time we'll be awaiting TOTAL threads again.  */
       struct gomp_thread *thr = gomp_thread ();
       struct gomp_team *team = thr->ts.team;
+
       bar->awaited = bar->total;
-      atomic_write_barrier ();
       if (__builtin_expect (team->task_count, 0))
	{
	  gomp_barrier_handle_tasks (state);
@@ -97,7 +95,7 @@ gomp_team_barrier_wait_end (gomp_barrier
	}
       else
	{
-	  bar->generation = state + 3;
+	  __atomic_store_4 (&bar->generation, state + 3, MEMMODEL_RELEASE);
	  futex_wake ((int *) &bar->generation, INT_MAX);
	  return;
	}
@@ -107,12 +105,16 @@ gomp_team_barrier_wait_end (gomp_barrier
   do
     {
       do_wait ((int *) &bar->generation, generation);
-      if
Re: [Patch, Fortran] PR51308 - fix bogus c_null_ptr error
On 11/28/2011 02:37 PM, Tobias Burnus wrote:
> Build and regtested on x86-64-linux.
> OK for the trunk?

Steve approved the patch in the PR; thus, I have committed it as
Rev. 181778.

Tobias
[PATCH] Re: [v3] updated atomic configury
On 11/22/2011 03:48 PM, Joseph S. Myers wrote:
> On Tue, 22 Nov 2011, Andrew MacLeod wrote:
> > > It certainly does have those types, as typedefs defined in
> > > <uchar.h> for the same types as uint_least16_t and
> > > uint_least32_t.  And <stdatomic.h> is also required to define
> > > ATOMIC_CHAR16_T_LOCK_FREE and ATOMIC_CHAR32_T_LOCK_FREE.
> >
> > but there isn't a stdatomic.h until we support C1x right?  didn't
> > we bail on
>
> It still seems better to predefine the macros for those types, using
> char16_type_node and char32_type_node, using them initially for C++
> and then later for C as well.

OK, here's a reworked patch.  This bootstraps, but there is a
regression... one set of PCH tests fail in a very odd way.  I've
investigated it, and it turns out that for some reason ptr_type_node
doesn't always have its size set if code isn't being generated (all
the other nodes do)... in particular, I can reproduce it on a small
test case on my x86-64 machine where the size is 8 when generating
code, and 0 if -E is used to output pre-processed source.  I'm looking
into that today.

Question: the purpose of the testcase in the patch, macros.c... is it
to simply test for standard compliance, or to test both compliance
PLUS what we expect?  I made the change assuming the latter.  We will
always generate a 1 or a 2 for the macros, and we also want the macros
to now be compile-time satisfied so they can be used in #if statements
like bkoz was trying to do.

Andrew

c-family
	* c-cpp-builtin.c (cpp_atomic_builtins): New.  Emit all atomic
	predefines in one place.  Add LOCK_FREE predefines.
	(c_cpp_builtins): Move legacy HAVE_SYNC predefines to new func.

libstdc++-v3
	* include/bits/atomic_base.h (ATOMIC_*_LOCK_FREE): Use new cpp
	predefined macros.
	* testsuite/29_atomics/headers/atomic/macros.cc: Add BOOL and
	POINTER macro checks.  Check for expected compile time values.
Index: gcc/c-family/c-cppbuiltin.c
===================================================================
*** gcc/c-family/c-cppbuiltin.c	(revision 181698)
--- gcc/c-family/c-cppbuiltin.c	(working copy)
*************** c_cpp_builtins_optimize_pragma (cpp_read
*** 568,573 ****
--- 568,677 ----
  }
  
+ /* This function will emit cpp macros to indicate the presence of various lock
+    free atomic operations.  */
+ 
+ static void
+ cpp_atomic_builtins (cpp_reader *pfile)
+ {
+   /* Set a flag for each size of object that compare and swap exists for up to
+      a 16 byte object.  */
+ #define SWAP_LIMIT  17
+   bool have_swap[SWAP_LIMIT];
+ 
+   /* Clear the map of sizes compare_and_swap exists for.  */
+   memset (have_swap, 0, sizeof (have_swap));
+ 
+   /* Tell source code if the compiler makes sync_compare_and_swap
+      builtins available.  */
+ #ifndef HAVE_sync_compare_and_swapqi
+ #define HAVE_sync_compare_and_swapqi 0
+ #endif
+ #ifndef HAVE_atomic_compare_and_swapqi
+ #define HAVE_atomic_compare_and_swapqi 0
+ #endif
+ 
+   if (HAVE_sync_compare_and_swapqi || HAVE_atomic_compare_and_swapqi)
+     {
+       cpp_define (pfile, "__GCC_HAVE_SYNC_COMPARE_AND_SWAP_1");
+       have_swap[1] = true;
+     }
+ 
+ #ifndef HAVE_sync_compare_and_swaphi
+ #define HAVE_sync_compare_and_swaphi 0
+ #endif
+ #ifndef HAVE_atomic_compare_and_swaphi
+ #define HAVE_atomic_compare_and_swaphi 0
+ #endif
+   if (HAVE_sync_compare_and_swaphi || HAVE_atomic_compare_and_swaphi)
+     {
+       cpp_define (pfile, "__GCC_HAVE_SYNC_COMPARE_AND_SWAP_2");
+       have_swap[2] = true;
+     }
+ 
+ #ifndef HAVE_sync_compare_and_swapsi
+ #define HAVE_sync_compare_and_swapsi 0
+ #endif
+ #ifndef HAVE_atomic_compare_and_swapsi
+ #define HAVE_atomic_compare_and_swapsi 0
+ #endif
+   if (HAVE_sync_compare_and_swapsi || HAVE_atomic_compare_and_swapsi)
+     {
+       cpp_define (pfile, "__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4");
+       have_swap[4] = true;
+     }
+ 
+ #ifndef HAVE_sync_compare_and_swapdi
+ #define HAVE_sync_compare_and_swapdi 0
+ #endif
+ #ifndef HAVE_atomic_compare_and_swapdi
+ #define HAVE_atomic_compare_and_swapdi 0
+ #endif
+   if (HAVE_sync_compare_and_swapdi || HAVE_atomic_compare_and_swapdi)
+     {
+       cpp_define (pfile, "__GCC_HAVE_SYNC_COMPARE_AND_SWAP_8");
+       have_swap[8] = true;
+     }
+ 
+ #ifndef HAVE_sync_compare_and_swapti
+ #define HAVE_sync_compare_and_swapti 0
+ #endif
+ #ifndef HAVE_atomic_compare_and_swapti
+ #define HAVE_atomic_compare_and_swapti 0
+ #endif
+   if (HAVE_sync_compare_and_swapti || HAVE_atomic_compare_and_swapti)
+     {
+       cpp_define (pfile, "__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16");
+       have_swap[16] = true;
+     }
+ 
+   /* Tell the source code about various types.  These map to the C++11 and C1x
+      macros where 2 indicates lock-free always, and 1 indicates sometimes
+      lock free.  */
+ #define SIZEOF_NODE(T) (tree_low_cst (TYPE_SIZE_UNIT (T), 1))
+ #define SWAP_INDEX(T) ((SIZEOF_NODE (T) < SWAP_LIMIT) ? SIZEOF_NODE (T) : 0)
+   builtin_define_with_int_value
Re: [build] Cleanup rs6000/t-ppccomm configurations (PR other/51022)
Alan Modra amo...@gmail.com writes: On Sun, Nov 27, 2011 at 08:36:38PM -0500, David Edelsohn wrote: I was hoping that other targets affected by the patch would test it as well.

powerpc-linux and powerpc64-linux look good.

Thanks for the confirmation. However, Andreas Tobler pointed out in private mail that I'd failed to reference rs6000/t-linux64 during the libgcc migration. Strange that this seems to have no ill effect.

	Rainer

--
Rainer Orth, Center for Biotechnology, Bielefeld University
Re: Fix libgomp semaphores
On Mon, Nov 28, 2011 at 09:23:43AM +1030, Alan Modra wrote:
+  int count = *sem;
+
+  while ((count & 0x7fffffff) != 0)
+    {
+      int oldval = count;
+      __atomic_compare_exchange_4 (sem, &oldval, count - 1,
+				   false, MEMMODEL_ACQUIRE, MEMMODEL_RELAXED);
+      if (__builtin_expect (oldval == count, 1))
+	return;

Why aren't you instead testing the return value of __atomic_compare_exchange_4 (here and in other cases)?

	Jakub
Re: [libitm, build] Clear hardware capabilities on libitm.so with Sun ld
Richard Henderson r...@redhat.com writes: On 11/21/2011 05:53 AM, Rainer Orth wrote: The libitm execution tests are currently failing on Solaris 10 and up with Sun as/ld:

ld.so.1: cancel.exe: fatal: /var/gcc/regression/trunk/11-gcc/build/i386-pc-solaris2.11/./libitm/.libs/libitm.so.0: hardware capability (CA_SUNW_HW_1) unsupported: 0x2000 [ AVX ]
FAIL: libitm.c/cancel.c execution test

This is the same issue solved by gcc/testsuite/gcc.target/i386/clearcap.map, and the following patch addresses it in the same way:

* Detect if the linker used supports -M mapfile.
* Use it when linking libitm.so.

Right now, it is only possible to clear the hardware capabilities completely, while the new v2 mapfile syntax supports selectively adding and removing capabilities. It is only available in Solaris 11 and Solaris 10 Update 10, though, so I'm restricting us to the v1 syntax for now.

This is only ok if the compiler and library are built with default options. If you use --with-arch=corei7-avx then we may well use AVX insns all through the library, not just in the one interface that will only be used if the user of the library is using avx. Can you auto-foo this based on CC+CFLAGS? E.g. compile-time tests for the various __SSE__ / __AVX__ macros?

My first attempt was to apply the equivalent of ld -r -M clearcap.map to x86_avx.lo and x86_sse.lo. While this works fine if calling ld -r directly, you need to get this through libtool, and this is where things start breaking in all sorts of ways. I tried

$ libtool --mode=link --tag=CC gcc -r -Wl,-M,clearcap.map -o x86_nhc.lo x86_sse.lo
libtool: link: collect-ld -r -o x86_nhc.o .libs/x86_sse.o
libtool: link: collect-ld -r -o x86_nhc.lo .libs/x86_sse.o

This is of no use at all since -Wl,-M,clearcap.map is lost completely, and I've found no way to get this through libtool at all. This beast tries to be smart and messes everything up completely.
But even if I could get that part to work, what happens is still wrong: the .lo file is a real object file now, not the usual text file, so once I try to link the shared lib, libtool rightly complains. While the libtool docs claim that this relinking (.lo -> .lo) should work, it obviously does not, even with libtool 2.2.4 ;-(

So unless I'm overlooking something, I'll have to go the route you outlined above and check for __SSE__/__AVX__ to make the decision.

	Rainer

--
Rainer Orth, Center for Biotechnology, Bielefeld University
Re: Ping * [Patch] Take 2: Move Objective-C runtime flags to modern options system.
On Mon, 28 Nov 2011, Iain Sandoe wrote: Hi Joseph, Mike has OK'd this ... http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01672.html .. from the ObjC and Darwin points of view (and it implements your suggested changes from the review of version 1). Is it now OK for trunk? OK. -- Joseph S. Myers jos...@codesourcery.com
RE: Re: [Ping] RE: CR16 Port addition
You still have libgcc source files under gcc/config/ (cr16-libgcc.s, divmodhi3.c). You still have libgcc-related settings / makefile targets in gcc/config/cr16/t-cr16. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] Re: [v3] updated atomic configury
On Mon, 28 Nov 2011, Andrew MacLeod wrote: OK, here's a reworked patch. This bootstraps, but there is a regression... one set of PCH tests fails in a very odd way. I've investigated it, and it turns out that for some reason ptr_type_node doesn't always have its size set if code isn't being generated (all the other nodes do)... in particular, I can reproduce it on a small test case on my x86-64 machine, where the size is 8 when generating code and 0 if -E is used to output preprocessed source. I'm looking into that today.

That would be why cppbuiltin.c, defining __SIZEOF_POINTER__, needs to use the POINTER_SIZE macro instead of directly using ptr_type_node. You may need to do something similar.

  /* ptr_type_node can't be used here since ptr_mode is only set when
     toplev calls backend_init which is not done with -E switch.  */
  cpp_define_formatted (pfile, "__SIZEOF_POINTER__=%d",
			POINTER_SIZE / BITS_PER_UNIT);

Question, the purpose of the testcase in the patch macros.cc... is it to simply test for standard compliance, or to test both compliance PLUS what we expect? I made the change assuming the latter. We will always generate a 1 or a 2 for the macros, and we also want the macros to be satisfied at compile time so they can be used in #if statements like bkoz was trying to do.

Yes, test that the macros do what we expect.

-- Joseph S. Myers jos...@codesourcery.com
Re: Go patch committed: Multiplex goroutines onto OS threads
Hello!

This patch changes the Go library to multiplex goroutines onto operating system threads. Previously, each new goroutine ran in a separate thread. That is inefficient for programs with lots of goroutines. This patch changes the library such that it runs a certain number of threads, and lets each thread switch between goroutines. This is how the master Go library works, and this patch brings in code from the master Go library, adjusted for use by gccgo.

For some reason I get this failure on alphaev68-pc-linux-gnu:

--- FAIL: runtime_test.TestGcSys (4.64 seconds)
	using 64 MB
	using too much memory: 64 MB

Raising the value in runtime/gc_test.go to 10e8 runs the test OK.

Uros.
[PATCH] PR c++/51289 - ICE with alias template for bound template
Hello,

Consider this example:

template<typename a, template <typename, typename> class b>
struct foo
{
  template <typename t> using type = b<a, t>;
  template <typename t> b<a, t> funca() {}
  template <typename t> type<t> funcb() {}  //#1
};

In #1, when lookup_template_class_1 builds the type of type<t> we hit a kind of plumbing issue; near the end of the 'if' below,

      if (arg_depth == 1 && parm_depth > 1)
	{
	  /* We've been given an incomplete set of template arguments.
	     For example, given:

	       template <class T> struct S1 {
		 template <class U> struct S2 {};
		 template <class U> struct S2<U*> {};
		};

	     we will be called with an ARGLIST of `U*', but the
	     TEMPLATE will be `template <class T> template <class U>
	     struct S1<T>::S2'.  We must fill in the missing arguments.  */
	  arglist
	    = add_outermost_template_args (TYPE_TI_ARGS (TREE_TYPE (templ)),
					   arglist);
	  arg_depth = TMPL_ARGS_DEPTH (arglist);
	}

we try to get the arguments of the alias template type using TYPE_TI_ARGS, and that fails because TREE_TYPE (templ) is a BOUND_TEMPLATE_TEMPLATE_PARM, and TYPE_TEMPLATE_INFO behaves incorrectly for BOUND_TEMPLATE_TEMPLATE_PARM nodes when they represent the underlying type of an aliased type: we wrongly poke at the arguments of the underlying BOUND_TEMPLATE_TEMPLATE_PARM instead. The patch below changes the TYPE_TEMPLATE_INFO accessor accordingly.

I also had to adjust find_parameter_packs_r to always get the template info from the underlying type when faced with type aliases. Otherwise, it can wrongly attempt to poke at arguments of the template aliases even when these are not yet ready; this can happen e.g. when check_for_bare_parameter_packs is called from push_template_decl_real before the template information for the alias template specialization is in place. Also, I think we don't risk missing any parameter pack by only considering the underlying type in the case of alias template specializations.

Bootstrapped and tested on x86_64-unknown-linux-gnu against trunk.
gcc/cp/

	PR c++/51289
	* cp-tree.h (TYPE_TEMPLATE_INFO): Rewrite this accessor macro to
	better support aliased types.
	* pt.c ():

gcc/testsuite/

	PR c++/51289
	* g++.dg/cpp0x/alias-decl-17.C: New test.
---
 gcc/cp/cp-tree.h                           |   27 +++++++++++++++++----------
 gcc/cp/pt.c                                |    6 ++++--
 gcc/testsuite/g++.dg/cpp0x/alias-decl-17.C |   13 +++++++++++++
 3 files changed, 34 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/alias-decl-17.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 3f4f408..dece84e 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -2605,17 +2605,24 @@ extern void decl_shadowed_for_var_insert (tree, tree);
   (LANG_TYPE_CLASS_CHECK (BOUND_TEMPLATE_TEMPLATE_PARM_TYPE_CHECK (NODE)) \
    ->template_info)
 
-/* Template information for an ENUMERAL_, RECORD_, or UNION_TYPE.  */
+/* Template information for an ENUMERAL_, RECORD_, UNION_TYPE, or
+   BOUND_TEMPLATE_TEMPLATE_PARM type.  Note that if NODE is a
+   specialization of an alias template, this accessor returns the
+   template info for the alias template, not the one (if any) for the
+   template of the underlying type.  */
 #define TYPE_TEMPLATE_INFO(NODE)					\
-  (TREE_CODE (NODE) == ENUMERAL_TYPE					\
-   ? ENUM_TEMPLATE_INFO (NODE) :					\
-   (TREE_CODE (NODE) == BOUND_TEMPLATE_TEMPLATE_PARM			\
-    ? TEMPLATE_TEMPLATE_PARM_TEMPLATE_INFO (NODE) :			\
-    ((CLASS_TYPE_P (NODE) && !TYPE_ALIAS_P (NODE))			\
-     ? CLASSTYPE_TEMPLATE_INFO (NODE)					\
-     : ((TYPE_NAME (NODE) && DECL_LANG_SPECIFIC (TYPE_NAME (NODE)))	\
-	? (DECL_TEMPLATE_INFO (TYPE_NAME (NODE)))			\
-	: NULL_TREE))))
+  (TYPE_ALIAS_P (NODE)							\
+   ? ((TYPE_NAME (NODE) && DECL_LANG_SPECIFIC (TYPE_NAME (NODE)))	\
+      ? DECL_TEMPLATE_INFO (TYPE_NAME (NODE))				\
+      : NULL_TREE)							\
+   : ((TREE_CODE (NODE) == ENUMERAL_TYPE)				\
+      ? ENUM_TEMPLATE_INFO (NODE)					\
+      : ((TREE_CODE (NODE) == BOUND_TEMPLATE_TEMPLATE_PARM)		\
+	 ? TEMPLATE_TEMPLATE_PARM_TEMPLATE_INFO (NODE)			\
+	 : (CLASS_TYPE_P (NODE)						\
+	    ? CLASSTYPE_TEMPLATE_INFO (NODE)				\
+	    : NULL_TREE))))
 
 /* Set the template information for an
Re: [ARM] Rewrite ldrex/strex support for atomic optabs
On 11/28/2011 06:01 AM, David Gilbert wrote: Hi Richard, Can you explain the code:

+  if (mod_f != MEMMODEL_RELAXED)
+    emit_label (label2);
+
+  arm_post_atomic_barrier (mod_s);
+
+  if (mod_f == MEMMODEL_RELAXED)
+    emit_label (label2);
+}

in the case of the existing __sync_* will it always end up doing the label and the sync as Michael's pr 48126 indicated and my patch moved it?

This is a new feature of the user-level interface: one can specify both the memory model to be enforced on success and the memory model to be enforced on failure:

      bool
      compare_exchange_strong(_Tp& __e, _Tp __i,
			      memory_order __s, memory_order __f) noexcept
      {
	return __atomic_compare_exchange(&_M_i, &__e, &__i, false, __s, __f);
      }

with the constraint that the failure model __f cannot be stronger than the success model __s. What the quoted code does is skip any final barrier if and only if the failure model is RELAXED.

r~
[PATCH v2 0/5] Convert Sparc to atomic optabs
Changes v1-v2: * DImode fixes. * Use the new atomic_test_and_set optab for LDSTUB. * Change -mmmodel to -mmemory-model and document it. r~ Richard Henderson (5): sparc: Convert to mem_thread_fence. sparc: Convert to atomic_load/store. sparc: Convert to atomic_compare_and_swap. sparc: Convert to atomic_exchange. sparc: Add -mmemory-model command-line option. gcc/config/sparc/constraints.md |5 +- gcc/config/sparc/predicates.md | 13 ++ gcc/config/sparc/sparc-opts.h | 10 ++ gcc/config/sparc/sparc-protos.h |4 +- gcc/config/sparc/sparc.c| 149 ++- gcc/config/sparc/sparc.md |1 + gcc/config/sparc/sparc.opt | 22 gcc/config/sparc/sync.md| 258 +-- gcc/doc/invoke.texi | 74 --- 9 files changed, 420 insertions(+), 116 deletions(-) -- 1.7.6.4
[PATCH 1/5] sparc: Convert to mem_thread_fence.
---
 gcc/config/sparc/predicates.md  |    4 ++
 gcc/config/sparc/sparc-protos.h |    2 +
 gcc/config/sparc/sparc.c        |   47 +++++++++++++++
 gcc/config/sparc/sync.md        |   79 +++++++++++++++++++------
 4 files changed, 118 insertions(+), 14 deletions(-)

diff --git a/gcc/config/sparc/predicates.md b/gcc/config/sparc/predicates.md
index 4dd734f..047b217 100644
--- a/gcc/config/sparc/predicates.md
+++ b/gcc/config/sparc/predicates.md
@@ -111,6 +111,10 @@
 (define_predicate "const_double_or_vector_operand"
   (match_code "const_double,const_vector"))
 
+;; Return true if OP is Zero, or if the target is V7.
+(define_predicate "zero_or_v7_operand"
+  (ior (match_test "op == const0_rtx")
+       (match_test "!TARGET_V8 && !TARGET_V9")))
 
 ;; Predicates for symbolic constants.
 
diff --git a/gcc/config/sparc/sparc-protos.h b/gcc/config/sparc/sparc-protos.h
index 10fa5ed..b292024 100644
--- a/gcc/config/sparc/sparc-protos.h
+++ b/gcc/config/sparc/sparc-protos.h
@@ -113,4 +113,6 @@ unsigned int sparc_regmode_natural_size (enum machine_mode);
 bool sparc_modes_tieable_p (enum machine_mode, enum machine_mode);
 #endif /* RTX_CODE */
 
+extern void sparc_emit_membar_for_model (enum memmodel, int, int);
+
 #endif /* __SPARC_PROTOS_H__ */
diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index b315698..c5c4899 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -10849,6 +10849,53 @@ sparc_mangle_type (const_tree type)
 }
 #endif
 
+/* Expand a membar instruction for various use cases.  Both the LOAD_STORE
+   and BEFORE_AFTER arguments are of the form X_Y.  They are two-bit masks
+   where bit 0 indicates that X is true, and bit 1 indicates Y is true.  */
+
+void
+sparc_emit_membar_for_model (enum memmodel model,
+			     int load_store, int before_after)
+{
+  /* Bits for the MEMBAR mmask field.  */
+  const int LoadLoad = 1;
+  const int StoreLoad = 2;
+  const int LoadStore = 4;
+  const int StoreStore = 8;
+
+  int mm = 0;
+
+  if (before_after & 1)
+    {
+      if (model == MEMMODEL_ACQUIRE
+	  || model == MEMMODEL_ACQ_REL
+	  || model == MEMMODEL_SEQ_CST)
+	{
+	  if (load_store & 1)
+	    mm |= LoadLoad | LoadStore;
+	  if (load_store & 2)
+	    mm |= StoreLoad | StoreStore;
+	}
+    }
+  if (before_after & 2)
+    {
+      if (model == MEMMODEL_RELEASE
+	  || model == MEMMODEL_ACQ_REL
+	  || model == MEMMODEL_SEQ_CST)
+	{
+	  if (load_store & 1)
+	    mm |= LoadLoad | StoreLoad;
+	  if (load_store & 2)
+	    mm |= LoadStore | StoreStore;
+	}
+    }
+
+  /* For raw barriers (before+after), always emit a barrier.
+     This will become a compile-time barrier if needed.  */
+  if (mm || before_after == 3)
+    emit_insn (gen_membar (GEN_INT (mm)));
+}
+
 /* Expand code to perform a 8 or 16-bit compare and swap by doing
    32-bit compare and swap on the word containing the byte or half-word.  */
 
diff --git a/gcc/config/sparc/sync.md b/gcc/config/sparc/sync.md
index a7380ab..e22f516 100644
--- a/gcc/config/sparc/sync.md
+++ b/gcc/config/sparc/sync.md
@@ -1,5 +1,5 @@
 ;; GCC machine description for SPARC synchronization instructions.
-;; Copyright (C) 2005, 2007, 2009, 2010
+;; Copyright (C) 2005, 2007, 2009, 2010, 2011
 ;; Free Software Foundation, Inc.
 ;;
 ;; This file is part of GCC.
@@ -23,36 +23,87 @@
 (define_mode_iterator I48MODE [SI (DI "TARGET_ARCH64 || TARGET_V8PLUS")])
 (define_mode_attr modesuffix [(SI "") (DI "x")])
 
+(define_expand "mem_thread_fence"
+  [(match_operand:SI 0 "const_int_operand")]
+  "TARGET_V8 || TARGET_V9"
+{
+  enum memmodel model = (enum memmodel) INTVAL (operands[0]);
+  sparc_emit_membar_for_model (model, 3, 3);
+  DONE;
+})
+
 (define_expand "memory_barrier"
-  [(set (match_dup 0)
-	(unspec:BLK [(match_dup 0)] UNSPEC_MEMBAR))]
+  [(const_int 0)]
+  "TARGET_V8 || TARGET_V9"
+{
+  sparc_emit_membar_for_model (MEMMODEL_SEQ_CST, 3, 3);
+  DONE;
+})
+
+(define_expand "membar"
+  [(set (match_dup 1)
+	(unspec:BLK [(match_dup 1)
+		     (match_operand:SI 0 "const_int_operand")]
+		    UNSPEC_MEMBAR))]
   "TARGET_V8 || TARGET_V9"
 {
-  operands[0] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
-  MEM_VOLATILE_P (operands[0]) = 1;
+  operands[1] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
+  MEM_VOLATILE_P (operands[1]) = 1;
 })
 
-;; In V8, loads are blocking and ordered wrt earlier loads, i.e. every load
-;; is virtually followed by a load barrier (membar #LoadStore | #LoadLoad).
-;; In PSO, stbar orders the stores (membar #StoreStore).
-;; In TSO, ldstub orders the stores wrt subsequent loads (membar #StoreLoad).
-;; The combination of the three yields a full memory barrier in all cases.
+;; A compiler-only memory barrier.  Generic code, when checking for the
+;; existence of various named patterns, uses asm("" ::: "memory") when we
+;; don't need an actual instruction.  Here, it's easiest to pretend
[PATCH 2/5] sparc: Convert to atomic_load/store.
---
 gcc/config/sparc/predicates.md |    5 +++
 gcc/config/sparc/sparc.md      |    1 +
 gcc/config/sparc/sync.md       |   59 ++++++++++++++++++++
 3 files changed, 65 insertions(+), 0 deletions(-)

diff --git a/gcc/config/sparc/predicates.md b/gcc/config/sparc/predicates.md
index 047b217..a9207be 100644
--- a/gcc/config/sparc/predicates.md
+++ b/gcc/config/sparc/predicates.md
@@ -239,6 +239,11 @@
   (ior (match_operand 0 "register_operand")
        (match_operand 0 "const_zero_operand")))
 
+(define_predicate "register_or_v9_zero_operand"
+  (ior (match_operand 0 "register_operand")
+       (and (match_test "TARGET_V9")
+	    (match_operand 0 "const_zero_operand"))))
+
 ;; Return true if OP is either the zero constant, the all-ones
 ;; constant, or a register.
 (define_predicate "register_or_zero_or_all_ones_operand"
diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md
index c059dc5..37ac170 100644
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@@ -41,6 +41,7 @@
    (UNSPEC_MOVE_GOTDATA	19)
 
    (UNSPEC_MEMBAR		20)
+   (UNSPEC_ATOMIC		21)
 
    (UNSPEC_TLSGD		30)
    (UNSPEC_TLSLDM		31)
diff --git a/gcc/config/sparc/sync.md b/gcc/config/sparc/sync.md
index e22f516..e2f4596 100644
--- a/gcc/config/sparc/sync.md
+++ b/gcc/config/sparc/sync.md
@@ -102,6 +102,65 @@
   "membar\t%1"
   [(set_attr "type" "multi")])
 
+(define_expand "atomic_load<mode>"
+  [(match_operand:I 0 "register_operand" "")
+   (match_operand:I 1 "memory_operand" "")
+   (match_operand:SI 2 "const_int_operand" "")]
+  ""
+{
+  enum memmodel model = (enum memmodel) INTVAL (operands[2]);
+
+  sparc_emit_membar_for_model (model, 1, 1);
+
+  if (TARGET_ARCH64 || <MODE>mode != DImode)
+    emit_move_insn (operands[0], operands[1]);
+  else
+    emit_insn (gen_atomic_loaddi_1 (operands[0], operands[1]));
+
+  sparc_emit_membar_for_model (model, 1, 2);
+  DONE;
+})
+
+(define_insn "atomic_loaddi_1"
+  [(set (match_operand:DI 0 "register_operand" "=U,?*f")
+	(unspec:DI [(match_operand:DI 1 "memory_operand" "m,m")]
+		   UNSPEC_ATOMIC))]
+  "!TARGET_ARCH64"
+  "ldd\t%1, %0"
+  [(set_attr "type" "load,fpload")])
+
+(define_expand "atomic_store<mode>"
+  [(match_operand:I 0 "register_operand" "")
+   (match_operand:I 1 "memory_operand" "")
+   (match_operand:SI 2 "const_int_operand" "")]
+  ""
+{
+  enum memmodel model = (enum memmodel) INTVAL (operands[2]);
+
+  sparc_emit_membar_for_model (model, 2, 1);
+
+  if (TARGET_ARCH64 || <MODE>mode != DImode)
+    emit_move_insn (operands[0], operands[1]);
+  else
+    emit_insn (gen_atomic_storedi_1 (operands[0], operands[1]));
+
+  sparc_emit_membar_for_model (model, 2, 2);
+  DONE;
+})
+
+(define_insn "atomic_storedi_1"
+  [(set (match_operand:DI 0 "memory_operand" "=m,m,m")
+	(unspec:DI
+	  [(match_operand:DI 1 "register_or_v9_zero_operand" "J,U,?*f")]
+	  UNSPEC_ATOMIC))]
+  "!TARGET_ARCH64"
+  "@
+   stx\t%r1, %0
+   std\t%1, %0
+   std\t%1, %0"
+  [(set_attr "type" "store,store,fpstore")
+   (set_attr "cpu_feature" "v9,*,*")])
+
 (define_expand "sync_compare_and_swap<mode>"
-- 
1.7.6.4
[PATCH 4/5] sparc: Convert to atomic_exchange.
---
 gcc/config/sparc/sync.md |   90 ++++++++++++++--------------------
 1 files changed, 30 insertions(+), 60 deletions(-)

diff --git a/gcc/config/sparc/sync.md b/gcc/config/sparc/sync.md
index d7066b0..be8c4c4 100644
--- a/gcc/config/sparc/sync.md
+++ b/gcc/config/sparc/sync.md
@@ -19,6 +19,7 @@
 ;; <http://www.gnu.org/licenses/>.
 
 (define_mode_iterator I12MODE [QI HI])
+(define_mode_iterator I124MODE [QI HI SI])
 (define_mode_iterator I24MODE [HI SI])
 (define_mode_iterator I48MODE [SI (DI "TARGET_ARCH64 || TARGET_V8PLUS")])
 (define_mode_attr modesuffix [(SI "") (DI "x")])
@@ -32,14 +33,6 @@
   DONE;
 })
 
-(define_expand "memory_barrier"
-  [(const_int 0)]
-  "TARGET_V8 || TARGET_V9"
-{
-  sparc_emit_membar_for_model (MEMMODEL_SEQ_CST, 3, 3);
-  DONE;
-})
-
 (define_expand "membar"
   [(set (match_dup 1)
	(unspec:BLK [(match_dup 1)
@@ -224,46 +217,22 @@
   [(set_attr "type" "multi")
    (set_attr "length" "8")])
 
-
-
-(define_expand "sync_lock_test_and_set<mode>"
-  [(match_operand:I12MODE 0 "register_operand" "")
-   (match_operand:I12MODE 1 "memory_operand" "")
-   (match_operand:I12MODE 2 "arith_operand" "")]
-  "!TARGET_V9"
+(define_expand "atomic_exchangesi"
+  [(match_operand:SI 0 "register_operand" "")
+   (match_operand:SI 1 "memory_operand" "")
+   (match_operand:SI 2 "register_operand" "")
+   (match_operand:SI 3 "const_int_operand" "")]
+  "TARGET_V8 || TARGET_V9"
 {
-  if (operands[2] != const1_rtx)
-    FAIL;
-  if (TARGET_V8)
-    emit_insn (gen_memory_barrier ());
-  if (<MODE>mode != QImode)
-    operands[1] = adjust_address (operands[1], QImode, 0);
-  emit_insn (gen_ldstub<mode> (operands[0], operands[1]));
-  DONE;
-})
+  enum memmodel model = (enum memmodel) INTVAL (operands[3]);
 
-(define_expand "sync_lock_test_and_setsi"
-  [(parallel
-     [(set (match_operand:SI 0 "register_operand" "")
-	   (unspec_volatile:SI [(match_operand:SI 1 "memory_operand" "")]
-			       UNSPECV_SWAP))
-      (set (match_dup 1)
-	   (match_operand:SI 2 "arith_operand" ""))])]
-  ""
-{
-  if (! TARGET_V8 && ! TARGET_V9)
-    {
-      if (operands[2] != const1_rtx)
-	FAIL;
-      operands[1] = adjust_address (operands[1], QImode, 0);
-      emit_insn (gen_ldstubsi (operands[0], operands[1]));
-      DONE;
-    }
-  emit_insn (gen_memory_barrier ());
-  operands[2] = force_reg (SImode, operands[2]);
+  sparc_emit_membar_for_model (model, 3, 1);
+  emit_insn (gen_swapsi (operands[0], operands[1], operands[2]));
+  sparc_emit_membar_for_model (model, 3, 2);
+  DONE;
 })
 
-(define_insn "*swapsi"
+(define_insn "swapsi"
   [(set (match_operand:SI 0 "register_operand" "=r")
	(unspec_volatile:SI [(match_operand:SI 1 "memory_operand" "+m")]
			    UNSPECV_SWAP))
@@ -273,24 +242,25 @@
   "swap\t%1, %0"
   [(set_attr "type" "multi")])
 
-(define_expand "ldstubqi"
-  [(parallel [(set (match_operand:QI 0 "register_operand" "")
-		   (unspec_volatile:QI [(match_operand:QI 1 "memory_operand" "")]
-				       UNSPECV_LDSTUB))
-	      (set (match_dup 1) (const_int -1))])]
-  ""
-  "")
+(define_expand "atomic_test_and_set<mode>"
+  [(match_operand:I124MODE 0 "register_operand" "")
+   (match_operand:I124MODE 1 "memory_operand" "")
+   (match_operand:SI 2 "const_int_operand" "")]
+  ""
+{
+  enum memmodel model = (enum memmodel) INTVAL (operands[2]);
 
-(define_expand "ldstub<mode>"
-  [(parallel [(set (match_operand:I24MODE 0 "register_operand" "")
-		   (zero_extend:I24MODE
-		     (unspec_volatile:QI [(match_operand:QI 1 "memory_operand" "")]
-					 UNSPECV_LDSTUB)))
-	      (set (match_dup 1) (const_int -1))])]
-  ""
-  "")
+  sparc_emit_membar_for_model (model, 3, 1);
+
+  if (<MODE>mode != QImode)
+    operands[1] = adjust_address (operands[1], QImode, 0);
+  emit_insn (gen_ldstub<mode> (operands[0], operands[1]));
+
+  sparc_emit_membar_for_model (model, 3, 2);
+  DONE;
+})
 
-(define_insn "*ldstubqi"
+(define_insn "ldstubqi"
   [(set (match_operand:QI 0 "register_operand" "=r")
	(unspec_volatile:QI [(match_operand:QI 1 "memory_operand" "+m")]
			    UNSPECV_LDSTUB))
@@ -299,7 +269,7 @@
   "ldstub\t%1, %0"
   [(set_attr "type" "multi")])
 
-(define_insn "*ldstub<mode>"
+(define_insn "ldstub<mode>"
   [(set (match_operand:I24MODE 0 "register_operand" "=r")
	(zero_extend:I24MODE
	  (unspec_volatile:QI [(match_operand:QI 1 "memory_operand" "+m")]
-- 
1.7.6.4
[PATCH 5/5] sparc: Add -mmemory-model command-line option.
---
 gcc/config/sparc/sparc-opts.h |   10 ++++++
 gcc/config/sparc/sparc.c      |   48 ++++++++++++++++++-
 gcc/config/sparc/sparc.opt    |   22 ++++++++++
 gcc/doc/invoke.texi           |   74 ++++++++++++++++++++-------
 4 files changed, 134 insertions(+), 20 deletions(-)

diff --git a/gcc/config/sparc/sparc-opts.h b/gcc/config/sparc/sparc-opts.h
index 266cb14..7682eb9 100644
--- a/gcc/config/sparc/sparc-opts.h
+++ b/gcc/config/sparc/sparc-opts.h
@@ -47,4 +47,14 @@ enum processor_type {
   PROCESSOR_NATIVE
 };
 
+/* Sparc system memory model.  See Appendix D in the Sparc V9 manual
+   for formal specification, and Appendix J for more discussion.  */
+enum sparc_memory_model_type {
+  SMM_DEFAULT,		/* Uninitialized.  */
+  SMM_RMO,		/* Relaxed Memory Order.  */
+  SMM_PSO,		/* Partial Store Order.  */
+  SMM_TSO,		/* Total Store Order.  */
+  SMM_SC		/* Sequential Consistency.  */
+};
+
 #endif
diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index 7db216a..7785267 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -1160,6 +1160,17 @@ sparc_option_override (void)
       gcc_unreachable ();
     };
 
+  if (sparc_memory_model == SMM_DEFAULT)
+    {
+      /* Choose the most relaxed model for the processor.  */
+      if (TARGET_V9)
+	sparc_memory_model = SMM_RMO;
+      else if (TARGET_V8)
+	sparc_memory_model = SMM_PSO;
+      else
+	sparc_memory_model = SMM_SC;
+    }
+
 #ifdef TARGET_DEFAULT_LONG_DOUBLE_128
   if (!(target_flags_explicit & MASK_LONG_DOUBLE_128))
     target_flags |= MASK_LONG_DOUBLE_128;
@@ -10863,7 +10874,39 @@ sparc_emit_membar_for_model (enum memmodel model,
   const int LoadStore = 4;
   const int StoreStore = 8;
 
-  int mm = 0;
+  int mm = 0, implied = 0;
+
+  switch (sparc_memory_model)
+    {
+    case SMM_SC:
+      /* Sequential Consistency.  All memory transactions are immediately
+	 visible in sequential execution order.  No barriers needed.  */
+      return;
+
+    case SMM_TSO:
+      /* Total Store Ordering: all memory transactions with store semantics
+	 are followed by an implied StoreStore.  */
+      implied |= StoreStore;
+      /* FALLTHRU */
+
+    case SMM_PSO:
+      /* Partial Store Ordering: all memory transactions with load semantics
+	 are followed by an implied LoadLoad | LoadStore.  */
+      implied |= LoadLoad | LoadStore;
+
+      /* If we're not looking for a raw barrier (before+after), then atomic
+	 operations get the benefit of being both load and store.  */
+      if (load_store == 3 && before_after == 2)
+	implied |= StoreLoad | StoreStore;
+      /* FALLTHRU */
+
+    case SMM_RMO:
+      /* Relaxed Memory Ordering: no implicit bits.  */
+      break;
+
+    default:
+      gcc_unreachable ();
+    }
 
   if (before_after & 1)
     {
@@ -10890,6 +10933,9 @@ sparc_emit_membar_for_model (enum memmodel model,
	}
     }
 
+  /* Remove the bits implied by the system memory model.  */
+  mm &= ~implied;
+
   /* For raw barriers (before+after), always emit a barrier.
      This will become a compile-time barrier if needed.  */
   if (mm || before_after == 3)
diff --git a/gcc/config/sparc/sparc.opt b/gcc/config/sparc/sparc.opt
index cb807fd..01f3d43 100644
--- a/gcc/config/sparc/sparc.opt
+++ b/gcc/config/sparc/sparc.opt
@@ -215,3 +215,25 @@ Mask(V9)
 Mask(DEPRECATED_V8_INSNS)
 ;; Generate code that uses the V8 instructions deprecated
 ;; in the V9 architecture.
+
+mmemory-model=
+Target RejectNegative Joined Var(sparc_memory_model) Enum(sparc_memory_model) Init(SMM_DEFAULT)
+Specify the memory model in effect for the program.
+
+Enum
+Name(sparc_memory_model) Type(enum sparc_memory_model_type)
+
+EnumValue
+Enum(sparc_memory_model) String(default) Value(SMM_DEFAULT)
+
+EnumValue
+Enum(sparc_memory_model) String(rmo) Value(SMM_RMO)
+
+EnumValue
+Enum(sparc_memory_model) String(pso) Value(SMM_PSO)
+
+EnumValue
+Enum(sparc_memory_model) String(tso) Value(SMM_TSO)
+
+EnumValue
+Enum(sparc_memory_model) String(sc) Value(SMM_SC)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4e6edb9..eb21bbd6 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -884,6 +884,7 @@ See RS/6000 and PowerPC Options.
@gccoptlist{-mcpu=@var{cpu-type} @gol -mtune=@var{cpu-type} @gol -mcmodel=@var{code-model} @gol +-mmemory-model=@var{mem-model} @gol -m32 -m64 -mapp-regs -mno-app-regs @gol -mfaster-structs -mno-faster-structs -mflat -mno-flat @gol -mfpu -mno-fpu -mhard-float -msoft-float @gol @@ -17811,13 +17812,22 @@ an architecture and not an implementation. These are @samp{v7}, @samp{v8}, Here is a list of each supported architecture and their supported implementations. -@smallexample -v7: cypress -v8: supersparc, hypersparc, leon -sparclite: f930, f934, sparclite86x -sparclet: tsc701 -v9: ultrasparc, ultrasparc3, niagara, niagara2, niagara3, niagara4
[PATCH 3/5] sparc: Convert to atomic_compare_and_swap.
Aid the backend with mem_noofs_operand and a constraint to match. --- gcc/config/sparc/constraints.md |5 +++- gcc/config/sparc/predicates.md |4 +++ gcc/config/sparc/sparc-protos.h |2 +- gcc/config/sparc/sparc.c| 56 --- gcc/config/sparc/sync.md| 52 --- 5 files changed, 85 insertions(+), 34 deletions(-) diff --git a/gcc/config/sparc/constraints.md b/gcc/config/sparc/constraints.md index 317602c..472490f 100644 --- a/gcc/config/sparc/constraints.md +++ b/gcc/config/sparc/constraints.md @@ -19,7 +19,7 @@ ;;; Unused letters: ;;;AB -;;;ajklq tuvwxyz +;;;ajklq tuv xyz ;; Register constraints @@ -44,6 +44,9 @@ (define_register_constraint h (TARGET_V9 TARGET_V8PLUS ? I64_REGS : NO_REGS) 64-bit global or out register in V8+ mode) +(define_memory_constraint w + A memory with only a base register + (match_operand 0 mem_noofs_operand)) ;; Floating-point constant constraints diff --git a/gcc/config/sparc/predicates.md b/gcc/config/sparc/predicates.md index a9207be..81e5320 100644 --- a/gcc/config/sparc/predicates.md +++ b/gcc/config/sparc/predicates.md @@ -471,6 +471,10 @@ (match_test call_address_operand (XEXP (op, 0), mode +(define_predicate mem_noofs_operand + (and (match_code mem) + (match_code reg 0))) + ;; Predicates for operators. ;; Return true if OP is a comparison operator. This allows the use of diff --git a/gcc/config/sparc/sparc-protos.h b/gcc/config/sparc/sparc-protos.h index b292024..52acc09 100644 --- a/gcc/config/sparc/sparc-protos.h +++ b/gcc/config/sparc/sparc-protos.h @@ -104,7 +104,7 @@ extern int v9_regcmp_p (enum rtx_code); 32 bits of REG are 0 before INSN. 
*/ extern int sparc_check_64 (rtx, rtx); extern rtx gen_df_reg (rtx, int); -extern void sparc_expand_compare_and_swap_12 (rtx, rtx, rtx, rtx); +extern void sparc_expand_compare_and_swap (rtx op[]); extern void sparc_expand_vector_init (rtx, rtx); extern void sparc_expand_vec_perm_bmask(enum machine_mode, rtx); extern bool sparc_expand_conditional_move (enum machine_mode, rtx *); diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c index c5c4899..7db216a 100644 --- a/gcc/config/sparc/sparc.c +++ b/gcc/config/sparc/sparc.c @@ -10899,8 +10899,9 @@ sparc_emit_membar_for_model (enum memmodel model, /* Expand code to perform a 8 or 16-bit compare and swap by doing 32-bit compare and swap on the word containing the byte or half-word. */ -void -sparc_expand_compare_and_swap_12 (rtx result, rtx mem, rtx oldval, rtx newval) +static void +sparc_expand_compare_and_swap_12 (rtx bool_result, rtx result, rtx mem, + rtx oldval, rtx newval) { rtx addr1 = force_reg (Pmode, XEXP (mem, 0)); rtx addr = gen_reg_rtx (Pmode); @@ -10925,7 +10926,7 @@ sparc_expand_compare_and_swap_12 (rtx result, rtx mem, rtx oldval, rtx newval) set_mem_alias_set (memsi, ALIAS_SET_MEMORY_BARRIER); MEM_VOLATILE_P (memsi) = MEM_VOLATILE_P (mem); - val = force_reg (SImode, memsi); + val = copy_to_reg (memsi); emit_insn (gen_rtx_SET (VOIDmode, off, gen_rtx_XOR (SImode, off, @@ -10971,7 +10972,9 @@ sparc_expand_compare_and_swap_12 (rtx result, rtx mem, rtx oldval, rtx newval) emit_insn (gen_rtx_SET (VOIDmode, newvalue, gen_rtx_IOR (SImode, newv, val))); - emit_insn (gen_sync_compare_and_swapsi (res, memsi, oldvalue, newvalue)); + emit_move_insn (bool_result, const1_rtx); + + emit_insn (gen_atomic_compare_and_swapsi_1 (res, memsi, oldvalue, newvalue)); emit_cmp_and_jump_insns (res, oldvalue, EQ, NULL, SImode, 0, end_label); @@ -10979,6 +10982,8 @@ sparc_expand_compare_and_swap_12 (rtx result, rtx mem, rtx oldval, rtx newval) gen_rtx_AND (SImode, gen_rtx_NOT (SImode, mask), res))); + emit_move_insn 
(bool_result, const0_rtx); + cc = gen_compare_reg_1 (NE, resv, val); emit_insn (gen_rtx_SET (VOIDmode, val, resv)); @@ -10997,6 +11002,49 @@ sparc_expand_compare_and_swap_12 (rtx result, rtx mem, rtx oldval, rtx newval) emit_move_insn (result, gen_lowpart (GET_MODE (result), res)); } +/* Expand code to perform a compare-and-swap. */ + +void +sparc_expand_compare_and_swap (rtx operands[]) +{ + rtx bval, retval, mem, oldval, newval; + enum machine_mode mode; + enum memmodel model; + + bval = operands[0]; + retval = operands[1]; + mem = operands[2]; + oldval = operands[3]; + newval = operands[4]; + model = (enum memmodel) INTVAL (operands[6]); + mode = GET_MODE (mem); + + sparc_emit_membar_for_model (model, 3, 1); + + if (reg_overlap_mentioned_p (retval, oldval)) +oldval = copy_to_reg (oldval); + + if (mode == QImode || mode == HImode) +sparc_expand_compare_and_swap_12 (bval, retval, mem, oldval, newval); + else +{ +
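The expansion above synthesizes a narrow compare-and-swap from a full-word one: find the aligned word containing the byte, build a mask and shift, and loop on a 32-bit CAS until the surrounding bytes stop interfering. A minimal C sketch of the same idea using GCC's `__atomic` builtins (the helper name `cas_byte_via_word` is mine, and the byte-lane computation assumes a little-endian target):

```c
#include <assert.h>
#include <stdint.h>

/* Compare-and-swap a single byte by doing a 32-bit CAS on the aligned
   word that contains it, mirroring sparc_expand_compare_and_swap_12.
   Returns 1 on success, 0 if the byte no longer holds EXPECTED.
   Little-endian byte-lane math is assumed here.  */
static int
cas_byte_via_word (uint8_t *addr, uint8_t expected, uint8_t desired)
{
  uintptr_t word_addr = (uintptr_t) addr & ~(uintptr_t) 3;
  unsigned shift = ((uintptr_t) addr & 3) * 8;  /* byte lane, little-endian */
  uint32_t mask = 0xffu << shift;
  uint32_t *word = (uint32_t *) word_addr;

  for (;;)
    {
      uint32_t old_word = __atomic_load_n (word, __ATOMIC_RELAXED);
      if (((old_word & mask) >> shift) != expected)
        return 0;  /* the byte itself changed: fail */
      uint32_t new_word = (old_word & ~mask) | ((uint32_t) desired << shift);
      if (__atomic_compare_exchange_n (word, &old_word, new_word, 0,
                                       __ATOMIC_ACQ_REL, __ATOMIC_RELAXED))
        return 1;
      /* Some other byte in the word changed; retry on the fresh value.  */
    }
}
```

The retry loop is the interesting part: a CAS failure caused by a neighboring byte must not be reported as failure of the byte-level operation.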
[PATCH] Copy over TREE_THIS_VOLATILE during forwprop (PR tree-optimization/50078)
Hi! On the following testcase the volatile load/store are optimized away. IMHO this is a bug in forwprop, which replaces tmp_Y = nonvolvar[arg_X]; MEM[(volatile ...*)tmp_Y] ={v} ...; with MEM[(volatile ...*)nonvolvar][tmp_Y] ={v} ...; where the LHS is no longer TREE_THIS_VOLATILE like before, TREE_THIS_VOLATILE is newly only the MEM_REF operand of the non-volatile ARRAY_REF. This patch copies the TREE_THIS_VOLATILE bit to the ARRAY_REF. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2011-11-28 Jakub Jelinek ja...@redhat.com PR tree-optimization/50078 * tree-ssa-forwprop.c (forward_propagate_addr_expr_1): Copy over TREE_THIS_VOLATILE also from the old to new lhs resp. rhs. * gcc.dg/pr50078.c: New test. --- gcc/tree-ssa-forwprop.c.jj 2011-10-20 14:13:43.0 +0200 +++ gcc/tree-ssa-forwprop.c 2011-11-28 09:47:39.822697552 +0100 @@ -910,7 +910,7 @@ forward_propagate_addr_expr_1 (tree name TREE_TYPE (gimple_assign_rhs1 (use_stmt { tree *def_rhs_basep = TREE_OPERAND (def_rhs, 0); - tree new_offset, new_base, saved; + tree new_offset, new_base, saved, new_lhs; while (handled_component_p (*def_rhs_basep)) def_rhs_basep = TREE_OPERAND (*def_rhs_basep, 0); saved = *def_rhs_basep; @@ -930,8 +930,9 @@ forward_propagate_addr_expr_1 (tree name new_base, new_offset); TREE_THIS_VOLATILE (*def_rhs_basep) = TREE_THIS_VOLATILE (lhs); TREE_THIS_NOTRAP (*def_rhs_basep) = TREE_THIS_NOTRAP (lhs); - gimple_assign_set_lhs (use_stmt, -unshare_expr (TREE_OPERAND (def_rhs, 0))); + new_lhs = unshare_expr (TREE_OPERAND (def_rhs, 0)); + gimple_assign_set_lhs (use_stmt, new_lhs); + TREE_THIS_VOLATILE (new_lhs) = TREE_THIS_VOLATILE (lhs); *def_rhs_basep = saved; tidy_after_forward_propagate_addr (use_stmt); /* Continue propagating into the RHS if this was not the @@ -991,7 +992,7 @@ forward_propagate_addr_expr_1 (tree name TREE_TYPE (TREE_OPERAND (def_rhs, 0 { tree *def_rhs_basep = TREE_OPERAND (def_rhs, 0); - tree new_offset, new_base, saved; + tree new_offset, new_base, saved, 
new_rhs; while (handled_component_p (*def_rhs_basep)) def_rhs_basep = TREE_OPERAND (*def_rhs_basep, 0); saved = *def_rhs_basep; @@ -1011,8 +1012,9 @@ forward_propagate_addr_expr_1 (tree name new_base, new_offset); TREE_THIS_VOLATILE (*def_rhs_basep) = TREE_THIS_VOLATILE (rhs); TREE_THIS_NOTRAP (*def_rhs_basep) = TREE_THIS_NOTRAP (rhs); - gimple_assign_set_rhs1 (use_stmt, - unshare_expr (TREE_OPERAND (def_rhs, 0))); + new_rhs = unshare_expr (TREE_OPERAND (def_rhs, 0)); + gimple_assign_set_rhs1 (use_stmt, new_rhs); + TREE_THIS_VOLATILE (new_rhs) = TREE_THIS_VOLATILE (rhs); *def_rhs_basep = saved; fold_stmt_inplace (use_stmt_gsi); tidy_after_forward_propagate_addr (use_stmt); --- gcc/testsuite/gcc.dg/pr50078.c.jj 2011-11-28 10:03:18.020091603 +0100 +++ gcc/testsuite/gcc.dg/pr50078.c 2011-11-28 10:05:01.533389187 +0100 @@ -0,0 +1,14 @@ +/* PR tree-optimization/50078 */ +/* { dg-do compile } */ +/* { dg-options -O2 } */ + +unsigned nonvolvar[2]; + +void +test (int arg) +{ + unsigned v = *(volatile unsigned *) (nonvolvar[arg]); + *(volatile unsigned *) (nonvolvar[arg]) = v; +} + +/* { dg-final { scan-assembler-times movl\[^\n\r\]*nonvolvar 2 { target { { i?86-*-* x86_64-*-* } nonpic } } } } */ Jakub
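The testcase boils down to forcing volatile accesses to a non-volatile object through a cast. A standalone illustration of that access pattern (runtime behavior is the same either way; whether both accesses survive into the assembly is what the scan-assembler directive checks, which can't be asserted at run time):

```c
#include <assert.h>

unsigned nonvolvar[2];

/* Volatile load of a non-volatile array element through a cast — the
   shape of access PR50078 is about.  The compiler must keep the load
   even though the underlying object isn't volatile.  */
static unsigned
load_v (int i)
{
  return *(volatile unsigned *) &nonvolvar[i];
}

/* The matching volatile store.  */
static void
store_v (int i, unsigned v)
{
  *(volatile unsigned *) &nonvolvar[i] = v;
}
```

With the forwprop bug, the volatile qualifier was lost during address propagation and both accesses could be deleted as dead.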
[PATCH] Clear EDGE_CROSSING flag when redirecting edge succ to EXIT_BLOCK_PTR (PR middle-end/50907)
Hi! The checking code insists on edges to EXIT_BLOCK_PTR (and from ENTRY_BLOCK_PTR) to have no EDGE_CROSSING bit set, so if we have a bb in second partition jumping to first partition's entry block and convert_jumps_to_returns changes it to simple_return, we need to drop the EDGE_CROSSING bit. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2011-11-28 Jakub Jelinek ja...@redhat.com PR middle-end/50907 * function.c (convert_jumps_to_returns): When redirecting an edge succ to EXIT_BLOCK_PTR, clear EDGE_CROSSING flag. * gcc.dg/tree-prof/pr50907.c: New test. --- gcc/function.c.jj 2011-11-21 16:22:02.0 +0100 +++ gcc/function.c 2011-11-28 11:09:39.581873481 +0100 @@ -5670,6 +5670,7 @@ convert_jumps_to_returns (basic_block la /* Fix up the CFG for the successful change we just made. */ redirect_edge_succ (e, EXIT_BLOCK_PTR); + e->flags &= ~EDGE_CROSSING; } VEC_free (basic_block, heap, src_bbs); return unconverted; --- gcc/testsuite/gcc.dg/tree-prof/pr50907.c.jj 2011-11-28 11:12:12.769979965 +0100 +++ gcc/testsuite/gcc.dg/tree-prof/pr50907.c 2011-11-28 11:13:40.055475777 +0100 @@ -0,0 +1,5 @@ +/* PR middle-end/50907 */ +/* { dg-require-effective-target freorder } */ +/* { dg-options "-O -freorder-blocks-and-partition -fschedule-insns -fselective-scheduling -fpic" { target { { powerpc*-*-* ia64-*-* x86_64-*-* } && fpic } } } */ + +#include "pr45354.c" Jakub
[PATCH] When making var non-addressable, drop var ={v} {CLOBBER} stmts (PR debug/50317)
Hi! As discussed in the PR, SSA_NAME ={v} {CLOBBER} stmts are undesirable at least for the debug info, we really don't need to mark there the end of scope in the IL, the debug info records the ranges of the scopes and if the clobber gets scheduled before some insn we get unnecessarily pessimistic debug info. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2011-11-28 Jakub Jelinek ja...@redhat.com PR debug/50317 * tree-ssa.c (execute_update_addresses_taken): Remove var ={v} {CLOBBER} stmts instead of rewriting them into var_N ={v} {CLOBBER}. --- gcc/tree-ssa.c.jj 2011-10-14 08:21:56.0 +0200 +++ gcc/tree-ssa.c 2011-11-28 13:29:45.297956952 +0100 @@ -2118,7 +2118,7 @@ execute_update_addresses_taken (void) if (update_vops) { FOR_EACH_BB (bb) - for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (gsi)) + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi);) { gimple stmt = gsi_stmt (gsi); @@ -2159,6 +2159,18 @@ execute_update_addresses_taken (void) if (gimple_assign_lhs (stmt) != lhs) gimple_assign_set_lhs (stmt, lhs); + /* For var ={v} {CLOBBER}; where var lost + TREE_ADDRESSABLE just remove the stmt. */ + if (DECL_P (lhs) +TREE_CLOBBER_P (rhs) +symbol_marked_for_renaming (lhs)) + { + unlink_stmt_vdef (stmt); + gsi_remove (gsi, true); + release_defs (stmt); + continue; + } + if (gimple_assign_rhs1 (stmt) != rhs) { gimple_stmt_iterator gsi = gsi_for_stmt (stmt); @@ -2205,6 +2219,8 @@ execute_update_addresses_taken (void) if (gimple_references_memory_p (stmt) || is_gimple_debug (stmt)) update_stmt (stmt); + + gsi_next (gsi); } /* Update SSA form here, we are called as non-pass as well. */ Jakub
Re: [PATCH] Clear EDGE_CROSSING flag when redirecting edge succ to EXIT_BLOCK_PTR (PR middle-end/50907)
On 11/28/11 11:40, Jakub Jelinek wrote: Hi! The checking code insists on edges to EXIT_BLOCK_PTR (and from ENTRY_BLOCK_PTR) to have no EDGE_CROSSING bit set, so if we have a bb in second partition jumping to first partition's entry block and convert_jumps_to_returns changes it to simple_return, we need to drop the EDGE_CROSSING bit. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2011-11-28 Jakub Jelinek ja...@redhat.com PR middle-end/50907 * function.c (convert_jumps_to_returns): When redirecting an edge succ to EXIT_BLOCK_PTR, clear EDGE_CROSSING flag. * gcc.dg/tree-prof/pr50907.c: New test. OK. jeff
Re: [PATCH] Copy over TREE_THIS_VOLATILE during forwprop (PR tree-optimization/50078)
On 11/28/11 11:36, Jakub Jelinek wrote: Hi! On the following testcase the volatile load/store are optimized away. IMHO this is a bug in forwprop, which replaces tmp_Y = nonvolvar[arg_X]; MEM[(volatile ...*)tmp_Y] ={v} ...; with MEM[(volatile ...*)nonvolvar][tmp_Y] ={v} ...; where the LHS is no longer TREE_THIS_VOLATILE like before, TREE_THIS_VOLATILE is newly only the MEM_REF operand of the non-volatile ARRAY_REF. This patch copies the TREE_THIS_VOLATILE bit to the ARRAY_REF. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2011-11-28 Jakub Jelinek ja...@redhat.com PR tree-optimization/50078 * tree-ssa-forwprop.c (forward_propagate_addr_expr_1): Copy over TREE_THIS_VOLATILE also from the old to new lhs resp. rhs. * gcc.dg/pr50078.c: New test. Do we need to copy NOTRAP or any of the other flags? OK, assuming we don't need to copy NOTRAP, etc. If those need to be copied too, preapproved with those changes. Jeff
Re: [PATCH] Re: [v3] updated atomic configury
On 11/28/2011 11:42 AM, Joseph S. Myers wrote: On Mon, 28 Nov 2011, Andrew MacLeod wrote: That would be why cppbuiltin.c, defining __SIZEOF_POINTER__, needs to use the POINTER_SIZE macro instead of directly using ptr_type_node. You may need to do something similar. /* ptr_type_node can't be used here since ptr_mode is only set when toplev calls backend_init which is not done with -E switch. */ cpp_define_formatted (pfile, __SIZEOF_POINTER__=%d, POINTER_SIZE / BITS_PER_UNIT); How bizarre... thanks you saved me some hair pulling I'm sure :-) That does work, here's a patch which does that.. bootstraps on x86_64-unknown-linux-gnu and has no new regressions. OK for mainline? Andrew c-family * c-cpp-builtin.c (cpp_atomic_builtins):New. Emit all atomic predefines in one place. Add LOCK_FREE predefines. (c_cpp_builtins): Move Legacy HAVE_SYNC predefines to new func. libstdc++-v3 * include/bits/atomic_base.h (ATOMIC_*_LOCK_FREE): Use new cpp predefined macros. * testsuite/29_atomics/headers/atomic/macros.cc: Add BOOL and POINTER macro checks. Check for expected compile time values. Index: gcc/c-family/c-cppbuiltin.c === *** gcc/c-family/c-cppbuiltin.c (revision 181698) --- gcc/c-family/c-cppbuiltin.c (working copy) *** c_cpp_builtins_optimize_pragma (cpp_read *** 568,573 --- 568,684 } + /* This function will emit cpp macros to indicate the presence of various lock +free atomic operations. */ + + static void + cpp_atomic_builtins (cpp_reader *pfile) + { + /* Set a flag for each size of object that compare and swap exists for up to + a 16 byte object. */ + #define SWAP_LIMIT 17 + bool have_swap[SWAP_LIMIT]; + unsigned int psize; + + /* Clear the map of sizes compare_and swap exists for. */ + memset (have_swap, 0, sizeof (have_swap)); + + /* Tell source code if the compiler makes sync_compare_and_swap + builtins available. 
*/ + #ifndef HAVE_sync_compare_and_swapqi + #define HAVE_sync_compare_and_swapqi 0 + #endif + #ifndef HAVE_atomic_compare_and_swapqi + #define HAVE_atomic_compare_and_swapqi 0 + #endif + + if (HAVE_sync_compare_and_swapqi || HAVE_atomic_compare_and_swapqi) + { + cpp_define (pfile, __GCC_HAVE_SYNC_COMPARE_AND_SWAP_1); + have_swap[1] = true; + } + + #ifndef HAVE_sync_compare_and_swaphi + #define HAVE_sync_compare_and_swaphi 0 + #endif + #ifndef HAVE_atomic_compare_and_swaphi + #define HAVE_atomic_compare_and_swaphi 0 + #endif + if (HAVE_sync_compare_and_swaphi || HAVE_atomic_compare_and_swaphi) + { + cpp_define (pfile, __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2); + have_swap[2] = true; + } + + #ifndef HAVE_sync_compare_and_swapsi + #define HAVE_sync_compare_and_swapsi 0 + #endif + #ifndef HAVE_atomic_compare_and_swapsi + #define HAVE_atomic_compare_and_swapsi 0 + #endif + if (HAVE_sync_compare_and_swapsi || HAVE_atomic_compare_and_swapsi) + { + cpp_define (pfile, __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4); + have_swap[4] = true; + } + + #ifndef HAVE_sync_compare_and_swapdi + #define HAVE_sync_compare_and_swapdi 0 + #endif + #ifndef HAVE_atomic_compare_and_swapdi + #define HAVE_atomic_compare_and_swapdi 0 + #endif + if (HAVE_sync_compare_and_swapdi || HAVE_atomic_compare_and_swapdi) + { + cpp_define (pfile, __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8); + have_swap[8] = true; + } + + #ifndef HAVE_sync_compare_and_swapti + #define HAVE_sync_compare_and_swapti 0 + #endif + #ifndef HAVE_atomic_compare_and_swapti + #define HAVE_atomic_compare_and_swapti 0 + #endif + if (HAVE_sync_compare_and_swapti || HAVE_atomic_compare_and_swapti) + { + cpp_define (pfile, __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16); + have_swap[16] = true; + } + + /* Tell the source code about various types. These map to the C++11 and C1x + macros where 2 indicates lock-free always, and 1 indicates sometimes + lock free. 
*/ + #define SIZEOF_NODE(T) (tree_low_cst (TYPE_SIZE_UNIT (T), 1)) + #define SWAP_INDEX(T) ((SIZEOF_NODE (T) SWAP_LIMIT) ? SIZEOF_NODE (T) : 0) + builtin_define_with_int_value (__GCC_ATOMIC_BOOL_LOCK_FREE, + (have_swap[SWAP_INDEX (boolean_type_node)]? 2 : 1)); + builtin_define_with_int_value (__GCC_ATOMIC_CHAR_LOCK_FREE, + (have_swap[SWAP_INDEX (signed_char_type_node)]? 2 : 1)); + builtin_define_with_int_value (__GCC_ATOMIC_CHAR16_T_LOCK_FREE, + (have_swap[SWAP_INDEX (char16_type_node)]? 2 : 1)); + builtin_define_with_int_value (__GCC_ATOMIC_CHAR32_T_LOCK_FREE, + (have_swap[SWAP_INDEX (char32_type_node)]? 2 : 1)); + builtin_define_with_int_value (__GCC_ATOMIC_WCHAR_T_LOCK_FREE, + (have_swap[SWAP_INDEX (wchar_type_node)]? 2 : 1)); +
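The new predefines are directly visible to user code, which is how the libstdc++ `ATOMIC_*_LOCK_FREE` macros consume them. A quick probe that compares the compile-time macro against the matching runtime builtin — note that the actual values are target-dependent; expecting 2 (always lock-free) for int is an x86-64 assumption:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Report the compile-time lock-free guarantee for int: 2 = always
   lock-free, 1 = sometimes, 0 = never, -1 = predefine missing.  */
static int
int_lock_free_macro (void)
{
#ifdef __GCC_ATOMIC_INT_LOCK_FREE
  return __GCC_ATOMIC_INT_LOCK_FREE;
#else
  return -1;  /* compiler predates these predefines */
#endif
}

/* The runtime answer from the __atomic builtin; a null pointer means
   "any suitably aligned object of this size".  */
static bool
int_lock_free_builtin (void)
{
  return __atomic_is_lock_free (sizeof (int), (void *) 0);
}
```

On targets where the macro is 2, the builtin must agree; where it is 1, the builtin's answer can depend on the object's address.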
[patch] libjava/classpath configure update for FreeBSD 10 and up
All, I'd like to commit the attached patch to trunk and gcc-4.6. It is a follow up from this one: http://gcc.gnu.org/ml/gcc-cvs/2011-11/msg00886.html I missed this one because in my setup I didn't build any shared libraries from classpath. The config.rpath patch is already sent to classpath. Any objections to this patch? Otherwise I'm going to commit within 24h. TIA, Andreas libjava/classpath/ChangeLog.gcj: 2011-11-28 Andreas Tobler andre...@fgznet.ch * config.rpath (ld_shlibs): Fix detection of FreeBSD-10 and up. (libname_spec): Likewise. * configure: Regenerate with autoconf -I ../../. Index: configure === --- configure (revision 181783) +++ configure (working copy) @@ -10025,7 +10025,7 @@ ;; # Unfortunately, older versions of FreeBSD 2 do not have this feature. -freebsd2*) +freebsd2.*) archive_cmds='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags' hardcode_direct=yes hardcode_minus_L=yes @@ -10941,7 +10941,7 @@ objformat=`/usr/bin/objformat` else case $host_os in -freebsd[123]*) objformat=aout ;; +freebsd[23].*) objformat=aout ;; *) objformat=elf ;; esac fi @@ -10959,7 +10959,7 @@ esac shlibpath_var=LD_LIBRARY_PATH case $host_os in - freebsd2*) + freebsd2.*) shlibpath_overrides_runpath=yes ;; freebsd3.[01]* | freebsdelf3.[01]*) @@ -14225,7 +14225,7 @@ esac ;; - freebsd[12]*) + freebsd2.*) # C++ shared libraries reported to be fairly broken before # switch to ELF ld_shlibs_CXX=no @@ -16000,7 +16000,7 @@ objformat=`/usr/bin/objformat` else case $host_os in -freebsd[123]*) objformat=aout ;; +freebsd[23].*) objformat=aout ;; *) objformat=elf ;; esac fi @@ -16018,7 +16018,7 @@ esac shlibpath_var=LD_LIBRARY_PATH case $host_os in - freebsd2*) + freebsd2.*) shlibpath_overrides_runpath=yes ;; freebsd3.[01]* | freebsdelf3.[01]*) Index: config.rpath === --- config.rpath(revision 181783) +++ config.rpath(working copy) @@ -361,7 +361,7 @@ hardcode_libdir_flag_spec='-R$libdir' hardcode_direct=yes ;; -freebsd2*) +freebsd2.*) hardcode_direct=yes hardcode_minus_L=yes ;; 
@@ -533,7 +533,7 @@ ;; freebsd* | dragonfly*) case $host_os in - freebsd[123]*) + freebsd[23].*) library_names_spec='$libname$shrext$versuffix' ;; *) library_names_spec='$libname$shrext' ;;
Re: [PATCH] Re: [v3] updated atomic configury
On 11/28/2011 11:29 AM, Andrew MacLeod wrote: c-family * c-cpp-builtin.c (cpp_atomic_builtins):New. Emit all atomic predefines in one place. Add LOCK_FREE predefines. (c_cpp_builtins): Move Legacy HAVE_SYNC predefines to new func. libstdc++-v3 * include/bits/atomic_base.h (ATOMIC_*_LOCK_FREE): Use new cpp predefined macros. * testsuite/29_atomics/headers/atomic/macros.cc: Add BOOL and POINTER macro checks. Check for expected compile time values. Ok. r~
Re: [PATCH] Copy over TREE_THIS_VOLATILE during forwprop (PR tree-optimization/50078)
On Mon, Nov 28, 2011 at 12:16:45PM -0700, Jeff Law wrote: On the following testcase the volatile load/store are optimized away. IMHO this is a bug in forwprop, which replaces tmp_Y = nonvolvar[arg_X]; MEM[(volatile ...*)tmp_Y] ={v} ...; with MEM[(volatile ...*)nonvolvar][tmp_Y] ={v} ...; where the LHS is no longer TREE_THIS_VOLATILE like before, TREE_THIS_VOLATILE is newly only the MEM_REF operand of the non-volatile ARRAY_REF. This patch copies the TREE_THIS_VOLATILE bit to the ARRAY_REF. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2011-11-28 Jakub Jelinek ja...@redhat.com PR tree-optimization/50078 * tree-ssa-forwprop.c (forward_propagate_addr_expr_1): Copy over TREE_THIS_VOLATILE also from the old to new lhs resp. rhs. * gcc.dg/pr50078.c: New test. Do we need to copy NOTRAP or any of the other flags? OK, assuming we don't need to copy NOTRAP, etc. If those need to be copied too, preapproved with those changes. I had there TREE_THIS_NOTRAP as well in the first iteration, but it ICEd, as TREE_THIS_NOTRAP is not valid e.g. on COMPONENT_REF. Jakub
Re: [PATCH] Run remove_unreachable_handlers before expansion if needed (PR tree-optimization/50682)
On 11/23/11 13:42, Jakub Jelinek wrote: Hi! As mentioned in the PR, if gimple_purge_dead_eh_edges is ever called after ehcleanup2 pass (or in its after todos as in the testcase below), nothing removes the unreachable EH regions and we crash either in cleanup_dead_labels_eh or, if we ignore those there, later on during emitting of exception info. We could run remove_unreachable_handlers unconditionally again shortly before expansion, but that needs a walk of all stmts, so this patch chooses to do it only if it sees some landing pads got removed. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2011-11-23 Jakub Jelinek ja...@redhat.com PR tree-optimization/50682 * tree-eh.c (maybe_remove_unreachable_handlers): New function. * tree-flow.h (maybe_remove_unreachable_handlers): New prototype. * tree-optimize.c (execute_cleanup_cfg_post_optimizing): Call it. * g++.dg/opt/pr50682.C: New test. OK. jeff
Re: New port^2: Renesas RL78
Ping? Anything else for this? http://gcc.gnu.org/ml/gcc-patches/2011-11/msg02178.html http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01467.html http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01356.html
Re: [Patch, fortran, RFC] PR 40958 Reduce size of module files
Hello world, the testing of the test patch I submitted earlier (Thanks Salvatore and Joost!) has shown a performance increase, so here is a formal submission. No test case because this patch is not supposed to change anything, just make module reading a bit more efficient. Regression-tested on x86_64-unknown-linux-gnu. OK for trunk? Thomas 2011-11-28 Thomas Koenig tkoe...@gcc.gnu.org PR fortran/40958 * module.c (prev_module_line): New variable. (prev_module_column): New variable. (prev_character): New variable. (module_char): Update the new variables. (module_unget_char): New function. (parse_string): Use module_unget_char. (parse_integer): Likewise. (parse_name): Likewise. Index: module.c === --- module.c (Revision 181745) +++ module.c (Arbeitskopie) @@ -194,6 +194,8 @@ static char module_name[GFC_MAX_SYMBOL_LEN + 1]; static bool specified_nonint, specified_int; static int module_line, module_column, only_flag; +static int prev_module_line, prev_module_column, prev_character; + static enum { IO_INPUT, IO_OUTPUT } iomode; @@ -1036,6 +1038,10 @@ module_char (void) if (c == EOF) bad_module (Unexpected EOF); + prev_module_line = module_line; + prev_module_column = module_column; + prev_character = c; + if (c == '\n') { module_line++; @@ -1046,7 +1052,17 @@ module_char (void) return c; } +/* Unget a character while remembering the line and column. Works for + a single character only. */ +static void +module_unget_char (void) +{ + module_line = prev_module_line; + module_column = prev_module_column; + ungetc (prev_character, module_fp); +} + /* Parse a string constant. The delimiter is guaranteed to be a single quote. 
*/ @@ -1106,24 +1122,22 @@ parse_string (void) static void parse_integer (int c) { - module_locus m; - atom_int = c - '0'; for (;;) { - get_module_locus (m); - c = module_char (); if (!ISDIGIT (c)) - break; + { + module_unget_char (); + break; + } atom_int = 10 * atom_int + c - '0'; if (atom_int ) bad_module (Integer overflow); } - set_module_locus (m); } @@ -1132,7 +1146,6 @@ parse_integer (int c) static void parse_name (int c) { - module_locus m; char *p; int len; @@ -1141,13 +1154,14 @@ parse_name (int c) *p++ = c; len = 1; - get_module_locus (m); - for (;;) { c = module_char (); if (!ISALNUM (c) c != '_' c != '-') - break; + { + module_unget_char (); + break; + } *p++ = c; if (++len GFC_MAX_SYMBOL_LEN) @@ -1156,11 +1170,6 @@ parse_name (int c) *p = '\0'; - fseek (module_fp, -1, SEEK_CUR); - module_column = m.column + len - 1; - - if (c == '\n') -module_line--; }
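The patch replaces a full get/set of the module locus around every one-character lookahead with a cheap single-character pushback that restores the remembered line and column. The core mechanism, sketched in C (names mirror the patch; reading from a plain string instead of `module_fp` is an assumption to keep the example self-contained):

```c
#include <assert.h>
#include <stddef.h>

/* One-character pushback reader that tracks line and column, in the
   spirit of module_char/module_unget_char: remember the previous
   position before consuming, so a single unget can restore it.  */
struct reader
{
  const char *buf;
  size_t pos;
  int line, column;
  int prev_line, prev_column;
};

static int
reader_char (struct reader *r)
{
  int c = (unsigned char) r->buf[r->pos];
  if (c == '\0')
    return -1;                    /* EOF */
  r->pos++;
  r->prev_line = r->line;         /* remembered for a later unget */
  r->prev_column = r->column;
  if (c == '\n')
    {
      r->line++;
      r->column = 0;
    }
  else
    r->column++;
  return c;
}

/* Push the last character back; valid for a single character only,
   just like module_unget_char.  */
static void
reader_unget_char (struct reader *r)
{
  r->pos--;
  r->line = r->prev_line;
  r->column = r->prev_column;
}
```

Because only one saved position exists, callers such as `parse_name` may unget at most the one delimiter character that terminated their token — which is all the module reader ever needs.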
Re: Support Solaris 11/SPARC in MD_FALLBACK_FRAME_STATE_FOR (PR ada/41929)
The result has been bootstrapped by Eric (thanks for this and debugging why a first version caused some regressions on Solaris 8) and myself on Solaris 8, 9, 10 and 11, and I've performed quite a bit of software archaeology (i.e. stared at old libc.so.1 and libthread.so.1 patches) to make sure noting got missed in the process. All EH-related tests fail with a recent upgrade on Solaris 10. Interestingly, the old implementation keeps working with the upgrade. It seems that the __sighndlr case has been over-simplified: /* Look for the __sighndlr pattern. */ else if (*(unsigned int *)(pc-(4*5)) == sighndlr_pattern[0] *(unsigned int *)(pc-(4*4)) == sighndlr_pattern[1] *(unsigned int *)(pc-(4*3)) == sighndlr_pattern[2] *(unsigned int *)(pc-(4*2)) == sighndlr_pattern[3] *(unsigned int *)(pc-(4*1)) == sighndlr_pattern[4] *(unsigned int *)(pc-(4*0)) == sighndlr_pattern[5] *(unsigned int *)(pc+(4*1)) == sighndlr_pattern[6] ) { /* We have observed different calling frames among different versions of the operating system, so that we need to discriminate using the upper frame. We look for the return address of the caller frame (there is an offset of 15 words between the frame address and the place where this return address is stored) in order to do some more pattern matching. */ unsigned int cuh_pattern = *(unsigned int *)(*(unsigned int *)(this_cfa + 15*4) - 4); if (cuh_pattern == 0xd407a04c) { /* This matches the call_user_handler pattern for Solaris 10. There are 2 cases so we look for the return address of the caller's caller frame in order to do more pattern matching. */ unsigned int sah_pattern = *(unsigned int *)(*(unsigned int *)(this_cfa + 96 + 15*4) - 4); if (sah_pattern == 0x92100019) /* This is the same setup as for Solaris 9, see below. */ regs_off = 96 + 96 + 96 + 160; else /* We need to move up three frames (the kernel frame, the call_user_handler frame, the __sighndlr frame). 
Two of them have the minimum stack frame size (kernel and __sighndlr frames) of 96 bytes, and there is another with a stack frame of 160 bytes (the call_user_handler frame). The ucontext_t structure is after this offset. */ regs_off = 96 + 96 + 160; } else if (cuh_pattern == 0x9410001a || cuh_pattern == 0x9410001b) /* This matches the call_user_handler pattern for Solaris 9 and for Solaris 8 running inside Solaris Containers respectively. We need to move up four frames (the kernel frame, the signal frame, the call_user_handler frame, the __sighndlr frame). Three of them have the minimum stack frame size (kernel, signal, and __sighndlr frames) of 96 bytes, and there is another with a stack frame of 160 bytes (the call_user_handler frame). The ucontext_t structure is after this offset. */ regs_off = 96 + 96 + 96 + 160; else /* We need to move up three frames (the kernel frame, the sigacthandler frame, and the __sighndlr frame). Two of them have the minimum stack frame size (kernel and __sighndlr frames) of 96 bytes, and there is another with a stack frame of 216 bytes (the sigacthandler frame). The ucontext_t structure is after this offset. */ regs_off = 96 + 96 + 216; } to if(/* Solaris 8+ - multi-threaded __sighndlr:save %sp, -96, %sp __sighndlr+4: mov %i0, %o0 __sighndlr+8: mov %i1, %o1 __sighndlr+12: call %i3 __sighndlr+16: mov %i2, %o2 __sighndlr+20: ret --- PC __sighndlr+24: restore */ pc[-5] == 0x9de3bfa0 pc[-4] == 0x90100018 pc[-3] == 0x92100019 pc[-2] == 0x9fc6c000 pc[-1] == 0x9410001a pc[ 0] == 0x81c7e008 pc[ 1] == 0x81e8) { if (/* Solaris 8 /usr/lib/libthread.so.1 -- sigacthandler+1796: mov %i0, %o0 */ savpc[-1] == 0x90100018) { /* We need to move up two frames: signal handler-- context-cfa __sighndlr sigacthandler kernel */ *nframes = 2; } else /* Solaris 8 /usr/lib/lwp/libthread.so.1, Solaris 9+ -- */ { /* We need to move up three frames: signal handler-- context-cfa __sighndlr call_user_handler sigacthandler kernel
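Both the old and new implementations rest on the same primitive: comparing a window of 32-bit instruction words around the fault PC against a known code sequence. A self-contained sketch of that matching step (the final `restore` opcode is assumed to be `0x81e80000` — the mail shows it truncated — and the in-memory `code[]` array stands in for real text-segment addresses):

```c
#include <assert.h>
#include <string.h>

/* The __sighndlr instruction sequence quoted in the mail; PC points at
   the `ret`, so the window spans pc[-5] .. pc[1].  */
static const unsigned int sighndlr_pattern[7] = {
  0x9de3bfa0,  /* save %sp, -96, %sp */
  0x90100018,  /* mov %i0, %o0 */
  0x92100019,  /* mov %i1, %o1 */
  0x9fc6c000,  /* call %i3 */
  0x9410001a,  /* mov %i2, %o2 */
  0x81c7e008,  /* ret */
  0x81e80000,  /* restore (assumed full opcode) */
};

/* Return nonzero if PC sits on the `ret` of the __sighndlr pattern,
   the way the unwinder fragment above pattern-matches.  */
static int
is_sighndlr_ret (const unsigned int *pc)
{
  return memcmp (pc - 5, sighndlr_pattern, sizeof sighndlr_pattern) == 0;
}
```

The fragile part, as the thread shows, is not this matching but what comes next: the pattern at the *caller's* return address decides how many frames (and which frame sizes) separate the handler from the saved ucontext_t.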
Re: [google] Patch to enable efficient function level instrumentation
Ping! -- Harshit
Re: [Patch, fortran, RFC] PR 40958 Reduce size of module files
On Mon, Nov 28, 2011 at 22:29, Thomas Koenig tkoe...@netcologne.de wrote: Hello world, the testing of the test patch I submitted earlier (Thanks Salvatore and Joost!) has shown a performance increase, so here is a formal submission. No test case because this patch is not supposed to change anything, just make module reading a bit more efficient. Regression-tested on x86_64-unknown-linux-gnu. OK for trunk? Thomas 2011-11-28 Thomas Koenig tkoe...@gcc.gnu.org PR fortran/40958 * module.c (prev_module_line): New variable. (prev_module_column): New variable. (prev_character): New variable. (module_char): Update the new variables. (module_unget_char): New function. (parse_string): Use module_unget_char. (parse_integer): Likewise. (parse_name): Likewise. Nice! Ok for trunk. -- Janne Blomqvist
Re: [PATCH] PR c++/51289 - ICE with alias template for bound template
On 11/28/2011 11:54 AM, Dodji Seketeli wrote: @@ -3028,10 +3028,12 @@ find_parameter_packs_r (tree*tp, int *walk_subtrees, void* data) case BOUND_TEMPLATE_TEMPLATE_PARM: /* Check the template itself. */ - cp_walk_tree (TREE_TYPE (TYPE_TI_TEMPLATE (t)), + cp_walk_tree (TREE_TYPE (TI_TEMPLATE + (TEMPLATE_TEMPLATE_PARM_TEMPLATE_INFO (t))), find_parameter_packs_r, ppd, ppd-visited); /* Check the template arguments. */ - cp_walk_tree (TYPE_TI_ARGS (t),find_parameter_packs_r, ppd, + cp_walk_tree (TI_ARGS (TEMPLATE_TEMPLATE_PARM_TEMPLATE_INFO (t)), + find_parameter_packs_r, ppd, ppd-visited); *walk_subtrees = 0; return NULL_TREE; Instead of this change, I think we should handle typedefs/aliases at the top of the function. We shouldn't need to look into the underlying type for packs. Jason
Re: [PATCH] Make sibcall argument overlap check less pessimistic (PR middle-end/50074)
Here is an attempt to make the check more complete (e.g. the change wouldn't see overlap if addr was PLUS of two REGs, where one of the REGs was based on internal_arg_pointer, etc.) and less pessimistic. As tree-tailcall.c doesn't allow tail calls from functions that have address of any of the caller's parameters taken, IMHO it is enough to look for internal_arg_pointer based pseudos initialized in the tail call sequence. This patch scans the tail call sequence and notes which pseudos are based on internal_arg_pointer (and what offset from that pointer they have) and uses that in mem_overlaps_already_clobbered_arg_p. This looks reasonable, but the logic is a bit hard to follow, especially the double usage of internal_arg_pointer_based_reg depending on SCAN's value. Would it be possible to split it into 2 functions that recursively call each other? You should also make it clearer that internal_arg_pointer_seq_start is part of the caching mechanism (I wondered for some minutes whether it has anything to do with RTL sequences), maybe: /* Last insn that has been scanned by internal_arg_pointer_based_reg, or NULL_RTX if none has been scanned yet. */ static rtx internal_arg_pointer_insn_cache; /* Vector indexed by REGNO - FIRST_PSEUDO_REGISTER, recording if a pseudo is based on crtl->args.internal_arg_pointer. The element is NULL_RTX if the pseudo isn't based on it, a CONST_INT offset if the pseudo is based on it with fixed offset, or PC if this is with variable or unknown offset. */ static VEC(rtx, heap) *internal_arg_pointer_pseudo_cache; +/* Helper function for internal_arg_pointer_based_reg, called through + for_each_rtx. Return 1 if a crtl->args.internal_arg_pointer based + register is seen anywhere. */ + +static int +internal_arg_pointer_based_reg_1 (rtx *loc, void *data ATTRIBUTE_UNUSED) +{ + if (REG_P (*loc) && internal_arg_pointer_based_reg (*loc, false) != NULL_RTX) +return 1; + if (MEM_P (*loc)) +return -1; + return 0; +} Missing comment for the MEM_P case.
+/* If REG is based on crtl->args.internal_arg_pointer, return either + a CONST_INT offset from crtl->args.internal_arg_pointer if + offset from it is known constant, or PC if the offset is unknown. + Return NULL_RTX if REG isn't based on crtl->args.internal_arg_pointer. */ + +static rtx +internal_arg_pointer_based_reg (rtx reg, bool scan) +{ + rtx insn; + + if (CONSTANT_P (reg)) +return NULL_RTX; + + if (reg == crtl->args.internal_arg_pointer) +return const0_rtx; + + if (REG_P (reg) && REGNO (reg) < FIRST_PSEUDO_REGISTER) +return NULL_RTX; if (REG_P (reg) && HARD_REGISTER_P (reg)) return NULL_RTX; -- Eric Botcazou
Re: [google] Patch to enable efficient function level instrumentation (issue 5416043)
Please also explain the need for backpointer section. David http://codereview.appspot.com/5416043/diff/1/gcc/config/i386/i386.c File gcc/config/i386/i386.c (right): http://codereview.appspot.com/5416043/diff/1/gcc/config/i386/i386.c#newcode10801 gcc/config/i386/i386.c:10801: +static bool Add an empty line between comment and function. http://codereview.appspot.com/5416043/diff/1/gcc/config/i386/i386.c#newcode10811 gcc/config/i386/i386.c:10811: + if (!patch_functions_ignore_loops) What exactly does this option try to do here? http://codereview.appspot.com/5416043/diff/1/gcc/config/i386/i386.c#newcode10822 gcc/config/i386/i386.c:10822: +return true; should it have a an else return false here? http://codereview.appspot.com/5416043/diff/1/gcc/config/i386/i386.c#newcode10841 gcc/config/i386/i386.c:10841: +++num_insns; NON_DEBUG_INSN_P (insn) http://codereview.appspot.com/5416043/diff/1/gcc/config/i386/i386.c#newcode10859 gcc/config/i386/i386.c:10859: + patch_current_function_p = check_should_patch_current_function(); Why do you need a static variable for this? http://codereview.appspot.com/5416043/diff/1/gcc/config/i386/i386.c#newcode10874 gcc/config/i386/i386.c:10874: + '_function_patch_epilogue'. Explain why a backpointer section is needed. http://codereview.appspot.com/5416043/diff/1/gcc/config/i386/i386.c#newcode10888 gcc/config/i386/i386.c:10888: + unsigned int num_actual_nops = num_remaining_nops - 8; hard code of 8? http://codereview.appspot.com/5416043/diff/1/gcc/config/i386/i386.c#newcode10897 gcc/config/i386/i386.c:10897: +return false; Can this be guarded in the caller of this function? http://codereview.appspot.com/5416043/diff/1/gcc/config/i386/i386.c#newcode10927 gcc/config/i386/i386.c:10927: + is later renamed to 'section_name' by ix86_elf_asm_named_section(). */ Explain more on the comdat handling. 
http://codereview.appspot.com/5416043/diff/1/gcc/config/i386/i386.opt
File gcc/config/i386/i386.opt (right):

http://codereview.appspot.com/5416043/diff/1/gcc/config/i386/i386.opt#newcode570
gcc/config/i386/i386.opt:570: Minimum number of instructions in the function without loop before the function is qualified for patching for instrumentation (for use with -mpatch-functions-for-instrumentation)
It may be better to define a PARAM for it.

http://codereview.appspot.com/5416043/diff/1/gcc/config/i386/i386.opt#newcode574
gcc/config/i386/i386.opt:574: Ignore loops when deciding whether to patch a function for instrumentation (for use with -mpatch-functions-for-instrumentation).
What is the motivation for this option?

http://codereview.appspot.com/5416043/diff/1/gcc/config/i386/i386.opt#newcode578
gcc/config/i386/i386.opt:578: Treat 'main' as any other function and only patch it if it meets the criteria for loops and minimum number of instructions (for use with -mpatch-functions-for-instrumentation).
What is the motivation for this option?

http://codereview.appspot.com/5416043/
Re: Fix libgomp semaphores
On Mon, Nov 28, 2011 at 05:23:37PM +0100, Jakub Jelinek wrote:
On Mon, Nov 28, 2011 at 09:23:43AM +1030, Alan Modra wrote:

+  int count = *sem;
+
+  while ((count & 0x7fffffff) != 0)
+    {
+      int oldval = count;
+      __atomic_compare_exchange_4 (sem, &oldval, count - 1,
+				   false, MEMMODEL_ACQUIRE, MEMMODEL_RELAXED);
+      if (__builtin_expect (oldval == count, 1))
+	return;

Why aren't you instead testing the return value of __atomic_compare_exchange_4 (here and in other cases)?

If you use the return value on powerpc, you find that requires two load immediate instructions (loading zero and one), and a compare against zero. That makes the code as written three instructions smaller, because the oldval == count comparison has already been done inside the atomic sequence. I'd expect fewer instructions on most other architectures too, unless they happen to have a compare and exchange instruction that sets condition bits (i.e. Intel). Even on Intel the way I've written the code shouldn't take more instructions with a properly written cmpxchg rtl description. Does it?

Hmm, I suppose you could argue that powerpc and others ought to not generate those three extra instructions when using the return value. I'll see about fixing powerpc.

-- 
Alan Modra
Australia Development Lab, IBM
Re: [trunk] RFS: translate built-in include paths for sysroot (issue 5394041)
Hi, Joseph, get a chance to take a look? Thanks!

-Han

On 2011/11/19 00:10:20, shenhan wrote:

Hi, Joseph, thanks! ChangeLog entries added to the issue description.

ChangeLog
	* Makefile.in (GPLUSPLUS_INCLUDE_DIR_ADD_SYSROOT): Add a macro
	definition to the compile command.
	* cppdefault.c (GPLUSPLUS_INCLUDE_DIR_ADD_SYSROOT): Replace the
	hard-coded add_sysroot field with the control macro.
	* configure.ac (gcc_gxx_include_dir_add_sysroot): Add a flag
	variable to control whether sysroot should be prepended to the
	gxx include dir.
	* configure: Regenerate.

Yeah, I should include the URL for the previous discussion in the mail. (I previously put it in the issue description together with the rationale.) This is a follow-up for issue http://codereview.appspot.com/4641076. The rationale for this patch is also copied here:

==
The setup: Configuring a toolchain targeting x86-64 GNU Linux (Ubuntu Lucid), as a cross-compiler, using a sysroot to provide the Lucid headers+libraries, with the sysroot path being within the GCC install tree. I want to use the Lucid system libstdc++ and headers, which means that I'm not building/installing libstdc++-v3. So, configuring with:

  --with-sysroot=$SYSROOT --disable-libstdc++-v3 \
  --with-gxx-include-dir=$SYSROOT/usr/include/c++/4.4 \

(among other options). I'm hoping to support two usage models with this configuration, w.r.t. use of the sysroot:

(1) somebody installs the sysroot in the normal location relative to the GCC install, and relocates the whole bundle (sysroot+GCC). This works great AFAICT: GCC finds its includes (including the C++ includes) thanks to the add_standard_paths iprefix handling.

(2) somebody installs the sysroot in a non-standard location, and uses --sysroot to try to access it. This works fine for the C headers, but doesn't work for the C++ headers. For the C headers, add_standard_paths prepends the sysroot location to the /usr/include path (since that's what's specified in cppdefault.c for that path).
It doesn't do the same for the C++ include path, though (again, as specified in cppdefault.c). add_standard_paths doesn't attempt to relocate built-in include paths that start with the compiled-in sysroot location (e.g., the g++ include dir, in this case). This isn't surprising really: normally you either prepend the sysroot location or you don't (as specified by cppdefault.c); none of the built-in paths normally *start* with the sysroot location and need to be relocated. However, in this odd-ball case of trying to use the C++ headers from the sysroot, one of the paths *does* need to be relocated in this way.
==

Chris (cgd@) provided a patch for the issue, but there was a different opinion; my patch here is just coded as suggested. Tested before/after on a x86_64-unknown-linux-gnu system, with --enable-languages=all, RUNTESTFLAGS='--target_board=unix\{-m32,-m64\}'. No changes in test results.

-Han

On Fri, Nov 18, 2011 at 8:54 AM, Joseph S. Myers jos...@codesourcery.com wrote:

On Tue, 15 Nov 2011, Han Shen wrote:

2011-11-15  Han Shen  shen...@google.com
	* gcc/Makefile.in:
	* gcc/configure:
	* gcc/cppdefault.c:

You need to include the full ChangeLog entries with your patch. Note that paths in ChangeLogs are relative to the directory with the ChangeLog, so no gcc/ in this case. Please also include the full rationale with your patch *and URLs for the previous discussion* (from June, it seems) so that reviewers don't need to track that down themselves.

diff --git a/gcc/configure b/gcc/configure
index 99334ce..364d8c2 100755
--- a/gcc/configure
+++ b/gcc/configure

It's not possible to review this patch as is because you've only included the changes to the generated file configure, not to its source file configure.ac. Please make sure that your resubmission includes all the source file changes. (There is no need to include the changes to the generated file configure at all in the submission; the ChangeLog entry can just mention it as "* configure: Regenerate.".)
-- 
Joseph S. Myers
jos...@codesourcery.com

http://codereview.appspot.com/5394041/
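[Editorial note: the prepend-versus-relocate distinction in the rationale above can be sketched in a few lines of shell. Everything here is hypothetical — the function name, paths, and logic are mine, purely to illustrate the two cases add_standard_paths has to handle: an ordinary built-in path gets the sysroot prepended, while a built-in path that already starts with the compiled-in sysroot must have that prefix *replaced* by the --sysroot argument.]

```shell
# Hypothetical compiled-in sysroot (what $SYSROOT was at configure time).
COMPILED_SYSROOT=/opt/toolchain/sysroot

# relocate BUILTIN_PATH NEW_SYSROOT: mimic the two path-handling cases.
relocate () {
  path=$1
  new_sysroot=$2
  case $path in
    "$COMPILED_SYSROOT"*)
      # Path starts with the compiled-in sysroot: swap the prefix.
      echo "$new_sysroot${path#"$COMPILED_SYSROOT"}" ;;
    *)
      # Ordinary built-in path: just prepend the sysroot.
      echo "$new_sysroot$path" ;;
  esac
}

relocate /usr/include /mnt/sysroot
relocate "$COMPILED_SYSROOT/usr/include/c++/4.4" /mnt/sysroot
```

Both invocations end up under /mnt/sysroot; without the relocation case, the second (the g++ include dir) would still point into the compiled-in location.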
Re: [PATCH] Make sibcall argument overlap check less pessimistic (PR middle-end/50074)
On Mon, Nov 28, 2011 at 11:10:56PM +0100, Eric Botcazou wrote:
This looks reasonable, but the logic is a bit hard to follow, especially the double usage of internal_arg_pointer_based_reg depending on SCAN's value. Would it be possible to split it into 2 functions that recursively call each other?

Well, splitting off the scanning into a separate function is quite easy, but that would still mean a SCAN parameter. The thing is, for the common case where ADDR is already internal_arg_pointer, or a sum of that and a CONST_INT, or a SYMBOL_REF, I'd prefer to avoid the scanning of the instructions and only do that if really needed (i.e. when either looking for a pseudo or an arbitrary expression that needs to be for_each_rtx scanned). To get rid of the SCAN argument I'm afraid the first part of the function would need to be duplicated (for the current SCAN and !SCAN cases), then the scanning phase would go in another function and the last part in another one.

You should also make it clearer that internal_arg_pointer_seq_start is part of the caching mechanism (I wondered for some minutes whether it has anything to do with RTL sequences), maybe:

Ok.

+static int
+internal_arg_pointer_based_reg_1 (rtx *loc, void *data ATTRIBUTE_UNUSED)
+{
+  if (REG_P (*loc) && internal_arg_pointer_based_reg (*loc, false) != NULL_RTX)
+    return 1;
+  if (MEM_P (*loc))
+    return -1;
+  return 0;
+}

Missing comment for the MEM_P case.

Will add.

  if (REG_P (reg) && HARD_REGISTER_P (reg))
    return NULL_RTX;

Ok. BTW, it fixes also wrong-code PR51323, so I'll be including a new testcase in the next patch iteration.

	Jakub
Re: Fix libgomp semaphores
On 11/28/2011 02:16 PM, Alan Modra wrote:
Hmm, I suppose you could argue that powerpc and others ought to not generate those three extra instructions when using the return value. I'll see about fixing powerpc.

True. For weak, the value *should* always be used (otherwise the user program is broken). However, we can do better by considering the value to be stored in CR0. We perform the comparison in the loop, which sets CR0 == EQ (true) or NE (false); CR0 is set again by the STWCX insn in the same fashion. So the

  /* ??? It's either this or an unlikely jump over (set bool 1).  */
  x = gen_rtx_EQ (SImode, cond, const0_rtx);
  emit_insn (gen_rtx_SET (VOIDmode, boolval, x));

code currently emitted by rs6000_expand_atomic_compare_and_swap is sufficient, and merely needs to be adjusted so that it is computed after label2, and the other settings to boolval removed. This may be sufficient for a better strong compare-and-swap as well.

r~
[PATCH] Don't ICE on label DEBUG_INSN in rtl loop unrolling (PR rtl-optimization/51014)
Hi!

A DEBUG_INSN with a LABEL_DECL var isn't duplicated in bb copies (we want just one definition of the label), which breaks apply_opt_in_copies' attempt to match insn in the bb copy with orig_insn from the orig_bb. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2011-11-28  Jakub Jelinek  ja...@redhat.com

	PR rtl-optimization/51014
	* loop-unroll.c (apply_opt_in_copies): Ignore label DEBUG_INSNs
	both from bb and orig_bb.

	* g++.dg/opt/pr51014.C: New test.

--- gcc/loop-unroll.c.jj	2011-02-15 15:42:26.0 +0100
+++ gcc/loop-unroll.c	2011-11-28 21:03:58.089497366 +0100
@@ -2262,10 +2262,15 @@ apply_opt_in_copies (struct opt_info *op
       for (insn = BB_HEAD (bb); insn != NEXT_INSN (BB_END (bb)); insn = next)
         {
           next = NEXT_INSN (insn);
-          if (!INSN_P (insn))
+          if (!INSN_P (insn)
+              || (DEBUG_INSN_P (insn)
+                  && TREE_CODE (INSN_VAR_LOCATION_DECL (insn)) == LABEL_DECL))
             continue;
 
-          while (!INSN_P (orig_insn))
+          while (!INSN_P (orig_insn)
+                 || (DEBUG_INSN_P (orig_insn)
+                     && (TREE_CODE (INSN_VAR_LOCATION_DECL (orig_insn))
+                         == LABEL_DECL)))
             orig_insn = NEXT_INSN (orig_insn);
 
           ivts_templ.insn = orig_insn;
--- gcc/testsuite/g++.dg/opt/pr51014.C.jj	2011-11-28 21:08:19.518986308 +0100
+++ gcc/testsuite/g++.dg/opt/pr51014.C	2011-11-28 21:07:24.0 +0100
@@ -0,0 +1,16 @@
+// PR rtl-optimization/51014
+// { dg-do compile }
+// { dg-options "-O2 -funroll-loops -fcompare-debug" }
+
+struct S
+{
+  ~S() { delete s; }
+  int *s;
+};
+
+void
+f (S *x, S *y)
+{
+  for (; x != y; ++x)
+    x->~S();
+}

	Jakub
Re: PR other/51174: handle architectures with no DECL_COMDAT_GROUP
On Wed, Nov 23, 2011 at 8:57 AM, Jakub Jelinek ja...@redhat.com wrote:
On Wed, Nov 23, 2011 at 07:47:46AM -0600, Aldy Hernandez wrote:

@@ -4198,7 +4198,7 @@ ipa_tm_create_version_alias (struct cgra
       TREE_SYMBOL_REFERENCED (tm_name) = 1;
 
   /* Perform the same remapping to the comdat group.  */
-  if (DECL_COMDAT (new_decl))
+  if (HAVE_COMDAT_GROUP && DECL_COMDAT (new_decl))
     DECL_COMDAT_GROUP (new_decl) = tm_mangle (DECL_COMDAT_GROUP (old_decl));
 
   new_node = cgraph_same_body_alias (NULL, new_decl, info->new_decl);

Wouldn't it be better to test if (DECL_ONE_ONLY (new_decl)) instead? That is actually a test for non-NULL DECL_COMDAT_GROUP and is what e.g. cp/ uses as guards on tweaking DECL_COMDAT_GROUP. BTW, the formatting is wrong above, no space before && and two spaces after it.

That will work as well. Which solution should be committed?

Thanks, David
Re: Fix libgomp semaphores
On 11/27/2011 02:53 PM, Alan Modra wrote:

+enum memmodel
+{
+  MEMMODEL_RELAXED = 0,
+  MEMMODEL_CONSUME = 1,
+  MEMMODEL_ACQUIRE = 2,
+  MEMMODEL_RELEASE = 3,
+  MEMMODEL_ACQ_REL = 4,
+  MEMMODEL_SEQ_CST = 5,
+  MEMMODEL_LAST = 6
+};

This should probably go to libgomp.h.

 /* This is a Linux specific implementation of a semaphore synchronization
    mechanism for libgomp.  This type is private to the library.  This
+   counting semaphore implementation uses atomic instructions and the
+   futex syscall, and a single 32-bit int to store semaphore state.
+   The low 31 bits are the count, the top bit is a flag set when some
+   threads may be waiting.  */

I think we'll get better code generation on a number of targets if we make the low bit the waiting bit, and the higher bits the count. This way we do (count & 1) and (count + 2) instead of larger constants needing to be generated. Not changed below...

+static inline void
+gomp_sem_wait (gomp_sem_t *sem)
 {
+  int count = *sem;
+
+  while ((count & 0x7fffffff) != 0)
+    {
+      int oldval = count;
+      __atomic_compare_exchange_4 (sem, &oldval, count - 1,
+				   false, MEMMODEL_ACQUIRE, MEMMODEL_RELAXED);
+      if (__builtin_expect (oldval == count, 1))
+	return;
+      count = oldval;
+    }

I'd really prefer not to hard-code any sizes at all. Let's change the explicit _4 to _n everywhere.

The loop above doesn't actually make sense to me. If the compare-and-swap succeeds, then oldval == count - 1. Which doesn't seem to be what you're testing at all.
Perhaps this entire function is better written as

{
  int count = *sem;
  do
    {
      if (__builtin_expect (count & 0x80000000u, 0))
        {
          gomp_sem_wait_slow (sem, count);
          return;
        }
    }
  while (!__atomic_compare_exchange_n (sem, &count, count - 1, true,
                                       MEMMODEL_ACQUIRE, MEMMODEL_RELAXED));
}

+gomp_sem_post (gomp_sem_t *sem)
 {
+  int count = *sem;
+
+  while (1)
+    {
+      int oldval = count;
+      __atomic_compare_exchange_4 (sem, &oldval, (count + 1) & 0x7fffffff,
+				   false, MEMMODEL_RELEASE, MEMMODEL_RELAXED);
+      if (__builtin_expect (oldval == count, 1))
+	break;

Again, this equality doesn't make sense. Further, you aren't testing for the high bit set inside the loop, nor are you preserving the high bit. Perhaps better written as

{
  int count = *sem;
  do
    {
      if (__builtin_expect (count & 0x80000000u, 0))
        {
          gomp_sem_post_slow (sem, count);
          return;
        }
      /* We can't post more than 2**31-1 times.  */
      assert (count < 0x7fffffff);
    }
  while (!__atomic_compare_exchange_n (sem, &count, count + 1, true,
                                       MEMMODEL_RELEASE, MEMMODEL_RELAXED));
}

...

All that said, I don't see how we can ever clear the wait bit once it's set? Are we better off splitting the 32-bit value into two 16-bit fields for value+waiters? Or similarly splitting a 64-bit value? That way we can at least update both fields atomically, and we ought never lose a waiter.

r~
Re: PR other/51174: handle architectures with no DECL_COMDAT_GROUP
On 11/28/2011 03:54 PM, David Edelsohn wrote:
On Wed, Nov 23, 2011 at 8:57 AM, Jakub Jelinek ja...@redhat.com wrote:
On Wed, Nov 23, 2011 at 07:47:46AM -0600, Aldy Hernandez wrote:

@@ -4198,7 +4198,7 @@ ipa_tm_create_version_alias (struct cgra
       TREE_SYMBOL_REFERENCED (tm_name) = 1;
 
   /* Perform the same remapping to the comdat group.  */
-  if (DECL_COMDAT (new_decl))
+  if (HAVE_COMDAT_GROUP && DECL_COMDAT (new_decl))
     DECL_COMDAT_GROUP (new_decl) = tm_mangle (DECL_COMDAT_GROUP (old_decl));
 
   new_node = cgraph_same_body_alias (NULL, new_decl, info->new_decl);

Wouldn't it be better to test if (DECL_ONE_ONLY (new_decl)) instead? That is actually a test for non-NULL DECL_COMDAT_GROUP and is what e.g. cp/ uses as guards on tweaking DECL_COMDAT_GROUP. BTW, the formatting is wrong above, no space before && and two spaces after it.

That will work as well. Which solution should be committed?

Please use the DECL_ONE_ONLY test.
Re: Fix libgomp semaphores
On 11/28/2011 03:05 PM, Richard Henderson wrote:
On 11/28/2011 02:16 PM, Alan Modra wrote:
Hmm, I suppose you could argue that powerpc and others ought to not generate those three extra instructions when using the return value. I'll see about fixing powerpc.

However, we can do better by considering the value to be stored in CR0...

Try this and see if it generates the sort of code you want. Untested.

r~

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index f01353b..d3b990f 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -17352,11 +17352,11 @@ rs6000_expand_atomic_compare_and_swap (rtx operands[])
       retval = gen_reg_rtx (SImode);
       mode = SImode;
     }
+  else if (reg_overlap_mentioned_p (retval, oldval))
+    oldval = copy_to_reg (oldval);
 
   rs6000_pre_atomic_barrier (mod_s);
 
-  emit_move_insn (boolval, const0_rtx);
-
   label1 = NULL_RTX;
   if (!is_weak)
     {
@@ -17374,28 +17374,23 @@ rs6000_expand_atomic_compare_and_swap (rtx operands[])
			       NULL_RTX, 1, OPTAB_LIB_WIDEN);
     }
 
-  x = gen_rtx_NE (VOIDmode, x, oldval);
-  x = rs6000_generate_compare (x, mode);
+  cond = gen_reg_rtx (CCmode);
+  x = gen_rtx_COMPARE (CCmode, x, oldval);
+  emit_insn (gen_rtx_SET (VOIDmode, cond, x));
+
+  x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
   emit_unlikely_jump (x, label2);
 
   x = newval;
   if (mask)
     x = rs6000_mask_atomic_subword (retval, newval, mask);
 
-  cond = gen_reg_rtx (CCmode);
   emit_store_conditional (mode, cond, mem, x);
 
-  if (is_weak)
-    {
-      /* ??? It's either this or an unlikely jump over (set bool 1).  */
-      x = gen_rtx_EQ (SImode, cond, const0_rtx);
-      emit_insn (gen_rtx_SET (VOIDmode, boolval, x));
-    }
-  else
+  if (!is_weak)
     {
       x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
       emit_unlikely_jump (x, label1);
-      emit_move_insn (boolval, const1_rtx);
     }
 
   if (mod_f != MEMMODEL_RELAXED)
@@ -17408,6 +17403,18 @@ rs6000_expand_atomic_compare_and_swap (rtx operands[])
 
   if (shift)
     rs6000_finish_atomic_subword (operands[1], retval, shift);
+
+  if (is_weak)
+    {
+      x = gen_rtx_EQ (SImode, cond, const0_rtx);
+      emit_insn (gen_rtx_SET (VOIDmode, boolval, x));
+    }
+  else
+    {
+      x = emit_store_flag_force (boolval, EQ, retval, oldval, mode, 1, 1);
+      if (x != boolval)
+	emit_move_insn (boolval, x);
+    }
 }
 
 /* Expand an atomic exchange operation.  */
[rs6000] Streamline compare-and-swap success return value computation
On 11/28/2011 04:26 PM, Richard Henderson wrote:
On 11/28/2011 03:05 PM, Richard Henderson wrote:
On 11/28/2011 02:16 PM, Alan Modra wrote:
Hmm, I suppose you could argue that powerpc and others ought to not generate those three extra instructions when using the return value. I'll see about fixing powerpc.

However, we can do better by considering the value to be stored in CR0...

Try this and see if it generates the sort of code you want. Untested.

... actually, this one. There's no reason to differentiate between strong and weak compare-and-swap when computing boolval.

r~

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index f01353b..5a33f91 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -17352,11 +17352,11 @@ rs6000_expand_atomic_compare_and_swap (rtx operands[])
       retval = gen_reg_rtx (SImode);
       mode = SImode;
     }
+  else if (reg_overlap_mentioned_p (retval, oldval))
+    oldval = copy_to_reg (oldval);
 
   rs6000_pre_atomic_barrier (mod_s);
 
-  emit_move_insn (boolval, const0_rtx);
-
   label1 = NULL_RTX;
   if (!is_weak)
     {
@@ -17374,28 +17374,23 @@ rs6000_expand_atomic_compare_and_swap (rtx operands[])
			       NULL_RTX, 1, OPTAB_LIB_WIDEN);
     }
 
-  x = gen_rtx_NE (VOIDmode, x, oldval);
-  x = rs6000_generate_compare (x, mode);
+  cond = gen_reg_rtx (CCmode);
+  x = gen_rtx_COMPARE (CCmode, x, oldval);
+  emit_insn (gen_rtx_SET (VOIDmode, cond, x));
+
+  x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
   emit_unlikely_jump (x, label2);
 
   x = newval;
   if (mask)
     x = rs6000_mask_atomic_subword (retval, newval, mask);
 
-  cond = gen_reg_rtx (CCmode);
   emit_store_conditional (mode, cond, mem, x);
 
-  if (is_weak)
-    {
-      /* ??? It's either this or an unlikely jump over (set bool 1).  */
-      x = gen_rtx_EQ (SImode, cond, const0_rtx);
-      emit_insn (gen_rtx_SET (VOIDmode, boolval, x));
-    }
-  else
+  if (!is_weak)
     {
       x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
       emit_unlikely_jump (x, label1);
-      emit_move_insn (boolval, const1_rtx);
     }
 
   if (mod_f != MEMMODEL_RELAXED)
@@ -17408,6 +17403,10 @@ rs6000_expand_atomic_compare_and_swap (rtx operands[])
 
   if (shift)
     rs6000_finish_atomic_subword (operands[1], retval, shift);
+
+  /* In all cases, CR0 contains EQ on success, and NE on failure.  */
+  x = gen_rtx_EQ (SImode, cond, const0_rtx);
+  emit_insn (gen_rtx_SET (VOIDmode, boolval, x));
 }
 
 /* Expand an atomic exchange operation.  */
Re: Fix libgomp semaphores
On Mon, Nov 28, 2011 at 04:00:10PM -0800, Richard Henderson wrote:
On 11/27/2011 02:53 PM, Alan Modra wrote:

+enum memmodel
+{
+  MEMMODEL_RELAXED = 0,
+  MEMMODEL_CONSUME = 1,
+  MEMMODEL_ACQUIRE = 2,
+  MEMMODEL_RELEASE = 3,
+  MEMMODEL_ACQ_REL = 4,
+  MEMMODEL_SEQ_CST = 5,
+  MEMMODEL_LAST = 6
+};

This should probably go to libgomp.h.

I wondered about that.

 /* This is a Linux specific implementation of a semaphore synchronization
    mechanism for libgomp.  This type is private to the library.  This
+   counting semaphore implementation uses atomic instructions and the
+   futex syscall, and a single 32-bit int to store semaphore state.
+   The low 31 bits are the count, the top bit is a flag set when some
+   threads may be waiting.  */

I think we'll get better code generation on a number of targets if we make the low bit the waiting bit, and the higher bits the count. This way we do (count & 1) and (count + 2) instead of larger constants needing to be generated. Not changed below...

Funny, that's the way I wrote this code at first, then went for the wait bit as the sign bit because you can test > 0 in one place where you want "not waiting and count non-zero".

+static inline void
+gomp_sem_wait (gomp_sem_t *sem)
 {
+  int count = *sem;
+
+  while ((count & 0x7fffffff) != 0)
+    {
+      int oldval = count;
+      __atomic_compare_exchange_4 (sem, &oldval, count - 1,
+				   false, MEMMODEL_ACQUIRE, MEMMODEL_RELAXED);
+      if (__builtin_expect (oldval == count, 1))
+	return;
+      count = oldval;
+    }

I'd really prefer not to hard-code any sizes at all. Let's change the explicit _4 to _n everywhere.

OK.

The loop above doesn't actually make sense to me. If the compare-and-swap succeeds, then oldval == count - 1. Which doesn't seem to be what you're testing at all.

Huh? If it succeeds, oldval == count and we return.
Perhaps this entire function is better written as

{
  int count = *sem;
  do
    {
      if (__builtin_expect (count & 0x80000000u, 0))
        {
          gomp_sem_wait_slow (sem, count);
          return;
        }
    }
  while (!__atomic_compare_exchange_n (sem, &count, count - 1, true,
                                       MEMMODEL_ACQUIRE, MEMMODEL_RELAXED));
}

No, we don't need the slow path if we have *sem & 0x7fffffff non-zero.

+gomp_sem_post (gomp_sem_t *sem)
 {
+  int count = *sem;
+
+  while (1)
+    {
+      int oldval = count;
+      __atomic_compare_exchange_4 (sem, &oldval, (count + 1) & 0x7fffffff,
+				   false, MEMMODEL_RELEASE, MEMMODEL_RELAXED);
+      if (__builtin_expect (oldval == count, 1))
+	break;

Again, this equality doesn't make sense. Further, you aren't testing for the high bit set inside the loop, nor are you preserving the high bit.

See above about the equality. We don't want to preserve the wait bit here. Otherwise we'd never go back to the uncontended state, which answers ...

All that said, I don't see how we can ever clear the wait bit once it's set?

this question.

Are we better off splitting the 32-bit value into two 16-bit fields for value+waiters? Or similarly splitting a 64-bit value? That way we can at least update both fields atomically, and we ought never lose a waiter.

Other splits of the field are certainly possible, but of course they restrict the max count, and you'd need the fancy futex op_wait. Some targets don't have 64-bit atomics, so we can't really go that way. Yes, if I did keep track of the number of waiters we'd save one unnecessary futex_wake call, but I'm quite confident no waiters will be lost just using a flag.

-- 
Alan Modra
Australia Development Lab, IBM
Re: [rs6000] Streamline compare-and-swap success return value computation
On Mon, Nov 28, 2011 at 04:33:58PM -0800, Richard Henderson wrote:
On 11/28/2011 04:26 PM, Richard Henderson wrote:
On 11/28/2011 03:05 PM, Richard Henderson wrote:
On 11/28/2011 02:16 PM, Alan Modra wrote:
Hmm, I suppose you could argue that powerpc and others ought to not generate those three extra instructions when using the return value. I'll see about fixing powerpc.

However, we can do better by considering the value to be stored in CR0...

Try this and see if it generates the sort of code you want. Untested.

... actually, this one. There's no reason to differentiate between strong and weak compare-and-swap when computing boolval.

Looks very similar to what I wrote, except

+  else if (reg_overlap_mentioned_p (retval, oldval))
+    oldval = copy_to_reg (oldval);

I missed this bit

-  x = gen_rtx_NE (VOIDmode, x, oldval);
-  x = rs6000_generate_compare (x, mode);
+  cond = gen_reg_rtx (CCmode);
+  x = gen_rtx_COMPARE (CCmode, x, oldval);
+  emit_insn (gen_rtx_SET (VOIDmode, cond, x));
+
+  x = gen_rtx_NE (VOIDmode, cond, const0_rtx);

and instead pulled cond out of the return from rs6000_generate_compare. Results look good.

-- 
Alan Modra
Australia Development Lab, IBM
Re: Use atomics in libgomp mutex
On 11/27/2011 02:57 PM, Alan Modra wrote:
This is the mutex part. Depends on http://gcc.gnu.org/ml/gcc-patches/2011-11/msg02454.html for MEMMODEL_* values.

	* config/linux/mutex.h: Use atomic rather than sync builtins.
	* config/linux/mutex.c: Likewise.  Comment.  Use -1 for waiting state.
	* config/linux/omp-lock.h: Comment fix.
	* config/linux/arm/mutex.h: Delete.
	* config/linux/powerpc/mutex.h: Delete.
	* config/linux/ia64/mutex.h: Delete.
	* config/linux/mips/mutex.h: Delete.

Looks good modulo _4/_n and using the success return of __atomic_compare_exchange.

r~
Re: Fix PR51298, libgomp barrier failure
On 11/28/2011 06:02 AM, Alan Modra wrote:

-      unsigned int ret = bar->generation & ~3;
-      /* Do we need any barrier here or is __sync_add_and_fetch acting
-	 as the needed LoadLoad barrier already?  */
-      ret += __sync_add_and_fetch (&bar->awaited, -1) == 0;
+      unsigned int ret = __atomic_load_4 (&bar->generation, MEMMODEL_ACQUIRE) & ~3;
+      ret += __atomic_add_fetch (&bar->awaited, -1, MEMMODEL_ACQ_REL) == 0;

Given that the read from bar->generation is ACQ, we don't need a duplicate barrier from the REL on the atomic add. I believe both can be MEMMODEL_ACQUIRE, both in order to force the ordering of these two memops, and to force these to happen before anything subsequent.

The s/_4/_n/ change needs doing. Otherwise ok.

r~
[libgomp] Define memmodel
This fixes the ordering issues with a couple of Alan's pending patches, and puts the enumeration where the entire library can use it. Committed.

r~

	* libgomp.h (enum memmodel): New.

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 9cf9d32..2c9c071 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -45,6 +45,17 @@
 # pragma GCC visibility push(hidden)
 #endif
 
+/* If we were a C++ library, we'd get this from <std/atomic>.  */
+enum memmodel
+{
+  MEMMODEL_RELAXED = 0,
+  MEMMODEL_CONSUME = 1,
+  MEMMODEL_ACQUIRE = 2,
+  MEMMODEL_RELEASE = 3,
+  MEMMODEL_ACQ_REL = 4,
+  MEMMODEL_SEQ_CST = 5
+};
+
 #include "sem.h"
 #include "mutex.h"
 #include "bar.h"
Re: [PATCH, rs6000] Reorganize powerpc builtins and enable target pragma/attribute to enable/disable builtins
On Mon, Nov 7, 2011 at 5:32 PM, Michael Meissner meiss...@linux.vnet.ibm.com wrote:

This patch rewrites the way builtins are handled in the rs6000 port so that, like the x86, when you do #pragma GCC target or attribute((target(...))) it enables or disables the builtins based on the current target attributes. It also defines or undefines the various hardware macros (__ARCH_xxx, __VSX__, and __ALTIVEC__), and enables/disables the vector keyword.

I've done bootstrap builds with make check, and there is one minor regression, which I will fix as a bug (gcc.target/powerpc/recip-5.c fails due to not vectorizing the recip builtin) after this patch goes in. I built a linux paired compiler and verified that the paired builtins are being handled correctly.

I have some more ideas for cleanup (moving the large table in rs6000-c.c into rs6000-builtins.def, and eliminating the various bdesc arrays), but for now, it works, and hopefully is a bit cleaner than before. Is this ok to check in?

[gcc]
2011-11-07  Michael Meissner  meiss...@tlinux.vnet.ibm.com

	* config/rs6000/rs6000-builtins.def: Completely rewrite builtin
	handling to centralize more of the builtins in this file.  Change
	some builtin enumerations to be more consistent.  Use a new mask
	to hold the current builtins, including SPE and PAIRED builtins
	which no longer are set via target_flags masks.  Add
	-mdebug=builtin debug support.  For power machines, define all
	Altivec and VSX builtins when the compiler starts, but don't
	allow the use of a builtin unless the appropriate switch is used,
	or #pragma GCC target is used to change the options.  If the user
	uses #pragma GCC target, update the appropriate hardware macros.
	* common/config/rs6000/rs6000-common.c (rs6000_handle_option): Ditto.
	* config/rs6000/rs6000.opt (rs6000_builtin_mask): Ditto.
	* config/rs6000/rs6000-c.c (rs6000_macro_to_expand): Ditto.
	(rs6000_define_or_undefine_macro): Ditto.
	(rs6000_target_modify_macros): Ditto.
	(rs6000_cpu_cpp_builtins): Ditto.
	(altivec_overloaded_builtins): Ditto.
	(altivec_build_resolved_builtin): Ditto.
	* config/rs6000/rs6000.c (rs6000_target_modify_macros_ptr): Ditto.
	(rs6000_builtin_info): Ditto.
	(def_builtin): Ditto.
	(enable_mask_for_builtins): Ditto.
	(DEBUG_FMT_X): Ditto.
	(rs6000_debug_reg_global): Ditto.
	(rs6000_builtin_mask_calculate): Ditto.
	(rs6000_option_override_internal): Ditto.
	(rs6000_builtin_conversion): Ditto.
	(rs6000_builtin_vectorized_function): Ditto.
	(bdesc_3arg): Ditto.
	(bdesc_dst): Ditto.
	(bdesc_2arg): Ditto.
	(builtin_description_predicates): Ditto.
	(bdesc_altivec_preds): Ditto.
	(bdesc_spe_predicates): Ditto.
	(bdesc_spe_evsel): Ditto.
	(bdesc_paired_preds): Ditto.
	(bdesc_abs): Ditto.
	(bdesc_1arg): Ditto.
	(rs6000_overloaded_builtin_p): Ditto.
	(rs6000_expand_unop_builtin): Ditto.
	(bdesc_2arg_spe): Ditto.
	(spe_expand_builtin): Ditto.
	(rs6000_invalid_builtin): Ditto.
	(rs6000_expand_builtin): Ditto.
	(rs6000_init_builtins): Ditto.
	(spe_init_builtins): Ditto.
	(paired_init_builtins): Ditto.
	(altivec_init_builtins): Ditto.
	(builtin_function_type): Ditto.
	(rs6000_common_init_builtins): Ditto.
	(rs6000_builtin_reciprocal): Ditto.
	(rs6000_builtin_mask_names): Ditto.
	(rs6000_pragma_target_parse): Ditto.
	(rs6000_function_specific_print): Ditto.
	* config/rs6000/rs6000.h (MASK_DEBUG_BUILTIN): Ditto.
	(MASK_DEBUG_ALL): Ditto.
	(TARGET_DEBUG_BUILTIN): Ditto.
	(TARGET_EXTRA_BUILTINS): Ditto.
	(REGISTER_TARGET_PRAGMAS): Ditto.
	(enum rs6000_btc): Ditto.
	(RS6000_BTC_*): Ditto.
	(RS6000_BTM_*): Ditto.
	(enum rs6000_builtins): Ditto.
	* config/rs6000/rs6000-protos.h (rs6000_overloaded_builtin_p): Ditto.
	(rs6000_target_modify_macros): Ditto.
	(rs6000_target_modify_macros_ptr): Ditto.

[gcc/testsuite]
2011-11-07  Michael Meissner  meiss...@linux.vnet.ibm.com

	* testsuite/gcc.target/powerpc/ppc-target-4.c: New test for
	builtin functions being defined and undefined based on the
	#pragma GCC target settings.

This is okay.

Thanks, David
Re: [build] Cleanup rs6000/t-ppccomm configurations (PR other/51022)
On Mon, Nov 28, 2011 at 05:15:08PM +0100, Rainer Orth wrote:
Alan Modra amo...@gmail.com writes:
On Sun, Nov 27, 2011 at 08:36:38PM -0500, David Edelsohn wrote:
I was hoping that other targets affected by the patch would test it as well.

powerpc-linux and powerpc64-linux look good.

Thanks for the confirmation. However, Andreas Tobler pointed out in private mail that I'd missed referencing rs6000/t-linux64 during the libgcc migration. Strange that this seems to have no ill effect.

Hmm, yeah, we should have that back in. Lack of t-linux64 means we get all the soft-fp functions that we never use, bloating libgcc. Also, -mno-minimal-toc gives better code. Even when -mminimal-toc wouldn't be in force anyway, the option selects -mcmodel=small.

-- 
Alan Modra
Australia Development Lab, IBM
Re: [PATCH] Improve EXPAND_SUM handling in expand_expr_addr_expr* (PR middle-end/50074)
On 11/25/2011 12:58 PM, Jakub Jelinek wrote: PR middle-end/50074 * expr.c (expand_expr_addr_expr_1): Don't call force_operand for EXPAND_SUM modifier. Ok. r~
Re: Adjust omp-low test for alignment
On Sat, 26 Nov 2011, Richard Henderson wrote:
The m68k-linux failure for the various omp atomic tests is due to the fact that BIGGEST_ALIGNMENT is 16 bits on that platform. I think it's pretty reasonable to assume that if something is aligned to BIGGEST_ALIGNMENT, then it can be considered aligned.

BIGGEST_ALIGNMENT means aligned enough for normal access, but not necessarily for atomic access. This particular fix wouldn't do it for CRIS, for example, where BIGGEST_ALIGNMENT is 8 (bits), but an atomic access (inside the ll/sc sequence) such as for a futex requires not straddling a page boundary, i.e. natural alignment. Not that OMP support is imminent or critical for cris-linux or anything, but can we have a new macro?

brgds, H-P
Re: [committed] Fix failure of Octeon BADDU tests
On Sun, Nov 27, 2011 at 2:18 AM, Richard Sandiford rdsandif...@googlemail.com wrote: This patch fixes the MIPS octeon-baddu-1.c failures. As Andrew says in the PR, the problem is that the combined instruction seems more expensive than the separate ones. I was going to submit a similar patch which I created internally at Cavium. I have a few more insn_cost patches in my queue also. Thanks, Andrew Pinski
Re: Fix PR51298, libgomp barrier failure
On Mon, Nov 28, 2011 at 05:42:15PM -0800, Richard Henderson wrote: On 11/28/2011 06:02 AM, Alan Modra wrote: - unsigned int ret = bar->generation & ~3; - /* Do we need any barrier here or is __sync_add_and_fetch acting - as the needed LoadLoad barrier already? */ - ret += __sync_add_and_fetch (&bar->awaited, -1) == 0; + unsigned int ret = __atomic_load_4 (&bar->generation, MEMMODEL_ACQUIRE) & ~3; + ret += __atomic_add_fetch (&bar->awaited, -1, MEMMODEL_ACQ_REL) == 0; Given that the read from bar->generation is ACQ, we don't need a duplicate barrier from the REL on the atomic add. I believe both can be MEMMODEL_ACQUIRE, both to force the ordering of these two memops and to force them to happen before anything subsequent. I tried with MEMMODEL_ACQUIRE and ran force-parallel-6.exe, the test that seems most sensitive to barrier problems, many times, and it hangs occasionally in futex_wait called via gomp_team_barrier_wait_end. I believe that threads can't be allowed to exit from gomp_{,team_}barrier_wait without hitting a release barrier, and perhaps from gomp_barrier_wait_last too. gomp_barrier_wait_start is a convenient point to insert the barrier, and a minimal change from the old code using __sync_add_and_fetch. I can add a comment. ;-) -- Alan Modra Australia Development Lab, IBM
[patch] Fix PR tree-optimization/51301
Hi, In the vectorizer's over-widening pattern recognition, the last statement is expected to be a type demotion, but the check for that was incomplete. We now check that the resulting type is not bigger than the original type of the computation. Bootstrapped and tested on powerpc64-suse-linux, tested on arm-linux-gnueabi (cross). Committed. Ira ChangeLog: PR tree-optimization/51301 * tree-vect-patterns.c (vect_recog_over_widening_pattern): Check that the last statement doesn't convert to a bigger type than the original type of the computation. testsuite/ChangeLog: PR tree-optimization/51301 * gcc.dg/vect/pr51301.c: New test. Index: tree-vect-patterns.c === --- tree-vect-patterns.c (revision 181796) +++ tree-vect-patterns.c (working copy) @@ -1088,6 +1088,7 @@ vect_recog_over_widening_pattern (VEC (gimple, hea tree var = NULL_TREE, new_type = NULL_TREE, tmp, new_oprnd; bool first; struct loop *loop = (gimple_bb (stmt))->loop_father; + tree type = NULL; first = true; while (1) @@ -1150,6 +1151,7 @@ vect_recog_over_widening_pattern (VEC (gimple, hea print_gimple_stmt (vect_dump, pattern_stmt, 0, TDF_SLIM); } + type = gimple_expr_type (stmt); prev_stmt = stmt; stmt = use_stmt; @@ -1165,9 +1167,11 @@ vect_recog_over_widening_pattern (VEC (gimple, hea { use_lhs = gimple_assign_lhs (use_stmt); use_type = TREE_TYPE (use_lhs); - /* Support only type promotion or signedess change. */ + /* Support only type promotion or signedess change. Check that USE_TYPE +is not bigger than the original type. 
*/ if (!INTEGRAL_TYPE_P (use_type) - || TYPE_PRECISION (new_type) > TYPE_PRECISION (use_type) + || TYPE_PRECISION (new_type) > TYPE_PRECISION (use_type) + || TYPE_PRECISION (type) > TYPE_PRECISION (use_type)) return NULL; if (TYPE_UNSIGNED (new_type) != TYPE_UNSIGNED (use_type) Index: testsuite/gcc.dg/vect/pr51301.c === --- testsuite/gcc.dg/vect/pr51301.c (revision 0) +++ testsuite/gcc.dg/vect/pr51301.c (revision 0) @@ -0,0 +1,15 @@ +/* { dg-do compile } */ + +typedef signed char int8_t; +typedef signed long long int64_t; +int64_t +f0a (int8_t * __restrict__ arg1) +{ + int idx; + int64_t result = 0; + for (idx = 0; idx < 416; idx += 1) +result += arg1[idx] << (arg1[idx] == arg1[idx]); + return result; +} + +/* { dg-final { cleanup-tree-dump "vect" } } */
Go patch committed: Implement predefined error interface
The Go language recently added a new predefined type, named error, which is an interface type with a single method, Error. This patch, from Rémy Oudompheng, implements this type for gccgo. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline. Ian diff -r 9f8ff21f7771 go/gogo.cc --- a/go/gogo.cc Mon Nov 28 22:51:39 2011 -0800 +++ b/go/gogo.cc Mon Nov 28 22:52:22 2011 -0800 @@ -97,6 +97,18 @@ this->add_named_type(Type::make_named_string_type()); + // error is interface { Error() string }. + { +Typed_identifier_list *methods = new Typed_identifier_list; +Typed_identifier_list *results = new Typed_identifier_list; +results->push_back(Typed_identifier("", Type::lookup_string_type(), loc)); +Type *method_type = Type::make_function_type(NULL, NULL, results, loc); +methods->push_back(Typed_identifier("Error", method_type, loc)); +Type *error_iface = Type::make_interface_type(methods, loc); +Named_type *error_type = Named_object::make_type("error", NULL, error_iface, loc)->type_value(); +this->add_named_type(error_type); + } + this->globals_->add_constant(Typed_identifier("true", Type::make_boolean_type(), loc),
Fwd: Re: hashtable cleanup + new testsuite files
Attached patch applied. 2011-11-29 François Dumont fdum...@gcc.gnu.org * include/bits/hashtable.h (_Hashtable::_M_rehash): Remove code useless now that the hashtable implementation puts the hash code in cache if the hash functor throws. * testsuite/23_containers/unordered_set/erase/1.cc: Enhance test by checking also distance between begin and end iterators to validate underlying data model. * testsuite/23_containers/unordered_multiset/erase/1.cc: Likewise. * testsuite/23_containers/unordered_map/erase/1.cc: Likewise. * testsuite/23_containers/unordered_multimap/erase/1.cc: Likewise. * testsuite/23_containers/unordered_multiset/erase/2.cc: New. * testsuite/23_containers/unordered_multimap/erase/2.cc: New. Regards On 11/28/2011 10:54 PM, Paolo Carlini wrote: On 11/28/2011 09:29 PM, François Dumont wrote: 2011-11-28 François Dumont fdum...@gcc.gnu.org * include/bits/hashtable.h (_Hashtable::_M_rehash): Remove code useless now that the hashtable implementation puts the hash code in cache if the hash functor throws. * testsuite/23_containers/unordered_set/erase/1.cc: Enhance test by checking also distance between begin and end iterators to validate underlying data model. * testsuite/23_containers/unordered_multiset/erase/1.cc: Likewise. * testsuite/23_containers/unordered_map/erase/1.cc: Likewise. * testsuite/23_containers/unordered_multimap/erase/1.cc: Likewise. * testsuite/23_containers/unordered_multiset/erase/2.cc: New. * testsuite/23_containers/unordered_multimap/erase/2.cc: New. Tested under linux x86_64. Ok to commit to trunk? Ok, thanks. Paolo. 
Index: include/bits/hashtable.h === --- include/bits/hashtable.h (revision 181729) +++ include/bits/hashtable.h (working copy) @@ -233,11 +233,6 @@ void _M_deallocate_node(_Node* __n); - // Deallocate all nodes contained in the bucket array, buckets' nodes - // are not linked to each other - void - _M_deallocate_nodes(_Bucket*, size_type); - // Deallocate the linked list of nodes pointed to by __n void _M_deallocate_nodes(_Node* __n); @@ -591,19 +586,6 @@ void _Hashtable<_Key, _Value, _Allocator, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, __chc, __cit, __uk>:: -_M_deallocate_nodes(_Bucket* __buckets, size_type __n) -{ - for (size_type __i = 0; __i != __n; ++__i) - _M_deallocate_nodes(__buckets[__i]); -} - - template<typename _Key, typename _Value, - typename _Allocator, typename _ExtractKey, typename _Equal, - typename _H1, typename _H2, typename _Hash, typename _RehashPolicy, - bool __chc, bool __cit, bool __uk> -void -_Hashtable<_Key, _Value, _Allocator, _ExtractKey, _Equal, - _H1, _H2, _Hash, _RehashPolicy, __chc, __cit, __uk>:: _M_deallocate_nodes(_Node* __n) { while (__n) @@ -1542,11 +1524,10 @@ _H1, _H2, _Hash, _RehashPolicy, __chc, __cit, __uk>:: _M_rehash(size_type __n, const _RehashPolicyState& __state) { - _Bucket* __new_buckets = nullptr; - _Node* __p = _M_buckets[_M_begin_bucket_index]; __try { - _Bucket* __new_buckets = _M_allocate_buckets(__n); + _Bucket* __new_buckets = _M_allocate_buckets(__n); + _Node* __p = _M_buckets[_M_begin_bucket_index]; // First loop to store each node in its new bucket while (__p) { @@ -1591,24 +1572,9 @@ } __catch(...) { - if (__new_buckets) - { - // A failure here means that a hash function threw an exception. - // We can't restore the previous state without calling the hash - // function again, so the only sensible recovery is to delete - // everything. 
- _M_deallocate_nodes(__new_buckets, __n); - _M_deallocate_buckets(__new_buckets, __n); - _M_deallocate_nodes(__p); - __builtin_memset(_M_buckets, 0, sizeof(_Bucket) * _M_bucket_count); - _M_element_count = 0; - _M_begin_bucket_index = _M_bucket_count; - _M_rehash_policy._M_reset(_RehashPolicyState()); - } - else - // A failure here means that buckets allocation failed. We only - // have to restore hash policy previous state. - _M_rehash_policy._M_reset(__state); + // A failure here means that buckets allocation failed. We only + // have to restore hash policy previous state. + _M_rehash_policy._M_reset(__state); __throw_exception_again; } } Index: testsuite/23_containers/unordered_map/erase/1.cc === --- testsuite/23_containers/unordered_map/erase/1.cc (revision 181729) +++ testsuite/23_containers/unordered_map/erase/1.cc (working copy) @@ -23,6 +23,18 @@ #include <string> #include <testsuite_hooks.h> +namespace +{ +