Go patch committed: Fix a few compiler crashes
This patch to the Go frontend fixes a few cases where the compiler was crashing on invalid code. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline and 4.7 branch. Ian diff -r cc46c1bb0278 go/expressions.cc --- a/go/expressions.cc Fri Apr 27 17:28:13 2012 -0700 +++ b/go/expressions.cc Fri Apr 27 21:51:48 2012 -0700 @@ -9225,7 +9225,7 @@ ref->set_is_lvalue(); tree temp_tree = ref->get_tree(context); if (temp_tree == error_mark_node) - continue; + return error_mark_node; tree val_tree = build3_loc(loc.gcc_location(), COMPONENT_REF, TREE_TYPE(field), call_tree, field, NULL_TREE); diff -r cc46c1bb0278 go/types.cc --- a/go/types.cc Fri Apr 27 17:28:13 2012 -0700 +++ b/go/types.cc Fri Apr 27 21:51:48 2012 -0700 @@ -5450,6 +5450,11 @@ mpz_t val; if (this->length_->numeric_constant_value(&nc) && nc.to_int(&val)) { + if (mpz_sgn(val) < 0) + { + this->length_tree_ = error_mark_node; + return this->length_tree_; + } Type* t = nc.type(); if (t == NULL) t = Type::lookup_integer_type("int"); @@ -6551,7 +6556,11 @@ Interface_type::is_identical(const Interface_type* t, bool errors_are_identical) const { - go_assert(this->methods_are_finalized_ && t->methods_are_finalized_); + // If methods have not been finalized, then we are asking whether + // func redeclarations are the same. This is an error, so for + // simplicity we say they are never the same. + if (!this->methods_are_finalized_ || !t->methods_are_finalized_) +return false; // We require the same methods with the same types. The methods // have already been sorted.
Re: [RFH / Patch] PR 51222
On 04/27/2012 09:42 PM, Paolo Carlini wrote: In particular about the exact meaning of the FIXME? More generally, is the issue here matter of compile-time optimization? Like we want to save work when we actually *know* the type even in template context? (then looks like type_dependent_expression_p isn't really what we want...) It's more of a mangling issue. If the type is instantiation-dependent, we should mangle it as written; if not, we mangle the real type. Currently we don't test for instantiation-dependence, so we get this wrong; your change would go too far in the other direction. Jason
Re: [PATCH, PR38785] Throttle PRE at -O3
On 27/04/2012, at 1:17 AM, Richard Guenther wrote: > On Wed, Apr 25, 2012 at 8:06 AM, Maxim Kuvyrkov > wrote: > ... > +ppre_n_insert_for_speed_p (pre_expr expr, basic_block block, > + unsigned int inserts_needed) > +{ > + /* The more expensive EXPR is, the more we should be prepared to insert > + in the predecessors of BLOCK to make EXPR fully redundant. > + For now, only recognize AND, OR, XOR, PLUS and MINUS of a multiple-use > + SSA_NAME with a constant as cheap. */ > > the function implementation is odd. cost is always 1 when used, and > both memory loads and calls are always cheap, but for example casts are not? > Isn't > > return EDGE_COUNT (block->preds) * cost >= inserts_needed; > > always true? Or is inserts_needed not what it suggests? It /almost/ is what it suggests. To account for the fact that inserts will not benefit some of the paths (50% in the initial version of the patch, and a more precise estimate in a later version) we tell (i.e., lie to) ppre_n_insert_for_speed_p that we need a greater number of inserts (e.g., 2x the real number) than we will actually perform. > > + /* Insert only if we can remove a later expression on a path > +that we want to optimize for speed. > +The phi node that we will be inserting in BLOCK is not free, > +and inserting it for the sake of !optimize_for_speed > successor > +may cause regressions on the speed path. */ > + FOR_EACH_EDGE (succ, ei, block->succs) > + { > + if (bitmap_set_contains_value (PA_IN (succ->dest), val)) > + { > + if (optimize_edge_for_speed_p (succ)) > + do_insertion = true; > + > + ++pa_succs; > + } > + } > > the change up to here looks good to me - can you factor out the command-line > switch plus this optimize_edge_for_speed_p test into a separate patch > (hereby approved)? Attached is what I checked in after retesting on a fresh mainline. This part of the patch by itself turns out to fix the bitmnp01 regression in what seems to be a reliable way. Therefore, I'll mark PR38785 as fixed.
Thank you, -- Maxim Kuvyrkov CodeSourcery / Mentor Graphics Attachments: pr38785-2.ChangeLog, pr38785-2.patch
[RFH / Patch] PR 51222
Hi, I'm having a look at this PR filed by Daniel, which is again about SFINAE for combined delete / new expressions; e.g., a simple example could be (Daniel provided dozens) template auto g(int) -> char; template auto g(...) -> char(&)[2]; static_assert(sizeof(g(0)) == 2, "Ouch"); We handle all such cases incorrectly; all the static_asserts for the testcases provided by Daniel fire, for a simple reason: in finish_decltype_type the check: /* FIXME instantiation-dependent */ if (type_dependent_expression_p (expr) /* In a template, a COMPONENT_REF has an IDENTIFIER_NODE for op1 even if it isn't dependent, so that we can check access control at instantiation time, so defer the decltype as well (PR 42277). */ || (id_expression_or_member_access_p && processing_template_decl && TREE_CODE (expr) == COMPONENT_REF)) { is false, thus we don't produce a DECLTYPE_TYPE and we don't tsubst into it. Nothing can possibly work... Now I did the trivial experiment of replacing the above with the usual: if (processing_template_decl) { and *the whole* testsuite passes (all the tests provided by Daniel included) with the only exception of nullptr26.C, ie: template void f(T, U); template void f(T, decltype(nullptr)); int main() { f(1, nullptr); } which we reject as ambiguous (and of course it would be easy to handle it specially; for sure std::nullptr_t *is* special!). Thus, I suspect the fix for this issue could turn out to be pretty simple: any help? In particular about the exact meaning of the FIXME? More generally, is the issue here a matter of compile-time optimization? Like we want to save work when we actually *know* the type even in a template context? (then it looks like type_dependent_expression_p isn't really what we want...) Thanks! Paolo.
Go patch committed: Use less memory for some array/slice literals
This patch to the Go frontend changes the representation of array/slice literals to use less memory when the literal uses index keys. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline and 4.7 branch. Ian diff -r ebdbe2ad3ef6 go/expressions.cc --- a/go/expressions.cc Fri Apr 27 09:34:39 2012 -0700 +++ b/go/expressions.cc Fri Apr 27 17:24:14 2012 -0700 @@ -6,6 +6,8 @@ #include "go-system.h" +#include + #include #ifndef ENABLE_BUILD_WITH_CXX @@ -11392,11 +11394,12 @@ { protected: Array_construction_expression(Expression_classification classification, -Type* type, Expression_list* vals, -Location location) +Type* type, +const std::vector* indexes, +Expression_list* vals, Location location) : Expression(classification, location), - type_(type), vals_(vals) - { } + type_(type), indexes_(indexes), vals_(vals) + { go_assert(indexes == NULL || indexes->size() == vals->size()); } public: // Return whether this is a constant initializer. @@ -11425,6 +11428,11 @@ void do_export(Export*) const; + // The indexes. + const std::vector* + indexes() + { return this->indexes_; } + // The list of values. Expression_list* vals() @@ -11440,7 +11448,10 @@ private: // The type of the array to construct. Type* type_; - // The list of values. + // The list of indexes into the array, one for each value. This may + // be NULL, in which case the indexes start at zero and increment. + const std::vector* indexes_; + // The list of values. This may be NULL if there are no values. Expression_list* vals_; }; @@ -11523,18 +11534,6 @@ this->set_is_error(); } } - - Expression* length = at->length(); - Numeric_constant nc; - unsigned long val; - if (length != NULL - && !length->is_error_expression() - && length->numeric_constant_value(&nc) - && nc.to_unsigned_long(&val) == Numeric_constant::NC_UL_VALID) -{ - if (this->vals_->size() > val) - this->report_error(_("too many elements in composite literal")); -} } // Get a constructor tree for the array values. 
@@ -11552,12 +11551,22 @@ if (this->vals_ != NULL) { size_t i = 0; + std::vector::const_iterator pi; + if (this->indexes_ != NULL) + pi = this->indexes_->begin(); for (Expression_list::const_iterator pv = this->vals_->begin(); pv != this->vals_->end(); ++pv, ++i) { + if (this->indexes_ != NULL) + go_assert(pi != this->indexes_->end()); constructor_elt* elt = VEC_quick_push(constructor_elt, values, NULL); - elt->index = size_int(i); + + if (this->indexes_ == NULL) + elt->index = size_int(i); + else + elt->index = size_int(*pi); + if (*pv == NULL) { Gogo* gogo = context->gogo(); @@ -11578,7 +11587,11 @@ return error_mark_node; if (!TREE_CONSTANT(elt->value)) is_constant = false; - } + if (this->indexes_ != NULL) + ++pi; + } + if (this->indexes_ != NULL) + go_assert(pi == this->indexes_->end()); } tree ret = build_constructor(type_tree, values); @@ -11596,13 +11609,28 @@ exp->write_type(this->type_); if (this->vals_ != NULL) { + std::vector::const_iterator pi; + if (this->indexes_ != NULL) + pi = this->indexes_->begin(); for (Expression_list::const_iterator pv = this->vals_->begin(); pv != this->vals_->end(); ++pv) { exp->write_c_string(", "); + + if (this->indexes_ != NULL) + { + char buf[100]; + snprintf(buf, sizeof buf, "%lu", *pi); + exp->write_c_string(buf); + exp->write_c_string(":"); + } + if (*pv != NULL) (*pv)->export_expression(exp); + + if (this->indexes_ != NULL) + ++pi; } } exp->write_c_string(")"); @@ -11614,8 +11642,7 @@ Array_construction_expression::do_dump_expression( Ast_dump_context* ast_dump_context) const { - Expression* length = this->type_->array_type() != NULL ? 
- this->type_->array_type()->length() : NULL; + Expression* length = this->type_->array_type()->length(); ast_dump_context->ostream() << "[" ; if (length != NULL) @@ -11625,7 +11652,22 @@ ast_dump_context->ostream() << "]" ; ast_dump_context->dump_type(this->type_); ast_dump_context->ostream() << "{" ; - ast_dump_context->dump_expression_list(this->vals_); + if (this->indexes_ == NULL) +ast_dump_context->dump_expression_list(this->vals_); + else +{ + Expression_list::const_iterator pv = this->vals_->begin(); + for (std::vector::const_iterator pi = + this->indexes_->begin(); + pi != this->indexes_->end(); + ++pi, ++pv) + { + if (pi != this->indexes_->begin()) + ast_dump_context->ostream() << ", "; + ast_dump_context->ostream() << *pi << ':'; + ast_dump_context->dump_expression(*pv); + } +} ast_dump_context->ostream() << "}" ; } @@ -11636,20 +11678,19 @@ public Array_construction_expression { public: - Fixed_array_constru
Re: [PATCH, diagnostics] Add -Wvarargs option
Dodji Seketeli writes: > Tested on x86_64-unknown-linux-gnu against trunk. Bootstrap for all > languages is still underway. > > gcc/c-family/ > > * c.opt (Wvarargs): Define new option. > > gcc/ > builtins.c (fold_builtin_next_arg): Use OPT_Wvarargs as an > argument for the various warning_at calls. > > gcc/doc/ > > * invoke.texi: Update the documentation. > > gcc/testsuite/ > > * c-c++-common/Wvarargs.c: New test case. > * c-c++-common/Wvarargs-2.c: Likewise. FWIW, this completed bootstrap and testing fine. -- Dodji
Re: [PATCH 11/13] Fix va_start related location
Dodji Seketeli writes: > Tested on x86_64-unknown-linux-gnu against trunk. Bootstrap is still > running ... > > * builtins.c (fold_builtin_next_arg): Unwinds to the first > location in real source code. > --- > gcc/builtins.c | 23 +++ > 1 files changed, 19 insertions(+), 4 deletions(-) FWIW, this completed bootstrap and testing fine. -- Dodji
[ARM] Add atomic_loaddi pattern
We can perform a single-copy atomic load with an ldrexd insn. If the load is all we care about, we need not pair this with a strexd. Ok? r~ * config/arm/arm.md (UNSPEC_LL): New. * config/arm/sync.md (atomic_loaddi, atomic_loaddi_1): New. diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 79eff0e..75d2565 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -117,6 +117,7 @@ ; that. UNSPEC_UNALIGNED_STORE ; Same for str/strh. UNSPEC_PIC_UNIFIED; Create a common pic addressing form. + UNSPEC_LL; Represent an unpaired load-register-exclusive. ]) ;; UNSPEC_VOLATILE Usage: diff --git a/gcc/config/arm/sync.md b/gcc/config/arm/sync.md index 03838f5..de2da3b 100644 --- a/gcc/config/arm/sync.md +++ b/gcc/config/arm/sync.md @@ -65,6 +65,40 @@ (set_attr "conds" "unconditional") (set_attr "predicable" "no")]) +;; Note that ldrd and vldr are *not* guaranteed to be single-copy atomic, +;; even for a 64-bit aligned address. Instead we use a ldrexd unparied +;; with a store. +(define_expand "atomic_loaddi" + [(match_operand:DI 0 "s_register_operand") ;; val out + (match_operand:DI 1 "mem_noofs_operand");; memory + (match_operand:SI 2 "const_int_operand")] ;; model + "TARGET_HAVE_LDREXD && ARM_DOUBLEWORD_ALIGN" +{ + enum memmodel model = (enum memmodel) INTVAL (operands[2]); + expand_mem_thread_fence (model); + emit_insn (gen_atomic_loaddi_1 (operands[0], operands[1])); + if (model == MEMMODEL_SEQ_CST) +expand_mem_thread_fence (model); + DONE; +}) + +(define_insn "atomic_loaddi_1" + [(set (match_operand:DI 0 "s_register_operand" "=r") + (unspec:DI [(match_operand:DI 1 "mem_noofs_operand" "Ua")] + UNSPEC_LL))] + "TARGET_HAVE_LDREXD && ARM_DOUBLEWORD_ALIGN" + { +rtx target = operands[0]; +/* The restrictions on target registers in ARM mode are that the two + registers are consecutive and the first one is even; Thumb is + actually more flexible, but DI should give us this anyway. + Note that the 1st register always gets the lowest word in memory. 
*/ +gcc_assert ((REGNO (target) & 1) == 0); +operands[2] = gen_rtx_REG (SImode, REGNO (target) + 1); +return "ldrexd%?\t%0, %2, %C1"; + } + [(set_attr "predicable" "yes")]) + (define_expand "atomic_compare_and_swap" [(match_operand:SI 0 "s_register_operand" "");; bool out (match_operand:QHSD 1 "s_register_operand" "") ;; val out
Re: [PATCH] Proper use of decl_function_context in dwarf2out.c
On 04/27/2012 07:56 AM, Martin Jambor wrote: PR lto/53138 * dwarf2out.c (dwarf2out_decl): Only lookup die representing context of a variable when the context is a function. OK. Jason
[committed] Fix section conflict compiling rtld.c
The attached change fixes a problem compiling the linux dynamic loader for the PA target. Putting function labels (plabels) in the constant pool results in a section flag conflict compiling rtld.c. Other targets don't do this, and testing indicates that it is not necessary. Tested on hppa-unknown-linux-gnu, hppa2.0w-hp-hpux11.11 and hppa64-hp-hpux11.11 with no regressions. Committed to trunk. Dave -- J. David Anglin dave.ang...@nrc-cnrc.gc.ca National Research Council of Canada (613) 990-0752 (FAX: 952-6602) 2012-04-27 John David Anglin PR target/52999 * config/pa/pa.c (pa_legitimate_constant_p): Don't put function labels in constant pool. Index: config/pa/pa.c === --- config/pa/pa.c (revision 186918) +++ config/pa/pa.c (working copy) @@ -10332,9 +10332,6 @@ && !pa_cint_ok_for_move (INTVAL (x))) return false; - if (function_label_operand (x, mode)) -return false; - return true; }
Re: [PATCH] teach phi-opt to produce -(a COND b)
> This caused: > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53144 Looks like a PPRE bug. Paolo
[PATCH] libatomic, v2
This tree is also available from git://repo.or.cz/gcc/rth.git rth/libatomic Changes since v1: * I believe I've addressed all of Torvald's feedback especially re barrier problems. I have not changed the lock hash function, as there were no concrete suggestions, and certainly that ought not be an issue for correctness. * I've increased the set of objects that can be handled lock-free in the generic routines. If there is a supported aligned integer that overlaps the object, we'll use compare-exchange. * "Tuning", aka enabling multiple paths to elide memory barriers, for ia64 and ppc targets. Tested on armv4, armv7, x86_64, ia64, sparc, and ppc64 linux. I think the library is ready for merge back to mainline, but I will wait until at least Monday for feedback and proximity to my desk to deal with potential fallout. r~ Attachment: d-libatomic-2.gz
Re: dwarf2out.c: For DWARF 4+, output DW_AT_high_pc as constant offset.
On Fri, 2012-04-27 at 15:43 +0200, Jakub Jelinek wrote: > On Fri, Apr 27, 2012 at 03:36:56PM +0200, Mark Wielaard wrote: > > But even without this, I think the patch is worth it just to get rid of > > all the relocations necessary otherwise. > > IMHO we should defer applying this by a few months, given that GDB support > is only being added these days and -gdwarf-4 is now the default. > Applying it in August/September when there is already GDB version with > the support would be better. OK, I'll try to remember and ping as soon as a new GDB release is out with the patch in. > > + attr.dw_attr = DW_AT_high_pc; > > + if (dwarf_version < 3) > > Shouldn't this be < 4? I think DWARF 3 doesn't have constant class > DW_AT_high_pc. Oops, yes, thanks for spotting. Cheers, Mark
Re: set the correct block info for the call stmt in fnsplit (issue6111050)
On Fri, Apr 27, 2012 at 5:06 AM, Jan Hubicka wrote: >> On Fri, Apr 27, 2012 at 12:50 PM, Eric Botcazou >> wrote: >> >> We do not depend on the block structure any more when dealing with >> >> stack layout for variables in GCC 4.7.0 and above. I am not saying >> >> your patch is incorrect or not needed. Just it will not have an >> >> effect on variable stack layout. >> > >> > It might be worth backporting to the 4.6 branch though, these stack layout >> > issues are very nasty. >> >> What about not setting a BLOCK on the call stmt? That said, I can't see > > I recall that not seetting the block did lead to ICE... > >> how outlining a SESE region that starts at the beginning of the function >> is not conservatively using the outermost BLOCK ... so, is the bug not >> elsewhere? > > ... so I made the exactly same conclussion. > The problem is with stack vars allocated in header that survive till the > tail part? A SESE region does not necessarily start at the beginning of the function; it can be anywhere. In the example I attached, it is at the end of the function. Even if the outlined region is at the beginning of the function, setting the call_stmt's block to the outermost block is also incorrect. For C++ programs, the block for function-level local variables is not DECL_INITIAL(current_function_decl); it is a subblock of DECL_INITIAL(current_function_decl). This is different from C programs. -Rong > > Honza >> >> Richard. >> >> > -- >> > Eric Botcazou
Re: RFA: consolidate DWARF strings into libiberty
> "Tom" == Tom Tromey writes: HJ> You should add extern "C" for C++ on those functions moved to HJ> libiberty. Tom> Yeah, sorry about that. Tom> I'm testing the fix. Here is what I am checking in. Tom ChangeLog: 2012-04-27 Tom Tromey * dwarf2.h: Wrap function declarations in extern "C". Index: dwarf2.h === --- dwarf2.h(revision 186908) +++ dwarf2.h(working copy) @@ -361,6 +361,10 @@ #define DW_EH_PE_indirect 0x80 +#ifdef __cplusplus +extern "C" { +#endif /* __cplusplus */ + /* Return the name of a DW_TAG_ constant, or NULL if the value is not recognized. */ extern const char *get_DW_TAG_name (unsigned int tag); @@ -385,4 +389,8 @@ recognized. */ extern const char *get_DW_CFA_name (unsigned int opc); +#ifdef __cplusplus +} +#endif /* __cplusplus */ + #endif /* _DWARF2_H */
Re: [PATCH] teach phi-opt to produce -(a COND b)
On Fri, Apr 27, 2012 at 3:01 AM, Paolo Bonzini wrote: > This patch teaches phiopt to look at phis whose arguments are -1 and 0, > and produce negated setcc statements. > > Bootstrapped/regtested x86_64-pc-linux-gnu, together with the patch > for pr53138. Ok for mainline? > > Paolo > > 2012-04-27 Paolo Bonzini > > * tree-ssa-phiopt.c (conditional_replacement): Replace PHIs > whose arguments are -1 and 0, by negating the result of the > conditional. > This caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53144 H.J.
libgo patch committed: Provide i386 long double math fns if needed
This patch to libgo provides the long double math functions if they are not in the system libm. This is needed on i386 because the Go frontend automatically converts calls to these functions from float64 (aka C double) to C long double, as guided by express_precision_type. This is done so that the compiler can use the x87 instructions when available. However, when not optimizing, the compiler will still emit a call to the function. This occurs in particular when running the libgo math test; I'm not sure it actually happens at any other time. This patch provides a sufficient implementation of these functions for use by Go. This fixes another part of PR 52358. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu and i386-sun-solaris2.9. Committed to mainline and 4.7 branch. Ian diff -r a253ccfa3489 libgo/configure.ac --- a/libgo/configure.ac Fri Apr 27 09:30:57 2012 -0700 +++ b/libgo/configure.ac Fri Apr 27 09:33:21 2012 -0700 @@ -489,6 +489,11 @@ AC_TYPE_OFF_T AC_CHECK_TYPES([loff_t]) +LIBS_hold="$LIBS" +LIBS="$LIBS -lm" +AC_CHECK_FUNCS(cosl expl logl sinl tanl acosl asinl atanl atan2l expm1l ldexpl log10l log1pl) +LIBS="$LIBS_hold" + CFLAGS_hold="$CFLAGS" CFLAGS="$CFLAGS $PTHREAD_CFLAGS" LIBS_hold="$LIBS" diff -r a253ccfa3489 libgo/runtime/go-nosys.c --- a/libgo/runtime/go-nosys.c Fri Apr 27 09:30:57 2012 -0700 +++ b/libgo/runtime/go-nosys.c Fri Apr 27 09:33:21 2012 -0700 @@ -13,6 +13,7 @@ #include #include +#include #include #include #include @@ -239,3 +240,116 @@ return -1; } #endif + +/* Long double math functions. These are needed on old i386 systems + that don't have them in libm. The compiler translates calls to + these functions on float64 to call an 80-bit floating point + function instead, because when optimizing that function can be + executed as an x87 instructure. However, when not optimizing, this + translates into a call to the math function. 
So on systems that + don't provide these functions, we provide a version that just calls + the float64 version. */ + +#ifndef HAVE_COSL +long double +cosl (long double a) +{ + return (long double) cos ((double) a); +} +#endif + +#ifndef HAVE_EXPL +long double +expl (long double a) +{ + return (long double) exp ((double) a); +} +#endif + +#ifndef HAVE_LOGL +long double +logl (long double a) +{ + return (long double) log ((double) a); +} +#endif + +#ifndef HAVE_SINL +long double +sinl (long double a) +{ + return (long double) sin ((double) a); +} +#endif + +#ifndef HAVE_TANL +long double +tanl (long double a) +{ + return (long double) tan ((double) a); +} +#endif + +#ifndef HAVE_ACOSL +long double +acosl (long double a) +{ + return (long double) acos ((double) a); +} +#endif + +#ifndef HAVE_ASINL +long double +asinl (long double a) +{ + return (long double) asin ((double) a); +} +#endif + +#ifndef HAVE_ATANL +long double +atanl (long double a) +{ + return (long double) atan ((double) a); +} +#endif + +#ifndef HAVE_ATAN2L +long double +atan2l (long double a, long double b) +{ + return (long double) atan2 ((double) a, (double) b); +} +#endif + +#ifndef HAVE_EXPM1L +long double +expm1l (long double a) +{ + return (long double) expm1 ((double) a); +} +#endif + +#ifndef HAVE_LDEXPL +long double +ldexpl (long double a, int exp) +{ + return (long double) ldexp ((double) a, exp); +} +#endif + +#ifndef HAVE_LOG10L +long double +log10l (long double a) +{ + return (long double) log10 ((double) a); +} +#endif + +#ifndef HAVE_LOG1PL +long double +log1pl (long double a) +{ + return (long double) log1p ((double) a); +} +#endif
libgo patch committed: Work around bug in Solaris 9 ldexp
The Solaris 9 ldexp function has a bug: ldexp(-1, -1075) returns positive zero when it should return negative zero. This patch works around this bug in the Go math package. This fixes part of PR 52358. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu and i386-sun-solaris2.9. Committed to mainline and 4.7 branch. Ian diff -r c1d1d764b7a6 libgo/go/math/ldexp.go --- a/libgo/go/math/ldexp.go Fri Apr 27 09:26:59 2012 -0700 +++ b/libgo/go/math/ldexp.go Fri Apr 27 09:29:05 2012 -0700 @@ -16,7 +16,18 @@ func libc_ldexp(float64, int) float64 func Ldexp(frac float64, exp int) float64 { - return libc_ldexp(frac, exp) + r := libc_ldexp(frac, exp) + + // Work around a bug in the implementation of ldexp on Solaris + // 9. If multiplying a negative number by 2 raised to a + // negative exponent underflows, we want to return negative + // zero, but the Solaris 9 implementation returns positive + // zero. This workaround can be removed when and if we no + // longer care about Solaris 9. + if r == 0 && frac < 0 && exp < 0 { + r = Copysign(0, frac) + } + return r } func ldexp(frac float64, exp int) float64 {
libgo patch committed: Correct syscall.Setenv if no setenv
This patch to libgo corrects the implementation of syscall.Setenv for systems that do not have the setenv library call, but only have putenv. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu and i386-sun-solaris2.9. Committed to mainline and 4.7 branch. Ian diff -r 1ed95295e00b libgo/runtime/go-setenv.c --- a/libgo/runtime/go-setenv.c Wed Apr 25 21:25:31 2012 -0700 +++ b/libgo/runtime/go-setenv.c Fri Apr 27 09:25:43 2012 -0700 @@ -50,7 +50,7 @@ #else /* !defined(HAVE_SETENV) */ - kn = malloc (k.__length + v.__length + 2); + kn = __go_alloc (k.__length + v.__length + 2); __builtin_memcpy (kn, ks, k.__length); kn[k.__length] = '='; __builtin_memcpy (kn + k.__length + 1, vs, v.__length);
Re: Fix find_moveable_pseudos, PR52997
Bernd Schmidt wrote: > We're creating new pseudos, and while we're resizing some data > structures, we aren't doing it for everything. > @@ -3983,7 +3983,8 @@ find_moveable_pseudos (void) > >last_moveable_pseudo = max_reg_num (); > > - fix_reg_equiv_init(); > + fix_reg_equiv_init (); > + resize_reg_info (); >regstat_free_n_sets_and_refs (); >regstat_free_ri (); >regstat_init_n_sets_and_refs (); This causes a bootstrap failure on s390 when enabling -fsched-pressure -fsched-pressure-algorithm=model by default (not sure why this doesn't show up without that change). The problem is that resize_reg_info only resizes the data structure, but leaves it uninitialized. All other uses of resize_reg_info subsequently fill in that data, e.g. in ira-costs.c:find_costs_and_classes. if (!resize_reg_info () && allocno_p && pseudo_classes_defined_p && flag_expensive_optimizations) { [...] But because the structure was now already resized in find_moveable_pseudos, the next call to resize_reg_info returns false, and thus find_costs_and_classes assumes the info is actually present, but it is uninitialized, which causes a crash in the subsequent code pref[ALLOCNO_NUM (a)] = reg_preferred_class (ALLOCNO_REGNO (a)); setup_regno_cost_classes_by_aclass (ALLOCNO_REGNO (a), pref[ALLOCNO_NUM (a)]); max_cost_classes_num = MAX (max_cost_classes_num, regno_cost_classes[ALLOCNO_REGNO (a)]->num); since reg_preferred_class returns 255 for the new pseudo. Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
Re: RFA: consolidate DWARF strings into libiberty
HJ> You should add extern "C" for C++ on those functions moved to HJ> libiberty. Yeah, sorry about that. I'm testing the fix. Tom
Re: RFA: consolidate DWARF strings into libiberty
On Fri, Apr 27, 2012 at 9:01 AM, H.J. Lu wrote: > On Fri, Apr 27, 2012 at 7:11 AM, Tom Tromey wrote: >>> "Jakub" == Jakub Jelinek writes: >> >> Jakub> On Thu, Apr 26, 2012 at 01:52:31PM -0400, DJ Delorie wrote: I will not oppose adding more unrelated stuff to libiberty, but neither will I approve it. I will let one of the other maintainers or a global maintainer approve it. >> >> Jakub> The original libiberty patch is ok for trunk then. >> >> Thanks, Jakub. I am checking it in to the gcc tree. >> I'll send the gdb and binutils follow-ups later. >> >> Tom > > This breaks GCC bootstrap: > > http://gcc.gnu.org/ml/gcc-regression/2012-04/msg00500.html > You should add extern "C" for C++ on those functions moved to libiberty. -- H.J.
Re: RFA: consolidate DWARF strings into libiberty
On Fri, Apr 27, 2012 at 7:11 AM, Tom Tromey wrote: >> "Jakub" == Jakub Jelinek writes: > > Jakub> On Thu, Apr 26, 2012 at 01:52:31PM -0400, DJ Delorie wrote: >>> >>> I will not oppose adding more unrelated stuff to libiberty, but >>> neither will I approve it. I will let one of the other maintainers or >>> a global maintainer approve it. > > Jakub> The original libiberty patch is ok for trunk then. > > Thanks, Jakub. I am checking it in to the gcc tree. > I'll send the gdb and binutils follow-ups later. > > Tom This breaks GCC bootstrap: http://gcc.gnu.org/ml/gcc-regression/2012-04/msg00500.html -- H.J.
Re: [PATCH, i386, middle-end, testsuite] Intel TSX's HLE.
On 04/27/12 05:49, Kirill Yukhin wrote: > + if (targetm.memmodel_check) > +val = targetm.memmodel_check (val); > + else if (val & ~MEMMODEL_MASK) > + > +{ Incorrect vertical whitespace. > + if ( (failure & MEMMODEL_MASK) == MEMMODEL_RELEASE > + || (failure & MEMMODEL_MASK) == MEMMODEL_ACQ_REL) Incorrect horizontal whitespace. >{"generic32", PROCESSOR_GENERIC32, CPU_PENTIUMPRO, > - 0 /* flags are only used for -march switch. */ }, > + PTA_HLE /* flags are only used for -march switch. */ }, >{"generic64", PROCESSOR_GENERIC64, CPU_GENERIC64, > - PTA_64BIT /* flags are only used for -march switch. */ }, > + PTA_64BIT > +| PTA_HLE /* flags are only used for -march switch. */ }, Adding to the generic tunings I suggested. Removing the HLE bit from core-avx2 or whatever cpu actually has the bit I did not suggest. Otherwise, this looks good. r~
Re: [PATCH] Default to -gdwarf-4
On Wed, Apr 25, 2012 at 6:47 AM, Jakub Jelinek wrote: > Hi! > > For reasonable debugging experience recent GCC versions need > GDB >= 7.0 for quite some time, and DWARF4 is almost 2 years old now, > and offers lots of improvements over DWARF2 we still default to. > > So, I'd like to make -gdwarf-4 (just the version, of course -g > is needed to enable debug info generation) the default, together > with -fno-debug-types-section (as .debug_types isn't right now always a win, > see the data in the dwz-0.1 announcement plus not all DWARF consumers can > handle it yet or are gaining support only very recently (e.g. valgrind)) > and -grecord-gcc-switches which is IMHO worth the few extra bytes per CU > (unless every CU is compiled with different code generation affecting > options usually just 4 extra bytes). In Fedora we default to this combo > already for some time. Targets that have tools that are upset by > any extensions that defaulted to -gno-strict-dwarf previously will now > default to -gdwarf-2 as before. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2012-04-25 Jakub Jelinek > > * common.opt (flag_debug_types_section): Default to 0. > (dwarf_version): Default to 4. > (dwarf_record_gcc_switches): Default to 1. > (dwarf_strict): Default to 0. > * toplev.c (process_options): Don't handle dwarf_strict > or dwarf_version here. > * config/vxworks.c (vxworks_override_options): Don't > test whether dwarf_strict or dwarf_version are negative, > instead test !global_options_set.x_dwarf_*. > * config/darwin.c (darwin_override_options): Default to > dwarf_version 2. > * doc/invoke.texi: Note that -gdwarf-4, -grecord-gcc-switches > and -fno-debug-types-section are now the default. > This caused: FAIL: gcc.dg/pch/save-temps-1.c -O0 -g assembly comparison FAIL: gcc.dg/pch/save-temps-1.c -O3 -g assembly comparison on Linux/x86-64 with --target_board='unix{-m32}'. -- H.J.
Re: [PATCH] teach phi-opt to produce -(a COND b)
On Fri, Apr 27, 2012 at 3:01 AM, Paolo Bonzini wrote: > This patch teaches phiopt to look at phis whose arguments are -1 and 0, > and produce negated setcc statements. > > Bootstrapped/regtested x86_64-pc-linux-gnu, together with the patch > for pr53138. Ok for mainline? I came up with this same patch independently. I was going to submit it but I never got around to it. Thanks for submitting this. -- Andrew Pinski > > Paolo > > 2012-04-27 Paolo Bonzini > > * tree-ssa-phiopt.c (conditional_replacement): Replace PHIs > whose arguments are -1 and 0, by negating the result of the > conditional. > > 2012-04-27 Paolo Bonzini > > * gcc.c-torture/execute/20120427-2.c: New testcase. > * gcc.dg/tree-ssa/phi-opt-10.c: New testcase. > > > Index: tree-ssa-phiopt.c > === > --- tree-ssa-phiopt.c (revisione 186859) > +++ tree-ssa-phiopt.c (copia locale) > @@ -536,17 +536,21 @@ > gimple_stmt_iterator gsi; > edge true_edge, false_edge; > tree new_var, new_var2; > + bool neg; > > /* FIXME: Gimplification of complex type is too hard for now. */ > if (TREE_CODE (TREE_TYPE (arg0)) == COMPLEX_TYPE > || TREE_CODE (TREE_TYPE (arg1)) == COMPLEX_TYPE) > return false; > > - /* The PHI arguments have the constants 0 and 1, then convert > - it to the conditional. */ > + /* The PHI arguments have the constants 0 and 1, or 0 and -1, then > + convert it to the conditional. */ > if ((integer_zerop (arg0) && integer_onep (arg1)) > || (integer_zerop (arg1) && integer_onep (arg0))) > - ; > + neg = false; > + else if ((integer_zerop (arg0) && integer_all_onesp (arg1)) > + || (integer_zerop (arg1) && integer_all_onesp (arg0))) > + neg = true; > else > return false; > > @@ -558,7 +562,7 @@ > falls through into BB. > > There is a single PHI node at the join point (BB) and its arguments > - are constants (0, 1). > + are constants (0, 1) or (0, -1). 
> > So, given the condition COND, and the two PHI arguments, we can > rewrite this PHI into non-branching code: > @@ -585,12 +589,20 @@ > edge so that we know when to invert the condition below. */ > extract_true_false_edges_from_block (cond_bb, &true_edge, &false_edge); > if ((e0 == true_edge && integer_zerop (arg0)) > - || (e0 == false_edge && integer_onep (arg0)) > + || (e0 == false_edge && !integer_zerop (arg0)) > || (e1 == true_edge && integer_zerop (arg1)) > - || (e1 == false_edge && integer_onep (arg1))) > + || (e1 == false_edge && !integer_zerop (arg1))) > cond = fold_build1_loc (gimple_location (stmt), > - TRUTH_NOT_EXPR, TREE_TYPE (cond), cond); > + TRUTH_NOT_EXPR, TREE_TYPE (cond), cond); > > + if (neg) > + { > + cond = fold_convert_loc (gimple_location (stmt), > + TREE_TYPE (result), cond); > + cond = fold_build1_loc (gimple_location (stmt), > + NEGATE_EXPR, TREE_TYPE (cond), cond); > + } > + > /* Insert our new statements at the end of conditional block before the > COND_STMT. */ > gsi = gsi_for_stmt (stmt); > Index: testsuite/gcc.c-torture/execute/20120427-2.c > === > --- testsuite/gcc.c-torture/execute/20120427-2.c (revisione 0) > +++ testsuite/gcc.c-torture/execute/20120427-2.c (revisione 0) > @@ -0,0 +1,38 @@ > +typedef struct sreal > +{ > + unsigned sig; /* Significant. */ > + int exp; /* Exponent. 
*/ > +} sreal; > + > +sreal_compare (sreal *a, sreal *b) > +{ > + if (a->exp > b->exp) > + return 1; > + if (a->exp < b->exp) > + return -1; > + if (a->sig > b->sig) > + return 1; > + if (a->sig < b->sig) > + return -1; > + return 0; > +} > + > +sreal a[] = { > + { 0, 0 }, > + { 1, 0 }, > + { 0, 1 }, > + { 1, 1 } > +}; > + > +int main() > +{ > + int i, j; > + for (i = 0; i <= 3; i++) { > + for (j = 0; j < 3; j++) { > + if (i < j && sreal_compare(&a[i], &a[j]) != -1) abort(); > + if (i == j && sreal_compare(&a[i], &a[j]) != 0) abort(); > + if (i > j && sreal_compare(&a[i], &a[j]) != 1) abort(); > + } > + } > + return 0; > +} > Index: testsuite/gcc.dg/tree-ssa/phi-opt-10.c > =
Re: User directed Function Multiversioning via Function Overloading (issue5752064)
On Fri, Apr 27, 2012 at 8:36 AM, H.J. Lu wrote: > On Fri, Apr 27, 2012 at 7:53 AM, Sriraman Tallam wrote: >> On Fri, Apr 27, 2012 at 7:38 AM, H.J. Lu wrote: >>> On Fri, Apr 27, 2012 at 7:35 AM, Sriraman Tallam >>> wrote: On Fri, Apr 27, 2012 at 6:38 AM, H.J. Lu wrote: > On Thu, Apr 26, 2012 at 10:08 PM, Sriraman Tallam > wrote: >> Hi, >> >> I have made the following changes in this new patch which is attached: >> >> * Use target attribute itself to create function versions. >> * Handle any number of ISA names and arch= args to target attribute, >> generating the right dispatchers. >> * Integrate with the CPU runtime detection checked in this week. >> * Overload resolution: If the caller's target matches any of the >> version function's target, then a direct call to the version is >> generated, no need to go through the dispatching. >> >> Patch also available for review here: >> http://codereview.appspot.com/5752064 >> > > Does it work with > > int foo (); > int foo () __attribute__ ((targetv("arch=corei7"))); > > int (*foo_p) () = foo? Yes, this will work. foo_p will be the address of the dispatcher function and hence doing (*foo_p)() will call the right version. >>> >>> Even when foo_p is a global variable and compiled with -fPIC? >> >> I am not sure I understand what the complication is here, but FWIW, I >> tried this example and it works >> >> int foo () >> { >> return 0; >> } >> >> int __attribute__ ((target ("arch=corei7))) >> foo () >> { >> return 1; >> } >> >> int (*foo_p)() = foo; >> int main () >> { >> return (*foo_p)(); >> } >> >> g++ -fPIC -O2 example.cc >> >> >> Did you have something else in mind? Could you please elaborate if you >> a have a particular case in mind. >> > > That is what I meant. But I didn't see it in your testcase. > Can you add it to your testcase? > > Also you should verify the correct function is called in > your testcase at run-time. Ok, i will update the patch. Thanks, -Sri. > > > Thanks. > > > -- > H.J.
Re: User directed Function Multiversioning via Function Overloading (issue5752064)
On Fri, Apr 27, 2012 at 7:53 AM, Sriraman Tallam wrote: > On Fri, Apr 27, 2012 at 7:38 AM, H.J. Lu wrote: >> On Fri, Apr 27, 2012 at 7:35 AM, Sriraman Tallam wrote: >>> On Fri, Apr 27, 2012 at 6:38 AM, H.J. Lu wrote: On Thu, Apr 26, 2012 at 10:08 PM, Sriraman Tallam wrote: > Hi, > > I have made the following changes in this new patch which is attached: > > * Use target attribute itself to create function versions. > * Handle any number of ISA names and arch= args to target attribute, > generating the right dispatchers. > * Integrate with the CPU runtime detection checked in this week. > * Overload resolution: If the caller's target matches any of the > version function's target, then a direct call to the version is > generated, no need to go through the dispatching. > > Patch also available for review here: > http://codereview.appspot.com/5752064 > Does it work with int foo (); int foo () __attribute__ ((targetv("arch=corei7"))); int (*foo_p) () = foo? >>> >>> Yes, this will work. foo_p will be the address of the dispatcher >>> function and hence doing (*foo_p)() will call the right version. >> >> Even when foo_p is a global variable and compiled with -fPIC? > > I am not sure I understand what the complication is here, but FWIW, I > tried this example and it works > > int foo () > { > return 0; > } > > int __attribute__ ((target ("arch=corei7))) > foo () > { > return 1; > } > > int (*foo_p)() = foo; > int main () > { > return (*foo_p)(); > } > > g++ -fPIC -O2 example.cc > > > Did you have something else in mind? Could you please elaborate if you > a have a particular case in mind. > That is what I meant. But I didn't see it in your testcase. Can you add it to your testcase? Also you should verify the correct function is called in your testcase at run-time. Thanks. -- H.J.
Re: [libcpp] maybe canonicalize system paths in line-map
Manuel López-Ibáñez a écrit:

> Another drawback I didn't realize until now is that in this way we
> canonicalize every path, instead of only touching those that belong to
> system headers.

Ah. Good catch. I guess file->dir->sysp should tell us if we are in a
system directory, so that we can avoid canonicalizing in that case.

> I am not sure if it is important or not.

I don't know either. But I guess we could err on the safe side by
limiting the change to just system headers like you were initially
proposing.

-- Dodji
Re: [PATCH][ARM] NEON DImode immediate constants
On 30/03/12 12:15, Andrew Stubbs wrote: > On 28/02/12 16:20, Andrew Stubbs wrote: >> Hi all, >> >> This patch implements 64-bit immediate constant loads in NEON. >> >> The current state is that you can load const_vector, but not const_int. >> This is clearly not ideal. The result is a constant pool entry when it's >> not necessary. >> >> The patch disables the movdi_vfp patterns for loading DImode values, if >> the operand is const_int and NEON is enabled, and extends the neon_mov >> pattern to include DImode const_int, as well as the const_vector >> operands. I've modified neon_valid_immediate only enough to accept >> const_int input - the logic remains untouched. > > That patch failed to bootstrap successfully, but this updated patch > bootstraps and tests with no regressions. > > OK? Sorry for the delay. This is OK. It would be good to merge all the target32 movdi variants into one pattern and then use alternative enabling to deal with the different valid alternatives. R. > > Andrew > > > neon-loadimm64.patch > > > 2012-03-27 Andrew Stubbs > > gcc/ > * config/arm/arm.c (neon_valid_immediate): Allow const_int. > (arm_print_operand): Add 'x' format. > * config/arm/constraints.md (Dn): Allow const_int. > * config/arm/neon.md (neon_mov): Use VDX to allow DImode. > Use 'x' format to print constants. > * config/arm/predicates.md (imm_for_neon_mov_operand): Allow const_int. > * config/arm/vfp.md (movdi_vfp): Disable for const_int when neon > is enabled. > (movdi_vfp_cortexa8): Likewise. 
> > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > index 0bded8d..492ddde 100644 > --- a/gcc/config/arm/arm.c > +++ b/gcc/config/arm/arm.c > @@ -8873,11 +8873,25 @@ neon_valid_immediate (rtx op, enum machine_mode mode, > int inverse, >break; \ > } > > - unsigned int i, elsize = 0, idx = 0, n_elts = CONST_VECTOR_NUNITS (op); > - unsigned int innersize = GET_MODE_SIZE (GET_MODE_INNER (mode)); > + unsigned int i, elsize = 0, idx = 0, n_elts; > + unsigned int innersize; >unsigned char bytes[16]; >int immtype = -1, matches; >unsigned int invmask = inverse ? 0xff : 0; > + bool vector = GET_CODE (op) == CONST_VECTOR; > + > + if (vector) > +{ > + n_elts = CONST_VECTOR_NUNITS (op); > + innersize = GET_MODE_SIZE (GET_MODE_INNER (mode)); > +} > + else > +{ > + n_elts = 1; > + if (mode == VOIDmode) > + mode = DImode; > + innersize = GET_MODE_SIZE (mode); > +} > >/* Vectors of float constants. */ >if (GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT) > @@ -8913,7 +8927,7 @@ neon_valid_immediate (rtx op, enum machine_mode mode, > int inverse, >/* Splat vector constant out into a byte vector. */ >for (i = 0; i < n_elts; i++) > { > - rtx el = CONST_VECTOR_ELT (op, i); > + rtx el = vector ? CONST_VECTOR_ELT (op, i) : op; >unsigned HOST_WIDE_INT elpart; >unsigned int part, parts; > > @@ -17230,6 +17244,19 @@ arm_print_operand (FILE *stream, rtx x, int code) > } >return; > > +/* An integer that we want to print in HEX. 
*/ > +case 'x': > + switch (GET_CODE (x)) > + { > + case CONST_INT: > + fprintf (stream, "#" HOST_WIDE_INT_PRINT_HEX, INTVAL (x)); > + break; > + > + default: > + output_operand_lossage ("Unsupported operand for code '%c'", code); > + } > + return; > + > case 'B': >if (GET_CODE (x) == CONST_INT) > { > diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md > index 7d0269a..68979c1 100644 > --- a/gcc/config/arm/constraints.md > +++ b/gcc/config/arm/constraints.md > @@ -255,9 +255,9 @@ > > (define_constraint "Dn" > "@internal > - In ARM/Thumb-2 state a const_vector which can be loaded with a Neon vmov > - immediate instruction." > - (and (match_code "const_vector") > + In ARM/Thumb-2 state a const_vector or const_int which can be loaded with a > + Neon vmov immediate instruction." > + (and (match_code "const_vector,const_int") >(match_test "TARGET_32BIT > && imm_for_neon_mov_operand (op, GET_MODE (op))"))) > > diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md > index d7caa37..3c88568 100644 > --- a/gcc/config/arm/neon.md > +++ b/gcc/config/arm/neon.md > @@ -152,9 +152,9 @@ > (define_attr "vqh_mnem" "vadd,vmin,vmax" (const_string "vadd")) > > (define_insn "*neon_mov" > - [(set (match_operand:VD 0 "nonimmediate_operand" > + [(set (match_operand:VDX 0 "nonimmediate_operand" > "=w,Uv,w, w, ?r,?w,?r,?r, ?Us") > - (match_operand:VD 1 "general_operand" > + (match_operand:VDX 1 "general_operand" > " w,w, Dn,Uvi, w, r, r, Usi,r"))] >"TARGET_NEON > && (register_operand (operands[0], mode) > @@ -173,7 +173,7 @@ >if (width == 0) > return "vmov.f32\t%P0, %1 @ "; >else > -sprintf
[PATCH, diagnostics] Add -Wvarargs option
Hello Gabriel, Following your request[1], please find below the implementation for the -Wvarargs option, as well as its introductory text. It applies on top the changes to enable -ftrack-macro-expansion by default[2]. [1]: http://gcc.gnu.org/ml/gcc-patches/2012-04/msg01604.html [2]: http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00532.html --- Several warnings related to questionable usage cases of variadic function related macros (like va_start) could not be controlled by any warning-related macro. Fixed thus, by introducing the -Wvarargs option. Tested on x86_64-unknown-linux-gnu against trunk. Bootstrap for all languages is still underway. gcc/c-family/ * c.opt (Wvarargs): Define new option. gcc/ builtins.c (fold_builtin_next_arg): Use OPT_Wvarargs as an argument for the various warning_at calls. gcc/doc/ * invoke.texi: Update the documentation. gcc/testsuite/ * c-c++-common/Wvarargs.c: New test case. * c-c++-common/Wvarargs-2.c: Likewise. --- gcc/builtins.c |8 ++-- gcc/c-family/c.opt |4 ++ gcc/doc/invoke.texi |7 gcc/testsuite/c-c++-common/Wvarargs-2.c | 33 +++ gcc/testsuite/c-c++-common/Wvarargs.c | 54 +++ 5 files changed, 102 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/c-c++-common/Wvarargs-2.c create mode 100644 gcc/testsuite/c-c++-common/Wvarargs.c diff --git a/gcc/builtins.c b/gcc/builtins.c index 5ddc47b..41a052b 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -12127,8 +12127,8 @@ fold_builtin_next_arg (tree exp, bool va_start_p) /* Evidently an out of date version of ; can't validate va_start's second argument, but can still work as intended. */ warning_at (current_location, - 0, - "%<__builtin_next_arg%> called without an argument"); + OPT_Wvarargs, + "%<__builtin_next_arg%> called without an argument"); return true; } else if (nargs > 1) @@ -12164,7 +12164,7 @@ fold_builtin_next_arg (tree exp, bool va_start_p) argument so that we will get wrong-code because of it. 
*/ warning_at (current_location, - 0, + OPT_Wvarargs, "second parameter of % not last named argument"); } @@ -12177,7 +12177,7 @@ fold_builtin_next_arg (tree exp, bool va_start_p) else if (DECL_REGISTER (arg)) { warning_at (current_location, - 0, + OPT_Wvarargs, "undefined behaviour when second parameter of " "% is declared with % storage"); } diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt index db8ca81..a038e57 100644 --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -685,6 +685,10 @@ Wvariadic-macros C ObjC C++ ObjC++ Warning Do not warn about using variadic macros when -pedantic +Wvarargs +C ObjC C++ ObjC++ Warning Var(warn_varargs) Init(1) +Warn about questionable usage of the macros used to retrieve variable arguments + Wvla C ObjC C++ ObjC++ Var(warn_vla) Init(-1) Warning Warn if a variable length array is used diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 280fac3..47100b0 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -4657,6 +4657,13 @@ Warn if variadic macros are used in pedantic ISO C90 mode, or the GNU alternate syntax when in pedantic ISO C99 mode. This is default. To inhibit the warning messages, use @option{-Wno-variadic-macros}. +@item -Wvarargs +@opindex Wvarargs +@opindex Wno-varargs +Warn upon questionable usage of the macros used to handle variable +arguments like @samp{va_start}. This is default. To inhibit the +warning messages, use @option{-Wno-varargs}. + @item -Wvector-operation-performance @opindex Wvector-operation-performance @opindex Wno-vector-operation-performance diff --git a/gcc/testsuite/c-c++-common/Wvarargs-2.c b/gcc/testsuite/c-c++-common/Wvarargs-2.c new file mode 100644 index 000..a2e031f --- /dev/null +++ b/gcc/testsuite/c-c++-common/Wvarargs-2.c @@ -0,0 +1,33 @@ +/* + { dg-options "-Wno-varargs" } + { dg-do compile } + */ + +#include + +void +err (int a) +{ + va_list vp; + va_start (vp, a); // { dg-error "used in function with fixed args" } +} + +void +foo0 (int a, int b, ...) 
+{ +va_list vp; +/* 'a' is not the last argument of the enclosing function, but + don't warn because we are ignoring -Wvarargs. */ +va_start (vp, a); +va_end (vp); +} + +void +foo1 (int a, register int b, ...) +{ +va_list vp; +/* 'b' is declared with register storage, but don't warn + because we are ignoring -Wvarargs. */ +va_start (vp, b); +va_end (vp); +} diff --git a/gcc/testsuite/c-c++-common/Wvarargs.c b/gcc/testsuite/c-c++-c
Re: [PATCH 11/13] Fix va_start related location
Gabriel Dos Reis writes:

> On Wed, Apr 25, 2012 at 10:20 AM, Dodji Seketeli wrote:
>> Gabriel Dos Reis writes:
>>
>>> On Wed, Apr 25, 2012 at 9:04 AM, Dodji Seketeli wrote:
>>>> In gcc/testsuite/gcc.dg/pr30457.c, the first warning was not being
>>>> emitted because the relevant location was inside the va_start macro
>>>> defined in a system header. It can even point to a token for a
>>>> builtin macro there. This patch unwinds to the first token in real
>>>> source code in that case.
>>>
>>> While you are at it, could you also use a non-zero value for the second
>>> argument to warning_at?
>>
>> I couldn't find any obvious value for it. I am thinking maybe it would
>> make sense to introduce a new -Wva_start to warn about possible
>> dangerous uses of the va_start macro and use that as the second argument
>> for the relevant warnings emitted by fold_builtin_next_arg. What do you
>> think?
>
> or -Wvarargs?

OK, I have cooked up a patch for this that I will send in a separate
thread shortly.

>> In any case, this topic seems logically unrelated to the purpose of
>> enabling -ftrack-macro-expansion by default, so IMHO it would be better
>> to do this in a separate self-contained patch. Among other things, this
>> would ease possible future back-ports. Would you agree?
>
> OK.

While testing the separate patch, I realized that this one was missing
adjusting the location in another spot. So I have updated this patch
accordingly. The patch that adds -Wvarargs will come on top of it, and
will add some needed regression tests for the whole.

Tested on x86_64-unknown-linux-gnu against trunk. Bootstrap is still
running ...

	* builtins.c (fold_builtin_next_arg): Unwinds to the first
	location in real source code.
--- gcc/builtins.c | 23 +++ 1 files changed, 19 insertions(+), 4 deletions(-) diff --git a/gcc/builtins.c b/gcc/builtins.c index b47f218..5ddc47b 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -12095,6 +12095,13 @@ fold_builtin_next_arg (tree exp, bool va_start_p) tree fntype = TREE_TYPE (current_function_decl); int nargs = call_expr_nargs (exp); tree arg; + /* There is good chance the current input_location points inside the + definition of the va_start macro (perhaps on the token for + builtin) in a system header, so warnings will not be emitted. + Use the location in real source code. */ + source_location current_location = +linemap_unwind_to_first_non_reserved_loc (line_table, input_location, + NULL); if (!stdarg_p (fntype)) { @@ -12119,7 +12126,9 @@ fold_builtin_next_arg (tree exp, bool va_start_p) { /* Evidently an out of date version of ; can't validate va_start's second argument, but can still work as intended. */ - warning (0, "%<__builtin_next_arg%> called without an argument"); + warning_at (current_location, + 0, + "%<__builtin_next_arg%> called without an argument"); return true; } else if (nargs > 1) @@ -12154,7 +12163,9 @@ fold_builtin_next_arg (tree exp, bool va_start_p) argument. We just warn and set the arg to be the last argument so that we will get wrong-code because of it. */ - warning (0, "second parameter of % not last named argument"); + warning_at (current_location, + 0, + "second parameter of % not last named argument"); } /* Undefined by C99 7.15.1.4p4 (va_start): @@ -12164,8 +12175,12 @@ fold_builtin_next_arg (tree exp, bool va_start_p) the default argument promotions, the behavior is undefined." 
*/ else if (DECL_REGISTER (arg)) -warning (0, "undefined behaviour when second parameter of " - "% is declared with % storage"); + { + warning_at (current_location, + 0, + "undefined behaviour when second parameter of " + "% is declared with % storage"); + } /* We want to verify the second parameter just once before the tree optimizers are run and then avoid keeping it in the tree, -- Dodji
Re: User directed Function Multiversioning via Function Overloading (issue5752064)
On Fri, Apr 27, 2012 at 7:38 AM, H.J. Lu wrote: > On Fri, Apr 27, 2012 at 7:35 AM, Sriraman Tallam wrote: >> On Fri, Apr 27, 2012 at 6:38 AM, H.J. Lu wrote: >>> On Thu, Apr 26, 2012 at 10:08 PM, Sriraman Tallam >>> wrote: Hi, I have made the following changes in this new patch which is attached: * Use target attribute itself to create function versions. * Handle any number of ISA names and arch= args to target attribute, generating the right dispatchers. * Integrate with the CPU runtime detection checked in this week. * Overload resolution: If the caller's target matches any of the version function's target, then a direct call to the version is generated, no need to go through the dispatching. Patch also available for review here: http://codereview.appspot.com/5752064 >>> >>> Does it work with >>> >>> int foo (); >>> int foo () __attribute__ ((targetv("arch=corei7"))); >>> >>> int (*foo_p) () = foo? >> >> Yes, this will work. foo_p will be the address of the dispatcher >> function and hence doing (*foo_p)() will call the right version. > > Even when foo_p is a global variable and compiled with -fPIC? I am not sure I understand what the complication is here, but FWIW, I tried this example and it works int foo () { return 0; } int __attribute__ ((target ("arch=corei7))) foo () { return 1; } int (*foo_p)() = foo; int main () { return (*foo_p)(); } g++ -fPIC -O2 example.cc Did you have something else in mind? Could you please elaborate if you a have a particular case in mind. The way I handle function pointers is straightforward. When the front-end sees a pointer to a function that is versioned, it returns the pointer to the dispatcher instead. Thanks, -Sri. > > Thanks. > > -- > H.J.
Re: User directed Function Multiversioning via Function Overloading (issue5752064)
On Fri, Apr 27, 2012 at 7:35 AM, Sriraman Tallam wrote: > On Fri, Apr 27, 2012 at 6:38 AM, H.J. Lu wrote: >> On Thu, Apr 26, 2012 at 10:08 PM, Sriraman Tallam >> wrote: >>> Hi, >>> >>> I have made the following changes in this new patch which is attached: >>> >>> * Use target attribute itself to create function versions. >>> * Handle any number of ISA names and arch= args to target attribute, >>> generating the right dispatchers. >>> * Integrate with the CPU runtime detection checked in this week. >>> * Overload resolution: If the caller's target matches any of the >>> version function's target, then a direct call to the version is >>> generated, no need to go through the dispatching. >>> >>> Patch also available for review here: >>> http://codereview.appspot.com/5752064 >>> >> >> Does it work with >> >> int foo (); >> int foo () __attribute__ ((targetv("arch=corei7"))); >> >> int (*foo_p) () = foo? > > Yes, this will work. foo_p will be the address of the dispatcher > function and hence doing (*foo_p)() will call the right version. Even when foo_p is a global variable and compiled with -fPIC? Thanks. -- H.J.
Re: User directed Function Multiversioning via Function Overloading (issue5752064)
On Fri, Apr 27, 2012 at 6:38 AM, H.J. Lu wrote: > On Thu, Apr 26, 2012 at 10:08 PM, Sriraman Tallam wrote: >> Hi, >> >> I have made the following changes in this new patch which is attached: >> >> * Use target attribute itself to create function versions. >> * Handle any number of ISA names and arch= args to target attribute, >> generating the right dispatchers. >> * Integrate with the CPU runtime detection checked in this week. >> * Overload resolution: If the caller's target matches any of the >> version function's target, then a direct call to the version is >> generated, no need to go through the dispatching. >> >> Patch also available for review here: >> http://codereview.appspot.com/5752064 >> > > Does it work with > > int foo (); > int foo () __attribute__ ((targetv("arch=corei7"))); > > int (*foo_p) () = foo? Yes, this will work. foo_p will be the address of the dispatcher function and hence doing (*foo_p)() will call the right version. > > Does it support C++? Partially, no support for virtual function versioning yet. I will add it in the next iteration. Thanks, -Sri. > > Thanks. > > -- > H.J.
Re: PR c++/52538 Extend C++11 UDLs to be compatible with inttypes.h macros (issue6109043)
On Thu, Apr 26, 2012 at 8:35 AM, Tom Tromey wrote:
>
> This is ok with this change.

Thanks. Updated and submitted to trunk.

Ollie
Re: RFA: consolidate DWARF strings into libiberty
> "Jakub" == Jakub Jelinek writes:

Jakub> On Thu, Apr 26, 2012 at 01:52:31PM -0400, DJ Delorie wrote:
>> I will not oppose adding more unrelated stuff to libiberty, but
>> neither will I approve it. I will let one of the other maintainers or
>> a global maintainer approve it.

Jakub> The original libiberty patch is ok for trunk then.

Thanks, Jakub. I am checking it in to the gcc tree. I'll send the gdb
and binutils follow-ups later.

Tom
Re: [PATCH] reload: Try alternative with swapped operands before going to the next
On Wed, Apr 25, 2012 at 6:23 AM, Ulrich Weigand wrote:
> Andreas Krebbel wrote:
>
>> 2011-11-17 Andreas Krebbel
>>
>> * reload.c (find_reloads): Change the loop nesting when trying an
>> alternative with swapped operands.
>
> This is OK.

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53141

-- H.J.
Re: [libcpp] maybe canonicalize system paths in line-map
On 26 April 2012 12:12, Dodji Seketeli wrote:
> Manuel López-Ibáñez a écrit:
>
>> On 21 April 2012 14:56, Jason Merrill wrote:
>>> It seems like we'll do this for every line in the header, which could lead
>>> to a lot of leaked memory. Instead, we should canonicalize when setting
>>> ORDINARY_MAP_FILE_NAME.
>>
>> Hum, my understanding of the code is that this is exactly what I
>> implemented. That is, at most we leak every time
>> ORDINARY_MAP_FILE_NAME is set to a system header file (if the realpath
>> is shorter than the original path). This is a bit inefficient because
>> this happens two times per #include (when entering and when leaving).
>> But I honestly don't know how to avoid this.
>
> My understanding is that by the time we are about to deal with the
> ORDINARY_MAP_FILE_NAME property of a given map, it's too late; we have
> lost the "memory leak game" because that property just points to the
> path carried by the instance _cpp_file::path that is current when the
> map is built. So the lifetime of that property is tied to the lifetime
> of the relevant instance of _cpp_file.
>
> So maybe it'd be better to canonicalize the _cpp_file::path when it's
> first built? One drawback of that approach would be that
> _cpp_file::path will then permanently lose the information about the
> current directory, that is indirectly encoded into the way the file path
> is originally built. But do we really need that information?

Another drawback I didn't realize until now is that in this way we
canonicalize every path, instead of only touching those that belong to
system headers.

I am not sure if it is important or not.

Comments?

Cheers,

Manuel.
Re: dwarf2out.c: For DWARF 4+, output DW_AT_high_pc as constant offset.
On Fri, Apr 27, 2012 at 03:36:56PM +0200, Mark Wielaard wrote:
> But even without this, I think the patch is worth it just to get rid of
> all the relocations necessary otherwise.

IMHO we should defer applying this by a few months, given that GDB
support is only being added these days and -gdwarf-4 is now the default.
Applying it in August/September when there is already a GDB version with
the support would be better.

> + attr.dw_attr = DW_AT_high_pc;
> + if (dwarf_version < 3)

Shouldn't this be < 4? I think DWARF 3 doesn't have constant class
DW_AT_high_pc.

	Jakub
Re: User directed Function Multiversioning via Function Overloading (issue5752064)
On Thu, Apr 26, 2012 at 10:08 PM, Sriraman Tallam wrote:
> Hi,
>
> I have made the following changes in this new patch which is attached:
>
> * Use target attribute itself to create function versions.
> * Handle any number of ISA names and arch= args to target attribute,
>   generating the right dispatchers.
> * Integrate with the CPU runtime detection checked in this week.
> * Overload resolution: If the caller's target matches any of the
>   version function's target, then a direct call to the version is
>   generated, no need to go through the dispatching.
>
> Patch also available for review here:
> http://codereview.appspot.com/5752064

Does it work with

int foo ();
int foo () __attribute__ ((targetv("arch=corei7")));

int (*foo_p) () = foo?

Does it support C++?

Thanks.

-- H.J.
dwarf2out.c: For DWARF 4+, output DW_AT_high_pc as constant offset.
Hi, The DWARF spec says (since version 4) that DW_AT_high_pc can be represented by a constant form. If the value of the DW_AT_high_pc is of class address, it is the relocated address of the first location past the last instruction associated with the entity; if it is of class constant, the value is an unsigned integer offset which when added to the low PC gives the address of the first location past the last instruction associated with the entity. Encoding DW_AT_high_pc as constant offset saves a lot of relocations. Which is what this patch does. I also have patches for elfutils, binutils and gdb to handle the new form. Jakub has a patch for dwz to make the format encoding optimal, which saves some space too. I couldn't figure out a way to do this from dwarf2out.c even though in almost all cases using a full DW_FORM_data4/8 is way too much. Any ideas? But even without this, I think the patch is worth it just to get rid of all the relocations necessary otherwise. At first I thought of using a dw_val_class_delta class that held two labels (a generic form of dw_val_class_vms_delta), but that seemed overkill since the first label is always associated with the DW_AT_low_pc attribute of the DIE. So I just made it a special new class. 2012-04-27 Mark Wielaard * dwarf2out.h (enum dw_val_class): Add dw_val_class_high_pc. * dwarf2out.c (dw_val_equal_p): Handle dw_val_class_high_pc. (add_AT_low_high_pc): New function. (AT_lbl): Handle dw_val_class_high_pc. (print_die): Likewise. (attr_checksum): Likewise. (attr_checksum_ordered): Likewise. (same_dw_val_p): Likewise. (size_of_die): Likewise. (value_format): Likewise. (output_die): Likewise. (gen_subprogram_die): Use add_AT_low_high_pc. (add_high_low_attributes): Likewise. (dwarf2out_finish): Likewise. Tested on x86_64-unknown-linux-gnu. 
Cheers, Mark From 5e601bc28e283511b3f5d1bb1d2904251d909563 Mon Sep 17 00:00:00 2001 From: Mark Wielaard Date: Fri, 27 Apr 2012 14:27:14 +0200 Subject: [PATCH] dwarf2out.c: For DWARF 4+, output DW_AT_high_pc as constant offset. * dwarf2out.h (enum dw_val_class): Add dw_val_class_high_pc. * dwarf2out.c (dw_val_equal_p): Handle dw_val_class_high_pc. (add_AT_low_high_pc): New function. (AT_lbl): Handle dw_val_class_high_pc. (print_die): Likewise. (attr_checksum): Likewise. (attr_checksum_ordered): Likewise. (same_dw_val_p): Likewise. (size_of_die): Likewise. (value_format): Likewise. (output_die): Likewise. (gen_subprogram_die): Use add_AT_low_high_pc. (add_high_low_attributes): Likewise. (dwarf2out_finish): Likewise. --- gcc/ChangeLog | 17 +++ gcc/dwarf2out.c | 87 +++ gcc/dwarf2out.h |3 +- 3 files changed, 81 insertions(+), 26 deletions(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index a93d3cd..66a32ad 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,20 @@ +2012-04-27 Mark Wielaard + + * dwarf2out.h (enum dw_val_class): Add dw_val_class_high_pc. + * dwarf2out.c (dw_val_equal_p): Handle dw_val_class_high_pc. + (add_AT_low_high_pc): New function. + (AT_lbl): Handle dw_val_class_high_pc. + (print_die): Likewise. + (attr_checksum): Likewise. + (attr_checksum_ordered): Likewise. + (same_dw_val_p): Likewise. + (size_of_die): Likewise. + (value_format): Likewise. + (output_die): Likewise. + (gen_subprogram_die): Use add_AT_low_high_pc. + (add_high_low_attributes): Likewise. + (dwarf2out_finish): Likewise. 
+ 2012-04-25 Jakub Jelinek PR target/53110 diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index 766edba..fa2963b 100644 --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -1647,6 +1647,7 @@ dw_val_equal_p (dw_val_node *a, dw_val_node *b) case dw_val_class_fde_ref: return a->v.val_fde_index == b->v.val_fde_index; case dw_val_class_lbl_id: +case dw_val_class_high_pc: return strcmp (a->v.val_lbl_id, b->v.val_lbl_id) == 0; case dw_val_class_str: return a->v.val_str == b->v.val_str; @@ -4350,6 +4351,26 @@ add_AT_data8 (dw_die_ref die, enum dwarf_attribute attr_kind, add_dwarf_attr (die, &attr); } +/* Add DW_AT_low_pc and DW_AT_high_pc to a DIE. */ +static inline void +add_AT_low_high_pc (dw_die_ref die, const char *lbl_low, const char *lbl_high) +{ + dw_attr_node attr; + + attr.dw_attr = DW_AT_low_pc; + attr.dw_attr_val.val_class = dw_val_class_lbl_id; + attr.dw_attr_val.v.val_lbl_id = xstrdup (lbl_low); + add_dwarf_attr (die, &attr); + + attr.dw_attr = DW_AT_high_pc; + if (dwarf_version < 3) +attr.dw_attr_val.val_class = dw_val_class_lbl_id; + else +attr.dw_attr_val.val_class = dw_val_class_high_pc; + attr.dw_attr_val.v.val_lbl_id = xstrdup (lbl_high); + add_dwarf_attr (die, &attr); +} + /* Hash and equality f
Re: [C11-atomic] [patch] gimple atomic statements
On 04/27/2012 04:52 AM, Richard Guenther wrote:

hmm, yeah they always return a value. I was just copying the gimple_call code... Why would we need to do this processing for a GIMPLE_CALL lhs and not a GIMPLE_ATOMIC lhs?

GIMPLE_CALL lhs can be memory if the call returns an aggregate, similarly GIMPLE_CALL arguments can be memory if they are aggregate. Can this be the case for atomics as well?

Ah, point taken. No. No aggregates can ever be returned by an atomic, so yes, I can trim that out then. When generic atomics are used to handle aggregates, it's all handled by pointer parameters and the function is usually void, except for compare_exchange which returns a boolean.

And the RHS processing is the same as for an is_gimple_call as well... it was lifted from the code immediately following... I tried to write all this code to return the same values as would have been returned had it actually been a built-in __sync or __atomic call like we have today. Once I had them all, then we could actually make any improvements based on our known side effect limits. I guess I don't understand the rest of the comment about why I need to do something different here than with a call...

Well, all callers that use walk_stmt_load_store/addr_ops need to handle non-explicit loads/stores represented by the stmt. For calls this includes the loads and stores the callee may perform. For atomics this includes ?

The only implicit thing a normal atomic operation can do is load or store the TARGET. When the type is an aggregate, then the EXPR (soon to be op :-) and/or EXPECTED field may also be loaded or stored indirectly as all the parameters become pointers. When I add that code I will reflect that properly.

(depends on whether the operand of an atomic-load is a pointer or an object, I suppose it is a pointer for the following) For atomic this includes the implicit load of *{address operand} which will _not_ be returned by walk_stmt_load_store/addr_ops.
Thus callers that expect to see all loads/stores (like ipa-pure-const.c) need to explicitly handle atomics similar to how they handle calls and asms (if only in a very conservative way). Similarly the alias oracle needs to handle them (obviously).

OK, I understand better :-) Yes, locals can do anything they want since they aren't visible to other processes. At the moment, we'll leave those fences in because we don't optimize atomics at all, but "in the fullness of time" this will be optimized to:

int foo() { atomic_fence(); return 1; }

at the moment we produce:

int foo() { atomic_fence(); atomic_fence(); return 1; }

Which is inconsistent with the alias-oracle implementation. Please fix it to at _least_ not consider non-aliased locals. You can look at the call handling for how to do this - where you can also see how to do even better by using points-to information from the pointer operand of the atomics that specify the memory loaded/stored. You don't want to hide all your implementation bugs by making the alias oracle stupid ;)

I'm too transparently lazy obviously :-) I'll look at it :-)

All atomic operations will have a VDEF so in theory it should be fine to ignore.

Ignore? Returning 'false' would be ignoring them. Why do all atomic operations have a VDEF? At least atomic loads are not considered stores, are they?

Very true, and neither are fences... But we can't move any shared memory load or store past them, so making them all VDEFs prevents that activity. Once gimple_atomic is working properly, it might be possible to go back and adjust places. In particular, we may be able to allow directional movement based on the memory order. seq_cst will always block motion in both directions, but the acquire model allows shared memory accesses to be sunk, and the release model allows shared memory access to be hoisted. relaxed allows both. We may be able to get this behaviour through appropriate uses of VDEFs and VUSES...
or maybe it will be through a more basic understanding of GIMPLE_ATOMIC in the appropriate places. I will visit that once everything is working conservatively. There are only 2 uses: DCE already has code added to handle them, and DSE handles it through ref_maybe_used_by_stmt_p. Yes. And eventually what matters for both callers should be folded into them (let me put that somewhere ontop of my TODO ...) For example DCE would happily remove atomic loads if they look like result-SSA-name = ATOMIC-LOAD; if result-SSA-name is not used anywhere. And I don't see why we should not be allowed to do this? Returning true from the above function will make DCE not remove it. at the moment, we are NOT allowed to remove any atomic operation. There are synchronization side effects and the only way we can remove such a statement is by analyzing the atomic operations in relation to each other. (ie, 2 loads from the same atomic with no other intervening atomic operation may make the first on
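The redundant-fence situation discussed above can be sketched in C11 using the library spelling of the fence (the function names here are ours, not from the patch): after inlining, two seq_cst fences end up back to back, and the second adds no ordering the first did not already provide, so an atomics-aware optimizer could drop one of them.

```c
#include <stdatomic.h>

/* Hypothetical illustration: helper carries its own fence, so inlining
   it into foo produces two adjacent seq_cst fences.  */
static inline int helper(void)
{
  atomic_thread_fence(memory_order_seq_cst);
  return 1;
}

int foo(void)
{
  atomic_thread_fence(memory_order_seq_cst);
  return helper();   /* after inlining: two back-to-back fences */
}
```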
[PATCH] x86: emit tzcnt unconditionally
tzcnt is encoded as "rep;bsf" and unlike lzcnt is a drop-in replacement if we don't care about the flags (it has the same semantics for non-zero values). Since bsf is usually slower, just emit tzcnt unconditionally. However, write it as rep;bsf unless -mbmi is in use, to cater for old assemblers.

Bootstrapped on a non-BMI x86_64-linux host, regtest in progress. Ok for mainline?

Paolo

2012-04-27  Paolo Bonzini

	* config/i386/i386.md (ctz2): Emit tzcnt (as rep;bsf)
	instead of bsf.

Index: i386/i386.md
===
--- i386/i386.md	(revision 186904)
+++ i386/i386.md	(working copy)
@@ -12082,14 +12082,15 @@
 (define_insn "ctz2"
   (clobber (reg:CC FLAGS_REG))]
   ""
 {
+  /* tzcnt expands to rep;bsf and we can use it even if !TARGET_BMI.  */
   if (TARGET_BMI)
     return "tzcnt{}\t{%1, %0|%0, %1}";
   else
-    return "bsf{}\t{%1, %0|%0, %1}";
+    return "rep;bsf{}\t{%1, %0|%0, %1}";
 }
   [(set_attr "type" "alu1")
    (set_attr "prefix_0f" "1")
-   (set (attr "prefix_rep") (symbol_ref "TARGET_BMI"))
+   (set_attr "prefix_rep" "1")
    (set_attr "mode" "")])

(define_expand "clz2"
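The equivalence the patch relies on can be sketched in portable C: for any nonzero operand, tzcnt and bsf both yield the index of the least-significant set bit, which is why an older CPU that ignores the rep prefix and executes plain bsf still computes the right answer. The helper below is an illustrative stand-in for the instruction, not GCC code.

```c
/* Count trailing zero bits of a nonzero value, i.e. the index of the
   least-significant set bit -- the result both tzcnt and bsf agree on.
   The argument must be nonzero, matching bsf's defined domain.  */
static unsigned my_ctz(unsigned x)
{
  unsigned n = 0;
  while ((x & 1u) == 0)   /* shift until the lowest set bit reaches bit 0 */
    {
      x >>= 1;
      n++;
    }
  return n;
}
```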
Re: [PATCH, i386, middle-end, testsuite] Intel TSX's HLE.
Hello guys,

After conversation on IRC with Richard, I've slightly updated the patch.

1. According to Richard's suggestion I moved PTA_HLE to `generic` march.
2. Applied and updated Andi's patch (see [1]).
3. Updated tests to use proper memory model combinations.
4. Added a one-sentence description to extend.texi to mention

Since I've changed the i386 part, I think Uros's OK is also needed.

ChangeLog entry:

2012-04-27  Kirill Yukhin
	    Andi Kleen

	* coretypes (MEMMODEL_MASK): New.
	* builtins.c (get_memmodel): Add val. Call target.memmodel_check
	and return new variable.
	(expand_builtin_atomic_exchange): Mask memmodel values.
	(expand_builtin_atomic_compare_exchange): Ditto.
	(expand_builtin_atomic_load): Ditto.
	(expand_builtin_atomic_store): Ditto.
	(expand_builtin_atomic_clear): Ditto.
	* doc/extend.texi: Mention port-dependent memory model flags.
	* config/i386/cpuid.h (bit_HLE): New.
	* config/i386/driver-i386.c (host_detect_local_cpu): Detect
	HLE support.
	* config/i386/i386-protos.h (ix86_generate_hle_prefix): New.
	* config/i386/i386-c.c (ix86_target_macros_internal): Set
	HLE defines.
	(ix86_target_string)<-mhle>: New.
	(ix86_valid_target_attribute_inner_p): Ditto.
	* config/i386/i386.c (ix86_target_string): New.
	(ix86_valid_target_attribute_inner_p): Ditto.
	(ix86_option_override_internal): New switch, set it
	enabled for generic and generic64.
	(ix86_print_operand): Generate HLE lock prefixes.
	(ix86_memmodel_check): New.
	(TARGET_MEMMODEL_CHECK): Ditto.
	* config/i386/i386.h (OPTION_ISA_HLE): Ditto.
	(IX86_HLE_ACQUIRE): Ditto.
	(IX86_HLE_RELEASE): Ditto.
	* config/i386/i386.h (ix86_generate_hle_prefix): Ditto.
	* config/i386/i386.opt (mhle): Ditto.
	* config/i386/sync.md (atomic_compare_and_swap): Pass success
	model to instruction emitter.
	(atomic_fetch_add): Ditto.
	(atomic_exchange): Ditto.
	(atomic_add): Ditto.
	(atomic_sub): Ditto.
	(atomic_): Ditto.
	(*atomic_compare_and_swap_doubledi_pic): Ditto.
	(atomic_compare_and_swap_single): Define and use argument
	for success model.
	(atomic_compare_and_swap_double): Ditto.
	* configure.ac: Check if assembler supports HLE prefixes.
	* configure: Regenerate.
	* config.in: Ditto.

testsuite/ChangeLog entry:

2012-04-27  Kirill Yukhin

	* gcc.target/i386/hle-cmpxchg-acq-1.c: New.
	* gcc.target/i386/hle-cmpxchg-rel-1.c: Ditto.
	* gcc.target/i386/hle-add-acq-1.c: Ditto.
	* gcc.target/i386/hle-add-rel-1.c: Ditto.
	* gcc.target/i386/hle-and-acq-1.c: Ditto.
	* gcc.target/i386/hle-and-rel-1.c: Ditto.
	* gcc.target/i386/hle-or-acq-1.c: Ditto.
	* gcc.target/i386/hle-or-rel-1.c: Ditto.
	* gcc.target/i386/hle-sub-acq-1.c: Ditto.
	* gcc.target/i386/hle-sub-rel-1.c: Ditto.
	* gcc.target/i386/hle-xadd-acq-1.c: Ditto.
	* gcc.target/i386/hle-xadd-rel-1.c: Ditto.
	* gcc.target/i386/hle-xchg-acq-1.c: Ditto.
	* gcc.target/i386/hle-xchg-rel-1.c: Ditto.
	* gcc.target/i386/hle-xor-acq-1.c: Ditto.
	* gcc.target/i386/hle-xor-rel-1.c: Ditto.

Bootstrap passing. HLE tests passing. Patch attached.

OK?

[1] - http://gcc.gnu.org/ml/gcc-patches/2012-04/msg01538.html

Thanks, K

On Thu, Apr 19, 2012 at 9:18 PM, Uros Bizjak wrote:
> On Thu, Apr 19, 2012 at 5:21 PM, Kirill Yukhin wrote:
>> Folks,
>> Thanks a lot for prompts!
>> I've updated my patch, so comparing to previous it is:
>> - have dedicated hook var, to define target dep. bits in mem model.
>> Check (and warning) is performed in get_memmodel
>> - prefix emit performed through dedicated operand type (%K)
>
> +#else
> + fputs ("\n" ASM_BYTE "0xf3\n\t");
> +#endif
>
> Ouch, my bad, you need to add file argument here (no need to repost
> patch because of this). FYI, you have wrong indentation here.
>
> Otherwise, OK as far as x86 is concerned, but you will need separate
> approval for middle-end part.
>
> Thanks,
> Uros.

hle-5.gcc.patch
Description: Binary data
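The MEMMODEL_MASK scheme the ChangeLog introduces can be sketched in plain C: the low bits of the memory-model operand hold the standard C11 order, and bits above the mask carry target-specific flags such as HLE acquire/release, which get_memmodel strips before validating the base order. The concrete constants below are illustrative assumptions based on the patch description, not necessarily the committed values.

```c
/* Sketch of splitting a memory-model operand into its standard order and
   target bits.  Constants are assumptions for illustration.  */
#define MEMMODEL_MASK ((1 << 16) - 1)   /* low bits: standard C11 order */
#define HLE_ACQUIRE   (1 << 16)         /* hypothetical target flag */
#define HLE_RELEASE   (1 << 17)         /* hypothetical target flag */

enum { RELAXED = 0, ACQUIRE = 2, RELEASE = 3, SEQ_CST = 5 };

static int base_order(int model)  { return model & MEMMODEL_MASK; }
static int target_bits(int model) { return model & ~MEMMODEL_MASK; }
```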
Re: [PATCH] teach emit_store_flag to use clz/ctz
On 27/04/2012 13:16, Richard Guenther wrote:
> In optabs.c we compare the CTZ_DEFINED_VALUE_AT_ZERO against two,
> is != 0 really what you want here? The docs suggest to me
> that as you are using the optab below you should compare against two, too.

Interesting, first time I hear about this... ... except that no target sets the macros to 2, and all of them could (as far as I could see). Looks like the code trumps the documentation; how does this look?

2012-04-27  Paolo Bonzini

	* optabs.c (expand_ffs): Check CTZ_DEFINED_VALUE_AT_ZERO against 1.
	* doc/tm.texi (Misc): Invert meaning of 1 and 2 for
	CLZ_DEFINED_VALUE_AT_ZERO and CTZ_DEFINED_VALUE_AT_ZERO.

Index: optabs.c
===
--- optabs.c	(revision 186903)
+++ optabs.c	(working copy)
@@ -2764,7 +2764,7 @@ expand_ffs
       if (!temp)
	goto fail;
 
-      defined_at_zero = (CTZ_DEFINED_VALUE_AT_ZERO (mode, val) == 2);
+      defined_at_zero = (CTZ_DEFINED_VALUE_AT_ZERO (mode, val) == 1);
     }
   else if (optab_handler (clz_optab, mode) != CODE_FOR_nothing)
     {
@@ -2773,7 +2773,7 @@ expand_ffs
       if (!temp)
	goto fail;
 
-      if (CLZ_DEFINED_VALUE_AT_ZERO (mode, val) == 2)
+      if (CLZ_DEFINED_VALUE_AT_ZERO (mode, val) == 1)
	{
	  defined_at_zero = true;
	  val = (GET_MODE_PRECISION (mode) - 1) - val;

Index: doc/tm.texi
===
--- doc/tm.texi	(revision 186903)
+++ doc/tm.texi	(working copy)
@@ -10640,9 +10640,9 @@
 for @code{clz} or @code{ctz} with a zero operand.
 A result of @code{0} indicates the value is undefined.
 If the value is defined for only the RTL expression, the macro should
-evaluate to @code{1}; if the value applies also to the corresponding optab
+evaluate to @code{2}; if the value applies also to the corresponding optab
 entry (which is normally the case if it expands directly into
-the corresponding RTL), then the macro should evaluate to @code{2}.
+the corresponding RTL), then the macro should evaluate to @code{1}.
 In the cases where the value is defined, @var{value} should be set to
 this value.

plus tweaking this patch to reject 2 as well.
> What about cost considerations? We only seem to have the general
> "branches are expensive" metric - but ctz/clz may be prohibitively expensive
> themselves, no?

Yeah, that's a general problem with this kind of trick. In general however clz/ctz is getting less and less expensive, so I don't think it is worrisome (at least at the beginning of stage 1). We can add rtx_costs checks later. Among the architectures I know, only i386 has an expensive bsf/bsr, but it also has sete/setne which GCC will use instead of this trick. Looking at rtx_costs, nothing seems to mark clz/ctz as prohibitively expensive (Xtensa does, but only in the case when the optab handler will not exist). I realize though that this is not a particularly good statistic, since the compiler would not generate them out of its hat until now.

Paolo
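The expand_ffs path under discussion can be modelled in portable C: on a target whose ctz is defined to return -1 at zero, ffs(x) reduces to ctz(x) + 1 with no conditional at all. Both helpers below are illustrative models, not GCC internals.

```c
/* Models a target where CTZ_DEFINED_VALUE_AT_ZERO yields -1: the count of
   trailing zeros for nonzero x, and -1 for x == 0.  */
static int ctz_defined_at_zero(unsigned x)
{
  if (x == 0)
    return -1;
  int n = 0;
  while (!(x & 1u))
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* ffs: index (1-based) of the least-significant set bit, 0 for no bits.
   With ctz(0) == -1 the +1 makes the zero case fall out for free.  */
static int my_ffs(unsigned x)
{
  return ctz_defined_at_zero(x) + 1;
}
```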
Re: [PATCH, tree-optimization] Fix for PR 52868
On Wed, Apr 25, 2012 at 6:41 PM, Richard Guenther wrote: > On Wed, Apr 25, 2012 at 4:32 PM, Igor Zamyatin wrote: >> On Wed, Apr 25, 2012 at 1:14 PM, Richard Guenther >> wrote: >>> On Wed, Apr 25, 2012 at 10:56 AM, Igor Zamyatin wrote: Hi all! I'd like to post for review the patch which makes some costs adjusting in get_computation_cost_at routine in ivopts part. As mentioned in the PR changes also fix the bwaves regression from PR 52272. Also changes introduce no degradations on spec2000/2006 and EEMBC1.1/2.0(this was measured on Atom) on x86 Bootstrapped and regtested on x86. Ok to commit? >>> >>> I can't make sense of the patch and the comment does not help. >>> >>> + diff_cost = cost.cost; >>> cost.cost /= avg_loop_niter (data->current_loop); >>> + add_cost_val = add_cost (TYPE_MODE (ctype), data->speed); >>> + /* Do cost correction if address cost is small enough >>> + and difference cost is high enough. */ >>> + if (address_p && diff_cost > add_cost_val >>> + && get_address_cost (symbol_present, var_present, >>> + offset, ratio, cstepi, >>> + TYPE_MODE (TREE_TYPE (utype)), >>> + TYPE_ADDR_SPACE (TREE_TYPE (utype)), >>> + speed, stmt_is_after_inc, >>> + can_autoinc).cost <= add_cost_val) >>> + cost.cost += add_cost_val; >>> >>> Please explain more thoroughly. 
>>> It also would seem to be better to add
>>> an extra case, as later code does
>>
>> For example for such code
>>
>> for (j=0; j> for (i=0; i> sum += ptr->a[j][i] * ptr->c[k][i];
>> }
>>
>> we currently have the following gimple on x86 target (I provided a piece
>> of the phase output):
>>
>> # ivtmp.13_30 = PHI
>> D.1748_34 = (void *) ivtmp.13_30;
>> D.1722_7 = MEM[base: D.1748_34, offset: 0B];
>> D.1750_36 = ivtmp.27_28;
>> D.1751_37 = D.1750_36 + ivtmp.13_30;  <-- we got non-invariant add
>>                                           which is not taken into account
>>                                           currently in the cost model
>> D.1752_38 = (void *) D.1751_37;
>> D.1753_39 = (sizetype) k_8(D);
>> D.1754_40 = D.1753_39 * 800;
>> D.1723_9 = MEM[base: D.1752_38, index: D.1754_40, offset: 16000B];
>> ...
>>
>> With the proposed fix we produce:
>>
>> # ivtmp.14_30 = PHI
>> D.1749_34 = (struct S *) ivtmp.25_28;
>> D.1722_7 = MEM[base: D.1749_34, index: ivtmp.14_30, offset: 0B];
>> D.1750_35 = (sizetype) k_8(D);
>> D.1751_36 = D.1750_35 * 800;
>> D.1752_37 = ptr_6(D) + D.1751_36;
>> D.1723_9 = MEM[base: D.1752_37, index: ivtmp.14_30, offset: 16000B];
>>
>> which is more effective on platforms where the address cost is cheaper
>> than the cost of an addition operation. That's basically what this
>> adjustment is for.
>
> If we generally miss to account for the add then why is the adjustment
> conditional on diff_cost > add_cost and address_cost <= add_cost?
>
> Is this a new heuristic or a fix for not accurately computing the cost for
> the stmts we generate?

I'd say this is closer to a heuristic, since diff_cost > add_cost is an attempt to catch the case with a non-invariant add produced by pointer difference, and address_cost <= add_cost leaves out the cases with cheap address operations.

>
> Richard.
>
>> So the comment in the source code now looks as follows
>>
>> /* Do cost correction when address difference produces
>> additional non-invariant add operation which is less
>> profitable if address cost is cheaper than cost of add.
*/

>
>>> /* Now the computation is in shape symbol + var1 + const + ratio * var2.
>>> (symbol/var1/const parts may be omitted). If we are looking for an
>>> address, find the cost of addressing this. */
>>> if (address_p)
>>> return add_costs (cost,
>>> get_address_cost (symbol_present, var_present,
>>> offset, ratio, cstepi,
>>> TYPE_MODE (TREE_TYPE (utype)),
>>> TYPE_ADDR_SPACE (TREE_TYPE (utype)),
>>> speed, stmt_is_after_inc,
>>> can_autoinc));
>>>
>>> thus refactoring the code a bit would make it possible to CSE the
>>> get_address_cost call and eventually make it clearer what the code does.
>>
>> 'offset' could be changed between the two calls of get_address_cost, so
>> such refactoring looks useless.
>>
>> New patch (only the comment was changed) attached. Changelog was
>> changed as well.
>>
>>>
>>> Richard.
>>>
>>
>> Changelog:
>>
>> 2012-04-26  Yuri Rumyantsev
>>
>> * tree-ssa-loop-ivopts.c (get_computation_cost_at): Adjust
>> cost model when address difference produces additional
>> non-invariant add operation whi
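A hypothetical reduction of the loop quoted above, exhibiting the pattern the heuristic targets: the inner access can be rewritten either through a pointer difference (costing an extra non-invariant add) or with base-plus-index addressing. The bounds and struct layout here are our illustrative assumptions, not taken from the PR.

```c
/* Hypothetical reduction of the PR 52868 pattern; array sizes are
   illustrative assumptions only.  */
struct S { double a[4][8]; double c[4][8]; };

double reduce(struct S *ptr, int k)
{
  double sum = 0.0;
  for (int j = 0; j < 4; j++)
    for (int i = 0; i < 8; i++)
      sum += ptr->a[j][i] * ptr->c[k][i];
  return sum;
}

/* Tiny driver: every a element 1.0, every c element 2.0, so the sum is
   4 * 8 * (1.0 * 2.0) = 64.0.  */
double demo(void)
{
  static struct S s;
  for (int j = 0; j < 4; j++)
    for (int i = 0; i < 8; i++)
      {
        s.a[j][i] = 1.0;
        s.c[j][i] = 2.0;
      }
  return reduce(&s, 0);
}
```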
Re: [C11-atomic] [patch] gimple atomic statements
On 04/27/2012 04:37 AM, Richard Guenther wrote:

Since integral atomics are always of an unsigned type, I could switch over and use 'unsigned size' instead of 'tree fntype' for them (I will rename it), but then things may be more complicated when dealing with generic atomics... those can be structure or array types and I was planning to allow leaving the type in case I discover something useful I can do with it. It may ultimately turn out that the real type isn't going to matter, in which case I will remove it and replace it with an unsigned int for size.

So it eventually will support variable-size types?

We support arbitrary sized objects now for exchange, compare_exchange, load, and store. I just haven't added the support to gimple_atomic yet. That's next on my list.

And the reason memmodel is a tree is because, as ridiculous as it seems, it can ultimately be a runtime value. Even barring that, it shows up as a variable after inlining before various propagation engines run, especially in the C++ templates. So it must be a tree.

Ick. That sounds gross. So, if it ends up a variable at the time we generate assembly we use a "conservative" constant value for it?

Indeed, ICK, and yes, I make it SEQ_CST at compile time if a variable value gets all the way through to codegen.

Hmm, ok. So you are changing GENERIC in fact. I suppose _Atomic is restricted to global variables. Can I attach _Atomic to allocated storage via pointer casts? Consider writing up the semantics of _Atomic in generic.texi please.

Yes, I have a start on implementing _Atomic which adds the bit to the type. And no, it isn't restricted to globals... it just indicates said hunk of memory must be accessed in an atomic manner. You can use it on automatics if you want... I believe _Atomic can also be used in casts as well, so an atomic call may have to be injected into a dereference expression as well to load or update memory. I'll update .texi along with the patch when I get to it.
Well, they are calls with a very well-defined set of side-effects. Otherwise not representing them as calls would be a waste of time. Thus, no - they do not need to be considered "calls" everywhere (well, treating them the same as calls should be conservative) but treating them as "atomics" even if they were expanded as calls needs to be correct.

Yeah, I was starting with them as calls just to be conservative and get the same behaviour, then refine them by examining each GIMPLE_ATOMIC use in detail.

Looks like a hierarchy "no target or anything else" -> "target" -> "expr" -> "memorder" would work, with only "lhs" being optional and present everywhere. Of course the hierarchy only makes sense for things that are not trees (I was thinking of memorder originally - but that thought fell apart). So in the end, apart from abstracting a base for FENCE, the flat hierarchy makes sense (all but FENCE have a type / size).

Is it really worth it to have 2 different types of gimple_atomic nodes in order to avoid having an unused type field in atomic_fence? atomic_fence has an order field, so it requires the use of op[0] in order for gimple_atomic_order() to work properly; we can't have all the other gimple nodes inherit from that, so we'd need a base which basically has just the KIND in it, and target/no-target nodes inherit from that. So we'd have:

              / gimple_atomic_fence
  atomic_base
              \ gimple_atomic_all_others

then all the checks for is_gimple_atomic() have to look for both gimple_atomic_all_others and gimple_atomic_fence.

Perhaps a better option is to shift all the operands by one since the type is a tree... ie, if there is a target, there is also a type entry, so op[0] remains ORDER, op[1] is the newly inserted TYPE, op[2] then becomes the TARGET, and op[3...], when present, are all shifted over by one. The type can be removed from the structure and now there would be no wastage. Is that reasonable?

Andrew
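The point above about _Atomic not being restricted to globals can be illustrated with plain C11, using the library spellings rather than the GIMPLE representation: the qualifier marks the object itself, so automatics work just as well as file-scope variables.

```c
#include <stdatomic.h>

_Atomic int counter;   /* file-scope atomic object */

int f(void)
{
  _Atomic int local = 0;        /* automatic storage is fine too */
  atomic_fetch_add(&local, 1);  /* read-modify-write on the automatic */
  atomic_fetch_add(&counter, 1);
  return atomic_load(&local);
}
```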
Re: set the correct block info for the call stmt in fnsplit (issue6111050)
> On Fri, Apr 27, 2012 at 12:50 PM, Eric Botcazou wrote:
> >> We do not depend on the block structure any more when dealing with
> >> stack layout for variables in GCC 4.7.0 and above. I am not saying
> >> your patch is incorrect or not needed. Just it will not have an
> >> effect on variable stack layout.
> >
> > It might be worth backporting to the 4.6 branch though, these stack layout
> > issues are very nasty.
>
> What about not setting a BLOCK on the call stmt? That said, I can't see

I recall that not setting the block did lead to an ICE...

> how outlining a SESE region that starts at the beginning of the function
> is not conservatively using the outermost BLOCK ... so, is the bug not
> elsewhere?

... so I made exactly the same conclusion. The problem is with stack vars allocated in the header that survive till the tail part?

Honza

> Richard.
>
> > --
> > Eric Botcazou
Re: [PATCH] Support for known unknown alignment
Hi, On Tue, Apr 24, 2012 at 12:31:38PM +0200, Martin Jambor wrote: > Hi, > > On Mon, Apr 23, 2012 at 03:30:19PM +0200, Richard Guenther wrote: > > On Mon, 23 Apr 2012, Martin Jambor wrote: > > > > > Hi, > > > > > > On Mon, Apr 23, 2012 at 12:50:51PM +0200, Richard Guenther wrote: > > > > On Fri, 20 Apr 2012, Martin Jambor wrote: > > > > > > > > > Hi, > > > > > > > > > > two days ago I talked to Richi on IRC about the functions to determine > > > > > the expected alignment of objects and pointers we have and he > > > > > suggested that get_object_alignment_1 and get_pointer_alignment_1 > > > > > should return whether the alignment is actually known and return the > > > > > actual alignment in a reference variable (as they currently return > > > > > bitpos). > > > > > > ... > The testsuite differences I got on Friday were probably noise, tonight > (on an updated svn tree) I did not get any on ppc64-linux, > x86_64-linux or i686-linux. Considering that and the OK above, I'm > going to bootstrap and test also on sparc64-linux and ia64-linux and > if those pass too, I'll commit the patch tomorrow, unless there are > any objections. > There is a Fortran testsuite that, on sparc64-linux, showed that my clever simplification of get_object_alignment_1 handling of TARGET_MEM_REFs was actually very dumb. So this is the working patch, successfully bootstrapped and tested on x86_64-linux (all languages and Ada), i686-linux (likewise), sparc64-linux (all languages - Java + Ada), ia64-linux (all languages) and ppc64-linux (likewise) and tested on hppa-linux (C and C++ only). I still consider the patch approved because it actually changes less stuff but do not want to commit it before a weekend so intend to do it next week. Thanks, Martin 2012-04-25 Martin Jambor * builtins.c (get_object_alignment_1): Return whether we can determine the alignment or conservatively assume byte alignment. Return the alignment by reference. Use get_pointer_alignment_1 for dereference alignment. 
(get_pointer_alignment_1): Return whether we can determine the alignment or conservatively assume byte alignment. Return the alignment by reference. Use get_ptr_info_alignment to get SSA name alignment. (get_object_alignment): Update call to get_object_alignment_1. (get_object_or_type_alignment): Likewise, fall back to type alignment only when it returned false. (get_pointer_alignment): Update call to get_pointer_alignment_1. * fold-const.c (get_pointer_modulus_and_residue): Update call to get_object_alignment_1. * ipa-prop.c (ipa_modify_call_arguments): Update call to get_pointer_alignment_1. * tree-sra.c (build_ref_for_offset): Likewise, fall back to the type of MEM_REF or TARGET_MEM_REF only when it returns false. * tree-ssa-ccp.c (get_value_from_alignment): Update call to get_object_alignment_1. (ccp_finalize): Use set_ptr_info_alignment. * tree.h (get_object_alignment_1): Update declaration. (get_pointer_alignment_1): Likewise. * gimple-pretty-print.c (dump_gimple_phi): Use get_ptr_info_alignment. (dump_gimple_stmt): Likewise. * tree-flow.h (ptr_info_def): Updated comments of fields align and misalign. (get_ptr_info_alignment): Declared. (mark_ptr_info_alignment_unknown): Likewise. (set_ptr_info_alignment): Likewise. (adjust_ptr_info_misalignment): Likewise. * tree-ssa-address.c (copy_ref_info): Use new access functions to get and set alignment of SSA names. * tree-ssa-loop-ivopts.c (rewrite_use_nonlinear_expr): Call mark_ptr_info_alignment_unknown. * tree-ssanames.c (get_ptr_info_alignment): New function. (mark_ptr_info_alignment_unknown): Likewise. (set_ptr_info_alignment): Likewise. (adjust_ptr_info_misalignment): Likewise. (get_ptr_info): Call mark_ptr_info_alignment_unknown. * tree-vect-data-refs.c (vect_create_addr_base_for_vector_ref): Likewise. (bump_vector_ptr): Likewise. * tree-vect-stmts.c (create_array_ref): Use set_ptr_info_alignment. (vectorizable_store): Likewise. (vectorizable_load): Likewise. 
Index: src/gcc/builtins.c === *** src.orig/gcc/builtins.c --- src/gcc/builtins.c *** called_as_built_in (tree node) *** 263,270 return is_builtin_name (name); } ! /* Compute values M and N such that M divides (address of EXP - N) and !such that N < M. Store N in *BITPOSP and return M. Note that the address (and thus the alignment) computed here is based on the address to which a symbol resolves, whereas DECL_ALIGN is based --- 263,272 return is_builtin_name (name); } ! /* Compute values M and N such that M divides (address of EXP - N
Re: [PATCH] Proper use of decl_function_context in dwar2out.c
Hi,

I'd like to drag some attention to this bug again; it is the only ICE when LTO building Firefox with debug info, and the problem also happens with 4.7, so it would be nice to have this fixed for 4.7.1.

On Mon, Mar 12, 2012 at 11:51:05AM +0100, Richard Guenther wrote:
> On Thu, Mar 8, 2012 at 12:18 PM, Jakub Jelinek wrote:
> > On Thu, Mar 08, 2012 at 12:06:46PM +0100, Martin Jambor wrote:
> >> /* For local statics lookup proper context die. */
> >> - if (TREE_STATIC (decl) && decl_function_context (decl))
> >> - context_die = lookup_decl_die (DECL_CONTEXT (decl));
> >> + if (TREE_STATIC (decl) &&
> >> + (ctx_fndecl = decl_function_context (decl)) != NULL_TREE)
> >> + context_die = lookup_decl_die (ctx_fndecl);
> >
> > The formatting is wrong, && shouldn't be at the end of line.
> > For the rest I'll defer to Jason, not sure what exactly we want to do there.
>
> This hunk has been added by Honza:
>
> I don't think the patch is right. Suppose you have a function-local
> class declaration with a static member. Surely the context DIE
> you want to use is still the class one, not the function one.

Let me just briefly re-iterate what I have already written before. The above cannot happen because function-local classes are not allowed to have static members. Also, the actual problem happens when we attempt to generate debug info for a VMT of a class defined within a function. I do not think it really matters whether or what context it gets...

> So, I'm not sure why we should not be able to create a testcase
> that fails even without LTO. I think using get_context_die here
> would be more appropriate. Or restrict this to DECL_CONTEXT (decl)
> == FUNCTION_DECL.

...and thus I am fine with this as well, which is what the patch below does. It now also has a testcase, LTO builds the testcase and I am currently bootstrapping it and LTOing Firefox with it.
But I would be equally happy with my original patch, feeding lookup_decl_die with the result of decl_function_context.

OK for trunk and the 4.7 branch?

Thanks,

Martin

2012-04-27  Martin Jambor

	PR lto/53138
	* dwarf2out.c (dwarf2out_decl): Only lookup die representing context
	of a variable when the context is a function.

	* gcc/testsuite/g++.dg/lto/pr52605_0.C: New test.

Index: src/gcc/dwarf2out.c
===
--- src.orig/gcc/dwarf2out.c
+++ src/gcc/dwarf2out.c
@@ -19880,7 +19880,9 @@ dwarf2out_decl (tree decl)
     return;
 
   /* For local statics lookup proper context die.  */
-  if (TREE_STATIC (decl) && decl_function_context (decl))
+  if (TREE_STATIC (decl)
+      && DECL_CONTEXT (decl)
+      && TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL)
     context_die = lookup_decl_die (DECL_CONTEXT (decl));
 
   /* If we are in terse mode, don't generate any DIEs to represent any

Index: src/gcc/testsuite/g++.dg/lto/pr52605_0.C
===
--- /dev/null
+++ src/gcc/testsuite/g++.dg/lto/pr52605_0.C
@@ -0,0 +1,39 @@
+// { dg-lto-do link }
+// { dg-lto-options {{-flto -g}} }
+
+extern "C" void abort (void);
+
+class A
+{
+public:
+  virtual int foo (int i);
+};
+
+int A::foo (int i)
+{
+  return i + 1;
+}
+
+int __attribute__ ((noinline,noclone)) get_input(void)
+{
+  return 1;
+}
+
+int main (int argc, char *argv[])
+{
+
+  class B : public A
+  {
+  public:
+    int bar (int i)
+    {
+      return foo (i) + 2;
+    }
+  };
+  class B b;
+
+  if (b.bar (get_input ()) != 4)
+    abort ();
+  return 0;
+}
+
[PATCH] Remove is_hidden_global_store
This removes is_hidden_global_store in favor of two functions with more clear semantics. Bootstrapped and tested on x86_64-unknown-linux-gnu. Richard. 2012-04-27 Richard Guenther * tree-flow.h (is_hidden_global_store): Remove. * tree-ssa-sink.c (is_hidden_global_store): Likewise. * tree-ssa-alias.h (ref_may_alias_global_p): Declare. (stmt_may_clobber_global_p): Likewise. * tree-ssa-alias.c (ref_may_alias_global_p): New function. (stmt_may_clobber_global_p): Likewise. * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Call stmt_may_clobber_global_p. * tree-ssa-dse.c (dse_possible_dead_store_p): Likewise. Index: gcc/tree-flow.h === *** gcc/tree-flow.h (revision 186897) --- gcc/tree-flow.h (working copy) *** extern void maybe_remove_unreachable_han *** 795,803 /* In tree-ssa-pre.c */ void debug_value_expressions (unsigned int); - /* In tree-ssa-sink.c */ - bool is_hidden_global_store (gimple); - /* In tree-loop-linear.c */ extern void linear_transform_loops (void); extern unsigned perfect_loop_nest_depth (struct loop *); --- 795,800 Index: gcc/tree-ssa-alias.h === *** gcc/tree-ssa-alias.h(revision 186897) --- gcc/tree-ssa-alias.h(working copy) *** extern tree ao_ref_base (ao_ref *); *** 99,109 --- 99,111 extern alias_set_type ao_ref_alias_set (ao_ref *); extern bool ptr_deref_may_alias_global_p (tree); extern bool ptr_derefs_may_alias_p (tree, tree); + extern bool ref_may_alias_global_p (tree); extern bool refs_may_alias_p (tree, tree); extern bool refs_may_alias_p_1 (ao_ref *, ao_ref *, bool); extern bool refs_anti_dependent_p (tree, tree); extern bool refs_output_dependent_p (tree, tree); extern bool ref_maybe_used_by_stmt_p (gimple, tree); + extern bool stmt_may_clobber_global_p (gimple); extern bool stmt_may_clobber_ref_p (gimple, tree); extern bool stmt_may_clobber_ref_p_1 (gimple, ao_ref *); extern bool call_may_clobber_ref_p (gimple, tree); Index: gcc/tree-ssa-alias.c === *** gcc/tree-ssa-alias.c(revision 186897) --- gcc/tree-ssa-alias.c(working copy) *** 
ptr_deref_may_alias_ref_p_1 (tree ptr, a *** 328,333 --- 328,379 return true; } + /* Return true whether REF may refer to global memory. */ + + bool + ref_may_alias_global_p (tree ref) + { + tree base = get_base_address (ref); + if (DECL_P (base)) + return is_global_var (base); + else if (TREE_CODE (base) == MEM_REF + || TREE_CODE (base) == TARGET_MEM_REF) + return ptr_deref_may_alias_global_p (TREE_OPERAND (base, 0)); + return true; + } + + /* Return true whether STMT may clobber global memory. */ + + bool + stmt_may_clobber_global_p (gimple stmt) + { + tree lhs; + + if (!gimple_vdef (stmt)) + return false; + + /* ??? We can ask the oracle whether an artificial pointer + dereference with a pointer with points-to information covering + all global memory (what about non-address taken memory?) maybe + clobbered by this call. As there is at the moment no convenient + way of doing that without generating garbage do some manual + checking instead. + ??? We could make a NULL ao_ref argument to the various + predicates special, meaning any global memory. */ + + switch (gimple_code (stmt)) + { + case GIMPLE_ASSIGN: + lhs = gimple_assign_lhs (stmt); + return (TREE_CODE (lhs) != SSA_NAME + && ref_may_alias_global_p (lhs)); + case GIMPLE_CALL: + return true; + default: + return true; + } + } + /* Dump alias information on FILE. */ Index: gcc/tree-ssa-dce.c === *** gcc/tree-ssa-dce.c (revision 186897) --- gcc/tree-ssa-dce.c (working copy) *** mark_stmt_if_obviously_necessary (gimple *** 370,376 return; } ! if (is_hidden_global_store (stmt)) { mark_stmt_necessary (stmt, true); return; --- 370,376 return; } ! if (stmt_may_clobber_global_p (stmt)) { mark_stmt_necessary (stmt, true); return; Index: gcc/tree-ssa-dse.c === *** gcc/tree-ssa-dse.c (revision 186897) --- gcc/tree-ssa-dse.c (working copy) *** dse_possible_dead_store_p (gimple stmt, *** 169,175 just pretend the stmt makes itself dead. Otherwise fail. */ if (!temp) { ! 
if (is_hidden_global_store (stmt)) return false; temp = stmt; --- 169,175 just pretend the stmt makes itself dead. Otherwise fail. */ if (!temp)
Re: [PATCH] teach emit_store_flag to use clz/ctz
On Fri, 27 Apr 2012, Paolo Bonzini wrote: > If the value at zero is outside the range [0, GET_MODE_BITSIZE (mode)), > "A != 0" and "A == 0" can be compiled to clz/ctz followed by a subtraction > or one's complement (only for A != 0) and a shift. This trick can be > used effectively on PPC (though there are other problems in the machine > description to fix first). > > Bootstrapped/regtested x86_64-pc-linux-gnu, and tested manually on PPC > with some changes to the machine description. Ok for mainline? See below. > Paolo > > 2012-04-27 Paolo Bonzini > > PR 53087 > * expmed.c (emit_store_flag): Generalize to support cases > where the result is not in the sign bit. Add a trick > involving the value at zero of clz or ctz. > > Index: expmed.c > === > --- expmed.c (revisione 186859) > +++ expmed.c (copia locale) > @@ -5419,6 +5419,7 @@ >enum rtx_code rcode; >rtx subtarget; >rtx tem, last, trueval; > + int shift; > >tem = emit_store_flag_1 (target, code, op0, op1, mode, unsignedp, > normalizep, > target_mode); > @@ -5612,10 +5613,11 @@ > false) <= 1 || (code != LE && code != GT > return 0; > > - /* Try to put the result of the comparison in the sign bit. Assume we > can't > - do the necessary operation below. */ > + /* Assume we can't do the necessary operation below, so try to put the > + result of the comparison in a known bit, given by SHIFT. */ > >tem = 0; > + shift = GET_MODE_BITSIZE (mode) - 1; > >/* To see if A <= 0, compute (A | (A - 1)). A <= 0 iff that result has > the sign bit set. */ > @@ -5650,25 +5652,22 @@ > >if (code == EQ || code == NE) > { > - /* For EQ or NE, one way to do the comparison is to apply an operation > - that converts the operand into a positive number if it is nonzero > - or zero if it was originally zero. Then, for EQ, we subtract 1 and > - for NE we negate. This puts the result in the sign bit. Then we > - normalize with a shift, if needed. 
> + /* Look for an operation that converts the operand into a positive > + number if it is nonzero or zero if it was originally zero. Then, > + for EQ, we subtract 1 and for NE we negate. This puts the result > + in the sign bit. Then we normalize with a shift, if needed. > > - Two operations that can do the above actions are ABS and FFS, so try > - them. If that doesn't work, and MODE is smaller than a full word, > - we can use zero-extension to the wider mode (an unsigned conversion) > - as the operation. */ > - > - /* Note that ABS doesn't yield a positive number for INT_MIN, but > + ABS can do this; it doesn't yield a positive number for INT_MIN, but >that is compensated by the subsequent overflow when subtracting > - one / negating. */ > + one / negating. If that doesn't work, and MODE is smaller than > + a full word, we can use zero-extension to the wider mode (an > + unsigned conversion) as the operation. > > + FFS could also do this, but we use a more versatile trick with > + CLZ/CTZ below. */ > + >if (optab_handler (abs_optab, mode) != CODE_FOR_nothing) > tem = expand_unop (mode, abs_optab, op0, subtarget, 1); > - else if (optab_handler (ffs_optab, mode) != CODE_FOR_nothing) > - tem = expand_unop (mode, ffs_optab, op0, subtarget, 1); >else if (GET_MODE_SIZE (mode) < UNITS_PER_WORD) > { > tem = convert_modes (word_mode, mode, op0, 1); > @@ -5684,6 +5683,69 @@ > tem = expand_unop (mode, neg_optab, tem, subtarget, 0); > } > > + if (tem == 0) > +{ > + int clz_zero, ctz_zero, if_zero; > + > + /* For EQ or NE, clz and ctz may give a unique value for zero that > is > + not in the [0, GET_MODE_BITSIZE (mode)) range. The result can > be > + massaged to put the result in a known bit. */ > + > + if (!CLZ_DEFINED_VALUE_AT_ZERO (mode, clz_zero)) > +clz_zero = 0; > + if (!CTZ_DEFINED_VALUE_AT_ZERO (mode, ctz_zero)) > +ctz_zero = 0; In optabs.c we compare the CTZ_DEFINED_VALUE_AT_ZERO against two, is != 0 really what you want here? 
The docs suggest to me that as you are using the optab below you should compare against two, too. What about cost considerations? We only seem to have the general "branches are expensive" metric - but ctz/clz may be prohibitively expensive themselves, no? Richard. > + /* Prefer a negative result if normalize == -1, since that does not > + require an additional left shift. */ > + > + if (normalizep == -1) > +{ > + if (ctz_zero < 0 && clz_zero >= GET_MODE_BITSIZE (mode)) > +clz_zero = 0; > + if (clz_zero < 0 && ctz_zero >= G
Re: set the correct block info for the call stmt in fnsplit (issue6111050)
On Fri, Apr 27, 2012 at 12:50 PM, Eric Botcazou wrote: >> We do not depend on the block structure any more when dealing with >> stack layout for variables in GCC 4.7.0 and above. I am not saying >> your patch is incorrect or not needed. Just it will not have an >> effect on variable stack layout. > > It might be worth backporting to the 4.6 branch though, these stack layout > issues are very nasty. What about not setting a BLOCK on the call stmt? That said, I can't see how outlining a SESE region that starts at the beginning of the function is not conservatively using the outermost BLOCK ... so, is the bug not elsewhere? Richard. > -- > Eric Botcazou
[Patch,4.7,committed]: Fix PR c/51527
http://gcc.gnu.org/viewcvs?view=revision&revision=186899 Applied as approved in http://gcc.gnu.org/ml/gcc/2012-04/msg00843.html Johann PR c/51527 * convert.c (convert_to_integer): Avoid infinite recursion for target-defined built-in types. Index: convert.c === --- convert.c (revision 186872) +++ convert.c (working copy) @@ -769,6 +769,7 @@ convert_to_integer (tree type, tree expr (Otherwise would recurse infinitely in convert. */ if (TYPE_PRECISION (typex) != inprec) { + tree otypex = typex; /* Don't do unsigned arithmetic where signed was wanted, or vice versa. Exception: if both of the original operands were @@ -806,10 +807,12 @@ convert_to_integer (tree type, tree expr typex = unsigned_type_for (typex); else typex = signed_type_for (typex); - return convert (type, -fold_build2 (ex_form, typex, - convert (typex, arg0), - convert (typex, arg1))); + + if (TYPE_PRECISION (otypex) == TYPE_PRECISION (typex)) + return convert (type, + fold_build2 (ex_form, typex, + convert (typex, arg0), + convert (typex, arg1))); } } }
Re: set the correct block info for the call stmt in fnsplit (issue6111050)
> We do not depend on the block structure any more when dealing with > stack layout for variables in GCC 4.7.0 and above. I am not saying > your patch is incorrect or not needed. Just it will not have an > effect on variable stack layout. It might be worth backporting to the 4.6 branch though, these stack layout issues are very nasty. -- Eric Botcazou
Re: [PATCH] teach phi-opt to produce -(a COND b)
On Fri, 27 Apr 2012, Paolo Bonzini wrote: > This patch teaches phiopt to look at phis whose arguments are -1 and 0, > and produce negated setcc statements. > > Bootstrapped/regtested x86_64-pc-linux-gnu, together with the patch > for pr53138. Ok for mainline? Ok. Thanks, Richard. > Paolo > > 2012-04-27 Paolo Bonzini > > * tree-ssa-phiopt.c (conditional_replacement): Replace PHIs > whose arguments are -1 and 0, by negating the result of the > conditional. > > 2012-04-27 Paolo Bonzini > > * gcc.c-torture/execute/20120427-2.c: New testcase. > * gcc.dg/tree-ssa/phi-opt-10.c: New testcase. > > > Index: tree-ssa-phiopt.c > === > --- tree-ssa-phiopt.c (revisione 186859) > +++ tree-ssa-phiopt.c (copia locale) > @@ -536,17 +536,21 @@ >gimple_stmt_iterator gsi; >edge true_edge, false_edge; >tree new_var, new_var2; > + bool neg; > >/* FIXME: Gimplification of complex type is too hard for now. */ >if (TREE_CODE (TREE_TYPE (arg0)) == COMPLEX_TYPE >|| TREE_CODE (TREE_TYPE (arg1)) == COMPLEX_TYPE) > return false; > > - /* The PHI arguments have the constants 0 and 1, then convert > - it to the conditional. */ > + /* The PHI arguments have the constants 0 and 1, or 0 and -1, then > + convert it to the conditional. */ >if ((integer_zerop (arg0) && integer_onep (arg1)) >|| (integer_zerop (arg1) && integer_onep (arg0))) > -; > +neg = false; > + else if ((integer_zerop (arg0) && integer_all_onesp (arg1)) > +|| (integer_zerop (arg1) && integer_all_onesp (arg0))) > +neg = true; >else > return false; > > @@ -558,7 +562,7 @@ > falls through into BB. > > There is a single PHI node at the join point (BB) and its arguments > - are constants (0, 1). > + are constants (0, 1) or (0, -1). > > So, given the condition COND, and the two PHI arguments, we can > rewrite this PHI into non-branching code: > @@ -585,12 +589,20 @@ > edge so that we know when to invert the condition below. 
*/ >extract_true_false_edges_from_block (cond_bb, &true_edge, &false_edge); >if ((e0 == true_edge && integer_zerop (arg0)) > - || (e0 == false_edge && integer_onep (arg0)) > + || (e0 == false_edge && !integer_zerop (arg0)) >|| (e1 == true_edge && integer_zerop (arg1)) > - || (e1 == false_edge && integer_onep (arg1))) > + || (e1 == false_edge && !integer_zerop (arg1))) > cond = fold_build1_loc (gimple_location (stmt), > - TRUTH_NOT_EXPR, TREE_TYPE (cond), cond); > +TRUTH_NOT_EXPR, TREE_TYPE (cond), cond); > > + if (neg) > +{ > + cond = fold_convert_loc (gimple_location (stmt), > + TREE_TYPE (result), cond); > + cond = fold_build1_loc (gimple_location (stmt), > + NEGATE_EXPR, TREE_TYPE (cond), cond); > +} > + >/* Insert our new statements at the end of conditional block before the > COND_STMT. */ >gsi = gsi_for_stmt (stmt); > Index: testsuite/gcc.c-torture/execute/20120427-2.c > === > --- testsuite/gcc.c-torture/execute/20120427-2.c (revisione 0) > +++ testsuite/gcc.c-torture/execute/20120427-2.c (revisione 0) > @@ -0,0 +1,38 @@ > +typedef struct sreal > +{ > + unsigned sig; /* Significant. */ > + int exp; /* Exponent. */ > +} sreal; > + > +sreal_compare (sreal *a, sreal *b) > +{ > + if (a->exp > b->exp) > +return 1; > + if (a->exp < b->exp) > +return -1; > + if (a->sig > b->sig) > +return 1; > + if (a->sig < b->sig) > +return -1; > + return 0; > +} > + > +sreal a[] = { > + { 0, 0 }, > + { 1, 0 }, > + { 0, 1 }, > + { 1, 1 } > +}; > + > +int main() > +{ > + int i, j; > + for (i = 0; i <= 3; i++) { > +for (j = 0; j < 3; j++) { > + if (i < j && sreal_compare(&a[i], &a[j]) != -1) abort(); > + if (i == j && sreal_compare(&a[i], &a[j]) != 0) abort(); > + if (i > j && sreal_compare(&a[i], &a[j]) != 1) abort(); > +} > + } > + return 0; > +} > Index: testsuite/gcc.dg/tree-ssa/phi-opt-10.c > === > --- testsuite/gcc.dg/tree-ssa/phi-opt-10.c(revisione 0) >
Re: [patch] Fix cygwin ada install [was Re: Yet another issue with gcc current trunk with ada on cygwin]
On 06/04/2012 17:27, Pascal Obry wrote: > > Back on this! It turns out that this breaks the shared Ada runtime. > Indeed, exported variables in Ada packages in a DLL are only accessible > when linking against the DLL (thanks to runtime pseudo relocs). > > With the patch applied it is not possible to build any Ada application > using the shared runtime. This is a very serious problem, the patch > should be reverted. > > Sorry for not having remembered this before, and why linking > against the DLL was so important! > PING? -- Pascal Obry -- gpg --keyserver keys.gnupg.net --recv-key F949BD3B
Re: [PATCH] teach emit_store_flag to use clz/ctz
> Index: config/rs6000/rs6000.md > === > --- config/rs6000/rs6000.md (revisione 186859) > +++ config/rs6000/rs6000.md (copia locale) > @@ -2129,7 +2129,7 @@ > (define_expand "abssi2" >[(set (match_operand:SI 0 "gpc_reg_operand" "") > (abs:SI (match_operand:SI 1 "gpc_reg_operand" "")))] > - "" > + "TARGET_ISEL || TARGET_POWER" >" > { >if (TARGET_ISEL) > @@ -2137,11 +2137,6 @@ >emit_insn (gen_abssi2_isel (operands[0], operands[1])); >DONE; > } > - else if (! TARGET_POWER) > -{ > - emit_insn (gen_abssi2_nopower (operands[0], operands[1])); > - DONE; > -} > }") > > (define_insn "*abssi2_power" > @@ -2188,36 +2183,12 @@ > (match_dup 2)))] >"") > > -(define_insn_and_split "abssi2_nopower" > - [(set (match_operand:SI 0 "gpc_reg_operand" "=&r,r") > -(abs:SI (match_operand:SI 1 "gpc_reg_operand" "r,0"))) > - (clobber (match_scratch:SI 2 "=&r,&r"))] > - "! TARGET_POWER && ! TARGET_ISEL" > - "#" > - "&& reload_completed" > - [(set (match_dup 2) (ashiftrt:SI (match_dup 1) (const_int 31))) > - (set (match_dup 0) (xor:SI (match_dup 2) (match_dup 1))) > - (set (match_dup 0) (minus:SI (match_dup 0) (match_dup 2)))] > - "") > - > (define_insn "*nabs_power" >[(set (match_operand:SI 0 "gpc_reg_operand" "=r") > (neg:SI (abs:SI (match_operand:SI 1 "gpc_reg_operand" "r"] >"TARGET_POWER" >"nabs %0,%1") > > -(define_insn_and_split "*nabs_nopower" > - [(set (match_operand:SI 0 "gpc_reg_operand" "=&r,r") > -(neg:SI (abs:SI (match_operand:SI 1 "gpc_reg_operand" "r,0" > - (clobber (match_scratch:SI 2 "=&r,&r"))] > - "! 
TARGET_POWER" > - "#" > - "&& reload_completed" > - [(set (match_dup 2) (ashiftrt:SI (match_dup 1) (const_int 31))) > - (set (match_dup 0) (xor:SI (match_dup 2) (match_dup 1))) > - (set (match_dup 0) (minus:SI (match_dup 2) (match_dup 0)))] > - "") > - > (define_expand "neg2" >[(set (match_operand:SDI 0 "gpc_reg_operand" "") > (neg:SDI (match_operand:SDI 1 "gpc_reg_operand" "")))] > @@ -7710,40 +7681,13 @@ > (define_expand "absdi2" >[(set (match_operand:DI 0 "gpc_reg_operand" "") > (abs:DI (match_operand:DI 1 "gpc_reg_operand" "")))] > - "TARGET_POWERPC64" > + "TARGET_POWERPC64 && TARGET_ISEL" >" > { > - if (TARGET_ISEL) > -emit_insn (gen_absdi2_isel (operands[0], operands[1])); > - else > -emit_insn (gen_absdi2_internal (operands[0], operands[1])); > + emit_insn (gen_absdi2_isel (operands[0], operands[1])); >DONE; > }") > > -(define_insn_and_split "absdi2_internal" > - [(set (match_operand:DI 0 "gpc_reg_operand" "=&r,r") > -(abs:DI (match_operand:DI 1 "gpc_reg_operand" "r,0"))) > - (clobber (match_scratch:DI 2 "=&r,&r"))] > - "TARGET_POWERPC64 && !TARGET_ISEL" > - "#" > - "&& reload_completed" > - [(set (match_dup 2) (ashiftrt:DI (match_dup 1) (const_int 63))) > - (set (match_dup 0) (xor:DI (match_dup 2) (match_dup 1))) > - (set (match_dup 0) (minus:DI (match_dup 0) (match_dup 2)))] > - "") > - > -(define_insn_and_split "*nabsdi2" > - [(set (match_operand:DI 0 "gpc_reg_operand" "=&r,r") > -(neg:DI (abs:DI (match_operand:DI 1 "gpc_reg_operand" "r,0" > - (clobber (match_scratch:DI 2 "=&r,&r"))] > - "TARGET_POWERPC64 && !TARGET_ISEL" > - "#" > - "&& reload_completed" > - [(set (match_dup 2) (ashiftrt:DI (match_dup 1) (const_int 63))) > - (set (match_dup 0) (xor:DI (match_dup 2) (match_dup 1))) > - (set (match_dup 0) (minus:DI (match_dup 2) (match_dup 0)))] > - "") > - > (define_insn "muldi3" >[(set (match_operand:DI 0 "gpc_reg_operand" "=r,r") > (mult:DI (match_operand:DI 1 "gpc_reg_operand" "%r,r") > @@ -13100,13 +13044,9 @@ > operands[2], operands[3]); > } > > - /* 
For SNE, we would prefer that the xor/abs sequence be used for integers. > - For SEQ, likewise, except that comparisons with zero should be done > - with an scc insns. However, due to the order that combine see the > - resulting insns, we must, in fact, allow SEQ for integers. Fail in > - the cases we don't want to handle or are best handled by portable > - code. */ > - if (GET_CODE (operands[1]) == NE) > + /* Fail in the cases we don't want to handle or are best handled by > + portable code. */ > + if (GET_CODE (operands[1]) == NE || GET_CODE (operands[1]) == EQ) > FAIL; >if ((GET_CODE (operands[1]) == LT || GET_CODE (operands[1]) == LE > || GET_CODE (operands[1]) == GT || GET_CODE (operands[1]) == GE) > I included the RS6000 bits for completeness, but I'm not seeking approval now for them. I'll resubmit with proper testcases. Paolo
Re: [PATCH] Make sizetypes no longer sign-extending
> Ah, and all ACATS fails and > > -FAIL: gnat.dg/loop_optimization3.adb (test for excess errors) > -FAIL: gnat.dg/loop_optimization3.adb execution test > -FAIL: gnat.dg/test_8bitlong_overflow.adb (test for excess errors) > -FAIL: gnat.dg/test_8bitlong_overflow.adb execution test > > are fixed by for example > > [...] > > thus are because array TYPE_DOMAIN is built using unsigned sizetype > but these Ada testcases have array domains which really need signed > types. The above is of course a hack, but one that otherwise survives > bootstrap / test of all languages. Kind of a miracle if you ask me, but probably a reasonable way out for Ada. Thanks a lot for devising it. > Thus, we arrive at the following Ada regression status if the patch series > is applied (plus the above incremental patch): > > === acats tests === > > === acats Summary === > # of expected passes2320 > # of unexpected failures0 > Native configuration is x86_64-unknown-linux-gnu > > === gnat tests === > > > Running target unix/ > FAIL: gnat.dg/array11.adb (test for warnings, line 12) > FAIL: gnat.dg/object_overflow.adb (test for warnings, line 8) > FAIL: gnat.dg/renaming5.adb scan-tree-dump-times optimized "goto" 2 > FAIL: gnat.dg/return3.adb scan-assembler loc 1 6 > > === gnat Summary for unix/ === > > # of expected passes1093 > # of unexpected failures4 > # of expected failures 13 > # of unsupported tests 2 > > Running target unix//-m32 > FAIL: gnat.dg/array11.adb (test for warnings, line 12) > FAIL: gnat.dg/object_overflow.adb (test for warnings, line 8) > FAIL: gnat.dg/renaming5.adb scan-tree-dump-times optimized "goto" 2 > FAIL: gnat.dg/return3.adb scan-assembler loc 1 6 > > === gnat Summary for unix//-m32 === > > # of expected passes1093 > # of unexpected failures4 > # of expected failures 13 > # of unsupported tests 2 > > === gnat Summary === > > # of expected passes2186 > # of unexpected failures8 > # of expected failures 26 > # of unsupported tests 4 > > > Which I consider reasonable? 
Sure, no opposition by me to applying the whole set of patches. -- Eric Botcazou
Re: [PATCH] Make sizetypes no longer sign-extending
> To followup myself - bootstrap with just the 2nd patch is still > broken: > > /abuild/rguenther/obj2/./gcc/xgcc -B/abuild/rguenther/obj2/./gcc/ > -B/usr/local/x86_64-unknown-linux-gnu/bin/ > -B/usr/local/x86_64-unknown-linux-gnu/lib/ -isystem > /usr/local/x86_64-unknown-linux-gnu/include -isystem > /usr/local/x86_64-unknown-linux-gnu/sys-include-c -g -O2 -m32 -fpic > -W -Wall -gnatpg -nostdinc -m32 s-secsta.adb -o s-secsta.o > s-secsta.adb:501:4: error: size of variable 'System.Secondary_Stack.Chunk' > is too large > Chunk : aliased Chunk_Id (1, Static_Secondary_Stack_Size); > ^ > make[9]: *** [s-secsta.o] Error 1 Yes, because of a spurious TREE_OVERFLOW set on a sizetype calculation that cannot overflow. Not really fixable in isolation, so I didn't pursue. -- Eric Botcazou
Re: [PATCH] pr53138 - miscompilation of spaceship operator
On Fri, Apr 27, 2012 at 11:49 AM, Paolo Bonzini wrote: > The testcase is miscompiled to this: > > ... > movl (%rsi), %ecx > cmpl %ecx, (%rdi) > sbbl %edx, %edx > cmovbe %edx, %eax > ret > > but sbbl only preserves the carry flag, not the zero flag. I suppose > one could use different patterns with different CC modes, but this is > the safest fix. > > Bootstrapped/regtested x86_64-pc-linux-gnu, ok for mainline? OK. > I also reproduced this at least on 4.7; help would be welcome on > committing it to the branches. Thanks in advance! I can take care of this, but I won't be able to do this until Monday. Thanks, Uros.
[PATCH] teach phi-opt to produce -(a COND b)
This patch teaches phiopt to look at phis whose arguments are -1 and 0, and produce negated setcc statements. Bootstrapped/regtested x86_64-pc-linux-gnu, together with the patch for pr53138. Ok for mainline? Paolo 2012-04-27 Paolo Bonzini * tree-ssa-phiopt.c (conditional_replacement): Replace PHIs whose arguments are -1 and 0, by negating the result of the conditional. 2012-04-27 Paolo Bonzini * gcc.c-torture/execute/20120427-2.c: New testcase. * gcc.dg/tree-ssa/phi-opt-10.c: New testcase. Index: tree-ssa-phiopt.c === --- tree-ssa-phiopt.c (revisione 186859) +++ tree-ssa-phiopt.c (copia locale) @@ -536,17 +536,21 @@ gimple_stmt_iterator gsi; edge true_edge, false_edge; tree new_var, new_var2; + bool neg; /* FIXME: Gimplification of complex type is too hard for now. */ if (TREE_CODE (TREE_TYPE (arg0)) == COMPLEX_TYPE || TREE_CODE (TREE_TYPE (arg1)) == COMPLEX_TYPE) return false; - /* The PHI arguments have the constants 0 and 1, then convert - it to the conditional. */ + /* The PHI arguments have the constants 0 and 1, or 0 and -1, then + convert it to the conditional. */ if ((integer_zerop (arg0) && integer_onep (arg1)) || (integer_zerop (arg1) && integer_onep (arg0))) -; +neg = false; + else if ((integer_zerop (arg0) && integer_all_onesp (arg1)) + || (integer_zerop (arg1) && integer_all_onesp (arg0))) +neg = true; else return false; @@ -558,7 +562,7 @@ falls through into BB. There is a single PHI node at the join point (BB) and its arguments - are constants (0, 1). + are constants (0, 1) or (0, -1). So, given the condition COND, and the two PHI arguments, we can rewrite this PHI into non-branching code: @@ -585,12 +589,20 @@ edge so that we know when to invert the condition below. 
*/ extract_true_false_edges_from_block (cond_bb, &true_edge, &false_edge); if ((e0 == true_edge && integer_zerop (arg0)) - || (e0 == false_edge && integer_onep (arg0)) + || (e0 == false_edge && !integer_zerop (arg0)) || (e1 == true_edge && integer_zerop (arg1)) - || (e1 == false_edge && integer_onep (arg1))) + || (e1 == false_edge && !integer_zerop (arg1))) cond = fold_build1_loc (gimple_location (stmt), - TRUTH_NOT_EXPR, TREE_TYPE (cond), cond); +TRUTH_NOT_EXPR, TREE_TYPE (cond), cond); + if (neg) +{ + cond = fold_convert_loc (gimple_location (stmt), + TREE_TYPE (result), cond); + cond = fold_build1_loc (gimple_location (stmt), + NEGATE_EXPR, TREE_TYPE (cond), cond); +} + /* Insert our new statements at the end of conditional block before the COND_STMT. */ gsi = gsi_for_stmt (stmt); Index: testsuite/gcc.c-torture/execute/20120427-2.c === --- testsuite/gcc.c-torture/execute/20120427-2.c(revisione 0) +++ testsuite/gcc.c-torture/execute/20120427-2.c(revisione 0) @@ -0,0 +1,38 @@ +typedef struct sreal +{ + unsigned sig;/* Significant. */ + int exp; /* Exponent. */ +} sreal; + +sreal_compare (sreal *a, sreal *b) +{ + if (a->exp > b->exp) +return 1; + if (a->exp < b->exp) +return -1; + if (a->sig > b->sig) +return 1; + if (a->sig < b->sig) +return -1; + return 0; +} + +sreal a[] = { + { 0, 0 }, + { 1, 0 }, + { 0, 1 }, + { 1, 1 } +}; + +int main() +{ + int i, j; + for (i = 0; i <= 3; i++) { +for (j = 0; j < 3; j++) { + if (i < j && sreal_compare(&a[i], &a[j]) != -1) abort(); + if (i == j && sreal_compare(&a[i], &a[j]) != 0) abort(); + if (i > j && sreal_compare(&a[i], &a[j]) != 1) abort(); +} + } + return 0; +} Index: testsuite/gcc.dg/tree-ssa/phi-opt-10.c === --- testsuite/gcc.dg/tree-ssa/phi-opt-10.c (revisione 0) +++ testsuite/gcc.dg/tree-ssa/phi-opt-10.c (revisione 0) @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O1 -fdump-tree-optimized" } */ + +int nem1_phi (unsigned long a) { return a ? -1 : 0; } +int eqm1_phi (unsigned long a) { return a ? 
0 : -1; } + +int spaceship1 (long a) { return a > 0 ? 1 : a < 0 ? -1 : 0; } +int spaceship2 (long a) { return a > 0 ? 1 : a == 0 ? 0 : -1; } + +/* { dg-final { scan-tree-dump-times " = -D" 4 "optimized"} } */ +/* { dg-final { cleanup-tree-dump "optimized" } } */
[PATCH] pr53138 - miscompilation of spaceship operator
The testcase is miscompiled to this: ... movl(%rsi), %ecx cmpl%ecx, (%rdi) sbbl%edx, %edx cmovbe %edx, %eax ret but sbbl only preserves the carry flag, not the zero flag. I suppose one could use different patterns with different CC modes, but this is the safest fix. Bootstrapped/regtested x86_64-pc-linux-gnu, ok for mainline? I also reproduced this at least on 4.7; help would be welcome on committing it to the branches. Thanks in advance! Paolo 2012-04-27 Paolo Bonzini PR target/53138 * config/i386/i386.md (x86_movcc_0_m1_neg): Add clobber. 2012-04-27 Paolo Bonzini PR target/53138 * gcc.c-torture/execute/20120427-1.c: New testcase. Index: config/i386/i386.md === --- config/i386/i386.md (revisione 186859) +++ config/i386/i386.md (copia locale) @@ -16439,7 +16439,8 @@ (define_insn "*x86_movcc_0_m1_neg" [(set (match_operand:SWI48 0 "register_operand" "=r") (neg:SWI48 (match_operator 1 "ix86_carry_flag_operator" - [(reg FLAGS_REG) (const_int 0)])))] + [(reg FLAGS_REG) (const_int 0)]))) + (clobber (reg:CC FLAGS_REG))] "" "sbb{}\t%0, %0" [(set_attr "type" "alu") Index: testsuite/gcc.c-torture/execute/20120427-1.c === --- testsuite/gcc.c-torture/execute/20120427-1.c(revisione 0) +++ testsuite/gcc.c-torture/execute/20120427-1.c(revisione 0) @@ -0,0 +1,36 @@ +typedef struct sreal +{ + unsigned sig;/* Significant. */ + int exp; /* Exponent. */ +} sreal; + +sreal_compare (sreal *a, sreal *b) +{ + if (a->exp > b->exp) +return 1; + if (a->exp < b->exp) +return -1; + if (a->sig > b->sig) +return 1; + return -(a->sig < b->sig); +} + +sreal a[] = { + { 0, 0 }, + { 1, 0 }, + { 0, 1 }, + { 1, 1 } +}; + +int main() +{ + int i, j; + for (i = 0; i <= 3; i++) { +for (j = 0; j < 3; j++) { + if (i < j && sreal_compare(&a[i], &a[j]) != -1) abort(); + if (i == j && sreal_compare(&a[i], &a[j]) != 0) abort(); + if (i > j && sreal_compare(&a[i], &a[j]) != 1) abort(); +} + } + return 0; +}
[PATCH] teach emit_store_flag to use clz/ctz
If the value at zero is outside the range [0, GET_MODE_BITSIZE (mode)), "A != 0" and "A == 0" can be compiled to clz/ctz followed by a subtraction or one's complement (only for A != 0) and a shift. This trick can be used effectively on PPC (though there are other problems in the machine description to fix first). Bootstrapped/regtested x86_64-pc-linux-gnu, and tested manually on PPC with some changes to the machine description. Ok for mainline? Paolo 2012-04-27 Paolo Bonzini PR 53087 * expmed.c (emit_store_flag): Generalize to support cases where the result is not in the sign bit. Add a trick involving the value at zero of clz or ctz. Index: expmed.c === --- expmed.c(revisione 186859) +++ expmed.c(copia locale) @@ -5419,6 +5419,7 @@ enum rtx_code rcode; rtx subtarget; rtx tem, last, trueval; + int shift; tem = emit_store_flag_1 (target, code, op0, op1, mode, unsignedp, normalizep, target_mode); @@ -5612,10 +5613,11 @@ false) <= 1 || (code != LE && code != GT return 0; - /* Try to put the result of the comparison in the sign bit. Assume we can't - do the necessary operation below. */ + /* Assume we can't do the necessary operation below, so try to put the + result of the comparison in a known bit, given by SHIFT. */ tem = 0; + shift = GET_MODE_BITSIZE (mode) - 1; /* To see if A <= 0, compute (A | (A - 1)). A <= 0 iff that result has the sign bit set. */ @@ -5650,25 +5652,22 @@ if (code == EQ || code == NE) { - /* For EQ or NE, one way to do the comparison is to apply an operation -that converts the operand into a positive number if it is nonzero -or zero if it was originally zero. Then, for EQ, we subtract 1 and -for NE we negate. This puts the result in the sign bit. Then we -normalize with a shift, if needed. + /* Look for an operation that converts the operand into a positive +number if it is nonzero or zero if it was originally zero. Then, +for EQ, we subtract 1 and for NE we negate. This puts the result +in the sign bit. 
Then we normalize with a shift, if needed. -Two operations that can do the above actions are ABS and FFS, so try -them. If that doesn't work, and MODE is smaller than a full word, -we can use zero-extension to the wider mode (an unsigned conversion) -as the operation. */ - - /* Note that ABS doesn't yield a positive number for INT_MIN, but +ABS can do this; it doesn't yield a positive number for INT_MIN, but that is compensated by the subsequent overflow when subtracting -one / negating. */ +one / negating. If that doesn't work, and MODE is smaller than +a full word, we can use zero-extension to the wider mode (an +unsigned conversion) as the operation. +FFS could also do this, but we use a more versatile trick with +CLZ/CTZ below. */ + if (optab_handler (abs_optab, mode) != CODE_FOR_nothing) tem = expand_unop (mode, abs_optab, op0, subtarget, 1); - else if (optab_handler (ffs_optab, mode) != CODE_FOR_nothing) - tem = expand_unop (mode, ffs_optab, op0, subtarget, 1); else if (GET_MODE_SIZE (mode) < UNITS_PER_WORD) { tem = convert_modes (word_mode, mode, op0, 1); @@ -5684,6 +5683,69 @@ tem = expand_unop (mode, neg_optab, tem, subtarget, 0); } + if (tem == 0) +{ + int clz_zero, ctz_zero, if_zero; + + /* For EQ or NE, clz and ctz may give a unique value for zero that is + not in the [0, GET_MODE_BITSIZE (mode)) range. The result can be + massaged to put the result in a known bit. */ + + if (!CLZ_DEFINED_VALUE_AT_ZERO (mode, clz_zero)) +clz_zero = 0; + if (!CTZ_DEFINED_VALUE_AT_ZERO (mode, ctz_zero)) +ctz_zero = 0; + + /* Prefer a negative result if normalize == -1, since that does not + require an additional left shift. 
*/ + + if (normalizep == -1) +{ + if (ctz_zero < 0 && clz_zero >= GET_MODE_BITSIZE (mode)) +clz_zero = 0; + if (clz_zero < 0 && ctz_zero >= GET_MODE_BITSIZE (mode)) +ctz_zero = 0; +} + + if ((clz_zero < 0 || clz_zero >= GET_MODE_BITSIZE (mode)) + && optab_handler (clz_optab, mode) != CODE_FOR_nothing) +{ + tem = expand_unop (mode, clz_optab, op0, subtarget, 1); + if_zero = clz_zero; +} + if (tem == 0 + && (ctz_zero < 0 || ctz_zero >= GET_MODE_BITSIZE (mode)) + && optab_handler (ctz_optab, mode) != CODE_FOR_nothing) +{ + tem = expand_unop (mode, ctz_opt
Re: [patch] Obvious: Fix DF solution dirty marking in cfg.c:disconnect_src
On 27/04/2012 11:18, Steven Bosscher wrote: > Hello, > > It makes no sense to mark DF solutions dirty on the gcc_unreachable() > path but not on the return path. > > Bootstrapped&tested on x86_64-unknown-linux-gnu and > powerpc64-unknown-linux-gnu. I'll commit this, as obvious, some time late > next week unless I hear objections. Ok. Paolo
[patch] Obvious: Fix DF solution dirty marking in cfg.c:disconnect_src
Hello, It makes no sense to mark DF solutions dirty on the gcc_unreachable() path but not on the return path. Bootstrapped&tested on x86_64-unknown-linux-gnu and powerpc64-unknown-linux-gnu. I'll commit this, as obvious, some time late next week unless I hear objections. Ciao! Steven * cfg.c (disconnect_src): Do df_mark_solutions_dirty in the right place. Index: cfg.c === --- cfg.c (revision 186897) +++ cfg.c (working copy) @@ -242,13 +242,13 @@ disconnect_src (edge e) if (tmp == e) { VEC_unordered_remove (edge, src->succs, ei.index); + df_mark_solutions_dirty (); return; } else ei_next (&ei); } - df_mark_solutions_dirty (); gcc_unreachable (); }
Request new specs string token for multilib_os_dir
In the GCC FAQ under "Dynamic linker is unable to find GCC libraries", one suggestion is to add a '-R' or '-rpath' linker option to the *link or *lib specs so that the GCC libraries can be found. E.g. the following line is added to the DRIVER_DEFINES when building gcc via pkgsrc ('$(LINKER_RPATH_FLAG)' comes from the environment): -DLINK_LIBGCC_SPEC="\"%D $(LINKER_RPATH_FLAG)$(libdir) \"" This is needed as the prefix is normally something like '/usr/pkg/gcc47'. The problem is that this does not allow for multilib OS directories, and there is currently no simple way of doing so. My solution is to add the '%M' token that expands to multilib_os_dir. The above line can then be easily changed to handle multilib directories: -DLINK_LIBGCC_SPEC="\"%D $(LINKER_RPATH_FLAG)$(libdir)/%M \"" 2012-04-27 Steven Drake * gcc.c (do_spec_1): Add %M spec token to output multilib_os_dir. --- gcc/gcc.c.orig 2012-02-28 17:31:38.0 + +++ gcc/gcc.c @@ -5115,6 +5115,13 @@ do_spec_1 (const char *spec, int inswitc return value; break; + case 'M': + if (multilib_os_dir == NULL) + obstack_1grow (&obstack, '.'); + else + obstack_grow (&obstack, multilib_os_dir, strlen(multilib_os_dir)); + break; + case 'G': value = do_spec_1 (libgcc_spec, 0, NULL); if (value != 0) -- Steven
Re: Continue strict-volatile-bitfields fixes
On Fri, Apr 27, 2012 at 11:00:19AM +0200, Richard Guenther wrote: > But won't it re-introduce bugs like PR52080, 52097 or 48124? Also the No. All those are about bitfield stores, not reads. All extract_bit* functions currently pass 0, 0 as bitrange_{start,end}. > proper place for this optimization is lowering and CSEing of bit-field loads > (not that this lowering already exists). I agree, but we don't have that yet. When it is added, optimize_bitfield* should be certainly moved there, but if we do that before, we regress generated code quality. Jakub
Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls
On Fri, Apr 27, 2012 at 8:20 AM, Tom de Vries wrote: > On 26/04/12 12:20, Richard Guenther wrote: >> On Wed, Apr 25, 2012 at 11:56 PM, Tom de Vries >> wrote: >>> On 25/04/12 11:57, Richard Guenther wrote: >>> Hmm. I'm not sure we can conclude that they have the same value! +int bar (int); +void baz (int); +void bla (int); + +void +foo (int y) +{ + int a; + if (y == 6) + { + bla (5); + a = bar (7); + } + else + { + bla (5); + a = bar (7); + } + baz (a); +} I can implement bla as void bla(int) { a = random (); } thus a would not have the same value on both paths >> >> I think it's hard to decide whether they have the same value or not, >> since we >> cannot write a test-case that compares the results of calls from 2 >> branches of >> an if statement. void *foo () { return __builtin_return_address (0); } void *bar (_Bool b) { if (b) return foo (); else return foo (); } int main() { if (bar(true) == bar(false)) abort (); } ok ... outside of the scope of standard "C", but we certainly _can_ do this. Which would question tail-merging the above at all, of course. >>> >>> I added noinline to make sure there's no variation from inlining. >>> ... >>> extern void abort (void) __attribute__ ((__nothrow__)) __attribute__ >>> ((__noreturn__)); >>> >>> __attribute__((noinline)) >>> void *foo () >>> { >>> return __builtin_return_address (0); >>> } >>> >>> __attribute__((noinline)) >>> void *bar (int b) >>> { >>> if (b) >>> return foo (); >>> else >>> return foo (); >>> } >>> >>> int main() >>> { >>> void *a, *b; >>> a = bar (1); >>> b = bar (0); >>> if (a == b) >>> abort (); >>> return 0; >>> } >>> ... >>> >>> This test-case passes with: >>> ... >>> gcc -O2 calls.c -fno-optimize-sibling-calls -fno-tree-tail-merge >>> -fno-crossjumping >>> ... >>> >>> and fails with: >>> ... >>> gcc -O2 calls.c -fno-optimize-sibling-calls -fno-tree-tail-merge >>> -fcrossjumping >>> ... >>> >>> with the proposed patch, it also fails with: >>> ... 
>>> gcc -O2 calls.c -fno-optimize-sibling-calls -ftree-tail-merge >>> -fno-crossjumping >>> ... >>> >>> Tree-tail-merge performs the same transformation as cross-jumping for this >>> test-case. I think that the transformation is valid, and that the test-case >>> behaves as expected. Subsequent calls to foo may or may not return the same >>> value, depending on the optimizations, we can't assert a != b, that's just >>> the >>> nature of __builtin_return_address. >>> >>> Btw, this example is not what I meant when I said 'a test-case that >>> compares the >>> results of calls from 2 branches of an if statement'. I tried to say that >>> the >>> semantics of C is defined in terms of taking either 1 branch or the other, >>> rather than executing both in parallel, after which both results would be >>> available. From that point of view, this example compares subsequent calls >>> to >>> foo, not parallel. >> >> Ah, I see. Btw, I was not suggesting that we should not optimize the above. >> >> I started looking at the problem in terms of subsequent calls. Consider >> the >> imaginary attribute nosideeffect, similar to const and pure. >> >> const: >> - no effects except the return value >> - the return value depends only on the parameters >> >> pure: >> - no effects except the return value >> - the return value depends only on the parameters and/or global variables >> >> nosideeffect: >> - no effects except the return value >> >> The code fragment: >> ... >> extern int f (int) __attribute ((nosideeffect)); >> int g() >> { >> int a; >> int b; >> a = f (5); >> b = f (5); >> return a + b; >> } >> ... >> >> would result in a gimple sequence more or less like this: >> ... >> # VUSE <.MEMD.1712_4(D)> >> aD.1707_1 = fD.1704 (5); >> # VUSE <.MEMD.1712_4(D)> >> bD.1708_2 = fD.1704 (5); >> D.1710_3 = aD.1707_1 + bD.1708_2; >> # VUSE <.MEMD.1712_6> >> return D.1710_3; >> ... 
>> >> The value numbering modification I'm proposing would say that a and b >> have the >> same value, which is not true. I updated the patch to ensure in this >> case that a >> and b would be seen as different. >> Note that things won't break if the attribute is missing on a function, >> since >> that just means that the function would have a vdef, and there would be >> no 2 >> subsequent function calls with the same vuse. >> - but that is not what >>>
Re: Continue strict-volatile-bitfields fixes
On Fri, Apr 27, 2012 at 10:29 AM, Jakub Jelinek wrote: > On Fri, Apr 27, 2012 at 12:42:41PM +0800, Thomas Schwinge wrote: >> > > GET_MODE_BITSIZE (lmode)« (8 bits). (With the current sources, lmode is >> > > VOIDmode.) >> > > >> > > Is emmitting »BIT_FIELD_REF <*common, 32, 0> & 255« wrong in this case, >> > > or should a later optimization pass be able to figure out that >> > > »BIT_FIELD_REF <*common, 32, 0> & 255« is in fact the same as >> > > common->code, and then be able to conflate these? Any suggestions >> > > where/how to tackle this? >> > >> > The BIT_FIELD_REF is somewhat of a red-herring. It is created by >> > fold-const.c >> > in optimize_bit_field_compare, code that I think should be removed >> > completely. >> > Or it needs to be made aware of strict-volatile bitfield and C++ memory >> > model >> > details. > > I'd actually very much prefer the latter, just disable > optimize_bit_field_compare for strict-volatile bitfield mode and when > avoiding load data races in C++ memory model (that isn't going to be > default, right?). This optimization is useful, and it is solely about > loads, so even C++ memory model usually shouldn't care. But won't it re-introduce bugs like PR52080, 52097 or 48124? Also the proper place for this optimization is lowering and CSEing of bit-field loads (not that this lowering already exists). Well, I'm obviously biased here - fold doing this kind of transform looks out-of-place. Richard. > Jakub
Re: [C11-atomic] [patch] gimple atomic statements
On Thu, Apr 26, 2012 at 10:07 PM, Andrew MacLeod wrote: > On 04/05/2012 05:14 AM, Richard Guenther wrote: >> >> + static inline bool >> + gimple_atomic_has_fail_order (const_gimple gs) >> + { >> + return gimple_atomic_kind (gs) == GIMPLE_ATOMIC_COMPARE_EXCHANGE; >> + } >> >> btw, these kind of predicates look superfluous to me - if they are true >> exactly for one atomic kind then users should use the predicate to >> test for that specific atomic kind, not for some random field presence. > > > yeah, a previous incarnation artifact, removed. > >> >> All asserts in inline functions should be gcc_checking_asserts. > > indeed. > >> >> >> you should have a gimple_build_atomic_raw function that takes all >> operands and the atomic kind, avoiding the need for all the repeated >> calls of gimple_atomic_set_* as well as avoid all the repeated checking >> this causes. > > as well. done. > > >> >> + else if (is_gimple_atomic (stmt)) >> + { >> + tree t; >> + if (visit_store) >> + { >> + for (i = 0; i< gimple_atomic_num_lhs (stmt); i++) >> + { >> + t = gimple_atomic_lhs (stmt, i); >> + if (t) >> + { >> + t = get_base_loadstore (t); >> + if (t) >> + ret |= visit_store (stmt, t, data); >> + } >> >> I thought results are always registers? The walk_stmt_load_store_addr_ops >> looks wrong to me. As the loads/stores are implicit (you only have >> addresses) >> you have to adjust all callers of walk_stmt_load_store_addr_ops to handle >> atomics specially as they expect to come along all loads/stores that way. >> Those also handle calls and asms (for "memory" clobber) specially. > > > hmm, yeah they always return a value. I was just copying the gimple_call > code... Why would we need to do this processing for a GIMPLE_CALL lhs and > not a GIMPLE_ATOMIC lhs? GIMPLE_CALL lhs can be memory if the call returns an aggregate, similar GIMPLE_CALL arguments can be memory if they are aggregate. Can this be the case for atomics as well? 
> And the RHS processing is the same as for a > is_gimple_call as well... it was lifted from the code immediately > following.,.. I tried to write all this code to return the same values as > would have been returned had it actually been a built-in __sync or __atomic > call like we have today. Once I had them all, then we could actually make > any improvements based on our known side effect limits. > > I guess I don't understand the rest of the comment about why I need to do > something different here than with a call... Well, all callers that use walk_stmt_load_store/addr_ops need to handle non-explicit loads/stores represented by the stmt. For calls this includes the loads and stores the callee may perform. For atomics this includes ? (depends on whether the operand of an atomic-load is a pointer or an object, I suppose it is a pointer for the following) For atomic this includes the implicit load of *{address operand} which will _not_ be returned by walk_stmt_load_store/addr_ops. Thus callers that expect to see all loads/stores (like ipa-pure-const.c) need to explicitely handle atomics similar to how they handle calls and asms (if only in a very conservative way). Similar the alias oracle needs to handle them (obviously). >> Index: tree-ssa-alias.c >> === >> *** tree-ssa-alias.c (revision 186098) >> --- tree-ssa-alias.c (working copy) >> *** ref_maybe_used_by_stmt_p (gimple stmt, t >> *** 1440,1445 >> --- 1440,1447 >> } >> else if (is_gimple_call (stmt)) >> return ref_maybe_used_by_call_p (stmt, ref); >> + else if (is_gimple_atomic (stmt)) >> + return true; >> >> please add a comment before these atomic handlings that we assume >> they are using/clobbering all refs because they are considered memory >> optimization barriers. Btw, does this apply to non-address-taken >> automatic >> references? I suppose not. 
>> Consider:
>>
>>   int foo()
>>   {
>>     struct s;
>>     atomic_fence();
>>     s.a = 1;
>>     atomic_fence();
>>     return s.a;
>>   }
>>
>> we still should be able to optimize this to return 1, no?  At least SRA
>> will
>
> yes, locals can do anything they want since they aren't visible to other
> processes.  at the moment, we'll leave those fences in because we dont
> optimize atomics at all, but "in the fullness of time" this will be
> optimized to:
>
>   int foo()
>   {
>     atomic_fence()
>     return 1;
>   }
>
> at the moment we produce:
>
>   int foo()
>   {
>     atomic_fence()
>     atomic_fence()
>     return 1;
>   }

Which is inconsistent with the alias-oracle implementation. Please fix it to at _least_ not consider non-aliased locals. You can look at the call handling for how to do this - where you can also see how to do even better by using points-to information from the pointer operand of the atomics that specify the memory lo
Re: [C11-atomic] [patch] gimple atomic statements
On Thu, Apr 26, 2012 at 7:53 PM, Andrew MacLeod wrote: > On 04/05/2012 05:14 AM, Richard Guenther wrote: >> >> >> Ok. Remember that you should use non-tree things if you can in GIMPLE >> land. This probably means that both the size and the memmodel "operands" >> should be >> >> + struct GTY(()) gimple_statement_atomic >> + { >> + /* [ WORD 1-8 ] */ >> + struct gimple_statement_with_memory_ops_base membase; >> + >> + /* [ WORD 9 ] */ >> + enum gimple_atomic_kind kind; >> + >> enum gimple_atomic_memmodel memmodel; >> >> unsigned size; >> >> and not be trees in the ops array. Even more, both kind and memmodel >> should probably go in membase.base.subcode > > > I'm using subcode now for the fetch_op and op_fetch operation when we > actually need a tree code for plus, sub, and, etc... so I am utilizing that > field. Since the 'kind' field is present in ALL nodes, and the tree code was > only needed in some, it looked cleaner to utilize the subcode field for the > 'sometimes' field so that it wouldnt be an obvious wasted field in all the > non-fetch nodes. In the end it amounts to the same thing, but just looked > cleaner :-) I could change it if you felt strongly about it and use > subcode for the kind and create a tree_code field in the object for the > operation. > > Since integral atomics are always of an unsigned type, I could switch over > and use 'unsigned size' instead of 'tree fntype' for them (I will rename > it), but then things may be more complicated when dealing with generic > atomics... those can be structure or array types and I was planning to > allow leaving the type in case I discover something useful I can do with it. > It may ultimately turn out that the real type isn't going to matter, in > which case I will remove it and replace it with an unsigned int for size. So it eventually will support variable-size types? > And the reason memmodel is a tree is because, as ridiculous as it seems, it > can ultimately be a runtime value.
Even barring that, it shows up as a > variable after inlining before various propagation engines run, especially > in the C++ templates. So it must be a tree. Ick. That sounds gross. So, if it ends up a variable at the time we generate assembly we use a "conservative" constant value for it? > >>> I was actually thinking about doing it during gimplification... I hadnt >>> gotten as far as figuring out what to do with the functions from the >>> front >>> end yet. I dont know that code well, but I was in fact hoping there was >>> a >>> way to 'recognize' the function names easily and avoid built in functions >>> completely... >> >> Heh ... you'd still need a GENERIC representation then. Possibly >> a ATOMIC_OP tree may do. > > possibly... or maybe a single generic atomic_builtin with a kind and a > variable list of parameters. > > >> well, the names must remain exposed and recognizable since they are 'out >> there'. Maybe under the covers I can just leave them as normal calls and >> then during gimplification simply recognize the names and generate >> GIMPLE_ATOMIC statements directly from the CALL_EXPR. That would be >> ideal. >> That way there are no builtins any more. >> I suppose my question was whether the frontends need to do sth about the >> __atomic keyword or if that is simply translated to some type flag - or >> is that keyword applying to operations, not to objects or types? >> > The _Atomic keyword is a type modifier like const or volatile. So during > gimplification I'll also look for all occurrences of variables in normal > expressions which have that bit set in the type, then translate the > expression to utilize the new gimple atomic node. so > > _Atomic int var = 0; > var += 4; > foo (var); > > would become > > __atomic_add_fetch (&var, 4, SEQ_CST); > D.123 = __atomic_load (&var, SEQ_CST); > foo (D.123); Hmm, ok. So you are changing GENERIC in fact. I suppose _Atomic is restricted to global variables. 
Can I attach _Atomic to allocated storage via pointer casts? Consider writing up semantics of _Atomic into generic.texi please. > > > > >>> > So bottom line, a GIMPLE_ATOMIC statement is just an object that is > much > easier to work with. Yes, I see that it is easier to work with for you. All other statements will see GIMPLE_ATOMICs as blockers for their work though, even if they already deal with calls just fine - that's why I most of the time suggest to use builtins (or internal fns) to do things (I bet you forgot to update enough predicates ...). Can GIMPLE_ATOMICs throw with -fnon-call-exceptions? I suppose yes. One thing you missed at least ;) >>> >>> >>> Not that I am aware of, they are 'noexcept'. But I'm sure I've missed >>> more >>> than a few things so far. Im pretty early in the process :-) >> >> What about compare-exchange on a pointer dereference? The pointer >> dereference surely can trap, so it can throw w
Re: [PATCH][ARM] NEON DImode immediate constants
Ping. On 10/04/12 14:00, Andrew Stubbs wrote: Ping. On 30/03/12 12:15, Andrew Stubbs wrote: On 28/02/12 16:20, Andrew Stubbs wrote: Hi all, This patch implements 64-bit immediate constant loads in NEON. The current state is that you can load const_vector, but not const_int. This is clearly not ideal. The result is a constant pool entry when it's not necessary. The patch disables the movdi_vfp patterns for loading DImode values, if the operand is const_int and NEON is enabled, and extends the neon_mov pattern to include DImode const_int, as well as the const_vector operands. I've modified neon_valid_immediate only enough to accept const_int input - the logic remains untouched. That patch failed to bootstrap successfully, but this updated patch bootstraps and tests with no regressions. OK? Andrew
Re: combine_conversions int->double->int
On Fri, 27 Apr 2012, Richard Guenther wrote: Do you have a copyright assignment on file? Yes. -- Marc Glisse
Re: Continue strict-volatile-bitfields fixes
On Fri, Apr 27, 2012 at 12:42:41PM +0800, Thomas Schwinge wrote: > > > GET_MODE_BITSIZE (lmode)« (8 bits). (With the current sources, lmode is > > > VOIDmode.) > > > > > > Is emmitting »BIT_FIELD_REF <*common, 32, 0> & 255« wrong in this case, > > > or should a later optimization pass be able to figure out that > > > »BIT_FIELD_REF <*common, 32, 0> & 255« is in fact the same as > > > common->code, and then be able to conflate these? Any suggestions > > > where/how to tackle this? > > > > The BIT_FIELD_REF is somewhat of a red-herring. It is created by > > fold-const.c > > in optimize_bit_field_compare, code that I think should be removed > > completely. > > Or it needs to be made aware of strict-volatile bitfield and C++ memory > > model > > details. I'd actually very much prefer the latter, just disable optimize_bit_field_compare for strict-volatile bitfield mode and when avoiding load data races in C++ memory model (that isn't going to be default, right?). This optimization is useful, and it is solely about loads, so even C++ memory model usually shouldn't care. Jakub
Re: [C] improve missing initializers diagnostics
On 25 April 2012 18:08, Manuel López-Ibáñez wrote: > On 25 April 2012 16:46, H.J. Lu wrote: >> On Sat, Apr 21, 2012 at 4:58 AM, Manuel López-Ibáñez >> wrote: >>> This patch improves missing initializers diagnostics. From: >>> >>> pr36446.c:13:3: warning: missing initializer [-Wmissing-field-initializers] >>> .h = {1}, >>> ^ >>> pr36446.c:13:3: warning: (near initialization for ‘m0.h.b’) >>> [-Wmissing-field-initializers] >>> .h = {1}, >>> ^ >>> >>> to: >>> >>> pr36446.c:13:3: warning: missing initializer for field ‘b’ of ‘struct >>> h’ [-Wmissing-field-initializers] >>> .h = {1}, >>> ^ >>> pr36446.c:3:7: note: ‘b’ declared here >>> int b; >>> ^ >>> >>> Bootstrapped/regression tested. >>> >>> OK? >>> >>> >>> 2012-04-19 Manuel López-Ibáñez >>> >>> * c-typeck.c (pop_init_level): Improve diagnostics. >>> testsuite/ >>> * gcc.dg/m-un-2.c: Update. >>> * gcc.dg/20011021-1.c: Update. >> >> On Linux/x86, revision 186808 gave me: >> >> FAIL: gcc.dg/20011021-1.c (test for warnings, line 34) >> FAIL: gcc.dg/20011021-1.c (test for warnings, line 41) >> FAIL: gcc.dg/20011021-1.c (test for warnings, line 44) >> FAIL: gcc.dg/20011021-1.c (test for excess errors) >> FAIL: gcc.dg/20011021-1.c near init (test for warnings, line 27) >> FAIL: gcc.dg/20011021-1.c near init (test for warnings, line 30) >> FAIL: gcc.dg/m-un-2.c (test for excess errors) >> FAIL: gcc.dg/m-un-2.c warning regression 2 (test for warnings, line 12) >> FAIL: gcc.dg/missing-field-init-2.c (test for warnings, line 14) >> FAIL: gcc.dg/missing-field-init-2.c (test for warnings, line 7) >> FAIL: gcc.dg/missing-field-init-2.c (test for warnings, line 8) >> FAIL: gcc.dg/missing-field-init-2.c (test for excess errors) >> >> Revision 186806 is OK. > > Somehow I committed a broken version of the patch. 
It should have been this:
>
> --- gcc/c-typeck.c	(revision 186821)
> +++ gcc/c-typeck.c	(working copy)
> @@ -7063,11 +7063,11 @@ pop_init_level (int implicit, struct obs
> 	    if (warning_at (input_location, OPT_Wmissing_field_initializers,
> 			    "missing initializer for field %qD of %qT",
> 			    constructor_unfilled_fields,
> 			    constructor_type))
> 	      inform (DECL_SOURCE_LOCATION (constructor_unfilled_fields),
> -		      "%qT declared here", constructor_unfilled_fields);
> +		      "%qD declared here", constructor_unfilled_fields);
> 	}
>     }
>
> I'll commit as soon as it finishes bootstrapping+testing.

Committed as revision 186896.

Cheers,
Manuel.
Re: combine_conversions int->double->int
On Thu, Apr 26, 2012 at 8:43 PM, Marc Glisse wrote: > On Thu, 26 Apr 2012, Richard Guenther wrote: > >> On Wed, Apr 25, 2012 at 3:58 PM, Marc Glisse wrote: >>> >>> Here is take 2 on this patch, which seems cleaner. Bootstrapped and >>> regression tested. >>> >>> gcc/ChangeLog >>> >>> 2012-04-25 Marc Glisse >>> >>> PR middle-end/27139 >>> * tree-ssa-forwprop.c (combine_conversions): Handle INT->FP->INT. >>> >>> gcc/testsuite/ChangeLog >>> >>> 2012-04-25 Marc Glisse >>> >>> PR middle-end/27139 >>> * gcc.dg/tree-ssa/forwprop-18.c: New test. >>> >>> >>> In my patch, the lines with gimple_assign_* are vaguely guessed from what >>> is >>> around, I don't pretend to understand them. >> >> >> ;) >> >> The patch looks good to me - on a 2nd thought folding the case back in >> to the if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def_stmt))) >> block to benefit from the local vars therein makes sense (but as separate >> sub-if () as it is now). > > > Like the attached? (c,c++ bootstrapped, make -k check, no regression) > > I changed the types in the testcase to make it really unlikely that a > platform needs to skip it (it would require a long double that can't > represent all signed char, or a float that can represent all unsigned long > long). Yes. Do you have a copyright assignment on file? Thanks, Richard. > -- > Marc Glisse
Re: [patch] Simplify tree-switch-conversion.c a bit - prep work for gimple switch lowering
On Thu, Apr 26, 2012 at 7:48 PM, Steven Bosscher wrote: > Hello, > > The attached patch re-organizes some code in tree-switch-conversion.c. > All information about a GIMPLE_SWITCH is now collected by one > function, so that my switch lowering code can use the same > switch_conv_info. Bootstrapped&tested on x86_64-unknown-linux-gnu. OK > for trunk? Ok. Thanks, Richard. > Ciao! > Steven
Re: [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)
On Fri, Apr 27, 2012 at 12:07 AM, Igor Zamyatin wrote: > Are you sure that tree-level unrollers are turned on at O2? My > impression was that they work only at O3 or with f[unroll,peel]-loops > flags. yes they are on but only have effect on tiny loops with very small trip count. With O3 or with -funroll,peel-loops, the size is allowed to grow. David > > On Tue, Apr 24, 2012 at 6:13 PM, Andi Kleen wrote: >> tejohn...@google.com (Teresa Johnson) writes: >> >>> This patch adds heuristics to limit unrolling in loops with branches >>> that may increase branch mispredictions. It affects loops that are >>> not frequently iterated, and that are nested within a hot region of code >>> that already contains many branch instructions. >>> >>> Performance tested with both internal benchmarks and with SPEC >>> 2000/2006 on a variety of Intel systems (Core2, Corei7, SandyBridge) and a >>> couple of different AMD Opteron systems. >>> This improves performance of an internal search indexing benchmark by >>> close to 2% on all the tested Intel platforms. It also consistently >>> improves 445.gobmk (with FDO feedback where unrolling kicks in) by >>> close to 1% on AMD Opteron. Other performance effects are neutral. >>> >>> Bootstrapped and tested on x86_64-unknown-linux-gnu. Is this ok for trunk? >> >> One problem with any unrolling heuristics is currently that gcc has >> both the tree level and the rtl level unroller. The tree one is even >> on at -O3. So if you tweak anything for one you have to affect both, >> otherwise the other may still do the wrong thing(tm). > > Tree level unrollers (cunrolli and cunroll) do complete unroll. At O2, > both of them are turned on, but gcc does not allow any code growth -- > which makes them pretty useless at O2 (very few loops qualify). The > default max complete peel iteration is also too low compared with both > icc and llvm. This needs to be tuned. > > David > >> >> For some other tweaks I looked into a shared cost model some time ago. 
>> May be still needed. >> >> -Andi >> >> -- >> a...@linux.intel.com -- Speaking for myself only
Re: [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)
Are you sure that tree-level unrollers are turned on at O2? My impression was that they work only at O3 or with f[unroll,peel]-loops flags. On Tue, Apr 24, 2012 at 6:13 PM, Andi Kleen wrote: > tejohn...@google.com (Teresa Johnson) writes: > >> This patch adds heuristics to limit unrolling in loops with branches >> that may increase branch mispredictions. It affects loops that are >> not frequently iterated, and that are nested within a hot region of code >> that already contains many branch instructions. >> >> Performance tested with both internal benchmarks and with SPEC >> 2000/2006 on a variety of Intel systems (Core2, Corei7, SandyBridge) and a >> couple of different AMD Opteron systems. >> This improves performance of an internal search indexing benchmark by >> close to 2% on all the tested Intel platforms. It also consistently >> improves 445.gobmk (with FDO feedback where unrolling kicks in) by >> close to 1% on AMD Opteron. Other performance effects are neutral. >> >> Bootstrapped and tested on x86_64-unknown-linux-gnu. Is this ok for trunk? > > One problem with any unrolling heuristics is currently that gcc has > both the tree level and the rtl level unroller. The tree one is even > on at -O3. So if you tweak anything for one you have to affect both, > otherwise the other may still do the wrong thing(tm). Tree level unrollers (cunrolli and cunroll) do complete unroll. At O2, both of them are turned on, but gcc does not allow any code growth -- which makes them pretty useless at O2 (very few loops qualify). The default max complete peel iteration is also too low compared with both icc and llvm. This needs to be tuned. David > > For some other tweaks I looked into a shared cost model some time ago. > May be still needed. > > -Andi > > -- > a...@linux.intel.com -- Speaking for myself only