Re: [tsan] Instrument atomics
On Sat, Nov 24, 2012 at 12:58 PM, Dmitry Vyukov dvyu...@google.com wrote: Ok. A slight problem then is that where the tsan pass sits right now, there is no easy way to find out if the builtin call will be expanded inline or not, so (similar for asan), if we instrument them in the pass, it might be instrumented twice at runtime if the builtin is expanded as a library call (once the added instrumentation for the builtin, once in the intercepted library call). That isn't wrong, just might need slightly more resources than if we ensured we only instrument the builtin if it isn't expanded inline. Should inlining of those functions be disabled as if -fno-builtins is specified? Yes, it sounds reasonable. Performance characteristics under tsan differ significantly, so most likely we don't care. Do we still need range access functions then?
Re: [PATCH] Section anchors and thread-local storage
David Edelsohn dje@gmail.com writes: I have been working to enable native thread-local storage on AIX. One problem I encountered is the AIX assembler has difficulty with the anchor symbol for TLS CSECTs. While the section anchors machinery uses a separate pool for TLS entries, should section anchor blocks be used for TLS symbols at all? powerpc-linux uses GOT annotations directly and never places TLS symbols in the TOC at all. Section anchors seem to be avoided by TLS code already. The appended patch rejects TLS symbols for object blocks in general. I could add a target hook, but I wanted to propose this policy in general before pursing a target-specific hook. Yeah, TLS anchors work on mips64-linux-gnu, although admittedly section anchors aren't yet enabled by default there, so perhaps not many people have tried it. A use_blocks_for_decl_p hook sounds good, and fits nicely with targetm.use_blocks_for_constant_p. Thanks, Richard
Re: [tsan] Instrument atomics
On Mon, Nov 26, 2012 at 12:17:44PM +0400, Dmitry Vyukov wrote: On Sat, Nov 24, 2012 at 12:58 PM, Dmitry Vyukov dvyu...@google.com wrote: Ok. A slight problem then is that where the tsan pass sits right now, there is no easy way to find out if the builtin call will be expanded inline or not, so (similar for asan), if we instrument them in the pass, it might be instrumented twice at runtime if the builtin is expanded as a library call (once the added instrumentation for the builtin, once in the intercepted library call). That isn't wrong, just might need slightly more resources than if we ensured we only instrument the builtin if it isn't expanded inline. Should inlining of those functions be disabled as if -fno-builtins is specified? Yes, it sounds reasonable. Performance characteristics under tsan differ significantly, so most likely we don't care. Do we still need range access functions then? Yes. I think whether to instrument builtins inline or not should be a user decision, -fsanitize-thread and either nothinng or -fno-builtin. Implying -fno-builtin might penalize some code unnecessarily (remember, builtin handling is not only about expanding some builtins inline, but also about folding them if they are used with constant arguments, performing all sorts of optimizations etc., all of which are ok, just we don't have a knob to disable inline expansion of some builtins (which, some are certainly ok, e.g. those that don't have a library counterpart). Also note that many programs use __builtin_* calls directly and then you can't disable recognizing them as builtins. Jakub
[patch] Re: MULTIARCH_DIRNAME in gcc/config/rs6000/t-linux doesn't work
Am 26.11.2012 09:50, schrieb Jan-Benedict Glaw: The ifeq in this new makefile fragment doesn't work due to unbalanced brackets. Fixed, and committing as obvious. Sorry, apparently I missed testing the non-multilib'd configuration. Matthias 2012-11-26 Matthias Klose d...@ubuntu.com * config/rs6000/t-linux (MULTIARCH_DIRNAME): Fix unbalanced parentheses. Index: config/rs6000/t-linux === --- config/rs6000/t-linux (Revision 193804) +++ config/rs6000/t-linux (Arbeitskopie) @@ -1,5 +1,5 @@ # do not define the multiarch name if configured for a soft-float cpu # or soft-float. -ifeq (,$(filter $(with_cpu),$(SOFT_FLOAT_CPUS))$(findstring soft,$(with_float)) +ifeq (,$(filter $(with_cpu),$(SOFT_FLOAT_CPUS))$(findstring soft,$(with_float))) MULTIARCH_DIRNAME = powerpc-linux-gnu endif
Re: [Patch,testsuite] ad PR52641: More fixes for not-so-common targets
Hans-Peter Nilsson wrote: On Fri, 23 Nov 2012, Georg-Johann Lay wrote: Here are some more fixes for 16-bit int and similar. * gcc.c-torture/execute/20120919-1.x: New file (int32plus). No, you should be able to use dg-directives in the main file these days. (The .x files are obsolete since a few years, IIRC.) Would you be more specific? AFAIK this works only with dg. After removing the x file and adding /* { dg-do run } */ /* { dg-require-effective-target int32plus } */ to the test it is still executed. Thanks, Johann
Re: [PATCH][AARCH64] Fix the name mangling of va_list
On 21/11/12 14:31, Yufeng Zhang wrote: Hi, This patch updates the AArch64 port to mangle __va_list as it is in namespace std in C++. This is specified in the AArch64 AAPCS64 ABI doc. OK for the trunk? Thanks, Yufeng gcc/ChangeLog 2012-11-21 Yufeng Zhang yufeng.zh...@arm.com * config/aarch64/aarch64.c (aarch64_mangle_type): New function. (TARGET_MANGLE_TYPE): Define. The change to the AArch64 port itself is OK. gcc/testsuite/ChangeLog 2012-11-21 Yufeng Zhang yufeng.zh...@arm.com * g++.dg/abi/arm_va_list.C: Also test on aarch64*-*-*. // AAPCS \S 7.1.4 requires that va_list be a typedef for struct // __va_list. The mangling is as if it were std::__va_list. +// So is required for the AArch64 target. The functional change in this test makes sense however the comment change is slightly confusing. The original comment refers to the procedure call standard for AArch32: IHI0042D_aapcs.pdf. The procedure call standard for AArch64 is defined in IHI0055A_aapcs.pdf. This document also discusses va_list in chapter 7.1.4. Perhaps a different form of words distinguishes between the two different PCS documents would be better? Perhaps something along these lines: // AAPCS \S 7.1.4 requires that va_list be a typedef for struct // __va_list. The mangling is as if it were std::__va_list. // AArch64 PCS IHI0055A_aapcs64.pdf \S 7.1.4 requires that va_list // be a typedef for struct __va_list. The mangling is as if it // were std::__va_list. In any case I don;t believe I can OK this change to the generic part of the test suite. Suggest you CC Mike Stump or Janis Johnson. Cheers /Marcus
Re: [Patch, Fortran] PR 54997: -Wunused-function gives false warnings for procedures passed as actual argument
2012/11/25 Mikael Morin mikael.mo...@sfr.fr: Le 22/10/2012 16:49, Janus Weil a écrit : Minor update to the patch: It now also sets TREE_USED for entry masters in order to avoid bogus warnings for procedures with ENTRY (cf. comment 6 in the PR, which like comment 0 is a 4.8 regression). Still regtests cleanly. Ok? OK with an extra test for comment 6. Thanks a bunch. Committed as r193811. (A test for comment 6 was already included in the updated test case I sent together with the updated patch.) Thanks. And sorry for the delay. No reason to apologize! Better late than never ;) Cheers, Janus 2012/10/21 Janus Weilja...@gcc.gnu.org: Hi all, here is another patch to silence some more of the bogus warnings about unused functions that gfortran is currently throwing (cf. also the previous patch for PR 54224). It fixes the usage of the 'referenced' attribute, which should only be given to procedures which are actually 'used' (called/referenced). Then TREE_USED is set according to this attribute, which in turn silences the warning in the middle-end. The patch was regtested on x86_64-unknown-linux-gnu. Ok for trunk? Cheers, Janus 2012-10-21 Janus Weilja...@gcc.gnu.org PR fortran/54997 * decl.c (match_procedure_decl): Don't set 'referenced' attribute for PROCEDURE declarations. * parse.c (gfc_fixup_sibling_symbols,parse_contained): Don't set 'referenced' attribute for all contained procedures. * trans-decl.c (gfc_get_symbol_decl): Allow for unreferenced procedures. (build_function_decl): Set TREE_USED for referenced procedures. 2012-10-21 Janus Weilja...@gcc.gnu.org PR fortran/54997 * gfortran.dg/warn_unused_function_2.f90: New.
Re: [Patch,testsuite] ad PR52641: More fixes for not-so-common targets
On Mon, 26 Nov 2012, Georg-Johann Lay wrote: Hans-Peter Nilsson wrote: On Fri, 23 Nov 2012, Georg-Johann Lay wrote: Here are some more fixes for 16-bit int and similar. * gcc.c-torture/execute/20120919-1.x: New file (int32plus). No, you should be able to use dg-directives in the main file these days. (The .x files are obsolete since a few years, IIRC.) Would you be more specific? You seem to have understood what I meant, so I guess not needed. :) AFAIK this works only with dg. I had another look. I thought Mark Mitchells overhaul in 2003 adjusted *all* gcc/testsuite subdirectories; I grepped gcc.c-torture/execute/* and gcc.c-torture/compile/* for dg-, lots of apparently-working examples. But that's wrong; observe the differences between gcc.c-torture/execute/execute.exp and gcc.c-torture/compile/compile.exp; they should differ only in the dg-do-what-default, but the whole dg- thing is missing in gcc.c-torture/execute/execute.exp. I haven't dug into the ml archives for the reason, maybe there were problems and not enough time to solve them. At least according to svn it doesn't seem to be a reverted change. After removing the x file and adding /* { dg-do run } */ /* { dg-require-effective-target int32plus } */ to the test it is still executed. Yeah... If it had been a new test, I would have suggested to put it in gcc.dg/torture instead, the new preferred home for executed torture-tests. So unless you feel like completing Mark's dg- makeover for gcc.c-torture/execute you'll have to go with your .x file. Sorry for putting you on the wrong track, but thanks for getting light on the issue. Mike, other testsuite maintainers; heads-up: there are several dg- directives in files in gcc.c-torture/execute. That's just misleading, since they have no effect. (I just verified that, testing a few examples.) brgds, H-P
Re: [Patch, Fortran] PR 54997: -Wunused-function gives false warnings for procedures passed as actual argument
Le 26/11/2012 12:21, Janus Weil a écrit : 2012/11/25 Mikael Morinmikael.mo...@sfr.fr: Le 22/10/2012 16:49, Janus Weil a écrit : Minor update to the patch: It now also sets TREE_USED for entry masters in order to avoid bogus warnings for procedures with ENTRY (cf. comment 6 in the PR, which like comment 0 is a 4.8 regression). Still regtests cleanly. Ok? OK with an extra test for comment 6. Thanks a bunch. Committed as r193811. (A test for comment 6 was already included in the updated test case I sent together with the updated patch.) Ah, yes, indeed. Didn't see it.
[PATCH] Remove redundant variable in hash_set
I don't see why we need the `hash' variable, when we can use the `regno' variable directly. Regtested/bootstrapped on x86_64-linux. Ok for trunk? 2012-11-26 Marek Polacek pola...@redhat.com * cprop.c (hash_set): Remove hash variable. Use regno variable directly. --- gcc/cprop.c.mp 2012-11-26 12:13:08.631193625 +0100 +++ gcc/cprop.c 2012-11-26 12:13:25.836238566 +0100 @@ -170,10 +170,7 @@ reg_available_p (const_rtx x, const_rtx static unsigned int hash_set (int regno, int hash_table_size) { - unsigned int hash; - - hash = regno; - return hash % hash_table_size; + return (unsigned) regno % hash_table_size; } /* Insert assignment DEST:=SET from INSN in the hash table. Marek
Re: [RFA/4.7/ARM] Backport arm-*-linux-gnueabihf triplet support to 4.7
On Tue, 20 Nov 2012, Matthew Gretton-Dann wrote: All, This patch backports Matthais Klose's arm*-*-linux-gnueabihf triplet support patch of 2012-10-15 to 4.7. The backport was not clean as 4.8 has obsoleted various arm*-*-* triplets which are valid in 4.7. I have tested this cross with arm-none-linux-gnueabihf and arm-none-linux-gnueabi. One question I do have having done this work - is there a canonical way to test for the arm*-*-linux-gnueabi triplet (or variants)? Various configure and testsuite files test for this, but there doesn't seem to be a consistent method. OK for 4.7? I'll defer to the arm maintainers for this. Richard. Thanks, Matt 2012-11-08 Matthew Gretton-Dann matthew.gretton-d...@linaro.org Backport from mainline 2012-10-15 Matthias Klose d...@ubuntu.com * config.gcc: Match arm*-*-linux-* for ARM Linux/GNU. * doc/install.texi: Use arm-*-*linux-* instead of arm-*-*linux-gnueabi. gcc/ada/ChangeLog: 2012-11-08 Matthew Gretton-Dann matthew.gretton-d...@linaro.org Backport from mainline. 2012-10-15 Matthias Klose d...@ubuntu.com * gcc-interface/Makefile.in: Match arm*-*-linux-*eabi* for ARM Linux/GNU. gcc/testsuite/ChangeLog: 2012-11-08 Matthew Gretton-Dann matthew.gretton-d...@linaro.org Backport from mainline 2012-10-15 Matthias Klose d...@ubuntu.com * lib/target-supports.exp (check_profiling_available): Match arm*-*-linux-* for ARM Linux/GNU. * gfortran.dg/enum_10.f90: Likewise. * gfortran.dg/enum_9.f90: Likewise. * gcc.target/arm/synchronize.c: Likewise. * g++.old-deja/g++.jason/enum6.C: Likewise. * g++.old-deja/g++.law/enum9.C: Likewise. * g++.old-deja/g++.other/enum4.C: Likewise. libgcc/ChangeLog: 2012-11-08 Matthew Gretton-Dann matthew.gretton-d...@linaro.org Backport from mainline. 2012-10-15 Matthias Klose d...@ubuntu.com * config.host: Match arm*-*-linux-* for ARM Linux/GNU. libjava/ChangeLog: 2012-11-08 Matthew Gretton-Dann matthew.gretton-d...@linaro.org Backport from mainline. 2012-10-15 Matthias Klose d...@ubuntu.com * configure.ac: Match arm*-*-linux-* for ARM Linux/GNU. * configure: Regenerate. libstdc++-v3/ChangeLog: 2012-11-08 Matthew Gretton-Dann matthew.gretton-d...@linaro.org Backport from mainline 2012-10-15 Matthias Klose d...@ubuntu.com * configure.host: Match arm*-*-linux-* for ARM Linux/GNU. * testsuite/20_util/make_signed/requirements/typedefs-2.cc: Likewise. * testsuite/20_util/make_unsigned/requirements/typedefs-2.cc: Likewise.
Patch ping
- PR55137 fold reassociation fix http://gcc.gnu.org/ml/gcc-patches/2012-11/msg00636.html (alternatively also retry with unsigned atype if the reassociation would fail otherwise) - PR43631 fix up var-tracking notes in between bbs http://gcc.gnu.org/ml/gcc-patches/2012-11/msg01201.html - PR55331 substitute_and_fold fix for memmove folding to nothing http://gcc.gnu.org/ml/gcc-patches/2012-11/msg01302.html - convert asan to use builtins instead of building fndecls on the fly http://gcc.gnu.org/ml/gcc-patches/2012-11/msg01875.html - PR55380 tiny libcpp performance fix + hack for asan/valgrind http://gcc.gnu.org/ml/gcc-patches/2012-11/msg01973.html Jakub
Re: [Patch AArch64] Add support for TARGET_BUILTIN_DECL
On 21/11/12 15:05, James Greenhalgh wrote: Hi, This patch wires up support for TARGET_BUILTIN_DECL in the AArch64 backend. Is this OK to commit? Thanks, James Greenhalgh --- gcc/ 2012-11-21 James Greenhalgh james.greenha...@arm.com * config/aarch64/aarch64-builtins.c (aarch64_builtin_decls): New. (aarch64_init_simd_builtins): Store declaration after builtin initialisation. (aarch64_builtin_decl): New. * config/aarch64/aarch64-protos.h (aarch64_builtin_decl): New. * config/aarch64/aarch64.c (TARGET_BUILTIN_DECL): Define. OK and please back port to ARM/aarch64-4.7-branch Thanks /Marcus
Re: [PATCH, generic] Fix for define_subst
Guys, this is a ping. Thanks, K On Thu, Nov 22, 2012 at 9:36 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, Here is copy-and-paste from issue raised by Ian (in the bottom). Fix is attached. ChangeLog entry: 2012-11-22 Michael Zolotukhin michael.v.zolotuk...@intel.com * gensupport.c (add_c_test): Check if expr isn't NULL. Is it ok for trunk? === CUT HERE === It looks like the recent VEC rewrite might have interacted badly with this patch. I am getting a segfault in genconditions.c when add_c_test is called for desc. I think it is because the condition is not within a vector. Example of desc for working pattern: (define_insn (*zero_extendqihi2_aarch64) [ (set (match_operand:HI 0 (register_operand) (=r,r)) (zero_extend:HI (match_operand:QI 1 (nonimmediate_operand) (r,m ] () (@ uxtb\t%w0, %w1 ldrb\t%w0, %1) [ (set_attr (v8type) (extend,load1)) (set_attr (mode) (HI)) ]) Example of desc for broken one that uses define_subst: (define_insn (*addsi3_aarch64_noextend) [ (set (match_operand:SI 0 (register_operand) (=rk,rk,rk)) (plus:SI (match_operand:SI 1 (register_operand) (%rk,rk,rk)) (match_operand:SI 2 (aarch64_plus_operand) (I,r,J ] (@ add\t%w0, %w1, %2 add\t%w0, %w1, %w2 sub\t%w0, %w1, #%n2) [ (set_attr (v8type) (alu)) (set_attr (mode) (SI)) ]) Note that there are no brackets around the condition. That's the issue, I think. === CUT HERE === Thanks, K
Re: [patch] reorg.c janitor patch 3: remove rare_destination()
On 11/25/2012 11:30 AM, Richard Sandiford wrote: FWIW, the vast majority of condjumps have a REG_BR_PROB note so the part of mostly_true_jump that handles the jumps without a REG_BR_PROB note is itself a rare_destination :-) GCC is very careful about preserving the notes. Perhaps all that code should just be removed. Yeah, just returning 0 if there's no note sounds good. Preapproved if noone objects in 24 hours. Fine with me -- all the branch prediction code in reorg.c pre-dates the branch prediction notes and information we carry around these days. I wouldn't lose any sleep if all the reorg.c bits were converted to use the existing notes and the now redundant bits removed. jeff
Re: [PATCH] Remove unused DELAY_SLOTS_FOR_EPILOGUE target macro
On 11/24/2012 03:08 PM, Steven Bosscher wrote: Hello, The DELAY_SLOTS_FOR_EPILOGUE target macro, and the related ELIGIBLE_FOR_EPILOGUE_DELAY macro, are unused. The attached patch removes them. OK for trunk? OK, though this arguably shouldn't be going in during stage3. But given those are totally unused, OK. jeff
Re: [PATCH] Remove redundant variable in hash_set
On Mon, Nov 26, 2012 at 1:09 PM, Marek Polacek pola...@redhat.com wrote: I don't see why we need the `hash' variable, when we can use the `regno' variable directly. Regtested/bootstrapped on x86_64-linux. Ok for trunk? Ok. Thanks, Richard. 2012-11-26 Marek Polacek pola...@redhat.com * cprop.c (hash_set): Remove hash variable. Use regno variable directly. --- gcc/cprop.c.mp 2012-11-26 12:13:08.631193625 +0100 +++ gcc/cprop.c 2012-11-26 12:13:25.836238566 +0100 @@ -170,10 +170,7 @@ reg_available_p (const_rtx x, const_rtx static unsigned int hash_set (int regno, int hash_table_size) { - unsigned int hash; - - hash = regno; - return hash % hash_table_size; + return (unsigned) regno % hash_table_size; } /* Insert assignment DEST:=SET from INSN in the hash table. Marek
[PATCH] Don't bypass blocks with multiple latch edges (PR middle-end/54838)
In this PR, for the C testcase, in .cse1 we have: ENTRY | | 2 | | + 4 --+ |/ \ | | / \ | | 6 5| | /\ |\ | |/ \| \ | | 73 | 8 | | || | /\/ +---|| | / \ / | --109/ |-/ EXIT-/ (3-4 and 9-4 are back edges). Now, in bypass_block, when we're trying to bypass BB 4, we iterate over BB 4's incoming edges, then we're iterating over reg_use_table (registers used in insn). Here we call set = find_bypass_set (regno, e-src-index); but since it returns non-NULL value, redirect_edge_and_branch_force later redirects edges and BB 4 is gone. We then end up with Redirecting fallthru edge 3-4 to 6 JUMP-BYPASS: Proved reg 59 in jump_insn 15 equals constant (const_int 1 [0x1]) Bypass edge from 3-4 to 6 Redirecting fallthru edge 9-4 to 5 JUMP-BYPASS: Proved reg 59 in jump_insn 15 equals constant (const_int 3 [0x3]) Bypass edge from 9-4 to 5 i.e., it is assumed that in one reg there are two constants, that can't be right, right?! I've tried to fix this by not redirecting edges (thus bypassing block) if BB has more incoming latch edges and the hash table has more different SRC rtxs with the same hash value. FWIW, the table looks like the below: SET hash table (11 buckets, 3 entries) Index 0 (hash value 4) (reg:SI 59 [ D.1735 ]) := (const_int 1 [0x1]) Index 1 (hash value 5) (reg/v/f:DI 60 [ b ]) := (const_int 0 [0]) Index 2 (hash value 4) (reg:SI 59 [ D.1735 ]) := (const_int 3 [0x3]) (Only not bypassing BB when it has more latch edges would work too, but I'm afraid it could pessimize stuff somewhere else.) I've also changed some zeros into NULLs for pointers. Also, as folowup, we could get rid of may_be_loop_header variable completely, at one place where it is used we can use just 'n_latches 0'. And add the C++ TC as well. Regtested/bootstrapped on x86_64-linux, ok for trunk? 2012-11-26 Marek Polacek pola...@redhat.com PR middle-end/54838 * cprop.c (only_one_const_p): New function. (find_bypass_set): New parameter. Conditionally call only_one_const_p. Use NULL instead of 0. (bypass_block): Adjust caller. Determine number of latch edges. (n_latches): New variable. * gcc.dg/pr54838.c: New test. --- gcc/cprop.c.mp 2012-11-26 12:13:08.631193625 +0100 +++ gcc/cprop.c 2012-11-26 14:41:32.864034283 +0100 @@ -1429,14 +1429,26 @@ find_implicit_sets (void) static int bypass_last_basic_block; +/* Determine whether there is only one constant SRC rtx in the hash table + for REGNO. Here we assume that for the same hash value there won't be + more entries with the same SRC rtx. Return true if there is only one + constant, false otherwise. */ + +static bool +only_one_const_p (int regno) +{ + struct expr *set = lookup_set (regno, set_hash_table); + return next_set (regno, set) == NULL; +} + /* Find a set of REGNO to a constant that is available at the end of basic block BB. Return NULL if no such set is found. Based heavily upon find_avail_set. */ static struct expr * -find_bypass_set (int regno, int bb) +find_bypass_set (int regno, int bb, bool multiple_latches) { - struct expr *result = 0; + struct expr *result = NULL; for (;;) { @@ -1450,9 +1462,15 @@ find_bypass_set (int regno, int bb) set = next_set (regno, set); } - if (set == 0) + if (set == NULL) break; + /* If there are more latch edges coming to the BB, we need to +make sure that for the same hash value there aren't +more constants. */ + if (multiple_latches !only_one_const_p (regno)) + return NULL; + src = set-src; if (cprop_constant_p (src)) result = set; @@ -1502,6 +1520,7 @@ bypass_block (basic_block bb, rtx setcc, int may_be_loop_header; unsigned removed_p; unsigned i; + unsigned n_latches; edge_iterator ei; insn = (setcc != NULL) ? setcc : jump; @@ -1514,12 +1533,12 @@ bypass_block (basic_block bb, rtx setcc, find_used_regs (XEXP (note, 0), NULL); may_be_loop_header = false; + n_latches = 0; FOR_EACH_EDGE (e, ei, bb-preds) if (e-flags EDGE_DFS_BACK) - { - may_be_loop_header = true; - break; - } + n_latches++; + + may_be_loop_header = n_latches 0; change = 0; for (ei = ei_start (bb-preds); (e = ei_safe_edge (ei)); ) @@ -1557,7 +1576,7 @@ bypass_block (basic_block bb, rtx setcc, struct expr *set; rtx src, new_rtx; - set = find_bypass_set (regno, e-src-index); + set = find_bypass_set (regno, e-src-index, n_latches 1); if (! set) continue; --- gcc/testsuite/gcc.dg/pr54838.c.mp 2012-11-26 14:48:43.783980854 +0100 +++
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
Or rather this one. Same hammer, different color. It turns out that the rtlanal.c change caused problems, so I've got to use a home-brewn equivalent of remove_reg_equal_equiv_notes_for_regno... This does not fix pr55006 on x86_64-apple-darwin10;-(while the patch in http://gcc.gnu.org/ml/gcc-patches/2012-11/msg01607.html does). Test case is unchanged so I'm omitting it here. This test passes on x86_64-apple-darwin10 for a clean trunk. Thanks for looking at the problem. Dominique
[PATCH][ARM] AArch32 vmaxnm, vminnm support
Hi all, This patch adds support for the AArch32 vmaxnm and vminnm VFP instructions in that can be used to implement the smax[sf,df]3 and smin[sf,df]3 RTL patterns. The patterns are only used by gcc when unsafe math optimisations are turned on. Two new values for the type attribute are introduced: f_minmaxs and f_minmaxd. New compilation tests are added. They pass and no regressions on arm-none-eabi. Ok for trunk? Thanks, Kyrill gcc/ChangeLog 2012-11-26 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/arm.md (f_minmaxs, f_minmaxd): New types. * config/arm/vfp.md (smaxmode3): New pattern. (sminmode3): Likewise. gcc/testsuite/ChangeLog 2012-11-26 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.target/arm/vmaxnmdf.c: New test. * gcc.target/arm/vmaxnmsf.c: Likewise. * gcc.target/arm/vminnmsf.c: Likewise. * gcc.target/arm/vminnmdf.c: Likewise.--- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -274,6 +274,8 @@ fmacd,\ f_rints,\ f_rintd,\ + f_minmaxs,\ + f_minmaxd,\ f_flag,\ f_loads,\ f_loadd,\ diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md index 480a88d..82b277a 100644 --- a/gcc/config/arm/vfp.md +++ b/gcc/config/arm/vfp.md @@ -1263,6 +1263,31 @@ (set_attr type f_rintvfp_type)] ) +;; MIN_EXPR and MAX_EXPR eventually map to 'smin' and 'smax' in RTL. +;; The 'smax' and 'smin' RTL standard pattern names do not specify which +;; operand will be returned when both operands are zero (i.e. they may not +;; honour signed zeroes), or when either operand is NaN. Therefore GCC +;; only introduces MIN_EXPR/MAX_EXPR in fast math mode or when not honouring +;; NaNs. + +(define_insn smaxmode3 + [(set (match_operand:SDF 0 register_operand =F_constraint) +(smax:SDF (match_operand:SDF 1 register_operand F_constraint) + (match_operand:SDF 2 register_operand F_constraint)))] + TARGET_HARD_FLOAT TARGET_FPU_ARMV8 vfp_double_cond + vmaxnm.V_if_elem\\t%V_reg0, %V_reg1, %V_reg2 + [(set_attr type f_minmaxvfp_type)] +) + +(define_insn sminmode3 + [(set (match_operand:SDF 0 register_operand =F_constraint) +(smin:SDF (match_operand:SDF 1 register_operand F_constraint) + (match_operand:SDF 2 register_operand F_constraint)))] + TARGET_HARD_FLOAT TARGET_FPU_ARMV8 vfp_double_cond + vminnm.V_if_elem\\t%V_reg0, %V_reg1, %V_reg2 + [(set_attr type f_minmaxvfp_type)] +) + ;; Unimplemented insns: ;; fldm* ;; fstm* diff --git a/gcc/testsuite/gcc.target/arm/vmaxnmdf.c b/gcc/testsuite/gcc.target/arm/vmaxnmdf.c new file mode 100644 index 000..1a172b8 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/vmaxnmdf.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target arm_v8_vfp_ok } */ +/* { dg-options -ffast-math } */ +/* { dg-add-options arm_v8_vfp } */ + +double +foo (double x, double y) +{ + return __builtin_fmax (x, y); +} + +/* { dg-final { scan-assembler-times vmaxnm.f64\td\[0-9\]+ 1 } } */ diff --git a/gcc/testsuite/gcc.target/arm/vmaxnmsf.c b/gcc/testsuite/gcc.target/arm/vmaxnmsf.c new file mode 100644 index 000..bc23261 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/vmaxnmsf.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target arm_v8_vfp_ok } */ +/* { dg-options -ffast-math } */ +/* { dg-add-options arm_v8_vfp } */ + +float +foo (float x, float y) +{ + return __builtin_fmaxf (x, y); +} + +/* { dg-final { scan-assembler-times vmaxnm.f32\ts\[0-9\]+ 1 } } */ diff --git a/gcc/testsuite/gcc.target/arm/vminnmdf.c b/gcc/testsuite/gcc.target/arm/vminnmdf.c new file mode 100644 index 000..c2a6915 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/vminnmdf.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target arm_v8_vfp_ok } */ +/* { dg-options -ffast-math } */ +/* { dg-add-options arm_v8_vfp } */ + +double +foo (double x, double y) +{ + return __builtin_fmin (x, y); +} + +/* { dg-final { scan-assembler-times vminnm.f64\td\[0-9\]+ 1 } } */ diff --git a/gcc/testsuite/gcc.target/arm/vminnmsf.c b/gcc/testsuite/gcc.target/arm/vminnmsf.c new file mode 100644 index 000..eee43bc --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/vminnmsf.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target arm_v8_vfp_ok } */ +/* { dg-options -ffast-math } */ +/* { dg-add-options arm_v8_vfp } */ + +float +foo (float x, float y) +{ + return __builtin_fminf (x, y); +} + +/* { dg-final { scan-assembler-times vminnm.f32\ts\[0-9\]+ 1 } } */
Re: [RFA 6/8] validate_failures.py: remove pass/fail from GetBuildData
On Sat, Nov 24, 2012 at 5:54 PM, Doug Evans d...@google.com wrote: 2012-11-24 Doug Evans d...@google.com * testsuite-management/validate_failures.py: Remove pass/fail indicator from result of GetBuildData. OK. Diego.
Re: [PATCH] Fix make_range_step with -fwrapv (PR tree-optimization/55236)
On Thu, Nov 8, 2012 at 9:04 PM, Jakub Jelinek ja...@redhat.com wrote: Hi! With -fwrapv the range test optimization miscompiles the following testcase (both inter-bb version in 4.8+ in first function and the pure fold-const.c one since 3.4). The problem is that with -fwrapv and signed type -x in [-,b] doesn't imply x in [-b,-] and -x in [a,-] doesn't imply x in [-,-a], as -INT_MIN is INT_MIN and thus if INT_MIN was in the range, after negation it needs to be in the range too (and similarly if it wasn't, it can't be there). Turns out that if we make sure both low and high are non-NULL (i.e. replace - with INT_MIN resp. INT_MAX), normalize: will do the right thing that it does also for unsigned operation - if it detects the high bound is lower than low bound, it adjusts the range by inverting it. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk (what about 4.7 where bar is miscompiled too)? Perhaps for PLUS_EXPR/MINUS_EXPR and signed -fwrapv type we could do the same instead of return NULL_TREE; that we do right now (added by Kazu for PR23518). Ok. Thanks, Richard. 2012-11-08 Jakub Jelinek ja...@redhat.com PR tree-optimization/55236 * fold-const.c (make_range_step) case NEGATE_EXPR: For -fwrapv and signed ARG0_TYPE, force low and high to be non-NULL. * gcc.dg/pr55236.c: New test. --- gcc/fold-const.c.jj 2012-11-07 11:24:34.0 +0100 +++ gcc/fold-const.c2012-11-08 16:54:38.978339040 +0100 @@ -3880,6 +3880,17 @@ make_range_step (location_t loc, enum tr return arg0; case NEGATE_EXPR: + /* If flag_wrapv and ARG0_TYPE is signed, make sure +low and high are non-NULL, then normalize will DTRT. */ + if (!TYPE_UNSIGNED (arg0_type) + !TYPE_OVERFLOW_UNDEFINED (arg0_type)) + { + if (low == NULL_TREE) + low = TYPE_MIN_VALUE (arg0_type); + if (high == NULL_TREE) + high = TYPE_MAX_VALUE (arg0_type); + } + /* (-x) IN [a,b] - x in [-b, -a] */ n_low = range_binop (MINUS_EXPR, exp_type, build_int_cst (exp_type, 0), --- gcc/testsuite/gcc.dg/pr55236.c.jj 2012-11-08 16:40:49.171781137 +0100 +++ gcc/testsuite/gcc.dg/pr55236.c 2012-11-08 16:41:20.044578919 +0100 @@ -0,0 +1,31 @@ +/* PR tree-optimization/55236 */ +/* { dg-do run } */ +/* { dg-options -O2 -fwrapv } */ + +extern void abort (); + +__attribute__((noinline, noclone)) void +foo (int i) +{ + if (i 0) +abort (); + i = -i; + if (i 0) +return; + abort (); +} + +__attribute__((noinline, noclone)) void +bar (int i) +{ + if (i 0 || (-i) = 0) +abort (); +} + +int +main () +{ + foo (-__INT_MAX__ - 1); + bar (-__INT_MAX__ - 1); + return 0; +} Jakub
Re: [PATCH] Reset source location for instructions moved out of its original residing basic block
On Sun, Nov 4, 2012 at 11:26 PM, Alexandre Oliva aol...@redhat.com wrote: On Nov 1, 2012, Dehao Chen de...@google.com wrote: I see your point. How about we guard these changes with a flag, say -gless-jumpy The right (DWARF-expected) approach to avoid this sortof jumpiness is to teach GCC to emit .locs with is_stmt=1, along the lines (and limitations) described in my GCC Summit paper on stmt frontier notes, that proposes further extensions to deal with the foreseen limitations. Right. gdb could also instead of show a single line (and that changing all the time) show all currently active lines (thus scan the current function for lines that appear and compute their liveness). A step would then jump to the next stmt that changes line liveness. Richard. http://people.redhat.com/~aoliva/papers/sfn/gcc2010.pdf -- Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist Red Hat Brazil Compiler Engineer
Re: [PATCH] Allocate extra 16 bytes for -fsanitize=address
On Fri, Nov 23, 2012 at 11:50 AM, Jakub Jelinek ja...@redhat.com wrote: On Fri, Nov 23, 2012 at 11:33:37AM -0800, H.J. Lu wrote: 2012-11-21 H.J. Lu hongjiu...@intel.com PR bootstrap/55380 * charset.c (_cpp_convert_input): Clear CPP_PAD_BUFFER_SIZE bytes if CLEAR_CPP_PAD_BUFFER isn't 0. Allocate extra CPP_PAD_BUFFER_SIZE bytes and clear it if CLEAR_CPP_PAD_BUFFER isn't 0. * files.c (read_file_guts): Allocate extra CPP_PAD_BUFFER_SIZE bytes. * internal.h (CPP_PAD_BUFFER_SIZE): New. Defined to 16 for -fsanitize=address if __SANITIZE_ADDRESS__ is defined. (CLEAR_CPP_PAD_BUFFER): New. I'd say you are making this way too much complicated. The following patch just changes those + 1 to + 16, adds memset of the 16 bytes unconditionally, but as it also fixes a thinko which IMHO affects the most common cases (regular file, with no conversion needed), I'd say it ought to be faster than the old version (the old version would allocate st.st_size + 1 bytes long buffer, read st.st_size bytes into it, call _cpp_convert_input with len st.st_size and size st.st_size (the latter should have been st.st_size + 1, i.e. the allocated size) and thus to.len = to.asize was true and we called realloc with the same length as malloc has been called originally. 2012-11-23 Jakub Jelinek ja...@redhat.com PR bootstrap/55380 * files.c (read_file_guts): Allocate extra 16 bytes instead of 1 byte at the end of buf. Pass size + 16 instead of size to _cpp_convert_input. * charset.c (_cpp_convert_input): Reallocate if there aren't at least 16 bytes beyond to.len in the buffer. Clear 16 bytes at to.text + to.len. --- libcpp/files.c.jj 2012-11-22 11:04:24.0 +0100 +++ libcpp/files.c 2012-11-23 20:37:03.146379853 +0100 @@ -671,7 +671,7 @@ read_file_guts (cpp_reader *pfile, _cpp_ the majority of C source files. */ size = 8 * 1024; - buf = XNEWVEC (uchar, size + 1); + buf = XNEWVEC (uchar, size + 16); total = 0; while ((count = read (file-fd, buf + total, size - total)) 0) { @@ -682,7 +682,7 @@ read_file_guts (cpp_reader *pfile, _cpp_ if (regular) break; size *= 2; - buf = XRESIZEVEC (uchar, buf, size + 1); + buf = XRESIZEVEC (uchar, buf, size + 16); } } @@ -699,7 +699,7 @@ read_file_guts (cpp_reader *pfile, _cpp_ file-buffer = _cpp_convert_input (pfile, CPP_OPTION (pfile, input_charset), -buf, size, total, +buf, size + 16, total, file-buffer_start, file-st.st_size); file-buffer_valid = true; --- libcpp/charset.c.jj 2011-01-06 10:22:00.0 +0100 +++ libcpp/charset.c2012-11-23 20:40:39.736116642 +0100 @@ -1,6 +1,6 @@ /* CPP Library - charsets Copyright (C) 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2006, 2008, 2009, - 2010 Free Software Foundation, Inc. + 2010, 2012 Free Software Foundation, Inc. Broken out of c-lex.c Apr 2003, adding valid C99 UCN ranges. @@ -1729,9 +1729,12 @@ _cpp_convert_input (cpp_reader *pfile, c iconv_close (input_cset.cd); /* Resize buffer if we allocated substantially too much, or if we - haven't enough space for the \n-terminator. */ - if (to.len + 4096 to.asize || to.len = to.asize) -to.text = XRESIZEVEC (uchar, to.text, to.len + 1); + haven't enough space for the \n-terminator or following + 15 bytes of padding. */ + if (to.len + 4096 to.asize || to.len + 16 to.asize) +to.text = XRESIZEVEC (uchar, to.text, to.len + 16); + + memset (to.text + to.len, '\0', 16); /* If the file is using old-school Mac line endings (\r only), terminate with another \r, not an \n, so that we do not mistake Jakub You should also mention: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54691 in ChangeLog entry. Thanks. -- H.J.
Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
On Mon, Nov 5, 2012 at 2:59 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: On 11/04/2012 11:54 AM, Richard Biener wrote: On Thu, Nov 1, 2012 at 2:10 PM, Richard Sandiford rdsandif...@googlemail.com wrote: Kenneth Zadeck zad...@naturalbridge.com writes: I would like you to respond to at least point 1 of this email. In it there is code from the rtl level that was written twice, once for the case when the size of the mode is less than the size of a HWI and once for the case where the size of the mode is less that 2 HWIs. my patch changes this to one instance of the code that works no matter how large the data passed to it is. you have made a specific requirement for wide int to be a template that can be instantiated in several sizes, one for 1 HWI, one for 2 HWI. I would like to know how this particular fragment is to be rewritten in this model? It seems that I would have to retain the structure where there is one version of the code for each size that the template is instantiated. I think richi's argument was that wide_int should be split into two. There should be a bare-metal class that just has a length and HWIs, and the main wide_int class should be an extension on top of that that does things to a bit precision instead. Presumably with some template magic so that the length (number of HWIs) is a constant for: typedef foo2 double_int; and a variable for wide_int (because in wide_int the length would be the number of significant HWIs rather than the size of the underlying array). wide_int would also record the precision and apply it after the full HWI operation. So the wide_int class would still provide as wide as we need arithmetic, as in your rtl patch. I don't think he was objecting to that. That summarizes one part of my complaints / suggestions correctly. In other mails I suggested to not make it a template but a constant over object lifetime 'bitsize' (or maxlen) field. Both suggestions likely require more thought than I put into them. The main reason is that with C++ you can abstract from where wide-int information pieces are stored and thus use the arithmetic / operation workers without copying the (source) wide-int objects. Thus you should be able to write adaptors for double-int storage, tree or RTX storage. We had considered something along these lines and rejected it. I am not really opposed to doing something like this, but it is not an obvious winning idea and is likely not to be a good idea. Here was our thought process: if you abstract away the storage inside a wide int, then you should be able to copy a pointer to the block of data from either the rtl level integer constant or the tree level one into the wide int. It is certainly true that making a wide_int from one of these is an extremely common operation and doing this would avoid those copies. However, this causes two problems: 1) Mike's first cut at the CONST_WIDE_INT did two ggc allocations to make the object. it created the base object and then it allocated the array. Richard S noticed that we could just allocate one CONST_WIDE_INT that had the array in it. Doing it this way saves one ggc allocation and one indirection when accessing the data within the CONST_WIDE_INT. Our plan is to use the same trick at the tree level. So to avoid the copying, you seem to have to have a more expensive rep for CONST_WIDE_INT and INT_CST. I did not propose having a pointer to the data in the RTX or tree int. Just the short-lived wide-ints (which are on the stack) would have a pointer to the data - which can then obviously point into the RTX and tree data. 2) You are now stuck either ggcing the storage inside a wide_int when they are created as part of an expression or you have to play some game to represent the two different storage plans inside of wide_int. Hm? wide-ints are short-lived and thus never live across a garbage collection point. We create non-GCed objects pointing to GCed objects all the time and everywhere this way. Clearly this is where you think that we should be going by suggesting that we abstract away the internal storage. However, this comes at a price: what is currently an array access in my patches would (i believe) become a function call. No, the workers (that perform the array accesses) will simply get a pointer to the first data element. Then whether it's embedded or external is of no interest to them. From a performance point of view, i believe that this is a non starter. If you can figure out how to design this so that it is not a function call, i would consider this a viable option. On the other side of this you are clearly correct that we are copying the data when we are making wide ints from INT_CSTs or CONST_WIDE_INTs.But this is why we represent data inside of the wide_ints, the INT_CSTs and the CONST_WIDE_INTs in a compressed form. Even with very big types, which are
Re: [PATCH] gcc-{ar,nm,ranlib}: Find binutils binaries relative to self
On Wed, Nov 7, 2012 at 10:51 PM, Meador Inge mead...@codesourcery.com wrote: Ping ^ 4. Ok. Thanks, Richard. On 10/29/2012 10:46 AM, Meador Inge wrote: Ping ^ 3. On 10/18/2012 10:30 AM, Meador Inge wrote: Ping ^ 2. On 10/09/2012 09:44 PM, Meador Inge wrote: Ping. On 10/04/2012 03:45 PM, Meador Inge wrote: Hi All, Currently the gcc-{ar,nm,ranlib} utilities assume that binutils is in path when invoking the wrapped binutils program. This goes against the accepted practice in GCC to find sub-programs relative to where the GCC binaries are stored and to not make assumptions about the PATH. This patch changes the gcc-{ar,nm,ranlib} utilities to do the same by factoring out some utility code for finding files from collect2.c. These functions are then leveraged to find the binutils programs. Note that similar code exist in gcc.c. Perhaps one day everything can be merged to the file-find files. Tested for Windows and GNU/Linux hosts and i686-pc-linux-gnu and arm-none-eabi targets. OK? P.S. I am not quite sure what is best for the copyrights and contributed by comments in the file-find* files I added since that code was just moved. This patch drops the contributed by and keeps all the copyright dates from collect2.c. 2012-10-04 Meador Inge mead...@codesourcery.com * collect2.c (main): Call find_file_set_debug. (find_a_find, add_prefix, prefix_from_env, prefix_from_string): Factor out into ... * file-find.c (New file): ... here and ... * file-find.h (New file): ... here. * gcc-ar.c (standard_exec_prefix): New variable. (standard_libexec_prefix): Ditto. (tooldir_base_prefix) Ditto. (self_exec_prefix): Ditto. (self_libexec_prefix): Ditto. (self_tooldir_prefix): Ditto. (target_version): Ditto. (path): Ditto. (target_path): Ditto. (setup_prefixes): New function. (main): Rework how wrapped programs are found. * Makefile.in (OBJS-libcommon-target): Add file-find.o. (AR_OBJS): New variable. (gcc-ar$(exeext)): Add dependency on $(AR_OBJS). (gcc-nm$(exeext)): Ditto. (gcc-ranlib(exeext)): Ditto. (COLLECT2_OBJS): Add file-find.o. (collect2.o): Add file-find.h prerequisite. (file-find.o): New rule. Index: gcc/gcc-ar.c === --- gcc/gcc-ar.c (revision 192099) +++ gcc/gcc-ar.c (working copy) @@ -21,21 +21,110 @@ #include config.h #include system.h #include libiberty.h +#include file-find.h #ifndef PERSONALITY #error Please set personality #endif +/* The exec prefix as derived at compile-time from --prefix. */ + +static const char standard_exec_prefix[] = STANDARD_EXEC_PREFIX; + +/* The libexec prefix as derived at compile-time from --prefix. */ + static const char standard_libexec_prefix[] = STANDARD_LIBEXEC_PREFIX; + +/* The bindir prefix as derived at compile-time from --prefix. */ + static const char standard_bin_prefix[] = STANDARD_BINDIR_PREFIX; -static const char *const target_machine = TARGET_MACHINE; +/* A relative path to be used in finding the location of tools + relative to this program. */ + +static const char *const tooldir_base_prefix = TOOLDIR_BASE_PREFIX; + +/* The exec prefix as relocated from the location of this program. */ + +static const char *self_exec_prefix; + +/* The libexec prefix as relocated from the location of this program. */ + +static const char *self_libexec_prefix; + +/* The tools prefix as relocated from the location of this program. */ + +static const char *self_tooldir_prefix; + +/* The name of the machine that is being targeted. */ + +static const char *const target_machine = DEFAULT_TARGET_MACHINE; + +/* The target version. */ + +static const char *const target_version = DEFAULT_TARGET_VERSION; + +/* The collection of target specific path prefixes. */ + +static struct path_prefix target_path; + +/* The collection path prefixes. */ + +static struct path_prefix path; + +/* The directory separator. */ + static const char dir_separator[] = { DIR_SEPARATOR, 0 }; +static void +setup_prefixes (const char *exec_path) +{ + const char *self; + + self = getenv (GCC_EXEC_PREFIX); + if (!self) +self = exec_path; + else +self = concat (self, gcc- PERSONALITY, NULL); + + /* Relocate the exec prefix. */ + self_exec_prefix = make_relative_prefix (self, + standard_bin_prefix, + standard_exec_prefix); + if (self_exec_prefix == NULL) +self_exec_prefix = standard_exec_prefix; + + /* Relocate libexec prefix. */ + self_libexec_prefix = make_relative_prefix (self, +standard_bin_prefix, +standard_libexec_prefix); + if (self_libexec_prefix == NULL) +self_libexec_prefix = standard_libexec_prefix; + + + /* Build the relative
Re: [PATCH] Fix substitute_and_fold ICE (PR tree-optimization/55331)
On Thu, Nov 15, 2012 at 9:09 PM, Jakub Jelinek ja...@redhat.com wrote: Hi! On the following testcase substitute_and_fold ICEs because memmove of length 1 on an empty class is optimized away, and this function wasn't prepared to handle that. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.7? I think the bug is that the stmt is removed. fold_stmt is not supposed to remove the stmt in any case - it may replace it with a gimple_nop at most. Thus, gimplify_and_update_call_from_tree is at fault here. I am testing a patch that fixes it. Thanks, Richard. 2012-11-15 Jakub Jelinek ja...@redhat.com PR tree-optimization/55331 * tree-ssa-propagate.c (substitute_and_fold): Handle fold_stmt turning a stmt into nothing. * g++.dg/opt/pr55331.C: New test. --- gcc/tree-ssa-propagate.c.jj 2012-11-02 09:01:55.0 +0100 +++ gcc/tree-ssa-propagate.c2012-11-15 12:27:37.218543417 +0100 @@ -1094,7 +1094,7 @@ substitute_and_fold (ssa_prop_get_value_ gimple stmt = gsi_stmt (i); gimple old_stmt; enum gimple_code code = gimple_code (stmt); - gimple_stmt_iterator oldi; + gimple_stmt_iterator oldi, nexti; oldi = i; if (do_dce) @@ -1147,6 +1147,8 @@ substitute_and_fold (ssa_prop_get_value_ } old_stmt = stmt; + nexti = oldi; + gsi_next (nexti); /* Some statements may be simplified using propagator specific information. Do this before propagating @@ -1171,36 +1173,63 @@ substitute_and_fold (ssa_prop_get_value_ /* Now cleanup. */ if (did_replace) { - stmt = gsi_stmt (oldi); - - /* If we cleaned up EH information from the statement, - remove EH edges. */ - if (maybe_clean_or_replace_eh_stmt (old_stmt, stmt)) - gimple_purge_dead_eh_edges (bb); - - if (is_gimple_assign (stmt) - (get_gimple_rhs_class (gimple_assign_rhs_code (stmt)) - == GIMPLE_SINGLE_RHS)) - { -tree rhs = gimple_assign_rhs1 (stmt); - -if (TREE_CODE (rhs) == ADDR_EXPR) - recompute_tree_invariant_for_addr_expr (rhs); - } + /* old_stmt might have been also replaced by nothing, +in that case set stmt to NULL. */ + if (gsi_end_p (oldi)) + { + gcc_checking_assert (gsi_end_p (nexti)); + stmt = NULL; + } + else + { + stmt = gsi_stmt (oldi); + if (!gsi_end_p (nexti) gsi_stmt (nexti) == stmt) + stmt = NULL; + } - /* Determine what needs to be done to update the SSA form. */ - update_stmt (stmt); - if (!is_gimple_debug (stmt)) - something_changed = true; + if (stmt == NULL) + { + if (maybe_clean_eh_stmt (old_stmt)) + gimple_purge_dead_eh_edges (bb); + something_changed = true; + } + else + { + /* If we cleaned up EH information from the statement, +remove EH edges. */ + if (maybe_clean_or_replace_eh_stmt (old_stmt, stmt)) + gimple_purge_dead_eh_edges (bb); + + if (is_gimple_assign (stmt) + (get_gimple_rhs_class (gimple_assign_rhs_code (stmt)) + == GIMPLE_SINGLE_RHS)) + { + tree rhs = gimple_assign_rhs1 (stmt); + + if (TREE_CODE (rhs) == ADDR_EXPR) + recompute_tree_invariant_for_addr_expr (rhs); + } + + /* Determine what needs to be done to update the SSA +form. */ + update_stmt (stmt); + if (!is_gimple_debug (stmt)) + something_changed = true; + } } if (dump_file (dump_flags TDF_DETAILS)) { if (did_replace) { - fprintf (dump_file, Folded into: ); - print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM); - fprintf (dump_file, \n); + if (stmt == NULL) + fprintf (dump_file, Folded into nothing\n); + else + { + fprintf (dump_file, Folded into: ); + print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM); + fprintf (dump_file, \n); + } } else fprintf (dump_file, Not
Re: [PATCH] Fix substitute_and_fold ICE (PR tree-optimization/55331)
On Mon, Nov 26, 2012 at 04:24:41PM +0100, Richard Biener wrote: On Thu, Nov 15, 2012 at 9:09 PM, Jakub Jelinek ja...@redhat.com wrote: On the following testcase substitute_and_fold ICEs because memmove of length 1 on an empty class is optimized away, and this function wasn't prepared to handle that. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.7? I think the bug is that the stmt is removed. fold_stmt is not supposed to remove the stmt in any case - it may replace it with a gimple_nop at most. Thus, gimplify_and_update_call_from_tree is at fault here. I am testing a patch that fixes it. Note that fold_stmt_1 also has code to handle a call folding into nothing, so perhaps that could be removed too if you change it to fold into a nop instead. Jakub
Re: [PATCH] Cleanup last_location and update input_location in ipa_prop
On Thu, Nov 8, 2012 at 6:39 PM, Dehao Chen de...@google.com wrote: ping... The emit-rtl.c hunk is ok. I'm questioning the ipa-prop.c hunk though - what looks at input_location (nothing outside of the frontend should, really). Thanks, Richard. On Mon, Nov 5, 2012 at 5:20 PM, Dehao Chen de...@google.com wrote: Hi, This is a patch to do some obvious cleanup and setting correct input_location in ipa_prop (because it invokes gimplification routines). Bootstrapped and passed gcc regression tests. Is it okay for trunk? Thanks, Dehao gcc/ChangeLog: 2010-11-05 Dehao Chen de...@google.com * ipa-prop.c (ipa_modify_call_arguments): Set input_location so that gimplification routines can have right location. * emit-rtl.c (last_location): Remove unused variable.
Re: [PATCH] Do not dump location for compare_debug
On Thu, Nov 1, 2012 at 1:26 AM, Dehao Chen de...@google.com wrote: Hi, When -fcompare_debug is used, what we really want to do is to compare instructions between the -g version and -gtoggle version. However, current dump file still contains the source line in its rtl dump. This patch changes to only dump rtl without dumping its source info. Bootstrapped and passed gcc regression test on x86_64. Is it okay for trunk? I'm not sure - it at least works comparing with them. Alex - was source location info included on purpose here? Thanks, Richard. Thanks, Dehao 2012-10-31 Dehao Chen de...@google.com * final.c (rest_of_clean_state): Dump the final rtl without location.
Re: patch to fix constant math - second small patch
On Thu, Nov 8, 2012 at 7:13 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: I have added the proper doc. OK to commit? Ok. Thanks, Richard. Kenny On 10/08/2012 05:06 AM, Richard Guenther wrote: On Sat, Oct 6, 2012 at 12:48 AM, Kenneth Zadeck zad...@naturalbridge.com wrote: This patch adds machinery to genmodes.c so that largest possible sizes of various data structures can be determined at gcc build time. These functions create 3 symbols that are available in insn-modes.h: MAX_BITSIZE_MODE_INT - the bitsize of the largest int. MAX_BITSIZE_MODE_PARTIAL_INT - the bitsize of the largest partial int. MAX_BITSIZE_MODE_ANY_INT - the largest bitsize of any kind of int. Ok. Please document these macros in rtl.texi. Richard.
Re: Tighten checking of vector comparisons
On Thu, Nov 22, 2012 at 11:59 AM, Marc Glisse marc.gli...@inria.fr wrote: (I forgot to send this at the time) On Sun, 4 Nov 2012, Richard Biener wrote: - else if (!INTEGRAL_TYPE_P (type) TREE_CODE (type) != VECTOR_TYPE) + else if (!INTEGRAL_TYPE_P (type) !VOID_TYPE_P (type) + TREE_CODE (type) != VECTOR_TYPE) [...] Ok for this part. Applied, thanks. Index: gcc/tree-cfg.c === --- gcc/tree-cfg.c (revision 193060) +++ gcc/tree-cfg.c (working copy) @@ -3263,21 +3263,30 @@ verify_gimple_comparison (tree type, tre error (mismatching comparison operand types); debug_generic_expr (op0_type); debug_generic_expr (op1_type); return true; } /* The resulting type of a comparison may be an effective boolean type. */ if (INTEGRAL_TYPE_P (type) (TREE_CODE (type) == BOOLEAN_TYPE || TYPE_PRECISION (type) == 1)) -; +{ + if (TREE_CODE (op0_type) == VECTOR_TYPE + || TREE_CODE (op1_type) == VECTOR_TYPE) +{ + error (vector comparison returning a boolean); + debug_generic_expr (op0_type); + debug_generic_expr (op1_type); + return true; +} verify_gimple_* should have positive checks, thus, check that if there are vector operands the comparison result should be a vector. Not complaining about a vector comparison having a boolean result. I wasn't sure what that was supposed to look like, so I dropped it for now. Ok, looking closer we have /* The resulting type of a comparison may be an effective boolean type. */ if (INTEGRAL_TYPE_P (type) (TREE_CODE (type) == BOOLEAN_TYPE || TYPE_PRECISION (type) == 1)) ; /* Or an integer vector type with the same size and element count as the comparison operand types. */ else if (TREE_CODE (type) == VECTOR_TYPE TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE) { ... Thus you are rejecting a boolean valued vector comparison which we otherwise happily accept. I suppose that makes sense (even though at least equality compares can make sense). Thus that hunk is ok as well. Thanks, Richard. -- Marc Glisse
Re: [PATCH] Update source location for PRE inserted stmt
On Sun, Nov 25, 2012 at 11:37 AM, Xinliang David Li davi...@google.com wrote: On Sun, Nov 25, 2012 at 4:40 AM, Richard Biener richard.guent...@gmail.com wrote: On Thu, Nov 15, 2012 at 5:46 PM, Eric Botcazou ebotca...@adacore.com wrote: But UNKNOWN_LOCATION is effectively wrong as well. If other optimizations move the statements above the inserted instruction, then the new instruction ends up inheriting whatever location happens to be in the previous statement. That's correct, UNKNOWN_LOCATION isn't a panacea either, although it can help in some cases. The only safe thing to do is to put the exact source location of the statement, i.e. the location of the source construct for which the statement has been generated. There may not be a source location for a generated statement, the computed value might not have been computed in the source at any point even. Consider re-association of an expression and then a re-associated part becoming partially redundant. if () tem = a + b; tem2 = a + c + b; after re-assoc + PRE: if () tem = a + b; else tem' = a + b; // which sloc here? For inserted expr/stmt, inheriting location from placeholder is making sense. But when the placeholder has no location in the first place, explicit NOWHERE is needed, otherwise it will just has random location from other basic block. Thus for this cases, sloc here should be NOWHERE. However, in the unittest I provided: if (x 0) ret += *a; else a++; ret += *a; After PRE: if (x 0) { pretmp_1 = *a; ret += pretmp_1; } else { a++; pretmp_2 = *a; // it should inherit location from prev line } pretmp_3 = PHI(pretmp_1, pretmp_2) How does that sound? Thanks, Dehao If there was a else branch, the proposed patch by Dehao will use the source location of the previous stmt 'tem' = a + b' was inserted after. I did not see any special source location handling in tree-ssa-reassoc.c -- looks like the same principle is used there -- the new assignment after the operand shuffling will inherit the source location of the placeholder gimple stmt. tem'' = PHI tem, tem' // or here? on the args? tem2 = tem'' + c; so what source location would you use for the inserted expression? If UNKNOWN is not correct here then I think we need an explicit NOWHERE? Can that break line coverage info too, say, when source location get stripped from reassociated expressions? David Fixing the location when the statement is inserted will reduce this randomness. I'm not sure I understand why you think this will break coverage. The examples given in this thread were helped by this location fixing. What do you call randomness exactly? GDB jumpiness? I'm a little wary of fixing GDB jumpiness by distorting the debug info. Ian has clearly outlined the correct approach to addressing it. Note that I didn't specifically reply to Dehao's patch here, which might be acceptable in the end, only to David's seemingly general statement about the natural way of putting a location on these statements. Richard. -- Eric Botcazou
[C++ Patch / RFC] PR 51242
Hi, this PR is about the rejection (in C++11 mode of course) of: enum class MyEnum { A = 1 }; struct MyClass { MyEnum Field1 : 3; }; whereas the corresponding unscoped enum case already works. As noticed by Jon, it seems that the below straightforward patchlet is enough to solve the problem but then we have a warning issue: 51242.C:5:19: warning: ‘MyClass::Field1’ is too small to hold all values of ‘enum class MyEnum’ [enabled by default] which is easily explained: the underlying type is fixed (to an implicit int type), thus in finish_enum_value_list, fixed_underlying_type_p is true and the code restricting the ENUM_UNDERLYING_TYPE for diagnostic purposes doesn't run. In other terms, we warn for the same reason we for example -Wnarrowing warn (before erroring out) for: enum class Code { SUCCESS = 0 }; Code a; short r[] = {a}; That seems intended, but especially annoying in the bitfield case. An idea could be giving a name to the too small to hold all values warning and making possible disabling it. What do you suggest? Thanks! Paolo. Index: decl2.c === --- decl2.c (revision 193814) +++ decl2.c (working copy) @@ -1030,7 +1030,7 @@ grokbitfield (const cp_declarator *declarator, if (TREE_CODE (value) == VOID_TYPE) return void_type_node; - if (!INTEGRAL_OR_UNSCOPED_ENUMERATION_TYPE_P (TREE_TYPE (value)) + if (!INTEGRAL_OR_ENUMERATION_TYPE_P (TREE_TYPE (value)) (POINTER_TYPE_P (value) || !dependent_type_p (TREE_TYPE (value {
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Mon, Nov 26, 2012 at 3:38 PM, Dominique Dhumieres domi...@lps.ens.fr wrote: Or rather this one. Same hammer, different color. It turns out that the rtlanal.c change caused problems, so I've got to use a home-brewn equivalent of remove_reg_equal_equiv_notes_for_regno... This does not fix pr55006 on x86_64-apple-darwin10;-(while the patch in http://gcc.gnu.org/ml/gcc-patches/2012-11/msg01607.html does). Does it fail on x86_64-linux also? If not as-is, can you try with -fPIC? Test case is unchanged so I'm omitting it here. This test passes on x86_64-apple-darwin10 for a clean trunk. Yes, I know. As you can probably tell from the patch, I've been working on an older revision because I can't reproduce the problem on the trunk. It's a very elusive problem. Ciao! Steven
Re: [patch] Rework RTL CFG graph dumping to dump DOT format
On Fri, Nov 23, 2012 at 10:46 AM, Steven Bosscher stevenb@gmail.com wrote: Hello, The graph.[ch] code to dump the CFG for viewing with VCG is quite broken and hasn't been updated to follow the advances in the CFG infrastructure. The attached patch is the first in a series of patches I have planned to update the CFG-as-graph dumping code to dump GraphViz DOT input instead. The attached archive is an example input/output. There's still quite a lot of work to be done: - use pretty-print in the slim RTL dumping - move the escaped-string printing into pretty-print - make the basic block content dumping a cfghook - make these dumps work for GIMPLE also For the moment, this is a good starting point. Obviously there are no changes in code generation due to this patch, so IMHO this patch (and the rest of the series) should be safe for stage3. Hopefully everyone else agrees :-) Bootstrappedtested on powerpc64-unknown-linux-gnu. OK for trunk? Hmm, you are removing an (appearant) abstraction that I didn't know was there but remember to have asked for in the thread for dumping a late callgraph for Ada purposes. Is the abstraction not working or was it simply a hassle to keep the VCG path working (I don't thjnk I have ever seen a VCG consumer ... ;)). Btw, I of course have my own CFG dumper (producing graphviz input) in my local tree - attached for reference (I'm simply using it from gdb sessions). Richard. Ciao! Steven p Description: Binary data
Re: [PATCH] Cleanup last_location and update input_location in ipa_prop
On Mon, Nov 26, 2012 at 7:28 AM, Richard Biener richard.guent...@gmail.com wrote: On Thu, Nov 8, 2012 at 6:39 PM, Dehao Chen de...@google.com wrote: ping... The emit-rtl.c hunk is ok. I'm questioning the ipa-prop.c hunk though - what looks at input_location (nothing outside of the frontend should, really). ipa_modify_call_arguments invokes force_gimple_operand_gsi, which uses frontend routines to gimplify expr and uses input_location. Thanks, Dehao Thanks, Richard. On Mon, Nov 5, 2012 at 5:20 PM, Dehao Chen de...@google.com wrote: Hi, This is a patch to do some obvious cleanup and setting correct input_location in ipa_prop (because it invokes gimplification routines). Bootstrapped and passed gcc regression tests. Is it okay for trunk? Thanks, Dehao gcc/ChangeLog: 2010-11-05 Dehao Chen de...@google.com * ipa-prop.c (ipa_modify_call_arguments): Set input_location so that gimplification routines can have right location. * emit-rtl.c (last_location): Remove unused variable.
Re: [patch] Rework RTL CFG graph dumping to dump DOT format
On Mon, Nov 26, 2012 at 4:46 PM, Richard Biener wrote: On Fri, Nov 23, 2012 at 10:46 AM, Steven Bosscher wrote: Hello, The graph.[ch] code to dump the CFG for viewing with VCG is quite broken and hasn't been updated to follow the advances in the CFG infrastructure. The attached patch is the first in a series of patches I have planned to update the CFG-as-graph dumping code to dump GraphViz DOT input instead. The attached archive is an example input/output. There's still quite a lot of work to be done: - use pretty-print in the slim RTL dumping - move the escaped-string printing into pretty-print - make the basic block content dumping a cfghook - make these dumps work for GIMPLE also For the moment, this is a good starting point. Obviously there are no changes in code generation due to this patch, so IMHO this patch (and the rest of the series) should be safe for stage3. Hopefully everyone else agrees :-) Bootstrappedtested on powerpc64-unknown-linux-gnu. OK for trunk? Hmm, you are removing an (appearant) abstraction that I didn't know was there but remember to have asked for in the thread for dumping a late callgraph for Ada purposes. Is the abstraction not working or was it simply a hassle to keep the VCG path working (I don't thjnk I have ever seen a VCG consumer ... ;)). I'm not sure what abstraction you mean. If you're talking about the old graph.[ch], there's nothing abstract about it, it only dumps an RTL CFG and, indeed, that wasn't working properly. The dumps were ugly if they work at all. Each insn was dumped as a separate node and there was no option to dump complete translation units to a single file (my dumper does that just fine, it dumps each CFG as a disjoint subgraph). (Some examples of what the dumps look like are attached to this e-mail: http://gcc.gnu.org/ml/gcc-patches/2012-11/msg01977.html) A generic graph dumping interface would be nice (graph_traits?) but for the moment I'm just trying to make a debugging tool available. Btw, I of course have my own CFG dumper (producing graphviz input) in my local tree - attached for reference (I'm simply using it from gdb sessions). I'll integrate this in my work for a follow-up patch. Well, if you'd OK the one from this thread first, of course ;-) Ciao! Steven
Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
Ping. Teresa On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson tejohn...@google.com wrote: Revised patch that fixes failures encountered when enabling -freorder-blocks-and-partition, including the failure reported in PR 53743. This includes new verification code to ensure no cold blocks dominate hot blocks contributed by Steven Bosscher. I attempted to make the handling of partition updates through the optimization passes much more consistent, removing a number of partial fixes in the code stream in the process. The code to fixup partitions (including the BB_PARTITION assignement, region crossing jump notes, and switch text section notes) is now handled in a few centralized locations. For example, inside rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers don't need to attempt the fixup themselves. For optimization passes that make adjustments to the cfg while in cfg layout mode that are not easy to fix up incrementally, the new routine fixup_partitions handles the cleanup globally. This does require calculation of the dominance relation, however, as far as I can tell the routines which now invoke this global fixup (try_optimize_cfg and commit_edge_insertions) are invoked typically once (or a small number of times in the case of try_optimize_cfg) per optimization pass. Additionally, I compared the -ftime-report output for some large fdo compilations and saw only minimal increases in the dominance computation times, which were only a tiny percent of the overall compile time. Additionally, I added a flag to the rtl_data structure to indicate whether any partitioning was actually performed, so that optimizations which were conservatively disabled whenever the flag_reorder_blocks_and_partition is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less conservative for functions where no partitions were formed (e.g. they are completely hot). Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with SPEC2006 int benchmarks and internal google benchmarks using profile feedback and -freorder-blocks-and-partition to get more coverage. Ok for trunk? Thanks, Teresa 2012-11-14 Teresa Johnson tejohn...@google.com Steven Bosscher ste...@gcc.gnu.org * cfghooks.h (cfg_layout_finalize): New parameter. * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize parameter. * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert as this is now done by redirect_edge_and_branch_force. * function.c (thread_prologue_and_epilogue_insns): Insert new bb after barriers, new cfg_layout_finalize parameter, and don't store exit predecessor BB until after it is potentially split. * function.h (struct rtl_data): New flag has_bb_partition. * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter. * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if any blocks in function actually partitioned. (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean up partitioning. * bb-reorder.c (connect_traces): Only look for partitions and skip block copying if any blocks in function actually partitioned. (emit_barrier_after_bb): Handle insertion in non-cfglayout mode. (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure that no cold blocks dominate a hot block. (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert as this is now done by force_nonfallthru_and_redirect. (add_reg_crossing_jump_notes): Handle the fact that some jumps may already be marked with region crossing note. (reorder_basic_blocks): Only need to verify partitions if any blocks in function actually partitioned. (insert_section_boundary_note): Only need to insert note if any blocks in function actually partitioned. (rest_of_handle_reorder_blocks): New cfg_layout_finalize parameter, and remove call to insert_section_boundary_note as this is now called via cfg_layout_finalize/fixup_reorder_chain. (duplicate_computed_gotos): New cfg_layout_finalize parameter. (partition_hot_cold_basic_blocks): Set flag indicating function has bb partitions. * bb-reorder.h: Declare insert_section_boundary_note and emit_barrier_after_bb, which are no longer static. * basic-block.h: Declare new function fixup_partitions. * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary check for region crossing note. (fixup_partition_crossing): New function. (fixup_bb_partition): Ditto. (rtl_redirect_edge_and_branch): Fixup partition boundaries. (force_nonfallthru_and_redirect): Fixup partition boundaries, remove old code that tried to
Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
On 11/26/2012 10:03 AM, Richard Biener wrote: On Mon, Nov 5, 2012 at 2:59 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: On 11/04/2012 11:54 AM, Richard Biener wrote: On Thu, Nov 1, 2012 at 2:10 PM, Richard Sandiford rdsandif...@googlemail.com wrote: Kenneth Zadeck zad...@naturalbridge.com writes: I would like you to respond to at least point 1 of this email. In it there is code from the rtl level that was written twice, once for the case when the size of the mode is less than the size of a HWI and once for the case where the size of the mode is less that 2 HWIs. my patch changes this to one instance of the code that works no matter how large the data passed to it is. you have made a specific requirement for wide int to be a template that can be instantiated in several sizes, one for 1 HWI, one for 2 HWI. I would like to know how this particular fragment is to be rewritten in this model? It seems that I would have to retain the structure where there is one version of the code for each size that the template is instantiated. I think richi's argument was that wide_int should be split into two. There should be a bare-metal class that just has a length and HWIs, and the main wide_int class should be an extension on top of that that does things to a bit precision instead. Presumably with some template magic so that the length (number of HWIs) is a constant for: typedef foo2 double_int; and a variable for wide_int (because in wide_int the length would be the number of significant HWIs rather than the size of the underlying array). wide_int would also record the precision and apply it after the full HWI operation. So the wide_int class would still provide as wide as we need arithmetic, as in your rtl patch. I don't think he was objecting to that. That summarizes one part of my complaints / suggestions correctly. In other mails I suggested to not make it a template but a constant over object lifetime 'bitsize' (or maxlen) field. Both suggestions likely require more thought than I put into them. The main reason is that with C++ you can abstract from where wide-int information pieces are stored and thus use the arithmetic / operation workers without copying the (source) wide-int objects. Thus you should be able to write adaptors for double-int storage, tree or RTX storage. We had considered something along these lines and rejected it. I am not really opposed to doing something like this, but it is not an obvious winning idea and is likely not to be a good idea. Here was our thought process: if you abstract away the storage inside a wide int, then you should be able to copy a pointer to the block of data from either the rtl level integer constant or the tree level one into the wide int. It is certainly true that making a wide_int from one of these is an extremely common operation and doing this would avoid those copies. However, this causes two problems: 1) Mike's first cut at the CONST_WIDE_INT did two ggc allocations to make the object. it created the base object and then it allocated the array. Richard S noticed that we could just allocate one CONST_WIDE_INT that had the array in it. Doing it this way saves one ggc allocation and one indirection when accessing the data within the CONST_WIDE_INT. Our plan is to use the same trick at the tree level. So to avoid the copying, you seem to have to have a more expensive rep for CONST_WIDE_INT and INT_CST. I did not propose having a pointer to the data in the RTX or tree int. Just the short-lived wide-ints (which are on the stack) would have a pointer to the data - which can then obviously point into the RTX and tree data. There is the issue then what if some wide-ints are not short lived. It makes me nervous to create internal pointers to gc ed memory. 2) You are now stuck either ggcing the storage inside a wide_int when they are created as part of an expression or you have to play some game to represent the two different storage plans inside of wide_int. Hm? wide-ints are short-lived and thus never live across a garbage collection point. We create non-GCed objects pointing to GCed objects all the time and everywhere this way. Again, this makes me nervous but it could be done. However, it does mean that now the wide ints that are not created from rtxes or trees will be more expensive because they are not going to get their storage for free, they are going to alloca it. however, it still is not clear, given that 99% of the wide ints are going to fit in a single hwi, that this would be a noticeable win. Clearly this is where you think that we should be going by suggesting that we abstract away the internal storage. However, this comes at a price: what is currently an array access in my patches would (i believe) become a function call. No, the workers (that perform the array accesses) will simply get a pointer to the first data element. Then whether it's embedded or external is of no interest to
Re: [PATCH] Cleanup last_location and update input_location in ipa_prop
On Mon, Nov 26, 2012 at 4:54 PM, Dehao Chen de...@google.com wrote: On Mon, Nov 26, 2012 at 7:28 AM, Richard Biener richard.guent...@gmail.com wrote: On Thu, Nov 8, 2012 at 6:39 PM, Dehao Chen de...@google.com wrote: ping... The emit-rtl.c hunk is ok. I'm questioning the ipa-prop.c hunk though - what looks at input_location (nothing outside of the frontend should, really). ipa_modify_call_arguments invokes force_gimple_operand_gsi, which uses frontend routines to gimplify expr and uses input_location. Can you be more specific? That's the place that needs fixing - there are a lot more force_gimple_operand callers. Richard. Thanks, Dehao Thanks, Richard. On Mon, Nov 5, 2012 at 5:20 PM, Dehao Chen de...@google.com wrote: Hi, This is a patch to do some obvious cleanup and setting correct input_location in ipa_prop (because it invokes gimplification routines). Bootstrapped and passed gcc regression tests. Is it okay for trunk? Thanks, Dehao gcc/ChangeLog: 2010-11-05 Dehao Chen de...@google.com * ipa-prop.c (ipa_modify_call_arguments): Set input_location so that gimplification routines can have right location. * emit-rtl.c (last_location): Remove unused variable.
Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
Hi, I have tested your patch on Spec2000 on ARM, and I can still see several failures caused by: error: fallthru edge crosses section boundary, including the case described in PR55121. On 26 November 2012 16:55, Teresa Johnson tejohn...@google.com wrote: Ping. Teresa On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson tejohn...@google.com wrote: Revised patch that fixes failures encountered when enabling -freorder-blocks-and-partition, including the failure reported in PR 53743. This includes new verification code to ensure no cold blocks dominate hot blocks contributed by Steven Bosscher. I attempted to make the handling of partition updates through the optimization passes much more consistent, removing a number of partial fixes in the code stream in the process. The code to fixup partitions (including the BB_PARTITION assignement, region crossing jump notes, and switch text section notes) is now handled in a few centralized locations. For example, inside rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers don't need to attempt the fixup themselves. For optimization passes that make adjustments to the cfg while in cfg layout mode that are not easy to fix up incrementally, the new routine fixup_partitions handles the cleanup globally. This does require calculation of the dominance relation, however, as far as I can tell the routines which now invoke this global fixup (try_optimize_cfg and commit_edge_insertions) are invoked typically once (or a small number of times in the case of try_optimize_cfg) per optimization pass. Additionally, I compared the -ftime-report output for some large fdo compilations and saw only minimal increases in the dominance computation times, which were only a tiny percent of the overall compile time. Additionally, I added a flag to the rtl_data structure to indicate whether any partitioning was actually performed, so that optimizations which were conservatively disabled whenever the flag_reorder_blocks_and_partition is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less conservative for functions where no partitions were formed (e.g. they are completely hot). Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with SPEC2006 int benchmarks and internal google benchmarks using profile feedback and -freorder-blocks-and-partition to get more coverage. Ok for trunk? Thanks, Teresa 2012-11-14 Teresa Johnson tejohn...@google.com Steven Bosscher ste...@gcc.gnu.org * cfghooks.h (cfg_layout_finalize): New parameter. * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize parameter. * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert as this is now done by redirect_edge_and_branch_force. * function.c (thread_prologue_and_epilogue_insns): Insert new bb after barriers, new cfg_layout_finalize parameter, and don't store exit predecessor BB until after it is potentially split. * function.h (struct rtl_data): New flag has_bb_partition. * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter. * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if any blocks in function actually partitioned. (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean up partitioning. * bb-reorder.c (connect_traces): Only look for partitions and skip block copying if any blocks in function actually partitioned. (emit_barrier_after_bb): Handle insertion in non-cfglayout mode. (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure that no cold blocks dominate a hot block. (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert as this is now done by force_nonfallthru_and_redirect. (add_reg_crossing_jump_notes): Handle the fact that some jumps may already be marked with region crossing note. (reorder_basic_blocks): Only need to verify partitions if any blocks in function actually partitioned. (insert_section_boundary_note): Only need to insert note if any blocks in function actually partitioned. (rest_of_handle_reorder_blocks): New cfg_layout_finalize parameter, and remove call to insert_section_boundary_note as this is now called via cfg_layout_finalize/fixup_reorder_chain. (duplicate_computed_gotos): New cfg_layout_finalize parameter. (partition_hot_cold_basic_blocks): Set flag indicating function has bb partitions. * bb-reorder.h: Declare insert_section_boundary_note and emit_barrier_after_bb, which are no longer static. * basic-block.h: Declare new function fixup_partitions. * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary check for region crossing note.
Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
On Mon, Nov 26, 2012 at 5:03 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: On 11/26/2012 10:03 AM, Richard Biener wrote: On Mon, Nov 5, 2012 at 2:59 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: On 11/04/2012 11:54 AM, Richard Biener wrote: On Thu, Nov 1, 2012 at 2:10 PM, Richard Sandiford rdsandif...@googlemail.com wrote: Kenneth Zadeck zad...@naturalbridge.com writes: I would like you to respond to at least point 1 of this email. In it there is code from the rtl level that was written twice, once for the case when the size of the mode is less than the size of a HWI and once for the case where the size of the mode is less that 2 HWIs. my patch changes this to one instance of the code that works no matter how large the data passed to it is. you have made a specific requirement for wide int to be a template that can be instantiated in several sizes, one for 1 HWI, one for 2 HWI. I would like to know how this particular fragment is to be rewritten in this model? It seems that I would have to retain the structure where there is one version of the code for each size that the template is instantiated. I think richi's argument was that wide_int should be split into two. There should be a bare-metal class that just has a length and HWIs, and the main wide_int class should be an extension on top of that that does things to a bit precision instead. Presumably with some template magic so that the length (number of HWIs) is a constant for: typedef foo2 double_int; and a variable for wide_int (because in wide_int the length would be the number of significant HWIs rather than the size of the underlying array). wide_int would also record the precision and apply it after the full HWI operation. So the wide_int class would still provide as wide as we need arithmetic, as in your rtl patch. I don't think he was objecting to that. That summarizes one part of my complaints / suggestions correctly. In other mails I suggested to not make it a template but a constant over object lifetime 'bitsize' (or maxlen) field. Both suggestions likely require more thought than I put into them. The main reason is that with C++ you can abstract from where wide-int information pieces are stored and thus use the arithmetic / operation workers without copying the (source) wide-int objects. Thus you should be able to write adaptors for double-int storage, tree or RTX storage. We had considered something along these lines and rejected it. I am not really opposed to doing something like this, but it is not an obvious winning idea and is likely not to be a good idea. Here was our thought process: if you abstract away the storage inside a wide int, then you should be able to copy a pointer to the block of data from either the rtl level integer constant or the tree level one into the wide int. It is certainly true that making a wide_int from one of these is an extremely common operation and doing this would avoid those copies. However, this causes two problems: 1) Mike's first cut at the CONST_WIDE_INT did two ggc allocations to make the object. it created the base object and then it allocated the array. Richard S noticed that we could just allocate one CONST_WIDE_INT that had the array in it. Doing it this way saves one ggc allocation and one indirection when accessing the data within the CONST_WIDE_INT. Our plan is to use the same trick at the tree level. So to avoid the copying, you seem to have to have a more expensive rep for CONST_WIDE_INT and INT_CST. I did not propose having a pointer to the data in the RTX or tree int. Just the short-lived wide-ints (which are on the stack) would have a pointer to the data - which can then obviously point into the RTX and tree data. There is the issue then what if some wide-ints are not short lived. It makes me nervous to create internal pointers to gc ed memory. I thought they were all short-lived. 2) You are now stuck either ggcing the storage inside a wide_int when they are created as part of an expression or you have to play some game to represent the two different storage plans inside of wide_int. Hm? wide-ints are short-lived and thus never live across a garbage collection point. We create non-GCed objects pointing to GCed objects all the time and everywhere this way. Again, this makes me nervous but it could be done. However, it does mean that now the wide ints that are not created from rtxes or trees will be more expensive because they are not going to get their storage for free, they are going to alloca it. No, those would simply use the embedded storage model. however, it still is not clear, given that 99% of the wide ints are going to fit in a single hwi, that this would be a noticeable win. Currently even if they fit into a HWI you will still allocate 4 times the larges integer mode size. You say that doesn't matter because they are
[Patch, Fortran] PR5469
l_push_char allocates memory which is freed with free_line. However, currently, the memory is not always freed when calling generate_error. If one aborts, that's fine. However, generate_error can also set the iostat variable. Thus, one should ensure that the memory is always freed. I wouldn't mind if someone, who knows libgfortran/io better than I, could confirm that the patch is okay. I don't want too free memory which is later used. Build and regtested on x86-64-linux. OK for the trunk? Tobias PS: Test case see PR. (I don't think it makes sense to include it in the test suite.) PPS: I see the following failures when regtesting; the LTO and the realloc one I know, but I haven't see the reassoc_4.f before. It cannot be due to my patch and when I run it manually, it also works: FAIL: gfortran.dg/lto/pr45586 f_lto_pr45586_0.o-f_lto_pr45586_0.o link, -O0 -flto (internal compiler error) FAIL: gfortran.dg/realloc_on_assign_5.f03 -O0 execution test FAIL: gfortran.dg/reassoc_4.f -O scan-tree-dump-times reassoc1 [0-9] \\* 22 2012-11-26 Tobias Burnus bur...@net-b.de PR fortran/5469 * io/list_read (parse_repeat, read_integer, read_character, parse_real, read_real, check_type, list_formatted_read_scalar, finish_list_read): Call list_free. diff --git a/libgfortran/io/list_read.c b/libgfortran/io/list_read.c index 403e719..49615b3 100644 --- a/libgfortran/io/list_read.c +++ b/libgfortran/io/list_read.c @@ -617,6 +617,7 @@ parse_repeat (st_parameter_dt *dtp) free_saved (dtp); if (c == EOF) { + free_line (dtp); hit_eof (dtp); return 1; } @@ -905,11 +906,14 @@ read_integer (st_parameter_dt *dtp, int length) free_saved (dtp); if (c == EOF) { + free_line (dtp); hit_eof (dtp); return; } else if (c != '\n') eat_line (dtp); + + free_line (dtp); snprintf (message, MSGLEN, Bad integer for item %d in list input, dtp-u.p.item_count); generate_error (dtp-common, LIBERROR_READ_VALUE, message); @@ -1078,7 +1082,6 @@ read_character (st_parameter_dt *dtp, int length __attribute__ ((unused))) unget_char (dtp, c); eat_separator (dtp); dtp-u.p.saved_type = BT_CHARACTER; - free_line (dtp); } else { @@ -1087,10 +1090,12 @@ read_character (st_parameter_dt *dtp, int length __attribute__ ((unused))) dtp-u.p.item_count); generate_error (dtp-common, LIBERROR_READ_VALUE, message); } + free_line (dtp); return; eof: free_saved (dtp); + free_line (dtp); hit_eof (dtp); } @@ -1283,11 +1288,14 @@ parse_real (st_parameter_dt *dtp, void *buffer, int length) free_saved (dtp); if (c == EOF) { + free_line (dtp); hit_eof (dtp); return 1; } else if (c != '\n') eat_line (dtp); + + free_line (dtp); snprintf (message, MSGLEN, Bad floating point number for item %d, dtp-u.p.item_count); generate_error (dtp-common, LIBERROR_READ_VALUE, message); @@ -1387,11 +1395,14 @@ eol_4: free_saved (dtp); if (c == EOF) { + free_line (dtp); hit_eof (dtp); return; } else if (c != '\n') eat_line (dtp); + + free_line (dtp); snprintf (message, MSGLEN, Bad complex value in item %d of list input, dtp-u.p.item_count); generate_error (dtp-common, LIBERROR_READ_VALUE, message); @@ -1762,12 +1773,14 @@ read_real (st_parameter_dt *dtp, void * dest, int length) free_saved (dtp); if (c == EOF) { + free_line (dtp); hit_eof (dtp); return; } else if (c != '\n') eat_line (dtp); + free_line (dtp); snprintf (message, MSGLEN, Bad real number in item %d of list input, dtp-u.p.item_count); generate_error (dtp-common, LIBERROR_READ_VALUE, message); @@ -1784,6 +1797,7 @@ check_type (st_parameter_dt *dtp, bt type, int len) if (dtp-u.p.saved_type != BT_UNKNOWN dtp-u.p.saved_type != type) { + free_line (dtp); snprintf (message, MSGLEN, Read type %s where %s was expected for item %d, type_name (dtp-u.p.saved_type), type_name (type), dtp-u.p.item_count); @@ -1797,6 +1811,7 @@ check_type (st_parameter_dt *dtp, bt type, int len) if (dtp-u.p.saved_length != len) { + free_line (dtp); snprintf (message, MSGLEN, Read kind %d %s where kind %d is required for item %d, dtp-u.p.saved_length, type_name (dtp-u.p.saved_type), len, @@ -1970,7 +1985,10 @@ list_formatted_read_scalar (st_parameter_dt *dtp, bt type, void *p, cleanup: if (err == LIBERROR_END) -hit_eof (dtp); +{ + free_line (dtp); + hit_eof (dtp); +} return err; } @@ -2018,7 +2036,10 @@ finish_list_read (st_parameter_dt *dtp) err = eat_line (dtp); if (err == LIBERROR_END) -hit_eof (dtp); +{ + free_line (dtp); + hit_eof (dtp); +} } /* NAMELIST INPUT
[PATCH][AARCH64] Refactor constant generation
Hi, Constant building in the AArch64 backend spits out assembly code, which affects scheduling of the generated code. This patch rewrites the code to use RTL patterns. A full aarch64-none-elf regression run shows no issues. Thanks Sofiane - ChangeLog: 2012-11-26 Sofiane Naci sofiane.n...@arm.com * config/aarch64/aarch64.c (aarch64_build_constant): Update prototype. Call emit_move_insn instead of printing movi/movn/movz instructions. Call gen_insv_immdi instead of printing movk instruction. (aarch64_add_constant): Update prototype. Generate RTL instead of priting add/sub instructions. (aarch64_output_mi_thunk): Update calls to aarch64_build_constant and aarch64_add_constant. aarch64-refactor-const-gen.patch Description: Binary data
RFA: Enable both ld and gold in GCC 4.8
Hi, There is a patch to enable both ld and gold in GCC: http://gcc.gnu.org/ml/gcc-patches/2011-06/msg02076.html and Diego asked if there is anything in particular blocking this patch in trunk: http://gcc.gnu.org/ml/gcc-patches/2011-06/msg02121.html I'd like to see it in GCC 4.8. Thanks. -- H.J.
Re: [PATCH] Section anchors and thread-local storage
On Mon, Nov 26, 2012 at 3:26 AM, Richard Sandiford rdsandif...@googlemail.com wrote: David Edelsohn dje@gmail.com writes: I have been working to enable native thread-local storage on AIX. One problem I encountered is the AIX assembler has difficulty with the anchor symbol for TLS CSECTs. While the section anchors machinery uses a separate pool for TLS entries, should section anchor blocks be used for TLS symbols at all? powerpc-linux uses GOT annotations directly and never places TLS symbols in the TOC at all. Section anchors seem to be avoided by TLS code already. The appended patch rejects TLS symbols for object blocks in general. I could add a target hook, but I wanted to propose this policy in general before pursing a target-specific hook. Yeah, TLS anchors work on mips64-linux-gnu, although admittedly section anchors aren't yet enabled by default there, so perhaps not many people have tried it. I know that it *CAN* work. Part of my question was whether thread-local symbols *SHOULD* be placed in section anchor blocks, whether or not they can be placed there on some systems. A use_blocks_for_decl_p hook sounds good, and fits nicely with targetm.use_blocks_for_constant_p. Below is the implementation as a new target hook. Thanks, David * target.def (use_blocks_for_decl_p): New hook. * varasm.c (use_blocks_for_decl_p): Apply hook as final condition. Index: target.def === --- target.def (revision 193798) +++ target.def (working copy) @@ -1495,6 +1495,13 @@ bool, (enum machine_mode mode, const_rtx x), hook_bool_mode_const_rtx_false) +/* True if the given decl can be put into an object_block. */ +DEFHOOK +(use_blocks_for_decl_p, + , + bool, (const_tree decl), + hook_bool_const_tree_true) + /* The minimum and maximum byte offsets for anchored addresses. */ DEFHOOKPOD (min_anchor_offset, Index: varasm.c === --- varasm.c(revision 193798) +++ varasm.c(working copy) @@ -1113,7 +1113,7 @@ if (lookup_attribute (alias, DECL_ATTRIBUTES (decl))) return false; - return true; + return targetm.use_blocks_for_decl_p (decl); } /* Create the DECL_RTL for a VAR_DECL or FUNCTION_DECL. DECL should
Re: [tsan] Instrument atomics
On Mon, Nov 26, 2012 at 12:35 AM, Jakub Jelinek ja...@redhat.com wrote: On Mon, Nov 26, 2012 at 12:17:44PM +0400, Dmitry Vyukov wrote: On Sat, Nov 24, 2012 at 12:58 PM, Dmitry Vyukov dvyu...@google.com wrote: Ok. A slight problem then is that where the tsan pass sits right now, there is no easy way to find out if the builtin call will be expanded inline or not, so (similar for asan), if we instrument them in the pass, it might be instrumented twice at runtime if the builtin is expanded as a library call (once the added instrumentation for the builtin, once in the intercepted library call). That isn't wrong, just might need slightly more resources than if we ensured we only instrument the builtin if it isn't expanded inline. Should inlining of those functions be disabled as if -fno-builtins is specified? Yes, it sounds reasonable. Performance characteristics under tsan differ significantly, so most likely we don't care. Do we still need range access functions then? Yes. I think whether to instrument builtins inline or not should be a user decision Why is that? Most users probably are not aware builtins can be inlined or care about it. , -fsanitize-thread and either nothinng or -fno-builtin. Implying -fno-builtin might penalize some code unnecessarily (remember, builtin handling is not only about expanding some builtins inline, but also about folding them if they are used with constant arguments, performing all sorts of optimizations etc., Yes, in some cases, it can incur a little runtime overhead as int i; memset(i, 1, 4); but I think compared with tsan's runtime overhead, this is almost negligible. all of which are ok, just we don't have a knob to disable inline expansion of some builtins (which, some are certainly ok, e.g. those that don't have a library counterpart). There is an internal interface to call: disable_builtin_function (...) -- tsan option process can call it selectively for the stringop related builtins. Also note that many programs use __builtin_* calls directly and then you can't disable recognizing them as builtins. Those calls won't be affected, I think. Compiler can also directly generate builtin calls. However that means the range checking is still needed. David Jakub
Re: [PATCH] Cleanup last_location and update input_location in ipa_prop
On Mon, Nov 26, 2012 at 8:10 AM, Richard Biener richard.guent...@gmail.com wrote: On Mon, Nov 26, 2012 at 4:54 PM, Dehao Chen de...@google.com wrote: On Mon, Nov 26, 2012 at 7:28 AM, Richard Biener richard.guent...@gmail.com wrote: On Thu, Nov 8, 2012 at 6:39 PM, Dehao Chen de...@google.com wrote: ping... The emit-rtl.c hunk is ok. I'm questioning the ipa-prop.c hunk though - what looks at input_location (nothing outside of the frontend should, really). ipa_modify_call_arguments invokes force_gimple_operand_gsi, which uses frontend routines to gimplify expr and uses input_location. Can you be more specific? That's the place that needs fixing - there are a lot more force_gimple_operand callers. You are right, the patch is updated as follows: Index: gcc/ipa-prop.c === --- gcc/ipa-prop.c (revision 193203) +++ gcc/ipa-prop.c (working copy) @@ -2870,7 +2870,8 @@ ipa_modify_call_arguments (struct cgraph_edge *cs, gcc_checking_assert (adj-offset % BITS_PER_UNIT == 0); base = gimple_call_arg (stmt, adj-base_index); - loc = EXPR_LOCATION (base); + loc = DECL_P (base) ? DECL_SOURCE_LOCATION (base) : + EXPR_LOCATION (base); if (TREE_CODE (base) != ADDR_EXPR POINTER_TYPE_P (TREE_TYPE (base))) Thanks, Dehao Richard. Thanks, Dehao Thanks, Richard. On Mon, Nov 5, 2012 at 5:20 PM, Dehao Chen de...@google.com wrote: Hi, This is a patch to do some obvious cleanup and setting correct input_location in ipa_prop (because it invokes gimplification routines). Bootstrapped and passed gcc regression tests. Is it okay for trunk? Thanks, Dehao gcc/ChangeLog: 2010-11-05 Dehao Chen de...@google.com * ipa-prop.c (ipa_modify_call_arguments): Set input_location so that gimplification routines can have right location. * emit-rtl.c (last_location): Remove unused variable.
Re: PR 55438: Incorrect use of BIGGEST_ALIGNMENT
In http://gcc.gnu.org/ml/gcc-patches/2012-11/msg00257.html I said: get_best_mode has various checks to decide what counts as an acceptable bitfield mode. It actually has two copies of them, with slightly different alignment checks: MIN (unit, BIGGEST_ALIGNMENT) align vs. unit = MIN (align, BIGGEST_ALIGNMENT) The second looks more correct, since we can't necessarily guarantee larger alignments than BIGGEST_ALIGNMENT in all cases. PR 55438 shows why I was wrong. BIGGEST_ALIGNMENT is not (as I thought) the biggest supported alignment, but the: Biggest alignment that any data type can require on this machine, in bits. Note that this is not the biggest alignment that is supported, just the biggest alignment that, when violated, may cause a fault. So it's perfectly possible for BIGGEST_ALIGNMENT to be 8 on 32-bit machines. That means that the Sequent check was flawed, doesn't it? It also seems that the entire business of alignment with size comparisons is dubious. As things stand, my change causes get_best_mode to reject anything larger than a byte for that combination, which causes infinite recursion between store_fixed_bit_field and store_split_bit_field. There are two problems. First: if (!bitregion_end_) { /* We can assume that any aligned chunk of ALIGN_ bits that overlaps the bitfield is mapped and won't trap. */ bitregion_end_ = bitpos + bitsize + align_ - 1; bitregion_end_ -= bitregion_end_ % align_ + 1; } is too conservative if the aligment is capped to BIGGEST_ALIGNMENT. We should be able to guarantee that word-aligned memory is mapped in at least word-sized chunks. (If you think now is the time to add a MIN_PAGE_SIZE macro, perhaps with MAX (BITS_PER_WORD, BIGGEST_ALIGNMENT) as its default, I can do that instead.) So this will fix the Sequent check with a conservative cap of BITS_PER_WORD. That's reasonable, I think. Also, in cases like these, the supposedly conservative: GET_MODE_BITSIZE (mode) = align doesn't preserve the cap in the original: MIN (unit, BIGGEST_ALIGNMENT) align Fixed by using GET_MODE_ALIGNMENT instead. Note that the comment just above needs to be adjusted then. What about the similar check in next_mode? /* Stop if the mode requires too much alignment. */ if (unit align_ SLOW_UNALIGNED_ACCESS (mode_, align_)) break; It seems to me that we should change it as well. -- Eric Botcazou
Re: [patch] reorg.c janitor patch 2: handle DEBUG_INSN
If that is true, then why are there so many post var-tracking passes using NONDEBUG_INSN_P and/or looking for the DEBUG_INSN code? See e.g. shorten_branches, reorg.c, various machine reorgs, etc. They are very likely overzealous. For example from reorg.c:redundant_insn() in the loop /* Scan backwards looking for a match. */: 151731 nemet if (!NONDEBUG_INSN_P (trial)) 99 kenner continue; 151731 nemet --insns_to_search; This was introduced to fix PR41349. Apparently DEBUG_INSN live on beyond var-tracking...? I think that the fix for this PR was really about NOTEs and not DEBUG_INSNs. I'm going to test the following patchlet on the SPARC: Index: reorg.c === --- reorg.c (revision 193748) +++ reorg.c (working copy) @@ -1628,7 +1628,7 @@ redundant_insn (rtx insn, rtx target, rt if (LABEL_P (trial)) return 0; - if (!NONDEBUG_INSN_P (trial)) + if (!INSN_P (trial)) continue; --insns_to_search; @@ -1731,7 +1731,7 @@ redundant_insn (rtx insn, rtx target, rt trial !LABEL_P (trial) insns_to_search 0; trial = PREV_INSN (trial)) { - if (!NONDEBUG_INSN_P (trial)) + if (!INSN_P (trial)) continue; --insns_to_search; -- Eric Botcazou
Re: [tsan] Instrument atomics
On Mon, Nov 26, 2012 at 9:07 PM, Xinliang David Li davi...@google.com wrote: Ok. A slight problem then is that where the tsan pass sits right now, there is no easy way to find out if the builtin call will be expanded inline or not, so (similar for asan), if we instrument them in the pass, it might be instrumented twice at runtime if the builtin is expanded as a library call (once the added instrumentation for the builtin, once in the intercepted library call). That isn't wrong, just might need slightly more resources than if we ensured we only instrument the builtin if it isn't expanded inline. Should inlining of those functions be disabled as if -fno-builtins is specified? Yes, it sounds reasonable. Performance characteristics under tsan differ significantly, so most likely we don't care. Do we still need range access functions then? Yes. I think whether to instrument builtins inline or not should be a user decision Why is that? Most users probably are not aware builtins can be inlined or care about it. I agree with David. Most likely it has negligible overhead under tsan. Even if gcc provides ability to choose between builtins and library functions, it does not have to provide it under tsan (which is a very special mode). On the other hand, I don't know what complexity is involved on gcc side. As for runtime I can easily implement it, it's just a matter of writing 2 thin wrappers for existing functionality. , -fsanitize-thread and either nothinng or -fno-builtin. Implying -fno-builtin might penalize some code unnecessarily (remember, builtin handling is not only about expanding some builtins inline, but also about folding them if they are used with constant arguments, performing all sorts of optimizations etc., Yes, in some cases, it can incur a little runtime overhead as int i; memset(i, 1, 4); but I think compared with tsan's runtime overhead, this is almost negligible. all of which are ok, just we don't have a knob to disable inline expansion of some builtins (which, some are certainly ok, e.g. those that don't have a library counterpart). There is an internal interface to call: disable_builtin_function (...) -- tsan option process can call it selectively for the stringop related builtins. Also note that many programs use __builtin_* calls directly and then you can't disable recognizing them as builtins. Those calls won't be affected, I think. Compiler can also directly generate builtin calls. However that means the range checking is still needed. David Jakub
patch to fix a test in PR55277
The following patch fixes the second test case in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55277 As for the first case, I don' think it will be fixed soon as I have more urgent PRs. The patch was successfully tested and bootstrapped on x86/x86-64. Committed as rev. 193824. 2012-11-26 Vladimir Makarov vmaka...@redhat.com PR target/55277 * lra-constraints.c (in_class_p): Check reg class contents too. 2012-11-26 Vladimir Makarov vmaka...@redhat.com PR target/55277 * gcc.target/i386/pr55227.c: New test. Index: lra-constraints.c === --- lra-constraints.c (revision 193755) +++ lra-constraints.c (working copy) @@ -293,7 +293,9 @@ in_class_p (rtx reg, enum reg_class cl, if (nregs == 1) return true; for (j = 0; j nregs; j++) - if (TEST_HARD_REG_BIT (lra_no_alloc_regs, hard_regno + j)) + if (TEST_HARD_REG_BIT (lra_no_alloc_regs, hard_regno + j) + || ! TEST_HARD_REG_BIT (reg_class_contents[common_class], + hard_regno + j)) break; if (j = nregs) return true; Index: testsuite/gcc.target/i386/pr55277.c === --- testsuite/gcc.target/i386/pr55277.c (revision 0) +++ testsuite/gcc.target/i386/pr55277.c (working copy) @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target ilp32 } */ +/* { dg-options -O1 } */ + +int a, c; + +void f(long long p) +{ + long long b; + + if(b) +b = p ? : 0; + + for (; p; p++) +p *= a (c = p *= !a 2); + + a = b += !(b 3740917449u); +}
[PING^2] PR 54805: __gthread_tsd* in vxlib-tls.c
On 11/13/2012 10:22 PM, rbmj wrote: On 11/5/2012 12:57 PM, rbmj wrote: This removes warnings about implicit declarations and fixes one of the function calls in vxlib-tls.c for vxworks targets. I got the old prototypes from http://gcc.gnu.org/ml/gcc-patches/2005-08/msg01314.html See bug for further details. Someone please comment or commit :) Ping: http://gcc.gnu.org/ml/gcc-patches/2012-11/msg00406.html Robert Mason Ping^2
Re: PR 55438: Incorrect use of BIGGEST_ALIGNMENT
Eric Botcazou ebotca...@adacore.com writes: In http://gcc.gnu.org/ml/gcc-patches/2012-11/msg00257.html I said: get_best_mode has various checks to decide what counts as an acceptable bitfield mode. It actually has two copies of them, with slightly different alignment checks: MIN (unit, BIGGEST_ALIGNMENT) align vs. unit = MIN (align, BIGGEST_ALIGNMENT) The second looks more correct, since we can't necessarily guarantee larger alignments than BIGGEST_ALIGNMENT in all cases. PR 55438 shows why I was wrong. BIGGEST_ALIGNMENT is not (as I thought) the biggest supported alignment, but the: Biggest alignment that any data type can require on this machine, in bits. Note that this is not the biggest alignment that is supported, just the biggest alignment that, when violated, may cause a fault. So it's perfectly possible for BIGGEST_ALIGNMENT to be 8 on 32-bit machines. That means that the Sequent check was flawed, doesn't it? It also seems that the entire business of alignment with size comparisons is dubious. Yeah, I suppose the Sequent test didn't really have any effect on BIGGEST_ALIGNMENT == BITS_PER_UNIT targets. Also, in cases like these, the supposedly conservative: GET_MODE_BITSIZE (mode) = align doesn't preserve the cap in the original: MIN (unit, BIGGEST_ALIGNMENT) align Fixed by using GET_MODE_ALIGNMENT instead. Note that the comment just above needs to be adjusted then. OK, how about: /* ??? For historical reasons, reject modes that would normally receive greater alignment, even if unaligned accesses are acceptable. This has both advantages and disadvantages. What about the similar check in next_mode? /* Stop if the mode requires too much alignment. */ if (unit align_ SLOW_UNALIGNED_ACCESS (mode_, align_)) break; It seems to me that we should change it as well. I think here we really do want unit (i.e. the GET_MODE_BITSIZE). We're dividing the bitfield into unit-sized chunks and want to know whether those chunks are aligned or not. If they aren't aligned, we need to know whether unaligned accesses are OK. Richard
Re: [patch] reorg.c janitor patch 2: handle DEBUG_INSN
Eric Botcazou ebotca...@adacore.com writes: If that is true, then why are there so many post var-tracking passes using NONDEBUG_INSN_P and/or looking for the DEBUG_INSN code? See e.g. shorten_branches, reorg.c, various machine reorgs, etc. They are very likely overzealous. For example from reorg.c:redundant_insn() in the loop /* Scan backwards looking for a match. */: 151731 nemet if (!NONDEBUG_INSN_P (trial)) 99 kenner continue; 151731 nemet --insns_to_search; This was introduced to fix PR41349. Apparently DEBUG_INSN live on beyond var-tracking...? I think that the fix for this PR was really about NOTEs and not DEBUG_INSNs. I'm going to test the following patchlet on the SPARC: Steve hit a case where running vartracking before dbr_schedule caused a bootstrap failure on MIPS. On targets without delay slots, that kind of bug has been fixed by: /* Variable tracking should be run after all optimizations which change order of insns. It also needs a valid CFG. */ #undef TARGET_DELAY_VARTRACK #define TARGET_DELAY_VARTRACK true But obviously we can't do that with dbr_schedule, since it changes the order of insns but runs with the CFG pulled down. In the long term it would be good to replace dbr_schedule altogether, but in the medium term I thought we might want to update it so that it can run before vartracking. Richard
[RFC] Wrong register numbers in .dwarf_frame on Linux/PowerPC
Hello, I noticed what appears to be a long-standing bug in generating .dwarf_frame sections with GCC on Linux on PowerPC. It had been my understanding that .dwarf_frame is supposed to differ from .eh_frame on PowerPC w.r.t. register numbers: .eh_frame should use GCC internal numbers, while .dwarf_frame should use the DWARF register numbers documented in the PowerPC ELF ABI. However, in actual fact, .dwarf_frame does not use the DWARF numbers; and it does not use the GCC numbers either, but a weird mixture: it uses GCC numbers for everything except for the condition code register, for which it uses the DWARF number (64). Doing a bit of code archaeology, it seems this behaviour goes back to a patch checked in by Geoff Keating just before GCC 4.2. (And indeed in GCC 4.1 we actually have DWARF numbers in .dwarf_frame.): use correct DWARF register numbers on rs6000 non-ELF ports http://gcc.gnu.org/ml/gcc-patches/2006-10/msg00766.html This patch moves the definition of the two macros DBX_REGISTER_NUMBER and DWARF2_FRAME_REG_OUT from rs6000/sysv4.h to rs6000/rs6000.h: #define DBX_REGISTER_NUMBER(REGNO) rs6000_dbx_register_number (REGNO) /* Map register numbers held in the call frame info that gcc has collected using DWARF_FRAME_REGNUM to those that should be output in .debug_frame and .eh_frame. We continue to use gcc hard reg numbers for .eh_frame, but use the numbers mandated by the various ABIs for .debug_frame. rs6000_emit_prologue has translated any combination of CR2, CR3, CR4 saves to a save of CR2. The actual code emitted saves the whole of CR, so we map CR2_REGNO to the DWARF reg for CR. */ #define DWARF2_FRAME_REG_OUT(REGNO, FOR_EH) \ ((FOR_EH) ? (REGNO) \ : (REGNO) == CR2_REGNO ? 64\ : DBX_REGISTER_NUMBER (REGNO)) The intention apparently was to make Darwin use the same numbers as Linux. However, the actual effect was to make Darwin use the same numbers Linux used to use before, while at the same time breaking the numbers used by Linux. The problem is that target macro headers were included in the following sequence: rs6000/rs6000.h svr4.h rs6000/sysv4.h and the common svr4.h header contained a line: #undef DBX_REGISTER_NUMBER When DBX_REGISTER_NUMBER was defined in rs6000/sysv4.h, this didn't matter. But now that the definition was moved to rs6000/rs6000.h, it was in effect undone by the #undef in svr4.h, transforming the definition of DWARF2_FRAME_REG_OUT to be: #define DWARF2_FRAME_REG_OUT(REGNO, FOR_EH) \ ((FOR_EH) ? (REGNO) \ : (REGNO) == CR2_REGNO ? 64\ : (REGNO)) which explains the observed behaviour of GCC since 4.2. Since then, a couple of changes were done to the sources, but all preserved this behaviour. Most notably, Joseph Myers eliminated the common sysv4.h header, but moved the #undef to rs6000/sysv4.h instead: svr4.h avoidance: rs6000/powerpc http://gcc.gnu.org/ml/gcc-patches/2010-12/msg01264.html Also, with current GCC and binutils, .dwarf_frame is usually not generated by GCC directly, but via .cfi_... directives interpreted by gas. This was implemented by a patch by Jakub Jelinek: [PATCH] Add support for .cfi_sections - way to emit also .debug_frame from .cfi_* directives http://sourceware.org/ml/binutils/2009-10/msg00028.html which notes: On ppc this is uglified by the need to translate register numbers - as .eh_frame uses different numbering of registers than .debug_frame. In the patch as posted to the mailing list, this translation is here: /* ppc uses different register numbers between .eh_frame and .debug_frame. This macro translates the .eh_frame register numbers to .debug_frame register numbers. */ #define md_reg_eh_frame_to_debug_frame(regno) \ ((regno) == 70 ? 64 /* cr2 */ \ : (regno) == 64 ? 100 /* mq */ \ : (regno) == 65 ? 108 /* lr */ \ : (regno) == 66 ? 109 /* ctr */\ : (regno) = 68 (regno) = 75 ? (regno) + 86 - 68 /* crN */ \ : (regno) == 76 ? 101 /* xer */\ : (regno) = 77 (regno) = 108 ? (regno) + 1124 - 77 /* vrN */ \ : (regno) == 109 ? 356 /* vrsave */\ : (regno) == 110 ? 67 /* vscr */ \ : (regno) == 111 ? 99 /* spe_acc */\ : (regno) == 112 ? 612 /* spefscr */ \ : (regno)) But in the patch as actually checked in (tc-ppc.h rev. 1.41), we have: /* ppc uses different register numbers between .eh_frame and .debug_frame. This macro translates the .eh_frame register numbers to .debug_frame register numbers. */ #define md_reg_eh_frame_to_debug_frame(regno) \ ((regno) ==
Re: [patch] reorg.c janitor patch 2: handle DEBUG_INSN
On Mon, Nov 26, 2012 at 07:08:41PM +, Richard Sandiford wrote: In the long term it would be good to replace dbr_schedule altogether, but in the medium term I thought we might want to update it so that it can run before vartracking. var-tracking definitely isn't prepared to handle dbr_schedule created sequences, I guess it would be much more work than just fixing whatever is needed on the dbr side that doesn't work right now (if anything). Jakub
Re: [RFC] Wrong register numbers in .dwarf_frame on Linux/PowerPC
On Mon, Nov 26, 2012 at 2:10 PM, Ulrich Weigand uweig...@de.ibm.com wrote: So I'm wondering where to go from here. I guess we could: 1. Bring GCC (and gas) behaviour in compliance with the documented ABI by removing the #undef DBX_REGISTER_NUMBER and changing gas's md_reg_eh_frame_to_debug_frame to the original implementation from Jakub's patch. That would make GDB work well on new files, but there are a large number of binaries out there where we continue to have the same behaviour as today ... 2. Leave GCC and gas as-is and modify GDB to expect GCC numbering in .dwarf_frame, except for the condition code register. This would break debugging of files built with GCC 4.0 and 4.1 unless we want to add a special hack for that. 3. Like 2., but remove the condition code hack: simply use identical numbers in .eh_frame and .dwarf_frame. This would make PowerPC like other Linux platforms in that respect. Thoughts? I vote for (3). Thanks, David
Re: [RFC] Wrong register numbers in .dwarf_frame on Linux/PowerPC
Date: Mon, 26 Nov 2012 20:10:06 +0100 (CET) From: Ulrich Weigand uweig...@de.ibm.com Hello, I noticed what appears to be a long-standing bug in generating .dwarf_frame sections with GCC on Linux on PowerPC. ... So I'm wondering where to go from here. I guess we could: 1. Bring GCC (and gas) behaviour in compliance with the documented ABI by removing the #undef DBX_REGISTER_NUMBER and changing gas's md_reg_eh_frame_to_debug_frame to the original implementation from Jakub's patch. That would make GDB work well on new files, but there are a large number of binaries out there where we continue to have the same behaviour as today ... 2. Leave GCC and gas as-is and modify GDB to expect GCC numbering in .dwarf_frame, except for the condition code register. This would break debugging of files built with GCC 4.0 and 4.1 unless we want to add a special hack for that. 3. Like 2., but remove the condition code hack: simply use identical numbers in .eh_frame and .dwarf_frame. This would make PowerPC like other Linux platforms in that respect. Thoughts? What do other compilers (in particular XLC) do? From a GDB standpoint it would be a major PITA if different compilers would use different encodings for .dwarf_frame.
Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
Are you sure you have all my changes applied? I applied the 4 patches attached to PR55121 into my trunk checkout that has my fixes, and to a pristine trunk checkout. I configured and built both for --target=arm-none-linux-gnueabi, and built using your options, .i file and gcda file. I can reproduce the failure using the pristine trunk with your patches but not with my fixed trunk + your patches. (I just updated to head to pickup recent changes and get the same result. The vec changes required some manual changes to the patch, which I will resend shortly.) Without my fixes: $ ~/extra/gcc_trunk_3_arm-eabi/gcc/cc1 -fpreproce ssed eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9 -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use -fno-common -o eval.s -freorder-blocks-and-partition GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version 2.4.2-p1, MPC version 0.8.1 GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version 2.4.2-p1, MPC version 0.8.1 GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 Compiler executable checksum: d19cc60a2f07de08237a8488bb35cd1a eval.c: In function ‘Ge’: eval.c:792:1: internal compiler error: in df_compact_blocks, at df-core.c:1560 } ^ 0x622f71 df_compact_blocks() ../../gcc_trunk_3/gcc/df-core.c:1560 0x5cfcb5 compact_blocks() ../../gcc_trunk_3/gcc/cfg.c:162 0xc9dce0 reorder_basic_blocks ../../gcc_trunk_3/gcc/bb-reorder.c:2154 0xc9dce0 rest_of_handle_reorder_blocks ../../gcc_trunk_3/gcc/bb-reorder.c:2219 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See http://gcc.gnu.org/bugs.html for instructions. With my fixes: $ ~/extra/gcc_trunk_4_arm-eabi/gcc/cc1 -fpreproce ssed eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9 -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use -fno-common -o eval.s -freorder-blocks-and-partition GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version 2.4.2-p1, MPC version 0.8.1 GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version 2.4.2-p1, MPC version 0.8.1 GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 Compiler executable checksum: 45b468efa7c981f9afb44c4dac2424f3 Thanks, Teresa On Mon, Nov 26, 2012 at 8:25 AM, Christophe Lyon christophe.l...@linaro.org wrote: Hi, I have tested your patch on Spec2000 on ARM, and I can still see several failures caused by: error: fallthru edge crosses section boundary, including the case described in PR55121. On 26 November 2012 16:55, Teresa Johnson tejohn...@google.com wrote: Ping. Teresa On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson tejohn...@google.com wrote: Revised patch that fixes failures encountered when enabling -freorder-blocks-and-partition, including the failure reported in PR 53743. This includes new verification code to ensure no cold blocks dominate hot blocks contributed by Steven Bosscher. I attempted to make the handling of partition updates through the optimization passes much more consistent, removing a number of partial fixes in the code stream in the process. The code to fixup partitions (including the BB_PARTITION assignement, region crossing jump notes, and switch text section notes) is now handled in a few centralized locations. For example, inside rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers don't need to attempt the fixup themselves. For optimization passes that make adjustments to the cfg while in cfg layout mode that are not easy to fix up incrementally, the new routine fixup_partitions handles the cleanup globally. This does require calculation of the dominance relation, however, as far as I can tell the routines which now invoke this global fixup (try_optimize_cfg and commit_edge_insertions) are invoked typically once (or a small number of times in the case of try_optimize_cfg) per optimization pass. Additionally, I compared the -ftime-report output for some large fdo compilations and saw only minimal increases in the dominance computation times, which were only a tiny percent of the overall compile time. Additionally, I added a flag to the rtl_data structure to indicate whether any partitioning was actually performed, so that optimizations which were conservatively disabled whenever
Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
On Mon, Nov 26, 2012 at 12:19:55PM -0800, Teresa Johnson wrote: Are you sure you have all my changes applied? I applied the 4 patches attached to PR55121 into my trunk checkout that has my fixes, and to a pristine trunk checkout. I configured and built both for --target=arm-none-linux-gnueabi, and built using your options, .i file and gcda file. I can reproduce the failure using the pristine trunk with your patches but not with my fixed trunk + your patches. (I just updated to head to pickup recent changes and get the same result. The vec changes required some manual changes to the patch, which I will resend shortly.) Teresa, Your mailer seems to have corrupted the posted patch with stray =3D characters and line breaks. Can you repost a copy as an attachment to the list? Jack Without my fixes: $ ~/extra/gcc_trunk_3_arm-eabi/gcc/cc1 -fpreproce ssed eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9 -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use -fno-common -o eval.s -freorder-blocks-and-partition GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version 2.4.2-p1, MPC version 0.8.1 GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version 2.4.2-p1, MPC version 0.8.1 GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 Compiler executable checksum: d19cc60a2f07de08237a8488bb35cd1a eval.c: In function ‘Ge’: eval.c:792:1: internal compiler error: in df_compact_blocks, at df-core.c:1560 } ^ 0x622f71 df_compact_blocks() ../../gcc_trunk_3/gcc/df-core.c:1560 0x5cfcb5 compact_blocks() ../../gcc_trunk_3/gcc/cfg.c:162 0xc9dce0 reorder_basic_blocks ../../gcc_trunk_3/gcc/bb-reorder.c:2154 0xc9dce0 rest_of_handle_reorder_blocks ../../gcc_trunk_3/gcc/bb-reorder.c:2219 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See http://gcc.gnu.org/bugs.html for instructions. With my fixes: $ ~/extra/gcc_trunk_4_arm-eabi/gcc/cc1 -fpreproce ssed eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9 -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use -fno-common -o eval.s -freorder-blocks-and-partition GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version 2.4.2-p1, MPC version 0.8.1 GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version 2.4.2-p1, MPC version 0.8.1 GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 Compiler executable checksum: 45b468efa7c981f9afb44c4dac2424f3 Thanks, Teresa On Mon, Nov 26, 2012 at 8:25 AM, Christophe Lyon christophe.l...@linaro.org wrote: Hi, I have tested your patch on Spec2000 on ARM, and I can still see several failures caused by: error: fallthru edge crosses section boundary, including the case described in PR55121. On 26 November 2012 16:55, Teresa Johnson tejohn...@google.com wrote: Ping. Teresa On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson tejohn...@google.com wrote: Revised patch that fixes failures encountered when enabling -freorder-blocks-and-partition, including the failure reported in PR 53743. This includes new verification code to ensure no cold blocks dominate hot blocks contributed by Steven Bosscher. I attempted to make the handling of partition updates through the optimization passes much more consistent, removing a number of partial fixes in the code stream in the process. The code to fixup partitions (including the BB_PARTITION assignement, region crossing jump notes, and switch text section notes) is now handled in a few centralized locations. For example, inside rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers don't need to attempt the fixup themselves. For optimization passes that make adjustments to the cfg while in cfg layout mode that are not easy to fix up incrementally, the new routine fixup_partitions handles the cleanup globally. This does require calculation of the dominance relation, however, as far as I can tell the routines which now invoke this global fixup (try_optimize_cfg and commit_edge_insertions) are invoked typically once (or a small number of times in the case of try_optimize_cfg) per optimization pass. Additionally, I compared the -ftime-report output
Re: PR 55438: Incorrect use of BIGGEST_ALIGNMENT
OK, how about: /* ??? For historical reasons, reject modes that would normally receive greater alignment, even if unaligned accesses are acceptable. This has both advantages and disadvantages. Fine with me. I think here we really do want unit (i.e. the GET_MODE_BITSIZE). We're dividing the bitfield into unit-sized chunks and want to know whether those chunks are aligned or not. If they aren't aligned, we need to know whether unaligned accesses are OK. We could conceivably have an aligned mode with bitsize 32 and alignment 16 (i.e. STRICT_ALIGNMENT 1 and BIGGEST_ALIGNMENT 16), meaning that we can use 16-bit aligned 32-bit chunks in this mode. Why would the bitsize matter (assuming that it's a multiple of the alignment)? -- Eric Botcazou
PATCH: lto/55474: global-buffer-overflow in lto-wrapper.c
Hi, OPT_SPECIAL_unknown, OPT_SPECIAL_ignore, OPT_SPECIAL_program_name and OPT_SPECIAL_input_file are special options, which aren't in cl_options. This patch avoids if (!(cl_options[foption-opt_index].flags CL_TARGET)) on them. This patch skips them. OK to install? Thanks. H.J. --- 2012-11-26 H.J. Lu hongjiu...@intel.com PR lto/55474 * lto-wrapper.c (merge_and_complain): Handle OPT_SPECIAL_unknown, OPT_SPECIAL_ignore, OPT_SPECIAL_program_name and OPT_SPECIAL_input_file. diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c index 1f4d212..24de743 100644 --- a/gcc/lto-wrapper.c +++ b/gcc/lto-wrapper.c @@ -393,6 +393,12 @@ merge_and_complain (struct cl_decoded_option **decoded_options, struct cl_decoded_option *foption = fdecoded_options[i]; switch (foption-opt_index) { + case OPT_SPECIAL_unknown: + case OPT_SPECIAL_ignore: + case OPT_SPECIAL_program_name: + case OPT_SPECIAL_input_file: + break; + default: if (!(cl_options[foption-opt_index].flags CL_TARGET)) break;
[Patch, Fortran] PR55475 - fix invalid reads with show_locus
As found with -fsanitize=address by HJ, but it also shows up with valgrind. The fix for the PR is the change in scanner.c; I think the patch is rather obvious. The change in error.c is due to: if (c1 == c2) c2 += 1; which could lead to an out-of-bounds condition is c1 is already at the last character - then one exceeds the bound for c2. Build and tested on x86-64-linux with no new failures.* OK for the trunk? Tobias * I get: FAIL for gfortran.dg/lto/pr45586, gfortran.dg/realloc_on_assign_5.f03 and gfortran.dg/reassoc_4.f and XPASS for gfortran.dg/do_1.f90. 2012-11-26 Tobias Burnus bur...@net-b.de PR fortran/55475 * scanner.c (gfc_next_char_literal): Fix setting locus to free_line_length for the error message. * error.c (show_locus): Fix potential out-of-bounds read. diff --git a/gcc/fortran/error.c b/gcc/fortran/error.c index 4b06156..611540c 100644 --- a/gcc/fortran/error.c +++ b/gcc/fortran/error.c @@ -387,7 +387,7 @@ show_locus (locus *loc, int c1, int c2) cmax -= offset; p = (lb-line[offset]); - for (i = 0; i = cmax; i++) + for (i = 0; i cmax; i++) { int spaces, j; spaces = gfc_widechar_display_length (*p++); @@ -401,6 +401,11 @@ show_locus (locus *loc, int c1, int c2) error_char (' '); } + if (i == c1) +error_char ('1'); + else if (i == c2) +error_char ('2'); + error_char ('\n'); } diff --git a/gcc/fortran/scanner.c b/gcc/fortran/scanner.c index e0556a9..765c0f9 100644 --- a/gcc/fortran/scanner.c +++ b/gcc/fortran/scanner.c @@ -1068,10 +1068,12 @@ restart: gfc_current_locus.lb-truncated) { int maxlen = gfc_option.free_line_length; + gfc_char_t *current_nextc = gfc_current_locus.nextc; + gfc_current_locus.lb-truncated = 0; - gfc_current_locus.nextc += maxlen; + gfc_current_locus.nextc = gfc_current_locus.lb-line + maxlen; gfc_warning_now (Line truncated at %L, gfc_current_locus); - gfc_current_locus.nextc -= maxlen; + gfc_current_locus.nextc = current_nextc; } if (c != '')
Re: r193821 - in /trunk/gcc: ChangeLog common.opt d...
I don't see this posted on gcc-patches@, but... On Mon, 26 Nov 2012, ste...@gcc.gnu.org wrote: testsuite/ * testsuite/gcc.dg/20050811-1.c: Change -dv option to -graph option to -fdump-rtl-all. * testsuite/gcc.dg/pr37858.c: Remove -dv option. ...looks like you missed: Executing on host: /tmp/hpautotest-gcc0/cris-elf/gccobj/gcc/xgcc -B/tmp/hpautotest-gcc0/cris-elf/gccobj/gcc/ /tmp/hpautotest-gcc0/gcc/gcc/testsuite/gcc.dg/20050811-2.c -fno-diagnostics-show-caret -O2 -dv -fdump-rtl-postreload -S -isystem /tmp/hpautotest-gcc0/cris-elf/gccobj/cris-elf/./newlib/targ-include -isystem /tmp/hpautotest-gcc0/gcc/newlib/libc/include -o 20050811-2.s(timeout = 300) cc1: warning: unrecognized gcc debugging option: v [enabled by default] output is: cc1: warning: unrecognized gcc debugging option: v [enabled by default] FAIL: gcc.dg/20050811-2.c (test for excess errors) brgds, H-P
[PATCH] Fix the bug of comparing goto_locus to UNKNOWN_LOCATION
Hi, This patch fixes a bug of comparing goto_locus to UNKNOWN_LOCATION when combining basic blocks. Bootstrapped and and passed gcc regression tests. Is it okay for trunk? Thanks, Dehao gcc/ChangeLog: 2012-11-26 Dehao Chen de...@google.com * cfgrtl.c (rtl_merge_blocks): Check with UNKNOWN_LOCATION correctly. (cfg_layout_merge_blocks): Likewise. Index: gcc/cfgrtl.c === --- gcc/cfgrtl.c (revision 193828) +++ gcc/cfgrtl.c (working copy) @@ -890,7 +890,8 @@ rtl_merge_blocks (basic_block a, basic_block b) df_bb_delete (b-index); /* If B was a forwarder block, propagate the locus on the edge. */ - if (forwarder_p !EDGE_SUCC (b, 0)-goto_locus) + if (forwarder_p + LOCATION_LOCUS (EDGE_SUCC (b, 0)-goto_locus) == UNKNOWN_LOCATION) EDGE_SUCC (b, 0)-goto_locus = EDGE_SUCC (a, 0)-goto_locus; if (dump_file) @@ -4149,7 +4150,7 @@ cfg_layout_merge_blocks (basic_block a, basic_bloc /* If B was a forwarder block, propagate the locus on the edge. */ if (forwarder_p - LOCATION_LOCUS (EDGE_SUCC (b, 0)-goto_locus) != UNKNOWN_LOCATION) + LOCATION_LOCUS (EDGE_SUCC (b, 0)-goto_locus) == UNKNOWN_LOCATION) EDGE_SUCC (b, 0)-goto_locus = EDGE_SUCC (a, 0)-goto_locus; if (dump_file)
Re: [PATCH] Cleanup last_location and update input_location in ipa_prop
The new patch is attached. Bootstrapped and passed gcc regression test. Ok for trunk? Thanks, Dehao gcc/ChangeLog: 2010-11-05 Dehao Chen de...@google.com * ipa-prop.c (ipa_modify_call_arguments): Set loc correctly. * emit-rtl.c (last_location): Remove unused variable. Index: gcc/emit-rtl.c === --- gcc/emit-rtl.c (revision 193203) +++ gcc/emit-rtl.c (working copy) @@ -5937,7 +5937,7 @@ location_t epilogue_location; /* Hold current location information and last location information, so the datastructures are built lazily only when some instructions in given place are needed. */ -static location_t curr_location, last_location; +static location_t curr_location; /* Allocate insn location datastructure. */ void @@ -5945,7 +5945,6 @@ insn_locations_init (void) { prologue_location = epilogue_location = 0; curr_location = UNKNOWN_LOCATION; - last_location = UNKNOWN_LOCATION; } /* At the end of emit stage, clear current location. */ Index: gcc/ipa-prop.c === --- gcc/ipa-prop.c (revision 193203) +++ gcc/ipa-prop.c (working copy) @@ -2870,7 +2870,8 @@ ipa_modify_call_arguments (struct cgraph_edge *cs, gcc_checking_assert (adj-offset % BITS_PER_UNIT == 0); base = gimple_call_arg (stmt, adj-base_index); - loc = EXPR_LOCATION (base); + loc = DECL_P (base) ? DECL_SOURCE_LOCATION (base) + : EXPR_LOCATION (base); if (TREE_CODE (base) != ADDR_EXPR POINTER_TYPE_P (TREE_TYPE (base)))
Re: r193821 - in /trunk/gcc: ChangeLog common.opt d...
On Mon, Nov 26, 2012 at 11:38 PM, Hans-Peter Nilsson wrote: I don't see this posted on gcc-patches@, but... On Mon, 26 Nov 2012, wrote: testsuite/ * testsuite/gcc.dg/20050811-1.c: Change -dv option to -graph option to -fdump-rtl-all. * testsuite/gcc.dg/pr37858.c: Remove -dv option. ...looks like you missed: Executing on host: /tmp/hpautotest-gcc0/cris-elf/gccobj/gcc/xgcc -B/tmp/hpautotest-gcc0/cris-elf/gccobj/gcc/ /tmp/hpautotest-gcc0/gcc/gcc/testsuite/gcc.dg/20050811-2.c -fno-diagnostics-show-caret -O2 -dv -fdump-rtl-postreload -S -isystem /tmp/hpautotest-gcc0/cris-elf/gccobj/cris-elf/./newlib/targ-include -isystem /tmp/hpautotest-gcc0/gcc/newlib/libc/include -o 20050811-2.s(timeout = 300) cc1: warning: unrecognized gcc debugging option: v [enabled by default] output is: cc1: warning: unrecognized gcc debugging option: v [enabled by default] FAIL: gcc.dg/20050811-2.c (test for excess errors) brgds, H-P Yup, sorry. Fixed now. Index: testsuite/ChangeLog === --- testsuite/ChangeLog (revision 193829) +++ testsuite/ChangeLog (working copy) @@ -1,3 +1,7 @@ +2012-11-26 Steven Bosscher + + * gcc.dg/20050811-2.c: Change -dv option to -graph option. + 2012-11-26 Georg-Johann Lay * gcc.dg/54455.c: Require scheduling. Index: testsuite/gcc.dg/20050811-2.c === --- testsuite/gcc.dg/20050811-2.c (revision 193829) +++ testsuite/gcc.dg/20050811-2.c (working copy) @@ -1,6 +1,6 @@ -/* Test whether -dov doesn't crash. */ +/* Test whether graph dumping doesn't crash. */ /* { dg-do compile } */ -/* { dg-options -O2 -dv -fdump-rtl-postreload } */ +/* { dg-options -O2 -fdump-rtl-postreload-graph } */ int foo (void) {
[patch] RFA: slim RTL printing cleanups
Hello, This patch performs some necessary TLC on slim RTL printing in sched-vis.c: * Make it independent of the scheduler. Actually it already was, mostly. This patch completes the job. * Harmonize dumping templates for INSN_UID. * Always print the pattern of a CALL_INSN. * Don't print jump for sched dumps only. * Harmonize function names: - print_* print to a char * buffer - dump_* print to a given FILE * - debug_* print to stderr. Some of the ideas come from Richard S.'s http://gcc.gnu.org/ml/gcc-patches/2012-11/msg00284.html. The next patch addresses a bigger problem: The print_* variants should use something better than a fixed-size buffer. I plan to use pretty-print. An obstack would also work; but I'd like to keep one layer of abstraction between the file output and the printers, so that users can post-process the string if necessary (the case I'm thinking of, is the graph dumpers, which need to escape some characters before writing to a file). Any thoughts on this from anyone? Bootstrappedtested on powerpc64-unknown-linux-gnu. OK for trunk? (Yes, I know it's stage3, but look at it this way: the patch does no harm, and good dumps help for debugging real bugs ;-) Ciao! Steven print-rtl-slim_1.diff Description: Binary data
[patch] fix libstdc++/55463 calling mem_fn with rvalues
PR libstdc++/55463 * include/std/functional (_Mem_fn): Handle rvalue objects. Add noexcept-specifications. * testsuite/20_util/function_objects/mem_fn/55463.cc: New. * testsuite/20_util/bind/ref_neg.cc: Adjust dg-error line numbers. Tested x86_64-linux, committed to trunk. commit 4ddc9752bc93b233efcff9273b0fe4e488a783b8 Author: Jonathan Wakely jwakely@gmail.com Date: Sun Nov 25 21:56:23 2012 + PR libstdc++/55463 * include/std/functional (_Mem_fn): Handle rvalue objects. Add noexcept-specifications. * testsuite/20_util/function_objects/mem_fn/55463.cc: New. * testsuite/20_util/bind/ref_neg.cc: Adjust dg-error line numbers. diff --git a/libstdc++-v3/include/std/functional b/libstdc++-v3/include/std/functional index 4d6e6a8..1a98127 100644 --- a/libstdc++-v3/include/std/functional +++ b/libstdc++-v3/include/std/functional @@ -65,7 +65,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION class _Mem_fn; templatetypename _Tp, typename _Class _Mem_fn_Tp _Class::* -mem_fn(_Tp _Class::*); +mem_fn(_Tp _Class::*) noexcept; _GLIBCXX_HAS_NESTED_TYPE(result_type) @@ -528,13 +528,16 @@ _GLIBCXX_HAS_NESTED_TYPE(result_type) templatetypename _Tp _Res - _M_call(_Tp __object, const volatile _Class *, + _M_call(_Tp __object, const volatile _Class *, _ArgTypes... __args) const - { return (__object.*__pmf)(std::forward_ArgTypes(__args)...); } + { + return (std::forward_Tp(__object).*__pmf) + (std::forward_ArgTypes(__args)...); + } templatetypename _Tp _Res - _M_call(_Tp __ptr, const volatile void *, _ArgTypes... __args) const + _M_call(_Tp __ptr, const volatile void *, _ArgTypes... __args) const { return ((*__ptr).*__pmf)(std::forward_ArgTypes(__args)...); } public: @@ -555,12 +558,17 @@ _GLIBCXX_HAS_NESTED_TYPE(result_type) // Handle smart pointers, references and pointers to derived templatetypename _Tp _Res - operator()(_Tp __object, _ArgTypes... __args) const + operator()(_Tp __object, _ArgTypes... __args) const { - return _M_call(__object, __object, + return _M_call(std::forward_Tp(__object), __object, std::forward_ArgTypes(__args)...); } + templatetypename _Tp + _Res + operator()(reference_wrapper_Tp __ref, _ArgTypes... __args) const + { return operator()(__ref.get(), std::forward_ArgTypes(__args)...); } + private: _Functor __pmf; }; @@ -575,13 +583,16 @@ _GLIBCXX_HAS_NESTED_TYPE(result_type) templatetypename _Tp _Res - _M_call(_Tp __object, const volatile _Class *, + _M_call(_Tp __object, const volatile _Class *, _ArgTypes... __args) const - { return (__object.*__pmf)(std::forward_ArgTypes(__args)...); } + { + return (std::forward_Tp(__object).*__pmf) + (std::forward_ArgTypes(__args)...); + } templatetypename _Tp _Res - _M_call(_Tp __ptr, const volatile void *, _ArgTypes... __args) const + _M_call(_Tp __ptr, const volatile void *, _ArgTypes... __args) const { return ((*__ptr).*__pmf)(std::forward_ArgTypes(__args)...); } public: @@ -601,12 +612,17 @@ _GLIBCXX_HAS_NESTED_TYPE(result_type) // Handle smart pointers, references and pointers to derived templatetypename _Tp - _Res operator()(_Tp __object, _ArgTypes... __args) const + _Res operator()(_Tp __object, _ArgTypes... __args) const { - return _M_call(__object, __object, + return _M_call(std::forward_Tp(__object), __object, std::forward_ArgTypes(__args)...); } + templatetypename _Tp + _Res + operator()(reference_wrapper_Tp __ref, _ArgTypes... __args) const + { return operator()(__ref.get(), std::forward_ArgTypes(__args)...); } + private: _Functor __pmf; }; @@ -621,13 +637,16 @@ _GLIBCXX_HAS_NESTED_TYPE(result_type) templatetypename _Tp _Res - _M_call(_Tp __object, const volatile _Class *, + _M_call(_Tp __object, const volatile _Class *, _ArgTypes... __args) const - { return (__object.*__pmf)(std::forward_ArgTypes(__args)...); } + { + return (std::forward_Tp(__object).*__pmf) + (std::forward_ArgTypes(__args)...); + } templatetypename _Tp _Res - _M_call(_Tp __ptr, const volatile void *, _ArgTypes... __args) const + _M_call(_Tp __ptr, const volatile void *, _ArgTypes... __args) const { return ((*__ptr).*__pmf)(std::forward_ArgTypes(__args)...); } public: @@ -648,12 +667,17 @@ _GLIBCXX_HAS_NESTED_TYPE(result_type) // Handle smart pointers, references and pointers to derived templatetypename _Tp _Res - operator()(_Tp __object, _ArgTypes...
[patch] tweak std::bind constraint
* include/std/functional (__is_socketlike): Change from class template to alias template. Tested x86_64-linux, committed to trunk. commit 634c004b52c86148e1719e20744504816374dfdb Author: Jonathan Wakely jwakely@gmail.com Date: Sun Nov 25 22:01:28 2012 + * include/std/functional (__is_socketlike): Change from class template to alias template. diff --git a/libstdc++-v3/include/std/functional b/libstdc++-v3/include/std/functional index 1a98127..0d8fbd6 100644 --- a/libstdc++-v3/include/std/functional +++ b/libstdc++-v3/include/std/functional @@ -1503,14 +1503,8 @@ _GLIBCXX_HAS_NESTED_TYPE(result_type) // Trait type used to remove std::bind() from overload set via SFINAE // when first argument has integer type, so that std::bind() will // not be a better match than ::bind() from the BSD Sockets API. - templatetypename _Tp -class __is_socketlike -{ - typedef typename decay_Tp::type _Tp2; -public: - static const bool value = - is_integral_Tp2::value || is_enum_Tp2::value; -}; + templatetypename _Tp, typename _Tp2 = typename decay_Tp::type +using __is_socketlike = __or_is_integral_Tp2, is_enum_Tp2; templatebool _SocketLike, typename _Func, typename... _BoundArgs struct _Bind_helper
Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
Richard, I spent a good part of the afternoon talking to Mike about this. He is on the c++ standards committee and is a much more seasoned c++ programmer than I am. He convinced me that with a large amount of engineering and c++ foolishness that it was indeed possible to get your proposal to POSSIBLY work as well as what we did. But now the question is why would any want to do this? At the very least you are talking about instantiating two instances of wide-ints, one for the stack allocated uses and one for the places where we just move a pointer from the tree or the rtx. Then you are talking about creating connectors so that the stack allocated functions can take parameters of pointer version and visa versa. Then there is the issue that rather than just saying that something is a wide int, that the programmer is going to have to track it's origin. In particular, where in the code right now i say. wide_int foo = wide_int::from_rtx (r1); wide_int bar = wide_int::from_rtx (r2) + foo; now i would have to say wide_int_ptr foo = wide_int_ptr::from_rtx (r1); wide_int_stack bar = wide_int_ptr::from_rtx (r2) + foo; then when i want to call some function using a wide_int ref that function now must be either overloaded to take both or i have to choose one of the two instantiations (presumably based on which is going to be more common) and just have the compiler fix up everything (which it is likely to do). And so what is the payoff: 1) No one except the c++ elite is going to understand the code. The rest of the community will hate me and curse the ground that i walk on. 2) I will end up with a version of wide-int that can be used as a medium life container (where i define medium life as not allowed to survive a gc since they will contain pointers into rtxes and trees.) 3) An no clients that actually wanted to do this!!I could use as an example one of your favorite passes, tree-vrp. The current double-int could have been a medium lifetime container since it has a smaller footprint, but in fact tree-vrp converts those double-ints back into trees for medium storage. Why, because it needs the other fields of a tree-cst to store the entire state. Wide-ints also suffer this problem. their only state are the data, and the three length fields. They have no type and none of the other tree info so the most obvious client for a medium lifetime object is really not going to be a good match even if you solve the storage problem. The fact is that wide-ints are an excellent short term storage class that can be very quickly converted into our two long term storage classes. Your proposal is requires a lot of work, will not be easy to use and as far as i can see has no payoff on the horizon. It could be that there could be future clients for a medium lifetime value, but asking for this with no clients in hand is really beyond the scope of a reasonable review. I remind you that the purpose of these patches is to solve problems that exist in the current compiler that we have papered over for years. If someone needs wide-ints in some way that is not foreseen then they can change it. kenny On 11/26/2012 11:30 AM, Richard Biener wrote: On Mon, Nov 26, 2012 at 5:03 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: On 11/26/2012 10:03 AM, Richard Biener wrote: On Mon, Nov 5, 2012 at 2:59 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: On 11/04/2012 11:54 AM, Richard Biener wrote: On Thu, Nov 1, 2012 at 2:10 PM, Richard Sandiford rdsandif...@googlemail.com wrote: Kenneth Zadeck zad...@naturalbridge.com writes: I would like you to respond to at least point 1 of this email. In it there is code from the rtl level that was written twice, once for the case when the size of the mode is less than the size of a HWI and once for the case where the size of the mode is less that 2 HWIs. my patch changes this to one instance of the code that works no matter how large the data passed to it is. you have made a specific requirement for wide int to be a template that can be instantiated in several sizes, one for 1 HWI, one for 2 HWI. I would like to know how this particular fragment is to be rewritten in this model? It seems that I would have to retain the structure where there is one version of the code for each size that the template is instantiated. I think richi's argument was that wide_int should be split into two. There should be a bare-metal class that just has a length and HWIs, and the main wide_int class should be an extension on top of that that does things to a bit precision instead. Presumably with some template magic so that the length (number of HWIs) is a constant for: typedef foo2 double_int; and a variable for wide_int (because in wide_int the length would be the number of significant HWIs rather than the size of the underlying array). wide_int would also record the precision and apply it after the full HWI operation. So the wide_int
[PATCH, libgcc, ARM] __gnu_f2h_internal inaccuracy
__gnu_f2h_internal in libgcc converts single-precision floating-point value to half-precision value for ARM. I noticed there is a slight inaccuracy for floating-point values 2^-25 + epsilon (with epsilon 0 and 2^-26). Those should all be converted to 2^-24 in half-precision. And this because the mask used to implement the even-odd rounding for aexp = -25 is wrong. Currently we have: /* Decimal point between bits 22 and 23. */ mantissa |= 0x0080; if (aexp -14) { mask = 0x007f; if (aexp -25) aexp = -26; else if (aexp != -25) mask = 24 + aexp; } else mask = 0x1fff; But when aexp is 25 the mask should be 0xff instead of 0x7f as the decimal dot in half-precision will be between bit 24 and 23 of the above mentioned mantissa. Cfr. the even-odd rounding done: /* Round. */ if (mantissa mask) { increment = (mask + 1) 1; if ((mantissa mask) == increment) increment = mantissa (increment 1); mantissa += increment; if (mantissa = 0x0100) { mantissa = 1; aexp++; } } Attached patch solves this problem. I've left out the clamping of aexp to -26 for values less than -25 as this it not necessary. After the even-odd rounding all aexp values less than -24 will result in +0. or -0. anyway. John Tytgat j...@bass-software.com * config/arm/fp16.c (__gnu_f2h_internal): Fix inaccuracy. I've got a copyright assignment but no write access. John Tytgat. -- John Tytgat BASS j...@bass-software.com
Re: [PATCH, libgcc, ARM] __gnu_f2h_internal inaccuracy
In message ab11eef452...@hobbes.bass-software.com John Tytgat j...@bass-software.com wrote: [...] Attached patch solves this problem. [...] This time for real. John Tytgat. -- John Tytgat, in his comfy chair at home john.tyt...@aaug.net Index: libgcc/config/arm/fp16.c === --- libgcc/config/arm/fp16.c(revision 193830) +++ libgcc/config/arm/fp16.c(working copy) @@ -47,11 +47,9 @@ mantissa |= 0x0080; if (aexp -14) { - mask = 0x007f; - if (aexp -25) - aexp = -26; - else if (aexp != -25) - mask = 24 + aexp; + mask = 0x00ff; + if (aexp = -25) +mask = 25 + aexp; } else mask = 0x1fff;
RE: [PATCH, libgcc] Make possible to disable JCR in crtstuff.c
Ping^2 -Original Message- From: Joey Ye Sent: Tuesday, November 20, 2012 10:09 To: gcc-patches@gcc.gnu.org Cc: Joey Ye Subject: RE: [PATCH, libgcc] Make possible to disable JCR in crtstuff.c Ping, as Joseph Prostko is saying that this patch shall solve the same problem he's facing. -Original Message- From: Joey Ye [mailto:joey...@arm.com] Sent: Friday, September 21, 2012 15:42 To: gcc-patches@gcc.gnu.org Subject: [PATCH, libgcc] Make possible to disable JCR in crtstuff.c Current crtstuff.c checks if JCR_SECTION_NAME is defined to decide whether do work for JCR. However, defaults.h always defines JCR_SECTION_NAME: #ifndef JCR_SECTION_NAME #define JCR_SECTION_NAME .jcr #endif So it is impossible to disable JCR related code in crtbegin.o, which can save some bytes for every applications that doesn't need java. This patch revise the check of JCR_SECTION_NAME to TARGET_USE_JCR_SECTION. By defining latter to zero disable JCR in crtstuff. This change doesn't impact logic of any target given following defines in defaults.h: #ifndef TARGET_USE_JCR_SECTION #ifdef JCR_SECTION_NAME #define TARGET_USE_JCR_SECTION 1 #else #define TARGET_USE_JCR_SECTION 0 #endif #endif Again, this patch doesn't impact libgcc on any target, unless TARGET_USE_JCR_SECTION is explicitly defined to 0 with make CFLAGS_FOR_TARGET=-DTARGET_USE_JCR_SECTION=0. AIX defines TARGET_USE_JCR_SECTION to 0, but it has no crtbegin/end stuff. So also no impact. OK to trunk? 2012-09-21 Joey Ye joey...@arm.com * crtstuff.c: Check TARGET_USE_JCR_SECTION. Index: libgcc/crtstuff.c === --- libgcc/crtstuff.c (revision 190556) +++ libgcc/crtstuff.c (working copy) @@ -256,13 +256,13 @@ = { }; #endif /* USE_EH_FRAME_REGISTRY */ -#ifdef JCR_SECTION_NAME +#if TARGET_USE_JCR_SECTION defined (JCR_SECTION_NAME) /* Stick a label at the beginning of the java class registration info so we can register them properly. */ STATIC void *__JCR_LIST__[] __attribute__ ((used, section(JCR_SECTION_NAME), aligned(sizeof(void* = { }; -#endif /* JCR_SECTION_NAME */ +#endif /* TARGET_USE_JCR_SECTION JCR_SECTION_NAME */ #if USE_TM_CLONE_REGISTRY STATIC func_ptr __TMC_LIST__[] @@ -438,7 +438,7 @@ #endif #if defined(USE_EH_FRAME_REGISTRY) \ -|| defined(JCR_SECTION_NAME) \ +|| defined(TARGET_USE_JCR_SECTION) \ || defined(USE_TM_CLONE_REGISTRY) /* Stick a call to __register_frame_info into the .init section. For some reason calls with no arguments work more reliably in .init, so stick the @@ -461,7 +461,7 @@ #endif /* CRT_GET_RFIB_DATA */ #endif /* USE_EH_FRAME_REGISTRY */ -#ifdef JCR_SECTION_NAME +#if TARGET_USE_JCR_SECTION if (__JCR_LIST__[0]) { void (*register_classes) (void *) = _Jv_RegisterClasses; @@ -469,7 +469,7 @@ if (register_classes) register_classes (__JCR_LIST__); } -#endif /* JCR_SECTION_NAME */ +#endif /* TARGET_USE_JCR_SECTION */ #if USE_TM_CLONE_REGISTRY register_tm_clones (); @@ -483,7 +483,7 @@ __attribute__ ((__used__, section(.init_array), aligned(sizeof(func_ptr = { frame_dummy }; #endif /* !defined(INIT_SECTION_ASM_OP) */ -#endif /* USE_EH_FRAME_REGISTRY || JCR_SECTION_NAME || USE_TM_CLONE_REGISTRY */ +#endif /* USE_EH_FRAME_REGISTRY || TARGET_USE_JCR_SECTION || USE_TM_CLONE_REGISTRY */ #else /* OBJECT_FORMAT_ELF */ @@ -551,7 +551,7 @@ } #if defined(USE_EH_FRAME_REGISTRY) \ -|| defined(JCR_SECTION_NAME) \ +|| defined(TARGET_USE_JCR_SECTION) \ || defined(USE_TM_CLONE_REGISTRY) /* A helper function for __do_global_ctors, which is in crtend.o. Here in crtbegin.o, we can reference a couple of symbols not visible there. @@ -566,7 +566,7 @@ __register_frame_info (__EH_FRAME_BEGIN__, object); #endif -#ifdef JCR_SECTION_NAME +#if TARGET_USE_JCR_SECTION if (__JCR_LIST__[0]) { void (*register_classes) (void *) = _Jv_RegisterClasses; @@ -580,7 +580,7 @@ register_tm_clones (); #endif /* USE_TM_CLONE_REGISTRY */ } -#endif /* USE_EH_FRAME_REGISTRY || JCR_SECTION_NAME || USE_TM_CLONE_REGISTRY */ +#endif /* USE_EH_FRAME_REGISTRY || TARGET_USE_JCR_SECTION || USE_TM_CLONE_REGISTRY */ #else /* ! INIT_SECTION_ASM_OP ! HAS_INIT_SECTION */ #error What are you doing with crtstuff.c, then? @@ -656,13 +656,13 @@ = { 0 }; #endif /* EH_FRAME_SECTION_NAME */ -#ifdef JCR_SECTION_NAME +#if TARGET_USE_JCR_SECTION defined (JCR_SECTION_NAME) /* Null terminate the .jcr section array. */ STATIC void *__JCR_END__[1] __attribute__ ((used, section(JCR_SECTION_NAME), aligned(sizeof(void * = { 0 }; -#endif /* JCR_SECTION_NAME
Re: [PATCH, libgcc] Make possible to disable JCR in crtstuff.c
On Mon, Nov 26, 2012 at 7:19 PM, Joey Ye joey...@arm.com wrote: Ping^2 -Original Message- From: Joey Ye Sent: Tuesday, November 20, 2012 10:09 To: gcc-patches@gcc.gnu.org Cc: Joey Ye Subject: RE: [PATCH, libgcc] Make possible to disable JCR in crtstuff.c Ping, as Joseph Prostko is saying that this patch shall solve the same problem he's facing. -Original Message- From: Joey Ye [mailto:joey...@arm.com] Sent: Friday, September 21, 2012 15:42 To: gcc-patches@gcc.gnu.org Subject: [PATCH, libgcc] Make possible to disable JCR in crtstuff.c Current crtstuff.c checks if JCR_SECTION_NAME is defined to decide whether do work for JCR. However, defaults.h always defines JCR_SECTION_NAME: #ifndef JCR_SECTION_NAME #define JCR_SECTION_NAME .jcr #endif So it is impossible to disable JCR related code in crtbegin.o, which can save some bytes for every applications that doesn't need java. This patch revise the check of JCR_SECTION_NAME to TARGET_USE_JCR_SECTION. By defining latter to zero disable JCR in crtstuff. This change doesn't impact logic of any target given following defines in defaults.h: #ifndef TARGET_USE_JCR_SECTION #ifdef JCR_SECTION_NAME #define TARGET_USE_JCR_SECTION 1 #else #define TARGET_USE_JCR_SECTION 0 #endif #endif Again, this patch doesn't impact libgcc on any target, unless TARGET_USE_JCR_SECTION is explicitly defined to 0 with make CFLAGS_FOR_TARGET=-DTARGET_USE_JCR_SECTION=0. AIX defines TARGET_USE_JCR_SECTION to 0, but it has no crtbegin/end stuff. So also no impact. OK to trunk? 2012-09-21 Joey Ye joey...@arm.com * crtstuff.c: Check TARGET_USE_JCR_SECTION. Since we have --enable-languages=.., can't we disable JCR_SECTION section if java isn't enabled? -- H.J.
Re: [PATCH, libgcc] Make possible to disable JCR in crtstuff.c
On Mon, Nov 26, 2012 at 10:19 PM, Joey Ye joey...@arm.com wrote: Ping^2 -Original Message- From: Joey Ye Sent: Tuesday, November 20, 2012 10:09 To: gcc-patches@gcc.gnu.org Cc: Joey Ye Subject: RE: [PATCH, libgcc] Make possible to disable JCR in crtstuff.c Ping, as Joseph Prostko is saying that this patch shall solve the same problem he's facing. Indeed. I am working on updating Haiku's GCC port to GCC 4.7.2, and in the past we always ended up editing defaults.h to not define JCR_SECTION_NAME as a way to get around this issue. Basically, in Haiku, we don't want _Jv_RegisterClasses to end up in crtbegin.o so we patched things in what I consider an unsatisfactory manner. I hope to get Haiku's GCC changes upstreamed at some point, and I admit our current fix wouldn't stand any chance at being upstreamed compared to Joey's suggested fix. - joe
Re: [PATCH, libgcc] Make possible to disable JCR in crtstuff.c
On Mon, Nov 26, 2012 at 11:55 PM, H.J. Lu hjl.to...@gmail.com wrote: On Mon, Nov 26, 2012 at 7:19 PM, Joey Ye joey...@arm.com wrote: Ping^2 -Original Message- From: Joey Ye Sent: Tuesday, November 20, 2012 10:09 To: gcc-patches@gcc.gnu.org Cc: Joey Ye Subject: RE: [PATCH, libgcc] Make possible to disable JCR in crtstuff.c Since we have --enable-languages=.., can't we disable JCR_SECTION section if java isn't enabled? I suppose that is certainly another option. My main goal is to not have _Jv_RegisterClasses show up in crtbegin.o, as it simply isn't needed or desired in the case of Haiku. Right now we only enable C and C++, and it simply makes no sense to have anything Java-related show up in our crtbegin.o. - joe